Understanding and Installing DeepSeek-Coder
Thursday, Jan 2, 2025 | 8 minute read
Revolutionary AI tool designed to boost coding efficiency 💻🚀, supports 80+ languages 🌐, offers various model sizes 📊, excels in performance benchmarks 🏆, and features powerful completion capabilities, transforming coding into a seamless experience!✨
Deep Dive: What is DeepSeek-Coder? 🤖
“In today’s fast-paced technological landscape, understanding the value of AI programming tools is more important than ever.” 🌍
With the rapid advancement of artificial intelligence, a plethora of tools have emerged to assist developers in boosting their productivity and coding skills. In this context, the advent of DeepSeek-Coder marks a revolutionary shift in the programming world. 💥
DeepSeek-Coder is an innovative open-source tool launched in 2023 by DeepSeek, a Chinese company focused on developing AI language models for programming. 🇨🇳✨ The tool has been meticulously trained on an extensive dataset of 2 trillion tokens, 87% of which is code and 13% natural language in both Chinese and English. 📝💻 The core functionality of this groundbreaking software lies in its ability to help developers generate and refine code, ultimately enhancing programming efficiency. DeepSeek is also dedicated to advancing Artificial General Intelligence (AGI) technology, aiming to open new avenues for the future of programming. 🌟
Breaking the Mold: What Makes DeepSeek-Coder Unique 🔍
DeepSeek-Coder distinguishes itself from traditional programming models in several key ways:
- Diverse Training Data: DeepSeek-Coder has been trained on a wide array of code across more than 80 programming languages, ensuring its broad applicability. 🌐
- Variety of Model Sizes: The project offers a range of models with different parameter counts, including 1.3B, 5.7B, 6.7B, and 33B, to cater to various project needs. 📊
- Outstanding Performance Metrics: Compared to existing programming models, DeepSeek-Coder has excelled in multiple competitive benchmarks, showcasing its high efficiency and accuracy. 🏆
- Powerful Code Completion Capabilities: DeepSeek-Coder features a 16K context window, allowing it to perform exceptionally well in completing lengthy code segments. 📈
Back to Basics: Why Developers Favor DeepSeek-Coder ❤️
Thanks to its many advantages, DeepSeek-Coder has quickly gained the favor of developers, notably through:
- Rich Resource Support: DeepSeek actively provides resources for the AI community, including model access, tool demonstrations, and integrations on platforms like GitHub and Hugging Face, ensuring developers can easily get started. 📚
- Convenient Community Participation: Users can find detailed documentation, updates, and community discussions on DeepSeek’s official website, enhancing their experience. 🌍
- Broad Practical Application Scenarios: In evaluations such as HumanEval and MBPP, DeepSeek-Coder not only meets practical programming needs but also outperforms many competitors, such as CodeLlama, demonstrating superior adaptability and flexibility. 💪
In these respects, DeepSeek-Coder leverages cutting-edge technology to ensure that developers work more efficiently in code writing and smart completion, making it a vital tool in the programming domain. 🔧💡
How to Install and Use DeepSeek-Coder 📦
To begin using DeepSeek-Coder, make sure that you have Python and pip installed in your environment and that you have a local copy of the DeepSeek-Coder repository. Then, open a terminal in the repository root and run the following command to install the required dependencies:
pip install -r requirements.txt
- This command reads the `requirements.txt` file, which lists all the libraries and versions needed for DeepSeek-Coder. Ensure you have an internet connection, as pip will need to download these packages from the Python Package Index (PyPI).
Additionally, you can access the Hugging Face Space to view a demo of DeepSeek-Coder. If you wish to run the demo locally, simply execute the `app.py` file to start the application.
Feature Introduction ⚙️
DeepSeek-Coder offers three main functionalities to assist you in code generation, insertion, and inference:
- Code Completion
- Code Insertion
- Chat Model Inference
Next, these functionalities will be introduced in detail, along with code examples.
1) Code Completion 🔍
To use the code completion feature, you need to run the following code:
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
# Load the pre-trained tokenizer
tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/deepseek-coder-6.7b-base", trust_remote_code=True)
# Load the pre-trained model with bfloat16 data type and move it to GPU
model = AutoModelForCausalLM.from_pretrained("deepseek-ai/deepseek-coder-6.7b-base", trust_remote_code=True, torch_dtype=torch.bfloat16).cuda()
# Input text requesting the model to generate a quick sort algorithm
input_text = "#write a quick sort algorithm"
# Tokenize the input text and convert it to a PyTorch tensor, subsequently moving it to the model's device (GPU)
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
# Generate the code output with a maximum length of 128
outputs = model.generate(**inputs, max_length=128)
# Decode the output tensor into human-readable text and print
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
Code Explanation:
- `AutoTokenizer` and `AutoModelForCausalLM` are used to load the tokenizer and model, respectively. The parameter `trust_remote_code=True` indicates that you trust the custom code that ships with community-uploaded models on Hugging Face.
- The `cuda()` method moves the model to the GPU, significantly speeding up computation (a CPU-only variant is sketched below).
- `tokenizer(input_text, return_tensors="pt")` encodes the input text into PyTorch tensors, and `to(model.device)` moves them to the same computing device as the model (CPU/GPU).
- `model.generate` produces the output, with `max_length` limiting the maximum total length of the output in tokens.
- Finally, `tokenizer.decode` converts the generated token IDs back into a human-readable string, and `skip_special_tokens=True` ensures that special tokens are omitted.
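If you do not have a GPU, a smaller model can still be run on the CPU. The following is a minimal sketch of my own (not from the official documentation), assuming the 1.3B base model fits in system RAM; it will be noticeably slower than the GPU setup above:
from transformers import AutoTokenizer, AutoModelForCausalLM
# Load the smaller 1.3B base model on the CPU (no .cuda() call, default float32)
tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/deepseek-coder-1.3b-base", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("deepseek-ai/deepseek-coder-1.3b-base", trust_remote_code=True)
# Tokenize the prompt and generate up to 128 tokens, exactly as in the GPU example
inputs = tokenizer("#write a quick sort algorithm", return_tensors="pt")
outputs = model.generate(**inputs, max_length=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))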
2) Code Insertion ➕
To implement code insertion functionality, use the following code:
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
# Load the pre-trained tokenizer
tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/deepseek-coder-6.7b-base", trust_remote_code=True)
# Load the pre-trained model with bfloat16 data type and move it to GPU
model = AutoModelForCausalLM.from_pretrained("deepseek-ai/deepseek-coder-6.7b-base", trust_remote_code=True, torch_dtype=torch.bfloat16).cuda()
# Input text includes a code snippet that needs to be inserted
input_text = """<\xef\xbd\x9cfim\xe2\x96\x81begin\xef\xbd\x9c>def quick_sort(arr):
if len(arr) <= 1:
return arr
pivot = arr[0]
left = []
right = []
<\xef\xbd\x9cfim\xe2\x96\x81hole\xef\xbd\x9c>
if arr[i] < pivot:
left.append(arr[i])
else:
right.append(arr[i])
return quick_sort(left) + [pivot] + quick_sort(right)<\xef\xbd\x9cfim\xe2\x96\x81end\xef\xbd\x9c>"""
# Tokenize the input text and convert it to a PyTorch tensor, subsequently moving it to the model's device (GPU)
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
# Generate the code output with a maximum length of 128
outputs = model.generate(**inputs, max_length=128)
# Decode the output tensor to human-readable text and print, omitting the original input text to only show the new inserted content
print(tokenizer.decode(outputs[0], skip_special_tokens=True)[len(input_text):])
Code Explanation:
- Special fill-in-the-middle (FIM) markers are used in the input text: `<｜fim▁begin｜>` and `<｜fim▁end｜>` delimit the surrounding code, while `<｜fim▁hole｜>` marks the gap the model should fill (a sketch of assembling such a prompt programmatically follows below).
- The output is decoded with `tokenizer.decode(outputs[0], skip_special_tokens=True)`, and slicing off the length of the original input leaves only the newly generated content.
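To make the markers easier to work with, here is a small sketch of my own (the prefix, suffix, and function are hypothetical examples) showing how a fill-in-the-middle prompt can be assembled from separate prefix and suffix strings with the same base model:
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/deepseek-coder-6.7b-base", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("deepseek-ai/deepseek-coder-6.7b-base", trust_remote_code=True, torch_dtype=torch.bfloat16).cuda()
# Hypothetical code with a gap: the function body between prefix and suffix is missing
prefix = "def fahrenheit_to_celsius(f):\n"
suffix = "\n    return c"
# Wrap the prefix and suffix in the FIM markers; the hole is what the model fills in
fim_prompt = f"<｜fim▁begin｜>{prefix}<｜fim▁hole｜>{suffix}<｜fim▁end｜>"
inputs = tokenizer(fim_prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_length=128)
# As in the example above, slice off the prompt so only the generated middle is printed
print(tokenizer.decode(outputs[0], skip_special_tokens=True)[len(fim_prompt):])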
3) Chat Model Inference 💬
Here’s an example code for performing chat model inference:
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
# Load the pre-trained tokenizer
tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/deepseek-coder-6.7b-instruct", trust_remote_code=True)
# Load the pre-trained model with bfloat16 data type and move it to GPU
model = AutoModelForCausalLM.from_pretrained("deepseek-ai/deepseek-coder-6.7b-instruct", trust_remote_code=True, torch_dtype=torch.bfloat16).cuda()
# Define the message content
messages=[
{'role': 'user', 'content': "write a quick sort algorithm in python."}
]
# Apply chat template and convert input text to a PyTorch tensor, subsequently moving it to the model's device
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
# Generate output
outputs = model.generate(inputs, max_new_tokens=512, do_sample=False, top_k=50, top_p=0.95, num_return_sequences=1, eos_token_id=tokenizer.eos_token_id)
# Decode the output into human-readable text and print
print(tokenizer.decode(outputs[0][len(inputs[0]):], skip_special_tokens=True))
Code Explanation:
- In this instance, the message list defines the user’s request, which the model uses to generate the appropriate code (a multi-turn variant is sketched below).
- The input is formatted with `apply_chat_template`, which wraps the messages in the chat format the instruct model expects, helping it understand the context.
- Generation parameters such as `max_new_tokens` and `do_sample` control the length and variety of the produced content.
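Building on this, a conversation can be carried over multiple turns by appending the assistant’s reply to the message history before asking a follow-up question. The sketch below is my own adaptation (the follow-up prompt is a made-up example) using the same instruct model:
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/deepseek-coder-6.7b-instruct", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("deepseek-ai/deepseek-coder-6.7b-instruct", trust_remote_code=True, torch_dtype=torch.bfloat16).cuda()
def chat(messages):
    # Format the running conversation, generate a reply, and return it as plain text
    inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
    outputs = model.generate(inputs, max_new_tokens=512, do_sample=False, eos_token_id=tokenizer.eos_token_id)
    return tokenizer.decode(outputs[0][len(inputs[0]):], skip_special_tokens=True)
messages = [{'role': 'user', 'content': "write a quick sort algorithm in python."}]
reply = chat(messages)
# Keep the assistant's answer in the history, then ask a follow-up question
messages.append({'role': 'assistant', 'content': reply})
messages.append({'role': 'user', 'content': "Now add type hints to that function."})
print(chat(messages))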
How to Fine-tune DeepSeek-Coder ⚙️🛠️
If you wish to fine-tune DeepSeek-Coder on a custom task, start by installing the required packages:
pip install -r finetune/requirements.txt
Next, prepare your training data and follow the sample dataset format.
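As a rough illustration, assuming the sample format uses JSON Lines with an instruction and an output field per example (check the repository’s sample dataset for the authoritative format), a small script like the following could write a training file whose path you then pass as DATA_PATH. The examples and the train_data.jsonl filename are hypothetical:
import json
# Hypothetical training examples in the assumed instruction/output format
samples = [
    {"instruction": "Write a Python function that reverses a string.",
     "output": "def reverse_string(s):\n    return s[::-1]"},
    {"instruction": "Write a Python function that returns the n-th Fibonacci number.",
     "output": "def fib(n):\n    a, b = 0, 1\n    for _ in range(n):\n        a, b = b, a + b\n    return a"},
]
# Write one JSON object per line (JSON Lines)
with open("train_data.jsonl", "w", encoding="utf-8") as f:
    for sample in samples:
        f.write(json.dumps(sample) + "\n")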
You can run the sample shell script as follows:
DATA_PATH="<your_data_path>"
OUTPUT_PATH="<your_output_path>"
MODEL="deepseek-ai/deepseek-coder-6.7b-instruct"
cd finetune && deepspeed finetune_deepseekcoder.py \
--model_name_or_path $MODEL \
--data_path $DATA_PATH \
--output_dir $OUTPUT_PATH \
--num_train_epochs 3 \
--model_max_length 1024 \
--per_device_train_batch_size 16 \
--per_device_eval_batch_size 1 \
--gradient_accumulation_steps 4 \
--evaluation_strategy "no" \
--save_strategy "steps" \
--save_steps 100 \
--save_total_limit 100 \
--learning_rate 2e-5 \
--warmup_steps 10 \
--logging_steps 1 \
--lr_scheduler_type "cosine" \
--gradient_checkpointing True \
--report_to "tensorboard" \
--deepspeed configs/ds_config_zero3.json \
--bf16 True
Code Explanation:
- Parameters such as `num_train_epochs` control how many passes the model makes over the training data, while `learning_rate` sets the step size for weight updates; both strongly affect how well the fine-tuned model performs.
- Using `deepspeed` with the ZeRO Stage 3 configuration in `configs/ds_config_zero3.json` enables memory-efficient distributed training, making it practical to fine-tune large models on large datasets.
Using vLLM for High-Throughput Inference 🚀
An example of using vLLM for efficient inference:
Text Completion Example:
from vllm import LLM, SamplingParams
tp_size = 4 # Set tensor parallel size
sampling_params = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=100)
model_name = "deepseek-ai/deepseek-coder-6.7-base"
llm = LLM(model=model_name, trust_remote_code=True, gpu_memory_utilization=0.9, tensor_parallel_size=tp_size)
prompts = [
"If everyone in a country loves one another,",
"The research should also focus on the technologies",
"To determine if the label is correct, we need to"
]
# Generate text
outputs = llm.generate(prompts, sampling_params)
# Extract and print the generated text
generated_text = [output.outputs[0].text for output in outputs]
print(generated_text)
Chat Completion Example:
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams
tp_size = 4 # Set tensor parallel size
sampling_params = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=100)
model_name = "deepseek-ai/deepseek-coder-6.7b-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name) # Load tokenizer
llm = LLM(model=model_name, trust_remote_code=True, gpu_memory_utilization=0.9, tensor_parallel_size=tp_size)
# Define message list
messages_list = [
[{"role": "user", "content": "Who are you?"}],
[{"role": "user", "content": "What can you do?"}],
[{"role": "user", "content": "Explain Transformer briefly."}],
]
# Apply chat template and disable tokenization
prompts = [tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False) for messages in messages_list]
sampling_params.stop = [tokenizer.eos_token] # Set stop condition
# Generate outputs
outputs = llm.generate(prompts, sampling_params)
# Extract and print generated text
generated_text = [output.outputs[0].text for output in outputs]
print(generated_text)
Code Explanation:
- vLLM is built for high-throughput inference, so it remains efficient and stable even when serving many requests at once.
- The `SamplingParams` settings (temperature, top-p, max tokens) control how varied the generated results are, while `tokenizer.apply_chat_template` formats the messages into the prompt structure the instruct model expects (a deterministic-decoding variant is sketched below).
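As a small additional sketch of my own (not from the official examples), the tensor parallel size can be matched to the number of visible GPUs, and setting the temperature to 0 makes decoding deterministic, which is often preferable for code:
import torch
from vllm import LLM, SamplingParams
# Match the tensor parallel size to the number of visible GPUs (assumes at least one CUDA device)
tp_size = max(torch.cuda.device_count(), 1)
# Temperature 0 means greedy decoding: the same prompt always yields the same completion
greedy_params = SamplingParams(temperature=0.0, max_tokens=100)
llm = LLM(model="deepseek-ai/deepseek-coder-6.7b-base", trust_remote_code=True, tensor_parallel_size=tp_size)
outputs = llm.generate(["def binary_search(arr, target):"], greedy_params)
print(outputs[0].outputs[0].text)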
I hope this information helps beginners quickly get started with DeepSeek-Coder and utilize its powerful capabilities to enhance their code generation and application skills! ✨