Understanding and Installing DeepSeek-Coder
Thursday, Jan 2, 2025 | 8 minute read
Revolutionary AI tool designed to boost coding efficiency 💻🚀, supports 80+ languages 🌐, offers various model sizes 📊, excels in performance benchmarks 🏆, and features powerful completion capabilities, transforming coding into a seamless experience!✨
Deep Dive: What is DeepSeek-Coder? 🤖
“In today’s fast-paced technological landscape, understanding the value of AI programming tools is more important than ever.” 🌍
With the rapid advancement of artificial intelligence, a plethora of tools have emerged to assist developers in boosting their productivity and coding skills. In this context, the advent of DeepSeek-Coder marks a revolutionary shift in the programming world. 💥
DeepSeek-Coder is an innovative open-source tool launched in 2023 by DeepSeek, a Chinese company focused on developing AI language models for programming. 🇨🇳✨ The tool has been meticulously trained on an extensive dataset of 2 trillion tokens, 87% of which is code and 13% natural language in both Chinese and English. 📝💻 The core functionality of this groundbreaking software lies in its ability to help developers generate and refine code, ultimately enhancing programming efficiency. DeepSeek is also dedicated to advancing Artificial General Intelligence (AGI) technology, aiming to open new avenues for the future of programming. 🌟
Breaking the Mold: What Makes DeepSeek-Coder Unique 🔍
DeepSeek-Coder distinguishes itself from traditional programming models in several key ways:
- Diverse Training Data: DeepSeek-Coder has been trained on a wide array of code across more than 80 programming languages, ensuring its broad applicability. 🌐
- Variety of Model Sizes: The project offers a range of models with different parameter counts, including 1.3B, 5.7B, 6.7B, and 33B, to cater to various project needs. 📊
- Outstanding Performance Metrics: Compared to existing programming models, DeepSeek-Coder has excelled in multiple competitive benchmarks, showcasing its high efficiency and accuracy. 🏆
- Powerful Code Completion Capabilities: DeepSeek-Coder features a 16K context window, allowing it to perform exceptionally well in completing lengthy code segments. 📈
Back to Basics: Why Developers Favor DeepSeek-Coder ❤️
Thanks to its many advantages, DeepSeek-Coder has quickly gained the favor of developers, notably through:
- Rich Resource Support: DeepSeek actively provides resources for the AI community, including model access, tool demonstrations, and integrations on platforms like GitHub and Hugging Face, ensuring developers can easily get started. 📚
- Convenient Community Participation: Users can find detailed documentation, updates, and community discussions on DeepSeek’s official website, enhancing their experience. 🌍
- Broad Practical Application Scenarios: In evaluations such as HumanEval and MBPP, DeepSeek-Coder not only meets practical programming needs but also outperforms many competitors, such as CodeLlama, demonstrating superior adaptability and flexibility. 💪
In these respects, DeepSeek-Coder leverages cutting-edge technology to ensure that developers work more efficiently in code writing and smart completion, making it a vital tool in the programming domain. 🔧💡
How to Install and Use DeepSeek-Coder 📦
To begin using DeepSeek-Coder, make sure that you have Python and pip installed in your environment and that you have a local copy of the DeepSeek-Coder repository. Then, open a terminal in the repository root and run the following command to install the required dependencies:
pip install -r requirements.txt
- This command reads the `requirements.txt` file, which lists all the libraries and versions needed for DeepSeek-Coder. Ensure you have an internet connection, as pip will need to download these packages from the Python Package Index (PyPI).
Additionally, you can access the Hugging Face Space to view a demo of DeepSeek-Coder. If you wish to run the demo locally, simply execute the `app.py` file to start the application.
Feature Introduction ⚙️
DeepSeek-Coder offers three main functionalities to assist you in code generation, insertion, and inference:
- Code Completion
- Code Insertion
- Chat Model Inference
Next, these functionalities will be introduced in detail, along with code examples.
1) Code Completion 🔍
To use the code completion feature, you need to run the following code:
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
# Load the pre-trained tokenizer
tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/deepseek-coder-6.7b-base", trust_remote_code=True)
# Load the pre-trained model with bfloat16 data type and move it to GPU
model = AutoModelForCausalLM.from_pretrained("deepseek-ai/deepseek-coder-6.7b-base", trust_remote_code=True, torch_dtype=torch.bfloat16).cuda()
# Input text requesting the model to generate a quick sort algorithm
input_text = "#write a quick sort algorithm"
# Tokenize the input text and convert it to a PyTorch tensor, subsequently moving it to the model's device (GPU)
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
# Generate the code output with a maximum length of 128
outputs = model.generate(**inputs, max_length=128)
# Decode the output tensor into human-readable text and print
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
Code Explanation:
- `AutoTokenizer` and `AutoModelForCausalLM` are used to load the tokenizer and model, respectively. The parameter `trust_remote_code=True` indicates that you trust the custom code that ships with community-uploaded models on Hugging Face.
- The `cuda()` method moves the model to the GPU, significantly speeding up computation (a CPU-only variant is sketched below).
- `tokenizer(input_text, return_tensors="pt")` encodes the input text into PyTorch tensors, and `to(model.device)` moves them to the same computing device as the model (CPU/GPU).
- `model.generate` produces the output, with `max_length` limiting the maximum total length of the output in tokens.
- Finally, `tokenizer.decode` converts the generated token IDs back into a human-readable string, and `skip_special_tokens=True` ensures that special tokens are omitted.
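If you do not have a GPU, a smaller model can still be run on the CPU. The following is a minimal sketch of my own (not from the official documentation), assuming the 1.3B base model fits in system RAM; it will be noticeably slower than the GPU setup above:
from transformers import AutoTokenizer, AutoModelForCausalLM
# Load the smaller 1.3B base model on the CPU (no .cuda() call, default float32)
tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/deepseek-coder-1.3b-base", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("deepseek-ai/deepseek-coder-1.3b-base", trust_remote_code=True)
# Tokenize the prompt and generate up to 128 tokens, exactly as in the GPU example
inputs = tokenizer("#write a quick sort algorithm", return_tensors="pt")
outputs = model.generate(**inputs, max_length=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))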
2) Code Insertion ➕
To implement code insertion functionality, use the following code:
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
# Load the pre-trained tokenizer
tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/deepseek-coder-6.7b-base", trust_remote_code=True)
# Load the pre-trained model with bfloat16 data type and move it to GPU
model = AutoModelForCausalLM.from_pretrained("deepseek-ai/deepseek-coder-6.7b-base", trust_remote_code=True, torch_dtype=torch.bfloat16).cuda()
# Input text includes a code snippet that needs to be inserted
input_text = """<\xef\xbd\x9cfim\xe2\x96\x81begin\xef\xbd\x9c>def quick_sort(arr):
if len(arr) <= 1:
return arr
pivot = arr[0]
left = []
right = []
<\xef\xbd\x9cfim\xe2\x96\x81hole\xef\xbd\x9c>
if arr[i] < pivot:
left.append(arr[i])
else:
right.append(arr[i])
return quick_sort(left) + [pivot] + quick_sort(right)<\xef\xbd\x9cfim\xe2\x96\x81end\xef\xbd\x9c>"""
# Tokenize the input text and convert it to a PyTorch tensor, subsequently moving it to the model's device (GPU)
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
# Generate the code output with a maximum length of 128
outputs = model.generate(**inputs, max_length=128)
# Decode the output tensor to human-readable text and print, omitting the original input text to only show the new inserted content
print(tokenizer.decode(outputs[0], skip_special_tokens=True)[len(input_text):])
Code Explanation:
- Special fill-in-the-middle (FIM) markers are used in the input text: `<｜fim▁begin｜>` and `<｜fim▁end｜>` delimit the surrounding code, while `<｜fim▁hole｜>` marks the gap the model should fill (a sketch of assembling such a prompt programmatically follows below).
- The output is decoded with `tokenizer.decode(outputs[0], skip_special_tokens=True)`, and slicing off the length of the original input leaves only the newly generated content.
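To make the markers easier to work with, here is a small sketch of my own (the prefix, suffix, and function are hypothetical examples) showing how a fill-in-the-middle prompt can be assembled from separate prefix and suffix strings with the same base model:
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/deepseek-coder-6.7b-base", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("deepseek-ai/deepseek-coder-6.7b-base", trust_remote_code=True, torch_dtype=torch.bfloat16).cuda()
# Hypothetical code with a gap: the function body between prefix and suffix is missing
prefix = "def fahrenheit_to_celsius(f):\n"
suffix = "\n    return c"
# Wrap the prefix and suffix in the FIM markers; the hole is what the model fills in
fim_prompt = f"<｜fim▁begin｜>{prefix}<｜fim▁hole｜>{suffix}<｜fim▁end｜>"
inputs = tokenizer(fim_prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_length=128)
# As in the example above, slice off the prompt so only the generated middle is printed
print(tokenizer.decode(outputs[0], skip_special_tokens=True)[len(fim_prompt):])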
3) Chat Model Inference 💬
Here’s an example code for performing chat model inference:
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
# Load the pre-trained tokenizer
tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/deepseek-coder-6.7b-instruct", trust_remote_code=True)
# Load the pre-trained model with bfloat16 data type and move it to GPU
model = AutoModelForCausalLM.from_pretrained("deepseek-ai/deepseek-coder-6.7b-instruct", trust_remote_code=True, torch_dtype=torch.bfloat16).cuda()
# Define the message content
messages=[
{'role': 'user', 'content': "write a quick sort algorithm in python."}
]
# Apply chat template and convert input text to a PyTorch tensor, subsequently moving it to the model's device
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
# Generate output
outputs = model.generate(inputs, max_new_tokens=512, do_sample=False, top_k=50, top_p=0.95, num_return_sequences=1, eos_token_id=tokenizer.eos_token_id)
# Decode the output into human-readable text and print
print(tokenizer.decode(outputs[0][len(inputs[0]):], skip_special_tokens=True))
Code Explanation:
- In this instance, the message list defines the user’s request, which the model uses to generate the appropriate code (a multi-turn variant is sketched below).
- The input is formatted with `apply_chat_template`, which wraps the messages in the chat format the instruct model expects, helping it understand the context.
- Generation parameters such as `max_new_tokens` and `do_sample` control the length and variety of the produced content.
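Building on this, a conversation can be carried over multiple turns by appending the assistant’s reply to the message history before asking a follow-up question. The sketch below is my own adaptation (the follow-up prompt is a made-up example) using the same instruct model:
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/deepseek-coder-6.7b-instruct", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("deepseek-ai/deepseek-coder-6.7b-instruct", trust_remote_code=True, torch_dtype=torch.bfloat16).cuda()
def chat(messages):
    # Format the running conversation, generate a reply, and return it as plain text
    inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
    outputs = model.generate(inputs, max_new_tokens=512, do_sample=False, eos_token_id=tokenizer.eos_token_id)
    return tokenizer.decode(outputs[0][len(inputs[0]):], skip_special_tokens=True)
messages = [{'role': 'user', 'content': "write a quick sort algorithm in python."}]
reply = chat(messages)
# Keep the assistant's answer in the history, then ask a follow-up question
messages.append({'role': 'assistant', 'content': reply})
messages.append({'role': 'user', 'content': "Now add type hints to that function."})
print(chat(messages))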
How to Fine-tune DeepSeek-Coder ⚙️🛠️
If you wish to fine-tune DeepSeek-Coder on a custom task, start by installing the required packages:
pip install -r finetune/requirements.txt
Next, prepare your training data and follow the sample dataset format.
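As a rough illustration, assuming the sample format uses JSON Lines with an instruction and an output field per example (check the repository’s sample dataset for the authoritative format), a small script like the following could write a training file whose path you then pass as DATA_PATH. The examples and the train_data.jsonl filename are hypothetical:
import json
# Hypothetical training examples in the assumed instruction/output format
samples = [
    {"instruction": "Write a Python function that reverses a string.",
     "output": "def reverse_string(s):\n    return s[::-1]"},
    {"instruction": "Write a Python function that returns the n-th Fibonacci number.",
     "output": "def fib(n):\n    a, b = 0, 1\n    for _ in range(n):\n        a, b = b, a + b\n    return a"},
]
# Write one JSON object per line (JSON Lines)
with open("train_data.jsonl", "w", encoding="utf-8") as f:
    for sample in samples:
        f.write(json.dumps(sample) + "\n")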
You can run the sample shell script as follows:
DATA_PATH="<your_data_path>"
OUTPUT_PATH="<your_output_path>"
MODEL="deepseek-ai/deepseek-coder-6.7b-instruct"
cd finetune && deepspeed finetune_deepseekcoder.py \
--model_name_or_path $MODEL \
--data_path $DATA_PATH \
--output_dir $OUTPUT_PATH \
--num_train_epochs 3 \
--model_max_length 1024 \
--per_device_train_batch_size 16 \
--per_device_eval_batch_size 1 \
--gradient_accumulation_steps 4 \
--evaluation_strategy "no" \
--save_strategy "steps" \
--save_steps 100 \
--save_total_limit 100 \
--learning_rate 2e-5 \
--warmup_steps 10 \
--logging_steps 1 \
--lr_scheduler_type "cosine" \
--gradient_checkpointing True \
--report_to "tensorboard" \
--deepspeed configs/ds_config_zero3.json \
--bf16 True
Code Explanation:
- Parameters such as `num_train_epochs` control how many passes the model makes over the training data, while `learning_rate` sets the step size for weight updates; both strongly affect how well the fine-tuned model performs.
- Using `deepspeed` with the ZeRO Stage 3 configuration in `configs/ds_config_zero3.json` enables memory-efficient distributed training, making it practical to fine-tune large models on large datasets.
Using vLLM for High-Throughput Inference 🚀
An example of using vLLM for efficient inference:
Text Completion Example:
from vllm import LLM, SamplingParams
tp_size = 4 # Set tensor parallel size
sampling_params = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=100)
model_name = "deepseek-ai/deepseek-coder-6.7-base"
llm = LLM(model=model_name, trust_remote_code=True, gpu_memory_utilization=0.9, tensor_parallel_size=tp_size)
prompts = [
"If everyone in a country loves one another,",
"The research should also focus on the technologies",
"To determine if the label is correct, we need to"
]
# Generate text
outputs = llm.generate(prompts, sampling_params)
# Extract and print the generated text
generated_text = [output.outputs[0].text for output in outputs]
print(generated_text)
Chat Completion Example:
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams
tp_size = 4 # Set tensor parallel size
sampling_params = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=100)
model_name = "deepseek-ai/deepseek-coder-6.7b-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name) # Load tokenizer
llm = LLM(model=model_name, trust_remote_code=True, gpu_memory_utilization=0.9, tensor_parallel_size=tp_size)
# Define message list
messages_list = [
[{"role": "user", "content": "Who are you?"}],
[{"role": "user", "content": "What can you do?"}],
[{"role": "user", "content": "Explain Transformer briefly."}],
]
# Apply chat template and disable tokenization
prompts = [tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False) for messages in messages_list]
sampling_params.stop = [tokenizer.eos_token] # Set stop condition
# Generate outputs
outputs = llm.generate(prompts, sampling_params)
# Extract and print generated text
generated_text = [output.outputs[0].text for output in outputs]
print(generated_text)
Code Explanation:
- vLLM is built for high-throughput inference, so it remains efficient and stable even when serving many requests at once.
- The `SamplingParams` settings (temperature, top-p, max tokens) control how varied the generated results are, while `tokenizer.apply_chat_template` formats the messages into the prompt structure the instruct model expects (a deterministic-decoding variant is sketched below).
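As a small additional sketch of my own (not from the official examples), the tensor parallel size can be matched to the number of visible GPUs, and setting the temperature to 0 makes decoding deterministic, which is often preferable for code:
import torch
from vllm import LLM, SamplingParams
# Match the tensor parallel size to the number of visible GPUs (assumes at least one CUDA device)
tp_size = max(torch.cuda.device_count(), 1)
# Temperature 0 means greedy decoding: the same prompt always yields the same completion
greedy_params = SamplingParams(temperature=0.0, max_tokens=100)
llm = LLM(model="deepseek-ai/deepseek-coder-6.7b-base", trust_remote_code=True, tensor_parallel_size=tp_size)
outputs = llm.generate(["def binary_search(arr, target):"], greedy_params)
print(outputs[0].outputs[0].text)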
I hope this information helps beginners quickly get started with DeepSeek-Coder and utilize its powerful capabilities to enhance their code generation and application skills! ✨