Understanding and Installing DeepSeek-Coder

Thursday, Jan 2, 2025 | 8 minute read

Revolutionary AI tool designed to boost coding efficiency 💻🚀, supports 80+ languages 🌐, offers various model sizes 📊, excels in performance benchmarks 🏆, and features powerful completion capabilities, transforming coding into a seamless experience!✨

Deep Dive: What is DeepSeek-Coder? 🤖

“In today’s fast-paced technological landscape, understanding the value of AI programming tools is more important than ever.” 🌍

With the rapid advancement of artificial intelligence, a plethora of tools have emerged to assist developers in boosting their productivity and coding skills. In this context, the advent of DeepSeek-Coder marks a revolutionary shift in the programming world. 💥

DeepSeek-Coder is an innovative open-source family of code models released in 2023 by DeepSeek, a Chinese company focused on AI language models. 🇨🇳✨ The models are trained on an extensive dataset of 2 trillion tokens, 87% of which is code and 13% natural language in both Chinese and English. 📝💻 The core functionality of this software lies in its ability to help developers generate and refine code, ultimately enhancing programming efficiency. Additionally, DeepSeek-Coder is part of DeepSeek's broader push toward Artificial General Intelligence (AGI), aiming to open new avenues for the future of programming. 🌟

Breaking the Mold: What Makes DeepSeek-Coder Unique 🔍

DeepSeek-Coder distinguishes itself from traditional programming models in several key ways:

  1. Diverse Training Data: DeepSeek-Coder has been trained on a wide array of code across more than 80 programming languages, ensuring its broad applicability. 🌐

  2. Variety of Model Sizes: The project offers models at several parameter counts (1.3B, 5.7B, 6.7B, and 33B) to cater to various project needs. 📊

  3. Outstanding Performance Metrics: Compared to existing programming models, DeepSeek-Coder has excelled in multiple competitive benchmarks, showcasing its high efficiency and accuracy. 🏆

  4. Powerful Code Completion Capabilities: DeepSeek-Coder features a 16K context window, allowing it to perform exceptionally well in completing lengthy code segments. 📈

Back to Basics: Why Developers Favor DeepSeek-Coder ❤️

Thanks to its many advantages, DeepSeek-Coder has quickly gained the favor of developers, notably through:

  1. Rich Resource Support: DeepSeek actively provides resources for the AI community, including model access, tool demonstrations, and integrations on platforms like GitHub and Hugging Face, ensuring developers can easily get started. 📚

  2. Convenient Community Participation: Users can find detailed documentation, updates, and community discussions on DeepSeek’s official website, enhancing their experience. 🌍

  3. Broad Practical Application Scenarios: In evaluations such as HumanEval and MBPP, the DeepSeek-Coder models not only meet practical programming needs but also outperform many competitors, such as CodeLlama, demonstrating strong adaptability and flexibility. 💪

In these respects, DeepSeek-Coder leverages cutting-edge technology to ensure that developers work more efficiently in code writing and smart completion, making it a vital tool in the programming domain. 🔧💡


How to Install and Use DeepSeek-Coder 📦

To begin using DeepSeek-Coder, make sure that Python and pip are installed in your environment and clone the DeepSeek-Coder repository from GitHub. Then, open a terminal in the repository root and run the following command to install the required dependencies:

pip install -r requirements.txt
  • This command will read the requirements.txt file, which lists all the libraries and versions needed for DeepSeek-Coder. Ensure you have an internet connection as pip will need to download these packages from the Python Package Index (PyPI).

Additionally, you can access the Hugging Face Space to view a demo of DeepSeek-Coder. If you wish to run the demo locally, simply execute the app.py file to start the application.

Feature Introduction ⚙️

DeepSeek-Coder offers three main functionalities to assist you with code completion, code insertion, and chat-based inference:

  1. Code Completion
  2. Code Insertion
  3. Chat Model Inference

Next, these functionalities will be introduced in detail, along with code examples.

1) Code Completion 🔍

To use the code completion feature, you need to run the following code:

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load the pre-trained tokenizer
tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/deepseek-coder-6.7b-base", trust_remote_code=True)
# Load the pre-trained model with bfloat16 data type and move it to GPU
model = AutoModelForCausalLM.from_pretrained("deepseek-ai/deepseek-coder-6.7b-base", trust_remote_code=True, torch_dtype=torch.bfloat16).cuda()

# Input text requesting the model to generate a quick sort algorithm
input_text = "#write a quick sort algorithm"
# Tokenize the input text and convert it to a PyTorch tensor, subsequently moving it to the model's device (GPU)
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
# Generate the code output with a maximum length of 128
outputs = model.generate(**inputs, max_length=128)
# Decode the output tensor into human-readable text and print
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Code Explanation:

  • AutoTokenizer and AutoModelForCausalLM load the tokenizer and model respectively. Setting trust_remote_code=True allows the custom model and tokenizer code shipped with the repository on the Hugging Face Hub to be downloaded and executed.
  • The cuda() method moves the model to the GPU, which significantly speeds up inference.
  • tokenizer(input_text, return_tensors="pt") encodes the input text into PyTorch tensors, and to(model.device) moves them to the same device as the model (CPU/GPU).
  • model.generate produces the output; max_length caps the total number of tokens, prompt included.
  • Finally, tokenizer.decode converts the generated token IDs back into a human-readable string, and skip_special_tokens=True omits special tokens from the result.
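
The example above assumes a CUDA-capable GPU. As a minimal variation (a sketch, not part of the official example, assuming the optional accelerate package is installed), you can let transformers place the model automatically and fall back to the CPU when no GPU is available:

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "deepseek-ai/deepseek-coder-6.7b-base"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

# device_map="auto" (provided by the accelerate package) places the model on the
# available GPU(s), or keeps it on the CPU when none is present.
dtype = torch.bfloat16 if torch.cuda.is_available() else torch.float32
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype=dtype,
    device_map="auto",
)

inputs = tokenizer("#write a quick sort algorithm", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))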

2) Code Insertion ➕

To implement code insertion functionality, use the following code:

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load the pre-trained tokenizer
tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/deepseek-coder-6.7b-base", trust_remote_code=True)
# Load the pre-trained model with bfloat16 data type and move it to GPU
model = AutoModelForCausalLM.from_pretrained("deepseek-ai/deepseek-coder-6.7b-base", trust_remote_code=True, torch_dtype=torch.bfloat16).cuda()

# Input text includes a code snippet that needs to be inserted
input_text = """<\xef\xbd\x9cfim\xe2\x96\x81begin\xef\xbd\x9c>def quick_sort(arr):
    if len(arr) <= 1:
        return arr
    pivot = arr[0]
    left = []
    right = []
<｜fim▁hole｜>
        if arr[i] < pivot:
            left.append(arr[i])
        else:
            right.append(arr[i])
    return quick_sort(left) + [pivot] + quick_sort(right)<｜fim▁end｜>"""
# Tokenize the input text and convert it to a PyTorch tensor, subsequently moving it to the model's device (GPU)
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
# Generate the code output with a maximum length of 128
outputs = model.generate(**inputs, max_length=128)
# Decode the output tensor to human-readable text and print, omitting the original input text to only show the new inserted content
print(tokenizer.decode(outputs[0], skip_special_tokens=True)[len(input_text):])

Code Explanation:

  • Special markers in the input text (<｜fim▁begin｜>, <｜fim▁hole｜>, <｜fim▁end｜>) mark the prefix, the hole where code should be generated, and the suffix.
  • The output is decoded with tokenizer.decode(outputs[0], skip_special_tokens=True), and slicing off the length of the original input leaves only the newly generated content.
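
If you use fill-in-the-middle often, it can be wrapped in a small helper. The following is a hypothetical convenience function (its name and signature are illustrative, not part of the official API) that reuses the tokenizer and model loaded above and returns only the newly generated middle section:

def fill_in_middle(prefix, suffix, max_new_tokens=128):
    # Build the fill-in-the-middle prompt with DeepSeek-Coder's special tokens.
    prompt = f"<｜fim▁begin｜>{prefix}<｜fim▁hole｜>{suffix}<｜fim▁end｜>"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Keep only the tokens generated after the prompt and decode them.
    new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)

# Example: ask the model to fill in the body between a function header and its return.
print(fill_in_middle("def fibonacci(n):\n", "    return result"))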

3) Chat Model Inference 💬

Here’s an example code for performing chat model inference:

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load the pre-trained tokenizer
tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/deepseek-coder-6.7b-instruct", trust_remote_code=True)
# Load the pre-trained model with bfloat16 data type and move it to GPU
model = AutoModelForCausalLM.from_pretrained("deepseek-ai/deepseek-coder-6.7b-instruct", trust_remote_code=True, torch_dtype=torch.bfloat16).cuda()

# Define the message content
messages=[
    {'role': 'user', 'content': "write a quick sort algorithm in python."}
]
# Apply chat template and convert input text to a PyTorch tensor, subsequently moving it to the model's device
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
# Generate output
outputs = model.generate(inputs, max_new_tokens=512, do_sample=False, top_k=50, top_p=0.95, num_return_sequences=1, eos_token_id=tokenizer.eos_token_id)
# Decode the output into human-readable text and print
print(tokenizer.decode(outputs[0][len(inputs[0]):], skip_special_tokens=True))

Code Explanation:

  • In this instance, the message list defines the user’s request, which the model will infer to generate the appropriate code.
  • The input text is formatted using apply_chat_template for the model to understand the context better.
  • Output generation parameters control the length and style of the reply: max_new_tokens caps the number of generated tokens, while do_sample=False selects greedy decoding (top_k and top_p only take effect when sampling is enabled).
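
The chat template also supports multi-turn conversations. As a brief sketch building on the snippet above (the follow-up question is illustrative), the assistant's reply is appended to messages before the next call so the model sees the full history:

# Append the assistant's previous reply, then ask a follow-up question.
reply = tokenizer.decode(outputs[0][len(inputs[0]):], skip_special_tokens=True)
messages.append({'role': 'assistant', 'content': reply})
messages.append({'role': 'user', 'content': "Now add type hints to that function."})

# Re-apply the chat template so the whole conversation is passed to the model.
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
outputs = model.generate(inputs, max_new_tokens=512, do_sample=False, eos_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(outputs[0][len(inputs[0]):], skip_special_tokens=True))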

How to Fine-tune DeepSeek-Coder ⚙️🛠️

If you wish to fine-tune DeepSeek-Coder on a custom task, start by installing the required packages:

pip install -r finetune/requirements.txt

Next, prepare your training data and follow the sample dataset format.
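
For reference, here is a minimal sketch of preparing such a file, assuming the sample format uses one JSON object per line with instruction and output fields (check the sample dataset shipped with the repository for the authoritative format):

import json

# Hypothetical training examples; field names follow the repository's sample dataset.
examples = [
    {
        "instruction": "Write a Python function that reverses a string.",
        "output": "def reverse_string(s):\n    return s[::-1]",
    },
]

# Write one JSON object per line, as expected by the fine-tuning script.
with open("train.jsonl", "w", encoding="utf-8") as f:
    for example in examples:
        f.write(json.dumps(example, ensure_ascii=False) + "\n")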

You can run the sample shell script as follows:

DATA_PATH="<your_data_path>"
OUTPUT_PATH="<your_output_path>"
MODEL="deepseek-ai/deepseek-coder-6.7b-instruct"

cd finetune && deepspeed finetune_deepseekcoder.py \
    --model_name_or_path $MODEL \
    --data_path $DATA_PATH \
    --output_dir $OUTPUT_PATH \
    --num_train_epochs 3 \
    --model_max_length 1024 \
    --per_device_train_batch_size 16 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 4 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 100 \
    --save_total_limit 100 \
    --learning_rate 2e-5 \
    --warmup_steps 10 \
    --logging_steps 1 \
    --lr_scheduler_type "cosine" \
    --gradient_checkpointing True \
    --report_to "tensorboard" \
    --deepspeed configs/ds_config_zero3.json \
    --bf16 True

Code Explanation:

  • num_train_epochs sets how many passes are made over the training data, and learning_rate controls the step size of each update; both strongly influence the quality of the fine-tuned model.
  • DeepSpeed (here with the ZeRO Stage 3 config configs/ds_config_zero3.json) shards model and optimizer state across GPUs, making it feasible to fine-tune large models efficiently on large datasets.
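
After training finishes, the checkpoint saved to OUTPUT_PATH can be loaded back with the same transformers API used earlier. This is a minimal sketch, assuming the output directory contains a standard Hugging Face checkpoint:

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Path used as --output_dir in the fine-tuning script above.
output_path = "<your_output_path>"
# If the tokenizer was not saved alongside the model, load it from the base model id instead.
tokenizer = AutoTokenizer.from_pretrained(output_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(output_path, trust_remote_code=True, torch_dtype=torch.bfloat16).cuda()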

Using vLLM for High-Throughput Inference 🚀

An example of using vLLM for efficient inference:

Text Completion Example:

from vllm import LLM, SamplingParams

tp_size = 4 # Set tensor parallel size
sampling_params = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=100)
model_name = "deepseek-ai/deepseek-coder-6.7-base"
llm = LLM(model=model_name, trust_remote_code=True, gpu_memory_utilization=0.9, tensor_parallel_size=tp_size)

prompts = [
    "If everyone in a country loves one another,",
    "The research should also focus on the technologies",
    "To determine if the label is correct, we need to"
]
# Generate text
outputs = llm.generate(prompts, sampling_params)

# Extract and print the generated text
generated_text = [output.outputs[0].text for output in outputs]
print(generated_text)

Chat Completion Example:

from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

tp_size = 4 # Set tensor parallel size
sampling_params = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=100)
model_name = "deepseek-ai/deepseek-coder-6.7b-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name) # Load tokenizer
llm = LLM(model=model_name, trust_remote_code=True, gpu_memory_utilization=0.9, tensor_parallel_size=tp_size)

# Define message list
messages_list = [
    [{"role": "user", "content": "Who are you?"}],
    [{"role": "user", "content": "What can you do?"}],
    [{"role": "user", "content": "Explain Transformer briefly."}],
]
# Apply chat template and disable tokenization
prompts = [tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False) for messages in messages_list]

sampling_params.stop = [tokenizer.eos_token] # Set stop condition
# Generate outputs
outputs = llm.generate(prompts, sampling_params)

# Extract and print generated text
generated_text = [output.outputs[0].text for output in outputs]
print(generated_text)

Code Explanation:

  • vLLM batches incoming requests and manages GPU memory efficiently (via PagedAttention), which is what enables high-throughput inference across many prompts at once.
  • The SamplingParams settings (temperature, top_p, max_tokens) control the randomness and length of the results, while tokenizer.apply_chat_template formats each message list into the prompt layout the instruct model expects.
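
As a small usage note (a sketch, not taken from the official example), each RequestOutput returned by llm.generate also carries its originating prompt, so requests and completions can be printed side by side:

# Pair each prompt with its generated completion for easier inspection.
for output in outputs:
    print("PROMPT:", output.prompt)
    print("COMPLETION:", output.outputs[0].text)
    print("-" * 40)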

I hope this information helps beginners quickly get started with DeepSeek-Coder and utilize its powerful capabilities to enhance their code generation and application skills! ✨
