Deepseek R1 Support (#4603)
* Update README.md to reflect support for 15+ cloud accounts in Llama 3.1 documentation

* Update model configuration in deepseek-r1-vllm.yaml and enhance README.md for clarity. Changed model name to DeepSeek-R1-Distill-Llama-8B and added MAX_MODEL_LEN variable. Updated README to reflect changes in model compatibility and command usage for launching the service.

* Update README.md to enhance clarity and model compatibility details for DeepSeek-R1. Removed outdated GPU requirements section and added new commands for launching models with Llama-8B and Llama-70B. Updated model-GPU compatibility matrix for accuracy.

* Update deepseek-r1-vllm.yaml and README.md for improved configuration and usage instructions. Added disk tier specification and tensor parallel size to YAML. Updated README to reflect changes in GPU allocation for model launching and included example queries for Llama-8B and Llama-70B models.

* Update deepseek-r1-vllm.yaml and README.md to specify exact package versions and enhance usage instructions. Changed transformer and vllm installation commands to fixed versions. Updated README to clarify GPU usage instructions and modified example queries for improved clarity.

* Update README.md to improve clarity and structure for DeepSeek-R1 usage. Corrected spelling errors, enhanced instructions for running models with SkyPilot, and added a shutdown section. Updated section titles for better organization and user guidance.

* Update README.md to reflect the latest release of DeepSeek-R1 with SkyPilot integration. Enhanced visual presentation with an image, corrected spelling errors, and clarified instructions for running models. Improved consistency in terminology and formatting throughout the document.

* Fix image link in README.md for DeepSeek-R1 documentation to ensure proper display of the visual representation.

* Update README.md to correct image link for DeepSeek-R1, ensuring proper display. This change enhances the visual representation of the framework and maintains consistency in documentation.

* Enhance README.md for DeepSeek-R1 by adding a greeting and detailed thought process for counting letters in "strawberry." This update improves user engagement and provides a clearer understanding of the AI's reasoning capabilities.

* Remove unnecessary code snippets from README.md for DeepSeek-R1 to enhance readability and streamline the explanation of counting letters in "strawberry." This update improves the clarity of the document by eliminating clutter and focusing on the essential information.

* Update llm/deepseek-r1/README.md

Co-authored-by: Christopher Cooper <[email protected]>

* Update README.md to include launch details for DeepSeek-R1 on Kubernetes and correct formatting in thought process section. Added a new entry in the news section for the upcoming launch, enhancing user awareness of features and updates.

* date

---------

Co-authored-by: Christopher Cooper <[email protected]>
KeplerC and cg505 authored Jan 22, 2025
1 parent 8cf6c86 commit 5b0f80d
Showing 4 changed files with 287 additions and 1 deletion.
1 change: 1 addition & 0 deletions README.md
@@ -26,6 +26,7 @@

----
:fire: *News* :fire:
- [Jan 2025] Launch and Serve **[DeepSeek-R1](https://github.com/deepseek-ai/DeepSeek-R1)** on Kubernetes or Any Cloud: [**example**](./llm/deepseek-r1/)
- [Oct 2024] :tada: **SkyPilot crossed 1M+ downloads** :tada:: Thank you to our community! [**Twitter/X**](https://x.com/skypilot_org/status/1844770841718067638)
- [Sep 2024] Point, Launch and Serve **Llama 3.2** on Kubernetes or Any Cloud: [**example**](./llm/llama-3_2/)
- [Sep 2024] Run and deploy [**Pixtral**](./llm/pixtral), the first open-source multimodal model from Mistral AI.
261 changes: 261 additions & 0 deletions llm/deepseek-r1/README.md
@@ -0,0 +1,261 @@
# Run and Serve DeepSeek-R1 with SkyPilot

> SkyPilot is a framework for running AI and batch workloads on any infra, offering unified execution, high cost savings, and high GPU availability.
<p align="center">
<img src="https://i.imgur.com/6umSuKw.png" alt="DeepSeek-R1 on SkyPilot" style="width: 70%;">
</p>

On Jan 20, 2025, DeepSeek AI released [DeepSeek-R1](https://github.com/deepseek-ai/DeepSeek-R1), a family of models with up to 671B parameters.

DeepSeek-R1 naturally emerged with numerous powerful and interesting reasoning behaviors. It outperforms **state-of-the-art proprietary models** such as OpenAI-o1-mini, and it is **the first** open LLM to closely rival models like OpenAI-o1.

This guide walks through how to run and host DeepSeek-R1 models **on any infrastructure**, ranging from a local GPU workstation to a Kubernetes cluster or public clouds ([15+ clouds supported](https://docs.skypilot.co/en/latest/getting-started/installation.html)).

SkyPilot supports a variety of LLM frameworks and models. In this guide, we use [vLLM](https://github.com/vllm-project/vllm), an open-source library for fast LLM inference and serving, as an example.


### Step 0: Bring any infra

Install SkyPilot on your local machine:

```bash
pip install 'skypilot-nightly[all]'
```

Pick one of the following depending on what infra you want to run DeepSeek-R1 on:

**If your local machine/cluster has GPUs**: you can run SkyPilot [directly on existing machines](https://docs.skypilot.co/en/latest/reservations/existing-machines.html) with

```bash
sky local up
```

**If you want to use clouds** (15+ clouds are supported):

```bash
sky check
```
See [docs](https://docs.skypilot.co/en/latest/getting-started/installation.html) for details.


### Step 1: Run it with SkyPilot

Now it's time to run DeepSeek-R1 with SkyPilot. The exact command depends on the hardware you have available.

8B:
```
sky launch deepseek-r1-vllm.yaml \
  -c deepseek \
  --env HF_TOKEN=YOUR_HUGGING_FACE_API_TOKEN \
  --env MODEL_NAME=deepseek-ai/DeepSeek-R1-Distill-Llama-8B \
  --gpus L4:1
```

70B:
```
sky launch deepseek-r1-vllm.yaml \
  -c deepseek \
  --env HF_TOKEN=YOUR_HUGGING_FACE_API_TOKEN \
  --env MODEL_NAME=deepseek-ai/DeepSeek-R1-Distill-Llama-70B \
  --gpus A100-80GB:2
```

Replace `YOUR_HUGGING_FACE_API_TOKEN` with your own Hugging Face token, and set `--gpus` to the GPU you wish to use. Run `sky show-gpus` to see which GPUs you have access to. For reference, here is the model-GPU compatibility matrix:

| **GPU** | **DeepSeek-R1-Distill-Qwen-7B** | **DeepSeek-R1-Distill-Llama-70B** | **DeepSeek-R1** |
|----------------- |------------------------------ |------------------------ |------------------------------ |
| **L4:1** | ✅, with `--max-model-len 4096` |||
| **L4:8** ||||
| **A100:8** ||||
| **A100-80GB:12** ||| ✅, with `--max-model-len 4096` |
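
As a rough sanity check on the `--gpus` choices in the launch commands above, the sketch below estimates how many GPUs are needed just to hold the model weights. It is a back-of-the-envelope calculation, not how the matrix was produced: it assumes 16-bit weights (2 bytes per parameter) and a hypothetical 10% headroom, and it ignores the KV cache and activations, which is one reason a smaller `--max-model-len` can help a model fit. The function names are illustrative only.

```python
import math

# Assumption: weights are served in 16-bit precision (2 bytes per parameter).
BYTES_PER_PARAM_16BIT = 2


def weight_memory_gb(params_billion, bytes_per_param=BYTES_PER_PARAM_16BIT):
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billion * bytes_per_param


def min_gpus_for_weights(params_billion, gpu_memory_gb, headroom=0.9):
    """Lower bound on the GPU count needed to hold the weights.

    `headroom` is a hypothetical fraction of each GPU's memory assumed to be
    usable for weights; KV cache and activations need additional memory.
    """
    usable_gb = gpu_memory_gb * headroom
    return math.ceil(weight_memory_gb(params_billion) / usable_gb)


# An L4 has 24 GB of memory; an A100-80GB has 80 GB.
print(min_gpus_for_weights(8, 24))   # 8B distill on L4
print(min_gpus_for_weights(70, 80))  # 70B distill on A100-80GB
```

Under these assumptions the 8B model fits on a single L4 and the 70B model needs at least two A100-80GB GPUs, which is consistent with the two launch commands above.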

### Step 2: Get Results
Get the IP address of the cluster's head node, which serves the vLLM endpoint:

```
ENDPOINT=$(sky status --ip deepseek)
```

Query the endpoint in a terminal:

8B:
```
curl http://$ENDPOINT:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
      "model": "deepseek-ai/DeepSeek-R1-Distill-Llama-8B",
      "messages": [
        {
          "role": "system",
          "content": "You are a helpful assistant."
        },
        {
          "role": "user",
          "content": "Who are you?"
        }
      ]
    }' | jq .
```

70B:
```
curl http://$ENDPOINT:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
      "model": "deepseek-ai/DeepSeek-R1-Distill-Llama-70B",
      "messages": [
        {
          "role": "system",
          "content": "You are a helpful assistant."
        },
        {
          "role": "user",
          "content": "how many rs are in strawberry"
        }
      ]
    }' | jq .
```
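
The same request can also be sent from Python. The following is a minimal sketch using only the standard library; it assumes the endpoint IP comes from `sky status --ip deepseek` as above and that the server listens on port 8000. The helper names `build_chat_request` and `chat` are hypothetical, not part of SkyPilot or vLLM.

```python
import json
import urllib.request


def build_chat_request(endpoint_ip, model, user_prompt, port=8000):
    """Build a POST request for the OpenAI-compatible chat completions route."""
    url = f"http://{endpoint_ip}:{port}/v1/chat/completions"
    payload = {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": user_prompt},
        ],
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(endpoint_ip, model, user_prompt):
    """Send the request and return the assistant's message content."""
    request = build_chat_request(endpoint_ip, model, user_prompt)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

For example, `chat(ip, "deepseek-ai/DeepSeek-R1-Distill-Llama-8B", "Who are you?")` mirrors the 8B `curl` call, where `ip` holds the value of `$ENDPOINT`.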

You will get both the chain of thought, enclosed in `<think>` tags, and the final answer.
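
If you only want the final answer, the reasoning can be separated from it on the client side. The snippet below is one possible sketch, assuming the response content contains at most one `<think>...</think>` block followed by the answer, as in the sample outputs in this guide. The function name `split_reasoning` is illustrative.

```python
import re

# Matches the first <think>...</think> block, including newlines inside it.
THINK_PATTERN = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(content):
    """Return (reasoning, answer) from a message content string.

    If no <think> block is present, reasoning is an empty string and the
    whole content is treated as the answer.
    """
    match = THINK_PATTERN.search(content)
    if match is None:
        return "", content.strip()
    reasoning = match.group(1).strip()
    answer = content[match.end():].strip()
    return reasoning, answer


reasoning, answer = split_reasoning("<think>\n\n</think>\n\nGreetings! I'm DeepSeek-R1.")
print(repr(reasoning))  # empty reasoning block
print(answer)
```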

<details>
<summary>Who are you? I'm DeepSeek-R1.</summary>

Greetings! I'm DeepSeek-R1, an artificial intelligence assistant created by DeepSeek. I'm at your service and would be delighted to assist you with any inquiries or tasks you may have.

```console
{
"id": "chatcmpl-507f467863344f31b98d8bf36b9a3c1c",
"object": "chat.completion",
"created": 1737503962,
"model": "deepseek-ai/DeepSeek-R1-Distill-Llama-70B",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "<think>\n\n</think>\n\nGreetings! I'm DeepSeek-R1, an artificial intelligence assistant created by DeepSeek. I'm at your service and would be delighted to assist you with any inquiries or tasks you may have.",
"tool_calls": []
},
"logprobs": null,
"finish_reason": "stop",
"stop_reason": null
}
],
"usage": {
"prompt_tokens": 13,
"total_tokens": 57,
"completion_tokens": 44,
"prompt_tokens_details": null
},
"prompt_logprobs": null
}
```
</details>


<details>
<summary>How many Rs are in strawberry: There are 3 Rs in strawberry.</summary>

\<think\>
Okay, so I need to figure out how many times the letter 'R' appears in the word "strawberry." Hmm, let me think about this step by step. First, I should probably write out the word to visualize it better. The word is S-T-R-A-W-B-E-R-R-Y. Wait, is that right? Let me double-check. S-T-R-A-W-B-E-R-R-Y, yes, that's how it's spelled.

Now, I need to go through each letter one by one and count the Rs. Starting with the first letter, it's an S. Not an R, so move on. The second letter is T, still not an R. The third letter is R. Okay, that's the first R. I'll note that down.

Next letters: A, W, B, E. None of those are Rs. So far, only one R. Then comes R again after E, right? So that's the second R. But wait, I think there's another R after that. Let me make sure. After the second R, there's another R, making it the third R. Wait, no, let me check the spelling again. It's S-T-R-A-W-B-E-R-R-Y. So after E, it's R, then another R, so that's two Rs there. So total, how many?

Let me recount: first R is the third letter, then after E, there's another R, making it two in total, or three? Wait, no. Let me look at each position:

1. S
2. T
3. R (1st R)
4. A
5. W
6. B
7. E
8. R (2nd R)
9. R (3rd R)
10. Y

Wait, so after E, there are two Rs in a row, which would make it the 8th and 9th letters. So that's two more Rs after the first one. So total, it's three Rs? Or is that correct? Let me make sure I'm not overcounting. Let's write it out:

S T R A W B E R R Y

So, positions:

1: S

2: T

3: R (1)

4: A

5: W

6: B

7: E

8: R (2)

9: R (3)

10: Y

So that's three Rs. Wait, but when I think about the word "strawberry," I thought it had two Rs, but maybe it's three. Wait, maybe I'm wrong. Let me check a dictionary or something, but since I can't do that, I'll have to rely on my memory. Hmm, maybe I was mistaken earlier. Let me think again. Strawberries have a double R, I believe. But in the spelling, is it R-A-W-B-E-R-R-Y? So after the E, it's R-R-Y. So that's two Rs at the end. Plus the one after the T, so that's three Rs total.

Wait, no. Let me think about how the word is pronounced. It's "straw" plus "berry," right? So "straw" has one R, and "berry" has two Rs? No, "berry" only has one R. Wait, no, "berry" is B-E-R-R-Y, so there are two Rs there. So when you put it together, "strawberry" would have the R from "straw" and two Rs from "berry," making three Rs. Hmm, but I'm not sure. Some people might think it's only two Rs, but based on the spelling, it's three.

Wait, no, actually, let me break it down. The word is S-T-R-A-W-B-E-R-R-Y. So after the T, there's an R, then later after the B and E, there's another R, and then another R before Y. So that's three Rs. So the answer should be three. But I'm a bit confused because sometimes people might miscount, thinking it's two. But according to the spelling, it's three. I think that's correct.
\</think\>

The word "strawberry" contains three Rs.

Step-by-step breakdown:
- The first R is the third letter.
- The second R is the eighth letter.
- The third R is the ninth letter.

Counting each occurrence: 1 (position 3), 2 (position 8), and 3 (position 9).

Answer: There are 3 Rs in "strawberry."

```console
{
"id": "chatcmpl-d532bd1c1738493ab9c8c906550044bf",
"object": "chat.completion",
"created": 1737507945,
"model": "deepseek-ai/DeepSeek-R1-Distill-Llama-70B",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "<think>\nOkay, so I need to figure out how many times the letter 'R' appears in the word \"strawberry.\" Hmm, let me think about this step by step. First, I should probably write out the word to visualize it better. The word is S-T-R-A-W-B-E-R-R-Y. Wait, is that right? Let me double-check. S-T-R-A-W-B-E-R-R-Y, yes, that's how it's spelled.\n\nNow, I need to go through each letter one by one and count the Rs. Starting with the first letter, it's an S. Not an R, so move on. The second letter is T, still not an R. The third letter is R. Okay, that's the first R. I'll note that down.\n\nNext letters: A, W, B, E. None of those are Rs. So far, only one R. Then comes R again after E, right? So that's the second R. But wait, I think there's another R after that. Let me make sure. After the second R, there's another R, making it the third R. Wait, no, let me check the spelling again. It's S-T-R-A-W-B-E-R-R-Y. So after E, it's R, then another R, so that's two Rs there. So total, how many?\n\nLet me recount: first R is the third letter, then after E, there's another R, making it two in total, or three? Wait, no. Let me look at each position:\n\n1. S\n2. T\n3. R (1st R)\n4. A\n5. W\n6. B\n7. E\n8. R (2nd R)\n9. R (3rd R)\n10. Y\n\nWait, so after E, there are two Rs in a row, which would make it the 8th and 9th letters. So that's two more Rs after the first one. So total, it's three Rs? Or is that correct? Let me make sure I'm not overcounting. Let's write it out:\n\nS T R A W B E R R Y\n\nSo, positions:\n\n1: S\n\n2: T\n\n3: R (1)\n\n4: A\n\n5: W\n\n6: B\n\n7: E\n\n8: R (2)\n\n9: R (3)\n\n10: Y\n\nSo that's three Rs. Wait, but when I think about the word \"strawberry,\" I thought it had two Rs, but maybe it's three. Wait, maybe I'm wrong. Let me check a dictionary or something, but since I can't do that, I'll have to rely on my memory. Hmm, maybe I was mistaken earlier. Let me think again. Strawberries have a double R, I believe. But in the spelling, is it R-A-W-B-E-R-R-Y? So after the E, it's R-R-Y. So that's two Rs at the end. Plus the one after the T, so that's three Rs total.\n\nWait, no. Let me think about how the word is pronounced. It's \"straw\" plus \"berry,\" right? So \"straw\" has one R, and \"berry\" has two Rs? No, \"berry\" only has one R. Wait, no, \"berry\" is B-E-R-R-Y, so there are two Rs there. So when you put it together, \"strawberry\" would have the R from \"straw\" and two Rs from \"berry,\" making three Rs. Hmm, but I'm not sure. Some people might think it's only two Rs, but based on the spelling, it's three.\n\nWait, no, actually, let me break it down. The word is S-T-R-A-W-B-E-R-R-Y. So after the T, there's an R, then later after the B and E, there's another R, and then another R before Y. So that's three Rs. So the answer should be three. But I'm a bit confused because sometimes people might miscount, thinking it's two. But according to the spelling, it's three. I think that's correct.\n</think>\n\nThe word \"strawberry\" contains three Rs. \n\nStep-by-step breakdown:\n- The first R is the third letter.\n- The second R is the eighth letter.\n- The third R is the ninth letter.\n\nCounting each occurrence: 1 (position 3), 2 (position 8), and 3 (position 9).\n\nAnswer: There are 3 Rs in \"strawberry.\"",
"tool_calls": []
},
"logprobs": null,
"finish_reason": "stop",
"stop_reason": null
}
],
"usage": {
"prompt_tokens": 15,
"total_tokens": 985,
"completion_tokens": 970,
"prompt_tokens_details": null
},
"prompt_logprobs": null
}
```

</details>


### Shutdown
To shut down the cluster, run:
```
sky down deepseek
```
24 changes: 24 additions & 0 deletions llm/deepseek-r1/deepseek-r1-vllm.yaml
@@ -0,0 +1,24 @@
envs:
  MODEL_NAME: deepseek-ai/DeepSeek-R1-Distill-Llama-8B
  MAX_MODEL_LEN: 4096
  HF_TOKEN: # TODO: Fill with your own huggingface token, or use --env to pass.

resources:
  accelerators: {L4:1, A10G:1, A10:1, A100:1, A100-80GB:1}
  ports:
    - 8000
  disk_tier: best

setup: |
  pip install transformers==4.48.1
  pip install vllm==0.6.6.post1
  python -c "import huggingface_hub; huggingface_hub.login('${HF_TOKEN}')"
run: |
  echo 'Starting vllm openai api server...'
  python -m vllm.entrypoints.openai.api_server \
    --host 0.0.0.0 \
    --tensor-parallel-size $SKYPILOT_NUM_GPUS_PER_NODE \
    --model $MODEL_NAME \
    --max-model-len $MAX_MODEL_LEN
2 changes: 1 addition & 1 deletion llm/llama-3_1/README.md
@@ -13,7 +13,7 @@ This guide walks through how to serve Llama 3.1 models **completely on your infr

- Local GPU workstation
- Kubernetes cluster
- Cloud accounts ([12 clouds supported](https://docs.skypilot.co/en/latest/getting-started/installation.html))
- Cloud accounts ([15+ clouds supported](https://docs.skypilot.co/en/latest/getting-started/installation.html))

SkyPilot will be used as the unified framework to launch serving on any (or multiple) infra that you bring.
