Implement DeepSeek V2 #2744

EricLBuehler · 2025-01-27T17:29:24Z

This PR implements the DeepSeek V2 architecture.

candle-examples/Cargo.toml

candle-examples/examples/deepseekv2/main.rs

candle-transformers/src/models/deepseek2.rs

LaurentMazare · 2025-01-28T21:59:05Z

candle-transformers/src/models/deepseek2.rs

+}
+
+pub trait SplitOp {
+    fn split<D: Dim>(&self, splits: &[usize], dim: D) -> Result<Vec<Tensor>>;


It looks like split is only ever used to split into two tensors at a time. A specialized split2 op that returns a pair of tensors would seem more convenient.

Actually, my idea was to eventually migrate all the ...Op traits in this file to candle_nn, since they are also used in DeepSeek 3 (that PR duplicates these ops) and might be very useful.

What do you think?

I would rather keep candle_nn on the simpler side. Here it seems the ops can just be specialized to exactly what the model requires. I would even suggest defining helper functions rather than going through traits; the deepseek-v3 implementation can then use some functions from the deepseek-v2 module.
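To make the suggestion concrete, here is a minimal sketch of a free helper function that always returns a pair, instead of a trait method returning a Vec. It does not use candle: `Mat` is a hypothetical row-major stand-in for a 2D tensor, and `split2` with its `left` parameter is an illustrative name and signature, not the code from this PR.

```rust
/// Hypothetical stand-in for a 2D tensor: row-major data with `cols` columns.
/// This is NOT candle's Tensor; it only illustrates the shape of the API.
#[derive(Debug, Clone, PartialEq)]
struct Mat {
    rows: usize,
    cols: usize,
    data: Vec<f32>,
}

/// Split `m` along its last dimension into two parts holding `left` and
/// `cols - left` columns. Returning a tuple lets the caller destructure
/// directly instead of indexing into a Vec.
fn split2(m: &Mat, left: usize) -> Result<(Mat, Mat), String> {
    if m.cols == 0 || left > m.cols {
        return Err(format!("cannot split {} columns at {}", m.cols, left));
    }
    let right = m.cols - left;
    let mut a = Vec::with_capacity(m.rows * left);
    let mut b = Vec::with_capacity(m.rows * right);
    // Walk the matrix row by row and copy each half into its own buffer.
    for row in m.data.chunks(m.cols) {
        a.extend_from_slice(&row[..left]);
        b.extend_from_slice(&row[left..]);
    }
    Ok((
        Mat { rows: m.rows, cols: left, data: a },
        Mat { rows: m.rows, cols: right, data: b },
    ))
}
```

A caller would write `let (a, b) = split2(&m, 1)?;`, so the "exactly two outputs" contract is visible in the type rather than checked at runtime through the length of a Vec.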

candle-transformers/src/models/deepseek2.rs

LaurentMazare · 2025-01-31T19:18:01Z

candle-transformers/src/models/deepseek2.rs

+}
+
+pub trait SplitOp {
+    fn split<D: Dim>(&self, splits: &[usize], dim: D) -> Result<Vec<Tensor>>;


I would rather keep candle_nn on the simpler side. Here it seems the ops can just be specialized to exactly what the model requires. I would even suggest defining helper functions rather than going through traits; the deepseek-v3 implementation can then use some functions from the deepseek-v2 module.

LaurentMazare · 2025-01-31T19:20:40Z

candle-transformers/src/models/deepseek2.rs

+                // (n, topk_group)
+                let group_idx = scores.topk_unsorted(self.cfg.topk_group)?.indices;
+                // (n, n_group)
+                let mut group_mask = group_scores.zeros_like()?;


Can't you just avoid this mut by chaining calls or using a local scope? It seems fairly easy to do. Please also review the other remaining muts.
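The two alternatives mentioned here can be sketched on plain vectors. This is only an illustration under stated assumptions: `Vec<f32>` stands in for the tensor mask, the function names are hypothetical, and the actual tensor ops used in the PR are not reproduced.

```rust
/// Chained variant: the mask is produced by a single expression,
/// so no `mut` binding is needed at all.
fn group_mask_chained(n_group: usize, group_idx: &[usize]) -> Vec<f32> {
    (0..n_group)
        .map(|g| if group_idx.contains(&g) { 1.0 } else { 0.0 })
        .collect()
}

/// Local-scope variant: mutation is confined to the inner block, and the
/// binding visible to the rest of the function (`mask`) is immutable.
fn group_mask_scoped(n_group: usize, group_idx: &[usize]) -> Vec<f32> {
    let mask = {
        let mut m = vec![0.0f32; n_group];
        for &g in group_idx {
            // Out-of-range indices are ignored in this sketch.
            if let Some(slot) = m.get_mut(g) {
                *slot = 1.0;
            }
        }
        m
    };
    mask
}
```

Both produce the same mask for in-range indices. The scoped form is usually the smaller change when the existing code already builds the value step by step.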

Add deepseek v2

c3a9775

EricLBuehler mentioned this pull request Jan 27, 2025

Add support for Deepseek #2692

Open

Fix

cb47324

EricLBuehler marked this pull request as ready for review January 27, 2025 17:42

EricLBuehler added 2 commits January 27, 2025 13:08

Remove unused

a1dd5a1

Add kv cache

bd8b5c4

LaurentMazare reviewed Jan 28, 2025

EricLBuehler added 6 commits January 28, 2025 17:06

Remove from cargo.toml

66170a0

Fix dtype selection logic

5d42e6f

Fix unnecessary u32->f32->gather->u32

6df4f13

Remove fromstr impl

1ad7e92

Use local scopes for some clarity

b8974e9

Typo

ed0953e

EricLBuehler requested a review from LaurentMazare January 29, 2025 16:42

LaurentMazare reviewed Jan 31, 2025
