### Checklist
### Motivation
**`scaled_fp8_quant`** @HandH1998

- `sglang/python/sglang/srt/layers/quantization/fp8.py`, line 10 in `2add697`
- `sglang/python/sglang/srt/layers/quantization/fp8.py`, line 297 in `2add697`
- `sglang/python/sglang/srt/layers/moe/ep_moe/layer.py`, line 6 in `2add697`
- `sglang/python/sglang/srt/layers/moe/ep_moe/layer.py`, line 603 in `2add697`
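For whoever picks this up, a rough pure-PyTorch sketch of the semantics to validate the port against (my reading only; the name `scaled_fp8_quant_ref` is illustrative, and this covers just dynamic per-tensor scaling, not the static/per-token variants):

```python
import torch

def scaled_fp8_quant_ref(x: torch.Tensor, scale: torch.Tensor | None = None):
    """Per-tensor FP8 (e4m3) quantization; returns (quantized tensor, scale)."""
    finfo = torch.finfo(torch.float8_e4m3fn)
    if scale is None:
        # Dynamic scaling: map the tensor's absolute max onto the FP8 max
        scale = x.abs().max().to(torch.float32) / finfo.max
    q = (x.to(torch.float32) / scale).clamp(finfo.min, finfo.max)
    return q.to(torch.float8_e4m3fn), scale
```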
**`rotary_embedding`** @ByronHsu

- `sglang/python/sglang/srt/layers/rotary_embedding.py`, line 142 in `2add697`
- `sglang/python/sglang/srt/layers/rotary_embedding.py`, lines 159 to 166 in `2add697`

BTW, we don't need `batched_rotary_embedding`.
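A minimal reference sketch of what the kernel computes, assuming NeoX-style rotation with `rot_dim == head_size` and a cache laid out as cos half then sin half (the GPT-J interleaved style differs; function names here are illustrative):

```python
import torch

def rotate_half(x: torch.Tensor) -> torch.Tensor:
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat((-x2, x1), dim=-1)

def rotary_embedding_ref(positions, q, k, cos_sin_cache):
    # q, k: (num_tokens, num_heads, head_size); positions: (num_tokens,)
    # cos_sin_cache: (max_position, head_size), cos in the first half, sin in the second
    cos, sin = cos_sin_cache[positions].chunk(2, dim=-1)
    cos = torch.cat((cos, cos), dim=-1).unsqueeze(1)  # (num_tokens, 1, head_size)
    sin = torch.cat((sin, sin), dim=-1).unsqueeze(1)
    return q * cos + rotate_half(q) * sin, k * cos + rotate_half(k) * sin
```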
**`topk_softmax`** @zhyncs

- `sglang/python/sglang/srt/layers/moe/topk.py`, line 48 in `2add697`
- `sglang/python/sglang/srt/layers/moe/topk.py`, lines 62 to 67 in `2add697`
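The unfused semantics are small enough to state as a sketch; the kernel fuses the softmax, top-k selection, and optional renormalization into one pass (name `topk_softmax_ref` is illustrative):

```python
import torch

def topk_softmax_ref(gating_logits: torch.Tensor, top_k: int, renormalize: bool = True):
    # gating_logits: (num_tokens, num_experts) raw router scores
    probs = torch.softmax(gating_logits, dim=-1, dtype=torch.float32)
    topk_weights, topk_ids = torch.topk(probs, top_k, dim=-1)
    if renormalize:
        # Rescale the selected weights so they sum to 1 per token
        topk_weights = topk_weights / topk_weights.sum(dim=-1, keepdim=True)
    return topk_weights, topk_ids
```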
**`moe_align_block_size`** and **`moe_sum`** @BBuf @zhyncs

- `sglang/python/sglang/srt/layers/moe/fused_moe_triton/fused_moe.py`, line 14 in `2add697`
- `sglang/python/sglang/srt/layers/moe/fused_moe_triton/fused_moe.py`, lines 448 to 455 in `2add697`
- `sglang/python/sglang/srt/layers/moe/fused_moe_triton/fused_moe.py`, lines 1012 to 1015 in `2add697`
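A slow pure-PyTorch reference of my understanding of the semantics, useful as a correctness oracle for the port (illustrative names; in particular the convention that padding slots hold the out-of-range sentinel `topk_ids.numel()` is my reading of the Triton consumer and should be double-checked):

```python
import torch

def moe_sum_ref(x: torch.Tensor) -> torch.Tensor:
    # x: (num_tokens, top_k, hidden) per-expert outputs, summed over the top_k experts
    return x.sum(dim=1)

def moe_align_block_size_ref(topk_ids: torch.Tensor, block_size: int, num_experts: int):
    """Group token->expert assignments by expert, padding each expert's segment
    to a multiple of block_size so a GEMM block never mixes experts."""
    flat = topk_ids.flatten()
    counts = torch.bincount(flat, minlength=num_experts)
    padded = (counts + block_size - 1) // block_size * block_size
    total = int(padded.sum())
    # Padding slots hold an out-of-range sentinel (one past the last valid id)
    sorted_ids = torch.full((total,), flat.numel(), dtype=torch.int32)
    expert_ids = torch.repeat_interleave(
        torch.arange(num_experts, dtype=torch.int32), padded // block_size
    )
    order = torch.argsort(flat, stable=True)
    starts = torch.cumsum(padded, 0) - padded
    offset = 0
    for e in range(num_experts):
        c = int(counts[e])
        s = int(starts[e])
        sorted_ids[s:s + c] = order[offset:offset + c].to(torch.int32)
        offset += c
    return sorted_ids, expert_ids, torch.tensor([total], dtype=torch.int32)
```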
**`awq_dequantize`** @bjmsong @zhyncs

- `sglang/python/sglang/srt/models/deepseek_v2.py`, line 25 in `2add697`
- `sglang/python/sglang/srt/models/deepseek_v2.py`, lines 946 to 952 in `2add697`
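For reference, a sketch of the dequantization semantics only (not the vLLM op's exact signature), assuming the standard AWQ layout: eight 4-bit values per int32 in the interleaved order `[0, 4, 1, 5, 2, 6, 3, 7]`, with group-wise zeros and scales; `group_size=128` and the function name are assumptions:

```python
import torch

# AWQ packs eight 4-bit values per int32 in the order [0, 4, 1, 5, 2, 6, 3, 7];
# shifting by 4 * that order unpacks them back into logical column order.
AWQ_SHIFTS = torch.tensor([0, 16, 4, 20, 8, 24, 12, 28], dtype=torch.int32)

def awq_dequantize_ref(qweight, qzeros, scales, group_size=128):
    # qweight: (in_features, out_features // 8) int32
    # qzeros:  (in_features // group_size, out_features // 8) int32
    # scales:  (in_features // group_size, out_features) fp16/bf16
    w = (qweight.unsqueeze(-1) >> AWQ_SHIFTS) & 0xF
    w = w.reshape(qweight.shape[0], -1)              # (in_features, out_features)
    z = (qzeros.unsqueeze(-1) >> AWQ_SHIFTS) & 0xF
    z = z.reshape(qzeros.shape[0], -1)               # (num_groups, out_features)
    # Expand group-wise zeros/scales to one row per input feature
    z = z.repeat_interleave(group_size, dim=0)
    s = scales.repeat_interleave(group_size, dim=0)
    return (w - z).to(scales.dtype) * s
```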
### Related resources
No response