## TL;DR
New Transformers backend supporting FlashAttention at roughly the same performance as native TGI implementations, bringing all models not officially supported by TGI directly into TGI. Congrats @Cyrilvallez!
New models unlocked: Cohere2, OLMo, OLMo2, Helium.
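With the Transformers backend, a model that has no native TGI implementation can be served with the usual launcher invocation; TGI falls back to the Transformers modeling code automatically when possible. A minimal sketch (the image tag and model id below are illustrative, not prescriptive):

```shell
# Serve a model through TGI; models without a native implementation
# now run via the new Transformers backend (illustrative model id).
docker run --gpus all --shm-size 1g -p 8080:80 \
    -v $PWD/data:/data \
    ghcr.io/huggingface/text-generation-inference:3.0.2 \
    --model-id allenai/OLMo-2-1124-7B

# Then query the server from another terminal:
curl 127.0.0.1:8080/generate \
    -X POST \
    -H 'Content-Type: application/json' \
    -d '{"inputs": "What is deep learning?", "parameters": {"max_new_tokens": 32}}'
```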
## What's Changed
- docs(README): supported hardware links TGI AMD GPUs by @guspan-tanadi in #2814
- Fixing latest flavor by disabling it. by @Narsil in #2831
- fix facebook/opt-125m not working issue by @sywangyi in #2824
- Fixup opt to reduce the amount of odd if statements. by @Narsil in #2833
- TensorRT-LLM backend bump to latest version + misc fixes by @mfuntowicz in #2791
- Feat/trtllm cancellation dev container by @Hugoch in #2795
- New arg. by @Narsil in #2845
- Fixing CI. by @Narsil in #2846
- fix: lint backend and doc files by @drbh in #2850
- Qwen2-VL runtime error fix when prompted with multiple images by @janne-alatalo in #2840
- Update vllm kernels for ROCM by @mht-sharma in #2826
- change xpu lib download link by @sywangyi in #2852
- fix: include add_special_tokens in kserve request by @drbh in #2859
- chore: fixed some typos and attribute issues in README by @ruidazeng in #2891
- update ipex xpu to fix issue in ARC770 by @sywangyi in #2884
- Basic flashinfer 0.2 support by @danieldk in #2862
- Improve vlm support (add idefics3 support) by @drbh in #2437
- Update to marlin-kernels 0.3.7 by @danieldk in #2882
- chore: Update jsonschema to 0.28.0 by @Stranger6667 in #2870
- Add possible variants for A100 and H100 GPUs for auto-detecting flops by @lazariv in #2837
- Update using_guidance.md by @nbroad1881 in #2901
- fix crash in torch2.6 if TP=1 by @sywangyi in #2885
- Add Flash decoding kernel ROCm by @mht-sharma in #2855
- Enable FP8 Per-Tensor Scales and Integrate Marlin/MoE Kernels Repo for ROCm by @mht-sharma in #2825
- Baichuan2-13B does not have max_position_embeddings in config by @sywangyi in #2903
- docs(conceptual/speculation): available links Train Medusa by @guspan-tanadi in #2863
- Fix `docker run` in README.md by @alvarobartt in #2861
- add guide on using TPU with TGI in the docs by @baptistecolle in #2907
- Upgrading our rustc version. by @Narsil in #2908
- Fix typo in TPU docs by @baptistecolle in #2911
- Removing the github runner. by @Narsil in #2912
- Upgrading bitsandbytes. by @Narsil in #2910
- Do not convert weight scale to e4m3fnuz on CUDA by @danieldk in #2917
- feat: improve star coder to support multi lora layers by @drbh in #2883
- Flash decoding kernel adding and prefill-chunking and prefix caching enabling in intel cpu/xpu by @sywangyi in #2815
- nix: update to PyTorch 2.5.1 by @danieldk in #2921
- Moving to `uv` instead of `poetry`. by @Narsil in #2919
- Add fp8 kv cache for ROCm by @mht-sharma in #2856
- fix the crash of meta-llama/Llama-3.2-1B by @sywangyi in #2918
- feat: improve qwen2-vl startup by @drbh in #2802
- Revert "feat: improve qwen2-vl startup " by @drbh in #2924
- flashinfer: switch to plan API by @danieldk in #2904
- Fixing TRTLLM dockerfile. by @Narsil in #2922
- Flash Transformers modeling backend support by @Cyrilvallez in #2913
- Give TensorRT-LLM a proper CI/CD by @mfuntowicz in #2886
- Trying to avoid the random timeout. by @Narsil in #2929
- Run `pre-commit run --all-files` to fix CI by @alvarobartt in #2933
- Upgrading the deps to have transformers==4.48.0 necessary by @Narsil in #2937
- fix moe in quantization path by @sywangyi in #2935
- Clarify FP8-Marlin use on capability 8.9 by @danieldk in #2940
- Bump TensorRT-LLM backend dependency to v0.16.0 by @mfuntowicz in #2931
- Set `alias` for `max_completion_tokens` in `ChatRequest` by @alvarobartt in #2932
- Add NVIDIA A40 to known cards by @kldzj in #2941
- [TRTLLM] Expose finish reason by @mfuntowicz in #2841
- Tmp tp transformers by @Narsil in #2942
- Transformers backend TP fix by @Cyrilvallez in #2945
- Trying to put back the archlist (to fix the oom). by @Narsil in #2947
## New Contributors
- @janne-alatalo made their first contribution in #2840
- @ruidazeng made their first contribution in #2891
- @Stranger6667 made their first contribution in #2870
- @lazariv made their first contribution in #2837
- @baptistecolle made their first contribution in #2907
- @Cyrilvallez made their first contribution in #2913
- @kldzj made their first contribution in #2941
**Full Changelog**: v3.0.1...v3.0.2