v3.0.2

@Narsil Narsil released this 24 Jan 11:16
b70f29d

Tl;dr

New transformers backend with FlashAttention support: models that are not officially supported can now run directly in TGI at roughly the same performance as pure TGI. Congrats @Cyrilvallez!

New models unlocked: Cohere2, olmo, olmo2, helium.

What's Changed

New Contributors

Full Changelog: v3.0.1...v3.0.2