v0.3.1
Highlights
- UQFF
- FLUX model
- Llama 3.2 Vision model
MSRV
The minimum supported Rust version (MSRV) of this release is 1.79.0.
What's Changed
- Enable automatic determination of normal loader type by @EricLBuehler in #742
- Add the `ForwardInputsResult` api by @EricLBuehler in #745
- Implement Mixture of Quantized Experts (MoQE) by @EricLBuehler in #747
- Bump quinn-proto from 0.11.6 to 0.11.8 by @dependabot in #748
- Fix f64-f32 type mismatch for Metal/Accelerate by @EricLBuehler in #752
- Nicer error when misconfigured PagedAttention input metadata by @EricLBuehler in #753
- Update deps, support CUDA 12.6 by @EricLBuehler in #755
- Patch bug when not using PagedAttention by @EricLBuehler in #759
- Fix `MistralRs` Drop impl in tokio runtime by @EricLBuehler in #762
- Use nicer Candle Error APIs by @EricLBuehler in #767
- Support setting seed by @EricLBuehler in #766
- Fix Metal build error with seed by @EricLBuehler in #771
- Fix and add checks for no kv cache by @EricLBuehler in #776
- UQFF: the uniquely powerful quantized file format by @EricLBuehler in #770
- Add `Scheduler::running_len` by @EricLBuehler in #780
- Deduplicate RoPE caches by @EricLBuehler in #787
- Easier and simpler Rust-side API by @EricLBuehler in #785
- Add some examples for AnyMoE by @EricLBuehler in #788
- Rust API for sampling by @EricLBuehler in #790
- Our first Diffusion model: FLUX by @EricLBuehler in #758
- Fix build bugs with Metal, `NSUInteger` by @EricLBuehler in #792
- Support weight tying in Llama 3.2 GGUF models by @EricLBuehler in #801
- Implement the Llama 3.2 vision models by @EricLBuehler in #796
Full Changelog: v0.3.0...v0.3.1