Tags: octoml/flashinfer
chore(main): release 0.1.0 (flashinfer-ai#373)

🤖 I have created a release *beep* *boop*

## [0.1.0](flashinfer-ai/flashinfer@v0.0.9...v0.1.0) (2024-07-17)

### Features

* Add mask to `merge_state_in_place` ([flashinfer-ai#372](flashinfer-ai#372)) ([e14fa81](flashinfer-ai@e14fa81))
* Expose PyTorch API for block-sparse attention ([flashinfer-ai#375](flashinfer-ai#375)) ([4bba6fa](flashinfer-ai@4bba6fa))
* Fused GPU sampling kernel for joint top-k & top-p sampling ([flashinfer-ai#374](flashinfer-ai#374)) ([6e028eb](flashinfer-ai@6e028eb))

This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
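The joint top-k & top-p feature above fuses both sampling filters into one GPU kernel. A pure-Python sketch of the combined filtering semantics (not FlashInfer's CUDA implementation; the function names here are illustrative): a token survives only if it is among the k most probable tokens *and* inside the top-p nucleus, and the surviving mass is renormalized before sampling.

```python
import random


def top_k_top_p_filter(probs, k, p):
    """Return (token_ids, renormalized_probs) for tokens surviving
    both a top-k mask and a top-p (nucleus) mask on one distribution."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep = set(order[:k])                 # top-k mask
    cum, nucleus = 0.0, set()
    for i in order:                       # top-p: smallest prefix with mass >= p
        nucleus.add(i)
        cum += probs[i]
        if cum >= p:
            break
    ids = [i for i in order if i in keep and i in nucleus]
    total = sum(probs[i] for i in ids)
    return ids, [probs[i] / total for i in ids]


def sample(probs, k, p, rng=random.random):
    """Draw one token id from the jointly filtered, renormalized distribution."""
    ids, q = top_k_top_p_filter(probs, k, p)
    r, cum = rng(), 0.0
    for i, qi in zip(ids, q):
        cum += qi
        if r < cum:
            return i
    return ids[-1]
```

The fused kernel's win is doing both masks and the renormalization in a single pass over the logits instead of two separate filter kernels; the sketch only shows the semantics, not the parallel reduction.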
chore(main): release 0.0.9 (flashinfer-ai#359)

## [0.0.9](flashinfer-ai/flashinfer@v0.0.8...v0.0.9) (2024-07-12)

### Bugfixes

* Fix the decode kernel segfault in CUDAGraph mode ([flashinfer-ai#368](https://github.com/flashinfer-ai/flashinfer/pull/368)) ([c69cfa](https://github.com/flashinfer-ai/flashinfer/commit/c69cfabc540e4a7edd991713df10d575ff3b0c21))
* Fix decode kernels' output for empty kv cache ([flashinfer-ai#363](https://github.com/flashinfer-ai/flashinfer/pull/363)) ([ac72b1](https://github.com/flashinfer-ai/flashinfer/commit/ac72b1cc14a6474d601f371c8d69e2600ac28d2f))
* Check GPU id in PyTorch APIs and use the input tensor's GPU default stream ([flashinfer-ai#361](https://github.com/flashinfer-ai/flashinfer/pull/361)) ([1b84fa](https://github.com/flashinfer-ai/flashinfer/commit/1b84fab3e4f53fb4fa26952fdb46fa8018634057))

### Performance Improvements

* Accelerate ALiBi ([flashinfer-ai#365](flashinfer-ai#365)) ([4f0a9f9](flashinfer-ai@4f0a9f9))
* Accelerate GQA performance ([flashinfer-ai#356](flashinfer-ai#356)) ([e56ddad](flashinfer-ai@e56ddad))
* Optimize tensor conversions in C++ code to avoid unnecessary copies ([flashinfer-ai#366](flashinfer-ai#366)) ([1116237](flashinfer-ai@1116237))

### Acknowledgement

We thank [@Yard1](https://github.com/Yard1), [@Ying1123](https://github.com/Ying1123) and [@zhyncs](https://github.com/zhyncs) for their contributions.

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Zihao Ye <expye@outlook.com>
bump version: v0.0.8 (flashinfer-ai#355)

## [0.0.8](flashinfer-ai/flashinfer@v0.0.7...v0.0.8) (2024-07-03)

### Bugfixes

* Fix prefill/append kernel behavior for empty kv-cache ([flashinfer-ai#353](flashinfer-ai#353)) ([7adc8c](flashinfer-ai@7adc8cf))
* Fix decode attention kernel with logits cap ([flashinfer-ai#350](flashinfer-ai#350)) ([f5f7a2](flashinfer-ai@f5f7a2a))
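The "logits cap" in the fix above refers to soft-capping attention logits before the softmax. A one-function sketch of the usual tanh-based form (an assumption about the shape of the option, not the kernel code): the cap bounds logits smoothly to `(-cap, cap)` while staying monotone and differentiable, unlike a hard clip.

```python
import math


def soft_cap(logit: float, cap: float) -> float:
    """Tanh-based soft clamp: near-identity for |logit| << cap,
    asymptotically bounded by +/- cap for large |logit|."""
    return cap * math.tanh(logit / cap)
```

Because `tanh(x) ≈ x` near zero, small logits pass through almost unchanged; only outliers get squashed, which keeps the softmax numerically tame.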
ci: remove redundant `NUM_FRAGS_Z` (flashinfer-ai#345) Do not compile `NUM_FRAGS_Z=6`, to reduce wheel size. Also revert flashinfer-ai#341, as it has no effect.
fix: disable other warp layouts because of large binary size (flashinfer-ai#326) Disable flashinfer-ai#322 for the v0.0.6 release because the binary size is too large; v0.0.6 will only include bugfixes for the moment.
ci: fix setuptools version (flashinfer-ai#319) We hit the following error when running the build scripts: ``` File "/tmp/home/.local/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 28, in <module> from pkg_resources import packaging # type: ignore[attr-defined] ImportError: cannot import name 'packaging' from 'pkg_resources' (/tmp/home/.local/lib/python3.10/site-packages/pkg_resources/__init__.py) ``` Let's follow [this workaround](https://stackoverflow.com/questions/78604018/importerror-cannot-import-name-packaging-from-pkg-resources-when-trying-to) and see whether it fixes the build.
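A setup fragment illustrating the usual workaround from the linked answer (the exact version bound is an assumption from that answer, not verified here): pin setuptools to a version that still ships the vendored `pkg_resources.packaging` re-export before running the build scripts.

```shell
# Assumed pin from the linked workaround: setuptools 70 dropped the
# `pkg_resources.packaging` re-export that torch.utils.cpp_extension imports,
# so constrain it below 70 in the build environment before building.
pip install --upgrade "setuptools<70" wheel
```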
fix: remove 8 from default page size (flashinfer-ai#233) To reduce pip wheel size.
fix: fatal bugfix in batch decode operator (flashinfer-ai#177) The `BatchDecodeWithPagedKVCacheWrapper` did not actually launch the kernel.
[Bugfix] Python package does not have `__version__` (flashinfer-ai#104) Also fixes some formatting issues in Python docstrings.