
Tags: octoml/flashinfer

v0.1.0

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
chore(main): release 0.1.0 (flashinfer-ai#373)



## [0.1.0](flashinfer-ai/flashinfer@v0.0.9...v0.1.0) (2024-07-17)


### Features

* Add mask to `merge_state_in_place`
([flashinfer-ai#372](flashinfer-ai#372))
([e14fa81](flashinfer-ai@e14fa81))
* expose pytorch api for block sparse attention
([flashinfer-ai#375](flashinfer-ai#375))
([4bba6fa](flashinfer-ai@4bba6fa))
* Fused GPU sampling kernel for joint top-k & top-p sampling
([flashinfer-ai#374](flashinfer-ai#374))
([6e028eb](flashinfer-ai@6e028eb))
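The fused kernel combines top-k and top-p (nucleus) filtering with sampling in a single pass. As a rough illustration of the filtering semantics only, here is a pure-Python sketch; it is not the flashinfer API, and the function name and arguments are hypothetical:

```python
import random

def top_k_top_p_sample(probs, top_k, top_p, rng=None):
    """Sample one token id from `probs` after joint top-k and top-p
    (nucleus) filtering. Reference sketch only; the fused GPU kernel
    performs this filtering and the draw in a single kernel launch."""
    rng = rng or random.Random()
    # Rank token ids by probability, descending.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = [], 0.0
    for rank, tok in enumerate(order):
        if rank >= top_k:      # top-k cutoff: keep at most k tokens
            break
        kept.append(tok)
        cum += probs[tok]
        if cum >= top_p:       # top-p cutoff: stop once mass reaches p
            break
    # Renormalize over the surviving tokens and draw one.
    total = sum(probs[t] for t in kept)
    r = rng.random() * total
    for tok in kept:
        r -= probs[tok]
        if r <= 0:
            return tok
    return kept[-1]
```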

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).


v0.0.9

chore(main): release 0.0.9 (flashinfer-ai#359)



## [0.0.9](flashinfer-ai/flashinfer@v0.0.8...v0.0.9) (2024-07-12)

### Bugfix

* fix the decode kernel segfault in cudagraph mode
([flashinfer-ai#368](https://github.com/flashinfer-ai/flashinfer/pull/368))
([c69cfa](https://github.com/flashinfer-ai/flashinfer/commit/c69cfabc540e4a7edd991713df10d575ff3b0c21))
* fix decode kernels output for empty kv cache
([flashinfer-ai#363](https://github.com/flashinfer-ai/flashinfer/pull/363))
([ac72b1](https://github.com/flashinfer-ai/flashinfer/commit/ac72b1cc14a6474d601f371c8d69e2600ac28d2f))
* check gpu id in PyTorch APIs and use input tensor's gpu default stream
([flashinfer-ai#361](https://github.com/flashinfer-ai/flashinfer/pull/361))
([1b84fa](https://github.com/flashinfer-ai/flashinfer/commit/1b84fab3e4f53fb4fa26952fdb46fa8018634057))

### Performance Improvements

* accelerate alibi
([flashinfer-ai#365](flashinfer-ai#365))
([4f0a9f9](flashinfer-ai@4f0a9f9))
* accelerate gqa performance
([flashinfer-ai#356](flashinfer-ai#356))
([e56ddad](flashinfer-ai@e56ddad))
* Optimize tensor conversions in C++ code to avoid unnecessary copies
([flashinfer-ai#366](flashinfer-ai#366))
([1116237](flashinfer-ai@1116237))
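For reference, ALiBi adds a per-head linear bias to attention scores in place of positional embeddings. A minimal pure-Python sketch of the bias the accelerated kernel computes (the geometric slope rule follows the ALiBi paper for power-of-two head counts; this is an illustration, not flashinfer's implementation):

```python
def alibi_slopes(num_heads):
    # Geometric slopes 2^(-8/n), 2^(-16/n), ..., 2^(-8): one per head.
    return [2.0 ** (-8.0 * (i + 1) / num_heads) for i in range(num_heads)]

def alibi_bias(slope, seq_len):
    # Causal ALiBi bias: 0 on the diagonal, -slope per token of distance
    # to the left; future positions are masked with -inf.
    return [
        [-slope * (i - j) if j <= i else float("-inf") for j in range(seq_len)]
        for i in range(seq_len)
    ]
```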

### Acknowledgement

We thank [@Yard1](https://github.com/Yard1),
[@Ying1123](https://github.com/Ying1123) and
[@zhyncs](https://github.com/zhyncs) for their contributions.


v0.0.8

bump version: v0.0.8 (flashinfer-ai#355)

## [0.0.8](flashinfer-ai/flashinfer@v0.0.7...v0.0.8) (2024-07-03)

### Bugfix

* fix prefill/append kernel behavior for empty kv-cache
([flashinfer-ai#353](flashinfer-ai#353))
([7adc8c](flashinfer-ai@7adc8cf))
* fix decode attention kernel with logits cap
([flashinfer-ai#350](flashinfer-ai#350))
([f5f7a2](flashinfer-ai@f5f7a2a))
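The "logits cap" here refers to tanh soft-capping of attention logits before the softmax. A minimal sketch of the transform (pure Python; the cap value 30.0 is illustrative, not flashinfer's default):

```python
import math

def soft_cap_logits(logits, cap=30.0):
    # Smoothly bound each logit to (-cap, cap): near-identity for small
    # values, saturating for large ones, and differentiable everywhere.
    return [cap * math.tanh(x / cap) for x in logits]
```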

v0.0.7

ci: remove redundant `NUM_FRAGS_Z` (flashinfer-ai#345)

Do not compile `NUM_FRAGS_Z=6`, to reduce wheel size.
Also revert flashinfer-ai#341, as it has no effect.

v0.0.6

fix: disable other warp layout because of large binary size (flashinfer-ai#326)

Disable flashinfer-ai#322 for the v0.0.6 release because the binary size is too large.
v0.0.6 will only include bugfixes for now.

v0.0.5

ci: fix setuptools version (flashinfer-ai#319)

We hit the following error when running the build scripts:
```
   File "/tmp/home/.local/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 28, in <module>
      from pkg_resources import packaging  # type: ignore[attr-defined]
  ImportError: cannot import name 'packaging' from 'pkg_resources' (/tmp/home/.local/lib/python3.10/site-packages/pkg_resources/__init__.py)
```

Let's follow
[this Stack Overflow answer](https://stackoverflow.com/questions/78604018/importerror-cannot-import-name-packaging-from-pkg-resources-when-trying-to)
and see whether it works.
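As I understand the linked discussion, the workaround is to pin setuptools below 70.0.0 (which, by assumption here, dropped the vendored `pkg_resources.packaging` that older torch versions import); upgrading PyTorch is the other way out. A constraints fragment for the build environment might look like:

```
# constraints.txt -- assumed workaround: stay on a setuptools release that
# still ships pkg_resources.packaging (reportedly removed in setuptools 70)
setuptools<70
```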

v0.0.4

fix: remove 8 from default page size (flashinfer-ai#233)

To reduce the pip wheel size.

v0.0.3

fix: fatal bugfix in batch decode operator (flashinfer-ai#177)

The `BatchDecodeWithPagedKVCacheWrapper` never actually launched the kernel.

v0.0.2

update v0.0.2

v0.0.1

[Bugfix] Python package do not have `__version__` (flashinfer-ai#104)

Also fix some formatting issues in the Python docstrings.