
Tags: octoml/flashinfer

v0.1.0

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
chore(main): release 0.1.0 (flashinfer-ai#373)



## [0.1.0](flashinfer-ai/flashinfer@v0.0.9...v0.1.0) (2024-07-17)


### Features

* Add mask to `merge_state_in_place`
([flashinfer-ai#372](flashinfer-ai#372))
([e14fa81](flashinfer-ai@e14fa81))
* expose pytorch api for block sparse attention
([flashinfer-ai#375](flashinfer-ai#375))
([4bba6fa](flashinfer-ai@4bba6fa))
* Fused GPU sampling kernel for joint top-k & top-p sampling
([flashinfer-ai#374](flashinfer-ai#374))
([6e028eb](flashinfer-ai@6e028eb))
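The fused kernel combines top-k and top-p (nucleus) filtering with sampling in a single pass. As a rough illustration of the filtering semantics only, here is a pure-Python sketch; it is not the flashinfer API, and the function name and arguments are hypothetical:

```python
import random

def top_k_top_p_sample(probs, top_k, top_p, rng=None):
    """Sample one token id from `probs` after joint top-k and top-p
    (nucleus) filtering. Reference sketch only; the fused GPU kernel
    performs this filtering and the draw in a single kernel launch."""
    rng = rng or random.Random()
    # Rank token ids by probability, descending.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = [], 0.0
    for rank, tok in enumerate(order):
        if rank >= top_k:      # top-k cutoff: keep at most k tokens
            break
        kept.append(tok)
        cum += probs[tok]
        if cum >= top_p:       # top-p cutoff: stop once mass reaches p
            break
    # Renormalize over the surviving tokens and draw one.
    total = sum(probs[t] for t in kept)
    r = rng.random() * total
    for tok in kept:
        r -= probs[tok]
        if r <= 0:
            return tok
    return kept[-1]
```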

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).


v0.0.9

chore(main): release 0.0.9 (flashinfer-ai#359)



## [0.0.9](flashinfer-ai/flashinfer@v0.0.8...v0.0.9) (2024-07-12)

### Bugfix

* fix the decode kernel segfault in cudagraph mode
([flashinfer-ai#368](https://github.com/flashinfer-ai/flashinfer/pull/368))
([c69cfa](https://github.com/flashinfer-ai/flashinfer/commit/c69cfabc540e4a7edd991713df10d575ff3b0c21))
* fix decode kernels output for empty kv cache
([flashinfer-ai#363](https://github.com/flashinfer-ai/flashinfer/pull/363))
([ac72b1](https://github.com/flashinfer-ai/flashinfer/commit/ac72b1cc14a6474d601f371c8d69e2600ac28d2f))
* check gpu id in PyTorch APIs and use input tensor's gpu default stream
([flashinfer-ai#361](https://github.com/flashinfer-ai/flashinfer/pull/361))
([1b84fa](https://github.com/flashinfer-ai/flashinfer/commit/1b84fab3e4f53fb4fa26952fdb46fa8018634057))

### Performance Improvements

* accelerate alibi
([flashinfer-ai#365](flashinfer-ai#365))
([4f0a9f9](flashinfer-ai@4f0a9f9))
* accelerate gqa performance
([flashinfer-ai#356](flashinfer-ai#356))
([e56ddad](flashinfer-ai@e56ddad))
* Optimize tensor conversions in C++ code to avoid unnecessary copies
([flashinfer-ai#366](flashinfer-ai#366))
([1116237](flashinfer-ai@1116237))
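For reference, ALiBi adds a per-head linear bias to attention scores in place of positional embeddings. A minimal pure-Python sketch of the bias the accelerated kernel computes (the geometric slope rule follows the ALiBi paper for power-of-two head counts; this is an illustration, not flashinfer's implementation):

```python
def alibi_slopes(num_heads):
    # Geometric slopes 2^(-8/n), 2^(-16/n), ..., 2^(-8): one per head.
    return [2.0 ** (-8.0 * (i + 1) / num_heads) for i in range(num_heads)]

def alibi_bias(slope, seq_len):
    # Causal ALiBi bias: 0 on the diagonal, -slope per token of distance
    # to the left; future positions are masked with -inf.
    return [
        [-slope * (i - j) if j <= i else float("-inf") for j in range(seq_len)]
        for i in range(seq_len)
    ]
```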

### Acknowledgement

We thank [@Yard1](https://github.com/Yard1),
[@Ying1123](https://github.com/Ying1123) and
[@zhyncs](https://github.com/zhyncs) for their contributions.


v0.0.8

bump version: v0.0.8 (flashinfer-ai#355)

## [0.0.8](flashinfer-ai/flashinfer@v0.0.7...v0.0.8) (2024-07-03)

### Bugfix

* fix prefill/append kernel behavior for empty kv-cache
([flashinfer-ai#353](flashinfer-ai#353))
([7adc8c](flashinfer-ai@7adc8cf))
* fix decode attention kernel with logits cap
([flashinfer-ai#350](flashinfer-ai#350))
([f5f7a2](flashinfer-ai@f5f7a2a))
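The "logits cap" here refers to tanh soft-capping of attention logits before the softmax. A minimal sketch of the transform (pure Python; the cap value 30.0 is illustrative, not flashinfer's default):

```python
import math

def soft_cap_logits(logits, cap=30.0):
    # Smoothly bound each logit to (-cap, cap): near-identity for small
    # values, saturating for large ones, and differentiable everywhere.
    return [cap * math.tanh(x / cap) for x in logits]
```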

v0.0.7

ci: remove redundant `NUM_FRAGS_Z` (flashinfer-ai#345)

Do not compile `NUM_FRAGS_Z=6`, to reduce wheel size.
Also revert flashinfer-ai#341, as it has no effect.

v0.0.6

fix: disable other warp layout because of large binary size (flashinfer-ai#326)

Disable flashinfer-ai#322 for the v0.0.6 release because the binary size is too large.
v0.0.6 will only include bugfixes for now.

v0.0.5

ci: fix setuptools version (flashinfer-ai#319)

We hit the following error when running the build scripts:
```
   File "/tmp/home/.local/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 28, in <module>
      from pkg_resources import packaging  # type: ignore[attr-defined]
  ImportError: cannot import name 'packaging' from 'pkg_resources' (/tmp/home/.local/lib/python3.10/site-packages/pkg_resources/__init__.py)
```

Let's follow
[this Stack Overflow answer](https://stackoverflow.com/questions/78604018/importerror-cannot-import-name-packaging-from-pkg-resources-when-trying-to)
and see whether it works.
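As I understand the linked discussion, the workaround is to pin setuptools below 70.0.0 (which, by assumption here, dropped the vendored `pkg_resources.packaging` that older torch versions import); upgrading PyTorch is the other way out. A constraints fragment for the build environment might look like:

```
# constraints.txt -- assumed workaround: stay on a setuptools release that
# still ships pkg_resources.packaging (reportedly removed in setuptools 70)
setuptools<70
```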

v0.0.4

fix: remove 8 from default page size (flashinfer-ai#233)

To reduce the pip wheel size.

v0.0.3

fix: fatal bugfix in batch decode operator (flashinfer-ai#177)

The `BatchDecodeWithPagedKVCacheWrapper` never actually launched the kernel.

v0.0.2

update v0.0.2

v0.0.1

[Bugfix] Python package do not have `__version__` (flashinfer-ai#104)

Also fix some formatting issues in the Python docstrings.