GPT-NeoX allocating full-length KV cache (octoml#179)
This PR changes the GPT-NeoX KV cache creation function to allocate the cache at its full size up front, so no further memory allocation is required while the model is running.
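The idea can be illustrated with a minimal sketch. This is not the code from the PR: the class name `PreallocatedKVCache`, its parameters, and the use of plain Python lists instead of framework tensors are all assumptions made for illustration. It only shows the general pattern of allocating the buffers once at full length and then writing into them in place.

```python
class PreallocatedKVCache:
    """Hypothetical fixed-capacity KV cache for a single attention layer."""

    def __init__(self, max_seq_len, num_heads, head_dim):
        self.max_seq_len = max_seq_len
        self.row_size = num_heads * head_dim
        # Allocate both buffers once, at full length, before any token is seen.
        self.keys = [0.0] * (max_seq_len * self.row_size)
        self.values = [0.0] * (max_seq_len * self.row_size)
        self.length = 0  # number of token positions written so far

    def append(self, key_row, value_row):
        """Write one token's keys and values into the next free slot."""
        if self.length >= self.max_seq_len:
            raise IndexError("KV cache is full")
        if len(key_row) != self.row_size or len(value_row) != self.row_size:
            raise ValueError("row has the wrong number of elements")
        start = self.length * self.row_size
        end = start + self.row_size
        # Equal-length slice assignment overwrites in place; the buffers never grow.
        self.keys[start:end] = key_row
        self.values[start:end] = value_row
        self.length += 1

    def filled(self):
        """Return copies of the populated prefix of each buffer."""
        end = self.length * self.row_size
        return self.keys[:end], self.values[:end]
```

In this sketch the buffer sizes are fixed at construction, so appending a token only advances `length`. The trade-off is that memory for the maximum sequence length is reserved even when the actual sequence is short.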