Multihead support for ITA #16

Merged 4 commits on Feb 28, 2024

@Xeratec Xeratec commented Feb 27, 2024

This pull request adds support for multi-head self-attention (MHSA) with ITA by decoupling the four ITA cores in MemPool and supporting different input matrix shapes (without precisely modelling the hardware's tiling). It also fetches the requantization parameter from memory and lets the user specify where the computation result is written.

Finally, to reduce the output generated when running Banshee in debug mode, I propose removing the log output for aligned and unaligned memcpy operations and switching some very frequently produced messages from debug to trace.

Features

  • Added configuration file for MemPool with ITA

Changes

  • ITA: Load RQS parameter from memory
  • ITA: Support user-specified output address for results
  • ITA: Support 32-bit row-wise biases
  • ITA: Support different matrix shapes
  • ITA: Fetch Q and K always from ITA Core 0
  • ITA: Vectorize computations
  • Clean up debug messages
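
As a rough illustration of the bias and requantization items above, here is a minimal sketch, not the code from this PR. The `RequantParams` struct, its field names, the multiply-shift-add scheme, and the reading of "row-wise" as one 32-bit bias value per output row are all assumptions made for the example.

```rust
/// Hypothetical requantization parameters. The field names and the
/// multiply-shift-add scheme are assumptions for this sketch, not the
/// layout used by Banshee or the ITA hardware.
#[derive(Clone, Copy)]
pub struct RequantParams {
    pub mult: i32,  // multiplier applied to the biased accumulator
    pub shift: u32, // arithmetic right shift applied after the multiply (must be < 64)
    pub add: i32,   // offset added after the shift
}

/// Adds one 32-bit bias to every accumulator in a row, then requantizes
/// each value to a saturated signed 8-bit result.
pub fn requantize_row(acc: &[i32], bias: i32, p: RequantParams) -> Vec<i8> {
    acc.iter()
        .map(|&a| {
            // Widen to i64 so that neither the bias add nor the multiply can overflow.
            let biased = a as i64 + bias as i64;
            let scaled = (biased * p.mult as i64) >> p.shift;
            (scaled + p.add as i64).clamp(i8::MIN as i64, i8::MAX as i64) as i8
        })
        .collect()
}
```

In a simulator, `mult`, `shift`, and `add` would be read from the simulated memory rather than hard-coded, which is the point of the "Load RQS parameter from memory" change.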

Fix

  • ITA: Fix overflow bug in streaming_partial_softmax

Important Note

Softmax values are currently capped at 127 because the hardware's sumdot modules can only perform signed-signed operations for now. This is a temporary workaround until sumdot is fixed.
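
To make the cap concrete, here is a minimal sketch under stated assumptions: softmax outputs are taken to be unsigned 8-bit values, and `saturate_softmax` and `signed_dot` are hypothetical helper names, not functions from Banshee. An unsigned value above 127 would turn negative when reinterpreted as a signed 8-bit operand, so it is saturated first.

```rust
/// Saturates an unsigned 8-bit softmax value at 127 so that it stays
/// non-negative when used as a signed 8-bit operand.
/// (Hypothetical helper; the name is an assumption for this sketch.)
pub fn saturate_softmax(value: u8) -> i8 {
    value.min(i8::MAX as u8) as i8
}

/// Signed-signed dot product with 32-bit accumulation, standing in for
/// what a signed-only sumdot unit computes.
pub fn signed_dot(a: &[i8], v: &[i8]) -> i32 {
    a.iter().zip(v).map(|(&x, &y)| x as i32 * y as i32).sum()
}
```

Without the saturation, a softmax value of 200 cast directly to `i8` would become -56 and flip the sign of its contribution to the dot product.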

@viv-eth viv-eth enabled auto-merge (squash) February 28, 2024 16:59
@viv-eth viv-eth merged commit 29607d2 into pulp-platform:main Feb 28, 2024
2 checks passed
@Xeratec Xeratec deleted the pr/ita_multihead branch April 17, 2024 13:20