Add Speculative Decoding to picoGPT
This PR implements speculative decoding to improve text generation performance in picoGPT. A smaller draft model (124M) proposes several tokens ahead, and the main model then verifies them. This can reduce the number of main-model forward passes needed for generation.
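The sketch below makes "reducing the number of forward passes" concrete. It uses a simplification common in the speculative decoding literature: each drafted token is accepted independently with probability `alpha`. Under that assumption, one round with `n_speculative = k` drafted tokens yields on average (1 - alpha^(k+1)) / (1 - alpha) tokens per main-model pass. The helper name `expected_tokens_per_pass` is hypothetical and is not part of this PR.

```python
def expected_tokens_per_pass(alpha: float, n_speculative: int) -> float:
    """Expected tokens produced per main-model forward pass.

    Assumes (a simplification) that each drafted token is accepted
    independently with probability `alpha`. The main model always
    contributes one extra token per round, either a correction or a bonus.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if alpha == 1.0:
        # Every draft is accepted, plus the main model's bonus token.
        return float(n_speculative + 1)
    # Closed form of sum_{i=0}^{k-1} (i+1) a^i (1-a) + (k+1) a^k.
    return (1.0 - alpha ** (n_speculative + 1)) / (1.0 - alpha)


for k in (1, 2, 4, 8):
    print(k, round(expected_tokens_per_pass(0.8, k), 3))
```

With `alpha = 0.8`, the gain flattens as `k` grows: roughly 1.8, 2.44, 3.36 and 4.33 tokens per main-model pass for k = 1, 2, 4, 8. This figure counts only main-model passes. Each round also costs `k` draft-model passes, so the wall-clock speedup is smaller than this ratio.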
Key Changes
- Added `generate_speculative()` function in `gpt2.py` that implements speculative decoding
- Modified `main()` function to support both standard and speculative generation modes
  - `use_generate_speculative` flag (defaults to `True`)
- Added `benchmark_speculative.py` for performance comparison

Implementation Details
The speculative decoding algorithm works as follows:
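Below is a minimal greedy sketch of the draft-then-verify loop. It is not the PR's `generate_speculative()`. The `Model` callable interface is an assumption: it returns the greedy next-token prediction after every prefix, roughly like taking an argmax over per-position logits from a GPT-2 forward pass. The function name `generate_speculative_sketch` is hypothetical. Sampling-based acceptance (rejection sampling on probabilities) is omitted; only greedy verification is shown.

```python
from typing import Callable, List

# Hypothetical model interface (an assumption, not picoGPT's API): given a
# token sequence, return the greedy next-token prediction after every prefix,
# i.e. preds[i] is the argmax token that follows tokens[: i + 1].
Model = Callable[[List[int]], List[int]]


def generate_speculative_sketch(
    inputs: List[int],
    draft: Model,
    target: Model,
    n_tokens_to_generate: int,
    n_speculative: int = 4,
) -> List[int]:
    """Greedy speculative decoding; output equals greedy decoding with `target`."""
    assert inputs, "need at least one prompt token"
    tokens = list(inputs)
    goal = len(inputs) + n_tokens_to_generate
    while len(tokens) < goal:
        # 1. Draft: the small model proposes n_speculative tokens, one at a time.
        proposal: List[int] = []
        for _ in range(n_speculative):
            proposal.append(draft(tokens + proposal)[-1])

        # 2. Verify: a single target pass scores every drafted position at once.
        preds = target(tokens + proposal)
        base = len(tokens) - 1  # preds[base] follows the current context
        accepted: List[int] = []
        for i, tok in enumerate(proposal):
            if preds[base + i] != tok:
                break  # first disagreement: discard this and later drafts
            accepted.append(tok)

        # 3. The target always contributes one token: its correction at the
        #    first mismatch, or a bonus token if every draft was accepted.
        accepted.append(preds[base + len(accepted)])
        tokens.extend(accepted)
    return tokens[len(inputs):goal]
```

Each loop iteration costs one target forward pass plus `n_speculative` draft passes. It advances the sequence by between 1 and `n_speculative + 1` tokens. Because verification is greedy, the output is identical to plain greedy decoding with the target model, even when the draft model is sometimes wrong. Only the number of target calls changes. Setting `n_speculative = 0` reduces the loop to ordinary greedy decoding.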
Benefits
Testing
The PR includes a benchmarking script, `benchmark_speculative.py`, that compares standard and speculative generation across different model sizes. Results can be reproduced by running:
Notes
- `n_speculative` parameter

Future Work