
Adds speculative decoding #27

Open · wants to merge 1 commit into main
Conversation

@yashkgp yashkgp commented Nov 17, 2024

Add Speculative Decoding to picoGPT

This PR implements speculative decoding to improve text generation performance in picoGPT. The implementation uses a smaller draft model (124M) to cheaply propose several tokens ahead, which the main model then verifies in a single forward pass, potentially reducing the number of expensive main-model forward passes needed for generation.

Key Changes

  1. Added generate_speculative() function in gpt2.py that implements:

    • Draft model token generation (default: 3 tokens at a time)
    • Main model verification of speculative tokens
    • Acceptance/rejection mechanism for speculative predictions
  2. Modified main() function to support both standard and speculative generation modes

    • Added use_generate_speculative flag (defaults to True)
    • Loads both main and draft model parameters
    • Routes to appropriate generation function based on the flag
  3. Added benchmark_speculative.py for performance comparison:

    • Benchmarks both standard and speculative generation
    • Supports multiple model sizes (124M, 355M)
    • Includes warm-up runs for more accurate measurements
    • Reports generation time and percentage improvement
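The flag-based routing in `main()` can be sketched roughly as follows. This is a minimal illustration of the wiring described above, not the PR's actual code: the generator functions here are stand-ins, and picoGPT's real `main()` also handles tokenization and parameter loading.

```python
# stand-in generators so the routing is runnable on its own;
# in the PR these would be picoGPT's real generation functions
def generate(prompt_ids, n_tokens):
    return prompt_ids + ["std"] * n_tokens

def generate_speculative(prompt_ids, n_tokens, n_speculative=3):
    return prompt_ids + ["spec"] * n_tokens

def run(prompt_ids, n_tokens, use_generate_speculative=True):
    # route to the appropriate generation function based on the flag,
    # as main() in the PR does (names here are illustrative)
    if use_generate_speculative:
        return generate_speculative(prompt_ids, n_tokens)
    return generate(prompt_ids, n_tokens)
```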

Implementation Details

The speculative decoding algorithm works as follows:

  1. Draft model (124M) generates N speculative tokens (default N=3)
  2. Main model verifies these predictions in a single forward pass
  3. Accepted tokens are added to the sequence
  4. If a draft token is rejected, it and all later draft tokens are discarded, and the main model's own prediction for that position is used instead

Benefits

  • Potential speedup in text generation: each main-model forward pass can yield multiple accepted tokens
  • Configurable number of speculative tokens
  • Minimal memory overhead (reuses existing 124M model as draft model)
  • Easy to toggle between standard and speculative modes

Testing

The PR includes a benchmarking script that measures performance improvements across different model sizes. Results can be reproduced by running:

python benchmark_speculative.py
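The timing harness can be sketched as below. This is an illustration of the measurement approach described above (warm-up runs, averaged wall-clock time, percentage improvement), with hypothetical names; it is not the PR's `benchmark_speculative.py` verbatim.

```python
import time

def benchmark(label, generate_fn, n_warmup=1, n_runs=3):
    """Time a generation callable, excluding one-time costs via warm-up runs."""
    for _ in range(n_warmup):
        generate_fn()  # warm-up: caches, lazy loading, etc.
    times = []
    for _ in range(n_runs):
        start = time.perf_counter()
        generate_fn()
        times.append(time.perf_counter() - start)
    avg = sum(times) / len(times)
    print(f"{label}: {avg:.3f}s avg over {n_runs} runs")
    return avg

def improvement(t_standard, t_speculative):
    # percentage improvement of speculative over standard generation
    return 100.0 * (t_standard - t_speculative) / t_standard
```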

Notes

  • The draft model is fixed to 124M for simplicity, but this could be made configurable
  • The number of speculative tokens (N=3) can be adjusted through the n_speculative parameter
  • Implementation is kept minimal and numpy-based, consistent with picoGPT's philosophy

Future Work

  • Make draft model size configurable
  • Add adaptive speculation length based on acceptance rate
  • Optimize verification step for better performance
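One possible shape for the adaptive speculation length mentioned above: grow `n_spec` when the draft model's recent acceptance rate is high and shrink it when proposals are mostly rejected. The thresholds and bounds here are hypothetical, not part of the PR.

```python
def adapt_n_spec(n_spec, acceptance_rate, lo=0.5, hi=0.9, n_min=1, n_max=8):
    # hypothetical rule: speculate further when the draft is usually right,
    # back off when verification keeps rejecting its proposals
    if acceptance_rate > hi:
        return min(n_spec + 1, n_max)
    if acceptance_rate < lo:
        return max(n_spec - 1, n_min)
    return n_spec
```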
