Add interface to Guide object to update masks in place, and associated kernels. #183

Open · wants to merge 4 commits into main
Conversation

unaidedelf8777 (Contributor)
Towards resolving #178.

I added a write_mask_into method to the Guide object. This method takes 3 arguments:

  • data_ptr: pointer to the start of the contiguous memory for the array
  • numel: number of elements in the array
  • element_size: size in bytes of each element in the array. This must be 4, since only u32 arrays are supported; otherwise a ValueError is raised.

In a mask array, each u32 encodes the validity of 32 tokens (one per bit). Masks must also be stored in contiguous memory so that Rust can access and modify them.
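For reference, here is a minimal usage sketch from the Python side, assuming a NumPy-backed buffer (the Guide construction is elided and the vocabulary size is a placeholder):

```python
import numpy as np

# One bit per token, packed into u32 words: ceil(vocab_size / 32) elements.
vocab_size = 128_256  # placeholder; use the tokenizer's actual vocabulary size
mask = np.zeros((vocab_size + 31) // 32, dtype=np.uint32)

# `guide` is an existing Guide instance; the mask is filled in place.
guide.write_mask_into(
    mask.ctypes.data,  # data_ptr: address of the contiguous buffer
    mask.size,         # numel: number of u32 elements
    mask.itemsize,     # element_size: must be 4, otherwise a ValueError is raised
)
```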

Currently, kernels for both torch and numpy are implemented. The numpy kernels require an additional dependency on numba to bring runtime down to around 40 microseconds (1 mask, 1 logits array). Runtime for the torch kernel with 1 mask and 1 logits array is about half that of numpy, at ~23 microseconds per run, mostly thanks to torch.compile. The form of the numpy kernel is not final; it will be updated to scale better and use vectorized ops. If I can do so without hurting performance (or if you all would prefer), I will remove the dependency on numba.
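For intuition, here is a rough sketch of what a torch-side kernel of this kind can look like (not the exact implementation in this PR): each packed u32 word is expanded back into its 32 bits, and disallowed logits are set to -inf.

```python
import torch

@torch.compile
def apply_token_bitmask(logits: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    # mask: packed bitmask viewed as int32, shape (ceil(vocab / 32),)
    # logits: shape (vocab,)
    bits = torch.arange(32, device=mask.device, dtype=torch.int32)
    # Unpack each 32-bit word into individual bits, LSB first: (words, 32) -> (words * 32,)
    allowed = ((mask.unsqueeze(-1) >> bits) & 1).reshape(-1)[: logits.shape[-1]].bool()
    neg_inf = torch.tensor(float("-inf"), device=logits.device, dtype=logits.dtype)
    return torch.where(allowed, logits, neg_inf)
```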

All kernels reside in the outlines_core.kernels submodule, with each kernel's dependencies imported dynamically in a try/except block instead of being added to the package dependencies.
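Roughly, the optional-dependency pattern looks like this (module path and error message are illustrative, not the exact code):

```python
# outlines_core/kernels/numpy.py (illustrative)
try:
    import numba
    import numpy as np
except ImportError as e:
    raise ImportError(
        "The numpy kernels require numpy and numba to be installed. "
        "Install them with: pip install numpy numba"
    ) from e
```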

TODO:

  • write_mask_into method on Guide
  • numpy kernels
  • torch kernels
  • mlx kernels
  • Final pass over kernels; make sure they scale well to larger batch sizes.
  • Tests
  • Add benchmarks to ASV.

Please feel free to critique any of this.

unaidedelf8777 (Contributor Author) commented Feb 25, 2025:

Current benchmarks for write_mask_into, tested with the unsloth/Llama-3.1-8B-Instruct tokenizer:

Benchmarking regex: "[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?"
Time to write mask for regex: 93 useconds
Num Allowed tokens: 25510

Benchmarking regex: '\\\\+?[1-9][0-9]{7,14}'
Time to write mask for regex: 11 useconds
Num Allowed tokens: 4

Benchmarking regex: '([1-9]|0[1-9]|1[0-9]|2[0-9]|3[0-1])(\\.|-|/)([1-9]|0[1-9]|1[0-2])(\\.|-|/)([0-9][0-9]|19[0-9][0-9]|20[0-9][0-9])|([0-9][0-9]|19[0-9][0-9]|20[0-9][0-9])(\\.|-|/)([1-9]|0[1-9]|1[0-2])(\\.|-|/)([1-9]|0[1-9]|1[0-9]|2[0-9]|3[0-1])'
Time to write mask for regex: 8 useconds
Num Allowed tokens: 130

Benchmarking regex: '(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)'
Time to write mask for regex: 10 useconds
Num Allowed tokens: 366

Benchmarking regex: '(https?:\\/\\/)?([\\da-z\\.-]+)\\.([a-z\\.]{2,6})([\\/\\w \\.-]*)*\\/?'
Time to write mask for regex: 84 useconds
Num Allowed tokens: 23277

Benchmarking regex: '\\d{3}-\\d{2}-\\d{4}'
Time to write mask for regex: 13 useconds
Num Allowed tokens: 1222

rlouf (Member) commented Feb 25, 2025:

Awesome! Do you have some profiling results that show the time spent on each operation across the whole chain?


```python
# This takes roughly 23 microseconds per run, with a bitmask of
# 1k allowed tokens, and 128k logits tensor.
# Also compiles to one graph with no graph breaks
```
rlouf (Member) commented on these lines:

Is there any way to access the CUDA code generated by PyTorch? It might be over-engineering for now, but I'd like to get an idea of how efficient that code is and if there are gains to be had there in the future.

unaidedelf8777 (Contributor Author) replied Feb 25, 2025:

Seems possible - just have to find the temp directory where it dumps it: https://pytorch.org/tutorials/intermediate/inductor_debug_cpu.html
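For reference, the approach in that tutorial boils down to setting a debug flag before torch is imported, after which Inductor dumps the generated code (Triton on GPU, C++ on CPU) to a local directory:

```python
import os

# Must be set before torch is imported for Inductor to pick it up.
os.environ["TORCH_COMPILE_DEBUG"] = "1"

import torch
# ... compile and run the kernel once; the generated code then appears under
# ./torch_compile_debug/run_<timestamp>/.../output_code.py
```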

unaidedelf8777 (Contributor Author) replied:
> Awesome! Do you have some profiling results that show the time spent on each operation across the whole chain?

@rlouf For Rust, the kernels, or both?
