nbits=1 unpacking order and benchmarks #6

Closed
pravirkr opened this issue Nov 19, 2021 · 3 comments

Comments

@pravirkr
Collaborator

The order of the returned bits in unpack(array, nbits=1) is "little" (least significant bit first). Shouldn't it be "big" (most significant bit first), to be consistent?
Checked with numpy.unpackbits.
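For reference, the two orders can be reproduced with numpy alone, since numpy.unpackbits accepts a bitorder argument (NumPy 1.17 and later) and defaults to "big". This is only a sketch of what the two conventions look like for one byte; it does not call numbits, and the claim that numbits currently returns the "little" ordering comes from the report above.

```python
import numpy as np

# One byte with only its two most significant bits set: 0b11000000 == 192.
packed = np.array([0b11000000], dtype=np.uint8)

# Default numpy behaviour: most significant bit first ("big").
big = np.unpackbits(packed, bitorder="big")

# Least significant bit first ("little"), the order reported for numbits.
little = np.unpackbits(packed, bitorder="little")

print("big   :", big.tolist())
print("little:", little.tolist())
```

The two outputs are mirror images of each other within each byte, so a test comparing numbits against numpy only passes when both use the same order.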

Also, the numpy function is much faster (about 1.7x to 3.8x in the runs below, where et is the elapsed time). Benchmarked using the script stress_numbits.py from #1.

OzSTAR 
python stress_numbits.py 32767000 1000
numbits
nbits=1: unpack array_shape=(32767000,), loop_count=1000, et=179.862s
nbits=1:   pack array_shape=(32767000,), loop_count=1000, et=73.307s
numpy
nbits=1: unpack array_shape=(32767000,), loop_count=1000, et=76.506s
nbits=1:   pack array_shape=(32767000,), loop_count=1000, et=44.053s
i7-7500U
python stress_numbits.py 32767000 1000
numbits
nbits=1: unpack array_shape=(32767000,), loop_count=1000, et=139.389s
nbits=1:   pack array_shape=(32767000,), loop_count=1000, et=85.698s
numpy
nbits=1: unpack array_shape=(32767000,), loop_count=1000, et=37.087s
nbits=1:   pack array_shape=(32767000,), loop_count=1000, et=32.116s
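The numpy half of such a timing run can be approximated with a short loop. This is not the stress_numbits.py script from #1, which is not shown in this thread; the function name time_numpy_bits and the choice to treat the array as packed bytes are assumptions made for illustration, and the sizes are reduced so it finishes quickly.

```python
import time

import numpy as np


def time_numpy_bits(n_bytes, loop_count):
    """Return (unpack_seconds, pack_seconds) for numpy's 1-bit routines.

    Hypothetical stand-in for a stress script: each call is repeated
    loop_count times and the total elapsed wall-clock time is reported.
    """
    rng = np.random.default_rng(0)
    packed = rng.integers(0, 256, size=n_bytes, dtype=np.uint8)
    unpacked = np.unpackbits(packed)  # 8 values (0 or 1) per input byte

    start = time.perf_counter()
    for _ in range(loop_count):
        np.unpackbits(packed)
    t_unpack = time.perf_counter() - start

    start = time.perf_counter()
    for _ in range(loop_count):
        np.packbits(unpacked)
    t_pack = time.perf_counter() - start

    return t_unpack, t_pack


t_unpack, t_pack = time_numpy_bits(n_bytes=1_000_000, loop_count=10)
print(f"nbits=1: unpack loop_count=10, et={t_unpack:.3f}s")
print(f"nbits=1:   pack loop_count=10, et={t_pack:.3f}s")
```

Timing the same loop around the numbits calls on the same input would give a like-for-like comparison on a given machine.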
@telegraphic
Owner

Interesting!

Would we just need to edit https://github.com/telegraphic/numbits/blob/master/src/numbits.cpp#L44 to reverse the order?

After some digging the numpy code is here:
https://github.com/numpy/numpy/blob/5cc7ef066fca7a821a2160b095578384c301ae3c/numpy/core/src/multiarray/compiled_base.c#L1723

It looks insanely complicated, so I'm surprised it's faster. I suspect it's multithreading. Could you add this at the top of the script and see if the speed changes?

import os
# Set these before importing numpy so the thread limits take effect.
os.environ['OPENBLAS_NUM_THREADS'] = '1'
os.environ['MKL_NUM_THREADS'] = '1'

(from https://stackoverflow.com/questions/17053671/how-do-you-stop-numpy-from-multithreading)

My understanding is that 'there be dragons' when combining openmp with pybind because of the GIL, so it may not be straightforward to match numpy, if it is indeed multithreading.

@pravirkr
Collaborator Author

Yes, I think reversing the order would do it: jj -> 8 - nbits * (jj + 1). I can open a PR and add some tests.
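To make the proposed shift concrete, here is a pure-Python sketch of unpacking a single byte with either ordering. It is not the code in numbits.cpp: the function unpack_byte, its big_endian flag, and the assumption that the existing "little" behaviour corresponds to a shift of nbits * jj are all illustrative. Only the big-endian shift 8 - nbits * (jj + 1) is taken from the comment above.

```python
def unpack_byte(byte, nbits, big_endian=True):
    """Split one byte into 8 // nbits values of nbits bits each.

    Hypothetical helper, for illustration only.
    """
    mask = (1 << nbits) - 1  # keeps the lowest nbits bits
    values = []
    for jj in range(8 // nbits):
        if big_endian:
            # Proposed shift: the first value comes from the top bits.
            shift = 8 - nbits * (jj + 1)
        else:
            # Assumed current behaviour: the first value comes from the low bits.
            shift = nbits * jj
        values.append((byte >> shift) & mask)
    return values


print(unpack_byte(0b11000000, nbits=1))                    # big-endian
print(unpack_byte(0b11000000, nbits=1, big_endian=False))  # little-endian
print(unpack_byte(0b11100100, nbits=2))                    # 2-bit fields
```

For nbits=1 the big-endian shifts run 7, 6, ..., 0, which is the order numpy.unpackbits uses by default.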

It could be multithreading. I ran the script again with the following set before importing numpy.

os.environ["MKL_NUM_THREADS"] = "1"
os.environ["NUMEXPR_NUM_THREADS"] = "1"
os.environ["OMP_NUM_THREADS"] = "1"
os.environ['OPENBLAS_NUM_THREADS'] = '1'

I get the same results on the i7-7500U (laptop), but somewhat different ones on the OzSTAR CPU (numpy is still faster).

numbits
nbits=1: unpack array_shape=(32767000,), loop_count=1000, et=286.129s
nbits=1:   pack array_shape=(32767000,), loop_count=1000, et=73.289s
numpy
nbits=1: unpack array_shape=(32767000,), loop_count=1000, et=170.583s
nbits=1:   pack array_shape=(32767000,), loop_count=1000, et=41.141s

@pravirkr
Collaborator Author

pravirkr commented Apr 9, 2024

Fixed by #9

@pravirkr pravirkr closed this as completed Apr 9, 2024