nbits=1 unpacking order and benchmarks #6

pravirkr · 2021-11-19T06:14:51Z

The order of the returned bits in unpack(array, nbits=1) is "little" (least significant bit first). Shouldn't it be big-endian, to be consistent?
Checked with numpy.unpackbits.
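
For reference, here is a small sketch of the two bit orders using only numpy. The sample byte is arbitrary, and the `bitorder` keyword assumes NumPy 1.17 or newer. It does not call numbits, so it only shows what the comparison target looks like.

```python
import numpy as np

# One arbitrary byte, chosen so the two orderings are visibly different.
packed = np.array([0b10110000], dtype=np.uint8)

# numpy's default is bitorder="big": most significant bit first.
big = np.unpackbits(packed)

# bitorder="little" returns the least significant bit first.
little = np.unpackbits(packed, bitorder="little")

print("big   :", big)
print("little:", little)
```

The two results are the reverse of each other within each byte, which is the difference described above.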

Also, the numpy functions are much faster. I benchmarked both using the script stress_numbits.py from #1:

OzSTAR 
python stress_numbits.py 32767000 1000
numbits
nbits=1: unpack array_shape=(32767000,), loop_count=1000, et=179.862s
nbits=1:   pack array_shape=(32767000,), loop_count=1000, et=73.307s
numpy
nbits=1: unpack array_shape=(32767000,), loop_count=1000, et=76.506s
nbits=1:   pack array_shape=(32767000,), loop_count=1000, et=44.053s

i7-7500U
python stress_numbits.py 32767000 1000
numbits
nbits=1: unpack array_shape=(32767000,), loop_count=1000, et=139.389s
nbits=1:   pack array_shape=(32767000,), loop_count=1000, et=85.698s
numpy
nbits=1: unpack array_shape=(32767000,), loop_count=1000, et=37.087s
nbits=1:   pack array_shape=(32767000,), loop_count=1000, et=32.116s
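
A rough sketch of how the numpy half of such a timing loop could look. This is not stress_numbits.py (that script is not reproduced in this thread); the helper name `time_call`, the array size, and the loop count are all made up for illustration and are kept small so it runs quickly.

```python
import time

import numpy as np


def time_call(func, arg, loop_count):
    """Return the elapsed wall-clock seconds for loop_count calls of func(arg)."""
    start = time.perf_counter()
    for _ in range(loop_count):
        func(arg)
    return time.perf_counter() - start


# Hypothetical sizes, far smaller than the runs quoted above.
loop_count = 100
rng = np.random.default_rng(0)
bits = rng.integers(0, 2, size=80_000, dtype=np.uint8)  # one bit per element
packed = np.packbits(bits)                              # eight bits per byte

et_unpack = time_call(np.unpackbits, packed, loop_count)
et_pack = time_call(np.packbits, bits, loop_count)

print(f"nbits=1: unpack array_shape={packed.shape}, loop_count={loop_count}, et={et_unpack:.3f}s")
print(f"nbits=1:   pack array_shape={bits.shape}, loop_count={loop_count}, et={et_pack:.3f}s")
```

Timing the numbits calls the same way would only need `func` swapped out, assuming they accept the same single-array call.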


telegraphic · 2021-11-19T07:05:32Z

Interesting!

Would we just need to edit https://github.com/telegraphic/numbits/blob/master/src/numbits.cpp#L44 to reverse the order?

After some digging the numpy code is here:
https://github.com/numpy/numpy/blob/5cc7ef066fca7a821a2160b095578384c301ae3c/numpy/core/src/multiarray/compiled_base.c#L1723

It looks insanely complicated, so I'm surprised it's faster. I suspect it's multithreading. Could you add this at the top of the script and see if the speed changes?

import os 
os.environ['OPENBLAS_NUM_THREADS'] = '1' 
os.environ['MKL_NUM_THREADS'] = '1'

(from https://stackoverflow.com/questions/17053671/how-do-you-stop-numpy-from-multithreading)

My understanding is that 'there be dragons' when using OpenMP with pybind because of the GIL, so matching numpy may not be straightforward if it is indeed multithreading.

pravirkr · 2021-11-19T08:16:55Z

Yes, I think reversing the order would do it: jj -> 8 - nbits * (jj + 1). I can open a PR and add some tests.
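
To make the index change concrete, here is a pure-numpy model of the shift arithmetic. It is only a sketch: `unpack_model` is a hypothetical helper, not the numbits C++ code, and it assumes the pre-fix shift was `nbits * jj`, which this thread does not show.

```python
import numpy as np


def unpack_model(packed, nbits, big_endian=True):
    """Model of per-byte unpacking; illustrative only, not the numbits code."""
    per_byte = 8 // nbits          # number of samples stored in each byte
    mask = (1 << nbits) - 1        # keeps the low nbits bits after shifting
    jj = np.arange(per_byte)
    if big_endian:
        shifts = 8 - nbits * (jj + 1)   # proposed mapping: highest bits first
    else:
        shifts = nbits * jj             # assumed old mapping: lowest bits first
    return ((packed[:, None] >> shifts) & mask).astype(np.uint8).ravel()


packed = np.array([0b10110000, 0b00000001], dtype=np.uint8)

new_order = unpack_model(packed, nbits=1, big_endian=True)
old_order = unpack_model(packed, nbits=1, big_endian=False)

print(new_order)
print(np.array_equal(new_order, np.unpackbits(packed)))
```

For nbits=1 the proposed mapping gives shifts 7, 6, ..., 0, so the output matches numpy.unpackbits with its default bit order. The assumed old mapping matches `bitorder="little"` instead.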

It could be multithreading. I ran the script again with the following set before importing numpy:

os.environ["MKL_NUM_THREADS"] = "1"
os.environ["NUMEXPR_NUM_THREADS"] = "1"
os.environ["OMP_NUM_THREADS"] = "1"
os.environ['OPENBLAS_NUM_THREADS'] = '1'

I get the same results on the i7-7500U (laptop), but somewhat different ones on the OzSTAR CPU (numpy is still faster).

numbits
nbits=1: unpack array_shape=(32767000,), loop_count=1000, et=286.129s
nbits=1:   pack array_shape=(32767000,), loop_count=1000, et=73.289s
numpy
nbits=1: unpack array_shape=(32767000,), loop_count=1000, et=170.583s
nbits=1:   pack array_shape=(32767000,), loop_count=1000, et=41.141s
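
For anyone repeating the single-thread run, the setup can be sketched as below. The variables have to be set before numpy is first imported, because thread pools are normally configured when the libraries load. They control BLAS/OpenMP backends; whether packbits/unpackbits use those backends at all is not established in this thread.

```python
import os

# Thread-limit variables used in the comment above; set them before
# numpy is imported anywhere in the process.
THREAD_VARS = (
    "MKL_NUM_THREADS",
    "NUMEXPR_NUM_THREADS",
    "OMP_NUM_THREADS",
    "OPENBLAS_NUM_THREADS",
)
for var in THREAD_VARS:
    os.environ[var] = "1"

import numpy as np  # noqa: E402  (deliberately imported after the env setup)

print({var: os.environ[var] for var in THREAD_VARS})
print("numpy", np.__version__)
```

If numpy was already imported earlier (for example by another module), setting these variables afterwards may have no effect.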

pravirkr · 2024-04-09T23:47:36Z

Fixed by #9

pravirkr mentioned this issue Nov 24, 2021

Fix for nbits=1 unpacking order #7

Merged

pravirkr mentioned this issue Apr 3, 2024

unpack() function deals 1-bit data differently with ewanbarr/sigpyproc FRBs/sigpyproc3#26

Closed

pravirkr closed this as completed Apr 9, 2024
