- Replace RMM with a simple self-written memory pool.
- Replace nanomsg with NNG.
- Use Cap'n Proto or FlatBuffers instead of Protobuf.
- In CMake, select the Boost.Histogram version according to the installed libboost-dev version.
- Allow the user to choose 64-bit or 32-bit precision.
- Try multiple streamWorker workers (git branch multiThreadMultiStream, e3cf1b4a92071e561ceed992cb18147915fd8f20).
- Test std::timed_mutex::try_lock_for against std::mutex::try_lock.
- Use cudaStreamQuery to poll all streams, or use cudaStreamAddCallback.
- Use streams on the GPU.
- Run cudaOccupancyMaxPotentialBlockSize once to get blockSize.
- Use shared memory in the kernel to reduce register usage per kernel thread.
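
Sketch for the std::timed_mutex::try_lock_for vs std::mutex::try_lock item: a minimal C++ harness, not the project's code. One thread holds the mutex while a second thread makes a single lock attempt and records whether it succeeded and how long the call blocked. The names Attempt, attemptWhileHeld, plainTryLock and timedTryLock are hypothetical helpers made up for this sketch. The timeout is an assumed parameter.

```cpp
#include <chrono>
#include <mutex>
#include <thread>

// Outcome of one lock attempt made while another thread holds the mutex.
struct Attempt {
    bool acquired;                               // did the attempt get the lock?
    std::chrono::steady_clock::duration waited;  // how long the call blocked
};

// Hypothetical helper: hold `m` on the calling thread, then let a second
// thread run `tryLock(m)` once and time the call. A second thread is needed
// because calling try_lock on a mutex the caller already owns is undefined.
template <typename Mutex, typename TryLock>
Attempt attemptWhileHeld(Mutex& m, TryLock tryLock) {
    std::lock_guard<Mutex> hold(m);  // keep the mutex locked for the whole attempt
    Attempt result{false, {}};
    std::thread contender([&] {
        const auto start = std::chrono::steady_clock::now();
        const bool ok = tryLock(m);
        result.waited = std::chrono::steady_clock::now() - start;
        result.acquired = ok;
        if (ok) m.unlock();  // not expected while the mutex is held
    });
    contender.join();  // join before `hold` releases the mutex
    return result;
}

// std::mutex::try_lock returns immediately when the mutex is busy.
Attempt plainTryLock() {
    std::mutex m;
    return attemptWhileHeld(m, [](std::mutex& mx) { return mx.try_lock(); });
}

// std::timed_mutex::try_lock_for blocks for up to `timeout` before giving up.
Attempt timedTryLock(std::chrono::milliseconds timeout) {
    std::timed_mutex m;
    return attemptWhileHeld(
        m, [timeout](std::timed_mutex& mx) { return mx.try_lock_for(timeout); });
}
```

Calling plainTryLock() and timedTryLock(std::chrono::milliseconds(20)) and comparing the two waited values shows the trade-off: try_lock lets a worker move on to other work right away, while try_lock_for stalls the worker for up to the timeout in exchange for a better chance of getting the lock. Build with thread support enabled (for example -pthread with GCC or Clang).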