[pull] main from ashvardanian:main #18

Merged 3 commits on Feb 12, 2025
2 changes: 2 additions & 0 deletions .vscode/settings.json
@@ -122,7 +122,9 @@
 "unscalable",
 "Uring",
 "Vardanian",
+"Verilog",
 "vfmadd",
+"VHDL",
 "VNNI",
 "VPCLMULQDQ",
 "Weis",
2 changes: 1 addition & 1 deletion CMakeLists.txt
@@ -8,7 +8,7 @@ cmake_minimum_required(VERSION 3.25.2 FATAL_ERROR)
 # Project Setup
 # ------------------------------------------------------------------------------
 project(less_slow
-VERSION 0.9.0
+VERSION 0.9.1
 LANGUAGES C CXX ASM
 DESCRIPTION "Learning how to write Less Slow code, from numerical micro-kernels and SIMD to coroutines, ranges, and polymorphic state machines"
 HOMEPAGE_URL "https://github.com/ashvardanian/less_slow.cpp")
20 changes: 19 additions & 1 deletion README.md
@@ -34,7 +34,8 @@ Some of the highlights include:
 - __CUDA C++, [PTX](https://en.wikipedia.org/wiki/Parallel_Thread_Execution) Intermediate Representations, and SASS__, and how do they differ from CPU code?
 - __How to choose between intrinsics, inline `asm`, and separate `.S` files__ for your performance-critical code?
 - __Tensor Cores & Memory__ differences on CPUs, and Volta, Ampere, Hopper, and Blackwell GPUs!
-- __What are Encrypted Enclaves__ and what's the latency of Intel SGX, AMD SEV, and ARM Realm? 🔜
+- __How coding FPGA differs from GPU__ and what is High-Level Synthesis, Verilog, and VHDL? 🔜 #36
+- __What are Encrypted Enclaves__ and what's the latency of Intel SGX, AMD SEV, and ARM Realm? 🔜 #31

 To read, jump to the [`less_slow.cpp` source file](https://github.com/ashvardanian/less_slow.cpp/blob/main/less_slow.cpp) and read the code snippets and comments.
 Follow the instructions below to run the code in your environment and compare it to the comments as you read through the source.
@@ -109,6 +110,23 @@ Alternatively, use the Linux `perf` tool for performance counter collection:
 sudo perf stat taskset 0xEFFFEFFFEFFFEFFFEFFFEFFFEFFFEFFF build_release/less_slow --benchmark_enable_random_interleaving=true --benchmark_filter=super_sort
 ```

+## Project Structure
+
+The primary file of this repository is clearly the `less_slow.cpp` C++ file with CPU-side code.
+Several other files for different hardware-specific optimizations are created:
+
+```sh
+$ tree .
+.
+├── CMakeLists.txt # Build & assembly instructions for all files
+├── less_slow.cpp # Primary CPU-side benchmarking code with the majority of examples
+├── less_slow_amd64.S # Hand-written Assembly kernels for 64-bit x86 CPUs
+├── less_slow_aarch64.S # Hand-written Assembly kernels for 64-bit Arm CPUs
+├── less_slow.cu # CUDA C++ examples for parallel algorithms for Nvidia GPUs
+├── less_slow_sm70.ptx # Hand-written PTX IR kernels for Nvidia Volta GPUs
+└── less_slow_sm90a.ptx # Hand-written PTX IR kernels for Nvidia Hopper GPUs
+```
+
 ## Memes and References

 Educational content without memes?!
2 changes: 1 addition & 1 deletion VERSION
@@ -1 +1 @@
-0.9.0
+0.9.1
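
This change bumps the version string in two places, `CMakeLists.txt` and `VERSION`, so the two can drift apart if only one is edited. The following is a minimal sketch of a consistency check, not something that exists in the repository: the helper names `cmake_project_version` and `versions_agree` are hypothetical, and the regular expression assumes the project name is immediately followed by the `VERSION` keyword, as in the hunk above.

```python
import re
from typing import Optional

# Assumption: `project(<name> VERSION <x.y.z> ...)` with VERSION right after the
# name, matching the layout shown in the CMakeLists.txt hunk of this change.
_PROJECT_VERSION = re.compile(
    r"\bproject\s*\(\s*[\w.-]+\s+VERSION\s+(\d+(?:\.\d+){0,3})"
)


def cmake_project_version(cmake_text: str) -> Optional[str]:
    """Return the version passed to the first project() call, or None if absent."""
    match = _PROJECT_VERSION.search(cmake_text)
    return match.group(1) if match else None


def versions_agree(cmake_text: str, version_file_text: str) -> bool:
    """True when the project() version equals the VERSION file contents."""
    return cmake_project_version(cmake_text) == version_file_text.strip()


if __name__ == "__main__":
    sample_cmake = (
        "cmake_minimum_required(VERSION 3.25.2 FATAL_ERROR)\n"
        "project(less_slow\n"
        "    VERSION 0.9.1\n"
        "    LANGUAGES C CXX ASM)\n"
    )
    print(versions_agree(sample_cmake, "0.9.1\n"))
```

The pattern anchors on `project(` so the `VERSION 3.25.2` argument of `cmake_minimum_required` is not picked up by mistake.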