Skip to content

mert-kurttutan/pire

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Pire: Package for high performance cpu kernels

State of the art gemm-like kernels for cpus (using inline assembly)

Include quantized gemm, sgemm, hgemm, dgemm, integer gemm.

Working on putting more kernels used in many LLMs.

Features:

  • packed api for matrices
  • gemm+unary function fusion

State of the Art:

  • The same performance as MKL within 1% performance, you can check benchmark directory
  • Needed some optimization for layout with tn, nt (dot kernel in blis terminology)

Why this, not blis, or openblas?

  • I wanted to write somehting on my own so I can explore the path towards state of the art faster. I also wanted to write something in Rust since Rust had two features I liked very much, run time dispatch features target with #[target_feature(enable = "avx,avx2,fma")] and its inline assembly combined with its macro system seemed very convenient to write hand optimized assembly code for gemm kernels.
  • Those projects also dont offer several features I wanted: packed interface, unary function fusion, integer gemm, inline assembly for llm quantized gemm kernels (to be worked)

Another way to look at this project is to have collection most optimized cpu kernels for their respective gemm ops so that it can be used as a reference for high performance numerical jit kernels.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages