llama.cpp is an open-source C/C++ implementation of inference for Meta's LLaMA family of models. It is designed to run large language models efficiently on a wide range of hardware, including consumer-grade devices.
Platform-specific instructions and scripts used for LLM-Inference-Bench.
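As a general point of reference, the sketch below shows one common way to build llama.cpp and run its bundled `llama-bench` tool against a GGUF model. The CMake options, binary paths, and model file shown here are assumptions for illustration; the exact flags and scripts used by LLM-Inference-Bench on each platform may differ.

```bash
# Clone and build llama.cpp. The GGML_CUDA option enables the CUDA backend
# on NVIDIA GPUs; build flag names have varied across llama.cpp versions.
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release

# Run the bundled benchmark tool against a GGUF model (path is a placeholder):
#   -p    prompt (prefill) length in tokens
#   -n    number of tokens to generate
#   -ngl  number of layers to offload to the GPU
./build/bin/llama-bench -m /path/to/model.gguf -p 512 -n 128 -ngl 99
```

`llama-bench` reports prefill and generation throughput (tokens per second) for each configuration, which is the kind of measurement the platform-specific scripts in this repository automate.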