Is there a possibility to support SYCL natively as a backend for llamafile? #498
-
I think someone from Intel was working on this four months ago, but it was a weekend project and he ran into difficulties getting Windows to work. Integrating foreign code into Cosmopolitan Libc presents unique challenges with a nontrivial learning curve. I'm the person most qualified to integrate something like SYCL, but I don't have any devices that need SYCL, so there's no reason for me to do it. Contributions are very much welcome if someone outside the project is up for the challenge of maintaining llamafile SYCL support; otherwise it's unlikely to happen. We obviously support Intel very much as a project. However, I believe our resources are best devoted to outstanding support for Intel's CPUs, which is where Intel has always shined the most. It surprises me that the focus has been shifting to novel GPU architectures. Perhaps in the future things will change, but without the resources we can't be of much help.
-
I agree, it'd be nice to see. Not just for this project but in general, it's unfortunate that there is so much fragmentation around GPUs acting as compute accelerators for ML applications like this one. In that role they essentially perform fast matrix / vector / tensor operations, which are conceptually and architecturally quite portable, which is why BLAS and similar libraries have been core infrastructure across CPU and even GPU platforms for decades. Yet despite the modest "I'm just a compute accelerator" use case, we've ended up in a Tower of Babel where you have to be very lucky if a given software project supports any GPU at all, and if it does, it's probably one specific platform (the NVIDIA / CUDA ecosystem, Apple / Metal, etc.) and nothing else. We wouldn't tolerate that kind of non-portability for CPU code, which is usually fairly platform-portable across many languages, but for GPUs we're still in the tar pit.

For GPUs (and even CPUs and other kinds of accelerators) there are relatively open, cross-platform middleware / runtime / framework options like OpenCL, Vulkan compute, SYCL, OpenMP, and OpenACC, but it's still rare to see these more target-portable compute layers supported at all, let alone as first-class backends; most projects natively support zero or one GPU type and nothing else. Maybe GPU acceleration will eventually be made obsolete by folding the relevant capabilities into CPUs themselves, at which point we'd be back to cross-platform, vector-accelerated code in standard languages using layers like LLVM for targeting. In the meantime, it's almost as if anything but NVIDIA GPUs is considered irrelevant in much of the ML space, not because other GPUs are less usable or particularly hard to support for core use cases like BLAS.

The point about first-class support for Intel CPUs vs. Intel GPUs is an interesting juxtaposition, though, since SYCL / oneAPI originated at Intel and are Intel's own solution for a platform-independent HPC / parallel development and runtime framework: write the code once and run it efficiently on a variety of Intel CPUs, Intel GPUs, and other vendors' targets that also support SYCL (NVIDIA, I think, is one).
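Not part of the original reply, but to make the "write it once, run it on whatever device is available" point concrete, here is a minimal SYCL 2020 vector-add sketch (the array size and variable names are arbitrary). In principle the same source compiles with a SYCL toolchain such as Intel's DPC++ and can be dispatched to a CPU, an Intel iGPU, an Arc card, or other vendors' GPUs where a backend plugin exists:

```cpp
// Minimal illustrative sketch (not from the thread): a SYCL 2020 vector add
// that runs unchanged on whichever device the runtime selects.
#include <sycl/sycl.hpp>
#include <iostream>
#include <vector>

int main() {
    constexpr size_t n = 1 << 20;
    std::vector<float> a(n, 1.0f), b(n, 2.0f), c(n, 0.0f);

    // default_selector_v picks the "best" available device; the same binary
    // can end up on a CPU, an integrated GPU, or a discrete GPU.
    sycl::queue q{sycl::default_selector_v};
    std::cout << "Running on: "
              << q.get_device().get_info<sycl::info::device::name>() << "\n";

    {
        sycl::buffer<float> ab(a.data(), sycl::range<1>{n});
        sycl::buffer<float> bb(b.data(), sycl::range<1>{n});
        sycl::buffer<float> cb(c.data(), sycl::range<1>{n});
        q.submit([&](sycl::handler& h) {
            sycl::accessor A(ab, h, sycl::read_only);
            sycl::accessor B(bb, h, sycl::read_only);
            sycl::accessor C(cb, h, sycl::write_only, sycl::no_init);
            h.parallel_for(sycl::range<1>{n}, [=](sycl::id<1> i) {
                C[i] = A[i] + B[i];
            });
        });
    } // buffer destructors copy the result back into the host vector c

    std::cout << "c[0] = " << c[0] << "\n"; // expect 3
    return 0;
}
```

The buffer/accessor model lets the runtime manage host-device data movement; unified shared memory pointers are the other common style, but either way the kernel itself stays device-agnostic.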
-
The simplicity of llamafile is incredible, as all of us users know. I'm just curious whether there are any plans to support SYCL as a backend for llamafile. With Intel's current-gen Meteor Lake and next-generation Lunar Lake platforms, the iGPU gets at least half of the system RAM (so on a 32 GB machine we get around 16 GB of VRAM), and these iGPUs are based on the Arc architecture. It would be awesome to get native support in llamafile so that applied-AI folks can use whichever hardware they want on Intel platforms (CPU, GPU, or even NPU in the near future). (SYCL code can also run on GPUs from other vendors via Codeplay's plugins.) Although there is Vulkan and OpenCL support, the SYCL backend in runtimes like llama.cpp (which already supports SYCL) has given me the best performance on my Meteor Lake machine and on my Arc A770.

For people who are truly GPU-poor, SYCL support in llamafile would let them run models fast on Arc GPUs (an Arc A770 with 16 GB of VRAM costs around 300 USD). That makes an excellent LLM server for home automation, while the CPU cores stay free for other things like databases and non-matrix-multiply work.
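Purely as an illustration of "use whichever hardware they want", and not anything from llamafile or llama.cpp itself, here is a small hypothetical SYCL snippet that enumerates the visible platforms and devices (CPU, iGPU, discrete Arc, etc.) and prefers a GPU queue, falling back to the CPU if none is present:

```cpp
// Illustrative sketch: list SYCL devices, then prefer a GPU queue with a
// CPU fallback. Device names shown depend entirely on the local machine.
#include <sycl/sycl.hpp>
#include <iostream>

int main() {
    for (const auto& p : sycl::platform::get_platforms()) {
        std::cout << "Platform: "
                  << p.get_info<sycl::info::platform::name>() << "\n";
        for (const auto& d : p.get_devices()) {
            std::cout << "  Device: "
                      << d.get_info<sycl::info::device::name>()
                      << (d.is_gpu() ? " [GPU]" : d.is_cpu() ? " [CPU]" : "")
                      << "\n";
        }
    }

    // gpu_selector_v throws if no GPU is visible, so fall back to the CPU.
    sycl::queue q = [] {
        try {
            return sycl::queue{sycl::gpu_selector_v};
        } catch (const sycl::exception&) {
            return sycl::queue{sycl::cpu_selector_v};
        }
    }();
    std::cout << "Using: "
              << q.get_device().get_info<sycl::info::device::name>() << "\n";
    return 0;
}
```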