
Richie’s Techbits newsletter: Issue 12

LocalAI has at least two backends that support Intel's SYCL. One is Llama.cpp, which is used to run LLMs locally, and another is stable-diffusion.cpp, which can generate images.

I own an Intel Arc A770 16GB card because there aren’t many GPUs you can buy with 16GB of RAM for $300. To my knowledge this is an unnecessary amount of VRAM for computer games, but it’s just enough to run smaller LLMs and other types of gen AI.

To be clear, this amount of VRAM allows you to run a 12B-parameter LLM with 8-bit quantization. You won't be able to plug this into a system that expects Claude or ChatGPT, but it is usable with software that takes the limits of a model this size into account.
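As a rough sanity check: 12 billion parameters at one byte each comes to about 12 GB of weights, leaving around 4 GB of the card's 16 GB for the KV cache, context and runtime overhead.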

However there is a bigger problem with this card, namely that it doesn’t support CUDA, NVIDIA’s very special toolkit for creating GPU accelerated applications that are not computer games.

Instead it supports Intel SYCL, OpenVINO and Vulkan. I don't know where to begin with OpenVINO; LocalAI supports it as a separate backend, and it is fast when it works, but models have to be converted to its format and I could only ever get integrated GPUs to work with it. The A770 triggered a bug which I reported, but I'm not sure what came of it.

Meanwhile Llama.cpp supports both SYCL and Vulkan. The latter is what computer games use and has good support on Linux. It has compute shaders (https://richiejp.com/1d-reversible-automata-webgpu) which can be used to perform the inference, but they're rather low level and require a lot of optimization work. As a result Vulkan inference is presently not that fast.

Intel oneAPI SYCL is similar to CUDA in that you write the computations in a C/C++-like language. It does a whole bunch to make this easier than writing shader code, including providing optimized library functions and a special LLVM/MLIR-based optimizing C/C++ compiler, with the result that performance is decent. I could be missing some details, but this is my general impression.

So this has been implemented in Llama.cpp and all LocalAI has to do is turn it on and use it. What could go wrong?

Compiling and linking fucking C/C++, that’s what.

To keep pace with the current version of Llama.cpp, LocalAI compiles it from source. LocalAI has multiple backends to run various types of AI model, and each backend communicates over gRPC because they are written in different languages. The Llama.cpp backend is written in C++ and uses Llama.cpp as a library.

In order to compile Llama.cpp with SYCL support, it is not enough for Intel's oneAPI libraries to be present; the whole thing must be compiled with Intel's Clang-based compilers, icx/icpx. What's more, you must pass the flag '-fsycl', otherwise linking will fail with cryptic error messages about missing symbols.

And no, asking AI does not help at this point in time, it’s the type of issue that requires a lot of context and investigation. The framework needed to get an agent to perform this investigation is not in place yet. I’m sure we’ll get there, but you’ll be happy to know that solving shitty issues like this is still in the human domain at this time.

Anyway, Llama.cpp is compiled with CMake, along with the GGML library that is at its core. Intel's oneAPI supports CMake, and GGML integrates it this way. The LocalAI backend for Llama.cpp also uses CMake, so it can include Llama.cpp and enable SYCL through CMake variables.

Well, actually that is not enough: we also need to source a script from oneAPI that sets some environment variables, and set the CMake variables that control which compiler is used.
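For reference, this is roughly what the upstream SYCL build looks like according to Llama.cpp's own documentation (the GGML_SYCL flag and the install path may shift between versions):

    source /opt/intel/oneapi/setvars.sh
    cmake -B build -DGGML_SYCL=ON \
          -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx
    cmake --build build --config Release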

This doesn't all need to be figured out from scratch for LocalAI: Llama.cpp has a Dockerfile for Intel SYCL which it uses to compile its CLI in CI. We should just be able to adapt this Dockerfile to LocalAI's build system and it should all work, right? Wrong, no, it did not work.

The reason it did not work is that '-fsycl' was missing from some invocation of the icx/icpx compiler. It shouldn't be missing though; it is set by Llama.cpp's CMake file, and that is why the Llama.cpp Dockerfile works. At least that is what I thought.

However, as Builker pointed out here (https://github.com/mudler/LocalAI/issues/4905#issuecomment-2774074934), Ollama encountered a similar issue and added the '-fsycl' flag to correct it. So I tried it and it worked. Why? I'm not sure exactly; perhaps the final executable doesn't get any of the flags that Llama.cpp configures. I haven't looked, because prying apart the build system to find out takes time.
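The workaround boils down to forcing the flag onto every compile and link the outer project does. A minimal sketch of the idea, assuming the backend is configured from the shell (which CMake variable actually matters depends on how the final link line is assembled):

    cmake -B build \
          -DCMAKE_C_FLAGS=-fsycl \
          -DCMAKE_CXX_FLAGS=-fsycl \
          -DCMAKE_EXE_LINKER_FLAGS=-fsycl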

Then there is the stable-diffusion.cpp backend, which has bigger problems because it is partially written in Go and uses CGO to interoperate with C++. Once again I encountered cryptic linker errors about missing symbols, even though I had implemented all of the fixes from Llama.cpp and also applied updates to stable-diffusion.cpp itself.

Then I realised, from trawling through thousands of lines of logs, that the errors were actually coming from CGO, which wasn't even using Intel's special compiler. So of course it wouldn't work. This made me feel rather silly, but then again, the Go code just calls the stable-diffusion.cpp library and doesn't do anything with SYCL.

So why should it need to use the special compiler or link to some low-level intrinsics? This should all be abstracted away in the library, but no. Linking is left until as late as possible, so when building the final executable I need to provide the dependencies of my dependencies.
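As a toy illustration of the transitive problem (all names here are made up): a static library that uses SYCL internally doesn't carry its dependencies with it, so the final link has to supply them again:

    icpx -c wrapper.cpp                        # our code: plain C++, no SYCL in sight
    icpx wrapper.o libsd.a -o server           # fails: undefined SYCL symbols from libsd.a
    icpx -fsycl wrapper.o libsd.a -o server    # works: -fsycl pulls the SYCL runtime
                                               # into the final link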

Perhaps it's possible to force linking at an earlier stage; perhaps the library could be compiled into a dynamic library that the CGO program links to at runtime, or the Go code could be rewritten in C++. Instead, however, I opted to configure CGO to use the right compiler and libraries.

I'm not sure CMake works well with Go, so I ended up configuring the necessary libraries and flags with a combination of pkg-config and Intel oneAPI's online link-line advisor tool. This appears to work, but we have to hope the necessary flags don't change on a regular basis (https://github.com/mudler/LocalAI/pull/5144).
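A minimal sketch of that configuration, assuming setvars.sh has already been sourced so oneAPI's pkg-config files are on PKG_CONFIG_PATH; the MKL module name below is just one of several that oneMKL ships, and the exact flag set that landed in the PR differs:

    # Point CGO at the Intel compilers and feed it the SYCL/oneMKL flags
    export CC=icx CXX=icpx
    export CGO_CFLAGS="$(pkg-config --cflags mkl-dynamic-lp64-seq)"
    export CGO_LDFLAGS="-fsycl $(pkg-config --libs mkl-dynamic-lp64-seq)"
    go build ./...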

So what is the moral of this story? Simply saying CMake, C/C++, Makefiles, etc. are bad is tempting, but… actually, no, I'm just going to say it: they are fucking awful. There is a lot of wisdom and knowledge locked away in them, and there is a reason they survived, but there is so much dead weight being carried with them.

Take the fact that I can write complete rubbish in a Makefile and it won't throw a syntax error until the text is passed to a shell, or that in C you can call functions that don't exist. These would be fine if they were the exception rather than the norm, but they're not: the default is to be extremely relaxed and let stuff happen that will waste your time.
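For instance, make happily reads a Makefile like this (the missing ']' is deliberate); the broken line only blows up when the rule runs and the text is handed to the shell:

    check:
            if [ -f config.h ; then echo found; fi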

That said, I don't blame people for picking tried-and-tested tools, but there are alternatives like Nix, Zig, Go and, if you really must go to the opposite extreme, Rust.