
Richie’s Techbits newsletter: Issue 3: Spying with eBPF, WASM shouldn’t exist, Go fast with Unikernels and tools

In this week's issue

Using eBPF to spy on Linux kernel internals

Before I get into the tech details, here are some practical things you can do with eBPF

The eBPF subsystem for Linux allows you to safely (well, almost) insert code into the Linux kernel and user space. The code can be attached at various points, some stable and others not so much. There are different types of eBPF program: some can take actions via helper functions or return codes, while others can read arbitrary kernel memory. The programs have access to maps, which allow them to communicate with user space programs or with each other.

This essentially provides a safe runtime extension mechanism for the Linux kernel. I’m a bit dubious about whether it is safe to load some random eBPF code, but the eBPF verifier makes it safer than loading a kernel module. The verifier itself has been the subject of numerous exploits, and no doubt the JIT compiler too. When I worked on creating reproducers for eBPF bugs I got the impression that the subsystem is complex enough that securing it against malicious eBPF may be a Sisyphean task beyond the resources available for it.

Of course, focusing on bugs gives one a slightly warped view of things. The verifier does an excellent job of making code safe. It can be a pain to work with, but it’s hard to accidentally write eBPF that breaks your system, especially if you stick to the eBPF programs that can run in unprivileged mode. Typically I would think it unwise to allow completely untrusted users to load any kind of eBPF program, but as a way of stopping accidents it is a brilliant tool.

A strong use for eBPF, and perhaps even more common than packet filtering, is observability. There are many tools available that use eBPF to measure performance, bandwidth and various other metrics. This week I spent some time investigating how to track which process is responsible for sending a packet.

Although there are many tools that claim to do this, it is somewhat surprising that there isn’t a truly convenient hook point in the kernel where a packet’s headers can be read while the sending process’s PID is known. I’ll expand on this in a separate article, but here are some things to look out for when writing eBPF, in particular programs dealing with packets:

libbpf in combination with BTF has this great feature called CO-RE, which allows one to partially define kernel structs at compile time and then relocates the program when it is loaded into the kernel. This means kprobes can be used on different kernels. However, I’ve seen a number of eBPF tools that try to hook kernel functions where some of the call sites are probably missing on different kernels. There’s no error when this happens because some call sites still exist, so telemetry can be silently discarded. Some eBPF program types have stable interfaces, but the more advanced programs I have seen usually resort to kprobes, where problems like this abound.
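One cheap guard against silently losing hooks is to compare the functions a tool wants to probe against the running kernel's symbol list before attaching. The following is a minimal sketch in Python, assuming the usual `/proc/kallsyms` line layout (address, type, name, optional module). The function names in the sample are made up for illustration, and `parse_kallsyms` and `check_hook_targets` are hypothetical helpers, not part of libbpf or any existing tool. It only catches targets that are absent altogether or that exist solely as compiler-generated clones; it cannot tell you that a function has been inlined at some of its call sites.

```python
import re

# GCC may emit optimized clones such as foo.isra.0, foo.constprop.0 or
# foo.part.0. A probe attached to plain "foo" will not match those names.
_CLONE_SUFFIX = re.compile(r"\.(isra|constprop|part)(\.\d+)?")


def parse_kallsyms(text):
    """Return the set of symbol names found in /proc/kallsyms-style text."""
    names = set()
    for line in text.splitlines():
        fields = line.split()
        # Expected layout: address, type, name, optional [module]
        if len(fields) >= 3:
            names.add(fields[2])
    return names


def check_hook_targets(wanted, symbols):
    """Classify each wanted function as present, clone-only or missing."""
    report = {}
    for func in wanted:
        if func in symbols:
            report[func] = "present"
        elif any(
            _CLONE_SUFFIX.sub("", sym) == func
            for sym in symbols
            if sym.startswith(func + ".")
        ):
            report[func] = "clone-only"
        else:
            report[func] = "missing"
    return report


# Illustrative sample only; on a real system read /proc/kallsyms instead.
SAMPLE = """\
ffffffff81a00000 T example_send_a
ffffffff81a00100 t example_send_b.isra.0
ffffffff81a00200 T unrelated_func
ffffffffc0000000 t example_mod_func\t[example_mod]
"""

wanted = ["example_send_a", "example_send_b", "example_send_c"]
report = check_hook_targets(wanted, parse_kallsyms(SAMPLE))
for func, status in report.items():
    print(f"{func}: {status}")
```

A tool could refuse to start, or at least log loudly, when any target comes back as anything other than "present", rather than attaching to whatever subset happens to exist.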

The stable interface for eBPF is constantly expanding however, so my prediction is that in the future these tools will evolve with the kernel to become rock solid. I even think it could make the Rust for Linux project slightly redundant if device drivers can be written in eBPF.

WASM shouldn’t exist

I remember reading the original paper on BPF and why it was important that BPF was register based. They thought about the actual hardware BPF would need to run on. BPF was then baptised in fire in BSD and then the Linux kernel. It wasn’t forced on Linux, it didn’t have a standards body behind it, there was just BSD to vouch for it. Later eBPF was introduced which added JIT, more instructions and a bunch of other stuff you can see in the linked article. It has evolved over time based on feedback and contains a lot of organic solutions for integrating byte code into a broader system.

The fact that it is register based, and has about as many registers as real CPUs have, means it can be easily JIT translated into the host CPU’s native instructions. This may not be the absolute best thing for the performance of hot loops on any particular CPU, but it makes the JIT translation very fast, and the resulting performance is close to what you would get if the compiler had optimized for that particular CPU.

Meanwhile WASM is stack based, and it is expected that the WASM compiler will optimize this for real, register based, CPUs at load time. It’s not clear to me what the advantage of this is over JavaScript with some extensions for things like SIMD, native types and manual memory management. They both require compiling, and actually JavaScript has the advantage of being able to access all of the web APIs without awkward bindings.
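To make the stack versus register distinction concrete, here is a toy sketch in Python. It is not WASM's or eBPF's actual encoding, and the instruction names (`push`, `load`, `add`, `mul`) are invented for the example. It runs the expression `(a + b) * c` as stack code, then translates the same code into register form by simulating the operand stack at translation time, so that stack slot i simply becomes register `r{i}`. Real engines do far more than this (a finite register file, control flow, spilling), so treat it only as an illustration of the extra step a stack based format needs before it maps onto a register machine.

```python
import operator

OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}


def run_stack(program, env):
    """Interpret stack code: operands are pushed, operators pop two values."""
    stack = []
    for instr in program:
        if instr[0] == "push":
            stack.append(env[instr[1]])
        else:
            rhs = stack.pop()
            lhs = stack.pop()
            stack.append(OPS[instr[0]](lhs, rhs))
    return stack.pop()


def stack_to_registers(program):
    """Translate stack code into three-address register code.

    The operand stack is tracked only as a depth counter during
    translation, so no stack is needed when the result is executed.
    """
    code = []
    depth = 0
    for instr in program:
        if instr[0] == "push":
            code.append(("load", f"r{depth}", instr[1]))
            depth += 1
        else:
            # Operands sit in the top two slots; the result replaces the lower.
            depth -= 1
            code.append((instr[0], f"r{depth - 1}", f"r{depth - 1}", f"r{depth}"))
    return code


def run_registers(code, env):
    """Interpret register code; the final result is left in r0."""
    regs = {}
    for instr in code:
        if instr[0] == "load":
            regs[instr[1]] = env[instr[2]]
        else:
            op, dst, lhs, rhs = instr
            regs[dst] = OPS[op](regs[lhs], regs[rhs])
    return regs["r0"]


# (a + b) * c expressed as stack code
PROGRAM = [("push", "a"), ("push", "b"), ("add",), ("push", "c"), ("mul",)]
ENV = {"a": 2, "b": 3, "c": 4}

register_code = stack_to_registers(PROGRAM)
for instr in register_code:
    print(instr)
print(run_stack(PROGRAM, ENV), run_registers(register_code, ENV))
```

A byte code that is already in register form skips the translation function entirely, which is the property the BPF design leans on.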

In my opinion they should have adapted eBPF to the needs of the browser or just stuck with asm.js. Having said that, in absolute terms, WASM is very good and most of its problems will be smoothed over with shims and libraries. So practically speaking WASM may be a good choice when choosing a byte code to support.

You should convert your app to use a Unikernel

In fact you should hire me to convert your app to run on/in a Unikernel. What you’ll get

I have to point out, though, that cutting down the Linux kernel and running your app as init (I did this) will get a lot of these benefits while retaining Linux’s hardening and many other features. Fly do something similar by using a very slim init and running apps in a single container on a lightweight container runtime. So in this case an attacker would usually first need to get root in their VM, then escape the VM. In a Unikernel an attacker is either “root” or very close as soon as they get code execution.

Regardless of how much you cut out of Linux though, it’s never going to beat a unikernel in terms of overall performance. There may be edge cases where Linux’s memory management has been far better optimized, but most of the time it will just be doing a whole lot of unnecessary work.

Traditionally Unikernels have been difficult to write for because they don’t have POSIX-compatible system calls, or indeed any system calls. They have an API particular to them, like embedded kernels or really just like libraries you use on bare metal. Some time ago though I came across Nanos and more recently Unikraft. Both of these are Linux compatible to the extent that many popular applications will run on them unmodified.

Nanos even retains the kernel-user-space barrier, meaning it has system calls and the kernel has some memory protection from your app. With Unikraft I’m not so sure, but of course there is a performance cost with having real system calls so it is a trade-off.

I haven’t used Unikraft, but I did convert a Node.js app to work on Nanos and here are some issues you may face

On the last point, Unikraft claims some support for Docker and Kubernetes, and possibly Nanos has moved on since I used it. However, a VM is fundamentally different from a container, and if your current infra is based on containers from top to bottom then there is going to be friction. Personally though I think it would be a net win to get rid of Kubernetes :-).

Writing your app to run on top of Linux with a minimal userland is also a valid way of doing things if you want to keep features like eBPF, or if you want to deploy to a bare metal system which needs the Linux drivers. There are of course embedded “distributions” like Yocto and Buildroot, which can produce a stripped down userland.

Tools