On Linux you can create and mount file systems in userspace. You don’t even need to be root. This allows for things like LiteFS, which intercepts reads and writes to an SQLite database, or jxl-fuse.
Jxl-fuse is particularly relevant because it is written in Zig. It is also an interesting use case: it lets you store images in JPEG XL format and converts them on the fly to regular JPEG.
This means you get better compression while maintaining compatibility. Generalising, this lets us decouple the storage format from what the application loads. Essentially, we can transparently insert an adapter between the storage and the application at runtime.
FUSE gets interesting when you ask: what is a file system, really? I usually think of a complicated data structure which stores data at particular paths.
However, if you are familiar with the Linux kernel (or similar) interfaces, in particular the /proc and /sys file systems, then the term starts to take on another meaning. A file system in the kernel is really some code which implements an interface.
The interface consists of functions like open, stat, read, write, seek, close, etc. Each function is limited by what arguments it takes, but potentially it can do anything. read can generate data on the fly or have side effects.
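To make the "interface" idea concrete, here is a minimal sketch in plain Zig with no FUSE involved. The Ops struct and readGenerated function are hypothetical names invented for illustration. It assumes a Zig version around 0.11, like the rest of this post.

```zig
const std = @import("std");

// Hypothetical miniature of a file system interface: a struct of optional
// callbacks. The names are illustrative and are not libfuse's.
const Ops = struct {
    read: ?*const fn (path: []const u8, buf: []u8) usize = null,
};

// A `read` that stores nothing: it generates its "file contents" on demand.
fn readGenerated(path: []const u8, buf: []u8) usize {
    const text = std.fmt.bufPrint(buf, "you asked for {s}\n", .{path}) catch return 0;
    return text.len;
}

pub fn main() void {
    const ops = Ops{ .read = &readGenerated };
    var buf: [64]u8 = undefined;
    // The caller only sees bytes appearing in its buffer.
    const n = ops.read.?("/hello", &buf);
    std.debug.print("{s}", .{buf[0..n]});
}
```

Nothing is stored anywhere, yet from the caller's side this is indistinguishable from reading a file.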
Zig is a great language for systems programming and it turns out to be reasonably easy to get it working with FUSE.
libfuse
To simplify interacting with the kernel interface there is a C library with the obvious name. It would be better to use the kernel interface directly, both for performance and to take full advantage of Zig. However, that would require implementing the message protocol from scratch.
Also libfuse comes with some examples, so I approximately translated one of the examples into Zig. This is an easy way to get a feel for FUSE development or to quickly get started implementing a file system.
I had some trouble compiling libfuse with Zig. Libfuse uses Meson, which didn’t like Zig’s linker version output. It would be nice to create a build.zig for libfuse, but I think the effort would be better directed at implementing the FUSE protocol directly.
So to get things moving I linked against the system’s libfuse (NixOS in my case). This can be seen in the build.zig, which we will get to in a moment.
Zig can directly import C headers, but I decided to translate the header to Zig and include that instead, so that I can look at the contents and modify them.
$ zig translate-c -DFUSE_USE_VERSION=31 \
    -isystem /nix/store/jan1gkl34v83h1pwd43q716nsvf06miq-fuse-3.11.0/include \
    -isystem /nix/store/kd1z202w3l3njfn7n6dkyridwvnm3yg2-musl-1.2.3-dev/include \
    > src/fuse31.zig /nix/store/jan1gkl34v83h1pwd43q716nsvf06miq-fuse-3.11.0/include/fuse3/fuse.h
The path names are awful because that is how Nix maintains many versions of the same software on the same system.
I found that I had to include the FUSE directory and the libc directory. I used musl instead of glibc because whenever I want to know how something in libc works I go to musl.
FUSE_USE_VERSION needs to be set. Possibly other things could be set, but this was enough to get the symbols I wanted.
building
The build.zig is pretty much the default produced by zig init-exe. I’ll just include the bits that were changed.
const exe = b.addExecutable(.{
    .name = "fuse",
    ...
    .link_libc = true,
});
exe.linkSystemLibrary("fuse3");
So all I had to do was link to libc and fuse3. As discussed above, using the system’s libfuse is not ideal; depending on the distribution the static and cross-compiled libraries may not be available. Linking to a shared library is not great for optimisation.
It is actually quite easy to compile libfuse to a static library with Meson if the distribution doesn’t support it. However we still don’t get the full magic of Zig’s cross compilation.
Hello Zig
I copied libfuse/example/hello.c. The entry point in C looks like the following.
static const struct fuse_operations hello_oper = {
    .init = hello_init,
    .getattr = hello_getattr,
    .readdir = hello_readdir,
    .open = hello_open,
    .read = hello_read,
};
...
int main(int argc, char *argv[])
{
    int ret;
    struct fuse_args args = FUSE_ARGS_INIT(argc, argv);
    ...
    options.filename = strdup("hello");
    options.contents = strdup("Hello World!\n");

    /* Parse options */
    if (fuse_opt_parse(&args, &options, option_spec, NULL) == -1)
        return 1;
    ...
    if (options.show_help) {
        show_help(argv[0]);
        assert(fuse_opt_add_arg(&args, "--help") == 0);
        args.argv[0][0] = '\0';
    }

    ret = fuse_main(args.argc, args.argv, &hello_oper, NULL);
    fuse_opt_free_args(&args);
    return ret;
}
I ignored the stuff about parsing the filename and contents from the command line. You probably don’t want libfuse to parse the command line when using Zig, but for now I just passed the args into fuse_main untouched (there are alternatives to fuse_main).
const std = @import("std");
const log = std.log;
...
const fuse = @import("fuse31.zig");
...
const ops = mem.zeroInit(fuse.struct_fuse_operations, .{
    .init = init,
    .getattr = getattr,
    .readdir = readdir,
    .open = open,
    .read = read,
});

pub fn main() !u8 {
    log.info("Zig hello FUSE", .{});

    const ret = fuse.fuse_main_real(
        @intCast(std.os.argv.len),
        @ptrCast(std.os.argv.ptr),
        &ops,
        @sizeOf(@TypeOf(ops)),
        null,
    );

    return switch (ret) {
        0 => 0,
        1 => error.FuseParseCmdline,
        2 => error.FuseMountpoint,
        3 => error.FuseNew,
        4 => error.FuseMount,
        5 => error.FuseDaemonize,
        6 => error.FuseSession,
        7 => error.FuseLoopCfg,
        8 => error.FuseEventLoop,
        else => error.FuseUnknown,
    };
}
libfuse uses a common C idiom of having a struct full of callbacks to implement an interface. In this case it is struct fuse_operations (fuse.struct_fuse_operations in Zig), which we pass to fuse_main_real. We’ll look at the function implementations below.
Note that in the C we just call fuse_main, which is a macro. Zig could not translate this macro, so instead we have to call fuse_main_real, which is what the macro expands to.
In Zig struct fuse_operations needs to be initialised with mem.zeroInit. This sets most of the fields (of which there are a lot) to null except for those specified in the second argument. This is an anti-pattern in Zig, but is required when dealing with C.
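As a minimal sketch of what mem.zeroInit does, the example below uses a small hypothetical Callbacks struct rather than the real fuse_operations, so it runs without libfuse. It assumes the same Zig era as the rest of the post (callconv(.C) syntax).

```zig
const std = @import("std");
const mem = std.mem;

// Hypothetical stand-in for a C "struct of callbacks"; this is not the real
// fuse_operations layout.
const Callbacks = extern struct {
    open: ?*const fn ([*c]const u8) callconv(.C) c_int,
    read: ?*const fn ([*c]const u8) callconv(.C) c_int,
    flags: c_uint,
};

fn open(_: [*c]const u8) callconv(.C) c_int {
    return 0;
}

// Only `open` is named. zeroInit zeroes every other field, which for an
// optional pointer means null.
const cbs = mem.zeroInit(Callbacks, .{ .open = open });

pub fn main() void {
    std.debug.print("open set: {}, read set: {}, flags: {}\n", .{
        cbs.open != null,
        cbs.read != null,
        cbs.flags,
    });
}
```

The C side then sees a null pointer for every callback that was not listed, which is how libfuse tells an unimplemented operation from an implemented one.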
In the Zig version I translated the fuse_main return value into an inferred error set. I’m not sure why; probably I thought it would help with debugging.
getattr
Now let’s look at some of the interface implementation. First we have getattr, which more or less corresponds to the stat system call. This returns some file attributes like its size, whether it is a directory, and whether it can be read or written.
The C version looks like this.
static int hello_getattr(const char *path, struct stat *stbuf,
                         struct fuse_file_info *fi)
{
    (void) fi;
    int res = 0;

    memset(stbuf, 0, sizeof(struct stat));
    if (strcmp(path, "/") == 0) {
        stbuf->st_mode = S_IFDIR | 0755;
        stbuf->st_nlink = 2;
    } else if (strcmp(path+1, options.filename) == 0) {
        stbuf->st_mode = S_IFREG | 0444;
        stbuf->st_nlink = 1;
        stbuf->st_size = strlen(options.contents);
    } else
        res = -ENOENT;

    return res;
}
When converting this, my first question was how do I create a Zig function which can be called from C? This is where fuse31.zig is very useful because it contains the function signatures inside.
pub const struct_fuse_operations = extern struct {
    getattr: ?*const fn ([*c]const u8, ?*struct_stat, ?*struct_fuse_file_info) callconv(.C) c_int,
    readlink: ?*const fn ([*c]const u8, [*c]u8, usize) callconv(.C) c_int,
    mknod: ?*const fn ([*c]const u8, mode_t, dev_t) callconv(.C) c_int,
    mkdir: ?*const fn ([*c]const u8, mode_t) callconv(.C) c_int,
    ...
We just need to add a function name and some argument names to the function pointer’s signature. Below is the Zig implementation of getattr along with some helpers.
const E = std.os.linux.E;
...
const filename: [:0]const u8 = "hello";
const contents = "Alright, mate!\n";

fn cErr(err: E) c_int {
    const n: c_int = @intFromEnum(err);
    return -n;
}
...
fn getattr(
    path: [*c]const u8,
    stat: ?*fuse.struct_stat,
    _: ?*fuse.struct_fuse_file_info,
) callconv(.C) c_int {
    var st = mem.zeroes(fuse.struct_stat);
    const p = mem.span(path);

    log.info("stat: {s}", .{p});

    if (mem.eql(u8, "/", p)) {
        st.st_mode = fuse.S_IFDIR | 0o0755;
        st.st_nlink = 2;
    } else if (mem.eql(u8, filename, p[1..])) {
        st.st_mode = fuse.S_IFREG | 0o0444;
        st.st_nlink = 1;
        st.st_size = contents.len;
    } else {
        return cErr(E.NOENT);
    }

    stat.?.* = st;
    return 0;
}
It looks similar to the C, but note that we do not write directly to the passed struct stat in Zig. We zero a new struct and copy it at the end of the function. The stat argument is an optional pointer and it feels like Zig discourages one from interacting with it piecemeal. It makes sense to only check once if it is null or not.
The path argument should be a null terminated C string. I like to convert it to a slice using mem.span. Then we can compare it directly with other slices or get its length without doing another count.
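A small sketch of the mem.span conversion on its own, without libfuse. The isFile helper is a hypothetical name made up for this illustration. Unlike the callbacks in this post, it also guards against an empty path before slicing off the leading slash.

```zig
const std = @import("std");
const mem = std.mem;

// Hypothetical helper: does a C path such as "/hello" name `file`?
fn isFile(path: [*c]const u8, file: []const u8) bool {
    // One scan for the terminator; afterwards the length is known.
    const p = mem.span(path);
    if (p.len == 0 or p[0] != '/') return false;
    return mem.eql(u8, file, p[1..]);
}

pub fn main() void {
    const c_path: [*c]const u8 = "/hello";
    std.debug.print("{} {}\n", .{ isFile(c_path, "hello"), isFile(c_path, "other") });
}
```

Once the path is a slice, bounds-checked indexing and mem.eql replace the strcmp and pointer arithmetic of the C version.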
readdir
Next up we have readdir, which corresponds to the opendir[at] and getdents[64] system calls. There is a deprecated readdir syscall on some architectures as well, but something went wrong there. Implementing this allows us to use ls on the root of the mount. The C implementation looks like this:
static int hello_readdir(const char *path, void *buf, fuse_fill_dir_t filler,
                         off_t offset, struct fuse_file_info *fi,
                         enum fuse_readdir_flags flags)
{
    (void) offset;
    (void) fi;
    (void) flags;

    if (strcmp(path, "/") != 0)
        return -ENOENT;

    filler(buf, ".", NULL, 0, 0);
    filler(buf, "..", NULL, 0, 0);
    filler(buf, options.filename, NULL, 0, 0);

    return 0;
}
It seems we are given a buffer and a function called filler. We can add entries to the buffer with filler. Most of the functionality is ignored; just the paths are added.
Now the Zig version
fn readdir(
    path: [*c]const u8,
    buf: ?*anyopaque,
    filler: fuse.fuse_fill_dir_t,
    _: fuse.off_t,
    _: ?*fuse.struct_fuse_file_info,
    _: fuse.enum_fuse_readdir_flags,
) callconv(.C) c_int {
    const p = mem.span(path);

    log.info("readdir: {s}", .{p});

    if (!mem.eql(u8, "/", p))
        return cErr(E.NOENT);

    const names = [_][:0]const u8{ ".", "..", filename };

    for (names) |n| {
        const ret = filler.?(buf, n, null, 0, 0);

        if (ret > 0)
            log.err("readdir: {s}: {}", .{ p, ret });
    }

    return 0;
}
The filler callback returns a value which Zig doesn’t want to be ignored. C has an attribute for that as well, but it is not the default. In Zig we either pay attention to the return value or explicitly ignore it with _ = filler(...).
I didn’t look into what is the right thing to do when filler fails. It depends on what is likely to fail and how that could be communicated to the user.
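One possible policy, sketched below, is to stop adding entries as soon as the filler returns a non-zero value. The comment on fuse_fill_dir_t in libfuse's header describes the return value as 1 if the buffer is full and zero otherwise. Everything in the sketch is hypothetical and runs without libfuse: Filler is a cut-down two-argument stand-in for the real callback type, and MockBuf, mockFiller, and fillAll are invented names.

```zig
const std = @import("std");

// Hypothetical stand-in for fuse_fill_dir_t, reduced to two arguments so the
// sketch runs without libfuse.
const Filler = *const fn (buf: ?*anyopaque, name: [*c]const u8) callconv(.C) c_int;

// Mock directory buffer with room for `capacity` entries.
const MockBuf = struct { capacity: usize, used: usize = 0 };

fn mockFiller(buf: ?*anyopaque, _: [*c]const u8) callconv(.C) c_int {
    const b: *MockBuf = @ptrCast(@alignCast(buf.?));
    if (b.used == b.capacity) return 1; // buffer full
    b.used += 1;
    return 0;
}

// Adds names until the filler reports a non-zero result, then stops.
// Returns how many entries were accepted.
fn fillAll(buf: ?*anyopaque, filler: Filler, names: []const [:0]const u8) usize {
    for (names, 0..) |n, i| {
        if (filler(buf, n.ptr) != 0) return i;
    }
    return names.len;
}

pub fn main() void {
    var buf = MockBuf{ .capacity = 2 };
    const names = [_][:0]const u8{ ".", "..", "hello" };
    const added = fillAll(&buf, &mockFiller, &names);
    std.debug.print("added {} of {} entries\n", .{ added, names.len });
}
```

Whether readdir should then return 0 or an error is a separate decision that this sketch does not settle.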
open
The open syscall tries to associate a file handle with a path. All the libfuse callbacks I have seen take a path as their first argument instead of a file handle. However, the file handle is still there; it is just buried in struct fuse_file_info.
The C implementation of open looks like this:
static int hello_open(const char *path, struct fuse_file_info *fi)
{
    if (strcmp(path+1, options.filename) != 0)
        return -ENOENT;

    if ((fi->flags & O_ACCMODE) != O_RDONLY)
        return -EACCES;

    return 0;
}
It just checks the path and access mode. The Zig version looks like this
// May not be the correct size depending on the target because of the
// bitfield: https://github.com/ziglang/zig/issues/1499
const FileInfo = extern struct {
    flags: c_int,
    bitfield: u32,
    padding2: u32,
    fh: u64,
    lock_owner: u64,
    poll_events: u32,
};
...
fn open(
    path: [*c]const u8,
    file_info: ?*fuse.struct_fuse_file_info,
) callconv(.C) c_int {
    const p = mem.span(path);
    const fi: *FileInfo = @ptrCast(@alignCast(file_info.?));

    log.info("open: {s}", .{p});

    if (!mem.eql(u8, filename, p[1..]))
        return cErr(E.NOENT);

    if ((fi.flags & fuse.O_ACCMODE) != fuse.O_RDONLY)
        return cErr(E.ACCES);

    return 0;
}
The struct fuse_file_info contains a bitfield which can’t presently be translated from C. Zig has bitfields as well, but they have the same layout on all targets. In C, bitfields change between targets, which means extra work for Zig’s authors. You can see why in the linked issue.
Luckily we just want to access flags, which comes before the bitfield. We could even just cast the pointer to *c_int as we don’t access any memory after it. If we needed to know where some other part of the struct came in memory then we could have an issue.
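A minimal sketch of the *c_int cast, runnable without libfuse. The Opaque struct is a hypothetical stand-in for struct fuse_file_info in which everything after the first member is just bytes. The O_* constants are the Linux access-mode values written out by hand so the sketch does not depend on the translated header.

```zig
const std = @import("std");

// Linux values for the open(2) access-mode bits, written out here so the
// sketch does not depend on the translated header.
const O_ACCMODE: c_int = 3;
const O_RDONLY: c_int = 0;
const O_WRONLY: c_int = 1;

// Hypothetical stand-in for struct fuse_file_info: only the first member
// matters, the rest is treated as opaque bytes.
const Opaque = extern struct {
    flags: c_int,
    rest: [36]u8,
};

// Reads the leading `int flags` through a *c_int and touches nothing after it.
fn isReadOnly(info: *anyopaque) bool {
    const flags: *const c_int = @ptrCast(@alignCast(info));
    return (flags.* & O_ACCMODE) == O_RDONLY;
}

pub fn main() void {
    var ro = Opaque{ .flags = O_RDONLY, .rest = [_]u8{0} ** 36 };
    var wo = Opaque{ .flags = O_WRONLY, .rest = [_]u8{0} ** 36 };
    std.debug.print("{} {}\n", .{ isReadOnly(&ro), isReadOnly(&wo) });
}
```

This works because a pointer to a C struct also points at its first member, so the layout of the later fields never comes into play.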
read
Finally we have a call to read the file content
static int hello_read(const char *path, char *buf, size_t size, off_t offset,
                      struct fuse_file_info *fi)
{
    size_t len;
    (void) fi;
    if(strcmp(path+1, options.filename) != 0)
        return -ENOENT;

    len = strlen(options.contents);
    if (offset < len) {
        if (offset + size > len)
            size = len - offset;
        memcpy(buf, options.contents + offset, size);
    } else
        size = 0;

    return size;
}
And the Zig version
fn read(
    path: [*c]const u8,
    buf: [*c]u8,
    size: usize,
    offset: fuse.off_t,
    _: ?*fuse.struct_fuse_file_info,
) callconv(.C) c_int {
    const p = mem.span(path);
    const off: usize = @intCast(offset);

    log.info("read: {s},size={},offset={}", .{ p, size, offset });

    if (!mem.eql(u8, filename, p[1..]))
        return cErr(E.NOENT);

    if (off >= contents.len)
        return 0;

    const s = if (off + size > contents.len)
        contents.len - off
    else
        size;

    // @memcpy wants both sides to have the same length, so the source is
    // cut to s as well; otherwise a read smaller than the remainder trips
    // the safety check.
    @memcpy(buf[0..s], contents[off..][0..s]);

    return @intCast(s);
}
Zig is quite strict about what types can appear in operations together. So off has to be cast to usize or else we would have to cast size and contents.len to fuse.off_t/c_long.
The memcpy in Zig is done with slices instead of pointer arithmetic. The buf argument is a many-item pointer, but we slice it up to s, which is either size or contents.len - off.
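The clamping can also be sketched as a stand-alone function over slices, which makes it easy to try without mounting anything. The readAt helper is a hypothetical name, and it uses the @min builtin in place of the if/else. It assumes a Zig version whose @memcpy takes a destination and a source of equal length (0.11 or later).

```zig
const std = @import("std");

// Hypothetical helper mirroring the clamping in `read`: copies at most
// buf.len bytes of `data` starting at `off` and returns the count.
fn readAt(data: []const u8, buf: []u8, off: usize) usize {
    if (off >= data.len) return 0;
    const n = @min(buf.len, data.len - off);
    // Both slices are cut to n because @memcpy expects equal lengths.
    @memcpy(buf[0..n], data[off..][0..n]);
    return n;
}

pub fn main() void {
    const data = "Alright, mate!\n";
    var small: [4]u8 = undefined;
    const n = readAt(data, &small, 9);
    std.debug.print("{} bytes: {s}\n", .{ n, small[0..n] });
}
```

A four byte buffer is a useful case to try, because the request is shorter than what remains in the file.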
run
The Zig version can be built and run as follows. This will also mount the file system.
$ mkdir /tmp/fuse
$ zig build run -- -f /tmp/fuse
Then in another terminal you can do
$ ls -l /tmp/fuse
total 0
-r--r--r-- 1 root root 15 Jan 1 1970 hello
$ cat /tmp/fuse/hello
Alright, mate!
This produces log output similar to
info: Zig hello FUSE
info: stat: /
info: readdir: /
info: stat: /hello
info: stat: /hello
info: stat: /hello
info: stat: /
info: readdir: /
info: open: /hello
info: read: /hello,size=4096,offset=0
fin
Depending on what it is you want to do this is a quick way of getting started. If you are embarking on a complex project then implementing the kernel interface directly seems like the way to go.
For something simple, the main concern is the use of bitfields. Looking at the bitfield in question, I guess that it won’t have padding added to it. Below is the struct definition with comments removed.
struct fuse_file_info {
    int flags;
    unsigned int writepage : 1;
    unsigned int direct_io : 1;
    unsigned int keep_cache : 1;
    unsigned int parallel_direct_writes : 1;
    unsigned int flush : 1;
    unsigned int nonseekable : 1;
    unsigned int flock_release : 1;
    unsigned int cache_readdir : 1;
    unsigned int noflush : 1;
    unsigned int padding : 23;
    unsigned int padding2 : 32;
    ...
}
The author added explicit padding for the remaining 23 bits in a 32-bit int. So it is probably fine, and the same struct can be recreated in Zig.
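A minimal sketch of how that recreation might look, not a verified binding. It assumes a little-endian target where the C compiler allocates bit-fields starting from the least significant bit (as GCC and Clang do on x86-64 Linux) and a Zig version with packed struct(u32). The names Bits and FileInfo are made up for illustration, and only the members visible in the excerpt above are included.

```zig
const std = @import("std");

// The nine one-bit flags plus the explicit 23-bit padding, backed by
// exactly one u32. In a Zig packed struct the first field is the least
// significant bit.
const Bits = packed struct(u32) {
    writepage: bool = false,
    direct_io: bool = false,
    keep_cache: bool = false,
    parallel_direct_writes: bool = false,
    flush: bool = false,
    nonseekable: bool = false,
    flock_release: bool = false,
    cache_readdir: bool = false,
    noflush: bool = false,
    padding: u23 = 0,
};

// Only the leading members from the C excerpt; the members elided by "..."
// are left out here as well.
const FileInfo = extern struct {
    flags: c_int,
    bits: Bits,
    padding2: u32,
};

pub fn main() void {
    const bits = Bits{ .direct_io = true };
    const raw: u32 = @bitCast(bits);
    std.debug.print("bits size={} raw=0x{x} bits offset={}\n", .{
        @sizeOf(Bits),
        raw,
        @offsetOf(FileInfo, "bits"),
    });
}
```

Because the backing integer is spelled out, the compiler rejects the struct if the field widths do not add up to 32 bits, which gives a cheap check against miscounting the padding.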