
Zig Vs C - Minimal HTTP server

While working on my Linux socket example I decided to write a tiny HTTP server for previewing my static website. This shows the basics of using TCP sockets, correctly adds .html to routes without it and saves me the distress of typing python, npm or similar blasphemies. The server is barely functional of course. However it is enough to get my pages to appear in Firefox and Chrome.

I started an HTTP/2 version: A barely HTTP/2 server in Zig

It also happens that I am desperate to write Zig code. It’s an unfortunate part of my personality that I cannot stay away from new languages (and kernels, web frameworks etc.). If you want to ruin a project then choosing all new stuff is an excellent way to go about it. However I’ve learned the hard way to try out one new thing at a time. So in this article I’m just going to use Zig to do something I have done before.

Update in 2023!

The segfaults mentioned here have been solved to my knowledge.

This is the second time I have written some Zig. The first time I tried using it to build and test a radix sort and hash map implementation in C. This was moderately successful. One problem was that I managed to segfault the compiler, the other that I was confused about slices and pointers. This time I managed to also segfault the compiler and was still confused about slices.

Update in 2023!

The compiler now prints some helpful hints when there is an issue. Also I think it is now more permissive in situations where there is no ambiguity.

After a bit more Zig hacking I now feel totally comfortable with slices and the various pointer types.

This hasn’t deterred me however. For one thing I have spent barely any time on Zig. I’ve spent more time trying to figure out if something is a scalar or an array in Perl than I have with Zig. So I can forgive some head scratching over its obtuse type system errors.

Just to be clear, this is hardly an apples to apples comparison. For that I think we would have to rip out the standard libraries for both languages. Then build an application with total feature parity. Then we shall see exactly what each language gives us. Alternatively we could try using a C library which provides similar features to the Zig one.

Anyway enough rambling and interlinking. You can see the latest Zig code here and the latest C code here. Let’s compare the imports and includes first.

Import/Include

Zig

const std = @import("std");
const net = std.net;
const mem = std.mem;
const fs = std.fs;
const io = std.io;

C

#define _GNU_SOURCE

#include <limits.h>
#include <errno.h>
#include <unistd.h>
#include <string.h>
#include <stdio.h>
#include <fcntl.h>
#include <signal.h>
#include <sys/stat.h>
#include <sys/socket.h>
#include <sys/sendfile.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <arpa/inet.h>

I only used the standard library for Zig and POSIX for C, with the exception of sys/sendfile.h and perhaps something else I have forgotten about. Everything from the Zig standard library is imported entirely with @import("std"); the other statements are just regular assignments.

Zig doesn’t specifically have modules or whatever, things like structs and unions act as namespaces. The @import statement wraps the source file it includes in a struct type. So std is a type of struct. Struct types (or just structs) can have static variables, which I assume is what std.io is.

All struct types in Zig are anonymous unless they are assigned to a variable or appear in a return statement. Then they take on the name of the variable or the returning function respectively. It seems the first assignment becomes the canonical name.

Already this is saying a lot about Zig I think. Meanwhile the C #includes are not actually C, they are preprocessor directives. The C preprocessor is a templating language more or less. Including a file inserts its processed content at the point of the include. It’s not immediately obvious what was included and which parts of it we use.

I’m not entirely sure all of those includes are needed either. It should be possible to find out using static analysis, however I’m not exactly sure how to do it. Having said that, I’m pretty sure they all are needed.

The header files don’t include the full code for the functions being included either. They could do of course, but I’m linking against glibc and that is not how it works. By default Zig’s standard library is fully included. There is a huge discussion to be had about that, but it doesn’t affect the current project.

The Zig produced executable is bigger than the C one and it takes longer to compile. However they are both more than adequate for this project. It’s difficult to extrapolate this to a larger or more constrained scenario because Zig appears to have ways of dealing with these issues. Not to mention that you can throw out the C standard library.

What I think matters most here is that we have a big long list of C headers for a relatively simple program. Also we know that everything from std is in the std variable. At least until we assign something from std to an outer variable.

It may be feasible to do something similar in C with structs and clever macros. However, using the defaults, Zig wins here.

Main

Zig

pub fn main() anyerror!void {
    var args = std.process.args();
    const exe_name = args.next() orelse "zelf-zerve";
    const public_path = args.next() orelse {
        std.log.err("Usage: {s} <dir to serve files from>", .{exe_name});
        return;
    };

    var dir = try fs.cwd().openDir(public_path, .{});
    const self_addr = try net.Address.resolveIp("127.0.0.1", 9000);
    var listener = net.StreamServer.init(.{});
    try (&listener).listen(self_addr);

    std.log.info("Listening on {}; press Ctrl-C to exit...", .{self_addr});

    while ((&listener).accept()) |conn| {
        std.log.info("Accepted Connection from: {}", .{conn.address});

        serveFile(&conn.stream, dir) catch |err| {
            if (@errorReturnTrace()) |bt| {
                std.log.err("Failed to serve client: {}: {}", .{err, bt});
            } else {
                std.log.err("Failed to serve client: {}", .{err});
            }
        };

        conn.stream.close();
    } else |err| {
        return err;
    }
}

C

int main(const int argc, const char *const argv[])
{
    const pid_t orig_parent = getppid();
    const struct sockaddr_in self_addr = {
        .sin_family = AF_INET,
        .sin_port = htons(9000),
        .sin_addr = {
            htonl(INADDR_LOOPBACK)
        }
    };
    const int listen_sk = socket(AF_INET, SOCK_STREAM, 0);
    const int public_dir = open(argv[1], O_PATH);
    struct sockaddr client_addr;
    socklen_t addr_len = sizeof(client_addr);

    if (argc < 2) {
        dprintf(STDERR_FILENO,
            "usage: %s <dir to serve files from>\n",
            argv[0]);
        return 1;
    }

    if (bind(listen_sk, (struct sockaddr *)&self_addr, sizeof(self_addr))) {
        perror("bind");
        return 1;
    }

    if (listen(listen_sk, 8)) {
        perror("listen");
        return 1;
    }

    printf("[+] Listening; press Ctrl-C to exit...\n");

    while (orig_parent == getppid()) {
        const int sk = accept(listen_sk, &client_addr, &addr_len);

        if (sk < 0) {
            perror("[-] accept");
            break;
        }

        printf("[+] Accepted Connection\n");

        serve_file(sk, public_dir);
        close(sk);
    }

    return 0;
}

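If you want to access argv in Zig, then you usually create an iterator around it. You can of course access it directly, but this is more error prone. You can see in the C code that I am accessing argv[1] before checking argc. When no argument is given, argv[1] is just the terminating NULL, so open fails and the usage check catches it afterwards. However if the program were ever executed with an empty argv, then argv[1] would lie past the end of the array, which on Linux is typically where the environment pointers start. The result is that it could try opening a path descriptor from an environment variable or something along these lines.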

For whatever reason Zig does not include args in main’s arguments. I can’t say this makes any difference to me. The Zig return value is void or an error code. If an error code is returned from main then Zig prints it. If debugging info is available then Zig also prints an error return trace. This is not to be confused with a back trace.

The way that Zig handles errors has a very significant impact on this program. Most functions which can return an error are prefixed with try. If an error is returned then try acts like return and propagates the error. Otherwise it behaves like an expression.

There is also catch which can be used in various places to branch on an error. Other things like while can handle errors as well. You can see on the bottom that the loop there has an else clause.

In C we just use if statements and you can see I am ignoring some errors. My guess is that it is possible to implement error return traces in C and something similar to try using various types of magic. However I haven’t seen it done, so this is a win for Zig.
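To give a rough idea of what that magic might look like, here is a sketch which assumes every function returns a negative int on failure. TRY, parse_header, handle_request and serve_client are all names made up for illustration, and the macro only approximates Zig’s behaviour: it logs one line per function the error passes through, then returns the error to the caller.

```c
#include <stdio.h>

/* Hypothetical macro: evaluate expr once; if the result is negative,
 * log where it happened and return the same value from the enclosing
 * function. Each level of propagation adds one line to the "trace". */
#define TRY(expr)                                            \
    do {                                                     \
        const int try_ret_ = (expr);                         \
        if (try_ret_ < 0) {                                  \
            fprintf(stderr, "  at %s:%d in %s: %s\n",        \
                __FILE__, __LINE__, __func__, #expr);        \
            return try_ret_;                                 \
        }                                                    \
    } while (0)

/* Stand-in for something that can fail */
static int parse_header(const int len)
{
    return len > 0 ? 0 : -1;
}

static int handle_request(const int len)
{
    TRY(parse_header(len));
    return 0;
}

static int serve_client(const int len)
{
    TRY(handle_request(len));
    return 0;
}
```

Calling serve_client(0) returns -1 and prints two lines to stderr, one for handle_request and one for serve_client. Unlike Zig, nothing here forces the caller to use TRY, and it only works for functions that share the same error convention.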

The way that the while loop captures the connection variable |conn| is a big win. Also note the orelse which specifically handles a null result. The type system forces us to check that something is not null or an error before we try using it. This mitigates a category of bugs and then Zig also provides some syntax to avoid having ifs all over the place (or if you have used Rust then… well, you know).

Variables in Zig must either be declared with const or var. What is more, if a variable can be const it must be. By default, in C everything is mutable. I also haven’t found a way to warn when a variable could be const. Again it should be possible to implement for C, but for now Zig wins here. Zig also can infer the type of a variable most of the time. This is obviously a good thing in some situations, but here it may just leave a reader wondering what types the variables are.

Let’s ignore the address declaration in C, I could have done that differently. So moving on.

Receiving

Zig

const ServeFileError = error {
    RecvHeaderEOF,
    RecvHeaderExceededBuffer,
    HeaderDidNotMatch,
};

fn serveFile(stream: *const net.Stream, dir: fs.Dir) !void {
    var recv_buf: [BUFSIZ]u8 = undefined;
    var recv_total: usize = 0;

    while (stream.read(recv_buf[recv_total..])) |recv_len| {
        if (recv_len == 0)
            return ServeFileError.RecvHeaderEOF;

        recv_total += recv_len;

        if (mem.containsAtLeast(u8, recv_buf[0..recv_total], 1, "\r\n\r\n"))
            break;

        if (recv_total >= recv_buf.len)
            return ServeFileError.RecvHeaderExceededBuffer;
    } else |read_err| {
        return read_err;
    }

    const recv_slice = recv_buf[0..recv_total];
    std.log.info(" <<<\n{s}", .{recv_slice});

    ...

C

static void serve_file(const int sk, const int public_dir)
{
    char recv_buf[BUFSIZ];
    char head_buf[BUFSIZ];
    const size_t buf_len = BUFSIZ - 1;
    char path_buf[256];
    char *file_path;
    ssize_t recv, sent;
    size_t recv_total = 0, sent_total = 0;
    int body_fd;

    while (1) {
        recv = read(sk,
                recv_buf + recv_total,
                buf_len - recv_total);

        if (recv < 0) {
            perror("[-] read");
            return;
        }

        if (!recv) {
            dprintf(STDERR_FILENO,
                "[-] End of data before header was received\n");
            return;
        }

        recv_total += recv;
        recv_buf[recv_total] = 0;

        if (strstr(recv_buf, "\r\n\r\n"))
            break;

        if (recv_total >= buf_len) {
            dprintf(STDERR_FILENO,
                "Exceeded buffer reading header\n");
            return;
        }
    }

    printf("[*] <<<\n%s\n", recv_buf);
    ...

When we have a connection the first thing we do is receive the header. It’s expected that the entire header will be received in a single read most of the time. This web server is only for local usage after all. However occasionally this won’t happen because the copying of buffers can be interrupted and other random reasons. So we need a loop.

It’s difficult to know where to start here. I guess the weirdest thing about the Zig code is that the while has a |recv_len| capture and an else clause. The while loop here is saying “while read is not an error then… else if it is an error…”. The symbol enclosed in pipes (|) captures the return value or the error.

The call to read is the first thing we do, and we want to break out if it goes wrong. In the C code I use a while(1) loop for the same reason; there is nothing to check before we do the read. If the Zig code provides any concrete advantage over C it is that it forces error checking. Meanwhile Zig gives you a minimal effort way of debugging errors.

If I were to just return the errno from serve_file in C then I wouldn’t know exactly where an error came from. That is unless I use an outside tool like strace to see which system call caused an error (if any). So ignoring outside tracing methods, Zig gets another win here.

Also here you can see Zig’s arrays and slices; recv_buf[recv_total..] means we begin reading into the buffer at an offset of recv_total. Also we don’t need to pass the buffer length separately because it is part of the slice struct. Nor do we need to calculate the remaining length. Hurray!

I suspect that Zig gets another win through slices for making it easy to avoid null terminated strings. Zig explicitly supports null terminated strings, but you don’t need them for the standard library’s string functions.
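For a sense of what slices replace, this is roughly what a hand-rolled equivalent could look like in C. It is only a sketch: struct slice, slice_from and slice_contains are hypothetical names, and unlike Zig nothing stops the pointer and length from getting out of sync.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical slice type: a pointer plus a length, no terminator */
struct slice {
    const char *ptr;
    size_t len;
};

/* Roughly s[start..] in Zig, clamped instead of trapping when
 * start is out of range */
static struct slice slice_from(const struct slice s, size_t start)
{
    if (start > s.len)
        start = s.len;

    return (struct slice){ s.ptr + start, s.len - start };
}

/* Returns 1 if needle occurs in hay. Only needle has to be a C
 * string; hay is never read past hay.len. */
static int slice_contains(const struct slice hay, const char *const needle)
{
    const size_t n = strlen(needle);

    if (n == 0)
        return 1;

    for (size_t i = 0; i + n <= hay.len; i++) {
        if (!memcmp(hay.ptr + i, needle, n))
            return 1;
    }

    return 0;
}
```

With this the header check could search the received bytes for "\r\n\r\n" without first writing a terminating zero into recv_buf.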

Routing

Zig

    var file_path: []const u8 = undefined;
    var tok_itr = mem.tokenize(u8, recv_slice, " ");

    if (!mem.eql(u8, tok_itr.next() orelse "", "GET"))
        return ServeFileError.HeaderDidNotMatch;

    const path = tok_itr.next() orelse "";
    if (path.len == 0 or path[0] != '/')
        return ServeFileError.HeaderDidNotMatch;

    if (mem.eql(u8, path, "/"))
        file_path = "index"
    else
        file_path = path[1..];

    if (!mem.startsWith(u8, tok_itr.rest(), "HTTP/1.1\r\n"))
        return ServeFileError.HeaderDidNotMatch;

    var file_ext = fs.path.extension(file_path);
    var path_buf: [fs.MAX_PATH_BYTES]u8 = undefined;

    if (file_ext.len == 0) {
        var path_fbs = io.fixedBufferStream(&path_buf);

        try path_fbs.writer().print("{s}.html", .{file_path});
        file_ext = ".html";
        file_path = path_fbs.getWritten();
    }

    std.log.info("Opening {s}", .{file_path});

    var body_file = try dir.openFile(file_path, .{});
    defer body_file.close();

    const file_len = try body_file.getEndPos();

C

    if (sscanf(recv_buf, "GET %250s HTTP/1.1", path_buf) != 1) {
        dprintf(STDERR_FILENO,
            "[-] 'GET <file_path> HTTP/1.1' not matched in:\n %s",
            recv_buf);
        return;
    }

    if (!strcmp("/", path_buf)) {
        strcpy(path_buf, "index.html");
        file_path = path_buf;
    } else if (path_buf[0] == '/') {
        file_path = path_buf + 1;
    }

    printf("[*] Opening %s", file_path);
    body_fd = openat(public_dir, file_path, O_RDONLY);

    if (body_fd < 0 && errno == ENOENT) {
        strcpy(file_path + strlen(file_path), ".html");
        body_fd = openat(public_dir, file_path, O_RDONLY);
        printf(" failed trying with .html");
    }
    printf("\n");

    if (body_fd < 0) {
        perror("[-] openat");
        return;
    }

The Zig code is a bit longer because there is no sscanf equivalent in the Zig library. I’m not that confident about either the C or Zig code. However note the defer body_file.close() line. This saves having to do a goto or close the file at every early return thereafter.
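One reason to be wary of the sscanf approach is that its return value only counts conversions, so it says nothing about whether the trailing HTTP/1.1 literal matched. A possible way to tighten it, sketched here with a hypothetical parse_get_path helper, is to add a %n at the end of the format and check that it was reached.

```c
#include <stdio.h>

/* Hypothetical helper: extract the path from "GET <path> HTTP/1.1".
 * path must have room for 250 characters plus the terminator.
 * Returns 0 on success and -1 if the request line does not match. */
static int parse_get_path(const char *const request, char path[251])
{
    int end = 0;

    /* %n is only stored if everything before it matched, and it is
     * not counted in the return value */
    if (sscanf(request, "GET %250s HTTP/1.1%n", path, &end) != 1)
        return -1;

    if (end == 0 || path[0] != '/')
        return -1;

    return 0;
}
```

This still accepts anything that merely starts with a matching request line, which is about the same level of strictness as the Zig version.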

Sending

Zig

    const http_head =
        "HTTP/1.1 200 OK\r\n" ++
        "Connection: close\r\n" ++
        "Content-Type: {s}\r\n" ++
        "Content-Length: {}\r\n" ++
        "\r\n";
    const mimes = .{
        .{".html", "text/html"},
        .{".css", "text/css"},
        .{".map", "application/json"},
        .{".svg", "image/svg+xml"},
        .{".jpg", "image/jpeg"},
        .{".png", "image/png"}
    };
    var mime: []const u8 = "text/plain";

    inline for (mimes) |kv| {
        if (mem.eql(u8, file_ext, kv[0]))
            mime = kv[1];
    }

    std.log.info(" >>>\n" ++ http_head, .{mime, file_len});
    try stream.writer().print(http_head, .{mime, file_len});

    const zero_iovec = &[0]std.os.iovec_const{};
    var send_total: usize = 0;

    while (true) {
        const send_len = try std.os.sendfile(
            stream.handle,
            body_file.handle,
            send_total,
            file_len,
            zero_iovec,
            zero_iovec,
            0
        );

        if (send_len == 0)
            break;

        send_total += send_len;
    }
}

C

    const char *const http_head =
        "HTTP/1.1 200 OK\r\n"
        "Connection: close\r\n"
        "Content-Type: %s\r\n"
        "Content-Length: %lu\r\n"
        "\r\n";
    const char *mime = "text/html";
    if (strstr(file_path, ".css"))
        mime = "text/css";
    if (strstr(file_path, ".map"))
        mime = "application/json";
    if (strstr(file_path, ".svg"))
        mime = "image/svg+xml";
    if (strstr(file_path, ".jpg"))
        mime = "image/jpeg";
    if (strstr(file_path, ".png"))
        mime = "image/png";

    struct stat body_stat;
    if (fstat(body_fd, &body_stat)) {
        perror("[-] fstat");
        goto close_body;
    }
    sprintf(head_buf, http_head, mime, body_stat.st_size);
    printf("[*] >>>\n%s", head_buf);

    while (sent_total < strlen(head_buf)) {
        sent = write(sk, head_buf + sent_total, strlen(head_buf) - sent_total);

        if (sent < 0) {
            perror("[-] write");
            goto close_body;
        }

        sent_total += sent;
    }

    do {
        sent = sendfile(sk, body_fd, NULL, body_stat.st_size);

        if (sent < 0) {
            perror("[-] sendfile");
            goto close_body;
        }

        sent_total += sent;
    } while (sent > 0);

close_body:
    close(body_fd);
}

So here we can see the C has a goto in it. I’m not sure it makes much of a difference here although I guess it’s easier to mess up using goto than defer for freeing resources on exit. On the other hand you may be looking at defer thinking “huh? When does that run?”.

I have to say that Zig suffered a major fail in this part because the compiler segfaulted when I was trying to write the mime selection code. At the time of writing the following code will cause a segfault.

    const ms = .{ "a", "b" };
    const a = set: {
        inline for (ms) |m| {
            if (mem.eql(u8, "a", m))
                break :set m;
        }
        break :set "c";
    };

    const a2: [:0]const u8 = "a";
    try testing.expectEqual(a2, a);

This appears to be valid Zig code because it at least gets as far as emitting LLVM IR. However there is some issue there. Of course this is also very weird looking, so it’s perhaps best that I removed it.

Also note the inline for. This is required because ms and mimes are known at compile time and I think have comptime types. Zig doesn’t have a preprocessor, macros or templates. Instead it allows code with inputs known at compile time to be run at compile time. I suppose we could stop this code being evaluated at compile time by specifying runtime types on mimes.

In this program it’s not clear what the advantages of comptime are. Meanwhile it got in my way a little bit when I hit errors like this:

./src/self-serve.zig:114:5: error: unable to evaluate constant expression
    for (mimes) |kv| {

It’s worth mentioning that C compilers can evaluate a lot at compile time as well. You can see this demonstrated in my automata article. This simply happens when turning on optimisations and avoiding things which will hide the “constness” of variables. I suppose that comptime has resulted in a win for C here. Although this won’t dampen my enthusiasm for comptime in general.
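For comparison, the mime lookup can be written in C as an ordinary constant table and a loop, with no distinction between compile time and run time to think about. This is a sketch and not the code the server uses: struct mime_entry and mime_for are made-up names, and it matches on the final extension rather than using strstr on the whole path.

```c
#include <stddef.h>
#include <string.h>

struct mime_entry {
    const char *ext;
    const char *type;
};

/* Same extensions as the Zig mimes tuple; image/jpeg is the
 * registered type for .jpg */
static const struct mime_entry mimes[] = {
    { ".html", "text/html" },
    { ".css", "text/css" },
    { ".map", "application/json" },
    { ".svg", "image/svg+xml" },
    { ".jpg", "image/jpeg" },
    { ".png", "image/png" },
};

/* Hypothetical helper: pick a Content-Type from the last extension in
 * file_path, falling back to text/plain like the Zig version */
static const char *mime_for(const char *const file_path)
{
    const char *const ext = strrchr(file_path, '.');

    if (!ext)
        return "text/plain";

    for (size_t i = 0; i < sizeof(mimes) / sizeof(mimes[0]); i++) {
        if (!strcmp(ext, mimes[i].ext))
            return mimes[i].type;
    }

    return "text/plain";
}
```

Whether the compiler unrolls the loop or folds anything at compile time is then left entirely to the optimiser.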

Frankly I’m finding it increasingly difficult to draw solid comparisons at this point. While writing this article I keep discovering things I could do differently in both Zig and C. However I feel like it is time to cap this off.

Conclusion

This application isn’t exactly a major stress test for either language. They both fit well within my requirements for executable size and execution performance even with all the sanitizers turned on. There aren’t any of the complications of a large modular code base either. It doesn’t even allocate heap memory.

However I think this shows that Zig makes some concrete advances over C. Meanwhile it doesn’t appear to make anything more difficult. At least so long as the compiler doesn’t segfault or blurt out something like “cannot store runtime value in type ‘comptime_int’”, without any hint as to what to do about it.

Most issues I have encountered seem to be temporary implementation problems. Andrew Kelley and Co. didn’t decide to make radical changes over C that introduce new problems. Rather they changed some defaults and added evolutionary improvements. At least as far as this application shows. I still wonder if there are dragons lurking in the comptime features. On the other hand comptime can be seen as an evolution of the C preprocessor and other tools which generate C code.