
A barely HTTP/2 server in Zig

Intro

Don’t get me wrong, HTTP/2 is a clear improvement on HTTP/1.x, but where is the entry level version? It’s certainly not HTTP/3, that’s for sure.

The general wisdom amongst users of any new technology, such as Zig, is to stick with text based HTTP/1.1 and hide behind a proxy. This is the logical thing to do because HTTP/1.1 has a lower barrier to entry, primarily because it avoids HPACK, which we’ll get into in a moment.

To see why HTTP/1.1 is an attractive option, take a look at my article on Zig Vs C - Minimal HTTP server. It’s relatively easy to slap together a non-compliant HTTP server. My static file server is an extreme example, but practically you can aim for partial compliance and then hide behind a normalising HTTP proxy.

In theory, and ostensibly in practice, the proxy takes whatever awful HTTP the internet throws at you and converts it into some manageable subset. It can also handle TLS termination, so you can leave all that bad jazz to a third party.

There are two problems with this: the less important being performance and the more important being security.

In my semi-sophisticated opinion, both issues have the same root cause. Essentially, the length of an HTTP message is not known until you parse a variable length list of variable length headers.

In theory you should then know how long the body is. However, there is confusion over what specifies the body length, which leads to request smuggling.
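
As a sketch of the classic CL.TE case: if the front end trusts Content-Length while the back end trusts Transfer-Encoding, the two disagree about where this request ends and SMUGGLED is left in the back end’s buffer as the start of the next request:

POST / HTTP/1.1
Host: example.com
Content-Length: 13
Transfer-Encoding: chunked

0

SMUGGLED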

Request smuggling and desync attacks are enabled by the presence of proxies, and allegedly the problem gets worse with the introduction of HTTP/2. However, at the end of the linked article it states:

If you’re setting up a web application, avoid HTTP/2 downgrading - it’s the root cause of most of these vulnerabilities. Instead, use HTTP/2 end to end.

OK, so how hard can it be to implement HTTP/2 then? This is something I was excited to find out, not least because it is an excuse to try out Zig for implementing network protocols, with the eventual goal of doing some security research into crusty old tech using exciting new tech (a vague plan of mine).

Zig isn’t just exciting in my opinion; the traffic to my blog indicates it is the most interesting thing I write about. I also believe it is practical. Surprisingly my original HTTP server still compiles and runs on the latest Zig from Git. It seems the build API has changed quite a bit, but for an unstable language it is quite stable if you stay away from new features.

The source code for the parser and a static file server is here: github.com/richiejp/barely-http2

Here is some sample output from a request made by Curl; it’s not pretty:

zig run src/self-serve2.zig -- ~/portfolio/public
info: Listening on 127.0.0.1:9001; press Ctrl-C to exit...
info: Accepted Connection from: 127.0.0.1:48274
info: <<< Got preface!
info: >>> Sending server preface
info: <<< http2.FrameHdr{ .length = 18, .type = http2.FrameType.settings, .flags = http2.FrameFlags{ .settings = http2.SettingsFlags{ .ack = false, .unused = 0 } }, .r = false, .id = 0 } http2.Payload{ .settings = http2.SettingsPayload{ .settings = { 0, 3, 0, 0, 0, 100, 0, 4, 2, 0, 0, 0, 0, 2, 0, 0, 0, 0 } } }
info:     http2.Setting{ .maxConcurrentStreams = 100 }
info:     http2.Setting{ .initialWindowSize = 33554432 }
info:     http2.Setting{ .enablePush = false }
info: <<< http2.FrameHdr{ .length = 4, .type = http2.FrameType.windowUpdate, .flags = http2.FrameFlags{ .unused = 0 }, .r = false, .id = 0 } http2.Payload{ .windowUpdate = http2.WindowUpdatePayload{ .r = false, .windowSizeIncrement = 33488897 } }
info: <<< http2.FrameHdr{ .length = 44, .type = http2.FrameType.headers, .flags = http2.FrameFlags{ .headers = http2.HeadersFlags{ .endStream = true, .unused1 = false, .endHeaders = true, .padded = false, .unused2 = false, .priority = false, .unused3 = 0 } }, .r = false, .id = 1 } http2.Payload{ .headers = http2.HeadersPayload{ .headerBlockFragment = { 130, 4, 141, 98, 49, 216, 90, 61, 45, 58, 83, 88, 150, 246, 105, 191, 134, 65, 138, 160, 228, 29, 19, 157, 9, 184, 248, 0, 31, 122, 136, 37, 182, 80, 195, 203, 129, 112, 255, 83, 3, 42, 47, 42 }, .hdec = hpack.Decoder{ .from = { ... }, .to = { ... }, .table = hdrIndx.Table{ ... } } } }
info:     :method => GET
info:     :path => /barely-http2-zig
info:     :scheme => http
info:     :authority => localhost:9001
info:     user-agent => curl/8.0.1
info:     accept => */*
info: *** Opening barely-http2-zig.html
info: >>> Sending OK headers
info: >>> Sending DATA http2.FrameHdr{ .length = 16384, .type = http2.FrameType.data, .flags = http2.FrameFlags{ .data = http2.DataFlags{ .endStream = false, .unused = 0, .padded = false } }, .r = false, .id = 1 }
info: >>> Sending DATA http2.FrameHdr{ .length = 16384, .type = http2.FrameType.data, .flags = http2.FrameFlags{ .data = http2.DataFlags{ .endStream = false, .unused = 0, .padded = false } }, .r = false, .id = 1 }
info: >>> Sending DATA http2.FrameHdr{ .length = 16384, .type = http2.FrameType.data, .flags = http2.FrameFlags{ .data = http2.DataFlags{ .endStream = false, .unused = 0, .padded = false } }, .r = false, .id = 1 }
info: >>> Sending DATA http2.FrameHdr{ .length = 16384, .type = http2.FrameType.data, .flags = http2.FrameFlags{ .data = http2.DataFlags{ .endStream = false, .unused = 0, .padded = false } }, .r = false, .id = 1 }
info: >>> Sending DATA http2.FrameHdr{ .length = 16384, .type = http2.FrameType.data, .flags = http2.FrameFlags{ .data = http2.DataFlags{ .endStream = false, .unused = 0, .padded = false } }, .r = false, .id = 1 }
info: >>> Sending DATA http2.FrameHdr{ .length = 12857, .type = http2.FrameType.data, .flags = http2.FrameFlags{ .data = http2.DataFlags{ .endStream = true, .unused = 0, .padded = false } }, .r = false, .id = 1 }
info: >>> Sent 94777 file bytes

HPACK

It turned out that the main obstacle to slapping together a half-arsed HTTP/2 parser, and writing a victory blog post declaring it took me only N hours, was the header compression scheme.

This, in addition to the stream dependency and encryption talk, is probably what frightens people away from HTTP/2.

There is no way you can skip over HPACK decoding unless the other side is equally lazy. Initially I thought I had found a way of doing it by setting the decoder table size to zero. However this is scuppered by an allowance for the client to start sending frames before it has received any settings (because latency).

In fact, even if you could avoid implementing the decoder table, you would still have to deal with Huffman encoded strings and variable length integers. These things are standard computer science though, so there is plenty of good material to fall back on.

I did however fuss over avoiding memory allocations and implement all the data structures myself. It’s worth pointing out that the Zig standard library has some nice data structures and functions for doing stuff like this.

Let’s take a look at the thing which I tried hardest to avoid: the decoder table. It provides some nice compression for site specific headers. If you are lost as to what I am talking about then do a search on HPACK; there are some articles waxing lyrical about its greatness. Essentially though, it remembers headers, in full or in part, so that they don’t have to be resent for multiple requests.

/// The content of the table entries; A FIFO buffer.
///
/// It's a buffer with capacity 3x the size of the minimum table size
/// required by HPACK. This allows us to keep adding entries in
/// contiguous chunks with only an occasional copy.
///
/// New entries are added before start and their length subtracted
/// from start.  When start gets below the minimum table size,
/// everything is shifted backwards.
///
/// Only 2X the table size would be needed except that a new entry can
/// reference the index of an entry which is about to be removed.
const HdrData = struct {
    /// The start of the first entry (most recently added)
    start: u16 = 2 * 4096,
    /// Where the start was at the previous copy to shift everything backwards.
    /// Needed to correct indexes for copied items.
    prevStart: u16 = 3 * 4096,
    /// The length of the current entries
    len: u16 = 0,
    /// The data
    vec: [3 * 4096]u8 = undefined,
};

/// An entry in the table data. Sort of like a slice, but using
/// 16-bit indexes.
const HdrPtr = struct {
    start: u16,
    nameLen: u16,
    valueLen: u16,
};

/// An inner table indexing the table's entries. Needed because the
/// entries are uneven.
const HdrIndx = struct {
    start: u8 = 127,
    len: u8 = 0,
    vec: [256]HdrPtr = undefined,
};

/// The encapsulating Table struct because I forgot that files in Zig
/// are structs.
pub const Table = struct {
    data: HdrData = HdrData{},
    indx: HdrIndx = HdrIndx{},
    size: u16 = 4096,

    fn capacity(self: *Table) u16 {
        return self.data.vec.len / 3;
    }

    /// Get an entry from the table. The returned struct is borrowed
    /// and needs to be copied if it is to be used after it has been
    /// evicted from the table.
    pub fn get(self: *Table, i: u8) !HdrConst {
        const data = &self.data;
        const indx = &self.indx;
        const slen = STATIC_INDX.len;

        if (i == 0)
            return error.InvalidIndexZero;

        if (i < slen)
            return STATIC_INDX[i];

        if (i - slen >= indx.len) {
            return error.IndexTooBig;
        }

        const hdr = indx.vec[indx.start + (i - slen)];
        const start = if (hdr.start < data.start)
            2 * self.capacity() + (hdr.start - data.prevStart)
        else
            hdr.start;

        const value_start = start + hdr.nameLen;

        return .{
            .name = data.vec[start..value_start],
            .value = data.vec[value_start .. value_start + hdr.valueLen],
        };
    }

    /// The length of the table according to the HPACK spec
    fn nominalLen(self: *Table, name: []const u8, value: []const u8) usize {
        const estimated_overhead = 32 * (1 + @as(usize, self.indx.len));

        return self.data.len + name.len + value.len + estimated_overhead;
    }

    /// Add an entry to the table. The name and value arguments can
    /// point to an existing entry which will evict itself.
    pub fn add(self: *Table, name: []const u8, value: []const u8) !void {
        const data = &self.data;
        const indx = &self.indx;

        while (self.nominalLen(name, value) > self.size) {
            if (indx.len == 0)
                return;

            const last = indx.vec[indx.start + indx.len - 1];

            data.len -= last.nameLen + last.valueLen;
            indx.len -= 1;
        }

        if (indx.start == 0) {
            mem.copy(HdrPtr, indx.vec[128..], indx.vec[0..128]);
            indx.start = 128;
        }

        indx.len += 1;
        indx.start -= 1;

        const hdr = &indx.vec[indx.start];
        hdr.nameLen = @truncate(u16, name.len);
        hdr.valueLen = @truncate(u16, value.len);

        data.start -= hdr.nameLen;
        data.start -= hdr.valueLen;
        hdr.start = data.start;

        data.len += hdr.nameLen;
        data.len += hdr.valueLen;

        const value_start = hdr.start + hdr.nameLen;

        mem.copy(u8, data.vec[hdr.start..value_start], name);
        mem.copy(u8, data.vec[value_start .. value_start + hdr.valueLen], value);

        if (data.start < self.capacity()) {
            mem.copyBackwards(
                u8,
                data.vec[2 * self.capacity() ..],
                data.vec[data.start .. data.start + data.len],
            );
            data.prevStart = data.start;
            data.start = 2 * self.capacity();
        }
    }
};

This stores the full name and value of each header in a buffer that takes up 3x the space of the nominal table size. This is not great memory usage, but it’s relatively simple and hopefully cache efficient, because the memory accesses are likely to be close together.

The table has a fixed size of 4096, the default specified by the spec and effectively the minimum, since the client may assume it before our settings arrive. Zig would allow this to be changed easily in a variety of ways; I just haven’t bothered to do it.
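
In case the comments don’t paint the picture, here is a rough sketch of the buffer layout as I understand it:

vec: 0 ------------ 4096 ------------ 8192 ------------ 12288
                      |                 |
                      |                 start begins at 8192; new entries
                      |                 are written just below it, so the
                      |                 live region [start, start + len)
                      |                 creeps downwards
                      when start drops below 4096 the live region
                      is copied back up to begin at 8192 again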

In the end it’s not a lot of code, although it’s the type of code which can be a pain to debug. Zig’s built-in tests helped with that. You can run them with:

$ zig test src/hdrIndx.zig
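
A round trip through the table can be tested with something like the following sketch, which assumes it lives in src/hdrIndx.zig alongside Table and STATIC_INDX:

test "add then get a dynamic table entry" {
    var t = Table{};

    try t.add("x-custom", "some value");

    // Dynamic entries are indexed straight after the static table
    const h = try t.get(STATIC_INDX.len);
    try std.testing.expectEqualStrings("x-custom", h.name);
    try std.testing.expectEqualStrings("some value", h.value);
}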

Each file in the src directory has its own tests. Continuing with HPACK, let’s look at integer decoding. Each header field name and value is prepended with a variable length integer describing the field’s length.

What’s more, this integer encoding allows some bits in the first byte to be used for flags or whatever, so the integer value actually starts part way through the first byte. The first bit of each remaining byte (if any) is then used as a stop bit.

This allows for infinitely large integers, but I don’t allow that because I’m a spoil sport.

/// Get the unsigned type big enough to count the bits in T. Needed
/// because Zig constrains the right hand side of a shift to an
/// integer just big enough to express a full shift, which is only u3
/// for u8, for example.
///
/// Meanwhile I don't know a way to specify this type other than to
/// construct it like this.
fn ShiftSize(comptime T: type) type {
    const ShiftInt = Type{
        .Int = .{
            .signedness = std.builtin.Signedness.unsigned,
            .bits = comptime std.math.log2_int(u16, @bitSizeOf(T)),
        },
    };

    return @Type(ShiftInt);
}

fn decodeInt(comptime T: type, comptime n: u3, buf: *[]const u8) !T {
    const prefix = (1 << n) - 1;
    var b = buf.*[0];
    var i: T = b & prefix;

    if (i < prefix) {
        buf.* = buf.*[1..];
        return i;
    }

    var j: ShiftSize(T) = 1;
    while ((j - 1) * 7 < @bitSizeOf(T)) : (j += 1) {
        b = buf.*[j];

        i += @as(T, (b & 0x7f)) << (7 * (j - 1));

        if (b < 0x80)
            break;
    } else {
        return UnpackError.IntTooBig;
    }

    buf.* = buf.*[j + 1 ..];
    return i;
}
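
As a worked example, RFC 7541 (C.1.2) encodes 1337 with a 5-bit prefix as three bytes: 31 (the prefix filled), 154 (26 plus the continuation bit) and 10, since 31 + 26 + 10 × 128 = 1337. A test sketch, assuming it sits in the same file as decodeInt:

test "decode the RFC 7541 C.1.2 example integer" {
    var buf: []const u8 = &[_]u8{ 31, 154, 10 };

    try std.testing.expectEqual(@as(u16, 1337), try decodeInt(u16, 5, &buf));
    // The used bytes are chopped off the front of the slice
    try std.testing.expectEqual(@as(usize, 0), buf.len);
}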

It’s been a while since I wrote this and boy am I glad I wrote that comment about the shift size. This function takes comptime parameters which allow different functions to be generated depending on the type it returns and on how many bits are ignored (n).

Interestingly, Zig forces you to use the minimum sized type to perform a shift, which I think caught a few mistakes. The ShiftSize function constructs the necessary sized type for shifting the T that was passed into decodeInt.

So indeed, Zig allows constructing arbitrary types from ordinary code at comptime.
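
A minimal demonstration of what I mean; the right hand side of a shift on a u8 must be a u3, so an out of range shift amount is caught at compile time instead of becoming undefined behaviour:

test "shift amounts use the minimum sized type" {
    const x: u8 = 0b1000_0000;
    // A u4, or a comptime int of 8 or more, would be a compile error here
    const amt: u3 = 7;

    try std.testing.expectEqual(@as(u8, 1), x >> amt);
}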

Something else to note about this code is that buf is a pointer to a slice. The syntax buf.* dereferences the pointer. I update the slice to chop off the bytes that were used decoding the integer. I’m not sure this is an advisable thing to do.

Now let’s look at string encodings, which demonstrate the other type of compression employed by HPACK.

/// Decode a string, which if it is not Huffman encoded is fairly
/// straightforward.
///
/// If it is Huffman encoded then we have to deal with the fact
/// Huffman codes are not byte aligned and are variable length.
///
/// We could put the huffman codes in a binary tree and lookup one bit
/// at a time. However I doubt this is the right place to start on
/// common CPUs.
///
/// So instead we shift (at most) the next four bytes into a
/// buffer. Then compare the first bits of the first byte to the
/// shortest huffman codes. If it doesn't match any, then move on to
/// longer codes until we are comparing all four bytes.
///
/// I haven't done any research into the fastest methods of Huffman
/// decoding. This is just a first approximation.
fn decodeStr(from: *[]const u8, to: *[]u8) ![]const u8 {
    const huffman = from.*[0] & 0x80 == 0x80;
    const len = try decodeInt(u16, 7, from);
    const str = from.*[0..len];

    from.* = from.*[len..];

    if (!huffman)
        return str;

    var i: u16 = 0;
    var j: u32 = 0;
    var k: u16 = 0;
    var c = [_]u8{0} ** 5;

    all: while (i < len) {
        mem.copy(u8, &c, str[i..std.math.min(i + 5, str.len)]);

        const j_rem = @truncate(u3, j);

        var l: u3 = 0;
        while (j_rem > 0 and l < c.len - 1) : (l += 1) {
            c[l] <<= j_rem;
            c[l] |= c[l + 1] >> @truncate(u3, 8 - @as(u4, j_rem));
        }

        var glen: u5 = 0;
        const dehuff = decode: for (DEHUFF) |group| {
            glen = 1 + ((group.len - 1) >> 3);

            if (glen != group.codes[0].code.len)
                return error.GlenWrongLen;

            const bits_left = str.len * 8 - j;
            if (bits_left < group.len) {
                if (glen > 1 or j_rem < 1)
                    return error.HuffNoMatchInputEndedEarly;

                const pad_mask = @truncate(u8, @as(u16, 0xff00) >> @truncate(u4, bits_left));
                if (c[0] & pad_mask == pad_mask)
                    break :all
                else
                    return error.HuffInvalidPadding;
            }

            const glen_rem = @truncate(u3, group.len);
            const last_mask = if (glen_rem == 0)
                0xff
            else
                @truncate(u8, @as(u16, 0xff00) >> glen_rem);
            const last = c[glen - 1];

            for (group.codes) |huff| {
                if (huff.code[glen - 1] != last_mask & last)
                    continue;

                if (mem.eql(u8, huff.code[0 .. glen - 1], c[0 .. glen - 1]))
                    break :decode huff;
            }
        } else {
            return error.HuffNoMatch;
        };

        j += dehuff.len;
        i = @truncate(u16, j / 8);

        to.*[k] = dehuff.sym;
        k += 1;
    }

    const ret = to.*[0..k];
    to.* = to.*[k..];

    return ret;
}

The first four lines of the function decode the string if it is not Huffman encoded. HPACK is actually relatively simple in the sense that the Huffman encoding is static. So I didn’t have to spend long reading my algorithms text book and could put it back under the table leg, thus restoring stability.

There is a big static table in src/huff.zig which I sort at comptime. From a Zig point of view it makes use of loop labels and break statements, which are good. There is lots of use of the @truncate builtin. Possibly this is wrong in some cases and it should be @intCast or something, the latter having a safety check I believe.

Zig prevents some bit banging techniques common in C, because they result in undefined behaviour. The end result is lots of uses of builtins instead of shifts and masking. Of course you can still use the wrong builtins.
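
RFC 7541’s examples come in handy for testing here too; it Huffman encodes the 15 byte string www.example.com into 12 bytes (C.4.1). A sketch, assuming the test sits next to decodeStr and its tables:

test "decode the RFC 7541 C.4.1 Huffman string" {
    // 0x8c is the Huffman flag (0x80) plus the encoded length, 12
    var from: []const u8 = &[_]u8{
        0x8c, 0xf1, 0xe3, 0xc2, 0xe5, 0xf2,
        0x3a, 0x6b, 0xa0, 0xab, 0x90, 0xf4, 0xff,
    };
    var scratch: [32]u8 = undefined;
    var to: []u8 = &scratch;

    try std.testing.expectEqualStrings(
        "www.example.com",
        try decodeStr(&from, &to),
    );
}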

Finally, for HPACK, let’s look at the Decoder struct which provides an iterator interface for incoming headers. I’ll show how it’s used first:

fn serveFiles(h2c: *http2.NetConnection, dir: fs.Dir) !void {

    ...

                var path_buf: [fs.MAX_PATH_BYTES]u8 = undefined;
                var path: ?[]const u8 = null;

                while (headers.next()) |h| {
                    std.log.info("    {s} => {s}", h);

                    if (mem.eql(u8, ":path", h.name) and h.value.len <= path_buf.len) {
                        mem.copy(u8, &path_buf, h.value);
                        path = path_buf[0..h.value.len];
                    }
                } else |err| {
                    if (err != error.EndOfData) return err;
                }

                if (path) |p|
                    try sendFile(h2c, dir, frame.hdr.id, p)
                else
                    return error.DidntFindThePathHeader;
    ...

Above we can see that headers.next() is called and, if we find the path header, we copy it. Most HTTP implementations I have seen will allocate strings for every header and pass them up to a higher level framework. We could do that too, but I think it’s important to keep the option of just skipping over stuff we don’t care about.

Also, some headers we do care about could be dealt with during parsing without performing a copy. Below is the implementation, which correlates with this part of the spec.

/// An iterator that takes I/O buffers, a decoding table and returns
/// header entries. The table is mutated so you can't run the iterator
/// twice.
pub const Decoder = struct {
    /// buffer containing the encoded data
    from: []const u8,
    /// A scratch buffer for decoded headers
    to: []u8,
    table: hdrIndx.Table = hdrIndx.Table{},

    pub fn init(from: []const u8, to: []u8) Decoder {
        return .{
            .from = from,
            .to = to,
        };
    }

    pub fn newData(self: *Decoder, from: []const u8, to: []u8) void {
        self.from = from;
        self.to = to;
    }

    /// Get the next header. The contents of the header may be
    /// borrowed from the scratch buffer or the table's buffer.
    pub fn next(self: *Decoder) !HdrConst {
        if (self.from.len < 1)
            return error.EndOfData;

        const table = &self.table;
        const tag = self.from[0];
        const repr = try HdrFieldRepr.from(tag);

        switch (repr) {
            .indexed => {
                const i = try decodeInt(u8, 7, &self.from);
                return table.get(i);
            },
            .indexedNameAddValue => {
                const i = try decodeInt(u8, 6, &self.from);
                const ihdr = try table.get(i);
                const hdr = .{
                    .name = ihdr.name,
                    .value = try decodeStr(&self.from, &self.to),
                };

                try table.add(hdr.name, hdr.value);

                return hdr;
            },
            .indexedNameLitValue => {
                const i = try decodeInt(u8, 4, &self.from);
                const ihdr = try table.get(i);
                const hdr = .{
                    .name = ihdr.name,
                    .value = try decodeStr(&self.from, &self.to),
                };

                return hdr;
            },
            .addNameAddValue, .litNameLitValue => {
                self.from = self.from[1..];

                const hdr: HdrConst = .{
                    .name = try decodeStr(&self.from, &self.to),
                    .value = try decodeStr(&self.from, &self.to),
                };

                if (repr == .addNameAddValue)
                    try table.add(hdr.name, hdr.value);

                return hdr;
            },
        }
    }
};
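
The log above shows Curl’s header block fragment starting with 130 (0x82), which is an indexed field referring to static table entry 2, i.e. :method => GET. That makes a nice minimal test sketch for the Decoder:

test "decode an indexed header field" {
    var scratch: [64]u8 = undefined;
    var dec = Decoder.init(&[_]u8{0x82}, &scratch);

    const h = try dec.next();
    try std.testing.expectEqualStrings(":method", h.name);
    try std.testing.expectEqualStrings("GET", h.value);
    try std.testing.expectError(error.EndOfData, dec.next());
}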

Couldn’t they have just made an HTTP/1.2 that encoded the headers in RESP strings or something?

Frames

An HTTP/2 connection is made up of a series of frames with various types of payload. The frame header is always the same and not too complicated. Initially I tried decoding the header by declaring a packed struct for it and doing a @ptrCast or @bitCast.

If it worked it would look something like this:

const HdrData = packed struct {
    length: u24,
    type: FrameType, // or u8
    flags: FrameFlags,
    r: bool,
    id: u31,
};

...
    const hdr = @ptrCast(*const HdrData, buf[n .. n + 9]);
    // Then switch the endianess

The major issue is that the endianness needs switching from network byte order to native for length and perhaps id. There are functions to help with reading in integers of different endianness. However it didn’t seem worth trying to combine that with a pointer cast.

Then there is FrameFlags, which it is convenient to define as a tagged union for printing. To my knowledge Zig doesn’t define where the tag is stored or allow you to specify it. We can’t tell it that a packed union has its tag immediately before the union field content.

So I feel this code could be cleaner, but also that I am fussing over minor details. Below is how a frame is decoded and encoded.

/// All HTTP/2 traffic is made up of frames with a fixed sized header
/// of the same format. Each frame specifies its type and payload
/// length. On an abstract level this makes parsing HTTP/2 traffic
/// easy.
///
/// The only complication in Zig being the interaction between
/// endianness and packed u24. Otherwise we could declare the flags as
/// u8, mark the struct as packed then do a single @ptrCast. I tried
/// something like this, but got in a mess and settled on the below.
pub const FrameHdr = struct {
    /// Length of the frame's payload
    length: u24,
    /// How we should interpret everything that follows
    type: FrameType,
    flags: FrameFlags,
    /// A reserved bit, which we can set to 1 as an act of rebellion.
    r: bool = false,
    /// The stream ID or zero if this frame applies to the connection.
    id: u31,

    pub fn from(buf: *const [9]u8) FrameHdr {
        const ftype = @intToEnum(FrameType, buf[3]);

        return .{
            .length = mem.readIntBig(u24, buf[0..3]),
            .type = ftype,
            .flags = switch (ftype) {
                .headers => .{ .headers = @bitCast(HeadersFlags, buf[4]) },
                .settings => .{ .settings = @bitCast(SettingsFlags, buf[4]) },
                .windowUpdate => .{ .unused = buf[4] },
                else => .{ .unknown = buf[4] },
            },
            .r = @bitCast(bool, @truncate(u1, buf[5] >> 7)),
            .id = @intCast(u31, mem.readIntBig(u32, buf[5..9]) & 0x7fffffff),
        };
    }

    pub fn to(self: FrameHdr, buf: []u8) []const u8 {
        mem.writeIntBig(u24, buf[0..3], self.length);
        buf[3] = @enumToInt(self.type);
        buf[4] = switch (self.flags) {
            .data => |flags| @bitCast(u8, flags),
            .headers => @bitCast(u8, self.flags.headers),
            .settings => @bitCast(u8, self.flags.settings),
            else => unreachable,
        };
        // r is always 0
        mem.writeIntBig(u32, buf[5..9], @intCast(u32, self.id));

        return buf[0..9];
    }
};
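
As a sanity check, the empty settings frame from the log above (sent as the server preface) should survive a round trip through from() and to(). A test sketch:

test "FrameHdr round trips an empty settings frame" {
    const bytes = [9]u8{ 0, 0, 0, 0x4, 0, 0, 0, 0, 0 };
    const hdr = FrameHdr.from(&bytes);

    try std.testing.expectEqual(FrameType.settings, hdr.type);
    try std.testing.expectEqual(@as(u24, 0), hdr.length);
    try std.testing.expectEqual(@as(u31, 0), hdr.id);

    var buf: [9]u8 = undefined;
    try std.testing.expectEqualSlices(u8, &bytes, hdr.to(&buf));
}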

There are a number of frame types which are listed below.

const FrameType = enum(u8) {
    data,
    headers,
    priority,
    rstStream,
    settings,
    pushPromise,
    ping,
    goAway,
    windowUpdate,
    continuation,
    _,
};

These mostly have different payloads. The structure of the payloads varies depending on what flags are specified in the frame header. They are mostly not too complicated in terms of layout though, the exception being the headers of course.

Let’s look at the settings payload decoder. It would be possible to ignore the settings completely. However I was curious to see what Curl would send.

const SettingId = enum(u16) {
    headerTableSize = 0x1,
    enablePush,
    maxConcurrentStreams,
    initialWindowSize,
    maxFrameSize,
    maxHeaderListSize,
};

const Setting = union(SettingId) {
    headerTableSize: u32,
    enablePush: bool,
    maxConcurrentStreams: u32,
    initialWindowSize: u31,
    maxFrameSize: u24,
    maxHeaderListSize: u32,
};

/// Settings limit what we can send to the other side. This is
/// essentially an iterator which returns a tagged u32
const SettingsPayload = struct {
    settings: []const u8,
    used: usize,

    pub fn init(buf: []const u8) SettingsPayload {
        return .{ .settings = buf, .used = 0 };
    }

    pub fn next(self: *SettingsPayload) !Setting {
        if (self.settings.len - self.used == 0)
            return error.EndOfData;

        if (self.settings.len - self.used < 6)
            return error.UnexpectedEndOfData;

        const buf = self.settings[self.used..][0..6];
        self.used += 6;

        const id = mem.readIntBig(u16, buf[0..2]);
        const val = mem.readIntBig(u32, buf[2..]);

        return switch (id) {
            1 => .{ .headerTableSize = val },
            2 => .{ .enablePush = @bitCast(bool, @intCast(u1, val)) },
            3 => .{ .maxConcurrentStreams = val },
            4 => .{ .initialWindowSize = @intCast(u31, val) },
            5 => .{ .maxFrameSize = @intCast(u24, val) },
            6 => .{ .maxHeaderListSize = val },
            else => error.NoIdeaWhatThatSettingIs,
        };
    }
};

It is another iterator. Again it returns a tagged union which is nice because we can switch on it and it is formatted for printing automatically.

By this point I was no longer passing around pointers to slices. Instead I had (re)discovered that functions in a Zig struct can begin with a self argument. I could then just keep a slice and a used field on the struct. It occurred to me while writing this that I don’t even need used.

const SettingsPayload = struct {
    settings: []const u8,

    pub fn init(buf: []const u8) SettingsPayload {
        return .{ .settings = buf };
    }

    pub fn next(self: *SettingsPayload) !Setting {
        if (self.settings.len == 0)
            return error.EndOfData;

        if (self.settings.len < 6)
            return error.UnexpectedEndOfData;

        const buf = self.settings[0..6];

        self.settings = self.settings[6..];

        const id = mem.readIntBig(u16, buf[0..2]);
        const val = mem.readIntBig(u32, buf[2..]);

        return switch (id) {
            1 => .{ .headerTableSize = val },
            2 => .{ .enablePush = @bitCast(bool, @intCast(u1, val)) },
            3 => .{ .maxConcurrentStreams = val },
            4 => .{ .initialWindowSize = @intCast(u31, val) },
            5 => .{ .maxFrameSize = @intCast(u24, val) },
            6 => .{ .maxHeaderListSize = val },
            else => error.NoIdeaWhatThatSettingIs,
        };
    }
};
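
Either version can be tested against the exact bytes Curl sent in the log at the top of the post:

test "decode the settings curl sent" {
    var p = SettingsPayload.init(&[_]u8{
        0, 3, 0, 0, 0, 100, // maxConcurrentStreams = 100
        0, 4, 2, 0, 0, 0, // initialWindowSize = 33554432
        0, 2, 0, 0, 0, 0, // enablePush = false
    });

    try std.testing.expectEqual(@as(u32, 100), (try p.next()).maxConcurrentStreams);
    try std.testing.expectEqual(@as(u31, 33554432), (try p.next()).initialWindowSize);
    try std.testing.expectEqual(false, (try p.next()).enablePush);
    try std.testing.expectError(error.EndOfData, p.next());
}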

It’s not like I haven’t used a language with slices before. However there does seem to be some kind of mental block, perhaps because they look similar to arrays.

Connection

HTTP/2 organises things into connections and streams within connections. Some things are connection level and other things are stream level.

Also a connection maps to a TCP/TLS connection… I think. At least for the current purposes, a stream is just an ID number we need to remember when responding to a headers frame containing a GET method.

There are no explicit request/response objects. However a request can be thought of as a sequence of headers, continuation and data frames on a single stream. There seems to be some leeway here though, which could be another barrier to adoption.
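
For the exchange in the log above, the frame sequence looks roughly like this:

client                                        server
  ----->  HEADERS (stream 1, END_STREAM | END_HEADERS)
  <-----  HEADERS (stream 1, :status 200)
  <-----  DATA    (stream 1)
  <-----  DATA    (stream 1, END_STREAM)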

For now though, we just have a Connection object which provides a low level interface for getting frames.

/// HTTP/2's idea of a connection which wraps around the underlying
/// stream. Usually the underlying stream will be a TCP connection,
/// but it could be anything which provides the Reader and Writer
/// interfaces e.g. a file or buffer with captured frame data in it.
///
/// This is essentially an iterator interface to the underlying
/// data, which recurses into iterators for the various types of frame
/// payload.
///
/// It only allocates memory during init and we can reuse the
/// connection object by calling reinit. This preempts the No. 1
/// performance issue I have seen in most open source libraries.
///
/// One should assume that any pointers it returns are to buffers it
/// allocated at init time. So their lifetime is only until the next
/// call to nextFrame[Hdr]. Long lived data therefore needs to be
/// copied. Whether this is an issue depends on the use case.
pub fn Connection(comptime Reader: type, comptime Writer: type) type {
    const preface = [_]u8{
        0x50, 0x52, 0x49, 0x20, 0x2a, 0x20, 0x48, 0x54, 0x54, 0x50, 0x2f,
        0x32, 0x2e, 0x30, 0x0d, 0x0a, 0x0d, 0x0a, 0x53, 0x4d, 0x0d, 0x0a,
        0x0d, 0x0a,
    };

    return struct {
        const Self = @This();

        allocator: std.mem.Allocator,
        frame_in: []u8,
        have: usize = 0,
        used: usize = 0,

        headers_in: []u8,
        hdec: hpack.Decoder,
        frame_out: []u8,

        reader: Reader,
        writer: Writer,

        pub fn init(
            a: std.mem.Allocator,
            buf_len: usize,
        ) !Self {
            return .{
                .allocator = a,
                .frame_in = try a.alloc(u8, buf_len),
                .headers_in = try a.alloc(u8, buf_len),
                .hdec = .{ .from = undefined, .to = undefined },
                .frame_out = try a.alloc(u8, buf_len),
                .reader = undefined,
                .writer = undefined,
            };
        }

        pub fn reinit(self: *Self, r: Reader, w: Writer) void {
            self.have = 0;
            self.used = 0;
            self.reader = r;
            self.writer = w;

            mem.set(u8, self.frame_in, 0);
            mem.set(u8, self.frame_out, 0);
            mem.set(u8, self.headers_in, 0);
        }

        pub fn deinit(self: *Self) void {
            self.allocator.free(self.frame_in);
            self.allocator.free(self.frame_out);
            self.allocator.free(self.headers_in);
        }

        fn read(self: *Self, needed: usize) ![]const u8 {
            var have = self.have - self.used;
            const in = self.frame_in[self.used..];

            const len = if (have < needed)
                try self.reader.readAtLeast(in, needed - have)
            else
                0;

            have += len;
            self.have += len;

            if (have == 0)
                return error.EndOfData;

            if (have < needed)
                return error.UnexpectedEndOfData;

            self.used += needed;

            return in[0..needed];
        }

        /// Start the HTTP/2 connection as the server and assuming
        /// "prior knowledge". This is a simple case of reading in the
        /// magic string (preface) sent by the client and sending a
        /// settings frame.
        pub fn start(self: *Self) !void {
            const pface = try self.read(preface.len);

            if (mem.eql(u8, &preface, pface))
                std.log.info("<<< Got preface!", .{})
            else {
                std.log.info("<<< Expected preface, bug got\n: {s}", .{pface});
                return error.InvalidPreface;
            }

            std.log.info(">>> Sending server preface", .{});
            const empty_settings = FrameHdr{
                .length = 0,
                .type = .settings,
                .flags = .{ .settings = .{ .ack = false } },
                .id = 0,
            };
            try self.writer.writeAll(empty_settings.to(self.frame_out));
        }

        /// Lower level iterator which returns just the frame
        /// header. Potentially this can be used to skip over
        /// uninteresting frames.
        pub fn nextFrameHdr(self: *Self) !FrameHdr {
            return FrameHdr.from((try self.read(9))[0..9]);
        }

        /// Returns a slightly higher level Frame payload iterator and
        /// frame header object. Still pretty low level. We'd probably
        /// want to abstract this into streams and abstract streams
        /// into requests and responses.
        pub fn nextFrame(self: *Self) !Frame {
            const hdr = try self.nextFrameHdr();

            return Frame.init(
                hdr,
                &self.hdec,
                try self.read(hdr.length),
                self.headers_in,
            );
        }

        ...
    };
}

There is a simplified listener in the main library file which can be run with $ zig run src/http2.zig. This uses the connection as follows.

pub const NetConnection = Connection(std.net.Stream, std.net.Stream);

fn serve(h2c: *NetConnection) !void {
    try h2c.start();

    while (h2c.nextFrame()) |frame| {
        std.log.info("<<< {} {}", frame);

        var payload = frame.payload;

        switch (payload) {
            .settings => |*settings| {
                while (settings.next()) |setting| {
                    std.log.info("    {}", .{setting});
                } else |err| {
                    if (err != error.EndOfData) return err;
                }
            },
            .headers => |*headers| {
                while (headers.next()) |h| {
                    std.log.info("    {s} => {s}", h);
                } else |err| {
                    if (err != error.EndOfData) return err;
                }

                std.log.info(">>> Sending 200 OK and end stream", .{});

                try h2c.sendHeaders(.{
                    .end_stream = true,
                    .stream_id = frame.hdr.id,
                }, &.{.{
                    .indexed = .status200,
                }});
            },
            else => {},
        }
    } else |err| {
        if (err != error.EndOfData)
            return err;
    }
}

So again it is an iterator that goes over the incoming frames from any stream. I have omitted the sendHeaders() function, but it just sends a headers frame back with :status => 200 and END_STREAM set. This closes the stream and Curl seems to go away happy.

Zelf Zerve 2

Finally I converted my static site server to use barely HTTP/2. It’s currently pretty useless due to the lack of TLS, which I’ll come to in a minute. To my knowledge browsers refuse to use HTTP/2 without TLS, so all we can do is use Curl.
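
For example, the log at the top of this post came from something like:

$ curl --http2-prior-knowledge http://localhost:9001/barely-http2-zig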

I suppose one part that might be interesting is the use of std.os.sendfile to populate data frames.

    while (send_total < file_len) {
        const len_left = file_len - send_total;
        const frame_len = std.math.min(max_frame_len, len_left);
        const data_hdr = http2.FrameHdr{
            .length = @intCast(u24, frame_len),
            .type = .data,
            .flags = .{ .data = .{
                .endStream = len_left == frame_len,
                .padded = false,
            } },
            .id = stream_id,
        };
        var data_buf: [9]u8 = undefined;

        std.log.info(">>> Sending DATA {}", .{data_hdr});
        try h2c.writer.writeAll(data_hdr.to(&data_buf));

        var send_len: usize = 0;
        while (send_len < frame_len) {
            send_len += try std.os.sendfile(
                h2c.writer.handle,
                body_file.handle,
                // Resume from wherever the last partial send reached
                send_total + send_len,
                frame_len - send_len,
                zero_iovec,
                zero_iovec,
                0,
            );
        }

        send_total += send_len;
    }

This is a bit more complicated than the HTTP/1.1 version, but that is only because we split the data across multiple frames. It would probably be worse in HTTP/1.1 if we started using chunked encoding.

Because we are working at a low level, we can send the frame headers and then copy the file (or page cache) data to the socket buffer inside the kernel, meaning we never have to buffer the file data in user land.

This may even be possible when using TLS if we can use a Linux kernel crypto socket. Speaking of which…

TLS (TODO)

I was quite shocked to find that Zig already has a TLS implementation in the standard library. Presently though, only the client is implemented; there is an issue open to create the server.

HTTP/2 connections are meant to be established with TLS and the ALPN extension. I suppose some clients may support HTTP/2 but not ALPN; however I imagine Chrome and Firefox do. So hopefully I can find time to implement that, or someone else will.