While working on my Linux socket example I decided to write a tiny HTTP server for previewing my static website. This shows the basics of using TCP sockets, correctly adds .html to routes without it and saves me the distress of typing python, npm or similar blasphemies. The server is barely functional of course. However it is enough to get my pages to appear in Firefox and Chrome.
It also happens that I am desperate to write Zig code. It’s an unfortunate part of my personality that I cannot stay away from new languages (and kernels, web frameworks etc.). If you want to ruin a project then choosing all new stuff is an excellent way to go about it. However I’ve learned the hard way to try out one new thing at a time. So in this article I’m just going to use Zig to do something I have done before.
This is the second time I have written some Zig. The first time I tried using it to build and test a radix sort and hash map implementation in C. This was moderately successful. One problem was that I managed to segfault the compiler, the other that I was confused about slices and pointers. This time I managed to segfault the compiler again and was still confused about slices.
This hasn’t deterred me however. For one thing I have spent barely any time on Zig. I’ve spent more time trying to figure out if something is a scalar or an array in Perl than I have with Zig. So I can forgive some head scratching over its obtuse type system errors.
Just to be clear, this is hardly an apples to apples comparison. For that I think we would have to rip out the standard libraries for both languages. Then build an application with total feature parity. Then we shall see exactly what each language gives us. Alternatively we could try using a C library which provides similar features to the Zig one.
Anyway enough rambling and interlinking. You can see the latest Zig code here and the latest C code here. Let’s compare the imports and includes first.
Import/Include
Zig
const std = @import("std");
const net = std.net;
const mem = std.mem;
const fs = std.fs;
const io = std.io;
C
#define _GNU_SOURCE
#include <limits.h>
#include <errno.h>
#include <unistd.h>
#include <string.h>
#include <stdio.h>
#include <fcntl.h>
#include <signal.h>
#include <sys/stat.h>
#include <sys/socket.h>
#include <sys/sendfile.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <arpa/inet.h>
I only used the standard library for Zig and POSIX for C, with the exception of sys/sendfile.h and perhaps something else I have forgotten about. Everything from the Zig standard library is imported entirely with @import("std"); the other statements are just regular assignments.
Zig doesn’t specifically have modules or whatever; things like structs and unions act as namespaces. The @import builtin wraps the source file it includes in a struct type. So std is a type of struct. Struct types (or just structs) can have declarations that behave like static variables, which I assume is what std.io is.
All struct types in Zig are anonymous unless they are assigned to a variable or appear in a return statement. Then they take on the name of the variable or the returning function respectively. It seems the first assignment becomes the canonical name.
Already this is saying a lot about Zig I think. Meanwhile the C #includes are not actually C, they are preprocessor directives. The C preprocessor is a templating language more or less. Including a file inserts its processed content at the point of the include. It’s not immediately obvious what was included and which parts of it we use.
I’m not entirely sure all of those includes are needed either. It should be possible to find out using static analysis, however I’m not exactly sure how to do it. Having said that, I’m pretty sure they all are needed.
The header files don’t include the full code for the functions being included either. They could of course, but I’m linking against glibc and that is not how it works. By default Zig’s standard library is fully included. There is a huge discussion to be had about that, but it doesn’t affect the current project.
The Zig-produced executable is bigger than the C one and it takes longer to compile. However they are both more than adequate for this project. It’s difficult to extrapolate this to a larger or more constrained scenario because Zig appears to have ways of dealing with these issues. Not to mention that you can throw out the C standard library.
What I think matters most here is that we have a big long list of C headers for a relatively simple program. Also we know that everything from std is in the std variable. At least until we assign something from std to an outer variable.
It may be feasible to do something similar in C with structs and clever macros. However, using the defaults, Zig wins here.
Main
Zig
pub fn main() anyerror!void {
    var args = std.process.args();
    const exe_name = args.next() orelse "zelf-zerve";
    const public_path = args.next() orelse {
        std.log.err("Usage: {s} <dir to serve files from>", .{exe_name});
        return;
    };
    var dir = try fs.cwd().openDir(public_path, .{});
    const self_addr = try net.Address.resolveIp("127.0.0.1", 9000);
    var listener = net.StreamServer.init(.{});
    try (&listener).listen(self_addr);

    std.log.info("Listening on {}; press Ctrl-C to exit...", .{self_addr});

    while ((&listener).accept()) |conn| {
        std.log.info("Accepted Connection from: {}", .{conn.address});

        serveFile(&conn.stream, dir) catch |err| {
            if (@errorReturnTrace()) |bt| {
                std.log.err("Failed to serve client: {}: {}", .{err, bt});
            } else {
                std.log.err("Failed to serve client: {}", .{err});
            }
        };

        conn.stream.close();
    } else |err| {
        return err;
    }
}
C
int main(const int argc, const char *const argv[])
{
    const pid_t orig_parent = getppid();
    const struct sockaddr_in self_addr = {
        .sin_family = AF_INET,
        .sin_port = htons(9000),
        .sin_addr = {
            htonl(INADDR_LOOPBACK)
        }
    };
    const int listen_sk = socket(AF_INET, SOCK_STREAM, 0);
    const int public_dir = open(argv[1], O_PATH);
    struct sockaddr client_addr;
    socklen_t addr_len;

    if (argc < 2) {
        dprintf(STDERR_FILENO,
            "usage: %s <dir to serve files from>\n",
            argv[0]);
        return 1;
    }

    if (bind(listen_sk, (struct sockaddr *)&self_addr, sizeof(self_addr))) {
        perror("bind");
        return 1;
    }

    if (listen(listen_sk, 8)) {
        perror("listen");
        return 1;
    }

    printf("[+] Listening; press Ctrl-C to exit...\n");

    while (orig_parent == getppid()) {
        const int sk = accept(listen_sk, &client_addr, &addr_len);

        if (sk < 0) {
            perror("[-] accept");
            break;
        }

        printf("[+] Accepted Connection\n");

        serve_file(sk, public_dir);
        close(sk);
    }

    return 0;
}
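If you want to access argv in Zig, then you usually create an iterator around it. You can of course access it directly, but this is more error prone. You can see in the C code that I am accessing argv[1] before checking argc. With no arguments argv[1] is just the NULL terminator and open fails (an error I then ignore), but if the program were ever started with an empty argv then argv[1] would point past the end of the array. The result is that it could try opening a path descriptor from an environment variable or something along these lines.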
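For comparison, a bounds-checked iterator is not hard to sketch in C either. The following is only an illustration: arg_iter, arg_next and public_path_from_args are hypothetical names that do not appear in the server's code, and the idea is loosely modelled on Zig's args.next() returning null once the arguments run out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical iterator over argv, loosely modelled on Zig's args.next(). */
struct arg_iter {
    int argc;
    const char *const *argv;
    int pos;
};

/* Returns the next argument, or NULL once argc entries have been consumed. */
const char *arg_next(struct arg_iter *it)
{
    assert(it != NULL);

    if (it->pos >= it->argc)
        return NULL;

    return it->argv[it->pos++];
}

/* Picks the directory to serve; NULL means "print usage and exit". */
const char *public_path_from_args(int argc, const char *const argv[])
{
    struct arg_iter it = { .argc = argc, .argv = argv, .pos = 0 };

    arg_next(&it);        /* skip the executable name, if present */
    return arg_next(&it); /* NULL when no directory was given */
}
```

With something like this, the usage message could be printed whenever public_path_from_args returns NULL, before open is ever called.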
For whatever reason Zig does not include args in main’s arguments. I can’t say this makes any difference to me. The Zig return value is void or an error code. If an error code is returned from main then Zig prints it. If debugging info is available then Zig also prints an error return trace. This is not to be confused with a back trace.
The way that Zig handles errors has a very significant impact on this program. Most calls to functions which can return an error are prefixed with try. If an error is returned then try acts like return and propagates the error. Otherwise it evaluates to the function’s result, like any other expression.
There is also catch which can be used in various places to branch on an error. Other things like while can handle errors as well. You can see at the bottom that the loop there has an else clause.
In C we just use if statements and you can see I am ignoring some errors. My guess is that it is possible to implement error return traces in C and something similar to try using various types of magic. However I haven’t seen it done, so this is a win for Zig.
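To give a rough idea of the kind of magic that might be involved, here is a minimal sketch of a try-like macro in C. It assumes functions report failure with a negative errno-style return code, and TRY, parse_port and configure are hypothetical names made up for the illustration. Printing the file and line on the way out is a crude stand-in for an error return trace, not an equivalent of what Zig does.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical TRY: if expr yields a negative errno-style code, note where
 * it passed through and return it to the caller, roughly like Zig's try. */
#define TRY(expr)                                             \
    do {                                                      \
        const int try_err_ = (expr);                          \
        if (try_err_ < 0) {                                   \
            fprintf(stderr, "  error %d via %s:%d in %s()\n", \
                    try_err_, __FILE__, __LINE__, __func__);  \
            return try_err_;                                  \
        }                                                     \
    } while (0)

/* Parses a TCP port; returns 0 on success or -EINVAL on bad input. */
int parse_port(const char *str, int *port)
{
    char *end;
    long val;

    assert(str != NULL && port != NULL);

    val = strtol(str, &end, 10);
    if (end == str || *end != '\0' || val < 1 || val > 65535)
        return -EINVAL;

    *port = (int)val;
    return 0;
}

/* Each TRY either succeeds or unwinds one level, leaving a line on stderr. */
int configure(const char *port_str, int *port)
{
    TRY(parse_port(port_str, port));
    return 0;
}
```

Unlike Zig, nothing here forces the caller to look at the return value, so this only covers the propagation half of the story.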
The way that the while loop captures the connection variable |conn| is a big win. Also note the orelse which specifically handles a null result. The type system forces us to check that something is not null or an error before we try using it. This mitigates a category of bugs and then Zig also provides some syntax to avoid having ifs all over the place (or if you have used Rust then… well, you know).
Variables in Zig must either be declared with const or var. What is more, if a variable can be const it must be. By default, in C everything is mutable. I also haven’t found a way to warn when a variable could be const. Again it should be possible to implement for C, but for now Zig wins here. Zig can also infer the type of a variable most of the time. This is obviously a good thing in some situations, but here it may just leave a reader wondering what types the variables are.
Let’s ignore the address declaration in C, I could have done that differently. So moving on.
Receiving
Zig
const ServeFileError = error {
    RecvHeaderEOF,
    RecvHeaderExceededBuffer,
    HeaderDidNotMatch,
};

fn serveFile(stream: *const net.Stream, dir: fs.Dir) !void {
    var recv_buf: [BUFSIZ]u8 = undefined;
    var recv_total: usize = 0;

    while (stream.read(recv_buf[recv_total..])) |recv_len| {
        if (recv_len == 0)
            return ServeFileError.RecvHeaderEOF;

        recv_total += recv_len;

        if (mem.containsAtLeast(u8, recv_buf[0..recv_total], 1, "\r\n\r\n"))
            break;

        if (recv_total >= recv_buf.len)
            return ServeFileError.RecvHeaderExceededBuffer;
    } else |read_err| {
        return read_err;
    }

    const recv_slice = recv_buf[0..recv_total];
    std.log.info(" <<<\n{s}", .{recv_slice});

    ...
C
static void serve_file(const int sk, const int public_dir)
{
    char recv_buf[BUFSIZ];
    char head_buf[BUFSIZ];
    const size_t buf_len = BUFSIZ - 1;
    char path_buf[256];
    char *file_path;
    ssize_t recv, sent;
    size_t recv_total = 0, sent_total = 0;
    int body_fd;

    while (1) {
        recv = read(sk,
                recv_buf + recv_total,
                buf_len - recv_total);

        if (recv < 0) {
            perror("[-] read");
            return;
        }

        if (!recv) {
            dprintf(STDERR_FILENO,
                "[-] End of data before header was received\n");
            return;
        }

        recv_total += recv;
        recv_buf[recv_total] = 0;

        if (strstr(recv_buf, "\r\n\r\n"))
            break;

        if (recv_total >= buf_len) {
            dprintf(STDERR_FILENO,
                "Exceeded buffer reading header\n");
            return;
        }
    }

    printf("[*] <<<\n%s\n", recv_buf);
    ...
When we have a connection the first thing we do is receive the header. It’s expected that the entire header will be received in a single read most of the time. This web server is only for local usage after all. However occasionally this won’t happen, because TCP is a byte stream and a read can return only part of what was sent, the copying of buffers can be interrupted and so on. So we need a loop.
It’s difficult to know where to start here. I guess the weirdest thing about the Zig code is that the while has a |recv_len| capture and an else clause. The while loop here is saying “while read is not an error then… else if it is an error…”. The symbol enclosed in pipes (|) is capturing the return value or error.
The call to read is the first thing we do and we want to break out if it goes wrong. In the C code I use a while(1) loop for the same reason; there is nothing to check before we do the read. If the Zig code provides any concrete advantage over C it is that it forces error checking. Meanwhile Zig gives you a minimal effort way of debugging errors.
If I were to just return the errno from serve_file in C then I wouldn’t know exactly where an error came from. That is unless I use an outside tool like strace to see which system call caused an error (if any). So ignoring outside tracing methods, Zig gets another win here.
Also here you can see Zig’s arrays and slices; recv_buf[recv_total..] means we begin reading into the buffer at an offset of recv_total. Also we don’t need to pass the buffer length separately because it is part of the slice struct. Nor do we need to calculate the remaining length. Hurray!
I suspect that Zig gets another win through slices for making it easy to avoid null terminated strings. Zig explicitly supports null terminated strings, but you don’t need them for the standard library’s string functions.
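To make the comparison concrete, a slice can be approximated in C with a pointer and length pair. This is only a sketch: struct slice, slice_from and slice_contains are hypothetical names, not part of libc or of the server's code, and the view is read-only to keep it short. It shows that searching for the end of the header does not have to depend on a NUL terminator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical slice type: a pointer plus a length, no NUL terminator needed. */
struct slice {
    const char *ptr;
    size_t len;
};

/* Equivalent in spirit to s[offset..] in Zig; the offset is bounds checked. */
struct slice slice_from(struct slice s, size_t offset)
{
    assert(offset <= s.len);
    return (struct slice){ .ptr = s.ptr + offset, .len = s.len - offset };
}

/* True if needle occurs anywhere in haystack; neither needs a NUL terminator. */
bool slice_contains(struct slice haystack, struct slice needle)
{
    size_t i;

    if (needle.len > haystack.len)
        return false;

    for (i = 0; i + needle.len <= haystack.len; i++) {
        if (!memcmp(haystack.ptr + i, needle.ptr, needle.len))
            return true;
    }

    return false;
}
```

The difference from Zig is that none of this is enforced: nothing stops other code from using ptr directly and walking past len.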
Routing
Zig
    var file_path: []const u8 = undefined;
    var tok_itr = mem.tokenize(u8, recv_slice, " ");

    if (!mem.eql(u8, tok_itr.next() orelse "", "GET"))
        return ServeFileError.HeaderDidNotMatch;

    const path = tok_itr.next() orelse "";
    if (path[0] != '/')
        return ServeFileError.HeaderDidNotMatch;

    if (mem.eql(u8, path, "/"))
        file_path = "index"
    else
        file_path = path[1..];

    if (!mem.startsWith(u8, tok_itr.rest(), "HTTP/1.1\r\n"))
        return ServeFileError.HeaderDidNotMatch;

    var file_ext = fs.path.extension(file_path);
    var path_buf: [fs.MAX_PATH_BYTES]u8 = undefined;

    if (file_ext.len == 0) {
        var path_fbs = io.fixedBufferStream(&path_buf);

        try path_fbs.writer().print("{s}.html", .{file_path});
        file_ext = ".html";
        file_path = path_fbs.getWritten();
    }

    std.log.info("Opening {s}", .{file_path});

    var body_file = try dir.openFile(file_path, .{});
    defer body_file.close();

    const file_len = try body_file.getEndPos();
C
    if (!sscanf(recv_buf, "GET %250s HTTP/1.1", path_buf)) {
        dprintf(STDERR_FILENO,
            "[-] 'GET <file_path> HTTP/1.1' not matched in:\n %s",
            recv_buf);
    }

    if (!strcmp("/", path_buf)) {
        strcpy(path_buf, "index.html");
        file_path = path_buf;
    } else if (path_buf[0] == '/') {
        file_path = path_buf + 1;
    }

    printf("[*] Opening %s", file_path);
    body_fd = openat(public_dir, file_path, O_RDONLY);

    if (body_fd < 0 && errno == ENOENT) {
        strcpy(file_path + strlen(file_path), ".html");
        body_fd = openat(public_dir, file_path, O_RDONLY);
        printf(" failed trying with .html");
    }
    printf("\n");

    if (body_fd < 0) {
        perror("[-] openat");
        return;
    }
The Zig code is quite a bit longer because there is no sscanf equivalent in the Zig library. I’m not that confident about either the C or Zig code. However note the defer body_file.close() line. This saves having to do a goto or close the file at every early return thereafter.
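On the C side, one thing worth guarding against is that sscanf returns EOF, not zero, when the input runs out before the first conversion, so a plain !sscanf(...) test can let a truncated request such as "GET" followed only by line endings through. A stricter check might look like the sketch below; parse_get_path is a hypothetical helper, not the code used in the server.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: extracts the path from "GET <path> HTTP/1.1".
 * path must have room for at least 251 bytes (250 characters plus NUL).
 * Returns 0 on a match and -1 otherwise. */
int parse_get_path(const char *request, char *path)
{
    int consumed = -1;

    assert(request != NULL && path != NULL);

    /* Compare against the number of conversions expected rather than
     * testing for zero, because EOF is negative. %n is only reached if
     * the trailing " HTTP/1.1" literal matched as well. */
    if (sscanf(request, "GET %250s HTTP/1.1%n", path, &consumed) != 1)
        return -1;

    return consumed < 0 ? -1 : 0;
}
```

This still accepts more than a strict HTTP parser would, since whitespace in a scanf format matches any run of whitespace, but it does reject input that never reached the path or the protocol version.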
Sending
Zig
    const http_head =
        "HTTP/1.1 200 OK\r\n" ++
        "Connection: close\r\n" ++
        "Content-Type: {s}\r\n" ++
        "Content-Length: {}\r\n" ++
        "\r\n";
    const mimes = .{
        .{".html", "text/html"},
        .{".css", "text/css"},
        .{".map", "application/json"},
        .{".svg", "image/svg+xml"},
        .{".jpg", "image/jpg"},
        .{".png", "image/png"}
    };
    var mime: []const u8 = "text/plain";

    inline for (mimes) |kv| {
        if (mem.eql(u8, file_ext, kv[0]))
            mime = kv[1];
    }

    std.log.info(" >>>\n" ++ http_head, .{mime, file_len});
    try stream.writer().print(http_head, .{mime, file_len});

    const zero_iovec = &[0]std.os.iovec_const{};
    var send_total: usize = 0;

    while (true) {
        const send_len = try std.os.sendfile(
            stream.handle,
            body_file.handle,
            send_total,
            file_len,
            zero_iovec,
            zero_iovec,
            0
        );

        if (send_len == 0)
            break;

        send_total += send_len;
    }
}
C
    const char *const http_head =
        "HTTP/1.1 200 OK\r\n"
        "Connection: close\r\n"
        "Content-Type: %s\r\n"
        "Content-Length: %lu\r\n"
        "\r\n";
    const char *mime = "text/html";

    if (strstr(file_path, ".css"))
        mime = "text/css";
    if (strstr(file_path, ".map"))
        mime = "application/json";
    if (strstr(file_path, ".svg"))
        mime = "image/svg+xml";
    if (strstr(file_path, ".jpg"))
        mime = "image/jpg";
    if (strstr(file_path, ".png"))
        mime = "image/png";

    struct stat body_stat;
    if (fstat(body_fd, &body_stat)) {
        perror("[-] fstat");
        goto close_body;
    }

    sprintf(head_buf, http_head, mime, body_stat.st_size);
    printf("[*] >>>\n%s", head_buf);

    while (sent_total < strlen(http_head)) {
        sent = write(sk, head_buf + sent_total, strlen(head_buf));

        if (sent < 0) {
            perror("[-] write");
            goto close_body;
        }

        sent_total += sent;
    }

    do {
        sent = sendfile(sk, body_fd, NULL, body_stat.st_size);

        if (sent < 0) {
            perror("[-] sendfile");
            goto close_body;
        }

        sent_total += sent;
    } while (sent > 0);

close_body:
    close(body_fd);
}
So here we can see the C has a goto in it. I’m not sure it makes much of a difference here although I guess it’s easier to mess up using goto than defer for freeing resources on exit. On the other hand you may be looking at defer thinking “huh? When does that run?”.
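For what it is worth, GCC and Clang offer a non-standard cleanup attribute that gets C some of the way towards defer. The sketch below assumes one of those compilers; close_fd, DEFER_CLOSE and file_size are hypothetical names used only for illustration and are not part of the server.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Cleanup handler: receives a pointer to the variable going out of scope. */
void close_fd(int *fd)
{
    if (*fd >= 0)
        close(*fd);
}

/* Non-standard: relies on the GCC/Clang cleanup attribute. */
#define DEFER_CLOSE __attribute__((cleanup(close_fd)))

/* Hypothetical helper: stores the size of the file at path in *size.
 * Returns 0 on success, -1 on failure; fd is closed on every return. */
int file_size(const char *path, off_t *size)
{
    int fd DEFER_CLOSE = open(path, O_RDONLY);
    struct stat st;

    assert(path != NULL && size != NULL);

    if (fd < 0)
        return -1;

    if (fstat(fd, &st))
        return -1; /* no goto needed, close_fd() still runs */

    *size = st.st_size;
    return 0;
}
```

The handler runs when fd goes out of scope, which answers the "when does that run?" question in much the same way as defer does, at the cost of portability.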
I have to say that Zig suffered a major fail in this part because the compiler segfaulted when I was trying to write the mime selection code. At the time of writing the following code will cause a segfault.
const ms = .{ "a", "b" };
const a = set: {
    inline for (ms) |m| {
        if (mem.eql(u8, "a", m))
            break :set m;
    }
    break :set "c";
};
const a2: [:0]const u8 = "a";

try testing.expectEqual(a2, a);
This appears to be valid Zig code because it at least gets as far as emitting LLVM IR. However there is some issue there. Of course this is also very weird looking, so it’s perhaps best that I removed it.
Also note the inline for, this is required because ms and mimes are known at compile time and I think have comptime types. Zig doesn’t have a preprocessor, macros or templates. Instead it allows code with inputs known at compile time to be run at compile time. I suppose we could stop this code being evaluated at compile time by specifying runtime types on mimes.
In this program it’s not clear what the advantages of comptime are. Meanwhile it got in my way a little bit when getting errors like:
./src/self-serve.zig:114:5: error: unable to evaluate constant expression
for (mimes) |kv| {
It’s worth mentioning that C compilers can evaluate a lot at compile time as well. You can see this demonstrated in my automata article. This simply happens when turning on optimisations and avoiding things which will hide the “constness” of variables. I suppose that comptime has resulted in a win for C here. Although this won’t dampen my enthusiasm for comptime in general.
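As an aside, the MIME selection can be made table driven in C as well, which mirrors the Zig mimes tuple more closely than a chain of strstr calls. The following is a sketch rather than the server's code: struct mime_entry and mime_for_path are hypothetical names, it matches only the final extension, and it uses the registered image/jpeg type where the listings above say image/jpg. Whether a given compiler folds any of this at compile time depends on the optimisation level and on the input being visible to it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct mime_entry {
    const char *ext;
    const char *type;
};

/* Constant table; nothing here hides its "constness" from the compiler. */
static const struct mime_entry mimes[] = {
    { ".html", "text/html" },
    { ".css",  "text/css" },
    { ".map",  "application/json" },
    { ".svg",  "image/svg+xml" },
    { ".jpg",  "image/jpeg" },
    { ".png",  "image/png" },
};

/* Hypothetical helper: maps the final extension of path to a MIME type,
 * falling back to text/plain as the Zig version does. */
const char *mime_for_path(const char *path)
{
    const char *ext;
    size_t i;

    assert(path != NULL);

    ext = strrchr(path, '.');
    if (!ext)
        return "text/plain";

    for (i = 0; i < sizeof(mimes) / sizeof(mimes[0]); i++) {
        if (!strcmp(ext, mimes[i].ext))
            return mimes[i].type;
    }

    return "text/plain";
}
```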
Frankly I’m finding it increasingly difficult to draw solid comparisons at this point. While writing this article I keep discovering things I could do differently in both Zig and C. However I feel like it is time to cap this off.
Conclusion
This application isn’t exactly a major stress test for either language. They both fit well within my requirements for executable size and execution performance even with all the sanitizers turned on. There aren’t any of the complications of a large modular code base either. It doesn’t even allocate heap memory.
However I think this shows that Zig makes some concrete advances over C. Meanwhile it doesn’t appear to make anything more difficult. At least so long as the compiler doesn’t segfault or blurt out something like “cannot store runtime value in type ‘comptime_int’”, without any hint as to what to do about it.
Most issues I have encountered seem to be temporary implementation problems. Andrew Kelley and Co. didn’t decide to make radical changes over C that introduce new problems. Rather they changed some defaults and added evolutionary improvements. At least as far as this application shows. I still wonder if there are dragons lurking in the comptime features. On the other hand comptime can be seen as an evolution of the C preprocessor and other tools which generate C code.