While working on my Linux socket example I decided to write a tiny HTTP server for previewing my static website. This shows the basics of using TCP sockets, correctly adds .html to routes without it and saves me the distress of typing python, npm or similar blasphemies. The server is barely functional of course. However it is enough to get my pages to appear in Firefox and Chrome.
It also happens that I am desperate to write Zig code. It’s an unfortunate part of my personality that I cannot stay away from new languages (and kernels, web frameworks etc.). If you want to ruin a project then choosing all new stuff is an excellent way to go about it. However I’ve learned the hard way to try out one new thing at a time. So in this article I’m just going to use Zig to do something I have done before.
This is the second time I have written some Zig. The first time, I tried using it to build and test a radix sort and hash map implementation in C. This was moderately successful. One problem was that I managed to segfault the compiler, the other that I was confused about slices and pointers. This time I also managed to segfault the compiler and was still confused about slices.
This hasn’t deterred me however. For one thing I have spent barely any time on Zig. I’ve spent more time trying to figure out if something is a scalar or an array in Perl than I have with Zig. So I can forgive some head scratching over its obtuse type system errors.
Just to be clear, this is hardly an apples to apples comparison. For that I think we would have to rip out the standard libraries for both languages. Then build an application with total feature parity. Then we shall see exactly what each language gives us. Alternatively we could try using a C library which provides similar features to the Zig one.
Anyway enough rambling and interlinking. You can see the latest Zig code here and the latest C code here. Let’s compare the imports and includes first.
Import/Include
Zig
const std = @import("std");
const net = std.net;
const mem = std.mem;
const fs = std.fs;
const io = std.io;
C
#define _GNU_SOURCE
#include <limits.h>
#include <errno.h>
#include <unistd.h>
#include <string.h>
#include <stdio.h>
#include <fcntl.h>
#include <signal.h>
#include <sys/stat.h>
#include <sys/socket.h>
#include <sys/sendfile.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <arpa/inet.h>
I only used the standard library for Zig and POSIX for C, with the exception of sys/sendfile.h and perhaps something else I have forgotten about. Everything from the Zig standard library is imported with @import("std"); the other statements are just regular assignments.
Zig doesn’t specifically have modules or whatever; things like structs and unions act as namespaces. The @import builtin wraps the source file it includes in a struct type, so std is a struct type. Struct types (or just structs) can hold declarations, roughly static variables and constants, which I assume is what std.io is.
All struct types in Zig are anonymous unless they are assigned to a variable or appear in a return statement. Then they take on the name of the variable or the returning function respectively. It seems the first assignment becomes the canonical name.
Already this is saying a lot about Zig I think. Meanwhile the C #includes are not actually C, they are preprocessor directives. The C preprocessor is a templating language more or less. Including a file inserts its processed content at the point of the include. It’s not immediately obvious what was included and which parts of it we use.
I’m not entirely sure all of those includes are needed either. It should be possible to find out using static analysis, however I’m not exactly sure how to do it. Having said that, I’m pretty sure they all are needed.
The header files don’t include the full code for the functions being included either. They could do of course, but I’m linking against glibc and that is not how it works. By default Zig’s standard library is fully included. There is a huge discussion to be had about that, but it doesn’t affect the current project.
The Zig-produced executable is bigger than the C one and it takes longer to compile. However they are both more than adequate for this project. It’s difficult to extrapolate this to a larger or more constrained scenario because Zig appears to have ways of dealing with these issues. Not to mention that you can throw out the C standard library.
What I think matters most here is that we have a big long list of C headers for a relatively simple program. Also we know that everything from std is in the std variable. At least until we assign something from std to an outer variable.
It may be feasible to do something similar in C with structs and clever macros. However, using the defaults, Zig wins here.
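As a rough illustration of the "structs and clever macros" idea, here is a minimal sketch that uses a const struct of function pointers as a namespace in C. The names mem_ns, mem, mem_eql and mem_starts_with are made up for this example and only loosely mirror Zig's std.mem; no real C library is organised this way as far as this sketch is concerned.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical "namespace": a struct type whose members are functions. */
struct mem_ns {
    bool (*eql)(const char *a, const char *b);
    bool (*starts_with)(const char *s, const char *prefix);
};

static bool mem_eql(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}

static bool mem_starts_with(const char *s, const char *prefix)
{
    return strncmp(s, prefix, strlen(prefix)) == 0;
}

/* Everything is reached through the one `mem` variable, a bit like
 * `const mem = std.mem;` in the Zig version. */
static const struct mem_ns mem = {
    .eql = mem_eql,
    .starts_with = mem_starts_with,
};
```

A call then reads mem.eql(method, "GET"). The cost is an indirect call unless the compiler can see through the constant struct, and nothing stops the plain mem_eql name from being used directly.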
Main
Zig
pub fn main() anyerror!void {
    var args = std.process.args();
    const exe_name = args.next() orelse "zelf-zerve";
    const public_path = args.next() orelse {
        std.log.err("Usage: {s} <dir to serve files from>", .{exe_name});
        return;
    };

    var dir = try fs.cwd().openDir(public_path, .{});
    const self_addr = try net.Address.resolveIp("127.0.0.1", 9000);
    var listener = net.StreamServer.init(.{});
    try (&listener).listen(self_addr);

    std.log.info("Listening on {}; press Ctrl-C to exit...", .{self_addr});

    while ((&listener).accept()) |conn| {
        std.log.info("Accepted Connection from: {}", .{conn.address});

        serveFile(&conn.stream, dir) catch |err| {
            if (@errorReturnTrace()) |bt| {
                std.log.err("Failed to serve client: {}: {}", .{err, bt});
            } else {
                std.log.err("Failed to serve client: {}", .{err});
            }
        };

        conn.stream.close();
    } else |err| {
        return err;
    }
}
C
int main(const int argc, const char *const argv[])
{
    const pid_t orig_parent = getppid();
    const struct sockaddr_in self_addr = {
        .sin_family = AF_INET,
        .sin_port = htons(9000),
        .sin_addr = {
            htonl(INADDR_LOOPBACK)
        }
    };
    const int listen_sk = socket(AF_INET, SOCK_STREAM, 0);
    const int public_dir = open(argv[1], O_PATH);
    struct sockaddr client_addr;
    socklen_t addr_len;

    if (argc < 2) {
        dprintf(STDERR_FILENO,
            "usage: %s <dir to serve files from>\n",
            argv[0]);
        return 1;
    }

    if (bind(listen_sk, (struct sockaddr *)&self_addr, sizeof(self_addr))) {
        perror("bind");
        return 1;
    }

    if (listen(listen_sk, 8)) {
        perror("listen");
        return 1;
    }

    printf("[+] Listening; press Ctrl-C to exit...\n");

    while (orig_parent == getppid()) {
        const int sk = accept(listen_sk, &client_addr, &addr_len);

        if (sk < 0) {
            perror("[-] accept");
            break;
        }

        printf("[+] Accepted Connection\n");

        serve_file(sk, public_dir);
        close(sk);
    }

    return 0;
}
If you want to access argv in Zig, then you usually create an iterator around it. You can of course access it directly, but this is more error prone. You can see in the C code that I am accessing argv[1] before checking argc. With no arguments argv[1] is just the terminating NULL and open fails, but if argc were ever 0 it could try opening a path descriptor from an environment variable or something along these lines.
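A minimal sketch of how the C version could check argc before touching argv[1]. The helper name public_path_from_args is hypothetical and not part of the original program.

```c
#include <stddef.h>

/* Hypothetical helper: return the directory argument, or NULL if it is
 * missing. argv[1] is only read after argc has been checked. */
static const char *public_path_from_args(int argc, const char *const argv[])
{
    if (argc < 2)
        return NULL;

    return argv[1];
}
```

main would then print the usage message when this returns NULL and only call open on a path that was really supplied, which is roughly what args.next() orelse does in the Zig version.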
For whatever reason Zig does not include args in main’s arguments. I can’t say this makes any difference to me. The Zig return value is void or an error code. If an error code is returned from main then Zig prints it. If debugging info is available then Zig also prints an error return trace. This is not to be confused with a back trace.
The way that Zig handles errors has a very significant impact on this program. Most calls to functions which can return an error are prefixed with try. If an error is returned then try acts like return and propagates the error. Otherwise it evaluates to the successful result, like any other expression.
There is also catch which can be used in various places to branch on an error. Other things like while can handle errors as well. You can see at the bottom that the loop there has an else clause.
In C we just use if statements and you can see I am ignoring some errors. My guess is that it is possible to implement error return traces in C and something similar to try using various types of magic. However I haven’t seen it done, so this is a win for Zig.
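To give an idea of what such "magic" might look like, here is a minimal sketch of a try-like macro that records an error return trace in C. Everything in it (err_trace, trace_push, FAIL, TRY and the three toy functions) is invented for illustration, and it assumes every function returns an int where zero means success. It is not taken from the server code or from any existing library.

```c
#include <stddef.h>

#define TRACE_MAX 16

/* Hypothetical error return trace: one entry per function the error
 * passed through on its way up. */
struct err_trace {
    const char *file[TRACE_MAX];
    int line[TRACE_MAX];
    size_t len;
};

static struct err_trace trace;

/* Record where the error was returned from, then hand it back. */
static int trace_push(int err, const char *file, int line)
{
    if (trace.len < TRACE_MAX) {
        trace.file[trace.len] = file;
        trace.line[trace.len] = line;
        trace.len++;
    }
    return err;
}

/* Start a new error at this location. */
#define FAIL(err) trace_push((err), __FILE__, __LINE__)

/* Like Zig's try: on a non-zero error code, record the location and
 * return the error from the enclosing function. */
#define TRY(expr)                                            \
    do {                                                     \
        const int try_err_ = (expr);                         \
        if (try_err_)                                        \
            return trace_push(try_err_, __FILE__, __LINE__); \
    } while (0)

/* Toy call chain: run -> serve -> read_header. */
static int read_header(int have_header)
{
    if (!have_header)
        return FAIL(5);
    return 0;
}

static int serve(int have_header)
{
    TRY(read_header(have_header));
    return 0;
}

static int run(int have_header)
{
    TRY(serve(have_header));
    return 0;
}
```

After run(0) fails, trace holds three file and line pairs, one per function the error travelled through. Unlike Zig's try, this only works in functions that return an int error code, so real results have to come back through out parameters, and nothing forces callers to use the macro.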
The way that the while loop captures the connection variable |conn| is a big win. Also note the orelse which specifically handles a null result. The type system forces us to check that something is not null or an error before we try using it. This mitigates a category of bugs and then Zig also provides some syntax to avoid having ifs all over the place (or if you have used Rust then… well, you know).
Variables in Zig must either be declared with const or var. What is more, if a variable can be const it must be. By default, in C everything is mutable. I also haven’t found a way to warn when a variable could be const. Again it should be possible to implement for C, but for now Zig wins here. Zig also can infer the type of a variable most of the time. This is obviously a good thing in some situations, but here it may just leave a reader wondering what types the variables are.
Let’s ignore the address declaration in C, I could have done that differently. So moving on.
Receiving
Zig
const ServeFileError = error {
    RecvHeaderEOF,
    RecvHeaderExceededBuffer,
    HeaderDidNotMatch,
};

fn serveFile(stream: *const net.Stream, dir: fs.Dir) !void {
    var recv_buf: [BUFSIZ]u8 = undefined;
    var recv_total: usize = 0;

    while (stream.read(recv_buf[recv_total..])) |recv_len| {
        if (recv_len == 0)
            return ServeFileError.RecvHeaderEOF;

        recv_total += recv_len;

        if (mem.containsAtLeast(u8, recv_buf[0..recv_total], 1, "\r\n\r\n"))
            break;

        if (recv_total >= recv_buf.len)
            return ServeFileError.RecvHeaderExceededBuffer;
    } else |read_err| {
        return read_err;
    }

    const recv_slice = recv_buf[0..recv_total];
    std.log.info(" <<<\n{s}", .{recv_slice});
...
C
static void serve_file(const int sk, const int public_dir)
{
    char recv_buf[BUFSIZ];
    char head_buf[BUFSIZ];
    const size_t buf_len = BUFSIZ - 1;
    char path_buf[256];
    char *file_path;
    ssize_t recv, sent;
    size_t recv_total = 0, sent_total = 0;
    int body_fd;

    while (1) {
        recv = read(sk,
                recv_buf + recv_total,
                buf_len - recv_total);

        if (recv < 0) {
            perror("[-] read");
            return;
        }

        if (!recv) {
            dprintf(STDERR_FILENO,
                "[-] End of data before header was received\n");
            return;
        }

        recv_total += recv;
        recv_buf[recv_total] = 0;

        if (strstr(recv_buf, "\r\n\r\n"))
            break;

        if (recv_total >= buf_len) {
            dprintf(STDERR_FILENO,
                "Exceeded buffer reading header\n");
            return;
        }
    }

    printf("[*] <<<\n%s\n", recv_buf);
...
When we have a connection the first thing we do is receive the header. It’s expected that the entire header will be received in a single read most of the time. This web server is only for local usage after all. However occasionally this won’t happen because the copying of buffers can be interrupted and other random reasons. So we need a loop.
It’s difficult to know where to start here. I guess the weirdest thing about the Zig code is that the while has |recv_len| and an else clause. The while loop here is saying “while read is not an error then… else if it is an error…”. The symbol enclosed in pipes (|) is capturing the return value or error.
The call to read is the first thing we do and will want to break on if it goes wrong. In the C code I use a while(1) loop for the same reason; there is nothing to check before we do the read. If the Zig code provides any concrete advantage over C it is that it forces error checking. Meanwhile Zig gives you a minimal effort way of debugging errors.
If I were to just return the errno from serve_file in C then I wouldn’t know exactly where an error came from. That is unless I use an outside tool like strace to see which system call caused an error (if any). So ignoring outside tracing methods, Zig gets another win here.
Also here you can see Zig’s arrays and slices; recv_buf[recv_total..] means we begin reading into the buffer at an offset of recv_total. Also we don’t need to pass the buffer length separately because it is part of the slice struct. Nor do we need to calculate the remaining length. Hurray!
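For comparison, a slice can be approximated in C with a small struct. The following is only a sketch under that assumption; struct slice, slice_of and slice_from are hypothetical names and not something the C version of the server uses.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical slice type: a pointer plus the number of bytes it covers. */
struct slice {
    char *ptr;
    size_t len;
};

static struct slice slice_of(char *buf, size_t len)
{
    return (struct slice){ .ptr = buf, .len = len };
}

/* Rough equivalent of s[start..]: the remaining length is derived here,
 * once, instead of at every call site. */
static struct slice slice_from(struct slice s, size_t start)
{
    assert(start <= s.len);
    return (struct slice){ .ptr = s.ptr + start, .len = s.len - start };
}
```

The read call would then take rest.ptr and rest.len, where rest is slice_from(whole, recv_total). The bounds check here is only an assert, so it disappears with NDEBUG, whereas Zig keeps the check in its safe build modes.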
I suspect that Zig gets another win through slices for making it easy to avoid null terminated strings. Zig explicitly supports null terminated strings, but you don’t need them for the standard library’s string functions.
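The C code above writes a 0 after the received bytes so that strstr can be used. As a sketch of the alternative, the search can work on a pointer and a length instead. find_bytes is a hypothetical helper written for this example; glibc also offers memmem as a GNU extension, which does much the same job.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: find needle in the first hay_len bytes of hay.
 * Neither buffer needs a terminating NUL. Returns NULL if not found. */
static const char *find_bytes(const char *hay, size_t hay_len,
                              const char *needle, size_t needle_len)
{
    if (needle_len == 0)
        return hay;

    for (size_t i = 0; i + needle_len <= hay_len; i++) {
        if (memcmp(hay + i, needle, needle_len) == 0)
            return hay + i;
    }

    return NULL;
}
```

With this the receive loop could call find_bytes(recv_buf, recv_total, "\r\n\r\n", 4) and would no longer need to reserve a byte for the terminator.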
Routing
Zig
    var file_path: []const u8 = undefined;
    var tok_itr = mem.tokenize(u8, recv_slice, " ");

    if (!mem.eql(u8, tok_itr.next() orelse "", "GET"))
        return ServeFileError.HeaderDidNotMatch;

    const path = tok_itr.next() orelse "";
    if (path[0] != '/')
        return ServeFileError.HeaderDidNotMatch;

    if (mem.eql(u8, path, "/"))
        file_path = "index"
    else
        file_path = path[1..];

    if (!mem.startsWith(u8, tok_itr.rest(), "HTTP/1.1\r\n"))
        return ServeFileError.HeaderDidNotMatch;

    var file_ext = fs.path.extension(file_path);
    var path_buf: [fs.MAX_PATH_BYTES]u8 = undefined;

    if (file_ext.len == 0) {
        var path_fbs = io.fixedBufferStream(&path_buf);

        try path_fbs.writer().print("{s}.html", .{file_path});
        file_ext = ".html";
        file_path = path_fbs.getWritten();
    }

    std.log.info("Opening {s}", .{file_path});

    var body_file = try dir.openFile(file_path, .{});
    defer body_file.close();

    const file_len = try body_file.getEndPos();
C
    if (!sscanf(recv_buf, "GET %250s HTTP/1.1", path_buf)) {
        dprintf(STDERR_FILENO,
            "[-] 'GET <file_path> HTTP/1.1' not matched in:\n %s",
            recv_buf);
    }

    if (!strcmp("/", path_buf)) {
        strcpy(path_buf, "index.html");
        file_path = path_buf;
    } else if (path_buf[0] == '/') {
        file_path = path_buf + 1;
    }

    printf("[*] Opening %s", file_path);
    body_fd = openat(public_dir, file_path, O_RDONLY);

    if (body_fd < 0 && errno == ENOENT) {
        strcpy(file_path + strlen(file_path), ".html");
        body_fd = openat(public_dir, file_path, O_RDONLY);
        printf(" failed trying with .html");
    }
    printf("\n");

    if (body_fd < 0) {
        perror("[-] openat");
        return;
    }
The Zig code is a bit longer because there is no sscanf equivalent in the Zig library. I’m not that confident about either the C or Zig code. However note the defer body_file.close() line. This saves having to do a goto or close the file at every early return thereafter.
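One reason to be wary of the sscanf call is that its return value only counts the conversions it stored, so a request ending in HTTP/1.0 still "matches". The sketch below shows one possible way to check the trailing literal too, using %n. parse_request_line is a hypothetical helper, not the code from the server.

```c
#include <stdio.h>

/* Hypothetical helper: parse "GET <path> HTTP/1.1" from a NUL-terminated
 * request. path must have room for at least 251 bytes.
 * Returns 0 on a full match and -1 otherwise. */
static int parse_request_line(const char *req, char *path)
{
    int matched_len = 0;

    /* sscanf returns the number of conversions stored (or EOF), so the
     * trailing literal is not covered by the return value. %n is only
     * reached, and matched_len only set, if "HTTP/1.1" matched too. */
    if (sscanf(req, "GET %250s HTTP/1.1%n", path, &matched_len) != 1)
        return -1;

    return matched_len > 0 ? 0 : -1;
}
```

This is still looser than the Zig tokenizer version, for example it does not insist on the \r\n after the protocol version, but it does reject other methods and other versions.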
Sending
Zig
    const http_head =
        "HTTP/1.1 200 OK\r\n" ++
        "Connection: close\r\n" ++
        "Content-Type: {s}\r\n" ++
        "Content-Length: {}\r\n" ++
        "\r\n";
    const mimes = .{
        .{".html", "text/html"},
        .{".css", "text/css"},
        .{".map", "application/json"},
        .{".svg", "image/svg+xml"},
        .{".jpg", "image/jpg"},
        .{".png", "image/png"}
    };
    var mime: []const u8 = "text/plain";

    inline for (mimes) |kv| {
        if (mem.eql(u8, file_ext, kv[0]))
            mime = kv[1];
    }

    std.log.info(" >>>\n" ++ http_head, .{mime, file_len});
    try stream.writer().print(http_head, .{mime, file_len});

    const zero_iovec = &[0]std.os.iovec_const{};
    var send_total: usize = 0;

    while (true) {
        const send_len = try std.os.sendfile(
            stream.handle,
            body_file.handle,
            send_total,
            file_len,
            zero_iovec,
            zero_iovec,
            0
        );

        if (send_len == 0)
            break;

        send_total += send_len;
    }
}
C
    const char *const http_head =
        "HTTP/1.1 200 OK\r\n"
        "Connection: close\r\n"
        "Content-Type: %s\r\n"
        "Content-Length: %lu\r\n"
        "\r\n";
    const char *mime = "text/html";

    if (strstr(file_path, ".css"))
        mime = "text/css";
    if (strstr(file_path, ".map"))
        mime = "application/json";
    if (strstr(file_path, ".svg"))
        mime = "image/svg+xml";
    if (strstr(file_path, ".jpg"))
        mime = "image/jpg";
    if (strstr(file_path, ".png"))
        mime = "image/png";

    struct stat body_stat;
    if (fstat(body_fd, &body_stat)) {
        perror("[-] fstat");
        goto close_body;
    }

    sprintf(head_buf, http_head, mime, body_stat.st_size);
    printf("[*] >>>\n%s", head_buf);

    while (sent_total < strlen(http_head)) {
        sent = write(sk, head_buf + sent_total, strlen(head_buf));

        if (sent < 0) {
            perror("[-] write");
            goto close_body;
        }

        sent_total += sent;
    }

    do {
        sent = sendfile(sk, body_fd, NULL, body_stat.st_size);

        if (sent < 0) {
            perror("[-] sendfile");
            goto close_body;
        }

        sent_total += sent;
    } while (sent > 0);

close_body:
    close(body_fd);
}
So here we can see the C has a goto in it. I’m not sure it makes much of a difference here although I guess it’s easier to mess up using goto than defer for freeing resources on exit. On the other hand you may be looking at defer thinking “huh? When does that run?”.
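For what it is worth, GCC and Clang have a non-standard cleanup attribute that gets fairly close to defer for a single variable. The sketch below assumes one of those compilers; DEFER_CLOSE, close_fd, closed_count and file_size are names invented for the example, and closed_count exists only to make the effect visible.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Counts closes so the effect is observable; only here for illustration. */
static int closed_count;

/* Cleanup handler: receives a pointer to the variable going out of scope. */
static void close_fd(int *fd)
{
    if (*fd >= 0) {
        close(*fd);
        closed_count++;
    }
}

/* GNU extension (GCC and Clang), not standard C. */
#define DEFER_CLOSE __attribute__((cleanup(close_fd)))

/* Hypothetical helper: look up a file's size, closing it on every path. */
static int file_size(const char *path, off_t *size)
{
    DEFER_CLOSE int fd = open(path, O_RDONLY);
    struct stat st;

    if (fd < 0)
        return -1;

    if (fstat(fd, &st))
        return -1; /* close_fd(&fd) still runs here */

    *size = st.st_size;
    return 0;
}
```

The handler runs whenever fd goes out of scope, including the early returns, so no goto label is needed. It is tied to a variable rather than to an arbitrary statement, which makes it less flexible than defer.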
I have to say that Zig suffered a major fail in this part because the compiler segfaulted when I was trying to write the mime selection code. At the time of writing the following code will cause a segfault.
const ms = .{ "a", "b" };
const a = set: {
    inline for (ms) |m| {
        if (mem.eql(u8, "a", m))
            break :set m;
    }
    break :set "c";
};
const a2: [:0]const u8 = "a";

try testing.expectEqual(a2, a);
This appears to be valid Zig code because it at least gets as far as emitting LLVM IR. However there is some issue there. Of course this is also very weird looking, so it’s perhaps best that I removed it.
Also note the inline for, this is required because ms and mimes are known at compile time and I think have comptime types. Zig doesn’t have a preprocessor, macros or templates. Instead it allows code with inputs known at compile time to be run at compile time. I suppose we could stop this code being evaluated at compile time by specifying runtime types on mimes.
In this program it’s not clear what the advantages of comptime are. Meanwhile it got in my way a little bit when getting errors like:
./src/self-serve.zig:114:5: error: unable to evaluate constant expression
for (mimes) |kv| {
It’s worth mentioning that C compilers can evaluate a lot at compile time as well. You can see this demonstrated in my automata article. This simply happens when turning on optimisations and avoiding things which will hide the “constness” of variables. I suppose that comptime has resulted in a win for C here. Although this won’t dampen my enthusiasm for comptime in general.
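As a point of comparison with the Zig tuple of mimes, a table-driven lookup in C might look like the sketch below. mime_entry, mimes and mime_for_path are hypothetical names. The default here is text/plain as in the Zig version, whereas the C code above defaults to text/html. Whether a compiler unrolls or folds the loop over the const table depends on the compiler and its flags, so no claim is made about that.

```c
#include <stddef.h>
#include <string.h>

struct mime_entry {
    const char *ext;
    const char *mime;
};

/* Same pairs as the Zig tuple; const so the compiler is free to fold
 * whatever it can. */
static const struct mime_entry mimes[] = {
    { ".html", "text/html" },
    { ".css", "text/css" },
    { ".map", "application/json" },
    { ".svg", "image/svg+xml" },
    { ".jpg", "image/jpg" },
    { ".png", "image/png" },
};

/* Hypothetical helper: compare the whole extension instead of using
 * strstr, which would also match ".css" in the middle of a name. */
static const char *mime_for_path(const char *path)
{
    const char *ext = strrchr(path, '.');

    if (ext) {
        for (size_t i = 0; i < sizeof(mimes) / sizeof(mimes[0]); i++) {
            if (strcmp(ext, mimes[i].ext) == 0)
                return mimes[i].mime;
        }
    }

    return "text/plain";
}
```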
Frankly I’m finding it increasingly difficult to draw solid comparisons at this point. While writing this article I keep discovering things I could do differently in both Zig and C. However I feel like it is time to cap this off.
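One example of something that could be done differently is the header write loop in the C version. It compares sent_total against the length of the format string rather than the formatted header, and passes the full header length again after a partial write. A minimal sketch of a more careful loop follows; write_all is a hypothetical helper and not part of the original program.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: keep calling write until all len bytes are sent.
 * Returns 0 on success and -1 (with errno set) on failure. */
static int write_all(int fd, const char *buf, size_t len)
{
    size_t sent_total = 0;

    while (sent_total < len) {
        const ssize_t sent = write(fd, buf + sent_total, len - sent_total);

        if (sent < 0) {
            if (errno == EINTR)
                continue; /* interrupted before anything was written */
            return -1;
        }

        sent_total += (size_t)sent;
    }

    return 0;
}
```

The header would then be sent with write_all(sk, head_buf, strlen(head_buf)). On a loopback connection the whole header almost certainly goes out in one write anyway, which is presumably why the original loop works in practice.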
Conclusion
This application isn’t exactly a major stress test for either language. They both fit well within my requirements for executable size and execution performance even with all the sanitizers turned on. There aren’t any of the complications of a large modular code base either. It doesn’t even allocate heap memory.
However I think this shows that Zig makes some concrete advances over C. Meanwhile it doesn’t appear to make anything more difficult. At least so long as the compiler doesn’t segfault or blurt out something like “cannot store runtime value in type ‘comptime_int’”, without any hint as to what to do about it.
Most issues I have encountered seem to be temporary implementation problems. Andrew Kelley and Co. didn’t decide to make radical changes over C that introduce new problems. Rather they changed some defaults and added evolutionary improvements. At least as far as this application shows. I still wonder if there are dragons lurking in the comptime features. On the other hand comptime can be seen as an evolution of the C preprocessor and other tools which generate C code.