Drop me an E-mail at io@richiejp.com and I’ll try my best
to respond within a couple of days.
How I may help you
Below is an illustration of how I may help you. I am open to adjacent
work.
I can independently map out, understand and debug an unfamiliar
system, including the layers below the application.
System Software
- Develop tests for kernel and system software
- Bug review
- Reproducer development
- Fuzzing
- Test harness development
- Track down bugs throughout your software stack
- Code review
- Kernel and userland tracing
- Niche, bespoke or legacy software maintenance
- Solve problems in software there is no expert for
- Explore and document mysterious code
- Introduce testing
- Solve performance issues at the source
- Better utilise lower level system components
- Select better data structures and algorithms
How does this create value?
- Find problems before they happen in production
- Prioritise problem areas with the highest payoff for introducing
testing
- Enable legacy systems to be updated or replaced
- Produce a better user experience by reducing latency
Technologies
In my 10+ years of experience, solving a problem in a sufficiently
complex system requires learning or relearning some software or
theory. There is no such thing as a polyglot programmer, just people who
have sequentially used many languages and can learn or relearn them
quickly.
However, here are my core competencies to get us in the right
ballpark.
System Software
I know Linux, having spent the last 6+ years writing kernel
tests for SUSE. I also have
knowledge of more niche systems such as FreeBSD and the Nanos unikernel.
Web Software
Having recently developed DoBu.uk, I am
intimately familiar with the following.
- Svelte with TypeScript and NodeJS
- TailwindCSS
- Redis/KeyDB
- Fly.io
Languages
The major two languages I am most familiar with are
Less well known are
To name a few; in the past I have used
- Perl
- C++ (QT)
- Rust
- C# (8+ years ago)
- Python
Computer Science
I am aware of cache-efficient data structures
and big O notation. I know the basics of compiler and operating system
theory. Some problems require theory (e.g. random number
generation) and I can find the relevant material.
Projects
Here are some brief case studies of projects I have worked on and
particular problems that I solved. For more technical stuff see this
website’s index.
The Linux Test Project (LTP) is mostly a collection of Linux kernel
tests. I have made many contributions to it, including a lot of code
review, and used it within SUSE to test the Linux kernel.
Vulnerability (CVE) testing
- Problem: Fixes for bugs which cause vulnerabilities
sometimes do not work.
- Solution: Create reproducers for those bugs which
test the fix.
- Benefit: Some bad fixes were detected, which had the
second-order effect of highlighting broken procedures in how fixes are
backported.
Creating reproducers is not a new concept. However, I rebooted efforts
to get more reproducers into the LTP. This encouraged others to
contribute as well.
FuzzSync race exposition library
This leads on from the previous one.
- Problem: Reproducing some bugs requires reliably
reproducing a data race. The usual methods of doing this are either
resource intensive or require a particular kernel.
- Solution: Create a library which makes that
easier
- Benefit: We can easily reproduce most bugs
involving a data race without resorting to tricks that require a
particular kernel config.
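As a rough illustration of the general idea only (this is not the
FuzzSync API, and every name below is hypothetical), the sketch runs two
threads in lock step. Both meet at a spin barrier at the start of each
iteration, then one of them is delayed by a varying number of spins
before it touches the shared state, so that over many iterations the two
racing sections overlap at different offsets. It assumes POSIX threads,
C11 atomics and a non-negative max_delay.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical names throughout; this is not the FuzzSync API. */
struct race_pair {
    atomic_int a_cntr;  /* iterations thread A has entered */
    atomic_int b_cntr;  /* iterations thread B has entered */
    atomic_int flag;    /* shared state the two sections race on */
    int iterations;
    int max_delay;      /* largest number of spins B is delayed by */
    int hits;           /* times B observed A's window open */
};

/* Two-thread barrier: bump our counter, wait for the other to catch up. */
static void pair_wait(atomic_int *mine, atomic_int *theirs)
{
    int n = atomic_fetch_add(mine, 1) + 1;

    while (atomic_load(theirs) < n)
        sched_yield(); /* keeps single-CPU systems from stalling */
}

static void *thread_a(void *arg)
{
    struct race_pair *p = arg;

    for (int i = 0; i < p->iterations; i++) {
        pair_wait(&p->a_cntr, &p->b_cntr);
        atomic_store(&p->flag, 1); /* race window opens */
        atomic_store(&p->flag, 0); /* race window closes */
    }
    return NULL;
}

static void *thread_b(void *arg)
{
    struct race_pair *p = arg;

    for (int i = 0; i < p->iterations; i++) {
        int delay = i % (p->max_delay + 1);

        pair_wait(&p->b_cntr, &p->a_cntr);
        /* Sweep the delay so B lands at different offsets into A. */
        for (volatile int d = 0; d < delay; d++)
            ;
        if (atomic_load(&p->flag))
            p->hits++;
    }
    return NULL;
}

/* Returns how often the window was hit, or -1 if a thread failed to start. */
static int expose_race(int iterations, int max_delay)
{
    struct race_pair p = { .iterations = iterations, .max_delay = max_delay };
    pthread_t a, b;

    if (pthread_create(&a, NULL, thread_a, &p))
        return -1;
    if (pthread_create(&b, NULL, thread_b, &p)) {
        /* A would wait forever for B, so release it before joining. */
        atomic_store(&p.b_cntr, iterations);
        pthread_join(a, NULL);
        return -1;
    }
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return p.hits;
}
```

Calling expose_race(100000, 50) returns the number of iterations in
which the delayed thread caught the window open. The count varies from
run to run and machine to machine; a real library would adapt the delay
from timing measurements rather than sweep it blindly.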
Control groups have emerged as a critical kernel interface. They are
used by container, VM and system managers. As part of a larger effort to
increase their test coverage, I extended LTP’s support for them.
- Problem: It’s difficult to write tests which
interact with both Kernel CGroup APIs V1 and V2. It is also difficult to
discover the existing CGroup setup created by, e.g., systemd.
- Solution: Create a compatibility layer which
abstracts controller discovery, CGroup creation and interactions.
- Benefit: It is far easier now to write tests which
interact with CGroups, for example cfs_bandwidth01 which I wrote. More
importantly it encourages others to write tests interacting with
CGroups.
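To give a flavour of the discovery step, here is a minimal sketch that
is not the LTP library code. It assumes input lines in the /proc/mounts
format, where the third field is the filesystem type: "cgroup" for a V1
hierarchy and "cgroup2" for the unified V2 hierarchy. The function and
enum names are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical names; not the LTP CGroup API. */
enum cg_ver { CG_NONE = 0, CG_V1 = 1, CG_V2 = 2 };

/*
 * Classify one line in /proc/mounts format:
 *   "device mountpoint fstype options dump pass"
 * On a match the mount point is copied into mnt.
 */
static enum cg_ver cg_classify_mount(const char *line, char mnt[256])
{
    char path[256], fstype[32];

    /* Skip the device field, then read the mount point and type. */
    if (sscanf(line, "%*s %255s %31s", path, fstype) != 2)
        return CG_NONE;

    if (!strcmp(fstype, "cgroup2")) {
        strcpy(mnt, path);
        return CG_V2;
    }
    if (!strcmp(fstype, "cgroup")) {
        strcpy(mnt, path);
        return CG_V1;
    }
    return CG_NONE;
}

/* Print every CGroup mount found in the given mounts file. */
static int cg_print_mounts(const char *mounts_path)
{
    char line[1024], mnt[256];
    FILE *f = fopen(mounts_path, "r");
    int found = 0;

    if (!f)
        return -1;

    while (fgets(line, sizeof(line), f)) {
        enum cg_ver ver = cg_classify_mount(line, mnt);

        if (ver == CG_NONE)
            continue;
        printf("CGroup V%d mounted at %s\n", (int)ver, mnt);
        found++;
    }
    fclose(f);
    return found;
}
```

Calling cg_print_mounts("/proc/mounts") lists what the system manager
has already mounted. A full compatibility layer would go on to work out
which controllers are available on each hierarchy before creating test
groups.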
- Problem: We encounter repetitive mistakes during
review, especially around LTP library usage.
- Solution: Implement our own C static analysis tool
based on Sparse. So far only 3 checks have been implemented.
- Benefit: 3 fewer problems around improper usage of
the API. A better experience for contributors and maintainers.
Improving the new user/contributor experience
- Problem: It wasn’t immediately clear how to run the
tests or write a new one
- Solution: Add a quick start and other
documentation
- Benefit: More new users and contributors, assuming
it wasn’t a coincidence
I have introduced other areas of testing which follow the same
pattern as above.
- Problem: It’s difficult to test some kernel feature
consistently across different systems.
- Solution: Introduce some supporting code into the
LTP test framework
- Benefit: Increased test coverage of the kernel and
better feedback for kernel developers
Linux Kernel
I have found and fixed a number of bugs in the Linux kernel. The
reasons for doing this are rather roundabout.
- Problem: I/we don’t fully understand the challenges
in fixing a kernel bug
- Solution: Personally fix some kernel bugs
- Benefit: First-hand experience of what is required
or nice to have when fixing a bug, and of the challenges of
testing during development, leading to better testing.
CAN and SLIP
Found and fixed some issues in CAN and SLIP which are potentially
exploitable:
- b9258a2cece4 slcan: Don’t transmit uninitialized stack data in padding
- 0ace17d56824 can, slip: Protect tty->disc_data in write_wakeup and
close with RCU
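The class of bug named in the first commit title can be sketched in
userspace C. This is an illustration only, not the kernel patch: the
struct layout and function names are hypothetical. When a struct with
compiler-inserted padding is built on the stack and then copied out
byte for byte, the padding carries whatever was on the stack before,
unless the whole struct is zeroed first.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical frame layout, not the kernel's struct can_frame. */
struct frame {
    uint32_t id;
    uint8_t len;
    /* typical ABIs insert 3 padding bytes here to align payload */
    uint32_t payload;
};

/* Fill a frame, zeroing it first so the padding holds no stale data. */
static void frame_fill(struct frame *f, uint32_t id, uint8_t len,
                       uint32_t payload)
{
    /*
     * Without this memset the padding keeps its previous contents.
     * In practice compilers leave padding alone on scalar member
     * stores, so zeroing once up front is the usual remedy.
     */
    memset(f, 0, sizeof(*f));
    f->id = id;
    f->len = len;
    f->payload = payload;
}

/* Copy the raw struct bytes out, padding included, as a driver might. */
static size_t frame_serialise(const struct frame *f, unsigned char *out)
{
    memcpy(out, f, sizeof(*f));
    return sizeof(*f);
}
```

Removing the memset from frame_fill and serialising a frame whose memory
previously held other data is enough to see the stale bytes appear
between len and payload.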
vsock
- 4c1e34c0dbff vsock: Enable y2038 safe timeval for timeout
- 685c3f2fba29 vsock: Refactor vsock_*_getsockopt to resemble
sock_getsockopt
memcg/slab
I found and suggested a fix for a bug in the memory CGroup. It was
refused in favor of a more general fix.
mm: memcg/slab: Stop reparented obj_cgroups from charging root
OpenQA is a full operating system testing framework.
New QEMU backend
The QEMU backend manages VMs used for testing. OpenQA makes use of
snapshotting to revert a VM to a good state (anchor point) when
something fails and continue testing.
- Problem: Snapshotting was slow and would fail under
many circumstances. Additionally there were many smaller issues, such as
exporting UEFI firmware variables. This made some testing
impossible.
- Solution: Rewrite the QEMU backend
- Benefits: Our test matrix could be expanded by a
considerable amount, increasing coverage and finding more problems.
Real serial console
- Problem: Because OpenQA was originally purely a GUI testing
framework, serial consoles were supported in a very roundabout way
- Solution: Directly support serial consoles
(primarily virtio in QEMU)
- Benefits: Dramatic decrease in test runtime which
means developers get test results faster and have more time to react to
failures. It also decreased resource usage allowing more testing.
LTP test runner
- Problem: Only some LTP tests were being run, many
were failing and determining whether it was due to a product bug or test
bug was difficult
- Solution: Write a test runner for LTP within OpenQA
that displayed test results well and provided debugging aids.
- Benefits: A big expansion in test coverage, kernel
bugs were reported with useful info and LTP tests could be flagged for
fixing.
A SaaS which I have written about extensively and candidly, both here and on IndieHackers.
To summarise:
- Problem: christineharp.co.uk needed to
display her availability without taking bookings.
- Solution: I worked with her to create a product
which does that.
- Benefit: She now has a beautiful calendar
JDP
A now-defunct data analysis framework in Julia, which I planned to use
to create an “ontology” (to steal Palantir’s wording) that would allow
bug reports, bug fixes, test failures and logs from various sources to
be automatically cross-referenced.
- Problem: An LTP test has failed; it’s not
immediately obvious what all the relevant context may be.
- Solution: Use whatever algorithms to scour all our
data and find any information that may be relevant.
- Benefits: It automatically
matched test failures with existing bug reports, saving a lot of
review time.