One Dimensional Reversible Automata
I created a demo for the GFXPrim library. It implements and displays a nearest-neighbor, one-dimensional, binary cell automata. Additionally it implements a reversible automata, which is almost identical except for a small change to make it reversible. The automata is displayed over time in two dimensions, time travels from top to bottom. Although in the reversible case time could be played backwards.
The automata works as follows:
- Each cell has a state, which is on or off, black or white, boolean etc.
- At each time step, the state of a cell in the next step is chosen by a rule.
- The rule looks at a cell’s current value and the values of its left and right neighbors.
- There are 23 = 8 possible state combinations (patterns) for 3 binary cells.
- A rule states which patterns result in a black cell in the next time step.
- There are 28 = 256 possible rules. That is, 256 unique combinations of patterns.
So a pattern is a 3 digit binary number, where each digit corresponds to a cell. The middle digit is the center cell, the high order bit the left cell, the low order bit the right cell. A rule can be display by showing a row of patterns and a row of next states.
111 | 110 | 101 | 100 | 011 | 010 | 001 | 000 |
---|---|---|---|---|---|---|---|
0 | 1 | 1 | 0 | 1 | 1 | 1 | 0 |
Above is rule 110
, 0x6e
or
01101110
. It essentially says to match patterns
110
, 101
, 011
,
010
, 001
. Where a pattern match results in
the cell being set to 1 at the next time step. If no pattern is
matched or equivalently, an inactive pattern is matched, then the
cell will be set to 0.
Again note that each pattern resembles a 3bit binary number. Also the values of the active patterns resemble an 8bit binary number. We can use this to perform efficient matching of the patterns using binary operations.
Let’s assume our CPU natively operates on 64bit integers (called
words). We can pack a 64 cell automata into a single 64bit
integer. Each bit corresponds to a cell. If a bit is 1
then it is a black cell and 0
for white. In this case
we are using integers as bit fields. We don’t care about the integer
number the bits can represent.
The CPU can perform bitwise operations on all 64bits in parallel and without branching. This means we can perform a single operation 64 times in parallel.1
If we rotate (wrapped >>
2) all bits to the right by one,
then we get a new integer where the left neighbor of a bit is now in
its position. Likewise if we shift all bits to the left, then we get
an integer representing the right neighbors. This gives us 3
integers where the left, center and right bits are in the same
position. For example, using only 8bits:
left: | 0100 1011 | >> |
center: | 1001 0110 | |
right: | 0010 1101 | << |
Each pattern can be represented as a 3bit number, plus a 4th bit to say whether it is active in a given rule. As we want to operate on all 64bits at once in the left, right and center bit fields. We can generate 64bit long masks from the value of each bit in a given pattern.
So if we have a pattern where the left cell should be one, then
we can create a 64bit mask of all ones. If it should be
zero, then all zeroes. Likewise for the center and right cells. The
masks can be xor’ed3 (^
)
with the corresponding cell fields to show if no match occurred.
That is, if the pattern is one and the cell is zero or the cell is
one and the pattern is zero. We can invert this (~
) to
give one when a match occurs.
To see whether all components (left, right, center) of a pattern
matches we can bitwise and (&
) them
together. We can then bitwise or4
(|
) the result of the pattern matches together to
produce the final values.
If we wish to operate on an automata larger than 64 cells, then we can combine multiple integers into an array. After performing the left and right shifts, we get the high or low bit from the next or previous integers in the array. Then set the low and high bits of the right and left bit fields. In other words we chain them together using the end bits of the left and right bit fields.
For illustration purposes, below is the kernel of the the automata algorithm.
/* If bit n is 1 then make all bits 1 otherwise 0 */
#define BIT_TO_MAX(b, n) (((b >> n) & 1) * ~0UL)
/* Numeric representation of the current update rule */
static uint8_t rule = 110;
/* Apply the current rule to a 64bit segment of a row */
static inline uint64_t ca1d_rule_apply(uint64_t c_prev,
uint64_t c,
uint64_t c_next,
uint64_t c_prev_step)
{
int i;
/* These are wrapping shifts when c_prev == c or c_next == c */
uint64_t l = (c >> 1) ^ (c_prev << 63);
uint64_t r = (c << 1) ^ (c_next >> 63);
uint64_t c_next_step = 0;
for (i = 0; i < 8; i++) {
uint64_t active = BIT_TO_MAX(rule, i);
uint64_t left = BIT_TO_MAX(i, 2);
uint64_t center = BIT_TO_MAX(i, 1);
uint64_t right = BIT_TO_MAX(i, 0);
|=
c_next_step & ~(left ^ l) & ~(center ^ c) & ~(right ^ r);
active }
/* The automata becomes reversible when we use c_prev_state */
return c_next_step ^ c_prev_step;
}
To make the automata “reversible” an extra step can be added. We look at a cell’s previous (in addition to the current, left and right) and if it was one then invert the next value. This is equivalent to xor’ring the previous value with the next.
It is not entirely clear to me what the mathematical implications are of being reversible. However it is important to physics and makes some really cool patterns which mimic nature. Also entropy and the second rule of themodynamics, yada, yada…
The automata definition is taken from Stephen Wolfram’s “A new
kind of science”. He proposes at least one obvious5 C implementation using arrays of
cells. He also provides a table of binary expressions for each rule.
E.g. rule 90 reduces to just the l^r
binary expression.
It may be possible for the compiler to automatically reduce my
implementation to these minimal expressions.
To see why, let’s consider rule 90 for each pattern.
111 | 110 | 101 | 100 | 011 | 010 | 001 | 000 |
---|---|---|---|---|---|---|---|
0 | 1 | 0 | 1 | 1 | 0 | 1 | 0 |
01011010 = 90
First for pattern 000
.
& ~(left ^ l) & ~(center ^ c) & ~(right ^ r);
active = 0 & ~(0 ^ l) & ~(0 ^ c) & ~(0 ^ r);
= 0.`
Active is zero so the whole line reduces to zero. Now for
001
. Note that 1
here actually means
~0UL
, that is 64bit integer max.
1 & ~(0 ^ l) & ~(0 ^ c) & ~(1 ^ r);
= ~l & ~c & r.
As expected pattern 001
matches
l=0, c=0, r=1
. Let’s just list the remaining patterns
or’ed together in their reduced state. Then reduce that further.
Note that the for
loop in ca1d_rule_apply
will be unrolled by the compiler when optimising for
performance. It’s also quite clear that c_next_step
is
dependant on an expression from the previous iteration or zero. So
all the pattern match results will get or’ed together.
& c & ~r | l & ~c & ~r | ~l & c & r | ~l & ~c & r;
l = l & ~r | ~l & r;
= l ^ r.
See on the top row that
(l & c & ~r | l & ~c & ~r)
or’s
together c
and not c
. So we can remove it.
Then we get an expression equivalent to xor’ring l
and
r
.
In theory at least, the compiler can see that rule
only has 256 values and create a reduced version of
ca1d_rule_apply
for each value. Whether it actually
does is not of much practical concern when the rendering code is the
bottle neck. However it’s interesting to see if the compiler can
deduce the best solution or whether anything trips it up.
Judging from the disassembly from
gcc -O3 -mcpu=native -mtune=native
, it may actually do
this. Additionally it vectorizes the code packing four
64bit ints at a time into 256bit registers and operating on those. I
don’t know which part of the code it is vectorising or how. It’s
possible that what I think is the rule being reduced is something
related to vectorisation.
To render the automata we take the approach of iterating over each pixel in the image. We calculate which cell the pixel falls inside and set the color of the pixel to that of the cell. That’s it.
/* Draws a pixel */
static inline void shade_pixel(gp_pixmap *p, gp_coord x, gp_coord y,
, gp_pixel fg)
gp_pixel bg{
;
gp_pixel pxsize_t i = (x * (64 * width)) / p->w;
size_t j = (y * height) / p->h;
size_t k = 63 - (i & 63);
uint64_t c = steps[gp_matrix_idx(width, j, i >> 6)];
= BIT_TO_MAX(c, k);
c = (fg & c) | (bg & ~c);
px
(p, x, y, px);
gp_putpixel_raw}
GFXPrim makes drawing very simple. The above code is fast enough for my purpsoses, but a significant improvement can be had. Integer division is much slower than floating point multiplication on most newer CPUs. It’s actually much faster (2x at least) on my CPU to calculate a pair of ratios in floating point, then convert them back to integers.
However, you may ask why we are even drawing on the CPU in the first place? This is because GFXPrim targets embedded systems with no graphics processor. Additionally the CPU may not even support floating point natively. So integer division may actually be faster in this case. Still better would be to limit the size of the pixmap to be 2x larger than the dimensions of the automata, where x ∈ ℕ then we can use shifts instead of division.
In fact on my CPU it is 256 cells at a time. As AVX can be used to operate on 4 64bit words at a time.↩︎
We perform a rotating shift which moves the end bit to the start. This causes the automata to wrap around.↩︎
Combined with exclusive or.↩︎
Or we can use xor as the patterns are mutually exclusive, so only one may match at a time for each bit.↩︎
That’s me being an arse not Wolfram. Of course whenever “obvious” is used in these contexts it’s never really correct.↩︎