December 18th, 2021 @ justine's web page
SectorLISP now supports garbage collection. This is the first time that a high-level garbage collected programming language has been optimized to fit inside the 512-byte boot sector of a floppy disk. Since we only needed 436 bytes, that means LISP has now outdistanced FORTH and BASIC to be the tiniest programming language in the world.
SectorLISP consists of 223 lines of assembly. It provides a LISP system that's powerful enough to let you write your own LISP interpreter in just 40 lines of LISP. It's compatible with all PC models dating back to 1981 which have at least 64kb of RAM. This isn't a toy because SectorLISP can run the proof assistant that was included in LISP 1.5. We achieved the small file size thanks to 20/20 hindsight and an unbiased approach of maximum austerity. The goal of this project has been to have fun building a kit that optimizes for file size at the expense of everything else, which means SectorLISP has more in common with a game like Universal Paperclips than a talking paperclip like Clippy.
This is a follow-up to a previous announcement made in October that SectorLISP now fits in one sector. There's been many changes over the past few months that made it possible to shave away another hundred bytes from the i8086 assembly implementation. It left plenty of room to add a 40 byte garbage collector. This blog post will tell the story of how our low-level assembly listing evolved, using a plain and simple C / JavaScript / Thompson Shell polyglot.
sectorlisp.bin
see also sectorlisp.S
sectorlisp-friendly.bin
see also sectorlisp.S
lisp.com
see also lisp.js
blinkenlights.com
-rt sectorlisp.bin # runs emulator |
sectorlisp.bin.dbg
sectorlisp-friendly.bin.dbg
lisp.com.dbg
blinkenlights.com.dbg |
The .bin floppy disk boot sectors can be emulated using Blinkenlights or QEMU, or you can watch the video below.
curl --compressed https://justine.lol/sectorlisp2/blinkenlights.com >blinkenlights.com
curl https://justine.lol/sectorlisp2/sectorlisp-friendly.bin >sectorlisp-friendly.bin
chmod +x blinkenlights.com
./blinkenlights.com -rt sectorlisp-friendly.bin # then press 'c'
qemu-system-i386 -nographic -fda sectorlisp-friendly.bin
The .com file is slightly larger but it runs on seven operating systems. It's the same as the JS simulator below.
curl https://justine.lol/sectorlisp2/lisp.com >lisp.com chmod +x lisp.com ./lisp.com
See the assembly listing listing section. You can
build the simulator on your UNIX system if cc
is installed:
curl https://justine.lol/sectorlisp2/lisp.js >lisp.js chmod +x lisp.js ./lisp.js
Building from source means you get a better command line interface. The shell script above will curl a GNU Readline replacement we wrote called Bestline from this same domain (justine.lol) to make sure you have the latest copy, because we've been slowly but steadily recreating support for Emacs' famous Paredit-style editing features.
Jim Leonard was able to confirm for us that SectorLISP does in fact run on the original hardware, or more specifically, the IBM PC model 5150. That was quite thrilling to learn, since SectorLISP was largely written a priori using a emulator on Linux that we created.
His video provides more background on what boot sectors are. He talks about similar projects that have been created in the past, such as Oscar Toledo's pioneering work. Then, towards the end of the video, you get to watch SectorLISP actually running on one of the finest computers ever sold, which has been preserved in perfect quality.
For example, you can hear the thunk of the power switch on the IBM PC. It's about as pleasing to hear as the thud sound an expensive German car like a Mercedes makes when you close the door. But the most pleasing of all is the vintage Model F mechanical keyboard, which to this day remains quite possibly the greatest and most respected mechanical keyboard of all time. No keyboard exists in the world that's superior in quality to those sold with the original IBM PC, except for the tenkeyless version of the Model F with APL legends on the keycaps, or better yet, the IBM 3278 Beam Spring keyboard for which the Model F was intended to be a more economical replacement.
Here's a demo of SectorLISP v2 booting from BIOS in Blinkenlights and running the metacircular evaluator, i.e. LISP written in LISP. If you compare this video to the one from the previous blog post you'll see how the new garbage collector has dramatically changed the personality of the software, in terms of how it uses physical memory. The WRITE memory panel in particular dances more, and shows how its heap allocations behave very similar to a stack. You can also use SectorLISP v1.o online in a PC browser emulator by visiting copy.sh/v86/.
If you're looking for something more modern, the multiplatform lisp.com executable also runs on bare metal. Here's a Blinkenlights demo of it booting from BIOS in 16-bit real mode. It then bootstraps itself into 32-bit mode so it can load itself off disk into memory. The final stage of bootstrapping inverts the physical memory, in a similar manner to the Linux Kernel, which enables it to run in 64-bit long mode. Once it's reached the zenith of computing modernity, it displays a LISP REPL via the serial port.
Keep in mind that the LISP operating system above that's fast-forwarding its way through history, is actually just the JavaScript source code built with Cosmopolitan Libc. Although we tuned it in lisp.c for a slight performance nudge when compiled with GCC.
The most important trick to implementing LISP is to redefine
NULL
.
function Set(i, x) { M[Null + i] = x; } function Get(i) { return M[Null + i]; }
LISP has two types of memory: atoms and cons cells. We store them in two
adjacent stacks that grow outward from NULL
.
Positive addresses are used to intern strings. Interning is good for
storing symbols (which LISP calls atoms) since it lets us test string
equality by comparing unique addresses. First among atoms
is NIL
which the assembly encodes as
a NUL
-terminated string residing at the
NULL
address.
Negative addresses are used to store Pair<int>
tuples, which LISP calls cons cells. These are used to chain atoms
together into data structures such as lists and binary trees. First
among cons cells is the empty list ()
which is stored at
negative NULL
and therefore equal to NIL
.
This multifacted nature of NIL
serves as evidence of how
LISP's design lends itself to symmetrical partitioning of memory, even
if the underlying machine arithmetic doesn't support signed zero.
Negative memory then grows down as CONS
is called, which
returns tuples indexed by CAR
and CDR
.
function Cons(car, cdr) { Set(--cx, cdr); Set(--cx, car); return cx; } |
function Car(x) { return Get(x); } function Cdr(x) { return Get(x + 1); } |
The assembly version does things the same way with the slight difference
that cons cells grow up from INT_MIN
rather than growing
down from zero. Doing that made the code smaller, but it also lets us
make the claim that it runs on the stock configuration of the first IBM
PC, which shipped with 64kb of RAM. It's because
rebasing NULL
on the boot address 0x7c00
gives
negative memory a legal range of -0x8000
to -0x7c00
since only 0x10000
bytes of linear
memory exist. Working within the constraints of an old computer like
i8086 that required us to confront unfamiliar concepts like negative
memory is what helped the most elegant approach for C and JavaScript to
become clear.
function ReadList() { var x = Read(); if (x > 0 && Get(t) == Ord(')')) { return -0; } else { return Cons(x, ReadList(t)); } } |
function Print(x) { if (1./x < 0) { PrintList(x); } else { PrintAtom(x); } } |
One advantage of JavaScript is that it uses a sign-magnitude encoding
similar to the IBM 704 computer for which LISP was designed. That means
we can tell ()
apart from NIL
in
our Read
and Print
code, even
though Eval
doesn't care about signed zeroes. The divide by
zero hack can test for cons cells including ()
because 1/-0 = -Infinity
and -Infinity < 0
.
C environments with integral two's complement arithmetic will simply
normalize ()
to NIL
and the floating point
operations can be optimized away automatically
using -ffast-math
.
LISP evaluation works by recursively calling Apply
to
transform the first element of a list into a lambda. It zips the
arguments with their parameters into the environment variables, and runs
the code contained in the function.
Eval(code=(SECOND ARG1 ARG2), vars={SECOND=(LAMBDA (X Y) Y), ARG1=FOO, ARG2=BAR}) → Apply(code=((LAMBDA (X Y) Y) FOO BAR), vars={SECOND=(LAMBDA (X Y) Y), ARG1=FOO, ARG2=BAR}) → Eval(code=Y, vars={X=FOO, Y=BAR, SECOND=(LAMBDA (X Y) Y), ARG1=FOO, ARG2=BAR}) → BAR
John McCarthy discovered an elegant self-defining way to compute the above steps, more commonly known as the metacircular evaluator. Alan Kay once described this code as the "Maxwell's equations of software". Here are those equations as implemented by SectorLISP:
//¶` #define var int #define function //` function Evcon(c, a) { if (Eval(Car(Car(c)), a)) { return Eval(Car(Cdr(Car(c))), a); } else { return Evcon(Cdr(c), a); } } function Evlis(m, a) { return m ? Cons(Eval(Car(m), a), Evlis(Cdr(m), a)) : m; } function Assoc(x, y) { if (x == Car(Car(y))) return Cdr(Car(y)); return Assoc(x, Cdr(y)); } function Pairlis(x, y, a) { return x ? Cons(Cons(Car(x), Car(y)), Pairlis(Cdr(x), Cdr(y), a)) : a; } function Eval(e, a) { var A = cx; if (!e) return e; if (e > 0) return Assoc(e, a); if (Car(e) == kQuote) return Car(Cdr(e)); if (Car(e) == kCond) return Evcon(Cdr(e), a); return Gc(A, Apply(Car(e), Evlis(Cdr(e), a), a)); } function Apply(f, x, a) { if (f < 0) return Eval(Car(Cdr(Cdr(f))), Pairlis(Car(Cdr(f)), x, a)); if (f == kEq) return Car(x) == Car(Cdr(x)); if (f == kCons) return Cons(Car(x), Car(Cdr(x))); if (f == kAtom) return Car(x) >= 0; if (f == kCar) return Car(Car(x)); if (f == kCdr) return Cdr(Car(x)); return Apply(Assoc(f, a), x, a); }
The code above is from lisp.js. What makes our
Rosetta Stone possible is that C was designed to use int
as
an implicit default type, and compilers maintained backwards
compatibility ever since. Traditional C may as well be Sanskrit since it
can't be a coincidence that the languages with the most gravitas seem to
share this as their common subset. It's unfortunate the C standards
committee intends to remove support for K&R syntax, because it
implements LISP so well. Let's compare the code above to John McCarthy's
original 1950's paper which used M-expression notation:
eval[e; a] = [ atom[e] → assoc[e; a]; atom[car[e]] → [ eq[car[e]; QUOTE] → cadr[e]; eq[car[e]; ATOM] → atom[eval[cadr[e]; a]]; eq[car[e]; EQ] → [eval[cadr[e]; a] = eval[caddr[e]; a]]; eq[car[e]; COND] → evcon[cdr[e]; a]; eq[car[e]; CAR] → car[eval[cadr[e]; a]]; eq[car[e]; CDR] → cdr[eval[cadr[e]; a]]; eq[car[e]; CONS] → cons[eval[cadr[e]; a]; eval[caddr[e]; a]]; T → eval[cons[assoc[car[e]; a]; evlis[cdr[e]; a]]; a] ]; eq[caar[e]; LAMBDA] → eval[caddar[e]; append[pair[cadar[e]; evlis[cdr[e]; a]; a]]] ]
It's a good thing that C wasn't designed until the 1970's since
otherwise JMC might never have discovered LISP. Or maybe the semicolons,
equals sign, and brackets that behave
like switch
, PROGN
, and COND
suggest the existence of a lost language that both Ritchie and JMC were
fortunate enough to use. It's unlikely, but it'd explain why he felt so
unhappy working with IBM on FORTRAN since it would be like asking a Rust
developer to fix Visual Basic 6. IBM likely knew he was uniquely
qualified to help them, but they didn't accept any of his proposals,
rejecting his asks for features like recursion as unnecessary. LISP
happened as a result. It was a big opportunity loss for both the IBM PC
and JMC himself, since he could have been Bill Gates if the two had
found a way to work together.
SectorLISP uses what we call an ABC garbage collector and it took only
40 bytes of assembly. It works by saving the position of the cons stack
before and after evaluation. Those values are called A and B. It then
decreases the cx cons stack pointer further by recursively copying the
Eval
result. The new stack position is called C. The memory
between B and C is then copied up to A. Once that happens, the new cons
stack position becomes A - B + C. The purpose of this operation is to
discard all the cons cells that got created which aren't part of the
result, because we know for certain they can't be accessed anymore
(assuming functions aren't added which mutate cells).
function Copy(x, m, k) { return x < m ? Cons(Copy(Car(x), m, k), Copy(Cdr(x), m, k)) + k : x; } function Gc(A, x) { var C, B = cx; x = Copy(x, A, A - B), C = cx; while (C < B) Set(--A, Get(--B)); return cx = A, x; } |
Copy: cmp %dx,%di jb 1f push (%bx,%di) mov (%di),%di call Copy pop %di push %ax call Copy pop %di call Cons sub %si,%ax add %dx,%ax ret 1: xchg %di,%ax ret |
Fast immediate garbage collection with zero memory overhead and perfect heap defragmentation is as easy as ABC when your language guarantees data structures are acyclic. The trick is to not copy anything above A, since that memory space consists of cons cells owned by calling functions as well as interned atoms, which should be ignored. That way overlaps aren't possible. Thus a GC cycle can't increase memory usage.
In order to bootstrap LISP one must first bootstrap its builtin atoms. In C you might do that using a for loop which copies chars from a string to the memory array. However the assembly implementation doesn't need to do anything.
NIL T Ω, └• QUOTE COND ATOM CAR CDR CONS EQ ╝ Ç♫▼♫•♫↨╗☻ ëßΦ◄ ΦT ΦA☺ûΦC ░♪Φr δΩë╧ê╨< v☻¬ûΦ_ < v±<)v♣Ç·)wΦê=û├░(λ0ï4Φ↕ ░ ^à÷x≥t♣░∙ Φ♦ ░)δ7Φ4 à÷x▐¼ä└u⌠├<(tPQë²)═E1λ^VëΘë°8=t♀≤ªt◙O1└«u²δΩ≤ñY├1└═▬┤♫ ═►<♪u♦░◙δ⌠Æ├àλt▬λ1ï♣Φ¡ _PΦ≡λ_ç∙ë♪ë☺ìM♦ù├Φcλ<)t_ΦóλPΦ≥λδΣ9╫rΩλ1ï= Φ⌡λ_PΦ≡λ_Φ╤λ)≡☺╨├VΦo ^à└y↔ùï9Wï=àλt╲¡λ1ï=ï4Φ»λùÆΦ¬λÆ_δΘ=) w╒ï<<∟ t+< t&<↨u•àλy♫1└├<$ï0¡tà1°u≥░♦├ë╓ï<ï0»u∙÷ï9<»ï♣├ï9Wï5¡Φ♂ _à└t≥λ5 _Φσλà└t+y╒û¡=♀ ï<t┌=↕ t┌RQPΦ.λûXΦsλZë╬ùΦNλë╫)±≤ñë∙Z├╬╬╬╬╬╬╬╬╬╬╬╬ ╬╬╬╬╬╬╬╬╬╬╬╬╬╬╬╬╬╬╬╬╬╬╬╬╬╬╬╬╬╬╬╬╬╬╬╬╬╬╬╬╬╬╬╬╬╬╬ SECTORLISP v2 U¬
x86 processors have a feature that lets us literally
redefine NULL
to be the address to which our program is
loaded by the BIOS. It's a privilege that's normally reserved only for
operating systems. The trick we use here is to treat the program image
itself as a set of entries in the interned strings table.
One problem that arises is that the BIOS asks the CPU to execute the
ASCII strings at the beginning of the image. By sheer luck we learned
NIL
and T
could be safely decoded without
side-effects. Had that not been the case, we'd've needed to choose a
different terminator than NUL
which would have made the
program file larger.
"N" dec %si "I" dec %cx "L" dec %sp "\0T\0" add %dl,(%si) # and we know for certain %dl is 0 ljmp $0x7c00>>4,$begin
One thing you may be wondering about the above sector, is what's the point of making it smaller than 512 bytes if you have to pad it to 512 bytes anyway to include the U¬ (AA55h) boot signature? The answer is that we simply learned more about LISP by doing it. But it's also because that signature wasn't part of the original PC design. It was actually added by Microsoft to later models, similar to the more recently introduced requirements that the Linux Kernel be distributed as a Windows executable. The IBM PC XT will happily load and run the 436 byte version and it's a nice design.
If high-level programming languages like C are the Ice Hotel and assembly is the tip of the iceberg, then the hidden dimension of complexity lurking beneath would be Intel's variable length encoding. This is where boot sectors get esoteric real fast, since tools can't easily visualize it. for example, consider the following:
/ %ip is 0 mov $0xf4,%al ret |
/ %ip is 1 .byte 0xb0 wut: hlt # and catch fire ret |
Similar to how a Chess game may unfold very differently if a piece is moved to an unintended adjacent square, an x86 program can take on an entirely different meaning if the instruction pointer becomes off by one. We were able to use this to our advantage, since that lets us code functions in such a way that they overlap with one another.
/ SectorLISP code. 89 D6 Assoc: mov %dx,%si 8B 3C 1: mov (%si),%di 8B 30 mov (%bx,%si),%si AF scasw 75 F9 jne 1b F6 .byte 0xF6 8B 39 Cadr: mov (%bx,%di),%di 3C .byte 0x3C AF Cdr: scasw 8B 05 Car: mov (%di),%ax C3 ret |
89 D6 Assoc: mov %dx,%si 8B 3C 1: mov (%si),%di 8B 30 mov (%bx,%si),%si AF scasw 75 F9 jne 1b F6 8B 39 3C AF testw $0xaf,0x3c39(%bp,%di) 8B 05 mov (%di),%ax C3 ret 8B 39 Cadr: mov (%bx,%di),%di 3C AF cmp $0xaf,%al 8B 05 mov (%di),%ax C3 ret AF Cdr: scasw 8B 05 mov (%di),%ax C3 ret 8B 05 Car: mov (%di),%ax C3 ret |
It takes about 10 milliseconds on a 4.77mhz IBM PC to run to run a LISP
program wrapped inside John McCarthy's metacircular evaluator. However
an obvious bottleneck exists with the interning algorithm which takes
upwards of 50 milliseconds during the Read
operation. Even
on a forty year old computer, fifty milliseconds of latency is quite
sluggish in terms of performance; so let's fix that.
Tinier solutions are usually better ones, but that's not always the
case. The assembly version of our Intern
function has an
obvious scalability bottleneck since it saves space by using a
double-nul-terminated list. The simplest way to improve that is to
redefine positive memory to be a hash table containing linked lists of
characters.
function Probe(h, p) { return (h + p * p) & (Null / 2 - 1); } function Hash(h, x) { return (((h + x) * 3083 + 3191) >> 4) & (Null / 2 - 1); } function Intern(x, y, h, p) { if (x == Get(h) && y == Get(h + Null / 2)) return h; if (Get(h)) return Intern(x, y, Probe(h, p), p + 1); Set(h, x); Set(h + Null/2, y); return h; } function ReadAtom() { var x, y; ax = y = 0; do x = ReadChar(); while (x <= Ord(' ')); if (x > Ord(')') && dx > Ord(')')) y = ReadAtom(); return Intern(x, y, (ax = Hash(x, ax)), 1); } |
/ this is slow Intern: push %cx mov %di,%bp sub %cx,%bp inc %bp xor %di,%di 1: pop %si push %si mov %bp,%cx mov %di,%ax cmp %bh,(%di) je 8f rep cmpsb je 9f xor %ax,%ax dec %di 2: scasb jne 2b jmp 1b 8: rep movsb 9: pop %cx ret |
The trick here is to find a hash function so that NIL
interns at the NULL
position and T
interns at
1. That's what lets our Apply
code be more elegant. In the
above example, the magic numbers (3083, 3191, 4) were chosen under the
assumption that Null is 040000. If it were a different two-power like
0400000, then the magnums would be (60611, 20485, 0). If you wish to
dive deeper then take a look at
lisp.js, hash.c, and
hash.com.
(DEFINE REVERSE . (LAMBDA (X Y) (COND (X (REVERSE (CDR X) (CONS (CAR X) Y))) ((QUOTE T) Y)))) |
(REVERSE (QUOTE (A B C D)) ())
|
One of the known issues with recursive functions is that they have suboptimal performance without the aid of a tail call optimizer that can do copy elision. This can have a negative impact on the performance of the ABC Garbage Collector in list building functions such as the one above. Blinkenlights is good at spotting scalability problems early.
The canonical workaround to this kind of problem is to use binary trees instead of lists for big data, since binary trees will reduce the the call stack depth from being linear to logarithmic.
(DEFINE REVERSE . (LAMBDA (X) (COND ((ATOM X) X) ((QUOTE T) (CONS (REVERSE (CDR X)) (REVERSE (CAR X))))))) |
(REVERSE
(QUOTE
(((((A . B) . C) .
((D . E) . F)) .
(((G . H) . I) .
((J . K) .
(L . M)))) .
((((N . O) . P) .
((Q . R) . S)) .
(((T . U) . V) .
((W . X) .
(Y . Z)))))))
|
One thing profiling reveals is that LISP spends most of its time inside
Assoc
looking up variables, because recursive functions
repeatedly append the same variables to the dynamically-scoped
a
list. One low-hanging fruit optimization that's much
easier than implementing a tail-call optimizer is simply tuning
Pairlis
to peel away repeated variables.
function Peel(x, a) { return a && x == Car(Car(a)) ? Cdr(a) : a; } function Pairlis(x, y, a) { return x ? Cons(Cons(Car(x), Car(y)), Pairlis(Cdr(x), Cdr(y), Peel(Car(x), a))) : a; }
The above optimization allows SectorLISP to outperform Emacs, based on a benchmark of the "Triple LISP" example available under the Simulator Load button. See eval4.lisp for the Emacs Lisp translation. Emacs takes 2,187µs to run that, whereas SectorLISP takes 1580µs. This hack also helped reduce JavaScript latency from ~30ms to ~15ms.
|
|
The meaning of truth is underspecified by John McCarthy's original paper; however, if you read the LISP 1.5 source code, one of the first things you'll see is this charming comment, which attempts to provide a definition:
* PROP LISTS FOR ATOMS NIL & VERITAS-NUMQUAM-PERIT * THE ZERO AND THE BINARY TRUTH ATOMS RESPECTIVELY * 77640 ORG COMMON-18 77640 0 00137 0 07335 NILSXX $PNAME,,-*-1 77641 0 00000 0 00136 -*-1 77642 -0 00000 0 00135 MZE -*-1 77643 -053143777777 OCT 453143777777 NIL 77644 0 00000 0 00370 NILLOC $ZERO * 77645 0 00132 0 10742 STS $APVAL,,-*-1 77646 -0 00130 0 00131 MZE -*-1,,-*-2 77647 0 00000 0 00001 1 IS A CONSTANT ,,1 FOR APPLY 77650 0 00127 0 07335 $PNAME,,-*-1 77651 0 00000 0 00126 -*-1 77652 -0 00000 0 00125 MZE -*-1 77653 546351642554 BCI 1,*TRUE*
LISP 1.5 is a real trickster compared to modern LISP implementations
because it clandestinely maps T
to *TRUE*
the
latter of which is the true truth, since it's what builtin predicates
like ATOM
and EQ
actually return. It figures
they would quote someone who was tormenting Hector's widow since that
design probably tormented many MIT students. LISP 1.5 also jealously
guards its definition of truth, and will throw a constant error if you
try to change it.
A Latin phrase that better describes SectorLISP is NIHIL NVMQVAM PERIT
or in English what is dead may never die. What it means is
that NIL
is the only atom SectorLISP won't let you
redefine. You can however define T
to have any meaning you
wish, or use it as a variable name, since it's just an atom, and
anything that isn't NIL
is considered true by
EVCON
. From a programming perspective, this means you'd
need to use something like (QUOTE T)
rather
than T
in the default clauses of your COND
statements, unless you map T
to T
.
We chose this model simply because it's what made the file size tinier.
So it actually can be a good idea to have a less permissive design which
prevents T
from perishing, because making that a builtin
feature of Eval
alone can improve the performance of the
arithmetic code below by ten percent.
SectorLISP doesn't support numbers; but that's OK, since Arabic numerals
are after all just a sequence of digits, and digits are symbols. Since
SectorLISP does support sequences and symbols, you can use LISP's
preschool math to implement elementary maths like arithmetic. All you
need is a few lines of code. The simplest way we could model unsigned
integers is by using lists and then switching the Arabic right-to-left
ordering to be left-to-right instead (i.e. little endian). We'll then
define
NIL
as 0, and anything that isn't
NIL
shall be 1. For example, a number such as ten
(or 0b1010
in binary) could then be encoded as (NIL T
NIL T)
. Now that we've defined how our LISP numbers look, we can
implement recursive functions that operate on them.
(DEFINE T . T) (DEFINE NOT . (LAMBDA (X) (COND (X NIL) (T T)))) (DEFINE OR . (LAMBDA (X Y) (COND (X T) (T Y)))) (DEFINE AND . (LAMBDA (X Y) (COND (X Y) (T NIL)))) (DEFINE XOR . (LAMBDA (X Y) (COND (X (NOT Y)) (T Y)))) (DEFINE HEAD . (LAMBDA (X) (COND (X (CAR X)) (T NIL)))) (DEFINE TAIL . (LAMBDA (X) (COND (X (CDR X)) (T ())))) (DEFINE + . (LAMBDA (A B) (ADD A B NIL))) (DEFINE ADD . (LAMBDA (A B C) (COND ((OR A B) (CONS (XOR (XOR (HEAD A) (HEAD B)) C) (ADD (TAIL A) (TAIL B) (OR (AND (XOR (HEAD A) (HEAD B)) C) (AND (HEAD A) (HEAD B)))))) (C (CONS C ())) (T ())))) |
(DEFINE EQUAL . (LAMBDA (X Y) (COND ((ATOM X) (EQ X Y)) ((ATOM Y) (EQ X Y)) ((EQUAL (CAR X) (CAR Y)) (EQUAL (CDR X) (CDR Y))) ((QUOTE T) NIL)))) (DEFINE COMMENT 0 + 0 = 0) (EQUAL (+ () ()) ()) (DEFINE COMMENT 1 + 1 = 2) (EQUAL (+ (QUOTE (T)) (QUOTE (T))) (QUOTE (NIL T))) (DEFINE COMMENT 1 + 2 = 3) (EQUAL (+ (QUOTE (T)) (QUOTE (NIL T))) (QUOTE (T T))) (DEFINE COMMENT 2 + 2 = 4) (EQUAL (+ (QUOTE (NIL T)) (QUOTE (NIL T))) (QUOTE (NIL NIL T))) (DEFINE COMMENT 10 + 10 = 20) (EQUAL (+ (QUOTE (NIL T NIL T)) (QUOTE (NIL T NIL T))) (QUOTE (NIL NIL T NIL T))) |
The above code iterates over two lists of bits and then applies the full adder schematic. The nice thing about this code is it supports arbitrary precision. The Googlepedia solution to this kind of problem is to use Church numerals; but since Church encoding requires an amount of memory equal to the numbers themselves, that means we wouldn't be able to have numbers larger than 8192 on the IBM PC XT. That might be fine for proving theorems, but how would you like to own a 13-bit computer? SectorLISP instead unleashes your oldskool 16-bit CPU, enabling it to perform 64-bit and even 128-bit computations.
Since microoptimizations are very highly prized in arithmetic, one way
we might do that with SectorLISP by using builtins more and taking
advantage of friendly branch features, such as dot cons literals which
let us remove a CAR
call. The friendly branch also defines
CAR
and CDR
the modern way which means we
don't need to define HEAD
and TAIL
plus it
obfuscates the need for a wrapper. If we combine those techniques with a
builtin tautology atom then this new definition should triple addition
performance with a nice raw quality that self-documents the truth table.
(DEFINE + . (LAMBDA (A B C) (COND ((COND (A T) (T B)) ((LAMBDA (S) (CONS (CAR S) (+ (CDR A) (CDR B) (CDR S)))) (COND ((CAR A) (COND ((CAR B) (COND (C (QUOTE (T . T ))) (T (QUOTE (NIL . T ))))) (T (COND (C (QUOTE (NIL . T ))) (T (QUOTE (T . NIL))))))) (T (COND ((CAR B) (COND (C (QUOTE (NIL . T ))) (T (QUOTE (T . NIL))))) (T (COND (C (QUOTE (T . NIL))) (T (QUOTE (NIL . NIL)))))))))) (C (CONS C ())) (T ()))))
Another interesting thing about our arithmetic model is how the loose typing of truthiness means that S-expressions in general could be seen as having numeric values, almost like Hebrew numerals. We haven't imagined a use case for it yet, but let us know if you do, since that could be very cool.
Arithmetic in SectorLISP is very elegant, but unfortunately the numbers themselves aren't that readable unless you write a radix converter and reverse the lists. So calculus and algebra are usually better use cases for LISP since their inputs and outputs are already symbolic sequences and LISP is a symbolic language. Here's an example of how you can use SectorLISP to perform symbolic differentiation with support for the most common mathematical operators.
(DEFINE DIFF . (LAMBDA (E WRT) (COND ((EQ E WRT) 1) ((ATOM E) 0) ((QUOTE T) (DIFF3 (CAR E) (CAR (CDR E)) (COND ((CDR (CDR E)) (CAR (CDR (CDR E)))) ((QUOTE T) NIL))))))) (DEFINE DIFF3 . (LAMBDA (OP X Y) (COND ((EQ OP ADD) (L3 OP (DIFF X WRT) (DIFF Y WRT))) ((EQ OP SUB) (L3 OP (DIFF X WRT) (DIFF Y WRT))) ((EQ OP MUL) (L3 ADD (L3 MUL X (DIFF Y WRT)) (L3 MUL (DIFF X WRT) Y))) ((EQ OP DIV) (L3 SUB (L3 DIV (DIFF X WRT) Y) (L3 DIV (L3 MUL X (DIFF Y WRT)) (L3 POW Y (QUOTE 2))))) ((EQ OP POW) (L3 ADD (L3 MUL (L3 POW X Y) (L3 MUL (L2 LOG X) (DIFF Y WRT))) (L3 MUL Y (L3 MUL (L3 POW X (L3 SUB Y 1)) (DIFF X WRT))))) ((EQ OP LOG) (L3 DIV (DIFF X WRT) X)) ((EQ OP (QUOTE SIN)) (L3 MUL (DIFF X WRT) (L2 (QUOTE COS) X))) ((EQ OP (QUOTE COS)) (L3 SUB 0 (L3 MUL (DIFF X WRT) (L2 (QUOTE SIN) X)))) ((EQ OP (QUOTE ABS)) (L3 DIV (L3 MUL X (DIFF X WRT)) (L2 (QUOTE ABS) X))) ((QUOTE T) :HOW)))) |
(DEFINE 0 . 0) (DEFINE 1 . 1) (DEFINE ADD . ADD) (DEFINE SUB . SUB) (DEFINE DIV . DIV) (DEFINE POW . POW) (DEFINE MUL . MUL) (DEFINE LOG . LOG) (DEFINE L2 . (LAMBDA (X Y) (CONS X (CONS Y ())))) (DEFINE L3 . (LAMBDA (X Y Z) (CONS X (L2 Y Z)))) (DEFINE EQUAL . (LAMBDA (X Y) (COND ((ATOM X) (EQ X Y)) ((ATOM Y) (EQ X Y)) ((EQUAL (CAR X) (CAR Y)) (EQUAL (CDR X) (CDR Y))) ((QUOTE T) NIL)))) (EQUAL 1 (DIFF (QUOTE X) (QUOTE X))) (EQUAL (QUOTE (ADD 1 1)) (DIFF (QUOTE (ADD X X)) (QUOTE X))) (EQUAL (QUOTE (ADD 1 0)) (DIFF (QUOTE (ADD X Y)) (QUOTE X))) (EQUAL (QUOTE (ADD (MUL X 0) (MUL 1 Y))) (DIFF (QUOTE (MUL X Y)) (QUOTE X))) (EQUAL (QUOTE (COS X)) (DIFF (QUOTE (SIN X)) (QUOTE X))) (EQUAL (QUOTE (ADD (MUL (POW X Y) (MUL (LOG X) 0)) (MUL Y (MUL (POW X (SUB Y 1)) 1)))) (DIFF (QUOTE (POW X Y)) (QUOTE X))) |
Once again we see LISP's ability to not only compute, but elegantly
explain complex topics too, such as derivatives. LISP obviously knows
nothing about the meaning of the symbols you choose, such
as COS
or 0
even though it's able to
manipulate them for you at a high level. For that reason, some of the
expressions it outputs, e.g. (ADD 1 0)
can obviously be
simplified. LISP is good at doing that too. As a language, LISP is
popular for implementing computer algebra systems as well as compilers,
since simplified math is usually faster math. For example, GCC actually
uses a dialect of LISP internally for exactly this purpose. So even if
your experience has been focused on writing C and C++ in Vim, you may
already be a LISP user enjoying the benefits, from a certain point of
view.
The strangest thing that became apparent in our quest to demo a language
that's based on the lambda calculus is that the lambda keyword itself is
superfluous. That's not to imply LISP doesn't have lambdas, but rather
that it's such a powerful concept that it needn't be named.
Apply
is able to check for lambdas
using f<0
and then ignores Car(f)
. We could
have saved 2 bytes by dropping the LAMBDA
keyword
(e.g. (((X) X) (QUOTE ARG))
instead of ((LAMBDA (X)
X) (QUOTE ARG))
) but we left things alone, since it's an obvious
focal point for language revision schemes. It also makes SectorLISP
itself more flexible in terms of syntax. For example, you could
say ((FUNCTION (X) X) (QUOTE ARG))
or ((LOL (X) X)
(QUOTE ARG))
and it would mean the same thing. In the friendly
branch, you can also say (DEFINE IDENTITY AS (X) X)
rather
than (DEFINE IDENTITY . (LAMBDA (X) X))
.
The same concept also applies to COND
which can be removed
if we treat the LAMBDA
body as an
implicit COND
. But it'd only save us 11 bytes. That's too
big a tradeoff for the glory of 425 bytes, which can be viewed in the
reform
branch. Reforming the LISP language has never been our goal, because
it should be your goal. For example, here's how LISP could be defined
using only six builtins, which we'll call
LINK
, HEAD
, TAIL
,
EQ
, ATOM
, and Q
.
(EVAL (QUOTE (EVAL (Q (FF X)) (Q ((FF ((X) ((ATOM X) X) (FF (HEAD X)))) (X ((A) B C)))))) (QUOTE ((NULL ((X) (EQ X ()))) (LIST ((X Y) (LINK X (LINK Y ())))) (ASSOC ((X Y) ((NULL Y) (LIST (Q ?) X)) ((EQ X (HEAD (HEAD Y))) (HEAD (TAIL (HEAD Y)))) (ASSOC X (TAIL Y)))) (EVLAM ((C A) ((NULL (TAIL C)) (EVAL (HEAD C) A)) ((EVAL (HEAD (HEAD C)) A) (EVLAM (TAIL (HEAD C)) A)) (EVLAM (TAIL C) A))) (ZIP ((X Y A) ((NULL X) A) (LINK (LIST (HEAD X) (HEAD Y)) (ZIP (TAIL X) (TAIL Y) A)))) (EVLIS ((M A) ((NULL M) M) (LINK (EVAL (HEAD M) A) (EVLIS (TAIL M) A)))) (APPLY ((FN X A) ((ATOM FN) ((EQ FN (Q HEAD)) (HEAD (HEAD X))) ((EQ FN (Q TAIL)) (TAIL (HEAD X))) ((EQ FN (Q ATOM)) (ATOM (HEAD X))) ((EQ FN (Q LINK)) (LINK (HEAD X) (HEAD (TAIL X)))) ((EQ FN (Q EQ)) (EQ (HEAD X) (HEAD (TAIL X)))) (APPLY (EVAL FN A) X A)) (EVLAM (TAIL FN) (ZIP (HEAD FN) X A)))) (EVAL ((E A) ((NULL E) E) ((ATOM E) (ASSOC E A)) ((ATOM (HEAD E)) ((EQ (HEAD E) (Q Q)) (HEAD (TAIL E))) (APPLY (HEAD E) (EVLIS (TAIL E) A) A)) (APPLY (HEAD E) (EVLIS (TAIL E) A) A)))))) |
(DEFINE ASSOC . (LAMBDA (X Y) (COND ((EQ X (CAR (CAR Y))) (CAR (CDR (CAR Y)))) ((QUOTE T) (ASSOC X (CDR Y)))))) (DEFINE EVLIS . (LAMBDA (M A) (COND (M (CONS (EVAL (CAR M) A) (EVLIS (CDR M) A))) ((QUOTE T) ())))) (DEFINE EVLAM . (LAMBDA (C A) (COND ((EQ (CDR C) ()) (EVAL (CAR C) A)) ((EVAL (CAR (CAR C)) A) (EVLAM (CDR (CAR C)) A)) ((QUOTE T) (EVLAM (CDR C) A))))) (DEFINE ZIP . (LAMBDA (X Y A) (COND (X (CONS (CONS (CAR X) (CONS (CAR Y) ())) (ZIP (CDR X) (CDR Y) A))) ((QUOTE T) A)))) (DEFINE APPLY . (LAMBDA (FN X A) (COND ((ATOM FN) (COND ((EQ FN (QUOTE HEAD)) (COND ((CAR X) (CAR (CAR X))) (T ()))) ((EQ FN (QUOTE TAIL)) (COND ((CAR X) (CDR (CAR X))) (T ()))) ((EQ FN (QUOTE ATOM)) (ATOM (CAR X))) ((EQ FN (QUOTE LINK)) (CONS (CAR X) (CAR (CDR X)))) ((EQ FN (QUOTE EQ)) (EQ (CAR X) (CAR (CDR X)))) ((QUOTE T) (APPLY (EVAL FN A) X A)))) ((QUOTE T) (EVLAM (CDR FN) (ZIP (CAR FN) X A)))))) (DEFINE EVAL . (LAMBDA (E A) (COND ((EQ E ()) ()) ((ATOM E) (ASSOC E A)) ((ATOM (CAR E)) (COND ((EQ (CAR E) (QUOTE Q)) (CAR (CDR E))) ((QUOTE T) (APPLY (CAR E) (EVLIS (CDR E) A) A)))) ((QUOTE T) (APPLY (CAR E) (EVLIS (CDR E) A) A))))) |
The code above is a LISP within a LISP within a LISP: three levels. You
can use this technique to implement missing features like macros. It
also becomes clear that LISP is a king of kings among programming
languages, since if a language can succinctly implement itself, then it
can also let you easily build any other domain-specific language too (so
long as they also use parenthesis notation). One such DSL we could build
is for a Turing machine, which in this case has been configured to
compute popcount(r) % 2 == 1
.
(DEFINE SCANNED-SYMBOL . 0) (DEFINE LEFT-HAND-TAPE . (1 0 1 B B)) (DEFINE RIGHT-HAND-TAPE . (1 1 0 B)) ;; we're computing parity of 0b011 (TURING (QUOTE (0 B (0 0 B R 0) ;; if (state==0 && input==0) print(B), move(Right), state=0 (0 1 B R 1) ;; if (state==0 && input==1) print(B), move(Right), state=1 (0 B 0 R 2) ;; if (state==0 && input==B) print(0), move(Right), state=2 (1 0 B R 1) ;; if (state==1 && input==0) print(B), move(Right), state=1 (1 1 B R 0) ;; if (state==1 && input==1) print(B), move(Right), state=0 (1 B 1 R 2))) ;; if (state==1 && input==B) print(1), move(Right), state=2 (LIST-3 SCANNED-SYMBOL LEFT-HAND-TAPE RIGHT-HAND-TAPE))
You can view the full Turing machine example by clicking the simulator's Load button above. For further details on how the DSL is defined, please see AI Memo No. 8.
While knowledge of the technique existed, no one's published software before that polyglots C and JavaScript, so it's worth going into further detail on how that works. Language polyglots are similar in spirit to how a lady or gentleman might choose a communications style that focuses on a subset of rhetoric all groups feel comfortable hearing. The code examples in this blog post operate by that same principle, since they're designed to run in the most popular programming environments. It's the same technique that was used for Actually Portable Executable which lets us build binaries that run on seven operating systems.
There's nothing extraordinary about having a coding style that focuses on what different programmers share in common. It's the initial process of determining what that software consensus is that's difficult, since it requires one to not only understand of the requirements of modern systems, but also undergo an archaeological voyage reading tomes of legacy code like a librarian in order to understand their phylogenesis. The results end up being very easy:
// plain simple executable code that // helps c / js communities use lisp function Apply(f, x, a) { if (f < 0) return Eval(Car(Cdr(Cdr(f))), Pairlis(Car(Cdr(f)), x, a)); if (f == kEq) return Car(x) == Car(Cdr(x)); if (f == kCons) return Cons(Car(x), Car(Cdr(x))); if (f == kAtom) return Car(x) >= 0; if (f == kCar) return Car(Car(x)); if (f == kCdr) return Cdr(Car(x)); return Apply(Assoc(f, a), x, a); }
But not every line of code can be made a good citizen of multiple languages. We need workarounds for code that's platform specific. Ulysses (the person the LISP 1.5 source code quoted earlier) is most famous for his Trojan Horse trick that let Greece defeat Troy. His guile inspired another recent paper called Trojan Source. The trick SectorLISP uses which is similar in spirit (but not intent) is the PARAGRAPH SEPARATOR (u2029) which ECMA-262 2021 §12.3 defines as a line terminator, whereas ANSI C and nearly everything else does not.
javascript syntax highlighting//¶` ... C only code goes here ... //`
It works by sneaking multiline JavaScript strings into C comments, to
serve as a cross-language #ifdef
statement. That lets us
ask JavaScript to ignore small portions of code that are only meant for
C compilers. If we wish to ask C to ignore code that's intended only for
JavaScript, then the technique can be used as follows:
c syntax highlighting//¶` #if 0 //` ... JavaScript only code goes here ... //¶` #endif //`
It should be a requirement for C projects that call themselves portable.
Codebases like Zlib / InfoZIP use a notorious number
of #ifdef
statements to bring the benefits of support for
the DEFLATE algorithm to platforms like VAX, QDOS, Amiga, IBM
System/360, and possibly even those IBM Series/1 minicomputers. If we go
to great lengths using ifdefs to maintain rare platforms, then why
shouldn't we use polyglot ifdefs to support the most popular platform
too? There's wisdom in a LISP that's fits on floppy disks if we consider
that a U.S. nuclear weapons agency got hacked immediately after they
stopped using them6. They should
have considered LISP instead since it has all the qualities of the old
technologies while continuing to be ahead of its time, even after all
these years.
Please note the PARAGRAPH SEPARATOR (u2029) is normally invisible so it's been typeset above as PILCROW SIGN (u00b6). Since creative uses of technology have a tendency to reveal bugs, it should be noted that, thanks to the limitless configurability of LISP, you'll always have a safer space for programming since you can tune your Emacs editor as follows:
(or standard-display-table (setq standard-display-table (make-display-table))) (aset standard-display-table #x2028 [?↵]) ;; LINE SEPARATOR (aset standard-display-table #x2029 [?¶]) ;; PARAGRAPH SEPARATOR (aset standard-display-table #x202A [?⟫]) ;; LEFT-TO-RIGHT EMBEDDING (aset standard-display-table #x202B [?⟪]) ;; RIGHT-TO-LEFT EMBEDDING (aset standard-display-table #x202D [?❯]) ;; LEFT-TO-RIGHT OVERRIDE (aset standard-display-table #x202E [?❮]) ;; RIGHT-TO-LEFT OVERRIDE (aset standard-display-table #x2066 [?⟩]) ;; LEFT-TO-RIGHT ISOLATE (aset standard-display-table #x2067 [?⟨]) ;; RIGHT-TO-LEFT ISOLATE (aset standard-display-table #x2068 [?⧽]) ;; FIRST STRONG ISOLATE (aset standard-display-table #x202C [?⇮]) ;; POP DIRECTIONAL FORMATTING (aset standard-display-table #x2069 [?⇯]) ;; POP DIRECTIONAL ISOLATE
SectorLISP is non-revisionist because if we'd gone down the path of changing the definition of LISP, then it'd've taken us to a place where lists become tape and then you've got an ASCII Turing machine like Brainfuck, which can be implemented with only 99 bytes. It may be capable of computing everything that's computable, but can we really call gibberish a programming language?
Hello World in Brainf*#k
++++++++[>++++
[>++>+++>+++>+<<<<-]
>+>+>->>+[<]<-]
>>. >---. +++++++..
+++. >>. <-.
<. +++. ------.
--------. >>+. >++.
If that were the goal, we'd be better served by simply not having an abstraction layer at all. For example, it's possible in 23 bytes to have a universal language on x86 simply by exposing the x86 machine language. If you compile the program below and load it into Blinkenlights then you can copy and paste the gibberish string for Hello World and it'll print "hello world". If you can chord then you can type the binary into your IBM PC with the Model F keyboard too.
/ twenty three byte loader _start: ljmp $0x600>>4,$_begin _begin: push %cs pop %ds push %cs pop %es 0: call 1f 1: push %di 2: xor %ax,%ax int $0x16 stosb cmp $206,%al jne 2b ret |
/ example program with loader / put string in blinkenlights / j♪^┤♫¼═►<◙u∙├hello world♪◙╬ push $13 pop %si mov $0x0e,%ah 0: lodsb int $0x10 cmp $10,%al jnz 0b ret .ascii "hello world\r\n" into |
Languages like Brainfuck were known in JMC's time, but it was the human quality of his research that made LISP so attractive. Universal languages are common enough that one of the biggest challenges in securing protocols and file formats is demonstrating that they're not accidentally a Turing machine. For example, the UNICODE stack machine characters above flirt dangerously close with Turing's thesis. Even Conway's Game of Life was proven to be Turing complete. On the other hand, LISP is a tool that people want to use. It's always commanded respect and some companies have built veritable empires using it. That's why it's such a great find that, thanks to the simplicity and generality of its original formulas, we were able to implement LISP in nearly the same number of bytes as Brainfuck.
It might be possible to create something even smaller than Brainfuck.
Computer science provides some clues as to what the tiniest esoteric
boot sector language might be. For example, Marvin Minsky discovered a
7-state 4-symbol universal Turing machine in 1962. The chart above is
from Turlough Neary's PhD thesis which shows a tradeoff between
minimizing symbols and states. If we could find some way to get the
rules for one of these tiny Turing machines to neatly map onto a bitwise
arithmetic hack, similar to the (B11 | r0) & r1 &
~r2
trick that Browne and
Tavener discovered for Conway's Game of Life, like a NAND gate, then we
might create an esoteric boot sector language so tiny in scale and
austere in its definition that its parsimoniousness would never be
challenged.
|
|
Tiny code contests have historically focused on the source code. What makes that problematic is how it incentivizes obfuscation. On the other hand, optimizing for binary size usually results in the source code becoming more elegant. In order to make program files tinier, one must choose data structures and design patterns that are harmonious with the way computers work. Doing that requires us to better understand the nature of the problem.
Is it better to help maintain the largest piece of software in the world when you can create the tiniest one instead? There's more combinations in a 512-byte boot sector than there are positions on a Go board. The fact that people are still publicly developing new boot sector programs for a forty year old computer should speak for itself. Demoscene is a relative newcomer compared to Go and Chess, but it could potentially be a more enjoyable competitive mind sport since Kolmogorov complexity is AI-hard.
While code golfing is fun, the goal of any technology is to help us better understand nature and to improve upon the human condition. The way tinier software conventionally does this is by letting us have more instances of it at scale. The same is true of nature where the tiniest creatures are the most predominant. They're called phages and there's 10,000,000,000,000,000,000,000,000,000,000 of these guys on earth. The tiniest phage in the world that's listed in the European Nucleotide Archive goes by the name Leuconostoc Phage L5 and it has a complete genome size of 631 bytes. So by the standards of our game, we could say that LISP has now outdistanced nature too. This of course isn't meant to imply that smaller is necessarily better. For example, ring tailed lemurs have a binary footprint of 537 megs12 similar to Racket's 500mb install size. They're both quite adorable, but certainly not simple.
> L06183: Leuconostoc Phage L5 complete genome cp437 le-bin acgt Z6♫τ♫▼≡2É<ÅΣ< 2âÖ⌂δ╧δ♦♫ü◘&☼G~C ≥♥Fî` ÉO²æ⌡f1▬←°ⁿ<7¿√•▼╠♫&μ»↕_░»≡ ä ╗≤↔?☼@v¿ ○☻ r┤ⁿ░≥▼.λ┐¢ⁿ∩&╗↓≡╗╔λo■ⁿ÷∩o╛■Bj♂X»ô┤∙â☺☺`7s╨ƒ•3b♪l§► →ü♫╓╠‼μ▀X√╛←╝∩♠<┐∩⌂≡≈λc±∩≡£ ‼²∙²•o♥|╧φ/C▀╟♪;╛♥⌂λ╟♀♀═tλî;/┴D4α╬╝3 ⌂╚$á┬╠á▒☼ °└,╚å‼A λ├≥τ` ╩╕û├↨F♦○B$34l?└┘‼LÇy♣(Mⁿây┐╧♂↕╝C2NKÇαá╞⌂ ╦┴♥?│├Çä☼├Ç£ ån<0⌡╧∙╦■║♀╙Ç|<é╜3☼Ü>♀pé²76+┘▓╧≈▬“∩∩A│3 6 NO║oƒ<n@~ ╕┐M)@> 8e☼∙ò£ê≤├♥▒♫╠◙l⌂F╛≥┤λ♦░⌂λ▬╛▲☼↑ ↨♀═‘/≡♠ñ•¶╒λ≈↨╪/Ç√ô♀├ ♂≤∩∟ 0N╚ñ ♂αCé╘#┘ú»⌠♫0•♀[9=¢2╬☻♦ï▼δ£ä6 ¼±¡÷n↑♦ìp(ⁿMX≤æ°∩¬á┴╧<+☼↓<ç÷O╦ {ŃÄ=•â┬∟¼╨╠╠▀ôⁿ≤╖┴=≡∩▼l♥►5♂t%/█∩o╝4┼↨ç≡•ε·H<∩╦♀ε«╣╠■ëO♥ļ¡═•│m♂ ÉN< G≡23â≈⌠├°6∩²┌?│α·α>┘°▲L±@=7░íⁿ0▀╨≤C2>√?▼█≥┴♂♥?ó├?`♀ êyC┴â▬Æ ²├╧╧╧╧ƒ¢║a╦♂a┌@c∩÷A‼╓┐╪╢L←ç╝╪┐π¢▬
John McCarthy may have lost out on the opportunity to be the father of the personal computing software revolution, but there's still a chance for his ideas to lead the second. The tools for natural engineering became cheap and open source in recent years. Many people are scrambling to be the next Monsanto, but what we actually want is the next Linux, Apple, and Microsoft. We need a tool that can abstract genetics and proteins in the smallest way possible, similar to how SectorLISP currently offers the service of abstracting gritty assembly in the fewest commands. For example one might edit a tiny symbiotic bacteria like Candidatus Carsonella Ruddii (40,000 bytes) to be a platform for running LISP programs that would be delivered by printed phages. In fact, someone might have even figured it out already. It's been public knowledge for fifty years people can build recombinant organisms, like the gap between iPhone and SIGSALY. Crispr can be programmed in XML. If it can be done with LISP instead then we must use it.
Here's the full i8086 assembly listing for SectorLISP v2.
It's 8 pages and available in
TXT,
HTML,
BIN, or
ELF.
You may also view the sources on GitHub as either
ASM or
C.
GAS LISTING sectorlisp.S page 1 GNU assembler version 2.34 (x86_64-alpine-linux-musl) using BFD version (GNU Binutils) 2.34. options passed : -aghlms=sectorlisp.lst -g input file : sectorlisp.S output file : sectorlisp.o target : x86_64-alpine-linux-musl time stamp : 2021-12-11T00:49:38.000-0800GAS LISTING sectorlisp.S page 21 /*-*- mode:unix-assembly; indent-tabs-mode:t; tab-width:8; coding:utf-8 -*-│ 2 │vi: set et ft=asm ts=8 tw=8 fenc=utf-8 :vi│ 3 ╞══════════════════════════════════════════════════════════════════════════════╡ 4 │ Copyright 2020 Justine Alexandra Roberts Tunney │ 5 │ Copyright 2021 Alain Greppin │ 6 │ Some size optimisations by Peter Ferrie │ 6 │ │ 7 │ Permission to use, copy, modify, and/or distribute this software for │ 8 │ any purpose with or without fee is hereby granted, provided that the │ 9 │ above copyright notice and this permission notice appear in all copies. │ 10 │ │ 11 │ THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL │ 12 │ WARRANTIES WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED │ 13 │ WARRANTIES OF MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE │ 14 │ AUTHOR BE LIABLE FOR ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL │ 15 │ DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR │ 16 │ PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER │ 17 │ TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR │ 18 │ PERFORMANCE OF THIS SOFTWARE. │ 19 ╚─────────────────────────────────────────────────────────────────────────────*/ 21 22 // LISP meta-circular evaluator in a MBR 23 // Compatible with the original hardware 24 25 .code16 26 .globl _start 27 0000 4E494C00 _start: .asciz "NIL" # dec %si ; dec %cx ; dec %sp 28 0004 5400 kT: .asciz "T" # add %dl,(%si) boot A:\ DL=0 29 0006 EA0000C0 start: ljmp $0x7c00>>4,$begin # cs = 0x7c00 is boot address 29 07 30 000b 00 .asciz "" 31 000c 51554F54 kQuote: .asciz "QUOTE" 31 4500 32 0012 434F4E44 kCond: .asciz "COND" 32 00 33 0017 41544F4D kAtom: .asciz "ATOM" # ordering matters 33 00 34 001c 43415200 kCar: .asciz "CAR" # ordering matters 35 0020 43445200 kCdr: .asciz "CDR" # ordering matters 36 0024 434F4E53 kCons: .asciz "CONS" # ordering matters 36 00 37 0029 455100 kEq: .asciz "EQ" # needs to be last 38 39 002c BC0080 begin: mov $0x8000,%sp # uses higher address as stack 40 # and set independently of SS! 41 # 8088 doesn't stop interrupts 42 # after SS is set, and PC BIOS 43 # sets SP to a value that will 44 # damage our code if int fires 45 # between it setting SS and SP 46 002f 0E push %cs # that means ss = ds = es = cs 47 0030 1F pop %ds # noting ljmp set cs to 0x7c00 48 0031 0E push %cs # that's the bios load address 49 0032 07 pop %es # therefore NULL points to NUL 50 0033 0E push %cs # terminated NIL string above! 51 0034 17 pop %ss # errata exists but don't care 52 0035 BB0200 mov $2,%bxGAS LISTING sectorlisp.S page 353 0038 89E1 main: mov %sp,%cx 54 003a E81100 call GetToken 55 003d E85400 call GetObject 56 0040 E84101 call Eval 57 0043 96 xchg %ax,%si 58 0044 E84300 call PrintObject 59 0047 B00D mov $'\r',%al 60 0049 E87200 call PutChar 61 004c EBEA jmp main 62 63 GetToken: # GetToken():al, dl is g_look 64 004e 89CF mov %cx,%di 65 0050 88D0 1: mov %dl,%al 66 0052 3C20 cmp $' ',%al 67 0054 7602 jbe 2f 68 0056 AA stosb 69 0057 96 xchg %ax,%si 70 0058 E85F00 2: call GetChar # exchanges dx and ax 71 005b 3C20 cmp $' ',%al 72 005d 76F1 jbe 1b 73 005f 3C29 cmp $')',%al 74 0061 7605 jbe 3f 75 0063 80FA29 cmp $')',%dl # dl = g_look 76 0066 77E8 ja 1b 77 0068 883D 3: mov %bh,(%di) # bh is zero 78 006a 96 xchg %si,%ax 79 006b C3 ret 80 81 .PrintList: 82 006c B028 mov $'(',%al 83 006e FF30 2: push (%bx,%si) 84 0070 8B34 mov (%si),%si 85 0072 E81200 call .PutObject 86 0075 B020 mov $' ',%al 87 0077 5E pop %si # restore 1 88 0078 85F6 test %si,%si 89 007a 78F2 js 2b # jump if cons 90 007c 7405 jz 4f # jump if nil 91 007e B0F9 mov $249,%al # bullet (A∙B) 92 0080 E80400 call .PutObject 93 0083 B029 4: mov $')',%al 94 0085 EB37 jmp PutChar 95 96 .PutObject: # .PutObject(c:al,x:si) 97 .PrintString: # nul-terminated in si 98 0087 E83400 call PutChar # preserves si 99 PrintObject: # PrintObject(x:si) 100 008a 85F6 test %si,%si # set sf=1 if cons 101 008c 78DE js .PrintList # jump if not cons 102 .PrintAtom: 103 008e AC lodsb 104 008f 84C0 test %al,%al # test for nul terminator 105 0091 75F4 jnz .PrintString # -> ret 106 0093 C3 ret 107 108 GetObject: # called just after GetToken 109 0094 3C28 cmp $'(',%alGAS LISTING sectorlisp.S page 4110 0096 7450 je GetList 111 # jmp Intern 112 113 0098 51 Intern: push %cx # Intern(cx,di): ax 114 0099 89FD mov %di,%bp 115 009b 29CD sub %cx,%bp 116 009d 45 inc %bp 117 009e 31FF xor %di,%di 118 00a0 5E 1: pop %si 119 00a1 56 push %si 120 00a2 89E9 mov %bp,%cx 121 00a4 89F8 mov %di,%ax 122 00a6 383D cmp %bh,(%di) 123 00a8 740C je 8f 124 00aa F3A6 rep cmpsb # memcmp(di,si,cx) 125 00ac 740A je 9f 126 00ae 4F dec %di 127 00af 31C0 xor %ax,%ax 128 00b1 AE 2: scasb # memchr(di,al,cx) 129 00b2 75FD jne 2b 130 00b4 EBEA jmp 1b 131 00b6 F3A4 8: rep movsb # memcpy(di,si,cx) 132 00b8 59 9: pop %cx 133 00b9 C3 ret 134 135 00ba 31C0 GetChar:xor %ax,%ax # GetChar→al:dl 136 00bc CD16 int $0x16 # get keystroke 137 00be B40E PutChar:mov $0x0e,%ah # prints CP-437 138 00c0 CD10 int $0x10 # vidya service 139 00c2 3C0D cmp $'\r',%al # don't clobber 140 00c4 7504 jne 1f # look xchg ret 141 00c6 B00A mov $'\n',%al 142 00c8 EBF4 jmp PutChar 143 00ca 92 1: xchg %dx,%ax 144 00cb C3 ret 145 146 //////////////////////////////////////////////////////////////////////////////// 147 148 00cc 85FF Evlis: test %di,%di # Evlis(m:di,a:dx):ax 149 00ce 7416 jz 1f # jump if nil 150 00d0 FF31 push (%bx,%di) # save 1 Cdr(m) 151 00d2 8B05 mov (%di),%ax 152 00d4 E8AD00 call Eval 153 00d7 5F pop %di # restore 1 154 00d8 50 push %ax # save 2 155 00d9 E8F0FF call Evlis 156 # jmp xCons 157 158 00dc 5F xCons: pop %di # restore 2 159 00dd 87F9 Cons: xchg %di,%cx # Cons(m:di,a:ax):ax 160 00df 890D mov %cx,(%di) # must preserve si 161 00e1 8901 mov %ax,(%bx,%di) 162 00e3 8D4D04 lea 4(%di),%cx 163 00e6 97 1: xchg %di,%ax 164 00e7 C3 ret 165 166 00e8 E863FF GetList:call GetTokenGAS LISTING sectorlisp.S page 5167 00eb 3C29 cmp $')',%al 168 00ed 745F je .retF 169 00ef E8A2FF call GetObject 170 00f2 50 push %ax # popped by xCons 171 00f3 E8F2FF call GetList 172 00f6 EBE4 jmp xCons 173 174 00f8 39D7 Gc: cmp %dx,%di # Gc(x:di,A:dx,B:si):ax 175 00fa 72EA jb 1b # we assume immutable cells 176 00fc FF31 push (%bx,%di) # mark prevents negative gc 177 00fe 8B3D mov (%di),%di 178 0100 E8F5FF call Gc 179 0103 5F pop %di 180 0104 50 push %ax 181 0105 E8F0FF call Gc 182 0108 5F pop %di 183 0109 E8D1FF call Cons 184 010c 29F0 sub %si,%ax # ax -= C - B 185 010e 01D0 add %dx,%ax 186 0110 C3 ret 187 188 0111 56 .dflt1: push %si # save x 189 0112 E86F00 call Eval 190 0115 5E pop %si # restore x 191 # jmp Apply 192 193 0116 85C0 Apply: test %ax,%ax # Apply(fn:ax,x:si:a:dx):ax 194 0118 791D jns .switch # jump if atom 195 011a 97 xchg %ax,%di # di = fn 196 011b 8B39 .lambda:mov (%bx,%di),%di # di = Cdr(fn) 197 011d 57 push %di # for .EvCadr 198 011e 8B3D mov (%di),%di # di = Cadr(fn) 199 0120 85FF Pairlis:test %di,%di # Pairlis(x:di,y:si,a:dx):dx 200 0122 745C jz .EvCadr # return if x is nil 201 0124 AD lodsw # ax = Car(y) 202 0125 FF31 push (%bx,%di) # push Cdr(x) 203 0127 8B3D mov (%di),%di # di = Car(x) 204 0129 8B34 mov (%si),%si # si = Cdr(y) 205 012b E8AFFF call Cons # Cons(Car(x),Car(y)) 206 012e 97 xchg %ax,%di 207 012f 92 xchg %dx,%ax 208 0130 E8AAFF call Cons # Cons(Cons(Car(x),Car(y)),a) 209 0133 92 xchg %ax,%dx # a = new list 210 0134 5F pop %di # grab Cdr(x) 211 0135 EBE9 jmp Pairlis 212 0137 3D0000 .switch:cmp $kEq,%ax # eq is last builtin atom 213 013a 77D5 ja .dflt1 # ah is zero if not above 214 013c 8B3C mov (%si),%di # di = Car(x) 215 013e 3C00 .ifCar: cmp $kCar,%al 216 0140 742B je Car 217 0142 3C00 .ifCdr: cmp $kCdr,%al 218 0144 7426 je Cdr 219 0146 3C00 .ifAtom:cmp $kAtom,%al 220 0148 7507 jne .ifCons 221 014a 85FF test %di,%di # test if atom 222 014c 790E jns .retT 223 014e 31C0 .retF: xor %ax,%ax # ax = nilGAS LISTING sectorlisp.S page 6224 0150 C3 ret 225 0151 3C00 .ifCons:cmp $kCons,%al 226 0153 8B30 mov (%bx,%si),%si # si = Cdr(x) 227 0155 AD lodsw # si = Cadr(x) 228 0156 7485 je Cons 229 0158 31F8 .isEq: xor %di,%ax # we know for certain it's eq 230 015a 75F2 jne .retF 231 015c B000 .retT: mov $kT,%al 232 015e C3 ret 233 234 015f 89D6 Assoc: mov %dx,%si # Assoc(x:ax,y:dx):ax 235 0161 8B3C 1: mov (%si),%di 236 0163 8B30 mov (%bx,%si),%si 237 0165 AF scasw 238 0166 75F9 jne 1b 239 0168 F6 .byte 0xF6 # testb §i8,i16(%bp,%di) jmp Car 240 0169 8B39 Cadr: mov (%bx,%di),%di # contents of decrement register 241 016b 3C .byte 0x3C # cmp §scasw,%al (nop next byte) 242 016c AF Cdr: scasw # increments our data index by 2 243 016d 8B05 Car: mov (%di),%ax # contents of address register!! 244 016f C3 2: ret 245 246 0170 8B39 1: mov (%bx,%di),%di # di = Cdr(c) 247 0172 57 Evcon: push %di # save c 248 0173 8B35 mov (%di),%si # di = Car(c) 249 0175 AD lodsw # ax = Caar(c) 250 0176 E80B00 call Eval 251 0179 5F pop %di # restore c 252 017a 85C0 test %ax,%ax # nil test 253 017c 74F2 jz 1b 254 017e FF35 push (%di) # push Car(c) 255 0180 5F .EvCadr:pop %di 256 0181 E8E5FF call Cadr # ax = Cadar(c) 257 # jmp Eval 258 259 0184 85C0 Eval: test %ax,%ax # Eval(e:ax,a:dx):ax 260 0186 742B jz 1f 261 0188 79D5 jns Assoc # lookup val if atom 262 018a 96 xchg %ax,%si # di = e 263 018b AD lodsw # ax = Car(e) 264 018c 3D0000 cmp $kQuote,%ax # maybe CONS 265 018f 8B3C mov (%si),%di # di = Cdr(e) 266 0191 74DA je Car 267 0193 3D0000 cmp $kCond,%ax 268 0196 74DA je Evcon # ABC Garbage Collector 269 0198 52 push %dx # save a 270 0199 51 push %cx # save A 271 019a 50 push %ax 272 019b E82EFF call Evlis 273 019e 96 xchg %ax,%si 274 019f 58 pop %ax 275 01a0 E873FF call Apply 276 01a3 5A pop %dx # restore A 277 01a4 89CE mov %cx,%si # si = B 278 01a6 97 xchg %ax,%di 279 01a7 E84EFF call Gc 280 01aa 89D7 mov %dx,%di # di = AGAS LISTING sectorlisp.S page 7281 01ac 29F1 sub %si,%cx # cx = C - B 282 01ae F3A4 rep movsb 283 01b0 89F9 mov %di,%cx # cx = A + (C - B) 284 01b2 5A pop %dx # restore a 285 01b3 C3 1: ret 286 287 01b4 CECECECE .sig: .fill 512 - (2f - 1f) - (. - _start), 1, 0xce 287 CECECECE 287 CECECECE 287 CECECECE 287 CECECECE 288 01ef 20534543 1: .ascii " SECTORLISP v2 " 288 544F524C 288 49535020 288 763220 289 01fe 55AA .word 0xAA55 290 2: .type .sig,@object 291 .type kQuote,@object 292 .type kCond,@object 293 .type kAtom,@object 294 .type kCar,@object 295 .type kCdr,@object 296 .type kCons,@object 297 .type kEq,@object
SectorLISP started as an experiment in the Cosmopolitan repo where Justine Tunney used sed to get code built by a Linux x86_64 compiler to run in 16-bit mode. Its original size was 948 bytes, which was gradually reduced with help from Ilya Kurdyukov at BaseALT and Scott Wolchok at Facebook. Alain Greppin did a rewrite after having the brilliant insight of using Steve Russel's coding techniques from the LISP 1.5 manual which finally let us fit SectorLISP in one sector. Peter Ferrie at Amazon and Justine reduced SectorLISP further, by another 110 bytes. The ABC garbage collector was designed and implemented by Justine. The 99 byte Brainfuck implementation was written by Peter and Justine. The musical track is an instrumental recording of Das Model by Kraftwerk from 1978. Credit for the UNICODE paragraph separator trick for C / JS polyglots goes to a code golfer from Estonia named randomdude999. You're invited to come hang out with the SectorLISP team. Use the following link to join our Discord chatroom: https://discord.gg/FwAVVu7eJ4
Marc Feeley and Samuel Yvon recently wrote a paper on a Small Scheme VM, Compiler, and REPL which cites SectorLISP. Rather than targeting the legacy sector size of 512 bytes, they targeted the modern 4096-byte page size. That enabled them to offer a great deal more value in a comparatively tiny size, such as a bytecode compiler, which helps illuminates Gosling's original intentions for Java.
Funding for this blog post was crowdsourced from Justine Tunney's GitHub sponsors and Patreon subscribers. Your support is what makes projects like SectorLISP possible. Thank you.