Aug 7th, 2022 @ justine's web page

Using Landlock to Sandbox GNU Make

I've modified GNU Make to support strict dependency checking. This is all thanks to the Landlock LSM system calls which were introduced in Linux Kernel 5.13 twelve months ago. What it means is that Make can now solve the cache invalidation problem similar to Bazel except with 5x better performance.

Background

I blogged last month about our work porting OpenBSD pledge() and unveil() to Linux as part of the Cosmopolitan Libc project. The thought occurred to me that sandboxes aren't just good for security: they have applications in build systems too. So I used unveil() to patch GNU Make so it can function like a zero-configuration sandbox, and I'm making this work available to the community using the Actually Portable Executable format.

How It Works

The basic idea is that when Make runs a command, that command should only have access to a limited set of files:

  1. The resolved command executable (read/execute permission)
  2. The "prerequisite" or dependent files (read-only permission)
  3. The "target" output file (read/write/create permission)

That way, if some rogue unit test accidentally tries to rm -rf /, the kernel will simply reject it using an EACCES error, because your root directory wasn't declared as a dependency in your Makefile config.
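
To make that concrete, here's a minimal sketch of how a single rule maps onto those three categories. The file names and the build/bootstrap/cc tool name are hypothetical, picked only for illustration.

```make
# Hypothetical rule; file and tool names are illustrative only.
#
#   build/bootstrap/cc   command executable   -> read/execute
#   hello.c hello.h      prerequisites        -> read-only
#   o/hello.o            target               -> read/write/create
#
# Anything that isn't listed here (or implicitly unveiled) is off
# limits to the command, so forgetting to declare hello.h should
# surface as an error rather than as a stale build later on.
o/hello.o: hello.c hello.h
	build/bootstrap/cc -c -o $@ $<
```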

For convenience, I've also chosen to implicitly whitelist a few other hard-coded paths. The following files are always unveiled by Make:

Configuration

Landlock Make is configured simply by writing a normal Makefile. For example, you can read the landlock-make/Makefile template to get the basic idea. However, there are sometimes cases where you want to do something special. Special variables have been introduced for this purpose, which can be specified on a per-target basis:

.UNVEIL works basically the same way as the .EXTRA_PREREQS variable that was added in GNU Make 4.3. You can specify as many paths as you want. The permission defaults to read-only, but you can override that by putting the appropriate letters and a colon in front of the file path. The permissions take effect recursively too.

It's important to use the private keyword, because GNU Make variable inheritance makes it far too easy to accidentally remove safety from everything. For example, if you define a variable on a rule that generates an executable without using private, then the variable definition will also apply to all the object files going into that executable.
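
As a rough sketch of a per-target override, assume a hypothetical link step that needs scratch space under o/tmp plus read access to a linker script. The target names and paths are made up. The letters-then-colon form follows the description above.

```make
# Hypothetical targets and paths, for illustration only.
#   rwc:o/tmp       -> read/write/create beneath o/tmp
#   build/app.lds   -> no prefix, so it defaults to read-only
#
# `private` keeps this setting from being inherited by the
# prerequisites (main.o and util.o) when Make builds them.
o/app.com: private .UNVEIL = rwc:o/tmp build/app.lds
o/app.com: o/main.o o/util.o
	build/bootstrap/ld -o $@ $^
```

Dropping private here would hand the same rwc:o/tmp access to the rules that build main.o and util.o, which is the accidental loosening described above.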

Another configuration option is .STRICT mode (update: now mandatory as of landlockmake v1.5), which turns off all the implicitly unveiled paths, including $PATH resolution. That means you can explicitly define the perfect hermetic environment. Here's how Cosmopolitan uses it alongside a global .UNVEIL variable:

.STRICT = 1
.UNVEIL =			\
	rwcx:o/tmp		\
	libc/integral		\
	libc/disclaimer.inc	\
	rx:build/bootstrap	\
	rx:o/third_party/gcc	\
	/proc/self/status	\
	rw:/dev/null		\
	w:o/stack.log		\
	/etc/hosts		\
	~/.runit.psk

Performance

Landlock Make can build code five times faster than Bazel, while offering the same advantages in terms of safety. In other words, you get all the benefits of a big corporation build system, in a tiny lightweight binary that any indie developer can love.

To demonstrate this, I've configured this repository to compile 448 .c files which are linked into 40 executables. Building 448 files in 448 different sandboxes takes:

Landlock Make is the winner here and Bazel is wrekt. The benchmark was performed on a 2-core Ubuntu 22.04 VM with 4 GB of RAM running Linux 5.15. Landlock requires Linux 5.13+. If you don't have Landlock in your kernel, then GNU Make will silently continue along without sandboxing.
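
Since that fallback is silent, you might want a guard in your own Makefile. Here's a minimal Linux-only sketch, which isn't part of Landlock Make itself. It assumes securityfs is mounted, that the kernel lists its active LSMs in /sys/kernel/security/lsm, and that $(shell) is able to read that file under your configuration.

```make
# Linux-only sketch. Assumption: active LSMs are listed in this file.
# If the file is missing or unreadable, the variable ends up empty and
# the check below treats Landlock as absent.
ACTIVE_LSMS := $(shell cat /sys/kernel/security/lsm 2>/dev/null)

ifeq ($(findstring landlock,$(ACTIVE_LSMS)),)
$(warning landlock not found in /sys/kernel/security/lsm; rules may run unsandboxed)
endif
```

This only warns at parse time. It doesn't stop the build, so swap $(warning) for $(error) if you'd rather refuse to build without a sandbox.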

Try It Out   [Linux] [OpenBSD] [MacOS] [FreeBSD] [NetBSD] [Windows]

Here's a patched prebuilt fat binary of Landlock Make for x86-64 and Arm64 operating systems. Sandboxing is only supported on Linux and OpenBSD. For the other platforms, you've just got a nice portable drop-in replacement for the GNU make command.

Here's a template project for getting started.

https://github.com/jart/landlock-make

The example repository explains how to write a best practices Makefile configuration that utilizes Landlock Make features. It also contains a Bazel configuration so you can reproduce our benchmarks.

git clone https://github.com/jart/landlock-make
cd landlock-make
build/bootstrap/make.com

You can build Landlock Make from source here:

git clone https://github.com/jart/cosmopolitan
cd cosmopolitan
make -j8 o//third_party/make/make.com

Source Code

Why It Matters

GNU Make already has a file dependency graph. It's a rich data structure you define when you write your Makefile. It's a no-brainer to leverage that data to implement a zero-configuration sandbox. That's the only way to automatically prove a build configuration is correct. This technique is commonly known as strict dependency checking. What it means is that each target must declare all its dependencies. This must happen, since otherwise GNU Make can't solve the second hardest problem in computer science, which is cache invalidation.

Without strict dependency checking, your Makefile is going to behave in strange and mysterious ways. You'll be constantly frustrated and running make clean whenever something goes wrong, which slows things down by forcing everything to start over. In the traditional world of Make, even if you take great care in writing your makefile, there's simply no way to prove it's correct without sandboxing. It's the missing link we've been wanting for decades. It's a surprise no one's done it sooner.

Google came to a similar conclusion back in the 2000s. They solved this by ditching GNU Make and inventing a new build system called Blaze. A blog post was published back in 2011 announcing their work. Google said strict dependency checking was the key motivator for reinventing things. Blaze was then later open sourced to the public as Bazel in 2015, but it wasn't until 2021 that it was able to do strict dependency checking.

Because Bazel was written a long time ago, it implements sandboxing in a clumsy way. Bazel creates a giant hierarchy of symbolic links. Then it mounts and unmounts a ton of folders to create a fake filesystem, which is how it limits access. It's all written in Java, which isn't very popular in the open source community. Bazel does however deserve credit for all the work they put into making Java as tiny as possible. Bazel is shipped as a 40 MB single-file binary that extracts itself on the fly. That's pretty impressive by Java standards, but it's still a monster compared to my slim and sexy 519 KB make.com binary which runs on six operating systems and doesn't require extraction. It's only got a few microseconds of startup latency too.

Mega-corporations love Bazel because its safety benefits enable them to scale their eng efforts into monolithic repositories with petabytes of code. So naturally they don't care that much if Bazel is fifty megs. I however refuse to believe that safety and professionalism go hand in hand with bloat. Not at any scale. I believe we can have our cake and eat it too. That's why I view Landlock as being such a game changer. It lets us have 85% of the benefits of Blaze, in a tiny lightweight package. Because all the complexity of sandboxing is now abstracted by the Linux Kernel, all I needed to do was add about 200 lines of code to the GNU Make codebase. No root, no mounts, no chroot, no cgroups, and especially no Docker required! All you have to do is issue a system call that tells the kernel which paths should be accessible.

Troubleshooting

Here are some basic troubleshooting commands you can try, should you encounter any problems:

./make.com --strace   # system call logging
./make.com -pn        # dump build graph
./make.com --ftrace   # very verbose!

Caveats

Landlock Make offers the strongest sandboxing when you:

  1. Use static executables
  2. Vendor all tools and dependencies

If your build rule launches a dynamic or interpreted executable that relies on distro-installed files which are outside your project folder (e.g. /usr/bin/cc) then Make will react by unveiling a very broad list of paths:

So basically, depending on any system-provided functionality will schlep in nearly all system-provided functionality. This isn't a great situation to be in, since at that point, you're a hair's width away from needing Docker. If you're not sure if you're being impacted, then you can use make.com --strace to see what it does. The landlock-make GitHub template repository takes a more conservative approach, of vendoring a custom-built musl-cross-make gcc toolchain. It only relies on the system for very trivial commands, e.g. mkdir.

Yes, the Makefile config in the landlock-make GitHub template repo is very verbose. Cosmopolitan Libc has tools for solving that. The mkdeps program is able to crawl 1.5 million lines of code in 100ms on my PC to generate a 175,712 line o/depend file. It's so much faster than using gcc -M and it totally automates the arduous task of explicitly declaring header file dependencies. Give it a try. The download link is above.

The mkdeps.com program is usually invoked as follows:

./mkdeps.com -o o//depend -r o// @o//srcs.txt @o//hdrs.txt @o//incs.txt

The @ symbol lets you pass arguments in a file instead, which is useful when you have so many source files that they'd otherwise exceed ARG_MAX. Modern Make is really good at quickly generating argument files. For example, you might configure mkdeps in your Makefile as follows:

uniq = $(if $1,$(firstword $1) $(call uniq,$(filter-out $(firstword $1),$1)))
o//srcs.txt: $(call uniq,$(foreach x,$(SRCS),$(dir $(x))))
	$(file >$@,$(SRCS))
o//hdrs.txt: $(call uniq,$(foreach x,$(HDRS) $(INCS),$(dir $(x))))
	$(file >$@,$(HDRS) $(INCS))
o//incs.txt: $(call uniq,$(foreach x,$(INCS),$(dir $(x))))
	$(file >$@,$(INCS))
o//depend: o//srcs.txt o//hdrs.txt o//incs.txt $(SRCS) $(HDRS) $(INCS)
	./mkdeps.com -o $@ -r o// @o//srcs.txt @o//hdrs.txt @o//incs.txt

Another consideration is that it's best to refrain from using shell script syntax in your build commands. If you don't use any special characters, then GNU Make has an optimization where it'll pass your command and arguments directly to execve(). That way Landlock will know exactly which executable should be whitelisted. If you use special shell syntax, then the files in your shell script might not be whitelisted automatically, since we currently aren't parsing that.
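
Here's a small sketch of the difference, using a hypothetical generator tool that accepts an -o flag. Both rules produce the same kind of output, but only the first one avoids shell syntax.

```make
# Hypothetical generator tool; names are illustrative only.

# No shell syntax once $@ is expanded, so Make can hand the command
# and its arguments directly to execve().
o/gen/table.c: tool/gen.com data/table.txt
	tool/gen.com -o $@ data/table.txt

# The > redirection is shell syntax, so this recipe goes through the
# shell instead, and the files it touches might not be whitelisted
# automatically.
o/gen/table2.c: tool/gen.com data/table.txt
	tool/gen.com data/table.txt >$@
```

If a tool can only write to stdout, one option is to wrap it in a small vendored program that takes an output path, so the recipe stays free of redirections.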

Since Landlock is still very new, there are a few peculiar kinks about it right now that some folks might find surprising. While we've generally been able to make it consistent on Linux with the OpenBSD behaviors, there are still a few places where it differs slightly.

For example, unlike OpenBSD, Linux does nothing to conceal the existence of paths. Even with an unveil() policy in place, it's still possible to access the metadata of all files using functions like stat() and open(O_PATH), provided you know the full path ahead of time. This means a sandboxed process can always, for example, determine how many bytes of data are in /etc/passwd, even though the contents of the file can't actually be read. The good news is it's still not possible to use opendir() and go fishing for paths which weren't previously known. So if you want to play up your secrecy in addition to security, consider OpenBSD instead of Linux.

Another truly weird behavior of Linux is that Landlock currently isn't able to restrict file truncation. For example, did you know that opening a file on Linux using open(O_RDONLY | O_TRUNC) will actually delete the contents of the file? The same is also the case with the truncate() system call, which is a blind spot with Landlock. Right now Cosmopolitan Libc addresses this by blocking those corner cases using the SECCOMP BPF security policies we've programmed into our pledge() polyfill. However, we're not currently using pledge() in make.com, since the emphasis is on preventing accidental misuse rather than preventing malicious misuse. Please note, this may change in the future, should we decide to beef up the security of make.com. If this topic interests you, then please reach out and let us know what use cases and dreams you have in mind!

Finally please note that we haven't incorporated the GNU Make tests into the Cosmopolitan Libc continuous integration system yet. Our C library is still a relative newcomer that has gaps in terms of things like locale support. The last time we checked the GNU Make test suite, our port was 80% conformant. That hasn't stopped us from eating our own dogfood though, since we use make.com every single day to maintain all our repositories. If you encounter any issues with it, or are willing to help us expand our C library implementation, then once again please don't hesitate to reach out.

Future Roadmap

Since my GNU Make fork is an Actually Portable Executable that runs on six operating systems, it'd be great to polyfill unveil() on other operating systems too. The next fun project on my list will probably be looking into FreeBSD jails, since I've heard so many good things about them on online forums.

Special Thanks

I'd like to thank Mickaël Salaün for his work on bringing Landlock to the Linux Kernel, as well as being a big help on Twitter. Stephen Gregoratto contributed the Linux unveil() implementation to Cosmopolitan Libc in #490. Gautham Venkatasubramanian contributed the initial port of GNU Make to Cosmopolitan Libc in PR #305. I'd also like to thank Günther Noack for offering superb code reviews and feedback.

Funding

[United States of Lemuria - two dollar bill - all debts public and primate]

Funding for the development of this project was crowdsourced from Justine Tunney's GitHub sponsors and Patreon subscribers. Your support is what makes projects like Landlock Make possible. Thank you.