July 13th, 2022 @ justine's web page
OpenBSD is an operating system that's famous for its focus on security. Unfortunately, OpenBSD leader Theo states that there are only 7000 users of OpenBSD. So it's a very small but elite group that wields a disproportionate influence, since we hear all the time about the awesome security features these guys get to use, even though we usually can't use them ourselves.
Pledge is like the forbidden fruit we all covet when the boss says we must use things like Linux. Why does it matter? It's because pledge() actually makes security comprehensible. Linux has never really had a security layer that mere mortals can understand. For example, let's say you want to do something on Linux like control whether or not some program you downloaded from the web is allowed to have telemetry. You'd need to write stuff like this:
static const struct sock_filter kFilter[] = {
    /* L0*/ BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, syscall, 0, 14 - 1),
    /* L1*/ BPF_STMT(BPF_LD | BPF_W | BPF_ABS, OFF(args[0])),
    /* L2*/ BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, 2, 4 - 3, 0),
    /* L3*/ BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, 10, 0, 13 - 4),
    /* L4*/ BPF_STMT(BPF_LD | BPF_W | BPF_ABS, OFF(args[1])),
    /* L5*/ BPF_STMT(BPF_ALU | BPF_AND | BPF_K, ~0x80800),
    /* L6*/ BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, 1, 8 - 7, 0),
    /* L7*/ BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, 2, 0, 13 - 8),
    /* L8*/ BPF_STMT(BPF_LD | BPF_W | BPF_ABS, OFF(args[2])),
    /* L9*/ BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, 0, 12 - 10, 0),
    /*L10*/ BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, 6, 12 - 11, 0),
    /*L11*/ BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, 17, 0, 13 - 11),
    /*L12*/ BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    /*L13*/ BPF_STMT(BPF_LD | BPF_W | BPF_ABS, OFF(nr)),
    /*L14*/ /* next filter */
};
Oh my gosh. It's like we traded one form of security privilege for another. OpenBSD limits security to a small pond, but makes it easy. Linux is a big tent, but makes it impossibly hard. SECCOMP BPF might as well be the Traditional Chinese of programming languages, since only a small number of people who've devoted the oodles of time it takes to understand code like what you see above have actually been able to benefit from it. But if you've got OpenBSD privilege, then doing the same thing becomes easy:
pledge("stdio rpath", 0);
That's really all OpenBSD users have to do to prevent things like leaks of confidential information. So how do we get it that simple on Linux? I believe the answer is to find someone with enough free time to figure out how to use SECCOMP BPF to implement pledge. The latest volunteer is me, so look upon my code ye mighty and despair.
There's been a few devs in the past who've tried this. I'm not going to name names, because most of these projects were never completed. When it comes to SECCOMP, the online tutorials only explain how to whitelist the system calls themselves, so most people lose interest before figuring out how to filter arguments. The projects that got further along also had oversights like allowing the changing of setuid/setgid/sticky bits. So none of the current alternatives should be used. I believe this effort gets us much closer to having pledge() than ever before.
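To give a feel for what "filtering arguments" means, here's my reading of the socket filter shown earlier, restated as an ordinary C predicate. Treat it as a sketch: the function name and the macro names are made up for illustration, and the numeric constants are the Linux x86-64 values that I believe the magic numbers in the BPF program stand for. When the predicate returns false, the BPF program doesn't deny anything by itself; it simply falls through to whatever filter comes next.

```c
#include <stdbool.h>

// Plain C reading of the BPF program shown earlier: which socket()
// calls does it wave through? Constants are Linux x86-64 values.
#define FAMILY_INET  2        // AF_INET
#define FAMILY_INET6 10       // AF_INET6
#define TYPE_STREAM  1        // SOCK_STREAM
#define TYPE_DGRAM   2        // SOCK_DGRAM
#define TYPE_FLAGS   0x80800  // SOCK_CLOEXEC | SOCK_NONBLOCK
#define PROTO_ANY    0        // let the kernel pick
#define PROTO_TCP    6        // IPPROTO_TCP
#define PROTO_UDP    17       // IPPROTO_UDP

// Hypothetical helper: true if the filter would return ALLOW.
bool inet_socket_allowed(int family, int type, int protocol) {
  // args[0]: only IPv4 and IPv6 sockets
  if (family != FAMILY_INET && family != FAMILY_INET6) return false;
  // args[1]: mask off the two flag bits, then require TCP-ish or UDP-ish
  type &= ~TYPE_FLAGS;
  if (type != TYPE_STREAM && type != TYPE_DGRAM) return false;
  // args[2]: default protocol, TCP, or UDP
  return protocol == PROTO_ANY || protocol == PROTO_TCP ||
         protocol == PROTO_UDP;
}
```

Seven lines of logic in C become fourteen hand-counted jump offsets in BPF, which is roughly why so few people get this far.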
I originally wrote my pledge() polyfill for the redbean web server as a sandboxing solution. However it turns out pledge() is robust enough as an abstraction that I thought it'd be useful to create a small command line utility which launches processes under pledge(), so that anyone can use it, without having to configure it in C code.
pledge-1.8.com
88kb - x86-64 elf executable (debug data, source code)
Written by Justine Alexandra Roberts Tunney (Twitter, GitHub, LinkedIn)
22d33574e244883a87e54169f4ed82ea40cabb17b79c9e57559b0fa8454dd698
That binary will work on all Linux distros since RHEL6. Root privileges are not required. You just use it to wrap your command invocations. It's so tiny and lightweight that it only adds a few microseconds of startup latency to your program. It's great for shell scripts and automated tools. For example, if you want to run the list directory command, and only permit that command to do basic stdio (-p stdio) and filesystem path (-p rpath) reading in the current directory (-v .), then you'd say:
$ wget https://justine.lol/pledge/pledge.com
$ chmod +x pledge.com
$ ./pledge.com -v. -p 'stdio rpath' ls
file listing output...
You can now be certain your ls command isn't doing things like spying on you, or uploading your bitcoin wallet to the cloud. However let's say authorizing network access is what you want. One command that has a real legitimate need for that is curl. However, since it needs DNS, it's a little trickier because DNS is the Hunger Games of systems engineering, and not all Libc implementations agree on how it should be implemented. Here's some strategies depending on your tools and distro:
# standard curl on alpine linux 3.16 (musl)
./pledge.com -p 'stdio rpath dns inet' \
  curl -s http://justine.lol/hello.txt

# standard curl on ubuntu 22.04 (glibc)
./pledge.com -p 'stdio rpath inet dns tty sendfd recvfd' \
  curl -s http://justine.lol/hello.txt
hello world

# cosmopolitan's curl as static binary
# see git clone and make instructions below
./assimilate.com ./curl.com
./pledge.com -p 'stdio rpath dns inet' \
  ./curl.com https://justine.lol/hello.txt

# cosmopolitan's curl as ape binary
# non-assimilated cosmopolitan ape binary
./pledge.com -p 'stdio rpath prot_exec dns inet' \
  ./curl.com https://justine.lol/hello.txt
The choice of C library usually impacts which permissions are needed. Musl and Cosmopolitan need the least permission since they were built with sandboxing in mind. Glibc on the other hand does some strange stuff with DNS, which requires us to weaken the sandbox with recvmsg() and sendmsg() which also enable SCM_RIGHTS unfortunately.
Both Musl and Glibc use dynamic binaries. In order to be able to launch them, pledge.com temporarily implies both exec and prot_exec. We then inject an LD_PRELOAD library which runs inside the process at initialization. That library calls pledge() again automatically, and drops both the exec and prot_exec privileges if needed. This dynamic library also lets us print helpful messages to stderr to explain which promises are needed when a violation occurs.
Let's say you have a public ssh server and you want to let people read and take notes of your book collection, but you don't want anyone rewriting your books. In that case, you can repurpose something like the nano command as a strictly read-only editor. Since nano has a TUI, you'd need to grant it TTY privileges.
./pledge.com -v $HOME/books -np 'stdio rpath tty' nano ~/books/bofh.txt
Here's how you'd sandbox Vim to only be able to change the current directory, tested on Alpine and Ubuntu.
./pledge.com \
  -v rwc:. \
  -v /etc/vim \
  -v $HOME/.vimrc \
  -v /usr{,/local}/share/vim \
  -p 'stdio rpath wpath cpath tty prot_exec' \
  vim
Here's how you'd sandbox Emacs to only be able to change the current directory, tested on Alpine and Ubuntu.
./pledge.com \
  -v rwc:. \
  -v $HOME/.emacs \
  -v rwc:$HOME/.emacs.d \
  -v /etc/emacs \
  -v /etc/passwd \
  -v /usr/share/X11/locale \
  -v /usr{,/local}/{libexec,share}/emacs \
  -p 'stdio rpath wpath cpath tty proc tmppath prot_exec' \
  emacs -nw
If your program crashes, then you can figure out why by tracing the binary and seeing which system call is EPERM'ing or which veiled path is EACCES'ing. For example, let's see what happens if we reduce the privileges to just stdio.
$ strace -ff ./pledge.com -p stdio ls
open("/etc/ld-musl-x86_64.path", O_RDONLY|O_CLOEXEC) = -1 EPERM (Operation not permitted)
Well that didn't take long. Now that you know what's wrong, you would then consult the Promises section to see which promise you need. For example, you'd know open(O_RDONLY) is provided by rpath and that in order to fork() you need -p proc.
In addition to polyfilling pledge, your pledge command is also able to apply some other very important safety hacks that aren't obvious to the uninitiated. For example, we've all run a program before that hammers the system. Linux is very generous in how much memory programs can allocate. An accidental loop in just one program, by default on Linux, will absolutely take the whole machine out of commission for a few minutes before the "OOM Killer" kicks in. In other cases, like a fork() bomb, the default Linux environment provides no such protection, so it's essentially equivalent to a blue screen of death.
Your pledge command imposes some perfectly reasonable resource quotas on programs by default, to prevent that from happening. By default, unless you tune the flags, a program is allowed to use only the amount of memory you have. If you've permitted it to fork off new processes, then it won't be able to spawn more of them at the same time than twice your number of CPUs. This way if your sandboxed program gets out of control, it'll most likely crash itself before it can crash your whole computer.
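If you're curious what quotas like that look like in code, here's a minimal sketch using setrlimit(). This is an assumption about the mechanism, not a copy of what pledge.com does: the struct and both function names are hypothetical, and I'm guessing that RLIMIT_AS and RLIMIT_NPROC are the relevant knobs. Keep in mind that RLIMIT_NPROC counts every process owned by the same real user, not just the ones inside the sandbox.

```c
#include <sys/resource.h>
#include <unistd.h>

// Hypothetical container for the two default quotas described above.
struct quotas {
  rlim_t memory_bytes;   // cap on address space
  rlim_t max_processes;  // cap on simultaneous processes
};

// Computes defaults in the spirit of the text: all of physical RAM,
// and twice the number of online CPUs.
struct quotas default_quotas(void) {
  struct quotas q;
  long pages = sysconf(_SC_PHYS_PAGES);
  long pagesize = sysconf(_SC_PAGESIZE);
  long cpus = sysconf(_SC_NPROCESSORS_ONLN);
  if (cpus < 1) cpus = 1;  // be conservative if the query fails
  if (pages < 1 || pagesize < 1) {
    q.memory_bytes = RLIM_INFINITY;  // unknown RAM size: leave unlimited
  } else {
    q.memory_bytes = (rlim_t)pages * (rlim_t)pagesize;
  }
  q.max_processes = (rlim_t)cpus * 2;
  return q;
}

// Applies the quotas to the calling process; children inherit them.
// Returns 0 on success or -1 if either setrlimit() call fails.
int apply_quotas(struct quotas q) {
  struct rlimit mem = {q.memory_bytes, q.memory_bytes};
  struct rlimit proc = {q.max_processes, q.max_processes};
  if (setrlimit(RLIMIT_AS, &mem)) return -1;
  if (setrlimit(RLIMIT_NPROC, &proc)) return -1;
  return 0;
}
```

A launcher would call apply_quotas(default_quotas()) in the child after fork() and before execve(), so the limits land on the sandboxed program rather than on the launcher itself.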
We also have a niceness feature. Have you ever had a program use so much disk i/o that everything crawls to a halt? You run some program, and then suddenly every small file takes seconds to load in Emacs? Your pledge command can fix that. If you've got a compute-heavy, long-running program, then pass the -n flag for a nice that's actually nice. The naive nice command doesn't really do much, since it doesn't change the scheduler and it doesn't change the i/o priority. This command actually does. Using the -n flag will guarantee the sandboxed program will stay out of the way, since the kernel will only let it use spare capacity.
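Here's a rough sketch of what "actually nice" could mean on Linux: lower the classic nice value, switch to the idle i/o class, and switch to the idle scheduling policy. The helper name is hypothetical, and the choice of the idle classes is my assumption about how you'd get "spare capacity only" behavior; the text above only says that the scheduler and the i/o priority get changed. The ioprio numbers are written out by hand because libc doesn't provide a wrapper for that system call.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <sys/resource.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef SCHED_IDLE
#define SCHED_IDLE 5  // Linux value, in case libc hides it
#endif

// Values from the Linux ioprio ABI, named locally for this sketch.
enum {
  kIoprioWhoProcess = 1,  // IOPRIO_WHO_PROCESS
  kIoprioClassIdle = 3,   // IOPRIO_CLASS_IDLE
  kIoprioClassShift = 13  // IOPRIO_CLASS_SHIFT
};

// Hypothetical helper: makes the calling process as polite as possible.
// Returns how many of the three steps failed (0 means all were applied).
int be_really_nice(void) {
  int failures = 0;
  struct sched_param param = {0};
  // 1. Classic niceness: lowest CPU priority among normal processes.
  if (setpriority(PRIO_PROCESS, 0, 19)) ++failures;
  // 2. Idle i/o class: disk access only when nobody else wants it.
  if (syscall(SYS_ioprio_set, kIoprioWhoProcess, 0,
              kIoprioClassIdle << kIoprioClassShift) == -1) {
    ++failures;
  }
  // 3. Idle scheduling policy: CPU time only when nothing else is runnable.
  if (sched_setscheduler(0, SCHED_IDLE, &param)) ++failures;
  return failures;
}
```

All three settings are inherited across fork() and execve(), so a wrapper can apply them to itself right before launching the target program.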
-p 'stdio rpath'. It's repeatable. May contain any of following separated by spaces:

PERM defaults to r and may have any of the following:

r makes PATH available for read-only path operations, corresponding to the pledge promise "rpath".
w makes PATH available for write operations, corresponding to the pledge promise "wpath".
x makes PATH available for execute operations, corresponding to the pledge promises "exec" and "execnative".
c allows PATH to be created and removed, corresponding to the pledge promise "cpath".
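As a sketch of how a PERM:PATH argument like rwc:. or /etc/vim could be taken apart, here's a small parser. Everything in it is hypothetical (the function name, the bit values, and the rule for telling a PERM prefix apart from a path that happens to contain a colon); it only illustrates the r/w/x/c letters and the default of r described above.

```c
#include <assert.h>
#include <string.h>

enum {
  UNVEIL_R = 1,  // read-only path operations ("rpath")
  UNVEIL_W = 2,  // write operations ("wpath")
  UNVEIL_X = 4,  // execute operations ("exec")
  UNVEIL_C = 8   // create and remove ("cpath")
};

// Hypothetical parser for a "-v [PERM:]PATH" argument. Stores the
// permission bits in *perm and returns a pointer to the PATH part.
// If there's no prefix made purely of the letters r, w, x, c, then the
// whole argument is treated as the path and the permission is r.
const char *parse_unveil_arg(const char *arg, int *perm) {
  assert(arg && perm);
  const char *colon = strchr(arg, ':');
  int bits = 0;
  if (colon && colon != arg) {
    for (const char *p = arg; p < colon; ++p) {
      switch (*p) {
        case 'r': bits |= UNVEIL_R; break;
        case 'w': bits |= UNVEIL_W; break;
        case 'x': bits |= UNVEIL_X; break;
        case 'c': bits |= UNVEIL_C; break;
        default: bits = -1; break;  // not a PERM prefix after all
      }
      if (bits == -1) break;
    }
  }
  if (bits > 0) {
    *perm = bits;
    return colon + 1;
  }
  *perm = UNVEIL_R;
  return arg;
}
```

For example, parse_unveil_arg("rwc:.", &perm) yields the path "." with the read, write, and create bits set, while parse_unveil_arg("/etc/vim", &perm) yields the whole string with only the read bit.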
Some paths are implicitly defined by pledge.com depending on which promises you've used. See the Implicitly Unveiled Paths section for further details. Unveiling is implemented using Landlock which requires Linux Kernel 5.13+. On older kernels, all filesystem paths will be allowed (unless you use the chroot flag).
pledge.com -T pledge to test for the availability of this feature. Please note this only impacts very old Linux systems like RHEL5 since SECCOMP was introduced around 2010.
pledge.com -T unveil to test for the availability of this feature.
SIGXCPU signal is sent to your program, after which it has precisely one second to gracefully shut down before SIGKILL is used.
ENOMEM which will trickle down into functions like malloc() failing.
EAGAIN.
SIGXFSZ signal is sent to your program. If the limit is 150% exceeded then SIGKILL is used.
EMFILE.
The pledge.com program will automatically unveil the following paths for your convenience when certain conditions are met. In most cases, we use the categories you've pledged as a hint as to what needs unveiling. Please note that this automatic unveiling does not apply to the Linux C API interface for pledge(), where unveil() must be called explicitly. However OpenBSD will unveil some key paths for things like stdio. The files we've chosen below are a superset of what OpenBSD does, intended to conform to the same principles adapted for Linux.
pledge("stdio")
-v /dev/fd
-v w:/dev/log
-v /dev/zero
-v rw:/dev/null
-v rw:/dev/full
-v rw:/dev/stdin
-v rw:/dev/stdout
-v rw:/dev/stderr
-v /dev/urandom
-v /etc/localtime
-v rw:/proc/self/fd
-v /proc/self/stat
-v /proc/self/status
-v /usr/share/locale
-v /proc/self/cmdline
-v /usr/share/zoneinfo
-v /proc/sys/kernel/version
-v /usr/share/common-licenses
-v /proc/sys/kernel/ngroups_max
-v /proc/sys/kernel/cap_last_cap
-v /proc/sys/vm/overcommit_memory
pledge("rpath")
-v /proc/filesystems
pledge("inet")
-v /etc/ssl/certs/ca-certificates.crt
pledge("dns")
-v /etc/hosts
-v /etc/hostname
-v /etc/services
-v /etc/protocols
-v /etc/resolv.conf
pledge("tty")
-v rw:$PTY
-v rw:/dev/tty
-v rw:/dev/console
-v /etc/terminfo
-v /usr/lib/terminfo
-v /usr/share/terminfo
pledge("prot_exec")
-v rx:/usr/bin/ape
pledge("vminfo")
-v /proc/stat
-v /proc/meminfo
-v /proc/cpuinfo
-v /proc/diskstats
-v /proc/self/maps
-v /sys/devices/system/cpu
pledge("tmppath")
-v rwc:/tmp
-v rwc:$TMPPATH
-v rx:/lib
-v rx:/lib64
-v rx:/usr/lib
-v rx:/usr/lib64
-v rx:/usr/local/lib
-v rx:/usr/local/lib64
-v /etc/ld-musl-x86_64.path
-v /etc/ld.so.conf
-v /etc/ld.so.cache
-v /etc/ld.so.conf.d
-v /etc/ld.so.preload
Actually Portable Executables should be written to call pledge() internally. But if you want to secure an APE binary that doesn't, using the pledge.com command, then you need to convert (or "assimilate") it into the ELF format beforehand. You can usually do this by saying:
$ file redbean.com
redbean.com: DOS/MBR boot sector
$ ./redbean.com --assimilate
$ file redbean.com
redbean.com: ELF 64-bit LSB executable
Please note that this won't work if you're using binfmt_misc with the new APE Loader, because then you can't run the APE shell script to assimilate your binary. We instead provide a new assimilate.com program which can be used to convert APE programs to ELF or Mach-O.
assimilate.com
Works on x86-64 Linux+Mac+Windows+FreeBSD+NetBSD+OpenBSD
92kb - PE+ELF+MachO+ZIP+SH executable (debug data, source code)
Written by Justine Alexandra Roberts Tunney (Twitter, GitHub, LinkedIn)
593a8119049e9e8a88d29f80af83bfdbb5fcdd8a4cbad934af05dd6a5145ce77
Pledge works best when developing software using Cosmopolitan Libc. You can get started relatively easily writing pledge() programs using the cosmopolitan monorepo. The zero config solution is to just plop this program file into the examples folder. Start by cloning the repo:
$ git clone https://github.com/jart/cosmopolitan
$ cd cosmopolitan
$ nano examples/mypledge.c
You can then copy and paste this code:
#include "libc/calls/calls.h"
#include "libc/stdio/stdio.h"

int main() {
  pledge("stdio", 0);
  printf("hello world\n");
}
You can then build and run your program as follows:
$ make -j8 o//examples/mypledge.com
$ o//examples/mypledge.com
hello world
One of the things you may have noticed about the pledge.com command, is its most restrictive mode (pledge.com -p "" cmd...) can't actually be used. Your program will just crash. That's because it's intended for the C API. What it means is that your process or thread won't be able to call any system call except exit. Such a program might sound impossible, but you can actually communicate between processes using shared memory. For example, here's how you'd do it with threads.
int enclave(void *arg, int tid) {
  if (pledge("", 0)) return 1;
  int *job = arg;            // get job
  job[0] = job[0] + job[1];  // do work
  return 0;                  // exit
}

int main() {
  struct spawn worker;
  int job[2] = {2, 2};            // create workload
  _spawn(enclave, job, &worker);  // create worker
  _join(&worker);                 // wait for exit
  assert(job[0] == 4);            // check result
}
The above example shows an enclaved worker doing some kind of computational task, possibly executing untrusted code, and then storing the result to some memory location that the parent thread can see when the worker has finished executing. It works great and is fast.
One of the disadvantages of the above example, is that the enclaved worker has unfettered access to your stack memory and might make a mess of things. That's potentially creepy and not very enclaved. One way to fix that is to use fork() instead of threads. In that case, you can explicitly whitelist which memory is shared.
int ws;
// create small shared memory region
int *job = mmap(0, FRAMESIZE, PROT_READ | PROT_WRITE,
                MAP_SHARED | MAP_ANONYMOUS, -1, 0);
job[0] = 2;  // create workload
job[1] = 2;
if (!fork()) {  // create enclaved worker
  if (pledge("", 0)) _Exit(1);
  job[0] = job[0] + job[1];  // do work
  _Exit(0);
}
wait(&ws);  // wait for worker
assert(WIFEXITED(ws));
assert(WEXITSTATUS(ws) == 0);
assert(job[0] == 4);  // check result
munmap(job, FRAMESIZE);
Most of the Cosmopolitan Libc unit tests have been set up to use pledge() these days. Not necessarily because we're concerned about them being compromised, but because the pledge function has outstanding documentation value in helping people understand our tests, since it readily communicates what system functionality they need. For example, our tests for the access() filesystem function say:
__attribute__((__constructor__)) static void init(void) {
  pledge("stdio rpath wpath cpath fattr", 0);
  errno = 0;
}
When you write your own Actually Portable Executables, you also get some added security benefits compared to pledge.com. For example, another famous OpenBSD system call is msyscall() which causes the kernel to validate the RIP register of anything that issues a system call. In Cosmopolitan, calling pledge() will polyfill that feature too automatically, to only allow functions which are annotated with the privileged keyword to use SYSCALL. What that means is if someone manages to compromise your server to inject executable code into your program's memory, then that code effectively will have pledge("", 0) privileges, even if when your app called pledge(), it specified something much broader. The redbean web server's unix.pledge() function is also able to take advantage of this.
File system access is a blind spot [update 2022-07-22: we now have unveil() thanks to the Landlock system calls introduced a year ago]. OpenBSD solves this with another famous system call called unveil(), which lets users control file system paths too. Right now there's no clear way to implement that for Linux. However our pledge() polyfill does do a reasonable job in restricting which file system operations are possible. But once you permit the file system ops, the ops are allowed to happen on pretty much any file the user has access to.
I personally don't view this as a problem. What I love about pledge.com is it tells me if the programs I run that I downloaded from random strangers on the Internet, are actually the good little command line citizens that they claim to be. For example, if I download a tool for computing some math, or compressing a file, then it really shouldn't need any access except -p "stdio rpath" especially if I'm able to use pipes. So I can use pledge.com to make sure the command keeps its promise and lets me know if there's any surprising behaviors. So this is great security if you're dealing with command line programs that are written in a conscientious manner. If it's only able to read files and can't talk to the Internet, then seriously, what could it possibly do? It's such a simple pareto-optimized niche that I can't believe no one's made it easily addressable until now.
However, there's always going to be that one program you want that's power hungry, possibly due to bloated frameworks and dependencies. In that case, we may want access to some (but not all) of the file system. pledge.com is able to address the need somewhat using chroot(). It's worth noting though that chroot() has weaknesses that kernel devs have refused to fix for decades. Most of the docs on this subject are unprofessional and crazy. For example, the chroot(2) man page is probably the only category 2 man page I've ever seen that uses shell script code to describe its functionality. As far as I can tell, the only convincing weakness with chroot() is that the jail is only locked from the inside. If you take away the freedom of a process by putting it in a chroot jail, then another process that's free can use its freedom to bust its friend out of jail. For example, here's how root can leave a backdoor that lets the process escape:
mkdir("/tmp/mydir", 0755);

// privileged user opens a backdoor
int dirfd = open("/tmp", O_RDONLY | O_DIRECTORY);

// process enters chroot jail
chdir("/tmp/mydir");
chroot("/tmp/mydir");

// process escapes jail
fchdir(dirfd);
chdir("..");

// list root directory
struct dirent *e;
DIR *d = opendir(".");
while ((e = readdir(d))) {
  printf("%s\n", e->d_name);
}
closedir(d);
The Linux devs could fix that if they wanted to. However I personally don't see why it's a total dealbreaker, since pledge.com helps avoid it by closing rogue file descriptors at startup using poll(). What's even more surprising is that this weakness is also exploitable on OpenBSD, since they too seem to have given up on securing the traditional chroot() call. But at least OpenBSD provides an alternative that's easy to use, called unveil(). It'd be great to see that leadership from the Linux kernel, but instead we just see blog posts from companies like RedHat saying that having chroot() will make us more insecure than having no security at all. It's like banning locks because lockpick kits exist. RedHat must be experts at mental gymnastics to publish such communiqués. It's also comical that Linux addresses the problem by restricting chroot() to the root user account, since clearly something which is so "insecure" will become more secure if you only do it from the most privileged user. What an unfortunate state of affairs, since many of us have needed to look elsewhere for answers, and the only folks offering those right now are bloatware like Docker that locks in your filesystem with a bunch of cryptically named tar files. And they say that Docker isn't a security layer too! Even though it's based on things like cgroups which are even more elite and difficult to understand than SECCOMP BPF. We can only guess why the kernel devs do it. Maybe they're afraid of issue workload burnout and figure people won't complain about security if no one understands it! That's something we're working to change.
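The descriptor cleanup mentioned at the top of that rant can be sketched with poll(), which reports POLLNVAL for any descriptor number that isn't open. This is an illustration of the technique rather than the pledge.com source: the function names, the batch size, and the idea of passing an explicit range are all my assumptions. A launcher would presumably scan everything from 3 up to its descriptor limit, leaving stdin, stdout, and stderr alone.

```c
#include <poll.h>
#include <unistd.h>

// Returns 1 if fd is open, 0 if it isn't, or -1 if poll() itself fails.
// poll() flags closed descriptors with POLLNVAL even when we ask for
// no events at all.
int fd_is_open(int fd) {
  struct pollfd p = {.fd = fd, .events = 0};
  if (poll(&p, 1, 0) == -1) return -1;
  return !(p.revents & POLLNVAL);
}

// Hypothetical helper: closes every open descriptor in [lo, hi), probing
// them in batches so a sparse table costs few system calls.
// Returns how many descriptors were closed, or -1 if poll() fails.
int close_fds_in_range(int lo, int hi) {
  enum { BATCH = 64 };
  struct pollfd pfds[BATCH];
  int closed = 0;
  for (int base = lo; base < hi; base += BATCH) {
    int n = hi - base < BATCH ? hi - base : BATCH;
    for (int i = 0; i < n; ++i) {
      pfds[i].fd = base + i;
      pfds[i].events = 0;
      pfds[i].revents = 0;
    }
    if (poll(pfds, n, 0) == -1) return -1;
    for (int i = 0; i < n; ++i) {
      // anything not flagged POLLNVAL is an open descriptor
      if (!(pfds[i].revents & POLLNVAL) && !close(pfds[i].fd)) ++closed;
    }
  }
  return closed;
}
```

With the backdoor example above, the leaked dirfd would be swept up by such a scan before the sandboxed program ever gets a chance to fchdir() through it.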
It should also be noted that there's some features OpenBSD bakes into pledge() that we're not able to polyfill with Linux SECCOMP BPF. One of the things OpenBSD does is it can check file system paths, in order to loosen up restrictions around things like accessing the time zone database. This isn't a problem if you're a Cosmopolitan Libc user, because APE binaries don't read tzdata from the filesystem and instead embed time zone data inside the ZIP structure of the binary. However it could potentially be problematic if you're using pledge.com to launch binaries that are provided by your distro. Ask your friendly distro maintainers to improve their security solutions. If they can't, then you can always switch to Cosmopolitan Libc.
Another caveat is that, so far, I've only implemented the things described in the OpenBSD pledge(2) manual page. We still need to reconcile this properly with the primary materials which would be the OpenBSD pledge() kernel source code. We also need more community feedback to make sure there aren't things we haven't considered. For example, Linux has a lot of sneaky capabilities in a shifting landscape that aren't always widely understood, which can potentially bite the authors of security tools, even when they've done due diligence.
I've also only really tested this on console applications. If you want a pledge() that's likely to work with GUIs, then, knowing the way the Linux desktop goes, you really should consider SerenityOS since Andreas added pledge() support a couple years ago.
Pledging causes most system calls to become unavailable. Your system call policy is enforced by the kernel, which means it can propagate across execve() if permitted. This system call is supported on OpenBSD and Linux where it's polyfilled using SECCOMP BPF. The way it works on Linux is verboten system calls will raise EPERM whereas OpenBSD just kills the process while logging a helpful message to /var/log/messages explaining which promise category you needed.
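To make the EPERM behavior concrete, here's about the smallest SECCOMP BPF program I can think of that makes one system call fail with EPERM while allowing everything else. It's a teaching sketch, not the polyfill: the array and function names are hypothetical, it picks socket() arbitrarily, and it's written for x86-64 only. A real filter would work the other way around (deny by default, allow by category) and would also need to reject the x32 ABI, which this one doesn't.

```c
#include <errno.h>
#include <stddef.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sys/prctl.h>
#include <sys/syscall.h>

// Deny socket() with EPERM and allow everything else (x86-64 sketch).
static const struct sock_filter kDenySocket[] = {
    // Load the architecture and kill anything that isn't x86-64, since
    // system call numbers mean different things on other architectures.
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, AUDIT_ARCH_X86_64, 1, 0),
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
    // Load the system call number and compare it against socket().
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_socket, 0, 1),
    // Matched: the call returns -1 with errno set to EPERM.
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | (EPERM & SECCOMP_RET_DATA)),
    // Didn't match: let the call through.
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
};

// Installs the filter on the calling thread. This can't be undone.
// Returns 0 on success or -1 with errno set.
int deny_socket(void) {
  struct sock_fprog prog = {
      .len = sizeof(kDenySocket) / sizeof(kDenySocket[0]),
      .filter = (struct sock_filter *)kDenySocket,
  };
  // Required so that unprivileged processes may install a filter.
  if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0)) return -1;
  return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}
```

After deny_socket() succeeds, a call like socket(AF_INET, SOCK_STREAM, 0) returns -1 with errno set to EPERM, and the program keeps running instead of being killed.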
By default exit() is allowed. This is useful for processes that perform pure computation and interface with the parent via shared memory. On Linux we mean sys_exit (_Exit1), not sys_exit_group (_Exit). The difference is effectively meaningless, since _Exit() will attempt both. All it means is that, if you're using threads, then a pledge("", 0) thread can't kill all your threads unless you pledge("stdio").
Once pledge is in effect, the chmod functions (if allowed) will not permit the sticky/setuid/setgid bits to change. Linux will EPERM here and OpenBSD should ignore those three bits rather than crashing.
User and group IDs can't be changed once pledge is in effect. OpenBSD should ignore chown without crashing; whereas Linux will just EPERM.
Memory functions won't permit creating executable code after pledge. Restrictions on origin of SYSCALL instructions will become enforced on Linux (cf. msyscall()) after pledge too, which means the process gets killed if SYSCALL is used outside the .privileged section. One exception is if the "exec" group is specified, in which case these restrictions need to be loosened.
Using pledge is irreversible. On Linux it causes PR_SET_NO_NEW_PRIVS to be set on your process; however, if "id" or "recvfd" are allowed then they theoretically could permit the gaining of some new privileges. You may call pledge() multiple times if "stdio" is allowed. In that case, the process can only move towards a more restrictive state.
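One way to picture "can only move towards a more restrictive state" is to treat the promise string as a bitmask and every later pledge() as an intersection with the current mask. The model below is hypothetical: the promise table is just a subset of names used in this article, the function names are made up, and modeling a widening request as a silent intersection (which is how stacked seccomp filters behave) is my assumption. An implementation could equally refuse such a call outright.

```c
#include <assert.h>
#include <string.h>

// Illustrative subset of promise names that appear in this article.
static const char *const kPromises[] = {
    "stdio", "rpath", "wpath", "cpath",     "inet",    "dns",
    "tty",   "proc",  "exec",  "prot_exec", "tmppath", "fattr",
};
#define NPROMISES (sizeof(kPromises) / sizeof(kPromises[0]))

// Parses a space-delimited promise string into a bitmask, or returns -1
// if it contains a name outside the subset above.
long parse_promises(const char *s) {
  assert(s);
  long mask = 0;
  while (*s) {
    size_t len = strcspn(s, " ");  // length of the next word
    if (len) {
      size_t i;
      for (i = 0; i < NPROMISES; ++i) {
        if (strlen(kPromises[i]) == len && !strncmp(s, kPromises[i], len)) {
          mask |= 1L << i;
          break;
        }
      }
      if (i == NPROMISES) return -1;  // unknown promise name
    }
    s += len;
    if (*s == ' ') ++s;  // skip the delimiter
  }
  return mask;
}

// Models a repeated pledge() call. Only a process that still holds
// "stdio" (bit 0) may ask, and the result is the intersection of what
// it had and what it requested, so the mask never grows.
// Returns the new mask, or -1 if the request is refused or malformed.
long narrow_promises(long current, const char *request) {
  long wanted = parse_promises(request);
  if (wanted == -1) return -1;
  if (!(current & 1L)) return -1;
  return current & wanted;
}
```

For instance, a process holding "stdio rpath inet" that later asks for "stdio wpath" ends up with just "stdio": it loses rpath and inet, and it doesn't gain wpath.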
pledge() can't filter file system paths or internet addresses. For example, if you enable a category like "inet" then your process will be able to talk to any internet address. The same applies to categories like "wpath" and "cpath"; if enabled, any path the effective user id is permitted to change will be changeable.
The Linux pledge() polyfill isn't able to support the OpenBSD execpromises parameter.
Your promises is a string that may include any of the following groups delimited by spaces.
Funding for the development of pledge() on Linux was crowdsourced from Justine Tunney's GitHub sponsors and Patreon subscribers. Your support is what makes projects like Cosmopolitan Libc possible. Thank you.