We’re constantly running software that we don’t exactly understand. We know it does things, useful things, but now always how it gets the necessary information, or what effects it causes on our system. Is it from a file, deep within my hard drive? Is it from the operating system? Or is it fetching from the Internet?
And so, in comes ✨strace✨ . It’s a program that lets you trace (inspect) all the communication between a process and the Linux Kernel. This means you can monitor all the information that comes in and out of your process. It’s an invaluable tool for when a process is misbehaving, or to learn more about how things work.
Nelly Just read the source code. Most software in Linux is source available. It’s not that hard.
While the source code is the ultimate source of truth, it’s also very time consuming to read, if you even know how to read it, that is. Most of the time, we don’t care much about each individual instruction a program is doing, only what it’s reading and writing.
In this article, I intend to not only explain strace’s output format, but also to go through the entire output of tracing a regular program, explaining bit by bit of what’s going on. My hope is that you, the reader, will leave this article capable of parsing strace’s output to find what you’re looking for.
First, what does strace do?
Strace is a syscall tracer. It prints every syscall a process does. Without syscalls, a process is just a very complicated electric heater, so inspecting syscalls is crucial for understanding what it does.
A system call (abbreviated to syscall) is a request that programs make to the operating system (OS), which in this case is the Linux Kernel. The OS is responsible for managing all resources on a computer, so if a program wants to access any resource, it must ask the OS first.
You can think of a syscall as a function call to the OS. It has arguments and a
return value (always an integer). Reading files? open and read syscalls;
Writing files? open and write syscalls; Connecting to the internet? socket
and connect syscalls.
Running strace is very simple: strace [OPTIONS]... PROGRAM [ARGS].... So, for
tracing a program, like printf, you’d do strace printf "hello, world!".
Understanding the output
Let’s try running the line mentioned above: strace printf "Hello, wold!"
$ strace printf "Hello, world!"
execve("/usr/bin/printf", ["printf", "Hello, world!"], 0x7ffdaab444e8 /* 18 vars */) = 0
brk(NULL) = 0x55f320282000
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=22731, ...}) = 0
mmap(NULL, 22731, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f9c274be000
close(3) = 0
openat(AT_FDCWD, "/usr/lib/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\360w\2\0\0\0\0\0"..., 832) = 832
pread64(3, "\6\0\0\0\4\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0"..., 896, 64) = 896
fstat(3, {st_mode=S_IFREG|0755, st_size=2145632, ...}) = 0
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f9c274bc000
pread64(3, "\6\0\0\0\4\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0"..., 896, 64) = 896
mmap(NULL, 2169904, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f9c27200000
mmap(0x7f9c27224000, 1511424, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x24000) = 0x7f9c27224000
mmap(0x7f9c27395000, 454656, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x195000) = 0x7f9c27395000
mmap(0x7f9c27404000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x203000) = 0x7f9c27404000
mmap(0x7f9c2740a000, 31792, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f9c2740a000
close(3) = 0
mmap(NULL, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f9c274b9000
arch_prctl(ARCH_SET_FS, 0x7f9c274b9740) = 0
set_tid_address(0x7f9c274b9a10) = 2670
set_robust_list(0x7f9c274b9a20, 24) = 0
rseq(0x7f9c274b9680, 0x20, 0, 0x53053053) = 0
mprotect(0x7f9c27404000, 16384, PROT_READ) = 0
mprotect(0x55f2e229b000, 4096, PROT_READ) = 0
mprotect(0x7f9c27505000, 8192, PROT_READ) = 0
prlimit64(0, RLIMIT_STACK, NULL, {rlim_cur=8192*1024, rlim_max=RLIM64_INFINITY}) = 0
getrandom("\x2f\x77\x04\x87\x6d\xb5\xb8\xd2", 8, GRND_NONBLOCK) = 8
munmap(0x7f9c274be000, 22731) = 0
brk(NULL) = 0x55f320282000
brk(0x55f3202a3000) = 0x55f3202a3000
openat(AT_FDCWD, "/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/share/locale/locale.alias", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=2998, ...}) = 0
read(3, "# Locale name alias data base.\n#"..., 4096) = 2998
read(3, "", 4096) = 0
close(3) = 0
openat(AT_FDCWD, "/usr/lib/locale/C.UTF-8/LC_IDENTIFICATION", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/locale/C.utf8/LC_IDENTIFICATION", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=258, ...}) = 0
mmap(NULL, 258, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f9c274c3000
close(3) = 0
openat(AT_FDCWD, "/usr/lib/gconv/gconv-modules.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=27010, ...}) = 0
mmap(NULL, 27010, PROT_READ, MAP_SHARED, 3, 0) = 0x7f9c274b2000
close(3) = 0
futex(0x7f9c2740972c, FUTEX_WAKE_PRIVATE, 2147483647) = 0
openat(AT_FDCWD, "/usr/lib/locale/C.UTF-8/LC_MEASUREMENT", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/locale/C.utf8/LC_MEASUREMENT", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=23, ...}) = 0
mmap(NULL, 23, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f9c274c2000
close(3) = 0
openat(AT_FDCWD, "/usr/lib/locale/C.UTF-8/LC_TELEPHONE", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/locale/C.utf8/LC_TELEPHONE", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=47, ...}) = 0
mmap(NULL, 47, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f9c274c1000
close(3) = 0
openat(AT_FDCWD, "/usr/lib/locale/C.UTF-8/LC_ADDRESS", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/locale/C.utf8/LC_ADDRESS", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=127, ...}) = 0
mmap(NULL, 127, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f9c274c0000
close(3) = 0
openat(AT_FDCWD, "/usr/lib/locale/C.UTF-8/LC_NAME", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/locale/C.utf8/LC_NAME", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=62, ...}) = 0
mmap(NULL, 62, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f9c274bf000
close(3) = 0
openat(AT_FDCWD, "/usr/lib/locale/C.UTF-8/LC_PAPER", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/locale/C.utf8/LC_PAPER", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=34, ...}) = 0
mmap(NULL, 34, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f9c274be000
close(3) = 0
openat(AT_FDCWD, "/usr/lib/locale/C.UTF-8/LC_MESSAGES", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/locale/C.utf8/LC_MESSAGES", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
close(3) = 0
openat(AT_FDCWD, "/usr/lib/locale/C.utf8/LC_MESSAGES/SYS_LC_MESSAGES", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=48, ...}) = 0
mmap(NULL, 48, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f9c274b1000
close(3) = 0
openat(AT_FDCWD, "/usr/lib/locale/C.UTF-8/LC_MONETARY", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/locale/C.utf8/LC_MONETARY", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=270, ...}) = 0
mmap(NULL, 270, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f9c274b0000
close(3) = 0
openat(AT_FDCWD, "/usr/lib/locale/C.UTF-8/LC_COLLATE", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/locale/C.utf8/LC_COLLATE", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=1406, ...}) = 0
mmap(NULL, 1406, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f9c274af000
close(3) = 0
openat(AT_FDCWD, "/usr/lib/locale/C.UTF-8/LC_TIME", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/locale/C.utf8/LC_TIME", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=3360, ...}) = 0
mmap(NULL, 3360, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f9c274ae000
close(3) = 0
openat(AT_FDCWD, "/usr/lib/locale/C.UTF-8/LC_NUMERIC", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/locale/C.utf8/LC_NUMERIC", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=50, ...}) = 0
mmap(NULL, 50, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f9c274ad000
close(3) = 0
openat(AT_FDCWD, "/usr/lib/locale/C.UTF-8/LC_CTYPE", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/locale/C.utf8/LC_CTYPE", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=367708, ...}) = 0
mmap(NULL, 367708, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f9c27453000
close(3) = 0
fstat(1, {st_mode=S_IFCHR|0600, st_rdev=makedev(0x88, 0), ...}) = 0
write(1, "Hello, world!", 13) = 13
close(1) = 0
close(2) = 0
exit_group(0) = ?
+++ exited with 0 +++ This was run on an x86_64 Arch Linux distribution.
Each line in the output is a syscall made by the program, printf. It has the
syscall name, the arguments, and the return value. If there was an error, it’ll
also helpfully append the
errno symbol to the end of
the line.
For example, the line
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
shows the syscall access, called with the arguments "/etc/ld.so.preload" and
R_OK, and returned -1, which means the file was not found.
An important detail: strace will print the syscall arguments after the syscall
is done. This means it will show the final value of any buffers or structs
modified by the kernel. This is important for syscalls like read and fstat,
which fills up the buffer passed to them as an argument.
This information is a great start, but this output just looks like a huge wall of text. It’s barely comprehensible. So what I’ll try to do here is break it down into smaller chunks and try to explain what’s going on. My hope is that you, the reader, will leave this article capable of understanding strace’s output as a whole, not line by line.
The exec
The first syscall is always exec. This is because strace will start tracing as
soon as the process forks, which happens before the accompanying exec. Here’s
the line for the syscall:
execve("/usr/bin/printf", ["printf", "Hello, world!"], 0x7ffdaab444e8 /* 18 vars */) = 0
fork? exec? What are those? fork asks the kernel to duplicate the current process. exec asks the kernel
to replace the current process with a different program, specified at the path
provided as the first argument.
So strace will clones itself with fork, and the process being traced will then
call exec to become the process we specified in the command line (in this
case, printf)
The Dynamic Linker (aka Runtime Loader)
The dynamic linker is in itself very complicated and very out of scope for this article, but it does make a lot of syscalls, which show up in strace. We’ll cover just the basics of it; enough to explain its syscalls.
Some libraries are so common that almost every sigle program depends on them (ex: libc). These libraries could be embeded into the programs, but that would cause almost every program running to have a duplicate of the same libary, wasting disk space and ram. Instead, what programs do is reference these libraries by name, but not include the actual code of the library. When the program starts, it executes a small subroutine that fetches the contents of library’s contents from the filesystem, loads them into the process’s address space, and fixes up any references to the library in the program to the correct addresses. This subroutine is the dynamic linker.
Do note that not all programs use the dynamic linker. A program with no runtime dependencies is called a static program, and does not invoke the dynamic linker.
With this in mind, let’s take a look at the output of strace that’s related to the dynamic linker:
brk(NULL) = 0x55f320282000
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=22731, ...}) = 0
mmap(NULL, 22731, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f9c274be000
close(3) = 0
openat(AT_FDCWD, "/usr/lib/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\360w\2\0\0\0\0\0"..., 832) = 832
pread64(3, "\6\0\0\0\4\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0"..., 896, 64) = 896
fstat(3, {st_mode=S_IFREG|0755, st_size=2145632, ...}) = 0
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f9c274bc000
pread64(3, "\6\0\0\0\4\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0"..., 896, 64) = 896
mmap(NULL, 2169904, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f9c27200000
mmap(0x7f9c27224000, 1511424, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x24000) = 0x7f9c27224000
mmap(0x7f9c27395000, 454656, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x195000) = 0x7f9c27395000
mmap(0x7f9c27404000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x203000) = 0x7f9c27404000
mmap(0x7f9c2740a000, 31792, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f9c2740a000
close(3) = 0
mmap(NULL, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f9c274b9000
arch_prctl(ARCH_SET_FS, 0x7f9c274b9740) = 0
set_tid_address(0x7f9c274b9a10) = 2670
set_robust_list(0x7f9c274b9a20, 24) = 0
rseq(0x7f9c274b9680, 0x20, 0, 0x53053053) = 0
mprotect(0x7f9c27404000, 16384, PROT_READ) = 0
mprotect(0x55f2e229b000, 4096, PROT_READ) = 0
mprotect(0x7f9c27505000, 8192, PROT_READ) = 0
prlimit64(0, RLIMIT_STACK, NULL, {rlim_cur=8192*1024, rlim_max=RLIM64_INFINITY}) = 0
getrandom("\x2f\x77\x04\x87\x6d\xb5\xb8\xd2", 8, GRND_NONBLOCK) = 8
munmap(0x7f9c274be000, 22731) = 0 This still looks like an incomprehensible wall of text, so let’s break it into smaller chunks:
brk(NULL) = 0x55f320282000
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=22731, ...}) = 0
mmap(NULL, 22731, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f9c274be000
close(3) = 0
The first thing it tries to do is load the ld.so.preload file, which is
missing, and then the ld.so.cache file, which is found and then mmaped into
memory. The ld.so.cache file assists the dynamic linker in finding the
location for the libraries it’ll later load.
mmap? From the point of view of a process, a memory page is a block of continuous non-overlapping memory (usually 4096 bytes or 0x1000, but can vary from system to system). There are more details regarding virtual memory but it’s out of scope for this article. When allocating/deallocating memory, the kernel doesn’t track individual bytes, only memory pages.
mmap is a system call
that allocates a new page to the process:
void *mmap(void *addr, size_t length, int prot, int flags, int fd, off_t offset);It has two modes relevant to this article: one for allocating file-backed pages,
and one for allocating anonymous pages. File-backed pages are initialized with
the contents of the file associated with the fd file descriptor, starting at
offset. Anonymous pages are zero-initialized and are not associated with any
file.
openat(AT_FDCWD, "/usr/lib/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\360w\2\0\0\0\0\0"..., 832) = 832
pread64(3, "\6\0\0\0\4\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0"..., 896, 64) = 896
fstat(3, {st_mode=S_IFREG|0755, st_size=2145632, ...}) = 0
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f9c274bc000
pread64(3, "\6\0\0\0\4\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0"..., 896, 64) = 896
TODO - add proper link to wikipedia program headers
This is when things start getting interesting, the dynamic linker is loading its
first (and, in this case, the only) library: libc.so.6. This is the famous
libc, which in this case is (correctly) assumed to be located at /usr/lib/.
The first read will load the first 832 bytes of the file on a buffer. This is
a guess made by the glibc authors of how long the
program headers will be. Since libc is usually huge,
832 is not enough, so it later does a pread call to load 896 bytes from the
library, which is presumably the size of all the necessary headers.
The fstat call reads some information about the library, most importantly the
file size, as it’ll be used soon. It then allocates 8192 bytes of memory (two
memory pages in my system) through mmap, presumably to store metadata about
the loaded libraries. Then, for some reason, it does another pread syscall to
a part of the library it previously read, which is honestly beyond me.
mmap(NULL, 2169904, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f9c27200000
mmap(0x7f9c27224000, 1511424, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x24000) = 0x7f9c27224000
mmap(0x7f9c27395000, 454656, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x195000) = 0x7f9c27395000
mmap(0x7f9c27404000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x203000) = 0x7f9c27404000
mmap(0x7f9c2740a000, 31792, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f9c2740a000
close(3) = 0
These mmaps are part of the library loading process. The first mmap will map
the entire library into the process’s memory space, and the following mmaps
will adjust the necessary permissions for the segments of the library. Lastly,
it closes the library file.
mmap(NULL, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f9c274b9000
arch_prctl(ARCH_SET_FS, 0x7f9c274b9740) = 0
set_tid_address(0x7f9c274b9a10) = 2670
set_robust_list(0x7f9c274b9a20, 24) = 0
rseq(0x7f9c274b9680, 0x20, 0, 0x53053053) = 0
These syscalls are responsible for setting up the thread-local memory used for multithreading. It’s really specific stuff that’s not really worth going into.
mprotect(0x7f9c27404000, 16384, PROT_READ) = 0
mprotect(0x55f2e229b000, 4096, PROT_READ) = 0
mprotect(0x7f9c27505000, 8192, PROT_READ) = 0
mprotect sets access protection to previously allocated memory pages. These
pages were previously allocated with write access, but are now being set to
read-only.
prlimit64(0, RLIMIT_STACK, NULL, {rlim_cur=8192*1024, rlim_max=RLIM64_INFINITY}) = 0
getrandom("\x2f\x77\x04\x87\x6d\xb5\xb8\xd2", 8, GRND_NONBLOCK) = 8
The prlimit64 is reading the current maximum stack size of the process, and
getrandom is fetching 8 random bytes. I’m not completely sure what this is
for, but I assume it’s for some sort of initialization.
munmap(0x7f9c274be000, 22731) = 0
Finally, this munmap is freeing the previously allocated pages for the
ld.so.cache file.
Cala Phew. That was quite a trip. Do I have to remember all this when tracing other programs?
No, this was just a brief explanation of what’s going on when a regular program starts with a dynamic linker. Since most programs use a dynamic linker, it’s good to have a general understanding of what’s going on, but we’ll mostly pay attention to what happens after the dynamic linker has done its job.
glibc localization
This is where our program really starts. Unfortunately, this specific program has a rather boring start. It begins by loading the system’s locale in order to display messages in the correct language and format.
brk(NULL) = 0x55f320282000
brk(0x55f3202a3000) = 0x55f3202a3000
openat(AT_FDCWD, "/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/share/locale/locale.alias", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=2998, ...}) = 0
read(3, "# Locale name alias data base.\n#"..., 4096) = 2998
read(3, "", 4096) = 0
close(3) = 0
openat(AT_FDCWD, "/usr/lib/locale/C.UTF-8/LC_IDENTIFICATION", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/locale/C.utf8/LC_IDENTIFICATION", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=258, ...}) = 0
mmap(NULL, 258, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f9c274c3000
close(3) = 0
openat(AT_FDCWD, "/usr/lib/gconv/gconv-modules.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=27010, ...}) = 0
mmap(NULL, 27010, PROT_READ, MAP_SHARED, 3, 0) = 0x7f9c274b2000
close(3) = 0
futex(0x7f9c2740972c, FUTEX_WAKE_PRIVATE, 2147483647) = 0
openat(AT_FDCWD, "/usr/lib/locale/C.UTF-8/LC_MEASUREMENT", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/locale/C.utf8/LC_MEASUREMENT", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=23, ...}) = 0
mmap(NULL, 23, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f9c274c2000
close(3) = 0
openat(AT_FDCWD, "/usr/lib/locale/C.UTF-8/LC_TELEPHONE", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/locale/C.utf8/LC_TELEPHONE", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=47, ...}) = 0
mmap(NULL, 47, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f9c274c1000
close(3) = 0
openat(AT_FDCWD, "/usr/lib/locale/C.UTF-8/LC_ADDRESS", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/locale/C.utf8/LC_ADDRESS", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=127, ...}) = 0
mmap(NULL, 127, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f9c274c0000
close(3) = 0
openat(AT_FDCWD, "/usr/lib/locale/C.UTF-8/LC_NAME", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/locale/C.utf8/LC_NAME", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=62, ...}) = 0
mmap(NULL, 62, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f9c274bf000
close(3) = 0
openat(AT_FDCWD, "/usr/lib/locale/C.UTF-8/LC_PAPER", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/locale/C.utf8/LC_PAPER", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=34, ...}) = 0
mmap(NULL, 34, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f9c274be000
close(3) = 0
openat(AT_FDCWD, "/usr/lib/locale/C.UTF-8/LC_MESSAGES", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/locale/C.utf8/LC_MESSAGES", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
close(3) = 0
openat(AT_FDCWD, "/usr/lib/locale/C.utf8/LC_MESSAGES/SYS_LC_MESSAGES", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=48, ...}) = 0
mmap(NULL, 48, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f9c274b1000
close(3) = 0
openat(AT_FDCWD, "/usr/lib/locale/C.UTF-8/LC_MONETARY", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/locale/C.utf8/LC_MONETARY", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=270, ...}) = 0
mmap(NULL, 270, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f9c274b0000
close(3) = 0
openat(AT_FDCWD, "/usr/lib/locale/C.UTF-8/LC_COLLATE", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/locale/C.utf8/LC_COLLATE", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=1406, ...}) = 0
mmap(NULL, 1406, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f9c274af000
close(3) = 0
openat(AT_FDCWD, "/usr/lib/locale/C.UTF-8/LC_TIME", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/locale/C.utf8/LC_TIME", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=3360, ...}) = 0
mmap(NULL, 3360, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f9c274ae000
close(3) = 0
openat(AT_FDCWD, "/usr/lib/locale/C.UTF-8/LC_NUMERIC", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/locale/C.utf8/LC_NUMERIC", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=50, ...}) = 0
mmap(NULL, 50, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f9c274ad000
close(3) = 0
openat(AT_FDCWD, "/usr/lib/locale/C.UTF-8/LC_CTYPE", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/locale/C.utf8/LC_CTYPE", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=367708, ...}) = 0
mmap(NULL, 367708, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f9c27453000
close(3) = 0 If you pay attention, it has a pattern: it first does openat to open a file,
fstat to get its size, mmap to pull it into memory, and close to close it
(duh). It’ll then use these files to make decisions on what messages to display
to the user, if needed. This specific system is very bare bones; it only has the
default C.UTF-8 locale, as you can guess from the files that are being
accessed.
Most programs don’t actually care about localization (even though they probably should), so you might not see this when tracing other programs. If you want to learn more about the localization process in glibc’s the manual TODO link.
The actual program
Until now, everything has just been library loading and initialization. Mostly boring stuff. But now that everything is ready, we can actually run the code that’s meaningful for our application
fstat(1, {st_mode=S_IFCHR|0600, st_rdev=makedev(0x88, 0), ...}) = 0
write(1, "Hello, world!", 13) = 13
close(1) = 0
close(2) = 0
exit_group(0) = ?
Yep. That’s it. To be fair, printf is a really simple program, so there isn’t
much to to see here. It first does a fstat on stdout to see what kind of
file descriptor it is, and if it’s open. It then writes our message to stdout,
closes stdout and stderr, and exits with 0.
A more interesting program: ping
ping is a classic networking tool. You probably used it at least once to check
if you have an internet connection. Have you ever wondered how it works? After
this article, you’ll know enough to implement your own version of it, if you’d
like.
Let’s run strace ping 127.0.0.1 -c 3
execve("/usr/bin/ping", ["ping", "127.0.0.1", "-c", "3"], 0x7ffd03f3f6a8 /* 18 vars */) = 0
[...Dynamic Linker...]
prctl(PR_CAPBSET_READ, CAP_MAC_OVERRIDE) = 0
prctl(PR_CAPBSET_READ, 0x30 /* CAP_??? */) = -1 EINVAL (Invalid argument)
prctl(PR_CAPBSET_READ, CAP_CHECKPOINT_RESTORE) = 0
prctl(PR_CAPBSET_READ, 0x2c /* CAP_??? */) = -1 EINVAL (Invalid argument)
prctl(PR_CAPBSET_READ, 0x2a /* CAP_??? */) = -1 EINVAL (Invalid argument)
prctl(PR_CAPBSET_READ, 0x29 /* CAP_??? */) = -1 EINVAL (Invalid argument)
brk(NULL) = 0x5555f791a000
brk(0x5555f793b000) = 0x5555f793b000
capget({version=_LINUX_CAPABILITY_VERSION_3, pid=0}, NULL) = 0
capget({version=_LINUX_CAPABILITY_VERSION_3, pid=0}, {effective=1<<CAP_CHOWN|1<<CAP_DAC_OVERRIDE|1<<CAP_DAC_READ_SEARCH|1<<CAP_FOWNER|1<<CAP_FSETID|1<<CAP_KILL|1<<CAP_SETGID|1<<CAP_SETUID|1<<CAP_SETPCAP|1<<CAP_LINUX_IMMUTABLE|1<<CAP_NET_BIND_SERVICE|1<<CAP_NET_BROADCAST|1<<CAP_NET_RAW|1<<CAP_IPC_OWNER|1<<CAP_SYS_CHROOT|1<<CAP_SYS_PTRACE|1<<CAP_SYS_ADMIN|1<<CAP_SYS_BOOT|1<<CAP_SYS_NICE|1<<CAP_SYS_RESOURCE|1<<CAP_SYS_TTY_CONFIG|1<<CAP_MKNOD|1<<CAP_LEASE|1<<CAP_AUDIT_WRITE|1<<CAP_AUDIT_CONTROL|1<<CAP_SETFCAP, permitted=1<<CAP_CHOWN|1<<CAP_DAC_OVERRIDE|1<<CAP_DAC_READ_SEARCH|1<<CAP_FOWNER|1<<CAP_FSETID|1<<CAP_KILL|1<<CAP_SETGID|1<<CAP_SETUID|1<<CAP_SETPCAP|1<<CAP_LINUX_IMMUTABLE|1<<CAP_NET_BIND_SERVICE|1<<CAP_NET_BROADCAST|1<<CAP_NET_RAW|1<<CAP_IPC_OWNER|1<<CAP_SYS_CHROOT|1<<CAP_SYS_PTRACE|1<<CAP_SYS_ADMIN|1<<CAP_SYS_BOOT|1<<CAP_SYS_NICE|1<<CAP_SYS_RESOURCE|1<<CAP_SYS_TTY_CONFIG|1<<CAP_MKNOD|1<<CAP_LEASE|1<<CAP_AUDIT_WRITE|1<<CAP_AUDIT_CONTROL|1<<CAP_SETFCAP, inheritable=1<<CAP_CHOWN|1<<CAP_DAC_OVERRIDE|1<<CAP_DAC_READ_SEARCH|1<<CAP_FOWNER|1<<CAP_FSETID|1<<CAP_KILL|1<<CAP_SETGID|1<<CAP_SETUID|1<<CAP_SETPCAP|1<<CAP_LINUX_IMMUTABLE|1<<CAP_NET_BIND_SERVICE|1<<CAP_NET_BROADCAST|1<<CAP_NET_RAW|1<<CAP_IPC_OWNER|1<<CAP_SYS_CHROOT|1<<CAP_SYS_PTRACE|1<<CAP_SYS_ADMIN|1<<CAP_SYS_BOOT|1<<CAP_SYS_NICE|1<<CAP_SYS_RESOURCE|1<<CAP_SYS_TTY_CONFIG|1<<CAP_MKNOD|1<<CAP_LEASE|1<<CAP_AUDIT_WRITE|1<<CAP_AUDIT_CONTROL|1<<CAP_SETFCAP}) = 0
capget({version=_LINUX_CAPABILITY_VERSION_3, pid=0}, NULL) = 0
capset({version=_LINUX_CAPABILITY_VERSION_3, pid=0}, {effective=0, permitted=1<<CAP_NET_RAW, inheritable=0}) = 0
prctl(PR_SET_KEEPCAPS, 1) = 0
getuid() = 0
setuid(0) = 0
prctl(PR_SET_KEEPCAPS, 0) = 0
getuid() = 0
[...glibc localization...]
capget({version=_LINUX_CAPABILITY_VERSION_3, pid=0}, NULL) = 0
capget({version=_LINUX_CAPABILITY_VERSION_3, pid=0}, {effective=0, permitted=1<<CAP_NET_RAW, inheritable=0}) = 0
capset({version=_LINUX_CAPABILITY_VERSION_3, pid=0}, {effective=1<<CAP_NET_RAW, permitted=1<<CAP_NET_RAW, inheritable=0}) = 0
socket(AF_INET, SOCK_DGRAM, IPPROTO_ICMP) = 3
socket(AF_INET6, SOCK_DGRAM, IPPROTO_ICMPV6) = 4
capget({version=_LINUX_CAPABILITY_VERSION_3, pid=0}, NULL) = 0
capget({version=_LINUX_CAPABILITY_VERSION_3, pid=0}, {effective=1<<CAP_NET_RAW, permitted=1<<CAP_NET_RAW, inheritable=0}) = 0
capset({version=_LINUX_CAPABILITY_VERSION_3, pid=0}, {effective=0, permitted=1<<CAP_NET_RAW, inheritable=0}) = 0
socket(AF_INET, SOCK_DGRAM, IPPROTO_IP) = 5
connect(5, {sa_family=AF_INET, sin_port=htons(1025), sin_addr=inet_addr("127.0.0.1")}, 16) = 0
getsockname(5, {sa_family=AF_INET, sin_port=htons(53455), sin_addr=inet_addr("127.0.0.1")}, [16]) = 0
close(5) = 0
setsockopt(3, SOL_IP, IP_RECVERR, [1], 4) = 0
setsockopt(3, SOL_IP, IP_RECVTTL, [1], 4) = 0
setsockopt(3, SOL_IP, IP_RETOPTS, [1], 4) = 0
setsockopt(3, SOL_SOCKET, SO_SNDBUF, [324], 4) = 0
setsockopt(3, SOL_SOCKET, SO_RCVBUF, [65536], 4) = 0
getsockopt(3, SOL_SOCKET, SO_RCVBUF, [131072], [4]) = 0
fstat(1, {st_mode=S_IFCHR|0600, st_rdev=makedev(0x88, 0), ...}) = 0
write(1, "PING 127.0.0.1 (127.0.0.1) 56(84"..., 49) = 49
setsockopt(3, SOL_SOCKET, SO_TIMESTAMP_OLD, [1], 4) = 0
setsockopt(3, SOL_SOCKET, SO_SNDTIMEO_OLD, "\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", 16) = 0
setsockopt(3, SOL_SOCKET, SO_RCVTIMEO_OLD, "\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", 16) = 0
rt_sigaction(SIGINT, {sa_handler=0x5555ca2ef2f0, sa_mask=[], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7fcbd6e3e4d0}, NULL, 8) = 0
rt_sigaction(SIGALRM, {sa_handler=0x5555ca2ef2f0, sa_mask=[], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7fcbd6e3e4d0}, NULL, 8) = 0
rt_sigaction(SIGQUIT, {sa_handler=0x5555ca2e9d60, sa_mask=[], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7fcbd6e3e4d0}, NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
ioctl(1, TCGETS2, {c_iflag=ICRNL|IXON|IXOFF|IUTF8, c_oflag=NL0|CR0|TAB0|BS0|VT0|FF0|OPOST|ONLCR, c_cflag=B38400|B38400<<IBSHIFT|CS8|CREAD, c_lflag=ISIG|ICANON|ECHO|ECHOE|ECHOK|ECHOKE, ...}) = 0
ioctl(1, TIOCGWINSZ, {ws_row=31, ws_col=125, ws_xpixel=1875, ws_ypixel=1023}) = 0
capget({version=_LINUX_CAPABILITY_VERSION_3, pid=0}, NULL) = 0
capset({version=_LINUX_CAPABILITY_VERSION_3, pid=0}, {effective=0, permitted=0, inheritable=0}) = 0
sendto(3, "\10\0 \231\377\377\0\1\333\264\275i\0\0\0\0qt\16\0\0\0\0\0\20\21\22\23\24\25\26\27"..., 64, 0, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("127.0.0.1")}, 16) = 64
recvmsg(3, {msg_name={sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("127.0.0.1")}, msg_namelen=128 => 16, msg_iov=[{iov_base="\0\0\33\311\f\320\0\1\333\264\275i\0\0\0\0qt\16\0\0\0\0\0\20\21\22\23\24\25\26\27"..., iov_len=192}], msg_iovlen=1, msg_control=[{cmsg_len=32, cmsg_level=SOL_SOCKET, cmsg_type=SO_TIMESTAMP_OLD, cmsg_data={tv_sec=1774040283, tv_usec=947715}}, {cmsg_len=20, cmsg_level=SOL_IP, cmsg_type=IP_TTL, cmsg_data=[64]}], msg_controllen=56, msg_flags=0}, 0) = 64
write(1, "64 bytes from 127.0.0.1: icmp_se"..., 57) = 57
poll([{fd=3, events=POLLIN}], 1, 999) = 0 (Timeout)
sendto(3, "\10\0\232\223\377\377\0\2\334\264\275i\0\0\0\0\366x\16\0\0\0\0\0\20\21\22\23\24\25\26\27"..., 64, 0, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("127.0.0.1")}, 16) = 64
recvmsg(3, {msg_name={sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("127.0.0.1")}, msg_namelen=128 => 16, msg_iov=[{iov_base="\0\0\225\303\f\320\0\2\334\264\275i\0\0\0\0\366x\16\0\0\0\0\0\20\21\22\23\24\25\26\27"..., iov_len=192}], msg_iovlen=1, msg_control=[{cmsg_len=32, cmsg_level=SOL_SOCKET, cmsg_type=SO_TIMESTAMP_OLD, cmsg_data={tv_sec=1774040284, tv_usec=948866}}, {cmsg_len=20, cmsg_level=SOL_IP, cmsg_type=IP_TTL, cmsg_data=[64]}], msg_controllen=56, msg_flags=0}, 0) = 64
write(1, "64 bytes from 127.0.0.1: icmp_se"..., 57) = 57
poll([{fd=3, events=POLLIN}], 1, 999) = 0 (Timeout)
sendto(3, "\10\0\263\210\377\377\0\3\335\264\275i\0\0\0\0\334\202\16\0\0\0\0\0\20\21\22\23\24\25\26\27"..., 64, 0, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("127.0.0.1")}, 16) = 64
setitimer(ITIMER_REAL, {it_interval={tv_sec=0, tv_usec=0}, it_value={tv_sec=1, tv_usec=0}}, NULL) = 0
recvmsg(3, {msg_name={sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("127.0.0.1")}, msg_namelen=128 => 16, msg_iov=[{iov_base="\0\0\256\270\f\320\0\3\335\264\275i\0\0\0\0\334\202\16\0\0\0\0\0\20\21\22\23\24\25\26\27"..., iov_len=192}], msg_iovlen=1, msg_control=[{cmsg_len=32, cmsg_level=SOL_SOCKET, cmsg_type=SO_TIMESTAMP_OLD, cmsg_data={tv_sec=1774040285, tv_usec=951433}}, {cmsg_len=20, cmsg_level=SOL_IP, cmsg_type=IP_TTL, cmsg_data=[64]}], msg_controllen=56, msg_flags=0}, 0) = 64
write(1, "64 bytes from 127.0.0.1: icmp_se"..., 57) = 57
write(1, "\n", 1) = 1
write(1, "--- 127.0.0.1 ping statistics --"..., 34) = 34
write(1, "3 packets transmitted, 3 receive"..., 63) = 63
write(1, "rtt min/avg/max/mdev = 0.396/0.4"..., 50) = 50
close(1) = 0
close(2) = 0
exit_group(0) = ? Do note that I omited the dynamic linker and the glibc localization sections.
They’re essentially the same thing as the printf program, so it’s not worth
going over. The rest of the output is directly related to the workings of ping
This is clearly a more complex program than printf. Again, it might seem like
a lot, but after taking a closer look, we can easily break down what’s going on.
Let’s start with the first piece:
Setting up initial capabilities
[...Dynamic Linker...]
prctl(PR_CAPBSET_READ, CAP_MAC_OVERRIDE) = 0
prctl(PR_CAPBSET_READ, 0x30 /* CAP_??? */) = -1 EINVAL (Invalid argument)
prctl(PR_CAPBSET_READ, CAP_CHECKPOINT_RESTORE) = 0
prctl(PR_CAPBSET_READ, 0x2c /* CAP_??? */) = -1 EINVAL (Invalid argument)
prctl(PR_CAPBSET_READ, 0x2a /* CAP_??? */) = -1 EINVAL (Invalid argument)
prctl(PR_CAPBSET_READ, 0x29 /* CAP_??? */) = -1 EINVAL (Invalid argument)
brk(NULL) = 0x5555f791a000
brk(0x5555f793b000) = 0x5555f793b000
capget({version=_LINUX_CAPABILITY_VERSION_3, pid=0}, NULL) = 0
capget({version=_LINUX_CAPABILITY_VERSION_3, pid=0}, {effective=1<<CAP_CHOWN|1<<CAP_DAC_OVERRIDE|1<<CAP_DAC_READ_SEARCH|1<<CAP_FOWNER|1<<CAP_FSETID|1<<CAP_KILL|1<<CAP_SETGID|1<<CAP_SETUID|1<<CAP_SETPCAP|1<<CAP_LINUX_IMMUTABLE|1<<CAP_NET_BIND_SERVICE|1<<CAP_NET_BROADCAST|1<<CAP_NET_RAW|1<<CAP_IPC_OWNER|1<<CAP_SYS_CHROOT|1<<CAP_SYS_PTRACE|1<<CAP_SYS_ADMIN|1<<CAP_SYS_BOOT|1<<CAP_SYS_NICE|1<<CAP_SYS_RESOURCE|1<<CAP_SYS_TTY_CONFIG|1<<CAP_MKNOD|1<<CAP_LEASE|1<<CAP_AUDIT_WRITE|1<<CAP_AUDIT_CONTROL|1<<CAP_SETFCAP, permitted=1<<CAP_CHOWN|1<<CAP_DAC_OVERRIDE|1<<CAP_DAC_READ_SEARCH|1<<CAP_FOWNER|1<<CAP_FSETID|1<<CAP_KILL|1<<CAP_SETGID|1<<CAP_SETUID|1<<CAP_SETPCAP|1<<CAP_LINUX_IMMUTABLE|1<<CAP_NET_BIND_SERVICE|1<<CAP_NET_BROADCAST|1<<CAP_NET_RAW|1<<CAP_IPC_OWNER|1<<CAP_SYS_CHROOT|1<<CAP_SYS_PTRACE|1<<CAP_SYS_ADMIN|1<<CAP_SYS_BOOT|1<<CAP_SYS_NICE|1<<CAP_SYS_RESOURCE|1<<CAP_SYS_TTY_CONFIG|1<<CAP_MKNOD|1<<CAP_LEASE|1<<CAP_AUDIT_WRITE|1<<CAP_AUDIT_CONTROL|1<<CAP_SETFCAP, inheritable=1<<CAP_CHOWN|1<<CAP_DAC_OVERRIDE|1<<CAP_DAC_READ_SEARCH|1<<CAP_FOWNER|1<<CAP_FSETID|1<<CAP_KILL|1<<CAP_SETGID|1<<CAP_SETUID|1<<CAP_SETPCAP|1<<CAP_LINUX_IMMUTABLE|1<<CAP_NET_BIND_SERVICE|1<<CAP_NET_BROADCAST|1<<CAP_NET_RAW|1<<CAP_IPC_OWNER|1<<CAP_SYS_CHROOT|1<<CAP_SYS_PTRACE|1<<CAP_SYS_ADMIN|1<<CAP_SYS_BOOT|1<<CAP_SYS_NICE|1<<CAP_SYS_RESOURCE|1<<CAP_SYS_TTY_CONFIG|1<<CAP_MKNOD|1<<CAP_LEASE|1<<CAP_AUDIT_WRITE|1<<CAP_AUDIT_CONTROL|1<<CAP_SETFCAP}) = 0
capget({version=_LINUX_CAPABILITY_VERSION_3, pid=0}, NULL) = 0
capset({version=_LINUX_CAPABILITY_VERSION_3, pid=0}, {effective=0, permitted=1<<CAP_NET_RAW, inheritable=0}) = 0
prctl(PR_SET_KEEPCAPS, 1) = 0
getuid() = 0
setuid(0) = 0
prctl(PR_SET_KEEPCAPS, 0) = 0
getuid() = 0
[...glibc localization...]
Before we begin, a little bit of history. It used to be that a program would
need to be run as root to send ping packets. This requirement changed on Linux
2.6.24, with the introduction of
file-based capabilities,
as you could just mark the binary with the required capabilities. This
requirement once again changed on kernel 2.6.39, with the kernel parameter
net.ipv4.ping_group_range,
that could allow any user to send ping packets without any special permissions.
These details aren’t incredibly important; what matters is to understand that
ping was written with all of these possibilities in mind.
On linux, “capabilities” are one of the many ways the kernel controls what a
process can and cannot do. You can think of them as sets of permissions that
threads have. For example, the CAP_SYS_CHROOT allows a process to change its
root directory; CAP_NET_BROADCAST allows a process to send broadcast packets
to the network; and CAP_NET_RAW allows a process to send raw IP and Ethernet
packets.
CAP_NET_RAW is the one that’s important here. Processes don’t need any special
permissions to send TCP or UDP packets, but ping is neither; it’s an ICMP
packet. In order to send an ICMP packet, processes need to construct a “raw” IP
packet, which require the CAP_NET_RAW capability.
Threads have a few sets of capabilities, where the two most important ones for
this article are the permitted and effective sets. If a capability is in the
effective set, then the thread has all the permissions that capability could
grant; if it’s in the permitted set, then the thread doesn’t necessarily have
the permissions of that capability, but it has the ability to later add that
capability to its effective set. You can think of the permitted set as
capabilities that aren’t active, but can later be activated.
With this in mind, it’s important to note that I ran Ping as root. I made this choice to mimic how Ping used to work as an SUID binary. This will trigger all the capability juggling that Ping used to do, and will make for an interesting study case. A non-root execution of ping will do essentially the same thing, but we won’t be able to see all that Ping does to limit its capabilities.
A suid (Set User ID) program has a special property that, whenever it’s run,
it’ll be run as the user that owns the program. So. If the user john is the
owner of a suid program foo, and alice runs foo, the resulting process
will be owned by john, not alice.
This is mostly used so users can run programs as root without having to become
root themselves. In this case, this means you can run ping without having to
use sudo.
As you can see in the syscalls above, there are a lot of capability checks,
making sure the process can do what it has to do, and at the end, it calls
capset in order to drop all the unnecessary capabilities (all except
CAP_NET_RAW). Also note that the CAP_NET_RAW capability is only set in the
permitted set, but not the effective set, which means the program can’t use
this capability by accident, but can later grant itself that capability when
needed.
capget({version=_LINUX_CAPABILITY_VERSION_3, pid=0}, NULL) = 0
capget({version=_LINUX_CAPABILITY_VERSION_3, pid=0}, {effective=0, permitted=1<<CAP_NET_RAW, inheritable=0}) = 0
capset({version=_LINUX_CAPABILITY_VERSION_3, pid=0}, {effective=1<<CAP_NET_RAW, permitted=1<<CAP_NET_RAW, inheritable=0}) = 0
socket(AF_INET, SOCK_DGRAM, IPPROTO_ICMP) = 3
socket(AF_INET6, SOCK_DGRAM, IPPROTO_ICMPV6) = 4
capget({version=_LINUX_CAPABILITY_VERSION_3, pid=0}, NULL) = 0
capget({version=_LINUX_CAPABILITY_VERSION_3, pid=0}, {effective=1<<CAP_NET_RAW, permitted=1<<CAP_NET_RAW, inheritable=0}) = 0
capset({version=_LINUX_CAPABILITY_VERSION_3, pid=0}, {effective=0, permitted=1<<CAP_NET_RAW, inheritable=0}) = 0
These syscalls show a pattern that will be prevalent on the rest of the program:
whenever it needs to do an operation that requires a capability, it will first
move that capability to its effective set, do the operation, and immediately
drop that capability from its effective set. This prevents the program from
accidentaly doing anything destructive, and minimizing damage in case the
program has any vulnerabilities.
This block creates two sockets: one as an ICMP socket, and the other as an ICMPv6 socket. We won’t be using the ICMPv6 socket (file descriptor 4), as we’re pinging an IPv4. On the other hand, The regular ICMP socket (file descriptor 3) is very important, as it’s the one that will soon send the actual ping packet.
socket(AF_INET, SOCK_DGRAM, IPPROTO_IP) = 5
connect(5, {sa_family=AF_INET, sin_port=htons(1025), sin_addr=inet_addr("127.0.0.1")}, 16) = 0
getsockname(5, {sa_family=AF_INET, sin_port=htons(53455), sin_addr=inet_addr("127.0.0.1")}, [16]) = 0
close(5) = 0
I’m unsure of what’s going on here. I think it’s doing a check to see if it’s in a specific kind of system. Anyway, it’s irrelevant to the rest of the program.
setsockopt(3, SOL_IP, IP_RECVERR, [1], 4) = 0
setsockopt(3, SOL_IP, IP_RECVTTL, [1], 4) = 0
setsockopt(3, SOL_IP, IP_RETOPTS, [1], 4) = 0
setsockopt(3, SOL_SOCKET, SO_SNDBUF, [324], 4) = 0
setsockopt(3, SOL_SOCKET, SO_RCVBUF, [65536], 4) = 0
getsockopt(3, SOL_SOCKET, SO_RCVBUF, [131072], [4]) = 0
fstat(1, {st_mode=S_IFCHR|0600, st_rdev=makedev(0x88, 0), ...}) = 0
write(1, "PING 127.0.0.1 (127.0.0.1) 56(84"..., 49) = 49
setsockopt(3, SOL_SOCKET, SO_TIMESTAMP_OLD, [1], 4) = 0
setsockopt(3, SOL_SOCKET, SO_SNDTIMEO_OLD, "\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", 16) = 0
setsockopt(3, SOL_SOCKET, SO_RCVTIMEO_OLD, "\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", 16) = 0
This section is just setting some socket options. They aren’t crucial for the functioning of the program, but they help with better error messages and performance. If you’re curious about what they do, you can read about them on the ip(7) manual page and the socket(7) manual page
rt_sigaction(SIGINT, {sa_handler=0x5555ca2ef2f0, sa_mask=[], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7fcbd6e3e4d0}, NULL, 8) = 0
rt_sigaction(SIGALRM, {sa_handler=0x5555ca2ef2f0, sa_mask=[], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7fcbd6e3e4d0}, NULL, 8) = 0
rt_sigaction(SIGQUIT, {sa_handler=0x5555ca2e9d60, sa_mask=[], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7fcbd6e3e4d0}, NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
It then sets the handler functions for the SIGINT, SIGALARM, and SIGQUIT
signals.
ioctl(1, TCGETS2, {c_iflag=ICRNL|IXON|IXOFF|IUTF8, c_oflag=NL0|CR0|TAB0|BS0|VT0|FF0|OPOST|ONLCR, c_cflag=B38400|B38400<<IBSHIFT|CS8|CREAD, c_lflag=ISIG|ICANON|ECHO|ECHOE|ECHOK|ECHOKE, ...}) = 0
ioctl(1, TIOCGWINSZ, {ws_row=31, ws_col=125, ws_xpixel=1875, ws_ypixel=1023}) = 0
These lines are just getting information regarding the terimnal connected to sdout. I’m not sure what it does with this information, but presumably it’s useful for other kinds of outputs that aren’t a command line.
capget({version=_LINUX_CAPABILITY_VERSION_3, pid=0}, NULL) = 0
capset({version=_LINUX_CAPABILITY_VERSION_3, pid=0}, {effective=0, permitted=0, inheritable=0}) = 0
This is one last capability drop. Now that we have our ICMP socket set up, we
don’t net CAP_NET_RAW anymore, so we remove it from all sets, including the
permitted set. This means it’s impossible to aquire this capability again.
sendto(3, "\10\0 \231\377\377\0\1\333\264\275i\0\0\0\0qt\16\0\0\0\0\0\20\21\22\23\24\25\26\27"..., 64, 0, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("127.0.0.1")}, 16) = 64
Now we’re getting into the interesting part. We’re finally sending some packets! Let’s have a quick reminder of how an ICMP Echo (ping) looks like, From RFC 792:
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Type (8) | Code (0) | Checksum |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Identifier | Sequence Number |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Data ...
+-+-+-+-+-
For a ping request, the type field is always 8, and code is always 0. The
Identifier and sequence number can be anything you want. Now, let’s look at the
string being given to the sendto syscall:
\10\0 \231\377\377\0\1\333\264\275i\0\0\0\0qt\16\0\0\0\0\0\20\21\22\23\24\25\26\27\30\31\32\33\34\35\36\37 !\"#$%&'()*+,-./01234567
Note that in the original trace, the string was hidden in an ellipse. I re-ran
the strace command with the -s 9999 argument for it to expand the string.
Strings in strace are represented in a C escape sequence style. This means
that backslashes followed by a number are an octal representation of a byte. For
example, \10 represents 8 in decimal, and \377 represents 255 in decimal.
Given that this is supposed to be a ping packet, let’s convert that string into
hex format:
0000 08 00 0A 99 ff ff 00 01
0008 db b4 bd 69 00 00 00 00
0010 71 74 0e 00 00 00 00 00
0018 10 11 12 13 14 15 16 17
0020 18 19 1a 1b 1c 1d 1e 1f
0028 20 21 22 23 24 25 26 27
0030 28 29 2a 2b 2c 2d 2e 2f
0038 30 31 32 33 34 35 36 37
Now it’s starting to make sense! The first and second bytes match the 8 and 0
values that we expected for an echo packet. Then, there’s the checksum
(0A 99), the Identifier (ff ff), and the sequence number (00 01).
Everything after that is the data.
The data section is meaningless from the perspective of the ICMP protocol. Some
implementations of ping just fill it up with random stuff. But after fiddling
a bit with the numbers above, I figured out what some of them mean.
From byte 8 to byte 15 (db b4 bd 69 00 00 00 00) we have a little-endian unix
timestamp in seconds. Transforming he above string to decimal, we get
1774040283, and transforming it to a date we get 2026-03-20 20:58:03, the
exact date I ran this ping program.
From byte 16 to 23 (71 74 0e 00 00 00 00 00) is the microseconds portion of
the timestamp, also in little-endian. In this case, it’s 947313us (TODO - add
micro unicode).
Byte 24 to 64 is always this sequence of numbers that increase by one. I’m not sure why.
This is the whole packet that’s constructed by Ping, and sent to our previously created ICMP socket. The next step is to get a reply:
recvmsg(3, {msg_name={sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("127.0.0.1")}, msg_namelen=128 => 16, msg_iov=[{iov_base="\0\0\33\311\f\320\0\1\333\264\275i\0\0\0\0qt\16\0\0\0\0\0\20\21\22\23\24\25\26\27"..., iov_len=192}], msg_iovlen=1, msg_control=[{cmsg_len=32, cmsg_level=SOL_SOCKET, cmsg_type=SO_TIMESTAMP_OLD, cmsg_data={tv_sec=1774040283, tv_usec=947715}}, {cmsg_len=20, cmsg_level=SOL_IP, cmsg_type=IP_TTL, cmsg_data=[64]}], msg_controllen=56, msg_flags=0}, 0) = 64
Let me prettify this so it’s easier to read:
recvmsg(3, {
msg_name={
sa_family=AF_INET,
sin_port=htons(0),
sin_addr=inet_addr("127.0.0.1")
},
msg_namelen=128 => 16,
msg_iov=[
{
iov_base="\0\0\33\311\f\320\0\1\333\264\275i\0\0\0\0qt\16\0\0\0\0\0\20\21\22\23\24\25\26\27\30\31\32\33\34\35\36\37 !\"#$%&'()*+,-./01234567",
iov_len=192
}
],
msg_iovlen=1,
msg_control=[
{
cmsg_len=32,
cmsg_level=SOL_SOCKET,
cmsg_type=SO_TIMESTAMP_OLD,
cmsg_data={
tv_sec=1774040283,
tv_usec=947715
947313
}
},
{
cmsg_len=20,
cmsg_level=SOL_IP,
cmsg_type=IP_TTL,
cmsg_data=[64]
}
],
msg_controllen=56,
msg_flags=0
}, 0) = 64
Alright, looking good! msg_name tells us we’re getting an AF_INET packet
(IPv4) from 127.0.0.1 to a null port (ICMP packets don’t have ports).
msg_iov has the actual contents of the packet. Again, it was truncated in the
original strace output; I added the rest myself after re-running it with
-s 99999. msg_control has some meta-information regarding the packet, which
was requested when seeting the socket options (TODO - add link to previous
section). These control messages tells the timestamp that the packet arrived,
and its TTL (Time to Live). Note that the timestamp here is also split into the
seconds part and microseconds part. The seconds part is the exact same as the
seconds sent in the original packet. The microseconds part, however, is 402
microseconds greater than our original, which immediately tells us the roundtrip
time.
Now, decoding the contens of msg_iov, we can see this packet:
0000 00 00 1b c9 0c d0 00 01
0008 db b4 bd 69 00 00 00 00
0010 71 74 0e 00 00 00 00 00
0018 10 11 12 13 14 15 16 17
0020 18 19 1a 1b 1c 1d 1e 1f
0028 20 21 22 23 24 25 26 27
0030 28 29 2a 2b 2c 2d 2e 30
0038 31 32 32 33 34 35 36 37
The only thing that’s different here is the type byte, which is 0 on an echo
response. The rest of the packet should be very familiar: the checksum
(1b c9), the identifier (0c d0), and the sequence number (00 01). The rest
of the packet is the data that was sent on the previous instruction. It’s all
making sense here
write(1, "64 bytes from 127.0.0.1: icmp_se"..., 57) = 57
poll([{fd=3, events=POLLIN}], 1, 999) = 0 (Timeout)
Ping then reports the time it took from packet sent to packet received, and waits for the socket to become available to send again.
sendto(3, "\10\0\232\223\377\377\0\2\334\264\275i\0\0\0\0\366x\16\0\0\0\0\0\20\21\22\23\24\25\26\27"..., 64, 0, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("127.0.0.1")}, 16) = 64
recvmsg(3, {msg_name={sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("127.0.0.1")}, msg_namelen=128 => 16, msg_iov=[{iov_base="\0\0\225\303\f\320\0\2\334\264\275i\0\0\0\0\366x\16\0\0\0\0\0\20\21\22\23\24\25\26\27"..., iov_len=192}], msg_iovlen=1, msg_control=[{cmsg_len=32, cmsg_level=SOL_SOCKET, cmsg_type=SO_TIMESTAMP_OLD, cmsg_data={tv_sec=1774040284, tv_usec=948866}}, {cmsg_len=20, cmsg_level=SOL_IP, cmsg_type=IP_TTL, cmsg_data=[64]}], msg_controllen=56, msg_flags=0}, 0) = 64
write(1, "64 bytes from 127.0.0.1: icmp_se"..., 57) = 57
poll([{fd=3, events=POLLIN}], 1, 999) = 0 (Timeout)
Ping then just repeats the previous steps: send a ICMP Echo Request packet, receives an ICMP Echo Response packet, reports the time it took between send and receive, and waits for the socket to be available again. The only meaningful difference is that the sequence number here is 2 instead of 1, as this is the second packet.
sendto(3, "\10\0\263\210\377\377\0\3\335\264\275i\0\0\0\0\334\202\16\0\0\0\0\0\20\21\22\23\24\25\26\27"..., 64, 0, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("127.0.0.1")}, 16) = 64
setitimer(ITIMER_REAL, {it_interval={tv_sec=0, tv_usec=0}, it_value={tv_sec=1, tv_usec=0}}, NULL) = 0
recvmsg(3, {msg_name={sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("127.0.0.1")}, msg_namelen=128 => 16, msg_iov=[{iov_base="\0\0\256\270\f\320\0\3\335\264\275i\0\0\0\0\334\202\16\0\0\0\0\0\20\21\22\23\24\25\26\27"..., iov_len=192}], msg_iovlen=1, msg_control=[{cmsg_len=32, cmsg_level=SOL_SOCKET, cmsg_type=SO_TIMESTAMP_OLD, cmsg_data={tv_sec=1774040285, tv_usec=951433}}, {cmsg_len=20, cmsg_level=SOL_IP, cmsg_type=IP_TTL, cmsg_data=[64]}], msg_controllen=56, msg_flags=0}, 0) = 64
write(1, "64 bytes from 127.0.0.1: icmp_se"..., 57) = 57
This part should be just another repetition of the precious loop, but it has a small difference: it sets a timer of one second after sending the packet. This is so the program doesn’t hang forever if the last packet is lost.
write(1, "\n", 1) = 1
write(1, "--- 127.0.0.1 ping statistics --"..., 34) = 34
write(1, "3 packets transmitted, 3 receive"..., 63) = 63
write(1, "rtt min/avg/max/mdev = 0.396/0.4"..., 50) = 50
close(1) = 0
close(2) = 0
exit_group(0) = ?
And finally, the program finished, printing relevant statistics of how many packets were dropped, and statistics about their latency. This is the entirety of Ping.
Conclusion
This was an incredibly extensive analysis of printf and ping using strace.
The main idea is not for you to memorise this whole thing, but to give you a
better understanding of what can be accomplished with a careful use of strace