Strace: Understanding Software Through Syscalls

Written on

We’re constantly running software that we don’t exactly understand. We know it does things, useful things, but now always how it gets the necessary information, or what effects it causes on our system. Is it from a file, deep within my hard drive? Is it from the operating system? Or is it fetching from the Internet?

And so, in comes strace. It’s a program that lets you trace (inspect) all the communication between a process and the Linux Kernel. This means you can monitor all the information that comes in and out of your process. It’s an invaluable tool for when a process is misbehaving, or to learn more about how things work.

Nelly, the Nerdy Newt says: Nelly

Just read the source code. Most software in Linux is source available. It’s not that hard.

While the source code is the ultimate source of truth, it’s also very time consuming to read, if you even know how to read it, that is. Most of the time, we don’t care much about each individual instruction a program is doing, only what it’s reading and writing.

In this article, I intend to not only explain strace’s output format, but also to go through the entire output of tracing a regular program, explaining bit by bit of what’s going on. My hope is that you, the reader, will leave this article capable of parsing strace’s output to find what you’re looking for.

First, what does strace do?

Strace is a syscall tracer. It prints every syscall a process does. Without syscalls, a process is just a very complicated electric heater, so inspecting syscalls is crucial for understanding what it does.

What is a syscall?

A system call (abbreviated to syscall) is a request that programs make to the operating system (OS), which in this case is the Linux Kernel. The OS is responsible for managing all resources on a computer, so if a program wants to access any resource, it must ask the OS first.

You can think of a syscall as a function call to the OS. It has arguments and a return value (always an integer). Reading files? open and read syscalls; Writing files? open and write syscalls; Connecting to the internet? socket and connect syscalls.

Running strace is very simple: strace [OPTIONS]... PROGRAM [ARGS].... So, for tracing a program, like printf, you’d do strace printf "hello, world!".

Understanding the output

Let’s try running the line mentioned above: strace printf "Hello, wold!"

$ strace printf "Hello, world!"

execve("/usr/bin/printf", ["printf", "Hello, world!"], 0x7ffdaab444e8 /* 18 vars */) = 0
brk(NULL)                               = 0x55f320282000
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=22731, ...}) = 0
mmap(NULL, 22731, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f9c274be000
close(3)                                = 0
openat(AT_FDCWD, "/usr/lib/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\360w\2\0\0\0\0\0"..., 832) = 832
pread64(3, "\6\0\0\0\4\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0"..., 896, 64) = 896
fstat(3, {st_mode=S_IFREG|0755, st_size=2145632, ...}) = 0
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f9c274bc000
pread64(3, "\6\0\0\0\4\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0"..., 896, 64) = 896
mmap(NULL, 2169904, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f9c27200000
mmap(0x7f9c27224000, 1511424, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x24000) = 0x7f9c27224000
mmap(0x7f9c27395000, 454656, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x195000) = 0x7f9c27395000
mmap(0x7f9c27404000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x203000) = 0x7f9c27404000
mmap(0x7f9c2740a000, 31792, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f9c2740a000
close(3)                                = 0
mmap(NULL, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f9c274b9000
arch_prctl(ARCH_SET_FS, 0x7f9c274b9740) = 0
set_tid_address(0x7f9c274b9a10)         = 2670
set_robust_list(0x7f9c274b9a20, 24)     = 0
rseq(0x7f9c274b9680, 0x20, 0, 0x53053053) = 0
mprotect(0x7f9c27404000, 16384, PROT_READ) = 0
mprotect(0x55f2e229b000, 4096, PROT_READ) = 0
mprotect(0x7f9c27505000, 8192, PROT_READ) = 0
prlimit64(0, RLIMIT_STACK, NULL, {rlim_cur=8192*1024, rlim_max=RLIM64_INFINITY}) = 0
getrandom("\x2f\x77\x04\x87\x6d\xb5\xb8\xd2", 8, GRND_NONBLOCK) = 8
munmap(0x7f9c274be000, 22731)           = 0
brk(NULL)                               = 0x55f320282000
brk(0x55f3202a3000)                     = 0x55f3202a3000
openat(AT_FDCWD, "/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/share/locale/locale.alias", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=2998, ...}) = 0
read(3, "# Locale name alias data base.\n#"..., 4096) = 2998
read(3, "", 4096)                       = 0
close(3)                                = 0
openat(AT_FDCWD, "/usr/lib/locale/C.UTF-8/LC_IDENTIFICATION", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/locale/C.utf8/LC_IDENTIFICATION", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=258, ...}) = 0
mmap(NULL, 258, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f9c274c3000
close(3)                                = 0
openat(AT_FDCWD, "/usr/lib/gconv/gconv-modules.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=27010, ...}) = 0
mmap(NULL, 27010, PROT_READ, MAP_SHARED, 3, 0) = 0x7f9c274b2000
close(3)                                = 0
futex(0x7f9c2740972c, FUTEX_WAKE_PRIVATE, 2147483647) = 0
openat(AT_FDCWD, "/usr/lib/locale/C.UTF-8/LC_MEASUREMENT", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/locale/C.utf8/LC_MEASUREMENT", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=23, ...}) = 0
mmap(NULL, 23, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f9c274c2000
close(3)                                = 0
openat(AT_FDCWD, "/usr/lib/locale/C.UTF-8/LC_TELEPHONE", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/locale/C.utf8/LC_TELEPHONE", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=47, ...}) = 0
mmap(NULL, 47, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f9c274c1000
close(3)                                = 0
openat(AT_FDCWD, "/usr/lib/locale/C.UTF-8/LC_ADDRESS", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/locale/C.utf8/LC_ADDRESS", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=127, ...}) = 0
mmap(NULL, 127, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f9c274c0000
close(3)                                = 0
openat(AT_FDCWD, "/usr/lib/locale/C.UTF-8/LC_NAME", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/locale/C.utf8/LC_NAME", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=62, ...}) = 0
mmap(NULL, 62, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f9c274bf000
close(3)                                = 0
openat(AT_FDCWD, "/usr/lib/locale/C.UTF-8/LC_PAPER", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/locale/C.utf8/LC_PAPER", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=34, ...}) = 0
mmap(NULL, 34, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f9c274be000
close(3)                                = 0
openat(AT_FDCWD, "/usr/lib/locale/C.UTF-8/LC_MESSAGES", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/locale/C.utf8/LC_MESSAGES", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
close(3)                                = 0
openat(AT_FDCWD, "/usr/lib/locale/C.utf8/LC_MESSAGES/SYS_LC_MESSAGES", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=48, ...}) = 0
mmap(NULL, 48, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f9c274b1000
close(3)                                = 0
openat(AT_FDCWD, "/usr/lib/locale/C.UTF-8/LC_MONETARY", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/locale/C.utf8/LC_MONETARY", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=270, ...}) = 0
mmap(NULL, 270, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f9c274b0000
close(3)                                = 0
openat(AT_FDCWD, "/usr/lib/locale/C.UTF-8/LC_COLLATE", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/locale/C.utf8/LC_COLLATE", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=1406, ...}) = 0
mmap(NULL, 1406, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f9c274af000
close(3)                                = 0
openat(AT_FDCWD, "/usr/lib/locale/C.UTF-8/LC_TIME", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/locale/C.utf8/LC_TIME", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=3360, ...}) = 0
mmap(NULL, 3360, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f9c274ae000
close(3)                                = 0
openat(AT_FDCWD, "/usr/lib/locale/C.UTF-8/LC_NUMERIC", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/locale/C.utf8/LC_NUMERIC", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=50, ...}) = 0
mmap(NULL, 50, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f9c274ad000
close(3)                                = 0
openat(AT_FDCWD, "/usr/lib/locale/C.UTF-8/LC_CTYPE", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/locale/C.utf8/LC_CTYPE", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=367708, ...}) = 0
mmap(NULL, 367708, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f9c27453000
close(3)                                = 0
fstat(1, {st_mode=S_IFCHR|0600, st_rdev=makedev(0x88, 0), ...}) = 0
write(1, "Hello, world!", 13)           = 13
close(1)                                = 0
close(2)                                = 0
exit_group(0)                           = ?
+++ exited with 0 +++

This was run on an x86_64 Arch Linux distribution.

Each line in the output is a syscall made by the program, printf. It has the syscall name, the arguments, and the return value. If there was an error, it’ll also helpfully append the errno symbol to the end of the line.

For example, the line access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory) shows the syscall access, called with the arguments "/etc/ld.so.preload" and R_OK, and returned -1, which means the file was not found.

An important detail: strace will print the syscall arguments after the syscall is done. This means it will show the final value of any buffers or structs modified by the kernel. This is important for syscalls like read and fstat, which fills up the buffer passed to them as an argument.

This information is a great start, but this output just looks like a huge wall of text. It’s barely comprehensible. So what I’ll try to do here is break it down into smaller chunks and try to explain what’s going on. My hope is that you, the reader, will leave this article capable of understanding strace’s output as a whole, not line by line.

The exec

The first syscall is always exec. This is because strace will start tracing as soon as the process forks, which happens before the accompanying exec. Here’s the line for the syscall:

execve("/usr/bin/printf", ["printf", "Hello, world!"], 0x7ffdaab444e8 /* 18 vars */) = 0

fork? exec? What are those?

fork asks the kernel to duplicate the current process. exec asks the kernel to replace the current process with a different program, specified at the path provided as the first argument.

So strace will clones itself with fork, and the process being traced will then call exec to become the process we specified in the command line (in this case, printf)

The Dynamic Linker (aka Runtime Loader)

The dynamic linker is in itself very complicated and very out of scope for this article, but it does make a lot of syscalls, which show up in strace. We’ll cover just the basics of it; enough to explain its syscalls.

Some libraries are so common that almost every sigle program depends on them (ex: libc). These libraries could be embeded into the programs, but that would cause almost every program running to have a duplicate of the same libary, wasting disk space and ram. Instead, what programs do is reference these libraries by name, but not include the actual code of the library. When the program starts, it executes a small subroutine that fetches the contents of library’s contents from the filesystem, loads them into the process’s address space, and fixes up any references to the library in the program to the correct addresses. This subroutine is the dynamic linker.

Do note that not all programs use the dynamic linker. A program with no runtime dependencies is called a static program, and does not invoke the dynamic linker.

With this in mind, let’s take a look at the output of strace that’s related to the dynamic linker:

brk(NULL)                               = 0x55f320282000
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=22731, ...}) = 0
mmap(NULL, 22731, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f9c274be000
close(3)                                = 0
openat(AT_FDCWD, "/usr/lib/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\360w\2\0\0\0\0\0"..., 832) = 832
pread64(3, "\6\0\0\0\4\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0"..., 896, 64) = 896
fstat(3, {st_mode=S_IFREG|0755, st_size=2145632, ...}) = 0
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f9c274bc000
pread64(3, "\6\0\0\0\4\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0"..., 896, 64) = 896
mmap(NULL, 2169904, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f9c27200000
mmap(0x7f9c27224000, 1511424, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x24000) = 0x7f9c27224000
mmap(0x7f9c27395000, 454656, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x195000) = 0x7f9c27395000
mmap(0x7f9c27404000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x203000) = 0x7f9c27404000
mmap(0x7f9c2740a000, 31792, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f9c2740a000
close(3)                                = 0
mmap(NULL, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f9c274b9000
arch_prctl(ARCH_SET_FS, 0x7f9c274b9740) = 0
set_tid_address(0x7f9c274b9a10)         = 2670
set_robust_list(0x7f9c274b9a20, 24)     = 0
rseq(0x7f9c274b9680, 0x20, 0, 0x53053053) = 0
mprotect(0x7f9c27404000, 16384, PROT_READ) = 0
mprotect(0x55f2e229b000, 4096, PROT_READ) = 0
mprotect(0x7f9c27505000, 8192, PROT_READ) = 0
prlimit64(0, RLIMIT_STACK, NULL, {rlim_cur=8192*1024, rlim_max=RLIM64_INFINITY}) = 0
getrandom("\x2f\x77\x04\x87\x6d\xb5\xb8\xd2", 8, GRND_NONBLOCK) = 8
munmap(0x7f9c274be000, 22731)           = 0

This still looks like an incomprehensible wall of text, so let’s break it into smaller chunks:

brk(NULL)                               = 0x55f320282000
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=22731, ...}) = 0
mmap(NULL, 22731, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f9c274be000
close(3)                                = 0

The first thing it tries to do is load the ld.so.preload file, which is missing, and then the ld.so.cache file, which is found and then mmaped into memory. The ld.so.cache file assists the dynamic linker in finding the location for the libraries it’ll later load.

What are memory pages? And mmap?

From the point of view of a process, a memory page is a block of continuous non-overlapping memory (usually 4096 bytes or 0x1000, but can vary from system to system). There are more details regarding virtual memory but it’s out of scope for this article. When allocating/deallocating memory, the kernel doesn’t track individual bytes, only memory pages.

mmap is a system call that allocates a new page to the process:

void *mmap(void *addr, size_t length, int prot, int flags, int fd, off_t offset);

It has two modes relevant to this article: one for allocating file-backed pages, and one for allocating anonymous pages. File-backed pages are initialized with the contents of the file associated with the fd file descriptor, starting at offset. Anonymous pages are zero-initialized and are not associated with any file.

openat(AT_FDCWD, "/usr/lib/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\360w\2\0\0\0\0\0"..., 832) = 832
pread64(3, "\6\0\0\0\4\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0"..., 896, 64) = 896
fstat(3, {st_mode=S_IFREG|0755, st_size=2145632, ...}) = 0
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f9c274bc000
pread64(3, "\6\0\0\0\4\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0"..., 896, 64) = 896

TODO - add proper link to wikipedia program headers

This is when things start getting interesting, the dynamic linker is loading its first (and, in this case, the only) library: libc.so.6. This is the famous libc, which in this case is (correctly) assumed to be located at /usr/lib/. The first read will load the first 832 bytes of the file on a buffer. This is a guess made by the glibc authors of how long the program headers will be. Since libc is usually huge, 832 is not enough, so it later does a pread call to load 896 bytes from the library, which is presumably the size of all the necessary headers.

The fstat call reads some information about the library, most importantly the file size, as it’ll be used soon. It then allocates 8192 bytes of memory (two memory pages in my system) through mmap, presumably to store metadata about the loaded libraries. Then, for some reason, it does another pread syscall to a part of the library it previously read, which is honestly beyond me.

mmap(NULL, 2169904, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f9c27200000
mmap(0x7f9c27224000, 1511424, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x24000) = 0x7f9c27224000
mmap(0x7f9c27395000, 454656, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x195000) = 0x7f9c27395000
mmap(0x7f9c27404000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x203000) = 0x7f9c27404000
mmap(0x7f9c2740a000, 31792, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f9c2740a000
close(3)                                = 0

These mmaps are part of the library loading process. The first mmap will map the entire library into the process’s memory space, and the following mmaps will adjust the necessary permissions for the segments of the library. Lastly, it closes the library file.

mmap(NULL, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f9c274b9000
arch_prctl(ARCH_SET_FS, 0x7f9c274b9740) = 0
set_tid_address(0x7f9c274b9a10)         = 2670
set_robust_list(0x7f9c274b9a20, 24)     = 0
rseq(0x7f9c274b9680, 0x20, 0, 0x53053053) = 0

These syscalls are responsible for setting up the thread-local memory used for multithreading. It’s really specific stuff that’s not really worth going into.

mprotect(0x7f9c27404000, 16384, PROT_READ) = 0
mprotect(0x55f2e229b000, 4096, PROT_READ) = 0
mprotect(0x7f9c27505000, 8192, PROT_READ) = 0

mprotect sets access protection to previously allocated memory pages. These pages were previously allocated with write access, but are now being set to read-only.

prlimit64(0, RLIMIT_STACK, NULL, {rlim_cur=8192*1024, rlim_max=RLIM64_INFINITY}) = 0
getrandom("\x2f\x77\x04\x87\x6d\xb5\xb8\xd2", 8, GRND_NONBLOCK) = 8

The prlimit64 is reading the current maximum stack size of the process, and getrandom is fetching 8 random bytes. I’m not completely sure what this is for, but I assume it’s for some sort of initialization.

munmap(0x7f9c274be000, 22731)           = 0

Finally, this munmap is freeing the previously allocated pages for the ld.so.cache file.

Cala, the Curious Capybara says: Cala

Phew. That was quite a trip. Do I have to remember all this when tracing other programs?

No, this was just a brief explanation of what’s going on when a regular program starts with a dynamic linker. Since most programs use a dynamic linker, it’s good to have a general understanding of what’s going on, but we’ll mostly pay attention to what happens after the dynamic linker has done its job.

glibc localization

This is where our program really starts. Unfortunately, this specific program has a rather boring start. It begins by loading the system’s locale in order to display messages in the correct language and format.

brk(NULL)                               = 0x55f320282000
brk(0x55f3202a3000)                     = 0x55f3202a3000
openat(AT_FDCWD, "/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/share/locale/locale.alias", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=2998, ...}) = 0
read(3, "# Locale name alias data base.\n#"..., 4096) = 2998
read(3, "", 4096)                       = 0
close(3)                                = 0
openat(AT_FDCWD, "/usr/lib/locale/C.UTF-8/LC_IDENTIFICATION", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/locale/C.utf8/LC_IDENTIFICATION", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=258, ...}) = 0
mmap(NULL, 258, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f9c274c3000
close(3)                                = 0
openat(AT_FDCWD, "/usr/lib/gconv/gconv-modules.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=27010, ...}) = 0
mmap(NULL, 27010, PROT_READ, MAP_SHARED, 3, 0) = 0x7f9c274b2000
close(3)                                = 0
futex(0x7f9c2740972c, FUTEX_WAKE_PRIVATE, 2147483647) = 0
openat(AT_FDCWD, "/usr/lib/locale/C.UTF-8/LC_MEASUREMENT", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/locale/C.utf8/LC_MEASUREMENT", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=23, ...}) = 0
mmap(NULL, 23, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f9c274c2000
close(3)                                = 0
openat(AT_FDCWD, "/usr/lib/locale/C.UTF-8/LC_TELEPHONE", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/locale/C.utf8/LC_TELEPHONE", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=47, ...}) = 0
mmap(NULL, 47, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f9c274c1000
close(3)                                = 0
openat(AT_FDCWD, "/usr/lib/locale/C.UTF-8/LC_ADDRESS", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/locale/C.utf8/LC_ADDRESS", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=127, ...}) = 0
mmap(NULL, 127, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f9c274c0000
close(3)                                = 0
openat(AT_FDCWD, "/usr/lib/locale/C.UTF-8/LC_NAME", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/locale/C.utf8/LC_NAME", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=62, ...}) = 0
mmap(NULL, 62, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f9c274bf000
close(3)                                = 0
openat(AT_FDCWD, "/usr/lib/locale/C.UTF-8/LC_PAPER", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/locale/C.utf8/LC_PAPER", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=34, ...}) = 0
mmap(NULL, 34, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f9c274be000
close(3)                                = 0
openat(AT_FDCWD, "/usr/lib/locale/C.UTF-8/LC_MESSAGES", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/locale/C.utf8/LC_MESSAGES", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
close(3)                                = 0
openat(AT_FDCWD, "/usr/lib/locale/C.utf8/LC_MESSAGES/SYS_LC_MESSAGES", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=48, ...}) = 0
mmap(NULL, 48, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f9c274b1000
close(3)                                = 0
openat(AT_FDCWD, "/usr/lib/locale/C.UTF-8/LC_MONETARY", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/locale/C.utf8/LC_MONETARY", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=270, ...}) = 0
mmap(NULL, 270, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f9c274b0000
close(3)                                = 0
openat(AT_FDCWD, "/usr/lib/locale/C.UTF-8/LC_COLLATE", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/locale/C.utf8/LC_COLLATE", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=1406, ...}) = 0
mmap(NULL, 1406, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f9c274af000
close(3)                                = 0
openat(AT_FDCWD, "/usr/lib/locale/C.UTF-8/LC_TIME", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/locale/C.utf8/LC_TIME", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=3360, ...}) = 0
mmap(NULL, 3360, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f9c274ae000
close(3)                                = 0
openat(AT_FDCWD, "/usr/lib/locale/C.UTF-8/LC_NUMERIC", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/locale/C.utf8/LC_NUMERIC", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=50, ...}) = 0
mmap(NULL, 50, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f9c274ad000
close(3)                                = 0
openat(AT_FDCWD, "/usr/lib/locale/C.UTF-8/LC_CTYPE", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/locale/C.utf8/LC_CTYPE", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=367708, ...}) = 0
mmap(NULL, 367708, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f9c27453000
close(3)                                = 0

If you pay attention, it has a pattern: it first does openat to open a file, fstat to get its size, mmap to pull it into memory, and close to close it (duh). It’ll then use these files to make decisions on what messages to display to the user, if needed. This specific system is very bare bones; it only has the default C.UTF-8 locale, as you can guess from the files that are being accessed.

Most programs don’t actually care about localization (even though they probably should), so you might not see this when tracing other programs. If you want to learn more about the localization process in glibc’s the manual TODO link.

The actual program

Until now, everything has just been library loading and initialization. Mostly boring stuff. But now that everything is ready, we can actually run the code that’s meaningful for our application

fstat(1, {st_mode=S_IFCHR|0600, st_rdev=makedev(0x88, 0), ...}) = 0
write(1, "Hello, world!", 13)           = 13
close(1)                                = 0
close(2)                                = 0
exit_group(0)                           = ?

Yep. That’s it. To be fair, printf is a really simple program, so there isn’t much to to see here. It first does a fstat on stdout to see what kind of file descriptor it is, and if it’s open. It then writes our message to stdout, closes stdout and stderr, and exits with 0.

A more interesting program: ping

ping is a classic networking tool. You probably used it at least once to check if you have an internet connection. Have you ever wondered how it works? After this article, you’ll know enough to implement your own version of it, if you’d like.

Let’s run strace ping 127.0.0.1 -c 3

execve("/usr/bin/ping", ["ping", "127.0.0.1", "-c", "3"], 0x7ffd03f3f6a8 /* 18 vars */) = 0
[...Dynamic Linker...]
prctl(PR_CAPBSET_READ, CAP_MAC_OVERRIDE) = 0
prctl(PR_CAPBSET_READ, 0x30 /* CAP_??? */) = -1 EINVAL (Invalid argument)
prctl(PR_CAPBSET_READ, CAP_CHECKPOINT_RESTORE) = 0
prctl(PR_CAPBSET_READ, 0x2c /* CAP_??? */) = -1 EINVAL (Invalid argument)
prctl(PR_CAPBSET_READ, 0x2a /* CAP_??? */) = -1 EINVAL (Invalid argument)
prctl(PR_CAPBSET_READ, 0x29 /* CAP_??? */) = -1 EINVAL (Invalid argument)
brk(NULL)                               = 0x5555f791a000
brk(0x5555f793b000)                     = 0x5555f793b000
capget({version=_LINUX_CAPABILITY_VERSION_3, pid=0}, NULL) = 0
capget({version=_LINUX_CAPABILITY_VERSION_3, pid=0}, {effective=1<<CAP_CHOWN|1<<CAP_DAC_OVERRIDE|1<<CAP_DAC_READ_SEARCH|1<<CAP_FOWNER|1<<CAP_FSETID|1<<CAP_KILL|1<<CAP_SETGID|1<<CAP_SETUID|1<<CAP_SETPCAP|1<<CAP_LINUX_IMMUTABLE|1<<CAP_NET_BIND_SERVICE|1<<CAP_NET_BROADCAST|1<<CAP_NET_RAW|1<<CAP_IPC_OWNER|1<<CAP_SYS_CHROOT|1<<CAP_SYS_PTRACE|1<<CAP_SYS_ADMIN|1<<CAP_SYS_BOOT|1<<CAP_SYS_NICE|1<<CAP_SYS_RESOURCE|1<<CAP_SYS_TTY_CONFIG|1<<CAP_MKNOD|1<<CAP_LEASE|1<<CAP_AUDIT_WRITE|1<<CAP_AUDIT_CONTROL|1<<CAP_SETFCAP, permitted=1<<CAP_CHOWN|1<<CAP_DAC_OVERRIDE|1<<CAP_DAC_READ_SEARCH|1<<CAP_FOWNER|1<<CAP_FSETID|1<<CAP_KILL|1<<CAP_SETGID|1<<CAP_SETUID|1<<CAP_SETPCAP|1<<CAP_LINUX_IMMUTABLE|1<<CAP_NET_BIND_SERVICE|1<<CAP_NET_BROADCAST|1<<CAP_NET_RAW|1<<CAP_IPC_OWNER|1<<CAP_SYS_CHROOT|1<<CAP_SYS_PTRACE|1<<CAP_SYS_ADMIN|1<<CAP_SYS_BOOT|1<<CAP_SYS_NICE|1<<CAP_SYS_RESOURCE|1<<CAP_SYS_TTY_CONFIG|1<<CAP_MKNOD|1<<CAP_LEASE|1<<CAP_AUDIT_WRITE|1<<CAP_AUDIT_CONTROL|1<<CAP_SETFCAP, inheritable=1<<CAP_CHOWN|1<<CAP_DAC_OVERRIDE|1<<CAP_DAC_READ_SEARCH|1<<CAP_FOWNER|1<<CAP_FSETID|1<<CAP_KILL|1<<CAP_SETGID|1<<CAP_SETUID|1<<CAP_SETPCAP|1<<CAP_LINUX_IMMUTABLE|1<<CAP_NET_BIND_SERVICE|1<<CAP_NET_BROADCAST|1<<CAP_NET_RAW|1<<CAP_IPC_OWNER|1<<CAP_SYS_CHROOT|1<<CAP_SYS_PTRACE|1<<CAP_SYS_ADMIN|1<<CAP_SYS_BOOT|1<<CAP_SYS_NICE|1<<CAP_SYS_RESOURCE|1<<CAP_SYS_TTY_CONFIG|1<<CAP_MKNOD|1<<CAP_LEASE|1<<CAP_AUDIT_WRITE|1<<CAP_AUDIT_CONTROL|1<<CAP_SETFCAP}) = 0
capget({version=_LINUX_CAPABILITY_VERSION_3, pid=0}, NULL) = 0
capset({version=_LINUX_CAPABILITY_VERSION_3, pid=0}, {effective=0, permitted=1<<CAP_NET_RAW, inheritable=0}) = 0
prctl(PR_SET_KEEPCAPS, 1)               = 0
getuid()                                = 0
setuid(0)                               = 0
prctl(PR_SET_KEEPCAPS, 0)               = 0
getuid()                                = 0
[...glibc localization...]
capget({version=_LINUX_CAPABILITY_VERSION_3, pid=0}, NULL) = 0
capget({version=_LINUX_CAPABILITY_VERSION_3, pid=0}, {effective=0, permitted=1<<CAP_NET_RAW, inheritable=0}) = 0
capset({version=_LINUX_CAPABILITY_VERSION_3, pid=0}, {effective=1<<CAP_NET_RAW, permitted=1<<CAP_NET_RAW, inheritable=0}) = 0
socket(AF_INET, SOCK_DGRAM, IPPROTO_ICMP) = 3
socket(AF_INET6, SOCK_DGRAM, IPPROTO_ICMPV6) = 4
capget({version=_LINUX_CAPABILITY_VERSION_3, pid=0}, NULL) = 0
capget({version=_LINUX_CAPABILITY_VERSION_3, pid=0}, {effective=1<<CAP_NET_RAW, permitted=1<<CAP_NET_RAW, inheritable=0}) = 0
capset({version=_LINUX_CAPABILITY_VERSION_3, pid=0}, {effective=0, permitted=1<<CAP_NET_RAW, inheritable=0}) = 0
socket(AF_INET, SOCK_DGRAM, IPPROTO_IP) = 5
connect(5, {sa_family=AF_INET, sin_port=htons(1025), sin_addr=inet_addr("127.0.0.1")}, 16) = 0
getsockname(5, {sa_family=AF_INET, sin_port=htons(53455), sin_addr=inet_addr("127.0.0.1")}, [16]) = 0
close(5)                                = 0
setsockopt(3, SOL_IP, IP_RECVERR, [1], 4) = 0
setsockopt(3, SOL_IP, IP_RECVTTL, [1], 4) = 0
setsockopt(3, SOL_IP, IP_RETOPTS, [1], 4) = 0
setsockopt(3, SOL_SOCKET, SO_SNDBUF, [324], 4) = 0
setsockopt(3, SOL_SOCKET, SO_RCVBUF, [65536], 4) = 0
getsockopt(3, SOL_SOCKET, SO_RCVBUF, [131072], [4]) = 0
fstat(1, {st_mode=S_IFCHR|0600, st_rdev=makedev(0x88, 0), ...}) = 0
write(1, "PING 127.0.0.1 (127.0.0.1) 56(84"..., 49) = 49
setsockopt(3, SOL_SOCKET, SO_TIMESTAMP_OLD, [1], 4) = 0
setsockopt(3, SOL_SOCKET, SO_SNDTIMEO_OLD, "\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", 16) = 0
setsockopt(3, SOL_SOCKET, SO_RCVTIMEO_OLD, "\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", 16) = 0
rt_sigaction(SIGINT, {sa_handler=0x5555ca2ef2f0, sa_mask=[], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7fcbd6e3e4d0}, NULL, 8) = 0
rt_sigaction(SIGALRM, {sa_handler=0x5555ca2ef2f0, sa_mask=[], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7fcbd6e3e4d0}, NULL, 8) = 0
rt_sigaction(SIGQUIT, {sa_handler=0x5555ca2e9d60, sa_mask=[], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7fcbd6e3e4d0}, NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
ioctl(1, TCGETS2, {c_iflag=ICRNL|IXON|IXOFF|IUTF8, c_oflag=NL0|CR0|TAB0|BS0|VT0|FF0|OPOST|ONLCR, c_cflag=B38400|B38400<<IBSHIFT|CS8|CREAD, c_lflag=ISIG|ICANON|ECHO|ECHOE|ECHOK|ECHOKE, ...}) = 0
ioctl(1, TIOCGWINSZ, {ws_row=31, ws_col=125, ws_xpixel=1875, ws_ypixel=1023}) = 0
capget({version=_LINUX_CAPABILITY_VERSION_3, pid=0}, NULL) = 0
capset({version=_LINUX_CAPABILITY_VERSION_3, pid=0}, {effective=0, permitted=0, inheritable=0}) = 0
sendto(3, "\10\0 \231\377\377\0\1\333\264\275i\0\0\0\0qt\16\0\0\0\0\0\20\21\22\23\24\25\26\27"..., 64, 0, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("127.0.0.1")}, 16) = 64
recvmsg(3, {msg_name={sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("127.0.0.1")}, msg_namelen=128 => 16, msg_iov=[{iov_base="\0\0\33\311\f\320\0\1\333\264\275i\0\0\0\0qt\16\0\0\0\0\0\20\21\22\23\24\25\26\27"..., iov_len=192}], msg_iovlen=1, msg_control=[{cmsg_len=32, cmsg_level=SOL_SOCKET, cmsg_type=SO_TIMESTAMP_OLD, cmsg_data={tv_sec=1774040283, tv_usec=947715}}, {cmsg_len=20, cmsg_level=SOL_IP, cmsg_type=IP_TTL, cmsg_data=[64]}], msg_controllen=56, msg_flags=0}, 0) = 64
write(1, "64 bytes from 127.0.0.1: icmp_se"..., 57) = 57
poll([{fd=3, events=POLLIN}], 1, 999)   = 0 (Timeout)
sendto(3, "\10\0\232\223\377\377\0\2\334\264\275i\0\0\0\0\366x\16\0\0\0\0\0\20\21\22\23\24\25\26\27"..., 64, 0, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("127.0.0.1")}, 16) = 64
recvmsg(3, {msg_name={sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("127.0.0.1")}, msg_namelen=128 => 16, msg_iov=[{iov_base="\0\0\225\303\f\320\0\2\334\264\275i\0\0\0\0\366x\16\0\0\0\0\0\20\21\22\23\24\25\26\27"..., iov_len=192}], msg_iovlen=1, msg_control=[{cmsg_len=32, cmsg_level=SOL_SOCKET, cmsg_type=SO_TIMESTAMP_OLD, cmsg_data={tv_sec=1774040284, tv_usec=948866}}, {cmsg_len=20, cmsg_level=SOL_IP, cmsg_type=IP_TTL, cmsg_data=[64]}], msg_controllen=56, msg_flags=0}, 0) = 64
write(1, "64 bytes from 127.0.0.1: icmp_se"..., 57) = 57
poll([{fd=3, events=POLLIN}], 1, 999)   = 0 (Timeout)
sendto(3, "\10\0\263\210\377\377\0\3\335\264\275i\0\0\0\0\334\202\16\0\0\0\0\0\20\21\22\23\24\25\26\27"..., 64, 0, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("127.0.0.1")}, 16) = 64
setitimer(ITIMER_REAL, {it_interval={tv_sec=0, tv_usec=0}, it_value={tv_sec=1, tv_usec=0}}, NULL) = 0
recvmsg(3, {msg_name={sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("127.0.0.1")}, msg_namelen=128 => 16, msg_iov=[{iov_base="\0\0\256\270\f\320\0\3\335\264\275i\0\0\0\0\334\202\16\0\0\0\0\0\20\21\22\23\24\25\26\27"..., iov_len=192}], msg_iovlen=1, msg_control=[{cmsg_len=32, cmsg_level=SOL_SOCKET, cmsg_type=SO_TIMESTAMP_OLD, cmsg_data={tv_sec=1774040285, tv_usec=951433}}, {cmsg_len=20, cmsg_level=SOL_IP, cmsg_type=IP_TTL, cmsg_data=[64]}], msg_controllen=56, msg_flags=0}, 0) = 64
write(1, "64 bytes from 127.0.0.1: icmp_se"..., 57) = 57
write(1, "\n", 1)                       = 1
write(1, "--- 127.0.0.1 ping statistics --"..., 34) = 34
write(1, "3 packets transmitted, 3 receive"..., 63) = 63
write(1, "rtt min/avg/max/mdev = 0.396/0.4"..., 50) = 50
close(1)                                = 0
close(2)                                = 0
exit_group(0)                           = ?

Do note that I omited the dynamic linker and the glibc localization sections. They’re essentially the same thing as the printf program, so it’s not worth going over. The rest of the output is directly related to the workings of ping

This is clearly a more complex program than printf. Again, it might seem like a lot, but after taking a closer look, we can easily break down what’s going on. Let’s start with the first piece:

Setting up initial capabilities

[...Dynamic Linker...]
prctl(PR_CAPBSET_READ, CAP_MAC_OVERRIDE) = 0
prctl(PR_CAPBSET_READ, 0x30 /* CAP_??? */) = -1 EINVAL (Invalid argument)
prctl(PR_CAPBSET_READ, CAP_CHECKPOINT_RESTORE) = 0
prctl(PR_CAPBSET_READ, 0x2c /* CAP_??? */) = -1 EINVAL (Invalid argument)
prctl(PR_CAPBSET_READ, 0x2a /* CAP_??? */) = -1 EINVAL (Invalid argument)
prctl(PR_CAPBSET_READ, 0x29 /* CAP_??? */) = -1 EINVAL (Invalid argument)
brk(NULL)                               = 0x5555f791a000
brk(0x5555f793b000)                     = 0x5555f793b000
capget({version=_LINUX_CAPABILITY_VERSION_3, pid=0}, NULL) = 0
capget({version=_LINUX_CAPABILITY_VERSION_3, pid=0}, {effective=1<<CAP_CHOWN|1<<CAP_DAC_OVERRIDE|1<<CAP_DAC_READ_SEARCH|1<<CAP_FOWNER|1<<CAP_FSETID|1<<CAP_KILL|1<<CAP_SETGID|1<<CAP_SETUID|1<<CAP_SETPCAP|1<<CAP_LINUX_IMMUTABLE|1<<CAP_NET_BIND_SERVICE|1<<CAP_NET_BROADCAST|1<<CAP_NET_RAW|1<<CAP_IPC_OWNER|1<<CAP_SYS_CHROOT|1<<CAP_SYS_PTRACE|1<<CAP_SYS_ADMIN|1<<CAP_SYS_BOOT|1<<CAP_SYS_NICE|1<<CAP_SYS_RESOURCE|1<<CAP_SYS_TTY_CONFIG|1<<CAP_MKNOD|1<<CAP_LEASE|1<<CAP_AUDIT_WRITE|1<<CAP_AUDIT_CONTROL|1<<CAP_SETFCAP, permitted=1<<CAP_CHOWN|1<<CAP_DAC_OVERRIDE|1<<CAP_DAC_READ_SEARCH|1<<CAP_FOWNER|1<<CAP_FSETID|1<<CAP_KILL|1<<CAP_SETGID|1<<CAP_SETUID|1<<CAP_SETPCAP|1<<CAP_LINUX_IMMUTABLE|1<<CAP_NET_BIND_SERVICE|1<<CAP_NET_BROADCAST|1<<CAP_NET_RAW|1<<CAP_IPC_OWNER|1<<CAP_SYS_CHROOT|1<<CAP_SYS_PTRACE|1<<CAP_SYS_ADMIN|1<<CAP_SYS_BOOT|1<<CAP_SYS_NICE|1<<CAP_SYS_RESOURCE|1<<CAP_SYS_TTY_CONFIG|1<<CAP_MKNOD|1<<CAP_LEASE|1<<CAP_AUDIT_WRITE|1<<CAP_AUDIT_CONTROL|1<<CAP_SETFCAP, inheritable=1<<CAP_CHOWN|1<<CAP_DAC_OVERRIDE|1<<CAP_DAC_READ_SEARCH|1<<CAP_FOWNER|1<<CAP_FSETID|1<<CAP_KILL|1<<CAP_SETGID|1<<CAP_SETUID|1<<CAP_SETPCAP|1<<CAP_LINUX_IMMUTABLE|1<<CAP_NET_BIND_SERVICE|1<<CAP_NET_BROADCAST|1<<CAP_NET_RAW|1<<CAP_IPC_OWNER|1<<CAP_SYS_CHROOT|1<<CAP_SYS_PTRACE|1<<CAP_SYS_ADMIN|1<<CAP_SYS_BOOT|1<<CAP_SYS_NICE|1<<CAP_SYS_RESOURCE|1<<CAP_SYS_TTY_CONFIG|1<<CAP_MKNOD|1<<CAP_LEASE|1<<CAP_AUDIT_WRITE|1<<CAP_AUDIT_CONTROL|1<<CAP_SETFCAP}) = 0
capget({version=_LINUX_CAPABILITY_VERSION_3, pid=0}, NULL) = 0
capset({version=_LINUX_CAPABILITY_VERSION_3, pid=0}, {effective=0, permitted=1<<CAP_NET_RAW, inheritable=0}) = 0
prctl(PR_SET_KEEPCAPS, 1)               = 0
getuid()                                = 0
setuid(0)                               = 0
prctl(PR_SET_KEEPCAPS, 0)               = 0
getuid()                                = 0
[...glibc localization...]

Before we begin, a little bit of history. It used to be that a program would need to be run as root to send ping packets. This requirement changed on Linux 2.6.24, with the introduction of file-based capabilities, as you could just mark the binary with the required capabilities. This requirement once again changed on kernel 2.6.39, with the kernel parameter net.ipv4.ping_group_range, that could allow any user to send ping packets without any special permissions. These details aren’t incredibly important; what matters is to understand that ping was written with all of these possibilities in mind.

What are Linux Capabilities?

On linux, “capabilities” are one of the many ways the kernel controls what a process can and cannot do. You can think of them as sets of permissions that threads have. For example, the CAP_SYS_CHROOT allows a process to change its root directory; CAP_NET_BROADCAST allows a process to send broadcast packets to the network; and CAP_NET_RAW allows a process to send raw IP and Ethernet packets.

CAP_NET_RAW is the one that’s important here. Processes don’t need any special permissions to send TCP or UDP packets, but ping is neither; it’s an ICMP packet. In order to send an ICMP packet, processes need to construct a “raw” IP packet, which require the CAP_NET_RAW capability.

Threads have a few sets of capabilities, where the two most important ones for this article are the permitted and effective sets. If a capability is in the effective set, then the thread has all the permissions that capability could grant; if it’s in the permitted set, then the thread doesn’t necessarily have the permissions of that capability, but it has the ability to later add that capability to its effective set. You can think of the permitted set as capabilities that aren’t active, but can later be activated.

With this in mind, it’s important to note that I ran Ping as root. I made this choice to mimic how Ping used to work as an SUID binary. This will trigger all the capability juggling that Ping used to do, and will make for an interesting study case. A non-root execution of ping will do essentially the same thing, but we won’t be able to see all that Ping does to limit its capabilities.

What is SUID?

A suid (Set User ID) program has a special property that, whenever it’s run, it’ll be run as the user that owns the program. So. If the user john is the owner of a suid program foo, and alice runs foo, the resulting process will be owned by john, not alice.

This is mostly used so users can run programs as root without having to become root themselves. In this case, this means you can run ping without having to use sudo.

As you can see in the syscalls above, there are a lot of capability checks, making sure the process can do what it has to do, and at the end, it calls capset in order to drop all the unnecessary capabilities (all except CAP_NET_RAW). Also note that the CAP_NET_RAW capability is only set in the permitted set, but not the effective set, which means the program can’t use this capability by accident, but can later grant itself that capability when needed.

capget({version=_LINUX_CAPABILITY_VERSION_3, pid=0}, NULL) = 0
capget({version=_LINUX_CAPABILITY_VERSION_3, pid=0}, {effective=0, permitted=1<<CAP_NET_RAW, inheritable=0}) = 0
capset({version=_LINUX_CAPABILITY_VERSION_3, pid=0}, {effective=1<<CAP_NET_RAW, permitted=1<<CAP_NET_RAW, inheritable=0}) = 0
socket(AF_INET, SOCK_DGRAM, IPPROTO_ICMP) = 3
socket(AF_INET6, SOCK_DGRAM, IPPROTO_ICMPV6) = 4
capget({version=_LINUX_CAPABILITY_VERSION_3, pid=0}, NULL) = 0
capget({version=_LINUX_CAPABILITY_VERSION_3, pid=0}, {effective=1<<CAP_NET_RAW, permitted=1<<CAP_NET_RAW, inheritable=0}) = 0
capset({version=_LINUX_CAPABILITY_VERSION_3, pid=0}, {effective=0, permitted=1<<CAP_NET_RAW, inheritable=0}) = 0

These syscalls show a pattern that will be prevalent on the rest of the program: whenever it needs to do an operation that requires a capability, it will first move that capability to its effective set, do the operation, and immediately drop that capability from its effective set. This prevents the program from accidentaly doing anything destructive, and minimizing damage in case the program has any vulnerabilities.

This block creates two sockets: one as an ICMP socket, and the other as an ICMPv6 socket. We won’t be using the ICMPv6 socket (file descriptor 4), as we’re pinging an IPv4. On the other hand, The regular ICMP socket (file descriptor 3) is very important, as it’s the one that will soon send the actual ping packet.

socket(AF_INET, SOCK_DGRAM, IPPROTO_IP) = 5
connect(5, {sa_family=AF_INET, sin_port=htons(1025), sin_addr=inet_addr("127.0.0.1")}, 16) = 0
getsockname(5, {sa_family=AF_INET, sin_port=htons(53455), sin_addr=inet_addr("127.0.0.1")}, [16]) = 0
close(5)                                = 0

I’m unsure of what’s going on here. I think it’s doing a check to see if it’s in a specific kind of system. Anyway, it’s irrelevant to the rest of the program.

setsockopt(3, SOL_IP, IP_RECVERR, [1], 4) = 0
setsockopt(3, SOL_IP, IP_RECVTTL, [1], 4) = 0
setsockopt(3, SOL_IP, IP_RETOPTS, [1], 4) = 0
setsockopt(3, SOL_SOCKET, SO_SNDBUF, [324], 4) = 0
setsockopt(3, SOL_SOCKET, SO_RCVBUF, [65536], 4) = 0
getsockopt(3, SOL_SOCKET, SO_RCVBUF, [131072], [4]) = 0
fstat(1, {st_mode=S_IFCHR|0600, st_rdev=makedev(0x88, 0), ...}) = 0
write(1, "PING 127.0.0.1 (127.0.0.1) 56(84"..., 49) = 49
setsockopt(3, SOL_SOCKET, SO_TIMESTAMP_OLD, [1], 4) = 0
setsockopt(3, SOL_SOCKET, SO_SNDTIMEO_OLD, "\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", 16) = 0
setsockopt(3, SOL_SOCKET, SO_RCVTIMEO_OLD, "\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", 16) = 0

This section is just setting some socket options. They aren’t crucial for the functioning of the program, but they help with better error messages and performance. If you’re curious about what they do, you can read about them on the ip(7) manual page and the socket(7) manual page

rt_sigaction(SIGINT, {sa_handler=0x5555ca2ef2f0, sa_mask=[], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7fcbd6e3e4d0}, NULL, 8) = 0
rt_sigaction(SIGALRM, {sa_handler=0x5555ca2ef2f0, sa_mask=[], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7fcbd6e3e4d0}, NULL, 8) = 0
rt_sigaction(SIGQUIT, {sa_handler=0x5555ca2e9d60, sa_mask=[], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7fcbd6e3e4d0}, NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0

It then sets the handler functions for the SIGINT, SIGALARM, and SIGQUIT signals.

ioctl(1, TCGETS2, {c_iflag=ICRNL|IXON|IXOFF|IUTF8, c_oflag=NL0|CR0|TAB0|BS0|VT0|FF0|OPOST|ONLCR, c_cflag=B38400|B38400<<IBSHIFT|CS8|CREAD, c_lflag=ISIG|ICANON|ECHO|ECHOE|ECHOK|ECHOKE, ...}) = 0
ioctl(1, TIOCGWINSZ, {ws_row=31, ws_col=125, ws_xpixel=1875, ws_ypixel=1023}) = 0

These lines are just getting information regarding the terimnal connected to sdout. I’m not sure what it does with this information, but presumably it’s useful for other kinds of outputs that aren’t a command line.

capget({version=_LINUX_CAPABILITY_VERSION_3, pid=0}, NULL) = 0
capset({version=_LINUX_CAPABILITY_VERSION_3, pid=0}, {effective=0, permitted=0, inheritable=0}) = 0

This is one last capability drop. Now that we have our ICMP socket set up, we don’t net CAP_NET_RAW anymore, so we remove it from all sets, including the permitted set. This means it’s impossible to aquire this capability again.

sendto(3, "\10\0 \231\377\377\0\1\333\264\275i\0\0\0\0qt\16\0\0\0\0\0\20\21\22\23\24\25\26\27"..., 64, 0, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("127.0.0.1")}, 16) = 64

Now we’re getting into the interesting part. We’re finally sending some packets! Let’s have a quick reminder of how an ICMP Echo (ping) looks like, From RFC 792:

0                   1                   2                   3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     Type (8)  |     Code (0)  |          Checksum             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           Identifier          |        Sequence Number        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     Data ...
+-+-+-+-+-

For a ping request, the type field is always 8, and code is always 0. The Identifier and sequence number can be anything you want. Now, let’s look at the string being given to the sendto syscall:

\10\0 \231\377\377\0\1\333\264\275i\0\0\0\0qt\16\0\0\0\0\0\20\21\22\23\24\25\26\27\30\31\32\33\34\35\36\37 !\"#$%&'()*+,-./01234567

Note that in the original trace, the string was hidden in an ellipse. I re-ran the strace command with the -s 9999 argument for it to expand the string.

Strings in strace are represented in a C escape sequence style. This means that backslashes followed by a number are an octal representation of a byte. For example, \10 represents 8 in decimal, and \377 represents 255 in decimal. Given that this is supposed to be a ping packet, let’s convert that string into hex format:

0000 08 00 0A 99 ff ff 00 01
0008 db b4 bd 69 00 00 00 00
0010 71 74 0e 00 00 00 00 00
0018 10 11 12 13 14 15 16 17
0020 18 19 1a 1b 1c 1d 1e 1f
0028 20 21 22 23 24 25 26 27
0030 28 29 2a 2b 2c 2d 2e 2f
0038 30 31 32 33 34 35 36 37

Now it’s starting to make sense! The first and second bytes match the 8 and 0 values that we expected for an echo packet. Then, there’s the checksum (0A 99), the Identifier (ff ff), and the sequence number (00 01). Everything after that is the data.

The data section is meaningless from the perspective of the ICMP protocol. Some implementations of ping just fill it up with random stuff. But after fiddling a bit with the numbers above, I figured out what some of them mean.

From byte 8 to byte 15 (db b4 bd 69 00 00 00 00) we have a little-endian unix timestamp in seconds. Transforming he above string to decimal, we get 1774040283, and transforming it to a date we get 2026-03-20 20:58:03, the exact date I ran this ping program.

From byte 16 to 23 (71 74 0e 00 00 00 00 00) is the microseconds portion of the timestamp, also in little-endian. In this case, it’s 947313us (TODO - add micro unicode).

Byte 24 to 64 is always this sequence of numbers that increase by one. I’m not sure why.

This is the whole packet that’s constructed by Ping, and sent to our previously created ICMP socket. The next step is to get a reply:

recvmsg(3, {msg_name={sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("127.0.0.1")}, msg_namelen=128 => 16, msg_iov=[{iov_base="\0\0\33\311\f\320\0\1\333\264\275i\0\0\0\0qt\16\0\0\0\0\0\20\21\22\23\24\25\26\27"..., iov_len=192}], msg_iovlen=1, msg_control=[{cmsg_len=32, cmsg_level=SOL_SOCKET, cmsg_type=SO_TIMESTAMP_OLD, cmsg_data={tv_sec=1774040283, tv_usec=947715}}, {cmsg_len=20, cmsg_level=SOL_IP, cmsg_type=IP_TTL, cmsg_data=[64]}], msg_controllen=56, msg_flags=0}, 0) = 64

Let me prettify this so it’s easier to read:

recvmsg(3, {
    msg_name={
        sa_family=AF_INET,
        sin_port=htons(0),
        sin_addr=inet_addr("127.0.0.1")
    },
    msg_namelen=128 => 16,
    msg_iov=[
        {
            iov_base="\0\0\33\311\f\320\0\1\333\264\275i\0\0\0\0qt\16\0\0\0\0\0\20\21\22\23\24\25\26\27\30\31\32\33\34\35\36\37 !\"#$%&'()*+,-./01234567",
            iov_len=192
        }
    ],
    msg_iovlen=1,
    msg_control=[
        {
            cmsg_len=32,
            cmsg_level=SOL_SOCKET,
            cmsg_type=SO_TIMESTAMP_OLD,
            cmsg_data={
                tv_sec=1774040283,
                tv_usec=947715
                        947313
            }
        },
        {
            cmsg_len=20,
            cmsg_level=SOL_IP,
            cmsg_type=IP_TTL,
            cmsg_data=[64]
        }
    ],
    msg_controllen=56,
    msg_flags=0
}, 0) = 64

Alright, looking good! msg_name tells us we’re getting an AF_INET packet (IPv4) from 127.0.0.1 to a null port (ICMP packets don’t have ports). msg_iov has the actual contents of the packet. Again, it was truncated in the original strace output; I added the rest myself after re-running it with -s 99999. msg_control has some meta-information regarding the packet, which was requested when seeting the socket options (TODO - add link to previous section). These control messages tells the timestamp that the packet arrived, and its TTL (Time to Live). Note that the timestamp here is also split into the seconds part and microseconds part. The seconds part is the exact same as the seconds sent in the original packet. The microseconds part, however, is 402 microseconds greater than our original, which immediately tells us the roundtrip time.

Now, decoding the contens of msg_iov, we can see this packet:

0000 00 00 1b c9 0c d0 00 01
0008 db b4 bd 69 00 00 00 00
0010 71 74 0e 00 00 00 00 00
0018 10 11 12 13 14 15 16 17
0020 18 19 1a 1b 1c 1d 1e 1f
0028 20 21 22 23 24 25 26 27
0030 28 29 2a 2b 2c 2d 2e 30
0038 31 32 32 33 34 35 36 37

The only thing that’s different here is the type byte, which is 0 on an echo response. The rest of the packet should be very familiar: the checksum (1b c9), the identifier (0c d0), and the sequence number (00 01). The rest of the packet is the data that was sent on the previous instruction. It’s all making sense here

write(1, "64 bytes from 127.0.0.1: icmp_se"..., 57) = 57
poll([{fd=3, events=POLLIN}], 1, 999)   = 0 (Timeout)

Ping then reports the time it took from packet sent to packet received, and waits for the socket to become available to send again.

sendto(3, "\10\0\232\223\377\377\0\2\334\264\275i\0\0\0\0\366x\16\0\0\0\0\0\20\21\22\23\24\25\26\27"..., 64, 0, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("127.0.0.1")}, 16) = 64
recvmsg(3, {msg_name={sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("127.0.0.1")}, msg_namelen=128 => 16, msg_iov=[{iov_base="\0\0\225\303\f\320\0\2\334\264\275i\0\0\0\0\366x\16\0\0\0\0\0\20\21\22\23\24\25\26\27"..., iov_len=192}], msg_iovlen=1, msg_control=[{cmsg_len=32, cmsg_level=SOL_SOCKET, cmsg_type=SO_TIMESTAMP_OLD, cmsg_data={tv_sec=1774040284, tv_usec=948866}}, {cmsg_len=20, cmsg_level=SOL_IP, cmsg_type=IP_TTL, cmsg_data=[64]}], msg_controllen=56, msg_flags=0}, 0) = 64
write(1, "64 bytes from 127.0.0.1: icmp_se"..., 57) = 57
poll([{fd=3, events=POLLIN}], 1, 999)   = 0 (Timeout)

Ping then just repeats the previous steps: send a ICMP Echo Request packet, receives an ICMP Echo Response packet, reports the time it took between send and receive, and waits for the socket to be available again. The only meaningful difference is that the sequence number here is 2 instead of 1, as this is the second packet.

sendto(3, "\10\0\263\210\377\377\0\3\335\264\275i\0\0\0\0\334\202\16\0\0\0\0\0\20\21\22\23\24\25\26\27"..., 64, 0, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("127.0.0.1")}, 16) = 64
setitimer(ITIMER_REAL, {it_interval={tv_sec=0, tv_usec=0}, it_value={tv_sec=1, tv_usec=0}}, NULL) = 0
recvmsg(3, {msg_name={sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("127.0.0.1")}, msg_namelen=128 => 16, msg_iov=[{iov_base="\0\0\256\270\f\320\0\3\335\264\275i\0\0\0\0\334\202\16\0\0\0\0\0\20\21\22\23\24\25\26\27"..., iov_len=192}], msg_iovlen=1, msg_control=[{cmsg_len=32, cmsg_level=SOL_SOCKET, cmsg_type=SO_TIMESTAMP_OLD, cmsg_data={tv_sec=1774040285, tv_usec=951433}}, {cmsg_len=20, cmsg_level=SOL_IP, cmsg_type=IP_TTL, cmsg_data=[64]}], msg_controllen=56, msg_flags=0}, 0) = 64
write(1, "64 bytes from 127.0.0.1: icmp_se"..., 57) = 57

This part should be just another repetition of the precious loop, but it has a small difference: it sets a timer of one second after sending the packet. This is so the program doesn’t hang forever if the last packet is lost.

write(1, "\n", 1)                       = 1
write(1, "--- 127.0.0.1 ping statistics --"..., 34) = 34
write(1, "3 packets transmitted, 3 receive"..., 63) = 63
write(1, "rtt min/avg/max/mdev = 0.396/0.4"..., 50) = 50
close(1)                                = 0
close(2)                                = 0
exit_group(0)                           = ?

And finally, the program finished, printing relevant statistics of how many packets were dropped, and statistics about their latency. This is the entirety of Ping.

Conclusion

This was an incredibly extensive analysis of printf and ping using strace. The main idea is not for you to memorise this whole thing, but to give you a better understanding of what can be accomplished with a careful use of strace