In the day-to-day toil on the command line, it’s easy to overlook the complexity behind many of the constructs you use all the time.
In a POSIX shell, one such construct is the ability to pipe between commands with |, and to redirect their input and output with < and >.
Let’s stop and smell the roses, and ask: how does this actually work?
As an example, have you ever wondered what happens under the hood when you write a command like this?
cat foo.txt > bar.txt
That’s what we’ll take a look at in this post.
In order for us to look into the belly of the beast, so to speak, we’ll need a tool to monitor system calls for a given process.
Since I’m doing this on an OS X system, the tool of choice is dtruss, a DTrace version of truss. On Linux, strace can be used instead.
If you’re not interested in trying this out for yourself, skip on ahead to the Inspection section.
By default, dtruss doesn’t work because of System Integrity Protection (SIP), a security feature of OS X. If you try to attach to a running process, you’ll first get this message from dtrace:
$ sudo dtruss -f -p 43334
dtrace: system integrity protection is on, some features will not be available
And then the log will be filled with dtrace errors like this, as soon as the process makes any system calls:
dtrace: error on enabled probe ID 2633 (ID 265: syscall::ioctl:return): invalid user access in action #5 at DIF offset 0
In order to work around this problem, it’s possible to disable SIP for dtrace exclusively. Reboot OS X in recovery mode and enter the following command in a terminal:
csrutil enable --without dtrace
You’ll see the following warning message:
This is an unsupported configuration, likely to break in the future and leave your machine in an unknown state.
That’s ok for now. Restoring the default configuration later can be done with:
Reboot to normal mode again and open a terminal.
To reduce the number of unrelated events in the output from dtruss, it’s a good idea to run commands in a minimal environment without various hooks and other modern shell niceties.
Starting a new instance of bash, for example, without inheriting the parent environment or loading a profile or rc file, can be done like so:
env -i bash --noprofile --norc
In the minimal bash instance just started, get the process ID of the shell:
bash-3.2$ echo $$
Now we’re ready to start monitoring. Open up a separate shell; note this doesn’t have to be minimal like above. Start up dtruss, attaching it to the bash process:
$ sudo dtruss -p 529 -f
The -f here makes sure any forked children are followed as well. If all went well, you’ll see this header appear:
PID/THRD SYSCALL(args) = return
Now we’re ready to issue our command with output redirection.
I’m using the following small test file in this example, but any file will do really:
Back in our minimal bash shell, we’ll issue this simple command, redirecting stdout to the file bar.txt:
cat foo.txt > bar.txt
Now let’s take a look at what dtruss has picked up.
After running the command, we should see a lot of stuff in the log output from dtruss.
The full output I got from dtruss can be found in this gist. For a better overview, I created a filtered version with irrelevant system calls omitted:
grep -v -E "ioctl|sigaction|sigprocmask|stat64|mprotect" dtruss.log > dtruss.short.log
Here’s the shortened version:
Quickly skimming the log reveals that we’re looking at two different process ID / thread ID pairs: 1436/0x5b3d on lines 1-5 and 36-39, and 1458/0x5c1d on lines 6-35.
The reason for this is that the shell utilises a fork-exec approach for running program binaries such as cat, or really anything that isn’t a shell builtin.
It works by the parent process, in this case 1436, calling fork. This makes a copy of the current process, and execution continues in both, albeit with some important differences.
In the child, fork returns zero, and in the parent, it returns the process ID of the forked child. That is how each copy determines whether it is the one that should subsequently be replaced by a new program through one of the exec family of system calls. Here the dtrace probe is unable to trace the call properly, but on line 21 we see an error for an execve call, so that is most likely the one used.
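To make the two return values concrete, here is a minimal sketch in Python, whose os.fork wraps the same system call. The function name fork_and_report and the exit status 7 are made up for illustration, and the child simply exits where a shell would go on to call one of the exec functions.

```python
import os


def fork_and_report():
    """Fork once and return (child_pid, exit_status) as seen by the parent."""
    pid = os.fork()
    if pid == 0:
        # Child: fork returned 0. A shell would call an exec function here;
        # this sketch just exits with a recognisable status instead.
        os._exit(7)
    # Parent: fork returned the process ID of the child, so wait for it.
    _, status = os.waitpid(pid, 0)
    return pid, os.WEXITSTATUS(status)
```

Calling fork_and_report() from the parent gives back a positive process ID together with the status the child exited with.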
From line 6 onwards, the log output comes from the child process. The first lines of interest here are 11-13. Let’s look at them one at a time.
On line 11, we can see an open system call for the file bar.txt returning successfully with a file descriptor value of 3, or 0x3 if you will.
Next on line 12, there is a dup2 call, with the descriptor value for bar.txt and then 0x1.
The man page for dup2 is somewhat awkwardly worded, but in short, this means “change whatever file descriptor 0x1 is pointing to, to whatever file descriptor 0x3 is pointing to”.
We already know 0x3 is a descriptor for bar.txt, but what about 0x1?
In POSIX, every process has three standard streams made available by the host system: stdin, stdout and stderr, which by definition have the file descriptor values 0, 1 and 2.
That means the dup2 call effectively changes the descriptor for stdout to point to the same thing as the descriptor for bar.txt. This is relevant, since cat reads files and writes them to the standard output.
On line 13 there is a close call on the descriptor for bar.txt. Now this may seem weird, since no data has actually been written to the file yet, but keep in mind this is only releasing the file descriptor. It doesn’t do anything to the file itself. Remember the descriptor for stdout now points to bar.txt, so the new descriptor is no longer needed and can just as well be made available to the system again.
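The open, dup2 and close sequence can be reproduced in a few lines of Python, since os.open, os.dup2 and os.close are thin wrappers around the system calls of the same names. This is a sketch rather than what bash literally does: redirect_fd_to_file is a hypothetical helper, and the open flags (write-only, create, truncate) are my assumption of what > asks for, since they aren’t discussed above.

```python
import os


def redirect_fd_to_file(target_fd, path):
    """Make target_fd refer to path, mirroring the open/dup2/close sequence."""
    # open: the kernel hands back the lowest unused descriptor (3 in the log).
    # The flags are an assumption about what the shell passes for ">".
    new_fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o666)
    if new_fd != target_fd:
        # dup2: target_fd now refers to the same open file as new_fd.
        os.dup2(new_fd, target_fd)
        # close: only the extra descriptor is released; the file itself
        # stays open through target_fd.
        os.close(new_fd)
```

After redirect_fd_to_file(1, "bar.txt"), anything the process writes to descriptor 1 ends up in bar.txt.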
The next lines of interest are 29-33.
On line 29, we see another open call, but this time for foo.txt. Since the descriptor 0x3 was released on line 13, it is the first one available and is reused here.
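The reuse of 0x3 follows from open always returning the lowest descriptor number that is not currently in use. A small Python sketch shows the same effect; reopen_after_close is a hypothetical name, and the result assumes nothing else in the process opens a file in between.

```python
import os


def reopen_after_close(first_path, second_path):
    """Open and close one file, then open another; return both descriptors."""
    fd_a = os.open(first_path, os.O_RDONLY)
    os.close(fd_a)  # the number held by fd_a is free again
    fd_b = os.open(second_path, os.O_RDONLY)  # gets the lowest free number
    os.close(fd_b)
    return fd_a, fd_b
```

In a single-threaded process the two returned numbers are the same, just as bar.txt and foo.txt both ended up on 0x3 in the log.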
On lines 30-31, we see a read call on descriptor 0x3, which puts the content of foo.txt into memory, followed by a write on the stdout descriptor. Remembering that stdout now points to bar.txt, we can conclude that the content of foo.txt has been written to bar.txt.
On lines 32-33, a final read on the descriptor of foo.txt returns zero, which indicates end-of-file, followed by an immediate close.
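The read, write, read-until-zero pattern is the classic copy loop. Here is a rough Python equivalent, not the actual source of cat: copy_fd and its chunk_size parameter are my own names, and os.read signals end-of-file by returning an empty bytes object rather than zero.

```python
import os


def copy_fd(src_fd, dst_fd, chunk_size=4096):
    """Copy everything from src_fd to dst_fd; return the bytes written."""
    total = 0
    while True:
        chunk = os.read(src_fd, chunk_size)
        if not chunk:
            # Empty result: end-of-file, like read returning zero in the log.
            break
        view = memoryview(chunk)
        while view:
            # write may accept fewer bytes than offered, so keep going
            # until the whole chunk has been written.
            written = os.write(dst_fd, view)
            view = view[written:]
            total += written
    return total
```

For a small file like the one used here, the loop body runs once and the second read reports end-of-file, which matches the two read calls in the log.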
On line 35, the last event from the child process closes stdout, with a call to close_nocancel.
Finally, on line 36, we see control return to the parent process with wait4, which waits for the child process to finish.
After this the log trace ends and the command is done.
So, to come full circle, when you enter a command like this:
cat foo.txt > bar.txt
What really happens behind the scenes is the following:
- A child process is forked from the current process.
- In the child, bar.txt is opened for writing, creating a new file descriptor.
- The file descriptor for stdout is made to point to bar.txt.
- The new descriptor is closed.
- The child process replaces itself with cat via an exec type call.
- foo.txt is opened for reading, creating a new file descriptor.
- A read to memory from the new descriptor of foo.txt is done.
- A write from memory to the descriptor of stdout is done.
- The new descriptor of foo.txt is closed.
- The descriptor of stdout is closed.
- The parent process waits for the child to finish.
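The steps above can be strung together in a short Python sketch. It only approximates what bash does and is not its actual implementation: run_redirected is a hypothetical name, the open flags are assumed, and 127 is used as the exit status when the program cannot be started. Following the order seen in the log, the redirection is set up in the child before the exec call, so cat never needs to know about bar.txt.

```python
import os


def run_redirected(argv, out_path):
    """Run argv with stdout redirected to out_path; return its exit status."""
    pid = os.fork()
    if pid == 0:
        # Child process: set up the redirection, then become the program.
        try:
            # Flags are an assumption about what the shell passes for ">".
            fd = os.open(out_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o666)
            if fd != 1:
                os.dup2(fd, 1)  # stdout now refers to out_path
                os.close(fd)    # the extra descriptor is no longer needed
            os.execvp(argv[0], argv)  # only returns (by raising) on failure
        finally:
            # Reached only if open, dup2 or exec failed.
            os._exit(127)
    # Parent process: wait for the child to finish.
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)
```

Calling run_redirected(["cat", "foo.txt"], "bar.txt") should then behave much like the shell command, with the reading and writing done by cat itself after the exec.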
It’s not all magic, but pretty close.