Behind the scenes of shell IO redirection

In the day-to-day toil on the command line, it’s easy to overlook the complexity behind many of the constructs we use pervasively.

In a POSIX shell, one such construct is the ability to pipe between commands and to redirect their input and output with <, > and |.

If we stop to smell the roses, the question becomes: how does it actually work?

As an example, have you ever wondered what actually happens under the hood when you write a command like this?

cat foo.txt > bar.txt

That’s what we’ll take a look at in this post.

dtruss

In order for us to look into the belly of the beast so to speak, we’ll need a tool to monitor system calls for a given process.

Since I’m doing this on an OS X system, the tool of choice is dtruss, a DTrace-based version of truss. On Linux, strace can be used instead.

If you’re not interested in trying this out for yourself, skip on ahead to the Inspection section.

Preflight checklist

By default, dtruss doesn’t work because of the System Integrity Protection (SIP) security feature of OS X. If you try to attach to a running process, you’ll initially get this error message from dtrace:

$ sudo dtruss -f -p 43334
dtrace: system integrity protection is on, some features will not be available

And then, as soon as the process makes any system calls, the log will be filled with dtrace errors like this:

dtrace: error on enabled probe ID 2633 (ID 265: syscall::ioctl:return): invalid user access in action #5 at DIF offset 0

To work around this problem, it’s possible to disable SIP for dtrace exclusively. Reboot OS X into recovery mode and enter the following command in a terminal:

csrutil enable --without dtrace

You’ll see the following warning message:

This is an unsupported configuration, likely to break in the future and leave your machine in an unknown state.

That’s ok for now. Restoring the default configuration later can be done with:

csrutil enable

Reboot to normal mode again and open a terminal.

Noise reduction

To reduce the number of unrelated events in the output from dtruss, it’s a good idea to run commands in a minimal environment without various hooks and other modern shell niceties.

Starting a new instance of bash, for example, without inheriting the parent environment or loading a profile or rc file, can be done like so:

env -i bash --noprofile --norc

Take-off

In the minimal bash instance just started, get the process ID of the shell:

bash-3.2$ echo $$
529

Now we’re ready to start monitoring. Open up a separate shell; note this doesn’t have to be minimal like above. Start up dtruss, attaching it to the bash process:

$ sudo dtruss -p 529 -f

The -f here makes sure any forked children are followed as well. If all went well, you’ll see this header appear:

PID/THRD SYSCALL(args) = return

Now we’re ready to issue our command with output redirection.

Run

I’m using a small test file, foo.txt, in this example, but any file will do really.

Back in our minimal bash shell, we’ll issue this simple command, redirecting stdout to the file bar.txt:

cat foo.txt > bar.txt

Now let’s take a look at what dtruss has picked up.

Inspection

After running the command, we should see a lot of stuff in the log output from dtruss.

The full output I got from dtruss can be found in this gist. For a better overview, I created a filtered version with irrelevant system calls omitted:

grep -v -E "ioctl|sigaction|sigprocmask|stat64|mprotect" dtruss.log > dtruss.short.log

Here’s the shortened version:

Target file

Quickly skimming the log reveals that we’re looking at two different process ID/thread ID pairs: 1436/0x5b3d on lines 1-5 and 36-39, and 1458/0x5c1d from lines 6 to 35.

The reason for this is that the shell utilises a fork-exec approach for running program binaries, e.g. cat, or really anything that isn’t a shell builtin.

The way it works is that the parent process, in this case 1436, calls fork. This makes a copy of the current process, and execution continues in both, albeit with some important differences.

In the child, fork returns a value of zero, and in the parent it returns the process ID of the forked child. That is how it’s determined which of the two will subsequently transform into a new process through one of the exec family of system calls. Here the dtrace probe is unable to trace the call properly, but on line 21 we see an error for an execve call, so that is most likely the one.
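
To make the pattern concrete, here is a minimal C sketch of the fork-exec-wait dance described above, assuming a hard-coded /bin/cat path purely for illustration; it is not taken from the trace:

#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    pid_t pid = fork();            /* copy the current process */

    if (pid == -1) {
        perror("fork");
        exit(EXIT_FAILURE);
    }

    if (pid == 0) {
        /* Child: fork returned 0, so replace this process image with cat. */
        char *argv[] = {"cat", "foo.txt", NULL};
        char *envp[] = {NULL};
        execve("/bin/cat", argv, envp);
        perror("execve");          /* only reached if execve fails */
        _exit(EXIT_FAILURE);
    }

    /* Parent: fork returned the child's PID, so wait for it to finish. */
    int status;
    waitpid(pid, &status, 0);
    return 0;
}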

From line 6 onwards, the log output is coming from the child process. The first lines of interest here are 11-13. Let’s look at them one at a time.

On line 11, we can see an open system call for the file bar.txt returning successfully with a file descriptor value of 3, or 0x3 if you will.

Next on line 12, there is a dup2 call, with the descriptor value for bar.txt and then 0x1.

The man page for dup2 is somewhat awkwardly worded, but in short, this means “make file descriptor 0x1 point to whatever file descriptor 0x3 currently points to”.

We already know 0x3 is a descriptor for bar.txt, but what about 0x1?

In POSIX, every process has three standard streams made available by the host system: stdin, stdout and stderr, which by definition have the descriptor values 0, 1 and 2.

That means the dup2 call effectively changes the descriptor for stdout to point to the same thing as the descriptor for bar.txt. This is relevant, since cat reads files and writes them to the standard output.

On line 13 there is a close call on the descriptor for bar.txt. This may seem odd, since no data has actually been written to the file yet, but keep in mind this only releases the file descriptor; it doesn’t do anything to the file itself. The descriptor for stdout now points to bar.txt, so descriptor 0x3 is no longer needed and can just as well be made available to the system again.
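
Putting lines 11-13 together, the redirection setup amounts to something like the following minimal C sketch, assuming the usual open flags for > redirection (O_WRONLY, O_CREAT and O_TRUNC); this is an illustration of the idea, not code lifted from the shell:

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void) {
    /* open() returns the lowest free descriptor, which was 0x3 in the trace. */
    int fd = open("bar.txt", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd == -1) {
        perror("open");
        exit(EXIT_FAILURE);
    }

    /* Make descriptor 1 (stdout) refer to bar.txt as well. */
    if (dup2(fd, STDOUT_FILENO) == -1) {
        perror("dup2");
        exit(EXIT_FAILURE);
    }

    /* The temporary descriptor is no longer needed; the file stays open
       through descriptor 1. */
    close(fd);

    /* Anything written to stdout now ends up in bar.txt. */
    printf("hello, bar.txt\n");
    return 0;
}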

Source file

The next lines of interest are 29-33.

On line 29, we see another open call, but this time for foo.txt. Since descriptor 0x3 was released on line 13, it is the first one available and is reused here.

On lines 30-31 we see a read call on descriptor 0x3, which puts the content of foo.txt into memory, followed by a write on the stdout descriptor. Since stdout now points to bar.txt, the content of foo.txt has effectively been written to bar.txt.

On lines 32-33, a final read on the descriptor of foo.txt returns zero, which indicates end-of-file, followed by an immediate close.
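
For completeness, the read/write loop that cat performs could look roughly like this in C, assuming a fixed buffer size and ignoring partial writes for brevity; again, a sketch rather than cat’s actual source:

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void) {
    int fd = open("foo.txt", O_RDONLY);   /* reuses the lowest free descriptor */
    if (fd == -1) {
        perror("open");
        exit(EXIT_FAILURE);
    }

    char buf[4096];
    ssize_t n;
    /* read() returns the number of bytes read, and 0 at end-of-file. */
    while ((n = read(fd, buf, sizeof(buf))) > 0) {
        /* With stdout redirected, this write lands in bar.txt. */
        write(STDOUT_FILENO, buf, (size_t)n);
    }

    close(fd);
    return 0;
}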

On line 35, the last event from the child process closes stdin, with a call to close_nocancel.

Finally, on line 36, we see control return to the parent process with wait4, which waits for the child process to finish.

After this the log trace ends and the command is done.

Recap

So, to come full circle, when you enter a command like this:

cat foo.txt > bar.txt

What really happens behind the scenes is the following:

  1. A child process is spawned from the current process.
    1. The child process is transformed into a new process for cat via an exec type call.
    2. bar.txt is opened for writing, creating a new file descriptor.
    3. The file descriptor for stdout is made to point to bar.txt.
    4. The new descriptor is closed.
    5. foo.txt is opened for reading, creating a new file descriptor.
    6. A read to memory from the new descriptor of foo.txt is done.
    7. A write from memory to the descriptor of stdout is done.
    8. The new descriptor of foo.txt is closed.
    9. The descriptor of stdin is closed.
  2. Parent process waits for child to finish.
  3. Done.

It’s not all magic, but pretty close.

Ruby on Rails Gotcha: Asynchronous loading of Javascript in development mode

Everyone knows that you shouldn’t block page rendering by synchronously loading a big chunk of Javascript in the head of your page, right? Hence you might be tempted to change the default Javascript include tag from this:

<%= javascript_include_tag 'application' %>

To this:

<%= javascript_include_tag 'application', async: true %>

This makes perfect sense when serving all Javascript in one big file, as is the case in production, since everything is defined at the same time. What about development though?

Well, in development Rails is kind enough to let you work on individual Javascript files, which means it will recompile only as needed when a single file is changed. To this effect, each file is included separately via its own script tag in the header, e.g.:

<script src="/assets/jquery-87424--.js?body=1"></script>
<script src="/assets/jquery_ujs-e27bd--.js?body=1"></script>
<script src="/assets/turbolinks-da8dd--.js?body=1"></script>
<script src="/assets/somepage-b57f2--.js?body=1"></script>
<script src="/assets/application-628b3--.js?body=1"></script>

* Tags intentionally shortened in example.

There is a subtlety here that is quite important. All the scripts are loaded synchronously, one after the other, in the order they appear in the application.js manifest. This means we’re guaranteed that jQuery, etc. is available once we get to our own scripts.

Now consider the same scripts, but with async=true:

<script src="/assets/jquery-87424--.js?body=1" async="async"></script>
<script src="/assets/jquery_ujs-e27bd--.js?body=1" async="async"></script>
<script src="/assets/turbolinks-da8dd--.js?body=1" async="async"></script>
<script src="/assets/somepage-e23b4--.js?body=1" async="async"></script>
<script src="/assets/application-628b3--.js?body=1" async="async"></script>

Since all scripts in this case are loaded *asynchronously*, all previous guarantees are now lost, and we’ll very likely start seeing errors like this:

Uncaught ReferenceError: $ is not defined

Oops!

The fix is simple though: Don’t load Javascript assets asynchronously in development mode!

Here’s one way of doing it:

<%= javascript_include_tag 'application', async: Rails.env.production? %>

Happy hacking!

Page specific Javascript in Rails 3

Premise

One of the neat features from Rails 3.1 and up is the asset pipeline:

The asset pipeline provides a framework to concatenate and minify or compress JavaScript and CSS assets. It also adds the ability to write these assets in other languages such as CoffeeScript, Sass and ERB.

This means that in production you will have one big Javascript file and one big CSS file. This reduces the number of requests the browser has to make and generally loads the page faster.

In the case of Javascript concatenation, however, it does bring about a problem. Executing code when the DOM has loaded is commonplace in most web applications today, but if everything is included in one big file, and more importantly the same file, for all actions on all controllers, how do you run code that is specific to just a single view?

Solution(s)

Obviously there is more than one way of solving this problem, and rather unlike Rails, there doesn’t seem to be any “best practice” dictated. The closest I found is this excerpt from section 2 of the Rails Guide about the Asset Pipeline:

You should put any JavaScript or CSS unique to a controller inside their respective asset files, as these files can then be loaded just for these controllers with lines such as <%= javascript_include_tag params[:controller] %> or <%= stylesheet_link_tag params[:controller] %>.

And it isn’t even followed by an example, which seems more like an indication that this isn’t something you should do at all.

Let’s start with this example nonetheless.

Per controller inclusion

By default Rails has only one top level Javascript manifest file, namely app/assets/javascripts/application.js which has the following content:

// This is a manifest file that'll be compiled into including all the files listed below.
// Add new JavaScript/Coffee code in separate files in this directory and they'll automatically
// be included in the compiled file accessible from http://example.com/assets/application.js
// It's not advisable to add code directly here, but if you do, it'll appear at the bottom of the
// the compiled file.
//
//= require jquery
//= require jquery_ujs
//= require_tree .

And this is included in the default layout with:

<%= javascript_include_tag "application" %>

N.B. When testing production on localhost, e.g. with rails s -e production, Rails by default won’t serve static assets, which application.js becomes after pre-compilation. To avoid problems when testing production locally, the following setting needs to be changed from false to true in config/environments/production.rb:

# Disable Rails's static asset server (Apache or nginx will already do this)
config.serve_static_assets = true

Now let’s say we have a controller, let’s call it ApplesController, and its corresponding Coffeescript file, apples.js.coffee. We might try to include it as per the Rails Guide suggestion like so:

<%= javascript_include_tag params[:controller] %>

And this will work just fine in development mode, but in production produce the following error:

ActionView::Template::Error (apples.js isn't precompiled):

To remedy this, we need to do a couple of things. First off we should remove the require_tree . part from application.js, so we don’t wind up including the same script twice. Just removing the equal sign is enough:

//  require_tree .

To avoid a name clash rename apples.js.coffee to something else, e.g. apples.controller.js.coffee. Then create a new manifest file named apples.js, which includes your coffeescript file:

//= require apples.controller

Lastly, the default configuration of Rails only includes and pre-compiles application.js, so we need to tell the pre-compiler to also include apples.js. This is also done in config/environments/production.rb. Uncomment the following setting, and change search.js to apples.js:

# Precompile additional assets (application.js, application.css, and all non-JS/CSS are already added)
config.assets.precompile += %w( apples.js )

Note that this is a match, so it could also be something like '*.js' in case you have more manifests, which would be the case for per-controller inclusion.

Views

The same concept as above could be extended to target individual actions/views of each controller by having the action be part of the manifest name. Individual Javascript files could then be included like so:

<%= javascript_include_tag "#{params[:controller]}.#{params[:action]}" %>

This makes the assumption that all actions on all controllers have a dedicated Javascript file, an assumption which most likely won’t hold in most cases. Another option could be a conditional include like so:

<%= yield :action_specific_js if content_for?(:action_specific_js) %>

And then move the include tag to the specific views that need it.

Testing for existence of a page element or class

The DOM loaded event handler could look something like this:

jQuery ->
  if $('#some_element').length > 0
    # Do some stuff here

This could also be a class on body, e.g.:

jQuery ->
  if $('body.controller_name_action_name').length > 0
    # Do some stuff here

And then the ERB would look like this:

<!DOCTYPE html>
<html>
<head>
  <title>AppName</title>
  <%= stylesheet_link_tag    "application" %>
  <%= javascript_include_tag "application" %>
  <%= csrf_meta_tags %>
</head>
<body class="<%= "#{params[:controller]}_#{params[:action]}" %>">
 
<%= yield %>
 
</body>
</html>

Function encapsulation and on-page triggering

Instead of registering the handlers for DOM loaded, wrap the necessary code in a function that can be called later and then trigger that function directly in the respective view.

There is one thing we need to consider though. The Coffeescript source for each controller gets wrapped in its own closure, i.e. this Coffeescript in apples.js.coffee:

apples_index = ->
  console.log("Hello! Yes, this is Apples.")

Becomes:

(function() {
  var apples_index;

  apples_index = function() {
    return console.log("Hello! Yes, this is Apples.");
  };

}).call(this);

So in order for us to have a globally callable function, we must first expose it somehow. We can do this by attaching the function to the window object, changing the above code like so:

window.exports ||= {}
window.exports.apples_index = ->
  console.log("Hello! Yes, this is Apples.")

If we insert this yield line in the application.html.erb layout just before the closing body tag:

<!DOCTYPE html>                                                                           
<html>                                                                                    
<head>                                                                                    
  <title>AppName</title>                                                                   
  <%= stylesheet_link_tag    "application" %>                                             
  <%= javascript_include_tag "application" %>                                             
  <%= csrf_meta_tags %>                                                                   
</head>                                                                                   
<body>                                                                                    
 
<%= yield %>                                                                              
 
<%= yield :action_specific_js if content_for?(:action_specific_js) %>                     
</body>                                                                                   
</html>

We can now call the exposed function directly from our view like so:

<% content_for :action_specific_js do %>
<script type="text/javascript" language="javascript">
  $(function() { window.exports.apples_index(); });
</script>
<% end %>

Wrap up

None of these three examples is a “one-size-fits-all” solution, I would say. Dividing up the Javascript source will start to make sense as soon as the Javascript codebase grows past a certain size. It might be interesting to test just how big that size is on a given bandwidth, but I think that’s out of scope for this post.

Given that there isn’t really a defined best practice yet, perhaps the Ruby community will come up with something better than the examples I presented here. This is definitely something that could be better thought out.

Speeding up with parallel compression – pbzip2

Premise

Today I found myself in need of archiving some virtual machines, which are quite often rather large. The machine I was working on was a 4-core (8 with HT) Xeon powerhouse, and I was curious to see if there was any way to speed up compression times for this particular task.

Looking into things

I usually grab for my trusty old friend tar when creating archives, and it does get the job done well. The thing about tar though is that it is inherently single-threaded, so it doesn’t really matter how many CPU cores you throw at it.

After digging around a bit I found pbzip2. Description:

pbzip2 is a parallel implementation of the bzip2 block-sorting file compressor that uses pthreads and achieves near-linear speedup on SMP machines.

Sounds good, right? I decided to try it out and measure the results. The size of the virtual machine was about 11G:

~$ du -hs *
11G     WinXP_32Bit

With just plain old tar it took about eight minutes:

~$ time tar zcvf winxp.tar.gz WinXP_32Bit
 
real    8m18.583s
user    6m47.089s
sys     0m15.129s

Not bad, but that is nothing compared to piping it through pbzip2:

~$ time tar -c WinXP_32Bit | pbzip2 -c > winxp.tar.bz2
 
real    4m54.942s
user    38m22.452s
sys     0m25.022s

Screenshots of htop to show the difference in cpu core utilisation:

Both resulting archives were of equal size, so the immediate benefit is purely speed:

~$ du -hs *
11G     WinXP_32Bit
6.2G    winxp.tar.bz2
6.2G    winxp.tar.gz

For good measure, I also timed the decompression speeds. Though there was still a gain in speed, it was not quite as significant as with compression:

~$ time tar zxvf winxp.tar.gz
 
real    5m8.636s
user    1m20.061s
sys     0m20.413s
~$ time pbzip2 -d winxp.tar.bz2
real    4m32.329s
user    13m15.814s
sys     0m19.057s

Other limiting factors, such as disk read/write speed, certainly come into play as well. Playing around with different settings of pbzip2 might also reveal greater performance boosts than this simple example, but even with the defaults, it is now a welcome addition to my *nix toolkit.

Web based WIFI analyzer

Update Jan. 11th 2017: The web version of the stumbler is no longer available. An Android app is all that remains; available from Google Play store: https://play.google.com/store/apps/details?id=com.meraki.wifistumbler.


If you’ve ever felt that your wireless was a tad sluggish, one of the reasons might be interfering signals from your neighbours. Choosing the channel with the least amount of interference is thus the way to go if you want a good, strong signal.

Obviously there are a ton of WIFI analyzer programs out there, but why bother installing anything when you can just open up your browser and do it from there?

Enter Meraki Tools | WIFI STUMBLER:

URL: http://tools.meraki.com/stumbler