Input Output Pipelines

Getting from here to there

Last week I mentioned the | (pipeline) operator, and gave a quick example of how to use it: ls -lS | less.

What that does is take the output of the ls command and send it to the less command as input. This is what pipelines, or “pipes”, are for: they let you take the output of one program and send it to another program for more processing. You can chain as many commands as you like, for example:

$ ls -la | tr -s " " | cut -d " " -f 3,9 | sort

If we take this piece by piece, we can see how a pipeline works.

The first step, ls -la, produces a long-format listing of the current working directory, hidden files included:

$ ls -la
total 260
drwxr-xr-x  4 gabe  gabe     512 May  4 12:14 .
drwxr-xr-x  3 root  wheel    512 Oct 25  2015 ..
-rw-r--r--  1 gabe  gabe      87 Aug 16  2015 .Xdefaults
-rw-r--r--  1 gabe  gabe     773 Aug 16  2015 .cshrc
-rw-r--r--  1 gabe  gabe     103 Aug 16  2015 .cvsrc
-rw-r--r--  1 gabe  gabe     398 Aug 16  2015 .login
-rw-r--r--  1 gabe  gabe     175 Aug 16  2015 .mailrc
-rw-r--r--  1 gabe  gabe     218 Aug 16  2015 .profile
drwx------  2 gabe  gabe     512 Oct 25  2015 .ssh
-rw-r--r--  1 gabe  gabe   10014 May  4 12:14 dmesg.txt
drwxr-xr-x  2 gabe  gabe     512 Dec  5  2015 etc
-rw-r--r--  1 gabe  gabe   11914 Dec 16  2015 index.html
-rw-------  1 gabe  gabe    5267 Oct 25  2015 mbox
-rw-r--r--  1 gabe  gabe   82300 May  3 13:27 pxeboot

Normally the output of a command will be displayed in the terminal for me to see. If I want to do something with that output other than just look at it, I can “pipe” it to another program using the | operator. In this case, I pipe it to the tr -s " " command.

The tr command “translates” its input and then outputs the translated result. In this case, the -s flag tells tr to “squeeze” every run of the following character down to a single instance. I’m telling tr to collapse the runs of spaces in the output of the ls command, so that now my output looks like this:

$ ls -la | tr -s " "
total 260
drwxr-xr-x 4 gabe gabe 512 May 4 12:14 .
drwxr-xr-x 3 root wheel 512 Oct 25 2015 ..
-rw-r--r-- 1 gabe gabe 87 Aug 16 2015 .Xdefaults
-rw-r--r-- 1 gabe gabe 773 Aug 16 2015 .cshrc
-rw-r--r-- 1 gabe gabe 103 Aug 16 2015 .cvsrc
-rw-r--r-- 1 gabe gabe 398 Aug 16 2015 .login
-rw-r--r-- 1 gabe gabe 175 Aug 16 2015 .mailrc
-rw-r--r-- 1 gabe gabe 218 Aug 16 2015 .profile
drwx------ 2 gabe gabe 512 Oct 25 2015 .ssh
-rw-r--r-- 1 gabe gabe 10014 May 4 12:14 dmesg.txt
drwxr-xr-x 2 gabe gabe 512 Dec 5 2015 etc
-rw-r--r-- 1 gabe gabe 11914 Dec 16 2015 index.html
-rw------- 1 gabe gabe 5267 Oct 25 2015 mbox
-rw-r--r-- 1 gabe gabe 82300 May 3 13:27 pxeboot
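If you want to try the squeeze step without depending on what is in your own directory, here is a small sketch that feeds tr one fixed line copied from the listing above. The variable names line and squeezed are just mine for illustration.

```shell
# One line copied from the ls -la listing, so the result is repeatable.
line='-rw-r--r--  1 gabe  gabe      87 Aug 16  2015 .Xdefaults'

# tr -s " " squeezes each run of spaces down to a single space.
squeezed=$(printf '%s\n' "$line" | tr -s " ")

printf '%s\n' "$squeezed"
# prints: -rw-r--r-- 1 gabe gabe 87 Aug 16 2015 .Xdefaults
```

Nothing but the spacing changes; every other character passes through untouched.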

Notice that every run of spaces in the original output is now just a single space. This matters because the next command, cut, splits its input into columns and keeps only the ones I care about, and it treats every single space as a column break. cut -d " " -f 3,9 says: take the input I am passing you, cut it into columns at every space (-d " "), then select fields 3 and 9 and discard the rest (-f 3,9). The total 260 line has no third or ninth field, so all that is left of it is an empty line at the top. So, at this point what I’m left with is:

$ ls -la | tr -s " " | cut -d " " -f 3,9

gabe .
root ..
gabe .Xdefaults
gabe .cshrc
gabe .cvsrc
gabe .login
gabe .mailrc
gabe .profile
gabe .ssh
gabe dmesg.txt
gabe etc
gabe index.html
gabe mbox
gabe pxeboot
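To see why the squeeze step has to come before cut, here is a rough sketch that runs cut on the same fixed line with and without tr -s " ". The variable names (line, raw, clean, total) are made up for this example.

```shell
# One line copied from the original ls -la listing, runs of spaces intact.
line='-rw-r--r--  1 gabe  gabe      87 Aug 16  2015 .Xdefaults'

# Without squeezing, each single space is a column break, so runs of spaces
# create empty columns and fields 3 and 9 are not the owner and the name.
raw=$(printf '%s\n' "$line" | cut -d " " -f 3,9)

# With squeezing, field 3 is the owner and field 9 is the file name.
clean=$(printf '%s\n' "$line" | tr -s " " | cut -d " " -f 3,9)

# The "total 260" line has only two fields, so nothing is selected from it.
total=$(printf '%s\n' 'total 260' | tr -s " " | cut -d " " -f 3,9)

printf 'raw:   [%s]\n' "$raw"
printf 'clean: [%s]\n' "$clean"
printf 'total: [%s]\n' "$total"
```

The clean result is gabe .Xdefaults, and the empty total result is where the blank line at the top of the output comes from.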

Finally, I’d like to sort that list, so I pass it along to the sort program. By default sort orders the lines by comparing each whole line, which here has the same effect as sorting on the owner first and the file name second (you can tell it to sort by a particular column, numerically, in reverse and so on by passing different options). So, now our output looks like this:

$ ls -la | tr -s " " | cut -d " " -f 3,9 | sort

gabe .
gabe .Xdefaults
gabe .cshrc
gabe .cvsrc
gabe .login
gabe .mailrc
gabe .profile
gabe .ssh
gabe dmesg.txt
gabe etc
gabe index.html
gabe mbox
gabe pxeboot
root ..
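As a sketch of those other types of sorting, here are three variations on a few rows taken from the output above. I’m assuming the -t, -k and -r options that standard sort provides, and I set LC_ALL=C so the ordering compares plain bytes and does not depend on your locale. The variable names are only for illustration.

```shell
# Three rows from the cut output, deliberately out of order.
rows='gabe mbox
root ..
gabe etc'

# Default: compare whole lines.
by_line=$(printf '%s\n' "$rows" | LC_ALL=C sort)

# -t " " sets the column separator, -k 2 sorts starting at the second column.
by_name=$(printf '%s\n' "$rows" | LC_ALL=C sort -t " " -k 2)

# -r reverses the default whole-line order.
reversed=$(printf '%s\n' "$rows" | LC_ALL=C sort -r)

printf 'whole line:\n%s\n\n' "$by_line"
printf 'by file name:\n%s\n\n' "$by_name"
printf 'reversed:\n%s\n' "$reversed"
```

Sorting by the second column puts root .. first, because .. sorts ahead of etc and mbox.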

Building a pipeline is often a matter of trial and error: you have some input and you would like some output, so you start using the various commands you know to massage the input until it looks like the output you want. The nice thing about this is that you can see the output at any step of the way, so you know whether you are on track or not. Pipelines are often used to transform data and are especially useful for analyzing log files, dealing with CSV (comma-separated values) data, or converting data from one format to another.
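Here is a small sketch of that kind of work on CSV data. The data is invented for the example (a name and a login shell per row), and uniq and head are two commands I haven’t covered in this post: uniq -c counts runs of identical neighbouring lines and head -n 1 keeps only the first line.

```shell
# Made-up CSV data: name,shell
csv='gabe,ksh
root,ksh
alice,csh
bob,ksh'

# Step 1: cut -d , -f 2  keeps only the second column (the shell).
# Step 2: sort           puts identical values next to each other.
# Step 3: uniq -c        counts each run of identical lines.
# Step 4: sort -rn       orders numerically, biggest count first.
counts=$(printf '%s\n' "$csv" | cut -d , -f 2 | sort | uniq -c | sort -rn)

printf '%s\n' "$counts"

# The most common shell, with the count padding stripped out.
top=$(printf '%s\n' "$counts" | head -n 1 | tr -d " ")
```

I built this one stage at a time, running it after adding each command to check the output, which is exactly the trial and error described above.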

New Terms

  • tr - A program to “translate” characters in its input. For example, to turn every lowercase a into an uppercase A, you would write: ls -al | tr a A
  • cut - Used to “cut” selected columns out of its input. If I only want columns 1 and 3 of space-separated input, I could say: ls -la | tr -s " " | cut -d " " -f 1,3 (the tr -s " " is there because cut treats every single space as a column break)
  • sort - The sort command is used to sort a given input by line.
  • pipe / pipeline - Indicated by the | (vertical bar) operator, it allows you to pass the output of one command as the input to another command.
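To make the tr entry above concrete, here is a quick sketch on a fixed word instead of a directory listing. The second command uses the [:lower:] and [:upper:] character classes, which standard tr supports but which I didn’t cover above. The variable names are just for illustration.

```shell
# Translate one character: every a becomes A.
one=$(printf '%s\n' 'banana' | tr a A)

# Translate a whole class: every lowercase letter becomes uppercase.
all=$(printf '%s\n' 'banana' | tr '[:lower:]' '[:upper:]')

printf '%s\n' "$one"   # prints: bAnAnA
printf '%s\n' "$all"   # prints: BANANA
```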