Sunday, September 30, 2012

Cross your "tee"

When dealing with bash scripts, we often need to troubleshoot a data pipeline. Many times I have seen the tee utility injected into the pipeline as a means to copy data into a log file. Often those tee calls are left behind, as they seem innocuous. It is a pass-through, changes nothing, right?

Not necessarily so. Look at the way we write pipelines:

producer | tee file.log | consumer

The fact that the producer sends data to tee through a pipe means that there is a buffer allocated for the data transfer, most commonly a mere 64 KB (4 × 16 KB buffers) in size. If the receiving process is not fast enough to take data off that buffer and the buffer fills up, the producer process will block on its next write. That is a totally reasonable architecture in many cases. However, if you have a fast producer-consumer pair and inject a slow tee between them, you will pay a performance penalty. Look at this very simple scenario.
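You can observe the blocking behavior directly. In this minimal sketch (not part of the original test), sleep never reads its standard input, so yes fills the kernel pipe buffer and then blocks:

#!/bin/bash
# 'sleep' never reads its stdin: 'yes' fills the kernel pipe buffer,
# then blocks on its next write for the rest of the minute.
# When sleep exits, the read end closes and yes dies of SIGPIPE.
yes | sleep 60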

Here is what is in the directory:

$ l
total 19531296
-rwxr-x---  1 vlad  staff  -   38B Jun 21 16:30 cat2null*
-rwxr-x---  1 vlad  staff  -   44B Jun 21 16:22 catcat2null*
-rwxr-x---  1 vlad  staff  -   50B Jun 21 16:24 catcatcat2null*
-rwxr-x---  1 vlad  staff  -   60B Jun 21 16:23 catntee*
-rw-r-----  1 vlad  staff  -  9.3G Jun 21 16:21 large_file
-rwxr-x---  1 vlad  staff  -  343B Jun 21 16:21 mk10g*

That is a 10-gigabyte file and a few scripts. The large_file consists of 99-character-long lines.

The cat* scripts have a relatively slow producer reading data from a local disk (an SSD in my case) and some very fast consumers, which either promptly discard the data into /dev/null or push it through other supposedly fast interim pipes.

Note how the scripts with supposedly fast pass-through stages take significantly longer with each extra buffered stdio pipe involved (the cache is primed first to ensure comparable conditions):

$ cat large_file >/dev/null; for i in ./cat*; do echo ""; echo "----- $i -----"; cat $i; echo -n "-----"; time $i; done; echo ""

----- ./cat2null -----
#!/bin/bash
cat large_file >/dev/null
-----
real 0m7.929s
user 0m0.027s
sys 0m3.130s

----- ./catcat2null -----
#!/bin/bash
cat large_file | cat >/dev/null
-----
real 0m8.688s
user 0m0.242s
sys 0m6.435s

----- ./catcatcat2null -----
#!/bin/bash
cat large_file | cat | cat >/dev/null
-----
real 0m8.668s
user 0m0.393s
sys 0m9.941s

----- ./catntee -----
#!/bin/bash
cat large_file | tee /dev/null | cat >/dev/null
-----
real 0m9.399s
user 0m0.639s
sys 0m12.258s

It is very telling. Interim cat filters and a tee to /dev/null are not perceived as expensive, yet they turn out to consume significantly more system time. I am running on a single-CPU, multi-core MacBook Pro, and the scripts do not exhaust the available cores, so the wall (real) time does not grow as fast as the system (sys) time does. That is, if the roughly 9% jump from the first pipe introduced is not a big deal for you. Keep in mind that on a busy system, or with highly concurrent software where the existing cores are tightly scheduled, that sys time will swiftly spill over into real time. And you will not like the slow-down.
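To get a feel for that spill-over, here is a rough sketch (hypothetical, not part of the original measurements): saturate the cores with throwaway busy loops and re-time the tee pipeline. With no idle cores left, the extra sys time has nowhere to hide and shows up in real time:

#!/bin/bash
# start one busy loop per core (adjust the count to your machine)
for i in 1 2 3 4; do yes > /dev/null & done
# re-run the tee pipeline under load
time ./catntee
# clean up the background busy loops
kill $(jobs -p)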

I have seen a production process on a twelve-core server speed up eightfold when the logging tee that pointed to an NFS mount was removed. The point is that mindless cruft has a good chance of hurting your system.

Beware. Keep your scripts clean. Remove unnecessary pipe components, especially if they do IO.
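If the log copy is genuinely useful while troubleshooting, one way to keep it without paying for it every run is to make the tee conditional. A minimal sketch, assuming a hypothetical DEBUG_LOG variable and the producer/consumer placeholders from above:

#!/bin/bash
# pay for tee only when a debug log is explicitly requested
if [[ -n "$DEBUG_LOG" ]]; then
  producer | tee "$DEBUG_LOG" | consumer
else
  producer | consumer
fi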

P.S. If you would like to reproduce the test on your system, here is the mk10g script that generates the 10-gigabyte test file:

#!/bin/bash
# d is 19 chars; l joins five copies with '0' separators => 99 chars
d="1234567890123456789"
l=${d}0${d}0${d}0${d}0${d}
# k holds ten lines of 100 bytes each (99 chars + \n) => 1000 bytes
k="${l}\n${l}\n${l}\n${l}\n${l}\n${l}\n${l}\n${l}\n${l}\n${l}\n"

rm -f 1m 1g large_file

# 1000 x 1000 bytes => a 1 MB seed file
for (( i=0 ; i<1000 ; i=i+1 ))
do
  echo -en ${k} >> 1m
done

# 1000 x 1 MB => a 1 GB file
for (( i=0 ; i<1000 ; i=i+1 ))
do
  cat 1m >> 1g
done

# 10 x 1 GB => the 10-gigabyte (9.3 GiB) large_file
for (( i=0 ; i<10 ; i=i+1 ))
do
  cat 1g >> large_file
done

rm -f 1m 1g
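A quick sanity check after running it: the file should show up as roughly 9.3G, and each line should be 100 bytes (99 characters plus the newline counted by wc -c):

ls -lh large_file
head -n 1 large_file | wc -c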
