NOTE: Scripts in this post were tested on CentOS release 5.3 and may not work in other Unix dialects. For example, the flock(1) utility is absent from OS X 10.6.
When writing a daemon shell script we traditionally use *.pid files to store the process ID for later use and to avoid multiple copies running simultaneously and potentially colliding. The common pattern of such use is:
self=$(basename "$0")
lock="/tmp/${self}.pid"
if [ -f "${lock}" ]
then
    echo "Another copy of ${self} potentially running." >&2
    echo "Check the ${lock} lock and remove if necessary." >&2
    exit 1
else
    echo $$ >"${lock}"
    trap 'rm -f "${lock}"' EXIT
fi
The problem with this approach is that after a process or system crash the lock is stale and the new copy of the daemon will not start without manual cleanup.
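To make the failure mode concrete, here is a minimal sketch that simulates a crash. The scratch file from mktemp and the variable names are assumptions made for the demo. A background sleep stands in for the daemon and is killed with KILL, so no cleanup trap gets a chance to run.

```shell
#!/bin/bash
# Sketch: a pid file left behind by a killed process blocks the next start.
lock=$(mktemp /tmp/stale_demo.XXXXXX)    # hypothetical lock path for the demo

# Simulate a daemon that records its PID and then crashes without cleanup.
sleep 30 &
dead=$!
echo "${dead}" >"${lock}"
kill -KILL "${dead}"
wait "${dead}" 2>/dev/null

# The file-existence check from the pattern above still refuses to start...
if [ -f "${lock}" ]; then verdict="refused"; else verdict="started"; fi

# ...even though the recorded PID no longer belongs to a running process.
if kill -0 "${dead}" 2>/dev/null; then owner="alive"; else owner="dead"; fi

echo "startup: ${verdict}, recorded owner: ${owner}"
rm -f "${lock}"
```

Barring an unlucky PID reuse, this should print "startup: refused, recorded owner: dead", which is exactly the situation that calls for manual cleanup.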
Another approach, documented in the flock(1) man page and used by system programmers all along, is to use a process file descriptor locked exclusively on a “pid” file. In this case, the pid file becomes a bearer of the lock instead of being a lock itself. A file descriptor lock on a file exists only as long as the file descriptor is alive. When the processes that own the file descriptor die, the descriptor is destroyed together with the file locks associated with it. That solves the crash problem of the first approach.
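The claim that the lock dies with its owner can be checked with a small experiment. This is a sketch that assumes a Linux system with the util-linux flock(1). The scratch file, the descriptor numbers 8 and 9, and the probe helper are all hypothetical names chosen for the demo.

```shell
#!/bin/bash
# Sketch: a flock(1) lock disappears as soon as its holder dies.
demo_lock=$(mktemp /tmp/demo_lock.XXXXXX)    # hypothetical scratch file

# Background holder: open the file on fd 9, lock it, then sleep with the lock.
(
    exec 9>>"${demo_lock}"
    flock --exclusive 9
    exec sleep 30                            # sleep inherits fd 9 and the lock
) &
holder=$!
sleep 1                                      # crude wait for the holder to lock

# Try to take the lock without blocking; fd 8 exists only for this one command.
probe() {
    if flock --exclusive --nonblock 8 8<"${demo_lock}"
    then echo "Open"
    else echo "Locked"
    fi
}

before=$(probe)                              # holder is alive
kill -KILL "${holder}"                       # simulate a crash
wait "${holder}" 2>/dev/null
after=$(probe)                               # holder is gone

echo "before: ${before}, after: ${after}"
rm -f "${demo_lock}"
```

Under those assumptions the script should report "before: Locked, after: Open": the file itself is still there after the kill, but the lock is not.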
Let’s look at the code:
1: pidf=/tmp/${self}.pid                  # Define lock file name
2: exec 221>>${pidf}                      # Open the file on descriptor 221 (append mode, so a failed attempt does not truncate it)
3: flock --exclusive --nonblock 221 ||    # Attempt to acquire the lock
4: {
5:     # Lock acquisition failure code    # Your custom error handler here
6:     exit 1                             # optionally exit the script
7: }
8: echo $$ >${pidf}                       # The lock is ours: overwrite the file with our PID
After this block of code, the .pid file is locked by the current process and by any children it starts from then on. Children share the lock through the file descriptors they inherit from the parent.
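The inheritance is easy to observe. In the sketch below, which assumes util-linux flock(1) on Linux, the parent closes its own descriptor while a child is still running. The scratch pid file, the probe helper and descriptor 8 are hypothetical names used only for the demo.

```shell
#!/bin/bash
# Sketch: a child inherits the locked descriptor and keeps the lock alive.
pidf=$(mktemp /tmp/inherit_demo.XXXXXX)      # hypothetical scratch pid file

# Try to take the lock without blocking; fd 8 exists only for this one command.
probe() {
    if flock --exclusive --nonblock 8 8<"${pidf}"
    then echo "Open"
    else echo "Locked"
    fi
}

exec 221>>"${pidf}"                          # open the pid file on fd 221
flock --exclusive --nonblock 221 || exit 1   # take the lock

sleep 30 &                                   # child inherits fd 221
child=$!

exec 221>&-                                  # parent gives up its descriptor
with_child=$(probe)                          # the child still holds the lock

kill "${child}"
wait "${child}" 2>/dev/null
without_child=$(probe)                       # last holder is gone

echo "with child: ${with_child}, without child: ${without_child}"
rm -f "${pidf}"
```

This should print "with child: Locked, without child: Open". The practical consequence is that a daemon which wants to drop the lock while its children keep running has to arrange for them not to inherit the descriptor, for example by starting them with 221>&- on the command line.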
To release the lock, one can either close all file descriptors holding the lock, or use the /sbin/flock --unlock ... command to explicitly release the lock.
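Both ways of releasing can be sketched side by side. As before, this assumes util-linux flock(1) on Linux. The scratch file, the probe helper and descriptor 8 are hypothetical; descriptor 221 follows the locking code above.

```shell
#!/bin/bash
# Sketch: two ways to release a lock held on fd 221.
pidf=$(mktemp /tmp/release_demo.XXXXXX)      # hypothetical scratch pid file

# Try to take the lock without blocking; fd 8 exists only for this one command.
probe() {
    if flock --exclusive --nonblock 8 8<"${pidf}"
    then echo "Open"
    else echo "Locked"
    fi
}

exec 221>>"${pidf}"
flock --exclusive --nonblock 221 || exit 1
held=$(probe)                                # lock is in place

# Way 1: explicit unlock. The descriptor stays open and can be locked again.
flock --unlock 221
after_unlock=$(probe)

# Way 2: re-acquire, then close the descriptor. With no other process
# sharing it, closing the last copy drops the lock.
flock --exclusive --nonblock 221 || exit 1
exec 221>&-
after_close=$(probe)

echo "held: ${held}, after unlock: ${after_unlock}, after close: ${after_close}"
rm -f "${pidf}"
```

The expected result is "held: Locked, after unlock: Open, after close: Open". One difference worth keeping in mind: an explicit unlock releases the lock for every process sharing the descriptor, children included, whereas closing the descriptor in one process leaves the lock in place as long as another process still has its inherited copy open.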
A user may check which processes hold the lock with the fuser(1) utility, which lists every process that has the file open:
$ /sbin/fuser -v monitord.pid
                     USER        PID ACCESS COMMAND
monitord.pid:        auser     28576 F.... monitord
                     auser     28579 F.... ping
Note that line 8 of the locking code, which stores the PID of the locking process in the .pid file, is redundant, as the information can be retrieved from the lock itself. The PID is stored in the file only for convenience later in the code, when we need it to query or manage the process. It is very important to note that the mere presence of the .pid file in this arrangement does not mean that the lock is active. The file only records the PID of the process that currently holds the lock or was the last one to hold it.
To programmatically test for the presence of the lock we need to attempt to grab the lock. If we fail, then there was another lock already on the file. If we succeed, then there was no lock on the file. In any case we need to close the file descriptor to avoid inadvertently holding the lock ourselves.
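exec 232<${pidf}                        # Open the pid file read-only on descriptor 232
if flock --exclusive --nonblock 232
then
    echo "Open"
else
    echo "Locked"
fi
exec 232<&-                             # Close the descriptor, dropping any lock we took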
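Since the pid file may hold the PID of a process that is long gone, it makes sense to combine the lock test with reading the file. The function below is a sketch of that idea, not part of the original scripts: lock_holder_pid is a hypothetical helper, descriptor 9 is an arbitrary choice, and the scratch file stands in for a real pid file.

```shell
#!/bin/bash
# Sketch: trust the PID in the pid file only while the lock is actually held.

# Print the recorded PID and return 0 if the file is locked;
# print nothing and return 1 if the lock is free or the file is missing.
lock_holder_pid() {
    local pidf=$1
    [ -f "${pidf}" ] || return 1
    # fd 9 exists only for this flock command and is closed when it exits,
    # so a successful probe does not leave us holding the lock.
    if flock --exclusive --nonblock 9 9<"${pidf}"
    then
        return 1                             # nobody held it: the PID is stale
    fi
    cat "${pidf}"
}

# Demo: this script plays the daemon and locks a scratch pid file.
pidf=$(mktemp /tmp/holder_demo.XXXXXX)       # hypothetical scratch pid file
exec 221>>"${pidf}"
flock --exclusive --nonblock 221 || exit 1
echo $$ >"${pidf}"

live=$(lock_holder_pid "${pidf}")            # locked: prints our PID
exec 221>&-                                  # release the lock
stale=$(lock_holder_pid "${pidf}") || stale="none"

echo "live: ${live}, stale: ${stale}"
rm -f "${pidf}"
```

While the lock is held the function prints the PID of the script itself; once the descriptor is closed the file still contains that PID, but the function reports nothing, so the demo prints "none" for the second value.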
This technique only tells us whether the lock is present. To send KILL or other signals to the locking processes, it is more convenient to use the fuser(1) utility, like this:
/sbin/fuser -k ${pidf}
By default, fuser -k sends the KILL signal to every process that has the file open, which takes care of the parent and any child processes holding the lock.
There is more to graceful daemon writing in bash. Other topics to cover are logging with rotation, sleeping, and configuration. Time permitting, I’ll get to write about that.