
Answering 'Are we there yet?' in a Unix setting

Often, and especially during an outage window, you might get asked "How far through is that (insert-lengthy-process-here)?". Such processes are common in outage windows, particularly unscheduled outages where filesystem-related work may be involved, but they crop up in plenty of other places too.

In a UNIX/Linux environment, a lot of processes are very quiet about progress (certainly with regard to % completed), but a lot of the time we can deduce how far through an operation is. This post illustrates with a few examples, and then slaps on a very simple user interface.

But 'Are we there yet?' is rather similar in spirit to 'Where is it up to?' or 'What is it doing?', so I'll address those here too. In fact, I'll address those first, because they often lead up to the first question. And we won't just cover filesystem operations, but they will be first because that's what's on my mind as I write this.

Navel-gazing filesystem progress

Let's assume you're moving data around a filesystem. Perhaps you have an rsync or cp command in flight (and perhaps you omitted any sort of --progress flag because you didn't want to miss any errors that might get printed). Or perhaps you're trying to determine this for another process.

You can use lsof to find out what (regular) files are open at the time.

# lsof /disknew | awk '$5 == "REG" {print $9}'
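One caveat with printing only $9 is that a path containing spaces gets cut off at the first space. Here is a minimal sketch of a filter that keeps the rest of the name, assuming the default lsof column layout used above (TYPE in column 5, NAME from column 9 onward). The function name regular_file_names is made up for illustration.

```shell
# Hypothetical helper: read default-format lsof output on stdin and print
# the full NAME column for regular files only.
# Assumes TYPE is field 5 and NAME starts at field 9, as in the command above.
regular_file_names() {
    awk '$5 == "REG" {
        name = $9
        # Re-join any further fields so "My Movie.mp4" is not truncated.
        # Note: runs of several spaces in a name collapse to one space.
        for (i = 10; i <= NF; i++) name = name " " $i
        print name
    }'
}

# Usage (not run here):
#   lsof /disknew | regular_file_names
```

The header line and directory entries fall through because their fifth field is not REG.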

A common technique is to keep tabs on this with the watch command. Here I'm also using the df command to show the source and destination as well as the current file. The effect is a crude, if still effective, dashboard:

# watch -n30 lsof /disknew \| awk "'\$5 == \"REG\" {print \$9}'" \; df -h /disk /disknew

Every 30.0s: lsof /disknew | awk '$5 == "REG" {print $9}' ; df -h /disk /disknew  Tue Apr 21 14:17:20 2015

Filesystem            Size  Used Avail Use% Mounted on
/dev/sdb1             493G  468G     0 100% /disk
                      788G  201G  548G  27% /disknew
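The backslash-escaping needed to get a pipeline through watch is easy to get wrong. One way around it, sketched here on the assumption that you don't mind keeping a small script around, is to put the pipeline in a function and have watch call the script instead. The names copy_dashboard and copy-dashboard.sh are hypothetical.

```shell
# Hypothetical helper: print the regular files currently open under the
# destination, followed by df output for the source and destination.
#   $1 = source mount point, $2 = destination mount point
copy_dashboard() {
    src=$1
    dst=$2
    # lsof is optional here; skip that part quietly if it is not installed.
    if command -v lsof >/dev/null 2>&1; then
        # lsof exits non-zero when nothing is open; that is not an error for us.
        lsof "$dst" 2>/dev/null | awk '$5 == "REG" {print $9}' || true
    fi
    df -h "$src" "$dst"
}

# Usage (not run here): save the function in copy-dashboard.sh with a final
# line of   copy_dashboard "$@"   and then run:
#   watch -n30 ./copy-dashboard.sh /disk /disknew
```

The output should match the dashboard above, with no quoting gymnastics on the watch command line.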

When filesystem operations recurse a directory, they don't (generally) open a directory, read the directory listing, sort it and then proceed in sorted order; ls certainly does, but cp etc. don't. find doesn't sort by default either. Instead, cp etc. open the directory and start reading its contents (the list of things in that directory) in the order that the filesystem returns them.

We can get ls to return a directory in unsorted order using the -U option (use ls -U1 if output is going to your screen, otherwise it will wait to collate the output into columns). Note that this is also great if a directory is really large. With knowledge of where our migration process is up to (from lsof perhaps), and knowledge of the order that it should do things in (from ls -U), we can even determine how far through it is. You could make this quite exotic if you wanted.

# ls -U1 | awk '/blahblah.mp4/ {up_to=NR} END {print int(up_to / NR * 100)}'

In the above example, I was copying a lot of multimedia files and wanted to know where it was up to. It was all in just one directory, so I didn't have to worry about recursion. I could have used lsof to find out where my rsync process was up to, but in this case I was using rsync -av, so it was printing out the filenames as it processed them. The trick here was to use awk to record the line number (NR, the number of records read so far) at the point where the input matched blahblah.mp4, which was what rsync reported it was up to at the time. Then, when awk finished reading the directory contents, it printed the progress as an integral percentage, based on the total number of records at the end.
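The one-liner treats the filename as a regular expression, so a name containing characters like . or [ could match more than intended. A small sketch of the same idea with an exact string comparison follows; the function name percent_through is an assumption of mine, and it reads the listing on stdin so that it works with ls -U1 or any other source of names.

```shell
# Hypothetical helper: given a listing on stdin (one name per line) and the
# name currently being processed as $1, print how far through the listing
# that name is, as a whole-number percentage.
percent_through() {
    # Pass the name via the environment so awk does not interpret
    # backslashes in it (as it would with -v).
    target=$1 awk '
        # Appending "" forces a string comparison rather than a numeric one.
        ($0 "") == ENVIRON["target"] { up_to = NR }
        # Print 0 for an empty listing or a name that never appeared.
        END { print (NR ? int(up_to / NR * 100) : 0) }'
}

# Usage (not run here):
#   ls -U1 | percent_through blahblah.mp4
```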

Gauging progress

What we need is simply some metric of completion. If we don't want the equivalent of a progress bar, we could just eyeball it. Heck, if we want a UI, we could even use whiptail.

How easy is this? The whiptail part is actually pretty simple: just pipe in something that outputs lines of integral percentages (remember: no fractions). Note that whiptail is a cousin of dialog, so if you're not on a Red Hat system, you'll probably find this easier using dialog. Here is a version of my rsync example from earlier, reformatted to be easier to read. I've also used the df -P flag to ensure that there is one line of output per record (plus a header).

$ while true; do
>   df -Pm /disk /disknew \
>     | awk '{ used[$6] = $3 }
>            END { print int(used["/disknew"] / 
>                        used["/disk"] * 100)
>            }';
>   sleep 5;
done | whiptail --gauge "Initial sync" 10 70 0

Remember that in this example, whiptail is being given the stdout of the entire while loop contents.
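The calculation can be pulled out into a function so it can be checked on its own before being wired up to a gauge. The following is only a sketch: the name df_ratio_percent is hypothetical, and the guard against a zero divisor and the clamp to 100 are my additions, not part of the loop above.

```shell
# Hypothetical helper: read `df -Pm` output on stdin and print the
# destination's used space as a whole-number percentage of the source's.
#   $1 = source mount point, $2 = destination mount point
df_ratio_percent() {
    awk -v src="$1" -v dst="$2" '
        NR > 1 { used[$6] = $3 }      # skip the header; key on mount point
        END {
            if (used[src] <= 0) { print 0; exit }   # avoid dividing by zero
            pct = int(used[dst] / used[src] * 100)
            if (pct > 100) pct = 100  # keep the value within 0..100
            print pct
        }'
}

# Usage (not run here):
#   while true; do
#     df -Pm /disk /disknew | df_ratio_percent /disk /disknew
#     sleep 5
#   done | whiptail --gauge "Initial sync" 10 70 0
```

Because the function only reads stdin, you can feed it saved df output to see what number the gauge would show.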

Progress from other places

Progress could be formulated in any number of ways. Examples:

  • number of MBs used in one filesystem / directory versus another;
  • amount of time spent doing something that you've done in a test environment (see my post on How Long has that Command been Running);
  • a SQL query (such as a row-count).

But there is nothing about these techniques that requires the metric to begin at 0 and end at 100, or even really that you have a number. With the whiptail example, we were dealing with a percentage gauge, and a gauge can go up or down.
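As an illustration of the elapsed-time idea from the list, here is a rough sketch that turns a start time and an expected duration (say, one measured in a test environment) into a percentage. The function name elapsed_percent and its arguments are assumptions for the example, and it relies on date +%s, which is widely available but not strictly POSIX.

```shell
# Hypothetical helper: percentage of an expected duration that has elapsed.
#   $1 = start time in seconds since the epoch
#   $2 = expected total duration in seconds
#   $3 = "now" in epoch seconds (optional; defaults to the current time)
elapsed_percent() {
    start=$1
    expected=$2
    now=${3:-$(date +%s)}
    if [ "$expected" -le 0 ]; then
        echo 0
        return 0
    fi
    pct=$(( (now - start) * 100 / expected ))
    # An estimate can be wrong in either direction; clamp to 0..100.
    if [ "$pct" -lt 0 ]; then pct=0; fi
    if [ "$pct" -gt 100 ]; then pct=100; fi
    echo "$pct"
}

# Usage (not run here), with a job expected to take an hour:
#   start=$(date +%s)
#   while true; do elapsed_percent "$start" 3600; sleep 5; done \
#     | whiptail --gauge "Estimated progress" 10 70 0
```

This only estimates progress from the clock, so it is no better than the timing it is based on.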

With the watch examples earlier, we don't even need a number. If you were sufficiently bored, you could even hook it up to something like cowsay if you wanted some amooo-sing updates.

