Please don't use dig etc. in reporting scripts... use 'getent hosts' instead (gawk example)

Okay, so the excrement of the day is flying from the fan and you need to get some quick analytics of who/what is being the most-wanted of the day. Perhaps you don't currently the analytics you need at your fingertips, so you hack up a huge 'one-liner' that perhaps takes a live sample of traffic (tcpdump -p -nn ...), extracts some data using sed, collates the data using awk etc. and then does some sort of reporting step. Yay, that's what we call agile analytics (see 'hyperbole'); its the all-to-common fallback goto, but it does prove to be immensely useful.

Okay, so you've got some report, perhaps it lacks a bit of polish, but it contains IP addresses, and we'd like to see something more recognisable (mind you, some IP addresses can become pretty recognisable). So, you scratch your head a bit and have the usual internal debate "do I do this bash, or fall up to awk, perl/python". At this point (if you go with bash etc. or awk), you'll perhaps think of using `dig +short -x $IP` to get (you hope) the canonical DNS name associated with it.

Oh, oh. A common point of trouble at this part is that you end up calling `dig` many times... often for the same name in a short period of time. Perhaps you'll request things too fast and lookups might fail (rate controls and limits on DNS servers etc.) As someone who looks after DNS servers, I urge you to stop. There is a much better way, and that way is the cache-friendly `getent hosts`.

When you call `dig`, `host`, or `nslookup`, you are going directly to DNS. Any local resolver is bypassed. What a regular program does when it calls `gethostbyname(...)` or `getnameinfo(...)` and related functions, is to look for the presence of a local cache (eg. it will look explicitly for the presence of a UNIX-domain socket /var/run/nscd/socket) and it will consult files like /etc/resolv.conf. Both of these get used to potentially cache DNS queries.

This is why sometimes it can be useful to use 'ping' to lookup DNS resolution... but if your doing that, you perhaps don't know about 'getent'.

The `getent` command is part of your standard Linux environment; it comes with glibc. It's purpose is as a lookup/testing tool for things like /etc/passwd, /etc/groups, /etc/services, and also name resolution and many others --- see the manual page for getent(1). As `getent` is simply a command that exposes API functionality from glibc, you get that potentially huge benefit from caching.

To do a forward lookup, you can use the 'hosts' database. Forward lookups are not as pleasant compared to say `dig +short -t a ...` if you expect an IPv4 address (I'll leave you to play with forward lookups yourself). Reverse lookups are very simple.

Let's get a test-case:

$ host is an alias for has address has IPv6 address 2404:6800:4006:801::2009

$ host domain name pointer

Okay, so if we do a reverse lookup on, we expect to see (this will vary depending where you in the world; topologically it seems I'm close to India)

$ getent hosts

Hooray. Let's see what a negative result looks like.

$ getent hosts
   ... yup, 0 lines of output

The output is meant to be exactly what /etc/hosts could contain; a single IP address, a canonical names, and a list of aliases.


Nice and script-friendly. Let's see how to pull that in AWK (or GAWK), with a simple example first. Let's start off with some input -- perhaps lines with an IP address and some count. I've also include a threshold, just as a reminder that it's good to minimise the number of lookups.

$ echo -e ' 2\n8.8.8.8 12\n8.8.4.4 25' \
  | awk '
    BEGIN {threshold = 5}
    $2 > threshold {
      "getent hosts " $1 | getline getent_hosts_str;
      split(getent_hosts_str, getent_hosts_arr, " ");
      print $1, getent_hosts_arr[2], $3

Real-world example

Well, the previous example was reasonably real-world, but its useful to see something a bit more fully-fledged.

I have a script called dns-live-sample-frequent-each-second which runs tcpdump, and basically outputs the most frequent client (with a minimum threshold) every second. So this 'one-liner', which is not yet encapsulated into a script, will look at 100 samples (ie. 100 seconds) and output a table of the common sources of such spikes.

The first script (dns-live-sample-frequent-each-second) has output like the following (I've anonymised the IP addresses). The fields are time, IP and count of requests that client made that second.

13:19:36 10
13:19:37 36
13:19:38 30
13:19:39 23

There's the 'one-liner' in all its glory (?). If you care to expand it, you'll see that its using the PROCINFO["sorted_in"] functionality of gawk to sort an associated array by its values in descending numerical order. That's certainly a trick worth knowing.

./dns-live-sample-most-frequent-each-second | head -100 | gawk '{clients[$2] += 1} END {print "CLIENT_IP CLIENT_DNS %TOP1"; rest = 0; PROCINFO["sorted_in"] = "@val_num_desc"; for (client in clients) {if (clients[client] >= 0.05*NR) {"getent hosts " client | getline getent_hosts_str; split(getent_hosts_str, getent_hosts_arr, " "); print client, getent_hosts_arr[2], int(clients[client] / NR * 100) } else { rest += clients[client] }} print "REST -", int(rest/NR*100)}' | column -t

And its output

CLIENT_IP  CLIENT_DNS         %TOP1   9  18
REST       -                  73

Right, now to go have a chat with whoever looks after

I'll put that into a script soon and maybe do something with cron... maybe.

Here's the more polished form in a script. I called it dns-live-sample-most-frequent-spikers



echo Looking for spiking clients; sampling each second for $num_seconds seconds...


dns-live-sample-most-frequent-each-second \
  | head -n "$num_seconds" \
  | gawk '
    {clients[$2] += 1}
    END {
        print "CLIENT_IP CLIENT_DNS %TOP1";
        rest = 0;
        PROCINFO["sorted_in"] = "@val_num_desc";
        for (client in clients) {
            if (clients[client] >= 0.05*NR) {
                "getent hosts " client | getline getent_hosts_str;
                split(getent_hosts_str, getent_hosts_arr, " ");
                print client, getent_hosts_arr[2], int(clients[client] / NR * 100)
            } else {
                rest += clients[client]
        print "REST -", int(rest/NR*100)
    }' \
  | column -t


Popular posts from this blog

ORA-12170: TNS:Connect timeout — resolved

Getting MySQL server to run with SSL

From DNS Packet Capture to analysis in Kibana