Skip to main content

Capturing and Analysing DNS Samples (tcpdump meets SQLite3)

A few nights ago, I was on a bit of a mission to determine how much traffic our servers were sending to our DNS servers, as I was fairly sure it was much, and I wanted to put some improvement in place so we don't make an issue for ourselves.

I didn't want to turn on query logging, because of the rate of traffic, and its not my job to manage the DNS service, but I can see the queries in flight, so tcpdump was an obvious tool to use. But I didn't want to be awking, grepping, sedding, and generally bashing (pun intended) tcpdump output into whatever variety of reports I wanted, so I instead fed it into SQLite3.

I thought I would share how I did that; I won't show you much of the output, as that is not as useful (I'm sure you could think of some more interesting SQL to throw at it.)

It all starts with a tcpdump capture, which can be read back. You could do this without saving the file (which can be very useful to avoid creating a potential space issue), but because I wanted to know the duration of the capture, I decided to write it to disk first so I could query it again -- actually, I should have just created another query for last timestamp minus the first. To make the timestamps more useful, I set the timestamp format to raw, which is basically Unix-time with microseconds. Here's some sample output from just tcpdump (sanitised)

$ /usr/sbin/tcpdump -tt -r /tmp/query-sample.pcap -n | head -3
reading from file /tmp/query-sample.pcap, link-type EN10MB (Ethernet)
1405083214.871468 IP ###.56357 > ###.domain: 24707+ PTR? (45)
1405083214.871663 IP ###.34021 > ###.domain: 8029 [1au] PTR? (56)
1405083214.884989 IP ###.60397 > ###.domain: 15430+ A? (70)

Here's the script. There is much that could be improved (I'd improve the temporary file location if I was planning on reusing this script.) I've taken some pains to process the query values somewhat to aggregate things usefully, such as reverse lookups, and some cases where a anti-malware software is calling out to DNS as a form of distributed database. I've also tried to aggregate some IP address ranges differently (which could be useful for internal/external); pretend that I have a net-block of 123.45.67.

Note that I haven't added any indexes to this. It didn't seem to suffer much from not having done, but at 50,000 requests, it's reaching that point where it would perhaps be beneficial.

# cat dns-live-query-sample


echo "Capturing sample of $sample_size packets from interface $interface"

tcpdump -w /tmp/query-sample.pcap -i "$interface" -nn -p -c "$sample_size" -s0 dst port domain

rm -f /tmp/query-sample.sqlite3
sqlite3 /tmp/query-sample.sqlite3 'create table queries (count int,client,querytype,queryval);'

tcpdump -r /tmp/query-sample.pcap -nn \
    | sed -rne 's/^[^ ]+ IP ([0-9]+\.[0-9]+\.[0-9]+\.[0-9]+)\.[0-9]+ > ([0-9]+\.[0-9]+\.[0-9]+\.[0-9]+)\.[0-9]+: .* ([^?]+)\? ([A-Za-z0-9._-]+) .*/\1 \3 \4/p' \
    | awk '
        { queryval = $3 }
        queryval ~ /\.noisy-example\.com\.$/ { queryval = "" }
        queryval ~ /\.10\.in-addr\.arpa\.$/ { queryval = "" }
        queryval ~ /\.67\.45\.123\.in-addr\.arpa\.$/ { queryval = "" }
        queryval ~ /[0-9]*\.[0-9]*\.[0-9]*\.[0-9]*\.in-addr\.arpa\.$/ { queryval = "" }
        { print $1,$2,queryval }' \
    | sort | uniq -c \
    | sed -re 's/[ \t]+/|/g' -e 's/^\|//' \
    | sqlite3 /tmp/query-sample.sqlite3 '.import /dev/stdin queries'

echo "
    In this report, Cost is a count of such queries received,
    normalised by the number of clients that queried it. Thus,
    something with a Cost greater than 10 (<= are omitted),
    would likely benefit from DNS caching.

    Some queries, namely inverse lookups and things under (generally some abuse of DNS) are


sqlite3 /tmp/query-sample.sqlite3 <<EOF
.mode column
.header on

.width 4 7 70
select sum(count)/count(count) as Cost, querytype as RRtype, queryval as Name from queries group by RRtype,Name having Cost > 10 order by Cost desc limit 100;

.width 9 6 15
select sum(count) as NumQueries, querytype as RRtype, client as Client from queries group by Client having NumQueries > 10 order by NumQueries desc limit 100;

tcpdump -tt -r /tmp/query-sample.pcap -nn \
    | grep '\.53: ' \
    | cut -d. -f1 | uniq -c \
    | awk '
        NR == 1 {starttime = $2; next}
        {total += $1; count += 1; last = $1; lasttime = $2}
        END { total -= last; print "Queries / second = " total / (lasttime - starttime) }'

Here is a sample of the output (again, sanitised)

# ./dns-live-query-sample
Capturing sample of 5000 packets from interface eth0
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
5000 packets captured
5000 packets received by filter
0 packets dropped by kernel
reading from file /tmp/query-sample.pcap, link-type EN10MB (Ethernet)

    In this report, Cost is a count of such queries received,
    normalised by the number of clients that queried it. Thus,
    something with a Cost greater than 10 (<= are omitted),
    would likely benefit from DNS caching.

    Some queries, namely inverse lookups and things under (used for ###) are

Cost  RRtype   Name
----  -------  -----------------------------------------------
195   A
130   A
104   A
102   A
94    AAAA
94    AAAA
22    PTR

NumQuerie  RRtype  Client
---------  ------  ---------------
808        TXT     10.###
473        PTR     10.###
348        PTR     10.###
314        PTR     10.###
reading from file /tmp/query-sample.pcap, link-type EN10MB (Ethernet)
Queries / second = 134.722

I should point out: only requests are captured, not responses, so this script, as written, cannot tell you about success of failure (but it may let you see some interesting patterns).

Hope you find this useful; presumably there is already a tool that does this analysis better, but you may find the technique of feeding tcpdump into sqlite useful.



Popular posts from this blog

ORA-12170: TNS:Connect timeout — resolved

If you're dealing with Oracle clients, you may be familiar with the error message
ERROR ORA-12170: TNS:Connect timed out occurred I was recently asked to investigate such a problem where an application server was having trouble talking to a database server. This issue was blocking progress on a number of projects in our development environment, and our developers' agile post-it note progress note board had a red post-it saying 'Waiting for Cameron', so I thought I should promote it to the front of my rather long list of things I needed to do... it probably also helped that the problem domain was rather interesting to me, and so it ended being a late-night productivity session where I wasn't interrupted and my experimentation wouldn't disrupt others. I think my colleagues are still getting used to seeing email from me at the wee hours of the morning.

This can masquerade as a number of other error strings as well. Here's what you might see in the sqlnet.log f…

Getting MySQL server to run with SSL

I needed to get an old version of MySQL server running with SSL. Thankfully, that support has been there for a long time, although on my previous try I found it rather frustrating and gave it over for some other job that needed doing.

If securing client connections to a database server is a non-negotiable requirement, I would suggest that MySQL is perhaps a poor-fit and other options, such as PostgreSQL -- according to common web-consensus and my interactions with developers would suggest -- should be first considered. While MySQL can do SSL connections, it does so in a rather poor way that leaves much to be desired.

UPDATED 2014-04-28 for MySQL 5.0 (on ancient Debian Etch).

Here is the fast guide to getting SSL on MySQL server. I'm doing this on a Debian 7 ("Wheezy") server. To complete things, I'll test connectivity from a 5.1 client as well as a reasonably up-to-date MySQL Workbench 5.2 CE, plus a Python 2.6 client; just to see what sort of pain awaits.

UPDATE: 2014-0…

From DNS Packet Capture to analysis in Kibana

UPDATE June 2015: Forget this post, just head for the Beats component for ElasticSearch. Beats is based on PacketBeat (the same people). That said, I haven't used it yet.

If you're trying to get analytics on DNS traffic on a busy or potentially overloaded DNS server, then you really don't want to enable query logging. You'd be better off getting data from a traffic capture. If you're capturing this on the DNS server, ensure the capture file doesn't flood the disk or degrade performance overmuch (here I'm capturing it on a separate partition, and running it at a reduced priority).

# nice tcpdump -p -nn -i eth0 -s0 -w /spare/dns.pcap port domain

Great, so now you've got a lot of packets (set's say at least a million, which is a reasonably short capture). Despite being short, that is still a massive pain to work with in Wireshark, and Wireshark is not the best tool for faceting the message stream so you can can look for patterns (eg. to find relationshi…