Use IPTables NOTRACK to implement stateless rules and reduce packet loss.

I recently struck a performance problem with a high-volume Linux DNS server and found a very satisfying way to overcome it. This post is not about DNS specifically: the technique applies to any service with a high rate of connections or sessions (UDP or TCP), such as HTTP/HTTPS. It is especially useful for UDP-based traffic, as the stateful firewall doesn't really buy you much with UDP.

We observed times when DNS would not respond, but retrying very soon after would generally work. For TCP, you may find that you get a connection timeout (or possibly a connection reset? I haven't checked that recently).

Looking at the kernel logs, you might see the following:
kernel: nf_conntrack: table full, dropping packet.
You might be inclined to increase net.netfilter.nf_conntrack_max and net.nf_conntrack_max, but a better response might be found by looking at what is actually taking up those entries in your connection tracking table.




We found that connection tracking was happening even for UDP rules. You can see this with some simple filtering of /proc/net/ip_conntrack (/proc/net/nf_conntrack on newer kernels), counting how many entries relate to port 53, for example. Here are the basic rules that most Linux people would likely write for iptables.

-A INPUT  -p udp --dport 53 -j ACCEPT
-A INPUT -p tcp --dport 53 -m state --state=NEW -j ACCEPT
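The filtering of the connection tracking table mentioned above can be sketched in a few lines of Python. This is only an illustration: the function name and the sample lines are made up for the example, and the parsing assumes the usual key=value layout of /proc/net/nf_conntrack (or the older /proc/net/ip_conntrack), which can vary between kernels.

```python
import re
from collections import Counter

# Matches both "sport=NNN" and "dport=NNN" fields in a conntrack line.
PORT_FIELD = re.compile(r"\b[sd]port=(\d+)\b")


def count_entries_by_port(lines, port):
    """Count conntrack entries that mention `port`, grouped by protocol name."""
    counts = Counter()
    for line in lines:
        ports = {int(p) for p in PORT_FIELD.findall(line)}
        if port not in ports:
            continue
        fields = line.split()
        # nf_conntrack lines start with "ipv4 2 udp 17 ...";
        # the older ip_conntrack format starts with "udp 17 ...".
        proto = fields[2] if fields[0].startswith("ipv") else fields[0]
        counts[proto] += 1
    return counts


# Hypothetical sample lines (documentation addresses), one per format.
SAMPLE = [
    "ipv4     2 udp      17 29 src=192.0.2.10 dst=192.0.2.1 sport=40000 dport=53 "
    "[UNREPLIED] src=192.0.2.1 dst=192.0.2.10 sport=53 dport=40000 mark=0 use=2",
    "ipv4     2 tcp      6 431999 ESTABLISHED src=192.0.2.11 dst=192.0.2.1 sport=50000 "
    "dport=22 src=192.0.2.1 dst=192.0.2.11 sport=22 dport=50000 [ASSURED] mark=0 use=2",
    "udp      17 12 src=192.0.2.12 dst=192.0.2.1 sport=40001 dport=53 "
    "src=192.0.2.1 dst=192.0.2.12 sport=53 dport=40001 use=1",
]

print(dict(count_entries_by_port(SAMPLE, 53)))
```

On a real server you would pass the open file instead of SAMPLE, for example `count_entries_by_port(open("/proc/net/nf_conntrack"), 53)`, bearing in mind that reading it usually needs root and that the file can be very large.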

NOTRACK for Stateless Firewall Rules in a Stateful Firewall

Thankfully, I had heard of the NOTRACK rule some time back, but never had a cause to use it, so at least I knew where to begin my research. Red Hat have an article about it at https://access.redhat.com/solutions/972673, though the rules below do not necessarily come from that.

So we needed to use the 'raw' table to disable stateful inspection for DNS packets. That does mean we need to explicitly match all incoming and outgoing packets (which is four UDP flows for a recursive server, plus TCP if you want to do stateless TCP). It's rather like IPChains way back in the day, and like IPChains, you lose all the benefits you get from a stateful firewall and gain all the responsibilities of making sure you explicitly match all traffic flows.

*raw
...
# Don't do connection tracking for DNS
-A PREROUTING -p tcp --dport 53 -j NOTRACK
-A PREROUTING -p udp --dport 53 -j NOTRACK
-A PREROUTING -p tcp --sport 53 -j NOTRACK
-A PREROUTING -p udp --sport 53 -j NOTRACK
-A OUTPUT -p tcp --sport 53 -j NOTRACK
-A OUTPUT -p udp --sport 53 -j NOTRACK
-A OUTPUT -p tcp --dport 53 -j NOTRACK
-A OUTPUT -p udp --dport 53 -j NOTRACK
...
COMMIT
...
*filter
...
# Allow stateless UDP serving
-A INPUT  -p udp --dport 53 -j ACCEPT
-A OUTPUT -p udp --sport 53 -j ACCEPT
# Allow stateless UDP backending
-A OUTPUT -p udp --dport 53 -j ACCEPT
-A INPUT  -p udp --sport 53 -j ACCEPT
# Allow stateless TCP serving
-A INPUT  -p tcp --dport 53 -j ACCEPT
-A OUTPUT -p tcp --sport 53 -j ACCEPT
# Allow stateless TCP backending
-A OUTPUT -p tcp --dport 53 -j ACCEPT
-A INPUT  -p tcp --sport 53 -j ACCEPT
...
COMMIT
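If you want to apply the same pattern to another service, the eight raw-table rules above can be generated rather than typed by hand. The following is a minimal sketch; `notrack_rules` is a hypothetical helper, not part of any iptables tooling, and it only produces text in the same form as the rules listed above.

```python
def notrack_rules(port, protocols=("tcp", "udp")):
    """Return raw-table NOTRACK rules for a service port, in both directions."""
    if not 1 <= int(port) <= 65535:
        raise ValueError(f"invalid port: {port}")
    rules = []
    # Same order as the hand-written rules: PREROUTING matches dport then
    # sport, OUTPUT matches sport then dport.
    for chain, directions in (
        ("PREROUTING", ("dport", "sport")),
        ("OUTPUT", ("sport", "dport")),
    ):
        for direction in directions:
            for proto in protocols:
                rules.append(f"-A {chain} -p {proto} --{direction} {port} -j NOTRACK")
    return rules


for rule in notrack_rules(53):
    print(rule)
```

Calling it with a single protocol, for example `notrack_rules(514, protocols=("udp",))` for a UDP syslog listener, gives four rules. You still need the matching ACCEPT rules in the filter table, exactly as for DNS.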

Beware the moving bottleneck

That worked well... perhaps a little too well. Now the service receives more traffic than it did before, and you need to be prepared for that, as you may find that a new limit (with its own potential for negative behaviour) is reached.

DNS is particularly prone to very large spikes of activity due to misconfigured clients. Common culprits, particularly on Linux clients, are tools like Wireshark, scripts that perform lookups (often using dig; see my post on how to do this better), and the lack of a local name-service cache (e.g. nscd or better).

Assuming you can identify such clients (see my other DNS posts which have some ideas and tools), you could (perhaps in conjunction with fail2ban or similar) have some firewall rules that limit allowable request rates from segments of your network.

These rules would go before the filter table rules that allow access (listed earlier).

-N DNS_TOO_FREQUENT_BLACKLIST
# This chain is where the actual rate limiting is put in place.
# Note that it is using just the srcip method in its hashing
-A DNS_TOO_FREQUENT_BLACKLIST -p udp -m udp --dport 53 -m hashlimit --hashlimit-mode srcip --hashlimit-srcmask 32 --hashlimit-above 10/sec --hashlimit-burst 20 --hashlimit-name dns_too_frequen -m comment --comment "drop_overly_frequent_DNS_requests" -j DROP

# This matches a pair of machines I judged to be innocently bombarding DNS
# It so happens that they could be nicely summarised with a /31
# The second line is so we can see counters of what made it through
-A INPUT -s «CLIENT_IP»/31 -j DNS_TOO_FREQUENT_BLACKLIST
-A INPUT -s «CLIENT_IP»/31
#... more rules here as needed
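To get a feel for what "--hashlimit-above 10/sec --hashlimit-burst 20" with srcip hashing means in practice, here is a simplified token-bucket model in Python. It is a conceptual sketch only: the class and method names are made up, and the kernel's hashlimit module does its own accounting, which differs in detail from this.

```python
class PerSourceLimiter:
    """Simplified per-source token bucket: `burst` tokens, refilled at `rate_per_sec`."""

    def __init__(self, rate_per_sec=10.0, burst=20):
        self.rate = rate_per_sec
        self.burst = burst
        self._buckets = {}  # source address -> (tokens, time of last packet)

    def over_limit(self, src, now):
        """Return True if this packet exceeds the limit (the rule would DROP it)."""
        tokens, last = self._buckets.get(src, (float(self.burst), now))
        # Refill for the time elapsed since the last packet, capped at the burst size.
        tokens = min(float(self.burst), tokens + (now - last) * self.rate)
        if tokens >= 1.0:
            self._buckets[src] = (tokens - 1.0, now)
            return False
        self._buckets[src] = (tokens, now)
        return True


limiter = PerSourceLimiter()
# A hypothetical client sends 30 queries in the same instant.
dropped = sum(limiter.over_limit("192.0.2.10", 0.0) for _ in range(30))
print(f"dropped {dropped} of 30")  # dropped 10 of 30
```

Because each source address has its own bucket (the /32 mask in the rule), one noisy client using up its burst does not affect its neighbours.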

Concluding Remarks

I've been running this configuration for some time now, and am very happy with it. I do intend to implement this technique on other services where I feel it may be needed (Samba perhaps, or logging servers).

I hope you find this useful for you; if you've got any comments, I'd be happy to see them. 

Cheers,
Cameron

Comments

  1. I guess we can skip NOTRACK rules at all if all other rules are written like this, without state module? It's working that way.

  2. No, unfortunately, if you have a rule such as;

    -A INPUT -p udp --dport 53 -j ACCEPT

    then state will still be tracked, even though we haven't made any "stateful" condition on the rule. This has to happen in a stateful firewall because at the top of our stateful firewall we test for state. I'm not sure if the state starts getting tracked as soon as we start requiring a match on state, or as soon as the ipt_state module is loaded, but you can verify this behaviour by looking for (in this example) a UDP port of 53 in /proc/net/nf_conntrack, which is where you can see the state currently tracked (warning: it can be very large).
