Effort sponsored in part by the Defense Advanced Research Projects Agency (DARPA) and Air Force Research Laboratory, Air Force Materiel Command, USAF, under agreement number F30602-01-2-0537.
A firewall may consist of several components. Its key component is usually a packet filter. The packet filter may be stateful to reach more informed decisions. State allows the packet filter to keep track of established connections so that arriving packets can be associated with them, whereas a stateless packet filter bases its decisions solely on individual packets. With release 3.0, OpenBSD includes a new Stateful Packet Filter (pf) in the base install. pf implements traditional packet filtering and adds some novel algorithms. This paper describes the design and implementation of pf and compares its scalability and performance with existing packet filter implementations.
While an acceptable Open Source license was a prerequisite for any replacement, we used this opportunity to develop a new packet filter that employed optimized data structures and algorithms to achieve good performance for stateful filtering and address translation. The resulting code base is small and thus facilitates future extensions.
The remainder of this paper is organized as follows. Section 2 outlines the design of the packet filter. In Section 3 we compare pf's performance with other packet filters and discuss the results. Section 4 presents future work. Finally, we conclude in Section 5.
The filter itself consists of two basic elements, the filter rules and the state table.
During rule set evaluation, the packet is compared against all rules from the beginning to the end. A packet can match more than one rule, in which case the last matching rule is used. This mechanism allows overriding general rules with more specific rules, like blocking all packets first and then passing specific packets. The last matching rule determines if the packet is passed or blocked according to its action field.
A matching rule that has been flagged as final terminates rule evaluation for the current packet. The action from that rule is applied to the packet. This prevents final rules from being overridden by subsequent rules.
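To illustrate, the following minimal sketch shows last-match-wins evaluation with a final flag, as described above; the type and field names are hypothetical and do not reflect pf's actual structures.

```c
#include <stdbool.h>
#include <stddef.h>

enum action { PASS, BLOCK };

struct rule {
    enum action   action;
    bool          final;    /* flagged rules terminate evaluation */
    bool        (*match)(const struct rule *, const void *pkt);
    struct rule  *next;
};

enum action
evaluate(const struct rule *ruleset, const void *pkt, enum action dflt)
{
    enum action result = dflt;

    for (const struct rule *r = ruleset; r != NULL; r = r->next) {
        if (!r->match(r, pkt))
            continue;
        result = r->action;   /* later matches override earlier ones */
        if (r->final)
            break;            /* final rule: stop evaluation here    */
    }
    return result;
}
```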
The rule set is organized as a multiply linked list. This allows pf to perform automatic optimization of the rule set, as discussed in Section 2.8.
For TCP, matching a packet to a state entry also involves checking its sequence numbers against expected windows [8], which protects against sequence number attacks.
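As a rough illustration, the sketch below accepts a TCP segment only if it overlaps the window a state entry currently expects from that peer; the structure and field names are assumptions, and the actual checks described by van Rooij [8] are more elaborate.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed per-direction window bounds tracked in a state entry. */
struct tcp_peer {
    uint32_t seqlo;     /* lowest acceptable sequence number  */
    uint32_t seqhi;     /* highest acceptable sequence number */
};

/* True if a <= b in 32-bit sequence number space (modular compare). */
static bool
seq_leq(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) <= 0;
}

/* Accept a segment only if it overlaps the expected window. */
bool
tcp_seq_ok(const struct tcp_peer *p, uint32_t seq, uint32_t len)
{
    return seq_leq(p->seqlo, seq + len) && seq_leq(seq, p->seqhi);
}
```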
UDP is stateless by nature: packets are considered to be part of the same connection if the host addresses and ports match a state entry. UDP state entries have an adaptive expiration scheme. The first UDP packet could either be a one-shot packet or the beginning of a UDP pseudo-connection, so it creates a state entry with a low timeout. If the other endpoint responds, pf considers it a pseudo-connection with bidirectional communication and allows more flexibility in the duration of the state entry.
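The following sketch illustrates the adaptive timeout idea; the constants and names are invented for illustration and do not reflect pf's actual (configurable) values.

```c
#include <stdbool.h>
#include <time.h>

/* Assumed timeouts in seconds; illustrative only. */
#define UDP_FIRST_TIMEOUT  30   /* may be a one-shot packet   */
#define UDP_STREAM_TIMEOUT 120  /* both endpoints are talking */

struct udp_state {
    bool   saw_reply;           /* the other endpoint responded */
    time_t expire;              /* absolute expiration time     */
};

void
udp_state_create(struct udp_state *s, time_t now)
{
    s->saw_reply = false;
    s->expire = now + UDP_FIRST_TIMEOUT;     /* start conservatively */
}

void
udp_state_refresh(struct udp_state *s, bool from_responder, time_t now)
{
    if (from_responder)
        s->saw_reply = true;                 /* pseudo-connection */
    s->expire = now + (s->saw_reply ? UDP_STREAM_TIMEOUT
                                    : UDP_FIRST_TIMEOUT);
}
```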
ICMP packets fall into two categories: ICMP error messages, which refer to other packets, and ICMP queries and replies, which are handled separately. If an ICMP error message corresponds to a connection in the state table, it is passed. ICMP queries and replies create their own state, similar to UDP states. As an example, an ICMP echo reply matches the state an ICMP echo request created. This is necessary so that applications like ping or traceroute work across a Stateful Packet Filter.
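One common way to key such states, sketched below under assumed names, is to let the ICMP echo identifier play the role a port plays for UDP, so that a reply maps to the same key its request created; pf's actual implementation may differ in detail.

```c
#include <stdint.h>

#define ICMP_ECHO      8
#define ICMP_ECHOREPLY 0

struct icmp_key {
    uint32_t local, remote;   /* IPv4 addresses               */
    uint16_t id;              /* echo identifier, like a port */
};

/* Build the same key for a request (outgoing) and its reply (incoming). */
struct icmp_key
icmp_state_key(uint32_t src, uint32_t dst, uint8_t type, uint16_t id)
{
    struct icmp_key k;

    if (type == ICMP_ECHO) {          /* query: local host sent it    */
        k.local = src;  k.remote = dst;
    } else {                          /* reply: swap so the keys meet */
        k.local = dst;  k.remote = src;
    }
    k.id = id;
    return k;
}
```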
pf stores state entries in an AVL tree, a balanced binary search tree. This container provides efficient search that scales well for large numbers of entries and guarantees O(log n) behavior even in the worst case. Although alternative containers like hash tables allow searches in constant time on average, they have drawbacks. A hash table has a fixed number of buckets, and as the number of entries grows, collisions occur when two entries have the same hash value and several entries end up in the same bucket. An attacker can deliberately trigger the worst case by opening connections whose state entries hash into the same bucket. To defend against this, the hash buckets themselves would have to be binary search trees; otherwise the worst-case behavior allows denial of service attacks. The hash table would then replace only a shallow part of the entire search tree, saving only the first few levels of binary search at considerable memory cost.
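A balanced search tree requires a total order over state keys. The sketch below shows one possible comparator over an assumed key layout (pf's actual key differs); each lookup costs O(log n) such comparisons.

```c
#include <stdint.h>

/* An assumed state key; illustrative only. */
struct state_key {
    uint32_t src_addr, dst_addr;    /* IPv4 addresses, network order */
    uint16_t src_port, dst_port;
    uint8_t  proto;
};

/* Total order over keys, as tree insertion and lookup require. */
int
state_cmp(const struct state_key *a, const struct state_key *b)
{
    if (a->proto != b->proto)       return a->proto < b->proto ? -1 : 1;
    if (a->src_addr != b->src_addr) return a->src_addr < b->src_addr ? -1 : 1;
    if (a->dst_addr != b->dst_addr) return a->dst_addr < b->dst_addr ? -1 : 1;
    if (a->src_port != b->src_port) return a->src_port < b->src_port ? -1 : 1;
    if (a->dst_port != b->dst_port) return a->dst_port < b->dst_port ? -1 : 1;
    return 0;                       /* equal: packet belongs to this state */
}
```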
The state table contains entries with three address/port pairs: the internal, the gateway and the external pair. Two trees contain the keys: one sorted on the internal and external pair, the other sorted on the external and gateway pair. This allows pf not only to find a matching state for outgoing and incoming packets, but also to obtain the NAT mapping in the same step, without additional lookup costs. pf also supports port redirection and bidirectional translation. Application-level proxies reside in userland, e.g., ftp-proxy, which allows active mode FTP for clients behind a NAT gateway.
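The following sketch outlines how a state entry might hold the three pairs and how the two trees are keyed; the field names are illustrative, not pf's.

```c
#include <stdint.h>

struct addr_port {
    uint32_t addr;              /* IPv4 address */
    uint16_t port;
};

struct nat_state {
    struct addr_port lan;       /* internal host (pre-translation)  */
    struct addr_port gwy;       /* gateway, i.e. translated address */
    struct addr_port ext;       /* external host                    */
    /*
     * Tree 1 is keyed on (lan, ext) and matches outgoing packets;
     * tree 2 is keyed on (ext, gwy) and matches incoming packets.
     * Either lookup returns the full entry, so the NAT mapping is
     * available without a second search.
     */
};
```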
While OpenBSD itself is not vulnerable to fragmentation attacks [3, 4], it protects machines with less secure stacks behind it. Fragments are cached and reassembled by pf so that any conflict between overlapping fragments is resolved before a packet is received by its destination [2].
When a rule set is loaded, the kernel traverses the set to calculate these so-called skip-steps: for each parameter in each rule, a pointer is set to the next rule that specifies a different value for that parameter.
During rule set evaluation, if a packet does not match a rule parameter, the pointer is used to skip to the next rule which could match, instead of the very next rule in the set.
The skip-steps optimization is completely transparent to the user because it happens automatically without changing the meaning of any rule set.
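The sketch below illustrates the mechanism under assumed structures: on the first mismatching parameter, evaluation jumps over every consecutive rule that carries the same value, since none of those rules can match either.

```c
#include <stddef.h>

enum { P_IFACE, P_PROTO, P_SRC, P_DST, P_MAX };

struct srule {
    int           param[P_MAX];   /* value required for each parameter */
    struct srule *skip[P_MAX];    /* next rule with a different value  */
    struct srule *next;
};

struct pkt {
    int field[P_MAX];             /* the packet's corresponding values */
};

/*
 * Returns the next rule worth examining, or NULL if every parameter
 * of this rule matches and it must be evaluated fully.
 */
const struct srule *
skip_step(const struct srule *r, const struct pkt *p)
{
    for (int i = 0; i < P_MAX; i++)
        if (r->param[i] != p->field[i])
            return r->skip[i];    /* jump past identical values */
    return NULL;
}
```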
The performance gain depends on the specific rule set. In the worst case, all skip-steps have a length of one so that no rules can be skipped during evaluation, because the parameters of consecutive rules are always different. However, even in the worst case, performance does not decrease compared to an unoptimized version.
An average rule set, however, results in larger skip-steps which are responsible for a significant performance gain. The cost of evaluating a rule set is often reduced by an order of magnitude, as only every 10th to 100th rule is actually evaluated.
Firewall administrators can increase the likelihood of skip-step optimizations, and thereby improve the performance of rule evaluation, by sorting blocks of rules on parameters in a documented order.
Automatically generated groups of rules are already sorted in optimal order; this happens, e.g., when one rule contains parameter lists that are expanded internally into several new rules.
The tester host uses a libnet program to generate TCP packets as Ethernet frames. They are sent through the tester's first interface to the firewall host and captured on the tester's second interface using libpcap. The two hosts have no other network connections.
The firewall is configured to forward IP packets between its interfaces, so that the packets sent by the tester are forwarded through the other interface back to the tester.
The firewall is an i386 machine with a Pentium 166 MHz CPU and 64 MB RAM; the tester is a faster i386 machine. All four network interface cards are identical NetGear PCI cards, using the sis driver.
ARP table entries are static, and the only packets traversing the wires are those generated by the tester and forwarded back by the firewall.
The generated packets contain timestamps and serial numbers that allow the tester to determine latency and packet loss rate.
The tester sends packets of a defined size (bytes/packet, including Ethernet header and checksum) at defined rates (packets/s), and measures the rate of received packets, the average latency and the loss rate (percentage of packets lost). For each combination, the measuring period is at least ten seconds. Latency is averaged over all packets returned to the tester during the measuring period; lost packets do not count towards latency.
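For illustration, the following sketch shows how loss and average latency can be derived from such measurements; it mirrors the methodology described above, not the actual test program.

```c
#include <stdio.h>

struct run {
    unsigned long sent;         /* packets transmitted                 */
    unsigned long received;     /* packets captured on return          */
    double        latency_sum;  /* seconds, over received packets only */
};

void
report(const struct run *r)
{
    double loss = r->sent ?
        100.0 * (double)(r->sent - r->received) / (double)r->sent : 0.0;
    /* lost packets never return, so they contribute no latency sample */
    double avg = r->received ? r->latency_sum / (double)r->received : 0.0;

    printf("loss %.2f%%, average latency %.6f s\n", loss, avg);
}
```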
Three configurations are benchmarked: OpenBSD 3.0 with pf, OpenBSD 3.0 with IPFilter, and GNU/Linux Red Hat 7.2 with iptables.
Figure 1: Unfiltered
Figure 1 shows that both the tester and the firewall are able to handle packets at the maximum frame rate [1] for all packet sizes of 128 bytes and above. All further tests are done using packet sizes of either 128 or 256 bytes. The degradation for the GNU/Linux machine seems slightly worse than for the OpenBSD machine.
Figures 2, 3 and 4 show throughput, latency and loss depending on sending rate, for a set of 100 rules, using 256 byte packets. Iptables outperforms both pf and IPFilter in this test. It has a higher maximum throughput and lower latency compared to the other two packet filters.
Figure 2: Stateless filtering with 100 rules (throughput)
Figure 3: Stateless filtering with 100 rules (latency)
Figure 4: Stateless filtering with 100 rules (loss)
Each packet filter has a distinct maximum lossless throughput. If the sending rate exceeds this maximum, latency increases quickly and packet loss occurs. For all three filters, latency is nearly identical below the maximum lossless rate. When the sending rate is increased beyond the point where loss occurs, throughput actually decreases. In this overloaded condition, all three packet filters consume all CPU resources and the console becomes unresponsive. Once the sending rate is lowered below the maximum throughput rate, each of them recovers quickly.
Figure 5: Maximum throughput with increasing number of rules
The test is repeated with various rule set sizes between one and ten thousand. For each rule set size, the maximum throughput rate possible without packet loss is noted. The results are shown in Figure 5 as a function of the rule set size. Both pf and IPFilter evaluate the rule set twice for each packet, once incoming on the first interface and once outgoing on the second interface. Iptables' performance advantage is due to the fact that it evaluates the rule set only once, using a forwarding chain. The forwarding chain evaluates the rule set based on a packet's complete path through the machine.
Figure 6: Stateful filtering with 20000 state entries (throughput)
Figure 6 compares the throughput in relation to the sending rate for stateful filtering with twenty thousand state entries. Both pf and IPFilter exhibit the same behavior when overloaded. However, IPFilter reaches overload at a packet rate of about four thousand packets per second, whereas pf does not reach overload until about six thousand packets per second.
Figure 7: Stateful filtering with 20000 state entries (latency)
The latency comparison for this test is shown in Figure 7. The latency increases as expected when the packet filters reach overload.
As in the second test, the procedure is repeated for various state table sizes between one and twenty-five thousand. Figure 8 shows the maximum lossless throughput rate as a function of the state table size. pf performs significantly better than IPFilter when the state table is small. At ten thousand state table entries, IPFilter starts to perform better than pf, but the difference is not significant.
Figure 8: Maximum throughput with increasing number of states
We expected Figure 8 to show the O(1) behavior of hash table lookups for IPFilter and the O(log n) behavior of tree lookups for pf. However, it seems that the constant cost of the hash key calculation is relatively high, and IPFilter's lookup time still depends on n for some unknown reason.
Iptables has not been included in this benchmark because it does not do stateful filtering comparable to pf and IPFilter. The version of iptables we tested employs connection tracking without any sequence number analysis for packets outside of the initial TCP handshake. While this is unsurprisingly faster, a direct performance comparison would be unfair. There is a patch for iptables that adds sequence number checking, but it is still beta and not included in the GNU/Linux distribution used for testing.
The slower performance is explained by the fact that pf used to evaluate the rule set three times for every packet that passes an interface: twice to look for scrub rules, which determine whether IP and TCP normalization should be performed, and once to look for pass and block rules.
This problem can be rectified by a simple optimization: it is sufficient to add two additional skip-steps, one for the rule type (scrub versus pass/block) and one for the direction (in versus out). This change, which is now part of OpenBSD 3.1, improves the performance of pf's rule set evaluation considerably, as shown in Figure 9.
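Continuing the earlier skip-step sketch, this refinement amounts to two additional dimensions in the (hypothetical) parameter enumeration:

```c
/* Two additional skip-step dimensions, extending the earlier enum. */
enum {
    P_RULETYPE,     /* scrub vs. pass/block */
    P_DIRECTION,    /* in vs. out           */
    P_IFACE, P_PROTO, P_SRC, P_DST,
    P_MAX
};
```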
Figure 9: Maximum throughput with increasing number of rules
The benchmarks measure only how packet filter performance scales in extreme cases, chosen to expose the behavior of the rule set evaluation and state table lookup algorithms, e.g., completely stateless filtering and filtering where all packets match state. In real life, a firewall will perform a mixture of these operations, as well as other operations that have not been tested, like creation and removal of state table entries and logging of blocked packets.
Also, real-life packets rarely require a complete evaluation of the rule set. All tested packet filters have mechanisms to reduce the number of rules that need to be evaluated on average. Iptables allows the definition of and jumps to chains of rules. As a result, the rule set becomes a tree instead of a linked list. IPFilter permits the definition of rule groups, which are only evaluated when a packet matches a head rule. pf uses skip-steps to automatically skip rules that cannot apply to a specific packet. In summary, iptables performs best for stateless rules and pf performs best when using stateful filtering.
The benchmarks show that the low cost of state table lookups, compared to the high cost of rule set evaluation, justifies creating state for performance reasons. Stateful filtering not only improves the quality of filtering decisions, it effectively improves filtering performance.
The new Stateful Packet Filter is included in OpenBSD 3.0 released in November 2001. The source code is BSD licensed and available in the OpenBSD source tree repository [7].
[2] David Watson, Farnam Jahanian, G. Robert Malan and Paul Howell. Transport and Application Protocol Scrubbing. In Proc. IEEE INFOCOM 2000, 2000.
[3] Patricia Gilfeather and Todd Underwood. Fragmentation Made Friendly.
[4] C. A. Kent and J. C. Mogul. Fragmentation considered harmful. In WRL Technical Report 87/3, December 1987.
[5] C. Kreibich, M. Handley and V. Paxson. Network Intrusion Detection: Evasion, Traffic Normalization, and End-to-End Protocol Semantics. In Proc. USENIX Security Symposium 2001, 2001.
[6] Steven McCanne and Van Jacobson. The BSD Packet Filter: A New Architecture for User-level Packet Capture. In Proc. 1993 Winter USENIX Conference, San Diego, CA, January 1993.
[7] The OpenBSD Project. http://www.openbsd.org/.
[8] Guido van Rooij. Real Stateful TCP Packet Filtering in IP Filter.