When the operating system boots, the kernel takes control of the machine. After attaching detected hardware components, it starts the first process, init(8). All further processes are created from there, like the startup scripts rc(8) and netstart(8), or getty(8) which runs login(8) for terminals like your console, which in turn runs your shell when you log in, which then creates processes as you type commands.
Anything outside the kernel, including all processes created by users, is called userland. The kernel has unlimited privileges, while processes are always associated with a user and limited by the privileges of the user, enforced by the kernel.
As a user, you need to communicate with pf in the kernel to load a ruleset, to configure options, and to retrieve information like the contents of the state table or statistical counters. Operations of this kind, where the user initiates a request that pf in the kernel answers, take place through the ioctl(2) interface, using pfctl(8).
There is a second interface between pf in the kernel and userland, bpf(4). Using this interface, a userland process can register itself to receive network packets from the kernel. This is used by pflog(4) for logging.
pfctl, a userland process, opens the special file /dev/pf and sends ioctl commands through the file handle to the kernel. An ioctl command can both transfer arguments from the process to the kernel as well as transfer results back from the kernel to the process. Some commands given to pfctl by the user translate into a single ioctl call. Others might require several ioctl calls.
The file is special as it does not store data written to it in the file system and has no size:
    $ ls -l /dev/pf
    crw------- 1 root wheel 73, 0 Nov 22 10:59 /dev/pf
The 'c' in the file mode (the left-most column) stands for character special file. For such files, ls(1) prints the so-called major and minor device numbers in place of the size. The major number, 73 in the output above, indicates which component in the kernel ioctl commands should be dispatched to. Since different architectures support different kinds of devices, the major number of a given device (or a pseudo device, like pf's ioctl interface) varies across architectures and may change across releases. Some devices, but not pf, support multiple instances, and the minor number, 0 in the output above, is used to dispatch commands to specific instances.
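The fields that ls(1) prints for a special file can also be read from a program. The following is a minimal Python sketch, not part of pf or pfctl; the function name describe_device is made up for illustration.

```python
import os
import stat


def describe_device(path):
    """Return (is_char_device, major, minor) for a device node."""
    st = os.stat(path)
    is_char = stat.S_ISCHR(st.st_mode)
    # st_rdev packs both device numbers; os.major/os.minor split them.
    return is_char, os.major(st.st_rdev), os.minor(st.st_rdev)
```

On a system with pf, describe_device("/dev/pf") should report a character device; the numbers themselves depend on the architecture and release, as described above.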
    $ ls -l /dev/pf
    crw------- 1 root wheel 73, 0 Nov 22 10:59 /dev/pf
    $ id
    uid=1000(dhartmei) gid=1000(dhartmei) groups=1000(dhartmei), 0(wheel)
    $ pfctl -d
    pfctl: /dev/pf: Permission denied
You can grant other users access to pf by changing these file permissions. For instance, you could allow all members of the wheel group access to read-only functions:
    $ chmod g+r /dev/pf
    $ ls -l /dev/pf
    crw-r----- 1 root wheel 73, 0 Nov 22 10:59 /dev/pf
Special files like /dev/pf can be recreated with default permissions using the MAKEDEV(8) script:
    $ cd /dev
    $ ./MAKEDEV pf
This script calls mknod(8) to create a character type pseudo device with the major and minor number appropriate for the architecture. On macppc, it runs:
    $ mknod pf c 73 0
    $ chmod 600 pf
The file name does not need to be pf for the kernel to forward requests sent through the file to pf; only the major and minor numbers are relevant. Hence, you could create multiple special files, for instance in locations other than /dev for chrooted daemons, or with different file owners or groups.
Note that access to pf, especially write access, should only be granted to trusted users or audited daemons, as it allows direct communication with pf in the kernel. Not only can a malicious user or a compromised daemon with access to pf disturb the operation of the packet filter or bypass your filtering policy, but insufficient input validation (a bug) in the kernel could potentially be exploited with invalid ioctl arguments to escalate privileges locally.
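Since several special files may exist for pf, it can help to check each of them for permissions that reach beyond the owner. The sketch below is one possible way to do that in Python; the function name risky_access and the choice of which bits to report are assumptions made for this example, not an established tool.

```python
import os
import stat


def risky_access(path):
    """List permission bits that open a file beyond its owner."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    findings = []
    for bit, label in (
        (stat.S_IRGRP, "group-readable"),
        (stat.S_IWGRP, "group-writable"),
        (stat.S_IROTH, "world-readable"),
        (stat.S_IWOTH, "world-writable"),
    ):
        if mode & bit:
            findings.append(label)
    return findings
```

For a device node with mode 600 the list is empty; after the chmod g+r shown earlier it would contain "group-readable".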
Another feature that affects access control is called securelevel(7). During boot, the kernel initially starts in 'insecure mode', also referred to as single-user, and then switches to 'secure mode', known as multi-user. There is an optional 'highly secure mode', which can be set in rc.securelevel(8) to further lock down a system. The system becomes less generally useful in this state, but the harm a compromised root account can do is limited. pf no longer allows ruleset changes once this securelevel is reached.
    pf=YES                  # Packet filter / NAT
    pf_rules=/etc/pf.conf   # Packet filter rules file
First, the system startup script rc(8) loads a minimal default ruleset and enables pf:
    RULES="block all"
    RULES="$RULES\npass on lo0"
    RULES="$RULES\npass in proto tcp from any to any port 22 keep state"
    RULES="$RULES\npass out proto { tcp, udp } from any to any port 53 keep state"
    RULES="$RULES\npass out inet proto icmp all icmp-type echoreq keep state"
    if ifconfig lo0 inet6 >/dev/null 2>&1; then
            RULES="$RULES\npass out inet6 proto icmp6 all icmp6-type neighbrsol"
            RULES="$RULES\npass in inet6 proto icmp6 all icmp6-type neighbradv"
            RULES="$RULES\npass out inet6 proto icmp6 all icmp6-type routersol"
            RULES="$RULES\npass in inet6 proto icmp6 all icmp6-type routeradv"
    fi
    RULES="$RULES\npass proto { pfsync, carp }"
    case `sysctl vfs.mounts.nfs 2>/dev/null` in
    *[1-9]*)
            # don't kill NFS
            RULES="scrub in all no-df\n$RULES"
            RULES="$RULES\npass in proto udp from any port { 111, 2049 } to any"
            RULES="$RULES\npass out proto udp from any to any port { 111, 2049 }"
            ;;
    esac
    echo $RULES | pfctl -f - -e
This ruleset is active while the network is being started through netstart(8). It only allows traffic necessary during netstart(8), like DNS or NFS. Your real ruleset cannot be loaded at this point, because it might reference interface names and addresses that do not exist until netstart(8) has run. And you wouldn't want to just pass all traffic until your real ruleset has been loaded, because netstart(8) might start some vulnerable network daemon you rely on being protected by pf. Without the minimal default ruleset, there would be a brief window of vulnerability during each boot.
Afterwards, your full ruleset /etc/pf.conf is loaded:
    if [ -f ${pf_rules} ]; then
            pfctl -f ${pf_rules}
    fi
netstart(8) typically runs only for a brief period of time, so the use of the minimal default ruleset is barely noticeable for most users, except when the ruleset /etc/pf.conf cannot be loaded, for instance due to a typographical mistake in the ruleset. In this case, the minimal default ruleset remains active, which does allow incoming SSH connections so the problem can be fixed remotely.
The flag -r affects results that contain IP addresses. By default, addresses are shown numerically. With -r, reverse DNS lookups are performed and symbolic host names are shown instead, where available.
    $ pfctl -e -f /etc/pf.conf
This both enables pf and loads the ruleset. Some combinations have different results depending on the chronological order of execution. pfctl executes some combinations in a reasonable order (instead of evaluating command line options strictly from left to right), but if there is any ambiguity, commands should be issued with separate pfctl invocations.
    $ pfctl -e
    pf enabled
    $ pfctl -d
    pf disabled
When pf is disabled, no packets are passed to pf to decide whether they should be blocked or passed. This can be used to diagnose problems or compare performance.
It's not required to enable or disable pf to perform other operations, e.g. you don't need to disable pf before and re-enable it after a ruleset change.
When filtering statefully, disabling pf can break ongoing connections that are translated or use sequence number modulation. Also, pf cannot associate packets with state entries while disabled. When packets are missed, state entries do not advance their sequence number windows, and connections can stall and reset when pf is re-enabled and may require re-establishment.
A less intrusive way to diagnose pf related problems is to leave pf enabled but flush (clear) the ruleset. An empty ruleset will pass all packets due to the pass rule implied when no matching rule is found.
    $ pfctl -Fr -Fn
    rules cleared
    nat cleared
Packets with invalid checksums or IP options are blocked by default even with an empty ruleset. Diagnosis of such cases might require disabling pf.
The current state, enabled or disabled, is shown in the first line of output from:
    $ pfctl -si
    Status: Enabled for 17 days 18:26:19 Debug: Urgent
pfctl operations, like loading rulesets or showing state entries, are possible even if pf is disabled. However, loading a ruleset does not automatically enable pf; an explicit pfctl -e is required.
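A script that needs to know whether pf is running can look at that first line. The following Python sketch works on text that has already been captured from pfctl -si; the function name pf_enabled is hypothetical, and the "Disabled" wording is an assumption, since only the enabled case is shown above.

```python
import re


def pf_enabled(status_output):
    """Return True or False based on the first line of `pfctl -si` output."""
    lines = status_output.splitlines()
    first_line = lines[0] if lines else ""
    # Assumed wording: "Status: Enabled ..." or "Status: Disabled ...".
    match = re.match(r"Status:\s+(Enabled|Disabled)\b", first_line)
    if match is None:
        raise ValueError("unrecognised status line: %r" % first_line)
    return match.group(1) == "Enabled"
```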
    $ pfctl -f /etc/pf.conf
A file can be parsed without being loaded, for instance to check syntax validity, by adding -n:
    $ pfctl -n -f /etc/pf.conf
Adding -v makes the output more verbose, showing what rules would be loaded into the kernel:
    $ pfctl -n -v -f /etc/pf.conf
Instead of a file name, '-' can be used for standard input, e.g.
    $ echo "block all" | pfctl -nvf -
    block drop all
If the ruleset contains macros, their values can be supplied or overridden from the command line when the ruleset is loaded using the -D option, like:
    $ cat /etc/pf.conf
    pass out on $ext_if keep state
    $ pfctl -D 'ext_if=wi0' -vf /etc/pf.conf
    pass out on wi0 all keep state
Ruleset files like /etc/pf.conf can contain filter rules (pass or block), translation rules (nat, rdr, and binat), and options (like set limit states 10000), and pfctl -f processes all of them. In the kernel, filter and translation rules are stored separately, i.e. a ruleset contains a list of filter rules and a list of translation rules.
You can load only the filter rules, leaving the translation rules unchanged, using:
    $ pfctl -R -f /etc/pf.conf
Conversely, only translation rules are loaded with:
    $ pfctl -N -f /etc/pf.conf
To load only the options, but neither filter nor translation rules, use:
    $ pfctl -O -f /etc/pf.conf
This is needed when you want to change an option from the command line like:
    $ echo "set limit states 20000" | pfctl -O -f -
Without the -O, pfctl would treat the piped input as a complete ruleset and replace the filter and translation rules with empty lists.
To show the currently loaded translation and filter rules, use:
    $ pfctl -sn -sr
Use -sn or -sr on its own to show only one of the two lists.
Verbose output is produced by adding -v or -vv:
    $ pfctl -vvsr
    @74 pass in on kue0 inet proto tcp from any to 62.65.145.30 port = smtp flags S/SA keep state
      [ Evaluations: 95196 Packets: 95284 Bytes: 33351097 States: 0 ]
The '@74' indicates the rule number, used as a reference by other commands.
The second line shows how many times the rule has been evaluated, how many packets the rule was last-matching for, the sum of the sizes of these packets, and how many states currently exist in the state table that were created by the rule.
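These counters are easy to collect from a script. The sketch below parses captured pfctl -vvsr text into a dictionary keyed by rule number. It assumes the counter line looks like the one in the example above; the function and key names are invented for this illustration.

```python
import re

COUNTERS = re.compile(
    r"\[\s*Evaluations:\s*(\d+)\s+Packets:\s*(\d+)\s+"
    r"Bytes:\s*(\d+)\s+States:\s*(\d+)\s*\]"
)


def parse_rule_counters(text):
    """Map rule number -> counters found in `pfctl -vvsr` output."""
    rules = {}
    current = None
    for line in text.splitlines():
        head = re.match(r"\s*@(\d+)\s", line)
        if head:
            current = int(head.group(1))  # a new rule starts here
        stats = COUNTERS.search(line)
        if stats and current is not None:
            evaluations, packets, nbytes, states = map(int, stats.groups())
            rules[current] = {
                "evaluations": evaluations,
                "packets": packets,
                "bytes": nbytes,
                "states": states,
            }
    return rules
```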
There's no need to flush rules before loading a new ruleset, as in:
    $ pfctl -Fr -Fn -f /etc/pf.conf
In fact, this not only wastes CPU cycles, but introduces a (brief) temporary state with no rules loaded, when packets might pass that both the old and the new ruleset would block. A simple invocation with -f is sufficient and safe: while the new ruleset is being uploaded to the kernel, the old ruleset is still in effect. Once the new ruleset is completely uploaded, the kernel switches the rulesets and releases the old set. Any packet, at any time, is either filtered by the entire old ruleset or the entire new ruleset. If the upload fails for any reason, the old ruleset remains intact and in effect.
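A reload script can combine the -n syntax check with a plain -f load and never flush. The following Python sketch shows one way to do this. The function name reload_ruleset and the injectable run parameter are made up for the example, and it assumes pfctl exits with a non-zero status on failure.

```python
import subprocess


def reload_ruleset(path, run=subprocess.run):
    """Parse-check a ruleset with -n, then load it with -f (no flush).

    `run` is injectable so the logic can be exercised without pfctl.
    """
    check = run(["pfctl", "-n", "-f", path])
    if check.returncode != 0:
        # Syntax error: nothing was loaded, the old ruleset stays active.
        return False
    load = run(["pfctl", "-f", path])
    return load.returncode == 0
```

Calling reload_ruleset("/etc/pf.conf") as root would run the two pfctl commands in sequence. The explicit check is not strictly required, since a failed load leaves the old ruleset in effect anyway, but it keeps the two failure cases apart.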
There are no pfctl commands to add or remove individual rules from a loaded ruleset. However, the output of pfctl -sr is valid input for pfctl -f. For instance, additional rules can be inserted at the beginning or end of the ruleset using:
    $ (echo "pass quick on lo0"; pfctl -sr) | pfctl -f -
    $ (pfctl -sr; echo "block all") | pfctl -f -
By piping the output through standard text processing tools like head(1), tail(1), sed(1), or grep(1), rulesets can be manipulated in many ways.
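The same kind of manipulation can be done in a script instead of a shell pipeline. The sketch below works purely on text: it takes the output of pfctl -sr as a string and returns a new ruleset. The function name edit_rules and its parameters are hypothetical.

```python
def edit_rules(current_rules, prepend=(), append=(), drop_if=None):
    """Build new ruleset text from `pfctl -sr` output.

    prepend/append are lists of rule lines; drop_if is an optional
    predicate that returns True for lines to remove.
    """
    kept = [
        line
        for line in current_rules.splitlines()
        if line.strip() and not (drop_if and drop_if(line))
    ]
    return "\n".join(list(prepend) + kept + list(append)) + "\n"
```

The result would then be written to the standard input of pfctl -f -, as in the shell examples above.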
Instead of adding and removing rules, it's often simpler to use constant rules which reference tables, and to manipulate the tables so the rules apply to different sets of addresses.
Note that loading a ruleset does not remove state entries created by previously used rulesets. For instance, if your currently loaded ruleset contains the rule
    pass in proto tcp to port ssh keep state
and you establish an SSH connection matching this rule and creating a state entry, the state entry will continue to exist and to pass packets related to that connection even after you have loaded another ruleset which does not contain a similar rule or even explicitly blocks such connections.
To flush existing state entries explicitly, use:
    $ pfctl -Fs
    $ pfctl -ss
    gem0 tcp 10.1.1.1:43222 -> 10.1.1.111:22 ESTABLISHED:ESTABLISHED
    gem0 tcp 81.221.21.250:6667 <- 10.2.2.6:4487 ESTABLISHED:ESTABLISHED
    kue0 tcp 62.65.145.30:17533 -> 12.108.129.170:25 FIN_WAIT_2:FIN_WAIT_2
    kue0 tcp 127.0.0.1:8025 <- 62.65.145.30:25 <- 63.236.31.89:53251 ESTABLISHED:ESTABLISHED
The first column shows the interface the state was created for, except for states that are floating (not bound to interfaces), where 'self' is shown instead.
The second column shows the protocol of the connection, like tcp, udp, icmp, or other.
The following columns show the peers involved in the connection. Those can simply be two source and destination addresses (and ports, for tcp or udp) when the connection is not translated. When either source translation (nat or binat) or destination translation (rdr or binat) is used, a third address shows the original address before translation. The arrows <- and -> indicate the direction of the connection (incoming and outgoing, respectively) from the point of view of the interface the state was created on.
The last column shows the condition the state is in, which determines the timeout value being used to remove the state entry. For TCP states, this loosely resembles the TCP states shown by netstat -p tcp for the local peer.
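The column layout described above can be split mechanically. The following Python sketch parses a single line of pfctl -ss output as shown in the example; the function name parse_state and the dictionary keys are assumptions for illustration. It does not try to decide which of three addresses is the pre-translation one.

```python
def parse_state(line):
    """Split one line of `pfctl -ss` output into its columns."""
    tokens = line.split()
    if len(tokens) < 6:
        raise ValueError("not a state line: %r" % line)
    middle = tokens[2:-1]       # addresses separated by arrows
    addresses = middle[0::2]
    arrows = middle[1::2]
    if not arrows or arrows[0] not in ("<-", "->"):
        raise ValueError("no direction arrow in: %r" % line)
    return {
        "interface": tokens[0],
        "protocol": tokens[1],
        "addresses": addresses,
        # "<-" marks an incoming, "->" an outgoing connection.
        "direction": "in" if arrows[0] == "<-" else "out",
        # A third address is present only for translated connections.
        "translated": len(addresses) == 3,
        "condition": tokens[-1],
    }
```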
Adding -v makes the output more verbose:
    $ pfctl -vss
    kue0 tcp 62.65.145.30:25 <- 216.136.204.119:29267 TIME_WAIT:TIME_WAIT
       [3321306408 + 58242] wscale 0 [64544208 + 16656] wscale 0
       age 00:01:05, expires in 00:00:28, 10:9 pkts, 4626:1041 bytes, rule 74
For TCP connections, the second line shows the currently valid TCP sequence number windows, that is, the lowest and highest segment pf will let pass. The first number shows the highest segment acknowledged by the peer, the lower boundary of the window, and the second number is the window advertised by the peer. The sum of both numbers equals the upper boundary. If the connection uses TCP window scaling, the scaling factors of both peers are shown. A value of n means the factor is 2^n. The value 0 means a peer advertised its support for window scaling, but didn't want to scale its own windows (2^0 is factor 1). The windows in the square brackets are shown unscaled, that is, before any scaling factors are applied.
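The window arithmetic can be sketched in a few lines of Python. The function name parse_windows is hypothetical. The upper boundary is computed as the plain sum described above, wrapped at 2^32 because TCP sequence numbers are 32 bits wide; the scaling factor is reported separately and not applied.

```python
import re

WINDOW = re.compile(r"\[(\d+)\s*\+\s*(\d+)\](?:\s+wscale\s+(\d+))?")


def parse_windows(line):
    """Return one dict per peer from the second line of `pfctl -vss`."""
    peers = []
    for low, size, wscale in WINDOW.findall(line):
        low, size = int(low), int(size)
        peers.append({
            "lower": low,
            # Sequence numbers are 32-bit, so the sum wraps around.
            "upper": (low + size) % 2**32,
            # wscale n means a factor of 2^n; None if wscale is not shown.
            "scale_factor": 2 ** int(wscale) if wscale else None,
        })
    return peers
```

For the example line, the first peer's window runs from 3321306408 to 3321364650 with a scaling factor of 1.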
The third line shows the age of the state entry in hours, minutes, and seconds. Similarly, the time after which the entry will time out if no further packets match the entry is shown next. In the example, the condition of the connection is TIME_WAIT:TIME_WAIT, so the timeout value tcp.closed applies, which defaults to 90 seconds. The state entry expires in 28 seconds, because the last packet of the connection was seen 62 seconds ago. If no further packet matches this state entry, the entry will be marked for removal in 28 seconds. Marked entries are removed periodically; the default interval is 10 seconds. This explains how state entries can show up in pfctl -vss output as 'expires in 00:00:00' for several seconds before they finally vanish.
The "10:9 pkts" on the third line in the example indicates that 19 packets have matched the state entry so far, 10 in the same direction as the packet that created the state entry, and 9 in the opposite direction. Similarly, "4626:1041 bytes" means those former 10 packets contained a total of 4626 bytes and the latter 9 packets a total of 1041 bytes.
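The third line can be decoded the same way. The Python sketch below turns it into numbers and, given the applicable timeout, derives how long ago the last packet was seen (timeout minus remaining time, as in the 90 - 28 = 62 example above). The names parse_state_detail and hms_to_seconds are invented for this illustration.

```python
import re

DETAIL = re.compile(
    r"age ([\d:]+), expires in ([\d:]+), "
    r"(\d+):(\d+) pkts, (\d+):(\d+) bytes, rule (\d+)"
)


def hms_to_seconds(text):
    """Convert hh:mm:ss to seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def parse_state_detail(line, timeout=None):
    """Decode the third line of a `pfctl -vss` entry."""
    match = DETAIL.search(line)
    if match is None:
        raise ValueError("not a state detail line: %r" % line)
    info = {
        "age": hms_to_seconds(match.group(1)),
        "expires": hms_to_seconds(match.group(2)),
        # (same direction as the creating packet, opposite direction)
        "packets": (int(match.group(3)), int(match.group(4))),
        "bytes": (int(match.group(5)), int(match.group(6))),
        "rule": int(match.group(7)),
    }
    if timeout is not None:
        info["idle"] = timeout - info["expires"]
    return info
```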
The last part, "rule 74", shows the number of the "pass ... keep state" rule that created the state entry. This number usually does not equal the line number of the rule in the ruleset file, due to rule expansion. Instead, the number corresponds to the rule numbers printed by pfctl -vvsr, like:
    $ pfctl -vvsr | grep '@74 '
    @74 pass in on kue0 inet proto tcp from any to 62.65.145.30 port = smtp flags S/SA keep state
More verbose output from pfctl -vvss includes an id and creator id of the state entry used by pfsync.
The state table can be flushed (cleared) with:
    $ pfctl -Fs
Individual entries can be killed (removed) with:
    $ pfctl -k 10.1.2.3
    $ pfctl -k 10.2.3.4 -k 10.3.4.5
The first command kills all states from source 10.1.2.3, the second one kills all states from source 10.2.3.4 to destination 10.3.4.5. Depending on whether the state is for an incoming or outgoing connection, arguments may have to be reversed. The -k option is not very versatile: not all kinds of states can be killed with it, in which case the entire state table has to be flushed.
    $ pfctl -s queue
    queue q_max priority 7
    queue q_hig priority 5
    queue q_def priority 3
    queue q_low priq( default )
Adding -v adds two lines of counters for each queue:
    $ pfctl -v -s queue
    queue q_low priq( default )
      [ pkts: 4174247 bytes: 1861178708 dropped pkts: 10382 bytes: 2318648 ]
      [ qlength: 0/ 50 ]
The 'pkts' counter shows how many packets were assigned to the queue, 'bytes' is the sum of those packets' sizes. Similarly, 'dropped pkts' counts packets that were assigned to the queue but had to be dropped because the maximum queue length was reached, and the total size of those packets. 'qlength' shows the current fullness of the queue as the number of entries vs. the maximum number of entries.
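From these counters, a script can derive how lossy a queue is. The Python sketch below reads the two counter lines for a single queue and computes the share of dropped packets and the current fill level; queue_summary and its keys are hypothetical names.

```python
import re

QUEUE_STATS = re.compile(
    r"pkts:\s*(\d+)\s+bytes:\s*(\d+)\s+"
    r"dropped pkts:\s*(\d+)\s+bytes:\s*(\d+)"
)
QLENGTH = re.compile(r"qlength:\s*(\d+)/\s*(\d+)")


def queue_summary(text):
    """Summarise the counter lines `pfctl -v -s queue` prints for one queue."""
    stats = QUEUE_STATS.search(text)
    length = QLENGTH.search(text)
    if stats is None or length is None:
        raise ValueError("queue counters not found")
    pkts, nbytes, drop_pkts, drop_bytes = map(int, stats.groups())
    current, maximum = map(int, length.groups())
    return {
        "packets": pkts,
        "bytes": nbytes,
        "dropped_packets": drop_pkts,
        "dropped_bytes": drop_bytes,
        # Share of assigned packets that had to be dropped.
        "drop_ratio": drop_pkts / pkts if pkts else 0.0,
        # Current queue length relative to its maximum.
        "fill": current / maximum if maximum else 0.0,
    }
```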
Adding -vv makes pfctl show the same output as -v in an endless loop. Additionally, the differences of counters between passes, after the first pass, allow pfctl to print average packet rate and throughput, like:
    queue q_low priq( default )
      [ pkts: 4177298 bytes: 1861897544 dropped pkts: 10382 bytes: 2318648 ]
      [ qlength: 0/ 50 ]
      [ measured: 4.6 packets/s, 10.24Kb/s ]
    $ pfctl -s Tables
    spammers
    whitelist
An individual table, specified by -t, can be manipulated using the -T command.
Show all entries of a table:
    $ pfctl -t spammers -T show
       222.222.48.0/24
       222.223.128.38
Delete all entries from a table:
    $ pfctl -t spammers -T flush
    5 addresses deleted.
Add an entry to a table:
    $ pfctl -t spammers -T add 10.1.2.3
    1/1 addresses added.
    $ pfctl -t spammers -T add 10/8
    1/1 addresses added.
    $ pfctl -t spammers -T add '!10.1/16'
    1/1 addresses added.
Delete an entry from a table:
    $ pfctl -t spammers -T delete 10.1.2.3
    1/1 addresses deleted.
Test whether an address matches a table:
    $ pfctl -t spammers -T test 10.2.3.4
    1/1 addresses match.
    $ pfctl -t spammers -T test 10.1.1.1
    0/1 addresses match.
    $ pfctl -t spammers -vv -T test 10.1.1.1
    0/1 addresses match.
       10.1.1.1 !10.1.0.0/16
Multiple entries can be added, removed, or tested like:
    $ pfctl -t spammers -T add 10.2.3.4 10.3.4.5 10.4.5.6
    3/3 addresses added.
Instead of listing the entries on the command line, the list can be read from a file:
    $ cat file
    10.2.3.4
    10.3.4.5
    10.4.5.6
    $ pfctl -t spammers -T add -f file
    3/3 addresses added.
The following example searches the web server log for requests containing 'cmd.exe' (a common exploit attempt) and adds all (new) client addresses to a table:
    $ grep 'cmd\.exe' /var/www/logs/access.log | \
            cut -d ' ' -f 1 | sort -u | \
            pfctl -t weblog -T add -f -
    28/32 addresses added.
The table could be referenced by rules, for example, to block these clients, to redirect them to another server, or to queue replies to their web requests differently.
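The -T test results shown earlier, where 10.2.3.4 matches the table but 10.1.1.1 does not, can be reproduced with a small model. The Python sketch below assumes that the most specific (longest prefix) entry containing the address decides, and that a negated entry turns that decision into "no match". This is a model of the observed output, not pf's implementation; table_match is a made-up name. Entries must be written in full form, since Python's ipaddress module does not accept shorthand like 10/8.

```python
import ipaddress


def table_match(entries, address):
    """Return (matches, deciding_entry) for an address against a table."""
    addr = ipaddress.ip_address(address)
    best = None  # (prefix length, negated, entry text)
    for entry in entries:
        negated = entry.startswith("!")
        net = ipaddress.ip_network(entry.lstrip("!"), strict=False)
        if addr.version != net.version or addr not in net:
            continue
        if best is None or net.prefixlen > best[0]:
            best = (net.prefixlen, negated, entry)
    if best is None:
        return False, None
    return (not best[1]), best[2]
```

With the entries 10.0.0.0/8 and !10.1.0.0/16, the address 10.1.1.1 is decided by the negated /16 entry, which corresponds to the entry printed by the -vv test above.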
You can load a ruleset (a list of rules) into an anchor, just as you can create a number of files in a directory. Evaluating a ruleset corresponds to processing all files located in one directory.
When the main ruleset is evaluated for a packet, only the rules inside the main ruleset are automatically evaluated. If there are anchors containing rules, those rules are not automatically evaluated, unless there is an explicit call (like a function call) to them from the main ruleset.
There are two forms of calls that cause evaluation of anchors. The first one is:
    anchor "/foo" all
When rule evaluation reaches this rule, evaluation branches into the list of rules within anchor /foo, and evaluates them from first to last. Upon reaching the last rule within anchor /foo, evaluation returns to the caller and continues with the next rule after the anchor call in the caller's context.
Note that evaluation is not recursive. When anchor /foo contains sub-anchors, the lists of rules within those sub-anchors are not evaluated by the above call, only the rules directly within anchor /foo are.
The second form is:
    anchor "/foo/*" all
This call does not evaluate the list of rules in anchor /foo at all. Instead, all anchors within anchor /foo are traversed, and for each sub-anchor, the list of rules inside that sub-anchor is evaluated.
Again, evaluation is not recursive. When the sub-anchors below anchor /foo contain sub-sub-anchors, the sub-sub-anchors are not evaluated, only the rules directly within the sub-anchors are.
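The difference between the two call forms can be modelled with a dictionary that maps anchor paths to rule lists. The Python sketch below is only a model of the semantics described above, with hypothetical names; in particular, the order in which sub-anchors are visited (sorted by name here) is an assumption.

```python
def rules_for_call(anchors, target):
    """Return the rules evaluated by one anchor call.

    `anchors` maps absolute anchor paths to the rules directly inside them.
    """
    if target.endswith("/*"):
        # Second form: rules of each direct sub-anchor, not of the parent.
        prefix = target[:-2].rstrip("/") + "/"
        children = sorted(
            path
            for path in anchors
            if path.startswith(prefix)
            and path[len(prefix):]
            and "/" not in path[len(prefix):]
        )
        rules = []
        for child in children:
            rules.extend(anchors[child])
        return rules
    # First form: only the rules directly within the named anchor.
    return list(anchors.get(target, []))
```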
Anchors can be used to dynamically change a ruleset (from a script, for instance) without reloading the entire main ruleset. When you regularly need to modify only a specific section of your main ruleset, you can move the rules of that section into an anchor, which you call from the main ruleset. Then you can modify the section by reloading the rules of the anchor, without ever touching the main ruleset again. Of course, anchors can also be empty (contain no rules). Calling an empty anchor from the main ruleset simply does nothing while the anchor is empty. However, you can later load rules into the anchor and the main ruleset will then evaluate these rules automatically, not requiring a change in the main ruleset.
Another example is authpf(8), which dynamically modifies the filter policy to allow traffic from authenticated users. You create an anchor /authpf directly below the main ruleset. For each user who authenticates, the program creates a sub-anchor below anchor /authpf, and the rules for that user are loaded into that sub-anchor. The hierarchy looks like this:
    /              the main ruleset
    /authpf        the anchor containing the user anchors
    /authpf/fred   an anchor for user fred
    /authpf/paul   an anchor for user paul
Every anchor can contain rules, as every directory can contain files. In this case, however, the anchor authpf does not contain any rules; it only contains other anchors (like a directory that only contains subdirectories, but no files). The purpose of the authpf anchor is merely to hold the user anchors, not to contain rules itself. The users' anchors could be created directly in the main ruleset, but the intermediate anchor helps keep anchors organized. Instead of cluttering the namespace in the main ruleset, which could contain other anchors not related to authpf, all anchors related to authpf are stored inside one dedicated anchor, and authpf is free to do whatever it wants within that part of the world.
In this case, we want to evaluate the rules within anchor /authpf/fred and /authpf/paul. Actually, we want to evaluate the rules within all sub-anchors directly below /authpf, since authpf will dynamically add and remove sub-anchors. Hence, we can use the second form of call from the main ruleset:
    anchor "/authpf/*" all
Anchor calls don't have to specify absolute paths to the destination; relative paths are possible, too:
    anchor "authpf" all
    anchor "authpf/fred" all
    anchor "../../authpf" all
For relative paths, the point of reference is the caller, i.e. if anchor /foo/bar/baz contains the rule which calls "../frobnitz", the destination is /foo/bar/frobnitz (no matter from where /foo/bar/baz may have been called).
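Because anchor paths behave like directory paths, the resolution rule can be illustrated with Python's posixpath module. This is a sketch of the rule as described above, not of pf's own code; resolve_anchor is a hypothetical name.

```python
import posixpath


def resolve_anchor(caller, target):
    """Resolve an anchor call target relative to the calling anchor."""
    if target.startswith("/"):
        return posixpath.normpath(target)  # absolute path: caller is ignored
    # Relative path: the calling anchor is the point of reference.
    return posixpath.normpath(posixpath.join(caller, target))
```

For the example in the text, resolve_anchor("/foo/bar/baz", "../frobnitz") yields "/foo/bar/frobnitz".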
You can list all top-level anchors with:
    $ pfctl -s Anchors
      authpf
Adding -v lists all anchors recursively:
    $ pfctl -v -s Anchors
      authpf
      authpf/fred
      authpf/paul
      authpf/paul/sub
To list the sub-anchors of a specific anchor:
    $ pfctl -a authpf -s Anchors
      authpf/fred
      authpf/paul
Adding -v lists all anchors below the specified anchor recursively:
    $ pfctl -a authpf -v -s Anchors
      authpf/fred
      authpf/paul
      authpf/paul/sub
To load a ruleset into an anchor:
    $ pfctl -a authpf/fred -f freds_rules.txt
To show the filter rules within an anchor:
    $ pfctl -a authpf/fred -sr
Anchors can also contain tables. A table within an anchor is manipulated in the same way as a table in the main ruleset; the only difference is the additional -a option specifying the anchor:
    $ pfctl -a authpf/fred -t spammers -T add 10.1.2.3
Copyright (c) 2004-2006 Daniel Hartmeier <daniel@benzedrine.ch>. Permission to use, copy, modify, and distribute this documentation for any purpose with or without fee is hereby granted, provided that the above copyright notice and this permission notice appear in all copies.