Reject mail matching regular expressions
Introduction
sendmail's milter API
allows programs to register themselves and get called during
mail transactions. Such plugins will see all mails passing
through sendmail, including SMTP envelope parameters and mail
headers and body. They can cause sendmail to reject messages
with permanent or temporary error replies
or discard messages silently, based on arbitrary conditions.
milter-regex is a very simple plugin that rejects or discards
messages matching regular expressions. It doesn't add much
processing overhead, so even a busy mail server can afford to
run it.
Inline filtering
Filtering mails 'inline', i.e. while the SMTP transaction is
happening, has several advantages compared to post-processing
as commonly done using procmail.
Messages rejected inline do
not have to be stored locally just to get deleted again later.
The sender immediately gets an SMTP error code and the receiver
doesn't generate any bounce messages (which might get sent
to fake sender addresses, and cost bandwidth and queue space).
Furthermore, inline filtering applies to all messages passing
through the system. A single filter can reject incoming and
outgoing messages to and from all users.
Regular expressions
Spam filters like SpamAssassin can use complex algorithms to
detect offending messages, at the cost of consuming considerable
resources. Regular expression matching is much simpler
and allows rejecting large volumes of unwanted messages
at low cost, greatly reducing the load on
more complex filters called subsequently. Regular expressions
are a commonly known and versatile tool, and well-suited for
quickly matching the most urgent threats.
Motivation
The milter API is relatively new, but already several plugins
have been written that filter messages in various ways, some of
them using regular expressions in some form. milter-regex does
not provide any fundamentally different features. Its main
goal is to support both basic and extended
regular expressions in a usable way and stay lean enough to be
affordable on busy mail servers. It doesn't change or add headers,
and relinquishes resources back to sendmail as early as possible
(not reading message bodies when there are no expressions to
match the body against). milter-regex runs on OpenBSD and is
BSD licensed.
Man page
MILTER-REGEX(8) System Manager's Manual MILTER-REGEX(8)
NAME
milter-regex - sendmail milter plugin for regular expression filtering
SYNOPSIS
milter-regex [-d] [-c config] [-f facility] [-j dirname] [-l loglevel]
[-m number] [-p pipe] [-r pid-file] [-t] [-u user]
[-G group] [-P mode] [-U user]
DESCRIPTION
The milter-regex plugin can be used with the milter API of sendmail(8) to
filter mails using regular expressions matching SMTP envelope parameters
and mail headers and body.
The options are as follows:
-d Don't detach from controlling terminal and produce verbose
debug output on stdout.
-c config Use the specified configuration file instead of the default,
/etc/milter-regex.conf.
-f facility
Use the specified syslog facility instead of the default,
daemon.
-j dirname
Change root to the specified directory.
-l loglevel
Only log messages up to and including the specified level.
See syslog(3) for the numerical values, e.g. the default
LOG_INFO=6.
-m number Ignore mail body after the specified number of lines.
-p pipe Use the specified pipe to interface sendmail(8). Default is
unix:/var/spool/milter-regex/sock.
-r pid-file
Write the pid to the specified file. Default is not to write a
file.
-t Test the configuration file and immediately exit with a status
indicating whether the file is valid.
-u user Run as the specified user instead of the default,
_milter-regex. When milter-regex is started as root, it calls
setuid(2) to drop privileges. The non-privileged user should
have read access to the configuration file and read-write
access to the pipe.
-G group Set the group ID of the pipe.
-P mode Set the permissions of the pipe to the specified mode instead
of the default, 0600.
-U user Set the user ID of the pipe.
SENDMAIL CONFIGURATION
The plugin needs to be registered in the sendmail(8) configuration, by
adding the following lines to the .mc file
INPUT_MAIL_FILTER(`milter-regex',
`S=unix:/var/spool/milter-regex/sock, T=S:30s;R:2m')
rebuilding /etc/mail/sendmail.cf from the .mc file using m4(1), and
restarting sendmail(8).
PLUGIN CONFIGURATION
The configuration file consists of rules that, when matched, cause
sendmail(8) to reject mails. Empty lines and lines starting with # are
ignored, as well as leading whitespace (blanks, tabs). Trailing
backslashes can be used to wrap long rules into multiple lines. Each
rule starts with one of the following commands:
reject <message>
Subsequent matching rules cause the mail to be rejected with a permanent
error consisting of the specified text part. The SMTP reply
consists of the three-digit code 554 (RFC 2821 "command rejected
for policy reasons"), the extended reply code 5.7.1 (RFC 1893
"Permanent Failure", "Security or Policy Status", "Delivery not
authorized, message refused") and the text part (which defaults to
"Command rejected", if not specified). This is a permanent
failure, which causes the sender to remove the message from its
queue without trying to retransmit, commonly generating a bounce
message to the sender.
tempfail <message>
Subsequent matching rules cause the mail to be rejected with a
temporary error consisting of the specified text part. The SMTP
reply consists of the three-digit code 451 (RFC 2821 "Requested
action aborted: local error in processing"), the extended reply
code 4.7.1 (RFC 1893 "Persistent Transient Failure", "Security or
Policy Status", "Delivery not authorized, message refused") and the
text part (which defaults to "Please try again later", if not
specified). This is a temporary failure, which causes the sender
to keep the message in its queue and try to retransmit it, commonly
for several days.
discard
Subsequent matching rules cause the mail to be accepted but then
discarded silently. Note that connect and helo rules should not
use discard.
quarantine <message>
Subsequent matching rules cause the mail to be quarantined in
sendmail(8).
accept
Subsequent matching rules cause the mail to be accepted without
further rule evaluation. Can be used for whitelist criteria.
A command is followed by one or more expressions, each causing the
previous command to be executed when matched. The following expressions
can be used:
connect <hostname> <address>
Reject the connection if both the sender's hostname and address
match the specified regular expressions. The numerical address is
either dotted-quad (IPv4) or coloned-hex (IPv6). The hostname is
the result of a DNS reverse resolution of the numerical address
(which sendmail(8) performs independently of the milter plugin).
When resolution fails, the hostname contains the numerical address
in square brackets.
helo <name>
Reject the connection if the sender-supplied HELO name matches the
specified regular expression. Commonly, the sender supplies its
fully-qualified hostname as HELO name.
envfrom <address>
Reject the mail if the sender-supplied envelope MAIL FROM address
matches the specified regular expression. Addresses commonly have
the form <user@host.doma.in>.
envrcpt <address>
Reject the mail if the sender-supplied envelope RCPT TO address
matches the specified regular expression.
header <name> <value>
Reject the mail if a header matches the specified name and value.
For instance, the header "Subject: Test" matches name Subject and
value Test.
body <line>
Reject the mail if a body line matches the specified regular
expression.
macro <name> <value>
Reject the mail if a sendmail macro matches the specified name and
value.
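As a rough illustration of how commands and expressions combine, the fragment below pairs a few of the commands with one or more expressions. All domains, patterns and reply texts are made-up placeholders, not recommended rules.

```
# Hypothetical rules; domains, patterns and messages are placeholders.

# The hostname is the bracketed numerical address when reverse DNS fails.
tempfail "Reverse DNS lookup failed, please retry later"
connect /^\[/ //

# Envelope sender in an unwanted (made-up) domain.
reject "Sender domain not accepted"
envfrom /@example\.invalid>$/i

# One command followed by two expressions: either match discards.
discard
header /^Subject$/i /^test message$/i
body /^this is only a test$/i
```

The connect rule uses tempfail rather than discard, in line with the note that connect and helo rules should not use discard.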
The plugin regularly checks the configuration file for modification and
reloads it automatically. Signals like SIGHUP will terminate the plugin,
according to the milter signal handler. The plugin reacts to any kind of
error, like syntax errors in the configuration file, by failing open,
accepting all messages. When the plugin is not running, sendmail(8) will
accept all messages.
REGULAR EXPRESSIONS
The regular expressions used in the configuration rules are enclosed in
arbitrary delimiters; no further escaping is needed.
The first character of an argument is taken as the delimiter, and all
subsequent characters up to the next occurrence of the same delimiter are
taken literally as the regular expression. Since the delimiter itself
cannot be part of the regular expression (no escaping is supported), a
delimiter must be chosen that doesn't occur in the regular expression
itself. Each argument can use a different delimiter; all characters
except spaces and tabs are valid.
Two immediately adjacent delimiters form an empty regular expression,
which always matches and requires no regexec(3) call. This can be used
in rules requiring multiple arguments, to match only some arguments.
See re_format(7) for a detailed description of basic and extended regular
expressions.
Optionally, the following flags can be used after the closing delimiter:
e Extended regular expression. This sets REG_EXTENDED for regcomp(3).
i Ignore upper/lower case. This sets REG_ICASE.
n Not matching. Reverses the matching result, i.e. the mail is
rejected if the regular expression does not match.
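The delimiter and flag rules can be sketched with a few hypothetical rules. The host name, network and reply texts below are placeholders chosen for illustration, and the HELO policy is deliberately strict.

```
# Comma as delimiter, because the expression itself contains slashes.
# The e flag selects extended syntax, where "?" marks an optional
# character; the i flag ignores case.
reject "Links to this site are not accepted"
body ,https?://www\.example\.invalid/,ei

# Two adjacent delimiters: the hostname argument is empty and always
# matches, so only the address is tested. 192.0.2.0/24 is a range
# reserved for documentation.
tempfail "Connections from this network are deferred"
connect // /^192\.0\.2\./

# The n flag inverts the result: reject when the HELO name contains
# anything other than letters, digits, dots and hyphens.
reject "Unexpected characters in HELO name"
helo /^[a-z0-9.-]*$/in
```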
BOOLEAN EXPRESSIONS
A rule can consist of either a simple term or more complex expressions.
A term has the form
header /From/ /domain/i
and expressions can be built combining terms with operators "and", "or",
"not" and parentheses, as in
header /From/ /domain/i and body /money/
( not header /From/ /domain/ ) and ( body /sex/ or body /fast/ )
Operator precedence should not be relied on; instead, parentheses should
be used to resolve any ambiguities (ambiguous expressions usually produce
syntax errors from the parser).
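A small sketch of this advice, using made-up patterns: the commented-out line mixes "and" and "or" without grouping, while the active rule spells out the intended grouping with parentheses.

```
# Avoid: the grouping of "and" and "or" is left to the parser.
#   header /^Subject$/ /offer/i and body /unsubscribe/i or body /click here/i

# Prefer: parentheses state the intended grouping explicitly.
reject "Looks like bulk mail"
header /^Subject$/ /offer/i and ( body /unsubscribe/i or body /click here/i )
```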
MACROS
Macros allow storing terms or expressions under a name, and $name can be
used as a term within other rules, expressions or macro definitions.
Example:
friends = header /^Received$/ /^from [^ ]*(ork\.net|home\.com)/e
attachments = header ,^Content-Type$, ,multipart/mixed, and \
body ,^Content-Type: application/,
executables = $attachments and body ,name=".*\.(pif|exe|scr)"$,e
reject "executable attachment from non-friends"
$executables and not $friends
Macro names must begin with a letter and may contain alphanumeric
characters and punctuation characters. Reserved keywords (like "reject"
or "header") cannot be used as macro names. Macros must be defined
before use: the definition must precede the use in the configuration
file, which is read from top to bottom.
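As a further sketch, a macro can be defined once and reused by several rules. The macro names, domain and patterns here are hypothetical.

```
# Hypothetical macro names and patterns; definitions come first.
local_sender = envfrom /@example\.org>$/i
zip_attached = body ,^Content-Type: application/zip,i

# Both rules reuse the macros defined above.
reject "Zip attachments are only accepted from local senders"
$zip_attached and not $local_sender

quarantine "Zip attachment from local sender"
$zip_attached and $local_sender
```

Moving either definition below the rules that use it would violate the define-before-use requirement.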
EVALUATION
Rules are evaluated in the order specified in the configuration file,
from top to bottom. When a rule matches, the corresponding action is
taken, that is the last action specified before the matching rule.
The plugin evaluates the rules every time a line of mail (or envelope) is
received. As soon as a rule matches, the action is taken immediately,
possibly before the entire mail is received, even if further lines might
possibly make other rules match, too. This means the first rule matching
chronologically has precedence.
If evaluation for a line of mail makes two (or more) rules match, the
rule that comes first in the configuration file has precedence.
Boolean expressions are short-circuit evaluated: "a or b"
becomes true as soon as one of the terms is true and "a and b" becomes
false as soon as one of the terms is false, even if the other term is not
yet known, possibly because the relevant mail line has not been received yet.
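The ordering rules can be illustrated with a hypothetical fragment. The address and phrases are placeholders, and the comments only restate the evaluation rules given in this section.

```
# Rule 1 can only match once body lines arrive.
reject "Body contains a blocked phrase"
body /blocked phrase/i

# Rule 2 matches at MAIL FROM, before any body line is seen. For a mail
# that would match both rules, the accept is taken first chronologically,
# even though it comes later in the file.
accept
envfrom /^<trusted@example\.org>$/i

# Short-circuit: once the envelope sender fails to match, the "and" is
# false and the body term no longer matters for this rule.
reject "Blocked phrase from unwanted domain"
envfrom /@example\.invalid>$/i and body /another phrase/i
```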
EXAMPLES
# /etc/milter-regex.conf example
# Accept anything encrypted, just to demonstrate sendmail macros
accept
macro /tls_version/ /TLSv/
tempfail "Sender IP address not resolving"
connect /\[.*\]/ //
reject "Malformed HELO (not a domain, no dot)"
helo /\./n
reject "Malformed RCPT TO (not an email address, not <.*@.*>)"
envrcpt /<(.*@.*|Postmaster)>/ein
reject "HTML mail not accepted"
# use comma as delimiter here, as / occurs within RE
header /^Content-type$/i ,^text/html,i
body ,^Content-type: text/html,i
# Swen worm
discard
header /^(TO|FROM|SUBJECT)$/e //
header /^Content-type$/i /boundary="Boundary_(ID_/i
header /^Content-type$/i /boundary="[a-z]*"/
body ,^Content-type: audio/x-wav; name="[a-z]*\.[a-z]*",i
# Some nasty spammer
reject "Business Corp spam, get lost"
body /^Business Corp. for W.& L. AG/i and \
( body /043.*317.*0285/ or body /0041.43.317.02.85/ )
LOGGING
milter-regex sends log messages to syslogd(8) using facility daemon (unless changed with -f) and,
with increasing verbosity, level err, notice, info and debug. The
following syslog.conf(5) section can be used to log messages to a
dedicated file:
!milter-regex
daemon.err;daemon.notice /var/log/milter-regex
GRAMMAR
Syntax for milter-regex in BNF:
file = ( rule | macro ) [ file ]
rule = action expr-list
action = "reject" msg | "tempfail" msg | "discard" |
"quarantine" msg | "accept"
msg = ( '"' | "'" ) string ( '"' | "'" )
expr-list = expr [ expr-list ]
expr = term | term "and" expr | term "or" expr | "not" term
term = '(' expr ')' | "connect" arg arg | "helo" arg |
"envfrom" arg | "envrcpt" arg | "header" arg arg |
"body" arg | "macro" arg arg | '$' name
arg = del regex del flags
del = '/' | ',' | '-' | ...
flags = [ 'e' ] [ 'i' ] [ 'n' ]
macro = name '=' expr
FILES
/etc/milter-regex.conf
SEE ALSO
mailstats(1), regex(3), syslog(3), syslog.conf(5), re_format(7),
sendmail(8), syslogd(8)
Simple Mail Transfer Protocol, RFC 2821.
Enhanced Mail System Status Codes, RFC 1893.
HISTORY
The first version of milter-regex was written in 2003. Boolean
expression evaluation was added in 2004.
AUTHORS
Daniel Hartmeier <daniel@benzedrine.ch>
OpenBSD 6.1 September 24, 2003 OpenBSD 6.1
More examples
If you have interesting rules that work for you, you're very welcome to
contribute them.
HELO with your own IP address
From Christopher Kruslicky:
tempfail "Malformed HELO (can't be me)"
helo /^62\.65\.145\.30$/
Some spammers pick your own IP address as HELO, assuming it has a
better chance of getting accepted by you than a random IP address (or
some potentially non-resolving hostname).
Dynamic host addresses
From Darren Henderson:
# from your examples, tempfailing non-resolving rDNS connections
tempfail "Sender IP address not resolving"
connect /\[.*\]/ //
# reject things that look like they might come from a dynamic address
reject "Looks like a dynamic address"
connect /[0-9][0-9]*\-[0-9][0-9]*\-[0-9][0-9]*/ //
connect /[0-9][0-9]*\.[0-9][0-9]*\.[0-9][0-9]*/ //
connect /[0-9]{12}/e //
So, we reject anything that has three digit sets separated by dashes (e.g.
adsl-134-11-333-11.someisp.net). We reject anything that has three or more
numeric subdomains (e.g. dialup.123.45.67.8.someisp.net). And finally, we
reject any address that has a group of 12 digits (e.g.
pool123045067003.someisp.net).
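The first two expressions could also be written as extended regular expressions, where "+" replaces the repeated character class. This is only an alternative spelling under the e flag, which should match the same hostnames. It is not part of the contributed rules.

```
# Same intent as the basic-syntax rules above, using the e flag.
reject "Looks like a dynamic address"
connect /[0-9]+-[0-9]+-[0-9]+/e //
connect /[0-9]+\.[0-9]+\.[0-9]+/e //
```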
Forged Outlook headers
Analyzing the spam that still gets delivered (and then promptly
detected by SpamAssassin), I found that most of it uses fake Outlook
headers. So let's add a rule to detect that inline (blatantly stealing
rules from SpamAssassin ;).
HAS_MIMEOLE = header /^X-MimeOLE$/ //
HAS_MSMAIL_PRI = header /^X-MSMail-Priority$/ //
HAS_X_MAILER = header /^X-Mailer$/ //
HAS_OUTLOOK_IN_MAILER = header /^X-Mailer$/ /Microsoft (CDO|Outlook) /e
MISSING_OUTLOOK_NAME = ( $HAS_MIMEOLE or $HAS_MSMAIL_PRI ) and \
$HAS_X_MAILER and not $HAS_OUTLOOK_IN_MAILER
OUTLOOK_MUA = header /^X-Mailer$/ / Outlook /
OUTLOOK_MSGID_1 = header /^Message-ID$/ \
/^<[0-9a-f]{12}\$[0-9a-f]{8}\$[0-9a-f]{8}@>$/
OUTLOOK_MSGID_2 = header /^Message-ID$/ \
/^<[A-Za-z0-9-]{7}[A-Za-z0-9]{20}@hotmail\.com>$/
IMS_MSGID = header /^Message-ID$/ \
/^<[A-F]{36,40}@>$/
UNUSABLE_MSGID = header /^List-Unsubscribe$/ //
FORGED_MUA_OUTLOOK = $OUTLOOK_MUA and not ( $UNUSABLE_MSGID or \
$OUTLOOK_MSGID_1 or $OUTLOOK_MSGID_2 )
MSGID_OE_SPAM_4ZERO = header /^Message-ID$/ \
/<[a-f0-9]{12}\$[a-f0-9]{8}\$0000[a-f0-9]{4}@/
reject "Forged Outlook headers"
$MISSING_OUTLOOK_NAME or $FORGED_MUA_OUTLOOK or $MSGID_OE_SPAM_4ZERO
Some performance benchmarks would be interesting here. I'm quite
sure these rules evaluate much more cheaply inline in milter-regex than
in SpamAssassin (Perl) after accepting delivery, or in a milter plugin
using spamc. If you measure the maximum number of mails per second either of
these can handle on a specific machine, please let me know.
Sources
Makefiles for GNU/Linux and Solaris are included, but might need some tweaking. If they
don't work for you, please try to fix them and send me corrections.
Some patches to build under Linux (not supported by me).
History
3.0: April 23rd, 2022
Takao Abe added GeoIP filtering criteria, you can find his version on
github.com/milter-regex.
2.7: December 12th, 2019
Add -t option to test the configuration file and exit with a status,
suggested by Ralph Seichter.
2.6: April 26th, 2019
Treat socket file name without prefix like local file, from Takao Abe.
Make pid file writable only by root, from Ralph Seichter.
2.5: April 18th, 2019
Add -r option to write pid file. Based on FreeBSD port patches.
2.4: March 2nd, 2019
Add -f option to set syslog facility. Patch from Takao Abe.
2.3: January 28th, 2019
Bug fix: for actions followed by multiple expressions (not just one
arbitrarily complex expression), when multiple expressions become defined
during the same sequence point, but with different values (e.g. one true,
another false), depending on the expression order, the action might not be
taken, when it should be.
This affects all prior versions since 1.0. As a workaround, use only a single
expression per action (duplicating action lines where needed), or combine
multiple expressions to a single expression per action using 'or'.
Report and testing by JCA.
2.2: September 25, 2018
Add -U, -G, and -P options to set pipe user, group, and permissions.
Suggested and tested by Ralph Seichter.
2.1: September 26, 2017
Default maximum log level to 6 (LOG_INFO), i.e. exclude LOG_DEBUG.
2.0: November 25, 2013
Add -l option to specify maximum log level.
1.9: November 21, 2011
Add -j option to chroot. Improve building on various platforms. Fix some typos in documentation and example config.
1.8: August 12, 2010
Log symbolic host name together with numeric IP address.
1.7: August 4, 2007
Support filtering sendmail macros, like {auth_type}.
1.6: June 6, 2005
Support sendmail quarantine action. Requires non-ancient sendmail
(>= 8.13) and libmilter, as shipped with recent *BSD releases by default.
More fixes for the state machine, dealing with multi-message connections.
1.5: March 19, 2004
Fix logic errors in dealing with multi-message connections (SMTP RSET,
HELO or MAIL FROM resetting SMTP state). Add cb_abort callback.
1.4: March 13, 2004
Some performance improvements, abort rule evaluation immediately when
no further rules can possibly match. Compile without -Werror, as some
ports generate warnings.
1.3: March 8, 2004
Two bugfixes related to RCPT TO: rule evaluation (DSN options and multiple recipients
would match incorrectly), umask(0177) for pipe, fix for Solaris daemon() implementation.
Improved logging (From:, To: and Subject: headers, when available).
1.2: February 27, 2004
Some logging improvements and small fixes. Adds Makefiles for GNU/Linux
and Solaris. Thanks to everyone who helped me solve the build problems.
1.1: February 25, 2004
Support macro definition/expansion.
1.0: February 24, 2004
Now supports boolean expressions, so multiple regular expressions can
be combined using and, or, not and parentheses.
Note that the new parser now requires quotes around reject/tempfail messages.
If you get syntax errors in your existing configuration file, lacking quotes
are a likely cause. Otherwise rulesets are backwards compatible with pre-1.0
versions.
0.1: September 24, 2003
First version.
Related links
- sendmail and milter
- Regular expressions
- OpenBSD
- SMTP
- RFC 821 Simple Mail Transfer Protocol (SMTP)
- RFC 1893 Enhanced Mail System Status Codes
- Other milter plugins (multiple plugins can be chained)