2.6. Message data checks
The time has come to look at the content of the message itself.
This is what conventional spam and virus scanners do, as they
normally operate on the message after it has been accepted.
However, in our case, we perform these checks
before issuing the final
250 response, so that we have a chance to
reject the mail on the spot rather than later generating Collateral Spam.
If your incoming mail exchangers are very busy (i.e. large site,
few machines), you may find that performing some or all of these
checks directly in the mail exchanger is too costly. In
particular, running Virus Scanners and Spam Scanners takes up a fair amount of CPU
time.
If so, you will want to set up dedicated machines for these
scanning operations. Most server-side anti-spam and anti-virus
software can be invoked over the network, i.e. from your mail
exchanger. More on this in the following chapters, where we
discuss implementation for the various MTAs.
2.6.1. Header checks
2.6.1.1. Missing Header Lines
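Strictly speaking, RFC 2822 requires only the Date: and From:
header lines, and says that Message-ID: should be present. In
practice, however, mainstream Mail User Agents generate at least
the following header lines: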
From: ...
To: ...
Subject: ...
Message-ID: ...
Date: ...
The absence of any of these lines means that the message
is not generated by a mainstream Mail User Agent, and
that it is probably junk.
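As a minimal sketch of this check, assuming the message is available as a string and using Python's standard email module (the function name missing_headers is my own, not part of any MTA):

```python
from email import message_from_string

# Header lines that the text above expects from a mainstream Mail User Agent.
EXPECTED_HEADERS = ("From", "To", "Subject", "Message-ID", "Date")


def missing_headers(raw_message):
    """Return the expected header names that are absent from the message."""
    msg = message_from_string(raw_message)
    # Header lookups are case-insensitive; an absent header yields None.
    return [name for name in EXPECTED_HEADERS if msg[name] is None]
```

A non-empty result would be grounds for rejecting the message, or for adding to its junk score if you prefer a softer policy.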
2.6.1.2. Header Address Syntax Check
Addresses presented in the message header (i.e. the
To:, Cc:,
From: ... fields) should be syntactically
valid. Enough said.
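For illustration, here is one way such a syntax check might look in Python. The regular expression is a deliberately simplified assumption of mine; it does not implement the full RFC 2822 address grammar (quoted local parts, for instance, are rejected).

```python
import re
from email import message_from_string
from email.utils import getaddresses

# Simplified pattern (an assumption for illustration): local-part@domain,
# where the domain consists of at least two dot-separated labels.
ADDRESS_RE = re.compile(
    r"^[A-Za-z0-9.!#$%&'*+/=?^_`{|}~-]+"
    r"@[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)+$"
)
ADDRESS_HEADERS = ("From", "To", "Cc")


def invalid_header_addresses(raw_message):
    """Return (header name, address) pairs that fail the syntax pattern."""
    msg = message_from_string(raw_message)
    bad = []
    for name in ADDRESS_HEADERS:
        # A header may occur more than once and may hold several addresses.
        for _display_name, addr in getaddresses(msg.get_all(name, [])):
            if not ADDRESS_RE.match(addr):
                bad.append((name, addr))
    return bad
```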
2.6.1.3. Simple Header Address Validation
For each address in the message header:
If the address is local, is the local
part (before the @ sign) a valid mailbox?
If the address is remote, does the domain
part (after the @ sign) exist?
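The two questions above can be sketched roughly as follows. LOCAL_DOMAINS and LOCAL_MAILBOXES are hypothetical stand-ins for whatever your mail system uses to look up local users. Python's standard library has no MX lookup, so plain address resolution stands in for "does the domain exist"; a real check would query DNS for MX records first.

```python
import socket

# Hypothetical local configuration, for illustration only.
LOCAL_DOMAINS = {"example.com"}
LOCAL_MAILBOXES = {"postmaster", "alice"}


def domain_exists(domain):
    """Crude existence test: does the name resolve to any address?"""
    try:
        socket.getaddrinfo(domain, None)
        return True
    except socket.gaierror:
        return False


def header_address_ok(address, resolve=domain_exists):
    """Validate one header address; assumes its syntax was already checked."""
    local_part, _, domain = address.rpartition("@")
    if not local_part or not domain:
        return False
    domain = domain.lower()
    if domain in LOCAL_DOMAINS:
        # Local address: the part before the @ sign must be a known mailbox.
        return local_part.lower() in LOCAL_MAILBOXES
    # Remote address: the part after the @ sign must exist.
    return resolve(domain)
```

The resolve parameter exists so that the DNS lookup can be replaced, for example with a caching resolver or a stub in tests.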
2.6.2. Junk Mail Signature Repositories
One trait of junk mail is that it is sent to a large number of
addresses. If 50 other recipients have already flagged a
particular message as spam, why couldn't you use this fact to
decide whether or not to accept the message when it is
delivered to you? Better yet, why not set up Spam Traps that feed a public pool of known spam?
I am glad you asked. As it turns out, such pools do exist:
These tools have progressed beyond simple signature checks
that only trigger if you receive an identical copy of a
message that is known to be junk mail. Rather, they evaluate
common patterns, to account for slight variations in the
message header and body.
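To give a feel for how a signature can survive slight variations, here is a toy sketch of my own. It is not the algorithm used by any actual repository; real tools use considerably more sophisticated techniques. The idea is simply to normalize away the details that spammers tend to vary before computing a digest.

```python
import hashlib
import re


def fuzzy_signature(body):
    """Digest of the body after removing details that tend to vary."""
    text = body.lower()
    # Collapse links, which often carry a per-recipient tracking code.
    text = re.sub(r"https?://\S+", "url", text)
    # Replace digits, punctuation and runs of whitespace with one space.
    text = re.sub(r"[^a-z]+", " ", text)
    return hashlib.sha256(text.strip().encode("ascii")).hexdigest()


# A hypothetical pool of signatures for messages already flagged as spam.
known_spam = {fuzzy_signature("Buy cheap pills NOW!!! http://example.com/?id=123")}


def is_known_spam(body):
    return fuzzy_signature(body) in known_spam
```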
2.6.3. Binary garbage checks
Messages containing non-printable characters are rare. When
they do show up, the message is nearly always a virus, or in
some cases spam written in a non-western language, without the
appropriate MIME encoding.
One particular case is where the message contains NUL
characters (ordinal zero). Even if you decide that figuring
out what a non-printable character means
is more complex than beneficial, you might consider checking
for this character. That is because some Mail Delivery Agents, such as the Cyrus Mail Suite,
will ultimately reject mails that contain it.
If you use such software, you should definitely consider
getting rid of NUL characters.
On the other hand, the (now obsolete) RFC 822 specification
did not explicitly prohibit NUL characters in the message.
For this reason, as an alternative to rejecting mails
containing them, you may choose to strip these characters from
the message before delivering it to Cyrus.
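Both options are trivial once you have the raw message as bytes. A minimal sketch (the function names are mine):

```python
def contains_nul(raw):
    """True if the raw message (bytes) contains a NUL character."""
    return b"\x00" in raw


def strip_nul(raw):
    """Return the raw message (bytes) with all NUL characters removed."""
    return raw.replace(b"\x00", b"")
```

Use contains_nul if you decide to reject such messages, or strip_nul if you would rather clean them up before delivery.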
2.6.4. MIME checks
Similarly, it might be worthwhile to validate the MIME
structure of incoming messages. MIME decoding errors or
inconsistencies do not happen very often; but when they do,
the message is definitely junk. Moreover, such errors may
indicate potential problems in subsequent checks, such as
File Attachment Checks, Virus Scanners,
or Spam Scanners.
In other words, if the MIME encoding is illegal, reject the
message.
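As a rough illustration, Python's standard email parser records structural problems it encounters as "defects" on each message part. The sketch below collects them; treating any defect as grounds for rejection is my assumption, and the parser is lenient, so it will not catch every inconsistency that a dedicated MIME validator would.

```python
from email import message_from_bytes


def mime_defects(raw):
    """Collect the defects the parser recorded for any part of the message."""
    msg = message_from_bytes(raw)
    defects = []
    # walk() visits the top-level message and every nested MIME part.
    for part in msg.walk():
        defects.extend(part.defects)
    return defects
```

For example, a message that claims to be multipart/mixed but declares no boundary yields a non-empty list.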
2.6.5. File Attachment Check
When was the last time someone sent you a Windows screensaver
(".scr" file) or Windows Program Information File
(".pif") that you actually wanted?
Consider blocking messages with "Windows
executable" file attachment(s) - i.e. file names that
end with a period followed by any of a number of three-letter
combinations such as the above. This check consumes
significantly fewer resources on your server than Virus Scanners, and may also catch new viruses for
which a signature does not yet exist in your anti-virus
scanner.
For a more-or-less comprehensive list of such "file name
extensions", please visit: http://support.microsoft.com/default.aspx?scid=kb;EN-US;290497.
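A minimal sketch of such a file name check, again using Python's standard email module. Only ".scr" and ".pif" come from the text above; the other entries in BLOCKED_EXTENSIONS are examples I picked, and you would want to build your own list from a reference such as the one just mentioned.

```python
import os
from email import message_from_bytes

# ".scr" and ".pif" are from the text; the rest are illustrative assumptions.
BLOCKED_EXTENSIONS = {".scr", ".pif", ".exe", ".bat", ".com"}


def blocked_attachments(raw):
    """Return the attachment file names whose extension is on the block list."""
    msg = message_from_bytes(raw)
    hits = []
    for part in msg.walk():
        filename = part.get_filename()
        if not filename:
            continue  # this part carries no file name
        extension = os.path.splitext(filename.strip())[1].lower()
        if extension in BLOCKED_EXTENSIONS:
            hits.append(filename)
    return hits
```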
2.6.6. Virus Scanners
A number of different server-side virus scanners are
available. To name a few:
In situations where you are not willing to block all
potentially dangerous files based on their file names alone
(consider ".zip" files), such scanners are
helpful. Also, they will be able to catch viruses that are
not transmitted as file attachments, such as the
"Bagle.R" virus that arrived in March 2004.
In most cases, the machine performing the virus scan does not
need to be your mail exchanger. Most of these anti-virus
scanners can be invoked on a different host over a network
connection.
Anti-virus software mainly detects viruses based on a set of
signatures for known viruses, or virus
definitions. These need to be updated regularly,
as new viruses are developed. Also, the software itself
should be kept up to date at all times for maximum accuracy.
2.6.7. Spam Scanners
Similarly, anti-spam software can be used to classify messages
based on a large set of heuristics, including their content,
standards compliance, and various network checks such as DNS Blacklists and Junk Mail Signature Repositories. In the end,
such software typically assigns a composite
"score" to each message, indicating the
likelihood that the message is spam, and if the score is above
a certain threshold, classifies it as such.
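The scoring idea can be sketched in a few lines. Everything here is hypothetical: the rules, their weights, the threshold, and the shape of the msg dictionary are invented for illustration and do not reflect the rule set of any actual filter.

```python
import re

SPAM_THRESHOLD = 5.0  # hypothetical cut-off


def looks_obfuscated(body):
    """True if some word mixes letters and digits, as in "GR0W"."""
    return re.search(r"\b[A-Za-z]+\d+[A-Za-z0-9]*\b", body) is not None


# Each rule is a (weight, test) pair; the weights are made up.
RULES = (
    (2.0, lambda msg: "Message-ID" not in msg["headers"]),
    (3.5, lambda msg: msg["dnsbl_listed"]),
    (1.5, lambda msg: looks_obfuscated(msg["body"])),
)


def spam_score(msg):
    """Composite score: the sum of the weights of all rules that match."""
    return sum(weight for weight, test in RULES if test(msg))


def is_spam(msg):
    return spam_score(msg) > SPAM_THRESHOLD
```

Here msg is assumed to be a dictionary with the keys "headers" (a mapping of header names to values), "dnsbl_listed" (the result of an earlier DNS blacklist lookup) and "body" (the message text).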
Two of the most popular server-side heuristic anti-spam
filters are:
These tools undergo a constant evolution as spammers find ways
to circumvent their various checks. For instance, consider
"creative" spelling, such as "GR0W lO
1NCH35". So, just like anti-virus software, if you use
anti-spam software, you should update it frequently for the
highest level of accuracy.
I use SpamAssassin, although to minimize impact on machine
resources, it is no longer my first line of defense. Out of
approximately 500 junk mail delivery attempts to my personal
address per day, about 50 reach the point where they are
checked by SpamAssassin (mainly because they are forwarded
from one of my other accounts, so the checks described above
are not effective). Out of these 50 messages, one message
ends up in my inbox approximately every 2 or 3 days.