Notes to self on Dealing with Spam

I'm a masochist and maintain my own Linux server, with Sendmail as the main source of pain.

For a long time I simply wasn't getting spam: my domain wasn't on anyone's radar, nor was it easily guessed.

Those blissful times are long gone and now I get dozens of spams a day.

So I finally decided to do something about it and looked into what Sendmail can do for me. I settled on using its access db. Here, for posterity, is what I am currently doing to maintain that db:

Install cidrexpand.pl from Derek Balling. I had to install some Perl modules for this to work. Net::CIDR, I think it was.

Make a little script, which I call spamcatch, and put it in my path:

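#!/bin/sh
# spamcatch: read one message on stdin (e.g. piped from Mutt) and append the
# bracketed IPs from Received: lines within 3 lines of "From " to the harvest file.
SPAMFILE=$(mktemp /tmp/spam.XXXXXX) || exit 1   # safe temp file instead of /tmp/spam$$
trap 'rm -f "$SPAMFILE"' EXIT                    # clean up however we exit
cat - > "$SPAMFILE"

IP='[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}'
grep -E -A 3 '^From ' "$SPAMFILE" |
  grep -E '^Received: ' |
  grep -Eo "\[$IP\]" |
  grep -Eo "$IP" >> /tmp/spam_ip_harvest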

Then, in Mutt, when the cursor is on a spam email, I pipe it to spamcatch (pipe-message, bound to | by default), then delete the email. This step is pretty easy and a big improvement over what I'd been doing.

This collects IP addresses in

/tmp/spam_ip_harvest
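To check what the extraction picks up without going through Mutt, the same grep pipeline can be wrapped in a function that prints instead of appending, and fed a hand-made message. harvest_ips is a made-up name and the message is invented, using documentation addresses:

```shell
#!/bin/sh
# Hypothetical test harness: the same grep pipeline as spamcatch, but it
# reads a message on stdin and prints the IPs instead of appending them.
harvest_ips() {
  IP='[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}'
  grep -E -A 3 '^From ' |
    grep -E '^Received: ' |
    grep -Eo "\[$IP\]" |
    grep -Eo "$IP"
}

# A made-up message using documentation addresses (RFC 5737).
printf '%s\n' \
  'From spammer@example.com Fri Jan  9 10:00:00 2015' \
  'Received: from mail.example.com (mail.example.com [192.0.2.10])' \
  '        by myhost.example.org with ESMTP id t09A0abc' \
  'X-Originating-IP: [203.0.113.5]' \
  'Subject: cheap stuff' |
  harvest_ips
```

This prints 192.0.2.10 only: the X-Originating-IP address is skipped because only Received: lines are searched. It also shows the limit of the -A 3 window: a Received: header more than three lines below the From line is never looked at.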

Now do

sort -n /tmp/spam_ip_harvest | uniq  > spam_ip

to remove duplicates and

for ip in `cat spam_ip`; do echo \# $ip; whois -h whois.lacnic.net $ip ; done >> spam_whois

to look up the info that we will use to filter.
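The sort -n | uniq step removes duplicates fine, but sort -n doesn't order dotted addresses octet by octet: GNU sort reads just one leading number and then falls back to comparing whole lines as text, so 198.51.100.10 lands before 198.51.100.9. If a tidier list matters, sorting each dot-separated field numerically fixes that, and -u drops duplicates in the same pass. A sketch; sort_ips is a hypothetical name:

```shell
#!/bin/sh
# Sort IPv4 addresses field by field (numerically) and drop duplicates.
# -t . splits on dots; each -kN,Nn compares one octet as a number;
# -u keeps only the first of lines whose four octets are all equal.
sort_ips() {
  sort -t . -k1,1n -k2,2n -k3,3n -k4,4n -u
}

printf '%s\n' 198.51.100.9 192.0.2.10 198.51.100.10 192.0.2.10 | sort_ips
```

That prints 192.0.2.10, 198.51.100.9 and 198.51.100.10, one per line. In the real workflow it would be sort_ips < /tmp/spam_ip_harvest > spam_ip.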

Most of the whois response is not what we're looking for, so filter out much of the junk with:

grep -E "^##* [0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}|^CIDR|^cidr|^Connect|^route|^inetnum|^parent" spam_whois > spam_whois2

This is the tedious part. For now we have to edit spam_whois2 by hand to get it into the format the Sendmail access db expects:

  1. Remove (or re-lookup with whois) entries with no route/cidr/etc.
  2. Put entries in CIDR format:
    • Normalize entries like 123.45/16 to 123.45.0.0/16
    • Normalize entries like 123.45.0.0 - 123.45.255.255 to 123.45.0.0/16
  3. Replace CIDR:, route:, inetnum:, etc. with Connect: (no space between the colon and the address)
  4. Append REJECT to each Connect: line, so every entry reads like Connect:123.45.0.0/16 REJECT
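Steps 2 to 4 are mechanical enough to script, which is part of the To Do below. Here is a minimal sketch in awk wrapped in a shell function, assuming the whois lines look like the route:, CIDR: and inetnum: examples in the comment. to_access is a hypothetical name, it only takes the first block from a comma-separated CIDR: line, and step 1 (re-looking up entries with no route) is still manual:

```shell
#!/bin/sh
# Hypothetical helper (not part of sendmail): convert whois lines such as
#   route:   198.51.100.0/24
#   CIDR:    203.0/16
#   inetnum: 198.51.100.0 - 198.51.100.255
# into access entries "Connect:<net>/<bits><TAB>REJECT". Lines it can't
# convert (the "# ip" markers, ranges that aren't one CIDR block) are
# dropped, and LACNIC "parent:" lines are skipped as too broad.
to_access() {
  awk '
    function ip2n(s,   o) {
      split(s, o, ".")
      return ((o[1] * 256 + o[2]) * 256 + o[3]) * 256 + o[4]
    }
    function n2ip(n) {
      return sprintf("%d.%d.%d.%d", int(n / 16777216), int(n / 65536) % 256,
                     int(n / 256) % 256, n % 256)
    }
    # Pad an abbreviated prefix such as 203.0 out to four octets.
    function pad(s,   o, k) {
      k = split(s, o, ".")
      while (k++ < 4) s = s ".0"
      return s
    }
    /^parent:/ { next }
    {
      sub(/^[^:]*:[ \t]*/, "")                # strip the "route:" etc. label
      if (match($0, /^[0-9.]+\/[0-9]+/)) {    # CIDR, possibly abbreviated
        split(substr($0, 1, RLENGTH), p, "/")
        print "Connect:" pad(p[1]) "/" p[2] "\tREJECT"
      } else if ($2 == "-") {                 # range "first - last"
        lo = ip2n($1); size = ip2n($3) - lo + 1; bits = 32
        # The range is one CIDR block only if its size is a power of two
        # and it starts on a multiple of that size.
        while (size > 1 && size % 2 == 0) { size /= 2; bits-- }
        if (size == 1 && lo % (2 ^ (32 - bits)) == 0)
          print "Connect:" n2ip(lo) "/" bits "\tREJECT"
      }
    }'
}
```

Usage would be to_access < spam_whois2 > new_entries, then eyeball new_entries before appending it to access.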

For the rest we become root:

Copy /etc/mail/access and /etc/mail/access2 to /root/

Add new entries:

cat spam_whois2 >> access
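Appending blindly means the same network piles up in access every time a spammer comes back. A sketch of an append that skips lines already in access (exact whole-line match) and repeats within the new batch; append_new is a made-up name:

```shell
#!/bin/sh
# Hypothetical helper: append the lines of NEW to ACCESS, skipping lines
# already present in ACCESS (exact whole-line match) and repeats within NEW.
append_new() {  # usage: append_new spam_whois2 access
  new=$1 access=$2
  touch "$access"   # grep -f needs the pattern file to exist
  # awk drops repeats inside NEW; grep -vxF -f prints only lines that are not
  # an exact copy of a line in ACCESS. grep reads its -f patterns before any
  # input, so appending to that same file here is safe.
  awk '!seen[$0]++' "$new" | grep -vxF -f "$access" >> "$access"
  return 0          # grep exits 1 when nothing new was found; that's fine
}
```

In the workflow above it would replace the cat line: append_new spam_whois2 access.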

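Sendmail's access db doesn't understand CIDR masks; it matches addresses on whole-octet boundaries (Connect:123.45 covers all of 123.45.*). So expand the CIDR entries into that form: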

cidrexpand access > access2
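To make the expansion concrete, here is an illustration of the idea, not cidrexpand's actual code: an aligned prefix between /16 and /24 becomes one three-octet Connect: key per /24 it contains. expand_cidr is a hypothetical name and it assumes the network address is aligned to its prefix:

```shell
#!/bin/sh
# Illustration only, not cidrexpand's code: expand an aligned prefix between
# /16 and /24 into one three-octet Connect: key per /24, a form the access db
# can match.
expand_cidr() {  # usage: expand_cidr 198.51.100.0/23 REJECT
  net=${1%/*} bits=${1#*/} action=$2
  if [ "$bits" -lt 16 ] || [ "$bits" -gt 24 ]; then
    echo "expand_cidr: only /16 to /24 handled" >&2
    return 1
  fi
  # Split the dotted network address into its first three octets.
  a=${net%%.*}; rest=${net#*.}
  b=${rest%%.*}; rest=${rest#*.}
  c=${rest%%.*}
  # A /bits prefix spans 2^(24 - bits) consecutive /24s.
  i=0
  while [ "$i" -lt $((1 << (24 - bits))) ]; do
    printf 'Connect:%s.%s.%s\t%s\n' "$a" "$b" "$((c + i))" "$action"
    i=$((i + 1))
  done
}

expand_cidr 198.51.100.0/23 REJECT
```

That prints Connect:198.51.100 and Connect:198.51.101, each followed by a tab and REJECT. For a /16 this sketch prints 256 keys, although a single two-octet key such as Connect:198.51 would match the same addresses.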

Finally, we update the access db. Note that the .db extension on the output file is implied; the following command actually writes to /etc/mail/access.db:

makemap -r hash /etc/mail/access < access2

Restart sendmail:

/etc/rc.d/rc.sendmail restart

To Do

I'd like to make that whois step come up with the format I need (CIDR record) but I don't know how to force that.

Ideally, the command I pipe spam to from Mutt would do the whole thing: pull out the IP the spam came from, find its CIDR record, expand that to Sendmail access db format, append it to the current list, and rebuild the hash file. Ideally it would also handle dupes and overlapping ranges. I think I saw that cidrexpand has some ability to deal with overlapping ranges, so that part is promising.

[I know I can combine many of the individual steps described above into a single script and will do so. This is just a first airing of my dirty laundry so I can remember what to do next 'cause it's still too much to do every day.]

Effectiveness

What I'm most worried about now is blocking legit email. The only way to know with this approach is to scan sendmail's log file, /var/adm/maillog. This is a pain in the ass because you have to be privileged to read it, and I haven't gotten to the point where I can casually see what's going on. A little more Google time wouldn't hurt, I guess.
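One way to make the log less of a chore is to pull out only the rejections and count them per client IP, so the list can be checked for legitimate senders. A sketch, assuming connect-time access db rejections are logged with ruleset=check_relay, an arg2= field holding the client IP, and a reject= field; check that against a real rejection line in your own maillog before trusting it. rejected_ips is a made-up name:

```shell
#!/bin/sh
# Hypothetical helper: count connect-time rejections per client IP.
# Assumes log lines of the form
#   ... ruleset=check_relay, arg1=..., arg2=<ip>, relay=..., reject=550 ...
rejected_ips() {  # usage: rejected_ips /var/adm/maillog
  grep 'ruleset=check_relay' "$1" | grep 'reject=' |
    sed -n 's/.*arg2=\([0-9.]*\),.*/\1/p' |   # keep just the IP after arg2=
    sort | uniq -c | sort -rn                  # most-rejected first
}
```

Run as root (the log isn't world-readable), then look up anything in the output that doesn't look like a spammer.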

changed January 9, 2015