Notes to self on Dealing with Spam
I'm a masochist and maintain my own Linux server, with Sendmail the main source of pain.
For a long time I simply wasn't getting spam: my domain is not on the radar, nor easily guessed.
Those blissful times are long gone and now I get dozens of spams a day.
So I finally decided to do something about it and looked into what sendmail can do for me. I settled on using its access db. Here, for posterity, is what I am currently doing to maintain that db:
Install cidrexpand.pl from Derek Balling. I had to install some Perl modules for this to work. Net::CIDR, I think it was.
Make a little script, which I call spamcatch, and put it in my path:
#!/bin/sh
# Save the message from stdin, then harvest the bracketed IP from any
# Received: header within three lines of the mbox "From " line.
SPAMFILE=$(mktemp /tmp/spam.XXXXXX) || exit 1
cat - > "$SPAMFILE"
egrep --after-context=3 "^From " "$SPAMFILE" | egrep "^Received: " | egrep -o "\[([0-9]{1,3}\.){3}[0-9]{1,3}\]" | egrep -o "([0-9]{1,3}\.){3}[0-9]{1,3}" >> /tmp/spam_ip_harvest
rm "$SPAMFILE"
Then, in Mutt, when the cursor is on a spam email, I pipe it to spamcatch, then delete the email. This step is pretty easy and a big improvement over what I'd been doing.
This collects IP addresses in
/tmp/spam_ip_harvest
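To see what would be harvested from a message before anything is appended to the harvest file, the same extraction can be wrapped in a function that only prints. This is a sketch, not a replacement for spamcatch: extract_relay_ips is a made-up name, and it assumes a grep that supports -A and -o (GNU grep does).

```shell
# extract_relay_ips: hypothetical helper. Reads one message on stdin and
# prints the bracketed IPs found in Received: headers that sit within three
# lines of the mbox "From " line. Nothing is written to disk.
# Assumes a grep with -A and -o (GNU grep has both).
extract_relay_ips() {
    grep -E -A 3 '^From ' \
        | grep -E '^Received: ' \
        | grep -E -o '\[([0-9]{1,3}\.){3}[0-9]{1,3}\]' \
        | tr -d '[]'
}
```

Piping a saved message into it (extract_relay_ips < some_saved_spam) shows the address that would land in the harvest file.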
Now do
sort -n /tmp/spam_ip_harvest | uniq > spam_ip
to remove duplicates and
for ip in `cat spam_ip`; do echo \# $ip; whois -h whois.lacnic.net $ip ; done >> spam_whois
to look up the info that we will use to filter.
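The de-duplication and the lookup loop can be folded into one function. A minimal sketch, assuming the same whois client and server as above; lookup_ips is a hypothetical name, and the optional first argument (a different whois server) is an addition of this sketch.

```shell
# lookup_ips: hypothetical wrapper around the two commands above.
# Reads one IP per line on stdin, drops duplicates, and prints a "# <ip>"
# marker followed by the whois response for each address.
# $1 (optional) is the whois server; it defaults to the one used above.
lookup_ips() {
    server=${1:-whois.lacnic.net}
    sort -u | while read -r ip; do
        [ -n "$ip" ] || continue            # skip blank lines
        echo "# $ip"
        # </dev/null keeps whois from swallowing the rest of the IP list
        whois -h "$server" "$ip" </dev/null
    done
}
```

Usage would be something like: lookup_ips < /tmp/spam_ip_harvest >> spam_whois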
Most of the whois response is not what we're looking for, so filter out much of the junk with:
egrep "^##* [0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}|^CIDR|^cidr|^Connect|^route|^inetnum|^parent" spam_whois > spam_whois2
This is the tedious part. For now we have to manually edit spam_whois2 to get to the Sendmail db format:
- Remove (or re-lookup with whois) entries with no route/cidr/etc.
- Put entries in CIDR format:
  - Normalize entries like 123.45/16 to 123.45.0.0/16
  - Normalize entries like 123.45.0.0 - 123.45.255.255 to 123.45.0.0/16
- Replace CIDR:, route:, inetnum:, etc. with Connect:
- Append REJECT after all the Connect: lines
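The two CIDR normalizations in the list above are mechanical enough to script. What follows is a sketch under assumptions, not a tested tool: the function names are invented here, the shell is assumed to do 64-bit arithmetic (bash and dash on current Linux do), and a range is only converted when it is exactly one CIDR block. Anything else is reported and left for hand editing.

```shell
# pad_cidr: hypothetical helper. 123.45/16 -> 123.45.0.0/16
pad_cidr() {
    addr=${1%/*}
    len=${1#*/}
    case $addr in
        *.*.*.*) ;;                         # already four octets
        *.*.*)   addr=$addr.0 ;;
        *.*)     addr=$addr.0.0 ;;
        *)       addr=$addr.0.0.0 ;;
    esac
    echo "$addr/$len"
}

# ip_to_int: dotted quad -> integer. Octets with leading zeros would be
# read as octal, so this assumes plain decimal octets.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1                               # split the address on dots
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# range_to_cidr: hypothetical helper. "123.45.0.0 123.45.255.255" ->
# 123.45.0.0/16. Fails (status 1) unless the range is a single block.
range_to_cidr() {
    start=$(ip_to_int "$1")
    end=$(ip_to_int "$2")
    size=$(( end - start + 1 ))
    bits=32
    n=1
    while [ "$n" -lt "$size" ]; do
        n=$(( n * 2 ))
        bits=$(( bits - 1 ))
    done
    # One CIDR block means: a power-of-two size, starting on its own boundary.
    if [ "$n" -ne "$size" ] || [ $(( start % size )) -ne 0 ]; then
        echo "range $1 - $2 is not a single CIDR block" >&2
        return 1
    fi
    echo "$1/$bits"
}
```

For example, range_to_cidr 123.45.0.0 123.45.255.255 prints 123.45.0.0/16, while a range such as 10.0.0.1 to 10.0.0.6 is rejected because it would need several blocks.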
For the rest we become root:
Copy /etc/mail/access and /etc/mail/access2 to /root/
Add new entries:
cat spam_whois2 >> access
Expand CIDR entries into mail access.db format:
cidrexpand < access > access2
Finally, we update the access db. Note that the .db extension on the output file is implied. The following command actually writes to /etc/mail/access.db:
makemap hash /etc/mail/access < access2
Makemap may report errors like missing commands or duplicate entries, so you may have to re-edit access and re-issue the cidrexpand command until you have cleared these up.
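Duplicate keys are the easiest of those complaints to clear automatically. A sketch, assuming the key is the first whitespace-delimited field of each line and that keys should be compared case-insensitively (makemap folds keys to lower case unless told otherwise); dedupe_access is a made-up name. Keeping the first occurrence is an arbitrary choice, so it is worth checking that a dropped line did not carry a different action.

```shell
# dedupe_access: hypothetical filter for access-file lines on stdin.
# Keeps the first line seen for each key (first field, compared
# case-insensitively) and passes comments and blank lines through untouched.
dedupe_access() {
    awk '/^[ \t]*(#|$)/ { print; next }
         !seen[tolower($1)]++'
}
```

It could sit between the two commands above, for example: cidrexpand < access | dedupe_access > access2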
To Do
I'd like to make that whois step come up with the format I need (CIDR record) but I don't know how to force that.
Ideally the command to which I pipe the spam from mutt would do the whole thing - pull out the IP from which the spam came, find its CIDR record, expand that to sendmail access db format, append to the current list, and rebuild the hash file. Ideally it would handle dupes and overlapping. I think I saw that cidrexpand has some ability to deal with overlapping ranges, so that part is promising.
[I know I can combine many of the individual steps described above into a single script and will do so. This is just a first airing of my dirty laundry so I can remember what to do next 'cause it's still too much to do every day.]
Effectiveness
What I'm most worried about now is that I'm blocking legit email. The only way to know with this approach is to scan sendmail's log file, /var/adm/maillog. This is a pain in the ass because you have to be privileged to read it, and I haven't gotten to the point where I can casually see what's going on. A little more Google time wouldn't hurt, I guess.
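One way to make the log scan less painful is to boil the rejections down to a count per sending relay. This is only a sketch: it assumes rejected connections are logged on a single line containing relay=..., reject=... (which is how sendmail logs I have seen look, but check your own), and rejected_relays is a hypothetical name.

```shell
# rejected_relays: hypothetical summary of sendmail rejections.
# Reads log lines on stdin and prints "<count> <relay>" for every relay
# that appears on a line containing "reject=", most frequent first.
# Assumes the log line has the shape "... relay=<relay>, reject=<reason>".
rejected_relays() {
    grep 'reject=' \
        | sed -n 's/.*relay=\(.*\), reject=.*/\1/p' \
        | awk '{ count[$0]++ } END { for (r in count) print count[r], r }' \
        | sort -rn
}
```

Run as root, something like rejected_relays < /var/adm/maillog gives a short list to eyeball for senders that should not have been blocked.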