status report (2)
Finished the two new mod_perl modules to keep spiders from killing us.
Both support a 'friends list' of IPs so we can run spiders on our own
site without being blocked ourselves.
conf/good_ips.txt is a list of IPs or networks (one regexp per line)
matching hosts we don't want to block even if they make too many
requests/min or have a known unfriendly agent name. This is mostly to let
our own htdig spider do its job without being denied itself. good_ips.txt
applies to both BlockAgent.pm and SpeedLimit.pm and is auto-magically
re-read whenever the file's timestamp changes.
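
As an example, a good_ips.txt might look like this (one Perl regexp per
line; these addresses are made up for illustration, not our real list):

^127\.0\.0\.1$
^10\.1\.2\.
^192\.168\.0\.(12|13)$

Anchoring only the network prefix, as in the second entry, lets one line
cover a whole internal network.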
BlockAgent.pm also uses conf/bad_agents.txt to list any HTTP agents we
want to stop, mostly email-gathering and mirroring agents.
bad_agents.txt is likewise one regexp per line and is auto-magically
re-read whenever the file's timestamp changes.
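
An illustrative bad_agents.txt (again one regexp per line; the agent
names here are just examples, and details like case sensitivity are up
to BlockAgent.pm):

^EmailSiphon
^EmailCollector
^WebZIP
^Teleport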
Below are the httpd.conf statements, which should explain a bit about how
things work.
<Location />
    PerlAccessHandler Apache::BlockAgent
    # bad HTTP agents to block
    PerlSetVar BlockAgentFile conf/bad_agents.txt
    # friendly IPs to exclude
    PerlSetVar AllowedIPFile conf/good_ips.txt
</Location>

<Location />
    PerlAccessHandler Apache::SpeedLimit
    # friendly IPs to exclude
    PerlSetVar AllowedIPFile conf/good_ips.txt
    # max 20 hits/min
    PerlSetVar SpeedLimit 20
    # take 5 samples before enforcing the limit
    PerlSetVar SpeedSamples 5
    # amnesty after 30 min
    PerlSetVar SpeedForgive 30
</Location>
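
For anyone curious how the friends-list plumbing works, here's a minimal
sketch of the shared logic: re-read the file only when its mtime changes,
then match the client IP against each pattern. The package name
Apache::FriendlyIPs and the sub names are my illustration, not the actual
internals of the two modules:

package Apache::FriendlyIPs;    # hypothetical name, for illustration
use strict;

my ($cached_mtime, @patterns);

# Reload the pattern file, but only when its timestamp has changed.
sub load_patterns {
    my $file = shift;
    my $mtime = (stat $file)[9] or return;
    return if defined $cached_mtime && $mtime == $cached_mtime;
    open FH, $file or return;
    @patterns = ();
    while (<FH>) {
        chomp;
        next if /^\s*(#|$)/;      # skip blanks and comments
        push @patterns, qr/$_/;   # one regexp per line
    }
    close FH;
    $cached_mtime = $mtime;
}

# True if the client matches any entry in the allowed-IP file.
sub is_friend {
    my $r = shift;
    my $file = $r->server_root_relative($r->dir_config('AllowedIPFile'));
    load_patterns($file);
    my $ip = $r->connection->remote_ip;
    foreach my $pat (@patterns) {
        return 1 if $ip =~ $pat;
    }
    return 0;
}

1;

Each httpd child caches the patterns separately, so touching the file is
picked up by each child on its next request. A handler would just bail
out early (return OK) when is_friend($r) is true, before doing its agent
or speed checks.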
--
Aaron Turner, Core Developer http://vodka.linuxkb.org/~aturner/
Linux Knowledge Base Organization http://linuxkb.org/
Because world domination requires quality open documentation.