
status report (2)




Finished the two new mod_perl modules that keep spiders from killing us.
Both support a 'friends list' of IPs so we can run spiders against our
own site without blocking ourselves.

conf/good_ips.txt is a list of IPs or networks (as regexps) matching
hosts we don't want to kill even if they make too many requests/min or
have a known unfriendly agent name.  This is mostly to let our own htdig
spider do its job without being denied itself.  good_ips.txt applies to
both BlockAgent.pm and SpeedLimit.pm and is auto-magically re-read when
the time stamp of the file changes.
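
For example, a good_ips.txt might look something like this (the
addresses here are made up, not our real hosts):

	# the box running our htdig spider
	^10\.0\.1\.5$
	# anything on the internal network
	^192\.168\.1\.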

BlockAgent.pm also uses conf/bad_agents.txt to list any HTTP agents we
want to stop, mostly email-gathering and mirroring agents.
bad_agents.txt is also a list of regexps and is auto-magically re-read
when the time stamp of the file changes.
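
Likewise, a bad_agents.txt might contain patterns like these (example
patterns only, not our actual list):

	# email address harvesters
	EmailSiphon
	ExtractorPro
	# offline mirroring tools
	^Wget
	WebZIP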

Below are the httpd.conf statements, which should explain a bit about
how things work.


<Location />
	PerlAccessHandler Apache::BlockAgent
	PerlSetVar	BlockAgentFile	conf/bad_agents.txt
					# bad http agents to block
	PerlSetVar	AllowedIPFile	conf/good_ips.txt
					# friendly IP's to exclude
</Location>

<Location />
	PerlAccessHandler Apache::SpeedLimit
	PerlSetVar	AllowedIPFile	conf/good_ips.txt
					# friendly IP's to exclude
	PerlSetVar	SpeedLimit	20 
					# max 20 hits/min
	PerlSetVar	SpeedSamples	5  
					# 5 hits before sampling
	PerlSetVar	SpeedForgive	30
					# amnesty after 30 min
</Location>
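
For anyone curious how the pieces hang together, here's a rough sketch
of the BlockAgent.pm handler logic (simplified, not the actual module
code; the package name and the fail-open behavior on an unreadable
pattern file are my shorthand):

package Apache::BlockAgentSketch;   # hypothetical name, not the real module

use strict;
use Apache::Constants qw(OK FORBIDDEN);

# cached pattern lists, keyed by file name, along with the mtime we
# last loaded them at
my %cache;

# return the compiled patterns from a PerlSetVar'd file, re-reading
# the file whenever its time stamp changes
sub patterns {
    my ($r, $var) = @_;
    my $file  = $r->server_root_relative($r->dir_config($var));
    my $mtime = (stat $file)[9] or return [];   # missing file: no patterns
    unless ($cache{$file} && $cache{$file}{mtime} == $mtime) {
        open my $fh, '<', $file or return [];   # fail open if unreadable
        my @pats = map  { chomp; qr/$_/ }
                   grep { /\S/ && !/^\s*#/ } <$fh>;
        close $fh;
        $cache{$file} = { mtime => $mtime, pats => \@pats };
    }
    return $cache{$file}{pats};
}

sub handler {
    my $r = shift;

    # friendly IPs are never blocked
    my $ip = $r->connection->remote_ip;
    for my $pat (@{ patterns($r, 'AllowedIPFile') }) {
        return OK if $ip =~ $pat;
    }

    # deny any agent matching the bad-agents list
    my $agent = $r->header_in('User-Agent') || '';
    for my $pat (@{ patterns($r, 'BlockAgentFile') }) {
        return FORBIDDEN if $agent =~ $pat;
    }
    return OK;
}

1;

SpeedLimit.pm does the same AllowedIPFile short-circuit before it
starts counting hits/min.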



--
Aaron Turner, Core Developer       http://vodka.linuxkb.org/~aturner/
Linux Knowledge Base Organization  http://linuxkb.org/
Because world domination requires quality open documentation.