
Posts Tagged ‘Squid Cache’

Who and what I allow to access my websites

December 18th, 2008

I’ve written before about how I use my reverse proxy to block various bad bots and crawlers. At this stage I am blocking so much that it would be far too much to post here, so if you are interested, here are two links. The first is a copy of the Squid regex file I use, as described in my tutorial, and the second is a list of the IP addresses and IP blocks that I have blocked on the proxy using iptables.

Blockedbots.txt
Blockedip.txt
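Blocking by IP with iptables comes down to checking whether a source address equals a listed address or falls inside a listed CIDR block. As a rough illustration of that matching logic (not of iptables itself), here is a minimal Python sketch. It assumes a hypothetical list format of one address or CIDR block per line, and the sample addresses are documentation ranges, not entries from Blockedip.txt.

```python
import ipaddress


def load_blocklist(lines):
    """Parse one IP address or CIDR block per line; blank lines and '#' comments are skipped."""
    networks = []
    for raw in lines:
        entry = raw.split("#", 1)[0].strip()
        if entry:
            # A bare address becomes a single-host network (/32 for IPv4).
            # strict=False tolerates entries like 10.0.0.1/24 that have host bits set.
            networks.append(ipaddress.ip_network(entry, strict=False))
    return networks


def is_blocked(client_ip, networks):
    """True if client_ip is inside any listed network (IPv4 vs IPv6 mismatches never match)."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in networks)


if __name__ == "__main__":
    # Hypothetical sample entries for illustration only.
    sample = ["203.0.113.7", "198.51.100.0/24", "# a comment", ""]
    nets = load_blocklist(sample)
    print(is_blocked("198.51.100.42", nets))  # True
    print(is_blocked("192.0.2.1", nets))      # False
```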

Why Squid Cache Rocks

March 28th, 2008

Update – I have a more complete tutorial on how to block bots with Squid over on my wiki, which you can view here.

I’ve written before about my reverse proxy and how it allows me to accelerate content delivery and to run multiple webservers using a single IP address. However, it is capable of much more.

Squid uses access control lists (ACLs) to govern who can do what with the proxy server. For example, you can set ACLs so that only certain computers can access the internet, or so that they can access it via the cache only at certain times of day. There are myriad options you could configure, but one in particular struck me as exceptionally useful: you can use ACLs to block certain user agents.
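To make the idea of combining ACLs concrete, here is a minimal Python sketch that mimics two ACL types, a source-network check and a time-of-day check, and allows a request only when both match. The helper names, the 192.168.1.0/24 LAN and the weekday 08:00 to 17:00 window are all hypothetical; this models the concept, not Squid's own parser or its day-letter syntax.

```python
import ipaddress
from dataclasses import dataclass
from datetime import datetime, time


@dataclass
class Request:
    client_ip: str
    when: datetime


def src_acl(network):
    """Hypothetical helper: match requests whose client address is inside `network`."""
    net = ipaddress.ip_network(network)
    return lambda req: ipaddress.ip_address(req.client_ip) in net


def time_acl(weekdays, start, end):
    """Hypothetical helper: match requests made on `weekdays` between `start` and `end`.

    Weekdays use Python's numbering (Monday == 0 ... Sunday == 6), not Squid's day letters.
    """
    return lambda req: req.when.weekday() in weekdays and start <= req.when.time() <= end


office_lan = src_acl("192.168.1.0/24")                            # assumed LAN range
work_hours = time_acl({0, 1, 2, 3, 4}, time(8, 0), time(17, 0))  # assumed office hours


def allowed(req):
    # Naming two ACLs on one http_access line means both must match.
    return office_lan(req) and work_hours(req)


if __name__ == "__main__":
    # 28 March 2008 was a Friday; 29 March 2008 was a Saturday.
    print(allowed(Request("192.168.1.20", datetime(2008, 3, 28, 10, 0))))  # True
    print(allowed(Request("192.168.1.20", datetime(2008, 3, 29, 10, 0))))  # False
```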

In a conventional scenario you would use .htaccess on each server to block access by various bad bots. If you administer several, or even a few dozen, sites, it becomes a chore to keep the lists of bots and nefarious user agents up to date in every .htaccess file. In my case, however, all traffic passes through the reverse proxy, so denying access to those bots and user agents is trivial: you create a single ACL and it applies to every site the proxy is fronting.

Setting it up couldn’t be easier.

In my case my squid.conf is almost identical to the one used in my reverse proxy tutorial. One of the key things to consider when adding an ACL to block certain user agents is ordering: Squid reads squid.conf from top to bottom, so the new ACL has to be defined, and its deny rule placed, ahead of the rules that allow access to the proxied sites.

First we need to define our ACL. As per my tutorial, I add this ACL, which I will call ‘badbrowsers’, just above the first ‘cache_peer’ entry in squid.conf. I will store all the bad bot entries in a separate text file to avoid a messy squid.conf. For Squid to read the entries from a separate file, the file's path must be enclosed in quotes. So we define our ACL exactly as follows:

acl badbrowsers browser "/etc/squid/badbrowsers.conf"
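Squid then reads the quoted file and treats each line as a separate pattern. A minimal Python sketch of that behaviour follows, assuming blank lines and lines beginning with # are ignored and surrounding whitespace is trimmed. Note that Squid uses POSIX extended regular expressions while Python's re module has its own syntax; simple patterns such as ^Jakarta behave the same in both. The default path mirrors the acl line above, and the sample entries in the usage lines are hypothetical.

```python
import re
from pathlib import Path


def parse_patterns(lines):
    """Compile one regular expression per non-blank line, skipping '#' comment lines."""
    patterns = []
    for line in lines:
        line = line.strip()
        if line and not line.startswith("#"):
            patterns.append(re.compile(line))
    return patterns


def load_patterns(path="/etc/squid/badbrowsers.conf"):
    """Read the pattern file referenced by the acl line."""
    return parse_patterns(Path(path).read_text().splitlines())


def matches_any(user_agent, patterns):
    """True if any pattern matches somewhere in the user agent string."""
    return any(p.search(user_agent) for p in patterns)


if __name__ == "__main__":
    pats = parse_patterns(["# spam bots", "^Jakarta", "", "libwww-perl"])
    print(matches_any("Jakarta Commons-HttpClient/3.0.1", pats))  # True
    print(matches_any("Mozilla/5.0", pats))                       # False
```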

Now that the ACL has been defined, we must decide what happens when it is triggered. Scroll down through squid.conf and, on a new line just above the http_access rules for our proxied sites, add a rule denying HTTP access to our ACL as follows:

http_access deny badbrowsers
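The placement matters because Squid checks http_access rules from top to bottom and acts on the first one that matches. The Python sketch below models that first-match evaluation, including Squid's documented fallback of doing the opposite of the last rule when nothing matches. The our_sites check and its hostnames are hypothetical stand-ins for whatever ACL your own allow rule uses.

```python
import re


def badbrowsers(req):
    # Same pattern as the badbrowsers.conf entry: user agent starts with "Jakarta".
    return re.search(r"^Jakarta", req.get("User-Agent", "")) is not None


def our_sites(req):
    # Hypothetical stand-in for the ACL matching the sites the proxy fronts.
    return req.get("Host", "") in {"www.example.com", "wiki.example.com"}


def evaluate(rules, req):
    """Return the action of the first rule whose ACL matches, as http_access does.

    Assumes at least one rule is present.
    """
    for action, acl in rules:
        if acl(req):
            return action
    # Documented Squid fallback: do the opposite of the last rule in the list.
    return "allow" if rules[-1][0] == "deny" else "deny"


rules = [("deny", badbrowsers), ("allow", our_sites)]

if __name__ == "__main__":
    spam = {"Host": "www.example.com", "User-Agent": "Jakarta Commons-HttpClient/3.0.1"}
    print(evaluate(rules, spam))                  # deny
    print(evaluate(list(reversed(rules)), spam))  # allow: the site rule matches first
```

Reversing the two rules lets the Jakarta client through, because the allow rule for the site matches first; that is why the deny line goes above the http_access lines for the proxied sites.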

That’s all the configuration needed in squid.conf, so save your changes. Now we will create and edit the file we referenced, which will contain our bad bots and user agents.

When defining our ACL, I chose to locate its configuration file in /etc/squid. So change to this directory and, using your favourite editor, create a file called badbrowsers.conf. Each line in this file holds one banned user agent, written as a regular expression. I’ve noticed that most of the comment spam I have been receiving lately has come from a user agent calling itself “Jakarta Commons-HttpClient/3.0.1”. To banish this user agent, add the following line to your badbrowsers.conf file:

^Jakarta

That’s it. That’s all you need. Because the caret anchors the pattern to the start of the user agent string, matching the first word is enough. You can elaborate on this with regular expressions to cover whatever you like.
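To see what ^Jakarta does and does not catch, the short Python sketch below tests it against three sample user agent strings; only the first comes from the post, the other two are made up for illustration. The ^ anchors the match to the start of the string, and matching is case-sensitive unless you ask otherwise. Squid's regex ACL types accept a -i option for case-insensitive matching, which re.IGNORECASE roughly mirrors here.

```python
import re

BAD_AGENT = re.compile(r"^Jakarta")                          # anchored, case-sensitive
BAD_AGENT_ANY_CASE = re.compile(r"^Jakarta", re.IGNORECASE)  # roughly what -i does

agents = [
    "Jakarta Commons-HttpClient/3.0.1",       # from the post
    "Mozilla/5.0 (compatible; not-Jakarta)",  # hypothetical: word appears, but not at the start
    "jakarta commons-httpclient/3.0.1",       # hypothetical: lower-case variant
]

for ua in agents:
    print(bool(BAD_AGENT.search(ua)), bool(BAD_AGENT_ANY_CASE.search(ua)), ua)
# True True Jakarta Commons-HttpClient/3.0.1
# False False Mozilla/5.0 (compatible; not-Jakarta)
# False True jakarta commons-httpclient/3.0.1
```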

Once you are happy with your configuration, save your changes and restart Squid. No more bad bots.

Reverse Proxy: Making the most of one IP address

January 3rd, 2008

For the HowTo on my wiki please click here.

This is a repost from my personal wiki, originally published on 18th August 2007.

It seems that I can never leave well enough alone as far as my home setup is concerned. At one stage I had an excruciatingly complicated mail setup. A while back I moved to rectify that, and now I have just the one mail server, which also serves mail for a few other domains.

My webserver was running on just one machine, which also serves up a few other domains for friends.

But I decided it was too simple.

One thing that was bugging me was how to make the most of having one static IP address. Virtual hosts in Apache fit the bill nicely if all you want to do is serve PHP and static HTML sites. But if you want to extend it further and run a J2EE app or an ASPX site, you pretty much hit a brick wall.

This is where reverse proxies come in. You might know that a proxy server serves requests from multiple clients to multiple servers, acting as a kind of gateway. Proxies also cache frequently accessed files, so they can help to reduce bandwidth, especially where many users share a single internet connection. A reverse proxy, as the name suggests, does the reverse.

Essentially it accepts multiple connections from the internet and, depending on certain criteria, routes the requests to the desired computers on your local area network. So I set one up today. In fact, the page you are reading now has passed through the reverse proxy.
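The usual criterion is the Host header the browser sends: the proxy looks at which site was requested and forwards the request to the matching machine on the LAN. In Squid this is expressed with cache_peer entries plus rules such as cache_peer_domain or cache_peer_access; the Python sketch below only models the lookup itself. The hostnames and internal addresses are hypothetical.

```python
from urllib.parse import urlsplit

# Hypothetical mapping of public hostnames to internal LAN servers.
BACKENDS = {
    "www.example.com": "192.168.1.10:80",   # e.g. IIS on Windows
    "wiki.example.com": "192.168.1.20:80",  # e.g. Apache on Linux
}


def pick_backend(host_header, backends=BACKENDS, default=None):
    """Return the internal server for a Host header value, or `default` if unknown."""
    # urlsplit strips any ":port" suffix, handles [IPv6] literals and lower-cases the name.
    hostname = urlsplit("//" + host_header.strip()).hostname
    return backends.get(hostname, default)


if __name__ == "__main__":
    print(pick_backend("WIKI.example.com:80"))  # 192.168.1.20:80
    print(pick_backend("unknown.example.org"))  # None
```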

There are many solutions available to implement a reverse proxy. The Apache webserver is one of the better known ones, along with the Squid Cache proxy server. Both are open source, which means they are free; I opted for Squid as it is a lot more configurable.

So after an afternoon of compiling and configuring, I managed to get it up and running. Rather than bore you with the details now, I intend to do a write-up on my wiki in the near future, not least so that I can remember what I did if anything goes wrong.

Anyway, as already mentioned, this page reached you through the reverse proxy; it was served up by IIS running on Windows. If you follow the links to the howto, you will find yourself viewing a page served by Apache on Linux.

About the only ‘gotcha’ I noticed is that all my logfiles show the requests as coming from the proxy rather than from the computer that actually made the request. I can use the Squid logfiles from now on, but I’m going to have to work on implementing a solution for that soon.
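One common approach, sketched here under stated assumptions, relies on the X-Forwarded-For header: with forwarded_for enabled, Squid appends the address of the client it received the request from to this header. A backend can then log that address instead of the proxy's, provided it only trusts the header when the request actually came from the proxy. The proxy address 192.168.1.2 is hypothetical, and headers are modelled as a plain dict.

```python
import ipaddress

# Hypothetical LAN address of the reverse proxy.
TRUSTED_PROXIES = {"192.168.1.2"}


def real_client_ip(peer_ip, headers, trusted=TRUSTED_PROXIES):
    """Return the address to log: the client behind our proxy, or the direct peer."""
    if peer_ip not in trusted:
        # Not from our proxy, so the header could be forged; log the peer itself.
        return peer_ip
    hops = [h.strip() for h in headers.get("X-Forwarded-For", "").split(",") if h.strip()]
    if not hops:
        return peer_ip
    # The proxy appends the address it saw, so the last entry is the one we can trust.
    candidate = hops[-1]
    try:
        ipaddress.ip_address(candidate)
    except ValueError:
        # e.g. "unknown" when the proxy is configured not to reveal the client.
        return peer_ip
    return candidate


if __name__ == "__main__":
    print(real_client_ip("192.168.1.2", {"X-Forwarded-For": "203.0.113.9"}))  # 203.0.113.9
    print(real_client_ip("198.51.100.5", {"X-Forwarded-For": "1.2.3.4"}))     # 198.51.100.5
```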

Update – I have added a howto to my wiki. You can access the howto here.

Categories: Software