from __future__ import *


2006-08-04

Reverse proxy roundup

Filed under: FreeBSD, General — bob @ 8:23 pm

The first line of defense in scaling out a web solution is a load balancer, often implemented as an HTTP reverse proxy. In my particular situation, I wanted to be able to load balance based on HTTP headers (Host is a must, URL is a bonus) on a FreeBSD platform.
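
To make that requirement concrete, here is a rough Python sketch of the decision a header-aware balancer has to make for each request: pick a backend pool from the Host header (and optionally a URL prefix), then rotate through that pool. The hostnames, addresses, `ROUTES` table and `choose_backend` helper are all made up for illustration; none of the proxies below is claimed to work this way internally.

```python
import itertools

# Hypothetical routing table: (host, path prefix) -> backend pool.
ROUTES = {
    ("www.example.com", "/static/"): ["10.0.0.10:8080"],
    ("www.example.com", "/"): ["10.0.0.2:8080", "10.0.0.3:8080"],
    ("api.example.com", "/"): ["10.0.0.4:8080"],
}

# One round-robin iterator per pool.
_pools = {key: itertools.cycle(backends) for key, backends in ROUTES.items()}


def choose_backend(host, path):
    """Return a backend address for this request, or None if nothing matches."""
    # Drop any port and normalize case before matching the Host header.
    host = host.split(":", 1)[0].lower()
    candidates = [
        (h, prefix) for (h, prefix) in ROUTES
        if h == host and path.startswith(prefix)
    ]
    if not candidates:
        return None
    # The longest matching URL prefix wins for this host.
    key = max(candidates, key=lambda k: len(k[1]))
    return next(_pools[key])
```

A plain TCP balancer cannot do this, because it never parses the request and so never sees the Host header or the URL.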

Apache 2.0 + mod_proxy

Pros:
We were already running Apache, so it was an obvious initial solution to the problem. Not too bad to configure, well documented, and pretty stable. It served us well for quite some time.
Cons:
The prefork MPM is definitely not ideal for this purpose and doesn't give us the scalability we want. Screwing around with other MPMs was not ideal because Apache is also doing other things that we didn't want to disturb. Maintaining a separately configured installation just for proxying isn't a good solution. The thread-using MPMs would only be a small win, anyway.

LightTPD

Pros:
Easy enough to configure, supports all of the features we need, pretty good performance and scalability. Single process event-driven model that can use kqueue.
Cons:
Spotty documentation. Totally bizarre configuration language. Not terribly mature compared to other solutions. Lots of bugs. The current version (1.4.11) leaks memory like a sieve (#758), so it's unfortunately not an option at this point.

Pound

Pros:
Solid documentation (but only in man page format). All of the features we need, not a whole lot of other junk. Good track record. This is what we're using for now, but it's no permanent solution. Better than Apache, and it doesn't seem to leak RAM.
Cons:
Uses LOTS of threads. That means lots of RAM, limited scalability and performance. Documentation is only available as a man page, so I had to install it to really evaluate it.

The other solutions I looked into, but didn't end up trying, were:

Squid:
Way too complicated. Too hard to figure out how to get it to do what we wanted. Might be a good solution though, so I'll probably look into it again.
Perlbal:
No usable docs. FreeBSD port exists, but sucks (no rc file, no sample config). Couldn't figure out how to get it to reverse proxy with the features that I needed. Written in Perl, which I do grok, but don't like hacking on.
HAProxy:
Looks awesome, but doesn't have the features we need. Seems to only be able to do IP based load balancing, not by headers. More than just a TCP proxy, but not enough of an HTTP proxy.
Balance:
Just a TCP proxy.
Pen:
Just a TCP proxy.

23 Comments »

  1. Apache has some other benefits that you did not mention. All of the Apache modules out there are available to you. Specifically, mod_security is a great way to do a conceptually simple but flexible layer-7 firewall. That will block bad URLs, GETs, POSTs, SQL injection, XSS, and other nastiness. (Also look at fail2ban as a supplement.)

    A mature and well-understood reverse-proxy + HTTP firewall combination seems like a decent trade-off to me. For those features, you might justify a separate chroot Apache installation (mod_security supports this as a configuration option) — that’s no different from a separate Pound installation, right? In fact, a reverse proxy and firewall _should_ be standalone for security and availability reasons. (At least with Apache, you get the large mindshare, maintainability, and documentation). Is prefork the fastest thing out there? No. But since it is standalone now, you can experiment with other MPMs. Upgrading the hardware might be an option too if you agree that a dedicated chrooted mod_proxy + mod_security + fail2ban combination is worth the cost.

    Comment by Jason Smith — 2006-08-04 @ 10:07 pm

  2. mod_rewrite is also worth mentioning. It is great for a front-end proxy when you need to start integrating several components together.

    Comment by Jason Smith — 2006-08-04 @ 10:09 pm

  3. Apache 2.0, and especially 2.2, is good as a caching proxy. By using its caching features you should get pretty good performance, depending on your app of course.

    2.0 caching is buggier and less feature complete than 2.2.

    It didn’t look like Pound could do caching.

    Maybe with caching Apache could work OK for you.

    Comment by Rene Dudfield — 2006-08-05 @ 12:16 am

  4. lighty 1.5.0 is currently getting a final polish and has a completely new mod_proxy_core which will integrate the features from the different backend plugins and will support HTTP, FastCGI, SCGI and CGI and provide load-balancing, fail-over and keep-alive on top of them.

    http://blog.lighttpd.net/articles/2006/07/15/the-new-mod_proxy_core or the new articles document the process.

    Comment by Jan Kneschke — 2006-08-05 @ 12:22 am

  5. It’s great that lighty 1.5.0 is going to be out soon, but honestly I don’t give a shit about features unless it no longer leaks memory. Lighty sounds nice in theory, but I can’t run broken code in production.

    Comment by bob — 2006-08-05 @ 3:10 am

  6. I think you’d want to use Squid really… you’ll also get caching for free - it even supports caching of parts of the HTML… there was a press release once that Zope and Squid supported that… don’t know how that works though.

    And I don’t know, but it doesn’t seem to be that hard to configure… I’ve done it :)

    Comment by Damjan — 2006-08-05 @ 8:10 am

  7. Caching doesn’t apply here. I only want reverse proxying, nothing else.

    Comment by bob — 2006-08-05 @ 1:44 pm

  8. Hi Bob,

    have you considered PLB (Pure Load Balancer): http://plb.sunsite.dk/index.html

    “It uses an asynchronous non-forking/non-blocking model, and provides failover abilities. When a backend server goes down, it automatically removes it from the server pool, and tries to bring it back to life later.”

    There is also python director: http://pythondirector.sourceforge.net/

    “async i/o based, so much less overhead than fork/thread based balancers. Can use either twisted or python’s standard asyncore library (twisted is recommended, and asyncore support will be removed in a future version).”

    Dunno if they can balance based on HTTP headers. PLB’s docs seem pretty scarce, while it seems you can easily write (in Python) a custom balancing algorithm for Python Director.

    Let us know how it goes.

    Comment by michele — 2006-08-05 @ 6:49 pm

  9. I hadn’t heard of either PLB or Python Director. PLB is basically just a TCP load balancer, and so is Python Director. Neither of them knows anything about HTTP.

    For some reason PLB reads in the HTTP headers in full before dispatching to a server (from what I understand by glancing at the source), but it doesn’t act based on those headers.

    Of course I could hack something to do what I want, but that’s really a last resort… I have other code that needs to be written. A better reverse proxy than Pound is a relatively small win overall, so writing a bunch of code to replace it would be counter-productive (especially considering the maintenance it’d require over time).

    Comment by bob — 2006-08-05 @ 8:35 pm

  10. Oops, I hadn’t noticed they are both TCP based, although for Python Director it is clearly stated… need to sleep more.

    Finally I agree that spending time implementing an ad-hoc alternative to Pound is quite pointless.

    Thanks.

    Comment by michele — 2006-08-06 @ 1:34 am

  11. I am using Squid to proxy incoming requests to separate applications running on the same server: Zope, Apache, and CherryPy.

    It’s much easier than you think. In squid.conf, you need to set the following:

    http_port 80

    redirect_program /path/to/program-i-will-explain-below

    change the “http_access deny all” line to “http_access allow all”.

    httpd_accel_host virtual
    httpd_accel_port 0

    httpd_accel_uses_host_header on

    Now all access control and redirection will be controlled by the program that redirect_program points to.

    It’s pretty simple, actually. Squid will open up 5 (configurable through redirect_children) instances of the program and write data about incoming requests to stdin. There are 4 fields: url, source_ip, ident, and method. All the program needs to do is respond with the real information and Squid will retrieve it on behalf of the client.

    For example, if you want http://www.foo.com/ to actually go to http://10.0.0.2:8080/, you write a program that behaves like this:

    stdin: http://www.foo.com/ 192.168.1.1 - GET
    stdout: http://10.0.0.2:8080/ 192.168.1.1 - GET

    As you can see, this gives you an enormous amount of flexibility. You can redirect based on source ip, and you can merge the url space of separate servers. You can have http://www.foo.com/bar go to an Apache instance and http://foo.com/baz go to Zope.

    Comment by James Oakley — 2006-08-11 @ 7:45 pm

  12. You should check out Squid more thoroughly.  A company I used to work at used Squid as the basis of a pretty large CDN (content delivery network).  I’m familiar with some of the more obscure but performance-boosting options and some of the configuration pitfalls.   Shoot me a line, or maybe I’ll post an article on it if I ever get my blog back up.

    -arg

    Comment by Andy Gross — 2006-08-15 @ 10:52 am

  13. It is quite strange that you consider HTTP servers while looking for an HTTP proxy. There are a few mature proxy servers that can do the job better than any HTTP server. Besides Squid, I’d recommend taking a look at DeleGate (http://www.delegate.org/)

    Comment by Stranger — 2006-08-16 @ 7:13 pm

  14. DeleGate was not considered because I’ve never heard of it and it didn’t show up in any of the searches I did.

    The majority of proxy servers I found did not suit the requirements.

    Comment by bob — 2006-08-17 @ 10:10 am

  15. Bob, I haven’t been able to confirm the Memory Leak in Lighttpd. I’ve been running it for over a month on my Textdrive server which runs FreeBSD 6.0. However, I’m not sure if they’ve installed via ports or otherwise.

    Comment by SuperJared — 2006-08-24 @ 1:00 pm

  16. Perhaps it’s reverse proxying that causes the leak? That’s the only thing my Lighttpd installation was doing at the time. Are you sure it’s 1.4.11?

    Either way, I wasn’t the first one with the problem, and I’m not particularly interested in touching Lighttpd again after that experience.

    Comment by bob — 2006-08-24 @ 3:14 pm

  17. Hi guys,

    just found Varnish:
    http://varnish.projects.linpro.no/
    and also a blog with some comments on it:
    http://www.mnot.net/blog/2006/08/21/caching_performance
    In my opinion, Varnish looks quite promising.

    Comment by Hans — 2006-11-12 @ 9:00 am

  18. This stuff is covered heavily in both “Scalable Internet Architectures” and “Building Scalable Web Sites”.

    Comment by Shannon -jj Behrens — 2006-11-16 @ 1:12 am

  19. Unfortunately the state of the art advances faster than the presses, so it’s entirely likely that whatever recommendations are given in literature for specific software choices are irrelevant.

    On the other hand, I’m sure these books have great advice with regard to the architecture of scalable sites. However, I don’t think they’re particularly relevant to specific load balancer/proxy choices.

    Comment by bob — 2006-11-16 @ 1:54 am

  20. I believe that HTProxy does (now?) support the features (cookie based session affinity, in particular) that you’re looking for. I’m currently trialling it myself, and while I’m nowhere near production (or even development), it seems quite interesting.

    Comment by johnf — 2007-11-23 @ 6:48 am

  21. argh!

    HAProxy, not HTProxy.

    Comment by johnf — 2007-11-23 @ 6:50 am

  22. I would highly suggest Pound or lighttpd as a reverse proxy. As of version 2.4e, Pound is extremely fast and stable. Lighttpd did have some problems in the past, and most of those have been fixed. Memory management has been greatly improved. I have to agree about the documentation, but there are examples like the following to help everyone out:

    Pound Reverse Proxy “how to”
    http://calomel.org/pound.html

    Light webserver “how to”
    http://calomel.org/lighttpd.html

    Comment by Calomel — 2007-12-12 @ 8:31 am

  23. Just to add to the list, PHK of FreeBSD, MD5, and phkmalloc fame has started on a project called “Varnish”:

    http://www.varnish-cache.org/

    There’s a good video on why they’re doing things the way they are:

    http://varnish.projects.linpro.no/wiki/VarnishInTheNews

    You can also skim the arch doc (though I also recommend the video if you have time):

    http://varnish.projects.linpro.no/wiki/ArchitectNotes

    Comment by David Magda — 2008-01-23 @ 7:55 pm
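
For anyone who wants to try the Squid approach from comment 11, a redirector along those lines might look roughly like the Python sketch below. It assumes the four-field line format given in that comment (url, source_ip, ident, method) and echoes the whole line back the way the comment's example does. The `REWRITES` table, the backend addresses and the `rewrite_line` and `serve` helpers are hypothetical, and the exact format differs between Squid versions, so check it against the documentation for the version you run.

```python
# Hypothetical mapping from public URL prefix to internal backend prefix.
# More specific prefixes come first so they are matched first.
REWRITES = [
    ("http://www.foo.com/bar", "http://10.0.0.3:8080/bar"),
    ("http://www.foo.com/", "http://10.0.0.2:8080/"),
]


def rewrite_line(line):
    """Rewrite one request line of the form 'url source_ip ident method'."""
    fields = line.split()
    if len(fields) != 4:
        return line.strip()  # unexpected input: echo it back unchanged
    url = fields[0]
    for public, internal in REWRITES:
        if url.startswith(public):
            # Swap the public prefix for the internal one, keep the rest.
            fields[0] = internal + url[len(public):]
            break
    return " ".join(fields)


def serve(infile, outfile):
    """Answer requests one line at a time until the input is closed."""
    # readline avoids read-ahead buffering; the parent keeps the pipe open.
    for line in iter(infile.readline, ""):
        outfile.write(rewrite_line(line) + "\n")
        outfile.flush()  # reply immediately, the parent is waiting on it
```

As a script this would end with `import sys` and `serve(sys.stdin, sys.stdout)`, and redirect_program would point at it. Flushing after every reply matters because the redirector is a long-running process talking over a pipe.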
