from __future__ import *

2005-09-23

Apache X-Forwarded-For caveat

Filed under: debugging, python — bob @ 2:08 pm

When using Apache's mod_proxy in a reverse proxying scenario, usually you'll want to take a look at the (seemingly undocumented) X-Forwarded-For header. This header contains whatever the client sent for X-Forwarded-For (if anything), plus the remote IP address of the client.

So, if you're trying to do anything with this header, check for commas and pick out the last piece, because the client can send anything they want to, and you shouldn't ever trust the client:

def get_request_ip(request):
    """get the IP of a request in twisted.web-speak"""
    host = request.transport.getPeer().host
    # Twisted doesn't support IPv6 anyway :)
    if host != "127.0.0.1":
        return host
    header = request.received_headers.get('x-forwarded-for', None)
    if header is None:
        return host
    return header.split(',')[-1].strip()

I gleaned this from reading the Apache 2.0.54 source; I couldn't find any description of mod_proxy's behavior in Apache's docs, only how to configure it. I was surprised that it even preserved what the client sent!
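
To make the spoofing concern concrete, here is a minimal standalone sketch (no Twisted needed). The helper name `last_forwarded_hop` and the sample addresses are made up for this example, and it assumes the proxy appends the peer address after whatever the client sent, as described above.

```python
def last_forwarded_hop(header_value):
    """Return the right-most entry of an X-Forwarded-For value.

    Hypothetical helper for illustration only; it is not part of
    Twisted or Apache.
    """
    return header_value.split(',')[-1].strip()


# The client supplied its own (bogus) X-Forwarded-For value, and the
# proxy appended the address it actually saw the connection come from.
spoofed = "10.0.0.1, 198.51.100.23"
print(last_forwarded_hop(spoofed))

# With no client-supplied value, the header holds a single address.
print(last_forwarded_hop("198.51.100.23"))
```

Both calls give the address appended by the proxy, which is the only part of the header the client does not control.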

10 Comments »

  1. This is the “standard” for how `X-Forwarded-For` is supposed to work everywhere. I had read about this somewhere else before; I am surprised it appears not to actually be documented in Apache.

    The motivation is to be able to trace a chain of proxies. Having the information around doesn’t really hurt anyway, you may always disregard it at will.

    Comment by Aristotle Pagaltzis — 2005-09-23 @ 3:31 pm

  2. That’s what the Via header is for, which is actually a standard.

    Comment by bob — 2005-09-23 @ 3:32 pm

  3. I hear HTTP says that multiple headers can always be combined into a single header separated with commas. So I assume that’s where the commas are coming from.

    However, all this indirection is one reason I don’t like using mod_proxy for my applications, but prefer SCGI. SCGI has two namespaces (HTTP_* for headers, everything else for other info, like CGI), and there’s real trusted data as a result, as well as no confusion about which-request-does-this-apply-to. So, for instance, I can put an authentication system in Apache and then trust REMOTE_USER in my SCGI-connected application.

    Comment by Ian Bicking — 2005-09-26 @ 7:30 am

  4. Yeah, I had considered one of those approaches, I just didn’t want to migrate code off of mod_proxy. I’m not sure that Twisted has an easy SCGI interface, for example.

    Comment by bob — 2005-09-26 @ 9:31 am

  5. Incidentally, twisted.web2 has an SCGI interface now.

    Comment by Ian Bicking — 2005-09-27 @ 6:39 am

  6. I’m not using that yet, but I’ll definitely look into it next time I update the app.

    Comment by bob — 2005-09-27 @ 6:49 am

  7. It’s probably worth commenting that the client address is technically the *left-most* address, not the rightmost. The rightmost address is either the client address (if it’s the only address in the list), or the address of the last proxy server that the request went through.

    That said, since you probably don’t trust the client or their proxies to provide correct information, if you only have space to store one address, it probably *is* the right-most one that you want to keep. There are some obvious exceptions though… I believe AOL users are often behind AOL’s proxy servers, which means that recording the right-most address will probably get you the IP of an AOL proxy.

    My take, therefore, is that it’s best (if possible) to store the entire contents of the X-Forwarded-For header, just in case.

    Comment by alastair — 2007-04-02 @ 7:37 am

  8. The use case here is to determine the external IP address of the service, in a local reverse-proxy style load balancer configuration. The goal is not to find out any information whatsoever about the client.

    Comment by bob — 2007-04-02 @ 12:42 pm

  9. I think the correct solution neither checks for localhost (there is no rule that your reverse proxy must live on localhost; it could be some other [virtual] machine) nor just uses the rightmost or leftmost IP address from x-forwarded-for.

    I guess the generalized scenario is this:

    Client -> ProxyL1 -> … -> ProxyLN -> ProxyP1 -> … -> ProxyPN -> ProxyR1 -> … -> ProxyRN -> Server

    Which address you want in your logs depends on your goals, but in the general case it is NEITHER always the leftmost NOR always the rightmost address in the list.

    E.g. the client's IP could be 192.168.0.2. If you are not that LAN's admin, this is completely unusable, because many people on this planet use this IP. So it is not the leftmost one in general. We have to kill all private IP addresses from the left (even the next proxies ProxyL1 … ProxyLN could have private IPs).

    Then the request has to enter the public IP address range somewhere. ProxyP1 is the first one using its own public IP address for querying all subsequent proxies. Maybe you want to log the IP ProxyP1 is using, because it is the nearest public IP to the client.

    Of course, if you use the ProxyP1 IP address you got from the X-Forwarded-For header, you have to trust all proxies after it not to cheat. So if you are paranoid, you may want to use the ProxyPX or ProxyRX IP address for the biggest X that is not a proxy operated by yourself. The bad news is that what you get might be some proxy of AOL or whatever, so it may not be that useful for finding out who owned the browser making the request.

    OK, so what is the algorithm? I think a simple algorithm would just remove all IP ranges we are not interested in, i.e.:
    1. all private IP ranges (see http://en.wikipedia.org/wiki/Private_network ) and also 127.0.0.0/8
    2. all (reverse) proxies near our server (if they do not have some IP already removed in step 1)

    What’s left is a list of public and “interesting” IP addresses. Now either take the leftmost (client first public ip) or the rightmost (paranoia mode) IP from those.

    I am new to this proxy stuff (just recently started to fix moinmoin’s support for it), so please tell me if I am talking nonsense. I just wrote this to get clear about how to implement it. Sometimes explaining it to someone else helps. :)

    Comment by Thomas Waldmann — 2007-11-24 @ 6:34 am

  10. An explanation of both the full problem and its solution is here: http://www.openinfo.co.uk/apache/

    Consider two possible ways of handling X-FORWARDED-FOR in conjunction with Django, as the current middleware in 0.96.1 is highly questionable.

    1. The Apache 2 mod_extract_forwarded module available at http://www.openinfo.co.uk/apache/. The documentation is dated but the code is stable and is being used with both Apache 2.0 and Apache 2.2. With this solution, you actively do not want to run the Django solution for handling X-FORWARDED-FOR, as the entire matter is handled by the Apache module, but you can dynamically detect the request header MEF_PROXY_ADDR to decide if mod_extract_forwarded has already processed the request.

    2. Alternatively, re-implement Django’s middleware solution using the logic in the following code fragment from Zope-2.10.5 module lib/python/ZPublisher/HTTPRequest.py which is broadly speaking correct. Note the use of the trusted_proxies list to stop the algorithm backing up too far in the forwarded for list:

    if environ.has_key('REMOTE_ADDR'):
        self._client_addr = environ['REMOTE_ADDR']
        if environ.has_key('HTTP_X_FORWARDED_FOR') and self._client_addr in trusted_proxies:
            # REMOTE_ADDR is one of our trusted local proxies. Not really very remote at all.
            # The proxy can tell us the IP of the real remote client in the forwarded-for header
            # Skip the proxy-address itself though
            forwarded_for = [
                e.strip()
                for e in environ['HTTP_X_FORWARDED_FOR'].split(',')]
            forwarded_for.reverse()
            for entry in forwarded_for:
                if entry not in trusted_proxies:
                    self._client_addr = entry
                    break
    else:
        self._client_addr = ''

    Comment by Richard Barrett — 2008-02-14 @ 12:28 pm
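
The filtering approach discussed in comments 9 and 10 can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names and the `own_proxies` and `paranoid` parameters are hypothetical, the sample addresses are arbitrary public ones, and Python's `ipaddress` module is used to classify addresses (its `is_private` check covers somewhat more than the classic private LAN ranges).

```python
import ipaddress


def candidate_client_ips(header_value, own_proxies=()):
    """Filter an X-Forwarded-For value down to the "interesting" addresses.

    Hypothetical helper: the name and the own_proxies parameter are
    illustrative assumptions, not taken from any library.
    """
    own = {ipaddress.ip_address(p) for p in own_proxies}
    kept = []
    for part in header_value.split(','):
        try:
            addr = ipaddress.ip_address(part.strip())
        except ValueError:
            # The header is client-controlled, so skip unparsable entries.
            continue
        # Step 1: drop private and loopback addresses.
        if addr.is_private or addr.is_loopback:
            continue
        # Step 2: drop the (reverse) proxies we operate ourselves.
        if addr in own:
            continue
        kept.append(str(addr))
    return kept


def pick_client_ip(header_value, own_proxies=(), paranoid=True):
    """Pick the right-most remaining address if paranoid, else the left-most.

    Returns None when nothing usable is left after filtering.
    """
    kept = candidate_client_ips(header_value, own_proxies)
    if not kept:
        return None
    return kept[-1] if paranoid else kept[0]


header = "192.168.0.2, 8.8.8.8, 1.1.1.1, 9.9.9.9"
print(pick_client_ip(header, own_proxies=["9.9.9.9"]))
print(pick_client_ip(header, own_proxies=["9.9.9.9"], paranoid=False))
```

In paranoid mode the result is the last hop that is not one of your own proxies, which is the same right-to-left walk the Zope fragment performs with its trusted_proxies list. The left-most choice trusts every proxy in the chain not to lie.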
