<?xml version="1.0" encoding="utf-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	>
<channel>
	<title>Comments on: Apache X-Forwarded-For caveat</title>
	<atom:link href="http://bob.pythonmac.org/archives/2005/09/23/apache-x-forwarded-for-caveat/feed/" rel="self" type="application/rss+xml" />
	<link>http://bob.pythonmac.org/archives/2005/09/23/apache-x-forwarded-for-caveat/</link>
	<description>Bob's Rants</description>
	<pubDate>Fri, 25 Jul 2008 16:09:28 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.5.1</generator>
		<item>
		<title>By: Richard Barrett</title>
		<link>http://bob.pythonmac.org/archives/2005/09/23/apache-x-forwarded-for-caveat/#comment-12405</link>
		<dc:creator>Richard Barrett</dc:creator>
		<pubDate>Thu, 14 Feb 2008 20:28:54 +0000</pubDate>
		<guid isPermaLink="false">http://bob.pythonmac.org/?p=180#comment-12405</guid>
		<description>An explanation of both the full problem and its solution is here: http://www.openinfo.co.uk/apache/

Consider two possible ways of handling X-FORWARDED-FOR in conjunction with Django, as the current middleware in 0.96.1 is highly questionable.

1. The Apache 2 mod_extract_forwarded module available at http://www.openinfo.co.uk/apache/. The documentation is dated but the code is stable and is being used with both Apache 2.0 and Apache 2.2. With this solution, you actively do not want to run the Django solution for handling X-FORWARDED-FOR, as the entire matter is handled by the Apache module, but you can dynamically detect the request header MEF_PROXY_ADDR to decide whether mod_extract_forwarded has already processed the request.

2. Alternatively, re-implement Django's middleware solution using the logic in the following code fragment from the Zope-2.10.5 module lib/python/ZPublisher/HTTPRequest.py, which is, broadly speaking, correct. Note the use of the trusted_proxies list to stop the algorithm backing up too far in the forwarded-for list:
 
    if environ.has_key('REMOTE_ADDR'):
        self._client_addr = environ['REMOTE_ADDR']
        if environ.has_key('HTTP_X_FORWARDED_FOR') and self._client_addr in trusted_proxies:
            # REMOTE_ADDR is one of our trusted local proxies. Not really very remote at all.
            # The proxy can tell us the IP of the real remote client in the forwarded-for header
            # Skip the proxy-address itself though
            forwarded_for = [
                e.strip()
                for e in environ['HTTP_X_FORWARDED_FOR'].split(',')]
            forwarded_for.reverse()
            for entry in forwarded_for:
                if entry not in trusted_proxies:
                    self._client_addr = entry
                    break
    else:
        self._client_addr = ''</description>
		<content:encoded><![CDATA[<p>An explanation of both the full problem and its solution is here: <a href="http://www.openinfo.co.uk/apache/" rel="nofollow">http://www.openinfo.co.uk/apache/</a></p>
<p>Consider two possible ways of handling X-FORWARDED-FOR in conjunction with Django, as the current middleware in 0.96.1 is highly questionable.</p>
<p>1. The Apache 2 mod_extract_forwarded module available at <a href="http://www.openinfo.co.uk/apache/" rel="nofollow">http://www.openinfo.co.uk/apache/</a>. The documentation is dated but the code is stable and is being used with both Apache 2.0 and Apache 2.2. With this solution, you actively do not want to run the Django solution for handling X-FORWARDED-FOR, as the entire matter is handled by the Apache module, but you can dynamically detect the request header MEF_PROXY_ADDR to decide whether mod_extract_forwarded has already processed the request.</p>
<p>2. Alternatively, re-implement Django&#8217;s middleware solution using the logic in the following code fragment from the Zope-2.10.5 module lib/python/ZPublisher/HTTPRequest.py, which is, broadly speaking, correct. Note the use of the trusted_proxies list to stop the algorithm backing up too far in the forwarded-for list:</p>
<p>    if environ.has_key('REMOTE_ADDR'):<br />
        self._client_addr = environ['REMOTE_ADDR']<br />
        if environ.has_key('HTTP_X_FORWARDED_FOR') and self._client_addr in trusted_proxies:<br />
            # REMOTE_ADDR is one of our trusted local proxies. Not really very remote at all.<br />
            # The proxy can tell us the IP of the real remote client in the forwarded-for header<br />
            # Skip the proxy-address itself though<br />
            forwarded_for = [<br />
                e.strip()<br />
                for e in environ['HTTP_X_FORWARDED_FOR'].split(',')]<br />
            forwarded_for.reverse()<br />
            for entry in forwarded_for:<br />
                if entry not in trusted_proxies:<br />
                    self._client_addr = entry<br />
                    break<br />
    else:<br />
        self._client_addr = ''</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Thomas Waldmann</title>
		<link>http://bob.pythonmac.org/archives/2005/09/23/apache-x-forwarded-for-caveat/#comment-11759</link>
		<dc:creator>Thomas Waldmann</dc:creator>
		<pubDate>Sat, 24 Nov 2007 14:34:11 +0000</pubDate>
		<guid isPermaLink="false">http://bob.pythonmac.org/?p=180#comment-11759</guid>
		<description>I think the correct solution neither relies on a check for localhost (there is no rule that your reverse proxy must live on localhost; it could also be some other [virtual] machine) nor just uses the rightmost or leftmost IP address from X-Forwarded-For.

I guess the generalized scenario is this:

Client -&#62; ProxyL1 -&#62; ... -&#62; ProxyLN -&#62; ProxyP1 -&#62; ... -&#62; ProxyPN -&#62; ProxyR1 -&#62; ... -&#62; ProxyRN -&#62; Server

Which address you want for your logs depends a bit on your goals, but in the general case it is NEITHER always the leftmost NOR always the rightmost address in the list.

E.g. the client's IP could be 192.168.0.2 - if you are not that LAN's admin, this is completely unusable, because many people on this planet use this IP. So it is not the leftmost one in general. We have to drop all private IP addresses from the left (even the next proxies ProxyL1 ... ProxyLN could have private IPs).

Then, the request has to enter the public IP address range somewhere. ProxyP1 is the first one using its own public IP address for querying all subsequent proxies. Maybe you want to log the IP ProxyP1 is using, because it is the nearest public IP to the client.

Of course, if you use the ProxyP1 IP address you got from the X-Forwarded-For header, you have to trust all proxies after it not to be cheating. So if you are paranoid, you may want to use the ProxyPX or ProxyRX IP address for the biggest X that is not a proxy operated by yourself. The bad news is that what you get might be some proxy of AOL or whatever, so maybe not that useful in finding out who owned the browser doing that request.

OK, so what does the algorithm look like? I think a simple algorithm would just remove all IP ranges we are not interested in, i.e.:
 1. all private IP ranges (see http://en.wikipedia.org/wiki/Private_network ) and also 127.0.0.0/8
 2. all (reverse) proxies near our server (if they do not have some IP already removed in step 1)

What's left is a list of public and "interesting" IP addresses. Now take either the leftmost (the client's first public IP) or the rightmost (paranoia mode) IP from those.
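
The two filtering steps above could be sketched in Python roughly like this. It is only a sketch under some assumptions: the function names and the own_proxies parameter are made up for illustration, entries that do not parse as IP addresses (for example "unknown") are simply skipped, and the standard ipaddress module's is_private flag stands in for "private IP ranges" (it also covers a few other reserved ranges).

```python
import ipaddress


def interesting_addresses(header_value, own_proxies=()):
    """Return the public addresses left after both filtering steps.

    header_value is the raw X-Forwarded-For value; own_proxies is a
    hypothetical list of addresses of reverse proxies we operate.
    """
    own = set(own_proxies)
    kept = []
    for raw in header_value.split(","):
        entry = raw.strip()
        try:
            addr = ipaddress.ip_address(entry)
        except ValueError:
            continue  # not an IP address at all, e.g. "unknown"
        if addr.is_private or addr.is_loopback:
            continue  # step 1: private ranges and 127.0.0.0/8
        if entry in own:
            continue  # step 2: our own (reverse) proxies
        kept.append(entry)
    return kept


def pick_address(header_value, own_proxies=(), paranoid=False):
    """Pick the leftmost kept address, or the rightmost in paranoia mode."""
    kept = interesting_addresses(header_value, own_proxies)
    if not kept:
        return None
    return kept[-1] if paranoid else kept[0]
```

With the header "192.168.0.2, 10.0.0.5, 8.8.8.8, 1.1.1.1" and no proxies of our own, this picks 8.8.8.8 by default and 1.1.1.1 in paranoia mode.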

I am new to this proxy stuff (just recently started to fix moinmoin's support for it), so please tell me if I am talking nonsense. I just wrote this to get clear about how to implement it. Sometimes explaining it to someone else helps. :)</description>
		<content:encoded><![CDATA[<p>I think the correct solution neither relies on a check for localhost (there is no rule that your reverse proxy must live on localhost; it could also be some other [virtual] machine) nor just uses the rightmost or leftmost IP address from X-Forwarded-For.</p>
<p>I guess the generalized scenario is this:</p>
<p>Client -&gt; ProxyL1 -&gt; &#8230; -&gt; ProxyLN -&gt; ProxyP1 -&gt; &#8230; -&gt; ProxyPN -&gt; ProxyR1 -&gt; &#8230; -&gt; ProxyRN -&gt; Server</p>
<p>Which address you want for your logs depends a bit on your goals, but in the general case it is NEITHER always the leftmost NOR always the rightmost address in the list.</p>
<p>E.g. the client&#8217;s IP could be 192.168.0.2 - if you are not that LAN&#8217;s admin, this is completely unusable, because many people on this planet use this IP. So it is not the leftmost one in general. We have to drop all private IP addresses from the left (even the next proxies ProxyL1 &#8230; ProxyLN could have private IPs).</p>
<p>Then, the request has to enter the public IP address range somewhere. ProxyP1 is the first one using its own public IP address for querying all subsequent proxies. Maybe you want to log the IP ProxyP1 is using, because it is the nearest public IP to the client.</p>
<p>Of course, if you use the ProxyP1 IP address you got from the X-Forwarded-For header, you have to trust all proxies after it not to be cheating. So if you are paranoid, you may want to use the ProxyPX or ProxyRX IP address for the biggest X that is not a proxy operated by yourself. The bad news is that what you get might be some proxy of AOL or whatever, so maybe not that useful in finding out who owned the browser doing that request.</p>
<p>OK, so what does the algorithm look like? I think a simple algorithm would just remove all IP ranges we are not interested in, i.e.:<br />
 1. all private IP ranges (see <a href="http://en.wikipedia.org/wiki/Private_network" rel="nofollow">http://en.wikipedia.org/wiki/Private_network</a> ) and also 127.0.0.0/8<br />
 2. all (reverse) proxies near our server (if they do not have some IP already removed in step 1)</p>
<p>What&#8217;s left is a list of public and &#8220;interesting&#8221; IP addresses. Now take either the leftmost (the client&#8217;s first public IP) or the rightmost (paranoia mode) IP from those.</p>
<p>I am new to this proxy stuff (just recently started to fix moinmoin&#8217;s support for it), so please tell me if I am talking nonsense. I just wrote this to get clear about how to implement it. Sometimes explaining it to someone else helps. :)</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: bob</title>
		<link>http://bob.pythonmac.org/archives/2005/09/23/apache-x-forwarded-for-caveat/#comment-9590</link>
		<dc:creator>bob</dc:creator>
		<pubDate>Mon, 02 Apr 2007 20:42:06 +0000</pubDate>
		<guid isPermaLink="false">http://bob.pythonmac.org/?p=180#comment-9590</guid>
		<description>The use case here is to determine the external IP address of the service, in a local reverse-proxy style load balancer configuration. The goal is not to find out any information whatsoever about the client.</description>
		<content:encoded><![CDATA[<p>The use case here is to determine the external IP address of the service, in a local reverse-proxy style load balancer configuration. The goal is not to find out any information whatsoever about the client.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: alastair</title>
		<link>http://bob.pythonmac.org/archives/2005/09/23/apache-x-forwarded-for-caveat/#comment-9581</link>
		<dc:creator>alastair</dc:creator>
		<pubDate>Mon, 02 Apr 2007 15:37:35 +0000</pubDate>
		<guid isPermaLink="false">http://bob.pythonmac.org/?p=180#comment-9581</guid>
		<description>It's probably worth commenting that the client address is technically the *left-most* address, not the rightmost. The rightmost address is either the client address (if it's the only address in the list), or the address of the last proxy server that the request went through.
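
To make the ordering concrete, here is a small Python sketch. The helper names are invented for illustration, and it assumes every proxy appends the address it received the request from to the right-hand end of the list.

```python
def split_forwarded_for(header_value):
    """Split an X-Forwarded-For value into a list of stripped entries."""
    return [e.strip() for e in header_value.split(",") if e.strip()]


def claimed_client(header_value):
    """Left-most entry: the address the first proxy reported as the client."""
    entries = split_forwarded_for(header_value)
    return entries[0] if entries else None


def last_recorded_hop(header_value):
    """Right-most entry: the client itself if it is the only entry,
    otherwise the last proxy recorded in the list."""
    entries = split_forwarded_for(header_value)
    return entries[-1] if entries else None
```

For a header with a single entry both helpers return the same address, which matches the "only address in the list" case described above.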

That said, since you probably don't trust the client or their proxies to provide correct information, if you only have space to store one address, it probably *is* the right-most one that you want to keep. There are some obvious exceptions though… I believe AOL users are often behind AOL's proxy servers, which means that recording the right-most address will probably get you the IP of an AOL proxy.

My take, therefore, is that it's best (if possible) to store the entire contents of the X-Forwarded-For header, just in case.</description>
		<content:encoded><![CDATA[<p>It&#8217;s probably worth commenting that the client address is technically the *left-most* address, not the rightmost. The rightmost address is either the client address (if it&#8217;s the only address in the list), or the address of the last proxy server that the request went through.</p>
<p>That said, since you probably don&#8217;t trust the client or their proxies to provide correct information, if you only have space to store one address, it probably *is* the right-most one that you want to keep. There are some obvious exceptions though… I believe AOL users are often behind AOL&#8217;s proxy servers, which means that recording the right-most address will probably get you the IP of an AOL proxy.</p>
<p>My take, therefore, is that it&#8217;s best (if possible) to store the entire contents of the X-Forwarded-For header, just in case.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: bob</title>
		<link>http://bob.pythonmac.org/archives/2005/09/23/apache-x-forwarded-for-caveat/#comment-2649</link>
		<dc:creator>bob</dc:creator>
		<pubDate>Tue, 27 Sep 2005 14:49:48 +0000</pubDate>
		<guid isPermaLink="false">http://bob.pythonmac.org/?p=180#comment-2649</guid>
		<description>I'm not using that yet, but I'll definitely look into it next time I update the app.</description>
		<content:encoded><![CDATA[<p>I&#8217;m not using that yet, but I&#8217;ll definitely look into it next time I update the app.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Ian Bicking</title>
		<link>http://bob.pythonmac.org/archives/2005/09/23/apache-x-forwarded-for-caveat/#comment-2648</link>
		<dc:creator>Ian Bicking</dc:creator>
		<pubDate>Tue, 27 Sep 2005 14:39:24 +0000</pubDate>
		<guid isPermaLink="false">http://bob.pythonmac.org/?p=180#comment-2648</guid>
		<description>Incidentally, twisted.web2 has an SCGI interface now.</description>
		<content:encoded><![CDATA[<p>Incidentally, twisted.web2 has an SCGI interface now.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: bob</title>
		<link>http://bob.pythonmac.org/archives/2005/09/23/apache-x-forwarded-for-caveat/#comment-2647</link>
		<dc:creator>bob</dc:creator>
		<pubDate>Mon, 26 Sep 2005 17:31:03 +0000</pubDate>
		<guid isPermaLink="false">http://bob.pythonmac.org/?p=180#comment-2647</guid>
		<description>Yeah, I had considered one of those approaches, I just didn't want to migrate code off of mod_proxy.  I'm not sure that Twisted has an easy SCGI interface, for example.</description>
		<content:encoded><![CDATA[<p>Yeah, I had considered one of those approaches, I just didn&#8217;t want to migrate code off of mod_proxy.  I&#8217;m not sure that Twisted has an easy SCGI interface, for example.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Ian Bicking</title>
		<link>http://bob.pythonmac.org/archives/2005/09/23/apache-x-forwarded-for-caveat/#comment-2646</link>
		<dc:creator>Ian Bicking</dc:creator>
		<pubDate>Mon, 26 Sep 2005 15:30:11 +0000</pubDate>
		<guid isPermaLink="false">http://bob.pythonmac.org/?p=180#comment-2646</guid>
		<description>I hear HTTP says that multiple headers with the same name can always be combined into a single header, with the values separated by commas. So I assume that's where the commas are coming from.
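
As a rough illustration of that combining rule, here is a Python sketch. The function name and the list-of-pairs input format are assumptions made for the example, not anything mod_proxy exposes.

```python
def combine_repeated_headers(header_lines):
    """Fold repeated headers into one comma-separated value per name.

    header_lines is a list of (name, value) pairs in the order received;
    names are compared case-insensitively and stored in lower case.
    """
    combined = {}
    for name, value in header_lines:
        key = name.lower()
        if key in combined:
            combined[key] = combined[key] + ", " + value.strip()
        else:
            combined[key] = value.strip()
    return combined
```

Two separate X-Forwarded-For lines carrying 10.0.0.1 and 8.8.8.8 would come out as the single value "10.0.0.1, 8.8.8.8".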

However, all this indirection is one reason I don't like using mod_proxy for my applications, but prefer SCGI.  SCGI has two namespaces (HTTP_* for headers, everything else for other info, like CGI), and there's real trusted data as a result, as well as no confusion about which-request-does-this-apply-to.  So, for instance, I can put an authentication system in Apache and then trust REMOTE_USER in my SCGI-connected application.</description>
		<content:encoded><![CDATA[<p>I hear HTTP says that multiple headers with the same name can always be combined into a single header, with the values separated by commas. So I assume that&#8217;s where the commas are coming from.</p>
<p>However, all this indirection is one reason I don&#8217;t like using mod_proxy for my applications, but prefer SCGI.  SCGI has two namespaces (HTTP_* for headers, everything else for other info, like CGI), and there&#8217;s real trusted data as a result, as well as no confusion about which-request-does-this-apply-to.  So, for instance, I can put an authentication system in Apache and then trust REMOTE_USER in my SCGI-connected application.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: bob</title>
		<link>http://bob.pythonmac.org/archives/2005/09/23/apache-x-forwarded-for-caveat/#comment-2638</link>
		<dc:creator>bob</dc:creator>
		<pubDate>Fri, 23 Sep 2005 23:32:50 +0000</pubDate>
		<guid isPermaLink="false">http://bob.pythonmac.org/?p=180#comment-2638</guid>
		<description>That's what the Via header is for, which is actually a standard.</description>
		<content:encoded><![CDATA[<p>That&#8217;s what the Via header is for, which is actually a standard.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Aristotle Pagaltzis</title>
		<link>http://bob.pythonmac.org/archives/2005/09/23/apache-x-forwarded-for-caveat/#comment-2637</link>
		<dc:creator>Aristotle Pagaltzis</dc:creator>
		<pubDate>Fri, 23 Sep 2005 23:31:28 +0000</pubDate>
		<guid isPermaLink="false">http://bob.pythonmac.org/?p=180#comment-2637</guid>
		<description>This is the &#8220;standard&#8221; for how `X-Forwarded-For` is supposed to work everywhere. I had read about this somewhere else before; I am surprised it appears not to actually be documented in Apache.

The motivation is to be able to trace a chain of proxies. Having the information around doesn&#8217;t really hurt anyway; you can always disregard it.</description>
		<content:encoded><![CDATA[<p>This is the &#8220;standard&#8221; for how `X-Forwarded-For` is supposed to work everywhere. I had read about this somewhere else before; I am surprised it appears not to actually be documented in Apache.</p>
<p>The motivation is to be able to trace a chain of proxies. Having the information around doesn&#8217;t really hurt anyway; you can always disregard it.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
