mod_rewrite not sending Vary: Accept-Language when RewriteCond matches - mod-rewrite

I have a rewrite rule which redirects to / if no Accept-Language header is present and someone attempts to visit ?lang=en. It works fine, except for the headers returned. Vary: Accept-Language is missing from the response.
RewriteCond %{HTTP:Accept-Language} ^$
RewriteCond %{QUERY_STRING} ^lang=en
RewriteRule ^$ http://www.example.com/? [R=301,L]
The Apache documentation specifies:
If a HTTP header is used in a condition this header is added to the Vary header of the response in case the condition evaluates to true for the request. It is not added if the condition evaluates to false for the request.
The conditions are definitely matching and redirecting, so I don't understand why Apache isn't adding the language vary. One can see why this would be a real problem if a proxy were to cache that ?lang=en and always redirect to / regardless of the Accept-Language header sent.

After peeking into the seedy underbelly of Apache's request handling system, it turns out that the documentation is somewhat misleading. But before I get into the explanation: from what I can tell, you're at the mercy of Apache on this one.
The Client Problem
First, the header name will not be added to the Vary response header if it is not sent by the client. This is due to how mod_rewrite constructs the value for that header internally.
It looks up the header by name using apr_table_get(), the request's header table, and the name that you provided:
const char *val = apr_table_get(ctx->r->headers_in, name);
If name is not a key in the table, this function will return NULL. This is a problem, because immediately after this is a check against val:
if (val) {
    // Set the structure member ctx->vary_this
}
ctx->vary_this is used on a per-RewriteCond basis to accumulate header names that should be assembled into the final Vary header*. Since no assignment or appending will occur if there is no value, a referenced (but not sent) header will never appear in Vary. The documentation doesn't explicitly state this, so it may or may not have been what you expected.
*As an aside, the NV (no vary) flag and ignore-on-failure functionality is implemented by setting ctx->vary_this to NULL, preventing its addition to the response header.
However, it's possible that you sent Accept-Language, but it was blank. In this case, the empty string will pass the above check, and the header name will be added to Vary by mod_rewrite from what's described above. Keeping this in mind, I used the following request to diagnose what was going on:
User-Agent: Fiddler
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language:
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 115
Connection: keep-alive
Host: 129.168.0.123
This doesn't work either, but why? mod_rewrite definitely sets the headers when the rule and condition match (ctx->vary is an aggregate of ctx->vary_this across all checked conditions):
if (ctx->vary) {
    apr_table_merge(r->headers_out, "Vary", ctx->vary);
}
This can be verified with a log statement, and r->headers_out is the variable used when generating the response headers. Given something is definitely going wrong though, there must be trouble after the rules are executed.
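As a rough model of the behaviour described so far (not mod_rewrite's actual code), the following Python sketch mimics the NULL check on the looked-up header value. The function name and the plain dict standing in for the request header table are assumptions for illustration; the real APR table is case-insensitive, which a dict is not.

```python
def vary_for_conditions(headers_in, condition_headers):
    """Return the Vary value a fully matching rule would merge, or None.

    headers_in: dict of request headers as sent by the client (hypothetical
        stand-in for r->headers_in).
    condition_headers: header names referenced via %{HTTP:...} in the
        RewriteCond directives, all of which are assumed to have matched.
    """
    vary = []
    for name in condition_headers:
        val = headers_in.get(name)  # stands in for apr_table_get()
        # The C check `if (val)` only tests for NULL, so an empty string passes.
        if val is not None:
            vary.append(name)
    return ", ".join(vary) if vary else None


# Header not sent at all: nothing is accumulated for Vary.
print(vary_for_conditions({}, ["Accept-Language"]))
# Header sent but blank: the name is accumulated.
print(vary_for_conditions({"Accept-Language": ""}, ["Accept-Language"]))
```

In this model a missing header and a blank header both satisfy the ^$ pattern, but only the blank one contributes to Vary.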
The .htaccess Problem
Currently, you appear to be defining your rules in .htaccess, or a <Directory> section. This means that mod_rewrite is operating in Apache's fixup phase, and the mechanism it uses to actually perform rewrites here is very messy. Let's assume for a second there's no external redirection, since you had a problem even without it (and I'll get to the issue with the redirect later).
After you perform a rewrite, it's far too late in the request processing for the module to actually map to a file. What it does instead is assign itself as the request's "content" handler and when the request reaches that point, it performs a call to ap_internal_redirect(). This leads to the creation of a new request object, one that does not contain the headers_out table from the original.
Assuming that mod_rewrite causes no further redirects, the response is generated from the new request object, which will never have the appropriate (original) headers assigned to it. It is possible to get around this by working in a per-server context (in the main configuration or in a <VirtualHost>), but...
The Redirect Problem
Unfortunately, it turns out that it's largely irrelevant anyway, since even if we do use mod_rewrite in a server context, the path the response takes in the event of a redirect still causes the headers that the module set to be tossed out.
When the request is received by Apache, through a chain of function calls it makes its way to ap_process_request(). This in turn calls ap_process_request_internal(), where the bulk of the important request parsing steps occur (including the invocation of mod_rewrite). It returns an integer status code, which in the case of your redirect happens to be set to 301.
Most requests return OK (which has a value of 0), leading immediately to ap_finalize_request_protocol(). However, that's not the case here:
if (access_status == OK) {
    ap_finalize_request_protocol(r);
}
else {
    r->status = HTTP_OK;
    ap_die(access_status, r);
}
ap_die() does some additional manipulation (like returning the response code back to 301), and in this particular case ends with a call to ap_send_error_response().
Luckily, this is finally the root of the problem. Though it might seem like it, things are not "assbackwards", and this causes the destruction of the original headers. There's even a comment about it in the source:
if (!r->assbackwards) {
    apr_table_t *tmp = r->headers_out;
    /* For all HTTP/1.x responses for which we generate the message,
     * we need to avoid inheriting the "normal status" header fields
     * that may have been set by the request handler before the
     * error or redirect, except for Location on external redirects.
     */
    r->headers_out = r->err_headers_out;
    r->err_headers_out = tmp;
    apr_table_clear(r->err_headers_out);
    if (ap_is_HTTP_REDIRECT(status) || (status == HTTP_CREATED)) {
        if ((location != NULL) && *location) {
            apr_table_setn(r->headers_out, "Location", location);
        }
        //...
    }
    //...
}
Take note that r->headers_out is replaced, and the original table is cleared. That table had all of the information that was expected to show up in the response, so now it is lost.
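The effect of that swap can be illustrated with a small Python model. This is only a sketch of the table handling quoted above, using plain dicts in place of the two APR tables; the function name is made up for illustration.

```python
def headers_after_error_response(headers_out, err_headers_out, status):
    """Model the header swap performed for generated error/redirect responses.

    Both arguments are plain dicts standing in for r->headers_out and
    r->err_headers_out (an assumption for this sketch).
    """
    # Location is read from the original table before it is discarded.
    location = headers_out.get("Location")
    # The error headers become the response headers; the originals are dropped.
    new_headers_out = dict(err_headers_out)
    # Mirrors ap_is_HTTP_REDIRECT(status) || status == HTTP_CREATED.
    if (300 <= status < 400) or status == 201:
        if location:
            new_headers_out["Location"] = location
    return new_headers_out


original = {"Location": "http://www.example.com/", "Vary": "Accept-Language"}
print(headers_after_error_response(original, {}, 301))
```

In this model only Location survives from the original table, while anything that was already in the error-header table is carried through to the response.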
Conclusion
If you don't redirect and you define the rules in a per-server context, everything does seem to work correctly. However, this is not what you want. I can see a potential workaround, but I'm not sure if it would be acceptable, not to mention the need to recompile the server.
As for the Vary: Accept-Encoding, I can only assume it comes from a different module that behaves in a way that allows the header to sneak through. I'm also not sure why Gumbo didn't have an issue when trying it.
For reference, I was looking at the 2.2.14 and 2.2 trunk source code, and I was modifying and running Apache 2.2.15. There doesn't appear to be any significant differences between the versions in the related code sections.

You may want to try something like the following as a workaround:
<LocationMatch "^.*lang\=">
Header onsuccess merge Vary "Accept-Language"
</LocationMatch>

To specifically set the Vary: Accept-Language HTTP response header on the redirect response only (which is what's expected here), you would need to set an environment variable (eg. VARY_ACCEPT_LANGUAGE) as part of the redirect rule and use this to set the header conditionally with the Header directive.
You also need to use the always condition (as opposed to the default onsuccess) with the Header directive in order to set this on the 3xx response (i.e. non-200 responses).
For example:
# Redirect requests that have an empty Accept-Language header and "lang=en" is present
RewriteCond %{HTTP:Accept-Language} ^$
RewriteCond %{QUERY_STRING} ^lang=en
RewriteRule ^$ /? [E=VARY_ACCEPT_LANGUAGE:1,R=301,L]
# Set/Merge "Vary" header on Accept-Language redirect
Header always merge Vary "Accept-Language" env=VARY_ACCEPT_LANGUAGE
HOWEVER, the Vary header shouldn't only be set on the redirect response (when the Accept-Language header is empty), it needs to be set on all responses to requests for /?lang=en, regardless of what the Accept-Language HTTP request header is actually set to. So, relying on Apache to set this header using only the redirect would not be sufficient anyway (even if it did set the header on the response as initially expected).
In order to set the appropriate Vary header on all responses to requests for /?lang=en, including the redirect, do it like this:
# Set env var if "/?lang=en" is requested
RewriteCond %{QUERY_STRING} ^lang=en
RewriteRule ^$ - [E=VARY_ACCEPT_LANGUAGE:1]
# Redirect requests that have an empty Accept-Language header and "lang=en" is present
RewriteCond %{HTTP:Accept-Language} ^$
RewriteCond %{QUERY_STRING} ^lang=en
RewriteRule ^$ /? [R=301,L]
# Set/Merge "Vary" header on all responses from "/?lang=en"
Header always merge Vary "Accept-Language" env=VARY_ACCEPT_LANGUAGE
Note, however, that if you have additional internal rewrite directives that cause the rewrite engine to start over then the env var VARY_ACCEPT_LANGUAGE is renamed to REDIRECT_VARY_ACCEPT_LANGUAGE and the above Header directive will not be successful. You'll probably need an additional directive to handle this. For example:
Header always merge Vary "Accept-Language" env=REDIRECT_VARY_ACCEPT_LANGUAGE

Related

Sanitizing url and parameters

Currently, my software has the following workflow
User performs a search through a REST API and selects an item
Server performs the same search again to validate the user's selection
In order to implement step 2, the user has to send the URL params that he used for his search as a string (ex. age=10&gender=M).
The server will then http_get(url + "?" + params_str_submitted_by_user)
Can a malicious user make the server connect to an unintended server by manipulating params_str_submitted_by_user?
What is the worst case scenario if even newlines are left in and the user can arbitrarily manipulate the HTTP headers?
Because you append params_str_submitted_by_user to the base URL after the ? delimiter, you are safe from the type of attack in which the intended domain is turned into a username or password:
Say the URL was http://example.com and params_str_submitted_by_user was @evil.com, and you did not have the / or ? characters in your URL string concatenation.
This would make your URL http://example.com@evil.com, which actually means username example.com at domain evil.com.
However, the username cannot contain the ? (or slash) character, so you should be safe as long as that delimiter is always part of the concatenation. In your case the URL becomes:
http://example.com?@evil.com
or
http://example.com/?@evil.com
if you include the slash in your base URL (better practise). These are safe because all they do is pass @evil.com to your website as a query string value; @evil.com is no longer interpreted as a domain by the parser.
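The difference can be checked with a URL parser. The sketch below uses Python's urllib.parse as one example of how a parser splits these strings; the helper name target_host is hypothetical, and your own http_get may parse URLs differently.

```python
from urllib.parse import urlsplit


def target_host(base_url, user_params, separator="?"):
    """Return the host a parser would connect to after concatenation."""
    return urlsplit(base_url + separator + user_params).hostname


# No delimiter: the original host is parsed as userinfo, evil.com as the host.
print(target_host("http://example.com", "@evil.com", separator=""))
# With the ? delimiter the user input stays in the query string.
print(target_host("http://example.com", "@evil.com"))
print(target_host("http://example.com/", "@evil.com"))
```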
What is the worst case scenario if even newlines are left in and the user can arbitrarily manipulate the HTTP headers?
This depends on how good your http_get function is at sanitizing values. If http_get does not strip newlines internally it could be possible for an attacker to control the headers sent from your application.
e.g. If http_get internally created the following request
GET <url> HTTP/1.1
Host: <url.domain>
so under legitimate use it would work like the following:
http_get("https://example.com/foo/bar")
generates
GET /foo/bar HTTP/1.1
Host: example.com
an attacker could set params_str_submitted_by_user to
<space>HTTP/1.1\r\nHost: example.org\r\nCookie: foo=bar\r\n\r\n
this would cause your code to call
http_get("https://example.com/" + "?" + "<space>HTTP/1.1\r\nHost: example.org\r\nCookie: foo=bar\r\n\r\n")
which would cause the request to be
GET /? HTTP/1.1
Host: example.org
Cookie: foo=bar
HTTP/1.1
Host: example.com
Depending on how http_get parses the domain this might not cause the request to go to example.org instead of example.com - it is just manipulating the header (unless example.org was another site on the same IP address as your site). However, the attacker has managed to manipulate headers and add their own cookie value. The advantage to the attacker depends on what can be gained under your particular setup from them doing this - there is not necessarily any general advantage, it would be more of a logic flaw exploit if they could trick your code into behaving in an unexpected way by causing it to make requests under the control of the attacker.
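To make the injection concrete, here is a deliberately unsafe request builder in Python. It is a hypothetical stand-in for an http_get that does no sanitizing, not a description of any real library; it only formats the request text and sends nothing.

```python
def naive_request(url):
    """Build raw HTTP request text the way an unsanitized http_get might."""
    rest = url.split("://", 1)[1]        # drop the scheme
    host, _, path = rest.partition("/")  # host is everything up to the first /
    return "GET /" + path + " HTTP/1.1\r\nHost: " + host + "\r\n\r\n"


payload = " HTTP/1.1\r\nHost: example.org\r\nCookie: foo=bar\r\n\r\n"
request = naive_request("https://example.com/" + "?" + payload)
for line in request.split("\r\n"):
    print(repr(line))
```

The attacker-supplied Host and Cookie lines end up ahead of the ones the builder itself adds, and the blank line in the payload terminates the header section early. Note that the request line contains the ? because it is part of the concatenated URL.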
What should you do?
To guard against the unexpected and unknown, either use a version of http_get that handles header injection properly (many modern languages now deal with this situation internally),
or, if http_get is your own implementation, make sure it sanitizes or rejects URLs that contain invalid characters such as carriage returns, line feeds, and anything else that is not valid in a URL. See this question for a list of valid characters.
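One possible way to do the rejecting and sanitizing, sketched in Python: refuse control characters outright, then parse the submitted string into key/value pairs and re-encode it. The function name safe_query is an assumption, and whether strict parsing suits your parameters depends on your API.

```python
from urllib.parse import parse_qsl, urlencode


def safe_query(params_str):
    """Validate a user-submitted query string and return a re-encoded copy."""
    # Reject CR, LF, and any other ASCII control character.
    if any(ord(ch) < 0x20 or ord(ch) == 0x7F for ch in params_str):
        raise ValueError("control character in query string")
    # strict_parsing raises ValueError for fields that are not key=value.
    pairs = parse_qsl(params_str, keep_blank_values=True, strict_parsing=True)
    # urlencode percent-encodes everything that is not safe in a query value.
    return urlencode(pairs)


print(safe_query("age=10&gender=M"))
print(safe_query("x=@evil.com"))
```

The server would then request url + "?" + safe_query(params_str_submitted_by_user) rather than concatenating the raw input.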

Change Referrer in header using Varnish

I think this is a possibility with Varnish: you can change the referrer in the header of its users' requests and then serve them the content either from cache or from the server. I want to know how that can be made possible.
I tried this with "req.http.referer" and then "set req.http.referer" in Varnish 2.1 on a CentOS 32-bit machine, but it didn't work when I checked the results with the command "varnishtop -i TxHeader -I Referer".
Anyone got any other ideas better than this?
At least on Varnish 3.0 the following works as expected. Obviously if the response is served from cache and you are not using the req.http.Referer for hash(), it doesn't matter how you change the referer header.
# Modify Referer header
sub vcl_recv {
    if (req.http.Referer) {
        # Referer was set. Replace foo with bar
        set req.http.Referer = regsub(req.http.Referer,"foo","bar");
    } else {
        # Referer was not set. Set it to something anyway.
        set req.http.Referer = "http://referer.was.empty/";
    }
}
Also note that varnishtop -i TxHeader -I Referer is case sensitive. If you set req.http.referer then it will not match -I Referer, even though your HTTP backend will understand the referer: header just as well (according to RFC 2616 section 4.2, message header field names are case-insensitive).

Fiddler: Creating an AutoResponse rule to map all calls to one host to another host

Example:
I want to create one AutoResponse rule that will map all calls to one host to another host, but preserve the urls. Examples
http://hostname1/foo.html -> http://hostname2/foo.html
and
http://hostname1/js/script.js -> http://hostname2/js/script.js
in one rule.
For now, I've accomplished this by creating an AutoResponse rule for every URL my project calls, but I'm sure there must be a way to write one rule using the right wildcards. I looked at http://www.fiddler2.com/Fiddler2/help/AutoResponder.asp, but I couldn't see how to do it. The wildcards all seem to be around the matching and not the action.
Full context: I'm developing on a beta platform and Visual Studio is borked in such a way that it is sending all the requests to http://localhost:24575 when my project is actually running on http://localhost:56832
This is how I configured Fiddler2 :
I want to redirect all requests from http://server-name/vendor-portal-html/ to http://localhost/vendor-portal-html/
My configuration is as follows:
REGEX:.*/vendor-portal-html/(.*) to http://127.0.0.1/vendor-portal-html/$1
Thanks to EricLaw for above comment.
To map from one host to another, don't use AutoResponder. Instead, click Tools > Hosts.
Alternatively, you can click Rules > Customize Rules, scroll to OnBeforeRequest and write a bit of code:
if (oSession.HostnameIs("localhost") && (oSession.port == 24575)) oSession.port = 56832;
Because this was way harder to find than it should have been, here is how to use Fiddler to redirect all requests for one host to another host:
Use the AutoResponder tab to set a rule such that any request matching your old host will redirect to your new host with the path and query string appended.
Match with regex options ix to make it case-insensitive and ignore whitespace. Leave off the n option as it requires explicitly named capture groups.
Capture the path and query string of the request and append it to the redirect response using the variable $1, where the path+query is the first capture group. You can use capture groups $1-$n if your regex has more.
Fiddler will then issue an HTTP 307 redirect response.
Request: regex:^(?ix)http://old.host.com/(.*)$ #Match HTTP host
Response: *redir:http://new.host.com/$1
Request
GET http://old.host.com/path/to/file.html HTTP/1.1
Host: old.host.com
Accept: */*
Accept-Encoding: gzip, deflate
Connection: keep-alive
Response
HTTP/1.1 307 AutoRedir
Content-Length: 0
Location: http://new.host.com/path/to/file.html
Cache-Control: max-age=0, must-revalidate
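The capture-and-append idea can be tried outside Fiddler. The following Python sketch is only an approximation of the rule above: Python takes the i and x options as flags rather than an inline (?ix) after ^, the replacement is built by concatenation instead of $1, and the dots in the host are escaped here, which the original rule does not do. The function name is hypothetical.

```python
import re

# Equivalent of the ix options: case-insensitive, whitespace in pattern ignored.
OLD_HOST_RULE = re.compile(
    r"^http://old\.host\.com/(.*)$",
    re.IGNORECASE | re.VERBOSE,
)


def redirect_target(url):
    """Return the new-host URL for a matching request, or None if no match."""
    match = OLD_HOST_RULE.match(url)
    if match is None:
        return None
    # Group 1 holds the path plus query string of the original request.
    return "http://new.host.com/" + match.group(1)


print(redirect_target("http://old.host.com/path/to/file.html"))
```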
Mapping requests with Fiddler Autoresponder using Regular Expressions is possible.
This can be done with regexp rules. However, this doesn't seem to be documented anywhere.
If you add a rule and use regular expressions within parentheses, these matches can be used in the desired mapping via the placeholders $1 ... $n;
each number corresponds to the matched regexp group in the rule.
Example of Rule: regex:http://server1/(\w*) -> http://server2/$1
This will result in the following mapping: http://server1/foo.html -> http://server2/foo.html

apache2 tomcat6 mod_rewrite with pretty urls loses user session info - empties shopping cart

I have tried this with both mod_jk and mod_proxy and get the same result.
Using this mod_rewrite rule works fine:
RewriteRule ^/(.*)\-blah.html$ /blah/blah/blah?blah=l2vb&party_name=$1 [R,L]
The trouble with this is the ugly new URL /blah/blah/blah?blah=l2vb&party_name is displayed in the address line of the browser, which is what I'd hoped to avoid. It seems to be the [R] flag that does this.
The following rule hides the ugly URL and displays only the new pretty one:
RewriteRule ^/(.*)\-blah.html$ /blah/blah/blah?blah=l2vb&party_name=$1 [P,L]
NB: The only difference here is the flags at the end between the [].
The trouble is that if the user already had something in their shopping cart it gets emptied. Somehow their connect session (or whatever it is - rather out of my depth here!) gets re-initialised so they appear to be starting from scratch.
I have tried several other combinations of flags, like [PT,L], [R,PT] etc and had no luck so far.
The [R] flag means 302 Redirect Code, which obviously changes the URL in a browser.
I think you need QSA flag:
RewriteRule ^/(.*)\-blah.html$ /blah/blah/blah?blah=l2vb&party_name=$1 [QSA,L]
The QSA flag will preserve the existing query string (to be more precise, it will append it to the new URL), which otherwise gets lost because the rule replaces the query string. I think the session ID or something similar may be passed via the query string, and when the URL gets rewritten it is lost, so the server creates a new session. If that is the case, then the above should solve your problem.
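A small Python model of what QSA changes, assuming a rule equivalent to the one above. The function name, the sample path, and the sid parameter are made up for illustration; this mimics the query-string handling only, not mod_rewrite itself.

```python
import re

RULE = re.compile(r"^/(.*)-blah\.html$")


def rewrite(path, query, qsa=False):
    """Apply the rule to (path, query) and return the rewritten pair."""
    match = RULE.match(path)
    if match is None:
        return path, query
    new_query = "blah=l2vb&party_name=" + match.group(1)
    # Without QSA the substitution's query string replaces the original one.
    # With QSA the original query string is appended after the new one.
    if qsa and query:
        new_query += "&" + query
    return "/blah/blah/blah", new_query


print(rewrite("/acme-blah.html", "sid=42"))
print(rewrite("/acme-blah.html", "sid=42", qsa=True))
```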
Apache documentation: http://httpd.apache.org/docs/current/rewrite/flags.html#flag_qsa

Mod_rewrite and MySql

I have a URL, e.g. www.example.com/user.php?user_id=9, where the user_id field maps to one of the primary keys in the user table. I don't want the URL to look like this; instead I want a URL like www.example.com/user/Aditya-Shukla. I am using Apache 2, and I understand that the mod_rewrite module has sets of rewriting rules which can be used to create URL aliases.
My question is:
All my hrefs are in the form www.example.com/user.php?user_id=9. So to change the URL, I suppose I have to change all the hrefs to www.example.com/user/Aditya-Shukla and, for the rewriting rule, do a query to get a record?
Is there a better solution?
No, mod_rewrite does not have sets of rewriting rules. It rather provides directives to build rules based on regular expression patterns that can be combined with additional conditions.
In your case you would build a rule that takes any requested URL path that starts with /user/ and has another path segment following and rewrites it internally to your user.php, like:
RewriteEngine on
RewriteRule ^/user/([^/]+)$ /user.php?name=$1
The first directive RewriteEngine on is just to enable mod_rewrite. And the second directive RewriteRule … is the rule as described above: ^/user/([^/]+)$ is the pattern that matches any URL path that starts with /user/ (i.e. ^/user/) and that is followed by one path segment (i.e. ([^/]+)$). That request is then rewritten internally to /user.php while the matched path segment behind the /user/ is used as a parameter value for the name parameter ($1 is a reference to the matched value of the first group denoted with (…)).
So this will rewrite a request of /user/Aditya-Shukla internally to /user.php?name=Aditya-Shukla. You can then use that user name and look it up in your table.
You can either add a RewriteRule that will rewrite user/Aditya-Shukla to user.php?user_name=Aditya-Shukla and handle the rest in your code.
RewriteEngine On
RewriteRule ^user/(.*)$ user.php?user_name=$1
Or using a RewriteMap directive to lookup usernames, which will allow to rewrite user/Aditya-Shukla directly to user.php?user_id=9
I presume that within your own site you will always create the canonical form of the URL, i.e.:
/user/Aditya-Shukla
...and you are just having to deal with outside links that are not in canonical form, i.e. "old links" like:
www.example.com/user.php?user_id=9
mod_rewrite may not be suitable for remapping in this situation. I am presuming you may have very many users, and that number may grow. mod_rewrite does have a RewriteMap directive and yes there are ways to generate your map dynamically, but I don't think that would be a good design (creating a map of userId-to-userName dynamically every time your rewrite rule matches...)
Instead you should simply write your user.php code to lookup the correct userName, assemble the canonical form of URL you want, and send a redirect back to the client. Something like:
Header( "HTTP/1.1 301 Moved Permanently" );
Header( "Location: http://www.example.com/user/Aditya-Shukla" );
You should probably also use a 301 redirect (instead of 302) to indicate this is a "permanent" URL change, which will help search bots index your site correctly if it encounters an "old style" URL out there.
-broc
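The lookup-then-redirect approach suggested above could look roughly like the following, written in Python rather than PHP purely as an illustration. The USERS dict is a hypothetical stand-in for the user table, and the slug rule (spaces to hyphens) is an assumption about how names map to URLs.

```python
from urllib.parse import quote

# Hypothetical stand-in for a query against the user table.
USERS = {9: "Aditya Shukla"}


def canonical_redirect(user_id):
    """Return (status, headers) for an old-style user.php?user_id=N request."""
    name = USERS.get(user_id)
    if name is None:
        return 404, {}
    # Build the canonical path segment and percent-encode anything unsafe.
    slug = quote(name.replace(" ", "-"))
    return 301, {"Location": "http://www.example.com/user/" + slug}


print(canonical_redirect(9))
```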
