Blocking user agent - mod-rewrite

Can someone tell me how to block the following user agent using Apache 2 mod_rewrite or any other method?
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:63.0) Gecko/20100101 Firefox/A1E1

To block that specific user-agent in Apache config (or per-directory .htaccess file) using mod_rewrite, you can do something like this:
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} "=Mozilla/5.0 (Windows NT 6.1; WOW64; rv:63.0) Gecko/20100101 Firefox/A1E1"
RewriteRule ^ - [F]
This serves a 403 Forbidden for any request from that exact user-agent.
The regex (first argument to the RewriteRule directive) ^ (start-of-string assertion) is successful for every request, whilst the single - (hyphen) in the substitution string (2nd argument) indicates no substitution (we are simply blocking the request, not rewriting the URL).
Prefixing the CondPattern (2nd argument to the RewriteCond directive) with = makes it a lexicographical string comparison (i.e. an exact match), not a regular expression. The surrounding double quotes are required since the string we are matching contains spaces.
The F flag is equivalent to R=403. The L flag is not required since it is implied when returning a non-3xx (or 2xx) status.
To return a "404 Not Found" instead of a "403 Forbidden" use the R=404 flag instead of F.
UPDATE:
Can we add a wildcard entry? The last part of Mozilla/5.0 (Windows NT 6.1; WOW64; rv:63.0) Gecko/20100101 Firefox/A1E1 (the /A1E1) keeps changing.
Yes, but you'll need to change the above CondPattern to a regex.
For example:
RewriteCond %{HTTP_USER_AGENT} "^Mozilla/5\.0 \(Windows NT 6\.1; WOW64; rv:63\.0\) Gecko/20100101 Firefox/"
The above matches any user agent that starts with Mozilla/5.0 (Windows NT 6.1; WOW64; rv:63.0) Gecko/20100101 Firefox/, leaving the end of the user-agent string free to vary.
Note that since this is now a regex, any special regex meta characters need to be backslash-escaped. In this example, that means the dots (.) and the parentheses; left unescaped, the parentheses would form a capturing group and the pattern would no longer match the literal ( and ) in the user agent. The surrounding double quotes can still be used to avoid having to escape the spaces.
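The prefix match described above can be sanity-checked outside Apache. The following is only a sketch: it uses grep -E (POSIX extended regex) rather than the PCRE engine Apache uses, on the assumption that this particular pattern behaves the same in both, and the helper names ua_is_blocked and report are made up for illustration.

```shell
#!/bin/sh
# Prefix pattern with the dots and parentheses escaped so they match literally.
ua_prefix_re='^Mozilla/5\.0 \(Windows NT 6\.1; WOW64; rv:63\.0\) Gecko/20100101 Firefox/'

# Hypothetical helper (not part of Apache): exit status 0 means "would be blocked".
ua_is_blocked() {
    printf '%s\n' "$1" | grep -Eq -- "$ua_prefix_re"
}

# Hypothetical helper: print the verdict for one user-agent string.
report() {
    if ua_is_blocked "$1"; then
        printf 'blocked: %s\n' "$1"
    else
        printf 'allowed: %s\n' "$1"
    fi
}

# The first two differ only after "Firefox/"; the third is an unrelated browser.
report 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:63.0) Gecko/20100101 Firefox/A1E1'
report 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:63.0) Gecko/20100101 Firefox/ZZ99'
report 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:63.0) Gecko/20100101 Firefox/63.0'
```

Only the first two strings should be reported as blocked, since they share the fixed prefix and differ only in the part after Firefox/.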

Related

Replace html encoding with url encoding

We are getting random hits like:
/abc?p=2&amp;utm_campaign=xyz-campaign&amp;utm_medium=email&amp;utm_source=newsletter
Notice &amp;, which is the HTML encoding for &. I have checked the possible sources, but they all contain a plain '&' only.
I want to replace every &amp; with &. Is there a way to achieve this? For now I strip the whole query string using the rule below, but this is not right!
RewriteCond %{QUERY_STRING} (&amp;)
RewriteRule (.*?) https://%{HTTP_HOST}%{REQUEST_URI}? [R=302,L]
Capture the referer in your logs so you can find the source of the problem and (also) fix it there.
RewriteCond %{THE_REQUEST} ^\S++\s++([^\s?]*+\?\S*?&)amp;(\S*+)
RewriteRule ^ %1%2 [DPI,L,R=301]
That will do one redirect per ampersand without messing about decoding and re-encoding the URL. Less efficient but more reliable.
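To make the "one redirect per ampersand" behaviour concrete, here is a rough simulation in shell. It is a sketch, not what Apache executes: sed stands in for the RewriteRule, the helper name fix_one_amp is made up, and unlike the real rule it does not check that the &amp; sits after the ? in the request.

```shell
#!/bin/sh
# Hypothetical helper: replace only the FIRST "&amp;" with "&",
# which is what a single pass of the redirect rule achieves.
fix_one_amp() {
    printf '%s\n' "$1" | sed 's/&amp;/\&/'
}

url='/abc?p=2&amp;utm_campaign=xyz-campaign&amp;utm_medium=email&amp;utm_source=newsletter'
hops=0

# Keep "redirecting" until a pass changes nothing.
while :; do
    next=$(fix_one_amp "$url")
    [ "$next" = "$url" ] && break
    url=$next
    hops=$((hops + 1))
    printf 'redirect %d -> %s\n' "$hops" "$url"
done
```

With three encoded ampersands in the request, the loop runs three times before the URL is clean, which is the efficiency cost the answer mentions.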

How can I pass argument stored on a variable to WGET

I'm writing a bash script that extensively uses wget. To define all common parameters in one place I store them on variables. Here's a piece of code:
useragent='--user-agent="Mozilla/5.0 (Windows NT 6.1; WOW64; rv:27.0) Gecko/20100101 Firefox/27.0"'
cookies_file="/tmp/wget-cookies.txt"
save_cookies_cmd="--save-cookies $cookies_file --keep-session-cookies"
load_cookies_cmd="--load-cookies $cookies_file --keep-session-cookies"
function mywget {
log "#!!!!!!!!!# WGET #!!!!!!!!!# wget $quiet $useragent $load_cookies_cmd $@"
wget $useragent $load_cookies_cmd "$@"
}
Sadly, it isn't working. Somehow I'm missing the right way to store parameters in the variables $useragent, $save_cookies_cmd and $load_cookies_cmd, and to call wget passing these variables as parameters.
I want the result commandline as this:
wget --user-agent="Mozilla/5.0 (Windows NT 6.1; WOW64; rv:27.0) Gecko/20100101 Firefox/27.0" --load-cookies /tmp/wget-cookies.txt --keep-session-cookies http://mysite.local/myfile.php
Drop the inner quotes when setting $useragent, but retain the double quotes when you use it:
useragent='--user-agent=Mozilla/5.0 (Windows NT 6.1; WOW64; rv:27.0) Gecko/20100101 Firefox/27.0'
...
wget "$useragent" $load_cookies_cmd "$@"
To understand why this works, notice that wget --user-agent="string with spaces" is entirely equivalent to wget "--user-agent=string with spaces". Wget receives (and requires) the --user-agent=... option as a single argument, regardless of the positioning of the quotes.
The quotes serve to prevent the shell from splitting the string, which is why wget "$useragent" is necessary. On the other hand, the definition of user-agent needs quotes for the assignment to work, but doesn't need a second level of quotes, because those would be seen by Wget and become part of the user-agent header sent over the wire, which you don't want.
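The effect of the quotes can be seen without running wget at all. This is a small sketch with a made-up helper, count_args, that just reports how many arguments it received; it assumes the default IFS (space, tab, newline).

```shell
#!/bin/sh
# Hypothetical helper: print the number of arguments received.
count_args() {
    echo "$#"
}

# No inner quotes: the value is exactly what wget should see as one argument.
useragent='--user-agent=Mozilla/5.0 (Windows NT 6.1; WOW64; rv:27.0) Gecko/20100101 Firefox/27.0'

# Unquoted expansion: the shell splits the value at every space.
unquoted_count=$(count_args $useragent)

# Quoted expansion: the value is passed through as a single argument.
quoted_count=$(count_args "$useragent")

printf 'unquoted: %s argument(s)\n' "$unquoted_count"
printf 'quoted:   %s argument(s)\n' "$quoted_count"
```

The quoted expansion yields one argument, while the unquoted one is split into several words, each of which wget would treat as a separate option or URL.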

Word Splitting on options substitution

In the following shell script I am unable to set a user-agent with spaces in it. I am getting word splitting. The bit after the first space (i.e. "(Macintosh;") is being interpreted by curl as a url.
If I type it into the console it works fine, but not when I use substitution.
PARAMS="-v"
PARAMS="${PARAMS} --user-agent \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_3) AppleWebKit/536.28.10 (KHTML, like Gecko)\"" #does not work
#PARAMS="${PARAMS} --user-agent \"Mozilla/5.0\"" #works
curl ${PARAMS} $1 > results.txt
Can someone please explain why?
The problem is explained in the Bash FAQ
The solution is a slightly different syntax.
PARAMS=(-v)
PARAMS+=(-A "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_3) AppleWebKit/536.28.10 (KHTML, like Gecko)")
curl "${PARAMS[@]}" "$1" > results.txt
From here: http://wiki.bash-hackers.org/syntax/quoting
These quote characters (", double quote and ', single quote) are a syntax element that influences parsing. It is not related to eventual quote characters that are passed as text to the commandline! The syntax-quotes are removed before the command is called!
So there is a fundamental difference between cmd "my args" and myargs="\"my args\""; cmd $myargs.
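The difference between the two forms can be checked by counting arguments instead of calling curl. This is a sketch under a few assumptions: it needs bash (for arrays), it relies on the default IFS, and the names count_args, flat and params are illustrative only.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the number of arguments received.
count_args() {
    echo "$#"
}

ua='Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_3) AppleWebKit/536.28.10 (KHTML, like Gecko)'

# String form: the embedded \" characters are plain text, not shell syntax,
# so the unquoted expansion is still split at every space.
flat="-v --user-agent \"$ua\""
flat_count=$(count_args $flat)

# Array form: each element stays one argument, spaces included.
params=(-v)
params+=(--user-agent "$ua")
array_count=$(count_args "${params[@]}")

printf 'string form: %s arguments\n' "$flat_count"
printf 'array form:  %s arguments\n' "$array_count"
```

The array form passes exactly three arguments (-v, --user-agent and the full user-agent string), whereas the string form breaks the user agent into many fragments, which is why curl treats "(Macintosh;" as a URL.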
Try replacing the spaces with %20
You can do this in the script if you want like:
str_replace ( ' ', '%20', 'what you need here' );
Hope this helps.

mod_rewrite, help me forward a page

I'm sorry but I don't really completely understand how mod_rewrite works but I'd like to basically change the url:
/index.php?category=value1&video=value2
to be accessed via /value1/value2
could anybody tell me how to do this? thanks^^
Try this one here:
RewriteEngine on
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^/?(.*)/(.*)$ index.php?category=$1&video=$2 [L]
The first line enables mod_rewrite.
The second line is a condition which checks whether a file with that name exists. If it does not, processing continues with the next line.
The third line is a regular expression. The ^ marks the beginning and the $ the end of it. The /? means that there may be an optional / at the beginning (this depends on the server configuration). The (.*) means a run of zero or more characters. The parentheses mean that there is a group which can be referenced as $n, here as $1 and $2.
Please note that AllowOverride All must be enabled in the server configuration.
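To see what the pattern in that RewriteRule captures, the same regex can be tried with sed. This is only an approximation: sed -E uses POSIX extended regex rather than Apache's PCRE, and the helper name map_path is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: apply the rule's pattern and substitution to a URL path.
# \1 and \2 play the role of $1 and $2 in the RewriteRule; \& is a literal "&".
map_path() {
    printf '%s\n' "$1" | sed -E 's#^/?(.*)/(.*)$#index.php?category=\1\&video=\2#'
}

map_path 'value1/value2'
map_path '/value1/value2'
# Because (.*) is greedy, everything up to the LAST slash lands in the first group.
map_path 'a/b/c'
```

The last call shows a side effect of using (.*) for both groups: with more than two path segments, the extra segments end up in the category value. A stricter pattern such as ([^/]+) for each group would avoid that, if that matters for the site.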

Isapi Rewrite - access query string pattern matches in RewriteRule

I'm using Isapi Rewrite 3 (mod rewrite clone for IIS) and trying to rewrite URLs based on the query string - and then pass on part of that query string in the rewrite.
So if I enter a URL like this: /test/home.cfm?page=default&arbitraryExtraArg=123
I want that to be rewritten as: /test/index.cfm?page=home&arbitraryExtraArg=123
I have the following condition/rule:
RewriteCond %{QUERY_STRING} ^page=default(.*)$ [I]
RewriteRule ^/test/home.cfm$ /test/index.cfm?page=home%1 [I,R=301]
But the extra query string variables are never passed. %1 seems to be blank.
This is how to reference a pattern match from the RewriteCond, right?
What am I missing here?
Thanks!
EDIT: It looks to me like the RewriteCond is being totally ignored. Here is what my logfile looks like:
[snip] (2) init rewrite engine with requested uri /test/home.cfm?page=default
[snip] (1) Htaccess process request C:\Inetpub\wwwroot\dev\IsapiRewrite\httpd.conf
[snip] (3) applying pattern '^/test/home\.cfm$' to uri '/test/home.cfm'
[snip] (1) escaping /test/index.cfm?page=home
[snip] (2) explicitly forcing redirect with http://www.devsite.com/test/index.cfm?page=home
[snip] (2) internal redirect with /test/home.cfm?page=default [INTERNAL REDIRECT]
[snip] (2) init rewrite engine with requested uri /test/index.cfm?page=home
[snip] (1) Htaccess process request C:\Inetpub\wwwroot\dev\IsapiRewrite\httpd.conf
[snip] (3) applying pattern '^/test/home\.cfm$' to uri '/test/index.cfm'
Should there be mention of the RewriteCond pattern check in there?
But the extra query string variables are never passed. %1 seems to be blank.
%1 is blank because, according to the log, the request you made was /test/home.cfm?page=default, without a second parameter.
The absence of RewriteCond processing in the log may be due to a low RewriteLogLevel.
And the config should be:
RewriteCond %{QUERY_STRING} ^page=default(.+)$ [NC]
RewriteRule ^/test/home.cfm$ /test/index.cfm?page=home%1 [NC,R=301,L]
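The point about the empty capture can be reproduced with sed. This is a sketch only: sed -E stands in for the ISAPI Rewrite regex engine, the case-insensitive flag is ignored, and the helper name rewrite_target is made up. It uses the question's original (.*) pattern to show why %1 came out blank.

```shell
#!/bin/sh
# Hypothetical helper: given a query string, print the redirect target the rule
# would build, or nothing at all when the condition does not match.
# \1 plays the role of %1 (the RewriteCond capture).
rewrite_target() {
    printf '%s\n' "$1" | sed -nE 's#^page=default(.*)$#/test/index.cfm?page=home\1#p'
}

# Request from the log: nothing follows "page=default", so the capture is empty.
rewrite_target 'page=default'

# Request with an extra parameter: the capture carries it over.
rewrite_target 'page=default&arbitraryExtraArg=123'

# Unrelated query string: the condition fails and nothing is printed.
rewrite_target 'page=other'
```

With (.+) instead of (.*), as in the config above, the first request would not match the condition at all, so it would not be redirected.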
I think ISAPI Rewrite is a little different to mod_rewrite:
Whenever you put parentheses in any regular expression present in a complex rule (a rule with conditions) you are marking a submatch that could be used in a format string (using $N syntax) or as a back-reference (using \N syntax) in subsequent conditions. These submatches are global for the whole complex rule (RewriteRule directive and corresponding RewriteCond directives). Submatches are numbered from up to down and from left to right beginning from the first RewriteCond directive (if such directive exists) corresponding to the RewriteRule.
So try $1 to get the match of the first group no matter where it appears:
RewriteCond %{QUERY_STRING} ^page=default(.*)$ [I]
RewriteRule ^/test/home\.cfm$ /test/index.cfm?page=home$1 [I,R=301]
ISAPI Rewrite 3 seems to be more compatible with Apache’s mod_rewrite.
I believe this works also:
RewriteRule ^/test/home.cfm\?page=default(.*)$ /test/index.cfm?page=home%1 [I,R=301]

Resources