I'm writing a bash script that uses wget extensively. To define all common parameters in one place, I store them in variables. Here's a piece of the code:
useragent='--user-agent="Mozilla/5.0 (Windows NT 6.1; WOW64; rv:27.0) Gecko/20100101 Firefox/27.0"'
cookies_file="/tmp/wget-cookies.txt"
save_cookies_cmd="--save-cookies $cookies_file --keep-session-cookies"
load_cookies_cmd="--load-cookies $cookies_file --keep-session-cookies"
function mywget {
log "#!!!!!!!!!# WGET #!!!!!!!!!# wget $quiet $useragent $load_cookies_cmd $@"
wget $useragent $load_cookies_cmd "$@"
}
Sadly, it isn't working. I'm missing the right way to store parameters in the variables $useragent, $save_cookies_cmd and $load_cookies_cmd, and to call wget with these variables as parameters.
I want the resulting command line to be:
wget --user-agent="Mozilla/5.0 (Windows NT 6.1; WOW64; rv:27.0) Gecko/20100101 Firefox/27.0" --load-cookies /tmp/wget-cookies.txt --keep-session-cookies http://mysite.local/myfile.php
Drop the inner quotes when setting $useragent, but retain the double quotes when you use it:
useragent='--user-agent=Mozilla/5.0 (Windows NT 6.1; WOW64; rv:27.0) Gecko/20100101 Firefox/27.0'
...
wget "$useragent" $load_cookies_cmd "$@"
To understand why this works, notice that wget --user-agent="string with spaces" is entirely equivalent to wget "--user-agent=string with spaces". Wget receives (and must receive) the --user-agent=... option as a single argument, regardless of the positioning of the quotes.
The quotes serve to prevent the shell from splitting the string, which is why wget "$useragent" is necessary. On the other hand, the definition of user-agent needs quotes for the assignment to work, but doesn't need a second level of quotes, because those would be seen by Wget and become part of the user-agent header sent over the wire, which you don't want.
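The equivalence described above can be checked without running wget at all. The following is a minimal sketch; count_args is a hypothetical helper (not part of the original script) that only reports how many arguments it received, and the user-agent string is shortened for readability.

```shell
#!/usr/bin/env bash
# count_args is a hypothetical helper: it prints how many arguments it received.
count_args() { echo "$#"; }

# Both spellings hand the command exactly one argument.
a=$(count_args --user-agent="Mozilla/5.0 (Windows NT 6.1)")
b=$(count_args "--user-agent=Mozilla/5.0 (Windows NT 6.1)")

# Stored without inner quotes and expanded inside double quotes: still one argument.
useragent='--user-agent=Mozilla/5.0 (Windows NT 6.1)'
c=$(count_args "$useragent")

# Expanded without quotes, the shell splits the value on whitespace.
d=$(count_args $useragent)

echo "quoted option: $a, quoted whole: $b, quoted variable: $c, unquoted variable: $d"
```

Only the unquoted expansion produces more than one argument, which is why the double quotes around $useragent matter at the point of use.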
Can someone tell me how to block the following user agent using Apache2 mod_rewrite or any other method?
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:63.0) Gecko/20100101 Firefox/A1E1
To block that specific user-agent in Apache config (or per-directory .htaccess file) using mod_rewrite, you can do something like this:
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} "=Mozilla/5.0 (Windows NT 6.1; WOW64; rv:63.0) Gecko/20100101 Firefox/A1E1"
RewriteRule ^ - [F]
This serves a 403 Forbidden for any request from that exact user-agent.
The regex (first argument to the RewriteRule directive) ^ (start-of-string assertion) is successful for every request. The single - (hyphen) in the substitution string (2nd argument) indicates no substitution (we are simply blocking the request, not rewriting the URL).
Prefixing the CondPattern (2nd argument to the RewriteCond directive) with = makes it a lexicographical string comparison (i.e. an exact match), not a regular expression. The surrounding double quotes are required since the string we are matching contains spaces.
The F flag is equivalent to R=403. The L flag is not required since it is implied when returning a non-3xx (or 2xx) status.
To return a "404 Not Found" instead of a "403 Forbidden" use the R=404 flag instead of F.
UPDATE:
can we add a wildcard entry like the last part of Mozilla/5.0 (Windows NT 6.1; WOW64; rv:63.0) Gecko/20100101 Firefox/A1E1 keeps changing the /A1E1
Yes, but you'll need to change the above CondPattern to a regex.
For example:
RewriteCond %{HTTP_USER_AGENT} "^Mozilla/5\.0 \(Windows NT 6\.1; WOW64; rv:63\.0\) Gecko/20100101 Firefox/"
The above matches any user agent that starts with Mozilla/5.0 (Windows NT 6.1; WOW64; rv:63.0) Gecko/20100101 Firefox/, leaving the rest of the user-agent string free to vary.
Note that since this is now a regex, any special regex metacharacters need to be backslash-escaped. In this example, that means the dots (.) and the literal parentheses, which would otherwise be treated as a grouping construct. The surrounding double quotes can still be used to avoid having to escape the spaces.
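A prefix pattern like this can be sanity-checked outside Apache before deploying it. The sketch below uses bash's [[ =~ ]] operator, on the assumption that POSIX extended regex behaves the same as Apache's PCRE for this particular pattern. The matches helper and the sample user agents ua2 and ua3 are made up for illustration. Note that in this sketch the literal parentheses are escaped along with the dots, since ( and ) are grouping metacharacters.

```shell
#!/usr/bin/env bash
# Hypothetical check of a user-agent prefix pattern using bash's regex matching.
re='^Mozilla/5\.0 \(Windows NT 6\.1; WOW64; rv:63\.0\) Gecko/20100101 Firefox/'

# matches is a hypothetical helper: succeeds when its argument matches $re.
matches() { [[ $1 =~ $re ]]; }

ua1='Mozilla/5.0 (Windows NT 6.1; WOW64; rv:63.0) Gecko/20100101 Firefox/A1E1'
ua2='Mozilla/5.0 (Windows NT 6.1; WOW64; rv:63.0) Gecko/20100101 Firefox/B2F2'
ua3='Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:63.0) Gecko/20100101 Firefox/63.0'

for ua in "$ua1" "$ua2" "$ua3"; do
  if matches "$ua"; then
    echo "would block: $ua"
  else
    echo "would allow: $ua"
  fi
done
```

The first two strings differ only in the part after Firefox/ and both match; the third has a different platform token and does not.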
I want to set my user agent from a string I've got in a variable, however there are some other options as well that I want to pass in the same string, so this is what I ended up with:
cookie="-b cookie -c cookie"
agent="My Bot"
opt="-A \"$agent\" $cookie"
curl http://example.com $opt
When I run my script it fetches the site but it doesn't set the whole user agent but just the part before the first space and then goes
curl: (6) Could not resolve host: Bot
I'm guessing there is something messing up with the quotes, but if I replace curl with echo this is what I see, which seems pretty accurate to me
http://example.com -A "My Bot" -b cookie -c cookie
What am I doing wrong?
To handle whitespace correctly, store arguments in arrays, not strings.
cookie=(-b cookie -c cookie)
agent="My Bot"
opt=(-A "$agent" "${cookie[@]}")
curl http://example.com "${opt[@]}"
I'm guessing there is something messing up with the quotes, but if I replace curl with echo this is what I see, which seems pretty accurate to me.
echo is misleading. If you want to be absolutely sure, use set -x.
$ set -x
$ curl http://example.com $opt
+ curl http://example.com -A '"My' 'Bot"' -b cookie -c cookie
See how it turned "My Bot" into two arguments, '"My' and 'Bot"'?
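The same comparison can be made without touching the network. In this minimal sketch, show_args is a hypothetical stand-in for curl that prints each argument it receives on its own line, wrapped in brackets, so the string-based and array-based versions can be compared side by side.

```shell
#!/usr/bin/env bash
# show_args is a hypothetical stand-in for curl: one bracketed line per argument.
show_args() { printf '[%s]\n' "$@"; }

agent="My Bot"

# String version: the embedded quotes are literal characters, not shell syntax.
opt_string="-A \"$agent\" -b cookie -c cookie"
string_out=$(show_args $opt_string)

# Array version: each element is passed through as exactly one argument.
opt_array=(-A "$agent" -b cookie -c cookie)
array_out=$(show_args "${opt_array[@]}")

echo "string:"; echo "$string_out"
echo "array:";  echo "$array_out"
```

The string version splits the agent into two arguments that still carry the quote characters, while the array version keeps My Bot together as a single argument.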
When writing a script that calls wget, I want to make use of a variable that holds all of the headers, but if I attempt the following, then the whitespace within the header strings breaks them into multiple parameters (not a single header):
HEADERS='--header "Accept-Encoding: gzip, deflate"'
wget $HEADERS "$URL"
# ... $HEADERS is interpreted as 4 strings instead of 2 ...
wget --header Accept-Encoding: gzip, deflate http://example.com
I can't wrap $HEADERS in quotation marks because I need it to be interpreted as 2 arguments instead of 1. (And if I had many headers in the variable, it would need to be split into still more arguments.)
I often write bash scripts that assemble arguments for other programs, and I often encounter this issue. Is there a solution in bash?
Do not use a regular variable; use an array.
headers=(--header "Accept-Encoding: gzip, deflate")
wget "${headers[@]}" "$URL"
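The array approach also scales to the "many headers" case mentioned in the question. The sketch below is only an illustration: the second header value and the url variable are assumptions, and the command is printed in shell-quoted form rather than executed.

```shell
#!/usr/bin/env bash
# Hypothetical header values; each --header flag and its value become
# two separate array elements, so spaces inside a value are preserved.
headers=()
for h in "Accept-Encoding: gzip, deflate" "Accept-Language: en-US,en;q=0.5"; do
  headers+=(--header "$h")
done

# Dry run: print the command in a re-usable, shell-quoted form instead of running it.
url="http://example.com"
printf '%q ' wget "${headers[@]}" "$url"; echo
echo "header arguments: ${#headers[@]}"
```

Appending with += keeps every header as its own flag and value pair, however many are added.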
I have to insert a lot of data into my application, and doing it through the graphical interface takes a long time. For this reason I want to create a bash script and make the requests through curl using the REST API (I have to manually specify the id).
The problem is that I get the error: The server refused this request because the request entity is in a format not supported by the requested resource for the requested method.
Here is the code
#!/bin/bash
for i in {1..1}
do
CURL='/usr/bin/curl -X POST'
RVMHTTP="http://192.168.1.101:8080/sitewhere/api/devices
-H 'accept:application/json'
-H 'content-type:application/json'
-H 'x-sitewhere-tenant:sitewhere1234567890'
--user admin:password"
DATA=" -d '{\"hardwareId\":\"$i\",\"siteToken\":\"4e6913db-c8d3-4e45-9436-f0a99b502d3c\",\"specificationToken\":\"82043707-9e3d-441f-bdcc-33cf0f4f7260\"}'"
# or you can redirect it into a file:
$CURL $RVMHTTP $DATA >> /home/bluedragon/Desktop/tokens
done
The format of my request has to be JSON.
#!/usr/bin/env bash
rvmcurl() {
local url
url="http://192.168.1.101:8080/sitewhere/${1#/}"
shift || return # function should fail if we weren't passed at least one argument
curl -XPOST "${rvm_curl_args[@]}" "$url" "$@"
}
i=1 # for testing purposes
rvm_curl_args=(
-H 'accept:application/json'
-H 'content-type:application/json'
-H 'x-sitewhere-tenant:sitewhere1234567890'
--user admin:password
)
data=$(jq -n --arg hardwareId "$i" '
{
"hardwareId": $hardwareId,
"siteToken": "4e6913db-c8d3-4e45-9436-f0a99b502d3c",
"specificationToken": "82043707-9e3d-441f-bdcc-33cf0f4f7260"
}')
rvmcurl /api/devices -d "$data"
Note:
Commands, or command fragments intended to be parsed into multiple words, should never be stored in strings. Use an array or a function instead. Quotes inside such strings are not parsed as syntax, and instead (when parsed without eval, which carries its own serious risks and caveats) become literal values. See BashFAQ #50 for a full explanation.
Use a JSON-aware tool, such as jq, to ensure that generated data is legit JSON.
Fully-qualifying paths to binaries is, in general, an antipattern. It doesn't result in a significant performance gain (the shell caches PATH lookups), but it does reduce your scripts' portability and flexibility (preventing you from installing a wrapper for curl in your PATH, in an exported shell function, or otherwise).
All-caps variable names are in a namespace used for variables with meaning to the shell and operating system. Use names with at least one lowercase character for your own variables to prevent any chance of conflict.
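The point about not hard-coding the path to curl can be illustrated with a dry-run wrapper. This is a minimal sketch under stated assumptions: the curl function defined here is a hypothetical wrapper that prints its arguments instead of sending a request, and the argument list is shortened. It only takes effect because the call uses the bare name curl rather than /usr/bin/curl.

```shell
#!/usr/bin/env bash
# Hypothetical dry-run wrapper: a shell function named curl shadows the binary
# for any call that does not hard-code a path such as /usr/bin/curl.
curl() { printf 'curl'; printf ' %q' "$@"; echo; }

# Lowercase array name, as suggested for user-defined variables.
rvm_curl_args=(-H 'accept:application/json' --user admin:password)

# Because the call uses the bare name, the wrapper above runs and nothing is sent.
out=$(curl -X POST "${rvm_curl_args[@]}" "http://192.168.1.101:8080/sitewhere/api/devices")
echo "$out"
```

Removing the function definition restores the real binary without any other change to the script, which is the flexibility a fully qualified path would give up.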
In the following shell script I am unable to set a user-agent with spaces in it. I am getting word splitting. The bit after the first space (i.e. "(Macintosh;") is being interpreted by curl as a url.
If I type it into the console it works fine, but not when I use substitution.
PARAMS="-v"
PARAMS="${PARAMS} --user-agent \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_3) AppleWebKit/536.28.10 (KHTML, like Gecko)\"" #does not work
#PARAMS="${PARAMS} --user-agent \"Mozilla/5.0\"" #works
curl ${PARAMS} $1 > results.txt
Can someone please explain why?
The problem is explained in the Bash FAQ.
The solution is a slightly different syntax.
PARAMS=(-v)
PARAMS+=(-A "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_3) AppleWebKit/536.28.10 (KHTML, like Gecko)")
curl "${PARAMS[@]}" "$1" > results.txt
From here: http://wiki.bash-hackers.org/syntax/quoting
These quote characters (", double quote and ', single quote) are a syntax element that influences parsing. It is not related to eventual quote characters that are passed as text to the commandline! The syntax-quotes are removed before the command is called!
So there is a fundamental difference between cmd "my args" and myargs="\"my args\""; cmd $myargs.
Try replacing the spaces with %20
You can do this in the script if you want like:
str_replace ( ' ', '%20', 'what you need here' );
Hope this helps.