I'm trying to use cURL to get data from a URL:
http://example.com/site-explorer/get_overview_text_data.php?data_type=refdomains_stats&hash=19a53c6b9aab3917d8bed5554000c7cb
which needs a cookie, so I first store the cookie in a file:
curl -c cookie-jar http://example.com/site-explorer/overview/subdomains/example.com
Then I try curl with these options:
curl -b cookie-jar -A "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)" --referer "http://example.com/site-explorer/overview/subdomains/example.com" http://example.com/site-explorer/get_overview_text_data.php?data_type=refdomains_stats&hash=19a53c6b9aab3917d8bed5554000c7cb
There is one problem which leaps out at me: You aren't quoting the URL, which means that characters such as & and ? will be interpreted by the shell instead of getting passed to curl. If you're using a totally static URL, enclose it in single quotes, as in 'http://blah.com/blah/blah...'.
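A minimal sketch of the corrected second request, assuming bash (arrays are a bash feature, not POSIX sh). Keeping the arguments in an array makes it easy to check that the URL, with its ? and &, reaches curl as a single argument. The variable names url and args are illustrative; the options and URLs are the ones from the question.

```bash
#!/usr/bin/env bash
# Arguments for the second curl call from the question, one array element
# per argument. Single quotes keep ? and & away from the shell.
url='http://example.com/site-explorer/get_overview_text_data.php?data_type=refdomains_stats&hash=19a53c6b9aab3917d8bed5554000c7cb'
args=(
  -b cookie-jar
  -A 'Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)'
  --referer 'http://example.com/site-explorer/overview/subdomains/example.com'
  "$url"
)

# Print each argument on its own line to confirm the URL is one piece.
printf '%s\n' "${args[@]}"

# To send the request (after creating cookie-jar with curl -c):
# curl "${args[@]}"
```

Printing the array before running curl is a cheap way to see exactly what the shell will pass on.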
While writing a batch file to update NirSoft tools, I ran into some strange behaviour with wget.
First, I downloaded a text file with the PAD links:
wget http://www.nirsoft.net/pad/pad-links.txt --backups=20 --append-output=C:\Path\Update\LOG\Nirsoft\%Timestamp%_NirSoft.log
Then I used fart-js to delete the lines I did not need from pad-links.txt. I also used it to change the download links to http://www.nirsoft.net/utils and to change the file extensions to .zip:
fart ".\pad-links.txt" "http://www.nirsoft.net/pad" "http://www.nirsoft.net/utils" | tee --append C:\Path\Update\LOG\Nirsoft\%Timestamp%_NirSoft.log
and
fart ".\pad-links.txt" ".xml" ".zip" | tee --append C:\Path\Update\LOG\Nirsoft\%Timestamp%_NirSoft.log
Then, to download the programs, I used:
wget --timestamping --input-file=C:\Path\UtilSuit\NirLauncher\Download\pad-links.txt --append-output=C:\Path\Update\LOG\Nirsoft\%Timestamp%_NirSoft.log
Looking at the log file, I found that not all programs are stored in this location. For example, WirelessKeyView is stored at https://www.nirsoft.net/toolsdownload/wirelesskeyview.zip.
Trying to get this file with wget results in a corrupt 4 KB download. The same happens with cURL and aria2. When I download it with Firefox or IDM, I get the file without any problem. So I tried wget --auth-no-challenge, and wget --header="Accept: text/html" --user-agent="Mozilla/5.0 …"
I also tried the wget/aria2/curl command lines that the cliget add-on generated during a normal download in Firefox, for example:
wget --header 'Host: www.nirsoft.net' --user-agent 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:92.0) Gecko/20100101 Firefox/92.0' --header 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8' --header 'Accept-Language: de,en-US;q=0.7,en;q=0.3' --referer 'https://www.nirsoft.net/utils/wirelesskeyview.html' --header 'Upgrade-Insecure-Requests: 1' --header 'Sec-Fetch-Dest: document' --header 'Sec-Fetch-Mode: navigate' --header 'Sec-Fetch-Site: same-origin' --header 'Sec-Fetch-User: ?1' --header 'DNT: 1' --header 'Sec-GPC: 1' 'https://www.nirsoft.net/toolsdownload/wirelesskeyview.zip' --output-document 'wirelesskeyview.zip'
I searched and found a similar question for PowerShell (same error), but I cannot reproduce its working answer in batch (I am not familiar with PowerShell scripting).
So how is it possible to download the single wirelesskeyview.zip file with wget, curl or aria2 in a batch script?
One workaround I found is downloading it directly from the PAD panel, but I want the .zip file, which includes the updated .chm file, and also the 64-bit versions, if available.
One more note: the NirSoft site is excluded from scanning in my antivirus tool, so that is not the cause.
Any solutions?
Aah, this one is simple. If you look at the page that was actually downloaded, it's called "403.html". So, let's open it. The first thing that strikes you is this:
<title>Error 403: Missing HTTP referer in the HTTP request</title>
So, the server wants a Referer header. Sure, let's give it one:
$ wget --referer foo <URL>
And it downloads the zip file correctly as expected.
Now, really, the server should not be returning an HTTP 200 response with a file called 403.html. It should have sent back an HTTP 403 response. But what can you do? There are broken servers everywhere.
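Because the error page arrives with a success status, wget's exit code alone will not catch this failure, so a content check helps. A small bash sketch of the fix plus such a check, relying on the fact that zip archives begin with the two bytes "PK". Assumptions: the referer points at the tool's HTML page (the value cliget captured; the answer shows even the placeholder foo is accepted), and fetch_zip and is_zip are hypothetical helper names. In a batch script the wget options are the same; only the surrounding syntax differs.

```bash
#!/usr/bin/env bash
# Sketch: download with a Referer header, then make sure the result is a
# real zip archive and not the 403 error page served with HTTP 200.

# Succeeds if the file starts with the zip signature "PK".
is_zip() {
  [ "$(head -c 2 -- "$1")" = "PK" ]
}

# Hypothetical helper; arguments: URL, output file, referer.
fetch_zip() {
  local url=$1 out=$2 referer=$3
  wget --referer="$referer" --output-document="$out" "$url" || return 1
  if ! is_zip "$out"; then
    echo "error: $out is not a zip archive (probably an error page)" >&2
    return 1
  fi
}

# Usage (needs network access, so not run here):
# fetch_zip 'https://www.nirsoft.net/toolsdownload/wirelesskeyview.zip' \
#   wirelesskeyview.zip 'https://www.nirsoft.net/utils/wirelesskeyview.html'
```

curl and aria2c take the same idea through their own --referer options (curl also accepts the short form -e).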
I am currently trying to make a wallpaper randomiser.
The rule I have is: pick a random word, take the 9th Google Images result for it, and set that image as the wallpaper. I am doing it in bash.
But when I run wget on the Google results page, the usual href values for these links disappear and get replaced: without the -k option they are replaced by #, and with -k they are replaced by something I can't read.
Here is my command:
wget -q -p -k --user-agent="Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Firefox/45.0" -e robots=off $address
where $address is:
address="https://www.google.fr/search?q=wallpaper+$word&safe=off&biw=1920&bih=880&tbs=isz:ex,iszw:1920,iszh:1080&tbm=isch&source=lnt"
The link that I want to obtain looks like:
href="/imgres/imgurl="<Paste here an url image>"
I have some new information.
In fact, Google seems to change its URLs with JavaScript and other client-side technologies. So I need something like wget that interprets the JavaScript first. Does anyone know of such a tool?
My xidel command is the following:
xidel "https://www.iec-iab.be/nl/contactgegevens/c360afae-29a4-dd11-96ed-005056bd424d" -e '//div[@class="consulentdetail"]'
This should extract all the data in the divs with class consulentdetail.
Nothing special, I thought, but it won't print anything.
Can anyone help me find my mistake?
EDIT: When I use the same expression in Firefox, it finds the desired tags.
The site you are connecting to evidently checks the user agent string and delivers different pages depending on the user agent string it receives.
If you instruct xidel to send a user agent string impersonating, e.g., Firefox on Windows 10, your query starts to work:
> ./xidel --silent --user-agent="Mozilla/5.0 (Windows NT 10.0; WOW64; rv:49.0) Gecko/20100101 Firefox/49.0" "http://www.iec-iab.be/nl/contactgegevens/c360afae-29a4-dd11-96ed-005056bd424d" -e '//div[@class="consulentdetail"]'
Lidnummer11484 2 N 73
TitelAccountant, Belastingconsulent
TaalNederlands
Accountant sinds4/04/2005
Belastingconsulent sinds4/04/2005
AdresStationsstraat 2419550 HERZELE
Telefoon+32 (53) 41.97.02
Fax+32 (53) 41.97.03
AdresStationsstraat 2419550 HERZELE
Telefoon+32 (53) 41.97.02
Fax+32 (53) 41.97.03
GSM+32 (474) 29.00.67
Websitehttp://abbeloosschinkels.be
E-mail
<!--
document.write("");document.write(decrypt(unescCtrlCh("5yÿÃ^à (pñ_!13!Â[îøû!13!5ãév¦Ãçj|°W"),"Iate1milrve%ster"));document.write("");
-->
As a rule of thumb, when doing Web scraping and getting weird results:
Check the page in a browser with Javascript disabled.
Send a user agent string simulating a Web browser.
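Both checks can be approximated on the command line: the raw HTML that curl fetches is roughly what a browser with JavaScript disabled receives, so fetching it once with curl's default agent and once with a browser agent, and looking for the target class, shows whether the server varies its answer. A rough sketch, assuming bash and curl. BROWSER_UA reuses the Firefox string from the answer, html_has_class is a hypothetical helper, and its fixed-string grep is a crude text match (it misses elements with several classes), not an HTML parser.

```bash
#!/usr/bin/env bash
# Browser-like user agent, copied from the xidel call above.
BROWSER_UA='Mozilla/5.0 (Windows NT 10.0; WOW64; rv:49.0) Gecko/20100101 Firefox/49.0'

# Succeeds if the HTML on stdin contains the literal text class="$1".
# Crude text match on the raw HTML; not a real HTML parser.
html_has_class() {
  grep -qF "class=\"$1\""
}

# Usage (needs network access, so not run here):
# url='http://www.iec-iab.be/nl/contactgegevens/c360afae-29a4-dd11-96ed-005056bd424d'
# curl -s "$url" | html_has_class consulentdetail && echo "default agent: found"
# curl -s -A "$BROWSER_UA" "$url" | html_has_class consulentdetail && echo "browser agent: found"
```

If only the browser agent finds the class, the server is sniffing the user agent; if neither does but the browser shows the element, it is probably being added by JavaScript.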
I'm trying to download something with wget using a for loop in a bash script.
When I'm not using variables everything works fine, but when I put the data into a variable I get a 500 server error. This is strange to me, because it's just copy-paste.
What I'm trying to do is take the number from the loop variable i and insert it into the request body.
Here is my code:
#!/bin/bash
for i in {1..5}
do
STR="some_static_stuff_before"$i"some_static_suff_after"
echo $STR
wget -O ready/page$i.aspx --header="Host: www.something.com" --header="Pragma: no-cache" --header="Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8" --header="Accept-Language: en-en" --header="Accept-Encoding: gzip, deflate" --header="Content-Type: application/x-www-form-urlencoded" --header="Origin: http://something.com" --header="Connection: keep-alive" --header="User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.73.11 (KHTML, like Gecko) Version/7.0.1 Safari/537.73.11" --header="Referer: http://www.something.com/something.aspx" --header="Cookie: ASP.NET_SessionId=u5cmt0figi4bvs40a30gnwsa; __utma=20059042.38323768.1389369038.1389710153.1389780868.6; __utmb=20059042.2.10.1389780868; __utmc=20059042; __utmz=20059042.1389627823.2.2.utmcsr=something.com|utmccn=(referral)|utmcmd=referral|utmcct=/something.aspx" --post-data='"$STR"' http://something.com/something.aspx
done
When I paste the data directly into --post-data, the content downloads without any problem.
I've tried --post-data= "/"$STR/"" and --post-data='"$STR"', and it's still not working.
You single-quoted the variable reference (in addition to double-quoting it), which prevents substitution of the variable value: wget receives the literal text "$STR" instead of its contents.
Instead of
--post-data='"$STR"'
use
--post-data="$STR"
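A quick way to see the difference is to print the argument the shell actually hands to the program. A minimal bash sketch; show_arg is a hypothetical helper and the STR value is a placeholder in the style of the question's.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the first argument exactly as received.
show_arg() { printf '%s\n' "$1"; }

i=3
STR="some_static_stuff_before${i}some_static_stuff_after"

# Single quotes win: the program gets the literal text "$STR".
show_arg --post-data='"$STR"'
# Double quotes alone: the value is substituted and kept as one argument.
show_arg --post-data="$STR"
```

The first call prints --post-data="$STR" and the second prints --post-data=some_static_stuff_before3some_static_stuff_after, which is what wget needs to receive inside the loop.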
I'm having a problem with cURL and HTTP user/password authentication: it doesn't like the exclamation mark. I've tried escaping it in the following ways:
Tried and failed...
/usr/bin/curl -u 'UserName\WithSlash:PasswordWithExclamation!' https://test.com/
/usr/bin/curl -u UserName\\WithSlash:PasswordWithExclamation\! https://test.com/
It doesn't work with Basic or Digest auth, if that matters (I'm using --anyauth); I get a 401 access denied.
What am I doing incorrectly?
curl -u UserName\\WithSlash:PasswordWithExclamation\! http://....
works fine.
It sends:
GET / HTTP/1.1
Authorization: Basic VXNlck5hbWVcV2l0aFNsYXNoOlBhc3N3b3JkV2l0aEV4Y2xhbWF0aW9uIQ==
User-Agent: curl/7.21.0
Host: teststuff1.com:80
Accept: */*
which, once Base64-decoded, is "UserName\WithSlash:PasswordWithExclamation!" in the auth string.
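To check what your own command sends, you can run it with curl -v, which prints the outgoing request headers (prefixed with >), and then decode the Basic value. A short bash sketch of the decoding step, using the header value from this answer and assuming GNU coreutils base64 for the --decode option.

```bash
#!/usr/bin/env bash
# Value of the Authorization: Basic header shown above.
header_value='VXNlck5hbWVcV2l0aFNsYXNoOlBhc3N3b3JkV2l0aEV4Y2xhbWF0aW9uIQ=='

# Basic auth is just base64("user:password"), so decoding recovers it.
decoded=$(printf '%s' "$header_value" | base64 --decode)
printf '%s\n' "$decoded"   # UserName\WithSlash:PasswordWithExclamation!
```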
It's not that complicated: just use double quotes. At least it works on Linux.
For example:
curl -u "username:passwdwithspecialchar" https://....
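In a script, the three spellings from this thread (single quotes, backslash escapes, double quotes) all give curl the same string: history expansion is off in non-interactive shells, and inside double quotes a backslash is only special before $, `, ", \ or a newline, so \W stays a literal backslash followed by W. A small bash check of that claim; the variable names are illustrative. In an interactive bash session, history expansion can still act on a ! inside double quotes, so single quotes are the safer habit when typing the command by hand.

```bash
#!/usr/bin/env bash
# The string the server should receive.
expected='UserName\WithSlash:PasswordWithExclamation!'

single='UserName\WithSlash:PasswordWithExclamation!'   # single quotes
escaped=UserName\\WithSlash:PasswordWithExclamation\!  # backslash escapes
double="UserName\WithSlash:PasswordWithExclamation!"   # double quotes (script only)

for v in "$single" "$escaped" "$double"; do
  [ "$v" = "$expected" ] && echo "same: $v"
done
```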
If you know the server supports Basic auth, you could set the header directly:
curl --header "Authorization: Basic $(base64 --wrap=0 credentials)" https://example.org
This way you can store the user and password (UserName\WithSlash:PasswordWithExclamation!) without any escaping in the credentials file you pass to the base64 command.
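One detail to watch: base64 encodes every byte of the file, so a trailing newline (as added by echo or most editors) ends up inside the credentials. A minimal bash sketch that writes the file with printf '%s', which adds no newline. The file name credentials is the one from the answer; --wrap=0 assumes GNU coreutils base64.

```bash
#!/usr/bin/env bash
# Write user:password verbatim, without a trailing newline.
printf '%s' 'UserName\WithSlash:PasswordWithExclamation!' > credentials

# Build the header from the file; --wrap=0 keeps the output on one line.
auth="Authorization: Basic $(base64 --wrap=0 credentials)"
echo "$auth"

# Then send it:
# curl --header "$auth" https://example.org
```

The printed header matches the one curl -u produced in the earlier answer, so both approaches send identical credentials.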