I am using following command to download a single webpage with all its images and js using wget in Windows 7:
wget -E -H -k -K -p -e robots=off -P /Downloads/ http://www.vodafone.de/privat/tarife/red-smartphone-tarife.html
It is downloading the HTML as required, but when I tried to pass on a text file having a list of 3 URLs to download, it didn't give any output, below is the command I am using:
wget -E -H -k -K -p -e robots=off -P /Downloads/ -i ./list.txt -B 'http://'
I tried this also:
wget -E -H -k -K -p -e robots=off -P /Downloads/ -i ./list.txt
This text file had URLs http:// prepended in it.
list.txt contains list of 3 URLs which I need to download using a single command. Please help me in resolving this issue.
From man wget:
2 Invoking
By default, Wget is very simple to invoke. The basic syntax is:
wget [option]... [URL]...
So, just use multiple URLs:
wget URL1 URL2
Or using the links from comments:
$ cat list.txt
http://www.vodafone.de/privat/tarife/red-smartphone-tarife.html
http://www.verizonwireless.com/smartphones-2.shtml
http://www.att.com/shop/wireless/devices/smartphones.html
and your command line:
wget -E -H -k -K -p -e robots=off -P /Downloads/ -i ./list.txt
works as expected.
First create a text file with the URLs that you need to download.
eg: download.txt
download.txt will as below:
http://www.google.com
http://www.yahoo.com
then use the command wget -i download.txt to download the files. You can add many URLs to the text file.
If you have a list of URLs separated on multiple lines like this:
http://example.com/a
http://example.com/b
http://example.com/c
but you don't want to create a file and point wget to it, you can do this:
wget -i - <<< 'http://example.com/a
http://example.com/b
http://example.com/c'
pedantic version:
for x in {'url1','url2'}; do wget $x; done
the advantage of it you can treat is as a single wget url command
Related
Is there any way to make wget output everything in a single flat folder? right now i'm
wget --quiet -E -H -k -K -p -e robots=off #{url}
but i'm getting everything in the same nested way as it is on the site, is there any option to flatten the resulting folder structure? (and also the sources links on the index.html file)
After reading on the documentation and some examples i found that i was missing the -nd flag, that would make wget just get the files and not the directories
correct call wget --quiet -E -H -k -nd -K -p -e robots=off #{url}
I am trying to extract URLS from a webpage using wget. I tried this
wget -r -l2 --reject=gif -O out.html www.google.com | sed -n 's/.*href="\([^"]*\).*/\1/p'
It is displaiyng FINISHED
Downloaded: 18,472 bytes in 1 files
But not displaying the weblinks. If I try to do it seperately
wget -r -l2 --reject=gif -O out.html www.google.com
sed -n 's/.*href="\([^"]*\).*/\1/p' < out.html
Output
http://www.google.com/intl/en/options/
/intl/en/policies/terms/
It is not displaying all the links
ttp://www.google.com
http://maps.google.com
https://play.google.com
http://www.youtube.com
http://news.google.com
https://mail.google.com
https://drive.google.com
http://www.google.com
http://www.google.com
http://www.google.com
https://www.google.com
https://plus.google.com
And more over I want to get links from 2nd level and more can any one give a solution for this
Thanks in advance
The -O file option captures the output of wget and writes it to the specified file, so there is no output going through the pipe to sed.
You can say -O - to direct wget output to standard output.
If you don't want to use grep, you can try
sed -n "/href/ s/.*href=['\"]\([^'\"]*\)['\"].*/\1/gp"
I'm using a script that downloads images. I am having issues while running the wget command with a variable that contains "{" and "}" characters. It transforms into "%7B" and "%7D".
Here is part of the script
wget -nd -H -p -A jpg,jpeg,png,gif -e robots=off "$url"
The "$url" contains something like "http://webpage.com/di1/dir2/{1..26}.jpg".
I just did a Loop
for i in {1..50}
do
wget "$url$i.jpg"
done
I want to download this webpage using wget in Win7 : http://www.att.com/shop/wireless/devices/smartphones.deviceListView.xhr.flowtype-NEW.deviceGroupType-Cellphone.paymentType-postpaid.packageType-undefined.html?commitmentTerm=24&taxoStyle=SMARTPHONES&showMoreListSize=1000
I am using this command to do this :
wget -E -H -k -K -p -e robots=off -P /Downloads/AT&T_2013-01-29/ http://www.att.com/shop/wireless/devices/smartphones.deviceListView.xhr.flowtype-NEW.deviceGroupType-Cellphone.paymentType-postpaid.packageType-undefined.html?commitmentTerm=24&taxoStyle=SMARTPHONES&showMoreListSize=1000
I am getting taxostyle not defined, commitmentterm not defined or recognizble method error
Add quotes around address
wget -E -H -k -K -p -e robots=off -P "/Downloads/AT&T_2013-01-29/" "http://www.att.com/shop/wireless/devices/smartphones.deviceListView.xhr.flowtype-NEW.deviceGroupType-Cellphone.paymentType-postpaid.packageType-undefined.html?commitmentTerm=24&taxoStyle=SMARTPHONES&showMoreListSize=1000"
& is used as command separator in command window
when i use bash to upload files to dropbox, it works fine but when i manually use command line it does not work.
I'm thinking it might be the & in the url.. im not sure..
Bash code:
CURL_BIN="/usr/bin/curl"
#Note: This option explicitly allows curl to perform "insecure" SSL connections and transfers.
#CURL_ACCEPT_CERTIFICATES="-k"
CURL_PARAMETERS="--progress-bar"
APPKEY="zrwv8z3bycfk3m8"
OAUTH_ACCESS_TOKEN="aaaaaaaa"
APPSECRET="aaaaaaaaaa"
OAUTH_ACCESS_TOKEN_SECRET="aaaaaaaaa"
ACCESS_LEVEL="dropbox"
API_UPLOAD_URL="https://api-content.dropbox.com/1/files_put"
RESPONSE_FILE="temp2.txt"
FILE_SRC="temp.txt"
$CURL_BIN $CURL_ACCEPT_CERTIFICATES $CURL_PARAMETERS -v -i -o "$RESPONSE_FILE" --upload-file "$FILE_SRC" "$API_UPLOAD_URL/$ACCESS_LEVEL/$FILE_DST?oauth_consumer_key=$APPKEY&oauth_token=$OAUTH_ACCESS_TOKEN&oauth_signature_method=PLAINTEXT&oauth_signature=$APPSECRET%26$OAUTH_ACCESS_TOKEN_SECRET"
Manual code:
curl --insecure --progress-bar -v -i -o temp2.txt --upload-file temp.txt https://api-content.dropbox.com/1/files_put/dropbox/attachments/temp.txt?oauth_consumer_key=aaaaaaaaaa&oauth_token=aaaaaaaaa&oauth_signature_method=PLAINTEXT&oauth_signature=aaaaaaaaa%26aaaaaaaaaa
curl --insecure --progress-bar -v -i -o temp2.txt --upload-file temp.txt "https://api-content.dropbox.com/1/files_put/dropbox/attachments/temp.txt?oauth_consumer_key=aaaaaaaaaa&oauth_token=aaaaaaaaa&oauth_signature_method=PLAINTEXT&oauth_signature=aaaaaaaaa%26aaaaaaaaaa"
The solution is to add in the inverted commas "