Unix Wget | # in URL | Syntax Issue - shell

What should the wget command be if the URL contains a #?
Example of my URL:
https://tableau.abc.intranet/#/site/QQ/views/Myreport/DailyReport.csv
Command
wget -P /temp "https://tableau.abc.intranet/#/site/QQ/views/Myreport/DailyReport.csv"
However, wget only takes the URL up to intranet/; everything after that is ignored.
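The # begins the URL fragment, which clients never send to the server, so wget drops everything from the # onward. If the server genuinely expects a literal # in the path, percent-encoding it as %23 may help; whether that works here depends on the Tableau server (a hedged sketch, not a confirmed fix):
# %23 is the percent-encoding of '#'; only valid if the server expects a literal '#' in the path
wget -P /temp "https://tableau.abc.intranet/%23/site/QQ/views/Myreport/DailyReport.csv"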

Related

How do I download a large number of zip files from a URL with wget?

At the URL here there is a large number of zip files that I need to download and save to the test/files/downloads directory. I'm using wget with the command
wget -i http://bitly.com/nuvi-plz -P test/files/downloads
and it downloads the whole page into a file inside the directory and starts downloading each zip file, but then gives me a 404 for each one, looking something like:
2016-05-12 17:12:28-- http://bitly.com/1462835080018.zip
Connecting to bitly.com|69.58.188.33|:80... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://bitly.com/1462835080018.zip [following]
--2016-05-12 17:12:28-- https://bitly.com/1462835080018.zip
Connecting to bitly.com|69.58.188.33|:443... connected.
HTTP request sent, awaiting response... 404 Not Found
2016-05-12 17:12:29 ERROR 404: Not Found.
How can I get wget to download all the zip files on the page properly?
You need to resolve the redirect from bit.ly and then download all the files. This is really ugly, but it worked:
# follow the bit.ly redirect, pull the final Location header out of wget's
# server response, and mirror the zip files from that URL
wget http://bitly.com/nuvi-plz --server-response -O /dev/null 2>&1 | \
  awk '(NR==1){SRC=$3;} /^[[:space:]]*Location: /{DEST=$2} END{print SRC, DEST}' | sed 's|.*http|http|' | \
while read url; do
  wget -A zip -r -l 1 -nd "$url" -P test/files/downloads
done
If you use the direct link, this will work:
wget -A zip -r -l 1 -nd http://feed.omgili.com/5Rh5AMTrc4Pv/mainstream/posts/ -P test/files/downloads
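As an aside not in the original answer, the redirect can also be resolved in one step with curl's url_effective variable and the result handed to wget (a sketch assuming the bit.ly link still redirects to the zip listing):
# resolve the bit.ly redirect to its final URL, then mirror the zips from it
url=$(curl -Ls -o /dev/null -w '%{url_effective}' http://bitly.com/nuvi-plz)
wget -A zip -r -l 1 -nd "$url" -P test/files/downloads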

Content-Disposition in Dropbox

Is there a way to rename a file while downloading it from Dropbox, without changing the filename itself?
For example:
Dropbox link: https://www.dropbox.com/s/uex431ou02h2m2b/300x50.gif?dl=1
and get the downloaded file as NewNameImage.gif instead of 300x50.gif.
The Content-Disposition header didn't work for me.
Any ideas how to do that?
If you're downloading the file locally you should either be able to control the name given to the local file, or be able to rename it after the fact. For example, using curl, -JO downloads the file using the remote name specified in the Content-Disposition header, while -o lets you specify a name:
$ curl -L -JO "https://www.dropbox.com/s/uex431ou02h2m2b/300x50.gif?dl=1"
...
curl: Saved to filename '300x50.gif'
$ ls
300x50.gif
$ rm 300x50.gif
$ curl -L -o "NewNameImage.gif" "https://www.dropbox.com/s/uex431ou02h2m2b/300x50.gif?dl=1"
...
$ ls
NewNameImage.gif
Alternatively, renaming it after the fact:
$ curl -L -JO "https://www.dropbox.com/s/uex431ou02h2m2b/300x50.gif?dl=1"
...
curl: Saved to filename '300x50.gif'
$ ls
300x50.gif
$ mv 300x50.gif "NewNameImage.gif"
$ ls
NewNameImage.gif
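The same works with wget, if that is the tool you're tied to (an equivalent not shown in the original answer): -O names the output file explicitly.
# wget equivalent: write the download to the name you choose
wget -O NewNameImage.gif "https://www.dropbox.com/s/uex431ou02h2m2b/300x50.gif?dl=1"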

How to download a file using curl

I'm on mac OS X and can't figure out how to download a file from a URL via the command line. It's from a static page, so I thought copying the download link and then using curl would do the trick, but it isn't working.
I referenced this StackOverflow question but that didn't work. I also referenced this article which also didn't work.
What I've tried:
curl -o https://github.com/jdfwarrior/Workflows.git
curl: no URL specified!
curl: try 'curl --help' or 'curl --manual' for more information
and:
wget -r -np -l 1 -A zip https://github.com/jdfwarrior/Workflows.git
zsh: command not found: wget
How can a file be downloaded through the command line?
The -o (--output) option means curl writes output to the file you specify instead of stdout. Your mistake was putting the URL immediately after -o, so curl treated the URL as the file to write to and concluded that no URL was specified. You need a file name after -o, then the URL:
curl -o ./filename https://github.com/jdfwarrior/Workflows.git
And wget is not available by default on OS X.
curl -OL https://github.com/jdfwarrior/Workflows.git
-O: Write the output to a local file named like the remote file we get. In this case that file would be Workflows.git.
-L: If the server reports that the requested page has moved to a different location (indicated with a Location: header and a 3XX response code), this option makes curl redo the request at the new location.
Ref: curl man page
The easiest solution for your question is to keep the original filename. In that case, you just need to use a capital O ("-O") as the option (not a zero!). So it looks like:
curl -O https://github.com/jdfwarrior/Workflows.git
There are several options to make curl write its output to a file:
# saves it to myfile.txt
curl http://www.example.com/data.txt -o myfile.txt -L
# "#1" is replaced by whatever matches the first glob pattern ({...} or [...]) in the URL,
# so this saves to file_data.txt
curl "http://www.example.com/{data}.txt" -o "file_#1.txt" -L
# saves to data.txt, the filename extracted from the URL
curl http://www.example.com/data.txt -O -L
# saves to filename determined by the Content-Disposition header sent by the server.
curl http://www.example.com/data.txt -O -J -L
# -O Write output to a local file named like the remote file we get
# -o <file> Write output to <file> instead of stdout (variable replacement performed on <file>)
# -J Use the Content-Disposition filename instead of extracting filename from URL
# -L Follow redirects
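If you're unsure which filename -J would pick, you can inspect the headers first (a small aside, not part of the original answer; some servers omit the header on HEAD requests):
# -s silent, -I headers only, -L follow redirects; look for a Content-Disposition line
curl -sIL http://www.example.com/data.txt | grep -i content-disposition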

Using wget to recursively fetch a directory with --no-parent

I am trying to download all of the files in a directory using:
wget -r -N --no-parent -nH -P /media/karunakar --ftp-user=jsjd --ftp-password='hdshd' ftp://ftp.xyz.com/Suppliers/my/ORD20130908
but wget is fetching files from the parent directory, even though I specified --no-parent. I only want the files in ORD20130908.
You need to add a trailing slash to indicate the last item in the URL is a directory and not a file:
wget -r -N --no-parent -nH -P /media/karunakar --ftp-user=jsjd --ftp-password='hdshd' ftp://ftp.xyz.com/Suppliers/my/ORD20130908
↓
wget -r -N --no-parent -nH -P /media/karunakar --ftp-user=jsjd --ftp-password='hdshd' ftp://ftp.xyz.com/Suppliers/my/ORD20130908/
From the documentation:
Note that, for HTTP (and HTTPS), the trailing slash is very important to ‘--no-parent’. HTTP has no concept of a “directory”—Wget relies on you to indicate what’s a directory and what isn’t. In ‘http://foo/bar/’, Wget will consider ‘bar’ to be a directory, while in ‘http://foo/bar’ (no trailing slash), ‘bar’ will be considered a filename (so ‘--no-parent’ would be meaningless, as its parent is ‘/’).
wget -r -N --no-parent -nH -P /media/karunakar --ftp-user=jsjd --ftp-password='hdshd' -I/ORD20130908 ftp://ftp.xyz.com/Suppliers/my
See the wget documentation for the use of the -I flag.
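One more aside not in the original answers: even with -nH, the remote path components (Suppliers/my/ORD20130908) are recreated under the -P directory. If you only want the ORD20130908 contents locally, --cut-dirs trims the leading components (a sketch, assuming the same credentials and host):
# --cut-dirs=2 strips the leading Suppliers/my/ components from the local paths
wget -r -N --no-parent -nH --cut-dirs=2 -P /media/karunakar \
  --ftp-user=jsjd --ftp-password='hdshd' ftp://ftp.xyz.com/Suppliers/my/ORD20130908/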

How to download multiple URLs with wget using a single command?

I am using the following command to download a single webpage with all its images and JS using wget on Windows 7:
wget -E -H -k -K -p -e robots=off -P /Downloads/ http://www.vodafone.de/privat/tarife/red-smartphone-tarife.html
It downloads the HTML as required, but when I tried to pass it a text file containing a list of 3 URLs to download, it didn't give any output. Below is the command I am using:
wget -E -H -k -K -p -e robots=off -P /Downloads/ -i ./list.txt -B 'http://'
I tried this also:
wget -E -H -k -K -p -e robots=off -P /Downloads/ -i ./list.txt
This text file had the URLs with http:// already prepended.
list.txt contains the 3 URLs I need to download with a single command. Please help me resolve this issue.
From man wget:
2 Invoking
By default, Wget is very simple to invoke. The basic syntax is:
wget [option]... [URL]...
So, just use multiple URLs:
wget URL1 URL2
Or using the links from comments:
$ cat list.txt
http://www.vodafone.de/privat/tarife/red-smartphone-tarife.html
http://www.verizonwireless.com/smartphones-2.shtml
http://www.att.com/shop/wireless/devices/smartphones.html
and your command line:
wget -E -H -k -K -p -e robots=off -P /Downloads/ -i ./list.txt
works as expected.
First create a text file with the URLs that you need to download.
e.g. download.txt, which looks like this:
http://www.google.com
http://www.yahoo.com
Then use the command wget -i download.txt to download the files. You can add many URLs to the text file.
If you have a list of URLs separated on multiple lines like this:
http://example.com/a
http://example.com/b
http://example.com/c
but you don't want to create a file and point wget to it, you can do this:
wget -i - <<< 'http://example.com/a
http://example.com/b
http://example.com/c'
A pedantic version:
for x in {'url1','url2'}; do wget "$x"; done
The advantage is that you can treat it as a single wget command for the whole list of URLs.
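For longer lists there is also the xargs route (an aside, not in the original answers), which can fetch several URLs in parallel:
# run up to 4 wget processes at once, one URL per process, reading URLs from list.txt
xargs -n 1 -P 4 wget < list.txt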
