wget command using brew in terminal (Mac)

wget -E -H -k -K -p -e robots=off -P ./images/ -i./list.txt
./list.txt: No such file or directory
No URLs found in ./list.txt.
Converted links in 0 files in 0 seconds.
I downloaded and installed brew, then installed wget, and it lets me download images one at a time. However, when I try the command above to download images from multiple URLs, it doesn't do anything. Can someone tell me what I could be doing wrong here?

wget is pretty lucid with its description of the issue:
./list.txt: No such file or directory
Apparently there is no file named list.txt in the current directory. Please try giving the full path to list.txt.
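For example (a minimal sketch, assuming the image URLs live in /Users/you/list.txt, a hypothetical path):

# create the URL list first, one URL per line (hypothetical URLs)
printf '%s\n' "https://example.com/one.jpg" "https://example.com/two.jpg" > /Users/you/list.txt
# then point -i at the absolute path so wget finds it regardless of the working directory
wget -E -H -k -K -p -e robots=off -P ./images/ -i /Users/you/list.txt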

Related

Considering a specific name for the downloaded file

I download a .tar.gz file using wget using this command:
wget hello.tar.gz
This is part of a long script. Sometimes an error occurs during the download, and when the file is downloaded a second time its name changes to something like this:
hello.tar.gz.2
the third time:
hello.tar.gz.3
How can I make sure that, whatever the name of the downloaded file turns out to be, it ends up as hello.tar.gz?
In other words, I don't want the name of the downloaded file to be anything other than hello.tar.gz.
wget <url> -O hello.tar.gz
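For example (a sketch with a hypothetical URL; -O forces the output name, so repeated runs overwrite hello.tar.gz instead of creating hello.tar.gz.1, hello.tar.gz.2, and so on):

# -O always writes to the given name, overwriting any earlier download
wget https://example.com/downloads/hello.tar.gz -O hello.tar.gz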
wget has built-in options such as -r and -p to change its default behavior.
So just try the following:
wget -p <url>
wget -r <url>
Since you have now noticed the incremental renaming, discard any repeated files and rely on the following as the initial condition:
wget hello.tar.gz
mv hello.tar.gz.2 hello.tar.gz
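A slightly more defensive variant of the same idea (a sketch, assuming the duplicate ends up named hello.tar.gz.2 as in the question) only renames when the numbered copy actually exists:

wget hello.tar.gz
# keep the newest copy under the canonical name; skip the rename if no duplicate was created
if [ -f hello.tar.gz.2 ]; then
    mv -f hello.tar.gz.2 hello.tar.gz
fi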

How to curl for extracting valid .zip file from redirecting link

I am trying to automate a data downloading process. For this purpose, my goal is to extract (using bash commands) the .zip from a redirection link that can be seen here: https://journals.sagepub.com/doi/suppl/10.1177/0022002706289303
I have seen that people suggest the -L flag with curl for redirections, but it doesn't seem to work in my case. The specific command I have tried is:
curl -L -o output.zip https://journals.sagepub.com/doi/suppl/10.1177/0022002706289303/suppl_file/Sambanis_Aug_06.zip
The command file output.zip shows that the downloaded .zip file is actually HTML document text. On the other hand, clicking the redirection link (the one used inside the curl command) downloads the archive automatically via a browser.
Any ideas, tips, or suggestions on what I should try (or whether this is possible or not) will be highly appreciated!
If you execute curl with the --verbose option, you can see that it is a cookie-related problem. The cookie engine needs to be enabled. You can download the desired file as follows:
curl -b cookies.txt -L https://journals.sagepub.com/doi/suppl/10.1177/0022002706289303/suppl_file/Sambanis_Aug_06.zip -o test.zip
It doesn't matter if the file provided with the -b option doesn't exist. We just need to activate the cookie engine.
Refer to Send cookies with curl and Save cookies between two curl requests for further information.
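To confirm the download really is an archive and not another HTML error page, you can check the result afterwards, for example:

# "Zip archive data" means a real archive; "HTML document text" means the redirect/cookie step failed
file test.zip
# list the archive contents without extracting
unzip -l test.zip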
You can download that file with wget on Linux
$ wget https://journals.sagepub.com/doi/suppl/10.1177/0022002706289303/suppl_file/Sambanis_Aug_06.zip
$ unzip Sambanis_Aug_06.zip
Archive: Sambanis_Aug_06.zip
inflating: Sambanis (Aug 06).dta
inflating: Sambanis Appendix (Aug 06).pdf

wget hangs after large file download

I'm trying to download a large file (5 GB) over FTP. Here is my script:
read ZipName
wget -c -N -q --show-progress "ftp://Password#ftp.server.com/$ZipName"
unzip $ZipName
The file downloads to 100% but the script never goes on to the unzip command. There is no error message and no output in the terminal, just a blank new line. I have to send Ctrl+C and re-run the script to unzip, since wget then detects that the file is already fully downloaded.
Why does it hang like this? Is it because of the large file, or because of passing an argument in the command?
By the way, I can't use the ftp client because it's not installed on the VM I'm working on, and it's a temporary VM, so I have no root privileges to install anything.
I've run some tests, and I think the size of the disk was the reason.
I tried curl -O instead and it worked with the same disk space available.
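One way to make the script fail loudly instead of silently hanging on a partial file is to check wget's exit status before unzipping (a sketch, with a hypothetical host and credentials in place of the ones in the question):

#!/bin/bash
read -r ZipName
# -c resumes a partial download; quote $ZipName in case the name contains spaces
if wget -c -q --show-progress "ftp://user:password@ftp.example.com/$ZipName"; then
    unzip "$ZipName"
else
    echo "download of $ZipName failed" >&2
    exit 1
fi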

Bash command to copy images from remote url

I'm using mac's terminal.
I want to copy images from a remote URL, http://media.pragprog.com/titles/rails4/code/depot_b/public/images/, to a local directory.
What's the command to do that?
Tnx,
You can use curl
curl -O "http://media.pragprog.com/titles/rails4/code/depot_b/public/images/*.jpg"
for example.
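Note that curl does not expand shell-style * wildcards against the server; it only performs its own URL globbing with [ ] ranges and { } lists, so something closer to the following is needed (a sketch with hypothetical file names):

# curl expands [ ] ranges and { } lists itself; -O saves each file under its remote name
curl -O "http://media.pragprog.com/titles/rails4/code/depot_b/public/images/image[1-5].jpg"
curl -O "http://media.pragprog.com/titles/rails4/code/depot_b/public/images/{logo,header}.png"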
Alternatively, you may want just all the images from a website. wget can do this with a recursive option, such as:
$ wget -r -A jpeg,jpg,bmp,png,gif,tiff,xpm,ico http://www.website.com/
This should only download files with the comma-delimited extensions, recursively, starting at the site index. It works like a web spider, so if an image is not referenced anywhere on the site it will be missed.
wget will work, assuming the server has directory listing enabled:
wget -m http://media.pragprog.com/titles/rails4/code/depot_b/public/images
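If directory listing is available, you can also restrict the mirror to image files and collect them in one local folder (a sketch: -nd skips recreating the remote directory tree and -P sets the local target directory):

# mirror only common image extensions into ./images, without the remote directory structure
wget -m -nd -A jpg,jpeg,png,gif -P ./images http://media.pragprog.com/titles/rails4/code/depot_b/public/images/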
You can do this with Wget or cURL. If I recall correctly, neither comes out of the box with OS X, so you may need to install them with MacPorts or something similar.

wget on Windows command line

Basically I'm trying to download images from a website using the following command (SwiftIRC is an easy example to use):
wget.exe -r -l1 -A.png --no-parent www.swiftirc.net/index.php
This command works fine; however, one of the ways I am trying to run it isn't working.
When I fire up an elevated command prompt, it defaults to windows\system32.
If I use the following two commands, everything works fine:
cd c:\users\tom\downloads\
wget.exe -r -l1 etc. etc.
The images are saved in the folder www.swiftirc.net in my downloads folder.
However if I try to do this in one line like this:
c:\users\tom\downloads\wget.exe -r -l1 etc. etc.
The response from wget on the cmd is exactly the same, but the images are not saved on my hard disk.
Does anyone know what I'm doing wrong?
Try adding c:\users\tom\downloads\ to PATH, or put wget.exe into your windows\system32 folder.
I believe it's because Windows doesn't allow users to write files to the disk root. When you run "c:\users\tom\downloads\wget.exe", you have C:\ as the working directory, so the files should be saved there, but that isn't allowed by the default security settings.
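Alternatively, you can leave wget.exe where it is and tell it where to write with -P (directory prefix); a sketch reusing the paths from the question:

c:\users\tom\downloads\wget.exe -r -l1 -A.png --no-parent -P c:\users\tom\downloads www.swiftirc.net/index.php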
