How to wget/curl a dynamically built zip archive? - bash

I'm trying to create a script that will download a dynamically built Initializr zip archive. Something like:
wget http://www.initializr.com/builder?mode=less&boot-hero&h5bp-htaccess&h5bp-nginx&h5bp-webconfig&h5bp-chromeframe&h5bp-analytics&h5bp-build&h5bp-iecond&h5bp-favicon&h5bp-appletouchicons&h5bp-scripts&h5bp-robots&h5bp-humans&h5bp-404&h5bp-adobecrossdomain&jquery&modernizrrespond&boot-css&boot-scripts
That URL works in a browser, but not in a script. In a script it downloads only a small portion of the archive and saves it as builder?mode=less instead of initializr-less-verekia-3.0.zip.
builder?mode=less actually unzips, so it is just a misnamed zip file. But it's missing probably 80% of the files it should have.
Anyone know how to script this?

The URL contains shell metacharacters, so you'll have to quote the whole URL. Unquoted, the shell treats each & as a command separator, so wget only requests builder?mode=less — that's why the archive is missing most of its files and why it gets saved under that name:
wget 'your...url...here'
If the Initializr website doesn't put a proper filename into the HTTP response headers (Content-Disposition), wget falls back to a name derived from the URL being requested. You can force it to write to a specific filename with
wget 'your url here' -O name_of_file.zip
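For example, applying both fixes to the original Initializr request (the output name here is simply what the browser would have used):
wget -O initializr-less-verekia-3.0.zip 'http://www.initializr.com/builder?mode=less&boot-hero&h5bp-htaccess&h5bp-nginx&h5bp-webconfig&h5bp-chromeframe&h5bp-analytics&h5bp-build&h5bp-iecond&h5bp-favicon&h5bp-appletouchicons&h5bp-scripts&h5bp-robots&h5bp-humans&h5bp-404&h5bp-adobecrossdomain&jquery&modernizrrespond&boot-css&boot-scripts'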

Related

WGET - Download specific files (by extension or mime-type) from third party websites

I need to get ALL ".js" files from a website using wget, including third-party ones, but not all of them are being downloaded.
I use the following command:
wget -H -p -A "*.js" -e robots=off --no-check-certificate https://www.quantcast.com
For example, if I run wget against "https://www.stackoverflow.com" I want to get all "*.js" files from stackoverflow.com, but also from third-party websites such as "scorecardresearch.com", "secure.quantserve.com" and others.
Is something missing in my code?
Thanks in advance!
Wget with the -p flag will only download simple page requisites like scripts with a src, links with an href or images with a src.
Third-party scripts are often loaded dynamically by script snippets (such as Google Tag Manager, https://developers.google.com/tag-manager/quickstart). These dynamically loaded scripts will not be downloaded by wget, since the JavaScript has to run before they are ever requested. To get absolutely everything, you would likely need something like Puppeteer or Selenium to load the page and scrape the contents.
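As a rough sketch of that approach from the shell (the headless browser binary name, target URL and output directory are assumptions), you could render the page first and then hand the script URLs it ends up referencing to wget:
chromium --headless --dump-dom 'https://www.stackoverflow.com' \
  | grep -oP '<script[^>]*src="\K[^"]+' \
  | grep '^https\?://' \
  | wget -i - -P js-files/
Relative src values are skipped here for brevity; a real script would need to resolve them against the page URL before feeding them to wget.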

Download CSV file in bash shell

Hello friends, I am trying to download a CSV file from an external URL. I tried with the wget command but I get a 400 Bad Request error. If I paste the URL directly into the browser I can download the CSV file. Is there another way to download this type of file, or another solution? I need to end up with a file containing the CSV content.
Thanks
Have you escaped all special characters in the URL, such as & to \& or $ to \$, or quoted the whole URL?
If it doesn't have anything to do with authentication/cookies, run a tool in your browser (like Live HTTP Headers) to capture the request headers. If you then mock up those fields in wget, that would get you very close to behaving like your browser. It will also show you whether there is any difference in encodings between the wget request and the browser request.
On the other hand, you could also watch the server log files (if you have access to them).
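A minimal sketch of the header mock-up approach, with a placeholder URL and header values that you would copy from whatever the capture tool shows (note the quotes around the URL so the shell leaves the & alone):
wget -O data.csv \
  --header='User-Agent: Mozilla/5.0' \
  --header='Accept: text/csv,*/*' \
  --header='Referer: https://example.com/reports' \
  'https://example.com/export?format=csv&id=123'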

How to download all files from hidden directory

I have to download all log files from a virtual directory within a site. Access to the virtual directory is forbidden, but the files themselves are accessible.
I have manually entered the file names to download:
dir="Mar"
for ((i=1;i<100;i++)); do
    wget "http://sz.dsyn.com/2014/$dir/log_$i.txt"
done
The problem is that the script is not generic: most of the time I need to find out how many files there are and tweak the for loop. Is there a way to get wget to fetch all the files without my having to specify the exact count?
Note:
If I use the browser to view http://sz.dsyn.com/2014/$dir, it is 403 Forbidden, so I can't pull all the files via a browser tool/extension.
First of all, check this similar question. If that is not what you are looking for, you need to generate a file of URLs and feed it to wget, e.g.
wget --input-file=http://sz.dsyn.com/2014/$dir/filelist.txt
wget will have the same problem your browser has: it cannot read the directory. Just pull until your first failure then quit.
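A sketch of that "pull until the first failure" idea, reusing the variables from the question (it assumes the log numbers are contiguous and that a missing file makes wget exit non-zero):
dir="Mar"
i=1
while wget -q "http://sz.dsyn.com/2014/$dir/log_$i.txt"; do
    i=$((i+1))
done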

Use curl to download a Dropbox folder via shared link (not public link)

Dropbox makes it easy to programmatically download a single file via curl (e.g. curl -O https://dl.dropboxusercontent.com/s/file.ext). It is a little bit trickier for a folder (a regular directory, not a zipped one). The shared link for a folder, as opposed to a file, does not link directly to the zipped folder (Dropbox automatically zips the folder before it is downloaded). It would appear that you could just add ?dl=1 to the end of the link, as this will directly start the download in a browser. This, however, points to an intermediary HTML document that redirects to the actual zip file and does not seem to work with curl. Is there any way to use curl to download a folder via a shared link? I realize that the best solution would be to use the Dropbox API, but for this project it is important to keep it as simple as possible. Additionally, the solution must be incorporated into a bash shell script.
It does appear to be possible with curl by using the -L option. This forces curl to follow the redirect. Additionally, it is important to specify an output name with a .zip extension, as the default will be a random alpha-numeric name with no extension. Finally, do not forget to add the ?dl=1 to the end of the link. Without it, curl will never reach the redirect page.
curl -L -o newName.zip https://www.dropbox.com/sh/[folderLink]?dl=1
Follow redirects (use -L). Your immediate problem is that Curl is not following redirects.
Set a filename. (Optional)
Dropbox already sends a Content-Disposition Header with its Dropbox filename. There is no reason to specify the filename if you use the correct curl flags.
Conversely, you can force a filename using something of your choosing.
Use one of these commands:
curl https://www.dropbox.com/sh/AAbbCCEeFF123?dl=1 -O -J -L
Preserve/write the remote filename (-O, -J) and follow any redirects (-L).
This same line works for both individually shared files or entire folders.
Folders will save as a .zip automatically (based on folder name).
Don't forget to change the parameter ?dl=0 to ?dl=1.
OR:
curl https://www.dropbox.com/sh/AAbbCCEeFF123?dl=1 -L -o [filename]
Follow any redirects (-L) and set a filename (-o) of your choosing.
NOTE: Using the -J flag in general:
WARNING: Exercise judicious use of this option, especially on Windows. A rogue server could send you the name of a DLL or other file that could possibly be loaded automatically by Windows or some third party software.
Please consult: https://curl.haxx.se/docs/manpage.html#OPTIONS (See: -O, -J, -L, -o) for more.
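Since the solution has to live inside a bash script, a minimal sketch (the shared link and output name are placeholders):
#!/usr/bin/env bash
# Replace with the real shared-folder link and keep ?dl=1 at the end.
shared_link='https://www.dropbox.com/sh/AAbbCCEeFF123?dl=1'
curl -L -o folder.zip "$shared_link"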

How to download a file with wget that starts with a word and it has a specific extension?

I'm trying to write a bash script and I need to download certain files with wget,
like libfat-nds-1.0.11.tar.bz2, but the version of this file may change over time, so I would like to download a file that starts with libfat-nds and ends in .tar.bz2. Is this possible with wget?
Using only wget, this can be achieved with a recursive download that accepts only filenames matching the wildcard pattern (point it at whatever page links to the archives):
wget -r -np -nd --accept='libfat-nds-*.tar.bz2' <url-of-the-download-page>
The problem is that HTTP doesn't support wildcard downloads. But if directory listing is enabled on the server, or you have an index.html containing the available file names, you could download that, extract the file name you need, and then download the file with wget.
Something in this order:
Download the index with curl
Use grep and/or sed to extract the exact file name
Download the file with wget (or curl)
If you pipe the commands you can do it on one line.
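A hedged sketch of those steps, assuming the server exposes an index page that lists the archives (the base URL is a placeholder):
base='http://example.com/downloads/'
file=$(curl -s "$base" | grep -o 'libfat-nds-[0-9.]*\.tar\.bz2' | sort -V | tail -n 1)
wget "${base}${file}"
sort -V picks the highest version number if the index lists several releases.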
