How to download a file via wget when the target path includes wildcards - bash

Here is an example of how to download a file and save it to the /etc/yum.repos.d folder.
Example:
REPOSITORY_SERVER=master_machine01
wget -nd -r -P /etc/yum.repos.d/ -A ".repo" "http://$REPOSITORY_SERVER/ambari/centos7/2.6.2.2-1/ambari.repo"
After the above command, the ambari.repo file is copied to /etc/yum.repos.d/.
Note: on the server, the path of the ambari.repo file is:
ls -ltr /var/www/html/ambari/centos7/2.6.2.2-1/ambari.repo
-rw-r--r-- 1 root users 304 Jun 11 2018 /var/www/html/ambari/centos7/2.6.2.2-1/ambari.repo
So this is the simple case.
Now, what if the path can vary, for example:
$REPOSITORY_SERVER/ambari/centos7/2.6.2.3-1/ambari.repo
or
$REPOSITORY_SERVER/ambari/centos7/2.6.2.2-4/ambari.repo
Then how can we use the command with wildcards?
We tried the following:
wget -nd -r -P /etc/yum.repos.d/ -A ".repo" "http://$REPOSITORY_SERVER/ambari/centos7/*/ambari.repo"
But we get:
HTTP request sent, awaiting response... 404 Not Found
2021-11-28 18:40:07 ERROR 404: Not Found.
Or even with a backslash:
wget -nd -r -P /etc/yum.repos.d/ -A ".repo" "http://$REPOSITORY_SERVER/ambari/centos7/\*/ambari.repo"
But we get the same error.
Any idea how to resolve this issue?

how to use the cli with Wildcards
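Glob expansion is not possible over HTTP. A glob is expanded against a list of existing names, and HTTP provides no standard way to ask a server which files exist under a path; the two technologies are unrelated.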
how to resolve this issue?
Devise and implement a method of getting the list of files available under a certain path on the HTTP server. For example, contact the server administrator and ask about it. If the HTTP server serves directory listings, fetch the listing recursively and filter it to find all matching paths. Alternatively, find and query some other site that contains all the links, and filter the response to extract them.
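The directory-listing approach could be sketched roughly as follows. This is only a sketch under assumptions: it assumes the server returns an Apache-style HTML index for http://$REPOSITORY_SERVER/ambari/centos7/, it handles a single directory level, and extract_subdirs and candidate_urls are hypothetical helper names, not part of wget or curl.

```shell
#!/usr/bin/env bash
# Extract subdirectory names (hrefs ending in "/") from an HTML directory
# listing read on stdin. Assumes Apache-style autoindex markup; absolute
# links (such as the "Parent Directory" link) and "../" are dropped.
extract_subdirs() {
    grep -o 'href="[^"]*/"' \
        | sed -e 's/^href="//' -e 's/"$//' \
        | grep -v -e '^/' -e '^\.\./$'
}

# Read subdirectory names on stdin and print one candidate URL per name.
# $1 = base URL (must end in "/"), $2 = file name to look for.
candidate_urls() {
    local base=$1 file=$2 dir
    while IFS= read -r dir; do
        printf '%s%s%s\n' "$base" "$dir" "$file"
    done
}

# Possible usage (needs network access, so it is left commented out):
# base="http://$REPOSITORY_SERVER/ambari/centos7/"
# curl -fsS "$base" | extract_subdirs | candidate_urls "$base" ambari.repo |
#     while IFS= read -r url; do
#         wget -nd -P /etc/yum.repos.d/ "$url"
#     done
```

If more than one version directory contains ambari.repo, each download in the loop targets the same file name, so in practice you would probably pick a single URL (for example the last one after sorting) before calling wget.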

Related

How to download a big file from google drive via curl in Bash?

I want to write a very simple bash script for downloading files from Google Drive via the Drive API. In this case there is a big file on Google Drive. I set up the OAuth 2.0 Playground for my Google account, and in the Select the Scope box I chose Drive API v3 and https://www.googleapis.com/auth/drive.readonly to create a token and link.
After clicking Authorize APIs and then Exchange authorization code for tokens, I copied the access token and used it as below.
#! /bin/bash
read -p 'Enter your id : ' id
read -p 'Enter your new token : ' token
read -p 'Enter your file name : ' file
curl -H "Authorization: Bearer $token" "https://www.googleapis.com/drive/v3/files/$id?alt=media" -o "$file"
But it doesn't work. Any idea?
For example, my file is 12 GB. When I run the script I get the output below, and after a second it returns to the prompt. I tried it on two computers with two different IP addresses. (I also added alt=media to the URL.)
-bash-3.2# bash mycode.sh
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   166  100   166    0     0     80      0  0:00:02  0:00:02 --:--:--    80
-bash-3.2#
The content of the file it created looks like this:
{
  "error": {
    "errors": [
      {
        "domain": "global",
        "reason": "downloadQuotaExceeded",
        "message": "The download quota for this file has been exceeded."
      }
    ],
    "code": 403,
    "message": "The download quota for this file has been exceeded."
  }
}
You want to download a file from Google Drive using the curl command with the access token.
If my understanding is correct, how about this modification?
Modified curl command:
Please add the query parameter of alt=media.
curl -H "Authorization: Bearer $token" "https://www.googleapis.com/drive/v3/files/$id?alt=media" -o "$file"
Note:
This modified curl command assumes that your access token can be used for downloading the file.
With this modification, files other than Google Docs files can be downloaded. To download Google Docs files, use the Files: export method of the Drive API.
Reference:
Download files
If I misunderstood your question and this was not the direction you want, I apologize.
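The script in the question returns to the prompt silently because curl saves the JSON error body as if it were the file. Below is a minimal sketch of how the script could surface that, assuming the error body has the shape shown in the question. drive_error_reason is a hypothetical helper name, and the parsing is deliberately naive (sed rather than a JSON parser).

```shell
#!/usr/bin/env bash
# Print the first "reason" field found in a Drive API error body.
# Returns 0 if a reason was found, 1 otherwise (e.g. for a real download).
drive_error_reason() {
    local file=$1 reason
    reason=$(sed -n 's/.*"reason"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$file" | head -n 1)
    [ -n "$reason" ] || return 1
    printf '%s\n' "$reason"
}

# Possible usage after the curl call from the question (commented out
# because it needs a valid token and file id):
# curl -H "Authorization: Bearer $token" \
#      "https://www.googleapis.com/drive/v3/files/$id?alt=media" -o "$file"
# if reason=$(drive_error_reason "$file"); then
#     echo "Drive API error: $reason" >&2
#     exit 1
# fi
```

An alternative is to pass --fail to curl, which makes curl exit with a non-zero status on HTTP errors such as the 403 above instead of saving the error body.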
Update as of March 2021
Simply follow this guide here. It worked for me.
In summary:
For small files, run:
wget --no-check-certificate 'https://docs.google.com/uc?export=download&id=FILEID' -O FILENAME
If you are trying to download a fairly large file, run this instead:
wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate 'https://docs.google.com/uc?export=download&id=FILEID' -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')&id=FILEID" -O FILENAME && rm -rf /tmp/cookies.txt
Simply substitute FILEID and FILENAME with your own values.
FILEID can be found in your file's share link (after the /d/, as illustrated in the article mentioned above).
FILENAME is simply the name you want to save the download as. Remember to include the right extension, for example FILENAME = my_file.pdf if the file is a PDF.
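The large-file one-liner is easier to follow when split into steps. The sketch below does the same thing as the command above, under the same assumption that the first response contains a confirm=TOKEN link. extract_confirm_token and download_large_file are hypothetical names, --no-check-certificate is dropped on the assumption that certificate verification works on your machine, and a temporary file replaces /tmp/cookies.txt.

```shell
#!/usr/bin/env bash
# Read the HTML of the download warning page on stdin and print the
# confirmation token found after "confirm=".
extract_confirm_token() {
    sed -En 's/.*confirm=([0-9A-Za-z_]+).*/\1/p' | head -n 1
}

# $1 = FILEID, $2 = FILENAME (same meaning as in the commands above).
download_large_file() {
    local file_id=$1 out=$2 cookies token
    cookies=$(mktemp) || return 1
    # Step 1: request the file once, keep the session cookies, grab the token.
    token=$(wget --quiet --save-cookies "$cookies" --keep-session-cookies \
                 "https://docs.google.com/uc?export=download&id=${file_id}" -O- \
            | extract_confirm_token)
    # Step 2: repeat the request with the cookies and the token.
    wget --load-cookies "$cookies" \
         "https://docs.google.com/uc?export=download&confirm=${token}&id=${file_id}" \
         -O "$out"
    local status=$?
    rm -f "$cookies"
    return "$status"
}

# Possible usage (needs network access):
# download_large_file FILEID my_file.pdf
```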
This is a known bug.
It has been reported in this Issue Tracker post. The cause is described in the documentation (about the download URL):
Short-lived download URL for the file. This field is only populated for files with content stored in Google Drive; it is not populated for Google Docs or shortcut files.
So you should use another field.
You can follow the report by clicking the star next to the issue number, which gives the bug more priority and subscribes you to updates.
As you can read in the comments of the report, the current workarounds are:
Use webContentLink instead
or
Change www.googleapis.com to content.googleapis.com

Pipelining listing and copying files

I have a server hosting my files, which I can list with the following command:
xrdfs servername ls path/to/file
Similarly, I can copy a file using the following command:
xrdcp server/path/to/file .
For some reason the server doesn't support copying an entire folder (even with the -r option). So I am trying to pipe these two commands together so that xrdfs lists the files and xrdcp copies each one to my destination. I tried the following line:
xrdfs servername ls path/to/file | xrdcp server/$() .
I get the following message:
Prepare: [ERROR] Invalid arguments
This is not very enlightening. Can somebody help with this?
OK, I found an answer and am posting it here for reference:
xrdfs servername ls path/to/file | while IFS= read -r out; do xrdcp "server$out" .; done
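The first attempt fails because $() is an empty command substitution: it expands to nothing, so xrdcp is called once with the incomplete source server/ and the piped listing is never turned into arguments. The loop above fixes that by running xrdcp once per listed path. The same idea as a reusable function, as a sketch only (copy_each is a hypothetical name, and it assumes the listing prints one path per line):

```shell
#!/usr/bin/env bash
# Read one remote path per line on stdin and run the given copy command
# once per path, with the server prefix prepended and "." as destination.
# $1 = server prefix, remaining arguments = the copy command (e.g. xrdcp).
copy_each() {
    local server=$1 path
    shift
    while IFS= read -r path; do
        [ -n "$path" ] || continue      # skip empty lines
        "$@" "${server}${path}" .
    done
}

# Possible usage with the commands from the question:
# xrdfs servername ls path/to/file | copy_each server xrdcp
```

Passing echo instead of xrdcp gives a dry run that only prints the arguments each copy would be called with.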

How to download some specific files with some keywords from different directories using wget?

I am trying to download data from TRMM satellite data archive using the following command
wget -r --no-parent ftp://arthurhou.pps.eosdis.nasa.gov/pub/trmmdata/ByDate/V07/2008/01/01 --user=--user= --password="
2008 is the year, 01 is the month (January), and the second 01 is the day. Within this date folder there are plenty of data files
(e.g. 1A01.20080101.57701.7.gz, 2A21.20080101.57711.7.HDF.gz, 2A23.20080101.57702.7.HDF.gz).
I want to download only the files in the "2A23" category from every folder (i.e. every year, month, and day), but with my wget command all the files get downloaded. Is there a way to specify a keyword so that only those files are downloaded?
Thank you in advance for your help.
Here is the solution, in case someone gets stuck on the same question later.
wget -r --no-parent -A 'pattern' 'URL' --user=--user= --password=
In my case the pattern was 2a23*.gz.
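To preview which file names a pattern would keep before starting a long recursive download, the pattern can be tried locally. This sketch uses the shell's own glob matching in a case statement as an approximation of wget's accept patterns; it is not wget's implementation, and filter_by_pattern is a hypothetical name. Shell matching here is case-sensitive, so the example uses the upper-case spelling from the listing in the question.

```shell
#!/usr/bin/env bash
# Read file names on stdin and print those matching the glob in $1.
filter_by_pattern() {
    local pattern=$1 name
    while IFS= read -r name; do
        case $name in
            # Unquoted on purpose so that $pattern is treated as a glob.
            $pattern) printf '%s\n' "$name" ;;
        esac
    done
}

# Example with the file names from the question:
# printf '%s\n' 1A01.20080101.57701.7.gz 2A23.20080101.57702.7.HDF.gz |
#     filter_by_pattern '2A23*.gz'
```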

What is the fastest way to perform a HTTP request and check for 404?

Recently I needed to check whether each filename in a huge list exists on a server. I did this by running a for loop that tried to wget each of those files. That worked well enough, but took about 30 minutes in this case. I wonder if there is a faster way to check whether a file exists or not (since wget is meant for downloading files, not for performing thousands of requests).
I don't know if that information is relevant, but it's an Apache server.
curl would be the best option in a for loop. Here is a straightforward way; run this in your for loop:
curl -I --silent http://www.yoururl/linktodetect | grep -m 1 -c 404
This checks the HTTP response headers for a 404 returned for the link. If the file or link is missing and a 404 is detected, the command prints 1; otherwise, if the file or link is valid and does not return a 404, it prints 0.
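A variant of the same idea, as a sketch only: ask curl for the status code itself with -w '%{http_code}' instead of grepping the headers, and run several requests at once. http_status and only_missing are hypothetical helper names, urls.txt is an assumed file with one URL per line, and the -P option of xargs is a GNU/BSD extension.

```shell
#!/usr/bin/env bash
# Print the HTTP status code of a HEAD request for one URL.
http_status() {
    curl -o /dev/null -s -I -w '%{http_code}' "$1"
}

# Read "STATUS URL" lines on stdin and print only the URLs that returned 404.
only_missing() {
    awk '$1 == "404" { print $2 }'
}

# Possible usage with 8 requests in parallel (needs network access):
# export -f http_status
# xargs -P 8 -n 1 bash -c 'printf "%s %s\n" "$(http_status "$1")" "$1"' _ \
#     < urls.txt | only_missing
```

Comparing the status code exactly also avoids counting a 404 that merely appears elsewhere in the headers, for example in a Content-Length value.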

Using CURL to download file and view headers and status code

I'm writing a Bash script to download image files from Snapito's web page snapshot API. The API can return a variety of responses indicated by different HTTP response codes and/or some custom headers. My script is intended to be run as an automated Cron job that pulls URLs from a MySQL database and saves the screenshots to local disk.
I am using curl. I'd like to do these 3 things using a single CURL command:
Extract the HTTP response code
Extract the headers
Save the file locally (if the request was successful)
I could do this using multiple curl requests, but I want to minimize the number of times I hit Snapito's servers. Any curl experts out there?
Or if someone has a Bash script that can respond to the full documented set of Snapito API responses, that'd be awesome. Here's their API documentation.
Thanks!
Use the dump headers option:
curl -D /tmp/headers.txt http://server.com
Use curl -i (include HTTP headers), which yields the headers, followed by a blank line, followed by the content.
You can then split out the headers and the content (or use -D to save the headers directly to a file, as suggested above).
There are three relevant options: -i, -I, and -D.
> curl --help | egrep '^ +\-[iID]'
 -D, --dump-header FILE  Write the headers to FILE
 -I, --head              Show document info only
 -i, --include           Include protocol headers in the output (H/F)
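Putting the pieces together, one curl call can cover all three requirements: -D saves the headers, -o saves the body, and -w '%{http_code}' prints the status code. This is a minimal sketch; fetch_with_headers and status_from_headers are hypothetical names and the file names in the usage are placeholders.

```shell
#!/usr/bin/env bash
# One request: headers go to $3, the body goes to $2, and the status code
# is printed on stdout. $1 = URL.
fetch_with_headers() {
    local url=$1 out=$2 headers=$3
    curl -sS -D "$headers" -o "$out" -w '%{http_code}\n' "$url"
}

# Recover the status code from a saved header file. If the file holds
# several header blocks, the last status line wins.
status_from_headers() {
    awk '/^HTTP\// { sub(/\r$/, ""); code = $2 }
         END { if (code != "") print code }' "$1"
}

# Possible usage (needs network access):
# code=$(fetch_with_headers "http://server.com" /tmp/shot.png /tmp/headers.txt)
# if [ "$code" != "200" ]; then rm -f /tmp/shot.png; fi
```

Note that the body file is written even when the request fails, so the script has to discard it based on the status code, as in the usage above.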
