Download a TSV file from a link that generates the file upon request, using bash

I need to download .txt files which are generated from links like this one:
https://www.ebi.ac.uk/ena/portal/api/filereport?accession=SRP002480&result=read_run&fields=fastq_ftp&format=tsv&download=true&limit=0
but I need to download them in the bash shell. It works perfectly fine in Firefox; in the shell I tried wget and curl to no avail. I read lots of similar questions on Stack Overflow and other pages and tried everything I could find, but couldn't find a solution.
For example:
curl https://www.ebi.ac.uk/ena/portal/api/filereport?accession=SRP002480&result=read_run&fields=fastq_ftp&format=tsv&download=true&limit=0
This is the output, and no file is downloaded:
[1] 1094
[2] 1095
[3] 1096
[4] 1097
[5] 1098
[2] Done result=read_run
[3] Done fields=fastq_ftp
[4]- Done format=tsv
(base) user@DESKTOP-LV4SKHQ:/mnt/c/Users/conog/Desktop/prova$ curl: (6) Could not resolve host: www.ebi.ac.uk
[1]- Exit 6 curl https://www.ebi.ac.uk/ena/portal/api/filereport?accession=SRP002480
[5]+ Done download=true
Another example, after I read a couple of posts here:
curl -O -L https://www.ebi.ac.uk/ena/portal/api/filereport?accession=SRP002480&result=read_run&fields=fastq_ftp&format=tsv&download=true&limit=0
[1] 1056
[2] 1057
[3] 1058
[4] 1059
[5] 1060
[2] Done result=read_run
[3] Done fields=fastq_ftp
[4] Done format=tsv
[5]+ Done download=true
(base) gsoletta@DESKTOP-LV4SKHQ:/mnt/c/Users/conog/Desktop/prova$ % Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 49 100 49 0 0 68 0 --:--:-- --:--:-- --:--:-- 67
[1]+ Done
This last one downloads a 49-byte file with no extension, called filereportaccession=SRP002480, containing: "Required String parameter 'result' is not present".
I'll also add that I'm a novice at bash.
What can I do?
Thank you!

It works for me:
$ curl -s 'https://www.ebi.ac.uk/ena/portal/api/filereport?accession=SRP002480&result=read_run&fields=fastq_ftp&format=tsv&download=true&limit=0'
run_accession fastq_ftp
SRR1620013 ftp.sra.ebi.ac.uk/vol1/fastq/SRR162/003/SRR1620013/SRR1620013_1.fastq.gz;ftp.sra.ebi.ac.uk/vol1/fastq/SRR162/003/SRR1620013/SRR1620013_2.fastq.gz
SRR1620014 ftp.sra.ebi.ac.uk/vol1/fastq/SRR162/004/SRR1620014/SRR1620014_1.fastq.gz;ftp.sra.ebi.ac.uk/vol1/fastq/SRR162/004/SRR1620014/SRR1620014_2.fastq.gz
...
$ wget -O filereport.tsv 'https://www.ebi.ac.uk/ena/portal/api/filereport?accession=SRP002480&result=read_run&fields=fastq_ftp&format=tsv&download=true&limit=0'
--2021-11-15 17:51:48-- https://www.ebi.ac.uk/ena/portal/api/filereport?accession=SRP002480&result=read_run&fields=fastq_ftp&format=tsv&download=true&limit=0
Resolving www.ebi.ac.uk (www.ebi.ac.uk)... 193.62.193.80
Connecting to www.ebi.ac.uk (www.ebi.ac.uk)|193.62.193.80|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/plain]
Saving to: ‘filereport.tsv’
...
2021-11-15 17:51:51 (831 KB/s) - ‘filereport.tsv’ saved [675136]
Your problem is that you didn't put quotes around the URL. Unquoted, each & tells bash to run whatever precedes it as a background job, so the line is split into several separate commands: curl only ever sees the URL up to the first &, and result=read_run, fields=fastq_ftp, and so on each run as their own background commands, which is exactly what the [1] through [5] job messages show.
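For completeness, a minimal sketch of doing the same from a script (the variable name and the ${accession}.tsv output name are my own choices; the URL is the one from the question):
#!/bin/bash
accession=SRP002480
# Double quotes also protect the &s, while still letting the variable expand.
url="https://www.ebi.ac.uk/ena/portal/api/filereport?accession=${accession}&result=read_run&fields=fastq_ftp&format=tsv&download=true&limit=0"
# -s silences the progress meter; -o names the output file explicitly.
curl -s -o "${accession}.tsv" "$url"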

Related

Bash: How to kill a wget process after a given timeout? [duplicate]

This question already has answers here:
Timeout command on Mac OS X?
(6 answers)
Closed 1 year ago.
UPDATE:
This question has been closed, so I asked a follow-up question:
Bash how to run multiple wget with output by sequence (NOT Parallel) and delete the wget file for speedtest purpose
I used the timeout solution:
#!/bin/bash
function speedtest() {
    local key=$1
    local url=$2
    # Quote the URL so shell metacharacters in it cannot break the call.
    timeout 10 wget "$url"
    echo -e "\033[40;32;1m$key is completed.\033[0m"
}
speedtest "Lisbon" "https://lg-lis.fdcservers.net/100MBtest.zip"
speedtest "London" "https://lg-lon.fdcservers.net/100MBtest.zip"
speedtest "Madrid" "https://lg-mad.fdcservers.net/100MBtest.zip"
speedtest "Paris" "https://lg-par2.fdcservers.net/100MBtest.zip"
When I run the bash script, this is the output.
» ./wget_speedtest.sh [2021/05/8 |15:42:49]
Redirecting output to ‘wget-log’.
Lisbon is completed.
Redirecting output to ‘wget-log.1’.
London is completed.
Redirecting output to ‘wget-log.2’.
Madrid is completed.
Redirecting output to ‘wget-log.3’.
Paris is completed.
I expected to see the KB/s rate after each wget had run for 10 seconds.
I have a list of files to download with wget and want to measure the download speed: run each download for about 10 seconds, then print the speed that was reached. I have 20 different server files to test; the goal is to see how many KB/s each download reaches in those 10 seconds.
For example:
> wget https://lg-lis.fdcservers.net/100MBtest.zip
--2021-05-08 13:37:37-- https://lg-lis.fdcservers.net/100MBtest.zip
Resolving lg-lis.fdcservers.net (lg-lis.fdcservers.net)... 50.7.43.4
Connecting to lg-lis.fdcservers.net (lg-lis.fdcservers.net)|50.7.43.4|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 104857600 (100M) [application/zip]
Saving to: ‘100MBtest.zip’
100MBtest.zip 0%[ ] 679.66K 174KB/s eta 9m 26s ^C
This was my original bash script:
#!/bin/bash
function speedtest() {
    local key=$1
    local url=$2
    ( cmdpid=$$
      (sleep 10; kill $cmdpid; rm -f 100M) &
      while ! wget "$url"
      do
          echo -e "\033[40;32;1m$key for 10 seconds done.\033[0m"
      done )
}
speedtest "Lisbon" "https://lg-lis.fdcservers.net/100MBtest.zip"
speedtest "London" "https://lg-lon.fdcservers.net/100MBtest.zip"
speedtest "Madrid" "https://lg-mad.fdcservers.net/100MBtest.zip"
speedtest "Paris" "https://lg-par2.fdcservers.net/100MBtest.zip"
However, the above code does not work; the download still runs in the background and its output is redirected to wget-log.
> ./wget_speedtest.sh
--2021-05-08 13:41:56-- https://lg-lis.fdcservers.net/100MBtest.zip
Resolving lg-lis.fdcservers.net (lg-lis.fdcservers.net)... 50.7.43.4
Connecting to lg-lis.fdcservers.net (lg-lis.fdcservers.net)|50.7.43.4|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 104857600 (100M) [application/octet-stream]
Saving to: ‘100M’
100M 5%[==> ] 5.84M 1.02MB/s eta 79s [1] 21251 terminated ./wget_speedtest.sh
Redirecting output to ‘wget-log’.
Use the timeout command (part of GNU coreutils):
$ timeout 10 wget https://lg-lis.fdcservers.net/100MBtest.zip
--2021-05-07 22:53:20-- https://lg-lis.fdcservers.net/100MBtest.zip
Resolving lg-lis.fdcservers.net (lg-lis.fdcservers.net)... 50.7.43.4
Connecting to lg-lis.fdcservers.net (lg-lis.fdcservers.net)|50.7.43.4|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 104857600 (100M) [application/zip]
Saving to: ‘100MBtest.zip’
100MBtest.zip 0%[ ] 23.66K 10.8KB/s
$
If the specified time is exceeded, timeout exits with a status of 124.
See timeout --help or info coreutils timeout for more information.
If you're on MacOS, see Timeout command on Mac OS X? for some suggested alternatives.
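Putting the pieces together, a minimal sketch of the speed test using timeout (assumes GNU coreutils and wget; pulling the rate out of wget's progress output with tail is my own heuristic, not part of the original answer):
#!/bin/bash
speedtest() {
    local key=$1
    local url=$2
    # -O /dev/null discards the data so no partial files are left behind;
    # --tries=1 stops retries from eating into the 10-second window.
    # wget writes progress to stderr, so redirect it into the pipe and
    # keep only the last lines, which contain the current rate.
    timeout 10 wget -O /dev/null --tries=1 "$url" 2>&1 | tail -n 2
    # timeout exits with status 124 when the time limit was reached.
    if [ "${PIPESTATUS[0]}" -eq 124 ]; then
        echo -e "\033[40;32;1m$key: stopped after 10 seconds.\033[0m"
    fi
}
speedtest "Lisbon" "https://lg-lis.fdcservers.net/100MBtest.zip"
speedtest "London" "https://lg-lon.fdcservers.net/100MBtest.zip"
speedtest "Madrid" "https://lg-mad.fdcservers.net/100MBtest.zip"
speedtest "Paris" "https://lg-par2.fdcservers.net/100MBtest.zip"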

Piping raw code from github to ruby not working?

I am doing some basic piping of simple raw code from GitHub into the terminal, as shown here, i.e.:
curl https://raw.github.com/leachim6/hello-world/master/r/ruby.rb | ruby
When I try it, it doesn't produce "Hello World", but instead I just see
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- 0:00:01 --:--:-- 0
Use:
curl -sSL https://raw.github.com/leachim6/hello-world/master/r/ruby.rb | ruby
This should work.
Update, to explain:
The URL redirects to
https://raw.githubusercontent.com/leachim6/hello-world/master/r/ruby.rb
so the -L (--location) option is required to follow the redirect; it makes curl redo the request at the new location.
-sS hides the progress bar but still shows errors if any happen.
To debug a curl request, use the -v option, which shows exactly what is happening.
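A quick way to verify the redirect yourself (a sketch; %{http_code} and %{redirect_url} are standard curl -w format variables):
# Print only the status code and the redirect target, discarding the body:
curl -s -o /dev/null -w '%{http_code} %{redirect_url}\n' \
     https://raw.github.com/leachim6/hello-world/master/r/ruby.rb
# A 3xx status pointing at raw.githubusercontent.com is why -L is needed:
curl -sSL https://raw.github.com/leachim6/hello-world/master/r/ruby.rb | ruby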

Calling curl command from Ruby's system method

I'm using a curl command inside the logstash exec plugin to post a message to a Stride group. The plugin documentation states that it uses Ruby's system method to execute the command, so I'm trying to run it in Ruby's IRB.
Escaping the double quotes with backslashes gives the error "The request body cannot be parsed as valid JSON". Here is the full error:
irb(main):050:0' --inf-ruby-2f3827a9-23243-13517-726000--
curl: (6) Couldn't resolve host 'first'
curl: (3) [globbing] unmatched close brace/bracket in column 9
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 165 0 87 100 78 60 54 0:00:01 0:00:01 --:--:-- 60
{
"statusCode": 400,
"message": "The request body cannot be parsed as valid JSON"
}=> true
I tried swapping the double quotes for single quotes and using double quotes everywhere; nothing works.
system("curl -X POST -H 'Content-Type: application/json' -H 'Authorization: Bearer blah-blah' -d '{\"body\":{\"version\":1,\"type\":\"doc\",\"content\":[{\"type\":\"paragraph\",\"content\":[{\"type\":\"text\",\"text\":\"My first message!\"}]}]}}' --url 'https://api.atlassian.com/site/blah-blah/conversation/blah-blah/message'")
Is there any way to make this work?
EDIT: I have tried running the cURL command directly in the terminal and it works fine.
Besides the answers already on Stack Overflow: Net::HTTP has notoriously poor documentation, but it can be used to post this payload directly from Ruby, which avoids the shell-quoting problem entirely.
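If you do stay with curl, one way to sidestep the quoting problem is to keep the JSON off the command line entirely and feed it to curl on stdin with -d @- (a sketch; the URL and token placeholders are the ones from the question):
# The quoted heredoc delimiter ('JSON') prevents any shell expansion
# inside the body, so none of the double quotes need escaping.
curl -X POST \
     -H 'Content-Type: application/json' \
     -H 'Authorization: Bearer blah-blah' \
     -d @- \
     'https://api.atlassian.com/site/blah-blah/conversation/blah-blah/message' <<'JSON'
{"body":{"version":1,"type":"doc","content":[{"type":"paragraph","content":[{"type":"text","text":"My first message!"}]}]}}
JSON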

How do I download a file to a newly created directory with curl on OS X?

I am trying to download my Heroku backups to a folder.
Downloading to the current folder like this works:
curl -o latest.dump `heroku pg:backups public-url`
But when I tried adding a folder path in front of latest.dump, it failed like this:
$ curl -o /db-bkups/latest.dump `heroku pg:backups public-url`
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 44318 0 0 0 0 0 0 --:--:-- 0:00:01 --:--:-- 0Warning: Failed to create the file
Warning: /db-bkups/latest.dump: No such file
Warning: or directory
36 44318 36 16384 0 0 9626 0 0:00:04 0:00:01 0:00:03 9626
curl: (23) Failed writing body (0 != 16384)
Ideally, I would like it be saved and downloaded like this:
/db-bkups/nov-1-2016/timestamp-db.dump
where the folder nov-1-2016 is created dynamically when the cron job runs, and the filename carries the timestamp of when the backup ran.
You could try the --create-dirs option, which was added in curl 7.10.3.
Here is an example that creates the directory hierarchy (if it doesn't already exist) and names the subdirectory with the output of the date command:
curl -o /db-bkups/$(date +"%b-%d-%Y")/timestamp-db.dump --create-dirs http://www.w3schools.com/xml/simple.xml
The result is a file stored in a directory like so /db-bkups/Nov-04-2016/timestamp-db.dump.
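Applied to the Heroku backup from the question, a sketch (assumes the heroku CLI is installed and logged in; using the epoch seconds from date +%s as the timestamp part of the filename is my own choice):
# Build the dated directory and timestamped filename in one go,
# letting --create-dirs make any missing directories:
curl --create-dirs \
     -o "/db-bkups/$(date +%b-%d-%Y)/$(date +%s)-db.dump" \
     "$(heroku pg:backups public-url)"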

Why is wget saving something when using a parametrized URL?

I am using the following command in my bash script to trigger a Jenkins build:
wget --no-check-certificate "http://<jenkins_url>/view/some_view/job/some_prj/buildWithParameters?token=xxx"
Output:
HTTP request sent, awaiting response... 201 Created
Length: 0
Saving to: “buildWithParameters?token=xxx”
[ <=> ] 0 --.-K/s in 0s
2015-02-20 10:10:46 (0.00 B/s) - “buildWithParameters?token=xxx” saved [0/0]
It then creates the empty file “buildWithParameters?token=xxx”.
My question is: why does wget create this file, and how do I turn that behavior off?
Most simply:
wget --no-check-certificate -O /dev/null http://foo
This makes wget save the file to /dev/null, effectively discarding it.
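Applied to the Jenkins trigger from the question (the <jenkins_url> placeholder is as given; adding -q to silence the progress output is optional):
# Quote the URL so the ? stays out of bash's glob expansion, and
# discard the empty response body instead of saving it:
wget --no-check-certificate -q -O /dev/null \
     "http://<jenkins_url>/view/some_view/job/some_prj/buildWithParameters?token=xxx"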
