Use curl and a loop in a shell script to step through different jq values - bash

I am using a curl command to get json data from an application called "Jira".
Stupidly (in my view), you cannot use the API to return more than 50 values at a time.
The only option is to issue multiple commands, which they call "pagination". It is not possible to get more than 50 results, no matter the command.
This is the command here:
curl -i -X GET 'https://account_name.atlassian.net/rest/api/3/project/search?jql=ORDER%20BY%20Created&maxResults=50&startAt=100' --user 'scouse_bob#mycompany.com:<sec_token_deets>'
This is the key piece of what I am trying to work into a loop to avoid having to do this manually each time:
startAt=100
My goal is to "somehow" have this loop in blocks of fifty, so, startAt=50 then startAt=100, startAt=150 etc and append the entire output to a file until the figure 650 is reached and / or there is no further output available.
I have played around with a command like this:
#!/bin/ksh
i=1
while [[ $i -lt 1000 ]] ; do
curl -i -X GET 'https://account_name.atlassian.net/rest/api/3/project/search?jql=ORDER%20BY%20Created&maxResults=50&startAt=100' --user 'scouse_bob#mycompany.com:<sec_token_deets>'
echo "$i"
(( i += 1 ))
done
Which does not really get me far: although it will loop, I am uncertain how to apply the variable to the URL.
Help appreciated.

My goal is to "somehow" have this loop in blocks of fifty, so, startAt=50 then startAt=100, startAt=150 etc and append the entire output to a file until the figure 650 is reached and / or there is no further output available.
The former is easy:
i=0
while [[ $i -lt 650 ]]; do
    # if you meant up to and including 650, change to -le 650 or -lt 700
    curl "https://host/path?blah&startAt=$i"
    # pipe to/through some processing if desired
    # note: the URL is in double quotes so $i is expanded but
    # other special chars like & don't break parsing
    # also, -X GET is the default (without -d or similar) and can be omitted
    (( i += 50 ))
done
The latter depends on just what 'no further output available' looks like. I'd expect you probably don't get an HTTP error, but either a Content-Type indicating an error or a JSON body containing an end, error, or no-data indication. How to recognize this depends on what you get back, and I don't know this API. I'll guess you probably want something more or less like:
curl ... >tmpfile
if jq -e '.eof==true' tmpfile; then break; else cat/whatever tmpfile; fi
# or
if jq -e '.data|length==0' tmpfile; then break; else cat/whatever tmpfile; fi
where tmpfile is some suitable filename that won't conflict with your other files; the most general way is to use $(mktemp) (saved in a variable). Or, instead of a file, put the data in a variable with var=$(curl ...) and then use <<<"$var" as input to anything that reads stdin.
EDIT: I meant to make this CW to make it easier for anyone to add/fix the API specifics, but forgot; instead I encourage anyone who knows to edit.
You may want to stop when you get partial output i.e. if you ask for 50 and get 37, it may mean there is no more after those 37 and you don't need to try the next batch. Again this depends on the API which I don't know.
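Putting the pieces of this answer together, the pagination loop might look like the sketch below. The fetch_page function stands in for the real curl call (shown in the comment); here it simulates an API that runs dry after 130 records, since the actual response shape of this endpoint isn't known to me. The empty-result check uses a crude case-glob; jq -e '.values|length==0' would be the more robust test if the results really live under a ".values" key (an assumption).

```shell
#!/bin/sh
# fetch_page prints one page of JSON for the given startAt offset.
# The real version would be something like:
#   curl -s "https://account_name.atlassian.net/rest/api/3/project/search?jql=ORDER%20BY%20Created&maxResults=50&startAt=$1" --user 'scouse_bob#mycompany.com:<sec_token_deets>'
# Here it is a stand-in that simulates an API holding 130 records.
fetch_page() {
    if [ "$1" -lt 130 ]; then
        printf '{"startAt": %s, "values": ["..."]}\n' "$1"
    else
        printf '{"startAt": %s, "values": []}\n' "$1"
    fi
}

i=0
out=""
while [ "$i" -lt 650 ]; do
    page=$(fetch_page "$i")
    # Stop when the page carries no results; with jq available, prefer:
    #   jq -e '.values | length == 0' <<<"$page" && break
    case $page in
        *'"values": []'*) break ;;
    esac
    out="$out$page
"
    i=$((i + 50))       # advance in blocks of fifty
done
printf '%s' "$out"      # in the real script: >> output_file instead
```

Swap fetch_page's body for the real curl command and redirect the accumulated pages to a file, and you have the loop the question asks for.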

Related

bash script if statement issue

I call the following script from another bash script to check for file changes. If the SHA changes from one execution to the next, the google doc has been updated.
The script (tries to) accept a google drive doc ID as a parameter, then tries a few times to get the info (because gdrive fails randomly). The results are several lines long, so the script does a SHA on the results to get a unique short result.
It was working (when gdrive would return results), so I added the loop and failure message to make it a little more robust, but...
I must be doing something wrong with the if and possibly the while statements in the following script, because the script cycles through only once even when the gdrive info results fail, and also when the string whose length is tested is deliberately set to something short.
If I had hair, I'd be pulling it out.
#!/bin/bash
maxAttempts=10
minResult=100 # gdrive errors are about 80 characters
# If there was no parameter, give an error
[[ -z $1 ]] && errorMsg="Error: no google docs ID provided as a parameter" && echo $errorMsg && exit 0
# With an ID, get the file info, which includes a timestamp and return the SHA
attemptCount=0
strLength=1
while [[ "$strLength" < "$minResult" && "$attemptCount" < "$maxAttempts" ]];
do
((attemptCount++))
fileInfo="$(gdrive info $1)"
#fileInfo="TESTXXX" # use for testing different message lengths
strLength=${#fileInfo} # if under 100, the grive attempt failed
timeStamp="$(echo -n $fileInfo | sha256sum )"
echo $fileInfo
if [[ $strLength < $minResult ]]; then
sleep 10
timeStamp="Failed to get timestamp after $attemptCount tries, or wrong ID provided"
fi
done
#return the timestamp
echo $timeStamp
With the if statement at the end, I've tried single and double square brackets, double quotes around the variables, -gt and <, and even putting in the numerical values 7 and 100 to try to force that section to execute, and it still fails. I have if statements in other functioning scripts that look exactly the same. I'm going crazy. What am I not seeing? Help please.
Use round parentheses for numerical comparisons:
if (( strLength < minResult )); then
But make sure that the variables contain numerical values. This can be done with declare:
declare -i strLength
declare -i minResult
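A quick demonstration of the failure mode: inside [[ ]], the < operator compares strings lexicographically, so "100" sorts before "80" (because the character '1' sorts before '8'), while (( )) does real arithmetic. The variable names mirror the question's script; the values are made up for illustration.

```shell
#!/bin/bash
strLength=100
minResult=80
# [[ ... < ... ]] is a *string* comparison: "100" < "80" is true lexically
if [[ $strLength < $minResult ]]; then str_result="less"; else str_result="not-less"; fi
# (( ... < ... )) is an *arithmetic* comparison: 100 < 80 is false
if (( strLength < minResult )); then num_result="less"; else num_result="not-less"; fi
echo "string compare: $str_result"
echo "numeric compare: $num_result"
```

Run under bash, the string compare claims 100 is "less" than 80, which is exactly why the question's while loop exits after one pass.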

Capture percentage download from curl in real time

I am downloading a file from remote server through curl. The script will be packaged as an app through Platypus, which lets the app show percentage progress if the script output is of the format PROGRESS:\d+\n, as described here. I tried doing this
curl -O <remote_file> | sed -r 's/[# ]//g;s/^/#/g'
to get the output percentage, it didn't work. I tried out another method as described here. That didn't work as well.
How do I capture the download percentage and echo text like PROGRESS:<percentage>\n in parallel?
P.S. Perhaps it is not working as expected because this is Bash without GNU sed.
First, curl outputs information like progression on the standard error output, so you have to consider STDERR at least, for example by merging it with STDOUT using 2>&1.
Second, curl is not producing \n symbols while downloading the file but stays on the same line to overwrite displayed progression as the file is downloaded. You may thus have to read char by char the output stream, for example by using IFS= read -r -n1 char.
Given a curl-like progression char stream, I suggest to use a guard mechanism to know which char has to be kept:
If it is a digit, we want to start keeping characters.
If it is a % symbol, we want to stop keeping characters and emit the progress value.
The following code uses these ideas and may suit your problem.
curl -# -O URL 2>&1 | while IFS= read -r -n1 char; do
    [[ $char =~ [0-9] ]] && keep=1
    [[ $char == % ]] && echo "PROGRESS:$progress" && progress="" && keep=0
    [[ $keep == 1 ]] && progress="$progress$char"
done
Enjoy !
PS: You can of course use it as a one-liner by concatenating all this code, but I thought it'd be easier to read like this.

Iterate Over X Lines in Variable (Bash)

I'm trying to write a simple script to print the first 5 lines of a webpage's source code, and then the request time it took for the page to load. Currently, I've been trying the following:
#!/bin/bash
# Enter a website, get data
output=$(curl $1 -s -w "%{time_connect}\n" | (head -n5; tail -n1))
echo "$output"
However, on some pages, the "tail" doesn't actually print, which should be the time to request, and I'm not sure why.
I've found that I can also use a while loop to iterate through lines and print the whole thing, but is there a way for me to just echo the first few lines of a variable and then the last line of that same variable, so I can precede the request time with a heading (ex: Request time: 0.489)?
I'd like to be able to format it as:
echo "HTML: $output\n"
echo "Request Time: $requestTime"
Thank you! Sorry if this seems very simple, I am really new to this language :). The main problem for me is getting this data all from the same request- doing two separate curl requests would be very simple.
head may read more than 5 lines of input in order to identify what it needs to output. This means the lines you intended to pass to tail may have already been consumed. It's safer to use a single process (awk, in this case) to handle all the output.
output=$(curl "$1" -s -w "%{time_connect}\n" | awk 'NR<=5 {print} END {print}')
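Once the combined capture is in a variable, splitting it back apart gives the asker's desired "HTML: ... / Request Time: ..." layout. The sketch below simulates the pipeline's output with printf instead of a live curl request (the five HTML lines and the 0.489 timing value are invented sample data), then peels off the last line with tail and drops it with sed:

```shell
#!/bin/bash
# Simulated capture: five lines of HTML followed by the time_connect value,
# exactly the shape the curl | awk pipeline above would produce.
output=$(printf '<html>\n<head>\n</head>\n<body>\n</body>\n0.489\n' |
         awk 'NR<=5 {print} END {print}')
html=$(printf '%s\n' "$output" | sed '$d')         # everything except the last line
requestTime=$(printf '%s\n' "$output" | tail -n 1) # just the last line
echo "HTML: $html"
echo "Request Time: $requestTime"
```

sed '$d' is used rather than head -n -1 because the latter is a GNU extension and this should stay portable.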
The carriage returns threw me. Try this:
echo "HTML: "
retn=$'\r'
i=0
while read item
do
    item="${item/$retn/}"   # Strip out the carriage-return
    if (( i < 5 )); then    # Only display the first 5 lines
        echo "$item"
    fi
    (( i++ ))
    requestTime="$item"     # Grab the last line
done < <(curl "$1" -s -w "%{time_connect}\n")
requestTime="${requestTime##*\>}"   # Sanitise the last line
echo "Request Time: $requestTime"

BASH if then else with quoted text

I'm trying to write a short script that checks for verizon fios availability by zip code from a list of 5 digit us zip codes.
The basis of this I have working, but comparing the received output from curl to the expected output in the if statements isn't working.
I know there is a better & cleaner way to do this however I'm really interested in what is wrong with this method. I think it's something to do with the quotes getting jumbled up.
Let me know what you guys think. I originally thought this would be a quick little script. ha. Thanks for the help
Here is what I have so far:
#!/bin/bash
Avail='<link rel="canonical" href="http://fios.verizon.com/fios-plans.html#availability-msg" />'
NotAvail='<link rel="canonical" href="http://fios.verizon.com/order-now.html#availability-msg" />'
while read zip; do
chk=`curl -s -d "ref=GIa6uiuwP81j047HjKMHOwEyW4QJTYjG&PageID=page9765&AvailabilityZipCode=$zip" http://fios.verizon.com/availability_post4.php --Location | grep "availability-msg"`
#echo $chk
if [ "$chk" = "$Avail" ]
then
fios=1
elif [ "$chk" = "$NotAvail" ]
then
fios=0
else
fios=err
fi
echo "$zip | $fios"
done < zipcodes.txt
Most likely, the line read from curl ends in CR/LF. grep will take the LF as a line-end, and leave the CR in the line, where it will not match either of your patterns. (Other whitespace issues could also cause a similarly invisible mismatch, but stray CR's are very common since HTTP insists on them.)
The easiest solution is to use a less specific match, like a glob or regex; these are both available with bash's [[ (rather than [) command.
Eg.:
if [[ $chk =~ /fios-plans\.html ]]; then
will do a substring comparison
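To make the CR problem concrete, the sketch below simulates the line that curl | grep would hand back (the question's $Avail string with a trailing carriage return appended), then shows both fixes: the substring regex match, and stripping the CR first so the exact comparison works again. The simulated input is an assumption; a real HTTP response is not fetched here.

```shell
#!/bin/bash
Avail='<link rel="canonical" href="http://fios.verizon.com/fios-plans.html#availability-msg" />'
chk="$Avail"$'\r'                  # what curl | grep actually delivered: same line + CR

# Exact string test: the invisible CR makes this fail every time
[ "$chk" = "$Avail" ] && exact=match || exact=no-match

# Fix 1: substring regex with [[ =~ ]] ignores the trailing CR
[[ $chk =~ /fios-plans\.html ]] && regex=match || regex=no-match

# Fix 2: strip a trailing CR, then the exact comparison works
chk_stripped=${chk%$'\r'}
[ "$chk_stripped" = "$Avail" ] && stripped=match || stripped=no-match

echo "exact=$exact regex=$regex stripped=$stripped"
```

Either fix lets the question's fios=1 / fios=0 branches fire instead of always falling through to the else.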

BASH script: Downloading consecutive numbered files with wget

I have a web server that saves the logs files of a web application numbered. A file name example for this would be:
dbsclog01s001.log
dbsclog01s002.log
dbsclog01s003.log
The last 3 digits are the counter, and it can sometimes go up to 100.
I usually open a web browser, browse to the file like:
http://someaddress.com/logs/dbsclog01s001.log
and save the files. This of course gets a bit annoying when you get 50 logs.
I tried to come up with a BASH script for using wget and passing
http://someaddress.com/logs/dbsclog01s*.log
but I am having problems with the script.
Anyway, anyone has a sample on how to do this?
thanks!
#!/bin/sh
if [ $# -lt 3 ]; then
    echo "Usage: $0 url_format seq_start seq_end [wget_args]"
    exit 1
fi
url_format=$1
seq_start=$2
seq_end=$3
shift 3
printf "$url_format\\n" `seq $seq_start $seq_end` | wget -i- "$@"
Save the above as seq_wget, give it execution permission (chmod +x seq_wget), and then run, for example:
$ ./seq_wget http://someaddress.com/logs/dbsclog01s%03d.log 1 50
Or, if you have Bash 4.0, you could just type
$ wget http://someaddress.com/logs/dbsclog01s{001..050}.log
Or, if you have curl instead of wget, you could follow Dennis Williamson's answer.
curl seems to support ranges. From the man page:
URL
The URL syntax is protocol dependent. You'll find a detailed description in RFC 3986.
You can specify multiple URLs or parts of URLs by writing part sets
within braces as in:
http://site.{one,two,three}.com
or you can get sequences of alphanumeric series by using [] as in:
ftp://ftp.numericals.com/file[1-100].txt
ftp://ftp.numericals.com/file[001-100].txt (with leading zeros)
ftp://ftp.letters.com/file[a-z].txt
No nesting of the sequences is supported at the moment, but you can use
several ones next to each other:
http://any.org/archive[1996-1999]/vol[1-4]/part{a,b,c}.html
You can specify any amount of URLs on the command line. They will be
fetched in a sequential manner in the specified order.
Since curl 7.15.1 you can also specify step counter for the ranges, so
that you can get every Nth number or letter:
http://www.numericals.com/file[1-100:10].txt
http://www.letters.com/file[a-z:2].txt
You may have noticed that it says "with leading zeros"!
You can use echo-type brace sequences in the wget URL to download a run of numbers...
wget http://someaddress.com/logs/dbsclog01s00{1..3}.log
This also works with letters
{a..z} {A..Z}
Not sure precisely what problems you were experiencing, but it sounds like a simple for loop in bash would do it for you.
for i in {1..999}; do
wget -k http://someaddress.com/logs/dbsclog01s$i.log -O your_local_output_dir_$i;
done
You can use a combination of a for loop in bash with the printf command (of course modifying echo to wget as needed):
$ for i in {1..10}; do echo "http://www.com/myurl`printf "%03d" $i`.html"; done
http://www.com/myurl001.html
http://www.com/myurl002.html
http://www.com/myurl003.html
http://www.com/myurl004.html
http://www.com/myurl005.html
http://www.com/myurl006.html
http://www.com/myurl007.html
http://www.com/myurl008.html
http://www.com/myurl009.html
http://www.com/myurl010.html
Interesting task, so I wrote a full script for you (combining several answers and more). Here it is:
#!/bin/bash
# fixed vars
URL=http://domain.com/logs/ # URL address 'till logfile name
PREF=logprefix # logfile prefix (before number)
POSTF=.log # logfile suffix (after number)
DIGITS=3 # how many digits logfile's number have
DLDIR=~/Downloads # download directory
TOUT=5 # timeout for quit
# code
for (( i=1; i<10**DIGITS; ++i ))
do
    file=$PREF`printf "%0${DIGITS}d" $i`$POSTF   # local file name
    dl=$URL$file                                  # full URL to download
    echo "$dl -> $DLDIR/$file"                    # monitoring, can be commented out
    wget -T $TOUT -q "$dl" -O "$DLDIR/$file"
    if [ "$?" -ne 0 ]                             # test if we finished
    then
        exit
    fi
done
At the beginning of the script you can set the URL, logfile prefix and suffix, how many digits the logfile number has, and the download directory. The loop will download every logfile it finds and automatically exit at the first non-existent one (using wget's timeout).
Note that this script assumes that logfile indexing starts with 1, not zero, as in your example.
Hope this helps.
Here you can find a Perl script that looks like what you want
http://osix.net/modules/article/?id=677
#!/usr/bin/perl
$program="wget"; #change this to proz if you have it ;-)
my $count=1; #the lesson number starts from 1
my $base_url= "http://www.und.nodak.edu/org/crypto/crypto/lanaki.crypt.class/lessons/lesson";
my $format=".zip"; #the format of the file to download
my $max=24; #the total number of files to download
my $url;
for ($count = 1; $count <= $max; $count++) {
    if ($count < 10) {
        $url = $base_url . "0" . $count . $format;   # insert a '0' and form the URL
    }
    else {
        $url = $base_url . $count . $format;         # no need to insert a zero
    }
    system("$program $url");
}
I just had a look at the wget manpage discussion of 'globbing':
By default, globbing will be turned on if the URL contains a globbing character. This option may be used to turn globbing on or off permanently.
You may have to quote the URL to protect it from being expanded by your shell. Globbing makes Wget look for a directory listing, which is system-specific. This is why it currently works only with Unix FTP servers (and the ones emulating Unix "ls" output).
So wget http://... won't work with globbing.
Check to see if your system has seq, then it would be easy:
for i in $(seq -f "%03g" 1 10); do wget "http://.../dbsclog${i}.log"; done
If your system has the jot command instead of seq:
for i in $(jot -w "http://.../dbsclog%03d.log" 10); do wget $i; done
Oh! this is a similar problem I ran into when learning bash to automate manga downloads.
Something like this should work:
for a in `seq 1 999`; do
    if [ ${#a} -eq 1 ]; then
        b="00"
    elif [ ${#a} -eq 2 ]; then
        b="0"
    else
        b=""          # 3 digits: no padding needed
    fi
    echo "$a of 999"
    wget -q http://site.com/path/fileprefix$b$a.jpg
done
Late to the party, but a real easy solution that requires no coding is to use the DownThemAll Firefox add-on, which has the functionality to retrieve ranges of files. That was my solution when I needed to download 800 consecutively numbered files.
