Bash Curl Output compare with a variable? - bash

I have this code fetching file size from curl:
file_size=$(curl -sI -b cookies.txt -H "Authorization: Bearer $access_token" -X GET "https://url.com/api/files$file" | grep Content-Length | awk '{print $2}')
I have another set of variables:
output_filesize_lowerBound=25000000
output_filesize_upperBound=55000000
wrong_filesize=317
I want to write an if statement that compares them and allows me to process it. Sample:
if [[ ( "$file_size" > "$output_filesize_lowerBound" ) && ( "$file_size" < "$output_filesize_upperBound" ) ]]
then
echo "Writing"
curl -b cookies.txt -H "Authorization: Bearer $access_token" -X GET "https://url.com/api/files$file" -o "$output_file"
elif [[ ( "$file_size" == "$wrong_filesize" ) ]]
then
echo "Wrong File"
else
echo "There is some problem with the file's size, please check it online"
fi
Somehow it's not working for wrong files, i.e. it doesn't go to the second branch and executes the first one every time.
I wasted almost an entire day trying out every alternative I could find.

First off, I'd suggest using a different strategy for fetching object size. I've used both of these at various times:
file_size="$(curl -s -o/dev/null -w '%{size_download}' "$url")"
or
file_size="$(curl -sI "$url" | awk '{a[$1]=$2} END {print a["Content-Length:"]}')"
The first one downloads the whole object and returns the number of bytes that curl actually saw. It'll work for URLs that don't return the length header. The second one uses curl -I to download only the headers.
Note that you could also parse curl output in pure bash, without using awk. But whatever, it all works. :)
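For example, here is a minimal pure-bash sketch of the header parsing (assuming $url holds the file URL and the server actually sends a Content-Length header):
file_size=""
while IFS=$' \t\r' read -r name value _; do
    # Header names are case-insensitive, so normalise before comparing.
    if [[ ${name,,} == content-length: ]]; then
        file_size=$value
        break
    fi
done < <(curl -sI "$url")
echo "Content-Length: $file_size"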
Second, the issue is that your if notation doesn't do what you expect: inside [[ ]], the < and > operators compare strings lexicographically, not numerically, so "317" sorts after "25000000" and the first branch wins every time. Use arithmetic evaluation with (( )) instead, and add some debug output to make it obvious where the problem actually lies. I recommend testing each potential failure separately and reporting on the specific problems:
if (( file_size < output_filesize_lowerBound )); then
echo "ERROR: $file_size is too small." >&2
elif (( file_size > output_filesize_upperBound )); then
echo "ERROR: $file_size is too big." >&2
elif (( file_size == wrong_filesize )); then
# This will never be hit if wrong_filesize is < lowerBound.
echo "ERROR: $file_size is just plain wrong." >&2
else
echo "Writing"
...
fi
By testing your limits individually and including the tested data in your error output, it'll be more obvious exactly what is causing your script to behave unexpectedly.
For example, if you want the wrong_filesize test to be done BEFORE file_size is tested against lowerBound, you can simply reorder the tests.
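For example, a sketch with the wrong_filesize check moved to the front (same variables as above):
if (( file_size == wrong_filesize )); then
    echo "ERROR: $file_size is just plain wrong." >&2
elif (( file_size < output_filesize_lowerBound )); then
    echo "ERROR: $file_size is too small." >&2
elif (( file_size > output_filesize_upperBound )); then
    echo "ERROR: $file_size is too big." >&2
else
    echo "Writing"
fi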

Related

Bash loop a curl request, output to file and stop until empty response

So I have the following bash file, and right now it's looping a curl request based on the for loop. However, I am trying to find out how to continue looping until the response is empty.
Unfortunately the API that I am calling is paginated, with a maximum of 500 results per page. I am trying to pull the data since 2017, so it's a lot of data.
I want to keep incrementing the counter until the response is empty.
#!/bin/bash
# Basic while loop
counter=1
for ((i=1;i<=2;i++));
do
curl -o gettext.txt --request GET \
--url "https://api.io/v1/candidates?page=${counter}&per_page=500" \
--header 'Authorization: Basic aklsjdl;fakj;l;kasdflkaj'
((counter++))
done
echo $counter
echo All done
Anyone have an idea?
As stated in the author's comment on the post, the returned data is in JSON format. The author didn't ask how to append two JSON files, but it is a necessary step to get the job done. To append two JSONs, json1 and json2, it might be enough to drop json1's last byte } and json2's first byte { and put a , between them; here jq is used to join the two JSONs as a more generic approach.
In the examples shown below, the nextjsonchunk file is the JSON obtained by each request. If it has contents, it is merged into mainjsonfile with jq. If it appears to be empty (inferred from its size), the loop breaks, the result is moved to the current folder, and cleanup is done.
Using curl:
#!/usr/bin/env bash
tempfolder=/dev/shm # temporary in-memory partition, available on Ubuntu
emptyjsonsize=10 # the minimum json file length, used as an "empty response" threshold
echo '{}' > $tempfolder/mainjsonfile # start with an empty json object to merge into
for ((counter=1; 1; counter++))
do
curl "https://api.io/v1/candidates?page=${counter}&per_page=500" \
--header "Authorization: Basic aklsjdl;fakj;l;kasdflkaj" \
--output $tempfolder/nextjsonchunk
if [ $(wc -c <$tempfolder/nextjsonchunk) -le $emptyjsonsize ]; then break; fi
jq -s '.[0] * .[1]' $tempfolder/mainjsonfile $tempfolder/nextjsonchunk > $tempfolder/merged
mv $tempfolder/merged $tempfolder/mainjsonfile
done
rm $tempfolder/nextjsonchunk # cleaning up
mv $tempfolder/mainjsonfile ./jsonresultfile # end result
Alternatively, using wget:
#!/usr/bin/env bash
tempfolder=/dev/shm # temporary in-memory partition, available on Ubuntu
emptyjsonsize=10 # the minimum json file length, used as an "empty response" threshold
echo '{}' > $tempfolder/mainjsonfile # start with an empty json object to merge into
for ((counter=1; 1; counter++))
do
wget "https://api.io/v1/candidates?page=${counter}&per_page=500" \
--header="Authorization: Basic aklsjdl;fakj;l;kasdflkaj" \
--output-document $tempfolder/nextjsonchunk
if [ $(wc -c <$tempfolder/nextjsonchunk) -le $emptyjsonsize ]; then break; fi
jq -s '.[0] * .[1]' $tempfolder/mainjsonfile $tempfolder/nextjsonchunk > $tempfolder/merged
mv $tempfolder/merged $tempfolder/mainjsonfile
done
rm $tempfolder/nextjsonchunk # cleaning up
mv $tempfolder/mainjsonfile ./jsonresultfile # end result
It is a good idea to take two sample JSONs and test the merge between them, to check that it is being done properly.
It is also worth verifying that the "empty JSON" check is adequate; the 10-byte threshold is just a guess.
A tmpfs (in-memory) partition, /dev/shm, is used in the examples to avoid many disk writes, but its use is optional.
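For instance, a quick way to sanity-check the merge on two hand-made samples (hypothetical file names and shape, just for illustration). Keep in mind that jq's * operator merges object keys recursively, but for array values the right-hand side simply wins, so paginated arrays of records may need + instead:
echo '{"page":1,"candidates":[{"id":1},{"id":2}]}' > sample1.json
echo '{"page":2,"candidates":[{"id":3}]}' > sample2.json
# Object merge as used in the loop above; for the candidates array the right-hand value wins:
jq -s '.[0] * .[1]' sample1.json sample2.json
# If the pages should be concatenated instead, add the array fields explicitly:
jq -s '{candidates: (.[0].candidates + .[1].candidates)}' sample1.json sample2.json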
You can use break to end the loop at any point:
#!/bin/bash
for ((counter=1; 1; counter++)); do
curl -o gettext.txt --request GET \
--url "https://api.io/v1/candidates?page=${counter}&per_page=500" \
--header 'Authorization: Basic aklsjdl;fakj;l;kasdflkaj'
if [ ! -s gettext.txt ]; then
break;
fi
# do something with gettext.txt
# as in your question, it will be overwritten in the next iteration
done
echo "$counter"
echo "All done"
Like this?
#!/bin/bash
# Basic while loop
counter=1
while true; do
data=$(curl --request GET \
--url "https://api.io/v1/candidates?page=${counter}&per_page=500" \
--header 'Authorization: Basic aklsjdl;fakj;l;kasdflkaj')
[[ $data ]] || break
echo "$data" >> gettext.txt
((counter++))
done
echo $counter
echo All done

How to get success count, failure count and failure reason when testing rest webservices from file using shell script

Hi, I am testing web services using a shell script with multiple if conditions. With the shell script below I am getting the success count, failure count and failure reason:
success=0
failure=0
if curl -s --head --request DELETE http://localhost/bimws/delete/deleteUser?email=pradeepkumarhe1989@gmail.com | grep "200 OK" > /dev/null; then
success=$((success+1))
else
echo "DeleteUser is not working"$'\r' >> serverLog.txt
failure=$((failure+1))
fi
if curl -s --head --request GET http://localhost/bimws/get/getUserDetails?email=anusha4saju@gmail.com | grep "200 OK" > /dev/null; then
success=$((success+1))
else
curl -s --head --request GET http://localhost/bimws/get/getUserDetails?email=anusha4saju@gmail.com > f1.txt
echo "getUserDetails is not working"$'\r' >> serverLog.txt
failure=$((failure+1))
fi
if curl -s -i -X POST -H "Content-Type:application/json" http://localhost/bimws/post/addProjectLocationAddress -d '{"companyid":"10","projectid":"200","addresstypeid":"5","address":"1234 main st","city":"san jose","state":"CA","zip":"989898","country":"United States"}' | grep "200 OK" > /dev/null; then
success=$((success+1))
else
echo "addProjectLocationAddress is not working"$'\r' >> serverLog.txt
failure=$((failure+1))
fi
echo $success Success
echo $failure failure
but I would like to test the web services from a file instead. I have a file called web_services.txt which contains all my web services; using a shell script, how do I execute them and get the success count, failure count and failure reason?
web_services.txt
All are different calls: DELETE, GET and POST
http://localhost/bimws/delete/deleteUser?email=pradeepkumarhe1989@gmail.com
http://localhost/bimws/get/getUserDetails?email=anusha4saju@gmail.com
http://localhost/bimws/post/addProjectLocationAddress -d '{"companyid":"10","projectid":"200","addresstypeid":"5","address":"1234 main st"
,"city":"san jose","state":"CA","zip":"989898","country":"United States"}'
First of all, your current code does not correctly deal with empty lines. You need to skip those.
Your lines already contain shell commands. Running curl on them makes no sense. Instead, you should evaluate these commands.
Then, you need to modify curl so that it reports whether the request was successful by adding -f:
FILE=D:/WS.txt
success=0
failure=0
while read -r LINE; do
if test -z "$LINE"; then
continue
fi
if eval "$(echo "$LINE" | sed 's/^curl/curl -f -s/')" > /dev/null; then
success=$((success+1))
else
echo "$LINE" >> aNewFile.txt
failure=$((failure+1))
fi
done < "$FILE"
echo $success Success
echo $failure failure
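If you also want to record the failure reason rather than just counting failures, one hedged option for the simple GET/DELETE URLs is to ask curl for the HTTP status code via its write-out variable and log it next to the offending line; check_url below is a hypothetical helper, not part of the original answer:
check_url() {
    local code
    # -o /dev/null discards the body; -w prints only the status code.
    code=$(curl -s -o /dev/null -w '%{http_code}' "$1")
    if [ "$code" -ge 200 ] && [ "$code" -lt 300 ]; then
        return 0
    fi
    echo "$1 failed with HTTP $code" >> serverLog.txt
    return 1
}

if check_url "$LINE"; then
    success=$((success+1))
else
    failure=$((failure+1))
fi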

Assign variable and redirect in bash

I'm doing ad-hoc profiling on a web service that seems to maintain some state and get slower and slower until eventually things start timing out. I have a simple script that will expose this behavior:
while true
do
RESPONSE_CODE=$( curl --config curl.config )
if [ "$RESPONSE_CODE" -eq "200" ]; then
echo SUCCESS
else
echo FAILURE
fi
done
Along with some headers, cookies, post data, URL, etc., curl.config in particular has these lines:
silent
output = /dev/null
write-out = "%{http_code}"
So the only output from curl should be the HTTP status code.
This works fine. What I'd like to do is something like this:
{ time -p RESPONSE_CODE=$(curl --config curl.config) ; } 2>&1 | awk '/real/{print $2;}'
to get a running printout of how long these queries actually take, while still saving curl's output for use in my test. But that doesn't work.
How can I capture the http status from curl AND grab the output of time so I can process both?
As written:
RESPONSE_CODE = $( curl --config curl.config )
you have spaces around the assignment, which simply does not work in shell (it tries to execute a command RESPONSE_CODE with = as the first argument, etc.). You need:
RESPONSE_CODE=$( curl --config curl.config )
The time built-in is hard to redirect. Since you need both HTTP status and real time, you will have to do something to capture both values. One possibility is:
set -- $( (time -p -- curl --config curl.config ) 2>&1 |
awk '/real/{print $2} /^[0-9]+$/{print}')
which will set $1 and $2. Another is array assignment:
data=( $( (time -p -- curl --config curl.config ) 2>&1 |
awk '/real/{print $2} /^[0-9]+$/{print}') )
The HTTP response code should appear before the time.
(Tested using sh -c 'echo 200; sleep 1' in lieu of curl --config curl.config.)
This should work if Curl's response is only a single line:
#!/bin/bash
RESPONSE_CODE=''
TIME=''
while read -r TYPE DATA; do
case "$TYPE" in
curl)
RESPONSE_CODE=$DATA
;;
real)
TIME=$DATA
;;
esac
done < <(exec 2>&1; time -p R=$(curl --config curl.config); echo "curl $R")
Or use an associative array:
#!/bin/bash
declare -A RESPONSE
while read -r TYPE DATA; do
RESPONSE[$TYPE]=$DATA
done < <(exec 2>&1; time -p R=$(curl ...); echo "code $R")
echo "${RESPONSE[code] ${RESPONSE[real]}"

for loop: commands start from begin every time

I have written the following bash script to check a list of domains from domain.list against multiple directories from dir.list.
For the first domain, the script first tries to find the file at
http://example.com
If that succeeds, the script finishes and exits, no problem.
If it fails, it checks for it at
https://example.com
If that is OK, the script finishes and exits;
if not,
it checks for it at
http://example.com/$dir for each directory in the list.
If the file is found the script finishes and exits; if it fails to find it,
it then goes on to check
https://example.com/$dir for each directory in the list.
But the problem is that when the first check fails and the second check fails, it goes on to the third check, but then it keeps looping over the third and fourth commands until it finds the file or the list of directories is finished.
I want the script, once it reaches the 3rd command, to run it and check the whole list of directories until the list is finished, and only then go on to the 4th command until that is finished too.
As my script stands, it keeps checking a single domain against multiple directories, and for every new directory it starts the whole script from the beginning and runs the 1st and 2nd commands again, which I do not need; it is a big waste of time.
Thanks
#!/bin/bash
dirs=(`cat dir.list`)
doms=( `cat domain.list`)
for dom in "${doms[#]}"
do
for dir in "${dirs[#]}"
do
target1="http://${dom}"
target2="https://${dom}"
target3="http://${dom}/${dir}"
target4="https://${dom}/${dir}"
if curl -s --insecure -m2 ${target1}/test.txt | grep "success" > /dev/null ;then
echo ${target1} >> dir.result
break
elif curl -s --insecure -m2 ${target2}/test.txt | grep "success" > /dev/null;then
echo ${target2} >> dir.result
break
elif curl -s --insecure -m2 ${target3}/test.txt | grep "success" > /dev/null; then
echo ${target3} >> dir.result
break
elif curl -s --insecure -m2 ${target4}/test.txt | grep "success" > /dev/null ; then
echo ${target4} >> dir.result
break
fi
done
done
Your code is sub-optimal; if you have a list of 5 'dir' values, you check 5 times whether http://${domain}/test.txt exists — but the chances are that if it didn't exist the first time, it doesn't exist on the other times either.
You use dir to indicate a sub-directory name, but your code uses http://${dom}:${dir} rather than the more normal http://${dom}/${dir}. Technically, what follows the colon up to the first slash is a port number, not a directory. I'm going to assume this is a typo and the colon should be replaced by a slash.
Generally, do not use the back-tick notation; use $(…) instead. Avoid swathes of repeated code, too.
I think you can compress your script down to something like this:
#!/bin/bash
dirs=( $(cat dir.list) )
file=test.txt
fetch_file()
{
if curl -s --insecure -m2 "${1:?}/${file}" | grep "success" > /dev/null
then
echo "${1}"
return 0
else
return 1
fi
}
for dom in $(cat domain.list)
do
for proto in http https
do
fetch_file "${proto}://{$dom}" && break
for dir in "${dirs[#]}"
do
fetch_file "${proto}://${dom}/${dir}" && break 2
done
done
done > dir.result
If the domain list is massive, you could consider using while read dom; do …; done < domain.list instead of the $(cat domain.list). It would be feasible, and possibly even sensible, to define a variable site="${proto}://${dom}" and then use that in the invocations of fetch_file, as sketched below.
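A sketch of that variant, reusing the dirs array and the fetch_file function from above:
while read -r dom; do
    [ -z "$dom" ] && continue               # skip blank lines in domain.list
    for proto in http https; do
        site="${proto}://${dom}"
        fetch_file "$site" && break         # found at the bare domain; next domain
        for dir in "${dirs[@]}"; do
            fetch_file "$site/$dir" && break 2
        done
    done
done < domain.list > dir.result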
You can use this script:
while read dom; do
while read dir; do
target1="http://${dom}"
target2="https://${dom}"
target3="http://${dom}:${dir}"
target4="https://${dom}:${dir}"
if curl -s --insecure -m2 ${target1}/test.txt | grep -q "success"; then
echo ${target1} >> dir.result
break 2
elif curl -s --insecure -m2 ${target2}/test.txt | grep -q "success"; then
echo ${target2} >> dir.result
break 2
elif curl -s --insecure -m2 ${target3}/test.txt | grep -q "success"; then
echo ${target3} >> dir.result
break 2
elif curl -s --insecure -m2 ${target4}/test.txt | grep -q "success"; then
echo ${target4} >> dir.result
break 2
fi
done < dir.list
done < domain.list

Curl not downloading files correctly

So I have been struggling with this task for eternity and still don't get what went wrong. This program doesn't seem to download ANY pdfs. I checked the file that stores the final links - everything is stored correctly. I also checked $PDFURL; it holds the correct values. Any bash fans ready to help?
#!/bin/sh
#create a temporary directory where all the work will be conducted
TMPDIR=`mktemp -d /tmp/chiheisen.XXXXXXXXXX`
echo $TMPDIR
#no arguments given - error
if [ "$#" == "0" ]; then
exit 1
fi
# argument given, but wrong format
URL="$1"
#URL regex
URL_REG='(https?|ftp|file)://[-A-Za-z0-9\+&@#/%?=~_|!:,.;]*[-A-Za-z0-9\+&@#/%=~_|]'
if [[ ! $URL =~ $URL_REG ]]; then
exit 1
fi
# go to directory created
cd $TMPDIR
#download the html page
curl -s "$1" > htmlfile.html
#grep only links into temp.txt
cat htmlfile.html | grep -o -E 'href="([^"#]+)\.pdf"' | cut -d'"' -f2 > temp.txt
# iterate through lines in the file and try to download
# the pdf files that are there
cat temp.txt | while read PDFURL; do
#if this is an absolute URL, download the file directly
if [[ $PDFURL == *http* ]]
then
curl -s -f -O $PDFURL
err="$?"
if [ "$err" -ne 0 ]
then
echo ERROR "$(basename $PDFURL)">&2
else
echo "$(basename $PDFURL)"
fi
else
#update url - it is always relative to the first parameter in script
PDFURLU="$1""/""$(basename $PDFURL)"
curl -s -f -O $PDFURLU
err="$?"
if [ "$err" -ne 0 ]
then
echo ERROR "$(basename $PDFURLU)">&2
else
echo "$(basename $PDFURLU)"
fi
fi
done
#delete the files
rm htmlfile.html
rm temp.txt
P.S. Another minor problem I have just spotted. Maybe the problem is with the if on the regex? I would pretty much like to have something like this there:
if [[ $PDFURL =~ (https?|ftp|file):// ]]
but this doesn't work. I don't have unwanted parentheses there, so why?
P.P.S. I also ran this script on URLs beginning with http, and the program gave the desired output. However, it still doesn't pass the test.
