Bash: Looping urls using curl, giving only first url output - bash

What I want to do: I want to find all the products(URLs) which are not redirected.
To get the final URL after redirection I'm using curl command as follows:
curl -Ls -o /dev/null -w %{url_effective} "$URL"
This is working fine. Now I want to iterate over URLs to get which are the URLs that are not redirected and display them as output of program. I've the following code:
result=""
productIds=$1
for productId in $(echo $productIds | sed "s/,/ /g")
do
echo "Checking product: $productId";
URL="http://localhost/?go=$productId";
updatedURL=`curl -Ls -o /dev/null -w %{url_effective} "$URL"`
echo "URL : $URL, UpdatedUrl: $updatedURL";
if [ "$URL" == "$updatedURL" ]
then
result="$result$productId,";
fi
done
The curl command works only for the first product. But from 2nd to last product, all the URL and updatedURL are same. I can't understand the reason why? The productId is changing in every iteration, so I think it cannot be something related to caching.
Also, I've tried the following variant of curl also:
updatedURL=$(curl -Ls -o /dev/null -w %{url_effective} "$URL")
updatedURL="$(curl -Ls -o /dev/null -w %{url_effective} "$URL")"
Edit: After trying with debug mode and lot of different ways. I noticed a pattern i.e. If I manually hit the following on terminal:
curl -Ls -o /dev/null -w %{url_effective} "http://localhost/?go=32123"
Then in shell script these urls will work fine. But if I don't hit manually then curl will also not work for those products via shell script.

Just add #!/bin/bash to be the first line of shell. It does produce required output. Now invocation should be like this bash file.sh 123,456,789,221
Invocation via Bourne shell sh file.sh 123,456,789,221 does requires code changes.Do let me know if you require that too :)

I would suggest changing your loop to something like this:
IFS=, read -ra productIds <<<"$1"
for productId in "${productIds[#]}"; do
url=http://localhost/?go=$productId
num_redirects=$(curl -Ls -o /dev/null -w %{num_redirects} "$url")
if [ "$num_redirects" -eq 0 ]; then
printf 'productId %s has no redirects\n' "$productId"
fi
done
This splits the first argument passed to the script into an array, using a comma as the delimiter. The number of redirects is stored to a variable. When the number is zero, the message is printed.
I have to admit that I can't see anything inherently broken with your original approach so it's possible that there is something extra going on that we're not aware of. If you could provide a reproducible test case then we would be able to help you more effectively.

Related

Catch and retry bash

I'm running in to a situation where the curl seeks that I do to a server to fetch some information intermittently breaks and instead of populating the resultant file with actual information it populates it with error it got and then move to next datapoint(in for loop), so I decided to catch and retry.(Using bash). But for some reason it's not doing what it should be, the generated file turns out to be empty. Below is what I have written, if you have a better/easier way to approach this or see a problem in below please go ahead and share. Thanks in advance.
for i in `cat master_db_list`
do
curl -sN --negotiate -u foo:bar "${URL}/$i/table" > $table_info/${i}
#Seek error catch
result=`cat $table_info/${i} | grep -i "Error"`
#lowercase comparison, since error can be in any case
echo -e "result is: $result \n" >> $job_log 2>&1
while [[ "${result,,}" =~ error ]]
do
echo "Retrying table list generation for ${i}" >> $job_log 2>&1
curl -sN --negotiate -u foo:bar "${URL}/$i/table" > $table_info/${i}
result=`cat $table_info/${i} | grep -i "Error"`
done
done
turns out , nothing wrong with above, some of the curls calls returned empty values, decreased seek frequency with sleeps in between and is working all ok now.

Submit SGE job array with random file names

I have a script that was kicking off ~200 jobs for each sub-analysis. I realized that a job array would probably be much better for this for several reasons. It seems simple enough but is not quite working for me. My input files are not numbered so I've following examples I've seen I do this first:
INFILE=`sed -n ${SGE_TASK_ID}p <pathto/listOfFiles.txt`
My qsub command takes in quite a few variables as it is both pulling and outputting to different directories. $res does not change, however $INFILE is what I am looping through.
qsub -q test.q -t 1-200 -V -sync y -wd ${res} -b y perl -I /master/lib/ myanalysis.pl -c ${res}/${INFILE}/configFile-${INFILE}.txt -o ${res}/${INFILE}/
Since this was not working, I was curious as to what exactly was being passed. So I did an echo on this and saw that it only seems to expand up to the first time $INFILE is used. So I get:
perl -I /master/lib/ myanalysis.pl -c mydirectory/fileABC/
instead of:
perl -I /master/lib/ myanalysis.pl -c mydirectory/fileABC/configFile-fileABC.txt -o mydirectory/fileABC/
Hoping for some clarity on this and welcome all suggestions. Thanks in advance!
UPDATE: It doesn't look like $SGE_TASK_ID is set on the cluster. I looked for any variable that could be used for an array ID and couldn't find anything. If I see anything else I will update again.
Assuming you are using a grid engine variant then SGE_TASK_ID should be set within the job. It looks like you are expecting it to be set to some useful variable before you use qsub. Submitting a script like this would do roughly what you appear to be trying to do:
#!/bin/bash
INFILE=$(sed -n ${SGE_TASK_ID}p <pathto/listOfFiles.txt)
exec perl -I /master/lib/ myanalysis.pl -c ${res}/${INFILE}/configFile-${INFILE}.txt -o ${res}/${INFILE}/
Then submit this script with
res=${res} qsub -q test.q -t 1-200 -V -sync y -wd ${res} myscript.sh
`

AWS S3 ListBucketResult pagination without authentication?

I'm looking to get a simple listing of all the objects in a public S3 bucket.
I'm aware how to get a listing with curl for upto 1000 results, though I do not understand how to paginate the results, in order to get a full listing. I think marker is a clue.
I do not want to use a SDK / library or authenticate. I'm looking for a couple of lines of shell to do this.
#!/bin/sh
# setting max-keys higher than 1000 is not effective
s3url=http://mr2011.s3-ap-southeast-1.amazonaws.com?max-keys=1000
s3ns=http://s3.amazonaws.com/doc/2006-03-01/
i=0
s3get=$s3url
while :; do
curl -s $s3get > "listing$i.xml"
nextkey=$(xml sel -T -N "w=$s3ns" -t \
--if '/w:ListBucketResult/w:IsTruncated="true"' \
-v 'str:encode-uri(/w:ListBucketResult/w:Contents[last()]/w:Key, true())' \
-b -n "listing$i.xml")
# -b -n adds a newline to the result unconditionally,
# this avoids the "no XPaths matched" message; $() drops newlines.
if [ -n "$nextkey" ] ; then
s3get=$s3url"&marker=$nextkey"
i=$((i+1))
else
break
fi
done

Inserting variables into URLs with bash scripting

Say I want to write a bash script that takes user input, inserts it into a URL and then downloads it. Something like this:
#!/bin/bash
cd /path/to/folder
echo Which version?
read version
curl -O http://assets.company.tld/$version/foo.bar
Would this work? If not, how can I do what I'm trying to do?
#!/bin/bash
version=$1
cd /path/to/folder
echo $version
curl -o $version-foo.bar http://assets.company.tld/$version/foo.bar
where $1 is the first positional argument
So, suppose you save the script with name assets.sh. Then you can using the same like following:
./assests.sh ver1
where ver1 is the version
[EDIT] If you want an interactive session:
#!/bin/bash
version=$1
cd /path/to/folder
echo -n "Which version you want? "
read version
curl -o $version-foo.bar http://assets.company.tld/$version/foo.bar

creating a file downloading script with checksum verification

I want to create a shellscript that reads files from a .diz file, where information about various source files are stored, that are needed to compile a certain piece of software (imagemagick in this case). i am using Mac OSX Leopard 10.5 for this examples.
Basically i want to have an easy way to maintain these .diz files that hold the information for up-to-date source packages. i would just need to update these .diz files with urls, version information and file checksums.
Example line:
libpng:1.2.42:libpng-1.2.42.tar.bz2?use_mirror=biznetnetworks:http://downloads.sourceforge.net/project/libpng/00-libpng-stable/1.2.42/libpng-1.2.42.tar.bz2?use_mirror=biznetnetworks:9a5cbe9798927fdf528f3186a8840ebe
script part:
while IFS=: read app version file url md5
do
echo "Downloading $app Version: $version"
curl -L -v -O $url 2>> logfile.txt
$calculated_md5=`/sbin/md5 $file | /usr/bin/cut -f 2 -d "="`
echo $calculated_md5
done < "files.diz"
Actually I have more than just one question concerning this.
how to calculate and compare the checksums the best? i wanted to store md5 checksums in the .diz file and compare it with string comparison with "cut"ting out the string
is there a way to tell curl another filename to save to? (in my case the filename gets ugly libpng-1.2.42.tar.bz2?use_mirror=biznetnetworks)
i seem to have issues with the backticks that should direct the output of the piped md5 and cut into the variable $calculated_md5. is the syntax wrong?
Thanks!
The following is a practical one-liner:
curl -s -L <url> | tee <destination-file> |
sha256sum -c <(echo "a748a107dd0c6146e7f8a40f9d0fde29e19b3e8234d2de7e522a1fea15048e70 -") ||
rm -f <destination-file>
wrapping it up in a function taking 3 arguments:
- the url
- the destination
- the sha256
download() {
curl -s -L $1 | tee $2 | sha256sum -c <(echo "$3 -") || rm -f $2
}
while IFS=: read app version file url md5
do
echo "Downloading $app Version: $version"
#use -o for output file. define $outputfile yourself
curl -L -v $url -o $outputfile 2>> logfile.txt
# use $(..) instead of backticks.
calculated_md5=$(/sbin/md5 "$file" | /usr/bin/cut -f 2 -d "=")
# compare md5
case "$calculated_md5" in
"$md5" )
echo "md5 ok"
echo "do something else here";;
esac
done < "files.diz"
My curl has a -o (--output) option to specify an output file. There's also a problem with your assignment to $calculated_md5. It shouldn't have the dollar sign at the front when you assign to it. I don't have /sbin/md5 here so I can't comment on that. What I do have is md5sum. If you have it too, you might consider it as an alternative. In particular, it has a --check option that works from a file listing of md5sums that might be handy for your situation. HTH.

Resources