How to run multiple curl requests in parallel with multiple variables - bash

Set Up
I currently have the script below working to download files with curl, using a ref file with multiple variables. When I created the script it suited my needs, but as the ref file has grown and the data I am requesting via curl takes longer to generate, the script now takes too much time to complete.
Objective
I want to update this script so that curl requests and downloads multiple files as they become ready, as opposed to waiting for each file to be requested and downloaded sequentially.
I've had a look around and seen that I could use either xargs or parallel to achieve this, but based on past questions, YouTube videos and other forum posts I haven't been able to find an example that explains whether this is possible with more than one variable.
Can someone confirm whether this is possible, and which tool is better suited to achieve it? Is my current script in the right shape, or do I need to rework a lot of it to shoehorn these commands in?
I suspect this is a question that has been asked before and I may just not have found the right one.
account-list.tsv
client1 account1 123 platform1 50
client2 account1 234 platform1 66
client3 account1 344 platform1 78
client3 account2 321 platform1 209
client3 account2 321 platform2 342
client4 account1 505 platform1 69
download.sh
#!/bin/bash
set -eu
user="user"
pwd="pwd"
D1=$(date "+%Y-%m-%d" -d "1 days ago")
D2=$(date "+%Y-%m-%d" -d "1 days ago")
curr=$D2
cheese=$(pwd)
curl -o /dev/null -s -S -L -f -c cookiejar 'https://url/auth/' -d name=$user -d passwd=$pwd
while true; do
  while IFS=$' ' read -r client account accountid platform platformid
  do
    curl -o /dev/null -s -S -f -b cookiejar -c cookiejar 'https://url/auth/' -d account=$accountid
    curl -sSfL -o "$client€$account#$platform£$curr.xlsx" -J -b cookiejar -c cookiejar "https://url/platform=$platformid&date=$curr"
  done < account-list.tsv
  [ "$curr" \< "$D1" ] || break
  curr=$( date +%Y-%m-%d --date "$curr +1 day" ) ## used in instances where I need to grab data for past date ranges.
done
exit

Using GNU Parallel it looks something like this to fetch 100 entries in parallel:
#!/bin/bash
set -eu
user="user"
pwd="pwd"
D1=$(date "+%Y-%m-%d" -d "1 days ago")
D2=$(date "+%Y-%m-%d" -d "1 days ago")
curr=$D2
cheese=$(pwd)
curl -o /dev/null -s -S -L -f -c cookiejar 'https://url/auth/' -d name=$user -d passwd=$pwd
fetch_one() {
  client="$1"
  account="$2"
  accountid="$3"
  platform="$4"
  platformid="$5"
  curl -o /dev/null -s -S -f -b cookiejar -c cookiejar 'https://url/auth/' -d account=$accountid
  curl -sSfL -o "$client€$account#$platform£$curr.xlsx" -J -b cookiejar -c cookiejar "https://url/platform=$platformid&date=$curr"
}
export -f fetch_one
export curr # fetch_one runs in a child shell started by parallel, so curr must be exported as well
while true; do
  cat account-list.tsv | parallel -j100 --colsep '\t' fetch_one
  [ "$curr" \< "$D1" ] || break
  curr=$( date +%Y-%m-%d --date "$curr +1 day" ) ## used in instances where I need to grab data for past date ranges.
done
exit
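If relying on exported variables feels fragile, the date can instead be passed to the function as an extra argument and GNU Parallel will append the five TSV columns after it. A minimal sketch, using the same placeholder URLs as above:
fetch_one() {
  # first argument is the date; the remaining five come from the TSV columns
  curr="$1"; client="$2"; account="$3"; accountid="$4"; platform="$5"; platformid="$6"
  curl -o /dev/null -s -S -f -b cookiejar -c cookiejar 'https://url/auth/' -d account="$accountid"
  curl -sSfL -o "$client€$account#$platform£$curr.xlsx" -J -b cookiejar -c cookiejar "https://url/platform=$platformid&date=$curr"
}
export -f fetch_one
# parallel appends each row's columns after "$curr", so every job runs as
# fetch_one <date> <client> <account> <accountid> <platform> <platformid>
parallel -j100 --colsep '\t' fetch_one "$curr" :::: account-list.tsv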

One (relatively) easy way to run several processes in parallel is to wrap the guts of the call in a function and then call the function inside the while loop, making sure to put the function call in the background, eg:
# function definition
docurl () {
  curl -o /dev/null -s -S -f -b cookiejar -c cookiejar 'https://url/auth/' -d account=$accountid
  curl -sSfL -o "$client€$account#$platform£$curr.xlsx" -J -b cookiejar -c cookiejar "https://url/platform=$platformid&date=$curr"
}
# call the function within OP's inner while loop
while true; do
  while IFS=$' ' read -r client account accountid platform platformid
  do
    docurl & # put the function call in the background so we can continue loop processing while the function call is running
  done < account-list.tsv
  wait # wait for all background calls to complete
  [ "$curr" \< "$D1" ] || break
  curr=$( date +%Y-%m-%d --date "$curr +1 day" ) ## used in instances where I need to grab data for past date ranges.
done
One issue with this approach is that a large volume of curl calls may bog down the underlying system and/or cause the remote system to reject 'too many' concurrent connections. In that case it will be necessary to limit the number of concurrent curl calls.
One idea is to keep a counter of the number of currently running (backgrounded) curl calls, and when we hit a limit, wait for a background process to complete before spawning a new one, eg:
max=5 # limit of 5 concurrent/backgrounded calls
ctr=0
while true; do
  while IFS=$' ' read -r client account accountid platform platformid
  do
    docurl &
    ctr=$((ctr+1))
    if [[ "${ctr}" -ge "${max}" ]]
    then
      wait -n # wait for a background process to complete (requires bash 4.3+)
      ctr=$((ctr-1))
    fi
  done < account-list.tsv
  wait # wait for last ${ctr} background calls to complete
  [ "$curr" \< "$D1" ] || break
  curr=$( date +%Y-%m-%d --date "$curr +1 day" ) ## used in instances where I need to grab data for past date ranges.
done
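Since the question also asks about xargs: it can drive the same kind of parallelism with multiple fields, as long as the fields are handed to a small bash -c wrapper. A rough sketch under the same assumptions as above (placeholder URLs, shared cookiejar), not a drop-in replacement:
# GNU xargs reads the TSV (-a), takes one line per job (-L1) and runs up to
# 5 jobs at once (-P5). $curr is passed as a fixed first argument; xargs then
# appends the five row fields after it.
xargs -a account-list.tsv -L1 -P5 bash -c '
  curr="$1" client="$2" account="$3" accountid="$4" platform="$5" platformid="$6"
  curl -o /dev/null -s -S -f -b cookiejar -c cookiejar "https://url/auth/" -d account="$accountid"
  curl -sSfL -o "$client€$account#$platform£$curr.xlsx" -J -b cookiejar -c cookiejar "https://url/platform=$platformid&date=$curr"
' _ "$curr"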

Related

Shell script - is there a faster way to write date/time per second between start and end time?

I have this script (which works fine) that writes a date/time for every second, from a start date/time to an end date/time, to a file:
while read line; do
  FIRST_TIMESTAMP="20230109-05:00:01" #this is normally a variable that changes with each $line
  LAST_TIMESTAMP="20230112-07:00:00" #this is normally a variable that changes with each $line
  date=$FIRST_TIMESTAMP
  while [[ $date < $LAST_TIMESTAMP || $date == $LAST_TIMESTAMP ]]; do
    date2=$(echo $date |sed 's/ /-/g' |sed "s/^/'/g" |sed "s/$/', /g")
    echo "$date2" >> "OUTPUTFOLDER/output_LABELS_$line"
    date=$(date -d "$date +1 sec" +"%Y%m%d %H:%M:%S")
  done
done < external_file
However, this sometimes needs to run 10 times, and the start and end date/time can lie days apart, which makes the script take a long time to write all that data. Now I am wondering if there is a faster way to do this.
Avoid using a separate date call for each second. In the next example I added a safety parameter, maxloop, to avoid wasting resources when the dates are wrong.
#!/bin/bash
awkdates() {
  maxloop=1000000
  awk \
    -v startdate="${first_timestamp:0:4} ${first_timestamp:4:2} ${first_timestamp:6:2} ${first_timestamp:9:2} ${first_timestamp:12:2} ${first_timestamp:15:2}" \
    -v enddate="${last_timestamp:0:4} ${last_timestamp:4:2} ${last_timestamp:6:2} ${last_timestamp:9:2} ${last_timestamp:12:2} ${last_timestamp:15:2}" \
    -v maxloop="${maxloop}" \
    'BEGIN {
       T1=mktime(startdate);
       T2=mktime(enddate);
       linenr=1;
       while (T1 <= T2) {
         printf("%s\n", strftime("%Y%m%d %H:%M:%S",T1));
         T1+=1;
         if (linenr++ > maxloop) break;
       }
     }'
}
mkdir -p OUTPUTFOLDER
while IFS= read -r line; do
  first_timestamp="20230109-05:00:01" #this is normally a variable that changes with each $line
  last_timestamp="20230112-07:00:00" #this is normally a variable that changes with each $line
  awkdates >> "OUTPUTFOLDER/output_LABELS_$line"
done < <(printf "%s\n" "line1" "line2")
Using epoch time (+%s and @) with GNU date and GNU seq to produce datetimes in ISO 8601 format:
begin=$(date -ud '2023-01-12T00:00:00' +%s)
end=$(date -ud '2023-01-12T00:00:12' +%s)
seq -f "@%.0f" "$begin" 1 "$end" |
date -uf - -Isec
2023-01-12T00:00:00+00:00
2023-01-12T00:00:01+00:00
2023-01-12T00:00:02+00:00
2023-01-12T00:00:03+00:00
2023-01-12T00:00:04+00:00
2023-01-12T00:00:05+00:00
2023-01-12T00:00:06+00:00
2023-01-12T00:00:07+00:00
2023-01-12T00:00:08+00:00
2023-01-12T00:00:09+00:00
2023-01-12T00:00:10+00:00
2023-01-12T00:00:11+00:00
2023-01-12T00:00:12+00:00
If you're using macOS/BSD's date utility instead of the GNU one, the equivalent command to parse it would be:
(bsd)date -uj -f '%FT%T' '2023-01-12T23:34:45' +%s
1673566485
...and the reverse process uses the -r flag instead of -d, without the "@" prefix:
(bsd)date -uj -r '1673566485' -Iseconds
2023-01-12T23:34:45+00:00
(gnu)date -u -d '@1673566485' -Iseconds
2023-01-12T23:34:45+00:00
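For completeness, here is a sketch of how the seq | date approach could be dropped into the structure of the question, with the two timestamps hard-coded as in the original; the quoted, comma-suffixed output is approximated with date's own format string instead of the sed chain:
# Sketch: the inner per-second loop replaced by a single seq | date pipeline.
# The timestamps are hard-coded as in the question; normally they change per $line.
while read -r line; do
  FIRST_TIMESTAMP="20230109-05:00:01"
  LAST_TIMESTAMP="20230112-07:00:00"
  begin=$(date -ud "${FIRST_TIMESTAMP/-/ }" +%s)   # GNU date accepts "YYYYMMDD HH:MM:SS"
  end=$(date -ud "${LAST_TIMESTAMP/-/ }" +%s)
  seq -f "@%.0f" "$begin" 1 "$end" |
    date -uf - +"'%Y%m%d-%H:%M:%S'," >> "OUTPUTFOLDER/output_LABELS_$line"
done < external_file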

Bash script does not find index from array

I have written a bash script which I want to use to monitor backups on a Synology via pushgateway.
The script should search for subfolders in the backup folder, write the newest file into a variable, and write the age and size of that file into an array.
Finally, to hand everything to the pushgateway, I list all metrics by index. All folders and files exist, yet when I execute the script, often one or more indexes are not found. If I execute the commands manually one by one, I get correct output.
Here is the script:
#!/bin/bash
set -e
backup_dir=$1
for dir in $(find "$backup_dir" -maxdepth 1 -mindepth 1 -type d \( ! -name @eaDir \)); do
  if compgen -G "${dir}/*.vib" > /dev/null; then
    latest_vib=$(ls -t1 "$dir"/*.vib | head -1)
    age_vib=$(( ( $(date +%s) - $(stat -c %Y "$latest_vib") ) ))
    size_vib=$(stat -c %s "$latest_vib")
    arrage_vib+=("${age_vib}")
    arrsize_vib+=("${size_vib}")
  fi
  if compgen -G "${dir}/*.vbk" > /dev/null; then
    latest_vbk=$(ls -t1 "$dir"/*.vbk | head -1)
    age_vbk=$(( ( $(date +%s) - $(stat -c %Y "$latest_vbk") ) ))
    size_vbk=$(stat -c %s "$latest_vbk")
    arrage_vbk+=("${age_vbk}")
    arrsize_vbk+=("${size_vbk}")
  fi
  min_dir=$(echo "$dir" | cut -d'/' -f4- | tr '[:upper:]' '[:lower:]')
  sign_dir=${min_dir//_/-}
  arrdir+=("${sign_dir}")
done
echo "${arrdir[4]}"
echo "${arrage_vib[4]}"
cat << EOF | curl -ks -u user:pw --data-binary @- https://pushgateway/metrics/job/backup/instance/instance_name
# HELP backup_age displays the age of backups in seconds
# TYPE backup_age gauge
backup_age_vib{dir="${arrdir[1]}"} ${arrage_vib[1]}
backup_age_vib{dir="${arrdir[2]}"} ${arrage_vib[2]}
backup_age_vib{dir="${arrdir[3]}"} ${arrage_vib[3]}
backup_age_vib{dir="${arrdir[4]}"} ${arrage_vib[4]}
backup_age_vbk{dir="${arrdir[1]}"} ${arrage_vbk[1]}
...
# HELP backup_size displays the size of backups in bytes
# TYPE backup_size gauge
backup_size_vib{dir="${arrdir[1]}"} ${arrsize_vib[1]}
...
EOF
I hope you can help me and point out where I made a mistake. I am also open to general optimizations of the script, because I assume it can be done better and more efficiently. I took a few lines of code from here ;-).
Many thanks in advance.
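A hedged guess at what is going on (not a confirmed diagnosis): arrdir is appended once per directory, but arrage_vib/arrsize_vib (and the vbk arrays) are only appended when a folder actually contains matching files, so the numeric indexes stop lining up as soon as one folder has no .vib or .vbk; bash arrays also start at index 0, so ${arrdir[1]} already skips the first folder. One way to keep the values aligned is to key associative arrays by the directory name instead of relying on positions. A minimal sketch for the .vib case only (using basename for the directory-name cleanup is an assumption; the question uses cut -d'/' -f4-):
#!/bin/bash
# Sketch: associative arrays keyed by directory name stay aligned even when a
# folder has no .vib files. Only the .vib metrics are shown here.
backup_dir=$1
declare -A age_vib size_vib
while IFS= read -r dir; do
  sign_dir=$(basename "$dir" | tr '[:upper:]' '[:lower:]')   # assumption: basename instead of cut -d'/' -f4-
  sign_dir=${sign_dir//_/-}
  if compgen -G "${dir}/*.vib" > /dev/null; then
    latest_vib=$(ls -t1 "$dir"/*.vib | head -1)
    age_vib["$sign_dir"]=$(( $(date +%s) - $(stat -c %Y "$latest_vib") ))
    size_vib["$sign_dir"]=$(stat -c %s "$latest_vib")
  fi
done < <(find "$backup_dir" -maxdepth 1 -mindepth 1 -type d ! -name '@eaDir')
# Emit one metric line per directory that actually has a value:
for d in "${!age_vib[@]}"; do
  printf 'backup_age_vib{dir="%s"} %s\n' "$d" "${age_vib[$d]}"
  printf 'backup_size_vib{dir="%s"} %s\n' "$d" "${size_vib[$d]}"
done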

Quick search to find active urls

I'm trying to use cURL to find active redirections and save the results to a file. I know the url is active when it redirects at least once to a specific website. So I came up with:
if (( $( curl -I -L https://mywebpage.com/id=00001&somehashnumber&si=0 | grep -c "/something/" ) > 1 )) ; then echo https://mywebpage.com/id=00001&somehashnumber&si=0 | grep -o -P 'id=.{0,5}' >> id.txt; else echo 404; fi
And it works, but how to modify it to check id range from 00001 to 99999?
You'll want to wrap the whole operation in a for loop and use a formatted sequence to print the ids you'd like to test. Without knowing too much about the task at hand, I would write something like this to test the ids:
$ for i in $(seq -f "%06g" 1 100000); do curl --silent "example.com/id=$i" --write-out "$i %{response_code}\n" --output /dev/null; done
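To tie that back to the question's goal of saving the matching ids, here is a rough sketch that keeps the original redirect test (the URL and the "/something/" marker are the question's placeholders). Note that the URL has to be quoted, otherwise the shell treats everything after each & as a separate background command:
# Sketch: iterate ids 00001..99999, count how often "/something/" shows up in
# the followed redirect headers, and append matching ids to id.txt.
for i in $(seq -f "%05g" 1 99999); do
  if (( $(curl -sIL "https://mywebpage.com/id=$i&somehashnumber&si=0" | grep -c "/something/") > 1 )); then
    echo "id=$i" >> id.txt
  else
    echo "404"
  fi
done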

Bash script, domain expiration date with email sending

I am trying to implement a solution for automatic mail sending once it finds a domain whose expiration date has been exceeded. I am really new to this, so I only managed to get as far as the code below, which shows expiration dates and sends an email containing the output.
The kind of help I am looking for is at least a clue as to how to compare the expiration date with the current date and get the result as a number of days. I will really appreciate any kind of help.
#!/bin/bash
DOM="onet.pl wp.pl"
for d in $DOM
do
  echo -n "$d - "
  whois $d | egrep -i 'Expiration|Expires on' | head -1
  whois $d | egrep -i 'Expiration|Expires on' | head -1 >> /tmp/domain.date
  echo ""
done
#[ -f /tmp/domain.date ] && mail -s 'Domain renew / expiration date' myemail@gmail.com < /tmp/domain.date || :
Look no further than the date command, it has everything you need!
Here is a straightforward solution using date -d to parse the date:
# Get the expiration date
expdate="$(whois $d | egrep -i 'Expiration|Expires on' | head -1)"
# Turn it into seconds (easier to compute with)
expdate="$(date -d"$expdate" +%s)"
# Get the current date in seconds
curdate="$(date +%s)"
# Print the difference in days
printf "Number of days to expiration : %s\n" "$(((expdate-curdate)/86400))"
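Wired into the question's loop, it could look roughly like this; the 30-day threshold, the label stripping and the mail wording are assumptions, and the address is the question's placeholder:
#!/bin/bash
DOM="onet.pl wp.pl"
threshold=30   # assumption: warn when fewer than 30 days remain
for d in $DOM
do
  # Grab the expiration line and strip the field label before the first ":".
  # Whois output formats differ per TLD, so this cleanup may need adjusting.
  expline="$(whois "$d" | egrep -i 'Expiration|Expires on' | head -1)"
  expdate="$(date -d "${expline#*:}" +%s)"
  curdate="$(date +%s)"
  days=$(( (expdate - curdate) / 86400 ))
  printf '%s - %s days to expiration\n' "$d" "$days" | tee -a /tmp/domain.date
  if (( days < threshold )); then
    echo "$d expires in $days days" | mail -s 'Domain renew / expiration date' myemail@gmail.com
  fi
done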
Good luck!

Implementing a datalogger in bash

Hi, I'm a newbie in Bash scripting.
I need to log a data stream from a specific IP address and generate a logfile for each day, named "file-$date.log" (i.e. at 00:00:00 UT close the previous day's file and create the one corresponding to the new day).
I also need to show the data stream on screen while it is logged to a file.
I tried the solution below, but it does not work well because it never closes the initial file; apparently the condition check never executes when the first command of the pipe is something other than a constant string like echo "something".
#!/bin/bash
log_data(){
  while IFS= read -r line ; do printf '%s %s\n' "$(date -u '+%j %Y-%m-%d %H:%M:%S')" "$line"; done
}
register_data() {
  while : ;
  do
    > stream.txt
    DATE=$(date -u "+%j %Y-%m-%d %H:%M")
    HOUR=$(date -u "+%H:%M:%S")
    file="file-$DATE.log"
    while [[ "${HOUR}" != 00:00:00 ]];
    do
      tail -f stream.txt | tee "${file}"
      sleep 1
      HOUR=$(date -u "+%H:%M:%S")
    done
    > stream.txt
  done
}
nc -vn $IP $IP_port | log_data >> stream.txt &
register_data
I'll be glad if someone can give me some clues to solve this problem.
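One common pattern, offered here only as a hedged sketch rather than a tested fix: decide the output file per line inside the reading loop, so the file "rotates" by itself when the UTC date changes, and keep tee for showing the stream on screen.
#!/bin/bash
# Sketch: timestamp each incoming line, pick the day's logfile on every line,
# and let tee both display the line and append it to that file.
# $IP and $IP_port are assumed to be set as in the question.
log_data() {
  while IFS= read -r line; do
    now=$(date -u '+%j %Y-%m-%d %H:%M:%S')
    day=$(date -u '+%Y-%m-%d')          # changes at 00:00:00 UT, so a new file starts then
    printf '%s %s\n' "$now" "$line" | tee -a "file-$day.log"
  done
}
nc -vn "$IP" "$IP_port" | log_data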
