How to use parallel with curl? - bash

How do I use GNU parallel to make this process faster?
#!/bin/bash
for (( c=1; c<=100; c++ ))
do
    curl -sS 'https://example.com' \
        --data 'value='$c'' /dev/null
    echo $c
done

You can use parallel or xargs:
seq 100 | parallel curl -sS 'https://example.com' --data value='{}' /dev/null
seq 100 | xargs -I{} curl -sS 'https://example.com' --data value='{}' /dev/null
As the script stands, output is sent to stdout. With xargs, this means output from different calls can end up interleaved. Consider redirecting output to files for additional processing, if needed.
You can add options to cap the number of parallel jobs (-P n for xargs, -j n for parallel) as needed.
I'm not sure why '/dev/null' is needed. Consider reordering:
curl -sS --data 'value={}' https://example.com
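If you want each response kept separate for later processing, here is a minimal sketch (the out.N.txt file names are hypothetical, and -j8/-P8 are arbitrary job caps):
# GNU parallel: one output file per value, 8 jobs at a time
seq 100 | parallel -j8 'curl -sS --data value={} https://example.com > out.{}.txt'
# xargs equivalent; the value is passed as $1 rather than spliced into shell code
seq 100 | xargs -P8 -I{} sh -c 'curl -sS --data "value=$1" https://example.com > "out.$1.txt"' _ {}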

Related

Sending grep data iteratively using curl for output of tail -f

I am using grep to extract data from a log file. The log file is dynamically updated with new rows, and I need to send the grepped data to a REST endpoint using curl. This is easy for a static file, but I cannot find a solution for a continuously updating log file. How can I handle this situation?
e.g.: tail -f | grep "<string>" > ~/<fileName>.log
The above can put the data in a file; I need to send it via a curl POST.
Maybe using a function like:
send_data() {
    curl -s -k -X POST --header 'Content-Type: application/json' \
        --header 'Accept: application/json' \
        "http://${HOST}${PORT}/v1/notify" \
        -d "$1"
}
If tail -f | grep "<string>" > ~/<fileName>.log is working for you, then you could do:
tail -f file | stdbuf -i0 -o0 -e0 grep "<string>" | xargs -n 1 -d $'\n' curl ...
or:
while IFS= read -r line; do
    curl ... "$line"
done < <(tail -f file | stdbuf -i0 -o0 -e0 grep "<string>")
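Combining that loop with the send_data idea from the question might look like this (a sketch: the HOST/PORT values and the JSON field name are assumptions, and the quoting is naive, assuming lines contain no JSON-special characters):
#!/usr/bin/env bash
HOST=localhost   # assumption
PORT=:8080       # assumption; note the leading colon, matching http://${HOST}${PORT}
tail -f file | stdbuf -i0 -o0 -e0 grep "<string>" | while IFS= read -r line; do
    curl -s -k -X POST \
        --header 'Content-Type: application/json' \
        --header 'Accept: application/json' \
        "http://${HOST}${PORT}/v1/notify" \
        -d "{\"message\": \"$line\"}"   # naive JSON quoting
done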

How can I loop over comma-separated lists *inside* each line of a file?

I need to write a status checker as a bash script.
I have a file with lines like this:
domain.com; 111.111.111.111,222.222.222.222; /link/to/somefile.js,/link/to/somefile2.js
domain2.com; 122.122.111.111,211.211.222.222; /link/to/somefile2.js,/link/to/somefile3.js
In total, I need to execute commands like these:
curl -s -I -H 'Host: domain.com' http://111.111.111.111/link/to/somefile.js
curl -s -I -H 'Host: domain.com' http://222.222.222.222/link/to/somefile.js
curl -s -I -H 'Host: domain.com' http://111.111.111.111/link/to/somefile2.js
curl -s -I -H 'Host: domain.com' http://222.222.222.222/link/to/somefile2.js
curl -s -I -H 'Host: domain2.com' http://122.122.111.111/link/to/somefile2.js
curl -s -I -H 'Host: domain2.com' http://211.211.222.222/link/to/somefile2.js
curl -s -I -H 'Host: domain2.com' http://122.122.111.111/link/to/somefile3.js
curl -s -I -H 'Host: domain2.com' http://211.211.222.222/link/to/somefile3.js
The question is:
what tool do I need to use to achieve this?
Maybe xargs with some arguments/flags can do it, or GNU parallel?
Can you please show examples?
I can split the lines and assign the results to different variables; that isn't the problem at all:
domain=$(cut -d';' -f1 file | xargs -I N -d "," echo curl -H \'N\')
ip=$(cut -d';' -f2 file | xargs -I N -d "," echo curl -H \'N\')
and so on.
But the question is about something else :) :
after delimiting and splitting the strings into variables, how can I execute curl with the different variables, given that the number of values per variable will differ?
The answer Barmar gave doesn't cover the problem at all, because there are more than two lists. The problem is not ignorance of bash, but finding the way to solve this.
#!/usr/bin/env bash
# ^^^^- IMPORTANT: not /bin/sh

# print each command instead of running it, so people can test without real URLs
log_command() { printf '%q ' "$@"; printf '\n'; }

while IFS='; ' read -r domain addrs_str files_str; do
    IFS=, read -r -a addrs <<<"$addrs_str"
    IFS=, read -r -a files <<<"$files_str"
    for file in "${files[@]}"; do
        for addr in "${addrs[@]}"; do
            log_command curl -s -I -H "Host: $domain" "http://$addr/$file"
        done
    done
done
...emits as output (the list of commands it would run if the log_command prefix were removed):
curl -s -I -H Host:\ domain.com http://111.111.111.111//link/to/somefile.js
curl -s -I -H Host:\ domain.com http://222.222.222.222//link/to/somefile.js
curl -s -I -H Host:\ domain.com http://111.111.111.111//link/to/somefile2.js
curl -s -I -H Host:\ domain.com http://222.222.222.222//link/to/somefile2.js
curl -s -I -H Host:\ domain2.com http://122.122.111.111//link/to/somefile2.js
curl -s -I -H Host:\ domain2.com http://211.211.222.222//link/to/somefile2.js
curl -s -I -H Host:\ domain2.com http://122.122.111.111//link/to/somefile3.js
curl -s -I -H Host:\ domain2.com http://211.211.222.222//link/to/somefile3.js
...as you can see at https://ideone.com/dTC8q8
Now how does this work?
Step 1: Read each line into domain, addrs_str and files_str, split on semicolons and spaces.
That's what's done by the line IFS='; ' read -r domain addrs_str files_str, which operates as described in BashFAQ #1, and in How to read variables from file, with multiple variables per line?
Step 2: For addrs_str and files_str, split them on commas into separate arrays. This is described in How do I split a string on a delimiter in Bash?
Step 3: Iterate over those arrays, and call curl for each combination. If you wanted to call the first IP with only the first file, and the second IP with the second file, you could use Iterate over two arrays simultaneously in bash; otherwise, it's a plain nested loop.
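A quick way to sanity-check the Step 2 splitting in an interactive shell (values taken from the sample input):
addrs_str='111.111.111.111,222.222.222.222'
IFS=, read -r -a addrs <<<"$addrs_str"
declare -p addrs   # declare -a addrs=([0]="111.111.111.111" [1]="222.222.222.222")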
With GNU Parallel it would look like this
doit() {
    domain="$1"
    ips="$2"
    paths="$3"
    parallel --dry-run -d ',' -q curl -s -I -H Host:\ "$domain" http://{1}/{2} ::: "$ips" ::: "$paths"
}
export -f doit
parallel --colsep ';' doit :::: input.file
Remove --dry-run when you are convinced it works.

Run multiple curl commands in parallel

I have the following shell script. The issue is that I want to run the requests in parallel/concurrently, without waiting for one request to finish before starting the next. For example, if I make 20 requests, I want them executed at the same time.
for ((request=1; request<=20; request++))
do
    for ((x=1; x<=20; x++))
    do
        time curl -X POST --header "http://localhost:5000/example"
    done
done
Any guide?
You can use xargs with the -P option to run any command in parallel:
seq 1 200 | xargs -n1 -P10 curl "http://localhost:5000/example"
This will run the curl command 200 times, with at most 10 jobs in parallel.
Using the xargs -P option, you can run any command in parallel:
xargs -I % -P 8 curl -X POST --header "http://localhost:5000/example" \
< <(printf '%s\n' {1..400})
This will run the given curl command 400 times, with at most 8 jobs in parallel.
Update 2020:
Curl can now fetch several websites in parallel:
curl --parallel --parallel-immediate --parallel-max 3 --config websites.txt
websites.txt file:
url = "website1.com"
url = "website2.com"
url = "website3.com"
This is an addition to @saeed's answer.
I faced an issue where it made unnecessary requests to the following hosts:
0.0.0.1, 0.0.0.2 ... 0.0.0.N
The reason was that xargs was appending its input as arguments to the curl command, which curl then treated as URLs. To prevent the arguments from being appended, we can specify which character marks the substitution point using the -I flag.
So we will use it as,
... xargs -I '$' command ...
Now xargs will substitute the argument wherever the $ literal is found, and if it is not found, the argument is not passed at all. So the final command will be:
seq 1 200 | xargs -I $ -n1 -P10 curl "http://localhost:5000/example"
Note: if you are using $ in your command, replace the placeholder with some other character that is not being used.
Adding to @saeed's answer, I created a generic function that utilises function arguments to fire commands N times in total, with at most M jobs running in parallel:
function conc(){
    cmd=("${@:3}")
    seq 1 "$1" | xargs -n1 -P"$2" "${cmd[@]}"
}
$ conc N M cmd
$ conc 10 2 curl --location --request GET 'http://google.com/'
This will fire 10 curl commands with at most two running in parallel.
Adding this function to your ~/.bashrc or ~/.bash_profile makes it easier to reuse.
Add "wait" at the end, and background them.
for ((request=1; request<=20; request++))
do
    for ((x=1; x<=20; x++))
    do
        time curl -X POST --header "http://localhost:5000/example" &
    done
done
wait
They will all output to the same stdout, but you can redirect the result of the time (plus stdout and stderr) to a named file:
{ time curl -X POST --header "http://localhost:5000/example" ; } > output.${x}.${request}.out 2>&1 &
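Putting it together, a sketch of the full loop (the braces make the redirection capture the time report as well; the --header flag in the question appears to be a typo, since it is followed by the URL rather than a header, so it is dropped here):
for ((request=1; request<=20; request++)); do
    for ((x=1; x<=20; x++)); do
        { time curl -X POST "http://localhost:5000/example" ; } > "output.${x}.${request}.out" 2>&1 &
    done
done
wait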
I wanted to share my example of how I utilised parallel xargs with curl.
The advantage of using xargs is that you can specify how many processes will be used to parallelise curl, rather than backgrounding each curl with &, which would schedule all, say, 10000 curls simultaneously.
Hope it will be helpful to somebody:
#!/bin/sh
url=/any-url
currentDate=$(date +%Y-%m-%d)
payload='{"field1":"value1", "field2":{},"timestamp":"'$currentDate'"}'
threadCount=10
cat "$1" | \
xargs -P "$threadCount" -I {} curl -sw 'url = %{url_effective}, http_status_code = %{http_code}, time_total = %{time_total} seconds\n' -H "Content-Type: application/json" -H "Accept: application/json" -X POST "$url" --max-time 60 -d "$payload"
The input .csv file has one value per row; xargs substitutes it for the {} token inside the JSON payload (the "field2" value here).
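For example, assuming the script is saved as send.sh (a hypothetical name), and keeping in mind that each value replaces the {} token inside the JSON, so it must itself be valid JSON (e.g. a number):
printf '%s\n' 1 2 3 > values.csv
sh send.sh values.csv   # fires three POSTs; "field2" becomes 1, 2 and 3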
Based on the solution provided by @isopropylcyanide and the comment by @Dario Seidl, I find this to be the best response as it handles both curl and httpie.
# conc N M cmd - fire (N) commands at a max parallelism of (M) each
function conc(){
    cmd=("${@:3}")
    seq 1 "$1" | xargs -I'$XARGI' -P"$2" "${cmd[@]}"
}
For example:
conc 10 3 curl -L -X POST https://httpbin.org/post -H 'Authorization: Basic dXNlcjpwYXNz' -H 'Content-Type: application/json' -d '{"url":"http://google.com/","foo":"bar"}'
conc 10 3 http --ignore-stdin -F -a user:pass httpbin.org/post url=http://google.com/ foo=bar

Bashscript with curl operation in parallel

I have a list of URLs which I'd like to load with curl, and then do some operations on the result with a bash script.
Since it is almost 100k requests, I'd like to run this in parallel.
I have already looked into GNU parallel, but how am I going to glue it all together? Thanks!
The bash script:
while read URL; do
    curl -L -H "Accept: application/unixref+xml" $URL > temp.xml;
    YEAR=$(xmllint --xpath '//year' temp.xml);
    MONTH=$(xmllint --xpath '(//date/month)[1]' temp.xml);
    echo "$URL;$YEAR;$MONTH" >> results.csv;
    sed -i '1d' urls.txt;
done < urls.txt;
You shouldn't be modifying the input list of URLs as you make each HTTP request. And having multiple appenders writing to the same output file from different processes will likely end in tears.
Put most of your commands in a separate script (named, say, geturl.sh) that could be invoked with the URL as a parameter, and writes its line of output to standard out:
#!/usr/bin/env bash
URL="${1}"
curl -L -H "Accept: application/unixref+xml" "${URL}" > /tmp/$$.xml
YEAR="$(xmllint --xpath '//year' /tmp/.xml)"
MONTH="$(xmllint --xpath '(//date/month)[1]' /tmp/$$.xml)"
rm -f /tmp/$$.xml
echo "${URL};${YEAR};${MONTH}"
Then invoke as follows (here we let parallel merge the outputs from the various threads line by line):
parallel --line-buffer geturl.sh < urls.txt > results.csv
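For this to work, geturl.sh must be executable and reachable; an explicit invocation might look like this (-j 16 is an arbitrary job cap):
chmod +x geturl.sh
parallel -j 16 --line-buffer ./geturl.sh < urls.txt > results.csv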

Script to get the HTTP status code of a list of urls?

I have a list of URLS that I need to check, to see if they still work or not. I would like to write a bash script that does that for me.
I only need the returned HTTP status code, i.e. 200, 404, 500 and so forth. Nothing more.
EDIT Note that there is an issue if the page says "404 not found" but returns a 200 OK message. It's a misconfigured web server, but you may have to consider this case.
For more on this, see Check if a URL goes to a page containing the text "404"
Curl has a specific option, --write-out, for this:
$ curl -o /dev/null --silent --head --write-out '%{http_code}\n' <url>
200
-o /dev/null throws away the usual output
--silent throws away the progress meter
--head makes a HEAD HTTP request, instead of GET
--write-out '%{http_code}\n' prints the required status code
To wrap this up in a complete Bash script:
#!/bin/bash
while read -r LINE; do
    curl -o /dev/null --silent --head --write-out "%{http_code} $LINE\n" "$LINE"
done < url-list.txt
(Eagle-eyed readers will notice that this uses one curl process per URL, which imposes fork and TCP connection penalties. It would be faster if multiple URLs were combined in a single curl invocation, but there isn't space to write out the monstrous repetition of options that curl requires to do this.)
wget --spider -S "http://url/to/be/checked" 2>&1 | grep "HTTP/" | awk '{print $2}'
prints only the status code for you
Extending the answer already provided by Phil: adding parallelism is a no-brainer in bash if you use xargs for the call.
Here the code:
xargs -n1 -P 10 curl -o /dev/null --silent --head --write-out '%{url_effective}: %{http_code}\n' < url.lst
-n1: use just one value (from the list) as argument to the curl call
-P10: Keep 10 curl processes alive at any time (i.e. 10 parallel connections)
Check the --write-out option in curl's manual for more data you can extract using it (times, etc.).
In case it helps someone this is the call I'm currently using:
xargs -n1 -P 10 curl -o /dev/null --silent --head --write-out '%{url_effective};%{http_code};%{time_total};%{time_namelookup};%{time_connect};%{size_download};%{speed_download}\n' < url.lst | tee results.csv
It just outputs a bunch of data into a csv file that can be imported into any office tool.
This relies on widely available wget, present almost everywhere, even on Alpine Linux.
wget --server-response --spider --quiet "${url}" 2>&1 | awk 'NR==1{print $2}'
The explanations are as follows:
--quiet
Turn off Wget's output.
Source - wget man pages
--spider
[ ... ] it will not download the pages, just check that they are there. [ ... ]
Source - wget man pages
--server-response
Print the headers sent by HTTP servers and responses sent by FTP servers.
Source - wget man pages
What they don't say about --server-response is that those headers are printed to standard error (stderr), hence the need to redirect them to stdout (the 2>&1).
With the output on standard output, we can pipe it to awk to extract the HTTP status code. That code is:
the second ($2) non-blank group of characters: {$2}
on the very first line of the header: NR==1
And because we want to print it... {print $2}.
wget --server-response --spider --quiet "${url}" 2>&1 | awk 'NR==1{print $2}'
Use curl to fetch the HTTP-header only (not the whole file) and parse it:
$ curl -I --stderr /dev/null http://www.google.co.uk/index.html | head -1 | cut -d' ' -f2
200
wget -S -i file will get you the headers from each URL in a file.
Filter through grep for the status code specifically.
I found a tool "webchk" written in Python that returns a status code for a list of URLs.
https://pypi.org/project/webchk/
Output looks like this:
$ webchk -i ./dxieu.txt | grep '200'
http://salesforce-case-status.dxi.eu/login ... 200 OK (0.108)
https://support.dxi.eu/hc/en-gb ... 200 OK (0.389)
https://support.dxi.eu/hc/en-gb ... 200 OK (0.401)
Hope that helps!
Keeping in mind that curl is not always available (particularly in containers), there are issues with this solution:
wget --server-response --spider --quiet "${url}" 2>&1 | awk 'NR==1{print $2}'
which will return an exit status of 0 even if the URL doesn't exist.
Alternatively, here is a reasonable container health-check using wget:
wget -S --spider -q -t 1 "${url}" 2>&1 | grep "200 OK" > /dev/null
While it may not give you the exact status code, it will at least give you a valid exit-code-based health response (even with redirects on the endpoint).
Due to https://mywiki.wooledge.org/BashPitfalls#Non-atomic_writes_with_xargs_-P (output from parallel jobs in xargs risks being mixed), I would use GNU Parallel instead of xargs to parallelize:
cat url.lst |
parallel -P0 -q curl -o /dev/null --silent --head --write-out '%{url_effective}: %{http_code}\n' > outfile
In this particular case it may be safe to use xargs because the output is so short; the problem is rather that if someone later changes the code to do something bigger, it will no longer be safe. Or if someone reads this question and thinks they can replace curl with something else, then that may also not be safe.
