Run curl loop in bash faster

I've made a simple for loop to make POST requests using curl and save the output to a .txt file.
for ((i=200000; i<=300000; i++)); do
curl -s -X POST -d "do=something&page=$i" "https://example.com/ajax" -o "$i.txt" > /dev/null
done
Currently, the script creates a new output file roughly every 260 ms. Is it possible to make this process faster?

Have a look at GNU parallel. You can use it to parallelise just about anything, and it works well with curl. Look to replace for and while loops with it, and test for optimal performance: more is not always better, and there are diminishing returns beyond a certain point.
Here is a reference to another article that discusses it: Bash sending multiple curl request using GNU parallel
I wanted to add a simple example to my previous post.
parallel -j8 curl -s '{}' < urls >/dev/null
-j8 means to use 8 parallel processes; if left unset, parallel defaults to one job per CPU core. 'urls' is a text file with a bunch of URLs.
Change and apply as you see fit, since it doesn't map directly onto your example above.
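To tie that back to the question, an adaptation of the original POST loop might look like this (a sketch, assuming GNU parallel is installed; the endpoint and form fields are the ones from the question, and parallel substitutes {} with each number):
seq 200000 300000 | parallel -j8 'curl -s -X POST -d "do=something&page={}" -o "{}.txt" "https://example.com/ajax"'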

Related

tar `--to-command` : how to send output to a custom function?

I have many tar archives I need to extract files from and post-process (amongst other things, changing file encodings and applying some sed commands). I'm interested in using tar's --to-command option to apply a custom function which does all of those things in sequence.
Up until now, I have been doing:
tar -xzi --to-command=<line of shell commands>
Unfortunately, the list of commands I need to do has got larger and means it is no longer neat (nor probably sensible) to attempt to do everything on one line.
To neaten things up, I've written a function in another file, test-function.sh, which (tries to) perform those things in sequence:
#!/bin/bash
post_extract() {
<the things I need to do>
}
I realise the above example is incomplete, but my problem at the moment is that I can't get --to-command to find the post_extract function so I can even begin testing it.
Where should I put post_extract / what would be the idiomatic way of exposing it to tar's --to-command?
Given the behaviors demonstrated in TAR with --to-command (in particular, --to-command='md5sum | ...' resulting in md5sum: |: No such file or directory), it's clear that tar --to-command doesn't invoke a shell; it simply performs shell-like parsing and then attempts an execv()-family invocation of the resulting command. This means the target command needs to be something that can actually be executed -- a script, or similar.
This behavior is (loosely) documented in the Running External Commands section of the GNU tar documentation.
One way around this is to have the thing that's invoked be a shell, and then use an exported function to ship code from your parent shell to that child process; as shown below:
#!/usr/bin/env bash
# ^^^^- MUST be bash, not sh
post_extract() {
: "do stuff here"
}
export -f post_extract
tar --to-command $'bash -c \'post_extract "$@"\' _' ...
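For what it's worth, a hypothetical body for post_extract might look like the following; tar feeds each entry's contents on stdin and exports $TAR_FILENAME for the entry's name, while the iconv/sed steps and the converted/ output directory are only stand-ins for whatever post-processing you actually need:
post_extract() {
    # tar pipes each entry's contents to stdin and sets $TAR_FILENAME
    mkdir -p "converted/$(dirname "$TAR_FILENAME")"
    # illustrative re-encoding plus a sed pass; substitute your real steps
    iconv -f ISO-8859-1 -t UTF-8 | sed 's/\r$//' > "converted/$TAR_FILENAME"
}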

Curl through list in bash

I am trying to iterate through a list and curl each entry; this is ultimately to kick off a list of Jenkins jobs.
So I have a text file whose contents are:
ApplianceInsurance-Tests
BicycleInsurance-Tests
Breakdown-Tests
BridgingLoans-Tests
Broadband-Tests
Business-Loans
BusinessElectric-Tests
BusinessGas-Tests
and I am trying to create a loop in which I fire a curl command for each line in the txt file:
for fn in `cat jenkins-lists.txt`; do "curl -X POST 'http://user:key#x.x.x.xxx:8080/job/$fn/build"; done
but I keep getting an error - No such file or directory.
Getting a little confused.
Your do-done body is quoted incorrectly. It should be:
curl -X POST "http://user:key#x.x.x.xxx:8080/job/$fn/build"
I'd also recommend:
while read -r fn; do
curl -X POST "http://user:key#x.x.x.xxx:8080/job/$fn/build"
done < jenkins-lists.txt
instead of for fn in $(anything); do .... With the second way you don't have to worry about inadvertent globbing, and the jenkins-lists.txt file may get nicely buffered instead of being read into memory all at once (not that it matters for such a small file, but why not use a technique that works well regardless of file size?).
If the error had come from curl, it would probably have been HTML-formatted. The only way I can reproduce the error you describe is by cat-ing a non-existent file.
Double check the name of the jenkins-lists.txt file, and make sure your script is running in the same directory as the file. Or use an absolute path to the file.
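If you'd rather have the script fail loudly when the list is missing, a small guard before the loop is one option (a sketch; the list path and the URL are copied from the thread as placeholders):
list=jenkins-lists.txt
[ -r "$list" ] || { echo "cannot read $list" >&2; exit 1; }
while read -r fn; do
    curl -X POST "http://user:key#x.x.x.xxx:8080/job/$fn/build"
done < "$list"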

How to navigate to a certain point in a file using curl?

For certain reasons I'm writing a bash script in which I need to navigate to a certain point in a UTF-8 plaintext web page (this one, to be precise: gutenberg.org/files/2701/2701-0.txt), which I fetch with curl. The command I'm currently using is:
curl -s http://gutenberg.org/files/2701/2701-0.txt|say
How could I make it start reading from a certain point in the book (i.e. the start of a chapter)?
You probably want:
url='http://gutenberg.org/files/2701/2701-0.txt'
chapter=4
curl -s "$url" | sed -n '/^CHAPTER '"$chapter"'\./,$p' | say
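If you only want that single chapter rather than everything from there to the end, the sed range can be closed with the next chapter heading (a sketch, assuming the headings keep the same 'CHAPTER N.' format; note the closing heading line is still printed):
chapter=4
next=$((chapter + 1))
curl -s "$url" | sed -n '/^CHAPTER '"$chapter"'\./,/^CHAPTER '"$next"'\./p' | say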

How do I curl multiple resources in one command?

Say I am trying to download a set of 50 lecture notes efficiently. These notes are inside the prof subdirectory of a university website. The 45th lecture note is inside the lect45 subdirectory as a pdf entitled lect45.pdf. I get my first pdf as follows:
curl -O http://www.university.edu/~prof/lect1/lect1.pdf
How do I get all my 50 notes efficiently using cURL and bash? I'm trying to do this from the command line, not through a Python / Ruby / Perl script. I know something like the below will generate a lot of 404s:
curl -O http://www.university.edu/~prof/lect{1..50}/lect{1..50}.pdf
So what will work better? I would prefer an elegant one-liner over a loop.
Do it in several processes:
for i in {1..50}
do
curl -O http://www.university.edu/~prof/lect$i/lect$i.pdf &
done
or as a one-liner (just a different formatting):
for i in {1..50}; do curl -O http://www.university.edu/~prof/lect$i/lect$i.pdf & done
The & makes all processes run in parallel.
Don't be scared by the output: the shell tells you that 50 processes have been started, which is a lot of noise, and later it reports that each of them has terminated, which is a lot of output again.
You probably don't want to run all 50 in parallel ;-)
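One way to avoid that with plain bash is to wait after each batch (a sketch; the batch size of 10 is arbitrary):
for i in {1..50}; do
    curl -O "http://www.university.edu/~prof/lect$i/lect$i.pdf" &
    # throttle: wait for the current batch to finish every 10 downloads
    if (( i % 10 == 0 )); then wait; fi
done
wait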
EDIT:
Your example using {1..50} twice builds the cross product of the two ranges (see echo {1..3}/{1..3} for what I mean), and I guess that's how you end up with a lot of 404s.
Take a look at the parallel shell tool.
So, for this particular case it will look like:
seq 50 | parallel curl -O http://www.university.edu/~prof/lect{}/lect{}.pdf
As for curl itself: it doesn't have its own parallel mechanism, and why should it? Your shell-expansion example {1..50} is syntactically valid, though as noted above it expands to the full cross product of the two ranges.
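If you'd rather not install anything extra, xargs can also cap the concurrency (a sketch; -P8 keeps at most 8 curl processes running at once, and -I{} substitutes each number from seq):
seq 1 50 | xargs -I{} -P8 curl -O "http://www.university.edu/~prof/lect{}/lect{}.pdf"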

Is there an equivalent function in cURL for wget -N?

I was wondering if cURL allows you to do the same thing as wget -N, which only downloads / overwrites a file if the existing file on the client side is older than the one on the server.
I realise this question is old now, but in case someone else is looking for the answer, it seems that cURL can indeed achieve something similar to wget -N.
I was looking for an answer to this question today and found elsewhere that cURL does have a time condition option. If Google brings you here first, as it did me, then I hope this answer saves you some time. According to curl --help, there is a time-cond flag:
-z, --time-cond <time> Transfer based on a time condition
The other part needed to make it behave like wget -N is to have curl preserve the remote file's timestamp, which is what the -R option does.
-R, --remote-time Set the remote file's time on the local output
We can combine these to download "$file" only when the local "$file" has an older timestamp than the server's copy, in this form:
curl -R -o "$file" -z "$file" "$serverurl"
So, for example, I use it to check whether there is a newer Cygwin installer, like this:
curl -R -o "C:\cygwin64\setup-x86_64.exe" -z "C:\cygwin64\setup-x86_64.exe" "https://www.cygwin.com/setup-x86_64.exe"
cURL doesn't have the same type of mirroring support that wget has built in. There is one option, though, that should make it pretty easy to implement this yourself with a little bit of wrapping logic. It's the --remote-time option:
-R/--remote-time
When used, this will make libcurl attempt to figure out the
timestamp of the remote file, and if that is available make the
local file get that same timestamp.
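One way to write that bit of wrapping logic, as a sketch (the URL and local file name are placeholders; -z only applies once a local copy exists):
#!/usr/bin/env bash
url='https://example.com/file.bin'   # placeholder URL
file='file.bin'                      # placeholder local name
if [ -e "$file" ]; then
    # re-download only if the server copy is newer; -R keeps the remote timestamp
    curl -sS -R -o "$file" -z "$file" "$url"
else
    curl -sS -R -o "$file" "$url"
fi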
