Say I am trying to download a set of 50 lecture notes efficiently. These notes are inside the prof subdirectory of a university website. The 45th lecture note is inside the lect45 subdirectory as a pdf entitled lect45.pdf. I get my first pdf as follows:
curl -O http://www.university.edu/~prof/lect1/lect1.pdf
How do I get all my 50 notes efficiently using cURL and bash? I'm trying to do this from the command line, not through a Python / Ruby / Perl script. I know something like the below will generate a lot of 404s:
curl -O http://www.university.edu/~prof/lect{1..50}/lect{1..50}.pdf
so what will work better? I would prefer an elegant one-liner over a loop.
Do it in several processes:
for i in {1..50}
do
curl -O http://www.university.edu/~prof/lect$i/lect$i.pdf &
done
or as a one-liner (just a different formatting):
for i in {1..50}; do curl -O http://www.university.edu/~prof/lect$i/lect$i.pdf & done
The trailing & puts each curl into the background, so all 50 downloads run in parallel. If you want the shell to block until every download has finished, add a final wait after the loop.
Don't be alarmed by the output: the shell reports each of the 50 background jobs as it starts, and again as each one terminates, so expect a fair amount of job-control noise.
You probably don't want to run all 50 in parallel ;-)
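If you'd rather cap the number of simultaneous downloads than launch all 50 at once, xargs -P is one common way to do it. A minimal sketch, capping at 10 parallel jobs (same URL pattern as in the question):
# feed the numbers 1..50 to xargs, which runs up to 10 curls at a time
printf '%s\n' {1..50} | xargs -P10 -I{} curl -O http://www.university.edu/~prof/lect{}/lect{}.pdf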
EDIT:
Your example using {1..50} twice expands to the Cartesian product of the two ranges, i.e. 50 × 50 = 2500 URLs, all but 50 of which will be 404s. Run echo {1..3}/{1..3} to see what I mean:
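$ echo {1..3}/{1..3}
1/1 1/2 1/3 2/1 2/2 2/3 3/1 3/2 3/3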
Take a look at the GNU parallel shell tool.
So, for this particular case it will look like
seq 50 | parallel curl -O http://www.university.edu/~prof/lect{}/lect{}.pdf
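Here {} is GNU parallel's replacement string: each number that seq 50 emits is substituted for both occurrences of {}, so the directory and the file name always carry the same lecture number.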
As for curl itself: it traditionally has no parallel mechanism of its own (recent versions have added a --parallel/-Z flag, though that wouldn't fix the matching-number problem here), and why should it? Your {1..50} shell expansion is valid syntax; it just expands to every combination of the two ranges, as noted above.
Related
I've made a simple for loop to make POST requests using curl and save the output to a .txt file.
for ((i=200000; i<=300000; i++)); do
curl -s -X POST -d "do=something&page=$i" "https://example.com/ajax" -o "$i.txt" > /dev/null
done
Currently the script creates a new output file roughly every 260 ms. Is it possible to make this process even faster?
Have a look at GNU parallel. You can use it to parallelise just about anything, and it works well with curl. Look to replace for and while loops with it, and test for optimal performance: more is not always better, and there are diminishing returns once you go beyond a certain number of jobs.
Here is a reference to another article that discusses it: Bash sending multiple curl request using GNU parallel
I wanted to add a simple example to my previous post.
parallel -j8 curl -s '{}' < urls >/dev/null
-j8 means use 8 parallel processes; if left unset, GNU parallel defaults to one job per CPU core. urls is a text file with one URL per line.
Change and apply as you see fit, as it doesn't conform specifically to your example above.
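Adapted to the POST loop in the question, it might look like the sketch below (untested, with the same URL and parameters as your loop; the whole curl command is quoted so that the & in the POST body is interpreted by curl, not by the shell that parallel spawns):
# run the same POST for every page number, 8 requests in flight at a time
seq 200000 300000 | parallel -j8 'curl -s -X POST -d "do=something&page={}" "https://example.com/ajax" -o "{}.txt"'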
I am new to bash scripting. I have a shell script that runs several functions for longitudinal image processing in Matlab through terminal. I would like to parallelize the process in the terminal.
Here is a brief example of how it runs:
./script.sh *.nii -surface -m /Applications/MATLAB_R2018b.app/bin/matlab
*.nii refers to images from a single subject taken at different times (e.g. subj1img1, subj1img2, subj1img3). There are 3 images per subject in my case, so in each run the script processes all images of a single subject.
I would like to parallelize this process so that I can run the script for multiple subjects at the same time. Reading through the GNU parallel documentation with my limited experience, I wasn't able to figure out the code I need to make it happen. I'd really appreciate any suggestions.
parallel ./script.sh {} -surface -m /Applications/MATLAB_R2018b.app/bin/matlab ::: *.nii
You can start them in the background using & in a for loop, as below:
for f in *.nii
do
./script.sh "$f" -surface -m /Applications/MATLAB_R2018b.app/bin/matlab &
done
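Note that both of the above launch one job per .nii file. If the script really needs to see all three images of a subject in a single run, you could group the files by subject first. A rough sketch, assuming the subjNimgM.nii naming from the question (the sed pattern is a guess at that naming):
# derive the unique subject prefixes (subj1, subj2, ...) and run one job per subject
for s in $(ls *.nii | sed -E 's/img[0-9]+\.nii$//' | sort -u); do
    ./script.sh "${s}"img*.nii -surface -m /Applications/MATLAB_R2018b.app/bin/matlab &
done
wait    # block until every subject's job has finished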
To start, I am relatively new to shell scripting. I was wondering if anyone could help me create "steps" within a bash script. For example, I'd like to run one analysis and then have the script proceed to the next analysis with the output files generated in the first analysis.
So for example, the script below will generate output file "filt_C2":
./sortmerna --ref ./rRNA_databases/silva-arc-23s-id98.fasta,./index/silva-arc-23s-id98.db:./rRNA_databases/silva-bac-23s-id98.fasta,./index/silva-bac-23s-id98.db:./rRNA_databases/silva-euk-18s-id95.fasta,./index/silva-euk-18s-id95.db:./rRNA_databases/silva-euk-28s-id98.fasta,./index/silva-euk-28s-id98.db:./rRNA_databases/rfam-5s-database-id98.fasta,./index/rfam-5s-database-id98.db:./rRNA_databases/rfam-5.8s-database-id98.fasta,./index/rfam-5.8s.db --reads ~/path/to/file/C2.fastq --aligned ~/path/to/file/rrna_C2 --num_alignments 1 --other ~/path/to/file/filt_C2 --fastx --log -a 8 -m 64000
Once this step is complete, I would like to run another step that will use the output file "filt_C2" that was generated. I have been creating multiple bash scripts for each step; however, it would be more efficient if I could do each step in one bash file. So, is there a way to make a script that will complete Step 1, then move to Step 2 using the files generated in step 1? Any tips would be greatly appreciated. Thank you!
Welcome to bash scripting!
Here are a few tips:
You can have multiple lines, as many as you like, in a bash script file.
You may call other bash scripts (or any other executable programs) from within your shell script, just as Frank has mentioned in his answer.
You may use variables to make your script more generic, say, if you want to name your result "C3" instead of "C2". (See the short sketch after the example below.)
You may use bash functions if your script becomes more complicated, e.g. see https://ryanstutorials.net/bash-scripting-tutorial/bash-functions.php
I recommend placing sortmerna in a directory that is in your PATH environment variable, and replacing the multiple ~/path/to/file occurrences with a single variable (say WORKDIR) for consistency and flexibility.
For example, let’s say you name your script print_analysis.sh:
#!/bin/bash
# print_analysis.sh
# Written by Nikki E. Andrzejczyk, November 2018
# Set variables
WORKDIR=~/path/to/file
# Stage 1: Generate filt_C2 using SortMeRNA
./sortmerna --ref ./rRNA_databases/silva-arc-23s-id98.fasta,./index/silva-arc-23s-id98.db:./rRNA_databases/silva-bac-23s-id98.fasta,./index/silva-bac-23s-id98.db:./rRNA_databases/silva-euk-18s-id95.fasta,./index/silva-euk-18s-id95.db:./rRNA_databases/silva-euk-28s-id98.fasta,./index/silva-euk-28s-id98.db:./rRNA_databases/rfam-5s-database-id98.fasta,./index/rfam-5s-database-id98.db:./rRNA_databases/rfam-5.8s-database-id98.fasta,./index/rfam-5.8s.db \
--reads "$WORKDIR/C2.fastq" \
--aligned "$WORKDIR/rrna_C2" \
--num_alignments 1 \
--other "$WORKDIR/filt_C2" \
--fastx --log -a 8 -m 64000
# Stage 2: Process filt_C2 to generate result_C2
./stage2 "$WORKDIR/filt_C2" > "$WORKDIR/result_C2.txt"
# Stage 3: Print the result in result_C2
less "$WORKDIR/result_C2.txt"
Note how the trailing backslashes \ let the long sortmerna command be split across multiple shorter lines, and how # introduces human-readable comments.
There is still room for improvement, as mentioned in the tips above but not implemented here; I hope this quick example shows you how to expand your bash script and make it do multiple steps in one go. For instance, the variable idea from tip 3 could look like the sketch that follows.
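This is only a sketch (just the lines that change, with a hypothetical SAMPLE variable): one variable controls which sample every stage operates on, and set -e stops the script as soon as any stage fails, so later steps never run on missing output.
#!/bin/bash
set -e             # abort immediately if any command fails
WORKDIR=~/path/to/file
SAMPLE=C2          # change to C3 to process the next sample
# ... the sortmerna call as above, but using:
#     --reads "$WORKDIR/${SAMPLE}.fastq"
#     --other "$WORKDIR/filt_${SAMPLE}"
./stage2 "$WORKDIR/filt_${SAMPLE}" > "$WORKDIR/result_${SAMPLE}.txt"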
Bash is actually a very powerful scripting and programming language. To learn more, you may want to start with Bash tutorials like the following:
https://ryanstutorials.net/bash-scripting-tutorial/
http://tldp.org/HOWTO/Bash-Prog-Intro-HOWTO.html
Hope this helps! If you have any other questions, or if I had misunderstood your question, please feel free to ask!
Cheers,
Anthony
I'm using GNU parallel to run a Stata do file for many different data sets.
I have a Bash script that contains the following:
parallel -a arguments.txt -j 3 stata -b do $dofileloc {}
Since the do file has several different parts for each dataset, I would like to have the progress shown in real time (e.g. display "data loaded for XYZ" after a part of the Stata do file finishes for a dataset, and so on).
So I'd like to redirect messages from Stata to the command line, but I'm having trouble doing this.
If I don't run Stata in batch mode I can see everything, which is a bit messy. I have tried using the shell command in Stata but I can't seem to figure out the correct combination.
I would appreciate any tips.
Does this do what you want?
parallel --tag --linebuffer -a arguments.txt -j 3 stata -b do $dofileloc {}
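For context: --tag prefixes every line of output with the argument that produced it, so you can tell the datasets apart, and --linebuffer makes parallel pass output through line by line as it is produced, rather than holding everything back until a job finishes. Together they give you the near-real-time progress display you describe.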
I have a feeling there's already an answer to this, but I wasn't able to find it.
How can I execute each line of output in bash as it comes out?
For example, as my script runs, it generates,
command-1
command-2
command-3
etc.
I need some way to pipe them into something that will run them neatly. I've been experimenting with xargs but haven't found anything good to put on the receiving end.
I'd like to avoid dumping them into a separate script on the side if possible. (I also tried for loops, but they ended up breaking on words instead of lines.)
$ bash echosomecommands.sh | bash
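If you need more control than piping into a second bash (say, to log or filter each command first), a while-read loop is a common alternative. A minimal sketch, using the echosomecommands.sh from above:
# read one command per line and execute it; IFS= and -r keep each line intact,
# which avoids the word-splitting problem you hit with for loops
bash echosomecommands.sh | while IFS= read -r line; do
    eval "$line"
done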