append output of multiple curl requests to a file from shell script - bash

I'm trying to fetch the JSON output of an internal API, adding 100 to a parameter value between cURL requests. I need to loop because the API restricts the maximum number of results per request to 100. I was told to "increment and you should be able to get what you need".
Anyway, here's what I wrote:
#!/bin/bash
COUNTER=100
until [ $COUNTER -gt 30000 ]; do
    curl -vs "http://example.com/locations/city?limit=100&offset=$COUNTER" >> cities.json
    let COUNTER=COUNTER+100
done
The problem is that I get a bunch of weird messages in the terminal, and the file I'm trying to redirect the output to still contains its original 100 objects. I feel like I'm probably missing something terrifically obvious. Any thoughts? I did use a somewhat old tutorial on the until loop, so maybe it's a syntax issue?
Thank you in advance!
EDIT: I'm not opposed to a completely alternate method, but I had hoped this would be somewhat straightforward. I figured my lack of experience was the main limiter.

You might find you can do this faster, and pretty easily, with GNU Parallel:
parallel -k curl -vs "http://example.com/locations/city?limit=100\&offset={}" ::: $(seq 100 100 30000) > cities.json
The -k flag keeps the outputs in the same order as the arguments, so cities.json stays ordered by offset even though the requests run concurrently.

If you want to overwrite the file's content only once, for your entire loop...
#!/bin/bash
# ^-- NOT /bin/sh, as this uses bash-only syntax
for (( counter=100; counter<=30000; counter+=100 )); do
    curl -vs "http://example.com/locations/city?limit=100&offset=$counter"
done >cities.json
This is actually more efficient than putting >>cities.json on each curl command, as it only opens the output file once, and it has the side effect (which you appear to want) of clearing the file's former contents when the loop is started. As for the weird messages in your terminal: that's curl's verbose trace from the -v flag, written to stderr rather than to your file; use plain -s and it goes away.

Possible to get bash input while user is at prompt? (Essentially an event listener)

Old stuff:
Background:
- Ultimate goal is to put a script in my .bash_profile that warns me by changing text color if I'm typing a commit message and it gets too long (yes, I'm aware vim has something like this).
Progress:
- I found the read -n option which led me to write this:
while true; do
    # This hits at the 53rd character
    read -rn53 input
    # I have commit aliased to gc so the if is just checking if I'm writing a commit
    if [ "${input:0:2}" = "gc" ]; then
        printf "\nMessage getting long"
    fi
done
Question:
- However, running this takes the user out of the bash prompt. I need a way to do something like this while at a normal prompt. I can't find information on anything like this. Does that mean it's not possible? Or am I just going about it the wrong way?
New progress:
I found the bind -x option which led me to write this:
check_commit() {
    if [ "${READLINE_LINE:0:13}" == 'git commit -m' ] && [ ${#READLINE_LINE} -gt 87 ]; then
        echo "Commit is $((${#READLINE_LINE} - 87)) characters too long!"
    fi
    READLINE_LINE="$READLINE_LINE$1"
    READLINE_POINT=$((READLINE_POINT+1))
}
bind -x '"\"": check_commit "\""'
It listens for a double quote and, if I'm writing a long commit message, tells me how many characters I am over the limit. It also puts the character I typed back into the current line, since the bind otherwise eats it.
New question:
Now I just need a way to put in a regex, a character list, or at least a variable instead of \" so I can listen on more keys (yes, I'm aware bind -x probably wasn't intended to be used this way; I can check performance/footprint/stability myself). I tried "$char", "${char}", "$(char)" and a few other things, but none seem to work. What is the correct approach here?
AFAIK, not possible in a sane way if you want this to happen during your normal prompt (when PROMPT_COMMAND and PS1 are evaluated). That would involve binding a custom compiled readline function to every self-insert and the like.
If you want this to happen in a script that uses the read builtin's prompt, it is crudely possible with a loop of commands like
read -e -i $(munge_buf $buf) -n $(buf_warn_len $buf) -p $(buf_warn_msg $buf) buf
This will allow you to create munge_buf() to alter the currently typed text if needed, buf_warn_len() to calculate a new length to warn at (which may be very large if the warning was already displayed), and buf_warn_msg() to derive a warning message based upon the buffer.
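For the "more keys" part of the new question, one approach that may work is generating the bind commands in a loop; the quoting is the fiddly part. A sketch, assuming the check_commit() function from the question:
for c in {a..z} {A..Z} {0..9}; do
    # expands to e.g.: bind -x '"a": check_commit "a"'
    bind -x "\"$c\": check_commit \"$c\""
done
Space and punctuation keys would need their own, more carefully quoted, entries, and binding every self-inserting key this way will make typing noticeably heavier.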

Bash: Trying to append to a variable name in the output of a function

This is my very first post on Stack Overflow, and I should probably point out that I am EXTREMELY new to a lot of programming. I'm currently a postgraduate student doing projects involving a lot of coding in various programs, everything from LaTeX to bash, MATLAB, etc.
If you could explicitly explain your answers, that would be much appreciated, as I'm trying to learn as I go. I apologise if there is an answer elsewhere that does what I'm trying to do, but I have spent a couple of days looking now.
So to the problem I'm trying to solve: I'm currently using a selection of bioinformatics tools to analyse a range of genomes, and I'm trying to somewhat automate the process.
I have a few sequences with names that look like this for instance (all contained in folders of their own currently as paired files):
SOL2511_S5_L001_R1_001.fastq
SOL2511_S5_L001_R2_001.fastq
SOL2510_S4_L001_R1_001.fastq
SOL2510_S4_L001_R2_001.fastq
...and so on...
I basically wish to automate the process by turning these into variables and passing those variables to each of the programs I use in turn. So, for example, my idea thus far was to assign them as wildcards, using the R1 and R2 (which appear in all the file names, as they represent each strand of DNA) as follows:
#!/bin/bash
seq1=*R1_001*
seq2=*R2_001*
On a rudimentary level this works, as it returns the correct files, so now I pass these variables to my first function which trims the DNA sequences down by a specified amount, like so:
# seqtk is the program suite, trimfq is a function within it,
# and the options -b -e specify how many bases to trim from the beginning and end of
# the DNA sequence respectively.
seqtk trimfq -b 10 -e 20 $seq1 >
seqtk trimfq -b 10 -e 20 $seq2 >
So now my problem is I wish to be able to append something like "_trim" to the output file which appears after the >, but I can't find anything that seems like it will work online.
Alternatively, I've been hunting for a script that will take the name of the folder that the files are in, and create a variable for the folder name which I can then give to the functions in question so that all the output files are named correctly for use later on.
Many thanks in advance for any help, and I apologise that this isn't really much of a minimum working example to go on, as I'm only just getting going on all this stuff!
Joe
EDIT
So I modified @ghoti's for loop (it does the job wonderfully, I might add - rep for you :D) and now I prepend trim_, as the loop as it was before gave me a .fastq.trim extension, which will cause errors later.
Is there any way I can append _trim to the end of the filename, but before the extension?
Explicit is usually better than implicit when matching filenames. Your wildcards may match more than you expect, especially once you have versions of the files with "_trim" appended to the end!
I would be more precise with the wildcards, and use for loops to process the files instead of relying on seqtk to handle multiple files. That way, you can do your own processing on the filenames.
Here's an example:
#!/bin/bash
# Define an array of sequences
sequences=(R1_001 R2_001)
# Step through the array...
for seq in "${sequences[@]}"; do
    # Step through the files in this sequence...
    for file in SOL*_${seq}.fastq; do
        seqtk trimfq -b 10 -e 20 "$file" > "${file}.trim"
    done
done
I don't know how your folders are set up, so I haven't addressed that in this script. But the basic idea is that if you want the script to be able to manipulate individual filenames, you need something like a for loop to handle that manipulation on a per-filename basis.
Does this help?
UPDATE:
To put _trim before the extension, replace the seqtk line with the following:
seqtk trimfq -b 10 -e 20 "$file" > "${file%.fastq}_trim.fastq"
This uses something documented in the Bash man page under Parameter Expansion if you want to read up on it. Basically, the ${file%.fastq} takes the $file variable and strips off a suffix. Then we add your extra text, along with the suffix.
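For instance, with one of the filenames from your question:
file=SOL2511_S5_L001_R1_001.fastq
echo "${file%.fastq}_trim.fastq"    # prints SOL2511_S5_L001_R1_001_trim.fastq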
You could also strip an extension using basename(1), but there's no need to call something external when you can use something built in to the shell.
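For comparison, the basename(1) version would look roughly like this (note that basename also strips any leading directory, so it is not an exact equivalent):
base=$(basename "$file" .fastq)
seqtk trimfq -b 10 -e 20 "$file" > "${base}_trim.fastq"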
Instead of setting variables with the filenames, you could pipe the output of ls to the command you want to run with these filenames, like this:
ls *R{1,2}_001* | xargs -I# sh -c 'seqtk trimfq -b 10 -e 20 "$1" > "${1}_trim"' -- #
xargs -I# substitutes each line of input for # in the command, so seqtk runs once per filename produced by ls

How to check if PDF files are online?

I would like to iterate through a number of PDFs starting from 18001.pdf up to N.pdf (adding 1 to the basename each time) and stop the loop as soon as a file is no longer available online. Below is the code that I guess is closest to what a solution might look like, but there are several things that don't seem to work properly. The command in the while condition causes a syntax error, for example.
#!/bin/bash
path=http://dip21.bundestag.de/dip21/btp/18/
n=18001
while [ wget -q --spider $path$n.pdf ]
do
    n=$(($n+1))
done
echo $n
NB: my question is not about debugging this specific code - it mostly serves to illustrate what I would like to do. Then again, I would appreciate a solution using a loop and wget.
If you want to test the success of a command, don't put it inside [ -- that's used to test the value of a conditional expression.
while wget -q --spider $path$n.pdf
do
...
done
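With the rest of your script, that becomes (a sketch, using the same URL and counter as your original):
#!/bin/bash
path=http://dip21.bundestag.de/dip21/btp/18/
n=18001
while wget -q --spider "$path$n.pdf"
do
    n=$((n+1))
done
echo "$n.pdf is the first file that is not available"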

Arduino returning more responses than queries have been sent

I have a problem when using Arduino to post data to Pachube. The Arduino is configured to return JSON data for the temperature when you send a 't' and return JSON data for the light level when you send an 'l'. This works perfectly through the Arduino Serial Monitor. I then created two bash scripts. One regularly sends the 't' and 'l' commands to Arduino and waits 10 seconds in between each request.
while true; do
    echo -n t > /dev/ttyACM0
    echo "$(date): Queried Arduino for temperature."
    sleep 10
    echo -n l > /dev/ttyACM0
    echo "$(date): Queried Arduino for light."
    sleep 10
done
This works fine; I get an echo message every 10 seconds. The other script reads the generated JSON from the serial port (I basically copied it from some Web page).
ARDUINO_PORT=/dev/ttyACM0
ARDUINO_SPEED=9600
API_KEY='MY_PACHUBE_KEY'
FEED_ID='MY_FEED_ID'
# Set speed for usb
stty -F $ARDUINO_PORT ispeed $ARDUINO_SPEED ospeed $ARDUINO_SPEED raw
exec 6<$ARDUINO_PORT
# Read data from Arduino
while read -u 6 f; do
    # Remove trailing carriage return character added
    # by println to satisfy stupid MS-DOS computers
    f=${f:0:${#f} - 1}
    curl --request PUT --header "X-PachubeApiKey: $API_KEY" --data-binary "{ \"version\":\"1.0.0\", \"datastreams\":[ $f ] }" "http://api.pachube.com/v2/feeds/$FEED_ID"
    echo "$(date) $f was read."
done
Unfortunately, this script goes crazy with echo messages telling me several times per 10 seconds that it posted data to Pachube although it should only do it every 10 seconds (whenever the first script told Arduino to create a JSON message). I thought it might be an issue with buffered messages on the Arduino but even when switching it off and on again the problem remains. Any thoughts? Thanks in advance.
I am completely unfamiliar with Arduino and a handful of other things you're doing here, but here are a few general things I see:
- Bash is almost entirely incapable of handling binary data reliably; there is no way to store a NUL byte in a Bash string. It looks like you're trying to pull some trickery to make arbitrary data readable - hopefully you're sending nothing but character data into read, otherwise this isn't likely to work.
- read reads newline-delimited input (or input delimited by the value given to -d, if your bash is new enough). I don't know the format the while loop is reading, but it has to be a newline-delimited string of characters.
- Use read -r unless you want escape sequences interpreted. (You almost always want -r with read.)
- Unconditionally stripping a character off the end of each string isn't the greatest. I'd use f=${f%%+($'\r')}, which removes one or more adjacent \r's from the end of f. Remember to shopt -s extglob at the top of your script if this isn't enabled by default.
- This shouldn't actually be causing an issue, but I prefer not to use exec unless it's really required, which it isn't here. Just put done <$ARDUINO_PORT at the end of the while loop and remove the -u 6 argument from read (unless something inside the loop specifically reads from stdin and would conflict, which doesn't appear to be the case); the open FD will automatically close when the loop exits. See the sketch after this list.
- Don't create your own all-caps variable names in scripts, because they can conflict with variables from the environment; use at least one lower-case letter. This of course doesn't apply to variables that are set by something in your system and that you're only using or modifying.
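Putting a few of those points together, a sketch of the restructured reader (the curl call is elided; lower-cased variable names per the last point):
#!/bin/bash
shopt -s extglob                    # needed for the +(...) pattern below
arduino_port=/dev/ttyACM0
stty -F "$arduino_port" ispeed 9600 ospeed 9600 raw
while read -r f; do
    f=${f%%+($'\r')}                # strip trailing carriage return(s)
    # ... curl PUT to the API goes here ...
    echo "$(date) $f was read."
done < "$arduino_port"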

How to deal with NFS latency in shell scripts

I'm writing shell scripts where quite regularly some stuff is written to a file, after which an application that reads that file is executed. I find that throughout our company the network latency differs vastly, so a simple sleep 2, for example, will not be robust enough.
I tried to write a (configurable) timeout loop like this:
waitLoop()
{
    local timeout=$1
    local test="$2"
    if ! $test
    then
        local counter=0
        while ! $test && [ $counter -lt $timeout ]
        do
            sleep 1
            ((counter++))
        done
        if ! $test
        then
            exit 1
        fi
    fi
}
This works for test="[ -e $somefilename ]". However, testing existence is not enough, I sometimes need to test whether a certain string was written to the file. I tried
test="grep -sq \"^sometext$\" $somefilename", but this did not work. Can someone tell me why?
Are there other, less verbose options to perform such a test?
Don't set your test variable with command substitution - test=$(grep -sq "^sometext$" $somefilename) would store grep's output (nothing, given -q), not the command itself. The reason your grep isn't working is that quotes are really hard to pass in arguments: the embedded quotes are not re-parsed when $test is expanded. You'll need to use eval to get a second round of parsing:
if ! eval $test
I'd say the way to check for a string in a text file is grep. What's your exact problem with it?
Also, you might adjust your NFS mount parameters to address the root problem. A sync might also help. See the NFS docs.
If you want to use waitLoop in an "if", you might want to change the "exit" to a "return", so the rest of the script can handle the error situation (otherwise there isn't even a message to the user about what failed before the script dies).
The other issue is that using "$test" to hold a command means you don't get the shell's quote processing when the variable is expanded, only word splitting. So if you say test="grep \"foo\" \"bar baz\"", rather than looking for the three-letter string foo in the file with the seven-character name bar baz, it'll look for the five-character string "foo" in the two files "bar and baz" - the quote characters become part of the words.
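You can see that splitting directly:
test='grep "foo" "bar baz"'
printf '<%s> ' $test; echo
# prints: <grep> <"foo"> <"bar> <baz">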
So you can either decide you don't need the shell magic, and set test='grep -sq ^sometext$ somefilename', or you can get the shell to handle the quoting explicitly with something like:
if /bin/sh -c "$test"
then
...
Try using the file modification time to detect when it is written, without opening it. Something like:
old_mtime=$(stat --format="%Y" file)
# Write to file.
new_mtime=$old_mtime
while [[ "$old_mtime" -eq "$new_mtime" ]]; do
    sleep 2
    new_mtime=$(stat --format="%Y" file)
done
This won't work, however, if multiple processes try to access the file at the same time.
I just had the exact same problem. I used a similar approach to the timeout wait that you include in your OP; however, I also included a file-size check. I reset my timeout timer if the file had increased in size since last it was checked. The files I'm writing can be a few gig, so they take a while to write across NFS.
This may be overkill for your particular case, but I also had my writing process calculate a hash of the file after it was done writing. I used md5, but something like crc32 would work, too. This hash was broadcast from the writer to the (multiple) readers, and the reader waits until a) the file size stops increasing and b) the (freshly computed) hash of the file matches the hash sent by the writer.
We have a similar issue, but for different reasons. We are reading a file which is sent to an SFTP server. The machine running the script is not the SFTP server.
What I have done is set it up in cron (although a loop with a sleep would work too) to do a cksum of the file. When the old cksum matches the current cksum (the file has not changed for the determined amount of time) we know that the writes are complete, and transfer the file.
Just to be extra safe, we never overwrite a local file before making a backup, and only transfer at all when the remote file has two cksums in a row that match, and that cksum does not match the local file.
If you need code examples, I am sure I can dig them up.
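In the meantime, a minimal sketch of the idea (the filename and interval are made up; our real version runs from cron and also compares against the local copy before transferring):
file=incoming.dat
old=$(cksum "$file")
while sleep 30; do
    new=$(cksum "$file")
    [ "$new" = "$old" ] && break    # unchanged for a full interval: writes look complete
    old=$new
done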
The shell was splitting your predicate into words. Grab it all with $* as in the code below:
#! /bin/bash
waitFor()
{
    local tries=$1
    shift
    local predicate="$*"
    while [ $tries -ge 1 ]; do
        (( tries-- ))
        if $predicate >/dev/null 2>&1; then
            return
        else
            [ $tries -gt 0 ] && sleep 1
        fi
    done
    exit 1
}
pred='[ -e /etc/passwd ]'
waitFor 5 $pred
echo "$pred satisfied"
rm -f /tmp/baz
(sleep 2; echo blahblah >>/tmp/baz) &
(sleep 4; echo hasfoo >>/tmp/baz) &
pred='grep ^hasfoo /tmp/baz'
waitFor 5 $pred
echo "$pred satisfied"
Output:
$ ./waitngo
[ -e /etc/passwd ] satisfied
grep ^hasfoo /tmp/baz satisfied
Too bad the typescript isn't as interesting as watching it in real time.
OK... this is a bit wacky...
If you have control over the file, you might be able to create a 'named pipe' here. That way (depending on how the writing program works) you can monitor the file in a synchronized fashion.
At its simplest:
Create the named pipe:
mkfifo file.txt
Set up the sync'd receiver:
while :
do
    process.sh < file.txt
done
Create a test sender:
echo "Hello There" > file.txt
The 'process.sh' is where your logic goes: it will block until the sender has written its output. In theory the writer program won't need modifying...
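For illustration, process.sh could be as trivial as this (hypothetical; yours would hold the real logic):
#!/bin/bash
# Read everything the sender wrote into the pipe, line by line.
while read -r line; do
    echo "$(date): received: $line"
done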
WARNING: if the receiver is not running for some reason, you may end up blocking the sender!
Not sure it fits your requirement here, but might be worth looking into.
Or, to avoid the synchronization, try 'lsof'?
http://en.wikipedia.org/wiki/Lsof
Assuming that you only want to read from the file when nothing else is writing to it (i.e., the writing process has finished), you could check whether anything else has a file handle to it.
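A sketch of that check (note the inherent race between the test and your subsequent read):
# Loop until no process has the file open; lsof exits non-zero when it lists nothing.
while lsof -- /path/to/file >/dev/null 2>&1; do
    sleep 1
done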
