Bash script loop through netCDF files via placeholder and use cdo commands - bash

I want to loop through 40 netCDF files. There are 20 files with the variable PRECC and 20 files with the variable PRECL ("modelmember001.PRECC.192001-200512.nc", "modelmember002.PRECC.192001-200512.nc", ... , "modelmember020.PRECC.192001-200512.nc", and likewise for PRECL).
I need to perform multiple cdo (climate data operator) commands in the loop (add the PRECC and PRECL files, and change the time series from 1920-2005 to 1955-2005).
This is the code that I use:
datadir="path_to_mydatat"
workdir="path_to_folder_for_newfiles"
members="{001 .. 020}"
for model in $members
do
echo 'working with model' ${model}
echo cdo -s add ${datadir}/modelmember${members}.PRECC.192001-200512.nc ${datadir}/modelmember${members}.PRECL.192001-200512.nc ${workdir}/modelmember${members}.PRECT.192001-200512.nc
# echo cdo -s selyear,1955/2005 ${workdir}/modelmember${members}.PRECT.192001-200512.nc ${workdir}/modelmember${members}.PRECT.195501-200512.nc
done
Eventually I need 20 files with the name
"modelmember001.PRECT.195501-200512.nc", "modelmember002.PRECT.195501-200512.nc", ... , "modelmember020.PRECT.195501-200512.nc"
This is what I get when I run my code (deliberately with an "echo" in front of the cdo line):
$./cdo_add.sh
{001 .. 020}
working with model {001
cdo -s add /path_to_mydatat/modelmember{001 .. 020}.PRECC.192001-200512.nc /path_to_mydatat/modelmember{001 .. 020}.PRECL.192001-200512.nc /path_to_folder_for_newfiles/modelmember{001 .. 020}.PRECT.192001-200512.nc
working with model ..
cdo -s add /path_to_mydatat/modelmember{001 .. 020}.PRECC.192001-200512.nc /path_to_mydatat/modelmember{001 .. 020}.PRECL.192001-200512.nc /path_to_folder_for_newfiles/modelmember{001 .. 020}.PRECT.192001-200512.nc
working with model 020}
cdo -s add /path_to_mydatat/modelmember{001 .. 020}.PRECC.192001-200512.nc /path_to_mydatat/modelmember{001 .. 020}.PRECL.192001-200512.nc /path_to_folder_for_newfiles/modelmember{001 .. 020}.PRECT.192001-200512.nc
My code doesn't seem to loop through the members. There is something wrong with the way I use the placeholder "members" but I can't figure out how to fix it.
Does anyone have a suggestion?
Cheers!

Your code does not loop because a brace expansion stored in a variable is never expanded: bash performs brace expansion before variable expansion, and the double quotes suppress it in any case. The following therefore saves the literal string "{001 .. 020}" to the variable members, e.g.
members="{001 .. 020}"
When you use members in for model in $members, only normal word-splitting occurs because it is just a string, so you loop once with {001, then with .. and finally with 020}, not over the expected sequence 001, 002, 003, ... 020. (There should be no spaces between the numbers and .. to begin with, but removing them still doesn't allow you to use the expansion from inside a variable.)
To properly use the expansion, get rid of the members variable altogether and use {001..020} in the loop, e.g.
for model in {001..020} ## (notice NO space between 001 and ..)
example:
$ for m in {001..020}; do echo $m; done
001
002
003
004
005
006
007
008
009
010
011
012
013
014
015
016
017
018
019
020
That will allow you to loop with your sequence in model.
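If you would still like a named placeholder for the member list, one option is to store the expanded sequence in an array instead of a string. A minimal sketch, assuming bash 4 or later (older versions do not zero-pad ranges such as {001..020}):

```bash
#!/usr/bin/env bash
# The braces are expanded at assignment time, so the array holds 20 elements.
members=({001..020})

for model in "${members[@]}"; do
    echo "working with model ${model}"
done
```

Unlike the quoted string in the question, the array keeps each member as a separate element, so the loop sees 001, 002, ... 020.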
From the conversation in the comments, I now understand that you have 40 files prefixed by modelmemberXXX (where XXX is 001-020) followed by .PRECC* or .PRECL* (20 files each), and that you want to feed matching pairs to a cdo command. While the preferred way would be to loop over one matching glob, e.g. for i in modelmember*.PRECC*; do, you can also use your brace expansion approach, e.g.
for i in {001..020}
do
a=$(echo modelmember${i}.PRECC*)
b=$(echo modelmember${i}.PRECL*)
if [ -e "$a" ] && [ -f "$b" ]
then
printf "%s\n%s\n\n" "$a" "$b"
fi
done
(note the [ -e "$a" ] && [ -f "$b" ] test just makes sure both files in the pair exist before proceeding with the command (printf here))
Example Output
modelmember001.PRECC.192001-200512.nc
modelmember001.PRECL.192001-200512.nc
modelmember002.PRECC.192001-200512.nc
modelmember002.PRECL.192001-200512.nc
modelmember003.PRECC.192001-200512.nc
modelmember003.PRECL.192001-200512.nc
...
modelmember020.PRECC.192001-200512.nc
modelmember020.PRECL.192001-200512.nc
You simply need to make use of $a and $b with whatever cdo_cmd you need within the loop. (as noted in the comments, you need to change to the directory containing the files, or precede the filenames with path/to/the/files)
Preferred Way
Rather than using your brace expansion, it is probably preferred to loop over one set (either PRECC or PRECL), validate the other exists, then execute the command, e.g.
for i in modelmember*.PRECC*
do
b="${i/PRECC/PRECL}"
if [ -e "$i" ] && [ -f "$b" ]
then
printf "%s\n%s\n\n" "$i" "$b"
fi
done
(same output)
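Putting the pieces together, the two cdo steps from the question could be wired into the glob loop roughly as follows. This is a dry-run sketch, not a tested pipeline: the run variable and the process_pair function are names made up for illustration, the paths are the placeholders from the question, and the output names are derived by plain string substitution on the input name.

```bash
#!/usr/bin/env bash
datadir="path_to_mydatat"               # placeholder paths from the question
workdir="path_to_folder_for_newfiles"
run="echo"                              # dry run; set run="" to really call cdo

# Print (or run) the two cdo steps for one PRECC file.
process_pair() {
    local precc=$1
    local precl=${precc/PRECC/PRECL}              # matching PRECL file
    local base=${precc##*/}                       # strip the directory part
    local full="${workdir}/${base/PRECC/PRECT}"   # sum, 1920-2005
    local cut=${full/192001/195501}               # subset, 1955-2005
    $run cdo -s add "$precc" "$precl" "$full"
    $run cdo -s selyear,1955/2005 "$full" "$cut"
}

for f in "$datadir"/modelmember*.PRECC.*.nc; do
    if [ -f "$f" ] && [ -f "${f/PRECC/PRECL}" ]; then
        process_pair "$f"
    fi
done
```

Leaving run set to echo reproduces the question's trick of printing the commands first, so you can check the generated file names before anything is written.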

Related

bash script to iterate through folders

I have 5 folders in the Test directory, named output001 to output005. Each output folder contains files named *.cl_evt. I want to enter each output folder and run the command below from a bash shell script. How can I iterate through the folders and run this code from a bash shell script?
#!/bin/bash
echo -e "sw00092413002xpcw3po_cl.evt
sw00092413002xpcw3po_cl_bary4.evt
sw00092413002sao.fits.gz" | barycorr ra=253.467570 dec=39.760169
You can use the command seq (with a format) to generate the numbers in the names of your directories, like this:
for i in $(seq -f "%03g" 1 5); do
cd output$i
echo -e "sw00092413002xpcw3po_cl.evt
sw00092413002xpcw3po_cl_bary4.evt
sw00092413002sao.fits.gz" | barycorr ra=253.467570 dec=39.760169
cd ..
done
If you want to adjust the name of the data file in each of the sub-directories, you can write it this way:
for i in $(seq -f "%03g" 1 5); do
cd output$i
f1=$(ls sw*pcw3po_cl.evt 2>/dev/null)
if [ "$f1" == "" ] ; then
echo "no data file in directory output$i"
cd .. # go back up before skipping, otherwise the next cd output$i fails
continue
else
f2=${f1/pcw3po_cl.evt/pcw3po_cl_bary4.evt}
f3=$(ls sw*sao.fits.gz)
echo -e "$f1
$f2
$f3" | barycorr ra=253.467570 dec=39.760169
fi
cd ..
done
Break-down of the script:
the command :
seq -f "%03g" 1 5
generates the sequence:
001
002
003
004
005
Which means that the loop :
for i in $(seq -f "%03g" 1 5); do
will loop 5 times, with variable i successively taking the value 001, 002, 003, 004, 005
Inside each loop, we're running a few commands. (Let's analyse the first loop when i contains 001) :
the first line of the loop :
cd output$i
is equivalent to cd output001 (so we go to the directory output001).
The last line of the loop:
cd ..
goes up one level in the directory tree (i.e. it returns to the directory where you were at the beginning of the script).
The following command executes : ls sw*pcw3po_cl.evt inside the directory output001 and puts the result in the variable f1 :
f1=$(ls sw*pcw3po_cl.evt)
The following command takes variable f1, substitutes the string pcw3po_cl.evt with pcw3po_cl_bary4.evt and puts the result in the variable f2 :
f2=${f1/pcw3po_cl.evt/pcw3po_cl_bary4.evt}
The following command executes : ls sw*sao.fits.gz inside the directory output001 and puts the result in the variable f3 :
f3=$(ls sw*sao.fits.gz)
The following command :
echo -e "$f1
$f2
$f3" | barycorr ra=253.467570 dec=39.760169
prints out the values of the 3 variables f1, f2 and f3, one per line. f1 and f3 hopefully contain the names of the files in the directory that match sw*pcw3po_cl.evt and sw*sao.fits.gz, and f2 contains the name derived from f1, ending in pcw3po_cl_bary4.evt.
Then it "pipes" these values into your application named barycorr as standard input of this application (so barycorr can read these filenames and probably process the files).
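One way to avoid keeping track of cd .. altogether is to do the work for each directory inside a subshell, so the directory change cannot leak into the rest of the script. A sketch under assumptions: process_dir and tool are names invented here, and cat stands in for barycorr so the example can run on any machine.

```bash
#!/usr/bin/env bash
# Hypothetical stand-in so the sketch runs anywhere; for real use replace with:
#   tool=(barycorr ra=253.467570 dec=39.760169)
tool=(cat)

process_dir() {
    (   # subshell: the cd below does not affect the caller
        cd "$1" || exit 1
        f1=$(ls sw*pcw3po_cl.evt 2>/dev/null)
        if [ -z "$f1" ]; then
            echo "no data file in directory $1" >&2
            exit 1
        fi
        f2=${f1/pcw3po_cl.evt/pcw3po_cl_bary4.evt}
        f3=$(ls sw*sao.fits.gz)
        printf '%s\n' "$f1" "$f2" "$f3" | "${tool[@]}"
    )
}

for i in $(seq -f "%03g" 1 5); do
    process_dir "output$i" || echo "skipping output$i" >&2
done
```

Because each iteration starts from the original directory, a missing or empty output folder only skips that folder instead of derailing the following ones.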

Bash: checking substring increments with modular arithmetic

I have a list of files with file names that contain a substring of 6 numbers that represents HHMMSS, HH: 2 digits hour, MM: 2 digits minutes, SS: 2 digits seconds.
If the list of files is ordered, the increments should be in steps of 30 minutes, that is, the first substring should be 000000, followed by 003000, 010000, 013000, ..., 233000.
I want to check that no file is missing by iterating over the list of files and checking that none of these substrings is missing. My approach:
string_check=000000
for file in ${file_list[@]}; do
if [[ ${file:22:6} == $string_check ]]; then
echo "Ok"
else
echo "Problem: an hour (file) is missing"
exit 99
fi
string_check=$((string_check+3000)) #this is the key line
done
The second-to-last line is the key. The result should be formatted to 6 digits, which I know how to do, but I want to add time like a clock, or, in more specific words, with arithmetic modulo 60. How can that be done?
Assumptions:
all 6-digit strings are of the format xx[03]000 (ie, the minutes have to be exactly 00 or 30, with no seconds)
if there are strings like xx1529 ... these will be ignored (see 2nd half of answer - use of comm - to address OP's comment about these types of strings being an error)
Instead of trying to do a bunch of mod 60 math for the MM (minutes) portion of the string, we can use a sequence generator to generate all the desired strings:
$ for string_check in {00..23}{00,30}00; do echo $string_check; done
000000
003000
010000
013000
... snip ...
230000
233000
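If you would rather keep the incrementing approach from the question, the clock arithmetic can be done by converting to minutes since midnight. A small sketch, assuming the seconds are always 00; next_slot is a name chosen here for illustration:

```bash
#!/usr/bin/env bash
# Print the HHMMSS string 30 minutes after the one given (seconds assumed 00).
next_slot() {
    local hh=$((10#${1:0:2}))   # 10# forces base 10, so "08" is not read as octal
    local mm=$((10#${1:2:2}))
    local total=$(( (hh * 60 + mm + 30) % 1440 ))   # minutes since midnight, wraps after 24 h
    printf '%02d%02d00\n' $((total / 60)) $((total % 60))
}

string_check=000000
for n in 1 2 3; do
    echo "$string_check"        # 000000, then 003000, then 010000
    string_check=$(next_slot "$string_check")
done
```

The 10# prefix matters: without it, bash treats values with a leading zero as octal and rejects 08 and 09.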
While OP should be able to add this to the current code, I'm thinking we might go one step further and look at pre-parsing all of the filenames, pulling the 6-digit strings into an associative array (ie, the 6-digit strings act as the indexes), eg:
unset myarray
declare -A myarray
for file in "${file_list[@]}"
do
myarray[${file:22:6}]+=" ${file}" # in case multiple files have same 6-digit string
done
Using the sequence generator as the driver of our logic, we can pull this together like such:
for string_check in {00..23}{00,30}00
do
[[ -z "${myarray[${string_check}]}" ]] &&
echo "Problem: (file) '${string_check}' is missing"
done
NOTE: OP can decide if the process should finish checking all strings or if it should exit on the first missing string (per OP's current code).
One idea for using comm to compare the 2 lists of strings:
# display sequence generated strings that do not exist in the array:
comm -23 <(printf "%s\n" {00..23}{00,30}00) <(printf "%s\n" "${!myarray[@]}" | sort)
# OP has commented that strings not like 'xx[03]000' should generate an error;
# display strings (extracted from file names) that do not exist in the sequence
comm -13 <(printf "%s\n" {00..23}{00,30}00) <(printf "%s\n" "${!myarray[@]}" | sort)
Where:
comm -23 - display only the lines from the first 'file' that do not exist in the second 'file' (ie, missing sequences of the format xx[03]000)
comm -13 - display only the lines from the second 'file' that do not exist in the first 'file' (ie, filenames with strings not of the format xx[03]000)
These lists could then be used as input to a loop, or passed to xargs, for additional processing as needed; keeping in mind the comm -13 output will display the indices of the array, while the associated contents of the array will contain the name of the original file(s) from which the 6-digit string was derived.
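As a worked example of the comm -23 comparison, here is a self-contained sketch with made-up data: a complete day of strings from which two slots have been removed. missing_slots is a hypothetical helper name, and it takes the extracted 6-digit strings as arguments.

```bash
#!/usr/bin/env bash
# Print every expected half-hour string that is absent from the arguments.
missing_slots() {
    comm -23 <(printf '%s\n' {00..23}{00,30}00) <(printf '%s\n' "$@" | sort)
}

# Hypothetical sample: all 48 slots except 013000 and 220000.
found=()
for s in {00..23}{00,30}00; do
    [[ $s == 013000 || $s == 220000 ]] || found+=("$s")
done

missing_slots "${found[@]}"     # prints 013000 and 220000
```

Both inputs to comm must be sorted; the generated sequence already is, and the extracted strings are passed through sort.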
This is easy to do with a POSIX shell, using only built-ins:
#!/usr/bin/env sh
# Print an x for each glob matched file, and store result in string_check
string_check=$(printf '%.0sx' ./*[0-2][0-9][03]000*)
# Now string_check length reflects the number of matches
if [ ${#string_check} -eq 48 ]; then
echo "Ok"
else
echo "Problem: an hour (file) is missing"
exit 99
fi
Alternatively:
#!/usr/bin/env sh
if [ "$(printf '%.0sx' ./*[0-2][0-9][03]000*)" \
= 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx' ]; then
echo "Ok"
else
echo "Problem: an hour (file) is missing"
exit 99
fi

Is there a way to implement a counter in bash but for letters instead of numbers?

I'm working with an existing script which was written a bit messily. Setting up a loop with all of the spaghetti code could make a bigger headache than I want to deal with in the near term. Maybe when I have more time I can clean it up but for now, I'm just looking for a simple fix.
The script deals with virtual disks on a xen server. It reads multipath output and asks if particular LUNs should be formatted in any way based on specific criteria. However, rather than taking that disk path and inserting it, already formatted, into a configuration file, it simply presents every line in the format
'phy:/dev/mapper/UUID,xvd?,w',
UUID, of course, is an actual UUID.
The script actually presents each of the found LUNs in this format expecting the user to copy and paste them into the config file replacing each ? with a letter in sequence. This is tedious at best.
There are several ways to increment a number in bash. Among others:
var=$((var+1))
((var+=1))
((var++))
Is there a way to do the same with characters which doesn't involve looping over the entire alphabet such that I could easily "increment" the disk assignment from xvda to xvdb, etc?
To do an "increment" on a letter, define the function:
incr() { LC_CTYPE=C printf "\\$(printf '%03o' "$(($(printf '%d' "'$1")+1))")"; }
Now, observe:
$ echo $(incr a)
b
$ echo $(incr b)
c
$ echo $(incr c)
d
Because this increments up through ASCII, incr z becomes {.
How it works
The first step is to convert a letter to its ASCII numeric value. For example, a is 97:
$ printf '%d' "'a"
97
The next step is to increment that:
$ echo "$((97+1))"
98
Or:
$ echo "$(($(printf '%d' "'a")+1))"
98
The last step is to convert the newly incremented number back to a letter:
$ LC_CTYPE=C printf "\\$(printf '%03o' "98")"
b
Or:
$ LC_CTYPE=C printf "\\$(printf '%03o' "$(($(printf '%d' "'a")+1))")"
b
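Since incrementing z yields {, you may want a guard for disk names. A sketch using the same printf technique, assuming only lowercase a-y are valid inputs; incr_checked is a name invented for this example:

```bash
#!/usr/bin/env bash
# Like incr, but refuse anything that would leave the range a-z.
incr_checked() {
    local code
    code=$(printf '%d' "'$1")            # ASCII value of the first character
    if (( code < 97 || code >= 122 )); then
        echo "incr_checked: cannot increment '$1'" >&2
        return 1
    fi
    LC_CTYPE=C printf "\\$(printf '%03o' "$((code + 1))")"
}

var=c
var=$(incr_checked "$var") && echo "$var"   # prints d
```

The function returns a non-zero status instead of a wrong character, so the caller can decide what to do when the letters run out.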
Alternative
With bash, we can define an associative array to hold the next character:
$ declare -A Incr; last=a; for next in {b..z}; do Incr[$last]=$next; last=$next; done; Incr[z]=a
Or, if you prefer code spread out over multiple lines:
declare -A Incr
last=a
for next in {b..z}
do
Incr[$last]=$next
last=$next
done
Incr[z]=a
With this array, characters can be incremented via:
$ echo "${Incr[a]}"
b
$ echo "${Incr[b]}"
c
$ echo "${Incr[c]}"
d
In this version, the increment of z loops back to a:
$ echo "${Incr[z]}"
a
How about an array with entries A-Z assigned to indexes 1-26?
IFS=':' read -r -a alpharray <<< ":A:B:C:D:E:F:G:H:I:J:K:L:M:N:O:P:Q:R:S:T:U:V:W:X:Y:Z"
This has 1=A, 2=B, etc. If you want 0=A, 1=B, and so on, remove the first colon.
IFS=':' read -r -a alpharray <<< "A:B:C:D:E:F:G:H:I:J:K:L:M:N:O:P:Q:R:S:T:U:V:W:X:Y:Z"
Then later, where you actually need the letter;
var=$((var+1))
echo "'phy:/dev/mapper/UUID,xvd${alpharray[$var]},w',"
The only problem is that if you end up running past 26 letters, you'll start getting blanks returned from the array.
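To make the array idea concrete, here is a sketch that prints one config line per device and stops with an error instead of returning blanks once the letters run out. It assumes lowercase letters are wanted for the xvd names; emit_disks is a made-up function name, and UUID-1 and UUID-2 are placeholders for real UUIDs.

```bash
#!/usr/bin/env bash
letters=({a..z})

# Print one config line per device name given, assigning xvda, xvdb, ...
emit_disks() {
    local idx=0 dev
    for dev in "$@"; do
        if (( idx >= ${#letters[@]} )); then
            echo "emit_disks: more than ${#letters[@]} disks" >&2
            return 1
        fi
        printf "'phy:/dev/mapper/%s,xvd%s,w',\n" "$dev" "${letters[idx]}"
        idx=$((idx + 1))
    done
}

# UUID-1 and UUID-2 are hypothetical placeholders for real multipath UUIDs.
emit_disks UUID-1 UUID-2
```

The output can then be appended to the config file directly rather than pasted by hand.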
Use a Bash 4 Range
You can use a Bash 4 feature that lets you specify a range within a sequence expression. For example:
for letter in {a..z}; do
echo "phy:/dev/mapper/UUID,xvd${letter},w"
done
See also Ranges in the Bash Wiki.
Here's a function that will return the next letter in the range a-z. An input of 'z' returns 'a'.
nextl(){
((num=(36#$(printf '%c' $1)-9) % 26+97));
printf '%b\n' '\x'$(printf "%x" $num);
}
It treats the first letter of the input as a base 36 integer, subtracts 9, and returns the character whose ordinal number is 'a' plus that value mod 26.
Use Jot
While the Bash range option uses built-ins, you can also use a utility like the BSD jot utility. This is available on macOS by default, but your mileage may vary on Linux systems. For example, you'll need to install athena-jot on Debian.
More Loops
One trick here is to pre-populate a Bash array and then use an index variable to grab your desired output from the array. For example:
letters=( "" $(jot -w %c 26 a) )
for idx in 1 26; do
echo ${letters[$idx]}
done
A Loop-Free Alternative
Note that you don't have to increment the counter in a loop. You can do it other ways, too. Consider the following, which will increment any letter passed to the function without having to prepopulate an array:
increment_var () {
local new_var=$(jot -nw %c 2 "$1" | tail -1)
if [[ "$new_var" == "{" ]]; then
echo "Error: You can't increment past 'z'" >&2
exit 1
fi
echo -n "$new_var"
}
var="c"
var=$(increment_var "$var")
echo "$var"
This is probably closer to what the OP wants, but it certainly seems more complex and less elegant than the original loop recommended elsewhere. However, your mileage may vary, and it's good to have options!

Bash For loop - multiple variables, not using arrays?

I have run into an issue that seems like it should have an easy answer, but I keep hitting walls.
I'm trying to create a directory structure that contains files that are named via two different variables. For example:
101_2465
203_9746
526_2098
I am looking for something that would look something like this:
for NUM1 in 101 203 526 && NUM2 in 2465 9746 2098
do
mkdir $NUM1_$NUM2
done
I thought about just setting the values of NUM1 and NUM2 into arrays, but it overcomplicated the script -- I have to keep each line of code as simple as possible, as it is being used by people who don't know much about coding. They are already familiar with a for loop set up using the example above (but only using 1 variable), so I'm trying to keep it as close to that as possible.
Thanks in advance!
while read NUM1 NUM2; do
mkdir ${NUM1}_$NUM2
done << END
101 2465
203 9746
526 2098
END
Note that underscore is a valid variable name character, so you need to use braces to separate the name NUM1 from the underscore; otherwise the shell looks for a variable named NUM1_.
...setting the values of NUM1 and NUM2 into arrays, but it overcomplicated the script...
No-no-no. Everything else will be more complicated than arrays.
NUM1=( 101 203 526 )
NUM2=( 2465 9746 2098 )
for (( i=0; i<${#NUM1[@]}; i++ )); do
echo ${NUM1[$i]}_${NUM2[$i]}
done
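A variation on the array loop that iterates over the array's indices and first checks that both lists have the same length. This is only a sketch; pairs is a name chosen here, and echo stands in for mkdir.

```bash
#!/usr/bin/env bash
NUM1=( 101 203 526 )
NUM2=( 2465 9746 2098 )

# Print NUM1[i]_NUM2[i] for every index, refusing lists of different lengths.
pairs() {
    if (( ${#NUM1[@]} != ${#NUM2[@]} )); then
        echo "NUM1 and NUM2 must have the same number of entries" >&2
        return 1
    fi
    local i
    for i in "${!NUM1[@]}"; do      # "${!NUM1[@]}" expands to the indices 0 1 2
        echo "${NUM1[i]}_${NUM2[i]}"
    done
}

pairs
```

People editing the script only need to touch the two list lines at the top, which keeps it close to the single-variable for loop they already know.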
One way is to separate the entries in your two variables by newlines, and then use paste to get them together:
a='101 203 526'
b='2465 9746 2098'
# Convert space-separated lists into newline-separated lists
a="$(echo $a | sed 's/ /\n/g')"
b="$(echo $b | sed 's/ /\n/g')"
# Acquire newline-separated list of tab-separated pairs
pairs="$(paste <(echo "$a") <(echo "$b"))"
# Loop over lines in $pairs
IFS='
'
for p in $pairs; do
echo "$p" | awk '{print $1 "_" $2}'
done
Output:
101_2465
203_9746
526_2098
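The same pairing can be written more compactly by letting paste insert the underscore itself. A sketch assuming the entries contain no spaces or glob characters; join_pairs is a hypothetical name:

```bash
#!/usr/bin/env bash
a='101 203 526'
b='2465 9746 2098'

# $a and $b are deliberately unquoted so each number lands on its own line;
# paste -d_ then joins the two columns with an underscore.
join_pairs() {
    paste -d_ <(printf '%s\n' $a) <(printf '%s\n' $b)
}

join_pairs | while read -r name; do
    echo "$name"        # replace echo with mkdir to create the directories
done
```

This avoids the sed and awk steps, at the cost of relying on bash process substitution.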

Find all numbers between two numbers in Bash

I have two variables like:
a=200
b=205
and want to find all numbers between these two numbers (including the specified numbers themselves).
Check the seq instruction:
seq $a $b
seq is a good tool (as ChronoTrigger already stated), but it is not a bash builtin. Unfortunately the {1..4} notation does not work with variables. But there is a workaround:
a=200; b=205; eval "t=({$a..$b})"; echo ${t[*]}
Output:
200 201 202 203 204 205
The resulting array can be used in a for loop later: for i in ${t[*]};{ ...;}. But it is better to use a for((...)) loop for that, as 1_CR stated.
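Because eval executes whatever ends up in the string, it is worth checking that both bounds really are integers before expanding them. A sketch of one way to do that; range is a name made up for this example:

```bash
#!/usr/bin/env bash
# Print the integers from $1 to $2, one per line, via brace expansion.
range() {
    local re='^-?[0-9]+$'
    if ! [[ $1 =~ $re && $2 =~ $re ]]; then
        echo "range: integer bounds required" >&2
        return 1
    fi
    eval "printf '%s\n' {$1..$2}"
}

a=200; b=205
t=( $(range "$a" "$b") )
echo "${t[*]}"      # 200 201 202 203 204 205
```

With the check in place, input such as a bound containing a semicolon is rejected instead of being run as a command.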
ADDED
If a string should be added as a prefix or suffix to all elements, that is pretty easy to do:
echo ${t[*]/#/ab}
echo ${t[*]/%/cd}
Output:
ab200 ab201 ab202 ab203 ab204 ab205
200cd 201cd 202cd 203cd 204cd 205cd
ADDED #2
If the array elements need a fixed number of digits, this can be used:
a=0; b=5; eval "t=({$a..$b})"; printf -v tt "%03d " ${t[*]}; t=($tt)
echo Array length: ${#t[*]}
echo ${t[*]}
Output:
Array length: 6
000 001 002 003 004 005
You could use the bash C-style for loop. Note that a $ is not needed before the a and b; this is characteristic of bash arithmetic expressions.
for ((i=a; i<=b; ++i))
do
echo $i
done
Alternately, to capture the numbers in an array
arr=()
for ((i=a; i<=b; ++i))
do
arr+=($i)
done
echo "${arr[*]}"
