bash script to iterate through folders - bash

I have 5 folders in Test directory with name output001 to output005. Each output folders have files with filename *.cl_evt in it. I want to enter the each output folders and want to run the below command in bash shell script. How to iterate through each folder and run the below code through bash shell script ?
#!/bin/bash
echo -e "sw00092413002xpcw3po_cl.evt
sw00092413002xpcw3po_cl_bary4.evt
sw00092413002sao.fits.gz" | barycorr ra=253.467570 dec=39.760169

You can use the command seq (with a format) to generate the numbers in the name of your directories... Like that:
for i in $(seq -f "%03g" 1 5); do
cd output$i
echo -e "sw00092413002xpcw3po_cl.evt
sw00092413002xpcw3po_cl_bary4.evt
sw00092413002sao.fits.gz" | barycorr ra=253.467570 dec=39.760169
cd ..
done
If you want to adjust the name of the data file in each of the sub-directories, you can write it this way:
for i in $(seq -f "%03g" 1 5); do
cd output$i
f1=$(ls sw*pcw3po_cl.evt 2>/dev/null)
if [ "$f1" == "" ] ; then
echo "no data file in directory output$i"
continue
else
f2=${f1/pcw3po_cl.evt/pcw3po_cl_bary4.evt}
f3=$(ls sw*sao.fits.gz)
echo -e "$f1
$f2
$f3" | barycorr ra=253.467570 dec=39.760169
fi
cd ..
done
Break-down of the script:
the command :
seq -f "%03g" 1 5
generates the sequence:
001
002
003
004
005
Which means that the loop :
for i in $(seq -f "%03g" 1 5); do
will loop 5 times, with variable i successively taking the value 001, 002, 003, 004, 005
Inside each loop, we're running a few commands. (Let's analyse the first loop when i contains 001) :
the fist line of the loop :
cd output$i
is equivalent to output001 (so we go to the directory output001).
The last line of the loop:
cd ..
goes up one level in the directory tree (i.e. it returns to the directory where you were at the beginning of the script).
The following command executes : ls sw*pcw3po_cl.evt inside the directory output001 and puts the result in the variable f1 :
f1=$(ls sw*pcw3po_cl.evt)
The following command takes variable f1, substitutes the string pcw3po_cl.evt with pcw3po_cl_bary4.evt and puts the result in the variable f2 :
f2=${f1/pcw3po_cl.evt/pcw3po_cl_bary4.evt}
The following command executes : ls sw*sao.fits.gz inside the directory output001 and puts the result in the variable f3 :
f3=$(ls sw*sao.fits.gz)
The following command :
echo -e "$f1 $f2 $f3" | barycorr ra=253.467570 dec=39.760169
prints out the values of the 3 variables f1 f2 and f3 (which hopefully contains the file of the directory whose name matches (successively) swpcw3po_cl.evt, swpcw3po_cl_bary4.evt, and sw*sao.fits.gz
Then it "pipes" these values into your application named barycorr as standard input of this application (so barycorr can read these filenames and probably process the files).

Related

Loop Script from Input File

I have a reference file with device names in them. For example WABEL8499IPM101. I'm using this script to set the base name (without the last 3 digits) to look at the reference file and see what is already used. If 101 is used it will create a file for me with 102, 103 if I request 2 total. I'm looking to use an input file to run it multiple times. I'm also trying to figure out how to start at 101 if there isn't a name found when searching the reference file
I would like to loop this using an input file instead of manually entering bash test.sh WABEL8499IPM 2 each time. I would like to be able to build an input file of all the names that need compared and then output. It would also be nice that if there isn't a match that it starts creating names at WABEL8499IPM101 instead of just WABEL8499IPM1.
Input file example:
ColumnA (BASE NAME) ColumnB (QUANTITY)
WABEL8499IPM 2
Script:
SRCFILE="~/Desktop/deviceinfo.csv"
LOGDIR="~/Desktop/"
LOGFILE="$LOGDIR/DeviceNames.csv"
# base name, such as "WABEL8499IPM"
device_name=$1
# quantity, such as "2"
quantityNum=$2
# the largest in sequence, such as "WABEL8499IPM108"
max_sequence_name=$(cat $SRCFILE | grep -o -e "$device_name[0-9]*" | sort --reverse | head -n 1)
# extract the last 3digit number (such as "108") from max_sequence_name
max_sequence_num=$(echo $max_sequence_name | rev | cut -c 1-3 | rev)
# create new sequence_name
# such as ["WABEL8499IPM109", "WABEL8499IPM110"]
array_new_sequence_name=()
for i in $(seq 1 $quantityNum);
do
cnum=$((max_sequence_num + i))
array_new_sequence_name+=($(echo $device_name$cnum))
done
#CODE FOR CREATING OUTPUT FILE HERE
#for fn in ${array_new_sequence_name[#]}; do touch $fn; done;
# write log
for sqn in ${array_new_sequence_name[#]};
do
echo $sqn >> $LOGFILE
done
Usage:
bash test.sh WABEL8499IPM 2
Result in the log file:
WABEL8499IPM109
WABEL8499IPM110
Just wrap a loop around your code instead of assuming the args come in on the command line.
SRCFILE="~/Desktop/deviceinfo.csv"
LOGDIR="~/Desktop/"
LOGFILE="$LOGDIR/DeviceNames.csv"
while read device_name quantityNum
do max_sequence_name=$( grep -o -e "$device_name[0-9]*" $SRCFILE |
sort --reverse | head -n 1)
max_sequence_num=${max_sequence_name: -3}
array_new_sequence_name=()
for i in $(seq 1 $quantityNum)
do cnum=$((max_sequence_num + i))
array_new_sequence_name+=("$device_name$cnum")
done
for sqn in ${array_new_sequence_name[#]};
do echo $sqn >> $LOGFILE
done
done < input.file
I'd maybe pass the input file as the parameter now.

Make cat command to operate recursively looping through a directory

I have a large directory of data files which I am in the process of manipulating to get them in a desired format. They each begin and end 15 lines too soon, meaning I need to strip the first 15 lines off one file and paste them to the end of the previous file in the sequence.
To begin, I have written the following code to separate the relevant data into easy chunks:
#!/bin/bash
destination='media/user/directory/'
for file1 in `ls $destination*.ascii`
do
echo $file1
file2="${file1}.end"
file3="${file1}.snip"
sed -e '16,$d' $file1 > $file2
sed -e '1,15d' $file1 > $file3
done
This worked perfectly, so the next step is the worlds simplest cat command:
cat $file3 $file2 > outfile
However, what I need to do is to stitch file2 to the previous file3. Look at this screenshot of the directory for better understanding.
See how these files are all sequential over time:
*_20090412T235945_20090413T235944_* ### April 13
*_20090413T235945_20090414T235944_* ### April 14
So I need to take the 15 lines snipped off the April 14 example above and paste it to the end of the April 13 example.
This doesn't have to be part of the original code, in fact it would be probably best if it weren't. I was just hoping someone would be able to help me get this going.
Thanks in advance! If there is anything I have been unclear about and needs further explanation please let me know.
"I need to strip the first 15 lines off one file and paste them to the end of the previous file in the sequence."
If I understand what you want correctly, it can be done with one line of code:
awk 'NR==1 || FNR==16{close(f); f=FILENAME ".new"} {print>f}' file1 file2 file3
When this has run, the files file1.new, file2.new, and file3.new will be in the new form with the lines transferred. Of course, you are not limited to three files: you may specify as many as you like on the command line.
Example
To keep our example short, let's just strip the first 2 lines instead of 15. Consider these test files:
$ cat file1
1
2
3
$ cat file2
4
5
6
7
8
$ cat file3
9
10
11
12
13
14
15
Here is the result of running our command:
$ awk 'NR==1 || FNR==3{close(f); f=FILENAME ".new"} {print>f}' file1 file2 file3
$ cat file1.new
1
2
3
4
5
$ cat file2.new
6
7
8
9
10
$ cat file3.new
11
12
13
14
15
As you can see, the first two lines of each file have been transferred to the preceding file.
How it works
awk implicitly reads each file line-by-line. The job of our code is to choose which new file a line should be written to based on its line number. The variable f will contain the name of the file that we are writing to.
NR==1 || FNR==16{f=FILENAME ".new"}
When we are reading the first line of the first file, NR==1, or when we are reading the 16th line of whatever file we are on, FNR==16, we update f to be the name of the current file with .new added to the end.
For the short example, which transferred 2 lines instead of 15, we used the same code but with FNR==16 replaced with FNR==3.
print>f
This prints the current line to file f.
(If this was a shell script, we would use >>. This is not a shell script. This is awk.)
Using a glob to specify the file names
destination='media/user/directory/'
awk 'NR==1 || FNR==16{close(f); f=FILENAME ".new"} {print>f}' "$destination"*.ascii
Your task is not that difficult at all. You want to gather a list of all _end files in the directory (using a for loop and globbing, NOT looping on the results of ls). Once you have all the end files, you simply parse the dates using parameter expansion w/substing removal say into d1 and d2 for date1 and date2 in:
stuff_20090413T235945_20090414T235944_end
| d1 | | d2 |
then you simply subtract 1 from d1 into say date0 or d0 and then construct a previous filename out of d0 and d1 using _snip instead of _end. Then just test for the existence of the previous _snip filename, and if it exists, paste your info from the current _end file to the previous _snip file. e.g.
#!/bin/bash
for i in *end; do ## find all _end files
d1="${i#*stuff_}" ## isolate first date in filename
d1="${d1%%T*}"
d2="${i%T*}" ## isolate second date
d2="${d2##*_}"
d0=$((d1 - 1)) ## subtract 1 from first, get snip d1
prev="${i/$d1/$d0}" ## create previous 'snip' filename
prev="${prev/$d2/$d1}"
prev="${prev%end}snip"
if [ -f "$prev" ] ## test that prev snip file exists
then
printf "paste to : %s\n" "$prev"
printf " from : %s\n\n" "$i"
fi
done
Test Input Files
$ ls -1
stuff_20090413T235945_20090414T235944_end
stuff_20090413T235945_20090414T235944_snip
stuff_20090414T235945_20090415T235944_end
stuff_20090414T235945_20090415T235944_snip
stuff_20090415T235945_20090416T235944_end
stuff_20090415T235945_20090416T235944_snip
stuff_20090416T235945_20090417T235944_end
stuff_20090416T235945_20090417T235944_snip
stuff_20090417T235945_20090418T235944_end
stuff_20090417T235945_20090418T235944_snip
stuff_20090418T235945_20090419T235944_end
stuff_20090418T235945_20090419T235944_snip
Example Use/Output
$ bash endsnip.sh
paste to : stuff_20090413T235945_20090414T235944_snip
from : stuff_20090414T235945_20090415T235944_end
paste to : stuff_20090414T235945_20090415T235944_snip
from : stuff_20090415T235945_20090416T235944_end
paste to : stuff_20090415T235945_20090416T235944_snip
from : stuff_20090416T235945_20090417T235944_end
paste to : stuff_20090416T235945_20090417T235944_snip
from : stuff_20090417T235945_20090418T235944_end
paste to : stuff_20090417T235945_20090418T235944_snip
from : stuff_20090418T235945_20090419T235944_end
(of course replace stuff_ with your actual prefix)
Let me know if you have questions.
You could store the previous $file3 value in a variable (and do a check if it is not the first run with -z check):
#!/bin/bash
destination='media/user/directory/'
prev=""
for file1 in $destination*.ascii
do
echo $file1
file2="${file1}.end"
file3="${file1}.snip"
sed -e '16,$d' $file1 > $file2
sed -e '1,15d' $file1 > $file3
if [ -z "$prev" ]; then
cat $prev $file2 > outfile
fi
prev=$file3
done

if output file from loop isn't empty, copy last line bash

I have a loop, in a bash script. It runs a programme that by default outputs a text file when it works, and no file if it doesn't. I'm running it a large number of times (> 500K) so I want to merge the output files, row by row. If one iteration of the loop creates a file, I want to take the LAST line of that file, append it to a master output file, then delete the original so I don't end up with 1000s of files in one directory. The Loop I have so far is:
oFile=/path/output/outputFile_
oFinal=/path/output.final
for counter in {101..200}
do
$programme $counter -out $oFile$counter
if [ -s $oFile$counter ] ## This returns TRUE if file isn't empty, right?
then
out=$(tail -1 $oFile$counter)
final=$out$oFile$counter
$final >> $oFinal
fi
done
However, it doesn't work properly, as it seems to not return all the files I want. So is the conditional wrong?
You can be clever and pass the programme a process substitution instead of a "real" file:
oFinal=/path/output.final
for counter in {101..200}
do
$programme $counter -out >(tail -n 1)
done > $oFinal
$programme will treat the process substitution as a file, and all the lines written to it will be processed by tail
Testing: my "programme" outputs 2 lines if the given counter is even
$ cat programme
#!/bin/bash
if (( $1 % 2 == 0 )); then
{
echo ignore this line
echo $1
} > $2
fi
$ ./programme 101 /dev/stdout
$ ./programme 102 /dev/stdout
ignore this line
102
So, this loop should output only the even numbers between 101 and 200
$ for counter in {101..200}; do ./programme $counter >(tail -1); done
102
104
[... snipped ...]
198
200
Success.

Shell script help to fix it please

Example i run
sh mycode Manu gg44
And I need to get file with name Manu
with content:
gg44
192.168.1.2.(second line) (this number I explain below)
(in the directory DIR=/h/Manu/HOME/hosts there is already file Alex
cat Alex
ff55
198.162.1.1.(second line))
So mycode creates file named Manu with the first line gg44 and generate IP at the second line.
BUT for generating IP he has compare with Alex file IP. So second line of Manu has to be 198.162.1.2. If we have more than one files in the directory then we have to check all second lines of all files and then generate according to them.
[CODE]
DIR=/h/Manu/HOME/hosts #this is a directory where i have my files (structure of the files above)
for j in $1 $2 #$1 is Manu; $2 is gg44
do
if [ -d $DIR ] #checking if directory exists (it exists already)
then #if it exists
for i in $* # for every file in this directory do operation
do
sort /h/ManuHOME/hosts/* | tail -2 | head -1 # get second line of every file
IFS="." read A B C D # divide number in second line into 4 parts (our number 192.168.1.1. for example)
if [ "$D" != 255 ] #compare D (which is 1 in our example: if its less than 255)
then
D=` expr $D + 1 ` #then increment it by 1
else
C=` expr $C + 1 ` #otherwise increment C and make D=0
D=0
fi
echo "$2 "\n" $A.$B.$C.$D." >/h/Manu/HOME/hosts/$1
done done #get $2 (which is gg44 in example as a first line and get ABCD as a second line)[/CODE]
In the result it creates file with name Manu and first line, but second line is totally wrong. It gives me ...1.
Also error message
sort: open failed: /h/u15/c2/00/c2rsaldi/HOME/hosts/yu: No such file or directory
yu n ...1.
#!/bin/bash
dir=/h/Manu/HOME/hosts
filename=$dir/$1
firstline=$2
# find the max IP address from all current files:
maxIP=$(awk 'FNR==2' $dir/* | cut -d. -f4 | sort -nr | head -1)
ip=198.162.1.$(( maxIP + 1 ))
cat > $filename <<END
$firstline
$ip
END
I'll leave it up to you to decide what to do when you get more than 255 files...

bash for loop with numerated names

I'm currently working on a maths project and just run into a bit of a brick wall with programming in bash.
Currently I have a directory containing 800 texts files, and what I want to do is run a loop to cat the first 80 files (_01 through to _80) into a new file and save elsewhere, then the next 80 (_81 to _160) files etc.
all the files in the directory are listed like so: ath_01, ath_02, ath_03 etc.
Can anyone help?
So far I have:
#!/bin/bash
for file in /dir/*
do
echo ${file}
done
Which just simple lists my file. I know I need to use cat file1 file2 > newfile.txt somehow but it's confusing my with the numerated extension of _01, _02 etc.
Would it help if I changed the name of the file to use something other than an underscore? like ath.01 etc?
Cheers,
Since you know ahead of time how many files you have and how they are numbered, it may be easier to "unroll the loop", so to speak, and use copy-and-paste and a little hand-tweaking to write a script that uses brace expansion.
#!/bin/bash
cat ath_{001..080} > file1.txt
cat ath_{081..160} > file2.txt
cat ath_{161..240} > file3.txt
cat ath_{241..320} > file4.txt
cat ath_{321..400} > file5.txt
cat ath_{401..480} > file6.txt
cat ath_{481..560} > file7.txt
cat ath_{561..640} > file8.txt
cat ath_{641..720} > file9.txt
cat ath_{721..800} > file10.txt
Or else, use nested for-loops and the seq command
N=800
B=80
for n in $( seq 1 $B $N ); do
for i in $( seq $n $((n+B - 1)) ); do
cat ath_$i
done > file$((n/B + 1)).txt
done
The outer loop will iterate n through 1, 81, 161, etc. The inner loop will iterate i over 1 through 80, then 81 through 160, etc. The body of the inner loops just dumps the contents if the ith file to standard output, but the aggregated output of the loop is stored in file 1, then 2, etc.
You could try something like this:
cat "$file" >> "concat_$(( ${file#/dir/ath_} / 80 ))"
with ${file#/dir/ath_} you remove the prefix /dir/ath_ from the filename
$(( / 80 )) you get the suffix divided by 80 (integer division)
Also change the loop to
for file in /dir/ath_*
So you only get the files you need
If you want groups of 80 files, you'd do best to ensure that the names are sortable; that's why leading zeroes were often used. Assuming that you only have one underscore in the file names, and no newlines in the names, then:
SOURCE="/path/to/dir"
TARGET="/path/to/other/directory"
(
cd $SOURCE || exit 1
ls |
sort -t _ -k2,2n |
awk -v target="$TARGET" \
'{ file[n++] = $1
if (n >= 80)
{
printf "cat"
for (i = 0; i < 80; i++)
printf(" %s", file[i]
printf(" >%s/%s.%.2d\n", target, "newfile", ++number)
n = 0
}
END {
if (n > 0)
{
printf "cat"
for (i = 0; i < n; i++)
printf(" %s", file[i]
printf(" >%s/%s.%.2d\n", target, "newfile", ++number)
}
}' |
sh -x
)
The two directories are specified (where the files are and where the summaries should go); the command changes directory to the source directory (where the 800 files are). It lists the names (you could specify a glob pattern if you needed to) and sorts them numerically. The output is fed into awk which generates a shell script on the fly. It collects 80 names at a time, then generates a cat command that will copy those files to a single destination file such as "newfile.01"; tweak the printf() command to suit your own naming/numbering conventions. The shell commands are then passed to a shell for execution.
While testing, replace the sh -x with nothing, or sh -vn or something similar. Only add an active shell when you're sure it will do what you want. Remember, the shell script is in the source directory as it is running.
Superficially, the xargs command would be nice to use; the difficulty is coordinating the output file number. There might be a way to do that with the -n 80 option to group 80 files at a time and some fancy way to generate the invocation number, but I'm not aware of it.
Another option is to use xargs -n to execute a shell script that can deduce the correct output file number by listing what's already in the target directory. This would be cleaner in many ways:
SOURCE="/path/to/dir"
TARGET="/path/to/other/directory"
(
cd $SOURCE || exit 1
ls |
sort -t _ -k2,2n |
xargs -n 80 cpfiles "$TARGET"
)
Where cpfiles looks like:
TARGET="$1"
shift
if [ $# -gt 0 ]
then
old=$(ls -r newfile.?? | sed -n -e 's/newfile\.//p; 1q')
new=$(printf "%.2d" $((old + 1)))
cat "$#" > "$TARGET/newfile. $new
fi
The test for zero arguments avoids trouble with xargs executing the command once with zero arguments. On the whole, I prefer this solution to the one using awk.
Here's a macro for #chepner's first solution, using GNU Make as the templating language:
SHELL := /bin/bash
N = 800
B = 80
fileNums = $(shell seq 1 $$((${N}/${B})) )
files = ${fileNums:%=file%.txt}
all: ${files}
file%.txt : start = $(shell echo $$(( ($*-1)*${B}+1 )) )
file%.txt : end = $(shell echo $$(( $* * ${B} )) )
file%.txt:
cat ath_{${start}..${end}} > $#
To use:
$ make -n all
cat ath_{1..80} > file1.txt
cat ath_{81..160} > file2.txt
cat ath_{161..240} > file3.txt
cat ath_{241..320} > file4.txt
cat ath_{321..400} > file5.txt
cat ath_{401..480} > file6.txt
cat ath_{481..560} > file7.txt
cat ath_{561..640} > file8.txt
cat ath_{641..720} > file9.txt
cat ath_{721..800} > file10.txt

Resources