Execute Script to Run on Multiple Files - bash

I have a script that I need to run on a large number of files.
This is the script and how it is run:
./tag-lbk.sh test.txt > output.txt
It takes a file as input and creates an output file. I need to run this on several input files, and I want a different output file for each input file.
How would I go about doing this? Can I write a script for it? (I don't have much experience writing bash scripts.)
[edits]:
#fedorqui asked: Where are the names of the input files and output files stored?
There are several thousand files, each with a unique name. I was thinking there might be a way to iterate recursively through all the files (they are all .txt files). The output files should have names that are generated automatically, even randomly.

Simple solution: Use two folders.
for input in /path/to/folder/*.txt ; do
    name=$(basename "$input")
    ./tag-lbk.sh "$input" > "/path/to/output-folder/$name"
done
or, if you want everything in the same folder:
for input in *.txt ; do
    if [[ "$input" = *-tagged.txt ]]; then
        continue # skip files that are already outputs
    fi
    name=$(basename "$input" .txt)-tagged.txt
    ./tag-lbk.sh "$input" > "$name"
done
Try this with a small set of inputs, somewhere where it doesn't matter if files get deleted, corrupted or overwritten.
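If the input files are spread across subdirectories (the question mentions several thousand files), a find-based loop handles the recursion and is safe with odd filenames. A minimal sketch, assuming a -tagged.txt suffix to keep outputs distinct from inputs:

find /path/to/folder -name '*.txt' ! -name '*-tagged.txt' -print0 |
while IFS= read -r -d '' input; do
    # strip .txt and append -tagged.txt, next to the input file
    out="${input%.txt}-tagged.txt"
    ./tag-lbk.sh "$input" > "$out"
done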

The script below finds the files with extension .txt and redirects the output of the tag-lbk script to a randomly generated log file such as log.123.
#!/bin/bash
declare -a ar
# Find the files and store them in an array.
# This way you don't iterate over the output files
# generated by this script.
# (Note: this breaks on filenames containing spaces.)
ar=($(find . -iname "*.txt"))
# Now iterate over the files and run your script.
for i in "${ar[@]}"
do
    # Create a random log file in the format log.123, log.345
    tmp_f=$(mktemp log.XXX)
    # Redirect your output to the log file
    ./tag-lbk.sh "$i" > "$tmp_f"
done

Related

How to store grep data in a variable so that it can be used later on?

I wrote the code as follows:
#!/bin/bash
cd /home/ubuntu/MouniKaShell/newfolder/dev/EC2-Var
grep ec2_name: *.txt >tempStore
The data is not being stored in tempStore, and there is no output when it runs.
I want to find all the files that contain ec2_name, and then store that list of files in tempStore.
If I understand your question correctly, you want all the .txt files located in a specific directory whose names contain "ec2_name" to be stored in a bash variable.
Here is a possible solution
#!/bin/bash
cd /home/ubuntu/MouniKaShell/newfolder/dev/EC2-Var
# mylist contains all matching file names
mylist=$(ls *.txt | grep ec2_name)
# you can iterate over the files with a for loop
for file in $mylist; do
    echo "file: $file"
done
Correct me if I haven't understood your question
edit: simplified code
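If the goal is instead the files whose contents contain ec2_name (which is what the original grep command searches for), grep -l prints just the names of the matching files. A minimal sketch along the lines of the question's script:

#!/bin/bash
cd /home/ubuntu/MouniKaShell/newfolder/dev/EC2-Var || exit 1
# -l lists only the names of files containing a match
matches=$(grep -l 'ec2_name:' *.txt)
echo "$matches" > tempStore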

How do I write a script that loops through different folders and saves PATH outputs into a tab-delimited text file?

I am a newbie attempting to write a script that reads a text file containing names of different folders (subj1, subj2, subj3, etc.), loops through each folder, extracts the full paths of two files inside (e.g. /Users/desktop/subj1/animals/pig.jpg and /Users/desktop/subj1/animals/cow.jpg), then saves the subject ID and the two paths as 3 columns in a tab-delimited text file (e.g. subj1 /Users/desktop/subj1/animals/pig.jpg /Users/desktop/subj1/animals/cow.jpg), and so on and so forth.
The output would look like:
subj1 /Users/desktop/subj1/animals/pig.jpg /Users/desktop/subj1/animals/cow.jpg
subj2 /Users/desktop/subj2/animals/pig.jpg /Users/desktop/subj2/animals/cow.jpg
I have tried searching for answers but nothing really came close to answering this query. I also need to do this for over 1000 folders, so it would be insane to try to create the file by hand.
(edit): I would first have to verify that the files exist inside each folder. I created an array from a text file of folder names. Here is what I have thus far:
read -r -a array <<< "${subj_numbs}"
array_error_file=()
for subj in "${array[@]}"
do
    echo "Working on.." \"${subj}\"
    pig_file=${dir}/${subj}/animals/pig.jpg
    cow_file=${dir}/${subj}/animals/cow.jpg
    if [ ! -f "$pig_file" ] && [ ! -f "$cow_file" ]; then
        echo " [!] Files ${pig_file} and ${cow_file} do not exist."
        array_error_file+=("$pig_file")
        array_error_file+=("$cow_file")
    else
        echo "Writing path names to text file.. -> \"${subj}\""
        pig_path="$pig_file"; pwd
        cow_path="$cow_file"; pwd
    fi
done
Something like this?
while read -r line; do
    pig="$(cd "$(dirname "$line/animals/pig.jpg")"; pwd)/$(basename "$line/animals/pig.jpg")"
    cow="$(cd "$(dirname "$line/animals/cow.jpg")"; pwd)/$(basename "$line/animals/cow.jpg")"
    # printf renders the tabs reliably; plain echo would print the \t literally
    printf '%s\t%s\t%s\n' "$line" "$pig" "$cow" >> yourtabfile.txt
done < yourlist.txt
I'd be glad to edit this to fit your situation if it doesn't work, but that means you would have to provide more info.
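Combining the existence check from the question with a tab-delimited printf, a fuller sketch could look like this (the /Users/desktop base directory and yourlist.txt are assumptions taken from the question's examples):

#!/bin/bash
dir=/Users/desktop
while IFS= read -r subj; do
    pig="$dir/$subj/animals/pig.jpg"
    cow="$dir/$subj/animals/cow.jpg"
    if [ -f "$pig" ] && [ -f "$cow" ]; then
        # printf handles the tabs reliably, unlike plain echo
        printf '%s\t%s\t%s\n' "$subj" "$pig" "$cow"
    else
        echo " [!] Files for ${subj} do not exist." >&2
    fi
done < yourlist.txt > yourtabfile.txt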

Handling empty files when concatenating files in bash

I have a number (say, 100) of CSV files, out of which some (say, 20) are empty (i.e., 0 bytes file). I would like to concatenate the files into one single CSV file (say, assorted.csv), with the following requirement met:
For each empty file, there must be a blank line in assorted.csv.
It appears that simply doing cat *.csv >> assorted.csv skips the empty files completely in the sense that they do not have any lines and hence there is nothing to concatenate.
Though I can solve this problem using any high-level programming language, I would like to know if and how to make it possible using Bash.
Just make a loop and detect whether the file is empty. If it's empty, just echo the file name plus a comma: that creates a near-blank line. Otherwise, prefix each line with the file name plus a comma.
#!/bin/bash
out=assorted.csv
# delete the output file before concatenating, or on a
# second run it would be counted among the input files!
rm -f "$out"
for f in *.csv
do
    # the redirection below creates $out before the glob expands,
    # so skip it explicitly
    [ "$f" = "$out" ] && continue
    if [ -s "$f" ] ; then
        # cat "$f" | sed "s/^/$f,/"  # cat+sed would be overkill here
        sed "s/^/$f,/" "$f"
    else
        echo "$f,"
    fi
done > "$out"
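If the requirement is literally one blank line per empty file, with the other lines copied unmodified (no file-name prefix), a simpler sketch would be:

#!/bin/bash
out=assorted.csv
for f in *.csv; do
    [ "$f" = "$out" ] && continue   # the output file also matches *.csv
    if [ -s "$f" ]; then
        cat "$f"    # non-empty: copy as-is
    else
        echo        # empty: emit a single blank line
    fi
done > "$out"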

Redirecting the result files to different variable file names

I have a folder with, say, ten data files I01.txt, ..., I10.txt. Each file, when processed with the command ./a.out, gives me five output files, namely f1.txt, f2.txt, ... f5.txt.
I have written a simple bash program to execute all the files and save the on-screen output to a separate file using the command
./cosima_peaks_444_temp_muuttuva -args > $counter-res.txt
Using this, I am able to save the on-screen output to a file. But the five files f1 to f5 are overwritten with the results of the last file run, in this case I10, and the results of the first nine files are lost.
So I want to save the output of each I*.txt run (f1 ... f5) to a different set of files, such that when the program processes I01.txt with ./a.out it stores the outputs as
f1 > var1-f1.txt, f2 > var1-f2.txt, ... f5 > var1-f5.txt
and then repeats the same for I02 (f1 > var2-f1.txt ...).
#!/bin/bash
# echo "for looping over all the .txt files"
echo -e "Enter the name of the file or q to quit "
read -r dir
if [[ $dir = q ]]
then
    exit
fi
filename="$dir*.txt"
counter=0
if [[ $dir == I ]]
then
    for f in $filename ; do
        echo "output of $f"
        ((counter++))
        ./cosima_peaks_444_temp_muuttuva $f -m202.75 -c1 -ng0.5 -a0.0 -b1.0 -e1.0 -lg > "$counter-res.txt"
        echo "counter $counter"
    done
fi
If I understand correctly, you want to pass files l01.txt, l02.txt, ... to a.out and save the output of each execution to a separate file like f01.txt, f02.txt, .... You could use a short script that reads each file named l*.txt in the directory and passes it to a.out, redirecting the output to a file fN.txt (where N is the same number as in the lN.txt filename). This presumes you are passing each filename to a.out and that a.out is not reading the entire directory automatically.
for i in l*.txt; do
    num=$(sed 's/^l\(.*\)[.]txt/\1/' <<<"$i")
    ./a.out "$i" > "f${num}.txt"
done
(note: that is a lowercase 'L' at the start of the sed pattern, and the replacement \1 is backreference one)
note: if you do not want the same N from the filename (with its leading '0'), then you can trim the leading '0' from the N value for the output filename.
(you can use a counter as you have shown in your edited post, but you have no guarantee of the sort order of the filenames used by the loop unless you explicitly sort them)
note: this presumes NO spaces, embedded newlines or other odd characters in the filenames. If your lN.txt names can contain odd characters or spaces, feeding a while loop with find avoids the odd character issues (a sketch follows).
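A sketch of that find-based while loop, which copes with spaces and newlines in the filenames (find prints names with a leading ./, which the sed expression strips):

find . -maxdepth 1 -name 'l*.txt' -print0 |
while IFS= read -r -d '' i; do
    # strip the leading ./l and the trailing .txt to recover N
    num=$(sed 's|^\./l\(.*\)\.txt$|\1|' <<<"$i")
    ./a.out "$i" > "f${num}.txt"
done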
With f1 - f5 Created Each Run
You know the format of the output file names, so you can test whether a file with that name already exists and add a prefix or suffix to make the new name unique. For example, if your first pass creates filenames 'pass1-f01.txt', 'pass1-f02.txt', you can check for that pattern (in several ways) and increment your 'passN' prefix as required:
for f in "$filename"; do
num=$(sed 's/l*\(.*\)[.]txt/\1/' <<<"$f")
count=$(sed 's/^0*//' <<<"$num")
while [ -f "pass${count}-f${num}.txt" ]; do
((count++))
done
./a.out "$f" > "pass${count}-f${num}.txt"
done
Give that a try and let me know if that isn't closer to what you need.
(note: the use of the herestring (<<<) is bash-only, if you need a portable solution, pipe the output of echo "$var" to sed, e.g. count=$(echo "$num" | sed 's/^0*//') )
I replaced your cosima_peaks_444_temp_muuttuva with a function myprog.
The OP asked for more explanation, so I put in a lot of comments:
# This function makes 5 output files for testing the construction
function myprog {
    # Fill the test output file f1.txt with the input filename and a datestamp
    echo "Output run $1 on $(date)" > f1.txt
    # The original prog makes 5 output files, so I copy the new testfile 4 times
    cp f1.txt f2.txt
    cp f1.txt f3.txt
    cp f1.txt f4.txt
    cp f1.txt f5.txt
}

# Use the number in the input filename to make a unique filename and move the output
function move_output {
    # The parameter ${1} is filled with something like I03.txt
    # You can get the number with a sed action, but it is more efficient to use
    # bash expansions, even in 2 steps.
    # First step: cut off from the end as much as possible (%%) starting with a dot.
    Inumber=${1%%.*}
    # Step 2: remove the I from the Inumber (which holds something like "I03").
    number=${Inumber#I}
    # Move all output files from the last run
    for outputfile in f*txt; do
        # Put the number in front of the original name
        mv "${outputfile}" "${number}_${outputfile}"
    done
}

# Start the main processing. You will perform the same logic for all input files,
# so make a loop over all of them. I guess all input files start with an "I",
# followed by 2 characters (a number), and .txt. No need to use ls for listing those.
for input in I??.txt; do
    # Call the dummy prog above with the name of the input file as a parameter
    myprog "${input}"
    # Now finally the show starts.
    # Call the function that moves the 5 output files to other names.
    move_output "${input}"
done
I guess you have the source code of this a.out binary. If so, I would modify it so that it outputs to several fds instead of several files. Then you can solve this very cleanly using redirects:
./a.out 3> fileX.1 4> fileX.2 5> fileX.3
and so on for every file you want to output. Writing to a file or to a (redirected) fd is equivalent in most programs (notable exception: memory mapped I/O, but that is not commonly used for such scripts - look for mmap calls)
Note that this is not very esoteric, but a very well known technique that is regularly used to separate output (stdout, fd=1) from errors (stderr, fd=2).
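The same mechanism can be demonstrated in pure bash: here a compound command writes to file descriptors 3, 4 and 5, and the redirections on the closing brace map each descriptor to its own file (the file names are just placeholders):

{
    echo "first stream"  >&3
    echo "second stream" >&4
    echo "third stream"  >&5
} 3> fileX.1 4> fileX.2 5> fileX.3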

Shell script to execute executable over numerous files

Hi, I have an executable that sorts some code and reformats it. I have over 200 files to apply it to, with incremental names run001, run002, etc. Is there a quick way to write a shell script that runs it over all the files? The executable creates a new file called run001an etc., so just running it over all files whose names contain run doesn't work. How do I increment the file number?
Cheers
how about:
for i in ./run*; do
    process_the_file "$i"
done
which is valid Bash/Ksh
To be more specific with the run### files, you can use:
for file in dir/run[0-9][0-9][0-9]; do
    do_something "$file"
done
dir could simply be . or any other directory. If the path contains spaces, quote it with "", but only the directory part, not the glob.
In bash, you can make use of extended patterns to match any number of digits, not just 3:
shopt -s extglob
for file in dir/run+([0-9]); do
    do_something "$file"
done
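If the numbering is known to be contiguous, brace expansion with zero padding (bash 4 and later) can also generate the names directly; the -f test skips any numbers with no matching file. do_something stands in for the actual executable, as in the loops above:

for file in dir/run{001..200}; do
    [ -f "$file" ] || continue   # skip gaps in the numbering
    do_something "$file"
done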
