How to add grouping mechanism inside for loop in bash

I have a for loop that loops through a list of files, and inside the for loop a script is called that takes the file name as input.
Something like
for file in $(cat list_of_files); do
    script $file
done
the file list_of_files has files like
file1
file2
file3
...
so with each iteration, one file is processed.
I need to design something that loops through all the files, grouping them into groups of 3, so that in one loop iteration the script is called 3 times (not one by one), then the next 3 are called in the second iteration, and so on.
something like,
for file in $(cat list_of_files); do
    # do some kind of grouping here
    # call one more loop to run script.sh 3 times, something like
    # for i=1 to 3, then next iteration from 4 to 6, and so on...
    script.sh $file1
    script.sh $file2
    script.sh $file3
done
I am currently struggling with how to get this looping done; I am stuck and cannot think of an efficient way to do it.

Change for ... in to while read
for file in $(cat list_of_files)
This style of loop is subtly dangerous and/or incorrect. It won't work right on file names with spaces, asterisks, or other special characters. As a general rule, avoid for x in $(...) loops. For more details, see:
Bash Pitfalls: for f in $(ls *.mp3).
A safer alternative is to use while read along with process substitution, like so:
while IFS= read -r file; do
    ...
done < <(cat list_of_files)
It's ugly, I'll admit, but it will handle special characters safely. It won't split apart file names with spaces and it won't expand * globs. For more details on what this is doing, see:
Unix.SE: Understanding “IFS= read -r line”.
You can then remove the Useless Use of Cat and use a simple redirection instead:
while IFS= read -r file; do
    ...
done < list_of_files
Read 3 at a time
So far these changes haven't answered your core question: how to group files 3 at a time. The switch to read has actually served a second purpose: it makes grouping easy. The trick is to call read multiple times per iteration. This is an easy change with while read; it's not so easy with for ... in.
Here's what that looks like:
while IFS= read -r file1 &&
      IFS= read -r file2 &&
      IFS= read -r file3
do
    script.sh "$file1"
    script.sh "$file2"
    script.sh "$file3"
done < list_of_files
This calls read three times, and once all three succeed it proceeds to the loop body.
It will work great if you always have a multiple of 3 items to process. If not, it will mess up at the end and skip the last file or two. If that's an issue we can update it to try to handle that case.
while IFS= read -r file1; do
    IFS= read -r file2
    IFS= read -r file3

    script.sh "$file1"
    [[ -n $file2 ]] && script.sh "$file2"
    [[ -n $file3 ]] && script.sh "$file3"
done < list_of_files
Run the scripts in parallel
If I understand your question right, you also want to run the scripts at the same time rather than sequentially, one after the other. If so, the way to do that is to append &, which will cause them to run in the background. Then call wait to block until they have all finished before proceeding.
while IFS= read -r file1; do
    IFS= read -r file2
    IFS= read -r file3

    script.sh "$file1" &
    [[ -n $file2 ]] && script.sh "$file2" &
    [[ -n $file3 ]] && script.sh "$file3" &
    wait
done < list_of_files

How about
xargs -d $'\n' -L 1 -P 3 script.sh <list_of_files
-P 3 runs 3 processes in parallel. Each invocation gets one line of input (due to -L 1), and the -d option ensures that spaces within an input line are not treated as argument separators.
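To preview the calls it will make before running the real script, you can substitute echo for script.sh. This is just a quick sanity check; it assumes GNU xargs (-d and -P are GNU extensions) and a shell that understands $'\n':

printf '%s\n' file1 file2 'file with spaces' >list_of_files
xargs -d $'\n' -L 1 -P 3 echo script.sh <list_of_files

With -P 3 the three lines may print in any order, since the calls run concurrently.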

You can use bash arrays to store the filenames until you get 3 of them:
#!/bin/bash

files=()
while IFS= read -r f; do
    files+=( "$f" )
    (( ${#files[@]} < 3 )) && continue
    script.sh "${files[0]}"
    script.sh "${files[1]}"
    script.sh "${files[2]}"
    files=()   # note: a final group of fewer than 3 files is not processed
done < list_of_files
However, I think that John Kugelman's answer is simpler, and thus better: it uses fewer bash-specific features, so it can be more easily converted to a POSIX version.
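To illustrate that point, here is a minimal, untested sketch of such a POSIX conversion, batching with the positional parameters instead of a bash array (script.sh and list_of_files as in the question):

#!/bin/sh
set --                          # use the positional parameters as the "array"
while IFS= read -r f; do
    set -- "$@" "$f"
    [ "$#" -lt 3 ] && continue  # keep collecting until we have 3
    for f in "$@"; do script.sh "$f"; done
    set --                      # reset for the next group
done < list_of_files
# process a leftover group of 1 or 2 files
for f in "$@"; do script.sh "$f"; done

This works because a redirected while loop runs in the current shell, so the set -- calls persist across iterations.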

You should not mix scripting languages if you don't absolutely have to. But if Python is an option, you can start with this:
from os import listdir
from os.path import isfile, join

PATH_FILES = "/yourfolder"

def yourFunction(file_name):
    file_path = PATH_FILES + "/" + file_name
    print(file_path)  # or do something else
    print(file_path)  # or do something else
    print(file_path)  # or do something else

file_names = [f for f in listdir(PATH_FILES) if isfile(join(PATH_FILES, f))]
for file_name in file_names:
    yourFunction(file_name)

If mapfile aka readarray is available/acceptable (bash 4+ is required).
Assuming script.sh can accept multiple inputs:
#!/usr/bin/env bash
while mapfile -tn3 files && (( ${#files[*]} == 3 )); do
    script.sh "${files[@]}"
done < list_of_files
Otherwise, loop through the array named files:
#!/usr/bin/env bash
while mapfile -tn3 files && (( ${#files[*]} == 3 )); do
    for file in "${files[@]}"; do
        script.sh "$file"
    done
done < list_of_files
The body after the do runs only when a full group of 3 lines has been read. If the total number of lines is not a multiple of 3 and the short final group should be processed too, change the check to
(( ${#files[*]} > 0 ))
Do not simply remove the check: mapfile returns success even at end-of-file, so the loop would never terminate without it.
Or run the script manually one by one; this variant again assumes the file has a multiple of 3 lines to process:
#!/usr/bin/env bash
while mapfile -tn3 files && (( ${#files[*]} == 3 )); do
    script.sh "${files[0]}"
    script.sh "${files[1]}"
    script.sh "${files[2]}"
done < list_of_files
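If you also want the parallel behavior from the earlier answer, a minimal sketch combining the two ideas (same assumptions: bash 4+, script.sh, list_of_files) could look like:

#!/usr/bin/env bash
while mapfile -tn3 files && (( ${#files[@]} > 0 )); do
    for file in "${files[@]}"; do
        script.sh "$file" &   # each file in the group runs in the background
    done
    wait   # block until the whole group finishes before reading the next 3
done < list_of_files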

Related

Loop through table and parse multiple arguments to scripts in Bash

I am in a situation similar to this one and am having difficulty implementing this kind of solution.
I have file.tsv formatted as follows:
x y
dog woof
CAT meow
loud_goose honk-honk
duck quack
with a fixed number of columns (but a variable number of rows), and I need to loop over those pairs of values, skipping the first (header) row, in a script like the following (pseudocode)
for elements in list; do
    ./script1 elements[1] elements[2]
    ./script2 elements[1] elements[2]
done
so that script* can take the arguments from the pair and run with it.
Is there a way to do it in Bash?
I was thinking I could do something like this:
list1={`awk 'NR > 1{print $1}' file.tsv`}
list2={`awk 'NR > 1{print $2}' file.tsv`}
and then to call them in the loop based on their position, but I am not sure on how.
Thanks!
Shell arrays are not multi-dimensional, so an array element cannot store two arguments for your scripts. However, since you are processing lines from file.tsv, you can iterate over each line, reading both elements at once like this:
#!/usr/bin/env sh

# Populate tab with a literal tab character; command substitution only
# strips trailing newlines, so the tab survives
tab="$(printf '\t')"

{
    # Read the first line into the dummy variable _ to skip the header
    read -r _
    # Iterate, reading tab-delimited x and y from each line
    while IFS="$tab" read -r x y || [ -n "$x" ]; do
        ./script1 "$x" "$y"
        ./script2 "$x" "$y"
    done
} < file.tsv # from this file
You could try just a while + read loop with the -a flag and IFS.
#!/usr/bin/env bash
while IFS=$' \t' read -ra line; do
    echo ./script1 "${line[0]}" "${line[1]}"
    echo ./script2 "${line[0]}" "${line[1]}"
done < <(tail -n +2 file.tsv)
Or without the tail
#!/usr/bin/env bash
skip=0 start=-1
while IFS=$' \t' read -ra line; do
    if ((start++ >= skip)); then
        echo ./script1 "${line[0]}" "${line[1]}"
        echo ./script2 "${line[0]}" "${line[1]}"
    fi
done < file.tsv
Remove the echos if you're satisfied with the output.

bash for loop with same order as GNU "ls -v" ("version-number" sort)

In a bash script I want to do a typical "for file in somedir" but I want the files to be processed in the same order that "ls -v" returns them. I know the downfalls of using "ls" as a function. Is there some way to replicate "-v" without using "ls"? Thanks.
Assuming that this is "version number" sort order, this is also implemented by GNU sort. Thus, on a GNU platform:
somedir=/foo
while IFS= read -r -d '' filename; do
printf 'Processing file: %q\n' "$filename"
done < <(set -- "$somedir"/*; [[ -e $1 || -L $1 ]] && printf '%s\0' "$#" | sort -z -V)
If you really want to use a for loop rather than a while loop, parse into an array and iterate over that:
files=( )
while IFS= read -r -d '' filename; do
    files+=( "$filename" )
done < <(set -- "$somedir"/*; [[ -e $1 || -L $1 ]] && printf '%s\0' "$@" | sort -z -V)

for filename in "${files[@]}"; do
    printf 'Processing file: %q\n' "$filename"
done
To explain some of the magic above:
In < <(...), <(...) is a process substitution. It's replaced with a filename which, when read from, will return the output of the code enclosed. Thus, < <(...) will put that process substitution's output as the input to the while read loop. This loop form is described in BashFAQ #1. The reasons to use this kind of redirection instead of piping into the loop are given in BashFAQ #24.
set -- "$somedir"/* replaces the argument list within the current context (that context being the subshell running the process substitution!) with the results of "$somedir"/*; thus, (non-hidden, by default) contents of the directory named in the variable somedir.
[[ -e $1 || -L $1 ]] is true only if that glob expanded to at least one item; if it remained * (and no actual filesystem object exists by that name), gating output on this condition prevents the process substitution from emitting any output.
sort -z tells sort to delimit elements in both input and output with NULs -- a character that isn't allowed to exist in filenames.
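To see the version sort in isolation (made-up file names; GNU sort assumed, as above):

printf '%s\0' file10.txt file2.txt file1.txt | sort -z -V | tr '\0' '\n'

This prints file1.txt, file2.txt, file10.txt (numeric order), whereas plain lexicographic sorting would give file1.txt, file10.txt, file2.txt.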

Bash script to remove lines containing any of a list of words

I have a large config file that I use to define variables for a script to pull from it, each defined on a single line. It looks something like this:
var val
foo bar
foo1 bar1
foo2 bar2
I have gathered a list of out-of-date variables that I want to remove from the list. I could go through it manually, but I would like to do it with a script, which would be at least more stimulating. The file that contains the values may contain multiple instances. The idea is to find the value and, if it's found, remove the entire line.
Does anyone know if this is possible? I know sed does this but I do not know how to make it use a file input.
#!/bin/bash
shopt -s extglob

REMOVE=(foo1 foo2)
IFS='|' eval 'PATTERN="@(${REMOVE[*]})"'

while read -r LINE; do
    read A B <<< "$LINE"
    [[ $A != $PATTERN ]] && echo "$LINE"
done < input_file.txt > output_file.txt
Or (Use with a copy first)
#!/bin/bash
shopt -s extglob

FILE=$1 REMOVE=("${@:2}")
IFS='|' eval 'PATTERN="@(${REMOVE[*]})"'

SAVE=()
while read -r LINE; do
    read A B <<< "$LINE"
    [[ $A != $PATTERN ]] && SAVE+=("$LINE")
done < "$FILE"

printf '%s\n' "${SAVE[@]}" > "$FILE"
Running with
bash script.sh your_config_file pattern1 pattern2 ...
Or
#!/bin/bash
shopt -s extglob

FILE=$1 PATTERNS_FILE=$2
readarray -t REMOVE < "$PATTERNS_FILE"
IFS='|' eval 'PATTERN="@(${REMOVE[*]})"'

SAVE=()
while read -r LINE; do
    read A B <<< "$LINE"
    [[ $A != $PATTERN ]] && SAVE+=("$LINE")
done < "$FILE"

printf '%s\n' "${SAVE[@]}" > "$FILE"
Running with
bash script.sh your_config_file patterns_file
Here's one with sed. Add words to the array, then run it with
./script target_filename
(assuming you put the following in a file called script). It's not very efficient; it might be faster to concatenate the words into one regex, as bbonev did.
#!/bin/bash
declare -a array=("foo1" "foo2")
for i in "${array[@]}"; do
    sed -i "/^${i}\s.*/d" "$1"
done
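For reference, a minimal sketch of that concatenation idea (assumes GNU sed for -E and -i, and that the words contain no regex metacharacters; wordlist and target_file are placeholder names):

#!/bin/bash
# Join the words with | into one alternation and delete matching lines
# in a single sed pass instead of one sed invocation per word.
words=$(paste -sd'|' wordlist)
sed -E -i "/^(${words})[[:space:]]/d" target_file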
It's actually even simpler using file input
If you have a word file
word1
word2
word3
.....
then the following will do the job
#!/bin/bash
while read -r i; do
    sed -i "/^${i}\s.*/d" "$2"
done < "$1"
usage:
./script wordlist target_file

issue with bash : 2 variables instead of one

I've written this piece of code.
The aim is the following:
for each pair of files in the temp list, it should take the first occurrence from the list and put it into a variable called $name1, then the second occurrence into a second variable called $name2. The variables are file names. With the 2 variables, I do a join.
for files in $(cat temp.lst); do
    if [ $(cat temp.lst | wc -l) == 1 ]
    then
        name=$(head -1 temp.lst)
        join -t\; -j 1 file_minus1.txt "$name" | sed 's/;;/;/g' > file1.txt
    else
        name1=$(head -1 temp.lst)
        name2=$(head -2 temp.lst)
        echo "var1 "$name1 "var2 "$name2
        sed '1,2d' temp.lst > tmpfile.txt
        mv tmpfile.txt temp.lst
        join -t\; -j 1 "$name1" "$name2" | sed 's/;;/;/g' > file_minus1.txt
    fi
done
Theoretically, it should work but here it is not working, alas.
The echo line I've put in my code is giving me 3 variables instead of 2
var1 ei_etea17_m.tsv var2 ei_etea17_m.tsv ei_eteu25_m.tsv
Worse, the join is not functionning the way I thought, giving me this error code instead
join: ei_etea17_m.tsv
ei_eteu25_m.tsv: No such file or directory
Please find a sample of my temp.lst
ei_eteu27_m.tsv
ei_eteu28_m.tsv
ei_isbr_m.tsv
ei_isbu_m.tsv
ei_isin_m.tsv
Any suggestions are welcomed.
Best.
To extract 2 lines of a file in a loop, try this:
paste - - < temp.lst |
while read name1 name2; do
    if [[ -z $name2 ]]; then
        name2=$name1
        name1=file_minus1.txt
        output=file1.txt
    else
        output=file_minus1.txt
    fi
    join -t\; "$name1" "$name2" | sed 's/;;/;/g' > $output
done
Notes
the paste command takes 2 consecutive lines from the file and joins them into a single line (separated by a tab)
demo: seq 7 | paste - - (output shown after these notes)
read can assign to multiple variables: the line will be split on whitespace (default) and assigned to the named variables.
in the loop body, I basically follow your logic
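Concretely, the demo from the notes produces pairs joined by a tab, with the odd leftover item alone on the last line:

seq 7 | paste - -
1	2
3	4
5	6
7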
To perform an n-way join, use recursion :)
recursive_join () {
    # Zero files: do nothing (special case)
    # One file: output it
    # Multiple files: join the first with the result of joining the rest
    file1=$1
    shift || return
    [ "$#" -eq 0 ] && cat "$file1" ||
        recursive_join "$@" | join -t\; -j1 "$file1" -
}

recursive_join ei_eteu27_m.tsv ei_eteu28_m.tsv ei_isbr_m.tsv ei_isbu_m.tsv ei_isin_m.tsv
Adapting this to use a file listing the input files, rather than using command-line arguments, is a little trickier. As long as none of the input file names contain whitespace or other special characters, you could simply use
recursive_join $(cat temp.lst)
Or, if you want to avail yourself of bash features, you could use an array:
while read; do files+=("$REPLY"); done < temp.lst
recursive_join "${files[@]}"
or in bash 4:
readarray -t files < temp.lst
recursive_join "${files[@]}"
However, if you want to stick with standard shell scripting only, it would be better to modify the recursive function to read the input file names from standard input. This makes the function a little uglier, since in order to detect if there is only one file left on standard input, we have to try to read a second one, and put it back on standard input if we succeed.
recursive_join () {
    IFS= read -r file1 || return
    IFS= read -r file2 &&
        { echo "$file2"; cat; } | recursive_join | join -t\; -j1 "$file1" - ||
        cat "$file1"
}
recursive_join < temp.lst
Creating a function that can take either command-line arguments or read a list from standard input is left as an exercise for the reader.
Variable name1 is getting the first line.
Variable name2 is getting the first two lines.
If you want name2 to have only the second line you could try something like:
name2=$(sed -n '2p' temp.lst)
Also sed -i will remove the need for tmpfile.txt.
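Putting both suggestions together, a minimal sketch (uses the temp.lst from the question; assumes GNU sed for -i):

name1=$(sed -n '1p' temp.lst)   # first file name
name2=$(sed -n '2p' temp.lst)   # only the second line, not the first two
sed -i '1,2d' temp.lst          # delete both lines in place, no tmpfile.txt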
Ok Gents or Ladies.
I found out the Why.
head -1 temp.lst is only giving the file name without the extension.
I need to find a way to include the extension. Doable.

text file multiply bash linux

For example, I have a text file with 5 lines:
one
two
three
four
five
and I want to make a script that produces a 2000-line file containing repetitions of the file above,
so it would look like
one
two
three
four
five
one
two
three
four
five
one
two
three
four
five
............repeat until n times is reached
Testing showed this to be about 100 times faster than the next best approach given so far.
#!/bin/bash

IN="${1}"
OUT="${2}"

for i in {1..2000}; do
    echo "${IN}"
done | xargs cat > "${OUT}"
The reason this is so much faster is because it doesn't repeatedly open, seek to end, append, and close the output file. It opens the output file once, and streams the data to it in a single large, continuous write. It also invokes cat as few times as possible. It may invoke cat only once, even, depending on the system's maximum command line length and the length of the input file name.
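If you want to check the speed difference yourself, a rough benchmark sketch (in.txt and out.txt are placeholder names):

# The first keeps cat invocations to a minimum; the second forks cat 2000 times.
time { for i in {1..2000}; do echo in.txt; done | xargs cat > out.txt; }
time { for i in {1..2000}; do cat in.txt; done > out.txt; }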
If you need to repeat 2000 times
for i in {1..2000}; do cat "FILE"; done > NEW_FILE
Do you need 2000 lines or 2000 copies of the original file?
If the first:
infile='/path/to/inputfile'
outfile='/path/to/outputfile'

len=$(wc -l < "$infile")
# use 2000/len+1 iterations so at least 2000 lines exist before trimming
for ((i=1; i<=2000/len+1; i++))
do
    cat "$infile"
done > "$outfile.tmp" # you can use mktemp or tempfile if you want
head -n 2000 "$outfile.tmp" > "$outfile"
rm "$outfile.tmp"
If the second:
for i in {1..2000}; do cat "$infile"; done > "$outfile"
For a small input file (avoids the overhead of forking cat 2000 times):
file=$(<"$infile"); for i in {1..2000}; do echo "$file"; done > "$outfile"
Does it need to be a script? If you just want to generate that quickly, you can open the file in vim, cut the 5 lines (press Esc, then 5dd), and paste them n times (press Esc, then n p).
Edit: if you absolutely need a script and efficiency is not a problem, you can do this "dirty" trick:
i=0
n=5
while (( i < n )); do
    cat original_file >> new_file
    let i+=1
done
file_size() {
    cat -- "$@" | wc -l
}

mult_file() {
    local \
        max_lines="$1" \
        iter_size \
        iters \
        i
    shift 1
    iter_size="$(file_size "$@")"
    let iters=max_lines/iter_size+1
    (for ((i=0; i<iters; ++i)); do
        cat -- "$@"
    done) |
        head --lines="$max_lines"
}

mult_file "$@"
So you would call it like script.sh LINES FILE1 FILE2 FILE3 >REPEAT_FILE.
No process in the loop, no pipes:
infile='5.txt'
outfile='2000.txt'

n=$(( 2000 / $(wc -l < "$infile") ))   # repetitions
> "$outfile"                           # empty output file

IFS=''
content=$(cat "$infile")               # file content as string
for (( CNTR=0; CNTR<n; CNTR+=1 )); do
    echo "$content" >> "$outfile"
done
