bash shell script two variables in for loop - bash

I am new to shell scripting, so please bear with me if my question is too basic.
I have PNG images in two different directories and an executable that takes one image from each directory and processes them to generate a new image.
I am looking for a for loop construct that can iterate over two variables simultaneously. This is possible in C, C++, etc., but how do I accomplish something like the following? The code is obviously wrong.
#!/bin/sh
im1_dir=~/prev1/*.png
im2_dir=~/prev3/*.png
index=0
for i,j in $im1_dir $im2_dir # i iterates in im1_dir and j iterates in im2_dir
do
run_black.sh $i $j
done
thanks!

If you are depending on the two directories to match up based on a locale sorted order (like your attempt), then an array should work.
im1_files=(~/prev1/*.png)
im2_files=(~/prev3/*.png)
for ((i=0;i<${#im1_files[@]};i++)); do
run_black.sh "${im1_files[i]}" "${im2_files[i]}"
done
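Here is a minimal sketch of the same index-based pairing, wrapped in a function so it can be tried without the real images. The names pair_dirs and process_pair are hypothetical (process_pair stands in for run_black.sh), and both the nullglob setting and the length warning are assumptions about how you might want mismatches handled.

```bash
#!/usr/bin/env bash
shopt -s nullglob   # an empty directory yields an empty array, not a literal '*.png'

# Hypothetical stand-in for run_black.sh: report the pair it was given.
process_pair() {
    printf '%s + %s\n' "${1##*/}" "${2##*/}"
}

# Pair the PNG files of two directories by sorted position.
pair_dirs() {
    local -a files1 files2
    local i
    files1=("$1"/*.png)
    files2=("$2"/*.png)
    if (( ${#files1[@]} != ${#files2[@]} )); then
        echo "warning: ${#files1[@]} vs ${#files2[@]} files; pairing the common prefix only" >&2
    fi
    for ((i = 0; i < ${#files1[@]} && i < ${#files2[@]}; i++)); do
        process_pair "${files1[i]}" "${files2[i]}"
    done
}

# Demo with throwaway directories.
demo=$(mktemp -d)
mkdir "$demo/prev1" "$demo/prev3"
touch "$demo/prev1/a.png" "$demo/prev1/b.png" "$demo/prev3/x.png" "$demo/prev3/y.png"
pair_dirs "$demo/prev1" "$demo/prev3"
rm -r -- "$demo"
```

Swapping the body of process_pair for the real command is the only change needed to use it on actual directories.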

Here are a few additional ways to do what you're looking for with notes about the pros and cons.
The following only works with filenames that do not include newlines. It pairs the files in lockstep. It uses an extra file descriptor to read from the first list. If im1_dir contains more files, the loop will stop when im2_dir runs out. If im2_dir contains more files, file1 will be empty for all unmatched file2. Of course if they contain the same number of files, there's no problem.
#!/bin/bash
im1_dir=(~/prev1/*.png)
im2_dir=(~/prev3/*.png)
exec 3< <(printf '%s\n' "${im1_dir[@]}")
while IFS=$'\n' read -r -u 3 file1; read -r file2
do
run_black "$file1" "$file2"
done < <(printf '%s\n' "${im2_dir[@]}")
exec 3<&-
You can make the behavior consistent so that the loop stops with only non-empty matched files no matter which list is longer by replacing the semicolon with a double ampersand like so:
while IFS=$'\n' read -r -u 3 file1 && read -r file2
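To see the difference between the two forms, here is a small self-contained sketch with made-up lists of unequal length. The names list1, list2, pair_semicolon and pair_and are illustrative only, and the loop redirects descriptor 3 directly instead of using exec.

```bash
#!/usr/bin/env bash
list1=(a1 a2)        # shorter list, read through file descriptor 3
list2=(b1 b2 b3)     # longer list, read from standard input

# ';' form: only the status of the last read (list2) controls the loop.
pair_semicolon() {
    local file1 file2
    while IFS= read -r -u 3 file1; IFS= read -r file2; do
        printf '[%s] [%s]\n' "$file1" "$file2"
    done 3< <(printf '%s\n' "${list1[@]}") < <(printf '%s\n' "${list2[@]}")
}

# '&&' form: the loop ends as soon as either read hits end of input.
pair_and() {
    local file1 file2
    while IFS= read -r -u 3 file1 && IFS= read -r file2; do
        printf '[%s] [%s]\n' "$file1" "$file2"
    done 3< <(printf '%s\n' "${list1[@]}") < <(printf '%s\n' "${list2[@]}")
}

echo "with ';':";  pair_semicolon   # third line is "[] [b3]"
echo "with '&&':"; pair_and         # only the two matched pairs
```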
This version uses a for loop instead of a while loop. It stops when the shorter of the two lists runs out.
#!/bin/bash
im1_dir=(~/prev1/*.png)
im2_dir=(~/prev3/*.png)
for ((i = 0; i < ${#im1_dir[@]} && i < ${#im2_dir[@]}; i++))
do
run_black "${im1_dir[i]}" "${im2_dir[i]}"
done
This version is similar to the one immediately above, but if one of the lists runs out it wraps around to reuse the items until the other one runs out. It's very ugly and you could do the same thing another way more simply.
#!/bin/bash
im1_dir=(~/prev1/*.png)
im2_dir=(~/prev3/*.png)
for ((i = 0, j = 0,
n1 = ${#im1_dir[@]},
n2 = ${#im2_dir[@]},
s = n1 >= n2 ? n1 : n2,
is = 0, js = 0;
is < s && js < s;
i++, is = i, i %= n1,
j++, js = j, j %= n2))
do
run_black "${im1_dir[i]}" "${im2_dir[j]}"
done
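One possible simpler way to get the same wrap-around pairing is a single counter with a modulo on each index. This is only a sketch: list1, list2 and pair_wrapping are made-up names, and printf stands in for the real command.

```bash
#!/usr/bin/env bash
list1=(a b c d e)
list2=(x y)

# Pair two lists, cycling the shorter one until the longer one is exhausted.
pair_wrapping() {
    local n1=${#list1[@]} n2=${#list2[@]} k max
    (( n1 > 0 && n2 > 0 )) || return 0      # avoid a modulo by zero on an empty list
    max=$(( n1 > n2 ? n1 : n2 ))
    for ((k = 0; k < max; k++)); do
        printf '%s %s\n' "${list1[k % n1]}" "${list2[k % n2]}"
    done
}

pair_wrapping
```

With these sample lists the shorter one is reused as x, y, x, y, x against a through e.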
This version only uses an array for the inner loop (second directory). It will only execute as many times as there are files in the first directory.
#!/bin/bash
im1_dir=~/prev1/*.png
im2_dir=(~/prev3/*.png)
for file1 in $im1_dir
do
run_black "$file1" "${im2_dir[i++]}"
done

If you don't mind going off the beaten path (bash), the Tool Command Language (TCL) has such a loop construct:
#!/usr/bin/env tclsh
set list1 [glob dir1/*]
set list2 [glob dir2/*]
foreach item1 $list1 item2 $list2 {
exec command_name $item1 $item2
}
Basically, the loop reads: for each item1 taken from list1, and item2 taken from list2. You can then replace command_name with your own command.

This might be another way to use two variables in the same loop. But you need to know the total number of files (or, the number of times you want to run the loop) in the directory to use it as the value of iteration i.
Get the number of files in the directory:
ls /path/*.png | wc -l
Now run the loop:
im1_dir=(~/prev1/*.png)
im2_dir=(~/prev3/*.png)
for ((i = 0; i < 4; i++)); do run_black.sh "${im1_dir[i]}" "${im2_dir[i]}"; done
For more help please see this discussion.

I have this problem for a similar situation where I want a top and bottom range simultaneously. Here was my solution; it's not particularly efficient but it's easy and clean and not at all complicated with icky BASH arrays and all that nonsense.
SEQBOT=$(seq 0 5 $((PEAKTIME-5)))
SEQTOP=$(seq 5 5 $((PEAKTIME-0)))
IDXBOT=0
IDXTOP=0
for bot in $SEQBOT; do
IDXTOP=0
for top in $SEQTOP; do
if [ "$IDXBOT" -eq "$IDXTOP" ]; then
echo $bot $top
fi
IDXTOP=$((IDXTOP + 1))
done
IDXBOT=$((IDXBOT + 1))
done

It is very simple: you can use two nested for loops for this problem. Note that this runs the command for every combination of files, not for matched pairs.
#!/bin/bash
index=0
for i in ~/prev1/*.png
do
for j in ~/prev3/*.png
do
run_black.sh "$i" "$j"
done
done

The accepted answer can be further simplified using the ${!array[@]} syntax to iterate over the array's indexes:
a=(x y z); b=(q w e); for i in ${!a[@]}; do echo ${a[i]}-${b[i]}; done

Another solution. The two lists with filenames are pasted into one.
paste <(ls --quote-name ~/prev1/*.png) <(ls --quote-name ~/prev3/*.png) | \
while read args ; do
run_black $args
done
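A variation on the paste idea that avoids parsing ls output: print each list with printf, paste them together, and split every line on the tab. This is a sketch under the assumption that no name contains a tab or newline; list1, list2 and pair_with_paste are made-up names, and printf stands in for run_black.

```bash
#!/usr/bin/env bash
list1=("img one.png" "img two.png")
list2=("other 1.png" "other 2.png")

# Pair two lists line by line with paste, then split each line on the tab.
pair_with_paste() {
    local f1 f2
    while IFS=$'\t' read -r f1 f2; do
        printf '%s | %s\n' "$f1" "$f2"      # replace with the real command
    done < <(paste <(printf '%s\n' "${list1[@]}") <(printf '%s\n' "${list2[@]}"))
}

pair_with_paste
```

Because each name arrives in its own variable, names with spaces survive intact when quoted.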

Related

Listing skipped numbers in a large txt file using bash

I need to find a way to display the missing numbers from a large txt file. It's a web graph that has 875,713 vertices. However, when I sort the file the largest number that is displayed at the end is 916,427. So there are some numbers not being used for vertex index. Is there a bash command I could use to do this?
I found this after searching around some other threads, but I'm not entirely sure if it's correct:
awk 'NR != $1 { for (i = prev + 1; i < $1; i++) {print i} } { prev = $1 + 1 }' file
Assuming the 'number' of each vertex is in the first column, you can use:
awk '{a[$1]} END{for(i = 1; i <= 916427; i++){if(!(i in a)){print i}}}' file
E.g.
# create some example data and remove "10"
seq 916427 | sed '10d' > test.txt
head test.txt
1
2
3
4
5
6
7
8
9
11
awk '{a[$1]} END { for (i = 1; i <= 916427; i++) { if (!(i in a)) {print i}}}' test.txt
10
If you don't want to store the array in memory (otherwise @jared_mamrot's solution would work), you can use
awk 'NR==1 {p=$1; next} {for (i=p+1; i<$1; i++) {print i}; p=$1}' < <( sort -n file)
which sorts the file first.
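For a quick check of the sort-first approach on a tiny input, here is a sketch that wraps it in a function. The name find_gaps and the sample numbers are made up; note that it only reports gaps between the smallest and largest values present.

```bash
#!/usr/bin/env bash
# Print the integers missing between the smallest and largest value in column 1.
find_gaps() {
    sort -n -- "$1" | awk 'NR == 1 { prev = $1; next }
                           { for (i = prev + 1; i < $1; i++) print i; prev = $1 }'
}

sample=$(mktemp)
printf '%s\n' 5 1 2 9 3 > "$sample"
find_gaps "$sample"     # prints 4, 6, 7 and 8, one per line
rm -- "$sample"
```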
Just because you tagged your question bash, I'll provide a bash solution. :)
# sample data as jared suggested, with 10 removed...
seq 916427 | sed '10d' > test.txt
# read sample data into an array...
mapfile -t a < test.txt
# reverse the $a array into $b
for i in "${a[@]}"; do b[$i]=1; done
# step through list of possible numbers, testing if each one is an index of $b
for ((i=1; i<${a[((${#a[@]}-1))]}; i++)); do [[ -z ${b[i]} ]] && echo $i; done
The line noise in the last line (${a[((${#a[@]}-1))]}) simply means "the value of the last array element", and the -1 is there because without instructions otherwise, mapfile starts numbering things at zero.
This takes a little longer to run than awk, because awk is awesome. But it runs in bash without calling any external tools. Aside from the ones generating our sample data, of course!
Note that the last line verifies $b array membership with a string comparison. You might get a very slight performance increase by doing a math comparison instead ((( ${b[i]} )) || echo $i) but the improvement would be so small that it's not even worth mentioning. Oh dang.
Note also that both this and the awk solution involve creating very large arrays in memory, then stepping through those arrays. Be careful of your memory, and don't waste array space with unnecessary data. You will probably want to pull just your indices out of your original dataset for this comparison, rather than loading everything into a bash or awk array.

How to sort 2 arrays in bash

I want to sort 2 arrays at the same time. The arrays are the following: wordArray and numArray. Both are global.
These 2 arrays contain all the words (without duplicates) and the number of the appearances of each word from a text file.
Right now I am using Bubble Sort to sort both of them at the same time:
# Bubble Sort function
function bubble_sort {
local max=${#numArray[@]}
size=${#numArray[@]}
while ((max > 0))
do
local i=0
while ((i < max))
do
if [ "$i" != "$(($size-1))" ]
then
if [ ${numArray[$i]} \< ${numArray[$((i + 1))]} ]
then
local temp=${numArray[$i]}
numArray[$i]=${numArray[$((i + 1))]}
numArray[$((i + 1))]=$temp
local temp2=${wordArray[$i]}
wordArray[$i]=${wordArray[$((i + 1))]}
wordArray[$((i + 1))]=$temp2
fi
fi
((i += 1))
done
((max -= 1))
done
}
#Calling Bubble Sort function
bubble_sort "${numArray[@]}" "${wordArray[@]}"
But for some reason it won't sort them properly when the arrays are large.
Does anyone know what's wrong with it, or another approach to sort the words with their corresponding number of appearances, with or without arrays?
This:
wordArray = (because, maybe, why, the)
numArray = (5, 12, 20, 13)
Must turn to this:
wordArray = (why, the, maybe, because)
numArray = (20, 13, 12, 5)
Someone recommended to write the two arrays side by side in a text file and sort the file.
How will it work for this input:
1 Arthur
21 Zebra
to turn to this output:
21 Zebra
1 Arthur
Assuming the arrays do not contain tab character or newline character, how about:
#!/bin/bash
wordArray=(why the maybe because)
numArray=(20 13 12 5)
tmp1=$(mktemp tmp.XXXXXX) # file to be sorted
tmp2=$(mktemp tmp.XXXXXX) # sorted result
for (( i = 0; i < ${#wordArray[@]}; i++ )); do
echo "${numArray[i]}"$'\t'"${wordArray[i]}" # write the number and word delimited by a tab character
done > "$tmp1"
sort -nrk1,1 "$tmp1" > "$tmp2" # sort the file by number in descending order
while IFS=$'\t' read -r num word; do # read the lines splitting by the tab character
numArray_sorted+=("$num") # add the number to the array
wordArray_sorted+=("$word") # add the word to the array
done < "$tmp2"
rm -- "$tmp1" # unlink the temp file
rm -- "$tmp2" # same as above
echo "${wordArray_sorted[@]}" # see the result
echo "${numArray_sorted[@]}" # same as above
Output:
why the maybe because
20 13 12 5
If you prefer not to create temp files, here is the process substitution version, which will run faster without writing/reading temp files.
#!/bin/bash
wordArray=(why the maybe because)
numArray=(20 13 12 5)
while IFS=$'\t' read -r num word; do
numArray_sorted+=("$num")
wordArray_sorted+=("$word")
done < <(
sort -nrk1,1 < <(
for (( i = 0; i < ${#wordArray[@]}; i++ )); do
echo "${numArray[i]}"$'\t'"${wordArray[i]}"
done
)
)
echo "${wordArray_sorted[@]}"
echo "${numArray_sorted[@]}"
Or simpler (using the suggestion by KamilCuk):
#!/bin/bash
wordArray=(why the maybe because)
numArray=(20 13 12 5)
while IFS=$'\t' read -r num word; do
numArray_sorted+=("$num")
wordArray_sorted+=("$word")
done < <(
paste <(printf "%s\n" "${numArray[@]}") <(printf "%s\n" "${wordArray[@]}") | sort -nrk1,1
)
echo "${wordArray_sorted[@]}"
echo "${numArray_sorted[@]}"
You need numeric sort for the numbers. You can sort an array like this:
mapfile -t wordArray < <(printf '%s\n' "${wordArray[@]}" | sort -n)
But what you actually need is something like:
for num in "${numArray[@]}"; do
echo "$num: ${wordArray[j++]}"
done |
sort -n -k1,1
But, earlier in the process, you should have used only one array, where the word and frequency (or vice versa) are key value pairs. Then they always have a direct relationship, and can be printed similarly to the for loop above.
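A minimal sketch of that single-array idea, assuming bash 4 or later for associative arrays. The array name count and the function print_by_count are made up for illustration, and the sample values are the ones from the question.

```bash
#!/usr/bin/env bash
# Word -> number of appearances, kept together in one associative array.
declare -A count=([because]=5 [maybe]=12 [why]=20 [the]=13)

# Print "number<TAB>word" lines, highest count first.
print_by_count() {
    local word
    for word in "${!count[@]}"; do
        printf '%d\t%s\n' "${count[$word]}" "$word"
    done | sort -t $'\t' -k1,1nr
}

print_by_count
```

Since each word carries its own count, there is no second array to keep in step while sorting.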

BASH: How to write values generated by a for loop to a file quickly

I have a for loop in bash that writes values to a file. However, because there are a lot of values, the process takes a long time, which I think can be saved by improving the code.
nk=1152
nb=24
for k in $(seq 0 $((nk-1))); do
for i in $(seq 0 $((nb-1))); do
for j in $(seq 0 $((nb-1))); do
echo -e "$k\t$i\t$j"
done
done
done > file.dat
I've moved the output action to after the entire loop is done rather than echo -e "$k\t$i\t$j" >> file.dat to avoid opening and closing the file many times. However, the speed the script writes to the file is still rather slow, ~ 10kbps.
Is there a better way to improve the IO?
Many thanks
Jacek
It looks like the seq calls are fairly punishing since that is a separate process. Try this just using shell math instead:
for ((k=0;k<=$nk-1;k++)); do
for ((i=0;i<=$nb-1;i++)); do
for ((j=0;j<=$nb-1;j++)); do
echo -e "$k\t$i\t$j"
done
done
done > file.dat
It takes just 7.5s on my machine.
Another way is to compute the sequences just once and use them repeatedly, saving a lot of shell calls:
nk=1152
nb=24
kseq=$(seq 0 $((nk-1)))
bseq=$(seq 0 $((nb-1)))
for k in $kseq; do
for i in $bseq; do
for j in $bseq; do
echo -e "$k\t$i\t$j"
done
done
done > file.dat
This is not really "better" than the first option, but it shows how much of the time is spent spinning up instances of seq versus actually getting stuff done.
Bash isn't always the best for this. Consider this Ruby equivalent which runs in 0.5s:
#!/usr/bin/env ruby
nk=1152
nb=24
nk.times do |k|
nb.times do |i|
nb.times do |j|
puts "%d\t%d\t%d" % [ k, i, j ]
end
end
end
The most time-consuming part is calling seq in a nested loop. Keep in mind that each time you call seq, the shell loads the command from disk, forks a process to run it, captures the output, and stores the whole output sequence in memory.
Instead of calling seq you could use an arithmetic loop:
#!/usr/bin/env bash
declare -i nk=1152
declare -i nb=24
declare -i i j k
for ((k=0; k<nk; k++)); do
for (( i=0; i<nb; i++)); do
for (( j=0; j<nb; j++)); do
printf '%d\t%d\t%d\n' "$k" "$i" "$j"
done
done
done > file.dat
Running seq in a subshell consumes most of the time.
Switch to a different language that provides all the needed features without shelling out. For example, in Perl:
#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };
my $nk = 1152;
my $nb = 24;
for my $k (0 .. $nk - 1) {
for my $i (0 .. $nb - 1) {
for my $j (0 .. $nb - 1) {
say "$k\t$i\t$j"
}
}
}
The original bash solution runs for 22 seconds, the Perl one finishes in 0.1 seconds. The output is identical.
@Jacek: I don't think the I/O is the problem, but rather the number of child processes spawned. I would store the result of seq 0 $((nb-1)) in an array and loop over the array, i.e.
nb_seq=( $(seq 0 $((nb-1))) )
...
for i in "${nb_seq[@]}"; do
for j in "${nb_seq[@]}"; do
seq is slow. I once wrote this function specifically for this case:
$ que () { printf -v _N %$1s; _N=(${_N// / 1}); printf "${!_N[*]}"; }
$ que 10
0 1 2 3 4 5 6 7 8 9
And you can try writing everything to a variable first and then the whole variable to a file:
store+="$k\t$i\t$j\n"
printf "$store" > file
No, it's even worse that way.

How to loop argument in bash to call a function

I'd like to apologize if my question had already been asked, but english isn't my native language and I didn't find the answer. I'd like to have a bash script that executes a program I'll call MyProgram, and I want it to run with a fixed number of arguments which consist in random numbers. I'd like to have something like this:
./MyProgram for(i = 0; i < 1000; i++) $(($RANDOM%200-100))
How should I go about this?
You (mostly) just have the loop and the actual program call inverted.
for ((i=0; i < 1000; i++)); do
./MyProgram $((RANDOM%200 - 100))
done
If, however, you actually want 1000 different arguments passed to a single call, you have to build up a list first.
args=()
for ((i=0; i < 1000; i++)); do
args+=( $((RANDOM%200 - 100)) )
done
./MyProgram "${args[@]}"
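As a quick sanity check of the argument-building approach, here is a sketch that wraps it in a function and inspects the result instead of running the program. The name build_args is made up, and the real call is left commented out.

```bash
#!/usr/bin/env bash
# Fill the global array "args" with n random integers in the range -100..99.
build_args() {
    local n=$1 i
    args=()
    for ((i = 0; i < n; i++)); do
        args+=( $((RANDOM % 200 - 100)) )
    done
}

build_args 1000
echo "${#args[@]} arguments, first few: ${args[*]:0:5}"
# ./MyProgram "${args[@]}"     # the real call, commented out in this sketch
```

RANDOM % 200 yields 0 through 199, so subtracting 100 gives values from -100 up to and including 99.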
The expression
$RANDOM % 200 - 100
is comparable to the following Perl:
perl -E 'say int(200*rand() -100) for (1..1000)'
For example,
perl -E 'say int(200*rand() -100) for (1..1000)' | xargs -n1 ./MyProgram
will run like this:
./MyProgram -10
./MyProgram 13
... 1000 times ...
./MyProgram 55
./MyProgram -31
If you need 1000 arguments in a single call,
./MyProgram $(perl -E 'say int(200*rand() -100) for (1..1000)')
will produce
./MyProgram 5 -41 -81 -79 -14 ... 1000 numbers ... -63 -9 95 -9 -29
In addition to what @chepner says, you can also use the for ... in style of for loop. This looks like:
for a in one two three; do
echo "${a}"
done
which would produce the result:
one
two
three
In other words, the list of words after the in part, separated by spaces, is looped over, with each iteration of the loop having a different word in the variable a.
To call your program 1000 times (or just modify to produce the list of arguments to run it once as in @chepner's answer) you could then do:
for a in $(seq 1 1000); do
./MyProgram $((RANDOM%200 - 100))
done
where the output of the seq command is providing the list of values to loop over. Although the traditional for loop may be more immediately obvious to many programmers, I like for ... in because it can be applied in lots of situations. A crude and mostly pointless ls, for example:
for a in *; do
echo "${a}"
done
for ... in is probably the bit of "advanced" bash that I find the most useful, and make use of it very frequently.

Bash for loop - naming after n (which is user's input) [duplicate]

This question already has answers here:
How do I iterate over a range of numbers defined by variables in Bash?
(20 answers)
Closed 10 years ago.
I am looping over the commands with for i in {1..n} loop and want output files to have n extension.
For example:
for i in {1..2}
do cat FILE > ${i}_output
done
However n is user's input:
echo 'Please enter n'
read number
for i in {1.."$number"}
do
commands > ${i}_output
done
Loop rolls over n times - this works fine, but my output looks like this {1..n}_output.
How can I name my files in such loop?
Edit
Also tried this
for i in {1.."$number"}
do
k=`echo ${n} | tr -d '}' | cut -d "." -f 3`
commands > ${k}_output
done
But it's not working.
Use a "C-style" for-loop:
echo 'Please enter n'
read number
for ((i = 1; i <= number; i++))
do
commands > ${i}_output
done
Note that the $ is not required ahead of number or i in the for-loop header but double-parentheses are required.
The brace range in a for loop works only with constant values, because brace expansion happens before variable expansion. So replace {1..$num} with a literal range like {1..10}.
OR
Change the for loop to:
for((i=1;i<=number;i++))
You can use a simple for loop (similar to the ones found in languages like C, C++, etc.):
echo 'Please enter n'
read number
for (( i=1; i <= $number; i++ ))
do
commands > ${i}_output
done
Try using seq (1) instead. As in for i in $(seq 1 $number).
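To make the cause of the {1..n}_output names visible, here is a small sketch comparing the brace form with the C-style loop. The function names are made up, number is fixed at 3 rather than read from the user, and echo stands in for writing the files.

```bash
#!/usr/bin/env bash
number=3

# Brace expansion runs before variable expansion, so this loops once
# over the literal string "{1..3}".
brace_loop() {
    local i
    for i in {1..$number}; do echo "${i}_output"; done
}

# The arithmetic for loop evaluates the variable at run time.
c_style_loop() {
    local i
    for ((i = 1; i <= number; i++)); do echo "${i}_output"; done
}

brace_loop      # prints {1..3}_output
c_style_loop    # prints 1_output, 2_output, 3_output (one per line)
```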
