Bash For loop - multiple variables, not using arrays? - bash

I have run into an issue that seems like it should have an easy answer, but I keep hitting walls.
I'm trying to create a directory structure that contains files that are named via two different variables. For example:
101_2465
203_9746
526_2098
I am looking for something like this:
for NUM1 in 101 203 526 && NUM2 in 2465 9746 2098
do
mkdir $NUM1_$NUM2
done
I thought about just setting the values of NUM1 and NUM2 into arrays, but it overcomplicated the script -- I have to keep each line of code as simple as possible, as it is being used by people who don't know much about coding. They are already familiar with a for loop set up using the example above (but only using 1 variable), so I'm trying to keep it as close to that as possible.
Thanks in advance!

while read NUM1 NUM2; do
mkdir ${NUM1}_$NUM2
done << END
101 2465
203 9746
526 2098
END
Note that the underscore is a valid variable-name character, so you need the braces to disambiguate the name NUM1 from the trailing underscore (otherwise the shell looks for a variable called NUM1_).
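If the list of pairs changes often, the same loop can read them from a plain text file instead of a heredoc (a sketch; the filename pairs.txt is just an assumption):
while read -r NUM1 NUM2; do
  mkdir "${NUM1}_${NUM2}"
done < pairs.txt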

...setting the values of NUM1 and NUM2 into arrays, but it overcomplicated the script...
No, no, no. Anything else will be more complicated than arrays.
NUM1=( 101 203 526 )
NUM2=( 2465 9746 2098 )
for (( i=0; i<${#NUM1[@]}; i++ )); do
  echo "${NUM1[$i]}_${NUM2[$i]}"
done
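To actually create the directories instead of just echoing the names, the same loop works with mkdir (quotes kept as a precaution):
for (( i=0; i<${#NUM1[@]}; i++ )); do
  mkdir "${NUM1[$i]}_${NUM2[$i]}"
done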

One way is to separate the entries in your two variables by newlines, and then use paste to get them together:
a='101 203 526'
b='2465 9746 2098'
# Convert space-separated lists into newline-separated lists
a="$(echo $a | sed 's/ /\n/g')"
b="$(echo $b | sed 's/ /\n/g')"
# Acquire newline-separated list of tab-separated pairs
pairs="$(paste <(echo "$a") <(echo "$b"))"
# Loop over lines in $pairs
IFS=$'\n'
for p in $pairs; do
echo "$p" | awk '{print $1 "_" $2}'
done
Output:
101_2465
203_9746
526_2098
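A shorter variant of the same idea (a sketch, assuming the lists stay space-separated) lets read do the splitting, so neither sed nor awk is needed and IFS is left untouched:
a='101 203 526'
b='2465 9746 2098'
while read -r n1 n2; do
  echo "${n1}_${n2}"
done < <(paste -d' ' <(tr ' ' '\n' <<<"$a") <(tr ' ' '\n' <<<"$b"))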

Related

How to put a line from a file into a table (variable)

I have the following file
Durand 12 9 14
Lucas 8 11 4
Martin 9 12 1
I need to display the name and the average of the three other numbers with a function. The function part is easy.
I thought I could get line by line with:
head -i notes | tail -1
and then put the result of the command in a table in order to access it
table=(head -i notes | tail -1)
echo "${table[0]} averge : moy ${table[1]} ${table[2]} ${table[3]}"
You might use three important concepts to approach a problem like this.
Iterate over a file
Store values as variables
Do math to variables
A good way to read a file line by line is with a while loop:
while read line; do echo $line; done < notes
Notice how we use a file redirect < to treat the file as standard input. read consumes one full line at a time. Let's expand on that in order to store separate variables.
while read name a b c; do echo $name $a $b $c; done < notes
Now let's get math involved. You could use an external program like bc, but that's inefficient if we don't need floating point math (decimals). Bash has math built in!
while read name a b c; do echo $name $(( (a + b + c) / 3 )); done < notes
Like you said, the function part is easy :)
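For completeness, a sketch of wrapping that loop in a function (the function name is just an assumption):
average_scores() {
  # print each name followed by the integer average of the three scores
  while read -r name a b c; do
    echo "$name $(( (a + b + c) / 3 ))"
  done < "$1"
}
average_scores notes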
awk one liner:
awk '{print $1, ($2+$3+$4)/3}' notes
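With the sample file, that prints:
Durand 11.6667
Lucas 7.66667
Martin 7.33333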

Sorting and printing a file in bash UNIX

I have a file with a bunch of paths that look like so:
7 /usr/file1564
7 /usr/file2212
6 /usr/file3542
I am trying to use sort to pull out and print the path(s) with the most occurrences. Here is what I have so far:
cat temp | sort | uniq -c | sort -rk1 > temp
I am unsure how to only print the highest occurrences. I also want my output to be printed like this:
7 1564
7 2212
7 being the total number of occurrences and the other numbers being the file numbers at the end of the name. I am rather new to bash scripting so any help would be greatly appreciated!
To emit only the first line of output (with the highest number, since you're doing a reverse numeric sort immediately prior), pipe through head -n1.
To remove all content which is not either a number or whitespace, pipe through tr -cd '0-9[:space:]'.
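Combining those two suggestions with the pipeline from the question gives a sketch like this (sort -rn makes the ordering numeric, which the head -n1 step relies on):
sort temp | uniq -c | sort -rn | head -n1 | tr -cd '0-9[:space:]'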
To filter for only the values with the highest number, allowing there to be more than one:
{
read -r firstnum name && printf '%s\t%s\n' "$firstnum" "$name"
while read -r num name; do
[[ $num = $firstnum ]] || break
printf '%s\t%s\n' "$num" "$name"
done
} < temp
If you want to avoid sort and you are allowed to use awk, then you can do this:
awk '{
  if ($1 > maxcnt)       { s = $1 " " substr($2,10,4); maxcnt = $1 }
  else if ($1 == maxcnt) { s = s "\n" $1 " " substr($2,10,4) }
} END { print s }' temp
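With the sample input above, this prints (the substr() offsets assume the /usr/fileNNNN path layout):
7 1564
7 2212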

for loop control in bash using a string

I want to use a string to control a for loop in bash. My first test code produces what I would expect and what I want:
$ aa='1 2 3 4'
$ for ii in $aa; do echo $ii; done
1
2
3
4
I'd like to use something like the following instead. This doesn't give the output I'd like (I can see why it does what it does).
$ aa='1..4'
$ for ii in $aa; do echo $ii; done
1..4
Any suggestions on how I should modify the second example to give the same output as the first?
Thanks in advance for any thoughts. I'm slowly learning bash but still have a lot to learn.
Mike
The notation could be written out as:
for ii in {1..4}; do echo "$ii"; done
but the {1..4} needs to be written out like that, with no variables involved, and not as the result of variable substitution. This is called brace expansion in the Bash manual, and it happens before variable and other string expansions. You'll probably be best off using:
for ii in $(seq 1 4); do echo "$ii"; done
where either the 1 or the 4 or both can be shell variables.
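For example (a sketch, with the endpoints held in shell variables):
lo=1
hi=4
for ii in $(seq "$lo" "$hi"); do echo "$ii"; done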
You could use the seq command (see man seq).
$ aa='1 4'
$ for ii in $(seq $aa); do echo $ii; done
Bash won't do brace expansion with variables, but you can use eval:
$ aa='1..4'
$ for ii in $(eval echo {$aa}); do echo $ii; done
1
2
3
4
You could also split aa into an array:
IFS=. arr=($aa)
for ((ii=arr[0]; ii<=arr[2]; ii++)); do echo $ii; done
Note that IFS can only be a single character, so the .. range places the numbers into indexes 0 and 2.
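A related pure-bash variant (a sketch, assuming the string always has the start..end form) splits with parameter expansion instead of IFS:
aa='1..4'
start=${aa%%..*}
end=${aa##*..}
for ((ii=start; ii<=end; ii++)); do echo "$ii"; done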
Note: there are certainly more elegant ways of doing this, such as Ben Grimm's answer, and this is not pure bash, as it uses seq and awk.
One way of achieving this is by calling seq. It would be trivial if you knew the numbers in the string beforehand, since there would be no need to do any conversion; you could simply do seq 1 4, or seq $a $b for that matter.
I assume, however, that your input is indeed a string in the format you mentioned, that is, 1..4 or 20..100. For this purpose you could convert the string into two numbers and use them as parameters for seq.
One of possibly many ways of achieving this is:
$ `echo "1..4" | sed -e 's/\.\./ /g' | awk '{print "seq", $1, $2}'`
1
2
3
4
Note that this will work the same way for any input in the given format. If desired, sed can be replaced with tr, with similar results.
$ x="10..15"
$ `echo $x | tr "." " " | awk '{print "seq", $1, $2}'`
10
11
12
13
14
15

Using awk with Operations on Variables

I'm trying to write a Bash script that reads files with several columns of data and multiplies each value in the second column by each value in the third column, adding the results of all those multiplications together.
For example if the file looked like this:
Column 1 Column 2 Column 3 Column 4
genome 1 30 500
genome 2 27 500
genome 3 83 500
...
The script should multiply 1*30 to give 30, then 2*27 to give 54 (and add that to 30), then 3*83 to give 249 (and add that to 84), etc.
I've been trying to use awk to parse the input file but am unsure of how to get the operation to proceed line by line. Right now it stops after the first line is read and the operations on the variables are performed.
Here's what I've written so far:
for file in fileone filetwo
do
set -- $(awk '/genome/ {print $2,$3}' $file.hist)
var1=$1
var2=$2
var3=$((var1*var2))
total=$((total+var3))
echo var1 \= $var1
echo var2 \= $var2
echo var3 \= $var3
echo total \= $total
done
I tried placing a "while read" loop around everything but could not get the variables to update with each line. I think I'm going about this the wrong way!
I'm very new to Linux and Bash scripting so any help would be greatly appreciated!
That's because awk reads the entire file and runs its program on each line. So the output you get from awk '/genome/ {print $2,$3}' $file.hist will look like
1 30
2 27
3 83
and so on, which means in the bash script, the set command makes the following variable assignments:
$1 = 1
$2 = 30
$3 = 2
$4 = 27
$5 = 3
$6 = 83
etc. But you only use $1 and $2 in your script, meaning that the rest of the file's contents - everything after the first line - is discarded.
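A tiny demonstration of that set behavior (a sketch, with the awk output stubbed in by printf):
set -- $(printf '%s\n' '1 30' '2 27' '3 83')
echo $#      # 6: every number becomes its own positional parameter
echo $1 $2   # 1 30: only the first line's pair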
Honestly, unless you're doing this just to learn how to use bash, I'd say just do it in awk. Since awk automatically runs over every line in the file, it'll be easy to multiply columns 2 and 3 and keep a running total.
awk '{ total += $2 * $3 } ENDFILE { print total; total = 0 }' fileone filetwo
Here ENDFILE is a special pattern (a GNU awk extension) that means "run this next block at the end of each file, not at each line."
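If your awk doesn't support ENDFILE, a more portable sketch prints the running total whenever a new file starts (FNR resets to 1) and once more at the very end:
awk 'FNR==1 && NR>1 { print total; total=0 } { total += $2 * $3 } END { print total }' fileone filetwo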
If you are doing this for educational purposes, let me say this: the only thing you need to know about doing arithmetic in bash is that you should never do arithmetic in bash :-P Seriously though, when you want to manipulate numbers, bash is one of the least well-adapted tools for that job. But if you really want to know, I can edit this to include some information on how you could do this task primarily in bash.
I agree that awk is in general better suited for this kind of work, but if you are curious what a pure bash implementation would look like:
for f in file1 file2; do
total=0
while read -r _ x y _; do
((total += x * y))
done < "$f"
echo "$total"
done
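For the three genome lines shown in the question, each file prints 333 (30 + 54 + 249); the header line contributes 0 to the sum.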

What's an easy way to read random line from a file?

What's an easy way to read random line from a file in a shell script?
You can use shuf:
shuf -n 1 $FILE
There is also a utility called rl. In Debian it's in the randomize-lines package that does exactly what you want, though not available in all distros. On its home page it actually recommends the use of shuf instead (which didn't exist when it was created, I believe). shuf is part of the GNU coreutils, rl is not.
rl -c 1 $FILE
Another alternative:
head -$((${RANDOM} % `wc -l < file` + 1)) file | tail -1
sort --random-sort $FILE | head -n 1
(I like the shuf approach above even better though - I didn't even know that existed and I would have never found that tool on my own)
This is simple.
cat file.txt | shuf -n 1
Granted this is just a tad slower than the "shuf -n 1 file.txt" on its own.
perlfaq5: How do I select a random line from a file? Here's a reservoir-sampling algorithm from the Camel Book:
perl -e 'srand; rand($.) < 1 && ($line = $_) while <>; print $line;' file
This has a significant advantage in space over reading the whole file in. You can find a proof of this method in The Art of Computer Programming, Volume 2, Section 3.4.2, by Donald E. Knuth.
using a bash script:
#!/bin/bash
# replace with file to read
FILE=tmp.txt
# count number of lines
NUM=$(wc -l < ${FILE})
# generate random number in range 0-NUM
let X=${RANDOM} % ${NUM} + 1
# extract X-th line
sed -n ${X}p ${FILE}
Single bash line:
sed -n $((1+$RANDOM%`wc -l test.txt | cut -f 1 -d ' '`))p test.txt
Slight problem: the filename has to appear twice.
Here's a simple Python script that will do the job:
import random, sys
lines = open(sys.argv[1]).readlines()
print(lines[random.randrange(len(lines))])
Usage:
python randline.py file_to_get_random_line_from
Another way using 'awk'
awk NR==$((${RANDOM} % `wc -l < file.name` + 1)) file.name
A solution that also works on macOS, and should also work on Linux(?):
N=5
awk 'NR==FNR {lineN[$1]; next}(FNR in lineN)' <(jot -r $N 1 $(wc -l < $file)) $file
Where:
N is the number of random lines you want
NR==FNR {lineN[$1]; next}(FNR in lineN) file1 file2
--> save line numbers written in file1 and then print corresponding line in file2
jot -r $N 1 $(wc -l < $file) --> draw N numbers randomly (-r) in range (1, number_of_line_in_file) with jot. The process substitution <() will make it look like a file for the interpreter, so file1 in previous example.
#!/bin/bash
IFS=$'\n' wordsArray=($(<$1))
numWords=${#wordsArray[@]}
sizeOfNumWords=${#numWords}
while [ True ]
do
for ((i=0; i<$sizeOfNumWords; i++))
do
let ranNumArray[$i]=$(( ( $RANDOM % 10 ) + 1 ))-1
ranNumStr="$ranNumStr${ranNumArray[$i]}"
done
if [ $ranNumStr -le $numWords ]
then
break
fi
ranNumStr=""
done
noLeadZeroStr=$((10#$ranNumStr))
echo ${wordsArray[$noLeadZeroStr]}
Here is what I came up with, since my macOS doesn't have all of the easy answers available. I used the jot command to generate a number, since the $RANDOM variable solution did not seem very random in my testing. When testing my solution, I got a wide variance in the output.
RANDOM1=`jot -r 1 1 235886`
#range of jot ( 1 235886 ) found from earlier wc -w /usr/share/dict/web2
echo $RANDOM1
head -n $RANDOM1 /usr/share/dict/web2 | tail -n 1
The echo of the variable is to get a visual of the generated random number.
Using only vanilla sed and awk, and without using $RANDOM, a simple, space-efficient and reasonably fast "one-liner" for selecting a single line pseudo-randomly from a file named FILENAME is as follows:
sed -n $(awk 'END {srand(); r=rand()*NR; if (r<NR) {sub(/\..*/,"",r); r++;}; print r}' FILENAME)p FILENAME
(This works even if FILENAME is empty, in which case no line is emitted.)
One possible advantage of this approach is that it only calls rand() once.
As pointed out by @AdamKatz in the comments, another possibility would be to call rand() for each line:
awk 'rand() * NR < 1 { line = $0 } END { print line }' FILENAME
(A simple proof of correctness can be given based on induction.)
Caveat about rand()
"In most awk implementations, including gawk, rand() starts generating numbers from the same starting number, or seed, each time you run awk."
-- https://www.gnu.org/software/gawk/manual/html_node/Numeric-Functions.html
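If repeated runs within the same second need to differ, one workaround (a sketch, using bash's $RANDOM purely as extra seed material) is to pass a seed in from the shell:
awk -v seed="$RANDOM" 'BEGIN { srand(seed) } rand() * NR < 1 { line = $0 } END { print line }' FILENAME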
