How to put a line from a file into a table (variable) - bash

I have the following file
Durand 12 9 14
Lucas 8 11 4
Martin 9 12 1
I need to display the name and the average of the other three numbers, using a function. The function part is easy.
I thought I could get the file line by line with:
head -$i notes | tail -1
and then put the result of the command into a table (array) in order to access its fields:
table=(head -$i notes | tail -1)
echo "${table[0]} average : moy ${table[1]} ${table[2]} ${table[3]}"

You might use three important concepts to approach a problem like this.
Iterate over a file
Store values as variables
Do math to variables
A good way to read a file line by line is with a while loop:
while read line; do echo $line; done < notes
Notice how we use a file redirect < to treat the file as standard input. read consumes one full line at a time. Let's expand on that in order to store separate variables.
while read name a b c; do echo $name $a $b $c; done < notes
Now let's get math involved. You could use an external program like bc, but that's inefficient if we don't need floating point math (decimals). Bash has math built in!
while read name a b c; do echo $name $(( (a + b + c) / 3 )); done < notes
Like you said, the function part is easy :)
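For completeness, a minimal sketch that ties it together (moy here is just a stand-in for whatever your averaging function ends up being, and this assumes integer averages are acceptable):
#!/bin/bash

# moy: stand-in averaging function -- integer average of three numbers
moy() {
    echo $(( ($1 + $2 + $3) / 3 ))
}

while read -r name a b c; do
    echo "$name average: $(moy "$a" "$b" "$c")"
done < notes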
awk one liner:
awk '{print $1, ($2+$3+$4)/3}' notes

Related

How To Split Up Digits Into Character Array

I'm a bit stuck with something. I have a for loop like this:
#!/bin/bash
for i in {10..15}
do
I want to obtain the last digit of the number, so if i is 12, I want to get 2. I'm having difficulties with the syntax though. I've read that I should convert it into a character array, but when I do something like:
j=${i[@]}
echo $j
I don't get 1 0 1 1 1 2 and so on... I get 10, 11, 12... How do I get the numbers to be split up so I can get the last digit of i, when I don't always know how many digits will make up i (e.g. it may be 1, or 10, or 100, etc.)?
The trick is to treat $i like a string.
for i in {10..15}; do j="${i: -1}"; echo $j; done
Of course, you do not need to assign to a variable if you don't want to:
for i in {10..15}; do echo "${i: -1}"; done
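If you ever need every digit in an array (as the title suggests), a small sketch using the same substring expansion, one character at a time:
for i in 7 10 134; do
    digits=()
    for (( k = 0; k < ${#i}; k++ )); do
        digits+=( "${i:k:1}" )   # k-th character of the number
    done
    echo "${digits[@]}"          # prints "1 3 4" for 134
done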
This answer, which uses Bash's substring parameter expansion, is the most sensible method, I guess.
However, you can also use the double-parenthesis construct, which allows C-style arithmetic on variables in Bash.
for i in {10..15}
do
(( j = i % 10 )) # modulo 10 always gives the ones' digit
echo $j
done
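The same arithmetic extends to other digit positions if you ever need them; for example (a hypothetical extension, not asked for in the question), integer division shifts the number right before taking the remainder:
for i in {10..15}
do
    (( ones = i % 10 ))       # ones' digit
    (( tens = i / 10 % 10 ))  # tens' digit
    echo "$tens $ones"
done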
This awk command could solve your problem:
awk '{print substr($0,length,1)}' test_file
I'm assuming that the numbers are saved in a file named test_file.
If you want to use a for loop:
for i in `cat test_file`
do
echo $i | tail -c 2
done

Sorting on multiple columns w/ an output file per key

I'm uncertain as to how I can use the until loop inside a while loop.
I have an input file of 500,000 lines that look like this:
9 1 1 0.6132E+02
9 2 1 0.6314E+02
10 3 1 0.5874E+02
10 4 1 0.5266E+02
10 5 1 0.5571E+02
1 6 1 0.5004E+02
1 7 1 0.5450E+02
2 8 1 0.5696E+02
11 9 1 0.6369E+02
.....
And what I'm hoping to achieve is to sort the numbers in the first column in numerical order so that I can pull all the similar lines (e.g. lines that start with the same number) into new text files "cluster${i}.txt". From there I want to sort the fourth column of the "cluster${i}.txt" files in numerical order. After sorting I would like to write the first row of each sorted "cluster${i}.txt" file into a single output file. A sample output of "cluster1.txt" would look like this:
1 6 1 0.5004E+02
1 7 1 0.5450E+02
1 11 1 0.6777E+02
....
as well as an output.txt file that would look like this:
1 6 1 0.5004E+02
2 487 1 0.3495E+02
3 34 1 0.0344E+02
....
Here is what I've written:
#!/bin/bash
input='input.txt'
i=1
sort -nk 1 $input > 'temp.txt'
while read line; do
awk -v var="$i" '$1 == var' temp.txt > "cluster${i}.txt"
until [[$i -lt 20]]; do
i=$((i+1))
done
done
for f in *.txt; do
sort -nk 4 > temp2.txt
head -1 temp2.txt
rm temp2.txt
done > output.txt
This only takes one line, if your sort -n knows how to handle exponential notation:
sort -nk 1,4 <in.txt | awk '{ of="cluster" $1 ".txt"; print $0 >>of }'
...or, to also write the first line for each index to output.txt:
sort -nk 1,4 <in.txt | awk '
{
if($1 != last) {
print $0 >"output.txt"
last=$1
}
of="cluster" $1 ".txt";
print $0 >of
}'
Consider using an awk implementation -- such as GNU awk -- which will cache file descriptors, rather than reopening each output file for every append; this will greatly improve performance.
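If your awk does not cache descriptors, a sketch of a workaround is to close each cluster file when the key changes, which is safe here because the input is already sorted on the first column:
sort -nk 1,4 <in.txt | awk '
$1 != last {
    if (last != "") close("cluster" last ".txt")
    print $0 > "output.txt"
    last = $1
}
{ print $0 > ("cluster" $1 ".txt") }'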
By the way, let's look at what was wrong with the original script:
It was slow. Really, really slow.
Starting a new instance of awk 20 times for every line of input (because the whole point of while read is to iterate over individual lines, so putting an awk inside a while read is going to run awk at least once per line) is going to have a very appreciable impact on performance. Not that it was actually doing this, because...
The while read line outer loop was reading from stdin, not temp.txt or input.txt.
Thus, the script was hanging if stdin didn't have anything written on it, or wasn't executing the contents of the loop at all if stdin pointed to a source with no content like /dev/null.
The inner loop wasn't actually processing the line read by the outer loop. line was being read, but all of temp.txt was being operated on.
The awk wasn't actually inside the inner loop, but rather was inside the outer loop, just before the inner loop. Consequently, it wasn't being run 20 times with different values for i, but run only once per line read, with whichever value for i was left over from previously executed code.
Whitespace is important to how commands are parsed. [[foo]] is wrong; it needs to be [[ foo ]].
To "fix" the inner loop, to do what I imagine you meant to write, might look like this:
# this is slow and awful, but at least it'll work.
while IFS= read -r line; do
i=0
until [[ $i -ge 20 ]]; do
awk -v var="$i" '$1 == var' <<<"$line" >>"cluster${i}.txt"
i=$((i+1))
done
done <temp.txt
...or, somewhat better (but still not as good as the solution suggested at the top):
# this is somewhat less awful.
for (( i=0; i<=20; i++ )); do
awk -v var="$i" '$1 == var' <temp.txt >"cluster${i}.txt"
head -n 1 "cluster${i}.txt"
done >output.txt
Note how the redirection to output.txt is done just once, for the whole loop -- this means we're only opening the file once.

for loop control in bash using a string

I want to use a string to control a for loop in bash. My first test code produces what I would expect and what I want:
$ aa='1 2 3 4'
$ for ii in $aa; do echo $ii; done
1
2
3
4
I'd like to use something like the following instead. This doesn't give the output I'd like (I can see why it does what it does).
$ aa='1..4'
$ for ii in $aa; do echo $ii; done
1..4
Any suggestions on how I should modify the second example to give the same output as the first?
Thanks in advance for any thoughts. I'm slowly learning bash but still have a lot to learn.
Mike
The notation could be written out as:
for ii in {1..4}; do echo "$ii"; done
but the {1..4} needs to be written out like that, no variables involved, and not as the result of variable substitution. That is called brace expansion in the Bash manual, and it happens before variable expansion and the other string expansions. You'll probably be best off using:
for ii in $(seq 1 4); do echo "$ii"; done
where either the 1 or the 4 or both can be shell variables.
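For example, with the endpoints held in shell variables (start and end are just illustrative names):
start=1
end=4
for ii in $(seq "$start" "$end"); do echo "$ii"; done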
You could use seq command (see man seq).
$ aa='1 4'
$ for ii in $(seq $aa); do echo $ii; done
Bash won't do brace expansion with variables, but you can use eval:
$ aa='1..4'
$ for ii in $(eval echo {$aa}); do echo $ii; done
1
2
3
4
You could also split aa into an array:
IFS=. arr=($aa)
for ((ii=arr[0]; ii<=arr[2]; ii++)); do echo $ii; done
Note that IFS can only be a single character, so the .. range places the numbers into indexes 0 and 2.
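Another pure-Bash sketch, assuming the string always has the start..end shape, is to strip the range apart with parameter expansion and drive a C-style loop:
aa='1..4'
start=${aa%%..*}   # everything before the first ".."
end=${aa##*..}     # everything after the last ".."
for (( ii = start; ii <= end; ii++ )); do echo "$ii"; done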
Note: there are certainly more elegant ways of doing this, such as Ben Grimm's answer, and this is not pure bash, as it uses seq and awk.
One way of achieving this is by calling seq. It would be trivial if you knew the numbers in the string beforehand, so there would be no need to do any conversion, as you could simply do seq 1 4 or seq $a $b for that matter.
I assume, however, that your input is indeed a string in the format you mentioned, that is, 1..4 or 20..100. For this purpose you could convert the string into 2 numbers and use them as parameters for seq.
One of possibly many ways of achieving this is:
$ `echo "1..4" | sed -e 's/\.\./ /g' | awk '{print "seq", $1, $2}'`
1
2
3
4
Note that this will work the same way for any input in the given format. If desired, sed can be replaced by tr with similar results.
$ x="10..15"
$ `echo $x | tr "." " " | awk '{print "seq", $1, $2}'`
10
11
12
13
14
15

Using awk with Operations on Variables

I'm trying to write a Bash script that reads files with several columns of data and multiplies each value in the second column by each value in the third column, adding the results of all those multiplications together.
For example if the file looked like this:
Column 1 Column 2 Column 3 Column 4
genome 1 30 500
genome 2 27 500
genome 3 83 500
...
The script should multiply 1*30 to give 30, then 2*27 to give 54 (and add that to 30), then 3*83 to give 249 (and add that to 84), and so on.
I've been trying to use awk to parse the input file but am unsure of how to get the operation to proceed line by line. Right now it stops after the first line is read and the operations on the variables are performed.
Here's what I've written so far:
for file in fileone filetwo
do
set -- $(awk '/genome/ {print $2,$3}' $file.hist)
var1=$1
var2=$2
var3=$((var1*var2))
total=$((total+var3))
echo var1 \= $var1
echo var2 \= $var2
echo var3 \= $var3
echo total \= $total
done
I tried placing a "while read" loop around everything but could not get the variables to update with each line. I think I'm going about this the wrong way!
I'm very new to Linux and Bash scripting so any help would be greatly appreciated!
That's because awk reads the entire file and runs its program on each line. So the output you get from awk '/genome/ {print $2,$3}' $file.hist will look like
1 30
2 27
3 83
and so on, which means in the bash script, the set command makes the following variable assignments:
$1 = 1
$2 = 30
$3 = 2
$4 = 27
$5 = 3
$6 = 83
etc. But you only use $1 and $2 in your script, meaning that the rest of the file's contents - everything after the first line - is discarded.
Honestly, unless you're doing this just to learn how to use bash, I'd say just do it in awk. Since awk automatically runs over every line in the file, it'll be easy to multiply columns 2 and 3 and keep a running total.
awk '{ total += $2 * $3 } ENDFILE { print total; total = 0 }' fileone filetwo
Here ENDFILE is a special pattern (a GNU awk extension) that means "run this block at the end of each file, not at each line."
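If your awk is not GNU awk, a portable sketch of the same idea resets the total whenever FNR drops back to 1 (i.e. a new file starts) and prints the last file's total in END:
awk 'FNR == 1 && NR != 1 { print total; total = 0 }
     { total += $2 * $3 }
     END { print total }' fileone filetwo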
If you are doing this for educational purposes, let me say this: the only thing you need to know about doing arithmetic in bash is that you should never do arithmetic in bash :-P Seriously though, when you want to manipulate numbers, bash is one of the least well-adapted tools for that job. But if you really want to know, I can edit this to include some information on how you could do this task primarily in bash.
I agree that awk is in general better suited for this kind of work, but if you are curious what a pure bash implementation would look like:
for f in file1 file2; do
total=0
while read -r _ x y _; do
((total += x * y))
done < "$f"
echo "$total"
done

What's an easy way to read random line from a file?

What's an easy way to read random line from a file in a shell script?
You can use shuf:
shuf -n 1 $FILE
There is also a utility called rl. In Debian it's in the randomize-lines package that does exactly what you want, though not available in all distros. On its home page it actually recommends the use of shuf instead (which didn't exist when it was created, I believe). shuf is part of the GNU coreutils, rl is not.
rl -c 1 $FILE
Another alternative:
head -$((${RANDOM} % `wc -l < file` + 1)) file | tail -1
sort --random-sort $FILE | head -n 1
(I like the shuf approach above even better though - I didn't even know that existed and I would have never found that tool on my own)
This is simple.
cat file.txt | shuf -n 1
Granted this is just a tad slower than the "shuf -n 1 file.txt" on its own.
perlfaq5: How do I select a random line from a file? Here's a reservoir-sampling algorithm from the Camel Book:
perl -e 'srand; rand($.) < 1 && ($line = $_) while <>; print $line;' file
This has a significant advantage in space over reading the whole file in. You can find a proof of this method in The Art of Computer Programming, Volume 2, Section 3.4.2, by Donald E. Knuth.
using a bash script:
#!/bin/bash
# replace with file to read
FILE=tmp.txt
# count number of lines
NUM=$(wc -l < ${FILE})
# generate a random line number in the range 1-NUM
let "X = ${RANDOM} % ${NUM} + 1"
# extract X-th line
sed -n ${X}p ${FILE}
Single bash line:
sed -n $((1+$RANDOM%`wc -l test.txt | cut -f 1 -d ' '`))p test.txt
Slight problem: duplicate filename.
Here's a simple Python script that will do the job:
import random, sys
lines = open(sys.argv[1]).readlines()
print(lines[random.randrange(len(lines))])
Usage:
python randline.py file_to_get_random_line_from
Another way using 'awk'
awk NR==$((${RANDOM} % `wc -l < file.name` + 1)) file.name
A solution that also works on MacOSX, and should also work on Linux(?):
N=5
awk 'NR==FNR {lineN[$1]; next}(FNR in lineN)' <(jot -r $N 1 $(wc -l < $file)) $file
Where:
N is the number of random lines you want
NR==FNR {lineN[$1]; next}(FNR in lineN) file1 file2
--> save line numbers written in file1 and then print corresponding line in file2
jot -r $N 1 $(wc -l < $file) --> draw N numbers randomly (-r) in range (1, number_of_line_in_file) with jot. The process substitution <() will make it look like a file for the interpreter, so file1 in previous example.
#!/bin/bash
IFS=$'\n' wordsArray=($(<$1))
numWords=${#wordsArray[@]}
sizeOfNumWords=${#numWords}
while [ True ]
do
for ((i=0; i<$sizeOfNumWords; i++))
do
let ranNumArray[$i]=$(( ( $RANDOM % 10 ) + 1 ))-1
ranNumStr="$ranNumStr${ranNumArray[$i]}"
done
if [ $ranNumStr -le $numWords ]
then
break
fi
ranNumStr=""
done
noLeadZeroStr=$((10#$ranNumStr))
echo ${wordsArray[$noLeadZeroStr]}
Here is what I discovered, since my Mac OS doesn't support all the easy answers. I used the jot command to generate a number, since the $RANDOM variable solutions did not seem very random in my testing. When testing my solution, I got a wide variance in the lines it produced.
RANDOM1=`jot -r 1 1 235886`
#range of jot ( 1 235886 ) found from earlier wc -w /usr/share/dict/web2
echo $RANDOM1
head -n $RANDOM1 /usr/share/dict/web2 | tail -n 1
The echo of the variable is to get a visual of the generated random number.
Using only vanilla sed and awk, and without using $RANDOM, a simple, space-efficient and reasonably fast "one-liner" for selecting a single line pseudo-randomly from a file named FILENAME is as follows:
sed -n $(awk 'END {srand(); r=rand()*NR; if (r<NR) {sub(/\..*/,"",r); r++;}; print r}' FILENAME)p FILENAME
(This works even if FILENAME is empty, in which case no line is emitted.)
One possible advantage of this approach is that it only calls rand() once.
As pointed out by @AdamKatz in the comments, another possibility would be to call rand() for each line:
awk 'rand() * NR < 1 { line = $0 } END { print line }' FILENAME
(A simple proof of correctness can be given based on induction.)
Caveat about rand()
"In most awk implementations, including gawk, rand() starts generating numbers from the same starting number, or seed, each time you run awk."
-- https://www.gnu.org/software/gawk/manual/html_node/Numeric-Functions.html
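A workaround sketch is to seed the generator explicitly, either with a bare srand() call (which POSIX awk seeds from the time of day) or by passing a seed in from the shell (seed is just an illustrative variable name):
awk -v seed="$RANDOM" 'BEGIN { srand(seed) }
     rand() * NR < 1 { line = $0 }
     END { if (NR) print line }' FILENAME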
