calculate median with shell script - bash

I have a script that prints numbers in nested loops.
#!/bin/bash
for i in `seq 80 $i`
do
for j in `seq 1 $4`
do
./sujet1 $1 $2 $i
done
done
./sujet1 $1 $2 $i is a compiled C program which prints a number (but I don't want to print it on screen).
I would like to calculate the median of the numbers that ./sujet1 $1 $2 $i prints in the second loop, then print this median on the screen.
So I'll have $i medians at the end.
I know I should first use ./sujet1 $1 $2 $i >> mediane.txt to save the values, but I don't know how to read them back from the file, calculate the median, and erase them when finishing every loop.
EDIT:
I tried with awk as suggested in the comments, but I find it difficult to understand:
#!/bin/bash
for i in `seq 80 $i`
do
for j in `seq 1 $4`
do
awk '{ total += ./sujet1 $1 $2 $i } END { print total/NR }' mediane.txt
done
done
It doesn't work for me.
EDIT 2: for example I type ./run.sh 30 40 90 3
so I'll have
//for($3= 80 )
2,3
3,5
4,4
//for($3= 81 )
4,5
1,3
5,6
...
//for($3=90)
2,4
3,5
5,4
Notice that for every value of $3 I have $4 values repeating. I want to calculate the median of these $4 values and print one value.

Your question is very hard to understand, but I think you want to run the sujet program lots of times and average the answer.
for i in `seq 80 $i`
do
for j in `seq 1 $4`
do
./sujet1 $1 $2 $i
done
done | awk '{total += $0} END{ print total/NR}'
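To see the averaging awk in isolation, here is a tiny self-contained sketch, with printf standing in for the program's output:

```shell
# feed three sample numbers to the averaging one-liner
printf '%s\n' 2 3 4 | awk '{ total += $0 } END { print total / NR }'
# prints 3
```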
Maybe you want the median of all the outputs of the sujet program. If so, pipe the output through sort first and then find the middle one with awk something like this:
for ...
for ...
./sujet ...
done
done | sort -n | awk '{x[NR]=$0} END{middle=int((NR+1)/2); print x[middle]}'
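As a self-contained sketch of the sort-then-pick-the-middle approach, with printf standing in for the sujet runs:

```shell
# five unsorted sample values; the median of 1..5 is 3
printf '%s\n' 4 1 3 5 2 |
sort -n |
awk '{ x[NR] = $0 } END { middle = int((NR + 1) / 2); print x[middle] }'
# prints 3
```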

You could use the backticks operator:
result="`./sujet1 $1 $2 $i`"
It runs a command inline and assigns its output to the variable on the left-hand side.
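The same capture works with the modern $(...) form, which nests more cleanly than backticks (echo stands in for ./sujet1 here):

```shell
# capture a command's stdout into a variable
result="$(echo 42)"
echo "captured: $result"
# prints: captured: 42
```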

Related

AWK, average columns of different length from multiple files

I need to calculate averages of columns from multiple files, but the columns have different numbers of lines. I guess awk is the best tool for this, but anything from bash will be OK. A solution for 1 column per file is OK. If the solution works for files with multiple columns, even better.
Example.
file_1:
10
20
30
40
50
file_2:
20
30
40
Expected result:
15
25
35
40
50
awk would be a tool to do it easily:
awk '{a[FNR]+=$0;n[FNR]++;next}END{for(i=1;i<=length(a);i++)print a[i]/n[i]}' file1 file2
The method also works for more than two files.
Brief explanation:
FNR is the input record number in the current input file, so it restarts from 1 for each file.
a[FNR] accumulates the sum of the values found at that line position across the files.
n[FNR] counts how many files contributed a value at that position.
The END loop then prints the average a[i]/n[i] for each position.
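Recreating the question's example files shows the expected output. This sketch uses a portable max-row counter instead of length(a) on an array, which is a GNU awk extension:

```shell
# recreate file_1 and file_2 from the question in a temp dir
dir=$(mktemp -d)
printf '%s\n' 10 20 30 40 50 > "$dir/file_1"
printf '%s\n' 20 30 40       > "$dir/file_2"
awk 'FNR > m { m = FNR }            # remember the longest file
     { a[FNR] += $0; n[FNR]++ }     # per-row sum and contributor count
     END { for (i = 1; i <= m; i++) print a[i] / n[i] }' \
    "$dir/file_1" "$dir/file_2"
# prints 15 25 35 40 50, one per line
rm -r "$dir"
```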
I have prepared the following bash script for you; I hope it helps.
Let me know if you have any questions.
#!/usr/bin/env bash
# check that the files provided as parameters exist
if [ ! -f "$1" ] || [ ! -f "$2" ]; then
    echo "ERROR: file> $1 or file> $2 is missing"
    exit 1
fi
# save the length of both files in variables
file1_length=$(wc -l < "$1")
file2_length=$(wc -l < "$2")
# if file 1 is longer than file 2, pad file 2 with copies of file 1's
# trailing lines until both files are the same length; padding with
# zeros would wrongly halve the values that appear in only one file
# you can improve the script by creating temp files instead of working
# directly on the input ones
if [ "$file1_length" -gt "$file2_length" ]; then
    n_lines_to_append=$(( file1_length - file2_length ))
    echo "append $n_lines_to_append lines to file $2"
    # append the trailing lines of file 1 to the end of file 2
    tail -n "${n_lines_to_append}" "$1" >> "$2"
    # combine both files and compute the average line by line
    awk 'FNR==NR { a[FNR] = $0; next } { print (a[FNR] + $0) / 2 }' "$1" "$2"
# if file 2 is longer than file 1 do the inverse operation
elif [ "$file2_length" -gt "$file1_length" ]; then
    n_lines_to_append=$(( file2_length - file1_length ))
    echo "append $n_lines_to_append lines to file $1"
    tail -n "${n_lines_to_append}" "$2" >> "$1"
    awk 'FNR==NR { a[FNR] = $0; next } { print (a[FNR] + $0) / 2 }' "$1" "$2"
# if the files have the same length we do not need to append anything
# and we can directly compute the average line by line
else
    echo "the files : $1 and $2 have the same size."
    awk 'FNR==NR { a[FNR] = $0; next } { print (a[FNR] + $0) / 2 }' "$1" "$2"
fi

Adding numbers with a while loop using piped output

So I am running a random file that can receive several arguments ($1 and $2, not shown), and then does something with the argument passed...
With the 3rd argument, I am supposed to search for $3 (or not $3) in file1 and add the number of instances of each to file2...
This works fine:
cat file1 | grep $3 | wc -l | while read line1; do echo $3 $line1 > file2; done
cat file1 | grep -v $3 | wc -l | while read line2; do echo not $3 $line2 >> file2; done
Now I am trying to read file2, which holds the instances of the search. I want to get the numbers in the file and their sum, and then append the sum to file2. So, for example, if $3 was "baby":
file2 would contain-
baby 30
not baby 20
and then i want to get the sum of 20 and 30 and append to that same file2, so that it looks like-
baby 30
not baby 20
total 50
This is what I have at the moment:
cat file2 | grep -o '[0-9]*' | while read num ; do sum=$(($sum + $num));echo "total $sum" >> file2; done
My file2 ends up with two lines for totals, where one of them is what I need:
baby 30
not baby 20
total 30
total 50
What did I miss here?
This is happening because your echo is inside your while loop.
The obvious solution would be to move it outside the loop, but if you try that you will find that $sum is not set. This is because each stage of a pipeline is spawned as its own process, so the while loop runs in a subshell. You can solve this by using braces ({}) to group your commands:
cat file2 | grep -o '[0-9]*' | { while read num ; do sum=$(($sum + $num)); done; echo "total $sum" >> file2; }
Other answers do point out better ways of doing this, but this hopefully helps you understand what is happening.
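A minimal demonstration of the subshell behaviour, using printf to stand in for the grep-derived counts:

```shell
sum=0
# the while loop runs in a pipeline subshell, so its sum is lost
printf '%s\n' 30 20 | while read -r num; do sum=$((sum + num)); done
echo "after pipe: $sum"
# prints: after pipe: 0

# grouping the loop and the echo keeps them in the same subshell
printf '%s\n' 30 20 | { sum=0; while read -r num; do sum=$((sum + num)); done; echo "total $sum"; }
# prints: total 50
```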
cat file1 | grep $3 | wc -l | while read line1; do echo $3 $line1 > file2; done
If you want to count the instances of $3, you can use the option -c of grep, avoiding a pipe to wc(1). Moreover, it would be better to quote the $3. Finally, you don't need a loop to read the count (either from wc or grep): it is a single line! So, your code above could be written like this:
count=$(grep -c "$3" file1)
echo $count $3 >file2
The second grep would be just the same as before:
count=$(grep -vc "$3" file1)
echo $count $3 >>file2
Now you should have the intermediate result:
30 baby
20 not baby
Note that I reversed the two terms, count and pattern; this is because we know that the count is a single word, but the pattern could be more words. So writing first the count, we have a well defined format: "count, then all the rest".
The third loop can be written like this:
sum=0
while read num string; do
# string is filled with all the rest on the line
let "sum = sum + num"
done < file2
echo "$sum total" >> file2
There are other ways to sum up the total; if needed, you could also reverse again the terms of the final file, as was your original - it could be done by using another file again.
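Putting the pieces together, here is a self-contained version of that final loop, with a temp file standing in for file2:

```shell
tmp=$(mktemp)
printf '%s\n' '30 baby' '20 not baby' > "$tmp"
sum=0
# redirecting into the loop (instead of piping) keeps sum in this shell
while read -r num string; do
    sum=$((sum + num))
done < "$tmp"
echo "$sum total" >> "$tmp"
cat "$tmp"
# prints the two count lines followed by: 50 total
rm "$tmp"
```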

Bash loop that calculates the sums of columns

I'm trying to write a loop in Bash that prints the sum of every column in a file. These columns are separated by tabs. What I have so far is this:
cols() {
count=$(grep -c $'\t' $1)
for n in $(seq 1 $count) ;do
cat $FILE | awk '{sum+=$1} END{print "sum=",sum}'
done
}
But this only prints out the sum of the first column. How can I do this for every column?
Your approach does the job, but it is somewhat overkill: you are counting the number of columns, then catting the file and calling awk once per column, while awk alone can do all of it:
awk -F"\t" '{for(i=1; i<=NF; i++) sum[i]+=$i} END {for (i in sum) print i, sum[i]}' file
This takes advantage of NF, which stores the number of fields of a line (what you were trying to obtain with count=$(grep -c $'\t' $1)). Then it is just a matter of looping through the fields and adding each one to the array, where sum[i] contains the running sum for column i. Finally, the END block loops through the result and prints the values.
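For a quick check, here are two tab-separated rows fed through the same idea, using an ordered 1..NF loop in END (which assumes every line has the same number of fields):

```shell
printf '1\t2\t3\n4\t5\t6\n' |
awk -F'\t' '{ for (i = 1; i <= NF; i++) sum[i] += $i }
            END { for (i = 1; i <= NF; i++) print i, sum[i] }'
# prints:
# 1 5
# 2 7
# 3 9
```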
Why isn't your approach summing a given column? Because when you say:
for n in $(seq 1 $count) ;do
cat $FILE | awk '{sum+=$1} END{print "sum=",sum}'
done
You are always using $1 as the element to sum. Instead, you should pass the value $n to awk by using something like:
awk -v col="$n" '{sum+=$col} END{print "sum=",sum}' $FILE # no need to cat $FILE
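For example, summing the second column of a small sample via the col variable:

```shell
printf '%s\n' '1 10' '2 20' '3 30' |
awk -v col=2 '{ sum += $col } END { print "sum=", sum }'
# prints: sum= 60
```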
If you want a bash builtin only solution, this would work:
declare -i i l
declare -ai la sa=()
while IFS=$'\t' read -ra la; do
for ((l=${#la[@]}, i=0; i<l; sa[i]+=la[i], ++i)); do :; done
done < file
(IFS=$'\t'; echo "${sa[*]}")
The performance of this should be decent, but quite a bit slower than something like awk.

How to simplify the syntax needed for 10 >= x >= 1 in BASH conditionals?

I use this BASH script to check if file.txt has between 1 and 10 lines:
if [[ `wc -l file.txt | awk '{ print $1 }'` -le "10" && `wc -l file.txt | awk '{ print $1 }'` -ge "1" ]]; then
echo "It has between 1 and 10 lines."
fi
This code is too verbose. If I make a change to one part, it is easy to forget to make a change to the repeated part.
Is there a way to simplify the syntax?
One option would be to do the whole thing using awk:
awk 'END{if(1<=NR&&NR<=10) print "It has between 1 and 10 lines."}' file.txt
As pointed out in the comments (thanks rici), you might want to prevent awk from processing the rest of your file once it has read 10 lines:
awk 'NR>10{exit}END{if(1<=NR&&NR<=10) print "It has between 1 and 10 lines."}' file.txt
The END block is still processed if exit is called, so it is still necessary to have both checks in the if.
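A quick sanity check of the early-exit version on a 3-line stream:

```shell
printf '%s\n' a b c |
awk 'NR > 10 { exit } END { if (1 <= NR && NR <= 10) print "It has between 1 and 10 lines." }'
# prints: It has between 1 and 10 lines.
```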
Alternatively, you could store the result of wc -l to a variable in bash:
lines=$(wc -l < file.txt)
(( 1 <= lines && lines <= 10 )) && echo "It has between 1 and 10 lines."
Note that redirecting the file into wc means that you just get the number without the filename.
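The variable approach end to end, on a throwaway temp file (mktemp is used here only to make the run self-contained):

```shell
tmp=$(mktemp)
printf '%s\n' one two three > "$tmp"   # a 3-line file
lines=$(wc -l < "$tmp")                # bare count, no filename
(( 1 <= lines && lines <= 10 )) && echo "It has between 1 and 10 lines."
rm "$tmp"
```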
Get the line count, then check it against the range bounds:
lc=$(wc -l < file.txt)
if (( 1 <= lc && lc <= 10 )); then
echo "It has between 1 and 10 lines"
fi

Replacing numbers with SED

I'm trying to replace numbers from -20 to 30 using sed, but it adds a "v" character. What's wrong?
For example: SINR=-18, the output must be "c", but the output is "vc".
I tried to delete the 1st character, but it returns 1 instead of j.
SINR=`curl -s http://10.0.0.1/status | awk '/3GPP.SINR=/ {print $0}' | awk -F "3GPP.SINR=" '{print $2}'` # returns number
echo $SINR | sed "s/-20/a/;s/-19/b/;s/-18/c/;s/-17/d/;s/-16/e/;s/-15/f/;s/-14/g/;s/-13/h/;s/-12/i/;s/-11/j/;s/-10/k/;s/-9/l/;s/-8/m/;s/-7/n/;s/-6/o/;s/-5/p/;s/-4/q/;s/-3/r/;s/-2/s/;s/-1/t/;s/0/u/;s/1/v/;s/2/w/;s/3/x/;s/4/y/;s/5/z/;s/6/A/;s/7/B/;s/8/C/;s/9/D/;s/10/E/;s/11/F/;s/12/G/;s/13/H/;s/14/I/;s/15/J/;s/16/K/;s/17/L/;s/18/M/;s/19/N/;s/20/O/;s/21/P/;s/22/Q/;s/23/R/;s/24/S/;s/25/T/;s/26/U/;s/27/V/;s/28/W/;s/29/X/;s/30/Y/"
This way would be more elegant and less error-prone:
echo $SINR | awk 'BEGIN { chars="abcdefg" } { print substr(chars, $1 + 21, 1) }'
Of course, chars should contain all the letters you need for the mapping, that is, all the way up to ...VWXY as in your example; I just wrote up to g to keep it short and sweet.
With this solution your problem disappears.
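With the full 51-character map (a..z for -20..5 and A..Y for 6..30), the lookup becomes:

```shell
# index 1 corresponds to -20, hence the +21 offset
echo "-18" | awk 'BEGIN { chars = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXY" }
                  { print substr(chars, $1 + 21, 1) }'
# prints: c
```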
You don't really need sed or awk if you have bash like you say you do. You can use arrays, which is maybe even less error-prone ;-)
map=({a..z} {A..Z}) # Create map of your characters
SINR=-18 # Set your SINR number to something
SINR=$(($SINR+20)) # Add an offset to get to right place
result=${map[$SINR]} # Lookup your result
echo $result # Print it
c
If you have a mapping process, you're surely better off building a switch statement, a couple of if's, or even using bash associative arrays (bash >= 4.0). For example, you could tackle your problem with the following snippet:
function mapper() {
if [[ $1 -ge -20 && $1 -le 5 ]]; then
printf \\$(printf '%03o' $(( $1 + 117 )) )
elif [[ $1 -ge 6 && $1 -le 30 ]]; then
printf \\$(printf '%03o' $(( $1 + 59 )) )
else
echo ""; return 1
fi
return 0
}
And use like below:
$ mapper -20
a
$ mapper 5
z
$ mapper 6
A
$ mapper 30
Y
$ mapper $SINR
c
echo "${SINR}" | sed 's/-20/a/;t;s/-19/b/;t;s/-18/c/;t;s/-17/d/;t;s/-16/e/;t;s/-15/f/;t;s/-14/g/;t;s/-13/h/;t;s/-12/i/;t;s/-11/j/;t;s/-10/k/;t;s/-9/l/;t;s/-8/m/;t;s/-7/n/;t;s/-6/o/;t;s/-5/p/;t;s/-4/q/;t;s/-3/r/;t;s/-2/s/;t;s/-1/t/;t;s/0/u/;t;s/1/v/;t;s/2/w/;t;s/3/x/;t;s/4/y/;t;s/5/z/;t;s/6/A/;t;s/7/B/;t;s/8/C/;t;s/9/D/;t;s/10/E/;t;s/11/F/;t;s/12/G/;t;s/13/H/;t;s/14/I/;t;s/15/J/;t;s/16/K/;t;s/17/L/;t;s/18/M/;t;s/19/N/;t;s/20/O/;t;s/21/P/;t;s/22/Q/;t;s/23/R/;t;s/24/S/;t;s/25/T/;t;s/26/U/;t;s/27/V/;t;s/28/W/;t;s/29/X/;t;s/30/Y/'
Use the t command after each s// to branch to the end of the script as soon as a substitution succeeds; this prevents later substitutions from mangling the result, and it also speeds things up a bit.
A result like vc should normally not occur if SINR is just a number as specified.
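The double substitution is easy to reproduce with a cut-down rule set; t jumps past the remaining commands as soon as one substitution succeeds:

```shell
# without t: "18" is mangled by s/1/v/ and then s/8/C/
echo 18 | sed 's/1/v/;s/8/C/;s/18/M/'
# prints: vC
# with longest-patterns-first ordering and t, "18" maps straight to M
echo 18 | sed 's/18/M/;t;s/1/v/;t;s/8/C/'
# prints: M
```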