Sum the odd numbers in each line from file.txt - bash

Hello StackOverflow, I wanted to ask how to sum the odd numbers in every line of an input file.txt.
The input.txt file looks like this:
4 1 8 3 7
2 5 8 2 7
4 7 2 5 2
0 2 5 3 5
3 6 3 1 6
The output must be:
11
12
12
13
7
The start of the code looks like this:
read -p "Enter file name:" filename
while read line
do
...
My code is below; what's wrong here?
#!/bin/sh
read -p "Enter file name:" filename
while read line
do
sum = 0
if ($_ % 2 -nq 0){
sum = sum + $_
}
echo $sum
sum = 0
done <$filename

The logic in your question seems correct, so I'll assume you're not sure how to do this line by line, as stated in your comment.
if ($_ % 2 -nq 0){
sum = sum + $_
}
I think this is a good place for a function. It takes a string containing integers as input and returns the sum of all odd numbers in that string, or -1 if there are no integers or all of them are even.
function Sum-OddNumbers {
    [cmdletbinding()]
    param(
        [parameter(mandatory, ValueFromPipeline)]
        [string]$Line
    )

    process {
        [regex]::Matches($Line, '\d+').Value.ForEach{
            begin {
                $result = 0
            }
            process {
                if($_ % 2) {
                    $result += $_
                }
            }
            end {
                if(-not $result) {
                    return -1
                }
                return $result
            }
        }
    }
}
Usage
@'
4 1 8 3 7
2 5 8 2 7
4 7 2 5 2
0 2 5 3 5
3 6 3 1 6
2 4 6 8 10
asd asd asd
'@ -split '\r?\n' | Sum-OddNumbers
Result
11
12
12
13
7
-1
-1

If that's how your txt file is set up, you can use Get-Content and a bit of logic to accomplish this.
Get-Content reads the file line by line (unless -Raw is specified), and we can pipe it to ForEach-Object and split the current line of the iteration on the whitespace.
Splitting on the whitespace leaves just the numbers, giving us a newly formed array to evaluate.
Finally, just get the sum of the odd numbers.
Get-Content -Path .\input.txt | ForEach-Object {
    # Split the current line into an array of just numbers
    $OddNumbers = $_.Split(' ').Trim() | Foreach {
        if ($_ % 2 -eq 1) { $_ } # odd number filter
    }
    # Add the filtered results
    ($OddNumbers | Measure-Object -Sum).Sum
}

"what's wrong here":
while read line
do
sum = 0
if ($_ % 2 -nq 0){
sum = sum + $_
}
echo $sum
sum = 0
done <$filename
First, in sh, spaces are not allowed around the = in an assignment.
Next, the if syntax is wrong. See https://www.gnu.org/software/bash/manual/bash.html#index-if
See also https://www.gnu.org/software/bash/manual/bash.html#Shell-Arithmetic
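Putting those fixes together, here is a minimal corrected sketch (bash rather than plain sh, since read -p is a bashism; it assumes the numbers are separated by whitespace as in your sample):
#!/bin/bash
read -p "Enter file name:" filename
while read -r line
do
    sum=0
    for n in $line    # unquoted expansion word-splits the line into numbers
    do
        if [ $((n % 2)) -ne 0 ]
        then
            sum=$((sum + n))    # note: no spaces around = in assignments
        fi
    done
    echo "$sum"
done < "$filename"
The same per-line sum can also be written as an awk one-liner:
awk '{s=0; for(i=1;i<=NF;i++) if($i%2) s+=$i; print s}' file.txt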

Related

Powershell: How to get the average of numbers after ";" symbol? (from a txt file)

I have a txt file of students and their grades, split with the ";" symbol, like this:
Andrea Pirlo; 5 4 5 4 5 4
Alessandro Del Piero; 5 3 4
Gianluigi Buffon; 4 4 3 5 4 4 4 4 4 4 4
I need to count the number of grades and compute their average. I get the number of grades and it works right, but I can't get any output for the average.
$file = Get-Content file.txt
$file | ForEach-Object{
    if(($_.split(';')[1].split() | Measure-Object).Count -gt 5){
        Write-Host (($_.split(';')[1].split() | Measure-Object).Count)
        Write-Host (($_.split(';')[1].split() | Measure-Object).Average)
    }
}
I don't understand why, because the two lines are almost identical.
Thanks for the help!
Look at the 'Description' section in the documentation for Measure-Object.
If you don't pass any measurement parameter, it gives you only the Count property.
You have to add -Average to your command, e.g.:
(1,2,3,4,5 | Measure-Object -Average).Average <-- that command returns the correct value, 3

Compare consecutive columns of a file and determine if any is greater than 80

I have a file with the output of an SQL query like this:
DG_DATA 9 DG_FRA 0 OCR002 3 OCR 3
I use the following to extract the columns with the numbers:
awk '{for(x=1;x<=NF;++x)if(x % 2 == 0)printf $x "\t"}'
and I get an output with this format:
9 0 3 3
but I need some help to compare all the integers and print "CRITICAL" if any integer is greater than 80.
This is a way to do it:
{
    n=split($0, arr, " ");
    i=2;
    while(i<=n){
        str=str arr[i]"\t";
        if(arr[i] > 80){
            f=1
        };
        i+=2;
    }
    print str;
    if(f){
        print "CRITICAL";
    }
}
EXAMPLES
All values ok:
[llamazing@pc ~]$ echo "DG_DATA 9 DG_FRA 0 OCR002 3 OCR 3" | awk '{n=split($0, a, " ");i=2;while(i<=n){str=str a[i]"\t";if(a[i] > 80){f=1};i+=2;}print str;if(f){print "CRITICAL";}}'
9 0 3 3
With critical message:
[llamazing@pc ~]$ echo "DG_DATA 9 DG_FRA 81 OCR002 3 OCR 3" | awk '{n=split($0, a, " ");i=2;while(i<=n){str=str a[i]"\t";if(a[i] > 80){f=1};i+=2;}print str;if(f){print "CRITICAL";}}'
9 81 3 3
CRITICAL

Compare consecutive columns of a file and obtain the number of matched elements

I want to compare consecutive columns of a file and return the number of matched elements. I would prefer to use shell scripting or awk. Here is a sample bash/AWK script that I am trying to use.
#!/bin/bash
for i in 3 4 5 6 7 8 9
do
for j in 3 4 5 6 7 8 9
do
`awk "$i == $j" phased.txt | wc -l`
done
done
I have a file of size 147189x828 and I want to compare the columns pairwise and return the number of matched elements in an 828x828 matrix (a similarity matrix).
This would be fairly easy in MATLAB, but it takes a long time to load huge files. I can compare two columns and return the number of matched elements with the following awk command:
awk '$3==$4' phased.txt | wc -l
but would need some help to do it for the entire file.
A snippet of the data that I'm working on is:
# sampleID HGDP00511 HGDP00511 HGDP00512 HGDP00512 HGDP00513 HGDP00513
M rs4124251 0 0 A G 0 A
M rs6650104 0 A C T 0 0
M rs12184279 0 0 G A T 0
................................................................................
After comparing I will compute a 6*6 matrix in this case: containing the matching percentage of these columns.
In bash, variables need a $ to be interpreted, so your awk "$i == $j" phased.txt | wc -l will be evaluated as awk "3 == 4" phased.txt | wc -l; then, because of your backticks (`), the shell will try to execute that as a command. To get awk to see $3 == $4, you need to add \$: awk "\$$i == \$$j" phased.txt | wc -l.
#!/bin/bash
for i in 3 4 5 6 7 8 9
do
    for j in 3 4 5 6 7 8 9
    do
        awk "\$$i == \$$j" phased.txt | wc -l
    done
done
Though you'll probably want to show which combination you're evaluating:
#!/bin/bash
for i in 3 4 5 6 7 8 9
do
    for j in 3 4 5 6 7 8 9
    do
        echo "$i $j: $(awk "\$$i == \$$j" phased.txt | wc -l)"
    done
done
You could actually just do the count in awk directly
#!/bin/bash
for i in 3 4 5 6 7 8 9
do
    for j in 3 4 5 6 7 8 9
    do
        echo "$i $j: $(awk "\$$i == \$$j {count++}; END{print count}" phased.txt)"
    done
done
Finally, you could just do the whole thing in awk; it'll almost certainly be faster but to be honest it's not that much cleaner: [UNTESTED]
#!/usr/bin/env awk -f
{
    for (i = 3; i <= 9; i++) {
        for (j = 3; j <= 9; j++) {
            if ($i == $j) {
                counts[i, j]++
            }
        }
    }
}
END {
    for (i = 3; i <= 9; i++) {
        for (j = 3; j <= 9; j++) {
            printf "%d = %d: %d\n", i, j, counts[i, j]
        }
    }
}
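If that script is saved as, say, compare.awk (a hypothetical name), it can be run with awk's -f flag, which also sidesteps the portability problems some systems have with an env shebang that passes arguments:
awk -f compare.awk phased.txt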

How to subtract fields pairwise in bash?

I have a large dataset that looks like this:
5 6 5 6 3 5
2 5 3 7 1 6
4 8 1 8 6 9
1 5 2 9 4 5
For every line, I want to subtract the first field from the second, the third from the fourth, and so on, depending on the number of fields (always even). Then, I want to report those lines for which the difference for all of the pairs exceeds a certain limit (say 2). I should also be able to report the next-best lines, i.e., lines in which one pairwise comparison fails to meet the limit but all other pairs meet it.
From the above example, if I set the limit to 2, my output file should contain:
best lines:
2 5 3 7 1 6 # because (5-2), (7-3), (6-1) are all > 2
4 8 1 8 6 9 # because (8-4), (8-1), (9-6) are all > 2
next best line(s)
1 5 2 9 4 5 # because except (5-4), both (5-1) and (9-2) are > 2
My current approach is to read every line, save each field as a variable, do subtraction.
But I don't know how to proceed further.
Thanks,
Prints "best" lines to the file "best", and prints "next best" lines to the file "nextbest"
awk '
{
    fail_count=0
    for (i=1; i<NF; i+=2){
        if ( ($(i+1) - $i) <= threshold )
            fail_count++
    }
    if (fail_count == 0)
        print $0 > "best"
    else if (fail_count == 1)
        print $0 > "nextbest"
}
' threshold=2 inputfile
Pretty straightforward stuff.
Loop through fields 2 at a time.
If (next field - current field) does not exceed threshold, increment fail_count
If that line's fail_count is zero, that means it belongs to "best" lines.
Else if that line's fail_count is one, it belongs to "next best" lines.
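As a quick sanity check against the question's sample data (assuming the awk body was saved as bestlines.awk, a hypothetical name), the output files should come out as follows:
$ awk -f bestlines.awk threshold=2 inputfile
$ cat best
2 5 3 7 1 6
4 8 1 8 6 9
$ cat nextbest
1 5 2 9 4 5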
Here's a bash-way to do it:
#!/bin/bash
threshold=$1
shift
file="$@"
a=($(cat "$file"))
b=$(( ${#a[@]}/$(cat "$file" | wc -l) ))   # numbers per line
for ((r=0; r<${#a[@]}/b; r++)); do
    br=$((b*r))
    for ((c=0; c<b; c+=2)); do
        if [[ $(( ${a[br + c+1]} - ${a[br + c]} )) -lt $threshold ]]; then
            break; fi
        if [[ $((c+2)) == $b ]]; then
            echo ${a[@]:$br:$b}; fi
    done
done
Usage:
$ ./script.sh 2 yourFile.txt
2 5 3 7 1 6
4 8 1 8 6 9
This output can then easily be redirected:
$ ./script.sh 2 yourFile.txt > output.txt
NOTE: this does not work properly if you have those empty lines between each line, but I'm sure the above will get you well on your way.
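If your file does have blank lines, one simple workaround (a sketch, assuming the blank lines are empty or whitespace-only) is to strip them before running the script:
$ grep -v '^[[:space:]]*$' yourFile.txt > cleaned.txt
$ ./script.sh 2 cleaned.txt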
I probably wouldn't do that in bash. Personally, I'd do it in Python, which is generally good for those small quick-and-dirty scripts.
If you have your data in a text file, you first need to read it into Python as a list of lines; then you can use a for-loop to process each line:
threshold = 2
results = []
with open('yourFile.txt') as f:  # read the data file as a list of lines (adjust the filename)
    content = f.readlines()
for line in content:
    numbers = [int(n) for n in line.split()]  # Split it into a list of numbers
    pairs = zip(numbers[::2], numbers[1::2])  # Pair up the numbers two and two.
    result = [abs(y - x) for (x, y) in pairs] # Subtract the first number in each pair from the second.
    if all(d > threshold for d in result):    # keep lines where every pairwise difference exceeds the limit
        results.append(numbers)
Yet another bash version:
First, a check function that returns nothing but a result code:
function getLimit() {
    local pairs=0 count=0 limit=$1 wantdiff=$2
    shift 2
    while [ "$1" ] ; do
        [ $(( $2 - $1 )) -ge $limit ] && : $((count++))
        : $((pairs++))
        shift 2
    done
    test $((pairs - count)) -eq $wantdiff
}
Then:
while read line ;do getLimit 2 0 $line && echo $line;done <file
2 5 3 7 1 6
4 8 1 8 6 9
and
while read line ;do getLimit 2 1 $line && echo $line;done <file
1 5 2 9 4 5
If you can use awk
$ cat del1
5 6 5 6 3 5
2 5 3 7 1 6
4 8 1 8 6 9
1 5 2 9 4 5
1 5 2 9 4 5 3 9
$ cat del1 | awk '{
> printf "%s _ ",$0;
> for(i=1; i<=NF; i+=2){
> printf "%d ",($(i+1)-$i)};
> print NF
> }' | awk '{
> upper=0;
> for(i=1; i<=($NF/2); i++){
> if($(NF-i)>threshold) upper++
> };
> printf "%d _ %s\n", upper, $0}' threshold=2 | sort -nr
3 _ 4 8 1 8 6 9 _ 4 7 3 6
3 _ 2 5 3 7 1 6 _ 3 4 5 6
3 _ 1 5 2 9 4 5 3 9 _ 4 7 1 6 8
2 _ 1 5 2 9 4 5 _ 4 7 1 6
0 _ 5 6 5 6 3 5 _ 1 1 2 6
You can process result further according to your needs. The result is sorted by ‘goodness’ order.

Calculate mean, variance and range using Bash script

Given a file file.txt:
AAA 1 2 3 4 5 6 3 4 5 2 3
BBB 3 2 3 34 56 1
CCC 4 7 4 6 222 45
Does anyone have any ideas on how to calculate the mean, variance and range for each item, i.e. AAA, BBB, CCC respectively, using a Bash script? Thanks.
Here's a solution with awk, which calculates:
minimum = smallest value on each line
maximum = largest value on each line
average = μ = sum of all values on each line, divided by the count of the numbers.
variance = 1/n × [(Σx)² - Σ(x²)] where
n = number of values on the line = NF - 1 (in awk, NF = number of fields on the line)
(Σx)² = square of the sum of the values on the line
Σ(x²) = sum of the squares of the values on the line
awk '{
    min = max = sum = $2;       # Initialize to the first value (2nd field)
    sum2 = $2 * $2              # Running sum of squares
    for (n=3; n <= NF; n++) {   # Process each value on the line
        if ($n < min) min = $n  # Current minimum
        if ($n > max) max = $n  # Current maximum
        sum += $n;              # Running sum of values
        sum2 += $n * $n         # Running sum of squares
    }
    print $1 ": min=" min ", avg=" sum/(NF-1) ", max=" max ", var=" ((sum*sum) - sum2)/(NF-1);
}' filename
Output:
AAA: min=1, avg=3.45455, max=6, var=117.273
BBB: min=1, avg=16.5, max=56, var=914.333
CCC: min=4, avg=48, max=222, var=5253
Note that you can save the awk script (everything between, but not including, the single-quotes) in a file, say called script, and execute it with awk -f script filename
You can use python:
$ AAA() { echo "$@" | python -c 'from sys import stdin; nums = [float(i) for i in stdin.read().split()]; print(sum(nums)/len(nums))'; }
$ AAA 1 2 3 4 5 6 3 4 5 2 3
3.45454545455
Part 1 (mean):
mean () {
    len=$#
    echo $* | tr " " "\n" | sort -n | head -n $(((len+1)/2)) | tail -n 1
}

nMean () {
    echo -n "$1 "
    shift
    mean $*
}
mean usage:
nMean AAA 3 4 5 6 3 4 3 6 2 4
4
Part 2 (variance):
variance () {
    count=$1
    avg=$2
    shift
    shift
    sum=0
    for n in $*
    do
        diff=$((avg-n))
        quad=$((diff*diff))
        sum=$((sum+quad))
    done
    echo $((sum/count))
}

sum () {
    form="$(echo $*)"
    formula=${form// /+}
    echo $((formula))
}

nVariance () {
    echo -n "$1 "
    shift
    count=$#
    s=$(sum $*)
    avg=$((s/$count))
    var=$(variance $count $avg $*)
    echo $var
}
usage:
nVariance AAA 3 4 5 6 3 4 3 6 2 4
1
Part 3 (range):
range () {
    min=$1
    max=$1
    for p in $* ; do
        (( $p < $min )) && min=$p
        (( $p > $max )) && max=$p
    done
    echo $min ":" $max
}

nRange () {
    echo -n "$1 "
    shift
    range $*
}
usage:
nRange AAA 1 2 3 4 5 6 3 4 5 2 3
AAA 1 : 6
nX is short for "named X": named mean, named variance, and so on.
Note that I use integer arithmetic, which is what the shell supports natively. To use floating-point arithmetic, you would use bc, for instance. With integers you lose precision, which might be acceptable for big natural numbers.
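For instance, a quick bc sketch with two decimal digits of precision (the numbers are only illustrative):
$ echo "scale=2; 38/11" | bc
3.45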
Process all 3 commands for an input line:
processLine () {
    nVariance $*
    nMean $*
    nRange $*
}
Read the data from a file, line by line:
# data:
# AAA 1 2 3 4 5 6 3 4 5 2 3
# BBB 3 2 3 34 56 1
# CCC 4 7 4 6 222 45
while read line
do
processLine $line
done < data
Update:
Contrary to my expectation, it doesn't seem easy to handle an unknown number of arguments in bc functions, for example min(3, 4, 5, 2, 6).
But if the inputs are integers, the need to call bc can be reduced to 2 places. I used a precision of 2 ("scale=2"); you may change this to your needs.
variance () {
    count=$1
    avg=$2
    shift
    shift
    sum=0
    for n in $*
    do
        diff="($avg-$n)"
        quad="($diff*$diff)"
        sum="($sum+$quad)"
    done
    # echo "$sum/$count"
    echo "scale=2;$sum/$count" | bc
}

nVariance () {
    echo -n "$1 "
    shift
    count=$#
    s=$(sum $*)
    avg=$(echo "scale=2;$s/$count" | bc)
    var=$(variance $count $avg $*)
    echo $var
}
The rest of the code can stay the same. Please verify that the formula for the variance is correct; I used what I had in mind:
For values (1, 5, 9), I sum them up (15) and divide by the count (3) => 5.
Then I take the diff to the avg for each value (-4, 0, 4), square them (16, 0, 16), sum them up (32) and divide by the count (3) => 10.66.
Is this correct, or do I need a square root somewhere ;) ?
Note that I had to correct the mean calculation: for 1, 5, 9, the mean is 5, not 1 - am I right? It now uses sort -n (numeric sort) and (len+1)/2.
There is a typo in the accepted answer that causes the variance to be miscalculated. In the print statement:
", var=" ((sum*sum) - sum2)/(NF-1)
should be:
", var=" (sum2 - ((sum*sum)/(NF-1)))/(NF-2)
(each line has NF-1 numeric values, since the first field is the label).
Also, it is better to use something like Welford's algorithm to calculate variance; the algorithm in the accepted answer is unstable when the variance is small relative to the mean:
foo="1 2 3 4 5 6 3 4 5 2 3";
awk '{
M = 0;
S = 0;
for (k=1; k <= NF; k++) {
x = $k;
oldM = M;
M = M + ((x - M)/k);
S = S + (x - M)*(x - oldM);
}
var = S/(NF - 1);
print " var=" var;
}' <<< $foo
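For that sample line this prints var=2.27273, which agrees with the corrected closed form above: with Σx = 38, Σx² = 154 and n = 11 values, (154 - 38²/11)/10 ≈ 2.273.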
