Implementing `sumproduct` in UNIX shell - bash

I have some output from a script thescript which reads:
202 1 0 1 0 0 0
Now I want to selectively sum these numbers with awk, depending on the values in ${SUM_MASK}:
SUM_MASK=1,1,0,0,0,0,0
I would like to have something like:
thescript | awk <SOMETHING>
where each number output by thescript gets multiplied by the corresponding number in ${SUM_MASK}, obtaining:
203
as result of:
203 = 202 * 1 + 1 * 1 + 0 * 0 + 1 * 0 + 0 * 0 + 0 * 0 + 0 * 0
This would be similar to the sumproduct function in spreadsheet software.
The following code snippets do the trick, but I would like to avoid using process substitution:
SUM_MASK="1,1,0,0,0,0,0"; paste <(thescript) <(echo ${SUM_MASK} | tr ',' '\n') | awk '{ SUM += $1 * $2 } END { print SUM }'
and named pipes:
SUM_MASK="1,1,0,0,0,0,0"; mkfifo fA; mkfifo fB; thescript > fA & echo ${SUM_MASK} | tr ',' '\n' > fB & paste fA fB | awk '{ SUM += $1 * $2 } END { print SUM }' > result.text; rm -f fA fB
how could I achieve that?

echo "202 1 0 1 0 0 0" |
awk -v summask="1,1,0,0,0,0,0" '
BEGIN {split(summask, mask, /,/)}
{ sumproduct=0
for (i=1; i<=NF; i++) {
sumproduct += $i * mask[i]
}
print sumproduct
}
'
203

There's no need for external tools such as awk here -- bash is capable of resolving this with built-in capabilities only. Consider the below implementation as a function:
sumproduct() {
    local -a sum_inputs sum_mask
    local idx result
    # read your sum_inputs into an array from stdin
    IFS=', ' read -r -a sum_inputs # this could be <<<"$1" to use the first argument
    # and your sum_mask from the like-named variable
    IFS=', ' read -r -a sum_mask <<<"$SUM_MASK" # or <<<"$2" for the second argument
    # ...iterate over array elements in sum_inputs; find the corresponding sum_mask; math.
    result=0
    for idx in "${!sum_inputs[@]}"; do
        (( result += ${sum_mask[$idx]} * ${sum_inputs[$idx]} ))
    done
    echo "$result"
}
To test this:
echo "202 1 0 1 0 0 0" | SUM_MASK=1,1,0,0,0,0,0 sumproduct
...correctly yields:
203
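Note that IFS=', ' splits on either a comma or a space, which is why the same read handles both the space-separated input and the comma-separated mask. A quick sanity check of that behavior:
IFS=', ' read -r -a parts <<<"1,1,0,0,0,0,0"
echo "${#parts[@]} elements, first two: ${parts[0]} ${parts[1]}"
# prints: 7 elements, first two: 1 1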

You do not actually need a sum product here, just a masked summation; for example, this should be faster if you have a lot of masked-out columns.
$ awk -v mask='1,1,0,0,0,0,0' 'BEGIN {n=split(mask,m,",");
for(i=1; i<=n; i++) if(m[i]) ix[i]}
{sum=0;
for(i in ix) sum += $i;
print sum}' file
203
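Plugged into the pipeline form from the question (a sketch, assuming thescript emits the sample line):
SUM_MASK='1,1,0,0,0,0,0'
thescript | awk -v mask="$SUM_MASK" 'BEGIN {n=split(mask,m,","); for(i=1;i<=n;i++) if(m[i]) ix[i]} {sum=0; for(i in ix) sum+=$i; print sum}'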

With one-digit multipliers you can use a simple loop:
SUM_MASK=1,1,0,0,0,0,0
offset=0
sum=0;
for i in 202 1 0 1 0 0 0; do
j="${SUM_MASK:$offset:1}"
((sum += i * j ))
((offset+=2))
done
echo "${sum}"
This solution can be used in a script prodsum; since the loop iterates over the positional parameters, call it as prodsum $(thescript):
offset=0
sum=0;
for i ; do
j="${SUM_MASK:$offset:1}"
((sum += i * j ))
((offset+=2))
done
echo "${sum}"
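For example (assuming prodsum is executable and on the PATH, and that thescript emits the sample line):
$ SUM_MASK=1,1,0,0,0,0,0 prodsum $(thescript)
203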
EDIT: When SUM_MASK can contain numbers > 9, use the following:
SUM_MASK=1,10,0,0,0,0,0
sum=0;
remaining_mask="$SUM_MASK"
for i in 202 1 0 1 0 0 0; do
j="${remaining_mask%%,*}"
remaining_mask="${remaining_mask#*,}"
((sum += i * j ))
done
echo "${sum}"

Related

Add x^2 to every "nonzero" coefficient with sed/awk

I have to write, as simply as possible, a script or command which has to use awk and/or sed.
Input file:
23 12 0 33
3 4 19
1st line n=3
2nd line n=2
In each line of the file we have a string of numbers. Each number is a coefficient, and we have to append x^n, where n is the highest power (the number of spaces between the numbers on the line; there is no space after the last number), and if we have a "0" in our string we have to skip it.
So for that input we will have output like:
23x^3+12x^2+33
3x^2+4x+19
Please help me to write a short script solving that problem. Thank you so much for your time and all the help :)
My idea:
linescount=$(cat numbers|wc -l)
linecounter=1
While[linecounter<=linescount];
do
i=0
for i in spaces=$(cat numbers|sed 1p | sed " " )
do
sed -i 's/ /x^spaces/g'
i=($(i=i-1))
done
linecounter=($(linecounter=linecounter-1))
done
The following awk may help you with the same too:
awk '{for(i=1;i<=NF;i++){if($i!="" && $i){val=(val?val "+" $i:$i)(NF-i==0?"":(NF-i==1?"x":"x^"NF-i))} else {pointer++}};if(val){print val};val=""} pointer==NF{print;} {pointer=""}' Input_file
Adding a non-one-liner form of the solution here too:
awk '
{
for(i=1;i<=NF;i++){
if($i!="" && $i){
val=(val?val "+" $i:$i)(NF-i==0?"":(NF-i==1?"x":"x^"NF-i))}
else {
pointer++}};
if(val) {
print val};
val=""
}
pointer==NF {
print}
{
pointer=""
}
' Input_file
EDIT: Adding an explanation here too, for better understanding.
awk '
{
for(i=1;i<=NF;i++){ ##Starting a for loop from variable 1 to till the value of NF here.
if($i!="" && $i){ ##checking if variable i value is NOT NULL then do following.
val=(val?val "+" $i:$i)(NF-i==0?"":(NF-i==1?"x":"x^"NF-i))} ##creating variable val here and putting conditions here if val is NULL then
##simply take value of that field else concatenate the value of val with its
##last value. Second condition is to check if last field of line is there then
##keep it like that else it is second last then print "x" along with it else keep
##that "x^" field_number-1 with it.
else { ##If a field is NULL in current line then come here.
pointer++}}; ##Increment the value of variable named pointer here with 1 each time it comes here.
if(val) { ##checking if variable named val is NOT NULL here then do following.
print val}; ##Print the value of variable val here.
val="" ##Nullifying the variable val here.
}
pointer==NF { ##checking condition if pointer value is same as NF then do following.
print} ##Print the current line then, seems whole line is having zeros in it.
{
pointer="" ##Nullifying the value of pointer here.
}
' Input_file ##Mentioning Input_file name here.
Offering a Perl solution since it has some higher-level constructs than bash that make the code a little simpler:
use strict;
use warnings;
use feature qw(say);
my @terms;
while (my $line = readline(*DATA)) {
    chomp($line);
    my $degree = () = $line =~ / /g;
    my @coefficients = split / /, $line;
    my @terms;
    while ($degree >= 0) {
        my $coefficient = shift @coefficients;
        next if $coefficient == 0;
        push @terms, $degree > 1
            ? "${coefficient}x^$degree"
            : $degree > 0
            ? "${coefficient}x"
            : $coefficient;
    }
    continue {
        $degree--;
    }
    say join '+', @terms;
}
__DATA__
23 12 0 33
3 4 19
Example output:
hunter@eros  ~  perl test.pl
23x^3+12x^2+33
3x^2+4x+19
Any information you want on any of the builtin functions used above: readline, chomp, push, shift, split, say, and join can be found in perldoc with perldoc -f <function-name>
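For example:
perldoc -f split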
$ cat a.awk
function print_term(i) {
# Don't print zero terms:
if (!$i) return;
# Print a "+" unless this is the first term:
if (!first) { printf " + " }
# If it's the last term, just print the number:
if (i == NF) printf "%d", $i
# Leave the coefficient blank if it's 1:
coef = ($i == 1 ? "" : $i)
# If it's the penultimate term, just print an 'x' (not x^1):
if (i == NF-1) printf "%sx", coef
# Print a higher-order term:
if (i < NF-1) printf "%sx^%s", coef, NF - i
first = 0
}
{
first = 1
# print all the terms:
for (i=1; i<=NF; ++i) print_term(i)
# If we never printed any terms, print a "0":
print first ? 0 : ""
}
Example input and output:
$ cat file
23 12 0 33
3 4 19
0 0 0
0 1 0 1
17
$ awk -f a.awk file
23x^3 + 12x^2 + 33
3x^2 + 4x + 19
0
x^2 + 1
17
$ cat ip.txt
23 12 0 33
3 4 19
5 3 0
34 01 02
$ # mapping each element except last to add x^n
$ # -a option will auto-split input on whitespaces, content in @F array
$ # $#F will give index of last element (indexing starts at 0)
$ # $i>0 condition check to prevent x^0 for last element
$ perl -lane '$i=$#F; print join "+", map {$i>0 ? $_."x^".$i-- : $_} @F' ip.txt
23x^3+12x^2+0x^1+33
3x^2+4x^1+19
5x^2+3x^1+0
34x^2+01x^1+02
$ # with post processing
$ perl -lape '$i=$#F; $_ = join "+", map {$i>0 ? $_."x^".$i-- : $_} @F;
s/\+0(x\^\d+)?\b|x\K\^1\b//g' ip.txt
23x^3+12x^2+33
3x^2+4x+19
5x^2+3x
34x^2+01x+02
One possibility is:
#!/usr/bin/env bash
line=1
linemax=$(grep -oEc '(( |^)[0-9]+)+' inputFile)
while [ $line -le $linemax ]; do
degree=$(($(grep -oE ' +' - <<<$(grep -oE '(( |^)[0-9]+)+' inputFile | head -$line | tail -1) | cut -d : -f 1 | uniq -c)+1))
coeffs=($(grep -oE '(( |^)[0-9]+)+' inputFile | head -$line | tail -1))
i=0
while [ $i -lt $degree ]; do
if [ ${coeffs[$i]} -ne 0 ]; then
if [ $(($degree-$i-1)) -gt 1 ]; then
echo -n "${coeffs[$i]}x^$(($degree-$i-1))+"
elif [ $(($degree-$i-1)) -eq 1 ]; then
echo -n "${coeffs[$i]}x"
else
echo -n "${coeffs[$i]}"
fi
fi
((i++))
done
echo
((line++))
done
The most important lines are:
# Gets degree of the equation
degree=$(($(grep -oE ' +' - <<<$(grep -oE '(( |^)[0-9]+)+' inputFile | head -$line | tail -1) | cut -d : -f 1 | uniq -c)+1))
# Saves coefficients in an array
coeffs=($(grep -oE '(( |^)[0-9]+)+' inputFile | head -$line | tail -1))
Here, grep -oE '(( |^)[0-9]+)+' finds lines containing only numbers (see edit). grep -oE ' +' - ........... | cut -d : -f 1 | uniq -c counts the number of coefficients per line, as explained in this question.
Edit: An improved regex for capturing lines with only numbers is
grep -E '(( |^)[0-9]+)+' inputfile | grep -v '[a-zA-Z]'
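For instance, on the sample input from the question (assuming it is saved as inputFile), the filter keeps exactly the numeric lines:
$ grep -E '(( |^)[0-9]+)+' inputFile | grep -v '[a-zA-Z]'
23 12 0 33
3 4 19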
sed -r "s/(.*) (.*) (.*) (.*)/\1x^3+\2x^2+\3x+\4/; \
s/(.*) (.*) (.*)/\1x^2+\2x+\3/; \
s/\+0x(^.)?\+/+/g; \
s/^0x\^.[+]//g; \
s/\+0$//g;" koeffs.txt
Line 1: Handle 4 elements
Line 2: Handle 3
Line 3: Handle 0 in the middle
Line 4: Handle 0 at start
Line 5: Handle 0 at end
Here is a more bashy, less sedy answer, which I think is more readable than the sed one:
#!/bin/bash
#
# 0 4 12 => 12x^3
# 2 4 12 => 12x
# 3 4 12 => 12
term () {
p=$1
leng=$2
fac=$3
pot=$((leng - 1 - p))
case $pot in
0) echo -n '+'${fac} ;;
1) echo -n '+'${fac}x ;;
*) echo -n '+'${fac}x^$pot ;;
esac
}
handleArray () {
# mapfile -C passes the element index (starting with 0 for the 1st)
# as the first argument to the callback; get rid of it!
shift
coeffs=($*)
# echo ${coeffs[@]}
cnt=0
len=${#coeffs[@]}
while (( cnt < len ))
do
if [[ ${coeffs[$cnt]} != 0 ]]
then
term $cnt $len ${coeffs[$cnt]}
fi
((cnt++))
done
echo # -e '\n' # extra line for dbg, together w. line 5 of the function.
}
mapfile -n 0 -c 1 -C handleArray < ./koeffs.txt coeffs | sed -r "s/^\++//;s/\++$//;"
The mapfile reads data and produces an array. See help mapfile for a brief syntax introduction.
We need some counting to know which power to raise to. Meanwhile we try to get rid of 0-terms.
In the end I use sed to remove leading and trailing plusses.
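If the callback mechanics are unclear, here is a minimal sketch (with a hypothetical callback named show, and assuming koeffs.txt holds the question's two sample lines) of what mapfile passes along with -c 1: the index of the element about to be assigned, followed by the line itself:
show() { echo "index=$1 line=$2"; }
mapfile -t -n 0 -c 1 -C show arr < ./koeffs.txt
# index=0 line=23 12 0 33
# index=1 line=3 4 19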
sh solution
while read line ; do
set -- $line
while test $1 ; do
i=$(($#-1))
case $1 in
0) ;;
*) case $i in
0) j="" ;;
1) j="x" ;;
*) j="x^$i" ;;
esac
result="$result$1$j+";;
esac
shift
done
echo "${result%+}"
result=""
done < infile
$ cat tst.awk
{
out = sep = ""
for (i=1; i<=NF; i++) {
if ($i != 0) {
pwr = NF - i
if ( pwr == 0 ) { sfx = "" }
else if ( pwr == 1 ) { sfx = "x" }
else { sfx = "x^" pwr }
out = out sep $i sfx
sep = "+"
}
}
print out
}
$ awk -f tst.awk file
23x^3+12x^2+33
3x^2+4x+19
First, my test set:
$ cat file
23 12 0 33
3 4 19
0 1 2
2 1 0
Then the awk script:
$ awk 'BEGIN{OFS="+"}{for(i=1;i<=NF;i++)$i=$i (NF-i?"x^" NF-i:"");gsub(/(^|\+)0(x\^[0-9]+)?/,"");sub(/^\+/,"")}1' file
23x^3+12x^2+33
3x^2+4x^1+19
1x^1+2
2x^2+1x^1
And an explanation:
$ awk '
BEGIN {
OFS="+" # separate with a + (negative values
} # would be dealt with in gsub)
{
for(i=1;i<=NF;i++) # process all components
$i=$i (NF-i?"x^" NF-i:"") # add x and exponent
gsub(/(^|\+)0(x\^[0-9]+)?/,"") # clean 0s and leftover +s
sub(/^\+/,"") # remore leading + if first component was 0
}1' file # output
This might work for you (GNU sed);)
sed -r ':a;/^\S+$/!bb;s/0x\^[^+]+\+//g;s/\^1\+/+/;s/\+0$//;b;:b;h;s/\S+$//;s/\S+\s+/a/g;s/^/cba/;:c;s/(.)(.)\2\2\2\2\2\2\2\2\2\2/\1\1\2/;tc;s/([a-z])\1\1\1\1\1\1\1\1\1/9/;s/([a-z])\1\1\1\1\1\1\1\1/8/;s/([a-z])\1\1\1\1\1\1\1/7/;s/([a-z])\1\1\1\1\1\1/6/;s/([a-z])\1\1\1\1\1/5/;s/([a-z])\1\1\1\1/4/;s/([a-z])\1\1\1/3/;s/([a-z])\1\1/2/;s/([a-z])\1/1/;s/[a-z]/0/g;s/^0+//;G;s/(.*)\n(\S+)\s+/\2x^\1+/;ba' file
This is not a serious solution!
Shows how sed can count, kudos goes to Greg Ubben back in 1989 when he wrote wc in sed!

awk: run time error: negative field index

I currently have the following:
function abs() {
echo $(($1<0 ?-$1:$1));
}
echo $var1 | awk -F" " '{for (i=2;i<=NF;i+=2) $i=(95-$(abs $i))*1.667}'
where var1 is:
4 -38 2 -42 1 -43 10 -44 1 -45 6 -46 1 -48 1 -49
When I run this, I am getting the error:
awk: run time error: negative field index $-38
FILENAME="-" FNR=1 NR=1
Does this have something to do with the 95-$(abs $i) part? I'm not sure how to fix this.
Try this:
echo "$var1" |
awk 'function abs(x) { return x<0 ? -x : x }
{ for (i=2;i<=NF;i+=2) $i = (95-abs($i))*1.667; print }'
Every line of input to AWK is placed in fields by the interpreter. The fields can be accessed with $N for N > 0. $0 means the whole line. $N for N < 0 is nonsensical. Variables are not prefixed with a dollar sign.
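A minimal illustration of those rules:
$ echo '4 -38 2' | awk '{ print NF, $1, $2, $NF }'
3 4 -38 2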

Find a number of a file in a range of numbers of another file

I have these two input files:
file1
1 982444
1 46658343
3 15498261
2 238295146
21 47423507
X 110961739
17 7490379
13 31850803
13 31850989
file2
1 982400 982480
1 46658345 46658350
2 14 109
2 5000 9000
2 238295000 238295560
X 110961739 120000000
17 7490200 8900005
And this is my desired output:
1 982444
2 238295146
X 110961739
17 7490379
This is what I want: Find the column 1 element of file1 in column 1 of file2. If the number is the same, take the number of column 2 of file1 and check if it is included in the range of numbers of column2 and 3 of file2. If it is included, print the line of file1 in the output.
Maybe it's a little confusing to understand, but I'm doing my best. I have tried some things, but I'm far from the solution, and any help will be really appreciated. In bash, awk or perl, please.
Thanks in advance,
Just using awk. The solution doesn't loop through file1 repeatedly.
#!/usr/bin/awk -f
NR == FNR {
# I'm processing file2 since NR still matches FNR
# I'd store the ranges from it on a[] and b[]
# x[] acts as a counter to the number of range pairs stored that's specific to $1
i = ++x[$1]
a[$1, i] = $2
b[$1, i] = $3
# Skip to next record; Do not allow the next block to process a record from file2.
next
}
{
# I'm processing file1 since NR is already greater than FNR
# Let's get the index for the last range first then go down until we reach 0.
# Nothing would happen as well if i evaluates to nothing i.e. $1 doesn't have a range for it.
for (i = x[$1]; i; --i) {
if ($2 >= a[$1, i] && $2 <= b[$1, i]) {
# I find that $2 is within range. Now print it.
print
# We're done so let's skip to the next record.
next
}
}
}
Usage:
awk -f script.awk file2 file1
Output:
1 982444
2 238295146
X 110961739
17 7490379
A similar approach using Bash (version 4.0 or newer):
#!/bin/bash
FILE1=$1 FILE2=$2
declare -A A B X
while read F1 F2 F3; do
(( I = ++X[$F1] ))
A["$F1|$I"]=$F2
B["$F1|$I"]=$F3
done < "$FILE2"
while read -r LINE; do
read F1 F2 <<< "$LINE"
for (( I = X[$F1]; I; --I )); do
if (( F2 >= A["$F1|$I"] && F2 <= B["$F1|$I"] )); then
echo "$LINE"
break
fi
done
done < "$FILE1"
Usage:
bash script.sh file1 file2
Let's mix bash and awk:
while read col min max
do
awk -v col=$col -v min=$min -v max=$max '$1==col && min<=$2 && $2<=max' f1
done < f2
Explanation
For each line of file2, read the min and the max, together with the value of the first column.
Given these values, check in file1 for those lines having same first column and being 2nd column in the range specified by file 2.
Test
$ while read col min max; do awk -v col=$col -v min=$min -v max=$max '$1==col && min<=$2 && $2<=max' f1; done < f2
1 982444
2 238295146
X 110961739
17 7490379
Pure bash, based on Fedorqui's solution:
#!/bin/bash
while read col_2 min max
do
while read col_1 val
do
(( col_1 == col_2 && ( min <= val && val <= max ) )) && echo $col_1 $val
done < file1
done < file2
cut -d' ' -f1 input2 | sed 's/^/^/;s/$/\\s/' | \
grep -f - <(cat input2 input1) | sort -n -k1 -k3 | \
awk 'NF==3 {
split(a,b,",");
for (v in b)
if ($2 <= b[v] && $3 >= b[v])
print $1, b[v];
if ($1 != p) a=""}
NF==2 {p=$1;a=a","$2}'
Produces:
X 110961739
1 982444
2 238295146
17 7490379
Here's a Perl solution. It could be much faster but less concise if I built a hash out of file2, but this should be fine.
use strict;
use warnings;
use autodie;
my @bounds = do {
open my $fh, '<', 'file2';
map [ split ], <$fh>;
};
open my $fh, '<', 'file1';
while (my $line = <$fh>) {
my ($key, $val) = split ' ', $line;
for my $bound (@bounds) {
next unless $key eq $bound->[0] and $val >= $bound->[1] and $val <= $bound->[2];
print $line;
last;
}
}
output
1 982444
2 238295146
X 110961739
17 7490379

Separating and counting number of elements in a list with conditions

I would like to separate and count the number of elements within my input list.
The input.txt contains 2 columns: $1 is the element ID and $2 is its ratio (a number).
ENSG001 12.3107448237
ENSG007 4.3602275
ENSG008 2.9918420285
ENSG009 1.035588
ENSG010 0.999864
ENSG012 0.569833
ENSG013 0.495325
ENSG014 0.253893
ENSG015 0.125389
ENSG017 0.012568
ENSG018 -0.135689
ENSG020 -0.4938497942
ENSG022 -0.6429221854
ENSG024 -1.1759339381
ENSG029 -4.2722999766
ENSG030 -11.8447513281
I want to separate the ratios into the following categories:
Greater than or equal to 2
Between 1 and 2
Between 0.5 and 1
Between -0.5 and 0.5
Between -1 and -0.5
Between -2 and -1
Less than or equal to -2
and then print the count from each category into a single separate output file results.txt:
Total 16
> 2 3
1 to 2 1
0.5 to 1 2
-0.5 to 0.5 6
-0.5 to -1 1
-1 to -2 1
< -2 2
I can do this on the command line using the following:
awk '$2 > 2 {print $1,$2}' input.txt | wc -l
awk '$2 > 1 && $2 < 2 {print $1,$2}' input.txt | wc -l
awk '$2 > 0.5 && $2 < 1 {print $1,$2}' input.txt | wc -l
awk '$2 > -0.5 && $2 < 0.5 {print $1,$2}' input.txt | wc -l
awk '$2 > -1 && $2 < -0.5 {print $1,$2}' input.txt | wc -l
awk '$2 > -2 && $2 < -1 {print $1,$2}' input.txt | wc -l
awk '$2 < -2 {print $1,$2}' input.txt | wc -l
I think there is a quicker way of doing it using a shell script with a while or for loop, but I don't know how. Any suggestions would be brilliant.
You can just process the file once; the straightforward way would be:
awk '$2>=2{a++;next}
$2>0.5 && $2 <1 {b++;next}
$2>-0.5 && $2 <0.5 {c++;next}
...
$2<=-2{x++;next}
END{print "total:",NR;
print ">2:",a;
print "1-2:",b;
...
print "<-2:",x
}' file
You could simply sort the entries numerically, using sort, and later count the number of entries in each interval. For example, considering your input:
cut -f 2 -d ' ' input.txt | sort -nr | awk '
BEGIN { split("2 1 0.5 -0.5 -1 -2", inter); i = 1; }
{
if (i > 6) { ++c; next; }
if ($1 >= inter[i]) ++c;
else if (i == 1) { print c, "greater than", inter[i++]; c = 1; }
else { print c, "between", inter[i - 1], "and", inter[i++]; c = 1; }
}
END { print c, "lower than", inter[i - 1]; }'
If your input is already sorted, you may even shorten your command line, using:
awk 'BEGIN { split("2 1 0.5 -0.5 -1 -2", inter); i = 1; }
{
if (i > 6) { ++c; next; }
if ($2 >= inter[i]) ++c;
else if (i == 1) { print c, "greater than", inter[i++]; c = 1; }
else { print c, "between", inter[i - 1], "and", inter[i++]; c = 1; }
}
END { print c, "lower than", inter[i - 1]; }' input.txt
And the resulting output -- which you may format as you will:
3 greater than 2
1 between 2 and 1
2 between 1 and 0.5
6 between 0.5 and -0.5
1 between -0.5 and -1
1 between -1 and -2
2 lower than -2
One approach would be to implement this with a single awk command by maintaining a running count for each category you are interested in.
#!/bin/bash
if [ $# -ne 1 ]
then
echo "Usage: $0 INPUT"
exit 1
fi
awk ' {
if ($2 > 2) count[0]++
else if ($2 > 1) count[1]++
else if ($2 > 0.5) count[2]++
else if ($2 > -0.5) count[3]++
else if ($2 > -1) count[4]++
else if ($2 > -2) count[5]++
else count[6]++
} END {
print " > 2\t", count[0]
print " 1 to 2\t", count[1]
print " 0.5 to 1\t", count[2]
print "-0.5 to 0.5\t", count[3]
print "-1 to -0.5\t", count[4]
print "-2 to -1\t", count[5]
print " < -2\t", count[6]
}' $1
awk -f script.awk input.txt
with script.awk:
{
if ($2>=2) counter1++
else if ($2>=1) counter2++
else if ($2>=0.5) counter3++
else if ($2>=-0.5) counter4++
else if ($2>=-1) counter5++
else if ($2>=-2) counter6++
else counter7++
}
END{
print "Greater than 2: "counter1
print "Between 1 and 2: "counter2
print "Between 0.5 and 1: "counter3
print "Between -0.5 and 0.5: "counter4
print "Between -1 and -0.5: "counter5
print "Between -2 and -1: "counter6
print "Less than 2: "counter7
}
script toto:
awk '
$2>=2 { count[1]++; label[1]="Greater than or equal to 2"; }
($2>1 && $2<2) { count[2]++; label[2]="Between 1 and 2"; }
($2>0.5 && $2<=1) { count[3]++; label[3]="Between 0.5 and 1"; }
($2>-0.5 && $2<=0.5) { count[4]++; label[4]="Between -0.5 and 0.5"; }
($2>-1 && $2<=-0.5) { count[5]++; label[5]="Between -1 and -0.5"; }
($2>-2 && $2<=-1) { count[6]++; label[6]="Between -2 and -1"; }
$2<=-2 { count[7]++; label[7]="Less than or equal to -2"; }
END { for (i=1;i<=7;i++)
{ printf "%-30s %s\n" ,label[i], count[i];
}
}
' /tmp/input.txt
and the result:
. /tmp/toto
Greater than or equal to 2 3
Between 1 and 2 1
Between 0.5 and 1 2
Between -0.5 and 0.5 6
Between -1 and -0.5 1
Between -2 and -1 1
Less than or equal to -2 2

summing second columns of all files in bash

I have 1-N files in this format:
file 1:
1 1
2 5
3 0
4 0
5 0
file 2:
1 5
2 1
3 0
4 0
5 1
As an output, I want to sum all second columns of all files, so the output looks like this:
output:
1 6
2 6
3 0
4 0
5 1
Thanks a lot.
(Alternatively, it would be best for me to do this operation automatically with all files that have the same name but start with a different number, e.g. 1A.txt, 2A.txt, 3A.txt as one output and 1AD.txt, 2AD.txt, 3AD.txt as the next output.)
Something like this should work:
cat *A.txt | awk '{sums[$1] += $2;} END { for (i in sums) print i " " sums[i]; }'
cat *AD.txt | awk '{sums[$1] += $2;} END { for (i in sums) print i " " sums[i]; }'
A quick summing solution can be done in awk:
{ sum[$1] += $2; }
END { for (i in sum) print i " " sum[i]; }
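Saved as, say, sum.awk (a name chosen here for illustration), it would be run as:
awk -f sum.awk *A.txt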
The easiest way to group your input files is to build a list of suffixes and then glob for them:
ls *.txt | sed -e 's/^[0-9]*//' | sort -u | while read suffix; do
awk '{ sum[$1] += $2; } END { for (i in sum) print i " " sum[i]; }' *$suffix > ${suffix}.sum
done
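For example, with the files 1A.txt, 2A.txt, 1AD.txt and 2AD.txt present, the suffix list reduces to:
$ ls *.txt | sed -e 's/^[0-9]*//' | sort -u
A.txt
AD.txt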
#!/bin/bash
suffixes=$(find . -name '*.txt' | sed 's/.*[0-9][0-9]*\(.*\)\.txt/\1/' | sort -u)
for suffix in ${suffixes}; do
paste *${suffix}.txt | awk '{sum = 0; for (i = 2; i <= NF; i += 2) sum += $i;
print $1" "sum}' > ${suffix}.sums.txt
done
exit 0
Pure Bash:
declare -a sum
for file in *A.txt; do
while read a b; do
((sum[a]+=b))
done < "$file"
done
for idx in ${!sum[*]}; do # iterate over existing indices
echo "$idx ${sum[$idx]}"
done
