search for a string, and add if it matches - bash

I have a file that has 2 columns as given below....
101 6
102 23
103 45
109 36
101 42
108 21
102 24
109 67
and so on......
I want to write a script that adds the values from the 2nd column wherever the corresponding 1st column matches;
for example, add all 2nd column values whose 1st column is 101,
add all 2nd column values whose 1st column is 102,
add all 2nd column values whose 1st column is 103, and so on ...
I wrote my script like this, but I'm not getting the correct result:
awk '{print $1}' data.txt > col1.txt
while read line
do
awk ' if [$1 == $line] sum+=$2; END {print "Sum for time stamp", $line"=", sum}; sum=0' data.txt
done < col1.txt

awk '{array[$1]+=$2} END { for (i in array) {print "Sum for time stamp",i,"=", array[i]}}' data.txt
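This keeps one running total per distinct value of the first column. Since for (i in array) visits keys in an unspecified order, the output order may vary; piping through sort gives a stable listing (a small variation on the same one-liner, not part of the original answer):
awk '{array[$1]+=$2} END {for (i in array) print i, array[i]}' data.txt | sort -n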

Pure Bash:
declare -a sum
while read -r -a line ; do
(( sum[${line[0]}] += line[1] ))
done < "$infile"
for index in "${!sum[@]}"; do
echo -e "$index ${sum[$index]}"
done
The output:
101 48
102 47
103 45
108 21
109 103
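The indexed array above works because the keys (101, 102, ...) are integers; if the first column could hold arbitrary strings, a bash 4+ associative array does the same job (a sketch under that assumption):
declare -A sum
while read -r key value _; do
(( sum[$key] += value ))
done < "$infile"
for key in "${!sum[@]}"; do
printf '%s %s\n' "$key" "${sum[$key]}"
done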

How to add the elements in a for loop [duplicate]

So basically my code looks through data and greps whatever it begins with, and I've been trying to figure out a way to add those values together.
the sample input is
35 45 75 76
34 45 53 55
33 34 32 21
my code:
for id in $(awk '{ print $1 }' < $3); do echo $id; done
I'm printing it right now to see the values, but basically what's outputted is
35
34
33
I'm trying to add them all together but I can't figure out how; some help would be appreciated.
my desired output would be
103
Lots of ways to do this, a few ideas ...
$ cat numbers.dat
35 45 75 76
34 45 53 55
33 34 32 21
Tweaking OP's current code:
$ sum=0
$ for id in $(awk '{ print $1 }' < numbers.dat); do ((sum+=id)); done
$ echo "${sum}"
102
Eliminating awk:
$ sum=0
$ while read -r id rest_of_line; do sum=$((sum+id)); done < numbers.dat
$ echo "${sum}"
102
Using just awk (looks like Aivean beat me to it):
$ awk '{sum+=$1} END {print sum}' numbers.dat
102
awk '{ sum += $1 } END { print sum }'
Test:
35 45 75 76
34 45 53 55
33 34 32 21
Result:
102
(sum(35, 34, 33) = 102, that's what you want, right?)
Here is the detailed explanation of how this works:
$1 is the first column of the input.
sum is the variable that holds the sum of all the values in the first column.
END { print sum } is the action to be performed after all the input has been processed.
So the awk program is basically summing up the first column of the input and printing the result.
This answer was partially generated by Davinci Codex model, supervised and verified by me.
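The column being summed can also be parameterized instead of hard-coded, by passing the column number with -v (a hypothetical variation on the answer above, not part of the original):
awk -v col=1 '{ sum += $col } END { print sum }' numbers.dat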

Select first two columns from tab-delimited text file and substitute with '_' character

I have a sample input file as follows
RF00001 1c2x C 3 118 77.20 1.6e-20 1 119 f29242
RF00001 1ffk 9 1 121 77.40 1.4e-20 1 119 8e2511
RF00001 1jj2 9 1 121 77.40 1.4e-20 1 119 f29242
RF00001 1k73 B 1 121 77.40 1.4e-20 1 119 8484c0
RF00001 1k8a B 1 121 77.40 1.4e-20 1 119 93c090
RF00001 1k9m B 1 121 77.40 1.4e-20 1 119 ebeb30
RF00001 1kc8 B 1 121 77.40 1.4e-20 1 119 bdc000
I need to extract the second and third columns from the text file and substitute the tab with '_'
Desired output file :
1c2x_C
1ffk_9
1jj2_9
1k73_B
1k8a_B
1k9m_B
1kc8_B
I am able to print the two columns by:
awk -F" " '{ print $2,$3 }' input.txt
but I am unable to substitute the tab with '_' using the following command:
awk -F" " '{ print $2,'_',$3 }' input.txt
Could you please try the following.
awk '{print $2"_"$3}' Input_file
2nd solution:
awk 'BEGIN{OFS="_"} {print $2,$3}' Input_file
3rd solution: Adding a sed solution.
sed -E 's/[^ ]* +([^ ]*) +([^ ]*).*/\1_\2/' Input_file
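For reference, the sed expression works field by field: each [^ ]* matches one space-separated field, so the leading [^ ]* + skips the first field, the two capture groups grab the 2nd and 3rd fields, .* discards the rest of the line, and \1_\2 prints the two captures joined by an underscore.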

Bash - Read lines from file with intervals

I need to read all the lines of a file, splitting them into batches at fixed intervals. A function will execute a command with each batch of lines.
Lines range example:
1 - 20
21 - 50
51 - 70
...
I tried the sed command in a for loop, but the range does not reach the end of the file. For example, with a file of 125 lines the loop only reads up to line 121, so the last lines are missed.
I commented out the sed line below; in this loop the range goes up to 121 while COUNT is 125.
TEXT=`cat wordlist.txt`
COUNT=$( wc -l <<<$TEXT )
for i in $(seq 1 20 $COUNT);
do
echo "$i"
#sed -n "1","${i}p"<<<$TEXT
done
Output:
1
21
41
61
81
101
121
Thanks!
Quick fix - ensure the last line is processed by throwing $COUNT on the end of the values assigned to i:
for i in $(seq 1 20 $COUNT) $COUNT;
do
echo "$i"
done
1
21
41
61
81
101
121
125
If COUNT happens to be the same as the last value generated by seq, then we'll need some logic to avoid processing that value twice; for example, if COUNT=121 then we'll want to skip the extra pass when i=121, e.g.:
# assume COUNT=121
lasti=0
for i in $(seq 1 20 $COUNT) $COUNT;
do
[ $lasti = $COUNT ] && break
echo "$i"
lasti=$i
done
1
21
41
61
81
101
121
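Since the original goal was to run a command on each batch of lines, the fixed ranges can be wired to sed like this (a sketch; process_batch is a hypothetical stand-in for the real command):
start=1
while [ "$start" -le "$COUNT" ]; do
end=$(( start + 19 ))
[ "$end" -gt "$COUNT" ] && end=$COUNT
# print lines start..end of the file and hand them to the batch command
sed -n "${start},${end}p" wordlist.txt | process_batch
start=$(( end + 1 ))
done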

extracting lines if the first field matches another list saved in a different file -- shell command

I have two files. One contains a list of items, e.g.,
Allie
Bob
John
Laurie
Another file (file2) contains a different list of items in a different order, but some items might overlap with the items in file1, e.g.,
Laurie 45 56 6 75
Moxipen 10 45 56 56
Allie 45 56 67 23
I want to intersect these two files and extract only those lines from file2 whose first field matches an item in file1.
i.e., my output should be
Allie 45 56 67 23
Laurie 45 56 6 75
(preferably in this order, but it's OK if not)
grep -f file1 file2 doesn't do what I want.
I also need something efficient because the second file is HUGE.
I also tried this:
awk -F, 'FNR==NR {a[$1]=$0; next}; $1 in a {print a[$1]}' file2 file1
If order doesn't matter then
awk 'FNR==NR{ arr[$1]; next }$1 in arr' file1 file2
Explanation
FNR==NR{ arr[$1]; next } Here we read the first file (file1); arr is an array whose index keys are the first field, $1.
$1 in arr Here we read the second file (file2); if the array arr created while reading the first file has an index key equal to the second file's first column ($1 in arr is true if the index key exists), then the current record/row/line from file2 is printed.
Test Results:
akshay@db-3325:/tmp$ cat file1
Allie
Bob
John
Laurie
akshay@db-3325:/tmp$ cat file2
Laurie 45 56 6 75
Moxipen 10 45 56 56
Allie 45 56 67 23
akshay@db-3325:/tmp$ awk 'FNR==NR{ arr[$1]; next }$1 in arr' file1 file2
Laurie 45 56 6 75
Allie 45 56 67 23
No need for complex joins, it is a filtering task:
$ grep -wFf file1 file2
Laurie 45 56 6 75
Allie 45 56 67 23
This has the benefit of keeping the order of file2 as well. The -w option forces full word matches, eliminating sub-string matches that would create false positives, and -F treats the patterns as fixed strings. Of course, if your sample input is not representative and your data may contain key-like entries in other fields, this will not work without anchoring the match to the beginning of the line.
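One way to add that anchoring is to turn each key into an anchored pattern before handing the list to grep (a sketch; it assumes the keys contain no regex metacharacters, since -F can no longer be used once anchors are involved):
grep -f <(sed 's/.*/^& /' file1) file2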
This is the job that join is built for.
Providing a reproducer testable via copy-and-paste with shell functions (which you could replace with your actual input files):
cat_file1() {
printf '%s\n' Allie Bob John Laurie
}
cat_file2() {
printf '%s\n' 'Laurie 45 56 6 75' \
'Moxipen 10 45 56 56' \
'Allie 45 56 67 23'
}
join <(cat_file1 | sort) <(cat_file2 | sort)
...properly emits:
Allie 45 56 67 23
Laurie 45 56 6 75
Of course, don't cat file1 | sort -- run sort <file1 to provide a real handle for better efficiency, or (better!) store your inputs in sorted form in the first place.
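With real files, that advice looks like this (the same approach as above, minus the helper functions):
join <(sort file1) <(sort file2)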

shell script and awk extraction to calculate averages

I have a shell script that contains a loop. This loop calls another script. The output of each run of the loop is appended to a file (outOfLoop.tr). When the loop is finished, an awk command should calculate the average of specific columns and append the results to another file (fin.tr). At the end, fin.tr is printed.
I managed to get the first part, appending the results from the loop into the (outOfLoop.tr) file, and my awk commands seem to work... but I'm not getting the final expected output in terms of format. I think I'm missing something. Here is my try:
#!/bin/bash
rm outOfLoop.tr
rm fin.tr
x=1
lmax=4
while [ $x -le $lmax ]
do
calling another script >> outOfLoop.tr
x=$(( $x + 1 ))
done
cat outOfLoop.tr
#/////////////////
#//I'm getting the above part correctly and the output is :
27 194 119 59 178
27 180 100 30 187
27 175 120 59 130
27 189 125 80 145
#////////////////////
#back again to the script
echo "noRun\t A\t B\t C\t D\t E"
echo "----------------------\n"
#// print the total number of runs from the loop
echo "$lmax\t">>fin.tr
#// extract the first column from the output which is 27
awk '{print $1}' outOfLoop.tr >>fin.tr
echo "\t">>fin.tr
#Sum the column---calculate average
awk '{s+=$5;max+=0.5}END{print s/max}' outOfLoop.tr >>fin.tr
echo "\t">>fin.tr
awk '{s+=$4;max+=0.5}END{print s/max}' outOfLoop.tr >>fin.tr
echo "\t">>fin.tr
awk '{s+=$3;max+=0.5}END{print s/max}' outOfLoop.tr >>fin.tr
echo "\t">>fin.tr
awk '{s+=$2;max+=0.5}END{print s/max}' outOfLoop.tr >> fin.tr
echo "-------------------------------------------\n"
cat fin.tr
rm outOfLoop.tr
I want the format to be like :
noRun A B C D E
----------------------------------------------------------
4 27 average average average average
I have incremented max inside the awk command by 0.5 because there was a blank line between each line of the results (the output of the outOfLoop file).
$ cat file
27 194 119 59 178
27 180 100 30 187
27 175 120 59 130
27 189 125 80 145
$ cat tst.awk
NF {
for (i=1;i<=NF;i++) {
sum[i] += $i
}
noRun++
}
END {
fmt="%-10s%-10s%-10s%-10s%-10s%-10s\n"
printf fmt,"noRun","A","B","C","D","E"
printf "----------------------------------------------------------\n"
printf fmt,noRun,$1,sum[2]/noRun,sum[3]/noRun,sum[4]/noRun,sum[5]/noRun
}
$ awk -f tst.awk file
noRun A B C D E
----------------------------------------------------------
4 27 184.5 116 57 160
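To plug this back into the original script, the whole block of echo and awk appends can be replaced with a single call (a sketch reusing tst.awk from above):
awk -f tst.awk outOfLoop.tr > fin.tr
cat fin.tr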
