This question already has answers here:
Summing values of a column using awk command
(2 answers)
Closed 1 year ago.
So basically my code looks through the data and greps whatever it begins with, and I've been trying to figure out a way to add those values together.
the sample input is
35 45 75 76
34 45 53 55
33 34 32 21
my code:
for id in $(awk '{ print $1 }' < $3); do echo $id; done
I'm printing it right now to see the values, but basically what's output is
35
34
33
I'm trying to add them all together but I can't figure out how; some help would be appreciated.
my desired output would be
103
Lots of ways to do this, a few ideas ...
$ cat numbers.dat
35 45 75 76
34 45 53 55
33 34 32 21
Tweaking OP's current code:
$ sum=0
$ for id in $(awk '{ print $1 }' < numbers.dat); do ((sum+=id)); done
$ echo "${sum}"
102
Eliminating awk:
$ sum=0
$ while read -r id rest_of_line; do sum=$((sum+id)); done < numbers.dat
$ echo "${sum}"
102
Using just awk (looks like Aivean beat me to it):
$ awk '{sum+=$1} END {print sum}' numbers.dat
102
awk '{ sum += $1 } END { print sum }'
Test:
35 45 75 76
34 45 53 55
33 34 32 21
Result:
102
(sum(35, 34, 33) = 102, that's what you want, right?)
Here is the detailed explanation of how this works:
$1 is the first column of the input.
sum is the variable that holds the sum of all the values in the first column.
END { print sum } is the action to be performed after all the input has been processed.
So the awk program is basically summing up the first column of the input and printing the result.
This answer was partially generated by Davinci Codex model, supervised and verified by me.
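As a small extension (not something the OP asked for, but a common follow-up): looping over NF sums every column, not just the first.

```shell
# Sums all whitespace-separated numbers on every line, not only column 1
awk '{ for (i = 1; i <= NF; i++) sum += $i } END { print sum }' numbers.dat
```

With the sample input above this prints 538 (231 + 187 + 120).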
I have two files. One contains a list of items, e.g.,
Allie
Bob
John
Laurie
Another file (file2) contains a different list of items in a different order, but some items might overlap with the items in file 1, e.g.,
Laurie 45 56 6 75
Moxipen 10 45 56 56
Allie 45 56 67 23
I want to intersect these two files and extract only those lines from file 2 whose first field matches an item in file 1.
i.e., my output should be
Allie 45 56 67 23
Laurie 45 56 6 75
(preferably in this order, but it's OK if not)
grep -f file1 file2 doesn't do what I want.
I also need something efficient because the second file is HUGE.
I also tried this:
awk -F, 'FNR==NR {a[$1]=$0; next}; $1 in a {print a[$1]}' file2 file1
If order doesn't matter then
awk 'FNR==NR{ arr[$1]; next }$1 in arr' file1 file2
Explanation
FNR==NR{ arr[$1]; next } Here we read the first file (file1); arr is an array whose index keys are the first field, $1.
$1 in arr Here we read the second file (file2); if the array arr built while reading the first file has an index key matching the second file's first column ($1 in arr is true if the index key exists), the current record/row/line from file2 is printed.
Test Results:
akshay#db-3325:/tmp$ cat file1
Allie
Bob
John
Laurie
akshay#db-3325:/tmp$ cat file2
Laurie 45 56 6 75
Moxipen 10 45 56 56
Allie 45 56 67 23
akshay#db-3325:/tmp$ awk 'FNR==NR{ arr[$1]; next }$1 in arr' file1 file2
Laurie 45 56 6 75
Allie 45 56 67 23
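The OP's preferred order (file1's order) can be had by indexing file2's lines instead of file1's keys. A sketch, assuming keys are unique in file2 (a repeated key would keep only its last line):

```shell
# Read file2 first, storing each whole line under its key;
# then walk file1 and print stored lines in file1's order
awk 'FNR==NR{ line[$1]=$0; next } $1 in line { print line[$1] }' file2 file1
```

Note that this loads file2 into memory, which may matter since file2 is described as huge.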
No need for complex joins; this is a filtering job:
$ grep -wFf file1 file2
Laurie 45 56 6 75
Allie 45 56 67 23
has the benefit of keeping the order of file2 as well. The -w option is for full-word matches, which eliminates the false positives that substring matches would create. Of course, if your sample input is not representative and your data may contain key-like entries in other fields, this will not work without also anchoring the patterns to the beginning of the line.
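One way to do that anchoring, at the cost of giving up -F (the keys here contain no regex metacharacters, so plain grep is safe), is to prefix each pattern with ^ on the fly. A sketch, using bash process substitution:

```shell
# sed adds a leading ^ to every key; -w still enforces a word boundary on the right
grep -wf <(sed 's/^/^/' file1) file2
```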
This is the job that join is built for.
Providing a reproducer testable via copy-and-paste with shell functions (which you could replace with your actual input files):
cat_file1() {
printf '%s\n' Allie Bob John Laurie
}
cat_file2() {
printf '%s\n' 'Laurie 45 56 6 75' \
'Moxipen 10 45 56 56' \
'Allie 45 56 67 23'
}
join <(cat_file1 | sort) <(cat_file2 | sort)
...properly emits:
Allie 45 56 67 23
Laurie 45 56 6 75
Of course, don't cat file1 | sort -- run sort <file1 to provide a real handle for better efficiency, or (better!) store your inputs in sorted form in the first place.
$ cat grades.dat
santosh 65 65 65 65
john 85 92 78 94 88
andrea 89 90 75 90 86
jasper 84 88 80 92 84
santosh 99 99 99 99 99
Script:
#!/usr/bin/bash
filename="$1"
while read line
do
    a=`grep -w "santosh" $1 | awk '{print$1}' | wc -l`
    echo "total is count of the file is $a";
done < "$filename"
Output:
total is count of the file is 2
total is count of the file is 2
total is count of the file is 2
total is count of the file is 2
total is count of the file is 2
The real output should be just a single
total is count of the file is 2
line, right? Please let me know where I am going wrong in the above script.
Whilst others have shown you better ways to solve your problem, the answer to your question is in the following line:
a=`grep -w "santosh" $1 | awk '{print$1}' |wc -l`
You store each name in the variable line as the while loop runs, but line is never used. Instead, your loop always looks for "santosh", which does appear twice; and because you run that same query once for each of the 5 lines in the file being searched, you get 5 identical lines of output.
You could alter your current script like so:
a=$(grep -w "$line" "$filename" | awk '{print$1}' | wc -l)
The above is not meant to be a solution as others have pointed out, but it does solve your issue.
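Putting that fix into the whole script, as a sketch (the message text is adjusted slightly, and read -r line _ keeps just the first word of each line; grep -c replaces the awk | wc -l pipeline since it counts matching lines directly):

```shell
#!/usr/bin/env bash
filename="$1"
while read -r line _; do
    # grep -c counts matching lines; -w matches whole words only
    a=$(grep -cw "$line" "$filename")
    echo "total count for $line is $a"
done < "$filename"
```

Note it still prints one line per input line, so a name appearing twice is reported twice; deduplicating first (e.g. with sort -u) would avoid that.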
So I'm trying to filter 'duplicate' results from a file.
I have a file that looks like:
7 14 35 35 4 23
23 53 85 27 49 1
35 4 23 27 49 1
....
that I mentally divide into item 1 and item 2. Item 1 is the first 3 numbers on each line and item 2 is the last 3 numbers on each line.
I've also got a list of 'items':
7 14 35
23 53 85
35 4 23
27 49 1
...
At a certain point in the file, let's say at line number 3 (this number is arbitrary and just for the example), the 'items' can be separated: say lines 1 and 2 are red and lines 3 and 4 are blue.
I want to make sure that my original file contains no red-red or blue-blue pairs, only red-blue or blue-red, while retaining the original numbers.
So ideally the file would go from:
7 14 35 35 4 23 (red blue)
23 53 85 27 49 1 (red blue)
35 4 23 27 49 1 (blue blue)
....
to
7 14 35 35 4 23 (red blue)
23 53 85 27 49 1 (red blue)
....
I'm having trouble thinking of a good (or any) way to do it.
Any help is appreciated.
EDIT:
A filtering script I have that grabs lines if they contain blue or red items:
#!/bin/bash
while read name; do
    grep "$name" Twoitems
done < Itemblue > filtered

while read name2; do
    grep "$name2" filtered
done < Itemred > doublefiltered
EDIT2:
Example input and item files:
This is pretty easy using grep with option -f.
First of all, generate four 'pattern' files out of your items file.
I am using AWK here, but you might as well use Perl or what not.
Following your example, I put the 'split' between line 2 and 3; please adjust when necessary.
awk 'NR <= 2 {print "^" $0 " "}' items.txt > starts_red.txt
awk 'NR <= 2 {print " " $0 "$"}' items.txt > ends_red.txt
awk 'NR >= 3 {print "^" $0 " "}' items.txt > starts_blue.txt
awk 'NR >= 3 {print " " $0 "$"}' items.txt > ends_blue.txt
Next, use a grep pipeline using the pattern files (option -f) to filter the appropriate lines from the input file.
grep -f starts_red.txt input.txt | grep -f ends_blue.txt > red_blue.txt
grep -f starts_blue.txt input.txt | grep -f ends_red.txt > blue_red.txt
Finally, concatenate the two output files.
Of course, you might as well use >> to let the second grep pipeline append its output to the output of the first.
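The whole pipeline, end to end (file names as in the commands above; the final cat concatenates the two partial results):

```shell
# Generate the four pattern files, split between items 2 and 3
awk 'NR <= 2 {print "^" $0 " "}' items.txt > starts_red.txt
awk 'NR <= 2 {print " " $0 "$"}' items.txt > ends_red.txt
awk 'NR >= 3 {print "^" $0 " "}' items.txt > starts_blue.txt
awk 'NR >= 3 {print " " $0 "$"}' items.txt > ends_blue.txt

# Keep only red-blue and blue-red lines
grep -f starts_red.txt input.txt | grep -f ends_blue.txt > red_blue.txt
grep -f starts_blue.txt input.txt | grep -f ends_red.txt > blue_red.txt
cat red_blue.txt blue_red.txt > filtered.txt
```

With the example input this leaves only the two red-blue lines in filtered.txt; the blue-blue line is dropped.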
Let's say file1 contents
7 14 35 35 4 23
23 53 85 27 49 1
35 4 23 27 49 1
and file2 contents are
7 14 35
23 53 85
35 4 23
27 49 1
Then you can use a hash to map line numbers to colors based on your cutoff and, using that hash, compare the two halves of each line in the first file (split on the third space) for differing colors.
I suppose you want something like the script below. Feel free to modify it according to your requirements.
#!/usr/bin/perl
use strict;
use warnings;

# declare a global hash to keep track of line colors
my %color;

# open both files
open my $fh1, '<', 'file1' or die "unable to open file1: $!\n";
open my $fh2, '<', 'file2' or die "unable to open file2: $!\n";

# iterate over the second file and store each line as
# red or blue in the hash, based on its line number
while (<$fh2>) {
    chomp;
    if ($. <= 2) {
        $color{$_} = "red";
    }
    else {
        $color{$_} = "blue";
    }
}

# close second file
close($fh2);

# iterate over first file
while (<$fh1>) {
    chomp;
    # split the line after the 3rd space-delimited number
    my ($part1, $part2) = split /(?:\d+\s){3}\K/;
    # remove trailing spaces
    $part1 =~ s/\s+$//;
    # print only if $part1 and $part2 do not have the same color
    print "$_\n" if ($color{$part1} ne $color{$part2});
}

# close first file
close($fh1);
I am trying to collapse sequential numbers to ranges in bash. For example, if my input file is
1
2
3
4
15
16
17
18
22
23
45
46
47
I want the output as:
1 4
15 18
22 23
45 47
How can I do this with awk or sed in a single line command?
Thanks for any help!
$ awk 'NR==1{first=$1;last=$1;next} $1 == last+1 {last=$1;next} {print first,last;first=$1;last=first} END{print first,last}' file
1 4
15 18
22 23
45 47
Explanation
NR==1{first=$1;last=$1;next}
On the first line, initialize the variables first and last and skip to next line.
$1 == last+1 {last=$1;next}
If this line continues in the sequence from the last, update last and jump to the next line.
print first,last;first=$1;last=first
If we get here, we have a break in the sequence. Print out the range for the last sequence and reinitialize the variables for a new sequence.
END{print first,last}
After we get to the end of the file, print the final sequence.
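The same logic can be compressed slightly. This variant (just a sketch) also collapses a run of length one to a single number instead of printing it twice:

```shell
# Same state machine: track first/last of the current run,
# flush the run whenever the sequence breaks, and once more at EOF
awk 'NR==1 { first = last = $1; next }
     $1 == last+1 { last = $1; next }
     { print (first == last ? first : first " " last); first = last = $1 }
     END { print (first == last ? first : first " " last) }' file
```

For the sample input the output is identical to the version above, since every run there has at least two numbers.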
I have a file that has 2 columns as given below....
101 6
102 23
103 45
109 36
101 42
108 21
102 24
109 67
and so on......
I want to write a script that adds the values from the 2nd column when their corresponding first-column values match.
For example, add all 2nd-column values whose 1st column is 101;
add all 2nd-column values whose 1st column is 102;
add all 2nd-column values whose 1st column is 103; and so on.
I wrote my script like this, but I'm not getting the correct result:
awk '{print $1}' data.txt > col1.txt
while read line
do
awk ' if [$1 == $line] sum+=$2; END {print "Sum for time stamp", $line"=", sum}; sum=0' data.txt
done < col1.txt
awk '{array[$1]+=$2} END { for (i in array) {print "Sum for time stamp",i,"=", array[i]}}' data.txt
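One caveat: for (i in array) iterates in an unspecified order, so the groups may come out shuffled. If a deterministic order matters, one option (a sketch, with the message text trimmed down) is to pipe through sort:

```shell
# Group sums keyed by column 1, then sort the "key sum" pairs numerically
awk '{ sum[$1] += $2 } END { for (i in sum) print i, sum[i] }' data.txt | sort -n
```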
Pure Bash:
declare -a sum
while read -r -a line; do
    (( sum[${line[0]}] += line[1] ))
done < "$infile"

for index in "${!sum[@]}"; do
    echo "$index ${sum[$index]}"
done
The output:
101 48
102 47
103 45
108 21
109 103