Bash - Sum numbers in line, line by line

So I have a file which looks like the following:
9032894 First Last 89 43 100
9423897 First Last 89 20 48
And so on, continuing in this format. My hope is to get the sum of the last three numbers (such as 89 43 100 for the first line) and then use echo to display this sum. In my attempts to do this, I just end up with the entire first column stored in one variable, as the following code does:
first=$(echo $INT | cut -d" " -f4 $1)
Would take the entire first column in the file and store it in "first". How would I go through and sum each individual line, instead of by column? (I already know how to do this using awk; I'm attempting an alternate way of coding it in bash.) My full code thus far (which isn't even close to working) is:
#!/bin/bash
filename=$1
while read -a rows
do
first=$(echo $INT | cut -d" " -f4 $1)
second=$(echo $INT | cut -d" " -f5 $1)
third=$(echo $INT | cut -d" " -f6 $1)
echo ${first}
echo ${second}
echo ${third}
done< $filename
Thanks in advance, it's much appreciated.

You can read file columns directly into variables:
while read id firstname lastname first second third
do
    echo "${first}"
    echo "${second}"
    echo "${third}"
    echo "Sum: $(( first + second + third ))"
done < "$filename"
And get in the habit of quoting your variables, unless you specifically need the result to undergo word-splitting and globbing.
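For example, an unquoted expansion is split on whitespace and has globs expanded, while a quoted one is passed through verbatim:
var="a   b   *"
echo $var      # word-split and glob-expanded: prints "a b" followed by the filenames in the current directory (or a literal * if nothing matches)
echo "$var"    # prints the literal value: a   b   *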

@Eric: Try:
awk '{print $NF+$(NF-1)+$(NF-2)}' Input_file
As you have mentioned, it is always the last 3 fields we need to sum, so we need not go through a loop; we can simply add their values with $NF+$(NF-1)+$(NF-2), where $NF means the last field of a line, $(NF-1) is the second-to-last field, and so on.
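Run against the sample data from the question (saved as Input_file), that prints one sum per line:
awk '{print $NF+$(NF-1)+$(NF-2)}' Input_file
232
157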

If you read a line into an array, just sum the last three elements of the array:
while read -a cells ; do
echo $(( cells[-1] + cells[-2] + cells[-3] ))
done
Output:
232
157
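Note that negative array indices need a reasonably recent bash. On older shells you can index from the array length instead - a minimal sketch, assuming the data is in $filename as in the question:
while read -a cells ; do
    n=${#cells[@]}                                     # number of fields on this line
    echo $(( cells[n-1] + cells[n-2] + cells[n-3] ))   # sum of the last three fields
done < "$filename"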

You could also use awk:
$ awk '{s=0; for(i=4;i<=NF;i++) s+=$i; print s}' file
232
157

Related

Bash : How to check in a file if there are any word duplicates

I have a file with 6-character words on every line and I want to check if there are any duplicate words. I did the following but something isn't right:
#!/bin/bash
while read line
do
name=$line
d=$( grep '$name' chain.txt | wc -w )
if [ $d -gt '1' ]; then
echo $d $name
fi
done <$1
Assuming each word is on a new line, you can achieve this without looping:
$ cat chain.txt | sort | uniq -c | grep -v " 1 " | cut -c9-
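As an alternative, uniq -d prints only the duplicated lines, which avoids the fixed-width cut (add -c if you also want the counts):
sort chain.txt | uniq -d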
You can use awk for that:
awk -F'\n' 'found[$1] {print}; {found[$1]++}' chain.txt
Set the field separator to newline, so that we look at the whole line. Then, if the line already exists in the array found, print the line. Finally, add the line to the found array.
Note: a line will only be suppressed once, so if the same line appears, say, 6 times, it will be printed 5 times.
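For example, with a hypothetical chain.txt of
abcdef
ghijkl
abcdef
abcdef
the command prints abcdef twice: the first occurrence only records the word in found, and every later occurrence is reported.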

adding numbers without grep -c option

I have a txt file like
Peugeot:406:1999:Silver:1
Ford:Fiesta:1995:Red:2
Peugeot:206:2000:Black:1
Ford:Fiesta:1995:Red:2
I am looking for a command that counts the number of red Ford Fiesta cars.
The last number in each line is the amount of that particular car.
The command I am looking for CANNOT use the -c option of grep.
So this command should just output the number 4.
Any help would be welcome, thank you.
A simple bit of awk would do the trick:
awk -F: '$1=="Ford" && $4=="Red" { c+=$5 } END { print c }' file
Output:
4
Explanation:
The -F: switch means that the input field separator is a colon, so the car manufacturer is $1 (the 1st field), the model is $2, etc.
If the 1st field is "Ford" and the 4th field is "Red", then add the value of the 5th (last) field to the variable c. Once the whole file has been processed, print out the value of c.
For a native bash solution:
c=0
while IFS=":" read -ra col; do
[[ ${col[0]} == Ford ]] && [[ ${col[3]} == Red ]] && (( c += col[4] ))
done < file && echo $c
Effectively applies the same logic as the awk one above, without any additional dependencies.
Methods:
1.) Use some scripting language for counting, like awk or Perl. An awk solution has already been posted; here is a Perl solution.
perl -F: -lane '$s+=$F[4] if m/Ford:.*:Red/}{print $s' < carfile
#or
perl -F: -lane '$s+=$F[4] if ($F[0]=~m/Ford/ && $F[3]=~/Red/)}{print $s' < carfile
Both examples print
4
2.) The second method is based on shell pipelining:
filter out the right rows
extract the column with the count
sum the numbers
e.g. some examples:
grep 'Ford:.*:Red:' carfile | cut -d: -f5 | paste -sd+ | bc
the grep filters out the right rows
the cut gets the last column
the paste creates a line like 2+2, which can then be evaluated by
the bc, which does the counting
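On the sample carfile, each stage of that pipeline produces (output shown as comments):
grep 'Ford:.*:Red:' carfile    # Ford:Fiesta:1995:Red:2
                               # Ford:Fiesta:1995:Red:2
cut -d: -f5                    # 2
                               # 2
paste -sd+                     # 2+2
bc                             # 4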
Another example:
sed -n 's/\(Ford:.*:Red\):\(.*\)/\2/p' carfile | paste -sd+ | bc
the sed both filters the rows and extracts the count
Another example - a different way of counting:
(echo 0 ; sed -n 's/\(Ford:.*:Red\):\(.*\)/\2+/p' carfile ;echo p )| dc
The numbers are summed by the RPN calculator dc: it works like 0 2 +, i.e. the values come first and the operation comes last.
the first echo pushes 0 onto the stack
the sed creates a stream of numbers like 2+ 2+
the last echo p prints the stack
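With the sample file, the stream fed to dc is therefore:
0
2+
2+
p
dc first pushes 0; each 2 is then pushed and + replaces the top two stack values with their sum; the final p prints the result, 4.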
There are many other possibilities for counting a stream of numbers, e.g. counting in bash:
sum=0
while read -r num
do
    sum=$(( sum + num ))
done < <(sed -n 's/\(Ford:.*:Red\):\(.*\)/\2/p' carfile)
echo "$sum"
and pure bash:
while IFS=: read -r maker model year color count
do
if [[ "$maker" == "Ford" && "$color" == "Red" ]]
then
(( sum += $count ))
fi
done < carfile
echo $sum

Cut column by column name in bash

I want to specify a column by name (e.g. 102), find the position of this column and then use something like cut -f-5,7- with the found position to delete the specified column.
This is my file header (delim = "\t"):
#CHROM POS 1 100 101 102 103 107 108
This awk should work:
awk -F'\t' -v c="102" 'NR==1{for (i=1; i<=NF; i++) if ($i==c){p=i; break}; next} {print $p}' file
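That prints the values in the named column. To instead delete the column, the found position can be fed to cut - a sketch assuming GNU cut, which supports --complement:
pos=$(awk -F'\t' -v c="102" 'NR==1{for (i=1; i<=NF; i++) if ($i==c){print i; exit}}' file)
cut --complement -f"$pos" file    # output every column except the one headed 102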
Here's one possible solution without the restriction that only one column is to be removed. It is written as a bash function, where the first argument is the filename, and the remaining arguments are the columns to exclude.
rmcol() {
local file=$1
shift
cut -f$(head -n1 "$file" | tr \\t \\n | grep -vFxn "${@/#/-e}" |
cut -d: -f1 | paste -sd,) "$file"
}
If you want to select rather than exclude the named columns, then change -vFxn to -Fxn.
That almost certainly requires some sort of explanation. The first two lines of the function just remove the filename from the arguments and store it for later use. The cut command will then select the appropriate columns; the column numbers are computed with the complicated pipeline which follows:
head -n1 "$file" | # Take the first line of the file
tr \\t \\n | # Change all the tabs to newlines [ Note 1]
grep # Select all lines (i.e. column names) which
-v # don't match
F # the literal string
x # which is the complete line
n # and include the line number in the output
"${#/#/-e}" | # Put -e at the beginning of each command line argument,
# converting the arguments into grep pattern arguments (-e)
cut -d: -f1 | # Select only the line number from each match
paste -sd, # Paste together all the line numbers, separated with commas.
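A hypothetical usage of rmcol, given the tab-separated header from the question (column names to exclude are listed after the filename):
rmcol file 102            # print file without the column headed 102
rmcol file 100 103 108    # drop several columns at once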
Using a for loop in bash:
C=1; for i in $(head file -n 1) ; do if [ $i == "102" ] ; then break ; else C=$(( $C + 1 )) ; fi ; done ; echo $C
And a full script
C=1
for i in $(head in_file -n 1) ; do
echo $i
if [ $i == "102" ] ; then
break ;
else
echo $C
C=$(( $C + 1 ))
fi
done
cut -f1-$(($C-1)),$(($C+1))- in_file
Trying a solution without looping through columns, I get:
#!/bin/bash
pick="$1"
titles="pos 1 100 102 105"
tmp=" $titles "
tmp="${tmp%% $pick* }"
tmp=($tmp)
echo "column ${#tmp[#]}"
It suffers from incorrectly reporting the last column if the column name can't be found.
Try this small awk utility to cut specific headers - https://github.com/rohitprajapati/toyeca-cutter
Example usage -
awk -f toyeca-cutter.awk -v c="col1, col2, col3, col4" my_file.csv

Bash - invalid arithmetic operator

I'm trying to study for a test and one of the subjects is bash scripts.
I have the following txt file :
123456 100
654321 50
203374111 86
I need to get the average of the scores (the numbers in the second column).
This is what I have written :
cat $course_name$end | while read line; do
sum=`echo $line | cut -f2 -d" "`
let total+=$sum
done
I have tried with
while read -a line
and then
let sum+=${line[1]}
But I'm still getting the same error mentioned in the header.
I love AWK:
awk '{ sum += $2 } END { print sum/NR }' x.txt
The values are stored in x.txt. Please note that many answers don't actually compute the average, as they still need to divide by the number of lines at the end. Often that is done with wc -l < x.txt, but with this solution you get it almost for free.
cat your_file_name.txt | cut -f2 -d" " | paste -sd+ | bc
This should do the job!
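That gives the total; to get the average asked for, divide by the number of lines, e.g.:
total=$(cat your_file_name.txt | cut -f2 -d" " | paste -sd+ | bc)
lines=$(wc -l < your_file_name.txt)
echo "scale=2; $total / $lines" | bc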
You are very close, this works for me:
while read line; do
sum=$(echo $line | cut -f2 -d" ")
echo "sum is $sum"
let total+=$sum
echo "total is $total"
done < file
echo "total is $total"
As you can see, there is no need to use cat $course_name$end; it is enough to do
while read line
do
done < file
Also, it is preferable to use
sum=$(echo $line | cut -f2 -d" ")
rather than
sum=`echo $line | cut -f2 -d" "`
Or even
sum=$(cut -f2 -d" " <<< "$line")
There's no need to use cat as well as read; you can redirect the contents of the file into the loop. You also don't need to use let for arithmetic.
sum=0
count=0
while read id score; do
(( sum += score )) && (( ++count ))
done < "$course_name$end"
echo $(( sum / count ))
This will give you an integer result, as bash doesn't do floating point arithmetic. To get a floating point result, you could use bc:
bc <<< "scale=2;$a/$b"
This will give you a result correct to 2 decimal places.

BASH script - print sorted contents from all files in directory with no repetitions

In the current directory there are files with names of the form "gradesXXX" (where XXX is a course number) which look like this:
ID GRADE (this line is not contained in the files)
123456789 56
213495873 84
098342362 77
. .
. .
. .
I want to write a BASH script that prints all the IDs that have a grade above a certain number, which is given as the first parameter to said script.
The requirements are that an ID must be printed once at most, and that no intermediate files are used.
I was guided to use two scripts - the first one line long, and the second up to six lines long (not including the "#!" line).
I'm quite lost with this one so any suggestions will be appreciated.
Cheers.
The answer I was looking for was
// internal script
#!/bin/bash
while read line; do
line_split=( $line )
if (( ${line_split[1]} > $1 )); then
echo ${line_split[0]}
fi
done
// external script
#!/bin/bash
cat grades* | sort -r -n -k 1 | internalScript $1 | cut -f1 -d" " | uniq
OK, a simple solution.
cat grades[0-9][0-9][0-9] | sort -nrk 2 | while read ID GRADE ; do if [ $GRADE -lt 60 ] ; then break ; fi ; echo $ID ; done | sort -u
I'm not sure why two scripts should be necessary. All in a script:
#!/bin/bash
threshold=$1
cat grades[0-9][0-9][0-9] | sort -nrk 2 | while read ID GRADE ; do if [ $GRADE -lt $threshold ] ; then break ; fi ; echo $ID ; done | sort -u
We first cat all the grade files, then sort them by grade in reverse order. The while loop breaks if the grade is below the threshold, so that only lines with higher grades get their ID printed. sort -u makes sure that every ID is sent only once.
You can use awk:
awk '{ if ($2 > 70) print $1 }' grades777
It prints the first column of every line whose second column is greater than 70. If you need to change the threshold:
N=71
awk '{ if ($2 > '$N') print $1 }' grades777
The single quotes are required to splice shell variables into the awk program. To work with all grades??? files in the current directory and remove duplicated lines:
awk '{ if ($2 > '$N') print $1 }' grades??? | sort -u
A simple one-line solution.
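A slightly cleaner way to pass the threshold is awk's -v option, which avoids splicing the shell variable into the quoted program (same result):
N=71
awk -v n="$N" '$2 > n { print $1 }' grades??? | sort -u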
Yet another solution:
cat grades[0-9][0-9][0-9] | awk -v MAX=70 '{ if ($2 > MAX) foo[$1]=1 }END{for (id in foo) print id }'
Append | sort -n after that if you want the IDs in sorted order.
In pure bash :
N=60
for file in /path/*; do
while read id grade; do ((grade > N)) && echo "$id"; done < "$file"
done
OUTPUT
213495873
098342362
