Doing math on the Linux command line - bash

I have a log file from a web server which looks like this:
1908 462
232 538
232 520
232 517
My task is to total column 1 and column 2 in a bash script. My desired output is:
2604 2037
I know awk or sed could go a long way toward solving my problem, but I can't fathom how to actually do it. I've trawled examples on Google but haven't turned up anything useful. Can someone point me in the right direction, please?

awk '{a += $1; b += $2} END { print a " " b }' foo.log
(Note the complete lack of error checking.)
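A quick sanity check with the sample rows from the question (a sketch run locally, so no foo.log is needed):

```shell
# Pipe the question's sample data straight into the one-liner.
printf '1908 462\n232 538\n232 520\n232 517\n' |
  awk '{a += $1; b += $2} END { print a " " b }'
# prints: 2604 2037
```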
EDIT:
OK, here's a version with error checking:
awk 'BEGIN { ok = 1 } { if (/^ *[0-9]+ +[0-9]+ *$/) { a += $1; b += $2 } else { ok = 0; exit 1 } } END { if (ok) print a, b }' foo.log
If you don't want to accept leading or trailing blanks, delete the two " *"s in the if statement.
But this is big enough that it probably shouldn't be a one-liner:
#!/usr/bin/awk -f
BEGIN {
    ok = 1
}
{
    if (/^ *[0-9]+ +[0-9]+ *$/) {
        a += $1
        b += $2
    }
    else {
        ok = 0
        exit 1
    }
}
END {
    if (ok) print a, b
}
There's still no overflow or underflow checking, and it assumes that there will be no signs. The latter is easy enough to fix; the former would be more difficult. (Note that awk uses floating-point internally; if the sum is big enough, it could quietly lose precision.)

Try
awk '{a+=$1;b+=$2} END {print a, b}' file

Here is a non-awk alternative for you:
echo $( cut -f 1 -d " " log_file | tr '\n' + | xargs -I '{}' echo '{}'0 | bc ) $( cut -f 2 -d " " log_file | tr '\n' + | xargs -I '{}' echo '{}'0 | bc )
Make sure you replace log_file with your own file, and that the file does not have any extra or unnecessary blank lines. If it does, filter them out with a command like the following:
grep -v "^\s*$" log_file

These might work for you:
sed ':a;N;s/ \(\S*\)\n\(\S*\) /+\2 \1+/;$!ba;s/ /\n/p;d' file | bc | paste -sd' '
or
echo $(cut -d' ' -f1 file | paste -sd+ | bc) $(cut -d' ' -f2 file| paste -sd+ |bc)

Related

Counting palindromes in a text file

Having followed this thread BASH Finding palindromes in a .txt file, I can't figure out what I'm doing wrong with my script.
#!/bin/bash
search() {
    tr -d '[[:punct:][:digit:]#]' \
        | sed -E -e '/^(.)\1+$/d' \
        | tr -s '[[:space:]]' \
        | tr '[[:space:]]' '\n'
}
search "$1"
paste <(search <"$1") <(search < "$1" | rev) \
    | awk '$1 == $2 && (length($1) >= 3) { print $1 }' \
    | sort | uniq -c
All I'm getting from this script is the output of the whole text file. I want to output only palindromes of length >= 3 and count them, such as
425 did
120 non
etc. My text file is called sample.txt, and every time I run the script with cat sample.txt | source palindrome I get the message 'bash: : No such file or directory'.
Using awk and sed
awk 'function palindrome(str) {len=length(str); for(k=1; k<=len/2+len%2; k++) { if(substr(str,k,1)!=substr(str,len+1-k,1)) return 0 } return 1 } {for(i=1; i<=NF; i++) {if(length($i)>=3){ gsub(/[^a-zA-Z]/,"",$i); if(length($i)>=3) {$i=tolower($i); if(palindrome($i)) arr[$i]++ }} } } END{for(i in arr) print arr[i],i}' file | sed -E '/^[0-9]+ (.)\1+$/d'
Tested on a 1.2GB file; execution time was ~4m 40s (i5-6440HQ @ 2.60GHz / 4 cores / 16GB)
Explanation:
awk '
function palindrome(str)                 # Function to check palindrome
{
    len=length(str);
    for(k=1; k<=len/2+len%2; k++)
    {
        if(substr(str,k,1)!=substr(str,len+1-k,1))
            return 0
    }
    return 1
}
{
    for(i=1; i<=NF; i++)                 # For each field in a record
    {
        if(length($i)>=3)                # if length>=3
        {
            gsub(/[^a-zA-Z]/,"",$i);     # remove non-alpha characters from it
            if(length($i)>=3)            # Check length again after removal
            {
                $i=tolower($i);          # Convert to lowercase
                if(palindrome($i))       # Check if it is a palindrome
                    arr[$i]++            # and store it in an array
            }
        }
    }
}
END{for(i in arr) print arr[i],i}' file | sed -E '/^[0-9]+ (.)\1+$/d'
sed -E '/^[0-9]+ (.)\1+$/d' : From the final result, remove strings composed of a single repeated character, like AAA, BBB etc.
Old Answer (Before EDIT)
You can try the steps below if you want to:
Step 1: Pre-processing
Remove all unnecessary chars and store the result in temp file
tr -dc 'a-zA-Z\n\t ' <file | tr ' ' '\n' > temp
tr -dc 'a-zA-Z\n\t ' This will remove all except letters,\n,\t, space
tr ' ' '\n' This will convert space to \n to separate each word in newlines
Step 2: Processing
grep -wof temp <(rev temp) | sed -E -e '/^(.)\1+$/d' | awk 'length>=3 {a[$1]++} END{ for(i in a) print a[i],i; }'
grep -wof temp <(rev temp) This will give you all palindromes
-w : Select only those lines containing matches that form whole words.
For example : level won't match with levelAAA
-o : Print only the matched group
-f : To use each string in temp file as pattern to search in <(rev temp)
sed -E -e '/^(.)\1+$/d': This will remove words formed of same letters like AAA, BBBBB
awk 'length>=3 {a[$1]++} END{ for(i in a) print a[i],i; }' : This will filter words having length>=3 and counts their frequency and finally prints the result
Example :
Input File :
$ cat file
kayak nalayak bob dad , pikachu. meow !! bhow !! 121 545 ding dong AAA BBB done
kayak nalayak bob dad , pikachu. meow !! bhow !! 121 545 ding dong AAA BBB done
kayak nalayak bob dad , pikachu. meow !! bhow !! 121 545 ding dong AAA BBB done
Output:
$ tr -dc 'a-zA-Z\n\t ' <file | tr ' ' '\n' > temp
$ grep -wof temp <(rev temp) | sed -E -e '/^(.)\1+$/d' | awk 'length>=3 {a[$1]++} END{ for(i in a) print a[i],i; }'
3 dad
3 kayak
3 bob
Just a quick Perl alternative:
perl -0nE 'for( /(\w{3,})/g ){ $a{$_}++ if $_ eq reverse($_)}
END {say "$_ $a{$_}" for keys %a}'
In Perl, $_ should be read as "it".
for( /(\w{3,})/g ) ... for all relevant words (may need some work to reject false positives like "12a21")
if $_ eq reverse($_) ... if it is palindrome
END {say "$_ $a{$_}" for...} ... print each "it" together with its count
Thanks to sokowi and batMan.
Running the Script
The script expects that the file is given as an argument. The script does not read stdin.
Remove the line search "$1" in the middle of the script. It is not part of the linked answer.
Make the script executable using chmod u+x path/to/palindrome.
Call the script using path/to/palindrome path/to/sample.txt. If all the files are in the current working directory, then the command is
./palindrome sample.txt
Alternative Script
Sometimes the linked script works and sometimes it doesn't; I haven't found out why. However, I wrote an alternative script that does the same thing and is also a bit cleaner:
#! /bin/bash
grep -Po '\w{3,}' "$1" | grep -Evw '(.)\1*' | sort > tmp-words
grep -Fwf <(rev tmp-words) tmp-words | uniq -c
rm tmp-words
Save the script, make it executable, and call it with a file as its first argument.

awk: division by zero input record number 1, file source line number 1

I'm trying to get the signed log10-transformed t-test P-value by using the sign of the log2FoldChange multiplied by the inverse of the pvalue:
cat test.xlx | sort -k7g \
| cut -d '_' -f2- \
| awk '!arr[$1]++' \
| awk '{OFS="\t"}
{ if ($6>0) printf "%s\t%4.3e\n", $1, 1/$7; else printf "%s\t%4.3e\n", $1, -1/$7 }' \
| sort -k2gr > result.txt
test.xlx:
ID baseMean log2FoldChange lfcSE stat pvalue padj
ENSMUSG00000037692-Ahdc1 2277.002091 1.742481553 0.170388822 10.22650154 1.51e-24 2.13e-20
ENSMUSG00000035561-Aldh1b1 768.4504879 -2.325533089 0.248837002 -9.345608047 9.14e-21 6.45e-17
ENSMUSG00000038932-Tcfl5 556.1693605 -3.742422892 0.402475728 -9.298505809 1.42e-20 6.71e-17
ENSMUSG00000057182-Scn3a 1363.915962 1.621456045 0.175281852 9.250564289 2.23e-20 7.89e-17
ENSMUSG00000038552-Fndc4 378.821132 2.544026087 0.288831276 8.808000721 1.27e-18 3.6e-15
but I am getting the error:
input record number 1, file
source line number 1
As @jas points out in a comment, you need to skip your header line, but your script could stand some more cleanup than that. Try this:
sort -k7g test.xlx |
awk '
BEGIN { OFS="\t" }
{ sub(/^[^_]+_/,"") }
($6~/[0-9]/) && (!seen[$1]++) { printf "%s\t%4.3e\n", $1, ($7?($6>0?1:-1)/$7:0) }
' |
sort -k2gr
ENSMUSG00000035561-Aldh1b1 1.550e+16
ENSMUSG00000037692-Ahdc1 4.695e+19
ENSMUSG00000038552-Fndc4 2.778e+14
ENSMUSG00000038932-Tcfl5 1.490e+16
ENSMUSG00000057182-Scn3a 1.267e+16
The above will print a result of zero instead of failing when $7 is zero.
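The guard can be seen in isolation on toy records (the field values here are invented for illustration):

```shell
# With $7 == 0 the ternary short-circuits to 0 instead of dividing.
printf 'a b c d e 1 0\n' | awk '{ printf "%4.3e\n", ($7?($6>0?1:-1)/$7:0) }'
# prints: 0.000e+00

# With $6 negative and $7 nonzero, the sign carries through: -1/2.
printf 'a b c d e -1 2\n' | awk '{ printf "%4.3e\n", ($7?($6>0?1:-1)/$7:0) }'
# prints: -5.000e-01
```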
What's the point of the cut -d '_' -f2- in your original script, though (implemented above with sub())? You don't have any _s in your input file.

Escape @ using awk

I have the following:
ssh $DOMAIN -l root "grep "'$EMAIL@$DOMAIN'" /var/log/maillog | grep retr= | grep -v retr=0 | awk '{ print "'$11'" }' | cut -d, -f1 | cut -d= -f2 | awk '{ t += $1 } END { print "'total: '", t, "' bytes transferred over POP3'"}'"
Running this command gives the following output:
stdin: is not a tty
awk: { t += blah@email.com } END { print total: , t, bytes transferred over POP3}
awk: ^ invalid char '@' in expression
Looks like the issue is with awk '{ t += $1 }' because of the @ in $1; however, I've tried several different methods of escaping it with no luck. Any advice is appreciated.
I don't think this is the command you ran, because the quotes don't match up (there's a superfluous "). It sounds like the command you actually ran expanded "$1", thus causing awk to interpret a literal email address instead of reading from the first field.
The final part should be:
awk '{ t += $1 } END { print "total: ", t, " bytes transferred over POP3"}'
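As a local sanity check of that final stage, outside the ssh quoting (the byte counts here are made up):

```shell
# Summing a column of byte counts; note the double spaces in the
# output come from awk's OFS being inserted between print arguments.
printf '100\n250\n50\n' |
  awk '{ t += $1 } END { print "total: ", t, " bytes transferred over POP3" }'
# prints: total:  400  bytes transferred over POP3
```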

Switching the format of this output?

I have this script written to print the distribution of words in one or more files:
cat "$#" | tr -cs '[:alpha:]' '\n' |
tr '[:upper:]' '[:lower:]' | sort |
uniq -c | sort -n
Which gives me an output such as:
1 the
4 orange
17 cat
However, I would like to change it so that the word is listed first, not the number (I'm assuming sort would be involved so it's alphabetical), like so:
cat 17
orange 4
the 1
Is there just a simple option I would need to switch this? Or is it something more complicated?
Pipe the output to
awk '{print $2, $1}'
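For example, on the output shown in the question (the leading spaces uniq -c produces are harmless, since awk splits on any run of whitespace):

```shell
# Swap the two columns, then sort alphabetically by word.
printf '  1 the\n  4 orange\n 17 cat\n' | awk '{print $2, $1}' | sort
# prints:
# cat 17
# orange 4
# the 1
```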
or you can use awk for the complete task:
{
    $0 = tolower($0)    # remove case distinctions
    # remove punctuation
    gsub(/[^[:alnum:]_[:blank:]]/, "", $0)
    for (i = 1; i <= NF; i++)
        freq[$i]++
}
END {
    for (word in freq)
        printf "%s\t%d\n", word, freq[word]
}
usage:
awk -f wordfreq.awk input
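End to end, saving the program above as wordfreq.awk (the filename from the usage line) and running it on a toy input might look like this:

```shell
# Create the word-frequency program, then run it on a short sentence;
# sort makes the for-in iteration order deterministic.
cat > wordfreq.awk <<'EOF'
{
    $0 = tolower($0)
    gsub(/[^[:alnum:]_[:blank:]]/, "", $0)
    for (i = 1; i <= NF; i++)
        freq[$i]++
}
END {
    for (word in freq)
        printf "%s\t%d\n", word, freq[word]
}
EOF
printf 'The cat saw the Cat.\n' | awk -f wordfreq.awk | sort
# prints (tab-separated):
# cat   2
# saw   1
# the   2
```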

Count number of names starts with particular character in file

I have the following file:
FirstName, FamilyName, Address, PhoneNo
The file is sorted according to the family name. How can I count the number of family names starting with a particular character? The output should look like this:
A: 2
B: 1
...
With awk:
awk '{print substr($2, 1, 1)}' file|
uniq -c|
awk '{print $2 ": " $1}'
OK, no awk. Here's with sed:
sed 's/[^,]*, \(.\).*/\1/' file|
uniq -c|
sed 's/.*\([0-9]\)\+ \([a-zA-Z]\)\+/\2: \1/'
OK, no sed. Here's with python:
import csv

d = {}
with open(file_name) as f:          # file_name must hold the path to the CSV
    for row in csv.reader(f):
        initial = row[1][1]         # row[1] is " FamilyName"; [1] skips the leading space
        d[initial] = d.get(initial, 0) + 1
for k, v in d.items():
    print("%s: %s" % (k, v))
while read -r f l r; do echo "$l"; done < inputfile | cut -c 1 | sort | uniq -c
Just the Shell
#! /bin/bash
##### Count occurrence of familyname initial
#FirstName, FamilyName, Address, PhoneNo
exec <<EOF
Isusara, Ali, Someplace, 022-222
Rat, Fink, Some Hole, 111-5555
Louis, Frayser, whaterver, 123-1144
Janet, Hayes, whoever St, 111-5555
Mary, Holt, Henrico VA, 222-9999
Phillis, Hughs, Some Town, 711-5525
Howard, Kingsley, ahahaha, 222-2222
EOF
while read first family rest
do
    init=${family:0:1}
    [ -n "$oinit" ] && [ "$init" != "$oinit" ] && {
        echo $oinit : $count
        count=0
    }
    oinit=$init
    let count++
done
echo $oinit : $count
Running
frayser@gentoo ~/doc/Answers/src/SH/names $ sh names.sh
A : 1
F : 2
H : 3
K : 1
frayser@gentoo ~/doc/Answers/src/SH/names $
To read from a file, remove the here document, and run:
chmod +x names.sh
./names.sh <file
The "hard way" — no use of awk or sed, exactly as asked for. If you're not sure what any of these commands mean, you should definitely look at the man page for each one.
INTERMED=`mktemp` # Creates a temporary file
COUNTS_L=`mktemp` # A second...
COUNTS_R=`mktemp` # A third...
cut -d , -f 2 | # Extracts the FamilyName field only
tr -d '\t ' | # Deletes spaces/tabs
cut -c 1 | # Keeps only the first character
# on each line
tr '[:lower:]' '[:upper:]' | # Capitalizes all letters
sort | # Sorts the list
uniq -c > $INTERMED # Counts how many of each letter
# there are
cut -c1-7 $INTERMED | # Cuts out the LHS of the temp file
tr -d ' ' > $COUNTS_R # Must delete the padding spaces though
cut -c9- $INTERMED > $COUNTS_L # Cut out the RHS of the temp file
# Combines the two halves into the final output in reverse order
paste -d ' ' /dev/null $COUNTS_R | paste -d ':' $COUNTS_L -
rm $INTERMED $COUNTS_L $COUNTS_R # Cleans up the temp files
awk one-liner:
awk '
{count[substr($2,1,1)]++}
END {for (init in count) print init ": " count[init]}
' filename
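With a few sample records in the question's format (the names here come from the earlier here-document example), the one-liner yields:

```shell
# substr($2,1,1) takes the first letter of the FamilyName field;
# sort makes the for-in output order deterministic.
printf 'Isusara, Ali, Someplace, 022-222\nRat, Fink, Some Hole, 111-5555\nLouis, Frayser, whatever, 123-1144\n' |
  awk '{count[substr($2,1,1)]++} END {for (init in count) print init ": " count[init]}' | sort
# prints:
# A: 1
# F: 2
```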
Prints how many words start with each letter (note: wc -l counts the matching words; the original wc -c counted characters):
for i in {a..z}; do echo -n "$i:"; find path/to/folder -type f -exec sed "s/ /\n/g" {} \; | grep ^$i | wc -l; done