Write the frequency of each number in a column into the next column in bash

I have a text file with a single column containing a number in each row. I want to write the frequency of each number into a new column and collapse the duplicate rows, so that each unique number appears once in the first column with its frequency in the second column.
Input:
0.32832977
0.31647876
0.31482627
0.31447645
0.31447645
0.31396809
0.31281157
0.312004
0.31102326
0.30771822
0.30560062
0.30413213
0.30373717
0.29636685
0.29622422
0.29590765
0.2949896
0.29414582
0.28841901
0.28820667
0.28291832
0.28243792
0.28156429
0.28043638
0.27872239
0.27833349
0.27825573
0.27669023
0.27645657
0.27645657
0.27645657
0.27645657
Output:
0.32832977 1
0.31647876 1
0.31482627 1
0.31447645 2
0.31396809 1
0.31281157 1
0.312004 1
0.31102326 1
0.30771822 1
0.30560062 1
0.30413213 1
0.30373717 1
0.29636685 1
0.29622422 1
0.29590765 1
0.2949896 1
0.29414582 1
0.28841901 1
0.28820667 1
0.28291832 1
0.28243792 1
0.28156429 1
0.28043638 1
0.27872239 1
0.27833349 1
0.27825573 1
0.27669023 1
0.27645657 4
I tried this command, but it doesn't seem to work:
awk -F '|' '{freq[$1]++} END{for (i in freq) print freq[i], i}' file

Using Awk is overkill here IMO; the standard tools will do the job just fine:
sort -n file | uniq -c | sort
Output:
1 0.32832977
2 0.31447645
4 0.27645657
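If you want the number first and the count second, as in your desired output, you could swap the fields with a small awk step (and sort -rn keeps the descending order of your sample):
sort -rn file | uniq -c | awk '{print $2, $1}'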

For completeness, this would be the awk solution (no need to set the input field separator to | if your sample input is representative).
awk '{f[$0]++} END{for (i in f) print i, f[i]}' input.txt
0.28820667 1
0.30560062 1
0.312004 1
0.28156429 1
0.28291832 1
0.29636685 1
0.31447645 2
0.30373717 1
0.31482627 1
:
You can, however, set the output field separator to | or (as I did here) to a tab character, to format the output:
awk '{f[$0]++} END{OFS="\t"; for (i in f) print i, f[i]}' input.txt
0.28820667 1
0.30560062 1
0.312004 1
0.28156429 1
0.28291832 1
0.29636685 1
0.31447645 2
0.30373717 1
0.31482627 1
:
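Note that the order of for (i in f) is unspecified in awk; if you want the lines in descending numeric order like the desired output, you could simply pipe through sort, e.g.:
awk '{f[$0]++} END{for (i in f) print i, f[i]}' input.txt | sort -rn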

Related

Format output in bash

I have output in bash that I would like to format. Right now my output looks like this:
1scom.net 1
1stservicemortgage.com 1
263.net 1
263.sina.com 1
2sahm.org 1
abac.com 1
abbotsleigh.nsw.edu.au 1
abc.mre.gov.br 1
ableland.freeserve.co.uk 1
academicplanet.com 1
access-k12.org 1
acconnect.com 1
acconnect.com 1
accountingbureau.co.uk 1
acm.org 1
acsalaska.net 1
adam.com.au 1
ada.state.oh.us 1
adelphia.net 1
adelphia.net 1
adelphia.net 1
adelphia.net 1
adelphia.net 1
adelphia.net 1
adelphia.net 1
adelphia.net 1
adelphia.net 1
adelphia.net 1
adelphia.net 1
adelphia.net 1
aecom.yu.edu 1
aecon.com 1
aetna.com 1
agedwards.com 1
ahml.info 1
The problem with this is that none of the numbers on the right line up. I would like them to look like this:
1scom.net              1
1stservicemortgage.com 1
263.net                1
263.sina.com           1
2sahm.org              1
Would there be any way to make them look like this without knowing exactly how long the longest domain is? Any help would be greatly appreciated!
The code that outputted this is:
grep -E -o -r "\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,6}\b" $ARCHIVE | sed 's/.*@//' | uniq -ci | sort | sed 's/^ *//g' | awk ' { t = $1; $1 = $2; $2 = t; print; } ' > temp2
ALIGNMENT:
Just use cat with the column command and that's it:
cat /path/to/your/file | column -t
For more details on the column command, refer to http://manpages.ubuntu.com/manpages/natty/man1/column.1.html
EDITED:
View file in terminal:
column -t < /path/to/your/file
(as noted by anishsane)
Export to a file:
column -t < /path/to/your/file > /output/file
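If column is not available, a plain awk alternative can produce the same alignment; this is just a sketch that makes two passes over the file, first to find the widest first field, then to print with that width:
awk 'NR==FNR { if (length($1) > w) w = length($1); next }
     { printf "%-" w "s %s\n", $1, $2 }' /path/to/your/file /path/to/your/file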

awk command to merge the content of the same file

I have an input file with the following content
1 1
2 1
3 289
4 1
5 2
0 Clear
1 Warning
2 Indeterminate
3 Minor
4 Major
5 Critical
I want to merge the first group of lines with the message lines, matching on the first column, and obtain
1 1 Warning
2 1 Indeterminate
3 289 Minor
4 1 Major
5 2 Critical
Just use awk:
awk '$1 in a { print $1, a[$1], $2; next } { a[$1] = $2 }' file
Output:
1 1 Warning
2 1 Indeterminate
3 289 Minor
4 1 Major
5 2 Critical
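For readability, here is the same one-liner expanded with comments:
awk '
  $1 in a { print $1, a[$1], $2; next }  # key stored during the first block: print value plus message
  { a[$1] = $2 }                         # otherwise remember the value for this key
' file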
Using join and sed: sed creates two views of the file, one for each part, and join joins them on the common field:
join <(sed '/^[0-9]* [0-9]* *$/!d' input) <(sed '/^[0-9]* [0-9]* *$/d' input)
Gives:
1 1 Warning
2 1 Indeterminate
3 289 Minor
4 1 Major
5 2 Critical
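join expects both inputs to be sorted on the join field; the sample file already is, but a slightly more defensive variant would sort each part first:
join <(sed '/^[0-9]* [0-9]* *$/!d' input | sort -k1,1) \
     <(sed '/^[0-9]* [0-9]* *$/d' input | sort -k1,1)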
You can do this with Awk:
awk 'BEGIN{n=0}NR>6{n=1}n==0{a[$1]=$2}n==1{print $1,a[$1],$2}' file
or another way:
awk 'NR<=5{a[$1]=$2}$2~/[a-zA-Z]+/ && $1>0 && $1<=5{print $1,a[$1],$2}' file
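A variant that avoids hard-coding the line numbers, keyed on whether the second field is numeric (a sketch that assumes the message lines never have a purely numeric second field):
awk '$2 ~ /^[0-9]+$/ { a[$1] = $2; next } $1 in a { print $1, a[$1], $2 }' file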

Unix: Count occurrences of similar entries in first column, sum the second column

I have a file with two columns of data. I would like to count the occurrences of identical entries in the first column, and when entries in the first column match, also sum the values in the second column of the matched entries.
Example list:
2013-11-13-03 1
2013-11-13-06 1
2013-11-13-13 2
2013-11-13-13 1
2013-11-13-15 1
2013-11-13-15 1
2013-11-13-15 1
2013-11-13-17 1
2013-11-13-23 1
2013-11-14-01 1
2013-11-14-04 6
2013-11-14-07 1
2013-11-14-08 1
2013-11-14-09 1
2013-11-14-09 1
I would like the output to read similar to the following
2013-11-13-03 1 1
2013-11-13-06 1 1
2013-11-13-13 2 3
2013-11-13-15 3 3
2013-11-13-17 1 1
2013-11-13-23 1 1
2013-11-14-01 1 1
2013-11-14-04 1 6
2013-11-14-07 1 1
2013-11-14-08 1 1
2013-11-14-09 2 2
Column 1 holds the unique values from column 1 of the input, column 2 is the number of times that value occurs in the input (1 if there are no other matches), and column 3 is the sum of the corresponding column-2 values from the input. Does anyone have tips on doing this with awk, or a mixture of uniq and awk?
Here's a quickie with awk and sort:
awk '
{
counts[$1]++; # Increment count of lines.
totals[$1] += $2; # Accumulate sum of second column.
}
END {
# Iterate over all first-column values.
for (x in counts) {
print x, counts[x], totals[x];
}
}
' file.txt | sort
You can skip the sort if you don't care about the order of output lines.
Here's a pure Bash solution:
$ cat t
2013-11-13-03 1
2013-11-13-06 1
2013-11-13-13 2
2013-11-13-13 1
2013-11-13-15 1
2013-11-13-15 1
2013-11-13-15 1
2013-11-13-17 1
2013-11-13-23 1
2013-11-14-01 1
2013-11-14-04 6
2013-11-14-07 1
2013-11-14-08 1
2013-11-14-09 1
2013-11-14-09 1
$ declare -A SUM CNT
$ while read ts vl; do (( SUM[$ts]+=$vl )) ; (( CNT[$ts]++ )); done < t
$ for i in "${!CNT[@]}"; do echo "$i ${CNT[$i]} ${SUM[$i]}"; done | sort
2013-11-13-03 1 1
2013-11-13-06 1 1
2013-11-13-13 2 3
2013-11-13-15 3 3
2013-11-13-17 1 1
2013-11-13-23 1 1
2013-11-14-01 1 1
2013-11-14-04 1 6
2013-11-14-07 1 1
2013-11-14-08 1 1
2013-11-14-09 2 2
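If GNU datamash happens to be installed, the same grouping, counting and summing can be done in one call (assuming space-separated fields; -s sorts by the group key first):
datamash -s -t ' ' -g 1 count 2 sum 2 < t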

Comparing a few columns of a file with columns of another file

I have two data files 1.txt and 2.txt
1.txt contains valid lines.
For example.
1 2 1 2
1 3 1 3
In 2.txt I have an extra column, but if you ignore that, I have a few valid lines and a few invalid lines. There can be multiple occurrences of the same line in 2.txt.
For example:
1 2 1 2 1.9
1 3 1 3 3.4
1 3 1 3 3.4
2 3 2 3 5.6
2 3 2 3 5.6
The second and third lines are the same and valid.
The fourth and fifth lines are also the same but invalid.
I want to write a shell script that compares these two files and outputs two files, valid.txt and invalid.txt, which look like this:
valid.txt :
1 2 1 2 1
1 3 1 3 2
and invalid.txt :
2 3 2 3 2
The last extra column of valid.txt and invalid.txt contains the number of times the line has been repeated in 2.txt.
This awk script works for the example data:
awk 'NR==FNR{sub(/ *$/,"");a[$0]++;next}   # 1st file: strip trailing spaces, remember the valid lines
     {sub(/ [^ ]*$/,"")                    # 2nd file: drop the last column
      if($0 in a)
          v[$0]++                          # count valid lines
      else
          n[$0]++                          # count invalid lines
     }
     END{
      for(x in v) print x, v[x] > "valid.txt"
      for(x in n) print x, n[x] > "inv.txt"
     }' file1 file2
output:
kent$ head inv.txt valid.txt
==> inv.txt <==
2 3 2 3 2
==> valid.txt <==
1 3 1 3 2
1 2 1 2 1

Help with duplicating rows based on a field using awk

I have the following data set, where the 3rd field consists of 0's and 1's
Input
1 2 1
2 4 0
3 3 1
4 1 1
5 0 0
I wish to expand the data set to the format shown in the Output below:
Duplicate each row based on the 2nd field, and
replace only the "new" 1's (obtained after duplication) in the 3rd field with 0.
How can I do this with AWK?
Thanks
Output
1 2 1
1 2 0
2 4 0
2 4 0
2 4 0
2 4 0
3 3 1
3 3 0
3 3 0
4 1 1
awk '{print; $3=0; for (i=1; i<$2; i++) print}' inputfile
If you want to actually skip records with a zero in the second field (as your example seems to show):
awk '{if ($2>0) print; $3=0; for (i=1; i<$2; i++) print}' inputfile
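The same logic, expanded with comments:
awk '{
    if ($2 > 0) print         # print the original row (rows with 0 in the 2nd field are skipped)
    $3 = 0                    # duplicated copies get 0 in the 3rd field
    for (i = 1; i < $2; i++)  # emit $2 - 1 extra copies
        print
}' inputfile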
