Print names alphabetically and how many appearances for each name - bash

I have a file that includes names, one on each line. I want to print the names alphabetically, but (and here is where it gets confusing at least for me) next to each name I must print the number of appearances of that name with exactly one space between the name and the number of appearances.
For example if the file includes these names:
Barry
Don
John
Sam
Harry
Don
Don
Sam
it must print
Barry 1
Don 3
Harry 1
John 1
Sam 2
Any ideas?

sort | uniq -c will get you very close, just with the columns reversed.
$ sort file | uniq -c
1 Barry
3 Don
1 Harry
1 John
2 Sam
If you really need them in the prescribed order, you can swap the columns with awk.
$ sort file | uniq -c | awk '{print $2, $1}'
Barry 1
Don 3
Harry 1
John 1
Sam 2
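The column swap above assumes every name is a single word, since awk's $2 only holds the first word after the count. As a minimal sketch for names that may contain spaces, the count can be moved to the end of the line with sed instead. The function name count_names is hypothetical, and LC_ALL=C is only an assumption made to get a predictable byte-wise sort order.

```shell
# count_names: hypothetical helper; reads one name per line on stdin.
count_names() {
    LC_ALL=C sort |
        uniq -c |
        # uniq -c prints "<padding><count> <name>"; move the count to the end.
        sed -E 's/^ *([0-9]+) (.*)$/\2 \1/'
}

printf '%s\n' Barry Don John Sam Harry Don Don Sam | count_names
```

Because sed captures everything after the count as one group, a line such as "Mary Ann" stays intact and is printed as "Mary Ann 2".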

With awk :
% awk '{
a[$1]++
}
END{
for (i in a) {
print i, a[i]
}
}' file
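Output (awk does not guarantee the iteration order of for (i in a), so pipe the result to sort if you need alphabetical order):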
Barry 1
Harry 1
Don 3
John 1
Sam 2

Given:
$ cat file
Barry
Don
John
Sam
Harry
Don
Don
Sam
You can do:
$ awk '{a[$1]++} END { for (e in a) print e, a[e] }' file | sort
Barry 1
Don 3
Harry 1
John 1
Sam 2

Related

How can we sum the values group by from file using shell script

I have a file where I have student Roll no, Name, Subject, Obtain Marks and Total Marks data:
10 William English 80 100
10 William Math 50 100
10 William IT 60 100
11 John English 90 100
11 John Math 75 100
11 John IT 85 100
How can I get a group-by sum (total obtained marks) for every student in shell? I want this output:
William 190
John 250
I have tried this:
cat student.txt | awk '{sum += $14}END{print sum" "$1}' | sort | uniq -c | sort -nr | head -n 10
This does not work like a group-by sum.
With one awk command:
awk '{a[$2]+=$4} END {for (i in a) print i,a[i]}' file
Output
William 190
John 250
If you want to sort the output, you can pipe to sort, e.g. descending by numerical second field:
awk '{a[$2]+=$4} END {for (i in a) print i,a[i]}' file | sort -rnk2
or ascending by student name:
awk '{a[$2]+=$4} END {for (i in a) print i,a[i]}' file | sort
You need to use an associative array in awk.
Try
awk '{ a[$2]=a[$2]+$4 } END {for (i in a) print i, a[i]}'
a[$2]=a[$2]+$4 <-- Build an associative array indexed by $2 (the name), accumulating the sum of $4 (the obtained marks)
END <-- Runs once, after all records have been processed
for (i in a) print i, a[i] <-- Print each index and its summed value
Demo :
$awk '{ a[$2]=a[$2]+$4 } END {for (i in a) print i, a[i]}' temp.txt
William 190
John 250
$cat temp.txt
10 William English 80 100
10 William Math 50 100
10 William IT 60 100
11 John English 90 100
11 John Math 75 100
11 John IT 85 100
$
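The iteration order of for (i in a) is not specified by awk, so the William-then-John order shown above is not guaranteed. A minimal sketch that keeps students in first-seen order is given here. The function name sum_marks and the array names total and order are made up for illustration.

```shell
# sum_marks: hypothetical helper; usage: sum_marks [FILE...] (reads stdin if no file)
sum_marks() {
    awk '
        # Record each name the first time it appears.
        !($2 in total) { order[++n] = $2 }
        # Accumulate obtained marks (4th field) per name (2nd field).
        { total[$2] += $4 }
        END { for (i = 1; i <= n; i++) print order[i], total[order[i]] }
    ' "$@"
}

printf '%s\n' \
    '10 William English 80 100' '10 William Math 50 100' '10 William IT 60 100' \
    '11 John English 90 100' '11 John Math 75 100' '11 John IT 85 100' | sum_marks
```

Piping the result to sort, as suggested above, is still the simpler option when alphabetical or numeric order is what you want.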

Unix Compare Two CSV files using comm

I have two CSV files: 1.csv has 718 entries and 2.csv has 68000 entries.
#cat 1.csv
#Num #Name
1 BoB
2 Jack
3 John
4 Hawk
5 Scot
...........
#cat 2.csv
#Num #Name
1 BoB
2 John
3 Linda
4 Hawk
5 Scot
........
I know how to compare two files when only one column (Names) is present in both, and get the matching names:
#comm -12 <(sort 1.csv) <(sort 2.csv)
Now I would like to check: if a Num in 1.csv matches a Num in 2.csv, what are the associated Names from both CSV files for that matched Num?
Result :
1,Bob,Bob
2,Jack,John
3,John,Linda
4,Hawk,Hawk
5,Scot,Scot
..........
How can I achieve this using comm?
You can use the join command to perform an inner join of the two CSV files on the 1st field, i.e. the number. Here is an example:
$ cat f1.csv
1 BoB
2 Jack
3 John
4 Hawk
5 Scot
6 ExtraInF1
$ cat f2.csv
1 BoB
3 Linda
4 Hawk
2 John
5 Scot
7 ExtraInF2
$ join <(sort -t ' ' -k 1 f1.csv) <(sort -t ' ' -k 1 f2.csv)
1 BoB BoB
2 Jack John
3 John Linda
4 Hawk Hawk
5 Scot Scot
$ join <(sort -t ' ' -k 1 f1.csv) <(sort -t ' ' -k 1 f2.csv) | tr -s ' ' ,
1,BoB,BoB
2,Jack,John
3,John,Linda
4,Hawk,Hawk
5,Scot,Scot
$
Note that I have added a few dummy rows (numbers 6 and 7); they do not appear in the output because they are not present in both files.
<(sort -t ' ' -k 1 f1.csv) is process substitution: the output of the command is substituted at that place as if it were a file. sort uses a space as the delimiter (-t ' ') and sorts on the 1st key, i.e. the 1st column (-k 1). By default, join performs an inner join on the 1st column of both files.
Another one-liner for inner join using the join command
join -1 1 -2 1 <(sort 1.csv) <(sort 2.csv) | tr -s ' ' ,
-1 1 : join on the 1st field of file 1
-2 1 : join on the 1st field of file 2
tr -s ' ' , replaces each space with a comma and squeezes any run of repeated commas into a single one
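join expects both inputs to be sorted lexically on the join field, which gets awkward once the numbers run past 9 (10 sorts before 2). As an alternative sketch, the same inner join can be done with awk, which needs no pre-sorting, and the result can then be ordered numerically. The function name join_by_num is hypothetical, and the sketch assumes two whitespace-separated columns with unique numbers in the first file.

```shell
# join_by_num: hypothetical helper; usage: join_by_num FILE1 FILE2
join_by_num() {
    awk '
        # First file: remember the name for each number.
        NR == FNR { name[$1] = $2; next }
        # Second file: print only numbers that were also in the first file.
        $1 in name { print $1 "," name[$1] "," $2 }
    ' "$1" "$2" | sort -t, -k1,1n
}

# Demo with throwaway copies of the sample data used above.
tmp=$(mktemp -d)
printf '%s\n' '1 BoB' '2 Jack' '3 John' '4 Hawk' '5 Scot' '6 ExtraInF1' > "$tmp/f1.csv"
printf '%s\n' '1 BoB' '3 Linda' '4 Hawk' '2 John' '5 Scot' '7 ExtraInF2' > "$tmp/f2.csv"
join_by_num "$tmp/f1.csv" "$tmp/f2.csv"
rm -r "$tmp"
```

The trade-off is memory: awk holds the whole first file in an array, so it makes sense to pass the smaller file (here the one with 718 entries) as FILE1.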

Custom Sort Multiple Files

I have 10 files (1Gb each). The contents of the files are as follows:
head -10 part-r-00000
a a a c b 1
a a a dumbbell 1
a a a f a 1
a a a general i 2
a a a glory 2
a a a h d 1
a a a h o 4
a a a h z 1
a a a hem hem 1
a a a k 3
I need to sort each file by the last column of each line (in descending order); each line has a variable number of fields. If the numerical values match, then sort alphabetically by the 2nd-last column. The following Bash command works on small datasets (not complete files) and takes 3 seconds to sort only 10 lines from one file.
cat part-r-00000 | awk '{print $NF,$0}' | sort -nr | cut -f2- -d' ' > FILE
I want the output in a separate FILE. Can someone help me out to speed up the process?
No, once you get rid of the UUOC that's as fast as it's going to get. Obviously you need to add the 2nd-last field to everything too, e.g. something like:
awk '{print $NF,$(NF-1),$0}' part-r-00000 | sort -k1,1nr -k2,2 | cut -f3- -d' '
Check the sort args, I always get mixed up with those..
Reverse order, sort and reverse order:
awk '{for (i=NF;i>0;i--){printf "%s ",$i};printf "\n"}' file | sort -nr | awk '{for (i=NF;i>0;i--){printf "%s ",$i};printf "\n"}'
Output:
a a a h o 4
a a a k 3
a a a general i 2
a a a glory 2
a a a h z 1
a a a hem hem 1
a a a dumbbell 1
a a a h d 1
a a a c b 1
a a a f a 1
You can use a Schwartzian transform to accomplish your task,
awk '{print -$NF, $(NF-1), $0}' input_file | sort -n | cut -d' ' -f3-
The awk command prepends each record with the negative of the last field and the second last field.
The sort -n command sorts the record stream in the required order because we used the negative of the last field.
The cut command splits on spaces and drops the first two fields, i.e., the ones we added to drive the sort
Example
$ echo 'a a a c b 1
a a a dumbbell 1
a a a f a 1
a a a general i 2
a a a glory 2
a a a h d 1
a a a h o 4
a a a h z 1
a a a hem hem 1
a a a k 3' | awk '{print -$NF, $(NF-1), $0}' | sort -n | cut -d' ' -f3-
a a a h o 4
a a a k 3
a a a glory 2
a a a general i 2
a a a f a 1
a a a c b 1
a a a h d 1
a a a dumbbell 1
a a a hem hem 1
a a a h z 1
$
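The decorate, sort, undecorate steps above can also be written with explicit sort keys instead of negating the last field, which may be easier to read. This is only a sketch: the function name sort_by_last is hypothetical, and LC_ALL=C is an assumption made here to get byte-wise comparison, which is usually the cheapest collation for large inputs.

```shell
# sort_by_last: hypothetical helper; usage: sort_by_last [FILE...] (reads stdin if no file)
sort_by_last() {
    # Decorate: prepend the last and second-last fields as sort keys.
    awk '{ print $NF, $(NF-1), $0 }' "$@" |
        # Key 1 numeric descending, key 2 plain ascending.
        LC_ALL=C sort -k1,1nr -k2,2 |
        # Undecorate: drop the two helper fields again.
        cut -d' ' -f3-
}

printf '%s\n' 'a a a c b 1' 'a a a dumbbell 1' 'a a a f a 1' 'a a a general i 2' \
    'a a a glory 2' 'a a a h d 1' 'a a a h o 4' 'a a a h z 1' \
    'a a a hem hem 1' 'a a a k 3' | sort_by_last
```

Since the function passes its arguments straight to awk, all ten part files can be given at once and redirected to a single output file.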

bash- get all lines with the same column value in two files

I have two text files each with 3 fields. I need to get the lines with the same value on the third field. The 3rd field value is unique in each file. Example:
file1:
1 John 300
2 Eli 200
3 Chris 100
4 Ann 600
file2:
6 Kevin 250
7 Nancy 300
8 John 100
output:
1 John 300
7 Nancy 300
3 Chris 100
8 John 100
When I use the following command:
cat file1 file2 | sort -k 3 | uniq -c -f 2
I get only one row from an input file with the duplicate value. I need both!
This one-liner gives you that output:
awk 'NR==FNR{a[$3]=$0;next}$3 in a{print a[$3];print}' file1 file2
My solution is
join -1 3 -2 3 <(sort -k3 file1) <(sort -k3 file2) | awk '{print $2, $3, $1; print $4, $5, $1}'
or
join -1 3 -2 3 <(sort -k3 file1) <(sort -k3 file2) -o "1.1 1.2 0 2.1 2.2 0" | xargs -n3
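For readers unfamiliar with the NR==FNR idiom used in the awk one-liner above, here is the same logic spelled out with comments. The function name same_third and the array name first are made up for illustration; the sketch relies on the question's statement that the 3rd field is unique within each file.

```shell
# same_third: hypothetical helper; usage: same_third FILE1 FILE2
same_third() {
    awk '
        # NR == FNR holds only while reading the first file:
        # index each whole line by its 3rd field, then skip to the next line.
        NR == FNR { first[$3] = $0; next }
        # Second file: on a match, print the stored line, then the current one.
        $3 in first { print first[$3]; print }
    ' "$1" "$2"
}

# Demo with throwaway copies of the sample data from the question.
tmp=$(mktemp -d)
printf '%s\n' '1 John 300' '2 Eli 200' '3 Chris 100' '4 Ann 600' > "$tmp/file1"
printf '%s\n' '6 Kevin 250' '7 Nancy 300' '8 John 100' > "$tmp/file2"
same_third "$tmp/file1" "$tmp/file2"
rm -r "$tmp"
```

The pairs come out in the order in which the matching lines appear in the second file.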

How to sort columns using bash script? [duplicate]

I have a file full of data in columns
sarah mark john
10 20 5
x y z
I want to sort the data so the columns stay intact but the second row is in increasing order so it looks like this:
john sarah mark
5 10 20
z x y
I've been looking at the sort command but have only been able to find vertical sorting, not horizontal. I'm happy to use any tool, any help is appreciated.
Thank you!
Let's create a function to transpose a file (make rows become columns, and columns become rows):
transpose () {
awk '{for (i=1; i<=NF; i++) a[i,NR]=$i; max=(max<NF?NF:max)}
END {for (i=1; i<=max; i++)
{for (j=1; j<=NR; j++)
printf "%s%s", a[i,j], (j<NR?OFS:ORS)
}
}'
}
This just loads all the data into a bidimensional array a[line,column] and then prints it back as a[column,line], so that it transposes the given input. The wrapper transpose () { } is used to store it as a bash function. You just need to copy paste it in your shell (or in ~/.bashrc if you want it to be a permanent function, available any time you open a session).
Then, by using it, we can easily solve the problem by using sort -n -k2: sort numerically based on column 2. Then, transpose back.
$ cat a | transpose | sort -n -k2 | transpose
john sarah mark
5 10 20
z x y
In case you want to have a nice format as final output, just pipe to column like this:
$ cat a | transpose | sort -n -k2 | transpose | column -t
john sarah mark
5 10 20
z x y
Step by step:
$ cat a | transpose
sarah 10 x
mark 20 y
john 5 z
$ cat a | transpose | sort -n -k2
john 5 z
sarah 10 x
mark 20 y
$ cat a | transpose | sort -n -k2 | transpose
john sarah mark
5 10 20
z x y
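As an alternative to transposing twice, the column order can be computed inside a single awk program. The following is a rough sketch, not a replacement for the approach above: the function name sort_columns_by_row is hypothetical, it assumes the chosen row is numeric, and it uses a simple insertion sort, which is fine for a handful of columns.

```shell
# sort_columns_by_row: hypothetical helper; usage: sort_columns_by_row ROW [FILE...]
sort_columns_by_row() {
    row=$1; shift
    awk -v row="$row" '
        # Load every cell into cell[line, column] and track the widest line.
        { for (i = 1; i <= NF; i++) cell[NR, i] = $i; if (NF > nf) nf = NF }
        END {
            for (i = 1; i <= nf; i++) idx[i] = i
            # Insertion sort of column indices by the numeric value in the chosen row.
            for (i = 2; i <= nf; i++) {
                k = idx[i]
                for (j = i - 1; j >= 1 && cell[row, idx[j]] + 0 > cell[row, k] + 0; j--)
                    idx[j + 1] = idx[j]
                idx[j + 1] = k
            }
            # Print every line with its columns in the new order.
            for (r = 1; r <= NR; r++)
                for (i = 1; i <= nf; i++)
                    printf "%s%s", cell[r, idx[i]], (i < nf ? OFS : ORS)
        }
    ' "$@"
}

printf '%s\n' 'sarah mark john' '10 20 5' 'x y z' | sort_columns_by_row 2
```

Like transpose, this holds the whole file in memory, so it is meant for small tables such as the one in the question.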
Coming from a duplicate question, this would sort the columns by the first row:
#!/bin/bash
input="$1"
# Column numbers, ordered by the values in the first row
order=$( (for i in $(head -1 $input); do echo $i; done) | nl | sort -k2 | cut -f1)
grep ^ $input | (while read line
do
    # Strip trailing whitespace, then split the line into an array
    read -a columns <<< "${line%"${line##*[![:space:]]}"}"
    orderedline=()
    for i in ${order[@]}
    do
        orderedline+=("${columns[$i - 1]}")
    done
    line=$(printf "\t%s" "${orderedline[@]}")
    echo ${line:1}
done)
To sort by second row, replace head -1 $input with head -2 $input | tail -1. If the sort should be numeric, put in sort -n -k2 instead of sort -k2.
A handy one-liner that sorts the fields within each line (each line is sorted independently, as strings):
perl -ane '$,=" "; print sort @F; print "\n";' file
I found it here: http://www.unix.com/unix-for-advanced-and-expert-users/36039-horizontal-sorting-lines-file-sed-implementation.html
