How to export original unique values using awk - bash

This command works great for collapsing duplicates and printing each value only once:
awk '!x[$0]++' filewithdupes > newfile
However, I want to keep only the values that were unique in the original file, i.e. the values that appear exactly once.
Example:
If I have this simple set of values in a CSV column:
1
1
2
2
3
The command above outputs this:
1
2
3
But I want:
3
How can I modify this command to keep the original unique value? Or is there a command better suited to what I'm trying to do?

You may use this awk to print only the records that occur exactly once:
awk '{x[$0]++} END{for (i in x) if (x[i] == 1) print i}' filewithdupes
3
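A commented version of the same idea may help; this is just a sketch of the command above, and note that for (i in x) does not guarantee any particular output order:
awk '{ x[$0]++ }                        # count every full line
     END { for (i in x)                 # iterate over the counts (order is unspecified)
             if (x[i] == 1) print i }   # print only the lines seen exactly once
' filewithdupes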

If your file is already sorted, as in the example, the simplest approach is:
$ uniq -u file
3
Otherwise, use a double-scan approach:
$ awk 'NR==FNR{a[$1]++; next} a[$1]==1' file{,}
3

Please try the following:
awk 'FNR==NR{a[$0]++;next} a[$0]==1' Input_file Input_file
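For reference, here is the same two-pass idea with comments; this is just a sketch, reading the file twice under the name used above:
awk '
  FNR==NR { count[$0]++; next }   # pass 1: tally each full line
  count[$0] == 1                  # pass 2: print lines that occurred exactly once
' Input_file Input_file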

Related

How to replace only a column for the rows containing specific values

I have a |-delimited file and am trying to implement the logic below.
cat list.txt
101
102
103
LIST=`cat list.txt`
Input file
1|Anand|101|1001
2|Raj|103|1002
3|Unix|110|101
Expected result
1|Anand|101|1001
2|Raj|103|1002
3|Unix|UNKNOWN|101
I tried two methods:
Using fgrep with list.txt as input, I split the data into two files, one matching the list and one not matching, and then used awk and gsub on the non-matching file to replace the 3rd column with UNKNOWN. The problem is that in the 3rd row the 4th column contains a value from list.txt, so I was not able to get the expected result.
I also tried a one-liner awk, passing the list in -v VAR, but the results did not change.
awk -F"|" -v VAR="$LIST" '{if($3 !~ $VAR) {{{gsub(/.*/,"UNKNOWN", $3)1} else { print 0}' input_file
Can you please suggest how to attain the expected result?
There is no need to use cat to read the complete file into a variable.
You may just use this awk:
awk 'BEGIN {FS=OFS="|"}
FNR==NR {a[$1]; next}
!($3 in a) {$3 = "UNKNOWN"} 1' list.txt input_file
1|Anand|101|1001
2|Raj|103|1002
3|Unix|UNKNOWN|101
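The same idea, spelled out with comments (a sketch using the list.txt and input_file names from the question):
awk 'BEGIN { FS = OFS = "|" }        # read and write |-delimited fields
     FNR==NR { a[$1]; next }         # first file: store every listed ID as an array key
     !($3 in a) { $3 = "UNKNOWN" }   # second file: replace column 3 if it is not in the list
     1                               # print every (possibly modified) line
' list.txt input_file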

Searching for a string between two characters

I need to extract two numbers from lines that look like this:
>Chr14:453901-458800
I have a large number of these lines mixed with lines that don't contain ":", so we can search for a colon to find the lines with numbers. Every line has different numbers.
I need to find the two numbers after ":", which are separated by "-", then subtract the first number from the second and print the result for each line.
I'd like this to be done using awk
I managed to do something like this:
awk -e '$1 ~ /\:/ {print $0}' file.txt
but it's nowhere near the end result
For the example I showed above, the result would be:
4899
Because it is the result of 458800 - 453901 = 4899
I can't figure it out on my own and would appreciate some help
With GNU awk: separate each row into multiple columns using ":" and "-" as field separators. In each row containing ":", subtract the contents of column 2 from the contents of column 3 and print the result.
awk -F '[:-]' '/:/{print $3-$2}' file
Output:
4899
Using awk
$ awk -F: '/:/ {split($2,a,"-"); print a[2] - a[1]}' input_file
4899

AWK: write to a file based on the number of fields in a CSV

I want to iterate over a CSV file and, while writing to an output file, discard the rows that don't have all the columns.
I have an input file mtest.csv like this
IP##Process##Device##ID
TestIP1##TestProcess2##TestDevice1##TestID1
TestIP2##TestProcess2##TestDevice2
TestIP3##TestProcess3##TestDevice3##TestID3
But I am trying to write only the rows where all 4 columns are present. The output should not contain the complete TestIP2 row, as it has only 3 columns.
Sample output should look like this:
IP##Process##Device##ID
TestIP1##TestProcess2##TestDevice1##TestID1
TestIP3##TestProcess3##TestDevice3##TestID3
I used to do the following to get all the columns, but it writes the TestIP2 row as well, which has only 3 columns:
awk -F "\##" '{print $1"\##"substr($2,1,50)"\##"substr($3,1,50)"\##"substr($4,1,50)}' mtest.csv >output2.csv
But when I try to ensure that it writes a row only when all 4 columns are present, it doesn't work:
awk -F "\##", 'NF >3 {print $1"\##"substr($2,1,50)"\##"substr($3,1,50)"\##"substr($4,1,50); exit}' mtest.csv >output2.csv
You are making things harder than they need to be. All you need to do is check NF==4 to output only the records containing four fields. Your total awk expression would be:
awk -F'##' NF==4 < mtest.csv
(note: the default action by awk is print so there is no explicit print required.)
Example Use/Output
With your sample input in mtest.csv, you would receive:
$ awk -F'##' NF==4 < mtest.csv
IP##Process##Device##ID
TestIP1##TestProcess2##TestDevice1##TestID1
TestIP3##TestProcess3##TestDevice3##TestID3
Thanks David and vukung
Both your solutions are okay. I want to write to a file so that I can trim the length of each field as well.
I think the statement below works:
awk -F "##" 'NF>3 {print $1"\##"substr($2,1,50)"\##"substr($3,1,2)"\##"substr($4,1,3)}' mtest.csv >output2.csv

Count of unique words from one column of a file in shell

I was trying to find out the count of unique words from one column of a file, and the words themselves, using a shell script. Here's what I was doing. The input file contains (filename: gnc.txt, with a newline after each city name):
Male,Tyrus,Seattle
Male,Sam,Seattle
Male,Meha,Seattle
Male,John,Seattle
Male,Sam,Beijing
Male,Meha,Paris
Male,Meha,Berlin
As a first step, I found the number of unique names, which is 4, using the shell command below.
awk -F\, '{ if(!a[$2]) cnt++;a[$2]++;next}END{ print cnt }' gnc.txt
As a next step, I want to get the list of unique names, i.e. Tyrus, Sam, Meha and John.
Can someone help me alter the above command to do this?
Using this awk:
awk -F, '{c[$2]++} END{for (i in c) print i, c[i]}' file
Tyrus 1
Sam 2
John 1
Meha 3
You can also use this:
cut -d',' -f2 file | sort | uniq -c
1 John
3 Meha
2 Sam
1 Tyrus
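If you only need the list of names themselves, without counts, the !x[$0]++ idiom from the first question works on a single column as well; for example:
$ awk -F, '!seen[$2]++ { print $2 }' gnc.txt
Tyrus
Sam
Meha
John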

COUNTIF-like function in AWK

I am looking for a way to count the number of times a value in a field appears in a range of fields in a CSV file, much the same as COUNTIF in Excel, although I would like to use an awk command if possible.
So column 1 would have the range of values and column 2 would have the number of times the value appears in column 1.
Count how many times each value appears in the first column and append the count to the end of each line:
$ cat file
1,2,3
1,2,3
9,7,4
1,5,7
3,2,1
$ awk -F, '{c[$1]++;l[NR]=$0}END{for(i=0;i++<NR;){split(l[i],s,",");print l[i]","c[s[1]]}}' file
1,2,3,3
1,2,3,3
9,7,4,1
1,5,7,3
3,2,1,1
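The same logic as a multi-line, commented script may be easier to follow; like the one-liner above, this sketch buffers the whole file in memory:
awk -F, '
  { c[$1]++              # tally how often each first-column value occurs
    l[NR] = $0 }         # remember every line, in input order
  END {
    for (i = 1; i <= NR; i++) {
      split(l[i], s, ",")         # recover the first field of the stored line
      print l[i] "," c[s[1]]      # append its total count
    }
  }
' file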
One more solution, using Perl:
perl -F, -lane ' $kv{$F[0]}++;$kl{$.}=$_;END {for(sort keys %kl) { $x=(split(",",$kl{$_}))[0]; print "$kl{$_},$kv{$x}" }} '
Borrowing input from Chris
$ cat kbiles.txt
1,2,3
1,2,3
9,7,4
1,5,7
3,2,1
$ perl -F, -lane ' $kv{$F[0]}++;$kl{$.}=$_;END {for(sort keys %kl) { $x=(split(",",$kl{$_}))[0]; print "$kl{$_},$kv{$x}" }} ' kbiles.txt
1,2,3,3
1,2,3,3
9,7,4,1
1,5,7,3
3,2,1,1
$
