I have a text file as shown below. I would like to count the number of unique connections of each person in the first and second columns. The third column is the ID number of the person in the first column, and the fourth column is the ID number of the person in the second column.
susan ali 156 294
susan ali 156 294
susan anna 156 67
rex rex 432 564
rex rex 432 564
philip sama 543 22
For example, susan has two connections, with ali and anna. susan's ID is 156; ali's and anna's IDs are 294 and 67, respectively. In the output, the last column is the number of connections of each person, and the total is the sum of the connections of all persons.
Your help would be appreciated!
output:
susan 156 :- ali 294 anna 67 2
rex 432 :- rex 564 1
philip 543 :- sama 22 1
ali 294 :- susan 156 1
anna 67 :- susan 156 1
rex 564 :- rex 432 1
sama 22 :- philip 543 1
Total connections:-8
A simple cat ztest.txt | sort -k1,2 | uniq -c does the trick, but since you want it formatted, you can use awk like this:
awk '{ print $2 " :- " $4 " connected to " $3 " :- " $5 " -- count: " $1 }'
full command:
$ cat ztest.txt | sort -k1,2 | uniq -c | awk '{ print $2 " :- " $4 " connected to " $3 " :- " $5 " -- count: " $1 }'
output:
philip :- 543 connected to sama :- 22 -- count: 1
rex :- 432 connected to rex :- 564 -- count: 2
susan :- 156 connected to ali :- 294 -- count: 2
susan :- 156 connected to anna :- 67 -- count: 1
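Note that the uniq -c count is per duplicated line, not the per-person connection count the question asks for. As a hedged sketch toward the asked-for output (keys built from the sample's name/ID columns; the END loop's iteration order is unspecified, so lines may come out in any order):

```shell
awk '!seen[$1,$3,$2,$4]++ {                  # count each (person, person) pair only once
    conn[$1 FS $3] = conn[$1 FS $3] " " $2 " " $4; cnt[$1 FS $3]++
    conn[$2 FS $4] = conn[$2 FS $4] " " $1 " " $3; cnt[$2 FS $4]++
    total += 2                               # one connection per side of the pair
}
END {
    for (p in conn) print p, ":-" conn[p], cnt[p]
    print "Total connections:-" total
}' ztest.txt
```

With the sample data this reproduces the question's totals, e.g. susan 156 :- ali 294 anna 67 2 and Total connections:-8, modulo line order.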
I would like to subtract two pairs of columns in a tab-delimited text file and append the results as two new columns, in bash using awk.
I would like to subtract column 1 (h1) from column 3 (h3), and name the new column "count1".
I would like to subtract column 2 (h2) from column 4 (h4), and name the new column "count2".
I don't want to build a new text file, but edit the old one in place.
My text file:
h1 h2 h3 h4 h5
343 100 856 216 536
283 96 858 220 539
346 111 858 220 539
283 89 860 220 540
280 89 862 220 541
76 32 860 220 540
352 105 856 220 538
57 16 860 220 540
144 31 858 220 539
222 63 860 220 540
305 81 858 220 539
My command at the moment looks like this:
awk '{$6 = $3 - $1}1' file.txt
awk '{$6 = $4 - $2}1' file.txt
But I don't know how to name the newly added columns, and maybe there is a smarter way to run both calculations in a single awk command?
Pretty simple in awk. Use NR==1 to handle the header line.
awk -F '\t' -v OFS='\t' '
NR==1 {print $0,"count1","count2"}
NR!=1 {print $0,$3-$1,$4-$2}' file.txt > tmp && mv tmp file.txt
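If editing the file in place is the priority, GNU awk 4.1+ has an -i inplace extension that avoids the tmp && mv step. A sketch, assuming gawk is installed:

```shell
# -i inplace is a GNU awk extension: the file is rewritten in place
gawk -i inplace -F '\t' -v OFS='\t' '
NR==1 { print $0, "count1", "count2"; next }  # header row: append column names
      { print $0, $3-$1, $4-$2 }              # data rows: append both differences
' file.txt
```

Keep in mind that an interrupted in-place edit can lose data, which is why the explicit tmp && mv pattern above is the more cautious choice.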
Well, I have the following file:
week ID Father Mother Group C_Id Weight Age System Gender
9 107001 728 7110 922 107001 1287 56 2 2
10 107001 728 7110 1022 107001 1319 63 2 2
11 107001 728 7110 1122 107001 1491 70 2 2
1 107002 702 7006 111 107002 43 1 1 1
2 107002 702 7006 211 107002 103 7 1 1
4 107002 702 7006 411 107002 372 21 1 1
1 107003 729 7112 111 107003 40 1 1 1
2 107003 729 7112 211 107003 90 7 1 1
5 107003 729 7112 511 107003 567 28 1 1
7 107003 729 7112 711 107003 1036 42 1 1
I need to transpose the Age ($8) and Weight ($7) columns, so that the Age values (1, 7, 21, 28, 42, 56, 63, 70) become the new column labels, in ascending order. Not all animals have measurements at every age; missing values should be marked with the "NS" symbol. The ID, Father, Mother, System, and Gender columns are kept, but after the transposition there is no need to repeat them as in the first table. The Week, Group and C_Id columns are not required. Visually, I need the file to look like this:
ID Father Mother System Gender 1 7 21 28 42 56 63 70
107001 728 7110 2 2 NS NS NS NS NS 1287 1319 1491
107002 702 7006 1 1 43 103 372 NS NS NS NS NS
107003 729 7112 1 1 40 90 NS 567 1036 NS NS NS
I tried this program:
#!/bin/bash
awk 'NR==1{h=$2 OFS $3 OFS $4 OFS $9 OFS $10; next}
{a[$2]=(($1 in a)?(a[$1] OFS $NF):(OFS $3 OFS $4 OFS $9 OFS $10));
if(!($8 in b)) {h=h OFS $8; b[$8]}}
END{print h; for(k in a) print k,a[k]}' banco.txt | column -t > a
But I got this:
ID Father Mother System Gender
56 63 70 1 7 21 28 42
107001 728 7110 2 2
107002 702 7006 1 1
107003 729 7112 1 1
And I'm stuck at that point, any suggestion please? Thanks.
With GNU awk for "sorted_in":
$ cat tst.awk
{
    id = $2
    weight = $7
    age = $8
    idAge2weight[id,age] = weight
    id2common[id] = $2 OFS $3 OFS $4 OFS $9 OFS $10
    ages[age]
}
END {
    PROCINFO["sorted_in"] = "#ind_num_asc"
    printf "%s", id2common["ID"]
    for (age in ages) {
        printf "%s%s", OFS, age
    }
    print ""
    delete id2common["ID"]
    for (id in id2common) {
        printf "%s", id2common[id]
        for (age in ages) {
            weight = ((id,age) in idAge2weight ? idAge2weight[id,age] : "NS")
            printf "%s%s", OFS, weight
        }
        print ""
    }
}
$ awk -f tst.awk file | column -t
ID Father Mother System Gender Age 1 7 21 28 42 56 63 70
107001 728 7110 2 2 NS NS NS NS NS NS 1287 1319 1491
107002 702 7006 1 1 NS 43 103 372 NS NS NS NS NS
107003 729 7112 1 1 NS 40 90 NS 567 1036 NS NS NS
I added the pipe to column -t just so you could see the field alignment.
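One side effect of the script above is that the header row's literal Age value is collected as if it were an age, producing the extra all-NS Age column visible in the output. A hedged variant of the same approach (still GNU awk for "sorted_in", same assumed column layout) that skips the header when collecting ages:

```shell
awk '
FNR==1 { hdr = $2 OFS $3 OFS $4 OFS $9 OFS $10; next }  # keep header fields, but do not treat "Age" as an age
{
    idAge2weight[$2,$8] = $7                 # weight keyed by (ID, age)
    id2common[$2] = $2 OFS $3 OFS $4 OFS $9 OFS $10
    ages[$8]                                 # remember every age seen
}
END {
    PROCINFO["sorted_in"] = "#ind_num_asc"   # GNU awk: iterate indices numerically
    printf "%s", hdr
    for (age in ages) printf "%s%s", OFS, age
    print ""
    for (id in id2common) {
        printf "%s", id2common[id]
        for (age in ages)
            printf "%s%s", OFS, ((id, age) in idAge2weight ? idAge2weight[id, age] : "NS")
        print ""
    }
}' file | column -t
```

With the sample data this yields the layout requested in the question, with 1 7 21 ... 70 as the only age columns.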
I am pretty sure awk is what I need to use.
I have one file with the information I need, and a second file from which I need to obtain two numbers, looked up using two pieces of information from the first file.
So if the first file has m7 in its fifth column and 3 in its third column, I want to search the second file for a row that has 3 in its first column and m7 in its fourth column. Then I want to print certain columns from these files, as listed below.
Given the following two files of input
file1
1 dog 3 8 m7 n15
50 cat 5 8 m15 m22
20 fish 6 3 n12 m7
file2
3 695 842 m7 word
5 847 881 m15 not
8 910 920 n15 important
8 695 842 m22 word
6 312 430 n12 not
I want to produce the output
pre3 695 842 21
pre5 847 881 50
pre6 312 430 20
pre8 910 920 1
pre8 695 842 50
EDIT:
I need to also produce output of the form
pre3 695 842 pre8 910 920 1
pre5 847 881 pre8 695 842 50
pre6 312 430 pre3 695 842 20
The answer below works for the question as originally asked, but I'm confused by some of its syntax, so I'm not sure how to adjust it to produce this output.
This command:
awk 'NR==FNR{ar[$5,$3]=$1+ar[$5,$3]; ar[$6,$4]=$1+ar[$6,$4]}
NR>FNR && ar[$4,$1] {print "pre"$1,$2,$3,ar[$4,$1]}' file1 file2
outputs pre plus the content of the second file's first, second, and third columns, followed by the first file's first column, for all lines in which the first file's fifth and third (or sixth and fourth) columns match the second file's fourth and first columns:
pre3 695 842 21
pre5 847 881 50
pre8 910 920 1
pre8 695 842 50
pre6 312 430 20
(For lines with more than one match, the values of ar[$4,$1] are summed up.)
Note that the output is not necessarily sorted! To sort it, add sort:
awk 'NR==FNR{ar[$5,$3]=$1+ar[$5,$3]; ar[$6,$4]=$1+ar[$6,$4]}
NR>FNR && ar[$4,$1]{print "pre"$1,$2,$3,ar[$4,$1]}' file1 file2 | sort
What does the code do?
NR==FNR{...} works on the first input file only
NR>FNR{...} works on the 2nd, 3rd,... input file
ar[$5,$3] creates an array entry whose key is the content of the 5th and 3rd columns of the current line / record, joined by awk's SUBSEP character (a non-printing separator, not the field separator)
You could use the command below:
awk 'NR==FNR {a[$3 FS $5]=1;next } a[$1 FS $4]' f1.txt f2.txt
If you want to print only specific fields from the matching lines in the second file, use it like below:
awk 'NR==FNR {a[$3 FS $5]=1;next } a[$1 FS $4] { print "pre"$1" "$2" "$3}' f1.txt f2.txt
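For the edited output format (both matched file2 rows on one line), a possible sketch: read file2 first, store its first three columns keyed by (col1, col4), then let each file1 row look up both of its references. This assumes every referenced pair actually exists in file2:

```shell
# a[key] holds "col1 col2 col3" of each file2 row, keyed by (col1, col4)
awk 'NR==FNR { a[$1,$4] = $1 " " $2 " " $3; next }
            { print "pre" a[$3,$5], "pre" a[$4,$6], $1 }' file2 file1
```

With the sample files this prints the three lines shown in the EDIT, in file1's row order.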
I have the following file:
ABC 1234 2333 BCD
ABC 121 123 BCD
ABC 124 231 BCD
ABC 2342 2344 CDK
MBN 231 252 RFC
MBN 230 212 RFC
MBN 213 215 RFC
MBN 233 235 RFC
MBN 12 67 RTC
MBN 67 98 TCF
I want to find the last row for each unique combination of first and fourth column values, based on a search from another file; my other file will have:
ABC
MBN
The code should look for ABC first in the above file, then find the last occurrence of BCD and so on, and the output would be:
ABC 124 231 BCD
ABC 2342 2344 CDK
MBN 233 235 RFC
MBN 67 98 TCF
I have begun by finding the first occurrence of ABC:
grep ABC abovefile.txt | head -1
You can use this awk command:
awk 'NR==FNR{search[$1]; next}
     $1 in search{key=$1 SEP $4; if (!(key in data)) c[++n]=key; data[key]=$0}
     END{for (i=1; i<=n; i++) print data[c[i]]}' file2 file1
Output:
ABC 124 231 BCD
ABC 2342 2344 CDK
MBN 233 235 RFC
MBN 12 67 RTC
MBN 67 98 TCF
This solution is using 3 arrays:
search to hold search items from file2
data to hold records from file1 with the key as $1,$4
c for keeping the order of the already processed keys
Code Breakup:
NR==FNR            # execute the next block for the 1st file in the list (i.e. file2)
{search[$1];next}  # store the first column in the search array and move to the next record
$1 in search       # for the next file in the list, if the first col exists in the search array
key=$1 SEP $4      # build the key from $1 and $4 (SEP is unset, i.e. empty, here; awk's SUBSEP would be a safer separator)
if(!(key in data)) # if the key is not in the data array
c[++n]=key         # store it in array c with an incrementing index
data[key]=$0       # now store the full record in the data array with index=key
END # run this block at the end
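If GNU tac is available, an alternative sketch avoids the ordering arrays altogether: reverse file1, keep the first occurrence of each ($1,$4) pair (which is the last one in the original order), and reverse back. The - tells awk to read the reversed stream from stdin:

```shell
tac file1 \
  | awk 'NR==FNR { s[$1]; next }      # load the search terms from file2
         $1 in s && !seen[$1,$4]++    # first hit per pair = last row in original order
        ' file2 - \
  | tac
```

This produces the same five lines as the array-based answer, in the original file order.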
I have a long tab-formatted file with many columns. I would like to calculate the percentage between two columns (the 3rd and 4th, each relative to the 2nd) and print it next to the corresponding numbers in this format: 45(13.93%).
input:
file1 323 434 45 767 254235 275 2345 467
file1 294 584 43 7457 254565 345 235445 4635
file1 224 524 4343 12457 2542165 345 124445 41257
Desired output:
file1 323 434(134.37%) 45(13.93%) 767 254235 275 2345 467
file1 294 584(198.64%) 43(14.63%) 7457 254565 345 235445 4635
file1 224 524(233.93%) 4343(1938.84%) 12457 2542165 345 124445 41257
I tried:
cat test_file.txt | awk '{printf "%s (%.2f%)\n",$0,($4/$2)*100}' OFS="\t" | awk '{printf "%s (%.2f%)\n",$0,($3/$2)*100}' | awk '{print $1,$2,$3,$11,$4,$10,$5,$6,$7,$8,$9}' - | sed 's/ (/(/g' | sed 's/ /\t/g' >out.txt
It works, but I want a shortcut for this.
I would say:
$ awk '{$3=sprintf("%d(%.2f%%)", $3, ($3/$2)*100); $4=sprintf("%d(%.2f%%)", $4, ($4/$2)*100)}1' file
file1 323 434(134.37%) 45(13.93%) 767 254235 275 2345 467
file1 294 584(198.64%) 43(14.63%) 7457 254565 345 235445 4635
file1 224 524(233.93%) 4343(1938.84%) 12457 2542165 345 124445 41257
With a function to avoid duplication:
awk 'function print_nice (num1, num2) {
    return sprintf("%d(%.2f%%)", num1, (num1/num2)*100)
}
{$3=print_nice($3,$2); $4=print_nice($4,$2)}1' file
This uses sprintf to apply a specific format and store the result back into the field. The calculations are the obvious ones. Note the doubled %% needed to print a literal percent sign in a printf format.
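The column numbers can also be passed in from the shell, so the same one-liner works for any pair of columns; a sketch where c1, c2 and base are variable names chosen here for illustration:

```shell
# %s keeps the field text as-is; %% prints a literal percent sign
awk -v c1=3 -v c2=4 -v base=2 '{
    $c1 = sprintf("%s(%.2f%%)", $c1, 100 * $c1 / $base)
    $c2 = sprintf("%s(%.2f%%)", $c2, 100 * $c2 / $base)
}1' file
```

Dynamic field references like $c1 are plain awk, so this needs no GNU extensions.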