This question already has an answer here:
awk script along with for loop
(1 answer)
Closed 7 years ago.
I have a data set as below (t.txt):
827 819
830 826
828 752
752 694
828 728
821 701
724 708
826 842
719 713
764 783
812 820
829 696
697 849
840 803
752 774
I have second file as below (t1.txt):
752
728
856
693
713
792
812
706
737
751
745
I am trying to extract column 2 elements of the second file from the first data set using a for loop.
I have tried :
for i in `cat t1.txt`
do
awk -F " " '$1=i {print $2}' t.txt > t0.txt
done
Desired output is :
694
820
774
Unfortunately I am getting a blank file.
I have tried to do it manually like : awk -F " " '$1==752 {print $2}' t.txt > t0.txt
Results obtained are
694
774
How can I do it for the entire t1 file in one go?
Simplest way: using join
$ join -o 1.2 <(sort t.txt) <(sort t1.txt)
694
774
820
join requires the files to be lexically sorted on the comparison field (the default field one). The -o option instructs join to output the 2nd field from the 1st file.
With awk
$ awk 'NR==FNR {key[$1]; next} $1 in key {print $2}' t1.txt t.txt
694
820
774
That remembers the keys in t1.txt, then loops over t.txt (when the accumulated record number NR is not equal to the file's record number FNR), if the first field occurred in t1, print the second field.
Related
I would like to substract 2x two columns in a text file and add into two new columns in a tab delimited text file in bash using awk.
I would like to substract column 3 (h3) - column 1 (h1). And name the new added column "count1".
I would like to substract column 4 (h4) - column 2 (h2). And name the new added column "count2".
I don't want to build a new text file, but edit the old one.
My text file:
h1 h2 h3 h4 h5
343 100 856 216 536
283 96 858 220 539
346 111 858 220 539
283 89 860 220 540
280 89 862 220 541
76 32 860 220 540
352 105 856 220 538
57 16 860 220 540
144 31 858 220 539
222 63 860 220 540
305 81 858 220 539
My command at the moment looks like this:
awk '{$6 = $3 - $1}1' file.txt
awk '{$6 = $4 - $2}1' file.txt
But I don't know how to rename the new added columns and maybe there is a smarter move to run both commands in the same awk command?
Pretty simple in awk. Use NR==1 to modify the first line.
awk -F '\t' -v OFS='\t' '
NR==1 {print $0,"count1","count2"}
NR!=1 {print $0,$3-$1,$4-$2}' file.txt > tmp && mv tmp file.txt
I am pretty sure that it is awk I would have to use
I have one file with information I need and another file where I need to take two pieces of information from and obtain two numbers from the second file based on that piece of information.
So if the first file has m7 in its fifth column and 3 in it's third column I want to search in the second column for a row that has 3 in it's first column and m7 in it's fourth column. The I want to print certain columns from these files as listed below.
Given the following two files of input
file1
1 dog 3 8 m7 n15
50 cat 5 8 m15 m22
20 fish 6 3 n12 m7
file2
3 695 842 m7 word
5 847 881 m15 not
8 910 920 n15 important
8 695 842 m22 word
6 312 430 n12 not
I want to produce the output
pre3 695 842 21
pre5 847 881 50
pre6 312 430 20
pre8 910 920 1
pre8 695 842 50
EDIT:
I need to also produce output of the form
pre3 695 842 pre8 910 920 1
pre5 847 881 pre8 695 842 50
pre6 312 430 pre3 695 842 20
The answer below work for the question before, but I'm confused with some of the syntax of it so I'm not sure how to adjust it to make this output
This command:
awk 'NR==FNR{ar[$5,$3]=$1+ar[$5,$3]; ar[$6,$4]=$1+ar[$6,$4]}
NR>FNR && ar[$4,$1] {print "pre"$1,$2,$3,ar[$4,$1]}' file1 file2
outputs pre plus the content of the second file's first, second, and third column and the first file's first column for all lines in which the content of the first file's fifth and third (or sixth and fourth) column is identical to the second file's fourth and first column:
pre3 695 842 21
pre5 847 881 50
pre8 910 920 1
pre8 695 842 50
pre6 312 430 20
(for lines with more than one match the values of ar[$4,$1] are summed up)
Note that the output is not necessarily sorted! To achieve this: add sort:
awk 'NR==FNR{ar[$5,$3]=$1+ar[$5,$3]; ar[$6,$4]=$1+ar[$6,$4]}
NR>FNR && ar[$4,$1]{print "pre"$1,$2,$3,ar[$4,$1]}' file1 file2 | sort
What does the code?
NR==FNR{...} works on the first input file only
NR>FNR{...} works on the 2nd, 3rd,... input file
ar[$5,$3] creates an array whose key is the content of the 5th and 3rd column of the current line / record (separated by the field separator; usually a single blank)
You could use the below command :
awk 'NR==FNR {a[$3 FS $5]=1;next } a[$1 FS $4]' f1.txt f2.txt
If you want to print only the specific fields from the matching lines in second file use like below :
awk 'NR==FNR {a[$3 FS $5]=1;next } a[$1 FS $4] { print "pre"$1" "$2" "$3}' f1.txt f2.txt
I have a data set t.txt:
827 819
830 826
828 752
752 694
828 728
821 701
724 708
826 842
719 713
764 783
812 820
829 696
697 849
840 803
752 774
I also have a second file t1.txt:
752
728
856
693
713
792
812
706
737
751
745
I am trying to extract corresponding column 2 elements of the second file sequentially from the data set.
I have used: awk -F " " '$1==752 {print $2}' t.txt >> t2.txt
How can i use for loop for the above instruction and populate it in one text file instead of doing it one by one?
output for 752 will be 694. This 694 should be written in a different text file. For 812, it should give me 820. Both 694 and 820 should be written in the same text file. It should parse till end of the input file.
I was trying :
for i in `cat t1.txt` | awk -F " " '$1==$i {print $2}' t.txt >> t2.txt
which is throwing syntax error.
Answer for 3rd Version of This Question
$ awk 'FNR==NR{a[$1]=1;next;} $1 in a {print $2;}' t1.txt t.txt
694
820
774
Answer for 2nd Version of This Question
For every line in t1.txt, this checks to see if the same number appears in either column 1 of t.txt. If it does, the number in column 2 of the same line is printed:
$ awk 'FNR==NR{a[$1]=$2;next} $1 in a {print a[$1]}' t.txt t1.txt
694
820
To save the output in file t2.txt, use:
awk 'FNR==NR{a[$1]=$2;next} $1 in a {print a[$1]}' t.txt >t2.txt
How it works
FNR==NR{a[$1]=$2;next}
This reads through t.txt and creates an array a of its values.
$1 in a {print a[$1]}
For each number in file t1.txt, this checks to see if the number appears in array a and, if so, prints out the corresponding value.
I have long tab formatted file with many columns, i would like to calculate % between two columns (3rd and 4rth) and print this % with correspondence numbers with this format (%46.00).
input:
file1 323 434 45 767 254235 275 2345 467
file1 294 584 43 7457 254565 345 235445 4635
file1 224 524 4343 12457 2542165 345 124445 41257
Desired output:
file1 323 434(134.37%) 45(13.93%) 767 254235 275 2345 467
file1 294 584(198.64%) 43(14.63%) 7457 254565 345 235445 4635
file1 224 524(233.93%) 4343(1938.84%) 12457 2542165 345 124445 41257
i tried:
cat test_file.txt | awk '{printf "%s (%.2f%)\n",$0,($4/$2)*100}' OFS="\t" | awk '{printf "%s (%.2f%)\n",$0,($3/$2)*100}' | awk '{print $1,$2,$3,$11,$4,$10,$5,$6,$7,$8,$9}' - | sed 's/ (/(/g' | sed 's/ /\t/g' >out.txt
It works but I want something sort-cut of this.
I would say:
$ awk '{$3=sprintf("%d(%.2f%)", $3, ($3/$2)*100); $4=sprintf("%d(%.2f%)", $4, ($4/$2)*100)}1' file
file1 323 434(134.37%) 45(13.93%) 767 254235 275 2345 467
file1 294 584(198.64%) 43(14.63%) 7457 254565 345 235445 4635
file1 224 524(233.93%) 4343(1938.84%) 12457 2542165 345 124445 41257
With a function to avoid duplicities:
awk 'function print_nice (num1, num2) {
return sprintf("%d(%.2f%)", num1, (num1/num2)*100)
}
{$3=print_nice($3,$2); $4=print_nice($4,$2)}1' file
This uses sprintf to express a specific format and store it in a variable. The calculations are the obvious.
I have a data.txt file
1 2 3 4 5 6
cat data.txt
17 245 1323 17.7777 10.2222 61.1111
19 232 1232 19.9999 19.9999 68.8888
13 133 1233 13.3333 13.3333 63.3333
17 177 1678 17.7777 17.7777 69.9999
12 122 2325 12.2222 11.333 64.4444
18 245 1323 18.8888 12.4444 68.8888
12 222 1222 12.2222 19.9999 61.1111
14 245 1323 14.4444 13.5555 68.8888
I would like the find all the values sequentially from 12.2222 in column 4 to 18.8888.
Answer:
echo ${minValsCol4[#]}
12.2222 13.3333 14.4444 17.7777 18.8888
And the values sequentially from 63.3333 in column 6 to 68.8888.
Answer:
echo ${minValsCol6[#]}
63.3333 64.4444 68.8888
Any solution in awk?
Thanks.
Using awk and sort -nu:
awk -v col=4 -v start=12.2222 -v end=18.8888 '$col>=start && $col<=end{
print $col}' data.txt | sort -nu
12.2222
13.3333
14.4444
17.7777
18.8888