I have a data set t.txt:
827 819
830 826
828 752
752 694
828 728
821 701
724 708
826 842
719 713
764 783
812 820
829 696
697 849
840 803
752 774
I also have a second file t1.txt:
752
728
856
693
713
792
812
706
737
751
745
I am trying to extract corresponding column 2 elements of the second file sequentially from the data set.
I have used: awk -F " " '$1==752 {print $2}' t.txt >> t2.txt
How can I use a for loop to run the above command for every value in t1.txt and collect the results in one text file, instead of doing it one by one?
The output for 752 will be 694. This 694 should be written to a different text file. For 812, it should give me 820. Both 694 and 820 should be written to the same text file. It should continue until the end of the input file.
I was trying :
for i in `cat t1.txt` | awk -F " " '$1==$i {print $2}' t.txt >> t2.txt
which throws a syntax error.
Answer for 3rd Version of This Question
$ awk 'FNR==NR{a[$1]=1;next;} $1 in a {print $2;}' t1.txt t.txt
694
820
774
Answer for 2nd Version of This Question
For every line in t1.txt, this checks whether the same number appears in column 1 of t.txt. If it does, the number in column 2 of that line is printed:
$ awk 'FNR==NR{a[$1]=$2;next} $1 in a {print a[$1]}' t.txt t1.txt
694
820
To save the output in file t2.txt, use:
awk 'FNR==NR{a[$1]=$2;next} $1 in a {print a[$1]}' t.txt t1.txt >t2.txt
How it works
FNR==NR{a[$1]=$2;next}
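This reads through t.txt and builds an array a that maps each column 1 value to its column 2 value. If a value occurs more than once in column 1, the last line wins.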
$1 in a {print a[$1]}
For each number in file t1.txt, this checks to see if the number appears in array a and, if so, prints out the corresponding value.
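If a shell loop is really wanted, the attempt from the question can be repaired roughly as follows. This is only a sketch and not part of the original answer: it assumes a POSIX shell with mktemp available, recreates the two sample files in a scratch directory, and hands each value to awk with -v (the variable name key is my own choice). Inside single quotes the shell never expands $i, which is one reason the original attempt could not work.

```shell
#!/bin/sh
# Scratch directory so the sample files do not clobber real ones.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Sample data from the question: t.txt holds pairs, t1.txt holds lookup values.
cat > t.txt <<'EOF'
827 819
830 826
828 752
752 694
828 728
821 701
724 708
826 842
719 713
764 783
812 820
829 696
697 849
840 803
752 774
EOF
cat > t1.txt <<'EOF'
752
728
856
693
713
792
812
706
737
751
745
EOF

: > t2.txt   # start with an empty output file
while read -r i; do
    # -v passes the shell value to awk as the awk variable "key".
    awk -v key="$i" '$1 == key { print $2 }' t.txt >> t2.txt
done < t1.txt

cat t2.txt
```

With this sample data t2.txt should end up containing 694, 774 and 820, because 752 occurs twice in column 1 of t.txt. The loop starts awk once per line of t1.txt, so the single-pass commands above are the better choice for large files.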
Related
I would like to subtract two pairs of columns in a tab-delimited text file and append the results as two new columns, in bash using awk.
I would like to subtract column 1 (h1) from column 3 (h3) and name the new column "count1".
I would like to subtract column 2 (h2) from column 4 (h4) and name the new column "count2".
I don't want to build a new text file, but edit the old one.
My text file:
h1 h2 h3 h4 h5
343 100 856 216 536
283 96 858 220 539
346 111 858 220 539
283 89 860 220 540
280 89 862 220 541
76 32 860 220 540
352 105 856 220 538
57 16 860 220 540
144 31 858 220 539
222 63 860 220 540
305 81 858 220 539
My command at the moment looks like this:
awk '{$6 = $3 - $1}1' file.txt
awk '{$6 = $4 - $2}1' file.txt
But I don't know how to name the newly added columns, and maybe there is a smarter way to run both commands in a single awk command?
Pretty simple in awk. Use NR==1 to modify the first line.
awk -F '\t' -v OFS='\t' '
NR==1 {print $0,"count1","count2"}
NR!=1 {print $0,$3-$1,$4-$2}' file.txt > tmp && mv tmp file.txt
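To see the command at work, here is a small self-contained sketch. It assumes a POSIX shell with mktemp, and it writes only the header and the first two data rows of the sample, tab-separated, into a scratch directory. The temporary file is needed because redirecting straight to file.txt would truncate it before awk has read it; the name file.txt.tmp is just an illustration.

```shell
#!/bin/sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Header plus the first two sample rows, written tab-separated.
printf 'h1\th2\th3\th4\th5\n'      >  file.txt
printf '343\t100\t856\t216\t536\n' >> file.txt
printf '283\t96\t858\t220\t539\n'  >> file.txt

awk -F '\t' -v OFS='\t' '
NR == 1 { print $0, "count1", "count2"; next }   # header row: append the names
        { print $0, $3 - $1, $4 - $2 }           # data rows: append the differences
' file.txt > file.txt.tmp && mv file.txt.tmp file.txt

cat file.txt
```

The first data row should now end in 513 and 116 (856-343 and 216-100). If GNU awk 4.1 or later is available, its -i inplace option is another way to avoid the explicit temporary file.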
I am pretty sure that awk is what I would have to use.
I have one file with the information I need, and a second file from which I need to look up two numbers based on two pieces of information in the first file.
So if the first file has m7 in its fifth column and 3 in its third column, I want to search the second file for a row that has 3 in its first column and m7 in its fourth column. Then I want to print certain columns from these files as listed below.
Given the following two files of input
file1
1 dog 3 8 m7 n15
50 cat 5 8 m15 m22
20 fish 6 3 n12 m7
file2
3 695 842 m7 word
5 847 881 m15 not
8 910 920 n15 important
8 695 842 m22 word
6 312 430 n12 not
I want to produce the output
pre3 695 842 21
pre5 847 881 50
pre6 312 430 20
pre8 910 920 1
pre8 695 842 50
EDIT:
I need to also produce output of the form
pre3 695 842 pre8 910 920 1
pre5 847 881 pre8 695 842 50
pre6 312 430 pre3 695 842 20
The answer below works for the original question, but I'm confused by some of its syntax, so I'm not sure how to adjust it to produce this output.
This command:
awk 'NR==FNR{ar[$5,$3]=$1+ar[$5,$3]; ar[$6,$4]=$1+ar[$6,$4]}
NR>FNR && ar[$4,$1] {print "pre"$1,$2,$3,ar[$4,$1]}' file1 file2
outputs pre plus the content of the second file's first, second, and third column and the first file's first column for all lines in which the content of the first file's fifth and third (or sixth and fourth) column is identical to the second file's fourth and first column:
pre3 695 842 21
pre5 847 881 50
pre8 910 920 1
pre8 695 842 50
pre6 312 430 20
(for lines with more than one match the values of ar[$4,$1] are summed up)
Note that the output is not necessarily sorted. To sort it, pipe the result through sort:
awk 'NR==FNR{ar[$5,$3]=$1+ar[$5,$3]; ar[$6,$4]=$1+ar[$6,$4]}
NR>FNR && ar[$4,$1]{print "pre"$1,$2,$3,ar[$4,$1]}' file1 file2 | sort
What does the code do?
NR==FNR{...} works on the first input file only
NR>FNR{...} works on the 2nd, 3rd,... input file
ar[$5,$3] refers to an array element whose key is the content of the 5th and 3rd column of the current line / record, joined by SUBSEP (by default the non-printing character "\034"), not by the field separator
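A short sketch of how such two-part keys behave, using only the first line of file1 as input. It assumes a POSIX awk; the array name part is my own. In awk the two subscripts are concatenated with SUBSEP, so split() on SUBSEP recovers them, and the (i, j) in ar form tests for a key without creating it.

```shell
#!/bin/sh
out=$(printf '1 dog 3 8 m7 n15\n' | awk '
{
    ar[$5, $3] = $1
    for (k in ar) {
        split(k, part, SUBSEP)          # recover the two components of the key
        print part[1], part[2], ar[k]
    }
    # (i, j) in ar tests membership without creating an entry
    if (("m7", 3) in ar) print "m7/3 present"
    if (("m7", 8) in ar) print "m7/8 present"; else print "m7/8 absent"
}')
printf '%s\n' "$out"
```

By contrast, a bare ar[$4,$1] used as a condition, as in the command above, creates an empty entry for every key it tests. That is harmless here but can matter with large inputs.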
You could use the command below:
awk 'NR==FNR {a[$3 FS $5]=1;next } a[$1 FS $4]' f1.txt f2.txt
If you want to print only specific fields from the matching lines in the second file, use:
awk 'NR==FNR {a[$3 FS $5]=1;next } a[$1 FS $4] { print "pre"$1" "$2" "$3}' f1.txt f2.txt
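Neither answer covers the combined output asked for in the EDIT. One possible approach, offered as a sketch rather than as the answerers' method, is to read file2 first and then walk through file1. It assumes a POSIX shell with mktemp; the array name row and the output file combined.txt are my own, and lines of file1 without both matches are simply skipped.

```shell
#!/bin/sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Sample input from the question.
cat > file1 <<'EOF'
1 dog 3 8 m7 n15
50 cat 5 8 m15 m22
20 fish 6 3 n12 m7
EOF
cat > file2 <<'EOF'
3 695 842 m7 word
5 847 881 m15 not
8 910 920 n15 important
8 695 842 m22 word
6 312 430 n12 not
EOF

# file2 is read first so its rows can be looked up while file1 is scanned.
awk '
NR == FNR {
    # key: (label, number) -> "pre<number> col2 col3"
    row[$4, $1] = "pre" $1 " " $2 " " $3
    next
}
(($5, $3) in row) && (($6, $4) in row) {
    print row[$5, $3], row[$6, $4], $1
}' file2 file1 > combined.txt

cat combined.txt
```

On the sample input this should print the three lines requested in the EDIT, in the order of file1.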
I have a data set as below (t.txt):
827 819
830 826
828 752
752 694
828 728
821 701
724 708
826 842
719 713
764 783
812 820
829 696
697 849
840 803
752 774
I have second file as below (t1.txt):
752
728
856
693
713
792
812
706
737
751
745
I am trying to extract column 2 elements of the second file from the first data set using a for loop.
I have tried :
for i in `cat t1.txt`
do
awk -F " " '$1=i {print $2}' t.txt > t0.txt
done
Desired output is :
694
820
774
Unfortunately I am getting a blank file.
I have tried to do it manually like : awk -F " " '$1==752 {print $2}' t.txt > t0.txt
Results obtained are
694
774
How can I do it for the entire t1 file in one go?
Simplest way: using join
$ join -o 1.2 <(sort t.txt) <(sort t1.txt)
694
774
820
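join requires both files to be lexically sorted on the join field (by default the first field). The -o 1.2 option tells join to output only the 2nd field of the 1st file. Note that the <(...) process substitution is a bash feature, and that the output comes out in sorted key order rather than in the order of t.txt.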
With awk
$ awk 'NR==FNR {key[$1]; next} $1 in key {print $2}' t1.txt t.txt
694
820
774
This first remembers the keys from t1.txt (while NR, the cumulative record number, still equals FNR, the record number within the current file). It then reads t.txt, where NR no longer equals FNR, and prints the second field of every line whose first field was seen in t1.txt.
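If the results should instead follow the order of t1.txt, with all matches for a key kept together, the roles of the two files can be swapped. The following is a sketch under my own assumptions (POSIX shell with mktemp, array name vals), not something the original answer proposes. It stores every column 2 value per key while reading t.txt, then prints them as the keys come up in t1.txt.

```shell
#!/bin/sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Sample data from the question.
cat > t.txt <<'EOF'
827 819
830 826
828 752
752 694
828 728
821 701
724 708
826 842
719 713
764 783
812 820
829 696
697 849
840 803
752 774
EOF
cat > t1.txt <<'EOF'
752
728
856
693
713
792
812
706
737
751
745
EOF

out=$(awk '
NR == FNR {
    # Collect every column 2 value for a key, one per line.
    if ($1 in vals) vals[$1] = vals[$1] "\n" $2
    else            vals[$1] = $2
    next
}
$1 in vals { print vals[$1] }
' t.txt t1.txt)

printf '%s\n' "$out"
```

For the sample data this should give 694, 774, 820: both values for 752 first, then the value for 812. No sorting is required, unlike with join.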
I have a long tab-delimited file with many columns. I would like to calculate the percentage of two columns (3rd and 4th) relative to the 2nd column and print each percentage next to the corresponding number in this format: (46.00%).
input:
file1 323 434 45 767 254235 275 2345 467
file1 294 584 43 7457 254565 345 235445 4635
file1 224 524 4343 12457 2542165 345 124445 41257
Desired output:
file1 323 434(134.37%) 45(13.93%) 767 254235 275 2345 467
file1 294 584(198.64%) 43(14.63%) 7457 254565 345 235445 4635
file1 224 524(233.93%) 4343(1938.84%) 12457 2542165 345 124445 41257
i tried:
cat test_file.txt | awk '{printf "%s (%.2f%)\n",$0,($4/$2)*100}' OFS="\t" | awk '{printf "%s (%.2f%)\n",$0,($3/$2)*100}' | awk '{print $1,$2,$3,$11,$4,$10,$5,$6,$7,$8,$9}' - | sed 's/ (/(/g' | sed 's/ /\t/g' >out.txt
It works, but I would like a shorter way of doing this.
I would say:
$ awk '{$3=sprintf("%d(%.2f%%)", $3, ($3/$2)*100); $4=sprintf("%d(%.2f%%)", $4, ($4/$2)*100)}1' file
file1 323 434(134.37%) 45(13.93%) 767 254235 275 2345 467
file1 294 584(198.64%) 43(14.63%) 7457 254565 345 235445 4635
file1 224 524(233.93%) 4343(1938.84%) 12457 2542165 345 124445 41257
With a function to avoid the duplication:
awk 'function print_nice (num1, num2) {
    return sprintf("%d(%.2f%%)", num1, (num1/num2)*100)
}
{$3=print_nice($3,$2); $4=print_nice($4,$2)}1' file
This uses sprintf to build each field in the required format and assigns the result back to the field; %% produces a literal percent sign. The trailing 1 is an always-true pattern, so every modified line is printed.
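Assigning to a field makes awk rebuild the line with OFS, which defaults to a single space. For a file that really is tab-delimited, a variant along these lines may be closer to what is wanted. It is a sketch under my own assumptions: the function name with_pct is hypothetical, the zero check is an addition of mine, and the input is just the first five fields of the first sample row.

```shell
#!/bin/sh
out=$(printf 'file1\t323\t434\t45\t767\n' | awk -F '\t' -v OFS='\t' '
function with_pct(num, base) {
    # %% prints a literal percent sign
    return sprintf("%d(%.2f%%)", num, (num / base) * 100)
}
$2 != 0 { $3 = with_pct($3, $2); $4 = with_pct($4, $2) }   # skip the rewrite when the base is zero
{ print }')
printf '%s\n' "$out"
```

The fields stay tab-separated, and a row whose second column is 0 is passed through unchanged instead of triggering a division-by-zero error.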
I have a text file with data in the following format.
1 0 0
2 512 6
3 992 12
4 1536 18
5 2016 24
6 2560 29
7 3040 35
8 3552 41
9 4064 47
10 4576 53
11 5088 59
12 5600 65
13 6080 71
14 6592 77
15 7104 83
I want to print all the lines where $1 > 1000.
awk 'BEGIN {$1 > 1000} {print " " $1 " "$2 " "$3}' graph_data_tmp.txt
This doesn't seem to give the output that I am expecting. What am I doing wrong?
You can do this :
awk '$1>1000 {print $0}' graph_data_tmp.txt
print $0 prints the whole line.
If you instead want to print the lines after the 1000th line, do the same but replace $1 with NR. NR holds the number of the current record (line), counted from the start of the input.
awk 'NR>1000 {print $0}' graph_data_tmp.txt
All you need is:
awk '$1>1000' file
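As for what went wrong in the original attempt: a BEGIN block runs once, before any input has been read, so the comparison inside it sees an empty $1 and its result is thrown away. The second block has no pattern and therefore prints every line. The sketch below shows the pattern-only form on a few rows of the sample; it assumes a POSIX shell with mktemp. Since column 1 of the sample never exceeds 15, filtering on column 2 is shown as well, on the assumption that this may be the column that was meant.

```shell
#!/bin/sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# The first five rows of the sample data.
cat > graph_data_tmp.txt <<'EOF'
1 0 0
2 512 6
3 992 12
4 1536 18
5 2016 24
EOF

# A pattern with no action prints each matching line unchanged.
by_col1=$(awk '$1 > 1000' graph_data_tmp.txt)   # empty: column 1 only goes up to 5 here
by_col2=$(awk '$2 > 1000' graph_data_tmp.txt)   # rows whose second column exceeds 1000

printf '%s\n' "$by_col2"
```

Here the column 1 filter selects nothing, while the column 2 filter keeps the rows starting with 4 and 5.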