file1
rs12345 G C
rs78901 A T
file2
3 22745180 rs12345 G C,G
12 67182999 rs78901 A G,T
desired output
3 22745180 rs12345 G C
12 67182999 rs78901 A T
I tried
awk 'NR==FNR {h[$1] = $3; next} {print $1,$2,$3,h[$2]}' file1 file2
output generated
3 22745180 rs12345
print first 4 columns of file2 and 3rd column of file1 as 5th col in output
You may use this awk:
awk 'FNR == NR {map[$1,$2] = $3; next} ($3,$4) in map {$NF = map[$3,$4]} 1' f1 f2 | column -t
3 22745180 rs12345 G C
12 67182999 rs78901 A T
A more readable version:
awk '
FNR == NR {
map[$1,$2] = $3
next
}
($3,$4) in map {
$NF = map[$3,$4]
}
1' file1 file2 | column -t
Used column -t for tabular output only.
In the case you presented (rows in both files are matched) this will work
paste file1 file2 | awk '{print $4,$5,$6,$7,$3}'
I am trying to split a file (testfile.csv) that contains the following:
1,2,4,5,6,7,8,9
a,b,c,d,e,f,g,h
q,w,e,r,t,y,u,i
a,s,d,f,g,h,j,k
z,x,c,v,b,n,m,z
into a file
1,2
a,b
q,w
a,s
z,x
and another file
4,5
c,d
e,r
d,f
c,v
but I cannot seem to do that in awk using an iterative solution.
awk -F, '{print $1, $2}'
awk -F, '{print $3, $4}'
does it for me but I would like a looping solution.
I tried
awk -F, '{ for (i=1;i< NF;i+=2) print $i, $(i+1) }' testfile.csv
but it gives me a single column. It appears that I am iterating over the first row and then moving onto the second row skipping every other element of that specific row.
You can use cut:
$ cut -d, -f1,2 file > file_1
$ cut -d, -f3,4 file > file_2
If you are going to use awk be sure to set the OFS so that the columns remain a CSV file:
$ awk 'BEGIN{FS=OFS=","}
{print $1,$2 >"f1"; print $3,$4 > "f2"}' file
$ cat f1
1,2
a,b
q,w
a,s
z,x
$cat f2
4,5
c,d
e,r
d,f
c,v
Is there a quick and dirty way of renaming the resulting files with the first row and first column (like first file would be 1.csv, second file would be 4.csv:
awk 'BEGIN{FS=OFS=","}
FNR==1 {n1=$1 ".csv"; n2=$3 ".csv"}
{print $1,$2 >n1; print $3,$4 > n2}' file
awk -F, '{ for (i=1; i < NF; i+=2) print $i, $(i+1) > i ".csv"}' tes.csv
works for me. I was trying to get the output in bash which was all jumbled up.
It's do-able in bash, but it will be much slower than awk:
f=testfile.csv
IFS=, read -ra first < <(head -1 "$f")
for ((i = 0; i < (${#first[#]} + 1) / 2; i++)); do
slice_file="${f%.csv}$((i+1)).csv"
cut -d, -f"$((2 * i + 1))-$((2 * (i + 1)))" "$f" > "$slice_file"
done
with sed:
sed -r '
h
s/(.,.),./\1/w file1.txt
g
s/.,.,(.,.),./\1/w file2.txt' file.txt
I have two files
file1:
>string1<TAB>Name1
>string2<TAB>Name2
>string3<TAB>Name3
file2:
>string1<TAB>sequence1
>string2<TAB>sequence2
I want to use awk to compare column 1 of respective files. If both files share a column 1 value I want to print column 2 of file1 followed by column 2 of file2. For example, for the above files my expected output is:
Name1<TAB>sequence1
Name2<TAB>sequence2
this is my code:
awk 'BEGIN{FS=OFS="\t"} FNR == NR { a[$1] = $1; next } $1 in a { print a[$2], $2 }' file1 file2 >out
But the only thing I get is an empty first columnsequence
where is the error here?
your assignment is not right.
$ awk 'BEGIN {FS=OFS="\t"}
NR==FNR {a[$1]=$2; next}
$1 in a {print a[$1],$2}' file1 file2
Name1 sequence1
Name2 sequence2
I'm a complete newbie to using command-line utilities and am wondering how to process information as following:
mapping.txt:
80 001 002
81 011 012 013 014
82 021 022
...
input.txt:
81 103823044
80 103823054
81 103823064
...
Desired output.txt:
103823044|011|
103823044|012|
103823044|013|
103823044|014|
103823054|001|
103823054|002|
103823064|011|
103823064|012|
103823064|013|
103823064|014|
I've done simple mapping wherein the column numbers are fixed but I'm unsure of how to map a dynamic number of columns to the desired output
If order is not important, join and awk can do the job easily.
$ join <(sort input.txt) <(sort mapping.txt) | awk -v OFS="|" '{for (i=3;i<NF;i++) print $2, $i OFS}'
103823054|001|
103823044|011|
103823044|012|
103823044|013|
103823064|011|
103823064|012|
103823064|013|
Here's a GNU awk script that uses multi-dimensional arrays to do what you want:
#!/usr/bin/awk -f
BEGIN { OFS="|" }
FNR==NR { for(i=2;i<=NF;i++) a[$1][$i]; next }
$1 in a { for(k in a[$1]) print $2, k, "" }
If you save that to a file like script.awk and then chmod +x script.awk you can run it like:
$ ./script.awk mapping.txt input.txt
103823044|011|
103823044|012|
103823044|013|
103823044|014|
103823054|002|
103823054|001|
103823064|011|
103823064|012|
103823064|013|
103823064|014|
Here's a breakdown of the script:
BEGIN - set the output field separator to |
FNR==NR - process the first file (mapping.txt) and store the data in a multi-dimensional array by $1 first, then by the other fields. next to skip any other line processing.
$1 in a - test to see if the line has a mapping. If so, print the corresponding mappings out in order(also GNU awk). The commas in the print command are converted to the OFS value.
It could be remade a "one-liner" like:
awk -v OFS="|" 'FNR==NR {for(i=2;i<=NF;i++) a[$1][$i]; next} $1 in a {for(k in a[$1]) print $2, k, ""}' mapping.txt input.txt
Here's a version of the script that uses a single dimensional array to store $0 then split()s it later to preserve order:
#!/usr/bin/awk -f
BEGIN { OFS="|" }
FNR==NR { a[$1]=$0; next }
$1 in a { c=split(a[$1], b); for(i=2;i<=c;i++) print $2, b[i], "" }
I have 2 files with contents:-
file1:
918802944821 919968005200 kushinagar
919711354546 919211999924 delhi
915555555555 916666666666 kanpur
919711354546 915686524578 hehe
918802944821 4752168549 hfhkjh
file2:-
919211999924 919711354546 ghaziabad
919999999999 918888888888 lucknow
912222222222 911111111111 chandauli
918802944821 916325478965 hfhjdhjd
Now notice that number1 and number2 are interchanged in file1 and file2. I want to print only this duplicate line on the screen. to be more specific i want only the numbers or line to be printed on the screen which are duplicate like 8888888888 and 7777777777 are duplicate in the two files. I want only these two numbers on the screen or the whole line on the screen..
Using awk you can do:
awk 'FNR==NR{a[$1,$2]++;next} a[$2,$1]' f1 f2
7777777777 8888888888 pqr
EDIT: Based on your edited question you can do:
awk 'FNR==NR{a[$1]++;b[$2]++;next} a[$1] || b[$1] {print $1} a[$2] || b[$2]{print $2}' f1 f2
919211999924
919711354546
918802944821
kent$ awk 'NR==FNR{a[$2 FS $1]=1;next}a[$1 FS $2]{print $1,$2}' f1 f2
7777777777 8888888888