Print variable inside awk while calculating variable name - bash

I have a script that looks like the example below. I have a letter offset and I need to print the letter at that offset. I am not sure how to read that letter using ksh.
My expected result is LETTER_OFFSET(1)=a, LETTER_OFFSET(2)=v, LETTER_OFFSET(3)=c, etc. The offset is calculated inside a loop.
#!/bin/ksh
# 1 2 3 4 5 6 7 8 9 10 11 12
LETTERS=" a v c d g r g s s a g f"
LETTER_OFFSET="3";
Letter=$(echo $LETTERS | awk '{print $((1 * $$LETTER_OFFSET )) }')

Pass your offset into the awk script as an awk variable using the -v flag:
LETTER=$(echo "$LETTERS" | awk -v offset="$LETTER_OFFSET" '{print $offset}')
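The -v approach can be wrapped in a small function so it is easy to call from inside a loop. This is only a sketch: nth_word is a hypothetical helper name, not something from the original script.

```shell
#!/bin/sh
# nth_word is a hypothetical helper name used for illustration only.
# It prints the Nth whitespace-separated word of a string.
nth_word() {
    # $1 = the list of words, $2 = 1-based offset
    printf '%s\n' "$1" | awk -v offset="$2" '{ print $offset }'
}

LETTERS=" a v c d g r g s s a g f"
LETTER_OFFSET=3
LETTER=$(nth_word "$LETTERS" "$LETTER_OFFSET")
echo "$LETTER"
```

Inside awk, offset is an ordinary awk variable, so $offset means "the field whose number is stored in offset", which is why no shell quoting tricks are needed in the awk program itself.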

You don't need to invoke awk in every iteration. You can populate an array with your letters and then access its values by index:
#!/bin/ksh
# 1 2 3 4 5 6 7 8 9 10 11 12
letters=" a v c d g r g s s a g f"
# populate an array
arr=($letters)
offset=1
while [ "$offset" -le 12 ]; do
echo "${arr[$offset-1]}"
let offset++
done
Output:
a
v
c
d
g
r
g
s
s
a
g
f

Related

awk insert rows of one file as new columns to every nth rows of another file

Let's keep n=3 here, and say I have two files:
file1.txt
a b c row1
d e f row2
g h i row3
j k l row4
m n o row5
o q r row6
s t u row7
v w x row8
y z Z row9
file2.txt
1 2 3
4 5 6
7 8 9
I would like to merge the two files into a new_file.txt:
new_file.txt
a b c 2 3
d e f 2 3
g h i 2 3
j k l 5 6
m n o 5 6
o q r 5 6
s t u 8 9
v w x 8 9
y z Z 8 9
Currently I do this as follows (there are also slow bash for or while loop solutions, of course): awk '1;1;1' file2.txt > tmp2.txt and then something like awk 'FNR==NR{a[FNR]=$2" "$3;next};{$NF=a[FNR]};1' tmp2.txt file1.txt > new_file.txt for the case listed in my question.
Or put these in one line: awk '1;1;1' file2.txt | awk 'FNR==NR{a[FNR]=$2" "$3;next};{$NF=a[FNR]};1' - file1.txt > new_file.txt. But these do not look elegant at all...
I am looking for a more elegant one liner (perhaps awk) that can effectively do this.
In the real case, let's say for example I have 9 million rows in input file1.txt and 3 million rows in input file2.txt and I would like to append columns 2 and 3 of the first row of file2.txt as the new last columns of the first 3 rows of file1.txt, columns 2 and 3 of the second row of file2.txt as the same new last columns of the next 3 rows of file1.txt, etc, etc.
Thanks!
Try this; see the Process Substitution page on mywiki.wooledge for details on the <() syntax.
$ # transforming file2
$ cut -d' ' -f2-3 file2.txt | sed 'p;p'
2 3
2 3
2 3
5 6
5 6
5 6
8 9
8 9
8 9
$ # then paste it together with required fields from file1
$ paste -d' ' <(cut -d' ' -f1-3 file1.txt) <(cut -d' ' -f2-3 file2.txt | sed 'p;p')
a b c 2 3
d e f 2 3
g h i 2 3
j k l 5 6
m n o 5 6
o q r 5 6
s t u 8 9
v w x 8 9
y z Z 8 9
Speed comparison, time shown for two consecutive runs
$ perl -0777 -ne 'print $_ x 1000000' file1.txt > f1
$ perl -0777 -ne 'print $_ x 1000000' file2.txt > f2
$ du -h f1 f2
95M f1
18M f2
$ time paste -d' ' <(cut -d' ' -f1-3 f1) <(cut -d' ' -f2-3 f2 | sed 'p;p') > t1
real 0m1.362s
real 0m1.154s
$ time awk '1;1;1' f2 | awk 'FNR==NR{a[FNR]=$2" "$3;next};{$NF=a[FNR]};1' - f1 > t2
real 0m12.088s
real 0m13.028s
$ time awk '{
if (c==3) c=0;
printf "%s %s %s ",$1,$2,$3;
if (!c++){ getline < "f2"; f4=$2; f5=$3 }
printf "%s %s\n",f4,f5
}' f1 > t3
real 0m13.629s
real 0m13.380s
$ time awk '{
if (c==3) c=0;
main_fields=$1 OFS $2 OFS $3;
if (!c++){ getline < "f2"; f4=$2; f5=$3 }
printf "%s %s %s\n", main_fields, f4, f5
}' f1 > t4
real 0m13.265s
real 0m13.896s
$ diff -s t1 t2
Files t1 and t2 are identical
$ diff -s t1 t3
Files t1 and t3 are identical
$ diff -s t1 t4
Files t1 and t4 are identical
Awk solution:
awk '{
if (c==3) c=0;
main_fields=$1 OFS $2 OFS $3;
if (!c++){ getline < "file2.txt"; f4=$2; f5=$3 }
printf "%s %s %s\n", main_fields, f4, f5
}' file1.txt
c - variable reflecting nth coefficient
getline < file - reads the next record from file
f4=$2; f5=$3 - contain the values of the 2nd and 3rd fields from currently read record of file2.txt
The output:
a b c 2 3
d e f 2 3
g h i 2 3
j k l 5 6
m n o 5 6
o q r 5 6
s t u 8 9
v w x 8 9
y z Z 8 9
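The getline loop above does not check whether the read from the second file succeeded. A hedged variant that checks the return value of getline and takes the group size as a parameter might look like this; the temporary file names, the variable names n and side, and the "NA NA" placeholder are all assumptions made for the sketch.

```shell
#!/bin/sh
# Sketch only: file names, n, side and the "NA NA" placeholder are assumptions.
tmpdir=$(mktemp -d)
printf '%s\n' 'a b c row1' 'd e f row2' 'g h i row3' 'j k l row4' > "$tmpdir/file1.txt"
printf '%s\n' '1 2 3' '4 5 6' > "$tmpdir/file2.txt"

merged=$(awk -v n=3 -v side="$tmpdir/file2.txt" '
    (FNR - 1) % n == 0 {
        # First row of a new group of n rows: fetch the next row of the side file.
        if ((getline line < side) > 0) {
            split(line, f, " ")
            extra = f[2] OFS f[3]
        } else {
            extra = "NA NA"   # side file exhausted or unreadable
        }
    }
    { print $1, $2, $3, extra }
' "$tmpdir/file1.txt")

printf '%s\n' "$merged"
rm -rf "$tmpdir"
```

Reading into a named variable (getline line < side) leaves $0 and the fields of the current file1.txt record untouched, so the main record can be printed after the read.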
This is still a lot slower than Sundeep's cut&paste code on the 100,000 lines test (8s vs 21s on my laptop) but perhaps easier to understand than the other Awk solution. (I had to play around for a bit before getting the indexing right, though.)
awk 'NR==FNR { a[FNR] = $2 " " $3; next }
{ print $1, $2, $3, a[1+int((FNR-1)/3)] }' file2.txt file1.txt
This simply keeps (the pertinent part of) file2.txt in memory and then reads file1.txt and writes out the combined lines. That also means it is limited by available memory, whereas Roman's solution will scale to basically arbitrarily large files (as long as each line fits in memory!). On the other hand, this one is slightly faster (I get 28s real time for Roman's script with Sundeep's 100k test data).
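The index expression 1+int((FNR-1)/3) is the part that is easy to get wrong. A quick way to check it is to print the mapping on its own, here for the first seven row numbers; the variable names n and k are just illustrative.

```shell
#!/bin/sh
# For group size n, row k of file1 maps to row 1 + int((k - 1) / n) of file2.
mapping=$(awk -v n=3 'BEGIN {
    for (k = 1; k <= 7; k++)
        print k, 1 + int((k - 1) / n)
}')
printf '%s\n' "$mapping"
```

Rows 1 to 3 map to 1, rows 4 to 6 map to 2, and row 7 starts group 3.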

Find and replace entries in one csv file using another with bash

Main file:
A B
C D
D A
G H
Ref file:
1 A
2 B
3 C
4 D
5 G
6 H
New file:
1 2
3 4
4 1
5 6
I want to do the above replacement. How can I do that using awk or some other simple command line?
awk solution:
awk 'NR==FNR{ a[$2]=$1; next }{ $1=a[$1]; $2=a[$2] }1' reffile mainfile
The output:
1 2
3 4
4 1
5 6
a[$2]=$1 - capturing numbers from reffile into array indexed by letters (e.g. a["A"]=1)
$1=a[$1]; $2=a[$2] - replacing letters in mainfile with respective numbers
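If the main file can have more than two columns, or can contain entries that are missing from the reference file, the same lookup idea can be applied to every field while leaving unknown entries as they are. The following is a sketch under those assumptions, with made-up sample data and temporary file names.

```shell
#!/bin/sh
# Sketch only: sample data and file names are made up for illustration.
tmpdir=$(mktemp -d)
printf '%s\n' '1 A' '2 B' '3 C' > "$tmpdir/reffile"
printf '%s\n' 'A B' 'C X' 'B A C' > "$tmpdir/mainfile"

replaced=$(awk '
    NR == FNR { map[$2] = $1; next }      # first file: letter -> number
    {
        for (i = 1; i <= NF; i++)
            if ($i in map) $i = map[$i]   # replace only known entries
        print
    }
' "$tmpdir/reffile" "$tmpdir/mainfile")

printf '%s\n' "$replaced"
rm -rf "$tmpdir"
```

The ($i in map) test avoids overwriting an unknown entry such as X with an empty string.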

how to use awk to merge files with common fields and print in another file

I have read all the related questions, but I am still quite confused...
I have two files tab separated.
file1 (breaks added for readability):
a 15 bac
g 10 bac
h 11 bac
r 33 arq
t 12 euk
file2 (breaks added for readability):
0 15 h 3 5 2 gf a a g e g s s g g
p 33 g 4 5 2 hg 3 1 3 f 5 h 5 h 6
g 4 r 8 j 9 jk 9 j 9 9 h t 9 k 0
Output desired (breaks added for readability):
bac 15 h 3 5 2 gf a a g e g s s g g
arq 33 g 4 5 2 hg 3 1 3 f 5 h 5 h 6
ND g 4 r 8 j 9 jk 9 j 9 9 h t 9 k 0
Just that. I need to print the complete file2, but with the first column replaced by the third column of file1 whenever $2 of file2 is the same as $2 of file1...
file1 is larger than file2, but it could still happen that $2 from file2 is not present in file1; in that case, print ND in the first column.
I'm sure it must be simple, but I have problems with awk managing two files. Please, if someone could help me...
Using this awk command:
awk 'FNR==NR{a[$2]=$3;next} {$1=(a[$2])?a[$2]:"ND"} 1' file1 file2
bac 15 h 3 5 2 gf a a g e g s s g g
arq 33 g 4 5 2 hg 3 1 3 f 5 h 5 h 6
ND 4 r 8 j 9 jk 9 j 9 9 h t 9 k 0
Explanation:
FNR==NR - Execute this block for first file in input i.e. file1
a[$2]=$3 - Populate an associative array a with key as $2 and value as $3 from file1
next - Read next line until EOF on first file
Now operating in file2
$1=(a[$2])?a[$2]:"ND" - Overwrite $1 with a[$2] if $2 is found in array a, otherwise by literal string "ND"
1 - print the output
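The test (a[$2]) checks whether the stored value is truthy, not whether the key exists, so a stored value such as 0 or an empty string can be reported as ND. A hedged variant uses the in operator for the existence check and also sets tab as the field separator, since the question says the files are tab separated. The sample data below is made up for the sketch.

```shell
#!/bin/sh
# Sketch only: the sample rows and temporary file names are assumptions.
tmpdir=$(mktemp -d)
printf 'a\t15\tbac\nr\t33\tarq\nz\t7\t0\n' > "$tmpdir/file1"
printf '0\t15\th\np\t33\tg\ng\t4\tr\nq\t7\tk\n' > "$tmpdir/file2"

relabeled=$(awk '
    BEGIN { FS = OFS = "\t" }              # read and write tab-separated fields
    NR == FNR { a[$2] = $3; next }         # file1: key on column 2, keep column 3
    { $1 = ($2 in a) ? a[$2] : "ND" } 1    # file2: existence test, not truthiness
' "$tmpdir/file1" "$tmpdir/file2")

printf '%s\n' "$relabeled"
rm -rf "$tmpdir"
```

In this sample the key 7 maps to the value 0, which is kept in the output instead of being replaced by ND.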
You could try with join + awk command as below:
join -t ' ' -a2 -1 2 -2 2 test1.txt test2.txt | awk 'BEGIN { start = 5; end = 18 } { if (NF == 16) { temp = $1; $1 = "ND " $2; $2 = temp; print } else { printf("%s %s ", $3, $1); for (i=start; i<=end; i++) printf ("%s ", $i); printf("\n");}}'

Unix Command (Mac OS): cut and move rows

Could you please give me a hint which unix command I can use to do the following:
I want to convert these lines...
1 a i
2 b ii
3 c iii
4 d iv
5 e v
6 f vi
7 g vii
8 h viii
9 i xi
...into those:
1 a i 4 d iv 7 g vii
2 b ii 5 e v 8 h viii
3 c iii 6 f vi 9 i xi
rs and perl -pne just transpose them, but I need a completely new arrangement, as you can see. Perl code would be preferred, but I am thankful for any help.
cheers
marsch
Using a perl one-liner
perl -lne 'push @{$l[($.-1) % 3]}, $_; }{ print "@$_" for @l' data.txt | column -t
Explanation:
Switches:
-l: Enable line ending processing, specifies line terminator
-n: Creates a while(<>){..} loop for each line in your input file.
-e: Tells perl to execute the code on command line.
Code:
push @{$l[($.-1) % 3]}, $_;: Push each line into an array modulo the line number
}{ print "@$_" for @l: Print the 3 element array at end of processing
| column -t: Even out the columns
I would go with split and paste from coreutils. Try the following commands:
split -l3 infile
paste -d' ' xaa xab xac | column -t
Output:
1 a i 4 d iv 7 g vii
2 b ii 5 e v 8 h viii
3 c iii 6 f vi 9 i xi
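The modulo idea from the Perl one-liner can also be sketched in awk without creating intermediate files. The variable name rows is an assumption, and only the first six input lines are used here to keep the example short.

```shell
#!/bin/sh
# Sketch only: "rows" is an illustrative variable for the number of output rows.
rearranged=$(printf '%s\n' '1 a i' '2 b ii' '3 c iii' '4 d iv' '5 e v' '6 f vi' |
    awk -v rows=3 '
        {
            r = (NR - 1) % rows                          # output row for this line
            out[r] = (NR <= rows) ? $0 : out[r] " " $0   # append to that row
        }
        END { for (i = 0; i < rows; i++) print out[i] }
    ')
printf '%s\n' "$rearranged"
```

The result can be piped through column -t, as in the other answers, to align the columns.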
Here is a oneliner:
perl -ne 'chomp; push @a,$_ if $_; unless($. % 3) {push @f,[@a]; @a = undef; shift @a} END {for my $i (@f) { for (@$i) {print "$_ "} print "\n"}}' filename.txt
output
1 a i 2 b ii 3 c iii
4 d iv 5 e v 6 f vi
7 g vii 8 h viii 9 i xi
I use ruby
string = "1 a i
2 b ii
3 c iii
4 d iv
5 e v
6 f vi
7 g vii
8 h viii
9 i xi "
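ary = string.split("\n").map(&:strip)
# one array per output row, so entries can be joined with spaces
new_ary = Array.new(3) { [] }
ary.each_with_index do |e, i|
  # line i goes to output row i % 3
  new_ary[i % 3] << e
end
puts new_ary.map { |row| row.join(" ") }.join("\n")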
Hope to help:)

AWK -- How to do selective multiple column sorting?

In awk, how can I do this:
Input:
1 a f 1 12 v
2 b g 2 10 w
3 c h 3 19 x
4 d i 4 15 y
5 e j 5 11 z
Desired output, by sorting numerical value at $5:
1 a f 2 10 w
2 b g 5 11 z
3 c h 1 12 v
4 d i 4 15 y
5 e j 3 19 x
Note that the sorting should only affect $4, $5, and $6 (based on the value of $5), while the first part of the table remains intact.
This could be done in multiple steps with the help of paste:
$ gawk '{print $1, $2, $3}' in.txt > a.txt
$ gawk '{print $4, $5, $6}' in.txt | sort -k 2 -n > b.txt
$ paste -d' ' a.txt b.txt
1 a f 2 10 w
2 b g 5 11 z
3 c h 1 12 v
4 d i 4 15 y
5 e j 3 19 x
Personally, I find using awk to safely sort arrays of columns rather tricky because often you will need to hold and sort on duplicate keys. If you need to selectively sort a group of columns, I would call paste for some assistance:
paste -d ' ' <(awk '{ print $1, $2, $3 }' file.txt) <(awk '{ print $4, $5, $6 | "sort -k 2 -n" }' file.txt)
Results:
1 a f 2 10 w
2 b g 5 11 z
3 c h 1 12 v
4 d i 4 15 y
5 e j 3 19 x
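Process substitution is not available in every shell. A sketch of the same split, sort and recombine idea using only a pipe follows: the sorted right-hand columns are fed to a second awk on standard input (the - argument), which then rereads the input file. The temporary file name is an assumption for the example.

```shell
#!/bin/sh
# Sketch only: the temporary input file is created here for illustration.
tmpdir=$(mktemp -d)
printf '%s\n' '1 a f 1 12 v' '2 b g 2 10 w' '3 c h 3 19 x' \
              '4 d i 4 15 y' '5 e j 5 11 z' > "$tmpdir/in.txt"

sorted=$(awk '{ print $4, $5, $6 }' "$tmpdir/in.txt" |
    sort -k2,2n |
    awk 'NR == FNR { rhs[FNR] = $0; next }     # stdin: sorted right-hand columns
         { print $1, $2, $3, rhs[FNR] }        # in.txt: keep left-hand columns
        ' - "$tmpdir/in.txt")

printf '%s\n' "$sorted"
rm -rf "$tmpdir"
```

Like the gawk solution, this holds the sorted right-hand side in memory, and it reads the input file twice.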
This can be done in pure awk, but as @steve said, it's not ideal. gawk has limited sort functions, and awk has no built-in sort at all. That said, here's a (rather hackish) solution using a compare function in gawk:
[ghoti@pc ~/tmp3]$ cat text
1 a f 1 12 v
2 b g 2 10 w
3 c h 3 19 x
4 d i 4 15 y
5 e j 5 11 z
[ghoti@pc ~/tmp3]$ cat doit.gawk
### Function to be called by asort().
function cmp(i1,v1,i2,v2) {
split(v1,a1); split(v2,a2);
if (a1[2]>a2[2]) { return 1; }
else if (a1[2]<a2[2]) { return -1; }
else { return 0; }
}
### Left-hand-side and right-hand-side, are sorted differently.
{
lhs[NR]=sprintf("%s %s %s",$1,$2,$3);
rhs[NR]=sprintf("%s %s %s",$4,$5,$6);
}
END {
asort(rhs,sorted,"cmp"); ### This calls the function we defined, above.
for (i=1;i<=NR;i++) { ### Step through the arrays and reassemble.
printf("%s %s\n",lhs[i],sorted[i]);
}
}
[ghoti@pc ~/tmp3]$ gawk -f doit.gawk text
1 a f 2 10 w
2 b g 5 11 z
3 c h 1 12 v
4 d i 4 15 y
5 e j 3 19 x
[ghoti@pc ~/tmp3]$
This keeps your entire input file in arrays, so that lines can be reassembled after the sort. If your input is millions of lines, this may be problematic.
Note that you might want to play with the printf and sprintf functions to set appropriate output field separators.
You can find documentation on using asort() with functions in the gawk man page; look for PROCINFO["sorted_in"].
