Sorting multiple arrays simultaneously in awk

Introduction
Consider the following example sort.awk:
BEGIN {
a[1]="5";
a[2]="3";
a[3]="6";
asort(a)
for (i=1; i<=3; i++) print a[i]
}
Running it with gawk -f sort.awk (asort() is a GNU awk extension) prints the numbers in array a sorted in ascending order:
3
5
6
Question
Consider the extended case of two (or, in general, N) corresponding arrays a and b:
a[1]="5"; b[1]="fifth"
a[2]="3"; b[2]="third"
a[3]="6"; b[3]="sixth"
and the problem of sorting all the arrays "simultaneously". To achieve this, I need to sort array a but also obtain the indices of the sort. For this simple case, the indices would be given by
ind[1]=2; ind[2]=1; ind[3]=3;
Having these indices, I can then also print the b array in the order given by sorting a. For instance:
for (i=1;i<=3;i++) print a[ind[i]], b[ind[i]]
will print the sorted arrays.
See also Sort associative array with AWK.
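For comparison, the index array ind described in the question is what many languages call an argsort. Below is a minimal sketch in Python (not awk), assuming the values in a should be compared as numbers; the helper name argsort is illustrative. Python lists are 0-based, so awk's ind[1]=2; ind[2]=1; ind[3]=3 shows up here as [1, 0, 2].

```python
def argsort(values, key=float):
    """Return the positions of values in ascending order of key(value)."""
    # sorted() is stable, so equal values keep their original relative order.
    return sorted(range(len(values)), key=lambda i: key(values[i]))


# The question's parallel arrays (0-based here, 1-based in awk).
a = ["5", "3", "6"]
b = ["fifth", "third", "sixth"]

ind = argsort(a)

# Apply the same permutation to every parallel array.
for i in ind:
    print(a[i], b[i])
```

The same ind can be applied to any number of parallel arrays, which is the N-array case from the question.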

I came up with two methods to do your "simultaneous" sort.
One is to combine the two arrays and then sort. This is useful when you just need the output.
The other one uses gawk's asorti().
Read the code for details; I think it is easy to understand:
BEGIN{
a[1]="5"; b[1]="fifth"
a[2]="3"; b[2]="third"
a[3]="6"; b[3]="sixth"
#method 1: combine the two arrays before sort
for(;++i<=3;)
n[i] = a[i]" "b[i]
asort(n)
print "--- method 1: ---"
for(i=0;++i<=3;)
print n[i]
#method 2:
#here we build a new array/hastable, and use asorti()
for(i=0;++i<=3;)
x[a[i]]=b[i]
asorti(x,t)
print "--- method 2: ---"
for(i=0;++i<=3;)
print t[i],x[t[i]]
}
output:
kent$ awk -f sort.awk
--- method 1: ---
3 third
5 fifth
6 sixth
--- method 2: ---
3 third
5 fifth
6 sixth
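One caveat with both methods: they compare strings. Method 1 sorts the combined strings, and asorti() sorts array subscripts, which in awk are always strings. With single-digit values that matches numeric order, but once a value such as 10 appears the order becomes lexical. The Python sketch below (an illustration, not awk) uses a hypothetical fourth pair 10/tenth to show the difference.

```python
# A hypothetical fourth pair, added to expose the string-vs-number issue.
a = ["5", "3", "6", "10"]
b = ["fifth", "third", "sixth", "tenth"]

combined = [x + " " + y for x, y in zip(a, b)]

# Plain string comparison, as when sorting the combined strings.
lexical = sorted(combined)

# Comparing the leading number numerically instead.
numeric = sorted(combined, key=lambda s: float(s.split(" ", 1)[0]))

print(lexical)  # '10 tenth' sorts before '3 third'
print(numeric)
```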
EDIT
If you want to get the original index, you can try method 3 as follows:
#method 3:
print "--- method 3: ---"
for(i=0;++i<=3;)
c[a[i]] = i;
asort(a)
for(i=0;++i<=3;)
print a[i], " | related element in b: "b[c[a[i]]], " | original idx: " c[a[i]]
the output is:
--- method 3: ---
3 | related element in b: third | original idx: 2
5 | related element in b: fifth | original idx: 1
6 | related element in b: sixth | original idx: 3
As you can see, the original index is there. If you want to save the indices into an array, just add idx[i]=c[a[i]] inside the second for loop.
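Method 3 (and method 2) key a lookup table on the values of a, so repeated values collide: c[a[i]] = i keeps only the last index seen for each value. The Python sketch below, with hypothetical input containing two 5s, contrasts that lookup with sorting the positions directly.

```python
# Hypothetical input with a repeated value in a.
a = ["5", "3", "5"]
b = ["fifth", "third", "another fifth"]

# Value -> index lookup, like c[a[i]] = i in method 3: the last write wins.
c = {}
for i, value in enumerate(a):
    c[value] = i
lookup_idx = [c[v] for v in sorted(a, key=float)]

# Sorting the positions directly keeps one entry per element.
argsort_idx = sorted(range(len(a)), key=lambda i: float(a[i]))

print(lookup_idx)   # index 2 appears twice, index 0 is lost
print(argsort_idx)
print([b[i] for i in argsort_idx])
```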
EDIT2
Method 4: combine each value with its index, sort, then split to get the idx array:
#method 4:
for(i=0;++i<=3;)
m[i] = a[i]"\x99"i
asort(m)
print "--- method 4: ---"
for(i=0;++i<=3;){
split(m[i],x,"\x99")
ind[i]=x[2]
}
#test ind array:
for(i=0;++i<=3;)
print i"->"ind[i]
output:
--- method 4: ---
1->2
2->1
3->3
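Method 4 is a decorate-sort-undecorate: attach each element's index, sort, then strip the index off again. In a language with tuples, no separator such as "\x99" is needed. A minimal Python sketch, assuming numeric values and keeping the indices 1-based to match the awk output:

```python
a = {1: "5", 2: "3", 3: "6"}  # 1-based, like the awk array

# Decorate: pair each value (as a number) with its original index.
decorated = [(float(value), i) for i, value in a.items()]

# Sort: tuples compare element by element, so ties fall back to the index.
decorated.sort()

# Undecorate: keep only the original indices, numbered from 1.
ind = {rank: i for rank, (_, i) in enumerate(decorated, start=1)}

for rank in sorted(ind):
    print(f"{rank}->{ind[rank]}")
```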

Based on Kent's answer, here is a solution that also obtains the indices:
BEGIN {
a[1]="5";
a[2]="3";
a[3]="6";
for (i=1; i<=3; i++) b[i]=a[i]" "i
asort(b)
for (i=1; i<=3; i++) {
split(b[i],c," ")
ind[i]=c[2]
}
for (i=1; i<=3; i++) print ind[i]
}

Related

Perl sorting Alpha characters in a special way

I know this question may have been asked a million times, but I am stumped. I have an array that I am trying to sort. The results I want to get are:
A
B
Z
AA
BB
The available sort routines don't sort it this way. I am not sure if it can be done. Here is my Perl script and the sorting that I am doing. What am I missing?
# header
use warnings;
use strict;
use Sort::Versions;
use Sort::Naturally 'nsort';
print "Perl Starting ... \n\n";
my @testArray = ("Z", "A", "AA", "B", "AB");
#sort1
my @sortedArray1 = sort @testArray;
print "\nMethod1\n";
print join("\n",@sortedArray1),"\n";
my @sortedArray2 = nsort @testArray;
print "\nMethod2\n";
print join("\n",@sortedArray2),"\n";
my @sortedArray3 = sort { versioncmp($a,$b) } @testArray;
print "\nMethod3\n";
print join("\n",@sortedArray3),"\n";
print "\nPerl End ... \n\n";
1;
OUTPUT:
Perl Starting ...
Method1
A
AA
AB
B
Z
Method2
A
AA
AB
B
Z
Method3
A
AA
AB
B
Z
Perl End ...
I think what you want is to sort by length and then by string (character code) order. This is easily managed with:
my @sortedArray = sort {
length $a <=> length $b ||
$a cmp $b
} @testArray;
That reads exactly like the English: sort by the length of a vs. b, then by a compared to b.
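The same "length first, then string comparison" ordering can be written in Python with a tuple sort key. This sketch is an illustration outside Perl, using the question's test data:

```python
test_array = ["Z", "A", "AA", "B", "AB"]

# Sort by length first; strings of equal length are compared normally.
sorted_array = sorted(test_array, key=lambda s: (len(s), s))
print("\n".join(sorted_array))
```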
my #sorted =
sort {
length($a) <=> length($b)
||
$a cmp $b
}
@unsorted;
or
# Strings must have no characters above 255, and
# they must be shorter than 2^32 characters long.
my @sorted =
map substr($_, 4),
sort
map pack("N/a*", $_),
@unsorted;
or
use Sort::Key::Maker sort_by_length => sub { length($_), $_ }, qw( int str );
my @sorted = sort_by_length @unsorted;
The second is the most complicated, but it should be the fastest. The last one should be faster than the first.
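The pack version works because "N/a*" writes the string's length as a 4-byte big-endian unsigned integer in front of the string, so an ordinary byte-wise sort orders by length first and content second; substr($_, 4) then removes the prefix. A rough Python analogue of the same trick, assuming byte strings and using struct.pack(">I", ...) for the prefix:

```python
import struct

unsorted = [b"Z", b"A", b"AA", b"B", b"AB"]

# Prefix each string with its length as a 4-byte big-endian unsigned int.
packed = [struct.pack(">I", len(s)) + s for s in unsorted]

# A plain byte-wise sort now orders by length first, then by content.
packed.sort()

# Strip the 4-byte prefix again.
result = [p[4:] for p in packed]
print(result)
```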

How can I print a Lua table in order?

I have a table that I need to print out in order. I know Lua tables are not ordered, but I am having a terrible time printing it out in an ordered way. I have copied a dozen code snippets from this site and I just cannot get it to work.
Say I have a table like this:
local tableofStuff = {}
tableofStuff['White'] = 15
tableofStuff['Red'] = 55
tableofStuff['Orange'] = 5
tableofStuff['Pink'] = 12
How can I get it to print like this...
Red, 55
White, 15
Pink, 12
Orange, 5
Using a line like this inside a loop...
print(k..', '..v)
You can store the key/value pairs in an array, sort the array by the second element, and loop through that array. (This example uses tail recursion, because that's how I felt like doing it.)
local tableofStuff = {}
tableofStuff['White'] = 15
tableofStuff['Red'] = 55
tableofStuff['Orange'] = 5
tableofStuff['Pink'] = 12
-- We need this function for sorting.
local function greater(a, b)
return a[2] > b[2]
end
-- Populate the array with key,value pairs from hashTable.
local function makePairs(hashTable, array, _k)
local k, v = next(hashTable, _k)
if k then
table.insert(array, {k, v})
return makePairs(hashTable, array, k)
end
end
-- Print the pairs from the array.
local function printPairs(array, _i)
local i = _i or 1
local pair = array[i]
if pair then
local k, v = table.unpack(pair)
print(k..', '..v)
return printPairs(array, i + 1)
end
end
local array = {}
makePairs(tableofStuff, array)
table.sort(array, greater)
printPairs(array)
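For comparison, the same approach (collect the key/value pairs, sort them by value in descending order, print them) in Python, with a dict standing in for the Lua table. This is an illustrative sketch, not a Lua solution:

```python
table_of_stuff = {"White": 15, "Red": 55, "Orange": 5, "Pink": 12}

# Collect (key, value) pairs and sort them by value, largest first.
pairs = sorted(table_of_stuff.items(), key=lambda kv: kv[1], reverse=True)

for k, v in pairs:
    print(f"{k}, {v}")
```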

Perl sqrt, cube issue: 1 showing up after each line

I am having a tiny issue with a small Perl script using arithmetic operators. After my cube root and square root output, a 1 shows up. I was testing this script on an openSUSE 42.1 VM.
I'm just not sure what the 1 after each line is. I have tried looking it up, but I'm still not certain. I mainly script in bash and ksh, so this Perl syntax is a bit new to me.
Script:
#!/usr/bin/perl
# Provide a sum, cube of the sum, and square root of the sum of three set variables
# Set variables
$v1=10;
$v2=9;
$v3=8;
$val=$v1+$v2+$v3;
$cube=$val ** (1/3);
$square= sqrt($val);
print "Sum of 10, 9, 8: $val\n";
print
print "Cube of Sum: $cube\n";
print
print "Square of Sum: $square\n";
print
print "Thanks for using this script!"
Your lines just saying
print
are not statements in themselves as they are not terminated by a ;. Instead they are part of statements of the form
print print "text";
The inner print has an argument of "text" and prints that; the outer print has an argument of print "text" and prints the value of that. When successful, print returns 1 (perldoc only says it returns true, so don't rely on it being 1), so a 1 is printed.
If the point was to format your output nicely, you should explicitly print "\n".
As has been explained, half of your print calls are printing the return value of the following print statement, because you are missing a semicolon at the end of the line to terminate the statement.
Also, print on its own prints the value of the default variable $_, not a newline as you expected. You need to write print "\n"; to achieve what you intend.
It's also very important to add use strict and use warnings 'all' to the top of every Perl program you write. You will also need to declare all of your variables using my.
#!/usr/bin/perl
use strict;
use warnings 'all';
# Provide a sum, cube of the sum, and square root of the sum of three set variables
# Set variables
my $v1 = 10;
my $v2 = 9;
my $v3 = 8;
my $val = $v1 + $v2 + $v3;
my $cube = $val**( 1 / 3 );
my $square = sqrt($val);
print "Sum of 10, 9, 8: $val\n";
print "\n";
print "Cube root of Sum: $cube\n";
print "\n";
print "Square root of Sum: $square\n";
print "\n";
print "Thanks for using this script!\n";
print "\n";
output
Sum of 10, 9, 8: 27
Cube root of Sum: 3
Square root of Sum: 5.19615242270663
Thanks for using this script!
It's also worth pointing out that there's a construct called a here document that lets you do this more neatly and clearly. If you replace those print statements with just one, like this, the intention is clear and the output is identical to that of the code above:
print <<END;
Sum of 10, 9, 8: $val
Cube root of Sum: $cube
Square root of Sum: $square
Thanks for using this script!
END
As Henrik states in his answer, the lines with print and no ; are the problem.
An alternative way to get Perl to print a blank line between the main lines of output is to add an additional newline character, \n, at the end of each of the print lines. The code would become:
#!/usr/bin/perl
# Provide a sum, cube of the sum, and square root of the sum of three set variables
# Set variables
$v1=10;
$v2=9;
$v3=8;
$val=$v1+$v2+$v3;
$cube=$val ** (1/3);
$square= sqrt($val);
print "Sum of 10, 9, 8: $val\n\n";
print "Cube of Sum: $cube\n\n";
print "Square of Sum: $square\n\n";
print "Thanks for using this script!\n";
The output is:
Sum of 10, 9, 8: 27
Cube of Sum: 3
Square of Sum: 5.19615242270663
Thanks for using this script!
By the way, your equation for calculating the cube of the sum actually calculates the cube root. To calculate the cube of the sum, you need:
$cube=$val ** (3);
Likewise, your equation to find the square of the sum is calculating the square root, not the square. To find the square of the sum you need to raise the sum to the power of 2.
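To make the difference concrete for the sum used here (27): the power 1/3 gives the cube root and sqrt() the square root, while the cube and the square need the powers 3 and 2. A quick Python check; note that 27 ** (1/3) is a floating-point approximation that may come out slightly above 3, which Perl's default number formatting displays as 3:

```python
import math

val = 10 + 9 + 8  # 27, as in the script

cube_root = val ** (1 / 3)     # what the script's $cube computes
square_root = math.sqrt(val)   # what the script's $square computes
cube = val ** 3                # the cube of the sum
square = val ** 2              # the square of the sum

print(cube_root, square_root, cube, square)
```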

Unix Bash...To sum up each row in a csv file, from the second entry onwards and then find the highest number from the sum of rows

I have created a one-liner that sums up each row in a CSV file, from the second entry onwards. But I want to find the highest number among the row sums.
Example file contents (there are thousands of rows):
03/Mar/2016:00:14,19772,7494,11293,9467
03/Mar/2016:00:15,18041,13241,9715,8968
03/Mar/2016:00:16,17441,13534,9926,9301
03/Mar/2016:00:17,17709,14243,9022,9209
03/Mar/2016:00:18,16368,13535,8761,8313
03/Mar/2016:00:19,17074,13224,8868,7789
03/Mar/2016:00:20,16783,13666,9499,8763
03/Mar/2016:00:21,16665,12962,8821,8862
Example script:
This is what I have achieved by calculating each row's sum, but I just need to find the highest number among the calculated sums. Any ideas?
awk 'BEGIN {FS=OFS=","} {sum=0; for(i=2;i<=NF;i++) {sum+=$i}; print $0,"sum:"sum}' /tmp/101.20160304.csv
cheers
awk is quite capable of remembering a maximum value.
awk -F, '
# for every row, calculate the sum
{sum = 0; for (i=2; i<=NF; i++) sum += $i}
# set the max value (if the first row, initialize the max value)
NR == 1 || sum > max {max = sum}
END {print max}
' file
For your sample data, this is the max:
50202
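The same single-pass idea (sum each row from the second field on, keep the running maximum, initialize it on the first row) as a Python sketch, with the sample rows inlined instead of reading /tmp/101.20160304.csv:

```python
import csv
import io

# The sample rows from the question (the real file has thousands of rows).
SAMPLE = """\
03/Mar/2016:00:14,19772,7494,11293,9467
03/Mar/2016:00:15,18041,13241,9715,8968
03/Mar/2016:00:16,17441,13534,9926,9301
03/Mar/2016:00:17,17709,14243,9022,9209
03/Mar/2016:00:18,16368,13535,8761,8313
03/Mar/2016:00:19,17074,13224,8868,7789
03/Mar/2016:00:20,16783,13666,9499,8763
03/Mar/2016:00:21,16665,12962,8821,8862
"""

max_sum = None
for row in csv.reader(io.StringIO(SAMPLE)):
    if not row:
        continue  # skip blank lines
    # Sum from the second field onwards, as in the awk loop starting at i=2.
    row_sum = sum(int(field) for field in row[1:])
    # Initialize on the first row, then keep the larger value.
    if max_sum is None or row_sum > max_sum:
        max_sum = row_sum

print(max_sum)
```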
You can pipe your awk output to:
awk_output|sort -t':' -nrk4|head -1
This sorts by the sum in descending order, then picks the first row. With ':' as the separator, the fourth field is the number after "sum:". Of course, you could also rewrite your awk to do this in one shot.

Need to calculate standard deviation from an array using bash and awk?

Guys, I'm new to awk and I'm struggling with the awk command to find the standard deviation.
I have got the mean using the following:
echo ${GfieldList[@]} | awk 'NF {sum=0;for (i=1;i<=NF;i++)sum+=$i; print "Mean= " sum / NF; }'
Standard Deviation formula is:
sqrt((1/N)*(sum of (value - mean)^2))
I have found the mean using the command above.
Can you guys help me with the awk command for this one?
An alternate formula for the standard deviation is the square root of the quantity: (the mean square minus the square of the mean). This is used below:
$ echo 20 21 22 | awk 'NF {sum=0;ssq=0;for (i=1;i<=NF;i++){sum+=$i;ssq+=$i**2}; print "Std Dev=" (ssq/NF-(sum/NF)**2)**0.5}'
Std Dev=0.816497
Notes:
In awk, NF is the number of "fields" on a line. In our case, every field is a number, so NF is the number of numbers on a given line.
ssq is the sum of the squares of each number on the line. Thus, ssq/NF is the mean square.
sum is the sum of the numbers on the line. Thus sum/NF is the mean and (sum/NF)**2 is the square of the mean.
As per the formula, then, the standard deviation is (ssq/NF-(sum/NF)**2)**0.5.
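The shortcut follows from expanding the square in the question's formula, with \mu denoting the mean of the N values:

```latex
\begin{aligned}
\sigma^2 &= \frac{1}{N}\sum_{i=1}^{N}(x_i-\mu)^2 \\
         &= \frac{1}{N}\sum_{i=1}^{N}x_i^2 - 2\mu\,\frac{1}{N}\sum_{i=1}^{N}x_i + \mu^2 \\
         &= \frac{1}{N}\sum_{i=1}^{N}x_i^2 - 2\mu^2 + \mu^2 \\
         &= \frac{1}{N}\sum_{i=1}^{N}x_i^2 - \mu^2
\end{aligned}
```

In the awk code, the first term is ssq/NF and \mu is sum/NF. Subtracting two nearly equal numbers can lose precision when the values are large compared with their spread, so the two-pass form in the other answer is the more robust of the two.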
The awk code
NF
This serves as a condition: the statements which follow will only be executed if the number of fields on this line, NF, evaluates to true, meaning non-zero. In other words, this condition will cause empty lines to be skipped.
sum=0;ssq=0;
This initializes sum and ssq to zero. This is only needed if there is more than one line of input.
for (i=1;i<=NF;i++){sum+=$i;ssq+=$i**2}
This puts the sum of all the numbers in sum and the sum of the square of the numbers in ssq.
print "Std Dev=" (ssq/NF-(sum/NF)**2)**0.5
This prints out the standard deviation.
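The same one-pass calculation as a Python sketch (the function name is illustrative). It clamps the variance at zero, since rounding can make the difference slightly negative for near-constant data:

```python
import math


def std_dev_one_pass(numbers):
    """Population standard deviation as sqrt(mean square - square of mean)."""
    n = len(numbers)
    total = sum(numbers)                # awk's sum
    ssq = sum(x * x for x in numbers)   # awk's ssq
    variance = ssq / n - (total / n) ** 2
    # Rounding can push the variance slightly below zero for near-constant data.
    return math.sqrt(max(0.0, variance))


print("Std Dev=%.6g" % std_dev_one_pass([20, 21, 22]))
```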
Once you know the mean:
awk '{
for (i = 1;i <= NF; i++) {
sum += $i
};
print sum / NF
}' # for 2, 4, 4, 4, 5, 5, 7, 9 gives 5
then the standard deviation can be found thus:
awk -vM=5 '{
for (i = 1; i <= NF; i++) {
sum += ($i-M) * ($i-M)
};
print sqrt (sum / NF)
}' # for 2, 4, 4, 4, 5, 5, 7, 9 gives 2
In "compressed" form:
awk '{for(i=1;i<=NF;i++){sum+=$i};print sum/NF}'
awk -vM=5 '{for(i=1;i<=NF;i++){sum+=($i-M)*($i-M)};print sqrt(sum/NF)}'
(changing the value for M to the actual mean extracted from the first command).
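Running both passes in one program avoids copying the mean by hand. A Python sketch of the same two-pass calculation (function name illustrative); Python's standard library also offers statistics.pstdev for the population standard deviation:

```python
import math


def std_dev_two_pass(numbers):
    """Population standard deviation: compute the mean, then the squared deviations."""
    n = len(numbers)
    mean = sum(numbers) / n                          # first pass (the first awk command)
    sq_dev = sum((x - mean) ** 2 for x in numbers)   # second pass (awk with -vM=<mean>)
    return math.sqrt(sq_dev / n)


data = [2, 4, 4, 4, 5, 5, 7, 9]
print(sum(data) / len(data), std_dev_two_pass(data))
```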
