Sort an associative array in awk - sorting

I have an associative array in awk that gets populated like this:
chr_count[$3]++
When I try to print my chr_counts, I use this:
for (i in chr_count) {
print i,":",chr_count[i];
}
But not surprisingly, the order of i is not sorted in any way.
Is there an easy way to iterate over the sorted keys of chr_count?

Instead of asort, use asorti(source, destination), which sorts the indices into a new array, so you won't have to copy the array yourself.
Then you can use the destination array as pointers into the source array.
For your example, you would use it like this:
n=asorti(chr_count, sorted)
for (i=1; i<=n; i++) {
print sorted[i] " : " chr_count[sorted[i]]
}

You can also pipe the output through the sort command, e.g.:
for ( i in data )
print i ":", data[i] | "sort"

I recently came across this issue and found that with gawk I could set the value of PROCINFO["sorted_in"] to control iteration order. I found a list of valid values for this by searching for PROCINFO online and landed on this GNU Awk User's Guide page: https://www.gnu.org/software/gawk/manual/html_node/Controlling-Scanning.html
This lists options of the form @{ind|val}_{num|type|str}_{asc|desc} with:
ind sorting by key (index) and val sorting by value.
num sorting numerically, str by string and type by assigned type.
asc for ascending order and desc for descending order.
I simply used:
PROCINFO["sorted_in"] = "@val_num_desc"
for (i in map) print i, map[i]
And the output was sorted in descending order of values.
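For comparison, here is a minimal sketch of the same two iteration orders outside awk, written in Python. The contents of `chr_count` are made-up sample data, not taken from the question, and the helper names `by_key` and `by_value_desc` are hypothetical. The first helper corresponds to iterating over sorted indices, the second to the descending-by-value order used above.

```python
# Hypothetical sample counts; the real chr_count would be built from column 3.
chr_count = {"chr2": 7, "chr10": 3, "chr1": 12}

def by_key(counts):
    """Return (key, value) pairs ordered by key (string comparison)."""
    return [(k, counts[k]) for k in sorted(counts)]

def by_value_desc(counts):
    """Return (key, value) pairs ordered by value, largest first."""
    return sorted(counts.items(), key=lambda kv: kv[1], reverse=True)

for key, value in by_key(chr_count):
    print(key, ":", value)

for key, value in by_value_desc(chr_count):
    print(key, value)
```

Note that the key order here is a string order, so chr10 comes before chr2; a numeric order would need a custom key function.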

Note that asort() and asorti() are specific to gawk, and are unknown to awk. For plain awk, you can roll your own sort() or get one from elsewhere.

This is taken directly from the documentation:
# populate the array data
# copy indices
j = 1
for (i in data) {
ind[j] = i # index value becomes element value
j++
}
n = asort(ind) # index values are now sorted
for (i = 1; i <= n; i++) {
# do something with ind[i]: work with sorted indices directly
...
# do something with data[ind[i]]: access original array via sorted indices
}

Related

moving sql logic to backend - bash

Some of our SQL logic is moving to the backend, and I need to generate a report using shell scripting.
To make it easier to follow, I've simplified the problem as follows.
My input file - sales.txt (id, price, month)
101,50,2019-10
101,80,2020-08
101,80,2020-10
201,100,2020-09
201,350,2020-10
The output should cover a 6-month window for each id, e.g. t1=2020-07 to t2=2020-12:
101,50,2020-07
101,80,2020-08
101,80,2020-09
101,80,2020-10
101,80,2020-11
101,80,2020-12
201,100,2020-09
201,350,2020-10
201,350,2020-11
201,350,2020-12
For id 101, though there is no entry for 2020-07, it should take from the immediate previous month value that is available in the sales file.
So the price=50 from 2019-10 is used for 2020-07.
For 201, the first entry itself is from 2020-09, so 2020-08 and 2020-07 are not applicable.
Wherever there are gaps the immediate previous month value should be propagated.
I'm trying to use awk to solve this problem. I'm creating a reusable script, util.awk, shown below,
to generate the missing values, pipe the result to the sort command, and then run util.awk again for the final output.
util.awk
function get_month(a,b,t1) { return strftime("%Y%m",mktime(a " " b t1)) }
BEGIN { ss=" 0 0 0 "; ts1=" 1 " ss; ts2=" 35 " ss ; OFS="," ; x=1 }
{
tsc=get_month($3,$4,ts1);
if ( NR>1 && $1==idp )
{
if( tsc == tsp) { print $1,$2,get_month($3,$4,ts1); x=0 }
else { for(i=tsp; i < tsc; i=get_month(j1,j2,i) )
{
j1=substr(i,1,4); j2=substr(i,5,2);
print $1,tpr,i;
}
}
}
tsp=get_month($3,$4,ts2);
idp=$1;
tpr=$2;
if(x!=0) print $1,$2,tsc
x=1;
}
But it runs forever when I invoke it like this:
awk -F"[,-]" -f util.awk sales.txt
Though I tried in awk, I welcome other answers as well that would work in bash environment.
General plan:
assumption: sales.txt is already sorted (numerically) by the first column
user provides the min->max date range to be displayed (awk variables mindt and maxdt)
for a distinct id value we'll load all prices and dates into an array (prices[])
dates will be used as the indices of an associative array to store prices (prices[YYYY-MM])
once we've read all records for a given id ...
sort the prices[] array by the indices (ie, sort by YYYY-MM)
find the price for the max date less than mindt (save as prevprice)
for each date between mindt and maxdt (inclusive), if we have a price then display it (and save as prevprice) else ...
if we don't have a price but we do have a prevprice then use this prevprice as the current date's price (ie, fill the gap with the previous price)
One (GNU) awk idea:
mindate='2020-07'
maxdate='2020-12'
awk -v mindt="${mindate}" -v maxdt="${maxdate}" -v OFS=',' -F',' '
# function to add "months" (number) to "indate" (YYYY-MM)
function add_month(indate,months) {
dhms="1 0 0 0" # default day/hr/min/secs
split(indate,arr,"-")
yr=arr[1]
mn=arr[2]
return strftime("%Y-%m", mktime(arr[1]" "(arr[2]+months)" "dhms))
}
# function to print the list of prices for a given "id"
function print_id(id) {
if ( length(prices) == 0 ) # if prices array is empty then do nothing (ie, return)
return
PROCINFO["sorted_in"]="@ind_str_asc" # sort prices[] array by index in ascending order
for ( i in prices ) # loop through indices (YYYY-MM)
{ if ( i < mindt ) # as long as less than mindt
prevprice=prices[i] # save the price
else
break # no more pre-mindt indices to process
}
for ( i=mindt ; i<=maxdt ; i=add_month(i,1) ) # for our mindt - maxdt range
{ if ( !(i in prices) && prevprice ) # if no entry in prices[], but we have a prevprice, then ...
prices[i]=prevprice # set prices[] to prevprice (ie, fill the gap)
if ( i in prices ) # if we have an entry in prices[] then ...
{ prevprice=prices[i] # update prevprice (for filling future gap) and ...
print id,prices[i],i # print our data to stdout
}
}
}
BEGIN { split("",prices) } # pre-declare prices as an array
previd != $1 { print_id(previd) # when id changes print the prices[] array, then ...
previd=$1 # reset some variables for processing of the next id and ...
prevprice=""
delete prices # delete the prices[] array
}
{ prices[$3]=$2 } # for the current record create an entry in prices[]
END { print_id(previd) } # flush the last set of prices[] to stdout
' sales.txt
NOTE: This assumes sales.txt is sorted (numerically) by the first field; if this is not true then the last line should be changed to ' <(sort -n sales.txt)
This generates:
101,50,2020-07
101,80,2020-08
101,80,2020-09
101,80,2020-10
101,80,2020-11
101,80,2020-12
201,100,2020-09
201,350,2020-10
201,350,2020-11
201,350,2020-12
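As a cross-check of the plan above, here is a minimal sketch of the same gap-filling logic in Python rather than awk. The function names `next_month` and `fill_window` are hypothetical, the rows are the sample data from the question, and ids are ordered as strings, which is an assumption that happens to match the sample.

```python
def next_month(ym):
    """Return the month after ym, both formatted as YYYY-MM."""
    year, month = map(int, ym.split("-"))
    if month == 12:
        return f"{year + 1:04d}-01"
    return f"{year:04d}-{month + 1:02d}"

def fill_window(rows, mindt, maxdt):
    """Forward-fill prices per id over the inclusive window mindt..maxdt.

    rows is an iterable of (id, price, "YYYY-MM") tuples in any order.
    """
    prices = {}  # id -> {month: price}
    for id_, price, month in rows:
        prices.setdefault(id_, {})[month] = price

    out = []
    for id_ in sorted(prices):  # assumption: string order of ids is acceptable
        by_month = prices[id_]
        # Latest known price strictly before the window, if any.
        earlier = [m for m in by_month if m < mindt]
        prev = by_month[max(earlier)] if earlier else None
        month = mindt
        while month <= maxdt:
            if month in by_month:
                prev = by_month[month]
            if prev is not None:  # skip months before the id's first price
                out.append((id_, prev, month))
            month = next_month(month)
    return out

sales = [
    ("101", 50, "2019-10"), ("101", 80, "2020-08"), ("101", 80, "2020-10"),
    ("201", 100, "2020-09"), ("201", 350, "2020-10"),
]
for row in fill_window(sales, "2020-07", "2020-12"):
    print(",".join(str(field) for field in row))
```

The sketch relies on YYYY-MM strings comparing correctly as plain strings, so no date parsing is needed beyond stepping to the next month.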
I hope I understood your question a bit. The following awk should do the trick
$ awk -v t1="2020-07" -v d="6" '
function next_month(d,a) {
split(d,a,"-"); a[2]==12?a[1]++ && a[2]=1 : a[2]++
return sprintf("%0.4d-%0.2d",a[1],a[2])
}
BEGIN{FS=OFS=",";t2=t1; for(i=1;i<=d;++i) t2=next_month(t2)}
{k[$1]}
($3<t1){a[$1,t1]=$2}
(t1 <= $3 && $3 < t2) { a[$1,$3]=$2 }
END{ for (key in k) {
p=""; t=t1;
for(i=1;i<=d;++i) {
if(p!="" || (key,t) in a) print key, ((key,t) in a ? p=a[key,t] : p), t
t=next_month(t)
}
}
}' input.txt
We implemented a straightforward function next_month that computes the next month based on a format YYYY-MM. Based on the duration of d months, we compute the time-period that should be shown in the BEGIN block. The time-period of interest is t1 <= t < t2.
Every time we read a record/line, we record its key in the array k. This way we know which keys have been seen up to this point.
For all times before the time-period of interest, we store the value in the array a with index (key,t1), while for all other times we store the value in the array a with index (key,$3).
When the file is fully processed, we just cycle over all keys and print the output. We used a bit of logic, to check whether or not the month was listed in the original file.
Note: the output will be per key sorted in time, but the key will not appear in the same order as in the original file.

How to sort a name (String) list in Swift?

There is a name list without any order.
How to sort name list by alphabet order?
What I mean is how to compare two strings to find out greater one?
Is there any existing method or function to do this like Java does?
In Java, we can use
"abc".compareTo("abb");
to compare strings greater or smaller.
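Yes. You can use the array's sorted(by:) method, like this:
let names = ["Chris", "Alex", "Ewa", "Barry", "Daniella"]
// Swift 3 and later: sorted(by:) returns a new array (in Swift 2 this was spelled names.sort(...)).
let sortedNames = names.sorted(by: { s1, s2 in s1 < s2 })
// let sortedNames = names.sorted { $0 < $1 } // Shorter version of the closure.
print(sortedNames)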

Sorting a Lua table by key

I have gone through many questions and Google results but couldn't find the solution.
I am trying to sort a table using table.sort function in Lua but I can't figure out how to use it.
I have a table that has keys as random numeric values. I want to sort them in ascending order. I have gone through the Lua wiki page also but table.sort only works with the table values.
t = { [223]="asd", [23]="fgh", [543]="hjk", [7]="qwe" }
I want it like:
t = { [7]="qwe", [23]="fgh", [223]="asd", [543]="hjk" }
You cannot set the order in which the elements are retrieved from the hash (which is what your table is) using pairs. You need to get the keys from that table, sort the keys as its own table, and then use those sorted keys to retrieve the values from your original table:
local t = { [223]="asd", [23]="fgh", [543]="hjk", [7]="qwe" }
local tkeys = {}
-- populate the table that holds the keys
for k in pairs(t) do table.insert(tkeys, k) end
-- sort the keys
table.sort(tkeys)
-- use the keys to retrieve the values in the sorted order
for _, k in ipairs(tkeys) do print(k, t[k]) end
This will print
7 qwe
23 fgh
223 asd
543 hjk
Another option would be to provide your own iterator instead of pairs to iterate the table in the order you need, but the sorting of the keys may be simple enough for your needs.
What was said by @lhf is true: your Lua table holds its contents in whatever order the implementation finds feasible. However, if you want to print it (or iterate over it) in sorted order, that is possible (so you can compare it element by element). You can do it in the following way:
for key, value in orderedPairs(mytable) do
print(string.format("%s:%s", key, value))
end
Unfortunately, orderedPairs is not provided as a part of lua, you can copy the implementation from here though.
The Lua sort docs provide a good solution
local function pairsByKeys (t, f)
local a = {}
for n in pairs(t) do table.insert(a, n) end
table.sort(a, f)
local i = 0 -- iterator variable
local iter = function () -- iterator function
i = i + 1
if a[i] == nil then return nil
else return a[i], t[a[i]]
end
end
return iter
end
Then you traverse the sorted structure
local t = { b=1, a=2, z=55, c=0, qa=53, x=8, d=7 }
for key,value in pairsByKeys(t) do
print(" " .. tostring(key) .. "=" .. tostring(value))
end
There is no notion of order in Lua tables: they are just sets of key-value pairs.
The two tables below have exactly the same contents because they contain exactly the same pairs:
t = { [223] = "asd" ,[23] = "fgh",[543]="hjk",[7]="qwe"}
t = {[7]="qwe",[23] = "fgh",[223] = "asd" ,[543]="hjk"}

How to deal with this situation when picking a single owner from a list of owners using perl hashes?

I run perforce command on a list of files and after some parsing and stuff i generate a file that contains owners like this(call it owner.log):
ownerA
ownerB
ownerC
ownerA
ownerA
Then I go through the owner.log file and pick an owner like this:
while(<OWNER>) {
$vote->{$_} += 1;
}
and then the owner with the highest vote gets selected for email notification. But the problem is when i have an owner log like this:
ownerA
ownerB
ownerC
ownerD
each one gets the same vote? How should i pick one?
Thank you.
Is there a quick way of finding if all hashes have same value? that way i can pick one at random.
One way to determine if all hash keys have the same value is to use uniq. If there is only one common value, use the keys of your hash as an array and use rand to find a random index within the array bounds:
use List::MoreUtils qw(uniq);
my @keys = keys %hash;
my @vals = values %hash;
if (scalar uniq(@vals) == 1) {
print "all of equal weight\n";
print $keys[ int(rand(@keys)) ], "\n";
}
Assuming the array @winners:
print "The winner is: ", $winners[rand @winners];
The whole process:
my $last = 0;
my @winners;
for my $name (sort { $vote->{$b} <=> $vote->{$a} } keys %$vote) {
last if ($vote->{$name} < $last);
push @winners, $name;
$last = $vote->{$name};
}
my $winner = $winners[rand @winners];
print "The winner is, by ",
@winners == 1 ? "unanimous vote: " : "luck of the draw: ", $winner;
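The same tie-breaking idea can be sketched in Python, as an illustration only. The function name `pick_winner` and its `rng` parameter are hypothetical, and the sketch assumes the owner list is non-empty and already stripped of newlines.

```python
import random
from collections import Counter

def pick_winner(owners, rng=random):
    """Return (winner, tied), where tied lists every owner with the top count.

    Assumes owners is non-empty; max() would raise ValueError otherwise.
    """
    votes = Counter(owners)
    top = max(votes.values())
    tied = sorted(name for name, count in votes.items() if count == top)
    return rng.choice(tied), tied

winner, tied = pick_winner(["ownerA", "ownerB", "ownerC", "ownerA", "ownerA"])
print("The winner is", winner, "(outright)" if len(tied) == 1 else "(luck of the draw)")
```

Collecting every owner with the top count first means a random pick only happens when there really is a tie, including the case where all owners have one vote each.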

how to sort a treemap using bubble sort?

27527-683
27525-1179
27525-1571
27525-1813
27525-4911
27526-1303
27526-3641
27525-3989
27525-4083
27525-4670
27526-4102
27526-558
27527-2411
27527-4342
This is the list of keys, which are declared as strings in a map.
I want to sort them in ascending order.
How can I use a bubble sort on the keys of a map
where the value for each key is a list,
in order to get:
27525-1179
27525-1571
27525-1813
27525-3989
27525-4083
27525-4670
27525-4911
27526-558
27526-1303
27526-3641
27526-4102
27527-683
27527-2411
27527-4342
You should be able to just perform an in-order traversal on your tree. But if you insist, here is what you would do.
keyList = yourTreeMap.getKeys();
for(i = keyList.length-1; i > 0; i--)
for(j = 0; j < i; j++)
if (keyList[j] > keyList[j+1]) keyList.swap(j, j+1);
Since you don't specify a language, I present pseudocode.
In general you use the same bubble sort algorithm as usual; only the comparison condition changes, so that it looks at both the key and the value to decide which entry is greater. Compare the keys first; if they differ, that decides whether to swap, and if they are equal, compare the values instead. Bubble sort is inefficient, though, if you're using this in a real-world scenario.
Jon got his post in before me, but basically what he wrote looks right, except that you'd want a compound condition for the if within the nested loop, like
if(key1<key2)
keyList.swap(i,j)
else if(keyList[key1]<keyList[key2])
keyList.swap(i,j)
Of course, as he also stated, how these keys/values are actually extracted and used will depend on the language, which is missing from the question and its tags.
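Since the question names no language, here is one possible concrete version in Python, offered as a sketch only. The question's expected output places 27526-558 before 27526-1303, which suggests that both parts of each key are meant to be compared as numbers and not as strings; that reading is an assumption. The helper names `key_parts` and `bubble_sort_keys` are hypothetical, and the map values are placeholder empty lists.

```python
def key_parts(key):
    """Split 'AAAAA-BBBB' into a tuple of two integers for numeric comparison."""
    left, right = key.split("-")
    return int(left), int(right)

def bubble_sort_keys(keys):
    """Return a new list of keys sorted ascending with a plain bubble sort."""
    result = list(keys)
    for end in range(len(result) - 1, 0, -1):
        swapped = False
        for j in range(end):
            if key_parts(result[j]) > key_parts(result[j + 1]):
                result[j], result[j + 1] = result[j + 1], result[j]
                swapped = True
        if not swapped:  # no swaps in this pass, so the list is sorted
            break
    return result

# Hypothetical map: a few keys from the question, values are placeholder lists.
data = {"27527-683": [], "27525-1179": [], "27526-558": [], "27526-1303": [], "27525-4911": []}
for key in bubble_sort_keys(data):
    print(key, data[key])
```

The map itself is left untouched; only a sorted list of its keys is produced and then used to look the values up in order.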

Resources