Efficient way to give multiplicity count in TCL lsort

TCL's lsort does not have a feature to report the multiplicity of items. Is there a speedy alternative? We're looking at lists of ~1M items with hundreds of identical entries. Something like a hypothetical:
blasort -unique -count { 3 2 4 3 1 }
1 : 1
2 : 1
3 : 2
4 : 1
Thanks,
Gert

For counting the elements like that, it's much better to use a binsort-derived algorithm, with a dictionary or associative array as the fast map. The following should be efficient even with very large input lists:
proc countSort {elementList args} {
    set count {}
    foreach element $elementList {
        dict incr count $element
    }
    # Now sort the dictionary by its keys (i.e., the unique elements of the input)
    return [lsort -stride 2 -index 0 {*}$args $count]
}
Demonstrating how to use it, reproducing the output in your question from your input list:
set input {3 2 4 3 1}
foreach {item count} [countSort $input] {
    puts "$item : $count"
}

Old fashioned, using an array ...
proc count {lista} {
    foreach item $lista {
        incr counter($item)
    }
    return [array get counter]  ;# returns a flat key/count list
    # or, to display the result directly:
    #parray counter
}

Related

How to sort faster when adding few elements to sorted list?

I have a sorted list of ~10'000 elements into which I insert a few elements (1-10) at a time between popping the first. Measurements show the sort takes several milliseconds (~5), presumably because lsort sorts from scratch every time. It now takes up the majority of the frame time, so I need to do something about it.
Is there some trick to merge a large sorted list with a small sorted list more efficiently?
Code for explaining the context:
while {true} {
    set work [lindex $frontier 0]
    set frontier [lreplace $frontier 0 0]
    if {[done $work]} break
    set more_work [do $work]  ;# about 1-10 elements; distribution is generally hard to predict
    lappend frontier {*}$more_work
    set frontier [lsort $frontier]  ;# when frontier is 10'000 elements, time to sort is ~5ms
}
Trying my best to implement a Tcl proc doing merge-like sort, will post findings. :-)
This proc reduces time elapsed from ~5ms to ~1.2ms:
proc merge_insert {sorted1 sorted2} {
    set res {}
    set prevloc 0
    foreach insert $sorted2 {
        # find location of next element to insert
        set nextloc [lsearch -bisect -integer -index 1 $sorted1 [lindex $insert 1]]
        # append up to the insertion point, then the element itself
        lappend res {*}[lrange $sorted1 $prevloc $nextloc] $insert
        # put the read location just beyond the inserted element
        set prevloc [expr {$nextloc + 1}]
    }
    # append whatever tail is left
    lappend res {*}[lrange $sorted1 $prevloc end]
    return $res
}
The attribute sorted on is an integer stored as the second element of each list entry, hence the -integer -index 1 and lindex $insert 1.
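The same merge idea can be sketched in Python with the standard bisect module (a hedged sketch, not the poster's code): bisect for each insertion point, copy the run before it, then append the new element. The key function mirrors -index 1 / lindex $insert 1.

```python
import bisect

def merge_insert(sorted1, sorted2, key=lambda e: e[1]):
    # Merge a small sorted list (sorted2) into a large one (sorted1) by
    # bisecting for each insertion point and copying runs of elements.
    keys = [key(e) for e in sorted1]
    res = []
    prev = 0
    for insert in sorted2:
        # index just past the last element whose key is <= the new key
        nxt = bisect.bisect_right(keys, key(insert))
        res.extend(sorted1[prev:nxt])
        res.append(insert)
        prev = nxt
    res.extend(sorted1[prev:])  # whatever tail is left
    return res
```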

Dynamic list sorting in Tcl

I want to do some sorting based on a dynamic list. Let me explain below.
I am using Tcl version 8.4, which I cannot change; I have to use that.
list1 = {{a b c} {c b c a} {b b a}} ..... 1st input data
List1 is a Tcl list with 3 members, each a sublist of varying length and order, and this changes every time. For example, next time list1 will be:
list1 = {{c} {b c a} {b c} {a c a c}} ..... 2nd input data (for next time consideration)
Now I want to sort them in such a way that, whether I use a loop, lsort, string compare, or any other Tcl command, the new Tcl list contains one member per sublist, chosen by priority, just as we have ascending/descending.
Notice that in both cases the individual sublists' lengths grow and shrink, and at the same time the elements a, b, c keep rotating.
In my case I want "a" to have highest priority, then "b" and then "c" (a->b->c)
So output after processing done for 1st iteration should be :
$> puts $new_list1
$> {a a a} # a is present in all 3 sublists and gets highest priority.
Similarly, output after processing done on 2nd iteration should be :
$> puts $new_list1
$> {c a b a} # the 1st sublist is just c, so it is output as-is; the 2nd sublist has b, c and a, so a is output; the 3rd has b and c, so b is output; the 4th has only a's and c's, so a is output
Let me know what your thoughts are.
Thanks in advance !
First, I'd look into constructing that data structure in a way such that you wouldn't have to sort the sublists at all—for example, use an algorithm as simple as binary search to linsert each element at its sorted index per sublist.
Second, I'd consider whether you need as much "optimization" as you might think you do. Often the best solution (for maintainability) is the most obvious one: sort the sublists, then use a loop, like so:
# construct a new list of sorted sublists
foreach sublist $list {
    lappend presorted_list [lsort $sublist]
}
# given a reference to a list of sorted lists, simultaneously build (1) a list of
# each sublist's first element and (2) a list of each sublist's remaining
# elements, so that the former can be returned and the latter can be set by
# reference for the next iteration (empty sublists are omitted)
proc process_once {presorted_list_ref} {
    upvar $presorted_list_ref presorted_list
    set returning_list {}
    set remaining_list {}
    foreach sublist $presorted_list {
        if {[llength $sublist] > 0} {
            lappend returning_list [lindex $sublist 0]
            lappend remaining_list [lrange $sublist 1 end]
        }
    }
    set presorted_list $remaining_list
    return $returning_list
}
set iter_1 [process_once presorted_list]
set iter_2 [process_once presorted_list]
I don't think there is a better way to do this if you cannot pre-process or construct your original list with sorted sublists to begin with. Without sorted sublists, you cannot decide which item of each sublist to output without examining all items—so you might as well sort once, after which you always take the first item of each sublist, as I've coded above.
In loop form, if you don't need to retrieve one iteration at a time specifically,
foreach sublist $list {
    lappend presorted_list [lsort $sublist]
}
while {[llength $presorted_list] > 0} {
    set returning_list {}
    set remaining_list {}
    foreach sublist $presorted_list {
        if {[llength $sublist] > 0} {
            lappend returning_list [lindex $sublist 0]
            lappend remaining_list [lrange $sublist 1 end]
        }
    }
    #
    # do stuff with $returning_list
    #
    set presorted_list $remaining_list
}
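The same strategy can be sketched in Python (names mine, assuming the priority order is just the natural sort order, as with a < b < c): sort each sublist once, then repeatedly take the head of every non-empty sublist.

```python
def process_once(presorted):
    # Take the head of each non-empty sorted sublist; return the heads
    # and the remaining tails (exhausted sublists are dropped).
    heads = [s[0] for s in presorted if s]
    tails = [s[1:] for s in presorted if len(s) > 1]
    return heads, tails

presorted = [sorted(s) for s in [['a','b','c'], ['c','b','c','a'], ['b','b','a']]]
heads, presorted = process_once(presorted)
print(heads)  # ['a', 'a', 'a']
```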

How do I efficiently (mem/time) modify all elelements of a list in Tcl?

To operate on each element of a list, returning a modified list, various languages have explicit constructs.
In Perl there's map:
perl -e 'my @a = (1..4); print join(q( ), map { $_ * $_ } @a)'
1 4 9 16
In Python there're list comprehensions:
>>> a = (1,2,3,4)
>>> [el*el for el in a]
[1, 4, 9, 16]
What's the most efficient way to do this in Tcl?
I can come up with the usual foreach loop.
set l {}
foreach i {1 2 3 4} {
    lappend l [expr $i * $i]
}
puts $l
1 4 9 16
Is this the fastest way?
Regarding memory efficiency, this builds up a second list element by element. If I don't need the original list afterwards, is there a more efficient way?
And, finally, is there something that's shorter?
I couldn't find info here or on http://wiki.tcl.tk
Answer:
As Donal Fellows has answered, the most important thing for speed tests is that the code be wrapped in a proc {}, since Tcl can then optimize it. For Tcl, a "map" function is discussed as a future enhancement. With this hint and further searching I found http://wiki.tcl.tk/12848
The most efficient method is this:
set idx 0
foreach item $theList {
    lset theList $idx [expr {$item * $item}]
    incr idx
}
If the list is short (e.g., a few hundred elements) the cost of allocating a new list is minimal though, so you can use this (simpler) version instead:
foreach item $theList {
    lappend newList [expr {$item * $item}]
}
Note that the foreach command is only fast if placed in a procedure (or lambda expression or method) and expressions are only fast if placed in {braces}. Also, don't speculate, measure: take care to use the time command to find out how fast your code really is.
Well, there is something shorter (using the tcllib struct::list package), but not necessarily faster.
package require struct::list
puts [struct::list mapfor x $data { expr {$x * $x} }]

find the repetition of duplicate numbers

This is my algorithm, which I wrote with my friends (who are on the Stack Overflow site).
It finds just the first duplicate number and returns it; it runs in O(n).
I want to extend this algorithm so it reports the duplicate numbers together with their repetition counts. Given [1,1,3,0,5,1,5],
I want it to return the 2 duplicate numbers, which are 1 and 5, with their repetitions, which are 3 and 2 respectively. How can I do this in O(n)?
Algorithm Duplicate(arr[1:n], n)
{
    Set s = new HashSet(); i := 0;
    while i < n do
    {
        if (!s.add(arr[i])) then
        {
            return arr[i]; // this is a duplicate value!
        }
        i++;
    }
}
You can do this in Java:
List<Integer> num = Arrays.asList(1, 1, 1, 2, 3, 3, 4, 5, 5, 5);
Map<Integer, Integer> countNum = new HashMap<Integer, Integer>();
for (int n : num) {
    Integer nu;
    if ((nu = countNum.get(n)) == null) {
        countNum.put(n, 1);
        continue;
    }
    countNum.put(n, nu + 1);
}
Instead of iterating each time to get the count of a duplicate, it's better to store the counts in a map.
Use a Map/Dictionary data structure.
Iterate over the list.
For each item in the list, do a map lookup. If the key (the item) exists, increment its value; if it doesn't, insert the key with an initial count of 1.
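Those steps, sketched in Python with a plain dict (function name mine):

```python
def count_items(items):
    # One pass over the list; each dict lookup/insert is O(1) amortized,
    # so the whole count is O(n).
    counts = {}
    for item in items:
        counts[item] = counts.get(item, 0) + 1
    return counts

print(count_items([1, 1, 3, 0, 5, 1, 5]))  # {1: 3, 3: 1, 0: 1, 5: 2}
```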
In this particular instance it's not so much about the algorithm, it's about the data structure: a Multiset is like a Set, except it doesn't store only unique items, instead it stores a count of how often each item is in the Multiset. Basically, a Set tells you whether a particular item is in the Set at all, a Multiset in addition also tells you how often that particular item is in the Multiset.
So, basically all you have to do is to construct a Multiset from your Array. Here's an example in Ruby:
require 'multiset'
print Multiset[1,1,3,0,5,1,5]
Yes, that's all there is to it. This prints:
#3 1
#1 3
#1 0
#2 5
If you only want actual duplicates, you simply delete those items with a count less than 2:
print Multiset[1,1,3,0,5,1,5].delete_with {|item, count| count < 2 }
This prints just
#1 3
#2 5
As @suihock mentions, you can also use a Map, which basically just means that instead of the Multiset taking care of the element counting for you, you have to do it yourself:
m = [1,1,3,0,5,1,5].reduce(Hash.new(0)) {|map, item| map.tap { map[item] += 1 }}
print m
# { 1 => 3, 3 => 1, 0 => 1, 5 => 2 }
Again, if you only want the duplicates:
print m.select {|item, count| count > 1 }
# { 1 => 3, 5 => 2 }
But you can do this more easily if, instead of counting yourself, you use Enumerable#group_by to group the elements by themselves and then map each grouping to its size. Lastly, convert back to a Hash:
print Hash[[1,1,3,0,5,1,5].group_by(&->x{x}).map {|n, ns| [n, ns.size] }]
# { 1 => 3, 3 => 1, 0 => 1, 5 => 2 }
All of these have an amortized worst case step complexity of Θ(n).
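Python's closest analogue to a Multiset is collections.Counter; the same construct-then-filter pipeline looks like this (a sketch, not from the original answer):

```python
from collections import Counter

counts = Counter([1, 1, 3, 0, 5, 1, 5])
print(dict(counts))  # {1: 3, 3: 1, 0: 1, 5: 2}

# Keep only the actual duplicates, i.e. items occurring at least twice:
duplicates = {item: n for item, n in counts.items() if n >= 2}
print(duplicates)  # {1: 3, 5: 2}
```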

Evenly select N elems from array

I need to evenly select n elements from an array. I guess the best way to explain is by example.
say I have:
array [0,1,2,3,4] and I need to select 3 numbers.. 0,2,4.
of course, if the array length <= n, I just need to return the whole array.
I'm pretty sure there's a defined algorithm for this; I've been searching and took a look at Introduction to Algorithms but couldn't find anything that met my needs (probably overlooked it).
The problem I'm having is that I can't figure out a way to scale this up to any array [p..q], selecting n evenly spaced elements.
note: I can't just take the even-indexed elements in the example above.
A couple other examples;
array[0,1,2,3,4,5,6], 3 elements ; I need to get 0,3,6
array[0,1,2,3,4,5], 3 elements ; I need to get 0, either 2 or 3, and 5
EDIT:
more examples:
array [0,1,2], 2 elems : 0,2
array [0,1,2,3,4,5,6,7], 5 elems : 0,2, either 3 or 4, 5,7
and yes, I'd like to include first and last elements always.
EDIT 2:
what I was thinking was something like .. first + last element, then work my way up using the median value. Though I got stuck/confused when trying to do so.
I'll take a look at the algo you're posting. thanks!
EDIT 3:
Here's a souped-up version of incrediman's solution in PHP. It works with associative arrays as well, retaining the keys.
<?php
/**
 * Selects $x elements (evenly distributed across $set) from $set
 *
 * @param $set array : array set to select from
 * @param $x int : number of elements to select. positive integer
 *
 * @return array|bool : selected set, bool false on failure
 */
function select ($set, $x) {
    //check params
    if (!is_array($set) || !is_int($x) || $x < 1)
        return false;
    $n = count($set);
    if ($n <= $x)
        return $set;
    $keys = array_keys($set);
    $values = array_values($set);
    if ($x == 1) {
        //single selection: return the middle element (avoids division by zero below)
        $mid = intval(($n - 1) / 2);
        return array($keys[$mid] => $values[$mid]);
    }
    $selected = array();
    $step = ($n - 1) / ($x - 1);
    for ($i = 0; $i < $x; $i++) {
        $selected[$keys[round($step * $i)]] = $values[round($step * $i)];
    }
    return $selected;
}
?>
You can probably implement an Iterator but I don't need to take it that far.
Pseudo-code:
function Algorithm(int N, array A)
    float step = (A.size - 1) / (N - 1)  //set step size
    array R                              //declare return array
    for (int i = 0; i < N; i++)
        R.push(A[round(step * i)])       //push the element at each position that is
                                         //a multiple of step onto R
    return R
Probably the easiest mistake to make here is to cast step to an integer or round it up front. To pull the correct elements, step must remain a floating-point number, and each multiple of step must be rounded as you iterate through the array.
Tested example here in php:
<?
function Algorithm($N, $A) {
    $step = (sizeof($A) - 1) / ($N - 1);
    for ($i = 0; $i < $N; $i++)
        echo $A[round($step * $i)] . " ";
    echo "\n";
}
//some of your test cases:
Algorithm(3, array(1,2,3));
Algorithm(5, array(0,1,2,3,4,5,6,7));
Algorithm(2, array(0,1,2));
Algorithm(3, array(0,1,2,3,4,5,6));
?>
Outputs:
1 2 3
0 2 4 5 7
0 2
0 3 6
(you can see your test cases in action and try new ones here: http://codepad.org/2eZp98eD)
Let n+1 be the number of elements you want, already bounded to the length of the array.
Then you want elements at indices 0/n, 1/n, ..., n/n of the way to the end of the array.
Let m+1 be the length of the array. Then your indices are round(m*i/n) (with the division done with floating point).
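That index formula translates directly into a short Python sketch (the function name and the count == 1 fallback are mine):

```python
def evenly_select(arr, count):
    # Pick indices round(m * i / n) for i = 0..n, where m = len(arr) - 1
    # and n = count - 1; this always includes the first and last element.
    if count >= len(arr):
        return list(arr)
    if count == 1:
        return [arr[len(arr) // 2]]  # middle element as a sensible single pick
    m, n = len(arr) - 1, count - 1
    return [arr[round(m * i / n)] for i in range(count)]

print(evenly_select([0, 1, 2, 3, 4, 5, 6], 3))     # [0, 3, 6]
print(evenly_select([0, 1, 2, 3, 4, 5, 6, 7], 5))  # [0, 2, 4, 5, 7]
```

Note that Python's round uses banker's rounding, so a tie like 3.5 rounds to the even integer 4; either neighbor is acceptable per the question's "either 3 or 4".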
Your step size is (ArraySize-1)/(N-1).
Just add the step size to a floating-point accumulator, and round the accumulator to get each array index. Repeat until the accumulator exceeds the array size.
It looks like you want to include both the first and last elements in your list.
If you want to pull X items from your list of N items, your step size will be (N-1)/(X-1). Just round however you want as you pull out each one.
Based on @Rex's answer. Pseudocode! Or some might even say it's JS
/// Selects |evenly spaced| elements from any given array. Handles all the edge cases.
function select(array: [Int], selectionCount: Int) {
    let iterationCount = array.length - 1;          // number of iterations
    let expectedToBeSelected = selectionCount - 1;  // number of elements to be selected
    let resultsArray: [Int] = [];                   // result array
    if (selectionCount < 1 || selectionCount > array.length) {
        console.log("Invalid selection count!");
        return resultsArray;
    }
    var i;
    for (i in array) {
        if (selectionCount == 1) {
            resultsArray.push(array[i]);
            break;
        }
        let selectedSoFar = Math.round(iterationCount * i / expectedToBeSelected);
        if (selectedSoFar < array.length) {
            resultsArray.push(array[selectedSoFar]);
        } else {
            break; // if selectedSoFar exceeds the length, do not proceed further
        }
    }
    return resultsArray;
}
