Multi-way KK differencing algorithm vs. Greedy algorithm?

It's proven that the Karmarkar-Karp differencing algorithm always performs better than the greedy algorithm for 2-way partitioning problems, i.e. partitioning a set of n integers into 2 subsets with equal sums. Can this be extended to k-way partitioning as well? If not, is there an example where greedy performs better than KK in k-way partitioning?

KK's superiority does not generalize to k-way partitioning. In fact, it's easy to give a counter-example where the greedy algorithm performs better.
Let the performance measure be the maximum subset sum of the final partition.
Now, take this set of integers:
S = [10, 7, 5, 5, 6, 4, 10, 11, 12, 9, 10, 4, 3, 4, 5] and k = 4 (partitioning into 4 subsets with sums as equal as possible)
Fast-forwarding, the KK algorithm gives the result [28, 26, 26, 26] whereas greedy gives the final partition [27, 27, 27, 24]. Since 28 > 27, greedy performed better for this example.

There is an issue with the KK algorithm solution provided above.
Sum(S) = 105
Sum([28, 26, 26, 26]) = 106
Sum([27, 27, 27, 24]) = 105
Greedy algorithm gives a result of
{12, 6, 5, 4}, {11, 7, 5, 4}, {10, 10, 4, 3}, {10, 9, 5}
[27, 27, 27, 24]
KK algorithm gives a result of
{5, 12, 6, 4}, {5, 10, 7, 4}, {5, 11, 10}, {4, 3, 10, 9}
[27, 26, 26, 26]
Since the highest sums are equal (27 = 27) and KK's lowest sum is greater than greedy's (26 > 24), the KK algorithm actually performs better here. There are circumstances where the greedy algorithm can still beat KK, but this example isn't one of them.
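As a sanity check, the greedy result quoted above can be reproduced with a minimal sketch: largest numbers first, each placed into the subset with the currently smallest sum. Tie-breaking by subset index is an assumption here, not something the original answer specifies, but it happens to reproduce the same partition.

```python
import heapq

def greedy_partition(nums, k):
    """Greedy k-way partition: place each number, largest first,
    into the subset with the currently smallest sum."""
    heap = [(0, i, []) for i in range(k)]  # (subset sum, index, items)
    for x in sorted(nums, reverse=True):
        s, i, items = heapq.heappop(heap)  # subset with the smallest sum
        items.append(x)
        heapq.heappush(heap, (s + x, i, items))
    return sorted(s for s, _, _ in heap)

S = [10, 7, 5, 5, 6, 4, 10, 11, 12, 9, 10, 4, 3, 4, 5]
sums = greedy_partition(S, 4)
print(sums)  # [24, 27, 27, 27] -- max load 27, as in the answer above
```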

Return an index of the most common element in a list of integers with equal probability using O(1) space

I came across this coding problem and am having a hard time coming up with a solution.
Given an array of integers, find the most common element
and return any of its indexes uniformly at random. The solution must run in O(N) time and use O(1) space.
Example:
List contains: [-1, 4, 9, 7, 7, 2, 7, 3, 0, 9, 6, 5, 7, 8, 9]
7 is most common element so output should be one of: 3, 4, 6, 12
Now this problem would be fairly trivial if not for the constant space constraint. I know reservoir sampling can be used to solve the problem with these constraints if we know the most common element ahead of time. But if we don't know the most common element, how could this problem be solved?
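For reference, the reservoir-sampling step mentioned above can be sketched as follows. This only covers the easy part (the target element is assumed known); it does not resolve the open question of finding the most common element in O(1) space.

```python
import random

def random_index_of(nums, target):
    """Return a uniformly random index of target in nums,
    in one pass and O(1) extra space (reservoir sampling, k=1)."""
    chosen, seen = -1, 0
    for i, x in enumerate(nums):
        if x == target:
            seen += 1
            # keep index i with probability 1/seen
            if random.randrange(seen) == 0:
                chosen = i
    return chosen

nums = [-1, 4, 9, 7, 7, 2, 7, 3, 0, 9, 6, 5, 7, 8, 9]
print(random_index_of(nums, 7))  # one of 3, 4, 6, 12
```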

What does kth largest/smallest element mean?

I'm currently studying selection algorithms, namely, median of medians.
I came across the two following sentences:
In computer science, a selection algorithm is an algorithm for finding
the kth smallest number in a list or array;
In computer science, the median of medians is an approximate (median)
selection algorithm, frequently used to supply a good pivot for an
exact selection algorithm, mainly the quickselect, that selects the
kth largest element of an initially unsorted array.
What does kth smallest/largest element mean?
To make the question a bit more concrete, consider the following (unsorted) array:
[19, 1, 7, 20, 8, 10, 19, 24, 23, 6]
For example, what is 5th smallest element? And what is 5th largest element?
If you sort the array from smallest to largest, the kth smallest element is the kth element in the sorted array. The kth largest element is the kth from the end in the sorted array. Let's examine your example array in Python:
In [2]: sorted([19, 1, 7, 20, 8, 10, 19, 24, 23, 6])
Out[2]: [1, 6, 7, 8, 10, 19, 19, 20, 23, 24]
The smallest element is 1, second smallest is 6, and so on. So the kth smallest is the kth element from the left. Similarly, 24 is the largest, 23 the second largest, and so on, so the kth largest element is the kth element from the right. So if k = 5:
In [3]: sorted([19, 1, 7, 20, 8, 10, 19, 24, 23, 6])[4] # index 4 is 5th from the start
Out[3]: 10
In [4]: sorted([19, 1, 7, 20, 8, 10, 19, 24, 23, 6])[-5] # index -5 is 5th from the end
Out[4]: 19
Note that you don't have to sort the array in order to get the kth smallest/largest value. Sorting is just an easy way to see which value corresponds to k.
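One way to get the kth smallest/largest value without fully sorting, using Python's standard heapq module (a convenience, not the quickselect algorithm mentioned in the quoted definition):

```python
import heapq

a = [19, 1, 7, 20, 8, 10, 19, 24, 23, 6]
k = 5

# kth smallest = last of the k smallest values
kth_smallest = heapq.nsmallest(k, a)[-1]  # 10
# kth largest = last of the k largest values
kth_largest = heapq.nlargest(k, a)[-1]    # 19
print(kth_smallest, kth_largest)
```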

Generating a subset uniformly at random?

Here is an implementation of a combinatorial algorithm to choose a subset of an n-set uniformly at random. Since there are 2^n subsets of an n-set, each subset should be selected with probability 2^-n.
I believe I have implemented the algorithm correctly (please let me know if there is a bug somewhere). When I run the program with Java 7 on my Linux box, however, I get results that I am not able to reason about. The mystery seems to be around the random number generator. I understand that one needs to run the program a 'large number' of times to see the distribution approach uniformity. The question, however, is how large is large. A few runs I did suggest that unless the experiment is repeated >= 1 billion times, the distribution of chosen subsets is quite nonuniform.
The algorithm is based on Prof. Herbert Wilf's combinatorial algorithms book where the implementation (slightly different) is done in Fortran and the distribution is more-or-less uniform even when the program is run only 1280 times.
Here are a few sample runs (there's some variation among runs when n is constant) to get a random subset of a 4-set:
Number of times experiment is done n = 1280
Number of times experiment is done n = 12,800
Number of times experiment is done n = 128,000 (still 8 subsets only!)
Number of times experiment is done n = 1,280,000
Number of times experiment is done n = 12,800,000 (now it starts making sense)
Number of times experiment is done n = 1,280,000,000 (this is okay!)
Would you expect such performance? How could Prof. Wilf achieve similar results with only 1280 iterations of an equivalent program?
Every time you call ranInt(), you re-create and re-seed the RNG with the current time. Rapid successive calls can reuse the same seed, so in the long run these numbers are no longer random.
Move Random r = new Random(System.currentTimeMillis()); to the top of the class and make it a static field:
class RandomSubsetSimulation {
static Random r = new Random(System.currentTimeMillis());
public static void main(String[] args) { ...
I am able to get the following results with 8-set
Total: 1000, number of subsets with a frequency > 0: 256
Total # of subsets possible: 256
Full results with 4-set
Frequencies of chosen subsets (count, deviation from the expected 80, percentage of 1280 runs):
[3] : 76, 4, 5.94
[4] : 72, 8, 5.63
[] : 83, -3, 6.48
[1] : 90, -10, 7.03
[2] : 80, 0, 6.25
[3, 4] : 86, -6, 6.72
[2, 3] : 88, -8, 6.88
[2, 4] : 55, 25, 4.30
[1, 2, 3] : 99, -19, 7.73
[1, 2, 4] : 75, 5, 5.86
[2, 3, 4] : 76, 4, 5.94
[1, 3] : 85, -5, 6.64
[1, 2] : 94, -14, 7.34
[1, 4] : 72, 8, 5.63
[1, 2, 3, 4] : 71, 9, 5.55
[1, 3, 4] : 78, 2, 6.09
Total: 1280, number of subsets with a frequency > 0: 16
Total # of subsets possible: 16
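The same fix, one generator created once and reused, can be sketched in Python (an illustrative equivalent, not the original Java program):

```python
import random
from collections import Counter

r = random.Random()  # create and seed the generator ONCE, not per call

def random_subset(n):
    """Uniformly random subset of {1, ..., n}: each element is included
    independently with probability 1/2, so each of the 2**n subsets
    has probability 2**-n."""
    return tuple(i for i in range(1, n + 1) if r.random() < 0.5)

# with 12,800 draws from a 4-set, all 16 subsets show up
counts = Counter(random_subset(4) for _ in range(12800))
print(len(counts))  # 16
```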

Need to understand answer of algorithm

I am trying to solve the Longest Monotonically Increasing Subsequence problem using JavaScript. In order to do that, I first need to understand what a longest monotonically increasing subsequence is. Currently I am following the Wikipedia article. What I don't understand in its example is why the longest increasing subsequence is given as 0, 2, 6, 9, 13, 15 for the list 0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15. Why doesn't the answer have 3 between 2 and 6, or 8 between 6 and 9, etc.? How does that answer come from that list?
First of all, consider the name "Longest Monotonically Increasing Subsequence". From the given array you need to find the longest subsequence in which the numbers appear in strictly increasing order. There can be many increasing subsequences, but you need to find the longest one.
So, let's trace through this array: a[] = {0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15}
Here, some monotonically increasing subsequences are:
{0,8,12,14,15} Length = 5
{0,4,12,14,15} Length = 5
{0,1,9,13,15} Length = 5 and so on.
But if you continue like this, you will find that the longest subsequence is:
{0, 2, 6, 9, 13, 15}, Length = 6, so this is the answer.
Every time you pick a number, the next number must be larger than the previous one and must appear later in the array. Take the list {0, 2, 6, 9, 13, 15}: when you pick 9, the next number must be larger than 9. The sequence offers 13 > 9, so you can pick 13. You could also pick 11, but that creates another branch:
{0, 2, 6, 9, 11, 15}, which is another solution of the same length.
Hope this explanation helps you understand the LIS (Longest Increasing Subsequence). Thanks.
First of all, the title of your question says Longest Increasing CONTIGUOUS Subsequence, which is a slight variation of the original LIS problem, whose result need not consist of contiguous values from the original array, as the examples above show. Follow this link for a decent explanation of the LIS algorithm, which has an O(n^2) solution and can be optimized to O(n log n):
http://www.algorithmist.com/index.php/Longest_Increasing_Subsequence
For the contiguous variant of LIS, here is a decent solution:
http://worldofbrock.blogspot.com/2009/10/how-to-find-longest-continuous.html
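For reference, the O(n log n) approach mentioned above can be sketched with patience sorting via binary search (a standard technique; this computes the length only, not the subsequence itself):

```python
from bisect import bisect_left

def lis_length(seq):
    """O(n log n) LIS length: tails[i] holds the smallest possible tail
    of an increasing subsequence of length i + 1 seen so far."""
    tails = []
    for x in seq:
        i = bisect_left(tails, x)   # first tail >= x
        if i == len(tails):
            tails.append(x)         # x extends the longest subsequence
        else:
            tails[i] = x            # x gives a smaller tail for length i + 1
    return len(tails)

a = [0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15]
print(lis_length(a))  # 6, e.g. 0, 2, 6, 9, 13, 15
```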

efficiently finding overlapping segments from a set of lists

Suppose I have the following lists:
[1, 2, 3, 20, 23, 24, 25, 32, 31, 30, 29]
[1, 2, 3, 20, 23, 28, 29]
[1, 2, 3, 20, 21, 22]
[1, 2, 3, 14, 15, 16]
[16, 17, 18]
[16, 17, 18, 19, 20]
Order matters here. These are the nodes resulting from a depth-first search in a weighted graph. What I want to do is break down the lists into unique paths (where a path has at least 2 elements). So, the above lists would return the following:
[1, 2, 3]
[20, 23]
[24, 25, 32, 31, 30, 29]
[28, 29]
[20, 21, 22]
[14, 15, 16]
[16, 17, 18]
[19, 20]
The general idea I have right now is this:
Look through all pairs of lists to collect the overlapping segments found at the beginnings of the lists. For the input above, this would be the output:
[1, 2, 3, 20, 23]
[1, 2, 3, 20]
[1, 2, 3]
[16, 17, 18]
Step 2 then reduces these to their common prefixes:
[1, 2, 3]
[16, 17, 18]
Once I have the lists from step 2, I look through each input list and chop off the front if it matches one of the lists from step 2. The new lists look like this:
[20, 23, 24, 25, 32, 31, 30, 29]
[20, 23, 28, 29]
[20, 21, 22]
[14, 15, 16]
[19, 20]
I then go back and apply step 1 to the truncated lists from step 3. When step 1 doesn't output any overlapping lists, I'm done.
Step 2 is the tricky part here. What's silly is it's actually equivalent to solving the original problem, although on smaller lists.
What's the most efficient way to solve this problem? Looking at all pairs obviously requires O(N^2) time, and step 2 seems wasteful since I need to run the same procedure to solve these smaller lists. I'm trying to figure out if there's a smarter way to do this, and I'm stuck.
Seems like the solution is to modify a Trie to serve the purpose. Trie compression gives clues, but the kind of compression that is needed here won't yield any performance benefits.
The first list you add becomes its own node (rather than k nodes). If there is any overlap, nodes split but never get smaller than holding two elements of the array.
A simple example of the graph structure looks like this:
insert (1,2,3,4,5)
graph: (1,2,3,4,5)->None
insert (1,2,3)
graph: (1,2,3)->(4,5), (4,5)->None
insert (3,32)
graph: (1,2,3)->(4,5), (4,5)->None, (3,32)->None
segments
output: (1,2,3), (4,5), (3,32)
The child nodes should also be stored in an actual Trie, at least when there are enough of them, to avoid a linear search when adding/removing from the data structure, which could otherwise increase the runtime by a factor of N. With that implemented, the data structure has the same big-O performance as a Trie with somewhat higher hidden constants, meaning it takes O(L*N), where L is the average length of a list and N is the number of lists. Obtaining the segments is linear in the number of segments.
The final data structure, basically a directed graph, for your example would look like the one below, with the start node at the bottom.
Note that this data structure can be built as you run the DFS rather than afterwards.
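A minimal sketch of this Trie idea: each edge stores a run of graph nodes and splits where two lists diverge (radix-trie style), and a post-pass pushes any single-node edge label into its children so every segment keeps at least two elements, as the question requires. All names here are illustrative, not from the original post.

```python
class Node:
    def __init__(self, label):
        self.label = label    # run of graph nodes on the incoming edge
        self.children = {}    # first element of a child's label -> child Node

def insert(root, seq):
    node, i = root, 0
    while i < len(seq):
        child = node.children.get(seq[i])
        if child is None:
            node.children[seq[i]] = Node(list(seq[i:]))
            return
        # length of the common prefix of the rest of seq and the edge label
        j = 0
        while j < len(child.label) and i + j < len(seq) and child.label[j] == seq[i + j]:
            j += 1
        if j < len(child.label):              # diverged mid-edge: split it
            tail = Node(child.label[j:])
            tail.children, child.children = child.children, {child.label[j]: tail}
            child.label = child.label[:j]
        node, i = child, i + j

def segments(node, prefix=()):
    out = []
    for child in node.children.values():
        label = list(prefix) + child.label
        if len(label) < 2 and child.children:   # too short to be a path:
            out += segments(child, label)       # push the run into its children
        else:
            out.append(label)
            out += segments(child)
    return out

root = Node([])
for path in [[1, 2, 3, 20, 23, 24, 25, 32, 31, 30, 29],
             [1, 2, 3, 20, 23, 28, 29],
             [1, 2, 3, 20, 21, 22],
             [1, 2, 3, 14, 15, 16],
             [16, 17, 18],
             [16, 17, 18, 19, 20]]:
    insert(root, path)
print(segments(root))
```

On the question's input, this yields exactly the eight segments listed there, including [20, 23] and [20, 21, 22], because the shared node 20 is folded into both branches rather than left as a one-element segment.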
I ended up solving this by thinking about the problem slightly differently. Instead of thinking about sequences of nodes (where an edge is implicit between each successive pair of nodes), I'm thinking about sequences of edges. I basically use the algorithm I posted originally. Step 2 is simply an iterative step where I repeatedly identify prefixes until there are no more prefixes left to identify. This is pretty quick, and dealing with edges instead of nodes really simplified everything.
Thanks for everyone's help!
