Efficient minimum discovery in lists with merge/split - algorithm

Suppose there are big unsorted lists of numbers (note that these are lists, not arrays).
We can merge and split these big lists. The problem is to get the minimum number of each list with the lowest possible complexity.
For example, a list could have:
10 20 19 18 5 22 15 14 30 40 50 16
The minimum of this list is 5.
If we split the list at 30, we get
10 20 19 18 5 22 15 14 -> minimum is 5
30 40 50 16 -> minimum is 16
We could merge the original list with another one (merging A with B always appends B at the end of A), getting:
10 20 19 18 5 22 15 14 30 40 50 16 100 200 300 400 4 150 100 -> minimum is now 4
The minimum of the merge is trivial to obtain, but if we split the merged list again at any location then the minimum is not so trivial (at least for me). Splitting two times we would get:
10 20 19 18 5 22 -> minimum is 5
15 14 30 40 50 16 100 200 -> minimum is 14
300 400 4 150 100 -> minimum is 4
Language and memory are not an issue; we can use as much memory as needed. But an algorithm that does the merge/split in O(log(N)) in all cases (best and worst case) would be great!
Unfortunately, all my attempts to solve this are trivial and always result in O(N). I tried to split the list into "M" and "m" sorted blocks, where "M" blocks are runs of numbers in ascending order and "m" blocks are runs in descending order. But in the worst case (the numbers keep going "up" and "down") this is not efficient: there are about N/2 blocks, which is still O(N).
Thank you
M

This could be done with a variation of a skip list.
On top of your list, you have "layers". For every two elements in layer x, you have one element in layer x+1. This element is the minimum of the elements below it. (Note that an easier implementation uses a non-deterministic coin flip, with 50% probability, to decide whether an element gets an entry in the next layer. This is easier to implement, but harder to explain.)
So, in your example:
5
5 16
10 5 16
10 18 5 14 30 16
10 20 19 18 5 22 15 14 30 40 50 16
Now, on both a merge and a split, you only need to modify the elements from the modified element up (and not the entire list). Since the height of the structure is O(log n), you need to modify O(log n) elements.
Example, splitting at 30:
5
10 5
10 18 5 14
10 20 19 18 5 22 15 14
16
30 16
30 40 50 16
Note that you only need to modify elements above 30 and above 10 when splitting, the rest are guaranteed to be up to date.
Note that the non-deterministic property comes in handy here: you don't need to adjust the layers to match the "every 2nd element" rule perfectly when you use the non-deterministic version. This is what makes it easier to implement.
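To make the "store the minimum of everything below" idea concrete, here is a minimal Python sketch. Note that it swaps in a different randomized structure, an implicit treap, instead of the skip list described above; the idea is the same (every node caches the minimum of its part of the list), and merge and split run in expected O(log n) time rather than guaranteed worst-case time. The names `Node`, `build`, `merge` and `split` are illustrative choices, not from any library.

```python
import random

class Node:
    """One list element plus the cached minimum of the subtree rooted here."""
    def __init__(self, value):
        self.value = value
        self.prio = random.random()  # random priority keeps the tree balanced in expectation
        self.left = None
        self.right = None
        self.size = 1
        self.min = value

def _update(node):
    """Recompute size and minimum of a node from its children."""
    node.size = 1
    node.min = node.value
    for child in (node.left, node.right):
        if child is not None:
            node.size += child.size
            node.min = min(node.min, child.min)

def merge(a, b):
    """Concatenate list b at the end of list a and return the new root."""
    if a is None:
        return b
    if b is None:
        return a
    if a.prio > b.prio:
        a.right = merge(a.right, b)
        _update(a)
        return a
    b.left = merge(a, b.left)
    _update(b)
    return b

def split(node, k):
    """Split a list into (its first k elements, the remaining elements)."""
    if node is None:
        return None, None
    left_size = node.left.size if node.left else 0
    if k <= left_size:
        first, node.left = split(node.left, k)
        _update(node)
        return first, node
    node.right, rest = split(node.right, k - left_size - 1)
    _update(node)
    return node, rest

def build(values):
    """Build a list by appending the values one at a time."""
    root = None
    for v in values:
        root = merge(root, Node(v))
    return root

a = build([10, 20, 19, 18, 5, 22, 15, 14, 30, 40, 50, 16])
b = build([100, 200, 300, 400, 4, 150, 100])
whole = merge(a, b)
print(whole.min)                          # 4
first, rest = split(whole, 6)
second, third = split(rest, 8)
print(first.min, second.min, third.min)   # 5 14 4
```

Only the nodes on the path touched by a merge or split have their cached minimum recomputed, which is the same reason the layered structure above only needs O(log n) updates.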

Related

Giving a test case where heap sort from Introduction to Algorithm fails

I was reading about heapsort in Introduction to Algorithms.
It is stated there:
(1) Build a max heap in a bottom-up manner.
(2) Then exchange the first element with the last element, call max-heapify on the first element, and continue like this.
Let's take an example with this input:
->7 10 20 3 4 49 50
the steps in building max heap will be
7 10 50 3 4 49 20
7 10 50 3 4 49 20
50 10 7 3 4 49 20
This is the max heap build-up. Now we exchange the first element with the last:
20 10 7 3 4 49 | 50
Now we call max-heapify on 20; nothing happens, and we will put 20 in position n-1, which is wrong.
We are building the heap in a bottom-up manner but calling heapify in a top-down manner; I think this is why it gives a wrong result on this input.
Your algorithm to build the max heap has an error. The array
50 10 7 3 4 49 20
does not represent a valid max heap. In the traditional array representation, that array corresponds to this tree:
        50
   10        7
 3    4   49   20
That's not a valid heap because 49 and 20 are larger than their parent.
You need to fix your bottom-up heap construction algorithm.
Your heap sort algorithm is incomplete. After the array becomes 50 10 7 3 4 49 20, max_heapify is called again on the 3rd node, which swaps the 3rd and 6th values in the array and produces the max heap 50 10 49 3 4 7 20. Now you swap the first element with the last element in the array, call max_heapify on the remaining n-1 elements, and continue like this until you get the desired result.
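The point made in both answers can be checked with a short Python sketch of the textbook procedure. It uses 0-based array indices (the book uses 1-based ones), and the function names are only chosen to mirror the ones used above. The important detail is that max_heapify keeps sifting the element down after a swap, which is the step the question skipped.

```python
def max_heapify(a, i, n):
    """Sift a[i] down within a[:n] until the subtree rooted at i is a max heap."""
    while True:
        left, right = 2 * i + 1, 2 * i + 2
        largest = i
        if left < n and a[left] > a[largest]:
            largest = left
        if right < n and a[right] > a[largest]:
            largest = right
        if largest == i:
            return
        a[i], a[largest] = a[largest], a[i]
        i = largest  # keep going: the swapped-down value may still be too small

def build_max_heap(a):
    """Bottom-up construction: heapify every internal node, last one first."""
    for i in range(len(a) // 2 - 1, -1, -1):
        max_heapify(a, i, len(a))

def heapsort(a):
    """Sort the list in place in ascending order."""
    build_max_heap(a)
    for end in range(len(a) - 1, 0, -1):
        a[0], a[end] = a[end], a[0]   # move the current maximum to its final place
        max_heapify(a, 0, end)        # restore the heap on the remaining prefix

heap = [7, 10, 20, 3, 4, 49, 50]
build_max_heap(heap)
print(heap)    # [50, 10, 49, 3, 4, 7, 20]

data = [7, 10, 20, 3, 4, 49, 50]
heapsort(data)
print(data)    # [3, 4, 7, 10, 20, 49, 50]
```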

Sum Pyramid with backtracking

I'm trying to solve this problem and I'm new to backtracking algorithms.
The problem is about building a pyramid in which every number sitting on two numbers is their sum. Every number in the pyramid has to be different and less than 100. Like this:
88
39 49
15 24 25
4 11 13 12
1 3 8 5 7
Any pointers on how to do this using backtracking?
Not necessarily backtracking, but the property you are asking for is, interestingly, very similar to the Pascal's triangle property.
Pascal's triangle (http://en.wikipedia.org/wiki/Pascal's_triangle), which is used for efficient computation of binomial coefficients among other things, is a pyramid where a number is equal to the sum of the two numbers above it, with the top being 1.
As you can see, you are asking for the opposite property, where a number is the sum of the two numbers below it.
1
1 1
1 2 1
1 3 3 1
1 4 6 4 1
1 5 10 10 5 1
1 6 15 20 15 6 1
1 7 21 35 35 21 7 1
1 8 28 56 70 56 28 8 1
For instance, in the Pascal's triangle above, if you wanted the top of your pyramid to be 56, your pyramid would be a bottom-up reconstruction of Pascal's triangle starting from 56, which gives something like:
56
21 35
6 15 20
1 5 10 10
Again, that's not a backtracking solution, and it might not give a good enough solution for every single N (the numbers are not guaranteed to be distinct; 10 appears twice above), but I thought this was an interesting approximation that was worth noting.
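For the backtracking approach the question asks about, here is one possible sketch in Python. It assumes the numbers must be positive integers (the question only says "different and less than 100"), and the name `solve_pyramid` and its parameters are illustrative. The idea: choose the bottom row one number at a time; each new bottom number fixes a whole new diagonal of sums, so a candidate can be rejected as soon as that diagonal contains a duplicate or a value that is too large.

```python
def solve_pyramid(height=5, limit=100):
    """Return the rows of a valid pyramid (top row first), or None if there is none."""
    used = set()       # every number currently in the partial pyramid
    diagonals = []     # diagonals[i] = numbers fixed when the i-th base number was placed

    def place(col):
        if col == height:
            return True
        prev = diagonals[-1] if diagonals else []
        for candidate in range(1, limit):
            # new[0] is the base number, new[j] is the sum sitting j rows above it.
            new = [candidate]
            for j in range(col):
                new.append(new[j] + prev[j])
            if max(new) >= limit or len(set(new)) != len(new) or used.intersection(new):
                continue                  # prune: this candidate breaks a rule
            used.update(new)
            diagonals.append(new)
            if place(col + 1):
                return True
            diagonals.pop()               # backtrack and try the next candidate
            used.difference_update(new)
        return False

    if not place(0):
        return None
    # Row r (0 = base) takes entry r of every diagonal that is long enough.
    rows = [[d[r] for d in diagonals if len(d) > r] for r in range(height)]
    return rows[::-1]

for row in solve_pyramid():
    print(*row)
```

The search returns the first pyramid it finds, which will generally differ from the one shown in the question; any pyramid satisfying the two rules is accepted.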

Print Maximum List

We are given a set F={a1,a2,a3,…,aN} of N fruits. Each fruit has a price Pi and a vitamin content Vi. We have to arrange these fruits in a list such that the prices are in ascending order and the vitamin contents are in descending order.
For example:
N=4
Pi: 2 5 7 10
Vi: 8 11 9 2
This is the exact question https://cs.stackexchange.com/questions/1287/find-subsequence-of-maximal-length-simultaneously-satisfying-two-ordering-constr/1289#1289
I'd try to reduce the problem to the longest increasing subsequence problem:
(1) Sort the list according to the first criterion [vitamins].
(2) Then find the longest increasing subsequence in the modified list,
according to the second criterion [price].
This solution is O(n log n), since both step (1) and step (2) can be done in O(n log n) each.
Have a look at the Wikipedia article on the longest increasing subsequence, under Efficient Algorithms, to see how you can implement it.
EDIT:
If your list allows duplicates, your sort [step (1)] will have to sort by the second parameter as secondary criteria, in case of equality of the primary criteria.
Example [your example 2]:
Pi::99 12 34 10 87 19 90 43 13 78
Vi::10 23 4 5 11 10 18 90 100 65
After step 1 you get [sorting when Vi is primary criteria, descending]:
Pi:: 013 43 78 12 90 87 87 99 10 34
Vi:: 100 90 65 23 18 11 10 10 05 04
Step two finds the longest increasing subsequence in Pi, and you get:
(13,100), (43,90), (78,65), (87,11), (99,10)
as a feasible solution, since it is an increasing subsequence [according to Pi] in the sorted list.
P.S. Here I am assuming that the increasing subsequence you want is strictly increasing; otherwise the result is (13,100),(43,90),(78,65),(87,11),(87,10),(99,10), which is a longer subsequence, but it is not strictly increasing/decreasing according to Pi and Vi.
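The two steps can be sketched in Python as follows. This is only an illustration under stated assumptions: both orderings are taken to be strict, and ties in vitamins are broken by descending price (my choice, so that two fruits with equal vitamins can never both end up in the result). The function name `best_fruit_list` is made up for this example; `bisect_left` is from the standard library.

```python
from bisect import bisect_left

def best_fruit_list(prices, vitamins):
    """Longest list with strictly increasing price and strictly decreasing vitamins."""
    # Step 1: sort by vitamins, descending (ties: price, descending).
    fruits = sorted(zip(prices, vitamins), key=lambda f: (-f[1], -f[0]))

    # Step 2: longest strictly increasing subsequence of the prices.
    tail_prices = []               # tail_prices[k] = smallest last price of a run of length k+1
    tail_index = []                # position in `fruits` of that last element
    parent = [-1] * len(fruits)    # previous element of the best run ending at each fruit
    for i, (price, _) in enumerate(fruits):
        k = bisect_left(tail_prices, price)
        if k == len(tail_prices):
            tail_prices.append(price)
            tail_index.append(i)
        else:
            tail_prices[k] = price
            tail_index[k] = i
        parent[i] = tail_index[k - 1] if k > 0 else -1

    # Walk the parent links back from the end of the longest run.
    result = []
    i = tail_index[-1] if tail_index else -1
    while i != -1:
        result.append(fruits[i])
        i = parent[i]
    return result[::-1]

print(best_fruit_list([2, 5, 7, 10], [8, 11, 9, 2]))
# [(5, 11), (7, 9), (10, 2)]
```

Each pair is (price, vitamins). The sort and the binary-search loop are both O(n log n), which matches the bound stated above.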

Understanding "median of medians" algorithm

I want to understand "median of medians" algorithm on the following example:
We have 45 distinct numbers divided into 9 groups of 5 elements each.
48 43 38 33 28 23 18 13 8
49 44 39 34 29 24 19 14 9
50 45 40 35 30 25 20 15 10
51 46 41 36 31 26 21 16 53
52 47 42 37 32 27 22 17 54
The first step is sorting every group (in this case they are already sorted).
The second step is to recursively find the "true" median of the medians (50 45 40 35 30 25 20 15 10), i.e. the set will be divided into 2 groups:
50 25
45 20
40 15
35 10
30
sorting these 2 groups
30 10
35 15
40 20
45 25
50
The medians are 40 and 15 (when the number of elements is even we take the left median),
so the returned value is 15. However, the "true" median of the medians (50 45 40 35 30 25 20 15 10) is 30. Moreover, there are only 5 elements less than 15, which is far fewer than the 30% of 45 mentioned in Wikipedia,
and so T(n) <= T(n/5) + T(7n/10) + O(n) fails.
By the way in the Wikipedia example, I get result of recursion as 36. However, the true median is 47.
So, I think in some cases this recursion may not return true median of medians. I want to understand where is my mistake.
The problem is in the step where you say to find the true median of the medians. In your example, you had these medians:
50 45 40 35 30 25 20 15 10
The true median of this data set is 30, not 15. You don't find this median by splitting the groups into blocks of five and taking the median of those medians, but instead by recursively calling the selection algorithm on this smaller group. The error in your logic is assuming that the median of this group is found by splitting the above sequence into two blocks
50 45 40 35 30
and
25 20 15 10
then finding the median of each block. Instead, the median-of-medians algorithm will recursively call itself on the complete data set 50 45 40 35 30 25 20 15 10. Internally, this will split the group into blocks of five and sort them, etc., but it does so to determine the partition point for the partitioning step, and it's in this partitioning step that the recursive call will find the true median of the medians, which in this case will be 30. If you use 30 as the pivot in the partitioning step of the original algorithm, you do indeed get a very good split as required.
Hope this helps!
Here is the pseudocode for median of medians algorithm (slightly modified to suit your example). The pseudocode in wikipedia fails to portray the inner workings of the selectIdx function call.
I've added comments to the code for explanation.
// L is the array on which median of medians needs to be found.
// k is the expected median position. E.g. first select call might look like:
// select (array, N/2), where 'array' is an array of numbers of length N
select(L,k)
{
    if (L has 5 or fewer elements) {
        sort L
        return the element in the kth position
    }
    partition L into subsets S[i] of five elements each
        (there will be n/5 subsets total).
    for (i = 1 to n/5) do
        x[i] = select(S[i],3)
    M = select({x[i]}, n/10)
    // The code to follow ensures that even if M turns out to be the
    // smallest/largest value in the array, we'll get the kth smallest
    // element in the array
    // Partition array into three groups based on their value as
    // compared to median M
    partition L into L1<M, L2=M, L3>M
    // Compare the expected median position k with length of first array L1
    // Run recursive select over the array L1 if k is less than length
    // of array L1
    if (k <= length(L1))
        return select(L1,k)
    // Check if k falls in L3 array. Recurse accordingly
    else if (k > length(L1)+length(L2))
        return select(L3,k-length(L1)-length(L2))
    // Simply return M since k falls in L2
    else return M
}
Taking your example:
The median of medians function will be called over the entire array of 45 elements like (with k = 45/2 = 22):
median = select({48 49 50 51 52 43 44 45 46 47 38 39 40 41 42 33 34 35 36 37 28 29 30 31 32 23 24 25 26 27 18 19 20 21 22 13 14 15 16 17 8 9 10 53 54}, 45/2)
The first time M = select({x[i]}, n/10) is called, array {x[i]} will contain the following numbers: 50 45 40 35 30 20 15 10.
In this call, n = 45, and hence the select function call will be M = select({50 45 40 35 30 20 15 10}, 4)
The second time M = select({x[i]}, n/10) is called, array {x[i]} will contain the following numbers: 40 20.
In this call, n = 9 and hence the call will be M = select({40 20}, 0).
This select call will return and assign the value M = 20.
Now, coming to the point where you had a doubt, we now partition the array L around M = 20 with k = 4.
Remember array L here is: 50 45 40 35 30 20 15 10.
The array will be partitioned into L1, L2 and L3 according to the rules L1 < M, L2 = M and L3 > M. Hence:
L1: 10 15
L2: 20
L3: 30 35 40 45 50
Since k = 4, it's greater than length(L1) + length(L2) = 3. Hence, the search will be continued with the following recursive call now:
return select(L3,k-length(L1)-length(L2))
which translates to:
return select({30 35 40 45 50}, 1)
which will return 30 as a result. (since L has 5 or fewer elements, hence it'll return the element in kth i.e. 1st position in the sorted array, which is 30).
Now, M = 30 will be received in the first select function call over the entire array of 45 elements, and the same partitioning logic which separates the array L around M = 30 will apply to finally get the median of medians.
Phew! I hope I was verbose and clear enough to explain median of medians algorithm.
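For anyone who wants to experiment, here is a small runnable Python sketch of the same recursion. It is not a line-by-line translation of the pseudocode: k is 1-based, the median position of a list of m elements is taken as (m + 1) // 2, and for an even-sized group the left median is used. These conventions are my own assumptions, chosen so that the median of the nine group medians comes out as the 5th smallest.

```python
def select(values, k):
    """Return the k-th smallest element of values (k is 1-based)."""
    if len(values) <= 5:
        return sorted(values)[k - 1]

    # Median of every group of (up to) five elements; left median for even sizes.
    groups = [sorted(values[i:i + 5]) for i in range(0, len(values), 5)]
    medians = [g[(len(g) - 1) // 2] for g in groups]

    # True median of the medians: a recursive call on the whole list of medians.
    pivot = select(medians, (len(medians) + 1) // 2)

    # Three-way partition around the pivot.
    smaller = [v for v in values if v < pivot]
    equal = [v for v in values if v == pivot]
    larger = [v for v in values if v > pivot]

    if k <= len(smaller):
        return select(smaller, k)
    if k > len(smaller) + len(equal):
        return select(larger, k - len(smaller) - len(equal))
    return pivot

data = [48, 49, 50, 51, 52, 43, 44, 45, 46, 47, 38, 39, 40, 41, 42,
        33, 34, 35, 36, 37, 28, 29, 30, 31, 32, 23, 24, 25, 26, 27,
        18, 19, 20, 21, 22, 13, 14, 15, 16, 17, 8, 9, 10, 53, 54]
group_medians = [50, 45, 40, 35, 30, 25, 20, 15, 10]

print(select(group_medians, 5))   # 30, the true median of the medians
print(select(data, 23))           # 32, the median of all 45 numbers
```

Because the pivot is always an element of the list, both recursive calls work on strictly smaller lists, so the recursion terminates even when the first pivot guess (15 for the nine medians here) is a poor one.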

How can I define a verb in J that applies a different verb alternately to each atom in a list?

Imagine I've defined the following name in J:
m =: >: i. 2 4 5
This looks like the following:
1 2 3 4 5
6 7 8 9 10
11 12 13 14 15
16 17 18 19 20
21 22 23 24 25
26 27 28 29 30
31 32 33 34 35
36 37 38 39 40
I want to create a monadic verb of rank 1 that applies to each list in this list of lists. It will double (+:) or add 1 (>:) to each alternate item in the list. If we were to apply this verb to the first row, we'd get 2 3 6 5 10.
It's fairly easy to get a list of booleans which alternate with each item, e.g., 0 1 $~{:$ m gives us 0 1 0 1 0. I thought, aha! I'll use something like +:`>: @. followed by some expression, but I could never quite get it to work.
Any suggestions?
UPDATE
The following appears to work, but perhaps it can be refactored into something more elegant by a J pro.
poop =: monad define
(($ y) $ 0 1 $~{:$ y) ((]+:)`(]>:) @. [)"0 y
)
I would use the oblique verb, with rank 1 (/."1)- so it applies to successive elements of each list in turn.
You can pass a gerund into /. and it applies them in order, extending cyclically.
+:`>: /."1 m
2
3
6
5
10
12
8
16
10
20
22
13
26
15
30
32
18
36
20
40
42
23
46
25
50
52
28
56
30
60
62
33
66
35
70
72
38
76
40
80
I spent a long time looking at it, and I believe that I know why ,@ works to recover the shape of the argument.
The shape of the arguments to the parenthesized phrase is the shape of the argument passed to it on the right, even though the rank is altered by the " conjunction (well, that is what trace called it; I thought it was an adverb). If , were monadic, it would be a ravel, and the result would be a vector, or at least of a lower rank than the input, based on adverbs to ravel. That is what happens if you take the conjunction out: you get a vector.
So what I believe is happening is that the conjunction is making , act like a dyadic , which is called an append. The append reshapes what it is appending to match what it is appending to. It is appending to nothing, but that thing still has a shape, and so it ends up altering the intermediate vector back to the shape of the input.
Now I'm probably wrong. But $ ,"0@(+:`>:/.)"1 >: i. 2 4 5 gives 2 4 5 1 1, which I thought sort of proved my case.
(,@(+:`>:/.)"1 a) works, but note that ((* 2 1 $~ $)@(+ 0 1 $~ $)"1 a) would also have worked (and is about 20 times faster on large arrays, in my brief tests).
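For readers who don't speak J, the arithmetic behind that last expression is easy to spell out in Python: add the cyclic mask 0 1 0 1 ... to the row, then multiply by the cyclic mask 2 1 2 1 .... This is only an illustration of the masks on a single row; the function name `alternate` is made up.

```python
def alternate(row):
    """Double the items at even positions and add 1 to the items at odd positions."""
    add = [i % 2 for i in range(len(row))]       # 0 1 0 1 0 ...
    mul = [2 - i % 2 for i in range(len(row))]   # 2 1 2 1 2 ...
    return [(x + a) * m for x, a, m in zip(row, add, mul)]

print(alternate([1, 2, 3, 4, 5]))   # [2, 3, 6, 5, 10]
```

Where the multiplier is 2 the addend is 0, and where the addend is 1 the multiplier is 1, so each item is either doubled or incremented, never both.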

Resources