Print Maximum List - algorithm

We are given a set F={a1,a2,a3,…,aN} of N Fruits. Each Fruits has price Pi and vitamin content Vi.Now we have to arrange these fruits in such a way that the list contains prices in ascending order and the list contains vitamins in descending order.
For example::
N=4
Pi: 2 5 7 10
Vi: 8 11 9 2
This is the exact question https://cs.stackexchange.com/questions/1287/find-subsequence-of-maximal-length-simultaneously-satisfying-two-ordering-constr/1289#1289

I'd try to reduce the problem to longest increasing subsequent problem.
Sort the list according to first criteria [vitamins]
Then, find the longest increasing subsequent in the modified list,
according to the second criteria [price]
This solution is O(nlogn), since both step (1) and (2) can be done in O(nlogn) each.
Have a look on the wikipedia article, under Efficient Algorithms - how you can implement longest increasing subsequent
EDIT:
If your list allows duplicates, your sort [step (1)] will have to sort by the second parameter as secondary criteria, in case of equality of the primary criteria.
Example [your example 2]:
Pi::99 12 34 10 87 19 90 43 13 78
Vi::10 23 4 5 11 10 18 90 100 65
After step 1 you get [sorting when Vi is primary criteria, descending]:
Pi:: 013 43 78 12 90 87 87 99 10 34
Vi:: 100 90 65 23 18 11 10 10 05 04
Step two finds for longest increasing subsequence in Pi, and you get:
(13,100), (43,90), (78,65), (87,11), (99,10)
as a feasible solution, since it is an increasing subsequence [according to Pi] in the sorted list.
P.S. In here I am assuming the increasing subsequence you want is strictly increasing, otherwise the result is (13,100),(43,90),(78,65),(87,11),(87,10),(99,10) - which is longer subsequence, but it is not strictly increasing/decreasing according to Pi and Vi

Related

Efficient minimum discovery in lists with merge/split

Suppose there are big lists (note that it is a list, not array) filled with numbers and they are unsorted.
We could merge and split these big lists. The problem is getting the minimum number in these lists with the minimum complexity.
For example, a list could have:
10 20 19 18 5 22 15 14 30 40 50 16
The minimum of this list is 5.
If we split the list at 30, we get
10 20 19 18 5 22 15 14 -> minimum is 5
30 40 50 16 -> minimum is 16
We could merge (the merge of A with B is always at the end of the A) the original list with another getting:
10 20 19 18 5 22 15 14 30 40 50 16 100 200 300 400 4 150 100 -> minimum is now 4
The minimum of the merge is trivial to obtain, but if we split the merged list again at any location then the minimum is not so trivial (at least for me). Splitting two times we would get:
10 20 19 18 5 22 -> minimum is 5
15 14 30 40 50 16 100 200 -> minimum is 14
300 400 4 150 100 -> minimum is 4
Language and memory is not an issue, we could get as much memory as needed. But if we could get an algorithm for the merge/split in O(log(N)) for all cases (best and worst case), that would be great!
Unfortunately, all my attempts to solve this are just trivial and result always in O(N). I tried to split the array in "M" and "m" sorted blocks, where "M" blocks would be blocks of numbers sorted in ascending order and "m" would be in decreasing order. But in the worst case (numbers are always going and "up" and "down") this is not efficient, at most O(N/2).
Thank you
M
This could be done by a variation of a Skip List.
On top of your lists, you have "layers". For each two elements in layer x, you have one element in layer x+1. This element is the minimum of the elements below it. (Note that an easier implementation used non deterministic coin flip with 50% to create a layer. This makes it easier to implement, but harder to explain)
So, in your example:
5
5 16
10 5 16
10 18 5 14 30 16
10 20 19 18 5 22 15 14 30 40 50 16
Now, both on merge and a split, you only need to modify elements from the modified element up (and not for the entire list). Since the height of the list is O(logn), you need to modify O(logn) elements.
Example, splitting at 30:
5
10 5
10 19 5 14
10 20 19 18 5 22 15 14
16
30 16
30 40 50 16
Note that you only need to modify elements above 30 and above 10 when splitting, the rest are guaranteed to be up to date.
Note that the undeterministic property makes it handy here - you don't need to adjust the layers to much the "every 2nd element" perfectly when you use non deterministic version. This what makes it easier to implement.

How do I create strictly ordered uniformly distributed buckets out of an array?

I'm looking to take an array of integers and perform a partial bucket sort on that array. Every element in the bucket before it is less than the current bucket elements. For example, if I have 10 buckets for the values 0-100 0-9 would go in the first bucket, 10-19 for the second and so on.
For one example I can take 1 12 23 44 48 and put them into 4 buckets out of 10. But if I have 1, 2, 7, 4, 9, 1 then all values go into a single bucket. I'm looking a way to evenly distribute values to all the buckets while maintaining a ordering. Elements in each bucket don't have to be sorted. For example I'm looking similar to this.
2 1 9 2 3 8 7 4 2 8 11 4 => [[2, 1], [2, 2], [3], [4], [4], [7], [8, 8], [9], [11]]
I'm trying to use this as a quick way to partition a list in a map-reduce.
Thanks for the help.
Edit, maybe this clears things up:
I want to create a hashing function where all elements in bucket1 < bucket2 < bucket3 ..., where each bucket is unsorted.
If I understand it correctly you have around 100TB of data, or 13,743,895,347,200 unsigned 64-bit integers, that you want to distribute over a number of buckets.
A first step could be to iterate over the input, looking at e.g. the highest 24 bits of each integer, and counting them. That will give you a list of 16,777,216 ranges, each with a count of on average 819,200 so it may be possible to store them in 32-bit unsigned integers, which will take up 64 MB.
You can then use this to create a lookup table that tells you which bucket each of those 16,777,216 ranges goes into. You calculate how many integers are supposed to go into each bucket (input size divided by number of buckets) and go over the array, keeping a running total of the count, and set each range to bucket 1, until the running total is too much for bucket 1, then you set the ranges to bucket 2, and so on...
There will of course always be a range that has to be split between bucket n and bucket n+1. To keep track of this, you create a second table that stores how many integers in these split ranges are supposed to go into bucket n+1.
So you now have e.g.:
HIGH 24-BIT RANGE BUCKET BUCKET+1
0 0 ~ 2^40-1 1 0
1 2^40 ~ 2*2^40-1 1 0
2 2*2^40 ~ 3*2^40-1 1 0
3 3*2^40 ~ 4*2^40-1 1 0
...
16 16*2^40 ~ 17*2^40-1 1 0
17 17*2^40 ~ 18*2^40-1 1 284,724 <- highest 284,724 go into bucket 2
18 18*2^40 ~ 19*2^40-1 2 0
...
You can now iterate over the input again, and for each integer look at the highest 24 bits, and use the lookup table to see which bucket the integer is supposed to go into. If the range isn't split, you can immediately move the integer into the right bucket. For each split range, you create an ordered list or priority queue that can hold as many integers as need to go into the next bucket; you store only the highest values in this list or queue; any smaller integer goes straight to the bucket, and if an integer is added to the full list or queue, the smallest value is moved to the bucket. At the end this list or queue is added to the next bucket.
The number of ranges should be as high as possible with the available memory, because that minimises the number of integers in split ranges. With the huge input you have, you may need to save the split ranges to disk, and then afterwards look at each of them seperately, find the highest x values, and move them to the buckets accordingly.
The complexity of this is N for the first run, then you iterate over the ranges R, then N as you iterate over the input again, and then for the split ranges you'll have something like M.logM to sort and M to distribute, so a total of 2*N + R + M.LogM + M. Using a high number of ranges to keep the number of integers in split ranges low will probably be the best strategy to speed the process up.
Actually, the number of integers M that are in split ranges depends on the number of buckets B and ranges R, with M = N × B/R, so that e.g. with a thousand buckets and a million ranges, 0.1% of the input would be in split ranges and have to be sorted. (These are averages, depending on the actual distribution.) That makes the total complexity 2×N + R + (N×B/R).Log(N×B/R) + N×B/R.
Another example:
Input: N = 13,743,895,347,200 unsigned 64-bit integers
Ranges: 232 (using the highest 32 bits of each integer)
Integers per range: 3200 (average)
Count list: 232 16-bit integers = 8 GB
Lookup table: 232 16-bit integers = 8 GB
Split range table: B 16-bit integers = 2×B bytes
With 1024 buckets, that would mean that B/R = 1/222, and there are 1023 split ranges with around 3200 integers each, or around 3,276,800 integers in total; these will then have to be sorted and distributed over the buckets.
With 1,048,576 buckets, that would mean that B/R = 1/212, and there are 1,048,575 split ranges with around 3200 integers each, or around 3,355,443,200 integers in total. (More than 65,536 buckets would of course require a lookup table with 32-bit integers.)
(If you find that the total of the counts per range doesn't equal the total size of the input, there has been overflow in the count list, and you should switch to a larger integer type for the counts.)
Let's run through a tiny example: 50 integers in the range 1-100 have to be distributed over 5 buckets. We choose a number of ranges, say 20, and iterate over the input to count the number of integers in each range:
2 9 14 17 21 30 33 36 44 50 51 57 69 75 80 81 87 94 99
1 9 15 16 21 32 40 42 48 55 57 66 74 76 88 96
5 6 20 24 34 50 52 58 70 78 99
7 51 69
55
3 4 2 3 3 1 3 2 2 3 5 3 0 4 2 3 1 2 1 3
Then, knowing that each bucket should hold 10 integers, we iterate over the list of counts per range, and assign each range to a bucket:
3 4 2 3 3 1 3 2 2 3 5 3 0 4 2 3 1 2 1 3 <- count/range
1 1 1 1 2 2 2 2 3 3 3 4 4 4 4 5 5 5 5 5 <- to bucket
2 1 1 <- to next
When a range has to be split between two buckets, we store the number of integers that should go to the next bucket in a seperate table.
We can then iterate over the input again, and move all the integers in non-split ranges into the buckets; the integers in split ranges are temporarily moved into seperate buckets:
bucket 1: 9 14 2 9 1 15 6 5 7
temp 1/2: 17 16 20
bucket 2: 21 33 30 32 21 24 34
temp 2/3: 36 40
bucket 3: 44 50 48 42 50
temp 3/4: 51 55 52 51 55
bucket 4: 57 75 69 66 74 57 57 70 69
bucket 5: 81 94 87 80 99 88 96 76 78 99
Then we look at the temp buckets one by one, find the x highest integers as indicated in the second table, move them to the next bucket, and what is left over to the previous bucket:
temp 1/2: 17 16 20 (to next: 2) bucket 1: 16 bucket 2: 17 20
temp 2/3: 36 40 (to next: 1) bucket 2: 36 bucket 3: 40
temp 3/4: 51 55 52 51 55 (to next: 1) bucket 3: 51 51 52 55 bucket 4: 55
And the end result is:
bucket 1: 9 14 2 9 1 15 6 5 7 16
bucket 2: 21 33 30 32 21 24 34 17 20 36
bucket 3: 44 50 48 42 50 40 51 51 52 55
bucket 4: 57 75 69 66 74 57 57 70 69 55
bucket 5: 81 94 87 80 99 88 96 76 78 99
So, out of 50 integers, we've had to sort a group of 3, 2 and 5 integers.
Actually, you don't need to create a table with the number of integers in the split ranges that should go to the next bucket. You know how many integers are supposed to go into each bucket, so after the initial distribution you can look at how many integers are already in each bucket, and then add the necessary number of (lowest value) integers from the split range. In the example above, which expects 10 integers per bucket, that would be:
3 4 2 3 3 1 3 2 2 3 5 3 0 4 2 3 1 2 1 3 <- count/range
1 1 1 / 2 2 2 / 3 3 / 4 4 4 4 5 5 5 5 5 <- to bucket
bucket 1: 9 14 2 9 1 15 6 5 7 <- add 1
temp 1/2: 17 16 20 <- 3-1 = 2 go to next bucket
bucket 2: 21 33 30 32 21 24 34 <- add 3-2 = 1
temp 2/3: 36 40 <- 2-1 = 1 goes to next bucket
bucket 3: 44 50 48 42 50 <- add 5-1 = 4
temp 3/4: 51 55 52 51 55 <- 5-4 = 1 goes to next bucket
bucket 4: 57 75 69 66 74 57 57 70 69 <- add 1-1 = 0
bucket 5: 81 94 87 80 99 88 96 76 78 99 <- add 0
The calculation of how much of the input will be in split ranges and need to be sorted, given above as M = N × B/R, is an average for input that is roughly evenly distributed. A slight bias, with more values in a certain part of the input space will not have much effect, but it would indeed be possible to craft worst-case input to thwart the algorithm.
Let's look again at this example:
Input: N = 13,743,895,347,200 unsigned 64-bit integers
Ranges: 232 (using the highest 32 bits of each integer)
Integers per range: 3200 (average)
Buckets: 1,048,576
Integers per bucket: 13,107,200
For a start, if there are ranges that contain more than 232 integers, you'd have to use 64-bit integers for the count table, so it would be 32GB in size, which could force you to use fewer ranges, depending on the available memory.
Also, every range that holds more integers than the target size per bucket is automatically a split range. So if the integers are distributed with a lot of local clusters, you may find that most of the input is in split ranges that need to be sorted.
If you have enough memory to run the first step using 232 ranges, then each range has 232 different values, and you could distribute the split ranges over the buckets using a counting sort (which has linear complexity).
If you don't have the memory to use 232 ranges, and you end up with problematically large split ranges, you could use the complete algorithm again on the split ranges. Let's say you used 228 ranges, expecting each range to hold around 51,200 integers, and you end up with an unexpectedly large split range with 5,120,000,000 integers that need to be distributed over 391 buckets. If you ran the algorithm again for this limited range, you'd have 228 ranges (each holding on average 19 integers with a maximum of 16 different values) for just 391 buckets, and only a tiny risk of ending up with large split ranges again.
Note: the ranges that have to be split over two or more buckets don't necessarily have to be sorted. You can e.g. use a recursive version of Dijkstra's Dutch national flag algorithm to partition the range into a part with the x smallest values, and a part with the largest values. The average complexity of partitioning would be linear (when using a random pivot), against the O(N.LogN) complexity of sorting.

Divisions of set with even number of elements [duplicate]

This question already has answers here:
Sets of all disjoint pairs
(2 answers)
Closed 7 years ago.
Is there any effective way to list all divisions of a set {1, ... , 2*n} into n pairs?
The easiest idea is to list all permutations and then permutation (a1, a2, ... , a8) means division {{a1,a2}, ... , {a7,a8}}. In this situation there are 2^n * n! permutations for each division. This is O((2n)!).
Can I find a more effective way?
Here's my pencil and paper algorithm:
f(set,result):
if the set is empty:
return result
otherwise:
pair one item from the set with
each of the remaining items,
calling f again with the pair added to
the result and out of the set
123456
12 34 56
35 46
36 45
13 24 56
25 46
26 45
14 23 56
25 36
26 35
...

Need to find lowest differences between first line of an array and the rest ones

Well, I've been given a number of pairs of elements (s,h), where s sends an h element on the s-th row of a 2d array.It is not necessary that each line has the same amount of elements, only known that there cannot be more than N elements on a line.
What I want to do is to find the lowest biggest difference(!) between a certain element of the first line and the rest ones.
Thus, if I have 3 lines with (101,92) (100,25,95,52,101) (93,108,0,65,200) what I want to find is 3, because I have to choose 92 and I have 95-92=3 from first to second and 93-92=1 form first to third.
I have reached a point where it is certain that if I have s lines with n(i) elements each and i=0..s, then n0<=n1<=...<=ns so as to have a good average performance scenario when picking the best-fit from 1st line towards the others.
However, I cannot think of a way lower than O(n2) or even maybe O(n3) in some cases. Does anyone have a suggestion about a fairly improved way to do this?
Combine all lines into a single list, also keeping track of which element comes from where.
Sort this list.
Have a last-value variable for each line.
For each item in the sorted list, update the last-value variable of the applicable list. If not all lines have a last-value set yet, do nothing. If it's an element from the first list:
Recalculate the biggest difference for all of the last-value variables. Store this difference.
If it's an element from any other list:
If all values have previous not been set, calculate the biggest difference. Otherwise, if the difference between the first list's last-value and this element is bigger than the biggest difference, update the biggest difference with this difference. Store this difference.
The smallest difference is the desired value.
Example:
Lists: (101,92) (100,25,95,52,101) (93,108,0,65,200)
Sorted 0 25 52 65 92 93 95 100 101 101 108 200
Source 2 1 1 2 0 2 1 1 0 1 2 2
Last[0] - - - - 92 92 92 92 101 101 101 101
Last[1] - 25 52 52 52 52 95 100 100 101 101 101
Last[2] 0 0 0 65 65 93 93 93 93 93 108 200
Diff - - - - 40 41 3 8 8 8 7 9
Best - - - - 40 40 3 3 3 3 3 3
Best = 3 as required. Storing the actual items or finding them afterwards should be easy enough.
Complexity:
Let n be the total number of items and k be the number of lists.
O(n log n) for the combine + sort.
O(nk) (worst case) for the scan through, since we're checking n items and, at each item, we do maximum O(k) work.
So O(n log n + nk).

Finding a set of permutations, with a constraint

I have a set of N^2 numbers and N bins. Each bin is supposed to have N numbers from the set assigned to it. The problem I am facing is finding a set of distributions that map the numbers to the bins, satisfying the constraint, that each pair of numbers can share the same bin only once.
A distribution can nicely be represented by an NxN matrix, in which each row represents a bin. Then the problem is finding a set of permutations of the matrix' elements, in which each pair of numbers shares the same row only once. It's irrelevant which row it is, only that two numbers were both assigned to the same one.
Example set of 3 permutations satisfying the constraint for N=8:
0 1 2 3 4 5 6 7
8 9 10 11 12 13 14 15
16 17 18 19 20 21 22 23
24 25 26 27 28 29 30 31
32 33 34 35 36 37 38 39
40 41 42 43 44 45 46 47
48 49 50 51 52 53 54 55
56 57 58 59 60 61 62 63
0 8 16 24 32 40 48 56
1 9 17 25 33 41 49 57
2 10 18 26 34 42 50 58
3 11 19 27 35 43 51 59
4 12 20 28 36 44 52 60
5 13 21 29 37 45 53 61
6 14 22 30 38 46 54 62
7 15 23 31 39 47 55 63
0 9 18 27 36 45 54 63
1 10 19 28 37 46 55 56
2 11 20 29 38 47 48 57
3 12 21 30 39 40 49 58
4 13 22 31 32 41 50 59
5 14 23 24 33 42 51 60
6 15 16 25 34 43 52 61
7 8 17 26 35 44 53 62
A permutation that doesn't belong in the above set:
0 10 20 30 32 42 52 62
1 11 21 31 33 43 53 63
2 12 22 24 34 44 54 56
3 13 23 25 35 45 55 57
4 14 16 26 36 46 48 58
5 15 17 27 37 47 49 59
6 8 18 28 38 40 50 60
7 9 19 29 39 41 51 61
Because of multiple collisions with the second permutation, since, for example they're both pairing the numbers 0 and 32 in one row.
Enumerating three is easy, it consists of 1 arbitrary permutation, its transposition and a matrix where the rows are made of the previous matrix' diagonals.
I can't find a way to produce a set consisting of more though. It seems to be either a very complex problem, or a simple problem with an unobvious solution. Either way I'd be thankful if somebody had any ideas how to solve it in reasonable time for the N=8 case, or identified the proper, academic name of the problem, so I could google for it.
In case you were wondering what is it useful for, I'm looking for a scheduling algorithm for a crossbar switch with 8 buffers, which serves traffic to 64 destinations. This part of the scheduling algorithm is input traffic agnostic, and switches cyclically between a number of hardwired destination-buffer mappings. The goal is to have each pair of destination addresses compete for the same buffer only once in the cycling period, and to maximize that period's length. In other words, so that each pair of addresses was competing for the same buffer as seldom as possible.
EDIT:
Here's some code I have.
CODE
It's greedy, it usually terminates after finding the third permutation. But there should exist a set of at least N permutations satisfying the problem.
The alternative would require that choosing permutation I involved looking for permutations (I+1..N), to check if permutation I is part of the solution consisting of the maximal number of permutations. That'd require enumerating all permutations to check at each step, which is prohibitively expensive.
What you want is a combinatorial block design. Using the nomenclature on the linked page, you want designs of size (n^2, n, 1) for maximum k. This will give you n(n+1) permutations, using your nomenclature. This is the maximum theoretically possible by a counting argument (see the explanation in the article for the derivation of b from v, k, and lambda). Such designs exist for n = p^k for some prime p and integer k, using an affine plane. It is conjectured that the only affine planes that exist are of this size. Therefore, if you can select n, maybe this answer will suffice.
However, if instead of the maximum theoretically possible number of permutations, you just want to find a large number (the most you can for a given n^2), I am not sure what the study of these objects is called.
Make a 64 x 64 x 8 array: bool forbidden[i][j][k] which indicates whether the pair (i,j) has appeared in row k. Each time you use the pair (i, j) in the row k, you will set the associated value in this array to one. Note that you will only use the half of this array for which i < j.
To construct a new permutation, start by trying the member 0, and verify that at least seven of forbidden[0][j][0] that are unset. If there are not seven left, increment and try again. Repeat to fill out the rest of the row. Repeat this whole process to fill the entire NxN permutation.
There are probably optimizations you should be able to come up with as you implement this, but this should do pretty well.
Possibly you could reformulate your problem into graph theory. For example, you start with the complete graph with N×N vertices. At each step, you partition the graph into N N-cliques, and then remove all edges used.
For this N=8 case, K64 has 64×63/2 = 2016 edges, and sixty-four lots of K8 have 1792 edges, so your problem may not be impossible :-)
Right, the greedy style doesn't work because you run out of numbers.
It's easy to see that there can't be more than 63 permutations before you violate the constraint. On the 64th, you'll have to pair at least one of the numbers with another its already been paired with. The pigeonhole principle.
In fact, if you use the table of forbidden pairs I suggested earlier, you find that there are a maximum of only N+1 = 9 permutations possible before you run out. The table has N^2 x (N^2-1)/2 = 2016 non-redundant constraints, and each new permutation will create N x (N choose 2) = 28 new pairings. So all the pairings will be used up after 2016/28 = 9 permutations. It seems like realizing that there are so few permutations is the key to solving the problem.
You can generate a list of N permutations numbered n = 0 ... N-1 as
A_ij = (i * N + j + j * n * N) mod N^2
which generates a new permutation by shifting the columns in each permutation. The top row of the nth permutation are the diagonals of the n-1th permutation. EDIT: Oops... this only appears to work when N is prime.
This misses one last permutation, which you can get by transposing the matrix:
A_ij = j * N + i

Resources