Algorithms for Optimization of Integer Subset Linking

Consider two sets of integer values, each divided into multiple subsets. The two sets contain exactly the same values, but the order and the division into subsets differ. The idea is to link the subsets of the first set with those of the second set in such a way that every individual value in each subset of the first set is linked to an equal individual value in some subset of the second set. No value can be linked to more than one other value. In a single linking step, multiple values can be linked, but only between one subset of the first set and one subset of the second set. The goal is to use as few linking steps as possible.
The question is: are there algorithms for doing this kind of linking as optimally as possible?
I have done some research in several fields of mathematical optimization, such as linear programming, integer programming, combinatorial optimization and operations research, but none of the algorithms I found seem to cover this problem. Do you have any ideas, fields or algorithms for optimizing these kinds of problems that could point me in the right direction?
For example:
Two sets of integers, each with two subsets:
[[1, 2, 2] [2, 3, 3]]
and
[[1, 2, 3] [2, 2, 3]].
Now the first linking step could be to link the first subset of the first set, 1[1], with the first subset of the second set, 2[1].
This is one step and leads to a link between 1 - 1 - 1 and 2 - 1 - 1 (the value 1) and a link between 1 - 1 - 2 and 2 - 1 - 2 (the value 2), where x - y - z denotes element z of subset y of set x. Marking linked values with an asterisk, the sets now look like this:
[[1*, 2*, 2] [2, 3, 3]]
and
[[1*, 2*, 3] [2, 2, 3]].
The next step could be linking 1[1] with 2[2], leading to a link between 1 - 1 - 3 and 2 - 2 - 1 (the remaining value 2), after which the sets look like this:
[[1*, 2*, 2*] [2, 3, 3]]
and
[[1*, 2*, 3] [2*, 2, 3]].
The third step could be linking 1[2] with 2[1], linking the remaining value 3 of subset 2[1]. Resulting in:
[[1*, 2*, 2*] [2, 3*, 3]]
and
[[1*, 2*, 3*] [2*, 2, 3]].
And the fourth step could then be linking 1[2] to 2[2], linking the remaining values 2 and 3. Resulting in:
[[1*, 2*, 2*] [2*, 3*, 3*]]
and
[[1*, 2*, 3*] [2*, 2*, 3*]], which means every value is linked. This solution costs four steps.
With larger sets, every subset could be linked with every subset of the other set, but that would take many steps. Is there an algorithm that minimizes the number of steps?
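To make the step counting concrete, here is a minimal greedy sketch in Python (greedy_link is a hypothetical helper; it happens to find a four-step linking for the example above, but it is not guaranteed to minimize the number of steps in general):
from collections import Counter

def greedy_link(set1, set2):
    # Greedy heuristic: repeatedly link the pair of subsets with the
    # largest multiset overlap. Illustrative only; not guaranteed to
    # minimize the number of linking steps.
    remaining1 = [Counter(s) for s in set1]
    remaining2 = [Counter(s) for s in set2]
    steps = []
    while any(remaining1):
        best = None
        for i, a in enumerate(remaining1):
            for j, b in enumerate(remaining2):
                overlap = a & b              # multiset intersection
                size = sum(overlap.values())
                if size and (best is None or size > best[0]):
                    best = (size, i, j, overlap)
        if best is None:
            break                            # values do not match up
        size, i, j, overlap = best
        remaining1[i] -= overlap
        remaining2[j] -= overlap
        steps.append((i, j, sorted(overlap.elements())))
    return steps

print(greedy_link([[1, 2, 2], [2, 3, 3]], [[1, 2, 3], [2, 2, 3]]))
# [(0, 0, [1, 2]), (1, 1, [2, 3]), (0, 1, [2]), (1, 0, [3])]  -> 4 steps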

Even though this is not an answer, I think it is a step towards defining the problem and finding a solution.
Consider the following example: it is less costly (it uses fewer sub-sets) to use the 3rd sub-set of the first list than to use the 2nd and 5th sub-sets.
The algorithm:
1. Define the smaller list: list #2.
2. Create a counting list of all items in all sub-lists of list #2.
3. You will get this counting list {item: count}: {1: 3, 2: 2, 3: 1, 4: 2, 5: 1}.
4. Now the problem is no longer to link the sub-sets (i.e. index-dependent); it is to find the minimum number of sub-sets of list #1 whose items together give exactly the counts in the counting list.
5. Simply trying each possible combination would definitely get the answer (a brute-force sketch follows below), but I think that starting from point #4 we can find a better solution with some conditions that reduce the number of combinations to try.
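As a rough illustration of point #5, a brute-force search over combinations of sub-sets of list #1 could look like the following Python sketch (min_subsets_matching and the example list #1 are hypothetical, since the original example lists are not shown here; the counting list is taken from step 3):
from collections import Counter
from itertools import combinations

def min_subsets_matching(list1, counting_list):
    # Brute force: try combinations of sub-lists of list #1, smallest first,
    # and return the first combination whose combined item counts equal
    # the counting list. Exponential in the number of sub-lists, so only
    # practical for small inputs.
    for k in range(1, len(list1) + 1):
        for combo in combinations(range(len(list1)), k):
            total = Counter()
            for index in combo:
                total.update(list1[index])
            if total == counting_list:
                return combo
    return None

target = Counter({1: 3, 2: 2, 3: 1, 4: 2, 5: 1})    # counting list from step 3
list1 = [[1, 2, 3], [1, 4], [1, 2, 4, 5], [2, 3]]   # hypothetical list #1
print(min_subsets_matching(list1, target))          # (0, 1, 2)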
Hopefully this suggestion helps as a hint towards finding a solution.

Related

Question regarding mergesort's merge algorithm

Let's suppose we have two sorted arrays, A and B, each consisting of n elements. I don't understand why the time needed to merge these two is "n+n". In order to merge them we need 2n-1 comparisons. For example, in the two following arrays
A = [3, 5, 7, 9] and B = [2, 4, 6, 8]
we start merging them into a single array by comparing the elements in the usual way, until we finally compare 8 with 9. This will be our 2n-1 = 8-1 = 7th comparison, and 8 will be inserted into the new array.
After this, 9 will be inserted without another comparison. So I guess my question is: since there are 2n-1 comparisons, why do we say that this merging takes 2n time? I'm not saying O(n), I'm saying T(n) = 2n, an exact time function.
It's probably a detail that I'm missing here, so I would be very grateful if someone could provide some insight. Thanks in advance.
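One way to see where the extra unit could come from is that all 2n elements still have to be moved into the output array, even though the very last move needs no comparison. A small Python sketch that counts both comparisons and moves (merge_count is a made-up helper name):
def merge_count(a, b):
    # Merge two sorted lists, counting comparisons and element moves.
    out, i, j = [], 0, 0
    comparisons = moves = 0
    while i < len(a) and j < len(b):
        comparisons += 1
        if a[i] <= b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
        moves += 1
    for x in a[i:] + b[j:]:        # leftovers: copied without further comparisons
        out.append(x)
        moves += 1
    return out, comparisons, moves

print(merge_count([3, 5, 7, 9], [2, 4, 6, 8]))
# ([2, 3, 4, 5, 6, 7, 8, 9], 7, 8)  -> 2n-1 comparisons, but 2n moves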

flatten and compact an array more efficiently

On many occasions, we need to perform two or more different operations on an array like flatten and compact.
some_array.flatten.compact
My concern here is that it will loop over the array two times. Is there a more efficient way of doing this?
I actually think this is a great question. But first off, why is everyone not too concerned about this? Here's the performance of flatten and flatten.compact compared:
Here's the code I used to generate this chart, and one that includes memory.
Hopefully now you see why most folks won't worry: composing flatten with compact just adds another constant factor. Still, it is at least theoretically interesting to ask how we could shave off the time and space of this intermediate structure. Again, asymptotically it is not super valuable, but it is a curious thing to think about.
As far as I can tell, you can't do this by making use of flatten:
Before looking at the source, I hoped that flatten could take a block like so:
[[3, [3, 3, 3]], [3, [3, 3, 3]], [3, [3, 3, 3]], nil].flatten {|e| e unless e.nil? }
No dice though. We get this as a return:
[3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, nil]
This is weird in that it basically tosses the block away as a no-op. But it makes sense given the source: the C function that implements flatten in Ruby's core isn't parameterized to take a block.
The procedure in the Ruby source code reads kind of strangely to me (I am not a C programmer), but it is basically doing something like a depth-first search: it uses a stack to which it adds every new nested array it encounters, and it terminates when none remain. I have not worked this out formally, but it leads me to guess the complexity is on par with DFS.
So the source code could have been written so that this would work, by allowing for extra setup when a block is passed in. But without that, you're stuck with the (small) performance hit!
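Just to illustrate what a single-pass flatten-and-compact could look like (sketched in Python rather than Ruby, with a made-up flatten_compact name, since the built-in flatten cannot be extended):
def flatten_compact(items):
    # Yield non-nil (None) leaf values from arbitrarily nested lists
    # in a single depth-first pass, with no intermediate array.
    for item in items:
        if isinstance(item, list):
            yield from flatten_compact(item)
        elif item is not None:
            yield item

print(list(flatten_compact([[3, [3, 3, 3]], [3, [3, 3, 3]], None])))
# [3, 3, 3, 3, 3, 3, 3, 3]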
It is not iterating over the same array twice. flatten in general creates an array with an entirely different structure from the original one, so the first and the second iteration do not run over the same elements. It naturally follows that you cannot simply collapse them into one pass.
If the array is only one layer deep, then the arrays can be merged into a set:
require 'set'
s = Set.new
Ar.each{|a| s.merge(a)}

Split Algorithm on C++

I have an array with 8 elements:
a[8] = {9, 7, 6, 2, 3, 1, 5, 4}
I want to divide the 8 elements into 3 groups. Each group is the sum of one or more elements, and the sums of the groups should be as similar as possible.
You are describing the k-partition problem with k = 3.
Unfortunately, this problem is known to be (strongly) NP-hard, so there is no known efficient solution to it (and the general belief is that one does not exist).
Your best hope is brute-force search: create all partitions into 3 groups and choose the best one. With 8 elements that is still feasible, but I am afraid it will quickly become too slow for larger arrays.
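A brute-force sketch along those lines (in Python rather than C++ for brevity; best_3_partition is a hypothetical name) could assign every element to one of the three groups and keep the assignment whose group sums are closest together:
from itertools import product

def best_3_partition(values):
    # Try every assignment of elements to 3 groups (3^n possibilities)
    # and keep the one with the smallest spread between group sums.
    best_groups, best_spread = None, float('inf')
    for assignment in product(range(3), repeat=len(values)):
        sums = [0, 0, 0]
        for value, group in zip(values, assignment):
            sums[group] += value
        spread = max(sums) - min(sums)
        if spread < best_spread:
            best_groups, best_spread = assignment, spread
    groups = [[], [], []]
    for value, group in zip(values, best_groups):
        groups[group].append(value)
    return groups

print(best_3_partition([9, 7, 6, 2, 3, 1, 5, 4]))
# group sums 13, 12, 12 (the exact grouping returned depends on iteration order)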

Round robin for 3 participants/trios?

I'm speaking with reference to Scheduling algorithm for a round-robin tournament?.
I need to pair (or triple) a group of people into trios, in order for them to meet. For example, in a group of 9 people, the first meetings would be: [1, 2, 3], [4, 5, 6], [7, 8, 9]. Next meetings would be something like [1, 4, 7], [2, 5, 8], [3, 6, 9]. Things end when everyone has met everyone else, and we need to minimize the number of "rounds".
I'm at wit's end thinking of the solution to this. Many thanks to someone who can point me in the right direction :)
If "everyone has met everyone else" means that all pairs appear in the schedule, then this is a generalization of Kirkman's schoolgirl problem, solvable in the minimum number of rounds when there is an odd number of groups (existence of Kirkman triple systems, due to Ray-Chaudhuri and Wilson). The social golfer problem is a generalization to other group sizes, and I expect that the situation for even numbers of groups would be studied under that name.
In the (seemingly unlikely) event that "everyone has met everyone else" means that all possible groups have been used, then you want to use the construction in Baranyai's theorem to find hypergraph factors (see my previous answer on the topic for a Python implementation).
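For the 9-person case in the question, one concrete construction (a sketch; kirkman_9 is a made-up name) takes the players as the points of the affine plane of order 3 and uses its four parallel classes of lines as the rounds, so every pair meets exactly once in 4 rounds:
def kirkman_9():
    # Players are the 9 points (x, y) with x, y in {0, 1, 2}, labelled 1..9.
    # Each round is one parallel class of lines of AG(2, 3); two distinct
    # points lie on exactly one line, so every pair meets exactly once.
    def player(x, y):
        return 3 * x + y + 1

    rounds = []
    for m in range(3):                                    # lines y = m*x + c (mod 3)
        rounds.append([[player(x, (m * x + c) % 3) for x in range(3)]
                       for c in range(3)])
    rounds.append([[player(c, y) for y in range(3)]       # vertical lines x = c
                   for c in range(3)])
    return rounds

for round_ in kirkman_9():
    print(round_)
# [[1, 4, 7], [2, 5, 8], [3, 6, 9]]
# [[1, 5, 9], [2, 6, 7], [3, 4, 8]]
# [[1, 6, 8], [2, 4, 9], [3, 5, 7]]
# [[1, 2, 3], [4, 5, 6], [7, 8, 9]]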

decoding via number combinations algorithm in python 3

OK, so here is the problem.
Let's say:
1 means Bob
2 means Jerry
3 means Tom
4 means Henry
Any sum of two of the aforementioned numbers is a status/mood type, which is how the program will be encoded:
7 (4+3) means Angry
5 (3+2) means Sad
3 (2+1) means Mad
4 (3+1) means Happy
and so on...
How can I create a decode function that accepts one of the added (encoded) values, such as 7, 5, 3 or 4, figures out the combination, and returns the names of the people representing the two numbers that constitute the combination? Note that a number cannot be repeated to get a mood result, meaning 4 has to be 3+1 and may not be 2+2, so we can assume for this example that there is only one possible combination for each status/mood code. Now the problem is: how do you implement such code in Python 3? What would be the algorithm or logic for such a problem? How do you seek or check for a combination of two numbers? I'm thinking I should just run a loop that keeps adding two numbers at a time until the result matches the status/mood code. Will that work? But this method will quickly become impractical if the number of combined values increases (as in adding 4 numbers together instead of 2); doing it that way will take a lot of time and will probably be inefficient.
I apologize, I know this question is quite confusing, but please bear with me.
Let's try and work something out.
Use Binary
If you want to have sums that are unique, then assign each possible "Person" a number that's a power of 2. The sum of any combination of these numbers will uniquely identify which numbers were used in the sum.
1, 2, 4, 8, 16, ...
Rather than offer a detailed proof of correctness, I offer an intuitive argument about this: any number can be represented in base 2, and it is always a sum of exactly one combination of powers of 2.
This solution may not be optimal. It has practical limitations (32 or 64 different "person" identifiers, unless you use some sort of BigInt), but depending on your needs it might work. If you want the values to stay as small as possible, though, binary is better than any other radix.
Example
Here's a quick snippet that demonstrates how you could decode the sum. The returned values are the exponents of the powers of 2. count_persons could be arbitrarily large, as could the range of n iterated over (just as a quick example).
#!/usr/bin/python3
count_persons = 64
for n in range(20, 30):
    matches = list(filter(lambda i: (n >> i) & 0x1, range(1, count_persons)))
    print('{0}: {1}'.format(n, matches))
Output:
20: [2, 4]
21: [2, 4]
22: [1, 2, 4]
23: [1, 2, 4]
24: [3, 4]
25: [3, 4]
26: [1, 3, 4]
27: [1, 3, 4]
28: [2, 3, 4]
29: [2, 3, 4]
See a more appropriate answer here
In my opinion, the selected answer is so suboptimal that it can be considered plain wrong.
The table you are building can be indexed with N(N-1)/2 values, while the binary approach needs 2^N.
With a 64-bit unsigned integer, you could encode about sqrt(2^65) values, that is roughly 6 billion names, compared with the 64 names the binary approach allows.
Using a big-number library could push the limit somewhat, but the computations involved would be hugely more costly than the simple O(N) reverse-indexing algorithm needed by the alternative approach.
My conclusion is: the binary approach is grossly inefficient, unless you want to play with a handful of values, in which case hard-coding or precomputing the indexes would be just as good a solution.
Since the question is very unlikely to match a search on the subject, it is not that important anyway.
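A rough sketch of the reverse-indexing idea this answer alludes to (encode_pair and decode_pair are made-up names; note these codes are table indices, not the sums used in the question, and the person table is taken from the question):
def encode_pair(i, j):
    # Encode an unordered pair of distinct person numbers 1 <= i < j
    # as a single table index: row j of the triangle plus offset i.
    if not 1 <= i < j:
        raise ValueError("expected 1 <= i < j")
    return (j - 1) * (j - 2) // 2 + (i - 1)

def decode_pair(code):
    # Invert encode_pair by walking the triangular rows (the O(N)
    # reverse indexing mentioned above).
    j = 2
    while (j - 1) * (j - 2) // 2 <= code:
        j += 1
    j -= 1
    return code - (j - 1) * (j - 2) // 2 + 1, j

people = {1: "Bob", 2: "Jerry", 3: "Tom", 4: "Henry"}
code = encode_pair(2, 3)
i, j = decode_pair(code)
print(code, people[i], people[j])   # 2 Jerry Tom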
