I have a collection-sharing problem, and I don't know its name.
The problem is simple to state: is there a general formula or algorithm that,
given any set of numbers, shares the numbers out into groups, where each group has a constraint on which numbers it may contain?
For example, split the following set into groups matching the conditions below, so that each number is used only once. If the numbers cannot be split fairly, the remaining numbers should be left over: S = {1,2,3,4,5,6,7},
A:Odd
B:Even
C:Multiple of 3
In this example the answer is obvious: A:{1,5} B:{2,4} C:{3,6} S:{7}, but each general method I come up with fails in certain cases.
I tried simplifying the problem by going directly to which numbers are valid in each set. In the above example, this would make the inputs the following sets instead:
A: 1,3,5,7
B: 2,4,6
C: 3,6
I thought that if a number has no conflicts (it is not a part of more than one set) you can just give it to that set, and I feel like this is an intuitive part of the solution. For conflicts I tried a few different approaches: giving the number to the set with the smallest number of remaining valid numbers, and giving it to the set that currently holds the fewest values.
Both of these break in certain scenarios, namely when the set that currently holds the fewest values, or the one with the fewest valid numbers left, takes an item that it doesn't need and another set does, as in the case of:
A: 1,2,3,4
B: 3,4,5,6,7,8,9,10,11,12
C: 5,6,7,8,9,10,11,12
D: 4,5,6,7,8,9,10,11,12
When giving each conflict to the set where the number of valid numbers left is the smallest, A would take 1, 2, 3 and 4, which is more than its fair share of 3. Of course you could check for this, but I fear that this would appear in more scenarios than just the last number of the set.
In this case with my method the result is going to be:
A: 1,2,3,4
B: 5,8,11
C: 6,9,12
D: 7,10
A correct answer for the above case would have been:
A: 1,2,3
B: 4,5,6
C: 7,8,9
D: 10,11,12
I tried brute-forcing this problem by checking permutations, but as you can imagine it became unmanageable very quickly.
My feeling is that this is an NP-complete problem, but I am unsure how to prove this, how to describe the problem as such if it is, or which NP-complete problem it is.
Edit2: I think the solution of David Eisenstat works but I will check it before I call the question solved.
Example list of strings:
1.) "a"
2.) "ab"
3.) "bc"
4.) "dc"
5.) "efa"
6.) "ef"
7.) "gh"
8.) "hi"
You can choose number 1.): that is 1 string with 1 letter in it, "a".
You can also choose 1.) and 2.): these are 2 strings with only two different letters in them, "a" and "b".
Other valid string combinations:
1.) 2.) 3.)
1.) 5.) 6.)
There's no valid combination with "h" (it would be ideal if cases like this could be proven to have no answer; however, you can assume the program only needs to work when there's a valid answer).
There could be an extra condition that the strings you choose must include one specified letter; however, simply finding all the possible combinations would solve the problem just as well. E.g. with the specified letter "c", the only solution in this case would be: 1.) 2.) 3.)
[optional information] The purpose of this: I want to make a program which can choose, from a big list of equations (probably around 100), which ones can be used to solve for a variable. Each equation gets one string, with each letter in the string representing one unknown. The equations are all independent, i.e. none can be derived from the others, so you need as many equations as there are unknowns in them. Solving for the unknowns will be done in a CAS, so you don't need to worry about it. However, I believe the CAS (Maxima) might have a limit on how many equations it can solve simultaneously, and it might be too slow if you give it too many unnecessary equations at a time.
As a start I would use an algorithm to reduce the number of strings, just to make it faster. First, all strings containing the specified letter go into the reduced list; then all strings containing any letter from the strings already in the reduced list are added, until none is added. E.g. the reduced list of "g" would be 7.) "gh" and 8.) "hi". This would only remove some unnecessary strings, but the task would remain the same with the rest.
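The reduction step described above can be sketched as follows. This is only an illustration: the function name `reduced_list` and the representation of equations as plain strings of single-letter unknowns are assumptions made for the example.

```python
def reduced_list(strings, letter):
    """Collect every string connected to `letter` through shared unknowns."""
    letters = {letter}            # unknowns known to be connected to `letter`
    remaining = list(strings)
    picked = []
    changed = True
    while changed:                # repeat until a full pass adds nothing new
        changed = False
        for s in list(remaining):
            if letters & set(s):  # s shares at least one unknown
                picked.append(s)
                remaining.remove(s)
                letters |= set(s)
                changed = True
    return picked


equations = ["a", "ab", "bc", "dc", "efa", "ef", "gh", "hi"]
print(reduced_list(equations, "g"))  # ['gh', 'hi']
```

For the letter "c" the same call returns six of the eight strings, so the reduction only helps when the equations fall into separate groups.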
I think this can be solved by taking away unnecessary strings from the reduced list until all the remaining are needed, however I don't know how to explicitly define which strings would be unnecessary (except for those mentioned in the previous paragraph).
If you work with the extra condition: this is an optimization task. I don't need a perfect solution, only a reasonably good one. The program doesn't need to find the absolute minimum number of strings that give a solution. Having a few extra strings in the solution would probably only slow the computer down, but it would be acceptable.
Edit: Optional clarification about the meaning of the strings: each letter in a string represents an unknown in an equation, so the equation a=2 would be represented by "a" because that's the only unknown. The equation a+b=0 would be represented by "ab" and b^2-c=0 by "bc".
I'm not sure what to call this problem. It seems NP-hard, so I'm going to suggest an integer programming formulation, which can be attacked by an off-the-shelf solver.
Let x_i be a 0-1 variable indicating whether equation i is included in the output. Let y_j be a 0-1 variable indicating whether variable j is included in the output. We have constraints
for all equations i, for all variables j in equation i, y_j - x_i >= 0.
We need as many equations as variables in the output.
(sum over all equations i of x_i) - (sum over all variables j of y_j) = 0
As you point out, the empty set needs specifically to be disallowed. Let k be a variable that must appear in the output.
sum over all equations i containing variable k of x_i >= 1
Naturally, the objective is
minimize sum over all equations i of x_i.
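For small inputs, the constraints above can be checked by plain enumeration, which may be handy for validating a solver model. The sketch below is not an integer-programming solver: `smallest_square_system` is a hypothetical name, and trying subsets in order of increasing size stands in for the minimisation objective.

```python
from itertools import combinations


def smallest_square_system(equations, k):
    """Smallest set of equations that involves unknown k and has exactly
    as many equations as distinct unknowns, found by enumeration."""
    for size in range(1, len(equations) + 1):   # objective: fewest equations
        for chosen in combinations(equations, size):
            unknowns = set().union(*chosen)     # y_j = 1 for every unknown used
            if k in unknowns and len(unknowns) == size:
                return list(chosen)
    return None                                 # no valid selection exists


equations = ["a", "ab", "bc", "dc", "efa", "ef", "gh", "hi"]
print(smallest_square_system(equations, "c"))  # ['a', 'ab', 'bc']
print(smallest_square_system(equations, "h"))  # None
```

Enumeration is exponential in the number of equations, so for a list of around 100 equations the integer program is the practical route.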
I am looking for an algorithm that can derive a key from a shuffle that has already happened.
Assume we've got the string "hello", which was shuffled:
"hello" -> "loelh"
Now I would like to derive a key k from it which I could use to undo the shuffling. So if we use k as the input parameter for a deterministic shuffling algorithm such as Fisher-Yates and shuffle "loelh" again, we would restore the initial string "hello".
What I do not mean is simply using one and the same deterministic shuffling algorithm to shuffle and de-shuffle. That's because in my case the first string would not really have been shuffled in the classical sense. Actually there would be two sets of data (byte or bit arrays) which are just given, and we want to get from the first to the second one with just a key which has been derived beforehand.
I hope it's clear what I want to achieve and I would appreciate all hints or proposed solutions.
Regards,
Merrit
UPDATE:
Another attempt:
Basically, one could also call it a deterministic transformation of a bunch of data, e.g. a byte array, but I will stick with the "hello" string example.
Assume we've got a transformation-algorithm transform(data, "unknown seed") where data is "hello" and unknown seed is what we are looking for. The result of transform is "loelh". We are looking for this "unknown seed" which we could use to reverse the process. At the time of the "unknown seed"-generation, both, the input data AND the result are known of course.
Later on I want to use the "unknown seed" (which should be known already ;-) to get the original string again: so this transform("loelh", seed) should lead to "hello" again.
So you could also see it as a form of equation like data*["unknown value"]=resultdata and we are trying to find the unknown value (the operator * could be any kind of operation).
First of all, let's simplify the problem greatly. Instead of permuting "hello", let's assume that you are always permuting "abcde", as that will make it easier to understand.
A shuffle is the random generation of a permutation. How the shuffle generates the permutation is irrelevant; shuffles generate permutations, that's all we need to know.
Let's state a permutation as a string containing the numbers 1 through 5. Suppose the shuffle produces permutation "21453". That is, we take the first letter and put it in position 2: _a___. We take the second letter and put it in position 1: ba___. We take the third letter and put it in position 4: ba_c_. We take the fourth letter and put it in position 5: ba_cd. And we take the fifth letter and put it in position 3: baecd.
Now you wish to deduce a "key" which allows you to "unpermute" the permutation. Well, that's just another permutation, called the inverse permutation. To compute the inverse permutation of "21453" you do the following:
Find "1". It's in the 2nd spot.
Find "2". It's in the 1st spot.
Find "3". It's in the 5th spot.
Find "4". It's in the 3rd spot.
Find "5". It's in the 4th spot.
And now read down the second column; the inverse permutation of "21453" is "21534". We are unpermuting "baecd". We put the first letter in position 2: _b___. We put the second letter in position 1: ab___. We put the third letter in position 5: ab__e. We put the fourth letter in position 3: abc_e. And we put the fifth letter in position 4: abcde.
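The "find each number and note its spot" recipe can be sketched in a few lines. The function names here are made up for illustration, and the digit-string notation limits this version to at most nine positions.

```python
def apply_permutation(text, perm):
    """Put the i-th symbol of `text` at position perm[i] (positions are 1-based)."""
    out = [None] * len(text)
    for symbol, position in zip(text, perm):
        out[int(position) - 1] = symbol
    return "".join(out)


def inverse_permutation(perm):
    """If digit d sits at spot s in `perm`, the inverse has s at spot d."""
    inv = [None] * len(perm)
    for spot, digit in enumerate(perm, start=1):
        inv[int(digit) - 1] = str(spot)
    return "".join(inv)


key = inverse_permutation("21453")
shuffled = apply_permutation("abcde", "21453")
assert key == "21534"
assert apply_permutation(shuffled, key) == "abcde"  # the key undoes the shuffle
```

Applying the permutation and then its inverse always returns the original string, whatever the symbols are.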
Shuffling is just creating a random permutation of a given sequence. The typical way to do that is something like the Fisher-Yates Shuffle that you pointed out. The problem is that the shuffle program generates multiple random numbers based on a seed, and unless you implement the random number generator there's no easy way to reverse the sequence of random numbers.
There is another way to do it. What if you could generate the nth permutation of a sequence directly? That is, given the string "Fast", you define the first few permutations as:
0 Fast
1 Fats
2 Fsat
3 Fsta
... etc. for all 24 permutations
You want a random permutation of those four characters. Select a random number from 0 to 23 and then call a function to generate that permutation.
If you know the key, you can call a different function, again passing that key, to have it reverse the permutation back to the original.
In the fourth article in his series on permutations, Eric Lippert showed how to generate the nth permutation without having to generate all of the permutations that come before it. He doesn't show how to reverse the process, but doing so shouldn't be difficult if you understand how the generator works. It's well worth the time to study the entire series of articles.
If you don't know what the key (i.e. the random number used) is, then deriving the sequence of swaps required to get to the original order is expensive.
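The key-based approach can be sketched with the factorial number system. This is not the code from the articles mentioned above, only a minimal illustration; `nth_permutation` and `unshuffle` are hypothetical names, and the key n is assumed to lie between 0 and len(items)! - 1.

```python
from math import factorial


def nth_permutation(items, n):
    """Return permutation number n (0-based), numbered as in the list above."""
    pool = list(items)
    result = []
    for remaining in range(len(pool), 0, -1):
        block = factorial(remaining - 1)  # permutations per choice of next item
        index, n = divmod(n, block)
        result.append(pool.pop(index))
    return result


def unshuffle(shuffled, n):
    """Undo nth_permutation when the key n is known."""
    order = nth_permutation(range(len(shuffled)), n)  # source index of each slot
    original = [None] * len(shuffled)
    for slot, source in enumerate(order):
        original[source] = shuffled[slot]
    return original


print("".join(nth_permutation("Fast", 3)))  # Fsta
print("".join(unshuffle("Fsta", 3)))        # Fast
```

The same key drives both directions: one function builds the permutation, the other reads it backwards.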
Edit
Upon reflection, it just might be possible to derive the key if you're given the original sequence and the transformed sequence. Since you know how far each symbol has moved, you should be able to derive the key. Consider the possible permutations of two letters:
0. ab 1. ba
Now, assign the letter b the value of 0, and the letter a the value of 1. What permutation number is ba? Find a in the string, swap to the left until it gets to the proper position, and multiply the number of swaps by one.
That's too easy. Consider the next one:
0. abc 1. acb 2. bac
3. cab 4. bca 5. cba
a is now 2, b is 1, and c is 0. Given cab:
swap a left one space. 1x2 = 2. Result is `acb`
swap b left one space. 1x1 = 1. Result is `abc`
So cab is permutation #3.
This does assume that your permutation generator numbers the permutations in the same way. It's also not a terribly efficient way of doing things. The worst case will require n(n-1)/2 swaps. You can optimize the swaps by moving things in an array, but it's still an O(n^2) algorithm, where n is the length of the sequence. Not terrible for 100 or maybe even 1,000 items. Pretty bad after that, though.
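The swap-counting idea can be sketched as below. One assumption goes beyond the text: the per-letter values are taken to be factorials (2 and 1 for three letters, matching the example), which is what keeps the numbers unique for longer sequences. The name `permutation_number` is hypothetical, and the symbols are assumed to be distinct.

```python
from math import factorial


def permutation_number(original, shuffled):
    """Number a permutation by counting left-swaps, as in the 'cab' walk-through."""
    items = list(shuffled)
    n = len(original)
    number = 0
    for target, symbol in enumerate(original):
        pos = items.index(symbol, target)         # symbols assumed unique
        swaps = pos - target                      # left-swaps needed for this symbol
        # Assumed weight: (n - 1 - target)!, i.e. 2 for a and 1 for b when n == 3.
        number += swaps * factorial(n - 1 - target)
        items.insert(target, items.pop(pos))      # move the symbol into place
    return number


print(permutation_number("abc", "cab"))  # 3
```

For the six three-letter permutations this reproduces the numbering in the table above.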
Given a set A of n positive integers, determine a non-empty subset B
consisting of as few elements as possible such that their GCD is 1 and output its size.
For example: 5 6 10 12 15 18
yields an output of "3", while:
5 2 4 6 8 10
equals "NONE" since no subset can be determined.
So it seems really basic, but I'm still stuck with it. My thoughts on it are as follows: multiples of a number already present in the set are useless, since each is just that number times some factor k and we're going for the smallest subset. Hence, for every n_i, we remove any k*n_i, where k is a positive integer, from further calculations.
That's where I get stuck, though. What should I do next? I can only think of a dumb, brute-force approach: checking whether there is some 2-element subset, then a 3-element one, and so on. What should I check to determine it in some more clever way?
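For reference, the brute-force approach mentioned above might look like this. It is only a baseline sketch, not a clever solution; it assumes the first number on each example line is n, and the function name is made up.

```python
from functools import reduce
from itertools import combinations
from math import gcd


def smallest_coprime_subset_size(numbers):
    """Size of the smallest non-empty subset with GCD 1, or None if there is none."""
    if reduce(gcd, numbers) != 1:  # if the whole set fails, every subset fails
        return None
    for size in range(1, len(numbers) + 1):
        for subset in combinations(numbers, size):
            if reduce(gcd, subset) == 1:
                return size


# The leading 5 in each example line is taken to be n, so it is left out here.
print(smallest_coprime_subset_size([6, 10, 12, 15, 18]))  # 3
print(smallest_coprime_subset_size([2, 4, 6, 8, 10]))     # None
```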
Suppose for each A,B (two elements) we calculate their greatest common
divisor D. And then we store these D values somewhere as a map of the form:
A,B -> D
Let's say we also store the reverse map
D -> A,B
If there's at least one D=1 then there we go - the answer is 2.
Suppose now, there's no such D that D=1.
What condition should be met for the answer to be 3?
I think this one:
there exist two D values say D1 and D2 such that GCD(D1, D2)=1.
Right?
So now, instead of As and Bs, we've transformed our problem into the
same problem over the set of all Ds, and the "answer is 2" case over
the Ds corresponds to the "answer is 3" case over the original elements. Right?
I am not 100% sure, just thinking out loud.
But this transformed problem is even worse, as
we have to store many more values
(the number of 2-element combinations of N elements).
Not sure, this problem you pose seems like a hard
problem to me. I would be surprised if there exists
a better approach than brute-force
and would be interested to know it.
What you need to think on (and look for) is this:
is there a way to express GCD(a1, a2, ... aN)
if you know their pair-wise GCDs. If there's some
sort of method or formula you can simplify a bit
your search (for the smallest subset matching
the desired criterion).
See also this link. Maybe it could help.
https://cs.stackexchange.com/questions/10249/finding-the-size-of-the-smallest-subset-with-gcd-1
The problem is definitely a tough one to solve. I can't see any computationally efficient algorithm that is guaranteed to find the solution in reasonable time.
One approach is:
Form a list of ordered sets that would contain the prime factors of each element in the original set.
Now you need to find the minimum number of sets whose intersection is empty.
To do that, first order these sets in your list so that the sets that have the least number of intersections with other sets are towards the beginning. Now, what does "least number of intersections" mean?
This is where heuristics come into play. It can be:
1. The set with the lower MIN number of intersections with other sets.
2. The set with the lower MAX number of intersections with other sets.
3. Any other more suitable definition.
Now you will need to iterate, expensively, through all the combinations, maybe through recursion, to determine the solution.
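A minimal sketch of this reformulation, without the ordering heuristic: build the set of prime factors of each element, then look for the fewest sets with an empty intersection. The function names are made up for the example, and plain trial division is assumed to be fast enough for the inputs.

```python
from itertools import combinations


def prime_factors(n):
    """Set of distinct prime factors of n, by trial division."""
    factors, p = set(), 2
    while p * p <= n:
        while n % p == 0:
            factors.add(p)
            n //= p
        p += 1
    if n > 1:
        factors.add(n)
    return factors


def fewest_sets_with_empty_intersection(numbers):
    """Try 1 set, then 2, and so on, until some group shares no prime."""
    sets = [prime_factors(n) for n in numbers]
    for size in range(1, len(sets) + 1):
        for group in combinations(sets, size):
            if not set.intersection(*group):
                return size
    return None


print(fewest_sets_with_empty_intersection([6, 10, 12, 15, 18]))  # 3
```

A group of numbers has GCD 1 exactly when no prime divides all of them, which is why an empty intersection is the right test.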
I have a 2D array that holds unique integers - this represents a physical container with rows/columns - in each position there is a vial.
I know the integers that should be in the array and where they should be located.
My array however is shuffled with potentially many/all unique integers in the wrong positions.
I now need to sort the array - however this maps to a physical process and therefore I really want to reduce the number of sort steps involved due to potential human error.
Is this just a plain sort, or is there a more specific name for this scenario? Are there well-known solutions?
My colleague has suggested just creating a list of "swap [1][1] with [2][1]" type instructions, which seems reasonable; however, I can't quite get my head around whether the order of swaps is important.
All assistance gratefully received.
If you really can tell, just by looking at the vial, where it belongs, the shortest way is to take the first vial that is in the wrong place out, then put it where it belongs, take whatever was there, put it to its proper place, etc., until you happen to get the vial that belongs where you originally made a "hole". Then repeat.
Since you take out each vial at most once, and only if it is in the wrong place, I think that this is optimal with respect to physical motion.
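The procedure above can be turned into a list of instructions. In this sketch `current` and `target` are hypothetical 2D lists of the same shape holding the same unique integers, and each returned cycle is the sequence of positions to visit: lift the vial at the first position, carry it to the next, lift what was there, and so on until the hole at the first position is filled.

```python
def plan_moves(current, target):
    """Return the cycles of (row, col) positions to walk through."""
    where = {vial: (r, c)
             for r, row in enumerate(target)
             for c, vial in enumerate(row)}   # vial -> position where it belongs
    done = set()
    cycles = []
    for r, row in enumerate(current):
        for c, vial in enumerate(row):
            if (r, c) in done or where[vial] == (r, c):
                continue                      # already handled, or already in place
            cycle, pos = [], (r, c)
            while pos not in done:            # follow the chain of displaced vials
                done.add(pos)
                cycle.append(pos)
                pr, pc = pos
                pos = where[current[pr][pc]]
            cycles.append(cycle)
    return cycles


target = [[1, 2], [3, 4]]
current = [[2, 3], [1, 4]]
print(plan_moves(current, target))  # [[(0, 0), (0, 1), (1, 0)]]
```

Vials that are already in place never appear in any cycle, so the operator is only told to touch misplaced ones.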
Sorting algorithms are analysed by the number of comparisons and the number of swaps required. Since for a human operator the cost of a swap is much higher than the cost of a comparison, you want a 2D sort that minimizes the number of swaps required.
"I can't quite get my head around if the order of swaps is important."
In general, yes, it is. For a simple example, consider the starting list of 3 elements, X Y Z.
The result of "swap 1 with 2, then 2 with 3" is Y Z X.
The result of "swap 2 with 3, then 1 with 2" is Z X Y.
The list of swaps you come up with will probably be (at most) 1 for each element that is out of place, and will swap that element with whatever is in its correct place. So for example you might swap [0][0] with wherever it belongs. Unless the place where it belongs happens to contain the element that belongs in [0][0], then your next swap could be, again [0][0] with wherever that belongs. So certainly the order of swaps is important - this second swap is only correct because the first swap has already happened, and moved some particular element into [0][0].
If two consecutive swaps are disjoint, though, then you can reverse their order: (1 2)(3 4) is equivalent to (3 4)(1 2), where (x y) is a mathematical notation for "swap x with y".
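A few lines are enough to check both claims. The helper `apply_swaps` is a made-up name and uses 1-based positions to match the wording above.

```python
def apply_swaps(items, swaps):
    """Apply 1-based position swaps in the given order and return the new list."""
    result = list(items)
    for i, j in swaps:
        result[i - 1], result[j - 1] = result[j - 1], result[i - 1]
    return result


print(apply_swaps("XYZ", [(1, 2), (2, 3)]))  # ['Y', 'Z', 'X']
print(apply_swaps("XYZ", [(2, 3), (1, 2)]))  # ['Z', 'X', 'Y']

# Disjoint swaps can be done in either order:
assert apply_swaps("WXYZ", [(1, 2), (3, 4)]) == apply_swaps("WXYZ", [(3, 4), (1, 2)])
```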
It's a theorem that any permutation can be written as a set of disjoint cycles. This decomposition into cycles is unique apart from which element in your cycle you choose to list first, and the order the cycles are listed, both of which are irrelevant to the result. The notation (1 2 3) means "move 1 to 2, 2 to 3, and 3 to 1", and is a 3-cycle. It's exactly the same as (2 3 1), but different from (1 3 2).
Depending how your human operative works, it might well be more efficient for them to carry out an n-cycle rather than an equivalent n swaps. So once you know how to sort your array (that is, you know what permutation must be performed on it to get it into order), it may be that the best thing to do is to generate that decomposition.
I want to find which elements of two arrays make the two arrays different.
For example, if I start off with
known_unacceptable_array = [bad, bad, good, good, good, bad, good]
known_acceptable_array = []
Suppose an array is only unacceptable if it contains three bads (but I don't know that at the time), and I'm able to evaluate whether an array is acceptable or unacceptable. I would like to find the smallest array that is still unacceptable:
possibly_minimal_unacceptable = [bad, bad, bad]
maximal_acceptable = [bad, bad] # Third bad required to make the array unacceptable
What is this problem called, and what algorithms are there for this?
Edit: The elements can't be changed in order, and adding an element can only either change the list from being acceptable to unacceptable or have no effect - it can't change it from being unacceptable to acceptable.
Background: I've randomly generated thousands of instructions that make a Ruby interpreter crash, and I want to isolate the specific instructions that cause it to crash. At the time I thought that multiple bad instructions were required to make it crash. A very naive attempt to determine what the bad instructions are is at this link
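To make the goal concrete, here is a sketch of the simplest reduction that fits the stated rules (order is preserved, and adding elements can never make an unacceptable array acceptable): drop elements one at a time and keep each removal that leaves the array unacceptable. The predicate `too_many_bads` is a hypothetical stand-in for "run it and see whether the interpreter crashes".

```python
def minimise(items, is_unacceptable):
    """Drop every element whose removal keeps the list unacceptable."""
    kept = list(items)
    i = 0
    while i < len(kept):
        candidate = kept[:i] + kept[i + 1:]
        if is_unacceptable(candidate):
            kept = candidate   # element i was not needed
        else:
            i += 1             # element i is required, keep it
    return kept


def too_many_bads(xs):
    # Hypothetical test: three or more "bad" entries make the list unacceptable.
    return xs.count("bad") >= 3


known_unacceptable_array = ["bad", "bad", "good", "good", "good", "bad", "good"]
print(minimise(known_unacceptable_array, too_many_bads))  # ['bad', 'bad', 'bad']
```

This needs one evaluation per element, which may be too slow when each evaluation means running thousands of instructions.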
What is determining the elements that make the difference
between two arrays called?
Differencing is often called
subtraction.
I want to determine which elements of two arrays make the
two arrays different.
Again, that's subtraction (at least
some form of it):
Given A = { x , y , z }, B = { x , y , a },
A - B = { z , -a }
or "only A has z and only B has a", or "z and a" make them
different.
For example, if I start off with
known_bad = [bad, bad, good, good, good, bad, good]
known_good = []
Why start with a full array and an empty one? Isn't this an
extreme case, or are these not the "two arrays" whose "difference"
you are trying to determine?
possibly_minimal_bad = [bad, bad, bad]
maximal_good = [bad, bad] # Third bad required to make the list bad
Is this just a set of rules? Or is this the result of
finding the difference between the two arrays of the previous
(known_good,bad) set?
What is this problem called, and what algorithms are there
for this?
If it isn't called "difference" or "subtraction" then why
introduce it that way?
Is the problem: a. going from
the first two arrays (known_xx) to the second two (min,max);
or is it: b. classifying finite sequences of the words "good"
and "bad."
a) I can't see a relation between the first two
arrays and the second two. How did you get from the first two
to the second?
b) Classifying a sequence of words could be
"parsing a language", or decoding a message, recognizing a
pattern, etc.
Is it "Pattern Recognition"?
It appears that you are looking for a pattern in test input (or test point) data and its relationship to product failure,
and want to represent the relationship in some codified
form for further analysis. Or you are searching for a correlation between certain test points and product failure. That makes this question rather
interesting. However, the presentation of the question
is quite confusing. Maybe those groups of
equations could be explained a little more, clarifying whether they are related, and if so, in what way.
I'm not entirely sure if I understand the question. If my answer is unsatisfactory, please rephrase your question to be more clear. I'll base my answer on this.
I want to determine which elements of two arrays make the two arrays different.
This is a combination of the three set operations union, intersection and difference. Different combinations can achieve the same result.
The complement (set difference) A\B is the subset of A which is not in B.
The intersection is the set of elements which are in both A and B, not just in one of them.
The union is the set of elements which are in either A or B (no duplicates).
It sounds like you want the union of both complements, which is:
(A\B) ∪ (B\A)
Or the union with the intersection removed:
(A∪B) \ (A∩B)
See http://en.wikipedia.org/wiki/Set_operations_(Boolean) for more information.
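Both combinations can be checked with Python's built-in sets. The element names are just placeholders reusing the small example from earlier in this thread.

```python
A = {"x", "y", "z"}
B = {"x", "y", "a"}

union_of_differences = (A - B) | (B - A)      # elements only in A, plus elements only in B
union_minus_intersection = (A | B) - (A & B)  # everything, minus what both share

# Both expressions give the symmetric difference, which Python writes as A ^ B.
assert union_of_differences == union_minus_intersection == A ^ B
print(sorted(A ^ B))  # ['a', 'z']
```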