Whats is the proper name of this problem and or solution algorithm? - algorithm

I have a 2D array that holds unique integers - this represents a physical container with rows/columns - in each position there is a vial.
I know the integers that should be in the array and where they should be located.
My array however is shuffled with potentially many/all unique integers in the wrong positions.
I now need to sort the array - however this maps to a physical process and therefore I really want to reduce the number of sort steps involved due to potential human error.
Is this just a plain sort? or is there a more specific name for this scenario? Is there well known solutions?
My colleague has suggested just creating a list of swap [1][1] with [2][1] type instructions, which seems reasonable however I can't quite get my head around if the order of swaps is important.
All assistance grateful.

If you really can tell, just by looking at the vial, where it belongs, the shortest way is to take the first vial that is in the wrong place out, then put it where it belongs, take whatever was there, put it to its proper place, etc., until you happen to get the vial that belongs where you originally made a "hole". Then repeat.
Since you take out each vial at most once, and only if it is in the wrong place, I think that this is optimal with respect to physical motion.

Sorting algorithms are analysed by the number of comparisons and the number of swaps required. Since for a human operator the cost of a swap is much higher than the cost of a comparison, you want a 2D sort that minimizes the number of swaps required.

"I can't quite get my head around if the order of swaps is important."
I general yes, it is. For a simple example consider the starting list of 3 elements, X Y Z.
The result of "swap 1 with 2, then 2 with 3" is Y Z X.
The result of "swap 2 with 3, then 1 with 2" is Z X Y.
The list of swaps you come up with will probably be (at most) 1 for each element that is out of place, and will swap that element with whatever is in its correct place. So for example you might swap [0][0] with wherever it belongs. Unless the place where it belongs happens to contain the element that belongs in [0][0], then your next swap could be, again [0][0] with wherever that belongs. So certainly the order of swaps is important - this second swap is only correct because the first swap has already happened, and moved some particular element into [0][0].
If two consecutive swaps are disjoint, though, then you can reverse their order: (1 2)(3 4) is equivalent to (3 4)(1 2), where (x y) is a mathematical notation for "swap x with y".
It's a theorem that any permutation can be written as a set of disjoint cycles. This decomposition into cycles is unique apart from which element in your cycle you choose to list first, and the order the cycles are listed, both of which are irrelevant to the result. The notation (1 2 3) means "move 1 to 2, 2 to 3, and 3 to 1", and is a 3-cycle. It's exactly the same as (2 3 1), but different from (1 3 2).
Depending how your human operative works, it might well be more efficient for them to carry out an n-cycle rather than an equivalent n swaps. So once you know how to sort your array (that is, you know what permutation must be performed on it to get it into order), it may be that the best thing to do is to generate that decomposition.

Related

Dynamic algorithm to multiply elements in a sequence two at a time and find the total

I am trying to find a dynamic approach to multiply each element in a linear sequence to the following element, and do the same with the pair of elements, etc. and find the sum of all of the products. Note that any two elements cannot be multiplied. It must be the first with the second, the third with the fourth, and so on. All I know about the linear sequence is that there are an even amount of elements.
I assume I have to store the numbers being multiplied, and their product each time, then check some other "multipliable" pair of elements to see if the product has already been calculated (perhaps they possess opposite signs compared to the current pair).
However, by my understanding of a linear sequence, the values must be increasing or decreasing by the same amount each time. But since there are an even amount of numbers, I don't believe it is possible to have two "multipliable" pairs be the same (with potentially opposite signs), due to the issue shown in the following example:
Sequence: { -2, -1, 0, 1, 2, 3 }
Pairs: -2*-1, 0*1, 2*3
Clearly, since there are an even amount of pairs, the only case in which the same multiplication may occur more than once is if the elements are increasing/decreasing by 0 each time.
I fail to see how this is a dynamic programming question, and if anyone could clarify, it would be greatly appreciated!
A quick google for define linear sequence gave
A number pattern which increases (or decreases) by the same amount each time is called a linear sequence. The amount it increases or decreases by is known as the common difference.
In your case the common difference is 1. And you are not considering any other case.
The same multiplication may occur in the following sequence
Sequence = {-3, -1, 1, 3}
Pairs = -3 * -1 , 1 * 3
with a common difference of 2.
However this is not necessarily to be solved by dynamic programming. You can just iterate over the numbers and store the multiplication of two numbers in a set(as a set contains unique numbers) and then find the sum.
Probably not what you are looking for, but I've found a closed solution for the problem.
Suppose we observe the first two numbers. Note the first number by a, the difference between the numbers d. We then count for a total of 2n numbers in the whole sequence. Then the sum you defined is:
sum = na^2 + n(2n-1)ad + (4n^2 - 3n - 1)nd^2/3
That aside, I also failed to see how this is a dynamic problem, or at least this seems to be a problem where dynamic programming approach really doesn't do much. It is not likely that the sequence will go from negative to positive at all, and even then the chance that you will see repeated entries decreases the bigger your difference between two numbers is. Furthermore, multiplication is so fast the overhead from fetching them from a data structure might be more expensive. (mul instruction is probably faster than lw).

Algorithm determining the smallest coprime subset

Given a set A of n positive integers, determine a non-empty subset B
consisting of as few elements as possible such that their GCD is 1 and output its size.
For example: 5 6 10 12 15 18
yields an output of "3", while:
5 2 4 6 8 10
equals "NONE" since no subset can be determined.
So it seems really basic but I'm still stuck with it. My thoughts on it are as follows: we know that having the multiples of some number already present in the set are useless since their divisors are the same times some factor k and we're going for the smallest subsest. Hence, for every ni, we remove any kni where k is a positive int from further calculations.
That's where I get stuck, though. What should I do next? I can only think of a dumb, brute force approach of trying if there is already some 2-element subset, then 3-elem and so on. What should I check to determine it in some more clever way?
Suppose for each A,B (two elements) we calculate their greatest common
divisor D. And then we store these D values somewhere as a map of the form:
A,B -> D
Let's say we also store the reverse map
D -> A,B
If there's at least one D=1 then there we go - the answer is 2.
Suppose now, there's no such D that D=1.
What condition should be met for the answer to be 3?
I think this one:
there exist two D values say D1 and D2 such that GCD(D1, D2)=1.
Right?
So now instead of As and Bs, we've transformed our problem to the
same problem over the set of all Ds and we've transformed the option of
a the 2 answer to the option a 3 answer. Right?
I am not 100% sure just thinking out loud.
But this transformed problem is even worse as
we have to store much more values.
(combinations of N elements class 2).
Not sure, this problem you pose seems like a hard
problem to me. I would be surprised if there exists
a better approach than brute-force
and would be interested to know it.
What you need to think on (and look for) is this:
is there a way to express GCD(a1, a2, ... aN)
if you know their pair-wise GCDs. If there's some
sort of method or formula you can simplify a bit
your search (for the smallest subset matching
the desired criterion).
See also this link. Maybe it could help.
https://cs.stackexchange.com/questions/10249/finding-the-size-of-the-smallest-subset-with-gcd-1
The problem is definitely a tough one to solve. I can't see any computationally efficient algorithm that would guaranteed find the solution in reasonable time.
One approach is:
Form a list of ordered sets that would contain the prime factors of each element in the original set.
Now you need to find the minimum number of sets for which their intersection is zero.
To do that, first order these sets in your list so that the sets that have least number of intersections with other sets are towards the beginning. Now what are "least number of intersections"?
This is where heuristics come into play. It can be:
1. set having Less of MIN number of intersections with other elements.
2. set having Less of MAX number of intersections with other elements.
3. Any other more suitable definition.
Now you will need to expensively iterate through all the combinations maybe through recursion to determine the solution.

Algorithm for expressing reordering, as minimum number of object moves

This problem arises in synchronization of arrays (ordered sets) of objects.
Specifically, consider an array of items, synchronized to another computer. The user moves one or more objects, thus reordering the array, behind my back. When my program wakes up, I see the new order, and I know the old order. I must transmit the changes to the other computer, reproducing the new order there. Here's an example:
index 0 1 2
old order A B C
new order C A B
Define a move as moving a given object to a given new index. The problem is to express the reordering by transmitting a minimum number of moves across a communication link, such that the other end can infer the remaining moves by taking the unmoved objects in the old order and moving them into as-yet unused indexes in the new order, starting with the lowest index and going up. This method of transmission would be very efficient in cases where a small number of objects are moved within a large array, displacing a large number of objects.
Hang on. Let's continue the example. We have
CANDIDATE 1
Move A to index 1
Move B to index 2
Infer moving C to index 0 (the only place it can go)
Note that the first two moves are required to be transmitted. If we don't transmit Move B to index 2, B will be inferred to index 0, and we'll end up with B A C, which is wrong. We need to transmit two moves. Let's see if we can do better…
CANDIDATE 2
Move C to index 0
Infer moving A to index 1 (the first available index)
Infer moving B to index 2 (the next available index)
In this case, we get the correct answer, C A B, transmitting only one move, Move C to index 0. Candidate 2 is therefore better than Candidate 1. There are four more candidates, but since it's obvious that at least one move is needed to do anything, we can stop now and declare Candidate 2 to be the winner.
I think I can do this by brute forcibly trying all possible candidates, but for an array of N items there are N! (N factorial) possible candidates, and even if I am smart enough to truncate unnecessary searches as in the example, things might still get pretty costly in a typical array which may contain hundreds of objects.
The solution of just transmitting the whole order is not acceptable, because, for compatibility, I need to emulate the transmissions of another program.
If someone could just write down the answer that would be great, but advice to go read Chapter N of computer science textbook XXX would be quite acceptable. I don't know those books because, I'm, hey, only an electrical engineer.
Thanks!
Jerry Krinock
I think that the problem is reducible to Longest common subsequence problem, just find this common subsequence and transmit the moves that are not belonging to it. There is no prove of optimality, just my intuition, so I might be wrong. Even if I'm wrong, that may be a good starting point to some more fancy algorithm.
Information theory based approach
First, have a bit series such that 0 corresponds to 'regular order' and 11 corresponds to 'irregular entry'. Whenever there in irregular entry also add the original location of the entry that is next.
Eg. Assume original order of ABCDE for the following cases
ABDEC: 001 3 01 2
BCDEA: 1 1 0001 0
Now, if the probability of making a 'move' is p, this method requires roughly n + n*p*log(n) bits.
Note that if p is small the number of 0s is going to be high. You can further compress the result to:
n*(p*log(1/p) + (1-p)*log(1/(1-p))) + n*p*log(n) bits

Genetic algorithms: How to do crossover in "subset" problems?

I have a problem which I am trying to solve with genetic algorithms. The problem is selecting some subset (say 4) of 100 integers (these integers are just ids that represent something else). Order does not matter, the solution to the problem is a SET of integers not an ordered list. I have a good fitness function but am having trouble with the crossover function.
I want to be able to mate the following two chromosomes:
[1 2 3 4] and
[3 4 5 6] into something useful. Clearly I cannot use the typical crossover function because I could end up with duplicates in my children which would represent invalid solutions. What is the best crossover method in this case.
Just ignore any element that occurs in both of the sets (i.e. in their intersection.), that is leave such elements unchanged in both sets.
The rest of the elements form two disjoint sets, to which you can apply pretty much any random transformation (e.g. swapping some pairs randomly) without getting duplicates.
This can be thought of as ordering and aligning both sets so that matching elements face each other and applying one of the standard crossover algorithms.
Sometimes it is beneficial to let your solution go "out of bounds" so that your search will converge more quickly. Rather than making a set of 4 unique integers a requirement for your chromosome, make the number of integers (and their uniqueness) part of the fitness function.
Since order doesn't matter, just collect all the numbers into an array, sort the array, throw out the duplicates (by disconnecting them from a linked list, or setting them to a negative number, or whatever). Shuffle the array and take the first 4 numbers.
I don't really know what you mean on "typical crossover", but I think you could use a crossover similar to what is often used for permutations:
take m ints from the first parent (m < n, where n is the number of ints in your sets)
scan the second and fill your subset from it with (n-m) ints that are free (not in the subset already).
This way you will have n ints from the first and n-m ints from the second parent, without duplications.
Sounds like a valid crossover for me :-).
I guess it might be beneficial not to do either steps on ordered sets (or using an iterator where the order of returned elements correlates somehow with the natural ordering of ints), otherwise either smaller or higher numbers will get a higher chance to be in the child making your search biased.
If it is the best method depends on the problem you want to solve...
In order to combine sets A and B, you could choose the resulting set S probabilistically so that the probability that x is in S is (number of sets out of A, B, which contain x) / 2. This will be guaranteed to contain the intersection and be contained in the union, and will have expected cardinality 4.

Dynamic Programming: Sum-of-products

Let's say you have two lists, L1 and L2, of the same length, N. We define prodSum as:
def prodSum(L1, L2) :
ans = 0
for elem1, elem2 in zip(L1, L2) :
ans += elem1 * elem2
return ans
Is there an efficient algorithm to find, assuming L1 is sorted, the number of permutations of L2 such that prodSum(L1, L2) < some pre-specified value?
If it would simplify the problem, you may assume that L1 and L2 are both lists of integers from [1, 2, ..., N].
Edit: Managu's answer has convinced me that this is impossible without assuming that L1 and L2 are lists of integers from [1, 2, ..., N]. I'd still be interested in solutions that assume this constraint.
I want to first dispell a certain amount of confusion about the math, then discuss two solutions and give code for one of them.
There is a counting class called #P which is a lot like the yes-no class NP. In a qualitative sense, it is even harder than NP. There is no particular reason to believe that this counting problem is any better than #P-hard, although it could be hard or easy to prove that.
However, many #P-hard problems and NP-hard problems vary tremendously in how long they take to solve in practice, and even one particular hard problem can be harder or easier depending on the properties of the input. What NP-hard or #P-hard mean is that there are hard cases. Some NP-hard and #P-hard problems also have less hard cases or even outright easy cases. (Others have very few cases that seem much easier than the hardest cases.)
So the practical question could depend a lot on the input of interest. Suppose that the threshold is on the high side or on the low side, or you have enough memory for a decent number of cached results. Then there is a useful recursive algorithm that makes use of two ideas, one of them already mentioned: (1) After partially assigning some of the values, the remaining threshold for list fragments may rule out all of the permutations, or it may allow all of them. (2) Memory permitting, you should cache the subtotals for some remaining threshold and some list fragments. To improve the caching, you might as well pick the elements from one of the lists in order.
Here is a Python code that implements this algorithm:
list1 = [1,2,3,4,5,6,7,8,9,10,11]
list2 = [1,2,3,4,5,6,7,8,9,10,11]
size = len(list1)
threshold = 396 # This is smack in the middle, a hard value
cachecutoff = 6 # Cache results when up to this many are assigned
def dotproduct(v,w):
return sum([a*b for a,b in zip(v,w)])
factorial = [1]
for n in xrange(1,len(list1)+1):
factorial.append(factorial[-1]*n)
cache = {}
# Assumes two sorted lists of the same length
def countprods(list1,list2,threshold):
if dotproduct(list1,list2) <= threshold: # They all work
return factorial[len(list1)]
if dotproduct(list1,reversed(list2)) > threshold: # None work
return 0
if (tuple(list2),threshold) in cache: # Already been here
return cache[(tuple(list2),threshold)]
total = 0
# Match the first element of list1 to each item in list2
for n in xrange(len(list2)):
total += countprods(list1[1:],list2[:n] + list2[n+1:],
threshold-list1[0]*list2[n])
if len(list1) >= size-cachecutoff:
cache[(tuple(list2),threshold)] = total
return total
print 'Total permutations below threshold:',
print countprods(list1,list2,threshold)
print 'Cache size:',len(cache)
As the comment line says, I tested this code with a hard value of the threshold. It is quite a bit faster than a naive search over all permutations.
There is another algorithm that is better than this one if three conditions are met: (1) You don't have enough memory for a good cache, (2) the list entries are small non-negative integers, and (3) you're interested in the hardest thresholds. A second situation to use this second algorithm is if you want counts for all thresholds flat-out, whether or not the other conditions are met. To use this algorithm for two lists of length n, first pick a base x which is a power of 10 or 2 that is bigger than n factorial. Now make the matrix
M[i][j] = x**(list1[i]*list2[j])
If you compute the permanent of this matrix M using the Ryser formula, then the kth digit of the permanent in base x tells you the number of permutations for which the dot product is exactly k. Moreover, the Ryser formula is quite a bit faster than the summing over all permutations directly. (But it is still exponential, so it does not contradict the fact that computing the permanent is #P-hard.)
Also, yes it is true that the set of permutations is the symmetric group. It would be great if you could use group theory in some way to accelerate this counting problem. But as far as I know, nothing all that deep comes from that description of the question.
Finally, if instead of exactly counting the number of permutations below a threshold, you only wanted to approximate that number, then probably the game changes completely. (You can approximate the permanent in polynomial time, but that doesn't help here.) I'd have to think about what to do; in any case it isn't the question posed.
I realized that there is another kind of caching/dynamic programming that is missing from the above discussion and the above code. The caching implemented in the code is early-stage caching: If just the first few values of list1 are assigned to list2, and if a remaining threshold occurs more than once, then the cache allows the code to reuse the result. This works great if the entries of list1 and list2 are integers that are not too large. But it will be a failed cache if the entries are typical floating point numbers.
However, you can also precompute at the other end, when most of the values of list1 have been assigned. In this case, you can make a sorted list of the subtotals for all of the remaining values. And remember, you can use up list1 in order, and do all of the permutations on the list2 side. For example, suppose that the last three entries of list1 are [4,5,6], and suppose that three of the values in list2 (somewhere in the middle) are [2.1,3.5,3.7]. Then you would cache a sorted list of the six dot products:
endcache[ [2.1, 3.5, 3.7] ] = [44.9, 45.1, 46.3, 46.7, 47.9, 48.1]
What does this do for you? If you look in the code that I did post, the function countprods(list1,list2,threshold) recursively does its work with a sub-threshold. The first argument, list1, might have been better as a global variable than as an argument. If list2 is short enough, countprods can do its work much faster by doing a binary search in the list endcache[list2]. (I just learned from stackoverflow that this is implemented in the bisect module in Python, although a performance code wouldn't be written in Python anyway.) Unlike the head cache, the end cache can speed up the code a lot even if there are no numerical coincidences among the entries of list1 and list2. Ryser's algorithm also stinks for this problem without numerical coincidences, so for this type of input I only see two accelerations: Sawing off a branch of the search tree using the "all" test and the "none" test, and the end cache.
Probably not (without the simplifying assumption): your problem is NP-Hard. Here's a trivial reduction to SUBSET-SUM. Let count_perms(L1, L2, x) represent the function "count the number of permutations of L2 such that prodSum(L1, L2) < x"
SUBSET_SUM(L2,n): # (determine if any subset of L2 adds up to n)
For i in [1,...,len(L2)]
Set L1=[0]*(len(L2)-i)+[1]*i
calculate count_perms(L1,L2,n+1)-count_perms(L1,L2,n)
if result positive, return true
Return false
Thus, if there were a way to calculate your function count_perms(L1, L2, x) efficiently, then we would have an efficient algorithm to calculate SUBSET_SUM(L2,n).
This also turns out to be an abstract algebra problem. It's been awhile for me, but here's a few things to get started. There's nothing terribly significant about the following (it's all very basic; an expansion on the fact that every group is isomorphic to a permutation group), but it provides a different way of looking at the problem.
I'll try to stick to fairly standard notation: "x" is a vector, and "xi" is the ith component of x. If "L" is a list, L is the equivalent vector. "1n" is a vector with all components = 1. The set of natural numbers ℕ is taken to be the positive integers. "[a,b]" is the set of integers from a through b, inclusive. "θ(x, y)" is the angle formed by x and y
Note prodSum is the dot product. The question is equivalent to finding all vectors L generated by an operation (permuting elements) on L2 such that θ(L1, L) less than a given angle α. The operation is equivalent to reflecting a point in ℕn through a subspace with presentation:
< ℕn | (xixj-1)(i,j) ∈ A >
where i and j are in [1,n], A has at least one element and no (i,i) is in A (i.e. A is a non-reflexive subset of [1,n]2 where |A| > 0). Stated more plainly (and more ambiguously), the subspaces are the points where one or more components are equal to one or more other components. The reflections correspond to matrices whose columns are all the standard basis vectors.
Let's name the reflection group "RPn" (it should have another name, but memory fails). RPn is isomorphic to the symmetric group Sn. Thus
|RPn| = |Sn| = n!
In 3 dimensions, this gives a group of order 6. The reflection group is D3, the triangle symmetry group, as a subgroup of the cube symmetry group. It turns out you can also generate the points by rotating L2 in increments of π/3 around the line along 1n. This is the the modular group ℤ6 and this points to a possible solution: find a group of order n! with a minimal number of generators and use that to generate the permutations of L2 as sequences with increasing, then decreasing, angle with L2. From there, we can try to generate the elements L with θ(L1, L) < α directly (for example we can binsearch on the 1st half of each sequence to find the transition point; with that, we can specify the rest of the sequence that fulfills the condition and count it in O(1) time). Let's call this group RP'n.
RP'4 is constructed of 4 subspaces isomorphic to ℤ6. More generally, RP'n is constructed of n subspaces isomorphic to RP'n-1.
This is where my abstract algebra muscles really begins to fail. I'll try to keep working on the construction, but Managu's answer doesn't leave much hope. I fear that reducing RP3 to ℤ6 is the only useful reduction we can make.
It looks like if l1 and l2 are both ordered high->low (or low->high, whatever, if they have the same order), the result is maximized, and if they are ordered oposite, the result is minimized, and other alterations of order appear to follow some rules; swapping two numbers in a continuous list of integers always reduces the sum by a fixed amount which seems to be related to their distance apart (ie swapping 1 and 3 or 2 and 4 have the same effect). This was just from a little messing around, but the idea is that there is a maximum, a minimum, and if some-pre-specified-value is between them, there are ways to count the permutations that make that possible (although; if the list isn't evenly spaced, then there aren't. Well, not that I know of. If l2 is (1 2 4 5) swapping 1 2 and 2 4 would have different effects)

Resources