Select optimal pairings of elements - algorithm

I have following problem e.g:
Given a bucket with symbols
1 1 2 3 3 4
And book of recipes to create pairs e.g:
12 13 24
Select from bucket optimal pairing, leaving as little as possible symbols in the bucket. So using the examplary values above the optimal pairing would be:
13 13 24 Which would use all the symbols given.
Naive picking from the bucket could result in something like:
12 13 Leaving the 3 and 4 unmatched. 3 and 4 cannot be matched because the book does not contain a recipe for that particular connection
Notes:
Real problem consits on average of: 500 elements in bucket in about 30 kind of symbols.
We've tried to implement the solution using the bruteforce algorithm, however I am afraid that even our grandchildren will not live long enough to see the result :).
There is no limit to the size of recipe book, it could even have every possible in the bucket. Pair made of the same element twice is not allowed.
The answer is not required to empty the bucket completely. Its just about getting the most pairs out of the bucket. Its okay to leave some in the bucket. It would be best to look for the optimal solution, however close approximation is also good enough.
I will appreciate an answer that proposes/gives hint to an algorithm to solve the problem.
Examples:
Bucket:
1 1 2 2 2 2 3 3 3 4 5 6 7 8 8
Recipe book:
12 34 15 68
Optimal result (one of possible):
{1 2} {1 2} {3 4} {6 8}
Leftover:
2 2 3 3 5 7 8

This problem is essentially the maximum matching problem with the small twist that you're allowed to have duplicate objects. Here's one way to solve this problem assuming you have a solver for maximum matching:
Create a node for each number in the input list.
For each recipe, for each pair of numbers matching that recipe, add an edge between the nodes for those numbers.
Run a maximum matching algorithm and return the pairs reported that way.
There are a good number of off-the-shelf maximum matching algorithms you can use, and if you need to code one up yourself, consider Edmonds' Blossom Algorithm, which is reasonably efficient and less tricky to code up than other approaches.

First generate all possibles pairs of symbols and store them with the indices of each symbol , so if you have n symbols , then n*(n+1)/2 pairs are going to be generated (max case n=500 then 125250 pairs are going to be generated ).
Ex : bucket with symbols 1 1 3
Then pairs are going to be generated are (11,1,2)(13,1,3)(13,2,3).
General format ( a[i]a[j], i, j ).
Now lets loop over generated pairs and delete pairs that doesn't exist in the book of recipes, so now we have at most 30 pairs .
Next lets build a graph such that the nodes are our generated pairs, and each 2 nodes are connected if the indices of the 2 pairs are different (using 2 nested loops over our pairs ) .
Finally we can perform BFS or DFS and find the longest graph between all generated graphs , which has the answer to our problem.
If you want c++/Java implementation ,please don't hesitate to ask.

Related

TSP Genetic Algorithm - path representation and identical tour problem

We are implementing path representation to solve our travelling salesman problem using a genetic algorithm. However, we were wondering how to solve the issue that there might be identical tours in our individuals, but which are recognised by the path representation as different individuals. An example:
Each individual consists of an array, in which the elements are the cities visited in order.
Individual 1:
[1 2 3 4 5]
Individual 2:
[4 5 1 2 3]
You can see that the tour in 1 and 2 are actually identical, only the "starting" location is different.
We see some solutions to this problem, but we were wondering which one would be the best, or if there are best practices to overcome this problem from literature/experiments/....
Solution 1
Sort each individual to see if the individuals are identical:
1. pick an individual
2. shift the elements by 1 element to the right (at the end of the array, elements are placed at the beginning of the array)
3. check if this shift now matches an individual
4. if not, repeat steps 3 to 4
Solution 2
1. At the start if the simulations, choose a fixed starting point of the cities.
2. If the fixed starting point would change (mutation, recombination,...) then
3. Shift the array so that chosen starting point is back on first index.
Solution 3
1. Use the adjacency representation to check which individuals are identical.
2. Pass this information on to the path representation.
3. This is used to "correct" the individuals.
Solution 1 and 2 seem time consuming, although 2 would probably need much less computing time. Solution 3 would need to constantly switch from one to the other representation.
Then there is also the issue that in our example the tour can be read in 2 ways:
[1 2 3 4 5]
is the same as
[5 4 3 2 1]
Again here, are there any best practises the solve this?
Since you need to visit every city and return to the origin city, you can simply fix the origin. That solves your problem of shifted equivalent tours outright.
For the other, less important issue of mirrored tours, you can start by sorting your individuals by cost (which you probably already do), and check any pair of tours with equal costs using a simple palindrome-checking algorithm.

How to display all ways to give change

As far as I know, counting every way to give change to a set sum and a starting till configuration is a classic Dynamic Programming problem.
I was wondering if there was a way to also display (or store) the actual change structures that could possibly amount to the given sum while preserving the DP complexity.
I have never saw this issue being discussed and I would like some pointers or a brief explanation of how this can be done or why this cannot be done.
DP for change problem has time complexity O(Sum * ValuesCount) and storage complexity O(Sum).
You can prepare extra data for this problem in the same time as DP for change, but you need more storage O(O(Sum*ValuesCount), and a lot of time for output of all variants O(ChangeWaysCount).
To prepare data for way recovery, make the second array B of arrays (or lists). When you incrementing count array A element from some previous element, add used value to corresponding element of B. At the end, unwind all the ways from the last element.
Example: values 1,2,3, sum 4
index 0 1 2 3 4
A 0 1 2 3 4
B - 1 1 2 1 2 3 1 2 3
We start unwinding from B[4] elements:
1-1-1-1 (B[4]-B[3]-B[2]-B[1])
2-1-1 (B[4]-B[2]-B[1])
2-2 (B[4]-B[2])
3-1 (B[4]-B[1])
Note that I have used only ways with non-increasing values to avoid permutation variants (i.e. 1-3 and 3-1)

Algorithm to find adjacent cells in a matrix

For example, consider a non-wraparound 4x4 matrix;
1 2 5 1
5 2 5 2
9 3 1 7
2 9 0 3
If I wanted to find the neighbours of, say, the 5 in the first row = 2,5,1. Is there a more efficient solution than doing two for loops and adding a bunch of if conditions?
Yes. If you really need to find the neighbors, then you have an option to use graphs.
Graphs are basically vertex classes w/ their adjacent vertexes, forming an edge. We can see here that 2 forms an edge w/ 5, and 1 form an edge w/ 5, etc.
If you're going to need to know the neighbors VERY frequently(because this is inefficient if you're not), then implement your own vertex class, wrapping the value(5) in a generic T val variable. Have a hashtable of adjacent numbers and their respective distances(1 in this case, and if you need to find neighbors of 2, then you're going to need to assign those as well) by add(vertex, distance) into the hashtable.
Later on, simply iterate through the hashtable for the neighbors.
However, for an array this simple, there isn't much overhead for just doing a for loop and using "a bunch of if statements". In reality you only need to have if(boundaries check) for every direction(which is 4).
Hopefully this helps.

Generate multiple sequences of numbers with unique values at each index

I have a row with numbers 1:n. I'm looking to add a second row also with the numbers 1:n but these should be in a random order while satisfying the following:
No positions have the same number in both rows
No combination of numbers occurs twice
For example, in the following
Row 1: 1 2 3 4 5 6 7 ...
Row 2: 3 6 15 8 13 12 7 ...
the number 7 occurs at the same position in both rows 1 and 2 (namely position 7; thereby not satisfying rule 1)
while in the following
Row 1: 1 2 3 4 5 6 7 ...
Row 2: 3 7 15 8 13 12 2 ...
the combination of 2+7 appears twice (in positions 2 and 7; thereby not satisfying rule 2).
It would perhaps be possible – but unnecessarily time-consuming – to do this by hand (at least up until a reasonable number), but there must be quite an elegant solution for this in MATLAB.
This problem is called a derangment of a permutation.
Use the function randperm, in order to find a random permutation of your data.
x = [1 2 3 4 5 6 7];
y = randperm(x);
Then, you can check that the sequence is legal. If not, do it again and again..
You have a probability of about 0.3 each time to succeed, which means that you need roughly 10/3 times to try until you find it.
Therefore you will find the answer really quickly.
Alternatively, you can use this algorithm to create a random derangment.
Edit
If you want to have only cycles of size > 2, this is a generalization of the problem.
In it is written that the probability
in that case is smaller, but big enough to find it in a fixed amount of steps. So the same approach is still valid.
This is fairly straightforward. Create a random permutation of the nodes, but interpret the list as follows: Interpret it as a random walk around the nodes, and if node 'b' appears after node 'a', it means that node 'b' appears below node 'a' in the lists:
So if your initial random permutation is
3 2 5 1 4
Then the walk in this case is 3 -> 2 -> 5 -> 1 -> 4 and you creates the rows as follows:
Row 1: 1 2 3 4 5
Row 2: 4 5 2 3 1
This random walk will satisfy both conditions.
But do you wish to allow more than one cycle in your network? I know you don't want two people to have each other's hat. But what about 7 people, where 3 of them have each other's hats and the other 4 have each other's hats? Is this acceptable and/or desirable?
Andrey has already pointed you to randperm and the rejection-sampling-like approach. After generating a permutation p, an easy way to check whether it has fixed point is any(p==1:n). An easy way to check whether it contains cycles of length 2 is any(p(p)==1:n).
So this gets permutations p of 1:n fulfilling your requirements:
p=[];
while (isempty(p))
p=randperm(n);
if any(p==1:n), p=[];
elseif any(p(p)==1:n), p=[];
end
end
Surrounding this with a for loop and for each counting the iterations of the while loop, it seems that one needs to generate on average 4.5 permutations for every "valid" one (and 6.2 if cycles of length three are not allowed, either). Very interesting.

Binary search on unsorted arrays?

I came across this document Binary Search Revisited where the authors have proved/explained that binary search can be used for unsorted arrays (lists) as well. I haven't grokked much of the document on a first reading.
Have any of you already explored this ?
I've just read the paper. To me the author uses the term binary search to address the Bisection method used to find the zeros of a continuous function.
The examples in the paper are clearly inspired to problems like find the zero into an
interval (with translation on the y axe) or find the max/min of a function in tabular data.
The arrays the essay consider are not random filled ones, you will find a rule to construct them (it is the rule tied to the function used to dump them)
Said that it is a good chance of tinkering about different algorithms belonging to a common family in order to find similarity and differences. A good chance to expand your experiences.
Definitely not a new concept or an undervalued one.
Lookin for 3 into that unsorted list with binary or bisection :
L = 1 5 2 9 38 11 3
1-Take mid point in the whole list L : 9
3 < 9 so delete the right part of the list (38 11 3)
here you can already understand you will never find 3
2-Take mid point in the remaining list 1 5 2 : 5
3 > 5 so delete the right part of the list (5 2)
remains 1
Result : 3 unfound
Two remarks:
1-the binary or bisection algorithm consider right and left as an indication of the order
So i have rudely applied the usual algo considering right is high and left is low
If you consieder the opposit, ie right is low and left is high, then, trying to find 3 in this slighty similar list will lead to " 3 unfound"
L' = L = 1 5 2 9 3 38 11
3 < 9 / take right part : 3 38 11
mid point 38
3 < 38 take right part : 11
3 unfound
2- if you accept to re apply systematicly the algorithm on the dropped part of the list than it leads to searching the element in a list of n elements Complexity will be O(n) exactly the same as running all the list from beg to end to search your value.
The time of search could be slightly shorter.
Why ? let's consider you look one by one from beg. to end for the value 100000 in a sorted list. You will find it at the end of your list ! :-)
If now this list is unorderd and your value 100000 is for example exactly at the mid point ... bingo !!
Binary Search can be implemented on a rotated unsorted array/list.

Resources