I have to lists like these:
a = ["1a","2a","3a","4a","5a","6a","7a","8a","9a","10a","11a","12a","13a","14a"]
b = ["1b","2b","3b","4b","5b","6b","7b","8b","9b","10b","11b","12b","13b","14b"]
And what I want is to combine them, so that there is at least a difference of n elements between an element from a and it's corresponding element in b.
As an example, if my n is 10, and "3a" is in position 3 and "3b" is in position 5, that isn't a solution since there's only a distance of 2 between these corresponding elements.
I have already solved this for the purpose I want through a brute force method: shuffle the union of the two arrays and see if the constraint is met; if not, shuffle again and so on... Needless to say, that for 14 elements array, sometimes there is 5 to 10 second computation to yield a solution with a minimum distance of 10. Even though that's kind of ok for what I am looking for, I am curious about how I could solve this in a more optimized way.
I am currently using Python, but code in any language (or pseudo-code) is more than welcomed.
EDIT: The context of this problem is something like a questionnarie, in which around 100 participants are expected to join in. Therefore, I am not necessarily interested in all the solutions, but rather something like the first 100.
Thanks.
For your specific scenario, you could use a randomized approach -- though not as random as what you've already tried. Something like this:
start with a random permutation of the items in both lists
create a new permutation by creating a copy of the other and randomly swapping two items
measure the quality of the permutations, e.g., the sum of the distances of each pair of related items, or the minimum of such distances
if the quality of the new permutation is better than that of the original permutation, keep the new one, otherwise discard the new one and continue with the original permutation
repeat from 2. until each distance is at least 10 or until quality does not improve over a number of iterations
The difference to shuffling the whole list in each iteration (as in your approach) is that in each iteration the permutation can only get better, until a satisfying solution is found.
Each time you run this algorithm, the result will be slightly different, so you can run it 100 times for 100 different solutions. Of course, this algorithm does not guarantee to find a solution (much less all such solutions), but it should be fast enough so that you could just restart it in case it fails.
In Python, this could look somewhat like this (slightly simplified, but still working):
def shuffle(A, B):
# original positions, i.e. types of questions
kind = dict([(item, i) for i, item in list(enumerate(A)) + list(enumerate(B))])
# get positions of elements of kinds, and return sum of their distances
def quality(perm):
pos = dict([(kind[item], i) for i, item in enumerate(perm)])
return sum(abs(pos[kind[item]] - i) for i, item in enumerate(perm))
# initial permutation and quality
current = A + B
random.shuffle(current)
best = quality(current)
# improve upon initial permutation by randomly swapping items
for g in range(1000):
i = random.randint(0, len(current)-1)
j = random.randint(0, len(current)-1)
copy = current[:]
copy[i], copy[j] = copy[j], copy[i]
q = quality(copy)
if q > best:
current, best = copy, q
return current
Example output for print shuffle(a, b):
['14b', '2a', '13b', '3a', '9b', '4a', '6a', '1a', '8a', '5b', '12b', '11a', '10b', '7b', '4b', '11b', '5a', '7a', '8b', '12a', '13a', '14a', '1b', '2b', '3b', '6b', '10a', '9a']
As I understand from your question, it is possible to perform all the ordering by relying exclusively on the indices of the arrays (i.e., on pure integers) and thus the problem can be reduced to create (valid) ranges instead of analysing each element.
for each a <= total_items-n , valid b = if(a + n == total_items){total_items} else{[a + n, total_items]}
For example:
n = 10;
total_items = 15;
for a = 1 -> valid b = [11, 15]
for a = 2 -> valid b = [12, 15]
, etc.
This would be perfomed 4 times: forwards and backwards for a respect to b and the same for b respect to a.
In this way you would reduce the number of iterations to its minimum expression and would get, as an output, a set of "solutions" for each position, rather than a one-to-one binding (that is what you have right now, isn't it?).
If there are equivalents in Python to .NET's Lists and LINQ, then you may be able to directly convert the following code. It generates up to 100 lists really quickly: I press "debug" to run it and up pops a windows with the results in much less than a second.
' VS2012
Option Infer On
Module Module1
Dim minDistance As Integer = 10
Dim rand As New Random ' a random number generator
Function OkToAppend(current As List(Of Integer), x As Integer) As Boolean
' see if the previous minDistance values contain the number x
Return Not (current.Skip(current.Count - minDistance).Take(minDistance).Contains(x))
End Function
Function GenerateList() As List(Of String)
' We don't need to start with strings: integers will make it faster.
' The "a" and "b" suffixes can be sprinkled on at random once the
' list is created.
Dim numbersToUse() As Integer = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14}
Dim pool As New List(Of Integer)
' we need all the numbers twice
pool.AddRange(numbersToUse)
pool.AddRange(numbersToUse)
Dim newList As New List(Of Integer)
Dim pos As Integer
For i = 0 To pool.Count - 1
' limit the effort it puts in
Dim sanity As Integer = pool.Count * 10
Do
pos = rand.Next(0, pool.Count)
sanity -= 1
Loop Until OkToAppend(newList, pool(pos)) OrElse sanity = 0
If sanity > 0 Then ' it worked
' append the value to the list
newList.Add(pool(pos))
' remove the value which has been used
pool.RemoveAt(pos)
Else ' give up on this arrangement
Return Nothing
End If
Next
' Create the final list with "a" and "b" stuck on each value.
Dim stringList As New List(Of String)
Dim usedA(numbersToUse.Length) As Boolean
Dim usedB(numbersToUse.Length) As Boolean
For i = 0 To newList.Count - 1
Dim z = newList(i)
Dim suffix As String = ""
If usedA(z) Then
suffix = "b"
ElseIf usedB(z) Then
suffix = "a"
End If
' rand.Next(2) generates an integer in the range [0,2)
If suffix.Length = 0 Then suffix = If(rand.Next(2) = 1, "a", "b")
If suffix = "a" Then
usedA(z) = True
Else
usedB(z) = True
End If
stringList.Add(z.ToString & suffix)
Next
Return stringList
End Function
Sub Main()
Dim arrangements As New List(Of List(Of String))
For i = 1 To 100
Dim thisArrangement = GenerateList()
If thisArrangement IsNot Nothing Then
arrangements.Add(thisArrangement)
End If
Next
'TODO: remove duplicate entries and generate more to make it up to
' the required quantity.
For Each a In arrangements
' outputs the elements of a with ", " as a separator
Console.WriteLine(String.Join(", ", a))
Next
' wait for user to press enter
Console.ReadLine()
End Sub
End Module
Related
I am writing in MATLAB a program that checks whether two elements A and B were exchanged in ranking positions.
Example
Assume the first ranking is:
list1 = [1 2 3 4]
while the second one is:
list2 = [1 2 4 3]
I want to check whether A = 3 and B = 4 have exchanged relative positions in the rankings, which in this case is true, since in the first ranking 3 comes before 4 and in the second ranking 3 comes after 4.
Procedure
In order to do this, I have written the following MATLAB code:
positionA1 = find(list1 == A);
positionB1 = find(list1 == B);
positionA2 = find(list2 == A);
positionB2 = find(list2 == B);
if (positionA1 <= positionB1 && positionA2 >= positionB2) || ...
(positionA1 >= positionB1 && positionA2 <= positionB2)
... do something
end
Unfortunately, I need to run this code a lot of times, and the find function is really slow (but needed to get the element position in the list).
I was wondering if there is a way of speeding up the procedure. I have also tried to write a MEX file that performs in C the find operation, but it did not help.
If the lists don't change within your loop, then you can determine the positions of the items ahead of time.
Assuming that your items are always integers from 1 to N:
[~, positions_1] = sort( list1 );
[~, positions_2] = sort( list2 );
This way you won't need to call find within the loop, you can just do:
positionA1 = positions_1(A);
positionB1 = positions_1(B);
positionA2 = positions_2(A);
positionB2 = positions_2(B);
If your loop is going over all possible combinations of A and B, then you can also vectorize that
Find the elements that exchanged relative ranking:
rank_diff_1 = bsxfun(#minus, positions_1, positions_1');
rank_diff_2 = bsxfun(#minus, positions_2, positions_2');
rel_rank_changed = sign(rank_diff_1) ~= sign(rank_diff_2);
[A_changed, B_changed] = find(rel_rank_changed);
Optional: Throw out half of the results, because if (3,4) is in the list, then (4,3) also will be, and maybe you don't want that:
mask = (A_changed < B_changed);
A_changed = A_changed(mask);
B_changed = B_changed(mask);
Now loop over only those elements that have exchanged relative ranking
for ii = 1:length(A_changed)
A = A_changed(ii);
B = B_changed(ii);
% Do something...
end
Instead of find try to compute something like this
Check if there is any exchanged values.
if logical(sum(abs(list1-list2)))
do something
end;
For specific values A and B:
if (list1(logical((list1-list2)-abs((list1-list2))))==A)&&(list1(logical((list1-list2)+abs((list1-list2))))==B)
do something
end;
Can somebody explain why this doesn't work? (meaning the variable colorChange stays the same throughout function)
Each array (leftRing and rightRing have integer values of numbers ranging from 1 through 4)
Private Sub Solve_Click(sender As System.Object, e As System.EventArgs) Handles Solve.Click
countColorChangesInRings(colorChanges)
RotateLeftRingClockwise()
countColorChangesInRings(colorChanges)
End Sub
Sub countColorChangesInRings(ByRef var As Integer)
Dim colorChangesLeft As Integer = 0
Dim colorChangesRight As Integer = 0
Dim totalChanges As Integer = 0
' Count Color Changes in Left Ring
For i = 1 To 20
If (i = 20) Then
If (leftRing(i) <> leftRing(1)) Then
colorChangesLeft += 1
End If
Else
If (leftRing(i) <> leftRing(i + 1)) Then
colorChangesLeft += 1
End If
End If
Next
' Count Color Changes in Right Ring
For i = 1 To 20
If (i = 20) Then
If (rightRing(i) <> rightRing(1)) Then
colorChangesRight += 1
End If
Else
If (rightRing(i) <> rightRing(i + 1)) Then
colorChangesRight += 1
End If
End If
Next
totalChanges = colorChangesLeft + colorChangesRight
var = totalChanges
End Sub
Sub RotateLeftRingClockwise()
Dim temp As Integer = leftRing(1)
' Rotates Rings clockwise direction
For i = 1 To 20
If (i = 20) Then
leftRing(i) = temp
Else
leftRing(i) = leftRing(i + 1)
End If
Next
End Sub
From what I can see, it will stay the same.
You basically have a circular buffer of values (due to your special handle when i = 20) and you count the number of times the value changes from index to index.
That number is not going to change if you simply rotate the values through the circular buffer.
Take for example, the array {1, 2, 2, 2, 3, 4}. There are four transitions there, 1-2, 2-3, 3-4 and 4-1.
If you then rotate the list clockwise, you get {4, 1, 2, 2, 2, 3} and the transitions are 4-1, 1-2, 2-3 and 3-4.
In other words, the order of the transitions may change, but the quantity does not.
Based on your addition of the RotateLeftRingClockwise code, this is where your problem lies.
Because it simply rotates the left ring, you're not handling the intersection points of the Hungarian rings correctly. Rotating the left ring would, as well as moving the values around the left ring, change two of the array elements in the right ring (and vice versa), as per the diagram below:
array element 1 (in a 32-element array)
|
+---+---+
| |
V V
LLL RRR
LL LL RR RR
L * R <- shared cell is left(29)/right(5)
L R L R
L R L R
L R L R
L R L R
L R L R
L R L R
L R L R
L * R <- shared cell is left(21)/right(13)
LL LL RR RR
\ LLL RRR /
\ /
\----->>>-----/
Direction of
increasing
array index
Because your rotate code does not do this, the transition count of both rings will remain identical and hence the sum of them will not change.
For example, say the top-middle cells are array element 1 (as shown) and it increases in a anti-clockwise direction (as your code seems to indicate from the fact that a clockwise rotation shifts from ring[i+1] to ring[i]).
What your code needs to do is first rotate the left ring and then force the new values for the * cells into the right ring, something like:
Sub RotateLeftRingClockwise()
Dim temp As Integer = leftRing(1)
For i = 1 To 19
leftRing(i) = leftRing(i + 1)
Next
leftRing(20) = temp
rightRing( 5) = leftRing(29)
rightRing(13) = leftRing(21)
End Sub
Those array indexes in the last two lines are specific to my diagram above (32 cells rather than 20), you'll need to adjust them for your own puzzle.
If you make those changes (and similar ones to your other three rotation functions), you should find that the function starts returning different values. Specifically, rotating the left ring will leave the left value unchanged but should change the right value as different balls come into the intersection point.
By fuzzy matching I don't mean similar strings by Levenshtein distance or something similar, but the way it's used in TextMate/Ido/Icicles: given a list of strings, find those which include all characters in the search string, but possibly with other characters between, preferring the best fit.
I've finally understood what you were looking for. The issue is interesting however looking at the 2 algorithms you found it seems that people have widely different opinions about the specifications ;)
I think it would be useful to state the problem and the requirements more clearly.
Problem:
We are looking for a way to speed up typing by allowing users to only type a few letters of the keyword they actually intended and propose them a list from which to select.
It is expected that all the letters of the input be in the keyword
It is expected that the letters in the input be in the same order in the keyword
The list of keywords returned should be presented in a consistent (reproductible) order
The algorithm should be case insensitive
Analysis:
The first two requirements can be sum up like such: for an input axg we are looking for words matching this regular expression [^a]*a[^x]*x[^g]*g.*
The third requirement is purposely loose. The order in which the words should appear in the list need being consistent... however it's difficult to guess whether a scoring approach would be better than alphabetical order. If the list is extremy long, then a scoring approach could be better, however for short list it's easier for the eye to look for a particular item down a list sorted in an obvious manner.
Also, the alphabetical order has the advantage of consistency during typing: ie adding a letter does not completely reorder the list (painful for the eye and brain), it merely filters out the items that do not match any longer.
There is no precision about handling unicode characters, for example is à similar to a or another character altogether ? Since I know of no language that currently uses such characters in their keywords, I'll let it slip for now.
My solution:
For any input, I would build the regular expression expressed earlier. It suitable for Python because the language already features case-insensitive matching.
I would then match my (alphabetically sorted) list of keywords, and output it so filtered.
In pseudo-code:
WORDS = ['Bar', 'Foo', 'FooBar', 'Other']
def GetList(input, words = WORDS):
expr = ['[^' + i + ']*' + i for i in input]
return [w for w in words if re.match(expr, w, re.IGNORECASE)]
I could have used a one-liner but thought it would obscure the code ;)
This solution works very well for incremental situations (ie, when you match as the user type and thus keep rebuilding) because when the user adds a character you can simply refilter the result you just computed. Thus:
Either there are few characters, thus the matching is quick and the length of the list does not matter much
Either there are a lots of characters, and this means we are filtering a short list, thus it does not matter too much if the matching takes a bit longer element-wise
I should also note that this regular expression does not involve back-tracking and is thus quite efficient. It could also be modeled as a simple state machine.
Levenshtein 'Edit Distance' algorithms will definitely work on what you're trying to do: they will give you a measurement of how closely two words or addresses or phone numbers, psalms, monologues and scholarly articles match each other, allowing you you rank the results and choose the best match.
A more lightweight appproach is to count up the common substrings: it's not as good as Levenshtein, but it provides usable results and runs quickly in slow languages which have access to fast 'InString' functions.
I published an Excel 'Fuzzy Lookup' in Excellerando a few years ago, using 'FuzzyMatchScore' function that is, as far as I can tell, exactly what you need:
http://excellerando.blogspot.com/2010/03/vlookup-with-fuzzy-matching-to-get.html
It is, of course, in Visual Basic for Applications. Proceed with caution, crucifixes and garlic:
Public Function SumOfCommonStrings( _
ByVal s1 As String, _
ByVal s2 As String, _
Optional Compare As VBA.VbCompareMethod = vbTextCompare, _
Optional iScore As Integer = 0 _
) As Integer
Application.Volatile False
' N.Heffernan 06 June 2006
' THIS CODE IS IN THE PUBLIC DOMAIN
' Function to measure how much of String 1 is made up of substrings found in String 2
' This function uses a modified Longest Common String algorithm.
' Simple LCS algorithms are unduly sensitive to single-letter
' deletions/changes near the midpoint of the test words, eg:
' Wednesday is obviously closer to WedXesday on an edit-distance
' basis than it is to WednesXXX. So it would be better to score
' the 'Wed' as well as the 'esday' and add up the total matched
' Watch out for strings of differing lengths:
'
' SumOfCommonStrings("Wednesday", "WednesXXXday")
'
' This scores the same as:
'
' SumOfCommonStrings("Wednesday", "Wednesday")
'
' So make sure the calling function uses the length of the longest
' string when calculating the degree of similarity from this score.
' This is coded for clarity, not for performance.
Dim arr() As Integer ' Scoring matrix
Dim n As Integer ' length of s1
Dim m As Integer ' length of s2
Dim i As Integer ' start position in s1
Dim j As Integer ' start position in s2
Dim subs1 As String ' a substring of s1
Dim len1 As Integer ' length of subs1
Dim sBefore1 ' documented in the code
Dim sBefore2
Dim sAfter1
Dim sAfter2
Dim s3 As String
SumOfCommonStrings = iScore
n = Len(s1)
m = Len(s2)
If s1 = s2 Then
SumOfCommonStrings = n
Exit Function
End If
If n = 0 Or m = 0 Then
Exit Function
End If
's1 should always be the shorter of the two strings:
If n > m Then
s3 = s2
s2 = s1
s1 = s3
n = Len(s1)
m = Len(s2)
End If
n = Len(s1)
m = Len(s2)
' Special case: s1 is n exact substring of s2
If InStr(1, s2, s1, Compare) Then
SumOfCommonStrings = n
Exit Function
End If
For len1 = n To 1 Step -1
For i = 1 To n - len1 + 1
subs1 = Mid(s1, i, len1)
j = 0
j = InStr(1, s2, subs1, Compare)
If j > 0 Then
' We've found a matching substring...
iScore = iScore + len1
' Now clip out this substring from s1 and s2...
' And search the fragments before and after this excision:
If i > 1 And j > 1 Then
sBefore1 = left(s1, i - 1)
sBefore2 = left(s2, j - 1)
iScore = SumOfCommonStrings(sBefore1, _
sBefore2, _
Compare, _
iScore)
End If
If i + len1 < n And j + len1 < m Then
sAfter1 = right(s1, n + 1 - i - len1)
sAfter2 = right(s2, m + 1 - j - len1)
iScore = SumOfCommonStrings(sAfter1, _
sAfter2, _
Compare, _
iScore)
End If
SumOfCommonStrings = iScore
Exit Function
End If
Next
Next
End Function
Private Function Minimum(ByVal a As Integer, _
ByVal b As Integer, _
ByVal c As Integer) As Integer
Dim min As Integer
min = a
If b < min Then
min = b
End If
If c < min Then
min = c
End If
Minimum = min
End Function
Two algorithms I've found so far:
LiquidMetal
Better Ido Flex-Matching
I'm actually building something similar to Vim's Command-T and ctrlp plugins for Emacs, just for fun. I have just had a productive discussion with some clever workmates about ways to do this most efficiently. The goal is to reduce the number of operations needed to eliminate files that don't match. So we create a nested map, where at the top-level each key is a character that appears somewhere in the search set, mapping to the indices of all the strings in the search set. Each of those indices then maps to a list of character offsets at which that particular character appears in the search string.
In pseudo code, for the strings:
controller
model
view
We'd build a map like this:
{
"c" => {
0 => [0]
},
"o" => {
0 => [1, 5],
1 => [1]
},
"n" => {
0 => [2]
},
"t" => {
0 => [3]
},
"r" => {
0 => [4, 9]
},
"l" => {
0 => [6, 7],
1 => [4]
},
"e" => {
0 => [9],
1 => [3],
2 => [2]
},
"m" => {
1 => [0]
},
"d" => {
1 => [2]
},
"v" => {
2 => [0]
},
"i" => {
2 => [1]
},
"w" => {
2 => [3]
}
}
So now you have a mapping like this:
{
character-1 => {
word-index-1 => [occurrence-1, occurrence-2, occurrence-n, ...],
word-index-n => [ ... ],
...
},
character-n => {
...
},
...
}
Now searching for the string "oe":
Initialize a new map where the keys will be the indices of strings that match, and the values the offset read through that string so far.
Consume the first char from the search string "o" and look it up in the lookup table.
Since strings at indices 0 and 1 match the "o", put them into the map {0 => 1, 1 => 1}.
Now search consume the next char in the input string, "e" and loo it up in the table.
Here 3 strings match, but we know that we only care about strings 0 and 1.
Check if there are any offsets > the current offsets. If not, eliminate the items from our map, otherwise update the offset: {0 => 9, 1 => 3}.
Now by looking at the keys in our map that we've accumulated, we know which strings matched the fuzzy search.
Ideally, if the search is being performed as the user types, you'll keep track of the accumulated hash of results and pass it back into your search function. I think this will be a lot faster than iterating all search strings and performing a full wildcard search on each one.
The interesting thing about this is that you could also efficient store the Levenstein Distance along with each match, assuming you only care about insertions, not substitutions or deletions. Though perhaps it's not hard to get that logic added too.
I recently had to solve the same problem. My solution involves scoring strings with consecutively matched letters highly and excluding strings that don't contain the typed letters in order.
I've documented the algorithm in detail here: http://blog.kazade.co.uk/2014/10/a-fuzzy-filename-matching-algorithm.html
If your text is predominantly English then you may try your hand at various Soundex algorithms
1. Classic soundex
2. Metafone
These algorithms will let you choose words which sound like each other and will be a good way to find misspelled words.
So I'm creating a Activation class for a VB6 project and I've run into a brain fart. I've designed how I want to generate the Serial Number for this particular product in a following way.
XXXX-XXXX-XXXX-XXXX
Each group of numbers would be representative of data that I can read if I'm aware of the matching document that allows me to understand the codes with the group of digits. So for instance the first group may represent the month that the product was sold to a customer. But I can't have all the serial numbers in January all start with the same four digits so there's some internal math that needs to be done to calculate this value. What I've landed on is this:
A B C D = digits in the first group of the serial number
(A + B) - (C + D) = #
Now # would relate to a table of Hex values that would then represent the month the product was sold. Something like...
1 - January
2 - February
3 - March
....
B - November
C - December
My question lies here - if I know I need the total to equal B(11) then how exactly can I code backwards to generate (A + B) - (C + D) = B(11)?? It's a pretty simple equation, I know - but something I've just ran into and can't seem to get started in the right direction. I'm not asking for a full work-up of code but just a push. If you have a full solution available and want to share I'm always open to learning a bit more.
I am coding in VB6 but VB.NET, C#, C++ solutions could work as well since I can just port those over relatively easily. The community help is always greatly appreciated!
There's no single solution (you have one equation with four variables). You have to pick some random numbers. Here's one that works (in Python, but you get the point):
from random import randint
X = 11 # the one you're looking for
A_plus_B = randint(X, 30)
A = randint(max(A_plus_B - 15, 0), min(A_plus_B, 15))
B = A_plus_B - A
C_plus_D = A_plus_B - X
C = randint(max(C_plus_D - 15, 0), min(C_plus_D, 15))
D = C_plus_D - C
I assume you allow hexadecimal digits; if you just want 0 to 9, replace 15 by 9 and 30 by 18.
OK - pen and paper is always the solution... so here goes...
Attempting to find what values should be for (A + B) - (C + D) to equal a certain number called X. First I know that I want HEX values so that limits me to 0-F or 0-15. From there I need a better starting place so I'll generate a random number that will represent the total of (A + B), we'll call this Y, but not be lower than value X. Then subtract from that number Y value of X to determine that value that will represent (C + D), which we'll call Z. Use similar logic to break down Y and Z into two numbers each that can represent (A + B) = Y and (C + D) = Z. After it's all said and done I should have a good randomization of creating 4 numbers that when plugged into my equation will return a suitable result.
Just had to get past the brain fart.
This may seem a little hackish, and it may not take you where you're trying to go. However it should produce a wider range of values for your key strings:
Option Explicit
Private Function MonthString(ByVal MonthNum As Integer) As String
'MonthNum: January=1, ... December=12. Altered to base 0
'value for use internally.
Dim lngdigits As Long
MonthNum = MonthNum - 1
lngdigits = (Rnd() * &H10000) - MonthNum
MonthString = Right$("000" & Hex$(lngdigits + (MonthNum - lngdigits Mod 12)), 4)
End Function
Private Function MonthRecov(ByVal MonthString As String) As Integer
'Value returned is base 1, i.e. 1=January.
MonthRecov = CInt(CLng("&H" & MonthString) Mod 12) + 1
End Function
Private Sub Form_Load()
Dim intMonth As Integer
Dim strMonth As String
Dim intMonthRecov As Integer
Dim J As Integer
Randomize
For intMonth = 1 To 12
For J = 1 To 2
strMonth = MonthString(intMonth)
intMonthRecov = MonthRecov(strMonth)
Debug.Print intMonth, strMonth, intMonthRecov, Hex$(intMonthRecov)
Next
Next
End Sub
Not sure how best to explain it, other than using an example...
Imagine having a client with 10 outstanding invoices, and one day they provide you with a cheque, but do not tell you which invoices it's for.
What would be the best way to return all the possible combination of values which can produce the required total?
My current thinking is a kind of brute force method, which involves using a self-calling function that runs though all the possibilities (see current version).
For example, with 3 numbers, there are 15 ways to add them together:
A
A + B
A + B + C
A + C
A + C + B
B
B + A
B + A + C
B + C
B + C + A
C
C + A
C + A + B
C + B
C + B + A
Which, if you remove the duplicates, give you 7 unique ways to add them together:
A
A + B
A + B + C
A + C
B
B + C
C
However, this kind of falls apart after you have:
15 numbers (32,767 possibilities / ~2 seconds to calculate)
16 numbers (65,535 possibilities / ~6 seconds to calculate)
17 numbers (131,071 possibilities / ~9 seconds to calculate)
18 numbers (262,143 possibilities / ~20 seconds to calculate)
Where, I would like this function to handle at least 100 numbers.
So, any ideas on how to improve it? (in any language)
This is a pretty common variation of the subset sum problem, and it is indeed quite hard. The section on the Pseudo-polynomial time dynamic programming solution on the page linked is what you're after.
This is strictly for the number of possibilities and does not consider overlap. I am unsure what you want.
Consider the states that any single value could be at one time - it could either be included or excluded. That is two different states so the number of different states for all n items will be 2^n. However there is one state that is not wanted; that state is when none of the numbers are included.
And thus, for any n numbers, the number of combinations is equal to 2^n-1.
def setNumbers(n): return 2**n-1
print(setNumbers(15))
These findings are very closely related to combinations and permutations.
Instead, though, I think you may be after telling whether given a set of values any combination of them sum to a value k. For this Bill the Lizard pointed you in the right direction.
Following from that, and bearing in mind I haven't read the whole Wikipedia article, I propose this algorithm in Python:
def combs(arr):
r = set()
for i in range(len(arr)):
v = arr[i]
new = set()
new.add(v)
for a in r: new.add(a+v)
r |= new
return r
def subsetSum(arr, val):
middle = len(arr)//2
seta = combs(arr[:middle])
setb = combs(arr[middle:])
for a in seta:
if (val-a) in setb:
return True
return False
print(subsetSum([2, 3, 5, 8, 9], 8))
Basically the algorithm works as this:
Splits the list into 2 lists of approximately half the length. [O(n)]
Finds the set of subset sums. [O(2n/2 n)]
Loops through the first set of up to 2floor(n/2)-1 values seeing if the another value in the second set would total to k. [O(2n/2 n)]
So I think overall it runs in O(2n/2 n) - still pretty slow but much better.
Sounds like a bin packing problem. Those are NP-complete, i.e. it's nearly impossible to find a perfect solution for large problem sets. But you can get pretty close using heuristics, which are probably applicable to your problem even if it's not strictly a bin packing problem.
This is a variant on a similar problem.
But you can solve this by creating a counter with n bits. Where n is the amount of numbers. Then you count from 000 to 111 (n 1's) and for each number a 1 is equivalent to an available number:
001 = A
010 = B
011 = A+B
100 = C
101 = A+C
110 = B+C
111 = A+B+C
(But that was not the question, ah well I leave it as a target).
It's not strictly a bin packing problem. It's a what combination of values could have produced another value.
It's more like the change making problem, which has a bunch of papers detailing how to solve it. Google pointed me here: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.57.3243
I don't know how often it would work in practice as there are many exceptions to this oversimplified case, but here's a thought:
In a perfect world, the invoices are going to be paid up to a certain point. People will pay A, or A+B, or A+B+C, but not A+C - if they've received invoice C then they've received invoice B already. In the perfect world the problem is not to find to a combination, it's to find a point along a line.
Rather than brute forcing every combination of invoice totals, you could iterate through the outstanding invoices in order of date issued, and simply add each invoice amount to a running total which you compare with the target figure.
Back in the real world, it's a trivially quick check you can do before launching into the heavy number-crunching, or chasing them up. Any hits it gets are a bonus :)
Here is an optimized Object-Oriented version of the exact integer solution to the Subset Sums problem(Horowitz, Sahni 1974). On my laptop (which is nothing special) this vb.net Class solves 1900 subset sums a second (for 20 items):
Option Explicit On
Public Class SubsetSum
'Class to solve exact integer Subset Sum problems'
''
' 06-sep-09 RBarryYoung Created.'
Dim Power2() As Integer = {1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192, 16384, 32764}
Public ForceMatch As Boolean
Public watch As New Stopwatch
Public w0 As Integer, w1 As Integer, w1a As Integer, w2 As Integer, w3 As Integer, w4 As Integer
Public Function SolveMany(ByVal ItemCount As Integer, ByVal Range As Integer, ByVal Iterations As Integer) As Integer
' Solve many subset sum problems in sequence.'
''
' 06-sep-09 RBarryYoung Created.'
Dim TotalFound As Integer
Dim Items() As Integer
ReDim Items(ItemCount - 1)
'First create our list of selectable items:'
Randomize()
For item As Integer = 0 To Items.GetUpperBound(0)
Items(item) = Rnd() * Range
Next
For iteration As Integer = 1 To Iterations
Dim TargetSum As Integer
If ForceMatch Then
'Use a random value but make sure that it can be matched:'
' First, make a random bitmask to use:'
Dim bits As Integer = Rnd() * (2 ^ (Items.GetUpperBound(0) + 1) - 1)
' Now enumerate the bits and match them to the Items:'
Dim sum As Integer = 0
For b As Integer = 0 To Items.GetUpperBound(0)
'build the sum from the corresponding items:'
If b < 16 Then
If Power2(b) = (bits And Power2(b)) Then
sum = sum + Items(b)
End If
Else
If Power2(b - 15) * Power2(15) = (bits And (Power2(b - 15) * Power2(15))) Then
sum = sum + Items(b)
End If
End If
Next
TargetSum = sum
Else
'Use a completely random Target Sum (low chance of matching): (Range / 2^ItemCount)'
TargetSum = ((Rnd() * Range / 4) + Range * (3.0 / 8.0)) * ItemCount
End If
'Now see if there is a match'
If SolveOne(TargetSum, ItemCount, Range, Items) Then TotalFound += 1
Next
Return TotalFound
End Function
Public Function SolveOne(ByVal TargetSum As Integer, ByVal ItemCount As Integer _
, ByVal Range As Integer, ByRef Items() As Integer) As Boolean
' Solve a single Subset Sum problem: determine if the TargetSum can be made from'
'the integer items.'
'first split the items into two half-lists: [O(n)]'
Dim H1() As Integer, H2() As Integer
Dim hu1 As Integer, hu2 As Integer
If ItemCount Mod 2 = 0 Then
'even is easy:'
hu1 = (ItemCount / 2) - 1 : hu2 = (ItemCount / 2) - 1
ReDim H1((ItemCount / 2) - 1), H2((ItemCount / 2) - 1)
Else
'odd is a little harder, give the first half the extra item:'
hu1 = ((ItemCount + 1) / 2) - 1 : hu2 = ((ItemCount - 1) / 2) - 1
ReDim H1(hu1), H2(hu2)
End If
For i As Integer = 0 To ItemCount - 1 Step 2
H1(i / 2) = Items(i)
'make sure that H2 doesnt run over on the last item of an odd-numbered list:'
If (i + 1) <= ItemCount - 1 Then
H2(i / 2) = Items(i + 1)
End If
Next
'Now generate all of the sums for each half-list: [O( 2^(n/2) * n )] **(this is the slowest step)'
Dim S1() As Integer, S2() As Integer
Dim sum1 As Integer, sum2 As Integer
Dim su1 As Integer = 2 ^ (hu1 + 1) - 1, su2 As Integer = 2 ^ (hu2 + 1) - 1
ReDim S1(su1), S2(su2)
For i As Integer = 0 To su1
' Use the binary bitmask of our enumerator(i) to select items to use in our candidate sums:'
sum1 = 0 : sum2 = 0
For b As Integer = 0 To hu1
If 0 < (i And Power2(b)) Then
sum1 += H1(b)
If i <= su2 Then sum2 += H2(b)
End If
Next
S1(i) = sum1
If i <= su2 Then S2(i) = sum2
Next
'Sort both lists: [O( 2^(n/2) * n )] **(this is the 2nd slowest step)'
Array.Sort(S1)
Array.Sort(S2)
' Start the first half-sums from lowest to highest,'
'and the second half sums from highest to lowest.'
Dim i1 As Integer = 0, i2 As Integer = su2
' Now do a merge-match on the lists (but reversing S2) and looking to '
'match their sum to the target sum: [O( 2^(n/2) )]'
Dim sum As Integer
Do While i1 <= su1 And i2 >= 0
sum = S1(i1) + S2(i2)
If sum < TargetSum Then
'if the Sum is too low, then we need to increase the ascending side (S1):'
i1 += 1
ElseIf sum > TargetSum Then
'if the Sum is too high, then we need to decrease the descending side (S2):'
i2 -= 1
Else
'Sums match:'
Return True
End If
Loop
'if we got here, then there are no matches to the TargetSum'
Return False
End Function
End Class
Here is the Forms code to go along with it:
Public Class frmSubsetSum
Dim ssm As New SubsetSum
Private Sub btnGo_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles btnGo.Click
Dim Total As Integer
Dim datStart As Date, datEnd As Date
Dim Iterations As Integer, Range As Integer, NumberCount As Integer
Iterations = CInt(txtIterations.Text)
Range = CInt(txtRange.Text)
NumberCount = CInt(txtNumberCount.Text)
ssm.ForceMatch = chkForceMatch.Checked
datStart = Now
Total = ssm.SolveMany(NumberCount, Range, Iterations)
datEnd = Now()
lblStart.Text = datStart.TimeOfDay.ToString
lblEnd.Text = datEnd.TimeOfDay.ToString
lblRate.Text = Format(Iterations / (datEnd - datStart).TotalMilliseconds * 1000, "####0.0")
ListBox1.Items.Insert(0, "Found " & Total.ToString & " Matches out of " & Iterations.ToString & " tries.")
ListBox1.Items.Insert(1, "Tics 0:" & ssm.w0 _
& " 1:" & Format(ssm.w1 - ssm.w0, "###,###,##0") _
& " 1a:" & Format(ssm.w1a - ssm.w1, "###,###,##0") _
& " 2:" & Format(ssm.w2 - ssm.w1a, "###,###,##0") _
& " 3:" & Format(ssm.w3 - ssm.w2, "###,###,##0") _
& " 4:" & Format(ssm.w4 - ssm.w3, "###,###,##0") _
& ", tics/sec = " & Stopwatch.Frequency)
End Sub
End Class
Let me know if you have any questions.
For the record, here is some fairly simple Java code that uses recursion to solve this problem. It is optimised for simplicity rather than performance, although with 100 elements it seems to be quite fast. With 1000 elements it takes dramatically longer, so if you are processing larger amounts of data you could better use a more sophisticated algorithm.
public static List<Double> getMatchingAmounts(Double goal, List<Double> amounts) {
List<Double> remaining = new ArrayList<Double>(amounts);
for (final Double amount : amounts) {
if (amount > goal) {
continue;
} else if (amount.equals(goal)) {
return new ArrayList<Double>(){{ add(amount); }};
}
remaining.remove(amount);
List<Double> matchingAmounts = getMatchingAmounts(goal - amount, remaining);
if (matchingAmounts != null) {
matchingAmounts.add(amount);
return matchingAmounts;
}
}
return null;
}