Algorithm to align numerical sequences - algorithm

Hi,
I have two sequences of numerical data, say:
S1 : 1,6,4,9,8,7,5 and S2 : 6,9,7,5
I'd like to find a sequence alignment in both directions, left-to-right and right-to-left.
I tried two techniques before asking. First I used the Hungarian algorithm, but it is not sequential (it ignores order), so it doesn't give good results. Then I used a modified version of the Needleman–Wunsch algorithm, but I think I may be doing it wrong. I've been digging for at least 4 months for anything that could help, but I only find genetic algorithms, which may be helpful. Is there an existing algorithm that I haven't seen yet?
So, to formalise my question: how would you align two sequences of positive numbers (integer or double)?

I believe you can accomplish your objective with the following:
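from Bio import pairwise2
from Bio.pairwise2 import format_alignment

seq1 = "1649875"
seq2 = "6975"

# Score for pairing digit x with digit y: 0 if equal, more negative the further apart
numDict = {}
for x in range(0, 10):
    for y in range(0, 10):
        numDict[(str(x), str(y))] = -abs(x - y)
#print(numDict)

# globalds: custom score dictionary, gap-open penalty -3, gap-extend penalty -1
for a in pairwise2.align.globalds(seq1, seq2, numDict, -3, -1):
    print(format_alignment(*a))  # prints each alignment with the best score

# Flat match/mismatch scoring instead (match +5, mismatch -5):
#for a in pairwise2.align.globalms(seq1, seq2, 5, -5, -3, -1):
#    print(format_alignment(*a))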
The globalds alignment allows you to use a custom scoring dictionary (in this case, I created a dictionary covering every pair of digits 0-9, scored by the negative absolute value of their difference, so closer digits score higher). If you just want a flat yes/no scoring system, you could use something like globalms, where a match is +5 and a mismatch is -5. Note that I advise using gap penalties when performing alignments (here -3 to open a gap and -1 to extend it). Also familiarize yourself with 'global' and 'local' alignments. More information on the pairwise2 Biopython module can be found here: http://biopython.org/DIST/docs/api/Bio.pairwise2-module.html
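The string encoding above only covers single-digit values. For multi-digit integers or doubles, the same idea can be sketched in plain Python: a Needleman–Wunsch table where pairing two values scores -|x - y| and each gap costs a fixed penalty. The name `align_numeric` and the default gap penalty of -3 are assumptions made for illustration; this is not part of Biopython.

```python
def align_numeric(s1, s2, gap=-3.0):
    """Global alignment of two numeric sequences; returns (score, aligned1, aligned2)."""
    n, m = len(s1), len(s2)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = score[i - 1][0] + gap
    for j in range(1, m + 1):
        score[0][j] = score[0][j - 1] + gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] - abs(s1[i - 1] - s2[j - 1]),  # pair the two values
                score[i - 1][j] + gap,                             # gap in s2
                score[i][j - 1] + gap,                             # gap in s1
            )
    # Trace back from the bottom-right corner; None marks a gap.
    a1, a2 = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] - abs(s1[i - 1] - s2[j - 1]):
            a1.append(s1[i - 1]); a2.append(s2[j - 1]); i -= 1; j -= 1
        elif i > 0 and (j == 0 or score[i][j] == score[i - 1][j] + gap):
            a1.append(s1[i - 1]); a2.append(None); i -= 1
        else:
            a1.append(None); a2.append(s2[j - 1]); j -= 1
    return score[n][m], a1[::-1], a2[::-1]
```

For a right-to-left alignment, one option is to call the function on the reversed sequences (`s1[::-1]`, `s2[::-1]`) and reverse the result.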

Related

Algorithm for Cutting Patterns

Let's say I have a given length c and I need to cut out several pieces of different lengths a{i}, where i is the index of a specific piece. The length of every piece is smaller or equal to the length c. I need to find all possible permutations of cutting patterns.
Does anyone have a smart approach for such tasks, or an algorithm to solve this?
The function could look something like this:
Pattern[] getPatternList(double... a, double c);
The input is hence a list of different sizes and the total available space. My goal is to optimize/minimize the trim loss.
I'll use the simplex algorithm for that, but to create a linear programming model I need a smart way to determine all the cutting patterns.
There are exponentially many cutting patterns in general, so it might not be feasible to construct them all (time and memory).
If you need to optimize some cutting based on some objective, enumerating all possible cuttings is a bad approach (as @harold mentioned).
A bad analogy (which does not exactly apply here as your base-problem is np-hard):
solving 2-SAT is possible in polynomial-time
enumerating all 2-SAT solutions is Sharp-P-complete (an efficient algorithm would imply P=NP, so there might be none!)
A simple approach (to generate all valid cutting-patterns):
Generate all permutations of the items, i.e. all orderings of the items (bounded by n!)
Place them one after another and stop if c is exceeded
(It would be a good idea to do this incrementally; build one permutation after another)
Assumption: each item can only be selected once
Assumption: moving/shifting a cut within a free range does not generate a new solution. If it did, the solution space would possibly be an uncountably infinite set
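The simple approach above can be sketched in Python as follows. This is only an illustration under the two stated assumptions (each item used once, shifting a cut does not count as a new pattern); the function name `cutting_patterns` is made up here, and the running time grows with n!, so it is only usable for a handful of items.

```python
from itertools import permutations

def cutting_patterns(lengths, c):
    """Return the set of distinct patterns: for each ordering, the pieces placed before c is exceeded."""
    patterns = set()
    for order in permutations(lengths):
        total = 0
        placed = []
        for piece in order:
            if total + piece > c:
                break  # stop at the first piece that no longer fits
            total += piece
            placed.append(piece)
        patterns.add(tuple(placed))
    return patterns

if __name__ == "__main__":
    for pattern in sorted(cutting_patterns([3, 4, 5], 8)):
        print(pattern, "trim loss:", 8 - sum(pattern))
```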
edit
Code
Here is a more powerful approach handling the problem with the same assumptions as described above. It uses integer-programming to minimize the trim-loss, implemented in python with the use of cvxpy (and a commercial-solver; can be replaced by an open-source solver like cbc):
import numpy as np
from cvxpy import *
np.random.seed(1)

# random problem
SPACE = 25000
N_ITEMS = 10000
items = np.random.randint(0, 10, size=N_ITEMS)

def minimize_loss(items, space):
    N = items.shape[0]
    X = Bool(N)
    constraint = [sum_entries(mul_elemwise(items, X)) <= space]
    objective = Minimize(space - sum_entries(mul_elemwise(items, X)))
    problem = Problem(objective, constraint)
    problem.solve(solver=GUROBI, verbose=True)
    print('trim-loss: ', problem.value)
    print('validated trim-loss: ', space - sum(np.dot(X.value.flatten(), items)))
    print('# selected items: ', np.count_nonzero(np.round(X.value)))

print('items: ', items)
print('space: ', SPACE)
minimize_loss(items, SPACE)
Output
items: [5 8 9 ..., 5 3 5]
space: 25000
Parameter OutputFlag unchanged
Value: 1 Min: 0 Max: 1 Default: 1
Changed value of parameter QCPDual to 1
Prev: 0 Min: 0 Max: 1 Default: 0
Optimize a model with 1 rows, 10000 columns and 8987 nonzeros
Coefficient statistics:
Matrix range [1e+00, 9e+00]
Objective range [1e+00, 9e+00]
Bounds range [1e+00, 1e+00]
RHS range [2e+04, 2e+04]
Found heuristic solution: objective -25000
Presolve removed 1 rows and 10000 columns
Presolve time: 0.01s
Presolve: All rows and columns removed
Explored 0 nodes (0 simplex iterations) in 0.01 seconds
Thread count was 1 (of 4 available processors)
Optimal solution found (tolerance 1.00e-04)
Best objective -2.500000000000e+04, best bound -2.500000000000e+04, gap 0.0%
trim-loss: 0.0
validated trim-loss: [[ 0.]]
# selected items: 6516
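When the lengths are non-negative integers, the same selection problem (each item used at most once, fill as much of the space as possible) can also be sketched without an external solver, as a subset-sum dynamic program over a bitset. This is a different technique from the integer-programming model above, and `min_trim_loss` is a hypothetical name chosen for this sketch.

```python
def min_trim_loss(items, space):
    """Smallest achievable space - sum(selected items), each item used at most once."""
    reachable = 1                      # bit k set <=> some subset of items sums to k
    mask = (1 << (space + 1)) - 1      # keep only sums in the range 0..space
    for length in items:
        reachable |= (reachable << length) & mask
    best = reachable.bit_length() - 1  # largest reachable sum not exceeding space
    return space - best

if __name__ == "__main__":
    print(min_trim_loss([3, 4, 5], 8))
```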
edit v2
After reading your new comments, it is clear that your model description was incomplete/imprecise and nothing above tackles the problem you want to solve. It's a bit sad.
You will need to enumerate all permutations of a, and then take the longest prefix that has length less than or equal to c.
This sounds like a version of the knapsack problem (https://en.wikipedia.org/wiki/Knapsack_problem), and nobody knows an efficient way to do this.

OCR'ed and real string similarity

The problem:
There is a set of words S = {W1, W2, ..., Wn} where n < 10. This set just exists; we do not know its content.
These words are drawn on some image and then recognized. The OCR algorithm is poor, as is the dpi, and as a result there are mistakes. So we have a second set of erroneous words S' = {W1', W2', ..., Wn'}.
Now we have a word W that is a member of the original set S, and I need an algorithm which, given W and S', returns the index of the word in S' most similar to W.
Example: S is {"alpha", "bravo", "charlie"}, S' is for example {"alPha","hravc","onarlio"} (these are real possible OCR errors).
So the target function should return F("alpha") => 0, F("bravo") => 1, F("charlie") => 2
I tried Levenshtein distance, but it does not work well, because it returns small numbers on small strings and the OCRed string can be longer than the original.
For example, if S' is {'hornist','cornrnunist'} and the given word is 'communist', the Levenshtein distance is 4 for both words, but the right one is the second.
Any suggestions?
As a first approach, I'd suggest a modification of the Levenshtein distance algorithm with conditional costs for replacing, deleting and adding characters:
Distance(i, j) = min(Distance(i-1, j-1) + replace_cost(a.charAt(i), b.charAt(j)),
                     Distance(i-1, j  ) + delete_cost(a.charAt(i)),
                     Distance(i  , j-1) + insert_cost(b.charAt(j)))
You can implement the function replace_cost in such a way that it returns small values for visually similar characters (and high values for visually different characters), e.g.:
// visually similar characters
replace_cost('o', '0') = 0.1
replace_cost('o', 'O') = 0.1
replace_cost('O', '0') = 0.1
...
// visually different characters
replace_cost('O', 'K') = 0.9
...
A similar approach can be used for insert_cost and delete_cost (e.g. you may notice that during OCR some characters are more likely to disappear than others).
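A minimal Python sketch of this weighted distance is given below. The confusion table `SIMILAR_PAIRS` and its cost values are invented for illustration (a real table would be estimated from the OCR engine's actual errors), and insert/delete costs are kept flat at 1.0 for brevity. It only handles one-to-one character confusions, not cases such as 'm' read as 'rn'.

```python
# Hypothetical confusion table: unordered character pairs an OCR engine might mix up.
SIMILAR_PAIRS = {
    frozenset("o0"): 0.1,
    frozenset("oO"): 0.1,
    frozenset("O0"): 0.1,
    frozenset("pP"): 0.1,
    frozenset("bh"): 0.3,
    frozenset("co"): 0.3,
    frozenset("eo"): 0.3,
    frozenset("hn"): 0.3,
}

def replace_cost(x, y):
    """0 for identical characters, small for look-alikes, 1 otherwise."""
    if x == y:
        return 0.0
    return SIMILAR_PAIRS.get(frozenset((x, y)), 1.0)

def weighted_distance(a, b, insert_cost=1.0, delete_cost=1.0):
    """Levenshtein distance with a character-dependent replacement cost."""
    prev = [j * insert_cost for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        cur = [i * delete_cost]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j - 1] + replace_cost(ca, cb),  # replace (or keep)
                           prev[j] + delete_cost,               # drop ca from a
                           cur[j - 1] + insert_cost))           # add cb from b
        prev = cur
    return prev[-1]

def most_similar(word, recognized):
    """Index of the recognized word with the smallest weighted distance to word."""
    return min(range(len(recognized)),
               key=lambda i: weighted_distance(word, recognized[i]))
```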
Also, in case the approach above is not enough for you, I'd suggest looking at the noisy channel model, which is widely used for spelling correction (this subject is described very well in the Natural Language Processing course by Dan Jurafsky and Christopher Manning, "Week 2 - Spelling Correction").
This appears to be quite difficult to do because the misread strings are not necessarily textually similar to the input, which is why Levenshtein distance won't work for you. The words are visually corrupted, not simply mistyped. You could try creating a dataset of common errors (o => 0, l => 1, e => o) and then do some sort of comparison based on that.
If you have access to the OCR algorithm, you could run that algorithm again on a much broader set of inputs (with known outputs) and train a neural network to recognize common errors. Then you could use that model to predict mistakes in your original dataset (maybe overkill for an array of only ten items).

String similarity score/hash

Is there a method to calculate something like general "similarity score" of a string? In a way that I am not comparing two strings together but rather I get some number (hash) for each string that can later tell me that two strings are or are not similar. Two similar strings should have similar (close) hashes.
Let's consider these strings and scores as an example:
Hello world 1000
Hello world! 1010
Hello earth 1125
Foo bar 3250
FooBarbar 3750
Foo Bar! 3300
Foo world! 2350
You can see that Hello world! and Hello world are similar and their scores are close to each other.
This way, finding the most similar strings to a given string would be done by subtracting given strings score from other scores and then sorting their absolute value.
I believe what you're looking for is called a Locality Sensitive Hash. Whereas most hash algorithms are designed such that small variations in input cause large changes in output, these hashes attempt the opposite: small changes in input generate proportionally small changes in output.
As others have mentioned, there are inherent issues with forcing a multi-dimensional mapping into a 2-dimensional mapping. It's analogous to creating a flat map of the Earth... you can never accurately represent a sphere on a flat surface. Best you can do is find a LSH that is optimized for whatever feature it is you're using to determine whether strings are "alike".
Levenshtein distance or one of its derivatives is the algorithm you want.
Match the given string against each of the strings in the dictionary.
(Here, if you need only a fixed number of most similar strings, you may want to use a min-heap.)
If running Levenshtein distance for all strings in the dictionary is too expensive, then first use some rough algorithm that will exclude words that are too distant from the list of candidates.
After that, run Levenshtein distance on the remaining candidates.
One way to remove distant words is to index n-grams.
Preprocess the dictionary by splitting each of the words into a list of n-grams.
For example, consider n=3:
(0) "Hello world" -> ["Hel", "ell", "llo", "lo ", "o w", " wo", "wor", "orl", "rld"]
(1) "FooBarbar" -> ["Foo", "ooB", "oBa", "Bar", "arb", "rba", "bar"]
(2) "Foo world!" -> ["Foo", "oo ", "o w", " wo", "wor", "orl", "rld", "ld!"]
Next, create an index of n-grams:
" wo" -> [0, 2]
"Bar" -> [1]
"Foo" -> [1, 2]
"Hel" -> [0]
"arb" -> [1]
"bar" -> [1]
"ell" -> [0]
"ld!" -> [2]
"llo" -> [0]
"lo " -> [0]
"o w" -> [0, 2]
"oBa" -> [1]
"oo " -> [2]
"ooB" -> [1]
"orl" -> [0, 2]
"rba" -> [1]
"rld" -> [0, 2]
"wor" -> [0, 2]
When you need to find the most similar strings for a given string, you split the given string into n-grams and select only those words from the dictionary which have at least one matching n-gram.
This reduces the number of candidates to a reasonable amount, and you may proceed with Levenshtein-matching the given string against each of the remaining candidates.
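The indexing and candidate-selection steps can be sketched in Python like this. The helper names (`ngrams`, `build_index`, `candidates`) are illustrative choices, and the final Levenshtein pass over the candidates is left out.

```python
from collections import defaultdict

def ngrams(text, n=3):
    """All overlapping substrings of length n, in order."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def build_index(words, n=3):
    """Map each n-gram to the set of positions (in words) where it occurs."""
    index = defaultdict(set)
    for position, word in enumerate(words):
        for gram in ngrams(word, n):
            index[gram].add(position)
    return dict(index)

def candidates(query, index, n=3):
    """Positions of all words sharing at least one n-gram with query."""
    found = set()
    for gram in ngrams(query, n):
        found |= index.get(gram, set())
    return found

if __name__ == "__main__":
    words = ["Hello world", "FooBarbar", "Foo world!"]
    index = build_index(words)
    print(sorted(candidates("Hello earth", index)))
```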
If your strings are long enough, you may reduce the index size by using the min-hashing technique:
you calculate an ordinary hash for each of the n-grams and keep only the K smallest hashes; the others are thrown away.
P.S. this presentation seems like a good introduction to your problem.
This isn't possible, in general, because the set of edit distances between strings forms a metric space, but not one with a fixed dimension. That means that you can't provide a mapping between strings and integers that preserves a distance measure between them.
For example, you cannot assign numbers to these three phrases:
one two
one six
two six
Such that the numbers reflect the difference between all three phrases.
While the idea seems extremely sweet... I've never heard of this.
I've read many, many techniques, theses, and scientific papers on the subject of spell correction / typo correction, and the fastest proposals revolve around an index and the Levenshtein distance.
There are fairly elaborate techniques; the one I am currently working on combines:
A Bursted Trie, with level compactness
A Levenshtein Automaton
Even though this doesn't mean it is "impossible" to get a score, I somehow think there would not be so much recent research on string comparisons if such a "scoring" method had proved efficient.
If you ever find such a method, I am extremely interested :)
Would Levenshtein distance work for you?
In an unbounded problem, there is no solution which can convert any possible sequence of words, or any possible sequence of characters to a single number which describes locality.
Imagine similarity at the character level
stops
spots
hello world
world hello
In both examples the messages are different, but the characters in the message are identical, so the measure would need to hold a position value , as well as a character value. (char 0 == 'h', char 1 == 'e' ...)
Then compare the following similar messages
hello world
ello world
Although the two strings are similar, they could differ at the beginning, or at the end, which makes scaling by position problematic.
In the case of
spots
stops
The words only differ by position of the characters, so some form of position is important.
If the following strings are similar
yesssssssssssssss
yessssssssssssss
Then you have a form of paradox. If you add 2 s characters to the second string, it should share the distance it was from the first string, but it should be distinct. This can be repeated getting progressively longer strings, all of which need to be close to the strings just shorter and longer than them. I can't see how to achieve this.
In general this is treated as a multi-dimensional problem - breaking the string into a vector
[ 'h', 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd' ]
But the values of the vector can not be
represented by a fixed size number, or
give good quality difference measure.
If the number of words, or length of strings were bounded, then a solution of coding may be possible.
Bounded values
Using something like arithmetic compression, then a sequence of words can be converted into a floating point number which represents the sequence. However this would treat items earlier in the sequence as more significant than the last item in the sequence.
data mining solution
If you accept that the problem is high dimensional, then you can store your strings in a metric-tree wikipedia : metric tree. This would limit your search space, whilst not solving your "single number" solution.
I have code for such at github : clustering
Items which are close together, should be stored together in a part of the tree, but there is really no guarantee. The radius of subtrees is used to prune the search space.
Edit Distance or Levenshtein distance
This is used in a sqlite extension to perform similarity searching, but with no single number solution, it works out how many edits change one string into another. This then results in a score, which shows similarity.
I think of something like this:
remove all non-word characters
apply soundex
Your idea sounds like ontology but applied to whole phrases. The more similar two phrases are, the closer in the graph they are (assuming you're using weighted edges). And vice-versa: non similar phrases are very far from each other.
Another approach is to use a Fourier transform to get a sort of 'index' for a given string (it won't be a single number, though). You may find a little more in this paper.
And another idea, based on the Levenshtein distance: you may compare n-grams, which will give you a similarity index for two given phrases; the more similar they are, the closer the value is to 1. This may be used to calculate distance in the graph. I wrote a paper on this a few years ago; if you'd like, I can share it.
Anyway: although I don't know the exact solution, I'm also interested in what you'll come up with.
Maybe use PCA, where the matrix is a list of the differences between the string and a fixed alphabet (à la ABCDEFGHI...). The answer could be simply the length of the principal component.
Just an idea.
ready-to-run PCA in C#
It is unlikely one can get a rather small number from two phrases that, being compared, provide a relevant indication of the similarity of their initial phrases.
A reason is that the number gives an indication in one dimension, while phrases are evolving in two dimensions, length and intensity.
The number could evolve as well in length as in intensity but I'm not sure it'll help a lot.
In two dimensions, you better look at a matrix, which some properties like the determinant (a kind of derivative of the matrix) could give a rough idea of the phrase trend.
In Natural Language Processing we have a thing called Minimum Edit Distance (also known as Levenshtein distance).
It's basically defined as the smallest number of operations needed to transform string1 into string2.
The operations are insertion, deletion and substitution; each operation is given a score, which you add to the distance.
The idea to solve your problem is to calculate the MED from your chosen string to all the other strings, sort that collection, and pick out the n strings with the smallest distances.
For example:
{"Hello World", "Hello World!", "Hello Earth"}
Choosing base-string="Hello World"
Med(base-string, "Hello World!") = 1
Med(base-string, "Hello Earth") = 8
1st closest string is "Hello World!"
This has, in a way, given a score to each string of your string collection.
C# implementation (Add: 1, Deletion: 1, Substitution: 2)
public static int Distance(string s1, string s2)
{
    int[,] matrix = new int[s1.Length + 1, s2.Length + 1];
    for (int i = 0; i <= s1.Length; i++)
        matrix[i, 0] = i;
    for (int i = 0; i <= s2.Length; i++)
        matrix[0, i] = i;
    for (int i = 1; i <= s1.Length; i++)
    {
        for (int j = 1; j <= s2.Length; j++)
        {
            int value1 = matrix[i - 1, j] + 1;
            int value2 = matrix[i, j - 1] + 1;
            int value3 = matrix[i - 1, j - 1] + ((s1[i - 1] == s2[j - 1]) ? 0 : 2);
            matrix[i, j] = Math.Min(value1, Math.Min(value2, value3));
        }
    }
    return matrix[s1.Length, s2.Length];
}
Complexity O(n x m) where n, m is length of each string
More info on Minimum Edit Distance can be found here
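For readers working in Python, a rough equivalent with the same costs (add 1, delete 1, substitute 2) might look like the sketch below. The names `med` and `closest` are made up here, and `heapq.nsmallest` stands in for the "sort and pick the n smallest" step.

```python
import heapq

def med(s1, s2, sub_cost=2):
    """Minimum edit distance with insert/delete cost 1 and substitution cost sub_cost."""
    prev = list(range(len(s2) + 1))
    for i, c1 in enumerate(s1, start=1):
        cur = [i]
        for j, c2 in enumerate(s2, start=1):
            cur.append(min(prev[j] + 1,                                # deletion
                           cur[j - 1] + 1,                             # insertion
                           prev[j - 1] + (0 if c1 == c2 else sub_cost)))  # substitution
        prev = cur
    return prev[-1]

def closest(base, others, n=1):
    """The n strings from others with the smallest distance to base."""
    return heapq.nsmallest(n, others, key=lambda s: med(base, s))

if __name__ == "__main__":
    print(closest("Hello World", ["Hello World!", "Hello Earth"]))
```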
Well, you could add up the ascii value of each character and then compare the scores, having a maximum value on which they can differ. This does not guarantee however that they will be similar, for the same reason two different strings can have the same hash value.
You could of course make a more complex function, starting by checking the size of the strings, and then comparing each character one by one, again with a maximum difference set up.

sorting algorithm where pairwise-comparison can return more information than -1, 0, +1

Most sort algorithms rely on a pairwise comparison that determines whether A < B, A = B or A > B.
I'm looking for algorithms (and for bonus points, code in Python) that take advantage of a pairwise-comparison function that can distinguish a lot less from a little less or a lot more from a little more. So perhaps instead of returning {-1, 0, 1} the comparison function returns {-2, -1, 0, 1, 2} or {-5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5} or even a real number on the interval (-1, 1).
For some applications (such as near sorting or approximate sorting) this would enable a reasonable sort to be determined with less comparisons.
The extra information can indeed be used to minimize the total number of comparisons. Calls to the super_comparison function can be used to make deductions equivalent to a great number of calls to a regular comparison function. For example, a much-less-than b and c little-less-than b implies a < c < b.
The deductions can be organized into bins or partitions which can each be sorted separately. Effectively, this is equivalent to QuickSort with n-way partition. Here's an implementation in Python:
from collections import defaultdict
from random import choice

def quicksort(seq, compare):
    'Stable in-place sort using a 3-or-more-way comparison function'
    # Make an n-way partition on a random pivot value
    segments = defaultdict(list)
    pivot = choice(seq)
    for x in seq:
        ranking = 0 if x is pivot else compare(x, pivot)
        segments[ranking].append(x)
    seq.clear()
    # Recursively sort each segment and store it in the sequence
    for ranking, segment in sorted(segments.items()):
        if ranking and len(segment) > 1:
            quicksort(segment, compare)
        seq += segment

if __name__ == '__main__':
    from random import randrange
    from math import log10

    def super_compare(a, b):
        'Compare with extra logarithmic near/far information'
        c = -1 if a < b else 1 if a > b else 0
        return c * (int(log10(max(abs(a - b), 1.0))) + 1)

    n = 10000
    data = [randrange(4*n) for i in range(n)]
    goal = sorted(data)
    quicksort(data, super_compare)
    print(data == goal)
By instrumenting this code with the trace module, it is possible to measure the performance gain. In the above code, a regular three-way compare uses 133,000 comparisons while a super comparison function reduces the number of calls to 85,000.
The code also makes it easy to experiment with a variety of comparison functions. This will show that naïve n-way comparison functions do very little to help the sort. For example, if the comparison function returns +/-2 for differences greater than four and +/-1 for differences of four or less, there is only a modest 5% reduction in the number of comparisons. The root cause is that the coarse-grained partitions used in the beginning only have a handful of "near matches" and everything else falls in "far matches".
An improvement to the super comparison is to cover logarithmic ranges (i.e. +/-1 if within ten, +/-2 if within a hundred, +/-3 if within a thousand).
An ideal comparison function would be adaptive. For any given sequence size, the comparison function should strive to subdivide the sequence into partitions of roughly equal size. Information theory tells us that this will maximize the number of bits of information per comparison.
The adaptive approach makes good intuitive sense as well. People should first be partitioned into love vs like before making more refined distinctions such as love-a-lot vs love-a-little. Further partitioning passes should each make finer and finer distinctions.
You can use a modified quick sort. Let me explain with an example where your comparison function returns [-2, -1, 0, 1, 2]. Say you have an array A to sort.
Create 5 empty arrays - Aminus2, Aminus1, A0, Aplus1, Aplus2.
Pick an arbitrary element of A, X.
For each element of the array, compare it with X.
Depending on the result, place the element in one of the Aminus2, Aminus1, A0, Aplus1, Aplus2 arrays.
Apply the same sort recursively to Aminus2, Aminus1, Aplus1, Aplus2 (note: you don't need to sort A0, as all the elements there are equal to X).
Concatenate the arrays to get the final result: A = Aminus2 + Aminus1 + A0 + Aplus1 + Aplus2.
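The steps above translate fairly directly into Python. The sketch below returns a new list instead of sorting in place, and the comparison function `near_far_compare` with its threshold of 10 is an assumption made only for illustration; any comparison that returns 0 for equal elements and a consistent value in {-2, -1, 1, 2} otherwise would do.

```python
def near_far_compare(a, b, near=10):
    """Return -2/-1/0/1/2: the sign gives the order, the magnitude near (1) or far (2)."""
    if a == b:
        return 0
    sign = -1 if a < b else 1
    return sign * (1 if abs(a - b) <= near else 2)

def five_way_sort(a, compare=near_far_compare):
    """Quicksort variant that partitions into five buckets around an arbitrary element."""
    if len(a) <= 1:
        return list(a)
    x = a[len(a) // 2]  # arbitrary element of A
    buckets = {-2: [], -1: [], 0: [], 1: [], 2: []}
    for element in a:
        buckets[compare(element, x)].append(element)
    # A0 holds only elements equal to X, so it needs no further sorting.
    return (five_way_sort(buckets[-2], compare) + five_way_sort(buckets[-1], compare)
            + buckets[0]
            + five_way_sort(buckets[1], compare) + five_way_sort(buckets[2], compare))
```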
It seems like using raindog's modified quicksort would let you stream out results sooner and perhaps page into them faster.
Maybe those features are already available from a carefully-controlled qsort operation? I haven't thought much about it.
This also sounds kind of like radix sort except instead of looking at each digit (or other kind of bucket rule), you're making up buckets from the rich comparisons. I have a hard time thinking of a case where rich comparisons are available but digits (or something like them) aren't.
I can't think of any situation in which this would be really useful. Even if I could, I suspect the added CPU cycles needed to sort fuzzy values would be more than those "extra comparisons" you allude to. But I'll still offer a suggestion.
Consider this possibility (all strings use the 27 characters a-z and _):
            11111111112
   12345678901234567890
1/ now_is_the_time
2/ now_is_never
3/ now_we_have_to_go
4/ aaa
5/ ___
Obviously strings 1 and 2 are more similar than 1 and 3, and much more similar than 1 and 4.
One approach is to scale the difference value for each identical character position and use the first different character to set the last position.
Putting aside signs for the moment, comparing string 1 with 2, they differ in position 8 by 'n' - 't'. That's a difference of 6. In order to turn that into a single digit 1-9, we use the formula:
digit = ceiling(9 * abs(diff) / 27)
since the maximum difference is 26. The minimum difference of 1 becomes the digit 1. The maximum difference of 26 becomes the digit 9. Our difference of 6 becomes 2.
And because the difference is in position 8, our comparison function will return 2x10^-8 (actually it will return the negative of that, since string 1 comes after string 2).
Using a similar process for strings 1 and 4, the comparison function returns -5x10^-1. The highest possible return (strings 4 and 5) has a difference in position 1 of '_' - 'a' (26), which generates the digit 9 and hence gives us 9x10^-1.
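A small Python sketch of this comparison function follows. It applies the ceiling formula exactly as written and uses the sign convention of the examples (negative when the first string sorts after the second). The name `fuzzy_compare` and the handling of strings where one is a prefix of the other are assumptions of this sketch, since the answer does not cover that case.

```python
ALPHABET = "abcdefghijklmnopqrstuvwxyz_"  # the 27 allowed characters, '_' sorting last

def fuzzy_compare(s, t):
    """Return +/- digit * 10**-position for the first position where s and t differ."""
    for position, (cs, ct) in enumerate(zip(s, t), start=1):
        if cs != ct:
            diff = ALPHABET.index(cs) - ALPHABET.index(ct)
            digit = (9 * abs(diff) + 26) // 27   # integer form of ceiling(9 * |diff| / 27)
            sign = -1 if diff > 0 else 1         # negative when s sorts after t
            return sign * digit * 10.0 ** -position
    if len(s) == len(t):
        return 0.0
    # Assumption: a string that is a prefix of the other sorts first, with the smallest digit.
    position = min(len(s), len(t)) + 1
    sign = 1 if len(s) < len(t) else -1
    return sign * 10.0 ** -position
```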
Take these suggestions and use them as you see fit. I'd be interested in knowing how your fuzzy comparison code ends up working out.
Considering you are looking to order a number of items based on human comparison, you might want to approach this problem like a sports tournament. You might allow each human vote to increase the score of the winner by 3 and decrease the loser's by 3, or +2 and -2, or +1 and -1, or just 0 and 0 for a draw.
Then you just do a regular sort based on the scores.
Another alternative would be a single or double elimination tournament structure.
You can use two comparisons, to achieve this. Multiply the more important comparison by 2, and add them together.
Here is a example of what I mean in Perl.
It compares two array references by the first element, then by the second element.
use strict;
use warnings;
use 5.010;
my @array = (
  [a => 2],
  [b => 1],
  [a => 1],
  [c => 0]
);
say "$_->[0] => $_->[1]" for sort {
  ($a->[0] cmp $b->[0]) * 2 +
  ($a->[1] <=> $b->[1]);
} @array;
a => 1
a => 2
b => 1
c => 0
You could extend this to any number of comparisons very easily.
Perhaps there's a good reason to do this but I don't think it beats the alternatives for any given situation and certainly isn't good for general cases. The reason? Unless you know something about the domain of the input data and about the distribution of values you can't really improve over, say, quicksort. And if you do know those things, there are often ways that would be much more effective.
Anti-example: suppose your comparison returns a value of "huge difference" for numbers differing by more than 1000, and that the input is {0, 10000, 20000, 30000, ...}
Anti-example: same as above but with input {0, 10000, 10001, 10002, 20000, 20001, ...}
But, you say, I know my inputs don't look like that! Well, in that case tell us what your inputs really look like, in detail. Then someone might be able to really help.
For instance, once I needed to sort historical data. The data was kept sorted. When new data were added it was appended, then the list was run again. I did not have the information of where the new data was appended. I designed a hybrid sort for this situation that handily beat qsort and others by picking a sort that was quick on already sorted data and tweaking it to be fast (essentially switching to qsort) when it encountered unsorted data.
The only way you're going to improve over the general purpose sorts is to know your data. And if you want answers you're going to have to communicate that here very well.

What is a good non-recursive algorithm for deciding whether a passed in amount can be built additively from a set of numbers?

What is a non-recursive algorithm for deciding whether a passed-in amount can be built additively from a set of numbers?
In my case I'm determining whether a certain currency amount (such as $40) can be met by adding up some combination of a set of bills (such as $5, $10 and $20 bills). That is a simple example, but the algorithm needs to work for any currency set (some currencies use funky bill amounts and some bills may not be available at a given time).
So $50 can be met with a set of ($20 and $30), but cannot be met with a set of ($20 and $40). The non-recursive requirement is due to the target code base being for SQL Server 2000 where the support of recursion is limited.
In addition this is for supporting a multi currency environment where the set of bills available may change (think a foreign currency exchange teller for example).
You have twice stated that the algorithm cannot be recursive, yet that is the natural solution to this problem. One way or another, you will need to perform a search to solve this problem. If recursion is out, you will need to backtrack manually.
Pick the largest currency value below the target value. If it's a match, you're done. If not, push the current target value on a stack and subtract the picked currency value from the target value. Keep doing this until you find a match or there are no more currency values left. Then use the stack to backtrack and pick a different value.
Basically, it's the recursive solution inside a loop with a manually managed stack.
If you treat each denomination as a point on a base-n number, where n is the maximum number of notes you would need, then you can increment through that number until you've exhausted the problem space or found a solution.
The maximum number of notes you would need is the Total you require divided by the lowest denomination note.
It's a brute force response to the problem, but it'll definitely work.
Here's some p-code. I'm probably all over the place with my fence posts, and it's so unoptimized to be ridiculous, but it should work. I think the idea's right anyway.
Denominations = [10,20,50,100]
Required = 570
Denominations = sort(Denominations)
iBase = integer (Required / Denominations[1])
BumpList = array [Denominations.count]
BumpList.Clear
repeat
    iTotal = 0
    for iAdd = 1 to Bumplist.size
        iTotal = iTotal + bumplist [iAdd] * Denominations[iAdd]
    loop
    if iTotal = Required then exit true
    //this bit should be like a mileometer.
    //We add 1 to each wheel, and trip over to the next wheel when it gets to iBase
    finished = true
    for iPos from bumplist.last to bumplist.first
        if bumplist[iPos] = (iBase-1) then bumplist[iPos] = 0
        else begin
            finished = false
            bumplist[iPos] = bumplist[iPos]+1
            exit for
        end
    loop
until (finished)
exit false
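A runnable Python sketch of the same mileometer idea is shown below. It is still brute force and non-recursive; the one deliberate change from the p-code is the wheel size, set here to Required / smallest denomination + 1 so that each wheel can actually reach the maximum count (one of the fence posts mentioned above). The name `can_make` is illustrative, and all denominations are assumed to be positive integers.

```python
def can_make(required, denominations):
    """True if required equals some sum of non-negative multiples of the denominations."""
    denoms = sorted(denominations)
    base = required // denoms[0] + 1   # each wheel runs through 0 .. base-1
    counts = [0] * len(denoms)         # the mileometer, one wheel per denomination
    while True:
        if sum(c * d for c, d in zip(counts, denoms)) == required:
            return True
        # Advance the mileometer: bump the last wheel, carrying over full wheels.
        pos = len(counts) - 1
        while pos >= 0 and counts[pos] == base - 1:
            counts[pos] = 0
            pos -= 1
        if pos < 0:
            return False               # every combination has been tried
        counts[pos] += 1

if __name__ == "__main__":
    print(can_make(570, [10, 20, 50, 100]))
```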
That's a problem that can be solved by an approach known as dynamic programming. The lecture notes I have are too focused on bioinformatics, unfortunately, so you'll have to google for it yourself.
This sounds like the subset sum problem, which is known to be NP-complete.
Good luck with that.
Edit: If you're allowed arbitrary number of bills/coins of some denomination (as opposed to just one), then it's a different problem, and is easier. See the coin problem. I realized this when reading another answer to a (suspiciously) similar question.
I agree with Tyler - what you are describing is a variant of the Subset Sum problem which is known to be NP-Complete. In this case you are a bit lucky as you are working with a limited set of values so you can use dynamic programming techniques here to optimize the problem a bit. In terms of some general ideas for the code:
Since you are dealing with money, there are only so many ways to make change with a given bill and in most cases some bills are used more often than others. So if you store the results you can keep a set of the most common solutions and then just check them before you try and find the actual solution.
Unless the language you are working with doesn't support recursion there is no reason to completely ignore the use of recursion in the solution. While any recursive problem can be solved using iteration, this is a case where recursion is likely going to be easier to write.
Some of the other users such as Kyle and seanyboy point you in the right direction for writing your own function so you should take a look at what they have provided for what you are working on.
You can deal with this problem with Dynamic Programming method as MattW. mentioned.
Given limited number of bills and maximum amount of money, you can try the following solution. The code snippet is in C# but I believe you can port it to other language easily.
// Set of bills
int[] unit = { 40, 20, 70 };
// Max amount of money
int max = 100000;
bool[] bucket = new bool[max];
foreach (int t in unit)
    bucket[t] = true;
for (int i = 0; i < bucket.Length; i++)
    if (bucket[i])
        foreach (int t in unit)
            if (i + t < bucket.Length)
                bucket[i + t] = true;
// Check if the following amount of money
// can be built additively
Console.WriteLine("15 : " + bucket[15]);
Console.WriteLine("50 : " + bucket[50]);
Console.WriteLine("60 : " + bucket[60]);
Console.WriteLine("110 : " + bucket[110]);
Console.WriteLine("120 : " + bucket[120]);
Console.WriteLine("150 : " + bucket[150]);
Console.WriteLine("151 : " + bucket[151]);
Output:
15 : False
50 : False
60 : True
110 : True
120 : True
150 : True
151 : False
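The same table can be sketched in Python. Like the C# snippet, it assumes an unlimited supply of each bill and uses a single loop with no recursion; unlike it, this version marks amount 0 as reachable (using no bills), which is a choice made for this sketch. The name `reachable_amounts` is illustrative.

```python
def reachable_amounts(bills, max_amount):
    """reachable[a] is True if amount a can be built by adding bills (with repetition)."""
    reachable = [False] * (max_amount + 1)
    reachable[0] = True  # the empty selection
    for amount in range(max_amount + 1):
        if reachable[amount]:
            for bill in bills:
                if amount + bill <= max_amount:
                    reachable[amount + bill] = True
    return reachable

if __name__ == "__main__":
    table = reachable_amounts([40, 20, 70], 200)
    for amount in (15, 50, 60, 110, 120, 150, 151):
        print(amount, ":", table[amount])
```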
There's a difference between no recursion and limited recursion. Don't confuse the two as you will have missed the point of your lesson.
For example, you can safely write a factorial function using recursion in C++ or other low level languages because your results will overflow even your biggest number containers within but a few recursions. So the problem you will face will be that of storing the result before it ever gets to blowing your stack due to recursion.
This said, whatever solution you find - and I haven't even bothered understanding your problem deeply as I see that others have already done that - you will have to study the behaviour of your algorithm and you can determine what is the worst case scenario depth of your stack.
You don't need to avoid recursion altogether if the worst case scenario is supported by your platform.
Edit: The following will work some of the time. Think about why it won't work all the time and how you might change it to cover other cases.
Build it starting with the largest bill and working towards the smallest. This will yield the lowest number of bills.
Take the initial amount and apply the largest bill as many times as you can without going over the price.
Step to the next largest bill and apply it the same way.
Keep doing this until you are on your smallest bill.
Then check if the sum equals the target amount.
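As a sketch, the greedy procedure looks like this in Python (the name `greedy_can_make` is made up for illustration). The second call in the usage example is one input where it gives the wrong answer, which illustrates why it only works some of the time: 60 can be paid with three 20s, but taking the 50 first leaves 10, which no bill covers.

```python
def greedy_can_make(amount, bills):
    """Greedy check: use each bill, largest first, as many times as it fits."""
    remaining = amount
    for bill in sorted(bills, reverse=True):
        remaining -= (remaining // bill) * bill
    return remaining == 0

if __name__ == "__main__":
    print(greedy_can_make(50, [10, 20]))   # greedy succeeds: 20 + 20 + 10
    print(greedy_can_make(60, [50, 20]))   # greedy fails although 20 + 20 + 20 works
```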
Algorithm:
1. Sort currency denominations available in descending order.
2. Calculate Remainder = Input % denomination[i] i -> n-1, 0
3. If remainder is 0, the input can be broken down, otherwise it cannot be.
Example:
Input: 50, Available: 10,20
[50 % 20] = 10, [10 % 10] = 0, Ans: Yes
Input: 50, Available: 15,20
[50 % 20] = 10, [10 % 15] = 10, Ans: No
