Using the min/max values of a type when finding the min/max values in a set

Originally, I had basically written an essay with a question at the end, so I'm going to cram it down to this: which is better (being really nit-picky here)?
Option A:
int min = someArray[0][0];
for (int i = 0; i < someArray.length; i++)
    for (int j = 0; j < someArray[i].length; j++)
        min = Math.min(min, someArray[i][j]);
Option B:
int min = Integer.MAX_VALUE;
for (int i = 0; i < someArray.length; i++)
    for (int j = 0; j < someArray[i].length; j++)
        min = Math.min(min, someArray[i][j]);
I reckon B is faster, saving an instruction or two by initializing min to a constant value instead of using the indexer. It also feels less redundant: there is no comparing someArray[0][0] to itself...
As an algorithm, which is better (or more valid)?
EDIT: Assume that the array is not null and not empty.
EDIT2: Fixed a couple of careless errors.

Both of these algorithms are correct (assuming, of course, the array is nonempty). I think that version A works more generally, since for some types (strings, in particular) there may not be a well-defined maximum value.
The reason that these algorithms are equivalent has to do with a cool mathematical object called a semilattice. To motivate semilattices, here are a few cool properties of max that happen to hold true:
max is idempotent, so applying it to the same value twice gives back that original value: max(x, x) = x
max is commutative, so it doesn't matter what order you apply it to its arguments: max(x, y) = max(y, x)
max is associative, so when taking the maximum of three or more values it doesn't matter how you group the elements: max(max(x, y), z) = max(x, max(y, z))
These laws also hold for the minimum, as well as for many other structures. For example, if you have a tree structure, the "least upper bound" operator also satisfies these constraints. Similarly, if you have a collection of sets with set union or intersection, you'd find that these constraints hold as well.
If you have a set of elements (for example, integers, strings, etc.) and some binary operator defined over them with the above three properties (idempotency, commutativity, and associativity), then you have found a structure called a semilattice. The binary operator is then called a meet operator (or sometimes a join operator depending on the context).
The reason that semilattices are useful is that if you have a (finite) collection of elements drawn from a semilattice and want to compute their meet, you can do so by using a loop like this:
Element e = data[0];
for (i in 1 .. n-1)
    e = meet(e, data[i])
This works because the meet operator is commutative and associative, so we can apply the meet across the elements in any order that we want. Applying it one element at a time as we walk across the elements of the array in order thus produces the same value as if we had shuffled the array elements first, or iterated in reverse order, etc. In your case, the meet operator was "max" or "min", and since they satisfy the laws for meet operators described above, the above code will correctly compute the max or min.
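A minimal Python sketch of the loop above (not from the original answer; `fold_meet` is an illustrative name) folds any meet operator over a non-empty sequence, seeding with the first element, and checks that shuffling the input does not change the result for `min`, `max`, and set intersection.

```python
import random
from functools import reduce

def fold_meet(meet, data):
    """Combine every element with `meet`, seeding with data[0] (data must be non-empty)."""
    if not data:
        raise ValueError("fold_meet needs at least one element")
    return reduce(meet, data[1:], data[0])

values = [7, -3, 12, 0, 5]
shuffled = values[:]
random.shuffle(shuffled)

# min and max are idempotent, commutative and associative, so the order does not matter
assert fold_meet(min, values) == fold_meet(min, shuffled) == -3
assert fold_meet(max, values) == fold_meet(max, shuffled) == 12

# Set intersection is another meet operator
print(fold_meet(lambda a, b: a & b, [{1, 2, 3}, {2, 3, 4}, {3, 2, 9}]))  # {2, 3}
```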
To address your initial question, we need a bit more terminology. You were curious about whether or not it was better or safer to initialize your initial guess of the minimum value to be the maximum possible integer. The reason this works is that we have the cool property that
min(Integer.MAX_VALUE, x) = min(x, Integer.MAX_VALUE) = x
In other words, if you compute the meet of Integer.MAX_VALUE and any other value, you get the second value back. In mathematical terms, this is because Integer.MAX_VALUE is the top element of the meet semilattice. More formally, a top element for a meet semilattice is an element (denoted ⊤) satisfying
meet(⊤, x) = meet(x, ⊤) = x
If you use max instead of min, then the top element would be Integer.MIN_VALUE, since
max(Integer.MIN_VALUE, x) = max(x, Integer.MIN_VALUE) = x
Because applying the meet operator to ⊤ and any other element produces that other element, if you have a meet semilattice with a well-defined top element, you can rewrite the above code to compute the meet of all the elements as
Element e = Element.TOP;
for (i in 0 .. n-1)
    e = meet(e, data[i])
This works because after the first iteration, e is set to meet(e, data[0]) = meet(Element.TOP, data[0]) = data[0] and the iteration proceeds as usual. Consequently, in your original question, it doesn't matter which of the two loops you use; as long as there is at least one element defined, they produce the same value.
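A small Python check of this equivalence, written for illustration rather than taken from the question. Python integers have no fixed maximum, so the sketch assumes `math.inf` as the top element for `min`, playing the role that the largest `int` value plays in Java.

```python
import math

def min_seeded_with_first(rows):
    # Version A: start from the first element (rows[0] must be non-empty)
    result = rows[0][0]
    for row in rows:
        for x in row:
            result = min(result, x)
    return result

def min_seeded_with_top(rows):
    # Version B: start from the top element; min(math.inf, x) == x for any int or float x
    result = math.inf
    for row in rows:
        for x in row:
            result = min(result, x)
    return result

grid = [[4, 9, 2], [7, -1], [3]]
print(min_seeded_with_first(grid), min_seeded_with_top(grid))  # -1 -1
```

One visible difference: on an empty input, version B silently returns `math.inf`, whereas version A fails with an `IndexError`.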
That said, not all semilattices have a top element. Consider, for example, the set of all strings where the meet operator is defined as
meet(x, y) = x if x lexicographically precedes y
           = y otherwise
For example, meet("a", "ab") = "a", meet("dog", "cat") = "cat", etc. In this case, there is no string s that satisfies the property meet(s, x) = meet(x, s) = x for every x, and so the semilattice has no top element. In that case, you cannot use the second version of the code, because there is no top element to use as the initial value.
However, there is a very cute technique you can use to fake this, which actually does get used a fair bit in practice. Given a semilattice with no top element, you can create a new semilattice that does have a top element by introducing a new element ⊤ and defining meet(⊤, x) = meet(x, ⊤) = x. In other words, this element is specially crafted to be a top element and has no significance otherwise.
In code, you can introduce an element like this implicitly by writing
bool found = false;
Element e;
for (x in data) {
    if (!found) {
        found = true;
        e = x;
    } else {
        e = meet(e, x);
    }
}
This code works by having an external boolean found keep track of whether or not we have seen the first element yet. If we haven't, then we pretend that the element e is this new top element. Computing the meet of this top element and the array element produces the array element, and so we can just set the element e to be equal to that array element.
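In Python the same trick is often written with a private sentinel object standing in for the artificial top element. A sketch under that assumption (`_TOP` and `meet_all` are illustrative names), using the lexicographic-minimum meet on strings from above:

```python
_TOP = object()  # artificial top element: never equal to any real data value

def meet_all(meet, items):
    """Meet of all items; returns None for an empty iterable (an arbitrary choice)."""
    e = _TOP
    for x in items:
        # meet(TOP, x) is defined to be x, so the first element is just copied
        e = x if e is _TOP else meet(e, x)
    return None if e is _TOP else e

# Lexicographic minimum of strings: a meet semilattice with no top element
print(meet_all(min, ["dog", "cat", "ab", "a"]))  # a
```

Unlike a `found` flag, the sentinel keeps the state in a single variable, and it also works on iterators that cannot be indexed.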
Hope this helps! Sorry if this is too theoretical... I just happen to like math. :-)

B is better: if someArray happened to be empty, A would throw a runtime error when it reads someArray[0][0]. But A and B both could have an issue: if someArray is null (and this wasn't checked in previous lines of code), both A and B will throw exceptions.

From a practical standpoint, I like option A marginally better because if the data type being dealt with changes in the future, changing the initial value is one less thing that needs to be updated (and therefore, one less thing that can go wrong).
From an algorithmic purity standpoint, I have no idea if one is better than the other.
By the way, option A should have its initialization like so:
int min = someArray[0][0];


Pairing the weight of a protein sequence with the correct sequence

This piece of code is part of a larger function. I already created a list of molecular weights and I also defined a list of all the fragments in my data.
I'm trying to figure out how I can go through the list of fragments, calculate their molecular weights, and check whether each one matches a number in the other list. If it matches, the sequence is appended to a list that starts out empty.
from Bio import SeqUtils

combs = [397.47, 2267.58, 475.63, 647.68]
frags = []
for c in combs:
    for f in fragments:
        if c == SeqUtils.molecular_weight(f, 'protein', circular = True):
            frags.append(f)
I'm guessing I don't fully understand how the SeqUtils.molecular_weight function works, but if there is another way, that would also be great.
You are comparing floating point values for equality. That is bound to fail. You always have to account for some degree of error when dealing with floating point values. In this particular case you also have to take into account the error margin of the input values.
So do not compare floats like this
x == y
but instead like this
abs(x - y) < epsilon
where epsilon is a carefully chosen tolerance.
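For example, using the first weight from the output further down, an exact comparison fails while a tolerance-based one succeeds. This sketch uses Python's standard `math.isclose`; the tolerance 0.01 is an arbitrary choice for illustration.

```python
import math

reference = 397.47
computed = 397.46909999999997  # weight reported for 'AINV'

print(reference == computed)                            # False
print(abs(reference - computed) < 0.01)                 # True
print(math.isclose(reference, computed, abs_tol=0.01))  # True
```

`math.isclose` also accepts a relative tolerance (`rel_tol`), which scales with the size of the values being compared.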
I made two slight modifications to your code: I swapped the order of the f and c loops so that the calculated weight w is computed only once per fragment, and I append w to the list frags as well, to better show what is happening.
Your modified code now looks like this:
from Bio import SeqUtils

combs = [397.47, 2267.58, 475.63, 647.68]
frags = []
threshold = 0.5
for f in fragments:  # `fragments` is your existing list of sequences
    w = SeqUtils.molecular_weight(f, 'protein', circular=True)
    for c in combs:
        if abs(c - w) < threshold:
            frags.append((f, w))
print(frags)
This prints the result
[('AINV', 397.46909999999997), ('IEEATHMTPCYELHGLRWV', 2267.5843), ('MQCL', 475.6257), ('QIQDY', 647.6766)]
As you can see, none of the computed weights equals its reference value exactly; the first one, for example, differs by about 0.0009. That's why you did not catch them with your approach.

Proving that there are no overlapping sub-problems?

I just got the following interview question:
Given a list of float numbers, insert “+”, “-”, “*” or “/” between each consecutive pair of numbers to find the maximum value you can get. For simplicity, assume that all operators are of equal precedence order and evaluation happens from left to right.
(1, 12, 3) -> 1 + 12 * 3 = 39
If we built a recursive solution, we would get an O(4^N) solution. I tried to find overlapping subproblems (to increase the efficiency of this algorithm) and wasn't able to find any. The interviewer then told me that there weren't any overlapping subproblems.
How can we detect when there are overlapping subproblems and when there aren't? I spent a lot of time trying to "force" subproblems to appear, and eventually the interviewer told me that there weren't any.
My current solution looks as follows:
def maximumNumber(array, current_value=None):
    if current_value is None:
        current_value = array[0]
        array = array[1:]
    if len(array) == 0:
        return current_value
    candidates = [
        maximumNumber(array[1:], current_value * array[0]),
        maximumNumber(array[1:], current_value - array[0]),
        maximumNumber(array[1:], current_value + array[0]),
    ]
    # Skip division when the next number is zero (avoids ZeroDivisionError)
    if array[0] != 0:
        candidates.append(maximumNumber(array[1:], current_value / array[0]))
    return max(candidates)
Looking for "overlapping subproblems" sounds like you're trying to do bottom up dynamic programming. Don't bother with that in an interview. Write the obvious recursive solution. Then memoize. That's the top down approach. It is a lot easier to get working.
You may get challenged on that. Here was my response the last time that I was asked about that.
There are two approaches to dynamic programming, top down and bottom up. The bottom up approach usually uses less memory but is harder to write. Therefore I do the top down recursive/memoize and only go for the bottom up approach if I need the last ounce of performance.
It is a perfectly true answer, and I got hired.
Now you may notice that tutorials about dynamic programming spend more time on bottom up. They often even skip the top down approach. They do that because bottom up is harder; you have to think differently. It does provide more efficient algorithms, because you can throw away the parts of the data structure that you know you won't use again.
Coming up with a working solution in an interview is hard enough already. Don't make it harder on yourself than you need to.
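For this question, a minimal top-down sketch might look like the following (an illustration, not the answerer's code; `maximum_number` and `best_from` are made-up names). The recursion is keyed on the index of the next number and the running value, and `functools.lru_cache` provides the memoization. Because the running values are floats, cache hits only happen when two operator sequences produce exactly the same value, so this mainly shows the shape of the technique.

```python
from functools import lru_cache

def maximum_number(nums):
    nums = tuple(nums)

    @lru_cache(maxsize=None)
    def best_from(i, current):
        # Best final value reachable when nums[i:] still have to be combined into `current`
        if i == len(nums):
            return current
        f = nums[i]
        candidates = [best_from(i + 1, current + f),
                      best_from(i + 1, current - f),
                      best_from(i + 1, current * f)]
        if f != 0:  # skip division by zero
            candidates.append(best_from(i + 1, current / f))
        return max(candidates)

    return best_from(1, nums[0])

print(maximum_number([1, 12, 3]))  # 39
```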
EDIT: Here is the DP solution that the interviewer thought didn't exist.
def find_best(floats):
    # Map each reachable value to one operator sequence that produces it
    current_answers = {floats[0]: ()}
    for f in floats[1:]:
        next_answers = {}
        for v, path in current_answers.items():
            next_answers[v + f] = path + ('+',)
            next_answers[v * f] = path + ('*',)
            next_answers[v - f] = path + ('-',)
            if f != 0:
                next_answers[v / f] = path + ('/',)
        current_answers = next_answers
    best_val = max(current_answers)
    return (best_val, current_answers[best_val])
Generally, the overlapping-subproblem approach is one where the problem is broken down into smaller subproblems whose solutions, when combined, solve the big problem. When these subproblems exhibit optimal substructure, DP is a good way to solve it.
The decision about what you do with a new number that you encounter has little to do with the numbers you have already processed, other than accounting for signs, of course.
So I would say this is an overlapping-subproblem solution but not a dynamic programming problem. You could use divide and conquer or even more straightforward recursive methods.
Initially let's forget about negative floats.
Process each new float according to the following rules:
If the new float is less than 1, insert a / before it.
If the new float is more than 1, insert a * before it.
If it is 1, then insert a +.
If you see a zero, just don't divide or multiply.
This would solve it for all positive floats.
Now let's handle the case of negative numbers thrown into the mix.
Scan the input once to figure out how many negative numbers you have.
Isolate all the negative numbers in a list, convert all the numbers whose absolute value is less than 1 to their multiplicative inverses, then sort them by magnitude. If you have an even number of elements, we are all good. If you have an odd number of elements, store the head of this list in a special var, say k, associate a processed flag with it, and set the flag to False.
Proceed as before with some updated rules
If you see a negative number less than 0 but more than -1, insert a / before it
If you see a negative number less than -1, insert a * before it
If you see the special var and the processed flag is False, insert a - before it. Set processed to True.
There is one more optimization you can perform, which is removing pairs of negative ones as candidates for blanket subtraction from our initial negative numbers list, but this is just an edge case and I'm pretty sure your interviewer won't care.
Now the sum is only a function of the number you are adding and not the sum you are adding to :)
Computing max/min results for each operation from previous step. Not sure about overall correctness.
Time complexity O(n), space complexity O(n)
const max_value = (nums) => {
  const ops = [(a, b) => a + b, (a, b) => a - b, (a, b) => a * b, (a, b) => a / b]
  // dp[i][j] holds [max, min] over all results for nums[0..i] whose last
  // operator is ops[j], or null if ops[j] cannot be applied (division by zero)
  const dp = Array.from({length: nums.length}, _ => [])
  dp[0] = Array.from({length: ops.length}, _ => [nums[0], nums[0]])
  for (let i = 1; i < nums.length; i++) {
    for (let j = 0; j < ops.length; j++) {
      // If the current number is zero, division is not allowed
      if (nums[i] === 0 && j === 3) {
        dp[i].push(null)
        continue
      }
      let mx = -Infinity
      let mn = Infinity
      for (let k = 0; k < ops.length; k++) {
        if (dp[i - 1][k] === null) continue
        const opMax = ops[j](dp[i - 1][k][0], nums[i])
        const opMin = ops[j](dp[i - 1][k][1], nums[i])
        mx = Math.max(opMax, opMin, mx)
        mn = Math.min(opMax, opMin, mn)
      }
      dp[i].push([mx, mn])
    }
  }
  return Math.max(...dp[nums.length - 1].filter(v => v !== null).map(v => Math.max(...v)))
}
// Tests
console.log(max_value([1, 12, 3]))
console.log(max_value([1, 0, 3]))
console.log(max_value([59, 60, -0.000001]))
console.log(max_value([0, 1, -0.0001, -1.00000001]))
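To gain some confidence in the max/min idea, here is a Python sketch (function names are my own) that keeps only the running maximum and minimum and compares it against exhaustive search on the test cases above. It relies on the fact that, for a fixed right operand b, each of a+b, a-b, a*b and a/b is monotone in a, so the extremes of the next step can only come from the current maximum or minimum.

```python
import operator
from itertools import product

OPS = [operator.add, operator.sub, operator.mul, operator.truediv]

def max_value(nums):
    # Track only the largest and smallest value reachable so far: O(n) time, O(1) space
    hi = lo = nums[0]
    for b in nums[1:]:
        candidates = []
        for op in OPS:
            if op is operator.truediv and b == 0:
                continue  # division by zero is not allowed
            candidates += [op(hi, b), op(lo, b)]
        hi, lo = max(candidates), min(candidates)
    return hi

def brute_force(nums):
    # Try all 4**(n-1) operator sequences, evaluated left to right
    best = float("-inf")
    for ops in product(OPS, repeat=len(nums) - 1):
        value = nums[0]
        try:
            for op, b in zip(ops, nums[1:]):
                value = op(value, b)
        except ZeroDivisionError:
            continue
        best = max(best, value)
    return best

for case in ([1, 12, 3], [1, 0, 3], [59, 60, -0.000001], [0, 1, -0.0001, -1.00000001]):
    assert max_value(case) == brute_force(case), case
print(max_value([1, 12, 3]))  # 39
```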

Algorithm to find matching real values in a list

I have a complex algorithm which calculates the result of a function f(x). In the real world f(x) is a continuous function. However due to rounding errors in the algorithm this is not the case in the computer program. The following diagram gives an example:
Furthermore, I have a list of several thousand values Fi.
I am looking for all the x values which meet an Fi value, i.e., f(x_i) = F_i.
I can solve this problem by simply iterating through the x values, as in the following pseudocode:
for i=0 to NumberOfChecks-1 do
    //calculate the function result with the algorithm
    //loop through the value list to see if the function result matches a value in the list
    for j=0 to NumberOfValuesInTheList-1 do
        if Abs(FunctionResult-ListValues[j])<Epsilon then
            //mark that element j of the list matches
            //and store the corresponding x value in the list
Of course it is necessary to use a high number of checks; otherwise I will miss some x values. The higher the number of checks, the more complete and accurate the result. It is acceptable that the list is 90% or 95% complete.
The problem is that this brute force approach takes too much time. As I mentioned before the algorithm for f(x) is quite complex and with a high number of checks it takes too much time.
What would be a better solution for this problem?
Another way to do this is in two parts: generate all of the results, sort them, and then merge them with the sorted list of values.
First step is to compute all of the results and save them along with the x value that generated them. That is:
results = list of <x, result>
for i = 0 to numberOfChecks - 1
    //calculate the function result with the algorithm
    results.Add(x, FunctionResult)
end for
Now, sort the results list by FunctionResult, and also sort the ListValues array.
You now have two sorted lists that you can move through linearly:
i = 0, j = 0;
while (i < results.length && j < ListValues.length)
    diff = ListValues[j] - results[i];
    if (Abs(diff) < Epsilon)
        // mark this one with the x value
        // and move to the next result
        i = i + 1
    else if (diff > 0)
        // list value is much larger than result. Move to next result.
        i = i + 1
    else
        // list value is much smaller than result. Move to next list value.
        j = j + 1
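A Python sketch of this sort-and-merge approach. Assumptions: `f` stands for your expensive function, the x values to test are known up front, and `eps` is the matching tolerance; `match_sorted` is an illustrative name. As in the pseudocode, a matched result advances `i`, so several x values can be recorded for the same list value.

```python
def match_sorted(xs, f, list_values, eps):
    """Return {index into list_values: [x, ...]} for every f(x) within eps of that value."""
    # Evaluate f once per x and sort the (result, x) pairs by result
    results = sorted((f(x), x) for x in xs)
    # Sort the list values too, remembering their original positions
    targets = sorted((v, idx) for idx, v in enumerate(list_values))
    matches = {}
    i = j = 0
    while i < len(results) and j < len(targets):
        result, x = results[i]
        value, idx = targets[j]
        diff = value - result
        if abs(diff) < eps:
            matches.setdefault(idx, []).append(x)
            i += 1  # move to the next result
        elif diff > 0:
            i += 1  # list value is larger than the result: next result
        else:
            j += 1  # list value is smaller than the result: next list value
    return matches

xs = [k / 100 for k in range(1001)]  # x = 0.00, 0.01, ..., 10.00
found = match_sorted(xs, lambda x: x * x, [4.0, 25.0, 50.0], eps=1e-9)
print(found)  # {0: [2.0], 1: [5.0]}
```

The cost is one evaluation of f per x plus two sorts, instead of comparing every result against every list value.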
Sort the list, producing an array SortedListValues that contains the sorted ListValues and an array SortedListValueIndices that contains the index in the original array of each entry in SortedListValues. You only actually need the second of these, and you can create both of them with a single sort by sorting an array of tuples of (value, index) using value as the sort key.
Iterate over your range in 0..NumberOfChecks-1 and compute the value of the function at each step, and then use a binary chop method to search for it in the sorted list.
// sort as described above
SortedListValueIndices = sortIndices(ListValues);
for i=0 to NumberOfChecks-1 do
    //calculate the function result with the algorithm
    // do a binary chop to find the closest element in the list
    highIndex = NumberOfValuesInTheList-1;
    lowIndex = 0;
    while true do
        if Abs(FunctionResult-ListValues[SortedListValueIndices[lowIndex]])<Epsilon then
            // find all elements in the range that match, breaking out
            // of the loop as soon as one doesn't
            for j=lowIndex to NumberOfValuesInTheList-1 do
                if Abs(FunctionResult-ListValues[SortedListValueIndices[j]])>=Epsilon then
                    break;
                end
                //mark that element SortedListValueIndices[j] of the list matches
                //and store the corresponding x value in the list
            end
            // break out of the binary chop loop
            break;
        end
        // break out of the loop once the indices match
        if highIndex <= lowIndex then
            break;
        end
        // do the binary chop searching, adjusting the indices
        // (integer division; lowIndex converges on the first entry >= FunctionResult):
        middleIndex = (lowIndex + highIndex) / 2;
        if ListValues[SortedListValueIndices[middleIndex]] < FunctionResult then
            lowIndex = middleIndex + 1;
        else
            highIndex = middleIndex;
        end
    end
end
Possible complications:
The binary chop isn't taking the epsilon into account. Depending on your data, this may or may not be an issue. If it is acceptable that the list is only 90 or 95% complete, this might be OK. If not, then you'll need to widen the range to take it into account.
I've assumed you want to be able to match multiple x values for each FunctionResult. If that's not necessary you can simplify the code.
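One way to address the first complication in Python is to let the standard `bisect` module find the whole window [result - eps, result + eps] in the sorted list, so values just below the function result are not missed. A sketch under the same assumptions as the pseudocode (`xs`, `f` and `eps` are placeholders; `match_with_binary_search` is an illustrative name):

```python
import bisect

def match_with_binary_search(xs, f, list_values, eps):
    """Return {index into list_values: [x, ...]} using one binary search per f(x)."""
    # Sort once: sorted values plus the original index of each sorted entry
    order = sorted(range(len(list_values)), key=lambda k: list_values[k])
    sorted_values = [list_values[k] for k in order]
    matches = {}
    for x in xs:
        result = f(x)  # the expensive evaluation happens once per x
        lo = bisect.bisect_left(sorted_values, result - eps)
        hi = bisect.bisect_right(sorted_values, result + eps)
        for pos in range(lo, hi):  # every list value within eps of result
            matches.setdefault(order[pos], []).append(x)
    return matches

xs = [k / 100 for k in range(1001)]
found = match_with_binary_search(xs, lambda x: x * x, [25.0, 4.0, 50.0], eps=1e-9)
print(found)  # {1: [2.0], 0: [5.0]}
```

Each x costs one evaluation of f plus O(log m) for m list values, and multiple matches per result are handled naturally.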
Naturally this depends very much on the data, and especially on the numeric distribution of Fi. Another problem is that f(x) looks very jumpy, which rules out assuming that nearby x values give nearby results.
But one could optimise the search.
Picture below.
Walking through F(x) at sufficient granularity, define a rough min (red line) and max (green line), using a suitable tolerance (the "air" or "gap" in between). The area between min and max is "AREA".
See where each Fi value hits AREA, and do a stacked marking ("MARKING") on the X axis accordingly (this can be multiple segments of X).
Where lots of MARKINGs are on top of each other (higher sum, shown by the vertical black "sum" arrows), do dense hit tests, increasing the overall chance of getting as many hits as possible. Elsewhere, do sparser tests.
Tighten this schema (decrease tolerance) as much as you dare.
EDIT: Fi is a bit confusing. Is it an ordered array, or does it have random order (as I assumed)?
Jim Mischel's solution would run in O(i+j) time (after sorting) instead of the O(i*j) of your current solution. But there is a (very) minor bug in his code. The correct code would be:
diff = ListValues[j] - results[i]; //no abs() here
if (abs(diff) < Epsilon) //add abs() here
    // mark this one with the x value
    // and move to the next result
    i = i + 1
The best method will depend on the nature of your function f(x).
The best solution is if you can compute the inverse of f(x); then you can use it directly.
1. As you said, f(x) is continuous, so you can start by evaluating a small number of widely spaced points, then find the ranges that make sense, and refine your "assumption" for the x where f(x) = Fi.
It is not bulletproof, but it is an option.
e.g., Fi = 5.7; f(1) = 1.4, f(4) = 4, f(16) = 12.6, f(10) = 10.1, f(7) = 6.5, f(5) = 5.1, f(6) = 5.8, so you can take 5 < x < 7.
2. Along the same lines as #1, IF f(x) is hard to calculate, you can use interpolation, and then evaluate f(x) only at the values that are probable.
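The bracketing idea above (start from points where f(x) is below and above Fi, then shrink the interval) is essentially the bisection method. A Python sketch, assuming f is continuous on the starting bracket and f(lo) - target and f(hi) - target have opposite signs; `refine_root` is an illustrative name:

```python
def refine_root(f, target, lo, hi, tol=1e-9, max_iter=200):
    """Bisection: shrink [lo, hi] while f(x) - target keeps changing sign inside it."""
    g_lo = f(lo) - target
    if g_lo * (f(hi) - target) > 0:
        raise ValueError("target is not bracketed by f(lo) and f(hi)")
    for _ in range(max_iter):
        mid = (lo + hi) / 2
        g_mid = f(mid) - target
        if g_lo * g_mid <= 0:
            hi = mid               # the crossing is in the left half
        else:
            lo, g_lo = mid, g_mid  # the crossing is in the right half
        if hi - lo < tol:
            break
    return (lo + hi) / 2

print(round(refine_root(lambda x: x ** 3, 27.0, 0.0, 5.0), 6))  # 3.0
```

Each step costs one evaluation of f and halves the interval. For a function whose computed values are jumpy, the result is only as good as the sign changes the computed f actually shows.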

Incorrect Recursive approach to finding combinations of coins to produce given change

I was recently doing a Project Euler problem (namely #31), which was basically finding out how many ways we can sum to 200 using elements of the set {1,2,5,10,20,50,100,200}.
The idea that I used was this: the number of ways to sum to N is equal to
(the number of ways to sum N-k) * (number of ways to sum k), summed over all possible values of k.
I realized that this approach is WRONG, namely due to the fact that it creates several duplicate counts. I have tried to adjust the formula to avoid duplicates, but to no avail. I am seeking the wisdom of Stack Overflowers regarding:
whether my recursive approach is concerned with the correct subproblem to solve
If there exists one, what would be an effective way to eliminate duplicates
how should we approach recursive problems such that we are concerned with the correct subproblem? what are some indicators that we've chosen a correct (or incorrect) subproblem?
When trying to avoid duplicate permutations, a straightforward strategy that works in most cases is to only create rising or falling sequences.
In your example, if you pick a value and then recurse with the whole set, you will get duplicate sequences like 50,50,100 and 50,100,50 and 100,50,50. However, if you recurse with the rule that the next value should be equal to or smaller than the currently selected value, out of those three you will only get the sequence 100,50,50.
So an algorithm that counts only unique combinations would be e.g.:
function uniqueCombinations(set, target, previous) {
    for all values in set not greater than previous {
        if value equals target {
            increment count
        }
        if value is smaller than target {
            uniqueCombinations(set, target - value, value)
        }
    }
}
uniqueCombinations([1,2,5,10,20,50,100,200], 200, 200)
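A Python translation of this pseudocode, as a sketch: the count is returned instead of being incremented globally, and the `functools.lru_cache` memoization is an addition that is not in the pseudocode, used only to keep it fast.

```python
from functools import lru_cache

COINS = (1, 2, 5, 10, 20, 50, 100, 200)

@lru_cache(maxsize=None)
def unique_combinations(target, previous):
    """Count non-increasing coin sequences that sum to target, each coin <= previous."""
    count = 0
    for value in COINS:
        if value > previous:
            continue  # only allow values not greater than the previous pick
        if value == target:
            count += 1
        elif value < target:
            count += unique_combinations(target - value, value)
    return count

print(unique_combinations(200, 200))  # 73682
```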
Alternatively, you can create a copy of the set before every recursion, and remove the elements from it that you don't want repeated.
The rising/falling sequence method also works with iterations. Let's say you want to find all unique combinations of three letters. This algorithm will print results like a,c,e, but not a,e,c or e,a,c:
for letter1 is 'a' to 'x' {
    for letter2 is first letter after letter1 to 'y' {
        for letter3 is first letter after letter2 to 'z' {
            print [letter1,letter2,letter3]
        }
    }
}
m69 gives a nice strategy that often works, but I think it's worthwhile to better understand why it works. When trying to count items (of any kind), the general principle is:
Think of a rule that classifies any given item into exactly one of several non-overlapping categories. That is, come up with a list of concrete categories A, B, ..., Z that will make the following sentence true: An item is either in category A, or in category B, or ..., or in category Z.
Once you have done this, you can safely count the number of items in each category and add these counts together, comfortable in the knowledge that (a) any item that is counted in one category is not counted again in any other category, and (b) any item that you want to count is in some category (i.e., none are missed).
How could we form categories for your specific problem here? One way to do it is to notice that every item (i.e., every multiset of coin values that sums to the desired total N) either contains the 50-coin exactly zero times, or it contains it exactly once, or it contains it exactly twice, or ..., or it contains it exactly RoundDown(N / 50) times. These categories don't overlap: if a solution uses exactly 5 50-coins, it pretty clearly can't also use exactly 7 50-coins, for example. Also, every solution is clearly in some category (notice that we include a category for the case in which no 50-coins are used). So if we had a way to count, for any given k, the number of solutions that use coins from the set {1,2,5,10,20,50,100,200} to produce a sum of N and use exactly k 50-coins, then we could sum over all k from 0 to N/50 and get an accurate count.
How to do this efficiently? This is where the recursion comes in. The number of solutions that use coins from the set {1,2,5,10,20,50,100,200} to produce a sum of N and use exactly k 50-coins is equal to the number of solutions that sum to N-50k and do not use any 50-coins, i.e. use coins only from the set {1,2,5,10,20,100,200}. This of course works for any particular coin denomination that we could have chosen, so these subproblems have the same shape as the original problem: we can solve each one by simply choosing another coin arbitrarily (e.g. the 10-coin), forming a new set of categories based on this new coin, counting the number of items in each category and summing them up. The subproblems become smaller until we reach some simple base case that we process directly (e.g. no allowed coins left: then there is 1 item if N=0, and 0 items otherwise).
I started with the 50-coin (instead of, say, the largest or the smallest coin) to emphasise that the particular choice used to form the set of non-overlapping categories doesn't matter for the correctness of the algorithm. But in practice, passing explicit representations of sets of coins around is unnecessarily expensive. Since we don't actually care about the particular sequence of coins to use for forming categories, we're free to choose a more efficient representation. Here (and in many problems), it's convenient to represent the set of allowed coins implicitly as simply a single integer, maxCoin, which we interpret to mean that the first maxCoin coins in the original ordered list of coins are the allowed ones. This limits the possible sets we can represent, but here that's OK: If we always choose the last allowed coin to form categories on, we can communicate the new, more-restricted "set" of allowed coins to subproblems very succinctly by simply passing the argument maxCoin-1 to it. This is the essence of m69's answer.
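A sketch of this category-based recursion in Python, using the `maxCoin` representation just described (`count_ways` and `COINS` are illustrative names; the memoization via `functools.lru_cache` is an optional addition):

```python
from functools import lru_cache

COINS = (1, 2, 5, 10, 20, 50, 100, 200)

@lru_cache(maxsize=None)
def count_ways(n, max_coin):
    """Ways to make n using only the first max_coin coins of COINS."""
    if max_coin == 0:
        return 1 if n == 0 else 0  # base case: no allowed coins left
    coin = COINS[max_coin - 1]     # form categories on the last allowed coin
    # Category k: exactly k copies of `coin`, the rest made without it
    return sum(count_ways(n - k * coin, max_coin - 1) for k in range(n // coin + 1))

print(count_ways(200, len(COINS)))  # 73682
```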
There's some good guidance here. Another way to think about this is as a dynamic program. For this, we must pose the problem as a simple decision among options that leaves us with a smaller version of the same problem. It boils down to a certain kind of recursive expression.
Put the coin values c0, c1, ... c_(n-1) in any order you like. Then define W(i,v) as the number of ways you can make change for value v using coins ci, c_(i+1), ... c_(n-1). The answer we want is W(0,200). All that's left is to define W:
W(i,v) = sum_[k = 0..floor(v/c_i)] W(i+1, v - c_i*k)
In words: the number of ways we can make change with coins c_i onward is the sum of all the ways we can make change after deciding to use some feasible number k of coins c_i, removing that much value from the problem.
Of course we need base cases for the recursion. This happens when i=n-1: the last coin value. At this point there's a way to make change if and only if the value we need is an exact multiple of c_(n-1).
W(n-1,v) = 1 if v % c_(n-1) == 0 and 0 otherwise.
We generally don't want to implement this as a simple recursive function. The same argument values occur repeatedly, which leads to an exponential (in n and v) amount of wasted computation. There are simple ways to avoid this. Tabular evaluation and memoization are two.
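As an illustration of tabular evaluation, here is a bottom-up Python sketch (not the answerer's code; `count_change` is an illustrative name). It uses the common one-dimensional form of the table: processing coins one at a time and letting `table[v]` accumulate the ways of making v, which folds the sum over k copies of a coin into a single pass. Its base case is "no coins yet" rather than the last coin as in W above, and it also shows that the coin order does not change the count.

```python
def count_change(coins, total):
    """Bottom-up evaluation: table[v] = ways to make v with the coins processed so far."""
    table = [1] + [0] * total  # with no coins, only the value 0 can be made (in one way)
    for c in coins:
        # Iterating v upward lets table[v - c] already include uses of coin c,
        # which is the same as summing over k copies of c in the recurrence
        for v in range(c, total + 1):
            table[v] += table[v - c]
    return table[total]

print(count_change([200, 100, 50, 20, 10, 5, 2, 1], 200))  # 73682
print(count_change([1, 2, 5, 10, 20, 50, 100, 200], 200))  # 73682
```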
Another point is that it is more efficient to have the values in descending order. By taking big chunks of value early, the total number of recursive evaluations is minimized. Additionally, since c_(n-1) is now 1, the base case is just W(n-1,v) = 1. Now it becomes fairly obvious that we can add a second base case as an optimization: W(n-2,v) = floor(v/c_(n-2)) + 1. That's how many times the for loop will add a term W(n-1, v - c_(n-2)*k) = 1!
But this is gilding the lily. The problem is so small that exponential behavior doesn't matter. Here is a little implementation to show that the order really doesn't matter:
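#include <stdio.h>
#define n 8
/* Three orderings of the same coin set (the specific orders here are illustrative) */
int cv[][n] = {
    {200, 100, 50, 20, 10, 5, 2, 1},
    {1, 2, 5, 10, 20, 50, 100, 200},
    {20, 1, 100, 5, 200, 2, 50, 10},
};
int *c;
int w(int i, int v) {
    /* Base case: with only the last coin left, there is one way iff it divides v */
    if (i == n - 1) return v % c[n - 1] == 0;
    int sum = 0;
    /* Use k copies of coin c[i], then make the rest with the remaining coins */
    for (int k = 0; k <= v / c[i]; ++k)
        sum += w(i + 1, v - c[i] * k);
    return sum;
}
int main(int argc, char *argv[]) {
    unsigned p;
    if (argc != 2 || sscanf(argv[1], "%u", &p) != 1 || p > 2) p = 0;
    c = cv[p];
    printf("Ways(%u) = %d\n", p, w(0, 200));
    return 0;
}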
Drumroll, please...
$ ./foo 0
Ways(0) = 73682
$ ./foo 1
Ways(1) = 73682
$ ./foo 2
Ways(2) = 73682

Algorithm to find matching pairs in a list

I will phrase the problem in the precise form that I want below:
Input: Two floating point lists N and D of the same length k (k is a multiple of 2).
It is known that for all i=0,...,k-1, there exists j != i such that D[j]*D[i] == N[i]*N[j]. (I'm using zero-based indexing)
Output: A (length k/2) list of pairs (i,j) such that D[j]*D[i] == N[i]*N[j].
The pairs returned may not be unique (any valid list of pairs is okay)
The application for this algorithm is to find reciprocal pairs of eigenvalues of a generalized palindromic eigenvalue problem.
The equality condition is equivalent to N[i]/D[i] == D[j]/N[j], but also works when denominators are zero (which is a definite possibility). Degeneracies in the eigenvalue problem cause the pairs to be non-unique.
More generally, the algorithm is equivalent to:
Input: A list X of length k (k is a multiple of 2).
It is known that for all i=0,...,k-1, there exists j != i such that IsMatch(X[i],X[j]) returns true, where IsMatch is a boolean matching function which is guaranteed to return true for at least one j != i for all i.
Output: A (length k/2) list of pairs (i,j) such that IsMatch(X[i],X[j]) == true for all pairs in the list.
The pairs returned may not be unique (any valid list of pairs is okay)
Obviously, my first problem can be formulated in terms of the second with IsMatch(u,v) := { (u - 1/v) == 0 }. Now, due to limitations of floating point precision, there will never be exact equality, so I want the solution which minimizes the match error. In other words, assume that IsMatch(u,v) returns the value u - 1/v and I want the algorithm to return a list for which IsMatch returns the minimal set of errors. This is a combinatorial optimization problem. I was thinking I can first naively compute the match error between all possible pairs of indexes i and j, but then I would need to select the set of minimum errors, and I don't know how I would do that.
The IsMatch function is symmetric (IsMatch(a,b) implies IsMatch(b,a)), but not transitive. It is, however, 3-transitive: IsMatch(a,b) && IsMatch(b,c) && IsMatch(c,d) implies IsMatch(a,d).
This problem is apparently identical to the minimum weight perfect matching problem in graph theory. However, in my case I know that there should be a "good" perfect matching, so the distribution of edge weights is not totally random. I feel that this information should be used somehow. The question now is whether there is a good implementation of the min-weight-perfect-matching problem that uses my prior knowledge to arrive at a solution early in the search. I'm also open to pointers towards a simple implementation of any such algorithm.
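For small k, an exact minimum-weight perfect matching can be found with a bitmask dynamic program over the set of already-paired indices, which is handy as a reference when testing faster methods. A Python sketch under assumptions: `error(i, j)` is a caller-supplied symmetric match error (here abs(X[i] * X[j] - 1) for reciprocal pairs; abs(N[i]*N[j] - D[i]*D[j]) would be the analogous choice for the N/D form), the function name is illustrative, and the O(2^k · k) cost limits it to roughly k ≤ 20. Larger inputs need a general algorithm such as Edmonds' blossom method.

```python
from functools import lru_cache

def min_weight_perfect_matching(k, error):
    """Exact min-total-error pairing of indices 0..k-1 (k even). Exponential: small k only."""
    full = (1 << k) - 1

    @lru_cache(maxsize=None)
    def best(mask):
        # mask has a 1 bit for every index that is already paired
        if mask == full:
            return 0.0, ()
        # Always pair the lowest unpaired index i, so each matching is built only once
        i = next(b for b in range(k) if not mask >> b & 1)
        best_cost, best_pairs = float("inf"), ()
        for j in range(i + 1, k):
            if mask >> j & 1:
                continue
            cost, pairs = best(mask | 1 << i | 1 << j)
            cost += error(i, j)
            if cost < best_cost:
                best_cost, best_pairs = cost, ((i, j),) + pairs
        return best_cost, best_pairs

    return best(0)

# Toy data: eigenvalues that come in approximately reciprocal pairs
X = [2.0, 0.25, 0.5000001, 3.9999, -1.0, -1.0]
cost, pairs = min_weight_perfect_matching(len(X), lambda i, j: abs(X[i] * X[j] - 1))
print(pairs)  # ((0, 2), (1, 3), (4, 5))
```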
I hope I understood your problem.
Well, if IsMatch(i, j) and IsMatch(j, l), then IsMatch(i, l). More generally, the IsMatch relation is transitive, commutative and reflexive, i.e., it's an equivalence relation. The algorithm then reduces to finding which element appears the most times in the list (using IsMatch instead of =).
(If I understand the problem...)
Here is one way to match each pair of products in the two lists.
Multiply each pair of N values and save the product, together with the subscripts of the two elements making up the product, in a structure.
Multiply each pair of D values and save the product, together with the subscripts of the two elements making up the product, in a second instance of the structure.
Sort both structures on the product.
Make a merge-type pass through both sorted structure arrays. Each time you find a product in one array that is close enough to a product in the other, record the two subscripts from each sorted list as a match.
You can also use one sorted list to implement an IsMatch function, doing a binary search on the product.
I just asked my CS friend, and he came up with the algorithm below. He doesn't have an account here (and is apparently unwilling to create one), but I think his answer is worth sharing.
// We will find the best match in the minimax sense; we will minimize
// the maximum matching error among all pairs. Alpha maintains a
// lower bound on the maximum matching error. We will raise Alpha until
// we find a solution. We assume MatchError returns an L_1 error.
// This first part finds the set of all possible alphas (the pairwise
// errors that are at least the maxi-min error).
Alpha = 0
AlphaSet = empty set
For all i:
    min = Infinity
    For all j != i:
        AlphaSet.Add(MatchError(i,j))
        if MatchError(i,j) < min
            min = MatchError(i,j)
    If min > Alpha
        Alpha = min
Remove all elements of AlphaSet smaller than Alpha
// This next part increases Alpha until we find a solution
While !AlphaSet.Empty()
    Alpha = AlphaSet.RemoveSmallest()
    sol = GetBoundedErrorSolution(Alpha)
    If sol != nil
        Return sol
// This is the definition of the helper function. It returns
// a solution with maximum matching error <= Alpha or nil if
// no such solution exists.
GetBoundedErrorSolution(Alpha) :=
    MaxAssignments = 0
    For all i:
        ValidAssignments[i] = empty set
        For all j != i:
            if MatchError(i,j) <= Alpha
                ValidAssignments[i].Add(j)
        // ValidAssignments[i].Size() > 0 due to our choice of Alpha
        // in the outer loop
        If ValidAssignments[i].Size() > MaxAssignments
            MaxAssignments = ValidAssignments[i].Size()
    If MaxAssignments = 1
        // every i has exactly one candidate (MatchError is symmetric),
        // so the assignments already form a perfect matching
        return ValidAssignments
    G = graph(ValidAssignments)
    // G is an undirected graph whose vertices are all values of i
    // and edges between vertices if they have match error less
    // than or equal to Alpha
    If G has a perfect matching
        // A perfect matching in a general graph can be found in
        // polynomial time, e.g. with Edmonds' blossom algorithm
        Return the matching
    Return nil
It relies on being able to compute a perfect matching of a general graph, which can be done in polynomial time (for example with Edmonds' blossom algorithm), so at least it is reduced to a known problem. Efficiency is not critical anyway, since in practice the given lists are quite small. I'll wait around for a better answer for a few days, or for someone to expand on how to find the perfect matching in a reasonable way.
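The friend's outline can be sketched in C++ as follows. This is an illustration under stated assumptions, not his code: err is a dense symmetric k x k error matrix, k is even and small, and the perfect-matching test is a memoized exhaustive search over subsets (swapped in for brevity) rather than the blossom algorithm, so it is exponential in k. The names minimaxMatching and matchRest are hypothetical.

```cpp
#include <algorithm>
#include <limits>
#include <utility>
#include <vector>

using Pairs = std::vector<std::pair<int, int>>;

// Hypothetical helper: can the indices NOT in `mask` be perfectly matched
// using only edges with err[i][j] <= alpha? Exponential memoized search,
// only meant for small k (this is not Edmonds' blossom algorithm).
static bool matchRest(const std::vector<std::vector<double>>& err, double alpha,
                      int mask, std::vector<signed char>& memo, Pairs& out) {
  const int k = static_cast<int>(err.size());
  if (mask == (1 << k) - 1) return true;  // everyone is matched
  if (memo[mask] == 0) return false;      // this subset already failed
  int i = 0;
  while (mask & (1 << i)) ++i;            // lowest unmatched index
  for (int j = i + 1; j < k; ++j) {
    if ((mask & (1 << j)) || err[i][j] > alpha) continue;
    out.emplace_back(i, j);
    if (matchRest(err, alpha, mask | (1 << i) | (1 << j), memo, out)) return true;
    out.pop_back();
  }
  memo[mask] = 0;
  return false;
}

// Minimax ("bottleneck") matching: start Alpha at the maxi-min error and
// raise it through the sorted candidate pairwise errors until a perfect
// matching with all errors <= Alpha exists.
Pairs minimaxMatching(const std::vector<std::vector<double>>& err) {
  const int k = static_cast<int>(err.size());
  if (k % 2 != 0) return {};
  double alpha0 = 0.0;
  std::vector<double> alphaSet;
  for (int i = 0; i < k; ++i) {
    double best = std::numeric_limits<double>::infinity();
    for (int j = 0; j < k; ++j) {
      if (j == i) continue;
      best = std::min(best, err[i][j]);
      if (j > i) alphaSet.push_back(err[i][j]);
    }
    alpha0 = std::max(alpha0, best);
  }
  std::sort(alphaSet.begin(), alphaSet.end());
  alphaSet.erase(std::unique(alphaSet.begin(), alphaSet.end()), alphaSet.end());
  for (double alpha : alphaSet) {
    if (alpha < alpha0) continue;  // some index would have no partner
    std::vector<signed char> memo(1 << k, 1);  // 1 = unknown, 0 = fails
    Pairs out;
    if (matchRest(err, alpha, 0, memo, out)) return out;
  }
  return {};
}
```

On the matrix {{0,1,4,9},{1,0,9,4},{4,9,0,6},{9,4,6,0}}, Alpha starts at the maxi-min error 4 and the sketch returns (0,2) and (1,3), whose largest error is 4. A minimum-sum matching would instead pick (0,1) and (2,3) (total 7, largest error 6), which shows that the two objectives can disagree.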
You want to find j such that D(i)*D(j) = N(i)*N(j) (I assumed * is ordinary real multiplication).
Assuming all N(i) and D(i) are nonzero, let
Z(i) = D(i)/N(i).
Problem: find j such that Z(i) = 1/Z(j).
Split the set into positives and negatives and process them separately.
Take logs for clarity: z(i) = log |Z(i)|, so the condition Z(i) = 1/Z(j) becomes z(i) = -z(j).
Sort indirectly (sort the indices by z). Then in the sorted view you should have something like -5 -3 -1 +1 +3 +5, for example. Read off the +/- pairs, and that should give you the original indices.
Am I missing something, or is the problem easy?
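Here is a minimal C++ sketch of that idea, assuming real-valued inputs with every N(i) and D(i) nonzero and a reciprocal partner for every index (so each sign group has an even size). The name reciprocalPairs is hypothetical. Reading off the +/- pairs amounts to pairing the smallest z in each sign group with the largest and moving inward; with floating-point noise this gives approximate pairs rather than exact ones.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <initializer_list>
#include <utility>
#include <vector>

// Z(i) = D(i)/N(i); a reciprocal pair has z(i) = -z(j) with z = log|Z|.
// Within each sign group, sorted by z, the smallest pairs with the largest.
std::vector<std::pair<int, int>> reciprocalPairs(const std::vector<double>& N,
                                                 const std::vector<double>& D) {
  std::vector<int> pos, neg;
  std::vector<double> z(N.size());
  for (int i = 0; i < static_cast<int>(N.size()); ++i) {
    const double Z = D[i] / N[i];
    z[i] = std::log(std::fabs(Z));
    (Z > 0 ? pos : neg).push_back(i);  // split by sign of Z
  }
  std::vector<std::pair<int, int>> pairs;
  for (std::vector<int>* group : {&pos, &neg}) {
    // Indirect sort: order the indices by z without moving the data.
    std::sort(group->begin(), group->end(),
              [&z](int a, int b) { return z[a] < z[b]; });
    // Pair from both ends towards the middle: -5 with +5, -3 with +3, ...
    for (std::size_t lo = 0, hi = group->size(); lo + 1 < hi; ++lo, --hi)
      pairs.emplace_back((*group)[lo], (*group)[hi - 1]);
  }
  return pairs;
}
```

For N = {1,1,1,1,1,1} and D = {2, 0.5, -4, -0.25, 3, 1.0/3}, this returns (5,4), (1,0) and (3,2).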
Okay, I ended up using this ported Fortran code, where I simply specify the dense upper triangular distance matrix using:
// For each pair i < j; the packed upper-triangular index is j*(j-1)/2 + i
complex_t num = N[i]*N[j] - D[i]*D[j];
complex_t den1 = N[j]*D[i];
complex_t den2 = N[i]*D[j];
if(std::abs(den1) < std::abs(den2)){
    costs[j*(j-1)/2+i] = std::abs(-num/den2);
}else if(std::abs(den1) == 0){
    // both denominators are zero: use a very large cost
    costs[j*(j-1)/2+i] = std::sqrt(std::numeric_limits<double>::max());
}else{
    costs[j*(j-1)/2+i] = std::abs(num/den1);
}
This works great and is fast enough for my purposes.
You should be able to sort the (D[i],N[i]) pairs. You don't need to divide at all, so there is no risk of dividing by zero; you can just cross-multiply, as follows:
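bool order(int i, int j) {
    // Compare N[i]/D[i] < N[j]/D[j] without dividing:
    // first make both denominators nonnegative
    float ni = N[i]; float di = D[i];
    if(di < 0) { di *= -1; ni *= -1; }
    float nj = N[j]; float dj = D[j];
    if(dj < 0) { dj *= -1; nj *= -1; }
    return ni*dj < nj*di;
}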
Then, scan the sorted list to find two separation points, (N == D) and (N == -D); you can start matching reciprocal pairs from there, using the reciprocal condition N[i]*N[j] == D[i]*D[j] (up to a tolerance) as a validity check. Leave the (N == 0) and (D == 0) points for last; it doesn't matter whether you consider them negative or positive, as they will all match with each other.
Edit: alternatively, you could handle the (N==0) and (D==0) cases separately, removing them from the list. Then you can use N[i]/D[i] to sort the rest of the indices. You might still want to start at 1.0 and -1.0, to make sure you can match near-zero cases with exactly-zero cases.
