Better than brute force algorithms for a coin-flipping game - algorithm

I have a problem and I feel like there should be a well-known algorithm for solving it that's better than just brute force, but I can't think of one, so I'm asking here.
The problem is as follows: given n sorted (from low to high) lists containing m probabilities, choose one index for each list such that the sum of the chosen indexes is less than m. Then, for each list, we flip a coin, where the chance of it landing heads is equal to the probability at the chosen index for that list. Maximize the chance of the coin landing heads at least once.
Are there any algorithms for solving this problem that are better than just brute force?
This problem seems most similar to the knapsack problem, except the value of the items in the knapsack isn't merely a sum of the items in the knapsack. (Written in Python, instead of sum(p for p in chosen_probabilities) it's 1 - math.prod([1 - p for p in chosen_probabilities]).) There are also restrictions on what items you can add given what items are already in the knapsack. For example, if the index = 3 item for a particular list is already in the knapsack, then adding the item with index = 2 for that same list isn't allowed (since you can only pick one index for each list). So there are certain items that can and can't be added to the knapsack based on what items are already in it.
Linear optimization won't work because the values in the lists don't increase linearly, the final coin probability isn't linear with respect to the chosen probabilities, and our constraint is on the sum of the indexes, rather than the values in the lists themselves. As David has pointed out, linear optimization will work if you use binary variables to pick out the indexes and a logarithm to deal with the non-linearity.
EDIT:
I've found that explaining the motivation behind this problem can be helpful for understanding it. Imagine you have 10 seconds to solve a problem, and three different ways to solve it. You have models of how likely it is that each method will solve the problem, given how many seconds you try that method for, but if you switch methods, you lose all progress on the one you were previously trying. What methods should you try and for how long?

Maximizing 1 - math.prod([1 - p for p in chosen_probabilities]) is equivalent to minimizing math.prod([1 - p for p in chosen_probabilities]), which is equivalent to minimizing the log of this objective, which is a linear function of 0-1 indicator variables, so you could do an integer programming formulation this way.
I can't promise that this will be much better than brute force. The problem is that math.log(1 - p) is well approximated by -p when p is close to zero. My intuition is that for nontrivial instances it will be qualitatively similar to using integer programming to solve subset sum, which doesn't go particularly well.
If you're willing to settle for a bicriteria approximation scheme (get an answer such that the sum of the chosen indexes is less than m, that is at least as good as the best answer summing to less than (1 − ε) m) then you can round up the probability to multiples of ε and use dynamic programming to get an algorithm that runs in time polynomial in n, m, 1/ε.
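For what it's worth, since the budget here is the integer sum of the chosen indexes and that sum is bounded by m, a plain knapsack-style dynamic program over that budget is another option. This is my own sketch, not part of David's bicriteria scheme; it solves the stated problem exactly in O(n·m²) time:

import math

def best_success_probability(lists, m):
    """Exact DP over the shared index budget (sum of chosen indexes <= m - 1).

    lists: n rows, each a list of m probabilities sorted low to high.
    Returns the best achievable probability of at least one heads.
    """
    budget = m - 1
    # best[b] = smallest product of (1 - p) achievable with index sum <= b,
    # over the lists processed so far (empty product = 1.0 initially).
    best = [1.0] * (budget + 1)
    for row in lists:
        new_best = [math.inf] * (budget + 1)
        for b in range(budget + 1):
            for j in range(b + 1):          # spend index j of this row from budget b
                new_best[b] = min(new_best[b], best[b - j] * (1 - row[j]))
        best = new_best
    return 1 - best[budget]

On the worked example further down (three methods, m = 11) this returns 0.706, matching the integer-programming result.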

Here is working code for David Eisenstat's solution.
To understand the implementation, I think it helps to go through the math first.
As a reminder, there are n lists, each with m options. (In the motivating example at the bottom of the question, each list represents a method for solving the problem, and you are given m-1 seconds to solve the problem. Each list is such that list[index] gives the chance of solving the problem with that method if the method is run for index seconds.)
We let the lists be stored in a matrix called d (named data in the code), where each row in the matrix is a list. (And thus each column represents an index, or, if following the motivating example, an amount of time.)
The probability of the coin landing heads, given that we chose index j*_i for list i, is computed as

1 - ∏_{i=1..n} (1 - d[i][j*_i])

We would like to maximize this.
(To explain the stats behind this equation: we're computing 1 minus the probability that the coin never lands on heads. The probability that no flip lands on heads is the product of the probabilities that each individual flip doesn't land on heads, and the probability that a single flip doesn't land on heads is 1 minus the probability that it does, which is the number we've chosen, d[i][j*_i]. So the probability that every flip lands on tails is the product of the individual tails probabilities, and the probability of at least one heads is 1 minus that product.)
Which, as David pointed out, is the same as minimizing:

∏_{i=1..n} (1 - d[i][j*_i])

Which is the same as minimizing:

log( ∏_{i=1..n} (1 - d[i][j*_i]) )

Which is equivalent to:

Σ_{i=1..n} log(1 - d[i][j*_i])

Then, since this is a linear sum, we can turn it into an integer program.
We'll be minimizing:

Σ_{i=1..n} Σ_{j=0..m-1} x[i][j] * log(1 - d[i][j])
This lets the computer choose the indexes by allowing it to create an n by m matrix of 1s and 0s called x where the 1s pick out particular indexes. We'll then define rules so that it doesn't pick out invalid sets of indexes.
The first rule is that you have to pick out exactly one index for each list:

for each i: Σ_{j=0..m-1} x[i][j] = 1

The second rule is that you have to respect the constraint that the chosen indexes sum to less than m (i.e., to at most m - 1):

Σ_{i=1..n} Σ_{j=0..m-1} j * x[i][j] ≤ m - 1
And that's it! Then we can just tell the computer to minimize that sum according to those rules. It will spit out an x matrix with a single 1 on each row to tell us which index it has picked for the list on that row.
In code (using the motivating example), this is implemented as:
'''
Requirements:
cvxopt==1.2.6
cvxpy==1.1.10
ecos==2.0.7.post1
numpy==1.20.1
osqp==0.6.2.post0
qdldl==0.1.5.post0
scipy==1.6.1
scs==2.1.2
'''
import math
import cvxpy as cp
import numpy as np
# number of methods
n = 3
# if you have 10 seconds, there are 11 options for each method (0 seconds, 1 second, ..., 10 seconds)
m = 11
# method A has 30% chance of working if run for at least 3 seconds
# equivalent to [0, 0, 0, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3]
A_list = [0, 0, 0] + [0.3] * (m - 3)
# method B has 30% chance of working if run for at least 3 seconds
# equivalent to [0, 0, 0, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3]
B_list = [0, 0, 0] + [0.3] * (m - 3)
# method C has 40% chance of working if run for 4 seconds, 30% otherwise
# equivalent to [0.3, 0.3, 0.3, 0.3, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4]
C_list = [0.3, 0.3, 0.3, 0.3] + [0.4] * (m - 4)
data = [A_list, B_list, C_list]
# do the logarithm
log_data = []
for row in data:
    log_row = []
    for col in row:
        # deal with domain exception
        if col == 1:
            new_col = float('-inf')
        else:
            new_col = math.log(1 - col)
        log_row.append(new_col)
    log_data.append(log_row)
log_data = np.array(log_data)
x = cp.Variable((n, m), boolean=True)
objective = cp.Minimize(cp.sum(cp.multiply(log_data, x)))
# the current solver doesn't work with equalities, so each equality must be split into two inequalities.
# see https://github.com/cvxgrp/cvxpy/issues/1112
one_choice_per_method_constraint = [cp.sum(x[i]) <= 1 for i in range(n)] + [cp.sum(x[i]) >= 1 for i in range(n)]
# constrain the solution to not use more time than is allowed
# note that the time allowed is (m - 1), not m, because time is 1-indexed and the lists are 0-indexed
js = np.tile(np.array(list(range(m))), (n, 1))
time_constraint = [cp.sum(cp.multiply(js, x)) <= m - 1, cp.sum(cp.multiply(js, x)) >= m - 1]
constraints = one_choice_per_method_constraint + time_constraint
prob = cp.Problem(objective, constraints)
result = prob.solve()
def compute_probability(data, choices):
    # compute 1 - ((1 - p1) * (1 - p2) * ...)
    return 1 - np.prod(np.add(1, -np.multiply(data, choices)))
print("Choices:")
print(x.value)
'''
Choices:
[[0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0.]
[0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0.]
[0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]]
'''
print("Chance of success:")
print(compute_probability(data, x.value))
'''
Chance of success:
0.7060000000000001
'''
And there we have it! The computer has correctly determined that running method A for 3 seconds, method B for 3 seconds, and method C for 4 seconds is optimal. (Remember that the x matrix is 0-indexed, while the times are 1-indexed.)
Thank you, David, for the suggestion!

Related

Conditional sampling of binary vectors (?)

I'm trying to find a name for my problem, so I don't have to re-invent wheel when coding an algorithm which solves it...
I have say 2,000 binary (row) vectors and I need to pick 500 from them. In the picked sample I do column sums and I want my sample to be as close as possible to a pre-defined distribution of the column sums. I'll be working with 20 to 60 columns.
A tiny example:
Out of the vectors:
110
010
011
110
100
I need to pick 2 to get column sums 2, 1, 0. The solution (exact in this case) would be
110
100
My ideas so far
one could maybe call this a binary multidimensional knapsack, but I did not find any algos for that
Linear Programming could help, but I'd need some step by step explanation as I got no experience with it
as exact solution is not always feasible, something like simulated annealing brute force could work well
a hacky way using constraint solvers comes to mind - first set the constraints tight and gradually loosen them until some solution is found - given that CSP should be much faster than ILP...?
My concrete, practical (if the approximation guarantee works out for you) suggestion would be to apply the maximum entropy method (in Chapter 7 of Boyd and Vandenberghe's book Convex Optimization; you can probably find several implementations with your favorite search engine) to find the maximum entropy probability distribution on row indexes such that (1) no row index is more likely than 1/500 (2) the expected value of the row vector chosen is 1/500th of the predefined distribution. Given this distribution, choose each row independently with probability 500 times its distribution likelihood, which will give you 500 rows on average. If you need exactly 500, repeat until you get exactly 500 (shouldn't take too many tries due to concentration bounds).
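For illustration only, here is a rough sketch of how that maximum-entropy program could be set up with cvxpy (the same library used in the answer to the first question above). The data, target, and variable names here are synthetic stand-ins, not part of the original suggestion:

import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
n_rows, n_cols, k = 2000, 30, 500
rows = (rng.random((n_rows, n_cols)) < 0.25).astype(float)   # synthetic binary vectors
target = rows[:k].sum(axis=0)                 # an achievable target for the column sums

p = cp.Variable(n_rows, nonneg=True)          # probability of each row index
objective = cp.Maximize(cp.sum(cp.entr(p)))   # maximum entropy distribution
constraints = [
    cp.sum(p) == 1,
    p <= 1.0 / k,                             # no row index more likely than 1/500
    rows.T @ p == target / k,                 # expected chosen row = 1/500 of the target
]
cp.Problem(objective, constraints).solve()

# choose each row independently with probability 500 * p_i (~500 rows on average)
picked = rng.random(n_rows) < k * p.value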
Firstly I will make some assumptions regarding this problem:
Regardless of whether the column sum of the selected solution is over or under the target, it weighs the same.
The sums of the first, second, and third columns are equally weighted in the solution (i.e. if there's a solution where the first column sum is off by 1, and another where the third column sum is off by 1, the solutions are equally good).
The closest problem I can think of to this one is the subset sum problem, which itself can be thought of as a special case of the knapsack problem.
However, both of these problems are NP-complete, which means there is no known polynomial-time algorithm that can solve them, even though it is easy to verify a solution.
If I were you, the two arguably most practical approaches to this problem are linear programming and machine learning.
Depending on how many columns you are optimising over, with linear programming you can control how finely tuned you want the solution to be, in exchange for time. You should read up on this, because it is fairly simple and efficient.
With machine learning, you need a lot of data sets (the set of vectors and the set of solutions). You don't even need to specify what you want; a lot of machine learning algorithms can generally deduce what you want them to optimise based on your data set.
Both approaches have pros and cons; you should decide which one to use yourself based on the circumstances and the problem set.
This definitely can be modeled as an (integer!) linear program (many problems can). Once you have it, you can use a program such as lpsolve to solve it.
We model whether vector i is selected with a variable x_i, which can be 0 or 1.
Then for each column c, we have a constraint:
sum of all (x_i * value of i in column c) = target for column c
Taking your example, in lp_solve this could look like:
min: ;
+x1 +x4 +x5 >= 2;
+x1 +x4 +x5 <= 2;
+x1 +x2 +x3 +x4 <= 1;
+x1 +x2 +x3 +x4 >= 1;
+x3 <= 0;
+x3 >= 0;
bin x1, x2, x3, x4, x5;
If you are fine with a heuristic based search approach, here is one.
Go over the list and find the minimum squared sum of the digit wise difference between each bit string and the goal. For example, if we are looking for 2, 1, 0, and we are scoring 0, 1, 1, we would do it in the following way:
Take the digit wise difference:
2, 0, 1
Square the digit wise difference:
4, 0, 1
Sum:
5
As a side note, squaring the difference when scoring is a common method in heuristic search. In your case, it makes sense because bit strings that have a 1 as the first digit are a lot more interesting to us. In your case this simple algorithm would pick first 110, then 100, which is the best solution.
In any case, there are some optimizations that could be made to this, I will post them here if this kind of approach is what you are looking for, but this is the core of the algorithm.
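For concreteness, here is one way this greedy scoring loop could look in Python. This is my own sketch (the names score and greedy_pick are made up), not code from the answer:

def score(sums, goal):
    # squared digit-wise difference between current column sums and the goal
    return sum((g - s) ** 2 for g, s in zip(goal, sums))

def greedy_pick(vectors, goal, k):
    remaining = list(vectors)
    picked, sums = [], [0] * len(goal)
    for _ in range(k):
        # pick the vector whose addition brings the running sums closest to the goal
        best = min(remaining,
                   key=lambda v: score([s + b for s, b in zip(sums, v)], goal))
        remaining.remove(best)
        picked.append(best)
        sums = [s + b for s, b in zip(sums, best)]
    return picked

# Tiny example from the question:
# greedy_pick([(1,1,0), (0,1,0), (0,1,1), (1,1,0), (1,0,0)], (2,1,0), 2)
# picks (1,1,0) first and then (1,0,0).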
You have a given target binary vector. You want to select M vectors out of N that have the closest sum to the target. Let's say you use the Euclidean distance to measure whether one selection is better than another.
If you want an exact sum, have a look at the k-sum problem, which is a generalization of the 3SUM problem. The problem is harder than the subset sum problem, because you want an exact number of elements to add to a target value. There is a solution in O(N^(M/2) · lg N), but that means more than 2000^250 * 7.6 > 10^826 operations in your case (in the favorable case where vector operations have a cost of 1).
First conclusion: do not try to get an exact result unless your vectors have some characteristics that may reduce the complexity.
Here's a hill climbing approach:
sort the vectors by number of 1's: 111... first, 000... last;
use the polynomial time approximate algorithm for the subset sum;
you have an approximate solution with K elements. Because of the order of elements (the big ones come first), K should be as small as possible:
if K >= M, you take the first M vectors of the solution and that's probably near the best you can do.
if K < M, you can remove the first vector and try to replace it with 2 or more vectors from the rest of the N vectors, using the same technique, until you have M vectors. To summarize: split the big vectors into smaller ones until you reach the correct number of vectors.
Here's a proof of concept with numbers, in Python:
import random

def distance(x, y):
    return abs(x - y)

def show(ls):
    if len(ls) < 10:
        return str(ls)
    else:
        return ", ".join(map(str, ls[:5] + ("...",) + ls[-5:]))

def find(is_xs, target):
    # see https://en.wikipedia.org/wiki/Subset_sum_problem#Pseudo-polynomial_time_dynamic_programming_solution
    S = [(0, ())]  # we store indices along with values to get the path
    for i, x in is_xs:
        T = [(x + t, js + (i,)) for t, js in S]
        U = sorted(S + T)
        y, ks = U[0]
        S = [(y, ks)]
        for z, ls in U:
            if z == target:  # use the euclidean distance here if you want an approximation
                return ls
            if z != y and z < target:
                y, ks = z, ls
                S.append((z, ls))
    ls = S[-1][1]  # take the closest element to target
    return ls

N = 2000
M = 500
target = 1000
xs = [random.randint(0, 10) for _ in range(N)]
print("Take {} numbers out of {} to make a sum of {}".format(M, show(tuple(xs)), target))
xs = sorted(xs, reverse=True)
is_xs = list(enumerate(xs))
print("Sorted numbers: {}".format(show(tuple(is_xs))))
ls = find(is_xs, target)
print("FIRST TRY: {} elements ({}) -> {}".format(len(ls), show(ls), sum(x for i, x in is_xs if i in ls)))
splits = 0
while len(ls) < M:
    first_x = xs[ls[0]]
    js_ys = [(i, x) for i, x in is_xs if i not in ls and x != first_x]
    replace = find(js_ys, first_x)
    splits += 1
    if len(replace) < 2 or len(replace) + len(ls) - 1 > M or sum(xs[i] for i in replace) != first_x:
        print("Give up: can't replace {}.\nAdd the lowest elements.".format(ls[:1]))
        ls += tuple([i for i, x in is_xs if i not in ls][len(ls) - M:])
        break
    print("Replace {} (={}) by {} (={})".format(ls[:1], first_x, replace, sum(xs[i] for i in replace)))
    ls = tuple(sorted(ls[1:] + replace))  # use a heap?
    print("{} elements ({}) -> {}".format(len(ls), show(ls), sum(x for i, x in is_xs if i in ls)))
print("AFTER {} splits, {} -> {}".format(splits, ls, sum(x for i, x in is_xs if i in ls)))
The result is obviously not guaranteed to be optimal.
Remarks:
Complexity: find has a polynomial time complexity (see the Wikipedia page) and is called at most M^2 times, hence the complexity remains polynomial. In practice, the process is reasonably fast (split calls have a small target).
Vectors: to ensure that you reach the target with the minimum number of elements, you can improve the ordering of the elements. Your target is (t_1, ..., t_c): if you sort the t_j from max to min, you get the most important columns first. You can sort the vectors: by number of 1s and then by the presence of a 1 in the most important columns. E.g. target = 4 8 6 => 1 1 1 > 0 1 1 > 1 1 0 > 1 0 1 > 0 1 0 > 0 0 1 > 1 0 0 > 0 0 0.
find (vectors): if the current sum exceeds the target in all the columns, then you will not reach the target (any vector you add to the current sum will take you farther from the target): don't add that sum to S (this is the z >= target case for numbers).
I propose a simple ad hoc algorithm, which, broadly speaking, is a kind of gradient descent algorithm. It seems to work relatively well for input vectors which have a distribution of 1s “similar” to the target sum vector, and probably also for all “nice” input vectors, as defined in a comment of yours. The solution is not exact, but the approximation seems good.
The distance between the sum vector of the output vectors and the target vector is taken to be Euclidean. Minimizing it means minimizing the sum of the square differences of the sum vector and the target vector (the square root is not needed because it is monotonic). The algorithm does not guarantee to yield the sample that minimizes the distance from the target, but anyway makes a serious attempt at doing so, by always moving in some locally optimal direction.
The algorithm can be split into 3 parts.
First of all the first M candidate output vectors out of the N input vectors (e.g., N=2000, M=500) are put in a list, and the remaining vectors are put in another.
Then "approximately optimal" swaps between vectors in the two lists are done, until either the distance would not decrease any more, or a predefined maximum number of iterations is reached. An approximately optimal swap is one where removing the first vector from the list of output vectors causes a maximal decrease or minimal increase of the distance, and then, after the removal of the first vector, adding the second vector to the same list causes a maximal decrease of the distance. The whole swap is avoided if the net result is not a decrease of the distance.
Then, as a last phase, "optimal" swaps are done, again stopping on no decrease in distance or maximum number of iterations reached. Optimal swaps cause a maximal decrease of the distance, without requiring the removal of the first vector to be optimal in itself. To find an optimal swap all vector pairs have to be checked. This phase is much more expensive, being O(M(N-M)), while the previous "approximate" phase is O(M+(N-M))=O(N). Luckily, when entering this phase, most of the work has already been done by the previous phase.
from typing import List, Tuple

def get_sample(vects: List[Tuple[int]], target: Tuple[int], n_out: int,
               max_approx_swaps: int = None, max_optimal_swaps: int = None,
               verbose: bool = False) -> List[Tuple[int]]:
    """
    Get a sample of the input vectors having a sum close to the target vector.
    Closeness is measured in Euclidean metrics. The output is not guaranteed to be
    optimal (minimum square distance from target), but a serious attempt is made.
    The max_* parameters can be used to avoid too long execution times,
    tune them to your needs by setting verbose to True, or leave them None (∞).
    :param vects: the list of vectors (tuples) with the same number of "columns"
    :param target: the target vector, with the same number of "columns"
    :param n_out: the requested sample size
    :param max_approx_swaps: the max number of approximately optimal vector swaps,
        None means unlimited (default: None)
    :param max_optimal_swaps: the max number of optimal vector swaps,
        None means unlimited (default: None)
    :param verbose: print some info if True (default: False)
    :return: the sample of n_out vectors having a sum close to the target vector
    """
    def square_distance(v1, v2):
        return sum((e1 - e2) ** 2 for e1, e2 in zip(v1, v2))

    n_vec = len(vects)
    assert n_vec > 0
    assert n_out > 0
    n_rem = n_vec - n_out
    assert n_rem > 0
    output = vects[:n_out]
    remain = vects[n_out:]
    n_col = len(vects[0])
    assert n_col == len(target) > 0
    sumvect = (0,) * n_col
    for outvect in output:
        sumvect = tuple(map(int.__add__, sumvect, outvect))
    sqdist = square_distance(sumvect, target)
    if verbose:
        print(f"sqdist = {sqdist:4} after"
              f" picking the first {n_out} vectors out of {n_vec}")
    if max_approx_swaps is None:
        max_approx_swaps = sqdist
    n_approx_swaps = 0
    while sqdist and n_approx_swaps < max_approx_swaps:
        # find the best vect to subtract (the square distance MAY increase)
        sqdist_0 = None
        index_0 = None
        sumvect_0 = None
        for index in range(n_out):
            tmp_sumvect = tuple(map(int.__sub__, sumvect, output[index]))
            tmp_sqdist = square_distance(tmp_sumvect, target)
            if sqdist_0 is None or sqdist_0 > tmp_sqdist:
                sqdist_0 = tmp_sqdist
                index_0 = index
                sumvect_0 = tmp_sumvect
        # find the best vect to add,
        # but only if there is a net decrease of the square distance
        sqdist_1 = sqdist
        index_1 = None
        sumvect_1 = None
        for index in range(n_rem):
            tmp_sumvect = tuple(map(int.__add__, sumvect_0, remain[index]))
            tmp_sqdist = square_distance(tmp_sumvect, target)
            if sqdist_1 > tmp_sqdist:
                sqdist_1 = tmp_sqdist
                index_1 = index
                sumvect_1 = tmp_sumvect
        if sumvect_1:
            tmp = output[index_0]
            output[index_0] = remain[index_1]
            remain[index_1] = tmp
            sqdist = sqdist_1
            sumvect = sumvect_1
            n_approx_swaps += 1
        else:
            break
    if verbose:
        print(f"sqdist = {sqdist:4} after {n_approx_swaps}"
              f" approximately optimal swap{'s'[n_approx_swaps == 1:]}")
    diffvect = tuple(map(int.__sub__, sumvect, target))
    if max_optimal_swaps is None:
        max_optimal_swaps = sqdist
    n_optimal_swaps = 0
    while sqdist and n_optimal_swaps < max_optimal_swaps:
        # find the best pair to swap,
        # but only if the square distance decreases
        best_sqdist = sqdist
        best_diffvect = diffvect
        best_pair = None
        for i0 in range(n_out):
            tmp_diffvect = tuple(map(int.__sub__, diffvect, output[i0]))
            for i1 in range(n_rem):
                new_diffvect = tuple(map(int.__add__, tmp_diffvect, remain[i1]))
                new_sqdist = sum(d * d for d in new_diffvect)
                if best_sqdist > new_sqdist:
                    best_sqdist = new_sqdist
                    best_diffvect = new_diffvect
                    best_pair = (i0, i1)
        if best_pair:
            tmp = output[best_pair[0]]
            output[best_pair[0]] = remain[best_pair[1]]
            remain[best_pair[1]] = tmp
            sqdist = best_sqdist
            diffvect = best_diffvect
            n_optimal_swaps += 1
        else:
            break
    if verbose:
        print(f"sqdist = {sqdist:4} after {n_optimal_swaps}"
              f" optimal swap{'s'[n_optimal_swaps == 1:]}")
    return output
from random import randrange
C = 30 # number of columns
N = 2000 # total number of vectors
M = 500 # number of output vectors
F = 0.9 # fill factor of the target sum vector
T = int(M * F) # maximum value + 1 that can appear in the target sum vector
A = 10000 # maximum number of approximately optimal swaps, may be None (∞)
B = 10 # maximum number of optimal swaps, may be None (unlimited)
target = tuple(randrange(T) for _ in range(C))
vects = [tuple(int(randrange(M) < t) for t in target) for _ in range(N)]
sample = get_sample(vects, target, M, A, B, True)
Typical output:
sqdist = 2639 after picking the first 500 vectors out of 2000
sqdist = 9 after 27 approximately optimal swaps
sqdist = 1 after 4 optimal swaps
P.S.: As it stands, this algorithm is not limited to binary input vectors; integer vectors would work too. Intuitively, though, I suspect that the quality of the optimization could suffer, and that this algorithm is more appropriate for binary vectors.
P.P.S.: Execution times with your kind of data are probably acceptable with standard CPython, but get better (down to a couple of seconds, almost a factor of 10) with PyPy. To handle bigger sets of data, the algorithm would have to be translated to C or some other language, which should not be difficult at all.

Randomly select N unique elements from a list, given a probability for each

I've run into a problem: I have a list or array (IList) of elements that have a field (float Fitness). I need to efficiently choose N random unique elements depending on this variable: the bigger - the more likely it is to be chosen.
I searched on the internet, but the algorithms I found were rather unreliable.
The answer given here seems to give a bigger probability to elements at the beginning, which I need to make sure to avoid.
-Edit-
For example I need to choose from objects with the values [-5, -3, 0, 1, 2.5] (negative values included).
The basic algorithm is to sum the values, and then draw a point from 0-sum(values) and an order for the items, and see which one it "intersects".
For the values [0.1, 0.2, 0.3] the "windows" [0-0.1, 0.1-0.3, 0.3-0.6] will look like this:
1 23 456
|-|--|---|
|-*--*---|
And you draw a point [0-0.6] and see what window it hit on the axis.
Pseudo-python for this:
original_values = {val1, val2, ... valn}
# list is to order them, order doesn't matter outside this context.
values = list(original_values)
# limit
limit = sum(values)
draw = random() * limit
while true:
    candidate = values.pop()
    if candidate > draw:
        return candidate
    draw -= candidate
So what shall those numbers represent?
Does 2.5 mean that the probability of being chosen is twice as high as for 1.25? Well, the negative values don't fit into that scheme.
I guess fitness means something like -5: very ill, 2.5: very fit. We have a range of 7.5 and could randomly pick an element, if we know how many candidates there are and if we have access by index.
Then, take a random number between -5 and 2.5 and see if it is lower than or equal to the candidate's fitness. If so, the candidate is picked, else we repeat with step 1. I would say that we should then also generate a new threshold to survive, because if we got a 2.5, but no candidate with that fitness remains, we would search infinitely.
The range of fitnesses has to be known for this, too.
fitnesses   [-5, -3, 0, 1, 2.5]
rand -5       x   x  x  x   x
     -2.5     -   -  x  x   x
      0       -   -  x  x   x
      2.5     -   -  -  -   x
If every candidate shall be tested every round, and the -5 guy shall have a chance to survive, you have to stretch the interval of random numbers a bit to give him a chance, for instance from -6 to 3.
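If it helps, here is a small Python sketch of that rejection idea (my own code, not the answerer's; the function name and the optional low/high stretching parameters are made up for illustration):

import random

def pick_by_fitness(items, fitness, low=None, high=None):
    # pick one item: draw a random threshold in the fitness range and a random
    # candidate, and accept the candidate if its fitness reaches the threshold
    lo = min(fitness(x) for x in items) if low is None else low
    hi = max(fitness(x) for x in items) if high is None else high
    while True:
        candidate = random.choice(items)
        threshold = random.uniform(lo, hi)
        if fitness(candidate) >= threshold:
            return candidate

# e.g. pick_by_fitness([-5, -3, 0, 1, 2.5], fitness=lambda v: v, low=-6, high=3)
# stretching the interval (low=-6, high=3) gives the -5 element a nonzero chance

To get N unique elements you could call this repeatedly, removing each picked item from the list.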

Find North-East path with most points [duplicate]

In Cracking the Coding Interview, Fourth Edition, there is such a problem:
A circus is designing a tower routine consisting of people standing atop one another's shoulders. For practical and aesthetic reasons, each person must be both shorter and lighter than the person below him or her. Given the heights and weights of each person in the circus, write a method to compute the largest possible number of people in such a tower.
EXAMPLE: Input (ht, wt): (65, 100) (70, 150) (56, 90) (75, 190) (60, 95) (68, 110)
Output: The longest tower is length 6 and includes from top to bottom: (56, 90) (60, 95) (65, 100) (68, 110) (70, 150) (75, 190)
Here is its solution in the book
Step 1: Sort all items by height first, and then by weight. This means that if all the heights are unique, the items will be sorted by their height. If heights are the same, items will be sorted by their weight.
Step 2: Find the longest sequence which contains increasing heights and increasing weights.
To do this, we:
a) Start at the beginning of the sequence. Currently, max_sequence is empty.
b) If, for the next item, the height and the weight are not greater than those of the previous item, we mark this item as "unfit".
c) If the sequence found has more items than "max sequence", it becomes "max sequence".
d) After that the search is repeated from the "unfit item", until we reach the end of the original sequence.
I have some questions about its solutions.
Q1
I believe this solution is wrong.
For example
(3,2) (5,9) (6,7) (7,8)
Obviously, (6,7) is an unfit item, but how about (7,8)? According to the solution, it is NOT unfit, as its h and w are both bigger than those of (6,7); however, it cannot be added to the sequence, because (7,8) does not fit after (5,9).
Am I right?
If I am right, what is the fix?
Q2
I believe even if there is a fix for the above solution, the style of the solution will lead to at least O(n^2), because it needs to iterate again and again, according to step 2-d.
So is it possible to have a O(nlogn) solution?
You can solve the problem with dynamic programming.
Sort the troupe by height. For simplicity, assume all the heights h_i and weights w_j are distinct. Thus h_i is an increasing sequence.
We compute a sequence T_i, where T_i is a tower with person i at the top of maximal size. T_1 is simply {1}. We can deduce subsequent T_k from the earlier T_j — find the largest tower T_j that can take k's weight (w_j < w_k) and stand k on it.
The largest possible tower from the troupe is then the largest of the T_i.
This algorithm takes O(n**2) time, where n is the cardinality of the troupe.
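A sketch of that quadratic DP in Python (my own code, not from the book or the answer above):

def largest_tower(people):
    # people: list of (height, weight) tuples
    people = sorted(people)            # increasing height, ties broken by weight
    n = len(people)
    best = [1] * n                     # best[k]: longest valid chain ending at person k
    for k in range(n):
        for j in range(k):
            if people[j][0] < people[k][0] and people[j][1] < people[k][1]:
                best[k] = max(best[k], best[j] + 1)
    return max(best) if best else 0

# largest_tower([(65,100), (70,150), (56,90), (75,190), (60,95), (68,110)]) == 6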
I tried solving this myself. I did not mean to give a ready-made solution, but I'm posting it anyway, more to check my own understanding and whether my code (Python) is OK and would work for all test cases. I tried 3 cases and it seemed to give the correct answer.
#!/usr/bin/python
# This function takes a list of tuples. Tuple(n): (height, weight) of nth person
def htower_len(ht_wt):
    ht_sorted = sorted(ht_wt, reverse=True)
    wt_sorted = sorted(ht_wt, key=lambda ht_wt: ht_wt[1])
    max_len = 1
    len1 = len(ht_sorted)
    i = 0
    j = 0
    while i < (len1 - 1):
        if ht_sorted[i + 1][1] < ht_sorted[0][1]:
            max_len = max_len + 1
        i = i + 1
    print "maximum tower length :", max_len
###Called above function with below sample app code.
testcase =1
print "Result of Test case ",testcase
htower_len([(5,75),(6.7,83),(4,78),(5.2,90)])
testcase = testcase + 1
print "Result of Test case ",testcase
htower_len([(65, 100),(70, 150),(56, 90),(75, 190),(60, 95),(68, 110)])
testcase = testcase + 1
print "Result of Test case ",testcase
htower_len([(3,2),(5,9),(6,7),(7,8)])
For example
(3,2) (5,9) (6,7) (7,8)
Obviously, (6,7) is an unfit item, but how about (7,8)?
In answer to your Question - the algorithm first runs starting with 3,2 and gets the sequence (3,2) (5,9) marking (6,7) and (7,8) as unfit.
It then starts again on (6,7) (the first unfit) and gets (6,7) (7,8), and that makes the answer 2. Since there are no more "unfit" items, the sequence terminates with maximum length 2.
After first sorting the array by height and weight, my code checks what the largest tower would be if we grabbed any of the remaining tuples in the array (and possible subsequent tuples). In order to avoid re-computing sub-problems, solution_a is used to store the optimal max length from the tail of the input_array.
The beginning_index is the index from which we can consider grabbing elements from (the index from which we can consider people who could go below on the human stack), and beginning_tuple refers to the element/person higher up on the stack.
This solution runs in O(nlogn) to do the sort. The space used is O(n) for the solution_a array and the copy of the input_array.
def determine_largest_tower(beginning_index, a, beginning_tuple, solution_a):
    # base case
    if beginning_index >= len(a):
        return 0
    if solution_a[beginning_index] != -1:  # already computed
        return solution_a[beginning_index]
    # recursive case
    max_len = 0
    for i in range(beginning_index, len(a)):
        # if we can grab that value, check what the max would be
        if a[i][0] >= beginning_tuple[0] and a[i][1] >= beginning_tuple[1]:
            max_len = max(1 + determine_largest_tower(i + 1, a, a[i], solution_a), max_len)
    solution_a[beginning_index] = max_len
    return max_len

def algorithm_for_human_towering(input_array):
    a = sorted(input_array)
    return determine_largest_tower(0, a, (-1, -1), [-1] * len(a))

a = [(3,2), (5,9), (6,7), (7,8)]
print algorithm_for_human_towering(a)
Here is another way to approach the problem altogether with code;
Algorithm
Sorting first by height and then by width
Sorted array:
[(56, 90), (60, 95), (65, 100), (68, 110), (70, 150), (75, 190)]
Finding the length of the longest increasing subsequence of weights
Why the longest subsequence of weights is the answer?
The people are sorted by increasing height,
so when we are finding a subsequence of people with increasing weights too
these selected people would satisfy our requirement as they are both in increasing order of heights and weights and therefore can form a human tower.
For example:
[(56, 90) (60,95) (65,100) (68,110) (70,150) (75,190)]
Efficient Implementation
In the attached implementation we maintain a list of increasing numbers and use bisect_left, which is implemented under the hood using binary search, to find the proper index for insertion.
Please note: the sequence generated by the longest_increasing_sequence method might not be the actual longest subsequence; however, its length will surely equal the length of the longest increasing subsequence.
Kindly refer to Longest increasing subsequence Efficient algorithms for more details.
The overall time complexity is O(n log(n)) as desired.
Code
from bisect import bisect_left

def human_tower(height, weight):
    def longest_increasing_sequence(A, get_property):
        lis = []
        for i in range(len(A)):
            x = get_property(A[i])
            i = bisect_left(lis, x)
            if i == len(lis):
                lis.append(x)
            else:
                lis[i] = x
        return len(lis)

    # Edge case, no people
    if 0 == len(height):
        return 0
    # Creating array of heights and weights
    people = [(h, w) for h, w in zip(height, weight)]
    # Sorting array first by height and then by weight
    people.sort()
    # Returning length of the longest increasing sequence
    return longest_increasing_sequence(people, lambda t: t[1])

assert 6 == human_tower([65,70,56,75,60,68], [100,150,90,190,95,110])

Compare two arrays of points [closed]

I'm trying to find a way to find similarities in two arrays of different points. I drew circles around points that have similar patterns and I would like to do some kind of auto comparison in intervals of, let's say, 100 points, and tell what the coefficient of similarity is for that interval. As you can see it might not be perfectly aligned either, so point-to-point comparison would not be a good solution (I suppose). Patterns that are slightly misaligned could still count as matching the pattern (but obviously with a smaller coefficient).
What similarity could mean (1 coefficient is a perfect match, 0 or less - is not a match at all):
Points 640 to 660 - Very similar (coefficient is ~0.8)
Points 670 to 690 - Quite similar (coefficient is ~0.5-~0.6)
Points 720 to 780 - Let's say quite similar (coefficient is ~0.5-~0.6)
Points 790 to 810 - Perfectly similar (coefficient is 1)
Coefficient is just my thoughts of how a final calculated result of comparing function could look like with given data.
I read many posts on SO but it didn't seem to solve my problem. I would appreciate your help a lot. Thank you
P.S. Perfect answer would be the one that provides pseudo code for function which could accept two data arrays as arguments (intervals of data) and return coefficient of similarity.
I also think High Performance Mark has basically given you the answer (cross-correlation). In my opinion, most of the other answers are only giving you half of what you need (i.e., dot product plus compare against some threshold). However, this won't consider a signal to be similar to a shifted version of itself. You'll want to compute this dot product N + M - 1 times, where N, M are the sizes of the arrays. For each iteration, compute the dot product between array 1 and a shifted version of array 2. The amount you shift array 2 increases by one each iteration. You can think of array 2 as a window you are passing over array 1. You'll want to start the loop with the last element of array 2 only overlapping the first element in array 1.
This loop will generate numbers for different amounts of shift, and what you do with that number is up to you. Maybe you compare it (or the absolute value of it) against a threshold that you define to consider two signals "similar".
Lastly, in many contexts, a signal is considered similar to a scaled (in the amplitude sense, not time-scaling) version of itself, so there must be a normalization step prior to computing the cross-correlation. This is usually done by scaling the elements of the array so that the dot product with itself equals 1. Just be careful to ensure this makes sense for your application numerically, i.e., integers don't scale very well to values between 0 and 1 :-)
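As a rough illustration of this (my own sketch, using numpy's correlate rather than the hand-written shifting loop described above):

import numpy as np

def normalized_xcorr(a, b):
    # normalize each signal so its dot product with itself is 1,
    # then correlate at every shift ("full" mode gives N + M - 1 values)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    return np.correlate(a, b, mode="full")

# coefficient for an interval: take the peak over all shifts and compare it
# against whatever threshold you consider "similar", e.g.
# coeff = normalized_xcorr(signal1[640:660], signal2[640:660]).max()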
I think HighPerformanceMark's suggestion is the standard way of doing the job.
A computationally lightweight alternative measure might be a dot product:
Split both arrays into the same predefined index intervals.
Consider the array elements in each interval as vector coordinates in high-dimensional space.
Compute the dot product of both vectors.
The dot product will not be negative. If the two vectors are perpendicular in their vector space, the dot product will be 0 (in fact that's how 'perpendicular' is usually defined in higher dimensions), and it will attain its maximum for identical vectors.
If you accept the geometric notion of perpendicularity as a (dis)similarity measure, here you go.
Caveat:
This is an ad hoc heuristic chosen for computational efficiency. I cannot tell you about the mathematical/statistical properties of the process and its separation properties; if you need rigorous analysis, however, you'll probably fare better with correlation theory anyway and should perhaps forward your question to math.stackexchange.com.
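A small sketch of that per-interval dot product (my code, with a made-up function name), here normalized to unit length so identical shapes score near 1:

import numpy as np

def interval_similarity(a, b, start, end):
    # treat each interval as a vector and compare with a normalized dot product
    va = np.asarray(a[start:end], dtype=float)
    vb = np.asarray(b[start:end], dtype=float)
    denom = np.linalg.norm(va) * np.linalg.norm(vb)
    return float(va @ vb) / denom if denom else 0.0

# e.g. interval_similarity(signal1, signal2, 640, 660) near 1.0 means very similar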
My Attempt:
Total_sum=0
1. For each index i in the range (m,n)
2. sum=0
3. k=Array1[i]*Array2[i]; t1=magnitude(Array1[i]); t2=magnitude(Array2[i]);
4. k=k/(t1*t2)
5. sum=sum+k
6. Total_sum=Total_sum+sum
Coefficient=Total_sum/(m-n)
If all values are equal, then sum would return 1 in each case and total_sum would return (m-n)*(1). Hence, when the same is divided by (m-n) we get the value as 1. If the graphs are exact opposites, we get -1 and for other variations a value between -1 and 1 is returned.
This is not so efficient when the y range or the x range is huge. But, I just wanted to give you an idea.
Another option would be to perform an extensive xnor.
1. For each index i in the range (m,n)
2. sum=1
3. k=Array1[i] xnor Array2[i];
4. k=k/((pow(2,number_of_bits))-1) //This will scale k down to a value between 0 and 1
5. sum=(sum+k)/2
Coefficient=sum
Is this helpful ?
You can define a distance metric for two vectors A and B of length N containing numbers in the interval [-1, 1], e.g. as
sum = 0
for i in 0 to N - 1:
    d = (A[i] - B[i])^2   // this is in range 0 .. 4
    sum = sum + d
sum = (sum / 4) / N       // now in range 0 .. 1
This now returns distance 1 for vectors that are completely opposite (one is all 1, another all -1), and 0 for identical vectors.
You can translate this into your coefficient by
coeff = 1 - sum
However, this is a crude approach because it does not take into account the fact that there could be horizontal distortion or shift between the signals you want to compare, so let's look at some approaches for coping with that.
You can sort both your arrays (e.g. in ascending order) and then calculate the distance / coefficient. This returns more similarity than the original metric, and is agnostic towards permutations / shifts of the signal.
You can also calculate the differentials and calculate distance / coefficient for those, and then you can do that sorted also. Using differentials has the benefit that it eliminates vertical shifts. Sorted differentials eliminate horizontal shift but still recognize different shapes better than sorted original data points.
You can then e.g. average the different coefficients. Here is more complete code. The routine below calculates the coefficient for arrays A and B of a given size, and takes d many differentials (recursively) first. If sorted is true, the final (differentiated) array is sorted.
procedure calc(A, B, size, d, sorted):
    if (d > 0):
        A' = new array[size - 1]
        B' = new array[size - 1]
        for i in 0 to size - 2:
            A'[i] = (A[i + 1] - A[i]) / 2   // keep in range -1..1 by dividing by 2
            B'[i] = (B[i + 1] - B[i]) / 2
        return calc(A', B', size - 1, d - 1, sorted)
    else:
        if (sorted):
            A = sort(A)
            B = sort(B)
        sum = 0
        for i in 0 to size - 1:
            sum = sum + (A[i] - B[i]) * (A[i] - B[i])
        sum = (sum / 4) / size
        return 1 - sum                      // return the coefficient

procedure similarity(A, B, size):
    a = 0
    a = a + calc(A, B, size, 0, false)
    a = a + calc(A, B, size, 0, true)
    a = a + calc(A, B, size, 1, false)
    a = a + calc(A, B, size, 1, true)
    return a / 4                            // take average
For something completely different, you could also run Fourier transform using FFT and then take a distance metric on the returning spectra.
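If you want to experiment with that, here is a tiny sketch of my own (it compares normalized magnitude spectra, which also makes the measure largely insensitive to horizontal shifts; both intervals are assumed to have the same length):

import numpy as np

def spectral_distance(a, b):
    # compare normalized magnitude spectra instead of the raw samples
    sa = np.abs(np.fft.rfft(a))
    sb = np.abs(np.fft.rfft(b))
    sa = sa / np.linalg.norm(sa)
    sb = sb / np.linalg.norm(sb)
    return float(np.linalg.norm(sa - sb))   # 0 for identical spectra

# the distance of two unit-length spectra lies in 0..2, so one possible coefficient is
# coeff = 1 - spectral_distance(interval1, interval2) / 2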

How to implement Random(a,b) with only Random(0,1)? [duplicate]

Possible Duplicate:
how to get uniformed random between a, b by a known uniformed random function RANDOM(0,1)
In the book Introduction to Algorithms, there is an exercise:
Describe an implementation of the procedure Random(a, b) that only makes calls to Random(0,1). What is the expected running time of your procedure, as a function of a and b? The probability of the result of Random(a,b) should be purely uniformly distributed, like Random(0,1).
For the Random function, the results are integers between a and b, inclusive. E.g., Random(0,1) generates either 0 or 1; Random(a, b) generates a, a+1, a+2, ..., b.
My solution is like this:
for i = 1 to b-a
    r = a + Random(0,1)
return r
the running time is T=b-a
Is this correct? Are the results of my solutions uniformly distributed?
Thanks
What if my new solution is like this:
r = a
for i = 1 to b - a  // including b-a
    r += Random(0,1)
return r
If it is not correct, why r += Random(0,1) makes r not uniformly distributed?
Others have explained why your solution doesn't work. Here's the correct solution:
1) Find the smallest number, p, such that 2^p > b-a.
2) Perform the following algorithm:
r = 0
for i = 1 to p
    r = 2*r + Random(0,1)
3) If r is greater than b-a, go to step 2.
4) Your result is r+a
So let's try Random(1,3).
So b-a is 2.
2^1 = 2, so p will have to be 2 so that 2^p is greater than 2.
So we'll loop two times. Let's try all possible outputs:
00 -> r=0, 0 is not > 2, so we output 0+1 or 1.
01 -> r=1, 1 is not > 2, so we output 1+1 or 2.
10 -> r=2, 2 is not > 2, so we output 2+1 or 3.
11 -> r=3, 3 is > 2, so we repeat.
So 1/4 of the time, we output 1. 1/4 of the time we output 2. 1/4 of the time we output 3. And 1/4 of the time we have to repeat the algorithm a second time. Looks good.
Note that if you have to do this a lot, two optimizations are handy:
1) If you use the same range a lot, have a class that computes p once so you don't have to compute it each time.
2) Many CPUs have fast ways to perform step 1 that aren't exposed in high-level languages. For example, x86 CPUs have the BSR instruction.
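For reference, here is a small Python transcription of that algorithm (my own sketch; random01 stands in for the given Random(0,1) primitive):

import random

def random01():
    # stand-in for the given Random(0,1) primitive
    return random.randint(0, 1)

def random_ab(a, b):
    span = b - a
    p = 1
    while (1 << p) <= span:        # smallest p with 2^p > b - a
        p += 1
    while True:
        r = 0
        for _ in range(p):         # build a p-bit random number
            r = 2 * r + random01()
        if r <= span:              # otherwise reject and try again
            return a + r

# e.g. random_ab(1, 3) returns 1, 2 or 3, each with probability 1/3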
No, it's not correct, that method will concentrate around (a+b)/2. It's a binomial distribution.
Are you sure that Random(0,1) produces integers? It would make more sense if it produced floating point values between 0 and 1. Then the solution would be an affine transformation, with running time independent of a and b.
An idea I just had, in case it's about integer values: use bisection. At each step, you have a range low-high. If Random(0,1) returns 0, the next range is low-(low+high)/2, else (low+high)/2-high.
Details and complexity left to you, since it's homework.
That should create (approximately) a uniform distribution.
Edit: approximately is the important word there. Uniform if b-a+1 is a power of 2, not too far off if it's close, but not good enough generally. Ah, well it was a spontaneous idea, can't get them all right.
No, your solution isn't correct. This sum will have a binomial distribution.
However, you can generate a pure random sequence of 0, 1 and treat it as a binary number.
repeat
    result = a
    steps = ceiling(log(b - a))
    for i = 0 to steps
        result += (2 ^ i) * Random(0, 1)
until result <= b
KennyTM: my bad.
I read the other answers. For fun, here is another way to find the random number:
Allocate an array with b-a+1 elements.
Set all the values to 1.
Iterate through the array. For each nonzero element, flip the coin, as it were. If it came up 0, set the element to 0.
Whenever, after a complete iteration, you only have 1 element remaining, you have your random number: a+i, where i is the index of the nonzero element (assuming we start indexing at 0). All numbers are then equally likely. (You would have to deal with the case where it's a tie, but I leave that as an exercise for you.)
This would have O(infinity) ... :)
On average, though, half the numbers would be eliminated, so it would have an average case running time of log_2 (b-a).
First of all I assume you are actually accumulating the result, not adding 0 or 1 to a on each step.
Using some probabilities you can prove that your solution is not uniformly distributed. The chance that the resulting value r is (a+b)/2 is greatest. For instance, if a is 0 and b is 7, the chance that you get the value 4 is (7 choose 4) divided by 2 raised to the power 7. The reason for that is that no matter which 4 out of the 7 values are 1, the result will still be 4.
The running time you estimate is correct.
Your solution's pseudocode should look like:
r = a
for i = 0 to b-a
    r += Random(0,1)
return r
As for uniform distribution: assuming that the random source this generator is based on is perfectly uniform, the odds of getting 0 or 1 are 50%. Therefore getting the number you want is the result of that choice made over and over again.
So for a=1, b=5, there are 5 choices made.
The odds of getting 1 involves 5 decisions, all 0, the odds of that are 0.5^5 = 3.125%
The odds of getting 5 involves 5 decisions, all 1, the odds of that are 0.5^5 = 3.125%
As you can see from this, the distribution is not uniform -- the odds of any number should be 20%.
In the algorithm you created, it is really not equally distributed.
The result "r" will always be either "a" or "a+1". It will never go beyond that.
It should look something like this:
r = 0;
for i = 0 to b-a
    r = a + r + Random(0,1)
return r;
By including "r" into your computation, you are including the "randomness" of all the previous "for" loop runs.
