Algorithm for Finding a possible profit - algorithm

I'm searching for an algorithm that efficiently (preferably in O(n^2)) does the following; apart from the naive brute-force solution I have had no luck:
Supposing I’m playing a game and in this game I’ll have to answer n questions (each question from a different category). For each category “i” i=1,...,n I’ve calculated the probability p_i to give a correct answer.
For each consecutive k correct answers I’m getting k^4 points. What is the expected average profit?
I will clarify what I mean by expected profit in the following example:
In the case n=3 and p_1=0.2,p_2=0.3,p_3=0.4
The expected profit is
EP = (0.2*0.3*0.4)*3^4 + (I get all 3 answers correct)
(0.2*0.3*0.6)*2^4 + (0.8*0.3*0.4)*2^4 + (0.2*0.7*0.4)*2 + (2 answers correct)
(0.2*0.7*0.6) + (0.8*0.3*0.6) + (0.8*0.7*0.4) (1 answer correct)
Clearly, for each possible outcome I calculate the probability and multiply it by the points gained, and then take the sum of all those.
Any ideas?
I'm only interested in the sum itself.
Thank you!

Let A[t] be the expected profit after t questions given that either t = 0, t = n, or the t'th question was answered wrong. Then you can compute
A[0] = 0
A[t] = sum(i = 0..t-1) (probability of getting questions i .. t-2 right and t-1 wrong) * ((t-i-1)^4 + A[i]) when 0 < t < n.
A[n] is computed similarly to the general case above, except you should also add a term for when all questions after the ith are answered correctly.
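One way to make this recurrence concrete is sketched below. The bookkeeping is an assumption of this sketch, not necessarily what the answer intended: questions are numbered 1..n, position 0 and position n+1 act as sentinel questions that are always "wrong", and A[t] is the expected profit of the first t questions given that question t is wrong. The names expected_profit and brute_force are hypothetical; brute_force is only there to cross-check small inputs.

```python
from itertools import product

def expected_profit(p):
    """O(n^2) DP; p[i-1] is the probability of answering question i correctly."""
    n = len(p)
    # wrong[i] = probability that question i is wrong; 0 and n+1 are sentinels
    wrong = [1.0] + [1.0 - x for x in p] + [1.0]
    A = [0.0] * (n + 2)
    for t in range(1, n + 2):
        run = 1.0    # probability that questions i+1 .. t-1 are all right
        total = 0.0
        for i in range(t - 1, -1, -1):   # i = last wrong question before t
            total += wrong[i] * run * ((t - 1 - i) ** 4 + A[i])
            if i >= 1:
                run *= p[i - 1]          # extend the run to include question i
        A[t] = total
    return A[n + 1]

def brute_force(p):
    """Enumerate all 2^n outcomes; only usable for small n."""
    total = 0.0
    for outcome in product([False, True], repeat=len(p)):
        prob, score, run = 1.0, 0, 0
        for ok, pi in zip(outcome, p):
            prob *= pi if ok else 1.0 - pi
            if ok:
                run += 1
            else:
                score += run ** 4
                run = 0
        score += run ** 4                # close the final run
        total += prob * score
    return total

print(expected_profit([0.2, 0.3, 0.4]))
```

For the example in the question (p = 0.2, 0.3, 0.4) both functions evaluate the sum written out above, which comes to 4.62.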

Related

In how many ways you can count till N using the numbers <= with N [duplicate]

In how many ways can you write n as a sum of numbers less than or equal to n? What is the algorithm to solve that?
Example:
lets say that we have
n =10;
so there are a lot of combinations but for example we can do:
1+1+1+1+1+1+1+1+1+1 = 10
1+2+1+1+1+1+1+1+1=10
1+1+2+1+1+1+1+1+1=10
.....
1+9=10
10=10
8+2=10
and so on.
If you think this is the Catalan question: the problem looks like a Catalan problem, but it is not. Look at the results for N=5: the Catalan number gives 14 possibilities, but the right answer is 2^4=16 possibilities if you count every ordering, or 7 if you keep only the unique combinations, so the Catalan numbers do not fit.
I received this question in a quiz done for fun; at the time I thought the solution was a well-known formula, so I lost a lot of time trying to remember it :)
I found 2 solutions for this problem, and 1 more if you consider only the unique combinations, e.g. 2+8 is the same as 8+2, so you count only one of them.
So what is the algorithm to solve it?
This is an interesting problem. I do not have the solution (yet), but I think this can be done in a divide-and-conquer way. If you think of the problem space as a binary tree, you can generate it like this:
The root is the whole number n
Its children are floor(n/2) and ceil(n/2)
Example:
n=5
    5
   / \
  2   3
 / \ / \
1  1 1  2
       / \
      1   1
If you do this recursively, you get a binary tree. You can then traverse the tree in this manner to get the possible combinations summing up to n:
get_combinations(root_node)
{
    combinations=[]
    combine(combinations, root_node.child_left, root_node.child_right)
}
combine(combinations, nodeA, nodeB)
{
    new_combi = "nodeA" + "+nodeB"
    combinations.add(new_combi)
    if nodeA.has_children(): combinations.add( combine(combinations, nodeA.child_left, nodeA.child_right) + "+nodeB" )
    if nodeB.has_children(): combinations.add( "nodeA+" + combine(combinations, nodeB.child_left, nodeB.child_right) )
    return new_combi
}
This is just a draft. Of course you don't have to explicitly generate the tree beforehand; you can build it along the way. Maybe I can come up with a nicer algorithm if I find the time.
EDIT:
OK, I didn't quite answer OPs question to the point, but I don't like to leave stuff unfinished, so here I present my solution as a working python program:
def print_combinations(n):
    # print every generated sum on its own line, e.g. "2+1+2"
    for calc in combine(n):
        print("+".join(str(op) for op in calc))

def combine(n):
    p_comb = []
    if n >= 1:
        p_comb.append([n])
    if n > 1:
        # split n into floor(n/2) and ceil(n/2), then pair up the sums of both halves
        comb_left = combine(n // 2)
        comb_right = combine(n - n // 2)
        for l in comb_left:
            for r in comb_right:
                p_comb.append(l + r)
    return p_comb
You can now generate ways of summing up to n with numbers <= n. For example, if you want to do that for n=5 you call this: print_combinations(5)
Have fun, but be aware that you run into memory issues pretty fast (dynamic programming to the rescue!), that you can get equivalent calculations (e.g. 1+2 and 2+1), and that only the sums reachable by repeated halving are produced (for n=5, 4+1 never appears).
All 3 solutions that I found use mathematical induction:
solution 1:
if n =0 comb =1
if n =1 comb = 1
if n=2 there are 1+1, 2 comb =2 = comb(0)+comb(1)
if n=3 there are 1+1+1, 1+2, 2+1, 3 comb = 4 = comb(0)+comb(1)+comb(2)
if n=4 there are 1+1+1+1, 1+2+1,1+1+2,2+1+1,2+2,1+3,3+1,4 comb = 8 =comb(0)+comb(1)+comb(2)+comb(3)
Now we see a pattern here that says that:
for a value k we have comb(k) = sum(comb(i)) where i is between 0 and k-1
using mathematical induction we can prove it for k+1:
comb(k+1) = sum(comb(i)) where i is between 0 and k
Solution number 2:
If we pay a little more attention to the solution 1 we can say that:
comb(0)=2^0
comb(1)=2^0
comb(2)=2^1
comb(3)=2^2
comb(4)=2^3
comb(k)=2^(k-1) for k >= 1
again using mathematical induction we can prove that
comb(k+1)=2^k
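Solutions 1 and 2 can be checked against each other with a few lines of Python. This is a minimal sketch; the function name count_compositions is made up for illustration, and it simply evaluates the recurrence comb(k) = comb(0) + ... + comb(k-1) from solution 1.

```python
def count_compositions(n):
    """Number of ordered sums of positive integers equal to n (comb(0) = 1)."""
    comb = [1] + [0] * n
    for k in range(1, n + 1):
        # the first term can be any value 1..k; the rest is a sum for what remains
        comb[k] = sum(comb[:k])
    return comb[n]

for k in range(1, 11):
    # solution 2 claims the closed form 2^(k-1) for k >= 1
    assert count_compositions(k) == 2 ** (k - 1)
print(count_compositions(10))
```

For n = 10 from the question this gives 2^9 = 512 ordered sums.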
Solution number 3 (if we keep only the unique combinations) we can see that:
comb(0)=1
comb(1)=1
comb(2)= 1+1, 2, so comb(2)=2
comb(3)= 1+1+1, 1+2, 2+1, 3; we take out 1+2 because we already have 2+1 and it is the same, so comb(3)=3
comb(4) = 1+1+1+1, 1+2+1, 1+1+2, 2+1+1, 2+2, 1+3, 3+1, 4; here we take out 1+2+1, 2+1+1 and 1+3 because we already have them in a different order, so comb(4)=5.
Up to here the values 1, 1, 2, 3, 5 look like the Fibonacci array, comb(k) = comb(k-1) + comb(k-2), but the pattern breaks at the next step. If we continue we can see that:
comb(5) = 7 (5, 4+1, 3+2, 3+1+1, 2+2+1, 2+1+1+1, 1+1+1+1+1)
comb(6) = 11
These are the integer partition numbers. They have no simple closed formula, so they are usually computed with dynamic programming.
Now it's easy to implement those solutions in a language, using recursion for 2 of the solutions or just the non-recursive formula for the solution with 2^k.
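Counting sums where the order does not matter (2+8 and 8+2 count once) is the integer partition problem. A minimal dynamic-programming sketch follows; count_partitions is a hypothetical name, and the table layout is one common choice rather than the only one.

```python
def count_partitions(n):
    """Number of ways to write n as a sum of positive integers, order ignored."""
    ways = [1] + [0] * n          # ways[0] = 1: the empty sum
    for part in range(1, n + 1):  # allow parts of size 1, then 2, ...
        for total in range(part, n + 1):
            # every sum for (total - part) extends to one for total by adding `part`
            ways[total] += ways[total - part]
    return ways[n]

print([count_partitions(k) for k in range(7)])
```

Processing one part size at a time is what removes the reorderings: each sum is generated only with its parts in non-decreasing order of size. The printed counts for n = 0..6 are 1, 1, 2, 3, 5, 7, 11.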
And by the way this has serious connections with graph theory (how many sub-graphs you can build starting from a bigger graph - our number N, and sub-graphs being the ways to count )
Amazing isn't it?

Compare two arrays of points [closed]

I'm trying to find similarities between two arrays of points. I drew circles around points that have similar patterns, and I would like to do some kind of automatic comparison in intervals of, let's say, 100 points and report a coefficient of similarity for each interval. As you can see, the patterns might not be perfectly aligned, so a point-to-point comparison would not be a good solution (I suppose). Patterns that are slightly misaligned could still count as matching (but obviously with a smaller coefficient).
What similarity could mean (1 coefficient is a perfect match, 0 or less - is not a match at all):
Points 640 to 660 - Very similar (coefficient is ~0.8)
Points 670 to 690 - Quite similar (coefficient is ~0.5-~0.6)
Points 720 to 780 - Let's say quite similar (coefficient is ~0.5-~0.6)
Points 790 to 810 - Perfectly similar (coefficient is 1)
Coefficient is just my thoughts of how a final calculated result of comparing function could look like with given data.
I read many posts on SO but it didn't seem to solve my problem. I would appreciate your help a lot. Thank you
P.S. Perfect answer would be the one that provides pseudo code for function which could accept two data arrays as arguments (intervals of data) and return coefficient of similarity.
I also think High Performance Mark has basically given you the answer (cross-correlation). In my opinion, most of the other answers are only giving you half of what you need (i.e., dot product plus compare against some threshold). However, this won't consider a signal to be similar to a shifted version of itself. You'll want to compute this dot product N + M - 1 times, where N, M are the sizes of the arrays. For each iteration, compute the dot product between array 1 and a shifted version of array 2. The amount you shift array 2 increases by one each iteration. You can think of array 2 as a window you are passing over array 1. You'll want to start the loop with the last element of array 2 only overlapping the first element in array 1.
This loop will generate numbers for different amounts of shift, and what you do with that number is up to you. Maybe you compare it (or the absolute value of it) against a threshold that you define to consider two signals "similar".
Lastly, in many contexts, a signal is considered similar to a scaled (in the amplitude sense, not time-scaling) version of itself, so there must be a normalization step prior to computing the cross-correlation. This is usually done by scaling the elements of the array so that the dot product with itself equals 1. Just be careful to ensure this makes sense for your application numerically, i.e., integers don't scale very well to values between 0 and 1 :-)
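The sliding dot product and the normalization step described above could be sketched in plain Python as follows. The function names (normalize, cross_correlation, best_match) are made up for this illustration, and picking the single best shift is only one possible way to turn the scores into a coefficient.

```python
import math

def normalize(x):
    """Scale x so that its dot product with itself is 1 (zero vectors are left alone)."""
    norm = math.sqrt(sum(v * v for v in x))
    return [v / norm for v in x] if norm else list(x)

def cross_correlation(a, b):
    """Dot product of a with b at every shift; returns len(a) + len(b) - 1 values."""
    n, m = len(a), len(b)
    scores = []
    # first shift: the last element of b overlaps only the first element of a
    for shift in range(-(m - 1), n):
        total = 0.0
        for j in range(m):
            i = shift + j
            if 0 <= i < n:
                total += a[i] * b[j]
        scores.append(total)
    return scores

def best_match(a, b):
    """Return (shift, score) of the best alignment of normalized b against normalized a."""
    scores = cross_correlation(normalize(a), normalize(b))
    k = max(range(len(scores)), key=lambda idx: scores[idx])
    return k - (len(b) - 1), scores[k]

print(best_match([0, 0, 1, 2, 1, 0, 0], [3, 6, 3]))
```

In this example the pattern in the second array is a scaled copy of the bump in the first one, so the best score is about 1 at a shift of 2. This direct version costs O(N*M); for long arrays an FFT-based correlation is the usual alternative.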
i think HighPerformanceMarks's suggestion is the standard way of doing the job.
a computationally lightweight alternative measure might be a dot product.
split both arrays into the same predefined index intervals.
consider the array elements in each intervals as vector coordinates in high-dimensional space.
compute the dot product of both vectors.
if all array values are non-negative, the dot product will not be negative. if the two vectors are perpendicular in their vector space, the dot product will be 0 (in fact that's how 'perpendicular' is usually defined in higher dimensions), and once both vectors are scaled to unit length it attains its maximum for identical vectors.
if you accept the geometric notion of perpendicularity as a (dis)similarity measure, here you go.
caveat:
this is an ad hoc heuristic chosen for computational efficiency. i cannot tell you about mathematical/statistical properties of the process and separation properties - if you need rigorous analysis, however, you'll probably fare better with correlation theory anyway and should perhaps forward your question to math.stackexchange.com.
My Attempt:
Total_sum=0
1. For each index i in the range (m,n)
2.     sum=0
3.     k=Array1[i]*Array2[i]; t1=magnitude(Array1[i]); t2=magnitude(Array2[i]);
4.     k=k/(t1*t2)
5.     sum=sum+k
6.     Total_sum=Total_sum+sum
Coefficient=Total_sum/(n-m)
If all values are equal, then sum would be 1 in each case and Total_sum would be (n-m)*(1). Hence, dividing by (n-m) gives 1. If the graphs are exact opposites, we get -1, and for other variations a value between -1 and 1 is returned.
This is not so efficient when the y range or the x range is huge. But, I just wanted to give you an idea.
Another option would be to perform an extensive xnor.
1. For each index i in the range (m,n)
2. sum=1
3. k=Array1[i] xnor Array2[i];
4. k=k/((pow(2,number_of_bits))-1) //This will scale k down to a value between 0 and 1
5. sum=(sum+k)/2
Coefficient=sum
Is this helpful ?
You can define a distance metric for two vectors A and B of length N containing numbers in the interval [-1, 1] e.g. as
sum = 0
for i in 0 to N - 1:
    d = (A[i] - B[i])^2 // this is in range 0 .. 4
    sum = sum + d
sum = (sum / 4) / N // now in range 0 .. 1
This now returns distance 1 for vectors that are completely opposite (one is all 1, another all -1), and 0 for identical vectors.
You can translate this into your coefficient by
coeff = 1 - sum
However, this is a crude approach because it does not take into account the fact that there could be horizontal distortion or shift between the signals you want to compare, so let's look at some approaches for coping with that.
You can sort both your arrays (e.g. in ascending order) and then calculate the distance / coefficient. This returns more similarity than the original metric, and is agnostic towards permutations / shifts of the signal.
You can also calculate the differentials and calculate distance / coefficient for those, and then you can do that sorted also. Using differentials has the benefit that it eliminates vertical shifts. Sorted differentials eliminate horizontal shift but still recognize different shapes better than sorted original data points.
You can then e.g. average the different coefficients. Here is more complete code. The routine below calculates the coefficient for arrays A and B of the given size, and first takes d differentials (recursively). If sorted is true, the final (differentiated) array is sorted.
procedure calc(A, B, size, d, sorted):
    if (d > 0):
        A' = new array[size - 1]
        B' = new array[size - 1]
        for i in 0 to size - 2:
            A'[i] = (A[i + 1] - A[i]) / 2 // keep in range -1..1 by dividing by 2
            B'[i] = (B[i + 1] - B[i]) / 2
        return calc(A', B', size - 1, d - 1, sorted)
    else:
        if (sorted):
            A = sort(A)
            B = sort(B)
        sum = 0
        for i in 0 to size - 1:
            sum = sum + (A[i] - B[i]) * (A[i] - B[i])
        sum = (sum / 4) / size
        return 1 - sum // return the coefficient
procedure similarity(A, B, size):
    a = 0
    a = a + calc(A, B, size, 0, false)
    a = a + calc(A, B, size, 0, true)
    a = a + calc(A, B, size, 1, false)
    a = a + calc(A, B, size, 1, true)
    return a / 4 // take average
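As a rough illustration, the pseudocode above could be transcribed to Python like this. It assumes, as the answer does, that both inputs have the same length and hold values in [-1, 1]; the explicit size parameter is dropped because Python lists know their length, and use_sorted is just a renamed sorted flag.

```python
def calc(a, b, d, use_sorted):
    """Similarity coefficient of a and b after taking d successive differentials."""
    for _ in range(d):
        # halve each difference so the values stay within -1..1
        a = [(a[i + 1] - a[i]) / 2 for i in range(len(a) - 1)]
        b = [(b[i + 1] - b[i]) / 2 for i in range(len(b) - 1)]
    if use_sorted:
        a, b = sorted(a), sorted(b)
    total = sum((x - y) ** 2 for x, y in zip(a, b))
    distance = (total / 4) / len(a)   # in range 0..1
    return 1 - distance

def similarity(a, b):
    """Average of the four coefficients: raw/sorted, with 0 and 1 differentials."""
    coefficients = [calc(a, b, d, s) for d in (0, 1) for s in (False, True)]
    return sum(coefficients) / len(coefficients)

print(similarity([0.0, 0.5, 1.0, 0.5], [0.0, 0.5, 1.0, 0.5]))
```

Identical inputs give a coefficient of 1.0. Note that the inputs need at least d + 1 elements, otherwise the final division has nothing to average over.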
For something completely different, you could also run Fourier transform using FFT and then take a distance metric on the returning spectra.

Problems with dynamic programming

I've got difficulties with understanding dynamic programming, so I decided to solve some problems. I know basic dynamic algorithms like longest common subsequence, knapsack problem, but I know them because I read them, but I can't come up with something on my own :-(
For example, we have a sequence of natural numbers. Every number can be taken with a plus or a minus sign. At the end we take the absolute value of the sum. For every sequence, find the lowest possible result.
in1: 10 3 5 4;
out1: 2
in2: 4 11 5 5 5;
out2: 0
in3: 10 50 60 65 90 100;
out3: 5
explanation for 3rd: 5 = |10+50+60+65-90-100|
What is worse, my friend told me that it is a simple knapsack problem, but I can't see any knapsack here. Is dynamic programming something difficult, or is it only me who has big problems with it?
As has been pointed out by amit, this problem can be understood as an instance of the partition problem. For a simple implementation take a look at this Python code:
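def partition(A):
    n = len(A)
    if n == 0:
        return 0
    k, s = max(A), sum(A) / 2.0
    # table[j] is 1 when some subset of A sums to j; at first only the empty sum 0
    table = [1] + [0] * (n * k)
    for i in range(n):
        # go downwards so each element is used at most once;
        # stop at A[i] so that j - A[i] never becomes a negative index
        for j in range(n * k, A[i] - 1, -1):
            if table[j - A[i]] > table[j]:
                table[j] = 1
    # find the reachable sum closest to (but not above) half of the total
    minVal = float('+inf')
    for j in range(int(s) + 1):
        if table[j] and s - j < minVal:
            minVal = s - j
    return int(2 * minVal)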
When called with one of the inputs in the question:
partition([10, 50, 60, 65, 90, 100])
It will return 5, as expected. To fully understand the math behind the solution, please take a look at these examples and click the "Balanced Partition" link.
The knapsack in here is weight = value = number for each element.
your bound W is 1/2 * sum(elements).
The idea is - you want to maximize the amount of numbers you "pick" without passing the limit of 1/2 * sum(elements), which is exactly knapsack with value=weight.
This problem is actually the partition problem, which is a special case of the subset sum problem.
The partition problem says: "Is it possible to get a subset of the elements that sums exactly to half?"
The derivation to your problem from here is simple - if there is, take these as +, and those you didn't take as -, and you get out = 0. [the other way around works the same]. Thus, your described problem is the optimization for partition problem.
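The knapsack view can also be written with a set of reachable sums, which may be easier to follow than a table. This is only a sketch under the assumption that all inputs are non-negative integers; min_abs_signed_sum is a hypothetical name.

```python
def min_abs_signed_sum(numbers):
    """Smallest |sum| obtainable by giving each number a + or - sign."""
    total = sum(numbers)
    limit = total // 2            # the knapsack capacity W = 1/2 * sum(elements)
    reachable = {0}               # subset sums that fit into the knapsack
    for x in numbers:
        reachable |= {s + x for s in reachable if s + x <= limit}
    best = max(reachable)         # elements taken with "+"; the rest get "-"
    return total - 2 * best

print(min_abs_signed_sum([10, 3, 5, 4]))
print(min_abs_signed_sum([4, 11, 5, 5, 5]))
print(min_abs_signed_sum([10, 50, 60, 65, 90, 100]))
```

The three calls use the inputs from the question and print 2, 0 and 5. The running time depends on the size of the sum, which is why the approach needs an upper limit on the numbers.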
This is the same problem as in Tug Of War, without the constraint of balanced team sizes (which is not relevant):
http://acm.uva.es/p/v100/10032.html
I had solved this problem with a top-down approach. It works on the constraint that there is an upper limit to the numbers given. Do you have an upper limit or are the numbers unconstrained? If they are unconstrained I don't see how to solve this with dynamic programming.

Find the formula of this binary recurrence equation? f(m,n) = f(m-1,n) + f(m,n-1)

SORRY GUYS! MY MISTAKE! Thanks for your reminder, I found out f(0,k) == f(k,0) == 1. This question is about how to count the number of shortest paths from grid (0,0) to (m,n).
I have to solve the following equation now, find out exactly what f(m,n) equal to.
1) f(m,n) = 0 : when (m,n) = (0,0)
2) f(m,n) = 1 : when (m,n) = (0,k) or (k,0), with k > 0
3) f(m,n) = f(m-1,n) + f(m,n-1) : when else
for example:
1) f(0,0) = 0;
2) f(0,1) = 1; f(2,0) = 1;
3) f(2,1) = f(1,1) + f(2,0) = f(0, 1) + f(1, 0) + f(2, 0) = 1 + 1 + 1 = 3
I remember there is a standard way to solve such kinds of binary recurrence equation as I learned in my algorithm class several years ago, but I just cannot remember for now.
Could anyone give any hint? Or a keyword how to find the answer?
Ugh, I was just having fun going through my old textbooks on generating functions, and you went and changed the question again!
This question is about how to count the number of shortest path from grid (0,0) to (m,n).
This is a basic combinatorics question - it doesn't require knowing anything about generating functions, or even recurrence relations.
To solve, imagine the paths being written out as a sequence of U's (for "up") and R's (for "right"). If we are moving from (0,0) to, say, (5, 8), there must be 5 R's and 8 U's. Just one example:
RRUURURUUURUU
There will always be, in this example, 8 U's and 5 R's; different paths will just have them in different orders. So we can just choose 8 positions for our U's, and the rest must be R's. Thus, the answer is
(8+5) choose (8)
Or, in general,
(m+n) choose (m)
This is simply the binomial coefficient
f(m,n) = (m+n choose m) = (m+n choose n)
You can prove this by noting that they satisfy the same recurrence relation.
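A quick numerical check of that claim, as a sketch: evaluate the recurrence directly and compare it with the binomial coefficient. It assumes Python 3.8+ for math.comb. The special case f(0,0) = 0 from the question is left out because the recurrence never reaches (0,0) when m and n are both at least 1.

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def f(m, n):
    """Number of shortest grid paths from (0,0) to (m,n), via the recurrence."""
    if m == 0 or n == 0:
        return 1                  # only one straight path along an edge
    return f(m - 1, n) + f(m, n - 1)

for m in range(1, 10):
    for n in range(1, 10):
        assert f(m, n) == comb(m + n, m)

print(f(2, 1), f(5, 8))
```

This prints 3 for the question's f(2,1) and 1287 = (13 choose 5) for the (5, 8) example.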
To derive the formula (if you couldn't just guess and then check), use generating functions as Chris Nash correctly suggests.
Try looking up "generating functions" in the literature. One approach would be to imagine a function P(x,y) where the coefficient of x^m y^n is f(m,n). The recurrence line (line 3) tells you that P(x,y) - x.P(x,y) - y.P(x,y) = (1-x-y) P(x,y) should be pretty simple except for those pesky edge values. Then solve for P(x,y).
Are you sure f(k,0) = f(0,k) = k, and not 1, maybe? If it were, I'd say the best bet would be to write some values out, guess what they are, then prove it.

HMM for solving given coin output

I have got this assignment question on HMM and I have solved it. I would like to know if I am correct. The problem is:
Suppose a dishonest dealer has two coins, one fair and one biased; the biased coin
has heads probability 1/4. Assume that the dealer never switches the coins. Which
coin is more likely to have generated the sequence HTTTHHHTTTTHTHHTT? It may
be useful to know that log2(3) = 1.585
I calculated the P for fair coin and biased coin.
The P for the fair coin is 7.6*10^-6 whereas the P for the biased coin is 3.43*10^-6. I didn't use the log term, which can be used if I solve it the other way. So, I concluded that it is more likely that the given sequence was generated by the fair coin.
Am I right?
Any help is greatly appreciated.
So you are given the following.
P(H|Fake) = 1/4 P(T|Fake) = 3/4
P(H|Fair) = 1/2 P(T|Fair) = 1/2
P(Fair) = 1/2 P(Fake) = 1/2
To answer the question you need to compute P(Fake|HTTTHHHTTTTHTHHTT) and P(Fair|HTTTHHHTTTTHTHHTT), for which you need to apply Bayes' theorem:
Let X be HTTTHHHTTTTHTHHTT
P(Fake|X) = (P(X|Fake) * P(Fake)) / P(X)
P(Fair|X) = (P(X|Fair) * P(Fair)) / P(X)
Where
P(X) = P(X|Fake) * P(Fake) + P(X|Fair) * P(Fair)
P(X) = (3.43710e-6 * 0.5) + (7.629e-6 * 0.5) = 5.533e-6
And therefore
P(Fake|X) = (3.43710e-6 * 0.5) / 5.533e-6 = 0.3106
P(Fair|X) = (7.629e-6 * 0.5) / 5.533e-6 = 0.6894
Therefore, it is more likely that the coin used is the FAIR one. Even though intuitively one might think that the selected coin is the Fake one, this is not the case. The observed distribution is closer to 0.5 tails / 0.5 heads than to 0.25 heads / 0.75 tails. For example, the share of tails, 10/17, is about 0.59, which is closer to P(T|Fair)=.5 than to P(T|Fake)=.75
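The numbers above can be reproduced with a short script. This is a minimal sketch; the function names are made up, and the equal priors of 1/2 are the same assumption the answer makes. The log2 variant shows how the hint log2(3) = 1.585 can be used instead of multiplying tiny probabilities.

```python
from math import log2

SEQ = "HTTTHHHTTTTHTHHTT"

def likelihood(seq, p_heads):
    """P(seq | coin) for independent tosses with the given heads probability."""
    result = 1.0
    for toss in seq:
        result *= p_heads if toss == "H" else 1.0 - p_heads
    return result

def posterior_fair(seq, p_biased_heads=0.25, prior_fair=0.5):
    """P(Fair | seq) by Bayes' theorem."""
    fair = likelihood(seq, 0.5) * prior_fair
    biased = likelihood(seq, p_biased_heads) * (1.0 - prior_fair)
    return fair / (fair + biased)

def log2_likelihoods(seq):
    """(log2 P(seq|Fair), log2 P(seq|Biased)) for the 1/4-heads biased coin."""
    heads = seq.count("H")
    tails = len(seq) - heads
    log_fair = -len(seq)                              # each toss contributes log2(1/2)
    log_biased = heads * log2(0.25) + tails * log2(0.75)
    return log_fair, log_biased

print(posterior_fair(SEQ))
print(log2_likelihoods(SEQ))
```

With 7 heads and 10 tails the log-likelihoods are -17 for the fair coin and 10*log2(3) - 34, about -18.15, for the biased one, so the fair coin wins, and the posterior comes out at roughly 0.69.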
HMM is a bit of an overkill for this example. The number of heads is binomially distributed, with p = 0.5 for the fair coin and p = 0.25 for the other one. For both of them, the number of trials is n = 17 (if my counting is correct). From the 17 samples you got 7 successes (7 heads). Using Wolfram Alpha, the probability of the fair coin generating this sample is approx 0.15, as opposed to approx 0.07 for the unfair coin. Note I did not bother calculating the exact numbers, just looked at the plots. The formula is there for you to work with if you want to.
EDIT
If you absolutely must use a HMM, set the set of hidden states to be {fair; unfair} . The transition probabilities are: from a hidden state "fair" to a hidden state "fair"= 1, from a fair to unfair 0, etc, because the dealer is not allowed to change coins halfway through the trial. The emission probability from a hidden state "fair" are 0.5 for observable state "heads" and 0.5 for observable state "tails" (0.25 and 0.75 from "unfair"). You can assume at time t=0 hidden state "fair" and "unfair" are equally likely.
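For completeness, here is what that HMM looks like when run through the forward algorithm. It is a sketch with hypothetical names (forward, START, TRANS, EMIT); because the transition matrix is the identity, it reduces to the plain Bayes computation.

```python
SEQ = "HTTTHHHTTTTHTHHTT"

START = {"fair": 0.5, "unfair": 0.5}
TRANS = {"fair": {"fair": 1.0, "unfair": 0.0},      # the dealer never switches coins
         "unfair": {"fair": 0.0, "unfair": 1.0}}
EMIT = {"fair": {"H": 0.5, "T": 0.5},
        "unfair": {"H": 0.25, "T": 0.75}}

def forward(observations, start, trans, emit):
    """Forward algorithm: returns alpha[s] = P(observations, final state = s)."""
    states = list(start)
    alpha = {s: start[s] * emit[s][observations[0]] for s in states}
    for obs in observations[1:]:
        alpha = {s: emit[s][obs] * sum(alpha[r] * trans[r][s] for r in states)
                 for s in states}
    return alpha

alpha = forward(SEQ, START, TRANS, EMIT)
total = sum(alpha.values())                          # P(observations)
print({state: value / total for state, value in alpha.items()})
```

Normalizing the final alpha values gives the probability of each hidden state given the whole sequence, about 0.69 for "fair" and 0.31 for "unfair".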

Resources