Probability of state representation

I'm working my way through W. Feller's An Introduction to Probability Theory and Its Applications, Volume 1. An example in the chapter on combinatorial analysis asks the question:
"Each of the 50 states has 2 senators. If we choose 50 senators at random, what is the probability a given state is represented?"
I understand the given answer, which uses the complement of the event, but I was curious whether the method where you force the desired outcome to occur and then count how many ways the remaining cells can be chosen would work here too.
AJ

Let s1 and s2 be the two senators of the state.
P(state is represented) = P(s1 or s2 is chosen by chance).
Let us count the respective numbers of favorable cases (writing C(n,k) = n!/(k!(n-k)!)):
s1 and s2 both chosen: choose the remaining 48 from the other 98 senators, C(98,48) ways
s1 chosen without s2: choose the remaining 49 from the other 98 senators, C(98,49) ways
s2 chosen without s1: the same, C(98,49) ways
neither chosen: choose all 50 from the other 98 senators, C(98,50) ways
That is,
P(state is represented) = (C(98,48) + 2*C(98,49)) / (C(98,48) + 2*C(98,49) + C(98,50))
where the denominator equals C(100,50), the total number of ways to choose the 50 senators, by Pascal's rule.
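As a quick numerical check (a sketch using Python's standard library, where `math.comb` is the binomial coefficient), the direct count above agrees with the complement method from the book:

```python
from math import comb
from fractions import Fraction

total = comb(100, 50)                      # all ways to choose 50 of the 100 senators
direct = comb(98, 48) + 2 * comb(98, 49)   # both chosen, or exactly one of s1, s2
p_direct = Fraction(direct, total)
p_complement = 1 - Fraction(comb(98, 50), total)  # 1 - P(neither s1 nor s2 chosen)

assert p_direct == p_complement            # both equal 149/198, about 0.7525
```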

Prove XOR doesn't work for finding a missing number (interview question)?

Interview question: you're given a file of roughly one billion unique numbers, each of which is a 32-bit quantity. Find a number not in the file.
When I was approaching this question, I tried a few examples with 3-bit and 4-bit numbers. For the examples I tried, I found that when I XOR'd the set of numbers, I got a correct answer:
a = [0,1,2] # missing 3
b = [1,2,3] # missing 0
c = [0,1,2,3,4,5,6] # missing 7
d = [0,1,2,3,5,6,7] # missing 4
import functools

functools.reduce((lambda x, y: x ^ y), a)  # returns 3
functools.reduce((lambda x, y: x ^ y), b)  # returns 0
functools.reduce((lambda x, y: x ^ y), c)  # returns 7
functools.reduce((lambda x, y: x ^ y), d)  # returns 4
However, when I coded this up and submitted it, it failed the test cases.
My question is: in an interview setting, how can I confirm or rule out with certainty that an approach like this is not a viable solution?
In all your examples, the array contains every number of the full range except exactly one. That's why XOR worked. Be careful not to test only examples that all share the same special property.
For the problem itself, you can construct a number by taking the minority of each bit.
EDIT
Why XOR worked on your examples:
When you XOR all the numbers from 0 to 2^n - 1, the result is 0 (each bit position contains exactly 2^(n-1) ones). So if you take one number out and XOR all the rest, the result is the number you took out, because XORing it with the XOR of all the rest must give 0.
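A quick sketch of that property (the bit width n and the removed value are hypothetical choices for illustration):

```python
from functools import reduce

n = 4
# XOR of every number 0 .. 2**n - 1 is 0: each bit column holds 2**(n-1) ones
assert reduce(lambda x, y: x ^ y, range(2**n)) == 0

# remove one number; the XOR of all the rest is exactly the removed number
removed = 11
rest = reduce(lambda x, y: x ^ y, (k for k in range(2**n) if k != removed))
assert rest == removed
```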
Assuming a 64-bit system with more than 4 GB of free memory, I would read the numbers into an array of 32-bit integers. Then I would loop over the numbers up to 32 times.
Similarly to an inverse "Mastermind" game, I would construct a missing number bit by bit. In every loop, I count the numbers that match the bits I have chosen so far followed by a 0, and those followed by a 1. Then I append the bit that occurs less frequently. Once the count reaches zero, I have a missing number.
Example:
The numbers in decimal/binary are
1 = 01
2 = 10
3 = 11
There is one number with most-significant-bit 0 and two numbers with 1. Therefore, I take 0 as most significant bit.
In the next round, I have to match 00 and 01. This immediately leads to 00 as missing number.
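A minimal Python sketch of this bit-by-bit minority construction (the function name and the small bit width are illustrative; the actual problem would use bits=32):

```python
def find_missing(nums, bits=32):
    # Build a missing number from the most significant bit down.
    # At each step, keep only the numbers whose prefix matches the one
    # chosen so far, and extend the prefix with the minority bit.
    candidates = nums
    result = 0
    for shift in range(bits - 1, -1, -1):
        ones = [x for x in candidates if (x >> shift) & 1]
        zeros = [x for x in candidates if not (x >> shift) & 1]
        if len(ones) < len(zeros):
            result |= 1 << shift
            candidates = ones
        else:
            candidates = zeros
        if not candidates:
            break  # no number shares this prefix, so result is missing
    return result

print(find_missing([1, 2, 3], bits=2))  # 0, matching the worked example above
```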
Another approach would be to use a random number generator: since only about one billion of the 2^32 possible values are present, a random first guess is a non-existing number with probability about 1 - 10^9/2^32, roughly 77%.
Proof by counterexample: 3^4^5^6 = 4, which is a member of the set, so XOR does not produce a missing number.

0-1 Knapsack with penalty for under and overweight cases

Assume a classic 0-1 knapsack problem but you are allowed to overflow/underflow the sack with some penalty. X profit is deducted for every unit overflow (weight above max capacity) and Y profit is deducted for every unit underflow (weight below max capacity).
I thought of sorting all items by their profit-to-weight ratio and then filling the sack as in a normal knapsack problem; then, for the remaining weight and items, I calculate the extra profit taking the underflow and overflow into consideration.
This solution fails in some cases, e.g. when there are 3 items with weights 30, 20, 10 and profits 30, 25, 20 respectively, the max weight allowed is 39, the underflow penalty is 5 and the overflow penalty is 10 (per unit).
My solution treats it like a normal knapsack and then considers the penalties, so it selects the items of weights 20 and 10, but it does not add the item of weight 30 because its penalty is higher than its profit. The optimal solution is to select the items of weights 30 and 10. The only thing I can think of now is brute force, which should be avoided if possible. If anyone can think of another solution, that'd be great!
You can break it into two subproblems, one with an underweight penalty and one with an overweight penalty. More specifically, you can solve the problem by solving two different integer linear programming problems, and taking the best of the two solutions:
Say that you have n items of weights w1, w2, ..., wn and values v1, v2, ..., vn. Say that the weight capacity is C, the per-unit penalty for underweight is A, and the per-unit penalty for overweight is B.
In both problems, let the binary decision variables be x1, ..., xn, indicating whether or not the corresponding item is selected.
Problem 1)
max v1*x1 + v2*x2 + ... + vn*xn - A*(C - w1*x1 - w2*x2 - ... - wn*xn)
subject to
w1*x1 + w2*x2 + ... + wn*xn <= C
Note that via algebra the objective function is the same as the affine expression
(v1 + A*w1)*x1 + ... + (vn + A*wn)*xn - A*C
and is maximized at the same values x1, ..., xn which maximize the purely linear function
(v1 + A*w1)*x1 + ... + (vn + A*wn)*xn
This subproblem can be solved using any ILP solver, or just as an ordinary knapsack problem.
Problem 2)
max v1*x1 + v2*x2 + ... + vn*xn - B*(w1*x1 + w2*x2 + ... + wn*xn - C)
subject to
w1*x1 + w2*x2 + ... + wn*xn >= C
which can be solved by maximizing the linear objective function
(v1 - B*w1)*x1 + ... + (vn - B*wn)*xn
Again, that can be solved with any ILP solver. This problem isn't a knapsack problem since the inequality in the main constraint points in the wrong direction, though there might be some way of reducing it to a knapsack problem.
On Edit. The second problem can also be solved as a knapsack problem -- one in which you decide which items to exclude. Start with the solution in which you include everything. If this isn't feasible for problem 2 (i.e., the sum of all the weights doesn't exceed the capacity), then you are done: the solution of problem 1 is the global solution. Otherwise, define the surplus, S, to be
S = w1 + w2 + ... + wn - C
Now, solve the following knapsack problem:
weights: w1, w2, ..., wn //same as before
values: Bw1 - v1, Bw2 - v2, ..., Bwn - vn
capacity: S
A word on the values: Bwi - vi is a measure of how much removing the ith object helps (under the assumption that removing it keeps you above the original capacity so that you don't need to consider the underweight penalties). On the one hand, it removes part of the penalty, Bwi, but on the other hand it takes some value away, vi.
After you solve this knapsack problem -- remove these items. The remaining items are the solution for problem 2.
Let's see how this plays out for your toy problem:
weights: 30, 20, 10
values: 20, 25, 20
C: 39
A: 5 //per-unit underflow penalty
B: 10 //per-unit overflow penalty
For problem 1, solve the following knapsack problem:
weights: 30, 20, 10
values: 170, 125, 70 // = 20 + 5*30, 25 + 5*20, 20 + 5*10
C: 39
This has the solution: include the items of weights 20 and 10, with value 195. In terms of the original problem this has value 195 - 5*39 = 0. That seems a bit weird, but it checks out: the value of using the last two items is 25 + 20 = 45, but it leaves you 9 units under, with a penalty of 5*9 = 45, and 45 - 45 = 0.
Second problem:
weights: 30, 20, 10
values: 280, 175, 80 // = 10*30 - 20, 10*20 - 25, 10*10 - 20
S: 21 // = 30 + 20 + 10 - 39
The solution of this problem is to select the weight-20 item (the weight-30 item doesn't fit in S, and no pair of items does either). This means the weight-20 item is selected for non-inclusion, i.e., for the second problem I include the objects of weights 30 and 10.
The value of doing so is (in terms of the original problem)
20 + 20 - 10*1 = 30
Since 30 > 0 (the value of solution 1), this is the overall optimal solution.
To sum up: you can solve your version of the knapsack problem by solving two ordinary knapsack problems to find two candidate solutions and then taking the better of the two. If you already have a function to solve knapsack problems, it shouldn't be too hard to write another function which calls it twice, interprets the outputs, and returns the best solution.
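A small end-to-end sketch of this two-subproblem reduction (the brute-force knapsack helper and the function names are mine, just to keep the example self-contained; a real implementation would plug in a proper DP knapsack solver):

```python
from itertools import combinations

def knapsack(weights, values, cap):
    """Tiny exact 0-1 knapsack (fine for toy instances): (best value, chosen indices)."""
    best, chosen = 0, ()
    for r in range(len(weights) + 1):
        for idx in combinations(range(len(weights)), r):
            w = sum(weights[i] for i in idx)
            v = sum(values[i] for i in idx)
            if w <= cap and v > best:
                best, chosen = v, idx
    return best, chosen

def solve_with_penalties(weights, values, C, A, B):
    n = len(weights)
    # Problem 1: stay at or under C, with each value boosted by A per unit of weight.
    v1, _ = knapsack(weights, [values[i] + A * weights[i] for i in range(n)], C)
    cand1 = v1 - A * C
    # Problem 2: start from "take everything", then choose items to REMOVE,
    # where removing item i is worth B*w_i - v_i and the removals must fit in S.
    S = sum(weights) - C
    if S < 0:
        return cand1  # taking everything still underflows; problem 2 is infeasible
    _, removed = knapsack(weights, [B * weights[i] - values[i] for i in range(n)], S)
    kept = [i for i in range(n) if i not in removed]
    w_kept = sum(weights[i] for i in kept)
    cand2 = sum(values[i] for i in kept) - B * (w_kept - C)
    return max(cand1, cand2)

# the toy instance from this answer:
print(solve_with_penalties([30, 20, 10], [20, 25, 20], C=39, A=5, B=10))  # 30
```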
You can still use standard dynamic programming.
Let's compute, for every s from 0 to the sum of all the elements, whether the sum s is reachable and, if so, the best value achievable at that exact weight. That's exactly what a standard dynamic programming solution does; we don't care about the penalty at this stage.
Then iterate over all reachable sums and choose the best one, taking the penalty for overflow (or underflow) into account.
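A compact sketch of that idea (names are mine; dp[s] holds the best total value among subsets of total weight exactly s):

```python
def best_profit(weights, values, capacity, under_pen, over_pen):
    NEG = float("-inf")
    total = sum(weights)
    dp = [NEG] * (total + 1)  # dp[s]: best value with total weight exactly s
    dp[0] = 0
    for w, v in zip(weights, values):
        for s in range(total, w - 1, -1):  # iterate backwards: 0-1 knapsack
            if dp[s - w] != NEG:
                dp[s] = max(dp[s], dp[s - w] + v)
    # apply the under/overflow penalty to every reachable weight
    best = NEG
    for s, val in enumerate(dp):
        if val == NEG:
            continue
        penalty = under_pen * (capacity - s) if s <= capacity else over_pen * (s - capacity)
        best = max(best, val - penalty)
    return best

print(best_profit([30, 20, 10], [20, 25, 20], 39, under_pen=5, over_pen=10))  # 30
```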

Can an ANN of 2 neurons solve XOR?

I know that an artificial neural network (ANN) of 3 neurons in 2 layers can solve XOR
Input1----Neuron1\
      \  /        \
       \/          +------->Neuron3
       /\         /
Input2----Neuron2/
But to minify this ANN, can just 2 neurons (Neuron1 takes 2 inputs, Neuron2 take only 1 input) solve XOR?
Input1
      \
       Neuron1------->Neuron2
      /
Input2
The artificial neuron receives one or more inputs...
https://en.wikipedia.org/wiki/Artificial_neuron
Bias input '1' is assumed to be always there in both diagrams.
Side notes:
A single neuron can solve XOR, but only with an additional input such as x1*x2 or x1+x2:
https://www.quora.com/Why-cant-the-XOR-problem-be-solved-by-a-one-layer-perceptron/answer/Razvan-Popovici/log
Could the ANN in the second diagram solve XOR with a similar additional input to Neuron1 or Neuron2?
No that's not possible, unless (maybe) you start using some rather strange, unusual activation functions.
Let's first ignore neuron 2 and pretend that neuron 1 is the output node. Let x0 denote the bias input (always x0 = 1), let x1 and x2 denote the input values of an example, let y denote the desired output, and let w0, w1, w2 denote the corresponding weights into neuron 1. With the XOR problem, we have the following four examples:
x0 = 1, x1 = 0, x2 = 0, y = 0
x0 = 1, x1 = 1, x2 = 0, y = 1
x0 = 1, x1 = 0, x2 = 1, y = 1
x0 = 1, x1 = 1, x2 = 1, y = 0
Let f(.) denote the activation function of neuron 1. Then, assuming we can somehow train our weights to solve the XOR problem, we have the following four equations:
f(w0 + x1*w1 + x2*w2) = f(w0) = 0
f(w0 + x1*w1 + x2*w2) = f(w0 + w1) = 1
f(w0 + x1*w1 + x2*w2) = f(w0 + w2) = 1
f(w0 + x1*w1 + x2*w2) = f(w0 + w1 + w2) = 0
Now, the main problem is that the activation functions typically used (ReLU, sigmoid, tanh, the identity function... maybe others) are nondecreasing. That means that if you give one a larger input, you also get a larger (or equal) output: f(a + b) >= f(a) if b >= 0. If you look at the above four equations, you'll see this is a problem. Comparing the second and third equations to the first tells us that w1 and w2 need to be positive, because they need to increase the output relative to f(w0). But then the fourth equation won't work out, because it will give an even greater output instead of 0.
I think (but didn't actually try to verify, maybe I'm missing something) that it would be possible if you use an activation function that first goes up and then comes back down. Think of something like f(x) = -(x^2) with an extra term to shift it away from the origin. I don't think such activation functions are commonly used in neural networks. I suspect they behave less nicely during training, and they are not plausible from a biological point of view (remember that neural networks are at least inspired by biology).
Now, in your question you also added an extra link from neuron 1 to neuron 2, which I ignored in the discussion above. The problem here is still the same, though: the activation level of neuron 1 in the fourth case is always going to be at least as high as in the second and third cases. Neuron 2 would typically again have a nondecreasing activation function, so it would not be able to change this (unless you put a negative weight between hidden neuron 1 and output neuron 2, in which case you flip the problem around and will predict too high a value for the first case).
EDIT: Note that this is related to Aaron's answer, which is essentially also about the problem of nondecreasing activation functions, just using more formal language. Give him an upvote too!
It's not possible.
Firstly, you need as many inputs as XOR has: the smallest ANN capable of modelling any binary operation must take two inputs. The second diagram only shows one input and one output.
Secondly, and this is probably the most direct refutation, the XOR function's output is not an additive or multiplicative relationship, though it can be modelled using a combination of them. A neuron is generally modelled using functions like sigmoids or lines, which have no stationary points, so one layer of neurons can roughly approximate only an additive or multiplicative relationship.
What this means is that a minimum of two layers of processing is required to produce a XOR operation.
This question brings up an interesting topic of ANNs. They are well-suited to identifying fuzzy relationships, but tend to require at least as much network complexity as any mathematical process which would solve the problem with no fuzzy margin for error. Use ANNs where you need to identify something which looks mostly like what you are identifying, and use math where you need to know precisely whether something matches a set of concrete traits.
Understanding the distinction between ANN and mathematics opens up the possibility of combining the two in more powerful calculation pipelines, such as identifying possible circles in an image using ANN, using mathematics to pin down their precise origins, and using a second ANN to compare those origins to the configurations on known objects.
It is absolutely possible to solve the XOR problem with only two neurons.
Take a look at the model below (figure not reproduced here).
This model solves the problem easily.
The first (hidden) neuron implements logical AND and the other logical OR. The value of +1.5 for the threshold of the hidden neuron ensures that it will be turned on only when both input units are on. The value of +0.5 for the output neuron ensures that it will turn on only when it receives a net positive input greater than +0.5. The weight of -2 from the hidden neuron to the output one ensures that the output neuron will not come on when both input neurons are on (ref. 2).
ref. 1: Hazem M El-Bakry, Modular neural networks for solving high complexity problems (link)
ref. 2: D. E. Rumelhart, G. E. Hinton, and R. J. Williams, Learning representation by error backpropagation, Parallel distributed processing: Explorations in the Microstructures of Cognition, Vol. 1, Cambridge, MA: MIT Press, pp. 318-362, 1986.
Of course it is possible. But before solving the XOR problem with two neurons, I want to discuss linear separability. A problem is linearly separable if a single hyperplane can serve as the decision boundary. (A hyperplane is just a plane drawn to separate the classes. For an N-dimensional problem, i.e., a problem having N features as inputs, the hyperplane will be an (N-1)-dimensional plane.) So for the 2-input XOR problem the hyperplane will be one-dimensional, that is, a line.
Now coming to the question: XOR is not linearly separable, hence we cannot directly solve the XOR problem with two neurons. The following images show that no matter how we draw a line in 2D space, we cannot separate one side's outputs from the other's. For example, both the inputs (0,1) and (1,0) make XOR give 1, but for the input (1,1) the output is 0, and unfortunately it falls on the same side, so we cannot separate it.
So here we have two options to solve it:
Using hidden layer. But it will increase the number of neurons more than two.
Another option is to increase the dimensions.
Let's look at an illustration of how increasing the dimensionality can solve this problem while keeping the number of neurons at 2.
As an analogy, we can think of XOR as a subtraction of AND from OR, like below:
If you look at the upper figure, the first neuron mimics a logical AND after passing v = (-1.5) + (x1*1) + (x2*1) to some activation function, and its output is taken to be 0 or 1 depending on whether v is negative or positive, respectively (I am not going into the details; I hope you get the point). In the same way, the next neuron mimics a logical OR.
So for the first three cases of the truth table the AND neuron stays turned off. But for the last one (exactly where OR differs from XOR), the AND neuron turns on and feeds a large negative value into the OR neuron's summation, which is big enough to make the total negative. So finally the activation function of the second neuron interprets it as 0.
In this way we can make XOR with 2 neurons.
The following two figures, which I have collected, are also solutions to your question:
The problem can be split into two parts.
Part one
a b c
-------
0 0 0
0 1 1
1 0 0
1 1 0
Part two
a b d
-------
0 0 0
0 1 0
1 0 1
1 1 0
Part one can be solved with one neuron.
Part two can also be solved with one neuron.
Part one and part two added together make XOR.
c = sigmoid(a * 6.0178 + b * -6.6000 + -2.9996)
d = sigmoid(a * -6.5906 + b * 5.9016 + -3.1123)
----------------------------------------------------------
sigmoid(0 * 6.0178 + 0 * -6.6000 + -2.9996) + sigmoid(0 * -6.5906 + 0 * 5.9016 + -3.1123) = 0.0900
sigmoid(1 * 6.0178 + 0 * -6.6000 + -2.9996) + sigmoid(1 * -6.5906 + 0 * 5.9016 + -3.1123) = 0.9534
sigmoid(0 * 6.0178 + 1 * -6.6000 + -2.9996) + sigmoid(0 * -6.5906 + 1 * 5.9016 + -3.1123) = 0.9422
sigmoid(1 * 6.0178 + 1 * -6.6000 + -2.9996) + sigmoid(1 * -6.5906 + 1 * 5.9016 + -3.1123) = 0.0489
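These numbers can be checked directly (a small sketch; sums above 0.5 read as 1, and below 0.5 as 0):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def xor_net(a, b):
    # the two-neuron network with the weights quoted above
    c = sigmoid(a * 6.0178 + b * -6.6000 + -2.9996)
    d = sigmoid(a * -6.5906 + b * 5.9016 + -3.1123)
    return c + d

for a, b in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    print(a, b, round(xor_net(a, b), 4))
```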

Minimum count of numbers to be inserted in [a,b] such that GCD of 2 consecutive numbers is 1

This question was asked in TopCoder - SRM 577. Given 1 <= a < b <= 1000000, what is the minimum count of numbers to be inserted between a and b such that no two consecutive numbers share a positive divisor greater than 1?
Example:
a = 2184; b = 2200. We need to insert 2 numbers, 2195 and 2199, so that the condition holds. (2184, 2195, 2199, 2200)
a = 7; b = 42. One number is sufficient to insert between them. The number can be 11.
a = 17; b = 42. The GCD is already 1, so no number needs to be inserted.
Now, the interesting part is that for the given range [1, 1000000] we never require more than 2 numbers to be inserted between a and b. Even more, the 2 numbers are speculated to be a+1 and b-1, though this is yet to be proven.
Can anyone prove this?
Can it be extended to larger range of numbers also? Say, [1,10^18] etc
Doh, sorry. The counterexample I have is
a=3199611856032532876288673657174760
b=3199611856032532876288673657174860
Each number has some factorization. If a and b each have a small number of distinct prime factors (DPFs), and the distance between them is large, it is certain there will be at least one number between them whose set of DPFs has no elements in common with either. That will be our one-number pick n, such that gcd(a,n) == 1 and gcd(n,b) == 1. The higher we go, the more prime factors there are available, so the probability that even gcd(a,b) == 1 gets higher and higher, and likewise for the one-number-in-between solution.
When will a one-number solution not be possible? When a and b are highly composite - each has a lot of DPFs - and are situated not too far from each other, so that every intermediate number has some prime factor in common with one or both of them. But gcd(n, n+1) == 1 for any n, always; so picking one of a+1 or b-1 - specifically the one with the smaller number of DPFs - shrinks the combined DPF set, and then picking one number between them becomes possible. (... this is far from rigorous though.)
This is not a full answer, more like an illustration. Let's try this.
-- find the numbers between a and b that fulfill the condition
-- (assumes "import Data.List (union, intersect)" and a prime
--  factorization function named "factorize")
gg a b = let fs = union (fc a) (fc b)
         in  filter (\n -> null $ intersect fs (fc n)) [a..b]
  where fc = factorize
Try it:
Main> gg 5 43
[6,7,8,9,11,12,13,14,16,17,18,19,21,22,23,24,26,27,28,29,31,32,33,34,36,37,38,39
,41,42]
Main> gg 2184 2300
[2189,2201,2203,2207,2209,2213,2221,2227,2237,2239,2243,2251,2257,2263,2267,2269
,2273,2279,2281,2287,2291,2293,2297,2299]
Plenty of possibilities for just one number to pick between 5 and 43, or between 2184 and 2300. But what about the given pair, 2184 and 2200?
Main> gg 2184 2200
[]
No one number exists to put in between them. But obviously, gcd (n,n+1) === 1:
Main> gg 2185 2200
[2187,2191,2193,2197,2199]
Main> gg 2184 2199
[2185,2189,2195]
So having picked one adjacent number, we indeed have plenty of possibilities for the 2nd number. Your question is to prove that this is always the case.
Let's look at their factorizations:
Main> mapM_ (print.(id&&&factorize)) [2184..2200]
(2184,[2,2,2,3,7,13])
(2185,[5,19,23])
(2186,[2,1093])
(2187,[3,3,3,3,3,3,3])
(2188,[2,2,547])
(2189,[11,199])
(2190,[2,3,5,73])
(2191,[7,313])
(2192,[2,2,2,2,137])
(2193,[3,17,43])
(2194,[2,1097])
(2195,[5,439])
(2196,[2,2,3,3,61])
(2197,[13,13,13])
(2198,[2,7,157])
(2199,[3,733])
(2200,[2,2,2,5,5,11])
It is obvious that the higher the range, the easier it is to satisfy the condition, because the variety of contributing prime factors is greater.
(a+1) won't always work by itself - consider 2185, 2200 case (similarly, for 2184,2199 the (b-1) won't work).
So if we happen to get two highly composite numbers as our a and b, picking a number adjacent to either one will help, because usually it will have only a few factors.
This answer addresses that part of the question which asks for a proof that a subset of {a,a+1,b-1,b} will always work. The question says: “Even more, the 2 numbers are speculated to be a+1 and b-1 though it yet to be proven. Can anyone prove this?”. This answer shows that no such proof can exist.
An example that disproves that a subset of {a,a+1,b-1,b} always works is {105, 106, 370, 371} = {3·5·7, 2·53, 2·5·37, 7·53}. Let (x,y) denote gcd(x,y). For this example, (a,b)=7, (a,b-1)=5, (a+1,b-1)=2, (a+1,b)=53, so all of the sets {a,b}; {a, a+1, b}; {a,b-1,b}; and {a, a+1, b-1,b} fail.
This example is the result of the following reasoning: we want to find a, b such that every subset of {a, a+1, b-1, b} fails. Specifically, we need the following four gcd's to be greater than 1: (a,b), (a,b-1), (a+1,b-1), (a+1,b). We can do so by finding some e, f that divide the even number a+1, and then constructing b such that the odd b is divisible by f and by some factor of a, while the even b-1 is divisible by e. In this case, e=2 and f=53 (as a consequence of arbitrarily taking a=3·5·7 so that a has several small odd-prime factors).
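The arithmetic in this counterexample is easy to verify (a quick sketch):

```python
from math import gcd

a, b = 105, 371  # 3*5*7 and 7*53; a+1 = 2*53, b-1 = 2*5*37
checks = {
    "(a, b)": gcd(a, b),              # 7
    "(a, b-1)": gcd(a, b - 1),        # 5
    "(a+1, b-1)": gcd(a + 1, b - 1),  # 2
    "(a+1, b)": gcd(a + 1, b),        # 53
}
# every adjacent pair shares a factor, so all subsets of {a, a+1, b-1, b} fail
assert all(g > 1 for g in checks.values())
```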
a=3199611856032532876288673657174860
b=3199611856032532876288673657174960
appears to be a counterexample.

An algorithm for splitting a sequence in equally spaced, non colliding subsequences

I have a problem that I just can't solve algorithmically.
Let's say I have a video capture that always captures frames at a fixed rate F (say 30 frames per second).
What I want is to "split" this frame sequence into n (say four) subsequences. Each subsequence has its own framerate fn, which is obviously < F. Frames in a subsequence are equally spaced in time, so for example a valid 10 fps subsequence f1, for F = 30 fps over 1 second, will be constructed like this
(0s are frames that don't belong to the subsequence, 1s are frames that do):
100 (in 1 second it will repeated like: 100100100100100100100100100100)
or
010 (again, in 1 sec it will go like: 010010010010010010010010010010)
or, for F = 30 and f = 8:
100000001
(and it would take LCM(30, 8) = 120 frames before a second again starts with a "1").
The problem is that subsequences can't collide, so if F=30, f1 = 10 fps (every three frames) and f2 = 5 fps (every six frames), this sequence is ok:
102100 (again, in a second: 102100102100102100102100102100)
But if we add f3 = 6 fps (every five frames):
132100 (1 AND 3) <--- collides! (continuing: 02100102100102100102100)
or
102103102130102 (1 AND 3) <--- collides! (continuing: 00102100102100)
The third subsequence would collide with the first.
The question is:
Is there a way to find every combination of framerates for the n (with n <= 4) subsequences that won't collide and will be equally spaced?
(I need the general case, but in this particular case, I would need all the valid combinations for one sequence only (trivial), all the valid combinations for two sequences, all the valid combinations of three sequences, and all for four sequences).
I hope someone can enlighten me.
Thank you!
I believe this would do it for the 4 stream case, and it should be obvious what to do for the fewer stream cases.
from math import gcd  # Python 3.9+: math.gcd accepts any number of arguments

for a in range(1, 31):
    for b in range(a, 31):
        for c in range(b, 31):
            for d in range(c, 31):
                if (1.0/a + 1.0/b + 1.0/c + 1.0/d) <= 1.0 and gcd(a, b, c, d) >= 4:
                    print(a, b, c, d)
Basically, whatever frequencies you're considering: 1) they can't take up more than all of the stream; 2) if their greatest common divisor is < 4, you can't find an arrangement of them that won't conflict. (For example, consider the case of two primes: gcd(p1, p2) is always 1, and they'll always conflict within p1*p2 frames regardless of how you offset them.)
If you have a look at your rates, you will notice that:
There is k in N (integers >= 0) such that f1 = k * f2
There is no k in N such that f1 = k * f3
More to the point, f1 and f2 are special in that f2 gives you a subsequence of what f1 would give starting at the same point. And since two f1 sequences never cross if they don't begin at the same point (think parallel lines), naturally f2 is not going to cross f1 either!
You can also see that the converse holds: since f3 is not a subsequence of f1 (i.e., f3 is not a divisor of f1), there exist i, j in Z (integers) such that i*f1 + j*f3 = gcd(f1, f3) (this is Bézout's identity). This means that a collision can be found whatever positions the two subsequences start from.
It also means that you could get away with f1 = 29 and f3 = 27 if you only had a few frames, but ultimately they will collide if you keep going long enough (though predicting this without computing it is beyond me at the moment).
In short: elect one 'master' frequency (the fastest of those you want) and then only pick divisors of that frequency, and you will be okay whatever the length of your video.
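A brute-force checker makes this easy to experiment with (a sketch; the stream representation and names are mine: a stream with period p frames and offset o owns every frame k where (k - o) % p == 0):

```python
from math import lcm  # Python 3.9+

def collides(streams):
    """streams: list of (period, offset) pairs. True if two streams ever claim the same frame."""
    horizon = lcm(*(p for p, _ in streams))  # the layout repeats after lcm of the periods
    for k in range(horizon):
        owners = [i for i, (p, o) in enumerate(streams) if (k - o) % p == 0]
        if len(owners) > 1:
            return True
    return False

# f1 = 10 fps (every 3 frames) and f2 = 5 fps (every 6 frames, offset 2) coexist,
# exactly like the "102100" pattern above:
assert not collides([(3, 0), (6, 2)])
# adding f3 = 6 fps (every 5 frames) collides with f1 no matter the offset,
# because gcd(3, 5) = 1 makes the congruences always solvable:
assert all(collides([(3, 0), (5, o)]) for o in range(5))
```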
