LogLikelihood and MultinomialDistribution in Mathematica

can someone explain to me why the following code
LogLikelihood[ MultinomialDistribution[ countstot, {dt1/ttot,
dt2/ttot, dt3/ttot, dt4/ttot, dt5/ttot}], {CR1, CR2, CR3, CR4,
CR5}]
does not produce a number as output, but instead this:
LogLikelihood[ MultinomialDistribution[ 156, {318/1049, 159/1049,
208/1049, 222/1049, 142/1049}], {0.00186,
0.00185, 0.00136, 0.00108, 0.00115}]
It is the first time I have used LogLikelihood and MultinomialDistribution; I have probably done something wrong, but I can't really see what.
Thanks

Taking a few clues from the documentation.
d = MultinomialDistribution[
156, {318/1049, 159/1049, 208/1049, 222/1049, 142/1049}] // N;
These are the mean results expected from this distribution
m = Mean[d]
{47.2908, 23.6454, 30.9323, 33.0143, 21.1173}
Total[m]
156.
Taking some random values
r = RandomVariate[d]
{51, 17, 23, 41, 24}
The log-likelihood of these values (a multinomial needs non-negative integer counts as data, which is why your call with small real-valued rates returned unevaluated):
LogLikelihood[d, {r}]
-12.9418
Total[r]
156
Scaling up your figures and rounding so that they total 156
values = {0.00186, 0.00185, 0.00136, 0.00108, 0.00115};
factor = 156/Total[values];
scaled = 0.999 factor values;
rounded = Round[scaled]
{40, 39, 29, 23, 25}
Total[rounded]
156
LogLikelihood[d, {rounded}]
-16.555
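The -16.555 figure can be cross-checked outside Mathematica. Here is a minimal Python sketch of the standard multinomial log-pmf (my own helper, not Wolfram code):

```python
from math import lgamma, log

def multinomial_loglike(counts, probs):
    """Log-likelihood of integer counts under a multinomial distribution."""
    n = sum(counts)
    # log n! - sum(log x_i!) + sum(x_i * log p_i)
    ll = lgamma(n + 1) - sum(lgamma(x + 1) for x in counts)
    ll += sum(x * log(p) for x, p in zip(counts, probs))
    return ll

probs = [318/1049, 159/1049, 208/1049, 222/1049, 142/1049]
rounded = [40, 39, 29, 23, 25]  # the scaled counts from above, total 156
print(multinomial_loglike(rounded, probs))  # ≈ -16.555, matching the Mathematica result
```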


Function which increases fast and slows down reaching predefined maximum

I am creating a count up algorithm where I increase the number with bigger increments and then the increments get smaller over time, ideally reaching zero or 1. The final sum value should be predefined. The number of steps should be an input parameter and can vary. It seems like it is a logarithmic function with a maximum value. However, the logarithmic function grows until infinity.
The best I've found is square root of logarithm:
val log = (1..10).map { Math.sqrt(Math.log(it * 1.0)) }
val max = log.max()
val rounded = log.map { it * 1000 / max!! }.map { Math.round(it) }
rounded.forEachIndexed { i, l ->
if (i + 1 < rounded.size)
println("${rounded[i + 1] - rounded[i]}")
}
However, I still do not get to very small increments in the end.
If range is from zero to 10:
549, 142, 85, 60, 46, 37, 31, 27, 23
If the range is 20:
481, 125, 74, 53, 40, 33, 27, 23, 21, 18, 16, 14, 14, 12, 11, 10, 10, 9, 9
What algorithm should I use to get to 1 in the end?
Update:
Based on Patrick's formula I made this solution:
val N = 18981.0
val log = (1..50).map { N - N/it }
val max = log.max()
log.map { print("$it, ") }
val rounded = log.map { it * N / max!! }.map { Math.round(it) }
It is important that N is a Double and not an integer.
Square root of the logarithm also grows to infinity. Try
f(n) = N - N/n
This has the value 0 at n = 1 and tends towards N as n grows without bound. If you need finer granularity, add some coefficients and play around with them until you get something reasonable. For instance, you can use [1 + (n/1000)] and get similar but much slower growth. You can also use exp(-x) or any function with a horizontal asymptote and get similar behavior.
f(n) = N - exp(-n)
Again, add some coefficients and see how the function changes
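To see the increments of f(n) = N - N/n taper off, here is a short Python sketch (N = 1000 and 20 steps are arbitrary choices of mine):

```python
N = 1000.0
# f(n) = N - N/n: starts at 0, approaches N as n grows
values = [N - N / n for n in range(1, 21)]
# successive increments are N/(n(n+1)), strictly shrinking toward zero
increments = [b - a for a, b in zip(values, values[1:])]
print(increments)  # 500.0, ~166.7, ~83.3, ... tapering toward zero
```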

Unique random string with alphanumeric required in Ruby

I'm using the following code to generate a unique 10-character random string of [A-Z a-z 0-9] in Ruby:
random_code = [*('a'..'z'),*('0'..'9'),*('A'..'Z')].shuffle[0, 10].join
However, sometimes this random string does not contain a number or an uppercase character. Could you help me write a method that generates a unique random string containing at least one number, one uppercase and one lowercase character?
down = ('a'..'z').to_a
up = ('A'..'Z').to_a
digits = ('0'..'9').to_a
all = down + up + digits
[down.sample, up.sample, digits.sample].
concat(7.times.map { all.sample }).
shuffle.
join
#=> "TioS8TYw0F"
Edit: the above reflects a misunderstanding of the question. I'll leave it, however. To have no characters appear more than once:
def rnd_str
down = ('a'..'z').to_a
up = ('A'..'Z').to_a
digits = ('0'..'9').to_a
[extract1(down), extract1(up), extract1(digits)].
concat(((down+up+digits).sample(7))).shuffle.join
end
def extract1(arr)
i = arr.size.times.to_a.sample
c = arr[i]
arr.delete_at(i)
c
end
rnd_str #=> "YTLe0WGoa1"
rnd_str #=> "NrBmAnE9bT"
down.sample.shift (etc.) would have been more compact than extract1, but the inefficiency was just too much to bear.
If you do not want to repeat random strings, simply keep a list of the ones you generate. If you generate another that is in the list, discard it and generate another. It's pretty unlikely you'll have to generate any extra ones, however. If, for example, you generate 100 random strings (satisfying the requirement of at least one lowercase letter, one uppercase letter and one digit), the chance that there will be one or more duplicate strings is about one in seven million:
t = 107_518_933_731
n = t+1
t = t.to_f
(1.0 - 100.times.reduce(1.0) { |prod,_| prod * (n -= 1)/t }).round(10)
#=> 1.39e-07
where t = C(62,10) and C(62,10) is defined below.
An alternative
There is a really simple way to do this that turns out to be pretty efficient: just sample without replacement until a sample is found that meets the requirement of at least one lowercase letter, one uppercase letter and one digit. We can do that as follows:
DOWN = ('a'..'z').to_a
UP = ('A'..'Z').to_a
DIGITS = ('0'..'9').to_a
ALL = DOWN + UP + DIGITS
def rnd_str
loop do
arr = ALL.sample(10)
break arr.shuffle.join unless (DOWN & arr).empty? || (UP & arr).empty? ||
(DIGITS & arr).empty?
end
end
rnd_str #=> "3jRkHcP7Ge"
rnd_str #=> "B0s81x4Jto"
How many samples must we reject, on average, before finding a "good" one? It turns out (see below if you are really, really interested) that the probability of getting a "bad" string (one with no lowercase letter, no uppercase letter or no digit when 10 characters are sampled without replacement from the 62 elements of ALL) is only about 0.15 (15%). That means that 85% of the time no bad samples will be rejected before a good one is found.
It turns out that the expected number of bad strings that will be sampled, before a good string is sampled, is:
0.15/0.85 =~ 0.17
The following shows how the above probability was derived, should anyone be interested.
Let n_down be the number of ways a sample of 10 can be drawn that has no lowercase letters:
n_down = C(36,10) = 36!/(10!*(36-10)!)
where (the binomial coefficient) C(36,10) equals the number of combinations of 36 "things" that can be "taken" 10 at a time, and equals:
C(36,10) = 36!/(10!*(36-10)!) #=> 254_186_856
Similarly,
n_up = n_down #=> 254_186_856
and
n_digits = C(52,10) #=> 15_820_024_220
We can add these three numbers together to obtain:
n_down + n_up + n_digits #=> 16_328_397_932
This is almost, but not quite, the number of ways to draw 10 characters, without replacement, that contain no lowercase letters, no uppercase letters or no digits. "Not quite" because there is a bit of double-counting going on: the C(26,10) all-uppercase samples, the C(26,10) all-lowercase samples and the single all-digits sample are each counted twice above. The necessary adjustment is as follows:
n_down + n_up + n_digits - 2*C(26,10) - 1
#=> 16_317_774_461
To obtain the probability of drawing a sample of 10 from a population of 62, without replacement, that has no lowercase letter, no uppercase letter or no digit, we divide this number by the total number of ways 10 characters can be drawn from 62 without replacement:
(16_317_774_461.0/c(62,10)).round(2)
#=> 0.15
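The inclusion-exclusion count above can be cross-checked with Python's math.comb; note that the single all-digits draw is subtracted once, alongside the two C(26,10) single-case overlaps:

```python
from math import comb

# "bad" draws: missing lowercase (C(36,10)), missing uppercase (C(36,10)),
# or missing digits (C(52,10)); pairwise overlaps are the all-uppercase
# and all-lowercase draws (C(26,10) each) plus the one all-digits draw
bad = 2 * comb(36, 10) + comb(52, 10) - 2 * comb(26, 10) - 1
p_bad = bad / comb(62, 10)
print(round(p_bad, 2))  # ≈ 0.15
```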
If you want a script to generate just a small number of tokens (like 2, 5, 10, 100, 1000, 10 000, etc.), then the best way would be to simply keep the already generated tokens in memory and retry until a new one is generated (statistically speaking, this won't take long). If this is not the case - keep reading.
After thinking about it, this problem turned out to be in fact very interesting. For brevity, I will not mention the requirement to have at least one number, capital and lower case letter, but it will be included in the final solution. Also let all = [*'1'..'9', *'a'..'z', *'A'..'Z'].
To sum it up, we want to generate k-permutations of n elements with repetition randomly with uniqueness constraint.
k = 10, n = 61 (all.size)
Ruby just so happens to have such method, it's Array#repeated_permutation. So everything is great, we can just use:
all.repeated_permutation(10).to_a.map(&:join).shuffle
and pop the resulting strings one by one, right? Wrong! The problem is that the number of possibilities happens to be:
n^k = 713_342_911_662_882_601 (61**10)
Even if you had an infinitely fast processor, you still couldn't hold that amount of data, whether as a count of complex objects or of simple bits.
The opposite would be to generate random permutations, keep the already generated ones in a set and check for inclusion before returning the next element. This is just delaying the inevitable - not only would you still have to hold the same amount of information at some point, but as the number of generated permutations grows, the number of tries required to generate a new permutation diverges to infinity.
As you might have thought, the root of the problem is that randomness and uniqueness hardly go hand in hand.
Firstly, we would have to define what we would consider as random. Judging by the amount of nerdy comics on the subject, you could deduce that this isn't that black and white either.
An intuitive definition of a random program would be one that doesn't generate the tokens in the same order on each execution. Great, so now we can just take the first n permutations (where n = rand(100)), put them at the end and enumerate everything in order? You can sense where this is going. For a random generation to be considered good, the generated outputs of consecutive runs should be equally distributed. In simpler terms, the probability of getting any possible output should be equal to 1 / #__all_possible_outputs__.
Now let's explore the boundaries of our problem a little:
The number of possible k-permutations of n elements without repetition is:
n!/(n-k)! = 327_234_915_316_108_800 ((61 - 10 + 1).upto(61).reduce(:*))
Still out of reach. Same goes for
The number of possible full permutations of n elements without repetition:
n! = 507_580_213_877_224_798_800_856_812_176_625_227_226_004_528_988_036_003_099_405_939_480_985_600_000_000_000_000 (1.upto(61).reduce(:*))
The number of possible k-combinations of n elements without repetition:
n!/k!(n-k)! = 90_177_170_226 ((61 - 10 + 1).upto(61).reduce(:*)/1.upto(10).reduce(:*))
Finally, where we might have a breakthrough: full permutations of k elements without repetition:
k! = 3_628_800 (1.upto(10).reduce(:*))
~3.6m isn't nothing, but at least it's reasonably computable. On my personal laptop k_permutations = 0.upto(9).to_a.permutation.to_a took 2.008337 seconds on average. Generally, as computing time goes, this is a lot. However, assuming that you will be running this on an actual server and only once per application startup, this is nothing. In fact, it would even be reasonable to create some seeds. A single k_permutations.shuffle took 0.154134 seconds, therefore in about a minute we can acquire 61 random permutations: k_seeds = 61.times.map { k_permutations.shuffle }.to_a.
Now let's try to convert the problem of k-permutations of n elements without repetition into solving full k-permutations without repetition multiple times.
A cool trick for generating permutations is using numbers and bitmaps. The idea is to generate all numbers from 0 to 2^61 - 1 and look at their bits: if there is a 1 at position i, we use the all[i] element, otherwise we skip it. We still haven't escaped the issue, though, as 2^61 = 2305843009213693952 (2**61), which we can't keep in memory.
Fortunately, another cool trick comes to the rescue, this time from number theory:
If m is prime and the exponent e is coprime to m - 1, then raising m consecutive numbers to the power e modulo m yields each of the numbers from 0 to m - 1 exactly once.
In other words:
5.upto(65).map { |number| number**17 % 61 }.sort # => [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60]
5.upto(65).map { |number| number**17 % 61 } # => [36, 31, 51, 28, 20, 59, 11, 22, 47, 48, 42, 12, 54, 26, 5, 34, 29, 57, 24, 53, 15, 55, 3, 38, 21, 18, 43, 40, 23, 58, 6, 46, 8, 37, 4, 32, 27, 56, 35, 7, 49, 19, 13, 14, 39, 50, 2, 41, 33, 10, 30, 25, 16, 9, 17, 60, 0, 1, 44, 52, 45]
Now actually, how random is that? As it turns out, the more common divisors shared by m - 1 and the chosen exponent, the less evenly distributed the sequence is. But we are in luck here: 2^61 - 1 is a prime number (a Mersenne prime), and since our exponent is itself a reasonably large prime, it will almost certainly be coprime to m - 1. No matter what power we choose, though, the positions of the numbers 0 and 1 will be fixed. That is not perfect, but the other 2^61 - 3 numbers can be found at any position. And guess what - we don't care about 0 and 1 anyway, because they don't have ten 1s in their binary representation!
Unfortunately, a bottleneck for our randomness is the fact that the bigger the prime modulus we want, the harder it is to find. This is the best I can come up with for generating all the numbers in a range in shuffled order without keeping them all in memory simultaneously.
So to put everything in use:
We generate seeds of full permutations of 10 elements.
We generate a random prime number.
We randomly choose if we want to generate permutations for the next number in the sequence or a number that we already started (up to a finite number of started numbers).
We use bitmaps of the generated numbers to get said permutations.
Note that this will solve only the problem of k-permutations of n elements without repetition. I still haven't thought of a way to add repetition.
DISCLAIMER: The following code comes with no guarantees of any kind, explicit or implied. Its point is to further express the author's ideas, rather than be a production ready solution:
require 'prime'

class TokenGenerator
  NUMBERS_UPPER_BOUND = 2**61 - 1
  HAS_NUMBER_MASK = ('1' * 9 + '0' * (61 - 9)).reverse.to_i(2)
  HAS_LOWER_CASE_MASK = ('0' * 9 + '1' * 26 + '0' * 26).reverse.to_i(2)
  HAS_UPPER_CASE_MASK = ('0' * (9 + 26) + '1' * 26).reverse.to_i(2)
  ALL_CHARACTERS = [*'1'..'9', *'a'..'z', *'A'..'Z']
  K_PERMUTATIONS = 0.upto(9).to_a.permutation.to_a # give it a couple of seconds

  def initialize
    random_prime = Prime.take(10_000).drop(100).sample
    @all_numbers_generator = 1.upto(NUMBERS_UPPER_BOUND).lazy.map do |number|
      number**random_prime % NUMBERS_UPPER_BOUND
    end.select do |number|
      !(number & HAS_NUMBER_MASK).zero? and
        !(number & HAS_LOWER_CASE_MASK).zero? and
        !(number & HAS_UPPER_CASE_MASK).zero? and
        number.to_s(2).chars.count('1') == 10
    end
    @k_permutation_seeds = 61.times.map { K_PERMUTATIONS.shuffle }.to_a # this will take a minute
    @numbers_in_iteration = { go_fish: nil }
  end

  def next
    raise StopIteration if @numbers_in_iteration.empty?
    number_generator = @numbers_in_iteration.keys.sample
    if number_generator == :go_fish
      add_next_number if @numbers_in_iteration.size < 1_000_000
      self.next
    else
      next_permutation(number_generator)
    end
  end

  private

  def add_next_number
    @numbers_in_iteration[@all_numbers_generator.next] = @k_permutation_seeds.sample.to_enum
  rescue StopIteration # you actually managed to traverse all 2^61 numbers!
    @numbers_in_iteration.delete(:go_fish)
  end

  def next_permutation(number)
    fetch_permutation(number, @numbers_in_iteration[number].next)
  rescue StopIteration # all k permutations for this number were already generated
    @numbers_in_iteration.delete(number)
    self.next
  end

  def fetch_permutation(number_mask, k_permutation)
    k_from_n_indices = number_mask.to_s(2).chars.reverse.map.with_index { |bit, index| index if bit == '1' }.compact
    k_permutation.each_with_object([]) { |order_index, k_from_n_values| k_from_n_values << ALL_CHARACTERS[k_from_n_indices[order_index]] }
  end
end
EDIT: it turned out that our constraints eliminate too many possibilities. This causes @all_numbers_generator to take too much time testing and skipping numbers. I will try to think of a better generator, but everything else remains valid.
The old version that generates tokens with uniqueness constraint on the containing characters:
numbers = ('0'..'9').to_a
downcase_letters = ('a'..'z').to_a
upcase_letters = downcase_letters.map(&:upcase)
all = [numbers, downcase_letters, upcase_letters]
one_of_each_set = all.map(&:sample)
random_code = (one_of_each_set + (all.flatten - one_of_each_set).sample(7)).shuffle.join
Use the 'SafeRandom' gem (GitHub link).
It provides the easiest way to generate random values and is compatible with Rails 2, Rails 3, Rails 4 and Rails 5.
Here you can use the strong_string method to generate a strong string (i.e. a combination of uppercase and lowercase letters, numbers and symbols):
# Strong string: the minimum length should be greater than 5; otherwise an 8-character string is returned by default.
require 'safe_random'
puts SafeRandom.strong_string # => 4skgSy93zaCUZZCoF9WiJF4z3IDCGk%Y
puts SafeRandom.strong_string(3) # => P4eUbcK%
puts SafeRandom.strong_string(5) # => 5$Rkdo

General algorithm: random sort array, so the distance between objects is maxed

Given an array of random not-unique numbers
[221,44,12,334,63,842,112,12]
What would be the best approach to randomly sort the values, while also trying to maximize the distance |A-B| between neighbouring numbers?
You could try a suboptimal greedy algorithm:
1. sortedArr <- sort input array
2. resultArr <- initialize empty array
3. for i=0 to size of sortedArr
   a. if i%2 == 0
      I. resultArr[i] = sortedArr[i/2]
   b. else
      II. resultArr[i] = sortedArr[sortedArr.size - (i+1)/2]
This puts numbers in the result alternating from the left and right of the sorted input. For example if the sorted input is:
12, 12, 44, 63, 112, 221, 334, 842
Then the output would be:
12, 842, 12, 334, 44, 221, 63, 112
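The alternating pick can be sketched in Python (the function name is mine):

```python
def interleave_extremes(arr):
    """Alternately take from the low and high ends of the sorted input."""
    s = sorted(arr)
    n = len(s)
    out = []
    for i in range(n):
        if i % 2 == 0:
            out.append(s[i // 2])            # even slots: from the left
        else:
            out.append(s[n - (i + 1) // 2])  # odd slots: from the right
    return out

print(interleave_extremes([221, 44, 12, 334, 63, 842, 112, 12]))
# [12, 842, 12, 334, 44, 221, 63, 112]
```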
This might not be optimal, but it probably gets pretty close, and it runs in O(n log n). On your example the optimum is obtained by:
63, 221, 12, 334, 12, 842, 44, 112
which yields a sum of 2707. My algorithm yields a sum of 2656. I'm pretty sure that you won't be able to find the optimum in polynomial time.
A brute force solution in Python would look like:
import itertools

max_sum = 0
max_l = []
for l in itertools.permutations([221, 44, 12, 334, 63, 842, 112, 12]):
    s = sum(abs(l[i] - l[i + 1]) for i in range(len(l) - 1))
    if s > max_sum:
        max_sum = s
        max_l = l
print(max_sum)
print(max_l)

Two Egg problem confusion

Two Egg problem:
You are given 2 eggs.
You have access to a 100-storey building.
Eggs can be very hard or very fragile: an egg may break if dropped from the first floor, or may not break even if dropped from the 100th floor. Both eggs are identical.
You need to figure out the highest floor of a 100-storey building from which an egg can be dropped without breaking.
Now the question is how many drops you need to make. You are allowed to break 2 eggs in the process.
I am sure the two egg problem (mentioned above) has been discussed sufficiently. However, could someone help me understand why the following solution is not optimal?
Let's say I use a segment and scan algorithm with the segment size s.
So,
d/ds (100/s + (s-1)) = 0 [this should give the minimum; I need (s-1) scans per segment and there are 100/s segments]
=> -100/s^2 + 1 = 0
=> s^2 = 100
=> s = 10
So according to this I need at most 19 drops. But the optimal solution can do this with 14 drops.
So where lies the problem?
You seem to be assuming equal-sized segments. For an optimal solution, if the first segment is of size N, then the second has to be of size N-1, and so on (because when you start testing the second segment, you've already dropped the egg once for the first segment).
So you need to solve n + (n-1) + (n-2) + ... + 1 >= 100, i.e. n(n+1)/2 >= 100 (this is the sum of an arithmetic series). Solving for n (WolframAlpha: Reduce[Floor[n + n^2] >= 200, n]) gives 14. Now you know that the first floor from which to drop is the 14th, the next will be the (14 + 13)th, and the whole sequence is:
14; 27; 39; 50; 60; 69; 77; 84; 90; 95; 99; 100
If you break the first egg, you go back to the last one and linearly check all options until you break the second egg, when you do, you got your answer. There is no magic.
http://mathworld.wolfram.com/ArithmeticSeries.html
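The bound and the resulting drop sequence are easy to verify with a few lines of Python:

```python
# smallest n with n(n+1)/2 >= 100
n = 1
while n * (n + 1) // 2 < 100:
    n += 1
print(n)  # 14

# drop floors: start at n, then add n-1, n-2, ..., capping at 100
floors, step, f = [], n, 0
while f < 100:
    f = min(f + step, 100)
    floors.append(f)
    step -= 1
print(floors)  # [14, 27, 39, 50, 60, 69, 77, 84, 90, 95, 99, 100]
```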
The solution 13, 25, 36, 46, 55, 64, 72, 79, 85, 90, 94, 97, 99, 100 is optimal in the sense that it minimizes the average number of trials needed to find the breaking floor, assuming that floor is selected uniformly at random.
Based on this information we can write a recursive function to minimize the average number of trials, which gives the solution
13, 25, 36, 46, 55, 64, 72, 79, 85, 90, 94, 97, 99, 100
It has the following max trials for each floor step:
13, 13, 13, 13, 13, 14, 14, 14, 14, 14, 14, 14, 14, 14
This is slightly better than the naive solution arrived at by assuming gaps starting at 14 and decreasing: 55% of the time you need just 13 trials. It is very near the optimum derived from n(n+1)/2 >= 100, which gives n = 13.651, while our solution's average over the max-trials list is (13*5 + 14*9)/14 = 13.643.
Here is a quick implementation:
def get_max_trials(floors):
    pf = 0
    trials = []
    for i, f in enumerate(floors):
        trials.append(i + f - pf)
        pf = f
    return trials

def get_trials_per_floor(floors):
    # return list of trials if the egg is assumed to break at each floor
    pf = 0
    trials = []
    for i, f in enumerate(floors):
        for mid_f in range(pf + 1, f + 1):
            trial = (i + 1) + f - mid_f + 1
            if mid_f == pf + 1:
                trial -= 1
            trials.append(trial)
        pf = f
    return trials

def get_average(floors):
    trials = get_trials_per_floor(floors)
    score = sum(trials)
    return score * 1.0 / floors[-1], max(trials)

floors_map = {}

def get_floors(N, level=0):
    if N == 1:
        return [1]
    if N in floors_map:
        return floors_map[N]
    best_floors = None
    best_score = None
    for i in range(1, N):
        base_floors = [f + i for f in get_floors(N - i, level + 1)]
        for floors in [base_floors, [i] + base_floors]:
            score = get_average(floors)
            if best_score is None or score < best_score:
                best_score = score
                best_floors = floors
    if N not in floors_map:
        floors_map[N] = best_floors
    return best_floors

floors = get_floors(100)
print("Solution:", floors)
print("max trials", get_max_trials(floors))
print("avg.", get_average(floors))
naive_floors = [14, 27, 39, 50, 60, 69, 77, 84, 90, 95, 99, 100]
print("naive_solution", naive_floors)
print("max trials", get_max_trials(naive_floors))
print("avg.", get_average(naive_floors))
Output:
Solution: [13, 25, 36, 46, 55, 64, 72, 79, 85, 90, 94, 97, 99, 100]
max trials [13, 13, 13, 13, 13, 14, 14, 14, 14, 14, 14, 14, 14, 14]
avg. (10.31, 14)
naive_solution [14, 27, 39, 50, 60, 69, 77, 84, 90, 95, 99, 100]
max trials [14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 12]
avg. (10.35, 14)
I also had the same thought and was trying to find exactly the method you describe. The solution explained by one of the members here cleared it up for me, but here is a bit more explanation if you want it.
Define n as the minimum number of drops required.
I start at the xth floor, and there are two scenarios:
1)
It breaks. I then have to do x - 1 more checks (because I have only one egg left). The total is 1 + (x - 1) = x drops.
We defined this value as n, hence x = n. [PS: this might look trivial, but it has some subtleties IMO.]
2)
It doesn't break - and I have used up one of my n possibilities already!
The number of drops still allowed is n - 1; only then will the total number of drops be n, which is the definition of n.
The problem has now become a subproblem of 100 - n floors with 2 eggs.
If I am choosing some yth floor now, its worst case should be n - 1 drops; starting n - 1 floors up satisfies this.
Hence you get the pattern: go to the nth floor, then the n + (n-1)th floor, then the n + (n-1) + (n-2)th floor, and so on. Solve this for the 100th floor and you get n.
That the starting floor equals the number of drops is a coincidence, I think.
To see why the maximum is n = 14, you can think of having n bulbs with 2 bulbs glowing at once: it will require at least 14 bulbs to cover all the possible combinations of where the egg can break.
As a challenge, try to do it for 3 eggs.
In your approach, the problem is basically that there is an asymmetry in how the search progresses: for the first segment of 10 floors the algorithm finds the answer quickly.
I would suggest checking http://ite.pubs.informs.org/Vol4No1/Sniedovich/ for an explanation, and also trying to visualize how this problem appears in real cases of networks.
A very nice explanation of the solution can be found at the link below.
The Two Egg Problem
It explains how you get to n + (n-1) + (n-2) + ... + 1 >= 100, and also covers the 1-egg problem (linear complexity, O(100)) and the multiple (infinite) eggs problem (logarithmic complexity, O(log2(100))).
Here's a solution in Python. If you drop the egg at a certain floor f, it either breaks or it doesn't, and in each case you have a certain number of floors you still need to check (which is a subproblem). It uses a recursion and also a lookup dictionary to make it much faster to compute.
neededDict = {}

# number of drops you need to make
def needed(eggs, floors):
    if (eggs, floors) in neededDict:
        return neededDict[(eggs, floors)]
    if floors == 0:
        return 0
    if eggs == 1:
        return floors
    minimum = floors
    for f in range(floors):
        resultIfEggBreaks = needed(eggs - 1, f)
        resultIfEggSurvives = needed(eggs, floors - (f + 1))
        result = max(resultIfEggBreaks, resultIfEggSurvives)
        if result < minimum:
            minimum = result
    # 1 drop at the best level f plus however many you need to handle all floors that remain unknown
    neededDict[(eggs, floors)] = 1 + minimum
    return 1 + minimum

print(needed(2, 100))  # 14
The question should not be "how many drops do you need to make?" but rather "find the minimal number of drops needed to know where the egg breaks". I saw this problem on CareerCup; below are the algorithms I thought of:
There are two ways to solve this problem:
Binary search for the first egg (risky, but it narrows down where we need to look) - O(log n).
Fibonacci-sequence search (1, 2, 3, 5, 8, 13, 21, 34, 55, 89) for the first egg - http://en.wikipedia.org/wiki/Fibonacci_search_technique
Once the first egg is broken, we know in which interval we need to look.
Binary example: we try floor 100/2 (50); if it broke, we search from 1 to 50 incrementing by one; if not, we throw it from floor 75; if it broke, we search from 50 to 75; if not, we throw it from floor 87; if it broke, we search from 75 to 87 incrementing by one floor at a time, and so on and so forth.
Fibonacci example: same thing - we try 1, 2, 3, 5, 8, 13, ...; if the first egg broke, we go back to the last interval's minimum and increment by one.
Hey, what about this approach?
Try this sequence:
1, 2, 4, 8, 16, 32, 64, 100
Once you find that the egg is broken, you get a space to work in.
Let's suppose the egg breaks at #64; then the answer lies between 32 and 64.
We can use a normal binary search between those two numbers: we check #48 ((32+64)/2), shortlist the upper or lower half, and repeat.
In this case the worst case is having the floor at 99, which takes 14 tries.
The explanation of the two eggs problem can be confusing the first time, so we can understand the solution as follows.
Given that x is the floor where we start dropping eggs:
- If it breaks, the total number of trials in the worst case is 1 + (x - 1) = x.
- If it doesn't break, how should we step up to the next floor? We could jump to floor (x + x), (x + x + 1), ..., but that increases the number of trials. Try x = 10:
. If it breaks, we need 10 trials in the worst case.
. If it doesn't break and we step up to floor 10 + 10 = 20 and try there, then if it breaks we need 1 (at floor 10) + 1 (at floor 20) + 9 = 11 trials. Similarly, stepping up by x or x + 1 floors keeps increasing the number of trials.
Actually, we want the number of trials to be equal in both cases, and for that reason we step up by x - 1 floors instead of x or x + 1. Finally, we have the general expression:
x + (x - 1) + (x - 2) + ... + 1 >= 100.
And that's it.
I would say the optimal solution for 100 floors with two eggs is 13 tries, not 14.
13, 25, 36, 46, 55, 64, 72, 79, 85, 90, 94, 97, 99, 100 is the optimal answer, but if I reach 99 I do not really need to try 100: the correct answer is obvious without dropping an egg from the 100th floor :D

Non-repeating pseudo random number stream with 'clumping'

I'm looking for a method to generate a pseudorandom stream with a somewhat odd property - I want clumps of nearby numbers.
The tricky part is, I can only keep a limited amount of state no matter how large the range is. There are algorithms that give a sequence of results with minimal state (linear congruence?)
Clumping means that there's a higher probability that the next number will be close rather than far.
Example of a desirable sequence (mod 10): 1 3 9 8 2 7 5 6 4
I suspect this would be more obvious with a larger stream, but difficult to enter by hand.
Update:
I don't understand why it's impossible, but yes, I am looking for, as Welbog summarized:
Non-repeating
Non-Tracking
"Clumped"
Cascade a few LFSRs with periods smaller than you need, combining them so that the fastest-changing register controls the least significant values. So if you have L1 with period 3, L2 with period 15 and L3 with some larger period, N = L1(n) + 3 * L2(n/3) + 45 * L3(n/45). This will generate 3 clumped values, then jump and generate another 3 clumped values. Use something other than multiplication (such as mixing some of the bits of the higher-period registers) or different periods to make the clumps spread wider than the period of the first register. It won't be particularly smoothly random, but it will be clumpy and non-repeating.
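Here is a toy Python model of that cascade, with shuffled lookup tables standing in for the LFSRs (the periods 3, 15 and 20 are arbitrary; a real implementation would use actual shift registers rather than stored tables):

```python
import random

def cascade(p1, p2, p3):
    """Mixed-radix combination: the fastest table drives the low digits."""
    n1, n2 = len(p1), len(p2)
    period = n1 * n2 * len(p3)
    for n in range(period):
        yield (p1[n % n1]
               + n1 * p2[(n // n1) % n2]
               + n1 * n2 * p3[n // (n1 * n2)])

random.seed(1)
p1 = random.sample(range(3), 3)    # period-3 "register"
p2 = random.sample(range(15), 15)  # period-15 "register"
p3 = random.sample(range(20), 20)  # slow outer "register"
seq = list(cascade(p1, p2, p3))
# every value 0..899 appears exactly once, in aligned clumps of 3 nearby values
```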
For the record, I'm in the "non-repeating, non-random, non-tracking is a lethal combination" camp, and I hope some simple thought experiments will shed some light. This is not a formal proof by any means. Perhaps someone will shore it up.
So, I can generate a sequence that has some randomness easily:
Given x_i, x_(i+1) ~ U(x_i, r), where r > x_i.
For example:
if x_i = 6, x_(i+1) is random choice from (6+epsilon, some_other_real>6). This guarantees non-repeating, but at the cost that the distribution is monotonically increasing.
Without some condition (like monotonicity), inherent to the sequence of generated numbers themselves, how else can you guarantee uniqueness without carrying state?
Edit: So after researching RBarryYoung's claim about "Linear Congruential Generators" (not differentiators - is this what RBY meant?), clearly, I was wrong! These sequences exist, and by necessity, any PRNG whose next number depends only on the current number and some global, unchanging state can't have repeats within a cycle (after some initial burn-in period).
By defining the "clumping features" in terms of the probability distribution of its size, and the probability distribution of its range, you can then use simple random generators with the underlying distribution and produce the sequences.
One way to get "clumpy" numbers would be to use a normal distribution.
You start the random list with your "initial" random value, then you generate a random number with the mean of the previous random value and a constant variance, and repeat as necessary. The overall variance of your entire list of random numbers should be approximately constant, but the "running average" of your numbers will drift randomly with no particular bias.
>>> r = [1]
>>> for x in range(20):
r.append(random.normalvariate(r[-1], 1))
>>> r
[1, 0.84583267252801408, 0.18585962715584259, 0.063850022580489857, 1.2892164299497422,
0.019381814281494991, 0.16043424295472472, 0.78446377124854461, 0.064401889591144235,
0.91845494342245126, 0.20196939102054179, -1.6521524237203531, -1.5373703928440983,
-2.1442902977248215, 0.27655425357702956, 0.44417440706703393, 1.3128647361934616,
2.7402744740729705, 5.1420432435119352, 5.9326297626477125, 5.1547981880261782]
I know it's hard to tell by looking at the numbers, but you can sort of see that the numbers clump together a little bit - the 5.X's at the end, and the 0.X's on the second row.
If you need only integers, you can use a very large mean and variance and truncate/divide to obtain integer output. A normal distribution is by definition continuous, so every real number is a potential output; it is not restricted to integers.
Here's a quick scatter plot in Excel of 200 numbers generated this way (starting at 0, constant variance of 1):
scatter data http://img178.imageshack.us/img178/8677/48855312.png
Ah, I just read that you want non-repeating numbers. No guarantee of that in a normal distribution, so you might have to take into account some of the other approaches others have mentioned.
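If you do want to try the truncation route mentioned above, a minimal Python sketch (the function name and the start/sigma values are my own arbitrary choices) that yields clumpy integers:

```python
import random

def clumpy_integers(n, start=500, sigma=25):
    """Integer random walk: each value is a normal draw centred on
    the previous value, rounded to the nearest integer."""
    values = [start]
    for _ in range(n - 1):
        values.append(round(random.normalvariate(values[-1], sigma)))
    return values

seq = clumpy_integers(20)
```

As noted, nothing here prevents repeats; it only produces the clumping behaviour.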
I don't know of an existing algorithm that would do this, but it doesn't seem difficult to roll your own (depending on how stringent the "limited amount of state" requirement is). For example:
RANGE = (1..1000)
CLUMP_ODDS = 0.5
CLUMP_DIST = 10

last = rand(RANGE)
while still_want_numbers          # loop condition left as pseudocode
  if rand < CLUMP_ODDS            # clump!
    # stay near the last value, clamped to the range
    value = (last + rand(CLUMP_DIST) - CLUMP_DIST / 2).clamp(RANGE.min, RANGE.max)
  else                            # don't clump!
    value = rand(RANGE)
  end
  print value
  last = value
end
It's a little rudimentary, but would something like that suit your needs?
In the range [0, 10] the following should give a uniform distribution. random() yields a (pseudo) random number r with 0 <= r < 1.
x(n + 1) = (x(n) + 5 * (2 * random() - 1)) mod 10
You can get your desired behavior by delinearizing random() - for example, random()^k is skewed towards small numbers for k > 1. A possible function is the following, but you will have to try some exponents to find your desired distribution. And keep the exponent odd if you use the following function... ;)
x(n + 1) = (x(n) + 5 * (2 * random() - 1)^3) mod 10
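A quick Python sketch of that recurrence (the function and parameter names are mine):

```python
import random

def skewed_walk(x0, n, k=3, scale=5.0, m=10.0):
    """Random walk x(i+1) = (x(i) + scale*(2*random() - 1)**k) mod m.
    An odd k keeps the steps symmetric; k > 1 skews step sizes toward
    small magnitudes, which is what produces the clumps."""
    xs = [x0]
    for _ in range(n - 1):
        xs.append((xs[-1] + scale * (2 * random.random() - 1) ** k) % m)
    return xs

walk = skewed_walk(0.0, 100)
```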
How about this (pseudo code):
// clumpiness is static: its value is retained between calls
static float clumpiness = 0.0f; // from 0 to 1.0

method getNextValue(int lastValue)
    float r = rand(); // float from 0 to 1
    int change = MAXCHANGE * (r - 0.5) * (1 - clumpiness);
    clumpiness += 0.1 * rand();
    if (clumpiness >= 1.0) clumpiness -= 1.0;
    return Round(lastValue + change);
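In case a runnable version helps, here is roughly the same idea in Python (a sketch: MAXCHANGE and the starting value are arbitrary, and a closure stands in for the static variable):

```python
import random

MAXCHANGE = 20

def make_clumpy_stepper():
    """Step generator with feedback: as clumpiness rises toward 1,
    step sizes shrink (values clump); it then wraps and steps widen."""
    clumpiness = 0.0  # retained between calls, like the static variable
    def next_value(last_value):
        nonlocal clumpiness
        change = MAXCHANGE * (random.random() - 0.5) * (1 - clumpiness)
        clumpiness += 0.1 * random.random()
        if clumpiness >= 1.0:
            clumpiness -= 1.0
        return round(last_value + change)
    return next_value

step = make_clumpy_stepper()
x, series = 100, []
for _ in range(50):
    x = step(x)
    series.append(x)
```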
Perhaps you could generate a random sequence, and then do some strategic element swapping to get the desired property.
For example, if you find 3 values a,b,c in the sequence such that a>b and a>c, then with some probability you could swap elements a and b or elements a and c.
EDIT in response to comment:
Yes, you could keep a buffer on the stream of whatever size you are comfortable with. Your swapping rules could be deterministic, or based on another known, reproducible pseudo-random sequence.
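For concreteness, a rough Python sketch of that buffered-swap idea (buffer size, swap probability, and the particular swap rule are arbitrary choices of mine):

```python
import random

def declump(stream, buffer_size=8, swap_prob=0.5):
    """Buffer the stream; when the front element is larger than some
    later buffered element, swap them with probability swap_prob
    before emitting the front. Output is a permutation of the input."""
    buf = []
    for value in stream:
        buf.append(value)
        if len(buf) == buffer_size:
            for i in range(1, len(buf)):
                if buf[0] > buf[i] and random.random() < swap_prob:
                    buf[0], buf[i] = buf[i], buf[0]
                    break
            yield buf.pop(0)
    yield from buf  # flush whatever is left

out = list(declump(range(20)))
```

Because swapping only reorders elements, the output is always a permutation of the input, whatever the random choices.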
Does a sequence like 0, 94, 5, 1, 3, 4, 14, 8, 10, 9, 11, 6, 12, 7, 16, 15, 17, 19, 22, 21, 20, 13, 18, 25, 24, 26, 29, 28, 31, 23, 36, 27, 42, 41, 30, 33, 34, 37, 35, 32, 39, 47, 44, 46, 40, 38, 50, 43, 45, 48, 52, 49, 55, 54, 57, 56, 64, 51, 60, 53, 59, 62, 61, 69, 68, 63, 58, 65, 71, 70, 66, 73, 67, 72, 79, 74, 81, 77, 76, 75, 78, 83, 82, 85, 80, 87, 84, 90, 89, 86, 96, 93, 98, 88, 92, 99, 95, 97, 2, 91 (mod 100) look good to you?
This is the output of a small ruby program (explanations below):
#!/usr/bin/env ruby
require 'digest/md5'
$seed = 'Kind of a password'
$n = 100 # size of sequence
$k = 10 # mixing factor (higher means less clumping)
def pseudo_random_bit(k, n)
  # low bit of the last hex digit of the MD5 hash
  Digest::MD5.hexdigest($seed + "#{k}|#{n}")[-1].to_i(16) & 1
end

def sequence(x)
  h = $n/2
  $k.times do |k|
    # maybe exchange 1st with 2nd, 3rd with 4th, etc
    x ^= pseudo_random_bit(k, x >> 1) if x < 2*h
    # maybe exchange 1st with last
    if [0, $n-1].include? x
      x ^= ($n-1)*pseudo_random_bit(k, 2*h)
    end
    # move 1st to end
    x = (x - 1) % $n
    # maybe exchange 1st with 2nd, 3rd with 4th, etc
    # (corresponds to 2nd with 3rd, 4th with 5th, etc)
    x ^= pseudo_random_bit(k, h+(x >> 1)) if x < 2*(($n-1)/2)
    # move 1st to front
    x = (x + 1) % $n
  end
  x
end
puts (0..99).map {|x| sequence(x)}.join(', ')
The idea is basically to start with the sequence 0..n-1 and disturb the order by passing k times over the sequence (more passes means less clumping). In each pass one first looks at the pairs of numbers at positions 0 and 1, 2 and 3, 4 and 5 etc (general: 2i and 2i+1) and flips a coin for each pair. Heads (=1) means exchange the numbers in the pair, tails (=0) means don't exchange them. Then one does the same for the pairs at positions 1 and 2, 3 and 4, etc (general: 2i+1 and 2i+2). As you mentioned that your sequence is mod 10, I additionally exchanged positions 0 and n-1 if the coin for this pair dictates it.
A single number x can be mapped modulo n after k passes to any number of the interval [x-k, x+k] and is approximately binomial distributed around x. Pairs (x, x+1) of numbers are not independently modified.
As pseudo-random generator I used only the last of the 128 output bits of the hash function MD5, choose whatever function you want instead. Thanks to the clumping one won't get a "secure" (= unpredictable) random sequence.
Maybe you can chain together 2 or more LCGs in a manner similar to that described for the LFSRs described here. Increment the least-significant LCG from its seed; on a full cycle, increment the next LCG. You only need to store a seed for each LCG. You could then weight each part and sum the parts together. To avoid repetitions in the "clumped" least-significant part, you can randomly reseed that LCG on each full cycle.
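Very roughly, the chaining might look like this in Python (a sketch with small illustrative moduli and my own names; the parts are simply combined as high*16 + low rather than weighted, and the random-reseed refinement is left out):

```python
def lcg_step(x):
    # full-period LCG mod 16 (satisfies the Hull-Dobell conditions)
    return (5 * x + 3) % 16

def chained(steps):
    """Low LCG advances every step; when it completes its 16-value
    cycle, the high LCG advances once. Combining the parts gives a
    sequence with no repeats within 16 * 16 = 256 steps."""
    lo, hi = 0, 0
    for step in range(steps):
        lo = lcg_step(lo)
        if step % 16 == 15:   # low generator just finished a full cycle
            hi = lcg_step(hi)
        yield hi * 16 + lo
```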