Two Egg problem confusion - algorithm

Two Egg problem:
You are given 2 eggs.
You have access to a 100-storey building.
Eggs can be very hard or very fragile, which means an egg may break if dropped from the first floor or may not break even if dropped from the 100th floor. Both eggs are identical.
You need to figure out the highest floor of the 100-storey building from which an egg can be dropped without breaking.
Now the question is how many drops you need to make. You are allowed to break both eggs in the process.
I am sure the two egg problem (mentioned above) has been discussed sufficiently. However, could someone help me understand why the following solution is not optimal?
Let's say I use a segment and scan algorithm with the segment size s.
So,
d/ds ( 100/s + (s - 1) ) = 0 [this should give the minimum; I need '(s-1)' scans per segment and there are '100/s' segments]
=> -100 / s^2 + 1 = 0
=> s^2 = 100
=> s = 10
So according to this I need at most 19 drops. But the optimal solution can do this with 14 drops.
So where lies the problem?

You seem to be assuming equal-sized segments. For an optimal solution, if the first segment is of size N, then the second has to be of size N-1, and so on (because when you start testing the second segment, you've already dropped the egg once for the first segment).

So you need the smallest n for which n+(n-1)+(n-2)+...+1 >= 100, that is, n(n+1)/2 >= 100 (the left-hand side is the sum of an arithmetic series). Solving for n (wolframalpha: Reduce[Floor[n + n^2] >= 200, n]) gives n = 14. So the first drop is from the 14th floor, the next from the (14+14-1)th floor, and the whole sequence is:
14; 27; 39; 50; 60; 69; 77; 84; 90; 95; 99; 100
If the first egg breaks, you go back to the floor just above the last drop it survived and check each floor in turn with the second egg until it breaks; at that point you have your answer. There is no magic.
http://mathworld.wolfram.com/ArithmeticSeries.html
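To compare the equal-segment plan from the question with the decreasing-step plan, both can be simulated. The sketch below is illustrative only: `worst_case_drops` and `decreasing_plan` are hypothetical helper names, and it assumes the first egg is dropped at each listed floor in order while the second egg scans upward from just above the last drop the first egg survived.

```python
def worst_case_drops(drop_floors, floors=100):
    """Worst-case number of drops for a two-egg plan.

    drop_floors: increasing floors for the first egg, ending at `floors`.
    """
    worst = 0
    for safe in range(floors + 1):  # highest floor the egg survives (0 = none)
        drops, prev = 0, 0
        for f in drop_floors:
            drops += 1
            if f > safe:  # first egg breaks at floor f
                # second egg scans prev+1 .. f-1 and stops as soon as it breaks
                last = min(safe + 1, f - 1)
                drops += last - prev
                break
            prev = f
        worst = max(worst, drops)
    return worst


def decreasing_plan(first_step, floors=100):
    """Drop floors with gaps first_step, first_step - 1, ..., capped at `floors`."""
    plan, floor, step = [], 0, first_step
    while floor < floors and step > 0:
        floor = min(floor + step, floors)
        plan.append(floor)
        step -= 1
    return plan


print(worst_case_drops(list(range(10, 101, 10))))  # equal segments of 10 -> 19
print(worst_case_drops(decreasing_plan(14)))       # gaps 14, 13, 12, ... -> 14
```

With equal segments the worst case grows by one for every segment the first egg survives; with shrinking gaps each extra first-egg drop is paid for by one fewer second-egg drop.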

The optimal solution is 13, 25, 36, 46, 55, 64, 72, 79, 85, 90, 94, 97, 99, 100: it minimizes the average number of trials needed to find the floor on which the egg breaks, assuming that floor is chosen uniformly at random.
Based on this, we can write a recursive function that minimizes the average number of trials; it produces exactly the sequence above.
It has the following maximum number of trials for each floor step:
13, 13, 13, 13, 13, 14, 14, 14, 14, 14, 14, 14, 14, 14
This improves on the naive solution obtained by starting with a gap of 14 and reducing it by one each time: the worst case is still 14 trials, but for 55% of the floors at most 13 trials are needed. The average of these per-step maxima, (13*5+14*9)/14 = 13.643, is very close to the real-valued solution n = 13.651 of n(n+1)/2 = 100.
Here is a quick implementation:
def get_max_trials(floors):
    # worst-case number of trials if the first egg breaks at each drop floor
    pf = 0
    trials = []
    for i, f in enumerate(floors):
        trials.append(i + f - pf)
        pf = f
    return trials

def get_trials_per_floor(floors):
    # return list of trials if egg is assumed at each floor
    pf = 0
    trials = []
    for i, f in enumerate(floors):
        for mid_f in range(pf + 1, f + 1):
            trial = (i + 1) + f - mid_f + 1
            if mid_f == pf + 1:
                trial -= 1
            trials.append(trial)
        pf = f
    return trials

def get_average(floors):
    # (average trials over all floors, worst-case trials)
    trials = get_trials_per_floor(floors)
    score = sum(trials)
    return score * 1.0 / floors[-1], max(trials)

floors_map = {}  # memo: N -> best drop floors for a building of N floors

def get_floors(N, level=0):
    if N == 1:
        return [1]
    if N in floors_map:
        return floors_map[N]
    best_floors = None
    best_score = None
    for i in range(1, N):
        base_floors = [f + i for f in get_floors(N - i, level + 1)]
        for floors in [base_floors, [i] + base_floors]:
            score = get_average(floors)
            if best_score is None or score < best_score:
                best_score = score
                best_floors = floors
    if N not in floors_map:
        floors_map[N] = best_floors
    return best_floors

floors = get_floors(100)
print("Solution:", floors)
print("max trials", get_max_trials(floors))
print("avg.", get_average(floors))
naive_floors = [14, 27, 39, 50, 60, 69, 77, 84, 90, 95, 99, 100]
print("naive_solution", naive_floors)
print("max trials", get_max_trials(naive_floors))
print("avg.", get_average(naive_floors))
Output:
Solution: [13, 25, 36, 46, 55, 64, 72, 79, 85, 90, 94, 97, 99, 100]
max trials [13, 13, 13, 13, 13, 14, 14, 14, 14, 14, 14, 14, 14, 14]
avg. (10.31, 14)
naive_solution [14, 27, 39, 50, 60, 69, 77, 84, 90, 95, 99, 100]
max trials [14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 12]
avg. (10.35, 14)

I also had the same thought in mind and was trying the exact method you described. One of the answers here cleared it up for me, but here is a bit more explanation.
N is defined as the minimum number of searches required.
I am trying to find a number n such that it is the minimum number of searches I have to make.
So I start at the xth floor, and I have 2 scenarios:
1)
It breaks. I have to do up to x-1 more checks (because I have only 1 more egg). All's fair there. The total is 1 + (x-1) = x searches.
We have defined this value as n, hence x = n. [PS: This might be trivial, but it has some subtleties IMO.]
2)
It doesn't break, and I have used up one of my n searches already.
The number of further searches allowed is now n - 1. Only then will the total number of searches be N, and that is the definition of N.
The problem has now become a subproblem of 100 - n floors with 2 eggs.
If I choose some yth floor now, its worst case should be n - 1. The (n - 1)th floor of the subproblem satisfies this.
Hence you get the pattern: go to the nth floor, then the (n + (n - 1))th floor, then the (n + (n - 1) + (n - 2))th floor, and so on. Solve this for the 100th floor and you get N.
That the floor you start with equals the number of searches is a coincidence, I think.
To get n = 14, you can think of having n bulbs with 2 bulbs glowing at once.
It requires at least 14 bulbs to cover all the possible combinations of where the egg can break.
As a challenge, try to do it for 3 eggs.
In your logic, basically, there is an asymmetry in how the search progresses.
For the first set of 10 elements, the algorithm finds the answer quickly.
I would suggest checking
http://ite.pubs.informs.org/Vol4No1/Sniedovich/ for some explanation, and also trying to visualize how this problem appears in real cases in networks.

I found a very nice explanation of the solution at the link below.
The Two Egg Problem
It explains how you get to n+(n-1)+(n-2)+...+1 >= 100.
It also covers the 1 egg problem (linear complexity, O(100))
and the multiple (infinite) eggs problem (logarithmic complexity, O(log2(100))).

Here's a solution in Python. If you drop the egg at a certain floor f, it either breaks or it doesn't, and in each case you have a certain number of floors you still need to check (which is a subproblem). It uses a recursion and also a lookup dictionary to make it much faster to compute.
neededDict = {}

# number of drops you need to make
def needed(eggs, floors):
    if floors == 0:
        # nothing left to check, so no drops are needed
        return 0
    if (eggs, floors) in neededDict:
        return neededDict[(eggs, floors)]
    if eggs == 1:
        return floors
    if eggs > 1:
        minimum = floors
        for f in range(floors):
            resultIfEggBreaks = needed(eggs - 1, f)
            resultIfEggSurvives = needed(eggs, floors - (f + 1))
            result = max(resultIfEggBreaks, resultIfEggSurvives)
            if result < minimum:
                minimum = result
        # 1 drop at best level f plus however many you need to handle all floors that remain unknown
        neededDict[(eggs, floors)] = 1 + minimum
        return 1 + minimum

print(needed(2, 100))
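An alternative way to reach the same number, which also extends to more eggs, is to ask how many floors can be covered with a given number of drops. The following is only a sketch and not part of the answer above; `max_floors` and `min_drops` are names chosen here for illustration. It relies on the recurrence that one drop splits the problem into the floors below it (one egg fewer, one drop fewer), the floors above it (same eggs, one drop fewer), plus the drop floor itself.

```python
def max_floors(drops, eggs):
    """Largest number of floors that can be resolved with `drops` drops and `eggs` eggs."""
    covered = [0] * (eggs + 1)  # covered[e] for the current number of drops
    for _ in range(drops):
        # go from high to low so covered[e - 1] still holds the previous round's value
        for e in range(eggs, 0, -1):
            covered[e] = covered[e] + covered[e - 1] + 1
    return covered[eggs]


def min_drops(eggs, floors):
    """Smallest number of drops that is guaranteed to be enough."""
    drops = 0
    while max_floors(drops, eggs) < floors:
        drops += 1
    return drops


print(min_drops(2, 100))  # 14
print(min_drops(3, 100))  # 9
```

This avoids the inner loop over every candidate floor, at the cost of not telling you directly which floor to drop from.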

The question should not be "how many drops do you need to make?" but rather "find the minimal number of drops needed to know where the egg breaks". I saw this problem on careercup; below are the algorithms I thought of.
There are two ways to solve this problem:
binary search for the first egg (which we risk in order to find out where we need to look), O(log2 n)
Fibonacci sequence search 1,2,3,5,8,13,21,34,55,89 for the first egg, O(log n) http://en.wikipedia.org/wiki/Fibonacci_search_technique
Once the first egg is broken, we know in which interval we need to look:
Binary example:
we try floor 100/2 (50); if it breaks, we search from 1 to 50 incrementing by 1; if not, we throw it from 50+50/2 (75); if it breaks, we search from 50 to 75; if not, we throw it from 75+25/2 (87); if it breaks, we search from 75 to 87 incrementing by one floor at a time, and so on.
Fibonacci example: same thing: we try 1,2,3,5,8,13,...; if the first egg
breaks, we go back to the last interval's minimum and increment by 1.

Hey, what about this approach?
Try this sequence:
1,2,4,8,16,32,64,100
Once you find where the egg breaks, you will get a range to work on.
Let's suppose the egg breaks at #64. Then the answer lies between 32 and 64.
We can use normal binary search between those two numbers.
We check #48, i.e. (32+64)/2, which shortlists either the upper half or the lower half, and repeat.
In this case the worst case is having the floor at 99, which will take 14 tries.

The explanation of the two eggs problem can be confusing at first, so we can understand the solution as follows:
Let x be the floor from which we start dropping the eggs:
- If it breaks, the total number of trials in the worst case is 1 + (x - 1) = x.
- If it doesn't break, how far should we step up to the next floor? We could jump to floor (x + x), (x + x + 1), ..., but that increases the number of trials. Try x = 10:
. If it breaks, we need 10 trials in total in the worst case.
. If it does not break, and we step up to floor 10 + 10 = 20 and it breaks there, we need 1 (at floor 10) + 1 (at floor 20) + 9 = 11 trials. Similarly, stepping up by x + 1 or x + 2 floors increases the number of trials further.
We actually want the number of trials to be equal in both cases, so we step up by x - 1 floors instead of x, x + 1, etc. In general, this gives the expression
x + (x - 1) + (x - 2) + ... + 1.
And that's it.
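The smallest whole x for which x + (x - 1) + ... + 1 reaches 100 can also be found in closed form, because the sum equals x(x + 1)/2. A small sketch (the function name `first_drop_floor` is made up here, and `math.isqrt` needs Python 3.8 or later):

```python
import math


def first_drop_floor(floors):
    """Smallest x with x * (x + 1) / 2 >= floors."""
    # integer part of the positive root of x^2 + x - 2 * floors = 0
    x = (math.isqrt(8 * floors + 1) - 1) // 2
    if x * (x + 1) // 2 < floors:
        x += 1  # the root was not a whole number, so round up
    return x


print(first_drop_floor(100))  # 14
```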

I would say the optimal solution for 100 floors with two eggs is 13 tries, not 14.
13, 25, 36, 46, 55, 64, 72, 79, 85, 90, 94, 97, 99, 100 is the optimal answer, but if I reach 99 I do not really need to try 100. The correct answer is obvious without dropping an egg from the 100th floor :D


Approaching Dynamic Programming problem / Two restrictions

Given an array A of n integers and k <= n, we want to choose k numbers from this array and split them to pairs, such that the sum of the differences of those pairs (in absolute value) is minimal.
Example: If n = 8 and k = 6 and the array is A = [140, 100, 92, 21, 32, 48, 32, 100], then the optimal answer is 27.
Does someone have an idea?
Where do I start from in this problem?
I'm really bad at DP problems, so I would appreciate an informative answer describing the right approach to solve the problem.
Thanks in advance.
Sort the elements. Now pairs ought to be made only between neighbors (for cases like 10,20,20,30, pairing 10/20 + 20/30 gives the same result as 10/30 + 20/20; for cases like 10,14,20, the pair 10/20 is worse than 10/14 or 14/20).
Walk through the array.
If a pair was opened with the previous element, there is only one possibility: close that pair with the current element.
If there is no open pair and the number of closed pairs is less than k/2, we have two possibilities: start a pair or skip the current element (if the number of elements in the rest of the array is larger than the number we must still use), and we have to choose the best result of these cases.
So we can build a recursion and then transform it into DP (the code below is not DP yet; it builds the full solution tree).
A = [140, 100, 92, 21, 32, 48, 32, 100]
n = len(A)
k = 6

def best(idx, openstate, pairsleft):
    if pairsleft > (n - idx + 1)//2:
        return 10000000
    if pairsleft == 0:
        return 0
    if openstate:
        return abs(A[idx] - A[idx-1]) + best(idx + 1, False, pairsleft - 1)
    else:
        return(min(best(idx + 1, True, pairsleft), best(idx + 1, False, pairsleft)))

A.sort()
print(best(0, False, k//2))
>> 27
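One possible way to turn this recursion into DP is memoization. The sketch below is a variant rather than the answer's own code: `min_pair_diff_sum` is a name chosen here, the open/closed flag is folded into a single "pair the current element with its right neighbor" step, and `functools.lru_cache` stores the result for each (index, pairs left) state so it is computed only once.

```python
from functools import lru_cache


def min_pair_diff_sum(values, k):
    """Minimal sum of |a - b| over k // 2 disjoint pairs chosen from values."""
    a = sorted(values)
    n = len(a)

    @lru_cache(maxsize=None)
    def best(idx, pairs_left):
        if pairs_left == 0:
            return 0
        if n - idx < 2 * pairs_left:
            return float("inf")  # not enough elements left
        # either pair a[idx] with its right neighbor, or skip a[idx]
        take = (a[idx + 1] - a[idx]) + best(idx + 2, pairs_left - 1)
        skip = best(idx + 1, pairs_left)
        return min(take, skip)

    return best(0, k // 2)


print(min_pair_diff_sum([140, 100, 92, 21, 32, 48, 32, 100], 6))  # 27
```

There are at most n * (k/2 + 1) distinct states, so the running time is O(n*k) after sorting instead of exponential.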

Unique random string with alphanumberic required in Ruby

I'm using the following code to generate a unique 10-character random string of [A-Z a-z 0-9] in Ruby:
random_code = [*('a'..'z'),*('0'..'9'),*('A'..'Z')].shuffle[0, 10].join
However, sometimes this random string does not contain a number or an uppercase character. Could you help me have a method that generates a unique random string that requires at least one number, one uppercase and one downcase character?
down = ('a'..'z').to_a
up = ('A'..'Z').to_a
digits = ('0'..'9').to_a
all = down + up + digits
[down.sample, up.sample, digits.sample].
concat(7.times.map { all.sample }).
shuffle.
join
#=> "TioS8TYw0F"
Edit: The above reflects a misunderstanding of the question. I'll leave it, however. To have no character appear more than once:
def rnd_str
  down = ('a'..'z').to_a
  up = ('A'..'Z').to_a
  digits = ('0'..'9').to_a
  [extract1(down), extract1(up), extract1(digits)].
    concat(((down+up+digits).sample(7))).shuffle.join
end

def extract1(arr)
  i = arr.size.times.to_a.sample
  c = arr[i]
  arr.delete_at(i)
  c
end
rnd_str #=> "YTLe0WGoa1"
rnd_str #=> "NrBmAnE9bT"
down.sample.shift (etc.) would have been more compact than extract1, but the inefficiency was just too much to bear.
If you do not want to repeat random strings, simply keep a list of the ones you generate. If you generate another that is in the list, discard it and generate another. It's pretty unlikely you'll have to generate any extra ones, however. If, for example, you generate 100 random strings (satisfying the requirement of at least one lowercase letter, uppercase letter and digit), the chances that there will be one or more duplicate strings is about one in 700,000:
t = 107_518_933_731
n = t+1
t = t.to_f
(1.0 - 100.times.reduce(1.0) { |prod,_| prod * (n -= 1)/t }).round(10)
#=> 1.39e-07
where t = C(62,10) and C(62,10) is defined below.
An alternative
There is a really simple way to do this that turns out to be pretty efficient: just sample without replacement until a sample is found that meets the requirement of at least one lowercase letter, one uppercase letter and one digit. We can do that as follows:
DOWN = ('a'..'z').to_a
UP = ('A'..'Z').to_a
DIGITS = ('0'..'9').to_a
ALL = DOWN + UP + DIGITS

def rnd_str
  loop do
    arr = ALL.sample(10)
    # Array#& is set intersection; an empty result means that class is missing
    break arr.shuffle.join unless (DOWN & arr).empty? || (UP & arr).empty? ||
      (DIGITS & arr).empty?
  end
end

rnd_str #=> "3jRkHcP7Ge"
rnd_str #=> "B0s81x4Jto"
How many samples must we reject, on average, before finding a "good" one? It turns out (see below if you are really, really interested) that the probability of getting a "bad" string (i.e., selecting 10 characters at random from the 62 elements of all, without replacement, and getting a sample that has no lowercase letters, no uppercase letters or no digits) is only about 0.15 (15%). That means that 85% of the time no bad samples will be rejected before a good one is found.
It turns out that the expected number of bad strings that will be sampled, before a good string is sampled, is:
0.15/0.85 =~ 0.17
The following shows how the above probability was derived, should anyone be interested.
Let n_down be the number of ways a sample of 10 can be drawn that has no lowercase letters:
n_down = C(36,10) = 36!/(10!*(36-10)!)
where (the binomial coefficient) C(36,10) equals the number of combinations of 36 "things" that can be "taken" 10 at a time, and equals:
C(36,10) = 36!/(10!*(36-10)!) #=> 254_186_856
Similarly,
n_up = n_down #=> 254_186_856
and
n_digits = C(52,10) #=> 15_820_024_220
We can add these three numbers together to obtain:
n_down + n_up + n_digits #=> 16_328_397_932
This is almost, but not quite, the number of ways to draw 10 characters, without replacement, such that the sample contains no lowercase letters, no uppercase letters or no digits. "Not quite" because there is a bit of double-counting going on: samples made up only of uppercase letters, or only of lowercase letters, are counted twice (C(26,10) each), as is the single sample made up of all ten digits. The necessary adjustment is as follows:
n_down + n_up + n_digits - 2*C(26,10) - 1
#=> 16_317_774_461
To obtain the probability of drawing a sample of 10 from a population of 62, without replacement, that has no lowercase letter, no uppercase letter or no digit, we divide this number by the total number of ways 10 characters can be drawn from 62 without replacement:
(16_317_774_461.0/C(62,10)).round(2)
#=> 0.15
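The count can be cross-checked by inclusion-exclusion in a few lines. The sketch below uses Python's `math.comb` (Python 3.8 or later) purely as a calculator; the variable names are chosen here for illustration and do not appear in the Ruby code.

```python
from math import comb

no_lower = comb(36, 10)   # drawn only from uppercase letters and digits
no_upper = comb(36, 10)   # drawn only from lowercase letters and digits
no_digit = comb(52, 10)   # drawn only from letters

# samples that miss two classes at once were counted twice above
only_upper = comb(26, 10)
only_lower = comb(26, 10)
only_digits = comb(10, 10)

bad = no_lower + no_upper + no_digit - only_upper - only_lower - only_digits
total = comb(62, 10)
print(round(bad / total, 2))  # 0.15
```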
If you want a script to generate just a small number of tokens (like 2, 5, 10, 100, 1000, 10 000, etc.), then the best way is simply to keep the already generated tokens in memory and retry until a new one is generated (statistically speaking, this won't take long). If this is not the case, keep reading.
After thinking about it, this problem turned out to be very interesting. For brevity, I will not mention the requirement to have at least one number, one capital letter and one lowercase letter, but it will be included in the final solution. Also let all = [*'1'..'9', *'a'..'z', *'A'..'Z'].
To sum it up, we want to randomly generate k-permutations of n elements with repetition, under a uniqueness constraint.
k = 10, n = 61 (all.size)
Ruby just so happens to have such a method: Array#repeated_permutation. So everything is great, we can just use:
all.repeated_permutation(10).to_a.map(&:join).shuffle
and pop the resulting strings one by one, right? Wrong! The problem is that the number of possibilities happens to be:
n^k = 713_342_911_662_882_601 (61**10).
Even if you had an infinitely fast processor, you still couldn't hold that amount of data, no matter whether it was a count of complex objects or simple bits.
The opposite would be to generate random permutations, keep the already generated ones in a set and check for inclusion before returning the next element. This just delays the inevitable: not only would you still have to hold the same amount of information at some point, but as the number of generated permutations grows, the number of tries required to generate a new permutation diverges to infinity.
As you might have thought, the root of the problem is that randomness and uniqueness hardly go hand in hand.
Firstly, we would have to define what we would consider as random. Judging by the amount of nerdy comics on the subject, you could deduce that this isn't that black and white either.
An intuitive definition for a random program would be one that doesn't generate the tokens in the same order with each execution. Great, so now we can just take the first n permutations (where n = rand(100)), put them at the end and enumerate everything in order? You can sense where this is going. In order for a random generation to be considered good, the generated outputs of consecutive runs should be equally distributed. In simpler terms, the probability of getting any possible output should be equal to 1 / #__all_possible_outputs__.
Now let's explore the boundaries of our problem a little:
The number of possible k-permutations of n elements without repetition is:
n!/(n-k)! = 327_234_915_316_108_800 ((61 - 10 + 1).upto(61).reduce(:*))
Still out of reach. Same goes for
The number of possible full permutations of n elements without repetition:
n! = 507_580_213_877_224_798_800_856_812_176_625_227_226_004_528_988_036_003_099_405_939_480_985_600_000_000_000_000 (1.upto(61).reduce(:*))
The number of possible k-combinations of n elements without repetition:
n!/k!(n-k)! = 90_177_170_226 ((61 - 10 + 1).upto(61).reduce(:*)/1.upto(10).reduce(:*))
Finally, where we might have a break through with full permutation of k elements without repetition:
k! = 3_628_800 (1.upto(10).reduce(:*))
~3.6m isn't nothing, but at least it's reasonably computable. On my personal laptop, k_permutations = 0.upto(9).to_a.permutation.to_a took 2.008337 seconds on average. Generally, as computing time goes, this is a lot. However, assuming that you will be running this on an actual server and only once per application startup, this is nothing. In fact, it would even be reasonable to create some seeds. A single k_permutations.shuffle took 0.154134 seconds, therefore in about a minute we can acquire 61 random permutations: k_seeds = 61.times.map { k_permutations.shuffle }.to_a.
Now let's try to convert the problem of k-permutations of n elements without repetition into solving full k-permutations without repetition multiple times.
A cool trick for generating permutations is using numbers and bitmaps. The idea is to generate all numbers from 0 to 2^61 - 1 and look at the bits. If there is a 1 on position i, we will use the all[i] element, otherwise we will skip it. We still haven't escaped the issue, as 2^61 = 2305843009213693952 (2**61), which we can't keep in memory.
Fortunately, another cool trick comes to the rescue, this time from number theory.
Any m consecutive numbers raised to the power of a prime number by modulo of m give the numbers from 0 to m - 1
In other words:
5.upto(65).map { |number| number**17 % 61 }.sort # => [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60]
5.upto(65).map { |number| number**17 % 61 } # => [36, 31, 51, 28, 20, 59, 11, 22, 47, 48, 42, 12, 54, 26, 5, 34, 29, 57, 24, 53, 15, 55, 3, 38, 21, 18, 43, 40, 23, 58, 6, 46, 8, 37, 4, 32, 27, 56, 35, 7, 49, 19, 13, 14, 39, 50, 2, 41, 33, 10, 30, 25, 16, 9, 17, 60, 0, 1, 44, 52, 45]
Now actually, how random is that? As it turns out, the more common divisors shared by m and the selected m numbers, the less evenly distributed the sequence is. But we are in luck here: 2^61 - 1 is a prime number (a so-called Mersenne prime). Therefore, the only divisors it can share are 1 and 2^61 - 1. This means that no matter what power we choose, the positions of the numbers 0 and 1 will be fixed. That is not perfect, but the other 2^61 - 3 numbers can be found at any position. And guess what: we don't care about 0 and 1 anyway, because they don't have ten 1s in their binary representation!
Unfortunately, a bottleneck for our randomness is the fact that the bigger the prime number we want to generate, the harder it gets. This is the best I can come up with when it comes to generating all the numbers in a range in a shuffled order, without keeping them in memory simultaneously.
So to put everything in use:
We generate seeds of full permutations of 10 elements.
We generate a random prime number.
We randomly choose if we want to generate permutations for the next number in the sequence or a number that we already started (up to a finite number of started numbers).
We use bitmaps of the generated numbers to get said permutations.
Note that this will solve only the problem of k-permutations of n elements without repetition. I still haven't thought of a way to add repetition.
DISCLAIMER: The following code comes with no guarantees of any kind, explicit or implied. Its point is to further express the author's ideas, rather than be a production ready solution:
require 'prime'

class TokenGenerator
  NUMBERS_UPPER_BOUND = 2**61 - 1
  HAS_NUMBER_MASK = ('1' * 9 + '0' * (61 - 9)).reverse.to_i(2)
  HAS_LOWER_CASE_MASK = ('0' * 9 + '1' * 26 + '0' * 26).reverse.to_i(2)
  HAS_UPPER_CASE_MASK = ('0' * (9 + 26) + '1' * 26).reverse.to_i(2)
  ALL_CHARACTERS = [*'1'..'9', *'a'..'z', *'A'..'Z']
  K_PERMUTATIONS = 0.upto(9).to_a.permutation.to_a # give it a couple of seconds

  def initialize
    random_prime = Prime.take(10_000).drop(100).sample
    @all_numbers_generator = 1.upto(NUMBERS_UPPER_BOUND).lazy.map do |number|
      number**random_prime % NUMBERS_UPPER_BOUND
    end.select do |number|
      !(number & HAS_NUMBER_MASK).zero? and
        !(number & HAS_LOWER_CASE_MASK).zero? and
        !(number & HAS_UPPER_CASE_MASK).zero? and
        number.to_s(2).chars.count('1') == 10
    end
    @k_permutation_seeds = 61.times.map { K_PERMUTATIONS.shuffle }.to_a # this will take a minute
    @numbers_in_iteration = {go_fish: nil}
  end

  def next
    raise StopIteration if @numbers_in_iteration.empty?
    number_generator = @numbers_in_iteration.keys.sample
    if number_generator == :go_fish
      add_next_number if @numbers_in_iteration.size < 1_000_000
      self.next
    else
      next_permutation(number_generator)
    end
  end

  private

  def add_next_number
    @numbers_in_iteration[@all_numbers_generator.next] = @k_permutation_seeds.sample.to_enum
  rescue StopIteration # lol, you actually managed to traverse all 2^61 numbers!
    @numbers_in_iteration.delete(:go_fish)
  end

  def next_permutation(number)
    fetch_permutation(number, @numbers_in_iteration[number].next)
  rescue StopIteration # all k permutations for this number were already generated
    @numbers_in_iteration.delete(number)
    self.next
  end

  def fetch_permutation(number_mask, k_permutation)
    k_from_n_indices = number_mask.to_s(2).chars.reverse.map.with_index { |bit, index| index if bit == '1' }.compact
    k_permutation.each_with_object([]) { |order_index, k_from_n_values| k_from_n_values << ALL_CHARACTERS[k_from_n_indices[order_index]] }
  end
end
EDIT: it turned out that our constraints eliminate too many possibilities. This causes @all_numbers_generator to take too much time testing and skipping numbers. I will try to think of a better generator, but everything else remains valid.
The old version that generates tokens with uniqueness constraint on the containing characters:
numbers = ('0'..'9').to_a
downcase_letters = ('a'..'z').to_a
upcase_letters = downcase_letters.map(&:upcase)
all = [numbers, downcase_letters, upcase_letters]
one_of_each_set = all.map(&:sample)
random_code = (one_of_each_set + (all.flatten - one_of_each_set).sample(7)).shuffle.join
Use the 'SafeRandom' gem (GitHub link).
It provides an easy way to generate random values and is compatible with Rails 2, Rails 3, Rails 4 and Rails 5.
Here you can use the strong_string method to generate a strong string, i.e. a combination of letters (uppercase and lowercase), numbers and symbols.
# => Strong string: the minimum length should be greater than 5; otherwise an 8-character string is generated by default.
require 'safe_random'
puts SafeRandom.strong_string # => 4skgSy93zaCUZZCoF9WiJF4z3IDCGk%Y
puts SafeRandom.strong_string(3) # => P4eUbcK%
puts SafeRandom.strong_string(5) # => 5$Rkdo

Find the smallest sum of the squares of two measurements taken at least 5 min apart

I'm trying to solve this problem in Python3. I know how to find min1 and min2, but I cannot guess how to search 5 elements in a single pass.
Problem Statement
The input program serves measurements performed by a device at intervals of 1 minute. All data are in natural numbers not exceeding 1000. The problem is to find the smallest sum of the squares of two measurements performed at intervals not less than 5 minutes apart. The first line will contain one natural number -- the number of measurements N. It is guaranteed that 5 < N <= 10000. Each of the following N lines contains one natural number -- the result of the next measurement.
Your program should output a single number, the lowest sum of the squares of two measurements performed at intervals not less than 5 minutes apart.
Sample input:
9
12
45
5
4
21
20
10
12
26
Expected output: 169
I like this question. Fun brain-teaser. :)
I noticed your sample input was all integers in range(1, 100) with some repetition, so I generated sample lists like so:
>>> import random
>>> sample_list = [random.choice(range(1, 100)) for i in range(10)]
>>> sample_list
[74, 68, 57, 18, 36, 8, 89, 73, 77, 80]
According to the problem statement, these numbers represent data measured at one-minute intervals, and one of our constraints is that our result must represent data gathered at least five minutes apart. Ultimately, that means the indices of the data in the original list must differ by at least five. In other words, for any two inputs v1 and v2:
abs(sample_list.index(v1) - sample_list.index(v2)) >= 5
must be true. We also know that we're searching for the smallest sum, so it will be helpful to look at the smallest numbers first.
Thus, I started by mapping the values in the sample_list to the indices where they occur, then sorting them:
>>> occurrences = {}
>>> for index, value in enumerate(sample_list):
...     try:
...         occurrences[value].append(index)
...     except KeyError:
...         occurrences[value] = [index]
...
>>> occurrences
{80: [9], 18: [3], 68: [1], 73: [7], 89: [6], 8: [5], 57: [2], 74: [0], 77: [8], 36: [4]}
>>> sorted_occurrences = sorted(occurrences)
>>> sorted_occurrences
[8, 18, 36, 57, 68, 73, 74, 77, 80, 89]
After a whole lot of trial and error, here's what I finally came up with in function form (including some of the earlier-discussed pieces):
def smallest_sum_of_squares_five_apart(sample):
    occurrences = {}
    for index, value in enumerate(sample):
        try:
            occurrences[value].append(index)
        except KeyError:
            occurrences[value] = [index]
    sorted_occurrences = sorted(occurrences)
    least_sum = 0
    for index, v1 in enumerate(sorted_occurrences):
        if least_sum and v1**2 > least_sum:
            return least_sum
        for v2 in sorted_occurrences[:index+1]:
            if (abs(max(occurrences[v1]) - min(occurrences[v2])) >= 5 or
                    abs(max(occurrences[v2]) - min(occurrences[v1])) >= 5):
                print('Found candidates:', str((v1, v2)))
                sum_of_squares = v1**2 + v2**2
                if not least_sum or sum_of_squares < least_sum:
                    least_sum = sum_of_squares
    return least_sum
The idea here is to:
Start by looking at the smallest values first.
Compare them one by one with all the values smaller, up to themselves.
Check each against our criteria. Notice we do this by checking the extremes of each, where these two numbers occur the farthest away from one another in the original sample.
Break out when checking becomes pointless.
Unfortunately, it is not sufficient to find the first one. Depending how the list is constructed, it will not always find the smallest pair first this way. In fact, it does not for your own sample input. However, once v1**2 (the square of the larger value) is larger than the sum, we know since all numbers are natural numbers it is pointless to continue looking.
I have included a full runnable implementation of this below. It takes a command line argument (default 10) indicating the number of items you want in the randomly generated sample. It will print the randomly generated sample as well as all candidate pairs it checked, and finally the sum itself. I have checked this on 10-sized inputs several times and it seems to be working in general. However, feedback is welcome if it is not correct. Note also you can uncomment your sample list from the question to see how it works (and that it gets the right answer) for it.
import random
import sys

def smallest_sum_of_squares_five_apart(sample):
    occurrences = {}
    for index, value in enumerate(sample):
        try:
            occurrences[value].append(index)
        except KeyError:
            occurrences[value] = [index]
    sorted_occurrences = sorted(occurrences)
    least_sum = 0
    for index, v1 in enumerate(sorted_occurrences):
        if least_sum and v1**2 > least_sum:
            return least_sum
        for v2 in sorted_occurrences[:index+1]:
            if (abs(max(occurrences[v1]) - min(occurrences[v2])) >= 5 or
                    abs(max(occurrences[v2]) - min(occurrences[v1])) >= 5):
                print('Found candidates:', str((v1, v2)))
                sum_of_squares = v1**2 + v2**2
                if not least_sum or sum_of_squares < least_sum:
                    least_sum = sum_of_squares
    return least_sum

if __name__ == '__main__':
    try:
        r = int(sys.argv[1])
    except IndexError:
        r = 10
    sample_list = [random.choice(range(1, 100)) for i in range(r)]
    #sample_list = [9, 12, 45, 5, 4, 21, 20, 10, 12, 26]
    print(sample_list)
    print(smallest_sum_of_squares_five_apart(sample_list))
Try this:
#!/usr/bin/env python3
import queue

inp = [9,12,45,5,4,21,20,10,12,26]
q = queue.Queue()   #Make a new queue
smallest = False    #No smallest number, yet
best = False        #No best sum of squares, yet
for x in inp:
    q.put(x)        #Place current element on queue
    #If there's an item from more than five minutes ago, consider it
    if q.qsize()>5:
        temp = q.get()          #Pop oldest item from queue into temporary variable
        if not smallest:        #If this is the first item more than 5 minutes old
            smallest = temp     #it is the smallest item by default
        else:                   #otherwise...
            smallest = min(temp,smallest)  #only store it if it is the smallest yet
        #If we have no best sum of squares or the current item produces one, then
        #save it as the best
        if (not best) or (x*x+smallest*smallest<best):
            best = x*x+smallest*smallest
print(best)
The idea is to walk through the queue keeping track of the smallest element we have seen yet which is older than five minutes and comparing it against the newest element keeping track of the smallest sum of squares as we go.
I think you'll find the answer to be pretty intuitive if you think about it.
The algorithm operates in O(N) time.
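As a sanity check, the O(N) idea can be compared against a plain O(N^2) scan over all pairs. The sketch below assumes "five minutes" means "at least five positions apart in the list" and that all values are non-negative; the function names and the gap parameter are mine, not from the question.

```python
def brute_force(sample, gap=5):
    """Check every pair of positions at least `gap` apart (O(N^2))."""
    best = None
    for i in range(len(sample)):
        for j in range(i + gap, len(sample)):
            candidate = sample[i] ** 2 + sample[j] ** 2
            if best is None or candidate < best:
                best = candidate
    return best


def sliding_minimum(sample, gap=5):
    """Single pass: pair each element with the smallest one at least `gap` back (O(N))."""
    smallest = None  # smallest value among positions <= j - gap
    best = None
    for j in range(gap, len(sample)):
        old = sample[j - gap]
        smallest = old if smallest is None else min(smallest, old)
        candidate = sample[j] ** 2 + smallest ** 2
        if best is None or candidate < best:
            best = candidate
    return best


sample = [9, 12, 45, 5, 4, 21, 20, 10, 12, 26]
print(brute_force(sample), sliding_minimum(sample))  # prints: 169 169
```

Both return None when the list is too short to contain a valid pair.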

Am I right with this Double Hashing?

I am learning about double hashing and I am having difficulty understanding how it works. I have worked through an example but I don't know whether it's right or wrong. It would be great if someone could help me.
This is the input:
m = 13
k = { 5, 14, 29, 25, 17, 21, 18, 32, 20, 9, 15, 27 }
h1(k) = k mod 13
h2(k) = 1 + (k mod 11)
That will work as long as m is prime.
Otherwise h2(x) could evaluate to a number that shares a factor with m, which could make the algorithm fail to find a free slot even when there is still room for more items.
For example:
m = 36
h1(x) = 1
h2(x) = 30
If table[1], table[31], table[25], table[19], table[13] and table[7] are all used, then the next slot that will be checked is table[1] again.
If h2(x) is relatively prime to m, the cycle will always visit all slots before returning to the starting point. If m is prime, every step size from 1 to m-1 is relatively prime to m.
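A small sketch can make both points concrete. It assumes the usual probe formula (h1(k) + i * h2(k)) mod m for i = 0, 1, 2, ...; the helper names probe_cycle and insert_all are mine.

```python
from math import gcd


def probe_cycle(start, step, m):
    """List the distinct slots visited by (start + i*step) mod m before it repeats."""
    slots = []
    slot = start % m
    while slot not in slots:
        slots.append(slot)
        slot = (slot + step) % m
    return slots


def insert_all(keys, m, h1, h2):
    """Insert keys into an open-addressing table of size m using double hashing."""
    table = [None] * m
    for key in keys:
        for i in range(m):
            slot = (h1(key) + i * h2(key)) % m
            if table[slot] is None:
                table[slot] = key
                break
        else:
            raise RuntimeError(f"no free slot found for {key}")
    return table


# The bad case from above: step 30 shares the factor 6 with m = 36.
print(probe_cycle(1, 30, 36))  # [1, 31, 25, 19, 13, 7]
print(36 // gcd(30, 36))       # 6 slots reachable out of 36

# The input from the question: m = 13 is prime, so every step size reaches every slot.
keys = [5, 14, 29, 25, 17, 21, 18, 32, 20, 9, 15, 27]
print(insert_all(keys, 13, lambda k: k % 13, lambda k: 1 + k % 11))
```

The number of reachable slots is always m / gcd(step, m), which is why a prime m (or any m coprime to every possible step) is the safe choice.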

Non-repeating pseudo random number stream with 'clumping'

I'm looking for a method to generate a pseudorandom stream with a somewhat odd property - I want clumps of nearby numbers.
The tricky part is, I can only keep a limited amount of state no matter how large the range is. There are algorithms that give a sequence of results with minimal state (linear congruence?)
Clumping means that there's a higher probability that the next number will be close rather than far.
Example of a desirable sequence (mod 10): 1 3 9 8 2 7 5 6 4
I suspect this would be more obvious with a larger stream, but difficult to enter by hand.
Update:
I don't understand why it's impossible, but yes, I am looking for, as Welbog summarized:
Non-repeating
Non-Tracking
"Clumped"
Cascade a few LFSRs with periods smaller than you need, combining them to get a result such that the fastest changing register controls the least significant values. So if you have L1 with period 3, L2 with period 15 and L3 with some larger period, N = L1(n) + 3 * L2(n/3) + 45 * L3(n/45). This will obviously generate 3 clumped values, then jump and generate another 3 clumped values. Use something other than multiplication (such as mixing some of the bits of the higher period registers) or different periods to make the clump spread wider than the period of the first register. It won't be particularly smoothly random, but it will be clumpy and non-repeating.
For the record, I'm in the "non-repeating, non-random, non-tracking is a lethal combination" camp, and I hope some simple thought experiments will shed some light. This is not a formal proof by any means. Perhaps someone will shore it up.
So, I can generate a sequence that has some randomness easily:
Given x_i, x_(i+1) ~ U(x_i, r), where r > x_i.
For example:
if x_i = 6, x_(i+1) is a random choice from (6+epsilon, some_other_real>6). This guarantees non-repeating, but at the cost that the sequence is monotonically increasing.
Without some condition (like monotonicity), inherent to the sequence of generated numbers themselves, how else can you guarantee uniqueness without carrying state?
Edit: After looking into RBarryYoung's suggestion of "Linear Congruential Generators" (not differentiators; I assume this is what RBY meant), it is clear that I was wrong! These sequences exist, and by necessity, any PRNG whose next number depends only on the current number and some global, unchanging state can't have repeats within a cycle (after some initial burn-in period).
By defining the "clumping features" in terms of the probability distribution of its size, and the probability distribution of its range, you can then use simple random generators with the underlying distribution and produce the sequences.
One way to get "clumpy" numbers would be to use a normal distribution.
You start the random list with your "initial" random value, then you generate a random number with the mean of the previous random value and a constant variance, and repeat as necessary. The overall variance of your entire list of random numbers should be approximately constant, but the "running average" of your numbers will drift randomly with no particular bias.
>>> import random
>>> r = [1]
>>> for x in range(20):
...     r.append(random.normalvariate(r[-1], 1))
...
>>> r
[1, 0.84583267252801408, 0.18585962715584259, 0.063850022580489857, 1.2892164299497422,
0.019381814281494991, 0.16043424295472472, 0.78446377124854461, 0.064401889591144235,
0.91845494342245126, 0.20196939102054179, -1.6521524237203531, -1.5373703928440983,
-2.1442902977248215, 0.27655425357702956, 0.44417440706703393, 1.3128647361934616,
2.7402744740729705, 5.1420432435119352, 5.9326297626477125, 5.1547981880261782]
I know it's hard to tell by looking at the numbers, but you can sort of see that the numbers clump together a little bit - the 5.X's at the end, and the 0.X's on the second row.
If you need only integers, you can just use a very large mean and variance, and truncate/divide to obtain integer output. The normal distribution is by definition continuous, meaning all real numbers are potential output; it is not restricted to integers.
Here's a quick scatter plot in Excel of 200 numbers generated this way (starting at 0, constant variance of 1):
scatter data http://img178.imageshack.us/img178/8677/48855312.png
Ah, I just read that you want non-repeating numbers. No guarantee of that in a normal distribution, so you might have to take into account some of the other approaches others have mentioned.
I don't know of an existing algorithm that would do this, but it doesn't seem difficult to roll your own (depending on how stringent the "limited amount of state" requirement is). For example:
RANGE = (1..1000)
CLUMP_ODDS = 0.5
CLUMP_DIST = 10
last = rand(RANGE)
while still_want_numbers # placeholder for whatever stopping condition you need
  if rand < CLUMP_ODDS # clump!
    # "next" is a Ruby keyword, so the variable is called nxt
    nxt = last + rand(CLUMP_DIST) - (CLUMP_DIST / 2)
    nxt = nxt.clamp(RANGE.min, RANGE.max) # boundary checking
  else # don't clump!
    nxt = rand(RANGE)
  end
  puts nxt
  last = nxt
end
It's a little rudimentary, but would something like that suit your needs?
In the range [0, 10) the following should give a uniform distribution. random() yields a (pseudo) random number r with 0 <= r < 1.
x(n + 1) = (x(n) + 5 * (2 * random() - 1)) mod 10
You can get your desired behavior by delinearizing random(): for example, random()^k will be skewed towards small numbers for k > 1. A possible function could be the following, but you will have to try some exponents to find your desired distribution. And keep the exponent odd if you use the following function, so that the sign of (2 * random() - 1) is preserved... ;)
x(n + 1) = (x(n) + 5 * (2 * random() - 1)^3) mod 10
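A minimal Python sketch of this recurrence, to make it easy to try different exponents. The function name, the seed parameter and the default values are my own choices; note that it produces real numbers and does nothing to prevent repeats.

```python
import random


def clumpy_walk(count, start=0.0, exponent=3, seed=None):
    """Iterate x(n+1) = (x(n) + 5 * (2*random() - 1)**exponent) mod 10."""
    rng = random.Random(seed)
    x = start
    values = []
    for _ in range(count):
        # An odd exponent keeps the sign, so steps go both up and down,
        # but small steps become much more likely than large ones.
        step = 5 * (2 * rng.random() - 1) ** exponent
        x = (x + step) % 10
        values.append(x)
    return values


print([round(v, 2) for v in clumpy_walk(10, exponent=3, seed=42)])
```

With exponent=1 the steps are uniform in [-5, 5); raising the exponent shrinks the typical step and so tightens the clumps.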
How about (pseudocode)
// clumpiness static in that value retained between calls
static float clumpiness = 0.0f; // from 0 to 1.0
method getNextvalue(int lastValue)
    float r = rand(); // float from 0 to 1
    int change = MAXCHANGE * (r - 0.5) * (1 - clumpiness);
    clumpiness += 0.1 * rand() ;
    if (clumpiness >= 1.0) clumpiness -= 1.0;
    // -----------------------------------------
    return Round(lastValue + change);
Perhaps you could generate a random sequence, and then do some strategic element swapping to get the desired property.
For example, if you find 3 values a,b,c in the sequence such that a>b and a>c, then with some probability you could swap elements a and b or elements a and c.
EDIT in response to comment:
Yes, you could have a buffer on the stream that is whatever size you are comfortable with. Your swapping rules could be deterministic, or based on another known, reproducible pseudo-random sequence.
Does a sequence like 0, 94, 5, 1, 3, 4, 14, 8, 10, 9, 11, 6, 12, 7, 16, 15, 17, 19, 22, 21, 20, 13, 18, 25, 24, 26, 29, 28, 31, 23, 36, 27, 42, 41, 30, 33, 34, 37, 35, 32, 39, 47, 44, 46, 40, 38, 50, 43, 45, 48, 52, 49, 55, 54, 57, 56, 64, 51, 60, 53, 59, 62, 61, 69, 68, 63, 58, 65, 71, 70, 66, 73, 67, 72, 79, 74, 81, 77, 76, 75, 78, 83, 82, 85, 80, 87, 84, 90, 89, 86, 96, 93, 98, 88, 92, 99, 95, 97, 2, 91 (mod 100) look good to you?
This is the output of a small ruby program (explanations below):
#!/usr/bin/env ruby
require 'digest/md5'

$seed = 'Kind of a password'
$n = 100 # size of sequence
$k = 10 # mixing factor (higher means less clumping)

def pseudo_random_bit(k, n)
  # .ord is needed on Ruby 1.9+, where String#[] returns a String;
  # Ruby 1.8 returned the character code directly
  Digest::MD5.hexdigest($seed + "#{k}|#{n}")[-1].ord & 1
end

def sequence(x)
  h = $n/2
  $k.times do |k|
    # maybe exchange 1st with 2nd, 3rd with 4th, etc
    x ^= pseudo_random_bit(k, x >> 1) if x < 2*h
    # maybe exchange 1st with last
    if [0, $n-1].include? x
      x ^= ($n-1)*pseudo_random_bit(k, 2*h)
    end
    # move 1st to end
    x = (x - 1) % $n
    # maybe exchange 1st with 2nd, 3rd with 4th, etc
    # (corresponds to 2nd with 3rd, 4th with 5th, etc)
    x ^= pseudo_random_bit(k, h+(x >> 1)) if x < 2*(($n-1)/2)
    # move 1st to front
    x = (x + 1) % $n
  end
  x
end

puts (0..99).map {|x| sequence(x)}.join(', ')
The idea is basically to start with the sequence 0..n-1 and disturb the order by passing k times over the sequence (more passes means less clumping). In each pass one first looks at the pairs of numbers at positions 0 and 1, 2 and 3, 4 and 5 etc (general: 2i and 2i+1) and flips a coin for each pair. Heads (=1) means exchange the numbers in the pair, tails (=0) means don't exchange them. Then one does the same for the pairs at positions 1 and 2, 3 and 4, etc (general: 2i+1 and 2i+2). As you mentioned that your sequence is mod 10, I additionally exchanged positions 0 and n-1 if the coin for this pair dictates it.
A single number x can move by up to two positions per pass (once in each of the two pair exchanges), so after k passes it can be mapped modulo n to any number in the interval [x-2k, x+2k], and it is approximately binomially distributed around x. Pairs (x, x+1) of numbers are not independently modified.
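For readers who prefer to see the passes applied to a whole list, here is a rough Python sketch of the same pass structure. It is not a port of the Ruby program: it keeps the entire list in memory and draws its coin flips from random.Random instead of hashing, so it gives up the stateless per-index lookup. The function name and the wrap parameter are mine.

```python
import random


def clumpy_permutation(n, passes, seed=0, wrap=True):
    """Shuffle 0..n-1 only locally: each pass may swap neighbouring pairs."""
    rng = random.Random(seed)
    seq = list(range(n))
    for _ in range(passes):
        # Pairs at positions (0,1), (2,3), (4,5), ...
        for i in range(0, n - 1, 2):
            if rng.random() < 0.5:
                seq[i], seq[i + 1] = seq[i + 1], seq[i]
        # Optionally exchange the first and last positions (sequence is mod n).
        if wrap and rng.random() < 0.5:
            seq[0], seq[-1] = seq[-1], seq[0]
        # Pairs at positions (1,2), (3,4), (5,6), ...
        for i in range(1, n - 1, 2):
            if rng.random() < 0.5:
                seq[i], seq[i + 1] = seq[i + 1], seq[i]
    return seq


print(clumpy_permutation(100, 10))
```

Because only swaps are used, the result is always a permutation of 0..n-1, so no value repeats.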
As the pseudo-random generator I used only the last of the 128 output bits of the MD5 hash function; choose whatever function you want instead. Because of the clumping, one won't get a "secure" (= unpredictable) random sequence.
Maybe you can chain together 2 or more LCGs in a manner similar to the one described for the LFSRs in another answer here. Increment the least-significant LCG with its seed; on each full cycle, increment the next LCG. You only need to store a seed for each LCG. You could then weight each part and sum the parts together. To avoid repetitions in the 'clumped' least-significant part, you can randomly reseed the LCG on each full cycle.
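A minimal sketch of the chaining idea, under assumptions of my own: two LCGs modulo 16, with multiplier/increment pairs (5, 3) and (13, 7) chosen to satisfy the Hull-Dobell full-period conditions for a power-of-two modulus. The fast generator supplies the low part and the slow one advances once per full cycle of the fast one. The only state kept is the two current values.

```python
def lcg_step(state, a, c, m):
    """One step of a linear congruential generator."""
    return (a * state + c) % m


def clumped_stream(low_seed=0, high_seed=0):
    """Yield each of 0..255 exactly once, in clumps of 16 nearby values.

    Assumes modulus 16 for both generators; (5, 3) and (13, 7) give a
    full period because a = 1 (mod 4) and c is odd.
    """
    low, high = low_seed % 16, high_seed % 16
    for _ in range(16):          # one full cycle of the slow generator
        for _ in range(16):      # one full cycle of the fast generator
            yield high * 16 + low
            low = lcg_step(low, 5, 3, 16)
        high = lcg_step(high, 13, 7, 16)


values = list(clumped_stream())
print(values[:16])
print(len(set(values)))  # 256
```

Every run of 16 consecutive outputs stays within one block of 16 adjacent numbers, and the blocks themselves are visited in a scrambled order. This sketch does not reseed the fast generator between cycles, so the order inside each clump is the same every time.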

Resources