1000 Digit Fibonacci - Error on Euler? - ruby

Below is my code. It runs. It works.
The problem is, the INDEX of the first 1000 digit fibonacci number isn't 4782...it's 4781. 4782 is the POSITION, not the INDEX. Is Euler accepting the wrong answer, or did they use the word index when they should have used position?
def fib_of_a_certain_digit(num)
  fibs = [1, 1]
  idx = 1
  while true
    fib = fibs[idx] + fibs[idx-1]
    fibs << fib
    idx += 1
    digilength = fib.to_s.split("").length
    return "The first #{num} digit Fibonacci number is at index #{idx}, the fibonacci array is #{fibs.length} long" if digilength == num
  end
end
puts fib_of_a_certain_digit(3)
puts fib_of_a_certain_digit(1000)
Here is the output.
The first 3 digit Fibonacci number is at index 11, the fibonacci array is 12 long
The first 1000 digit Fibonacci number is at index 4781, the fibonacci array is 4782 long
As you can see, the control case matches the known data.
[1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144]
The last number in the array is 144. It is at index 11, but is the 12th number in the array.
The same principle applies to the larger number (it's just too big to paste here). It winds up in the last position of the array (4782), which has the index of 4781.
Why has nobody else noticed this?

No, that's not an error. Project Euler says:
Hence the first 12 terms will be:
F1 = 1
F2 = 1
F3 = 2
...
F11 = 89
F12 = 144
Note the little subscript numbers bottom right of each "F". Those are the indexes. So they start indexing with 1, and thus "position" and "index" are equivalent here. In particular, we can see that the first Fibonacci number with three digits is at index 12.
Your choice of programming language and data type and that language's choice of indexing doesn't override what's in the problem statement. And if it did, there'd be a problem because there are programming languages that start indexing with 1.
In the comments below you talk about "common terms" and what they "usually mean". I'm sure you realized that Project Euler is very mathematical, and in mathematics, those subscripts are the indexes. See for example Index notation in mathematics. Btw, all the examples there start indexing with 1 (not 0), because that's a common/usual way in mathematics as well.
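For illustration, here is a minimal Python sketch (not the asker's Ruby) that numbers the terms the way the problem statement does, starting at F1 = 1. The function name is made up for this example.

```python
def first_fib_index_with_digits(digits):
    """Return the 1-based index n of the first Fibonacci term F_n with `digits` digits."""
    threshold = 10 ** (digits - 1)   # smallest number with that many digits
    index, prev, cur = 1, 0, 1       # cur holds F_1 = 1
    while cur < threshold:
        prev, cur = cur, prev + cur  # advance to the next term
        index += 1
    return index


if __name__ == "__main__":
    print(first_fib_index_with_digits(3))
    print(first_fib_index_with_digits(1000))
```

With 1-based numbering the three-digit case gives 12 (F12 = 144) and the 1000-digit case gives 4782, which is the value Project Euler accepts.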

Related

Interview Question: Remove repeating numbers at the end of an array

I got a surprising interview question today at a big Bay Area tech company that I was absolutely stumped by despite seeming so easy. Was wondering if anyone has seen it or can offer a simpler solution as the interviewer didn't want to show me the answer. The solution can be written in any language or pseudocode.
Question:
Given a list of numbers, remove any extraneous repeating suffix sequences of numbers that appear at the end of the list until it has no repeating suffix sequences. The repeating sequence can be cut-off.
For example:
[1,2,3,4,5,6,7,5,6,7,5,6] -> [1,2,3,4,5,6,7]
explanation: [5, 6, 7] were repeating
Also consider the situation
[1,2,3,4,5,4,5,1,4,5,4,5,1,4,5,4,5] -> [1,2,3,4,5,4,5,1] # not [1,2,3,4,5,4,5,1,4,5,4,5,1]
explanation: [4,5,4,5,1] is a repeating sequence
There are always two ways to approach a problem like this: finding any solution, and finding an efficient one. It is usually better to start with any solution and then think about how to optimize it.
Now as we can see in the second example, the problem is complicated by the fact that the repeating pattern is not known. So we could just try all the possible patterns at the end. Then we would need to check two things:
is it actually repeating?
how long is the result?
Then we could just take the shortest result. Here is the Python code:
def remove_repeating_tail(a: list) -> list:
    results = []
    for i in range(len(a)):
        tail = a[i:]
        results.append(remove_repeats(a, tail))
    if len(results) == 0:
        return a
    return sorted(results, key=len)[0]
Also we made sure we cover all the cases. Empty list, no repeating pattern. Next we need to write remove_repeats. Also we check the empty repeating pattern, so we need to be aware of that.
def remove_repeats(a: list, tail: list) -> list:
    assert len(tail) <= len(a)
    if len(tail) == 0:
        return a
    remainder = a
    count = 0
    while remainder[-len(tail):] == tail:
        remainder = remainder[:-len(tail)]
        count += 1
    if count <= 1:
        return a
    return remainder
The loop strips copies of the pattern from the end, and the count check returns the original list unless the pattern was found at least twice. Now it's time to test whether the code actually works, if that is possible in the interview.
remove_repeating_tail([1,2,3,4,5,6,7,5,6,7,5,6])
-> [1, 2, 3, 4, 5, 6]
remove_repeating_tail([1,2,3,4,5,4,5,1,4,5,4,5,1,4,5,4,5])
-> [1, 2, 3, 4, 5, 4, 5]
Note that both results are one element shorter than the outputs asked for in the question ([1,2,3,4,5,6,7] and [1,2,3,4,5,4,5,1]).
Also good to check some other cases:
remove_repeating_tail([1,2,3,4])
-> [1, 2, 3, 4]
remove_repeating_tail([])
-> []
After quite a bit of fixing we got the above, which I think is correct. In particular, I initially missed the following:
First, I had an infinite loop in remove_repeats for an empty tail.
remove_repeats always removed the tail and sometimes everything, as I wasn't checking that there is at least one repeat. I then added the counting.
I made simple mistakes like writing results = res instead of results.append(res), leading to some exceptions.
Then came a lot of simplification. First I used a sentinel None to communicate back that the tail is not repeating, but we could just return the whole list. Then I checked for repetition with an if before the while loop, but realized it's basically doing the same as the first iteration, so I used counting.
Similarly, I don't like the if len(results) == 0: check. I would probably add a to the results at the beginning and remove the check, as then there is always a result. Then we could start the counting from 1 instead of 0. Still, I kept it in.
If we want something fast, we first need to analyze the complexity.
remove_repeats on a list of size n with a tail of size k runs its loop at most n / k times, so it is O(n / k) if each loop iteration counts as one step. Then we call this function n times. And then we sort it. Wait, why do we sort it? We could just take the minimum: return min(results, key=len). That's better.
In each loop we call remove_repeats, with k going from 1 to n. So we have:
sum(k = 1 .. n) O(n / k). This is n / 1 + n / 2 + n / 3 + .. + n / n = n * H_n, where H_n = 1 + 1/2 + .. + 1/n. I had to look this up on Wikipedia, but the H_n are called harmonic numbers. We can also just make our life easy and say it's less than O(n^2) for now. Otherwise I found the approximation H_n ~= ln(n) + 0.58, so the sum is about n ln(n) + 0.58 n. So the complexity overall is O(n log n), counted in loop iterations. Not too bad I would say. Is it optimal? Maybe. Here I would compare it to some other similar algorithms (like substring search, etc).
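A quick numeric check of that sum, as a sketch only: it compares n/1 + n/2 + ... + n/n with n ln(n) + γ n, where γ ≈ 0.5772 is the Euler-Mascheroni constant. The helper names are made up for this example.

```python
import math

EULER_GAMMA = 0.5772156649  # Euler-Mascheroni constant, rounded


def total_loop_steps(n):
    """Exact value of n/1 + n/2 + ... + n/n, i.e. n times the n-th harmonic number."""
    return sum(n / k for k in range(1, n + 1))


def approximation(n):
    """n * (ln n + gamma), the usual estimate for n * H_n."""
    return n * (math.log(n) + EULER_GAMMA)


if __name__ == "__main__":
    for n in (10, 1000, 100000):
        print(n, total_loop_steps(n), approximation(n))
```

The two columns stay within about half a step of each other, so the n log n growth is the term that matters.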
Before going there, at this point, I would check with the interviewer, where he would like to go next. As there are many directions.
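For comparison, here is a minimal sketch of a different reading of the problem, not the code above. It assumes (my assumption, not something the interviewer stated) that a suffix counts as repeating with period p when every removed element equals the element p positions before it, and at least one full period is removed. The function name is made up, and it does a single pass rather than repeating until nothing changes.

```python
def strip_repeating_suffix(a):
    """Return the shortest prefix a[:m] whose continuation is periodic.

    Assumption: for some period p, a[i] == a[i - p] for every i >= m,
    and at least p elements (one full period) are removed.
    """
    n = len(a)
    best = n
    for p in range(1, n // 2 + 1):       # a full repeat needs n >= 2 * p
        m = n
        # Walk backwards while each element matches the one p places earlier.
        while m - 1 >= p and a[m - 1] == a[m - 1 - p]:
            m -= 1
        if n - m >= p:                   # at least one full period removed
            best = min(best, m)
    return a[:best]


if __name__ == "__main__":
    print(strip_repeating_suffix([1, 2, 3, 4, 5, 6, 7, 5, 6, 7, 5, 6]))
    print(strip_repeating_suffix([1, 2, 3, 4, 5, 4, 5, 1, 4, 5, 4, 5, 1, 4, 5, 4, 5]))
```

Under this reading both examples from the question come out as asked. The loop over p makes it O(n^2) in the worst case, so it is a baseline rather than an efficient answer.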
This seems a tricky question and there may not be a simple solution. The best solution I can think of would be O(n) time and O(n) space, and that is if I am not missing any edge case.
Let's take as example
[1,2,3,4,5,4,5,1,4,5,4,5,1,4,5,4,5] -> [1,2,3,4,5,4,5,1]
Steps would be as follows:
Iterate over the input array from last index to first and build a dictionary (hashtable) with every number in the array being a key and value: a list of positions where the specific number is found in the array.
Occurrences dictionary will become:
{
5: [16, 14, 11, 9, 6, 4],
4: [15, 13, 10, 8, 5, 3],
1: [12, 7, 0],
3: [2],
2: [1]
}
Find the possible suffix lengths by calculating deltas between every position and first position for every number. This way we take into consideration the case in which a specific number repeats in the suffix or in the prefix.
We then add each distinct possible suffix length to a set.
We sort the possible suffix lengths in descending order.
We get following suffix lengths:
[12, 10, 7, 5, 2]
For every possible length l, we test if arr[n-1] == arr[n-1-l]. If l is our suffix's length, it means that the number at last position is repeated at exactly l positions before. We then check the last l elements to respect the same condition. If they do, we found the maximum suffix length. If not, the max suffix length is even smaller, so we check the next possible length.
After finding the correct suffix length, we delete the remaining numbers that repeat at positions pos-l. We then return the slice of array with suffix removed.
def removeRepeatingSuffixes(arr):
    if not arr:
        return []
    n = len(arr)
    occurrences = {}
    for i in range(n - 1, -1, -1):
        c = arr[i]
        if c not in occurrences:
            occurrences[c] = []
        occurrences[c].append(i)
    # treat edge case: no repeating suffix
    if len(occurrences[arr[n-1]]) == 1:
        return arr
    # create a set of possible suffix lengths,
    # based on the differences between the positions of each number.
    possible_suffixes_lengths_set = set()
    for c, olist in occurrences.items():
        if len(olist) >= 2:
            for i in range(len(olist)-1):
                delta = olist[i] - olist[len(olist)-1]
                possible_suffixes_lengths_set.add(delta)
    suff_lengths = sorted(possible_suffixes_lengths_set, reverse=True)
    for l in suff_lengths:
        # a suffix of length l can only repeat in full if 2*l elements exist;
        # this also keeps arr[j-l] below from wrapping to a negative index
        if 2 * l > n:
            continue
        if arr[n - 1] == arr[n - 1 - l]:
            # possible suffix length, check if last l characters repeat
            ok_length = True
            for j in range(n-2, n-1-l, -1):
                if arr[j] != arr[j-l]:
                    ok_length = False
                    break
            if ok_length:
                last_i = n-1-l
                # stop at last_i >= l so that last_i - l never goes negative
                while last_i >= l and arr[last_i] == arr[last_i - l]:
                    last_i -= 1
                # return non-repeating slice, from 0 to last_i
                return arr[0:last_i + 1]
    # no candidate length matched: nothing to remove
    return arr
A quick way to remove repeats (dedupe) is to convert the list to a set() instead of a list, although that drops every duplicate (and the ordering), not just a repeating suffix.

How to find the xth decibinary number?

Hackerrank has a problem called Decibinary numbers which are essentially numbers with 0-9 digit values but are exponentiated using powers of 2. The question asks us to display the xth decibinary number. There is another twist to the problem. Multiple decibinary numbers can equal the same decimal number. For example, 4 in decimal can be 100, 20, 12, and 4 in decibinary.
At first, I thought that finding how many decibinary numbers for a given decimal number would be helpful.
I consulted this post for a bit of help ( https://math.stackexchange.com/questions/3540243/whats-the-number-of-decibinary-numbers-that-evaluate-to-given-decimal-number ). The post was a bit too hard to understand, but then I also realized that even if we know how many decibinary numbers a decimal number can have, this doesn't help FINDING them (at least to my knowledge), which is the original goal of the question.
I do realize that for any decimal number, the largest decibinary number for it will simply be its binary representation. For ex, for 4 it is 100. So the brute force approach would be to check all numbers in this range for each decimal number and see if their decibinary representation evaluates to the given decimal number, but it is clearly evident that this approach will never pass since the input constraints define x to be from 1 to 10^16. Not only that, we have to find the xth decibinary number for a q amount of queries where q is from 1 to 10^5.
This question falls under the section of dp but I am confused how dp will be used or how it is even possible. In order to calculate the xth decibinary number q times (as described in the brute force method above) it would be better to use a table (like the problem suggests). But for that, we would need to calculate and store 10^16 integers, since that is how big x can be. Assuming an integer is 4 Bytes, 4B * 10^16 ~= 4B * (2^3)^16 = 2^50 Bytes.
Can someone please explain how this problem is solved optimally. I am still new to CP so if I have made an error in something, please let me know.
(see link below for full problem statement):
https://www.hackerrank.com/challenges/decibinary-numbers/problem
This is solvable with about 80 MB of data. I won't give code, but I will explain the strategy.
Build a lookup count[n][i] that gives you the number of ways to get the decimal number n using the first i digits. You start by inserting 0 everywhere, and then put a 1 in count[0][0]. Now start filling in using the rule:
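count[n][i] = count[n][i-1] + count[n - 2**(i-1)][i-1] + count[n - 2*2**(i-1)][i-1] + ... + count[n - 9*2**(i-1)][i-1]
(The i-th digit has weight 2**(i-1); any term whose first index would be negative counts as 0.)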
It turns out that you only need the first 19 digits, and you only need counts of n up to 2**19-1. And the counts all fit in 8 byte longs.
Once you have that, create a second data structure count_below[n] which is the count of how many decibinary numbers will give a value less than n. Use the same range of n as before.
And now a lookup proceeds as follows. First you do a binary search on count_below to find the last value that has less than your target number below it. Subtracting count_below from your query, you know which decibinary number of that value you want.
Next, search through count[n][i] to find the i such that you get your target query with i digits, and not with fewer. This will be the position of the leading digit of your answer. You then subtract off count[n][i-1] from your query (all the decibinaries with fewer digits). Then subtract off count[n-2**(i-1)][i-1], count[n-2*2**(i-1)][i-1], ... count[n-8*2**(i-1)][i-1] until you find what that leading digit is. Now you subtract the contribution of that digit from the value, and repeat the logic for finding the correct decibinary for that smaller value with fewer digits.
Here is a worked example to clarify. First the data structures for the first 3 digits and up to 2**3 - 1:
count = [
[1, 1, 1, 1], # sum 0
[0, 1, 1, 1], # sum 1
[0, 1, 2, 2], # sum 2
[0, 1, 2, 2], # sum 3
[0, 1, 3, 4], # sum 4
[0, 1, 3, 4], # sum 5
[0, 1, 4, 6], # sum 6
[0, 1, 4, 6], # sum 7
]
count_below = [
0, 1, 2, 4, 6, 10, 14, 20, 26, ...
]
Let's find the 20th.
count_below[6] is 14 and count_below[7] is 20 so our decimal sum is 6.
We want the 20 - count_below[6] = 6th decibinary with decimal sum 6.
count[6][2] is 4 while count[6][3] is 6 so we have a non-zero third digit.
We want the 6 - count[6][2] = 2nd with a non-zero third digit.
count[6 - 2**2][2] = count[2][2] is 2, so 2 have 3rd digit 1.
The third digit is 1
We are now looking for the second decibinary whose decimal sum is 2.
count[2][1] is 1 and count[2][2] is 2 so it has a non-zero second digit.
We want the 2 - count[2][1] = 1st with a non-zero second digit.
The second digit is 1
The rest is 0 because 2 - 2**1 = 0.
And thus you find that the answer is 110.
Now for such a small number, this was a lot of work. But even for your hardest lookup you'll only need about 20 steps of a binary search to find your decimal sum, another 20 steps to find the position of the first non-zero digit, and for each of those digits, you'll have to do 1-9 different calculations to find what that digit is. Which means only hundreds of calculations to find the number.
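The answer deliberately gives no code, so the following is only a sketch of the described strategy in Python, under a few assumptions of mine: the i-th digit is given weight 2**(i-1) (which is what the worked example's table implies), x is 1-based and no larger than the last entry of count_below, and the names build_tables and kth_decibinary are made up. It is shown with a tiny number of digits; the full problem would use 19 as stated above, and a pure Python list of lists at that size would need a more compact layout.

```python
from bisect import bisect_left
from itertools import accumulate


def build_tables(digits):
    """count[n][i] = ways to write n with the lowest i decibinary digits."""
    limit = 1 << digits                      # values 0 .. 2**digits - 1
    count = [[0] * (digits + 1) for _ in range(limit)]
    count[0][0] = 1
    for i in range(1, digits + 1):
        weight = 1 << (i - 1)                # weight of the i-th digit
        for n in range(limit):
            count[n][i] = sum(
                count[n - d * weight][i - 1]
                for d in range(10)
                if n - d * weight >= 0
            )
    # count_below[n] = how many decibinary numbers have a value below n
    count_below = [0] + list(accumulate(row[digits] for row in count))
    return count, count_below


def kth_decibinary(x, count, count_below):
    """Return the x-th (1-based) decibinary number as a string."""
    digits = len(count[0]) - 1
    value = bisect_left(count_below, x) - 1  # last value with fewer than x below it
    rank = x - count_below[value]            # 1-based rank among equal values
    out = []
    for i in range(digits, 0, -1):
        weight = 1 << (i - 1)
        for d in range(10):                  # smaller leading digit sorts first
            rest = value - d * weight
            ways = count[rest][i - 1] if rest >= 0 else 0
            if rank <= ways:
                out.append(str(d))
                value = rest
                break
            rank -= ways
    return "".join(out).lstrip("0") or "0"


if __name__ == "__main__":
    count, count_below = build_tables(3)
    print(count_below)
    print(kth_decibinary(20, count, count_below))
```

With three digits this reproduces the tables from the worked example and returns 110 for the 20th number.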

Algorithm to finding shortest sequence of numbers from array A that add up to number B

I had an interesting interview question the other day that sort of stumped me. I couldn't find a really good answer for it. The problem stated:
Suppose you are given a number B and an array A of length n. The number B is a natural number, and all numbers in array A are distinct, natural numbers. Design an algorithm that would find the shortest sequence of numbers in array A that would sum up to the number B. Duplicates can be used.
So, as an example, let us say I have a number B = 19, and A = [9, 6, 3, 1]. I could say a solution is 6+6+6+1, or 3+3+3+3+3+3+1, but the solution they are looking for is 9+9+1, because that is the shortest sequence of numbers.
The algorithm that I designed would sort the array and reach into the largest number and subtract it from the original number. It would keep doing this until it could no longer subtract the largest number. It would then go through the array and see if it could keep finding any numbers that it could subtract from B. It actually looked a lot like this:
def domath(b, a):
    a.sort()
    x = []
    n = 0
    idx = -1
    while b != 0:
        n = a[idx]
        if b >= n:
            b -= n
            x.append(n)
        else:
            idx -= 1
    return x
But this solution would not always work. It would only work if you were lucky enough to have, say, a 2 or a 1 in the array, or the numbers that you kept subtracting from b magically worked. Consider if B=21 and A=[7,8,9]. If it kept subtracting 9, it would not be able to find a solution.
So I was thinking "Okay, then maybe I need to backtrack a bit.".
If I reached into the x array, which keeps track of all the numbers we kept subtracting, I could add the latest number we subtracted back to b, then try to move the idx to the next largest number. So, instead of doing 21 - 9, then 12 - 9, it would do 21 - 9, then 12 - 8. It still wouldn't find anything, so then it would try 21 - 9, then 12 - 7. It still wouldn't find anything, so it would try 21 - 8, then 13 - 8, and it wouldn't find anything, so it would do 21 - 8, then 13 - 7, and it still wouldn't find anything, so it would try 21 - 7, and continue on that, and determine if it could do it. If it can't (in this case, it should), it would just return "False" or something.
Is that... a good solution? I feel like there must be a better one, because the interviewers were kind of iffy about this solution.
Tricky. The linked Wikipedia page suggests an approach that will take, I think, O(B * length(A)), which would take quite long if we had B = 1,000,000,000,000 instead of B = 21 with A = [9, 8, 7]. Your backtracking algorithm would handle this reasonably quickly if you start with a division:
111,111,111,111 nines leaves one, no way.
111,111,111,110 nines leaves ten, no way (trying 1 or 0 eights)
111,111,111,109 nines leaves 19, no way (trying 2, 1 or 0 eights)
111,111,111,108 nines leaves 28 = 4x7 (trying 3 .. 0 eights). Best so far.
111,111,111,107 nines leaves 37. 4x8 < 37, no solution can beat what we have.
In your example, B = 21, backtracking would also work quite well. If we just denote the numbers of nines, eights, and sevens, then you would just try the following: 2,0,0; 1,1,0; 1,0,1; 0,2,0; 0,1,1; 0,0,3.
You'd want to stop search branches when you have a solution and can prove that no further solution can be better. That's what I did: When you have 37 left and the highest number available is 8 then you need at least 5 numbers. And for every nine that you remove that number is going up at least by one, so the best solution so far cannot be beaten.
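A minimal sketch of that pruned backtracking in Python, under my own assumptions about the details: it tracks how many times each number is used rather than building the sequence, tries the largest number first, and stops lowering a count once the lower bound (numbers used so far plus the remainder divided by the next-largest number, rounded up) can no longer beat the best solution found. The name fewest_terms is made up. The worst case can still be exponential.

```python
def fewest_terms(b, a):
    """Return {number: times used} for a shortest sum to b, or None if impossible."""
    nums = sorted(a, reverse=True)
    best_len = None
    best_counts = None

    def search(i, remaining, used, counts):
        nonlocal best_len, best_counts
        if remaining == 0:
            if best_len is None or used < best_len:
                best_len = used
                best_counts = counts + [0] * (len(nums) - i)
            return
        if i == len(nums):
            return
        x = nums[i]
        if i == len(nums) - 1:
            # Smallest number left: only an exact multiple can finish the sum.
            if remaining % x == 0:
                k = remaining // x
                search(i + 1, 0, used + k, counts + [k])
            return
        nxt = nums[i + 1]
        for k in range(remaining // x, -1, -1):
            rest = remaining - k * x
            # Lower bound on the total: rest needs at least ceil(rest / nxt) numbers.
            bound = used + k + -(-rest // nxt)
            if best_len is not None and bound >= best_len:
                break  # using fewer x's can only raise the bound
            search(i + 1, rest, used + k, counts + [k])

    search(0, b, 0, [])
    if best_counts is None:
        return None
    return dict(zip(nums, best_counts))


if __name__ == "__main__":
    print(fewest_terms(19, [9, 6, 3, 1]))
    print(fewest_terms(21, [7, 8, 9]))
    print(fewest_terms(10**12, [9, 8, 7]))
```

For B = 1,000,000,000,000 it follows the same five steps listed above and stops at 111,111,111,108 nines plus four sevens.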

Why does this naive prime-number algorithm fail?

I am attempting to implement this "find the nth prime number" algorithm in Ruby 2.1.
I've tagged it 'algorithm' as well because I think the question is language-agnostic, and that the Ruby code written is simple enough to read even if you're unfamiliar. I've used descriptive variable names to help it.
1. Iterate over the whole number system, ignoring even numbers greater than 2 (2, 3, 5, 7, …)
2. For each integer, p, check if p is prime:
   a. Iterate over the primes already found which are less than the square-root of p
   b. For each prime in this set, f, check to see if it is a factor of p:
      i. If f divides p then p is non-prime. Continue from 2 for the next p.
   c. If no factors are found, p is prime. Continue to 3.
3. If p is not the nth prime we have found, add it to the list of primes. Continue from 2 for the next p.
4. Otherwise, p is the nth prime we have found and we should return it.
Sounds simple enough. So I write my method (function):
def nth_prime(n)
  primes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
  primes[-1].upto(Float::INFINITY) do |p|
    return primes[n-1] if primes.length >= n-1
    possible_prime = true
    primes_to_check = primes.select{|x|x<=Math.sqrt(p)}
    primes_to_check.each do |f|
      if f%p==0
        possible_prime = false
        break
      end
    end
    primes << p if possible_prime
  end
end
The intent is to say nth_prime(10) and get the 10th prime number.
To explain my logic:
I start with a list of known primes, since the algorithm requires that. I list the first ten.
Then I iterate over the entire number system. (primes[-1]+2).upto(Float::Infinity) do |p| will offer each number up from the last known prime plus two (since +1 will result in an even number and evens over 2 cannot be prime) to infinity to the indented block as p. I have not skipped even numbers and have
The first thing I do is return the nth prime number if the list of known primes is already at least n elements long. This works for the known values -- if you ask for the 5th, you'll get 11 as a result.
Then I set a flag, possible_prime, to true. This indicates that nothing has proved it to be not a prime yet. I'm going to do some tests and if it survives those without the flag being changed to false, then p is proven to be prime and is appended to the known-primes array. Eventually that array will be as long as n and return the nth value.
I create an array, primes_to_check, containing all known primes <= the square root of p. Each one gets tested in turn as f.
If f can cleanly divide p, I know that p is not prime, so I change the flag to false, and break, which brings us out of the primes-to-check loop and back in the upto-infinity loop. There's only one statement left in that loop, the one that appends to the known-primes array if the flag is true, which it isn't so we restart the loop with the next number.
If no fs can cleanly divide p then p must be prime, which means it survives to the end of the primes-to-check loop with the flag still set to true, and reaches the final 'append p to known primes' statement.
Eventually this will make the primes array sufficiently longer to answer the question "What is the nth prime?".
Problem
Asking for the 10th prime does get me 29, the last prime I pre-supplied. But asking for 11 gets nil, or no value. I've gone over the code a hundred times and can't imagine a case in which no value gets returned.
What have I done wrong?
return primes[n-1] if primes.length >= n-1
For primes to have an element at index n-1, it must have length at least n.
if f%p==0
This checks whether a known prime is divisible by the candidate, not whether the candidate is divisible by a known prime.
primes[-1].upto(Float::INFINITY) do |p|
This starts the loop at a prime already in the list (29). 29 is correctly found to be prime, so it is added to the list again. You'll want to start the loop at a number after 29.
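Putting the three fixes together, here is a sketch of the same algorithm in Python (a translation for illustration, not the asker's Ruby). It assumes n >= 1.

```python
import math
from itertools import count


def nth_prime(n):
    """Return the n-th prime (1-based) by trial division against known primes."""
    primes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
    # Fix 3: start after the last known prime, and only visit odd candidates.
    for p in count(primes[-1] + 2, 2):
        # Fix 1: index n-1 exists only once the list has at least n entries.
        if len(primes) >= n:
            return primes[n - 1]
        limit = math.isqrt(p)
        # Fix 2: test candidate % prime, not prime % candidate.
        if all(p % f != 0 for f in primes if f <= limit):
            primes.append(p)


if __name__ == "__main__":
    print(nth_prime(10))
    print(nth_prime(11))
```

Asking for the 10th prime still gives 29, and the 11th now gives 31 instead of nil.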
Algorithm for testing prime numbers:
1) Input num
2) counter = num - 1
3) repeat
4) remainder = num % counter
5) if remainder = 0 then
6) broadcast not a prime number and stop
7) change counter by -1
8) until counter = 1
9) say it's a prime and stop

Randomly sampling unique subsets of an array

If I have an array:
a = [1,2,3]
How do I randomly select subsets of the array, such that the elements of each subset are unique? That is, for a the possible subsets would be:
[]
[1]
[2]
[3]
[1,2]
[1,3]
[2,3]
[1,2,3]
I can't generate all of the possible subsets as the real size of a is very big so there are many, many subsets. At the moment, I am using a 'random walk' idea - for each element of a, I 'flip a coin' and include it if the coin comes up heads - but I am not sure if this actually uniformly samples the space. It feels like it biases towards the middle, but this might just be my mind doing pattern-matching, as there will be more middle-sized possibilities.
Am I using the right approach, or how should I be randomly sampling?
(I am aware that this is more of a language agnostic and 'mathsy' question, but I felt it wasn't really Mathoverflow material - I just need a practical answer.)
Just go ahead with your original "coin flipping" idea. It uniformly samples the space of possibilities.
It feels to you like it's biased towards the "middle", but that's because the number of possibilities is largest in the "middle". Think about it: there is only 1 possibility with no elements, and only 1 with all elements. There are N possibilities with 1 element, and N possibilities with (N-1) elements. As the number of elements chosen gets closer to (N/2), the number of possibilities grows very quickly.
You could generate random numbers, convert them to binary and choose the elements from your original array where the bits were 1. Here is an implementation of this as a monkey-patch for the Array class:
class Array
  def random_subset(n=1)
    raise ArgumentError, "negative argument" if n < 0
    (1..n).map do
      r = rand(2**self.size)
      self.select.with_index { |el, i| r[i] == 1 }
    end
  end
end
Usage:
a.random_subset(3)
#=> [[3, 6, 9], [4, 5, 7, 8, 10], [1, 2, 3, 4, 6, 9]]
Generally this doesn't perform too badly: it's O(n*m), where n is the number of subsets you want and m is the length of the array.
I think the coin flipping is fine.
ar = ('a'..'j').to_a
p ar.select{ rand(2) == 0 }
An array with 10 elements has 2**10 possible combinations (including [ ] and all 10 elements), which is nothing more than 10 times (1 or 0). It does output more arrays of four, five and six elements, because there are a lot more of those in the powerset.
A way to select a random element from the power set is the following:
my_array = ('a'..'z').to_a
power_set_size = 2 ** my_array.length
random_subset = rand(power_set_size)
subset = []
# to_s(2) gives the binary digits; reverse them so that character i is bit i
random_subset.to_s(2).reverse.chars.each_with_index do |bit, corresponding_element|
  subset << my_array[corresponding_element] if bit == "1"
end
This makes use of string functions rather than working with real "bits" and bitwise operations, just for my convenience. You can turn it into a faster (I guess) algorithm by using real bits.
What it does is encode each element of the power set of the array as an integer between 0 and 2 ** array.length - 1, and then pick one of those integers at random (uniformly random, indeed). Then it decodes the integer back into a particular subset of the array using a bitmask (1 = the element is in the subset, 0 = it is not).
In this way you have a uniform distribution over the power set of your array.
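The same bitmask idea with real bits, sketched in Python for illustration, plus a small empirical check that every subset of a three-element array turns up about equally often. The function names and the seed are arbitrary choices for this example.

```python
import random
from collections import Counter


def random_subset(items, rng=random):
    """Pick one subset uniformly: bit i of a random mask decides items[i]."""
    mask = rng.getrandbits(len(items)) if items else 0
    return [x for i, x in enumerate(items) if (mask >> i) & 1]


def subset_frequencies(items, draws, seed=0):
    """Count how often each subset appears over `draws` samples."""
    rng = random.Random(seed)
    return Counter(tuple(random_subset(items, rng)) for _ in range(draws))


if __name__ == "__main__":
    for subset, hits in sorted(subset_frequencies([1, 2, 3], 8000).items()):
        print(subset, hits)
```

Each of the 8 subsets should land near 1000 hits out of 8000. Sizes 1 and 2 together appear more often than sizes 0 and 3 only because there are more subsets of those sizes.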
a.select {|element| rand(2) == 0 }
For each element, a coin is flipped. If heads ( == 0), then it is selected.

Resources