questions on binary search in Programming Pearls, 2nd Edition - algorithm

I am reading Section 9.3 of Jon Bentley's Programming Pearls, 2nd Edition.
On page 94, Jon gave an implementation of an improved binary search algorithm, utilizing the fact that n is 1000 (search 1000 numbers to find the target).
At the end of the program, it is:
if p > 1000 || x[p] != t
p = -1
My question is: what if p is exactly 1000? It seems that when p is 1000, it should also error out, like:
if p >= 1000 || x[p] != t
p = -1
Anyway, this part of the code is translated from the code on page 93, at the end of which:
if p >= n || x[p] != t
p = -1
Is my understanding correct? I am just wondering whether this is a typo, or whether it is really not necessary to include the case where p is 1000 in the condition.
Another question is, in lines 5~6 from bottom up on page 94, it says: When the first test fails and l stays zero, the program computes the bits of p in order, most significant bit first.
What does this mean? And when the first test fails, shouldn't l be -1, rather than 0?
Can anyone elaborate on this statement?
P.S. I can't find Jon's email address; otherwise, I'd send these questions to him. :-(

It is a typo. The maximum value of l is 999 (1000 - 512 + 256 + ... + 1), so the maximum value of p = l+1 is 1000. This is clear from the hardcoded version of the binary search (listing 9.8).
The real C code (not pseudocode), listed as Alg.4, likewise tests if (p >= n ||
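The bound can be checked with a small sketch. This is not Bentley's listing: it is a Python rendering of the same idea, with the unrolled steps replaced by a loop, and the name search1000 is made up for illustration. It assumes x is sorted and holds exactly 1000 elements.

```python
def search1000(x, t):
    """Binary search over a sorted list x of exactly 1000 items (sketch)."""
    l = -1
    if x[511] < t:
        l = 1000 - 512          # l is now 488
    step = 256
    while step >= 1:            # steps 256, 128, ..., 1 (unrolled in the book)
        if x[l + step] < t:
            l += step
        step //= 2
    p = l + 1                   # l is at most 999, so p can be exactly 1000
    if p >= 1000 or x[p] != t:  # ">=" keeps x[p] from being read at p == 1000
        p = -1
    return p
```

When t is larger than every element, l ends at 488 + 511 = 999 and p becomes 1000. With the test written as p > 1000, the program would go on to read x[1000], which is outside the array.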

Related

Product of consecutive numbers f(n) = n(n-1)(n-2)(n-3)(n- ...) find the value of n

Is there a way to find programmatically the consecutive natural numbers?
On the Internet I found some examples using either factorization or polynomial solving.
Example 1
For n(n−1)(n−2)(n−3) = 840
n = 7, -4, (3+i√111)/2, (3-i√111)/2
Example 2
For n(n−1)(n−2)(n−3) = 1680
n = 8, −5, (3+i√159)/2, (3-i√159)/2
Both of those examples give 4 results (because both are 4th degree equations), but for my use case I'm only interested in the natural value. Also, the solution should work for any number of consecutive factors, in other words, n(n−1)(n−2)(n−3)(n−4)...
The solution can be an algorithm or come from any open math library. The parameters passed to the algorithm will be the product and the degree (number of factors); for those two examples the product is 840 or 1680 and the degree is 4 for both.
Thank you
If you're interested only in a natural "n" solution, then this reasoning may help:
Let's say n(n-1)(n-2)(n-3)...(n-k) = A
The solution n = s then satisfies:
remainder of A/s = 0
remainder of A/(s-1) = 0
remainder of A/(s-2) = 0
and so on
Now, s is of the order of t = A^(1/(k+1)), because A is similar to s*s*s*s*s... (k+1 times). More precisely, (s-k)^(k+1) < A < s^(k+1), so t < s < t+k. So we can start with v = t and finish at v = t+k+1. The solution will be between these two values.
Divisibility alone is necessary but not sufficient (840 is divisible by each of 3, 4, 5 and 6, yet 3*4*5*6 = 360), so a candidate has to be confirmed by multiplying its factors out.
So the algo may be, roughly:
s = 0
t = (int) (A^(1/(k+1))) // truncated; the range below absorbs the rounding
theLoop:
for (v = t to v = t+k+1, step = +1)
{ i = 0
  while ( i <= k )
  { if (v - k + i < 1 or A % (v - k + i) > 0 ) // % operator to find the remainder
      continue at theLoop
    i = i+1
  }
  // All are valid divisors; confirm by multiplying them out
  if ((v-k) * (v-k+1) * ... * v == A)
  { s = v
    break
  }
}
if (s == 0)
  no natural solution
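A bounded search of this kind can be sketched in Python as follows. The function name find_n and its parameters are illustrative assumptions, not part of the answer above. It takes the product and the degree (number of factors), scans a small window around the degree-th root, and accepts a candidate only if its factors multiply back to the product. It assumes Python 3.8+ for math.prod and a product small enough to convert to a float.

```python
from math import prod

def find_n(product, degree):
    """Return natural n with n*(n-1)*...*(n-degree+1) == product, else None."""
    k = degree - 1
    # The degree-th root lies strictly between n-k and n, so n is within
    # k of it; one extra step on each side absorbs floating-point rounding.
    t = int(round(product ** (1.0 / degree)))
    for v in range(max(degree, t - 1), t + k + 2):
        if prod(range(v - k, v + 1)) == product:  # exact integer check
            return v
    return None

print(find_n(840, 4))   # 7
print(find_n(1680, 4))  # 8
```

Comparing the exact integer product, rather than testing divisibility, is what rules out near misses such as 3*4*5*6 for 840.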
Assuming that:
n is an integer, and
n > 0, and
k < n
Then approximately:
n = FLOOR( product ** (1/(k+1)) + (k+1)/2 )
The only cases I have found where this isn't exactly right are when k is very close to n. You can of course check it by back-calculating the product and seeing if it matches. If not, n is almost certainly only 1 or 2 higher than this estimate, so just keep incrementing n until the product matches. (I can write this up in pseudocode if you need it.)
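The estimate-then-correct procedure might look like this in Python. It is a sketch under the assumption that k is one less than the number of factors (which the exponent 1/(k+1) suggests); the name solve_from_estimate is hypothetical, and math.prod needs Python 3.8+.

```python
from math import floor, prod

def solve_from_estimate(product, k):
    """Estimate n, then increment until n*(n-1)*...*(n-k) matches or overshoots."""
    n = floor(product ** (1.0 / (k + 1)) + (k + 1) / 2)
    while True:
        p = prod(range(n - k, n + 1))  # back-calculate the product for this n
        if p == product:
            return n
        if p > product:                # overshot: no natural solution
            return None
        n += 1

print(solve_from_estimate(840, 3))  # 7
print(solve_from_estimate(720, 5))  # 6 (estimate is 5, one increment needed)
```

The second call is one of the "k very close to n" cases: the formula gives 5, and a single increment reaches 6.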

Sum of Fibonacci even number

I'm currently working on small Ruby projects from the Project Euler site. I was given a task to sum the even Fibonacci numbers that are less than 4 million. Unfortunately there is a small bug in my code, because when I change the limit, e.g. to 100, it prints 188 instead of 44. Surprisingly, this program gives the right answer for the original task, but I don't really know in what way my code is wrong.
a=[]; a[0]=1; a[1]=1;
i = 1
while a[-1] < 608
  a[i+1]=(a[i] + a[i-1])
  i +=1
end
x = 0
a.each do |num|
  if num % 2 == 0
    x += num
  end
end
print "The sum of even Fibonacci number is: #{x}"
The problem shows up in the second loop. You stop generating Fibonacci numbers only after one of them has crossed the limit (i.e. when the last number is > 100), so that last number is still in the array.
It turns out that after the generation step the array is [1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144], which explains your wrong result: 188 = 144 + 44.
So, your code works only when the last element generated is odd, which is the case in the Euler problem. To correct that, change your second loop from a.each do ... end to a[0...-1].each do ... end in order to iterate through the array except for the last element.
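The same point can be illustrated in Python with a sketch that tests each number against the limit before adding it, so the first number past the limit is never counted. The function name even_fib_sum is made up for the example.

```python
def even_fib_sum(limit):
    """Sum the even Fibonacci numbers strictly below limit."""
    total = 0
    a, b = 1, 1
    while b < limit:        # checked before b is used, so 144 is excluded for limit 100
        if b % 2 == 0:
            total += b
        a, b = b, a + b
    return total

print(even_fib_sum(100))        # 44
print(even_fib_sum(4_000_000))  # 4613732
```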
BTW, I would recommend not using an array here.
You are just wasting memory, and Ruby is losing time on extending it (this can be solved via Array.new(ARRAY_SIZE)).
Since you don't actually need to keep the Fibonacci sequence, you can just have something like this:
LIMIT = 4_000_000
a = 1
b = 1
next_number = a + b
sum = 0
while next_number < LIMIT
  sum += next_number if next_number.even?
  a = b
  b = next_number
  next_number = a + b # or next_number += a
end
UPD. Oh my god I don't know why this question appeared in my feed. Sorry for necroposting:)

Homework: Implementing Karp-Rabin; For the hash values modulo q, explain why it is a bad idea to use q as a power of 2?

I have a two-fold homework problem, Implement Karp-Rabin and run it on a test file and the second part:
For the hash values modulo q, explain why it is a bad idea to use q as a power of 2. Can you construct a terrible example, e.g. for q=64 and n=15?
This is my implementation of the algorithm:
def karp_rabin(text, pattern):
    # setup
    alphabet = 'ACGT'
    d = len(alphabet)
    n = len(pattern)
    d_n = d**n
    q = 2**32-1
    m = {char:i for i,char in enumerate(alphabet)}
    positions = []
    def kr_hash(s):
        return sum(d**(n-i-1) * m[s[i]] for i in range(n))
    def update_hash():
        return d*text_hash + m[text[i+n-1]] - d_n * m[text[i-1]]
    pattern_hash = kr_hash(pattern)
    for i in range(0, len(text) - n + 1):
        text_hash = update_hash() if i else kr_hash(text[i:n])
        if pattern_hash % q == text_hash % q and pattern == text[i:i+n]:
            positions.append(i)
    return ' '.join(map(str, positions))
...The second part of the question is referring to this part of the code/algo:
    pattern_hash = kr_hash(pattern)
    for i in range(0, len(text) - n + 1):
        text_hash = update_hash() if i else kr_hash(text[i:n])
        # the modulo q used to check if the hashes are congruent
        if pattern_hash % q == text_hash % q and pattern == text[i:i+n]:
            positions.append(i)
I don't understand why it would be a bad idea to use q as a power of 2. I've tried running the algorithm on the test file provided(which is the genome of ecoli) and there's no discernible difference.
I tried looking at the formula for how the hash is derived (I'm not good at math) trying to find some common factors that would be really bad for powers of two but found nothing. I feel like if q is a power of 2 it should cause a lot of clashes for the hashes so you'd need to compare strings a lot more but I didn't find anything along those lines either.
I'd really appreciate help on this since I'm stumped. If someone wants to point out what I can do better in the first part (code efficiency, readability, correctness etc.) I'd also be thrilled to hear your input on that.
There is a problem if q divides some power of d, because then only a few characters contribute to the hash. For example in your code d=4, if you take q=64 only the last three characters determine the hash (d**3 = 64).
I don't really see a problem if q is a power of 2 but gcd(d,q) = 1.
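A small sketch makes the q=64, n=15 case concrete. It uses the same ACGT encoding as the question but reduces modulo q at every step; the helper name kr_hash_mod and the comparison modulus 61 are choices made for this illustration only.

```python
def kr_hash_mod(s, q, alphabet="ACGT"):
    """Polynomial hash of s in base len(alphabet), reduced modulo q."""
    d = len(alphabet)
    code = {ch: i for i, ch in enumerate(alphabet)}
    h = 0
    for ch in s:
        h = (h * d + code[ch]) % q
    return h

a = "A" * 12 + "CGT"   # two 15-character strings that differ in 12 positions
b = "T" * 12 + "CGT"   # but share their last three characters

print(kr_hash_mod(a, 64), kr_hash_mod(b, 64))   # 27 27: a collision
print(kr_hash_mod(a, 61) == kr_hash_mod(b, 61)) # False with an odd prime modulus
```

With d=4 and q=64, every character except the last three is multiplied by a multiple of 4**3 = 64 and vanishes, so any two 15-mers ending in the same three characters collide.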
Your implementation looks a bit strange because instead of
if pattern_hash % q == text_hash % q and pattern == text[i:i+n]:
you could also use
if pattern_hash == text_hash and pattern == text[i:i+n]:
which would be better because you get fewer collisions.
The Thue–Morse sequence has among its properties that its polynomial hash quickly becomes zero when the hash modulus is a power of 2, for whatever polynomial base (d). So if you try to search for a short Thue-Morse sequence in a longer one, you will get a great many hash collisions.
For example, your code, slightly adapted:
def karp_rabin(text, pattern):
    # setup
    alphabet = '01'
    d = 15
    n = len(pattern)
    d_n = d**n
    q = 32
    m = {char:i for i,char in enumerate(alphabet)}
    positions = []
    def kr_hash(s):
        return sum(d**(n-i-1) * m[s[i]] for i in range(n))
    def update_hash():
        return d*text_hash + m[text[i+n-1]] - d_n * m[text[i-1]]
    pattern_hash = kr_hash(pattern)
    for i in range(0, len(text) - n + 1):
        text_hash = update_hash() if i else kr_hash(text[i:n])
        if pattern_hash % q == text_hash % q : #and pattern == text[i:i+n]:
            positions.append(i)
    return ' '.join(map(str, positions))
print(karp_rabin('0110100110010110100101100110100110010110011010010110100110010110', '0110100110010110'))
outputs a lot of positions, although only three of them are proper matches.
Note that I have dropped the and pattern == text[i:i+n] check. Obviously, if you restore it, the result will be correct, but it is also obvious that the algorithm will do much more work checking this additional condition than for other q. In fact, because there are so many collisions, the whole idea of the algorithm stops working: you could almost as effectively write a simple algorithm that checks every position for a match.
Also note that your implementation is quite strange. The whole idea of polynomial hashing is to take the modulo operation each time you compute the hash. Otherwise your pattern_hash and text_hash are very big numbers. In other languages this might mean arithmetic overflow, but in Python this will invoke big integer arithmetic, which is slow and once again defeats the whole idea of the algorithm.
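As a rough sketch of what reducing at every step could look like, here is a variant that keeps all intermediate values below q. It is not the original poster's code: the name karp_rabin_mod, the default modulus 2**31 - 1 (a prime) and the list return type are assumptions made for the example.

```python
def karp_rabin_mod(text, pattern, alphabet="ACGT", q=2**31 - 1):
    """Return start positions of pattern in text, hashing modulo q throughout."""
    d = len(alphabet)
    n = len(pattern)
    code = {ch: i for i, ch in enumerate(alphabet)}
    if n == 0 or n > len(text):
        return []
    high = pow(d, n - 1, q)          # weight of the window's first character, mod q

    def full_hash(s):
        h = 0
        for ch in s:
            h = (h * d + code[ch]) % q
        return h

    pattern_hash = full_hash(pattern)
    window_hash = full_hash(text[:n])
    positions = []
    for i in range(len(text) - n + 1):
        # compare strings only when the hashes agree
        if window_hash == pattern_hash and text[i:i + n] == pattern:
            positions.append(i)
        if i + n < len(text):
            # drop text[i], shift by one digit, append text[i + n], all mod q
            window_hash = ((window_hash - code[text[i]] * high) * d
                           + code[text[i + n]]) % q
    return positions

print(karp_rabin_mod("ACGTACGT", "CGT"))  # [1, 5]
```

Because Python's % always returns a non-negative result for a positive modulus, the subtraction in the update needs no extra correction.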

Division Algorithm

I've just started a Design and Analysis of Algorithms course and we've begun with simple algorithms.
There is a division algorithm which I can't make any sense of.
function divide(x,)
Input: 2 integers x and y where y>=1
Output: quotient and remainder of x divided by y
if x=0: return (q,r)=(0,0)
(q,r)=divide(floor (x/2), y)
q=2q, r=2r
if x is odd: r=r+1
if r>=y: r=r-y, q=q+1
return(q,r)
* floor is lower bound
We were supposed to try this algo for 110011%101 ( binary values )...I tried something and I got a weird answer...converted into decimal values and it was wrong.
So I tried it using simple decimal values instead of binary first.
x=25, y=5
This is what I'm doing
1st: q=x,r= 12,5
2nd: q=x,r= 6,5
3rd: q=x,r= 3,5
4th: q=x,r= 1,5
5th: q=x,r= 0,5
How will this thing work? Everytime I will run it, the last value of last x will be 0(condition) it will stop and return q=0,r=0
Can someone guide me where I'm going wrong...
Thanks
I implemented your algorithm (with obvious correction in the arg list) in Ruby:
$ irb
irb(main):001:0> def div(x,y)
irb(main):002:1> return [0,0] if x == 0
irb(main):003:1> q,r = div(x >> 1, y)
irb(main):004:1> q *= 2
irb(main):005:1> r *= 2
irb(main):006:1> r += 1 if x & 1 == 1
irb(main):007:1> if r >= y
irb(main):008:2> r -= y
irb(main):009:2> q += 1
irb(main):010:2> end
irb(main):011:1> [q,r]
irb(main):012:1> end
=> nil
irb(main):013:0> div(25, 5)
=> [5, 0]
irb(main):014:0> div(25, 2)
=> [12, 1]
irb(main):015:0> div(144,12)
=> [12, 0]
irb(main):016:0> div(144,11)
=> [13, 1]
It's working, so you must not be tracking the recursion properly when you're trying to hand-trace it. I find it helpful to write the logic out on a new sheet of paper for each recursive call and place the old sheet of paper on top of a stack of prior calls. When I get to a return statement on the current sheet, wad it up, throw it away, and write the return value in place of the recursive call on the piece of paper on top of the stack. Carry through with the logic on this sheet until you get to another recursive call or a return. Keep repeating this until you run out of sheets on the stack - the return from the last piece of paper is the final answer.
The function has a recursive structure, which might be why it's a bit tricky. I'm assuming there's a typo in your function declaration where divide(x,) should be divide(x, y). Given that the desired result is x/y with the remainder, let's continue. The first line in the function definition says that if the numerator is 0, return a quotient of 0 with a remainder of 0. This makes sense: for any integer b != 0, 0 / b = 0. Then we set the result to a recursive call with half the original numerator (rounded down) and the same denominator. At some point, "half the original numerator" turns into 0 and the base case is reached. There's a bit of computation after each recursive call returns, so this is not tail recursion: the real work happens while the calls unwind. Because we divided by 2 on each deepening, we multiply by 2 to get back to the original scale and add 1 to the remainder if the numerator was odd. It's hard to visualize in text alone, so step through it on paper with a given problem.
Mathematically, the division algorithm (it's called that) states that the remainder must be non-negative and strictly less than the divisor, so less than 5 when you input 25,5.
After doubling, the remainder can reach or exceed y (here it reaches 5 in the last step), which is why the algorithm needs the check on the size of the remainder: if r>=y, it subtracts y and adds 1 to the quotient, giving 5, 0.
function divide(x,)
Input: 2 integers x and y where y>=1
Output: quotient and remainder of x divided by y
if x=0: return (q,r)=(0,0)
(q,r)=divide(floor (x/2), y)
q=2q, r=2r
if x is odd: r=r+1
if r>=y: r=r-y, q=q+1
return(q,r)
* floor is lower bound
If I remember correctly, this is one of the most basic ways of doing integral division in a simple ALU. It's nice because you can run all the recursive divisions in parallel, since each division is based on just looking at one less bit of the binary.
To understand what this does, simply walk through it on paper, as Chris Zhang suggested. Here's what divide(25,5) looks like:
(x,y)=(25,5)
divide(12, 5)
    divide(6,5)
        divide(3,5)
            divide(1,5)
                divide(0,5) // x = 0!
                    return(0,0)
                (q,r)=(2*0,2*0)
                x is odd, so (q,r)=(0,1)
                r < y
                return(0,1)
            (q,r)=(2*0,2*1)
            x is odd, so (q,r)=(0,3)
            r < y
            return(0,3)
        (q,r)=(2*0,2*3)
        x is even
        r >= y, so (q,r)=(1,1)
        return(1,1)
    (q,r)=(2*1,2*1)
    x is even
    r < y
    return(2,2)
(q,r)=(2*2,2*2)
x is odd, so (q,r)=(4,5)
r >= y, so (q,r)=(5,0)
return(5,0)
As you can see, it works: it gives you a q of 5 and an r of 0. The part you noticed, that you'll always eventually have a 0 term, is what Chris properly calls "the base case": the case that makes the recursive calls unwind.
This algorithm works with any base number for the division and the multiplication. It uses the same principle as the following: "123 / 5 = (100 + 20 + 3) / 5 = 20 + 4 + r3 = 24r3", just done in binary.
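For anyone who wants to experiment beyond hand tracing, the pseudocode translates almost line for line into Python. This is a sketch that assumes the second parameter is y, as the answers above do, and it is run here on the binary exercise from the question.

```python
def divide(x, y):
    """Return (quotient, remainder) of x divided by y, for x >= 0 and y >= 1."""
    if x == 0:
        return 0, 0
    q, r = divide(x // 2, y)   # solve the problem for floor(x/2) first
    q, r = 2 * q, 2 * r        # scale the result back up
    if x % 2 == 1:             # x is odd: restore the bit dropped by the floor
        r += 1
    if r >= y:                 # keep the remainder below the divisor
        r -= y
        q += 1
    return q, r

print(divide(25, 5))              # (5, 0)
print(divide(0b110011, 0b101))    # (10, 1), i.e. 51 = 5*10 + 1
```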

Ruby challenge: experienced developers opinions

I'm working on some ruby problems geared towards new developers, but I would like the opinions of experienced developers on this. Sorry for the long post, and I really appreciate your time and opinions.
Problem Question
Write a function, nearest_larger(arr, i) which takes an array and an
index. The function should return another index, j: this should
satisfy:
(a) arr[i] < arr[j], AND
(b) there is no j2 closer to i than j where arr[i] < arr[j2].
In case of ties (see example below), choose the earliest (left-most)
of the two indices. If no number in arr is larger than arr[i],
return nil.
Difficulty: 2/5
Rspec Test
describe "#nearest_larger" do
  it "handles a simple case to the right" do
    nearest_larger([2,3,4,8], 2).should == 3
  end
  it "handles a simple case to the left" do
    nearest_larger([2,8,4,3], 2).should == 1
  end
  it "treats any two larger numbers like a tie" do
    nearest_larger([2,6,4,8], 2).should == 1
  end
  it "should choose the left case in a tie" do
    nearest_larger([2,6,4,6], 2).should == 1
  end
  it "handles a case with an answer > 1 distance to the left" do
    nearest_larger([8,2,4,3], 2).should == 0
  end
  it "handles a case with an answer > 1 distance to the right" do
    nearest_larger([2,4,3,8], 1).should == 3
  end
  it "should return nil if no larger number is found" do
    nearest_larger( [2, 6, 4, 8], 3).should == nil
  end
end
Solution
def nearest_larger arr, idx
  diff = 1
  loop do
    l = idx - diff
    r = idx + diff
    return l if (l >= 0) && (arr[l] > arr[idx])
    return r if (r < arr.length) && (arr[r] > arr[idx])
    return nil if (l < 0) && (r >= arr.length)
    diff += 1
  end
end
Feedback
How would you go about working towards a solution for this problem? (what's your process?)
In your opinion, do you find the Problem Question clear and easy to understand?
How long should it take you to solve this problem? (10min, 20min, ...?)
Do you agree with the level of difficulty? (Keep in mind this is geared towards new developers)
If willing: please post your own solution, showcasing your style of solving this problem.
I decided to post this question because I know how easy it can be for new developer to get stuck on a problem and not know what to write first. I'm hoping your responses will give an insight on how you would work through a problem that you perceive as a challenge.
I am not an experienced developer, or even an inexperienced one, but I will give you my thoughts anyway.
1. How would you go about working towards a solution for this problem? (what's your process?)
I would look to break it into pieces, but surely everyone does that. For example, here the values in the array are only used to pull out the indices of elements that are larger, so I'd see the first problem as pulling out the indices and the second problem as dealing with the indices alone. I'd further simplify the latter by subtracting i from each index so that j can be compared to k like so: if j.abs < k.abs ..., rather than if (j-i).abs < (k-i).abs.... In choosing among different approaches, I tend to look for the one that is most easily understood ("reads best").
2. In your opinion, do you find the Problem Question clear and easy to understand?
Yes.
3. How long should it take you to solve this problem?
I refuse to answer on the grounds that it would surely incriminate me.
4. Do you agree with the level of difficulty?
It seems about right. It would be a "beginner" problem at rubeque.com.
5. If willing: please post your own solution, showcasing your style of solving this problem.
Sure.
def nearest_larger(arr, i)
  ret = nearest_to_zero( arr.each_with_index
                            .select { |e,j| e > arr[i] }
                            .map { |_,j| j-i } )
  ret ? ret + i : nil
end
I looked at two ways of writing nearest_to_zero(). The first is short, direct and clear, but inefficient, using sort!:
def nearest_to_zero(a)
  a.sort! { |j,k| (j.abs == k.abs) ? j <=> k : j.abs <=> k.abs }
  a.any? ? a.first : nil
end
More efficient, but not as pretty:
def nearest_to_zero(a)
  neg, pos = a.partition { |e| e < 0 }
  case
  when neg.empty?
    pos.empty? ? nil : pos.first
  when pos.empty?
    neg.last
  else
    # compare the closest candidate on each side; a tie goes to the left
    pos.first.abs < neg.last.abs ? pos.first : neg.last
  end
end
For arr = [2,5,4,8,10], i = 2, the following steps are performed by nearest_larger():
a = arr.each_with_index.select { |e,j| e > arr[i] } # => [[5,1],[8,3],[10,4]]
b = a.map { |_,j| j-i } # => [-1,1,2]
ret = nearest_to_zero(b) # => -1
ret ? ret + i : nil # => 1
In the first nearest_to_zero(), if two indices have equal absolute value (meaning they are equally close to i before the transformation), the tie goes to the index with the lower value; otherwise it is the index with the smaller absolute value.
In the second nearest_to_zero():
neg, pos = [-1,1,2].partition {|e| e < 0} # => [[-1],[1,2]]
The rest should be self-explanatory.
I had read about rspec, but had not used it before. It was about time that I did. My code passed.
How would you go about working towards a solution for this problem? (what's your process?)
Start with a simple example, e.g. one of the tests. Working through it shows that if the array element arr[i-1] is greater than arr[i], then you can immediately return i-1 as the answer. So you can just check in succession: i-1, i+1, i-2, i+2, i-3, i+3 etc. and return the first index that satisfies the inequality.
In your opinion, do you find the Problem Question clear and easy to understand?
Yes; the tests help but it only confirmed my understanding from the worded problem.
How long should it take you to solve this problem? (10min, 20min, ...?)
For a student in a test/classroom environment, no more than 10min. Depending on how much preparatory material they have had before this, maybe even less.
Do you agree with the level of difficulty? (Keep in mind this is geared towards new developers)
Yes, 2/5 seems right.
If willing: please post your own solution, showcasing your style of solving this problem.
def nearest_larger( a, i )
  2.upto([i,a.length-i].max << 1) do |k|
    j = (k&1).zero? ? i - (k>>1) : i + (k>>1)
    return j if 0 <= j && j < a.length && a[i] < a[j]
  end
  return nil
end
Addendum: Thinking in Bits
This addendum will go through in greater detail the problem solving that went into the above solution for the benefit of new programmers.
As was mentioned in the answer to Question #1 above, the return value of nearest_larger is the first index j for which a[i] < a[j] as j iterates through the sequence
i-1, i+1, i-2, i+2, i-3, i+3, ...
This opens the way to a sub-problem, which is how to generate this sequence of numbers. When actually writing the program, I used comments as a "scratch pad", and in the code had something like this:
# -1, 1, -2, 2, -3, 3, ... (Sequence G)
from which the prior sequence is constructed by just adding i to each term. Call this Sequence G. Now this is where a "binary intuition" would come into play. Consider a simple sequence of binary numbers that increases by one after each term, shown in Column A, and the familiar decimal representation is shown in Column B:
A     B   C    D   E    F
----------------------------
0000  0   000  0   0    0
0001  1   000  1   0    0
0010  2   001  0   1   -1
0011  3   001  1   1    1
0100  4   010  0   2   -2
0101  5   010  1   2    2
0110  6   011  0   3   -3
0111  7   011  1   3    3
Now split the bits in each number into two groups: all the bits other than bit 0 (the right-most bit) as shown in Column C, and bit 0 shown in Column D. In other words, concatenate C and D to get A. The decimal representation of C is in column E. Notice that column D conveniently flips between 0 and 1, just as in Sequence G the numbers flip between negative and positive. We will use this to construct column F, which is the same as E, except when D is 0 make F negative. Finally, if we just start in the above table at A=0010 (or B=2) then Column F gives us the above Sequence G.
So now how do we get Column F from A in code? This is where bit operations come in to play.
C = A >> 1 - The >> right-shift operator shifts the bits on the LHS (left-hand side) by RHS (right-hand side). In this case, each value A is shifted to the right one place. The right-most bit is lost. Mathematically, it is the same as dividing by 2 and dropping the remainder in this case (B/2 == E with remainder dropped.)
D = A & 1 - The & is the bitwise AND operator. By "anding" A with 1, we select only bit 0. This gives us Column D.
Putting this together in the code, we'll have k be the iteration variable that starts at 2 and increments by 1 each time. Then the above analysis gives us j:
j = (k&1).zero? ? i - (k>>1) : i + (k>>1)
The first value for j which is both in bounds and for which a[i] < a[j] holds is automatically the answer, so it can be returned immediately:
return j if 0 <= j && j < a.length && a[i] < a[j]
Finally, if there are no valid values for j, then return nil. Other than calculating a tighter upper bound for k, which is left as a homework problem, that is the entirety of the nearest_larger function.
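For readers who want to try the bit manipulation outside Ruby, the same loop can be sketched in Python. This is a transliteration for illustration, not part of the original answer; None plays the role of nil, and the calls shown are cases from the rspec test in the question.

```python
def nearest_larger(a, i):
    """Return the nearest index j with a[i] < a[j], preferring the left on ties."""
    # k = 2, 3, 4, 5, ... yields offsets -1, +1, -2, +2, ... (Sequence G)
    for k in range(2, 2 * max(i, len(a) - i) + 1):
        offset = k >> 1                                   # Column E: drop bit 0
        j = i - offset if (k & 1) == 0 else i + offset    # Column D picks the sign
        if 0 <= j < len(a) and a[i] < a[j]:
            return j
    return None

print(nearest_larger([2, 6, 4, 6], 2))  # 1 (tie, left wins)
print(nearest_larger([2, 4, 3, 8], 1))  # 3
print(nearest_larger([2, 6, 4, 8], 3))  # None
```

Even values of k are visited before the odd value with the same offset, which is what makes the left index win a tie.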
In actual practice, for a problem like this, a readable solution as posed in the OP is preferable since it is more clear and accessible to a wider group of programmers. This present approach was motivated by an opportunity to demonstrate the use of bit operations.
