Continuous Random Variable Probability - probability

f_X(x) = c-x^2 for -1 <= x <= 1
find P( X > 0 | X < 0.5)
So I've thought about it and I wanted to know if I can say that the probability can also be modeled as P(0 < X < 0.5). Am I going wrong with that thinking or is this right?
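For what it's worth, the definition of conditional probability gives P(X > 0 | X < 0.5) = P(0 < X < 0.5) / P(X < 0.5), so the two quantities agree only if P(X < 0.5) = 1. Here is a small Python 3 sketch that checks this with exact fractions. It assumes that c is fixed by normalisation (which gives c = 5/6); the helper name `prob` is made up for illustration. Note that with this c the expression c - x^2 dips below zero near x = ±1, so the density may have been mistyped in the question and the numbers should be treated as illustrative only.

```python
from fractions import Fraction

def prob(a, b, c):
    """Integral of (c - x^2) over [a, b], computed from the antiderivative."""
    a, b = Fraction(a), Fraction(b)
    return c * (b - a) - (b**3 - a**3) / 3

# Assumption: c is chosen so that the total mass on [-1, 1] equals 1.
c = Fraction(5, 6)
total = prob(-1, 1, c)

joint = prob(0, Fraction(1, 2), c)    # P(0 < X < 0.5)
given = prob(-1, Fraction(1, 2), c)   # P(X < 0.5)
conditional = joint / given           # P(X > 0 | X < 0.5)

print(total, joint, given, conditional)
```

Under that assumption the conditional probability is the joint probability divided by P(X < 0.5), not the joint probability itself.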

Related

Finding the number of solutions and the solutions in a given interval of a Linear Diophantine Equation

I recently studied linear Diophantine equations and found one solution using the extended Euclidean algorithm. But what if we are given a range of permissible values for x and y and asked to count the solutions in that range, and also to list them? I already looked at the explanation here, but could not follow it. Another approach, or an explanation of that approach in simpler terms, would be appreciated. Thanks.
I read about this on cp-algorithms. The code there was unclear to me, so I wrote my own version, which works correctly and is easier to understand. I checked its correctness by solving this problem on Codeforces. Here is the Python version of my code:
from math import ceil, gcd, floor


def GCD(a, b):
    # extended Euclid: returns (x, y) such that a*x + b*y == gcd(a, b)
    if b == 0:
        x = 1
        y = 0
        return (x, y)
    A, B = GCD(b, a % b)
    x = B
    y = A - B*(a//b)
    return (x, y)


def find_ans(a, b, c, minx, maxx, miny, maxy):
    # returns the number of solutions of a*x + b*y = c inside the given ranges
    # (assumes a != 0 and b != 0)
    if c % gcd(abs(a), abs(b)) != 0:
        return 0
    sb = b//abs(b)  # sign of b
    sa = a//abs(a)  # sign of a
    x, y = GCD(abs(a), abs(b))  # a solution of |a|*x + |b|*y = g
    x *= sa  # adjusting the sign of x
    y *= sb  # adjusting the sign of y
    g = gcd(abs(a), abs(b))
    x *= c//g
    y *= c//g
    # all solutions have the form (x + k*b/g, y - k*a/g) for an integer k
    # note: the bounds below use float division, so very large inputs may lose precision
    # lk1 (left_k) = lower bound for k due to [minx , maxx]
    lk1 = (minx-x)*g/b  # x+k*b/g >= minx --> k >= (minx-x)*g/b (if b>0)
    # rk1 (right_k) = upper bound for k due to [minx , maxx]
    rk1 = (maxx-x)*g/b  # x+k*b/g <= maxx --> k <= (maxx-x)*g/b (if b>0)
    # so far we have assumed that b>0, giving (minx-x)*g/b <= k <= (maxx-x)*g/b
    # if b<0, then (minx-x)*g/b >= k >= (maxx-x)*g/b, so the two bounds swap
    if sb == -1:
        lk1, rk1 = rk1, lk1
    # for example if lk1 = 1.5, we have k >= 2 (k must be an integer), so use ceil for the lower bound
    lk1 = ceil(lk1)
    # for example if rk1 = 10.5, we have k <= 10, so use floor for the upper bound
    rk1 = floor(rk1)
    # we do the same thing for y, whose step is -a/g
    rk2 = (y-miny)*g/a  # y-k*a/g >= miny --> k <= (y-miny)*g/a (if a>0)
    lk2 = (y-maxy)*g/a  # y-k*a/g <= maxy --> k >= (y-maxy)*g/a (if a>0)
    if sa == -1:
        lk2, rk2 = rk2, lk2
    lk2 = ceil(lk2)
    rk2 = floor(rk2)
    #### finding the interval ####
    lans = max(lk1, lk2)  # lower bound of interval
    rans = min(rk1, rk2)  # upper bound of interval
    # rans < lans happens when [lk1, rk1] and [lk2, rk2] do not intersect:
    # ( lk1 ------ rk1   lk2 ------ rk2 ) or ( lk2 ------ rk2   lk1 ------ rk1 )
    if rans < lans:
        return 0
    return rans-lans+1
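If you want to convince yourself that a counting routine like the one above is right, a brute-force cross-check on small ranges is handy. The following Python 3 sketch is only a test harness, not an efficient method: it walks every x in the range and checks whether a matching integer y exists. The function name `count_solutions_bruteforce` is made up here, and it assumes b != 0.

```python
def count_solutions_bruteforce(a, b, c, minx, maxx, miny, maxy):
    """Count integer pairs (x, y) with a*x + b*y == c inside the given ranges.

    Assumes b != 0. Runs in O(maxx - minx), so only use it on small ranges.
    """
    count = 0
    for x in range(minx, maxx + 1):
        rest = c - a * x
        # y must be an integer, so b has to divide the remainder exactly
        if rest % b == 0 and miny <= rest // b <= maxy:
            count += 1
    return count


# 2x + 3y = 12 with 0 <= x <= 6 and 0 <= y <= 4
print(count_solutions_bruteforce(2, 3, 12, 0, 6, 0, 4))
```

Comparing its result with the fast version on many random small inputs is a quick way to catch sign or rounding mistakes.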
I have implemented and tested a Linear Diophantine Equation solver in Java. The implementation is commented to make it easier to follow. It covers the following points:
Finding One Solution
Finding Multiple Solutions
Finding Solutions in a range
https://github.com/love1024/Algorithms-Library-In-Java/blob/main/src/Math/LinearDiophantine.java

What's an algorithm to get a number closest to a constant that can evenly (within a margin) divide into two other constants?

So let's say I have numbers A = 1483 and B = 635. My X = 100.0.
Let's say my allowed MARGIN is 10.0.
What's the best way to get the closest number to X (can be floating point) that can divide into A and B with a remainder that is less than MARGIN?
For an answer K: A % K <= MARGIN, B % K <= MARGIN, with K being as close to X as possible, for example |K - X| < 100.
Let's try and write the problem with mathematical notations.
What you have is Euclidean divisions:
A = Q1*X + R1
B = Q2*X + R2
You want to find the minimal |x| such that
A = Q1'*(X+x) + R1' , |R1'| <= M
B = Q2'*(X+x) + R2' , |R2'| <= M
To help you find such an x, you have relations like:
A = Q1*(X+x) + R1-Q1*x
B = Q2*(X+x) + R2-Q2*x
From here, you should first concentrate on how to solve the example you gave, then try and generalize.
1483 = 14*100 + 83 = 15*100 - 17
635 = 6*100 + 35 = 7*100 - 65
Should you take x > 0 in order to reduce R2 (35) down to 10, or x < 0 to increase R1 (-17) up to -10?
In the first case, x should be in interval [25/6 , 45/6] to bring |R2'| <= M, but at the same time it must be in interval [73/14 , 93/14] to bring |R1'| <= M.
Do these intervals overlap?
If yes, you have a solution.
If no, then you have to try further (decrement quotients Q1' and/or Q2').
Just check with any decent interpreter (Squeak/Pharo Smalltalk here)
{25/6 . 45/6. 73/14 . 93/14} sorted
= {(25/6) . (73/14) . (93/14) . (15/2)}
So they overlap, starting at x=73/14.
But maybe you would get a closer x in the other direction?
I have not given an algorithm, just a clue, up to you to continue. But you see that increment does not have to be random (like 0.001).
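To make the interval check above concrete, here is a minimal Python 3 sketch using exact fractions. It only covers the single case worked through above (x > 0, quotients kept at Q1 = 14 and Q2 = 6); it is not a complete search over all quotients. The helper names `shift_interval` and `overlap` are made up for illustration.

```python
from fractions import Fraction

def shift_interval(r, q, m):
    """Range of x for which |r - q*x| <= m, assuming q > 0."""
    return Fraction(r - m, q), Fraction(r + m, q)

def overlap(i1, i2):
    """Intersection of two closed intervals, or None if they are disjoint."""
    lo = max(i1[0], i2[0])
    hi = min(i1[1], i2[1])
    return (lo, hi) if lo <= hi else None

A, B, X, M = 1483, 635, 100, 10
q1, r1 = divmod(A, X)   # 14, 83
q2, r2 = divmod(B, X)   # 6, 35

result = overlap(shift_interval(r1, q1, M), shift_interval(r2, q2, M))
print(result)           # the common range of x, if any

if result is not None:
    k = X + result[0]   # candidate K closest to X within this range
    print(float(k), float(A - q1 * k), float(B - q2 * k))
```

The same two helpers could be reused for other quotient pairs (and for x < 0) to compare candidates and keep the one with the smallest |x|.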
For now the best way I have found is a brute-force method: finding the GCD of A and B, decreasing by a small interval (0.001), and finding the smallest c(K) where K >= X and c(x) = A % x + B % x.
If I had found a way to differentiate c(x) correctly, I would have liked to find its gradient and use gradient descent to find the optimal value without brute force.

What kind of algorithm would find a grid of squares in a reasonable time?

I found a result that there is a grid of size 9x13 with following properties:
Every cell contains a digit in base 10.
One can read numbers from the grid by selecting a starting cell, moving toward one of its 8 neighbouring cells, keeping that direction, and concatenating the digits along the way.
For example, if we have the following grid:
340934433
324324893
455423343
Then one can select the leftmost upper number 3 and select direction to the right and down to read numbers 3, 32 and 325.
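To pin down the reading rule, here is a small Python 3 sketch that tests whether a given number can be read from a grid in one of the 8 directions. It is only a checker under my reading of the rule above (the function name `can_read` is made up), not a search for a grid.

```python
# The 8 directions as (row step, column step)
DIRECTIONS = [(-1, -1), (-1, 0), (-1, 1),
              (0, -1),           (0, 1),
              (1, -1),  (1, 0),  (1, 1)]

def can_read(grid, target):
    """True if the digit string `target` appears along a straight line in `grid`."""
    rows, cols = len(grid), len(grid[0])
    for r in range(rows):
        for c in range(cols):
            for dr, dc in DIRECTIONS:
                rr, cc = r, c
                for ch in target:
                    inside = 0 <= rr < rows and 0 <= cc < cols
                    if not inside or grid[rr][cc] != ch:
                        break
                    rr += dr
                    cc += dc
                else:
                    # every digit matched without leaving the grid
                    return True
    return False

grid = ["340934433",
        "324324893",
        "455423343"]

print(can_read(grid, "325"))   # diagonal from the top-left corner

# Squares of 1..100 that this small grid does NOT contain
missing = [i * i for i in range(1, 101) if not can_read(grid, str(i * i))]
print(len(missing))
```

A full solution would be a 9x13 grid for which `missing` comes out empty.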
Now one has to prove that there is a grid of size 9x13 from which one can read the squares of 1 to 100, i.e. all of the integers of the form i^2 where i=1,...,100.
The best grid I found on the net is of size 11x11, given in Solving a recreational square packing problem. But it looks hard to modify that program to search for a rectangular grid.
So what kind of algorithm would output a suitable grid in a reasonable time?
I just got a key error from this code:
import random, time, sys

N = 9
M = 13
K = 100

# These are the numbers we would like to pack
numbers = [str(i*i) for i in xrange(1, K+1)]

# Build the global list of digits (used for weighted random guess)
digits = "".join(numbers)

def random_digit(n=len(digits)-1):
    return digits[random.randint(0, n)]

# By how many lines each of the numbers is currently covered
count = dict((x, 0) for x in numbers)

# Number of actually covered numbers
covered = 0

# All lines in current position (row, cols, diags, counter-diags)
lines = (["*"*N for x in xrange(N)] +
         ["*"*M for x in xrange(M)] +
         ["*"*x for x in xrange(1, N)] + ["*"*x for x in xrange(N, 0, -1)] +
         ["*"*x for x in xrange(1, M)] + ["*"*x for x in xrange(M, 0, -1)])

# lines_of[x, y] -> list of line/char indexes
lines_of = {}
def add_line_of(x, y, L):
    try:
        lines_of[x, y].append(L)
    except KeyError:
        lines_of[x, y] = [L]
for y in xrange(N):
    for x in xrange(N):
        add_line_of(x, y, (y, x))
        add_line_of(x, y, (M + x, y))
        add_line_of(x, y, (2*M + (x + y), x - max(0, x + y - M + 1)))
        add_line_of(x, y, (2*M + 2*N-1 + (x + N-1 - y), x - max(0, x + (M-1 - y) - M + 1)))

# Numbers covered by each line
covered_numbers = [set() for x in xrange(len(lines))]

# Which numbers the string x covers
def cover(x):
    c = x + "/" + x[::-1]
    return [y for y in numbers if y in c]

# Set a matrix element
def setValue(x, y, d):
    global covered
    for i, j in lines_of[x, y]:
        L = lines[i]
        C = covered_numbers[i]
        newL = L[:j] + d + L[j+1:]
        newC = set(cover(newL))
        for lost in C - newC:
            count[lost] -= 1
            if count[lost] == 0:
                covered -= 1
        for gained in newC - C:
            count[gained] += 1
            if count[gained] == 1:
                covered += 1
        covered_numbers[i] = newC
        lines[i] = newL

def do_search(k, r):
    start = time.time()
    for i in xrange(r):
        x = random.randint(0, N-1)
        y = random.randint(0, M-1)
        setValue(x, y, random_digit())
    best = None
    attempts = k
    while attempts > 0:
        attempts -= 1
        old = []
        for ch in xrange(1):
            x = random.randint(0, N-1)
            y = random.randint(0, M-1)
            old.append((x, y, lines[y][x]))
            setValue(x, y, random_digit())
        if best is None or covered > best[0]:
            now = time.time()
            sys.stdout.write(str(covered) + chr(13))
            sys.stdout.flush()
            attempts = k
        if best is None or covered >= best[0]:
            best = [covered, lines[:N][:]]
        else:
            for x, y, o in old[::-1]:
                setValue(x, y, o)
    print
    sys.stdout.flush()
    return best

for y in xrange(N):
    for x in xrange(N):
        setValue(x, y, random_digit())

best = None
while True:
    if best is not None:
        for y in xrange(M):
            for x in xrange(N):
                setValue(x, y, best[1][y][x])
    x = do_search(100000, M)
    if best is None or x[0] > best[0]:
        print x[0]
        print "\n".join(" ".join(y) for y in x[1])
    if best is None or x[0] >= best[0]:
        best = x[:]
To create such a grid, I'd start with a list of strings representing the squares of the first K (100) numbers.
Reduce those strings as much as possible, where many are contained within others (for example, 625 contains 25, so 625 covers the squares of 5 and 25).
This should yield an initial list of 81 unique squares, requiring a minimum of about 312 digits:
def construct_optimal_set(K):
    # compute a minimal solution:
    numbers = [str(n*n) for n in range(0,K+1)]
    min_numbers = []
    # note: go in reverse direction, biggest to smallest, to maximize elimination of smaller numbers later
    while len(numbers) > 0:
        i = 0
        while i < len(min_numbers):
            q = min_numbers[i]
            qr = min_numbers[i][::-1]  # the same digits read in the opposite direction
            # check if the current (last) number is contained within any element of min_numbers
            if numbers[-1] in q or numbers[-1] in qr:
                break
            # check if any element of min_numbers is contained within the current number
            elif q in numbers[-1] or qr in numbers[-1]:
                min_numbers[i] = numbers[-1]
                break
            i += 1
        # if not found, add it
        if i >= len(min_numbers):
            min_numbers.append(numbers[-1])
        numbers = numbers[:-1]
    min_numbers.sort()
    return min_numbers
This will return a minimal set of squares, with any squares that are subsets of other squares removed. Extend this by concatenating any mostly-overlapping elements (such as 484 and 841 into 4841); I leave that as an exercise, since it will build familiarity with this code.
Then, you assemble these sort of like a cross-word puzzle. As you assemble the values, pack based on probability of possible future overlaps, by computing a weight for each digit (for example, 1's are fairly common, 9's are less common, so given the choice, you would favor overlapping 9's rather than 1's).
Use something like the following code to build a list of all possible values that are represented in the current grid. Use this periodically while building, in order to eliminate squares that are already represented, as well as to test whether your grid is a full solution.
def merge(digits):
    result = 0
    for i in range(len(digits)-1,-1,-1):
        result = result * 10 + digits[i]
    return result

def merge_reverse(digits):
    result = 0
    for i in range(0, len(digits)):
        result = result * 10 + digits[i]
    return result
# given a grid where each element contains a single numeric digit,
# return list of every ordering of those digits less than SQK,
# such that you pick a starting point and one of eight directions,
# and assemble digits until either end of grid or larger than SQK;
# this will construct only the unique combinations;
# also note that this will not construct a large number of values,
# since for any given direction, there are at most
# (sqrt(n*n + m*m))!
# possible arrangements, and there will rarely be that many.
def construct_lines(grid, k):
    # rather than build a dictionary type, use a little more memory to use faster simple array indexes;
    # index is #, and value at index indicates existence: 0 = does not exist, >0 means exists in grid
    sqk = k*k
    combinations = [0]*(sqk+1)
    # do all horizontals, since they are easiest
    for y in range(len(grid)):
        digits = []
        for x in range(len(grid[y])):
            digits.append(grid[y][x])
        # for every possible starting point...
        for q in range(1,len(digits)):
            number = merge(digits[q:])
            if number <= sqk:
                combinations[number] += 1
    # now do all verticals
    # note that if the grid is really square, grid[0] will give an accurate width of all grid[y][] rows
    for x in range(len(grid[0])):
        digits = []
        for y in range(len(grid)):
            digits.append(grid[y][x])
        # for every possible starting point...
        for q in range(1,len(digits)):
            number = merge(digits[q:])
            if number <= sqk:
                combinations[number] += 1
    # the longer axis (x or y) in both directions will contain every possible diagonal
    # e.g. x is the longer axis here (using random characters to more easily distinguish idea):
    # [1 2 3 4]
    # [a b c d]
    # [. , $ !]
    # 'a,' can be obtained by reversing the diagonal starting on the bottom and working up and to the left
    # this means that every set must be reversed as well
    if len(grid) > len(grid[0]):
        # for each y, grab top and bottom in each of two diagonal directions, for a total of four sets,
        # and include the reverse of each set
        for y in range(len(grid)):
            digitsul = [] # origin point upper-left, heading down and right
            digitsur = [] # origin point upper-right, heading down and left
            digitsll = [] # origin point lower-left, heading up and right
            digitslr = [] # origin point lower-right, heading up and left
            revx = len(grid[y])-1 # pre-adjust this for computing reverse x coordinate
            for deltax in range(len(grid[y])): # this may go off the grid, so check bounds
                if y+deltax < len(grid):
                    digitsul.append(grid[y+deltax][deltax])
                    digitsll.append(grid[y+deltax][revx - deltax])
                    for q in range(1,len(digitsul)):
                        number = merge(digitsul[q:])
                        if number <= sqk:
                            combinations[number] += 1
                        number = merge_reverse(digitsul[q:])
                        if number <= sqk:
                            combinations[number] += 1
                    for q in range(1,len(digitsll)):
                        number = merge(digitsll[q:])
                        if number <= sqk:
                            combinations[number] += 1
                        number = merge_reverse(digitsll[q:])
                        if number <= sqk:
                            combinations[number] += 1
                if y-deltax >= 0:
                    digitsur.append(grid[y-deltax][deltax])
                    digitslr.append(grid[y-deltax][revx - deltax])
                    for q in range(1,len(digitsur)):
                        number = merge(digitsur[q:])
                        if number <= sqk:
                            combinations[number] += 1
                        number = merge_reverse(digitsur[q:])
                        if number <= sqk:
                            combinations[number] += 1
                    for q in range(1,len(digitslr)):
                        number = merge(digitslr[q:])
                        if number <= sqk:
                            combinations[number] += 1
                        number = merge_reverse(digitslr[q:])
                        if number <= sqk:
                            combinations[number] += 1
    else:
        # for each x, ditto above
        for x in range(len(grid[0])):
            digitsul = [] # origin point upper-left, heading down and right
            digitsur = [] # origin point upper-right, heading down and left
            digitsll = [] # origin point lower-left, heading up and right
            digitslr = [] # origin point lower-right, heading up and left
            revy = len(grid)-1 # pre-adjust this for computing reverse y coordinate
            for deltay in range(len(grid)): # this may go off the grid, so check bounds
                if x+deltay < len(grid[0]):
                    digitsul.append(grid[deltay][x+deltay])
                    digitsll.append(grid[revy - deltay][x+deltay])
                    for q in range(1,len(digitsul)):
                        number = merge(digitsul[q:])
                        if number <= sqk:
                            combinations[number] += 1
                        number = merge_reverse(digitsul[q:])
                        if number <= sqk:
                            combinations[number] += 1
                    for q in range(1,len(digitsll)):
                        number = merge(digitsll[q:])
                        if number <= sqk:
                            combinations[number] += 1
                        number = merge_reverse(digitsll[q:])
                        if number <= sqk:
                            combinations[number] += 1
                if x-deltay >= 0:
                    digitsur.append(grid[deltay][x-deltay])
                    digitslr.append(grid[revy - deltay][x - deltay])
                    for q in range(1,len(digitsur)):
                        number = merge(digitsur[q:])
                        if number <= sqk:
                            combinations[number] += 1
                        number = merge_reverse(digitsur[q:])
                        if number <= sqk:
                            combinations[number] += 1
                    for q in range(1,len(digitslr)):
                        number = merge(digitslr[q:])
                        if number <= sqk:
                            combinations[number] += 1
                        number = merge_reverse(digitslr[q:])
                        if number <= sqk:
                            combinations[number] += 1
    # now filter for squares only
    return [i for i in range(0,k+1) if combinations[i*i] > 0]
Constructing the grid will be computationally expensive overall, but you will only need to run the check function once for each possible placement, to select the best placement.
Optimize placement by finding the subset of overlapping areas where you can place a sequence of numbers - this should be tolerable in terms of time required, because you can cap the number of possible locations to check; e.g. you might cap it at 10 (again, find the optimal number experimentally), such that you test the first 10 possible placements against the function above to determine which placement, if any, adds the most possible squares. As you progress, you will have fewer possible locations in which to insert the numbers, so testing which placement is best becomes computationally less expensive at the same time that your search for possible placements becomes more expensive, balancing out each other.
This will not handle all combinations, and will not pack as tightly as trying every possible arrangement and computing how many squares are covered, so some might be missed, but compared to O((N*M)!), this algorithm will actually complete in your lifetime (I'd actually estimate a few minutes on a decent computer - more if you parallelize the check for placement).

The "guess the number" game for arbitrary rational numbers?

I once got the following as an interview question:
I'm thinking of a positive integer n. Come up with an algorithm that can guess it in O(lg n) queries. Each query is a number of your choosing, and I will answer either "lower," "higher," or "correct."
This problem can be solved by a modified binary search, in which you list powers of two until you find one that exceeds n, then run a standard binary search over that range. What I think is so cool about this is that you can search an infinite space for a particular number faster than just brute-force.
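For reference, the integer version can be sketched in Python 3 roughly as follows. The oracle convention (1 for "higher", -1 for "lower", 0 for "correct") and the names `guess_integer` and `make_oracle` are my own choices for illustration.

```python
def guess_integer(query):
    """Find a secret positive integer n using O(lg n) calls to query(g).

    query(g) returns 1 if the secret is higher than g, -1 if it is lower,
    and 0 if g is correct.
    """
    # Phase 1: try powers of two until one is at least the secret
    hi = 1
    while True:
        ans = query(hi)
        if ans == 0:
            return hi
        if ans < 0:
            break
        hi *= 2
    # Phase 2: the secret lies strictly between hi // 2 and hi
    lo, hi = hi // 2 + 1, hi - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        ans = query(mid)
        if ans == 0:
            return mid
        if ans < 0:
            hi = mid - 1
        else:
            lo = mid + 1
    raise ValueError("inconsistent answers")


def make_oracle(secret):
    """Build a query function for a known secret, counting how often it is called."""
    calls = [0]
    def query(g):
        calls[0] += 1
        return (secret > g) - (secret < g)
    return query, calls


query, calls = make_oracle(1000000)
print(guess_integer(query), calls[0])
```

Both phases take about lg n queries each, which is where the O(lg n) bound comes from.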
The question I have, though, is a slight modification of this problem. Instead of picking a positive integer, suppose that I pick an arbitrary rational number between zero and one. My question is: what algorithm can you use to most efficiently determine which rational number I've picked?
Right now, the best solution I have can find p/q in at most O(q) time by implicitly walking the Stern-Brocot tree, a binary search tree over all the rationals. However, I was hoping to get a runtime closer to the runtime that we got for the integer case, maybe something like O(lg (p + q)) or O(lg pq). Does anyone know of a way to get this sort of runtime?
I initially considered using a standard binary search of the interval [0, 1], but this will only find rational numbers with a non-repeating binary representation, which misses almost all of the rationals. I also thought about using some other way of enumerating the rationals, but I can't seem to find a way to search this space given just greater/equal/less comparisons.
Okay, here's my answer using continued fractions alone.
First let's get some terminology here.
Let X = p/q be the unknown fraction.
Let Q(X,p/q) = sign(X - p/q) be the query function: if it is 0, we've guessed the number, and if it's +/- 1 that tells us the sign of our error.
The conventional notation for continued fractions is A = [a0; a1, a2, a3, ... ak]
= a0 + 1/(a1 + 1/(a2 + 1/(a3 + 1/( ... + 1/ak) ... )))
We'll follow the following algorithm for 0 < p/q < 1.
1. Initialize Y = 0 = [ 0 ], Z = 1 = [ 1 ], k = 0.
2. Outer loop: The preconditions are that:
   - Y and Z are continued fractions of k+1 terms which are identical except in the last element, where they differ by 1, so that Y = [y_0; y_1, y_2, y_3, ... y_k] and Z = [y_0; y_1, y_2, y_3, ... y_k + 1]
   - (-1)^k (Y-X) < 0 < (-1)^k (Z-X), or in simpler terms, for k even, Y < X < Z and for k odd, Z < X < Y.
3. Extend the degree of the continued fraction by 1 step without changing the values of the numbers. In general, if the last terms are y_k and y_k + 1, we change that to [... y_k, y_{k+1} = ∞] and [... y_k, z_{k+1} = 1]. Now increase k by 1.
4. Inner loops: This is essentially the same as @templatetypedef's interview question about the integers. We do a two-phase binary search to get closer:
5. Inner loop 1: y_k = ∞, z_k = a, and X is between Y and Z.
6. Double Z's last term: Compute M = Z but with m_k = 2*a = 2*z_k.
7. Query the unknown number: q = Q(X,M).
8. If q = 0, we have our answer and go to step 17.
9. If q and Q(X,Y) have opposite signs, it means X is between Y and M, so set Z = M and go to step 5.
10. Otherwise set Y = M and go to the next step:
11. Inner loop 2: y_k = b, z_k = a, and X is between Y and Z.
12. If a and b differ by 1, swap Y and Z, go to step 2.
13. Perform a binary search: compute M where m_k = floor((a+b)/2), and query q = Q(X,M).
14. If q = 0, we're done and go to step 17.
15. If q and Q(X,Y) have opposite signs, it means X is between Y and M, so set Z = M and go to step 11.
16. Otherwise, q and Q(X,Z) have opposite signs, it means X is between Z and M, so set Y = M and go to step 11.
17. Done: X = M.
A concrete example for X = 16/113 = 0.14159292
Y = 0 = [0], Z = 1 = [1], k = 0
k = 1:
Y = 0 = [0; ∞] < X, Z = 1 = [0; 1] > X, M = [0; 2] = 1/2 > X.
Y = 0 = [0; ∞], Z = 1/2 = [0; 2], M = [0; 4] = 1/4 > X.
Y = 0 = [0; ∞], Z = 1/4 = [0; 4], M = [0; 8] = 1/8 < X.
Y = 1/8 = [0; 8], Z = 1/4 = [0; 4], M = [0; 6] = 1/6 > X.
Y = 1/8 = [0; 8], Z = 1/6 = [0; 6], M = [0; 7] = 1/7 > X.
Y = 1/8 = [0; 8], Z = 1/7 = [0; 7]
--> the two last terms differ by one, so swap and repeat outer loop.
k = 2:
Y = 1/7 = [0; 7, ∞] > X, Z = 1/8 = [0; 7, 1] < X,
M = [0; 7, 2] = 2/15 < X
Y = 1/7 = [0; 7, ∞], Z = 2/15 = [0; 7, 2],
M = [0; 7, 4] = 4/29 < X
Y = 1/7 = [0; 7, ∞], Z = 4/29 = [0; 7, 4],
M = [0; 7, 8] = 8/57 < X
Y = 1/7 = [0; 7, ∞], Z = 8/57 = [0; 7, 8],
M = [0; 7, 16] = 16/113 = X
--> done!
At each step of computing M, the range of the interval reduces. It is probably fairly easy to prove (though I won't do this) that the interval reduces by a factor of at least 1/sqrt(5) at each step, which would show that this algorithm is O(log q) steps.
Note that this can be combined with templatetypedef's original interview question and apply towards any rational number p/q, not just between 0 and 1, by first computing Q(X,0), then for either positive/negative integers, bounding between two consecutive integers, and then using the above algorithm for the fractional part.
When I have a chance next, I will post a python program that implements this algorithm.
edit: also, note that you don't have to compute the continued fraction from scratch at each step (which would be O(k)); there are partial approximants to continued fractions that can compute the next step from the previous step in O(1).
edit 2: Recursive definition of partial approximants:
If A_k = [a_0; a_1, a_2, a_3, ... a_k] = p_k/q_k, then p_k = a_k*p_{k-1} + p_{k-2}, and q_k = a_k*q_{k-1} + q_{k-2}. (Source: Niven & Zuckerman, 4th ed, Theorems 7.3-7.5. See also Wikipedia)
Example: [0] = 0/1 = p0/q0, [0; 7] = 1/7 = p1/q1; so [0; 7, 16] = (16*1+0)/(16*7+1) = 16/113 = p2/q2.
This means that if two continued fractions Y and Z have the same terms except the last one, and the continued fraction excluding the last term is p_{k-1}/q_{k-1}, then we can write Y = (y_k*p_{k-1} + p_{k-2}) / (y_k*q_{k-1} + q_{k-2}) and Z = (z_k*p_{k-1} + p_{k-2}) / (z_k*q_{k-1} + q_{k-2}). It should be possible to show from this that |Y-Z| decreases by at least a factor of 1/sqrt(5) at each smaller interval produced by this algorithm, but the algebra seems to be beyond me at the moment. :-(
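As a quick illustration of the recurrence for partial approximants, here is a short Python 3 sketch that lists the convergents of a continued fraction in O(1) work per term. The function name `convergents` is made up; the starting values p_{-1}/q_{-1} = 1/0 and p_{-2}/q_{-2} = 0/1 are the usual convention that makes the recurrence work from the first term.

```python
from fractions import Fraction

def convergents(terms):
    """Convergents p_k/q_k of [a_0; a_1, ..., a_k], built with
    p_k = a_k*p_{k-1} + p_{k-2} and q_k = a_k*q_{k-1} + q_{k-2}."""
    p_prev, q_prev = 1, 0      # p_{-1}, q_{-1}
    p_prev2, q_prev2 = 0, 1    # p_{-2}, q_{-2}
    result = []
    for a in terms:
        p = a * p_prev + p_prev2
        q = a * q_prev + q_prev2
        result.append(Fraction(p, q))
        p_prev2, q_prev2 = p_prev, q_prev
        p_prev, q_prev = p, q
    return result

print(convergents([0, 7, 16]))
```

For [0; 7, 16] this reproduces the example values 0/1, 1/7 and 16/113.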
Here's my Python program:
import math

# Return a function Q(p,q) that compares the guess p/q with the hidden p0/q0:
# Q(p,q) = sign(p/q - p0/q0) = sign(q0*p - p0*q)*sign(q0*q)
# If p/q < p0/q0, then Q() = -1; if p/q > p0/q0, then Q() = 1; otherwise Q()=0.
def makeQ(p0,q0):
    def Q(p,q):
        return cmp(q0*p,p0*q)*cmp(q0*q,0)
    return Q

def strsign(s):
    return '<' if s<0 else '>' if s>0 else '=='

def cfnext(p1,q1,p2,q2,a):
    return [a*p1+p2,a*q1+q2]

def ratguess(Q, doprint, kmax):
    # p2/q2 = p[k-2]/q[k-2]
    p2 = 1
    q2 = 0
    # p1/q1 = p[k-1]/q[k-1]
    p1 = 0
    q1 = 1
    k = 0
    cf = [0]
    done = False
    while not done and (not kmax or k < kmax):
        if doprint:
            print 'p/q='+str(cf)+'='+str(p1)+'/'+str(q1)
        # extend continued fraction
        k = k + 1
        [py,qy] = [p1,q1]
        [pz,qz] = cfnext(p1,q1,p2,q2,1)
        ay = None
        az = 1
        sy = Q(py,qy)
        sz = Q(pz,qz)
        while not done:
            if doprint:
                out = str(py)+'/'+str(qy)+' '+strsign(sy)+' X '
                out += strsign(-sz)+' '+str(pz)+'/'+str(qz)
                out += ', interval='+str(abs(1.0*py/qy-1.0*pz/qz))
            if ay:
                if (ay - az == 1):
                    [p0,q0,a0] = [pz,qz,az]
                    break
                am = (ay+az)/2
            else:
                am = az * 2
            [pm,qm] = cfnext(p1,q1,p2,q2,am)
            sm = Q(pm,qm)
            if doprint:
                out = str(ay)+':'+str(am)+':'+str(az) + ' ' + out + '; M='+str(pm)+'/'+str(qm)+' '+strsign(sm)+' X '
                print out
            if (sm == 0):
                [p0,q0,a0] = [pm,qm,am]
                done = True
                break
            elif (sm == sy):
                [py,qy,ay,sy] = [pm,qm,am,sm]
            else:
                [pz,qz,az,sz] = [pm,qm,am,sm]
        [p2,q2] = [p1,q1]
        [p1,q1] = [p0,q0]
        cf += [a0]
    print 'p/q='+str(cf)+'='+str(p1)+'/'+str(q1)
    return [p1,q1]
and a sample output for ratguess(makeQ(33102,113017), True, 20):
p/q=[0]=0/1
None:2:1 0/1 < X < 1/1, interval=1.0; M=1/2 > X
None:4:2 0/1 < X < 1/2, interval=0.5; M=1/4 < X
4:3:2 1/4 < X < 1/2, interval=0.25; M=1/3 > X
p/q=[0, 3]=1/3
None:2:1 1/3 > X > 1/4, interval=0.0833333333333; M=2/7 < X
None:4:2 1/3 > X > 2/7, interval=0.047619047619; M=4/13 > X
4:3:2 4/13 > X > 2/7, interval=0.021978021978; M=3/10 > X
p/q=[0, 3, 2]=2/7
None:2:1 2/7 < X < 3/10, interval=0.0142857142857; M=5/17 > X
None:4:2 2/7 < X < 5/17, interval=0.00840336134454; M=9/31 < X
4:3:2 9/31 < X < 5/17, interval=0.00379506641366; M=7/24 < X
p/q=[0, 3, 2, 2]=5/17
None:2:1 5/17 > X > 7/24, interval=0.00245098039216; M=12/41 < X
None:4:2 5/17 > X > 12/41, interval=0.00143472022956; M=22/75 > X
4:3:2 22/75 > X > 12/41, interval=0.000650406504065; M=17/58 > X
p/q=[0, 3, 2, 2, 2]=12/41
None:2:1 12/41 < X < 17/58, interval=0.000420521446594; M=29/99 > X
None:4:2 12/41 < X < 29/99, interval=0.000246366100025; M=53/181 < X
4:3:2 53/181 < X < 29/99, interval=0.000111613371282; M=41/140 < X
p/q=[0, 3, 2, 2, 2, 2]=29/99
None:2:1 29/99 > X > 41/140, interval=7.21500721501e-05; M=70/239 < X
None:4:2 29/99 > X > 70/239, interval=4.226364059e-05; M=128/437 > X
4:3:2 128/437 > X > 70/239, interval=1.91492009996e-05; M=99/338 > X
p/q=[0, 3, 2, 2, 2, 2, 2]=70/239
None:2:1 70/239 < X < 99/338, interval=1.23789953207e-05; M=169/577 > X
None:4:2 70/239 < X < 169/577, interval=7.2514738621e-06; M=309/1055 < X
4:3:2 309/1055 < X < 169/577, interval=3.28550190148e-06; M=239/816 < X
p/q=[0, 3, 2, 2, 2, 2, 2, 2]=169/577
None:2:1 169/577 > X > 239/816, interval=2.12389981991e-06; M=408/1393 < X
None:4:2 169/577 > X > 408/1393, interval=1.24415093544e-06; M=746/2547 < X
None:8:4 169/577 > X > 746/2547, interval=6.80448470014e-07; M=1422/4855 < X
None:16:8 169/577 > X > 1422/4855, interval=3.56972657711e-07; M=2774/9471 > X
16:12:8 2774/9471 > X > 1422/4855, interval=1.73982239227e-07; M=2098/7163 > X
12:10:8 2098/7163 > X > 1422/4855, interval=1.15020646951e-07; M=1760/6009 > X
10:9:8 1760/6009 > X > 1422/4855, interval=6.85549088053e-08; M=1591/5432 < X
p/q=[0, 3, 2, 2, 2, 2, 2, 2, 9]=1591/5432
None:2:1 1591/5432 < X < 1760/6009, interval=3.06364213998e-08; M=3351/11441 < X
p/q=[0, 3, 2, 2, 2, 2, 2, 2, 9, 1]=1760/6009
None:2:1 1760/6009 > X > 3351/11441, interval=1.45456726663e-08; M=5111/17450 < X
None:4:2 1760/6009 > X > 5111/17450, interval=9.53679318849e-09; M=8631/29468 < X
None:8:4 1760/6009 > X > 8631/29468, interval=5.6473816179e-09; M=15671/53504 < X
None:16:8 1760/6009 > X > 15671/53504, interval=3.11036635336e-09; M=29751/101576 > X
16:12:8 29751/101576 > X > 15671/53504, interval=1.47201634215e-09; M=22711/77540 > X
12:10:8 22711/77540 > X > 15671/53504, interval=9.64157420569e-10; M=19191/65522 > X
10:9:8 19191/65522 > X > 15671/53504, interval=5.70501257346e-10; M=17431/59513 > X
p/q=[0, 3, 2, 2, 2, 2, 2, 2, 9, 1, 8]=15671/53504
None:2:1 15671/53504 < X < 17431/59513, interval=3.14052228667e-10; M=33102/113017 == X
Since Python handles biginteger math from the start, and this program uses only integer math (except for the interval calculations), it should work for arbitrary rationals.
edit 3: Outline of proof that this is O(log q), not O(log^2 q):
First note that until the rational number is found, the # of steps n_k for each new continued fraction term is exactly 2b(a_k)-1, where b(a_k) = ceil(log2(a_k)) is the # of bits needed to represent a_k: it's b(a_k) steps to widen the "net" of the binary search, and b(a_k)-1 steps to narrow it. See the example above, you'll note that the # of steps is always 1, 3, 7, 15, etc.
Now we can use the recurrence relation q_k = a_k*q_{k-1} + q_{k-2} and induction to prove the desired result.
Let's state it in this way: that the value of q after the N_k = sum(n_k) steps required for reaching the kth term has a minimum: q >= A*2^(c*N) for some fixed constants A, c. (So to invert, we'd get that the # of steps N is <= (1/c) * log2(q/A) = O(log q).)
Base cases:
k=0: q = 1, N = 0, so q >= 2^N
k=1: for N = 2b-1 steps, q = a_1 >= 2^(b-1) = 2^((N-1)/2) = 2^(N/2)/sqrt(2).
This implies A = 1, c = 1/2 could provide desired bounds. In reality, q may not double each term (counterexample: [0; 1, 1, 1, 1, 1] has a growth factor of phi = (1+sqrt(5))/2) so let's use c = 1/4.
Induction:
for term k, q_k = a_k*q_{k-1} + q_{k-2}. Again, for the n_k = 2b-1 steps needed for this term, a_k >= 2^(b-1) = 2^((n_k-1)/2).
So a_k*q_{k-1} >= 2^((n_k-1)/2) * q_{k-1} >= 2^((n_k-1)/2) * A*2^(N_{k-1}/4) = A*2^(N_k/4)/sqrt(2)*2^(n_k/4).
Argh -- the tough part here is that if a_k = 1, q may not increase much for that one term, and we need to use q_{k-2} but that may be much smaller than q_{k-1}.
Let's take the rational numbers, in reduced form, and write them out in order first of denominator, then numerator.
1/2, 1/3, 2/3, 1/4, 3/4, 1/5, 2/5, 3/5, 4/5, 1/6, 5/6, ...
Our first guess is going to be 1/2. Then we'll go along the list until we have 3 in our range. Then we will take 2 guesses to search that list. Then we'll go along the list until we have 7 in our remaining range. Then we will take 3 guesses to search that list. And so on.
In n steps we'll cover the first 2^O(n) possibilities, which is in the order of magnitude of efficiency that you were looking for.
Update: People didn't get the reasoning behind this. The reasoning is simple. We know how to walk a binary tree efficiently. There are O(n^2) fractions with maximum denominator n. We could therefore search up to any particular denominator size in O(2*log(n)) = O(log(n)) steps. The problem is that we have an infinite number of possible rationals to search. So we can't just line them all up, order them, and start searching.
Therefore my idea was to line up a few, search, line up more, search, and so on. Each time we line up more we line up about double what we did last time. So we need one more guess than we did last time. Therefore our first pass uses 1 guess to traverse 1 possible rational. Our second uses 2 guesses to traverse 3 possible rationals. Our third uses 3 guesses to traverse 7 possible rationals. And our k'th uses k guesses to traverse 2^k - 1 possible rationals. For any particular rational m/n, eventually it will wind up putting that rational on a fairly big list that it knows how to do a binary search on efficiently.
If we did binary searches, then ignored everything we'd learned when we grab more rationals, then we'd put all of the rationals up to and including m/n in O(log(n)) passes. (That's because by that point we'll get to a pass with enough rationals to include every rational up to and including m/n.) But each pass takes more guesses, so that would be O(log(n)^2) guesses.
However we actually do a lot better than that. With our first guess, we eliminate half the rationals on our list as being too big or small. Our next two guesses don't quite cut the space into quarters, but they don't come too far from it. Our next 3 guesses again don't quite cut the space into eighths, but they don't come too far from it. And so on. When you put it together, I'm convinced that the result is that you find m/n in O(log(n)) steps. Though I don't actually have a proof.
Try it out: Here is code to generate the guesses so that you can play and see how efficient it is.
#!/usr/bin/env python3
from fractions import Fraction
import heapq
import readline  # enables line editing for input()
import sys

def generate_next_guesses(low, high, limit):
    # Heap of (mid_d, mid_n, low_d, low_n, high_d, high_n): the mediant of each
    # gap plus its two endpoints, ordered by the mediant's denominator.
    upcoming = [(low.denominator + high.denominator,
                 low.numerator + high.numerator,
                 low.denominator, low.numerator,
                 high.denominator, high.numerator)]
    guesses = []
    while len(guesses) < limit:
        (mid_d, mid_n, low_d, low_n, high_d, high_n) = upcoming[0]
        guesses.append(Fraction(mid_n, mid_d))
        # Replace the gap just used with its left half, then add its right half.
        heapq.heappushpop(upcoming, (low_d + mid_d, low_n + mid_n,
                                     low_d, low_n, mid_d, mid_n))
        heapq.heappush(upcoming, (mid_d + high_d, mid_n + high_n,
                                  mid_d, mid_n, high_d, high_n))
    guesses.sort()
    return guesses

def ask(num):
    while True:
        print("Next guess: {0} ({1})".format(num, float(num)))
        if 1 < len(sys.argv):
            # A target given on the command line answers automatically.
            wanted = Fraction(sys.argv[1])
            if wanted < num:
                print("too high")
                return 1
            elif num < wanted:
                print("too low")
                return -1
            else:
                print("correct")
                return 0
        answer = input("Is this (h)igh, (l)ow, or (c)orrect? ")
        if answer == "h":
            return 1
        elif answer == "l":
            return -1
        elif answer == "c":
            return 0
        else:
            print("Not understood. Please say one of (l, c, h)")

guess_size_bound = 2
low = Fraction(0)
high = Fraction(1)
guesses = [Fraction(1, 2)]
required_guesses = 0
answer = -1
while 0 != answer:
    if 0 == len(guesses):
        # Out of candidates: line up roughly twice as many as last time.
        guess_size_bound *= 2
        guesses = generate_next_guesses(low, high, guess_size_bound - 1)
    # print(low, high, guesses)
    guess = guesses[len(guesses) // 2]
    answer = ask(guess)
    required_guesses += 1
    if 0 == answer:
        print("Thanks for playing!")
        print("I needed %d guesses" % required_guesses)
    elif 1 == answer:
        high = guess
        guesses[len(guesses) // 2:] = []
    else:
        low = guess
        guesses[0:len(guesses) // 2 + 1] = []
As an example to try it out I tried 101/1024 (0.0986328125) and found that it took 20 guesses to find the answer. I tried 0.98765 and it took 45 guesses. I tried 0.0123456789 and it needed 66 guesses and about a second to generate them. (Note, if you call the program with a rational number as an argument, it will fill in all of the guesses for you. This is a very helpful convenience.)
I've got it! What you need to do is to use a parallel search with bisection and continued fractions.
Bisection will give you a limit toward a specific real number, as represented as a power of two, and continued fractions will take the real number and find the nearest rational number.
How you run them in parallel is as follows.
At each step, you have l and u being the lower and upper bounds of bisection. The idea is, you have a choice between halving the range of bisection, and adding an additional term as a continued fraction representation. When both l and u have the same next term as a continued fraction, then you take the next step in the continued fraction search, and make a query using the continued fraction. Otherwise, you halve the range using bisection.
Since both methods increase the denominator by at least a constant factor (bisection goes by factors of 2, continued fractions go by at least a factor of phi = (1+sqrt(5))/2), this means your search should be O(log(q)). (There may be repeated continued fraction calculations, so it may end up as O(log(q)^2).)
Our continued fraction search needs to round to the nearest integer, not use floor (this is clearer below).
The above is kind of handwavy. Let's use a concrete example of r = 1/31:
l = 0, u = 1, query = 1/2. 0 is not expressible as a continued fraction, so we use binary search until l != 0.
l = 0, u = 1/2, query = 1/4.
l = 0, u = 1/4, query = 1/8.
l = 0, u = 1/8, query = 1/16.
l = 0, u = 1/16, query = 1/32.
l = 1/32, u = 1/16. Now 1/l = 32, 1/u = 16, these have different cfrac reps, so keep bisecting: query = 3/64.
l = 1/32, u = 3/64, query = 5/128 = 1/25.6
l = 1/32, u = 5/128, query = 9/256 = 1/28.4444....
l = 1/32, u = 9/256, query = 17/512 = 1/30.1176... (round to 1/30)
l = 1/32, u = 17/512, query = 33/1024 = 1/31.0303... (round to 1/31)
l = 33/1024, u = 17/512, query = 67/2048 = 1/30.5672... (round to 1/31)
l = 33/1024, u = 67/2048. At this point both l and u have the same continued fraction term 31, so now we use a continued fraction guess.
query = 1/31.
SUCCESS!
For another example let's use 16/113 (= 355/113 - 3 where 355/113 is pretty close to pi).
[to be continued, I have to go somewhere]
On further reflection, continued fractions are the way to go, never mind bisection except to determine the next term. More when I get back.
I think I found an O(log^2(p + q)) algorithm.
To avoid confusion in the next paragraph, a "query" refers to when the guesser gives the challenger a guess, and the challenger responds "bigger" or "smaller". This allows me to reserve the word "guess" for something else, a guess for p + q that is not asked directly to the challenger.
The idea is to first find p + q, using the algorithm you describe in your question: guess a value k, if k is too small, double it and try again. Then once you have an upper and lower bound, do a standard binary search. This takes O(log(p+q)T) queries, where T is an upper bound for the number of queries it takes to check a guess. Let's find T.
We want to check all fractions r/s with r + s <= k, and double k until k is sufficiently large. Note that there are O(k^2) fractions you need to check for a given value of k. Build a balanced binary search tree containing all these values, then search it to determine if p/q is in the tree. It takes O(log k^2) = O(log k) queries to confirm that p/q is not in the tree.
We will never guess a value of k greater than 2(p + q). Hence we can take T = O(log(p+q)).
When we guess the correct value for k (i.e., k = p + q), we will submit the query p/q to the challenger in the course of checking our guess for k, and win the game.
Total number of queries is then O(log^2(p + q)).
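Here is a minimal sketch of that idea, not a tuned implementation. It assumes the hidden number lies in (0, 1), and the names make_oracle, candidates and guess_rational are made up for illustration. It also simplifies the search for k: since the candidate set for any k >= p + q already contains p/q, it just keeps doubling k instead of binary searching on it. A sorted list stands in for the balanced tree.

```python
from fractions import Fraction

def make_oracle(target):
    """Hypothetical challenger: 1 if the guess is too high, -1 if too low, 0 if correct."""
    def compare(guess):
        return (guess > target) - (guess < target)
    return compare

def candidates(k):
    """All distinct fractions r/s in (0, 1) with r + s <= k, in increasing order."""
    return sorted({Fraction(r, s)
                   for s in range(2, k)
                   for r in range(1, s)
                   if r + s <= k})

def guess_rational(compare, k=4):
    """Return (p/q, number of queries) for a hidden p/q in (0, 1)."""
    queries = 0
    while True:
        fracs = candidates(k)
        lo, hi = 0, len(fracs) - 1
        while lo <= hi:                      # ordinary binary search
            mid = (lo + hi) // 2
            queries += 1
            result = compare(fracs[mid])
            if result == 0:
                return fracs[mid], queries
            if result < 0:                   # guess too low: look right
                lo = mid + 1
            else:                            # guess too high: look left
                hi = mid - 1
        k *= 2                               # p/q not present, so p + q > k

if __name__ == "__main__":
    print(guess_rational(make_oracle(Fraction(3, 7))))
```

Building the whole candidate list costs O(k^2) time and memory, so this only illustrates the query count, not an efficient way to generate the guesses.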
Okay, I think I figured out an O(lg^2 q) algorithm for this problem that is based on Jason S's most excellent insight about using continued fractions. I thought I'd flesh the algorithm out all the way right here so that we have a complete solution, along with a runtime analysis.
The intuition behind the algorithm is that any rational number p/q within the range can be written as
a0 + 1 / (a1 + 1 / (a2 + 1 / (a3 + 1 / ...)))
for appropriate choices of ai. This is called a continued fraction. More importantly, though, these ai can be derived by running the Euclidean algorithm on the numerator and denominator. For example, suppose we want to represent 11/14 this way. We begin by noting that 14 goes into 11 zero times, so a crude approximation of 11/14 would be
0 = 0
Now, suppose that we take the reciprocal of this fraction to get 14/11 = 1 3/11. So if we write
0 + (1 / 1) = 1
We get a slightly better approximation to 11/14. Now that we're left with 3 / 11, we can take the reciprocal again to get 11/3 = 3 2/3, so we can consider
0 + (1 / (1 + 1/3)) = 3/4
Which is another good approximation to 11/14. Now, we have 2/3, so consider the reciprocal, which is 3/2 = 1 1/2. If we then write
0 + (1 / (1 + 1/(3 + 1/1))) = 4/5
We get another good approximation to 11/14. Finally, we're left with 1/2, whose reciprocal is 2/1. If we finally write out
0 + (1 / (1 + 1/(3 + 1/(1 + 1/2)))) = 1 / (1 + 1/(3 + 1/(3/2))) = 1 / (1 + 1/(3 + 2/3)) = 1 / (1 + 1/(11/3)) = 1 / (1 + 3/11) = 1 / (14/11) = 11/14
which is exactly the fraction we wanted. Moreover, look at the sequence of coefficients we ended up using. If you run the extended Euclidean algorithm on 11 and 14, you get that
11 = 0 x 14 + 11 --> a0 = 0
14 = 1 x 11 + 3 --> a1 = 1
11 = 3 x 3 + 2 --> a2 = 3
3 = 1 x 2 + 1 --> a3 = 1
2 = 2 x 1 + 0 --> a4 = 2
It turns out (using more math than I currently know how to do!) that this isn't a coincidence and that the coefficients in the continued fraction of p/q are always formed by using the extended Euclidean algorithm. This is great, because it tells us two things:
There can be at most O(lg (p + q)) coefficients, because the Euclidean algorithm always terminates in this many steps, and
Each coefficient is at most max{p, q}.
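To make the link between the Euclidean algorithm and the coefficients concrete, here is a small sketch. The function names continued_fraction_terms and convergents are my own, not from any library. The first reads the coefficients off the quotients of the Euclidean algorithm, and the second rebuilds the successive approximations with the standard convergent recurrence.

```python
from fractions import Fraction

def continued_fraction_terms(p, q):
    """Return [a0, a1, ...] for p/q: the quotients of the Euclidean algorithm."""
    terms = []
    while q != 0:
        a, r = divmod(p, q)   # p = a * q + r
        terms.append(a)
        p, q = q, r
    return terms

def convergents(terms):
    """Yield the successive truncations of the continued fraction as Fractions."""
    h_prev, h = 1, terms[0]   # numerators of the last two convergents
    k_prev, k = 0, 1          # denominators of the last two convergents
    yield Fraction(h, k)
    for a in terms[1:]:
        h_prev, h = h, a * h + h_prev
        k_prev, k = k, a * k + k_prev
        yield Fraction(h, k)

if __name__ == "__main__":
    terms = continued_fraction_terms(11, 14)
    print(terms)                      # [0, 1, 3, 1, 2]
    print(list(convergents(terms)))   # 0, 1, 3/4, 4/5, 11/14
```

For 11/14 the approximations come out as 0, 1, 3/4, 4/5 and finally 11/14 itself, and the number of coefficients is the number of division steps, which is where the O(lg(p + q)) bound comes from.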
Given these two facts, we can come up with an algorithm to recover any rational number p/q, not just those between 0 and 1, by applying the general algorithm for guessing arbitrary integers n one at a time to recover all of the coefficients in the continued fraction for p/q. For now, though, we'll just worry about numbers in the range (0, 1], since the logic for handling arbitrary rational numbers can be done easily given this as a subroutine.
As a first step, let's suppose that we want to find the best value of a1 so that 1 / a1 is as close as possible to p/q and a1 is an integer. To do this, we can just run our algorithm for guessing arbitrary integers, taking the reciprocal each time. After doing this, one of two things will have happened. First, we might by sheer coincidence discover that p/q = 1/k for some integer k, in which case we're done. If not, we'll find that p/q is sandwiched between 1/(a1 + 1) and 1/a1 for some a1. When we do this, then we start working on the continued fraction one level deeper by finding the a2 such that p/q is between 1/(a1 + 1/a2) and 1/(a1 + 1/(a2 + 1)). If we magically find p/q, that's great! Otherwise, we then go one level down further in the continued fraction. Eventually, we'll find the number this way, and it can't take too long. Each binary search to find a coefficient takes at most O(lg(p + q)) time, and there are at most O(lg(p + q)) levels to the search, so we need only O(lg^2(p + q)) arithmetic operations and probes to recover p/q.
One detail I want to point out is that we need to keep track of whether we're on an odd level or an even level when doing the search because when we sandwich p/q between two continued fractions, we need to know whether the coefficient we were looking for was the upper or the lower fraction. I'll state without proof that for ai with i odd you want to use the upper of the two numbers, and for ai with i even you use the lower of the two numbers.
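A rough sketch of the level-by-level search might look like the following. It is my own reading of the description rather than a verified reference implementation: it assumes the hidden number is in (0, 1], make_oracle and guess_by_continued_fraction are illustrative names, and the odd/even bookkeeping is folded into a single sign flip. Each level finds one coefficient by doubling and then bisecting, and stops early if a probe hits p/q exactly.

```python
from fractions import Fraction

def make_oracle(target):
    """Hypothetical challenger: 1 if the guess is too high, -1 if too low, 0 if correct."""
    def compare(guess):
        return (guess > target) - (guess < target)
    return compare

def guess_by_continued_fraction(compare):
    """Return (p/q, number of queries) for a hidden p/q in (0, 1]."""
    h_prev, h = 1, 0          # numerators of the last two convergents (a0 = 0)
    k_prev, k = 0, 1          # denominators of the last two convergents
    level = 1                 # index i of the coefficient being searched for
    queries = 0
    while True:
        # [0; a1, ..., a(i-1), t] decreases in t on odd levels, increases on even ones.
        flip = -1 if level % 2 == 1 else 1

        def probe(t):
            """Query with trial coefficient t; a positive sign means t is too big."""
            guess = Fraction(t * h + h_prev, t * k + k_prev)
            return guess, flip * compare(guess)

        lo, hi = 0, 1         # invariant: lo is not too big
        while True:           # doubling phase: find a trial value that is too big
            guess, sign = probe(hi)
            queries += 1
            if sign == 0:
                return guess, queries
            if sign > 0:
                break
            lo, hi = hi, 2 * hi
        while hi - lo > 1:    # bisect for the largest t that is not too big
            mid = (lo + hi) // 2
            guess, sign = probe(mid)
            queries += 1
            if sign == 0:
                return guess, queries
            if sign < 0:
                lo = mid
            else:
                hi = mid
        if lo == 0:
            raise ValueError("target is outside (0, 1]")
        # Fix a_i = lo and advance the convergent recurrences one level.
        h_prev, h = h, lo * h + h_prev
        k_prev, k = k, lo * k + k_prev
        level += 1

if __name__ == "__main__":
    print(guess_by_continued_fraction(make_oracle(Fraction(11, 14))))
```

The sketch re-asks a few fractions it has already seen (the first probe of each level repeats an earlier guess), so it is not query-optimal, but it only ever uses "too high / too low / correct" answers.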
I am almost 100% confident that this algorithm works. I'm going to try to write up a more formal proof of this in which I fill in all of the gaps in this reasoning, and when I do I'll post a link here.
Thanks to everyone for contributing the insights necessary to get this solution working, especially Jason S for suggesting a binary search over continued fractions.
Remember that any rational number in (0, 1) can be represented as a finite sum of distinct (positive or negative) unit fractions. For example, 2/3 = 1/2 + 1/6 and 2/5 = 1/2 - 1/10. You can use this to perform a straight-forward binary search.
Here is yet another way to do it. If there is sufficient interest, I will try to fill out the details tonight, but I can't right now because I have family responsibilities. Here is a stub of an implementation that should explain the algorithm:
low = 0
high = 1
bound = 2
answer = -1
while 0 != answer:
    mid = best_continued_fraction((low + high)/2, bound)
    while mid == low or mid == high:
        bound += bound
        mid = best_continued_fraction((low + high)/2, bound)
    answer = ask(mid)
    if -1 == answer:
        low = mid
    elif 1 == answer:
        high = mid
    else:
        print_success_message(mid)
And here is the explanation. What best_continued_fraction(x, bound) should do is find the last continued fraction approximation to x with the denominator at most bound. This algorithm will take polylog steps to complete and finds very good (though not always the best) approximations. So for each bound we'll get something close to a binary search through all possible fractions of that size. Occasionally we won't find a particular fraction until we increase the bound farther than we should, but we won't be far off.
So there you have it. A logarithmic number of questions found with polylog work.
Update: And full working code.
#!/usr/bin/env python3
from fractions import Fraction
import readline  # enables line editing for input()
import sys

operations = [0]  # one-element list so the functions below can update the count

def calculate_continued_fraction(terms):
    # Evaluate [terms[0]; terms[1], ...] from the innermost term outwards.
    i = len(terms) - 1
    result = Fraction(terms[i])
    while 0 < i:
        i -= 1
        operations[0] += 1
        result = terms[i] + 1/result
    return result

def best_continued_fraction(x, bound):
    error = x - int(x)
    terms = [int(x)]
    last_estimate = estimate = Fraction(0)
    # Note: the bound is compared against the numerator of the estimate.
    while 0 != error and estimate.numerator < bound:
        operations[0] += 1
        error = 1/error
        term = int(error)
        terms.append(term)
        error -= term
        last_estimate = estimate
        estimate = calculate_continued_fraction(terms)
    if estimate.numerator < bound:
        return estimate
    else:
        return last_estimate

def ask(num):
    while True:
        print("Next guess: {0} ({1})".format(num, float(num)))
        if 1 < len(sys.argv):
            # A target given on the command line answers automatically.
            wanted = Fraction(sys.argv[1])
            if wanted < num:
                print("too high")
                return 1
            elif num < wanted:
                print("too low")
                return -1
            else:
                print("correct")
                return 0
        answer = input("Is this (h)igh, (l)ow, or (c)orrect? ")
        if answer == "h":
            return 1
        elif answer == "l":
            return -1
        elif answer == "c":
            return 0
        else:
            print("Not understood. Please say one of (l, c, h)")

low = Fraction(0)
high = Fraction(1)
bound = 2
answer = -1
guesses = 0
while 0 != answer:
    mid = best_continued_fraction((low + high)/2, bound)
    guesses += 1
    # If the approximation collapses onto an endpoint, allow larger fractions.
    while mid == low or mid == high:
        bound += bound
        mid = best_continued_fraction((low + high)/2, bound)
    answer = ask(mid)
    if -1 == answer:
        low = mid
    elif 1 == answer:
        high = mid
    else:
        print("Thanks for playing!")
        print("I needed %d guesses and %d operations" % (guesses, operations[0]))
It appears slightly more efficient in guesses than the previous solution, and does a lot fewer operations. For 101/1024 it required 19 guesses and 251 operations. For .98765 it needed 27 guesses and 623 operations. For 0.0123456789 it required 66 guesses and 889 operations. And for giggles and grins, for 0.0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789 (that's 10 copies of the previous one) it required 665 guesses and 23289 operations.
You can sort rational numbers in a given interval by for example the pair (denominator, numerator). Then to play the game you can
Find the interval [0, N] using the doubling-step approach
Given an interval [a, b] shoot for the rational with smallest denominator in the interval that is the closest to the center of the interval
This is, however, probably still O(log(num/den) + den) (not sure, and it's too early in the morning here for me to think clearly ;-) )
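As a rough illustration of the second step, here is one way to pick the guess. It tries denominators 1, 2, 3, ... in turn and returns the first fraction that falls strictly inside the interval, preferring the one nearest the midpoint. The function name smallest_denominator_guess is made up, and treating the interval as open (the endpoints having been ruled out already) is my assumption.

```python
from fractions import Fraction
from math import ceil, floor

def smallest_denominator_guess(a, b):
    """Rational strictly inside (a, b) with the smallest denominator.

    If several numerators fit for that denominator, the one closest to the
    midpoint of the interval is returned.
    """
    center = (a + b) / 2
    d = 1
    while True:
        first = floor(a * d) + 1   # smallest n with n/d > a
        last = ceil(b * d) - 1     # largest n with n/d < b
        if first <= last:
            n = min(range(first, last + 1),
                    key=lambda n: abs(Fraction(n, d) - center))
            return Fraction(n, d)
        d += 1

if __name__ == "__main__":
    print(smallest_denominator_guess(Fraction(1, 3), Fraction(1, 2)))  # 2/5
```

The loop over d is linear in the denominator it ends up with, which matches the suspicion that the overall cost still has a "+ den" term.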

How do you build a ratings implementation?

We have need for a "rating" system in a project we are working on, similar to the one in SO. However, in ours there are multiple entities that need to be "tagged" with a vote up (only up, never down, like an increment). Sometimes we will need to show all of the entities in order of what is rated highest, regardless of entity type, basically mixing the result sets, I guess. What data structures / algorithms do you use to implement this so that is flexible and still scalable?
Since reddit's ranking algorithm rocks, it makes a lot of sense to have a look at it, if not copy it:
Given the time the entry was posted, A, and the reference time 7:46:43 a.m. December 8, 2005, B, we have ts as their difference in seconds:
ts = A - B
and x as the difference between the number of up votes U and the number of down votes D:
x = U - D
and y as the sign of x:
y = 1 if x > 0
y = 0 if x = 0
y = -1 if x < 0
and z as the maximal value of the absolute value of x and 1:
z = |x| if |x| >= 1
z = 1 if |x| < 1
we have the rating as a function ƒ(ts, y, z):
ƒ(ts, y, z) = log10 z + (y • ts)/45000
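A direct transcription of the formula into Python could look like this. It is only a sketch: the function name rating and the tuple layout of the sample entities are made up for illustration, and ts is passed in as a number of seconds so that no particular clock or time zone handling is assumed. With up-only voting, downs simply stays at 0.

```python
from math import log10

def rating(ts, ups, downs=0):
    """f(ts, y, z) = log10(z) + (y * ts) / 45000, with x = ups - downs."""
    x = ups - downs
    y = (x > 0) - (x < 0)      # sign of x: 1, 0 or -1
    z = max(abs(x), 1)         # never take the log of less than 1
    return log10(z) + (y * ts) / 45000

# Mixing entity types: score every entity with the same function and sort.
# Each tuple is (entity_type, entity_id, seconds_since_reference_time, up_votes).
entities = [
    ("post", 1, 45000, 10),
    ("comment", 7, 90000, 100),
    ("photo", 3, 0, 1000),
]
ranked = sorted(entities, key=lambda e: rating(e[2], e[3]), reverse=True)

if __name__ == "__main__":
    for entity in ranked:
        print(entity, rating(entity[2], entity[3]))
```

Because the score depends only on the vote count and the posting time, it can be stored in one column alongside the entity type and id, and a single ordered query over that column returns the mixed list.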

Resources