boosting algorithm - classifiers yields the correct label - algorithm

I am a computer science student; I am studying the Algorithms course independently.
During the course, I saw this question:
Suppose we have a set X = {x1, . . . , xn} of elements, each with a label L(i) ∈ {0, 1} (think of
x(i) as a picture, and the label indicates whether it is a cat or not). We also have a set of
classifiers H, and an algorithm A that given any distribution D on X, outputs h ∈ H such
that
Pr(i∼D)[h(x(i)) = L(i)] ≥ 0.51
Show an algorithm that produces a set of T = O(log n) classifiers h
(1), . . . , h(T) ∈ H, such
that the majority vote among these T classifiers yields the correct label for all 1 ≤ i ≤ n.
photo of the question
From what I can understand this is a question related to boosting. But it is not clear to me how to show an algorithm for this question.
I found an algorithm, but I do not know if it fits the problem:
Algorithm 1 Boost(D, A)
Let T ← 4 log n/ E^2 for E < 0.01.
Initialize a copy of polynomial weights to run over w^t ∈ ∆n.
for t = 1 to T do
Let h^t = A(D, w^t)
Let L^t ∈ [0, 1]^m be such that L^t_i = 1[h^t(xi) = yi].
Pass L^t to the PW algorithm.
end for
Let pˆ =1/T(SIGMA^T_t=1 e_h^t )
Return fpˆ(x).
link to algorithm, page 16
to be perfectly honest, did not understand how to solve the question.

We can view this as a two-player zero-sum game. Carol (the “classifier”
player) chooses a classifier, and Dave (the “data” player) chooses a
labeled element. Carol wins if the classifier is correct on that
element, and Dave wins if it’s incorrect.
Algorithm A implies that Carol can win this game at least 51% of the
time. We can run algorithm ComputeEQ with ε = 0.005 (0.5%) to find a
strategy for Carol where she chooses uniformly at random from O(log n)
classifiers and wins at least 50.5% of the time regardless of Dave’s
strategy. This implies that the majority vote is correct on all n
elements.
(This is really a question for https://cs.stackexchange.com.)

Related

Roots of a polynomial mod a prime

I'm looking for a speedy algorithm to find the roots of a univariate polynomial in a prime finite field.
That is, if f = a0 + a1x + a2x2 + ... + anxn (n > 0) then an algorithm that finds all r < p satisfying f(r) = 0 mod p, for a given prime p.
I found Chiens search algorithm https://en.wikipedia.org/wiki/Chien_search but I can't imagine this being that fast for primes greater than 20 bits. Does anyone have experience with Chien's search algorithm or know a faster way? Is there a sympy module for this?
This is pretty well studied, as mcdowella's comment indicates. Here is how the Cantor-Zassenhaus random algorithm works for the case where you want to find the roots of a polynomial, instead of the more general factorization.
Note that in the ring of polynomials with coefficients mod p, the product x(x-1)(x-2)...(x-p+1) has all possible roots, and equals x^p-x by Fermat's Little Theorem and unique factorization in this ring.
Set g = GCD(f,x^p-x). Using Euclid's algorithm to compute the GCD of two polynomials is fast in general, taking a number of steps that is logarithmic in the maximum degree. It does not require you to factor the polynomials. g has the same roots as f in the field, and no repeated factors.
Because of the special form of x^p-x, with only two nonzero terms, the first step of Euclid's algorithm can be done by repeated squaring, in about 2 log_2 (p) steps involving only polynomials of degree no more than twice the degree of f, with coefficients mod p. We may compute x mod f, x^2 mod f, x^4 mod f, etc, then multiply together the terms corresponding to nonzero places in the binary expansion of p to compute x^p mod f, and finally subtract x.
Repeatedly do the following: Choose a random d in Z/p. Compute the GCD of g with r_d = (x+d)^((p-1)/2)-1, which we can again compute rapidly by Euclid's algorithm, using repeated squaring on the first step. If the degree of this GCD is strictly between 0 and the degree of g, we have found a nontrivial factor of g, and we can recurse until we have found the linear factors hence roots of g and thus f.
How often does this work? r_d has as roots the numbers that are d less than a nonzero square mod p. Consider two distinct roots of g, a and b, so (x-a) and (x-b) are factors of g. If a+d is a nonzero square, and b+d is not, then (x-a) is a common factor of g and r_d, while (x-b) is not, which means GCD(g,r_d) is a nontrivial factor of g. Similarly, if b+d is a nonzero square while a+d is not, then (x-b) is a common factor of g and r_d while (x-a) is not. By number theory, one case or the other happens close to half of the possible choices for d, which means that on average it takes a constant number of choices of d before we find a nontrivial factor of g, in fact one separating (x-a) from (x-b).
Your answers are good, but I think I found a wonderful method to find the roots modulo any number: This method based on "LATTICES". Let r ≤ R be a root of mod p. We must find another function such as h(x) such that h isn't large and r is root of h. Lattice method find this function. At the first time, we must create a basis of polynomial for lattice and then, with "LLL" algorithm, we find a "shortest vector" that has root r without modulo p. In fact, we eliminate modulo p with this way.
For more explanation, refer to "Coppersmith D. Finding small solutions to small degree polynomials. In Cryptography and lattices".

Finding integral solution of an equation

This is part of a bigger question. Its actually a mathematical problem. So it would be really great if someone can direct me to any algorithm to obtain the solution of this problem or a pseudo code will be of help.
The question. Given an equation check if it has an integral solution.
For example:
(26a+5)/32=b
Here a is an integer. Is there an algorithm to predict or find if b can be an integer. I need a general solution not specific to this question. The equation can vary. Thanks
Your problem is an example of a linear Diophantine equation. About that, Wikipedia says:
This Diophantine equation [i.e., a x + b y = c] has a solution (where x and y are integers) if and only if c is a multiple of the greatest common divisor of a and b. Moreover, if (x, y) is a solution, then the other solutions have the form (x + k v, y - k u), where k is an arbitrary integer, and u and v are the quotients of a and b (respectively) by the greatest common divisor of a and b.
In this case, (26 a + 5)/32 = b is equivalent to 26 a - 32 b = -5. The gcd of the coefficients of the unknowns is gcd(26, -32) = 2. Since -5 is not a multiple of 2, there is no solution.
A general Diophantine equation is a polynomial in the unknowns, and can only be solved (if at all) by more complex methods. A web search might turn up specialized software for that problem.
Linear Diophantine equations take the form ax + by = c. If c is the greatest common divisor of a and b this means a=z'c and b=z''c then this is Bézout's identity of the form
with a=z' and b=z'' and the equation has an infinite number of solutions. So instead of trial searching method you can check if c is the greatest common divisor (GCD) of a and b
If indeed a and b are multiples of c then x and y can be computed using extended Euclidean algorithm which finds integers x and y (one of which is typically negative) that satisfy Bézout's identity
(as a side note: this holds also for any other Euclidean domain, i.e. polynomial ring & every Euclidean domain is unique factorization domain). You can use Iterative Method to find these solutions:
Integral solution to equation `a + bx = c + dy`

Algorithm for orthogonal polynomials

and thank you for the attention you're paying to my question :)
My question is about finding an (efficient enough) algorithm for finding orthogonal polynomials of a given weight function f.
I've tried to simply apply the Gram-Schmidt algorithm but this one is not efficient enough. Indeed, it requires O(n^2) integrals. But my goal is to use this algorithm in order to find Hankel determinants of a function f. So a "direct" computation wich consists in simply compute the matrix and take its determinants requires only 2*n - 1 integrals.
But I want to use the theorem stating that the Hankel determinant of order n of f is a product of the n first leading coefficients of the orthogonal polynomials of f. The reason is that when n gets larger (say about 20), Hankel determinant gets really big and my goal is to divided it by an other big constant (for n = 20, the constant is of order 10^103). My idea is then to "dilute" the computation of the constant in the product of the leading coefficients.
I hope there is a O(n) algorithm to compute the n first orthogonal polynomials :) I've done some digging and found nothing in that direction for general function f (f can be any smooth function, actually).
EDIT: I'll precise here what the objects I'm talking about are.
1) A Hankel determinant of order n is the determinant of a square matrix which is constant on the skew diagonals. Thus for example
a b c
b c d
c d e
is a Hankel matrix of size 3 by 3.
2) If you have a function f : R -> R, you can associate to f its "kth moment" which is defined as (I'll write it in tex) f_k := \int_{\mathbb{R}} f(x) x^k dx
With this, you can create a Hankel matrix A_n(f) whose entries are (A_n(f)){ij} = f{i+j-2}, that is something of the like
f_0 f_1 f_2
f_1 f_2 f_3
f_2 f_3 f_4
With this in mind, it is easy to define the Hankel determinant of f which is simply
H_n(f) := det(A_n(f)). (Of course, it is understood that f has sufficient decay at infinity, this means that all the moments are well defined. A typical choice for f could be the gaussian f(x) = exp(-x^2), or any continuous function on a compact set of R...)
3) What I call orthogonal polynomials of f is a set of polynomials (p_n) such that
\int_{\mathbb{R}} f(x) p_j(x) p_k(x) is 1 if j = k and 0 otherwize.
(They are called like that since they form an orthonormal basis of the vector space of polynomials with respect to the scalar product
(p|q) = \int_{\mathbb{R}} f(x) p(x) q(x) dx
4) Now, it is basic linear algebra that from any basis of a vector space equipped with a scalar product, you can built a orthonormal basis thanks to the Gram-Schmidt algorithm. This is where the n^2 integrations comes from. You start from the basis 1, x, x^2, ..., x^n. Then you need n(n-1) integrals for the family to be orthogonal, and you need n more in order to normalize them.
5) There is a theorem saying that if f : R -> R is a function having sufficient decay at infinity, then we have that its Hankel determinant H_n(f) is equal to
H_n(f) = \prod_{j = 0}^{n-1} \kappa_j^{-2}
where \kappa_j is the leading coefficient of the j+1th orthogonal polynomial of f.
Thank you for your answer!
(PS: I tagged octave because I work in octave so, with a bit of luck (but I doubt it), there is a built-in function or a package already done managing this kind of think)
Orthogonal polynomials obey a recurrence relation, which we can write as
P[n+1] = (X-a[n])*P[n] - b[n-1]*P[n-1]
P[0] = 1
P[1] = X-a[0]
and we can compute the a, b coefficients by
a[n] = <X*P[n]|P[n]> / c[n]
b[n-1] = c[n-1]/c[n]
where
c[n] = <P[n]|P[n]>
(Here < | > is your inner product).
However I cannot vouch for the stability of this process at large n.

Pollard Rho factorization method

Pollard Rho factorization method uses a function generator f(x) = x^2-a(mod n) or f(x) = x^2+a(mod n) , is the choice of this function (parabolic) has got any significance or we may use any function (cubic , polynomial or even linear) as we have to identify or find the numbers belonging to same congruence class modulo n to find the non trivial divisor ?
In Knuth Vol II (The Art Of Computer Programming - Seminumerical Algorithms) section 4.5.4 Knuth says
Furthermore if f(y) mod p behaves as a random mapping from the set {0,
1, ... p-1} into itself, exercise 3.1-12 shows that the average value
of the least such m will be of order sqrt(p)... From the theory in
Chapter 3, we know that a linear polynomial f(x) = ax + c will not be
sufficiently random for our purpose. The next simplest case is
quadratic, say f(x) = x^2 + 1. We don't know that this function is
sufficiently random, but our lack of knowledge tends to support the
hypothesis of randomness, and empirical tests show that this f does
work essentially as predicted
The probability theory that says that f(x) has a cycle of length about sqrt(p) assumes in particular that there can be two values y and z such that f(y) = f(z) - since f is chosen at random. The rho in Pollard Rho contains such a junction, with the cycle containing multiple lines leading on to it. For a linear function f(x) = ax + b then for gcd(a, p) = 1 mod p (which is likely since p is prime) f(y) = f(z) means that y = z mod p, so there are no such junctions.
If you look at http://www.agner.org/random/theory/chaosran.pdf you will see that the expected cycle length of a random function is about the sqrt of the state size, but the expected cycle length of a random bijection is about the state size. If you think of generating the random function only as you evaluate it you can see that if the function is entirely random then every value seen so far is available to be chosen again at random to find a cycle, so the odds of closing the cycle increase with the cycle length, but if the function has to be invertible the only way to close the cycle is to generate the starting point, which is much less likely.

What is the name of this problem?

You are given a list of distances between various points on a single line.
For example:
100 between a and b
20 between c and b
90 between c and d
170 between a and d
Return the sorted sequence of points as they appear on the line with distances between them:
For example the above input yields:
a<----80-----> c <----20------> b <----70-----> d or the reverse sequence(doesn't matter)
What is this problem called? I would like to research it.
If anybody knows, also, what are some of the possible asymptotic runtimes achieved for this?
not sure it has a name; more formally stated, it would be:
|a-b| = 100
|c-b| = 20
|c-d| = 90
|a-d| = 170
where |x| stands for the absolute value of x
As far as the generalized system goes, if you have N equations of this type with k unknowns, you have N choices of sign. Without loss of generality (because any solution yields a second solution with reversed ordering) you can choose a sign for the first equation, and a particular value for one of the unknowns (since the whole thing can slide left and right in position). Then you have 2N-1 possibilities for the remaining equations, and all you have to do is go through them to see which ones, if any, have solutions. Because the coefficients are all +/- 1 and each equation has 2 unknowns, you just go through them one by one:
Step 1: Without loss of generality,
choose a sign for one equation
and pick a value for one unknown:
a-b = 100, a = 0
Step 2: Choose signs for the remaining absolute values.
a = 0
a-b = 100
c-b = 20
c-d = 90
a-d = 170
Step 3: go through them one by one to solve / verify there aren't conflicts
(time = N steps).
0-b = 100 => b = -100
c-b = 20 => c = -80
c-d = 90 => d = -170
a-d = 170 => OK => (0,-100,-80,-170) is a solution
Step 4: if this doesn't work, go back through the possible choices of sign
and try again, starting at step 2.
Full set of solutions is (0,-100,-80,-170)
and its negation (0,100,80,170) and any number x<sub>0</sub> added to all terms.
So an upper bound for the runtime is O(N * 2N-1) ≡ O(N * 2N).
I suppose there could be a shortcut but no obvious one comes to mind.
As written, your problem is just a system of non-linear equations (expressed with absolute values or quadratic equations). However, it looks similar to the problems of finding Golomb rulers or perfect rulers.
If you consider your constraints as quadratic equations eg. (a-b)^2=100^2, then you can formulate this as a quadratic programming problem and use some of the well-studied techniques for that class of problem.
Considering the sign of the direction of each segment X[i] -> X[i+1] it becomes a boolean satisfiability problem. I can't see an obvious simplification. The runtime is O(2^N) - specifically 2^(N-2) tests with N values and an O(1) expression to test.
Assuming a = 0 and fixing the direction of a -> b:
a = 0
b = 100
c = b + 20 X[0] = 100 + 20 X[0]
d = c + 90 X[1] = 100 + 20 X[0] + 90 X[1]
test d == 170
where X[i] is either +1 or -1.
Although the expression for d appears to require O(N) operations ( (N-2) multiplications and (N-2) additions ), by using a Gray code or other such mechanism for changing the state of only one X at a time so the cost can be O(1) per test. ( though for N=4 it probably isn't worth it )
Simplifications may arise if you either have more constraints than points - if you were given |b-d| == 70, then you only need tests two cases rather than four - essentially b,c and d become their own fully constrained sub-problem.
Simplifications may also arise from the triangular property
| |a-b| - |b-c| | <= |a-c| <= |a-b| + |b-c| for all a, b and c.
So if you have many points, and you know the total of the distances between the points up to a certain point given the assignments made to X, and that total is further from the target value than the total of the distances between the remaining points, you can then deduce that there is no combination of assignments of the remaining points which will work.
algebra...
or it may be a simplification of the traveling salesman problem
I don't have an algorithms book handy, but this sounds like a graph search problem where the paths are constrained. You could probably use Dijkstra's Algorithm or some variant of it.

Resources