Approximation of a common divisor closest to some value? - algorithm

Say we have two numbers (not necessarily integers) x1 and x2, and the user inputs a number y. What I want to find is a number y' close to y such that x1 % y' and x2 % y' are both very small (smaller than 0.02, for example, but let's call this number LIMIT). In other words, I don't need an optimal algorithm, just a good approximation.
I thank you all for your time and effort, that's really kind!
Let me explain what the problem is in my application: say a screen size is given, with a width of screenWidth and a height of screenHeight (in pixels). I fill the screen with squares of side length y'. Say the user wants the square size to be y. If y is not a divisor of screenWidth and/or screenHeight, there will be unused space at the sides of the screen, not big enough to fit squares. If that unused space is small (e.g. one row of pixels), it's not that bad, but if it isn't, it won't look good. How can I find common divisors of screenWidth and screenHeight?

I don't see how you can ensure that x1%y' and x2%y' are both below some value - if x1 is prime, nothing is going to be below your limit (if the limit is below 1) except x1 (or very close) and 1.
So the only answer that always works is the trivial y'=1.
If you are permitting non-integer divisors, then just pick y'=1/(x1*x2), since the remainder is always 0.
Without restricting the common divisor to integers, it can be anything, and the whole 'greatest common divisor' concept goes out the window.

x1 and x2 are not very large, so a simple brute force algorithm should be good enough.
Divide x1 and x2 by y and compute the floor and ceiling of the results. This gives four candidate counts: x1f, x1c, x2f, x2c.
Choose the one of these numbers closest to the exact value of x1/y (for x1f, x1c) or x2/y (for x2f, x2c). Let it be x1f, for example. Set y' = x1/x1f. If both x1 % y' and x2 % y' are not greater than the limit, success (y' is the best approximation). Otherwise add x1f - 1 to the pool of candidates (or x2f - 1, or x1c + 1, or x2c + 1, depending on which was chosen), choose the next closest number and repeat.
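A brute-force sketch of this candidate-pool search in Python (the heap-ordered pool, the `max_steps` cutoff, and treating a remainder very close to y' as "almost zero" are my own illustrative assumptions, not part of the answer):

```python
import heapq
import math

def near_divisor(x1, x2, y, limit=0.02, max_steps=1000):
    """Search for y' near y such that x1 % y' and x2 % y' are both small.

    Tries y' = x1/n and y' = x2/m for integer counts n, m near the
    exact ratios x1/y and x2/y, expanding outward from the closest ones.
    """
    def fits(yp):
        # A remainder just below y' means x almost divides evenly too.
        r1, r2 = x1 % yp, x2 % yp
        return min(r1, yp - r1) <= limit and min(r2, yp - r2) <= limit

    # Candidate pool, ordered by distance from the ideal (fractional) count.
    pool = []
    for x in (x1, x2):
        exact = x / y
        for n in (math.floor(exact), math.ceil(exact)):
            if n > 0:
                heapq.heappush(pool, (abs(exact - n), x, n))
    seen = set()
    for _ in range(max_steps):
        if not pool:
            break
        _, x, n = heapq.heappop(pool)
        if (x, n) in seen:
            continue
        seen.add((x, n))
        yp = x / n
        if fits(yp):
            return yp
        # Widen the pool: the next counts on either side of n.
        exact = x / y
        for nn in (n - 1, n + 1):
            if nn > 0 and (nn != n) and (x, nn) not in seen:
                heapq.heappush(pool, (abs(exact - nn), x, nn))
    return None
```

For example, with x1 = 900, x2 = 1440 and a requested y = 50, the search settles on 45, which divides both dimensions exactly.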

You want to fit the maximum amount of evenly spaced squares inside a fixed area. It's possible to find the optimal solution for your problem with some simple math.
Let's say you have a region with width = W and height = H, and you are trying to fit squares with sides of length = x. The maximum numbers of squares horizontally and vertically, which I will call max_hor and max_vert respectively, are max_hor = floor(W/x) and max_vert = floor(H/x). If you draw all the squares side by side, without any spacing, there will be a leftover in each row and each column. Let's call the horizontal/vertical leftovers rest_w and rest_h respectively. This case is illustrated in the figure below:
Note that rest_w = W - max_hor * x and rest_h = H - max_vert * x.
What you want is to divide rest_w and rest_h equally, generating small horizontal and vertical spaces of sizes space_w and space_h like the figure below:
Note that space_w=rest_w/(max_hor+1) and space_h=rest_h/(max_vert+1).
Is that the number you are looking for?
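Assuming this is the computation you mean, it can be written directly (a small Python sketch; the function name is mine):

```python
import math

def layout(W, H, x):
    """Spacing needed to spread the leftover pixels evenly, as derived above.
    W, H: region size in pixels; x: square side length."""
    max_hor = math.floor(W / x)        # squares per row
    max_vert = math.floor(H / x)       # squares per column
    rest_w = W - max_hor * x           # unused width if squares are packed flush
    rest_h = H - max_vert * x          # unused height
    space_w = rest_w / (max_hor + 1)   # gap between and around the columns
    space_h = rest_h / (max_vert + 1)  # gap between and around the rows
    return max_hor, max_vert, space_w, space_h
```

For a 1440 x 900 screen and 100-pixel squares this gives 14 x 9 squares with a horizontal gap of 40/15 pixels and no vertical gap.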

I believe I made a mistake, but I don't see why. Based on Phil H's answer, I decided to restrict to integer values, but multiply x1 and x2 by a power of 10. Afterwards, I'd divide the common integer divisors by that number.
Online, I found a common factors calculator. Experimenting with it made me realize it wouldn't give me any common divisors... I tried multiple cases (x1 = 878000 and x2 = 1440000 and some others), and none of them had good results.
In other words, you probably have to multiply with very high numbers to achieve results, but that would make the calculation very, very slow.
If anyone has a solution to this problem, that would be awesome. For now though, I decided to take advantage of the fact that screenWidth and screenHeight are good numbers to work with, since they are the dimension of a computer screen. 900 and 1440 have more than enough common divisors, so I can work with that...
Thank you all for your answers on this thread and on my previous thread about an optimal algorithm for this problem.

Related

Rational approximation of rational exponentiation root with error control

I am looking for an algorithm that would efficiently calculate b^e where b and e are rational numbers, ensuring that the approximation error won't exceed given err (rational as well). Explicitly, I am looking for a function:
rational exp(rational base, rational exp, rational err)
that satisfies |exp(b, e, err) - b^e| < err
Rational numbers are represented as pairs of big integers. Let's assume that all rationality preserving operations like addition, multiplication etc. are already defined.
I have found several approaches, but they did not allow me to control the error clearly enough. In this problem I don't care about integer overflow. What is the best approach to achieve this?
This one is complicated, so I'm going to outline the approach that I'd take. I do not promise no errors, and you'll have a lot of work left.
I will change variables from what you said: exp(x, y, err) will be x^y within error err. If y is not in the range 0 <= y < 1, then we can easily multiply by an appropriate x^k with k an integer to make it so. So we only need to worry about fractional y.
If all numerators and denominators were small, it would be easy to tackle this by first taking an integer power, and then taking a root using Newton's method. But that naive idea will fall apart painfully when you try to estimate something like (1000001/1000000)^(2000001/1000000). So the challenge is to keep that from blowing up on you.
I would recommend looking at the problem of calculating x^y as x^y = (x0^y0) * (x0^(y-y0)) * (x/x0)^y = (x0^y0) * e^((y-y0) * log(x0)) * e^(y * log(x/x0)). And we will choose x0 and y0 such that the calculations are easier and the errors are bounded.
To bound the errors, we can first come up with a naive upper bound b on x0^y0 - something like "next highest integer than x to the power of the next highest integer than y". We will pick x0 and y0 to be close enough to x and y that the latter terms are under 2. And then we just need to have the three terms estimated to within err/12, err/(6*b) and err/(6*b). (You might want to make those errors tighter, say half that, and then make the final answer a nearby rational.)
Now when we pick x0 and y0 we will be aiming for "close rational with smallish numerator/denominator". For that we start calculating the continued fraction. This gives a sequence of rational numbers that quickly converges to a target real. If we just cut off the sequence fairly soon, we can quickly find a rational number that is within any desired distance of a target real while keeping relatively small numerators and denominators.
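A sketch of that cutoff in Python, using exact rationals (the helper names are my own; this is just the standard continued-fraction recurrence, not code from the answer):

```python
from fractions import Fraction

def convergents(r):
    """Yield the continued-fraction convergents of a rational r.
    Cutting the sequence off early gives a close approximation to r
    with relatively small numerator and denominator."""
    p_prev, p = 0, 1   # numerators of consecutive convergents
    q_prev, q = 1, 0   # denominators of consecutive convergents
    rest = r
    while True:
        a = rest.numerator // rest.denominator  # next coefficient
        p_prev, p = p, a * p + p_prev
        q_prev, q = q, a * q + q_prev
        yield Fraction(p, q)
        frac = rest - a
        if frac == 0:
            return           # r is rational, so the sequence terminates
        rest = 1 / frac

def approx_within(r, eps):
    """First convergent within eps of r (an assumed helper name)."""
    for c in convergents(r):
        if abs(c - r) <= eps:
            return c
```

For instance, approximating 31415926/10000000 to within 1/100 yields 22/7 rather than something with a seven-digit denominator.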
Let's work from the third term backwards.
We want y * log(x/x0) < log(2). But from the Taylor series if x/2 < x0 < 2x then log(x/x0) < x/x0 - 1. So we can search the continued fraction for an appropriate x0.
Once we have found it, we can use the Taylor series for log(1+z) to calculate log(x/x0) to within err/(12*y*b). And then the Taylor series for e^z to calculate the term to our desired error.
The second term is more complicated. We need to estimate log(x0). What we do is find an appropriate integer k such that 1.1^k <= x0 < 1.1^(k+1). And then we can estimate both k * log(1.1) and log(x0 / 1.1^k) fairly precisely. Find a naive upper bound to that log and use it to find a close enough y0 for the second term to be within 2. And then use the Taylor series to estimate e^((y-y0) * log(x0)) to our desired precision.
For the first term we use the naive method of raising x0 to an integer and then Newton's method to take a root, to give x0^y0 to our desired precision.
Then multiply them together, and we have an answer. (If you chose the "tighter errors, nicer answer", then now you'd do a continued fraction on that answer to pick a better rational to return.)

Querying large amount of multidimensional points in R^N

I'm looking at listing/counting the number of integer points in R^N (in the sense of Euclidean space), within certain geometric shapes, such as circles and ellipses, subject to various conditions, for small N. By this I mean that N < 5, and the conditions are polynomial inequalities.
As a concrete example, take R^2. One of the queries I might like to run is "How many integer points are there in an ellipse (parameterised by x = 4 cos(theta), y = 3 sin(theta) ), such that y * x^2 - x * y = 4?"
I could implement this in Haskell like this:
ghci> let latticePoints = [(x,y) | x <- [-4..4], y <-[-3..3], 9*x^2 + 16*y^2 <= 144, y*x^2 - x*y == 4]
and then I would have:
ghci> latticePoints
[(-1,2),(2,2)]
Which indeed answers my question.
Of course, this is a very naive implementation, but it demonstrates what I'm trying to achieve. (I'm also only using Haskell here as I feel it most directly expresses the underlying mathematical ideas.)
Now, if I had something like "In R^5, how many integer points are there in a 4-sphere of radius 1,000,000, satisfying x^3 - y + z = 20?", I might try something like this:
ghci> :{
Prelude| let latticePoints2 = [(x,y,z,w,v) | x <-[-1000..1000], y <- [-1000..1000],
Prelude| z <- [-1000..1000], w <- [-1000..1000], v <- [-1000..1000],
Prelude| x^2 + y^2 + z^2 + w^2 + v^2 <= 1000000, x^3 - y + z == 20]
Prelude| :}
so if I now type:
ghci> latticePoints2
Not much will happen...
I imagine the issue is that it's effectively looping through 2000^5 (32 quadrillion!) points, and it's clearly unreasonable of me to expect my computer to deal with that. I can't imagine doing a similar implementation in Python or C would help matters much either.
So if I want to tackle a large number of points in such a way, what would be my best bet in terms of general algorithms or data structures? I saw in another thread (Count number of points inside a circle fast), someone mention quadtrees as well as K-D trees, but I wouldn't know how to implement those, nor how to appropriately query one once it was implemented.
I'm aware some of these numbers are quite large, but the biggest circles, ellipses, etc I'd be dealing with are of radius 10^12 (one trillion), and I certainly wouldn't need to deal with R^N with N > 5. If the above is NOT possible, I'd be interested to know what sort of numbers WOULD be feasible?
There is no general way to solve this problem. The problem of finding integer solutions to algebraic equations (equations of this sort are called Diophantine equations) is known to be undecidable. Apparently, you can write equations of this sort such that solving the equations ends up being equivalent to deciding whether a given Turing machine will halt on a given input.
In the examples you've listed, you've always constrained the points to be on some well-behaved shape, like an ellipse or a sphere. While this particular class of problem is definitely decidable, I'm skeptical that you can efficiently solve these problems for more complex curves. I suspect that it would be possible to construct short formulas that describe curves that are mostly empty but have a huge bounding box.
If you happen to know more about the structure of the problems you're trying to solve - for example, if you're always dealing with spheres or ellipses - then you may be able to find fast algorithms for this problem. In general, though, I don't think you'll be able to do much better than brute force. I'm willing to admit that (and in fact, hopeful that) someone will prove me wrong about this, though.
The idea behind the kd-tree method is that you recursively subdivide the search box and try to rule out whole boxes at a time. Given the current box, use some method that either (a) declares that all points in the box match the predicate, (b) declares that no points in the box match the predicate, or (c) makes no declaration (one possibility, which may be particularly convenient in Haskell: interval arithmetic). On (c), cut the box in half (say along the longest dimension) and recursively count in the halves. Obviously the method can choose (c) all the time, which devolves to brute force; the goal here is to do (a) or (b) as much as possible.
The performance of this method is very dependent on how it's instantiated. Try it -- it shouldn't be more than a couple dozen lines of code.
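To make the idea concrete, here is a small Python instantiation for the ellipse example from the question, using interval bounds on the quadratic to implement cases (a), (b) and (c). The box representation and function name are my own choices; this is an illustrative sketch, not a general kd-tree library.

```python
def count_in_ellipse(box):
    """Count lattice points with 9*x^2 + 16*y^2 <= 144 inside box.
    A box is ((xlo, xhi), (ylo, yhi)) with inclusive integer bounds."""
    (xlo, xhi), (ylo, yhi) = box
    if xlo > xhi or ylo > yhi:
        return 0

    def sq_range(lo, hi):
        # Range of t*t over the integers t in [lo, hi].
        cands = [lo * lo, hi * hi]
        return (0 if lo <= 0 <= hi else min(cands)), max(cands)

    sx_lo, sx_hi = sq_range(xlo, xhi)
    sy_lo, sy_hi = sq_range(ylo, yhi)
    lo = 9 * sx_lo + 16 * sy_lo
    hi = 9 * sx_hi + 16 * sy_hi
    if hi <= 144:   # (a) whole box inside: count without visiting points
        return (xhi - xlo + 1) * (yhi - ylo + 1)
    if lo > 144:    # (b) whole box outside: rule it out entirely
        return 0
    # (c) undecided: split the longer side in half and recurse.
    # A single-point box is always decided by (a) or (b), so this terminates.
    if xhi - xlo >= yhi - ylo:
        mid = (xlo + xhi) // 2
        return (count_in_ellipse(((xlo, mid), (ylo, yhi)))
                + count_in_ellipse(((mid + 1, xhi), (ylo, yhi))))
    mid = (ylo + yhi) // 2
    return (count_in_ellipse(((xlo, xhi), (ylo, mid)))
            + count_in_ellipse(((xlo, xhi), (mid + 1, yhi))))
```

The payoff is that large all-inside or all-outside boxes are counted or discarded in one step instead of point by point.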
For a nicely connected region, assuming your shape is significantly smaller than your containing search space, and given a seed point, you could use a growth/building algorithm:
Given a seed point:
    Push seed point into test-queue
    while test-queue has items:
        Pop item from test-queue
        If item tests to be within region (e.g. using a callback function):
            Add item to inside-set
            for each neighbour point (generated on the fly):
                if neighbour not in outside-set and neighbour not in inside-set:
                    Add neighbour to test-queue
        else:
            Add item to outside-set
    return inside-set
The trick is to find an initial seed point that is inside the function.
Make sure your set implementation gives O(1) duplicate checking. This method will eventually break down with large numbers of dimensions as the surface area exceeds the volume, but for 5 dimensions should be fine.
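A direct Python translation of the pseudocode above (the membership test is passed as a callback, as suggested; names are mine, and I track a single `queued` set to get the O(1) duplicate checking):

```python
from collections import deque
from itertools import product

def grow_region(seed, inside):
    """Flood-fill the lattice points of a connected region from a seed.
    `inside` is the membership callback; axis-aligned neighbours are
    generated on the fly as tuples."""
    dim = len(seed)
    inside_set, outside_set = set(), set()
    queue = deque([seed])
    queued = {seed}            # everything ever enqueued: O(1) duplicate check
    while queue:
        p = queue.popleft()
        if inside(p):
            inside_set.add(p)
            for i, d in product(range(dim), (-1, 1)):
                n = p[:i] + (p[i] + d,) + p[i + 1:]
                if n not in queued:
                    queued.add(n)
                    queue.append(n)
        else:
            outside_set.add(p)
    return inside_set

# e.g. all integer points in a disc of radius 3 around the origin:
disc = grow_region((0, 0), lambda p: p[0] ** 2 + p[1] ** 2 <= 9)
```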

Robot moving in a grid algorithm possible paths and time complexity ?

I'm not able to understand how, for the problem below, the number of paths is (x+y)!/x!y!. I understand it comes from choosing x items out of a path of x+y items, but why is it not (choosing x items out of x+y) plus (choosing y items out of x+y)? Why does it have to be only x?
A robot is located at the top-left corner of a m x n grid (marked
‘Start’ in the diagram below). The robot can only move either down or
right at any point in time. The robot is trying to reach the
bottom-right corner of the grid (marked ‘Finish’ in the diagram
below). How many possible paths are there?
Are all of these paths unique ?
How do I determine that ?
And what would be the time complexity for the backtracking algorithm ?
This is somewhat based on Mukul Joshi's answer, but hopefully a little clearer.
To go from 0,0 to x,y, you need to move right exactly x times and down exactly y times.
Let each right movement be represented by a 0 and a down movement by a 1.
Let a string of 0s and 1s then indicate a path from 0,0 to x,y. This path will contain x 0s and y 1s.
Now we want to count all such strings. This is equivalent to counting the number of permutations of any string containing x 0s and y 1s. This string is a multiset (each element can appear more than once), thus we want a multiset permutation, which can be calculated as n!/(m1!m2!...mk!) where n is the total number of characters, k is the number of unique characters and mi is the number of times the ith unique character is repeated. Since there are x+y characters in total, and 0 is repeated x times and 1 is repeated y times, we get to (x+y)!/x!y!.
Time Complexity:
The time complexity of backtracking / brute force would involve having to explore all of these paths. Think of it as a tree, with there being (x+y)!/x!y! leaves. I might be wrong, but I think the number of nodes in trees with a branching factor > 1 can be represented as the big-O of the number of leaves, thus we end up with O((x+y)!/x!y!) nodes, and thus the same time complexity.
Ok, let me walk you through a solution to this problem so that it's easier to follow.
First of all, let us decide on a solution algorithm. We will count, for every cell, all possible paths from it to the end. The algorithm checks cells and writes into each the sum of the cells to its right and below. We do this because the robot can move down and follow any of the downward paths, or move right and follow any of the rightward paths; adding them gives the total number of different paths. It is quite easy to prove that these paths are distinct. If you want, I can do it in the comments.
The initial value is 1 for the bottom-right cell (finish), because there is only 1 way to get there from that cell (not to move at all). And if a cell doesn't exist (e.g. the cell below a bottommost cell), it has the value 0.
Building the cell values one by one results in a Pascal's triangle, whose value in cell (x, y) is (x + y)! / (x! y!), where x is the horizontal distance from the finish and y the vertical one.
Talking about complexity, we have x * y iterations over grid cells, each taking constant time. If you don't want to use the backtracking algorithm, you can use the formula mentioned above and get O(x + y) instead of O(x * y).
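A minimal Python sketch of the bottom-up fill just described (cell names and orientation are my own choices):

```python
def count_paths_dp(rows, cols):
    """Fill the grid from the finish cell backwards: each cell holds the
    number of paths from it to the bottom-right, i.e. the sum of the
    cell below and the cell to the right."""
    grid = [[0] * cols for _ in range(rows)]
    for r in range(rows - 1, -1, -1):
        for c in range(cols - 1, -1, -1):
            if r == rows - 1 and c == cols - 1:
                grid[r][c] = 1   # finish: exactly one (empty) path
            else:
                below = grid[r + 1][c] if r + 1 < rows else 0
                right = grid[r][c + 1] if c + 1 < cols else 0
                grid[r][c] = below + right
    return grid[0][0]
```

For a 3 x 4 grid this gives 10, matching (2+3)!/(2!3!) since such a path has 2 down moves and 3 right moves.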
Well here is the explanation.
To reach the destination, no matter how you go, the path has to cover m rows and n columns.
Consider that you represent a row move by 1 and a column move by 0. Your path is then a string of m+n characters, but it can have only m 1s and n 0s.
If you had m+n different characters, the number of permutations would be (m+n)!, but when you have repeating characters it is (m+n)!/m!n!. Refer to this.
Of course these will be unique. Test it on a 4*3 grid and you can see it.
You don't add "How many ways can I distribute my X moves?" to "How many ways can I distribute my Y moves?" for two reasons:
The distribution of X moves and Y moves are not independent. For each configuration of X moves, there is only 1 possible configuration of Y moves.
If they were independent, you wouldn't add them, you would multiply them. For example, if I have X different color shirts and Y different color pants, there are X * Y different combinations of shirts and pants.
Note that for #1 there is nothing special about X - I could just have easily chosen Y and said: "The distribution of Y moves and X moves are not independent. For each configuration of Y moves, there is only 1 possible configuration of X moves." Which is why, as others have pointed out, counting the number of ways to distribute your Y moves gives the same result as counting the number of ways to distribute your X moves.

Which algorithm will be required to do this?

I have data of this form:
for x=1, y is one of {1,4,6,7,9,18,16,19}
for x=2, y is one of {1,5,7,4}
for x=3, y is one of {2,6,4,8,2}
....
for x=100, y is one of {2,7,89,4,5}
Only one of the values in each set is the correct value, the rest is random noise.
I know that the correct values describe a sinusoid function whose parameters are unknown. How can I find the correct combination of values, one from each set?
I am looking for something like a "travelling salesman"-style combinatorial optimization algorithm.
You're trying to do curve fitting, for which there are several algorithms depending on the type of curve you want to fit your curve to (linear, polynomial, etc.). I have no idea whether there is a specific algorithm for sinusoidal curves (Fourier approximations), but my first idea would be to use a polynomial fitting algorithm with a polynomial approximation of the sine.
I wonder whether you need to do this in the course of another larger program, or whether you are trying to do this task on its own. If so, then you'd be much better off using a statistical package, my preferred one being R. It allows you to import your data and fit curves and draw graphs in just a few lines, and you could also use R in batch-mode to call it from a script or even a program (this is what I tend to do).
It depends on what you mean by "exactly", and what you know beforehand. If you know the frequency w, and that the sinusoid is unbiased, you have an equation
a cos(w * x) + b sin(w * x)
with two (x,y) points at different x values you can find a and b, and then check the generated curve against all the other points. Choose the two x values with the smallest number of y observations and try it for all the y's. If there is a bias, i.e. your equation is
a cos(w * x) + b sin(w * x) + c
You need to look at three x values.
If you do not know the frequency, you can try the same technique, unfortunately the solutions may not be unique, there may be more than one w that fits.
Edit As I understand your problem, you have a real y value for each x and a bunch of incorrect ones. You want to find the real values. The best way to do this is to fit curves through a small number of points and check to see if the curve fits some y value in the other sets.
If not all the x values have valid y values then the same technique applies, but you need to look at a much larger set of pairs, triples or quadruples (essentially every pair, triple, or quad of points with different y values)
If your problem is something else, and I suspect it is, please specify it.
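The two-point fit for the unbiased, known-frequency case amounts to solving a 2x2 linear system; a Python sketch (function names and the tolerance default are my own assumptions):

```python
import math

def fit_two_points(w, p1, p2):
    """Solve a*cos(w*x) + b*sin(w*x) = y through two (x, y) points,
    for a known frequency w and no bias term. Returns (a, b), or None
    if the 2x2 system is singular (the two x values are degenerate)."""
    (x1, y1), (x2, y2) = p1, p2
    c1, s1 = math.cos(w * x1), math.sin(w * x1)
    c2, s2 = math.cos(w * x2), math.sin(w * x2)
    det = c1 * s2 - c2 * s1
    if abs(det) < 1e-12:
        return None
    a = (y1 * s2 - y2 * s1) / det   # Cramer's rule
    b = (c1 * y2 - c2 * y1) / det
    return a, b

def matches(w, a, b, x, ys, tol=1e-6):
    """Does the candidate curve hit some observed y for this x?"""
    pred = a * math.cos(w * x) + b * math.sin(w * x)
    return any(abs(pred - y) <= tol for y in ys)
```

Fitting through two points and then calling matches on the remaining sets implements the "check the generated curve against all the other points" step.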
1. Define sinusoid. Most people take that to mean a function of the form a cos(w * x) + b sin(w * x) + c. If you mean something different, specify it.
2. Specify exactly what success looks like. An example with, say, 10 points instead of 100 would be nice.
It is extremely unclear what this has to do with combinatorial optimization.
Sinusoidal functions are so general that almost any choice of one y from each set can be fitted by some sinusoid, unless you impose conditions (e.g. frequency < 100, or all parameters are integers). Without such constraints it is theoretically impossible to differentiate noise from data, so work on finding those conditions from your data source/experiment first.
By sinusoidal, do you mean a function that is increasing for n steps, then decreasing for n steps, and so on? If so, you can model your data as a sequence of nodes connected by up-links and down-links. For each node (possible value of y), record the length and end value of chains of only ascending or only descending links (there will be multiple chains per node). Then scan for consecutive runs of equal length and opposite direction, modulo some initial offset.

How to choose group of numbers in the vector

I have an application with some probabilities of measured features, and I want to select the n best features from a vector of real numbers. The vector is normalized; the sum of all its numbers is 1 (it is a probability distribution over features).
I want to select a group of the n largest numbers, with n less than N (assume n is approximately 8 at most). The numbers have to be close together without gaps, and they should also have a large sum (the sum of the remaining numbers should be several times lower).
Any ideas how to accomplish that?
I tried to use the 80% quantile (but it is not sensitive to relatively large gaps, as in [0.2, 0.2, 0.01, 0.01, 0.001, 0.001, ...] of length ~100), and I tried a threshold between two successive numbers, but nothing worked too well.
I have some partial solution at this moment but I am just wondering if there is some simple solution that I have overlooked.
John's answer is good. Also you might try
sort the probabilities
find the largest gap between successive probabilities
work up from there
From there, it's starting to sound like a pattern-recognition problem. My favorite method is Markov chain Monte Carlo (MCMC).
Edit: Since you clarified your question, my first thought is, since you only have 8 possible answers, develop a score for each one, based on how much probability it contains and whether or not it splits at a gap, and make a heuristic judgement.
Further edit: This sounds a bit like logistic regression. You want to find a value of P that effectively divides your set into members and non-members. For a given value of P, you can compute a log-likelihood for the ensemble, and choose P that maximizes that.
It sounds like you're wanting to select the n largest probabilities but the number n is flexible. If n were fixed, say n=10, you could just sort your vector and pull out the top 10 items. But from your example it sounds like you'd like to use a smaller value of n if there's a natural break in the data. Maybe you want to start with the largest probability and go down the list selecting items until the sum of the probabilities you pick crosses some threshold.
Maybe you have an implicit optimization problem where you want to maximize some probability with some penalty for large n. Try stating your problem that way. You might find your own answer, or you might be able to rephrase your question here in a way that helps other people give you a better answer.
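A greedy sketch of that thresholding idea in Python (the parameter names n_max and mass, and their default values, are illustrative assumptions, not from the answer):

```python
def select_top(probs, n_max=8, mass=0.8):
    """Take probabilities in decreasing order until either their sum
    crosses `mass` or n_max items have been picked."""
    picked, total = [], 0.0
    for p in sorted(probs, reverse=True):
        if len(picked) >= n_max or total >= mass:
            break
        picked.append(p)
        total += p
    return picked
```

Tuning mass (or replacing it with a gap test between successive picks) is where the "implicit optimization problem" mentioned above comes in.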
I'm not really sure if this is what you want, but it seems you want to do the following.
Let's assume that the probabilities are x_1, ..., x_N in increasing order. Then you should try to find 1 <= i < j <= N such that the function
f(i,j) = (x_i + x_(i+1) + ... + x_j)/(x_j - x_i)
is maximized. This can be done naively in quadratic time.
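The naive quadratic-time search can be written with prefix sums (a Python sketch; skipping tied endpoints to avoid dividing by zero is my own choice):

```python
def best_group(xs):
    """Maximize f(i, j) = (x_i + ... + x_j) / (x_j - x_i) over i < j,
    for xs sorted in increasing order. Returns ((i, j), f(i, j))."""
    n = len(xs)
    prefix = [0.0]                    # prefix[k] = x_0 + ... + x_{k-1}
    for x in xs:
        prefix.append(prefix[-1] + x)
    best, best_ij = float("-inf"), None
    for i in range(n):
        for j in range(i + 1, n):
            if xs[j] == xs[i]:
                continue              # tied values: f is undefined
            f = (prefix[j + 1] - prefix[i]) / (xs[j] - xs[i])
            if f > best:
                best, best_ij = f, (i, j)
    return best_ij, best
```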
