Optimized algorithm for converting a decimal to a "pretty" fraction - algorithm

Rather than converting an arbitrary decimal to an exact fraction (something like 323527/4362363), I am trying to convert to just common easily-discernible (in terms of human-readability) quantities like 1/2, 1/4, 1/8 etc.
Other than using a series of if-then, less than/equal to etc comparisons, are there more optimized techniques to do this?
Edit: In my particular case, approximations are acceptable. The idea is that 0.251243 ~ 0.25 = 1/4 - in my usage case, that's "good enough", with the latter more preferable for human readability in terms of a quick indicator (not used for calculation, just used as display numerics).

Look up "continued fraction approximation". Wikipedia has a basic introduction in its "continued fraction" article, but there are optimized algorithms that generate the approximated value while generating the fraction.
Then pick some stopping heuristic, a combination of size of denominator and closeness of approximation, for when you're "close enough".
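As a rough illustration of the idea above, here is a minimal Python sketch that walks the continued-fraction convergents of a value and stops on a combined heuristic. The name pretty_fraction and the defaults max_denominator=16 and tolerance=0.005 are assumptions chosen for the example, not values taken from the answer.

```python
from fractions import Fraction
from math import floor


def pretty_fraction(value, max_denominator=16, tolerance=0.005):
    """Return an early continued-fraction convergent of `value`.

    Stops as soon as the convergent is within `tolerance` of `value`,
    or when the next convergent would exceed `max_denominator`.
    Both thresholds are illustrative assumptions.
    """
    h_prev, k_prev = 1, 0          # convergent before the current one
    h, k = floor(value), 1         # current convergent h/k
    x = value
    while abs(value - h / k) > tolerance:
        frac = x - floor(x)
        if frac == 0:              # expansion terminated exactly
            break
        x = 1 / frac
        a = floor(x)               # next continued-fraction term
        h_next, k_next = a * h + h_prev, a * k + k_prev
        if k_next > max_denominator:
            break                  # denominator no longer "pretty"
        h_prev, k_prev, h, k = h, k, h_next, k_next
    return Fraction(h, k)


print(pretty_fraction(0.251243))   # 1/4
```

Tightening tolerance or raising max_denominator trades readability for accuracy.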

You can use the Euclidean algorithm to get the greatest common divisor of the numerator and denominator, and divide both by it.

In the following, I'm going to assume that our decimals fall in between 0 and 1. It should be straightforward to adapt this to larger numbers and negative numbers.
Probably the easiest thing to do would be to choose the largest denominator that you would find acceptable and then create a list of all fractions between 0 and 1 whose denominators are less than or equal to it. Be sure to avoid any fractions which can be simplified. Obviously, once you've listed 1/2, you don't need 2/4. You can avoid fractions which can be simplified by checking that the GCD of the numerator and denominator is 1 using Euclid's algorithm. Once you have your list, evaluate the fractions as floating point numbers (probably doubles, but the data type obviously depends on your choice of programming language). Then insert them into a balanced binary search tree storing both the original fraction and the floating point evaluation of the fraction. You should only need to do this once to set things up initially, so the n*log(n) time (where n is the number of fractions) isn't very much.
Then, whenever you get a number, simply search the tree to find the closest number to it which is in the search tree. Note that this is slightly more complicated than searching for an exact match because the node you're looking for may not be a leaf node. So, as you traverse the tree, keep a record of the closest valued node that you have visited. Once you reach a leaf node and compare that one to the closest valued node that you have visited, you are done. Whichever one is closest, its fraction is your answer.
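A minimal Python sketch of this lookup follows. It swaps the balanced binary search tree for a sorted list searched with the standard bisect module, which gives the same O(log n) lookup for a table that is built once. The names build_table and nearest_fraction, and the choice of 8 as the largest denominator, are assumptions for illustration.

```python
from bisect import bisect_left
from fractions import Fraction
from math import gcd


def build_table(max_denominator):
    """List every reduced fraction in [0, 1] with a small denominator."""
    pairs = sorted(
        (n / d, Fraction(n, d))
        for d in range(1, max_denominator + 1)
        for n in range(d + 1)
        if gcd(n, d) == 1          # skip fractions that can be simplified
    )
    values = [v for v, _ in pairs]      # float evaluations, sorted
    fractions = [f for _, f in pairs]   # matching exact fractions
    return values, fractions


def nearest_fraction(values, fractions, x):
    """Return the table fraction whose float value is closest to x."""
    i = bisect_left(values, x)
    # Only the neighbours on either side of the insertion point can be closest.
    candidates = [j for j in (i - 1, i) if 0 <= j < len(values)]
    best = min(candidates, key=lambda j: abs(values[j] - x))
    return fractions[best]


values, fractions = build_table(8)
print(nearest_fraction(values, fractions, 0.251243))   # 1/4
```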

Here is a suggestion: Assuming your starting fraction is p/q
Calculate r = p/q as a floating point value (e.g. r = float(p)/float(q))
Calculate the rounded decimal x = round(10000*r)
Calculate the GCD (greatest common divisor) of x and 10000: s = GCD(x, 10000)
Represent the result as m / n where m = x/s and n = 10000/s (your example computes to 371 / 5000)
Normally, all denominators of 1000 are fairly human readable.
This might not provide the best result when the value is closer to simpler cases such as 1/3. However, I personally find 379/1000 much more human readable than 47/62 (which is the shortest fractional representation). You can add a few exceptions to fine tune such process though (e.g. calculating the p/GCD(p,q) , q/GCD(p,q) and accepting it if one of those are single digit values before proceeding to this method)
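The steps of this suggestion can be sketched in a few lines of Python. The function name scaled_fraction and the scale parameter are assumptions for illustration; rounding (rather than truncating) is what reproduces the 371 / 5000 figure for the fraction in the question.

```python
from math import gcd


def scaled_fraction(p, q, scale=10000):
    """Approximate p/q by a fraction whose denominator divides `scale`."""
    x = round(scale * p / q)   # rounded decimal, as an integer numerator
    s = gcd(x, scale)          # common factor to cancel
    return x // s, scale // s


print(scaled_fraction(323527, 4362363))   # (371, 5000)
```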

Pretty dumb solution, just for "previewing" fraction :
factor = 1/decimal
result = 1/Round(factor)
mult = 1
while (result == 1) {
mult = mult * 10
result = (1 * mult)/(Round(mult * factor))
}
result = simplify_with_GCD(result)
good luck!

Related

Rational approximation of rational exponentiation root with error control

I am looking for an algorithm that would efficiently calculate b^e where b and e are rational numbers, ensuring that the approximation error won't exceed given err (rational as well). Explicitly, I am looking for a function:
rational exp(rational base, rational exp, rational err)
that would preserve law |exp(b, e, err) - b^e| < err
Rational numbers are represented as pairs of big integers. Let's assume that all rationality preserving operations like addition, multiplication etc. are already defined.
I have found several approaches, but they did not allow me to control the error clearly enough. In this problem I don't care about integer overflow. What is the best approach to achieve this?
This one is complicated, so I'm going to outline the approach that I'd take. I do not promise no errors, and you'll have a lot of work left.
I will change variables from what you said to exp(x, y, err), meaning x^y to within error err. If y is not in the range 0 <= y < 1, then we can easily multiply by an appropriate x^k with k an integer to make it so. So we only need to worry about fractional y.
If all numerators and denominators were small, it would be easy to tackle this by first taking an integer power, and then taking a root using Newton's method. But that naive idea will fall apart painfully when you try to estimate something like (1000001/1000000)^(2000001/1000000). So the challenge is to keep that from blowing up on you.
I would recommend looking at the problem of calculating x^y as x^y = (x0^y0) * (x0^(y-y0)) * (x/x0)^y = (x0^y0) * e^((y-y0) * log(x0)) * e^(y * log(x/x0)). And we will choose x0 and y0 such that the calculations are easier and the errors are bounded.
To bound the errors, we can first come up with a naive upper bound b on x0^y0 - something like "the next integer above x to the power of the next integer above y". We will pick x0 and y0 to be close enough to x and y that the latter terms are under 2. And then we just need to have the three terms estimated to within err/12, err/(6*b) and err/(6*b). (You might want to make those errors tighter, say half that, and then make the final answer a nearby rational.)
Now when we pick x0 and y0 we will be aiming for "close rational with smallish numerator/denominator". For that we start calculating the continued fraction. This gives a sequence of rational numbers that quickly converges to a target real. If we just cut off the sequence fairly soon, we can quickly find a rational number that is within any desired distance of a target real while keeping relatively small numerators and denominators.
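One way to sketch the "cut off the continued fraction early" step in Python is shown below, using exact Fraction arithmetic so the error check is itself exact. The name close_rational and its interface are assumptions for illustration, and max_error is assumed to be positive.

```python
from fractions import Fraction


def close_rational(target, max_error):
    """First continued-fraction convergent of `target` within `max_error`.

    `target` and `max_error` are Fractions; `max_error` must be > 0.
    """
    h_prev, k_prev = 1, 0
    a = target.numerator // target.denominator   # integer part
    h, k = a, 1
    rest = target - a                            # fractional remainder
    while abs(target - Fraction(h, k)) >= max_error:
        # rest is nonzero here: a zero remainder would mean h/k == target.
        rest = 1 / rest
        a = rest.numerator // rest.denominator
        h, k, h_prev, k_prev = a * h + h_prev, a * k + k_prev, h, k
        rest -= a
    return Fraction(h, k)


print(close_rational(Fraction(314159, 100000), Fraction(1, 100)))   # 22/7
```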
Let's work from the third term backwards.
We want y * log(x/x0) < log(2). But from the Taylor series if x/2 < x0 < 2x then log(x/x0) < x/x0 - 1. So we can search the continued fraction for an appropriate x0.
Once we have found it, we can use the Taylor series for log(1+z) to calculate log(x/x0) to within err/(12*y*b). And then the Taylor series for e^z to calculate the term to our desired error.
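For the "Taylor series for e^z to our desired error" step, a minimal exact-arithmetic sketch might look like the following. It assumes |z| <= 1 (which holds when the exponent is kept below log(2) as described), in which case the tail of the series is at most twice the first omitted term. The name exp_taylor is hypothetical.

```python
from fractions import Fraction


def exp_taylor(z, max_error):
    """Rational approximation of e^z with error below `max_error`.

    Assumes |z| <= 1, so the series tail after a term is bounded by
    twice the magnitude of the next term.
    """
    if abs(z) > 1:
        raise ValueError("this sketch only handles |z| <= 1")
    total = Fraction(0)
    term = Fraction(1)             # z^0 / 0!
    n = 0
    while True:
        total += term              # total now includes z^n / n!
        n += 1
        term = term * z / n        # next term, z^n / n!
        if 2 * abs(term) < max_error:
            return total


approx = exp_taylor(Fraction(1, 2), Fraction(1, 10**6))
print(float(approx))
```

The partial sums have fast-growing denominators, so in practice one would probably round the result to a nearby simpler rational while keeping some error budget in reserve.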
The second term is more complicated. We need to estimate log(x0). What we do is find an appropriate integer k such that 1.1^k <= x0 < 1.1^(k+1). And then we can estimate both k * log(1.1) and log(x0 / 1.1^k) fairly precisely. Find a naive upper bound to that log and use it to find a close enough y0 for the second term to be within 2. And then use the Taylor series to estimate e^((y-y0) * log(x0)) to our desired precision.
For the first term we use the naive method of raising x0 to an integer and then Newton's method to take a root, to give x0^y0 to our desired precision.
Then multiply them together, and we have an answer. (If you chose the "tighter errors, nicer answer", then now you'd do a continued fraction on that answer to pick a better rational to return.)

Most efficient algorithm to compute a common numerator of a sum of fractions

I'm pretty sure that this is the right site for this question, but feel free to move it to some other stackexchange site if it fits there better.
Suppose you have a sum of fractions a1/d1 + a2/d2 + … + an/dn. You want to compute a common numerator and denominator, i.e., rewrite it as p/q. We have the formula
p = a1*d2*…*dn + d1*a2*d3*…*dn + … + d1*d2*…*d(n-1)*an
q = d1*d2*…*dn.
What is the most efficient way to compute these things, in particular, p? You can see that if you compute it naïvely, i.e., using the formula I gave above, you compute a lot of redundant things. For example, you will compute d1*d2 n-1 times.
My first thought was to iteratively compute d1*d2, d1*d2*d3, … and dn*d(n-1), dn*d(n-1)*d(n-2), … but even this is inefficient, because you will end up computing multiplications in the "middle" twice (e.g., if n is large enough, you will compute d3*d4 twice).
I'm sure this problem could be expressed somehow using maybe some graph theory or combinatorics, but I haven't studied enough of that stuff to have a good feel for it.
And one note: I don't care about cancelation, just the most efficient way to multiply things.
UPDATE:
I should have known that people on stackoverflow would be assuming that these were numbers, but I've been so used to my use case that I forgot to mention this.
We cannot just "divide" out an from each term. The use case here is a symbolic system. Actually, I am trying to fix a function called .as_numer_denom() in the SymPy computer algebra system which presently computes this the naïve way. See the corresponding SymPy issue.
Dividing out things has some problems, which I would like to avoid. First, there is no guarantee that things will cancel. This is because mathematically, (a*b)**n != a**n*b**n in general (if a and b are positive it holds, but e.g., if a == b == -1 and n == 1/2, you get (a*b)**n == 1**(1/2) == 1 but (-1)**(1/2)*(-1)**(1/2) == I*I == -1). So I don't think it's a good idea to assume that dividing by an will cancel it in the expression (this may actually be unfounded, I'd need to check what the code does).
Second, I'd like to also apply this algorithm to computing the sum of rational functions. In this case, the terms would automatically be multiplied together into a single polynomial, and "dividing" out each an would involve applying the polynomial division algorithm. You can see in this case, you really do want to compute the most efficient multiplication in the first place.
UPDATE 2:
I think my fears for cancelation of symbolic terms may be unfounded. SymPy does not cancel things like x**n*x**(m - n) automatically, but I think that any exponents that would combine through multiplication would also combine through division, so powers should be canceling.
There is an issue with constants automatically distributing across additions, like:
In [13]: 2*(x + y)*z*(S(1)/2)
Out[13]:
z⋅(2⋅x + 2⋅y)
─────────────
2
But this is first a bug and second could never be a problem (I think) because 1/2 would be split into 1 and 2 by the algorithm that gets the numerator and denominator of each term.
Nonetheless, I still want to know how to do this without "dividing out" di from each term, so that I can have an efficient algorithm for summing rational functions.
Instead of adding up n quotients in one go I would use pairwise addition of quotients.
If things cancel out in partial sums then the numbers or polynomials stay smaller, which makes computation faster.
You avoid the problem of computing the same product multiple times.
You could try to order the additions in a certain way, to make canceling more likely (maybe add quotients with small denominators first?), but I don't know if this would be worthwhile.
If you start from scratch this is simpler to implement, though I'm not sure it fits as a replacement of the problematic routine in SymPy.
Edit: To make it more explicit, I propose to compute a1/d1 + a2/d2 + … + an/dn as (…(a1/d1 + a2/d2) + … ) + an/dn.
Compute two new arrays:
The first contains partial products from the left: l[0] = 1, l[i] = l[i-1] * d[i] for i = 1..n
The second contains partial products from the right: r[n+1] = 1, r[i] = d[i] * r[i+1] for i = n..1
In both cases, 1 is the multiplicative identity of whatever ring you are working in.
Then each of your terms on the top is t[i] = l[i-1] * a[i] * r[i+1]
This assumes multiplication is associative, but it need not be commutative.
As a first optimization, you don't actually have to create r as an array: you can do a first pass to calculate all the l values, and accumulate the r values during a second (backward) pass to calculate the summands. No need to actually store the r values since you use each one once, in order.
In your question you say that this computes d3*d4 twice, but it doesn't. It does multiply two different values by d4 (one a right-multiplication and the other a left-multiplication), but that's not exactly a repeated operation. Anyway, the total number of multiplications is about 4*n, vs. 2*n multiplications and n divisions for the other approach that doesn't work in non-commutative multiplication or non-field rings.
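The two-pass scheme (left products stored, right products accumulated on a backward pass) might look like this in Python. The function name numerator_and_denominator, the 0-based lists, and the use of the integer 1 as the identity are assumptions for illustration; any objects supporting * and + should work the same way, and the order of the factors is preserved for non-commutative multiplication.

```python
def numerator_and_denominator(a, d):
    """Return (p, q) with p/q = sum(a[i]/d[i]), using no division."""
    n = len(d)
    # left[i] is the product d[0] * ... * d[i-1]; left[0] is the identity.
    left = [1] * (n + 1)
    for i in range(n):
        left[i + 1] = left[i] * d[i]

    right = 1      # product d[i+1] * ... * d[n-1], built on the way back
    p = None
    for i in range(n - 1, -1, -1):
        term = left[i] * a[i] * right
        p = term if p is None else term + p
        right = d[i] * right
    return p, left[n]


print(numerator_and_denominator([1, 1, 1], [2, 3, 6]))   # (36, 36)
```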
If you want to compute p in the above expression, one way to do this would be to multiply together all of the denominators (in O(n), where n is the number of fractions), letting this value be D. Then, iterate across all of the fractions and for each fraction with numerator ai and denominator di, compute ai * D / di. This last term is equal to the product of the numerator of the fraction and all of the denominators other than its own. Each of these terms can be computed in O(1) time (assuming you're using hardware multiplication, otherwise it might take longer), and you can sum them all up in O(n) time.
This gives an O(n)-time algorithm for computing the numerator and denominator of the new fraction.
It was also pointed out to me that you could manually sift out common denominators and combine those trivially without multiplication.

finding smallest scale factor to get each number within one tenth of a whole number from a set of doubles

Suppose we have a set of doubles s, something like this:
1.11, 1.60, 5.30, 4.10, 4.05, 4.90, 4.89
We now want to find the smallest positive integer scale factor x such that every element of s multiplied by x is within one tenth of a whole number.
Sorry if this isn't very clear—please ask for clarification if needed.
Please limit answers to C-style languages or algorithmic pseudo-code.
Thanks!
You're looking for something called simultaneous Diophantine approximation. The usual statement is that you're given real numbers a_1, ..., a_n and a positive real epsilon and you want to find integers P_1, ..., P_n and Q so that |Q*a_j - P_j| < epsilon, hopefully with Q as small as possible.
This is a very well-studied problem with known algorithms. However, you should know that it is NP-hard to find the best approximation with Q < q where q is another part of the specification. To the best of my understanding, this is not relevant to your problem because you have a fixed epsilon and want the smallest Q, not the other way around.
One algorithm for the problem is (Lenstra–Lenstra)–Lovász's lattice reduction algorithm. I wonder if I can find any good references for you. These class notes mention the problem and algorithm, but probably aren't of direct help. Wikipedia has a fairly detailed page on the algorithm, including a fairly large list of implementations.
To answer Vlad's modified question (if you want exact whole numbers after multiplication), the answer is known. If your numbers are rationals a1/b1, a2/b2, ..., aN/bN, with fractions reduced (ai and bi relatively prime), then the number you need to multiply by is the least common multiple of b1, ..., bN.
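For this exact variant, a small Python sketch could read the inputs as decimal strings (so that 1.60 is exactly 8/5 rather than a binary approximation) and take the least common multiple of the reduced denominators. The name exact_scale is an assumption for illustration.

```python
from fractions import Fraction
from math import gcd


def exact_scale(values):
    """Smallest positive integer that makes every value a whole number."""
    scale = 1
    for v in values:
        d = Fraction(v).denominator          # Fraction reduces automatically
        scale = scale * d // gcd(scale, d)   # running least common multiple
    return scale


print(exact_scale(["1.11", "1.60", "5.30", "4.10", "4.05", "4.90", "4.89"]))   # 100
```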
This is not a full answer, but some suggestions:
Note: I'm using "s" for the scale factor, and "x" for the doubles.
First of all, ask yourself if brute force doesn't work. E.g. try s = 1, then s = 2, then s = 3, and so forth.
We have a list of numbers x[i], and a tolerance t = 1/10. We want to find the smallest positive integer s, such that for each x[i], there is an integer q[i] such that |s * x[i] - q[i]| < t.
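A direct brute-force check of this condition can be sketched as follows. The name smallest_scale and the limit cap are assumptions for illustration. Because the inputs are binary floating point, products that land almost exactly on the tolerance boundary can fall on either side of it.

```python
def smallest_scale(xs, tolerance=0.1, limit=1_000_000):
    """Smallest s in 1..limit with every s*x within `tolerance` of an integer.

    Returns None if no such s is found below the (arbitrary) limit.
    """
    for s in range(1, limit + 1):
        if all(abs(s * x - round(s * x)) < tolerance for x in xs):
            return s
    return None


print(smallest_scale([0.25, 0.5]))   # 4
```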
First note that if we can produce an ordered list for each x[i], it's simple enough to merge these to find the smallest s that will work for all of them. Secondly note that the answer depends only on the fractional part of x[i].
Rearranging the test above, we have |x - q/s| < t/s. That is, we want to find a "good" rational approximation for x, in the sense that the approximation should be better than t/s. Mathematicians have studied a variant of this where the criterion for "good" is that it has to be better than any with a smaller "s" value, and the best way to find these is through truncations of the continued fraction expansion.
Unfortunately, this isn't quite what you need, since once you get under your tolerance, you don't necessarily need to continue to get increasingly better -- the same tolerance will work. The next obvious thing is to use this to skip to the first number that would work, and do brute force from there. Unfortunately, for any number the largest the first s can be is 5, so that doesn't buy you all that much. However, this method will find you an s that works, just not the smallest one. Can we use this s to find a smaller one, if it exists? I don't know, but it'll set an upper limit for brute forcing.
Also, if you need the tolerance for each x to be < t, then this means the tolerance for the product of all x must be < t^n. This might let you skip forward a great deal, and set a reasonable lower limit for brute forcing.

Mean and std dev. of large set of data using recursion

Let's say I have a large set of data.
Then I can divide it into two, find the mean of each half, and calculate the mean of those two values.
a) Is this the mean of the original big quantity ?
b) Can I do this sort of method for calculating standard deviation ??
a) only if the sets you divide into are always the same size, meaning that the original set size must be a power of 2.
For example, the mean of {6} is 6, and the mean of {3,6} is 4.5, but the mean of {3,6,6} is not 5.25, it's 5.
Certainly you could recursively divide into parts to calculate the sum, though, and divide by the total size at the end. Not sure if that does you any good.
b) no
For example, the s.d of {2} is 0, and the s.d. of {1} is 0, but the s.d of {1,2} is not 0.
Once you've calculated the mean of the whole set, you can recursively divide to calculate the sum square deviation from the mean, and as with the mean calculation, divide by the total size and take square root at the end. [Edit: in fact all you need to calculate s.d is the sumsquare, the sum, and the count. Forgot about that. So you don't have to calculate the mean first]
That is incorrect, but you can express the mean and standard deviation of a set in terms of the means, standard deviations, and sizes of the sets into which it is divided.
Specifically, if m_x, s_x and n_x are the means, standard deviations, and sizes of x, and X is partitioned into many x's, then
n_X = sum_x(n_x)
m_X = sum_x(n_x m_x)/n_X
s_X^2 = sum_x(n_x(s_x^2 + m_x^2))/n_X - m_X^2
assuming the standard deviation is of the form sqrt(sum((x - mean(x))^2)/n); if it is the sample unbiased estimator, just adjust the weights accordingly.
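As a sanity check on combining per-part statistics, here is a minimal Python sketch for the population (divide-by-n) standard deviation. It relies on the identity that each part's mean square equals s_x^2 + m_x^2. The name combine_stats and the (n, mean, std) tuple layout are assumptions for illustration.

```python
from math import sqrt


def combine_stats(parts):
    """Merge (n, mean, population_std) triples into one triple."""
    n_total = sum(n for n, _, _ in parts)
    mean_total = sum(n * m for n, m, _ in parts) / n_total
    # s^2 + m^2 is the mean of the squares within a part.
    mean_sq = sum(n * (s * s + m * m) for n, m, s in parts) / n_total
    variance = max(mean_sq - mean_total ** 2, 0.0)   # guard tiny negatives
    return n_total, mean_total, sqrt(variance)


# {1, 2, 3} has mean 2 and variance 2/3; {4, ..., 8} has mean 6 and variance 2.
n, mean, std = combine_stats([(3, 2.0, sqrt(2 / 3)), (5, 6.0, sqrt(2.0))])
print(n, mean, std)
```

The parts do not need to be equal in size, and the function can be applied recursively to already-merged triples.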
Sure you can. No need for equal sets, power of two. Pseudo code:
N1,mean1,s1;
N2,mean2,s2;
N12,mean12,s12;
N12 = N1+N2;
mean12 = ((mean1*N1) + (mean2*N2)) / N12;
s12 = sqrt( (s1*s1*N1 + s2*s2*N2) / N12 + N1*N2/(N12*N12)*(mean1-mean2)*(mean1-mean2) );
http://en.wikipedia.org/wiki/Weighted_mean
http://en.wikipedia.org/wiki/Standard_deviation#Combining_standard_deviations
On (a) - it's only precisely correct if you precisely divided the set into two. If there were an odd number of items, for instance, there is a slight weighting toward the smaller "half". The larger the set, the less significant the problem. However, the problem recurs for the smaller sets as you subdivide. You get very large error when dividing a set of three items into a single item and a pair - each item in the pair is only half as significant to the final result as the single item.
I don't see the gain, though. You still do as many additions. You even end up doing more divisions. More importantly, you access memory in a non-sequential order, leading to poor cache performance.
The usual approach for a mean and standard deviation is to first calculate the sum of all items, and the sum of the squares - both in the same loop. Old calculators used to handle this with running totals, also keeping count of the number of items as they went. At the end, those three values (n, sum-of-x and sum-of-x-squared) are all you need - the rest is just substitution into the standard formulae for the mean and standard deviation.
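The single-loop approach with three running totals could be sketched like this in Python. The name mean_and_std is hypothetical, and the population form of the standard deviation is assumed. For data with a large mean relative to its spread, subtracting two large sums loses precision, so this is a sketch of the textbook method rather than a numerically robust one.

```python
from math import sqrt


def mean_and_std(data):
    """Mean and population standard deviation from n, sum(x), sum(x*x)."""
    n = 0
    total = 0.0
    total_sq = 0.0
    for x in data:              # one sequential pass over the data
        n += 1
        total += x
        total_sq += x * x
    mean = total / n
    variance = max(total_sq / n - mean * mean, 0.0)
    return mean, sqrt(variance)


print(mean_and_std([2, 4, 4, 4, 5, 5, 7, 9]))   # (5.0, 2.0)
```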
EDIT
If you're dead set on using recursion for this, look up "tail recursion". Mathematically, tail recursion and iteration are equivalent - different representations of the same thing. In implementation terms tail recursion might cause a stack overflow where iteration would work, but (1) some languages guarantee this will not happen (e.g. Scheme, Haskell), and (2) many compilers will handle this as an optimisation anyway (e.g. GCC for C or C++).

Programming problem - Game of Blocks

maybe you would have an idea on how to solve the following problem.
John decided to buy his son Johnny some mathematical toys. One of his favorite toys is blocks of different colors. John has decided to buy blocks of C different colors. For each color he will buy googol (10^100) blocks. All blocks of the same color are of the same length. But blocks of different colors may vary in length.
Johnny has decided to use these blocks to make a large 1 x n block. He wonders how many ways he can do this. Two ways are considered different if there is a position where the color differs. The example shows a red block of size 5, a blue block of size 3 and a green block of size 3. It shows there are 12 ways of making a large block of length 11.
Each test case starts with an integer 1 ≤ C ≤ 100. The next line consists of C integers. The ith integer 1 ≤ leni ≤ 750 denotes the length of the ith color. The next line is a positive integer N ≤ 10^15.
This problem should be solved in 20 seconds for T <= 25 test cases. The answer should be calculated MOD 100000007 (prime number).
It can be reduced to a matrix exponentiation problem, which can be solved relatively efficiently in O(max(leni)^2.376*log(N)) using the Coppersmith-Winograd algorithm and fast exponentiation. But it seems that a more efficient algorithm is required, as Coppersmith-Winograd implies a large constant factor. Do you have any other ideas? It can possibly be a Number Theory or Divide and Conquer problem.
Firstly note the number of blocks of each colour you have is a complete red herring, since 10^100 > N always. So the number of blocks of each colour is practically infinite.
Now notice that at each position p (if there is a valid configuration that leaves no spaces, etc.) there must be a block of some color c. There are len[c] ways for this block to lie so that it still covers position p.
My idea is to try all possible colors and offsets at a fixed position (N/2, since it halves the range); then for each case there are b cells before this fixed coloured block and a cells after it. So if we define a function ways(i) that returns the number of ways to tile i cells (with ways(0)=1), then the number of ways to tile the cells with a fixed coloured block at that position is ways(b)*ways(a). Adding up all possible configurations yields the answer for ways(i).
Now I chose the fixed position to be N/2 since that halves the range and you can halve a range at most ceil(log(N)) times. Now since you are moving a block about N/2 you will have to calculate from N/2-750 to N/2+750, where 750 is the max length a block can have. So you will have to calculate about 750*ceil(log(N)) (a bit more because of the variance) lengths to get the final answer.
So in order to get good performance you have to throw in memoisation, since this is inherently a recursive algorithm.
So using Python(since I was lazy and didn't want to write a big number class):
    T = int(input())
    for case in range(T):
        # read in the data
        C = int(input())
        lengths = list(map(int, input().split()))
        minlength = min(lengths)
        n = int(input())
        # set up memoisation; note all lengths less than the minimum length
        # are set to 0 as the algorithm needs this
        memoise = {}
        memoise[0] = 1
        for length in range(1, minlength):
            memoise[length] = 0

        def solve(n):
            global memoise
            if n in memoise:
                return memoise[n]
            ans = 0
            for i in range(C):
                if lengths[i] > n:
                    continue
                if lengths[i] == n:
                    ans += 1
                    ans %= 100000007
                    continue
                # slide block i across the fixed middle cell
                for j in range(0, lengths[i]):
                    b = n//2 - lengths[i] + j   # cells before the block
                    a = n - (n//2 + j)          # cells after the block
                    if b < 0 or a < 0:
                        continue
                    ans += solve(b)*solve(a)
                    ans %= 100000007
            memoise[n] = ans
            return memoise[n]

        solve(n)
        print("Case %d: %d" % (case+1, memoise[n]))
Note I haven't exhaustively tested this, but I'm quite sure it will meet the 20 second time limit, if you translated this algorithm to C++ or somesuch.
EDIT: Running a test with N = 10^15 and a block with length 750, I get that memoise contains about 60000 elements, which means the non-lookup part of solve(n) is called about the same number of times.
A word of caution: In the case c=2, len1=1, len2=2, the answer will be the N'th Fibonacci number, and the Fibonacci numbers grow (approximately) exponentially with a growth factor of the golden ratio, phi ~ 1.61803399. For the huge value N=10^15, the answer will be about phi^(10^15), an enormous number. The answer will have storage requirements on the order of (ln(phi^(10^15))/ln(2)) / (8 * 2^40) ~ 79 terabytes. Since you can't even access 79 terabytes in 20 seconds, it's unlikely you can meet the speed requirements in this special case.
Your best hope occurs when C is not too large, and leni is large for all i. In such cases, the answer will still grow exponentially with N, but the growth factor may be much smaller.
I recommend that you first construct the integer matrix M which will compute the (i+1,..., i+k) terms in your sequence based on the (i, ..., i+k-1) terms. (only row k+1 of this matrix is interesting). Compute the first k entries "by hand", then calculate M^(10^15) based on the repeated squaring trick, and apply it to terms (0...k-1).
The (integer) entries of the matrix will grow exponentially, perhaps too fast to handle. If this is the case, do the very same calculation, but modulo p, for several moderate-sized prime numbers p. This will allow you to obtain your answer modulo p, for various p, without using a matrix of bigints. After using enough primes so that you know their product is larger than your answer, you can use the so-called "Chinese remainder theorem" to recover your answer from your mod-p answers.
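Since the problem statement only asks for the answer modulo 100000007, a single modulus is enough, and the repeated-squaring idea can be sketched in Python as below. The names mat_mul, mat_pow and count_tilings are assumptions for illustration. This plain cubic multiplication is only practical for small maximum block lengths; it shows the recurrence and would not meet the stated time limit at length 750.

```python
MOD = 100000007


def mat_mul(A, B):
    """Multiply two square matrices modulo MOD."""
    size = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(size)) % MOD
             for j in range(size)] for i in range(size)]


def mat_pow(M, e):
    """Raise M to the power e by repeated squaring, modulo MOD."""
    size = len(M)
    result = [[int(i == j) for j in range(size)] for i in range(size)]
    while e:
        if e & 1:
            result = mat_mul(result, M)
        M = mat_mul(M, M)
        e >>= 1
    return result


def count_tilings(lengths, n):
    """Number of ways to tile a 1 x n strip, modulo MOD."""
    k = max(lengths)
    # First k values "by hand": ways[i] = sum of ways[i - L] over colors.
    ways = [0] * k
    ways[0] = 1
    for i in range(1, k):
        ways[i] = sum(ways[i - L] for L in lengths if L <= i) % MOD
    # M maps (w[i], ..., w[i+k-1]) to (w[i+1], ..., w[i+k]).
    M = [[0] * k for _ in range(k)]
    for r in range(k - 1):
        M[r][r + 1] = 1
    for L in lengths:
        M[k - 1][k - L] += 1       # w[i+k] += w[i+k-L], once per color
    P = mat_pow(M, n)
    # Row 0 of M^n applied to the initial terms gives w[n].
    return sum(P[0][j] * ways[j] for j in range(k)) % MOD


print(count_tilings([5, 3, 3], 11))   # 12, matching the example in the question
```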
I'd like to build on the earlier @JPvdMerwe solution with some improvements. In his answer, @JPvdMerwe uses a Dynamic Programming / memoisation approach, which I agree is the way to go on this problem. Dividing the problem recursively into two smaller problems and remembering previously computed results is quite efficient.
I'd like to suggest several improvements that would speed things up even further:
Instead of going over all the ways the block in the middle can be positioned, you only need to go over the first half, and multiply the solution by 2. This is because the second half of the cases are symmetrical. For odd-length blocks you would still need to take the centered position as a separate case.
In general, iterative implementations can be much faster than recursive ones. This is because a recursive implementation incurs bookkeeping overhead for each function call. It can be a challenge to convert a solution to its iterative cousin, but it is usually possible. The @JPvdMerwe solution can be made iterative by using a stack to store intermediate values.
Modulo operations are expensive, as are multiplications to a lesser extent. The number of multiplications and modulos can be decreased by approximately a factor C=100 by switching the color-loop with the position-loop. This allows you to add the return values of several calls to solve() before doing a multiplication and modulo.
A good way to test the performance of a solution is with a pathological case. The following could be especially daunting: length 10^15, C=100, prime block sizes.
Hope this helps.
In the above answer
ans += 1
ans %= 100000007
could be much faster without general modulo :
ans += 1
if ans == 100000007 then ans = 0
Please see TopCoder thread for a solution. No one was close enough to find the answer in this thread.
