Algorithm for Project Euler Problem 99

From http://projecteuler.net/index.php?section=problems&id=99
Comparing two numbers written in index form like 2^11 and 3^7 is not difficult, as any calculator would confirm that 2^11 = 2048 < 3^7 = 2187.
However, confirming that 632382^518061 > 519432^525806 would be much more difficult, as both numbers contain over three million digits.
Using base_exp.txt (right click and 'Save Link/Target As...'), a 22K text file containing one thousand lines with a base/exponent pair on each line, determine which line number has the greatest numerical value.
How might I approach this?

Not a full solution, but some ideas. You can use the following formula:
log(a^x) = x * log(a)
The log10 can easily be estimated as the number of digits. The log2 can easily be estimated by counting right shifts.
Based on the above you could narrow the list significantly. For the remaining numbers you would have to do full calculations. Are math functions allowed in Project Euler? If yes, it would be better to use logarithms.

Since the logarithm is a monotonic function, instead of a^x you could compare x * log(a) to find the maximum. You might need to take numerical precision into account, though.

One possible approach would be to use logarithm identities (that is, a^b is identical to e^(b * ln a)). As a matter of fact, a^b is identical to base^(b * log_base(a)) for all bases except 0 and 1.

Comparing exponent * log(base) instead of base^exponent for each line in the file works for this problem without taking precision into account. This is certainly the best solution from a mathematical perspective, but I just wanted to point out that this isn't really necessary.
Another possible solution is to divide every number in the file by some constant (say 100,000) before performing the exponentiation on the now smaller numbers. Since you're comparing all of the values to each other, scaling them all down by a constant factor doesn't affect the final outcome.
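For concreteness, here is a minimal sketch of the logarithm comparison in Python (assuming base_exp.txt holds one comma-separated base,exponent pair per line, as in the linked file):

import math

best_line, best_value = 0, float("-inf")
with open("base_exp.txt") as f:
    for line_number, line in enumerate(f, start=1):
        base, exponent = map(int, line.split(","))
        value = exponent * math.log(base)     # compare x * log(a) instead of a^x
        if value > best_value:
            best_line, best_value = line_number, value
print(best_line)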

Related

Is there a better way to generate all equal arithmetic sequences using numbers 1 to 10?

Problem:
The numbers from 1 to 10 are given. Put the equals sign (somewhere between
them) and any of the arithmetic operators {+ - * /} so that a true integer
equality is obtained (both the final result and the partial results must be
integers).
Example:
1*2*3*4*5/6+7=8+9+10
1*2*3*4*5/6+7-8=9+10
My first idea to resolve this was using backtracking:
Generate all possibilities of putting operators between the numbers
For one such possibility replace all the operators, one by one, with the equal sign and check if we have two equal results
But this solution takes a lot of time.
So, my question is: is there a faster solution, maybe something that uses the operator properties or some other cool math trick?
I'd start with the equals sign. Pick a possible location for that, and split your sequence there. For left and right side independently, find all possible results you could get for each, and store them in a dict. Then match them up later on.
Finding all 226 solutions took my Python program, based on this approach, less than 0.15 seconds. So there certainly is no need to optimize further, is there? Along the way, I computed a total of 20683 subexpressions for a single side of one equation. They are fairly well balanced: 10327 expressions for left hand sides and 10356 expressions for right hand sides.
If you want to be a bit more clever, you can try to reduce the places where you even attempt division. To allow for division without remainder, the prime factors of the divisor must be contained in those of the dividend. So the dividend must be some product, and that product must contain the factors of the number by which you divide. 2, 3, 5 and 7 are prime numbers, so they can never be such divisors. 4 will never have two even numbers before it. So the only possible ways are 2*3*4*5/6, 4*5*6*7/8 and 3*4*5*6*7*8/9. But I'd say it's far easier to check whether a given division is possible as you go, without any need for cleverness.
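Here is a rough sketch of that split-and-match approach in Python (my own illustration, not the answerer's program; it assumes strict left-to-right evaluation with every partial result required to be an integer, and it checks division validity on the fly as suggested):

from collections import defaultdict

def side_values(nums):
    # Map every reachable value to the expressions that produce it,
    # evaluating strictly left to right and keeping all partial results integer.
    results = defaultdict(list)

    def extend(acc, expr, rest):
        if not rest:
            results[acc].append(expr)
            return
        n, tail = rest[0], rest[1:]
        extend(acc + n, expr + "+" + str(n), tail)
        extend(acc - n, expr + "-" + str(n), tail)
        extend(acc * n, expr + "*" + str(n), tail)
        if acc % n == 0:                          # only divide when it is exact
            extend(acc // n, expr + "/" + str(n), tail)

    extend(nums[0], str(nums[0]), nums[1:])
    return results

numbers = list(range(1, 11))
solutions = []
for split in range(1, len(numbers)):              # '=' goes after position `split`
    left = side_values(numbers[:split])
    right = side_values(numbers[split:])
    for value, left_exprs in left.items():
        for le in left_exprs:
            for re in right.get(value, []):
                solutions.append(le + "=" + re)
print(len(solutions))

Whether this reproduces the 226 solutions quoted above depends on the evaluation convention; with ordinary operator precedence instead of strict left-to-right evaluation the count may differ.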

Computing the square root of 1000+ bit word in C

Imagine that we have e.g. a 1000-bit word in our memory. I'm wondering if there is any way to calculate its square root (not necessarily accurate, let's say without the fractional part). Or we've only got the memory location, with the size specified separately later.
I assume that our large number is one array (most significant bits at the beginning?). The square root has roughly half as many bits as the original number. When trying to use the digit-by-digit algorithm, there is a point where unsigned long long is not enough to hold the partial result (the subtraction with the "01"-extended number). How do I solve that? And what about getting a single digit of the large number: only by bitmask?
While thinking about the pseudocode I got stuck at these questions. Any ideas?
How would you do it by hand? How would you divide a 1000-digit number by a 500-digit number by hand? (Just think about the method; obviously it would be quite time consuming.) Now with a square root, the method is very similar to division, where you "guess" the first digit, then the second digit and so on, and subtract things. It's just that for a square root you subtract slightly different things (but not that different: calculating a square root can be done in a way very similar to a division, except that with each digit added, the divisor changes).
I wouldn't want to tell you exactly how to do it, because that spoils the whole fun of discovering it yourself.
The trick is: Instead of thinking about the square root of x, think about finding a number y such that y*y = x. And as you improve y, recalculate x - y*y with the minimum effort.
Calculating square roots is very easily done with a binary search algorithm.
A pseudo-code algorithm:
Take a guess c: the 1000 bit value divided by 2 (simple bitshift).
Square it:
If the square (almost) equals your 1000 bit number you've got your answer
If the square is smaller than your number, you can be sure the root is between c and your upper bound.
If the square is larger than your number, you know that the root lies between your lower bound and c.
Repeat until you have found your root, while keeping track of your upper and lower bound.
This kind of algorithm should finish in O(log N) iterations, i.e. about as many steps as the number has bits.
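A minimal sketch of that binary search in Python, whose built-in integers already handle 1000-bit values (in C you would need a bignum library or your own multi-word arithmetic):

def isqrt_binary_search(n):
    # Integer square root of a non-negative integer via binary search.
    if n < 2:
        return n
    lo, hi = 1, n
    while lo <= hi:
        mid = (lo + hi) >> 1
        sq = mid * mid
        if sq == n:
            return mid
        if sq < n:
            lo = mid + 1      # root lies above mid
        else:
            hi = mid - 1      # root lies below mid
    return hi                 # hi is now floor(sqrt(n))

print(isqrt_binary_search(1 << 1000))   # prints 2**500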
Kind of depends on how accurate you want it. Consider that the square root of 2^32 == 2^16. So one thing you could do is shift the 1000-bit number 500 bits to the right, and you have an answer that would be in the ballpark.
How well does this work? Let's see. The number 36 in binary is 100100. If I shift that to the right 3 bits, then I get 4. Hmmm ... should be 6. Pretty big error of 33%. The square root of 1,000,000 is 1,000. In binary, 1,000,000 is 1111 0100 0010 0100 0000. That's 20 bits. Shifted right 10 bits, it's 1111 0100 00, or 976. The error is 24/1000, or 2.4%.
When you get to a 1,000 bit number, the absolute error might be large, but the estimate stays within a small constant factor (roughly the square root of 2) of the true root, which is often good enough as a starting point.
Depending on how you're storing the numbers, shifting a 1,000 bit number 500 bits to the right shouldn't be terribly difficult.
Newton's method is probably the way to go. At some point with Newton's method you're going to have to perform a division (in particular, when finding the next point to test), but it might be okay to approximate this to the nearest power of two and just do a bitshift instead.
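Here is a sketch of the integer Newton iteration in Python (again leaning on Python's big integers; the power-of-two starting guess is obtained with shifts only, and the exact integer division in the Newton step is the part you could approximate with shifts in C, as suggested above):

def isqrt_newton(n):
    # Floor square root via Newton's (Heron's) iteration on integers.
    if n < 2:
        return n
    # Start from a power of two that is at least sqrt(n), using shifts only.
    x = 1 << ((n.bit_length() + 1) // 2)
    while True:
        y = (x + n // x) // 2     # Newton step: average of x and n/x
        if y >= x:                # no further improvement: x is floor(sqrt(n))
            return x
        x = y

print(isqrt_newton(1 << 1000))    # prints 2**500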

How to quickly determine if two sets of checksums are equal, with the same "strength" as the individual checksums

Say you have two unordered sets of checksums, one of size N and one of size M. Depending on the algorithm to compare them, you may not even know the sizes but can compare N != M for a quick abort if you do.
The hashing function used for a checksum has some chance of collision, which as a layman I'm foolishly referring to as "strength". Is there a way to take two sets of checksums, all made from the same hashing function, and quickly compare them (so comparing element to element is right out) with the same basic chance of collision between two sets as there is between two individual checksums?
For instance, one method would be to compute a "set checksum" by XORing all of the checksums in the set. This new single hash is used for comparing with other sets' hashes, meaning storage proportional to the set size is no longer necessary. Especially since it can be updated for the addition/removal of an element checksum by XORing it with the set's checksum, without having to recompute the whole thing. But does that reduce the "strength" of the set's checksum compared to a brute-force comparison of all the original ones? Is there a way to conglomerate the checksums of a set that doesn't reduce the "strength" (as much?) but is still less complex than a straight comparison of the set elements' checksums?
After my initial comment, I got to thinking about the math behind it. Here's what I came up with. I'm no expert so feel free to jump in with corrections. Note: This all assumes your hash function is uniformly distributed, as it should be.
Basically, the more bits in your checksum, the lower the chance of collision. The more files, the higher.
First, let's find the odds of a collision with a single pair of files XOR'd together. We'll work with small numbers at first, so let's assume our checksum is 4 bits(0-15), and we'll call it n.
With two sums, the total number of bits is 2n (8), so there are 2^(2n) (256) possibilities in total. However, we're only interested in the collisions. To collide an XOR, you need to flip the same bits in both sums. There are only 2^n (16) ways to do that, since we're using n bits.
So, the overall probability of a collision is 16/256, which is (2^n) / (2^(2n)), or simply 1/(2^n). That means the probability of a non-collision is 1 - (1/(2^n)). So, for our sample n, that means it's only 15/16 secure, or 93.75%. Of course, for bigger checksums it's better. Even for a puny n=16, you get 99.998%.
That's for a single comparison, of course. Since you're rolling them all together, you're doing f-1 comparisons, where f is the number of files. To get the total odds of a collision that way, you take the f-1 power of the odds we got in the first step.
So, for ten files with a 4-bit checksum, we get pretty terrible results:
(15/16) ^ 9 = 55.92% chance of non-collision
This rapidly gets better as we add bits, even when we increase the number of files.
For 10 files with a 8-bit checksum:
(255/256) ^ 9 = 96.54%
For 100/1000 files with 16 bits:
(65535/65536) ^ 99 = 99.85%
(65535/65536) ^ 999 = 98.49%
As you can see, we're still working with small checksums. If you're using anything >= 32 bits, my calculator gets off into floating-point rounding errors when I try to do the math on it.
TL,DR:
Where n is the number of checksum bits and f is the number of files in each set:
nonCollisionChance = ( ((2^n)-1) / (2^n) ) ^ (f-1)
collisionChance = 1 - ( ((2^n)-1) / (2^n) ) ^ (f-1)
Your method of XOR'ing a bunch of checksums together is probably just fine.
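For reference, the formula above can also be evaluated with exact rational arithmetic, which sidesteps the floating-point rounding problem mentioned for larger checksums (a small sketch, my own code):

from fractions import Fraction

def non_collision_chance(n_bits, n_files):
    # ( (2^n - 1) / 2^n ) ^ (f - 1), computed exactly so large n_bits
    # cause no rounding trouble.
    p = Fraction(2**n_bits - 1, 2**n_bits)
    return p ** (n_files - 1)

print(float(non_collision_chance(4, 10)))     # ~0.559, the 4-bit / 10-file case above
print(float(non_collision_chance(16, 100)))   # ~0.9985
print(float(non_collision_chance(32, 1000)))  # ~0.99999977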

Bin-packing (or knapsack?) problem

I have a collection of 43 to 50 numbers ranging from 0.133 to 0.005 (but mostly on the small side). I would like to find, if possible, all combinations that have a sum between L and R, which are very close together.*
The brute-force method takes 2^43 to 2^50 steps, which isn't feasible. What's a good method to use here?
Edit: The combinations will be used in a calculation and discarded. (If you're writing code, you can assume they're simply output; I'll modify as needed.) The number of combinations will presumably be far too large to hold in memory.
* L = 0.5877866649021190081897311406, R = 0.5918521703507438353981412820.
The basic idea is to convert it to an integer knapsack problem (which is easy).
Choose a small real number e and round numbers in your original problem to ones representable as k*e with integer k. The smaller e, the larger the integers will be (efficiency tradeoff) but the solution of the modified problem will be closer to your original one. An e=d/(4*43) where d is the width of your target interval should be small enough.
If the modified problem has an exact solution summing to the middle (rounded to e) of your target interval, then the original problem has one somewhere within the interval.
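A rough sketch of that discretisation in Python (my own illustration; the names eps, scaled and reachable are mine, and it only checks whether some subset hits the rounded midpoint of the interval rather than enumerating all combinations):

def has_subset_near_middle(values, L, R):
    eps = (R - L) / (4 * len(values))          # grid size suggested above
    scaled = [round(v / eps) for v in values]  # integer knapsack weights
    target = round(((L + R) / 2) / eps)        # middle of the interval, on the grid
    reachable = {0}                            # classic subset-sum DP
    for w in scaled:
        reachable |= {s + w for s in reachable if s + w <= target}
    return target in reachable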
You haven't given us enough information. But it sounds like you are in trouble if you actually want to OUTPUT every possible combination. For example, consistent with what you told us is the case where every number is ~0.027. If that is so, then every collection of half of the elements will satisfy your criterion. But there are 43 choose 21 such sets, which means you would have to output at least 1052049481860 sets (too many to be feasible).
Certainly the running time will be no better than the length of the required output.
Actually, there is a quicker way around this:
(python)
sums_possible = [(0, [])]
# sums_possible is a list of (partial_sum, numbers_that_yield_this_sum) tuples
for number in numbers:
    sums_possible_for_this_number = []
    for total, used in sums_possible:
        sums_possible_for_this_number.append((total + number, used + [number]))
    sums_possible = sums_possible + sums_possible_for_this_number
results = [used for total, used in sums_possible if L <= total <= R]
Also, Aaron is right, so this may or may not be feasible for you

Best algorithm for avoiding loss of precision?

A recent homework assignment I have received asks us to take expressions which could create a loss of precision when performed in the computer, and alter them so that this loss is avoided.
Unfortunately, the directions for doing this haven't been made very clear. From watching various examples being performed, I know that there are certain methods of doing this: using Taylor series, using conjugates if square roots are involved, or finding a common denominator when two fractions are being subtracted.
However, I'm having some trouble noticing exactly when loss of precision is going to occur. So far the only thing I know for certain is that when you subtract two numbers that are nearly equal, loss of precision occurs: the leading digits cancel, so the result is dominated by whatever round-off error was sitting in the low-order digits.
My question is what are some other common situations I should be looking for, and what are considered 'good' methods of approaching them?
For example, here is one problem:
f(x) = tan(x) − sin(x) when x ~ 0
What is the best and worst algorithm for evaluating this out of these three choices:
(a) (1/ cos(x) − 1) sin(x),
(b) (x^3)/2
(c) tan(x)*(sin(x)^2)/(cos(x) + 1).
I understand that when x is close to zero, tan(x) and sin(x) are nearly the same. I don't understand how or why any of these algorithms are better or worse for solving the problem.
Another rule of thumb usually used is this: when adding a long series of numbers, start with the numbers closest to zero and end with the biggest ones.
Explaining why this is good is a bit tricky. When you're adding small numbers to a large number, there is a chance they will be completely discarded, because they are smaller than the lowest digit in the current mantissa of the large number. Take for instance this situation:
a = 1,000,000;
repeat 100,000,000 times:
    a += 0.01;
If 0.01 is smaller than the lowest mantissa digit, then the loop does nothing and the end result is a == 1,000,000.
But if you do it like this:
a = 0;
repeat 100,000,000 times:
    a += 0.01;
a += 1,000,000;
then the small numbers slowly accumulate and you're more likely to end up with something close to a == 2,000,000, which is the right answer.
This is of course an extreme example, but I hope you get the idea.
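A scaled-down version of that example, using numpy's 32-bit floats so the effect shows up quickly (a sketch; the exact totals depend on rounding drift):

import numpy as np

a = np.float32(1_000_000)
for _ in range(100_000):
    a += np.float32(0.01)       # 0.01 is below the spacing of float32 values near 1,000,000, so it is dropped
print(a)                        # still 1000000.0

b = np.float32(0)
for _ in range(100_000):
    b += np.float32(0.01)       # the small values accumulate fine on their own
b += np.float32(1_000_000)
print(b)                        # roughly 1001000, close to the true answer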
I had to take a numerics class back when I was an undergrad, and it was thoroughly painful. Anyhow, IEEE 754 is the floating point standard typically implemented by modern CPUs. It's useful to understand the basics of it, as this gives you a lot of intuition about what not to do. The simplified explanation of it is that computers store floating point numbers in something like base-2 scientific notation with a fixed number of digits (bits) for the exponent and for the mantissa. This means that the larger the absolute value of a number, the less precisely it can be represented. For 32-bit floats in IEEE 754, half of the possible bit patterns represent between -1 and 1, even though numbers up to about 10^38 are representable with a 32-bit float. For values larger than 2^24 (approximately 16.7 million) a 32-bit float cannot represent all integers exactly.
What this means for you is that you generally want to avoid the following:
Having intermediate values be large when the final answer is expected to be small.
Adding/subtracting small numbers to/from large numbers. For example, if you wrote something like:
for(float index = 17000000; index < 17000001; index++) {}
This loop would never terminate because 17,000,000 + 1 is rounded back down to 17,000,000 in single precision.
If you had something like:
float foo = 10000000.0f - 10000000.0001f;
The value of foo would be 0, not -0.0001, due to rounding error: 10000000.0001 is not representable in single precision and rounds to 10000000.
My question is what are some other common situations I should be looking for, and what are considered 'good' methods of approaching them?
There are several ways you can have severe or even catastrophic loss of precision.
The most important reason is that floating-point numbers have a limited number of digits; e.g. doubles have 53 bits. That means if you have "useless" digits which are not part of the solution but must be stored, you lose precision.
For example (We are using decimal types for demonstration):
2.598765000000000000000000000100 -
2.598765000000000000000000000099
The interesting part is the 100 - 99 = 1 in the last digits. Since 2.598765 is equal in both cases, it does not change the result, but it wastes 8 digits. Much worse, because the computer doesn't know that those digits are useless, it is forced to store them and crams 21 zeroes in after them, wasting 29 digits in all. Unfortunately there is no way to circumvent this for differences in general, but there are other cases, e.g. exp(x)-1, which is a function occurring very often in physics.
The exp function near 0 is almost linear, but it forces a 1 as the leading digit. So with 12 significant digits:
exp(0.001)-1 = 1.00100050017 - 1 = 1.00050017e-3
If we instead use a function expm1(), based on the Taylor series
1 + x + x^2/2 + x^3/6 + ... - 1 =
x + x^2/2 + x^3/6 + ... =: expm1(x)
we get
expm1(0.001) = 1.00050016667e-3
Much better.
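In code, the same effect (using Python's math module, which provides expm1):

import math

x = 1e-10
print(math.exp(x) - 1)   # 1.000000082740371e-10 -- the trailing digits are cancellation noise
print(math.expm1(x))     # 1.00000000005e-10     -- accurate to full double precision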
The second problem is functions with a very steep slope, like the tangent of x near pi/2. tan(11) has a slope of about 50000, which means that any small deviation caused by earlier rounding errors will be amplified by a factor of 50000! Or you have singularities, e.g. if the result approaches 0/0; then it can take any value.
In both cases you create a substitute function, simplifying the original function. It is of no use to list the different solution approaches here, because without training you will simply not "see" the problem in the first place.
A very good book to learn from and practice with: Forman S. Acton, Real Computing Made Real.
Another thing to avoid is subtracting numbers that are nearly equal, as this can also lead to increased sensitivity to round-off error. For values near 0, cos(x) will be close to 1, so 1/cos(x) - 1 is exactly that kind of subtraction; I would therefore say that (a) should be avoided.
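A quick numerical check of the three candidates in double precision (my own sketch; the exact digits you see will vary, but (a) typically keeps only a couple of correct figures while (b) and (c) agree closely):

import math

x = 1e-7                                                # x close to 0, where tan(x) ~ sin(x)
a = (1 / math.cos(x) - 1) * math.sin(x)                 # (a): subtracts nearly equal quantities
b = x**3 / 2                                            # (b): leading term of the Taylor series
c = math.tan(x) * math.sin(x)**2 / (math.cos(x) + 1)    # (c): avoids the cancellation
print(a, b, c)                                          # true value is about 5.0e-22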
