Project Euler #75: ways to optimize the algorithm

I'm looking for ways to optimize my algorithm for solving Project Euler #75. Two things I have done so far:
Only check even values of L, as it is easy to prove that the perimeter must be even.
Store L values that have been verified to have exactly one way of forming an integer-sided right-angle triangle. Later, when checking a new L value, I look for divisors of L that have already been verified to have this quality. If there are 2 or more such divisors, the value is skipped. E.g. 12, 30 and 40 are stored (24, 36, etc. are not stored because they are just scaled-up versions of 12), so when I see 60 or 120, I can quickly determine that they should be skipped.
However, my algorithm is still not quick enough. Do you have other suggestions or links to relevant articles? Thanks.

http://en.wikipedia.org/wiki/Pythagorean_triple
and
http://en.wikipedia.org/wiki/Formulas_for_generating_Pythagorean_triples
EDIT
I just solved the problem using one of these formulas. If you need an extra hint, just post a comment.
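For reference, the key formula is Euclid's: every primitive triple is (m^2 - n^2, 2mn, m^2 + n^2) for coprime m > n > 0 of opposite parity, with perimeter 2m(m+n), and every other triple is a multiple of a primitive one. A minimal sketch of counting perimeters this way (function and variable names are my own):

```python
from math import gcd, isqrt

def singular_perimeters(limit):
    """Count perimeters L <= limit that are formed by exactly one
    integer-sided right triangle, using Euclid's formula."""
    counts = [0] * (limit + 1)
    # m > n > 0, gcd(m, n) == 1, m - n odd  ->  primitive triple
    # with perimeter p = 2 * m * (m + n)
    for m in range(2, isqrt(limit // 2) + 1):
        for n in range(1, m):
            if (m - n) % 2 == 1 and gcd(m, n) == 1:
                p = 2 * m * (m + n)
                if p > limit:
                    break
                # every multiple k*p is the perimeter of a scaled triple
                for L in range(p, limit + 1, p):
                    counts[L] += 1
    return sum(1 for c in counts if c == 1)
```

Iterating over (m, n) pairs and their multiples fills each perimeter bucket directly, so there is no per-L divisor search at all.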

Related

Need an algorithm approach to calculate meal plan

I’m having trouble solving a deceptively simple problem. My girlfriend and I are trying to formulate weekly meal plans and I had this brilliant idea that I could optimize what we buy in order to maximize the things that we could make from it. The trouble is, the problem is not as easy as it appears. Here’s the problem statement in a nutshell:
The problem:
Given a list of 100 ingredients and a list of 50 dishes that are composed of one or more of the 100 ingredients, find a list of 32 ingredients that can produce the maximum number of dishes.
This problem seems simple, but I'm finding that computing the answer is not trivial. The approach that I've taken is to represent a combination of 32 ingredients as a 100-bit string with 32 of the bits set. Then I check which dishes can be made with that ingredient combination. If the number of dishes is greater than the current maximum, I save off the list. Then I compute the next valid ingredient combination and repeat, repeat, and repeat.
The number of combinations of the 32 ingredients is staggering! The way that I see it, it would take about 300 trillion years to calculate using my method. I’ve optimized the code so that each combination takes a mere 75 microseconds to figure out. Assuming that I can optimize the code, I might be able to reduce the run time to a mere trillion years.
I’m thinking that a completely new approach is in order. I'm currently coding this in XOJO (REALbasic), but I think the real problem is with approach rather than specific implementation. Anybody have an idea for an approach that has a chance of completion during this century?
Thanks,
Ron
mcdowella's branch and bound solution will be a big improvement over exhaustive enumeration, but it might still take a few thousand years. This is the kind of problem that is really best solved by an ILP solver.
Assuming that the set of ingredients for meal i is given by R[i] = { R[i][1], R[i][2], ..., R[i][|R[i]|] }, you can encode the problem as follows:
Create an integer variable x[i] for each ingredient 1 <= i <= 100. Each of these variables should be constrained to the range [0, 1].
Create an integer variable y[i] for each meal 1 <= i <= 50. Each of these variables should be constrained to the range [0, 1].
For each meal i, create |R[i]| additional constraints of the form y[i] <= x[R[i][j]] for 1 <= j <= |R[i]|. These will guarantee that we can only set y[i] to 1 if all of meal i's ingredients have been included.
Add a constraint that the sum of all x[i] must be <= 32.
Finally, the objective function should be the sum of all y[i], and we should be trying to maximise this.
Solving this will produce assignments for all the variables x[i]: 1 means the ingredient should be included, 0 means it should not.
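Collecting the pieces above, the whole program reads (in standard notation, with x_i and y_i for x[i] and y[i]):

```latex
\max \sum_{i=1}^{50} y_i
\quad \text{s.t.} \quad
y_i \le x_{R[i][j]} \;\; (1 \le i \le 50,\ 1 \le j \le |R[i]|),
\qquad \sum_{i=1}^{100} x_i \le 32,
\qquad x_i,\, y_i \in \{0, 1\}.
```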
My feeling is that a commercial ILP solver like CPLEX or Gurobi will probably solve a 150-variable ILP problem like this in milliseconds; even freely available solvers like lp_solve, which as a rule are much slower, should have no problems. In the unlikely case that it seems to be taking forever, you can still solve the LP relaxation, which will be very fast (milliseconds) and will give you (a) an upper bound on the maximum number of meals that can be prepared and (b) "hints" in the variable values: although the x[i] will in general not be exactly 0 or 1, values close to 1 are suggestive of ingredients that should be included, while values close to 0 suggest unhelpful ingredients.
There will be a branch and bound (http://en.wikipedia.org/wiki/Branch_and_bound) solution to this, but it may be too expensive to get the exact answer. ILP, as suggested by j_random_hacker, is probably better: the LP relaxation of that formulation is probably a better heuristic than the relaxation proposed here, and an ILP solver will be heavily optimized.
The basic idea is that you do a recursive depth first search of a tree of partial solutions, extending them one at a time. Once you recurse far enough down to reach a fully populated solution you can start keeping track of the best solution found so far. If I label your ingredients A, B, C, D... a partial solution is a list of ingredients of length <= 32. You start with the zero-length solution, then when you visit a partial solution e.g. ABC you consider ABCD, ABCE, ... and so on, and may visit some of these.
For each partial solution you work out the maximum score that any descendant of that solution could achieve. Getting an accurate idea of this is important. Here is one suggestion: suppose you have a partial solution of length 20. This leaves 12 ingredients to be chosen, so the best you could possibly do is to make every dish that requires no more than 12 ingredients beyond the 20 you have chosen so far. Work out how many such dishes there are; that is one example of an upper bound on the score of any descendant of the partial solution.
Now when you consider extending the partial solution ABC to ABCD or ABCE or ABCF... if you have a best solution found so far you can ignore all extensions that cannot possibly score more than the best solution so far - this means that you don't need to consider all possible combinations of your 32 ingredients.
Once you have worked out which of the possible extensions might contain a new best answer, your recursive search should continue with the most promising of them, because that is the one most likely to yield a better best-so-far solution.
One way to make this fast is to code it cleverly so that recursing up and down means only small changes to the existing data structure which you typically make on the way down and reverse on the way up.
Another way is to cut corners. One obvious way is to stop when you run out of time and take the best solution found so far at that stage. Another is to discard partial solutions more aggressively: if your best score so far is e.g. 100, you could discard partial solutions that couldn't score better than 110. This speeds up the search, and although you might have missed an answer better than 100, you know that whatever you missed could not have scored more than 110.
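Here is a toy-scale sketch of this branch-and-bound search (the tiny instance, the bounding function and all names are illustrative choices of mine; for brevity it always tries "include" before "exclude" rather than ordering children by promise):

```python
def best_ingredients(dishes, n_ingredients, k):
    """Branch and bound: pick k of n_ingredients so as to maximize the
    number of dishes whose ingredient sets are fully covered.
    dishes is a list of frozensets of ingredient indices."""
    best = {"score": -1, "picked": ()}

    def n_makeable(chosen):
        return sum(1 for d in dishes if d <= chosen)

    def bound(chosen, next_idx):
        # Optimistic upper bound: a dish might still be makeable only if
        # all of its ingredients are already chosen or still undecided
        # (index >= next_idx), and no more are missing than we have slots.
        slots = k - len(chosen)
        return sum(1 for d in dishes
                   if len(d - chosen) <= slots
                   and all(i in chosen or i >= next_idx for i in d))

    def dfs(chosen, next_idx):
        if len(chosen) == k or next_idx == n_ingredients:
            score = n_makeable(chosen)
            if score > best["score"]:
                best["score"], best["picked"] = score, tuple(sorted(chosen))
            return
        if bound(chosen, next_idx) <= best["score"]:
            return  # prune: no descendant can beat the incumbent
        dfs(chosen | {next_idx}, next_idx + 1)  # include this ingredient
        dfs(chosen, next_idx + 1)               # exclude it forever

    dfs(frozenset(), 0)
    return best["score"], best["picked"]
```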
Solving some discrete mathematics huh? Well here is the wiki.
You also have not factored in anything about quantity. For example, flour would be used in a lot of fried recipes, but buying 10 pounds of flour might not be great. And cost might be prohibitive for some ingredients that your solution wants. Not to mention that a lot of ingredients are in everything (milk, water, salt, pepper, sugar, things like that).
In reality, optimization to this degree is probably not necessary. But I will not provide relationship advice on SO.
As for a new solution:
I would suggest identifying a lot of what you want to make and with what, and then writing a program to suggest things to make with the rest.
Why not just order the list of ingredients by the number of dishes they are used in?
This would be more like a greedy solution, of course, but it should give you some clues about what ingredients are most often used. From that you can compile a list of dishes that can be cooked already with the top 30 (or whatever) ingredients.
Also you could order the list of remaining (non-cookable) dishes by number of missing ingredients and maybe try to optimize on that to maximize the number of cookable dishes.
To be more "algorithmic", I think a local search is most promising here. Start with a candidate solution (random assignments to the 32 ingredients) and calculate as a fitness function the number of cookable dishes. Then check the neighboring states (switching one ingredient) and move to the state with the highest value. Repeat until a maximum is reached. Do this veeeery often and you should find a good solution. (This would be a simple greedy hill-climbing algorithm)
There are a lot of local search algorithms, you should be able to find more than enough information on the net. Most often you won't find the optimal solution (of course that depends on the problem), but a very good one nonetheless.
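A sketch of the simple hill climber described above, with random restarts bolted on (the names, restart scheme and toy fitness function are my own choices):

```python
import random

def hill_climb(dishes, n_ingredients, k, restarts=20, seed=0):
    """Greedy hill climbing with random restarts: keep swapping one
    chosen ingredient for one unchosen ingredient as long as that
    strictly increases the number of fully-makeable dishes."""
    rng = random.Random(seed)

    def fitness(chosen):
        return sum(1 for d in dishes if d <= chosen)

    best_score, best_set = -1, None
    for _ in range(restarts):
        chosen = set(rng.sample(range(n_ingredients), k))
        improved = True
        while improved:
            improved = False
            for out in list(chosen):
                for inn in set(range(n_ingredients)) - chosen:
                    cand = (chosen - {out}) | {inn}
                    if fitness(cand) > fitness(chosen):
                        chosen, improved = cand, True
                        break
                if improved:
                    break
        score = fitness(chosen)
        if score > best_score:
            best_score, best_set = score, chosen
    return best_score, sorted(best_set)
```

Each restart ends in a local maximum; keeping the best over many restarts is what makes "do this veeeery often" pay off.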

Partition matrix to minimize variance of parts

I have a matrix of real numbers and I would like to find a partition of this matrix such that the both the number of parts and the variance of the numbers in each part are minimized. Intuitively, I want as few parts as possible, but I also want all the numbers within any given part to be close together.
More formally, I suppose for the latter I would find, for each part, the variance of the numbers in that part, and then take the average of those variances over all the parts. This would be one component of the "score" for a given solution. The other component would be, for instance, the total number of elements in the matrix minus the number of parts in the partition, so that fewer parts lead to a higher value. The final score for the solution would be a weighted average of the two components, and the best solution is the one with the highest score.
Obviously a lot of this is heuristic: I need to decide how to balance the number of parts versus the variances. But I'm stuck for even a general approach to the problem.
For instance, given the following simple matrix:
10, 11, 12, 20, 21
8, 13, 9, 22, 23
25, 23, 24, 26, 27
It would be a reasonable solution to partition into the following submatrices:
10, 11, 12 | 20, 21
 8, 13,  9 | 22, 23
-----------+-------
25, 23, 24 | 26, 27
Partitioning is only allowed by slicing vertically and horizontally.
Note that I don't need the optimal solution, I just need an approach to get a "good" solution. Also, these matrices are several hundred by several hundred, so brute forcing it is probably not a reasonable solution, unless someone can propose a good way to pare down the search space.
I think you'd be better off by starting with a simpler problem. Let's call this
Problem A: given a fixed number of vertical and/or horizontal partitions, where should they go to minimize the sum of variances (or perhaps some other measure of variation, such as the sum of ranges within each block).
I'd suggest using a dynamic programming formulation for problem A.
Once you have that under control, then you can deal with
Problem B: find the best trade-off between variation and the number of vertical and horizontal partitions.
Obviously, you can reduce the variance to 0 by putting each element into its own block. In general, problem B requires you to solve problem A for each choice of vertical and horizontal partition counts that is considered.
To use a dynamic programming approach for problem B, you would have to formulate an objective function that encapsulates the trade-off you seek. I'm not sure how feasible this is, so I'd suggest looking for different approaches.
As it stands, problem B is a 2D problem. You might find some success looking at 2D clustering algorithms. An alternative might be possible if it can be reformulated as a 1D problem: trading off variation with the number of blocks (instead of the number of vertical and horizontal partition count). Then you could use something like the Jenks natural breaks classification method to decide where to draw the line(s).
Anyway, this answer clearly doesn't give you a working algorithm. But I hope that it does at least provide an approach (which is all you asked for :)).

Need help creating a square matrix where the number of columns/rows is defined by the user

First of all, I'm a ruby newbie, so I ask that you have patience with me. :)
Second of all, before you read the request and think I'm trying to get easy answers, trust me, I've spent the last 7 days searching for them online, but haven't found any that answered my very specific question. And third, sorry about the long description, but the help I need is to be pointed in the right direction.
I had an idea for a small class project about genetic drift. In population genetics, a probability matrix is used to give the probability that the frequency of an allele will change from i to j in generation t1 to generation t2.
So, say I start with one copy of allele B in t1 and want to know the probability of it going to three copies in t2. The value (as given by the binomial distribution, to which I have already wrote a small code, which works nicely) then would go in the cell corresponding to column 1, row 3 (perhaps this can clarify things better: https://docs.google.com/viewer?url=http%3A%2F%2Fsamples.jbpub.com%2F9780763757373%2F57373_CH04_FINAL.pdf).
What I don't know how to do and would like to get information on is:
how do I make a square matrix where the number of rows/columns is determined by the user (say someone wants to get the probability matrix for a population of 4, which has 8 allele copies, but someone else wants to get the probability matrix for a population of 100, which has 200 allele copies?)
how do I apply the binomial distribution equation to the values of each one of the different column/row combinations (i.e. in the cell corresponding to column 1, row 3, the value would be determined by the binomial equation with variables 1 and 3; and in the cell corresponding to column 4, row 7, the value would be determined by the binomial equation with variables 4 and 7). The number of different combinations of variables (like 1 and 1, 1 and 2, 1 and 3, etc) is determined by the number of columns/rows set by the user.
I'm not asking anyone to give me the code or do my work for me, what I'm asking is for you, seasoned programmers, to point me in the direction of the correct answers, since I've so miserably failed in finding this direction. Should I be looking into arrays instead of matrices? Should I be looking into specific iterators? Which? Does anyone have more specific material I could look into, or could give me tips based on experience with creating matrices? I really want to learn ruby, and learn how to do this, not just get it done.
For generating a matrix, you might wish to take a look at the Matrix class (http://www.ruby-doc.org/stdlib-1.9.3/libdoc/matrix/rdoc/Matrix.html), which has a method Matrix.build(row_size, column_size) that looks like a good fit for your problem. It even takes a block that you can use to generate the values:
require 'matrix'

Matrix.build(5, 5) do |row, col|
  binomial_function(row, col)
end
Obviously you will need to write the binomial function too - seems like you may have done that already?
How to make the rows/columns a user choice depends on how you want end users to run your code. You should probably ask about that in another question; there are some differences in approach between a web site and a command-line script.

Is there an efficient algorithm to generate random points in general position in the plane?

I need to generate n random points in general position in the plane, i.e. no three points can lie on a same line. Points should have coordinates that are integers and lie inside a fixed square m x m. What would be the best algorithm to solve such a problem?
Update: square is aligned with the axes.
Since they're integers within a square, treat them as points in a bitmap. When you add a point after the first, use Bresenham's algorithm to paint all pixels on each of the lines going through the new point and one of the old ones. When you need to add a new point, pick a random location and check whether it's clear; otherwise, try again. Since each pair of pixels defines a new line, and thus excludes up to m-2 other pixels, as the number of points grows you will have several random choices rejected before you find a good one. The advantage of this approach is that you only pay the cost of painting the lines when you accept a point, while rejecting a bad one is a very quick test.
(if you want to use a different definition of line, just replace Bresenham's with the appropriate algorithm)
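A sketch of this bitmap scheme; as the parenthetical note suggests, any line-marking routine works, and here I step along each line in gcd-reduced increments instead of using Bresenham, which marks exactly the collinear lattice points (names are my own):

```python
import random
from math import gcd

def general_position_points(n, m, seed=0):
    """Place n integer points in an m x m grid with no three collinear.
    After each accepted point, every lattice point on each line through
    it and a previously placed point is marked as blocked."""
    rng = random.Random(seed)
    blocked = [[False] * m for _ in range(m)]
    points = []
    while len(points) < n:
        x, y = rng.randrange(m), rng.randrange(m)
        if blocked[y][x] or (x, y) in points:
            continue  # rejection is a cheap bitmap lookup; try again
        for (qx, qy) in points:
            dx, dy = x - qx, y - qy
            g = gcd(dx, dy)
            sx, sy = dx // g, dy // g  # smallest lattice step on the line
            # walk the full line through (x, y) and (qx, qy), both ways
            for sgn in (1, -1):
                px, py = x, y
                while 0 <= px < m and 0 <= py < m:
                    blocked[py][px] = True
                    px, py = px + sgn * sx, py + sgn * sy
        points.append((x, y))
    return points
```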
Can't see any way around checking each point as you add it, either by (a) running through all of the possible lines it could be on, or (b) eliminating conflicting points as you go along to reduce the possible locations for the next point. Of the two, (b) seems like it could give you better performance.
Similar to @LaC's answer. If memory is not a problem, you could do it like this:
Add all points on the plane to a list (L).
Shuffle the list.
For each point (P) in the list,
For each point (Q) previously picked,
Remove every point from L that is collinear with P and Q.
Add P to the picked list.
You could continue the outer loop until you have enough points, or run out of them.
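The steps above can be sketched as follows (names are my own; collinearity is tested with the integer cross product, and for clarity this version rescans the whole alive set for each picked pair rather than using faster bookkeeping):

```python
import random

def pick_general_position(n, m, seed=0):
    """Shuffle all m*m lattice points, then pick greedily, deleting
    every point collinear with a pair of already-picked points."""
    rng = random.Random(seed)
    candidates = [(x, y) for x in range(m) for y in range(m)]
    rng.shuffle(candidates)
    alive = set(candidates)
    picked = []
    for p in candidates:
        if p not in alive:
            continue  # already crossed out by an earlier pair
        # remove every still-alive point collinear with p and a picked q
        for q in picked:
            dead = [r for r in alive
                    if (q[0]-p[0])*(r[1]-p[1]) == (q[1]-p[1])*(r[0]-p[0])]
            alive -= set(dead)
        alive.discard(p)
        picked.append(p)
        if len(picked) == n:
            break
    return picked
```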
This might just work (though might be a little constrained on being random). Find the largest circle you can draw within the square (this seems very doable). Pick any n points on the circle, no three will ever be collinear :-).
This should be an easy enough task in code. Say the circle is centered at origin (so something of the form x^2 + y^2 = r^2). Assuming r is fixed and x randomly generated, you can solve to find y coordinates. This gives you two points on the circle for every x which are diametrically opposite. Hope this helps.
Edit: Oh, integer points; just noticed that. That's a pity. I'm going to keep this solution up though, since I like the idea.
Both @LaC's and @MizardX's solutions are very interesting, but you can combine them to get an even better solution.
The problem with @LaC's solution is that random choices get rejected. The more points you have already generated, the harder it gets to generate new ones. If there is only one available position left, you have only a slight chance of randomly choosing it (1/(m*m)).
In @MizardX's solution you never get rejected choices; however, if you directly implement the "remove every point from L that is collinear with P and Q" step, you get worse complexity (O(n^5)).
Instead, it would be better to use a bitmap to find which points from L are to be removed. The bitmap would contain either a value indicating that a point is free to use, together with its location in the L list, or a value indicating that the point is already crossed out. This way you get a worst-case complexity of O(n^4), which is probably optimal.
EDIT:
I've just found this question: Generate Non-Degenerate Point Set in 2D - C++
It's very similar to this one, and it would be good to use the solution from its accepted answer. Modifying it a bit to use radix or bucket sort, and initially adding all the n^2 possible points to the P set and shuffling it, one can also get a worst-case complexity of O(n^4) with much simpler code. Moreover, if space is a problem and @LaC's solution is not feasible due to its space requirements, this algorithm will fit in without modifications and still offer decent complexity.
Here is a paper that can maybe solve your problem:
"Point-sets in general position with many similar copies of a pattern" by Bernardo M. Ábrego and Silvia Fernández-Merchant.
Um, you don't specify which plane, but you could just generate 3 random numbers and assign them to x, y, and z.
If 'the plane' is arbitrary, then set z = 0 every time, or something like that.
Do a check on x and y to see whether they are within your m boundary,
and compare each third x,y pair to see whether it lies on the line through the first two; if it does, regenerate the random values.

Programming Logic: Finding the smallest equation to a large number

I do not know a whole lot about math, so I don't know how to begin to google for what I am looking for; I rely on the intelligence of experts to help me understand what I am after...
I am trying to find the smallest string of equations for a particular large number. For example given the number
"39402006196394479212279040100143613805079739270465446667948293404245721771497210611414266254884915640806627990306816"
The smallest equation is 64^64 (that I know of). It contains only 5 bytes.
Basically the program would reverse the math: instead of taking an expression and finding an answer, it takes an answer and finds the most simplistic expression. Simplistic in this case means the smallest string, not really simple math.
Has this already been created? If so, where can I find it? I am looking to take extremely HUGE numbers (10^10000000) and break them down to expressions that will hopefully be around 100 characters in length. Is this even possible? Are modern CPUs/GPUs not capable of doing such big calculations?
Edit:
Ok, so finding the smallest equation takes WAY too much time, judging by the answers. Is there any way to brute-force this and keep the smallest expression found so far?
For example, given a super large number, sometimes taking the square root of the number will result in an expression smaller than the number itself.
As far as which expressions it would start off with: it would naturally try the ones that make the expression smallest. I am sure there are tons of math things I don't know, but one of the ways to make a number a lot smaller is powers.
Just to throw another keyword in your Google hopper, see Kolmogorov Complexity. The Kolmogorov complexity of a string is the size of the smallest Turing machine that outputs the string, given an empty input. This is one way to formalize what you seem to be after. However, calculating the Kolmogorov complexity of a given string is known to be an undecidable problem :)
Hope this helps,
TJ
There's a good program to do that here:
http://mrob.com/pub/ries/index.html
I asked the question "what's the point of doing this", as I don't know whether you're looking at this question from a mathematics point of view or a large-number-factoring point of view.
As other answers have considered the factoring point of view, I'll look at the maths angle. In particular, the problem you are describing is a compressibility problem. This is where you have a number, and want to describe it in the smallest algorithm. Highly random numbers have very poor compressibility, as to describe them you either have to write out all of the digits, or describe a deterministic algorithm which is only slightly smaller than the number itself.
There is currently no general mathematical theorem which can determine whether a representation of a number is the smallest possible for that number (although a lower bound can be derived from Shannon's information theory). (I said general theorem, as special cases do exist.)
As you said you don't know a whole lot of math, this is perhaps not a useful answer for you...
You're doing a form of lossless compression, and lossless compression doesn't work on random data. Suppose, to the contrary, that you had a way of compressing N-bit numbers into (N-1)-bit numbers. In that case, you'd have 2^N values to compress into 2^(N-1) designations, which is an average of 2 values per designation, so the average designation couldn't be decompressed unambiguously. Lossless compression works well on relatively structured data, where the data we're likely to get compresses small, and the data we aren't going to get actually grows somewhat.
It's a little more complicated than that, since you're compressing partly by allowing more information per character. (There are a greater number of N-character sequences involving digits and operators than digits alone.) Still, you're not going to get lossless compression that, on the average, is better than just writing the whole numbers in binary.
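The pigeonhole argument behind this can be written out exactly:

```latex
\#\{\text{bit strings shorter than } N\}
= \sum_{k=0}^{N-1} 2^k
= 2^N - 1
\;<\; 2^N
= \#\{\text{bit strings of length } N\},
```

so any scheme that encodes all N-bit values as shorter strings must map two different values to the same designation, and is therefore not losslessly decodable.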
It looks like you're basically wanting to do factoring on an arbitrarily large number. That is such a difficult problem that it actually serves as the cornerstone of modern-day cryptography.
This really appears to be a mathematics problem, and not programming or computer science problem. You should ask this on https://math.stackexchange.com/
While your question remains unclear, perhaps integer relation finding is what you are after.
EDIT:
There is some speculation that finding a "short" form is somehow related to the factoring problem. I don't believe that is true unless your definition requires a product as the answer. Consider the following pseudo-algorithm which is just sketch and for which no optimization is attempted.
If "shortest" is a well-defined concept, then in general you get "short" expressions by raising small integers to large powers. If N is my integer, then I can find an integer nearby that is 0 mod 4. How close? Within +/- 2. I can find an integer within +/- 4 that is 0 mod 8. And so on. Now that's just the powers of 2. I can perform the same exercise with 3, 5, 7, etc. We can, for example, easily find the nearest integer that is simultaneously a product of powers of 2, 3, 5, 7, 11, 13, and 17; call it N_1. Now compute N - N_1; call it d_1. Maybe d_1 is "short". If so, then N_1 (expressed as powers of those primes) + d_1 is the answer. If not, recurse to find a "short" expression for d_1.
We can also pick integers that are maybe farther away than our first choice; even though the difference d_1 is larger, it might have a shorter form.
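As a rough illustration of this recursion (the base limit and depth cutoff are arbitrary choices of mine, and this is only a greedy heuristic over single power terms, not a shortest-expression search):

```python
from math import log

def short_form(n, max_base=64, depth=3):
    """Heuristic sketch: try to write a positive integer n as a^b + d
    or a^b - d for a small base a, recursing on the remainder d a few
    levels deep.  Returns the decimal string if nothing shorter is
    found.  (math.log limits n to values representable as a float.)"""
    plain = str(n)
    if depth == 0 or n < 2:
        return plain
    best = plain
    for a in range(2, max_base + 1):
        b = round(log(n) / log(a))  # exponent of the nearest power of a
        if b < 2:
            continue
        d = n - a ** b
        if abs(d) >= n:
            continue  # overshot by too much to be useful
        if d == 0:
            cand = "%d^%d" % (a, b)
        else:
            cand = "%d^%d%s%s" % (a, b, "+" if d > 0 else "-",
                                  short_form(abs(d), max_base, depth - 1))
        if len(cand) < len(best):
            best = cand
    return best
```

For example, it rewrites 2**100 as the 4-character string 4^50 while leaving a prime like 97 as plain digits.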
The existence of an infinite number of primes means that there will always be numbers that cannot be simplified by factoring. What you're asking for is not possible, sorry.
