I am currently working on Euler 411 https://projecteuler.net/problem=411.
I have figured out the modular-exponentiation simplification needed to generate all the coordinates in a reasonable amount of time, and I store the coordinates in files (70-200 MB).
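For context, here is a minimal sketch of the coordinate generation (assuming the (2^i mod n, 3^i mod n) station definition from the problem statement; this is the part the question already has working, so it is not a spoiler):

```python
def stations(n):
    """Unique station coordinates, assuming stations sit at (2**i mod n, 3**i mod n) for i = 0..2n."""
    coords = set()
    x, y = 1 % n, 1 % n           # i = 0
    for _ in range(2 * n + 1):    # i = 0 .. 2n
        coords.add((x, y))
        x = (2 * x) % n           # one cheap modular step instead of pow(2, i, n) each time
        y = (3 * y) % n
    return coords

print(len(stations(10000)))       # the question reports 504 unique coordinates for N = 10000
```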
I can also plot the coordinates and candidate solutions. These are not optimal; the optimal solution for this problem passes through the maximum number of stations.
Here's an image for N = 10000, where PE reports 48 as the correct answer. My red-line approximator gets 36; there are 504 coordinates.
For N = 7**5 = 16807 (an actual value from the problem), the red line gets 159 points; there are 14406 unique coordinates.
This is a search problem, right? Am I missing something? I have tried a greedy search with a density heuristic to get an approximate answer, but it is not good enough for the biggest cases; it would take days to finish. I have not tried an exact search like A* because it would be even slower than greedy, and BFS is out of the question.
Any hints? NO SPOILERS PLEASE!! There must be a way to eliminate nodes from this massive search space that I am missing.
Have you considered that there may be a pattern in where the points occur... and hence in the function value? You should solve small cases (of k) by hand! Also check whether there is anything special about S(k^5). Finally, the second-to-last line of the problem statement seems a little suspicious, giving you particular information about S(123) and S(10000). If S(10000) is as low as 48, it seems certain you are missing something and the search space need not be diabolical. So on a first reading, it does not appear to be a brute-force search problem.
Thanks for taking the time to consider this question.
My problem is solving a non-boolean puzzle that has no perfect solution.
In a regular puzzle, two pieces either match and can be placed next to each other, or don't match.
In a non-boolean puzzle, any piece can be put next to any other one, but a matching score between 0 and 1 can be established for every junction between two pieces.
Here, let's say the best "solution" is the one that maximises the average score of all the junctions in the solved puzzle.
Since with big puzzles it becomes impossible to test all solutions, I want an algorithm that gives a solution that is "relatively good".
In reality, I'm not dealing with a puzzle but with a tiled image whose tiles have been shuffled. The matching score of a tile junction is obtained by comparing the edge pixels of the two tiles.
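For concreteness, one plausible way to turn that edge-pixel comparison into a score in [0, 1] might look like this (the mean-absolute-difference measure and the division by 255 are my own choices, not something fixed by the problem):

```python
import numpy as np

def junction_score(left_tile, right_tile):
    """Score the horizontal junction left_tile | right_tile.

    Both tiles are H x W x 3 uint8 arrays. The rightmost pixel column of the
    left tile is compared with the leftmost column of the right tile, and the
    mean absolute difference is mapped into [0, 1], where 1 means identical edges.
    """
    edge_a = left_tile[:, -1, :].astype(float)
    edge_b = right_tile[:, 0, :].astype(float)
    return 1.0 - np.abs(edge_a - edge_b).mean() / 255.0
```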
An image with shuffled tiles looks like this:
In my case, the following additional rules apply:
- No two pieces are the same. Two pieces might have the same edges however.
- The tiles don't have to form a specific shape when solved. I would like the result to be roughly square/rectangle shaped, however.
- It is possible to leave blank cells with no piece inside.
What I want is an algorithm idea that could give a decent solution in reasonable time (a few minutes is totally OK).
If you think the definition of a good solution (currently: the highest average junction score in the final image) is wrong, feel free to change it to match your algorithm.
A problem I have with that definition is that you could just place two perfectly matching tiles together for a score of 1 and separate every other tile with blank space to prevent them from lowering the score. That technically gives a perfect average score of 1, but the final image would consist mostly of blank space, which is not what I want.
For comprehension's sake, here is an answer I would find very good for the problem above:
Obviously I would be surprised to get output that good, but I would at least like similar tiles to end up close to each other in the final image.
Thanks for your time, and remember any idea is welcome, even weird ones.
This is a problem I come across frequently, and I'm searching for a more efficient way to solve it. Take a look at these pictures:
Let's say you want to find the shortest distance from the red point to one of the line segments. Assume you only know the start/end points (x, y) of the segments and the point. This can be done in O(n), where n is the number of line segments, by checking the distance from the point to every segment. In my opinion this is not efficient, because in the worst case all n segments have to be checked before the nearest one is found.
This can be a real performance issue for, say, n = 1000 (a realistic number), especially if the distance calculation isn't just done in Euclidean space via the Pythagorean theorem but with a geodesic method such as the haversine formula or Vincenty's formulae.
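For reference, the per-segment check itself is simple in the Euclidean case; the cost comes from repeating it for all n segments. A standard point-to-segment distance sketch (plain Euclidean, not haversine):

```python
import math

def point_segment_distance(px, py, ax, ay, bx, by):
    """Euclidean distance from point P to segment AB (projection clamped onto the segment)."""
    abx, aby = bx - ax, by - ay
    apx, apy = px - ax, py - ay
    ab_len_sq = abx * abx + aby * aby
    if ab_len_sq == 0.0:                        # degenerate segment: A == B
        return math.hypot(apx, apy)
    t = max(0.0, min(1.0, (apx * abx + apy * aby) / ab_len_sq))
    cx, cy = ax + t * abx, ay + t * aby         # closest point on the segment
    return math.hypot(px - cx, py - cy)

# The O(n) approach from the question is then just:
# nearest = min(segments, key=lambda s: point_segment_distance(px, py, *s))
```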
This is a general problem in different situations:
Is the point inside a radius of the vertices?
Which set of vertices is nearest to the point?
Is the point surrounded by line segments?
To answer these questions, the only approach I know is O(n). I would like to know whether there is a data structure or a different strategy that solves these problems more efficiently.
To make it short: I'm looking for a way to "filter" the line segments/vertices somehow, to obtain a set of potential candidates before I start my distance calculations, something that reduces the work to O(m) with m < n.
Probably not an acceptable answer, but too long for a comment: The most appropriate answer here depends on details that you did not state in the question.
If you only want to perform this test once, then there will be no way to avoid a linear search. However, if you have a fixed set of lines (or a set of lines that does not change too significantly over time), then you may employ various techniques for accelerating the queries. These are sometimes referred to as Spatial Indices, like a Quadtree.
You'll have to expect a trade-off between several factors, like the query time and the memory consumption, or the query time and the time that is required for updating the data structure when the given set of lines changes. The latter also depends on whether it is a structural change (lines being added or removed), or whether only the positions of the existing lines change.
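As a concrete illustration of the filtering idea, even a uniform grid over the segments' bounding boxes (a simpler cousin of a quadtree, shown here only as a sketch) gives the "filter first, measure later" behaviour:

```python
from collections import defaultdict

class SegmentGrid:
    """Uniform grid index over segment bounding boxes (illustrative, not a quadtree)."""

    def __init__(self, segments, cell_size):
        # segments: list of (ax, ay, bx, by) tuples
        self.cell = cell_size
        self.buckets = defaultdict(list)
        for idx, (ax, ay, bx, by) in enumerate(segments):
            # register the segment in every cell its bounding box touches
            for cx in range(int(min(ax, bx) // cell_size), int(max(ax, bx) // cell_size) + 1):
                for cy in range(int(min(ay, by) // cell_size), int(max(ay, by) // cell_size) + 1):
                    self.buckets[(cx, cy)].append(idx)

    def candidates(self, px, py, radius):
        """Indices of segments whose cells intersect a square of half-width `radius` around the point."""
        found = set()
        for cx in range(int((px - radius) // self.cell), int((px + radius) // self.cell) + 1):
            for cy in range(int((py - radius) // self.cell), int((py + radius) // self.cell) + 1):
                found.update(self.buckets.get((cx, cy), ()))
        return found
```

The exact distance routine then only runs on the returned candidates; if the candidate set comes back empty, the query radius has to be widened and the lookup repeated.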
The problem is the following. I have a 1D array of data that I need to approximate by a given number of horizontal lines (for example, by 3 lines) in the optimal way, so that the total error is minimal. The method should be as fast as possible, so I cannot simply pick each horizontal line in turn, fit it to the data, subtract its contribution, and approximate what remains with the remaining lines. I have no idea how to do this, apart from a vague feeling that the solution is linked to the maximum subarray problem. Could you please give me some advice on how to solve it?
The least-squares approximation of a 1D data set by a single constant is its arithmetic mean, so there's one of your lines. I'm not sure what criterion you'd want to use for the other two.
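As a quick numeric illustration of that claim (made-up data, just showing that the mean beats other single levels on sum of squared errors):

```python
data = [2.0, 3.0, 7.0, 8.0, 8.5]

def sum_sq_error(level, values):
    return sum((v - level) ** 2 for v in values)

mean = sum(data) / len(data)
for level in (mean, 5.0, 6.0, 7.0):
    print(f"level={level:.2f}  SSE={sum_sq_error(level, data):.3f}")
# The arithmetic mean gives the smallest sum of squared errors of any single horizontal line.
```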
There is a great problem on one of the algorithm-contest sites. I have been trying to solve it for 5 days. I am not asking you to solve it for me; since I am new to algorithms, I would like your help classifying this type of problem: has anyone solved problems like it, and is it NP-hard or not? Please do not think I am asking you to solve it for me; my purpose is just to learn algorithms, and this problem is difficult enough for me:
The goal of this puzzle is to determine where to place a set of gas stations so that they are closest to airports. Airports use a lot of gas for fueling planes, so placing gas stations close to them is of strategic importance.

Input specification: your program should take one and only one command-line argument, the input file (passed in argv, args, arguments, depending on the language). The input file is formatted as follows: the first line contains an integer n, the number of airports; the n following lines each contain 2 floating-point values xi yi representing the coordinates of the i-th airport; the following line contains the number p of cases to analyze (p is always less than 5); the following p lines each contain one integer gi giving the number of required gas stations.

Output specification: your program should output the result to standard output (printf, print, echo, write). Your output should contain p lines, each line providing the gj coordinates xj,yj of the gas stations. Your solution will be scored by its quality, which is measured by the total distance D: the square root of the sum of squared distances from each airport to its closest gas station. The lower the total distance D, the higher your score will be.
This is the canonical unsupervised k-means clustering problem. See here for the full details: http://en.wikipedia.org/wiki/K-means_clustering
For a quick hint (if you want to avoid complete spoilers): k-means starts by picking random locations for your gas stations. Each iteration thereafter, it improves the solution by reducing the cost of each individual gas station, one at a time: it moves a gas station so as to minimize its cost over the set of airports that it currently fuels.
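A minimal Lloyd-style sketch of that loop (the random initialisation, seed and iteration count are arbitrary choices for illustration, not part of the contest):

```python
import math
import random

def kmeans_stations(airports, g, iterations=100, seed=0):
    """Place g gas stations: assign each airport to its nearest station,
    then move every station to the centroid of the airports it fuels."""
    rng = random.Random(seed)
    stations = rng.sample(airports, g)          # random initial locations
    for _ in range(iterations):
        clusters = [[] for _ in range(g)]
        for ax, ay in airports:
            j = min(range(g),
                    key=lambda k: (ax - stations[k][0]) ** 2 + (ay - stations[k][1]) ** 2)
            clusters[j].append((ax, ay))
        for j, members in enumerate(clusters):
            if members:                         # empty clusters keep their old position
                stations[j] = (sum(x for x, _ in members) / len(members),
                               sum(y for _, y in members) / len(members))
    return stations

def total_distance(airports, stations):
    """The problem's score: sqrt of the sum of squared airport-to-nearest-station distances."""
    return math.sqrt(sum(min((ax - sx) ** 2 + (ay - sy) ** 2 for sx, sy in stations)
                         for ax, ay in airports))
```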
This seems to be a variant of the facility location problem. Finding the optimal locations is NP-hard, but many approximation methods can be applied to find solutions within a certain guaranteed distance of the optimum. Alternatively, softer methods like clustering can be used, as proposed in other answers.
For the case gi = 1 it is easy: you just compute the center of gravity/mass of all the airports (heck, you could even weight each airport by the amount of fuel it consumes, so the station would be placed closer to heavily consuming airports, but as this is not required you give them all the same weight). This yields an optimal solution (and is also a good example that nonlinear, global optimization does NOT necessarily imply NP-hard).
My idea would be to partition the set of airports into gi sets and afterwards apply the center-of-gravity computation to each set. This would be classified as a clustering problem (or maybe a partitioning problem, depending on how you formulate it). Practically, I would apply k-means clustering to solve it. (Here it does indeed become NP-hard if you want a perfect result, but maybe someone will come up with another good approach.)
I am working on a project where I produce an aluminium extrusion cutting list.
The aluminium extrusions come in lengths of 5m.
I have a list of smaller lengths that need to be cut from the 5m lengths of aluminium extrusions.
The smaller lengths need to be cut in the order that produces the least amount of off cut waste from the 5m lengths of aluminium extrusions.
Currently I order the cutting list so that, in general, the longest of the smaller lengths gets cut first and the shortest gets cut last. The exception to this rule is that whenever the next length will not fit in what is left of the 5m extrusion, I use the longest remaining length that will fit.
This seems to produce a very efficient (very little off cut waste) cutting list and doesn't take long to calculate. I imagine, however, that even though the cutting list is very efficient, it is not necessarily the most efficient.
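For illustration, a small sketch of that greedy rule as described (descending sort, then always cutting the longest remaining piece that still fits on the current bar; the 5000 mm stock length is from the question, the example lengths are made up):

```python
STOCK_LENGTH = 5000  # 5 m, in mm

def greedy_cut(lengths, stock=STOCK_LENGTH):
    """Greedy 'longest piece that still fits' packing. Returns one list of pieces per extrusion."""
    remaining = sorted(lengths, reverse=True)
    bars = []
    while remaining:
        space, bar = stock, []
        while True:
            piece = next((p for p in remaining if p <= space), None)
            if piece is None:
                break
            remaining.remove(piece)   # removes the longest remaining piece that fits
            bar.append(piece)
            space -= piece
        if not bar:
            raise ValueError("a requested length is longer than the stock length")
        bars.append(bar)
    return bars

cuts = greedy_cut([2400, 2400, 1800, 1200, 1200, 900, 600])
waste = sum(STOCK_LENGTH - sum(bar) for bar in cuts)
```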
Does anyone know of a way to calculate the most efficient cutting list which can be calculated in a reasonable amount of time?
EDIT: Thanks for the answers. I'll continue to use the "greedy" approach, as it seems to be doing a very good job (it outperforms any human attempt to create an efficient cutting list) and is very fast.
This is a classic, difficult problem to solve efficiently. The algorithm you describe sounds like a Greedy Algorithm. Take a look at this Wikipedia article for more information: The Cutting Stock Problem
No specific ideas on this problem, I'm afraid - but you could look into a 'genetic algorithm' (which would go something like this)...
Place the lengths to cut in a random order and give that order a score based on how good a match it is to your ideal solution (0% waste, presumably).
Then iteratively make random alterations to the order and re-score. If the new score is worse, ditch the alteration; if it is better, keep it and use it as the basis for your next iteration. Keep going until you get the score within acceptable limits.
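A sketch of that search loop under some assumptions of mine: waste is scored by packing the pieces in the given order onto 5 m bars (next fit), and the "random alteration" is a swap of two positions; none of these details come from the answer itself.

```python
import random

STOCK = 5000  # mm; assumes every piece fits on a single bar

def waste_of_order(order, stock=STOCK):
    """Pack pieces onto bars in the given order (next fit) and return the total leftover."""
    bars = [stock]
    for piece in order:
        if piece <= bars[-1]:
            bars[-1] -= piece
        else:
            bars.append(stock - piece)
    return sum(bars)

def local_search(lengths, iterations=20000, seed=1):
    rng = random.Random(seed)
    current = list(lengths)
    rng.shuffle(current)                                   # start from a random order
    best_waste = waste_of_order(current)
    for _ in range(iterations):
        i, j = rng.randrange(len(current)), rng.randrange(len(current))
        current[i], current[j] = current[j], current[i]    # random alteration: swap two pieces
        waste = waste_of_order(current)
        if waste <= best_waste:                            # better (or equal): keep the change
            best_waste = waste
        else:                                              # worse: undo it
            current[i], current[j] = current[j], current[i]
    return current, best_waste
```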
What you described is indeed classified as a Cutting Stock problem, as Wheelie mentioned, and not a Bin Packing problem because you try to minimize the waste (sum of leftovers) rather than the number of extrusions used.
Both of those problems can be very hard to solve, but the 'best fit' algorithm you mentioned (using the longest 'small length' that fits the current extrusion) is likely to give you very good answers with a very low complexity.
Actually, since the size of material is fixed, but the requests are not, it's a bin packing problem.
Again, wikipedia to the rescue!
(Something I might have to look into for work too, so yay!)
That's an interesting problem, because I suppose it depends on the quantity of each length you're producing. If the quantities are all the same and you can get each different length onto one 5m extrusion, then you have the optimum solution.
However, if they don't all fit onto one extrusion, then you have a bigger problem. To keep the same number of cuts for each length, you need to calculate how many lengths (not necessarily in order) can fit on one extrusion, and then work through the extrusions in that order.
I've been struggling with this exact problem here too (the stock length in my case is 6 m).
The solution I'm working on is a bit ugly, but I can't settle for your solution. Let me explain:
Stock size: 5 m
Sizes needed (1 of each): 3.5, 1, 1.5
Your solution:
3.5 | 1, with a waste of 0.5
1.5, with a leftover of 3.5
See the problem?
The solution I'm working on -> Brute force
1 - Test every possible solution
2 - Order the solutions by their waste
3 - Choose the best solution
4 - Remove the items in the solution from the "Universe"
5 - Goto 1
I know it's time-consuming (but I take 1h30m for lunch... so... :) )
I really need the optimum solution (I currently do an almost-optimum solution by hand (+-) in Excel), not just because I'm obsessive but also because the product isn't cheap.
If anyone has an easy, better solution, I'd love to hear it.
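A rough sketch of that enumerate-and-pick-best procedure (exhaustive over subsets, so only practical for small piece counts; the 5 m stock length is taken from the example above):

```python
from itertools import combinations

STOCK = 5.0  # metres, as in the example above

def best_subset(pieces, stock=STOCK):
    """Steps 1-3: try every subset that fits on one bar and keep the one with the least waste."""
    best, best_waste = (), stock
    for r in range(1, len(pieces) + 1):
        for combo in combinations(range(len(pieces)), r):   # indices, so duplicate lengths are fine
            total = sum(pieces[i] for i in combo)
            if total <= stock and stock - total < best_waste:
                best, best_waste = combo, stock - total
    return best, best_waste

def cut_all(pieces, stock=STOCK):
    """Steps 4-5: remove the chosen pieces from the 'universe' and repeat until nothing is left."""
    pieces = list(pieces)
    bars = []
    while pieces:
        combo, _ = best_subset(pieces, stock)
        if not combo:
            raise ValueError("a piece is longer than the stock length")
        bars.append([pieces[i] for i in combo])
        for i in sorted(combo, reverse=True):
            del pieces[i]
    return bars

print(cut_all([3.5, 1.0, 1.5]))   # -> [[3.5, 1.5], [1.0]] for the example above
```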
The Column generation algorithm will quickly find a solution with the minimum possible waste.
To summarize, it works well because it doesn't generate all possible combinations of cuts that can fit on a raw material length. Instead, it iteratively solves for combinations that would improve the overall solution, until it reaches an optimum solution.
If anyone needs a working version of this, I've implemented it in Python and posted it on GitHub: LengthNestPro
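To make "iteratively solves for combinations that would improve the overall solution" concrete: in the classic Gilmore-Gomory formulation, each iteration solves a small knapsack over the current dual prices of the piece lengths, and any pattern whose dual value exceeds 1 is added as a new column. Below is a sketch of just that pricing step, separate from LengthNestPro; the dual prices in the example call are made up, and the LP solve of the restricted master problem is omitted.

```python
def best_new_pattern(lengths, dual_prices, stock):
    """Pricing step of column generation for cutting stock (sketch).

    lengths     : integer piece lengths (e.g. in mm)
    dual_prices : current dual value of one piece of each length (illustrative here)
    stock       : integer stock length

    Solves an unbounded knapsack: the cutting pattern with the highest total dual
    value that fits on one bar.  A value above 1.0 means the pattern is a column
    that can improve the master LP solution.
    """
    # dp[c] = (best dual value within capacity c, pattern as per-length piece counts)
    dp = [(0.0, [0] * len(lengths)) for _ in range(stock + 1)]
    for c in range(1, stock + 1):
        best_value, best_counts = dp[c - 1]
        for i, (length, price) in enumerate(zip(lengths, dual_prices)):
            if length <= c and dp[c - length][0] + price > best_value:
                best_value = dp[c - length][0] + price
                best_counts = dp[c - length][1][:]   # copy the sub-pattern, then add one piece i
                best_counts[i] += 1
        dp[c] = (best_value, best_counts)
    return dp[stock]

# Hypothetical dual prices for pieces of 2400, 1800 and 1200 mm on a 5000 mm bar:
value, pattern = best_new_pattern([2400, 1800, 1200], [0.55, 0.40, 0.30], 5000)
# value > 1.0 means this pattern would be added as a new column to the master problem.
```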