Find nearest delivery centers to a given area code - algorithm

This was asked during an interview for a company. Suppose there is an interface for finding the delivery center nearest to your area. All you have to enter is your zipcode/pincode, and it returns the nearest delivery center. What would be the data structure and algorithm to do this? For example, you have broken your phone and want to go to a service center. You go to the company website and enter your zipcode to find the nearest repair center. How does it do that?
I suggested a graph + hashmap solution, where I would return the neighbouring nodes of a given node and the addresses would be stored in a hashmap keyed by zipcode. That wasn't good enough: the interviewer kept pressing me to use the geographical property, saying that I am not given the distance between two centers, so how would I know which is the nearest, and what if I were asked for the top 3 nearest centers? I was not able to come up with a solution. He also kept asking what data I would need to solve this. It would be really helpful to know what the approach could be, as it has been bugging me for days. Thanks

Most algorithms deal with single points - just taking the centre point of a zip code area should suffice.
For a single nearest neighbour, a Voronoi diagram seems like the way to go.
It separates the space into regions such that, given any query point, we know which point is closest.
A k-d tree is also an option. Wikipedia describes it as follows:
The k-d tree is a binary tree in which every node is a k-dimensional point. Every non-leaf node can be thought of as implicitly generating a splitting hyperplane that divides the space into two parts, known as half-spaces. Points to the left of this hyperplane are represented by the left subtree of that node and points right of the hyperplane are represented by the right subtree. The hyperplane direction is chosen in the following way: every node in the tree is associated with one of the k-dimensions, with the hyperplane perpendicular to that dimension's axis. So, for example, if for a particular split the "x" axis is chosen, all points in the subtree with a smaller "x" value than the node will appear in the left subtree and all points with larger "x" value will be in the right subtree. In such a case, the hyperplane would be set by the x-value of the point, and its normal would be the unit x-axis.
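As a rough illustration of the quoted description, here is a minimal 2-d tree sketch in Python. The names `build_kdtree` and `nearest` are made up for this example, points are assumed to be plain `(x, y)` tuples (for instance zip code centre points), and straight-line distance is assumed to be good enough.

```python
from math import dist

def build_kdtree(points, depth=0):
    """Recursively build a 2-d tree; each node is (point, left, right)."""
    if not points:
        return None
    axis = depth % 2                      # alternate between x and y
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2                # median point becomes the splitting node
    return (points[mid],
            build_kdtree(points[:mid], depth + 1),
            build_kdtree(points[mid + 1:], depth + 1))

def nearest(node, target, depth=0, best=None):
    """Return the stored point closest to target (None for an empty tree)."""
    if node is None:
        return best
    point, left, right = node
    if best is None or dist(point, target) < dist(best, target):
        best = point
    axis = depth % 2
    diff = target[axis] - point[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    best = nearest(near, target, depth + 1, best)
    # The far side can only help if the splitting line is closer than the best so far.
    if abs(diff) < dist(best, target):
        best = nearest(far, target, depth + 1, best)
    return best

centers = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build_kdtree(centers)
print(nearest(tree, (9, 2)))  # (8, 1)
```

The pruning test is what makes this faster than checking every center: whole subtrees are skipped when they lie entirely on the far side of a splitting line that is already further away than the best candidate.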
Finding the k nearest neighbours is significantly more difficult. There is a k nearest neighbours algorithm, but this is a classification algorithm, so I'm not sure it helps here.
One option is to create a grid of the region. Then, given a point, we know which cell it's in, and we can simply query that cell and its neighbours until we've found the desired number of neighbours.
One just has to be careful here, as the next nearest point can actually be in another cell, e.g.:
   --------------
   |           B|
 A | X          |
   |            |
   |            |
   --------------
Given point X, the closest point is A, but B would be returned if we simply look in the same cell. We also need to look at all neighbouring cells after we've found k points.
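A minimal sketch of this grid search, in Python. The function names, the cell size parameter `cell` and the ring-by-ring stopping rule are assumptions made for illustration. The idea is to keep widening the square of cells around the query until the k-th best candidate is closer than anything outside that square could be.

```python
from collections import defaultdict
from math import dist, floor

def build_grid(points, cell):
    """Bucket points into square cells of side `cell`."""
    grid = defaultdict(list)
    for p in points:
        grid[(floor(p[0] / cell), floor(p[1] / cell))].append(p)
    return grid

def k_nearest(grid, cell, q, k):
    """Return the k points closest to q, searching ring by ring around q's cell."""
    k = min(k, sum(len(v) for v in grid.values()))
    if k == 0:
        return []
    cx, cy = floor(q[0] / cell), floor(q[1] / cell)
    found, r = [], 0
    while True:
        for ix in range(cx - r, cx + r + 1):
            for iy in range(cy - r, cy + r + 1):
                if max(abs(ix - cx), abs(iy - cy)) == r:   # only the new ring
                    found.extend(grid.get((ix, iy), []))
        found.sort(key=lambda p: dist(p, q))
        # Every point outside rings 0..r is at least r * cell away from q,
        # so once the k-th candidate is within that bound the answer is final.
        if len(found) >= k and dist(found[k - 1], q) <= r * cell:
            return found[:k]
        r += 1

# The situation from the diagram: A is just outside X's cell, B is inside it.
points = [(-1, 5), (9, 9), (25, 25), (3, -14)]
grid = build_grid(points, 10)
print(k_nearest(grid, 10, (1, 5), 1))  # [(-1, 5)]
```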

You need the whole road network, stored as a sparse matrix containing the distances between connected nodes. You also need the list of nodes that contain service centers. Given this, I think the A* algorithm should do the job of determining the distance between a given location and each service center, after which you pick the three with the smallest distance. I am certain there are more efficient algorithms, but I believe the interviewer should concentrate on the way you think through a problem rather than ask for implementation details such as data structures. If I had to solve such a problem in real life, I would do a literature search first.
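As a sketch of the road-network idea: instead of running A* once per service center, the example below runs a single Dijkstra search from the query location and stops after the first k centers are reached, which gives the same nearest-by-road answer. The adjacency-list format and all node names are assumptions for illustration.

```python
import heapq

def nearest_centers(graph, source, centers, k=3):
    """graph: {node: [(neighbour, road_distance), ...]}.
    Returns up to k (distance, center) pairs, closest first."""
    dist = {source: 0}
    heap = [(0, source)]
    done = set()
    result = []
    while heap and len(result) < k:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u in centers:
            result.append((d, u))      # nodes are finalised in order of distance
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return result

roads = {
    "home": [("a", 2), ("b", 5)],
    "a": [("home", 2), ("c1", 2), ("b", 1)],
    "b": [("home", 5), ("a", 1), ("c2", 1)],
    "c1": [("a", 2), ("c3", 6)],
    "c2": [("b", 1)],
    "c3": [("c1", 6)],
}
print(nearest_centers(roads, "home", {"c1", "c2", "c3"}, k=2))  # [(4, 'c1'), (4, 'c2')]
```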
I am not sure what strategy is best when facing such an interviewer, or whether he would have accepted such a response. Being assertive and providing an overview of the solution before diving into the details might have been better.
Do not have regrets though. Benefit from the experience and move on. You do not know what bounties God has in store for you.

Related

Merge adjacent vertices of a graph until single vertex left in the fewest steps possible

I have a game system that can be represented as an undirected, unweighted graph where each vertex has one (relevant) property: a color. The goal of the game in terms of the graph representation is to reduce it down to one vertex in the fewest "steps" possible. In each step, the player can change the color of any one vertex, and all adjacent vertices of the same color are merged with it. (Note that in the example below I just happened to show the user only changing one specific vertex the whole game, but the user can pick any vertex in each step.)
What I am after is a way to compute the fewest number of steps necessary to "beat" a given graph per the procedure described above, and also to provide the specific moves needed to do so. I'm familiar with the basics of path-finding, BFS, and things of that nature, but I'm having a hard time framing this problem in terms of a "fastest path" solution.
I am unable to find this same problem anywhere on Google, or even a graph-theory term that encapsulates the problem. Does anyone have an idea of at least how to get started approaching this problem? Can anyone point me in the right direction?
EDIT Since this problem seems to be really difficult to solve efficiently, perhaps I could change the aim of my question. Could someone describe how I would even set up a brute force, breadth first search for this? (Brute force could possibly be okay, since in practice these graphs will only be 20 vertices at most.) I know how to write a BFS for a normal linked graph data structure... but in this case it seems quite weird, since each vertex would have to contain a whole graph within itself, and the next vertices in the search graph would have to be generated based on the possible moves in the graph within the vertex. How would one set up the data structure and search algorithm to accomplish this?
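One possible way to set up such a brute-force search, sketched in Python under a few assumptions: the original adjacency lists never change, a search state is just the tuple of colours of the original vertices, a "merged vertex" is a connected same-coloured component, and only colours already present on the board are tried. The function names are made up for this sketch, and the state space grows very quickly, so it is only meant for small graphs.

```python
from collections import deque

def component(adj, colors, start):
    """Vertices reachable from start through same-coloured vertices (one merged vertex)."""
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in seen and colors[v] == colors[start]:
                seen.add(v)
                stack.append(v)
    return seen

def solve(adj, colors):
    """BFS over colourings of a connected graph.
    Returns a shortest list of (vertex, new_colour) moves."""
    start = tuple(colors)
    parent = {start: None}            # state -> (previous state, move)
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if len(set(state)) == 1:      # one colour left: everything has merged
            moves = []
            while parent[state] is not None:
                state, move = parent[state]
                moves.append(move)
            return moves[::-1]
        handled = set()
        for v in range(len(state)):
            if v in handled:
                continue
            comp = component(adj, state, v)
            handled |= comp           # one move per merged vertex, not per original vertex
            for c in set(state) - {state[v]}:
                nxt = tuple(c if i in comp else state[i] for i in range(len(state)))
                if nxt not in parent:
                    parent[nxt] = (state, (v, c))
                    queue.append(nxt)
    return None

# A path of five vertices coloured R G B G R needs two moves.
path5 = [[1], [0, 2], [1, 3], [2, 4], [3]]
print(len(solve(path5, ["R", "G", "B", "G", "R"])))  # 2
```

Because the search is breadth first, the first state with a single colour is reached by a minimum number of moves, and the `parent` dictionary lets the moves be read back.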
EDIT 2 This is an old question, but I figured it might help to just state outright what the game was. The game was essentially to be a rip-off of Kami 2 for iOS, except my custom puzzle editor would automatically figure out the quickest possible way to solve your puzzle, instead of having to find the shortest move number by trial and error yourself. I'm not sure if Kami was a completely original game concept, or if there is a whole class of games like it with the same "flood-fill" mechanic that I'm unaware of. If this is a common type of game, perhaps knowing the name of it could allow finding more literature on the algorithm I'm seeking.
EDIT 3 This Stack Overflow question seems like it may have some relevant insights.
Intuitively, the solution seems global. If you take a larger graph, for example, which dot you select first will have an impact on the direct neighbours which will have an impact on their neighbours and so on.
It sounds as if it were of the same breed of problems as the map colouring problem. Not because of the colours but because of the implications of a local selection to the other end of the graph down the road. In the map colouring, you have to decide what colour to draw a country and its neighbouring countries so two countries that touch don't have the same colour. That first set of selections have an impact on whether there is a solution in the subsequent iterations.
Just to show how complex the problem is:
Let's look at a simpler problem, where the graph is replaced by a tree and only the root vertex can change colour. In that case the path to a leaf can be represented as the sequence of colours of the vertices on that path. A sequence A of colour changes collapses a leaf if the leaf's sequence is a subsequence of A.
The problem can then be stated as: for a given set of sequences, find a minimal-length sequence S such that each initial sequence is a subsequence of S. That is called the shortest common supersequence problem, and it is NP-complete.
Your problem is for sure more complex than this one :-/
Edit *
This is a comment on the question's edit. Check this page for the terms.
The minimum number of moves is >= the graph radius. With that in mind, it seems a good strategy to:
use central vertices for moves,
use moves that reduce the graph radius, or at least reduce the distance from the central vertices to a 'large' set of vertices.
I would go with a strategy that keeps track of the central vertices and the distances from all graph vertices to them. Each step checks all meaningful moves and chooses the one that reduces the radius, or the distances to the central vertices, the most. I think BFS can be used to calculate the distances and how a move influences them. There are tricky parts, such as when the central vertices change after a move. Maybe it is a good idea to use not only the central vertices but also vertices close to them.
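For reference, the radius and the central vertices can be computed with one BFS per vertex. This is only a sketch: it assumes a connected, unweighted graph given as a dictionary of neighbour lists (in the game, this would be the graph of merged vertices), and the function names are invented for the example.

```python
from collections import deque

def eccentricity(adj, source):
    """Largest BFS distance from source to any other vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return max(dist.values())

def radius_and_centers(adj):
    """Radius = smallest eccentricity; centers = vertices that attain it."""
    ecc = {v: eccentricity(adj, v) for v in adj}
    radius = min(ecc.values())
    return radius, [v for v in adj if ecc[v] == radius]

path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(radius_and_centers(path))  # (2, [2])
```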
I think the graph term you are looking for is the "valence" (or degree) of a vertex, which is the number of edges that the vertex is connected to. It looks like you want to change the color of the node with the highest valence. Then, in the resulting graph, change the color of the node that has the highest valence, and so on, until you have just one node left.

Trouble finding shortest path across a 2D mesh surface

I asked this question three days ago and I got burned by contributors because I didn't include enough information. I am sorry about that.
I have a 2D matrix in which each array position holds the depth of water in a channel. I was hoping to apply Dijkstra's or a similar "least cost path" algorithm to find the least amount of concrete needed to build a bridge across the water.
It took some time to format the data into a clean version, and I've learned some rudimentary Matlab skills doing that. I have removed most of the land so that the shoreline is now standardised to a certain value. My plan is to loop through each "pixel" on the "west" shore, run a least cost algorithm from it to the closest "east" shore, and so move through the entire mesh, ultimately finding the least cost path.
This is my problem: fitting the data to any of the algorithms. Unfortunately I get overwhelmed by the options and the different formats, because the other examples are for other use cases.
My other consideration is that the least cost path, once calculated, will be a jagged line, which would not be suitable for a bridge. So I need to constrain the bend radius of the path if at all possible, and I don't know how to go about doing that.
A picture of the channel:
Any advice on an approach would be great. I just need to know if someone knows a method that should work; then I will spend the time learning how to fit the data.
You can apply Dijkstra to your problem in this way:
the two "dry" regions you want to connect correspond to matrix entries with value 0; the other cells have a positive value designating the depth (or the cost of filling this place with concrete)
your edges are the connections of neighbouring cells in your matrix. (It can be a 4- or 8-neighbourhood.) The weight of the edge is the arithmetic mean of the values of the connected cells.
then you apply the Dijkstra algorithm with a starting point in one "dry" region and an end point in the other "dry" region.
The cheapest path will connect two cells of value 0, and its weight will correspond to the sum of the costs of the cells visited. (One half of each cell's weight comes from the edge going into the cell, the other half from the edge leaving the cell.)
This way you will get a possibly rather crooked path leading over the water, which may be a helpful hint for where to build a cheap bridge.
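The steps above can be sketched in Python roughly as follows (the question mentions Matlab, but the structure carries over). This assumes a 4-neighbourhood, a depth matrix given as a list of lists, one dry start cell, and a set of dry goal cells on the opposite shore. The function name `cheapest_crossing` is made up for this sketch.

```python
import heapq

def cheapest_crossing(depth, start, goal_cells):
    """Dijkstra on a grid where an edge costs the mean of the two cells it joins.
    Returns (total cost, path as a list of (row, col)) or None if unreachable."""
    rows, cols = len(depth), len(depth[0])
    best = {start: 0.0}
    prev = {}
    heap = [(0.0, start)]
    while heap:
        cost, cell = heapq.heappop(heap)
        if cost > best[cell]:
            continue                       # stale heap entry
        if cell in goal_cells:
            path = [cell]
            while path[-1] in prev:
                path.append(prev[path[-1]])
            return cost, path[::-1]
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols:
                new = cost + (depth[r][c] + depth[nr][nc]) / 2
                if new < best.get((nr, nc), float("inf")):
                    best[(nr, nc)] = new
                    prev[(nr, nc)] = cell
                    heapq.heappush(heap, (new, (nr, nc)))
    return None

# Column 0 is the west shore, column 3 the east shore (both dry, value 0).
depth = [
    [0, 3, 1, 0],
    [0, 1, 1, 0],
    [0, 4, 2, 0],
]
east = {(0, 3), (1, 3), (2, 3)}
print(cheapest_crossing(depth, (1, 0), east))
# (2.0, [(1, 0), (1, 1), (1, 2), (1, 3)])
```

Since moving between dry cells costs 0, starting from any single cell of a connected shore is enough. There is no need to loop over every shore pixel.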
You can speed up the calculation by using the A* algorithm. For this, one needs, for each cell, a lower bound on the remaining cost of reaching the other side. Such a lower bound can be calculated by examining the "concentric rings" around a point for as long as the rings do not contain a 0-cell of the other side. The sum of the minimal cell values of each ring is then a lower bound on the remaining cost.
An alternative approach, which emphasizes the constraint that you require a non-jagged shape for your bridge, would be to use Monte-Carlo, simulated annealing or a genetic algorithm, where the initial "bridge" consists of a simple spline curve between two randomly chosen end points (one on each side of the chasm), plus a small number of randomly chosen intermediate points in the chasm. You would end up with a physically 'realistic' bridge and a reasonably optimized cost of concrete.

Connect points from set in the line segments

I have been given a task where I have to connect all the points in the 2D plane.
There are four conditions to be met:
The total length of all the segments has to be minimal.
One point can be a part of only one line segment.
Line segments cannot intersect.
All points have to be used (one can be left alone, but only if it cannot be avoided).
Image to visualize the problem:
The wrong image connects the points correctly, although its total length is bigger than that of the one on the left.
At first I thought about sorting the points, using a sweep line and building a tree of all possibilities, although that seems like a way too complicated solution with huge complexity. Therefore I am looking for better approaches. I would appreciate some hints on what to do, or how I could approach the problem.
I would start with a Delaunay triangulation of the point set. This should already give you the nearest neighbor connections of each point without any intersections. In the next step I'd look at the triangles that result from the triangulation - the convenient property here is that based on your ruleset you can pick exactly one side from each triangle and remove the remaining two from the selection.
The problem that remains now is to pick those edges that give you the smallest total sum which of course will not always be the smallest side since that one might already have been blocked by a neighboring triangle. I'd start with a greedy approach, always picking the smallest remaining edge that has not been blocked by neighboring triangles yet.
Edit: In the next step you retrieve a list of all the edges in that triangulation and sort them by length. You also make another list in which you count the amount of connections each point has. Now you iterate through the edge list going from the longest edge to the shortest one and check the two points it connects in the connection count list: if each of the points has still more than 1 connection left, you can discard the edge and decrement the connection count for the two points involved. If at least one of the points has only one connection left, you have got yourself one of the edges you are looking for. You repeat the process until there are no edges left and this should hopefully give you the smallest possible edge sum.
If I am not mistaken this problem is loosely related to the knapsack problem which is NP-Hard so I am not sure if this solution really gives you the best possible one.
I'd say this is an extension to the well-known travelling salesman problem.
A good technique (if a little old-fashioned) is to use a simulated annealing optimisation technique.
You'll need to make adjustments to the cost (a.k.a. objective) function to miss out sections of the path. But given a candidate continuous path, it's reasonably trivial to decide which sections to miss out to minimise its length. (You'd first remove the longer of any intersecting lines).
Wow, that's a tricky one. That's a lot of conditions to meet.
I think from a programming standpoint, the "simplest" solution might actually be to just loop through, find all the possibilities that satisfy the last 3 conditions, and record the total length as you loop through, and just choose the one with the shortest length in the end - brute force, guess-and-check. I think this is what you were referring to in your OP when you mentioned a "sweeping line and building a tree of all possibilities". This approach is very computationally expensive, but if the code is written right, it should always work in the end.
If you want the "best" solution, where you just solve for the single final answer right away, I'm afraid my math skills aren't strong enough for that - I'm not even sure if there is a single analytical solution to that problem for an arbitrary collection of points. Maybe try checking with the people over at MathOverflow. If someone over there can explain the math behind that calculation to you, and you then still need help converting that math into code in a certain programming language, update your question here (maybe with a link to the answer they provide) and I'm sure someone will be able to help you out from that point.
One of the possible solutions is to use graph theory.
Construct a bipartite graph G, such that each point has its copy in both parts. Now put the edges between the points i and j with the weight = i == j ? infinity : distance[i][j]. The minimal weight maximum matching in the graph will be your desired configuration.
Notice that since this is on a Euclidean 2D plane, the resulting "edges" of the matching will not intersect. Let's say that edges AB and XY intersect for points A, B, X, Y. Then the matching is not of minimum weight, because either AX, BY or AY, BX will produce a smaller total weight without an intersection (this comes from the triangle inequality a+b > c).
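For small inputs, a minimum-weight pairing can also be found by exhaustive search with memoisation. Note that the sketch below is not the bipartite construction described above: it pairs the points directly (a general, non-bipartite matching), assumes an even number of points, and is exponential in the number of points. The function name is invented for this example.

```python
from functools import lru_cache
from math import dist

def min_weight_pairing(points):
    """Return (total length, pairs of indices) of a minimum-length perfect pairing."""
    n = len(points)
    if n % 2:
        raise ValueError("this sketch assumes an even number of points")

    @lru_cache(maxsize=None)
    def best(mask):
        # mask has a 1 bit for every point that is still unpaired
        if mask == 0:
            return 0.0, ()
        i = (mask & -mask).bit_length() - 1       # lowest unpaired point
        rest = mask & ~(1 << i)
        result = (float("inf"), ())
        for j in range(i + 1, n):
            if rest >> j & 1:
                cost, pairs = best(rest & ~(1 << j))
                cand = cost + dist(points[i], points[j])
                if cand < result[0]:
                    result = (cand, ((i, j),) + pairs)
        return result

    return best((1 << n) - 1)

pts = [(0, 0), (1, 0), (0, 5), (1, 5)]
print(min_weight_pairing(pts))  # (2.0, ((0, 1), (2, 3)))
```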

How to find the neighbors of a graph efficiently

I have a program that creates graphs as shown below
The algorithm starts at the green node and traverses the graph. Assume that a node (a linked-list type node with 4 references: Left, Right, Up and Down) has been added to the graph, depicted by the red dot in the image. In order to integrate the newly created node with its neighbors, I need to find the four objects and link them so that the graph connectivity is preserved.
Following is what I need to clarify
Assume that all yellow nodes are null and that I do not keep another data structure to map nodes. What is the most efficient way to check for the existence of the neighbors of the newly created node? I know the basic graph search algorithms like DFS, BFS etc. and the shortest path algorithms, but I do not think any of these are efficient enough, because the graph can have about 10000 nodes, and running a graph search (starting from the green node) to find the neighbors whenever a new node is added seems computationally expensive to me.
If the graph search is not avoidable, what is the best alternative structure? I thought of a large multi-dimensional array. However, this wastes memory and also has the issue of not having negative indexes, since the graph in the image can grow in any direction. My solution to this is to write a separate class that consists of an array-based data structure that supports negative indexes. However, before taking this option I would like to know if I could still solve the problem without resorting to a new structure and save a lot of rework.
Thank you for any feedback and reading this question.
I'm not sure if I understand you correctly. Do you want to
Check that there is a path from (0,0) to (x1,y1)
or
Check if any of the neighbors of (x1,y1) are in the graph? (even if there is no path from (0,0) to any of this neighbors).
I assume that you are looking for a path (otherwise you won't use a linked-list), which implies that you can't store points which have no path to (0,0).
Also, you mentioned that you don't want to use any other data structure beside / instead of your 2D linked-list.
You can't avoid full graph search. BFS and DFS are the classic algorithms. I don't think that you care about the shortest path - any path would do.
Other approaches you may consider are A* (simple explanation here) or one of its variants (look here).
An alternative data structure would be a set of nodes (each node is a pair < x,y > of course). You can easily run 4 checks to see if any of its neighbors are already in the set. It would take O(n) space and O(log n) time for both check and add. If your programming language does not support pairs as elements of a set, you can use a single integer (x*(Ymax+1) + y) instead.
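A minimal sketch of the set-of-pairs idea in Python. The class name and the convention that Up means y + 1 are assumptions for illustration. Python's built-in `set` is hash-based, so checks and inserts take expected constant time rather than the O(log n) of a tree-based set.

```python
class GridGraph:
    """Stores occupied grid positions; coordinates may be negative."""

    def __init__(self):
        self.nodes = set()

    def add(self, x, y):
        self.nodes.add((x, y))

    def neighbours(self, x, y):
        """Return the existing neighbours of (x, y), keyed by direction."""
        candidates = {
            "Left": (x - 1, y),
            "Right": (x + 1, y),
            "Up": (x, y + 1),      # assumption: y grows upwards
            "Down": (x, y - 1),
        }
        return {d: p for d, p in candidates.items() if p in self.nodes}

g = GridGraph()
for pos in [(0, 0), (1, 0), (0, -1)]:
    g.add(*pos)
print(g.neighbours(0, 0))  # {'Right': (1, 0), 'Down': (0, -1)}
```

If the nodes need to carry data or links, a dictionary mapping `(x, y)` to the node object works the same way.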
Your data structure can be made to work, but probably not efficiently. And it will be a lot of work.
With your current data structure you can use an A* search (see https://en.wikipedia.org/wiki/A*_search_algorithm for a basic description) to find a path to the point, which necessarily finds a neighbor. Then pretend that you've got a little guy at that point, put his right hand on the wall, then have him find his way clockwise around the point. When he gets back, he'll have found the rest.
What do I mean by find his way clockwise? For example, suppose that you go Down from the neighbor to get to his point. Then your guy should face the first of Right, Up, and Left in which he has a neighbor. If he can go Right, he will, and then he will try the directions Down, Right, Up, and Left. (Just imagine trying to walk through the maze yourself with your right hand on the wall.)
This way lies insanity.
Here are two alternative data structures that are much easier to work with.
You can use a quadtree. See http://en.wikipedia.org/wiki/Quadtree for a description. With this inserting a node is logarithmic in time. Finding neighbors is also logarithmic. And you're only using space for the data you have, so even if your graph is very spread out this is memory efficient.
Alternately you can create a class for a type of array that takes both positive and negative indices, and then one that builds on that to be a 2-d class that takes both positive and negative indices. Under the hood that class would be implemented as a regular array and an offset, so that the array can start at some number, positive or negative. If you ever try to insert a piece of data that is before the offset, you create a new offset that is below that piece by a fixed fraction of the length of the array, create a new array, and copy the data from the old to the new. Now inserting and finding neighbors are usually O(1), but it can be very wasteful of memory.
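A rough Python sketch of the array-plus-offset idea. The class names `OffsetArray` and `Grid2D`, the use of `None` for empty slots, and the "grow by at least half the current length" rule are all choices made for this illustration.

```python
class OffsetArray:
    """1-d array that accepts negative indices by storing an offset."""

    def __init__(self):
        self.offset = 0
        self.data = []

    def _grow(self, index):
        if not self.data:
            self.offset, self.data = index, [None]
            return
        lo, hi = self.offset, self.offset + len(self.data)
        if index < lo:
            # Extend below by the needed amount, or half the length if larger.
            pad = max(lo - index, len(self.data) // 2)
            self.data = [None] * pad + self.data
            self.offset = lo - pad
        elif index >= hi:
            pad = max(index - hi + 1, len(self.data) // 2)
            self.data.extend([None] * pad)

    def __setitem__(self, index, value):
        self._grow(index)
        self.data[index - self.offset] = value

    def get(self, index):
        i = index - self.offset
        return self.data[i] if 0 <= i < len(self.data) else None


class Grid2D:
    """2-d version: an OffsetArray of rows, each row an OffsetArray."""

    def __init__(self):
        self.rows = OffsetArray()

    def set(self, x, y, value):
        row = self.rows.get(y)
        if row is None:
            row = OffsetArray()
            self.rows[y] = row
        row[x] = value

    def get(self, x, y):
        row = self.rows.get(y)
        return None if row is None else row.get(x)

grid = Grid2D()
grid.set(-3, 2, "a")
grid.set(4, -7, "b")
print(grid.get(-3, 2), grid.get(4, -7), grid.get(0, 0))  # a b None
```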
You can use a spatial index like a quadtree or an R-tree.

Google Maps: Given a point, how to find all points at a given road distance?

In my app, the GPS picks up the location of the vehicle. It is then supposed to put markers at all points where the vehicle could be if it drives for 1 km in any direction (note that the roads may fork many times within its 1 km reach).
Can someone suggest how to do this? Thanks in advance.
This is a very tricky problem to solve with the Google Maps API. The following is one method that you may want to consider:
You can easily calculate a bounding circle of 1km around your GPS point, and it is also easy to calculate points that fall on the circumference of this circle, for any angle. This distance will be "as the crow flies" and not the actual road distance, but you may want to check out the following Stack Overflow post for a concrete implementation of this:
How to calculate the latlng of a point a certain distance away from another?
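For illustration, the points on the bounding circle can be computed with the standard great-circle destination formula. The sketch below is plain Python and does not use the Google Maps API. It assumes a spherical Earth with a mean radius of 6371 km, and the function names are made up.

```python
from math import asin, atan2, cos, degrees, radians, sin

EARTH_RADIUS_KM = 6371.0  # assumption: mean radius of a spherical Earth

def destination(lat, lng, bearing_deg, distance_km):
    """Point reached from (lat, lng) after distance_km along the given bearing."""
    lat1, lng1, brg = radians(lat), radians(lng), radians(bearing_deg)
    d = distance_km / EARTH_RADIUS_KM          # angular distance
    lat2 = asin(sin(lat1) * cos(d) + cos(lat1) * sin(d) * cos(brg))
    lng2 = lng1 + atan2(sin(brg) * sin(d) * cos(lat1),
                        cos(d) - sin(lat1) * sin(lat2))
    return degrees(lat2), degrees(lng2)

def circle_points(lat, lng, radius_km=1.0, step_deg=20):
    """Points on the bounding circle, one every step_deg degrees."""
    return [destination(lat, lng, b, radius_km) for b in range(0, 360, step_deg)]

markers = circle_points(51.5, -0.12)
print(len(markers))  # 18
```

Each of these points would then be snapped to the nearest road and checked with a routing service, as described in the rest of this answer.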
(Screenshot, no longer available: markers at 20 degree intervals on a bounding circle with a 1km radius.)
There is also a trick to snap these points to the nearest street. You can check out Mike Williams' Snap point to street examples for a good implementation of this.
Calculating the road distance from your GPS point to each snapped road point could be done with the directions service of the Google Maps API. Note that this will only work in countries that support directions in Google Maps, but more importantly, the road distance will almost always be greater than 1km, because our bounding circle has a 1km radius "as the crow flies". However if you can work with approximate information, this may already be one possible solution.
You can also consider starting with the above solution (1km bounding circle, calculate x points on the circumference, and snap them to the closest road), then calculate the road distance of each path (from your GPS point to each snapped point), and then repeat this recursively for each path, each time using a smaller bounding circle, until you reach a road distance close to 1km. You can decrease the bounding circle in each recursion, in proportion to the error margin, to make your algorithm more efficient.
UPDATE:
I found a very neat implementation which appears to be using a similar method to the one I described above:
Driving Radius (Multiple destinations)
Note how you can change the interval of degrees from the top. With a wide interval you'll get fast results, but you could easily miss a few routes.
(Screenshot of the Driving Radius example, no longer available.)
A natural brute force algorithm is to build a list of all possible nodes, taking into account each possible decision at every crossroad.
I doubt that within 1km you would get more than 10 crossroads on average, and assuming an average of 3 choices at a crossroad you would end up with 3^10 = 59,049 end nodes (notice that you need to have 10 crossroads on every branch of the road to reach the full number).
In reality the number would go down and I would assume getting to the same node by different route would not be uncommon, especially in cities.
This approach would give you an exact answer (provided you have a good street map as input). It takes exponential time, but the n does not seem to be that high, so it might be practical.
Further improvements and optimizations might be possible depending on what you need these nodes for (or which kinds of scenarios you would consider similar enough to prune).
Elaborating a bit on Daniel's approach above, you want to first find all the points within a straight line radius from your origin. That's your starting set of nodes. Now include ALL edges between those nodes and other nodes in your starting set. Now check that the nodes are connected and that there aren't any nodes floating around that you can't reach. Now create a "shortest path tree" starting from your vehicle node.
The tree will give you the shortest paths from your starting node to all other nodes. Note that if you start by creating paths at the furthest nodes, any sub-paths are also shortest paths to those nodes along the way. Make sure to label those nodes on sub-paths as you continue so you don't need to compute them. Worst case scenario, you need to develop a shortest path for all nodes, but in practice this should take much less time.
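A minimal sketch of such a shortest path tree with a distance cut-off, using Dijkstra's algorithm. The road graph format (`{node: [(neighbour, length_in_metres), ...]}`) and the node names are assumptions. Real data would have to come from a street map source.

```python
import heapq

def reachable_within(graph, source, limit):
    """Shortest road distance to every node that can be reached within `limit`."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue                       # stale heap entry
        for v, w in graph.get(u, []):
            nd = d + w
            # Never expand beyond the limit, so the search stays local.
            if nd <= limit and nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

roads = {
    "car": [("a", 400), ("b", 700)],
    "a": [("car", 400), ("c", 500), ("b", 200)],
    "b": [("car", 700), ("a", 200), ("d", 500)],
    "c": [("a", 500)],
    "d": [("b", 500)],
}
print(reachable_within(roads, "car", 1000))
# {'car': 0.0, 'a': 400.0, 'b': 600.0, 'c': 900.0}
```

The exact 1 km markers would lie partway along the edges that leave this reachable set (here, 400 m along the road from "b" to "d"), which can be found by interpolating along those edges.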
List all possible nodes, taking into account each possible decision at every crossroad.
(But how to do it automatically?)
Use Dijkstra's algorithm to find the shortest route to all points.
Visualize the data.
(That is a little bit tricky, because there can be unreachable areas inside the reachable area.)

Resources