Finding POIs that are near or contain a certain location - algorithm

I have an application that does the following:
Receives a device's location
Fetches a route (collection of POIs, or Points of Interest) assigned to that device
Determines if the device is near any of the POIs in the route
The route's POIs can be either a point with a radius, in which case it should detect if the device is within the radius of the point; or a polygon, where it should detect if the device is inside of it.
Here is a sample of a route with 3 POIs, two of them are points with different radii, and the other one is a polygon:
https://jsonblob.com/285c86cd-61d5-11e7-ae4c-fd99f61d20b8
My current algorithm is programmed in PHP with a MySQL database. When a device sends a new location, the script loads all the POIs for its route from the database into memory, and then iterates through them. For POIs that are points, it uses the Haversine formula to find if the device is within the POI's radius, and for POIs that are polygons it uses a "point in polygon" algorithm to find if the device is inside of it or not.
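For reference, the per-POI check described above boils down to something like this (sketched in Python here rather than the actual PHP; the function names and the POI dict layout are illustrative, not taken from the real code or the linked sample):

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points."""
    r = 6371000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def point_in_polygon(lat, lon, vertices):
    """Ray-casting test; vertices is a list of (lat, lon) tuples."""
    inside = False
    j = len(vertices) - 1
    for i in range(len(vertices)):
        yi, xi = vertices[i]
        yj, xj = vertices[j]
        if (yi > lat) != (yj > lat) and lon < (xj - xi) * (lat - yi) / (yj - yi) + xi:
            inside = not inside
        j = i
    return inside

def device_near_poi(lat, lon, poi):
    """poi is a hypothetical dict: either {'lat', 'lon', 'radius'} or {'polygon': [(lat, lon), ...]}."""
    if poi.get("radius") is not None:
        return haversine_m(lat, lon, poi["lat"], poi["lon"]) <= poi["radius"]
    return point_in_polygon(lat, lon, poi["polygon"])
```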
I would like to rewrite the algorithm with the goal of using less computing resources than the current one. We receive about 100 locations per second and they each have to be checked against routes that have about 40 POIs on average.
I can use any language and database to do so, which ones would you recommend for the best possible performance?

I'd use a database (e.g., PostgreSQL) that supports spatial queries.
That will let you create a spatial index that puts a bounding box around each POI. You can use this to do an initial check to (typically) eliminate the vast majority of POIs that aren't even close to the current position (i.e., where the current position isn't inside their bounding box).
Then when you've narrowed it down to a few POIs, you can test the few that are left using roughly the algorithm you're using now--but instead of testing 40 POIs per point, you might be testing only 2 or 3.
Exactly how well this will work will depend heavily upon how close to rectangular your POIs are. Circular is close enough that it tends to give pretty good results.
Other shapes vary--for example, a river that runs nearly north-south may work quite well. If you have a river that runs mostly diagonally, it may be worthwhile to break it up into a number of square/rectangular segments instead of treating the whole thing as a single feature, since the latter will create a bounding box with a lot of space that's quite a ways away from the river.
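As a rough sketch of what that query could look like with PostGIS (called from Python here; the table and column names are made up, and this assumes circle POIs are stored as a centre point plus a radius column and polygon POIs as polygons with a NULL radius):

```python
import psycopg2  # assumes a PostGIS-enabled PostgreSQL database

# Hypothetical schema:
#   CREATE TABLE pois (id serial, route_id int, geom geography, radius_m double precision);
#   CREATE INDEX pois_geom_idx ON pois USING gist (geom);
# A GiST index lets PostGIS do the bounding-box pre-filter described above; ST_DWithin /
# ST_Covers then do the exact check only on the few candidates that survive it.
SQL = """
SELECT id
FROM   pois
WHERE  route_id = %(route)s
  AND ( (radius_m IS NOT NULL
         AND ST_DWithin(geom, ST_SetSRID(ST_MakePoint(%(lon)s, %(lat)s), 4326)::geography, radius_m))
     OR (radius_m IS NULL
         AND ST_Covers(geom, ST_SetSRID(ST_MakePoint(%(lon)s, %(lat)s), 4326)::geography)) )
"""

def pois_hit(conn, route_id, lat, lon):
    """Return the ids of POIs in the route that the device is inside of or within range of."""
    with conn.cursor() as cur:
        cur.execute(SQL, {"route": route_id, "lat": lat, "lon": lon})
        return [row[0] for row in cur.fetchall()]

conn = psycopg2.connect("dbname=tracking")  # connection string is illustrative
print(pois_hit(conn, route_id=1, lat=40.0, lon=-3.7))
```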


Storage assignment algorithm

I'm trying to code a storage assignment algorithm but I'm not sure what the best way would be to model the warehouse the algorithm runs on.
The warehouse consists of shelves that can fit one item per storage location, walking ways and measuring points. Items can only be retrieved from the front of the storage locations, denoted with broken lines. (The image below is just a basic representation of the warehouse; in the final version it is supposed to be tested with various numbers of storage locations and SKUs.)
The idea is to measure the distance from a measuring point to a storage location for each SKU and minimize the overall distances.
The algorithm itself follows a two-step approach:
First, a simple greedy heuristic is used to find a feasible starting solution.
Second, the main algorithm is an adapted version of binary search that runs multiple binary-search iterations through the set of potentially optimal maximum-distance combinations obtained from the greedy step, and assigns the SKUs to the storage locations that minimize the objective value.
My basic idea was to model the warehouse as a graph, with arcs from each measuring point to the storage locations representing the distances, but I'm not 100% sure this makes sense.
So what are your ideas?
Disclaimer: The main idea is based on the paper 'Scattered Storage: How to Distribute Stock Keeping Units All Around a Mixed-Shelves Warehouse' published by Boysen & Weidinger in 2018.
Interesting problem. However, I think you might be looking at the problem from the wrong angle. Rather than searching for where to put each item, find the "cost function" from all measuring points to all storage locations. Then have each storage location store its cost. Then you throw all the (available) storage locations into a priority queue.
If you need a storage location, pull the next one from the priority queue. If a location frees up, add it back to the queue.
Make a grid graph that represents the paths that can be traversed. I.e.: if (0,0) is the top-left corner, then there is no direct connection between (0,1) and (1,1), as that storage location isn't accessible from the left side. However, there is a connection between (1,0) and (1,1). Once you have this, you can run shortest-path searches to find all the distances. You'll likely need to be able to mark squares as either a) measuring location, b) walkway or c) storage location.
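A minimal sketch of that setup, assuming a grid encoded as strings ('M' = measuring point, 'S' = storage location, '.' = walkway) and simplifying the access rule to "a storage cell can be reached from any adjacent walkway cell":

```python
import heapq
from collections import deque

MEASURE, STORAGE = "M", "S"

def distances_from(grid, start):
    """BFS step counts from one measuring point; storage cells can be reached but not walked through."""
    rows, cols = len(grid), len(grid[0])
    dist = {start: 0}
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in dist:
                dist[(nr, nc)] = dist[(r, c)] + 1
                if grid[nr][nc] != STORAGE:
                    queue.append((nr, nc))
    return dist

def storage_queue(grid):
    """Priority queue of storage locations keyed by their average distance to all measuring points."""
    measure = [(r, c) for r, row in enumerate(grid) for c, v in enumerate(row) if v == MEASURE]
    storage = [(r, c) for r, row in enumerate(grid) for c, v in enumerate(row) if v == STORAGE]
    tables = [distances_from(grid, m) for m in measure]
    pq = []
    for loc in storage:
        cost = sum(t.get(loc, float("inf")) for t in tables) / len(tables)
        heapq.heappush(pq, (cost, loc))
    return pq

pq = storage_queue(["M..M",
                    ".SS.",
                    ".SS.",
                    "M..M"])
print(heapq.heappop(pq))  # cheapest currently-free storage location
```

Enforcing the front-only access rule would just mean adding the edge into a storage cell only from the walkway cell in front of it, rather than from every neighbour.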
The cost function and related bits are the really tricky thing to get right in terms of "real world practicality". Here are some things to consider:
1. In simple terms you are just looking for the distances to all measuring stations from each storage location, and the cost is probably the average. In more complex terms, you may need to consider throughput. By this I mean it may not make sense to take something out of a storage location, put something back, take it out again, put it back again and so on. That may cause bottlenecks, as now everyone is trying to store stuff in the same general area and you might have traffic issues with too many people in the same area. In this case, you may need to add some "randomness" to the measurements. For example, the middle 2 have the same weight but are on opposite ends of the warehouse (in theory). It would be best if some randomness was used to ensure there is a 50:50 chance of either one being the next place an item goes. Though that alone isn't likely to be enough if this is a real issue.
2. You may not actually want to minimize distance to all measuring locations, practically speaking. There are likely cases where certain SKUs are more relevant, distance-wise, to certain measuring locations. In which case, you may want to bias the priority value in that direction. I.e.: an SKU that is almost always going to be moved to M1 should more likely be placed in a storage location closer to it. Of course, that requires something more complex than a priority queue to get working right, as you need to be able to search the available storage locations for the one closest to M1.
3. You may need to consider the order in which items can be stored. I.e.: if a storage location is 2 deep (all the ones you have are 1 deep), you may want to fill the location further back first. Though I suspect this probably isn't the issue.
4. Vertical storage locations. Once you have a 2D grid working, a 3D grid isn't significantly more complex to implement; useful if storage locations are actually multi-level, allowing items to be placed on different shelves (or just stacked). The issue here is just like #3: do you fill storage locations up top to bottom? Bottom to top? Or in random order? Of course, it's quite possible your needs are such that vertical storage isn't possible or simply impractical (too tall, too fragile, unstackable, no shelving, etc.).
#2 can be further enhanced by keeping note of which items are being fetched/placed. The system can also track how long it typically takes to fetch/place items, and direct placement of SKUs to other areas if others are expected to be in that area around the time the item is being fetched/placed in its storage location.

Dividing the world in a thousand or so locations

Background: I want to create a weather service, and since most available APIs limit the number of daily calls, I want to divide the planet in a thousand or so areas.
Obviously, internet users are not uniformly distributed, so the sampling should be finer around densely populated regions.
How should I go about implementing this?
Where can I find data regarding geographical internet user density?
The algorithm will probably be something similar to k-means. However, implementing it on a sphere with oceans may be a bit tricky. Any insight?
Finally, maybe there is a way I can avoid doing all of this?
Very similar to k-means is the centroidal Voronoi diagram (it is the continuous version of k-means). However, this would produce a uniform tessellation of your sphere that does not account for user density as you wish.
So a similar solution is the same technique but used with a power diagram: a power diagram is a Voronoi diagram that accounts for a density (by assigning a weight to each Voronoi seed). Such a diagram can be computed using an embedding in 3D space (instead of 2D) that consists of the first two (x, y) coordinates plus a third one which is the square root of [any large positive constant minus the weight for the given point].
Using that, you can obtain a tessellation of your domain that accounts for user density.
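If all you ultimately need is the assignment of sample user locations to areas (rather than the exact cell boundaries), you can also use the defining property of a power cell directly: a sample belongs to the seed minimising |p - s|^2 - w. A rough NumPy sketch, working with 3D unit vectors so the sphere is handled by re-normalising centroids; the sample points and weights are assumed to come from whatever user-density data you settle on:

```python
import numpy as np

def power_assign(samples, seeds, weights):
    """Assign each sample to the seed with the smallest power distance |p - s|^2 - w."""
    d2 = ((samples[:, None, :] - seeds[None, :, :]) ** 2).sum(axis=2)
    return np.argmin(d2 - weights[None, :], axis=1)

def lloyd_power(samples, seeds, weights, iterations=25):
    """Lloyd-style relaxation: move each seed to the (re-normalised) centroid of its power cell."""
    seeds = seeds.copy()
    for _ in range(iterations):
        labels = power_assign(samples, seeds, weights)
        for i in range(len(seeds)):
            members = samples[labels == i]
            if len(members):
                centroid = members.mean(axis=0)
                seeds[i] = centroid / np.linalg.norm(centroid)  # project back onto the unit sphere
    return seeds, power_assign(samples, seeds, weights)
```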
You don't care about internet user density in general. You care about the density of users using your service - and you don't care where those users are, you care about which locations they ask about. So once your site has been going for more than a day, you can use the locations people asked about the previous day to work out what the areas should be for the next day.
Dynamic programming on a tree is easy. What I would do for an algorithm is build a tree of successively more finely divided cells. More cells mean a smaller error, because people get predictions for points closer to them, and you can work out the error, or at least the relative error, between more cells and fewer cells. Starting from the bottom up, work out the smallest possible total error contributed by each subtree when it is allowed to be divided into up to 1, 2, 3, ..., N cells. You can work out the best possible division and smallest possible error for each k = 1..N for a node by looking at the smallest possible errors you have already calculated for each of its descendants, and working out how best to share out the available k divisions between them.
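A sketch of that bottom-up budget sharing (the tree, its cells, and each cell's error are assumed to come from however you subdivide the world, e.g. a quadtree; sharing a budget among a node's children is a small knapsack-style DP):

```python
from dataclasses import dataclass, field

@dataclass
class Cell:
    error: float                 # error if this whole cell is served by one forecast point
    children: list = field(default_factory=list)

def best_errors(node, budget):
    """best_errors(node, B)[k-1] = smallest subtree error using at most k cells, for k = 1..B."""
    res = [node.error] * budget                      # option: don't subdivide this node
    if node.children and budget >= len(node.children):
        tables = [best_errors(child, budget) for child in node.children]
        comb = [0.0]   # comb[j]: least error for the children handled so far using exactly j cells
        for table in tables:
            nxt = [float("inf")] * (len(comb) + budget)
            for used, err in enumerate(comb):
                if err == float("inf"):
                    continue
                for k in range(1, budget + 1):
                    nxt[used + k] = min(nxt[used + k], err + table[k - 1])
            comb = nxt
        for k in range(len(node.children), budget + 1):
            res[k - 1] = min(res[k - 1], comb[k])
    return res
```

Calling best_errors(root, 1000)[-1] gives the smallest achievable total error with up to 1000 areas; to recover the actual division you would also record which choice produced each minimum.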
I would try to avoid doing this by thinking of a different idea. Depending on the way you look at life, there are at least two disadvantages of this:
1) You don't seem to be adding anything to the party. It looks like you are interposing yourself between organizations that actually make weather forecasts and their clients. Organizations lose direct contact with their clients, which might for instance lose them advertising revenue. Customers get a poorer weather forecast.
2) Most sites have legal terms of service, which most clients can ignore without worrying. My guess is that you would be breaking those terms of service, and if your service gets popular enough to be noticed, they will be enforced against you.

Techniques to evaluate the "twistiness" of a road in Google Maps?

As per the title. I want to, given a Google Maps URL, generate a twistiness rating based on how winding the roads are. Are there any techniques available I can look into?
What do I mean by twistiness? Well, I'm not sure exactly. I suppose it's characterized by a high turn-to-distance ratio, as well as a high angle-change-per-turn figure. I'd also say that the elevation change of a road comes into it as well.
I think that once you know exactly what you want to measure, the implementation is quite straightforward.
I can think of several measurements:
the ratio of the road length to the distance between start and end (this would make a long single curve "twisty", so it is most likely not the complete answer)
the number of inflection points per unit length (this would make an almost straight road with a lot of little swaying "twisty", so it is most likely not the complete answer)
These two could be combined by multiplication, so that you would have:
road-length * inflection-points
--------------------------------------
start-end-distance * road-length
You can see that this can be shortened to "inflection-points per start-end-distance", which does seem like a good indicator for "twistiness" to me.
As for taking elevation into account, I think that making the whole calculation in three dimensions is enough for a first attempt.
You might want to handle left-right inflections separately from up-down inflections, though, in order to make it possible to scale the elevation inflections by some factor.
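A minimal 2D sketch of those measurements, assuming the road centreline comes in as projected (x, y) coordinates with distinct start and end points; the 3D/elevation variant would add a z component and a separate up-down inflection count:

```python
import math

def twistiness(points):
    """points: list of (x, y) vertices along the road centreline, e.g. in metres."""
    road_len = sum(math.dist(a, b) for a, b in zip(points, points[1:]))
    straight = math.dist(points[0], points[-1])
    # Inflection points: sign changes of the turn direction, taken from the z component
    # of the cross product of successive segment vectors.
    inflections, prev_sign = 0, 0
    for a, b, c in zip(points, points[1:], points[2:]):
        cross = (b[0] - a[0]) * (c[1] - b[1]) - (b[1] - a[1]) * (c[0] - b[0])
        sign = (cross > 0) - (cross < 0)
        if sign and prev_sign and sign != prev_sign:
            inflections += 1
        if sign:
            prev_sign = sign
    return {
        "length_ratio": road_len / straight,               # first measurement
        "inflections_per_length": inflections / road_len,  # second measurement
        "combined": inflections / straight,                # their product, simplified
    }
```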
Try http://www.hardingconsultants.co.nz/transportationconference2007/images/Presentations/Technical%20Conference/L1%20Megan%20Fowler%20Canterbury%20University.pdf as a starting point.
I'd assume that you'd have to somehow capture the road centreline from Google Maps as a vectorised dataset & analyse using GIS software to do what you describe. Maybe do a screen grab then a raster-to-vector conversion to start with.
Cumulative turn angle per Km is a commonly-used measure in road assessment. Vertex density is also useful. Note that these measures depend upon an assumption that vertices have been placed at some form of equal density along the line length whilst they were captured, rather than being manually placed. Running a GIS tool such as a "bendsimplify" algorithm on the line should solve this. I have written scripts in Python for ArcGIS 10 to define these measures if anyone wants them.
Sinuosity is sometimes used for measuring bends in rivers - see the help pages for Hawth's Tools for ArcGIS for a good description. It could be misleading for roads that have major changes in course along their length, though.
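For what it's worth, both measures are straightforward to compute once you have the vertices; a sketch assuming projected coordinates in metres and reasonably even vertex spacing:

```python
import math

def road_measures(points):
    """points: (x, y) vertices of the road centreline in metres."""
    length = sum(math.dist(a, b) for a, b in zip(points, points[1:]))
    sinuosity = length / math.dist(points[0], points[-1])
    turn = 0.0
    for a, b, c in zip(points, points[1:], points[2:]):
        h1 = math.atan2(b[1] - a[1], b[0] - a[0])
        h2 = math.atan2(c[1] - b[1], c[0] - b[0])
        d = abs(h2 - h1)
        turn += min(d, 2 * math.pi - d)   # smallest absolute heading change at this vertex
    turn_per_km = math.degrees(turn) / (length / 1000.0)
    return turn_per_km, sinuosity         # cumulative turn angle per km, sinuosity
```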

Sort POIs by distance from current location

Trover is an awesome app: it shows you a stream of discoveries (POIs) people have uploaded - sorted by the distance from any location you specify (usually your current location). The further you scroll through the feed, the farther away the displayed discoveries are. An indicator tells you quite accurately how far the currently shown discoveries are (see the screenshots on their website).
This is different from most other location-based apps that deliver their results (POIs) based on fixed regions (e.g. give me all pizzerias within a 10 km radius), which can be implemented using a single spatial data structure (or an SQL engine supporting spatial data types). Delivering the results the way Trover does is considerably harder:
You can query POIs for arbitrary locations. Give Trover a location in the far East of Russia and it will deliver discoveries where the first one is 2000km away and continuously increasing from there.
The result list of POIs is not limited by some spatial range. If you scroll long enough through the feed you will probably see discoveries which are on the other side of the globe.
The above points require a semi-strict ordering of their POIs for any location. The fact that you can scroll down and reload more discoveries implies that they can deliver specific sections of the sorted data (e.g. give me the next 20 discoveries that are at least 100km away from my current location).
It's fast: the fetching and distance indications are instant. The discoveries must be pre-sorted. I don't know how many discoveries they have in their DB, but it must be more than you'd want to sort ad hoc.
I find these characteristics quite remarkable and wonder how this is implemented. Any suggestions what kind of data-structure, algorithms or caching might be used?
I don't get the question. What do you want an answer to?
Edit:
They might use a graph database where each edge represents the distance between two nodes. That way you can get the distance via the relationships of nearby POIs. You would calculate the distance and create edges to nearby nodes. To get the distance from an arbitrary point you just do a great-circle distance calculation to the nearest node; for another node you just add up the edge values, as they represent the distance (this is for the case of getting a walking, biking, or car calculation). The adding up might not give the exact shortest distance, but it will give a relative indication, which it seems like they use.
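One way to read that, sketched with plain dictionaries instead of an actual graph database (the edge list between nearby POIs is assumed to be precomputed; the accumulated values are only a relative indication, as noted above):

```python
import heapq
import math

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = math.sin((lat2 - lat1) / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * math.asin(math.sqrt(h))

def ranked_pois(graph, coords, query):
    """graph: {poi: [(neighbour, edge_km), ...]}, coords: {poi: (lat, lon)}.
    Returns POIs ordered by circle distance to the nearest node plus accumulated edge distances."""
    start = min(coords, key=lambda p: haversine_km(query, coords[p]))
    dist = {start: haversine_km(query, coords[start])}
    heap = [(dist[start], start)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue
        for neighbour, edge_km in graph.get(node, []):
            if d + edge_km < dist.get(neighbour, float("inf")):
                dist[neighbour] = d + edge_km
                heapq.heappush(heap, (d + edge_km, neighbour))
    return sorted(dist.items(), key=lambda item: item[1])
```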

Fast algorithm for line of sight calculation in an RTS game

I'm making a simple RTS game. I want it to run very fast because it should work with thousands of units and 8 players.
Everything seems to work flawlessly, but it seems the line of sight calculation is a bottleneck. It's simple: if an enemy unit is closer than any of my units' LOS range, it will be visible.
Currently I use a quite naive algorithm: for every enemy unit I check whether any of my units can see it. It's O(n^2).
So if there are 8 players and they have 3000 units each, that would mean 3000*21000 = 63,000,000 tests per player in the worst case. Which is quite slow.
More details: it's a stupidly simple 2D space RTS: no grid, units move along straight lines everywhere, and there is no collision, so they can move through each other. So even hundreds of units can be at the same spot.
I want to speed up this LOS algorithm somehow. Any ideas?
EDIT:
So additional details:
I meant a single player can have up to 3000 units.
My units have radars, so they see in all directions equally.
Use a spatial data structure to efficiently look up units by location.
Additionally, if you only care whether a unit is visible, but not which unit spotted it, you can do:
    for each unit:
        mark the regions this unit sees in your spatial data structure
and have:
    isVisible(unit) = isVisible(region(unit))
A very simple spatial data structure is the grid: you overlay a coarse grid over the playing field. The regions are this grid's cells. You allocate an array of regions, and for each region keep a list of units presently in that region.
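A rough sketch of that grid (the cell size is an arbitrary choice here; because the marked region is a square over-approximation of the circular LOS, candidates from the coarse pass still need an exact range check):

```python
CELL = 64.0   # grid cell size; tune it so a LOS radius spans only a few cells

def cell_of(pos):
    return (int(pos[0] // CELL), int(pos[1] // CELL))

def seen_cells(my_units, los_range):
    """Mark every cell that could be within LOS range of one of my units."""
    seen = set()
    reach = int(los_range // CELL) + 1
    for unit in my_units:
        cx, cy = cell_of(unit)
        for dx in range(-reach, reach + 1):
            for dy in range(-reach, reach + 1):
                seen.add((cx + dx, cy + dy))
    return seen

def visible_candidates(enemy_units, seen):
    """isVisible(region(unit)) coarse pass; follow up with an exact distance check per candidate."""
    return [enemy for enemy in enemy_units if cell_of(enemy) in seen]
```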
You may also find Muki Haklay's demonstration of spatial indexes useful.
One of the most fundamental rules in gamedev is to optimize the bejeebers out of your algorithms by exploiting all possible constraints your gameplay defines - this is the main reason you don't see wildly different games built on top of any given company's game engine; they've exploited their constraints so efficiently that they can't deal with anything that isn't within those constraints.
That said, you said that units move in straight lines - and you say that players can have 3000 units. Even if I assume that's 3000 units across eight players, that's 375 units per player, so I think I'm safe in assuming that on each step of game play (and I am assuming that each step involves the calculation you describe above) more units will keep their direction than change it.
So, if this is true, then you want to divide all your pieces into two groups - those that did change direction in the last step, and those that did not.
For those that did, you need to do a bit of calculating. For units of any two opposing forces, you want to ask 'when will unit A see unit B, given that neither unit A nor unit B changes direction or speed?' (you can deal with acceleration/deceleration, but then it gets more complicated). To calculate this you first need to determine whether the vectors that unit A and unit B are travelling on will intersect (a simple 2D line intersection calculation, combined with a calculation that tells you when each unit hits this intersection). If they don't, and they can't see each other now, then they never will see each other unless at least one of them changes direction. If they do intersect, then you need to calculate the time differential between when the first and second unit pass through the point of intersection - if the separation implied by this differential is greater than the LOS range, then these units will never see each other unless one changes direction; if it is less than the LOS range, then a few more (wave hands vigorously) calculations will tell you when this blessed event will take place.
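One way to do the 'when will they see each other' part is a closest-approach calculation on the relative motion (positions and velocities as 2D tuples, constant velocity assumed, as above):

```python
import math

def first_sight_time(pos_a, vel_a, pos_b, vel_b, los_range):
    """Earliest t >= 0 at which the distance between A and B drops to los_range, or None if it never does."""
    px, py = pos_b[0] - pos_a[0], pos_b[1] - pos_a[1]   # relative position
    vx, vy = vel_b[0] - vel_a[0], vel_b[1] - vel_a[1]   # relative velocity
    a = vx * vx + vy * vy
    b = 2 * (px * vx + py * vy)
    c = px * px + py * py - los_range * los_range
    if c <= 0:
        return 0.0                    # already within LOS range
    if a == 0:
        return None                   # identical velocities: the gap never changes
    disc = b * b - 4 * a * c
    if disc < 0:
        return None                   # closest approach stays outside LOS range
    t = (-b - math.sqrt(disc)) / (2 * a)
    return t if t >= 0 else None      # negative root: the encounter would have been in the past
```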
Now, what you have is a collection of information bifurcated into elements that never will see each other and elements that will see each other at some time t in the future - each step, you simply deal with the units that have changed direction and compute their interactions with the rest of the units. (Oh, and deal with those units that previous calculations told you would come into view of each other - remember to keep these in an insertable ordered structure) What you've effectively done is exploited the linear behavior of the system to change your question from 'Does unit A see unit B' to 'When will unit A see unit B'
Now, all of that said, this isn't to discount the spatial data structure answer - it's a good answer - however, it is also capable of dealing with units in random motion, so you want to consider how to optimize this process further - you also need to be careful about dealing with cross region visibility, i.e. units at the borders of two different regions may be able to see each other - if you have pieces that tend to clump up, using a spatial data structure with variable dimensions might be the answer, where pieces that are not in the same region are guaranteed not to be able to see each other.
I'd do this with a grid. I think that's how commercial RTS games solve the problem.
Discretize the game world for the visibility tracker. (Square grid is easiest. Experiment with the coarseness to see what value works best.)
Record the present units in each area. (Update whenever a unit moves.)
Record the areas each player sees. (This has to be updated as units move. The unit could just poll to determine its visible tiles. Or you could analyze the map before the game starts.)
Make a list (or whatever structure is fitting) for the enemy units seen by each player.
Now whenever a unit goes from one area of visibility to another, perform a check:
Went from an unseen to a seen area - add the unit to the player's visibility tracker.
Went from a seen to an unseen area - remove the unit from the player's visibility tracker.
In the other two cases no visibility change occurred.
This is fast but takes some memory. However, with BitArrays and Lists of pointers, the memory usage shouldn't be that bad.
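The per-move bookkeeping then reduces to the four cases above; a sketch (names are illustrative):

```python
def on_unit_moved(unit, old_area, new_area, areas_seen_by_player, visible_enemies):
    """areas_seen_by_player: set of areas the player currently sees; visible_enemies: set of spotted units."""
    if old_area == new_area:
        return
    was_seen = old_area in areas_seen_by_player
    now_seen = new_area in areas_seen_by_player
    if now_seen and not was_seen:
        visible_enemies.add(unit)       # went from an unseen to a seen area
    elif was_seen and not now_seen:
        visible_enemies.discard(unit)   # went from a seen to an unseen area
    # seen -> seen and unseen -> unseen: no visibility change
```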
There was an article about this in one of the Game Programming Gems books (one of the first three, I think).
