Trover is an awesome app: it shows you a stream of discoveries (POIs) people have uploaded, sorted by distance from any location you specify (usually your current location). The further you scroll through the feed, the farther away the displayed discoveries are. An indicator tells you quite accurately how far away the currently shown discoveries are (see the screenshots on their website).
This is different from most other location-based apps that deliver their results (POIs) based on fixed regions (e.g. give me all pizzerias within a 10km radius), which can be implemented using a single spatial data structure (or an SQL engine supporting spatial data types). Delivering the results the way Trover does is considerably harder:
You can query POIs for arbitrary locations. Give Trover a location in the far East of Russia and it will deliver discoveries where the first one is 2000km away and continuously increasing from there.
The result list of POIs is not limited by some spatial range. If you scroll long enough through the feed you will probably see discoveries which are on the other side of the globe.
The above points require a semi-strict ordering of their POIs for any location. The fact that you can scroll down and reload more discoveries implies that they can deliver specific sections of the sorted data (e.g. give me the next 20 discoveries that are at least 100km away from my current location).
It's fast: the fetching and the distance indications are instant. The discoveries must be pre-sorted. I don't know how many discoveries they have in their DB, but it must be more than you'd want to sort ad hoc.
I find these characteristics quite remarkable and wonder how this is implemented. Any suggestions as to what kinds of data structures, algorithms or caching might be used?
I don't get the question. What do you want an answer to?
Edit:
They might use a graph database where each edge represents the distance between two nodes. That way you can get the distance via the relationships of nearby POIs: you calculate the distance once and create edges to nearby nodes. To get the distance from an arbitrary point you just do a great-circle distance calculation; for another node you add up the edge values, since they represent distances (this is for the case of walking, biking or car distances). Adding up edges might not give the shortest route, but it will give a relative indication, which seems to be what they use.
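For what it's worth, a simpler way to get the "next N discoveries sorted by distance from an arbitrary point" behaviour is a spatial index over all POIs rather than a graph. The sketch below is only a guess at the general technique, not how Trover actually does it: it uses SciPy's KD-tree on 3D unit-sphere coordinates, and the page_of_pois helper and the sample POIs are made up for illustration.

# Hypothetical sketch: paged nearest-neighbour queries over POIs with a KD-tree.
import numpy as np
from scipy.spatial import cKDTree

EARTH_RADIUS_KM = 6371.0

def to_unit_xyz(lat_deg, lon_deg):
    """Convert lat/lon in degrees to points on the unit sphere."""
    lat, lon = np.radians(lat_deg), np.radians(lon_deg)
    return np.column_stack((np.cos(lat) * np.cos(lon),
                            np.cos(lat) * np.sin(lon),
                            np.sin(lat)))

# Pretend these are all POIs in the database (lat, lon).
poi_latlon = np.array([[48.8566, 2.3522],    # Paris
                       [51.5074, -0.1278],   # London
                       [40.7128, -74.0060],  # New York
                       [35.6895, 139.6917]]) # Tokyo
tree = cKDTree(to_unit_xyz(poi_latlon[:, 0], poi_latlon[:, 1]))

def page_of_pois(lat, lon, offset, page_size):
    """Return (poi_index, distance_km) pairs offset..offset+page_size, sorted by distance."""
    query = to_unit_xyz(np.array([lat]), np.array([lon]))[0]
    k = min(offset + page_size, len(poi_latlon))
    chord, idx = tree.query(query, k=k)
    chord, idx = np.atleast_1d(chord), np.atleast_1d(idx)
    # Convert straight-line (chord) distance on the unit sphere to great-circle km.
    dist_km = 2.0 * EARTH_RADIUS_KM * np.arcsin(np.clip(chord / 2.0, 0.0, 1.0))
    return list(zip(idx[offset:], dist_km[offset:]))

# "Give me the next 2 discoveries" for a user in Vladivostok:
print(page_of_pois(43.1155, 131.8855, offset=0, page_size=2))

Paging is done by asking for offset + page_size nearest neighbours and slicing, which stays cheap unless users scroll absurdly far; a production system would more likely run an equivalent query against a geohash- or R-tree-backed database.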
Related
I'm trying to code a storage assignment algorithm but I'm not sure what the best way would be to model the warehouse the algorithm runs on.
The warehouse consists of shelves that can fit one item per storage location, walking ways and measuring points. Items can only be retrieved from the front of the storage locations, denoted with broken lines. (The image below is just a basic representation of the warehouse; in the final version the algorithm is supposed to be tested with various numbers of storage locations and SKUs.)
The idea is to measure the distance from a measuring point to a storage location for each SKU and minimize the overall distances.
The algorithm itself follows a two-step approach:
First, a simple greedy heuristic is used to find a feasible starting solution.
Second, the main algorithm is an adapted version of binary search that runs multiple binary-search iterations through the set of potentially optimal maximum-distance combinations obtained from the greedy step and assigns the SKUs to the storage locations that minimize the objective value.
My basic idea was to model the warehouse as a graph, with arcs from each measuring point to the storage locations representing the distances, but I'm not 100% sure this makes sense.
So what are your ideas?
Disclaimer: The main idea is based on the paper 'Scattered Storage: How to Distribute Stock Keeping Units All Around a Mixed-Shelves Warehouse' published by Boysen & Weidinger in 2018.
Interesting problem. However, I think you might be looking at the problem from the wrong angle. Rather than searching for where to put each item, compute a "cost function" from all measuring points to all storage locations, have each storage location store its cost, and then throw all the (available) storage locations into a priority queue.
If you need a storage location, pull the next one from the priority queue. If a location frees up, add it back to the queue.
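As a rough illustration of the cost-plus-priority-queue idea (the location names and costs below are made up; in practice the costs would come from the grid distances described in the next paragraph):

# Hypothetical sketch of the "cheapest free storage location first" idea.
import heapq

# cost = e.g. average walking distance from all measuring points (made-up numbers).
location_costs = {"A1": 12.5, "A2": 14.0, "B1": 9.0, "B2": 9.0}

free_locations = [(cost, loc) for loc, cost in location_costs.items()]
heapq.heapify(free_locations)  # min-heap: cheapest location on top

def assign_next_sku():
    """Pop the cheapest free storage location for the next incoming SKU."""
    cost, loc = heapq.heappop(free_locations)
    return loc

def release_location(loc):
    """A location was emptied; put it back into the pool."""
    heapq.heappush(free_locations, (location_costs[loc], loc))

print(assign_next_sku())  # -> B1 (B2 has the same cost; ties break on the name here)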
Make a grid graph that represents the paths that can be traversed. E.g., if (0,0) is the top-left corner, then there is no direct connection between (0,1) and (1,1), because that storage location isn't accessible from the left side; however, there is a connection between (1,0) and (1,1). Once you have this, you can run an all-pairs shortest-path computation to find all the distances. You'll likely need to be able to mark squares as either a) measuring point, b) walkway or c) storage location.
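A minimal sketch of that grid idea, assuming a made-up layout where 'M' is a measuring point, 'W' is a walkway and 'S' is a storage cell. For simplicity the sketch lets a storage cell be entered from any adjacent walkway; restricting access to the front of a shelf would just need a per-cell allowed-direction check. One BFS per measuring point gives all the distances the cost function needs.

# Hypothetical grid: 'M' = measuring point, 'W' = walkway, 'S' = storage location.
from collections import deque

GRID = ["MWWW",
        "WSSW",
        "WSSW",
        "WWWM"]
ROWS, COLS = len(GRID), len(GRID[0])

def distances_from(start):
    """BFS from a measuring point; storage cells get the distance of the step into them
    but are not walked through."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if not (0 <= nr < ROWS and 0 <= nc < COLS) or (nr, nc) in dist:
                continue
            dist[(nr, nc)] = dist[(r, c)] + 1
            if GRID[nr][nc] != "S":          # only walkways/measuring points are traversable
                queue.append((nr, nc))
    return dist

# Distances from the top-left measuring point to every reachable cell, storage included:
print(distances_from((0, 0)))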
The cost function and related bits are the really tricky thing to get right in terms of "real-world practicality". Here are some things to consider:
1) In simple terms you are just looking for the distances from each storage location to all measuring stations, and the cost is probably the average. In more complex terms, you may need to consider throughput. By this I mean it may not make sense to take something out of a storage location, put something back, take it out again, put it back again, and so on. That can cause bottlenecks, since now everyone is trying to store stuff in the same general area and you might have traffic issues with too many people in the same place. In this case, you may need to add some "randomness" to the measurements. For example, two locations in the middle might have the same cost but be on opposite ends of the warehouse (in theory); it would be best if some randomness ensured a 50:50 chance of either one being the next place an item goes. Though that alone isn't likely to be enough if this is a real issue.
2) Practically speaking, you may not actually want to minimize the distance to all measuring locations. There are likely cases where certain SKUs are more relevant, distance-wise, to certain measuring locations, in which case you may want to bias the priority value in that direction. E.g., an SKU that is almost always going to be moved to M1 should preferably be placed in a storage location close to M1. Of course, that requires something more complex than a single priority queue to get working right, as you need to be able to search the available storage locations for the one closest to M1.
3) You may need to consider the order in which items can be stored. E.g., if a storage location is 2 deep (all the ones you have are 1 deep), you may want to fill the spot further back first. Though I suspect this probably isn't the issue here.
4) Vertical storage locations. Once you have a 2D grid working, a 3D grid isn't significantly more complex to implement, which is useful if storage locations actually have multiple levels, allowing items to be placed on different shelves (or just stacked). The issue here is just like #3: do you fill storage locations top to bottom? Bottom to top? In random order? Of course, it's quite possible your needs are such that vertical storage isn't possible or simply impractical (too tall, too fragile, unstackable, no shelving, etc.).
Regarding #2: this can be further enhanced by keeping note of which items are being fetched/placed. The system can also track how long it typically takes to fetch/place items, and direct the placement of SKUs to other areas if others are expected to be in that area around the time the item is being fetched from/placed in its storage location.
I have an application that does the following:
Receives a device's location
Fetches a route (collection of POIs, or Points of Interest) assigned to that device
Determines if the device is near any of the POIs in the route
The route's POIs can be either a point with a radius, in which case it should detect if the device is within the radius of the point; or a polygon, where it should detect if the device is inside of it.
Here is a sample of a route with 3 POIs: two of them are points with different radii, and the other one is a polygon:
https://jsonblob.com/285c86cd-61d5-11e7-ae4c-fd99f61d20b8
My current algorithm is programmed in PHP with a MySQL database. When a device sends a new location, the script loads all the POIs for its route from the database into memory, and then iterates through them. For POIs that are points, it uses the Haversine formula to find if the device is within the POI's radius, and for POIs that are polygons it uses a "point in polygon" algorithm to find if the device is inside of it or not.
I would like to rewrite the algorithm with the goal of using less computing resources than the current one. We receive about 100 locations per second and they each have to be checked against routes that have about 40 POIs on average.
I can use any language and database to do so, which ones would you recommend for the best possible performance?
I'd use a database that supports spatial queries (e.g., PostgreSQL with the PostGIS extension).
That will let you create a spatial index that puts a bounding box around each POI. You can use this to do an initial check to (typically) eliminate the vast majority of POIs that aren't even close to the current position (i.e., where the current position isn't inside their bounding box).
Then, when you've narrowed it down to a few POIs, you can test the few that are left using roughly the algorithm you use now, but instead of testing 40 POIs per point you might be testing only 2 or 3.
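A spatial database does the bounding-box step for you via its index, but the principle is easy to sketch in plain code. This is only an illustration with made-up POIs and hand-written helpers, not the PostGIS API: precompute a lat/lon bounding box per POI, reject on that first, and only run Haversine or point-in-polygon on the survivors.

# Hypothetical two-phase check: cheap bounding-box filter, then exact test.
import math

EARTH_RADIUS_M = 6371000.0

def haversine_m(lat1, lon1, lat2, lon2):
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def point_in_polygon(lat, lon, ring):
    """Ray casting on lat/lon pairs; fine for small polygons away from the poles."""
    inside = False
    for (lat1, lon1), (lat2, lon2) in zip(ring, ring[1:] + ring[:1]):
        if (lon1 > lon) != (lon2 > lon):
            t = (lon - lon1) / (lon2 - lon1)
            if lat < lat1 + t * (lat2 - lat1):
                inside = not inside
    return inside

# Made-up POIs: each carries a precomputed bounding box (min_lat, min_lon, max_lat, max_lon).
pois = [
    {"id": 1, "kind": "circle", "center": (19.43, -99.13), "radius_m": 500,
     "bbox": (19.42, -99.14, 19.44, -99.12)},
    {"id": 2, "kind": "polygon",
     "ring": [(19.40, -99.20), (19.41, -99.20), (19.41, -99.19), (19.40, -99.19)],
     "bbox": (19.40, -99.20, 19.41, -99.19)},
]

def matches(lat, lon):
    hits = []
    for poi in pois:
        min_lat, min_lon, max_lat, max_lon = poi["bbox"]
        if not (min_lat <= lat <= max_lat and min_lon <= lon <= max_lon):
            continue  # cheap rejection; this is what the spatial index does for you
        if poi["kind"] == "circle":
            if haversine_m(lat, lon, *poi["center"]) <= poi["radius_m"]:
                hits.append(poi["id"])
        elif point_in_polygon(lat, lon, poi["ring"]):
            hits.append(poi["id"])
    return hits

print(matches(19.431, -99.131))  # -> [1]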
Exactly how well this will work will depend heavily upon how close to rectangular your POIs are. Circular is close enough that it tends to give pretty good results.
Others may depend--for example, a river that runs nearly north and south may work quite well. If you have a river that runs mostly diagonally, it may be worthwhile to break it up into a number of square/rectangular segments instead of treating the whole thing as a single feature, since the latter will create a bounding box with a lot of space that's quite a ways away from the river.
Background: I want to create a weather service, and since most available APIs limit the number of daily calls, I want to divide the planet in a thousand or so areas.
Obviously, internet users are not uniformly distributed, so the sampling should be finer around densely populated regions.
How should I go about implementing this?
Where can I find data regarding geographical internet user density?
The algorithm will probably be something similar to k-means. However, implementing it on a sphere with oceans may be a bit tricky. Any insight?
Finally, maybe there is a way I can avoid doing all of this?
Very similar to k-means is the centroidal Voronoi diagram (it is the continuous version of k-means). However, this would produce a uniform tessellation of your sphere that does not account for user density as you wish.
So a similar solution is the same technique but with a power diagram: a power diagram is a Voronoi diagram that accounts for a density (by assigning a weight to each Voronoi seed). Such a diagram can be computed using an embedding in 3D space (instead of 2D) that consists of the first two (x, y) coordinates plus a third one, which is the square root of [some large positive constant minus the weight of the given point].
Using that, you can obtain a tessellation of your domain that accounts for user density.
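Computing an actual power diagram is somewhat involved. As a simpler stand-in for "a tessellation that follows user density", note that scikit-learn's k-means accepts per-sample weights, so you can cluster points on the unit sphere weighted by estimated density. The data below is random and purely illustrative:

# Hypothetical sketch: density-weighted k-means on the unit sphere
# (a simpler stand-in for a centroidal power diagram).
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Made-up "user locations" (lat, lon in degrees) and per-location weights
# standing in for internet-user density.
lat = rng.uniform(-60, 70, size=5000)
lon = rng.uniform(-180, 180, size=5000)
weights = rng.exponential(scale=1.0, size=5000)   # denser regions -> larger weights

# Embed on the unit sphere so clustering respects geography, not the lat/lon wrap-around.
lat_r, lon_r = np.radians(lat), np.radians(lon)
xyz = np.column_stack((np.cos(lat_r) * np.cos(lon_r),
                       np.cos(lat_r) * np.sin(lon_r),
                       np.sin(lat_r)))

km = KMeans(n_clusters=1000, n_init=4, random_state=0)
km.fit(xyz, sample_weight=weights)   # weights pull centroids toward dense regions

print(km.cluster_centers_.shape[0], "areas")

The resulting cluster centres are the points you would query the weather API for; dense regions end up with more, smaller areas.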
You don't care about internet user density in general. You care about the density of users using your service - and you don't care where those users are, you care about the places they ask about. So once your site has been going for more than a day, you can use the locations people asked about the previous day to work out what the areas should be for the next day.
Dynamic programming on a tree is easy. What I would do for an algorithm is build a tree of successively more finely divided cells. More cells mean a smaller error, because people get predictions for points closer to them, and you can work out the error, or at least the relative error between more cells and fewer cells. Starting from the bottom up, work out the smallest possible total error contributed by each subtree when it is allowed to be divided into up to 1, 2, 3, ..., N cells. For each node you can work out the best possible division and smallest possible error for each k = 1..N by looking at the smallest possible errors you have already calculated for each of its children, and working out how best to share the available k divisions between them.
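Here is a rough sketch of that bottom-up calculation. The quadtree construction, the squared-distance error measure and the sample query points are my own illustrative choices; the DP itself is the "share k cells among the children" idea described above, with best[k] meaning the smallest error a subtree can achieve using at most k cells.

# Hypothetical sketch: bottom-up DP over a quadtree of query locations.
# Error of a cell = sum of squared distances (in degrees, for simplicity)
# of its points to the cell centroid.
import numpy as np

MAX_DEPTH = 4

class Node:
    def __init__(self, pts, depth):
        self.children = []
        centroid = pts.mean(axis=0)
        self.error = float(((pts - centroid) ** 2).sum())
        if depth < MAX_DEPTH and len(pts) > 1:
            mid = np.median(pts, axis=0)
            for in_north in (True, False):
                for in_east in (True, False):
                    mask = ((pts[:, 0] >= mid[0]) == in_north) & \
                           ((pts[:, 1] >= mid[1]) == in_east)
                    if mask.any():
                        self.children.append(Node(pts[mask], depth + 1))

def best_errors(node, max_cells):
    """best[k] = minimal error when this subtree may use at most k cells (k = 1..max_cells)."""
    best = np.full(max_cells + 1, np.inf)
    best[1:] = node.error                       # always allowed: keep it as a single cell
    if node.children:
        child_tables = [best_errors(c, max_cells) for c in node.children]
        # Knapsack over children: combined[k] = best way to split a budget of k cells.
        combined = child_tables[0].copy()
        for table in child_tables[1:]:
            new = np.full(max_cells + 1, np.inf)
            for k in range(2, max_cells + 1):
                for j in range(1, k):           # j cells used so far, k - j for this child
                    new[k] = min(new[k], combined[j] + table[k - j])
            combined = new
        best = np.minimum(best, combined)
    return best

# Made-up query locations, clustered around two "cities".
rng = np.random.default_rng(1)
pts = np.vstack([rng.normal([48.0, 2.0], 0.5, size=(300, 2)),
                 rng.normal([35.0, 139.0], 0.5, size=(300, 2))])
root = Node(pts, depth=0)
table = best_errors(root, max_cells=8)
for k in range(1, 9):
    print(f"{k} cells -> total error {table[k]:.1f}")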
I would try to avoid doing this by thinking of a different idea. Depending on the way you look at life, there are at least two disadvantages of this:
1) You don't seem to be adding anything to the party. It looks like you are interposing yourself between organizations that actually make weather forecasts and their clients. Organizations lose direct contact with their clients, which might for instance lose them advertising revenue. Customers get a poorer weather forecast.
2) Most sites have legal terms of service, which most clients can ignore without worrying. My guess is that you would be breaking those terms of service, and if your service gets popular enough to be noticed they will be enforced against you.
As per the title. I want to, given a Google maps URL, generate a twistiness rating based on how windy the roads are. Are there any techniques available I can look into?
What do I mean by twistiness? Well, I'm not sure exactly. I suppose it's characterized by a high turn-to-distance ratio, as well as a high angle-change-per-turn number. I'd also say that the elevation change of a road comes into it as well.
I think that once you know exactly what you want to measure, the implementation is quite straightforward.
I can think of several measurements:
the ratio of the road length to the distance between start and end (this would make a long single curve "twisty", so it is most likely not the complete answer)
the number of inflection points per unit length (this would make an almost straight road with a lot of little swaying "twisty", so it is most likely not the complete answer)
These two could be combined by multiplication, so that you would have:
road-length * inflection-points
--------------------------------------
start-end-distance * road-length
You can see that this can be shortened to "inflection-points per start-end-distance", which does seem like a good indicator for "twistiness" to me.
As for taking elevation into account, I think that making the whole calculation in three dimensions is enough for a first attempt.
You might want to handle left-right inflections separately from up-down inflections, though, in order to make it possible to scale the elevation inflections by some factor.
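A rough sketch of the combined "inflection points per start-end distance" measure, with a made-up road polyline. Elevation is ignored here; a 3D variant would simply keep a separate count of vertical inflections, as suggested above.

# Hypothetical sketch: "inflection points per start-end distance" for a road polyline.
# Coordinates are (lat, lon) in degrees; a local flat-earth projection is good
# enough for a single road.
import math

def to_xy(points):
    lat0 = math.radians(points[0][0])
    return [(math.radians(lon) * math.cos(lat0) * 6371000.0,
             math.radians(lat) * 6371000.0) for lat, lon in points]

def twistiness(points):
    xy = to_xy(points)
    # Signed turn at each interior vertex: cross product of consecutive segment vectors.
    turns = []
    for (x0, y0), (x1, y1), (x2, y2) in zip(xy, xy[1:], xy[2:]):
        cross = (x1 - x0) * (y2 - y1) - (y1 - y0) * (x2 - x1)
        if abs(cross) > 1e-6:                  # ignore collinear vertices
            turns.append(math.copysign(1.0, cross))
    # An inflection is a change of turning direction (left -> right or right -> left).
    inflections = sum(1 for a, b in zip(turns, turns[1:]) if a != b)
    (xs, ys), (xe, ye) = xy[0], xy[-1]
    straight_km = math.hypot(xe - xs, ye - ys) / 1000.0
    return inflections / straight_km           # inflection points per straight-line km

# A gently S-shaped made-up road:
road = [(47.000, 8.000), (47.001, 8.002), (47.003, 8.003),
        (47.005, 8.002), (47.007, 8.003), (47.009, 8.005)]
print(round(twistiness(road), 2))              # -> about 0.93 for this sample road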
Try http://www.hardingconsultants.co.nz/transportationconference2007/images/Presentations/Technical%20Conference/L1%20Megan%20Fowler%20Canterbury%20University.pdf as a starting point.
I'd assume that you'd have to somehow capture the road centreline from Google Maps as a vectorised dataset & analyse using GIS software to do what you describe. Maybe do a screen grab then a raster-to-vector conversion to start with.
Cumulative turn angle per km is a commonly used measure in road assessment. Vertex density is also useful. Note that these measures depend on the assumption that vertices were placed at roughly equal density along the line while it was captured, rather than being placed manually. Running a GIS tool such as a "bendsimplify" algorithm on the line should solve this. I have written scripts in Python for ArcGIS 10 to compute these measures if anyone wants them.
Sinuosity is sometimes used for measuring bends in rivers - see the help pages for Hawth's Tools for ArcGIS for a good description. It could be misleading for roads that have major changes in course along their length, though.
Nowadays most restaurants and other businesses have a "Find Locations" feature on their websites which lists the nearest locations for a given address/zip. How is this implemented? Matching the zip code against the DB is a simple, no-brainer way to do it but may not always work; for example, there may be a branch closer to the given location that is in a different zip. One approach that comes to my mind is to convert the given zip code/address into map coordinates and list any branches falling within a pre-defined radius. I welcome your thoughts on how this would have been implemented. If possible, provide more detailed implementation details, like any web services used, etc.
A lot of geospatial frameworks will help you out with this. In the geospatial world, a zip code is just a "polygon", which is just an area on a map with clear boundaries (not a polygon in the math sense). With SQL Server 2008's spatial support, for example, you can create a new polygon based on your original polygon, so you can dynamically create a polygon that is your zip code extended by a certain distance at every point. That takes the funky shape of the zip code into account. With an address it's easy, because you just create a polygon that is a circle around the one point. You can then do queries that give you all points within the new polygon you created by either method.
A lot of these sites are basically just doing this. They give you all points within a 5 mile extended polygon, and then maybe a 10 mile extended polygon, and so on and so forth. They are not actually calculating distance. Most map stuff on the web is not sophisticated at all.
You can see some basic examples here to get the general idea of what I'm talking about.
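As a toy illustration of the "extended polygon" idea, here is the same thing with Shapely instead of SQL Server, pretending the coordinates are already in a metre-based projection; the zip-code shape and store locations are made up:

# Hypothetical sketch of the "extended polygon" approach with Shapely.
# Coordinates are assumed to be in a projected CRS measured in metres.
from shapely.geometry import Point, Polygon

# Made-up zip-code boundary (a funky quadrilateral), in metres.
zip_polygon = Polygon([(0, 0), (4000, 500), (3500, 3000), (500, 2500)])

# "Extend" the zip code by 5 miles (~8047 m) in every direction.
search_area = zip_polygon.buffer(8047)

# Made-up store locations.
stores = {"Store A": Point(6000, 1000),    # just outside the zip, inside the buffer
          "Store B": Point(20000, 20000)}  # far away

nearby = [name for name, pt in stores.items() if search_area.contains(pt)]
print(nearby)   # -> ['Store A']

A real implementation would do the buffer and the containment test inside the database, so its spatial index gets used.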
There is a standard zipcode/location database available. Here is one version in Access format that includes the lat/long of each zipcode as well as other information. You can then use the PostGIS extension for PostgreSQL to do searches on the locations, for example.
(assuming, of course, that you extract the Access DB and insert it into a friendlier database like PostgreSQL)
First, you Geocode the address, translating it into (usually) a latitude and longitude. Then, you do a nearest-neighbour query on your database for points of interest.
Most spatial indexes don't directly support nearest-neighbour queries, so the usual approach here is to query on a bounding box of a reasonable size with the geocoded point at the center, then sort the results in memory to pick the closest ones.
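A small sketch of that bounding box + in-memory sort approach, with made-up stores. In a real system the step-1 filter would be the indexed WHERE clause (or spatial-index query) in the database rather than a Python list comprehension:

# Hypothetical sketch: bounding-box pre-filter, then in-memory sort of the survivors.
import math

def nearest_stores(lat, lon, stores, limit=5, box_km=25.0):
    deg_lat = box_km / 111.0                                  # ~111 km per degree of latitude
    deg_lon = box_km / (111.0 * math.cos(math.radians(lat)))  # longitude degrees shrink with latitude
    # Step 1: the bounding-box query (what the spatial index / WHERE clause would do).
    candidates = [s for s in stores
                  if abs(s["lat"] - lat) <= deg_lat and abs(s["lon"] - lon) <= deg_lon]
    # Step 2: rank the handful of survivors in memory. Over a ~25 km box a flat-earth
    # approximation is fine for ranking; swap in Haversine if exact distances are needed.
    cos_lat = math.cos(math.radians(lat))
    candidates.sort(key=lambda s: (s["lat"] - lat) ** 2 + ((s["lon"] - lon) * cos_lat) ** 2)
    return candidates[:limit]

# Made-up stores; (lat, lon) would come from geocoding the user's address first.
stores = [{"name": "Downtown", "lat": 52.520, "lon": 13.405},
          {"name": "Airport",  "lat": 52.559, "lon": 13.288},
          {"name": "Far away", "lat": 48.137, "lon": 11.575}]
print([s["name"] for s in nearest_stores(52.52, 13.40, stores)])  # -> ['Downtown', 'Airport']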
Just like you said. Convert an address/ZIP into a 2D world coordinate and compare it to other known locations. Pick the nearest. :) I think some DBs (Oracle, MSSQL 2008) even offer some functions that can help, but I've never used them.
I think it is pretty universal. They take the address or zipcode and turn it into a "map coordinate" (this differs depending on the implementation, but is probably a lat/long), and then, using the "map coordinates" of the things in the database, it is easy to calculate a distance.
Note that some poor implementations convert the zipcode into a coordinate representing the center of the zipcode area, which sometimes gives bad results.
Your thoughts on how to do it are how I would probably do it. You can geocode the coordinates for the zip and then do calculations based on that. I know SQL Server 2008 has some new functionality to help with queries based on these geocoded lon/lat coordinates.
There are actual geometric algorithms and/or data structures that support nearest-location queries with lower asymptotic complexity on point, line and/or region data.
See this book as an example of information on some of them, like: Voronoi diagrams, quadtrees, etc.
However, I think the other answers here are right about most cases you find in software today:
geocode (a single point in) the search area
bounding box query to get an initial ballpark
in memory sorting/selecting
I had a database table that I would recompile every 6 months. It contained 3 columns; I used it for a few clients in Australia. It had about 40k rows, so it was very lightweight to run a query against. This is quite quick if you're just looking to get something off the ground for a client:
Postal Code from
Postal Code To
Distance
SELECT Store_ID, Store_AccountName, Store_PostalCode, Store_Address,
       Store_Suburb, Store_Phone, Store_State, Code_Distance
FROM Store,
     (SELECT Code_To AS Code_To, Code_Distance
        FROM Code
       WHERE Code_From = #PostalCode
      UNION ALL
      SELECT Code_From AS Code_To, Code_Distance
        FROM Code
       WHERE Code_To = #PostalCode
      UNION ALL
      SELECT #PostalCode AS Code_To, 0 AS Code_Distance) AS Code
WHERE Store_PostalCode = Code_To
  AND Code_Distance <= #Distance
ORDER BY Code_Distance
There is plenty of optimization you could do to speed up this query!