I'm a developer on a service-vehicle dispatching web app. It's written on .NET 4+ with MVC 4, using SQL Server.
There are 2000+ locations stored in the database as geography data types. Assuming we send resources from location A to location B, the drive time, distance, etc. needs to be displayed at some point. If I calculate the distance with SQL Server's STDistance, it will only give me the "as the crow flies" distance, so the system will need to hit a geospatial service like Bing, Google, or Esri to get the actual drive time or suggested routes. The problem is that this is a core function and will happen a lot.
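For illustration, here is a minimal sketch of what STDistance gives me, using Microsoft.SqlServer.Types on the application side (the same calculation the T-SQL a.STDistance(b) performs); the coordinates are placeholders:

```csharp
using System;
using Microsoft.SqlServer.Types; // the same geography type SQL Server uses

class StraightLineDistance
{
    static void Main()
    {
        // Illustrative points only (lat, long, SRID 4326); swap in two stored locations.
        SqlGeography a = SqlGeography.Point(43.6532, -79.3832, 4326);
        SqlGeography b = SqlGeography.Point(45.4215, -75.6972, 4326);

        // STDistance returns metres along the great circle ("as the crow flies"),
        // which is exactly why it can't stand in for drive time or road distance.
        double metres = a.STDistance(b).Value;
        Console.WriteLine("{0:N0} m straight-line", metres);
    }
}
```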
Should I pre-populate a lookup table with pre-calculated distances or average drive times? The downside is that, even without adding more locations, that's 4 million records to search every time the information is needed.
On top of this, most of the time the destination is not one of our stored geospatial coordinates; it can be an address or a lat/long point anywhere on the continent, which makes pre-calculating impossible.
I'm trying to avoid the performance issues of hitting some geoservice endpoint constantly.
Any suggestions on how best to approach this?
-thanks!
Having looked at problems like this before: you are unlikely to be able to store all of those routes.
It is usually against almost all of the routing providers' terms of service for you to cache the results. You can sometimes negotiate that ability, but it costs a lot.
Given that there is not a fixed set of points you are searching against, doing one calculation gives you little information for the next calculation.
I would say you could store the route for a pair once it has been selected, so you can show that route again if needed. Once the transaction is done, I would remove the route from your DB.
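To make that concrete, here is a rough sketch (assuming caching the provider's response is allowed under its terms, per the caveat above); Route and fetchRouteFromProvider are placeholders for your own type and your call to Bing/Google/Esri:

```csharp
using System;
using System.Runtime.Caching; // MemoryCache, available since .NET 4

// Cache a route only for the pair that was actually selected, and drop it
// once the dispatch transaction is finished (with an expiry as a safety net).
class RouteCache
{
    private readonly MemoryCache _cache = MemoryCache.Default;

    public Route GetOrFetch(int originId, int destinationId, Func<Route> fetchRouteFromProvider)
    {
        string key = originId + "|" + destinationId;
        Route cached = _cache.Get(key) as Route;
        if (cached != null) return cached;

        Route route = fetchRouteFromProvider();                   // the one call to the routing service
        _cache.Set(key, route, DateTimeOffset.Now.AddHours(4));   // safety-net expiry
        return route;
    }

    // Call this when the transaction is done, as suggested above.
    public void Remove(int originId, int destinationId)
    {
        _cache.Remove(originId + "|" + destinationId);
    }
}

class Route
{
    public double DriveTimeMinutes;
    public double DistanceKm;
    public string EncodedPolyline; // whatever route detail you need to redisplay
}
```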
If you really want to cache all of this, or want more control over it, you can use pgRouting (with PostgreSQL) and obtain the street data yourself, though I doubt it is worth your effort.
I want to implement a search feature in my app that re-filters on each new character entered into the search bar, so users can search for other users. This is a fairly common feature in apps, but as a beginner it seems like a very computationally complex process. It would seem that one of two things has to happen:
For each new character typed, the frontend queries the backend, which applies filter and returns.
The frontend loads all (or many) possible results beforehand and updates filter on the stored info as new characters are entered.
It would seem that 1) has time-complexity issues, as it makes O(n) queries (where n is the number of characters) per search. This is especially problematic because the filtered search results are expected to update near-instantaneously. Additionally, my average query time is probably slower than most, as I'm using a three-tier architecture (frontend <-> server <-> graph database).
I don't like 2), at least in its straightforward form, as the number of possible results can get very large. We can reduce the space requirements by querying only a limited set of user attributes (perhaps only uid and name, fetching details on the fly if needed), but the point remains.
Things get more interesting if we modify 2) to load only a sample of users (and here we can use data such as location, as well as ML/AI, to make the selection). The problem with this is that the searching user could always be looking for someone we didn't select. It would be a horrible (even if rare) experience for a user to know their friend was on the app but be unable to find them because our algorithm was only accurate for 99% of searches.
I am sure this is possible, since other apps seem to pull it off, so what am I missing?
First, you should avoid querying the server for each character typed. Most of the time the user types a bunch of characters very quickly without looking at the suggested results, especially because with only a few characters the results wouldn't be specific enough. Most autocompletion systems adopt both of the following rules:
query only if the string is at least 2-3 characters long;
query only once the user has stopped typing, i.e. roughly 300 ms after the last keystroke (a sketch of both rules follows below).
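As a rough sketch of those two rules (the class name and the 2-character / 300 ms thresholds are just illustrative choices):

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Debounced search: ignore strings that are too short, and only fire the
// query once no new keystroke has arrived for 300 ms.
class SearchDebouncer
{
    private readonly Func<string, Task> _runSearch;
    private CancellationTokenSource _pending = new CancellationTokenSource();

    public SearchDebouncer(Func<string, Task> runSearch)
    {
        _runSearch = runSearch;
    }

    public void OnTextChanged(string text)
    {
        _pending.Cancel();                       // a newer keystroke supersedes the old one
        _pending = new CancellationTokenSource();
        CancellationToken token = _pending.Token;

        if (text.Length < 2) return;             // rule 1: too short to give useful results

        Task.Run(async () =>
        {
            try
            {
                await Task.Delay(300, token);    // rule 2: wait for the user to pause
                await _runSearch(text);
            }
            catch (TaskCanceledException) { }    // superseded by a newer keystroke; do nothing
        });
    }
}
```

Wire OnTextChanged to the search box's change event and pass your actual backend call in as runSearch; the same pattern ports directly to whatever your frontend is written in.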
To return all the pertinent results without a huge data transfer, you could implement progressive data loading: load just enough results to fill the page, then load more as the user scrolls down. If you still reach a very high number of results, stop retrieving them and ask the user to type a more specific search.
If you want to make your users happy, try to sort the results by relevance. For example, if you know where your users are located you can sort the results by distance: if I live in Italy and I search for "Ste", it is more likely I mean my friend Stefano who lives in Rome than Steve who lives in NY.
First of all, I must say that I was very hesitant to post the following, as I'm afraid of getting downvotes. However, I've spent days thinking about a solution and haven't found one. My last hope is to get some answers in this post.
The Problem
Say you have a large database of drivers connected in real time to your backend; you fetch each driver's lat/long every 5 seconds and post it back so the backend keeps the driver's location up to date in real time. Now suppose we want to use the drivers and their positions to let a particular user find the closest connected driver (as in Uber, Lyft, etc.).
The question:
How is it possible to dispatch requests to these drivers? (I just want you to share your thoughts and ideas.)
What you are looking for is called geospatial search.
If you are looking for algorithms to implement yourself, take a look at nearest neighbour search.
The most famous algorithm is the k-nearest neighbours algorithm.
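For a few thousand connected drivers, the naive version is just a scan over all of them; a sketch (Driver is a stand-in for your own model, and distanceKm is whatever great-circle implementation you already have) might look like this, and the indexed systems mentioned below exist to avoid exactly this full scan at larger scale:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Naive nearest-neighbour search: score every connected driver by distance
// to the rider and keep the k closest.
static class NearestDrivers
{
    public static List<Driver> FindClosest(
        IEnumerable<Driver> connectedDrivers,
        double riderLat, double riderLon,
        Func<double, double, double, double, double> distanceKm,
        int k)
    {
        return connectedDrivers
            .OrderBy(d => distanceKm(riderLat, riderLon, d.Lat, d.Lon))
            .Take(k)
            .ToList();
    }
}

class Driver
{
    public long Id;
    public double Lat, Lon; // last reported position, refreshed every ~5 seconds
}
```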
If you are only interested in using an existing implementation and building your application on top of it, there are existing databases and search applications which provide geospatial search capability.
Check Apache Solr, which provides geospatial search capabilities: https://cwiki.apache.org/confluence/display/solr/Spatial+Search
You just need to feed your drivers' live locations into it and query with the current location of the user. Solr will take care of finding the nearest drivers, and you will get a search result matching your criteria.
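A sketch of what that query could look like over Solr's HTTP API (the core name "drivers", the "location" field, and the 5 km radius are assumptions for illustration, not required names):

```csharp
using System;
using System.Globalization;
using System.Net;

class NearestDriverQuery
{
    static void Main()
    {
        // The searching user's current position.
        double userLat = 40.7128, userLon = -74.0060;
        string pt = userLat.ToString(CultureInfo.InvariantCulture) + ","
                  + userLon.ToString(CultureInfo.InvariantCulture);

        // geofilt keeps only drivers within d km of pt; geodist() sorts them nearest-first.
        string url = "http://localhost:8983/solr/drivers/select?q=*:*"
                   + "&fq=" + Uri.EscapeDataString("{!geofilt sfield=location pt=" + pt + " d=5}")
                   + "&pt=" + pt + "&sfield=location"
                   + "&sort=" + Uri.EscapeDataString("geodist() asc")
                   + "&rows=10&wt=json";

        using (var web = new WebClient())
        {
            Console.WriteLine(web.DownloadString(url)); // JSON document list: the 10 closest drivers
        }
    }
}
```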
You can use this as a starting point to build your app with location-based searches. In practice, Uber, Lyft, and other major services have their own in-house applications with custom implementations.
Let's say I have a service operation like this:
api/places/?category=entertainment&geo=123,456&area=30&orderBy=distance
So the user is searching for places of entertainment near the geo location (123,456), no further than a 30 km boundary, and wants the results sorted by distance.
Suppose the search should be paged: say 500 items satisfy the query but the page size is 50, so there will be 10 pages.
Each item in the database stores only the geo location of the place, so I have to fetch all 500 items from the DB first, calculate the distance of each item, cut the array down to the requested page, and then return it.
So every time the user requests the next page, I have to query all 500 and do the same thing over again.
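For clarity, the flow I'm describing looks roughly like this (Place and LoadPlacesByCategory stand in for my real types and data access, and SqlGeography is used here purely as a convenient geodesic-distance calculator):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.SqlServer.Types;

// Every page request repeats the same work: load all ~500 matches,
// compute each distance, sort, then slice out the requested (0-based) page.
class PlaceSearch
{
    public List<PlaceResult> GetPage(string category, double lat, double lon,
                                     double areaKm, int page, int pageSize)
    {
        SqlGeography origin = SqlGeography.Point(lat, lon, 4326);

        return LoadPlacesByCategory(category)                 // all matching rows, every time
            .Select(p => new PlaceResult
            {
                Place = p,
                DistanceKm = origin.STDistance(p.Location).Value / 1000.0
            })
            .Where(r => r.DistanceKm <= areaKm)               // the 30 km boundary
            .OrderBy(r => r.DistanceKm)                       // orderBy=distance
            .Skip(page * pageSize)
            .Take(pageSize)                                   // one page of 50
            .ToList();
    }

    private IEnumerable<Place> LoadPlacesByCategory(string category)
    {
        // database query goes here
        yield break;
    }
}

class Place { public int Id; public SqlGeography Location; }
class PlaceResult { public Place Place; public double DistanceKm; }
```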
Is this the right way to implement a service like that, or is there a better strategy?
It seems even worse when my database doesn't hold the geo location at all, because I'm using a different API provider to give me the geo location of each place. That means I have to query everything, hit the other service to get the locations, calculate, and only then sort... :(
thank you very much!
If you were developing a single-page app for the search results, you wouldn't need to send another request to the server every time the user presses Next. You could use a pagination library that takes the full result set and splits it into pages accordingly.
In this situation, there's a trade-off between the size of the data you want to store and the speed and efficiency of your web application. When dealing with large data sets like this, you should ideally have additional databases for each general geographic region (such as Northeast or Southeast) that store the distance between each store and each location the user can enter. Use a separate server for this, and aggregate the data at intervals (say, every six hours) using an automated database job, such as running MongoDB scripts.
You should also consider using weighted graphs to store the distances between locations; you could then traverse them more easily with a graph algorithm.
I'm pondering how to efficiently represent locations in a database, such that given an arbitrary new location, I can efficiently query for candidate locations that are within an acceptable proximity threshold of the subject.
Similar things have been asked before, but I haven't found a discussion based on my criteria for the problem domain.
Things to bear in mind:
Starting from scratch, I can represent the data in any way (e.g. long/lat, etc.)
Any result set is time-sensitive, in that it loses validity within a short window of time (~5-15 mins), so I can't cache indefinitely
I can tolerate some reasonable margin of error in results, for example if a location is slightly outside of the threshold, or if a row in the result set has very recently expired
A language-agnostic discussion is perfect, but in case it helps, I'm using C# MVC 3 and SQL Server 2012
A couple of first thoughts:
Use an external API like Google; however, this will generate thousands of requests and the latency will be poor
Use the Haversine function; however, it looks expensive and so should be performed on a minimal number of candidates (possibly even as a stored procedure!); a sketch follows after this list
Build a graph of postcodes/zipcodes, such that from any node I can find the postcodes/zipcodes that border it; however, this could involve a lot of data to store
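Here is a sketch of the Haversine function from the second bullet, which (as noted) I would only want to run against a small, pre-filtered set of candidates:

```csharp
using System;

// Great-circle distance between two lat/long points (Haversine formula).
static class Haversine
{
    public static double DistanceKm(double lat1, double lon1, double lat2, double lon2)
    {
        const double earthRadiusKm = 6371.0;
        double dLat = ToRadians(lat2 - lat1);
        double dLon = ToRadians(lon2 - lon1);

        double a = Math.Sin(dLat / 2) * Math.Sin(dLat / 2)
                 + Math.Cos(ToRadians(lat1)) * Math.Cos(ToRadians(lat2))
                 * Math.Sin(dLon / 2) * Math.Sin(dLon / 2);

        return 2 * earthRadiusKm * Math.Asin(Math.Sqrt(a));
    }

    private static double ToRadians(double degrees)
    {
        return degrees * Math.PI / 180.0;
    }
}
```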
Some optimization ideas to reduce possible candidates quickly:
Cache result sets for searches, and on subsequent searches check whether the new subject is within an acceptable range of a point we already have a cached result set for. If so, use the cached result set (but remember, the results expire quickly).
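As a sketch of that idea (reusing the Haversine helper from the previous sketch; the 1 km reuse radius and 10 minute expiry are placeholders for whatever tolerance is acceptable):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Reuse a recent result set if the new search point is close enough to a
// point we already searched from, and the cached results haven't expired.
class ProximityResultCache
{
    private readonly List<CachedSearch> _entries = new List<CachedSearch>();

    public List<int> TryGet(double lat, double lon)
    {
        _entries.RemoveAll(e => e.Expires < DateTime.UtcNow);   // results go stale quickly
        CachedSearch hit = _entries.FirstOrDefault(
            e => Haversine.DistanceKm(lat, lon, e.Lat, e.Lon) < 1.0);
        return hit == null ? null : hit.LocationIds;
    }

    public void Store(double lat, double lon, List<int> locationIds)
    {
        _entries.Add(new CachedSearch
        {
            Lat = lat,
            Lon = lon,
            LocationIds = locationIds,
            Expires = DateTime.UtcNow.AddMinutes(10)
        });
    }
}

class CachedSearch
{
    public double Lat, Lon;
    public List<int> LocationIds;
    public DateTime Expires;
}
```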
I'm hoping the answer isn't just raw CPU power, and that there are some approaches I haven't thought of that could help me out.
Thank you
ps. Apologies if I've missed previously asked questions with helpful answers, please let me know below.
What about using GeoHash? (refer to http://en.wikipedia.org/wiki/Geohash)
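Nearby points usually share a common hash prefix (with the usual caveat about points that straddle cell boundaries, which is handled by also checking the eight neighbouring cells), so an ordinary index on the hash string turns the proximity query into a cheap prefix search. A sketch of the standard encoding:

```csharp
using System;
using System.Text;

// Standard GeoHash encoding: interleave longitude/latitude bits and emit
// base-32 characters; longer hashes mean smaller cells.
static class GeoHash
{
    private const string Base32 = "0123456789bcdefghjkmnpqrstuvwxyz";

    public static string Encode(double latitude, double longitude, int precision)
    {
        double latMin = -90, latMax = 90, lonMin = -180, lonMax = 180;
        var hash = new StringBuilder();
        bool evenBit = true;   // even bits refine longitude, odd bits refine latitude
        int bit = 0, ch = 0;

        while (hash.Length < precision)
        {
            if (evenBit)
            {
                double mid = (lonMin + lonMax) / 2;
                if (longitude >= mid) { ch = (ch << 1) | 1; lonMin = mid; }
                else                  { ch = ch << 1;       lonMax = mid; }
            }
            else
            {
                double mid = (latMin + latMax) / 2;
                if (latitude >= mid) { ch = (ch << 1) | 1; latMin = mid; }
                else                 { ch = ch << 1;       latMax = mid; }
            }
            evenBit = !evenBit;

            if (++bit == 5)    // 5 bits per base-32 character
            {
                hash.Append(Base32[ch]);
                bit = 0;
                ch = 0;
            }
        }
        return hash.ToString();
    }
}
// Example: GeoHash.Encode(57.64911, 10.40744, 11) gives "u4pruydqqvj"
// (the example used in the Wikipedia article linked above).
```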
I'm planning to create a social network, and I don't think I quite understand how Facebook's status update module is designed. Hoping I can find some help here. At the algorithmic and data structure level, what is the most efficient way to create a status update mechanism in a social network?
A full table scan over all friends followed by sorting their updates is very naive and costly. Do we use some sort of mechanism based on hashing, or something else? Please let me know.
P.S.: I'm not talking about their EdgeRank algorithm, just the basic status updates. How do they find and fetch them from the database?
Thanks in advance for the help!
Here is a great presentation that answers your question. The specific answer comes at around 55:40, but I suggest you watch the entire presentation to understand how the solution fits into the overall architecture.
In short:
A particular server (a "leaf") stores all the feed items for a particular user, so the data for each of your friends lives entirely on one specific server.
When you want to view your news feed, one of the aggregator servers sends requests to the leaf servers for all of your friends and ranks the results. The aggregator knows which servers to query based on the user id of each friend.
This is terribly simplified, of course. It only works because all of it is memcached, the system is designed to minimize latency, some ranking is done at the leaf server that holds the friend's feed items, and so on.
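A toy sketch of that shape (the modulo routing, ILeafClient, and FeedItem are illustrative stand-ins, not what Facebook actually does):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Aggregator: route each friend to the leaf that owns their feed items,
// fan out in parallel, then merge and rank the partial results.
class FeedAggregator
{
    private readonly IList<ILeafClient> _leaves;

    public FeedAggregator(IList<ILeafClient> leaves) { _leaves = leaves; }

    public async Task<List<FeedItem>> GetNewsFeedAsync(IEnumerable<long> friendIds, int count)
    {
        // Pick a leaf per friend from the friend's user id.
        var byLeaf = friendIds.GroupBy(id => _leaves[(int)(id % _leaves.Count)]);

        // Fan out to every involved leaf in parallel.
        var tasks = byLeaf.Select(g => g.Key.GetRecentItemsAsync(g.ToList(), count));
        var partials = await Task.WhenAll(tasks);

        // Merge and rank (here: simply newest first).
        return partials.SelectMany(p => p)
                       .OrderByDescending(i => i.Timestamp)
                       .Take(count)
                       .ToList();
    }
}

interface ILeafClient
{
    Task<List<FeedItem>> GetRecentItemsAsync(List<long> userIds, int count);
}

class FeedItem
{
    public long AuthorId;
    public DateTime Timestamp;
    public string Text;
}
```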
You really don't want to be hitting the database for any of this if you want it to work at a reasonable speed. FB uses MySQL mostly as a key-value store; JOINing tables is simply impossible at their scale. They then put memcached servers in front of the databases and application servers.
Having said that, don't worry about scaling problems until you have them (unless, of course, you are worrying about them for the fun of it). On day one, scaling is the least of your problems.