I use MongoDB's "$near" query; it works as expected and saves me a lot of time.
Now I need to perform something more complicated. Imagine we have a collection of "checkins" (to use Foursquare notation) that contains geospatial information (nothing unusual: just lat and lng) and a time. Given the checkins of two people, how do I find the checkins where they "were near each other"? I mean, e.g.: "on 1/23/12 you were 100 meters apart".
The easiest solution is to select all the checkins of the first user and, on the framework side (I use Ruby), find the nearest checkin of the second user for each of them. But is that the most efficient solution?
Do you have better ideas? Maybe I need some kind of special index?
Best,
Roman
The MongoDB geospatial indexes provide two types of queries: $near and $within. The $near query returns all points in the database that are within a certain range of a requested point, while the $within query lists all points in the database that lie inside a particular area (a box, circle, or arbitrary polygon).
MongoDB does not currently provide a query that will return all points that are within a certain distance of any member of another set of points, which is what you seem to want.
You could conceivably use the point data from user1 to build a polygon describing the "area of interest" and then use the $within query to see if there were any checkins by other people inside of that area. If you use a compound index on location & date, you could even restrict the query to folks who were inside of that area on a particular day.
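For illustration, here is a rough pymongo sketch of that idea. The collection and field names, the polygon coordinates, and the date format are all assumptions, and newer MongoDB versions spell the operator $geoWithin:

from pymongo import MongoClient

db = MongoClient().checkin_app  # hypothetical database name
# Compound geospatial index: the 2d key must come first
db.checkins.create_index([("loc", "2d"), ("date", 1)])

# A bounding polygon built from user1's checkin points (made-up coordinates)
area = [[-74.01, 40.70], [-74.01, 40.76], [-73.95, 40.76], [-73.95, 40.70]]

others_nearby = db.checkins.find({
    "user_id": {"$ne": "user1"},
    "loc": {"$within": {"$polygon": area}},
    "date": "2012-01-23",
})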
References:
http://docs.mongodb.org/manual/core/indexes/#geospatial-indexes
http://docs.mongodb.org/manual/reference/operators/#geospatial
I'm trying to create a simple tool which will allow a user to specify two places around Seattle.
I'm working with the WSDOT traffic data set. An example of the output can be found here: https://gist.github.com/jaxxstorm/0ab818b300f65cf3a46cc01dbc35bf60
What I'd like to be able to do is specify two locations like:
Bellevue
Seattle
and then lookup all traffic times for those locations.
I'm considering doing a reverse geocode like this answer, but I want it to be "fuzzy", in that I don't want people to have to specify exact locations. I also suspect the processing time for this might be long, as I'd have to loop through the list and reverse-look-up all the coordinates, which could take a while.
Are there any better alternatives for processing this data in this way? I'm writing the tool in Go.
You have two problems for each set of points (start and end):
Convert locations to lat lon
Fuzzy match lat,lon to this traffic data (which contains lat,lon)
The location to lat,lon conversion is pretty straightforward using a geocoding API like the one available from Google.
To match lat,lon fuzzily, you could either truncate lat,lon and store that as a hash key (so that you're storing approximate matches) and look data up that way, or do a radius calculation and pick results within that radius (this requires some math involving the radius of the earth, which you can look up easily enough; it can be done in SQL if your data is in a database, for example).
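As a minimal Python sketch of both ideas (the grid precision, the 5 km radius, and the record layout are arbitrary assumptions; the same logic ports directly to Go):

import math

def grid_key(lat, lon, precision=2):
    # Truncate coordinates so that nearby points share the same hash key
    return (round(lat, precision), round(lon, precision))

def haversine_km(lat1, lon1, lat2, lon2):
    # Great-circle distance using a mean Earth radius of ~6371 km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * 6371 * math.asin(math.sqrt(a))

records = [{"lat": 47.61, "lon": -122.20}, {"lat": 47.45, "lon": -122.30}]
lat, lon = 47.610, -122.201  # e.g. the geocoded result for "Bellevue"

# Radius approach: keep traffic records within 5 km of the geocoded point
nearby = [r for r in records if haversine_km(lat, lon, r["lat"], r["lon"]) < 5.0]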
I have a class with objects that contain a geopoint type. Each object also has a boolean entitled "goodPoint".
I need to return these objects sorted by distance from the user's location.
I have been using the "near" function to accomplish this.
I would also like to only grab the objects who have the boolean "goodPoint" set to "true".
I would typically use "isEqual" or "whereKey".
However, it is my understanding that Parse does not support combined queries when a geopoint query (like "near") is also used.
What is the best workaround for effectively achieving my desired result without the use of the unsupported combinational query?
Possible thoughts:
I would like to get all 1000 points. I could filter client-side, but I'm afraid this won't scale in the long run, since I anticipate around 10%-20% of my points will be "bad" (goodPoint=false); the worst case would limit me to 800 points.
I could create a "graveyard" to send bad points to so that they don't show up in the nearest 1000, but I'm not sure which latitude and longitude to park the points at.
I could move the "bad" points to another class, but Parse doesn't seem to let you move objects across classes.
I also could just delete the points, but I need to keep them for user feedback purposes.
I'm thinking about ways to solve the following task.
We are developing a service (a website) which has some objects. Each object has a geo field (lat and long). The objects are spread across roughly 200-300 cities, and there are thousands to tens of thousands of objects.
Each object also has a creation date.
We need to search for objects sorted by a function of distance and freshness.
E.g. we have two close cities, A and B. A user from city A logs in, and he should see objects from city A first and then, on some later pages, objects from city B (because objects from A are closer).
But if there is an object from A which was added about a year ago, and an object from B which was added today, then B's object should be displayed before A's.
So, for people from city A we can create a special field with a relevance index like relindex = 100*distance + age_in_days
and then sort by this field to get the data in the order we need.
The problem is that such a relevance index will not work for people from other places.
In my example I used a linear function, but it's just an example; we will need to fit the correct function.
The site will run on our own servers, so we can use almost any database or other software (I was planning to use MongoDB).
I have the following ideas:
Recalculate the relevance index every day and keep it with the object, like:
{
    fields : ...,
    relindex : {
        cityA : 100,
        cityB : 120
    }
}
Then, if a user belongs to cityA, sort by relindex.cityA.
Disadvantages:
Recurring updates of all objects, but I don't think that's a huge problem.
A huge Mongo index: if we have about 300 cities, then each object will have 300 indexed fields.
Hard to add new cities.
Use a 3D spatial index: (lat, long, freshness). But I don't know whether any database supports 3D geospatial indexes.
Group close objects into clusters and search only within a cluster rather than the whole base. But I'm not sure that's OK.
I think there are four possible solutions:
1) Use 3D index - lat, lon, time.
2) Distance is more important - use some geo index and select the nearest objects. If an object is too old, discard it and increase the allowed distance. Stop once you have enough objects.
3) Time is more important - index by time and discard the objects which are too far.
4) Approximate distance - choose some important points (centre of cities or centre of clusters of objects) and calculate the distances from these important points up front. The query will first find the nearest important point and then use index to find the data.
Alternatively, you can create clusters from your objects and then calculate the distance in the query. The point here is that the number of clusters is limited.
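A rough Python sketch of option 4 (the city centres, the object layout, and the 100*distance + age_in_days weighting taken from the question are all placeholders to be tuned):

import math

# Hypothetical "important points" (city centres)
centers = {"cityA": (50.45, 30.52), "cityB": (49.99, 36.23)}

def approx_km(p, q):
    # Equirectangular approximation; fine for ranking at city scale
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    x = (lon2 - lon1) * math.cos((lat1 + lat2) / 2)
    return 6371 * math.hypot(x, lat2 - lat1)

def nearest_center(point):
    return min(centers, key=lambda name: approx_km(point, centers[name]))

# Each object stores its nearest centre once, at write time
objects = [
    {"id": 1, "center": "cityA", "age_days": 365},
    {"id": 2, "center": "cityB", "age_days": 0},
]

def score(obj, user_center):
    # Lower is better; the linear form comes from the question and needs fitting
    return 100 * approx_km(centers[user_center], centers[obj["center"]]) + obj["age_days"]

user_center = nearest_center((50.40, 30.60))  # the logged-in user's location
for obj in sorted(objects, key=lambda o: score(o, user_center)):
    print(obj["id"], score(obj, user_center))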
I am building a Ruby app on Heroku using Sinatra and a PostgreSQL database interfaced with ObjectMapper. I need to run a query which returns a list of all locations in the database (each of which has latitude and longitude attributes) within a certain rectangle (corresponding to the visible map region).
I can do this by searching for latitudes which fall within the map bounds, and the same for longitude. My question, however, is: how do I return these results sorted by proximity? I could get all the results matching the query and then sort them once they are out of the database, but I want to run this query in batches and return only, say, the nearest 5 places, then places 6-10, then 11-15, etc.
Can this be done?
EDIT: I have not decided yet whether to use PostgreSQL for sure, I might use MongoDB if it is appropriate.
The immediate question is: proximity to what? You need to define a point to use as the basis for the proximity. You can then use st_distance in the ORDER BY clause to sort by the distance between the geometry objects. This can be combined with LIMIT and OFFSET to do exactly what you want.
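A sketch of that query from Python with psycopg2, assuming PostGIS is installed; the table and column names, the SRID, and the coordinate values are assumptions about your schema:

import psycopg2

conn = psycopg2.connect("dbname=myapp")  # hypothetical connection string
cur = conn.cursor()

min_lon, min_lat, max_lon, max_lat = -122.46, 47.48, -122.22, 47.73  # map bounds
center_lon, center_lat = -122.33, 47.61  # the point we measure proximity from

cur.execute("""
    SELECT id, name
    FROM locations
    WHERE geom && ST_MakeEnvelope(%s, %s, %s, %s, 4326)
    ORDER BY ST_Distance(geom, ST_SetSRID(ST_MakePoint(%s, %s), 4326))
    LIMIT 5 OFFSET %s
""", (min_lon, min_lat, max_lon, max_lat, center_lon, center_lat, 0))

print(cur.fetchall())  # OFFSET 5, 10, ... pages through the next batches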
We're trying to add simple search functionality to our website that lists restaurants. We try to detect the place name, location, and place features from the search string, something like "cheap restaurants near cairo" or "chinese and high-end food in virginia".
What we are doing right now is tokenizing the query and searching the tables with the least performance cost first (the table of prices (cheap-budget-expensive-high-end) is smaller than the tables of the place lists). Is this the right approach?
--
Regards.
Yehia
I'd say you should build sets of synonyms (e.g. cheap, low budget, etc. go into synset:1) and map each token from the search string to one of those groups.
Btw, it will be easy to handle spelling mistakes here, since this is generally a pretty small search space. Edit distance, common k-grams, ... anything should be alright.
As a next step you should build inverted index lists for each of those syn-groups that map to a sorted list of restaurants associated with that property. For each syn-group in a query, get all those lists and simply intersect them.
Words that cannot be mapped to one of those synsets will probably have to be ignored, unless you have some sort of full texts about the restaurants that you could index as well. In that case you can also build such restaurant lists for "normal" words and intersect them too. But this would already be quite close to a classical search engine, and it might be a good idea to use a technology like Apache Lucene. Without full texts I don't think you'd need such a thing, because an inverted index of syn-groups is really easy to process on your own.
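A toy Python version of the synset mapping and list intersection (all the group names and restaurant IDs are made up):

# Token -> syn-group (built by hand or from a thesaurus)
synsets = {
    "cheap": "price:low", "budget": "price:low",
    "expensive": "price:high", "high-end": "price:high",
    "chinese": "cuisine:chinese",
    "cairo": "city:cairo", "virginia": "city:virginia",
}

# Syn-group -> set of restaurant ids (the inverted index)
inverted = {
    "price:high": {"resto_4", "resto_5"},
    "cuisine:chinese": {"resto_2", "resto_4"},
    "city:virginia": {"resto_4", "resto_5"},
}

def search(query):
    # Map tokens to syn-groups, ignore unknown words, intersect the lists
    groups = {synsets[t] for t in query.lower().split() if t in synsets}
    postings = [inverted[g] for g in groups if g in inverted]
    return set.intersection(*postings) if postings else set()

print(search("chinese and high-end food in virginia"))  # -> {'resto_4'}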
It seems you may be missing how misspelled queries are handled.