Smart sorting by function of geo and int - algorithm

I'm thinking about ways to solve the following task.
We are developing a service (website) that has some objects. Each object has a geo field (lat and long). There are about 200-300 cities that objects can be connected to. The number of objects is in the thousands to tens of thousands.
Each object also has a creation date.
We need to search objects and sort them by a function of distance and freshness.
E.g. we have two close cities, A and B. A user from city A logs in and should see objects from city A first and then, on later pages, objects from city B (because objects from A are closer).
But if there is an object from A that was added about a year ago and an object from B that was added today, then B's object should be displayed before A's.
So, for people from city A we can create a special field with a relevance index, e.g. relindex = 100*distance + age_in_days
Then we sort by this field and get the data in the order we need.
The problem is that such a relevance index will not work for people from other places.
In my example I used a linear function, but it's just an example; we will need to fit the correct function.
The site will run on our own servers, so we can use almost any database or other software (I was planning to use MongoDB).
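To make the linear example concrete, here is a minimal Python sketch that computes the score per request from the user's own coordinates instead of storing it per city. The field names (lat, lon, created), the km_weight parameter, and the use of kilometres for distance are assumptions made for illustration, not part of the original design.

```python
import math
from datetime import date

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def relevance(obj, user_lat, user_lon, today, km_weight=100.0):
    """Lower is better: km_weight * distance_km + age_in_days (the linear example)."""
    distance = haversine_km(user_lat, user_lon, obj["lat"], obj["lon"])
    age_days = (today - obj["created"]).days
    return km_weight * distance + age_days

def sort_for_user(objects, user_lat, user_lon, today, km_weight=100.0):
    """Return the objects ordered by relevance for one user location."""
    return sorted(objects, key=lambda o: relevance(o, user_lat, user_lon, today, km_weight))
```

With this weighting, a year-old object at the user's location scores 365, while an object added today about 2 km away scores roughly 200 and is listed first. Whether scoring every object per request is fast enough depends on the data size and would need measuring.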

I have the following ideas:
Recalculate the relevance index every day and keep it with the object, like
{
fields : ...,
relindex : {
cityA : 100,
cityB : 120
}
}
If the user belongs to cityA, then sort by relindex.cityA
Disadvantages:
Recurring update of all objects, but I don't think it's a huge problem
Huge Mongo index. If we have about 300 cities, then each object will have 300 indexed fields
Hard to add new cities.
Use a 3D spatial index: (lat, long, freshness). But I don't know if any database supports 3D geospatial indexes
Group nearby objects into clusters and search only within a cluster rather than the whole database. But I'm not sure that's OK.
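The first idea (a per-city relindex sub-document refreshed daily) could be sketched as below. This is only an illustration: the CITY_DISTANCE_KM table, the city and created fields, and the km_weight parameter are hypothetical, and the distance is taken between city centres rather than from the object's exact coordinates.

```python
from datetime import date

# Hypothetical table of distances between city centres, in km.
CITY_DISTANCE_KM = {
    ("cityA", "cityA"): 0.0,
    ("cityA", "cityB"): 2.0,
    ("cityB", "cityA"): 2.0,
    ("cityB", "cityB"): 0.0,
}

def build_relindex(obj, cities, today, km_weight=100.0):
    """Return {viewer_city: score} for one object; lower scores sort first."""
    age_days = (today - obj["created"]).days
    return {
        viewer: km_weight * CITY_DISTANCE_KM[(viewer, obj["city"])] + age_days
        for viewer in cities
    }

def recalc_all(objects, cities, today):
    """Daily job: refresh the relindex sub-document on every object in place."""
    for obj in objects:
        obj["relindex"] = build_relindex(obj, cities, today)
```

The sketch also makes the listed disadvantages visible: every object carries one entry per city, and adding a city means extending the distance table and rewriting every object.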

I think there are four possible solutions:
1) Use 3D index - lat, lon, time.
2) Distance is more important - use some geo index and select nearest objects. If the object is too old then discard it and increase allowed distance. Stop after you have enough objects.
3) Time is more important - index by time and discard the objects which are too far.
4) Approximate distance - choose some important points (centres of cities or centres of clusters of objects) and calculate the distances from these important points up front. The query will first find the nearest important point and then use the index to find the data.
Alternatively you can create clusters from your objects and then calculate the distance in the query. The point here is that the number of clusters is limited.
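Option 2 (distance first, discard stale objects, widen the radius until enough remain) might look like the following in-memory Python sketch. The list comprehension stands in for a geo-index query; the age_days field, the starting radius, the doubling step, and the planar distance approximation are all assumptions chosen for illustration.

```python
import math

def planar_km(lat1, lon1, lat2, lon2):
    """Approximate distance in km via an equirectangular projection (reasonable for nearby points)."""
    x = math.radians(lon2 - lon1) * math.cos(math.radians((lat1 + lat2) / 2))
    y = math.radians(lat2 - lat1)
    return 6371.0 * math.hypot(x, y)

def nearest_fresh(objects, lat, lon, wanted, max_age_days, start_km=5.0, max_km=500.0):
    """Collect up to `wanted` fresh objects, doubling the search radius until enough are found."""
    # Discard objects that are too old, keeping each survivor's distance to the user.
    fresh = [
        (planar_km(lat, lon, o["lat"], o["lon"]), o)
        for o in objects
        if o["age_days"] <= max_age_days
    ]
    radius = start_km
    while True:
        hits = [(d, o) for d, o in fresh if d <= radius]
        if len(hits) >= wanted or radius >= max_km:
            hits.sort(key=lambda pair: pair[0])  # nearest first
            return [o for _, o in hits[:wanted]]
        radius = min(radius * 2, max_km)
```

Note that this treats freshness as a hard cut-off rather than a weighted term, which is a simplification of the original requirement.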

Related

Elasticsearch Data in Grafana without timestamp

I am wondering if it is possible to have data from elasticsearch indices without timestamp attached to them.
I need a list of two columns as a drop-down. This list is cross-checked against another index to generate maps, but if I zoom in, the graph breaks because the drop-down list exists from time a to b but not from c to d. (lol)
My MacGyver solution to this is to just add the list to the index every few minutes so that, on the graph, the data is reasonably dense. This lets the user zoom in pretty well on different parts of the graph. But over time this is going to make my index unreasonably large.

Determine Mapping Function or Approximation from Massive Amount of Data

Is there a good/well-known approach to fleshing out a mapping function/approximation when you have access to lots of mapped data?
E.g. Situation
Let's say I have the domain space of a 3D cube (bottom_left: 0,0,0, top_right: 10,10,10). E.g. points: (0,0,1), (1,2,3) etc.
Each point maps to a separate series of 3 values in the solution space. We do NOT know the mapping function which is I guess perhaps the heart of the problem. But we do have a massive amount of mapped data. From the data these values were found to range from (-30.0 to +30.0). E.g. data:
[0,0,1] -> (0.1, 0.1, 0.1), [1,2,3]-> (10.2, 3.1, 29.3) etc.
Any two different keys CAN be mapped to the same point in the solution space however these keys will be positioned far away from one another in the domain space.
We also have the last position and a condition that the searched domain position cannot be further than a given distance, e.g. (0.1,0.1,0.1), from the last position. I feel like this can be used somehow to eliminate the problem of identical solution space values?
If I have a random point (2.3,6.5,2.6) which is in the solution space, how would I find the nearest domain value? Since I have massive amounts of data, is there a good approach to flesh out a mapping function/approximation?
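One way to picture how the last-position constraint removes the ambiguity is a brute-force lookup over the mapped data. This is only a sketch of the idea, not a known solution to the question: the samples layout, the per-axis interpretation of the distance limit, and the function name are assumptions.

```python
import math

def nearest_domain_point(samples, target, last_pos, max_step):
    """Find the domain point whose mapped value is closest to `target`.

    samples  : list of (domain_point, solution_point) tuples
    last_pos : previous domain position
    max_step : per-axis limit on how far the new position may be from last_pos
    """
    best = None
    best_dist = math.inf
    for domain, solution in samples:
        # Skip candidates that violate the "close to the last position" condition.
        if any(abs(d - p) > m for d, p, m in zip(domain, last_pos, max_step)):
            continue
        dist = math.dist(solution, target)  # Euclidean distance in solution space
        if dist < best_dist:
            best, best_dist = domain, dist
    return best
```

Two keys that map to the same solution value lie far apart in the domain, so at most one of them should survive the last-position filter. A linear scan will be slow on massive data; a spatial index over the domain would be the natural next step.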

Best Workaround for Geopoint SubQuery

I have a class with objects that contain a geopoint type. Each object also has a boolean entitled "goodPoint".
I need to return these objects sorted by distance from the user's location.
I have been using the "near" function to accomplish this.
I would also like to grab only the objects that have the boolean "goodPoint" set to "true".
I would typically use "isEqual" or "wherekey".
However, it is my understanding that Parse does not support combinational queries when a geopoint query (like "near") is also used.
What is the best workaround for effectively achieving my desired result without the use of the unsupported combinational query?
Possible thoughts:
I would like to get all 1000 points. I could filter client-side, but I am afraid this will not scale in the long run, as I anticipate around 10%-20% of my points to be "bad" (goodPoint=false), where the worst case would limit me to 800 points.
I could create a "graveyard" to send bad points to so that they don't appear in the nearest 1000, but I'm not sure what latitude and longitude to give them.
I could move the "bad" points to another class, but Parse doesn't seem to allow you to move objects across classes.
I also could just delete the points, but I need to keep them for user feedback purposes.

MongoDB geospatial query

I use mongo's "$near" query, it works as expected and saves me a lot of time.
Now I need to perform something more complicated. Imagine we have a collection of "checkins" (to use foursquare's terminology) that contains geospatial information (nothing unusual: just lat and lng) and time. Given the checkins of two people, how do I find the checkins where they were near each other? I mean, e.g.: "1/23/12 you've been 100 meters away"
The easiest solution is to select all the checkins by the first user and, for each of them, find the nearest checkin by the second user on the framework side (I use Ruby). But is it the most efficient solution?
Do you have better ideas? Maybe I need some kind of special index?
Best,
Roman
The MongoDB geospatial indexes provide two types of queries: $near and $within. The $near query returns points in the database ordered by distance from a requested point (optionally limited to a maximum distance), while the $within query lists all points in the database that are inside a particular area (box, circle, or arbitrary polygon).
MongoDB does not currently provide a query that will return all points that are within a certain distance of any member of another set of points, which is what you seem to want.
You could conceivably use the point data from user1 to build a polygon describing the "area of interest" and then use the $within query to see if there were any checkins by other people inside of that area. If you use a compound index on location & date, you could even restrict the query to folks who were inside of that area on a particular day.
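For comparison, the application-side approach from the question can be sketched in Python (used here for illustration rather than Ruby). It pairs checkins that are close in both space and time. The lat, lng and time keys, the 100 m threshold, and the one-hour window are assumptions, and nothing here is a MongoDB query.

```python
import math
from datetime import datetime, timedelta

def distance_m(a, b):
    """Haversine distance in metres between two checkins with 'lat' and 'lng' keys."""
    phi1, phi2 = math.radians(a["lat"]), math.radians(b["lat"])
    dphi = phi2 - phi1
    dlmb = math.radians(b["lng"] - a["lng"])
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371000.0 * math.asin(math.sqrt(h))

def near_misses(checkins_1, checkins_2, max_m=100.0, max_gap=timedelta(hours=1)):
    """Return (checkin_1, checkin_2, metres) triples that are close in both space and time."""
    others = sorted(checkins_2, key=lambda c: c["time"])
    result = []
    for c1 in checkins_1:
        for c2 in others:
            if c2["time"] > c1["time"] + max_gap:
                break  # remaining checkins are even later, so stop scanning
            if c2["time"] < c1["time"] - max_gap:
                continue  # too early to count as "at the same time"
            d = distance_m(c1, c2)
            if d <= max_m:
                result.append((c1, c2, d))
    return result
```

Filtering on time first keeps the number of distance computations small when each user's checkins are spread over many days.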
References:
http://docs.mongodb.org/manual/core/indexes/#geospatial-indexes
http://docs.mongodb.org/manual/reference/operators/#geospatial

ObjectMapper: Find geo places within a certain square, sorted by proximity

I am building a Ruby app on Heroku using Sinatra and a PostgreSQL database interfaced with ObjectMapper. I need to run a query which returns a list of all locations in a database (which each have latitude and longitude attributes) within a certain rectangle (corresponding to the visible map region).
I can do this by searching for latitudes which fall within the map bounds, same for longitude. My question however is, how do I return these results sorted by proximity? I could get all results matching the query and then sort them once they are out of the database, but I want to run this query in batches and return only say the nearest 5 places, then places 6-10, then 11-15, etc.
Can this be done?
EDIT: I have not decided yet whether to use PostgreSQL for sure, I might use MongoDB if it is appropriate.
The immediate question is: proximity to what? You need to define a point to use as the basis for the proximity. You can then use st_distance in the ORDER BY clause to sort by distance between the geometry objects. This can be combined with LIMIT and OFFSET to do exactly what you want.
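As a rough sketch of that query, the Python helper below builds a parameterised SQL string for one page of results. It assumes PostGIS is installed and that a hypothetical table locations has a geometry column geom in SRID 4326 (the question only mentions separate latitude and longitude attributes), and it uses %s placeholders in the style of common PostgreSQL drivers. Only the string is built here; nothing is executed.

```python
def nearest_page_query(center, bounds, page, page_size=5):
    """Build a parameterised query for one page of locations, nearest first.

    center = (lon, lat) reference point
    bounds = (min_lon, min_lat, max_lon, max_lat) of the visible map region
    Table `locations` and column `geom` are hypothetical names.
    """
    sql = (
        "SELECT id, name "
        "FROM locations "
        "WHERE geom && ST_MakeEnvelope(%s, %s, %s, %s, 4326) "
        "ORDER BY ST_Distance(geom, ST_SetSRID(ST_MakePoint(%s, %s), 4326)), id "
        "LIMIT %s OFFSET %s"
    )
    # Page 0 is the nearest 5 places, page 1 is places 6-10, and so on.
    params = (*bounds, *center, page_size, page * page_size)
    return sql, params
```

The extra id in the ORDER BY is there so that ties in distance do not shuffle rows between pages. On SRID 4326 geometries ST_Distance works in degrees, which may be good enough for ranking within a small map region but is not a distance in metres.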
