How to design ElasticSearch document(s) for geo shape data? - elasticsearch

I'm trying to build a geolocation service (for France only) with Elasticsearch. I have several indexes to create, such as country, region, county, city and neighbourhood, each containing a polygon.
For each city, I have a list of the nearest cities to include.
How should I design this kind of data? I would like to be able to:
search by coordinates
search by full text
search by zip code, city name, neighbourhood name
manage faceting (n cities in n counties in n regions, for instance)
etc.
I don't know whether I have to use the path hierarchy tokenizer.
I was trying to build a single index, Neighbourhoods, by referencing the data from the bottom to the top (neighbourhood => city => county => region => country). But I don't know if it's a good idea given the volume of data (1 country, 40 regions (old + new ones), 101 counties, 36000 cities and 7700 neighbourhoods), especially the polygons.
Any ideas?
Thanks in advance!!
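One possible layout, sketched as Python dictionaries holding the request bodies: a single denormalized index where each area document carries its own polygon plus the names of its parents. The index layout and every field name (`level`, `name`, `zip_codes`, `region`, `county`, `city`, `nearest_cities`, `shape`) are assumptions made for illustration, not something stated in the question. The `geo_shape` field type, the `geo_shape` query and the `terms` aggregation are standard Elasticsearch features.

```python
# Hypothetical mapping: one document per area (country, region, county,
# city or neighbourhood), with parent names copied onto each document.
AREA_MAPPING = {
    "mappings": {
        "properties": {
            "level": {"type": "keyword"},           # e.g. "city"
            "name": {"type": "text",                # full-text search
                     "fields": {"raw": {"type": "keyword"}}},
            "zip_codes": {"type": "keyword"},
            "region": {"type": "keyword"},
            "county": {"type": "keyword"},
            "city": {"type": "keyword"},
            "nearest_cities": {"type": "keyword"},
            "shape": {"type": "geo_shape"},         # the polygon
        }
    }
}


def areas_containing(lon, lat):
    """Query body: areas whose polygon intersects the given point.

    GeoJSON coordinates are ordered longitude first, then latitude.
    """
    return {
        "query": {
            "geo_shape": {
                "shape": {
                    "shape": {"type": "point", "coordinates": [lon, lat]},
                    "relation": "intersects",
                }
            }
        }
    }


def cities_per_county():
    """Query body: count city documents per region, then per county."""
    return {
        "size": 0,
        "query": {"term": {"level": "city"}},
        "aggs": {
            "by_region": {
                "terms": {"field": "region"},
                "aggs": {"by_county": {"terms": {"field": "county"}}},
            }
        },
    }
```

With parent names stored as plain keyword fields, the faceting can be done with nested `terms` aggregations, so the path hierarchy tokenizer may not be needed for this layout.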

Related

Kibana visualization for grouping and filter

I have my data in Elasticsearch like below
I'm trying to create a pie chart in Kibana which shows the percentage of people who visited both the UK and India, the percentage who visited only India, and the percentage who visited only the UK.
But I'm not able to find a way to group by person name and filter on the country visited in the pie chart in Kibana. Is there any way to do this?
I believe you are looking for the Filter Aggregation.
Effectively, it enables aggregation over different filters.
Add this as a second aggregation (split slices) after the Persons aggregation.
If you are only interested in the overall count and not in counts per person, just use the Filter Aggregation with the respective filters, which will give you the percentages per filter.
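As a rough illustration of the filter approach outside Kibana, here is the equivalent request body built in Python. The data layout is not shown in the question, so this sketch assumes one document per person with a multi-valued keyword field `countries`; both the layout and the field name are assumptions. If the data holds one document per visit, the filters below would not work as written.

```python
def visited(country):
    """Term filter on the hypothetical multi-valued `countries` field."""
    return {"term": {"countries": country}}


def visit_groups_body():
    """Request body with one named filter bucket per visitor group."""
    return {
        "size": 0,
        "aggs": {
            "visit_groups": {
                "filters": {
                    "filters": {
                        "both": {"bool": {
                            "must": [visited("UK"), visited("India")]}},
                        "only_india": {"bool": {
                            "must": [visited("India")],
                            "must_not": [visited("UK")]}},
                        "only_uk": {"bool": {
                            "must": [visited("UK")],
                            "must_not": [visited("India")]}},
                    }
                }
            }
        },
    }


def percentages(buckets):
    """Convert each bucket's doc_count into a share of the three groups."""
    total = sum(b["doc_count"] for b in buckets.values())
    return {
        name: (100.0 * b["doc_count"] / total if total else 0.0)
        for name, b in buckets.items()
    }
```

The `buckets` argument of `percentages` is meant to be the `buckets` object returned under `aggregations.visit_groups` in the response.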

Google distance matrix api issue with zipcode with different formats

I need to get drive times from a place to the nearest airports. A service that is already in place gives me the address of the place as well as the addresses of the nearest airports in the North America region. One airport has a 5-digit zip code, one has a zip code in the format 83402 4906, and another has a zip code in this format (Canada): A12 123
The Distance Matrix API does not accept any zip code format other than the 5-digit one. Removing the space does not make it work either.
I need to get the drive times irrespective of the zip code format.
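One pre-processing step that could be tried is to normalize the postal codes before building the address string. The sketch below is only an assumption about what might help; nothing in the question confirms that it changes how the Distance Matrix API responds. The function name is hypothetical, and unrecognised input is passed through unchanged.

```python
import re


def normalize_postal_code(raw):
    """Tidy a US or Canadian postal code; return other input stripped."""
    text = raw.strip().upper()

    # US ZIP+4 written with a space, a hyphen or no separator:
    # "83402 4906" -> "83402-4906"
    if re.fullmatch(r"\d{5}[\s-]?\d{4}", text):
        digits = re.sub(r"\D", "", text)
        return f"{digits[:5]}-{digits[5:]}"

    # Plain 5-digit ZIP code is already in the accepted form.
    if re.fullmatch(r"\d{5}", text):
        return text

    # Canadian codes alternate letter, digit, letter, digit, letter, digit
    # and are conventionally written as two groups of three.
    compact = re.sub(r"[\s-]", "", text)
    if re.fullmatch(r"[A-Z]\d[A-Z]\d[A-Z]\d", compact):
        return f"{compact[:3]} {compact[3:]}"

    return text
```

Sending the full street address, or coordinates if the existing service provides them, may be another option worth testing.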

ElasticSearch: Trying to get City, State within distance to latitude/longitude and sort by distance from origin

What I have is a dataset that has IP Ranges and has city, state, zipcode, latitude, longitude for each IP Range.
What I want to get out is the city, state values that are closest to a given lat/lon. So, if this were SQL I'd do a group by city, state. How do I do this in ElasticSearch? I've been trying to use facets, but I'm either getting ALL city,state values in the database (via term facet) or I end up getting no city,state values.
Note that I do have a separately indexed field that contains city and state in a single indexed, not_analyzed field.
I'm also unsure where to do the GeoDistance check. I've tried using a Query, a Filter, and filter facet. I'm basically at a loss right now. Not sure if I've been staring at it too long or what.
EDIT: So, I've figured this out using an aggregation (added in 1.0) but I want to expose a minimum distance so that I can sort these aggregation buckets by the minimum geodistance for each bucket. So, let's say I have a bucket that contains all documents for NEW YORK, NEW YORK and I have another bucket that contains all documents for ALBANY, NEW YORK. Now, I want to be able to access the minimum geodistance for each bucket so that I know if Albany or New York City is closer to my origin point.
Pretty sure this page has everything you need:
http://www.elasticsearch.org/blog/geo-location-and-search/
If there's anything you need after reading that feel free to ask ;)
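For the follow-up in the EDIT (sorting buckets by their minimum distance), one possible request body is sketched below in Python. It assumes a `geo_point` field named `location` and a keyword field named `city_state`; both names are assumptions. It is written against the request syntax of recent Elasticsearch versions with Painless scripts, so the 1.x syntax mentioned in the question would differ.

```python
def nearest_city_states_body(lat, lon, radius="100km", size=10):
    """Bucket documents by city/state and order buckets by min distance."""
    return {
        "size": 0,
        # Only consider documents within `radius` of the origin.
        "query": {
            "bool": {
                "filter": {
                    "geo_distance": {
                        "distance": radius,
                        "location": {"lat": lat, "lon": lon},
                    }
                }
            }
        },
        "aggs": {
            "by_city_state": {
                "terms": {
                    "field": "city_state",
                    "size": size,
                    # Sort buckets by the sub-aggregation defined below.
                    "order": {"min_distance": "asc"},
                },
                "aggs": {
                    "min_distance": {
                        "min": {
                            "script": {
                                # arcDistance returns metres.
                                "source": "doc['location'].arcDistance("
                                          "params.lat, params.lon)",
                                "params": {"lat": lat, "lon": lon},
                            }
                        }
                    }
                },
            }
        },
    }
```

Each bucket in the response then carries a `min_distance` value, so the Albany bucket and the New York bucket can be compared directly.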

Smart sorting by function of geo and int

I'm thinking about ways to solve the following task.
We are developing a service (website) which has some objects. Each object has a geo field (lat and long). There are about 200-300 cities that objects can be associated with. The number of objects is in the thousands to tens of thousands.
Each object also has a creation date.
We need to search objects, sorted by a function of distance and freshness.
E.g. we have two close cities, A and B. A user from city A logs in and should see objects from city A first and then, on later pages, objects from city B (because objects from A are closer).
But if there is an object from A which was added a year ago, and an object from B which was added today, then B's object should be displayed before A's.
So, for people from city A we can create a special field with a relevance index = 100*distance + age_in_days
We can then sort by this field and get the data in the order we need.
The problem is that such a relevance index will not work for people from other places.
In my example I used a linear function, but it's just an example; we will need to fit the correct function.
The site will run on our own servers, so we can use almost any database or any other software (I was planning to use MongoDB).
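To make the formula concrete, here is a minimal sketch that computes the relevance index at query time for whichever user is asking, instead of storing it per city. The weight of 100 comes from the example above; the dictionary keys (`lat`, `lon`, `created`), the use of kilometres and the in-memory sort are assumptions for illustration only.

```python
import math
from datetime import date


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


def rank(objects, user_lat, user_lon, today, distance_weight=100.0):
    """Sort objects by distance_weight * distance_km + age_in_days."""
    def score(obj):
        distance = haversine_km(user_lat, user_lon, obj["lat"], obj["lon"])
        age_in_days = (today - obj["created"]).days
        return distance_weight * distance + age_in_days

    return sorted(objects, key=score)
```

With a few tens of thousands of objects this full scan per request may already be affordable, but that would need measuring.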
I have the following ideas:
Recalculate the relevance index every day and keep it with the object, like
{
fields : ...,
relindex : {
cityA : 100,
cityB : 120
}
}
And if the user belongs to cityA, then sort by relindex.cityA
Disadvantages:
Recurrent update of all objects, but I don't think it's a huge problem
Huge Mongo index. If we have about 300 cities, then each object will have 300 indexed fields
Hard to add new cities.
Use a 3D spatial index: (lat, long, freshness). But I don't know if any database supports 3D geospatial indexes
Group nearby objects into clusters and search only within a cluster rather than across the whole database. But I'm not sure that this is OK.
I think there are four possible solutions:
1) Use 3D index - lat, lon, time.
2) Distance is more important - use some geo index and select nearest objects. If the object is too old then discard it and increase allowed distance. Stop after you have enough objects.
3) Time is more important - index by time and discard the objects which are too far.
4) Approximate distance - choose some important points (centres of cities or centres of clusters of objects) and calculate the distances from these important points up front. The query will first find the nearest important point and then use the index to find the data.
Alternatively, you can create clusters from your objects and then calculate the distance in the query. The point here is that the number of clusters is limited.
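Solution 2 can be sketched roughly as follows. The linear scan over a Python list only stands in for a real geo index query, and the function name, the dictionary keys (`lat`, `lon`, `age_days`) and the default radii are assumptions for illustration.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


def nearest_fresh(objects, lat, lon, wanted, max_age_days,
                  start_km=5.0, max_km=500.0):
    """Double the search radius until `wanted` fresh objects are found."""
    # Discard objects that are too old, keeping each one's distance.
    fresh = [
        (haversine_km(lat, lon, o["lat"], o["lon"]), o)
        for o in objects
        if o["age_days"] <= max_age_days
    ]
    radius = start_km
    while True:
        hits = [(d, o) for d, o in fresh if d <= radius]
        # Stop when there are enough objects or the radius limit is hit.
        if len(hits) >= wanted or radius >= max_km:
            hits.sort(key=lambda pair: pair[0])
            return [o for _, o in hits[:wanted]]
        radius *= 2
```

With a database, each pass of the loop would become one radius query, so the doubling keeps the number of round trips small.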

Need to sort postal addresses in a database table in sequence

I'm developing an Android app that has postal address routing in it.
We would like to sort each address in a database table in such a way where each address can be printed out for our driver. We don't want to spend time sorting addresses ourselves when a database table would be ideal for that.
I was thinking somehow to create a primary key in the table on a number that we can easily sort in ascending order. That would make it easy to create a manifest for the driver.
I was researching and found something called WOEID (Where on Earth ID), which I think represents a region, but I'm not sure it would be good for us since two postal addresses in the same neighbourhood could share the same WOEID.
For example these 3 addresses have the same WOEID:
100 Bowden St., Lowell, MA
131 Stedman St., Lowell, MA
50 Stromquest Ave., Lowell, MA
The concept of using a single number like WOEID is perfect for us but we need to sort addresses like these in order so we know which ones are closest to each other.
Maybe there is a web site we can use to send out a request from within our app providing them with the postal address and the site will return a single number representing that address in a format our app can parse.
Maybe there is a way to Geocode or convert latitude and longitude numbers into a single number we can use.
Thanks.
So the problems arise when multiple addresses share the same WOEID? I don't know how many resources are available, but couldn't it be an idea to feed this data to some API like Google Maps and calculate the distance from your position to each address?
If you want to avoid Google, a really basic approach could be to do the calculation yourself. (Though this would be more of an estimate.)
However, I get the feeling you want to find the best route over the total distance, so I think your problem is also related to the travelling salesman problem:
Given a list of cities and their pairwise distances, the task is to
find the shortest possible tour that visits each city exactly once.
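A simple heuristic for ordering the stops is to always drive to the closest address not yet visited. The sketch below is a greedy approximation, not an exact solution to the travelling salesman problem, and it treats latitude and longitude as flat coordinates, which is only a rough approximation over a small area. The function name and the input shape (a name mapped to a coordinate pair) are assumptions.

```python
import math


def nearest_neighbour_route(start, stops):
    """Greedy ordering: always go to the closest stop not yet visited.

    `start` is a (lat, lon) pair; `stops` maps a label to a (lat, lon) pair.
    """
    remaining = dict(stops)   # copy, so the caller's dict is untouched
    route = []
    here = start
    while remaining:
        # Pick the unvisited stop with the smallest straight-line distance.
        name = min(remaining, key=lambda n: math.dist(here, remaining[n]))
        here = remaining.pop(name)
        route.append(name)
    return route
```

The resulting order can be stored as a sequence number column in the table, which gives the driver manifest a simple ascending sort.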
I found a web site that lets me geocode an address. It returns a latitude and longitude.
I experimented a bit and found that I can add the latitude and longitude together to get a single number. Since giving the web site an exact address returns results that are not quite what I'm looking for, I give it just a street, city and US state to locate. I can then take the latitude and longitude returned and sort the house addresses in ascending numeric order.
For example, many of our streets are laid out on a grid, so one number obtained by adding the latitude and longitude represents many homes on one particular street. I did the same thing for another street a block away. Now all I have to do is list everything in the database table like this:
City chosen by a dropdown, US state also chosen by a dropdown, then the geocoded number for each street followed by the home numbers.
Here is the web site I used to get the Geocoding along with a sample street address:
http://where.yahooapis.com/geocode?q=stedman+st,+lowell,+ma
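One caveat with adding the two values: different places can produce the same sum, for example (42, -71) and (43, -72). An alternative way to get a single sortable number is to interleave the bits of the two coordinates (a Z-order or Morton code). The sketch below is an illustration of that technique, not the method used above; the function names and the 16 bits per coordinate are assumptions. Nearby points usually get nearby codes, but not always.

```python
def quantize(value, low, high, bits=16):
    """Map a value in [low, high] to an integer in [0, 2**bits - 1]."""
    scaled = (value - low) / (high - low)
    return min(int(scaled * (1 << bits)), (1 << bits) - 1)


def morton_code(lat, lon, bits=16):
    """Interleave the bits of quantized longitude and latitude."""
    y = quantize(lat, -90.0, 90.0, bits)
    x = quantize(lon, -180.0, 180.0, bits)
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)       # even bits: longitude
        code |= ((y >> i) & 1) << (2 * i + 1)   # odd bits: latitude
    return code
```

The code fits in an integer column and can be sorted in ascending order like the summed value, without the collisions between distant points.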
