How do I use Elasticsearch's geo_point and geo_shape types at the same time?

Here is our document:
{
"geometry" : {
"type" : "Point",
"coordinates" : [ -87.662682, 41.843014 ]
}
}
We'd like to run a geo_shape search with a _geo_distance sort, both against the same geometry field. The former requires the geo_shape type, while the latter requires geo_point.
These two mappings each succeed on their own, but not together:
"geometry": {
"type": "geo_shape"
}
and
"geometry": {
"properties": {
"coordinates": {
"type": "geo_point"
}
}
},
So far we've tried these and failed:
"geometry": {
"type": "geo_shape"
},
"geometry.coordinates": {
"type": "geo_point"
},
also
"geometry": {
"copy_to": "geometryShape",
"type": "geo_shape"
},
"geometryShape": {
"properties": {
"coordinates": {
"type": "geo_point"
}
}
}
also
"geometry": {
"copy_to": "geometryShape",
"properties": {
"coordinates": {
"type": "geo_point"
}
}
},
"geometryShape": {
"type": "geo_shape"
}
Any ideas on how to create this index properly?

If scripting is enabled, you can achieve this by specifying a transform in the mapping. It would look something along these lines:
PUT test/test_type/_mapping
{
"transform": {
"script": "if (ctx._source['geometry']['coordinates']) ctx._source['coordinates'] = ctx._source['geometry']['coordinates']",
"lang": "groovy"
},
"properties": {
"geometry": {
"type": "geo_shape"
},
"coordinates": {
"type": "geo_point"
}
}
}
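If mapping transforms are not available, one alternative is to duplicate the point on the client before indexing. The following is a minimal sketch, not part of the original answer: the helper name with_geo_point and the target field name location are hypothetical, and it assumes the index maps geometry as geo_shape and location as geo_point.

```python
import copy


def with_geo_point(doc, source_field="geometry", target_field="location"):
    """Return a copy of doc with a geo_point-style field derived from a GeoJSON Point.

    The field names are illustrative assumptions, not taken from the question.
    """
    geometry = doc.get(source_field) or {}
    result = copy.deepcopy(doc)
    if geometry.get("type") == "Point":
        # GeoJSON stores coordinates as [lon, lat]; the geo_point object form uses named keys.
        lon, lat = geometry["coordinates"][:2]
        result[target_field] = {"lat": lat, "lon": lon}
    return result


doc = {"geometry": {"type": "Point", "coordinates": [-87.662682, 41.843014]}}
indexed = with_geo_point(doc)
print(indexed["location"])
```

The geo_shape search can then target geometry while the _geo_distance sort targets the copied field.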

I'd go with a function_score query and your original mapping with just the geo_shape. The Elasticsearch docs suggest that scoring by distance is usually better than sorting by distance.
On another note, have you considered using a geo_bounding_box query with a geo_point? I'm not sure exactly what you're using the geo_shape type for, but you may be able to replicate it using the bounding box.

Related

How to avoid index explosion in ElasticSearch

I have two docs from the same index that originally look like this (only _source value is shown here)
{
"id" : "3",
"name": "Foo",
"property":{
"schemaId":"guid_of_the_RGB_schema_defined_extenally",
"value":{
"R":255,
"G":100,
"B":20
}
}
}
{
"id" : "2",
"name": "Bar",
"property":{
"schemaId":"guid_of_the_HSL_schema_defined_extenally",
"value":{
"H":255,
"S":100,
"L":20
}
}
}
The schema (used to validate value) is stored outside of ES, since it has nothing to do with indexing.
If I don't define a mapping, the value field is treated as an object mapping, and its set of subfields grows every time a new subfield appears.
Currently, ElasticSearch supports Flattened mapping https://www.elastic.co/guide/en/elasticsearch/reference/current/flattened.html to prevent this explosion in the index. However it has a limited support for searching for inner field due to its restriction: As with queries, there is no special support for numerics — all values in the JSON object are treated as keywords. When sorting, this implies that values are compared lexicographically.
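To see why keyword-style comparison is a problem for numeric ranges, here is a small illustration in plain Python. It only mimics lexicographic comparison with ordinary strings and is not Elasticsearch code; the values are the R, G and B numbers from the first document above.

```python
values = [255, 100, 20]

# Sorting as strings (the way keyword values compare) versus sorting as numbers.
as_keywords = sorted(str(v) for v in values)
as_numbers = sorted(values)

# A "range [10, 30]" check done on strings versus on numbers.
in_range_lexicographic = [s for s in map(str, values) if "10" <= s <= "30"]
in_range_numeric = [v for v in values if 10 <= v <= 30]

print(as_keywords)             # ['100', '20', '255']
print(as_numbers)              # [20, 100, 255]
print(in_range_lexicographic)  # ['255', '100', '20']
print(in_range_numeric)        # [20]
```

With string comparison every value falls inside the range, which is why a numeric range over B cannot be expressed reliably on keyword-like values.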
I need to be able to query the index for documents that match a given condition on an inner field (e.g. B in the range [10, 30]).
So far I have come up with a solution that structures my docs like this:
{
"id":4,
"name":"Boo",
"property":
{
"guid_of_the_normalized_RGB_schema_defined_extenally":
{
"R":0.1,
"G":0.2,
"B":0.5
}
}
}
Although this does not solve the mapping explosion, it mitigates some other issues.
The mapping for the property field now looks similar to this:
"property": {
"properties": {
"guid_of_the_RGB_schema_defined_extenally": {
"properties": {
"B": {
"type": "long"
},
"G": {
"type": "long"
},
"R": {
"type": "long"
}
}
},
"guid_of_the_normalized_RGB_schema_defined_extenally": {
"properties": {
"B": {
"type": "float"
},
"G": {
"type": "float"
},
"R": {
"type": "float"
}
}
},
"guid_of_the_HSL_schema_defined_extenally": {
"properties": {
"H": {
"type": "float"
},
"S": {
"type": "float"
},
"L": {
"type": "float"
}
}
}
}
}
This solves the case where fields have the same name but different data types.
Can someone suggest a solution that avoids the mapping explosion without suffering from the search limitations of the flattened type?
To avoid mapping explosion, the best solution is to normalize your data better.
You can set "dynamic": "strict" in your mapping; a doc will then be rejected if it contains a field that is not already in the mapping.
After that, you can still add new fields, but you will have to add them explicitly to the mapping first.
You can add a pipeline to clean up and normalize your data before ingestion.
If you don't want to, or cannot, reindex:
To keep your query simple even when you don't know the "middle" part of the key, you can use a multi_match query with a wildcard in the field name.
GET myindex/_search
{
"query": {
"multi_match": {
"query": 0.5,
"fields": ["property.*.B"]
}
}
}
But you will still not be able to sort it as you want.
For ordering on multiple 'unknown' field names without touching the data, you can use a script: https://www.elastic.co/guide/en/elasticsearch/painless/current/painless-sort-context.html
But maybe you could simplify the whole process by adding a dynamic template to your index.
PUT test/_mapping
{
"dynamic_templates": [
{
"unified_red": {
"path_match": "property.*.R",
"mapping": {
"type": "float",
"copy_to": "unified_color.R"
}
}
},
{
"unified_green": {
"path_match": "property.*.G",
"mapping": {
"type": "float",
"copy_to": "unified_color.G"
}
}
},
{
"unified_blue": {
"path_match": "property.*.B",
"mapping": {
"type": "float",
"copy_to": "unified_color.B"
}
}
}
],
"properties": {
"unified_color": {
"properties": {
"R": {
"type": "float"
},
"G": {
"type": "float"
},
"B": {
"type": "float"
}
}
}
}
}
Then you'll be able to query any value with the same query :
GET test/_search
{
"query": {
"range": {
"unified_color.B": {
"gte": 0.1,
"lte": 0.6
}
}
}
}
For already existing fields, you'll have to add the copy_to by yourself on the mapping, and after that run an _update_by_query to populate them.

FeatureCollection to geo_shape in Elasticsearch

What's the right way to translate a GeoJSON FeatureCollection to an Elasticsearch geo_shape?
I have a FeatureCollection looking like this:
{
"type": "FeatureCollection",
"features": [
{
"type": "Feature",
"geometry": {
"type": "Polygon",
"coordinates": [[[1.96, 42.455],[1.985,42.445]]]
}
},
{
"type": "Feature",
"geometry": {
"type": "Polygon",
"coordinates": [...]
}
}
]
}
How can I translate this into an Elasticsearch geo_shape?
Currently I just index it as is (dropping the type: Feature and type: FeatureCollection fields) and add a mapping saying:
"features": {
"geometry": {
"type": "geo_shape"
}
}
This seems to work fine, but feels wrong, as I give an array of geometries.
Is this okay, or would the right way be to translate the FeatureCollection to type geometrycollection, which clearly expects multiple geometry elements?
One follow-up question: can I run a query like "give me all elements geometrically inside element X" (where X is also in the index) in a single query, without fetching X and then running multiple follow-up queries for each polygon?
The GeometryCollection is probably what you're looking for.
So if you have this:
{
"type": "FeatureCollection",
"features": [
{
"type": "Feature",
"geometry": {
"type": "Polygon",
"coordinates": [[[1.96, 42.455],[1.985,42.445]]]
}
},
{
"type": "Feature",
"geometry": {
"type": "Polygon",
"coordinates": [...]
}
}
]
}
You can index it in ES like this:
PUT example
{
"mappings": {
"doc": {
"properties": {
"location": {
"type": "geo_shape"
}
}
}
}
}
POST /example/doc
{
"location" : {
"type": "geometrycollection",
"geometries": [
{
"type": "Polygon",
"coordinates": [[[1.96, 42.455],[1.985,42.445]]]
},
{
"type": "Polygon",
"coordinates": [...]
}
]
}
}
So basically, you simply need to:
change FeatureCollection to geometrycollection
change features to geometries
populate the geometries array with the geometry inner-objects
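The three steps above can be sketched as a small conversion function. This is an illustrative sketch only: the function name is hypothetical, and the sample polygons are made-up closed rings rather than data from the question.

```python
def feature_collection_to_geometry_collection(feature_collection):
    """Convert a GeoJSON FeatureCollection dict into a geometrycollection dict."""
    if feature_collection.get("type") != "FeatureCollection":
        raise ValueError("expected a GeoJSON FeatureCollection")
    # Keep only the inner geometry objects; feature properties are dropped.
    geometries = [
        feature["geometry"]
        for feature in feature_collection.get("features", [])
        if feature.get("geometry") is not None
    ]
    return {"type": "geometrycollection", "geometries": geometries}


# Hypothetical sample input with two small closed polygons.
sample = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "geometry": {
            "type": "Polygon",
            "coordinates": [[[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 0.0]]]}},
        {"type": "Feature", "geometry": {
            "type": "Polygon",
            "coordinates": [[[2.0, 2.0], [3.0, 2.0], [3.0, 3.0], [2.0, 2.0]]]}},
    ],
}

location = feature_collection_to_geometry_collection(sample)
print(location["type"], len(location["geometries"]))
```

The returned dict would be the value of the location field in the document to index.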
Regarding your query, you can do it like this:
POST /example/_search
{
"query":{
"bool": {
"filter": {
"geo_shape": {
"location": {
"shape": {
"type": "envelope",
"coordinates" : [[13.0, 53.0], [14.0, 52.0]]
},
"relation": "within"
}
}
}
}
}
}
The within relationship returns all documents whose geo_shape field is within the geometry given in the query.
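Since the question asks about a shape X that is already in the index, the geo_shape query's indexed_shape option may be a closer fit than an inline envelope, because it lets Elasticsearch look up the reference shape itself. Below is a hedged sketch that only builds the request body; the function name and the document id "X" are placeholders, and older versions that still use mapping types also expect a "type" entry inside indexed_shape.

```python
import json


def within_indexed_shape_query(field, shape_index, shape_id, shape_path):
    """Build a search body matching documents whose `field` lies within a pre-indexed shape."""
    return {
        "query": {
            "bool": {
                "filter": {
                    "geo_shape": {
                        field: {
                            # Reference to the document that holds shape X.
                            "indexed_shape": {
                                "index": shape_index,
                                "id": shape_id,
                                "path": shape_path,
                            },
                            "relation": "within",
                        }
                    }
                }
            }
        }
    }


# "X" is a placeholder id for the reference document.
body = within_indexed_shape_query("location", "example", "X", "location")
print(json.dumps(body, indent=2))
```

The resulting body would be sent to POST /example/_search like the envelope query above.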

Elasticsearch: Schema without mapping?

According to Elasticsearch's roadmap, mapping types are going to be completely removed in 7.x.
How are we going to give a schema structure to Documents without mapping?
For example how would we replace this (A Doc/mapping_type with 3 fields of specific data type):
PUT twitter
{
"mappings": {
"user": {
"properties": {
"name": { "type": "text" },
"user_name": { "type": "keyword" },
"email": { "type": "keyword" }
}
}
}
}
They are going to remove types (user in your example) from the mapping, because there is only one type per index now. The rest stays the same:
PUT twitter
{
"mappings": {
"_doc": {
"properties": {
"name": { "type": "text" },
"user_name": { "type": "keyword" },
"email": { "type": "keyword" }
}
}
}
}
As you can see, there is no user type anymore.
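In 7.x the typeless form of the request puts properties directly under mappings, with no type name level at all. As a rough sketch of that change, the hypothetical helper below removes the single type level from a create-index body; it assumes the body has exactly one mapping type and is not an official migration tool.

```python
def strip_mapping_type(body):
    """Return a create-index body with the single mapping type level removed."""
    mappings = body.get("mappings", {})
    # Leave the body alone if it is already typeless or has no single type to remove.
    if "properties" in mappings or len(mappings) != 1:
        return body
    ((_type_name, type_mapping),) = mappings.items()
    return {**body, "mappings": type_mapping}


typed = {
    "mappings": {
        "user": {
            "properties": {
                "name": {"type": "text"},
                "user_name": {"type": "keyword"},
                "email": {"type": "keyword"},
            }
        }
    }
}

typeless = strip_mapping_type(typed)
print(typeless)
```

The field definitions themselves are unchanged; only the user (or _doc) level disappears.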

Elasticsearch Field Preference for result sequence

I have created the index in elasticsearch with the following mapping:
{
"test": {
"mappings": {
"documents": {
"properties": {
"fields": {
"type": "nested",
"properties": {
"uid": {
"type": "keyword"
},
"value": {
"type": "text",
"copy_to": [
"fulltext"
]
}
}
},
"fulltext": {
"type": "text"
},
"tags": {
"type": "text"
},
"title": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
}
}
},
"url": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
}
}
}
}
}
}
}
}
When searching, I want to set a preference among fields. For example, if the search text is found in title or url, that document should come before other documents.
Can we set a field preference for the order of search results (in my case, in the order title, url, tags, fields)?
Please help me with this.
This is called "boosting". Prior to Elasticsearch 5.0.0, boosting could be applied at index time (as part of the field mapping) or at query time. Index-time boosting is now deprecated, and since 5.0 all boosts defined in mappings are applied at query time.
The current recommendation is to use query-time boosting.
Please read this document for details on how to use boosting:
https://www.elastic.co/guide/en/elasticsearch/guide/current/_boosting_query_clauses.html
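As an illustration of query-time boosting for this mapping, the sketch below builds a multi_match body using the field^weight syntax. The weights are arbitrary assumptions chosen to follow the title, url, tags, fields order from the question, and the nested fields values are reached through the fulltext field they are copied to. Higher boosts raise a field's contribution to the score but do not strictly guarantee ordering.

```python
import json


def boosted_search(text, boosts):
    """Build a multi_match search body with per-field query-time boosts."""
    fields = [f"{name}^{weight}" for name, weight in boosts.items()]
    return {"query": {"multi_match": {"query": text, "fields": fields}}}


# Hypothetical weights: larger numbers mean a stronger preference.
body = boosted_search(
    "elasticsearch",
    {"title": 4, "url": 3, "tags": 2, "fulltext": 1},
)
print(json.dumps(body, indent=2))
```

The body would be sent to GET test/_search; tuning the weights against real queries is usually necessary.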

elasticsearch query child list containing specific value

I am writing a query to return the products that have a specific promotionCode. In my index, each product has the following property indexed:
"offers": [
{
"promotionCode": "MV"
},
{
"promotionCode": "LI"
},
.....
]
My initial thought was that the following would be the answer:
GET alias-live-dev/_search
{
"query": {
"match": {
"offers.promotionCode":"MV"
}
}
}
However, this always returns 0 hits. I am guessing it fails because offers is a list. Could anyone please advise what the right query for this scenario would be? Thanks in advance.
In the mapping:
"productId": {
"type": "keyword"
},
"offers": {
"type": "nested",
"properties": {
......
"promotionCode": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
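Because the mapping above declares offers as type nested, a plain match on offers.promotionCode does not reach the nested objects; fields inside a nested object are normally queried through a nested query with the matching path. The following is a minimal sketch that builds such a body. The function name is hypothetical, and the index alias is the one from the question.

```python
import json


def nested_match(path, field, value):
    """Build a search body that matches `value` on a field inside a nested object."""
    return {
        "query": {
            "nested": {
                "path": path,
                # Inside a nested query the full dotted field name is still used.
                "query": {"match": {f"{path}.{field}": value}},
            }
        }
    }


body = nested_match("offers", "promotionCode", "MV")
print(json.dumps(body, indent=2))
```

The body would be sent to GET alias-live-dev/_search in place of the original match query.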
