Elasticsearch nested geo-shape query

Suppose I have the following mapping:
"mappings": {
"doc": {
"properties": {
"name": {
"type": "text"
},
"location": {
"type": "nested",
"properties": {
"point": {
"type": "geo_shape"
}
}
}
}
}
}
}
There is one document in the index:
POST /example/doc?refresh
{
"name": "Wind & Wetter, Berlin, Germany",
"location": {
"type": "point",
"coordinates": [13.400544, 52.530286]
}
}
How can I make a nested geo-shape query?
Example of usual geo-shape query from the documentation (the "bool" block can be skipped):
{
"query":{
"bool": {
"must": {
"match_all": {}
},
"filter": {
"geo_shape": {
"location": {
"shape": {
"type": "envelope",
"coordinates" : [[13.0, 53.0], [14.0, 52.0]]
},
"relation": "within"
}
}
}
}
}
}
Example of a nested query is:
{
"query": {
"nested" : {
"path" : "obj1",
"score_mode" : "avg",
"query" : {
"bool" : {
"must" : [
{ "match" : {"obj1.name" : "blue"} },
{ "range" : {"obj1.count" : {"gt" : 5}} }
]
}
}
}
}
}
Now, how do I combine them? The documentation mentions that the nested filter has been replaced by the nested query, which behaves as a query in “query context” and as a filter in “filter context”.
If I try a query against the point itself:
{
"query": {
"nested": {
"path": "location",
"query": {
"geo_shape": {
"location.point": {
"shape": {
"type": "point",
"coordinates": [
13.400544,
52.530286
]
},
"relation": "disjoint"
}
}
}
}
}
}
I still get the document back even though the relation is "disjoint", which is not correct. I tried different combinations with "bool", "filter", etc., but the query is ignored and the whole index is returned. Maybe it's impossible with this type of mapping?
Clearly I am missing something here. Can somebody help me out with that, please? Any help is greatly appreciated.
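One thing that may be worth checking, offered as a guess rather than a confirmed diagnosis: the mapping declares the geo_shape field at location.point, while the indexed document puts "type" and "coordinates" directly under location, so location.point may never be populated. Below is a minimal Python sketch of request bodies that line up with the mapping. The helper names doc_body and nested_geo_shape_query are hypothetical, and the resulting dictionaries would still have to be sent with whatever client you use.

```python
import json


def doc_body(name, lon, lat):
    """Document whose shape sits under location.point, as the mapping declares."""
    return {
        "name": name,
        "location": {"point": {"type": "point", "coordinates": [lon, lat]}},
    }


def nested_geo_shape_query(shape, relation="within"):
    """Wrap a geo_shape clause on location.point in a nested query on 'location'."""
    return {
        "query": {
            "nested": {
                "path": "location",
                "query": {
                    "geo_shape": {
                        "location.point": {"shape": shape, "relation": relation}
                    }
                },
            }
        }
    }


# Envelope taken from the documentation example quoted in the question.
envelope = {"type": "envelope", "coordinates": [[13.0, 53.0], [14.0, 52.0]]}
doc = doc_body("Wind & Wetter, Berlin, Germany", 13.400544, 52.530286)
query = nested_geo_shape_query(envelope)
print(json.dumps(doc, indent=2))
print(json.dumps(query, indent=2))
```

If the document is reindexed in this form, the "disjoint" and "within" variants of the query should at least be evaluated against a field that actually holds a shape.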

Related

How to filter on nested document length by script in Elasticsearch

I am trying to filter documents that have at least a given amount of items in a nested field, but I keep getting the following exception:
"caused_by" : {
"type" : "illegal_argument_exception",
"reason" : "No field found for [items] in mapping"
}
Here's example code to reproduce it:
PUT store
{
"mappings": {
"properties": {
"subject": {
"type": "keyword"
},
"items": {
"type": "nested",
"properties": {
"name": {
"type": "keyword"
},
"count": {
"type": "integer"
}
}
}
}
}
}
POST store/_bulk?refresh=true
{"create":{"_index":"store","_id":"1"}}
{"type":"appliance","items":[{"name":"Color TV"}]}
{"create":{"_index":"store","_id":"2"}}
{"type":"vehicle","items":[{"name":"Car"},{"name":"Bicycle"}]}
{"create":{"_index":"store","_id":"3"}}
{"type":"instrument","items":[{"name":"Guitar"},{"name":"Piano"},{"name":"Drums"}]}
GET store/_search
{
"query": {
"bool": {
"filter": [
{
"script": {
"script": {
"source": "doc['items'].size() > 1"
}
}
}
]
}
}
}
Please note that this is only a simplified version of the filter script I really want to write; if I can get past this, I will probably be able to solve my actual task as well.
Any help would be appreciated.
I ended up solving it with a custom score approach:
GET store/_search
{
"min_score": 0.1,
"query": {
"function_score": {
"query": {
"match_all": {}
},
"functions": [
{
"script_score": {
"script": {
"source": "params['_source']['items'].length > 1 ? 1 : 0"
}
}
}
]
}
}
}
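To make the logic of this workaround concrete, here is a small Python simulation of what the script_score plus min_score combination does to the three sample documents. It does not talk to Elasticsearch; the function names script_score and search are made up for illustration, and the scoring assumes the match_all score of 1 is multiplied by the script result.

```python
# The three documents from the _bulk request above, keyed by _id.
docs = {
    "1": {"type": "appliance", "items": [{"name": "Color TV"}]},
    "2": {"type": "vehicle", "items": [{"name": "Car"}, {"name": "Bicycle"}]},
    "3": {
        "type": "instrument",
        "items": [{"name": "Guitar"}, {"name": "Piano"}, {"name": "Drums"}],
    },
}


def script_score(source, min_items=1):
    # Mirrors: params['_source']['items'].length > 1 ? 1 : 0
    return 1 if len(source.get("items", [])) > min_items else 0


def search(documents, min_score=0.1):
    # min_score drops every hit whose final score is below the threshold.
    return [
        doc_id
        for doc_id, source in documents.items()
        if 1.0 * script_score(source) >= min_score
    ]


matching_ids = search(docs)
print(matching_ids)
```

Only the documents with more than one item survive the min_score cut, which is the effect the original script filter was aiming for.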

Elasticsearch query by generic properties: keywords and numeric values

I have this mapping in ES 7.9:
{
"mappings": {
"properties": {
"cid": {
"type": "keyword",
"store": true
},
"id": {
"type": "keyword",
"store": true
},
"a": {
"type": "nested",
"properties": {
"attribute":{
"type": "keyword"
},
"key": {
"type": "keyword"
},
"num": {
"type": "float"
}
}
}
}
}
}
And some documents indexed like:
{
"cid": "177",
"id": "1",
"a": [
{
"attribute": "tags",
"key": [
"heel",
"thong",
"low_heel",
"economic"
]
},
{
"attribute": "weight",
"num": 15
}
]
}
Basically, an object can have multiple attributes (a property array).
Those attributes can be different for each client. In this example, I have 2 types of attributes, tags and weight, but other documents could have other attributes like vendor, size, power, etc., so the model has to be generic enough to support attributes that are not known beforehand.
An attribute can be a list of keywords (like tags) or a numeric value (like weight).
I need an ES query to fetch the document IDs matching this pseudo-query:
cid="177" and (tag="flat" or tag="heel") and tag="economic" and weight<20
I managed to reach this query that seems to be working as expected:
{
"_source": ["id"],
"query": {
"bool": {
"must" : [
{"term" : { "cid" : "177" }},
{
"nested": {
"path": "a",
"query": {
"bool":{
"must":[
{"term" : { "a.attribute": "tags"}},
{"terms" : { "a.key": ["flat","heel"]}}
]
}
}
}
},
{
"nested": {
"path": "a",
"query": {
"bool":{
"must":[
{"term" : { "a.attribute": "tags"}},
{"term" : { "a.key": "economic"}}
]
}
}
}
},
{
"nested": {
"path": "a",
"query": {
"bool":{
"must":[
{"term" : { "a.attribute": "weight" } },
{"range": { "a.num": {"lt": 20} } }
]
}
}
}
}
]
}
}
}
Is this query correct, or am I getting the correct results by chance?
Is the query (or mapping) optimal, or should I rethink something?
Can the query be simplified?
The query is correct.
The mapping is great and the query is optimal.
The query can be simplified like this:
{
"_source": [
"id"
],
"query": {
"bool": {
"must": [
{
"term": {
"cid": "177"
}
},
{
"nested": {
"path": "a",
"query": {
"query_string": {
"query": "a.attribute:tags AND ((a.key:flat OR a.key:heel) AND a.key:economic)"
}
}
}
},
{
"nested": {
"path": "a",
"query": {
"query_string": {
"query": "a.attribute:weight AND a.num:<20"
}
}
}
}
]
}
}
}
That version would be less optimal, though, because these query_strings would still need to be compiled internally into essentially the query DSL that you've got above. Plus you'd still need the two separate nested groups, so you're good to roll with what you've got.
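If the verbosity is the main concern, another option is to keep the same query DSL and generate it in application code. The following Python sketch is only an illustration under that assumption; nested_attr and build_query are hypothetical helper names, not part of any Elasticsearch client.

```python
import json


def nested_attr(attribute, condition):
    """One nested clause: the attribute name and its condition must both
    hold inside the same nested object under path 'a'."""
    return {
        "nested": {
            "path": "a",
            "query": {
                "bool": {"must": [{"term": {"a.attribute": attribute}}, condition]}
            },
        }
    }


def build_query(cid, clauses):
    """Combine the client filter with any number of nested attribute clauses."""
    return {
        "_source": ["id"],
        "query": {"bool": {"must": [{"term": {"cid": cid}}, *clauses]}},
    }


# cid="177" and (tag="flat" or tag="heel") and tag="economic" and weight<20
query = build_query(
    "177",
    [
        nested_attr("tags", {"terms": {"a.key": ["flat", "heel"]}}),
        nested_attr("tags", {"term": {"a.key": "economic"}}),
        nested_attr("weight", {"range": {"a.num": {"lt": 20}}}),
    ],
)
print(json.dumps(query, indent=2))
```

The generated body has the same shape as the original query, so it should behave the same way while making it easier to add clauses for attributes that are not known in advance.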

Finding nested result of a certain parent

I have the following query, which gives me all periods (nested) plus the houses they belong to that have an arrival date in the period I specify.
Now I want to try and get just the arrivaldates for a certain house, but I cannot figure out the syntax of how to do this in Elasticsearch.
GET /houses/house/_search
{
"_source" : ["HouseId"],
"query": {
"nested": {
"path": "Periods",
"query": {
"bool": {
"must": [
{"range": {
"Periods.ArrivalDate": {
"gte" : "2017-10-01",
"lt" : "2017-11-01"
}
}
}
]
}
},
"inner_hits" : {}
}
}
}
The mapping is this (shortened to what I hope are the relevant parts):
{
"houses": {
"mappings": {
"house": {
"properties": {
"Periods": {
"type": "nested",
"properties": {
"ArrivalDate": {
"type": "date",
"format": "yyyy-MM-dd"
},
....
"HouseId": {
"type": "keyword"
},
So I would like to find the available arrival dates for a house with a certain HouseId within a certain month.
I think I have it figured out, but please let me know if better solutions are available:
{
"_source":[
"HouseCode",
"Country",
"Region"
],
"query":{
"bool":{
"must":[
{
"match":{
"HouseId":"someid"
}
},
{
"nested":{
"path":"Periods",
"query":{
"range":{
"Periods.ArrivalDate":{
"gte":"2017-05-01",
"lt":"2017-06-01"
}
}
},
"inner_hits":{
"size":1000
}
}
}
]
}
}
}
The query will return whole houses.
If you want to get only some Periods, you should use a nested aggregation combined with a filter aggregation:
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-nested-aggregation.html
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-filter-aggregation.html
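As a rough sketch of how the two aggregations could be combined for this case, the Python snippet below builds a request body that filters on the house and then narrows the nested Periods to one month. The function name periods_for_house, the aggregation names (periods, in_month, arrival_dates) and the bucket size of 100 are all assumptions made for illustration, as is the idea that HouseId is a keyword field on the root document.

```python
import json


def periods_for_house(house_id, start, end):
    """Request body: nested aggregation on Periods, narrowed by a filter
    aggregation to arrival dates in [start, end)."""
    date_range = {"range": {"Periods.ArrivalDate": {"gte": start, "lt": end}}}
    return {
        "size": 0,  # only the aggregation result is needed, not the hits
        "query": {"term": {"HouseId": house_id}},
        "aggs": {
            "periods": {
                "nested": {"path": "Periods"},
                "aggs": {
                    "in_month": {
                        "filter": date_range,
                        "aggs": {
                            "arrival_dates": {
                                # size is an arbitrary upper bound on buckets
                                "terms": {"field": "Periods.ArrivalDate", "size": 100}
                            }
                        },
                    }
                },
            }
        },
    }


body = periods_for_house("someid", "2017-05-01", "2017-06-01")
print(json.dumps(body, indent=2))
```

Each bucket under arrival_dates would then correspond to one arrival date of the selected house within the requested month.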

ElasticSearch query on tags

I am trying to crack the Elasticsearch query language, and so far I'm not doing very well.
I've got the following mapping for my documents.
{
"mappings": {
"jsondoc": {
"properties": {
"header" : {
"type" : "nested",
"properties" : {
"plainText" : { "type" : "string" },
"title" : { "type" : "string" },
"year" : { "type" : "string" },
"pages" : { "type" : "string" }
}
},
"sentences": {
"type": "nested",
"properties": {
"id": { "type": "integer" },
"text": { "type": "string" },
"tokens": { "type": "nested" },
"rhetoricalClass": { "type": "string" },
"babelSynsetsOcc": {
"type": "nested",
"properties" : {
"id" : { "type" : "integer" },
"text" : { "type" : "string" },
"synsetID" : { "type" : "string" }
}
}
}
}
}
}
}
}
It mainly resembles a JSON file referring to a pdf document.
I have been trying to make queries with aggregations, and so far it's going great. I've gotten to the point of grouping by (aggregating on) rhetoricalClass and getting the total number of repetitions of babelSynsetsOcc.synsetID. Heck, I can even run the same query while grouping the whole result by header.year.
But right now I am struggling with filtering the documents that contain a term and then running the same query.
So, how could I write a query that groups by rhetoricalClass and only takes into account those documents whose header.plainText field contains any of ["Computational", "Compositional", "Semantics"]? I mean contains, not equals!
If I were to make a rough translation to SQL it would be something similar to
SELECT count(sentences.babelSynsetsOcc.synsetID)
FROM jsondoc
WHERE header.plainText like '%Computational%' OR header.plainText like '%Compositional%' OR header.plainText like '%Semantics%'
GROUP BY sentences.rhetoricalClass
WHERE clauses are just standard structured queries, so they translate to queries in Elasticsearch.
GROUP BY and HAVING loosely translate to aggregations in Elasticsearch's DSL. Functions like count, min, max, and sum operate on the GROUP BY result, so they are aggregations as well.
The fact that you're using nested objects may be necessary, but it adds an extra layer to each part that touches them. If those nested objects are not arrays, then do not use nested; use object in that case.
I would probably look at translating your query to:
{
"query": {
"nested": {
"path": "header",
"query": {
"bool": {
"should": [
{
"match": {
"header.plainText" : "Computational"
}
},
{
"match": {
"header.plainText" : "Compositional"
}
},
{
"match": {
"header.plainText" : "Semantics"
}
}
]
}
}
}
}
}
Alternatively, it could be rewritten as this, which makes its intent a little less obvious:
{
"query": {
"nested": {
"path": "header",
"query": {
"match": {
"header.plainText": "Computational Compositional Semantics"
}
}
}
}
}
The aggregation would then be:
{
"aggs": {
"nested_sentences": {
"nested": {
"path": "sentences"
},
"aggs": {
"group_by_rhetorical_class": {
"terms": {
"field": "sentences.rhetoricalClass",
"size": 10
},
"aggs": {
"nested_babel": {
"nested": {
"path": "sentences.babelSynsetsOcc"
},
"aggs": {
"count_synset_id": {
"value_count": {
"field": "sentences.babelSynsetsOcc.synsetID"
}
}
}
}
}
}
}
}
}
}
Now, if you combine them and throw away hits (since you're just looking for the aggregated result), then it looks like this:
{
"size": 0,
"query": {
"nested": {
"path": "header",
"query": {
"match": {
"header.plainText": "Computational Compositional Semantics"
}
}
}
},
"aggs": {
"nested_sentences": {
"nested": {
"path": "sentences"
},
"aggs": {
"group_by_rhetorical_class": {
"terms": {
"field": "sentences.rhetoricalClass",
"size": 10
},
"aggs": {
"nested_babel": {
"nested": {
"path": "sentences.babelSynsetsOcc"
},
"aggs": {
"count_synset_id": {
"value_count": {
"field": "sentences.babelSynsetsOcc.synsetID"
}
}
}
}
}
}
}
}
}
}

Term, nested documents and must_not query incompatible in ElasticSearch?

I have trouble combining term, must_not queries on nested documents.
Sense example can be found here : http://sense.qbox.io/gist/be436a1ffa01e4630a964f48b2d5b3a1ef5fa176
Here is my mapping:
{
"mappings": {
"docs" : {
"properties": {
"tags" : {
"type": "nested",
"properties" : {
"type": {
"type": "string",
"index": "not_analyzed"
}
}
},
"label" : {
"type": "string"
}
}
}
}
}
with two documents in this index :
{
"tags" : [
{"type" : "POST"},
{"type" : "DELETE"}
],
"label" : "item 1"
},
{
"tags" : [
{"type" : "POST"}
],
"label" : "item 2"
}
When I query this index like this :
{
"query": {
"nested": {
"path": "tags",
"query": {
"bool": {
"must": {
"term": {
"tags.type": "DELETE"
}
}
}
}
}
}
}
I get one hit (which is correct).
When I want to get documents WHICH DON'T CONTAIN the tag "DELETE", with this query :
{
"query": {
"nested": {
"path": "tags",
"query": {
"bool": {
"must_not": {
"term": {
"tags.type": "delete"
}
}
}
}
}
}
}
I've got 2 hits (which is incorrect).
This issue seems very close to this one (Elasticsearch array must and must_not) but it's not...
Can you give me some clues to resolve this issue ?
Thank you
Your original query searches each individual nested object and eliminates the objects that don't match, but if any nested objects are left, they do match your query, and so the parent document is returned. This is because each nested object is indexed as a separate hidden document.
Original code:
{
"query": {
"nested": {
"path": "tags",
"query": {
"bool": {
"must_not": {
"term": {
"tags.type": "delete"
}
}
}
}
}
}
}
The solution is quite simple: move the bool query outside the nested query. Now every document that has a nested object with the "DELETE" type is discarded. Just what you wanted!
The solution:
{
"query": {
"bool": {
"must_not": {
"nested": {
"path": "tags",
"query": {
"term": {
"tags.type": "DELETE"
}
}
}
}
}
}
}
NOTE: Your strings are "not analyzed" and you searched for "delete" instead of "DELETE". If you want case-insensitive search, make your strings analyzed.
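The difference between the two placements of must_not can be mimicked in plain Python over the two sample documents. This is only a mental model of the matching semantics, not how Elasticsearch evaluates the query internally, and the function names are invented for the example.

```python
docs = [
    {"label": "item 1", "tags": [{"type": "POST"}, {"type": "DELETE"}]},
    {"label": "item 2", "tags": [{"type": "POST"}]},
]


def must_not_inside_nested(doc, value):
    # must_not inside the nested query: the parent matches as soon as ANY
    # nested object is not `value`.
    return any(tag["type"] != value for tag in doc["tags"])


def must_not_around_nested(doc, value):
    # must_not wrapping the nested query: the parent matches only if NO
    # nested object is `value`.
    return not any(tag["type"] == value for tag in doc["tags"])


inside = [d["label"] for d in docs if must_not_inside_nested(d, "DELETE")]
outside = [d["label"] for d in docs if must_not_around_nested(d, "DELETE")]
print(inside)
print(outside)
```

With the must_not inside, "item 1" still matches through its "POST" tag; with the must_not outside, only "item 2" remains.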
This should fix your problem: http://sense.qbox.io/gist/f4694f542bc76c29624b5b5c9b3ecdee36f7e3ea
The two most important things:
1. Set include_in_root on the nested "tags" field. This tells ES to also index the tag types on the root document, as "tags.type": ["POST", "DELETE"], so you can access an array of those values "flattened" on the root doc. This means you no longer need a nested query (see #2).
2. Drop the nested query.
{
"mappings": {
"docs" : {
"properties": {
"tags" : {
"type": "nested",
"properties" : {
"type": {
"type": "string",
"index": "not_analyzed"
}
},
"include_in_root": true
},
"label" : {
"type": "string"
}
}
}
}
}
{
"query": {
"bool": {
"must_not": {
"term": {
"tags.type": "DELETE"
}
}
}
}
}
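As a rough illustration of why the plain term query is enough once the values are copied to the root document, the Python sketch below flattens the tag types of the two sample documents and applies the must_not check to the flattened array. The function names are hypothetical and the flattening is a simplification of what include_in_root does.

```python
docs = [
    {"label": "item 1", "tags": [{"type": "POST"}, {"type": "DELETE"}]},
    {"label": "item 2", "tags": [{"type": "POST"}]},
]


def flatten_to_root(doc):
    # Simplified view of include_in_root: nested values are also indexed
    # as one array on the root document.
    return {"label": doc["label"], "tags.type": [t["type"] for t in doc["tags"]]}


def must_not_term(root_doc, field, value):
    # A term query on an array field matches if any element equals `value`;
    # must_not inverts that for the whole document.
    return value not in root_doc[field]


flattened = [flatten_to_root(d) for d in docs]
hits = [d["label"] for d in flattened if must_not_term(d, "tags.type", "DELETE")]
print(hits)
```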

Resources