Match document if it contains multiple nested documents in Elasticsearch

I have a document that contains arrays of nested documents. I need to return a match if the document contains all of the specified nested documents.
Here is the relevant part of the mapping:
"element": {
"dynamic": "false",
"properties": {
"tenantId": {
"type": "string",
"index": "not_analyzed"
},
"fqn": {
"type": "string",
"index": "not_analyzed"
},
"id": {
"type": "string",
"index": "not_analyzed"
},
"name": {
"type": "string",
"index": "not_analyzed"
},
"type": {
"type": "string",
"index": "not_analyzed"
},
"location": {
"type": "string",
"index": "not_analyzed"
},
"tags": {
"type": "nested",
"properties": {
"id": {
"type": "string",
"index": "not_analyzed"
},
"dataSourceId": {
"type": "long",
"index": "not_analyzed"
},
"name": {
"type": "string",
"index": "not_analyzed"
},
"value": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
The goal is to be able to return elements that contain all of a list of tags (although the element is permitted to contain additional tags beyond the search requirement).
Here is what I have so far:
{
"query": {
"bool": {
"filter": {
"nested": {
"path": "tags",
"query": {
"bool": {
"must": [
{
"bool": {
"must":{
"term": { "tags.name": "name1" },
"term": { "tags.value": "value1" }
}
}
},
{
"bool": {
"must":{
"term": { "tags.name": "name2" },
"term": { "tags.value": "value2" }
}
}
}
]
}
}
}
}
}
}
}
The problem with this approach is that it returns 0 hits with multiple tag values (it works fine for a single value). I believe that this is because the query is requiring that a tag have multiple names and values in order to match, which obviously can't happen. Does anyone know how to query for elements that contain all of a list of tags?
Edit: this is using Elasticsearch 5.0.
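The hypothesis above can be checked with a small model of nested-query semantics. This is a plain-Python sketch, not Elasticsearch code. It assumes, as the accepted fix below implies, that a nested query matches a parent document when at least one nested object satisfies all of that query's clauses. The function names are hypothetical.

```python
from typing import Dict, List

Tag = Dict[str, str]


def tag_satisfies(tag: Tag, group: Dict[str, str]) -> bool:
    """True if this single tag object has every field=value pair in `group`."""
    return all(tag.get(field) == value for field, value in group.items())


def one_nested_query(tags: List[Tag], groups: List[Dict[str, str]]) -> bool:
    # All groups inside ONE nested query must hold for the SAME tag object.
    return any(all(tag_satisfies(tag, g) for g in groups) for tag in tags)


def separate_nested_queries(tags: List[Tag], groups: List[Dict[str, str]]) -> bool:
    # One nested query per group: each group may be satisfied by a different tag.
    return all(any(tag_satisfies(tag, g) for tag in tags) for g in groups)


element_tags = [
    {"name": "name1", "value": "value1"},
    {"name": "name2", "value": "value2"},
    {"name": "name3", "value": "value3"},  # extra tags are allowed
]
wanted = [
    {"name": "name1", "value": "value1"},
    {"name": "name2", "value": "value2"},
]

print(one_nested_query(element_tags, wanted))         # False
print(separate_nested_queries(element_tags, wanted))  # True
```

With a single tag in the search list both forms agree, which matches the observation that the original query worked for one value.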

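We figured it out. The answer was to create two nested queries instead of putting two clauses in the same nested query. A nested query matches when a single nested document satisfies all of its clauses, so one nested query asking for both tags requires one tag to have two names and two values at once. With one nested query per tag, each tag can be matched by a different nested document.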
{
"query":{
"bool":{
"must":[{
"nested":{
"path":"tags",
"query":{
"bool":{
"must":[
{"term":{"tags.name":"name1"}},
{"term":{"tags.value":"value1"}}
]
}
}
}
},
{
"nested":{
"path":"tags",
"query":{
"bool":{
"must":[
{"term":{"tags.name":"name2"}},
{"term":{"tags.value":"value2"}}
]
}
}
}
}]
}
}
}
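To search for an arbitrary list of tags, the same shape can be generated in code. A minimal Python sketch: the function name all_tags_query is hypothetical, and it only builds the request body (sending it to the cluster is left out).

```python
import json
from typing import Dict, List, Tuple


def all_tags_query(tags: List[Tuple[str, str]]) -> Dict:
    """Build one nested clause per (name, value) pair, all AND-ed under bool.must."""
    clauses = [
        {
            "nested": {
                "path": "tags",
                "query": {
                    "bool": {
                        # Both terms must hold for the same tag object.
                        "must": [
                            {"term": {"tags.name": name}},
                            {"term": {"tags.value": value}},
                        ]
                    }
                },
            }
        }
        for name, value in tags
    ]
    return {"query": {"bool": {"must": clauses}}}


print(json.dumps(all_tags_query([("name1", "value1"), ("name2", "value2")]), indent=2))
```

Because the tags only restrict which elements match, the clauses could also go under "filter" instead of "must" to skip scoring; the set of matching elements should be the same.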

Related

using match_phrase twice on ES query filter? No filter registered for [match_phrase]

I have been trying to fetch a document using multiple filters.
I'm currently using ES 1.7. Is it possible to use match_phrase twice in a filter?
Example: a people document.
q=aaron&address=scarborough: searching for a person by name and address works fine.
{
"query": {
"match_phrase": {
"name": "aaron"
}
},
"filter": {
"bool": {
"must": {
"nested": {
"path": "addresses",
"query": {
"match_phrase": {
"address": "scarborough"
}
}
}
}
}
}
}
q=aaron&phone=813-689-6889: searching for a person by name and phone number also works fine.
{
"query": {
"match_phrase": {
"name": "aaron"
}
},
"filter": {
"bool": {
"must": {
"query": {
"match_phrase": {
"phone": "813-689-6889"
}
}
}
}
}
}
However, when I try to use both filters (address and phone), I get a "No filter registered for [match_phrase]" error,
for example with q=aaron&address=scarborough&phone=813-689-6889:
{
"query": {
"match_phrase": {
"name": "aaron"
}
},
"filter": {
"bool": {
"must": {
"nested": {
"path": "addresses",
"query": {
"match_phrase": {
"address": "scarborough"
}
}
},
"query": {
"match_phrase": {
"phone": "813-689-6889"
}
}
}
}
}
The error when using the address and phone filters together:
nested: QueryParsingException[[pl_people] No filter registered for [match_phrase]]; }]","status":400}):
Index mapping (person), as requested:
{
"pl_people": {
"mappings": {
"person": {
"properties": {
"ac_name": {
"type": "string",
"analyzer": "autocomplete"
},
"date_of_birth": {
"type": "date",
"format": "dateOptionalTime"
},
"email": {
"type": "string"
},
"first_name": {
"type": "string",
"fields": {
"na_first_name": {
"type": "string",
"index": "not_analyzed"
}
}
},
"last_name": {
"type": "string",
"fields": {
"na_last_name": {
"type": "string",
"index": "not_analyzed"
}
}
},
"middle_name": {
"type": "string",
"fields": {
"na_middle_name": {
"type": "string",
"index": "not_analyzed"
}
}
},
"name": {
"type": "string",
"fields": {
"na_name": {
"type": "string",
"index": "not_analyzed"
},
"ngram_name": {
"type": "string",
"analyzer": "my_start"
},
"ns_name": {
"type": "string",
"analyzer": "no_stopwords"
}
}
},
"phone": {
"type": "string"
},
"time": {
"type": "date",
"format": "dateOptionalTime"
},
"updated_at": {
"type": "date",
"format": "dateOptionalTime"
}
}
}
}
}
}
Maybe you can use a term filter instead of match_phrase as a filter.
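Not part of the original answer: in the failing request the single "must" object holds two keys ("nested" and "query") rather than two separate filters, which is a likely source of the parsing error. Since the ES 1.x bool filter accepts an array for "must", one way to combine the two filters is to give each clause its own object. A Python sketch that only builds the request body; the function name is hypothetical and this has not been run against the asker's index.

```python
import json


def name_address_phone_body(name: str, address: str, phone: str) -> dict:
    """Build an ES 1.x-style body: a match_phrase query plus two AND-ed filters."""
    return {
        "query": {"match_phrase": {"name": name}},
        "filter": {
            "bool": {
                # One object per filter clause, instead of two keys in one object.
                "must": [
                    {
                        "nested": {
                            "path": "addresses",
                            "query": {"match_phrase": {"address": address}},
                        }
                    },
                    # The "query" filter wraps a query so it can be used as a filter.
                    {"query": {"match_phrase": {"phone": phone}}},
                ]
            }
        },
    }


print(json.dumps(name_address_phone_body("aaron", "scarborough", "813-689-6889"), indent=2))
```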

Elasticsearch geo_distance in combination with other queries

Hello, I have a problem combining multiple queries in Elasticsearch.
The problem only occurs when I combine a multi_match query with a geo_distance query. Each query on its own returns the results I expect, and both result sets contain the document I expect to receive when the two queries are executed together. But when I execute them together I get 0 results.
When I combine the geo_distance query with a simple term query, the search works, so I presume the problem lies in this particular combination of queries.
I would appreciate any ideas.
My query is the following:
{
"query": {
"bool": {
"must": {
"bool": {
"should": {
"multi_match": {
"query": "CompanyName GmbH",
"fields": [
"originalName",
"legalName"
],
"type": "cross_fields",
"operator": "AND"
}
}
}
},
"filter": {
"bool": {
"should": {
"geo_distance": {
"location": [
9.87107,
51.69915
],
"distance": "30.0km",
"distance_type": "arc"
}
}
}
}
}
}
}
The mapping behind all of that is:
{
"customer": {
"aliases": {
},
"mappings": {
"customer-entity": {
"properties": {
"communication": {
"properties": {
"domain": {
"type": "string"
},
"email": {
"type": "string"
},
"landline": {
"type": "string"
},
"mobile": {
"type": "string"
}
}
},
"id": {
"type": "long"
},
"legalName": {
"type": "string",
"store": true
},
"location": {
"type": "geo_point"
},
"operatingModes": {
"type": "string"
},
"originalName": {
"type": "string",
"store": true
}
}
},
"homepage-entity": {
"_parent": {
"type": "customer-entity"
},
"_routing": {
"required": true
},
"properties": {
"customerId": {
"type": "string",
"store": true
},
"id": {
"type": "long"
},
"metas": {
"type": "string",
"store": true
}
}
},
"person-entity": {
"_parent": {
"type": "customer-entity"
},
"_routing": {
"required": true
},
"properties": {
"customerId": {
"type": "string",
"store": true
},
"firstName": {
"type": "string",
"store": true
},
"id": {
"type": "long"
},
"lastName": {
"type": "string",
"store": true
},
"personId": {
"type": "string",
"store": true
}
}
}
},
"settings": {
"index": {
"refresh_interval": "-1",
"number_of_shards": "1",
"creation_date": "1488920698118",
"store": {
"type": "fs"
},
"number_of_replicas": "0",
"uuid": "ZcLN5sxASXGUnKZMg8mBpw",
"version": {
"created": "2040499"
}
}
},
"warmers": {
}
}
}

Unable to drop result bucket in terms aggregation - Elasticsearch

I have documents in Elasticsearch with the following structure:
"mappings": {
"document": {
"properties": {
"#timestamp": {
"type": "date",
"format": "strict_date_optional_time||epoch_millis"
},
"#version": {
"type": "string"
},
"id_secuencia": {
"type": "long"
},
"event": {
"properties": {
"elapsedTime": {
"type": "double"
},
"requestTime": {
"type": "date",
"format": "strict_date_optional_time||epoch_millis"
},
"error": {
"properties": {
"errorCode": {
"type": "string",
"index": "not_analyzed"
},
"failureDetail": {
"type": "string"
},
"fault": {
"type": "string"
}
}
},
"file": {
"type": "string",
"index": "not_analyzed"
},
"messageId": {
"type": "string"
},
"request": {
"properties": {
"body": {
"type": "string"
},
"header": {
"type": "string"
}
}
},
"responseTime": {
"type": "date",
"format": "strict_date_optional_time||epoch_millis"
},
"service": {
"properties": {
"operation": {
"type": "string",
"index": "not_analyzed"
},
"project": {
"type": "string",
"index": "not_analyzed"
},
"proxy": {
"type": "string",
"index": "not_analyzed"
},
"version": {
"type": "string",
"index": "not_analyzed"
}
}
},
"timestamp": {
"type": "date",
"format": "strict_date_optional_time||epoch_millis"
},
"user": {
"type": "string",
"index": "not_analyzed"
}
}
},
"type": {
"type": "string"
}
}
}
}
And I need to retrieve a list of unique values for the field "event.file" (to show in a Kibana Data Table) according to the following criteria:
There is more than one document with the same value for the field "event.file"
All the occurrences of that value of "event.file" have resulted in an error (the field "event.error.errorCode" exists in all of those documents)
For that purpose, the approach I've been testing is a terms aggregation, so I can get a list of buckets with all the documents for a single file name. What I haven't been able to achieve is dropping some of the resulting buckets according to the criteria above (if at least one of the documents does not have an error, the bucket should be discarded).
Is this the correct approach or is there a better/easier way to get this type of result?
Thanks a lot.
After trying out several queries I found the following approach (see query below) to be valid for my purpose. The problem I see now is that apparently it is not possible to do this in Kibana, as it has no support for pipeline aggregations (see https://github.com/elastic/kibana/issues/4584).
{
"query": {
"bool": {
"must": [
{
"filtered": {
"filter": {
"exists": {
"field": "event.file"
}
}
}
}
]
}
},
"size": 0,
"aggs": {
"file-events": {
"terms": {
"field": "event.file",
"size": 0,
"min_doc_count": 2
},
"aggs": {
"files": {
"filter": {
"exists": {
"field": "event.file"
}
},
"aggs": {
"totalFiles": {
"value_count": {
"field": "event.file"
}
}
}
},
"errors": {
"filter": {
"exists": {
"field": "event.error.errorCode"
}
},
"aggs": {
"totalErrors": {
"value_count": {
"field": "event.error.errorCode"
}
}
}
},
"exhausted": {
"bucket_selector": {
"buckets_path": {
"total_files":"files>totalFiles",
"total_errors":"errors>totalErrors"
},
"script": "total_errors == total_files"
}
}
}
}
}
}
Again, if I'm missing something feedback will be appreciated :)
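Since Kibana cannot run the bucket_selector, one workaround is to apply the same criteria on the client after fetching the terms aggregation. A minimal Python sketch, assuming a response with the aggregation names used above ("file-events" with an "errors" filter sub-aggregation). The function name and the sample response are hypothetical. It compares document counts (the bucket's doc_count against the errors filter's doc_count) instead of value counts.

```python
from typing import Any, Dict, List


def files_where_all_failed(response: Dict[str, Any], min_docs: int = 2) -> List[str]:
    """Keep file names whose bucket has >= min_docs documents, all with an error code."""
    buckets = response["aggregations"]["file-events"]["buckets"]
    return [
        bucket["key"]
        for bucket in buckets
        # "errors" is a filter sub-aggregation, so its doc_count is the number of
        # documents in this bucket that have event.error.errorCode.
        if bucket["doc_count"] >= min_docs
        and bucket["errors"]["doc_count"] == bucket["doc_count"]
    ]


# Hypothetical response shape for illustration only.
sample = {
    "aggregations": {
        "file-events": {
            "buckets": [
                {"key": "a.xml", "doc_count": 3, "errors": {"doc_count": 3}},
                {"key": "b.xml", "doc_count": 2, "errors": {"doc_count": 1}},
                {"key": "c.xml", "doc_count": 1, "errors": {"doc_count": 1}},
            ]
        }
    }
}
print(files_where_all_failed(sample))  # ['a.xml']
```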

elasticsearch "having not" query

Some documents have a category field, and in some of them its value is "-1". I need a query that returns documents which have a category field whose value is not "-1".
I tried this:
GET webproxylog/_search
{
"query": {
"filtered": {
"filter": {
"not":{
"filter": {"and": {
"filters": [
{"term": {
"category": "-1"
}
},
{
"missing": {
"field": "category"
}
}
]
}}
}
}
}
}
}
But it doesn't work: it returns documents that don't have a category field.
EDIT
Mapping:
{
"webproxylog": {
"mappings": {
"accesslog": {
"properties": {
"category": {
"type": "string",
"index": "not_analyzed"
},
"clientip": {
"type": "string",
"index": "not_analyzed"
},
"clientmac": {
"type": "string",
"index": "not_analyzed"
},
"clientname": {
"type": "string",
"index": "not_analyzed"
},
"duration": {
"type": "long"
},
"filetype": {
"type": "string",
"index": "not_analyzed"
},
"hierarchycode": {
"type": "string",
"index": "not_analyzed"
},
"loggingdate": {
"type": "date",
"format": "dateOptionalTime"
},
"reqmethod": {
"type": "string",
"index": "not_analyzed"
},
"respsize": {
"type": "long"
},
"resultcode": {
"type": "string",
"index": "not_analyzed"
},
"url": {
"type": "string",
"analyzer": "slash_analyzer"
},
"user": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}
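If your category field is a string and is analyzed (the default), then "-1" will be indexed as "1" (the minus sign is stripped).
You will need that field to be not_analyzed, or to add a not_analyzed sub-field (as in my solution below).
Separately, wrapping "and" in "not" cannot work: no document both has category "-1" and is missing category, so the "and" never matches and the "not" keeps every document. Putting both conditions under "must_not" instead excludes a document if either one matches.
Something like this: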
DELETE test
PUT /test
{
"mappings": {
"test": {
"properties": {
"category": {
"type": "string",
"fields": {
"notAnalyzed": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}
}
POST /test/test/1
{"category": "-1"}
POST /test/test/2
{"category": "2"}
POST /test/test/3
{"category": "3"}
POST /test/test/4
{"category": "4"}
POST /test/test/5
{"category2": "-1"}
GET /test/test/_search
{
"query": {
"bool": {
"must_not": [
{
"term": {
"category.notAnalyzed": {
"value": "-1"
}
}
},
{
"filtered": {
"filter": {
"missing": {
"field": "category"
}
}
}
}
]
}
}
}
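Why "must_not" works where "not" around "and" did not can be seen by evaluating both conditions over the five sample documents indexed above. This is a plain-Python model of the boolean logic only: it ignores analysis and compares raw source values, and the helper names are hypothetical.

```python
# The five sample documents from the answer, keyed by id.
docs = {
    1: {"category": "-1"},
    2: {"category": "2"},
    3: {"category": "3"},
    4: {"category": "4"},
    5: {"category2": "-1"},  # no "category" field at all
}


def is_minus_one(doc: dict) -> bool:
    return doc.get("category") == "-1"


def is_missing(doc: dict) -> bool:
    return "category" not in doc


def original_filter(doc: dict) -> bool:
    # not(and(term, missing)): the AND is never true, so NOT keeps everything.
    return not (is_minus_one(doc) and is_missing(doc))


def must_not_query(doc: dict) -> bool:
    # bool.must_not with two clauses drops a doc if EITHER clause matches.
    return not (is_minus_one(doc) or is_missing(doc))


kept_original = sorted(i for i, d in docs.items() if original_filter(d))
kept_must_not = sorted(i for i, d in docs.items() if must_not_query(d))
print(kept_original)  # [1, 2, 3, 4, 5]
print(kept_must_not)  # [2, 3, 4]
```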

Elasticsearch nested match_phrase issue

We are running a match_phrase query on nested objects, where each nested object has only a string value.
We want to find occurrences of a phrase.
Let's suppose:
1) The mapping is as follows:
"attr": {
"type": "nested",
"properties": {
"attr": {
"type": "multi_field",
"fields": {
"attr": { "type": "string", "index": "analyzed", "include_in_all": true, "analyzer": "keyword" },
"untouched": { "type": "string", "index": "analyzed", "include_in_all": false, "analyzer": "not_analyzed" }
}
}
}
}
2) The data looks like this:
Object A:
"attr": [
{
"attr": "beverage"
},
{
"attr": "apple wine"
}
]
Object B:
"attr": [
{
"attr": "beverage"
},
{
"attr": "apple"
},
{
"attr": "wine"
}
]
3) With a query like this:
{
"query": {
"match": {
"_all": {
"query": "apple wine",
"type": "phrase"
}
}
}
}
We expect only Object A, but Object B is also returned.
We look forward to your suggestions.
In your case, separate array values need a large gap between their token positions so that a phrase cannot match across two values.
The gap between instances of the same field is configurable, but its default value is 0.
You should change it in the field mapping:
"attr": { "type": "string",
"index": "analyzed",
"include_in_all": true,
"analyzer": "keyword",
"position_offset_gap": 100
}
You will also need to tell the query to search all terms in one nested doc:
"query": {
"nested": {
"path": "attr",
"query": {
"match": {
"attr": {
"query": "apple wine",
"operator": "and"
}
}
}
}
}
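The effect of the gap can be shown with a simplified model of token positions. This sketch whitespace-tokenizes each array value, advances the position by the gap between values, and treats a phrase as matching when its words sit at consecutive positions. It approximates, rather than reproduces, Lucene's position arithmetic, and the function names are hypothetical.

```python
from typing import List, Set, Tuple


def positions(values: List[str], gap: int) -> Set[Tuple[str, int]]:
    """Assign (token, position) pairs to a multi-valued field."""
    out, pos = set(), 0
    for i, value in enumerate(values):
        if i > 0:
            pos += gap  # extra distance inserted between array values
        for token in value.split():
            out.add((token, pos))
            pos += 1
    return out


def phrase_matches(values: List[str], phrase: str, gap: int) -> bool:
    """True if the phrase's words occur at consecutive positions."""
    present = positions(values, gap)
    words = phrase.split()
    starts = [p for token, p in present if token == words[0]]
    return any(all((w, p + k) in present for k, w in enumerate(words)) for p in starts)


object_a = ["beverage", "apple wine"]
object_b = ["beverage", "apple", "wine"]
print(phrase_matches(object_b, "apple wine", gap=0))    # True: "wine" directly follows "apple"
print(phrase_matches(object_b, "apple wine", gap=100))  # False: the gap separates them
print(phrase_matches(object_a, "apple wine", gap=100))  # True: both words are in one value
```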
A good source of information is http://www.spacevatican.org/2012/6/3/fun-with-elasticsearch-s-children-and-nested-documents/
