Elasticsearch multilayer nested properties

I have an index mapping like this:
"mappings": {
"properties": {
"filter": {
"type": "nested",
"properties": {
"Hersteller": {
"type": "nested",
"properties": {
"id": {
"type": "text",
"analyzer": "analyzerFilter",
"fielddata": true
},
"value": {
"type": "text",
"analyzer": "analyzerFilter",
"fielddata": true
}
}
},
"Modell": {
"type": "nested",
"properties": {
"id": {
"type": "text",
"analyzer": "analyzerFilter",
"fielddata": true
},
"value": {
"type": "text",
"analyzer": "analyzerFilter",
"fielddata": true
}
}
}
}
},
"id": {
"type": "text",
"analyzer": "analyzerFilter"
}
}
}
}
There are two nested levels here, filter and filter.Modell. I need a query that returns all unique filter.Modell.value entries where filter.Hersteller.value equals some predefined value.
I am first trying it without any condition:
{
"size": 4,
"aggs": {
"distinct_filter": {
"nested": { "path": "filter" },
"aggs": {
"distinct_filter_modell": {
"nested": {
"path": "filter.Modell",
"aggs": {
"distinct_filter_modell_value": {
"terms": { "field": "filter.Modell.value" }
}
}
}
}
}
}
}
}
And I get an error like this:
{
"error": {
"root_cause": [
{
"type": "parsing_exception",
"reason": "Unexpected token START_OBJECT in [distinct_filter_modell].",
"line": 1,
"col": 144
}
],
"type": "parsing_exception",
"reason": "Unexpected token START_OBJECT in [distinct_filter_modell].",
"line": 1,
"col": 144
},
"status": 400
}
Thanks in advance
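As the error hints, the inner aggs object in the query above is placed inside the nested block, but it must be a sibling of nested, not a child of it. A sketch of a corrected query that also applies the Hersteller condition ("some-hersteller" is a placeholder value; filtering with a nested query on filter.Hersteller, then aggregating on filter.Modell):

```json
{
  "size": 0,
  "query": {
    "nested": {
      "path": "filter.Hersteller",
      "query": {
        "term": { "filter.Hersteller.value": "some-hersteller" }
      }
    }
  },
  "aggs": {
    "distinct_filter_modell": {
      "nested": { "path": "filter.Modell" },
      "aggs": {
        "distinct_filter_modell_value": {
          "terms": { "field": "filter.Modell.value" }
        }
      }
    }
  }
}
```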

Related

Adding a script-based field to an Elasticsearch index mapping

I am following these docs: https://www.elastic.co/guide/en/elasticsearch/reference/current/runtime-indexed.html
I have a field which I would like to be scripted at index time rather than at runtime, and according to the docs above I can do that simply by putting the field and its script inside the mappings object as normal.
Here is a simplified version of the index I'm trying to create
{
"settings": {
"analysis": {
"analyzer": {
"case_insensitive_analyzer": {
"type": "custom",
"filter": ["lowercase"],
"tokenizer": "keyword"
}
}
}
},
"mappings": {
"properties": {
"id": {
"type": "text"
},
"events": {
"properties": {
"fields": {
"type": "text"
},
"id": {
"type": "text"
},
"event": {
"type": "text"
},
"time": {
"type": "date"
},
"user": {
"type": "text"
},
"state": {
"type": "integer"
}
}
},
"eventLast": {
"type": "date",
"on_script_error": "fail",
"script": {
"source": "def events = doc['events']; emit(events[events.length-1].time.value"
}
}
}
}
}
I'm getting this 400 error back:
{
"error": {
"root_cause": [
{
"type": "mapper_parsing_exception",
"reason": "unknown parameter [script] on mapper [eventLast] of type [date]"
}
],
"type": "mapper_parsing_exception",
"reason": "Failed to parse mapping [_doc]: unknown parameter [script] on mapper [eventLast] of type [date]",
"caused_by": {
"type": "mapper_parsing_exception",
"reason": "unknown parameter [script] on mapper [eventLast] of type [date]"
}
},
"status": 400
}
Essentially I'm trying to create a scripted indexed field that is calculated off the last event time in the events array of the document.
Thanks
TL;DR
As the error states, you cannot define your script there.
There is a specific way to create runtime fields in Elasticsearch: you need to put the definition at the root of the JSON, in the runtime object.
Solution
{
"settings": {
"analysis": {
"analyzer": {
"case_insensitive_analyzer": {
"type": "custom",
"filter": ["lowercase"],
"tokenizer": "keyword"
}
}
}
},
"runtime": {
"eventLast": {
"type": "date",
"on_script_error": "fail",
"script": {
"source": "def events = doc['events']; emit(events[events.length-1].time.value"
}
}
},
"mappings": {
"properties": {
"id": {
"type": "text"
},
"events": {
"properties": {
"fields": {
"type": "text"
},
"id": {
"type": "text"
},
"event": {
"type": "text"
},
"time": {
"type": "date"
},
"user": {
"type": "text"
},
"state": {
"type": "integer"
}
}
}
}
}
}
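One caveat with this script (an observation, not part of the original answer): doc['events'] accesses the object field itself, which has no doc values, so the script will fail at search time; only leaf fields such as events.time can be read this way. A sketch of a variant under that assumption (date doc values are returned sorted, so the last element is the latest):

```json
"runtime": {
  "eventLast": {
    "type": "date",
    "on_script_error": "fail",
    "script": {
      "source": "if (doc['events.time'].size() > 0) { emit(doc['events.time'][doc['events.time'].size() - 1].toInstant().toEpochMilli()); }"
    }
  }
}
```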

Get all the buckets for an aggregate - Elasticsearch

I want to get all the buckets available for a particular aggregate. Is there any query or endpoint to get the buckets?
Below is my mapping. If I query with any filter, the related buckets come up, but I want all the buckets so I can show them on the frontend and support OR operations.
Example: if we have 2 records, one with category chair and the other with category table, and I select chair, the table count is returned as zero, but it should show the table count as 1 so the user can select both.
My mapping:
{
"properties": {
"australiasellable": {
"type": "boolean"
},
"avgRating": {
"type": "float"
},
"categories": {
"type": "nested"
},
"category": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"categorycode": {
"type": "text",
"fielddata": true
},
"categoryname": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"colour": {
"type": "text",
"fielddata": true
},
"commercialuse": {
"type": "boolean"
},
"customisable": {
"type": "boolean"
},
"depth": {
"type": "float"
},
"freedelivery": {
"type": "boolean"
},
"height": {
"type": "float"
},
"listprice": {
"type": "float"
},
"location": {
"type": "geo_point"
},
"material": {
"type": "text",
"fielddata": true
},
"materialcode": {
"type": "text",
"fielddata": true
},
"message": {
"type": "geo_point"
},
"numberOfRating": {
"type": "long"
},
"online": {
"type": "boolean"
},
"outdooruse": {
"type": "boolean"
},
"productid": {
"type": "long"
},
"productimageurl": {
"type": "text",
"fielddata": true
},
"productname": {
"type": "text",
"fielddata": true
},
"producttypecode": {
"type": "text",
"fielddata": true
},
"sellercode": {
"type": "text",
"fielddata": true
},
"sellerdescription": {
"type": "text",
"fielddata": true
},
"shortdescription": {
"type": "text",
"fielddata": true
},
"sku": {
"type": "text",
"fielddata": true
},
"state": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"stylecode": {
"type": "text",
"fielddata": true
},
"warrantycode": {
"type": "text",
"fielddata": true
},
"weight": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"width": {
"type": "float"
}
}
}
Regards,
Sreenivas
A possible solution would be not to set the filter in the query section of your payload, but rather to perform filtered aggregations and use top_hits to get the _source of the matched docs.
Long story short: if you apply a query, it will of course affect your aggregations. So the trick is to not apply any query (either use match_all or remove the whole query object) and to perform the queries in sub-aggregations, as follows.
Using your category field:
GET your_index/_search
{
"size": 0,
"query": {
"match_all": {}
},
"aggs": {
"actual_query_agg": {
"filter": {
"term": {
"category.keyword": {
"value": "chair"
}
}
},
"aggs": {
"actual_query_agg_top_hits": {
"top_hits": {
"_source": [
"category"
],
"size": 10
}
}
}
},
"excluding_my_query_filtered_agg": {
"filter": {
"bool": {
"must_not": {
"term": {
"category.keyword": "chair"
}
}
}
},
"aggs": {
"by_other_categories_agg": {
"terms": {
"field": "category.keyword",
"size": 10
},
"aggs": {
"categorized_other_docs_agg_top_hits": {
"top_hits": {
"_source": [
"category"
],
"size": 10
}
}
}
}
}
}
}
}
You can get rid of the top_hits sub-aggregations if you're just interested in the counts and not the underlying docs, i.e.:
GET your_index/_search
{
"size": 0,
"query": {
"match_all": {}
},
"aggs": {
"actual_query_agg": {
"filter": {
"term": {
"category.keyword": {
"value": "chair"
}
}
}
},
"excluding_my_query_filtered_agg": {
"filter": {
"bool": {
"must_not": {
"term": {
"category.keyword": "chair"
}
}
}
},
"aggs": {
"by_other_categories_agg": {
"terms": {
"field": "category.keyword",
"size": 10
}
}
}
}
}
}
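Another pattern worth knowing for this faceted-navigation use case (not from the original answer, so treat it as a hedged alternative): post_filter computes aggregations over everything the query matches and applies the filter only to the returned hits afterwards, so selecting chair still yields the table bucket with its real count:

```json
GET your_index/_search
{
  "size": 10,
  "aggs": {
    "all_categories": {
      "terms": { "field": "category.keyword", "size": 10 }
    }
  },
  "post_filter": {
    "term": { "category.keyword": "chair" }
  }
}
```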

Elasticsearch index_out_of_bounds exception for multi level nested and reverse_nested aggregation

I have an Elasticsearch index with multiple deep nesting levels. I am performing a terms aggregation on one deep-level field and want to fetch all the records for another associated deep-level field, for which I am using the top_hits aggregation. But my query returns an index_out_of_bounds exception.
Here are the index mappings:
{
"mappings": {
"type": {
"properties": {
"campaigns": {
"type": "nested",
"properties": {
"campaign_id": {
"type": "integer"
},
"campaign_name": {
"type": "text"
},
"contents": {
"type": "nested",
"properties": {
"content_id": {
"type": "integer"
},
"content_name": {
"type": "text",
"fielddata": true
}
}
}
}
},
"forms": {
"type": "nested",
"properties": {
"form_id": {
"type": "integer"
},
"form_issubmitted": {
"type": "integer"
},
"form_name": {
"type": "text"
},
"form_tabs": {
"type": "nested",
"properties": {
"tab_id": {
"type": "integer"
},
"tab_name": {
"type": "text"
},
"tab_section": {
"properties": {
"section_id": {
"type": "long"
},
"section_name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
}
}
}
}
}
}
}
and my query looks like this:
{
"size": 0,
"aggs": {
"sectionAgg": {
"nested": {
"path": "forms.form_tabs.tab_sections"
},
"aggs": {
"termsField": {
"filter": {
"bool": {}
},
"aggs": {
"sectionFields": {
"terms": {
"field": "forms.form_tabs.tab_sections.section_id",
"size": 10000
},
"aggs": {
"sectionFieldDocs": {
"top_hits": {
"size": 1,
"_source": [
"forms.form_tabs.tab_sections.*"
]
}
},
"completioncampaigns.contentsFields": {
"reverse_nested": {
"path": "campaigns.contents"
},
"aggs": {
"completionFieldFilter": {
"filter": {
"bool": {}
},
"aggs": {
"campaignContents": {
"top_hits": {
"size": 100,
"_source": [
"campaigns.contents.*"
]
}
}
}
}
}
}
}
}
}
}
}
}
}
}
I'm not pasting the whole result here as it would be very long, but it does include aggregation data as well.
The error it throws looks like this:
{
"took": 178,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 2,
"skipped": 0,
"failed": 3,
"failures": [
{
"shard": 1,
"index": "userlocal",
"node": "KmxVF34iTXWeLFLmtCy7WQ",
"reason": {
"type": "index_out_of_bounds_exception",
"reason": "2147483647 is out of bounds: [0-803["
}
},
{
"shard": 2,
"index": "userlocal",
"node": "KmxVF34iTXWeLFLmtCy7WQ",
"reason": {
"type": "index_out_of_bounds_exception",
"reason": "2147483647 is out of bounds: [0-1278["
}
},
{
"shard": 4,
"index": "userlocal",
"node": "KmxVF34iTXWeLFLmtCy7WQ",
"reason": {
"type": "index_out_of_bounds_exception",
"reason": "2147483647 is out of bounds: [0-2659["
}
}
]
}
}
I want to know why this error is happening and how to resolve it.
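No answer was recorded for this question, but one likely culprit (an assumption, not a confirmed diagnosis) is the reverse_nested aggregation: reverse_nested can only join back to an ancestor of the current nested context, and campaigns.contents is a sibling nested tree of forms.form_tabs.tab_sections, not an ancestor. (Note also that the mapping defines tab_section without "type": "nested", while the query addresses tab_sections.) A sketch of stepping back to the root document first with an empty reverse_nested and then descending into the other nested branch:

```json
"completioncampaigns.contentsFields": {
  "reverse_nested": {},
  "aggs": {
    "campaignContents": {
      "nested": { "path": "campaigns.contents" },
      "aggs": {
        "campaignContentHits": {
          "top_hits": {
            "size": 100,
            "_source": [ "campaigns.contents.*" ]
          }
        }
      }
    }
  }
}
```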

Elasticsearch Postings highlighter Error - cannot highlight

I got the following error when trying to search with the postings highlighter:
{
"error": {
"root_cause": [
{
"type": "illegal_argument_exception",
"reason": "field 'author_name' was indexed without offsets, cannot highlight"
}
],
"type": "search_phase_execution_exception",
"reason": "all shards failed",
"phase": "query_fetch",
"grouped": true,
"failed_shards": [
{
"shard": 1,
"index": "post",
"node": "abc",
"reason": {
"type": "illegal_argument_exception",
"reason": "field 'author_name' was indexed without offsets, cannot highlight"
}
}
],
"caused_by": {
"type": "illegal_argument_exception",
"reason": "field 'author_name' was indexed without offsets, cannot highlight"
}
},
"status": 400
}
And here's my mapping:
{
"post": {
"mappings": {
"page": {
"_routing": {
"required": true
},
"properties": {
"author_name": {
"type": "text",
"store": true,
"index_options": "offsets",
"fields": {
"keyword": {
"type": "keyword"
}
},
"analyzer": "the_analyzer",
"search_analyzer": "the_search_analyzer"
},
"editor": {
"properties": {
"author_name": {
"type": "keyword"
}
}
}
}
},
"blog_post": {
"_routing": {
"required": true
},
"properties": {
"author_name": {
"type": "text",
"store": true,
"index_options": "offsets",
"fields": {
"keyword": {
"type": "keyword"
}
},
"analyzer": "the_analyzer",
"search_analyzer": "the_search_analyzer"
},
"editor": {
"properties": {
"author_name": {
"type": "keyword"
}
}
}
}
},
"comments": {
"_routing": {
"required": true
},
"_parent": {
"type": "blog_post"
},
"properties": {
"author_name": {
"type": "text",
"store": true,
"index_options": "offsets",
"fields": {
"keyword": {
"type": "keyword"
}
},
"analyzer": "the_analyzer",
"search_analyzer": "the_search_analyzer"
}
}
}
}
}
}
And my query:
GET post/article/_search?routing=cat
{
"query": {
"bool": {
"filter": {
"term": {
"category": "cat"
}
},
"must": [
{
"query_string": {
"query": "bill",
"fields": ["author_name"]
}
}]
}
},
"highlight": {
"fields": {
"author_name": {}
}
}
}
Elasticsearch version: 5.1.1
Lucene version: 6.3.0
When I ran _update_by_query it worked for a while, before failing again (after more data was added).
I did some Googling and found this issue on the Elasticsearch repo:
https://github.com/elastic/elasticsearch/issues/8558. Correct me if I'm wrong, but it basically says that I need to have the same mapping for the same field name on the same index. I have already done that, but I don't know whether my editor object, which also contains an author_name field, could cause this issue.
Lucene code that throws this error:
https://github.com/apache/lucene-solr/blob/e2521b2a8baabdaf43b92192588f51e042d21e97/lucene/highlighter/src/java/org/apache/lucene/search/uhighlight/FieldOffsetStrategy.java#L92
https://github.com/apache/lucene-solr/blob/e2521b2a8baabdaf43b92192588f51e042d21e97/lucene/highlighter/src/java/org/apache/lucene/search/uhighlight/FieldHighlighter.java#L162
Question: how do I fix this error? Thanks.
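No answer was recorded here, but two hedged observations: the error means at least one segment still holds author_name postings that were written without offsets, which happens when documents were indexed before index_options: offsets was in place; that would also explain why _update_by_query helped only temporarily. Reindexing everything into a fresh index created with the final mapping is the thorough fix; as a stopgap, you can force a highlighter that re-analyzes the stored text instead of relying on postings offsets:

```json
"highlight": {
  "fields": {
    "author_name": { "type": "plain" }
  }
}
```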

Unable to drop result bucket in terms aggregation - Elasticsearch

I have documents in Elasticsearch with the following structure:
"mappings": {
"document": {
"properties": {
"#timestamp": {
"type": "date",
"format": "strict_date_optional_time||epoch_millis"
},
"#version": {
"type": "string"
},
"id_secuencia": {
"type": "long"
},
"event": {
"properties": {
"elapsedTime": {
"type": "double"
},
"requestTime": {
"type": "date",
"format": "strict_date_optional_time||epoch_millis"
},
"error": {
"properties": {
"errorCode": {
"type": "string",
"index": "not_analyzed"
},
"failureDetail": {
"type": "string"
},
"fault": {
"type": "string"
}
}
},
"file": {
"type": "string",
"index": "not_analyzed"
},
"messageId": {
"type": "string"
},
"request": {
"properties": {
"body": {
"type": "string"
},
"header": {
"type": "string"
}
}
},
"responseTime": {
"type": "date",
"format": "strict_date_optional_time||epoch_millis"
},
"service": {
"properties": {
"operation": {
"type": "string",
"index": "not_analyzed"
},
"project": {
"type": "string",
"index": "not_analyzed"
},
"proxy": {
"type": "string",
"index": "not_analyzed"
},
"version": {
"type": "string",
"index": "not_analyzed"
}
}
},
"timestamp": {
"type": "date",
"format": "strict_date_optional_time||epoch_millis"
},
"user": {
"type": "string",
"index": "not_analyzed"
}
}
},
"type": {
"type": "string"
}
}
}
}
And I need to retrieve a list of unique values for the field "event.file" (to show in a Kibana Data Table) according to the following criteria:
There is more than one document with the same value for the field "event.file"
All the occurences for that value of "event.file" have resulted in error (field "event.error.errorCode" exists in all documents)
For that purpose, the approach I've been testing is a terms aggregation, so I can get a bucket with all the documents for each file name. What I haven't been able to achieve is dropping some of the resulting buckets according to the previous criteria (if at least one document in a bucket does not have an error, the bucket should be discarded).
Is this the correct approach, or is there a better/easier way to get this type of result?
Thanks a lot.
After trying out several queries, I found the following approach (see query below) to be valid for my purpose. The problem I see now is that apparently it is not possible to do this in Kibana, as it has no support for pipeline aggregations (see https://github.com/elastic/kibana/issues/4584).
{
"query": {
"bool": {
"must": [
{
"filtered": {
"filter": {
"exists": {
"field": "event.file"
}
}
}
}
]
}
},
"size": 0,
"aggs": {
"file-events": {
"terms": {
"field": "event.file",
"size": 0,
"min_doc_count": 2
},
"aggs": {
"files": {
"filter": {
"exists": {
"field": "event.file"
}
},
"aggs": {
"totalFiles": {
"value_count": {
"field": "event.file"
}
}
}
},
"errors": {
"filter": {
"exists": {
"field": "event.error.errorCode"
}
},
"aggs": {
"totalErrors": {
"value_count": {
"field": "event.error.errorCode"
}
}
}
},
"exhausted": {
"bucket_selector": {
"buckets_path": {
"total_files":"files>totalFiles",
"total_errors":"errors>totalErrors"
},
"script": "total_errors == total_files"
}
}
}
}
}
}
Again, if I'm missing something feedback will be appreciated :)
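One possible simplification (an observation, not part of the original answer): every document in a terms bucket for event.file necessarily contains event.file, so the files sub-aggregation duplicates the bucket's own document count. The built-in _count buckets_path can replace it in the bucket_selector:

```json
"exhausted": {
  "bucket_selector": {
    "buckets_path": {
      "doc_count": "_count",
      "total_errors": "errors>totalErrors"
    },
    "script": "total_errors == doc_count"
  }
}
```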