ElasticSearch field for sort index and wildcard query - sorting

I have an ID field containing a UUID that I want to use both as a unique sort key (because created_at is not unique) and in wildcard queries.
For example, searching for b85f9fdd should return the document with ID b85f9fdd-5557-4f70-bbd7-9a23b0485235.
I tried to create this index:
{
"settings": {
"index": {
"sort.field": [ "created_at", "id" ],
"sort.order": [ "desc", "desc" ]
}
},
"mappings": {
"properties": {
"id": { "type": "wildcard", "fields": { "raw": { "type": "keyword" }}},
"current_status": { "type": "keyword" },
"version_rev": { "type": "keyword" },
"tracking_id": { "type": "wildcard" },
"invoice_number": { "type": "keyword" },
"created_at": { "type": "date" }
}
}
}
But I got this response:
{
"error": {
"root_cause": [
{
"type": "illegal_argument_exception",
"reason": "invalid index sort field:[id]"
}
],
"type": "illegal_argument_exception",
"reason": "invalid index sort field:[id]"
},
"status": 400
}

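Index sorting only accepts boolean, numeric, date and keyword fields, and id is mapped as wildcard, so sort on its keyword sub-field id.raw instead: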
"settings": {
"index": {
"sort.field": [ "created_at", "id.raw" ],
"sort.order": [ "desc", "desc" ]
}
},
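A minimal Python sketch of both request bodies, using only the standard library. It assumes a trimmed mapping (only id and created_at; the other fields are omitted) and uses hypothetical helper names. The wildcard query runs on the wildcard-typed id, while sorting and search_after pagination use created_at plus the keyword sub-field id.raw as a unique tiebreaker.

```python
import json


def create_index_body():
    """Index-creation body: index sorting uses the keyword sub-field id.raw."""
    return {
        "settings": {
            "index": {
                "sort.field": ["created_at", "id.raw"],
                "sort.order": ["desc", "desc"],
            }
        },
        "mappings": {
            "properties": {
                # wildcard type for *fragment* queries, keyword sub-field for sorting
                "id": {"type": "wildcard", "fields": {"raw": {"type": "keyword"}}},
                "created_at": {"type": "date"},
            }
        },
    }


def id_search_body(fragment, size=20, search_after=None):
    """Find IDs containing `fragment`, newest first, with id.raw as tiebreaker."""
    body = {
        "size": size,
        "query": {"wildcard": {"id": {"value": f"*{fragment}*"}}},
        "sort": [{"created_at": "desc"}, {"id.raw": "desc"}],
    }
    if search_after is not None:
        # pass the "sort" values of the last hit from the previous page
        body["search_after"] = search_after
    return body


if __name__ == "__main__":
    print(json.dumps(create_index_body(), indent=2))
    print(json.dumps(id_search_body("b85f9fdd"), indent=2))
```

Because created_at alone is not unique, adding id.raw makes every sort tuple unique, which is what search_after needs to page through results without skipping or repeating hits.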


Adding a script-based field to an Elasticsearch index mapping

I am following these docs: https://www.elastic.co/guide/en/elasticsearch/reference/current/runtime-indexed.html
I have a field that I would like to be computed at index time rather than at runtime, and according to the page above I can do that simply by putting the field and its script inside the mapping object as usual.
Here is a simplified version of the index I'm trying to create:
{
"settings": {
"analysis": {
"analyzer": {
"case_insensitive_analyzer": {
"type": "custom",
"filter": ["lowercase"],
"tokenizer": "keyword"
}
}
}
},
"mappings": {
"properties": {
"id": {
"type": "text"
},
"events": {
"properties": {
"fields": {
"type": "text"
},
"id": {
"type": "text"
},
"event": {
"type": "text"
},
"time": {
"type": "date"
},
"user": {
"type": "text"
},
"state": {
"type": "integer"
}
}
},
"eventLast": {
"type": "date",
"on_script_error": "fail",
"script": {
"source": "def events = doc['events']; emit(events[events.length-1].time.value)"
}
}
}
}
}
I'm getting this 400 error back:
{
"error": {
"root_cause": [
{
"type": "mapper_parsing_exception",
"reason": "unknown parameter [script] on mapper [eventLast] of type [date]"
}
],
"type": "mapper_parsing_exception",
"reason": "Failed to parse mapping [_doc]: unknown parameter [script] on mapper [eventLast] of type [date]",
"caused_by": {
"type": "mapper_parsing_exception",
"reason": "unknown parameter [script] on mapper [eventLast] of type [date]"
}
},
"status": 400
}
Essentially, I'm trying to create a scripted, indexed field that is calculated from the last event time in the events array of the document.
Thanks
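TL;DR
As the error states, the date field in your mapping does not accept a script parameter here.
Runtime fields have their own definition syntax: put them in a runtime section inside mappings, next to properties (not at the root of the request body). Keep in mind that a runtime field is computed at query time rather than stored at index time.
The script also needs fixing: doc['events'] does not work because events is an object, so read the doc values of events.time instead and use size() on that list. Doc values of a multi-valued date field come back sorted, so the last value is the latest event time, and for a date runtime field emit() expects epoch milliseconds.
Solution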
{
"settings": {
"analysis": {
"analyzer": {
"case_insensitive_analyzer": {
"type": "custom",
"filter": ["lowercase"],
"tokenizer": "keyword"
}
}
}
},
"mappings": {
"runtime": {
"eventLast": {
"type": "date",
"script": {
"source": "def times = doc['events.time']; if (times.size() > 0) { emit(times[times.size() - 1].toInstant().toEpochMilli()); }"
}
}
},
"properties": {
"id": {
"type": "text"
},
"events": {
"properties": {
"fields": {
"type": "text"
},
"id": {
"type": "text"
},
"event": {
"type": "text"
},
"time": {
"type": "date"
},
"user": {
"type": "text"
},
"state": {
"type": "integer"
}
}
}
}
}
}
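Before changing the mapping, the same kind of field can also be tried per request through runtime_mappings in the search body. The Python sketch below (hypothetical helper names, times as epoch milliseconds) builds such a request with a script that reads doc['events.time'], and adds a small local mirror of what that script computes: doc values are sorted, so taking the last one gives the maximum time, which is the latest event rather than necessarily the last array element.

```python
import json

# Painless source for the runtime field; emit() for a date field takes epoch millis.
EVENT_LAST_SCRIPT = (
    "def times = doc['events.time']; "
    "if (times.size() > 0) { emit(times[times.size() - 1].toInstant().toEpochMilli()); }"
)


def event_last_search(size=10):
    """Search body that defines eventLast at query time, returns it and sorts on it."""
    return {
        "size": size,
        "runtime_mappings": {
            "eventLast": {"type": "date", "script": {"source": EVENT_LAST_SCRIPT}}
        },
        "fields": ["eventLast"],
        "sort": [{"eventLast": "desc"}],
    }


def event_last_millis(events):
    """Local mirror of the script: sort the times like doc values, take the last one."""
    times = sorted(e["time"] for e in events if "time" in e)
    return times[-1] if times else None


if __name__ == "__main__":
    print(json.dumps(event_last_search(), indent=2))
```

If the query-time version behaves as expected, the same definition can then be moved into the mapping's runtime section.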

ElasticSearch multilayer nested properties

I have an index mapping like this
{
"mappings": {
"properties": {
"filter": {
"type": "nested",
"properties": {
"Hersteller": {
"type": "nested",
"properties": {
"id": {
"type": "text",
"analyzer": "analyzerFilter",
"fielddata": true
},
"value": {
"type": "text",
"analyzer": "analyzerFilter",
"fielddata": true
}
}
},
"Modell": {
"type": "nested",
"properties": {
"id": {
"type": "text",
"analyzer": "analyzerFilter",
"fielddata": true
},
"value": {
"type": "text",
"analyzer": "analyzerFilter",
"fielddata": true
}
}
}
}
},
"id": {
"type": "text",
"analyzer": "analyzerFilter"
}
}
}
}
There are two nested layers: filter and filter.Modell. I need a query that returns all unique filter.Modell.value values where filter.Hersteller.value is equal to some predefined value.
I am trying it first without any condition:
{
"size": 4,
"aggs": {
"distinct_filter": {
"nested": { "path": "filter" },
"aggs": {
"distinct_filter_modell": {
"nested": {
"path": "filter.Modell",
"aggs": {
"distinct_filter_modell_value": {
"terms": { "field": "filter.Modell.value" }
}
}
}
}
}
}
}
}
And I get this error:
{
"error": {
"root_cause": [
{
"type": "parsing_exception",
"reason": "Unexpected token START_OBJECT in [distinct_filter_modell].",
"line": 1,
"col": 144
}
],
"type": "parsing_exception",
"reason": "Unexpected token START_OBJECT in [distinct_filter_modell].",
"line": 1,
"col": 144
},
"status": 400
}
Thanks in advance
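The parse error most likely comes from placing "aggs" inside the "nested" object of distinct_filter_modell; a sub-aggregation goes next to "nested", as the outer distinct_filter level already does. Below is a Python sketch of a corrected request that also adds the condition, assuming it is applied at the document level with a two-level nested query. The helper name, the use of match (the value fields are analyzed text) and the value "Audi" are placeholders, not taken from the question.

```python
import json


def modell_values_body(hersteller, size=0):
    """Distinct filter.Modell.value for documents whose filter.Hersteller.value matches."""
    return {
        "size": size,  # only the aggregation is needed, not the hits
        "query": {
            "nested": {
                "path": "filter",
                "query": {
                    # second nested level: filter.Hersteller lives inside filter
                    "nested": {
                        "path": "filter.Hersteller",
                        "query": {"match": {"filter.Hersteller.value": hersteller}},
                    }
                },
            }
        },
        "aggs": {
            "distinct_filter": {
                "nested": {"path": "filter"},
                "aggs": {
                    "distinct_filter_modell": {
                        "nested": {"path": "filter.Modell"},
                        # "aggs" is a sibling of "nested", not a key inside it
                        "aggs": {
                            "distinct_filter_modell_value": {
                                "terms": {"field": "filter.Modell.value"}
                            }
                        },
                    }
                },
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(modell_values_body("Audi"), indent=2))
```

Two caveats: the query filters whole documents, so a document with several filter entries contributes all of its Modell values; and because filter.Modell.value is a text field with fielddata, the terms aggregation returns the tokens produced by analyzerFilter rather than the original strings.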

Elasticsearch reindexing multi-type parent/child index (v5.0) to join type index (v6.2)

I am reindexing my index data from ES 5.0 (parent-child) to ES 6.2 (join type).
In the ES 5.0 index, the data is stored as parent and child documents in separate types, and for the reindex I have created a new index and mapping for 6.2 in my new cluster.
The parent documents reindex into the new index without problems, but the child documents throw the error below:
{
"index": "index_two",
"type": "_doc",
"id": "AVpisCkMuwDYFnQZiFXl",
"cause": {
"type": "mapper_parsing_exception",
"reason": "failed to parse",
"caused_by": {
"type": "illegal_argument_exception",
"reason": "[routing] is missing for join field [field_relationship]"
}
},
"status": 400
}
The script I am using to reindex the data:
{
"source": {
"remote": {
"host": "http://myescluster.com:9200",
"socket_timeout": "1m",
"connect_timeout": "20s"
},
"index": "index_two",
"type": ["actions"],
"size": 5000,
"query":{
"bool":{
"must":[
{"term": {"client_id.raw": "cl14ous0ydao"}}
]
}
}
},
"dest": {
"index": "index_two",
"type": "_doc"
},
"script": {
"params": {
"jdata": {
"name": "actions"
}
},
"source": "ctx._routing=ctx._routing;ctx.remove('_parent');params.jdata.parent=ctx._source.user_id;ctx._source.field_relationship=params.jdata"
}
}
I set the routing in the Painless script because the documents, and therefore their routing values, come dynamically from the source index.
Mapping of the destination index:
{
"index_two": {
"mappings": {
"_doc": {
"dynamic_templates": [
{
"template_actions": {
"match_mapping_type": "string",
"mapping": {
"fields": {
"raw": {
"index": true,
"ignore_above": 256,
"type": "keyword"
}
},
"type": "text"
}
}
}
],
"date_detection": false,
"properties": {
"attributes": {
"type": "nested"
},
"cl_other_params": {
"type": "nested"
},
"cl_triggered_ts": {
"type": "date"
},
"cl_utm_params": {
"type": "nested"
},
"end_ts": {
"type": "date"
},
"field_relationship": {
"type": "join",
"eager_global_ordinals": true,
"relations": {
"users": [
"actions",
"segments"
]
}
},
"ip_address": {
"type": "ip"
},
"location": {
"type": "geo_point"
},
"processed_ts": {
"type": "date"
},
"processing_time": {
"type": "date"
},
"products": {
"type": "nested",
"properties": {
"traits": {
"type": "nested"
}
}
},
"segment_id": {
"type": "integer"
},
"start_ts": {
"type": "date"
}
}
}
}
}
}
A sample source document:
{
"_index": "index_two",
"_type": "actions",
"_id": "AVvKUYcceQCc2OyLKWZ9",
"_score": 7.4023576,
"_routing": "cl14ous0ydaob71ab2a1-837c-4904-a755-11e13410fb94",
"_parent": "cl14ous0ydaob71ab2a1-837c-4904-a755-11e13410fb94",
"_source": {
"user_id": "cl14ous0ydaob71ab2a1-837c-4904-a755-11e13410fb94",
"client_id": "cl14ous0ydao",
"session_id": "CL-e0ec3941-6dad-4d2d-bc9b",
"source": "betalist",
"action": "pageview",
"action_type": "pageview",
"device": "Desktop",
"ip_address": "49.35.14.224",
"location": "20.7333 , 77",
"attributes": [
{
"key": "url",
"value": "https://www.google.com/",
"type": "string"
}
],
"products": []
}
}
I had the same issue, and searching the Elasticsearch discussion forums I found this, which works:
POST _reindex
{
"source": {
"index": "old_index",
"type": "actions"
},
"dest": {
"index": "index_two"
},
"script": {
"source": """
ctx._type = "_doc";
String routingCode = ctx._source.user_id;
Map join = new HashMap();
join.put('name', 'actions');
join.put('parent', routingCode);
ctx._source.put('field_relationship', join);
ctx._parent = null;
ctx._routing = routingCode;"""
}
}
Hope this helps :)
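To check the transformation before running it against the cluster, here is a Python sketch (hypothetical helper to_join_child) that applies the same steps as the Painless script above to a hit shaped like the sample source document: it switches the type to _doc, drops _parent, sets field_relationship to the relation name plus parent id, and uses user_id as the routing value so the child lands on its parent's shard.

```python
import copy


def to_join_child(hit, relation="actions", parent_field="user_id"):
    """Mirror the reindex script: turn a 5.x child hit into a 6.x join-field child."""
    doc = copy.deepcopy(hit)  # leave the input hit untouched
    parent_id = doc["_source"][parent_field]
    doc["_type"] = "_doc"
    doc.pop("_parent", None)  # the join-based 6.x index does not use _parent
    doc["_source"]["field_relationship"] = {"name": relation, "parent": parent_id}
    doc["_routing"] = parent_id  # children must be routed to the parent's shard
    return doc


if __name__ == "__main__":
    sample = {
        "_type": "actions",
        "_id": "AVvKUYcceQCc2OyLKWZ9",
        "_parent": "cl14ous0ydaob71ab2a1-837c-4904-a755-11e13410fb94",
        "_source": {"user_id": "cl14ous0ydaob71ab2a1-837c-4904-a755-11e13410fb94"},
    }
    print(to_join_child(sample))
```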
I'd like to point out that routing is not required for parent documents of a join field, but it is mandatory for child documents: a child must be indexed on the same shard as its parent, so a child without a routing value fails with exactly this error, whether or not its parent already exists.
It's still advisable to re-index all the parents first, then the children.

Elasticsearch Postings highlighter Error - cannot highlight

I get the following error when searching with the postings highlighter:
{
"error": {
"root_cause": [
{
"type": "illegal_argument_exception",
"reason": "field 'author_name' was indexed without offsets, cannot highlight"
}
],
"type": "search_phase_execution_exception",
"reason": "all shards failed",
"phase": "query_fetch",
"grouped": true,
"failed_shards": [
{
"shard": 1,
"index": "post",
"node": "abc",
"reason": {
"type": "illegal_argument_exception",
"reason": "field 'author_name' was indexed without offsets, cannot highlight"
}
}
],
"caused_by": {
"type": "illegal_argument_exception",
"reason": "field 'author_name' was indexed without offsets, cannot highlight"
}
},
"status": 400
}
And here's my mapping:
{
"post": {
"mappings": {
"page": {
"_routing": {
"required": true
},
"properties": {
"author_name": {
"type": "text",
"store": true,
"index_options": "offsets",
"fields": {
"keyword": {
"type": "keyword"
}
},
"analyzer": "the_analyzer",
"search_analyzer": "the_search_analyzer"
},
"editor": {
"properties": {
"author_name": {
"type": "keyword"
}
}
}
}
},
"blog_post": {
"_routing": {
"required": true
},
"properties": {
"author_name": {
"type": "text",
"store": true,
"index_options": "offsets",
"fields": {
"keyword": {
"type": "keyword"
}
},
"analyzer": "the_analyzer",
"search_analyzer": "the_search_analyzer"
},
"editor": {
"properties": {
"author_name": {
"type": "keyword"
}
}
}
}
},
"comments": {
"_routing": {
"required": true
},
"_parent": {
"type": "blog_post"
},
"properties": {
"author_name": {
"type": "text",
"store": true,
"index_options": "offsets",
"fields": {
"keyword": {
"type": "keyword"
}
},
"analyzer": "the_analyzer",
"search_analyzer": "the_search_analyzer"
}
}
}
}
}
}
And my query:
GET post/article/_search?routing=cat
{
"query": {
"bool": {
"filter": {
"term": {
"category": "cat"
}
},
"must": [
{
"query_string": {
"query": "bill",
"fields": ["author_name"]
}
}]
}
},
"highlight": {
"fields": {
"author_name": {}
}
}
}
Elasticsearch version: 5.1.1
Lucene version: 6.3.0
After running _update_by_query it works for a while, but it fails again once more data has been added.
I did some googling and found this issue in the Elasticsearch repo:
https://github.com/elastic/elasticsearch/issues/8558. If I understand it correctly (cmiiw), it says the same field name needs the same mapping within the same index. My mappings already do that, but I don't know whether my editor object, which also has an author_name field, could cause the issue.
Lucene code that throws that error:
https://github.com/apache/lucene-solr/blob/e2521b2a8baabdaf43b92192588f51e042d21e97/lucene/highlighter/src/java/org/apache/lucene/search/uhighlight/FieldOffsetStrategy.java#L92
https://github.com/apache/lucene-solr/blob/e2521b2a8baabdaf43b92192588f51e042d21e97/lucene/highlighter/src/java/org/apache/lucene/search/uhighlight/FieldHighlighter.java#L162
Question: how do I fix this error? Thanks.
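One possible workaround to try while the root cause is unclear (not a fix for it): the plain highlighter re-analyzes the field text instead of reading offsets from the index, so requesting it explicitly for author_name should sidestep the offsets requirement, at some performance cost on large fields. A Python sketch of the query body from the question with that single change; the helper name is hypothetical.

```python
import json


def author_search_body(text, category="cat", highlighter="plain"):
    """The question's query, with the highlighter type set explicitly per field."""
    return {
        "query": {
            "bool": {
                "filter": {"term": {"category": category}},
                "must": [
                    {"query_string": {"query": text, "fields": ["author_name"]}}
                ],
            }
        },
        "highlight": {
            # "plain" re-analyzes the text, so it does not need indexed offsets
            "fields": {"author_name": {"type": highlighter}}
        },
    }


if __name__ == "__main__":
    print(json.dumps(author_search_body("bill"), indent=2))
```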

Searching on fields of a nested object on elasticsearch

I have this mapping on ES 1.7.3:
{
"customer": {
"aliases": {},
"mappings": {
"customer": {
"properties": {
"addresses": {
"type": "nested",
"include_in_parent": true,
"properties": {
"address1": {
"type": "string"
},
"address2": {
"type": "string"
},
"address3": {
"type": "string"
},
"country": {
"type": "string"
},
"latitude": {
"type": "double",
"index": "not_analyzed"
},
"longitude": {
"type": "double",
"index": "not_analyzed"
},
"postcode": {
"type": "string"
},
"state": {
"type": "string"
},
"town": {
"type": "string"
},
"unit": {
"type": "string"
}
}
},
"companyNumber": {
"type": "string"
},
"id": {
"type": "string",
"index": "not_analyzed"
},
"name": {
"type": "string"
},
"status": {
"type": "string"
},
"timeCreated": {
"type": "date",
"format": "dateOptionalTime"
},
"timeUpdated": {
"type": "date",
"format": "dateOptionalTime"
}
}
}
},
"settings": {
"index": {
"refresh_interval": "1s",
"number_of_shards": "5",
"creation_date": "1472372294516",
"store": {
"type": "fs"
},
"uuid": "RxJdXvPWSXGpKz8pdcF91Q",
"version": {
"created": "1050299"
},
"number_of_replicas": "1"
}
},
"warmers": {}
}
}
The Spring application generates this query:
{
"query": {
"bool": {
"should": {
"query_string": {
"query": "(addresses.\\*:sample* AND NOT status:ARCHIVED)",
"fields": [
"type",
"name",
"companyNumber",
"status",
"addresses.unit",
"addresses.address1",
"addresses.address2",
"addresses.address3",
"addresses.town",
"addresses.state",
"addresses.postcode",
"addresses.country"
],
"default_operator": "or",
"analyze_wildcard": true
}
}
}
}
}
where "addresses.*:sample*" is the only user input.
With "query": "(sample* AND NOT status:ARCHIVED)" instead, the query works, but it searches all fields of the customer object.
Since I want to search only the address fields, I used "addresses.*".
That query worked as long as all fields of the address object were strings, but since I added the latitude and longitude fields of type double to the address object, it fails because of these two fields.
Error:
Parse Failure [Failed to parse source [{
"query": {
"bool": {
"should": {
"query_string": {
"query": "(addresses.\\*:sample* AND NOT status:ARCHIVED)",
"fields": [
"type",
"name",
"companyNumber","country",
"state",
"status",
"addresses.unit",
"addresses.address1",
"addresses.address2",
"addresses.address3",
"addresses.town",
"addresses.state",
"addresses.postcode",
"addresses.country",
],
"default_operator": "or",
"analyze_wildcard": true
}
}
}
}
}
]]
NumberFormatException[For input string: "sample"
Is there a way to search "String" fields within a nested object using addresses.* only?
The solution was to add "lenient": true to the query_string query. From the documentation (https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-query-string-query.html):
lenient - If set to true will cause format based failures (like providing text to a numeric field) to be ignored.
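A sketch of the request body with that flag, built in Python: with lenient set, the sample* term simply does not match the double fields addresses.latitude and addresses.longitude instead of failing the whole query. The helper name is hypothetical, and the fields list is left out because every term in the query string already names its field.

```python
import json


def address_search_body(term):
    """query_string over all addresses.* fields, tolerant of numeric sub-fields."""
    return {
        "query": {
            "bool": {
                "should": {
                    "query_string": {
                        # the backslash escapes the * in the field name for query_string
                        "query": f"(addresses.\\*:{term} AND NOT status:ARCHIVED)",
                        "default_operator": "or",
                        "analyze_wildcard": True,
                        # ignore format failures such as "sample" against a double field
                        "lenient": True,
                    }
                }
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(address_search_body("sample*"), indent=2))
```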
