Elastic Search mapper_parsing_exception error - elasticsearch

I have created an index in Elasticsearch named test. The index mapping is as follows:
{
"title": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
}
}
},
"url": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
}
}
}
}
After creating the index, I added the following document to it:
{
"title": "demo",
"url": {
"name": "tiger",
"age": 10
}
}
But I am getting the following error:
{"mapper_parsing_exception","reason":"failed to parse field [url] of type [text]"}
Can anyone help me with this?

If your documents look like this:
{
"title": "demo",
"url": {
"name": "tiger",
"age": 10
}
}
Then your mapping needs to look like this, i.e. url is an object with the name and age fields:
{
"title": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
}
}
},
"url": {
"properties": {
"name": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
}
}
},
"age": {
"type": "integer"
}
}
}
}
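To spot this kind of mismatch before indexing, a small client-side check can compare a document with the mapping. The sketch below is a toy validator, not part of Elasticsearch: `find_shape_conflicts` is a hypothetical helper that only distinguishes leaf fields (no `properties`) from object fields (with `properties`), and it ignores arrays.

```python
def find_shape_conflicts(mapping, doc, prefix=""):
    """Return dotted paths where the document's shape disagrees with the mapping."""
    conflicts = []
    for name, value in doc.items():
        path = prefix + name
        field = mapping.get(name)
        if field is None:
            continue  # unmapped field: dynamic mapping would decide its type
        is_object_field = "properties" in field
        if isinstance(value, dict) != is_object_field:
            # object sent to a leaf field, or scalar sent to an object field
            conflicts.append(path)
        elif is_object_field:
            # both sides are objects: check the sub-fields recursively
            conflicts.extend(
                find_shape_conflicts(field["properties"], value, path + ".")
            )
    return conflicts


text_with_raw = {"type": "text", "fields": {"raw": {"type": "keyword"}}}

# Mapping from the question: url is a plain text field.
original_mapping = {"title": text_with_raw, "url": text_with_raw}

# Mapping from this answer: url is an object with name and age.
fixed_mapping = {
    "title": text_with_raw,
    "url": {"properties": {"name": text_with_raw, "age": {"type": "integer"}}},
}

doc = {"title": "demo", "url": {"name": "tiger", "age": 10}}

print(find_shape_conflicts(original_mapping, doc))  # ['url']
print(find_shape_conflicts(fixed_mapping, doc))     # []
```

With the question's mapping the check flags `url`, which is the field named in the mapper_parsing_exception; with the object mapping it reports nothing.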

Hi, you need to create the mapping like this:
PUT test
{
"settings" : {
"number_of_shards" : 1
},
"mapping": {
"title": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
}
}
},
"url": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
}
}
}
}
}
And the document is
PUT test/doc/1
{
"title": "demo",
"url": {
"name": "tiger",
"age": 10
}
}
GET test/doc/1
And the result is
{
"_index" : "test",
"_type" : "doc",
"_id" : "1",
"_version" : 1,
"_seq_no" : 0,
"_primary_term" : 1,
"found" : true,
"_source" : {
"title" : "demo",
"url" : {
"name" : "tiger",
"age" : 10
}
}
}

One possible reason, if you're on Elastic Cloud, is that a field's data type is assigned the first time the field appears in an index. Elasticsearch will then throw this error if you send a later log with a different type in that field.
For me, the log field was a string in the first log sent to the index but an object in the second, so the second one was rejected.
Good explanation here: https://discuss.elastic.co/t/getting-illegal-state-exception-error-while-pushing-logs-to-elasticsearch/290029
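The "first appearance wins" behaviour can be illustrated with a toy model. This is only a sketch of the idea, not how Elasticsearch is implemented: `ToyIndex` and `kind_of` are hypothetical names, and the model only tracks whether a top-level field holds an object or a scalar.

```python
def kind_of(value):
    """Classify a JSON value as 'object' or 'scalar' (all this toy model tracks)."""
    return "object" if isinstance(value, dict) else "scalar"


class ToyIndex:
    """Remembers the kind of each field from the first document that used it."""

    def __init__(self):
        self.field_kinds = {}
        self.docs = []

    def index(self, doc):
        # First pass: reject the whole document if any field conflicts.
        for name, value in doc.items():
            seen = self.field_kinds.get(name)
            if seen is not None and seen != kind_of(value):
                raise ValueError(
                    f"failed to parse field [{name}]: "
                    f"mapped as {seen}, got {kind_of(value)}"
                )
        # Second pass: record new fields and store the document.
        for name, value in doc.items():
            self.field_kinds.setdefault(name, kind_of(value))
        self.docs.append(doc)


idx = ToyIndex()
idx.index({"log": "service started"})  # first log: 'log' becomes a scalar field

try:
    idx.index({"log": {"level": "info", "msg": "service started"}})
    rejected = False
except ValueError as err:
    rejected = True
    print(err)
```

The second document is rejected because `log` was first seen as a string, which mirrors the situation described above.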

Related

ElasticSearch : "copy_to" a nested fields

I am trying to use the Elasticsearch "copy_to" attribute to replicate an object field into a nested field, but I get an error despite multiple attempts. Here is my structure:
...
"identifiedBy": {
"type": "object",
"properties": {
"type": {
"type": "keyword",
"copy_to": "nested_identifiers.type"
},
"value": {
"type": "text",
"analyzer": "identifier-analyzer",
"copy_to": "nested_identifiers.type"
},
"note": {
"type": "text"
},
"qualifier": {
"type": "keyword"
},
"source": {
"type": "keyword",
"copy_to": "nested_identifiers.type"
},
"status": {
"type": "text"
}
}
},
"nested_identifiers": {
"type": "nested",
"properties": {
"type": {
"type": "keyword"
},
"value": {
"type": "text",
"analyzer": "identifier-analyzer"
},
"source": {
"type": "keyword"
}
}
}
...
The mapping error is
java.lang.IllegalArgumentException: Illegal combination of [copy_to] and [nested]
mappings: [copy_to] may only copy data to the current nested document or any of its
parents, however one [copy_to] directive is trying to copy data from nested object [null]
to [nested_identifiers]
I also tried placing the "copy_to" at the "identifiedBy" root level: it doesn't work.
I also tried adding a "fields" property to "identifiedBy" and using "copy_to" on that subfield: it doesn't work.
Does anyone know a solution to my problem?
Thanks for your help.
TL;DR
Because of how Elasticsearch indexes nested documents, this is not possible without updating the mapping.
There is a workaround: the include_in_root: true setting.
Otherwise, I suggest you preprocess your data before indexing it and copy the data over to the nested field during that step, for example with an ingest pipeline.
Ingest Pipeline
PUT /72270706/
{
"mappings": {
"properties": {
"root_type":{
"type": "keyword"
},
"nested_doc":{
"type": "nested",
"properties": {
"nested_type":{
"type": "keyword"
}
}
}
}
}
}
PUT _ingest/pipeline/set_nested_type
{
"processors": [
{
"set": {
"field": "nested_doc.nested_type",
"copy_from": "root_type"
}
}
]
}
POST /72270706/_doc?pipeline=set_nested_type
{
"root_type": "a type"
}
GET /72270706/_search
Should give you
{
"took" : 392,
"timed_out" : false,
"_shards" : {
...
},
"hits" : {
...
"max_score" : 1.0,
"hits" : [
{
"_index" : "72270706",
"_id" : "laOB0YABOgujegeQNA8D",
"_score" : 1.0,
"_source" : {
"root_type" : "a type",
"nested_doc" : {
"nested_type" : "a type"
}
}
}
]
}
}
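To make it clearer what the set processor with copy_from does to the document, here is a rough local imitation. It is only a sketch under simple assumptions (dotted paths, plain dicts, no arrays); `apply_set_processor`, `get_path` and `set_path` are hypothetical helpers, not Elasticsearch APIs.

```python
import copy


def get_path(doc, dotted):
    """Read a value from nested dicts using a dotted path such as 'a.b'."""
    current = doc
    for part in dotted.split("."):
        current = current[part]
    return current


def set_path(doc, dotted, value):
    """Write a value at a dotted path, creating intermediate objects as needed."""
    parts = dotted.split(".")
    current = doc
    for part in parts[:-1]:
        current = current.setdefault(part, {})
    current[parts[-1]] = value


def apply_set_processor(doc, field, copy_from):
    """Return a copy of doc with the value at copy_from also stored at field."""
    result = copy.deepcopy(doc)
    set_path(result, field, copy.deepcopy(get_path(result, copy_from)))
    return result


enriched = apply_set_processor(
    {"root_type": "a type"},
    field="nested_doc.nested_type",
    copy_from="root_type",
)
print(enriched)
# {'root_type': 'a type', 'nested_doc': {'nested_type': 'a type'}}
```

The printed document has the same `_source` shape as the search hit above: the value lives both at the root and inside `nested_doc`.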
Workaround with include_in_root:
...
"identifiedBy": {
"type": "object",
"properties": {
"type": {
"type": "keyword",
"copy_to": "nested_identifiers.type"
},
"value": {
"type": "text",
"analyzer": "identifier-analyzer",
"copy_to": "nested_identifiers.type"
},
"note": {
"type": "text"
},
"qualifier": {
"type": "keyword"
},
"source": {
"type": "keyword",
"copy_to": "nested_identifiers.type"
},
"status": {
"type": "text"
}
}
},
"nested_identifiers": {
"type": "nested",
"include_in_root": true,
"properties": {
"type": {
"type": "keyword"
},
"value": {
"type": "text",
"analyzer": "identifier-analyzer"
},
"source": {
"type": "keyword"
}
}
}
...
You will need to reindex the existing data.
Be aware, though, that copy_to will not copy the information into the nested object; it copies it to another field that has the same name but is not nested.

How to sort on a text field with elastic search

{
"parent" : "some_id",
"type" : "support",
"metadata" : {
"account_type" : "Regular",
"subject" : "Test Subject",
"user_name" : "John Doe",
"origin" : "Origin",
"description" : "TEST",
"media" : [ ],
"ticket_number" : "XXXX",
"status" : "completed"
},
"create_time" : "2021-02-24T15:08:57.750Z",
"entity_name" : "comment"
}
This is my demo data. When I try to sort by metadata.status, e.g.:
GET comments-*/_search
{
"query": {
"bool": {
"must": [{
"match": {
"type": "support"
}
}]
}
},
"from": 0,
"size": 50,
"sort": [{
"metadata.status": {
"order": "desc"
}
}]
}
it says: Fielddata is disabled on text fields by default. Set fielddata=true on [metadata.status] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead.
I am not sure how to achieve this as I am very new to ESS. Any help would be appreciated.
On string fields, you can only sort by fields of type "keyword".
If you don't set the mappings before sending documents, Elasticsearch dynamic mapping maps each string as a text field with a keyword sub-field: in this case "status" and "status.keyword".
So try with "metadata.status.keyword".
TL;DR
For fields you will not run full-text search on (like status flags), it is good practice to store only the keyword version of the field.
To do that, you have to set the mappings before indexing any document.
There is a trick:
Ingest Data
POST test_predipa/_doc
{
"parent" : "some_id",
"type" : "support",
"metadata" : {
"account_type" : "Regular",
"subject" : "Test Subject",
"user_name" : "John Doe",
"origin" : "Origin",
"description" : "TEST",
"media" : [ ],
"ticket_number" : "XXXX",
"status" : "completed"
},
"create_time" : "2021-02-24T15:08:57.750Z",
"entity_name" : "comment"
}
Get the autogenerated mappings
GET test_predipa/_mapping
Create a new empty index with the same mappings, modified as you want (in this case, remove the text type from metadata.status and keep only the keyword one).
PUT test_predipa_new
{
"mappings": {
"properties": {
"create_time": {
"type": "date"
},
"entity_name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"metadata": {
"properties": {
"account_type": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"description": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"origin": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"status": {
"type": "keyword"
},
"subject": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"ticket_number": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"user_name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
},
"parent": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"type": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
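If you would rather not edit the autogenerated mapping by hand, the change can be scripted. The sketch below assumes you have already extracted the `properties` object from the GET _mapping response into a Python dict; `with_keyword_only` is a hypothetical helper and the sample mapping is trimmed down to the one relevant field.

```python
import copy


def with_keyword_only(properties, dotted_field):
    """Return a copy of the mapping properties with one field set to plain keyword."""
    updated = copy.deepcopy(properties)
    parts = dotted_field.split(".")
    node = updated
    for part in parts[:-1]:
        # Walk down through object fields: metadata -> properties -> ...
        node = node[part]["properties"]
    node[parts[-1]] = {"type": "keyword"}
    return updated


# Trimmed-down version of what dynamic mapping generates for metadata.status.
autogenerated = {
    "metadata": {
        "properties": {
            "status": {
                "type": "text",
                "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
            }
        }
    }
}

new_properties = with_keyword_only(autogenerated, "metadata.status")
print(new_properties["metadata"]["properties"]["status"])  # {'type': 'keyword'}
```

The returned dict can then be used as the `properties` section of the body for the new index, while the original mapping dict is left untouched.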
Move the data from the old index to the new empty one
POST _reindex
{
"source": {
"index": "test_predipa"
},
"dest": {
"index": "test_predipa_new"
}
}
Run the sort query
GET test_predipa_new/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"type": "support"
}
}
]
}
},
"from": 0,
"size": 50,
"sort": [
{
"metadata.status": {
"order": "desc"
}
}
]
}
Most probably, the issue is that metadata.status is of text type, which is not sortable (see docs). You can sort on a textual field only if it is of keyword type.
Please check the mapping of your index. Most probably, your index has the default mapping (see docs), in which a keyword sub-field is automatically assigned to every field with a string value.
TL;DR: try to run this query
GET comments-*/_search
{
"query": {
"bool": {
"must": [{
"match": {
"type": "support"
}
}]
}
},
"from": 0,
"size": 50,
"sort": [{
"metadata.status.keyword": {
"order": "desc"
}
}]
}
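Checking the mapping can also be done in code. Here is a minimal sketch that, given the `properties` part of a mapping as a Python dict, picks the field name to sort on. `resolve_sort_field` is a hypothetical helper; it only handles string fields (numeric and date fields are sortable as they are) and the sample mappings are trimmed down.

```python
def resolve_sort_field(properties, dotted_field):
    """Return a sortable field name for a string field, or None if there is none."""
    node = {"properties": properties}
    for part in dotted_field.split("."):
        node = node["properties"][part]
    if node.get("type") == "keyword":
        return dotted_field  # already a keyword field, sort on it directly
    for sub_name, sub_field in node.get("fields", {}).items():
        if sub_field.get("type") == "keyword":
            return f"{dotted_field}.{sub_name}"  # e.g. metadata.status.keyword
    return None  # text only: needs a mapping change (or fielddata)


default_mapping = {
    "metadata": {
        "properties": {
            "status": {
                "type": "text",
                "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
            }
        }
    }
}
keyword_mapping = {"metadata": {"properties": {"status": {"type": "keyword"}}}}

print(resolve_sort_field(default_mapping, "metadata.status"))  # metadata.status.keyword
print(resolve_sort_field(keyword_mapping, "metadata.status"))  # metadata.status
```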

Find the number of documents with filter at a particular time in Elasticsearch

I have documents in elasticsearch in which each document looks something like as follows:
{
"id": "T12890ADSA12",
"status": "ENDED",
"type": "SAMPLE",
"updatedAt": "2020-05-29T18:18:08.483Z",
"audit": [
{
"event": "STARTED",
"version": 1,
"timestamp": "2020-04-30T13:41:25.862Z"
},
{
"event": "INPROGRESS",
"version": 2,
"timestamp": "2020-05-14T17:03:09.137Z"
},
{
"event": "INPROGRESS",
"version": 3,
"timestamp": "2020-05-17T17:03:09.137Z"
},
{
"event": "ENDED",
"version": 4,
"timestamp": "2020-05-29T18:18:08.483Z"
}
],
"createdAt": "2020-04-30T13:41:25.862Z"
}
If I want to know the number of documents that were in the STARTED state at a given time, how can I do that? It should use the timestamp from each event in the events field.
Edit: Mapping of the index is as follows:
{
"id": "text",
"status": "text",
"type": "text",
"updatedAt": "date",
"events": [
{
"event": "text",
"version": "long",
"timestamp": "date"
}
],
"createdAt": "date"
}
In order to achieve what you want, you need to make sure that the events array is of nested type because you have two conditions that you need to apply on each array element and this is only possible if events is nested:
"events" : {
"type": "nested", <--- you need to add this
"properties" : {
"event" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"timestamp" : {
"type" : "date"
},
"version" : {
"type" : "long"
}
}
},
Then you'll be able to run the following nested query:
{
"query": {
"nested": {
"path": "events",
"query": {
"bool": {
"must": [
{
"range": {
"events.timestamp": {
"gte": "2020-06-08",
"lte": "2020-06-08"
}
}
},
{
"term": {
"events.event.keyword": "STARTED"
}
}
]
}
}
}
}
}
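The reason the array has to be nested can be shown with a small simulation. Without nested, an array of objects is flattened into independent lists of values per field, so the link between an event name and its own timestamp is lost. The code below is only a local illustration of that idea, assuming the timestamp format from the question; `flattened_match` and `nested_match` are hypothetical names.

```python
from datetime import datetime


def parse(ts):
    """Parse timestamps in the format used by the question's documents."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%fZ")


def flattened_match(events, name, start, end):
    """Object arrays: each condition is checked against all values of its field."""
    names = [e["event"] for e in events]
    times = [parse(e["timestamp"]) for e in events]
    return name in names and any(start <= t <= end for t in times)


def nested_match(events, name, start, end):
    """Nested arrays: both conditions must hold on the same array element."""
    return any(
        e["event"] == name and start <= parse(e["timestamp"]) <= end
        for e in events
    )


audit = [
    {"event": "STARTED", "timestamp": "2020-04-30T13:41:25.862Z"},
    {"event": "INPROGRESS", "timestamp": "2020-05-14T17:03:09.137Z"},
]
start, end = datetime(2020, 5, 14), datetime(2020, 5, 15)

print(flattened_match(audit, "STARTED", start, end))  # True  (false positive)
print(nested_match(audit, "STARTED", start, end))     # False (correct)
```

The flattened version matches because STARTED exists somewhere and some timestamp falls in the range, even though they belong to different events.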

ElasticSearch Mapping Issue - Nested to Non-Nested

I am creating a mapping for data generated by a computer vision application. However, I am getting an error when I test pushing an example data message to ElasticSearch. I have read tons of forums where others have had this issue. Some have resolved their issue, but I have tried everything I know to try. I actually think there may be a simple resolution, but I am relatively new to ElasticSearch.
The index and mapping are created successfully using:
PUT vision_events
{
"settings" : {
"number_of_shards" : 5
},
"mappings" : {
"properties": {
"camera_id": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"hit_counts": {
"type": "long"
},
"id": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"intersection": {
"type": "boolean"
},
"label": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"locations": {
"type" : "nested",
"properties": {
"coords" : {
"type" : "float"
},
"location": {
"type": "text"
},
"street_segment": {
"type": "text"
},
"timestamp": {
"type": "date"
}
}
},
"pole_id": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"timestamp": {
"type": "date"
}
}
}
}
Once completed, I move on to validating the mapping is correct. I push the following example data:
POST /vision_events/1?pretty=true
{
"pole_id": "mlk-central-2",
"camera_id": "mlk-central-cam-2",
"intersection": true,
"id": "644d1c06-4c60-4ed8-93b4-1aa79b87a622",
"label": "car",
"timestamp": 1586838108683,
"locations": [
{
"timestamp": 1586838109448,
"coords": 1626.3220383482665,
"street_segment": "None"
},
{
"timestamp": 1586838109832,
"coords": 1623.3129222859882,
"street_segment": "None"
}
],
"hit_counts": 2
}
This produces the following error:
{
"error" : {
"root_cause" : [
{
"type" : "illegal_argument_exception",
"reason" : "object mapping [locations] can't be changed from nested to non-nested"
}
],
"type" : "illegal_argument_exception",
"reason" : "object mapping [locations] can't be changed from nested to non-nested"
},
"status" : 400
}
The locations field is a list of "objects" which contain the fields: coords, location, street_segment and timestamp. Messages have varying length of locations. Any help would be greatly appreciated.
Put the unchanged mapping:
PUT vision_events
{"settings":{"number_of_shards":5},"mappings":{"properties":{"camera_id":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}},"hit_counts":{"type":"long"},"id":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}},"intersection":{"type":"boolean"},"label":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}},"locations":{"type":"nested","properties":{"coords":{"type":"float"},"location":{"type":"text"},"street_segment":{"type":"text"},"timestamp":{"type":"date"}}},"pole_id":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}},"timestamp":{"type":"date"}}}}
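Insert a single doc using the POST structure from the docs, with _doc in the path. In the original request, POST /vision_events/1, the 1 was most likely interpreted as a mapping type name rather than a document ID: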
POST /vision_events/_doc/1?pretty=true
{
"pole_id": "mlk-central-2",
"camera_id": "mlk-central-cam-2",
"intersection": true,
"id": "644d1c06-4c60-4ed8-93b4-1aa79b87a622",
"label": "car",
"timestamp": 1586838108683,
"locations": [
{
"timestamp": 1586838109448,
"coords": 1626.3220383482665,
"street_segment": "None"
},
{
"timestamp": 1586838109832,
"coords": 1623.3129222859882,
"street_segment": "None"
}
],
"hit_counts": 2
}

Elastic search fuzzy query unexpected results

I have 2 indices, cities and places. The places index has a mapping like this:
{
"mappings": {
"properties": {
"cityId": {
"type": "integer"
},
"cityName": {
"type": "text"
},
"placeName": {
"type": "text"
},
"status": {
"type": "keyword"
},
"category": {
"type": "keyword"
},
"reviews": {
"properties": {
"rating": {
"type": "long"
},
"comment": {
"type": "keyword"
},
"user": {
"type": "nested"
}
}
}
}
}
}
And the cities index is mapped like this:
{
"mappings": {
"properties": {
"state": {
"type": "keyword"
},
"postal": {
"type": "keyword"
},
"phone": {
"type": "keyword"
},
"email": {
"type": "keyword"
},
"notes": {
"type": "keyword"
},
"status": {
"type": "keyword"
},
"cityName": {
"type": "text"
},
"website": {
"type": "keyword"
},
"cityId": {
"type": "integer"
}
}
}
}
Initially we had a single document where cities had places embedded, but I was having trouble searching the nested places array, so I changed the structure to this. I want to be able to search both cityName and placeName in a single query with fuzziness. I have a city with the word Welder's in its name, and some places in the same location also have the word Welder's in their name; both fields are of type text. However, when I search for welder, neither of the queries below returns these documents, whereas a search for welders or welder's does. I am not sure why welder won't match Welder's. I didn't specify any analyzer when creating either index, and I am not explicitly defining one in the query. Can anyone help me with this query so that it behaves as expected?
Query 1: index = places
{
"query": {
"bool": {
"should": [
{
"match": {
"placeName": {
"query": "welder",
"fuzziness": 20
}
}
},
{
"match": {
"cityName": {
"query": "welder",
"fuzziness": 20
}
}
}
]
}
}
}
Query 2: index = places
{
"query": {
"match": {
"placeName": {
"query": "welder",
"fuzziness": 20
}
}
}
}
Can anyone post a query that, when passed the word welder, would return documents having Welder's in their name? (It should also work for other similar terms; this is just an example.)
Edit 1 :
This is a sample place document I would want to be returned by any of the queries posted above:
{
cityId: 29,
placeName: "Welder's Garage Islamabad",
cityName: "Islamabad",
status: "verified",
category: null,
reviews: []
}
Using your mapping and query with fuzziness set to 20, I am getting the document back. A fuzziness of 20 tolerates an edit distance of up to 20 between the searched word and welder's, so even "w" would match "welder's". I think this value is different in your actual query.
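To get a feel for the edit distances involved, here is a small sketch that computes plain Levenshtein distance. It assumes the indexed term is welder's (lowercased, apostrophe kept); `edit_distance` is a hypothetical helper, and Elasticsearch's own fuzzy matching additionally counts a transposition of two adjacent characters as a single edit by default, which makes no difference for these examples.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum number of insertions, deletions, substitutions."""
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(
                min(
                    previous[j] + 1,         # delete ch_a
                    current[j - 1] + 1,      # insert ch_b
                    previous[j - 1] + cost,  # substitute (or keep if equal)
                )
            )
        previous = current
    return previous[-1]


print(edit_distance("welder", "welder's"))  # 2
print(edit_distance("welder", "welders"))   # 1
print(edit_distance("w", "welder's"))       # 7
```

So welder is two edits away from welder's, and even the single letter w is only seven edits away.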
If you want to search for welder or welders and still return welder's, you can use a stemmer token filter.
Mapping:
PUT indexfuzzy
{
"mappings": {
"properties": {
"cityId": {
"type": "integer"
},
"cityName": {
"type": "text"
},
"placeName": {
"type": "text",
"analyzer": "my_analyzer"
},
"status": {
"type": "keyword"
},
"category": {
"type": "keyword"
},
"reviews": {
"properties": {
"rating": {
"type": "long"
},
"comment": {
"type": "keyword"
},
"user": {
"type": "nested"
}
}
}
}
},
"settings": {
"analysis": {
"analyzer": {
"my_analyzer": {
"tokenizer": "standard",
"filter": [
"lowercase",
"stem_possessive_english",
"stem_minimal_english"
]
}
},
"filter": {
"stem_possessive_english": {
"type": "stemmer",
"name": "possessive_english"
},
"stem_minimal_english": {
"type": "stemmer",
"name": "minimal_english"
}
}
}
}
}
Query (welder, welders, and welder's will all work):
GET indexfuzzy/_search
{
"query": {
"bool": {
"should": [
{
"match": {
"placeName": {
"query": "welder"
}
}
},
{
"match": {
"cityName": {
"query": "welder"
}
}
}
]
}
}
}
Result:
[
{
"_index" : "indexfuzzy",
"_type" : "_doc",
"_id" : "Jc-yx3ABd7NBn_0GTBdp",
"_score" : 0.2876821,
"_source" : {
"cityId" : 29,
"placeName" : "Welder's Garage Islamabad",
"cityName" : "Islamabad",
"status" : "verified",
"category" : null,
"reviews" : [ ]
}
}
]
possessive_english: removes the trailing 's from tokens
minimal_english: removes plurals
GET <index_name>/_analyze
{
"text": "Welder's Garage Islamabad",
"analyzer": "my_analyzer"
}
returns (the first token, welder, is what welder's and welders are matched against)
{
"tokens" : [
{
"token" : "welder",
"start_offset" : 0,
"end_offset" : 8,
"type" : "<ALPHANUM>",
"position" : 0
},
{
"token" : "garage",
"start_offset" : 9,
"end_offset" : 15,
"type" : "<ALPHANUM>",
"position" : 1
},
{
"token" : "islamabad",
"start_offset" : 16,
"end_offset" : 25,
"type" : "<ALPHANUM>",
"position" : 2
}
]
}
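The effect of the analyzer chain can be imitated locally to see why all three spellings end up as the same token. This is a deliberately simplified sketch, not the real Lucene filters: `toy_analyze` is a hypothetical function, the tokenizer is a plain regex, and the plural rule only strips a trailing s (the actual minimal_english stemmer has more rules).

```python
import re


def toy_analyze(text):
    """Rough imitation of: standard tokenizer -> lowercase -> possessive -> plural."""
    tokens = []
    for raw in re.findall(r"[A-Za-z0-9']+", text):
        token = raw.lower()                 # lowercase filter
        if token.endswith("'s"):
            token = token[:-2]              # possessive: strip trailing 's
        if (
            len(token) > 3
            and token.endswith("s")
            and not token.endswith(("ss", "us"))
        ):
            token = token[:-1]              # simplified plural: strip trailing s
        tokens.append(token)
    return tokens


print(toy_analyze("Welder's Garage Islamabad"))  # ['welder', 'garage', 'islamabad']
print(toy_analyze("welders"))                    # ['welder']
print(toy_analyze("welder"))                     # ['welder']
```

Because the same analyzer runs at index time and at query time, welder, welders and welder's all reduce to welder and therefore match.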
