Elasticsearch MapperParsingException

I am trying to index the following data into Elasticsearch:
{
"_id": "5619578c1983757a72efef15",
"aseg": {},
"cs": {
"source": "None",
"ss": "In Transit",
"sr": "Weight Captured",
"act": "+B",
"pid": "BAG21678106",
"st": "UD",
"dest": "Bharatpur_DC (Rajasthan)",
"u": "J",
"sl": "Jaipur_Hub (Rajasthan)",
"ud": "2015-10-12T14:59:44.270000",
"sd": "2015-10-12T14:59:44.270000"
},
"nsl": [
{
"dt": [
2015,
10,
10
],
"code": "X-PPONM"
},
{
"dt": [
2015,
10,
11
],
"code": "X-UCI"
}
]
}
but in return I am getting this error:
MapperParsingException[failed to parse [cs.nsl]]; nested: ElasticsearchIllegalArgumentException[unknown property [dt]];
I checked the mapping and it is correct; nsl nested inside the cs dict has a different mapping than nsl at the root level.
"cs": {
"properties": {
"act": {
"type": "string"
},
"add": {
"type": "string"
},
"asr": {
"type": "string"
},
"bucket": {
"type": "string"
},
"dest": {
"type": "string",
"index": "not_analyzed"
},
"dwbn": {
"type": "string"
},
"lcld": {
"type": "string"
},
"lat": {
"type": "string"
},
"lon": {
"type": "string"
},
"loc": {
"type": "double"
},
"nsl": {
"type": "string",
"index": "not_analyzed"
},
"ntd": {
"type": "date",
"format": "dateOptionalTime"
},
"pbs": {
"type": "string"
},
"pid": {
"type": "string"
},
"pupid": {
"type": "string"
},
"sd": {
"type": "date",
"format": "dateOptionalTime"
},
"sl": {
"type": "string",
"index": "not_analyzed"
},
"source": {
"properties": {
"source": {
"type": "string"
},
"source_id": {
"type": "string"
},
"source_type": {
"type": "string"
}
}
},
"sr": {
"type": "string"
},
"ss": {
"type": "string",
"index": "not_analyzed"
},
"st": {
"type": "string"
},
"u": {
"type": "string",
"index": "not_analyzed"
},
"ud": {
"type": "date",
"format": "dateOptionalTime"
},
"vh": {
"type": "string"
}
}
},
and for nsl at the root level the mapping is as follows:
"nsl": {
"properties" : {
"code" : {
"type" : "string",
"index": "not_analyzed"
},
"dt" : {
"type" : "string",
"index": "not_analyzed"
}
}
},
This is happening for only a few records; the rest are all syncing fine.
There aren't any changes in the payload.
Further, nsl is a sparse key inside cs.

In your mapping, nsl (inside cs) is as follows:
"nsl": {
"type": "string",
"index": "not_analyzed"
},
As per the mapping, Elasticsearch expects a concrete string value for the nsl field, but it is an object array in the document you have provided.
Once Elasticsearch has a mapping, it is definite: you can't insert object data into a string field.
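Since an existing field's type can't be changed in place, one way out is to re-create the index with cs.nsl mapped as an object that matches the failing documents. A minimal sketch, assuming hypothetical index/type names (the question doesn't give them) and ES 1.x syntax:

PUT /myindex
{
  "mappings": {
    "mytype": {
      "properties": {
        "cs": {
          "properties": {
            "nsl": {
              "properties": {
                "code": { "type": "string", "index": "not_analyzed" },
                "dt":   { "type": "long" }
              }
            }
          }
        }
      }
    }
  }
}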

I tried your document without pre-setting any mapping as follows:
{
"aseg": {},
"cs": {
"source": "None",
"ss": "In Transit",
"sr": "Weight Captured",
"act": "+B",
"pid": "BAG21678106",
"st": "UD",
"dest": "Bharatpur_DC (Rajasthan)",
"u": "J",
"nsl":"foo",
"sl": "Jaipur_Hub (Rajasthan)",
"ud": "2015-10-12T14:59:44.270000",
"sd": "2015-10-12T14:59:44.270000"
},
"nsl": [
{
"dt": [
2015,
10,
10
],
"code": "X-PPONM"
},
{
"dt": [
2015,
10,
11
],
"code": "X-UCI"
}
]
}
And ES created the mapping as follows:
"nsl": {
"properties": {
"dt": {
"type": "long"
},
"code": {
"type": "string"
}
}
}
As you can see, ES inferred the "dt" type as "long", which is also the internal representation of a date type. So maybe you need to change that type?
Also, without seeing a successful document it is difficult to guess, but I believe those documents do not have the "dt" field value.
Of course, you are free to add "not_analyzed" to any field as you see fit.
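If dt will always carry [year, month, day] integers, the root-level nsl mapping could also be pinned down explicitly so dynamic mapping never has to guess; a sketch of that fragment (again, changing an existing field's type requires re-creating the index):

"nsl": {
  "properties": {
    "code": { "type": "string", "index": "not_analyzed" },
    "dt":   { "type": "long" }
  }
}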

Related

Why does the keyword type take up much more space than text in elasticsearch?

env: Elasticsearch 5.5.1
There are two indexes in my Elasticsearch, and the only difference between them is the message field: its type in index1 is keyword, and in index2 it is text.
To ensure the comparison is not affected by other fields, I removed the message field and compared the before and after results.
Before removing the message field:
After removing the message field I got:
Obviously the message field takes up a lot of space, and the keyword type takes up much more than text, but I don't know why keyword takes up so much more space than text.
Can anyone help?
Following is index1's mapping:
"mappings": {
"system": {
"dynamic": "true",
"_all": {
"enabled": false
},
"dynamic_date_formats": [
"yyyy-MM-dd HH:mm:ss.SSS"
],
"dynamic_templates": [
{
"geo2": {
"match": "*_geo",
"mapping": {
"type": "geo_point"
}
}
},
{
"strings2": {
"match_mapping_type": "string",
"mapping": {
"type": "keyword"
}
}
}
],
"numeric_detection": false,
"properties": {
"#agent_timestamp": {
"type": "date",
"format": "yyyy-MM-dd HH:mm:ss.SSS"
},
"#timestamp": {
"type": "date",
"format": "yyyy-MM-dd HH:mm:ss.SSS"
},
"Kafkaspeed": {
"type": "keyword"
},
"_index_name": {
"type": "keyword"
},
"count": {
"type": "long"
},
"datex": {
"type": "date",
"format": "yyyy-MM-dd HH:mm:ss.SSS"
},
"defaultWriteESspeed": {
"type": "double"
},
"filepathname": {
"type": "keyword"
},
"jsonmessage": {
"type": "text"
},
"key": {
"type": "keyword"
},
"logcount": {
"type": "long"
},
"loglevel": {
"type": "keyword"
},
"message": {
"type": "keyword"
},
"paredspeed": {
"type": "float"
},
"seccount": {
"type": "long"
},
"sn": {
"type": "long"
},
"sourceName": {
"type": "keyword"
},
"sourceip": {
"type": "keyword"
},
"sourcename": {
"type": "keyword"
},
"sourceport": {
"type": "long"
},
"sucesscount": {
"type": "long"
},
"time_str": {
"type": "date",
"format": "yyyy-MM-dd HH:mm:ss.SSS"
},
"timestamp": {
"type": "long"
},
"totalcount": {
"type": "long"
},
"uniqueid": {
"type": "keyword"
}
}
}
}
and its settings:
"settings": {
"index": {
"refresh_interval": "1s",
"number_of_shards": "3",
"translog": {
"flush_threshold_size": "1024mb",
"sync_interval": "60s",
"durability": "async"
},
"provided_name": "index1",
"creation_date": "1531389785215",
"analysis": {
"analyzer": {
"optionIK": {
"filter": [
"word_delimiter"
],
"type": "custom",
"tokenizer": "ik_max_word"
}
}
},
"number_of_replicas": "0",
"uuid": "zd8oVbwUQbys1UJ8hJZRmQ",
"version": {
"created": "5050099"
}
}
}
Following is index2's mapping:
"mappings": {
"system": {
"dynamic": "true",
"_all": {
"enabled": false
},
"dynamic_date_formats": [
"yyyy-MM-dd HH:mm:ss.SSS"
],
"dynamic_templates": [
{
"geo2": {
"match": "*_geo",
"mapping": {
"type": "geo_point"
}
}
},
{
"strings2": {
"match_mapping_type": "string",
"mapping": {
"type": "keyword"
}
}
}
],
"numeric_detection": false,
"properties": {
"#agent_timestamp": {
"type": "date",
"format": "yyyy-MM-dd HH:mm:ss.SSS"
},
"#timestamp": {
"type": "date",
"format": "yyyy-MM-dd HH:mm:ss.SSS"
},
"CommunicationReturnCode": {
"type": "keyword"
},
"Kafkaspeed": {
"type": "keyword"
},
"_index_name": {
"type": "keyword"
},
"action": {
"type": "keyword"
},
"count": {
"type": "long"
},
"datex": {
"type": "date",
"format": "yyyy-MM-dd HH:mm:ss.SSS"
},
"defaultWriteESspeed": {
"type": "double"
},
"filepathname": {
"type": "keyword"
},
"jsonmessage": {
"type": "text"
},
"key": {
"type": "keyword"
},
"logcount": {
"type": "long"
},
"loglevel": {
"type": "keyword"
},
"message": {
"type": "text"
},
"msgid": {
"type": "keyword"
},
"msgname": {
"type": "keyword"
},
"nodetype": {
"type": "keyword"
},
"orgid": {
"type": "keyword"
},
"orgname": {
"type": "keyword"
},
"paredspeed": {
"type": "float"
},
"processingState": {
"type": "keyword"
},
"processingStatecode": {
"type": "keyword"
},
"seccount": {
"type": "long"
},
"sn": {
"type": "long"
},
"sourceName": {
"type": "keyword"
},
"sourceip": {
"type": "keyword"
},
"sourcename": {
"type": "keyword"
},
"sourceport": {
"type": "long"
},
"sucesscount": {
"type": "long"
},
"thread": {
"type": "keyword"
},
"time_str": {
"type": "date",
"format": "yyyy-MM-dd HH:mm:ss.SSS"
},
"timestamp": {
"type": "long"
},
"totalcount": {
"type": "long"
},
"transDescription": {
"type": "keyword"
},
"transactionErrorCode": {
"type": "keyword"
},
"transactionTimeConsuming": {
"type": "keyword"
},
"transcode": {
"type": "keyword"
},
"uniqueid": {
"type": "keyword"
}
}
}
}
and its settings:
"settings": {
"index": {
"refresh_interval": "1s",
"number_of_shards": "2",
"translog": {
"flush_threshold_size": "1024mb",
"sync_interval": "60s",
"durability": "async"
},
"provided_name": "index2",
"creation_date": "1531467294314",
"analysis": {
"analyzer": {
"optionIK": {
"filter": [
"word_delimiter"
],
"type": "custom",
"tokenizer": "ik_max_word"
}
}
},
"number_of_replicas": "0",
"uuid": "yROU2MrMTzip4VXH_zWEXQ",
"version": {
"created": "5050099"
}
}
}
Following is the file structure of one index's two shards for the text type field:
and for the keyword type field:
You can take it that there are the same number of documents in the two folders, and the only difference is the type of the message field.
Could you explain it?
Thank you so much!
In Elasticsearch, keyword fields have doc_values enabled by default, while text fields do not. This means that for your keyword field the whole value is stored in a column-oriented fashion, in order to be able to perform aggregations or sorting without relying on fielddata.
Also, once you tokenize a string, with stemming, lowercasing, etc., you can achieve much better compression.
You can try disabling doc_values on that field if you don't perform aggregations or sorting on it.
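Since doc_values cannot be toggled on an existing field, that override has to go into a newly created index; a minimal sketch, assuming a hypothetical index name:

PUT index1_v2
{
  "mappings": {
    "system": {
      "properties": {
        "message": {
          "type": "keyword",
          "doc_values": false
        }
      }
    }
  }
}

With doc_values off, the field can still be searched, but sorting and aggregations on it are no longer possible.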

Sorting in Elasticsearch ignoring the date part of field

I have a date field mapped as:
"created": {
"type" : "date",
"format": "yyyy-MM-dd HH:mm:ss"
},
Now when I sort based on the above field:
"sort":[
{"created":{"order":"asc"}}
],
it takes only the time part of the date while sorting and ignores the date part:
{
"_index": "somehting",
"_type": "UserActivity",
"_id": "81574",
"_score": null,
"_source": {
"created": "2016-03-29 00:00:07",
"appCode": "appcode",
"userId": "100008057363993"
},
"sort": [
"00:00:07"
]
},
How do I sort based on the whole date?
Please note I cannot use scripting, as it's disabled on the production server, and I cannot re-index.
Adding the full mapping:
{
"someIndex": {
"mappings": {
"UserActivity": {
"_timestamp": {
"enabled": true,
"store": true,
"format": "yyyy-MM-dd HH:mm:ss"
},
"properties": {
"_table": {
"type": "string"
},
"_tableat": {
"type": "string"
},
"activity": {
"properties": {
"_table": {
"type": "string"
},
"_tableat": {
"type": "string"
},
"clientId": {
"type": "integer"
},
"code": {
"type": "string"
},
"created": {
"type": "date",
"format": "yyyy-MM-dd HH:mm:ss"
},
"foreignName": {
"type": "string"
},
"frequency": {
"type": "integer"
},
"id": {
"type": "long"
},
"lastUpdated": {
"type": "date",
"format": "yyyy-MM-dd HH:mm:ss"
},
"name": {
"type": "string"
},
"points": {
"type": "long"
},
"strategy": {
"type": "string"
}
}
},
"activityId": {
"type": "string"
},
"appCode": {
"type": "string"
},
"clientId": {
"type": "long"
},
"created": {
"type": "string"
},
"details": {
"type": "string"
},
"foreignName": {
"type": "string"
},
"id": {
"type": "long"
},
"lastUpdated": {
"type": "date",
"format": "yyyy-MM-dd HH:mm:ss"
},
"notes": {
"type": "string"
},
"userId": {
"type": "long"
}
}
}
}
}
}

Searching on fields of a nested object on elasticsearch

I have this mapping on ES 1.7.3:
{
"customer": {
"aliases": {},
"mappings": {
"customer": {
"properties": {
"addresses": {
"type": "nested",
"include_in_parent": true,
"properties": {
"address1": {
"type": "string"
},
"address2": {
"type": "string"
},
"address3": {
"type": "string"
},
"country": {
"type": "string"
},
"latitude": {
"type": "double",
"index": "not_analyzed"
},
"longitude": {
"type": "double",
"index": "not_analyzed"
},
"postcode": {
"type": "string"
},
"state": {
"type": "string"
},
"town": {
"type": "string"
},
"unit": {
"type": "string"
}
}
},
"companyNumber": {
"type": "string"
},
"id": {
"type": "string",
"index": "not_analyzed"
},
"name": {
"type": "string"
},
"status": {
"type": "string"
},
"timeCreated": {
"type": "date",
"format": "dateOptionalTime"
},
"timeUpdated": {
"type": "date",
"format": "dateOptionalTime"
}
}
}
},
"settings": {
"index": {
"refresh_interval": "1s",
"number_of_shards": "5",
"creation_date": "1472372294516",
"store": {
"type": "fs"
},
"uuid": "RxJdXvPWSXGpKz8pdcF91Q",
"version": {
"created": "1050299"
},
"number_of_replicas": "1"
}
},
"warmers": {}
}
}
The Spring application generates this query:
{
"query": {
"bool": {
"should": {
"query_string": {
"query": "(addresses.\\*:sample* AND NOT status:ARCHIVED)",
"fields": [
"type",
"name",
"companyNumber",
"status",
"addresses.unit",
"addresses.address1",
"addresses.address2",
"addresses.address3",
"addresses.town",
"addresses.state",
"addresses.postcode",
"addresses.country"
],
"default_operator": "or",
"analyze_wildcard": true
}
}
}
}
}
on which "addresses.*:sample*" is the only input.
"query": "(sample* AND NOT status:ARCHIVED)"
Code above works but searches all fields of the customer object.
Since I want to search only on address fields I used the "addresses.*"
Query works only if the fields of the address object are of String type and before I added longitude and latitude fields of double type on address object. Now the error occurs because of these two new fields.
Error:
Parse Failure [Failed to parse source [{
"query": {
"bool": {
"should": {
"query_string": {
"query": "(addresses.\\*:sample* AND NOT status:ARCHIVED)",
"fields": [
"type",
"name",
"companyNumber","country",
"state",
"status",
"addresses.unit",
"addresses.address1",
"addresses.address2",
"addresses.address3",
"addresses.town",
"addresses.state",
"addresses.postcode",
"addresses.country",
],
"default_operator": "or",
"analyze_wildcard": true
}
}
}
}
}
]]
NumberFormatException[For input string: "sample"
Is there a way to search "String" fields within a nested object using addresses.* only?
The solution was to add "lenient": true. As per the documentation (https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-query-string-query.html):
lenient - If set to true, format-based failures (like providing text to a numeric field) will be ignored.
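Applied to the generated query, the flag goes inside the query_string block; the string fields still match, while the double fields latitude and longitude no longer trigger the NumberFormatException:

{
  "query": {
    "bool": {
      "should": {
        "query_string": {
          "query": "(addresses.\\*:sample* AND NOT status:ARCHIVED)",
          "fields": [
            "type",
            "name",
            "companyNumber",
            "status",
            "addresses.unit",
            "addresses.address1",
            "addresses.address2",
            "addresses.address3",
            "addresses.town",
            "addresses.state",
            "addresses.postcode",
            "addresses.country"
          ],
          "default_operator": "or",
          "analyze_wildcard": true,
          "lenient": true
        }
      }
    }
  }
}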

dynamic mapping for nested type

I have an index/type of test1/all which looks as follows:
{
"test1": {
"mappings": {
"all": {
"properties": {
"colors": {
"properties": {
"H": {"type": "double"},
"S": {"type": "long"},
"V": {"type": "long"},
"color_percent": {"type": "long"}
}
},
"file_name": {
"type": "string"
},
"id": {
"type": "string"
},
"no_of_colors": {
"type": "long"
}
}
}
}
}
}
I would like to make the colors field nested, so I am trying the following:
PUT /test1/all/_mapping
{
"mappings":{
"all":{
"properties": {
"file_name":{
"type": "string",
"index": "not_analyzed"
},
"id": {
"type": "string",
"index": "not_analyzed"
},
"no_of_colors":{
"type":"long",
"index": "not_analyzed"
},
"colors":{
"type":"nested",
"properties":{
"H":{"type":"double"},
"S":{"type":"long"},
"V":{"type":"long"},
"color_percent":{"type":"integer"}
}
}
}
}
}
}
But I get the following error:
{
"error": "MapperParsingException[Root type mapping not empty after parsing!
Remaining fields: [mappings : {all={properties={file_name={type=string, index=not_analyzed}, id={type=string, index=not_analyzed}, no_of_colors={type=integer, index=not_analyzed}, colors={type=nested, properties={H={type=double}, S={type=long}, V={type=long}, color_percent={type=integer}}}}}}]]",
"status": 400
}
Any suggestions? Appreciate the help.
You're almost there, you simply need to remove the mappings section like this:
PUT /test1/all/_mapping
{
"properties": {
"file_name": {
"type": "string",
"index": "not_analyzed"
},
"id": {
"type": "string",
"index": "not_analyzed"
},
"no_of_colors": {
"type": "long",
"index": "not_analyzed"
},
"colors": {
"type": "nested",
"properties": {
"H": {
"type": "double"
},
"S": {
"type": "long"
},
"V": {
"type": "long"
},
"color_percent": {
"type": "integer"
}
}
}
}
}
However, note that this will not work either, because you cannot change the colors type from object to nested, nor the other string fields from analyzed to not_analyzed. You need to delete your index and re-create it from scratch.
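A minimal sketch of that delete-and-recreate flow, supplying the nested mapping at index creation time (any existing documents must then be re-indexed):

DELETE /test1

PUT /test1
{
  "mappings": {
    "all": {
      "properties": {
        "file_name": { "type": "string", "index": "not_analyzed" },
        "id": { "type": "string", "index": "not_analyzed" },
        "no_of_colors": { "type": "long" },
        "colors": {
          "type": "nested",
          "properties": {
            "H": { "type": "double" },
            "S": { "type": "long" },
            "V": { "type": "long" },
            "color_percent": { "type": "integer" }
          }
        }
      }
    }
  }
}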

Elastic Search queries not working with curl

Running the command:
curl -XGET http://127.0.0.1:9200/30556/_search -d '{
"query": {
"constant_score" : {
"filter" : {
"term" : { "portal_type" : "Folder"}
}
}
}
}'
yields 0 results. The output is:
{"took":1,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":0,"max_score":null,"hits":[]}}
In fact, I can't get any queries to yield results.
However, when I run the same query using the head plugin, it works fine.
I'm on elasticsearch 0.20.2 on Mac OS X. I'm starting elastic search using the command:
bin/elasticsearch -f
Is there something obvious I'm missing? It seems like I have the correct syntax, and I don't get any errors.
Mapping:
{
"30556": {
"portal_catalog": {
"properties": {
"CreationDate": {
"type": "date",
"format": "dateOptionalTime"
},
"Creator": {
"type": "string"
},
"Date": {
"type": "date",
"format": "dateOptionalTime"
},
"Description": {
"type": "string"
},
"ModificationDate": {
"type": "date",
"format": "dateOptionalTime"
},
"SearchableText": {
"type": "string"
},
"Title": {
"type": "string"
},
"Type": {
"type": "string"
},
"UID": {
"type": "string"
},
"allowedRolesAndUsers": {
"type": "string"
},
"created": {
"type": "date",
"format": "dateOptionalTime"
},
"effective": {
"type": "date",
"format": "dateOptionalTime"
},
"effectiveRange": {
"dynamic": "true",
"properties": {
"effectiveRange1": {
"type": "date",
"format": "dateOptionalTime"
},
"effectiveRange2": {
"type": "date",
"format": "dateOptionalTime"
}
}
},
"exclude_from_nav": {
"type": "boolean"
},
"expires": {
"type": "date",
"format": "dateOptionalTime"
},
"getId": {
"type": "string"
},
"getObjPositionInParent": {
"type": "long"
},
"getObjSize": {
"type": "string"
},
"id": {
"type": "string"
},
"is_default_page": {
"type": "boolean"
},
"is_folderish": {
"type": "boolean"
},
"listCreators": {
"type": "string"
},
"meta_type": {
"type": "string"
},
"modified": {
"type": "date",
"format": "dateOptionalTime"
},
"object_provides": {
"type": "string"
},
"path": {
"dynamic": "true",
"properties": {
"depth": {
"type": "long"
},
"path": {
"type": "string"
}
}
},
"portal_type": {
"type": "string"
},
"review_state": {
"type": "string"
},
"sortable_title": {
"type": "string"
},
"total_comments": {
"type": "long"
}
}
}
}
}
Example Indexed Document:
{
"_index": "30556",
"_type": "portal_catalog",
"_id": "30613",
"_score": 1,
"_source": {
"sortable_title": "news",
"exclude_from_nav": false,
"meta_type": "ATFolder",
"Date": "2013-01-14T09:24:56-06:00",
"CreationDate": "2013-01-14T09:24:56-06:00",
"path": {
"depth": 2,
"path": "/el/news"
},
"allowedRolesAndUsers": [
"Anonymous"
],
"portal_type": "Folder",
"id": "news",
"UID": "3116b6c7ec384a9393f238fdde778612",
"expires": "2499-12-31T00:00:00-06:00",
"Subject": [],
"is_folderish": true,
"is_default_page": false,
"effectiveRange": {
"effectiveRange1": "1000-01-01T00:00:00-06:00",
"effectiveRange2": "2499-12-31T00:00:00-06:00"
},
"commentators": [],
"created": "2013-01-14T09:24:56-06:00",
"getRawRelatedItems": [],
"cmf_uid": [],
"Creator": "admin",
"end": [],
"modified": "2013-01-14T09:24:56-06:00",
"Description": "Site News",
"ModificationDate": "2013-01-14T09:24:56-06:00",
"total_comments": 0,
"in_reply_to": [],
"getIcon": "",
"effective": "1000-01-01T00:00:00-06:00",
"SearchableText": "news News Site News ",
"getObjPositionInParent": 61,
"object_provides": [
"collective.syndication.interfaces.ISyndicatable",
"Products.ATContentTypes.interfaces.folder.IATFolder",
"Products.CMFCore.interfaces._content.IContentish",
"z3c.relationfield.interfaces.IHasIncomingRelations",
"webdav.interfaces.IWriteLock"
],
"last_comment_date": null,
"review_state": "published",
"start": [],
"Type": "Folder",
"listCreators": [
"admin"
],
"getId": "news",
"getObjSize": "1 kB",
"Title": "News"
}
}
Try to use lowercase index names.
Does it work?
If not, can you provide your indexed document and the mapping, if any?
UPDATE:
You use the default analyzer, so your field is broken into tokens which are lowercased.
A term filter is not analyzed, so it does not match.
You can lowercase your term filter, use a match query (which is analyzed), or change your mapping and set the field to not_analyzed.
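For example, lowercasing the term so it matches the analyzed tokens:

curl -XGET http://127.0.0.1:9200/30556/_search -d '{
  "query": {
    "constant_score": {
      "filter": {
        "term": { "portal_type": "folder" }
      }
    }
  }
}'

or using a match query, which runs the input through the same analyzer as the indexed field:

curl -XGET http://127.0.0.1:9200/30556/_search -d '{
  "query": {
    "match": { "portal_type": "Folder" }
  }
}'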
