Elasticsearch: "birthday" exception

This is my document:
{
  "user" : {
    "name" : "test",
    "birthday" : "123"
  }
}
When I POST this to Elasticsearch, it fails with:
"type" : "mapper_parsing_exception",
"reason" : "object mapping for [user.birthday] tried to parse field [birthday] as object, but found a concrete value"
But if I change it to this:
{
  "user" : {
    "name" : "test",
    "birthay" : "123"
  }
}
It works fine.
Is birthday a reserved keyword? What can I do about it?

It's a problem with your mapping. I suppose your birthday is a date, like below:
{
  "properties": {
    "name": {
      "type": "string",
      "index": "not_analyzed"
    },
    "birthday": {
      "type": "date",
      "format": "yyyy-MM-dd"
    }
  }
}

I imagine your mapping looks something like this:
{
  "properties": {
    "name": {
      "type": "string",
      "index": "not_analyzed"
    },
    "birthday": {
      "type": "object",
      "properties" : {
        "date" : { "type" : "string" }
      },
      "index": "not_analyzed"
    }
  }
}
Or at least something similar that sets the birthday field to be an object type. Your mapping actually needs to be as follows:
{
  "properties": {
    "name": {
      "type": "string",
      "index": "not_analyzed"
    },
    "birthday": {
      "type": "date",
      "format": "yyyy-MM-dd",
      "index": "not_analyzed"
    }
  }
}
And the reason that setting the document field name to 'birthay' instead of 'birthday' worked is that, when no type mapping is set for a field, Elasticsearch dynamically determines the type that fits best.
It's also worth noting that if you don't have a mapping defined and you're getting this error, it might be because a document indexed before the failing one had something other than a string-format date as the birthday. This would cause ES to infer a different field type and then fail on subsequent documents.
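If you're not sure what mapping is in effect, you can ask Elasticsearch what it inferred (the index name test_index here is a placeholder for your own index):
GET /test_index/_mapping
If the output shows birthday with a "properties" block instead of a "type", the field was mapped as an object, which matches the exception above.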

Related

How to update field format in Opensearch/Elasticsearch?

I am trying to change the format of a string field in OpenSearch:
PUT my_index/_mapping
{
  "mappings": {
    "properties": {
      "timestamp": {
        "type": "date",
        "format": "YYYY-MM-DD HH:mm:ss.SSS"
      }
    }
  }
}
The response is:
{
  "error" : {
    "root_cause" : [
      {
        "type" : "mapper_parsing_exception",
        "reason" : "Root mapping definition has unsupported parameters: [mappings : {properties={timestamp={format=YYYY-MM-DD HH:mm:ss.SSS, type=date}}}]"
      }
    ],
    "type" : "mapper_parsing_exception",
    "reason" : "Root mapping definition has unsupported parameters: [mappings : {properties={timestamp={format=YYYY-MM-DD HH:mm:ss.SSS, type=date}}}]"
  },
  "status" : 400
}
I've spent days trying to figure this out; it seems to me like OpenSearch is just unnecessarily complex.
The immediate parsing error is because the _mapping API does not take a "mappings" wrapper; the body should start directly with "properties". But more fundamentally, you cannot change the type of an existing field once it's been created. You need to reindex your index with the wrong mapping into a new index with the right mapping.
First, create the new index. Note the date pattern: lowercase yyyy and dd, since in java.time patterns uppercase YYYY means week-based year and DD means day-of-year:
PUT new_index
{
  "mappings": {
    "properties": {
      "timestamp": {
        "type": "date",
        "format": "yyyy-MM-dd HH:mm:ss.SSS"
      }
    }
  }
}
Then, reindex the old index into the new one:
POST _reindex
{
  "source": {
    "index": "old_index"
  },
  "dest": {
    "index": "new_index"
  }
}
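If your applications expect the old index name, a common follow-up (a sketch; it assumes old_index can be deleted once the reindex has finished) is to remove the old index and point an alias with the old name at the new one:
POST _aliases
{
  "actions": [
    { "remove_index": { "index": "old_index" } },
    { "add": { "index": "new_index", "alias": "old_index" } }
  ]
}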

ElasticSearch - string concat aggregation?

I've got the following simple mapping:
"element": {
"dynamic": "false",
"properties": {
"id": { "type": "string", "index": "not_analyzed" },
"group": { "type": "string", "index": "not_analyzed" },
"type": { "type": "string", "index": "not_analyzed" }
}
}
which is basically a way to store a Group object:
{
  id : "...",
  elements : [
    {id: "...", type: "..."},
    ...
    {id: "...", type: "..."}
  ]
}
I want to find how many different groups exist sharing the same set of element types (ordered, including repetitions).
An obvious solution would be to change the schema to:
"element": {
"dynamic": "false",
"properties": {
"group": { "type": "string", "index": "not_analyzed" },
"concatenated_list_of_types": { "type": "string", "index": "not_analyzed" }
}
}
But, due to the requirements, we need to be able to exclude some types from the group-by (aggregation) :(
All fields of the document are Mongo IDs, so in SQL I would do something like this:
SELECT COUNT(group_id), concat_value FROM (
  SELECT GROUP_CONCAT(type_id) AS concat_value, group_id
  FROM table
  WHERE type_id != 'some_filtered_out_type_id'
  GROUP BY group_id
) T GROUP BY concat_value
In Elastic, with the given mapping it's really easy to filter out; it's also not a problem to count, assuming we have a concatenated value. Needless to say, the sum aggregation does not work for strings.
How can I get this working? :)
Thanks!
Finally I solved this problem with scripting and by changing the mapping.
{
  "mappings": {
    "group": {
      "dynamic": "false",
      "properties": {
        "id": { "type": "string", "index": "not_analyzed" },
        "elements": { "type": "string", "index": "not_analyzed" }
      }
    }
  }
}
There are still some issues with duplicate elements in the array (ScriptDocValues.Strings strips out duplicates for some reason), but here's an aggregation that counts by string concatenation:
{
  "aggs": {
    "path": {
      "scripted_metric": {
        "map_script": "key = doc['elements'].join('-'); _agg[key] = _agg[key] ? _agg[key] + 1 : 1",
        "combine_script": "_agg",
        "reduce_script": "_aggs.collectMany { it.entrySet() }.inject( [:] ) { result, e -> result << [ (e.key):e.value + ( result[ e.key ] ?: 0 ) ]}"
      }
    }
  }
}
The result would be as follows:
"aggregations" : {
"path" : {
"value" : {
"5639abfb5cba47087e8b457e" : 362,
"568bfc495cba47fc308b4567" : 3695,
"5666d9d65cba47701c413c53" : 14,
"5639abfb5cba47087e8b4571-5639abfb5cba47087e8b457b" : 1,
"570eb97abe529e83498b473d" : 1
}
}
}
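To also meet the exclusion requirement from the question, one option (a sketch in the same Groovy scripting; 'some_filtered_out_type_id' is a placeholder) is to drop the unwanted type ids in the map_script before joining:
"map_script": "key = doc['elements'].findAll { it != 'some_filtered_out_type_id' }.join('-'); _agg[key] = _agg[key] ? _agg[key] + 1 : 1"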

elasticsearch run any query on field exists

I want to run a query/filter based on whether a field exists. In our case, we only store a value if the user answers a particular question; otherwise we don't store the field at all. How can I run the query?
Below is my mapping:
"mappings": {
"responses_10_57": {
"properties": {
"rid: {
"type": "long"
},
"end_time": {
"type": "date",
"format": "dateOptionalTime"
},
"start_time": {
"type": "date",
"format": "dateOptionalTime"
},
"qid_1": {
"type": "string",
"fields": {
"raw": {
"type": "string",
"analyzer": "str_params"
}
}
},
"qid_2": {
"type": "string",
"fields": {
"raw": {
"type": "string",
"analyzer": "str_params"
}
}
},
"qid_3": {
"properties": {
"msg_text": {
"type": "string"
},
"msg_tags": {
"type": "string",
"fields": {
"raw": {
"type": "string",
"analyzer": "str_params"
}
}
}
}
}
}
}
}
qid_1 is the name field, qid_2 is the category field, qid_3 is the text message field.
But qid_3 is not a mandatory field, so we don't insert it if the user hasn't entered any text message.
1) I want, for each category, a count of those who answered the third question.
2) I need to find the names of those who answered the third question.
How can I write these two queries?
Both queries should have an exists filter to limit the response to only those documents where qid_3 exists (is not null); wrapping it in a filtered query ensures it also restricts the aggregation. For your first query, you could use a terms aggregation. For your second query, you can filter the _source to include only the names in the response, or store the field and use fields.
1)
{
  "size": 0,
  "query": {
    "filtered": {
      "filter": {
        "exists": { "field": "qid_3" }
      }
    }
  },
  "aggs": {
    "group_by_category": {
      "terms": { "field": "qid_2" }
    }
  }
}
2)
{
  "query": {
    "filtered": {
      "filter": {
        "exists": { "field": "qid_3" }
      }
    }
  },
  "_source": [ "qid_1" ]
}
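Conversely, to find respondents who skipped the third question, the same shape works with a missing filter (ES 1.x syntax, matching the queries above):
{
  "query": {
    "filtered": {
      "filter": {
        "missing": { "field": "qid_3" }
      }
    }
  }
}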

Multiple document types with same mapping in Elasticsearch

I have an index named test which can be associated with any number of document types named sub_text_1 to sub_text_n, all of which will have the same mapping.
Is there any way to make an index such that all document types share the same mapping for their documents? I.e. test/sub_text1/_mapping should be the same as test/sub_text2/_mapping.
Otherwise, if I have something like 1000 document types, I will have 1000 identical mappings, one per document type.
UPDATE:
PUT /test_index/
{
  "settings": {
    "index.store.type": "default",
    "index": {
      "number_of_shards": 5,
      "number_of_replicas": 1,
      "refresh_interval": "60s"
    },
    "analysis": {
      "filter": {
        "porter_stemmer_en_EN": {
          "type": "stemmer",
          "name": "porter"
        },
        "default_stop_name_en_EN": {
          "type": "stop",
          "name": "_english_"
        },
        "snowball_stop_words_en_EN": {
          "type": "stop",
          "stopwords_path": "snowball.stop"
        },
        "smart_stop_words_en_EN": {
          "type": "stop",
          "stopwords_path": "smart.stop"
        },
        "shingle_filter_en_EN": {
          "type": "shingle",
          "min_shingle_size": "2",
          "max_shingle_size": "2",
          "output_unigrams": true
        }
      }
    }
  }
}
Intended mapping:
{
  "sub_text" : {
    "properties" : {
      "_id" : {
        "include_in_all" : false,
        "type" : "string",
        "store" : true,
        "index" : "not_analyzed"
      },
      "alternate_id" : {
        "include_in_all" : false,
        "type" : "string",
        "store" : true,
        "index" : "not_analyzed"
      },
      "text" : {
        "type" : "multi_field",
        "fields" : {
          "text" : {
            "type" : "string",
            "store" : true,
            "index" : "analyzed"
          },
          "pdf": {
            "type" : "attachment",
            "fields" : {
              "pdf" : {
                "type" : "string",
                "store" : true,
                "index" : "analyzed"
              }
            }
          }
        }
      }
    }
  }
}
I want this mapping to be an individual mapping for each sub_text I create, so that I can change it for one sub_text without affecting the others; e.g. I may want to add two custom analyzers to sub_text1 and three analyzers to sub_text3, while the rest stay the same.
UPDATE:
PUT /my-index/document_set/_mapping
{
  "properties": {
    "type": {
      "type": "string",
      "index": "not_analyzed"
    },
    "doc_id": {
      "type": "string",
      "index": "not_analyzed"
    },
    "plain_text": {
      "type": "string",
      "store": true,
      "index": "analyzed"
    },
    "pdf_text": {
      "type": "attachment",
      "fields": {
        "pdf_text": {
          "type": "string",
          "store": true,
          "index": "analyzed"
        }
      }
    }
  }
}
POST /my-index/document_set/1
{
  "type": "d1",
  "doc_id": "1",
  "plain_text": "simple text for doc1."
}
POST /my-index/document_set/2
{
  "type": "d1",
  "doc_id": "2",
  "pdf_text": "cGRmIHRleHQgaXMgaGVyZS4="
}
POST /my-index/document_set/3
{
  "type": "d2",
  "doc_id": "3",
  "plain_text": "simple text for doc3 in d2."
}
POST /my-index/document_set/4
{
  "type": "d2",
  "doc_id": "4",
  "pdf_text": "cGRmIHRleHQgaXMgaGVyZSBpbiBkMi4="
}
GET /my-index/document_set/_search
{
  "query" : {
    "filtered" : {
      "filter" : {
        "term" : {
          "type" : "d1"
        }
      }
    }
  }
}
This gives me the documents related to type "d1". How do I add analyzers only to documents of type "d1"?
At the moment, a possible solution is to use index templates or dynamic mappings. However, they do not allow wildcard type matching, so you would have to use the _default_ root type to apply the mappings to all types in the index; it would then be up to you to ensure that all your types can use the same dynamic mapping. This template example may work for you:
curl -XPUT localhost:9200/_template/template_1 -d '
{
  "template" : "test",
  "mappings" : {
    "_default_" : {
      "dynamic": true,
      "properties": {
        "field1": {
          "type": "string",
          "index": "not_analyzed"
        }
      }
    }
  }
}
'
Do not do this.
Otherwise, if I have something like 1000 document types, I will have 1000 identical mappings, one per document type.
You're exactly right. For every additional _type with an identical mapping you are needlessly adding to the size of your index's mapping. They will not be merged, nor will any compression save you.
A much better solution is to simply create a shared _type and to create a field that represents the intended type. This completely avoids having wasted mappings and all of the negatives associated with it, including an unnecessary increase for your cluster state's size.
From there, you can imitate what Elasticsearch is doing for you and filter on your custom type without ballooning your mappings.
$ curl -XPUT localhost:9200/my-index -d '{
  "mappings" : {
    "my-type" : {
      "properties" : {
        "type" : {
          "type" : "string",
          "index" : "not_analyzed"
        },
        # ... whatever other mappings exist ...
      }
    }
  }
}'
Then, for any search against sub_text1 (etc.), you can use a term filter (for one) or a terms filter (for more than one) to imitate the _type filter that would otherwise happen for you.
$ curl -XGET localhost:9200/my-index/my-type/_search -d '{
  "query" : {
    "filtered" : {
      "filter" : {
        "term" : {
          "type" : "sub_text1"
        }
      }
    }
  }
}'
This does the same thing as the _type filter, and you can create _aliases that contain the filter if you want the higher-level search capability without exposing the filtering logic to clients.
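For example, a filtered alias along these lines (a sketch; the index and alias names are illustrative) lets clients search sub_text1 as if it were its own index:
$ curl -XPOST localhost:9200/_aliases -d '{
  "actions" : [
    {
      "add" : {
        "index" : "my-index",
        "alias" : "sub_text1",
        "filter" : { "term" : { "type" : "sub_text1" } }
      }
    }
  ]
}'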

Dynamic Mapping for Nested Type

I am trying to create a dynamic mapping for objects like the following:
{
  "product": {
    "productId": 99999,
    "manufacturerId": "A0001",
    "manufacturerCode": "A101LI",
    "name": "Test Product",
    "description": "Describe the product here.",
    "feature_details": {
      "category": "Category1",
      "brand": "Brand Name"
    },
    "feature_tpcerts": {
      "certifiedPass": true,
      "levelCertified": 2
    },
    "feature_characteristics": {
      "amount": 0.73,
      "location": 49464
    }
  }
}
I would like the feature_* properties to be of the nested type, which I have defined in the mapping below with the nested_feature template, and that is working as expected. However, I also want each property in the nested object of a feature_* property to be multi_field, with an additional facet property defined. I have tried the second nested_template template, but without any success.
{
  "product" : {
    "_timestamp" : { "enabled" : true, "store": "yes" },
    "dynamic_templates": [
      {
        "nested_feature": {
          "match" : "feature_*",
          "mapping" : {
            "type" : "nested",
            "stored": "true"
          }
        }
      },
      {
        "nested_template": {
          "match": "feature_*.*",
          "mapping": {
            "type": "multi_field",
            "fields": {
              "{name}": {
                "type": "{dynamic_type}",
                "index": "analyzed"
              },
              "facet": {
                "type": "{dynamic_type}",
                "index": "not_analyzed"
              }
            }
          }
        }
      }
    ],
    "properties" : {
      "productId" : { "type" : "integer", "store" : "yes" },
      "manufacturerId" : { "type" : "string", "store" : "yes", "index" : "analyzed" },
      "manufacturer" : { "type" : "string", "store" : "yes", "index" : "not_analyzed" },
      "manufacturerCode" : { "type" : "string", "store" : "yes" },
      "name" : { "type" : "string", "store" : "yes" },
      "description": { "type": "string", "index" : "analyzed" }
    }
  }
}
Unfortunately, the properties within the feature_* properties are created by another process and can be almost any name/value pair. Any suggestions on how to use a dynamic template to set a property up as nested, as well as make each property within the nested object multi_field with an additional facet property?
You just have to use path_match instead of match when the pattern refers to the whole field path; otherwise only the field's name (its last part) is taken into account. Have a look at the reference page for the root object, which also contains some documentation related to dynamic templates.
You might also want to use match_mapping_type, as you can't set "index": "analyzed" for numeric or boolean fields, for instance. In that case you might want to do different things depending on the field type.
I noticed that your document contains the product root object, which you don't really need. I would remove it, as the type name is already product.
Also, I would avoid storing fields explicitly unless you really need to; with Elasticsearch you have the _source field stored by default, which is what you are going to need all the time.
The following mapping should work in your case (without the product root object in the documents):
{
  "product" : {
    "dynamic_templates": [
      {
        "nested_feature": {
          "match" : "feature_*",
          "mapping" : {
            "type" : "nested"
          }
        }
      },
      {
        "nested_template": {
          "path_match": "feature_*.*",
          "match_mapping_type" : "string",
          "mapping": {
            "type": "multi_field",
            "fields": {
              "{name}": {
                "type": "{dynamic_type}",
                "index": "analyzed"
              },
              "facet": {
                "type": "{dynamic_type}",
                "index": "not_analyzed"
              }
            }
          }
        }
      }
    ]
  }
}
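With those templates in place, a filter on the not_analyzed facet subfield should work through a nested filter (a sketch against the example document above; the field path assumes feature_details was indexed as nested by the first template):
{
  "query": {
    "filtered": {
      "filter": {
        "nested": {
          "path": "feature_details",
          "filter": {
            "term": { "feature_details.brand.facet": "Brand Name" }
          }
        }
      }
    }
  }
}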
