Simple elasticsearch input - Rejecting mapping update: final mapping would have more than 1 type: [_doc, doc]

I'm trying to send data to Elasticsearch but running into an issue where my number field only comes up as a string. These are the steps I took.
Step 1. Add index & mapping
PUT http://123.com:5101/core_060619/
{
"mappings": {
"properties": {
"date": {
"type": "date",
"format": "HH:mm yyyy-MM-dd"
},
"data": {
"type": "integer"
}
}
}
}
Result:
{
"acknowledged": true,
"shards_acknowledged": true,
"index": "core_060619"
}
Step 2. Add data
PUT http://123.com:5101/core_060619/doc/1
{
"test" : [ {
"data" : "119050300",
"date" : "00:00 2019-06-03"
} ]
}
Result:
{
"error": {
"root_cause": [
{
"type": "illegal_argument_exception",
"reason": "Rejecting mapping update to [zyxnewcoreyxbl_060619] as the final mapping would have more than 1 type: [_doc, doc]"
}
],
"type": "illegal_argument_exception",
"reason": "Rejecting mapping update to [zyxnewcoreyxbl_060619] as the final mapping would have more than 1 type: [_doc, doc]"
},
"status": 400
}

You cannot have more than one document type per index in Elasticsearch 6.0.0+. Your index was created without an explicit type, so its mapping was registered under the default _doc type; indexing into doc then tries to add a second type, which is rejected. If you instead declare your document type as doc when creating the index, you can add documents with PUT http://123.com:5101/core_060619/doc/1, PUT http://123.com:5101/core_060619/doc/2, etc.
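Alternatively, you can keep the index exactly as you created it and index through the default _doc endpoint, so no second type is ever introduced; a quick sketch reusing your own payload:
PUT http://123.com:5101/core_060619/_doc/1
{
"test" : [ {
"data" : "119050300",
"date" : "00:00 2019-06-03"
} ]
}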
Elasticsearch 6.x
PUT core_060619/
{
"mappings": {
"doc": { //type of documents in index is 'doc'
"properties": {
"date": {
"type": "date",
"format": "HH:mm yyyy-MM-dd"
},
"data": {
"type": "integer"
}
}
}
}
}
Since the mapping declares that documents in this index are of type doc, we can now add new documents simply by appending /doc/{id} to the index URL:
PUT core_060619/doc/1
{
"test" : [ {
"data" : "119050300",
"date" : "00:00 2019-06-03"
} ]
}
PUT core_060619/doc/2
{
"test" : [ {
"data" : "111120300",
"date" : "10:15 2019-06-02"
} ]
}
Elasticsearch 7.x
Mapping types are removed, but you can use a custom type-like field (or fields) instead:
PUT twitter
{
"mappings": {
"properties": {
"type": { "type": "keyword" },
"name": { "type": "text" },
"user_name": { "type": "keyword" },
"email": { "type": "keyword" },
"content": { "type": "text" },
"tweeted_at": { "type": "date" }
}
}
}
PUT twitter/_doc/user-kimchy
{
"type": "user",
"name": "Shay Banon",
"user_name": "kimchy",
"email": "shay#kimchy.com"
}
PUT twitter/_doc/tweet-1
{
"type": "tweet",
"user_name": "kimchy",
"tweeted_at": "2017-10-24T09:00:00Z",
"content": "Types are going away"
}
GET twitter/_search
{
"query": {
"bool": {
"must": {
"match": {
"user_name": "kimchy"
}
},
"filter": {
"match": {
"type": "tweet"
}
}
}
}
}
See the Elasticsearch documentation on the removal of mapping types for more details.

Related

Update a string parameter in Elasticsearch _mapping

I have such a _mapping in Elasticsearch 6.8:
{
"grch38_test__wes__grch38__variants__20210222" : {
"mappings" : {
"variant" : {
"_meta" : {
"gencodeVersion" : "25",
"hail_version" : "0.2.20",
"genomeVersion" : "38",
"sampleType" : "WES",
"sourceFilePath" : "s3://my_folder/my_vcf.vcf"
},
...
My goal is to issue a query in Kibana to modify variant._meta.sourceFilePath. Following this thread:
Elastic search mapping for nested json objects
I was able to come up with the query:
PUT /grch38_test__wes__grch38__variants__20210222/_mapping/variant
{
"properties": {
"variant": {
"type": "nested",
"properties": {
"_meta": {
"type": "nested",
"properties": {
"type": "text",
"sourceFilePath": "s3://my_folder/my_vcf.vcf"
}
}
}
}
}
}
But it's giving me an error:
elasticsearch mapping Expected map for property [fields] on field [name] but got a class java.lang.String
Full error message:
{
"error": {
"root_cause": [
{
"type": "mapper_parsing_exception",
"reason": "Expected map for property [fields] on field [type] but got a class java.lang.String"
}
],
"type": "mapper_parsing_exception",
"reason": "Expected map for property [fields] on field [type] but got a class java.lang.String"
},
"status": 400
}
I have also tried:
PUT /grch38_test__wes__grch38__variants__20210222/_mapping/variant
{
"properties": {
"variant": {
"type": "nested",
"properties": {
"_meta": {
"type": "nested",
"properties": {
"sourceFilePath": {
"type": "text",
"value":"s3://my_folder/my_vcf.vcf"
}
}
}
}
}
}
}
But it's telling me that value is unsupported:
{
"error": {
"root_cause": [
{
"type": "mapper_parsing_exception",
"reason": "Mapping definition for [sourceFilePath] has unsupported parameters: [value : s3://seqr-dp-data--prod/vcf/dev/grch38_test_contracted.vcf]"
}
],
"type": "mapper_parsing_exception",
"reason": "Mapping definition for [sourceFilePath] has unsupported parameters: [value : s3://seqr-dp-data--prod/vcf/dev/grch38_test_contracted.vcf]"
},
"status": 400
}
What am I doing wrong? How can I modify the field?
_meta is a reserved field for storing application-specific metadata. It's not meant to be searchable and can only be retrieved through the GET mapping API.
This means that if your _meta content was intended to be consistent with what the _meta field is designed for, you cannot apply any mappings to it. It's a "final" hashmap of concrete values and would need to be defined at the top level of your update-mapping payload:
PUT /grch38_test__wes__grch38__variants__20210222/_mapping/variant
{
"_meta": {
"variant": { <-- shared index-level metadata
"gencodeVersion": "25",
"hail_version": "0.2.20",
"genomeVersion": "38",
"sampleType": "WES",
"sourceFilePath": "s3://my_folder/my_vcf.vcf"
}
},
"properties": {
"some_text_field": { <-- actual document properties
"type": "text"
}
}
}
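Once stored this way, the metadata is not part of any document and won't show up in search hits; you read it back with the GET mapping API, for example:
GET /grch38_test__wes__grch38__variants__20210222/_mapping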
If, on the other hand, your _meta field is an unfortunate naming coincidence, you can declare the mappings for it like so:
PUT /grch38_test__wes__grch38__variants__20210222/_mapping/variant
{
"properties": {
"_meta": {
"properties": {
"variant": {
"properties": {
"gencodeVersion": {
"type": "text"
},
"genomeVersion": {
"type": "text"
},
"hail_version": {
"type": "text"
},
"sampleType": {
"type": "text"
},
"sourceFilePath": {
"type": "text"
}
}
}
}
}
}
}
and ingest documents of the form:
POST grch38_test__wes__grch38__variants__20210222/variant
{
"_meta": {
"variant": {
"gencodeVersion": "25",
"hail_version": "0.2.20",
"genomeVersion": "38",
"sampleType": "WES",
"sourceFilePath": "s3://my_folder/my_vcf.vcf"
}
}
}
But again, the _meta content would be document-specific, not index-wide!
And BTW, the nested mapping only makes sense if you're dealing with arrays of objects, not objects of objects.
But if you insist on wanting it, here's how you'd do it:
PUT /grch38_test__wes__grch38__variants__20210222/_mapping/variant?include_type_name
{
"properties": {
"_meta": {
"type": "nested", <---
"properties": {
"variant": {
"type": "nested", <---
"properties": {
"gencodeVersion": {
"type": "text"
},
"genomeVersion": {
"type": "text"
},
"hail_version": {
"type": "text"
},
"sampleType": {
"type": "text"
},
"sourceFilePath": {
"type": "text"
}
}
}
}
}
}
}

Separation of hits returned from elastic by nested field value

I've got an index with products in it. I'm trying to separate the hits returned from Elasticsearch by a nested field value. Here's my shortened index:
{
"mapping": {
"product": {
"properties": {
"id": {
"type": "integer"
},
"model_name": {
"type": "text",
},
"variants": {
"type": "nested",
"properties": {
"attributes": {
"type": "nested",
"properties": {
"id": {
"type": "integer"
},
"name": {
"type": "text"
},
"product_attribute_id": {
"type": "integer"
},
"value": {
"type": "text"
}
}
},
"id": {
"type": "integer"
},
"product_id": {
"type": "integer"
}
}
}
}
}
}
}
And a product example (there are more variants and attributes in the product, I just cut them off):
{
"_index":"product_index",
"_type":"product",
"id":192,
"model_name":"Some tshirt",
"variants":[
{
"id":1271,
"product_id":192,
"attributes":[
{
"id":29,
"name":"clothesSize",
"value":"XL",
"product_attribute_id":36740
}
]
},
{
"id":1272,
"product_id":192,
"attributes":[
{
"id":29,
"name":"clothesSize",
"value":"L",
"product_attribute_id":36741
}
]
}
]
}
The field in question is the attribute id. Let's say I want to separate products by the size attribute (id 29). It would be perfect if the response looked like this:
"hits" : [
{
"_index":"product_index",
"_type":"product",
"id":192,
"model_name":"Some tshirt",
"variants":[
{
"id":1271,
"product_id":192,
"attributes":[
{
"id":29,
"name":"clothesSize",
"value":"XL",
"product_attribute_id":36740
}
]
}
]
},
{
"_index":"product_index",
"_type":"product",
"id":192,
"model_name":"Some tshirt",
"variants":[
{
"id":1272,
"product_id":192,
"attributes":[
{
"id":29,
"name":"clothesSize",
"value":"L",
"product_attribute_id":36741
}
]
}
]
}]
I thought about fetching all the variants in the Elasticsearch request and then grouping them on the application side by that attribute, but I don't think that's the most elegant or, above all, the most efficient way.
What are the Elasticsearch keywords that I should be interested in?
Thank you in advance for your help.
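One feature worth looking into is inner_hits on a nested query: it won't split a product into several top-level hits on its own, but it returns only the matching variants next to each product, which you can then expand on the application side. A rough sketch against the mapping above (field paths assumed from the index shown):
GET product_index/_search
{
  "query": {
    "nested": {
      "path": "variants",
      "inner_hits": {},
      "query": {
        "nested": {
          "path": "variants.attributes",
          "query": {
            "term": { "variants.attributes.id": 29 }
          }
        }
      }
    }
  }
}
Each hit then carries an inner_hits.variants section listing only the variants whose attributes matched id 29.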

elasticsearch reindex nested object's element to keyword

I have an index structured like below:
"my_index": {
"mappings": {
"my_index": {
"properties": {
"adId": {
"type": "keyword"
},
"name": {
"type": "keyword"
},
"title": {
"type": "keyword"
},
"creativeStatistics": {
"type": "nested",
"properties": {
"clicks": {
"type": "long"
},
"creativeId": {
"type": "keyword"
}
}
}
}
}
}
}
I need to remove the nested object in a new index and just save the creativeId as a new keyword (to make it clear: I know I will lose the clicks data, and that is not important). This means the final new index mapping would be:
"my_new_index": {
"mappings": {
"my_new_index": {
"properties": {
"adId": {
"type": "keyword"
},
"name": {
"type": "keyword"
},
"title": {
"type": "keyword"
},
"creativeId": {
"type": "keyword"
}
}
}
}
}
Right now each document has exactly one creativeStatistics entry, so there is no complexity in selecting one of the creativeIds.
I know it is possible to reindex using Painless scripts, but I don't know how to do that. Any help will be appreciated.
You can do it like this:
POST _reindex
{
"source": {
"index": "my_old_index"
},
"dest": {
"index": "my_new_index"
},
"script": {
"source": "if (ctx._source.creativeStatistics != null && ctx._source.creativeStatistics.size() > 0) {ctx._source.creativeId = ctx._source.creativeStatistics[0].creativeId; ctx._source.remove('creativeStatistics')}",
"lang": "painless"
}
}
You can also create an ingest pipeline with a script processor as follows:
PUT _ingest/pipeline/my_pipeline
{
"description" : "My pipeline",
"processors" : [
{ "script" : {
"source": "for (item in ctx.creativeStatistics) { if(item.creativeId!=null) {ctx.creativeId = item.creativeId;} }"
}
},
{
"remove": {
"field": "creativeStatistics"
}
}
]
}
Note that if you have multiple nested objects, the loop ends up keeping only the last object's creativeId. And it only adds creativeId if a source document has one in its creativeStatistics.
Below is how you can then use the reindex API with that pipeline:
POST _reindex
{
"source": {
"index": "creativeindex_src"
},
"dest": {
"index": "creativeindex_dest",
"pipeline": "my_pipeline"
}
}
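If you want to sanity-check the pipeline before running the reindex, you can feed it a sample document through the simulate API; the sample values below are made up:
POST _ingest/pipeline/my_pipeline/_simulate
{
  "docs": [
    {
      "_source": {
        "adId": "ad-1",
        "name": "some name",
        "title": "some title",
        "creativeStatistics": [
          { "clicks": 10, "creativeId": "creative-1" }
        ]
      }
    }
  ]
}
The response shows the transformed document, so you can confirm that creativeId gets promoted and creativeStatistics gets removed before touching the real index.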

Elasticsearch index type doesn't change after updating

I've made some _bulk inserts successfully; now I'm trying to run a query with a date range and a filter, something like:
{
"query": {
"bool": {
"must": [{
"terms": {
"mt_id": [613]
}
},
{
"range": {
"time": {
"gt": 1470009600000,
"lt": 1470009600000
}
}
}]
}
}
}
Unfortunately I got no results. Then I noticed that the index mapping created after the bulk insert looks like this:
{
"agg__ex_2016_8_3": {
"mappings": {
"player": {
"properties": {
"adLoad": {
"type": "long"
},
"mt_id": {
"type": "long"
},
"time": {
"type": "string"
}
}
},
As a solution I tried to change the index mapping with:
PUT /agg__ex_2016_8_3/_mapping/player
{
"properties" : {
"mt_id" : {
"type" : "long",
"index": "not_analyzed"
}
}
}
and got:
{
"acknowledged": true
}
and then tried:
PUT /agg__ex_2016_8_3/_mapping/player
{
"properties" : {
"time" : {
"type" : "date",
"format" : "yyyy/MM/dd HH:mm:ss"
}
}
}
got:
{
"error": {
"root_cause": [
{
"type": "remote_transport_exception",
"reason": "[vj_es_c1-esc13][10.132.69.145:9300][indices:admin/mapping/put]"
}
],
"type": "illegal_argument_exception",
"reason": "mapper [time] of different type, current_type [string], merged_type [date]"
},
"status": 400
}
but nothing happened, and I still don't get any results.
What am I doing wrong? (I have to work over HTTP, not using curl.)
Thanks!!
Try this:
# 1. delete index
DELETE agg__ex_2016_8_3
# 2. recreate it with the proper mapping
PUT agg__ex_2016_8_3
{
"mappings": {
"player": {
"properties": {
"adLoad": {
"type": "long"
},
"mt_id": {
"type": "long"
},
"time": {
"type": "date"
}
}
}
}
}
# 3. create doc
PUT agg__ex_2016_8_3/player/104
{
"time": "1470009600000",
"domain": "organisemyhouse.com",
"master_domain": "613###organisemyhouse.com",
"playerRequets": 4,
"playerLoads": 0,
"c_Id": 0,
"cb_Id": 0,
"mt_Id": 613
}
# 4. search
POST agg__ex_2016_8_3/_search
{
"query": {
"bool": {
"must": [
{
"terms": {
"mt_Id": [
613
]
}
},
{
"range": {
"time": {
"gte": 1470009600000,
"lte": 1470009600000
}
}
}
]
}
}
}
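If deleting the index is not acceptable because the bulk-inserted data has to be kept, an alternative is to create a second index with the corrected mapping and copy the documents over with the reindex API. A sketch (the _v2 name is just an example, and this assumes the stored time values parse as epoch millis):
# create a new index with the proper mapping
PUT agg__ex_2016_8_3_v2
{
  "mappings": {
    "player": {
      "properties": {
        "adLoad": { "type": "long" },
        "mt_id": { "type": "long" },
        "time": { "type": "date" }
      }
    }
  }
}
# copy the existing documents across
POST _reindex
{
  "source": { "index": "agg__ex_2016_8_3" },
  "dest": { "index": "agg__ex_2016_8_3_v2" }
}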

Update and search in multi-field properties in Elasticsearch

I'm trying to use multi-field properties for multi-language support. I created the following mapping for this:
{
"mappings": {
"product": {
"properties": {
"prod-id": {
"type": "string"
},
"prod-name": {
"type": "string",
"fields": {
"en": {
"type": "string",
"analyzer": "english"
},
"fr": {
"type": "string",
"analyzer": "french"
}
}
}
}
}
}
}
I created a test record:
{
"prod-id": "1234567",
"prod-name": [
"Test product",
"Produit d'essai"
]
}
and tried to query it in one specific language:
{
"query": {
"bool": {
"must": [
{"match": {
"prod-name.en": "Produit"
}}
]
}
}
}
As a result I got my document back. But I expected an empty result when I query the English sub-field with a French word. It seems Elasticsearch ignores which field I specify in the query: there is no difference in the search results whether I use "prod-name.en", "prod-name.fr" or just "prod-name". Is this behaviour expected? Do I need to do something special to search in just one language?
Another problem is with updating a multi-field property. I can't update just one field:
{
"doc" : {
"prod-name.en": "Test"
}
}
I got the following error:
{
"error": {
"root_cause": [
{
"type": "mapper_parsing_exception",
"reason": "Field name [prod-name.en] cannot contain '.'"
}
],
"type": "mapper_parsing_exception",
"reason": "Field name [prod-name.en] cannot contain '.'"
},
"status": 400
}
Is there any way to update just one field in a multi-field property?
In your mapping, prod-name.en is simply the prod-name value analyzed with the english analyzer, and the same goes for the fr sub-field; every value of prod-name is indexed into both sub-fields, which is why you get the same results whichever one you query. ES will not choose for you which value to put in which field.
Instead, you need to modify your mapping like this:
{
"mappings": {
"product": {
"properties": {
"prod-id": {
"type": "string"
},
"prod-name": {
"type": "object",
"properties": {
"en": {
"type": "string",
"analyzer": "english"
},
"fr": {
"type": "string",
"analyzer": "french"
}
}
}
}
}
}
}
and change your input document to look like this, and you'll get the results you expect:
{
"prod-id": "1234567",
"prod-name": {
"en": "Test product",
"fr": "Produit d'essai"
}
}
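With that mapping, a language-specific search only hits the corresponding sub-field. For example, this query (index and type names are assumed here) matches the French value, while the same text against prod-name.en would return nothing:
POST products/product/_search
{
  "query": {
    "match": {
      "prod-name.fr": "produit"
    }
  }
}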
As for the updating part, your partial document should look like this instead:
{
"doc" : {
"prod-name": {
"en": "Test"
}
}
}
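For reference, a sketch of the full update call that partial document would be sent with (the index name and document id are assumptions):
POST products/product/1/_update
{
  "doc": {
    "prod-name": {
      "en": "Test"
    }
  }
}
Because the update API merges objects recursively, only prod-name.en is overwritten and prod-name.fr is left untouched.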
