ElasticSearch : "copy_to" a nested fields

I am trying to use the Elasticsearch "copy_to" attribute to replicate an object field into a nested field, but I get an error despite multiple attempts. Here is my structure:
...
"identifiedBy": {
"type": "object",
"properties": {
"type": {
"type": "keyword",
"copy_to": "nested_identifiers.type"
},
"value": {
"type": "text",
"analyzer": "identifier-analyzer",
"copy_to": "nested_identifiers.type"
},
"note": {
"type": "text"
},
"qualifier": {
"type": "keyword"
},
"source": {
"type": "keyword",
"copy_to": "nested_identifiers.type"
},
"status": {
"type": "text"
}
}
},
"nested_identifiers": {
"type": "nested",
"properties": {
"type": {
"type": "keyword"
},
"value": {
"type": "text",
"analyzer": "identifier-analyzer"
},
"source": {
"type": "keyword"
}
}
}
...
The mapping error is
java.lang.IllegalArgumentException: Illegal combination of [copy_to] and [nested]
mappings: [copy_to] may only copy data to the current nested document or any of its
parents, however one [copy_to] directive is trying to copy data from nested object [null]
to [nested_identifiers]
I also tried placing "copy_to" at the root level of "identifiedBy": it doesn't work.
I also tried adding a "fields" property to "identifiedBy" and using "copy_to" on that subfield: it doesn't work either.
Does anyone know a solution to my problem?
Thanks for your help.

TL;DR
Because of how Elasticsearch indexes nested documents, this is not possible without updating the mapping.
There is a workaround: the include_in_root: true setting.
Otherwise, I suggest you preprocess your data before indexing it and copy the data over to the nested field during that preprocessing, for example with an ingest pipeline.
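If the preprocessing happens in the client rather than in an ingest pipeline, it could look like the following minimal sketch. The field names come from the mapping in the question; the function name `add_nested_identifiers` and the sample values are hypothetical and only there for illustration.

```python
from copy import deepcopy

# Fields to replicate, taken from the nested_identifiers mapping in the question.
COPIED_FIELDS = ("type", "value", "source")


def add_nested_identifiers(doc):
    """Return a copy of doc with nested_identifiers built from identifiedBy."""
    result = deepcopy(doc)
    identified_by = result.get("identifiedBy")
    if identified_by is None:
        return result
    # identifiedBy may hold a single object or a list of objects.
    entries = identified_by if isinstance(identified_by, list) else [identified_by]
    result["nested_identifiers"] = [
        {key: entry[key] for key in COPIED_FIELDS if key in entry}
        for entry in entries
    ]
    return result


# Illustrative document; the values are made up.
doc = {
    "identifiedBy": [
        {"type": "isbn", "value": "9782070360024", "source": "bnf", "note": "paperback"},
        {"type": "ean", "value": "1234567890123"},
    ]
}
prepared = add_nested_identifiers(doc)
print(prepared["nested_identifiers"])
```

The prepared document can then be indexed as usual; the original document is left untouched.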
Ingest Pipeline
PUT /72270706/
{
"mappings": {
"properties": {
"root_type":{
"type": "keyword"
},
"nested_doc":{
"type": "nested",
"properties": {
"nested_type":{
"type": "keyword"
}
}
}
}
}
}
PUT _ingest/pipeline/set_nested_type
{
"processors": [
{
"set": {
"field": "nested_doc.nested_type",
"copy_from": "root_type"
}
}
]
}
POST /72270706/_doc?pipeline=set_nested_type
{
"root_type": "a type"
}
GET /72270706/_search
This should return:
{
"took" : 392,
"timed_out" : false,
"_shards" : {
...
},
"hits" : {
"total" : {
...
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "72270706",
"_id" : "laOB0YABOgujegeQNA8D",
"_score" : 1.0,
"_source" : {
"root_type" : "a type",
"nested_doc" : {
"nested_type" : "a type"
}
}
}
]
}
}
Workaround
...
"identifiedBy": {
"type": "object",
"properties": {
"type": {
"type": "keyword",
"copy_to": "nested_identifiers.type"
},
"value": {
"type": "text",
"analyzer": "identifier-analyzer",
"copy_to": "nested_identifiers.type"
},
"note": {
"type": "text"
},
"qualifier": {
"type": "keyword"
},
"source": {
"type": "keyword",
"copy_to": "nested_identifiers.type"
},
"status": {
"type": "text"
}
}
},
"nested_identifiers": {
"type": "nested",
"include_in_root": true,
"properties": {
"type": {
"type": "keyword"
},
"value": {
"type": "text",
"analyzer": "identifier-analyzer"
},
"source": {
"type": "keyword"
}
}
}
...
You will need to reindex the existing data.
But be aware that copy_to will not copy the information into the nested object. It copies it to another field that has the same name but is not nested.
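To make that caveat concrete, here is a sketch of two search bodies built as Python dictionaries. It assumes the workaround mapping above and follows the behavior described in this answer; the value "isbn" is an illustrative assumption, and the behavior should be verified against your own cluster.

```python
import json

# Illustrative value, not taken from the question.
identifier_type = "isbn"

# Plain term query: targets the flattened root-level field, which is where
# copy_to writes according to the caveat above.
flat_query = {
    "query": {"term": {"nested_identifiers.type": identifier_type}}
}

# Nested query: targets only values stored inside the nested objects, so it
# would not see values that arrived through copy_to.
nested_query = {
    "query": {
        "nested": {
            "path": "nested_identifiers",
            "query": {"term": {"nested_identifiers.type": identifier_type}},
        }
    }
}

print(json.dumps(flat_query, indent=2))
print(json.dumps(nested_query, indent=2))
```

If you need the copied values to match nested queries, the preprocessing approach is the safer route.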

Related

Disable dynamic mapping completely in Elasticsearch

I have an index template, from which I am creating an index
PUT /_index_template/example_template
{
"index_patterns": [
"example*"
],
"priority": 1,
"template": {
"aliases": {
"example":{}
},
"mappings": {
"dynamic": "strict",
"_source":
{"enabled": false},
"properties": {
"SomeID":
{ "type": "keyword", "index" : true,"store":true,"ignore_above":5},
"firstName":
{ "type": "text", "index" : true,"store":true},
"lastName":
{ "type": "text", "index" : false},
"PersonInfo": {
"type": "object",
"dynamic":"true",
"properties": {
"FirstName": {
"type": "keyword",
"index": true,
"store": false
}
}
}
}
},
"settings": {
"index": {
"number_of_shards": 1,
"number_of_replicas": 3
}
}
}
}
As you can see in the template mappings, I set dynamic to strict so that new fields can't be added to the mappings,
whereas on the inner object, PersonInfo, I can set dynamic to true, which takes precedence and allows a new field mapping to be inserted.
PUT example10022021/_doc/1
{
"SomeID":"1234",
"firstName":"Nishikant",
"PersonInfo.service_data":"random"
}
Here service_data gets added to the mappings, because dynamic is true:
"PersonInfo" : {
"dynamic" : "true",
"properties" : {
"FirstName" : {
"type" : "keyword"
},
"service_data" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
}
Is there any way to disable dynamic mapping completely, for example by specifying it globally?
Thanks!
Steps I took after @Val's answer:
PUT /_index_template/example_template
{
"index_patterns": [
"example*"
],
"priority": 1,
"template": {
"aliases": {
"order":{}
},
"mappings": {
"dynamic": "strict",
"dynamic_templates": [
{
"objects": {
"match_mapping_type": "object",
"mapping": {
"dynamic": "strict"
}
}
}
],
"_source":
{"enabled": false},
"properties": {
"BillToID":
{ "type": "keyword", "index" : true,"store":true,"ignore_above":5},
"firstName":
{ "type": "text", "index" : true,"store":true},
"lastName":
{ "type": "text", "index" : false},
"PersonInfo": {
"type": "object",
"dynamic":true,
"properties": {
"FirstName": {
"type": "keyword",
"index": true,
"store": false
}
}
}
}
},
"settings": {
"index": {
"number_of_shards": 1,
"number_of_replicas": 3
}
}
}
}
Then I create an index:
PUT example10022021
Then I insert a document:
POST example10022021/_doc/1
{
"BillToID":"1234",
"firstName":"Nishikant",
"PersonInfo.service_data":"random"
}
This returns 200 OK. Now, if you check the mappings again:
GET example10022021
In the output you can see the dynamic field mapping being added (this is the behavior I don't want):
"PersonInfo" : {
"dynamic" : "true",
"properties" : {
"FirstName" : {
"type" : "keyword"
},
"service_data" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
}

What you can do is to create another index template that applies to all indexes, i.e. using the * name pattern:
PUT /_index_template/common_template
{
"index_patterns": [
"*"
],
"priority": 0,
"template": {
"mappings": {
"dynamic": "strict",
...
If you want to also restrict the creation of dynamic fields inside inner objects, you can leverage dynamic templates, like this:
PUT /_index_template/common_template
{
"index_patterns": [
"*"
],
"priority": 1000,
"template": {
"settings": {},
"mappings": {
"dynamic": "strict",
"dynamic_templates": [
{
"objects": {
"match_mapping_type": "object",
"mapping": {
"dynamic": "strict"
}
}
}
],
"properties": {
"test": {
"type": "object",
"properties": {
"inner": {
"type": "integer"
}
}
}
}
}
}
}
With the above index template, you can create a document like this one:
POST test/_doc/
{
"test": {
"inner": 1
}
}
But not like this one:
POST test/_doc/
{
"test": {
"inner": 1,
"inner2": 2 <--- this will not be allowed
}
}
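The rule at work here is that an inner object inherits the dynamic setting of its parent unless it declares its own. The sketch below is a simplified Python model of that rule, not Elasticsearch's implementation; the function name `find_unmapped_fields` is hypothetical. It also shows why an explicit "dynamic": true on PersonInfo still lets new fields through.

```python
def find_unmapped_fields(mapping, doc, inherited="true", prefix=""):
    """List the field paths that a strict object would reject.

    An object without its own "dynamic" setting inherits the parent's value.
    """
    dynamic = str(mapping.get("dynamic", inherited)).lower()
    properties = mapping.get("properties", {})
    rejected = []
    for name, value in doc.items():
        path = prefix + name
        if name not in properties:
            if dynamic == "strict":
                rejected.append(path)
            continue
        if isinstance(value, dict):
            # Recurse into inner objects, passing down the effective setting.
            rejected += find_unmapped_fields(properties[name], value, dynamic, path + ".")
    return rejected


# Mapping from the answer: strict at the root, inherited by "test".
strict_everywhere = {
    "dynamic": "strict",
    "properties": {
        "test": {"type": "object", "properties": {"inner": {"type": "integer"}}}
    },
}

# Mapping shaped like the question: the inner object overrides the root setting.
with_override = {
    "dynamic": "strict",
    "properties": {
        "PersonInfo": {
            "type": "object",
            "dynamic": True,
            "properties": {"FirstName": {"type": "keyword"}},
        }
    },
}

print(find_unmapped_fields(strict_everywhere, {"test": {"inner": 1, "inner2": 2}}))
print(find_unmapped_fields(with_override, {"PersonInfo": {"service_data": "random"}}))
```

In this model the first call reports test.inner2, while the second reports nothing, because the explicit setting on PersonInfo wins over the strict root.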

How to sort on a text field with elastic search

{
"parent" : "some_id",
"type" : "support",
"metadata" : {
"account_type" : "Regular",
"subject" : "Test Subject",
"user_name" : "John Doe",
"origin" : "Origin",
"description" : "TEST",
"media" : [ ],
"ticket_number" : "XXXX",
"status" : "completed"
},
"create_time" : "2021-02-24T15:08:57.750Z",
"entity_name" : "comment"
}
This is my demo data. When I try to sort by metadata.status, for example:
GET comments-*/_search
{
"query": {
"bool": {
"must": [{
"match": {
"type": "support"
}
}]
}
},
"from": 0,
"size": 50,
"sort": [{
"metadata.status": {
"order": "desc"
}
}]
}
it says: Fielddata is disabled on text fields by default. Set fielddata=true on [metadata.status] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead.
I am not sure how to achieve this, as I am very new to ESS. Any help would be appreciated.

For string fields, you can only sort on fields of type "keyword".
If you don't set the mappings before sending documents, Elasticsearch dynamic mapping will create two fields.
In this case, "status" and "status.keyword".
So try with "metadata.status.keyword".
TL;DR
For fields on which you will not run full-text search (like status flags), it is good practice to store only the keyword version of the field.
To do that, you have to set the mappings before indexing any document.
There is a trick:
Ingest Data
POST test_predipa/_doc
{
"parent" : "some_id",
"type" : "support",
"metadata" : {
"account_type" : "Regular",
"subject" : "Test Subject",
"user_name" : "John Doe",
"origin" : "Origin",
"description" : "TEST",
"media" : [ ],
"ticket_number" : "XXXX",
"status" : "completed"
},
"create_time" : "2021-02-24T15:08:57.750Z",
"entity_name" : "comment"
}
Get the autogenerated mappings
GET test_predipa/_mapping
Create a new empty index with the same mappings and modify them as you want (in this case, remove the text type from metadata.status and keep only the keyword one).
PUT test_predipa_new
{
"mappings": {
"properties": {
"create_time": {
"type": "date"
},
"entity_name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"metadata": {
"properties": {
"account_type": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"description": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"origin": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"status": {
"type": "keyword"
},
"subject": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"ticket_number": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"user_name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
},
"parent": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"type": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
Move the data from the old index to the new empty one
POST _reindex
{
"source": {
"index": "test_predipa"
},
"dest": {
"index": "test_predipa_new"
}
}
Run the sort query
GET test_predipa_new/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"type": "support"
}
}
]
}
},
"from": 0,
"size": 50,
"sort": [
{
"metadata.status": {
"order": "desc"
}
}
]
}
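The "modify the mappings" step can also be scripted. The following sketch rewrites one field of an auto-generated mapping to a plain keyword field; the helper name `make_keyword_only` is hypothetical, and the input is assumed to be the "mappings" object returned for the index, abridged here to the relevant field.

```python
from copy import deepcopy


def make_keyword_only(mappings, field_path):
    """Return a copy of mappings where field_path is a plain keyword field."""
    updated = deepcopy(mappings)
    parts = field_path.split(".")
    node = updated
    # Walk down the object hierarchy to the parent of the target field.
    for part in parts[:-1]:
        node = node["properties"][part]
    node["properties"][parts[-1]] = {"type": "keyword"}
    return updated


# Abridged version of the auto-generated mapping shown above.
auto_generated = {
    "properties": {
        "metadata": {
            "properties": {
                "status": {
                    "type": "text",
                    "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
                }
            }
        }
    }
}

new_mappings = make_keyword_only(auto_generated, "metadata.status")
print(new_mappings["properties"]["metadata"]["properties"]["status"])
```

The result can be used as the "mappings" body when creating the new index, before running the reindex.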

Most probably, the issue is that metadata.status is of text type, which is not sortable (see docs). You can sort on a string field if it is of keyword type.
Please check the mapping of your index. Most probably, your index has default mapping (see docs), and a keyword sub-field is automatically assigned to every field with a string value.
TL;DR: try to run this query
GET comments-*/_search
{
"query": {
"bool": {
"must": [{
"match": {
"type": "support"
}
}]
}
},
"from": 0,
"size": 50,
"sort": [{
"metadata.status.keyword": {
"order": "desc"
}
}]
}
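Checking the mapping can be automated with a small helper. This is a minimal sketch that only distinguishes text fields from other leaf types and assumes the "mappings" object of a single index as input; the function name `sortable_field` is hypothetical.

```python
def sortable_field(mappings, field_path):
    """Return a field name that can be sorted on, or None if there is none."""
    node = mappings
    for part in field_path.split("."):
        node = node.get("properties", {}).get(part)
        if node is None:
            return None  # the field is not mapped
    field_type = node.get("type")
    if field_type in (None, "object", "nested"):
        return None  # objects themselves are not sortable
    if field_type != "text":
        return field_path
    # For text fields, look for a keyword sub-field such as "status.keyword".
    for sub_name, sub_mapping in node.get("fields", {}).items():
        if sub_mapping.get("type") == "keyword":
            return f"{field_path}.{sub_name}"
    return None


# Abridged default mapping, as dynamic mapping would generate it.
mappings = {
    "properties": {
        "create_time": {"type": "date"},
        "metadata": {
            "properties": {
                "status": {
                    "type": "text",
                    "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
                }
            }
        },
    }
}

print(sortable_field(mappings, "metadata.status"))
```

A None result means the field would need a mapping change (and a reindex) before it can be used in a sort.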

ElasticSearch Mapping Issue - Nested to Non-Nested

I am creating a mapping for data generated by a computer vision application. However, I am getting an error when I test pushing an example data message to ElasticSearch. I have read tons of forums where others have had this issue. Some have resolved their issue but I have tried everything I know to try. I actually think there may be a simple resolution but I am relatively new to Elasticsearch.
The index and mapping are created successfully using:
PUT vision_events
{
"settings" : {
"number_of_shards" : 5
},
"mappings" : {
"properties": {
"camera_id": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"hit_counts": {
"type": "long"
},
"id": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"intersection": {
"type": "boolean"
},
"label": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"locations": {
"type" : "nested",
"properties": {
"coords" : {
"type" : "float"
},
"location": {
"type": "text"
},
"street_segment": {
"type": "text"
},
"timestamp": {
"type": "date"
}
}
},
"pole_id": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"timestamp": {
"type": "date"
}
}
}
}
Once completed, I move on to validating the mapping is correct. I push the following example data:
POST /vision_events/1?pretty=true
{
"pole_id": "mlk-central-2",
"camera_id": "mlk-central-cam-2",
"intersection": true,
"id": "644d1c06-4c60-4ed8-93b4-1aa79b87a622",
"label": "car",
"timestamp": 1586838108683,
"locations": [
{
"timestamp": 1586838109448,
"coords": 1626.3220383482665,
"street_segment": "None"
},
{
"timestamp": 1586838109832,
"coords": 1623.3129222859882,
"street_segment": "None"
}
],
"hit_counts": 2
}
This produces the following error:
{
"error" : {
"root_cause" : [
{
"type" : "illegal_argument_exception",
"reason" : "object mapping [locations] can't be changed from nested to non-nested"
}
],
"type" : "illegal_argument_exception",
"reason" : "object mapping [locations] can't be changed from nested to non-nested"
},
"status" : 400
}
The locations field is a list of "objects" which contain the fields: coords, location, street_segment and timestamp. Messages have varying length of locations. Any help would be greatly appreciated.

Create the index with the unchanged mapping:
PUT vision_events
{"settings":{"number_of_shards":5},"mappings":{"properties":{"camera_id":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}},"hit_counts":{"type":"long"},"id":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}},"intersection":{"type":"boolean"},"label":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}},"locations":{"type":"nested","properties":{"coords":{"type":"float"},"location":{"type":"text"},"street_segment":{"type":"text"},"timestamp":{"type":"date"}}},"pole_id":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}},"timestamp":{"type":"date"}}}}
Then index a single document using the request format from the docs, with _doc in the path (the original request, POST /vision_events/1, omitted it):
POST /vision_events/_doc/1?pretty=true
{
"pole_id": "mlk-central-2",
"camera_id": "mlk-central-cam-2",
"intersection": true,
"id": "644d1c06-4c60-4ed8-93b4-1aa79b87a622",
"label": "car",
"timestamp": 1586838108683,
"locations": [
{
"timestamp": 1586838109448,
"coords": 1626.3220383482665,
"street_segment": "None"
},
{
"timestamp": 1586838109832,
"coords": 1623.3129222859882,
"street_segment": "None"
}
],
"hit_counts": 2
}
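To avoid this kind of path mistake in client code, the document URL can be built in one place. This is a minimal sketch; the helper name `document_path` is hypothetical, and it only builds the path without sending any request.

```python
from urllib.parse import quote


def document_path(index, doc_id=None):
    """Build the path for indexing a document, always including _doc."""
    base = f"/{quote(index, safe='')}/_doc"
    if doc_id is None:
        return base  # let Elasticsearch generate the ID
    return f"{base}/{quote(str(doc_id), safe='')}"


print(document_path("vision_events", 1))
print(document_path("vision_events"))
```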

Elastic Search mapper_parsing_exception error

I have created an index in Elasticsearch named test. The index mapping is as follows:
{
"title": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
}
}
},
"url": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
}
}
}
}
After creating the index, I added the following document to it:
{
"title": "demo",
"url": {
"name": "tiger",
"age": 10
}
}
But I am getting the following error:
{"mapper_parsing_exception","reason":"failed to parse field [url] of type [text]"}
Can anyone help me with this?

If your documents look like this:
{
"title": "demo",
"url": {
"name": "tiger",
"age": 10
}
}
Then your mapping needs to look like this, i.e. url is an object with the name and age fields:
{
"title": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
}
}
},
"url": {
"properties": {
"name": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
}
}
},
"age": {
"type": "integer"
}
}
}
}
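A quick way to spot this kind of mismatch before indexing is to compare a sample document with the mapping. The sketch below is a simplified check under stated assumptions: the function name `find_object_conflicts` is hypothetical, and it ignores field types that legitimately accept JSON objects (geo_point, for example).

```python
def find_object_conflicts(properties, doc, prefix=""):
    """Return paths where the document holds an object but the mapping
    declares a leaf type such as text."""
    conflicts = []
    for name, value in doc.items():
        field = properties.get(name)
        if field is None or not isinstance(value, dict):
            continue
        path = prefix + name
        if "properties" in field:
            # Mapped as an object: check its inner fields as well.
            conflicts += find_object_conflicts(field["properties"], value, path + ".")
        elif field.get("type") not in ("object", "nested"):
            conflicts.append(path)
    return conflicts


document = {"title": "demo", "url": {"name": "tiger", "age": 10}}

# Mapping from the question: url is declared as text.
original = {
    "title": {"type": "text", "fields": {"raw": {"type": "keyword"}}},
    "url": {"type": "text", "fields": {"raw": {"type": "keyword"}}},
}

# Mapping from this answer: url is an object with name and age.
corrected = {
    "title": {"type": "text", "fields": {"raw": {"type": "keyword"}}},
    "url": {
        "properties": {
            "name": {"type": "text", "fields": {"raw": {"type": "keyword"}}},
            "age": {"type": "integer"},
        }
    },
}

print(find_object_conflicts(original, document))
print(find_object_conflicts(corrected, document))
```

With the original mapping the check flags url; with the corrected mapping it reports no conflict.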

Hi, you need to create the mapping like this:
PUT test
{
"settings" : {
"number_of_shards" : 1
},
"mapping": {
"title": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
}
}
},
"url": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
}
}
}
}
}
And the document is:
PUT test/doc/1
{
"title": "demo",
"url": {
"name": "tiger",
"age": 10
}
}
GET test/doc/1
And the result is:
{
"_index" : "test",
"_type" : "doc",
"_id" : "1",
"_version" : 1,
"_seq_no" : 0,
"_primary_term" : 1,
"found" : true,
"_source" : {
"title" : "demo",
"url" : {
"name" : "tiger",
"age" : 10
}
}
}

One reason for this, if you're on Elastic Cloud, is that data types are assigned to fields the first time they appear in an index, and Elasticsearch throws this error if you later send a log with a different type in that field.
For me, the log field was a string in the first log sent to the index but an object in the second, so the second one got rejected.
Good explanation here: https://discuss.elastic.co/t/getting-illegal-state-exception-error-while-pushing-logs-to-elasticsearch/290029

How to build a parent/child mapping for Elasticsearch?

I tried to use the following mapping to index my data:
{
"mappings": {
"chow-demo": {
"properties": {
"#fields": {
"dynamic": "true",
"properties": {
"asgid": {
"type": "string",
"analyzer": "keyword"
},
"asid": {
"type": "long"
},
"astid": {
"type": "long"
},
"clfg": {
"analyzer": "keyword",
"type": "string"
},
"httpcode": {
"type": "long"
},
"oid": {
"type": "string"
},
"onid": {
"type": "long"
},
"ptrnr": {
"analyzer": "keyword",
"type": "string"
},
"pguid": {
"analyzer": "keyword",
"type": "string"
},
"ptid": {
"type": "long"
},
"sid": {
"type": "long"
},
"src_url": {
"analyzer": "keyword",
"type": "string"
},
"title": {
"analyzer": "keyword",
"type": "string"
},
"ts": {
"type": "long"
}
}
},
"#timestamp": {
"format": "dateOptionalTime",
"type": "date"
},
"#message": {
"type": "string"
},
"#source": {
"type": "string"
},
"#type": {
"analyzer": "keyword",
"type": "string"
},
"#tags": {
"type": "string"
},
"#source_host": {
"type": "string"
},
"#source_path": {
"type": "string"
}
}
},
"chow-clfg": {
"_parent": {
"type": "chow-demo"
},
"dynamic": "true",
"properties": {
"_ttl": {
"enabled": true,
"default": "1h"
},
"clfg": {
"analyzer": "keyword",
"type": "string"
},
"#timestamp": {
"format": "dateOptionalTime",
"type": "date"
},
"count": {
"type": "long"
}
}
}
}
}
I tried to populate the parent type "chow-demo" without populating the child type "chow-clfg", and the documents were refused. (No documents were indexed into Elasticsearch.)
When I take out the child mapping for "chow-clfg", indexing works as usual. Hence I have the following questions:
Is my mapping structure wrong?
Must the parent and child be indexed together at the same time before the data can be successfully indexed?
I really need help with this question for my project to progress! Thanks!

Yes, your mapping is wrong. The _ttl element should be one level higher in the chow-clfg type. In other words _ttl should be on the same level as _parent. However, I am not quite sure how this problem can affect your ability to index.
Parents and children don't have to be indexed together.
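The fix for the _ttl placement can be sketched as a small transformation on the type mapping. This is only an illustration: the helper name `lift_ttl` is hypothetical, the mapping is abridged from the question, and _ttl and _parent are meta-fields of older Elasticsearch releases that recent versions no longer support.

```python
from copy import deepcopy


def lift_ttl(type_mapping):
    """Move a misplaced _ttl from "properties" up to the type level."""
    fixed = deepcopy(type_mapping)
    ttl = fixed.get("properties", {}).pop("_ttl", None)
    if ttl is not None:
        fixed["_ttl"] = ttl  # now a sibling of _parent
    return fixed


# Abridged "chow-clfg" type mapping from the question.
chow_clfg = {
    "_parent": {"type": "chow-demo"},
    "dynamic": "true",
    "properties": {
        "_ttl": {"enabled": True, "default": "1h"},
        "count": {"type": "long"},
    },
}

fixed = lift_ttl(chow_clfg)
print(sorted(fixed))
```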
