Elasticsearch appends object fields on update instead of overwriting

I have an object in Elasticsearch which may contain different fields. In my app this object is an Enum, so it can't actually contain more than one field at the same time. But when I do an update in Elasticsearch, it appends the new field instead of overwriting the whole object.
For example, the document may be public or accessible only to a group of users:
PUT _index_template/test_template
{
  "index_patterns": [
    "test*"
  ],
  "template": {
    "settings": {
      "number_of_shards": 1
    },
    "mappings": {
      "_source": {
        "enabled": true
      },
      "properties": {
        "id": {
          "type": "keyword"
        },
        "users": {
          "type": "object",
          "properties": {
            "permitted": {
              "type": "keyword"
            },
            "public": {
              "type": "boolean"
            }
          }
        }
      }
    },
    "aliases": {
      "test-alias": {}
    }
  }
}
POST test_doc/_doc/1
{
  "id": "1",
  "users": {
    "permitted": [
      "1", "2"
    ]
  }
}
POST _bulk
{"update":{"_index":"test_doc","_type":"_doc","_id":1}}
{"doc":{"id":"1","users":{"public": true}},"doc_as_upsert":true}
GET test-alias/_search
I am expecting this result:
{
  "id": "1",
  "users": {
    "public": true
  }
}
But the actual result is:
{
  "id": "1",
  "users": {
    "permitted": [
      "1",
      "2"
    ],
    "public": true
  }
}
At the same time, it overwrites fields with the same name perfectly (I can change the permitted array or flip the public field to false). How do I disable this appending of object fields?

You need to change the action in the bulk request from update to index. The correct request would be:
{"index":{"_index":"test_doc","_id":1}}
{"id":"1","users":{"public": true}}
Note that with the index action the body line is the document itself, without the "doc" wrapper. Refer to the official Elasticsearch docs for the bulk actions and what they do in detail. In short, update partially updates the document (merging the new fields into the existing source), while the index action indexes the specified document as-is; if the document exists, it replaces it and increments the version.
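Alternatively, if you want to keep partial-update semantics for the other top-level fields and only replace the users object wholesale, a scripted update is another option. A minimal sketch, using the index and field names from the question:
POST test_doc/_update/1
{
  "script": {
    "source": "ctx._source.users = params.users",
    "lang": "painless",
    "params": {
      "users": {
        "public": true
      }
    }
  }
}
Unlike a partial doc, the script assigns the whole users object, so stale fields like permitted are dropped.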

Related

Return only top level fields in elasticsearch query?

I have a document that has nested fields. Example:
"mappings": {
"blogpost": {
"properties": {
"title": { "type": "text" },
"body": { "type": "text" },
"comments": {
"type": "nested",
"properties": {
"name": { "type": "text" },
"comment": { "type": "text" },
"age": { "type": "short" },
"stars": { "type": "short" },
"date": { "type": "date" }
}
}
}
}
}
}
Can the query be modified so that the response only contains non-nested fields?
In this example, the response would only contain body and title.
Using _source filtering you can include or exclude fields:
GET /blogpost/_search
{
  "_source": {
    "excludes": ["comments"]
  }
}
But you have to explicitly put the field names inside excludes. I'm searching for a way to exclude all nested fields without knowing their field names.
You can achieve that, but only in a static way: you list the field name(s) under the excludes keyword, like:
GET your_index/_search
{
  "_source": {
    "excludes": "comments"
  },
  "query": {
    "match_all": {}
  }
}
Note that excludes can take an array of strings, not just a single string.
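There is no built-in way to say "exclude every nested field", but you can approximate it dynamically on the client side: fetch the mapping first, collect the names of all properties whose type is nested, and pass them as the excludes array of the search. A minimal sketch of the first step (the filtering itself happens in your client code):
GET blogpost/_mapping
In the response, every property with "type": "nested" (here only comments) is a candidate for the excludes list.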

Parent-Child Relation in Elasticsearch 7.5

I am new to Elasticsearch and currently trying to understand how ES maintains a "Parent-Child" relationship. I started with the following article:
https://www.elastic.co/blog/managing-relations-inside-elasticsearch
But the article is based on an old version of ES, and I am currently using ES 7.5, whose documentation states that:
The _parent field has been removed in favour of the join field.
Now I am currently following this article:
https://www.elastic.co/guide/en/elasticsearch/reference/7.5/parent-join.html
However, I am not able to get the desired result.
I have a scenario with two indices, "Person" and "Home". Each "Person" can have multiple "Home" documents, which is basically a one-to-many relation. The problem is that when I query to fetch all homes whose parent is person "XYZ", the answer is null.
Below are my index structures and search query:
Person Index:
Request URL: http://hostname/person
{
  "mappings": {
    "properties": {
      "name": {
        "type": "text"
      },
      "person_home": {
        "type": "join",
        "relations": {
          "person": "home"
        }
      }
    }
  }
}
Home Index:
Request URL: http://hostname/home
{
  "mappings": {
    "properties": {
      "state": {
        "type": "text"
      },
      "person_home": {
        "type": "join",
        "relations": {
          "person": "home"
        }
      }
    }
  }
}
Adding data to the person index
Request URL: http://hostname/person/_doc/1
{
  "name": "shujaat",
  "person_home": {
    "name": "person"
  }
}
Adding data to the home index
Request URL: http://hostname/home/_doc/2?routing=1&refresh
{
  "state": "ontario",
  "person_home": {
    "name": "home",
    "parent": "1"
  }
}
Query to fetch data (all the records whose parent is person id "1"):
Request URL: http://hostname/person/_search
{
  "query": {
    "has_parent": {
      "parent_type": "person",
      "query": {
        "match": {
          "name": "shujaat"
        }
      }
    }
  }
}
OR
{
  "query": {
    "has_parent": {
      "parent_type": "person",
      "query": {
        "match": {
          "_id": "1"
        }
      }
    }
  }
}
Response:
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 0,
      "relation": "eq"
    },
    "max_score": null,
    "hits": []
  }
}
I am unable to understand what I am missing here or what is wrong with the above query, as it is not returning any data.
You should put the parent and child documents in the same index:
The join datatype is a special field that creates parent/child
relation within documents of the same index.
So the mapping would look like the following:
PUT http://hostname/person_home
{
  "mappings": {
    "properties": {
      "name": {
        "type": "text"
      },
      "state": {
        "type": "text"
      },
      "person_home": {
        "type": "join",
        "relations": {
          "person": "home"
        }
      }
    }
  }
}
Notice that it has both fields from your original person and home indexes.
The rest of your code should work just fine. Try inserting the person and home documents into the same index person_home and use the queries as you posted in the question.
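For example, once both document types live in person_home, fetching all homes of person 1 can also be done with the parent_id query, a convenient shortcut for this exact case. A minimal sketch against the merged index:
GET person_home/_search
{
  "query": {
    "parent_id": {
      "type": "home",
      "id": "1"
    }
  }
}
Here type is the name of the child relation (home) and id is the parent document's ID.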
What if the person and home objects have overlapping field names? Let's say both object types have a name field, but we want to index and query them separately. In this case we can come up with a mapping like this:
PUT http://hostname/person_home
{
  "mappings": {
    "properties": {
      "person": {
        "properties": {
          "name": {
            "type": "text"
          }
        }
      },
      "home": {
        "properties": {
          "name": {
            "type": "keyword"
          },
          "state": {
            "type": "text"
          }
        }
      },
      "person_home": {
        "type": "join",
        "relations": {
          "person": "home"
        }
      }
    }
  }
}
Now, we should change the structure of the objects themselves:
PUT http://hostname/person_home/_doc/1
{
  "person": {
    "name": "shujaat"
  },
  "person_home": {
    "name": "person"
  }
}
PUT http://hostname/person_home/_doc/2?routing=1&refresh
{
  "home": {
    "name": "primary",
    "state": "ontario"
  },
  "person_home": {
    "name": "home",
    "parent": "1"
  }
}
If you have to migrate old data from the two old indexes into the new merged one, the Reindex API may be of use.
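A minimal sketch of such a migration (restructuring fields, e.g. nesting name under person, would still need a script in the reindex request or a separate client-side step):
POST _reindex
{
  "source": {
    "index": "person"
  },
  "dest": {
    "index": "person_home"
  }
}
Run it once per old index, then verify the document counts before deleting the old indexes.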

Elasticsearch Mapping and Field Type

Hello Elasticsearch Gurus out there.
Given the following index and doctype:
localhost:9200/myindex/mydoctype
I currently have this index definition:
{
  "myindex": {
    "aliases": {},
    "mappings": {
      "mydoctype": {
        "properties": {
          "theNumber": {
            "type": "integer"
          },
          "theString": {
            "type": "string"
          }
        }
      }
    },
    "settings": {
      "index": {
        "creation_date": "1487158714808",
        "number_of_shards": "5",
        "number_of_replicas": "1",
        "version": {
          "created": "1070599"
        },
        "uuid": "cm2OtivhTO-RjuZPeHvL1w"
      }
    },
    "warmers": {}
  }
}
And I was able to add this document:
{
  "theNumber": 0,
  "theString": "zero"
}
But what I wasn't expecting, is that, I am also able to add this document:
{
  "theNumber": 3.1418,
  "theString": 3,
  "fiefoe": "fiefoe"
}
... where the field types don't match, and a new field/column is introduced as well.
I wasn't expecting this kind of behaviour because of the Mappings I have defined for my index.
Does this have something to do with Elasticsearch being schema-less?
Is it possible to set Elasticsearch to accept only those types and fields for every document added for this index?
Is this how Elasticsearch mapping works in the first place? (maybe I didn't know hehehe)
Thanks =)
Elasticsearch uses dynamic mapping, so when it finds a field that doesn't exist in the mapping, it tries to index it by guessing its type.
What you can do is disable this behavior by setting dynamic: false in the mapping on the root object. In this case Elasticsearch will ignore the unmapped field.
{
  "myindex": {
    "aliases": {},
    "mappings": {
      "mydoctype": {
        "dynamic": false, <-----
        "properties": {
          "theNumber": {
            "type": "integer"
          },
          "theString": {
            "type": "string"
          }
        }
      }
    },
    "settings": {
      "index": {
        "creation_date": "1487158714808",
        "number_of_shards": "5",
        "number_of_replicas": "1",
        "version": {
          "created": "1070599"
        },
        "uuid": "cm2OtivhTO-RjuZPeHvL1w"
      }
    },
    "warmers": {}
  }
}
Alternatively, you can use dynamic: strict if you want an exception thrown whenever an unmapped field is about to be indexed.
The documentation for this is here.
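Note the behavioral difference: with dynamic: false the extra field is silently accepted and kept in _source, but it is never indexed, so searching on it finds nothing. A quick check against the index from the question:
GET /myindex/mydoctype/_search
{
  "query": {
    "match": {
      "fiefoe": "fiefoe"
    }
  }
}
This should return zero hits, while dynamic: strict would have rejected the document at index time.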
Kindly allow me to answer my own question...
This setting worked for me in this case:
API URL Request: localhost:9200/myindex/_mapping/mydoctype
HTTP Body:
{
  "mydoctype": {
    "dynamic": "strict",
    "properties": {
      "theNumber": { "type": "integer" },
      "theString": { "type": "string" },
      "stash": {
        "type": "object",
        "dynamic": false
      }
    }
  }
}
Then I tried adding this object:
{
  "theNumber": 5.55555,
  "theString": 5,
  "fiefoe": "fiefoe"
}
I got this response:
{
  "error": "StrictDynamicMappingException[mapping set to strict, dynamic introduction of [fiefoe] within [mydoctype] is not allowed]",
  "status": 400
}
Thanks =)!
P.S.
Reference:
https://www.elastic.co/guide/en/elasticsearch/guide/1.x/dynamic-mapping.html

Elasticsearch Multi-Field With 'Raw' Value Not Being Created

I'm attempting to add an un-analyzed version of an analyzed field, as a 'raw' multi-field, as per the Elasticsearch documentation:
https://www.elastic.co/guide/en/elasticsearch/reference/2.4/multi-fields.html
This seems to be a common, well-supported pattern.
I've created the following index/field:
{
  "person": {
    "aliases": {},
    "mappings": {
      "employee": {
        "properties": {
          "userName": {
            "type": "string",
            "analyzer": "autocomplete",
            "fields": {
              "raw": {
                "type": "string",
                "index": "not_analyzed"
              }
            }
          }
        }
      }
    }
  }
}
If I query the index directly, i.e. GET /person, I see the mapping as I've posted above, so I'm confident that there wasn't a syntax error, etc.
However, when we're pushing data into the index, a userName.raw field is not being created.
{
  "_index": "person",
  "_type": "employee",
  "_id": "2",
  "_version": 1,
  "found": true,
  "_source": {
    "username": "Test Value"
  }
}
Anyone see something I'm missing?
Thanks!
EDIT:
This was a novice mistake when creating my index.
PUT person
{
  "person": {
    "aliases": {},
    "mappings": {
      "employee": {
        "properties": {
          "email": {
Notice the person key being PUT into the 'person' index; this was creating a nested person object.
The correct syntax is to remove the extra "person" key:
PUT person
{
  "aliases": {},
  "mappings": {
    "employee": {
      "properties": {
        "email": {
Please see Linoy.M.K's answer, as he is correct.
The 'raw' field will not appear when retrieving a record by ID. It's only useful as part of a query.
Adding multiple analyzers will not modify your source document; the source will always contain username only, never username.raw.
The added analyzers are useful when searching: you can now query both username and username.raw to get different behavior, like below.
GET /person/employee/_search
{
  "query": {
    "match": {
      "username": "Te"
    }
  }
}
GET /person/employee/_search
{
  "query": {
    "match": {
      "username.raw": "Test Value"
    }
  }
}
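The raw sub-field is also the one you would typically sort or aggregate on, since the analyzed field would bucket by individual tokens rather than the full value. A minimal sketch:
GET /person/employee/_search
{
  "size": 0,
  "aggs": {
    "usernames": {
      "terms": {
        "field": "username.raw"
      }
    }
  }
}
Each bucket key here is the complete, un-analyzed username.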

Saving variable types under a single key in elasticsearch?

I have a bunch of documents coming in from fluentd and I'm saving them to Elasticsearch with fluent-plugin-elasticsearch.
Some of those documents have a string under the key name and some have an object.
Example
{
  "name": "foo"
}
and
{
  "name": {
    "en": "foo",
    "fi": "bar"
  }
}
These documents are the same type in terms of my application, and they are saved to the same Elasticsearch index.
But Elasticsearch has an issue with this. When the second document is saved, it throws this error:
org.elasticsearch.index.mapper.MapperParsingException: failed to parse [name]
This seems to happen because Elasticsearch has set the key name to be of type string. I can see this using curl http://localhost:9200/fluentd-[tagname]/_mapping, and it obviously doesn't like it when I try to save an object to it afterwards.
So is there any way to work around this in Elasticsearch?
I cannot control the incoming documents, and there are multiple keys with variable types, not just name. So I cannot make a single hack for that key only.
This is pretty annoying since those documents are completely left out of Elasticsearch and sent to /dev/null.
If this is completely impossible, is it possible to at least save those documents to a file or something so I wouldn't lose them?
Here's my template for the fluentd-* indices:
{
  "fluentd_template": {
    "template": "fluentd-*",
    "settings": {
      "number_of_shards": 1,
      "number_of_replicas": 0,
      "index": {
        "query": {
          "default_field": "msg"
        },
        "analysis": {
          "analyzer": {
            "default": {
              "type": "keyword"
            }
          }
        }
      }
    },
    "mappings": {
      "_default_": {
        "_all": {
          "enabled": false
        },
        "_source": {
          "compress": true
        },
        "properties": {
          "#timestamp": {
            "type": "date",
            "index": "not_analyzed"
          }
        }
      }
    }
  }
}
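One possible Elasticsearch-side workaround, assuming you never need to search on the variable-type keys, is to map them as non-indexed objects: with "enabled": false, Elasticsearch skips parsing the field's contents entirely, keeping them in _source without indexing them. Whether both the string and the object variants are accepted should be verified against your ES version. A sketch of the relevant addition to the template's _default_ mapping:
"mappings": {
  "_default_": {
    "properties": {
      "name": {
        "type": "object",
        "enabled": false
      }
    }
  }
}
You would need one such entry per variable-type key, which still requires knowing their names up front.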
