Update restrictions on Elasticsearch Object type field - elasticsearch

I have to store documents with a single field contains a single Json object. this object has a variable depth and variable schema.
I config a mapping like this:
"mappings": {
"properties": {
"#timestamp": {
"type": "date"
},
"message": {
"type": "object"
}
}
}
It works fine and Elasticsearch creates and updates mapping with documents that received.
The problem is that after some updates in mapping, it rejects new documents and do not update mapping anymore. At this time I change the indices and mapping update occurred for that indies. I'm looking forward to know the right solution.
for example the first document is:
{
personalInfo:{
fistName: "tom"
}
moviesStatistics: {
count: 100
}
}
the second document that will update Elasticsearch mapping is:
{
personalInfo:{
fistName: "tom",
lastName: "hanks"
},
moviesStatistics: {
count: 100
},
education: {
title: "a title..."
}
}
Elasticsearch creates mapping with doc1 and updates it with doc2, doc3, ... until a number of documents received. After that it starts to reject every document that is not matched to the last mapping fields.

After all I found the solution in the home page of Elasticsearch https://www.elastic.co/guide/en/elasticsearch/reference/7.13//dynamic-field-mapping.html
We can use Dynamic mapping and simply use this mapping:
"mappings": {
"dynamic": "true"
}
You should also change some default restrictions that mentioned here:
https://www.elastic.co/guide/en/elasticsearch/reference/7.13//mapping-settings-limit.html

Related

How to update data type of a field in elasticsearch

I am publishing a data to elasticsearch using fluentd. It has a field Data.CPU which is currently set to string. Index name is health_gateway
I have made some changes in python code which is generating the data so now this field Data.CPU has now become integer. But still elasticsearch is showing it as string. How can I update it data type.
I tried running below commands in kibana dev tools:
PUT health_gateway/doc/_mapping
{
"doc" : {
"properties" : {
"Data.CPU" : {"type" : "integer"}
}
}
}
But it gave me below error:
{
"error" : {
"root_cause" : [
{
"type" : "illegal_argument_exception",
"reason" : "Types cannot be provided in put mapping requests, unless the include_type_name parameter is set to true."
}
],
"type" : "illegal_argument_exception",
"reason" : "Types cannot be provided in put mapping requests, unless the include_type_name parameter is set to true."
},
"status" : 400
}
There is also this document which says using mutate we can convert the data type but I am not able to understand it properly.
I do not want to delete the index and recreate as I have created a visualization based on this index and after deleting it will also be deleted. Can anyone please help in this.
The short answer is that you can't change the mapping of a field that already exists in a given index, as explained in the official docs.
The specific error you got is because you included /doc/ in your request path (you probably wanted /<index>/_mapping), but fixing this alone won't be sufficient.
Finally, I'm not sure you really have a dot in the field name there. Last I heard it wasn't possible to use dots in field names.
Nevertheless, there are several ways forward in your situation... here are a couple of them:
Use a scripted field
You can add a scripted field to the Kibana index-pattern. It's quick to implement, but has major performance implications. You can read more about them on the Elastic blog here (especially under the heading "Match a number and return that match").
Add a new multi-field
You could add a new multifield. The example below assumes that CPU is a nested field under Data, rather than really being called Data.CPU with a literal .:
PUT health_gateway/_mapping
{
"doc": {
"properties": {
"Data": {
"properties": {
"CPU": {
"type": "keyword",
"fields": {
"int": {
"type": "short"
}
}
}
}
}
}
}
}
Reindex your data within ES
Use the Reindex API. Be sure to set the correct mapping on the target index.
Delete and reindex everything from source
If you are able to regenerate the data from source in a timely manner, without disrupting users, you can simply delete the index and reingest all your data with an updated mapping.
You can update the mapping, by indexing the same field in multiple ways i.e by using multi fields.
Using the below mapping, Data.CPU.raw will be of integer type
{
"mappings": {
"properties": {
"Data": {
"properties": {
"CPU": {
"type": "string",
"fields": {
"raw": {
"type": "integer"
}
}
}
}
}
}
}
}
OR you can create a new index with correct index mapping, and reindex the data in it using the reindex API

How to change the field type in an ElasticSearch Index?

I have index_A, which includes a number field "foo".
I copy the mapping for index_A, and make a dev tools call PUT /index_B with the field foo changed to text, so the mapping portion of that is:
"foo": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
I then reindex index_A to index_B with:
POST _reindex
{
"source": {
"index": "index_A"
},
"dest": {
"index": "index_B"
}
}
When I go to view any document for index_B, the entry for the "foo" field is still a number. (I was expecting for example: "foo": 30 to become "foo" : "30" in the new document's source).
As much as I've read on Mappings and reindexing, I'm still at a loss on how to accomplish this. What specifically do I need to run in order to get this new index with "foo" as a text field, and all number entries for foo in the original index changed to text entries in the new index?
There's a distinction between how a field is stored vs indexed in ES. What you see inside of _source is stored and it's the "original" document that you've ingested. But there's no explicit casting based on the mapping type -- ES stores what it receives but then proceeds to index it as defined in the mapping.
In order to verify how a field was indexed, you can inspect the script stack returned in:
GET index_b/_search
{
"script_fields": {
"debugging_foo": {
"script": {
"source": "Debug.explain(doc['foo'])"
}
}
}
}
as opposed to how a field was stored:
GET index_b/_search
{
"script_fields": {
"debugging_foo": {
"script": {
"source": "Debug.explain(params._source['foo'])"
}
}
}
}
So in other words, rest assured that foo was indeed indexed as text + keyword.
If you'd like to explicitly cast a field value into a different data type in the _source, you can apply a script along the lines of:
POST _reindex
{
"source": {
"index": "index_a"
},
"dest": {
"index": "index_b"
},
"script": {
"source": "ctx._source.foo = '' + ctx._source.foo"
}
}
I'm not overly familiar with java but I think ... = ctx._source.foo.toString() would work too.
FYI there's a coerce mapping parameter which sounds like it could be of use here but it only works the other way around -- casting/parsing from strings to numerical types etc.
FYI#2 There's a pipeline processor called convert that does exactly what I did in the above script, and more. (A pipeline is a pre-processor that runs before the fields are indexed in ES.) The good thing about pipelines is that they can be run as part of the _reindex process too.

Add default value on a field while modifying existing elasticsearch mapping

Let's say I've an elasticsearch index with around 10M documents on it. Now I need to add a new filed with a default value e.g is_hotel_type=0 for each and every ES document. Later I'll update as per my requirments.
To do that I've modified myindex with a PUT request like below-
PUT myindex
{
"mappings": {
"rp": {
"properties": {
"is_hotel_type": {
"type": "integer"
}
}
}
}
}
Then run a painless script query with POST to update all the existing documents with the value is_hotel_type=0
POST myindex/_update_by_query
{
"query": {
"match_all": {}
},
"script" : "ctx._source.is_hotel_type = 0;"
}
But this process is very time consuming for a large index with 10M documents. Usually we can set default values on SQL while creating new columns. So my question-
Is there any way in Elasticsearch so I can add a new field with a default value.I've tried below PUT request with null_value but it doesn't work for.
PUT myindex/_mapping/rp
{
"properties": {
"is_hotel_type": {
"type": "integer",
"null_value" : 0
}
}
}
I just want to know is there any other way to do that without the script query?

ElasticSearch append non matched docs at the end of the search result

Is there any way to append non matched docs at the end of the search result?
I have been working on a project where we need to search docs by geolocation data but some docs don't have the geolocation data available. As a result of that these docs not returning in the search result.
Is there any way to append non matched docs at the end of the search result?
Example mapping:
PUT /my_locations
{
"mappings": {
"_doc": {
"properties": {
"address": {
"properties": {
"city": {
"type": "text"
},
"location": {
"type": "geo_point"
}
}
}
}
}
}
}
Data with geo location:
PUT /my_locations/_doc/1
{
"address" : {
"city: "XYZ",
"location" : {
"lat" : 40.12,
"lon" : -71.34
}
}
}
Data without geo location:
PUT /my_locations/_doc/2
{
"address" : {
"city: "ABC"
}
}
Is there any way to perform geo distance query which will select the docs with geolocation data plus append the non geo docs at the end of the result?
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-geo-distance-query.html#query-dsl-geo-distance-query
You have two separate queries
Get documents within the area
Get other documents
To get both of these in one search, would mean all of the documents appear in one result, and share ranking. It would be difficult to create a relevancy model which gets first 9 documents with address, and one without.
But you can just run two queries at once, one for say, the first 9 documents with location, and one for without any.
Example:
GET my_locations/_msearch
{}
{"size":9,"query":{"geo_distance":{"distance":"200km","pin.location":{"lat":40,"lon":-70}}}}
{}
{"size":1,"query":{"bool":{"must_not":[{"exists":{"field":"pin.location"}}]}}}

How to use mapping in elasticsearch?

After treating logs with logstash, All my fields have the same type 'STRING so i want to use mapping in elasticsearch to change some type like ip, port ect.. whereas i don't know how to do it, i'm a super beginner in ElasticSearch..
Any help ?
The first thing to do would be to install the Marvel plugin in Elasticsearch. It allows you to work with the Elasticsearch REST API very easily - to index documents, modify mappings, etc.
Go to the Elasticsearch folder and run:
bin/plugin -i elasticsearch/marvel/latest
Then go to http://localhost:9200/_plugin/marvel/sense/index.html to access Marvel Sense from which you can send commands. Marvel itself provides you with a dashboard about Elasticsearch indices, performance stats, etc.: http://localhost:9200/_plugin/marvel/
In Sense, you can run:
GET /_cat/indices
to learn what indices exist in your Elasticsearch instance.
Let's say there is an index called logstash.
You can check its mapping by running:
GET /logstash/_mapping
Elasticsearch will return a JSON document that describes the mapping of the index. It could be something like:
{
"logstash": {
"mappings": {
"doc": {
"properties": {
"Foo": {
"properties": {
"x": {
"type": "String"
},
"y": {
"type": "String"
}
}
}
}
}
}
}
}
...in this case doc is the document type (collection) in which you index documents. In Sense, you could index a document as follows:
PUT logstash/doc/1
{
"Foo": {
"x":"500",
"y":"200"
}
}
... that's a command to index the JSON object under the id 1.
Once a document field such as Foo.x has a type String, it cannot be changed to a number. You have to set the mapping first and then reindex.
First delete the index:
DELETE logstash
Then create the index and set the mapping as follows:
PUT logstash
PUT logstash/doc/_mapping
{
"doc": {
"properties": {
"Foo": {
"properties": {
"x": {
"type": "long"
},
"y": {
"type": "long"
}
}
}
}
}
}
Now, even if you index a doc with the properties as JSON strings, Elastisearch will convert them to numbers:
PUT logstash/doc/1
{
"Foo": {
"x":"500",
"y":"200"
}
}
Search for the new doc:
GET logstash/_search
Notice that the returned document, in the _source field, looks exactly the way you sent it to Elasticsearch - that's on purpose, Elasticsearch always preserves the original doc this way. The properties are indexed as numbers though. You can run a range query to confirm:
GET logstash/_search
{
"query":{
"range" : {
"Foo.x" : {
"gte" : 500
}
}
}
}
With respect to Logstash, you might want to set a mapping template for index name logstash-* since Logstash creates new indices automatically: http://www.elastic.co/guide/en/elasticsearch/reference/1.5/indices-templates.html

Resources