Elasticsearch versioning using last updated timestamp? - elasticsearch

Our data model doesn't have a separate version field. One way we version records is by id and last-updated timestamp: the version is incremented whenever a new record arrives with the same id but a later last-updated timestamp.
However, in Elasticsearch there is no way to derive the value of the _id field, and multi-fields cannot be applied to _id.
Our system is reactive and message-driven, so we can't rely on the order in which messages are received.
Is there any way we can solve versioning in a performant way?

The _version field in Elasticsearch is not for versioning. It's there to ensure you are working on the expected document (e.g. if you read a doc and decide to delete it, then it would be wise to add the version number of the read doc to the delete command).
You can set the _id field to "[your_id]_[timestamp]" and add two additional fields, "my_id" and "timestamp".
How do you set the _id to "[your_id]_[timestamp]"? If you use Logstash, you can build the composite value with the mutate filter:
mutate { add_field => { "id" => "%{your_id}_%{timestamp}" } }
and then point the elasticsearch output at it with document_id => "%{id}". If you don't use Logstash, you have to construct the id field in a similar way in your own indexing code.
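If you index directly through the REST API instead, a minimal sketch could look like this (index name, id, and values are placeholders; assumes a recent Elasticsearch that uses the _doc endpoint):

PUT myindex/_doc/abc123_1642492800
{
  "my_id": "abc123",
  "timestamp": 1642492800
}

The latest version of an entity is then simply the top hit when you filter on my_id and sort by timestamp descending (assuming my_id is mapped as keyword; otherwise use my_id.keyword):

GET myindex/_search
{
  "query": { "term": { "my_id": "abc123" } },
  "sort": [ { "timestamp": "desc" } ],
  "size": 1
}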

Related

elasticsearch unknown setting index.include_type_name

I'm in a really weird situation: I need to create indices in Elasticsearch that contain typeless fields. I have a Rails application that sends data to Elasticsearch every second. About my architecture: I run the Elastic Stack on Docker on an Ubuntu server, use a socket to send data to ELK, and everything is on the latest version.
In my Rails application the user can choose a data type for each field, but the issue happens when the user wants to change the data type of a field right after it's created; Logstash returns this error:
error"=>{"type"=>"mapper_parsing_exception", "reason"=>"failed to parse field [field] of type [long] in document with id '5e760cac-cafc-4fd0-9e45-1c650967ccd4'. Preview of field's value: '2022-01-18T08:06:30'", "caused_by"=>{"type"=>"illegal_argument_exception", "reason"=>"For input string: \"2022-01-18T08:06:30\
I found the dead letter queue plugin to save the bad input on my server. After that I thought that if I could index documents without any type the problem would be solved, so I started googling, found Removal of mapping types in the Elasticsearch documentation, and followed the instructions described there, but I get the following error:
unknown setting [index.include_type_name] please check that any required plugins are installed, or check the breaking changes documentation for removed settings
Even when I put "include_type_name" in the request sent to Elasticsearch, nothing changes. I have the latest version of Elasticsearch.
I thought it might help to edit the default Elasticsearch template, but nothing changed. Could you please help me with what I should do?
As already mentioned in the comments, Elasticsearch does not support changing the data type of a field without a reindex or creating a new index.
For example, if a field is mapped as a numeric field like integer and the user wants to index a string value into it, Elasticsearch will return a mapping error.
You would need to change the mapping and reindex, or create an entirely new index with the new mapping.
None of this is done automatically by Elasticsearch; you need to deal with it in your application. You could catch the error and implement logic that creates a new index with the new mapping, but this can lead to other problems, such as having too many indices in the cluster and query errors when the range of a query includes indices where the same field has different mappings.
One Elasticsearch feature that could help you in some way is runtime fields: with runtime fields you can query a field that has one mapping as if it had a different one.
For example, if you have a field that holds date values but was wrongly mapped as a keyword or text field, you could use a runtime field to query it as if it were a date field.
But again, this requires you to implement logic to build those runtime fields, and it has its own problems: not all data types are available for runtime fields, and runtime fields can impact performance.
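As an illustration, a hedged sketch of a search-time runtime field (the index name, field name, and date format are assumptions): suppose a field named created holds strings like "2022-01-18T08:06:30" but was mapped as keyword; a runtime field can expose it as a date at query time:

GET my-index/_search
{
  "runtime_mappings": {
    "created_as_date": {
      "type": "date",
      "script": {
        "source": "emit(LocalDateTime.parse(doc['created'].value).toInstant(ZoneOffset.UTC).toEpochMilli())"
      }
    }
  },
  "query": {
    "range": { "created_as_date": { "gte": "2022-01-01" } }
  }
}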
Another feature that could help is multi-fields; this, I think, is the closest you can get to having a field with multiple data types.
Using multi-fields you could have a field named date with the date type and also a field named date.keyword with the keyword type; likewise, a field named code with the keyword type and a field named code.int with the integer type. You would also need the ignore_malformed setting in the mapping so Elasticsearch does not reject the entire document on a mapping error, only the value for the wrongly mapped field (see the sketch below).
Just keep in mind that with multi-fields you get a separate field per mapping (date is one field, date.keyword is another), which increases storage usage.
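A minimal mapping sketch of that idea (the index and field names are made up): code is searchable as a keyword, code.int as an integer, and ignore_malformed keeps a non-numeric value from rejecting the whole document:

PUT my-index
{
  "mappings": {
    "properties": {
      "code": {
        "type": "keyword",
        "fields": {
          "int": {
            "type": "integer",
            "ignore_malformed": true
          }
        }
      }
    }
  }
}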
But again, none of this is done automatically; it needs logic in your application. Elasticsearch does not allow you to change the mapping of a field, so if your application needs that, you will have to implement something that works within this limitation.

elasticsearch - Tag data with lookup table values

I’m trying to tag my data according to a lookup table.
The lookup table has these fields:
• Key - the field name in the data I want to tag.
In the real data this field is a subfield of the “Headers” field.
An example for the “Key” field: “*Server*” (* is a wildcard).
• Value - the wanted value of the field above. The value in the lookup table is only part of the string in the real data.
An example for the “Value” field: “Avtech”.
• Vendor - the value I want to add to the real data if a field-value combination is found in a document.
An example of a combination in the real data:
“Headers.Server : Linux/2.x UPnP/1.0 Avtech/1.0”
The matching lookup-table row for that document would be:
Key = Server (with wildcards on both sides)
Value = Avtech (with wildcards on both sides)
Vendor = Avtech
So basically I need to add a field to that document with the value “Avtech”.
The subfields in “Headers” are dynamic and change from document to document.
If a match is not found, I need to set the tag field to the value “Unknown”.
I’ve tried to use the enrich processor, with the lookup table as the source data, “Value” as the match field and “Vendor” as the enrich field.
In the enrich processor I didn’t know how to refer to the field, since it’s dynamic, and I wanted to search for the value anywhere in the “Headers” subfields.
Also, I don’t think there will be a match between the “Value” in the lookup table and the value of the Headers subfield, since “Value” in the lookup table is a substring with wildcards on both sides.
I could use some help to accomplish this, and with how to search with wildcards inside an enrich processor.
Or any other idea besides the enrich processor, such as parent-child or the terms-lookup mechanism.
Thanks!
Adi.
There are two ways to accomplish this:
Using a combination of Logstash & Elasticsearch
Using only the Elasticsearch ingest node
Constraint: you need to know the position of the vendor term within the Header field.
Approach 1
If so, you can use the grok filter to extract the term and, based on the term found, do a lookup to get the corresponding value (a sketch follows the references below).
Reference
https://www.elastic.co/guide/en/logstash/current/plugins-filters-grok.html
https://www.elastic.co/guide/en/logstash/current/plugins-filters-kv.html
https://www.elastic.co/guide/en/logstash/current/plugins-filters-jdbc_static.html
https://www.elastic.co/guide/en/logstash/current/plugins-filters-jdbc_streaming.html
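A hedged sketch of Approach 1 (the field names and the grok pattern are assumptions based on the example header; the translate filter stands in for the jdbc lookup filters above when the lookup table is a small static dictionary):

filter {
  grok {
    # grab the token before the trailing "/version",
    # e.g. "Linux/2.x UPnP/1.0 Avtech/1.0" -> vendor_token = "Avtech"
    match => { "[Headers][Server]" => "(?<vendor_token>\w+)/[\w.]+$" }
    tag_on_failure => ["_no_vendor_token"]
  }
  translate {
    source     => "vendor_token"
    target     => "Vendor"
    dictionary => { "Avtech" => "Avtech" }
    fallback   => "Unknown"
  }
}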
Approach 2
Create an index consisting of KV pairs. In the ingest node, create a pipeline that consists of a grok processor followed by an enrich processor. The grok part works the same way as in Approach 1, and you seem to have already got the enrich part working.
Reference
https://www.elastic.co/guide/en/elasticsearch/reference/current/grok-processor.html
If you are able to isolate the subfield within the Header where the term of interest is present, it will make things easier for you.
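A rough sketch of such an ingest pipeline (the pipeline, enrich policy, and field names are all made up, and the enrich policy must already exist). Grok isolates the exact token so the enrich processor's exact-match lookup can work, and a set processor fills in "Unknown" when nothing matches:

PUT _ingest/pipeline/tag-vendor
{
  "processors": [
    {
      "grok": {
        "field": "Headers.Server",
        "patterns": ["(?<vendor_token>\\w+)/[\\w.]+$"],
        "ignore_failure": true
      }
    },
    {
      "enrich": {
        "policy_name": "vendor-policy",
        "field": "vendor_token",
        "target_field": "vendor_info",
        "ignore_failure": true
      }
    },
    {
      "set": {
        "if": "ctx.vendor_info != null",
        "field": "Vendor",
        "value": "{{vendor_info.Vendor}}"
      }
    },
    {
      "set": {
        "if": "ctx.vendor_info == null",
        "field": "Vendor",
        "value": "Unknown"
      }
    }
  ]
}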

Filtering Elasticsearch fields from index/store

I was wondering what the recommended approach is to filter some of the fields that are sent to Elasticsearch out of store and index?
I want to prevent some fields from getting indexed in Elasticsearch. You may ask why we are sending them to Elasticsearch in the first place; unfortunately, the data is sent via another application that doesn't offer any filtering mechanism, so filtering has to happen at indexing time. Here is what we have done, but I am not sure what the consequences of these steps are:
1- Disable dynamic mapping ("dynamic": "false") in the ES templates.
2- Include only the required fields in _source and exclude the rest.
According to the ES documentation, some ES functionality is disabled when the _source field is disabled. Given that I don't need the filtered fields at all, I was wondering whether this solution will break anything for the remaining fields or not?
There are a few mapping parameters that allow you to do what you want:
index: true/false: if true, the field value is indexed so it can be searched later on (default: true)
store: true/false: if true, the field value is stored in addition to being indexed. Usually the field value is already available in the source, but you can choose not to store the source and store the field value itself instead (default: false)
enabled: true/false: only for the mapping as a whole or for object fields; lets you keep the value in the source without indexing or parsing it
So you can use any combination of the above parameters if you don't want to modify the source documents, and simply let ES do it for you.
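For example, a minimal mapping sketch (the index and field names are made up for illustration):

PUT my-index
{
  "mappings": {
    "properties": {
      "title":    { "type": "text" },
      "raw_blob": { "type": "keyword", "index": false },
      "metadata": { "type": "object", "enabled": false }
    }
  }
}

Here title is indexed and searchable; raw_blob stays in _source but cannot be searched; metadata is kept in _source but is never indexed or even parsed.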

How to project a new field in response in ElasticSearch?

I am using Elasticsearch 6.2.
I have an index products with index_type productA having data with following structure:
{
  "id": 1,
  "parts": ["part1", "part2", ...]
  .....
  .....
}
Now at query time, I want to add or project a field parts_count onto the response, which simply represents the number of parts, i.e. the length of the parts array. If possible, I would also like to sort the documents of productA based on this generated parts_count field.
I have gone through most of the docs but haven't found a way to achieve this.
Note:
I don't want to update the mapping or add dynamic fields. I am not sure if Elasticsearch even allows that; I just wanted to mention it.
Did you read about Script Fields and Script-Based Sorting?
I think you should be able to achieve both things with those, and neither requires any mapping updates.
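For illustration, a minimal sketch (assuming parts is mapped as a keyword field with doc values; with the default text-plus-keyword dynamic mapping, use doc['parts.keyword'] instead). Note that keyword doc values are de-duplicated, so this counts distinct parts:

GET products/_search
{
  "script_fields": {
    "parts_count": {
      "script": { "source": "doc['parts'].size()" }
    }
  },
  "sort": {
    "_script": {
      "type": "number",
      "script": { "source": "doc['parts'].size()" },
      "order": "desc"
    }
  }
}

Each hit then carries parts_count in its fields section of the response.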

Elasticsearch: Updating a field that has been set as a document _id via mapping with a path

So basically I have an index I created, and I have set the mapping so that whenever a document is created, the _id of the document is taken from one of the fields of the document.
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-id-field.html
This was easy enough, but I noticed that when I update that field (through the Java API), the _id of the document remains the same, so the field and the _id go out of sync.
Is this intended behaviour? If so, does anyone know why, and whether it is a bad idea to set the _id from a field that may change frequently?
If I wanted the _id and the field to stay in sync, is reindexing an option?
Thanks
The _id is extracted and copied from that field at index time.
The _id is also used as the routing key; it decides where the document lives in the cluster.
Hence it's not possible to keep _id as a live reference to some field; rather, the value is copied into _id once, when the document is indexed.
If you want to change the _id, re-indexing is the only option.
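For a single document, "re-indexing" just means writing the document again under the new _id and then removing the stale copy. A hedged sketch with made-up index, type, and field names (typed URLs, matching the era of the linked docs), for a field whose value changed from "A1" to "B2":

PUT /myindex/mytype/B2
{ "code": "B2" }

DELETE /myindex/mytype/A1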
