NiFi: How to specify a dynamic index name when sending data to Elasticsearch

I am new to Apache NiFi.
I am trying to put data into Elasticsearch using NiFi.
I want to specify an index name by combining a fixed string with the value of a timestamp field converted to a date format.
I was able to produce the desired date format with the expression below, but I have not been able to build the index name from the value of the timestamp field in the flowfile content.
${now():format('yyyy-MM-dd')}
Example JSON data:
{
  "timestamp": 1625579799000,
  "data1": "abcd",
  "date2": 12345
}
I would like to get the following result:
index: "myindex-2021.07.06"
What should I do? Please tell me how.

I know that if you use the PutElasticsearch processor, you can provide it with a specific index name to use. As long as the index name meets Elasticsearch's format rules for naming a new index, and automatic index creation is enabled in Elasticsearch, Elasticsearch will create the new index with the desired name when the data is sent. This has worked for me. Double-check the Elasticsearch index naming rules, for example at https://docs.aws.amazon.com/elasticsearch-service/latest/developerguide/es-indexing.html
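To build the index name from the timestamp field in the flowfile content rather than from now(), a common approach is to first copy that field into a flowfile attribute and then format it in the index property. A minimal sketch, assuming the JSON shown in the question and an attribute named timestamp (the processor choice and property names here are illustrative, not the only option):

EvaluateJsonPath  (Destination: flowfile-attribute)
    timestamp = $.timestamp

PutElasticsearchHttp  (or PutElasticsearchRecord)
    Index = myindex-${timestamp:toNumber():format('yyyy.MM.dd', 'GMT')}

NiFi's format() function treats a whole number as epoch milliseconds, so 1625579799000 becomes 2021.07.06 and the resulting index name is myindex-2021.07.06.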

Related

elasticsearch unknown setting index.include_type_name

I'm in a really weird situation: I need to create indexes in Elasticsearch that contain typeless fields. I have a Rails application that sends data to my Elasticsearch every second. About my architecture: I run the Elastic Stack on Docker on an Ubuntu server, use a socket to send data to the ELK stack, and everything is on the latest version.
In my Rails application the user can choose the data type for each field, but the issue happens when the user wants to change the data type of a field right after it's created; Logstash returns this error:
error"=>{"type"=>"mapper_parsing_exception", "reason"=>"failed to parse field [field] of type [long] in document with id '5e760cac-cafc-4fd0-9e45-1c650967ccd4'. Preview of field's value: '2022-01-18T08:06:30'", "caused_by"=>{"type"=>"illegal_argument_exception", "reason"=>"For input string: \"2022-01-18T08:06:30\
I found the dead letter queue plugin to save the wrong input on my server. After that I thought that if I could index documents without any type the problem would be solved, so I started googling and found Removal of mapping types in the Elasticsearch documentation. When I follow the instructions described in the tutorials I get the following error:
unknown setting [index.include_type_name] please check that any required plugins are installed, or check the breaking changes documentation for removed settings
Even when I put "include_type_name" in the request sent to Elasticsearch, nothing changes, and I have the latest version of Elasticsearch.
I think maybe it would help to edit the default Elasticsearch template, but again nothing changes. Could you please help me with what I should do?
As already mentioned in the comments, Elasticsearch does not support changing the data type of a field without a reindex or creating a new index.
For example, if a field is mapped as a numeric field like integer and the user wants to index a string value into it, Elasticsearch will return a mapping error.
You would need to change the mapping of the index and reindex it, or create an entirely new index using the new mapping.
None of this is done automatically by Elasticsearch; you would need to deal with it in your application. You could catch the error and implement some logic to create a new index with the new mapping, but this can lead to other problems, such as having too many indices in the cluster and query errors when a query spans indices that have the same field with different mappings.
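As a rough sketch of the reindex route (the index names, field name, and target type below are placeholders), you would create a new index with the corrected mapping and copy the data into it:

PUT my-index-v2
{
  "mappings": {
    "properties": {
      "field": { "type": "keyword" }
    }
  }
}

POST _reindex
{
  "source": { "index": "my-index-v1" },
  "dest": { "index": "my-index-v2" }
}

After the reindex you would point your application (or an alias) at the new index.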
One Elasticsearch feature that could help you in some way is runtime fields: with runtime fields you can query a field that has a specific mapping using a different mapping.
For example, if you have a field that holds date values but was wrongly mapped as a keyword or text field, you could use a runtime field to query it as if it were a date field.
But again, this requires that you implement logic to build those runtime fields, and it can lead to other problems: not all data types are available to runtime fields, and runtime fields can impact performance.
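A sketch of the date-over-keyword case (the index and field names are placeholders): a runtime field defined in the search request shadows the mapped field of the same name and, with no script, reads the value from _source and interprets it with the runtime type, so the text value can be queried as a date:

GET my-index/_search
{
  "runtime_mappings": {
    "field": { "type": "date" }
  },
  "query": {
    "range": { "field": { "gte": "2022-01-18" } }
  }
}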
Another feature that could help is multi-fields; this, I think, is the closest you can get to having a field with multiple data types.
Using multi-fields you could have a field named date with the date type and also a field named date.keyword with the keyword type, or a field named code with the keyword type and a field named code.int with the integer type. You would also need to use the ignore_malformed setting in the mapping so Elasticsearch does not reject the entire document in case of a mapping error, just the field with the wrong value.
Just keep in mind that when you use multi-fields you have a different field for each mapping; for example, date is one field and date.keyword is another, which increases storage usage.
But again, none of this is done automatically; it needs logic in your application. Elasticsearch does not allow you to change the mapping of a field, so if your application needs this, you will need to implement something in the application that works within those limitations.
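A sketch of such a multi-field mapping with ignore_malformed (the index and field names are only examples):

PUT my-index
{
  "mappings": {
    "properties": {
      "code": {
        "type": "keyword",
        "fields": {
          "int": {
            "type": "integer",
            "ignore_malformed": true
          }
        }
      }
    }
  }
}

A document with "code": "abc" is still indexed; only the malformed code.int sub-field is dropped.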

geoip.location does not work with modified index names sent via Logstash

geoip.location has the geo_point datatype when an event is sent from Logstash to Elasticsearch with the default index name. Because geoip.location is a geo_point, I can view the plotted locations on maps in Kibana, since Kibana looks for the geo_point datatype for maps.
geoip.location becomes geoip.location.lat and geoip.location.lon with the number datatype when an event is sent from Logstash to Elasticsearch with a modified index name. Because of this I'm not able to view the plotted locations on maps in Kibana.
I don't understand why Elasticsearch behaves differently when I try to add data to a modified index name. Is this a bug in Elasticsearch?
For my use case I need to use a modified index name, as I need a new index for each day. The plan is to store the logs of a particular day in a single index, so if there are 7 days I need 7 indexes, each containing the logs of one day (a new index should be created based on the current date).
I searched around for a solution, but I'm not able to comprehend it and make it work for me. Kindly help me out on this.
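(For reference, a per-day index is usually produced directly in the Logstash elasticsearch output with a date pattern in the index name; a sketch, where the dave- prefix and host are only examples:)

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "dave-%{+YYYY.MM.dd}"
  }
}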
Update (what I did after reading xeraa's answer)
In the Dev Tools in Kibana:
GET _template/logstash showed the allowed patterns in the index_patterns property along with other properties.
I included my pattern (dave*) inside index_patterns and triggered the PUT request. You have to pass the entire existing body content (which you receive from the GET request) in the PUT request along with your required index_patterns; otherwise the default settings will disappear, because the PUT API replaces the template with whatever data you pass in the PUT body.
PUT _template/logstash
{
  ...
  "index_patterns": [
    "logstash-*", "dave*"
  ],
  ...
}
I'd guess that there is a template set up for the default name, which isn't applied if you rename the index.
Check with GET _template whether any template matches your old index name, and update its setting so that it also gets applied to the new one.

Index ID identification in Elasticsearch/Kibana visualization

"kibanaSavedObjectMeta": {
"searchSourceJSON": "{\"index\":\"4eb9f840-3969-11e8-ae19-552e148747c3\",\"filter\":[],\"query\":{\"language\":\"lucene\",\"query\":\"\"}}"
}
The above snippet is from an exported JSON of a Kibana visualization. Without exporting this JSON, is there a direct way to get this
\"index\":\"4eb9f840-3969-11e8-ae19-552e148747c3\" index ID?
And if I am not wrong, this is supposed to be the index ID, as it is common across visualizations that use the same index.
So, you can retrieve all index patterns using this query:
GET .kibana/_search?q=type:index-pattern&size=100
Additionally, you can retrieve a specific index pattern given its name using:
GET .kibana/_search?q=type:index-pattern%20AND%20index-pattern.title:indexname
Similarly, for visualizations, you can retrieve one by name using:
GET .kibana/_search?q=type:visualization%20AND%20visualization.title:vizname

Elasticsearch - Find document with a conflicting field type

I'm using Elasticsearch 5.6.2 with Kibana and I'm currently facing a problem.
My documents are indexed on the field timestamp, which is normally an integer; however, recently somebody logged a document with a timestamp that is not an integer, and Kibana complains of a conflicting type.
The Discover panel displays nothing and the following errors pop up:
Saved "field" parameter is now invalid. Please select a new field.
Discover: "field" is a required parameter
How can I look for the document(s) causing these conflicts, so that I can find the service creating the bad logs?
The field type (either integer or text/keyword) is not defined on a per-document basis but rather on a per-index basis (in the mappings). I guess you are manipulating time-series data, and you probably have an index per day (or per month, or ...).
In Kibana Dev Tools:
List the created indices with GET _cat/indices
For each index (logstash-2017.09.28 in my example), do a GET logstash-2017.09.28/_mapping and check the type of the field #timestamp
The field type is probably different between indices.
You won't be able to change the field type on already-created indices. Deleting documents won't solve your problem. The only solution is to drop the index or reindex the whole index with a new field type (in a specific mapping).
To avoid this problem on future indices, the solution is to create an index template with a mapping declaring that the field #timestamp is of type date (or whatever type you need).
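A sketch of such a template in the legacy syntax that Elasticsearch 5.6 uses, with a placeholder template name, index pattern, mapping type, and field name:

PUT _template/timestamp-as-date
{
  "template": "logstash-*",
  "mappings": {
    "log": {
      "properties": {
        "timestamp": { "type": "date" }
      }
    }
  }
}

Every new index matching logstash-* will then map timestamp the same way, so the type stays consistent across the daily indices.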

Kibana unique count sees names with "-" as different entries

I have a problem with the unique count feature.
I get data from Elasticsearch, for example a computer name (PC-01) in a field.
When I want to use a unique count visualisation, Kibana turns "DESKTOP-2D562R2" into "DESKTOP" and "2D562R2" as separate entries.
See this split field:
The data Kibana gets from Elasticsearch looks like this entry data:
The problem with this is that 2d562r2 and desktop end up as two different entries in a Kibana table or in a unique count.
Your field is being analyzed (split into tokens). Change the mapping (or the template, depending on how you're creating the indexes) to make this field not_analyzed.
Note that, as a hack, Logstash's default template creates a ".raw" version of string fields that is not analyzed. You could refer to enterys.raw.
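A sketch of what that template change could look like on Elasticsearch 2.x, where string fields take "index": "not_analyzed" (on 5.x and later you would use "type": "keyword" instead); the template name, index pattern, and field name are only examples:

PUT _template/computer-names
{
  "template": "logstash-*",
  "mappings": {
    "_default_": {
      "properties": {
        "computer_name": {
          "type": "string",
          "index": "not_analyzed"
        }
      }
    }
  }
}

With that in place, new indices store the whole value (e.g. DESKTOP-2D562R2) as a single term, so unique count treats it as one entry.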
