ILM using Logstash Elasticsearch output plugin doesn't work

I'm trying to implement ILM for an index to make proper use of hardware, using the Elasticsearch output plugin. It looks like I misunderstand how Logstash manages ILM.
I have the ELK stack version 7.1.0 in Docker. X-Pack is activated with a trial license.
The index template is managed by Logstash Elasticsearch output plugin and the index lifecycle policy was created using Kibana.
Here is the output section of Logstash pipeline:
output {
  elasticsearch {
    hosts => ["http://eshost:9200"]
    user => "logstash_writer"
    password => "pass"
    template => "/usr/share/logstash/es_templates/ilm-template.json"
    template_name => "ilm-template"
    template_overwrite => true
    ilm_enabled => true
    ilm_rollover_alias => "ilm-index"
    ilm_pattern => "000001"
    ilm_policy => "base-policy"
  }
}
The logstash_writer user has the default logstash_writer role with permissions for ILM management.
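For reference, the role is assumed to look roughly like the one the Elastic docs recommend for an ILM-managed Logstash writer (index names adapted to this setup; the exact privilege list is an assumption based on those docs):
POST _security/role/logstash_writer
{
  "cluster": ["manage_index_templates", "monitor", "manage_ilm"],
  "indices": [
    {
      "names": ["ilm-index-*"],
      "privileges": ["write", "create", "create_index", "manage", "manage_ilm"]
    }
  ]
}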
Elasticsearch index template ilm-template.json:
{
  "settings" : {
    "index.number_of_replicas" : "1",
    "index.number_of_shards" : "1",
    "index.refresh_interval" : "5s"
  }
}
Elasticsearch index template _template/ilm-template that was actually created by Logstash:
{
  "ilm-template" : {
    "order" : 0,
    "index_patterns" : [
      "ilm-index-*"
    ],
    "settings" : {
      "index" : {
        "lifecycle" : {
          "name" : "base-policy",
          "rollover_alias" : "ilm-index"
        },
        "refresh_interval" : "5s",
        "number_of_shards" : "1",
        "number_of_replicas" : "1"
      }
    },
    "mappings" : { },
    "aliases" : { }
  }
}
Policy base-policy created using Kibana:
{
  "policy": {
    "phases": {
      "hot": {
        "min_age": "0ms",
        "actions": {
          "rollover": {
            "max_size": "100mb",
            "max_docs": 100000
          },
          "set_priority": {
            "priority": 100
          }
        }
      },
      "delete": {
        "min_age": "2d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}
I expect a set of ilm-index-* indices, but only ilm-index-000001 is created and keeps growing, despite the limits set in base-policy. So in Kibana I see only one index (ilm-index-000001) associated with base-policy.

The provided configuration is correct. The problem lies in how the max_size and max_docs parameters behave when they are set to small values. Elasticsearch does not roll an index over the moment its pri.store.size and docs.count grow past max_size and max_docs; the rollover conditions are only checked periodically (every 10 minutes by default, via indices.lifecycle.poll_interval), so a small index can exceed the thresholds noticeably before the rollover actually happens.
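To see why an index has not rolled over yet, or to speed up the checks while testing, the ILM explain API and the lifecycle poll interval setting can help; a minimal sketch (the 1m value is just an example for testing, not a production recommendation):
GET ilm-index-*/_ilm/explain

PUT _cluster/settings
{
  "transient": {
    "indices.lifecycle.poll_interval": "1m"
  }
}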

Related

creating data stream through logstash

I have installed an Elasticsearch cluster v7.14.
I have created an ILM policy and an index template. However, the data stream parameters specified in the Logstash pipeline file are producing an error.
ILM policy -
{
  "testpolicy" : {
    "version" : 1,
    "modified_date" : "2021-08-28T02:58:25.942Z",
    "policy" : {
      "phases" : {
        "hot" : {
          "min_age" : "0ms",
          "actions" : {
            "rollover" : {
              "max_primary_shard_size" : "900mb",
              "max_age" : "2d"
            },
            "set_priority" : {
              "priority" : 100
            }
          }
        },
        "delete" : {
          "min_age" : "2d",
          "actions" : {
            "delete" : {
              "delete_searchable_snapshot" : true
            }
          }
        }
      }
    },
    "in_use_by" : {
      "indices" : [ ],
      "data_streams" : [ ],
      "composable_templates" : [ ]
    }
  }
}
Index template -
{
  "index_templates" : [
    {
      "name" : "access_template",
      "index_template" : {
        "index_patterns" : [
          "test-data-stream*"
        ],
        "template" : {
          "settings" : {
            "index" : {
              "number_of_shards" : "1",
              "number_of_replicas" : "0"
            }
          },
          "mappings" : {
            "_routing" : {
              "required" : false
            },
            "dynamic_date_formats" : [
              "strict_date_optional_time",
              "yyyy/MM/dd HH:mm:ss Z||yyyy/MM/dd Z"
            ],
            "numeric_detection" : true,
            "_source" : {
              "excludes" : [ ],
              "includes" : [ ],
              "enabled" : true
            },
            "dynamic" : true,
            "dynamic_templates" : [ ],
            "date_detection" : true
          }
        },
        "composed_of" : [ ],
        "priority" : 500,
        "version" : 1,
        "data_stream" : {
          "hidden" : false
        }
      }
    }
  ]
}
logstash pipeline config file -
input {
  beats {
    port => 5044
  }
}
filter {
  if [log_type] == "access_server" and [app_id] == "pa" {
    grok {
      match => {
        "message" => "%{YEAR}-%{MONTHNUM}-%{MONTHDAY}[T ]%{HOUR}:%{MINUTE}(?::?%{SECOND})\| %{USERNAME:exchangeId}\| %{DATA:trackingId}\| %{NUMBER:RoundTrip:int}%{SPACE}ms\| %{NUMBER:ProxyRoundTrip:int}%{SPACE}ms\| %{NUMBER:UserInfoRoundTrip:int}%{SPACE}ms\| %{DATA:Resource}\| %{DATA:subject}\| %{DATA:authmech}\| %{DATA:scopes}\| %{IPV4:Client}\| %{WORD:method}\| %{DATA:Request_URI}\| %{INT:response_code}\| %{DATA:failedRuleType}\| %{DATA:failedRuleName}\| %{DATA:APP_Name}\| %{DATA:Resource_Name}\| %{DATA:Path_Prefix}"
      }
    }
    mutate {
      replace => {
        "[type]" => "access_server"
      }
    }
  }
}
output {
  if [log_type] == "access_server" {
    elasticsearch {
      hosts => ['http://10.10.10.76:9200']
      user => elastic
      password => xxx
      data_stream => "true"
      data_stream_type => "logs"
      data_stream_dataset => "access"
      data_stream_namespace => "default"
      ilm_rollover_alias => "access"
      ilm_pattern => "000001"
      ilm_policy => "testpolicy"
      template => "/tmp/access_template"
      template_name => "access_template"
    }
  }
  elasticsearch {
    hosts => ['http://10.10.10.76:9200']
    index => "%{[@metadata][beat]}-%{[@metadata][version]}-%{+YYYY.MM.dd}"
    user => elastic
    password => xxx
  }
}
After the deployment is done, I can only see system indices; the data stream is not created.
[2021-08-28T12:42:50,103][ERROR][logstash.outputs.elasticsearch][main] Invalid data stream configuration, following parameters are not supported: {"template"=>"/tmp/pingaccess_template", "ilm_pattern"=>"000001", "template_name"=>"pingaccess_template", "ilm_rollover_alias"=>"pingaccess", "ilm_policy"=>"testpolicy"}
[2021-08-28T12:42:50,547][ERROR][logstash.javapipeline ][main] Pipeline error {:pipeline_id=>"main", :exception=>#<LogStash::ConfigurationError: Invalid data stream configuration: ["template", "ilm_pattern", "template_name", "ilm_rollover_alias", "ilm_policy"]>, :backtrace=>["/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-output-elasticsearch-11.0.2-java/lib/logstash/outputs/elasticsearch/data_stream_support.rb:57:in `check_data_stream_config!'"
[2021-08-28T12:42:50,702][ERROR][logstash.agent ] Failed to execute action {:id=>:main, :action_type=>LogStash::ConvergeResult::FailedAction, :message=>"Could not execute action: PipelineAction::Create<main>, action_result: false", :backtrace=>nil}
The error says that parameters such as template => "/tmp/pingaccess_template", ilm_pattern => "000001", template_name => "pingaccess_template", ilm_rollover_alias => "pingaccess", and ilm_policy => "testpolicy" are not valid, but they are mentioned in the link below:
https://www.elastic.co/guide/en/logstash/current/plugins-outputs-elasticsearch.html#plugins-outputs-elasticsearch-data-streams
The solution is to use Logstash without making it "aware" of the data_stream.
First of all (before running Logstash), create your ILM policy and index template, but add the "index.lifecycle.name" setting to the template settings. That way, you link the template and the ILM policy. Also, don't forget the data_stream section in the index template.
{
  "index_templates" : [
    {
      "name" : "access_template",
      "index_template" : {
        "index_patterns" : [
          "test-data-stream*"
        ],
        "template" : {
          "settings" : {
            "index" : {
              "number_of_shards" : "1",
              "number_of_replicas" : "0",
              "lifecycle" : {
                "name" : "testpolicy"
              }
            }
          },
          "mappings" : {
            ...
          }
        },
        "composed_of" : [ ],
        "priority" : 500,
        "version" : 1,
        "data_stream" : {
          "hidden" : false
        }
      }
    }
  ]
}
Keep the Logstash output as if data_stream didn't exist, but add action => "create". This is because you can't use the "index" API with data streams; you need the _create API call.
output {
  elasticsearch {
    hosts => ['http://10.10.10.76:9200']
    index => "test-data-stream"
    user => elastic
    password => xxx
    action => "create"
  }
}
That way, Logstash will output to ES, the index template will be applied automatically (because the index name matches the pattern), and with it the ILM policy and the data_stream.
Important: to make this work, you need to start from scratch. If the index "test-data-stream" already exists in ES (as a traditional index), the data stream will NOT be created. Test with another index name to make sure it works.
The documentation is unclear, but the plugin does not support those options when data stream output is enabled. The plugin logs the options returned by the invalid_data_stream_params function, which allows action, routing, data_stream, anything else that starts with data_stream_, the shared options defined by the mixin, and the common options defined by the output plugin base.
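For comparison, if you do want to keep the plugin's native data stream mode, the output would have to drop the template and ILM options entirely and manage the index template and ILM policy in Elasticsearch itself; a minimal sketch built only from options already present in the original config:
output {
  elasticsearch {
    hosts => ['http://10.10.10.76:9200']
    user => elastic
    password => xxx
    data_stream => "true"
    data_stream_type => "logs"
    data_stream_dataset => "access"
    data_stream_namespace => "default"
  }
}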

Combine two indices into a third index in Elasticsearch using Logstash

I have two indices:
employee_data
{"code": 1, "name": "xyz", "city": "Mumbai"}
transaction_data
{"code": 1, "Month": "June", "payment": 78000}
I want a third index like this:
join_index
{"code": 1, "name": "xyz", "city": "Mumbai", "Month": "June", "payment": 78000}
How is this possible?
I am trying this in Logstash:
input {
  elasticsearch {
    hosts => "localhost"
    index => "employees_data,transaction_data"
    query => '{ "query": { "match": { "code": 1 } } }'
    scroll => "5m"
    docinfo => true
  }
}
output {
  elasticsearch {
    hosts => ["localhost"]
    index => "join1"
  }
}
You can use the elasticsearch input on employees_data, and in your filters use the elasticsearch filter on transaction_data:
input {
  elasticsearch {
    hosts => "localhost"
    index => "employees_data"
    query => '{ "query": { "match_all": { } } }'
    sort => "code:desc"
    scroll => "5m"
    docinfo => true
  }
}
filter {
  elasticsearch {
    hosts => "localhost"
    index => "transaction_data"
    query => "code:\"%{[code]}\""
    fields => {
      "Month" => "Month"
      "payment" => "payment"
    }
  }
}
output {
  elasticsearch {
    hosts => ["localhost"]
    index => "join1"
  }
}
Then send your new documents to your third index with the elasticsearch output. You'll have three Elasticsearch connections and the result can be a little slow, but it works.
You don't need Logstash to do this; Elasticsearch itself supports it by leveraging the enrich processor.
First, you need to create an enrich policy (use the smallest index, let's say it's employees_data ):
PUT /_enrich/policy/employee-policy
{
  "match": {
    "indices": "employees_data",
    "match_field": "code",
    "enrich_fields": ["name", "city"]
  }
}
Then you can execute that policy in order to create an enrichment index
POST /_enrich/policy/employee-policy/_execute
When the enrichment index has been created and populated, the next step requires you to create an ingest pipeline that uses the above enrich policy/index:
PUT /_ingest/pipeline/employee_lookup
{
  "description" : "Enriching transactions with employee data",
  "processors" : [
    {
      "enrich" : {
        "policy_name": "employee-policy",
        "field" : "code",
        "target_field": "tmp",
        "max_matches": "1"
      }
    },
    {
      "script": {
        "if": "ctx.tmp != null",
        "source": "ctx.putAll(ctx.tmp); ctx.remove('tmp');"
      }
    }
  ]
}
Finally, you're now ready to create your target index with the joined data. Simply leverage the _reindex API combined with the ingest pipeline we've just created:
POST _reindex
{
  "source": {
    "index": "transaction_data"
  },
  "dest": {
    "index": "join1",
    "pipeline": "employee_lookup"
  }
}
After running this, the join1 index will contain exactly what you need, for instance:
{
  "_index" : "join1",
  "_type" : "_doc",
  "_id" : "0uA8dXMBU9tMsBeoajlw",
  "_score" : 1.0,
  "_source" : {
    "code" : 1,
    "name" : "xyz",
    "city" : "Mumbai",
    "Month" : "June",
    "payment" : 78000
  }
}
As far as I know, this cannot be done just using the Elasticsearch APIs. To handle it, you need to set a unique ID for the documents that belong together; for example, the code field you mentioned in your question would make a good document ID. You can then reindex the first index into the third one and use the update API to merge in the documents read from the second index, addressing them by their IDs in the third index. I hope this helps.
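A rough sketch of that approach (assuming code is used as the document _id; the single update call is illustrative and would be repeated, or sent via the bulk API, for each document read from transaction_data):
POST _reindex
{
  "source": { "index": "employee_data" },
  "dest": { "index": "join1" },
  "script": { "source": "ctx._id = ctx._source.code" }
}

POST join1/_update/1
{
  "doc": { "Month": "June", "payment": 78000 }
}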

Replica and shard settings not applied in elasticsearch template

I've added a template like this:
curl -X PUT "e.f.g.h:9200/_template/impression-template" -H 'Content-Type: application/json' -d'
{
  "index_patterns": ["impression-%{+YYYY.MM.dd}"],
  "settings": {
    "number_of_shards": 2,
    "number_of_replicas": 2
  },
  "mappings": {
    "_doc": {
      "_source": {
        "enabled": false
      },
      "dynamic": false,
      "properties": {
        "message": {
          "type": "object",
          "properties": {
            ...
And I have a Logstash instance that reads events from Kafka and writes them to ES. Here is my Logstash config:
input {
  kafka {
    topics => ["impression"]
    bootstrap_servers => "a.b.c.d:9092"
  }
}
filter {
  json {
    source => "message"
    target => "message"
  }
}
output {
  elasticsearch {
    hosts => ["e.f.g.h:9200"]
    index => "impression-%{+YYYY.MM.dd}"
    template_name => "impression-template"
  }
}
But each day I get an index with 5 shards and 1 replica (the ES defaults). How can I fix this so that I get 2 shards and 2 replicas?
I'm not sure you can use my_index-%{+YYYY.MM.dd} as an index pattern, because the Logstash date expression is not expanded in the template, so when my_index-2019.03.10 is created it is not recognized and ends up with an empty mapping. I had the same issue, and my workaround was to set the index pattern to my_index-* and add a year suffix to the indices, so they look like my_index-2017, my_index-2018...
{
  "my_index_template" : {
    "order" : 0,
    "index_patterns" : [
      "my_index-*"
    ],
    "settings" : {
      "index" : {
        "number_of_shards" : "5",
        "number_of_replicas" : "1"
      }
    },...
I took the year from the timestamp field (YYYY-MM-dd) and appended it to the index name in Logstash:
filter {
  grok {
    match => [
      "timestamp", "(?<index_year>%{YEAR})"
    ]
  }
  mutate {
    add_field => {
      "[@metadata][index_year]" => "%{index_year}"
    }
  }
  mutate {
    remove_field => [ "index_year", "@version" ]
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "my_index-%{[@metadata][index_year]}"
    document_id => "%{some_field}"
  }
}
After the Logstash run completed, I ended up with my_index-2017, my_index-2018, and my_index-2019 indices with 5 shards, 1 replica, and the correct mapping, as predefined in my template.
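Applied to the original impression case, the same idea means giving the template a wildcard pattern that actually matches the daily indices Logstash creates; a minimal sketch (settings only, mappings omitted):
curl -X PUT "e.f.g.h:9200/_template/impression-template" -H 'Content-Type: application/json' -d'
{
  "index_patterns": ["impression-*"],
  "settings": {
    "number_of_shards": 2,
    "number_of_replicas": 2
  }
}'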

Understanding ELK Analyzer

I am new to the ELK 5.1.1 stack and have a few questions, just for my understanding.
I have set up this stack basically with standard analyzers/filters and everything works great.
My data source is a MySQL backend that I index using Logstash.
I would like to handle queries containing accents, and hopefully the asciifolding token filter can help achieve this.
First I learned how to create a custom analyzer and save it as a template.
Right now, when I query the URL http://localhost:9200/_template?pretty, I have 2 templates: the default Logstash template named logstash, and my custom template, whose settings are:
"custom_template" : {
"order" : 1,
"template" : "doo*",
"settings" : {
"index" : {
"analysis" : {
"analyzer" : {
"myCustomAnalyzer" : {
"filter" : [
"standard",
"lowercase",
"asciifolding"
],
"tokenizer" : "standard"
}
}
},
"refresh_interval" : "5s"
}
},
"mappings" : { },
"aliases" : { }
}
Searching for the keyword Yaoundé returns 70 hits, but when I search for Yaounde I get no hits.
Below is my query for the second case:
{
  "query": {
    "query_string": {
      "query": "yaounde",
      "fields": [
        "title"
      ]
    }
  },
  "from": 0,
  "size": 10
}
Can somebody please help me figure out what I am doing wrong here?
Also, knowing that my data is analyzed by Logstash during the indexing process, do I really have to specify that the analyzer myCustomAnalyzer should be applied at search time, as in this second query?
{
  "query": {
    "query_string": {
      "query": "yaounde",
      "fields": [
        "title"
      ],
      "analyzer": "myCustomAnalyzer"
    }
  },
  "from": 0,
  "size": 10
}
Here is a sample of the output part of my logstash config file
output {
  stdout { codec => json_lines }
  if [type] == "announces" {
    elasticsearch {
      hosts => "localhost:9200"
      document_id => "%{job_id}"
      index => "dooone"
      document_type => "%{type}"
    }
  } else {
    elasticsearch {
      hosts => "localhost:9200"
      document_id => "%{uid}"
      index => "dootwo"
      document_type => "%{type}"
    }
  }
}
Thank You
A good place to start is with the index template documentation of elasticsearch:
https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-templates.html
An example for your scenario that could work for the title field:
"custom_template" : {
"order" : 1,
"template" : "doo*",
"settings" : {
"index" : {
"analysis" : {
"analyzer" : {
"myCustomAnalyzer" : {
"filter" : [
"standard",
"lowercase",
"asciifolding"
],
"tokenizer" : "standard"
}
}
},
"refresh_interval" : "5s"
}
},
"mappings" : {
"your_type": {
"properties": {
"title": {
"type": "text",
"analyzer": "myCustomAnalyzer"
}
}
}
},
"aliases" : { }
}
An alternative would be to change the dynamic mapping. You can find a good example right here for strings.
https://www.elastic.co/guide/en/elasticsearch/reference/current/dynamic-templates.html#match-mapping-type
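A rough sketch of what such a dynamic template could look like here (your_type and strings_with_folding are placeholder names; this would map every incoming string field with the custom analyzer):
"mappings" : {
  "your_type": {
    "dynamic_templates": [
      {
        "strings_with_folding": {
          "match_mapping_type": "string",
          "mapping": {
            "type": "text",
            "analyzer": "myCustomAnalyzer"
          }
        }
      }
    ]
  }
}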
Can you show the mapping of your document ?
(GET /my_index/my_doc/_mapping )
The analyzer you provide as an argument in your query only applies at search time, not at indexing time. So if you haven't set this analyzer in your mapping, the string is still indexed with the default analyzer and will not match your query.
The analyzer you provide at search time is applied to your query string, but the search then looks into the indexed data, which was indexed as "Yaoundé", not "yaounde".
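One way to see this is to run the same text through the _analyze API with and without the custom analyzer (the index name dooone is taken from the config above):
POST dooone/_analyze
{
  "analyzer": "myCustomAnalyzer",
  "text": "Yaoundé"
}
With asciifolding in the chain this should produce the token yaounde, while the default standard analyzer keeps the accented form, which is why the unaccented query finds nothing.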

CSV geodata into elasticsearch as a geo_point type using logstash

Below is a reproducible example of the problem I am having with the most recent versions of Logstash and Elasticsearch.
I am using logstash to input geospatial data from a csv into elasticsearch as geo_points.
The CSV looks like the following:
$ head simple_base_map.csv
"lon","lat"
-1.7841,50.7408
-1.7841,50.7408
-1.78411,50.7408
-1.78412,50.7408
-1.78413,50.7408
-1.78414,50.7408
-1.78415,50.7408
-1.78416,50.7408
-1.78416,50.7408
I have created a mapping template that looks like the following:
$ cat simple_base_map_template.json
{
  "template": "base_map_template",
  "order": 1,
  "settings": {
    "number_of_shards": 1
  },
  "mappings": {
    "node_points" : {
      "properties" : {
        "location" : { "type" : "geo_point" }
      }
    }
  }
}
and have a logstash config file that looks like the following:
$ cat simple_base_map.conf
input {
  stdin {}
}
filter {
  csv {
    columns => [
      "lon", "lat"
    ]
  }
  if [lon] == "lon" {
    drop { }
  } else {
    mutate {
      remove_field => [ "message", "host", "@timestamp", "@version" ]
    }
    mutate {
      convert => { "lon" => "float" }
      convert => { "lat" => "float" }
    }
    mutate {
      rename => {
        "lon" => "[location][lon]"
        "lat" => "[location][lat]"
      }
    }
  }
}
output {
  stdout { codec => dots }
  elasticsearch {
    index => "base_map_simple"
    template => "simple_base_map_template.json"
    document_type => "node_points"
  }
}
I then run the following:
$ cat simple_base_map.csv | logstash-2.1.3/bin/logstash -f simple_base_map.conf
Settings: Default filter workers: 16
Logstash startup completed
....................................................................................................Logstash shutdown completed
However, when looking at the base_map_simple index, it appears the documents do not have a location field of type geo_point; instead, location is mapped as two double fields, lat and lon.
$ curl -XGET 'localhost:9200/base_map_simple?pretty'
{
  "base_map_simple" : {
    "aliases" : { },
    "mappings" : {
      "node_points" : {
        "properties" : {
          "location" : {
            "properties" : {
              "lat" : {
                "type" : "double"
              },
              "lon" : {
                "type" : "double"
              }
            }
          }
        }
      }
    },
    "settings" : {
      "index" : {
        "creation_date" : "1457355015883",
        "uuid" : "luWGyfB3ToKTObSrbBbcbw",
        "number_of_replicas" : "1",
        "number_of_shards" : "5",
        "version" : {
          "created" : "2020099"
        }
      }
    },
    "warmers" : { }
  }
}
How would I need to change any of the above files to ensure that the field goes into Elasticsearch as a geo_point type?
Finally, I would like to be able to carry out a nearest neighbour search on the geo_points by using a command such as the following:
curl -XGET 'localhost:9200/base_map_simple/_search?pretty' -d'
{
  "size": 1,
  "sort": {
    "_geo_distance" : {
      "location" : {
        "lat" : 50,
        "lon" : -1
      },
      "order" : "asc",
      "unit": "m"
    }
  }
}'
Thanks
The problem is that in your elasticsearch output you named the index base_map_simple while in your template the template property is base_map_template, hence the template is not being applied when creating the new index. The template property needs to somehow match the name of the index being created in order for the template to kick in.
It will work if you simply change the latter to base_map_*, i.e. as in:
{
  "template": "base_map_*",      <--- change this
  "order": 1,
  "settings": {
    "index.number_of_shards": 1
  },
  "mappings": {
    "node_points": {
      "properties": {
        "location": {
          "type": "geo_point"
        }
      }
    }
  }
}
UPDATE
Make sure to delete the current index as well as the template first, i.e.:
curl -XDELETE localhost:9200/base_map_simple
curl -XDELETE localhost:9200/_template/logstash
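After deleting both and re-running the Logstash pipeline, checking the mapping confirms the template was applied; for example:
curl -XGET 'localhost:9200/base_map_simple/_mapping?pretty'
The location field should now show "type" : "geo_point" instead of the lat/lon double sub-fields.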
