Replica and shard settings not applied in elasticsearch template - elasticsearch

I've added a template like this:
curl -X PUT "e.f.g.h:9200/_template/impression-template" -H 'Content-Type: application/json' -d'
{
"index_patterns": ["impression-%{+YYYY.MM.dd}"],
"settings": {
"number_of_shards": 2,
"number_of_replicas": 2
},
"mappings": {
"_doc": {
"_source": {
"enabled": false
},
"dynamic": false,
"properties": {
"message": {
"type": "object",
"properties": {
...
And I have a Logstash instance that reads events from Kafka and writes them to ES. Here is my Logstash config:
input {
  kafka {
    topics => ["impression"]
    bootstrap_servers => "a.b.c.d:9092"
  }
}
filter {
  json {
    source => "message"
    target => "message"
  }
}
output {
  elasticsearch {
    hosts => ["e.f.g.h:9200"]
    index => "impression-%{+YYYY.MM.dd}"
    template_name => "impression-template"
  }
}
But each day I get an index with 5 shards and 1 replica (the ES defaults). How can I fix this so that I get 2 shards and 2 replicas?

I'm not sure you can set index_patterns to my_index-%{+YYYY.MM.dd}, because when you PUT my_index-2019.03.10 the name is not recognized by that pattern and the index ends up with an empty mapping. I had the same issue, and my workaround was to set index_patterns to my_index-* and add a year suffix to the indices, so they look like my_index-2017, my_index-2018...
{
  "my_index_template" : {
    "order" : 0,
    "index_patterns" : [
      "my_index-*"
    ],
    "settings" : {
      "index" : {
        "number_of_shards" : "5",
        "number_of_replicas" : "1"
      }
    },...
I took the year part from the timestamp field (YYYY-MM-dd) and appended it to the end of the index name in Logstash:
filter {
  grok {
    match => [
      "timestamp", "(?<index_year>%{YEAR})"
    ]
  }
  mutate {
    add_field => {
      "[@metadata][index_year]" => "%{index_year}"
    }
  }
  mutate {
    remove_field => [ "index_year", "@version" ]
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "my_index-%{[@metadata][index_year]}"
    document_id => "%{some_field}"
  }
}
After Logstash completed, I ended up with my_index-2017, my_index-2018 and my_index-2019 indices with 5 shards, 1 replica, and the correct mapping as predefined in my template.
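For the original impression template, a sketch (untested here) of the same fix: only the index_patterns value changes, while the settings and mappings sections stay exactly as in the question:
curl -X PUT "e.f.g.h:9200/_template/impression-template" -H 'Content-Type: application/json' -d'
{
  "index_patterns": ["impression-*"],
  "settings": {
    "number_of_shards": 2,
    "number_of_replicas": 2
  },
  "mappings": {
    ...
  }
}'
The Logstash output keeps index => "impression-%{+YYYY.MM.dd}"; the concrete daily name, e.g. impression-2019.03.10, then matches impression-* and is created with 2 shards and 2 replicas.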

Logstash - how to change field types

Hello guys, I am new to Elasticsearch and I have a table called 'american_football_sacks_against_stats'. It consists of three columns:
id  | sacks_against_total | sacks_against_yards
1   | 12                  | 5
2   | 15                  | 3
... | ...                 | ...
The problem is that sacks_against_total and sacks_against_yards aren't 'imported' as integers/longs/floats but as a text field plus a keyword field. How can I convert them into numbers?
I tried this, but it's not working:
mutate {
  convert => {
    "id" => "integer"
    "sacks_against_total" => "integer"
    "sacks_against_yards" => "integer"
  }
}
This is my logstash.conf file:
input {
  jdbc {
    jdbc_connection_string => "jdbc:postgresql://localhost:5432/sportsdb"
    jdbc_user => "user"
    jdbc_password => "password"
    jdbc_driver_class => "org.postgresql.Driver"
    schedule => "*/5 * * * *"
    statement => "SELECT * FROM public.american_football_sacks_against_stats"
    jdbc_paging_enabled => "true"
    jdbc_page_size => "300"
  }
}
filter {
  mutate {
    convert => {
      "id" => "integer"
      "sacks_against_total" => "integer"
      "sacks_against_yards" => "integer"
    }
  }
}
output {
  stdout { codec => "json" }
  elasticsearch {
    hosts => "http://localhost:9200"
    index => "sportsdb"
    doc_as_upsert => true #
  }
}
This is the solution I was looking for:
https://linuxhint.com/elasticsearch-reindex-change-field-type/
To start, you create an ingest pipeline that converts the field types:
PUT _ingest/pipeline/convert_pipeline
{
  "description": "converts the sacks_against_total and sacks_against_yards fields from string to long",
  "processors" : [
    {
      "convert" : {
        "field" : "sacks_against_yards",
        "type": "long"
      }
    },
    {
      "convert" : {
        "field" : "sacks_against_total",
        "type": "long"
      }
    }
  ]
}
Or in cURL:
curl -XPUT "http://localhost:9200/_ingest/pipeline/convert_pipeline" -H 'Content-Type: application/json' -d'{ "description": "converts the sacks_against_total field to a long from string", "processors" : [ { "convert" : { "field" : "sacks_against_total", "type": "long" } }, {"convert" : { "field" : "sacks_against_yards", "type": "long" } } ]}'
And then reindex the index into another one:
POST _reindex
{
  "source": {
    "index": "american_football_sacks_against_stats"
  },
  "dest": {
    "index": "american_football_sacks_against_stats_withlong",
    "pipeline": "convert_pipeline"
  }
}
Or in cURL:
curl -XPOST "http://localhost:9200/_reindex" -H 'Content-Type: application/json' -d'{ "source": { "index": "sportsdb" }, "dest": { "index": "sportsdb_finish", "pipeline": "convert_pipeline" }}'
Elasticsearch does not provide the functionality to change the type of an existing field.
You can read here about some options:
https://medium.com/@max.borysov/change-field-type-in-elasticsearch-index-2d11bb366517

Elasticsearch - Reindex documents with stored / excluded fields

I have an index mapping with the following configuration:
"mappings" : {
"_source" : {
"excludes" : [
"special_field"
]
},
"properties" : {
"special_field" : {
"type" : "text",
"store" : true
},
}
}
So, when a new document is indexed using this mapping, I get the following result:
{
  "_index": "********-2021",
  "_id": "************",
  "_source": {
    ...
  },
  "fields": {
    "special_field": [
      "my special text"
    ]
  }
}
If a _search query is performed, special_field is not returned inside _source, as it is excluded.
With the following _search query, special_field data is returned perfectly:
GET ********-2021/_search
{
  "stored_fields": [ "special_field" ],
  "_source": true
}
Right now I'm trying to reindex all documents inside that index, but I'm losing the info stored in special_field and only the _source field is getting reindexed.
Is there a way to put that special_field back inside the _source field?
Is there a way to reindex those documents without losing the special_field data?
How could these documents be migrated to another cluster without losing the special_field data?
Thank you all.
Thanks Hamid Bayat, I finally got it working using a small Logstash pipeline.
I will share it:
input {
  elasticsearch {
    hosts => "my-first-cluster:9200"
    index => "my-index-pattern-*"
    user => "****"
    password => "****"
    query => '{ "stored_fields": [ "special_field" ], "_source": true }'
    size => 500
    scroll => "5m"
    docinfo => true
    docinfo_fields => ["_index", "_type", "_id", "fields"]
  }
}
filter {
  if [@metadata][fields][special_field] {
    mutate {
      add_field => { "special_field" => "%{[@metadata][fields][special_field]}" }
    }
  }
}
output {
  elasticsearch {
    hosts => ["http://my-second-cluster:9200"]
    password => "****"
    user => "****"
    index => "%{[@metadata][_index]}"
    document_id => "%{[@metadata][_id]}"
    template => "/usr/share/logstash/config/index_template.json"
    template_name => "template-name"
    template_overwrite => true
  }
}
I had to add "fields" to docinfo_fields => ["_index", "_type", "_id", "fields"] in the elasticsearch input plugin, and all my stored_fields then showed up under the [@metadata][fields] event field.
Since the @metadata field is not indexed, I had to add a new field at the root level with the [@metadata][fields][special_field] value.
It's working like a charm.
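The index_template.json file referenced in the output is not shown in the post. A minimal sketch of what it could contain, assuming the new cluster should keep special_field as a stored text field (the _source exclusion from the original mapping can be re-added if the same behaviour is wanted on the second cluster too):
{
  "index_patterns": ["my-index-pattern-*"],
  "mappings": {
    "properties": {
      "special_field": {
        "type": "text",
        "store": true
      }
    }
  }
}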

Combine two index into third index in elastic search using logstash

I have two indices:
employee_data
{"code": 1, "name": "xyz", "city": "Mumbai"}
transaction_data
{"code": 1, "Month": "June", "payment": 78000}
I want a third index like this:
join_index
{"code": 1, "name": "xyz", "city": "Mumbai", "Month": "June", "payment": 78000}
How is that possible?
I am trying this in Logstash:
input {
  elasticsearch {
    hosts => "localhost"
    index => "employees_data,transaction_data"
    query => '{ "query": { "match": { "code": 1 } } }'
    scroll => "5m"
    docinfo => true
  }
}
output {
  elasticsearch {
    hosts => ["localhost"]
    index => "join1"
  }
}
You can use the elasticsearch input on employees_data.
In your filters, use the elasticsearch filter on transaction_data:
input {
  elasticsearch {
    hosts => "localhost"
    index => "employees_data"
    query => '{ "query": { "match_all": { } } }'
    sort => "code:desc"
    scroll => "5m"
    docinfo => true
  }
}
filter {
  elasticsearch {
    hosts => "localhost"
    index => "transaction_data"
    query => 'code:"%{[code]}"'
    fields => {
      "Month" => "Month"
      "payment" => "payment"
    }
  }
}
output {
  elasticsearch {
    hosts => ["localhost"]
    index => "join1"
  }
}
Then send the new document to your third index with the elasticsearch output.
You'll have 3 Elasticsearch connections and the result can be a little slow, but it works.
You don't need Logstash to do this; Elasticsearch supports it natively by leveraging the enrich processor.
First, you need to create an enrich policy (use the smallest index; let's say it's employees_data):
PUT /_enrich/policy/employee-policy
{
  "match": {
    "indices": "employees_data",
    "match_field": "code",
    "enrich_fields": ["name", "city"]
  }
}
Then you can execute that policy in order to create an enrichment index:
POST /_enrich/policy/employee-policy/_execute
When the enrichment index has been created and populated, the next step requires you to create an ingest pipeline that uses the above enrich policy/index:
PUT /_ingest/pipeline/employee_lookup
{
  "description" : "Enriching transactions with employee data",
  "processors" : [
    {
      "enrich" : {
        "policy_name": "employee-policy",
        "field" : "code",
        "target_field": "tmp",
        "max_matches": "1"
      }
    },
    {
      "script": {
        "if": "ctx.tmp != null",
        "source": "ctx.putAll(ctx.tmp); ctx.remove('tmp');"
      }
    }
  ]
}
Finally, you're now ready to create your target index with the joined data. Simply leverage the _reindex API combined with the ingest pipeline we've just created:
POST _reindex
{
  "source": {
    "index": "transaction_data"
  },
  "dest": {
    "index": "join1",
    "pipeline": "employee_lookup"
  }
}
After running this, the join1 index will contain exactly what you need, for instance:
{
  "_index" : "join1",
  "_type" : "_doc",
  "_id" : "0uA8dXMBU9tMsBeoajlw",
  "_score" : 1.0,
  "_source" : {
    "code": 1,
    "name": "xyz",
    "city": "Mumbai",
    "Month": "June",
    "payment": 78000
  }
}
As far as I know, this cannot be done with the Elasticsearch APIs alone. To handle it, you need a unique ID shared by the documents that belong together; for example, the code field you mentioned in your question would be a good ID. You can then reindex the first index into the third one and use the update API to enrich those documents: read the documents from the second index and update the matching documents in the third index by their IDs. I hope this helps.
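A rough sketch of that approach (hypothetical requests against an ES 7.x cluster, assuming code is unique and can serve as the document _id): first reindex employees_data into join1 with _id taken from code, then push the matching transaction fields with the update API:
POST _reindex
{
  "source": { "index": "employees_data" },
  "dest": { "index": "join1" },
  "script": { "source": "ctx._id = ctx._source.code.toString()" }
}

POST join1/_update/1
{
  "doc": { "Month": "June", "payment": 78000 }
}
In practice the update calls would be generated by scrolling over transaction_data rather than written by hand.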

ILM using Logstash Elasticsearch output plugin doesn't work

I'm trying to implement ILM for an index to make proper use of the hardware, using the Elasticsearch output plugin. It looks like I'm misunderstanding how Logstash manages ILM.
I have the ELK stack version 7.1.0 in Docker. X-Pack is activated with a trial license.
The index template is managed by Logstash Elasticsearch output plugin and the index lifecycle policy was created using Kibana.
Here is the output section of Logstash pipeline:
output {
  elasticsearch {
    hosts => ["http://eshost:9200"]
    user => "logstash_writer"
    password => "pass"
    template => "/usr/share/logstash/es_templates/ilm-template.json"
    template_name => "ilm-template"
    template_overwrite => true
    ilm_enabled => true
    ilm_rollover_alias => "ilm-index"
    ilm_pattern => "000001"
    ilm_policy => "base-policy"
  }
}
The logstash_writer user has the standard logstash_writer role with permissions for ILM management.
Elasticsearch index template ilm-template.json:
{
  "settings" : {
    "index.number_of_replicas" : "1",
    "index.number_of_shards" : "1",
    "index.refresh_interval" : "5s"
  }
}
Elasticsearch index template _template/ilm-template that was actually created by Logstash:
{
  "ilm-template" : {
    "order" : 0,
    "index_patterns" : [
      "ilm-index-*"
    ],
    "settings" : {
      "index" : {
        "lifecycle" : {
          "name" : "base-policy",
          "rollover_alias" : "ilm-index"
        },
        "refresh_interval" : "5s",
        "number_of_shards" : "1",
        "number_of_replicas" : "1"
      }
    },
    "mappings" : { },
    "aliases" : { }
  }
}
Policy base-policy created using Kibana:
{
  "policy": {
    "phases": {
      "hot": {
        "min_age": "0ms",
        "actions": {
          "rollover": {
            "max_size": "100mb",
            "max_docs": 100000
          },
          "set_priority": {
            "priority": 100
          }
        }
      },
      "delete": {
        "min_age": "2d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}
I expect a set of ilm-index-* indices, but only ilm-index-000001 is created and it keeps growing, despite the limits in base-policy. In Kibana I only see one index (ilm-index-000001) associated with base-policy.
The provided configuration is correct. The problem is in the interpretation of the max_size and max_docs parameters when they have small values: Elasticsearch does not roll an index over the moment its pri.store.size and docs.count exceed max_size and max_docs, because the rollover conditions are only evaluated periodically (every 10 minutes by default, controlled by indices.lifecycle.poll_interval).
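Two standard APIs help when diagnosing this (shown as an illustrative sketch against the alias and index names above): the ILM explain API reports which phase and condition each index is waiting on, and temporarily lowering the poll interval makes the rollover visible sooner:
GET ilm-index-*/_ilm/explain

PUT _cluster/settings
{
  "transient": {
    "indices.lifecycle.poll_interval": "1m"
  }
}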

upload csv with logstash to elasticsearch with new mappings

I have a CSV file which I'm trying to upload to ES using Logstash. My conf file is as follows:
input {
  file {
    path => ["filename"]
    start_position => "beginning"
  }
}
filter {
  csv {
    columns => ["name1", "name2", "name3", ...]
    separator => ","
  }
}
filter {
  mutate {
    remove_field => ["name31", "name32", "name33"]
  }
}
output {
  stdout {
    codec => rubydebug
  }
  elasticsearch {
    action => "index"
    host => "localhost"
    index => "newindex"
    template_overwrite => true
    document_type => "newdoc"
    template => "template.json"
  }
}
My template file looks like the following:
{
  "mappings": {
    "newdoc": {
      "properties": {
        "name1": {
          "type": "integer"
        },
        "name2": {
          "type": "float"
        },
        "name3": {
          "format": "dateOptionalTime",
          "type": "date"
        },
        "name4": {
          "index": "not_analyzed",
          "type": "string"
        },
        ....
      }
    }
  },
  "settings": {
    "number_of_replicas": 0,
    "number_of_shards": 1
  },
  "template": "newindex"
}
When I try to overwrite the default mapping, I get a 400 error even when I only try to write one line:
failed action with response of 400, dropping action: ["index", + ...
What can be the problem? Everything works fine if I don't overwrite the mapping but that is not a solution for me. I'm using Logstash 1.5.1 and Elasticsearch 1.5.0 on Red Hat.
Thanks
You should POST your 'mapping' request to Elasticsearch before loading data into Elasticsearch.
You don't need to create the index before running Logstash; it creates the index if it doesn't exist yet. But it's better to create your own mapping before running your conf file with Logstash, as it gives you more control over your field types etc. Here is a simple tutorial on how to import CSV into Elasticsearch using Logstash: http://freefilesdl.com/how-to-connect-logstash-to-elasticsearch-output
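A sketch of that suggestion, assuming the index and type names from the question and the Elasticsearch 1.5 API used there (create the index with its mapping before starting Logstash):
curl -XPUT "http://localhost:9200/newindex" -d'
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  },
  "mappings": {
    "newdoc": {
      "properties": {
        "name1": { "type": "integer" },
        "name2": { "type": "float" },
        "name3": { "type": "date", "format": "dateOptionalTime" },
        "name4": { "type": "string", "index": "not_analyzed" }
      }
    }
  }
}'
With the index already in place, a mapping mistake shows up immediately in the PUT response rather than as a 400 error from the bulk indexing.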
