How to split file name in logstash? - elasticsearch

I am ingesting files from an S3 bucket into Logstash. The file name contains some information, and I want to split it into multiple fields so I can use them separately. Please help me, I am new to ELK.
input {
  s3 {
    bucket => "***********"
    access_key_id => "***********"
    secret_access_key => "*******"
    region => "*********"
    prefix => "Logs"
    interval => "1"
    additional_settings => {
      "force_path_style" => true
      "follow_redirects" => false
    }
  }
}
filter {
  mutate {
    add_field => {
      "file" => "%{[@metadata][s3][key]}"   # This file name has to be split
    }
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "indexforlogstash"
  }
}

In the filter section you can leverage the dissect filter in order to achieve what you want:
filter {
  ...
  dissect {
    mapping => {
      "file" => "Logs/%{deviceId}-%{buildId}-log.txt"
    }
  }
}
After going through this filter, your document is going to get two new fields, namely:
deviceId (1232131)
buildId (custombuildv12)
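Putting the two pieces together, the whole filter section could look like the sketch below. The Logs/1232131-custombuildv12-log.txt key layout and the deviceId/buildId field names come from the example above, so adjust the dissect mapping if your S3 keys follow a different naming scheme.
filter {
  mutate {
    add_field => {
      # the S3 object key, e.g. "Logs/1232131-custombuildv12-log.txt"
      "file" => "%{[@metadata][s3][key]}"
    }
  }
  dissect {
    mapping => {
      # splits the key into deviceId (1232131) and buildId (custombuildv12)
      "file" => "Logs/%{deviceId}-%{buildId}-log.txt"
    }
  }
}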

Related

How to export Elasticsearch Index as CSV file to Google Cloud Storage Using Logstash

I am using Elasticsearch. We create a day-wise index and a huge amount of data is ingested every minute. I want to export a few fields from the index created every day to Google Cloud Storage. I am able to achieve this with a JSON output file as shown below:
input {
  elasticsearch {
    hosts => "localhost:9200"
    index => "test"
    query => '
    {
      "_source": ["field1","field2"],
      "query": {
        "match_all": {}
      }
    }
    '
  }
}
filter {
  mutate {
    rename => {
      "field1" => "test1"
      "field2" => "test2"
    }
  }
}
output {
  google_cloud_storage {
    codec => csv {
      include_headers => true
      columns => [ "test1", "test2" ]
    }
    bucket => "bucketName"
    json_key_file => "creds.json"
    temp_directory => "/tmp"
    log_file_prefix => "logstash_gcs"
    max_file_size_kbytes => 1024
    date_pattern => "%Y-%m-%dT%H:00"
    flush_interval_secs => 600
    gzip => false
    uploader_interval_secs => 600
    include_uuid => true
    include_hostname => true
  }
}
However, how do I export it as a CSV file and send it to Google Cloud Storage?
You should be able to change output_format to plain, but this setting is going to be deprecated.
Instead, remove output_format and use the codec setting, which supports a csv output format:
google_cloud_storage {
  ...
  codec => csv {
    include_headers => true
    columns => [ "field1", "field2" ]
  }
}
If you want to rename your fields, you can add a filter section and mutate/rename the fields however you like. Make sure to also change the columns setting in your csv codec output:
filter {
  mutate {
    rename => {
      "field1" => "renamed1"
      "field2" => "renamed2"
    }
  }
}
output {
  ...
}
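For example, with the rename above, the columns in the csv codec have to list the new names. A sketch reusing the placeholder bucket settings from the question:
output {
  google_cloud_storage {
    codec => csv {
      include_headers => true
      # must match the renamed field names from the mutate filter
      columns => [ "renamed1", "renamed2" ]
    }
    bucket => "bucketName"
    json_key_file => "creds.json"
    temp_directory => "/tmp"
  }
}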

Logstash how to give different index name to ElasticSearch based on file name

I have the following .conf file for Logstash:
input {
  file {
    path => "C:/elastic/logstash-8.3.2/config/*.csv"
    start_position => "beginning"
    sincedb_path => "NULL"
  }
}
filter {
  csv {
    separator => ";"
    columns => ["name","deposit","month"]
  }
  mutate {
    convert => {
      "deposit" => "integer"
    }
  }
}
output {
  elasticsearch {
    hosts => "http://localhost:9200"
    index => "payment_test"
  }
  stdout {}
}
I get inputs from 10 .csv files, which have names like in-0.csv, in-1.csv and so on. I want the index names in ElasticSearch to be payment_test-0, payment_test-1 and so on for the corresponding .csv input files (the data in in-0.csv would be in index payment_test-0 and so on). How can I achieve this?
I would simply do it like this with the dissect filter instead of grok:
filter {
  ... your other filters
  dissect {
    mapping => {
      "[log][file][path]" => "%{?ignore_path}/in-%{file_no}.csv"
    }
  }
}
output {
  elasticsearch {
    hosts => "http://localhost:9200"
    index => "payment_test-%{file_no}"
  }
  stdout {}
}
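One caveat worth adding (not part of the original answer): if a path ever fails to match the dissect pattern, the event is tagged with _dissectfailure and %{file_no} stays unsubstituted in the index name. A sketch of one way to guard against that:
output {
  if "_dissectfailure" not in [tags] {
    elasticsearch {
      hosts => "http://localhost:9200"
      index => "payment_test-%{file_no}"
    }
  } else {
    # fall back to a fixed index for events whose path did not match
    elasticsearch {
      hosts => "http://localhost:9200"
      index => "payment_test"
    }
  }
  stdout {}
}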
You can create a new field as shown below and then use it in the index name:
input {
  file {
    path => "C:/elastic/logstash-8.3.2/config/*.csv"
    start_position => "beginning"
    sincedb_path => "NULL"
  }
}
filter {
  csv {
    separator => ";"
    columns => ["name","deposit","month"]
  }
  mutate {
    convert => {
      "deposit" => "integer"
    }
  }
  grok {
    match => ["path","%{GREEDYDATA}/%{GREEDYDATA:file_name}\.csv"]
  }
  grok {
    match => { "file_name" => "^.{3}(?<file_no>.)" }
  }
}
output {
  elasticsearch {
    hosts => "http://localhost:9200"
    index => "payment_test-%{file_no}"
  }
  stdout {}
}
I have used file_name as the field name here, but you can use whatever field your file name actually arrives in.
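As a side note that goes beyond both answers: on Logstash 8.x with ECS compatibility enabled, the file path usually lives in [log][file][path] rather than path, and the two grok calls can be collapsed into a single one that captures the number directly. A sketch, assuming the in-<N>.csv naming from the question:
filter {
  grok {
    # capture the number between "in-" and ".csv" at the end of the path
    match => { "[log][file][path]" => "in-%{INT:file_no}\.csv$" }
  }
}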

How to add numeric IDs to elasticsearch documents when reading from CSV file using Logstash?

After importing my elasticsearch documents from a CSV file using Logstash, my documents have their ID value set to long alphanumeric strings. How can I have each document ID set to a numeric value instead?
Here is basically what my logstash config looks like:
input {
  file {
    path => "/path/to/movies.csv"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}
filter {
  csv {
    columns => ["title","director","year","country"]
    separator => ","
  }
  mutate {
    convert => {
      "year" => "integer"
    }
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "movie"
    document_type => "movie"
  }
  stdout {}
}
The first and easiest option is to add a new ID column in your CSV and use that field as the document id.
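A minimal sketch of that first option, assuming an id column is added at the front of the CSV (the column name is illustrative):
filter {
  csv {
    # "id" is the hypothetical new column added to the CSV
    columns => ["id","title","director","year","country"]
    separator => ","
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "movie"
    document_id => "%{id}"
  }
}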
Another option is to use a ruby filter that adds a dynamic ID to your events. The downside of this solution is that if your CSV changes and you re-run your pipeline, each document might not get the same ID. Another downside is that you need to run your pipeline with only one worker (i.e. with -w 1), because the @id_seq counter cannot be shared between pipeline workers.
filter {
  csv {
    columns => ["title","director","year","country"]
    separator => ","
  }
  mutate {
    convert => {
      "year" => "integer"
    }
  }
  # create ID (instance variable so init and code share state)
  ruby {
    init => "@id_seq = 0"
    code => "
      event.set('id', @id_seq)
      @id_seq += 1
    "
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "movie"
    document_type => "movie"
    document_id => "%{id}"
  }
  stdout {}
}

Logstash Duplicate Data

I have duplicate data in Logstash.
How can I remove this duplication?
My input is:
input {
  file {
    path => "/var/log/flask/access*"
    type => "flask_access"
    max_open_files => 409599
  }
  stdin{}
}
The filter for the files is:
filter {
  mutate { replace => { "type" => "flask_access" } }
  grok {
    match => { "message" => "%{FLASKACCESS}" }
  }
  mutate {
    add_field => {
      "temp" => "%{uniqueid} %{method}"
    }
  }
  if "Entering" in [api_status] {
    aggregate {
      task_id => "%{temp}"
      code => "map['blockedprocess'] = 2"
      map_action => "create"
    }
  }
  if "Entering" in [api_status] or "Leaving" in [api_status] {
    aggregate {
      task_id => "%{temp}"
      code => "map['blockedprocess'] -= 1"
      map_action => "update"
    }
  }
  if "End Task" in [api_status] {
    aggregate {
      task_id => "%{temp}"
      code => "event['blockedprocess'] = map['blockedprocess']"
      map_action => "update"
      end_of_task => true
      timeout => 120
    }
  }
}
The same log entry shows up multiple times, with the same timestamp, even though I only sent one log request.
I solved it.
I created a unique id with document_id in the output section.
document_id points to my temp field, and temp is the unique id in my project.
My output changed to:
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    document_id => "%{temp}"
    # sniffing => true
    # manage_template => false
    # index => "%{[@metadata][beat]}-%{+YYYY.MM.dd}"
    # document_type => "%{[@metadata][type]}"
  }
  stdout { codec => rubydebug }
}
Executing tests in my local lab, I found out that Logstash is sensitive to the number of config files kept in the /etc/logstash/conf.d directory: it concatenates all of them into a single pipeline, so with more than one config file you can end up with duplicates of the same record.
So try removing all backup configs from the /etc/logstash/conf.d directory and restart Logstash.
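If you genuinely need several config files in conf.d, one common workaround (not part of the original answer) is to wrap each output in a conditional on the event type, so events flowing through the concatenated pipeline are written only once. A sketch reusing the flask_access type from the question:
output {
  if [type] == "flask_access" {
    elasticsearch {
      hosts => ["localhost:9200"]
      document_id => "%{temp}"
    }
  }
}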

How to type data input in logstash

I'm trying to load a CSV file into Elasticsearch through Logstash.
Here is my configuration file:
input {
  file {
    codec => plain {
      charset => "ISO-8859-1"
    }
    path => ["PATH/*.csv"]
    sincedb_path => "PATH/.sincedb_path"
    start_position => "beginning"
  }
}
filter {
  if [message] =~ /^"ID","DATE"/ {
    drop { }
  }
  date {
    match => [ "DATE","yyyy-MM-dd HH:mm:ss" ]
    target => "DATE"
  }
  csv {
    columns => ["ID","DATE",...]
    separator => ","
    source => message
    remove_field => ["message","host","path","@version","@timestamp"]
  }
}
output {
  elasticsearch {
    embedded => false
    host => "localhost"
    cluster => "elasticsearch"
    node_name => "localhost"
    index => "index"
    index_type => "type"
  }
}
Now, the mapping produced in Elasticsearch types the DATE field as a string. I would like it to be typed as a date field.
In the filter section, I tried to convert the DATE field to a date, but it doesn't work.
How can I fix that?
Regards,
Alexandre
You have your filter chain set up in the wrong order. The date{} block needs to come after the csv{} block.
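In other words, the filter section should roughly look like this, with the same settings as in the question but reordered so the DATE field exists before the date filter runs:
filter {
  if [message] =~ /^"ID","DATE"/ {
    drop { }
  }
  csv {
    columns => ["ID","DATE",...]
    separator => ","
    source => message
    remove_field => ["message","host","path","@version","@timestamp"]
  }
  date {
    match => [ "DATE","yyyy-MM-dd HH:mm:ss" ]
    target => "DATE"
  }
}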
