Logstash: Is there a way to change some of the properties in a document while migrating? - elasticsearch

I have been migrating some indexes from a self-hosted Elasticsearch to Amazon Elasticsearch Service using Logstash. While migrating the documents, we need to change some field names in the index based on certain logic.
Our Logstash config file:
input {
elasticsearch {
hosts => ["https://staing-example.com:443"]
user => "userName"
password => "password"
index => "testingindex"
size => 100
scroll => "1m"
}
}
filter {
}
output {
amazon_es {
hosts => ["https://example.us-east-1.es.amazonaws.com:443"]
region => "us-east-1"
aws_access_key_id => "access_key_id"
aws_secret_access_key => "access_key_id"
index => "testingindex"
}
stdout{
codec => rubydebug
}
}
Here is one of the documents from the testingindex on our self-hosted Elasticsearch:
{
"uniqueIdentifier" => "e32d331b-ce5f-45c8-beca-b729707fca48",
"createdDate" => 1527592562743,
"interactionInfo" => [
{
"value" => "Hello this is testing",
"title" => "msg",
"interactionInfoId" => "8c091cb9-e51b-42f2-acad-79ad1fe685d8"
},
{
**"value"** => """"{"edited":false,"imgSrc":"asdfadf/soruce","cont":"Collaborated in <b class=\"mention\" gid=\"4UIZjuFzMXiu2Ege6cF3R4q8dwaKb9pE\">#2222222</b> ","chatMessageObjStr":"Btester has quoted your feed","userLogin":"test.comal#google.co","userId":"tester123"}"""",
"title" => "msgMeta",
"interactionInfoId" => "f6c7203b-2bde-4cc9-a85e-08567f082af3"
}
],
"componentId" => "compId",
"status" => [
"delivered"
]
},
"accountId" => "test123",
"applicationId" => "appId"
}
This is what we expect the documents to look like once they are migrated to our Amazon Elasticsearch Service:
{
"uniqueIdentifier" => "e32d331b-ce5f-45c8-beca-b729707fca48",
"createdDate" => 1527592562743,
"interactionInfo" => [
{
"value" => "Hello this is testing",
"title" => "msg",
"interactionInfoId" => "8c091cb9-e51b-42f2-acad-79ad1fe685d8"
},
{
**"value-keyword"** => """"{"edited":false,"imgSrc":"asdfadf/soruce","cont":"Collaborated in <b class=\"mention\" gid=\"4UIZjuFzMXiu2Ege6cF3R4q8dwaKb9pE\">#2222222</b> ","chatMessageObjStr":"Btester has quoted your feed","userLogin":"test.comal#google.co","userId":"tester123"}"""",
"title" => "msgMeta",
"interactionInfoId" => "f6c7203b-2bde-4cc9-a85e-08567f082af3"
}
],
"componentId" => "compId",
"status" => [
"delivered"
]
},
"accountId" => "test123",
"applicationId" => "appId"
}
What we need is to rename the "value" field to "value-keyword" wherever its content is in JSON format. Is there a filter in Logstash to achieve this?

As documented on the Logstash website:
https://www.elastic.co/guide/en/logstash/current/plugins-filters-mutate.html#plugins-filters-mutate-rename
You can use the mutate filter with its rename option.
For example:
filter {
mutate {
replace => { "old-field" => "new-field" }
}
}
For nested fields, you could just pass the path of the field:
filter {
mutate {
replace => { "[interactionInfo][value]" => "[interactionInfo][value-keyword]" }
}
}
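One caveat for the documents shown in the question: interactionInfo is an array, so a path like [interactionInfo][value] will not match anything; array elements have to be addressed by index. A sketch of what that would look like (only practical if the position of the JSON entry is fixed, which is why the ruby approach below is usually the better fit):
filter {
  mutate {
    # Rename the value field of the second interactionInfo entry only (index 1).
    rename => { "[interactionInfo][1][value]" => "[interactionInfo][1][value-keyword]" }
  }
}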

Try adding this to your filter:
filter {
ruby {
code => "event.get('interactionInfo').each { |item| if item['value'].match(/{.+}/) then item['value-keyword'] = item.delete('value') end }"
}
}
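If some documents might be missing the value field, or carry non-string values, a slightly more defensive variant of that ruby filter could look like this (a sketch using the field names from the question):
filter {
  ruby {
    code => '
      info = event.get("interactionInfo")
      if info.is_a?(Array)
        info.each do |item|
          value = item["value"]
          # Only rename when the value is a string that looks like embedded JSON.
          if value.is_a?(String) && value.match(/\{.+\}/)
            item["value-keyword"] = item.delete("value")
          end
        end
        # Write the modified array back onto the event.
        event.set("interactionInfo", info)
      end
    '
  }
}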

Related

Elasticsearch: Duplicates caused by overwriting log files

I'm using the ELK stack. Log files are saved every 5 minutes by a simple Java app, and Filebeat then ships them to Logstash. Because the files get overwritten, the same messages are indexed again (their fingerprints are identical). The only difference is the document id: Elasticsearch assigns a new id to a document every time it gets overwritten. How can I get rid of the duplicates or keep the document id the same?
Logstash input and filter:
input {
beats {
port => 5044
ssl => false
ssl_certificate => "/etc/pki/tls/certs/logstash-beats.crt"
client_inactivity_timeout => 200
ssl_key => "/etc/pki/tls/private/logstash-beats.key"
}
}
filter {
if [fields][log_type] == "access" {
grok {
match => [ "message", "%{IP:client_ip} %{USER:ident} %{USER:auth} \[%{HTTPDATE:apache_timestamp}\] \"%{WORD:method} /%{WORD:servername}/%{NOTSPACE:requestpage} HTTP/%{NUMBER:http_version}\" %{NUMBER:server_response} %{NUMBER:answer_size}" ]
}
}
else if [fields][log_type] == "errors" {
grok {
match => {"message" => "%{DATESTAMP:maximotime}(.*)SystemErr"}
}
date {
timezone => "Europe/Moscow"
match => ["maximotime", "dd.MM.yy HH:mm:ss:SSS"]
}
mutate {
copy => { "message" => "key" }
}
mutate {
gsub => [
"message", ".*SystemErr R ", "",
"key", ".*SystemErr R", ""
]
}
truncate {
fields => "key"
length_bytes => 255
}
fingerprint {
method => "SHA1"
source => ["key"]
}
if "_grokparsefailure" in [tags] {
drop { }
}
} else if [fields][log_type] == "info" {
grok {
match => {"message" => ["%{TIMESTAMP_ISO8601:maximotime}.* ПОЛЬЗОВАТЕЛЬ = \(%{WORD:username}.*программа \(%{WORD:appname}\).*объект \(%{WORD:object}\).*: %{GREEDYDATA:sql} \(выполнение заняло %{NUMBER:execution} миллисекунд\) \{conditions:%{GREEDYDATA:conditions}\}", "%{TIMESTAMP_ISO8601:maximotime}.* ПОЛЬЗОВАТЕЛЬ = \(%{WORD:username}.*программа \(%{WORD:appname}\).*объект \(%{WORD:object}\).*: %{GREEDYDATA:sql} \{conditions:%{GREEDYDATA:conditions}\}", "%{TIMESTAMP_ISO8601:maximotime}.* ПОЛЬЗОВАТЕЛЬ = \(%{WORD:username}.*программа \(%{WORD:appname}\).*объект \(%{WORD:object}\).*: %{GREEDYDATA:sql} \(выполнение заняло %{NUMBER:execution} миллисекунд\)"]}
add_field => {
"type" => "conditions"
}
}
mutate {
convert => {
"execution" => "integer"
}
}
fingerprint {
method => "SHA1"
source => ["message"]
}
if "_grokparsefailure" in [tags] {
grok {
match => {"message" => "%{TIMESTAMP_ISO8601:maximotime} (.*)getMboCount %{WORD:object}: mbosets \(%{WORD:mbosets}\), mbos \(%{WORD:mbos}\)"}
add_field => {
"type" => "maximoObjectCount"
}
remove_tag => ["_grokparsefailure"]
}
mutate {
convert => {
"mbosets" => "integer"
"mbos" => "integer"
}
}
fingerprint {
method => "SHA1"
source => ["message"]
}
if "_grokparsefailure" in [tags] {
drop { }
}
}
date {
timezone => "Europe/Moscow"
match => ["maximotime", "yyyy-MM-dd HH:mm:ss:SSS"]
target => "maximotime"
}
}
}
Logstash output:
output {
stdout {codec => rubydebug}
if [fields][log_type] == "access" {
elasticsearch {
hosts => ["localhost"]
manage_template => false
index => "%{[#metadata][beat]}-%{+YYYY.MM.dd}"
document_type => "%{[#metadata][type]}"
}
} else if [fields][log_type] == "errors"{
elasticsearch {
hosts => ["localhost"]
manage_template => false
index => "%{[#metadata][beat]}-error-%{+YYYY.MM.dd}"
document_type => "%{[#metadata][type]}"
}
} else if [fields][log_type] == "info"{
elasticsearch {
hosts => ["localhost"]
manage_template => false
index => "%{[#metadata][beat]}-info-%{+YYYY.MM.dd}"
document_type => "%{[#metadata][type]}"
document_id => "%{fingerprint}"
}
}
}
Filebeat.yml:
filebeat.config:
modules:
path: ${path.config}/modules.d/*.yml
reload.enabled: false
processors:
- add_cloud_metadata: ~
filebeat.inputs:
- type: log
enabled: true
paths:
- /var/log/integration/*.log
fields: {log_type: access}
- type: log
enabled: true
paths:
- /var/log/maximo_error_logs/*.log
fields: {log_type: errors}
exclude_lines: '^((\*+)|Log file started at:)'
multiline.pattern: '(^$|(\t|\s)at .*|.*Caused by:.*|.*SystemErr( ){5}R[ \t]{2}at .*|^ru.ocrv..*|^(\s|\t|)null.*|Обратитесь за.*|.*Закрытое со.*|^(\s|\t|)(ORA-.*|BMX.*)|^(\\s|\t)[А-Яа-я].*)|(.*\d more$)'
multiline.negate: false
multiline.match: after
- type: log
enabled: true
paths:
- /var/log/maximo_logs/*.log
fields: {log_type: info}
output.logstash:
hosts: ["elk:5044"]
bulk_max_size: 200
I'm dumb. I was restarting the Filebeat container instead of ELK, so my Logstash config wasn't being applied... Now it's working, and my Logstash output config looks like this:
document_id => "%{type}-%{fingerprint}"
action => "create"

How to handle special characters ( " ) in an input file with Logstash

I'm having a problem with my data when pushing it to ELK using Logstash.
Here is my config file:
input {
file {
path => ["C:/Users/HoangHiep/Desktop/test17.txt"]
type => "_doc"
start_position => beginning
}
}
filter {
dissect {
mapping => {
"message" => "%{word}"
}
}
}
output {
elasticsearch{
hosts => ["localhost:9200"]
index => "test01"
}
stdout { codec => rubydebug}
}
My data is:
"day la text"
This is the output:
{
"host" => "DESKTOP-T41GENH",
"path" => "C:/Users/HoangHiep/Desktop/test17.txt",
"#timestamp" => 2020-01-15T10:04:52.746Z,
"#version" => "1",
"type" => "_doc",
"message" => "\"day la text\"\r",
"word" => "\"day la text\"\r"
}
Is there any way to handle the ( " ) character?
I want "word" to just be like "day la text \r", without the \" characters.
Thanks all.
I can explain more about this if this change works for you. The reason I say that is that I have a newer Mac, so I don't see the trailing \r in my message.
The input is just like you have it: "day la text"
filter {
mutate {
gsub => [
"message","(\")", ""
]
}
}
The response is:
{
"#timestamp" => 2020-01-15T15:01:58.828Z,
"#version" => "1",
"headers" => {
"http_version" => "HTTP/1.1",
"request_method" => "POST",
"http_accept" => "*/*",
"accept_encoding" => "gzip, deflate",
"postman_token" => "5ae8b2a0-2e94-433c-9ecc-e415731365b6",
"cache_control" => "no-cache",
"content_type" => "text/plain",
"connection" => "keep-alive",
"http_user_agent" => "PostmanRuntime/7.21.0",
"http_host" => "localhost:8080",
"content_length" => "13",
"request_path" => "/"
},
"host" => "0:0:0:0:0:0:0:1",
"message" => "day la text" <===== see the extra inbuilt `\"` gone.
}
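One detail worth noting for the original pipeline: filters run in the order they are declared, and dissect copies message into word, so a gsub on message alone will not clean word if it runs after dissect. A sketch that strips the quotes first and then dissects (assuming the mapping from the question):
filter {
  mutate {
    # Strip the double quotes from message before dissect copies it into word.
    gsub => [ "message", "(\")", "" ]
  }
  dissect {
    mapping => {
      "message" => "%{word}"
    }
  }
}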

Recursive not working in kv filter in logstash

I want to understand the use of the recursive option in the kv filter. I am working with a CSV file, which I uploaded to ES using Logstash. After reading the guide from this link https://www.elastic.co/guide/en/logstash/current/plugins-filters-kv.html#plugins-filters-kv-recursive
I came to understand that it duplicates the key/value pairs and stores them in a separate key, but I can't find additional info or examples about the option. I added the recursive line to my Logstash config file. No changes.
Does it duplicate the fields with values (key-value pairs), or what exactly does this option do?
Here's my sample CSV data after passing through Logstash:
"host" => "smackcoders",
"Driveline" => "Four-wheel drive",
"Make" => "Jeep",
"Width" => "79",
"Torque" => "260",
"Year" => "2012",
"Horsepower" => "285",
"City_mpg" => "17",
"Height" => "34",
"Classification" => "Manual,Transmission",
"Model_Year" => "2012 Jeep Wrangler",
"Number_of_Forward_Gears" => "6",
"Length" => "41",
"Highway_mpg" => "21",
"#version" => "1",
"message" => "17,\"Manual,Transmission\",Four-wheel drive,Jeep 3.6L 6 Cylinder 280 hp 260 lb-ft,Gasoline,34,21,285,False,2012 Jeep Wrangler Arctic,41,Jeep,2012 Jeep Wrangler,6,260,6 Speed Manual,79,2012",
"Fuel_Type" => "Gasoline",
"Engine_Type" => "Jeep 3.6L 6 Cylinder 280 hp 260 lb-ft",
"path" => "/home/paulsteven/log_cars/cars.csv",
"Hybrid" => "False",
"ID" => "2012 Jeep Wrangler Arctic",
"#timestamp" => 2019-04-20T07:58:26.552Z,
"Transmission" => "6 Speed Manual"
}
Here's the config file:
input {
file {
path => "/home/paulsteven/log_cars/cars.csv"
start_position => "beginning"
sincedb_path => "/dev/null"
}
}
filter {
csv {
separator => ","
columns => ["City_mpg","Classification","Driveline","Engine_Type","Fuel_Type","Height","Highway_mpg","Horsepower","Hybrid","ID","Length","Make","Model_Year","Number_of_Forward_Gears","Torque","Transmission","Width","Year"]
}
kv {
recursive => "true"
}
}
output {
elasticsearch {
hosts => "localhost:9200"
index => "kvfilter1"
document_type => "details"
}
stdout{}
}
I found some examples for recursive in the kv filter:
input { generator { count => 1 message => 'foo=1,bar="foor=10,barr=11"' } }
filter {
kv { field_split => "," value_split => "=" recursive => false }
}
will produce
"foo" => "1",
"bar" => "foor=10,barr=11",
whereas
input { generator { count => 1 message => 'foo=1,bar="foor=10,barr=11"' } }
filter {
kv { field_split => "," value_split => "=" recursive => true }
}
will produce
"foo" => "1",
"bar" => {
"foor" => "10",
"barr" => "11"
},
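As for why recursive appeared to do nothing in the CSV pipeline above: kv parses the message field by default, and a CSV row like the one shown contains no key=value pairs, so there is nothing for recursive to act on. It only matters when a value itself contains key=value text, as in the generator examples. A sketch pointing kv at a hypothetical field (raw_kv here) that does hold such text:
filter {
  kv {
    source => "raw_kv"      # hypothetical field containing e.g. foo=1,bar="foor=10,barr=11"
    target => "parsed"      # store the extracted pairs under one key
    field_split => ","
    value_split => "="
    recursive => true       # also parse key=value pairs found inside quoted values
  }
}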

Can't send @metadata to Elasticsearch

I want to include the @metadata field contents in my Elasticsearch output.
This is the output when I am using stdout in my output section:
{
"#timestamp" => 2018-03-08T08:17:42.059Z,
"thread_name" => "SimpleAsyncTaskExecutor-2",
"#metadata" => {
"dead_letter_queue" => {
"entry_time" => 2018-03-08T08:17:50.082Z,
"reason" => "Could not index event to Elasticsearch. status: 400, action: ["index", {:_id=>nil, :_index=>"applog-2018.03.08", :_type=>"doc", :_routing=>nil}, #LogStash::Event:0x3ab79ab5], response: {"index"=>{"_index"=>"applog-2018.03.08", "_type"=>"doc", "_id"=>"POuwBGIB0PJDPQOoDy1Q", "status"=>400, "error"=>{"type"=>"mapper_parsing_exception", "reason"=>"failed to parse [message]", "caused_by"=>{"type"=>"illegal_state_exception", "reason"=>"Can't get text on a START_OBJECT at 1:223"}}}}",
"plugin_type" => "elasticsearch",
"plugin_id" => "7ee60ceccc2ef7c933cf5aa718d42f24a65b489e12a1e1c7b67ce82e04ef0d37"
}
},
"#version" => "1",
"beat" => {
"name" => "filebeat-kwjn6",
"version" => "6.0.0"
},
"dateOffset" => 408697,
"source" => "/var/log/applogs/spring-cloud-dataflow/Log.log",
"logger_name" => "decurtis.dxp.deamon.JobConfiguration",
"message" => {
"timeStamp" => "2018-01-30",
"severity" => "ERROR",
"hostname" => "",
"commonUtility" => {},
"offset" => "Etc/UTC",
"messageCode" => "L_9001",
"correlationId" => "ea5b13c3-d395-4fa5-8124-19902e400316",
"componentName" => "dxp-deamon-refdata-country",
"componentVersion" => "1",
"message" => "Unhandled exceptions",
},
"tags" => [
[0] "webapp-log",
[1] "beats_input_codec_plain_applied",
[2] "_jsonparsefailure"
]
}
I want the @metadata field contents in my Elasticsearch output.
Below is my conf file:
input {
dead_letter_queue {
path => "/usr/share/logstash/data/dead_letter_queue"
commit_offsets => true
pipeline_id => "main"
}
}
filter {
json {
source => "message"
}
mutate {
rename => { "[#metadata][dead_letter_queue][reason]" => "reason" }
}
}
output {
elasticsearch {
hosts => "elasticsearch"
manage_template => false
index => "deadletterlog-%{+YYYY.MM.dd}"
}
}
Now in my output there is a field called "reason", but without any content. Is there something I am missing?
This can help:
mutate {
add_field => {
"reason" => "%{[#metadata][dead_letter_queue][reason]}"
"plugin_id" => "%{[#metadata][dead_letter_queue][plugin_id]}"
"plugin_type" => "%{[#metadata][dead_letter_queue][plugin_type]}"
}
}
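The reason the copy is needed at all: Logstash outputs do not forward the @metadata subtree, so anything from it that should end up in Elasticsearch has to be written into a regular field in the filter section first. A sketch of the complete filter block with that in place (field names taken from the dead-letter-queue metadata shown above):
filter {
  json {
    source => "message"
  }
  mutate {
    # Copy the dead-letter-queue details out of @metadata into regular fields
    # so the elasticsearch output can index them.
    add_field => {
      "reason" => "%{[@metadata][dead_letter_queue][reason]}"
      "plugin_id" => "%{[@metadata][dead_letter_queue][plugin_id]}"
      "plugin_type" => "%{[@metadata][dead_letter_queue][plugin_type]}"
    }
  }
}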

How to solve date parsing error in logstash?

I have the following logstash configuration:
input {
file{
path => ["C:/Users/MISHAL/Desktop/ELK_Files/rm/evsb.json"]
type => "json"
start_position => "beginning"
}
}
filter {
json {
source => "message"
}
mutate {
convert => [ "increasedFare", "float"]
convert => ["enq", "float"]
convert => ["bkd", "float"]
}
date{
match => [ "date" , "YYYY-MM-dd HH:mm:ss" ]
target => "#timestamp"
}
}
output {
stdout {
codec => rubydebug
}
elasticsearch {
hosts => "localhost"
index => "zsx"
}
}
And this is the JSON data, jt.json:
[{"id":1,"date":"2015-11-11 23:00:00","enq":"105","bkd":"9","increasedFare":"0"}, {"id":2,"date":"2015-11-15 23:00:00","eng":"55","bkd":"2","increasedFare":"0"}, {"id":3,"date":"2015-11-20 23:00:00","enq":"105","bkd":"9","increasedFare":"0"}, {"id":4,"date":"2015-11-25 23:00:00","eng":"55","bkd":"2","increasedFare":"0"}]
I tried running this in Logstash, however I am not able to parse the date or get the date into @timestamp.
The following is the warning message I'm getting:
Failed parsing date from field {:field=>"[date]", :value=>"%{[date]}", :exception=>"Invalid format: \"%{[date]}\"", :config_parsers=>"YYYY-MM-dd HH:mm:ss", :config_locale=>"default=en_IN", :level=>:warn}
The following is the stdout:
Logstash startup completed
{
"message" => "{\"id\":2,\"date\":\"2015-09-15 23:00:00\",\"enq\":\"34\",\"bkd\":\"2\",\"increasedFare\":\"0\"}\r",
"#version" => "1",
"#timestamp" => "2015-09-15T17:30:00.000Z",
"host" => "TCHWNG",
"path" => "C:/Users/MISHAL/Desktop/ELK_Files/jsonTest/jt.json",
"type" => "json",
"id" => 2,
"date" => "2015-09-15 23:00:00",
"enq" => 34.0,
"bkd" => 2.0,
"increasedFare" => 0.0
}
{
"message" => "{\"id\":3,\"date\":\"2015-09-20 23:00:00\",\"enq\":\"22\",\"bkd\":\"9\",\"increasedFare\":\"0\"}\r",
"#version" => "1",
"#timestamp" => "2015-09-20T17:30:00.000Z",
"host" => "TCHWNG",
"path" => "C:/Users/MISHAL/Desktop/ELK_Files/jsonTest/jt.json",
"type" => "json",
"id" => 3,
"date" => "2015-09-20 23:00:00",
"enq" => 22.0,
"bkd" => 9.0,
"increasedFare" => 0.0
}
{
"message" => "{\"id\":4,\"date\":\"2015-09-25 23:00:00\",\"enq\":\"66\",\"bkd\":\"2\",\"increasedFare\":\"0\"}\r",
"#version" => "1",
"#timestamp" => "2015-09-25T17:30:00.000Z",
"host" => "TCHWNG",
"path" => "C:/Users/MISHAL/Desktop/ELK_Files/jsonTest/jt.json",
"type" => "json",
"id" => 4,
"date" => "2015-09-25 23:00:00",
"enq" => 66.0,
"bkd" => 2.0,
"increasedFare" => 0.0
}
I have been trying to solve this for two days and have tried various things, but I am not able to solve it. Please tell me what I'm doing wrong here.
