Logstash split root message - elasticsearch

I am collecting some metrics about my application and periodically exporting them over REST one by one. The output JSON looks like this:
{
  "name": "decoder.example.type-3",
  "value": 2000,
  "from": 1517847790049,
  "to": 1517847840004
}
This is my Logstash configuration, which is working well. It removes all HTTP headers and the original counter name, and adds example as interface and type-3 as transaction.
input {
  http {
    port => 31311
  }
}
filter {
  json {
    source => "message"
  }
  grok {
    match => [ "name", "decoder.%{WORD:interface}.%{NOTSPACE:transaction}" ]
  }
  mutate {
    remove_field => [ "name", "headers", "message" ]
  }
}
output {
  elasticsearch {
    hosts => [ "http://localhost:9200" ]
    index => "metric.decoder-%{+YYYY.MM.dd}"
  }
}
What I am trying to do now is send all my metrics at once as a JSON array, split them into individual messages, and apply the same logic that was applied to them one by one. An example input message would look like this:
[
  {
    "name": "decoder.example.type-3",
    "value": 2000,
    "from": 1517847790049,
    "to": 1517847840004
  },
  {
    "name": "decoder.another.type-0",
    "value": 3500,
    "from": 1517847790049,
    "to": 1517847840004
  }
]
I am pretty certain I am supposed to use the split filter, but I can't figure out how to use it. I have tried putting split before and after my json plugin, using different field settings and targets, but nothing seems to work as expected.
Could someone point me in the right direction?

In my config I used split first and then applied the logic. Based on that, yours should look something like this:
input {
  http {
    port => 31311
  }
}
filter {
  json {
    source => "message"
  }
  split {
    field => "message"
  }
  grok {
    match => [ "name", "decoder.%{WORD:interface}.%{NOTSPACE:transaction}" ]
  }
  mutate {
    remove_field => [ "name", "headers", "message" ]
  }
}
output {
  elasticsearch {
    hosts => [ "http://localhost:9200" ]
    index => "metric.decoder-%{+YYYY.MM.dd}"
  }
}
But this presumes that the message field is always an array.
Also, check whether the new message field now contains the object that you posted. If so, your grok won't find anything under name; you need to match [message][name] instead. (I usually create a temp field from [message][name] and remove it later because I never looked up how to reference nested fields. There must be a smarter way.)
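For what it's worth, grok can reference a nested field directly with bracket notation, so the temp-field workaround should not be necessary. A minimal sketch, assuming each split event carries one array element under message:
grok {
  # match against the nested field directly
  match => [ "[message][name]", "decoder.%{WORD:interface}.%{NOTSPACE:transaction}" ]
}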

This is the configuration I ended up with. Perhaps it can be done in fewer steps, but this works well. I had to move some fields around to keep the same structure, so it is a bit longer than my initial configuration, which processed the metrics one by one.
The basic idea is to put the parsed JSON into a specific field rather than into the root, and then split that new field.
input {
  http {
    port => 31311
  }
}
filter {
  json {
    source => "message"
    target => "stats"
  }
  split {
    field => "stats"
  }
  grok {
    match => [ "[stats][name]", "decoder.%{WORD:interface}.%{NOTSPACE:transaction}" ]
  }
  mutate {
    add_field => {
      "value" => "%{[stats][value]}"
      "from" => "%{[stats][from]}"
      "to" => "%{[stats][to]}"
    }
    remove_field => [ "headers", "message", "stats" ]
  }
  mutate {
    convert => {
      "value" => "integer"
      "from" => "integer"
      "to" => "integer"
    }
  }
}
output {
  elasticsearch {
    hosts => [ "http://localhost:9200" ]
    index => "metric.decoder-%{+YYYY.MM.dd}"
  }
}
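With this configuration each element of the posted array becomes its own event. Based on the sample input above, a single indexed document should contain roughly the following fields (plus @timestamp, @version and whatever the http input adds; this is an illustration, not captured output):
{
  "interface": "example",
  "transaction": "type-3",
  "value": 2000,
  "from": 1517847790049,
  "to": 1517847840004
}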

Related

Elastic search load csv data with context

I have 3M records. The headers are value, type, other_fields...
I need to load the data so that type is specified as the context for the value in each record. Is there any way to do this with Logstash, or are there any other options?
val,val_type,id
Sunnyvale it labs, seller, 10223667
For this, I'd use the new CSV ingest processor.
First, create the ingest pipeline to parse your CSV data:
PUT _ingest/pipeline/csv-parser
{
  "processors": [
    {
      "csv": {
        "field": "message",
        "target_fields": [
          "val",
          "val_type",
          "id"
        ]
      }
    },
    {
      "script": {
        "source": """
          def val = ctx.val;
          ctx.val = [
            'input': val,
            'contexts': [
              'type': [ctx.val_type]
            ]
          ]
        """
      }
    },
    {
      "remove": {
        "field": "message"
      }
    }
  ]
}
Then, you can index your documents as follows:
PUT index/_doc/1?pipeline=csv-parser
{
  "message": "Sunnyvale it labs,seller,10223667"
}
After ingestion, the document will look like this:
{
  "val_type": "seller",
  "id": "10223667",
  "val": {
    "input": "Sunnyvale it labs",
    "contexts": {
      "type": [
        "seller"
      ]
    }
  }
}
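If you want to check the pipeline before indexing real data, the ingest simulate API can be used with the sample row above; a minimal sketch:
POST _ingest/pipeline/csv-parser/_simulate
{
  "docs": [
    {
      "_source": {
        "message": "Sunnyvale it labs,seller,10223667"
      }
    }
  ]
}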
UPDATE: Logstash solution
Using Logstash, it's also feasible. The configuration file would look something like this:
input {
  file {
    path => "/path/to/your/file.csv"
    sincedb_path => "/dev/null"
    start_position => "beginning"
  }
}
filter {
  csv {
    skip_header => true
    separator => ","
    columns => ["val", "val_type", "id"]
  }
  mutate {
    rename => { "val" => "value" }
    add_field => {
      "[val][input]" => "%{value}"
      "[val][contexts][type]" => "%{val_type}"
    }
    remove_field => [ "value" ]
  }
}
output {
  elasticsearch {
    hosts => "http://localhost:9200"
    index => "your-index"
  }
}
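With that filter chain, and assuming the sample row above, the event sent to Elasticsearch should carry roughly these fields (plus the usual @timestamp, @version, host and path fields; note that add_field produces a plain string here rather than the array the ingest-pipeline version builds):
{
  "val": {
    "input": "Sunnyvale it labs",
    "contexts": {
      "type": "seller"
    }
  },
  "val_type": "seller",
  "id": "10223667"
}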

Converting fields from String to Date in Logstash

I'm trying to index emails into Elasticsearch with Logstash.
My conf file is like this:
sudo bin/logstash -e '
input {
  imap {
    host => "imap.googlemail.com"
    password => "********"
    user => "********@gmail.com"
    port => 993
    secure => "true"
    check_interval => 10
    folder => "Inbox"
    verify_cert => "false"
  }
}
output {
  stdout {
    codec => rubydebug
  }
  elasticsearch {
    index => "emails"
    document_type => "email"
    hosts => "localhost:9200"
  }
}'
The problem is that two fields in the output are parsed as string fields, but they are supposed to be date fields.
The format of the fields is as below:
"x-dbworld-deadline" => "31-Jul-2019"
"x-dbworld-start-date" => "18-Nov-2019"
How can I convert these two fields into date fields?
Thanks!
How about creating a mapping for the index in Elasticsearch?
It may look like this:
PUT date-test-191211
{
  "mappings": {
    "_doc": {
      "properties": {
        "x-dbworld-deadline": {
          "type": "date",
          "format": "dd-MMM-yyyy"
        },
        "x-dbworld-start-date": {
          "type": "date",
          "format": "dd-MMM-yyyy"
        }
      }
    }
  }
}
Then, those fields are recognized as date fields.
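For completeness, the conversion could also be attempted on the Logstash side with the date filter, writing the parsed value back into the original field instead of the default @timestamp. A minimal sketch, assuming the field names above and English month abbreviations:
filter {
  date {
    match => [ "x-dbworld-deadline", "dd-MMM-yyyy" ]
    target => "x-dbworld-deadline"   # keep the result in the same field
  }
  date {
    match => [ "x-dbworld-start-date", "dd-MMM-yyyy" ]
    target => "x-dbworld-start-date"
  }
}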

How to use a field for determining index in Logstash without saving it?

I am using Logstash for the first time and can't figure out how to determine the index from a parsed field without persisting that field.
This is my configuration file:
input {
  http {
    port => 31311
  }
}
filter {
  json {
    source => "message"
  }
  mutate {
    remove_field => [ "headers", "message" ]
  }
  grok {
    match => [ "name", "^(?<metric-type>\w+)\..*" ]
  }
}
output {
  elasticsearch {
    hosts => [ "localhost:9200" ]
    index => "%{metric-type}-%{+YYYY.MM.dd}"
  }
}
JSON example sent to the http plugin:
{
  "name": "counter.custom",
  "value": 321,
  "from": "2017-11-30T10:43:17.213Z",
  "to": "2017-11-30T10:44:00.001Z"
}
This record is saved in the counter-2017.11.30 index as expected. However, I don't want the field metric-type to be saved; I just need it to determine the index.
Any suggestions please?
I have used grok to put my metric-type into a field, since the grok pattern does not support the [@metadata][metric-type] syntax. I have then used a mutate filter to copy that field to @metadata and remove the temporary field.
input {
  http {
    port => 31311
  }
}
filter {
  json {
    source => "message"
  }
  mutate {
    remove_field => [ "headers", "message" ]
  }
  grok {
    match => [ "name", "^(?<metric-type>\w+)\..*" ]
  }
  mutate {
    add_field => { "[@metadata][metric-type]" => "%{metric-type}" }
    remove_field => [ "metric-type" ]
  }
}
output {
  elasticsearch {
    hosts => [ "http://localhost:9200" ]
    index => "%{[@metadata][metric-type]}-%{+YYYY.MM.dd}"
  }
}
-- EDIT --
As suggested by @Phonolog in the discussion, there is a simpler and much better solution. By using grok keyword matching instead of a regex, I was able to save the captured group directly to @metadata.
input {
  http {
    port => 31311
  }
}
filter {
  json {
    source => "message"
  }
  mutate {
    remove_field => [ "headers", "message" ]
  }
  grok {
    match => [ "name", "%{WORD:[@metadata][metric-type]}." ]
  }
}
output {
  elasticsearch {
    hosts => [ "http://localhost:9200" ]
    index => "%{[@metadata][metric-type]}-%{+YYYY.MM.dd}"
  }
}
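Since @metadata fields are not shown by default, you can verify what ends up there while testing by enabling metadata output in the rubydebug codec; a small sketch of a temporary debug output:
output {
  stdout {
    # prints the event including the [@metadata] sub-fields
    codec => rubydebug { metadata => true }
  }
}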

How to use Mutate/Convert in logstash config file for nested fields in Json file

I have the below JSON as input to Logstash.
{
  "totalTurnoverUSD": 11111.456,
  "children": [
    {
      "totalTurnoverUSD": 11100.456,
      "children": [
        {
          "totalTurnoverUSD": 11.00,
          "children": []
        }
      ]
    }
  ]
}
And I am using the below config file to output it to Elasticsearch and stdout.
input {
  file {
    type => $type
    path => $filePathofJsonFile
    codec => "json"
    start_position => "beginning"
    sincedb_path => "/dev/null"
    ignore_older => 0
    close_older => 2
    max_open_files => 10
  }
}
filter {
  mutate {
    convert => { "totalTurnoverUSD" => "string" }
  }
}
output {
  elasticsearch {
    hosts => $elasticHost
    index => "123"
  }
  stdout {
    codec => rubydebug
  }
}
But I am getting the below error message:
"error"=>{"type"=>"illegal_argument_exception", "reason"=>"mapper [children.totalTurnoverUSD] of different type, current_type [long], merged_type [double]"}}}, :level=>:warn}
because I am not converting the totalTurnoverUSD field in the nested children documents of the JSON input file.
So, is there any way to access nested fields in the JSON document in order to convert their datatype to string?
One way to solve this is to let Logstash send whatever numeric type of totalTurnoverUSD it comes up with, but then to use a dynamic template in Elasticsearch.
You can modify your index like this:
PUT my_index
{
  "mappings": {
    "my_type": {
      "dynamic_templates": [
        {
          "full_name": {
            "path_match": "*.totalTurnoverUSD",
            "mapping": {
              "type": "keyword"
            }
          }
        }
      ]
    }
  }
}
What this achieves is that whenever a document is indexed into that index, any field named totalTurnoverUSD nested below the top level (such as children.totalTurnoverUSD) will get the type keyword.
You might need to delete your index and recreate it from scratch, but try it out first without deleting it.
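To see the effect, you can index the sample document from the question and then inspect the mapping; under the assumptions above (index my_index, type my_type), the nested children.totalTurnoverUSD fields should come back mapped as keyword:
PUT my_index/my_type/1
{
  "totalTurnoverUSD": 11111.456,
  "children": [
    {
      "totalTurnoverUSD": 11100.456,
      "children": []
    }
  ]
}

GET my_index/_mapping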
UPDATE
If you want to apply this to all your indices, you can create an index template like this:
PUT _template/all_indices
{
  "template": "*",
  "mappings": {
    "_default_": {
      "dynamic_templates": [
        {
          "full_name": {
            "path_match": "*.totalTurnoverUSD",
            "mapping": {
              "type": "keyword"
            }
          }
        }
      ]
    }
  }
}
As a result, all mapping types in all indices will get the dynamic template for totalTurnoverUSD.

ElasticSearch 5.0.0 - error about object name is already in use

I am learning Elasticsearch and have hit a block. I am trying to use Logstash to load a simple CSV into Elasticsearch. This is the data; each row is a postcode, longitude, latitude:
ZE1 0BH,-1.136758103355,60.150855671143
ZE1 0NW,-1.15526666950369,60.1532197533966
I am using the following Logstash conf file to filter the CSV and create a "location" field:
input {
  file {
    path => "postcodes.csv"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}
filter {
  csv {
    columns => ["postcode", "lat", "lon"]
    separator => ","
  }
  mutate { convert => {"lat" => "float"} }
  mutate { convert => {"lon" => "float"} }
  mutate { rename => {"lat" => "[location][lat]"} }
  mutate { rename => {"lon" => "[location][lon]"} }
  mutate { convert => { "[location]" => "float" } }
}
output {
  elasticsearch {
    action => "index"
    hosts => "localhost"
    index => "postcodes"
  }
  stdout { codec => rubydebug }
}
And I have added the mapping to Elasticsearch using the console in Kibana:
PUT postcodes
{
  "settings": {
    "number_of_shards": 1
  },
  "mappings": {
    "feature": {
      "_all": { "enabled": true },
      "properties": {
        "postcode": {"type": "text"},
        "location": {"type": "geo_point"}
      }
    }
  }
}
I check the mappings for the index using:
GET postcodes/_mapping
{
  "postcodes": {
    "mappings": {
      "feature": {
        "_all": {
          "enabled": true
        },
        "properties": {
          "location": {
            "type": "geo_point"
          },
          "postcode": {
            "type": "text"
          }
        }
      }
    }
  }
}
So this all seems to be correct, having looked at the documentation and the other questions posted.
However, when I run
bin/logstash -f postcodes.conf
I get an error:
[location] is defined as an object in mapping [logs] but this name is already used for a field in other types
I have tried a number of alternative methods:
I deleted the index, created a template.json, and changed my conf file to have the extra settings:
manage_template => true
template => "postcode_template.json"
template_name =>"open_names"
template_overwrite => true
and this gets the same error.
I have managed to get the data loaded by not supplying a template; however, the data never gets loaded as a geo_point, so you cannot use the Kibana Tile Map to visualise the data.
Can anyone explain why I am receiving that error and what method I should use?
Your problem is that you don't have document_type => feature on your elasticsearch output. Without that, it's going to create the object on the logs type, which is why you are getting this conflict.
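In other words, the elasticsearch output should look something like this, so the documents land on the feature type that the mapping was created for:
output {
  elasticsearch {
    action => "index"
    hosts => "localhost"
    index => "postcodes"
    document_type => "feature"   # match the type used in the mapping
  }
  stdout { codec => rubydebug }
}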
