using logstash to get data out of elasticsearch to a csv file - elasticsearch

I am new to using logstash and am struggling to get data out of elasticsearch using logstash as a csv.
To create some sample data, we can first add a basic csv into elasticsearch... the head of the sample csv can be seen below
$ head uu.csv
"hh","hh1","hh3","id"
-0.979646332669359,1.65186132910743,"L",1
-0.283939374784435,-0.44785377794233,"X",2
0.922659898930901,-1.11689020559612,"F",3
0.348918777124474,1.95766948269957,"U",4
0.52667811182958,0.0168862169880919,"Y",5
-0.804765331279075,-0.186456470768865,"I",6
0.11411203100637,-0.149340801708981,"Q",7
-0.952836952412902,-1.68807271639322,"Q",8
-0.373528919496876,0.750994450392907,"F",9
I then put that into logstash using the following...
$ cat uu.conf
input {
stdin {}
}
filter {
csv {
columns => [
"hh","hh1","hh3","id"
]
}
if [hh1] == "hh1" {
drop { }
} else {
mutate {
remove_field => [ "message", "host", "#timestamp", "#version" ]
}
mutate {
convert => { "hh" => "float" }
convert => { "hh1" => "float" }
convert => { "hh3" => "string" }
convert => { "id" => "integer" }
}
}
}
output {
stdout { codec => dots }
elasticsearch {
index => "temp_index"
document_type => "temp_doc"
document_id => "%{id}"
}
}
This is put into logstash with the following command....
$ cat uu.csv | logstash-2.1.3/bin/logstash -f uu.conf
Settings: Default filter workers: 16
Logstash startup completed
....................................................................................................Logstash shutdown completed
So far so good, but I would like to get some of the data out in particular the hh and hh3 fields in the temp_index.
I wrote the following to extract the data out of elasticsearch into a csv.
$ cat yy.conf
input {
elasticsearch {
hosts => "localhost:9200"
index => "temp_index"
query => "*"
}
}
filter {
elasticsearch{
add_field => {"hh" => "%{hh}"}
add_field => {"hh3" => "%{hh3}"}
}
}
output {
stdout { codec => dots }
csv {
fields => ['hh','hh3']
path => '/home/username/yy.csv'
}
}
But get the following error when trying to run logstash...
$ logstash-2.1.3/bin/logstash -f yy.conf
The error reported is:
Couldn't find any filter plugin named 'elasticsearch'. Are you sure this is correct? Trying to load the elasticsearch filter plugin resulted in this error: no such file to load -- logstash/filters/elasticsearch
What do I need to change to yy.conf so that a logstash command will extract the data out of elasticsearch and input into a new csv called yy.csv.
UPDATE
changing yy.conf to be the following...
$ cat yy.conf
input {
elasticsearch {
hosts => "localhost:9200"
index => "temp_index"
query => "*"
}
}
filter {}
output {
stdout { codec => dots }
csv {
fields => ['hh','hh3']
path => '/home/username/yy.csv'
}
}
I got the following error...
$ logstash-2.1.3/bin/logstash -f yy.conf
Settings: Default filter workers: 16
Logstash startup completed
A plugin had an unrecoverable error. Will restart this plugin.
Plugin: <LogStash::Inputs::Elasticsearch hosts=>["localhost:9200"], index=>"temp_index", query=>"*", codec=><LogStash::Codecs::JSON charset=>"UTF-8">, scan=>true, size=>1000, scroll=>"1m", docinfo=>false, docinfo_target=>"#metadata", docinfo_fields=>["_index", "_type", "_id"], ssl=>false>
Error: [400] {"error":{"root_cause":[{"type":"parse_exception","reason":"Failed to derive xcontent"}],"type":"search_phase_execution_exception","reason":"all shards failed","phase":"init_scan","grouped":true,"failed_shards":[{"shard":0,"index":"temp_index","node":"zu3E6F7kQRWnDPY5L9zF-w","reason":{"type":"parse_exception","reason":"Failed to derive xcontent"}}]},"status":400} {:level=>:error}
A plugin had an unrecoverable error. Will restart this plugin.
Plugin: <LogStash::Inputs::Elasticsearch hosts=>["localhost:9200"], index=>"temp_index", query=>"*", codec=><LogStash::Codecs::JSON charset=>"UTF-8">, scan=>true, size=>1000, scroll=>"1m", docinfo=>false, docinfo_target=>"#metadata", docinfo_fields=>["_index", "_type", "_id"], ssl=>false>
Error: [400] {"error":{"root_cause":[{"type":"parse_exception","reason":"Failed to derive xcontent"}],"type":"search_phase_execution_exception","reason":"all shards failed","phase":"init_scan","grouped":true,"failed_shards":[{"shard":0,"index":"temp_index","node":"zu3E6F7kQRWnDPY5L9zF-w","reason":{"type":"parse_exception","reason":"Failed to derive xcontent"}}]},"status":400} {:level=>:error}
A plugin had an unrecoverable error. Will restart this plugin.
Interestingly...if i change yy.conf to remove elasticsearch{} to look like...
$ cat yy.conf
input {
elasticsearch {
hosts => "localhost:9200"
index => "temp_index"
query => "*"
}
}
filter {
add_field => {"hh" => "%{hh}"}
add_field => {"hh3" => "%{hh3}"}
}
output {
stdout { codec => dots }
csv {
fields => ['hh','hh3']
path => '/home/username/yy.csv'
}
}
I get the following error...
$ logstash-2.1.3/bin/logstash -f yy.conf
Error: Expected one of #, { at line 10, column 19 (byte 134) after filter {
add_field
You may be interested in the '--configtest' flag which you can
use to validate logstash's configuration before you choose
to restart a running system.
Also when changing yy.conf to be something similar to take into account the error message
$ cat yy.conf
input {
elasticsearch {
hosts => "localhost:9200"
index => "temp_index"
query => "*"
}
}
filter {
add_field {"hh" => "%{hh}"}
add_field {"hh3" => "%{hh3}"}
}
output {
stdout { codec => dots }
csv {
fields => ['hh','hh3']
path => '/home/username/yy.csv'
}
}
I get the following error...
$ logstash-2.1.3/bin/logstash -f yy.conf
The error reported is:
Couldn't find any filter plugin named 'add_field'. Are you sure this is correct? Trying to load the add_field filter plugin resulted in this error: no such file to load -- logstash/filters/add_field
* UPDATE 2 *
Thanks to Val I have made some progress and started to get output. But they don't seem correct. I get the following outputs when running the following commands...
$ cat uu.csv | logstash-2.1.3/bin/logstash -f uu.conf
Settings: Default filter workers: 16
Logstash startup completed
....................................................................................................Logstash shutdown completed
$ logstash-2.1.3/bin/logstash -f yy.conf
Settings: Default filter workers: 16
Logstash startup completed
....................................................................................................Logstash shutdown completed
$ head uu.csv
"hh","hh1","hh3","id"
-0.979646332669359,1.65186132910743,"L",1
-0.283939374784435,-0.44785377794233,"X",2
0.922659898930901,-1.11689020559612,"F",3
0.348918777124474,1.95766948269957,"U",4
0.52667811182958,0.0168862169880919,"Y",5
-0.804765331279075,-0.186456470768865,"I",6
0.11411203100637,-0.149340801708981,"Q",7
-0.952836952412902,-1.68807271639322,"Q",8
-0.373528919496876,0.750994450392907,"F",9
$ head yy.csv
-0.106007607975644E1,F
0.385395589205671E0,S
0.722392598488791E-1,Q
0.119773830827963E1,Q
-0.151090510772458E1,W
-0.74978830916084E0,G
-0.98888121700762E-1,M
0.965827615823707E0,S
-0.165311094671424E1,F
0.523818819076447E0,R
Any help would be much appreciated...

You don't need that elasticsearch filter, just specify the fields you want in your CSV in the csv output like you did and you should be fine. The fields you need in your CSV are already contained in the event, you simply need to list them in the fields list of the csv output, simply as that.
Concretely, your config file should look like this:
$ cat yy.conf
input {
elasticsearch {
hosts => "localhost:9200"
index => "temp_index"
}
}
filter {
}
output {
stdout { codec => dots }
csv {
fields => ['hh','hh3']
path => '/home/username/yy.csv'
}
}

Related

Logstash aggregate fields

I am trying to configure logstash to aggregate similar syslog based on a message field and in a specific timestamp.
To make my case clear, this is an example of what I would like to do.
example: I have those junk syslog coming through my logstash
timestamp. message
13:54:24. hello
13:54:35. hello
What I would like to do is have a condition that check if the message are the same and those message occurs in a specific timespan (for example 10min) I would like to aggregate them into one row, and increase the count
the output I am expecting to see is as follow
timestamp. message. count
13.54.35. hello. 2
I know and I saw that there is the opportunity to aggregate the fields, but I was wondering if there is a chance to do this aggregation based on a specific time range
If anyone can help me I would be extremely grateful as I am new to logstash and I have the problem that in my server I am receiving tons of junk syslog and I would like to reduce that amount.
So far I did some cleaning with this configuration
input {
syslog {
port => 514
}
}
filter {
prune {
whitelist_names =>["timestamp","message","newfield"]
}
mutate {
add_field => {"newfield" => "%{#timestamp}%{message}"}
}
}
output {
elasticsearch {
hosts => ["localhost:9200"]
index => "logstash_index"
}
stdout {
codec => rubydebug
}
}
Now I just need to do the aggregation.
Thank you so much for your help guys
EDIT:
Following the documentation, I put in place this configuration:
input {
syslog {
port => 514
}
}
filter {
prune {
whitelist_names =>["timestamp","message","newfield"]
}
mutate {
add_field => {"newfield" => "%{#timestamp}%{message}"}
}
if [message] =~ "MESSAGE FROM" {
aggregate {
task_id => "%{message}"
code => "map['message'] ||= 0; map['message'] += 1;"
push_map_as_event_on_timeout => true
timeout_task_id_field => "message"
timeout => 60
inactivity_timeout => 50
timeout_tags => ['_aggregatetimeout']
timeout_code => "event.set('count_message', event.get('message') > 1)"
}
}
}
output {
elasticsearch {
hosts => ["localhost:9200"]
index => "logstash_index"
}
stdout {
codec => rubydebug
}
}
I don't get any error but the output is not what I am expecting.
The actual output is that it create a tag field (Good) passing an array with _aggregationtimeout and _aggregationexception
{
"message" => "<88>MESSAGE FROM\r\n",
"tags" => [
[0] "_aggregatetimeout",
[1] "_aggregateexception"
],
"#timestamp" => 2021-07-23T12:10:45.646Z,
"#version" => "1"
}

how filter {"foo":"bar", "bar": "foo"} with grok to get only foo field?

I copied
{"name":"myapp","hostname":"banana.local","pid":40161,"level":30,"msg":"hi","time":"2013-01-04T18:46:23.851Z","v":0}
from https://github.com/trentm/node-bunyan and save it as my logs.json. I am trying to import only two fields (name and msg) to ElasticSearch via LogStash. The problem is that I depend on a sort of filter that I am not able to accomplish. Well I have successfully imported such line as a single message but certainly it is not worth in my real case.
That said, how can I import only name and msg to ElasticSearch? I tested several alternatives using http://grokdebug.herokuapp.com/ to reach an useful filter with no success at all.
For instance, %{GREEDYDATA:message} will bring the entire line as an unique message but how to split it and ignore all other than name and msg fields?
At the end, I am planing to use here:
input {
file {
type => "my_type"
path => [ "/home/logs/logs.log" ]
codec => "json"
}
}
filter {
grok {
match => { "message" => "data=%{GREEDYDATA:request}"}
}
#### some extra lines here probably
}
output
{
elasticsearch {
codec => json
hosts => "http://127.0.0.1:9200"
index => "indextest"
}
stdout { codec => rubydebug }
}
I have just gone through the list of available Logstash filters. The prune filter should match your need.
Assume you have installed the prune filter, your config file should look like:
input {
file {
type => "my_type"
path => [ "/home/logs/logs.log" ]
codec => "json"
}
}
filter {
prune {
whitelist_names => [
"#timestamp",
"type",
"name",
"msg"
]
}
}
output {
elasticsearch {
codec => json
hosts => "http://127.0.0.1:9200"
index => "indextest"
}
stdout { codec => rubydebug }
}
Please be noted that you will want to keep type for Elasticsearch to index it into a correct type. #timestamp is required if you will view the data on Kibana.

ElasticSearch 2, Logstash and Kibana : grok match can't create fields

I try to parse a message field to generate differents fields. After research, the solution is to use grok with match. But in Kibana, I can't see the new fields (even after refresh or recreate fields from logstash indexes)
I try this in filter config :
grok {
match => {
"message" => "\[32m%{LOGLEVEL:loglevel}\[39m: memory: %{NOTSPACE:memory}, uptime \(seconds\): %{NUMBER:uptime}, load: %{NUMBER:load1},%{NUMBER:load5},%{NUMBER:load15}"
}
}
mutate {
rename => { "docker.id" => "container_id" }
rename => { "docker.name" => "container_name" }
rename => { "docker.image" => "docker_image" }
rename => { "docker.hostname" => "docker_hostname" }
}
To transform this type of message :
[32minfo[39m: memory: 76Mb, uptime (seconds): 5529.927, load: 0.05322265625,0.1298828125,0.19384765625
To this variables :
load15 0.19384765625
uptime 5529.927
load1 0.05322265625
load5 0.1298828125
memory 76Mb
loglevel info
I test the pattern in http://grokconstructor.appspot.com/do/match and my matches work fine. But, In Kibana I can't retrieve this fields.
Thanks

Logstash Geoip does not output coordinates as expected

I'm trying to set long and lat for the Kibana Bettermap using Geoip. I'm using Logstash 1.4.2 and Elasticsearch 1.1.1 and the following is my configuration file:
input
{
stdin { }
}
filter
{
geoip
{
source => "ip"
}
}
output
{
elasticsearch { host => localhost }
stdout { codec => rubydebug }
}
When I send the following example ip address:
"ip":"00.00.00.00"
The result is as follows:
{
"message" => "\"ip\":\"00.000.00.00\"",
"#version" => "1",
"#timestamp" => "2014-10-20T22:23:12.334Z",
}
As you can see, no geoip coordinates, and nothing on my Kibana Bettermap. What can I do to get this Bettermap to work?
You aren't parsing the message... Either add codec => json to your stdin and send in {"ip":"8.8.8.8"} or use a grok filter to parse your input:
grok { match => ['message', '%{IP:ip}' ] }

Logstash not importing files due to missing index error

I am having a difficult time trying to get the combination of the Logstash, Elasticsearch & Kibana working in my Windows 7 environment.
I have set all 3 up and they all seem to be running fine, Logstash and Elasticsearch are running as Windows services and Kibana as a website in IIS.
Logstash is running from http://localhost:9200
I have a web application creating log files in .txt with the format:
Datetime=[DateTime], Value=[xxx]
The log files get created in this directory:
D:\wwwroot\Logs\Errors\
My logstash.conf file looks like this:
input {
file {
format => ["plain"]
path => ["D:\wwwroot\Logs\Errors\*.txt"]
type => "testlog"
}
}
output {
elasticsearch {
embedded => true
}
}
My Kibana config.js file looks like this:
define(['settings'],
function (Settings) {
return new Settings({
elasticsearch: "http://localhost:9200",
kibana_index: "kibana-int",
panel_names: [
'histogram',
'map',
'pie',
'table',
'filtering',
'timepicker',
'text',
'fields',
'hits',
'dashcontrol',
'column',
'derivequeries',
'trends',
'bettermap',
'query',
'terms'
]
});
});
When I view Kibana I see the error:
No index found at http://localhost:9200/_all/_mapping. Please create at least one index.If you're using a proxy ensure it is configured correctly.
I have no idea on how to create the index, so if anyone can shed some light on what I am doing wrong that would be great.
It seems like nothing is making it to elasticsearch currently.
For the current version of es (0.90.5), I had to use elasticsearch_http output. The elasticsearch output seemed to be too closely associated with 0.90.3.
e.g: here is how my config is for log4j format to elastic search
input {
file {
path => "/srv/wso2/wso2am-1.4.0/repository/logs/wso2carbon.log"
path => "/srv/wso2/wso2as-5.1.0/repository/logs/wso2carbon.log"
path => "/srv/wso2/wso2is-4.1.0/repository/logs/wso2carbon.log"
type => "log4j"
}
}
output {
stdout { debug => true debug_format => "ruby"}
elasticsearch_http {
host => "localhost"
port => 9200
}
}
For my file format, I have a grok filter as well - to parse it properly.
filter {
if [message] !~ "^[ \t\n]+$" {
# if the line is a log4j type
if [type] == "log4j" {
# parse out fields from log4j line
grok {
match => [ "message", "TID:%{SPACE}\[%{BASE10NUM:thread_name}\]%{SPACE}\[%{WORD:component}\]%{SPACE}\[%{TIMESTAMP_ISO8601:timestamp}\]%{SPACE}%{LOGLEVEL:level}%{SPACE}{%{JAVACLASS:java_file}}%{SPACE}-%{SPACE}%{GREEDYDATA:log_message}" ]
add_tag => ["test"]
}
if "_grokparsefailure" not in [tags] {
mutate {
replace => ["message", " "]
}
}
multiline {
pattern => "^TID|^ $"
negate => true
what => "previous"
add_field => {"additional_log" => "%{message}"}
remove_field => ["message"]
remove_tag => ["_grokparsefailure"]
}
mutate {
strip => ["additional_log"]
remove_tag => ["test"]
remove_field => ["message"]
}
}
} else {
drop {}
}
}
Also, I would get elasticsearch head plugin to monitor your content in elasticsearch- to easily verify the data and state it is in.

Resources