Dynamic elasticsearch index_type using logstash - elasticsearch

I am storing data in Elasticsearch using Logstash, with a RabbitMQ server as the input. My Logstash command looks like this:
logstash -e 'input {
  rabbitmq {
    exchange => "redwine_log"
    key      => "info.redwine"
    host     => "localhost"
    durable  => true
    user     => "guest"
    password => "guest"
  }
}
output {
  elasticsearch {
    host  => "localhost"
    index => "redwine"
  }
}
filter {
  json {
    source       => "message"
    remove_field => [ "message" ]
  }
}'
But I need Logstash to put the data into different types in the Elasticsearch cluster. What I mean by type is:
"hits": {
"total": 3,
"max_score": 1,
"hits": [
{
"_index": "logstash-2014.11.19",
"_type": "logs",
"_id": "ZEea8HBOSs-QwH67q1Kcaw",
"_score": 1,
"_source": {
"context": [],
"level": 200,
"level_name": "INFO",
This is part of a search result, where you can see that Logstash by default creates a type named "logs" (_type: "logs"). In my project I need the type to be dynamic, created based on the input data.
For example, my input data looks like
{
  "data": "some data",
  "type": "type_1"
}
and I need Logstash to create a new type in Elasticsearch named "type_1".
I tried using grok, but couldn't get this specific requirement to work.

It worked for me this way:
elasticsearch {
  host       => "localhost"
  index_type => "%{type}"
}
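For completeness, a fuller sketch of that output section, assuming every incoming event carries a "type" field. Note that on later versions of the elasticsearch output plugin the option is called document_type rather than index_type, and Elasticsearch 7+ removed mapping types altogether, so this only applies to older clusters:
output {
  elasticsearch {
    host  => "localhost"
    index => "redwine"
    # type taken from the event; events without a "type" field would
    # end up with the literal type name "%{type}"
    index_type => "%{type}"
  }
}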

Related

How to get fields inside message array from Logstash?

I've been trying to configure a Logstash pipeline whose input type is snmptrap, along with yamlmibdir. Here's the code:
input {
  snmptrap {
    host       => "abc"
    port       => 1062
    yamlmibdir => "/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/snmp-1.3.2/data/ruby/snmp/mibs"
  }
}
filter {
  mutate {
    gsub => ["message","^\"{","{"]
    gsub => ["message","}\"$","}"]
    gsub => ["message","[\\]",""]
  }
  json { source => "message" }
  split {
    field  => "message"
    target => "evetns"
  }
}
output {
  elasticsearch {
    hosts => "xyz"
    index => "logstash-%{+YYYY.MM.dd}"
  }
  stdout { codec => rubydebug }
}
and the result shown in Kibana (JSON format)
{
  "_index": "logstash-2019.11.18-000001",
  "_type": "_doc",
  "_id": "Y_5zjG4B6M9gb7sxUJwG",
  "_version": 1,
  "_score": null,
  "_source": {
    "@version": "1",
    "@timestamp": "2019-11-21T05:33:07.675Z",
    "tags": [
      "_jsonparsefailure"
    ],
    "1.11.12.13.14.15": "teststring",
    "message": "#<SNMP::SNMPv1_Trap:0x244bf33f @enterprise=[1.2.3.4.5.6], @timestamp=#<SNMP::TimeTicks:0x196a1590 @value=55>, @varbind_list=[#<SNMP::VarBind:0x21f5e155 @name=[1.11.12.13.14.15], @value=\"teststring\">], @specific_trap=99, @source_ip=\"xyz\", @agent_addr=#<SNMP::IpAddress:0x5a5c3c5f @value=\"xC0xC1xC2xC3\">, @generic_trap=6>",
    "host": "xyz"
  },
  "fields": {
    "@timestamp": [
      "2019-11-21T05:33:07.675Z"
    ]
  },
  "sort": [
    1574314387675
  ]
}
As you can see, the message field is an array, so how can I get at the fields inside the array, and also be able to select those fields to display in Kibana?
ps1. I still get the _jsonparsefailure tag if I select type 'Table' in the expanded document.
ps2. Even though I use gsub to remove '\' from the expected JSON result, why does the result still contain '\'?

How to fix duplicated documents in Elasticsearch when indexing by Logstash?

I'm using the Elastic Stack to handle my log files, but it is generating duplicated documents in Elasticsearch.
I've done some research and already tried adding a "document_id", but it did not solve the problem.
This is my Logstash configuration:
input {
  beats {
    port => 5044
  }
}
filter {
  fingerprint {
    source       => "message"
    target       => "[fingerprint]"
    method       => "SHA1"
    key          => "key"
    base64encode => true
  }
  if [doctype] == "audit-log" {
    grok {
      match => { "message" => "^\(%{GREEDYDATA:loguser}@%{IPV4:logip}\) \[%{DATESTAMP:logtimestamp}\] %{JAVALOGMESSAGE:logmessage}$" }
    }
    mutate {
      remove_field => ["host"]
    }
    date {
      match    => [ "logtimestamp" , "dd/MM/yyyy HH:mm:ss" ]
      target   => "@timestamp"
      locale   => "EU"
      timezone => "America/Sao_Paulo"
    }
  }
}
output {
  elasticsearch {
    hosts       => "192.168.0.200:9200"
    document_id => "%{[fingerprint]}"
  }
}
Here are the duplicated documents:
{
  "_index": "logstash-2019.05.02-000001",
  "_type": "_doc",
  "_id": "EbncP00tf9yMxXoEBU4BgAAX/gc=",
  "_version": 1,
  "_score": null,
  "_source": {
    "@version": "1",
    "fingerprint": "EbncP00tf9yMxXoEBU4BgAAX/gc=",
    "message": "(thiago.alves@192.168.0.200) [06/05/2019 18:50:08] Logout do usuário 'thiago.alves'. (cookie=9d6e545860c24a9b8e3004e5b2dba4a6). IP=192.168.0.200",
    ...
}
######### DUPLICATED #########
{
  "_index": "logstash-2019.05.02-000001",
  "_type": "_doc",
  "_id": "V7ogj2oB8pjEaraQT_cg",
  "_version": 1,
  "_score": null,
  "_source": {
    "@version": "1",
    "fingerprint": "EbncP00tf9yMxXoEBU4BgAAX/gc=",
    "message": "(thiago.alves@192.168.0.200) [06/05/2019 18:50:08] Logout do usuário 'thiago.alves'. (cookie=9d6e545860c24a9b8e3004e5b2dba4a6). IP=192.168.0.200",
    ...
}
That's it. I don't know why it is duplicating yet. Does anyone have any idea?
Thank you in advance.
I had this problem once, and after many attempts to solve it I realized that I had made a backup of my conf file in the 'pipeline' folder, and Logstash was using that backup file to process its input rules. Be careful, because Logstash will use other files in the pipeline folder even when their extension is different from '.conf'.
So please check whether you have other files in the 'pipeline' folder.
Please let me know if this was useful to you.
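To illustrate the point about stray files: Logstash loads every file matched by its path.config setting, so a glob ending in *.conf keeps backups such as pipeline.conf.bak out of the pipeline. A minimal pipelines.yml sketch (the paths are examples for a package install, adjust to your layout):
# pipelines.yml - only files ending in .conf are picked up
- pipeline.id: main
  path.config: "/etc/logstash/conf.d/*.conf"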
Generate a UUID key for each document and your issue will be solved.
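If you go the UUID route, the stock uuid filter can generate one per event; a minimal sketch (the [@metadata][uuid] field name is just an example). Note that a random UUID makes every indexed document unique but, unlike the fingerprint approach above, it will not deduplicate an event that gets sent twice:
filter {
  uuid {
    target    => "[@metadata][uuid]"   # one UUID generated per event
    overwrite => true
  }
}
output {
  elasticsearch {
    hosts       => "192.168.0.200:9200"
    document_id => "%{[@metadata][uuid]}"
  }
}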
Your code seems fine and shouldn't allow duplicates. Maybe the duplicated one was added before you added document_id => "%{[fingerprint]}" to your Logstash config, so Elasticsearch generated a unique id for it that won't be overridden by other ids. Remove the duplicate (the one whose _id differs from the fingerprint) manually and try again; it should work.
We noticed that Logstash 7.5.2 was not working properly: it duplicated the logs coming from Filebeat. The actual issue we found was a bug in the built-in Beats input plugin, so we removed the existing one and installed the stable version (6.0.14). The steps are as follows:
download the logstash-input-beats-6.0.14-java.gem file, then run
./bin/logstash-plugin remove logstash-input-beats
./bin/logstash-plugin install /{file path}/logstash-input-beats-6.0.14-java.gem
./bin/logstash-plugin list --verbose

Outputting document metadata from ElasticSearch using Logstash output csv plugin

I am attempting to output the _id metadata field from ES into a CSV file using Logstash.
{
  "_index": "data",
  "_type": "default",
  "_id": "vANfNGYB9XD0VZRJUFfy",
  "_version": 1,
  "_score": null,
  "_source": {
    "vulnid": "CVE-2018-1000060",
    "product": [],
    "year": "2018",
    "month": "02",
    "day": "09",
    "hour": "23",
    "minute": "29",
    "published": "2018-02-09T18:29:02.213-05:00",
  },
  "sort": [
    1538424651203
  ]
}
My logstash output filter is:
output {
  csv {
    fields => [ "_id", "vulnid", "published" ]
    path   => "/tmp/export.%{+YYYY-MM-dd-hh-mm}.csv"
  }
}
I get this output:
,CVE-2018-1000060,2018-02-09T18:29:02.213-05:00
But I would like to get:
vANfNGYB9XD0VZRJUFfy,CVE-2018-1000060,2018-02-09T18:29:02.213-05:00
How can I output the metadata _id into the CSV file?
It does not matter whether I specify the field as "_id", "@_id" or "@id".
When we query ES we have to enable docinfo => true. By default it is false.
input {
  elasticsearch {
    hosts    => [ your hosts ]
    index    => "ti"
    query    => '{your query}'
    size     => 1000
    scroll   => "1s"
    docinfo  => true
    schedule => "14 * * * *"
  }
}
Logstash is not able to get the "_id" field from your input because you have not set the option docinfo to true.
docinfo includes Elasticsearch document information such as index, type and _id in the event. Please have a look here for more info: https://www.elastic.co/guide/en/logstash/current/plugins-inputs-elasticsearch.html#plugins-inputs-elasticsearch-docinfo
Use your input plugin as:
input {
  elasticsearch {
    hosts   => "hostname"
    index   => "yourIndex"
    query   => '{ "query": { "query_string": { "query": "*" } } }' # optional
    size    => 500   # optional
    scroll  => "5m"  # optional
    docinfo => true
  }
}
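With docinfo enabled, the elasticsearch input stores the metadata under [@metadata] (the default docinfo_target), so one way to get _id into the CSV is to copy it into a regular field first. A sketch under that assumption (the es_id field name is just an example):
filter {
  mutate {
    # copy the Elasticsearch _id exposed by docinfo => true
    add_field => { "es_id" => "%{[@metadata][_id]}" }
  }
}
output {
  csv {
    fields => [ "es_id", "vulnid", "published" ]
    path   => "/tmp/export.%{+YYYY-MM-dd-hh-mm}.csv"
  }
}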

using elasticsearch filter in logstash pipeline

I'm using the elasticsearch filter in my Logstash pipeline. I correctly find the result using:
filter {
  if [class] == "DPAPIINTERNAL" {
    elasticsearch {
      hosts          => "10.1.10.16"
      index          => "dp_audit-2017.02.16"
      query_template => "/home/vittorio/Documents/elastic-queries/matching-requestaw.json"
    }
  }
}
As you can see, I'm using "query_template", which is:
{
  "query": {
    "query_string": {
      "query": "class:DPAPI AND request.aw:%{[aw]}"
    }
  },
  "_source": ["end_point", "vittorio"]
}
That tells Elasticsearch to look up the log with that specific class whose "aw" matches the DPAPIINTERNAL log.
Perfect! But now that I have found the result, I want to take some fields from it and attach them to my DPAPIINTERNAL log; for instance, I want to take "end_point" and add it under the new key "vittorio" inside my log.
This is not happening and I don't understand why.
Here is the log that I'm looking up with the query:
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1,
    "hits": [
      {
        "_index": "dp_audit-2017.02.16",
        "_type": "logs",
        "_id": "AVpHoPHPuEPlW12Qu",
        "_score": 1,
        "_source": {
          "svc": "dp-1.1",
          "request": {
            "method": "POST|PATCH|DELETE",
            "aw": "prova",
            "end_point": "/bank/6311",
            "app_instance": "7D1-D233-87E1-913"
          },
          "path": "/home/vittorio/Documents/dpapi1.json",
          "@timestamp": "2017-02-16T15:53:33.214Z",
          "@version": "1",
          "host": "Vito",
          "event": "bank.add",
          "class": "DPAPI",
          "ts": "2017-01-16T19:20:30.125+01:00"
        }
      }
    ]
  }
}
You need to specify the fields parameter in your elasticsearch filter, like this:
elasticsearch {
  hosts          => "10.1.10.16"
  index          => "dp_audit-2017.02.16"
  query_template => "/home/vittorio/Documents/elastic-queries/matching-requestaw.json"
  fields         => { "[request][end_point]" => "vittorio" }
}
Note that since end_point is a nested field, you need to modify the _source in your query template like this:
"_source": ["request.end_point"]
The problem is simply that you don't have to specify the "new" field in the query_template:
"_source": ["request"] # here you specify the field you want from the query result
and then
filter {
  if [class] == "DPAPIINTERNAL" {
    elasticsearch {
      hosts          => "10.1.10.16"
      index          => "dp_audit-2017.02.16"
      query_template => "/home/vittorio/Documents/elastic-queries/matching-requestaw.json"
      fields         => { "request" => "new_key" } # here you add the fields, telling the elasticsearch filter to put "request" inside "new_key"
    }
  }
}
That worked for me!

Logstash and ElasticSearch filter Date @timestamp issue

I'm trying to index some data from a file into Elasticsearch using Logstash.
If I don't use the date filter to replace @timestamp, everything works very well, but when I use the filter I do not get all the data.
I can't figure out why there is a difference between the Logstash command line and Elasticsearch in the @timestamp value.
Logstash conf
filter {
  mutate {
    replace => {
      "type" => "dashboard_a"
    }
  }
  grok {
    match => [ "message", "%{DATESTAMP:Logdate} \[%{WORD:Severity}\] %{JAVACLASS:Class} %{GREEDYDATA:Stack}" ]
  }
  date {
    match => [ "Logdate", "dd-MM-yyyy hh:mm:ss,SSS" ]
  }
}
Logstash command line trace
{
  "@timestamp" => "2014-08-26T08:16:18.021Z",
  "message"    => "26-08-2014 11:16:18,021 [DEBUG] com.fnx.snapshot.mdb.SnapshotMDB - SnapshotMDB Ctor is called\r",
  "@version"   => "1",
  "host"       => "bts10d1",
  "path"       => "D:\\ElasticSearch\\logstash-1.4.2\\Dashboard_A\\Log_1\\6.log",
  "type"       => "dashboard_a",
  "Logdate"    => "26-08-2014 11:16:18,021",
  "Severity"   => "DEBUG",
  "Class"      => "com.fnx.snapshot.mdb.SnapshotMDB",
  "Stack"      => " - SnapshotMDB Ctor is called\r"
}
ElasticSearch result
{
  "_index": "logstash-2014.08.28",
  "_type": "dashboard_a",
  "_id": "-y23oNeLQs2mMbyz6oRyew",
  "_score": 1,
  "_source": {
    "@timestamp": "2014-08-28T14:31:38.753Z",
    "message": "15:07,565 [DEBUG] com.fnx.snapshot.mdb.SnapshotMDB - SnapshotMDB Ctor is called\r",
    "@version": "1",
    "host": "bts10d1",
    "path": "D:\\ElasticSearch\\logstash-1.4.2\\Dashboard_A\\Log_1\\6.log",
    "type": "dashboard_a",
    "tags": ["_grokparsefailure"]
  }
}
Please make sure all your logs are in the expected format!
You can see in the Logstash command line trace that the log is
26-08-2014 11:16:18,021 [DEBUG] com.fnx.snapshot.mdb.SnapshotMDB - SnapshotMDB Ctor is called\r
but in Elasticsearch the log is
15:07,565 [DEBUG] com.fnx.snapshot.mdb.SnapshotMDB - SnapshotMDB Ctor is called\r
The two logs have different times and their formats are not the same: the second one has no date information at all, so it causes a grok parsing failure (hence the _grokparsefailure tag and the fallback @timestamp). Go and check the original logs, or provide a sample of them for more discussion if they all appear to be properly formatted.
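If some lines genuinely arrive truncated like that, one defensive sketch (it does not fix the truncation itself) is to run the date filter only when grok succeeded, so malformed lines keep the ingestion @timestamp instead of failing silently; note HH (24-hour) rather than hh, since hours above 12 would otherwise not parse:
filter {
  grok {
    match => [ "message", "%{DATESTAMP:Logdate} \[%{WORD:Severity}\] %{JAVACLASS:Class} %{GREEDYDATA:Stack}" ]
  }
  if "_grokparsefailure" not in [tags] {
    date {
      match => [ "Logdate", "dd-MM-yyyy HH:mm:ss,SSS" ]
    }
  }
}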
