Filter jdbc data in Logstash - elasticsearch

In my DB, I have data in the below format (reconstructed here from the expected output below):
item_type  item_name      item_value
10         fruits         20
10         veggies        32
10         butter         11
11         hair gel       50
11         shampoo        35
12         tape           9
12         10mm screw     7
12         blinker fluid  78
But in Elasticsearch I want to push the data grouped by item type, so that each record in Elasticsearch lists all the item names & their values for one item type.
Like this:
{
  "_index": "daily_needs",
  "_type": "id",
  "_id": "10",
  "_source": {
    "item_type": "10",
    "fruits": "20",
    "veggies": "32",
    "butter": "11"
  }
}
{
  "_index": "daily_needs",
  "_type": "id",
  "_id": "11",
  "_source": {
    "item_type": "11",
    "hair gel": "50",
    "shampoo": "35"
  }
}
{
  "_index": "daily_needs",
  "_type": "id",
  "_id": "12",
  "_source": {
    "item_type": "12",
    "tape": "9",
    "10mm screw": "7",
    "blinker fluid": "78"
  }
}
Can I achieve this in Logstash?
I'm new to Logstash, but as per my understanding this can be done with a filter. I'm not sure, though, which filter to use, or whether I have to create a custom filter for this.
Current conf example:
input {
  jdbc {
    jdbc_driver_library => "ojdbc6.jar"
    jdbc_driver_class => "Java::oracle.jdbc.driver.OracleDriver"
    jdbc_connection_string => "myjdbc-configs"
    jdbc_user => "dbuser"
    jdbc_password => "dbpasswd"
    schedule => "* * * * *"
    statement => "SELECT * from item_table"
  }
}

filter {
  ## WHAT TO WRITE HERE??
}

output {
  elasticsearch {
    hosts => [ "http://myeshost/" ]
    index => "myindex"
  }
}
Kindly suggest. Thank you.

You can achieve this using the aggregate filter plugin. I have not tested the below, but it should give you an idea.
filter {
  aggregate {
    task_id => "%{item_type}"   # one aggregation map per item_type
    code => "
      map['item_type'] = event.get('item_type')
      map[event.get('item_name')] = event.get('item_value')
    "
    push_previous_map_as_event => true
    timeout => 3600
    timeout_tags => ['aggregated']   # pushed map events get this tag, so we keep only those
  }
  if "aggregated" not in [tags] {
    drop {}
  }
}
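Since your expected documents use the item_type value as the Elasticsearch _id, you will probably also want to set document_id on the elasticsearch output. A minimal sketch reusing the index name from your expected output (adjust to your real index):
output {
  elasticsearch {
    hosts => [ "http://myeshost/" ]
    index => "daily_needs"
    document_id => "%{item_type}"   # use the aggregated event's item_type as the ES _id
  }
}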
Important caveats for using the aggregate filter:
The SQL query MUST order the results by item_type, so the events do not arrive out of order.
Column names in the SQL query must match the field names used inside the filter's map[].
You should use ONLY ONE worker thread for aggregations, otherwise events may be processed out of sequence and unexpected results will occur (see the sketch after this list).
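To satisfy the single-worker caveat, run the pipeline with one worker thread. A minimal sketch with a made-up config path (the same can be set as pipeline.workers: 1 in logstash.yml or pipelines.yml):
bin/logstash -f /etc/logstash/conf.d/items.conf -w 1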

Related

Logstash Config: how to transfer AWS S3 CSV without header to Elasticsearch

I have a sample csv file in S3 with 3 columns and no header. During the data transfer from the S3 csv to Elasticsearch, I want to give each column a name (in my case id, name, age for columns 0 to 2 respectively).
Input Sample.csv
1,myname,23
2,myname2,24
Expected output should be the following docs in the ES index:
[{
  "_index": "user_detail",
  "_type": "user_detail_type",
  "_id": "1",
  "_score": 1.0,
  "_source": {
    "id": "1",
    "name": "myname",
    "age": "23"
  }
},
{
  "_index": "user_detail",
  "_type": "user_detail_type",
  "_id": "2",
  "_score": 1.0,
  "_source": {
    "id": "2",
    "name": "myname2",
    "age": "24"
  }
}]
Logstash config that I have written is:
input {
  s3 {
    bucket => "users"
    region => "us-east-1"
    watch_for_new_files => false
    prefix => "user.csv"
  }
}

filter {
  # Need help here
}

output {
  elasticsearch {
    hosts => "localhost:9200"
    index => "user_detail"
    document_type => "user_detail_type"
    document_id => "%{id}"
  }
}
Doubt:
What should I write in the filter section (or change elsewhere in the config) to map column[0] => id, column[1] => name, column[2] => age during Elasticsearch insertion?
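For reference, a minimal, untested sketch of the filter section using the csv filter to name the three columns (this assumes the default comma separator and that id/name/age are exactly the field names you want in the documents):
filter {
  csv {
    separator => ","
    columns => ["id", "name", "age"]   # column 0 -> id, column 1 -> name, column 2 -> age
  }
}
With the columns named like this, the existing document_id => "%{id}" in the output should also resolve correctly.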

How to get fields inside message array from Logstash?

I've been trying to configure a Logstash pipeline whose input type is snmptrap, along with yamlmibdir. Here's the code:
input {
  snmptrap {
    host => "abc"
    port => 1062
    yamlmibdir => "/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/snmp-1.3.2/data/ruby/snmp/mibs"
  }
}

filter {
  mutate {
    gsub => ["message","^\"{","{"]
    gsub => ["message","}\"$","}"]
    gsub => ["message","[\\]",""]
  }
  json { source => "message" }
  split {
    field => "message"
    target => "evetns"
  }
}

output {
  elasticsearch {
    hosts => "xyz"
    index => "logstash-%{+YYYY.MM.dd}"
  }
  stdout { codec => rubydebug }
}
and here is the result shown in Kibana (JSON format):
{
  "_index": "logstash-2019.11.18-000001",
  "_type": "_doc",
  "_id": "Y_5zjG4B6M9gb7sxUJwG",
  "_version": 1,
  "_score": null,
  "_source": {
    "@version": "1",
    "@timestamp": "2019-11-21T05:33:07.675Z",
    "tags": [
      "_jsonparsefailure"
    ],
    "1.11.12.13.14.15": "teststring",
    "message": "#<SNMP::SNMPv1_Trap:0x244bf33f @enterprise=[1.2.3.4.5.6], @timestamp=#<SNMP::TimeTicks:0x196a1590 @value=55>, @varbind_list=[#<SNMP::VarBind:0x21f5e155 @name=[1.11.12.13.14.15], @value=\"teststring\">], @specific_trap=99, @source_ip=\"xyz\", @agent_addr=#<SNMP::IpAddress:0x5a5c3c5f @value=\"xC0xC1xC2xC3\">, @generic_trap=6>",
    "host": "xyz"
  },
  "fields": {
    "@timestamp": [
      "2019-11-21T05:33:07.675Z"
    ]
  },
  "sort": [
    1574314387675
  ]
}
As you can see, the message field looks like an array, so how can I get all the fields inside it and be able to select those fields to display in Kibana?
ps1. I still get the _jsonparsefailure tag if I select type 'Table' in the expanded document.
ps2. Even though I use gsub to remove '\' from the expected JSON result, why do I still get a result with '\'?
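For what it's worth, the message field here is not JSON at all: it is the Ruby inspect string of an SNMP::SNMPv1_Trap object, which is why the json filter tags the event with _jsonparsefailure and why gsub cannot turn it into valid JSON (note that the varbind value has already been flattened into the top-level field "1.11.12.13.14.15"). If you only need a couple of values out of that string, a rough, untested grok sketch like the one below might help; specific_trap and varbind_value are illustrative field names, not anything the snmptrap input defines:
filter {
  grok {
    break_on_match => false   # try every pattern, not just the first one that matches
    match => {
      "message" => [
        '@specific_trap=%{NUMBER:specific_trap:int}',
        '@varbind_list=\[#<SNMP::VarBind%{DATA} @value="%{DATA:varbind_value}"'
      ]
    }
  }
}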

Outputting document metadata from ElasticSearch using Logstash output csv plugin

I am attempting to output the _id metadata field from ES into a CSV file using Logstash.
{
  "_index": "data",
  "_type": "default",
  "_id": "vANfNGYB9XD0VZRJUFfy",
  "_version": 1,
  "_score": null,
  "_source": {
    "vulnid": "CVE-2018-1000060",
    "product": [],
    "year": "2018",
    "month": "02",
    "day": "09",
    "hour": "23",
    "minute": "29",
    "published": "2018-02-09T18:29:02.213-05:00"
  },
  "sort": [
    1538424651203
  ]
}
My logstash output filter is:
output {
  csv {
    fields => [ "_id", "vulnid", "published" ]
    path => "/tmp/export.%{+YYYY-MM-dd-hh-mm}.csv"
  }
}
I get output:
,CVE-2018-1000060,2018-02-09T18:29:02.213-05:00
But I would like to get:
vANfNGYB9XD0VZRJUFfy,CVE-2018-1000060,2018-02-09T18:29:02.213-05:00
How can I output the metadata _id into the CSV file?
It does not matter whether I specify the field as "_id", "@_id" or "@id".
When we query ES we have to enable docinfo => true. By default it is false.
input {
  elasticsearch {
    hosts => [ your hosts ]
    index => "ti"
    query => '{your query}'
    size => 1000
    scroll => "1s"
    docinfo => true
    schedule => "14 * * * *"
  }
}
Logstash is not able to get the "_id" field from your input because you have not set the option docinfo to true.
docinfo includes Elasticsearch document information such as _index, _type and _id. Please have a look here for more info: https://www.elastic.co/guide/en/logstash/current/plugins-inputs-elasticsearch.html#plugins-inputs-elasticsearch-docinfo
Use your input plugin as:
input {
  elasticsearch {
    hosts => "hostname"
    index => "yourIndex"
    query => '{ "query": { "query_string": { "query": "*" } } }'   # optional
    size => 500      # optional
    scroll => "5m"   # optional
    docinfo => true
  }
}
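One more detail worth checking: with docinfo => true the document information is stored under [@metadata] (assuming the plugin's default docinfo_target), and @metadata fields are only written to outputs when referenced explicitly. So the csv output would need to address the id as a field reference, roughly like this:
output {
  csv {
    fields => [ "[@metadata][_id]", "vulnid", "published" ]
    path => "/tmp/export.%{+YYYY-MM-dd-hh-mm}.csv"
  }
}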

Nested objects in ElasticSearch with MySQL data

I am new to ES and trying to load data from MySQL into Elasticsearch using the Logstash JDBC input.
In my situation I want to use column values as field names. Please see new & hex in the output data below: I want the 'id' values to become field names.
MySQL data
cid  id   color     new   hex      create           modified
1    101  100 euro  abcd  #86c67c  5/5/2016 15:48   5/13/2016 14:15
1    102  100 euro  1234  #fdf8ff  5/5/2016 15:48   5/13/2016 14:15
Output needed:
{
  "_index": "colors_hexa",
  "_type": "colors",
  "_id": "1",
  "_version": 218,
  "found": true,
  "_source": {
    "cid": 1,
    "color": "100 euro",
    "new": {
      "101": "abcd",
      "102": "1234"
    },
    "hex": {
      "101": "#86c67c",
      "102": "#fdf8ff"
    },
    "created": "2016-05-05T10:18:51.000Z",
    "modified": "2016-05-13T08:45:30.000Z",
    "@version": "1",
    "@timestamp": "2016-05-14T01:30:00.059Z"
  }
}
Logstash config:
input {
  jdbc {
    jdbc_driver_library => "/etc/logstash/mysql/mysql-connector-java-5.1.39-bin.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://127.0.0.1:3306/test"
    jdbc_user => "root"
    jdbc_password => "*****"
    schedule => "* * * * *"
    statement => "select cid,id,color, new ,hexa_value ,created,modified from colors_hex_test order by cid"
    jdbc_paging_enabled => "true"
    jdbc_page_size => "50000"
  }
}

output {
  elasticsearch {
    index => "colors_hexa"
    document_type => "colors"
    document_id => "%{cid}"
    hosts => "localhost:9200"
  }
}
Can anyone please help with a filter for this data? The 'new' & 'hex' fields are the issue here: I'm trying to convert the two records into a single document.
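One way to get there is the same aggregate filter approach as in the first answer above: group the rows by cid, use each row's id value as a key inside nested new/hex hashes, and keep only the aggregated event. An untested sketch, assuming the column names from your SELECT, results ordered by cid, and a single worker thread:
filter {
  aggregate {
    task_id => "%{cid}"
    code => "
      map['cid']      ||= event.get('cid')
      map['color']    ||= event.get('color')
      map['created']  ||= event.get('created')
      map['modified'] ||= event.get('modified')
      map['new'] ||= {}
      map['hex'] ||= {}
      map['new'][event.get('id').to_s] = event.get('new')
      map['hex'][event.get('id').to_s] = event.get('hexa_value')
    "
    push_previous_map_as_event => true
    timeout => 3600
    timeout_tags => ['aggregated']   # pushed map events get this tag
  }
  if "aggregated" not in [tags] {
    drop {}
  }
}
Your existing output with document_id => "%{cid}" would then index one document per cid.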

Load array data from MySQL to ElasticSearch using logstash jdbc

Hi, I am new to ES and I'm trying to load data from MySQL into Elasticsearch.
I am getting the below error when trying to load the data in array format. Any help?
Here is the MySQL data; I need array data for the new & hex value columns:
cid  color     new   hex      create           modified
1    100 euro  abcd  #86c67c  5/5/2016 15:48   5/13/2016 14:15
1    100 euro  1234  #fdf8ff  5/5/2016 15:48   5/13/2016 14:15
Here is the Logstash config:
input {
  jdbc {
    jdbc_driver_library => "/etc/logstash/mysql/mysql-connector-java-5.1.39-bin.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://127.0.0.1:3306/test"
    jdbc_user => "root"
    jdbc_password => "*****"
    schedule => "* * * * *"
    statement => "select cid,color, new as 'cvalue.new',hexa_value as 'cvalue.hexa',created,modified from colors_hex_test order by cid"
    jdbc_paging_enabled => "true"
    jdbc_page_size => "50000"
  }
}

output {
  elasticsearch {
    index => "colors_hexa"
    document_type => "colors"
    document_id => "%{cid}"
    hosts => "localhost:9200"
  }
}
I need array data for cvalue (new, hexa) like:
{
  "_index": "colors_hexa",
  "_type": "colors",
  "_id": "1",
  "_version": 218,
  "found": true,
  "_source": {
    "cid": 1,
    "color": "100 euro",
    "cvalue": {
      "new": "1234",
      "hexa_value": "#fdf8ff"
    },
    "created": "2016-05-05T10:18:51.000Z",
    "modified": "2016-05-13T08:45:30.000Z",
    "@version": "1",
    "@timestamp": "2016-05-14T01:30:00.059Z"
  }
}
This is the error I am getting while running Logstash:
"status"=>400, "error"=>{"type"=>"mapper_parsing_exception",
"reason"=>"Field name [cvalue.hexa] cannot contain '.'"}}}, :level=>:warn}
You can't have a '.' in an Elasticsearch field name, which is what the aliases 'cvalue.new' and 'cvalue.hexa' in your SELECT produce. Keep the plain column names in the query (new, hexa_value) and rename them into a nested object instead:
filter {
  mutate {
    rename => {
      "new"        => "[cvalue][new]"
      "hexa_value" => "[cvalue][hexa]"
    }
  }
}
