I am attempting to output the _id metadata field from ES into a CSV file using Logstash. A sample document looks like this:
{
"_index": "data",
"_type": "default",
"_id": "vANfNGYB9XD0VZRJUFfy",
"_version": 1,
"_score": null,
"_source": {
"vulnid": "CVE-2018-1000060",
"product": [],
"year": "2018",
"month": "02",
"day": "09",
"hour": "23",
"minute": "29",
"published": "2018-02-09T18:29:02.213-05:00",
},
"sort": [
1538424651203
]
}
My Logstash output configuration is:
output {
csv {
fields => [ "_id", "vulnid", "published" ]
path => "/tmp/export.%{+YYYY-MM-dd-hh-mm}.csv"
}
}
I get output:
,CVE-2018-1000060,2018-02-09T18:29:02.213-05:00
But I would like to get:
vANfNGYB9XD0VZRJUFfy,CVE-2018-1000060,2018-02-09T18:29:02.213-05:00
How can I output the _id metadata field into the CSV file?
It makes no difference whether I specify the field as "_id", "@_id", or "@id".
When querying ES with the elasticsearch input, we have to enable docinfo => true; by default it is false.
input {
elasticsearch {
hosts => [ your hosts ]
index => "ti"
query => '{your query}'
size => 1000
scroll => "1s"
docinfo => true
schedule => "14 * * * *"
}
}
Logstash is not able to get the "_id" field from your input because you have not set the docinfo option to true.
docinfo includes Elasticsearch document metadata such as _index, _type, and _id. For more information, see https://www.elastic.co/guide/en/logstash/current/plugins-inputs-elasticsearch.html#plugins-inputs-elasticsearch-docinfo
Use your input plugin like this:
input {
elasticsearch {
hosts => "hostname"
index => "yourIndex"
query => '{ "query": { "query_string": { "query": "*" } } }' //optional
size => 500 //optional
scroll => "5m" //optional
docinfo => true
}
}
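Note that with docinfo => true the document metadata does not become a regular top-level field; by default it is copied under [@metadata] (the docinfo_target option), so the csv output has to reference it there. A minimal, untested sketch of the output side, assuming the default docinfo_target (on newer plugin versions with ECS compatibility enabled the metadata may instead land under [@metadata][input][elasticsearch]):
output {
  csv {
    # _id is available under [@metadata] once docinfo => true is set
    fields => [ "[@metadata][_id]", "vulnid", "published" ]
    path => "/tmp/export.%{+YYYY-MM-dd-hh-mm}.csv"
  }
}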
I've been trying to configure a Logstash pipeline with an snmptrap input along with yamlmibdir. Here's the code:
input {
snmptrap {
host => "abc"
port => 1062
yamlmibdir => "/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/snmp-1.3.2/data/ruby/snmp/mibs"
}
}
filter {
mutate {
gsub => [
"message", "^\"{", "{",
"message", "}\"$", "}",
"message", "[\\]", ""
]
}
json { source => "message" }
split {
field => "message"
target => "events"
}
}
output {
elasticsearch {
hosts => "xyz"
index => "logstash-%{+YYYY.MM.dd}"
}
stdout { codec => rubydebug }
}
And the result shown in Kibana (JSON format):
{
"_index": "logstash-2019.11.18-000001",
"_type": "_doc",
"_id": "Y_5zjG4B6M9gb7sxUJwG",
"_version": 1,
"_score": null,
"_source": {
"#version": "1",
"#timestamp": "2019-11-21T05:33:07.675Z",
"tags": [
"_jsonparsefailure"
],
"1.11.12.13.14.15": "teststring",
"message": "#<SNMP::SNMPv1_Trap:0x244bf33f #enterprise=[1.2.3.4.5.6], #timestamp=#<SNMP::TimeTicks:0x196a1590 #value=55>, #varbind_list=[#<SNMP::VarBind:0x21f5e155 #name=[1.11.12.13.14.15], #value=\"teststring\">], #specific_trap=99, #source_ip=\"xyz\", #agent_addr=#<SNMP::IpAddress:0x5a5c3c5f #value=\"xC0xC1xC2xC3\">, #generic_trap=6>",
"host": "xyz"
},
"fields": {
"#timestamp": [
"2019-11-21T05:33:07.675Z"
]
},
"sort": [
1574314387675
]
}
As you can see, the message field is an array, so how can I get all the fields inside the array, and also be able to select those fields for display in Kibana?
PS1: I still get the _jsonparsefailure tag when I select the 'Table' type in the expanded document.
PS2: Even though I use gsub to remove '\' from the expected JSON result, why does the result still contain '\'?
In my DB, I have data in the below format:
But in Elasticsearch I want to push the data grouped by item type, so each record in Elasticsearch lists all item names and their values for that item type.
Like this:
{
"_index": "daily_needs",
"_type": "id",
"_id": "10",
"_source": {
"item_type: "10",
"fruits": "20",
"veggies": "32",
"butter": "11",
}
}
{
"_index": "daily_needs",
"_type": "id",
"_id": "11",
"_source": {
"item_type: "11",
"hair gel": "50",
"shampoo": "35",
}
}
{
"_index": "daily_needs",
"_type": "id",
"_id": "12",
"_source": {
"item_type: "12",
"tape": "9",
"10mm screw": "7",
"blinker fluid": "78",
}
}
Can I achieve this in Logstash?
I'm new to Logstash, but as per my understanding it can be done in a filter. However, I'm not sure which filter to use, or whether I have to create a custom filter for this.
Current conf example:
input {
jdbc {
jdbc_driver_library => "ojdbc6.jar"
jdbc_driver_class => "Java::oracle.jdbc.driver.OracleDriver"
jdbc_connection_string => "myjdbc-configs"
jdbc_user => "dbuser"
jdbc_password => "dbpasswd"
schedule => "* * * * *"
statement => "SELECT * from item_table"
}
}
filter {
## WHAT TO WRITE HERE??
}
output {
elasticsearch {
hosts => [ "http://myeshost/" ]
index => "myindex"
}
}
Kindly suggest. Thank you.
You can achieve this using the aggregate filter plugin. I have not tested the below, but it should give you an idea.
filter {
  aggregate {
    task_id => "%{item_type}"
    code => "
      map['item_type'] = event.get('item_type')
      map[event.get('item_name')] = event.get('item_value')
    "
    push_previous_map_as_event => true
    timeout => 3600
    timeout_tags => ['aggregated']
  }
  # keep only the aggregated events, drop the per-row source events
  if "aggregated" not in [tags] {
    drop {}
  }
}
Important caveats for using the aggregate filter:
The SQL query MUST order the results by item_type, so that the events are not out of order.
The column names returned by the SQL query must match the field names used in the filter's map[].
You should use ONLY ONE worker thread for aggregations, otherwise events may be processed out of sequence and unexpected results will occur (see the sketch below).
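A minimal sketch of how to force a single worker for this pipeline (the config file path is only a placeholder):
# run the pipeline with one worker thread
bin/logstash -f /etc/logstash/conf.d/items.conf --pipeline.workers 1
# ...or set it globally in logstash.yml:
# pipeline.workers: 1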
This is my JSON log file. I'm trying to store the file in Elasticsearch through Logstash.
{ "id": "135569", "title" : "Star Trek Beyond", "year":2016 , "genre":
["Action", "Adventure", "Sci-Fi"] }
After storing the data in Elasticsearch, my result is as follows:
{
"_index": "filebeat-6.2.4-2018.11.09",
"_type": "doc",
"_id": "n-J39mYB6zb53NvEugMO",
"_score": 1,
"_source": {
"#timestamp": "2018-11-09T03:15:32.262Z",
"source": "/Users/jinwoopark/Jin/json_files/testJson.log",
"offset": 106,
"message": """{ "id": "135569", "title" : "Star Trek Beyond", "year":2016 , "genre":["Action", "Adventure", "Sci-Fi"] }""",
"id": "%{id}",
"#version": "1",
"host": "Jinui-MacBook-Pro.local",
"tags": [
"beats_input_codec_plain_applied"
],
"prospector": {
"type": "log"
},
"title": "%{title}",
"beat": {
"name": "Jinui-MacBook-Pro.local",
"hostname": "Jinui-MacBook-Pro.local",
"version": "6.2.4"
}
}
}
What I'm trying to do is this:
I want to store only the "genre" value in the message field, and store the other values (e.g. id, title) in extra fields (the newly created id and title fields). But the extra fields were stored with unresolved values (%{id}, %{title}). It seems like I need to modify my Logstash json filter, and here I need your help.
My current Logstash configuration is as follows:
input {
beats {
port => 5044
}
}
filter {
json {
source => "genre" //want to store only genre (from json log) into message field
}
mutate {
add_field => {
"id" => "%{id}" // want to create extra field for id value from log file
"title" => "%{title}" // want to create extra field for title value from log file
}
}
date {
match => [ "timestamp", "dd/MM/yyyy:HH:mm:ss Z" ]
}
}
output {
elasticsearch {
hosts => ["http://localhost:9200"]
index => "%{[#metadata][beat]}-%{[#metadata][version]}-%{+YYYY.MM.dd}"
}
stdout {
codec => rubydebug
}
}
When you tell the json filter that the source is genre, it ignores the rest of the document, which would explain why you don't get an id or title.
It seems like you should parse the entire JSON document (i.e. the message field) instead, and then use the mutate filter's replace option to move the contents of genre into message.
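An untested sketch of that idea, using the field names from the question; note that genre is an array, so the %{genre} sprintf renders it as a plain string (use mutate's copy option instead if you want to keep the array intact):
filter {
  # parse the whole JSON line that Filebeat shipped in the message field
  json {
    source => "message"
  }
  # id, title and genre now exist as real fields; overwrite message with just the genre value
  mutate {
    replace => { "message" => "%{genre}" }
  }
}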
I used Kibana Dev Tools to run a range query, but 2 of the hits are outside my expectation. Why does this happen?
(image of the range query)
the query:
{
"query" : {
"constant_score" : {
"filter" : {
"range" : {
"rss" : {
"gte": 3000000
}
}
}
}
}
}
the result:
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 69,
"successful": 69,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 1,
"hits": [
{
"_index": "BBQ",
"_type": "BBQ",
"_id": "AWGJaCYkk-tGbWgj2e6R",
"_score": 1,
"_source": {
"message": [
"nodeProcessInfo"
],
"#timestamp": "2018-02-12T09:45:59.525Z",
"rss": "92636",
"#version": "1",
"host": "192.168.213.96"
}
},
{
"_index": "BBQ",
"_type": "BBQ",
"_id": "AWGJaJxzk-tGbWgj2e-V",
"_score": 1,
"_source": {
"message": [
"nodeProcessInfo"
],
"#timestamp": "2018-02-12T09:46:29.680Z",
"rss": "85272",
"#version": "1",
"host": "192.168.213.96"
}
}
]
}
}
The result of the range query is not what I expected: why does a document with rss = 92636 appear when the query asks for gte 3000000?
======================edit at 2018.2.13=========(1)
The log looks like this:
"nodeProcessInfo|auth-server-1|auth|9618|1.9|1.2|98060|2018-2-12 6:33:43 PM|"
The filter looks like this:
filter {
if "nodeProcessInfo" in [message] {
mutate {
split => ["message", "|"]
add_field => {
"serverId" => "%{[message[1]]}"
}
add_field => {
"serverType" => "%{[message[2]]}"
}
add_field => {
"pid" => "%{[message[3]]}"
}
add_field => {
"cpuAvg" => "%{[message[4]]}"
}
add_field => {
"memAvg" => "%{[message[5]]}"
}
add_field => {
"rss" => "%{[message[6]]}"
}
add_field => {
"time" => "%{[message[7]]}"
}
convert => ["rss", "integer"] # I try convert rss to int, but failed
add_tag => "nodeProcessInfo"
}
}
}
======================edit at 2018.2.13=========(2)
I put the convert code into a new mutate block, which did convert "rss" to an int type, but the result of the range query was still wrong. The changed code looks like this:
if "nodeProcessInfo" in [message] {
mutate {
split => ["message", "|"]
...
...
add_field => {
"rss" => "%{[message[6]]}"
}
}
mutate {
convert => ["rss", "integer"] # add a new mutate here
}
}
======================edit at 2018.2.13=========(3)
At last I found the reason why the range query was still wrong even though rss was converted to an int:
"You can't change existing mapping type, you need to create a new index with the correct mapping and index the data again."
So I created a new field name instead of rss, and the result of the range query is now correct.
Can you share the mapping of the index?
I think the problem is that, as I can see in the search results you have shared, the type of the rss field is text (string).
If that is the case, the range query you are using treats the values as strings and compares them as such, which explains the results you are getting.
What you are trying to use is a numeric range, which will only work if you index the data with the rss field mapped as a numeric type such as long and then run the same query.
You would then get the desired results.
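Since the mapping of the existing rss field cannot be changed in place (as noted in the question's third edit), a rough, untested Dev Tools sketch of the reindex approach; bbq_v2 is only a placeholder name, and on 6.x or earlier the properties block must sit under the mapping type name:
# check how rss is currently mapped
GET BBQ/_mapping

# create a new index with rss mapped as a numeric type
PUT bbq_v2
{
  "mappings": {
    "properties": {
      "rss": { "type": "long" }
    }
  }
}

# copy the existing documents into the new index
POST _reindex
{
  "source": { "index": "BBQ" },
  "dest": { "index": "bbq_v2" }
}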
I have a Logstash event, which has the following fields:
{
"_index": "logstash-2016.08.09",
"_type": "log",
"_id": "AVZvz2ix",
"_score": null,
"_source": {
"message": "function_name~execute||line_no~128||debug_message~id was not found",
"#version": "1",
"#timestamp": "2016-08-09T14:57:00.147Z",
"beat": {
"hostname": "coredev",
"name": "coredev"
},
"count": 1,
"fields": null,
"input_type": "log",
"offset": 22299196,
"source": "/project_root/project_1/log/core.log",
"type": "log",
"host": "coredev",
"tags": [
"beats_input_codec_plain_applied"
]
},
"fields": {
"#timestamp": [
1470754620147
]
},
"sort": [
1470754620147
]
}
I am wondering how to use a filter (kv maybe?) to extract core.log from "source": "/project_root/project_1/log/core.log" and put it in e.g. [@metadata][log_type], so that later on I can use log_type in the output to create a unique index composed of hostname + log_type + timestamp, e.g.
output {
elasticsearch {
hosts => "localhost:9200"
manage_template => false
index => "%{[#metadata][_source][host]}-%{[#metadata][log_type]}-%{+YYYY.MM.dd}"
document_type => "%{[#metadata][type]}"
}
stdout { codec => rubydebug }
}
You can leverage the mutate/gsub filter in order to achieve this:
filter {
# add the log_type metadata field
mutate {
add_field => {"[#metadata][log_type]" => "%{source}"}
}
# remove everything up to the last slash
mutate {
gsub => [ "[#metadata][log_type]", "^.*\/", "" ]
}
}
Then you can modify your elasticsearch output like this:
output {
elasticsearch {
hosts => ["localhost:9200"]
manage_template => false
index => "%{host}-%{[#metadata][log_type]}-%{+YYYY.MM.dd}"
document_type => "%{[#metadata][type]}"
}
stdout { codec => rubydebug }
}