How to get fields inside message array from Logstash? - elasticsearch

I've been trying to configure a logstash pipeline with input type is snmptrap along with yamlmibdir. Here's the code
input {
snmptrap {
host => "abc"
port => 1062
yamlmibdir => "/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/snmp-1.3.2/data/ruby/snmp/mibs"
}
}
filter {
mutate {
gsub => ["message","^\"{","{"]
gsub => ["message","}\"$","}"]
gsub => ["message","[\\]",""]
}
json { source => "message" }
split {
field => "message"
target => "evetns"
}
}
output {
elasticsearch {
hosts => "xyz"
index => "logstash-%{+YYYY.MM.dd}"
}
stdout { codec => rubydebug }
}
and the result shown in Kibana (JSON format)
{
"_index": "logstash-2019.11.18-000001",
"_type": "_doc",
"_id": "Y_5zjG4B6M9gb7sxUJwG",
"_version": 1,
"_score": null,
"_source": {
"#version": "1",
"#timestamp": "2019-11-21T05:33:07.675Z",
"tags": [
"_jsonparsefailure"
],
"1.11.12.13.14.15": "teststring",
"message": "#<SNMP::SNMPv1_Trap:0x244bf33f #enterprise=[1.2.3.4.5.6], #timestamp=#<SNMP::TimeTicks:0x196a1590 #value=55>, #varbind_list=[#<SNMP::VarBind:0x21f5e155 #name=[1.11.12.13.14.15], #value=\"teststring\">], #specific_trap=99, #source_ip=\"xyz\", #agent_addr=#<SNMP::IpAddress:0x5a5c3c5f #value=\"xC0xC1xC2xC3\">, #generic_trap=6>",
"host": "xyz"
},
"fields": {
"#timestamp": [
"2019-11-21T05:33:07.675Z"
]
},
"sort": [
1574314387675
]
}
As you can see in the message field, it's an array so how can I get all the field inside the array. also able to select these field to display on Kibana.
ps1. still got tags _jsonparsefailure if select type 'Table' in Expanded document
ps2. even if using gsub for remove '\' from expected json result, why still got an result with '\' ?

Related

Outputting document metadata from ElasticSearch using Logstash output csv plugin

I am attempting to output the _id metadata field from ES into a CSV file using Logstash.
{
"_index": "data",
"_type": "default",
"_id": "vANfNGYB9XD0VZRJUFfy",
"_version": 1,
"_score": null,
"_source": {
"vulnid": "CVE-2018-1000060",
"product": [],
"year": "2018",
"month": "02",
"day": "09",
"hour": "23",
"minute": "29",
"published": "2018-02-09T18:29:02.213-05:00",
},
"sort": [
1538424651203
]
}
My logstash output filter is:
output { csv { fields => [ "_id", "vulnid", "published"] path =>
"/tmp/export.%{+YYYY-MM-dd-hh-mm}.csv" } }
I get output:
,CVE-2018-1000060,2018-02-09T18:29:02.213-05:00
But I would like to get:
vANfNGYB9XD0VZRJUFfy,CVE-2018-1000060,2018-02-09T18:29:02.213-05:00
How to output the metadata _id into the csv file?
It does not matter if I specify the field like "_id" or "#_id" or "#id".
When we query ES we have to enable docinfo => true. By default it is false.
input {
elasticsearch {
hosts => [ your hosts ]
index => "ti"
query => '{your query}'
size => 1000
scroll => "1s"
docinfo => true
schedule => "14 * * * *"
}
}
Well logstash is not able to get "_id" field from your input, because you must not have set the option docinfo into true.
docinfo helps to include elasticsearch documents information such as index,type _id etc..Please have a look here for more info https://www.elastic.co/guide/en/logstash/current/plugins-inputs-elasticsearch.html#plugins-inputs-elasticsearch-docinfo
use your input plugin as
input {
elasticsearch {
hosts => "hostname"
index => "yourIndex"
query => '{ "query": { "query_string": { "query": "*" } } }' //optional
size => 500 //optional
scroll => "5m" //optional
docinfo => true
}
}

how to store my json log file to logstash with json filter

This is my json log file. I'm trying to store the file to my elastic-Search through my logstash.
{ "id": "135569", "title" : "Star Trek Beyond", "year":2016 , "genre":
["Action", "Adventure", "Sci-Fi"] }
after storing the data into the elasticSearch, my results is as follow
{
"_index": "filebeat-6.2.4-2018.11.09",
"_type": "doc",
"_id": "n-J39mYB6zb53NvEugMO",
"_score": 1,
"_source": {
"#timestamp": "2018-11-09T03:15:32.262Z",
"source": "/Users/jinwoopark/Jin/json_files/testJson.log",
"offset": 106,
"message": """{ "id": "135569", "title" : "Star Trek Beyond", "year":2016 , "genre":["Action", "Adventure", "Sci-Fi"] }""",
"id": "%{id}",
"#version": "1",
"host": "Jinui-MacBook-Pro.local",
"tags": [
"beats_input_codec_plain_applied"
],
"prospector": {
"type": "log"
},
"title": "%{title}",
"beat": {
"name": "Jinui-MacBook-Pro.local",
"hostname": "Jinui-MacBook-Pro.local",
"version": "6.2.4"
}
}
}
What I'm trying to do is that,
I want to store only "genre value" into the message field, and store other values(ex id, title) into extra fields(the created fields, which is id and title field). but the extra fields were stored with empty values(%{id}, %{title}). It seems like I need to modify my logstash json filter, but here I need your help.
my current configuration of logstash is as follow
input {
beats {
port => 5044
}
}
filter {
json {
source => "genre" //want to store only genre (from json log) into message field
}
mutate {
add_field => {
"id" => "%{id}" // want to create extra field for id value from log file
"title" => "%{title}" // want to create extra field for title value from log file
}
}
date {
match => [ "timestamp", "dd/MM/yyyy:HH:mm:ss Z" ]
}
}
output {
elasticsearch {
hosts => ["http://localhost:9200"]
index => "%{[#metadata][beat]}-%{[#metadata][version]}-%{+YYYY.MM.dd}"
}
stdout {
codec => rubydebug
}
}
When you tell the json filter that the source is genre, it should ignore the rest of the document, which would explain why you don't get an id or title.
Seems like you should parse the entire json document, and use the mutate->replace plugin to move the contents of genre to message.

Elasticsearch - Range query work wrong

I used kibana DEV Tools to query some range data,but there have 2 hits is out of my expectation,why it happens?
image of the range query
the query:
{
"query" : {
"constant_score" : {
"filter" : {
"range" : {
"rss" : {
"gte": 3000000
}
}
}
}
}
}
the result:
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 69,
"successful": 69,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 1,
"hits": [
{
"_index": "BBQ",
"_type": "BBQ",
"_id": "AWGJaCYkk-tGbWgj2e6R",
"_score": 1,
"_source": {
"message": [
"nodeProcessInfo"
],
"#timestamp": "2018-02-12T09:45:59.525Z",
"rss": "92636",
"#version": "1",
"host": "192.168.213.96"
}
},
{
"_index": "BBQ",
"_type": "BBQ",
"_id": "AWGJaJxzk-tGbWgj2e-V",
"_score": 1,
"_source": {
"message": [
"nodeProcessInfo"
],
"#timestamp": "2018-02-12T09:46:29.680Z",
"rss": "85272",
"#version": "1",
"host": "192.168.213.96"
}
}
]
}
}
The result of range query is not in my expectation, why gte => 3000000 but rss = 92636 appeared?
======================edit at 2018.2.13=========(1)
the log like this:
"nodeProcessInfo|auth-server-1|auth|9618|1.9|1.2|98060|2018-2-12 6:33:43 PM|"
the filter like this:
filter {
if "nodeProcessInfo" in [message] {
mutate {
split => ["message", "|"]
add_field => {
"serverId" => "%{[message[1]]}"
}
add_field => {
"serverType" => "%{[message[2]]}"
}
add_field => {
"pid" => "%{[message[3]]}"
}
add_field => {
"cpuAvg" => "%{[message[4]]}"
}
add_field => {
"memAvg" => "%{[message[5]]}"
}
add_field => {
"rss" => "%{[message[6]]}"
}
add_field => {
"time" => "%{[message[7]]}"
}
convert => ["rss", "integer"] # I try convert rss to int, but failed
add_tag => "nodeProcessInfo"
}
}
}
======================edit at 2018.2.13=========(2)
I let the convert code in a new mutate, and it worked to make "rss" into int type,but the result of range query also wrong,the change code like this:
if "nodeProcessInfo" in [message] {
mutate {
split => ["message", "|"]
...
...
add_field => {
"rss" => "%{[message[6]]}"
}
}
mutate {
convert => ["rss", "integer"] # add a new mutate here
}
}
======================edit at 2018.2.13=========(3)
At last I found the reason why rss'type is converted to int but range query also wrong:
"You can't change existing mapping type, you need to create a new index with the correct mapping and index the data again."
so I create a new field name to instead of rss and the result of range query is right now.
Can you share the mapping of the index.
I thing the problem is as i can see in the search results which you have shared , the type of the rss field is text or string.
If it is so then the range query you are using is treating them as string characters and giving you results according to that.
And what you are trying to use is number ranges which will work if you index data with type of rss field as long and then fire the same query.
You would then get the desired reuslts

logstash index a text file (username email password [duplicate]

I'd like to import a text file in Elasticsearch. The text file contains 3 values per line. After spending several hours of struggling, I didn't get it done. Help is greatly appreciated.
Elasticsearch 5.4.0 with Logstash installed.
Sample data:
username email hash
username email hash
username email hash
username email hash
username email hash
also built a python script but its too slow:
import requests
import json
from elasticsearch import Elasticsearch
es = Elasticsearch([{'host': 'localhost', 'port': 9200}])
i = 1
with open("my2") as fileobject:
for line in fileobject:
username, email, hash = line.strip('\n').split(' ')
body = {"username": username, "email": email, "password": hash}
es.index(index='dbs', doc_type='db1', id=i, body=body)
i += 1
edit:
Thanks its work but i guess my filter is bad because i want it to look like this:
{
"_index": "logstash-2017.06.01",
"_type": "db",
"_id": "AVxinqK5XRvft8kN7Q6M",
"_version": 1,
"_score": null,
"_source": {
"username": "Marlb0ro",
"email": "Marlb0ro#site.com",
"hash": "123456",
}
and it put the data like this:
{
"_index": "logstash-2017.06.01",
"_type": "logs",
"_id": "AVxinqK5XRvft8kN7Q6M",
"_version": 1,
"_score": null,
"_source": {
"path": "C:/Users/user/Desktop/user/log.txt",
"#timestamp": "2017-06-01T07:46:22.488Z",
"#version": "1",
"host": "DESKTOP-FNGSJ6C",
"message": "username email password",
"tags": [
"_grokparsefailure"
]
},
"fields": {
"#timestamp": [
1496303182488
]
},
"sort": [
1496303182488
]
}
Simply put this in a file called grok.conf:
input {
file {
path => "/path/to/your/file.log"
start_position => beginning
sincedb_path => "/dev/null"
}
}
filter {
grok {
match => {"message" => "%{WORD:username} %{WORD:email} %{WORD:hash}" }
}
}
output {
elasticsearch {
hosts => ["localhost:9200"]
}
}
Then run Logstash with bin/logstash -f grok.conf and you should be ok.

logstash splits event field values and assign to #metadata field

I have a logstash event, which has the following field
{
"_index": "logstash-2016.08.09",
"_type": "log",
"_id": "AVZvz2ix",
"_score": null,
"_source": {
"message": "function_name~execute||line_no~128||debug_message~id was not found",
"#version": "1",
"#timestamp": "2016-08-09T14:57:00.147Z",
"beat": {
"hostname": "coredev",
"name": "coredev"
},
"count": 1,
"fields": null,
"input_type": "log",
"offset": 22299196,
"source": "/project_root/project_1/log/core.log",
"type": "log",
"host": "coredev",
"tags": [
"beats_input_codec_plain_applied"
]
},
"fields": {
"#timestamp": [
1470754620147
]
},
"sort": [
1470754620147
]
}
I am wondering how to use filter (kv maybe?) to extract core.log from "source": "/project_root/project_1/log/core.log", and put it in e.g. [#metadata][log_type], and so later on, I can use log_type in output to create an unique index, composing of hostname + logtype + timestamp, e.g.
output {
elasticsearch {
hosts => "localhost:9200"
manage_template => false
index => "%{[#metadata][_source][host]}-%{[#metadata][log_type]}-%{+YYYY.MM.dd}"
document_type => "%{[#metadata][type]}"
}
stdout { codec => rubydebug }
}
You can leverage the mutate/gsub filter in order to achieve this:
filter {
# add the log_type metadata field
mutate {
add_field => {"[#metadata][log_type]" => "%{source}"}
}
# remove everything up to the last slash
mutate {
gsub => [ "[#metadata][log_type]", "^.*\/", "" ]
}
}
Then you can modify your elasticsearch output like this:
output {
elasticsearch {
hosts => ["localhost:9200"]
manage_template => false
index => "%{host}-%{[#metadata][log_type]}-%{+YYYY.MM.dd}"
document_type => "%{[#metadata][type]}"
}
stdout { codec => rubydebug }
}

Resources