Getting JSON format error in Logstash. Need to decode JSON from the message part - elasticsearch

I am using the code below in my Logstash pipeline, but the message field ends up JSON-encoded twice; I need it as single-encoded JSON. How can I decode it one step back?
input {
  file {
    id => "my_lt_log"
    path => "/logs/logtransformer.log"
    type => "log"
    start_position => "beginning"
  }
}
filter {
  if [type] == "log" {
    mutate {
      remove_field => [ "kubernetes" ]
    }
    mutate {
      gsub => [ "message", "(\W)at(\W)", '\1""\2' ]
    }
    if [message][metadata][proc_id] {
      mutate {
        add_field => { "[metadata][proc_id]" => "%{[message][metadata][proc_id]}" }
      }
    }
    if "_jsonparsefailure" in [tags] {
      mutate {
        add_field => {
          "logplane" => "adp-app-logs"
          "abc" => "%{[message]}"
        }
        remove_field => [ "message", "kubernetes" ]
      }
    }
    else {
      mutate {
        rename => {
          "path" => "filename"
        }
        add_field => {
          "def" => "%{[message]}"
          "message" => "%{[message][message]}"
          "timestamp" => "%{[message][timestamp]}"
        }
      }
    }
  }
}
output {
...
}
Output:
{
"_index" : "adp-app-logs-2023.02.03",
"_type" : "_doc",
"_id" : "PpiDF4YBCMtUNdxoMJFW",
"_score" : 0.79323065,
"_source" : {
"#version" : "1",
"service_id" : "%{[message][service_id]}",
"def" : "{\"version\": \"1.1.0\", \"timestamp\": \"2023-02-03T13:41:43.034Z\", \"severity\": \"info\", \"service_id\": \"eric-log-transformer\", \"metadata\" : {\"namespace\": \"zyadros\", \"pod_name\": \"eric-log-transformer-7b64896976-s6h5r\", \"node_name\": \"node-10-63-142-135\", \"pod_uid\": \"336c9706-41a9-41c0-b459-2eb4e9f6e2b4\", \"container_name\": \"logtransformer\"}, \"message\": \"Starting pipeline {:pipeline_id=>'opensearch', 'pipeline.workers'=>2, 'pipeline.batch.size'=>2048, 'pipeline.batch.delay'=>50, 'pipeline.max_inflight'=>4096, 'pipeline.sources'=>['/opt/logstash/resource/searchengine.conf'], :thread=>'#<Thread:0x7649ae47 run>'}\"}",
"#timestamp" : "2023-02-03T13:41:58.044976Z",
"severity" : "%{[message][severity]}"
}
}
Expected Output for the field 'def':
{"version": "1.1.0", "timestamp": "2023-02-06T06:18:33.647Z", "severity": "info", "service_id": "eric-log-transformer", "metadata" : {"namespace": "zyadros", "pod_name": "eric-log-transformer-5cb7dbc6b5-ghrsc", "node_name": "node-10-63-142-135", "pod_uid": "52b8e6fe-9547-4091-9034-36e1141f4391", "container_name": "logtransformer"}, "message": "Starting tcp input listener {:address=>'0.0.0.0:5015', :ssl_enable=>true}"}
Input file sample:
{"version": "1.1.0", "timestamp": "2023-02-06T13:42:59.634Z", "severity": "info", "service_id": "eric-log-transformer", "metadata" : {"namespace": "roshan", "pod_name": "eric-log-transformer-5bc84c4cb-c8qtw", "node_name": "node-10-63-142-135", "pod_uid": "00632d7b-d151-4b0c-84fe-8a6ee6b64b35", "container_name": "logtransformer"}, "message": "Starting pipeline {:pipeline_id=>'logstash', 'pipeline.workers'=>2, 'pipeline.batch.size'=>2048, 'pipeline.batch.delay'=>50, 'pipeline.max_inflight'=>4096, 'pipeline.sources'=>['/opt/logstash/resource/logstash.conf'], :thread=>'#<Thread:0x3d8b1518 run>'}"}
Getting the error below when adding a json filter with source => "message":
{
"_index" : "%{logplane}-2023.02.07",
"_type" : "_doc",
"_id" : "9d6xKoYBCoUR1nQuuhlp",
"_score" : 2.969562E-4,
"_source" : {
"path" : "/logs/logtransformer.log",
"#version" : "1",
"tags" : [
"_jsonparsefailure"
],
"#timestamp" : "2023-02-07T07:05:22.465556Z",
"host" : "eric-log-transformer-d6dddd6f9-lp6d7",
"message" : " at [Source: (byte[])' at [Source: (byte[])' at [Source: (byte[])'{'version': '1.1.0', 'timestamp': '2023-02-07T07:05:06.163Z', 'severity': 'warning', 'service_id': 'eric-log-transformer', 'metadata' : {'namespace': 'zyadros', 'pod_name': 'eric-log-transformer-d6dddd6f9-lp6d7', 'node_name': 'node-10-63-142-138', 'pod_uid': '02659e5c-c9ac-49c8-a4bd-74b9477e846d', 'container_name': 'logtransformer'}, 'message': 'Error parsing json {:source=>'message', :raw=>'{\\'version\\': \\'1.1.0\\', \\'timestamp\\': \\'2023-02-07T07:05:02'[truncated 68 bytes]; line: 1, column: 5]>}\"}"
}
},
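For reference, a minimal sketch of the usual single-parse approach, assuming one JSON object per line as in the input sample above. The target field name parsed_log is only an illustration, and note that the gsub in the original filter rewrites the raw line before any JSON parsing, which may itself be what triggers the _jsonparsefailure:
filter {
  # Parse the raw JSON line exactly once into [parsed_log];
  # events that fail to parse are tagged _jsonparsefailure automatically.
  json {
    source => "message"
    target => "parsed_log"
  }
  if "_jsonparsefailure" not in [tags] {
    mutate {
      # keep the original single-encoded line and pick out sub-fields
      add_field => {
        "def" => "%{[message]}"
        "timestamp" => "%{[parsed_log][timestamp]}"
      }
    }
    mutate {
      # only after copying it, replace message with the inner message text
      replace => { "message" => "%{[parsed_log][message]}" }
    }
  }
}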

Related

How to fetch field from array of objects in Elasticsearch Index as CSV file to Google Cloud Storage Using Logstash

I am using Elasticsearch to index data and want to export a few fields from an index created every day to Google Cloud Storage. How can I get fields from an array of objects in the Elasticsearch index and send them as a CSV file to a GCS bucket using Logstash?
I tried the conf below to fetch nested fields from the index:
input {
  elasticsearch {
    hosts => "host:443"
    user => "user"
    ssl => true
    connect_timeout_seconds => 600
    request_timeout_seconds => 600
    password => "pwd"
    ca_file => "ca.crt"
    index => "test"
    query => '
    {
      "_source": ["obj1.Name","obj1.addr","obj1.obj2.location", "Hierarchy.categoryUrl"],
      "query": {
        "match_all": {}
      }
    }
    '
  }
}
filter {
  mutate {
    rename => {
      "[obj1][Name]" => "col1"
      "[obj1][addr]" => "col2"
      "[obj1][obj2][location]" => "col3"
      "[Hierarchy][0][categoryUrl]" => "col4"
    }
  }
}
output {
  google_cloud_storage {
    codec => csv {
      include_headers => true
      columns => [ "col1", "col2", "col3" ]
    }
    bucket => "bucket"
    json_key_file => "creds.json"
    temp_directory => "/tmp"
    log_file_prefix => "log_gcs"
    max_file_size_kbytes => 1024
    date_pattern => "%Y-%m-%dT%H:00"
    flush_interval_secs => 600
    gzip => false
    uploader_interval_secs => 600
    include_uuid => true
    include_hostname => true
  }
}
How can I get a field from an array of objects populated into the CSV above? In the example below I want to fetch categoryUrl from the first object of the array, populate it into the CSV table, and send it to the GCS bucket.
I have tried the approaches below:
"_source": ["obj1.Name","obj1.addr","obj1.obj2.location", "Hierarchy.categoryUrl"]
and
"_source": ["obj1.Name","obj1.addr","obj1.obj2.location", "Hierarchy[0].categoryUrl"]
with
mutate {
  rename => {
    "[obj1][Name]" => "col1"
    "[obj1][addr]" => "col2"
    "[obj1][obj2][location]" => "col3"
    "[Hierarchy][0][categoryUrl]" => "col4"
  }
}
For an input sample such as:
"Hierarchy" : [
  {
    "level" : "1",
    "category" : "test",
    "categoryUrl" : "testurl1"
  },
  {
    "level" : "2",
    "category" : "test2",
    "categoryUrl" : "testurl2"
  }
]
Attaching sample document where I am trying to fetch merchandisingHierarchy[0].categoryUrl and pricingInfo[0].basePrice :
{
"_index" : "amulya-test",
"_type" : "_doc",
"_id" : "ldZPJoYBFi8LOEDK_M2f",
"_score" : 1.0,
"_ignored" : [
"itemDetails.description.keyword"
],
"_source" : {
"itemDetails" : {
"compSku" : "202726",
"compName" : "abc.com",
"compWebsite" : "abc.com",
"title" : "Monteray 38.25 in. x 73.375 in. Frameless Hinged Corner Shower Enclosure in Brushed Nickel",
"description" : "Create the modthroom of your dreams with the clean lines of the VIGO Monteray Frameless Shower Enclosure. Solid 3/8 in. tempered glass combined with stainless steel and solid brass construction makes this enclosure strong and long-lasting. The sleek, reversible, outward-opening door features a convenient towel bar. This versatile enclosure can be installed on a tile floor or with a VIGO Shower Base. With a single water deflector along the bottom seal strip, water is redirected back into the shower to keep your bathroom dry, clean, and pristine.",
"modelNumber" : "VG6011BNCL40",
"upc" : "8137756684",
"hasVariations" : false,
"productDetailsBulletPoints" : [ ],
"itemUrls" : {
"productPageUrl" : "https://.abc.com/p/VIGO-Monteray-38-in-x-73-375-in-Frameless-Hinged-Corner-Shower-Enclosure-in-Brushed-Nickel-VG6011BNCL40/202722616",
"primaryImageUrl" : "https://images.thdstatic.com/productImages/d77d9e8b-1ea1-4811-a470-8364c8e47402/svn/vigo-shower-enclosures-vg6011bncl40-64_600.jpg",
"secondaryImageUrls" : [
"https://images.thdstatic.com/productImages/d77d9e8b-1e1-4811-a470-8364c8e47402/svn/vigo-shower-enclosures-vg6011bncl40-64_1000.jpg",
"https://images.thdstatic.com/productImages/db539ff9-6df-48c2-897a-18dd1e1794e3/svn/vigo-shower-enclosures-vg6011bncl40-e1_1000.jpg",
"https://images.thdstatic.com/productImages/47c5090b-49a-46bc-a36d-921ddae5e1ab/svn/vigo-shower-enclosures-vg6011bncl40-40_1000.jpg",
"https://images.thdstatic.com/productImages/add6691c-a02-466d-9a1a-47200b05685e/svn/vigo-shower-enclosures-vg6011bncl40-a0_1000.jpg",
"https://images.thdstatic.com/productImages/d638230e-0d9-40c9-be93-7f7bf24f0732/svn/vigo-shower-enclosures-vg6011bncl40-1d_1000.jpg"
]
}
},
"merchandisingHierarchy" : [
{
"level" : "1",
"category" : "Home",
"categoryUrl" : "host/"
},
{
"level" : "2",
"category" : "Bath",
"categoryUrl" : "host/b/Bath/N-5yc1vZbzb3"
},
{
"level" : "3",
"category" : "Showers",
"categoryUrl" : "host/b/Bath-Showers/N-5yc1vZbzcd"
},
{
"level" : "4",
"category" : "Shower Doors",
"categoryUrl" : "host/b/Bath-Showers-Shower-Doors/N-5yc1vZbzcg"
},
{
"level" : "5",
"category" : "Shower Enclosures",
"categoryUrl" : "host/b/Bath-Showers-Shower-Doors-Shower-Enclosures/N-5yc1vZcbn2"
}
],
"reviewsAndRatings" : {
"pdtReviewCount" : 105
},
"additionalAttributes" : {
"isAddon" : false
},
"productSpecifications" : {
"Warranties" : { },
"Details" : { },
"Dimensions" : { }
},
"promoDetails" : [
{
"promoName" : "Save $150.00 (15%)",
"promoPrice" : 849.9
}
],
"locationDetails" : { },
"storePickupDetails" : {
"deliveryText" : "Get it by Mon, Feb 20",
"toEddDate" : "Mon, Feb 20",
"isBackordered" : false,
"selectedEddZipcode" : "20147",
"shipToStoreEnabled" : true,
"homeDeliveryEnabled" : true,
"scheduledDeliveryEnabled" : false
},
"recommendedProducts" : [ ],
"pricingInfo" : [
{
"type" : "SAS",
"offerPrice" : 849.9,
"sellerName" : "abc.com",
"onClearance" : false,
"inStock" : true,
"isBuyBoxWinner" : true,
"promo" : [
{
"onPromo" : true,
"promoName" : "Save $150.00 (15%)",
"promoPrice" : 849.9
}
],
"basePrice" : 999.9,
"priceVariants" : [
{
"basePrice" : 999.9,
"offerPrice" : 849.9
}
],
"inventoryDetails" : {
"stockInStore" : false,
"stockOnline" : true
}
}
]
}
}
You can do it like this:
input {
  elasticsearch {
    ...
    query => '
    {
      "_source": ["merchandisingHierarchy.categoryUrl", "pricingInfo.basePrice"],
      "query": {
        "match_all": {}
      }
    }
    '
  }
}
filter {
  mutate {
    add_field => {
      "col1" => "%{[merchandisingHierarchy][0][categoryUrl]}"
      "col2" => "%{[pricingInfo][0][basePrice]}"
    }
  }
}
output {
  stdout {
    codec => csv {
      include_headers => true
      columns => [ "col1", "col2" ]
    }
  }
}
I've tested with your sample document and I get the output below, which looks like it's working per your expectation:
col1,col2
host/,999.9

ingest pipeline not preserving the date type field

Here is my JSON data that I am trying to send from Filebeat to the ingest pipeline "logpipeline.json" in OpenSearch.
json data
{
"#timestamp":"2022-11-08T10:07:05+00:00",
"client":"10.x.x.x",
"server_name":"example.stack.com",
"server_port":"80",
"server_protocol":"HTTP/1.1",
"method":"POST",
"request":"/example/api/v1/",
"request_length":"200",
"status":"500",
"bytes_sent":"598",
"body_bytes_sent":"138",
"referer":"",
"user_agent":"Java/1.8.0_191",
"upstream_addr":"10.x.x.x:10376",
"upstream_status":"500",
"gzip_ratio":"",
"content_type":"application/json",
"request_time":"6.826",
"upstream_response_time":"6.826",
"upstream_connect_time":"0.000",
"upstream_header_time":"6.826",
"remote_addr":"10.x.x.x",
"x_forwarded_for":"10.x.x.x",
"upstream_cache_status":"",
"ssl_protocol":"TLSv",
"ssl_cipher":"xxxx",
"ssl_session_reused":"r",
"request_body":"{\"date\":null,\"sourceType\":\"BPM\",\"processId\":\"xxxxx\",\"comment\":\"Process status: xxxxx: \",\"user\":\"xxxx\"}",
"response_body":"{\"statusCode\":500,\"reasonPhrase\":\"Internal Server Error\",\"errorMessage\":\"xxxx\"}",
"limit_req_status":"",
"log_body":"1",
"connection_upgrade":"close",
"http_upgrade":"",
"request_uri":"/example/api/v1/",
"args":""
}
Filebeat to OpenSearch log shipping:
# ---------------------------- Elasticsearch Output ----------------------------
output.elasticsearch:
  # Array of hosts to connect to.
  hosts: ["192.168.29.117:9200"]
  pipeline: logpipeline
  #index: "filebeatelastic-%{[agent.version]}-%{+yyyy.MM.dd}"
  index: "nginx_dev-%{+yyyy.MM.dd}"
  # Protocol - either `http` (default) or `https`.
  protocol: "https"
  ssl.enabled: true
  ssl.verification_mode: none
  # Authentication credentials - either API key or username/password.
  #api_key: "id:api_key"
  username: "filebeat"
  password: "filebeat"
I am carrying out the "data" field transformations in the ingest pipeline by doing type conversion for some of the fields, which works perfectly. The only problem I am facing is with "#timestamp".
"#timestamp" is of type "date". Once the JSON data goes through the pipeline, I map the JSON message to a root-level object called "data", and in that transformed data "data.#timestamp" shows up as type "string", even though I haven't applied any transformation to it.
Opensearch ingestpipeline - logpipeline.json
{
"description" : "Logging Pipeline",
"processors" : [
{
"json" : {
"field" : "message",
"target_field" : "data"
}
},
{
"date" : {
"field" : "data.#timestamp",
"formats" : ["ISO8601"]
}
},
{
"convert" : {
"field" : "data.body_bytes_sent",
"type": "integer",
"ignore_missing": true,
"ignore_failure": true
}
},
{
"convert" : {
"field" : "data.bytes_sent",
"type": "integer",
"ignore_missing": true,
"ignore_failure": true
}
},
{
"convert" : {
"field" : "data.request_length",
"type": "integer",
"ignore_missing": true,
"ignore_failure": true
}
},
{
"convert" : {
"field" : "data.request_time",
"type": "float",
"ignore_missing": true,
"ignore_failure": true
}
},
{
"convert" : {
"field" : "data.upstream_connect_time",
"type": "float",
"ignore_missing": true,
"ignore_failure": true
}
},
{
"convert" : {
"field" : "data.upstream_header_time",
"type": "float",
"ignore_missing": true,
"ignore_failure": true
}
},
{
"convert" : {
"field" : "data.upstream_response_time",
"type": "float",
"ignore_missing": true,
"ignore_failure": true
}
}
]
}
Is there any way I can preserve the "date" type of the "#timestamp" field even after the transformation carried out in the ingest pipeline?
Indexed document (screenshot not included).
Edit 1: updated ingest pipeline simulate result
{
"docs" : [
{
"doc" : {
"_index" : "_index",
"_id" : "_id",
"_source" : {
"index_date" : "2022.11.08",
"#timestamp" : "2022-11-08T12:07:05.000+02:00",
"message" : """
{ "#timestamp": "2022-11-08T10:07:05+00:00", "client": "10.x.x.x", "server_name": "example.stack.com", "server_port": "80", "server_protocol": "HTTP/1.1", "method": "POST", "request": "/example/api/v1/", "request_length": "200", "status": "500", "bytes_sent": "598", "body_bytes_sent": "138", "referer": "", "user_agent": "Java/1.8.0_191", "upstream_addr": "10.x.x.x:10376", "upstream_status": "500", "gzip_ratio": "", "content_type": "application/json", "request_time": "6.826", "upstream_response_time": "6.826", "upstream_connect_time": "0.000", "upstream_header_time": "6.826", "remote_addr": "10.x.x.x", "x_forwarded_for": "10.x.x.x", "upstream_cache_status": "", "ssl_protocol": "TLSv", "ssl_cipher": "xxxx", "ssl_session_reused": "r", "request_body": "{\"date\":null,\"sourceType\":\"BPM\",\"processId\":\"xxxxx\",\"comment\":\"Process status: xxxxx: \",\"user\":\"xxxx\"}", "response_body": "{\"statusCode\":500,\"reasonPhrase\":\"Internal Server Error\",\"errorMessage\":\"xxxx\"}", "limit_req_status": "", "log_body": "1", "connection_upgrade": "close", "http_upgrade": "", "request_uri": "/example/api/v1/", "args": ""}
""",
"data" : {
"server_name" : "example.stack.com",
"request" : "/example/api/v1/",
"referer" : "",
"log_body" : "1",
"upstream_addr" : "10.x.x.x:10376",
"body_bytes_sent" : 138,
"upstream_header_time" : 6.826,
"ssl_cipher" : "xxxx",
"response_body" : """{"statusCode":500,"reasonPhrase":"Internal Server Error","errorMessage":"xxxx"}""",
"upstream_status" : "500",
"request_time" : 6.826,
"upstream_cache_status" : "",
"content_type" : "application/json",
"client" : "10.x.x.x",
"user_agent" : "Java/1.8.0_191",
"ssl_protocol" : "TLSv",
"limit_req_status" : "",
"remote_addr" : "10.x.x.x",
"method" : "POST",
"gzip_ratio" : "",
"http_upgrade" : "",
"bytes_sent" : 598,
"request_uri" : "/example/api/v1/",
"x_forwarded_for" : "10.x.x.x",
"args" : "",
"#timestamp" : "2022-11-08T10:07:05+00:00",
"upstream_connect_time" : 0.0,
"request_body" : """{"date":null,"sourceType":"BPM","processId":"xxxxx","comment":"Process status: xxxxx: ","user":"xxxx"}""",
"request_length" : 200,
"ssl_session_reused" : "r",
"server_port" : "80",
"upstream_response_time" : 6.826,
"connection_upgrade" : "close",
"server_protocol" : "HTTP/1.1",
"status" : "500"
}
},
"_ingest" : {
"timestamp" : "2023-01-18T08:06:35.335066236Z"
}
}
}
]
}
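For reference (not part of the original post), simulate output like the above can be produced with the pipeline _simulate API; a minimal sketch against the pipeline name used in this question, with a deliberately small test document:
POST _ingest/pipeline/logpipeline/_simulate
{
  "docs": [
    {
      "_source": {
        "message": "{\"#timestamp\":\"2022-11-08T10:07:05+00:00\",\"status\":\"500\",\"bytes_sent\":\"598\"}"
      }
    }
  ]
}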
Finally able to resolve my issue. I updated filebeat.yml with the following. Previously the template name and pattern were different, but this default template name "filebeat" and pattern "filebeat" seem to be doing the job for me:
setup.template.name: "filebeat"
setup.template.pattern: "filebeat"
setup.template.settings:
  index.number_of_shards: 1
  #index.codec: best_compression
  #_source.enabled: false
But I still need to figure out how templates work, though.
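As an alternative worth mentioning (not from the original resolution), an explicit index template can pin the mapping of the nested timestamp so it stays a date regardless of what the pipeline emits. A minimal sketch for recent Elasticsearch/OpenSearch versions, where the template name and the nginx_dev-* pattern are assumptions based on the index name used in the question:
PUT _index_template/nginx_dev
{
  "index_patterns": ["nginx_dev-*"],
  "template": {
    "mappings": {
      "properties": {
        "data": {
          "properties": {
            "#timestamp": { "type": "date" }
          }
        }
      }
    }
  }
}
With a template like this in place, dynamic mapping no longer guesses the field type when the first document of the day arrives.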

aggregate multiple recursive logstash

I am using Logstash with a jdbc input and would like to embed one object inside another with the aggregate filter.
How can I add objects recursively, i.e. add an object inside another object?
This is an example of what I want:
{
  "_index": "my-index",
  "_type": "test",
  "_id": "1",
  "_version": 1,
  "_score": 1,
  "_source": {
    "id": "1",
    "properties": {
      "nested_1": [
        {
          "A": 0,
          "B": "true",
          "C": "PEREZ, MATIAS ROGELIO Y/O",
          "Nested_2": [
            {
              "Z1": "true",
              "Z2": "99999"
            }
          ]
        },
        {
          "A": 0,
          "B": "true",
          "C": "SALVADOR MATIAS ROMERO",
          "Nested_2": [
            {
              "Z1": "true",
              "Z2": "99999"
            }
          ]
        }
      ]
    }
  }
}
I'm using something like this, but it doesn't work:
aggregate {
task_id => "%{id}"
code => "
map['id'] = event.get('id')
map['nested_1_list'] ||= []
map['nested_1'] ||= []
if (event.get('id') != nil)
if !( map['nested_1_list'].include?event.get('id') )
map['nested_1_list'] << event.get('id')
map['nested_1'] << {
'A' => event.get('a'),
'B' => event.get('b'),
'C' => event.get('c'),
map['nested_2_list'] ||= []
map['nested_2'] ||= []
if (event.get('id_2') != nil)
if !( map['nested_2_list'].include?event.get('id_2') )
map['nested_2_list'] << event.get('id_2')
map['nested_2'] << {
'Z1' => event.get('z1'),
'Z2' => event.get('z2')
}
end
end
}
end
end
event.cancel()
"
push_previous_map_as_event => true
timeout => 3
}
Any idea how to implement this?
What I finally did was generate the JSON from the input itself, that is, from a stored procedure consumed through a view (vw) in the Logstash input statement.
Once consumed, I parse it as JSON and then have that JSON available to work with as just another field.
# Convert the string into real JSON (strips the quotes and backslashes)
ruby {
  code => "
    require 'json'
    json_value = JSON.parse(event.get('field_db').to_s)
    event.set('field_convert_to_json', json_value)
  "
}
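For what it's worth, the stock json filter can do the same conversion without a ruby block; a minimal sketch reusing the field names from the snippet above:
filter {
  # Parse the JSON string held in field_db and put the resulting object
  # into field_convert_to_json (same field names as the ruby version).
  json {
    source => "field_db"
    target => "field_convert_to_json"
  }
}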
Maybe you can try this. Note: this is applicable only when you want a single nested object, not an array of objects.
Please do visit my blog for other formats:
https://xyzcoder.github.io/2020/07/29/indexing-documents-using-logstash-and-python.html
input {
jdbc {
jdbc_driver_library => "/usr/share/logstash/javalib/mssql-jdbc-8.2.2.jre11.jar"
jdbc_driver_class => "com.microsoft.sqlserver.jdbc.SQLServerDriver"
jdbc_connection_string => "jdbc:sqlserver://host.docker.internal;database=StackOverflow2010;user=pavan;password=pavankumar#123"
jdbc_user => "pavan"
jdbc_password => "pavankumar#123"
statement => "select top 500 p.Id as PostId,p.AcceptedAnswerId,p.AnswerCount,p.Body,u.Id as userid,u.DisplayName,u.Location
from StackOverflow2010.dbo.Posts p inner join StackOverflow2010.dbo.Users u
on p.OwnerUserId=u.Id"
}
}
filter {
aggregate {
task_id => "%{postid}"
code => "
map['postid'] = event.get('postid')
map['accepted_answer_id'] = event.get('acceptedanswerid')
map['answer_count'] = event.get('answercount')
map['body'] = event.get('body')
map['user'] = {
'id' => event.get('userid'),
'displayname' => event.get('displayname'),
'location' => event.get('location')
}
map['user']['test'] = {
'test_body' => event.get('postid')
}
event.cancel()"
push_previous_map_as_event => true
timeout => 30
}
}
output {
elasticsearch {
hosts => ["http://elasticsearch:9200", "http://elasticsearch:9200"]
index => "stackoverflow_top"
}
stdout {
codec => rubydebug
}
}
and my output is
{
"_index" : "stackoverflow_top",
"_type" : "_doc",
"_id" : "S8WEmnMBrXsRTNbKO0JJ",
"_score" : 1.0,
"_source" : {
"#version" : "1",
"body" : """<p>How do I store binary data in MySQL?</p>
""",
"#timestamp" : "2020-07-29T12:20:22.649Z",
"answer_count" : 10,
"user" : {
"displayname" : "Geoff Dalgas",
"location" : "Corvallis, OR",
"test" : {
"test_body" : 17
},
"id" : 2
},
"postid" : 17,
"accepted_answer_id" : 26
}
Here the test object is nested inside the user object.
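If what you need is an array of nested objects rather than a single one (as in the original question), the same map-based pattern can be extended by appending one hash per row to an array held in the map. A rough sketch, assuming hypothetical jdbc columns id, a, b, c, z1 and z2; deduplication and error handling are left out:
filter {
  aggregate {
    task_id => "%{id}"
    code => "
      map['id'] ||= event.get('id')
      map['nested_1'] ||= []

      # one nested_1 entry per row, each carrying its own Nested_2 array
      map['nested_1'] << {
        'A' => event.get('a'),
        'B' => event.get('b'),
        'C' => event.get('c'),
        'Nested_2' => [
          { 'Z1' => event.get('z1'), 'Z2' => event.get('z2') }
        ]
      }

      event.cancel()
    "
    push_previous_map_as_event => true
    timeout => 30
  }
}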

Grok configuration ELK

I have an unusual type of log to parse. The syntax is:
2013-01-05 03:29:38,842 INFO [ajp-bio-8009-exec-69] web.CustomAuthenticationSuccessHandler - doLogin : admin.ebusiness date : 2013-01-05 03:29:38
When I use the grok pattern :
if [type] in ["edai"] {
grok {
match => { "message" => ["%{YEAR:year}-%{WORD:month}-%{DATA:day} %{DATA:hour}:%{DATA:minute}:%{DATA:second},%{DATA:millis} %{NOTSPACE:loglevel} {0,1}%{GREEDYDATA:message}"] }
overwrite => [ "message" ]
}
}
The pattern works, as you can see, but when I go into Kibana the log stays in one block in the "message" field, like this:
2013-01-05 23:27:47,030 INFO [ajp-bio-8009-exec-63] web.CustomAuthenticationSuccessHandler - doLogin : admin.ebusiness date : 2013-01-05 23:27:47
I would prefer to have it like this:
{ "year": [["2013"]], "month": [["01"]], "day": [["05"]], "hour": [["04"]], "minute": [["04"]], "second": [["39"]], "millis": [["398"] ], "loglevel": [ ["INFO"]] }
Can you help me to parse it correctly please?
Just tested this configuration. I kinda copied everything from your question.
input {
stdin { type => "edai" }
}
filter {
if [type] == "edai" {
grok {
match => { "message" => ["%{YEAR:year}-%{WORD:month}-%{DATA:day} %{DATA:hour}:%{DATA:minute}:%{DATA:second},%{DATA:millis} %{NOTSPACE:loglevel} {0,1}%{GREEDYDATA:message}"] }
overwrite => [ "message" ]
}
}
}
output {
stdout { codec => rubydebug }
}
This is the output:
{
"year" => "2013",
"message" => " [ajp-bio-8009-exec-69] web.CustomAuthenticationSuccessHandler - doLogin : admin.ebusiness date : 2013-01-05 03:29:38\r",
"type" => "edai",
"minute" => "29",
"second" => "38",
"#timestamp" => 2017-06-29T08:19:08.605Z,
"month" => "01",
"hour" => "03",
"loglevel" => "INFO",
"#version" => "1",
"host" => "host_name",
"millis" => "842",
"day" => "05"
}
Everything seems fine from my perspective.
I had an issue when I compared the type the way you did:
if [type] in ["edai"]
It did not work and I've replaced it with direct comparison:
if [type] == "edai"
Also this worked too:
if [type] in "edai"
And that solved the issue.
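As a side note that goes beyond the original answer: if the end goal is a queryable timestamp rather than separate year/month/day fields, a common variant is to capture the whole timestamp in one go and feed it to the date filter. A minimal sketch, where log_timestamp is just an illustrative field name:
filter {
  grok {
    # capture the leading timestamp, the log level and the rest of the line
    match => { "message" => "%{TIMESTAMP_ISO8601:log_timestamp} %{LOGLEVEL:loglevel} %{GREEDYDATA:message}" }
    overwrite => [ "message" ]
  }
  date {
    # e.g. "2013-01-05 03:29:38,842"
    match => [ "log_timestamp", "yyyy-MM-dd HH:mm:ss,SSS" ]
  }
}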

Correlate messages in ELK by field

Related to: Combine logs and query in ELK
We are setting up ELK and would want to create a visualization in Kibana 4.
The issue here is that we want to relate between two different types of message.
To simplify:
Message type 1 fields: message_type, common_id_number, byte_count,
...
Message type 2 fields: message_type, common_id_number, hostname, ...
Both messages share the same index in elasticsearch.
As you can see we were trying to graph without taking that common_id_number into account, but it seems that we must use it. We don't know how yet, though.
Any help?
EDIT
These are the relevant field definitions in the ES template:
"URIHost" : {
"type" : "string",
"norms" : {
"enabled" : false
},
"fields" : {
"raw" : {
"type" : "string",
"index" : "not_analyzed",
"ignore_above" : 256
}
}
},
"Type" : {
"type" : "string",
"norms" : {
"enabled" : false
},
"fields" : {
"raw" : {
"type" : "string",
"index" : "not_analyzed",
"ignore_above" : 256
}
}
},
"SessionID" : {
"type" : "long"
},
"Bytes" : {
"type" : "long"
},
"BytesReceived" : {
"type" : "long"
},
"BytesSent" : {
"type" : "long"
},
This is a TRAFFIC type, edited document:
{
"_index": "logstash-2015.11.05",
"_type": "paloalto",
"_id": "AVDZqdBjpQiRid-uxPjE",
"_score": null,
"_source": {
"#version": "1",
"#timestamp": "2015-11-05T21:59:55.543Z",
"syslog_severity_code": 5,
"syslog_facility_code": 1,
"syslog_timestamp": "Nov 5 22:59:58",
"Type": "TRAFFIC",
"SessionID": 21713,
"Bytes": 939,
"BytesSent": 480,
"BytesReceived": 459,
},
"fields": {
"#timestamp": [
1446760795543
]
},
"sort": [
1446760795543
]
}
And this is a THREAT type document:
{
"_index": "logstash-2015.11.05",
"_type": "paloalto",
"_id": "AVDZqVNIpQiRid-uxPjC",
"_score": null,
"_source": {
"#version": "1",
"#timestamp": "2015-11-05T21:59:23.440Z",
"syslog_severity_code": 5,
"syslog_facility_code": 1,
"syslog_timestamp": "Nov 5 22:59:26",
"Type": "THREAT",
"SessionID": 21713,
"URIHost": "whatever.nevermind.com",
"URIPath": "/connectiontest.html"
},
"fields": {
"#timestamp": [
1446760763440
]
},
"sort": [
1446760763440
]
}
This is the logstash "filter" configuration:
filter {
if [type] == "paloalto" {
syslog_pri {
remove_field => [ "syslog_facility", "syslog_severity" ]
}
grok {
match => {
"message" => "%{SYSLOGTIMESTAMP:syslog_timestamp} %{HOSTNAME:hostname} %{INT},%{YEAR}/%{MONTHNUM}/%{MONTHDAY} %{TIME},%{INT},%{WORD:Type},%{GREEDYDATA:log}"
}
remove_field => [ "message" ]
}
if [Type] == "THREAT" {
csv {
source => "log"
columns => [ "Threat_OR_ContentType", "ConfigVersion", "GenerateTime", "SourceAddress", "DestinationAddress", "NATSourceIP", "NATDestinationIP", "Rule", "SourceUser", "DestinationUser", "Application", "VirtualSystem", "SourceZone", "DestinationZone", "InboundInterface", "OutboundInterface", "LogAction", "TimeLogged", "SessionID", "RepeatCount", "SourcePort", "DestinationPort", "NATSourcePort", "NATDestinationPort", "Flags", "IPProtocol", "Action", "URL", "Threat_OR_ContentName", "reportid", "Category", "Severity", "Direction", "seqno", "actionflags", "SourceCountry", "DestinationCountry", "cpadding", "contenttype", "pcap_id", "filedigest", "cloud", "url_idx", "user_agent", "filetype", "xff", "referer", "sender", "subject", "recipient" ]
remove_field => [ "log" ]
}
mutate {
convert => {
"SessionID" => "integer"
"SourcePort" => "integer"
"DestinationPort" => "integer"
"NATSourcePort" => "integer"
"NATDestinationPort" => "integer"
}
remove_field => [ "ConfigVersion", "GenerateTime", "VirtualSystem", "InboundInterface", "OutboundInterface", "LogAction", "TimeLogged", "RepeatCount", "Flags", "Action", "reportid", "Severity", "seqno", "actionflags", "cpadding", "pcap_id", "filedigest", "recipient" ]
}
grok {
match => {
"URL" => "%{URIHOST:URIHost}%{URIPATH:URIPath}(%{URIPARAM:URIParam})?"
}
remove_field => [ "URL" ]
}
}
else if [Type] == "TRAFFIC" {
csv {
source => "log"
columns => [ "Threat_OR_ContentType", "ConfigVersion", "GenerateTime", "SourceAddress", "DestinationAddress", "NATSourceIP", "NATDestinationIP", "Rule", "SourceUser", "DestinationUser", "Application", "VirtualSystem", "SourceZone", "DestinationZone", "InboundInterface", "OutboundInterface", "LogAction", "TimeLogged", "SessionID", "RepeatCount", "SourcePort", "DestinationPort", "NATSourcePort", "NATDestinationPort", "Flags", "IPProtocol", "Action", "Bytes", "BytesSent", "BytesReceived", "Packets", "StartTime", "ElapsedTimeInSecs", "Category", "Padding", "seqno", "actionflags", "SourceCountry", "DestinationCountry", "cpadding", "pkts_sent", "pkts_received", "session_end_reason" ]
remove_field => [ "log" ]
}
mutate {
convert => {
"SessionID" => "integer"
"SourcePort" => "integer"
"DestinationPort" => "integer"
"NATSourcePort" => "integer"
"NATDestinationPort" => "integer"
"Bytes" => "integer"
"BytesSent" => "integer"
"BytesReceived" => "integer"
"ElapsedTimeInSecs" => "integer"
}
remove_field => [ "ConfigVersion", "GenerateTime", "VirtualSystem", "InboundInterface", "OutboundInterface", "LogAction", "TimeLogged", "RepeatCount", "Flags", "Action", "Packets", "StartTime", "seqno", "actionflags", "cpadding", "pcap_id", "filedigest", "recipient" ]
}
}
date {
match => [ "syslog_timastamp", "MMM d HH:mm:ss", "MMM dd HH:mm:ss" ]
timezone => "CET"
remove_field => [ "syslog_timestamp" ]
}
}
}
What we are trying to do is to visualize URIHost terms as X axis and Bytes, BytesSent and BytesReceived sums as Y axis.
I think you can use the aggregate filter to carry out your task. The aggregate filter provides support for aggregating several log lines into one single event based on a common field value. In your case, the common field we're going to use will be the SessionID field.
Then we need another field to detect the first event vs the second/last event that should be aggregated. In your case, this would be the Type field.
You need to change your current configuration like this:
filter {
  ... all other filters
  if [Type] == "THREAT" {
    ... all other filters
    aggregate {
      task_id => "%{SessionID}"
      code => "map['URIHost'] = event['URIHost']; map['URIPath'] = event['URIPath']"
    }
  }
  else if [Type] == "TRAFFIC" {
    ... all other filters
    aggregate {
      task_id => "%{SessionID}"
      code => "event['URIHost'] = map['URIHost']; event['URIPath'] = map['URIPath']"
      end_of_task => true
      timeout => 120
    }
  }
}
The general idea is that when Logstash encounters THREAT logs it will temporarily store the URIHost and URIPath in the in-memory event map, and then when a TRAFFIC log comes in, the URIHost and URIPath fields will be added to the event. You can copy other fields, too, if needed. You can also adapt the timeout (in seconds) depending on how long you expect a TRAFFIC event to come in after the last THREAT event.
In the end, you'll get documents with data merged from both THREAT and TRAFFIC log lines and you can easily create the visualization showing bytes count per URIHost as shown on your screenshot.
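Two practical caveats, added here for completeness rather than taken from the original answer: the aggregate filter only works reliably with a single filter worker (start Logstash with -w 1 or set pipeline.workers: 1), because the in-memory map is not shared across workers, and on Logstash 5+ the event API is event.get/event.set rather than event['field']. A sketch of the same filter with those adjustments:
filter {
  if [Type] == "THREAT" {
    aggregate {
      task_id => "%{SessionID}"
      # Logstash 5+ event API
      code => "map['URIHost'] = event.get('URIHost'); map['URIPath'] = event.get('URIPath')"
    }
  }
  else if [Type] == "TRAFFIC" {
    aggregate {
      task_id => "%{SessionID}"
      code => "event.set('URIHost', map['URIHost']); event.set('URIPath', map['URIPath'])"
      end_of_task => true
      timeout => 120
    }
  }
}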
