Logstash filter: aggregate nested arrays - ruby

I'm trying to fetch data from MySQL and push it to Elasticsearch using Logstash, and I'm having trouble creating a Logstash config file that suits my needs.
I'm trying to achieve this result:
{
"products":[
{
"id":1,
"deals":[
{
"id":5,
"options":[
{
"id":3
},
{
"id":8
}
]
}
]
}
]
}
In MySQL, each of these has its own table, with the following relationships:
Product -> Deal (ONE => MANY)
Deal -> Deal Option (ONE => MANY)
To combine them all, I have a MySQL view that LEFT JOINs those tables so I can process everything using Logstash.
Here is my current Logstash configuration:
filter {
aggregate {
task_id => "%{id}"
code => "
map['id'] ||= event.get('id')
map['deals'] ||= []
map['deals'] << {'id' => event.get('deal_id')}
event.cancel()
"
push_previous_map_as_event => true
timeout => 15
}
}
I got stuck at the part where I need to add Deal Options to a Deal. Is my current Logstash config correct? If it is, how can I complete it? Thanks for your time!
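The nesting asked about here can be sketched in plain Ruby, outside Logstash. This is only an illustration of the map-building logic that an aggregate code block could follow, not a tested pipeline. The column name deal_option_id and the helper add_row are assumptions; the view's real column names are not shown in the question.

```ruby
# Flat rows as the LEFT JOIN view might return them.
# 'deal_option_id' is an assumed column name.
rows = [
  { 'id' => 1, 'deal_id' => 5, 'deal_option_id' => 3 },
  { 'id' => 1, 'deal_id' => 5, 'deal_option_id' => 8 }
]

# Hypothetical helper: fold one flat row into the nested product map.
def add_row(map, row)
  map['id'] ||= row['id']
  map['deals'] ||= []
  return map if row['deal_id'].nil? # product without deals (LEFT JOIN)

  # Reuse the deal if an earlier row already created it.
  deal = map['deals'].find { |d| d['id'] == row['deal_id'] }
  if deal.nil?
    deal = { 'id' => row['deal_id'], 'options' => [] }
    map['deals'] << deal
  end

  # Append the option only once per deal.
  option_id = row['deal_option_id']
  if option_id && deal['options'].none? { |o| o['id'] == option_id }
    deal['options'] << { 'id' => option_id }
  end
  map
end

product = rows.each_with_object({}) { |row, map| add_row(map, row) }
```

Inside an aggregate filter, the same lookup-then-append steps would operate on map and event.get(...) instead of a row hash. The key difference from the config above is that a deal is looked up before being appended, so repeated rows for the same deal do not create duplicates.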

Related

Splitting A Nested JSON In Logstash

I am working on a project, ingesting CVE data from NVD into Elasticsearch.
My input data looks something like this:
{
"resultsPerPage":20,
"startIndex":0,
"totalResults":189227,
"result":{
"CVE_data_type":"CVE",
"CVE_data_format":"MITRE",
"CVE_data_version":"4.0",
"CVE_data_timestamp":"2022-11-23T08:27Z",
"CVE_Items":[
{...},
{...},
{...},
{
"cve":{
"data_type":"CVE",
"data_format":"MITRE",
"data_version":"4.0",
"CVE_data_meta":{
"ID":"CVE-2022-45060",
"ASSIGNER":"cve@mitre.org"
},
"problemtype":{...},
"references":{...},
"description":{
"description_data":[
{
"lang":"en",
"value":"An HTTP Request Forgery issue was discovered in Varnish Cache 5.x and 6.x before 6.0.11, 7.x before 7.1.2, and 7.2.x before 7.2.1. An attacker may introduce characters through HTTP/2 pseudo-headers that are invalid in the context of an HTTP/1 request line, causing the Varnish server to produce invalid HTTP/1 requests to the backend. This could, in turn, be used to exploit vulnerabilities in a server behind the Varnish server. Note: the 6.0.x LTS series (before 6.0.11) is affected."
}
]
}
},
"configurations":{
"CVE_data_version":"4.0",
"nodes":[
{
"operator":"OR",
"children":[
],
"cpe_match":[
{
"vulnerable":true,
"cpe23Uri":"cpe:2.3:a:varnish-software:varnish_cache_plus:6.0.8:r2:*:*:*:*:*:*",
"cpe_name":[
]
},
{
"vulnerable":true,
"cpe23Uri":"cpe:2.3:a:varnish-software:varnish_cache_plus:6.0.8:r1:*:*:*:*:*:*",
"cpe_name":[
]
},
{
"vulnerable":true,
"cpe23Uri":"cpe:2.3:a:varnish_cache_project:varnish_cache:7.2.0:*:*:*:*:*:*:*",
"cpe_name":[
]
}
]
}
]
},
"impact":{...},
"publishedDate":"2022-11-09T06:15Z",
"lastModifiedDate":"2022-11-23T03:15Z"
}
]
}
}
Each query to the API returns a result like this, containing 20 individual CVEs, and each CVE contains multiple configurations. What I want is one document per configuration, where all other data is the same and only the cpe23Uri field changes. I've tried using the split filter in several different ways, with no success.
My pipeline looks like this:
filter {
json {
source => "message"
}
split {
field => "[result][CVE_Items]"
}
split {
field => "[result][CVE_Items][cve][description][description_data]"
}
}
kv {
source => "message"
field_split => ","
}
mutate {
remove_field => [...]
rename =>
rename => ["[result][CVE_Items][configurations][nodes]", "Affected_Products"]
.
.
.
rename => ["[result][CVE_data_timestamp]", "CVE_Timestamp"]
}
fingerprint {
source => ["CVE", "Affected Products"]
target => "fingerprint"
concatenate_sources => "true"
method => "SHA256"
}
}
For splitting I've tried simply using:
split {
field => "[result][CVE_Items][configurations][nodes][cpe_match]"
}
I also tried adding [cpe23Uri] to the end of that path, as well as:
split {
field => "[result][CVE_Items][configurations][nodes][cpe_match]"
target => "cpeCopy"
remove_field => "[result][CVE_Items][configurations][nodes][cpe_match]"
}
if [result][CVE_Items][configurations][nodes][cpe_match] {
ruby {
code => "event['cpeCopy'] = event['[result][CVE_Items][configurations][nodes][cpe_match]'][0]"
remove_field => "[result][CVE_Items][configurations][nodes][cpe_match]"
}
}
if [cpeCopy] {
mutate {
rename => { "cpeCopy" => "CPE" }
}
As I've said, I've tried multiple approaches. I won't list them all, but unfortunately nothing has worked so far.
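The desired fan-out (one document per cpe23Uri, everything else unchanged) can be sketched in plain Ruby. Note that nodes is itself an array in the sample input, so a path that steps through it without an index may not resolve to a single array, which is possibly why the split attempts fail. The helper fan_out_by_cpe and the output field name CPE are assumptions for illustration, and the sketch works on a plain hash rather than a Logstash event.

```ruby
# Hypothetical helper: turn one CVE item into one document per cpe23Uri.
# Items with no cpe_match entries yield no documents in this sketch.
def fan_out_by_cpe(item)
  nodes = item.dig('configurations', 'nodes') || []

  # Walk every node and collect every cpe23Uri beneath it.
  uris = nodes.flat_map do |node|
    (node['cpe_match'] || []).map { |match| match['cpe23Uri'] }
  end

  # Copy everything except the nested configurations into each document.
  base = item.reject { |key, _| key == 'configurations' }
  uris.map { |uri| base.merge('CPE' => uri) }
end

item = {
  'publishedDate' => '2022-11-09T06:15Z',
  'configurations' => {
    'nodes' => [
      { 'operator' => 'OR',
        'cpe_match' => [
          { 'vulnerable' => true, 'cpe23Uri' => 'cpe:2.3:a:varnish-software:varnish_cache_plus:6.0.8:r2:*:*:*:*:*:*' },
          { 'vulnerable' => true, 'cpe23Uri' => 'cpe:2.3:a:varnish_cache_project:varnish_cache:7.2.0:*:*:*:*:*:*:*' }
        ] }
    ]
  }
}

documents = fan_out_by_cpe(item)
```

In a pipeline, the equivalent would be to split [result][CVE_Items] first and then expand the node and cpe_match arrays one level at a time, or to do the flattening in a ruby filter.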

Read a CSV in Logstash level and filter on basis of the extracted data

I am using Metricbeat to collect process-level data and push it to Elasticsearch using Logstash.
The aim is to categorize the processes with one of two tags, i.e. the running process is either a browser or something else.
I am able to do that statically using this block of code:
input {
beats {
port => 5044
}
}
filter{
if [process][name]=="firefox.exe" or [process][name]=="chrome.exe" {
mutate {
add_field => { "process.type" => "browsers" }
convert => {
"process.type" => "string"
}
}
}
else {
mutate {
add_field => { "process.type" => "other" }
}
}
}
output {
elasticsearch {
hosts => "localhost:9200"
# manage_template => false
index => "metricbeatlogstash"
}
}
But when I try to make that if condition dynamic by reading the process list from a CSV, I get no valid results in Kibana and no error from Logstash.
The config file that reads the CSV is as follows:
input {
beats {
port => 5044
}
file{
path=>"filePath"
start_position=>"beginning"
sincedb_path=>"NULL"
}
}
filter{
csv{
separator=>","
columns=>["processList","IT"]
}
if [process][name] in [processList] {
mutate {
add_field => { "process.type" => "browsers" }
convert => {
"process.type" => "string"
}
}
}
else {
mutate {
add_field => { "process.type" => "other" }
}
}
}
output {
elasticsearch {
hosts => "localhost:9200"
# manage_template => false
index => "metricbeatlogstash2"
}
}
What you are trying to do does not work that way in Logstash: the events in a Logstash pipeline are independent of each other.
The events received by your beats input know nothing about the events read from the CSV by your file input, so you can't use fields from different events in a conditional.
To do what you want, you can use the translate filter with the following config:
translate {
field => "[process][name]"
destination => "[process][type]"
dictionary_path => "process.csv"
fallback => "others"
refresh_interval => 300
}
This filter checks the value of the field [process][name] against a dictionary loaded into memory from the file process.csv. The dictionary is a .csv file with two columns: the first is the name of the browser process and the second is always browser.
chrome.exe,browser
firefox.exe,browser
If the filter finds a match, it populates the field [process][type] (not process.type) with the value from the second column, in this case always browser. If there is no match, it populates [process][type] with the value of the fallback option, in this case others. It also reloads the content of process.csv every 300 seconds (5 minutes).
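The lookup-with-fallback behaviour described above can be mimicked in a few lines of plain Ruby. This is a sketch of the idea only, not of how the translate filter is implemented; the helper names load_dictionary and process_type are made up for the example, and the dictionary is read from an inline string instead of process.csv.

```ruby
# Same two-column content as the process.csv dictionary shown above.
DICTIONARY_TEXT = "chrome.exe,browser\nfirefox.exe,browser\n"

# Hypothetical helper: build a name => type hash from two-column CSV text.
def load_dictionary(text)
  text.each_line
      .map(&:strip)
      .reject(&:empty?)
      .to_h { |line| line.split(',', 2) }
end

# Hypothetical helper: return the dictionary value, or the fallback
# when the process name is not listed.
def process_type(dictionary, name, fallback = 'others')
  dictionary.fetch(name, fallback)
end

dictionary = load_dictionary(DICTIONARY_TEXT)
chrome_type = process_type(dictionary, 'chrome.exe')
notepad_type = process_type(dictionary, 'notepad.exe')
```

The point of the design is that the dictionary is reference data held in memory and consulted for each event, rather than a second stream of events flowing through the pipeline.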

update elastic-search document with the same ID

Hi everyone. I'm new to ELK and I have a question about Logstash.
I have some services, and each one produces 4 or 6 logs per request, so a document in Elasticsearch may contain 4 or 6 logs.
I want to read these logs and, if they have the same id, put them in one Elasticsearch document.
To be specific: each request has a unique "id", and every log that refers to that request carries the same id. Each log also has a specific type.
I want to group every log that has the same id and type, like this:
{
"_id":"123",
"Type1":{},
"Type2":[{},{}],
"Type3":[{},{}],
"Type4":{}
}
These are all the logs for the same request.
Some of them must be in the same group because their types are the same. Look at the example above: Type2 is a JSON array containing 2 JSON objects. I want to use Logstash to read every log and classify them.
Imagine that our document currently looks like the JSON below:
{
"_id": "123",
"Type1":{},
"Type2":[{},{}],
"Type3":{}
}
Now a new log arrives with id 123 and type Type4. The document must be updated like this:
{
"_id": "123",
"Type1":{},
"Type2":[{},{}],
"Type3":{},
"Type4":{}
}
Then another log arrives with id 123 and type Type3. The document is updated like this:
{
"_id": "123",
"Type1":{},
"Type2":[{},{}],
"Type3":[{},{}],
"Type4":{}
}
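The update rule implied by these examples can be written down as a small Ruby function: the first log of a type is stored as an object, and later logs of the same type turn the field into an array. This sketch only illustrates the rule on plain hashes; merge_log is a hypothetical name, and a real pipeline would need the same logic expressed in a Painless update script or an aggregate filter.

```ruby
# Hypothetical helper: merge one log of the given type into the document.
def merge_log(doc, type, log)
  existing = doc[type]
  doc[type] =
    case existing
    when nil   then log               # first log of this type: keep as object
    when Array then existing + [log]  # already a group: append
    else            [existing, log]   # second log: promote object to array
    end
  doc
end

doc = { '_id' => '123', 'Type1' => {}, 'Type2' => [{}, {}], 'Type3' => {} }
merge_log(doc, 'Type4', {}) # new type arrives
merge_log(doc, 'Type3', {}) # second Type3 log arrives
```

Because the type name is a parameter, one rule covers every type instead of one script per type.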
I tried with a script, but I didn't succeed:
{
"id": 1,
"Type2": {}
}
The script is:
input {
stdin {
codec => json_lines
}
}
output {
elasticsearch {
hosts => ["XXX.XXX.XXX.XXX:9200"]
index => "ss"
document_id => "%{requestId}"
action => "update" # update if possible instead of overwriting
document_type => "_doc"
script_lang => "painless"
scripted_upsert => true
script_type => "inline"
script => 'if (ctx._source.Type3 == null) { ctx._source.Type3 = new ArrayList() } if(!ctx._source.Type3.contains("%{Type3}")) { ctx._source.Type3.add("%{Type3}")}'
}
}
My problem is that this script handles just one type. If it were to work for multiple types, what would it look like?
There is one more problem: some logs don't have an id, or have an id but no type. I want to store these logs in Elasticsearch as well. What should I do?
You can have a look at the aggregate filter plugin for Logstash. Or, as you mentioned, if some of the logs don't have an id, you can use the fingerprint filter plugin to create one, which you can then use to update the document in Elasticsearch.
E.g.:
input {
stdin {
codec => json_lines
}
}
filter {
fingerprint {
source => "message"
target => "[@metadata][id]"
method => "MURMUR3"
}
}
output {
elasticsearch {
hosts => ["XXX.XXX.XXX.XXX:9200"]
index => "ss"
document_id => "%{[@metadata][id]}"
action => "update" # update if possible instead of overwriting
}
}
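The idea behind the fingerprint approach, deriving a stable document id from the log content when no id is present, can be sketched in plain Ruby. Ruby's standard library has no MurmurHash3, so this sketch swaps in SHA-256 from the digest library; that substitution and the helper name document_id are assumptions made for the example.

```ruby
require 'digest'

# Hypothetical helper: use the log's own id when present, otherwise
# derive a deterministic id from the message text.
def document_id(log)
  id = log['id']
  return id.to_s unless id.nil?

  # SHA-256 stands in for the MURMUR3 method used in the config above.
  Digest::SHA256.hexdigest(log['message'].to_s)
end

with_id = document_id({ 'id' => 123, 'message' => 'request started' })
without_id = document_id({ 'message' => 'no id here' })
```

The same message always hashes to the same id, so re-sending an identical log updates the existing document instead of creating a new one.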

Logstash: error for querying elasticsearch

Hello everyone,
Through Logstash, I want to query Elasticsearch to get fields from previous events, do some computation with the fields of my current event, and add new fields. Here is what I did:
Input file:
{"device":"device1","count":5}
{"device":"device2","count":11}
{"device":"device1","count":8}
{"device":"device3","count":100}
{"device":"device3","count":95}
{"device":"device3","count":155}
{"device":"device2","count":15}
{"device":"device1","count":55}
My expected output:
{"device":"device1","count":5,"previousCount=0","delta":0}
{"device":"device2","count":11,"previousCount=0","delta":0}
{"device":"device1","count":8,"previousCount=5","delta":3}
{"device":"device3","count":100,"previousCount=0","delta":0}
{"device":"device3","count":95,"previousCount=100","delta":-5}
{"device":"device3","count":155,"previousCount=95","delta":60}
{"device":"device2","count":15,"previousCount=11","delta":4}
{"device":"device1","count":55,"previousCount=8","delta":47}
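The rule behind this expected output can be sketched in plain Ruby by remembering the last count seen per device in a hash. This is an in-memory illustration of the computation only, not a Logstash or Elasticsearch configuration; with_deltas is a hypothetical name.

```ruby
# Hypothetical helper: add previousCount and delta to each event,
# tracking the most recent count per device in memory.
def with_deltas(events)
  last_count = {}
  events.map do |event|
    device = event['device']
    count = event['count']
    previous = last_count[device]
    last_count[device] = count

    event.merge(
      'previousCount' => previous || 0,
      # First event for a device has no previous value, so delta is 0.
      'delta' => previous.nil? ? 0 : count - previous
    )
  end
end

events = [
  { 'device' => 'device1', 'count' => 5 },
  { 'device' => 'device2', 'count' => 11 },
  { 'device' => 'device1', 'count' => 8 },
  { 'device' => 'device3', 'count' => 100 },
  { 'device' => 'device3', 'count' => 95 },
  { 'device' => 'device3', 'count' => 155 },
  { 'device' => 'device2', 'count' => 15 },
  { 'device' => 'device1', 'count' => 55 }
]

results = with_deltas(events)
```

Keeping the previous value in memory avoids depending on an earlier event already being indexed and searchable when the next one is processed.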
Logstash filter part:
filter {
elasticsearch {
hosts => ["localhost:9200/device"]
query => 'device:"%{[device]}"'
sort => "@timestamp:desc"
fields => ['count','previousCount']
}
if [previousCount]{
ruby {
code => "event[delta] = event[count] - event[previousCount]"
}
}
else{
mutate {
add_field => { "previousCount" => "0" }
add_field => { "delta" => "0" }
}
}
}
My problem:
For every line of my input file I get the following error: Failed to query elasticsearch for previous event ..
It seems that a line is not fully processed and indexed into Elasticsearch before Logstash starts processing the next line.
I don't know whether my conclusion is correct and, if so, why it happens.
So, do you know how I could solve this problem?
Thank you for your attention and your help.
S

Logstash configuration condition

I am new to Logstash and Elasticsearch.
I have a NodeJS app that sends logs through Winston:Redis. I have different types of logs, such as requests, system, etc., and I want these logs to go into separate index_types inside Elasticsearch.
I am sending these keys, e.g. "web:production:request" and "web:production:system", and I'm sending JSON objects.
My configuration is:
NodeJS (Winston Redis client) -> Redis -> Logstash -> Elasticsearch
It's working well, except for the index_types.
I have 1 Redis client (stream/subscribe) and I want to route these logs to different index_types in the elasticsearch output depending on the key value.
I tried this config:
input {
redis {
host => "127.0.0.1"
data_type => "pattern_channel"
key => "web:production:*"
codec => json
}
filter {
if [key] == "web:production:request" {
alter {
add_field => { "index_type" => "request" }
}
}
if [key] == "web:production:system" {
alter {
add_field => { "index_type" => "system" }
}
}
}
output {
elasticsearch {
index => "web-production-%{+YYYY.MM.dd}"
index_type => "%{index_type}"
# THIS IS NOT WORKING
protocol => "http"
}
}
So my questions are:
How do I write conditionals correctly?
How would you proceed if you wanted to send events to different indexes depending on conditions?
Can I not put a condition inside a plugin block, e.g. grok { if [key] == "1" {} }?
Suggestion for a workaround:
output {
if [index_type] == "request"{
elasticsearch {
index => "web-production-request%{+YYYY.MM.dd}"
protocol => "http"
}
}
if [index_type] == "system"{
elasticsearch {
index => "web-production-system%{+YYYY.MM.dd}"
protocol => "http"
}
}
}
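The routing decision in this workaround amounts to a lookup from Redis key to index name. The plain-Ruby sketch below illustrates that mapping; INDEX_TYPES and index_for are names made up for the example, and the date is formatted in UTC on the assumption that the %{+YYYY.MM.dd} pattern is based on the event timestamp in UTC.

```ruby
# Key-to-type mapping taken from the filter conditions in the question.
INDEX_TYPES = {
  'web:production:request' => 'request',
  'web:production:system'  => 'system'
}.freeze

# Hypothetical helper: build the index name the workaround would write to,
# or nil when the key matches neither condition.
def index_for(key, time)
  type = INDEX_TYPES[key]
  return nil if type.nil?

  "web-production-#{type}#{time.utc.strftime('%Y.%m.%d')}"
end

request_index = index_for('web:production:request', Time.utc(2024, 1, 15))
unknown_index = index_for('web:production:other', Time.utc(2024, 1, 15))
```

As in the workaround, an event whose key matches neither condition is not routed anywhere, so a catch-all branch may be worth adding if such events should still be stored.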
