Splitting A Nested JSON In Logstash - elasticsearch

I am working on a project, ingesting CVE data from NVD into Elasticsearch.
My input data looks something like this:
{
"resultsPerPage":20,
"startIndex":0,
"totalResults":189227,
"result":{
"CVE_data_type":"CVE",
"CVE_data_format":"MITRE",
"CVE_data_version":"4.0",
"CVE_data_timestamp":"2022-11-23T08:27Z",
"CVE_Items":[
{...},
{...},
{...},
{
"cve":{
"data_type":"CVE",
"data_format":"MITRE",
"data_version":"4.0",
"CVE_data_meta":{
"ID":"CVE-2022-45060",
"ASSIGNER":"cve#mitre.org"
},
"problemtype":{...},
"references":{...},
"description":{
"description_data":[
{
"lang":"en",
"value":"An HTTP Request Forgery issue was discovered in Varnish Cache 5.x and 6.x before 6.0.11, 7.x before 7.1.2, and 7.2.x before 7.2.1. An attacker may introduce characters through HTTP/2 pseudo-headers that are invalid in the context of an HTTP/1 request line, causing the Varnish server to produce invalid HTTP/1 requests to the backend. This could, in turn, be used to exploit vulnerabilities in a server behind the Varnish server. Note: the 6.0.x LTS series (before 6.0.11) is affected."
}
]
}
},
"configurations":{
"CVE_data_version":"4.0",
"nodes":[
{
"operator":"OR",
"children":[
],
"cpe_match":[
{
"vulnerable":true,
"cpe23Uri":"cpe:2.3:a:varnish-software:varnish_cache_plus:6.0.8:r2:*:*:*:*:*:*",
"cpe_name":[
]
},
{
"vulnerable":true,
"cpe23Uri":"cpe:2.3:a:varnish-software:varnish_cache_plus:6.0.8:r1:*:*:*:*:*:*",
"cpe_name":[
]
},
{
"vulnerable":true,
"cpe23Uri":"cpe:2.3:a:varnish_cache_project:varnish_cache:7.2.0:*:*:*:*:*:*:*",
"cpe_name":[
]
}
]
}
]
},
"impact":{...},
"publishedDate":"2022-11-09T06:15Z",
"lastModifiedDate":"2022-11-23T03:15Z"
}
]
}
}
Each query from the API gives back a result like this, containing 20 individual CVEs, each CVE contains multiple configuration. What I wanted to achieve was to have one document per configuration, where the other data is the same, only the cpe23Uri field changes. I've tried using the split filter in multiple different ways, with no success.
My pipeline looks like this:
filter {
json {
source => "message"
}
split {
field => "[result][CVE_Items]"
}
split {
field => "[result][CVE_Items][cve][description][description_data]"
}
}
kv {
source => "message"
field_split => ","
}
mutate {
remove_field => [...]
rename =>
rename => ["[result][CVE_Items][configurations][nodes]", "Affected_Products"]
.
.
.
rename => ["[result][CVE_data_timestamp]", "CVE_Timestamp"]
}
fingerprint {
source => ["CVE", "Affected Products"]
target => "fingerprint"
concatenate_sources => "true"
method => "SHA256"
}
}
For splitting I've tried simply using:
split {
field => "[result][CVE_Items][configurations][nodes][cpe_match]"
}
and also with adding [cpe23Uri] to the end,
as well as:
split {
field => "[result][CVE_Items][configurations][nodes][cpe_match]"
target => "cpeCopy"
remove_field => "[result][CVE_Items][configurations][nodes][cpe_match]"
}
if [result][CVE_Items][configurations][nodes][cpe_match] {
ruby {
code => "event['cpeCopy'] = event['[result][CVE_Items][configurations][nodes][cpe_match]'][0]"
remove_field => "[result][CVE_Items][configurations][nodes][cpe_match]"
}
}
if [cpeCopy] {
mutate {
rename => { "cpeCopy" => "CPE" }
}
As I've said I've tried multiple ways, I won't list them all but unfortunately nothing has worked so far.

Related

Read a CSV in Logstash level and filter on basis of the extracted data

I am using Metricbeat to get process-level data and push it to Elastic Search using Logstash.
Now, the aim is to categorize the processes into 2 tags i.e the process running is either a browser or it is something else.
I am able to do that statically using this block of code :
input {
beats {
port => 5044
}
}
filter{
if [process][name]=="firefox.exe" or [process][name]=="chrome.exe" {
mutate {
add_field => { "process.type" => "browsers" }
convert => {
"process.type" => "string"
}
}
}
else {
mutate {
add_field => { "process.type" => "other" }
}
}
}
output {
elasticsearch {
hosts => "localhost:9200"
# manage_template => false
index => "metricbeatlogstash"
}
}
But when I try to make that if condition dynamic by reading the process list from a CSV, I am not getting any valid results in Kibana, nor a error on my LogStash level.
The CSV config file code is as follows :
input {
beats {
port => 5044
}
file{
path=>"filePath"
start_position=>"beginning"
sincedb_path=>"NULL"
}
}
filter{
csv{
separator=>","
columns=>["processList","IT"]
}
if [process][name] in [processList] {
mutate {
add_field => { "process.type" => "browsers" }
convert => {
"process.type" => "string"
}
}
}
else {
mutate {
add_field => { "process.type" => "other" }
}
}
}
output {
elasticsearch {
hosts => "localhost:9200"
# manage_template => false
index => "metricbeatlogstash2"
}
}
What you are trying to do does not work that way in logstash, the events in a logstash pipeline are independent from each other.
The events received by your beats input have no knowledge about the events received by your csv input, so you can't use fields from different events in a conditional.
To do what you want you can use the translate filter with the following config.
translate {
field => "[process][name]"
destination => "[process][type]"
dictionary_path => "process.csv"
fallback => "others"
refresh_interval => 300
}
This filter will check the value of the field [process][name] against a dictionary, loaded into memory from the file process.csv, the dictionary is a .csv file with two columns, the first is the name of the browser process and the second is always browser.
chrome.exe,browser
firefox.exe,browser
If the filter got a match, it will populate the field [process][type] (not process.type) with the value from the second column, in this case, always browser, if there is no match, it will populate the field [process][type] with the value of the fallback config, in this case, others, it will also reload the content of the process.csv file every 300 seconds (5 minutes)

Lowercase field name in Logstash for Elasticsearch index

I have a logstash command that I'm piping a file to that will write to Elasticsearch. I want to use one field to select the index I will write to (appName). However the data in this field is not all lowercase so I need to do that when selecting the index but I don't want the data in the document itself to be modified.
I have an attempt below where I first copy the original field (appName) to a new one (appNameIndex), lowercase the new field, remove it from the upload and then use it pick the index.
input {
stdin { type => stdin }
}
filter {
csv {
separator => " "
columns => ["appName", "field1", "field2", ...]
convert => {
...
}
}
filter {
mutate {
copy => ["appName", "appNameIndex"]
}
}
filter {
mutate {
lowercase => ["appNameIndex"]
}
}
filter {
mutate {
remove_field => [
"appNameIndex", // if I remove this it works
...
]
}
}
output {
amazon_es {
hosts =>
["my-es-cluster.us-east-1.es.amazonaws.com"]
index => "%{appNameIndex}"
region => "us-east-1"
}
}
However I am getting errors that say
Invalid index name [%{appIndexName}]
Clearly it's not grabbing my mutation. Is it because the remove section takes it out entirely? I was hoping that just removed it from the document upload. Am I going about this incorrectly?
UPDATE I tried taking out the remove index name part and it does in fact work, so that helps identify the source of the error. Now the question becomes how do I get around it. With that part of the config removed I essentially have two fields with the same data, one lowercased and one not
You can define a #metadata field that is a special field which will never be included in the output https://www.elastic.co/guide/en/logstash/current/event-dependent-configuration.html#metadata.
input {
stdin { type => stdin }
}
filter {
csv {
separator => " "
columns => ["appName", "field1", "field2", ...]
convert => {
...
}
}
filter {
mutate {
copy => ["appName", "[#metadata][appNameIndex]"]
}
}
filter {
mutate {
lowercase => ["[#metadata][appNameIndex]"]
}
}
output {
amazon_es {
hosts => ["my-es-cluster.us-east-1.es.amazonaws.com"]
index => "%{[#metadata][appNameIndex]}"
region => "us-east-1"
}
}

Does Logstash support Elasticsearch's _update_by_query?

Does the Elasticsearch output plugin support elasticsearch's _update_by_query?
https://www.elastic.co/guide/en/logstash/6.5/plugins-outputs-elasticsearch.html
https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-update-by-query.html
The elasticsearch output plugin can only make calls to the _bulk endpoint, i.e. using the Bulk API.
If you want to call the Update by Query API, you need to use the http output plugin and construct the query inside the event yourself. If you explain what you want to achieve, I can update my answer with some more details.
Note: There's an issue requesting this feature, but it's still open after two years.
UPDATE
So if your input event is {"cname":"wang", "cage":11} and you want to update by query all documents with "cname":"wang" to set "cage":11, your query needs to look like this:
POST your-index/_update_by_query
{
"script": {
"source": "ctx._source.cage = params.cage",
"lang": "painless",
"params": {
"cage": 11
}
},
"query": {
"term": {
"cname": "wang"
}
}
}
So your Logstash config should look like this (your input may vary but I used stdin for testing purposes):
input {
stdin {
codec => "json"
}
}
filter {
mutate {
add_field => {
"[script][lang]" => "painless"
"[script][source]" => "ctx._source.cage = params.cage"
"[script][params][cage]" => "%{cage}"
"[query][term][cname]" => "%{cname}"
}
remove_field => ["host", "#version", "#timestamp", "cname", "cage"]
}
}
output {
http {
url => "http://localhost:9200/index/doc/_update_by_query"
http_method => "post"
format => "json"
}
}
The same result can be obtained with standard elasticsearch plugins:
input {
elasticsearch {
hosts => "${ES_HOSTS}"
user => "${ES_USER}"
password => "${ES_PWD}"
index => "<your index pattern>"
size => 500
scroll => "5m"
docinfo => true
}
}
filter {
...
}
output {
elasticsearch {
hosts => "${ES_HOSTS}"
user => "${ES_USER}"
password => "${ES_PWD}"
action => "update"
document_id => "%{[#metadata][_id]}"
index => "%{[#metadata][_index]}"
}
}

ELK for windows logs processing

I've made a working ELK stack on Debian Wheezy and have set up Nxlog to gather windows logs. I see the logs in Kibana - everything is working fine, but i get too much data and want to filter it by removing some fields that I don't need.
I've made a filter section but it's not working at all. What can be the reason?
The filter above
input {
tcp {
type => "eventlog"
port => 3515
format => "json"
}
}
filter {
type => "eventlog"
mutate {
remove => { "Hostname", "Keywords", "SeverityValue", "Severity", "SourceName", "ProviderGuid" }
remove => { "Version", "Task", "OpcodeValue", "RecordNumber", "ProcessID", "ThreadID", "Channel" }
remove => { "Category", "Opcode", "SubjectUserSid", "SubjectUserName", "SubjectDomainName" }
remove => { "SubjectLogonId", "ObjectType", "IpPort", "AccessMask", "AccessList", "AccessReason" }
remove => { "EventReceivedTime", "SourceModuleName", "SourceModuleType", "#version", "type" }
remove => { "_index", "_type", "_id", "_score", "_source", "KeyLength", "TargetUserSid" }
remove => { "TargetDomainName", "TargetLogonId", "LogonType", "LogonProcessName", "AuthenticationPackageName" }
remove => { "LogonGuid", "TransmittedServices", "LmPackageName", "ProcessName", "ImpersonationLevel" }
}
}
output {
elasticsearch {
cluster => "wisp"
node_name => "io"
}
}
I think you try to remove fields that do not exist in some logs.
Does all your logs contains all the fieds you're trying to remove ?
If not, you have to identify your logs before removing fields.
Your filter config will look like this :
filter {
type => "eventlog"
if [somefield] == "somevalue" {
mutate {
remove => { "specificfieldtoremove1", "specificfieldtoremove2" }
}
}
}

Changing the elasticsearch host in logstash 1.3.3 web interface

I followed the steps in this document and I was able to do get some reports on the Shakespeare data.
I want to do the same thing with elastic search remotely installed.I tried configuring the "host" in config file but the queries still run on host as opposed to remote .This is my config file
input {
stdin{
type => "stdin-type" }
file {
type => "accessLog"
path => [ "/Users/akushe/Downloads/requests.log" ]
}
}
filter {
grok {
match => ["message","%{COMMONAPACHELOG} (?:%{INT:responseTime}|-)"]
}
kv {
source => "request"
field_split => "&?"
}
if [lng] {
kv {
add_field => [ "location" , ["%{lng}","%{lat}"]]
}
}else if [lon] {
kv {
add_field => [ "location" , ["%{lon}","%{lat}"]]
}
}
}
output {
elasticsearch {
host => "slc-places-qa-es3001.slc.where.com"
port => 9200
}
}
You need to add protocol => http in to make it use HTTP transport rather than joining the cluster using multicast.

Resources