How to filter jsonb object in logstash - jdbc

I expected that after applying the json filter to the column I would be able to mutate the parsed fields. I am using the jdbc input, and the Postgres column some_column is of type jsonb.
Dataset
some_column: {
  id: 1,
  type: "some",
  name: "name"
}
I was trying to
filter {
  json {
    source => "some_column"
  }
  mutate {
    add_field => {
      "[some_column][name]" => "some_column_name"
      "[some_column][id]" => "some_column_id"
    }
    remove_field => ["some_column"]
  }
}
Logstash throws this error:
Sep 24 08:20:05 ip-172-31-39-95 logstash[60415]: [2022-09-24T08:20:05,028][WARN ][logstash.filters.mutate ][main][6b5bfecfc90257b9d235f6af6be2c9cef079a5ab89f59bf98b51f017841e4465] Exception caught while applying mutate filter {:exception=>"Could not set field 'name' on object '' to value 'some_column_name'. This is probably due to trying to set a field like [foo][bar] = someValue when [foo] is not either a map or a string"}
I want to keep only the values I need from the JSON and then drop the some_column field. How do I do that?
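One likely fix (a sketch, untested against this exact pipeline): by default the json filter writes the parsed keys to the top level of the event, so [some_column] itself is still the original string, which is why mutate cannot set [some_column][name] on it. Also note that add_field sets the left-hand field to the right-hand value, so the mapping above is reversed; rename is the operation that moves a value to a new field. Giving the json filter a target so the parsed map replaces the string, then renaming the sub-fields before removing the parent, should do it:
filter {
  json {
    source => "some_column"
    target => "some_column"   # replace the raw jsonb string with the parsed map
  }
  mutate {
    # old field => new field; this moves the nested values out of some_column
    rename => {
      "[some_column][name]" => "some_column_name"
      "[some_column][id]"   => "some_column_id"
    }
  }
  mutate {
    remove_field => ["some_column"]   # drop the parent once the values are out
  }
}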

Related

I want to convert a single JSON event into multiple events through Logstash. Hope to get some inspiration, thanks!

4 fields (warnTags, warnSlrs, warnActions, denyMsg) need to be separated on the semicolon (;).
Raw String
{ "waf": {
"warnTags": "OWASP_CRS/WEB_ATTACK/SQL_INJECTION;OWASP_CRS/WEB_ATTACK/XSS;OWASP_CRS/WEB_ATTACK/XSS;OWASP_CRS/WEB_ATTACK/XSS;OWASP_CRS/WEB_ATTACK/SPECIAL_CHARS;OWASP_CRS/WEB_ATTACK/SQL_INJECTION",
"policy": "bot_77598",
"warnSlrs": "ARGS:wvstest;ARGS:wvstest;ARGS:wvstest;ARGS:wvstest;ARGS:wvstest;ARGS:wvstest",
"riskTuples": ":-973305-973333-973335",
"warnActions": "2;2;2;2;2;2",
"denyActions": "3",
"warnMsg": "SQL Injection Attack;XSS Attack Detected;IE XSS Filters - Attack Detected;IE XSS Filters - Attack Detected;Restricted SQL Character Anomaly Detection Alert - Total # of special characters exceeded;Classic SQL Injection Probes 1/2",
"riskGroups": ":XSS-ANOMALY",
"warnRules": "950901;973305;973333;973335;981173;981242",
"denyMsg": "Anomaly Score Exceeded for Cross-Site Scripting",
"ver": "2.0",
"denyData": "VmVjdG9yIFNjb3JlOiBx",
"riskScores": ":-5-5-2",
"warnData": "eHNzdGFnPigpbG9jeHNz;amF2YXNYcm"
} }
Expected Output Result
{
  "waf": {
    "warnTags": "OWASP_CRS/WEB_ATTACK/SQL_INJECTION",
    "policy": "bot_77598",
    "warnSlrs": "ARGS:wvstest",
    "riskTuples": ":-973305-973333-973335",
    "warnActions": "2",
    "denyActions": "3",
    "warnMsg": "SQL Injection Attack",
    "riskGroups": ":XSS-ANOMALY",
    "warnRules": "950901",
    "denyMsg": "Anomaly Score Exceeded for Cross-Site Scripting",
    "ver": "2.0",
    "denyData": "VmVjdG9yIFNjb3JlOiBx",
    "riskScores": ":-5-5-2",
    "warnData": "eHNzdGFnPigpbG9jeHNz;amF2YXNYcm"
  }
}
{
  "waf": {
    "warnTags": "OWASP_CRS/WEB_ATTACK/XSS",
    "policy": "bot_77598",
    "warnSlrs": "ARGS:wvstest",
    "riskTuples": ":-973305-973333-973335",
    "warnActions": "2",
    "denyActions": "3",
    "warnMsg": "XSS Attack Detected",
    "riskGroups": ":XSS-ANOMALY",
    "warnRules": "973305",
    "denyMsg": "Anomaly Score Exceeded for Cross-Site Scripting",
    "ver": "2.0",
    "denyData": "VmVjdG9yIFNjb3JlOiBx",
    "riskScores": ":-5-5-2",
    "warnData": "eHNzdGFnPigpbG9jeHNz;amF2YXNYcm"
  }
}
filter {
  ruby {
    code => "
      @info = []
      events = event.to_hash
      @warnTags = events['waf']['warnTags'].split(';')
      @warnMsgs = events['waf']['warnMsg'].split(';')
      @warnActions = events['waf']['warnActions'].split(';')
      @warnRules = events['waf']['warnRules'].split(';')
      @list = @warnTags.zip(@warnMsgs, @warnActions, @warnRules)
      @list.each do |tag, msg, action, rule|
        detail = {
          'tag' => tag,
          'msg' => msg,
          'action' => action,
          'rule' => rule
        }
        @info.push(detail)
      end
      event.remove('[waf][warnTags]')
      event.remove('[waf][warnMsg]')
      event.remove('[waf][warnActions]')
      event.remove('[waf][warnRules]')
      event.set('[waf][info]', @info)
    "
  }
  split {
    field => "[waf][info]"
  }
}
The config below should be along the lines of what you need. It includes parsing as JSON at the outset, which you may not need depending on prior steps in your pipeline. Essentially this will split the warnTags field on ; to begin with; that results in warnTags being an array nested within one object. The output of the string split is then passed into the higher-level split filter, which creates multiple output events by splitting on the input field, in this case warnTags (again). Be aware that stacking split filters on several fields multiplies events (every warnTags value combined with every warnSlrs value), so if the four fields need to stay zipped positionally, the ruby approach above is the better fit. Hope this helps!
[EDIT: Added warnSlrs as second split field]
filter {
  json {
    source => "message"
  }
  mutate {
    split => {"[waf][warnTags]" => ";"}
  }
  mutate {
    split => {"[waf][warnSlrs]" => ";"}
  }
  split {
    field => "[waf][warnTags]"
  }
  split {
    field => "[waf][warnSlrs]"
  }
}
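To see the event multiplication in isolation, the raw string above can be fed through a minimal stdin pipeline. This is only a sketch: with the json codec on the input there is no need for the separate json filter, and the JSON has to arrive on a single line.
input {
  stdin { codec => json }   # each input line is decoded into one event
}
filter {
  mutate {
    split => {"[waf][warnTags]" => ";"}   # string -> array inside the single event
  }
  split {
    field => "[waf][warnTags]"            # array -> one event per element
  }
}
output {
  stdout { codec => rubydebug }
}
With the sample document above this should print six events, one per warnTags entry.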

Created nested fields from Xpath & check for existing documents

I have two questions:
1. parsing XML data and adding it to an array in a record in an index
2. checking for an existing record in an index and, if it exists, adding the new data to the array of the existing record
I have a jdbc input that has an XML column:
input {
  jdbc {
    ....
    statement => "SELECT event_xml....
  }
}
then an xml filter to parse the data.
How do I make the last 3 xpaths into an array? Do I need a mutate or ruby filter? I can't seem to figure it out.
filter {
  xml {
    source => "event_xml"
    remove_namespaces => true
    store_xml => false
    force_array => false
    xpath => [ "/CaseNumber/text()", "case_number" ]
    xpath => [ "/FormName/text()", "[conversations][form_name]" ]
    xpath => [ "/EventDate/text()", "[conversations][event_date]" ]
    xpath => [ "/CaseNote/text()", "[conversations][case_note]" ]
  }
}
So it would look something like this in Elasticsearch:
{
  "case_number" : "12345",
  "conversations" : [
    {
      "form_name" : "form1",
      "event_date" : "2019-01-09T00:00:00Z",
      "case_note" : "this is a case note"
    }
  ]
}
The second question is: if a record with the unique case_number "12345" already exists, then instead of creating a new record, add the new XML values to the conversations array of the existing one, so it would look like this:
{
  "case_number" : "12345",
  "conversations" : [
    {
      "form_name" : "form1",
      "event_date" : "2019-01-09T00:00:00Z",
      "case_note" : "this is a case note"
    },
    {
      "form_name" : "form2",
      "event_date" : "2019-05-09T00:00:00Z",
      "case_note" : "this is another case note"
    }
  ]
}
My output filter:
output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "cases"
    manage_template => false
  }
}
Is this possible? thanks
This ruby filter created the array:
ruby {
  code => '
    event.set("conversations", [Hash[
      "publish_event_id", event.get("publish_event_id"),
      "form_name", event.get("form_name"),
      "event_date", event.get("event_date"),
      "case_note", event.get("case_note")
    ]])
  '
}
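Putting the first question together, the xpath extraction and the array-building ruby filter can be combined along these lines. This is a sketch with assumptions: the xpath targets are written to flat field names (unlike the nested [conversations][...] targets in the question's config), and since the xml filter emits each xpath result as an array of matches, single-element arrays are unwrapped first.
filter {
  xml {
    source => "event_xml"
    remove_namespaces => true
    store_xml => false
    xpath => [ "/CaseNumber/text()", "case_number" ]
    xpath => [ "/FormName/text()", "form_name" ]
    xpath => [ "/EventDate/text()", "event_date" ]
    xpath => [ "/CaseNote/text()", "case_note" ]
  }
  ruby {
    code => '
      # xpath writes each target as an array of matches;
      # unwrap single-element arrays before building the entry
      ["case_number", "form_name", "event_date", "case_note"].each do |f|
        v = event.get(f)
        event.set(f, v.first) if v.is_a?(Array) && v.length == 1
      end
      event.set("conversations", [Hash[
        "form_name", event.get("form_name"),
        "event_date", event.get("event_date"),
        "case_note", event.get("case_note")
      ]])
    '
  }
}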
The output was resolved by:
output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "cases"
    document_id => "%{case_number}"
    action => "update"
    doc_as_upsert => true
    script => "
      // append the new conversation only if its publish_event_id is not already present
      boolean recordExists = false;
      for (int i = 0; i < ctx._source.conversations.length; i++) {
        if (ctx._source.conversations[i].publish_event_id == params.event.get('conversations')[0].publish_event_id) {
          recordExists = true;
        }
      }
      if (!recordExists) {
        ctx._source.conversations.add(params.event.get('conversations')[0]);
      }
    "
    manage_template => false
  }
}

Add/modify text between parentheses

I'm trying to make a classified text, and I'm having a problem turning
(class1 (subclass1) (subclass2 item1 item2))
To
(class1 (subclass1 item1) (subclass2 item1 item2))
I have no idea how to turn the text above into the one below without caching subclass1 in memory. I'm using Perl on Linux, so any solution using shell script or Perl is welcome.
Edit: I've tried using grep, saving the whole of subclass1 in a variable, then modifying it and exporting it to the list; but the list may get larger, and that way will use a lot of memory.
I have no idea how to turn the text above into the one below
The general approach:
Parse the text.
You appear to have lists of space-separated lists and atoms. If so, the result could look like the following:
{
  type => 'list',
  value => [
    {
      type => 'atom',
      value => 'class1',
    },
    {
      type => 'list',
      value => [
        {
          type => 'atom',
          value => 'subclass1',
        },
      ],
    },
    {
      type => 'list',
      value => [
        {
          type => 'atom',
          value => 'subclass2',
        },
        {
          type => 'atom',
          value => 'item1',
        },
        {
          type => 'atom',
          value => 'item2',
        },
      ],
    },
  ],
}
It's possible that something far simpler could be generated, but you were light on details about the format.
Extract the necessary information from the tree.
You were light on details about the data format, but it could be as simple as the following if the above data structure was created by the parser:
my $item = $tree->{value}[2]{value}[1]{value};
Perform the required modifications.
You were light on details about the data format, but it could be as simple as the following if the above data structure was created by the parser:
my $new_atom = { type => 'atom', value => $item };
push @{ $tree->{value}[1]{value} }, $new_atom;
Serialize the data structure.
For the above data structure, you could use the following:
sub serialize {
  my ($node) = @_;
  return $node->{type} eq 'list'
    ? "(" . join(" ", map { serialize($_) } @{ $node->{value} }) . ")"
    : $node->{value};
}
Other approaches could be available depending on the specifics.

Are nested queries used for object mappings in ElasticSearch?

Say we have a document model in ElasticSearch which has a string (registration) and an object datatype called currentname, which in turn has a value called name. I have added the model JSON at the bottom.
How do you query this?
Since the metadata item is not using the nested keyword, I assume we can get the data without a nested query. However, since metadata does have a properties value, I am unsure.
My attempts:
I have tried the simple model (NEST):
var res2 = client.Search<dynamic>(s => s
    .Index("myindexname")
    .AllTypes()
    .Query(q => q
        .Bool(b => b
            .Must(mu => mu
                .Term(te => te
                    .Field("metadata.currentname.name")
                    .Value(query))))));
This doesn't return any documents.
However, if I have to use nested queries, I don't completely understand why (is it because these objects are basically a different index?). And how would the DSL (or NEST code) look to do the right call?
Document model:
properties: {
  Company: {
    properties: {
      registration: {
        type: "string"
      },
      metadata: {
        currentname: {
          properties: {
            name: { type: "string" }
          }
        }
      }
    }
  }
}
Nested queries are used only with the nested datatype. The mapping above never declares metadata or currentname as type: nested, so the objects are simply flattened into dotted paths, and a regular (non-nested) query on metadata.currentname.name is all that is needed. If the term query above returns nothing, the likely reason is that an analyzed string field stores lowercased tokens rather than the exact original value, so a match query is worth trying instead of term.

logstash kv filter, converting strings to integers using dynamic mapping

I have a log with a format similar to:
name=johnny amount=30 uuid=2039248934
The problem is that I am using this parser on multiple log files, each of which basically contains numerous kv pairs.
Is there a way to recognize when values are integers and cast them as such (rather than as strings), without having to use mutate on every single key-value pair?
I found this link, but it was very vague about where the template JSON file was supposed to go and how I was to use it.
Can kv be told to auto-detect numeric values and emit them as numeric JSON values?
You can use the ruby plugin to do it.
input {
  stdin {}
}
filter {
  ruby {
    code => "
      # Logstash 2.x event API: fields are read and written like a hash
      fieldArray = event['message'].split(' ');
      for field in fieldArray
        name = field.split('=')[0];
        value = field.split('=')[1];
        if value =~ /\A\d+\Z/
          event[name] = value.to_i
        else
          event[name] = value
        end
      end
    "
  }
}
output {
  stdout { codec => rubydebug }
}
First, split the message into an array on spaces.
Then, for each key-value mapping, check whether the value is numeric; if it is, convert it to an Integer.
Here is the sample output for your input:
{
"message" => "name=johnny amount=30 uuid=2039248934",
"#version" => "1",
"#timestamp" => "2015-06-25T08:24:39.755Z",
"host" => "BEN_LIM",
"name" => "johnny",
"amount" => 30,
"uuid" => 2039248934
}
Update: solution for Logstash 5:
input {
  stdin {}
}
filter {
  ruby {
    code => "
      fieldArray = event.get('message').split(' ');
      for field in fieldArray
        name = field.split('=')[0];
        value = field.split('=')[1];
        if value =~ /\A\d+\Z/
          event.set(name, value.to_i)
        else
          event.set(name, value)
        end
      end
    "
  }
}
output {
  stdout { codec => rubydebug }
}
Note: if you decide to upgrade to Logstash 5, there are some breaking changes:
https://www.elastic.co/guide/en/logstash/5.0/breaking-changes.html
In particular, the event API changed: fields must now be accessed with event.get and event.set. Here is what I used to get it working (based on Ben Lim's example):
input {
  stdin {}
}
filter {
  ruby {
    code => "
      fieldArray = event.get('message').split(' ');
      for field in fieldArray
        name = field.split('=')[0];
        value = field.split('=')[1];
        if value =~ /\A\d+\Z/
          event.set(name, value.to_i)
        else
          event.set(name, value)
        end
      end
    "
  }
}
output {
  stdout { codec => rubydebug }
}
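On newer Logstash versions, a similar effect can be had without hand-splitting the message: let the kv filter do the parsing, then convert the numeric values in one generic ruby pass. This is a sketch under the Logstash 5+ event API, not a tested drop-in:
filter {
  kv {
    source => "message"   # parses name=johnny amount=30 uuid=2039248934 into fields
  }
  ruby {
    code => "
      # walk every field and convert purely numeric string values to integers
      event.to_hash.each do |key, value|
        if value.is_a?(String) && value =~ /\A\d+\Z/
          event.set(key, value.to_i)
        end
      end
    "
  }
}
The is_a?(String) guard keeps non-string fields such as @timestamp untouched.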
