Created nested fields from Xpath & check for existing documents - elasticsearch

I have two questions;
parsing xml data & adding it to an array in a record in an index
checking for an existing record in an index and if it exists add the new data of that record to the array of the existing record
I have an jdbc input that has an xml column,
input {
jdbc {
....
statement => "SELECT event_xml....
}
}
then an xml filter to parse the data,
How do i make the the last 3 xpaths to be an array? Do i need a mutate or ruby filter? I cant seem to figure it out
filter {
xml {
source => "event_xml"
remove_namespaces => true
store_xml => false
force_array => false
xpath => [ "/CaseNumber/text()", "case_number" ]
xpath => [ "/FormName/text()", "[conversations][form_name]" ]
xpath => [ "/EventDate/text()", "[conversations][event_date]" ]
xpath => [ "/CaseNote/text()", "[conversations][case_note]" ]
}
}
so it would something like this look like this in the Elastic search.
{
"case_number" : "12345",
"conversations" :
[
{
"form_name" : "form1",
"event_date" : "2019-01-09T00:00:00Z",
"case_note" : "this is a case note"
}
]
}
So second question is, if there is already a unique case_number of "12345" instead of creating a new record for this add the new xml values to the conversations array. so it would look like this
{
"case_number" : "12345",
"conversations" : [
{
"form_name" : "form1",
"event_date" : "2019-01-09T00:00:00Z",
"case_note" : "this is a case note"
},
{
"form_name" : "form2",
"event_date" : "2019-05-09T00:00:00Z",
"case_note" : "this is another case note"
}
]
}
my output filter
output {
elasticsearch {
hosts => ["http://localhost:9200"]
index => "cases"
manage_template => false
}
}
Is this possible? thanks

this ruby filter created the array
ruby {
code => '
event.set("conversations", [Hash[
"publish_event_id", event.get("publish_event_id"),
"form_name", event.get("form_name"),
"event_date", event.get("event_date"),
"case_note", event.get("case_note")
]])
'
}
for the output was resolved by
output {
elasticsearch {
hosts => ["http://localhost:9200"]
index => "cases"
document_id => "%{case_number}"
action => "update"
doc_as_upsert => true
script => "
boolean recordExists = false;
for (int i = 0; i < ctx._source.conversations.length; i++)
{
if(ctx._source.conversations[i].publish_event_id == params.event.get('conversations')[0].publish_event_id)
{
recordExists = true;
}
}
if(!recordExists){
ctx._source.conversations.add(params.event.get('conversations')[0]);
}
"
manage_template => false
}
}

Related

I have json data i need search `unique` if key exist or not

I have JSON data I need search unique if the key exists or not.
[
{
"key1" => []
},
{
"key" => []
},
{
"unique" => []
}
]
I can use loop but need an efficient way to check unique exist or not
You'll need to iterate through the array either way.
# You'll get found item or `nil`
data.find { |item| item.key?('unique') }
# You'll get `true` or `false`
data.any? { |item| item.key?('unique') }
Btw better to use a hash as an input instead of an array:
data = {
"key1" => [],
"key" => [],
"unique" => []
}
data.key?('unique')
=> true

Map ruby hash to logstash configuration file

I have the following ruby hash.
config = {
'output' => {
'elasticsearch' => {
'hosts' => ['localhost:9200']
}
}
}
Which I'm trying to represent as a logstash configuration file (https://www.elastic.co/guide/en/logstash/current/configuration.html). In this case, something that looks similar to this.
output {
elasticsearch { hosts => ["localhost:9200"] }
}
I've tried using map which is close, but "elasticsearch" should not have "=>" and "elasticsearch" and "hosts" should not be quoted.
puts config.map{|k, v| "#{k} #{v}"}.join('&')
output {"elasticsearch"=>{"hosts"=>["localhost:9200"]}}
I've also tried converting to json and using gsub, but in this case I need to unindent the string and "output" and "elasticsearch" should not be quoted.
puts JSON.pretty_generate(config).gsub(/^[{}]$/, "")
.gsub(": {", " {")
.gsub(": ", " => ")[1..-2]
"output" {
"elasticsearch" {
"hosts" => [
"localhost:9200"
]
}
}
While each implementation is close, it's still off by a bit. Is there a simple way to achieve this?
The Logstash config format isn't standard JSON or anything. It may be best to just write a serializer for it. I took a quick stab at it:
def serialize_config(config, tabs = 0)
clauses = []
config.each do |key, val|
case val
when Hash
clauses << format("%s {\n%s%s}", key, serialize_config(val, tabs + 1), "\t" * tabs)
else
clauses << format("%s => %s", key, val.inspect)
end
end
clauses.map {|c| format("%s%s\n", "\t" * tabs, c) }.join
end
config = {
'output' => {
'elasticsearch' => {
'hosts' => ['localhost:9200']
},
'ruby' => {
"code" => "event.cancel if rand <= 0.90"
}
}
}
puts serialize_config(config)
When gives output:
output {
elasticsearch {
hosts => ["localhost:9200"]
}
ruby {
code => "event.cancel if rand <= 0.90"
}
}
You'd want to check it against more complex Logstash configs, though.

logstash kv filter, converting strings to integers using dynamic mapping

I have a log with a format similar to:
name=johnny amount=30 uuid=2039248934
The problem is I am using this parser on multiple log files with each basically containing numerous kv pairs.
Is there a way to recognize when values are integers and cast them as such without having to use mutate on every single key value pair?(Rather than a string)
I found this link but it was very vague in where the template json file was suppose to go and how I was to go about using it.
Can kv be told to auto-detect numeric values and emit them as numeric JSON values?
You can use ruby plugin to do it.
input {
stdin {}
}
filter {
ruby {
code => "
fieldArray = event['message'].split(' ');
for field in fieldArray
name = field.split('=')[0];
value = field.split('=')[1];
if value =~ /\A\d+\Z/
event[name] = value.to_i
else
event[name] = value
end
end
"
}
}
output {
stdout { codec => rubydebug }
}
First, split the message to an array by SPACE.
Then, for each k,v mapping, check whether the value is numberic, if YES, convert it to Integer.
Here is the sample output for your input:
{
"message" => "name=johnny amount=30 uuid=2039248934",
"#version" => "1",
"#timestamp" => "2015-06-25T08:24:39.755Z",
"host" => "BEN_LIM",
"name" => "johnny",
"amount" => 30,
"uuid" => 2039248934
}
Update Solution for Logstash 5:
input {
stdin {}
}
filter {
ruby {
code => "
fieldArray = event['message'].split(' ');
for field in fieldArray
name = field.split('=')[0];
value = field.split('=')[1];
if value =~ /\A\d+\Z/
event.set(name, value.to_i)
else
event.set(name, value)
end
end
"
}
}
output {
stdout { codec => rubydebug }
}
Note, if you decide to upgrade to Logstash 5, there are some breaking changes:
https://www.elastic.co/guide/en/logstash/5.0/breaking-changes.html
In particular, it is the event that needs to be modified to use either event.get or event.set. Here is what I used to get it working (based on Ben Lim's example):
input {
stdin {}
}
filter {
ruby {
code => "
fieldArray = event.get('message').split(' ');
for field in fieldArray
name = field.split('=')[0];
value = field.split('=')[1];
if value =~ /\A\d+\Z/
event.set(name, value.to_i)
else
event.set(name, value)
end
end
"
}
}
output {
stdout { codec => rubydebug }
}

Duplicate data in kibana from logstash

I'm trying display some Mongo data that I've been collecting using logstash using the Mongostat tool. It displays things with a suffix like "b", "k", "g" to signify byte, kilobyte, gigabyte, which is fine if I'm just reading the output, but I want to throw this into kibana and display it in a graphical format to see trends.
I've done this with several other log files and everything is fine. When I use a grok filter everything is fine but I've added a Ruby filter and now data seems to be duplicated in all fields other than the logstash generated fields and my new field created in my Ruby filter.
Here is the relevant parts of my conf file:
input {
file {
path => "/var/log/mongodb/mongostat.log"
type => "mongostat"
start_position => "end"
}
}
filter {
if [type] == "mongostat" {
grok {
patterns_dir => "/opt/logstash/patterns"
match => ["message","###a bunch of filtering that i know works###"]
add_tag => "mongostat"
}
if [mongoMappedQualifier] == 'b' {
ruby {
code => "event['mongoMappedKB'] = event['mongoMapped'].to_f / 1024"
}
}
if [mongoMappedQualifier] == 'k' {
ruby {
code => "event['mongoMappedKB'] = event['mongoMapped'].to_f * 1"
}
}
if [mongoMappedQualifier] == 'm' {
ruby {
code => "event['mongoMappedKB'] = event['mongoMapped'].to_f * 1024"
}
}
if [mongoMappedQualifier] == 'g' {
ruby {
code => "event['mongoMappedKB'] = event['mongoMapped'].to_f * 1048576"
}
}
}
}
output {
if [type] == "mongostat" {
redis {
host => "redis"
data_type => "list"
key => "logstash-mongostat"
}
}
}
Any idea why or how this can be fixed?

How do I extract values from nested JSON?

After parsing some JSON:
data = JSON.parse(data)['info']
puts data
I get:
[
{
"title"=>"CEO",
"name"=>"George",
"columns"=>[
{
"display_name"=> "Salary",
"value"=>"3.85",
}
, {
"display_name"=> "Bonus",
"value"=>"994.19",
}
, {
"display_name"=> "Increment",
"value"=>"8.15",
}
]
}
]
columns has nested data in itself.
I want to save the data in a database or CSV file.
title, name, value_Salary, value_Bonus, value_increment
But I'm not concerned about getting display_name, so just the values of first of columns, second of columns data, etc.
Ok I tried data.map after converting to hash & hash.flatten could find a way out.. .map{|x| x['columns']}
.map {|s| s["value"]}
tried to get the values atleast separately - but couldnt...
This is a simple problem, and resolves down to a couple nested map blocks.
Here's the data retrieved from JSON, plus an extra row to demonstrate how easy it is to handle a more complex JSON response:
data = [
{
"title" => "CEO",
"name" => "George",
"columns" => [
{
"display_name" => "Salary",
"value" => "3.85",
},
{
"display_name" => "Bonus",
"value" => "994.19",
},
{
"display_name" => "Increment",
"value" => "8.15",
}
]
},
{
"title" => "CIO",
"name" => "Fred",
"columns" => [
{
"display_name" => "Salary",
"value" => "3.84",
},
{
"display_name" => "Bonus",
"value" => "994.20",
},
{
"display_name" => "Increment",
"value" => "8.15",
}
]
}
]
Here's the code:
records = data.map { |record|
title, name = record.values_at('title', 'name')
values = record['columns'].map{ |column| column['value'] }
[title, name, *values]
}
Here's the resulting data structure, an array of arrays:
records
# => [["CEO", "George", "3.85", "994.19", "8.15"],
# ["CIO", "Fred", "3.84", "994.20", "8.15"]]
Saving it into a database or CSV is left for you to figure out, but Ruby's CSV class makes it trivial to write a file, and an ORM like Sequel makes it really easy to insert the data into a database.

Resources