Logstash filter out values with null values for a key in a nested json array - ruby

I have quite an extensive Logstash pipeline ending in a Json as such:
{
  "keyA": 1,
  "keyB": "sample",
  "arrayKey": [
    {
      "key": "data"
    },
    {
      "key": null
    }
  ]
}
What I want to achieve is to filter "arrayKey" and remove the objects within it whose value for "key" is null.
I tried this with no luck:
filter {
  ruby {
    code => "
      event.get('arrayKey').each do |key|
        [key].delete_if do |keyCandidate|
          if [keyCandidate][key] != nil
            true
          end
        end
      end
    "
  }
}
This gives a "no implicit conversion of Hash into Integer" error. How do I achieve this? Is there an easier way to do this?

As Aleksei pointed out, you can use reject to create a copy of the array that omits the entries where [key] is null. You then have to use event.set to overwrite the initial value of [arrayKey]:
ruby {
  code => '
    a = event.get("arrayKey")
    if a
      event.set("arrayKey", a.reject { |x| x["key"] == nil })
    end
  '
}
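Outside of Logstash, the same reject logic can be sanity-checked in plain Ruby; the hash below stands in for the Logstash event (event.get/event.set are Logstash's event API):

```ruby
require 'json'

# Plain-Ruby stand-in for the Logstash event from the question.
event = {
  "keyA" => 1,
  "keyB" => "sample",
  "arrayKey" => [
    { "key" => "data" },
    { "key" => nil }
  ]
}

# Same logic as the filter: keep only entries whose "key" is non-nil.
a = event["arrayKey"]
event["arrayKey"] = a.reject { |x| x["key"].nil? } if a

puts JSON.generate(event)
# => {"keyA":1,"keyB":"sample","arrayKey":[{"key":"data"}]}
```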

Related

Logstash escape JSON Keys

I have multiple systems that send data as a JSON request body. This is my simple config file:
input {
  http {
    port => 5001
  }
}
output {
  elasticsearch {
    hosts => "elasticsearch:9200"
  }
}
In most cases this works just fine; I can look at the JSON data in Kibana.
In some cases, though, the JSON will not be processed. It has something to do with JSON escaping: for example, if a key contains a '.', the JSON will not be processed.
I cannot control the JSON. Is there a way to escape these characters in a JSON key?
Update: As mentioned in the comments, here is an example of a JSON string (the content is altered, but I've tested this string and it shows the same behavior as the original):
{
  "http://example.com": {
    "a": "",
    "b": ""
  }
}
My research finally led me back to my own post.
Before Elasticsearch 2.0, dots in field names were allowed; since version 2.0 this is not the case anymore.
One user in the logstash forum developed a ruby script that takes care of the dots in json keys:
filter {
  ruby {
    init => "
      def remove_dots(hash)
        new = Hash.new
        hash.each { |k, v|
          # recurse into nested hashes, and into hashes nested inside arrays
          v = remove_dots(v) if v.is_a?(Hash)
          v = v.map { |elem| elem.is_a?(Hash) ? remove_dots(elem) : elem } if v.is_a?(Array)
          new[k.gsub('.', '_')] = v
        } unless hash.nil?
        return new
      end
    "
    code => "
      event.instance_variable_set(:@data, remove_dots(event.to_hash))
    "
  }
}
All credits go to @hanzmeier1234 (Field name cannot contain ‘.’)
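As a quick sanity check outside Logstash, the helper (with the array handling folded in) can be exercised in plain Ruby on a document like the one from the question:

```ruby
# Standalone copy of the remove_dots helper, for testing outside Logstash.
def remove_dots(hash)
  new = Hash.new
  hash.each { |k, v|
    v = remove_dots(v) if v.is_a?(Hash)
    v = v.map { |elem| elem.is_a?(Hash) ? remove_dots(elem) : elem } if v.is_a?(Array)
    new[k.gsub('.', '_')] = v
  } unless hash.nil?
  new
end

doc = { "http://example.com" => { "a" => "", "b" => "" }, "list" => [{ "x.y" => 1 }] }
p remove_dots(doc)
# => {"http://example_com"=>{"a"=>"", "b"=>""}, "list"=>[{"x_y"=>1}]}
```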

Search JSON Object with Ruby for Keyword

I am trying to return the string "image-2016-05-05+19%3A13%3A49.058890.jpg" from this relatively complex JSON object:
{
  "Type": "Notification",
  "MessageId": "e3a008de-7053-530e-b2b4-4778704d30a0",
  "TopicArn": "arn:aws:sns:us-west-2:xxxx:xxxx",
  "Subject": "Amazon S3 Notification",
  "Message": "{\"Records\":[{\"eventVersion\":\"2.0\",\"eventSource\":\"aws:s3\",\"awsRegion\":\"us-west-2\",\"eventTime\":\"2016-05-06T02:13:50.030Z\",\"eventName\":\"ObjectCreated:Put\",\"userIdentity\":{\"principalId\":\"AWS:AIDAIZ6VOIJWE82389JSE\"},\"requestParameters\":{\"sourceIPAddress\":\"0.0.0.0\"},\"responseElements\":{\"x-amz-request-id\":\"F819FA912DBD16\",\"x-amz-id-2\":\"7oOWHPhWsgjBW6XSj8DiSj8Sj8801LKJn5NLRn8JmYsNxJXKWqlkjDFL092zHuWYZn7pIKcRwX6g=\"},\"s3\":{\"s3SchemaVersion\":\"1.0\",\"configurationId\":\"image-notification\",\"bucket\":{\"name\":\"project\",\"ownerIdentity\":{\"principalId\":\"A17D10FQZ\"},\"arn\":\"arn:aws:s3:::project\"},\"object\":{\"key\":\"image-2016-05-05+19%3A13%3A49.058890.jpg\",\"size\":54098,\"eTag\":\"fbc4bakjf8asdj8f890ece3474c55974927c\",\"sequencer\":\"00572LKJDF389238CA7B04BD\"}}}]}",
  "Timestamp": "2016-05-06T02:13:50.126Z",
  "SignatureVersion": "1",
  "Signature": "Lao5PoEchryYf1slxxxlyI0GB2Xrv03VFC+4JVlji0y1El+rQGL837PYRHdj2m/dGD9/ynJxPhIBWcoJxX4D7MBsNqaZXilqJtjp+t8Rku0avErgWQVQG+rjZcdVbSU12DI/Ku0v9LhYg2/Js+ofYGPZH9U4C+Jfup5wjgHXah4BGNmF3TO+oq08Y56edhMxV25URDcU+z5aaVW2sK2tlnynSNzLuAF5TlKuuLmYr3Buci83FkU46l6Bz/ENba1BlGGqT8P+ljdf9092z+iP42T9qUzj1HL9p9SjEDIam/03n1039JS01gbPpgdo6/2Z6kZK3LvrVRBzI0voFitLg==",
  "SigningCertURL": "https://sns.us-west-2.amazonaws.com/SimpleNotificationService-bbxxx750dd426323fafd95ee9390147a5624348ee.pem",
  "UnsubscribeURL": "https://sns.us-west-2.amazonaws.com/?Action=Unsubscribe&SubscriptionArn=arn:aws:sns:us-west-2:332531341234:xxxx:0e43fsSDF40e-d4a7-46c0-95ab-4fd11739267b"
}
Without having to do this:
@key = JSON.parse(@request[:Message])["Records"][0]["s3"]["object"]["key"]
Is there a way to parse and search through this JSON object and return the aforementioned string by providing a keyword such as "image"?
You could use Hashie's deep_locate:
require "hashie"

request = JSON.parse(@request[:Message])
request.extend(Hashie::Extensions::DeepLocate)
request.deep_locate -> (key, value, object) { key == "key" && value.include?("image") }
#=> [{ "key" => "image-2016-05-05+19%3A13%3A49.058890.jpg", "size" => 54098, ... }]
Note that JSON.parse yields string keys, so the lambda matches "key" rather than :key, and deep_locate returns the array of hashes in which the condition matched.
Apart from searching by value, if you know the key, you could do this to find the deeply nested value for that key:
def nested_hash_value(obj, key)
  if obj.respond_to?(:key?) && obj.key?(key)
    obj[key]
  elsif obj.respond_to?(:each)
    r = nil
    obj.find { |*a| r = nested_hash_value(a.last, key) }
    r
  end
end

p nested_hash_value(JSON.parse(@request[:Message]), "key")
#=> "image-2016-05-05+19%3A13%3A49.058890.jpg"
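Since the helper is plain Ruby, it can be sanity-checked on any small nested structure first:

```ruby
# Recursively search hashes and arrays for the first occurrence of key.
def nested_hash_value(obj, key)
  if obj.respond_to?(:key?) && obj.key?(key)
    obj[key]
  elsif obj.respond_to?(:each)
    r = nil
    # find stops as soon as a recursive call returns a truthy result
    obj.find { |*a| r = nested_hash_value(a.last, key) }
    r
  end
end

data = { "a" => [{ "b" => { "key" => "found-me" } }] }
p nested_hash_value(data, "key")
# => "found-me"
```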

Ruby mongoid aggregation return object

I am doing an mongodb aggregation using mongoid, using ModleName.collection.aggregate(pipeline) . The value returned is an array and not a Mongoid::Criteria, so if a do a first on the array, I get the first element which is of the type BSON::Document instead of ModelName. As a result, I am unable to use it as a model.
Is there a method to return a criteria instead of an array from the aggregation, or convert a bson document to a model instance?
Using mongoid (4.0.0)
I've been struggling with this on my own too. I'm afraid you have to build your "models" on your own. Let's take an example from my code:
class Searcher
  # ...
  def results(page: 1, per_page: 50)
    pipeline = []
    pipeline << {
      "$match" => {
        title: /#{@params['query']}/i
      }
    }
    geoNear = {
      "near" => coordinates,
      "distanceField" => "distance",
      "distanceMultiplier" => 3959,
      "num" => 500,
      "spherical" => true,
    }
    pipeline << {
      "$geoNear" => geoNear
    }
    count = aggregate(pipeline).count
    pipeline << { "$skip" => ((page.to_i - 1) * per_page) }
    pipeline << { "$limit" => per_page }
    places_hash = aggregate(pipeline)
    places = places_hash.map { |attrs| Offer.new(attrs) { |o| o.new_record = false } }
    # ...
    places
  end

  def aggregate(pipeline)
    Offer.collection.aggregate(pipeline)
  end
end
I've omitted a lot of code from the original project, just to show the approach.
The most important line is:
places_hash.map { |attrs| Offer.new(attrs) { |o| o.new_record = false } }
Here I'm building an array of Offers and manually setting each one's new_record attribute to false, so they behave like any other documents fetched by a simple Offer.where(...).
It's not beautiful, but it worked for me, and I could take the best of whole Aggregation Framework!
Hope that helps!

Delete nested hash according to key => value

I have this hash:
response = '{"librairies":[{"id":1,"books":[{"id":1,"qty":1},{"id":2,"qty":3}]},{"id":2,"books":[{"id":1,"qty":0},{"id":2,"qty":3}]}]}'
in which I'd like to delete every library where at least one of the book quantities is zero.
For instance, with this given response, I'd expect this return:
'{"librairies":[{"id":1,"books":[{"id":1,"qty":1},{"id":2,"qty":3}]}]}'
I've tried this:
parsed = JSON.parse(response)
parsed["librairies"].each do |library|
  library["books"].each do |book|
    parsed.delete(library) if book["qty"] == 0
  end
end
but this returns the exact same response hash, without having deleted the second library (the one with "id" => 2).
You can use Array#delete_if and Enumerable#any? for this:
# Move through each array element with delete_if
parsed["librairies"].delete_if do |library|
  # evaluates to true if any book hash in the library
  # has a "qty" value of 0
  library["books"].any? { |book| book["qty"] == 0 }
end
Hope this helps
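Putting the whole round trip together (parse, prune, re-serialize), using the same response string as the question:

```ruby
require 'json'

response = '{"librairies":[{"id":1,"books":[{"id":1,"qty":1},{"id":2,"qty":3}]},{"id":2,"books":[{"id":1,"qty":0},{"id":2,"qty":3}]}]}'

parsed = JSON.parse(response)
# Drop any library that contains a book with qty 0.
parsed["librairies"].delete_if do |library|
  library["books"].any? { |book| book["qty"] == 0 }
end

puts JSON.generate(parsed)
# => {"librairies":[{"id":1,"books":[{"id":1,"qty":1},{"id":2,"qty":3}]}]}
```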
To avoid changing the hash parsed, you could do the following.
Firstly, let's format parsed so we can see what we're dealing with:
parsed = { "libraries"=>[ { "id"=>1,
"books"=>[ { "id"=>1, "qty"=>1 },
{ "id"=>2, "qty"=>3 } ]
},
{ "id"=>2,
"books"=>[ { "id"=>1, "qty"=>0 },
{ "id"=>2, "qty"=>3 } ]
}
]
}
Later I want to show that parsed has not been changed when we create the new hash. An easy way of doing that is to compute a hash code on parsed before and after, and see if it changes. (While it's not 100% certain that different hashes won't have the same hash code, here it's not something to lose sleep over.)
parsed.hash
#=> 852445412783960729
We first need to make a "deep copy" of parsed so that changes to the copy will not affect parsed. One way of doing that is to use the Marshal module:
new_parsed = Marshal.load(Marshal.dump(parsed))
We can now modify the copy as required:
new_parsed["libraries"].reject! { |h| h["books"].any? { |g| g["qty"].zero? } }
#=> [ { "id"=>1,
# "books"=>[ { "id"=>1, "qty"=>1 },
# { "id"=>2, "qty"=>3 }
# ]
# }
# ]
new_parsed
  #=> { "libraries"=>[ { "id"=>1,
  #       "books"=>[ { "id"=>1, "qty"=>1 },
  #                  { "id"=>2, "qty"=>3 } ]
  #     } ]
  #   }
And we confirm the original hash was not changed:
parsed.hash
#=> 852445412783960729
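The Marshal round trip is a generic deep-copy idiom worth knowing on its own; a minimal sketch:

```ruby
# A shallow dup would share the nested array/hashes; the Marshal
# dump/load round trip copies the whole structure.
original = { "a" => [{ "qty" => 0 }] }
copy = Marshal.load(Marshal.dump(original))

# Mutating the copy leaves the original untouched.
copy["a"].reject! { |h| h["qty"].zero? }

p original  # => {"a"=>[{"qty"=>0}]}
p copy      # => {"a"=>[]}
```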

Elegantly creating a hash from an array

I currently have some Ruby code that creates output like this (after conversion to JSON):
"days": [
{
"Jul-22": ""
},
{
"Aug-19": ""
}
],
What I want is output like this:
"days": {
"Jul-22": "",
"Aug-19": ""
},
Here is my code:
CalendarDay.in_the_past_30_days(patient).select(&:noteworthy?).collect do |noteworthy_day|
  { noteworthy_day.date.to_s(:trends_id) => "" }
end
In other words I want a hash instead of an array of hashes. Here's my ugly solution:
days = {}
CalendarDay.in_the_past_30_days(patient).select(&:noteworthy?).each do |noteworthy_day|
  days[noteworthy_day.date.to_s(:trends_id)] = ""
end
days
That seems very unrubylike, though. Can someone help me do this more efficiently?
Hash[
  CalendarDay.in_the_past_30_days(patient).select(&:noteworthy?).collect { |noteworthy_day|
    [noteworthy_day.date.to_s(:trends_id), ""]
  }
]
Or...
CalendarDay.in_the_past_30_days(patient).select(&:noteworthy?).each_with_object({}) { |noteworthy_day, ndays|
  ndays[noteworthy_day.date.to_s(:trends_id)] = ""
}
This is a problem tailor-made for Enumerable#inject:
CalendarDay.in_the_past_30_days(patient).select(&:noteworthy?).inject({}) do |hash, noteworthy_day|
  hash[noteworthy_day.date.to_s(:trends_id)] = ''
  hash
end
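Since Ruby 2.1 there is also Array#to_h, which turns an array of [key, value] pairs straight into a hash. The `noteworthy_dates` array below is a hypothetical stand-in for the `noteworthy_day.date.to_s(:trends_id)` values from the question:

```ruby
# Hypothetical stand-in for the formatted dates of the noteworthy days.
noteworthy_dates = ["Jul-22", "Aug-19"]

# Map each date to a [key, value] pair, then build the hash in one step.
days = noteworthy_dates.map { |d| [d, ""] }.to_h
p days
# => {"Jul-22"=>"", "Aug-19"=>""}
```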
