Search JSON Object with Ruby for Keyword - ruby

Trying to return the string "image-2016-05-05+19%3A13%3A49.058890.jpg" from this relatively complex JSON object:
{
"Type": "Notification",
"MessageId": "e3a008de-7053-530e-b2b4-4778704d30a0",
"TopicArn": "arn:aws:sns:us-west-2:xxxx:xxxx",
"Subject": "Amazon S3 Notification",
"Message": "{\"Records\":[{\"eventVersion\":\"2.0\",\"eventSource\":\"aws:s3\",\"awsRegion\":\"us-west-2\",\"eventTime\":\"2016-05-06T02:13:50.030Z\",\"eventName\":\"ObjectCreated:Put\",\"userIdentity\":{\"principalId\":\"AWS:AIDAIZ6VOIJWE82389JSE\"},\"requestParameters\":{\"sourceIPAddress\":\"0.0.0.0\"},\"responseElements\":{\"x-amz-request-id\":\"F819FA912DBD16\",\"x-amz-id-2\":\"7oOWHPhWsgjBW6XSj8DiSj8Sj8801LKJn5NLRn8JmYsNxJXKWqlkjDFL092zHuWYZn7pIKcRwX6g=\"},\"s3\":{\"s3SchemaVersion\":\"1.0\",\"configurationId\":\"image-notification\",\"bucket\":{\"name\":\"project\",\"ownerIdentity\":{\"principalId\":\"A17D10FQZ\"},\"arn\":\"arn:aws:s3:::project\"},\"object\":{\"key\":\"image-2016-05-05+19%3A13%3A49.058890.jpg\",\"size\":54098,\"eTag\":\"fbc4bakjf8asdj8f890ece3474c55974927c\",\"sequencer\":\"00572LKJDF389238CA7B04BD\"}}}]}",
"Timestamp": "2016-05-06T02:13:50.126Z",
"SignatureVersion": "1",
"Signature": "Lao5PoEchryYf1slxxxlyI0GB2Xrv03VFC+4JVlji0y1El+rQGL837PYRHdj2m/dGD9/ynJxPhIBWcoJxX4D7MBsNqaZXilqJtjp+t8Rku0avErgWQVQG+rjZcdVbSU12DI/Ku0v9LhYg2/Js+ofYGPZH9U4C+Jfup5wjgHXah4BGNmF3TO+oq08Y56edhMxV25URDcU+z5aaVW2sK2tlnynSNzLuAF5TlKuuLmYr3Buci83FkU46l6Bz/ENba1BlGGqT8P+ljdf9092z+iP42T9qUzj1HL9p9SjEDIam/03n1039JS01gbPpgdo6/2Z6kZK3LvrVRBzI0voFitLg==",
"SigningCertURL": "https://sns.us-west-2.amazonaws.com/SimpleNotificationService-bbxxx750dd426323fafd95ee9390147a5624348ee.pem",
"UnsubscribeURL": "https://sns.us-west-2.amazonaws.com/?Action=Unsubscribe&SubscriptionArn=arn:aws:sns:us-west-2:332531341234:xxxx:0e43fsSDF40e-d4a7-46c0-95ab-4fd11739267b"
}
Without having to do this:
@key = JSON.parse(@request[:Message])["Records"][0]["s3"]["object"]["key"]
Is there a way to parse and search through this JSON object to return the aforementioned string by providing a keyword such as "image"?

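You could use Hashie's DeepLocate extension. Note that "Message" is itself a JSON-encoded string, so it has to be parsed first, and that JSON.parse produces string keys:
message = JSON.parse(@request[:Message])
message.extend(Hashie::Extensions::DeepLocate)
message.deep_locate ->(key, value, object) { key == "key" && value.include?("image") }
#=> an array of the enclosing hashes that match, here the "object" hash whose "key" is "image-2016-05-05+19%3A13%3A49.058890.jpg"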
Apart from searching by value, if you know the key, you could do this to find the deeply nested value of that key.
def nested_hash_value(obj, key)
  if obj.respond_to?(:key?) && obj.key?(key)
    obj[key]
  elsif obj.respond_to?(:each)
    r = nil
    obj.find { |*a| r = nested_hash_value(a.last, key) }
    r
  end
end
# JSON.parse yields string keys, and "Message" must be parsed separately
p nested_hash_value(JSON.parse(@request[:Message]), "key")
#=> "image-2016-05-05+19%3A13%3A49.058890.jpg"
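The question asks for a search by keyword rather than by key. A minimal sketch of that in plain Ruby is shown here; find_values and parse_embedded_json are hypothetical helper names, not part of either answer, and the hash is a trimmed-down stand-in for the SNS notification. It assumes any string value that starts with "{" or "[" and parses cleanly should be treated as embedded JSON.

```ruby
require "json"

# Hypothetical helper: collect every string value containing `keyword`,
# together with the path of keys/indexes that leads to it.
def find_values(obj, keyword, path = [], hits = [])
  case obj
  when Hash
    obj.each { |k, v| find_values(v, keyword, path + [k], hits) }
  when Array
    obj.each_with_index { |v, i| find_values(v, keyword, path + [i], hits) }
  when String
    nested = parse_embedded_json(obj)
    if nested
      # The string was itself JSON (like the SNS "Message"), so search inside it.
      find_values(nested, keyword, path, hits)
    elsif obj.include?(keyword)
      hits << [path, obj]
    end
  end
  hits
end

# Hypothetical helper: returns the parsed structure, or nil if `str` is not JSON.
def parse_embedded_json(str)
  return nil unless str.start_with?("{", "[")
  JSON.parse(str)
rescue JSON::ParserError
  nil
end

# Trimmed-down stand-in for the notification in the question.
message = { "Records" => [{ "s3" => {
  "configurationId" => "image-notification",
  "object" => { "key" => "image-2016-05-05+19%3A13%3A49.058890.jpg", "size" => 54098 }
} }] }
envelope = { "Type" => "Notification", "Message" => JSON.generate(message) }

hits = find_values(envelope, "image")
hits.each { |path, value| puts "#{path.join('/')}: #{value}" }
```

With this data the keyword "image" matches two values, "image-notification" and the object key, which is why returning the path alongside each hit (or searching by key instead) tends to be safer than taking the first match.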

Related

DRY Strategy for looping over unknown levels of nested objects

My scenario is based on Gmail API.
I've learned that email messages can have their message parts deeply or shallowly nested based upon varying factors, but mostly the presence of attachments.
I'm using the Google API Ruby Client gem, so I'm not working with JSON; I'm getting objects that carry the same information. Still, I think the JSON representation makes my issue easier to understand.
A simple message JSON response looks like this (one parts array with 2 hashes inside it):
{
  "id": "175b418b1ff69896",
  "snippet": "COVID-19: Resources to help your business manage through uncertainty 20 Liters 500 PEOPLE FOUND YOU ON GOOGLE Here are the top search queries used to find you: 20 liters used by 146 people volunteer",
  "payload": {
    "parts": [
      {
        "mimeType": "text/plain",
        "body": {
          "data": "Hey, you found the body of the email! I want this!"
        }
      },
      {
        "mimeType": "text/html",
        "body": {
          "data": "<div>I actually don't want this</div>"
        }
      }
    ]
  }
}
The value I want is not that hard to get:
response.payload.parts.each do |part|
  @body_data = part.body.data if part.mime_type == 'text/plain'
end
But the JSON response of a more complex email message with attachments looks something like this (now parts nests itself 3 levels deep):
{
  "id": "175aee26de8209d2",
  "snippet": "snippet text...",
  "payload": {
    "parts": [
      {
        "mimeType": "multipart/related",
        "parts": [
          {
            "mimeType": "multipart/alternative",
            "parts": [
              {
                "mimeType": "text/plain",
                "body": {
                  "data": "hey, you found me! This is what I want!!"
                }
              },
              {
                "mimeType": "text/html",
                "body": {
                  "data": "<div>I actually don't want this one.</div>"
                }
              }
            ]
          },
          {
            "mimeType": "image/jpeg"
          },
          {
            "mimeType": "image/png"
          },
          {
            "mimeType": "image/png"
          },
          {
            "mimeType": "image/jpeg"
          },
          {
            "mimeType": "image/png"
          },
          {
            "mimeType": "image/png"
          }
        ]
      },
      {
        "mimeType": "application/pdf"
      }
    ]
  }
}
Looking at a few other messages, the object can vary from 1 to 5 levels (maybe more) of parts.
I need to loop over an unknown number of parts, then loop over an unknown number of nested parts, and repeat this until I reach the bottom, hopefully finding the thing I want.
Here's my best attempt:
def trim_response(response)
  # remove headers I don't care about
  response.payload.headers.keep_if { |header| @valuable_headers.include? header.name }
  # remove parts I don't care about
  response.payload.parts.each do |part|
    # parts can be nested within parts, within parts, within...
    if part.mime_type == @valuable_mime_part && part.body.present?
      @body_data = part.body.data
      break
    elsif part.parts.present?
      # there are more layers down
      find_body(part)
    end
  end
end
def find_body(part)
  part.parts.each do |sub_part|
    if sub_part.mime_type == @valuable_mime_part && sub_part.body.present?
      @body_data = sub_part.body.data
      break
    elsif sub_part.parts.present?
      # there are more layers down
      ######### THIS FEELS BAD!!! ###########
      find_body(sub_part)
    end
  end
end
Yep, there's a method calling itself. I know, that's why I'm here.
This does work, I've tested it on a few dozen messages, but... there has to be a better, DRY-er way to do this.
How do I recursively loop and then move down a level and loop again in a DRY fashion when I don't know how deep the nesting goes?
No need to go through all this pain. Just keep diving into the parts dictionary, following the first entry at each level, until you reach a value that has no parts anymore. At that point the final part is in your parts variable.
Code:
response = {"id" => "175aee26de8209d2","snippet" => "snippet text...","payload" => {"parts" => [{"mimeType" => "multipart/related","parts" => [{"mimeType" => "multipart/alternative","parts" => [{"mimeType" => "text/plain","body" => {"data" => "hey, you found me! This is what I want!!"}},{"mimeType" => "text/html","body" => {"data" => "<div>I actually don't want this one.</div>"}}]},{"mimeType" => "image/jpeg"}]},{"mimeType" => "application/pdf"}]}}
parts = response["payload"]
# descend into the first entry of each nested "parts" array until none is left
parts = (parts["parts"].first || parts["parts"]) while parts["parts"]
data = parts["body"]["data"]
puts data
Output:
hey, you found me! This is what I want!!
You can compute the desired result using recursion.
def find_it(h, top_key, k1, k2, k3)
  return nil unless h.key?(top_key)
  recurse(h[top_key], k1, k2, k3)
end
def recurse(h, k1, k2, k3)
  return nil unless h.key?(k1)
  h[k1].each do |g|
    v = g.dig(k2, k3) || recurse(g, k1, k2, k3)
    return v unless v.nil?
  end
  nil
end
See Hash#dig.
Let h1 and h2 equal the two hashes given in the example [1]. Then:
find_it(h1, :payload, :parts, :body, :data)
#=> "Hey, you found the body of the email! I want this!"
find_it(h2, :payload, :parts, :body, :data)
#=> "hey, you found me! This is what I want!!"
[1] The hash h[:payload][:parts].last #=> { "mimeType": "application/pdf" } appears to contain hidden characters that are causing a problem. I therefore removed that hash from h2.
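Since the question works with client objects rather than hashes, the two near-identical methods can be folded into one recursive method. The following is only a sketch: Part and Body are hypothetical Struct stand-ins for the Gmail client's objects (only the fields used in the question), find_body_data is an invented name, and the "multipart/mixed" type on the top-level payload is an assumption.

```ruby
# Hypothetical stand-ins for the Gmail client objects used in the question.
Part = Struct.new(:mime_type, :body, :parts, keyword_init: true)
Body = Struct.new(:data, keyword_init: true)

# Depth-first search: return the body data of the first part whose MIME type
# matches, descending into nested parts however deep they go.
def find_body_data(part, mime_type = "text/plain")
  return part.body.data if part.mime_type == mime_type && part.body&.data

  Array(part.parts).each do |sub_part|
    found = find_body_data(sub_part, mime_type)
    return found if found
  end
  nil # nothing matched in this branch
end

payload = Part.new(
  mime_type: "multipart/mixed", # assumed; the example JSON does not show it
  parts: [
    Part.new(mime_type: "multipart/related", parts: [
      Part.new(mime_type: "multipart/alternative", parts: [
        Part.new(mime_type: "text/plain",
                 body: Body.new(data: "hey, you found me! This is what I want!!")),
        Part.new(mime_type: "text/html",
                 body: Body.new(data: "<div>I actually don't want this one.</div>"))
      ]),
      Part.new(mime_type: "image/jpeg")
    ]),
    Part.new(mime_type: "application/pdf")
  ]
)

puts find_body_data(payload)
```

Because the payload is treated as just another part, one method covers every level, and the search stops at the first match instead of continuing through the remaining siblings. A method calling itself is the usual way to walk a structure of unknown depth, so the recursion in the question is not a problem in itself; the duplication was.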

Logstash filter out values with null values for a key in a nested json array

I have quite an extensive Logstash pipeline that ends in a JSON document like this:
{
  "keyA": 1,
  "keyB": "sample",
  "arrayKey": [
    {
      "key": "data"
    },
    {
      "key": null
    }
  ]
}
What I want to achieve is to filter "arrayKey" and remove the objects in it whose value for "key" is null.
I tried this without luck:
filter {
  ruby {
    code => "
      event.get('arrayKey').each do |key|
        [key].delete_if do |keyCandidate|
          if [keyCandidate][key] != nil
            true
          end
        end
      end
    "
  }
}
This gives a "no implicit converter found from |hash|:|Int|" error. How do I achieve this? Is there an easier way to do this?
As Aleksei pointed out, you can use reject to create a copy of the array that does not contain the entries where [key] is null. You have to use event.set to overwrite the initial value of [arrayKey].
ruby {
  code => '
    a = event.get("arrayKey")
    if a
      event.set("arrayKey", a.reject { |x| x["key"] == nil })
    end
  '
}
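The reason event.set is needed can be seen outside Logstash. This is a plain-Ruby sketch in which event_data is a hypothetical hash standing in for the event's fields; it only illustrates that reject returns a new array and leaves the original untouched.

```ruby
# Hypothetical stand-in for the event's fields.
event_data = {
  "keyA" => 1,
  "keyB" => "sample",
  "arrayKey" => [{ "key" => "data" }, { "key" => nil }]
}

# reject builds a new array; the one stored in event_data is not modified.
filtered = event_data["arrayKey"].reject { |x| x["key"].nil? }

puts filtered.length               # entries kept
puts event_data["arrayKey"].length # original still has both entries

# Writing the result back is the plain-Ruby equivalent of event.set.
event_data["arrayKey"] = filtered
```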

Logstash escape JSON Keys

I have multiple systems that send data as JSON Request Body. This is my simple config file.
input {
  http {
    port => 5001
  }
}
output {
  elasticsearch {
    hosts => "elasticsearch:9200"
  }
}
In most cases this works just fine. I can look at the JSON data with Kibana.
In some cases the JSON will not be processed. It has something to do with the JSON escaping. For example, if a key contains a '.', the JSON will not be processed.
I cannot control the JSON. Is there a way to escape these characters in a JSON key?
Update: As mentioned in the comments, I'll give an example of a JSON string (the content is altered, but I've tested the JSON string and it shows the same behavior as the original):
{
  "http://example.com": {
    "a": "",
    "b": ""
  }
}
My research brings me back to my post, finally.
Before Elasticsearch 2.0 dots in the key were allowed. Since version 2.0 this is not the case anymore.
One user in the Logstash forum developed a Ruby script that takes care of the dots in JSON keys (adjusted here so that arrays are kept intact instead of being replaced by their last element):
filter {
  ruby {
    init => "
      def remove_dots hash
        new = Hash.new
        hash.each { |k,v|
          if v.is_a? Hash
            v = remove_dots(v)
          elsif v.is_a? Array
            v = v.map { |elem| elem.is_a?(Hash) ? remove_dots(elem) : elem }
          end
          new[ k.gsub('.','_') ] = v
        } unless hash.nil?
        return new
      end
    "
    code => "
      event.instance_variable_set(:@data,remove_dots(event.to_hash))
    "
  }
}
All credits go to @hanzmeier1234 (Field name cannot contain ‘.’)
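The key-renaming logic can be tried outside Logstash. The following is a standalone sketch, not the forum script itself: replace_dots is a hypothetical name, and the sample hash extends the example from the question with an invented "b.c" array to exercise nesting.

```ruby
# Hypothetical standalone version of the key-renaming idea:
# replace "." with "_" in every hash key, at any depth.
def replace_dots(value)
  case value
  when Hash
    value.each_with_object({}) do |(k, v), out|
      out[k.to_s.gsub(".", "_")] = replace_dots(v)
    end
  when Array
    value.map { |elem| replace_dots(elem) }
  else
    value # scalars are returned unchanged
  end
end

# Sample based on the question; "b.c" and "d.e" are invented for illustration.
sample = { "http://example.com" => { "a" => "", "b.c" => [{ "d.e" => 1 }] } }
cleaned = replace_dots(sample)
puts cleaned.keys.first
```

Only keys are rewritten; string values that contain dots are left alone. Note that two keys differing only by "." versus "_" would collide after renaming, which this sketch does not guard against.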

How to add new key/value pair to existing JSON object in Ruby

How could I append a new key/value pair to an existing JSON object in Ruby?
My output is:
{
  "2d967df3-ee07-4e40-8f65-7bbff59bbb7e": {
    "name": "Book1",
    "author": "Author1"
  }
}
I want to achieve something like this when I add a new key/value pair:
{
  "2d967df3-ee07-4e40-8f65-7bbff59bbb7e": {
    "name": "Book1",
    "author": "Author1"
  },
  "c55a3632-9bed-4a41-ae40-c1abfe0f332a": {
    "name": "Book2",
    "author": "Author2"
  }
}
This is my method to write to a JSON file:
def create_book(name, author)
  tempHash = {
    SecureRandom.uuid => {
      "name" => name,
      "author" => author
    }
  }
  File.open("./books/book.json","w") do |f|
    f.write(JSON.pretty_generate(tempHash))
  end
end
To clarify, I need to add a second entry to the original file. I tried using append (<<), and that's where my code fails:
file = File.read("./books/book.json")
data_hash = JSON.parse(file)
newJson = data_hash << tempHash
How could I append a new key/value pair to existing JSON object in Ruby?
If you want to add it to an existing file then you should read the JSON first, extract data from it, then add a new hash to an array.
Maybe something like this will solve your problem:
def create_book(name, author)
  tempHash = {
    SecureRandom.uuid => {
      "name" => name,
      "author" => author
    }
  }
  # read the existing data; wrap a lone object in an array so entries can be appended
  data_from_json = JSON[File.read("./books/book.json")]
  data_from_json = [data_from_json] unless data_from_json.is_a?(Array)
  File.open("./books/book.json","w") do |f|
    f.write(JSON.pretty_generate(data_from_json << tempHash))
  end
end
There are also other ways, such as manipulating the JSON as a plain string, but for safety you should extract the data and then write a new JSON file.
If you need the new key/value pair to be in the same JSON element as the previous data, merge the hashes instead of shoveling (<<) them together.
Additionally, this lets you put the new key/value pair at the start or at the end of the element, by flipping which hash you merge into which.
So, take Maxim's solution from Apr 14 '15, but modify it to merge the two hashes together.
data_from_json = JSON[File.read("./books/book.json")]
File.open("./books/book.json","w") do |f|
  f.write(JSON.pretty_generate([data_from_json.merge(tempHash)]))
end
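To get exactly the shape shown in the question (one JSON object keyed by UUID, with no surrounding array), the parsed hash can be updated and written back as-is. This is a sketch under assumptions: add_book is a hypothetical name, the path is passed in rather than hard-coded, and a missing file is treated as an empty collection. A temporary directory is used so the example does not touch real files.

```ruby
require "json"
require "securerandom"
require "tmpdir"

# Hypothetical helper: read the existing object (or start empty),
# add one entry under a fresh UUID, and rewrite the whole file.
def add_book(path, name, author)
  books = File.exist?(path) ? JSON.parse(File.read(path)) : {}
  books[SecureRandom.uuid] = { "name" => name, "author" => author }
  File.write(path, JSON.pretty_generate(books))
  books
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "book.json")
  add_book(path, "Book1", "Author1")
  add_book(path, "Book2", "Author2")
  puts File.read(path) # one object with two UUID keys
end
```

Keeping the file a plain object means repeated calls keep working, since every read yields a Hash again. The append attempt in the question fails because Hash does not define <<.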

Delete nested hash according to key => value

I have this JSON response:
response = '{"librairies":[{"id":1,"books":[{"id":1,"qty":1},{"id":2,"qty":3}]},{"id":2,"books":[{"id":1,"qty":0},{"id":2,"qty":3}]}]}'
from which I'd like to delete every library where at least one of the book quantities is zero.
For instance, with this given response, I'd expect this return:
'{"librairies":[{"id":1,"books":[{"id":1,"qty":1},{"id":2,"qty":3}]}]}'
I've tried this:
parsed = JSON.parse(response)
parsed["librairies"].each do |library|
  library["books"].each do |book|
    parsed.delete(library) if book["qty"] == 0
  end
end
but this returns the exact same response hash, without having deleted the second library (the one with id => 2).
You can use Array#delete_if and Enumerable#any? for this
# Move through each array element with delete_if
parsed["librairies"].delete_if do |library|
  # evaluates to true if any book hash in the library
  # has a "qty" value of 0
  library["books"].any? { |book| book["qty"] == 0 }
end
Hope this helps
To avoid changing the hash parsed, you could do the following.
Firstly, let's format parsed so we can see what we're dealing with:
parsed = { "libraries"=>[ { "id"=>1,
                            "books"=>[ { "id"=>1, "qty"=>1 },
                                       { "id"=>2, "qty"=>3 } ]
                          },
                          { "id"=>2,
                            "books"=>[ { "id"=>1, "qty"=>0 },
                                       { "id"=>2, "qty"=>3 } ]
                          }
                        ]
         }
Later I want to show that parsed has not been changed when we create the new hash. An easy way of doing that is to compute a hash code on parsed before and after, and see if it changes. (While it's not 100% certain that different hashes won't have the same hash code, here it's not something to lose sleep over.)
parsed.hash
#=> 852445412783960729
We first need to make a "deep copy" of parsed so that changes to the copy will not affect parsed. One way of doing that is to use the Marshal module:
new_parsed = Marshal.load(Marshal.dump(parsed))
We can now modify the copy as required:
new_parsed["libraries"].reject! { |h| h["books"].any? { |g| g["qty"].zero? } }
#=> [ { "id"=>1,
#       "books"=>[ { "id"=>1, "qty"=>1 },
#                  { "id"=>2, "qty"=>3 }
#                ]
#     }
#   ]
new_parsed
#=> { "libraries"=>[ { "id"=>1,
#                      "books"=>[ { "id"=>1, "qty"=>1 },
#                                 { "id"=>2, "qty"=>3 }
#                               ]
#                    }
#                  ]
#   }
And we confirm the original hash was not changed:
parsed.hash
#=> 852445412783960729
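For completeness, the attempt in the question leaves the data unchanged because parsed.delete(library) calls Hash#delete, which looks for a key equal to the library hash; no such key exists, so nothing is removed. A non-mutating alternative that avoids the deep copy is sketched below, using the question's own JSON string (including its "librairies" spelling); filtered is a hypothetical variable name.

```ruby
require "json"

response = '{"librairies":[{"id":1,"books":[{"id":1,"qty":1},{"id":2,"qty":3}]},{"id":2,"books":[{"id":1,"qty":0},{"id":2,"qty":3}]}]}'
parsed = JSON.parse(response)

# Hash#merge returns a new hash and Array#reject returns a new array,
# so neither parsed nor its "librairies" array is modified.
filtered = parsed.merge(
  "librairies" => parsed["librairies"].reject do |library|
    library["books"].any? { |book| book["qty"].zero? }
  end
)

puts JSON.generate(filtered)
# prints {"librairies":[{"id":1,"books":[{"id":1,"qty":1},{"id":2,"qty":3}]}]}
```

The kept library hashes are shared between parsed and filtered rather than copied, so this is only safe if they are not mutated afterwards; the Marshal round trip remains the option when a fully independent copy is needed.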

Resources