DRY Strategy for looping over unknown levels of nested objects - ruby

My scenario is based on Gmail API.
I've learned that email messages can have their message parts deeply or shallowly nested based upon varying factors, but mostly the presence of attachments.
I'm using the Google API Ruby Client gem, so I'm not working with JSON; I get objects that carry the same information. I think the JSON representation makes my issue easier to understand, though.
A simple message JSON response looks like this (one parts array with 2 hashes inside it):
{
"id": "175b418b1ff69896",
"snippet": "COVID-19: Resources to help your business manage through uncertainty 20 Liters 500 PEOPLE FOUND YOU ON GOOGLE Here are the top search queries used to find you: 20 liters used by 146 people volunteer",
"payload": {
"parts": [
{
"mimeType": "text/plain",
"body": {
"data": "Hey, you found the body of the email! I want this!"
}
},
{
"mimeType": "text/html",
"body": {
"data": "<div>I actually don't want this</div>"
}
}
]
}
}
The value I want is not that hard to get:
response.payload.parts.each do |part|
@body_data = part.body.data if part.mime_type == 'text/plain'
end
But the JSON response of a more complex email message with attachments looks something like this (now parts nests itself 3 levels deep):
{
"id": "175aee26de8209d2",
"snippet": "snippet text...",
"payload": {
"parts": [
{
"mimeType": "multipart/related",
"parts": [
{
"mimeType": "multipart/alternative",
"parts": [
{
"mimeType": "text/plain",
"body": {
"data": "hey, you found me! This is what I want!!"
}
},
{
"mimeType": "text/html",
"body": {
"data": "<div>I actually don't want this one.</div>"
}
}
]
},
{
"mimeType": "image/jpeg"
},
{
"mimeType": "image/png"
},
{
"mimeType": "image/png"
},
{
"mimeType": "image/jpeg"
},
{
"mimeType": "image/png"
},
{
"mimeType": "image/png"
}
]
},
{
"mimeType": "application/pdf"
}
]
}
}
And looking at a few other messages, the object can vary from 1 to 5 levels (maybe more) of parts.
I need to loop over an unknown number of parts, then loop over an unknown number of nested parts, and then repeat this until I reach the bottom, hopefully finding the thing I want.
Here's my best attempt:
def trim_response(response)
# remove headers I don't care about
response.payload.headers.keep_if { |header| @valuable_headers.include? header.name }
# remove parts I don't care about
response.payload.parts.each do |part|
# parts can be nested within parts, within parts, within...
if part.mime_type == @valuable_mime_part && part.body.present?
@body_data = part.body.data
break
elsif part.parts.present?
# there are more layers down
find_body(part)
end
end
end
def find_body(part)
part.parts.each do |sub_part|
if sub_part.mime_type == @valuable_mime_part && sub_part.body.present?
@body_data = sub_part.body.data
break
elsif sub_part.parts.present?
# there are more layers down
######### THIS FEELS BAD!!! ###########
find_body(sub_part)
end
end
end
Yep, there's a method calling itself. I know, that's why I'm here.
This does work, I've tested it on a few dozen messages, but... there has to be a better, DRY-er way to do this.
How do I recursively loop and then move down a level and loop again in a DRY fashion when I don't know how deep the nesting goes?

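No need to go through all this pain. Just keep descending into the parts array, taking the first element each time, until you reach a part that has no parts key of its own. At that point the parts variable holds the innermost part. Note that this assumes the part you want is always the first element at every level.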
Code:
response = {"id" => "175aee26de8209d2","snippet" => "snippet text...","payload" => {"parts" => [{"mimeType" => "multipart/related","parts" => [{"mimeType" => "multipart/alternative","parts" => [{"mimeType" => "text/plain","body" => {"data" => "hey, you found me! This is what I want!!"}},{"mimeType" => "text/html","body" => {"data" => "<div>I actually don't want this one.</div>"}}]},{"mimeType" => "image/jpeg"}]},{"mimeType" => "application/pdf"}]}}
parts = response["payload"]
parts = (parts["parts"].first || parts["parts"]) while parts["parts"]
data = parts["body"]["data"]
puts data
Output:
hey, you found me! This is what I want!!
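If the text/plain part is not guaranteed to come first, the same descent can check mimeType at every level. The following is only a sketch under that assumption: first_plain_text is a hypothetical helper name, and it walks the parsed-hash form of the response with an explicit stack instead of recursion.

```ruby
# Depth-first search over nested "parts" arrays using an explicit stack.
# Returns the body data of the first part whose mimeType matches, or nil.
def first_plain_text(payload, wanted = 'text/plain')
  stack = [payload]
  until stack.empty?
    node = stack.pop
    data = node.dig('body', 'data')
    return data if node['mimeType'] == wanted && data

    # Push children in reverse so they are visited in their original order.
    stack.concat((node['parts'] || []).reverse)
  end
  nil
end

payload = {
  'parts' => [
    { 'mimeType' => 'multipart/alternative',
      'parts' => [
        { 'mimeType' => 'text/html',  'body' => { 'data' => '<div>no</div>' } },
        { 'mimeType' => 'text/plain', 'body' => { 'data' => 'yes' } }
      ] },
    { 'mimeType' => 'application/pdf' }
  ]
}

puts first_plain_text(payload)
```

Here the HTML part deliberately comes first, so the lookup only succeeds because each node's mimeType is compared before its data is returned.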

You can compute the desired result using recursion.
def find_it(h, top_key, k1, k2, k3)
return nil unless h.key?(top_key)
recurse(h[top_key], k1, k2, k3)
end
def recurse(h, k1, k2, k3)
return nil unless h.key?(k1)
h[k1].each do |g|
v = g.dig(k2,k3) || recurse(g, k1 , k2, k3)
return v unless v.nil?
end
nil
end
See Hash#dig.
Let h1 and h2 be the two hashes given in the question (see note 1). Then:
find_it(h1, :payload, :parts, :body, :data)
#=> "Hey, you found the body of the email! I want this!"
find_it(h2, :payload, :parts, :body, :data)
#=> "hey, you found me! This is what I want!!"
1. The hash h[:payload][:parts].last #=> { "mimeType": "application/pdf" } appears to contain hidden characters that are causing a problem. I therefore removed that hash from h2.
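The same recursion can be written against objects rather than hashes, which is closer to what the question's code works with. This is a minimal sketch, assuming the payload responds to mime_type, body and parts just like its nested parts do; Part and Body are hypothetical Structs standing in for the gem's classes, and find_body_data is an illustrative name.

```ruby
# Hypothetical stand-ins for the client library's objects; only the three
# readers used below are modelled.
Body = Struct.new(:data)
Part = Struct.new(:mime_type, :body, :parts)

# Returns the data of the first part with the wanted MIME type, searching
# depth-first through any number of nested parts. Returns nil if none match.
def find_body_data(part, wanted = 'text/plain')
  data = part.body&.data
  return data if part.mime_type == wanted && data

  (part.parts || []).each do |sub_part|
    found = find_body_data(sub_part, wanted)
    return found if found
  end
  nil
end

payload = Part.new('multipart/mixed', nil, [
  Part.new('multipart/related', nil, [
    Part.new('multipart/alternative', nil, [
      Part.new('text/html', Body.new('<div>not this</div>'), nil),
      Part.new('text/plain', Body.new('plain text body'), nil)
    ]),
    Part.new('image/jpeg', nil, nil)
  ]),
  Part.new('application/pdf', nil, nil)
])

puts find_body_data(payload)
```

Because the top-level payload is treated as just another part, one method covers every depth, and returning the value (rather than assigning an instance variable and calling break) stops the search as soon as a match is found.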

Related

How can I iterate over an array of hashes and form new one

I have a call to the Companies House API, and the response I get back contains an array of hashes.
companies = {
"total_results" => 2,
"items" => [{
"title" => "First company",
"date_of_creation" => "2016-11-09",
"company_type" => "ltd",
"company_number" => "10471071323",
"company_status" => "active"
},
{
"title" => "Second company",
"date_of_creation" => "2016-11-09",
"company_type" => "ltd",
"company_number" => "1047107132",
"company_status" => "active"
}]
}
How can I iterate over companies and get a result similar to:
[{
title: "First company",
company_number: "10471071323"
},
{
title: "Second company",
company_number: "1047107132"
}]
You can use map which will iterate through the elements in an array and return a new array:
companies["items"].map do |c|
{
title: c['title'],
company_number: c['company_number']
}
end
=> [
{:title=>"First company", :company_number=>"10471071323"},
{:title=>"Second company", :company_number=>"1047107132"}
]
companies["items"].map { |company| company.slice('title', 'company_number').symbolize_keys }
This should do the trick.
If you're not using Rails (or, more specifically, ActiveSupport), then symbolize_keys won't be available. In this case, you'd have to go for a more standard-Ruby approach:
companies["items"].map do |company|
{ title: company["title"], company_number: company["company_number"] }
end
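For completeness, plain Ruby (2.5 or later) can do the slice-and-symbolize step without ActiveSupport, because Hash#slice and Hash#transform_keys are both in core. A small sketch using a trimmed copy of the data from the question; the summaries variable name is only illustrative:

```ruby
companies = {
  'total_results' => 2,
  'items' => [
    { 'title' => 'First company',  'company_type' => 'ltd', 'company_number' => '10471071323' },
    { 'title' => 'Second company', 'company_type' => 'ltd', 'company_number' => '1047107132' }
  ]
}

# Keep only the wanted keys, then convert the string keys to symbols.
summaries = companies['items'].map do |company|
  company.slice('title', 'company_number').transform_keys(&:to_sym)
end

p summaries
```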
The answers are totally correct; but you should be made aware that what you’re looking at from companies house is not just an array of hashes - it’s a valid JsonApi response.
You might find your job easier if you’re using a gem which is aware of JsonApi specs, or if you’re just approaching it as that kind of data.
Have a look at the ruby implementations of https://jsonapi.org/implementations/
Or ActiveModelSerializer for ways to not only reform your hashes but deserialise this very structured data into ruby objects.
But like I say, if all you’re looking for is a quick way to reform the data as you describe, the above answers are perfect.

How to query key values from a hash of arrays of hashes

I have a JSONB payload in my database. This payload is from a GraphQL query of the shopify_api.
For the shop_order below, I am trying to query for the name of the fourth order in the node.
shop_order = {"data":{"orders":{"edges":[{"node":{"id":"gid://shopify/Order/2228134674512","name":"#1001","createdAt":"2020-05-01T18:46:04Z","shippingAddress":{"address1":"1234 Long Avenue, 2N","address2":"","city":"Chicago","province":"Illinois","provinceCode":"IL","zip":"55555"}}},{"node":{"id":"gid://shopify/Order/2239643451472","name":"#1002","createdAt":"2020-05-05T14:40:36Z","shippingAddress":{"address1":"1234 Long Avenue","address2":"2N","city":"Chicago","province":"Illinois","provinceCode":"IL","zip":"55555"}}},{"node":{"id":"gid://shopify/Order/2239950323792","name":"#1003","createdAt":"2020-05-05T16:35:38Z","shippingAddress":{"address1":"1234 Long Avenue","address2":"2N","city":"Chicago","province":"Illinois","provinceCode":"IL","zip":"55555"}}},{"node":{"id":"gid://shopify/Order/2239959105616","name":"#1004","createdAt":"2020-05-05T16:38:27Z","shippingAddress":{"address1":"1234 Long Avenue","address2":"2N","city":"Chicago","province":"Illinois","provinceCode":"IL","zip":"55555"}}}]}},"casted_data":{},"errors":[]}
order = shop_order[:data][:orders][:edges][3]
puts order
response > {:node=>{:id=>"gid://shopify/Order/2239959105616", :name=>"#1004", :createdAt=>"2020-05-05T16:38:27Z", :shippingAddress=>{:address1=>"1234 Long Avenue", :address2=>"2N", :city=>"Chicago", :province=>"Illinois", :provinceCode=>"IL", :zip=>"55555"}}}
order_to_a = shop_order[:data][:orders][:edges][3].to_a
puts order_to_a
response > node
{:id=>"gid://shopify/Order/2239959105616", :name=>"#1004", :createdAt=>"2020-05-05T16:38:27Z", :shippingAddress=>{:address1=>"1234 Long Avenue", :address2=>"2N", :city=>"Chicago", :province=>"Illinois", :provinceCode=>"IL", :zip=>"55555"}}
How do I query and display a specific value from a key that is inside a node?
It's not entirely clear what your intent is, but your access of elements in a hash can be streamlined using dig:
shop_order = {
"data": {
"orders": {
"edges": [
{}, {}, {}, {
"node": {
"name": '#1004',
"shippingAddress": {
"zip": '55555'
}
}
}
]
}
}
}
Access data using:
order = shop_order.dig(:data, :orders, :edges)[3]
# => {:node=>{:name=>"#1004", :shippingAddress=>{:zip=>"55555"}}}
or:
order = shop_order.dig(:data, :orders, :edges, 3)
# => {:node=>{:name=>"#1004", :shippingAddress=>{:zip=>"55555"}}}
How do I query and display a specific value from a key that is inside a node?
Huh? If you want information inside order, do the same sort of thing:
order.dig(:node, :name) # => "#1004"
order.dig(:node, :shippingAddress, :zip) # => "55555"
or:
shop_order.dig(:data, :orders, :edges, 3, :node, :name) # => "#1004"
shop_order.dig(:data, :orders, :edges, 3, :node, :shippingAddress, :zip) # => "55555"
Many times when we're walking through a complex hash of arrays we point to the array in a variable and then work from that point. It's similar to putting your finger on a page in a recipe, so we can go back to it quickly. We do the same when parsing HTML/XML, parsed JSON and YAML, etc.
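As a sketch of that "finger on the page" idea, the edges array can be captured once and then mapped over to pull the same value out of every node. The data below is a shortened, hypothetical version of the payload in the question (symbol keys, two orders with made-up zip codes):

```ruby
shop_order = {
  data: {
    orders: {
      edges: [
        { node: { name: '#1001', shippingAddress: { zip: '55555' } } },
        { node: { name: '#1002', shippingAddress: { zip: '55556' } } }
      ]
    }
  }
}

# Keep a reference to the array we care about; fall back to [] if the path is missing.
edges = shop_order.dig(:data, :orders, :edges) || []

# Pull one value out of every node.
names = edges.map { |edge| edge.dig(:node, :name) }

# Or build a lookup from order name to zip code.
zips_by_name = edges.map { |edge| [edge.dig(:node, :name), edge.dig(:node, :shippingAddress, :zip)] }.to_h

p names
p zips_by_name
```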

I want to convert a single json event into multiple events, through logstash Hope to get some inspiration, thanks

Four fields (warnTags, warnSlrs, warnActions, denyMsg) need to be split on the semicolon (;).
Raw String
{ "waf": {
"warnTags": "OWASP_CRS/WEB_ATTACK/SQL_INJECTION;OWASP_CRS/WEB_ATTACK/XSS;OWASP_CRS/WEB_ATTACK/XSS;OWASP_CRS/WEB_ATTACK/XSS;OWASP_CRS/WEB_ATTACK/SPECIAL_CHARS;OWASP_CRS/WEB_ATTACK/SQL_INJECTION",
"policy": "bot_77598",
"warnSlrs": "ARGS:wvstest;ARGS:wvstest;ARGS:wvstest;ARGS:wvstest;ARGS:wvstest;ARGS:wvstest",
"riskTuples": ":-973305-973333-973335",
"warnActions": "2;2;2;2;2;2",
"denyActions": "3",
"warnMsg": "SQL Injection Attack;XSS Attack Detected;IE XSS Filters - Attack Detected;IE XSS Filters - Attack Detected;Restricted SQL Character Anomaly Detection Alert - Total # of special characters exceeded;Classic SQL Injection Probes 1/2",
"riskGroups": ":XSS-ANOMALY",
"warnRules": "950901;973305;973333;973335;981173;981242",
"denyMsg": "Anomaly Score Exceeded for Cross-Site Scripting",
"ver": "2.0",
"denyData": "VmVjdG9yIFNjb3JlOiBx",
"riskScores": ":-5-5-2",
"warnData": "eHNzdGFnPigpbG9jeHNz;amF2YXNYcm"
} }
Expected Output Result
{
"waf": {
"warnTags": "OWASP_CRS/WEB_ATTACK/SQL_INJECTION",
"policy": "bot_77598",
"warnSlrs": "ARGS:wvstest",
"riskTuples": ":-973305-973333-973335",
"warnActions": "2",
"denyActions": "3",
"warnMsg": "SQL Injection Attack",
"riskGroups": ":XSS-ANOMALY",
"warnRules": "950901",
"denyMsg": "Anomaly Score Exceeded for Cross-Site Scripting",
"ver": "2.0",
"denyData": "VmVjdG9yIFNjb3JlOiBx",
"riskScores": ":-5-5-2",
"warnData": "eHNzdGFnPigpbG9jeHNz;amF2YXNYcm"
}
}
{
"waf": {
"warnTags": "OWASP_CRS/WEB_ATTACK/XSS",
"policy": "bot_77598",
"warnSlrs": "ARGS:wvstest",
"riskTuples": ":-973305-973333-973335",
"warnActions": "2",
"denyActions": "3",
"warnMsg": "XSS Attack Detected",
"riskGroups": ":XSS-ANOMALY",
"warnRules": "973305",
"denyMsg": "Anomaly Score Exceeded for Cross-Site Scripting",
"ver": "2.0",
"denyData": "VmVjdG9yIFNjb3JlOiBx",
"riskScores": ":-5-5-2",
"warnData": "eHNzdGFnPigpbG9jeHNz;amF2YXNYcm"
}
}
filter {
ruby {
code => "
@info = []
events = event.to_hash
@warnTags = events['waf']['warnTags'].split(';')
@warnMsgs = events['waf']['warnMsg'].split(';')
@warnActions = events['waf']['warnActions'].split(';')
@warnRules = events['waf']['warnRules'].split(';')
@list = @warnTags.zip( @warnMsgs, @warnActions, @warnRules )
@list.each do |tag, msg, action, rule|
detail = {
'tag' => tag,
'msg' => msg,
'action' => action,
'rule' => rule
}
@info.push(detail)
end
event.remove('[waf][warnTags]')
event.remove('[waf][warnMsg]')
event.remove('[waf][warnActions]')
event.remove('[waf][warnRules]')
event.set('[waf][info]', @info)
"
}
split {
field => "[waf][info]"
}}
The config below should be along the lines of what you need. It parses the message as JSON at the outset, which you may not need depending on earlier steps in your pipeline. The mutate filter first splits the warnTags string on ;, which turns warnTags into an array inside the single event. That array is then passed to the higher-level split filter, which creates one output event per element of the given field, in this case warnTags again. Hope this helps!
[EDIT: Added warnSlrs as second split field]
filter {
json {
source => "message"
}
mutate {
split => {"[waf][warnTags]" => ";"}
}
mutate {
split => {"[waf][warnSlrs]" => ";"}
}
split {
field => "[waf][warnTags]"
}
split {
field => "[waf][warnSlrs]"
}
}
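To make the expected output in the question concrete, here is the row-by-row split written as plain Ruby, outside Logstash. It is only a sketch of the logic, not a Logstash configuration: split_waf and SPLIT_FIELDS are hypothetical names, and the choice of which fields to split is an assumption based on the expected output shown in the question.

```ruby
# Fields whose semicolon-separated values are assumed to line up position by position.
SPLIT_FIELDS = %w[warnTags warnSlrs warnActions warnMsg warnRules].freeze

# Turns one "waf" hash into an array of hashes, one per position in the
# split fields. All other keys are copied unchanged into every result.
def split_waf(waf, fields = SPLIT_FIELDS, sep = ';')
  columns = fields.map { |field| waf.fetch(field, '').split(sep) }
  count = columns.map(&:length).max || 0

  (0...count).map do |i|
    per_event = fields.zip(columns).map { |field, values| [field, values[i]] }.to_h
    waf.merge(per_event)
  end
end

waf = {
  'warnTags' => 'SQL_INJECTION;XSS',
  'policy' => 'bot_77598',
  'warnSlrs' => 'ARGS:wvstest;ARGS:wvstest',
  'warnActions' => '2;2',
  'warnMsg' => 'SQL Injection Attack;XSS Attack Detected',
  'warnRules' => '950901;973305'
}

split_waf(waf).each { |event| p event }
```

Pairing values by position gives one event per warning, whereas splitting each field independently would pair every tag with every other field's values.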

How to use a string description to access data from a hash-within-hash structure?

I have the following:
data_spec['data'] = "some.awesome.values"
data_path = ""
data_spec['data'].split('.').each do |level|
data_path = "#{data_path}['#{level}']"
end
data = "site.data#{data_path}"
At this point, data equals a string: "site.data['some']['awesome']['values']"
What I need help with is using the string to get the value of: site.data['some']['awesome']['values']
site.data has the following value:
{
"some" => {
"awesome" => {
"values" => [
{
"things" => "Stuff",
"stuff" => "Things",
},
{
"more_things" => "More Stuff",
"more_stuff" => "More Things",
}
]
}
}
}
Any help is greatly appreciated. Thanks!
You could do as tadman suggested and use site.data.dig('some', 'awesome', 'values') if you are using Ruby 2.3.0 or later (which is awesome and I didn't even know existed). This is probably your best choice. But if you really want to write the code yourself, read below.
You were on the right track, the best way to do this is:
data_spec['data'] = "some.awesome.values"
data = nil
data_spec['data'].split('.').each do |level|
if data.nil?
data = site.data[level]
else
data = data[level]
end
end
To understand why this works, first you need to understand that site.data['some']['awesome']['values'] is the same as saying: first get some, then inside that get awesome, then inside that get values. So our first step is retrieving some. Since we don't have that first level yet, we get it from site.data and save it to the variable data. Once we have that, we get each subsequent level from data and save it back to data, which takes us deeper and deeper into the hash.
So using your example, data would initially look like this:
{"awesome" => {
"values" => [
{
"things" => "Stuff",
"stuff" => "Things",
},
{
"more_things" => "More Stuff",
"more_stuff" => "More Things",
}
]
}
}
Then this:
{"values" => [
{
"things" => "Stuff",
"stuff" => "Things",
},
{
"more_things" => "More Stuff",
"more_stuff" => "More Things",
}
]
}
and finally output like this:
[
{
"things" => "Stuff",
"stuff" => "Things",
},
{
"more_things" => "More Stuff",
"more_stuff" => "More Things",
}
]
If you're receiving a string like 'x.y.z' and need to navigate a nested hash, Ruby 2.3.0 includes the dig method:
spec = "some.awesome.values"
data = {
"some" => {
"awesome" => {
"values" => [
'a','b','c'
]
}
}
}
data.dig(*spec.split('.'))
# => ["a", "b", "c"]
If you don't have Ruby 2.3.0 and upgrading isn't an option you can just patch it in for now:
class Hash
def dig(*path)
path.inject(self) do |location, key|
location.respond_to?(:keys) ? location[key] : nil
end
end
end
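One thing dig(*spec.split('.')) cannot do is step into an array, because every segment of the split string is still a String and arrays need Integer indexes. A minimal sketch of a helper that handles both follows; dig_path is a hypothetical name, and the rule that all-digit segments index into arrays is an assumption made for this example.

```ruby
# Walks a nested Hash/Array structure following a dotted path such as
# "some.awesome.values.0.things". Returns nil as soon as a step is missing.
def dig_path(data, path, sep = '.')
  path.split(sep).reduce(data) do |current, key|
    case current
    when Hash  then current[key]
    when Array then key.match?(/\A\d+\z/) ? current[key.to_i] : nil
    else return nil # ran into nil or a scalar before the path was used up
    end
  end
end

data = {
  'some' => {
    'awesome' => {
      'values' => [
        { 'things' => 'Stuff', 'stuff' => 'Things' },
        { 'more_things' => 'More Stuff', 'more_stuff' => 'More Things' }
      ]
    }
  }
}

puts dig_path(data, 'some.awesome.values.0.things')
puts dig_path(data, 'some.awesome.values.1.more_stuff')
```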
I wrote something that does exactly this. Feel free to take any information of value from it or steal it! :)
https://github.com/keithrbennett/trick_bag/blob/master/lib/trick_bag/collections/collection_access.rb
Check out the unit tests to see how to use it:
https://github.com/keithrbennett/trick_bag/blob/master/spec/trick_bag/collections/collection_access_spec.rb
There's an accessor method that returns a lambda. Since lambdas can be called using the [] operator (method, really), you can get such a lambda and access arbitrary numbers of levels:
accessor['hostname.ip_addresses.0']
or, in your case:
require 'trick_bag'
accessor = TrickBag::CollectionsAccess.accessor(site.data)
do_something_with(accessor['some.awesome.values'])
What you are looking for is something generally looked down upon and for good reasons. But here you go - it's called eval:
binding.eval data

Iterate and search a JSON array for the element in the array

I have a JSON array that looks like this:
response = {
"items"=>[
{
"tags"=>[
"random"
],
"timestamp"=>12345,
"storage"=>{
"url"=>"https://example.com/example",
"key"=>"mykeys"
},
"envelope"=>{
},
"log-level"=>"info",
"id"=>"random_id_test_1",
"campaigns"=>[
],
"user-variables"=>{
},
"flags"=>{
"is-test-mode"=>false
},
"message"=>{
"headers"=>{
"to"=>"random@example.com",
"message-id"=>"foobar@example.com",
"from"=>"noreply@example.com",
"subject"=>"new subject"
},
"attachments"=>[
],
"recipients"=>[
"result@example.com"
],
"size"=>4444
},
"event"=>"stored"
},
{
"tags"=>[
"flowerPower"
],
"timestamp"=>567890,
"storage"=>{
"url"=>"https://yahoo.com",
"key"=>"some_really_cool_keys_go_here"
},
"envelope"=>{
},
"log-level"=>"info",
"id"=>"some_really_cool_ids_go_here",
"campaigns"=>[
],
"user-variables"=>{
},
"flags"=>{
"is-test-mode"=>false
},
"message"=>{
"headers"=>{
"to"=>"another_great@example.com",
"message-id"=>"email_id@example.com",
"from"=>"from@example.com",
"subject"=>"email_looks_good"
},
"attachments"=>[
],
"recipients"=>[
"example@example.com"
],
"size"=>2222
},
"event"=>"stored"
}]
}
I am trying to obtain the "storage" "url" based on the "to" email.
How do I iterate through this array, where x is the index of an element in the array?
response['items'][x]["message"]["headers"]["to"]
Once I find the specific email that I need, the loop should stop and return the value of x, which is the index of that element.
I was going to use that value for x and call response['items'][x]['storage']['url']
which will return the string for the URL.
I thought about doing this but there's gotta be a better way:
x = 0
user_email = "another_great@example.com"
while user_email != response['items'][x]["message"]["headers"]["to"] do
x+=1
value = x
puts value
end
target =
response['items'].detect do |i|
i['message']['headers']['to'] == 'another_great@example.com'
end
then
target['storage']['url']
Another option is to build a Hash keyed by each item's to address, and then fetch the required information from it like this:
email_hash = Hash.new
response["items"].each do |i|
email_hash[i["message"]["headers"]["to"]] = i
end
Now if you want to fetch "storage" "url" then simply do:
user_email = "another_great@example.com"
puts email_hash[user_email]["storage"]["url"] if email_hash[user_email]
#=> "https://yahoo.com"
You can use it as @Satoru suggested. As a suggestion, if your use case involves complex queries on JSON data (more complex than this), then you can store your data in MongoDB and query it elegantly.
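For a one-off lookup, both the URL and the index the question asks about can be found without a manual counter. A small sketch on a trimmed copy of the data; storage_url_for and index_for are hypothetical helper names, and dig is used so that items lacking a message or headers key are skipped rather than raising:

```ruby
response = {
  'items' => [
    { 'storage' => { 'url' => 'https://example.com/example' },
      'message' => { 'headers' => { 'to' => 'random@example.com' } } },
    { 'storage' => { 'url' => 'https://yahoo.com' },
      'message' => { 'headers' => { 'to' => 'another_great@example.com' } } }
  ]
}

# Returns the storage URL of the first item addressed to email, or nil.
def storage_url_for(response, email)
  item = response.fetch('items', []).find do |i|
    i.dig('message', 'headers', 'to') == email
  end
  item&.dig('storage', 'url')
end

# Returns the position of that item in the array, or nil if there is none.
def index_for(response, email)
  response.fetch('items', []).index do |i|
    i.dig('message', 'headers', 'to') == email
  end
end

puts storage_url_for(response, 'another_great@example.com')
puts index_for(response, 'another_great@example.com')
```

Unlike the while loop in the question, both helpers return nil when no item matches instead of running past the end of the array.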
