Elegantly creating a hash from an array - ruby

I currently have some Ruby code that creates output like this (after conversion to JSON):
"days": [
{
"Jul-22": ""
},
{
"Aug-19": ""
}
],
What I want is output like this:
"days": {
"Jul-22": "",
"Aug-19": ""
},
Here is my code:
CalendarDay.in_the_past_30_days(patient).select(&:noteworthy?).collect do |noteworthy_day|
  { noteworthy_day.date.to_s(:trends_id) => "" }
end
In other words I want a hash instead of an array of hashes. Here's my ugly solution:
days = {}
CalendarDay.in_the_past_30_days(patient).select(&:noteworthy?).each do |noteworthy_day|
  days[noteworthy_day.date.to_s(:trends_id)] = ""
end
days
That seems very unrubylike, though. Can someone help me do this more efficiently?

Hash[
  CalendarDay.in_the_past_30_days(patient).select(&:noteworthy?).collect { |noteworthy_day|
    [noteworthy_day.date.to_s(:trends_id), ""]
  }
]
Or...
CalendarDay.in_the_past_30_days(patient).select(&:noteworthy?).each_with_object(Hash.new) { |noteworthy_day, ndays|
  ndays[noteworthy_day.date.to_s(:trends_id)] = ""
}

This is a problem tailor-made for Enumerable#inject:
CalendarDay.in_the_past_30_days(patient).select(&:noteworthy?).inject({}) do |hash, noteworthy_day|
  hash[noteworthy_day.date.to_s(:trends_id)] = ''
  hash
end
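If you are on Ruby 2.6 or newer, to_h with a block gives the same result even more directly. A minimal sketch, reusing the scope and the :trends_id date format from the question:

CalendarDay.in_the_past_30_days(patient).select(&:noteworthy?).to_h do |noteworthy_day|
  [noteworthy_day.date.to_s(:trends_id), ""]
end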

Related

DRY Strategy for looping over unknown levels of nested objects

My scenario is based on Gmail API.
I've learned that email messages can have their message parts deeply or shallowly nested based upon varying factors, but mostly the presence of attachments.
I'm using the Google API Ruby Client gem, so I'm not working with raw JSON; I'm getting objects with the same information, but I think the JSON representation makes my issue easier to understand.
A simple message JSON response looks like this (one parts array with 2 hashes inside it):
{
  "id": "175b418b1ff69896",
  "snippet": "COVID-19: Resources to help your business manage through uncertainty 20 Liters 500 PEOPLE FOUND YOU ON GOOGLE Here are the top search queries used to find you: 20 liters used by 146 people volunteer",
  "payload": {
    "parts": [
      {
        "mimeType": "text/plain",
        "body": {
          "data": "Hey, you found the body of the email! I want this!"
        }
      },
      {
        "mimeType": "text/html",
        "body": {
          "data": "<div>I actually don't want this</div>"
        }
      }
    ]
  }
}
The value I want is not that hard to get:
response.payload.parts.each do |part|
  @body_data = part.body.data if part.mime_type == 'text/plain'
end
BUT the JSON response of a more complex email message with attachments looks something like this (now parts nests itself 3 levels deep):
{
  "id": "175aee26de8209d2",
  "snippet": "snippet text...",
  "payload": {
    "parts": [
      {
        "mimeType": "multipart/related",
        "parts": [
          {
            "mimeType": "multipart/alternative",
            "parts": [
              {
                "mimeType": "text/plain",
                "body": {
                  "data": "hey, you found me! This is what I want!!"
                }
              },
              {
                "mimeType": "text/html",
                "body": {
                  "data": "<div>I actually don't want this one.</div>"
                }
              }
            ]
          },
          {
            "mimeType": "image/jpeg"
          },
          {
            "mimeType": "image/png"
          },
          {
            "mimeType": "image/png"
          },
          {
            "mimeType": "image/jpeg"
          },
          {
            "mimeType": "image/png"
          },
          {
            "mimeType": "image/png"
          }
        ]
      },
      {
        "mimeType": "application/pdf"
      }
    ]
  }
}
And looking at a few other messages, the object can vary from 1 to 5 levels (maybe more) of parts.
I need to loop over an unknown number of parts, then loop over an unknown number of nested parts, and repeat this until I reach the bottom, hopefully finding the thing I want.
Here's my best attempt:
def trim_response(response)
  # remove headers I don't care about
  response.payload.headers.keep_if { |header| @valuable_headers.include? header.name }
  # remove parts I don't care about
  response.payload.parts.each do |part|
    # parts can be nested within parts, within parts, within...
    if part.mime_type == @valuable_mime_part && part.body.present?
      @body_data = part.body.data
      break
    elsif part.parts.present?
      # there are more layers down
      find_body(part)
    end
  end
end

def find_body(part)
  part.parts.each do |sub_part|
    if sub_part.mime_type == @valuable_mime_part && sub_part.body.present?
      @body_data = sub_part.body.data
      break
    elsif sub_part.parts.present?
      # there are more layers down
      ######### THIS FEELS BAD!!! ###########
      find_body(sub_part)
    end
  end
end
Yep, there's a method calling itself. I know, that's why I'm here.
This does work, I've tested it on a few dozen messages, but... there has to be a better, DRY-er way to do this.
How do I recursively loop and then move down a level and loop again in a DRY fashion when I don't know how deep the nesting goes?
No need to go through all this pain. Just keep diving into the parts dictionary until you reach the first value that has no parts any more. At that point you have the final parts in your parts variable.
Code:
reponse = {"id" => "175aee26de8209d2","snippet" => "snippet text...","payload" => {"parts" => [{"mimeType" => "multipart/related","parts" => [{"mimeType" => "multipart/alternative","parts" => [{"mimeType" => "text/plain","body" => {"data" => "hey, you found me! This is what I want!!"}},{"mimeType" => "text/html","body" => {"data" => "<div>I actually don't want this one.</div>"}}]},{"mimeType" => "image/jpeg"}]},{"mimeType" => "application/pdf"}]}}
parts = reponse["payload"]
parts = (parts["parts"].send("first") || parts["parts"]) while parts["parts"]
data = parts["body"]["data"]
puts data
Output:
hey, you found me! This is what I want!!
You can compute the desired result using recursion.
def find_it(h, top_key, k1, k2, k3)
  return nil unless h.key?(top_key)
  recurse(h[top_key], k1, k2, k3)
end

def recurse(h, k1, k2, k3)
  return nil unless h.key?(k1)
  h[k1].each do |g|
    v = g.dig(k2, k3) || recurse(g, k1, k2, k3)
    return v unless v.nil?
  end
  nil
end
See Hash#dig.
Let h1 and h2 equal the two hashes given in the examples.[1] Then:
find_it(h1, :payload, :parts, :body, :data)
#=> "Hey, you found the body of the email! I want this!"
find_it(h2, :payload, :parts, :body, :data)
#=> "hey, you found me! This is what I want!!"
[1] The hash h2[:payload][:parts].last #=> { "mimeType": "application/pdf" } appears to contain hidden characters that are causing a problem. I therefore removed that hash from h2.
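For completeness, the asker's trim_response/find_body pair could also be collapsed into a single recursive helper that walks the Google API part objects directly. This is only a sketch, assuming the parts, mime_type and body accessors behave exactly as in the question's code; the method name and the wanted_mime parameter are made up for illustration:

def find_body_data(part, wanted_mime = 'text/plain')
  # leaf part with the MIME type we want?
  return part.body.data if part.mime_type == wanted_mime && part.body&.data
  # otherwise recurse into any nested parts (Array(nil) is just [])
  Array(part.parts).each do |sub_part|
    data = find_body_data(sub_part, wanted_mime)
    return data if data
  end
  nil
end

# Hypothetical usage: @body_data = find_body_data(response.payload)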

Logstash filter out values with null values for a key in a nested json array

I have quite an extensive Logstash pipeline ending in JSON like this:
{
  "keyA": 1,
  "keyB": "sample",
  "arrayKey": [
    {
      "key": "data"
    },
    {
      "key": null
    }
  ]
}
What I want to achieve is to filter "arrayKey" and remove the objects within it whose value for "key" is null.
I tried this with no luck:
filter {
  ruby {
    code => "
      event.get('arrayKey').each do |key|
        [key].delete_if do |keyCandidate|
          if [keyCandidate][key] != nil
            true
          end
        end
      end
    "
  }
}
This gives a "no implicit conversion of Hash into Integer" error. How do I achieve this? Is there an easier way to do this?
As Aleksei pointed out, you can create a copy of the array that does not contain entries where [key] is null using reject. You have to use event.set to overwrite the initial value of [arrayKey]. (The error in the original attempt most likely comes from [keyCandidate][key], which indexes a plain Ruby array with a hash rather than referencing a Logstash field.)
ruby {
  code => '
    a = event.get("arrayKey")
    if a
      event.set("arrayKey", a.reject { |x| x["key"] == nil })
    end
  '
}
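To see what the reject does outside of Logstash, here is a plain-Ruby check on sample data mirroring the question (the array literal is made up for illustration):

array_key = [{ 'key' => 'data' }, { 'key' => nil }]
array_key.reject { |x| x['key'].nil? }
#=> [{"key"=>"data"}]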

Iterate and search a JSON array for the element in the array

I have a JSON array that looks like this:
response = {
  "items"=>[
    {
      "tags"=>[
        "random"
      ],
      "timestamp"=>12345,
      "storage"=>{
        "url"=>"https://example.com/example",
        "key"=>"mykeys"
      },
      "envelope"=>{
      },
      "log-level"=>"info",
      "id"=>"random_id_test_1",
      "campaigns"=>[
      ],
      "user-variables"=>{
      },
      "flags"=>{
        "is-test-mode"=>false
      },
      "message"=>{
        "headers"=>{
          "to"=>"random@example.com",
          "message-id"=>"foobar@example.com",
          "from"=>"noreply@example.com",
          "subject"=>"new subject"
        },
        "attachments"=>[
        ],
        "recipients"=>[
          "result@example.com"
        ],
        "size"=>4444
      },
      "event"=>"stored"
    },
    {
      "tags"=>[
        "flowerPower"
      ],
      "timestamp"=>567890,
      "storage"=>{
        "url"=>"https://yahoo.com",
        "key"=>"some_really_cool_keys_go_here"
      },
      "envelope"=>{
      },
      "log-level"=>"info",
      "id"=>"some_really_cool_ids_go_here",
      "campaigns"=>[
      ],
      "user-variables"=>{
      },
      "flags"=>{
        "is-test-mode"=>false
      },
      "message"=>{
        "headers"=>{
          "to"=>"another_great@example.com",
          "message-id"=>"email_id@example.com",
          "from"=>"from@example.com",
          "subject"=>"email_looks_good"
        },
        "attachments"=>[
        ],
        "recipients"=>[
          "example@example.com"
        ],
        "size"=>2222
      },
      "event"=>"stored"
    }
  ]
}
I am trying to obtain the "storage" "url" based on the "to" email.
How do I iterate through this array, where x is just the index of the element in the array?
response['items'][x]["message"]["headers"]["to"]
Once I find the specific email that I need, it should stop and return the value of x, which is the element number.
I was going to use that value for x and call response['items'][x]['storage']['url'],
which will return the string for the URL.
I thought about doing this but there's gotta be a better way:
x = 0
user_email = 'another_great@example.com'
while user_email != response['items'][x]["message"]["headers"]["to"] do
  x += 1
  value = x
  puts value
end
target =
  response['items'].detect do |i|
    i['message']['headers']['to'] == 'another_great@example.com'
  end
then
target['storage']['url']
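On Ruby 2.3+ the lookup and the final read can also be combined with Hash#dig and the safe-navigation operator, so a missing match yields nil instead of raising. A small sketch along the same lines as the detect answer:

user_email = 'another_great@example.com'
item = response['items'].detect { |i| i.dig('message', 'headers', 'to') == user_email }
url  = item&.dig('storage', 'url')
#=> "https://yahoo.com"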
Another option is to build a Hash keyed by the "to" email and then fetch the required information from it, like this:
email_hash = Hash.new
response["items"].each do |i|
  email_hash[i["message"]["headers"]["to"]] = i
end
Now if you want to fetch "storage" "url" then simply do:
user_email = "another_great#example.com"
puts email_hash[user_email]["storage"]["url"] if email_hash[user_email]
#=> "https://yahoo.com"
You can use it as @Satoru suggested. As a suggestion, if your use case involves complex queries on JSON data (more complex than this), then you can store your data in MongoDB and elegantly query anything.

how can I iterate through this json document using ruby?

I have a ruby code block, as follows:
require "elasticsearch"
require "json"
search_term = "big data"
city = "Hong Kong"
client = Elasticsearch::Client.new log: true
r = client.search index: 'candidates', body: {
  query: {
    bool: {
      must: [
        {
          match: {
            tags: search_term
          }
        },
        {
          match: {
            city: city
          }
        }
      ]
    }
  }
}
It produces multiple returns like this one:
{"_index":"candidates","_type":"data",
"_id":"AU3DyAmvtewNSFHuYn88",
"_score":3.889237,
"_source":{"first":"Kota","last":"Okayama","city":"Tokyo","designation":"Systems Engineer","email":"user#hotmail.co.jp","phone":"phone","country":"Japan","industry":"Technology","tags":["remarks","virtualization big data"]}}
I want to iterate through it and extract various elements. I have tried
data = JSON.parse(r)
data.each do |row|
  puts row["_source"]["first"]
end
and the error is:
no implicit conversion of Hash into String (TypeError)
What's the best way forward on this chaps?
I have the solution, I hope it helps somebody else. It took me hours of fiddling and experimentation. Here it is:
require "elasticsearch"
require "json"
search_term = "big data"
city = "Tokyo"
client = Elasticsearch::Client.new log: true
h = client.search index: 'swiss_candidates', body: {
  query: {
    bool: {
      must: [
        {
          match: {
            tags: search_term
          }
        },
        {
          match: {
            city: city
          }
        }
      ]
    }
  }
}
data = JSON.parse(h.to_json)
data["hits"]["hits"].each do |r|
puts r["_id"]
puts r["_source"]["first"]
puts r["_source"]["tags"][1]
puts r["_source"]["screened"][0]
end
The important thing seems to be to convert the Elasticsearch result into something Ruby-friendly.
JSON.parse expects a String containing a JSON document, but you are passing it the Hash which was returned from client.search.
I'm not entirely sure what you are trying to achieve with that, or why you want to round-trip something that is already a Ruby Hash through JSON and back into a Ruby Hash.
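In other words, assuming the client returns a plain Hash with string keys (as older versions of the elasticsearch-ruby gem do), the hits should be readable directly, without the to_json / JSON.parse round trip. A sketch using the h from the answer above:

h["hits"]["hits"].each do |hit|
  puts hit["_id"]
  puts hit["_source"]["first"]
end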

Delete nested hash according to key => value

I have this hash:
response = '{"librairies":[{"id":1,"books":[{"id":1,"qty":1},{"id":2,"qty":3}]},{"id":2,"books":[{"id":1,"qty":0},{"id":2,"qty":3}]}]}'
in which I'd like to delete every library where at least one of the book quantities is null (zero).
For instance, with this given response, I'd expect this return:
'{"librairies":[{"id":1,"books":[{"id":1,"qty":1},{"id":2,"qty":3}]}]}'
I've tried this:
parsed = JSON.parse(response)
parsed["librairies"].each do |library|
library["books"].each do |book|
parsed.delete(library) if book["qty"] == 0
end
end
but this returns the exact same response hash, without having deleted the second library (the one with id => 2).
You can use Array#delete_if and Enumerable#any? for this
# Move through each array element with delete_if
parsed["librairies"].delete_if do |library|
  # evaluates to true if any book hash in the library
  # has a "qty" value of 0
  library["books"].any? { |book| book["qty"] == 0 }
end
Hope this helps
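Putting that together with the JSON string from the question, the whole round trip would look something like the sketch below; it reproduces the expected output shown in the question:

require 'json'

parsed = JSON.parse(response)
parsed["librairies"].delete_if do |library|
  library["books"].any? { |book| book["qty"] == 0 }
end
parsed.to_json
#=> '{"librairies":[{"id":1,"books":[{"id":1,"qty":1},{"id":2,"qty":3}]}]}'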
To avoid changing the hash parsed, you could do the following.
Firstly, let's format parsed so we can see what we're dealing with:
parsed = { "libraries"=>[ { "id"=>1,
"books"=>[ { "id"=>1, "qty"=>1 },
{ "id"=>2, "qty"=>3 } ]
},
{ "id"=>2,
"books"=>[ { "id"=>1, "qty"=>0 },
{ "id"=>2, "qty"=>3 } ]
}
]
}
Later I want to show that parsed has not been changed when we create the new hash. An easy way of doing that is to compute a hash code on parsed before and after, and see if it changes. (While it's not 100% certain that different hashes won't have the same hash code, here it's not something to lose sleep over.)
parsed.hash
#=> 852445412783960729
We first need to make a "deep copy" of parsed so that changes to the copy will not affect parsed. One way of doing that is to use the Marshal module:
new_parsed = Marshal.load(Marshal.dump(parsed))
We can now modify the copy as required:
new_parsed["libraries"].reject! { |h| h["books"].any? { |g| g["qty"].zero? } }
#=> [ { "id"=>1,
# "books"=>[ { "id"=>1, "qty"=>1 },
# { "id"=>2, "qty"=>3 }
# ]
# }
# ]
new_parsed # => { "libraries"=>[ { "id"=>1,
"books"=>[ { "id"=>1, "qty"=>1},
{ "id"=>2, "qty"=>3}
]
}
]
}
And we confirm the original hash was not changed:
parsed.hash
#=> 852445412783960729
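An alternative that avoids the deep copy altogether is to build the new hash with non-mutating methods. The inner book hashes are still shared with parsed, but nothing here mutates them, so parsed keeps its original value. A sketch on the same parsed hash:

# merge and reject both return new objects instead of modifying in place
new_parsed = parsed.merge(
  "libraries" => parsed["libraries"].reject { |h| h["books"].any? { |g| g["qty"].zero? } }
)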
