Ruby: collect a target key's value into array from nested hash - ruby

I have a file like this:
$urls = [
{name:'Venture Capitals',
sites: [
'http://blog.ycombinator.com/posts.atom',
'http://themacro.com/feed.xml',
'http://a16z.com/feed/',
'http://firstround.com/review/feed.xml',
'http://www.kpcb.com/blog.rss',
'https://library.gv.com/feed',
'http://theaccelblog.squarespace.com/blog?format=RSS',
'https://medium.com/feed/accel-insights',
'http://500.co/blog/posts/feed/',
'http://feeds.feedburner.com/upfrontinsights?format=xml',
'http://versionone.vc/feed/',
'http://nextviewventures.com/blog/feed/',
]},
{name:'Companies and Groups',
sites: [
{name:'Product Companies',
sites: [
'https://m.signalvnoise.com/feed',
'http://feeds.feedburner.com/insideintercom',
'http://www.kickstarter.com/blog.atom',
'http://blog.invisionapp.com/feed/',
'http://feeds.feedburner.com/bufferapp',
'https://open.buffer.com/feed/',
'https://blog.asana.com/feed/',
'http://blog.drift.com/rss.xml',
'https://www.groovehq.com/blog/feed',]},
{name:'Consulting Groups, Studios',
sites: [
'http://svpg.com/articles/rss',
'http://www.thoughtworks.com/rss/insights.xml',
'http://zurb.com/blog/rss',]},
{name:'Communities',
sites: [
'http://alistapart.com/main/feed',
'https://www.mindtheproduct.com/feed/',]},
]},
]
I have organized $urls into different groups. Now I want to extract all the URLs (the links under sites). How should I do that?
The main problem is that there are sites nested within sites, as the file above shows.
My questions are:
Is this a suitable structure for storing these links (arrays within arrays)? If not, what would be a good way to store and group them?
How can I extract all the URLs into a flattened array, so I can later iterate through the list?
I can do this pretty manually, like the code shown below.
sites = []
$urls.each do |item|
item[:sites].each do |sub_item|
if sub_item.is_a?(Hash)
sites.concat sub_item[:sites]
else
sites.append sub_item
end
end
end
File.open('lib/flatten_sites.yaml', 'w') { |fo| fo.puts sites.to_yaml }
But I just feel this is bad code.
An alternative in this specific case is to collect every sites attribute, but that also feels very constrained and may not help in other cases.

If you have a Hash, you can use this recursive method:
Input
urls = [
{
:name => 'Venture Capitals',
:sites => [
'http://blog.ycombinator.com/posts.atom',
'http://themacro.com/feed.xml',
'http://a16z.com/feed/',
'http://firstround.com/review/feed.xml',
'http://www.kpcb.com/blog.rss',
'https://library.gv.com/feed',
'http://theaccelblog.squarespace.com/blog?format=RSS',
'https://medium.com/feed/accel-insights',
'http://500.co/blog/posts/feed/',
'http://feeds.feedburner.com/upfrontinsights?format=xml',
'http://versionone.vc/feed/',
'http://nextviewventures.com/blog/feed/',
]
},
{
:name => 'Companies and Groups',
:sites => [
{
:name => 'Product Companies',
:sites => [
'https://m.signalvnoise.com/feed',
'http://feeds.feedburner.com/insideintercom',
'http://www.kickstarter.com/blog.atom',
'http://blog.invisionapp.com/feed/',
'http://feeds.feedburner.com/bufferapp',
'https://open.buffer.com/feed/',
'https://blog.asana.com/feed/',
'http://blog.drift.com/rss.xml',
'https://www.groovehq.com/blog/feed',]
},
{
:name => 'Consulting Groups, Studios',
:sites => [
'http://svpg.com/articles/rss',
'http://www.thoughtworks.com/rss/insights.xml',
'http://zurb.com/blog/rss',]
},
{
:name => 'Communities',
:sites => [
'http://alistapart.com/main/feed',
'https://www.mindtheproduct.com/feed/',]
}
]
}
]
Method
def get_all_sites(data)
data[:sites].map { |r| Hash === r ? get_all_sites(r) : r }
end
urls.map { |r| get_all_sites(r) }.flatten
Output
[
"http://blog.ycombinator.com/posts.atom",
"http://themacro.com/feed.xml",
"http://a16z.com/feed/",
"http://firstround.com/review/feed.xml",
"http://www.kpcb.com/blog.rss",
"https://library.gv.com/feed",
"http://theaccelblog.squarespace.com/blog?format=RSS",
"https://medium.com/feed/accel-insights",
"http://500.co/blog/posts/feed/",
"http://feeds.feedburner.com/upfrontinsights?format=xml",
"http://versionone.vc/feed/",
"http://nextviewventures.com/blog/feed/",
"https://m.signalvnoise.com/feed",
"http://feeds.feedburner.com/insideintercom",
"http://www.kickstarter.com/blog.atom",
"http://blog.invisionapp.com/feed/",
"http://feeds.feedburner.com/bufferapp",
"https://open.buffer.com/feed/",
"https://blog.asana.com/feed/",
"http://blog.drift.com/rss.xml",
"https://www.groovehq.com/blog/feed",
"http://svpg.com/articles/rss",
"http://www.thoughtworks.com/rss/insights.xml",
"http://zurb.com/blog/rss",
"http://alistapart.com/main/feed",
"https://www.mindtheproduct.com/feed/"
]
I hope this helps

This solution is similar to what Lukas Baliak proposed, but it uses a more suitable Proc instead of a redundant method (it works for any depth of nesting):
deep_map = ->(data) do
data[:sites].flat_map { |r| r.is_a?(String) ? r : deep_map.(r) }
end
urls.flat_map(&deep_map)
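Both answers above assume that every hash has a :sites key and that every leaf is a String. As a minimal sketch of a slightly more defensive variant, the hypothetical helper collect_sites below (the name, the default key argument, and the sample data are assumptions, not part of the original answers) walks any mix of arrays and hashes and skips anything it does not recognize:

```ruby
# Hypothetical helper: walk any mix of hashes and arrays and collect
# every String leaf found under the given key (default :sites).
def collect_sites(node, key = :sites)
  case node
  when String then [node]
  when Array  then node.flat_map { |child| collect_sites(child, key) }
  when Hash   then collect_sites(node.fetch(key, []), key) # a missing key yields nothing
  else []                                                  # ignore nil, numbers, etc.
  end
end

# Made-up sample data in the same shape as the question's $urls.
sample = [
  { name: 'A', sites: ['http://a.example/feed'] },
  { name: 'B', sites: [
    { name: 'B1', sites: ['http://b1.example/feed', 'http://b2.example/feed'] },
    'http://b.example/feed'
  ] }
]

flat = collect_sites(sample)
puts flat
```

Because the method accepts the top-level array directly, the call site needs no extra map or flatten.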

Related

Creating Ruby Hash from XML

I have an XML doc that looks like this (and contains hundreds of these entries):
<entry name="entryname">
<serial>1234567</serial>
<hostname>host1</hostname>
<ip-address>100.200.300.400</ip-address>
<mac-address>00-00-00-00</mac-address>
</entry>
ansible_hash is a hash that I will use as a basis for a dynamic ansible inventory, and has a structure as on the Ansible website:
ansible_hash = {
"_meta" => {"hostvars" => {}},
"all" => {
"children" => ["ungrouped"]
},
"ungrouped" => {}
}
I'm trying to use Nokogiri to retrieve the hostname from the XML doc, and add it to ansible_hash. I would like to have each of the hostnames to be appended to the array under the "hosts" key. How can I achieve this?
When I do this,
xml_doc = Nokogiri::XML(File.open("file.xml", "r"))
xml_doc.xpath("//entry//hostname").each do |entry|
ansible_hash["all"] = ansible_hash["all"].merge("hosts" => ["#{entry.inner_text}"])
end
the entry under "all" => {"hosts" => []} only has the last one like this:
{
"_meta" => {"hostvars"=>{}},
"all" => {
"children" => ["ungrouped"],
"hosts" => ["host200"]
},
"ungrouped" => {}
}
ansible_hash['all']['hosts'] = []
xml_doc.xpath("//entry//hostname").each do |entry|
ansible_hash['all']['hosts'] << entry.inner_text
end
The reason your code is not working:
On every iteration you merge in a hash that has the same key, "hosts", so the later value overwrites the previous one.
What you need is to append to an array, so focus on that and forget about merging hashes.
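The difference can be seen without Nokogiri. The sketch below uses a made-up list of hostnames as a stand-in for the values read from the XML file and runs both approaches side by side:

```ruby
inventory = { "all" => { "children" => ["ungrouped"] } }
hostnames = %w[host1 host2 host3] # stand-in for the hostnames read from the XML

# Approach from the question: each merge replaces the value stored under "hosts".
merged = inventory["all"]
hostnames.each { |name| merged = merged.merge("hosts" => [name]) }
p merged["hosts"]   # => ["host3"]

# Approach from the answer: create the array once, then append to it.
appended = inventory["all"].merge("hosts" => [])
hostnames.each { |name| appended["hosts"] << name }
p appended["hosts"] # => ["host1", "host2", "host3"]
```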

Create a nested HASH from a API Call doesn't work properly

I am new here and I hope that I'm doing everything right.
I also searched the forum and Google, but I didn't find the answer. (Or I did not notice that the solution was right before my eyes. If so, I'm sorry >.< .)
I have a problem and I don't know exactly what I am doing wrong at the moment.
I make an API request and get a big JSON response back. It looks something like this:
"apps": [
{
"title": "XX",
... many more data
},
{
"title": "XX",
... many more data
},
{
"title": "XX",
... many more data
}
... and so on
]
After that I want to create a hash with the data I need; for example, it should look like:
{
"APP_0" => {"Title"=>"Name1", "ID"=>"1234", "OS"=>"os"},
"APP_1" => {"Title"=>"Name2", "ID"=>"5678", "OS"=>"os"}
}
but the values in the hash that I create with my code look like:
"APP_1", {"Title"=>"Name2", "ID"=>"5678", "OS"=>"os"}
I don't know if this is a valid hash. After that I want to iterate through the hash and just output the ID, but I get an error (TypeError). What am I doing wrong?
require 'json'
require 'net/http'
require 'uri'
require 'httparty'
response = HTTParty.get('https://xxx/api/2/app', {
headers: {"X-Toke" => "xyz"},
})
all_apps_parse = JSON.parse(response.body)
all_apps = Hash.new
all_apps_parse["apps"].each_with_index do |app, i|
all_apps["APP_#{i}"] = {'Title' => app["title"],
'ID' => app["id"],
'OS' => app["platform"]}
end
all_apps.each_with_index do |app, i|
app_id = app["App_#{i}"]["id"]
p app_id
end
I hope someone can understand the problem and can help me :-). Thanks in advance.
Assuming the data looks something like this:
all_apps_parse = { "apps" => [
{
"title" => "Name1",
"id" => 1234,
"platform" => "os"
},
{
"title" => "Name2",
"id" => 5678,
"platform" => "os"
},
{
"title" => "Name3",
"id" => 1111,
"platform" => "windows"
}]
}
and with a little idea of what you want to achieve, here is my solution:
all_apps = Hash.new
all_apps_parse["apps"].each_with_index do |app, i|
all_apps["APP_#{i}"] = { 'Title' => app["title"],
'ID' => app["id"],
'OS' => app["platform"] }
end
all_apps
=> {"APP_0"=>{"Title"=>"Name1", "ID"=>1234, "OS"=>"os"}, "APP_1"=>{"Title"=>"Name2", "ID"=>5678, "OS"=>"os"}, "APP_2"=>{"Title"=>"Name3", "ID"=>1111, "OS"=>"windows"}}
all_apps.each do |key, value|
puts key # => e.g. "APP_0"
puts value['ID'] # => e.g. 1234
end
# Prints
APP_0
1234
APP_1
5678
APP_2
1111
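For completeness, here is a small sketch of why the loop in the question raises TypeError. Calling each_with_index on a Hash yields a two-element [key, value] Array plus the index, so app in the question is an Array, and indexing an Array with a String fails. The question's loop also looks up "App_#{i}" and "id", while the hash was built with "APP_#{i}" and 'ID'. The sample values below are the ones assumed in the answer above:

```ruby
all_apps = {
  "APP_0" => { "Title" => "Name1", "ID" => 1234, "OS" => "os" },
  "APP_1" => { "Title" => "Name2", "ID" => 5678, "OS" => "os" }
}

# each_with_index on a Hash yields a [key, value] Array and the index.
first_pair, first_index = all_apps.each_with_index.first
p first_pair.class # => Array
p first_pair.first # => "APP_0"
p first_index      # => 0

# Indexing that Array with a String is what raises the TypeError.
error_class = nil
begin
  first_pair["App_0"]
rescue TypeError => e
  error_class = e.class
end
p error_class      # => TypeError

# Destructuring the pair in the block parameters avoids the problem.
ids = all_apps.map { |_key, app| app["ID"] }
p ids              # => [1234, 5678]
```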

How to use a string description to access data from a hash-within-hash structure?

I have the following:
data_spec['data'] = "some.awesome.values"
data_path = ""
data_spec['data'].split('.').each do |level|
data_path = "#{data_path}['#{level}']"
end
data = "site.data#{data_path}"
At this point, data equals a string: "site.data['some']['awesome']['values']"
What I need help with is using the string to get the value of: site.data['some']['awesome']['values']
site.data has the following value:
{
"some" => {
"awesome" => {
"values" => [
{
"things" => "Stuff",
"stuff" => "Things",
},
{
"more_things" => "More Stuff",
"more_stuff" => "More Things",
}
]
}
}
}
Any help is greatly appreciated. Thanks!
You could do as tadman suggested and use site.data.dig('some', 'awesome', 'values') if you are using Ruby 2.3.0 or later (which is awesome, and I didn't even know it existed). This is probably your best choice. But if you really want to write the code yourself, read on.
You were on the right track. The best way to do this is:
data_spec['data'] = "some.awesome.values"
data = nil
data_spec['data'].split('.').each do |level|
if data.nil?
data = site.data[level]
else
data = data[level]
end
end
To understand why this works, you first need to understand that site.data['some']['awesome']['values'] is the same as saying: first get some, then inside that get awesome, then inside that get values. So our first step is retrieving some. Since we don't have that first level yet, we get it from site.data and save it to a variable, data. Once we have that, we just get each following level from data and save it back to data, which lets us go deeper and deeper into the hash.
So, using your example, data would initially look like this:
{"awesome" => {
"values" => [
{
"things" => "Stuff",
"stuff" => "Things",
},
{
"more_things" => "More Stuff",
"more_stuff" => "More Things",
}
]
}
}
Then this:
{"values" => [
{
"things" => "Stuff",
"stuff" => "Things",
},
{
"more_things" => "More Stuff",
"more_stuff" => "More Things",
}
]
}
and finally output like this:
[
{
"things" => "Stuff",
"stuff" => "Things",
},
{
"more_things" => "More Stuff",
"more_stuff" => "More Things",
}
]
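The same level-by-level walk can be written more compactly with reduce, which carries the current level from one step to the next. This is only a sketch: site_data is a plain hash standing in for site.data, and the values are shortened from the question's example.

```ruby
# Stand-in for site.data (shortened from the question's example).
site_data = {
  "some" => {
    "awesome" => {
      "values" => [{ "things" => "Stuff" }, { "more_things" => "More Stuff" }]
    }
  }
}

path = "some.awesome.values"

# Start at the top-level hash and step one key deeper on each iteration.
result = path.split('.').reduce(site_data) { |level, key| level[key] }

p result.length          # => 2
p result.first["things"] # => "Stuff"
```

Like the loop above, this raises NoMethodError if a key along the path is missing, because nil does not respond to [].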
If you're receiving a string like 'x.y.z' and need to navigate a nested hash, Ruby 2.3.0 and later include the dig method:
spec = "some.awesome.values"
data = {
"some" => {
"awesome" => {
"values" => [
'a','b','c'
]
}
}
}
data.dig(*spec.split('.'))
# => ["a", "b", "c"]
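Two details of the built-in dig are worth a quick sketch. A missing key anywhere along the path returns nil instead of raising, and a path segment that should index into an Array must be an Integer, not a String. The conversion of all-digit segments below is an assumption about how such a path string should be read; it would be wrong for a hash that uses numeric strings as keys.

```ruby
data = {
  "some" => {
    "awesome" => {
      "values" => %w[a b c]
    }
  }
}

# A missing key anywhere along the path yields nil rather than an error.
missing = data.dig(*"some.missing.values".split('.'))
p missing # => nil

# Assumption: all-digit segments are meant as Array indices.
keys = "some.awesome.values.0".split('.').map do |segment|
  segment.match?(/\A\d+\z/) ? segment.to_i : segment
end
first_value = data.dig(*keys)
p first_value # => "a"
```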
If you don't have Ruby 2.3.0 and upgrading isn't an option you can just patch it in for now:
class Hash
def dig(*path)
path.inject(self) do |location, key|
location.respond_to?(:keys) ? location[key] : nil
end
end
end
I wrote something that does exactly this. Feel free to take any information of value from it or steal it! :)
https://github.com/keithrbennett/trick_bag/blob/master/lib/trick_bag/collections/collection_access.rb
Check out the unit tests to see how to use it:
https://github.com/keithrbennett/trick_bag/blob/master/spec/trick_bag/collections/collection_access_spec.rb
There's an accessor method that returns a lambda. Since lambdas can be called using the [] operator (method, really), you can get such a lambda and access arbitrary numbers of levels:
accessor['hostname.ip_addresses.0']
or, in your case:
require 'trick_bag'
accessor = TrickBag::CollectionsAccess.accessor(site.data)
do_something_with(accessor['some.awesome.values'])
What you are looking for is something generally looked down upon and for good reasons. But here you go - it's called eval:
binding.eval data

Ruby Loop Array and Create Hash for Each Array Object

I'd like to loop through an array and create a hash for each object in the array, then group all those hashes into an array of hashes.
Here's an example starting array for me:
urls = ["http://stackoverflow.com", "http://example.com", "http://foobar.com"]
Now let's say I'd like to have a hash for each of those URLs into an array like this:
urls =[ {
'url' => "http://stackoverflow.com",
'dns_status' => "200",
'title' => "Stack Overflow"
},
{
'url' => "http://example.com",
'dns_status'=> "200",
'title' => "Example"
}
]
Leaving aside where I get the values for the dns_status and title keys in the example, I guess what I'm missing is how to loop through the original array and create a hash for each object...
I've played around with inject, collect, map and each and read through the docs but can't quite make sense of it or get anything to work.
Any recommendation? Will this be easier to accomplish with a class?
EDIT:
Thanks for your help everyone. Figured this out and got it working. Cheers!
Do something with each element of something enumerable and store the result in an array: that is what map does. Specify what you want in the block, like this:
urls = ["http://stackoverflow.com", "http://example.com", "http://foobar.com"]
p res = urls.map{|url| {"url"=>url, "dns_status"=>200, "title"=>url[7..-5]} }
#=> [{"url"=>"http://stackoverflow.com", "dns_status"=>200, "title"=>"stackoverflow"}, {"url"=>"http://example.com", "dns_status"=>200, "title"=>"example"}, {"url"=>"http://foobar.com", "dns_status"=>200, "title"=>"foobar"}]
"what I'm missing is how to loop through the original array and create a hash for each object..."
urls = [
"http://stackoverflow.com",
"http://example.com",
"http://foobar.com"
]
urls.each {|entry|
puts entry
}
You could use .map! for instance. But I am still not sure what your target result ought to be. How about this?
urls.map! {|entry|
{ 'url' => entry, 'dns_status' => "200", 'title' => "Stack Overflow"}
}
urls # => [{"url"=>"http://stackoverflow.com", "dns_status"=>"200", "title"=>"Stack Overflow"}, {"url"=>"http://example.com", "dns_status"=>"200", "title"=>"Stack Overflow"}, {"url"=>"http://foobar.com", "dns_status"=>"200", "title"=>"Stack Overflow"}]
Yikes, the result is hard to see. It is this:
[
{
"url"=>"http://stackoverflow.com",
"dns_status"=>"200",
"title"=>"Stack Overflow"
},
{
"url"=>"http://example.com",
"dns_status"=>"200",
"title"=>"Stack Overflow"
},
{
"url"=>"http://foobar.com",
"dns_status"=>"200",
"title"=>"Stack Overflow"
}
]
Obviously, you need to still supply the proper content for title,
but you did not give this in your original question so I could not
fill it in.
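If a placeholder title derived from the URL is good enough, one possible sketch uses the standard library's URI to pull out the host. Taking the first host label as the title and leaving dns_status as nil are both assumptions made for illustration; the question does not say where those values come from.

```ruby
require 'uri'

urls = ["http://stackoverflow.com", "http://example.com", "http://foobar.com"]

records = urls.map do |url|
  host = URI.parse(url).host # e.g. "stackoverflow.com"
  {
    'url'        => url,
    'dns_status' => nil,                                # placeholder, not looked up here
    'title'      => host.split('.').first.capitalize    # assumed title rule
  }
end

records.each { |record| puts "#{record['title']}: #{record['url']}" }
```

Using map instead of map! leaves the original urls array untouched, which helps if the plain list is still needed later.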

Ruby Array of Hashes of Hashes Map

I have this array of hashes:
[{ "player" => { "name" => "Kelvin", "id" => 1 } }, { "player" => { "name" => "David", "id" => 2 } }]
I check that each event contains the keys [id, name] with the following line in my RSpec test:
json_response.map{|player| ["name","id"].all? {|attribute| player["player"].key?(attribute)}}.should_not include(false)
which works perfectly. How can I simplify this and make it more efficient?
How about :
json_response.each do |event|
event['player'].should have_key('name')
event['player'].should have_key('id')
end
Much clearer IMHO
Edit : if you need to check a lot of columns :
json_response.each do |event|
['name', 'id', 'foo', 'bar', 'baz'].each do |column|
event['player'].should have_key(column)
end
end
According to the documentation you should be able to do this:
json_response.each do |event|
event['player'].should include('name', 'id')
end
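Outside of RSpec, the same check can be sketched in plain Ruby. Collecting the offending events, rather than a list of booleans, makes a failure easier to diagnose. The data below assumes the array-of-hashes shape described in the question, and required is a hypothetical name.

```ruby
json_response = [
  { "player" => { "name" => "Kelvin", "id" => 1 } },
  { "player" => { "name" => "David",  "id" => 2 } }
]

required = %w[name id]

# Keep only the events whose player hash lacks at least one required key.
incomplete = json_response.reject do |event|
  required.all? { |key| event["player"].key?(key) }
end

p incomplete.empty? # => true
```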
