Tabbed text file to MultiDimensional hash using Ruby? - ruby

I'm having a bit of trouble figuring out how to go about this for part of my project. Basically, I need to take a normal tabbed text file and convert it into a multidimensional hash in Ruby so I can cycle through it and detect which parts have children. An example of the file:
hello
	world
	how
are
	you
		today
Would become:
{'hello' => ['world', 'how'], 'are' => {'you' => ['today']}}

Since your input format is up to you, I really don't understand why you're not using YAML:
require 'yaml'
puts({ 'hello' => ['world', 'how'], 'are' => { 'you' => ['today'] } }.to_yaml)
yields:
---
hello:
- world
- how
are:
  you:
  - today
Calling YAML.load with that string, of course, returns the original data structure. Contrary to what you believe, YAML does not require a "key value syntax".
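A minimal sketch of that round trip, plus one way to detect which entries have children once the data is loaded. The helper name keys_with_children is hypothetical and only there for illustration; it is not part of the YAML library.

```ruby
require 'yaml'

data = { 'hello' => ['world', 'how'], 'are' => { 'you' => ['today'] } }

# Serialize to YAML text and load it back again.
yaml_text = data.to_yaml
restored = YAML.load(yaml_text)
puts restored == data

# Hypothetical helper: lists every hash key whose value is a non-empty
# hash or array, i.e. every entry that "has children".
def keys_with_children(node, found = [])
  return found unless node.is_a?(Hash)

  node.each do |key, value|
    next unless (value.is_a?(Hash) || value.is_a?(Array)) && !value.empty?

    found << key
    keys_with_children(value, found)
  end
  found
end

p keys_with_children(restored)
```

Keeping the file in YAML means the nesting is carried by the parser, so no custom tab-counting code is needed.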

Related

How to parse HTML contents of a page using Nokogiri

require 'rubygems'
require 'nokogiri'
require 'open-uri'
url = 'https://www.trumba.com/calendars/smithsonian-events.xml'
doc = Nokogiri::XML(open url)
I am trying to fetch the basic set of information like:
event_name
categories
sponsor
venue
event_location
cost
For example, for event_name I have this xpath:
"/html/body/div[2]/div[2]/div[1]/h3/a/span"
And use it like:
puts doc.xpath "/html/body/div[2]/div[2]/div[1]/h3/a/span"
This returns nil for event_name.
If I save the URL contents locally, the above XPath works.
I need the rest of the information listed above as well. I checked the other XPaths too, but the results are blank.
Here's how I'd go about doing this:
require 'nokogiri'
doc = Nokogiri::XML(open('/Users/gferguson/smithsonian-events.xml'))
namespaces = doc.collect_namespaces
entries = doc.search('entry').map { |entry|
  entry_title = entry.at('title').text
  entry_time_start, entry_time_end = ['startTime', 'endTime'].map{ |p|
    entry.at('gd|when', namespaces)[p]
  }
  entry_notes = entry.at('gc|notes', namespaces).text
  {
    title: entry_title,
    start_time: entry_time_start,
    end_time: entry_time_end,
    notes: entry_notes
  }
}
Which, when run, results in entries being an array of hashes:
require 'awesome_print'
ap entries[0, 3]
# >> [
# >> [0] {
# >> :title => "Conservation Clinics",
# >> :start_time => "2016-11-09T14:00:00Z",
# >> :end_time => "2016-11-09T17:00:00Z",
# >> :notes => "Have questions about the condition of a painting, frame, drawing,\n print, or object that you own? Our conservators are available by\n appointment to consult with you about the preservation of your art.\n \n To request an appointment or to learn more,\n e-mail DWRCLunder@si.edu and specify CLINIC in the subject line."
# >> },
# >> [1] {
# >> :title => "Castle Highlights Tour",
# >> :start_time => "2016-11-09T14:00:00Z",
# >> :end_time => "2016-11-09T14:45:00Z",
# >> :notes => "Did you know that the Castle is the Smithsonian’s first and oldest building? Join us as one of our dynamic volunteer docents takes you on a tour to explore the highlights of the Smithsonian Castle. Come learn about the founding and early history of the Smithsonian; its original benefactor, James Smithson; and the incredible history and architecture of the Castle. Here is your opportunity to discover the treasured stories revealed within James Smithson's crypt, the Gre...
# >> },
# >> [2] {
# >> :title => "Exhibition Interpreters/Navigators (throughout the day)",
# >> :start_time => "2016-11-09T15:00:00Z",
# >> :end_time => "2016-11-09T15:00:00Z",
# >> :notes => "Museum volunteer interpreters welcome visitors, answer questions, and help visitors navigate exhibitions. Interpreters may be stationed in several of the following exhibitions at various times throughout the day, subject to volunteer interpreter availability. <ul> \t<li><em>The David H. Koch Hall of Human Origins: What Does it Mean to be Human?</em></li> \t<li><em>The Sant Ocean Hall</em></li> </ul>"
# >> }
# >> ]
I didn't try to gather the specific information you asked for because event_name doesn't exist and what you're doing is very generic and easily done once you understand a few rules.
XML is generally very repetitive because it represents tables of data. The "cells" of the table might vary but there's repetition you can use to help you. In this code
doc.search('entry')
loops over the <entry> nodes. Then it's easy to look inside them to find the information needed.
The XML uses namespaces to help avoid tag-name collisions. At first those seem really hard, but Nokogiri provides the collect_namespaces method for the document, which returns a hash of all namespaces in the document. If you're looking for a namespaced tag, pass that hash as the second parameter.
Nokogiri allows us to use XPath and CSS for selectors. I almost always go with CSS for readability. ns|tag is the format to tell Nokogiri to use a CSS-based namespaced tag. Again, pass it the hash of namespaces in the document and Nokogiri will do the rest.
If you're familiar with working with Nokogiri you'll see the above code is very similar to normal code used to pull the content of <td> cells inside <tr> rows in an HTML <table>.
You should be able to modify that code to gather the data you need without risking namespace collisions.
The provided link contains XML, so your XPath expressions should work with XML structure.
The key thing is that the document has namespaces. As I understand it, all XPath expressions need to keep that in mind and specify namespaces too.
To simplify the XPath expressions, one can use the remove_namespaces! method:
require 'nokogiri'
require 'open-uri'
url = 'https://www.trumba.com/calendars/smithsonian-events.xml'
doc = Nokogiri::XML(URI.open(url)); nil # URI.open is needed on Ruby 3+; nil is used to avoid huge output
doc.remove_namespaces!; nil
event = doc.xpath('//feed/entry[1]') # it will give you the first event
event.xpath('./title').text # => "Conservation Clinics"
event.xpath('./categories').text # => "Demonstrations,Lectures & Discussions"
Most likely you would like to have an array of all event hashes.
You can do it like:
doc.xpath('//feed/entry').reduce([]) do |memo, event|
  event_hash = {
    title: event.xpath('./title').text,
    categories: event.xpath('./categories').text
    # all other attributes you need ...
  }
  memo << event_hash
end
It will give you an array like:
[
{:title=>"Conservation Clinics", :categories=>"Demonstrations,Lectures & Discussions"},
{:title=>"Castle Highlights Tour", :categories=>"Gallery Talks & Tours"},
...
]

How do I pass a hash from commandline?

I have a ruby script that has a hash.
Example:
animal_sound = { 'dog' => 'bark', 'cat' => 'meow' }
I want to add 'snake' => 'hiss'
Example:
myscript.rb --addsound "'snake' => 'hiss'"
Then in my script have it add it to animal_sound.
Example:
animal_sound.merge! 'snake' => 'hiss'
=> {"dog"=>"bark", "cat"=>"meow", "snake"=>"hiss"}
Is there a way to do this?
Here is the whole script:
#!/usr/bin/env ruby
require 'rubygems'
require 'micro-optparse'
options = Parser.new do |p|
  p.option :addsound, "add sound"
end.process!
animal_sound = { 'dog' => 'bark', 'cat' => 'meow' }
if options[:add_sound]
  newsound = options[:add_sound]
  animal_sound.merge! newsound
end
puts animal_sound
When I run my script I get:
$ bin/myscript.rb --addsound "'snake' => 'hiss'"
bin/myscript.rb:14:in `merge!': can't convert true into Hash (TypeError)
from bin/myscript.rb:14:in `<main>'
SOLVED:
Using PSkocik's solution I got the script to work using animal, sound = options[:addsound].split(' => '); animal_sound[animal] = sound
I also used Simone Carletti's idea to simplify the CLI command. FYI it also works if I want to pass in hash format, like myscript.rb --addsound "'snake' => 'hiss'". Of course the split has to be changed back to split(' => '). I like the simpler CLI using the :.
Example:
myscript.rb --addsound snake:hiss
Final Code:
#!/usr/bin/env ruby
require 'rubygems'
require 'micro-optparse'
options = Parser.new do |p|
  p.option :addsound, "add sound", default: ""
end.process!
animal_sound = { 'dog' => 'bark', 'cat' => 'meow' }
unless options[:addsound].empty? # the default "" is truthy, so test for emptiness
  animal, sound = options[:addsound].split(':')
  animal_sound[animal] = sound
end
puts animal_sound
Command line:
$ bin/myscript.rb --addsound snake:hiss
{"dog"=>"bark", "cat"=>"meow", "snake"=>"hiss"}
I never could get the merge to work.
Each post was helpful. Thanks.
It's a good idea to keep the CLI interface detached from the underlying implementation. In fact, you may decide to switch the script in the future from Ruby to another language, and you don't really want to change the way the code is invoked.
My suggestion is to pass a serialized value, for example
myscript.rb --addsound snake:hiss
In the code, simply decompose the content and merge it.
if options[:addsound]
  animal, sound = options[:addsound].split(":")
  animal_sound.merge!(animal => sound)
end
p.option :addsound, "add sound"
^ this makes it a flag (true or false)
What you want is make it into a switch whose value is the next argument:
p.option :addsound, "add sound", default: ""
^ this makes it a switch, the string value will be assigned to options[:addsound]
newsound = options[:addsound]
^ Here you need to drop the underscore and parse the string into a hash.
Eval is evil.
For example, you could split it on ' => ' and forget about quoting:
newsound = [ options[:addsound].split(' => ') ].to_h #and then merge it
(Passing the argument like so --addsound snake:hiss and then splitting on ':' instead of ' => ' is another good option.)
^splitting on ' => ' should yield a two-member array. Here I put it into another array (arrays of two-member arrays are convertible to hashes) to make it convertible into a hash.
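The array-of-pairs conversion described above can be sketched as a small helper. The name parse_pair and its error message are assumptions made for this illustration; the only calls it relies on are String#split, Array#to_h, and Hash#merge!.

```ruby
# Hypothetical helper: turn "snake => hiss" into {"snake" => "hiss"}.
def parse_pair(arg, separator = ' => ')
  # Limit the split to two parts so the value may contain the separator.
  pair = arg.split(separator, 2)
  raise ArgumentError, "expected KEY#{separator}VALUE, got #{arg.inspect}" unless pair.size == 2

  # An array holding one two-member array converts to a one-entry hash.
  [pair].to_h
end

animal_sound = { 'dog' => 'bark', 'cat' => 'meow' }
animal_sound.merge!(parse_pair('snake => hiss'))
p animal_sound
```

Passing ':' as the second argument handles the shorter snake:hiss form with the same code.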
Or you can do without merging and constructing another hash entirely:
animal, sound = options[:addsound].split(' => ')
animal_sound[animal] = sound
In regards to your error
Notice the line if options[:add_sound]. That basically evaluates to if true. You are getting your error because you are setting newsound to true, and trying to merge a Boolean into a hash. To my knowledge, the .merge only works like so: hash1.merge(hash2).
Passing command line argument
Rather than passing the argument "'snake' => 'hiss'", I suggest making this a comma-delimited list, like so: "snake,hiss". From there, in your if options[:add_sound] block, you can split the string into an array, using a comma as the separator. Finally, rather than using .merge, you can add your key/value pair as you normally would for any hash in Ruby: animal_sound[arr[0]] = arr[1].
Mind you, this method works best with a single key/value pair. I am sure you can submit multiple pairs, but you would then need to split into more arrays on an additional character (like / maybe).
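For comparison, here is a sketch of the same idea using Ruby's standard-library OptionParser rather than micro-optparse (a deliberate substitution, so the example has no gem dependency). The method name animal_sounds is hypothetical. Repeating the flag is one way to accept several pairs without inventing a second separator.

```ruby
require 'optparse'

# Hypothetical helper: returns the default sounds plus any pairs given
# as --addsound ANIMAL:SOUND (the flag may be repeated).
def animal_sounds(argv)
  sounds = { 'dog' => 'bark', 'cat' => 'meow' }

  parser = OptionParser.new do |opts|
    opts.on('--addsound PAIR', 'add a sound as ANIMAL:SOUND') do |pair|
      animal, sound = pair.split(':', 2)
      # Reject input such as "snake" or ":hiss" instead of storing nil.
      raise OptionParser::InvalidArgument, pair if animal.to_s.empty? || sound.to_s.empty?

      sounds[animal] = sound
    end
  end

  # Parse a copy so the caller's array is left untouched.
  parser.parse!(argv.dup)
  sounds
end

p animal_sounds(%w[--addsound snake:hiss --addsound cow:moo])
```

In a real script the argument would be ARGV instead of the literal array used here.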

Can I manipulate yaml files and write them out again

I have a map of values; the key is a filename and the value is an array of strings.
I have the corresponding files.
How would I load each file and write a fixed YAML value containing the array, whether or not that value already exists?
e.g.
YAML (file.yaml)
trg::azimuth:
  - extra
  - intra
  - lateral
or
trg::azimuth:
[extra,intra,lateral]
from
RUBY
{"file.yaml" => ["extra","intra","lateral"]}
The YAML documentation doesn't cover its methods very well, but does say
The underlying implementation is the libyaml wrapper Psych.
The documentation for Psych, the library underlying YAML, covers reading, parsing, and emitting YAML.
Here's the basic process:
require 'yaml'
foo = {"file.yaml" => ["extra","intra","lateral"]}
bar = foo.to_yaml
# => "---\nfile.yaml:\n- extra\n- intra\n- lateral\n"
And here's what the generated, serialized bar variable looks like if written:
puts bar
# >> ---
# >> file.yaml:
# >> - extra
# >> - intra
# >> - lateral
That's the format a YAML parser needs:
baz = YAML.load(bar)
baz
# => {"file.yaml"=>["extra", "intra", "lateral"]}
At this point the hash has gone round-trip, from a Ruby hash, to a YAML-serialized string, back to a Ruby hash.
Writing YAML to a file is easy using Ruby's File.write method:
File.write(foo.keys.first, foo.values.first.to_yaml)
or
foo.each do |k, v|
  File.write(k, v.to_yaml)
end
Which results in a file named "file.yaml", which contains:
---
- extra
- intra
- lateral
To read and parse a file, use YAML's load_file method.
foo = YAML.load_file('file.yaml')
# => ["extra", "intra", "lateral"]
"How do I parse a YAML file?" might be of use as well.
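To set a key inside an existing YAML file whether or not it is already there, the load and write steps can be combined. This is only a sketch: the helper name set_yaml_key is hypothetical, and it assumes each file holds a mapping at the top level (anything else is replaced by a fresh hash).

```ruby
require 'yaml'
require 'tmpdir'

# Hypothetical helper: set `key` to `values` in the YAML file at `path`,
# creating the file if needed and overwriting any existing value.
def set_yaml_key(path, key, values)
  data = File.exist?(path) ? YAML.load_file(path) : {}
  data = {} unless data.is_a?(Hash) # empty or non-mapping files start fresh
  data[key] = values
  File.write(path, data.to_yaml)
  data
end

# Demonstrate in a temporary directory so no real files are touched.
Dir.mktmpdir do |dir|
  path = File.join(dir, 'file.yaml')
  File.write(path, { 'other' => 1 }.to_yaml)
  set_yaml_key(path, 'trg::azimuth', %w[extra intra lateral])
  puts File.read(path)
end
```

Note that comments and formatting in the original file are not preserved, because the file is re-emitted from the parsed data.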

Annotating Ruby structures to include anchors/references on #to_yaml

I have some large hashes (>10⁵ keys) with interlocking structures. They're stored on disk as YAML. I'd like to avoid duplication by using anchors and references in the YAML, but I haven't been able to figure out if there's a way to do it implicitly in the hash such that the #to_yaml method will label the anchor nodes properly.
Desired YAML:
---
parent1:
  common-element-1: &CE1
    complex-structure-goes: here
parent2:
  uncommon-element-1:
    blah: blah
  <<: *CE1
Ruby code:
hsh = {
  'parent1' => {
    'common-element-1' => {
      'complex-structure-goes' => 'here',
    },
  },
  'parent2' => {
    'uncommon-element-1' => {
      'blah' => 'blah',
    },
    '<<' => '*CE1',
  },
}
The reference is quite straightforward -- but how to embed the &CE1 anchor in the 'common-element-1' item in the Ruby hash?
I want to work as much as possible with native Ruby primitive types (like Hash) rather than mucking about with builders and emitters and such -- and I definitely don't want to write the YAML manually!
I've looked at Read and write YAML files without destroying anchors and aliases? and its relative, among other places, but haven't found an answer yet -- at least not that I've understood.
Thanks!
If you use the same Ruby object, the YAML library will set up references for you:
> common = {"ohai" => "I am common"}
> doc = {"parent1" => {"id" => 1, "stuff" => common}, "parent2" => {"id" => 2, "stuff" => common}}
> puts doc.to_yaml
---
parent1:
  id: 1
  stuff: &70133422893680
    ohai: I am common
parent2:
  id: 2
  stuff: *70133422893680
I'm not sure there's a straightforward way of defining Hashes that are subsets of each other, though. Perhaps tweaking your structure a bit would be warranted?
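A short sketch of the shared-object behaviour, applied to keys like the ones in the question. The key name 'shared' is made up for the example, and the anchor label the library generates varies between versions, so no exact output is shown. Reading the text back uses YAML.safe_load with aliases: true, since recent Psych versions reject aliases unless they are explicitly allowed.

```ruby
require 'yaml'

# The same Hash object is referenced from two places.
common = { 'complex-structure-goes' => 'here' }
doc = {
  'parent1' => { 'common-element-1' => common },
  'parent2' => { 'uncommon-element-1' => { 'blah' => 'value' }, 'shared' => common }
}

# The emitter writes an anchor at the first occurrence and an alias at
# the second one.
yaml_text = doc.to_yaml
puts yaml_text

# Aliases must be allowed explicitly when loading.
loaded = YAML.safe_load(yaml_text, aliases: true)

# Both paths lead to one and the same object again after loading.
puts loaded['parent1']['common-element-1'].equal?(loaded['parent2']['shared'])
```

This gives references without hand-written YAML, but the anchor names are chosen by the library, not by you.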

Making rails do variable replacement from a db string

I have a string in a db that contains a local variable reference and I want Ruby to parse and replace it.
For example, the string in the db is "Hello #{classname.name}" and it is stored in classname.description
and my code reads:
<%=h @classname.description %>
But that just prints the exact value from the db:
Hello #{name}
and not this (assuming classname.name is Bob):
Hello Bob
How do I get Ruby to parse the string from the db?
You can use eval() to do this. For example:
>> a = {:name => 'bob'}
=> {:name=>"bob"}
>> eval('"Hello #{a[:name]}"')
=> "Hello bob"
However, what you are doing can be very dangerous and is almost never necessary. I can not be sure that this is or isn't the right way to do things for your project, but in general storing code to be executed in your database is bad practice.
Why don't you use a safe template engine like Liquid, to get around the eval problem?
require 'liquid'
template_string = "Hello {{name}}" # actually fetched from the database
template = Liquid::Template.parse(template_string) # compile the template
name = 'Bob'
text = template.render('name' => name) # => "Hello Bob"
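If adding a gem is not an option, the same idea of substituting only known placeholders, with no code evaluation, can be sketched in plain Ruby. This is not Liquid and supports none of its features; render_template is a hypothetical helper that just replaces {{name}}-style markers from a hash of allowed values.

```ruby
# Hypothetical helper: replace {{key}} markers with values from a hash.
# Nothing in the template is ever evaluated as Ruby code.
def render_template(template, values)
  template.gsub(/\{\{\s*(\w+)\s*\}\}/) do
    # Hash#fetch raises KeyError for placeholders that are not allowed.
    values.fetch(Regexp.last_match(1))
  end
end

template_string = 'Hello {{name}}' # stands in for the string read from the database
puts render_template(template_string, 'name' => 'Bob')
```

Because only keys present in the hash can be substituted, a malicious string stored in the database can at worst raise a KeyError.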

Resources