How do I parse YAML into a hash/object? - ruby

I have a YAML file with a few entries that look like this:
001:
  :title: Some title
  :description: Some body text maybe
002:
  :title: Some title
  :description: Some body text maybe
I'm using the following Ruby method to parse that YAML file into a set of objects I can iterate over:
def parse_yaml(file)
  YAML::load(File.open(File.join(settings.yaml_folder, file)))
end
def use_yaml
  @items = parse_yaml('items.yml')
  @items.each do |item|
    x = item[1][:title]
    # etc...
  end
end
Now, that method works, but I find it odd that I need to use item[1][:title] to access the attributes of the object I'm iterating over. How can I build my YAML file or my parsing code to allow me to use the more standard item[:title]?

It's a Hash. The parse_yaml output is:
{ 1 => { :title=>"Some title",
         :description=>"Some body text maybe" },
  2 => { :title=>"Some title",
         :description=>"Some body text maybe" }
}
You can use the each_value method like this:
#...
@items = parse_yaml('items.yml')
@items.each_value do |item|
  x = item[:title]
  # ... etc
end
Recommended: YAML for Ruby

The underlying issue is that your YAML file stores your data as a hash, while you are trying to access it like an array.
To convert your data into array format:
---
- :title: Some title
  :description: Some body text maybe
- :title: Some title
  :description: Some body text maybe
Also interesting to note, the reason you had to use item[1][:title] to reference your items is that the keys 001 and 002 are converted to integers by YAML.load.
You can confirm this in irb:
irb(main):015:0> YAML.load(File.open("./test.yml"))
=> {1=>{:title=>"Some title", :description=>"Some body text maybe"}, 2=>{:title=>"Some title", :description=>"Some body text maybe"}}

Your YAML is the serialisation of a hash, so you could do:
@items.each do |key, item|
  # do something with item[:title]
end
Or change your YAML to look like:
- :title: blah
  :description: description
- :title: second title
  :description: second description
This will result in YAML.load returning an array.
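To see the two layouts from the answers above side by side, here is a minimal sketch. It uses inline YAML strings rather than the asker's items.yml, and the variable names are made up for illustration. It also quotes the hash keys, which is an assumption beyond the original file: a quoted "001" stays a string instead of being read as a number.

```ruby
require 'yaml'

# Hash layout: keys are quoted (an assumption, not in the original file)
# so they stay strings instead of being converted to integers.
hash_yaml = <<~YAML
  "001":
    :title: First title
  "002":
    :title: Second title
YAML

# Array layout: a plain sequence of entries.
list_yaml = <<~YAML
  - :title: First title
  - :title: Second title
YAML

items_by_key = YAML.load(hash_yaml)
items_list   = YAML.load(list_yaml)

# With a hash, iterate over the values to get item[:title] directly.
hash_titles = items_by_key.values.map { |item| item[:title] }

# With an array, each element already is the item.
list_titles = items_list.map { |item| item[:title] }

puts hash_titles.inspect
puts list_titles.inspect
```

Either way the loop body can use item[:title]; the choice is mostly about whether you still need the 001/002 identifiers.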

Related

Ruby Serialize and DeSerialize Struct

I'm trying to serialize an S3 Object so that I can deserialize at a later time. Deserialization is failing to grab the Object's class and is not grouping the object's variables. Here's my current code:
require 'yaml'
def serialize_array_of_objects(array, filename)
  unless array.empty?
    File.open(filename, "w+") do |f|
      array.each { |element|
        serialized_object = YAML::dump(element)
        f.write(serialized_object)
      }
    end
  end
end
Here's the contents of the file (redacted):
--- !ruby/struct:Aws::S3::Types::Object
key: file1.csv
last_modified: 2019-03-24 17:24:41.000000000 Z
etag: '"REDACTED"'
size: 41248
storage_class: STANDARD
owner:
--- !ruby/struct:Aws::S3::Types::Object
key: file2.csv
last_modified: 2019-04-24 15:30:41.000000000 Z
etag: '"REDACTED"'
size: 33527
storage_class: STANDARD
owner:
To deserialize the objects I'm using this code:
def serialized_file_to_array(filename)
  array = []
  File.open(filename, "r").each { |line|
    array << YAML::load(line)
  }
  return array
end
My problem is that the object gets distorted on load. Here's the array now:
[nil, {"key"=>"file1.csv"}, {"last_modified"=>2019-03-24 17:24:41 UTC}, {"etag"=>"\"REDACTED\""}, {"size"=>41248}, {"storage_class"=>"STANDARD"}, {"owner"=>nil}, nil, {"key"=>"file2.csv"}, {"last_modified"=>2019-04-24 15:30:41 UTC}, {"etag"=>"\"REDACTED\""}, {"size"=>33527}, {"storage_class"=>"STANDARD"}, {"owner"=>nil}]
I need to be able to pull the object key values in the deserialized version.
The issue is that each dumped object spans several lines in the YAML file, but you load the file back one line at a time. A single line obviously does not contain the whole object, which is why you get back an array of hashes (one per line).
You need to either collect lines until the next object marker (---) appears, or read the whole file, split it into objects (e.g. with a regular expression) and load each chunk.
The first approach would be like:
File.readlines(FILE).
  each_with_object([[], []]) do |line, (inner_acc, outer_acc)|
    if line.start_with?('---')
      outer_acc << YAML.load(inner_acc.join) unless inner_acc.empty?
      inner_acc.clear << line
    else
      inner_acc << line
    end
  end.tap do |inner_acc, outer_acc|
    break outer_acc << YAML.load(inner_acc.join) # last chunk
  end
With regular expression, it should be even simpler.
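A minimal sketch of the regular-expression approach, using plain hashes instead of the S3 structs so that it runs without the AWS SDK (the sample records are made up). It also shows YAML.load_stream, which is a different technique from the regex split: it lets the parser find the document boundaries itself. Note that on Ruby 3.1 and later YAML.load only permits basic types by default, so tagged objects such as !ruby/struct would likely need YAML.unsafe_load in place of YAML.load.

```ruby
require 'yaml'

# Hypothetical stand-ins for the serialized objects.
records = [
  { 'key' => 'file1.csv', 'size' => 41248 },
  { 'key' => 'file2.csv', 'size' => 33527 }
]

# Same layout as the question: one YAML document after another.
stream = records.map { |record| YAML.dump(record) }.join

# Regex approach: split in front of every line that starts with '---'.
by_regex = stream.split(/^(?=---)/)
                 .reject(&:empty?)
                 .map { |doc| YAML.load(doc) }

# Alternative: let the YAML parser handle the multi-document stream.
by_stream = YAML.load_stream(stream)

puts by_regex.inspect
puts by_stream.inspect
```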

Ruby YAML: how to keep the %YAML 1.1 header

I have an array in Ruby named array. I add its values to a YAML file, but afterwards the %YAML 1.1 header has been removed from the file, which I don't want.
yaml_string = File.read "file.yaml"
data = YAML.load yaml_string
array.each do |value|
  data["title"] << "- " + value + "\n"
end
output = YAML.dump data
File.write("file.yaml", output)
Before execution the header is present, but after execution it has been removed (%YAML 1.1), along with all comment lines starting with #, which I don't want.
I think something like this is what you're trying to do.
I'm assuming your YAML array of titles matches your array object.
Otherwise you could use something like Enumerator#with_index if you just want to map the index of each YAML entry to the text.
require 'psych'
filename = "sample_yaml.yml"
array = [0, 1, 2, 3]
if File.exist?(filename)
  puts "File exists. :) Parsing the yaml file."
  yaml = Psych.load_file(filename)
  array.each do |value|
    yaml[value]["title"] << " - #{value}" # find the title that matches the index number of array
  end
else
  raise ArgumentError, "bad file name"
end
puts "Outputting to reformatted yaml file"
# Psych.dump does not emit the directive, so write it back by hand.
File.open("reformatted_file.yaml", 'wb') { |f| f.write "%YAML 1.1\n" + Psych.dump(yaml) }
Assuming a YAML file like this:
---
- title: zero
- title: one
- title: two
- title: three
Outputs
%YAML 1.1
---
- title: zero - 0
- title: one - 1
- title: two - 2
- title: three - 3

Ruby Yard documentation: how to add a "verbatim" (to generate something like a <pre> tag)

I want a piece of code, like a hash, to display with fixed typeface on the resulting html. Suppose this is the contents of my file:
=begin
One example of valid hash to this function is:
{
  :name => "Engelbert",
  :id => 1345
}
=end
def f hash_param
  # ...
end
How to instruct yard (using the default of the version 0.9.15) so a yard doc file.rb will generate, for the hash example, the equivalent of adding 4 backslashes to the markdown format, or 4 starting empty spaces to stackoverflow, or the <pre> tag in html, resulting in a verbatim/fixed typeface format in the resulting html?
Expected output:
One example of valid hash to this function is:
    {
      :name => "Engelbert",
      :id => 1345
    }
EDIT
> gem install redcarpet
> yard doc --markup-provider redcarpet --markup markdown - file.rb
Should wrap the contents of file.rb within a <pre> tag, producing this page.
Use @example
Show an example snippet of code for an object. The first line is an optional title.
# @example One example of valid hash to this function is:
#   {
#     :name => "Engelbert",
#     :id => 1345
#   }
def f hash_param
  # ...
end
Maybe I don't get your question:
the equivalent of adding 4 backslashes to the markdown format, or 4 starting empty spaces to stackoverflow
If I use the 4 starting empty spaces in my code like this:
=begin
One example of valid hash to this function is:

    {
      :name => "Engelbert",
      :id => 1345
    }
=end
def f hash_param
  # ...
end
then I get
But maybe you can also use @option:
@param hash_param
@option hash_param [String] :name The name of...
@option hash_param [Integer] :id The id of...
and you get:
Disclaimer: I used yard 0.9.26 for my examples.

Ruby HTMLish tokenizer

I'm looking for a resource for tokenizing HTMLish markup. I'm creating a markup language that is a lot like (but isn't) HTML. All I want is something that can parse it up into tags, text, comments, etc. I don't need the tokens to be arranged into a tree structure or checked if they're valid tags or whatever - I'll do that myself.
So, for example, if given this string:
hello <x> dude <whatever></x>
it would return an array something like this:
hello
<x>
dude
<whatever>
</x>
It could also return objects representing those strings. Either would be cool.
I've looked into Nokogiri and Oga, but they seem to just want to parse and tree HTML. Suggestions?
If you're willing to do much of the validation yourself, could a regular expression work? Something like:
html = 'hello <x> dude <whatever></x>'
html.split(/(<[^<>]+>)/)
#=> ["hello ", "<x>", " dude ", "<whatever>", "", "</x>"]
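If the empty strings from split are a nuisance, a variation on the same regular-expression idea is to use String#scan and label each piece. This is only a sketch: the token type names and the tokenize helper are made up for illustration, and the pattern assumes that < and > never appear inside text or attribute values (stray ones are silently skipped).

```ruby
# Comments first, then any other tag, then runs of plain text.
TOKEN_PATTERN = /<!--.*?-->|<[^<>]+>|[^<>]+/m

# Returns an array of [type, text] pairs (hypothetical token format).
def tokenize(markup)
  markup.scan(TOKEN_PATTERN).map do |text|
    type =
      if text.start_with?('<!--')
        :comment
      elsif text.start_with?('</')
        :close_tag
      elsif text.start_with?('<')
        :open_tag
      else
        :text
      end
    [type, text]
  end
end

tokens = tokenize('hello <x> dude <!-- note --><whatever></x>')
tokens.each { |type, text| puts "#{type}: #{text.inspect}" }
```

Because scan only returns what matched, no empty strings appear between adjacent tags, and validation of tag names can still be done afterwards on the :open_tag and :close_tag entries.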
Otherwise, I wonder: could your markup be XMLish rather than HTMLish? For example, do you need to support void elements like <whatever>, or would it be enough to support self-closing tags like <whatever />? That is, are you committed to supporting markup like hello <x> dude <whatever></x>, or would supporting hello <x> dude <whatever /></x> (with the self-closing <whatever />) be enough?
If self-closing tags are enough, it sounds like an XML parser could do the trick. Even if the parser builds a tree, you can usually flatten that into an array.
If you need custom void elements, you may need to find an HTML parser that supports those. I don't know any offhand, but it should be possible to modify Oga to do that. You could also modify Oga to support flattening a tree into an array. Something like:
require 'oga'

module Oga
  module XML
    # Redefine the list of void elements.
    remove_const :HTML_VOID_ELEMENTS
    const_set :HTML_VOID_ELEMENTS, Whitelist.new(%w{
      whatever
    })

    class TokenGenerator < Generator
      def initialize(*args)
        super
        @tokens = []
      end

      # Wrap every generator callback so each emitted piece is also
      # collected as a separate token.
      %i[
        on_element on_text on_cdata on_comment on_xml_declaration
        on_processing_instruction on_doctype on_document
        after_element
      ].each do |method|
        define_method method do |content, output|
          token = super(content, '')
          @tokens << token if token
          super(content, output)
        end
      end

      def to_tokens
        @tokens = []
        to_xml
        @tokens
      end
    end
  end
end

html = Oga.parse_html('hello <x> dude <whatever></x>')
Oga::XML::TokenGenerator.new(html).to_tokens
#=> ["hello ", "<x>", " dude ", "<whatever>", "</x>"]

Manipulating XML files in ruby with XmlSimple

I've got a complex XML file, and I want to extract the content of a specific tag from it.
I use a Ruby script with the XmlSimple gem. I retrieve an XML file with an HTTP request, then strip all the unnecessary tags and pull out the necessary info. Here is the script:
data = XmlSimple.xml_in(response.body)
hash_1 = Hash[*data['results']]
def find_value(hash, value)
  hash.each do |key, val|
    if val[0].kind_of? Hash then
      find_value(val[0], value)
    else
      if key.to_s.eql? value
        puts val
      end
    end
  end
end
hash_1['book'].each do |arg|
  find_value(arg, "title")
  puts("\n")
end
The problem is that when I replace puts val with return val, and then call the find_value method with puts find_value(arg, "title"), I get the whole contents of hash_1['book'] on the screen.
How do I correct the find_value method?
A "complex XML file" and XmlSimple don't mix. Your task would be solved a lot easier with Nokogiri, and be faster as well:
require 'nokogiri'
doc = Nokogiri::XML(response.body)
puts doc.xpath('//book/title/text()')
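If you would rather keep the recursive approach, the likely reason return val misbehaves is that the result of the nested find_value call is discarded, and when no return fires Hash#each returns the hash itself, which is what puts then prints. Below is a minimal sketch of a corrected method. It runs on a hand-written hash shaped roughly like XmlSimple output (the sample data is an assumption), and unlike the original it checks the key before descending.

```ruby
# Returns the value stored under the first key matching `wanted`,
# searching nested hashes depth-first, or nil if nothing matches.
def find_value(hash, wanted)
  hash.each do |key, val|
    return val if key.to_s == wanted

    # XmlSimple-style data wraps children in arrays; look at the first one.
    first = val.is_a?(Array) ? val[0] : val
    next unless first.is_a?(Hash)

    found = find_value(first, wanted)
    return found unless found.nil? # propagate the nested result upwards
  end
  nil # explicit nil instead of the hash that each would return
end

# Hypothetical sample data, not taken from the question.
book = { 'meta' => [{ 'title' => ['Dune'], 'year' => ['1965'] }], 'id' => ['42'] }

title   = find_value(book, 'title')
missing = find_value(book, 'publisher')
puts title.inspect
puts missing.inspect
```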

Resources