Read and write YAML files without destroying anchors and aliases? - ruby

I need to open a YAML file with aliases used inside it:
defaults: &defaults
  foo: bar
  zip: button
node:
  <<: *defaults
  foo: other
This obviously expands out to an equivalent YAML document of:
defaults:
  foo: bar
  zip: button
node:
  foo: other
  zip: button
Which YAML::load reads it as.
I need to set new keys in this YAML document and then write it back out to disk, preserving the original structure as much as possible.
I have looked at YAML::Store, but this completely destroys the aliases and anchors.
Is there anything available that could do something along the lines of:
thing = Thing.load("config.yml")
thing[:node][:foo] = "yet another"
Saving the document back as:
defaults: &defaults
  foo: bar
  zip: button
node:
  <<: *defaults
  foo: yet another
?
I opted to use YAML for this due to the fact it handles this aliasing well, but writing YAML that contains aliases appears to be a bit of a bleak-looking playing field in reality.

The use of << to indicate an aliased mapping should be merged in to the current mapping isn’t part of the core Yaml spec, but it is part of the tag repository.
The current Yaml library provided by Ruby – Psych – provides the dump and load methods which allow easy serialization and deserialization of Ruby objects and use the various implicit type conversion in the tag repository including << to merge hashes. It also provides tools to do more low level Yaml processing if you need it. Unfortunately it doesn’t easily allow selectively disabling or enabling specific parts of the tag repository – it’s an all or nothing affair. In particular the handling of << is pretty baked in to the handling of hashes.
One way to achieve what you want is to provide your own subclass of Psych’s ToRuby class and override its revive_hash method, so that it just treats mapping keys of << as literals. This involves overriding a private method in Psych, so you need to be a little careful:
require 'psych'
class ToRubyNoMerge < Psych::Visitors::ToRuby
  def revive_hash hash, o
    #st[o.anchor] = hash if o.anchor
    o.children.each_slice(2) { |k,v|
      key = accept(k)
      hash[key] = accept(v)
    }
    hash
  end
end
You would then use it like this:
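tree = Psych.parse your_data
# current Psych versions need constructor arguments for ToRuby.new; the create factory supplies them
data = ToRubyNoMerge.create.accept tree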
With the Yaml from your example, data would then look something like
{"defaults"=>{"foo"=>"bar", "zip"=>"button"},
"node"=>{"<<"=>{"foo"=>"bar", "zip"=>"button"}, "foo"=>"other"}}
Note the << as a literal key. Also the hash under the data["defaults"] key is the same hash as the one under the data["node"]["<<"] key, i.e. they have the same object_id. You can now manipulate the data as you want, and when you write it out as Yaml the anchors and aliases will still be in place, although the anchor names will have changed:
data['node']['foo'] = "yet another"
puts Psych.dump data
produces the following. (Older versions of Psych used the object_id of the hash to ensure unique anchor names; current versions use sequential numbers instead.)
---
defaults: &2151922820
  foo: bar
  zip: button
node:
  <<: *2151922820
  foo: yet another
If you want to have control over the anchor names, you can provide your own Psych::Visitors::Emitter. Here’s a simple example based on your example and assuming there’s only the one anchor:
class MyEmitter < Psych::Visitors::Emitter
  def visit_Psych_Nodes_Mapping o
    o.anchor = 'defaults' if o.anchor
    super
  end
  def visit_Psych_Nodes_Alias o
    o.anchor = 'defaults' if o.anchor
    super
  end
end
When used with the modified data hash from above:
# create an AST based on the Ruby data structure
builder = Psych::Visitors::YAMLTree.new
builder << data
ast = builder.tree
# write out the tree using the custom emitter
MyEmitter.new($stdout).accept ast
the output is:
---
defaults: &defaults
  foo: bar
  zip: button
node:
  <<: *defaults
  foo: yet another
(Update: another question asked how to do this with more than one anchor, where I came up with a possibly better way to keep anchor names when serializing.)
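As an alternative to the visitor subclasses above, the lower-level tools mentioned earlier can also edit the parsed tree itself, so the original anchor names never pass through Ruby objects at all. This is only a minimal sketch: it assumes a single-document file whose root is a mapping, and the helper names find_value, set_scalar and edit_preserving_anchors are made up for this example.

```ruby
require 'psych'

ANCHORED_SOURCE = <<~EOS
  defaults: &defaults
    foo: bar
    zip: button
  node:
    <<: *defaults
    foo: other
EOS

# Hypothetical helper: return the value node stored under +key+ in a
# mapping node (its children alternate key node, value node).
def find_value(mapping, key)
  mapping.children.each_slice(2) do |k, v|
    return v if k.is_a?(Psych::Nodes::Scalar) && k.value == key
  end
  nil
end

# Hypothetical helper: walk +path+ from the document root and overwrite
# the scalar found there. Anchors and aliases are never resolved.
def set_scalar(stream, path, new_value)
  node = stream.children.first.root
  path.each do |key|
    node = find_value(node, key)
    raise KeyError, "no key #{key.inspect} in document" if node.nil?
  end
  raise TypeError, 'target is not a scalar' unless node.is_a?(Psych::Nodes::Scalar)
  node.value = new_value
  stream
end

# Parse to an AST, edit one scalar, and emit the AST again.
def edit_preserving_anchors(yaml_text, path, new_value)
  stream = Psych.parse_stream(yaml_text)
  set_scalar(stream, path, new_value).to_yaml
end

puts edit_preserving_anchors(ANCHORED_SOURCE, %w[node foo], 'yet another')
```

Because nothing is converted to Ruby hashes, the &defaults anchor and the <<: *defaults merge key are emitted as they were parsed. The trade-off is that you work with nodes rather than plain hashes, and adding brand-new keys means building Psych::Nodes::Scalar objects by hand.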

YAML has aliases and they can round-trip, but you disable it by hash merging. << as a mapping key seems a non-standard extension to YAML (both in 1.8's syck and 1.9's psych).
require 'rubygems'
require 'yaml'
yaml = <<EOS
defaults: &defaults
  foo: bar
  zip: button
node: *defaults
EOS
data = YAML.load yaml
print data.to_yaml
prints
---
defaults: &id001
  zip: button
  foo: bar
node: *id001
but the << in your data merges the aliased hash into a new one which is no longer an alias.
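To make the difference concrete, here is a small sketch comparing the two cases. The load_trusted helper is my own addition: it only exists because newer Psych versions reject aliases in YAML.load by default, and it should only be used on documents you trust.

```ruby
require 'yaml'

# Hypothetical helper: newer Psych versions reject aliases in YAML.load,
# so prefer unsafe_load where it exists.
def load_trusted(text)
  YAML.respond_to?(:unsafe_load) ? YAML.unsafe_load(text) : YAML.load(text)
end

PLAIN_ALIAS = <<~EOS
  defaults: &defaults
    foo: bar
  node: *defaults
EOS

MERGE_KEY = <<~EOS
  defaults: &defaults
    foo: bar
  node:
    <<: *defaults
EOS

aliased = load_trusted(PLAIN_ALIAS)
merged  = load_trusted(MERGE_KEY)

# A plain alias loads as the very same Hash object, so dumping it can
# emit an anchor/alias pair again.
puts aliased['node'].equal?(aliased['defaults']) # => true
puts aliased.to_yaml.include?('*')               # => true

# A merge key copies the entries into a new Hash, so the link is gone.
puts merged['node'].equal?(merged['defaults'])   # => false
```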

Have you tried Psych? There is another question about Psych here.

I'm generating my CircleCI config file with a Ruby script and ERB templates. My script parses and regenerates the YAML, so I wanted to preserve all the anchors. The anchors in my config all have the same name as the key that defines them, e.g.
docker_images:
  docker_auth: &docker_auth
    username: '$DOCKERHUB_USERNAME'
    password: '$DOCKERHUB_TOKEN'
  cimg_base_image: &cimg_base_image
    image: cimg/base:2022.09
    auth: *docker_auth
jobs:
  tests:
    docker:
      - *cimg_base_image
So I was able to solve this with regular expressions on the generated YAML string. I wrote a #restore_yaml_anchors method that converts &1 and *1 back into &docker_auth and *docker_auth.
# Ruby 3.1.2
require 'rubygems'
require 'yaml'
yaml = <<EOS
docker_images:
  docker_auth: &docker_auth
    username: '$DOCKERHUB_USERNAME'
    password: '$DOCKERHUB_TOKEN'
  cimg_base_image: &cimg_base_image
    image: cimg/base:2022.09
    auth: *docker_auth
jobs:
  tests:
    docker:
      - *cimg_base_image
EOS
data = YAML.load yaml, aliases: true # needed for Ruby 3.x
def restore_yaml_anchors(yaml)
  # find each "key: &N" pair, then rename every &N and *N to the key name
  yaml.scan(/([A-Z0-9a-z_]+|<<): &(\d+)/).each do |anchor_name, anchor_id|
    # \b stops anchor 1 from also matching the start of anchor 10
    yaml.gsub!(/([:-]) (\*|&)#{anchor_id}\b/, "\\1 \\2#{anchor_name}")
  end
  yaml
end
puts [
  "Original #to_yaml:",
  data.to_yaml,
  "-----------------------", '',
  "With restored anchors:",
  restore_yaml_anchors(data.to_yaml)
].join("\n")
Output:
Original #to_yaml:
---
docker_images:
  docker_auth: &1
    username: "$DOCKERHUB_USERNAME"
    password: "$DOCKERHUB_TOKEN"
  cimg_base_image: &2
    image: cimg/base:2022.09
    auth: *1
jobs:
  tests:
    docker:
    - *2
-----------------------
With restored anchors:
---
docker_images:
  docker_auth: &docker_auth
    username: "$DOCKERHUB_USERNAME"
    password: "$DOCKERHUB_TOKEN"
  cimg_base_image: &cimg_base_image
    image: cimg/base:2022.09
    auth: *docker_auth
jobs:
  tests:
    docker:
    - *cimg_base_image
It's working well for my CI config, but you may need to update it to handle some other cases in your own YAML.

Related

How to get Ruby to retrieve specific data from a YAML file?

First off, I am still a novice Ruby user so this is probably a trivial question but I'm still struggling regardless.
So I have a YAML file set up like so:
userA:
  {
    nick: cat ,
    fruit: apple ,
    canDance: true ,
    age: 20
  }
userB:
  {
    nick: dog ,
    fruit: orange ,
    canDance: false ,
    age: 23
  }
Assuming that the YAML file has been loaded into Ruby, how would I be able to retrieve specific parts of this file, such as retrieving userA's fruit, or userB's canDance? Thanks in advance.
You can read the required information from your YAML like this:
require 'yaml'
people = YAML.load_file('the_filename.yaml')
puts people['userA']['fruit'] #=> 'apple'
puts people['userB']['canDance'] #=> true
Note: Your YAML file seems to be valid and can be read by the default Ruby YAML parser. But it uses a very special and uncommon syntax. I suggest writing your YAML like this:
userA:
  nick: cat
  fruit: apple
  canDance: true
  age: 20
userB:
  nick: dog
  fruit: orange
  canDance: false
  age: 23
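If you want to convince yourself that the two notations carry the same data, a quick check like the following should do. It is only a sketch: the constant names are mine, and the flow mapping is written on one line for brevity.

```ruby
require 'yaml'

# Flow style: braces and commas, as in the question.
FLOW_STYLE = <<~EOS
  userA: { nick: cat, fruit: apple, canDance: true, age: 20 }
EOS

# Block style: indentation only, as suggested above.
BLOCK_STYLE = <<~EOS
  userA:
    nick: cat
    fruit: apple
    canDance: true
    age: 20
EOS

flow  = YAML.load(FLOW_STYLE)
block = YAML.load(BLOCK_STYLE)

puts flow == block          # => true
puts flow['userA']['fruit'] # => apple
```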
Updated: Your sample data can be parsed as is by Ruby's standard lib YAML, however the curly braces and commas are not required.
Here is an example with some mixed types added for hobbies
test.yml
---
userA:
  nick: cat
  fruit: apple
  canDance: true
  age: 20
  hobbies:
    - coding
    - tennis
  music:
    production: true
    djing: true
    guitar: true
userB:
  nick: dog
  fruit: orange
  canDance: false
  age: 23
  hobbies:
    - coding
    - ruby
  sports:
    tennis: always
    soccer: sometimes
    running: rarely
Use Ruby's YAML library, which is part of the standard library, so you can simply require it.
require 'yaml'
people = YAML.load_file 'test.yml'
people is now an instance of the Hash class, which lets you get the value for a key by putting the key inside square brackets like so:
people['userA']
Now you can dig through the object by chaining keys like this:
people['userA']['hobbies']
However note that you will get an error if the chain "breaks"
people['userB']['sports']['tennis'] # this works
=>"always"
people['userA']['sports']['tennis'] # this will raise
=>NoMethodError: undefined method `[]' for nil:NilClass
The exception is raised because people['userA']['sports'] returns nil, so chaining ['tennis'] onto it fails. A useful way to avoid this when digging through a deeply nested hash is to use dig:
people.dig('userB','sports','tennis')
=>"always"
people.dig('userA','sports','tennis')
=>nil
people.dig('userA','music','djing')
=>true
people.dig('userB','music','djing')
=>nil
With hashes whose keys are strings you can also use string interpolation. Let's say we want to randomly select a user and dig through it; we might do something like:
people.dig("user#{ ['A','B'].sample }",'music','djing')

How does a YAML double exclamation point work in this i18n gem?

I'm not using Rails and I haven't done any internationalization before, so I'm trying to understand how this particular example works but I'm a little bit stumped:
The r18n-desktop gem reads from a YAML file for translations. Pretty straightforward.
YAML file en.yml:
user:
  edit: Edit user
  name: User name is %1
  count: !!pl
    1: There is 1 user
    n: There are %1 users
log:
  signup: !!gender
    male: Он зарегистрировался
    female: Она зарегистрировалась
Test ruby code:
require 'r18n-desktop'
R18n.from_env('./localizations/')
R18n::Filters.add('gender') do |translation, config, user|
  puts translation
  puts config
  puts user
  translation[user.gender]
end
include R18n::Helpers
class Ayy
  attr_accessor :gender
end
girl = Ayy.new
girl.gender = :female
puts t.user.count(5)
puts t.log.signup girl
Output:
There are 5 users
localization-test.rb:13:in `puts': can't convert R18n::Translation to Array (R18n::Translation#to_ary gives R18n::Untranslated) (TypeError)
from localization-test.rb:13:in `puts'
from localization-test.rb:13:in '
Addendum: It looks like the error is raised in puts rather than in the translation itself. The actual result of the translation is log.signup[] though, so the gender isn't getting through.
What is t.log.signup() expecting?
It seems you forgot to set a filter for the !!gender custom type.
R18n has only a few built-in filters, such as !!pl. A gender filter is not built in, so you need to define it manually.
The R18n filter docs already contain a simple example of a gender filter:
R18n::Filters.add('gender') do |translation, config, user|
  translation[user.gender]
end

Create nested object from YAML to access attributes via method calls in Ruby

I am completely new to ruby.
I have to parse a YAML file to construct an object
YAML File
projects:
  - name: Project1
    developers:
      - name: Dev1
        certifications:
          - name: cert1
      - name: Dev2
        certifications:
          - name: cert2
  - name: Project2
    developers:
      - name: Dev1
        certifications:
          - name: cert3
      - name: Dev2
        certifications:
          - name: cert4
I want to create an object from this YAML for which I wrote the following code in Ruby
require 'yaml'
object = YAML.load(File.read('./file.yaml'))
I can successfully access the attributes of this object with []
For example:
puts object['projects'].first['developers'].last['certifications'].first['name']
# prints cert2
However, I want to access the attributes via method calls
For example:
puts object.projects.first.developers.last.certifications.first.name
# should print cert2
Is there any way to construct such an object whose attributes can be accessed in the (dots) way mentioned above?
I have read about OpenStruct and hashugar.
I also want to avoid usage of third party gems
Nice answer from Xavier, but it can be shorter: just require yaml, json and ostruct, parse your YAML, convert it to JSON, and parse that into an OpenStruct (a Struct would also be possible), like this:
object = JSON.parse(YAML.load(yaml).to_json, object_class: OpenStruct)
To load your YAML from a file it's
object = JSON.parse(YAML::load_file("./test.yaml").to_json, object_class: OpenStruct)
This gives
object
=>#<OpenStruct projects=[#<OpenStruct name="Project1", developers=[#<OpenStruct name="Dev1", certifications=[#<OpenStruct name="cert1">]>, #<OpenStruct name="Dev2", certifications=[#<OpenStruct name="cert2">]>]>, #<OpenStruct name="Project2", developers=[#<OpenStruct name="Dev1", certifications=[#<OpenStruct name="cert3">]>, #<OpenStruct name="Dev2", certifications=[#<OpenStruct name="cert4">]>]>]>
object.projects.first.developers.last.certifications.first.name
=>cert2
I use this for loading configuration from a file: YAML is easy to maintain, and in your code the result is easier to use than a configuration held in a Hash.
Don't do this for repetitive tasks.
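The Struct variant mentioned in passing could look roughly like this. It is a sketch under a few assumptions: the to_struct helper is a name I made up, every hash key has to be usable as a Struct member name, and a fresh anonymous Struct class is created for each hash, which is one more reason not to do this in a hot loop.

```ruby
require 'yaml'

PROJECTS_YAML = <<~EOS
  projects:
    - name: Project1
      developers:
        - name: Dev1
          certifications:
            - name: cert1
        - name: Dev2
          certifications:
            - name: cert2
EOS

# Hypothetical helper: recursively turn hashes into anonymous Structs so
# that values can be read with method calls instead of [].
def to_struct(object)
  case object
  when Hash
    return object if object.empty? # a Struct needs at least one member
    members = object.keys.map(&:to_sym)
    values  = object.values.map { |value| to_struct(value) }
    Struct.new(*members).new(*values)
  when Array
    object.map { |item| to_struct(item) }
  else
    object
  end
end

config = to_struct(YAML.load(PROJECTS_YAML))
puts config.projects.first.developers.last.certifications.first.name # => cert2
```

Unlike OpenStruct, a Struct raises NoMethodError for a key that was not in the YAML, which can be a useful safety net for configuration files.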
If you are just experimenting, there is a quick and dirty way to do this:
class Hash
  def method_missing(name, *args)
    send(:[], name.to_s, *args)
  end
end
I wouldn't use that in production code though, since both method_missing and monkey-patching are usually recipes for trouble down the road.
A better solution is to recursively traverse the data-structure and replace hashes with openstructs.
require 'ostruct'
def to_ostruct(object)
  case object
  when Hash
    OpenStruct.new(Hash[object.map {|k, v| [k, to_ostruct(v)] }])
  when Array
    object.map {|x| to_ostruct(x) }
  else
    object
  end
end
puts to_ostruct(object).projects.first.developers.last.certifications.first.name
Note that there are potentially performance issues with either approach if you are doing them a lot - if your application is time-sensitive make sure you benchmark them! This probably isn't relevant to you though.

Can I manipulate yaml files and write them out again

I have a map of values: the key is a filename and the value is an array of strings.
I have the corresponding files.
How would I load each file and set a fixed YAML value containing the array, whether or not the value already exists?
e.g.
YAML (file.yaml)
trg::azimuth:
  - extra
  - intra
  - lateral
or
trg::azimuth:
  [extra,intra,lateral]
from
RUBY
{"file.yaml" => ["extra","intra","lateral"]}
The YAML documentation doesn't cover its methods very well, but does say
The underlying implementation is the libyaml wrapper Psych.
The Psych documentation, which underlies YAML, covers reading, parsing, and emitting YAML.
Here's the basic process:
require 'yaml'
foo = {"file.yaml" => ["extra","intra","lateral"]}
bar = foo.to_yaml
# => "---\nfile.yaml:\n- extra\n- intra\n- lateral\n"
And here's what the generated, serialized bar variable looks like if written:
puts bar
# >> ---
# >> file.yaml:
# >> - extra
# >> - intra
# >> - lateral
That's the format a YAML parser needs:
baz = YAML.load(bar)
baz
# => {"file.yaml"=>["extra", "intra", "lateral"]}
At this point the hash has gone round-trip, from a Ruby hash, to a YAML-serialized string, back to a Ruby hash.
Writing YAML to a file is easy using Ruby's File.write method:
File.write(foo.keys.first, foo.values.first.to_yaml)
or
foo.each do |k, v|
  File.write(k, v.to_yaml)
end
Which results in a file named "file.yaml", which contains:
---
- extra
- intra
- lateral
To read and parse a file, use YAML's load_file method.
foo = YAML.load_file('file.yaml')
# => ["extra", "intra", "lateral"]
"How do I parse a YAML file?" might be of use, as well as the other "Related" links on the right side of this page.
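Putting the pieces together for the case in the question, where a fixed key should end up in each file whether or not it was there before, a load-modify-write helper might look like the sketch below. The set_yaml_key name and the other::setting key are invented for the example, and the demo writes into a temporary directory rather than touching real files.

```ruby
require 'yaml'
require 'tmpdir'

# Hypothetical helper: set +key+ to +values+ in the YAML file at +path+,
# creating the file if needed and keeping any other keys it already has.
def set_yaml_key(path, key, values)
  data = File.exist?(path) ? YAML.load_file(path) : nil
  data = {} unless data.is_a?(Hash) # empty or non-mapping file: start fresh
  data[key] = values
  File.write(path, data.to_yaml)
  data
end

Dir.mktmpdir do |dir|
  path = File.join(dir, 'file.yaml')
  File.write(path, "other::setting: 1\n") # pre-existing content survives

  { path => %w[extra intra lateral] }.each do |file, values|
    set_yaml_key(file, 'trg::azimuth', values)
  end

  puts File.read(path)
end
```

Note that comments and formatting in the original file are not preserved, because the file is rewritten from the parsed Ruby hash.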

Search/check values in YAML document with Ruby

My goal:
check whether a YAML document includes a value for a specific key using ypath/xpath
select the value for a specified key using ypath/xpath
document yaml:
app:
  name: xxx
  version: xxx
description:
  author:
    name: xxx
    surname: xxx
    email: xxx#xxx.xx
What was checked:
google
stackoverflow
Ruby API (YAML::DBM, since one of the methods it provides is select)
example:
Module::Class.select('description/author/name')
Module::Class.select('*/name')
Module::Class.isset?('*/name')
Use yaml:
require 'yaml'
yml = YAML.load_file('your_file.yml')
Now yml is a hash. You can use it like one. Here is a simple and ugly solution for what you are trying to do:
if !yml["description"].nil? && !yml["description"]["author"].nil? && !yml["description"]["author"]["name"].nil? && !yml["description"]["author"]["name"].empty?
  puts "An author is set!"
end
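On Ruby versions that have Hash#dig, the same check can be written more compactly, and a slash-separated path is easy to emulate. This is only a sketch of the idea: select_path and path_set? are invented helper names, and wildcards such as */name are not supported.

```ruby
require 'yaml'

AUTHOR_DOC = <<~EOS
  app:
    name: xxx
    version: xxx
  description:
    author:
      name: xxx
      surname: xxx
EOS

# Hypothetical helper: look up a slash-separated path such as
# "description/author/name". Returns nil when any step is missing.
def select_path(hash, path)
  hash.dig(*path.split('/'))
rescue TypeError
  nil # an intermediate value was a scalar, so the path cannot exist
end

# Hypothetical helper: true when the path leads to a non-empty value.
def path_set?(hash, path)
  value = select_path(hash, path)
  !(value.nil? || (value.respond_to?(:empty?) && value.empty?))
end

yml = YAML.load(AUTHOR_DOC)
puts select_path(yml, 'description/author/name') # => xxx
puts path_set?(yml, 'description/author/name')   # => true
puts path_set?(yml, 'description/editor/name')   # => false
```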
Since there are no up-to-date YPath implementations around, I would suggest giving ActiveSupport and Nokogiri a chance:
require 'yaml'
require 'active_support/core_ext/hash/conversions'
require 'nokogiri'
yml = YAML.load_file('your_file.yml') # or load it with your preferred YAML engine
# ActiveSupport adds a to_xml method to Hash
xml = yml.to_xml(:root => 'yaml')
doc = Nokogiri::XML(xml)
# the hash keys become element names beneath the <yaml> root
doc.xpath("/yaml/description/author/name").each do |name|
  puts name.text
end

Resources