I'm fairly new to Ruby and I've inherited some code which does a "deep merge" of some YAML. Here's the relevant part :-
class ::Hash
def deep_merge(second)
merger = proc { |key, v1, v2| Hash === v1 && Hash === v2 ? v1.merge(v2, &merger) : Array === v1 && Array === v2 ? v1 | v2 : [:undefined, nil, :nil].include?(v2) ? v1 : v2 }
self.merge(second.to_h, &merger)
end
end
which I found fairly unreadable TBH. It falls over when I pass it the following YAML :-
- {key: nginx.ingress.kubernetes.io/auth-type, value: basic}
- {key: nginx.ingress.kubernetes.io/auth-secret, value: basic-auth}
- {key: nginx.ingress.kubernetes.io/auth-realm, value: 'Authentication Required.'}
the "-" are all indented in the yaml input, but code formatting is messing with that here.
Here's a stripped down version of the YAML I'm trying to merge with (which also fails)
service:
container:
port: 3000
Any ideas?
OK I found the problem. I had forgotten to add a label to the YAML (annotations:) and as soon as I put that on, it started working again. Should I delete the question?
Related
I'm facing a problem that I couldn't find a working solution yet.
I have my YAML config file for the environment, let's call it development.yml.
This file is used to create the hash that should be updated:
data = YAML.load_file(File.join(Rails.root, 'config', 'environments', 'development.yml'))
What I'm trying to accomplish is something along these lines. Let's suppose we have an element of the sort
data['server']['test']['user']
data['server']['test']['password']
What I want to have is:
data['server']['test']['user'] = #{Server.Test.User}
data['server']['test']['password'] = #{Server.Test.Password}
The idea is to create a placeholder for each value that is the key mapping for that value dynamically, going until the last level of the hash and replacing the value with the mapping to this value, concatenating the keys.
Sorry, it doesn't solve my problem. The location data['server']['test']['user'] will be built dynamically, via a loop that will go through a nested Hash. The only way I found to do it was to append to the string the key for the current iteration of the Hash. At the end, I have a string like "data['server']['test']['name']", which I was thinking on converting to a variable data['server']['test']['name'] and then assigning to this variable the value #{Server.Test.Name}. Reading my question I'm not sure if this is clear, I hope this helps to clarify it.
Input sample:
api: 'active'
server:
test:
user: 'test'
password: 'passwordfortest'
prod:
user: 'nottest'
password: 'morecomplicatedthantest'
In this case, the final result should be to update this yml file in this way:
api: #{Api}
server:
test:
user: #{Server.Test.User}
password: #{Server.Test.Password}
prod:
user: #{Server.Prod.User}
password: #{Server.Prod.Password}
It sounds silly, but I couldn't figure out a way to do it.
I am posting another answer now since I realize what the question is all about.
Use Iteraptor gem:
require 'iteraptor'
require 'yaml'
# or load from file
yaml = <<-YAML.squish
api: 'active'
server:
test:
user: 'test'
password: 'passwordfortest'
prod:
user: 'nottest'
password: 'morecomplicatedthantest'
YAML
mapped =
yaml.iteraptor.map(full_parent: true) do |parent, (k, _)|
v = parent.map(&:capitalize).join('.')
[k, "\#{#{v}}"]
end
puts YAML.dump(mapped)
#⇒ ---
# api: "#{Api}"
# server:
# test:
# user: "#{Server.Test.User}"
# password: "#{Server.Test.Password}"
# prod:
# user: "#{Server.Prod.User}"
# password: "#{Server.Prod.Password}"
puts YAML.dump(mapped).delete('"')
#⇒ ---
# api: #{Api}
# server:
# test:
# user: #{Server.Test.User}
# password: #{Server.Test.Password}
# prod:
# user: #{Server.Prod.User}
# password: #{Server.Prod.Password}
Use String#%:
input = %|
data['server']['host']['name'] = %{server_host}
data['server']['host']['user'] = %{server_host_user}
data['server']['host']['password'] = %{server_host_password}
|
puts (
input % {server_host: "Foo",
server_host_user: "Bar",
server_host_password: "Baz"})
#⇒ data['server']['host']['name'] = Foo
# data['server']['host']['user'] = Bar
# data['server']['host']['password'] = Baz
You can not add key-value pair to a string.
data['server']['host'] # => which results in a string
Option 1:
You can either save Server.Host as host name in the hash
data['server']['host']['name'] = "#{Server.Host}"
data['server']['host']['user'] = "#{Server.Host.User}"
data['server']['host']['password'] = "#{Server.Host.Password}"
Option 2:
You can construct the hash in a single step with Host as key.
data['server']['host'] = { "#{Server.Host}" => {
'user' => "#{Server.Host.User}",
'password' => "#{Server.Host.Password}"
}
}
I’ve just started using YAML (through pyyaml) and I was wondering if there is any way to state that the value of a key is the key name itself or the parent key.
For example
---
foo: &FOO
bar: !.
baz: !..
foo2:
<<: *FOO
…
{‘foo’: {‘bar’: ‘bar’, ‘baz’: ’foo’}, ‘foo2’:{‘bar’:’bar’, ‘baz’:’foo2’}}
(notice the dot and double dot on bar and baz respectively - those are just placeholders for getting the key name and parent key name)
I've tried using add_constructor:
def key_construct(loader, node):
# return the key here
pass
yaml.add_constructor('!.', key_construct)
but Node, doesn't hold the key (or a reference to the parent) and I couldn't find the way to get them.
EDIT:
So, here is my real use case and a solution based on Anthon's response:
I have a logger configuration file (in yaml), and I wanted to reuse some definitions there:
handlers:
base: &base_handler
(): globals.TimedRotatingFileHandlerFactory
name: ../
when: midnight
backupCount: 14
level: DEBUG
formatter: generic
syslog:
class: logging.handlers.SysLogHandler
address: ['localhost', 514]
facility: local5
level: NOTSET
formatter: syslog
access:
<<: *base_handler
error:
<<: *base_handler
loggers:
base: &base_logger
handlers: [../, syslog]
qualname: ../
access:
<<: *base_logger
error:
<<: *base_logger
handlers: [../, syslog, email]
The solution, as Anthon suggested was to traverse the configuration dictionary after is was being processed:
def expand_yaml(val, parent=None, key=None, key1=None, key2=None):
if isinstance(val, str):
if val == './':
parent[key] = key1
elif val == '../':
parent[key] = key2
elif isinstance(val, dict):
for k, v in val.items():
expand_yaml(v, val, k, k, key1)
elif isinstance(val, list):
parent[key] = val[:] # support inheritance of the list (as in *base_logger)
for index, e in enumerate(parent[key]):
expand_yaml(e, parent[key], index, key1, key2)
return val
You don't have much context when you are constructing an element, so you are not going to find your key, and certainly not the parent key, to fill in the values, without digging in the call stack for the context (the loader knows about foo, bar and baz, but not in a way you can use to determine which is the corresponding key or parent_key).
What I suggest you do is create a special node that you return with the key_construct and then after the YAML load, walk the structure that your yaml.load() returned. Unless you have other ! objects, which make it more difficult to walk the resulting combination than a pure combination of sequences/lists and mappings/dicts ¹:
import ruamel.yaml as yaml
yaml_str = """\
foo: &FOO
bar: !.
baz: !..
foo2:
<<: *FOO
"""
class Expander:
def __init__(self, tag):
self.tag = tag
def expand(self, key, parent_key):
if self.tag == '!.':
return key
elif self.tag == '!..':
return parent_key
raise NotImplementedError
def __repr__(self):
return "E({})".format(self.tag)
def expand(d, key=None, parent_key=None):
if isinstance(d, list):
for elem in d:
expand(elem, key=key, parent_key=parent_key)
elif isinstance(d, dict):
for k in d:
v = d[k]
if isinstance(v, Expander):
d[k] = v.expand(k, parent_key)
expand(d[k], key, parent_key=k)
return d
def key_construct(loader, node):
return Expander(node.tag)
yaml.add_constructor('!.', key_construct)
yaml.add_constructor('!..', key_construct)
data = yaml.load(yaml_str)
print(data)
print(expand(data))
gives you:
{'foo': {'bar': E(!.), 'baz': E(!..)}, 'foo2': {'bar': E(!.), 'baz': E(!..)}}
{'foo': {'bar': 'bar', 'baz': 'foo'}, 'foo2': {'bar': 'bar', 'baz': 'foo2'}}
¹ This was done using ruamel.yaml of which I am the author. PyYAML, of which ruamel.yaml is a functional superset, should work the same.
my goal:
check if yaml document include value for specific key using ypath/xpath
select value for specified key using ypath/xpath
document yaml:
app:
name: xxx
version: xxx
description:
author:
name: xxx
surname: xxx
email: xxx#xxx.xx
what was checked:*
google
stackoverflow
Ruby API (YAML::DBM as one of methods it provide is select)
example:
Module::Class.select('description/author/name')
Module::Class.select('*/name')
Module::Class.isset?('*/name')
Use yaml:
require 'yaml'
yml = YAML.load_file('your_file.yml')
Now yml is a hash. You can use it like one. Here is a simple and ugly solution for what you try:
if !yml["description"].nil? && !yml["description"]["author"].nil? && !yml["description"]["author"]["name"].nil? && !yml["description"]["author"]["name"].empty?
puts "An author is set!"
end
Since there are no up-to-date YPath implementations around, I would suggest to give a chance ActiveSupport and Nokogiri:
yml = LOAD_YML_WITH_YOUR_PREFERRED_YAML_ENGINE
# ActiveSupport adds a to_xml method to Hash
xml = yml.to_xml(:root => 'yaml')
doc = Nokogiri::XML(xml)
doc.xpath("description/author/name").map do |name|
puts [name['key'], name['value']]
end
I need to open a YAML file with aliases used inside it:
defaults: &defaults
foo: bar
zip: button
node:
<<: *defaults
foo: other
This obviously expands out to an equivalent YAML document of:
defaults:
foo: bar
zip: button
node:
foo: other
zip: button
Which YAML::load reads it as.
I need to set new keys in this YAML document and then write it back out to disk, preserving the original structure as much as possible.
I have looked at YAML::Store, but this completely destroys the aliases and anchors.
Is there anything available that could something along the lines of:
thing = Thing.load("config.yml")
thing[:node][:foo] = "yet another"
Saving the document back as:
defaults: &defaults
foo: bar
zip: button
node:
<<: *defaults
foo: yet another
?
I opted to use YAML for this due to the fact it handles this aliasing well, but writing YAML that contains aliases appears to be a bit of a bleak-looking playing field in reality.
The use of << to indicate an aliased mapping should be merged in to the current mapping isn’t part of the core Yaml spec, but it is part of the tag repository.
The current Yaml library provided by Ruby – Psych – provides the dump and load methods which allow easy serialization and deserialization of Ruby objects and use the various implicit type conversion in the tag repository including << to merge hashes. It also provides tools to do more low level Yaml processing if you need it. Unfortunately it doesn’t easily allow selectively disabling or enabling specific parts of the tag repository – it’s an all or nothing affair. In particular the handling of << is pretty baked in to the handling of hashes.
One way to achieve what you want is to provide your own subclass of Psych’s ToRuby class and override this method, so that it just treats mapping keys of << as literals. This involves overriding a private method in Psych, so you need to be a little careful:
require 'psych'
class ToRubyNoMerge < Psych::Visitors::ToRuby
def revive_hash hash, o
#st[o.anchor] = hash if o.anchor
o.children.each_slice(2) { |k,v|
key = accept(k)
hash[key] = accept(v)
}
hash
end
end
You would then use it like this:
tree = Psych.parse your_data
data = ToRubyNoMerge.new.accept tree
With the Yaml from your example, data would then look something like
{"defaults"=>{"foo"=>"bar", "zip"=>"button"},
"node"=>{"<<"=>{"foo"=>"bar", "zip"=>"button"}, "foo"=>"other"}}
Note the << as a literal key. Also the hash under the data["defaults"] key is the same hash as the one under the data["node"]["<<"] key, i.e. they have the same object_id. You can now manipulate the data as you want, and when you write it out as Yaml the anchors and aliases will still be in place, although the anchor names will have changed:
data['node']['foo'] = "yet another"
puts Yaml.dump data
produces (Psych uses the object_id of the hash to ensure unique anchor names (the current version of Psych now uses sequential numbers rather than object_id)):
---
defaults: &2151922820
foo: bar
zip: button
node:
<<: *2151922820
foo: yet another
If you want to have control over the anchor names, you can provide your own Psych::Visitors::Emitter. Here’s a simple example based on your example and assuming there’s only the one anchor:
class MyEmitter < Psych::Visitors::Emitter
def visit_Psych_Nodes_Mapping o
o.anchor = 'defaults' if o.anchor
super
end
def visit_Psych_Nodes_Alias o
o.anchor = 'defaults' if o.anchor
super
end
end
When used with the modified data hash from above:
#create an AST based on the Ruby data structure
builder = Psych::Visitors::YAMLTree.new
builder << data
ast = builder.tree
# write out the tree using the custom emitter
MyEmitter.new($stdout).accept ast
the output is:
---
defaults: &defaults
foo: bar
zip: button
node:
<<: *defaults
foo: yet another
(Update: another question asked how to do this with more than one anchor, where I came up with a possibly better way to keep anchor names when serializing.)
YAML has aliases and they can round-trip, but you disable it by hash merging. << as a mapping key seems a non-standard extension to YAML (both in 1.8's syck and 1.9's psych).
require 'rubygems'
require 'yaml'
yaml = <<EOS
defaults: &defaults
foo: bar
zip: button
node: *defaults
EOS
data = YAML.load yaml
print data.to_yaml
prints
---
defaults: &id001
zip: button
foo: bar
node: *id001
but the << in your data merges the aliased hash into a new one which is no longer an alias.
Have you try Psych ? Another question with psych here.
I'm generating my CircleCI config file with a Ruby script and ERB templates. My script parses and regenerates the YAML, so I wanted to preserve all the anchors. The anchors in my config all have the same name as the key that defines them, e.g.
docker_images:
docker_auth: &docker_auth
username: '$DOCKERHUB_USERNAME'
password: '$DOCKERHUB_TOKEN'
cimg_base_image: &cimg_base_image
image: cimg/base:2022.09
auth: *docker_auth
jobs:
tests:
docker:
- *cimg_ruby_image
So I was able to solve this with regular expressions on the generated YAML string. It wrote a #restore_yaml_anchors method that converts &1 and *1 back into &docker_auth and *docker_auth.
# Ruby 3.1.2
require 'rubygems'
require 'yaml'
yaml = <<EOS
docker_images:
docker_auth: &docker_auth
username: '$DOCKERHUB_USERNAME'
password: '$DOCKERHUB_TOKEN'
cimg_base_image: &cimg_base_image
image: cimg/base:2022.09
auth: *docker_auth
jobs:
tests:
docker:
- *cimg_base_image
EOS
data = YAML.load yaml, aliases: true # needed for Ruby 3.x
def restore_yaml_anchors(yaml)
yaml.scan(/([A-Z0-9a-z_]+|<<): &(\d+)/).each do |anchor_name, anchor_id|
yaml.gsub!(/([:-]) (\*|&)#{anchor_id}/, "\\1 \\2#{anchor_name}")
end
yaml
end
puts [
"Original #to_yaml:",
data.to_yaml,
"-----------------------", '',
"With restored anchors:",
restore_yaml_anchors(data.to_yaml)
].join("\n")
Output:
Original #to_yaml:
---
docker_images:
docker_auth: &1
username: "$DOCKERHUB_USERNAME"
password: "$DOCKERHUB_TOKEN"
cimg_base_image: &2
image: cimg/base:2022.09
auth: *1
jobs:
tests:
docker:
- *2
-----------------------
With restored anchors:
---
docker_images:
docker_auth: &docker_auth
username: "$DOCKERHUB_USERNAME"
password: "$DOCKERHUB_TOKEN"
cimg_base_image: &cimg_base_image
image: cimg/base:2022.09
auth: *docker_auth
jobs:
tests:
docker:
- *cimg_base_image
It's working well for my CI config, but you may need to update it to handle some other cases in your own YAML.
In the following Ruby example, is there a mode to have YAML NOT silently ignore the duplicate key 'one'?
irb(main):001:0> require 'yaml'
=> true
irb(main):002:0> str = '{ one: 1, one: 2 }'
=> "{ one: 1, one: 2 }"
irb(main):003:0> YAML.load(str)
=> {"one"=>2}
Thanks!
Using Psych, you can traverse the AST tree to find duplicate keys. I'm using the following helper method in my test suite to validate that there are no duplicate keys in my i18n translations:
def duplicate_keys(file_or_content)
yaml = file_or_content.is_a?(File) ? file_or_content.read : file_or_content
duplicate_keys = []
validator = ->(node, parent_path) do
if node.is_a?(Psych::Nodes::Mapping)
children = node.children.each_slice(2) # In a Mapping, every other child is the key node, the other is the value node.
duplicates = children.map { |key_node, _value_node| key_node }.group_by(&:value).select { |_value, nodes| nodes.size > 1 }
duplicates.each do |key, nodes|
duplicate_key = {
file: (file_or_content.path if file_or_content.is_a?(File)),
key: parent_path + [key],
occurrences: nodes.map { |occurrence| "line: #{occurrence.start_line + 1}" },
}.compact
duplicate_keys << duplicate_key
end
children.each { |key_node, value_node| validator.call(value_node, parent_path + [key_node.try(:value)].compact) }
else
node.children.to_a.each { |child| validator.call(child, parent_path) }
end
end
ast = Psych.parse_stream(yaml)
validator.call(ast, [])
duplicate_keys
end
There is a solution involving a linter, but I'm not sure it will be relevant to
you since it's not a 100% Ruby solution. I'll post it anyway since I don't know
any way to do this in Ruby:
You can use the yamllint command-line tool:
sudo pip install yamllint
Specifically, it has a rule key-duplicates that detects duplicated keys:
$ cat test.yml
{ one: 1, one: 2 }
$ yamllint test.yml
test.yml
1:11 error duplication of key "one" in mapping (key-duplicates)
One of the things I do to help maintain the YAML files I use, is write code to initially generate it from a known structure in Ruby. That gets me started.
Then, I'll write a little snippet that loads it and outputs what it parsed using either PrettyPrint or Awesome Print so I can compare that to the file.
I also sort the fields as necessary to make it easy to look for duplicates.
No. You'd have to decide how to rename the keys since hash keys have to be unique - I'd go for some workaround like manually looking for keys that are the same and renaming them before you do a YAML::load.