Can I manipulate yaml files and write them out again - ruby

I have a map of values: the key is a filename and the value is an array of strings.
I have the corresponding files.
How would I load each file and create a fixed YAML key that contains the values from the array, whether or not that key already exists?
e.g.
YAML (file.yaml)
trg::azimuth:
- extra
- intra
- lateral
or
trg::azimuth:
  [extra, intra, lateral]
from
RUBY
{"file.yaml" => ["extra","intra","lateral"]}

The YAML documentation doesn't cover its methods very well, but it does say:
The underlying implementation is the libyaml wrapper Psych.
The Psych documentation, which underlies YAML, covers reading, parsing, and emitting YAML.
Here's the basic process:
require 'yaml'
foo = {"file.yaml" => ["extra","intra","lateral"]}
bar = foo.to_yaml
# => "---\nfile.yaml:\n- extra\n- intra\n- lateral\n"
And here's what the generated, serialized bar variable looks like if written:
puts bar
# >> ---
# >> file.yaml:
# >> - extra
# >> - intra
# >> - lateral
That's the format a YAML parser needs:
baz = YAML.load(bar)
baz
# => {"file.yaml"=>["extra", "intra", "lateral"]}
At this point the hash has gone round-trip, from a Ruby hash, to a YAML-serialized string, back to a Ruby hash.
Writing YAML to a file is easy using Ruby's File.write method:
File.write(foo.keys.first, foo.values.first.to_yaml)
or
foo.each do |k, v|
  File.write(k, v.to_yaml)
end
Either approach results in a file named "file.yaml" that contains:
---
- extra
- intra
- lateral
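The question itself asks how to update a key inside files that already exist rather than overwrite them. Combining YAML.load_file (for reading) with File.write gives a minimal sketch: load each file, set `trg::azimuth` to the array from the map whether or not the key is already there, and write the file back. `update_yaml_key` is a hypothetical helper name, and the sketch assumes each file holds a YAML mapping; a missing, empty, or non-mapping file is replaced by a fresh hash.

```ruby
require 'yaml'

# Hypothetical helper: set `key` to the given array in each file,
# overwriting any existing value, then write the file back out.
def update_yaml_key(files_to_values, key)
  files_to_values.each do |filename, values|
    data = File.exist?(filename) ? YAML.load_file(filename) : nil
    # Assumption: each file holds a mapping; anything else starts fresh.
    data = {} unless data.is_a?(Hash)
    data[key] = values # creates the key or replaces its old value
    File.write(filename, data.to_yaml)
  end
end
```

Called as `update_yaml_key({"file.yaml" => ["extra", "intra", "lateral"]}, "trg::azimuth")`, any other keys already in the file are kept, and the array is written in block style (one `- extra` line per item).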
To read and parse a file, use YAML's load_file method.
foo = YAML.load_file('file.yaml')
# => ["extra", "intra", "lateral"]
"How do I parse a YAML file?" might also be of use.

Related

CSV.generate and converters?

I'm trying to create a converter to remove newline characters from CSV output.
I've got:
nonewline=lambda do |s|
  s.gsub(/(\r?\n)+/,' ')
end
I've verified that this works properly IF I load a variable and then run something like:
csv=CSV(variable,:converters=>[nonewline])
However, I'm attempting to use this code to update a bunch of preexisting code using CSV.generate, and it does not appear to work at all.
CSV.generate(:converters=>[nonewline]) do |csv|
  csv << ["hello\ngoodbye"]
end
returns:
"\"hello\ngoodbye\"\n"
I've tried quite a few things as well as trying other examples I've found online, and it appears as though :converters has no effect when used with CSV.generate.
Is this correct, or is there something I'm missing?
You can register your converter under a name, as below:
CSV::Converters[:nonewline] = lambda do |s|
  s.gsub(/(\r?\n)+/,' ')
end
Then do:
CSV.generate(:converters => [:nonewline]) do |csv|
  csv << ["hello\ngoodbye"]
end
Read the documentation for Converters.
I've left the part above in to show how to register a custom CSV converter by name; passing the lambda directly, as you did, is also accepted. The real issue lies elsewhere.
Read the documentation of CSV::generate:
This method wraps a String you provide, or an empty default String, in a CSV object which is passed to the provided block. You can use the block to append CSV rows to the String and when the block exits, the final String will be returned.
After reading the docs, it is quite clear that this method is for writing CSV data (into a String), not for reading it. The converter options (like :converters and :header_converters) are applied when you read CSV data, but not when you write it.
Let me show you two examples to illustrate this more clearly.
require 'csv'
string = <<_
foo,bar
baz,quack
_
File.write('a',string)
CSV::Converters[:upcase] = lambda do |s|
  s.upcase
end
Here I am reading from a CSV file, so the :converters option is applied:
CSV.open('a','r',:converters => :upcase) do |csv|
  puts csv.read
end
Output:
# >> FOO
# >> BAR
# >> BAZ
# >> QUACK
Now I am writing into the CSV file, so the :converters option is not applied:
CSV.open('a','w',:converters => :upcase) do |csv|
  csv << ['dog','cat']
end
CSV.read('a') # => [["dog", "cat"]]
Attempting to remove newlines using :converters did not work.
I had to override the << method in csv.rb, adding the following code to it:
# Change all CR/NL's into one space
row.map! { |element|
  if element.is_a?(String)
    element.gsub(/(\r?\n)+/,' ')
  else
    element
  end
}
Placed right before
output = row.map(&@quote).join(@col_sep) + @row_sep # quote and separate
at line 21.
I would think this would be a good patch to CSV, as embedded newlines, although legal inside quoted fields, cause problems for consumers that read CSV one line at a time.
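Patching the internals of csv.rb is fragile, because that code changes between Ruby versions. A less invasive option is to clean each row before appending it. A minimal sketch, where `strip_newlines` is a hypothetical helper name:

```ruby
require 'csv'

# Hypothetical helper: replace each run of CR/LF in String fields with one
# space; non-String fields (numbers, nil) pass through unchanged.
def strip_newlines(row)
  row.map { |field| field.is_a?(String) ? field.gsub(/(\r?\n)+/, ' ') : field }
end

csv_string = CSV.generate do |csv|
  csv << strip_newlines(["hello\ngoodbye", 42, nil])
end
# csv_string is "hello goodbye,42,\n"
```

Because the cleanup happens outside CSV, it keeps working across CSV versions and can be applied only to the rows or exports that need it.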

How can I control the output formats used by Ruby CSV?

I'd like to be able to change the date and time formats used by CSV when generating csv output. For example, instead of generating '2004-1-30' for a date, I'd like it to generate '1/30/2004'.
How can I do that?
Here is a complete example:
require 'csv'
require 'date'
str = <<_
2004-1-30,foo
2004-11-20,bar
_
File.write('a',str)
CSV::Converters[:cdate] = lambda do |s|
  begin
    Date.strptime(s,"%Y-%m-%d").strftime("%-m/%d/%Y")
  rescue ArgumentError
    s
  end
end
CSV.foreach('a',:converters => :cdate) do |row|
  p row
end
# >> ["1/30/2004", "foo"]
# >> ["11/20/2004", "bar"]
Look at the documentation of Converters.
An Array of names from the Converters Hash and/or lambdas that handle custom conversion. A single converter doesn’t have to be in an Array. All built-in converters try to transcode fields to UTF-8 before converting. The conversion will fail if the data cannot be transcoded, leaving the field unchanged.
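The converter above runs while reading a file. When generating CSV, one straightforward option is to format Date values with strftime before appending the row; otherwise CSV writes a Date using its default to_s (for example 2004-01-30). A minimal sketch, with `format_dates` as a hypothetical helper:

```ruby
require 'csv'
require 'date'

# Hypothetical helper: render Date fields as m/d/yyyy (no zero padding);
# other fields are left for CSV to stringify as usual.
def format_dates(row)
  row.map { |field| field.is_a?(Date) ? field.strftime('%-m/%-d/%Y') : field }
end

output = CSV.generate do |csv|
  csv << format_dates([Date.new(2004, 1, 30), 'foo'])
  csv << format_dates([Date.new(2004, 11, 20), 'bar'])
end
# output is "1/30/2004,foo\n11/20/2004,bar\n"
```

DateTime is a subclass of Date, so it is caught by the same check; Time values are not and would need their own branch.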

Read and write YAML files without destroying anchors and aliases?

I need to open a YAML file with aliases used inside it:
defaults: &defaults
  foo: bar
  zip: button
node:
  <<: *defaults
  foo: other
This obviously expands out to an equivalent YAML document of:
defaults:
  foo: bar
  zip: button
node:
  foo: other
  zip: button
This is how YAML::load reads it.
I need to set new keys in this YAML document and then write it back out to disk, preserving the original structure as much as possible.
I have looked at YAML::Store, but this completely destroys the aliases and anchors.
Is there anything available that could do something along the lines of:
thing = Thing.load("config.yml")
thing[:node][:foo] = "yet another"
Saving the document back as:
defaults: &defaults
  foo: bar
  zip: button
node:
  <<: *defaults
  foo: yet another
?
I chose YAML for this because it handles aliasing well, but writing YAML that contains aliases turns out to be a rather bleak-looking playing field in practice.
The use of << to indicate that an aliased mapping should be merged into the current mapping isn't part of the core Yaml spec, but it is part of the tag repository.
The current Yaml library provided by Ruby, Psych, provides the dump and load methods, which allow easy serialization and deserialization of Ruby objects and use the various implicit type conversions in the tag repository, including << to merge hashes. It also provides tools for lower-level Yaml processing if you need them. Unfortunately, it doesn't easily allow selectively disabling or enabling specific parts of the tag repository; it's an all-or-nothing affair. In particular, the handling of << is pretty baked into the handling of hashes.
One way to achieve what you want is to provide your own subclass of Psych's ToRuby class and override its revive_hash method, so that it just treats mapping keys of << as literals. This involves overriding a private method in Psych, so you need to be a little careful:
require 'psych'
class ToRubyNoMerge < Psych::Visitors::ToRuby
  # newer Psych versions pass a third (tagged) argument to revive_hash
  def revive_hash hash, o, _tagged = false
    @st[o.anchor] = hash if o.anchor
    o.children.each_slice(2) { |k,v|
      key = accept(k)
      hash[key] = accept(v)
    }
    hash
  end
end
You would then use it like this:
tree = Psych.parse your_data
data = ToRubyNoMerge.create.accept tree # older Psych versions: ToRubyNoMerge.new.accept tree
With the Yaml from your example, data would then look something like
{"defaults"=>{"foo"=>"bar", "zip"=>"button"},
"node"=>{"<<"=>{"foo"=>"bar", "zip"=>"button"}, "foo"=>"other"}}
Note the << as a literal key. Also the hash under the data["defaults"] key is the same hash as the one under the data["node"]["<<"] key, i.e. they have the same object_id. You can now manipulate the data as you want, and when you write it out as Yaml the anchors and aliases will still be in place, although the anchor names will have changed:
data['node']['foo'] = "yet another"
puts Psych.dump data
produces the following (Psych used the object_id of the hash to ensure unique anchor names; the current version of Psych uses sequential numbers instead):
---
defaults: &2151922820
  foo: bar
  zip: button
node:
  <<: *2151922820
  foo: yet another
If you want to have control over the anchor names, you can provide your own Psych::Visitors::Emitter. Here’s a simple example based on your example and assuming there’s only the one anchor:
class MyEmitter < Psych::Visitors::Emitter
  def visit_Psych_Nodes_Mapping o
    o.anchor = 'defaults' if o.anchor
    super
  end
  def visit_Psych_Nodes_Alias o
    o.anchor = 'defaults' if o.anchor
    super
  end
end
When used with the modified data hash from above:
# create an AST based on the Ruby data structure
builder = Psych::Visitors::YAMLTree.create
builder << data
ast = builder.tree
# write out the tree using the custom emitter
MyEmitter.new($stdout).accept ast
the output is:
---
defaults: &defaults
  foo: bar
  zip: button
node:
  <<: *defaults
  foo: yet another
(Update: another question asked how to do this with more than one anchor, where I came up with a possibly better way to keep anchor names when serializing.)
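For reference, here is a self-contained sketch of the whole round trip on a recent Psych. It assumes the `Psych::Visitors::ToRuby.create` and `Psych::Visitors::YAMLTree.create` factory methods that newer Psych versions use in place of a bare `new`, and a private `revive_hash` that may receive a third argument; these are internal details that can change between releases. Like the emitter above, it hard-codes the single anchor name `defaults`.

```ruby
require 'psych'
require 'stringio'

# Load a document without applying `<<` merges, keeping shared hashes shared.
class ToRubyNoMerge < Psych::Visitors::ToRuby
  private

  # Overrides a private Psych method; newer versions pass a third argument.
  def revive_hash(hash, o, _tagged = false)
    o.children.each_slice(2) { |k, v| hash[accept(k)] = accept(v) }
    hash
  end
end

# Rename every anchor and alias to 'defaults' (assumes a single anchor).
class MyEmitter < Psych::Visitors::Emitter
  def visit_Psych_Nodes_Mapping(o)
    o.anchor = 'defaults' if o.anchor
    super
  end

  def visit_Psych_Nodes_Alias(o)
    o.anchor = 'defaults' if o.anchor
    super
  end
end

source = <<~YAML
  defaults: &defaults
    foo: bar
    zip: button
  node:
    <<: *defaults
    foo: other
YAML

data = ToRubyNoMerge.create.accept(Psych.parse(source))
data['node']['foo'] = 'yet another'

builder = Psych::Visitors::YAMLTree.create
builder << data
out = StringIO.new
MyEmitter.new(out).accept(builder.tree)
puts out.string
```

The printed document should use `&defaults` and `*defaults` again, with `foo: yet another` under `node`. Depending on the Psych version, the `<<` key itself may be emitted quoted, so it is worth checking the output against the version you run.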
YAML aliases can round-trip, but hash merging with << defeats that. Using << as a mapping key seems to be a non-standard extension to YAML (in both 1.8's Syck and 1.9's Psych).
require 'rubygems'
require 'yaml'
yaml = <<EOS
defaults: &defaults
  foo: bar
  zip: button
node: *defaults
EOS
data = YAML.load yaml # on Ruby 3.1+ (Psych 4) use YAML.load(yaml, aliases: true)
print data.to_yaml
prints
---
defaults: &id001
  zip: button
  foo: bar
node: *id001
but the << in your data merges the aliased hash into a new one which is no longer an alias.
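On current Ruby the same experiment needs aliases to be enabled explicitly, because Psych 4 (Ruby 3.1+) rejects them by default. A minimal sketch using `YAML.safe_load` with `aliases: true`, assuming Psych 3.1 or later (where that keyword exists):

```ruby
require 'yaml'

source = <<~YAML
  defaults: &defaults
    foo: bar
    zip: button
  node: *defaults
YAML

# aliases: true lets the alias resolve to the very same Hash object.
data = YAML.safe_load(source, aliases: true)
shared = data['node'].equal?(data['defaults']) # one object under two keys

# Dumping a shared object emits an anchor and an alias again
# (recent Psych numbers anchors 1, 2, ... instead of id001).
yaml_out = data.to_yaml
puts yaml_out
```

Because no `<<` merge is involved, the alias survives: `node` still points at the same hash as `defaults`, so the dump writes it as an alias rather than a copy.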
Have you tried Psych? There is another question about Psych here.
I'm generating my CircleCI config file with a Ruby script and ERB templates. My script parses and regenerates the YAML, so I wanted to preserve all the anchors. The anchors in my config all have the same name as the key that defines them, e.g.
docker_images:
  docker_auth: &docker_auth
    username: '$DOCKERHUB_USERNAME'
    password: '$DOCKERHUB_TOKEN'
  cimg_base_image: &cimg_base_image
    image: cimg/base:2022.09
    auth: *docker_auth
jobs:
  tests:
    docker:
      - *cimg_base_image
So I was able to solve this with regular expressions on the generated YAML string. I wrote a #restore_yaml_anchors method that converts &1 and *1 back into &docker_auth and *docker_auth.
# Ruby 3.1.2
require 'rubygems'
require 'yaml'
yaml = <<EOS
docker_images:
  docker_auth: &docker_auth
    username: '$DOCKERHUB_USERNAME'
    password: '$DOCKERHUB_TOKEN'
  cimg_base_image: &cimg_base_image
    image: cimg/base:2022.09
    auth: *docker_auth
jobs:
  tests:
    docker:
      - *cimg_base_image
EOS
data = YAML.load yaml, aliases: true # needed for Ruby 3.x
def restore_yaml_anchors(yaml)
  yaml.scan(/([A-Z0-9a-z_]+|<<): &(\d+)/).each do |anchor_name, anchor_id|
    # (?!\d) keeps anchor 1 from also matching &10, *12, and so on
    yaml.gsub!(/([:-]) (\*|&)#{anchor_id}(?!\d)/, "\\1 \\2#{anchor_name}")
  end
  yaml
end
puts [
  "Original #to_yaml:",
  data.to_yaml,
  "-----------------------", '',
  "With restored anchors:",
  restore_yaml_anchors(data.to_yaml)
].join("\n")
Output:
Original #to_yaml:
---
docker_images:
  docker_auth: &1
    username: "$DOCKERHUB_USERNAME"
    password: "$DOCKERHUB_TOKEN"
  cimg_base_image: &2
    image: cimg/base:2022.09
    auth: *1
jobs:
  tests:
    docker:
    - *2
-----------------------
With restored anchors:
---
docker_images:
  docker_auth: &docker_auth
    username: "$DOCKERHUB_USERNAME"
    password: "$DOCKERHUB_TOKEN"
  cimg_base_image: &cimg_base_image
    image: cimg/base:2022.09
    auth: *docker_auth
jobs:
  tests:
    docker:
    - *cimg_base_image
It's working well for my CI config, but you may need to update it to handle some other cases in your own YAML.

Tabbed text file to MultiDimensional hash using Ruby?

I'm having a bit of trouble figuring out how I'd go about this for part of my project. Basically, I need to take a normal tab-indented text file and convert it into a multidimensional hash in Ruby, so I can cycle through it and detect which parts have children. An example of the file:
hello
	world
	how
are
	you
		today
Would become:
{'hello' => ['world', 'how'], 'are' => {'you' => ['today']}}
Since your input format is up to you, I really don't understand why you're not using YAML:
require 'yaml'
puts({ 'hello' => ['world', 'how'], 'are' => { 'you' => ['today'] } }.to_yaml)
yields:
---
hello:
- world
- how
are:
  you:
  - today
Calling YAML.load with that string, of course, returns the original data structure. Contrary to what you believe, YAML does not require a "key value syntax".
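If the tab-indented format has to stay, the conversion the question asks for can be sketched in plain Ruby. This assumes one tab per nesting level and reproduces the structure in the question by turning a level whose items have no children into an Array and any other level into a Hash; `parse_tabbed` and `to_structure` are hypothetical helper names.

```ruby
# Hypothetical helper: turn tab-indented lines into a tree of
# { name:, children: } nodes, then into Arrays/Hashes.
def parse_tabbed(text)
  root = { name: nil, children: [] }
  stack = [root] # stack[d] is the parent for a line indented d tabs
  text.each_line do |line|
    next if line.strip.empty?
    depth = line[/\A\t*/].size
    node = { name: line.strip, children: [] }
    stack = stack.first(depth + 1)
    stack.last[:children] << node
    stack << node
  end
  to_structure(root[:children])
end

# A level whose items are all leaves becomes an Array of names;
# any other level becomes a Hash of name => nested structure.
def to_structure(nodes)
  if nodes.all? { |n| n[:children].empty? }
    nodes.map { |n| n[:name] }
  else
    nodes.to_h { |n| [n[:name], to_structure(n[:children])] }
  end
end
```

With the example file this returns `{'hello' => ['world', 'how'], 'are' => {'you' => ['today']}}`. A leaf that sits next to siblings with children ends up mapped to an empty Array, which is one reason a uniform format such as YAML is easier to work with.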

Is it possible to specify formatting options for to_yaml in ruby?

The code
require 'yaml'
puts YAML.load("
is_something:
  values: ['yes', 'no']
").to_yaml
produces
---
is_something:
  values:
  - "yes"
  - "no"
While this is correct YAML, it just looks ugly when you have a hash of arrays. Is there a way to get to_yaml to produce the inline array version of the YAML?
An options hash can be passed to to_yaml but how do you use it?
Edit 0: Thanks Pozsár Balázs. But, as of ruby 1.8.7 (2009-04-08 patchlevel 160), the options hash does not work as advertised. :(
irb
irb(main):001:0> require 'yaml'
=> true
irb(main):002:0> puts [[ 'Crispin', 'Glover' ]].to_yaml( :Indent => 4, :UseHeader => true, :UseVersion => true )
---
- - Crispin
  - Glover
=> nil
About the hash options: see http://yaml4r.sourceforge.net/doc/page/examples.htm
Ex. 24: Using to_yaml with an options Hash
puts [[ 'Crispin', 'Glover' ]].to_yaml( :Indent => 4, :UseHeader => true, :UseVersion => true )
# prints:
# --- %YAML:1.0
# -
#     - Crispin
#     - Glover
Ex. 25: Available symbols for an options Hash
Indent: The default indentation to use when emitting (defaults to 2)
Separator: The default separator to use between documents (defaults to '---')
SortKeys: Sort Hash keys when emitting? (defaults to false)
UseHeader: Display the YAML header when emitting? (defaults to false)
UseVersion: Display the YAML version when emitting? (defaults to false)
AnchorFormat: A formatting string for anchor IDs when emitting (defaults to 'id%03d')
ExplicitTypes: Use explicit types when emitting? (defaults to false)
BestWidth: The character width to use when folding text (defaults to 80)
UseFold: Force folding of text when emitting? (defaults to false)
UseBlock: Force all text to be literal when emitting? (defaults to false)
Encoding: Unicode format to encode with (defaults to :Utf8; requires Iconv)
Starting with Ruby 1.9, Psych is used as the default YAML engine. It supports a few options: http://ruby-doc.org/stdlib-2.1.0/libdoc/psych/rdoc/Psych/Handler/DumperOptions.html
So for me it works:
irb(main):001:0> require 'yaml'
=> true
irb(main):002:0> puts [{'a'=> 'b', 'c'=> 'd'}, {'e'=> 'f', 'g'=>'h'}].to_yaml(:indentation => 4)
---
- a: b
c: d
- e: f
g: h
This ugly hack seems to do the trick...
class Array
  def to_yaml_style
    :inline
  end
end
Browsing through ruby's source, I can't find any options I could pass to achieve the same. Default options are described in the lib/yaml/constants.rb.
Just another hack to specify the output style, but this one lets you customize it per object instead of globally (e.g. for all arrays).
https://gist.github.com/jirutka/31b1a61162e41d5064fc
Simple example:
class Movie
  attr_accessor :genres, :actors
  # method called by psych to render YAML
  def encode_with(coder)
    # render array inline (flow style)
    coder['genres'] = StyledYAML.inline(genres) if genres
    # render in default style (block)
    coder['actors'] = actors if actors
  end
end
The latest versions of Ruby use the Psych module for YAML parsing. There aren't many options you can pass, but you can change the indentation and line width. Check the latest Psych documentation for more details.
Use Psych directly.
Indentation has no visible effect here (2 is already the default indentation):
my_yaml.to_yaml(:indentation => 2)
Indentation works:
Psych.dump(my_yaml, :indentation => 8)
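As far as I can tell, none of the Psych.dump options switch a collection to flow style, but Psych exposes the node tree it builds before emitting. A minimal sketch, assuming the `Psych::Visitors::YAMLTree.create` factory of newer Psych versions: build the tree, mark every sequence node as flow style, then emit it. Unlike the global `to_yaml_style` hack, this only affects the document being dumped.

```ruby
require 'psych'

data = { 'is_something' => { 'values' => ['yes', 'no'] } }

# Build Psych's node tree for the data without emitting it yet.
builder = Psych::Visitors::YAMLTree.create
builder << data
tree = builder.tree

# Nodes are Enumerable (depth-first), so every Sequence can be restyled.
tree.grep(Psych::Nodes::Sequence).each do |seq|
  seq.style = Psych::Nodes::Sequence::FLOW
end

puts tree.yaml
```

This should print `values:` followed by a bracketed, inline list; how the strings are quoted depends on the Psych version. Filtering the `grep` results further (for example by the parent key) would restyle only selected arrays.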
