How to include metadata in a template file? - ruby

I have a system that filters template files through erb. Using convention over configuration, the output files get created in a file hierarchy that mirrors the input files. Many of the files have the same names, and I was able to use the directories to differentiate them.
That plan worked until I needed to associate additional info with each file. So I created a YAML file in each directory with the metadata. Now I have both convention and configuration. Yuck.
Then I learned about Webby, and the way it includes a YAML metadata section at the top of each template file. It looks like this:
---
title: Baxter the Dog
filter: textile
---
All the best little blogs use Webby.
If I could implement a header like that, I could ditch my hierarchy and the separate YAML files. The Webby implementation is very generic, implementing a new MetaFile class that separates the header from the "real text", but it seems more complicated than I need.
Putting the metadata in an erb comment seems good -- it will be automatically ignored by erb, but I'm not sure how to access the comment data.
<%#
title: Baxter the Dog
%>
Is there a way to access the erb comments? Or maybe a different approach? A lot of my templates do a bunch of erb stuff, but I could run erb in a separate step if it makes the rest easier.
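Both header styles above can be read with a regular expression before ERB renders the template. The sketch below is not Webby's MetaFile implementation. The helper names `split_front_matter` and `comment_metadata` and the sample templates are hypothetical, and it assumes the header is the very first thing in the file.

```ruby
require 'yaml'
require 'erb'

# Hypothetical helpers; neither is Webby's MetaFile API.

# Webby-style header: a YAML block between two "---" lines at the very top.
FRONT_MATTER = /\A---[ \t]*\r?\n(.*?)^---[ \t]*\r?\n/m

# ERB-comment header: YAML inside a <%# ... %> comment at the very top.
# Assumes the YAML itself never contains "%>".
COMMENT_HEADER = /\A<%#(.*?)%>/m

# Returns [metadata_hash, template_body_without_header].
def split_front_matter(text)
  match = FRONT_MATTER.match(text)
  return [{}, text] unless match
  [YAML.safe_load(match[1]) || {}, match.post_match]
end

# Returns only the metadata Hash; the template can be rendered unchanged
# because ERB already drops <%# ... %> comments from its output.
def comment_metadata(text)
  match = COMMENT_HEADER.match(text)
  match ? (YAML.safe_load(match[1]) || {}) : {}
end

webby = "---\ntitle: Baxter the Dog\nfilter: textile\n---\n" \
        "<%= metadata['title'] %>: All the best little blogs use Webby.\n"
metadata, body = split_front_matter(webby)
# Expose the parsed header to the template as a local named `metadata`.
puts ERB.new(body).result_with_hash(metadata: metadata)
# prints "Baxter the Dog: All the best little blogs use Webby."

erb_commented = "<%#\ntitle: Baxter the Dog\n%>\nAll the best little blogs use Webby.\n"
p comment_metadata(erb_commented)
puts ERB.new(erb_commented).result
```

The front-matter variant strips the header before ERB runs, so the template never has to know about it; the comment variant leaves the file valid ERB, at the cost of a separate regex pass to read the data.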

How about dumping your content as YAML too? Presumably the metadata is simply a Hash dumped to YAML, so you could append the content as a string in a second YAML document in the same file:
---
title: Baxter the Dog
filter: textile
--- |
  Content line 1
  Content line 2
  Content line 3
Dumping is as simple as:
require 'yaml'

File.open('file.txt', 'w') do |output|
  YAML.dump(metadata, output) # first document: the metadata Hash
  YAML.dump(content, output)  # second document: the content String
end
Loading is as simple as:
metadata, content = File.open('file.txt') do |input|
  # YAML.load_stream returns an Array with one element per document
  YAML.load_stream(input)
end
Note that the pipe character appears in the YAML so that newlines in the content string are preserved.

Related

Why --- (3 dashes/hyphens) in a YAML file?

So I just started using a YAML file instead of application.properties, as it is more readable. I see that YAML files start with ---. I googled it and found the explanation below:
YAML uses three dashes (“---”) to separate directives from document
content. This also serves to signal the start of a document if no
directives are present.
I also tried a sample without --- and found that the dashes are not mandatory.
I don't think I clearly understand what a directive and a document are. Can anyone please explain with a simple example?
As you already found out, the three dashes --- are used to signal the start of a document. There are two cases:
To signal the document start after directives, i.e., %YAML or %TAG lines according to the current spec. For example:
%YAML 1.2
%TAG !foo! !foo-types/
---
myKey: myValue
To signal the document start when you have multiple YAML documents in the same stream, e.g., in one YAML file:
doc 1
---
doc 2
If doc 2 has some preceding directives, then we have to use three dots ... to indicate the end of doc 1 (and the start of potential directives preceding doc 2) to the parser. For example:
doc 1
...
%TAG !bar! !bar-types/
---
doc 2
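In Ruby, Psych's `YAML.load_stream` returns one object per document, which makes the cases above easy to check. A minimal sketch, assuming the stock Psych/libyaml backend; the `%YAML 1.1` line stands in for the directive because libyaml implements YAML 1.1, and the sample strings are made up for illustration.

```ruby
require 'yaml'

# A single document: the leading "---" is optional.
single = "myKey: myValue\n"

# Two documents in one stream, separated only by "---".
two_docs = "doc 1\n---\ndoc 2\n"

# "..." ends doc 1 so that a directive can precede doc 2,
# and "---" is then required to start doc 2.
with_directive = "doc 1\n...\n%YAML 1.1\n---\ndoc 2\n"

p YAML.load(single)
p YAML.load_stream(two_docs)       # ["doc 1", "doc 2"]
p YAML.load_stream(with_directive) # ["doc 1", "doc 2"]
```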
The spec is good for yaml parser implementers. However, I find this article easier to read from a user perspective.
The dashes are not mandatory unless your YAML begins with a directive; in that case you must use them.
Let's take a look at the specification:
3.2.3.4. Directives
Each document may be associated with a set of directives. A directive has a name and an optional sequence of
parameters. Directives are instructions to the YAML processor, and
like all other presentation details are not reflected in the YAML
serialization tree or representation graph. This version of YAML
defines a two directives, “YAML” and “TAG”. All other directives are
reserved for future versions of YAML.
An example can also be found in the specification's description of the YAML directive:
%YAML 1.2 # Attempt parsing
          # with a warning
---
"foo"

Can't YAML::load a YAML::dump'ed XML value

When trying to YAML::load a value produced by YAML::dump, I get the error "did not find expected key while parsing a block mapping at line 1 column 1".
The YAML::dump value was written to an XML file as:
<format_store>---:text_formatting: '':url_pattern: ''</format_store>
If I look into the database, it is a text field with line breaks in it.
---
:text_formatting: ''
:url_pattern: ''
So it looks like the conversion from YAML::dump into the XML format dropped the line breaks.
I explicitly use the YAML::dump format for text fields. I assumed that XML does not allow line breaks in element values, that they would have to be escaped in some way, and that YAML would take care of that.
Is there a better way to dump/load text fields, or is there something I'm missing here?
Option 1: Wrap the YAML content in a CDATA section (<![CDATA[ ... ]]>), as suggested in Adding a new line/break tag in XML.
Option 2: Configure your YAML library to dump mappings using flow style (e.g., {':text_formatting': '', ':url_pattern': ''}). The exact method for accomplishing this depends on the YAML library you are using and may require a bit of custom coding.
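With Ruby's Psych, `YAML.dump` writes non-empty Hashes in block style by default. One workaround is to dump normally, parse the output into Psych's node tree, and switch every mapping and sequence node to flow style before emitting it again. A minimal sketch; `dump_flow` is a hypothetical helper, and it uses string keys because symbol keys such as `:text_formatting` may be quoted in flow style and come back as plain strings.

```ruby
require 'yaml'

# Hypothetical helper: re-emit a dumped object with flow-style collections,
# so a Hash of short scalars fits on a single line.
def dump_flow(object)
  stream = Psych.parse_stream(YAML.dump(object)) # Psych::Nodes::Stream
  stream.each do |node| # depth-first walk over every node in the tree
    case node
    when Psych::Nodes::Mapping  then node.style = Psych::Nodes::Mapping::FLOW
    when Psych::Nodes::Sequence then node.style = Psych::Nodes::Sequence::FLOW
    end
  end
  stream.to_yaml
end

settings = { 'text_formatting' => '', 'url_pattern' => '' }
flow = dump_flow(settings)
puts flow
```

Because the result contains no line breaks apart from the trailing one, losing line breaks on the way into the XML element no longer breaks `YAML.load`.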

How to edit a YAML file using Ruby without changing the comments and the indentation

I was trying to edit a key/value pair in a YAML file, but the indentation and the comments from the original file are not preserved.
How do I fix it?
Comments can't be retained when you parse a YAML file into an object, because Ruby objects, whether Arrays or Hashes, have no way of storing comments. Comments only exist in source code or in data files whose format supports them, and in either case they're ignored by the parser:
require 'yaml'
hash = YAML.load(<<EOT)
---
#foo bar
foo: bar
EOT
hash
# => {"foo"=>"bar"}
Similarly, indentation in a YAML file can't be preserved, since an object has no way of knowing what the indentation in the file was.
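If you only need to change a simple value, one workaround is to skip the parse/dump round trip and edit the file as text, so every other line stays byte-for-byte the same. A minimal sketch; `replace_top_level_value` is a hypothetical helper with deliberately narrow assumptions: the key is top-level (starts at column 0), its value fits on one line and contains no `#`, and the new value is already valid YAML.

```ruby
require 'yaml'

# Hypothetical helper: rewrite the value of a top-level key as plain text,
# keeping comments and indentation on all other lines untouched.
def replace_top_level_value(yaml_text, key, new_value)
  # $1 = "key: " prefix, lazy middle = old value, $2 = trailing spaces/comment
  pattern = /^(#{Regexp.escape(key)}:[ \t]*)[^\n#]*?([ \t]*(?:#[^\n]*)?)$/
  yaml_text.sub(pattern) { "#{$1}#{new_value}#{$2}" }
end

source = <<~DOC
  ---
  # connection settings
  host: localhost  # dev only
  port: 5432
  nested:
    host: inner
DOC

updated = replace_top_level_value(source, 'host', 'db.example.com')
puts updated
```

Only the first top-level `host:` line changes; the indented `nested` key and both comments survive. For anything beyond single-line scalars, a comment-preserving YAML round-trip library would be the more robust route.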

How to parse USPTO XML files with Ruby and Nokogiri?

I've spent the whole day trying to figure out how to parse USPTO bulk XML files. I downloaded one of those files, unzipped it, and then ran:
Nokogiri::XML(File.open('ipg140513.xml'))
But it seems to load only the first element, not all the patents (that file contains a few thousand).
What am I doing wrong?
The file you linked to, and presumably the others, is not valid XML because it does not have a single root element. From Wikipedia:
Each XML document has exactly one single root element.
Nokogiri hints at this if you look at the errors (suggested by Arup Rakshit), as detailed in the documentation:
Nokogiri::XML(File.open("/Users/b/Downloads/ipg140513.xml")).errors # =>
# [
# #<Nokogiri::XML::SyntaxError: XML declaration allowed only at the start of the document>,
# #<Nokogiri::XML::SyntaxError: Extra content at the end of the document>
# ]
The file appears to be a concatenation of a series of valid XML files, each having a <us-patent-grant/> as its root element.
Fortunately, Nokogiri can handle this invalid XML if you process it as a document fragment. Try this:
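require 'nokogiri'
# Parse as a fragment, then keep only the top-level <us-patent-grant> nodes
Nokogiri::XML::DocumentFragment.parse(File.read('ipg140513.xml')).children.select { |node| node.name == 'us-patent-grant' }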
The select chooses the root node of each concatenated document, ignoring the processing instructions and DTD declarations.
Alternatively, you could pre-process the file and split it into its constituent, well-formed documents. Parsing a 650MB document all at once is quite slow and memory-intensive.
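The pre-processing route can be done line by line, so the whole file never has to sit in memory as one string. A minimal sketch; `each_xml_document` is a hypothetical helper, and it assumes each embedded document starts with its own `<?xml` declaration at the beginning of a line (the "XML declaration allowed only at the start of the document" error points that way, but check your files).

```ruby
# Hypothetical helper: yield each concatenated XML document as a String.
# A new document begins whenever a line starts with "<?xml".
def each_xml_document(io)
  buffer = +''
  io.each_line do |line|
    if line.start_with?('<?xml') && !buffer.empty?
      yield buffer   # hand off the previous, now complete, document
      buffer = +''   # start a fresh buffer instead of mutating the yielded one
    end
    buffer << line
  end
  yield buffer unless buffer.strip.empty? # last document in the file
end
```

Each yielded string can then be parsed on its own with Nokogiri::XML(xml) inside the block, so only one patent's text is buffered at a time.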

How to remove '---' on top of a YAML file?

I am modifying a YAML file in Ruby. After I write back the modified YAML, I see a --- added on top of the file. How is this getting added and how do I get rid of it?
YAML spec says:
YAML uses three dashes (“---”) to separate directives from document content. This also serves to signal the start of a document if no directives are present.
Example:
# Ranking of 1998 home runs
---
- Mark McGwire
- Sammy Sosa
- Ken Griffey
# Team ranking
---
- Chicago Cubs
- St Louis Cardinals
So if you have multiple documents in one YAML file, you have to separate them with three dashes. If you only have one document, you can omit them (I have never had a problem with YAML in Ruby when the three dashes were missing). I guess the reason they are added when you convert your object to YAML is that the dumper is written "by the spec" and doesn't bother to implement shortcuts such as omitting the three dashes when there is only one document.
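If you still want a single-document file without the marker, a simple approach is to strip it from the dumped string. A minimal sketch, assuming the root object is a Hash or Array (Psych then writes `---` on a line of its own); `dump_without_marker` is a hypothetical helper and the sample data is made up.

```ruby
require 'yaml'

# Hypothetical helper: drop the leading "---" line Psych writes for a
# Hash/Array root. A scalar root is dumped as "--- foo" on one line, so it
# does not match the prefix and is returned unchanged.
def dump_without_marker(object)
  YAML.dump(object).delete_prefix("---\n")
end

data = { 'teams' => ['Chicago Cubs', 'St Louis Cardinals'] }
yaml = dump_without_marker(data)
puts yaml
```

The stripped text still loads back to the same Hash, since the marker is optional for a single document.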
