Can't YAML::load a YAML::dumped XML value - ruby

When trying to YAML::load a value produced by YAML::dump, I get the error "did not find expected key while parsing a block mapping at line 1 column 1".
The YAML::dump value was written to an XML file as:
<format_store>---:text_formatting: '':url_pattern: ''</format_store>
If I look into the database, it is a text field with line breaks in it.
---
:text_formatting: ''
:url_pattern: ''
So it looks like the conversion from YAML::dump into the XML format dropped the line breaks.
I explicitly use the YAML::dump format for text fields. XML does not allow line breaks in element values. It would have to be escaped in some way and I assumed YAML would take care of that.
Is there a better way to dump/load text fields, or is there something I'm missing here?

Option 1: Wrap the YAML content in a CDATA section (<![CDATA[ ... ]]>), as suggested in Adding a new line/break tag in XML.
Option 2: Configure your YAML library to dump mappings using flow style (e.g. {':text_formatting' : '', ':url_pattern' : ''}). The exact method for accomplishing this will depend on the YAML library you are using and may require a bit of custom coding.
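A minimal sketch of Option 1, assuming Ruby's REXML library and a hash with symbol keys like the one in the question (the variable names are illustrative, not from the original code). It stores the dumped YAML in a CDATA section and reads it back through an XML round trip:

```ruby
require 'yaml'
require 'rexml/document'

# Illustrative data matching the dumped value shown in the question.
settings = { text_formatting: '', url_pattern: '' }
yaml_text = YAML.dump(settings)

# Build <format_store> with the YAML text inside a CDATA section.
doc = REXML::Document.new
store = doc.add_element('format_store')
store.add(REXML::CData.new(yaml_text))

xml = String.new
doc.write(xml)

# Read the element back and load the YAML text it contains.
restored_text = REXML::Document.new(xml).root.text
restored = YAML.safe_load(restored_text, permitted_classes: [Symbol])
```

If the line breaks survive, restored should equal the original hash. Whatever writes the XML in the real application would need to do the equivalent of the CDATA step.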

Related

Loading data from YAML in ruby changing the encoding/byte structure of data?

I am trying to write a method to remove some blacklisted characters, such as BOM characters, using their UTF-8 values. I managed to do this by adding a method to the String class with the following logic:
def remove_blacklist_utf_chars
  self.force_encoding("UTF-8").gsub!(config[:blacklist_utf_chars][:zero_width_space].force_encoding("UTF-8"), "")
  self
end
To make it reusable across applications, I created a config in a yml file. The yml structure is something like:
:blacklist_utf_chars:
  :zero_width_space: '"\u{200b}"'
(Edit) Also as suggested by Drenmi this didn't work,
:blacklist_utf_chars:
  :zero_width_space: \u{200b}
The problem I am facing is that the method remove_blacklist_utf_chars does not work when I load the blacklisted characters from the yml file.
But when I pass them directly to the method, not via the yml file, the method works.
So basically
self.force_encoding("UTF-8").gsub!("\u{200b}".force_encoding("UTF-8"), "") -- works.
but,
self.force_encoding("UTF-8").gsub!(config[:blacklist_utf_chars][:zero_width_space].force_encoding("UTF-8"), "") -- doesn't work.
I printed the value of config[:blacklist_utf_chars][:zero_width_space] and it is equal to "\u{200b}"
I got this idea by referring: https://stackoverflow.com/a/5011768/2362505.
Now I am not sure what exactly is happening when the blacklist chars list is loaded via yml in Ruby code.
EDIT 2:
On further investigation I observed that there is an extra \ getting added while reading the hash from the yaml.
So,
puts config[:blacklist_utf_chars][:zero_width_space].dump
prints:
"\\u{200b}"
But then if I just define the yaml as:
:blacklist_utf_chars:
  :zero_width_space: 200b
and do,
ch = "\u{#{config[:blacklist_utf_chars][:zero_width_space]}}"
self.force_encoding("UTF-8").gsub!(ch.force_encoding("UTF-8"), "")
I get
/Users/harshsingh/dir/to/code/utils.rb:121: invalid Unicode escape (SyntaxError)
The "\u{200b}" syntax is used for escaping Unicode characters in Ruby source code. It won’t work inside Yaml.
The equivalent syntax for a Yaml document is the similar "\u200b" (which also happens to be valid in Ruby). Note the lack of braces ({}), and also the double quotes are required, otherwise it will be parsed as literal \u200b.
So your Yaml file should look like this:
:blacklist_utf_chars:
  :zero_width_space: "\u200b"
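A small sketch of the double-quoted YAML escape in use, assuming Ruby's bundled YAML (Psych) library. The inline heredoc stands in for the config file, and the sample string is made up for illustration:

```ruby
require 'yaml'

# Single-quoted heredoc: Ruby leaves \u200b untouched, so the YAML parser
# sees the escape sequence exactly as it would in a file on disk.
yaml_text = <<~'YAML'
  :blacklist_utf_chars:
    :zero_width_space: "\u200b"
YAML

config = YAML.safe_load(yaml_text, permitted_classes: [Symbol])
zero_width_space = config[:blacklist_utf_chars][:zero_width_space]

# Illustrative input containing a zero-width space between "foo" and "bar".
dirty = "foo\u200bbar"
clean = dirty.gsub(zero_width_space, '')
```

Here the loaded value is the single character U+200B, so gsub can match it directly.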
If you puts the value and get the output "\u{200b}", it means the quotes are included in your string. I.e., you're actually calling:
self.force_encoding("UTF-8").gsub!('"\u{200b}"'.force_encoding("UTF-8"), "")
Try changing your YAML file to:
:blacklist_utf_chars:
  :zero_width_space: \u{200b}
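One way to check what each spelling discussed in this thread actually loads as is to compare string lengths. This is a diagnostic sketch assuming Psych; the key name value and the hash labels are placeholders:

```ruby
require 'yaml'

# Three spellings of the scalar, written as they would appear in the YAML file.
# %q keeps the backslashes literal on the Ruby side.
spellings = {
  'single-quoted with inner quotes' => %q(value: '"\u{200b}"'),
  'plain, Ruby-style braces'        => %q(value: \u{200b}),
  'double-quoted YAML escape'       => %q(value: "\u200b")
}

# Length of the loaded string for each spelling.
lengths = spellings.transform_values do |yaml_text|
  YAML.safe_load(yaml_text)['value'].length
end
```

A length of 1 means the value is the zero-width space itself. Anything longer is literal text such as a backslash, braces, or quotes, which gsub will not match against the real character.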

Oracle XMLTYPE: parameter like <123.xml/> not working

I am using Oracle 11g and passing a file name to a function, say:
FUNCTION test(file_name IN XMLTYPE)
It accepts a file name like <file1.xml/>, but when I pass <123.xml/> it throws an Oracle type exception. If I pass a file name that starts with a letter, such as <T123.xml/>, it works fine.
Please tell me what I need to do to process a file name like <123.xml/>.
<123.xml/> is not valid XML: tag names can't start with a digit (or a . or -). The Wikipedia article on XML has a good summary on well-formedness. The other two examples you posted are valid XML though; their tag names are well-formed.
Your use-case looks very strange though. If you want to pass a simple filename to that function, why don't you pass a plain string? If you do want to pass a filename encoded in an XML container, something like this would make more sense:
<file name="foo.bar"/>
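The naming rule can be illustrated with a rough check. This sketch uses an ASCII-only approximation of the XML Name rule (an assumption for brevity; the real rule also admits many non-ASCII characters) and is no substitute for a real XML parser:

```ruby
# ASCII-only approximation: the first character must be a letter, underscore,
# or colon; later characters may also be digits, dots, or hyphens.
xml_name = /\A[A-Za-z_:][A-Za-z0-9_:.\-]*\z/

# The three tag names from the question, mapped to whether they pass the check.
results = %w[file1.xml 123.xml T123.xml].map do |name|
  [name, xml_name.match?(name)]
end.to_h
```

Only 123.xml fails, because its first character is a digit.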

Preserving whitespace / line breaks with REXML

I'm using Ruby 1.9.3 and REXML to parse an XML document, make a few changes (additions/subtractions), then re-output the file. Within this file is a block that looks like this:
<someElement>
some.namespace.something1=somevalue1
some.namespace.something2=somevalue2
some.namespace.something3=somevalue3
</someElement>
The problem is that after re-writing the file, this block always ends up looking like this:
<someElement>
some.namespace.something1=somevalue1
some.namespace.something2=somevalue2 some.namespace.something3=somevalue3
</someElement>
The newline after the second value (but never the first!) has been lost and turned into a space. Later, some other code which I have no control or influence over will be reading this file and depending on those newlines to properly parse the content. Generally in this situation I'd use a CDATA section to preserve the whitespace, but this isn't an option as the code that parses this data later is not expecting one - it's essential that the inner text of this element is preserved exactly as-is.
My read/write code looks like this:
xmlFile = File.open(myFile)
contents = xmlFile.read
xmlDoc = REXML::Document.new(contents, { :respect_whitespace => :all })
xmlFile.close
{perform some tasks}
out = ""
xmlDoc.write(out, 2)
File.open(filePath, "w"){|file| file.puts(out)}
I'm looking for a way to preserve the whitespace of text between elements when reading/writing a file in this manner using REXML. I've read a number of other questions here on stackoverflow on this subject, but none that quite replicate this scenario. Any ideas or suggestions are welcome.
I get correct behavior by removing the indent (second) parameter to Document.write():
#xmlDoc.write(out, 2)
xmlDoc.write(out)
That seems like a bug in Document.write() according to my reading of the docs, but if you don't really need to set the indentation, then leaving that off should solve your problem.
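A short sketch of that workaround, assuming REXML and a made-up document shaped like the one in the question. It writes without the indent argument and then re-reads the element text to check the line breaks:

```ruby
require 'rexml/document'

# Illustrative input: one element whose text depends on its line breaks.
source = "<root><someElement>\na=1\nb=2\nc=3\n</someElement></root>"

doc = REXML::Document.new(source, { respect_whitespace: :all })

out = String.new
doc.write(out) # no indent argument, so no pretty-printing is applied

# Parse the output again and pull out the element text for comparison.
inner_text = REXML::Document.new(out).root.elements['someElement'].text
```

If the text comes back unchanged, the downstream parser should see the same newlines as in the original file.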

How do I use the XPath tokenizer function in Nokogiri?

I am attempting to extract information from the following HTML using Nokogiri and XPath.
<p>Friday, February 1<br><strong>Apple <br> Orange</strong></p>
e.xpath('./text()[following-sibling::br]')
Gives me the date just fine. I want to then grab the text inside the strong node and split on br. There may be many fruits separated by br or there may just be one with no br. I would ideally like to accomplish this in xpath instead of code since I'm essentially defining a bunch of parsers via JSON.
Right now I'm thinking that I should use the tokenizer function and pass the text in the strong tag. I thought that should look like this:
e.xpath('./strong[fn::tokenize(.,"<br>")]')
and have also tried
e.xpath('fn::tokenize(./strong,"<br>")')
but I am getting:
.../gems/nokogiri-1.5.6/lib/nokogiri/xml/node.rb:159:in `evaluate': Invalid expression: ./strong/text()[fn::tokenize(.,"br")] (Nokogiri::XML::XPath::SyntaxError)
I'm modeling my usage after the documentation for the method that the error occurs in (line 139):
node.xpath('.//title[regex(., "\w+")]',...

How to use Regular Expression to insert text in between text?

I have a unique scenario. There is a web application that acts as a simulator: it sends data as XML, receives XML back, and verifies a few details in the response.
The XML I am sending has a lot of details. I have to insert into it a parameter that I have defined in my test, but I can't work out how to put the parameter's value into the XML before sending it.
The XML structure looks like this:
id='12345'><version>1.3.4<</version><accno>1234567890</accno>add<address details</> ..........
In this XML structure, I have parameterized <accno>1234567890</accno>, meaning that at the beginning of the script I declare accno='1234567890'.
Now I want to use accno as a parameter in the XML instead of the hard-coded value. Please suggest how to do this.
XML is not regular, but context-free. Use a proper parser like Nokogiri instead of regex. See RegEx match open tags except XHTML self-contained tags.
Posting as an answer, as requested.
Editing XML with a regex is a bad idea,
but to answer the direct question: use gsub, e.g.
str.gsub(/reg_match/, newstring)
A better way of doing it would be to use hpricot.
Or you can use Ruby templates:
require 'erb'
require 'ostruct'
data = {:accno => "1234567890"}
variables = OpenStruct.new(data)
template = "<id='12345'><version>1.3.4</version><accno><%= accno%></accno>"
res = ERB.new(template).result(variables.instance_eval { binding })
puts res
First identify the pattern, then replace it using gsub!
xml_data.gsub!(pattern, replacement)
http://ruby-doc.org/docs/ProgrammingRuby/html/ref_c_string.html#String.gsub_oh
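As a concrete instance of the gsub approach, here is a sketch assuming the payload contains a single well-formed accno element. The sample XML fragment and its placeholder value are made up for illustration:

```ruby
accno = '1234567890'

# Illustrative fragment; the real payload from the question is longer.
xml_data = '<version>1.3.4</version><accno>0000000000</accno>'

# The block form avoids any backslash interpretation in the replacement text.
updated = xml_data.gsub(%r{<accno>[^<]*</accno>}) { "<accno>#{accno}</accno>" }
```

This only works while the element has no attributes or nested markup, which is part of why the other answers recommend a parser.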
The fast way to do it is with gsub (like Rajkaran says). The right way to do it is rexml or some other xml library. Investment should be related to how much you will use this kind of thing in the future.
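For comparison, a minimal sketch of the parser route with REXML. The request root element and the placeholder value are assumptions, since the XML in the question is not well-formed as posted:

```ruby
require 'rexml/document'

accno = '1234567890'

# Hypothetical well-formed version of the payload from the question.
xml = "<request id='12345'><version>1.3.4</version><accno>0000000000</accno></request>"

doc = REXML::Document.new(xml)
# Replace the text of the accno child element with the parameter value.
doc.root.elements['accno'].text = accno

out = String.new
doc.write(out)
```

The parser escapes special characters in the value and keeps working if the element later gains attributes.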
