Nokogiri compare field and puts - ruby

I am using Nokogiri to parse an XML document and want to output a list of locations where the product name matches a string.
I'm able to output a list of all product names or a list of all locations but I'm not able to compare the two. Removing the if portion of the statement correctly outputs all the locations. What am I doing wrong with my regex?
@doc = Nokogiri::HTML::DocumentFragment.parse <<-EOXML
<?xml version="1.0"?>
<root>
<product>
<name>cool_fish</name>
<product_details>
<location>ocean</location>
<costs>
<msrp>9.99</msrp>
<margin>5.00</margin>
</costs>
</product_details>
</product>
<product>
<name>veggies</name>
<product_details>
<location>field</location>
<costs>
<msrp>2.99</msrp>
<margin>1.00</margin>
</costs>
</product_details>
</product>
</root>
EOXML
doc.xpath("//product").each do |x|
  puts x.xpath("location") if x.xpath("name") =~ /cool_fish/
end

A few things going on here:
As others have pointed out, you should be parsing as XML not HTML, although that wouldn’t actually make much difference to the results you get.
You are parsing as a DocumentFragment, you should parse as a complete document. There are some issues involved querying document fragments, in particular queries starting with // don’t work right.
The location element is actually at the position product_details/location relative to the product node in your XML, so you need to update your query to take that into account.
You are trying to use the =~ operator on the result of the xpath method, which is a Nokogiri::XML::NodeSet. NodeSet doesn’t define a =~ method, so it uses the default one on Object that just returns nil, so it will never match. (Ruby 3.2 removed that default, so on current Rubies the call raises NoMethodError instead.) You should use at_xpath to only get the first result, and then call text on it to get the string that you can match using =~.
(Also you use @doc and doc, but I’m assuming that’s just a typo.)
So combining those four points your code will look like:
# parse using XML, and not a fragment
doc = Nokogiri::XML <<-EOXML
# ... XML elided for space
EOXML
doc.xpath("//product").each do |x|
  # correct query, use at_xpath and call text method
  puts x.at_xpath("product_details/location") if x.at_xpath("name").text =~ /cool_fish/
end
However in this case you could do it all in a single XPath query, using the contains function:
# parse doc as XML document as above
puts doc.xpath("//product[contains(name, 'cool_fish')]/product_details/location")
This works because you have a fairly simple regex that only checks against a literal string. XPath 1.0 doesn’t have support for regex, so if your real use case involves a more complex one you may need to do it the “hard way”. (You could write a custom XPath function in that case, but that’s another story.)

Write your code as below:
require 'nokogiri'
@doc = Nokogiri::XML <<-EOXML
<?xml version="1.0"?>
<root>
<product>
<name>cool_fish</name>
<product_details>
<location>ocean</location>
<costs>
<msrp>9.99</msrp>
<margin>5.00</margin>
</costs>
</product_details>
</product>
<product>
<name>veggies</name>
<product_details>
<location>field</location>
<costs>
<msrp>2.99</msrp>
<margin>1.00</margin>
</costs>
</product_details>
</product>
</root>
EOXML
@doc.xpath("//product").each do |x|
  puts x.at_xpath(".//location").text if x.at_xpath(".//name").text =~ /cool_fish/
end
# >> ocean
You are parsing XML, so you should use Nokogiri::XML. Your XPath expression was also incorrect. You called the #xpath method, but as XPath, a bare location matches only direct children of the product node; a bare element name like that finds nested elements only when it is treated as a CSS selector, with methods like css or search. I used the at_xpath method with .//, since you were interested in a single-node match inside the #each block.
You can also use at in place of #at_xpath and search in place of xpath.
Remember that search and at both understand CSS as well as XPath expressions. The search, xpath, and css methods all give you a NodeSet, whereas at, at_css, and at_xpath give you a single Node. Once you have a Nokogiri node in hand, use the text method to get the content of that node.

I would suggest using Nokogiri::XML instead:
@doc = Nokogiri::XML::Document.parse <<-EOXML
<?xml version="1.0"?>
<root>
<product>
<name>cool_fish</name>
<product_details>
<location>ocean</location>
<costs>
<msrp>9.99</msrp>
<margin>5.00</margin>
</costs>
</product_details>
</product>
<product>
<name>veggies</name>
<product_details>
<location>field</location>
<costs>
<msrp>2.99</msrp>
<margin>1.00</margin>
</costs>
</product_details>
</product>
</root>
EOXML
and then the Nokogiri::XML::Node#search and Nokogiri::XML::Node#at methods:
@doc.search("product").each do |x|
  puts x.at("location").content if x.at("name").content =~ /cool_fish/
end

Related

How to parse an XML file using Nokogiri and Ruby

I have an XML file:
<root>
<person name="brother">Abhijeet</person>
<person name="sister">pratiksha</person>
</root>
I want to parse it using Nokogiri. I tried using CSS and XPath, but they return nil or only the first element's value. How do I retrieve the other values?
I tried:
doc = Nokogiri::XML(xmlFile)
doc.elements.each do |f|
  f.each do |y|
    p y
  end
end
and:
doc.xpath("//person/sister")
doc.at_xpath("//person/sister")
This is the basic way to search for a node with a given attribute and value using CSS:
require 'nokogiri'
doc = Nokogiri::XML(<<EOT)
<root>
<person name="brother">Abhijeet</person>
<person name="sister">pratiksha</person>
</root>
EOT
doc.at('person[name="sister"]').to_html # => "<person name=\"sister\">pratiksha</person>"
You need to research CSS and XPath and how their syntax works. In XPath, //person/sister means search everywhere for <sister> nodes inside <person> nodes, matching something like:
<root>
<person>
<sister />
</person>
<person>
<sister />
</person>
</root>
That query would find all the <sister /> nodes. It doesn't look at a node's attributes.
Don't do:
doc.elements.each do |f|
  f.each do |y|
    p y
  end
end
You're going to waste a lot of CPU walking through every element. Instead, learn how selectors work so you can take advantage of the power of libxml2.

Create non-self-closed empty tag with Nokogiri

When I try to create an XML document with Nokogiri::XML::Builder:
builder = Nokogiri::XML::Builder.new do |xml|
  xml.my_tag({key: :value})
end
I get the following XML tag:
<my_tag key="value"/>
It is self-closed, but I need the full form:
<my_tag key="value"></my_tag>
When I pass a value inside the node (or even a space):
xml.my_tag("content", key: :value)
xml.my_tag(" ", key: :value)
It generates the full tag:
<my_tag key="value">content</my_tag>
<my_tag key="value"> </my_tag>
But if I pass either an empty string or nil, or even an empty block:
xml.my_tag("", key: :value)
It generates a self-closed tag:
<my_tag key="value"/>
I believe there should be some attribute or something else that helps me but simple Googling didn't find the answer.
I found a possible solution in "Building blank XML tags with Nokogiri?" but it saves all tags as non-self-closed.
You can use Nokogiri's NO_EMPTY_TAGS save option. (XML calls self-closing tags empty-element tags.)
builder = Nokogiri::XML::Builder.new do |xml|
  xml.my_tag({key: :value})
end
puts builder.to_xml(save_with: Nokogiri::XML::Node::SaveOptions::NO_EMPTY_TAGS)
<?xml version="1.0"?>
<my_tag key="value"></my_tag>
Each of the options is represented in a bit, so you can mix and match the ones you want. For example, setting NO_EMPTY_TAGS by itself will leave your XML on one line without spacing or indentation. If you still want it formatted for humans, you can bitwise or (|) it with the FORMAT option.
builder = Nokogiri::XML::Builder.new do |xml|
  xml.my_tag({key: :value}) do |my_tag|
    my_tag.nested({another: :value})
  end
end
puts builder.to_xml(
  save_with: Nokogiri::XML::Node::SaveOptions::NO_EMPTY_TAGS
)
puts
puts builder.to_xml(
  save_with: Nokogiri::XML::Node::SaveOptions::NO_EMPTY_TAGS |
             Nokogiri::XML::Node::SaveOptions::FORMAT
)
<?xml version="1.0"?>
<my_tag key="value"><nested another="value"></nested></my_tag>
<?xml version="1.0"?>
<my_tag key="value">
  <nested another="value"></nested>
</my_tag>
There are also a handful of DEFAULT_* options at the end of the list that already combine options into common uses.
Your update mentions "it saves all tags as non-self-closed", as if perhaps you only want this single tag instance to be non-self-closed, and the rest to self close. Nokogiri won't produce an inconsistent document like that, but if you must, you can concatenate some XML strings together that you built with different options.

Check if xml response has type="array" using Nokogiri?

Given the following XML, which has been parsed into @response using Nokogiri:
<?xml version="1.0" encoding="UTF-8"?>
<foos type="array">
<foo>
<id type="integer">1</id>
<name>bar</name>
</foo>
</foos>
Does an XPath exist such that @response.xpath(xpath) returns array?
Assume that this xpath must be reused across multiple documents where the naming of foo is inconsistent.
If an xpath is not the correct tool to solve this problem, does Nokogiri provide a method that is?
This xml is automatically generated by the rails framework, and the answer to this question is intended to be used to create an XML equivalent to this Cucumber feature for JSON responses.
If you want to select the root node when its type attribute is array (regardless of the root element's name), then use this:
/*[@type='array']
For its children, use:
/*[@type='array']/*
Simply:
if doc.root['type']=='array'
Here's a test case:
@response = <<ENDXML
<?xml version="1.0" encoding="UTF-8"?>
<foos type="array">
<foo>
<id type="integer">1</id>
<name>bar</name>
</foo>
</foos>
ENDXML
require 'nokogiri'
doc = Nokogiri.XML(@response)
if doc.root['type'] == 'array'
  puts "It is!"
else
  puts "Nope"
end
Depending on your needs, you might want to:
case doc.root['type']
when 'array'
  # ...
when 'string'
  # ...
else
  # ...
end

Nokogiri and XML Formatting When Inserting Tags

I'd like to use Nokogiri to insert nodes into an XML document. Nokogiri uses the Nokogiri::XML::Builder class to insert or create new XML.
If I create XML using the new method, I'm able to create nice, formatted XML:
builder = Nokogiri::XML::Builder.new do |xml|
  xml.product {
    xml.test "hi"
  }
end
puts builder
outputs the following:
<?xml version="1.0"?>
<product>
  <test>hi</test>
</product>
That's great, but what I want to do is add the above XML to an existing document, not create a new document. According to the Nokogiri documentation, this can be done by using the Builder's with method, like so:
builder = Nokogiri::XML::Builder.with(document.at('products')) do |xml|
  xml.product {
    xml.test "hi"
  }
end
puts builder
When I do this, however, the XML all gets put into a single line with no indentation. It looks like this:
<products><product><test>hi</test></product></products>
Am I missing something to get it to format correctly?
Found the answer in the Nokogiri mailing list:
In XML, whitespace can be considered meaningful. If you parse a document that contains whitespace nodes, libxml2 will assume that whitespace nodes are meaningful and will not insert them for you.
You can tell libxml2 that whitespace is not meaningful by passing the "noblanks" flag to the parser. To demonstrate, here is an example that reproduces your error, then does what you want:
require 'nokogiri'
def build_from node
  builder = Nokogiri::XML::Builder.with(node) do |xml|
    xml.hello do
      xml.world
    end
  end
end
xml = DATA.read
doc = Nokogiri::XML(xml)
puts build_from(doc.at('bar')).to_xml
doc = Nokogiri::XML(xml) { |x| x.noblanks }
puts build_from(doc.at('bar')).to_xml
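Input (the script reads this document through DATA, so it goes after an __END__ line at the end of the file):
<root>
  <foo>
    <bar>
      <baz />
    </bar>
  </foo>
</root>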

Can Nokogiri search for "?xml-stylesheet" tags?

I need to parse for an XML style sheet:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="/templates/xslt/inspections/disclaimer_en.xsl"?>
Using Nokogiri I tried:
doc.search("?xml-stylesheet").first['href']
but I get the error:
`on_error': unexpected '?' after '' (Nokogiri::CSS::SyntaxError)
Nokogiri's CSS selectors cannot match XML processing instructions. You can access this one by position instead:
doc.children[0]
This is not an XML element; this is an XML "Processing Instruction". That is why you could not find it with your query. To find it you want:
# Find the first xml-stylesheet PI
xss = doc.at_xpath('//processing-instruction("xml-stylesheet")')
# Find every xml-stylesheet PI
xsss = doc.xpath('//processing-instruction("xml-stylesheet")')
Seen in action:
require 'nokogiri'
xml = <<ENDXML
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="/templates/disclaimer_en.xsl"?>
<root>Hi Mom!</root>
ENDXML
doc = Nokogiri.XML(xml)
xss = doc.at_xpath('//processing-instruction("xml-stylesheet")')
puts xss.name #=> xml-stylesheet
puts xss.content #=> type="text/xsl" href="/templates/disclaimer_en.xsl"
Since a Processing Instruction is not an Element, it does not have attributes; you cannot, for example, ask for xss['type'] or xss['href']; you will need to parse the content as an element if you wish this. One way to do this is:
class Nokogiri::XML::ProcessingInstruction
  def to_element
    document.parse("<#{name} #{content}/>")
  end
end
p xss.to_element['href'] #=> "/templates/disclaimer_en.xsl"
Note that there exists a bug in Nokogiri or libxml2 which will cause the XML Declaration to appear in the document as a Processing Instruction if there is at least one character (can be a space) before <?xml. This is why in the above we search specifically for processing instructions with the name xml-stylesheet.
Edit: The XPath expression processing-instruction()[name()="foo"] is equivalent to the expression processing-instruction("foo"). As described in the XPath 1.0 spec:
The processing-instruction() test may have an argument that is Literal; in this case, it is true for any processing instruction that has a name equal to the value of the Literal.
I've edited the answer above to use the shorter format.

Resources