Here's a question about repeated nodes and missing values.
Given the xml below is there a way in XPath of returning (value2, null, value5) rather than (value2, value5)? I'm using an expression that looks like:
/nodes/node[*]/value2/text()
To retrieve the values. I need to know there's a value missing and at which indexes this occurs. I'm a complete newbie to XPath and cannot figure out how to do this.
<nodes>
<node>
<value1>value1</value1>
<value2>value2</value2>
</node>
<node>
<value1>value3</value1>
</node>
<node>
<value1>value4</value1>
<value2>value5</value2>
</node>
</nodes>
Kind regards,
mipper
I suspect you'll need to get friendly with your xml parser and iterate over the nodes yourself. This example is in Ruby; I decorated it with comments since I don't know what languages you can read.
#!/usr/bin/ruby1.8
require 'nokogiri'
xml = <<EOS
<nodes>
<node>
<value1>value1</value1>
<value2>value2</value2>
</node>
<node>
<value1>value3</value1>
</node>
<node>
<value1>value4</value1>
<value2>value5</value2>
</node>
</nodes>
EOS
# Parse the xml with Nokogiri
doc = Nokogiri::XML(xml)
# Get an array of nodes
nodes = doc.xpath('/nodes/node')
# For each node, get its value. Put the collected values into the
# "values" array.
values = nodes.collect do |node|
node.at_xpath('value2/text()').to_s
end
# Print the values in a programmer-friendly format
p values # => ["value2", "", "value5"]
Given the xml below is there a way in XPath of returning (value2, null, value5)
In XPath, no. XPath can only select/return existing nodes. It does not know null.
In XSLT or any other XML-aware programming language, the problem can be solved by iteration.
<xsl:for-each select="/nodes/node">
<xsl:value-of select="*[2]" />
<xsl:text>, </xsl:text>
</xsl:for-each>
Related
I have a XML file:
<root>
<person name="brother">Abhijeet</person>
<person name="sister">pratiksha</person>
</root>
I want to parse it using Nokogiri. I tried using CSS and XPath, but it returns nil or the first element's value. How do I retrieve the other values?
I tried:
doc = Nokogiri::XML(xmlFile)
doc.elements.each do |f|
f.each do |y|
p y
end
end
and:
doc.xpath("//person/sister")
doc.at_xpath("//person/sister")
This is the basic way to search for a node with a given attribute and value using CSS:
require 'nokogiri'
doc = Nokogiri::XML(<<EOT)
<root>
<person name="brother">Abhijeet</person>
<person name="sister">pratiksha</person>
</root>
EOT
doc.at('person[name="sister"]').to_html # => "<person name=\"sister\">pratiksha</person>"
You need to research CSS and XPath and how their syntax works. In XPath, //person/sister means search everywhere for <sister> nodes inside <person> nodes, matching something like:
<root>
<person>
<sister />
</person>
<person>
<sister />
</person>
</root>
Where it would find all the <sister /> nodes. It doesn't look at a node's attributes.
Don't do:
doc.elements.each do |f|
f.each do |y|
p y
end
end
You're going to waste a lot of CPU walking through every element. Instead, learn how selectors work so you can take advantage of the power of libxml2.
I'm parsing some XML that I get from various feeds. Apparently some of the XML has an occasional tag that is all upper case. I'd like to normalize the XML to be all lower case tags to make searching, etc. easier.
What I want to do is something like:
parsed = Nokogiri::XML.parse(xml_content)
node = parsed.css("title") # => should return a Nokogiri node for the title tag
However, some of the XML documents have "TITLE" for that tag.
What are my options for getting that node whether its tag is "title", "TITLE", or even "Title"?
Thanks!
If you want to transform your xml document by downcase'ing all tag names, here's one way to do it:
parsed = Nokogiri::XML.parse(xml_content)
parsed.traverse do |node|
node.name = node.name.downcase if node.kind_of?(Nokogiri::XML::Element)
end
As a general approach you could transform all element (tag) names to lower case (e.g. by using XSLT or another solution) and then do all of your XPath/CSS queries using lower case only.
This XSLT solution should work; however, my version of Ruby (2.0.0p481) and/or Nokogiri (1.5.6) complains when the stylesheet uses the lower-case(...) function. That is most likely because lower-case() belongs to XPath 2.0, while Nokogiri's XSLT support comes from libxslt, which implements XSLT 1.0 only.
Here's a solution that seems to work:
require 'nokogiri'
xslt = Nokogiri::XSLT(File.read('lower.xslt'))
# <?xml version="1.0" encoding="UTF-8"?>
# <xsl:transform version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
# <xsl:variable name="lowercase" select="'abcdefghijklmnopqrstuvwxyz'" />
# <xsl:variable name="uppercase" select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ'" />
# <xsl:template match="*">
# <xsl:element name="{translate(local-name(), $uppercase, $lowercase)}">
# <xsl:apply-templates />
# </xsl:element>
# </xsl:template>
# </xsl:transform>
doc = Nokogiri::XML(File.read('doc.xml'))
# <?xml version="1.0" encoding="UTF-8"?>
# <FOO>
# <BAR>Bar</BAR>
# <GAH>Gah</GAH>
# <ZIP><DOO><DAH/></DOO></ZIP>
# </FOO>
puts xslt.transform(doc)
# <?xml version="1.0"?>
# <foo>
# <bar>Bar</bar>
# <gah>Gah</gah>
# <zip><doo><dah/></doo></zip>
# </foo>
I am using Nokogiri to parse a XML document and want to output a list of locations where the product name matches a string.
I'm able to output a list of all product names or a list of all locations but I'm not able to compare the two. Removing the if portion of the statement correctly outputs all the locations. What am I doing wrong with my regex?
@doc = Nokogiri::HTML::DocumentFragment.parse <<-EOXML
<?xml version="1.0"?>
<root>
<product>
<name>cool_fish</name>
<product_details>
<location>ocean</location>
<costs>
<msrp>9.99</msrp>
<margin>5.00</margin>
</costs>
</product_details>
</product>
<product>
<name>veggies</name>
<product_details>
<location>field</location>
<costs>
<msrp>2.99</msrp>
<margin>1.00</margin>
</costs>
</product_details>
</product>
</root>
EOXML
doc.xpath("//product").each do |x|
puts x.xpath("location") if x.xpath("name") =~ /cool_fish/
end
A few things going on here:
As others have pointed out, you should be parsing as XML not HTML, although that wouldn’t actually make much difference to the results you get.
You are parsing as a DocumentFragment; you should parse as a complete document. There are some issues involved in querying document fragments; in particular, queries starting with // don’t work right.
The location element is actually at the position product_details/location relative to the product node in your XML, so you need to update your query to take that into account.
You are trying to use the =~ operator on the result of the xpath method, which is a Nokogiri::XML::NodeSet. NodeSet doesn’t define a =~ method, so Ruby falls back to the default one on Object, which just returns nil, so it will never match (Ruby 3.2 removed that default, so there the call raises NoMethodError instead). You should use at_xpath to get only the first result, and then call text on it to get a string that you can match using =~.
(Also you use @doc and doc, but I’m assuming that’s just a typo.)
So combining those four points your code will look like:
#parse using XML, and not a fragment
doc = Nokogiri::XML <<-EOXML
# ... XML elided for space
EOXML
doc.xpath("//product").each do |x|
# correct query, use at_xpath and call text method
puts x.at_xpath("product_details/location") if x.at_xpath("name").text =~ /cool_fish/
end
However in this case you could do it all in a single XPath query, using the contains function:
# parse doc as XML document as above
puts doc.xpath("//product[contains(name, 'cool_fish')]/product_details/location")
This works because you have a fairly simple regex that only checks against a literal string. XPath 1.0 doesn’t have support for regex, so if your real use case involves a more complex one you may need to do it the “hard way”. (You could write a custom XPath function in that case, but that’s another story.)
Write your code as below :
require 'nokogiri'
@doc = Nokogiri::XML <<-EOXML
<?xml version="1.0"?>
<root>
<product>
<name>cool_fish</name>
<product_details>
<location>ocean</location>
<costs>
<msrp>9.99</msrp>
<margin>5.00</margin>
</costs>
</product_details>
</product>
<product>
<name>veggies</name>
<product_details>
<location>field</location>
<costs>
<msrp>2.99</msrp>
<margin>1.00</margin>
</costs>
</product_details>
</product>
</root>
EOXML
@doc.xpath("//product").each do |x|
puts x.at_xpath(".//location").text if x.at_xpath(".//name").text =~ /cool_fish/
end
# >> ocean
You are parsing XML, so you should use Nokogiri::XML. Your XPath expressions were also incorrect. You called the xpath method, but bare names such as location only match direct children there; they act as descendant searches only with methods like css or search. I used the at_xpath method because you are interested in a single matching node inside the #each block.
You can also use at in place of at_xpath and search in place of xpath.
Remember that search and at both understand CSS as well as XPath expressions. search, xpath, and css all return a NodeSet, whereas at, at_css, and at_xpath return a single Node. Once you have a Nokogiri node in hand, use the text method to get its content.
I would suggest using Nokogiri::XML instead
@doc = Nokogiri::XML::Document.parse <<-EOXML
<?xml version="1.0"?>
<root>
<product>
<name>cool_fish</name>
<product_details>
<location>ocean</location>
<costs>
<msrp>9.99</msrp>
<margin>5.00</margin>
</costs>
</product_details>
</product>
<product>
<name>veggies</name>
<product_details>
<location>field</location>
<costs>
<msrp>2.99</msrp>
<margin>1.00</margin>
</costs>
</product_details>
</product>
</root>
EOXML
and then the Nokogiri::XML::Node#search and Nokogiri::XML::Node#at methods:
@doc.search("product").each do |x|
puts x.at("location").content if x.at("name").content =~ /cool_fish/
end
In the code below I'm trying to pull the '90000' value using an XPath, but Nokogiri returns nil.
<?xml version="1.0" encoding="UTF-8"?>
<rspec xmlns="URL1"
xmlns:add="URL2">
<node>
<price add:cars="90000"/>
</node>
</rspec>
I try the command:
puts root.xpath("//add:cars", "add" => "URL2")
but it doesn't seem to work.
Could you please help me, I'm new at Ruby and I have searched a lot but I couldn't find anything.
add:cars is an attribute of the price element, not an element itself. The syntax you want is:
root.xpath("//xmlns:price/@add:cars")
or possibly even just
root.xpath("//@add:cars")
if you want the add:cars attributes of all elements.
Note that since the namespaces are declared on the root, Nokogiri registers them automatically so you don’t need to include the mappings in your call to xpath (you will need to include them if your document is more complex with namespaces being declared on non-root elements). Also the default namespace is registered with the prefix xmlns, so you can use that in your XPath.
Here is one way to do this :
require 'nokogiri'
@doc = Nokogiri::XML.parse <<-eotl
<?xml version="1.0" encoding="UTF-8"?>
<rspec xmlns="URL1"
xmlns:add="URL2">
<node>
<price add:cars="90000"/>
</node>
</rspec>
eotl
@doc.remove_namespaces!
@doc.at_xpath('//price/@cars').text
# => "90000"
Or, if you want to keep the namespaces as they are, use this instead:
@doc.at_xpath('//xmlns:price/@add:cars').text
# => "90000"
Read this tutorial : Searching an HTML / XML Document
I'm a fan of letting Nokogiri use CSS when dealing with namespaces:
require 'nokogiri'
xml = '<?xml version="1.0" encoding="UTF-8"?>
<rspec xmlns="URL1"
xmlns:add="URL2">
<node>
<price add:cars="90000"/>
</node>
</rspec>
'
doc = Nokogiri::XML(xml)
doc.at('price', 'add')['add:cars']
# => "90000"
I am using Nokogiri (1.5.9 - java) in JRuby ( 1.6.7.2 ) to copy an XML template and edit it. I'm having problems finding elements in the cloned document.
lblock = doc.xpath(".//lblock[@blockName='WINDOW_LIST']").first
lblock.children = new_children # kind of NodeSet or Node
copy_doc = doc.dup( 1 ) # or dup(0)
lblock = copy_doc.xpath(".//lblock[@blockName='WINDOW_LIST']").first # nil
When I print the copy with to_s or to_xml, the lblock is there with the new children.
Where is my mistake?
I can't duplicate the problem:
require 'nokogiri'
new_children = Nokogiri::XML::DocumentFragment.parse('<foo>bar</foo>')
doc = Nokogiri::XML(<<EOF)
<xml>
<lblock blockName="WINDOW_LIST" />
</xml>
EOF
lblock = doc.xpath(".//lblock[@blockName='WINDOW_LIST']").first
lblock.children = new_children # kind of NodeSet or Node
copy_doc = doc.dup(1) # or dup(0)
lblock = copy_doc.xpath(".//lblock[@blockName='WINDOW_LIST']").first # nil
puts lblock.to_xml
puts
puts doc.to_xml
Running that outputs:
<lblock blockName="WINDOW_LIST">
<foo>bar</foo>
</lblock>
<?xml version="1.0"?>
<xml>
<lblock blockName="WINDOW_LIST"><foo>bar</foo></lblock>
</xml>
That said, here's code that is cleaned up to show you some simpler ways to write it:
require 'nokogiri'
new_children = '<foo>bar</foo>'
doc = Nokogiri::XML(<<EOF)
<xml>
<lblock blockName="WINDOW_LIST" />
</xml>
EOF
lblock = doc.at_xpath('//lblock')
lblock.children = new_children
copy_doc = doc.dup(1)
lblock = copy_doc.at_css('lblock')
puts lblock.to_xml
puts
puts doc.to_xml
Running that produces the same output:
<lblock blockName="WINDOW_LIST">
<foo>bar</foo>
</lblock>
<?xml version="1.0"?>
<xml>
<lblock blockName="WINDOW_LIST"><foo>bar</foo></lblock>
</xml>
Dissecting the code:
lblock = doc.at_xpath('//lblock')
lblock = copy_doc.at_css('lblock')
These use two different ways of finding the same thing. In this case, because the sample XML was simple, I used the at variants, which return the first matching node. at_xpath and at_css work with XPath and CSS respectively. Plain at tries to figure out whether the string is CSS or XPath, and normally gets it right, though I have seen it fooled.
lblock.children = new_children
In this case, new_children is a String. Nokogiri is smart enough to know it should convert the string into an XML fragment before using it. This makes it very easy to modify XML or HTML documents with strings, instead of having to create DocumentFragments.