In the code below I'm trying to pull the value '90000' using an XPath, but Nokogiri returns nil:
<?xml version="1.0" encoding="UTF-8"?>
<rspec xmlns="URL1"
xmlns:add="URL2">
<node>
<price add:cars="90000"/>
</node>
</rspec>
I try the command:
puts root.xpath("//add:cars", "add" => "URL2")
but it doesn't seem to work.
Could you please help me? I'm new to Ruby, and I have searched a lot but couldn't find anything.
add:cars is an attribute of the price element, not an element itself. The syntax you want is:
root.xpath("//xmlns:price/@add:cars")
or possibly even just
root.xpath("//@add:cars")
if you want the add:cars attributes of all elements.
Note that since the namespaces are declared on the root, Nokogiri registers them automatically so you don’t need to include the mappings in your call to xpath (you will need to include them if your document is more complex with namespaces being declared on non-root elements). Also the default namespace is registered with the prefix xmlns, so you can use that in your XPath.
Here is one way to do this:
require 'nokogiri'
@doc = Nokogiri::XML.parse <<-eotl
<?xml version="1.0" encoding="UTF-8"?>
<rspec xmlns="URL1"
xmlns:add="URL2">
<node>
<price add:cars="90000"/>
</node>
</rspec>
eotl
@doc.remove_namespaces!
@doc.at_xpath('//price/@cars').text
# => "90000"
Or, if you want to keep the namespaces as they are, skip remove_namespaces! and use:
@doc.at_xpath('//xmlns:price/@add:cars').text
# => "90000"
Read this tutorial: Searching an HTML / XML Document
I'm a fan of letting Nokogiri use CSS when dealing with namespaces:
require 'nokogiri'
xml = '<?xml version="1.0" encoding="UTF-8"?>
<rspec xmlns="URL1"
xmlns:add="URL2">
<node>
<price add:cars="90000"/>
</node>
</rspec>
'
doc = Nokogiri::XML(xml)
doc.at('price', 'add')['add:cars']
# => "90000"
Related
I'm parsing some XML that I get from various feeds. Apparently some of the XML has an occasional tag that is all upper case. I'd like to normalize the XML to be all lower case tags to make searching, etc. easier.
What I want to do is something like:
parsed = Nokogiri::XML.parse(xml_content)
node = parsed.css("title") # => should return a Nokogiri node for the title tag
However, some of the XML documents have "TITLE" for that tag.
What are my options for getting that node whether its tag is "title", "TITLE", or even "Title"?
Thanks!
If you want to transform your xml document by downcase'ing all tag names, here's one way to do it:
parsed = Nokogiri::XML.parse(xml_content)
parsed.traverse do |node|
node.name = node.name.downcase if node.kind_of?(Nokogiri::XML::Element)
end
As a general approach you could transform all element (tag) names to lower case (e.g. by using XSLT or another solution) and then do all of your XPath/CSS queries using lower case only.
This XSLT solution should work; however, my version of Ruby (2.0.0p481) and/or Nokogiri (1.5.6) complains about it. The likely cause is the lower-case(...) function: it belongs to XPath 2.0, and Nokogiri's XSLT support is built on libxslt, which implements XSLT 1.0 only.
Here's a solution that seems to work:
require 'nokogiri'
xslt = Nokogiri::XSLT(File.read('lower.xslt'))
# <?xml version="1.0" encoding="UTF-8"?>
# <xsl:transform version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
# <xsl:variable name="lowercase" select="'abcdefghijklmnopqrstuvwxyz'" />
# <xsl:variable name="uppercase" select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ'" />
# <xsl:template match="*">
# <xsl:element name="{translate(local-name(), $uppercase, $lowercase)}">
# <xsl:apply-templates />
# </xsl:element>
# </xsl:template>
# </xsl:transform>
doc = Nokogiri::XML(File.read('doc.xml'))
# <?xml version="1.0" encoding="UTF-8"?>
# <FOO>
# <BAR>Bar</BAR>
# <GAH>Gah</GAH>
# <ZIP><DOO><DAH/></DOO></ZIP>
# </FOO>
puts xslt.transform(doc)
# <?xml version="1.0"?>
# <foo>
# <bar>Bar</bar>
# <gah>Gah</gah>
# <zip><doo><dah/></doo></zip>
# </foo>
I have an XML file that I need to parse. I have no control over the format of the file and cannot change it.
The file makes use of a prefix (call it a), but it doesn't define a namespace for that prefix anywhere. I can't seem to use xpath to query for nodes with the a namespace.
Here's the contents of the xml document
<?xml version="1.0" encoding="UTF-8"?>
<a:root>
<a:thing>stuff0</a:thing>
<a:thing>stuff1</a:thing>
<a:thing>stuff2</a:thing>
<a:thing>stuff3</a:thing>
<a:thing>stuff4</a:thing>
<a:thing>stuff5</a:thing>
<a:thing>stuff6</a:thing>
<a:thing>stuff7</a:thing>
<a:thing>stuff8</a:thing>
<a:thing>stuff9</a:thing>
</a:root>
I am using Nokogiri to query the document:
doc = Nokogiri::XML(open('text.xml'))
things = doc.xpath('//a:thing')
This fails with the following error:
Nokogiri::XML::XPath::SyntaxError: Undefined namespace prefix: //a:thing
From my research, I found out that I could specify the namespace for the prefix in the xpath method:
things = doc.xpath('//a:thing', a: 'nobody knows')
This returns an empty array.
What would be the best way for me to get the nodes that I need?
The problem is that the namespace is not properly defined in the XML document. As a result, Nokogiri sees the node names as being "a:root" instead of "a" being a namespace and "root" being the node name:
xml = %Q{
<?xml version="1.0" encoding="UTF-8"?>
<a:root>
<a:thing>stuff0</a:thing>
<a:thing>stuff1</a:thing>
</a:root>
}
doc = Nokogiri::XML(xml)
puts doc.at_xpath('*').node_name
#=> "a:root"
puts doc.at_xpath('*').namespace
#=> ""
Solution 1 - Specify node name with colon
One solution is to search for nodes with the name "a:thing". You cannot do //a:thing since the XPath will treat the "a" as a namespace. You can get around this by doing //*[name()="a:thing"]:
xml = %Q{
<?xml version="1.0" encoding="UTF-8"?>
<a:root>
<a:thing>stuff0</a:thing>
<a:thing>stuff1</a:thing>
</a:root>
}
doc = Nokogiri::XML(xml)
things = doc.xpath('//*[name()="a:thing"]')
puts things
#=> <a:thing>stuff0</a:thing>
#=> <a:thing>stuff1</a:thing>
Solution 2 - Modify the XML document to define the namespace
An alternative solution is to modify the XML file that you get to properly define the namespace. The document will then behave with namespaces as expected:
xml = %Q{
<?xml version="1.0" encoding="UTF-8"?>
<a:root>
<a:thing>stuff0</a:thing>
<a:thing>stuff1</a:thing>
</a:root>
}
xml.gsub!('<a:root>', '<a:root xmlns:a="foo">')
doc = Nokogiri::XML(xml)
things = doc.xpath('//a:thing')
puts things
#=> <a:thing>stuff0</a:thing>
#=> <a:thing>stuff1</a:thing>
I am switching from LibXML to Nokogiri. I have a method in my code that checks whether an XML document matches a DTD. The DTD is read from a database (as a string).
This is an example within an irb session:
require 'xml'
doc = LibXML::XML::Document.string('<foo bar="baz" />') #=> <?xml version="1.0" encoding="UTF-8"?>
dtd = LibXML::XML::Dtd.new('<!ELEMENT foo EMPTY><!ATTLIST foo bar ID #REQUIRED>') #=> #<LibXML::XML::Dtd:0x000000026f53b8>
doc.validate dtd #=> true
As I understand it, Nokogiri::XML::Document#validate can only check against a DTD declared within the document. How would I achieve the same result?
I think what you need is internal_subset:
require 'nokogiri'
doc = Nokogiri::HTML("<!DOCTYPE html>")
# then you can get the info you want
doc.internal_subset # Nokogiri::XML::DTD
# for example you can get name, system_id, external_id, etc
doc.internal_subset.name
doc.internal_subset.system_id
doc.internal_subset.external_id
Here is a full doc of Nokogiri::XML::DTD.
Thanks
Given the following XML, which has been parsed into @response using Nokogiri:
<?xml version="1.0" encoding="UTF-8"?>
<foos type="array">
<foo>
<id type="integer">1</id>
<name>bar</name>
</foo>
</foos>
Does an XPath exist such that @response.xpath(xpath) returns array?
Assume that this xpath must be reused across multiple documents where the naming of foo is inconsistent.
If an xpath is not the correct tool to solve this problem, does Nokogiri provide a method that is?
This xml is automatically generated by the rails framework, and the answer to this question is intended to be used to create an XML equivalent to this Cucumber feature for JSON responses.
If you want to select the root node when its type attribute is array (regardless of the root element's name), then use this:
/*[@type='array']
For its children, use:
/*[@type='array']/*
Simply:
if doc.root['type']=='array'
Here's a test case:
@response = <<ENDXML
<?xml version="1.0" encoding="UTF-8"?>
<foos type="array">
<foo>
<id type="integer">1</id>
<name>bar</name>
</foo>
</foos>
ENDXML
require 'nokogiri'
doc = Nokogiri.XML(@response)
if doc.root['type']=='array'
puts "It is!"
else
puts "Nope"
end
Depending on your needs, you might want to:
case doc.root['type']
when 'array'
#...
when 'string'
#...
else
#...
end
I need to parse an XML document for its stylesheet declaration:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="/templates/xslt/inspections/disclaimer_en.xsl"?>
Using Nokogiri I tried:
doc.search("?xml-stylesheet").first['href']
but I get the error:
`on_error': unexpected '?' after '' (Nokogiri::CSS::SyntaxError)
Nokogiri's CSS search cannot match XML processing instructions. You may access them through the document's children, like this:
doc.children[0]
This is not an XML element; this is an XML "Processing Instruction". That is why you could not find it with your query. To find it you want:
# Find the first xml-stylesheet PI
xss = doc.at_xpath('//processing-instruction("xml-stylesheet")')
# Find every xml-stylesheet PI
xsss = doc.xpath('//processing-instruction("xml-stylesheet")')
Seen in action:
require 'nokogiri'
xml = <<ENDXML
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="/templates/disclaimer_en.xsl"?>
<root>Hi Mom!</root>
ENDXML
doc = Nokogiri.XML(xml)
xss = doc.at_xpath('//processing-instruction("xml-stylesheet")')
puts xss.name #=> xml-stylesheet
puts xss.content #=> type="text/xsl" href="/templates/disclaimer_en.xsl"
Since a Processing Instruction is not an Element, it does not have attributes; you cannot, for example, ask for xss['type'] or xss['href']; you will need to parse the content as an element if you wish this. One way to do this is:
class Nokogiri::XML::ProcessingInstruction
def to_element
document.parse("<#{name} #{content}/>")
end
end
p xss.to_element['href'] #=> "/templates/disclaimer_en.xsl"
Note that there exists a bug in Nokogiri or libxml2 which will cause the XML Declaration to appear in the document as a Processing Instruction if there is at least one character (can be a space) before <?xml. This is why in the above we search specifically for processing instructions with the name xml-stylesheet.
Edit: The XPath expression processing-instruction()[name()="foo"] is equivalent to the expression processing-instruction("foo"). As described in the XPath 1.0 spec:
The processing-instruction() test may have an argument that is Literal; in this case, it is true for any processing instruction that has a name equal to the value of the Literal.
I've edited the answer above to use the shorter format.