how can I get some attributes when using Nokogiri - ruby

Here is my xml string, I want to get the English result "B"
<body>
<attributes>
<attr name="Math" namespace="">A</attr>
<attr name="English" namespace="" parentName="">B</attr>
</attributes>
</body>
Here is how I parse the xml string
result = Nokogiri::XML(xml)
res = Hash.from_xml(result.to_s)
But when I use res["body"]["attributes"], I only get the result set
=> {"attr"=>["A", "B"]}
Because the order of attr is not regular,
So how can I judge what the English result really is ?
Thanks!

You can use css selector:
result.css("attr[name='English']").children.to_s
will give you "B"

Related

Get low level xpath from XML with Nokogiri

I'm trying to store in an array all the unique Xpaths of the low level elements in the XML below, but like I'm doing in array a is being stored all the XML, not only the Xpath themselves. The XML has different levels of Xpath. I mean, some child elements only have 2 ancestors and others more than one.
This is the code I have.
require 'nokogiri'
doc = Nokogiri::XML(<<EOT)
<?xml version="1.0" encoding="UTF-8"?>
<items>
<item>
<name>Cake</name>
<ppu>0.55</ppu>
<batters>
<batter>Regular</batter>
<batter>Chocolate</batter>
<batter>Blueberry</batter>
<batter>Devil's Food</batter>
</batters>
<topping>None</topping>
<topping>Glazed</topping>
<topping>Sugar</topping>
<topping>Powdered Sugar</topping>
<topping>Chocolate with Sprinkles</topping>
<topping>Chocolate</topping>
<topping>Maple</topping>
</item>
<item>
<name>Raised</name>
<ppu>0.55</ppu>
<batters>
<batter>Regular</batter>
</batters>
<topping>None</topping>
<topping>Glazed</topping>
<topping>Sugar</topping>
<topping>Chocolate</topping>
<topping>Maple</topping>
</item>
</items>
EOT
a = []
a = doc.xpath("//*")
puts a
I'd like to store in array "a" only the unique xpaths as below:
/items/item/name
/items/item/ppu
/items/item/batters/batter
/items/item/topping
Maybe somebody could help me in how to do this.
Thanks for the help.
What you want to select is the "leaf" nodes. You can do it like so:
doc.xpath("//*[not(*)]")
This means "select all elements that don't contain elements".
If you want the XPaths, you'll need to call .path on each node. But the paths provided by Nokogiri have explicit positions (e.g. /items/item[2]/topping[4]), so you'll have to apply a regex to remove them, then remove duplicates with uniq:
doc.xpath("//*[not(*)]").map {|leaf| leaf.path.gsub(/\[.*?\]/, '') }.uniq
Output:
/items/item/name
/items/item/ppu
/items/item/batters/batter
/items/item/topping

Get attribute value from XML

I have this chunk of XML:
<show name="Are We There Yet?">
<sid>24588</sid>
<network>TBS</network>
<title>The Kwandanegaba Children's Fund Episode</title>
<ep>03x31</ep>
<link>
http://www.tvrage.com/shows/id-24588/episodes/1065228407
</link>
</show>
I am trying to get "Are we there yet?" via Nokogiri. It is effectively the 'name' attribute of 'show'. I'm struggling to figure out how to parse this.
xml.at_css('show').value was my best guess but doesn't work.
You can use the following:
xml.at('//show/#name').text
which is XPath expression that returns the name attribute from the show element.
Use:
require 'nokogiri'
xml =<<EOT
<show name="Are We There Yet?">
<sid>24588</sid>
<network>TBS</network>
<title>The Kwandanegaba Children's Fund Episode</title>
<ep>03x31</ep>
<link>
http://www.tvrage.com/shows/id-24588/episodes/1065228407
</link>
</show>
EOT
xml = Nokogiri::XML(xml)
puts xml.at('show')['name']
=> Are We There Yet?
at accepts either CSS or XPath expressions, so feel free to use it for both. Use at_css or at_xpath if you know you need to declare the expression as CSS or XPath, respectively. at returns a Node, so you can simply reference the parameters of the node like you would a hash.

XQuery ancestor axis doesn't work, but explicit XPath does

Consider the following XML snippet:
<doc>
<chapter id="1">
<item>
<para>some text here</para>
</item>
</chapter>
</doc>
In XQuery, I have a function that needs to do some things based on the ancestor chapter of a given "para" element that is passed in as a parameter, as shown in the stripped down example below:
declare function doSomething($para){
let $chapter := $para/ancestor::chapter
return "some stuff"
};
In that example, $chapter keeps coming up empty. However, if I write the function similar to the follwing (i.e., without using the ancestor axis), I get the desired "chapter" element:
declare function doSomething($para){
let $chapter := $para/../..
return "some stuff"
};
The problem is that I cannot use explicit paths as in the latter example because the XMl I will be searching is not guaranteed to have the "chapter" element as a grandparent every time. It may be a great-grandparent or great-great-grandparent, and so on, as shown below:
<doc>
<chapter id="1">
<item>
<subItem>
<para>some text here</para>
</subItem>
</item>
</chapter>
</doc>
Does anyone have an explanation as to why the axis doesn't work, while the explicit XPath does? Also, does anyone have any suggestions on how to solve this problem?
Thank you.
SOLUTION:
The mystery is now solved.
The node in question was re-created in another function, which had the result of stripping it of all of its ancestor information. Unfortunately, the previous developer did not document this wonderful, little function and has cost us all a good deal of time.
So, the ancestor axis worked exactly as it should - it was just being applied to a deceptive node.
I thank all of you for your efforts in answering my questions.
The ancestor axis does work fine. I suspect your problem is namespaces. The example you showed and that I ran (below) has XML without any namespaces. If your XML have a namespace then you would need to provide that in the ancestor XPath, like this: $para/ancestor:foo:chapter where in this case the prefix _foo_ is bound to the correct namespace for the chapter element.
let $doc := <doc>
<chapter id="1">
<item>
<para>some text here</para>
</item>
</chapter>
</doc>
let $para := $doc//para
return $para/ancestor::chapter
RESULT:
<?xml version="1.0" encoding="UTF-8"?>
<chapter id="1">
<item>
<para>some text here</para>
</item>
</chapter>
These things almost always boil down to namespaces! As a daignostic to confirm 100% that namespace are not the issue, can you try:
declare function local:doSomething($para) {
let $chapter := $para/ancestor::*[local-name() = 'chapter']
return $chapter
};
This seems surprising to me; which XQuery implementation are you using? With BaseX, the following query...
declare function local:doSomething($para) {
let $chapter := $para/ancestor::chapter
return $chapter
};
let $xml :=
<doc>
<chapter id="1">
<item>
<para>some text here</para>
</item>
</chapter>
</doc>
return local:doSomething($xml//para)
...returns...
<chapter id="1">
<item>
<para>some text here</para>
</item>
</chapter>
I suspect namespaces too. If $para/../.. works but $para/parent::item/parent::chapter turns up empty, then you know it's a question of namespaces.
Look for an xmlns declaration at the top of your content, e.g.:
<doc xmlns="http://example.com">
...
</doc>
In your XQuery, you then need to bind that namespace to a prefix and use that prefix in your XQuery/XPath expressions, like this:
declare namespace my="http://example.com";
declare function doSomething($para){
let $chapter := $para/ancestor::my:chapter
return "some stuff"
};
What prefix you use doesn't matter. The important thing is that the namespace URI (http://example.com in the above example) matches up.
It makes sense that ../.. selects the element you want, because .. is short for parent::node() which selects the parent node regardless of its name (or namespace). Whereas ancestor::chapter will only select <chapter> elements that are not in a namespace (unless you have declared a default element namespace, which is usually not a good idea in XQuery because it affects both your input and your output).

XmlNodeList SelectNodes trouble

I'm trying to parse an xml file
My code looks like:
string path2 = "xmlFile.xml";
XmlDocument xDoc = new XmlDocument();
xDoc.Load(path2);
XmlNodeList xnList = xDoc.DocumentElement["feed"].SelectNodes("entry");
But can't seem to get the listing of nodes. I get the error message- "Use the 'new' keyword to create an object instance." and it seems to be on 'SelectNodes("entry")'
This code worked when I loaded the xml from an rss feed, but not a local file. Can you tell me what I'm doing wrong?
My xml looks like:
<?xml version="1.0"?>
<feed xmlns:media="http://search.yahoo.com/mrss/" xmlns:gr="http://www.google.com/schemas/reader/atom/" xmlns:idx="urn:atom-extension:indexing" xmlns="http://www.w3.org/2005/Atom" idx:index="no" gr:dir="ltr">
<entry gr:crawl-timestamp-msec="1318667375230">
<title type="html">Title 1 text</title>
<summary>summary 1 text text text</summary>
</entry>
<entry gr:crawl-timestamp-msec="1318667375230">
<title type="html">title 2 text</title>
<summary>summary 2 text text text</summary>
</entry>
</feed>
Take the namespace into acount:
XmlNamespaceManager mgr = new XmlNamespaceManager(XDoc.NameTable);
mgr.AddNamespace("atom", "http://www.w3.org/2005/Atom");
XmlNodeList xnList = xDoc.SelectNodes("//atom:entry", mgr);
This is the infamous most FAQ about XPath -- referring to the names of elements that are in a default namespace.
Short answer: search for "XPath default namespace" and understand the problem.
Then use an XmlNamespaceManager instance to add an association between a prefix (say "x") and the default namespace (in your case "http://www.w3.org/2005/Atom").
Finally, replace any Name with x:Name in your XPath expression.

Using ruby/nokogiri to transform xml to another xml

I've never encountered task of transforming XML from one form to another. I hear that XSLT is just for that, but I don't want to go there. So, using only ruby and nokogiri, how can I:
remove all item elements but time from initial XML and also rename element time to HammerTime?
Initial XML:
...
<item>
<time>05.04.2011 9:53:23</time>
<iddqd>42</iddqd>
<idkfa>woot</idkfa>
</item>
<item>
...
Desired result:
...
<item>
<HammerTime>05.04.2011 9:53:23</HammerTime>
</item>
<item>
...
I figured out how to put data from XML to array using nokogiri's .xpath, but is there a way to make the desired transformation into another XML without manually having to write something like puts "<HammerTime>#{array['time']}</HammerTime>"?
Here you go:
require 'rubygems'
require 'nokogiri'
doc = Nokogiri::HTML <<-EOHTML
<html>
<body>
<item>
<time>05.04.2011 9:53:23</time>
<iddqd>42</iddqd>
<idkfa>woot</idkfa>
</item>
</body>
</html>
EOHTML
hammer = doc.at_css "time"
hammer.name = 'hammertime'
doc.css("iddqd").remove
doc.css("idkfa").remove
outfile = File.new("output.html", "w")
outfile.puts doc.to_html
outfile.close
What do you mean with
into another XML without manually having to write something like puts "<HammerTime>#{array['time']}</HammerTime>"?
If you want to transform an XML element into another in a language-independent way, you can use XSLT transformations (or stylesheet). Once you have your XSLT file you can apply it with Nokogiri's Nokogiri::XSLT::Stylesheet#apply_to.

Resources