I am trying to get the following working, but it fails to return the expected result. It is similar to questions I have asked before, except that here I am limited to XPath 1.0.
I want to use XPath to get the first text node inside the "subtitle" node. The XML is as follows:
<topic class="coverPage">
<subtitle id="IDb2907ca1-51fe-472e-bf99-246126937eab">
<xt:delText xt:action="start" xt:author="James Doherty" xt:dateTime="2016-01-27T17:07:00" xt:id="fb72fba6-f502-422e-9e91-1731ed007e98"/>
Ignore
<xt:delText xt:action="end" xt:id="fb72fba6-f502-422e-9e91-1731ed007e98"/>
Sub-title
<xt:insText xt:action="start" xt:author="James Doherty" xt:dateTime="2016-01-27T14:55:00" xt:id="44ac82c2-acfc-4721-b962-20ac2b18d9f3"/>
Insert Additional Text
<xt:insText xt:action="end" xt:id="44ac82c2-acfc-4721-b962-20ac2b18d9f3"/>
Extra Text
</subtitle>
</topic>
An alternative XML is also provided below:
<topic class="coverPage">
Sub-title
<subtitle id="IDb2907ca1-51fe-472e-bf99-246126937eab">
<xt:delText xt:action="start" xt:author="James Doherty" xt:dateTime="2016-01-27T17:07:00" xt:id="fb72fba6-f502-422e-9e91-1731ed007e98"/>
Ignore
<xt:delText xt:action="end" xt:id="fb72fba6-f502-422e-9e91-1731ed007e98"/>
<xt:insText xt:action="start" xt:author="James Doherty" xt:dateTime="2016-01-27T14:55:00" xt:id="44ac82c2-acfc-4721-b962-20ac2b18d9f3"/>
Insert Additional Text
<xt:insText xt:action="end" xt:id="44ac82c2-acfc-4721-b962-20ac2b18d9f3"/>
Extra Text
</subtitle>
</topic>
I have tried the following but with no luck:
/topic[@class='coverPage']/*[local-name()='subtitle']/text()[1]|/topic[@class='coverPage']/*[local-name()='subtitle']/*[substring(local-name(), string-length(local-name())-string-length('Text')+1)='Text'][@*[local-name()='action']='end'][1]/following-sibling::text()[1]
I believe the issue is with the attribute value "action" and it having a namespace. The expected results would be "Sub-title". Any ideas on how I can get this to work?
The XML is missing the XML namespace declaration, as kindly pointed out by @KeithHall in the comments on the original post.
<topic class="coverPage" xmlns:xt="http://stackoverflow.com/questions/35063599/xpath-filter-attribute-with-namespace">
<subtitle id="IDb2907ca1-51fe-472e-bf99-246126937eab">
<xt:delText xt:action="start" xt:author="James Doherty" xt:dateTime="2016-01-27T17:07:00" xt:id="fb72fba6-f502-422e-9e91-1731ed007e98"/>
Ignore
<xt:delText xt:action="end" xt:id="fb72fba6-f502-422e-9e91-1731ed007e98"/>
Sub-title
<xt:insText xt:action="start" xt:author="James Doherty" xt:dateTime="2016-01-27T14:55:00" xt:id="44ac82c2-acfc-4721-b962-20ac2b18d9f3"/>
Insert Additional Text
<xt:insText xt:action="end" xt:id="44ac82c2-acfc-4721-b962-20ac2b18d9f3"/>
Extra Text
</subtitle>
</topic>
To filter the XML based on an attribute value, with namespace, the following xpath is used:
[@*[local-name()='action']='end']
The full xpath to achieve the expected result is below, unchanged from the OP.
/topic[@class='coverPage']/*[local-name()='subtitle']/text()[1]|/topic[@class='coverPage']/*[local-name()='subtitle']/*[substring(local-name(), string-length(local-name())-string-length('Text')+1)='Text'][@*[local-name()='action']='end'][1]/following-sibling::text()[1]
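For comparison, here is a minimal Python sketch of the second branch of that expression (the text node right after the first "...Text" element whose action attribute is "end"). It is only an illustration under some assumptions: it uses the standard-library xml.etree.ElementTree module, which does not support local-name() or following-sibling, so the filtering is done in plain Python through the element's tail text. The function name and the shortened xt:id values are made up for the example.

```python
import xml.etree.ElementTree as ET

# Namespace URI taken from the corrected XML above.
XT_NS = "http://stackoverflow.com/questions/35063599/xpath-filter-attribute-with-namespace"

# Same structure as the sample document; xt:id values shortened for readability.
xml_text = f"""<topic class="coverPage" xmlns:xt="{XT_NS}">
<subtitle id="IDb2907ca1-51fe-472e-bf99-246126937eab">
<xt:delText xt:action="start" xt:id="fb72fba6"/>
Ignore
<xt:delText xt:action="end" xt:id="fb72fba6"/>
Sub-title
<xt:insText xt:action="start" xt:id="44ac82c2"/>
Insert Additional Text
<xt:insText xt:action="end" xt:id="44ac82c2"/>
Extra Text
</subtitle>
</topic>"""


def text_after_first_end_marker(xml_source):
    """Return the text following the first *Text child with action='end'."""
    root = ET.fromstring(xml_source)
    if root.tag != "topic" or root.get("class") != "coverPage":
        return None
    subtitle = root.find("subtitle")
    if subtitle is None:
        return None
    for child in subtitle:
        # ElementTree stores names as "{namespace}local"; keep the local part.
        local_name = child.tag.rsplit("}", 1)[-1]
        # Namespaced attributes are keyed the same way: "{namespace}action".
        if local_name.endswith("Text") and child.get(f"{{{XT_NS}}}action") == "end":
            # The text node that follows an element is its "tail".
            return (child.tail or "").strip()
    return None


result = text_after_first_end_marker(xml_text)
print(result)
```

The key point is the same as in the XPath: the attribute is matched by its namespace-qualified name, not by the bare string "action".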
Here's my xml,
<w:tc>
<w:p>
<w:pPr></w:pPr>
<w:r></w:r>
</w:p>
</w:tc>
<w:tc>
<w:p>
<w:pPr></w:pPr>
</w:p>
</w:tc>
I want to match the w:pPr inside a w:p (itself inside a w:tc) that has no following-sibling w:r. Precisely, I want the one in the second w:tc. Here is the code I have tried:
<xsl:template match="w:pPr[ancestor::w:p[ancestor::w:tc] and not(following-sibling::w:r)]">
I need an XPath for a w:pPr that has no following sibling.
The problem is when w:pPr is followed by w:hyperlink. Now I have ignored w:hyperlink too.
If you want to match a w:pPr that has no following sibling elements at all (regardless of name), then just use a match pattern of
w:pPr[ancestor::w:p[ancestor::w:tc] and not(following-sibling::*)]
or equivalently (and slightly shorter)
w:tc//w:p//w:pPr[not(following-sibling::*)]
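As a rough cross-check outside XSLT, the condition "w:pPr has no following sibling element" is the same as "w:pPr is the last child element of its parent". The Python sketch below illustrates that with the standard library. It rests on assumptions not in the question: the w:tr wrapper element is hypothetical (added so the fragment has a single root), the w prefix is bound to the WordprocessingML namespace, and w:pPr is taken to be a direct child of w:p as in the sample.

```python
import xml.etree.ElementTree as ET

# Assumed binding for the "w" prefix (the question does not show it).
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}

# The two cells from the question, wrapped in a hypothetical w:tr root.
xml_text = f"""<w:tr xmlns:w="{W}">
  <w:tc><w:p><w:pPr/><w:r/></w:p></w:tc>
  <w:tc><w:p><w:pPr/></w:p></w:tc>
</w:tr>"""


def cells_where_ppr_is_last(row):
    """Return 1-based positions of w:tc cells whose w:p ends with w:pPr."""
    hits = []
    for position, tc in enumerate(row.findall("w:tc", NS), start=1):
        for p in tc.findall("w:p", NS):
            children = list(p)
            # No following sibling element == pPr is the last child element.
            if children and children[-1].tag == f"{{{W}}}pPr":
                hits.append(position)
    return hits


matching_cells = cells_where_ppr_is_last(ET.fromstring(xml_text))
print(matching_cells)
```

Only the second cell qualifies, which is the one the question asks for.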
Using XPath is simple and straightforward: you only have to filter elements. Your filtering can be based on the content of the element (using [] with a path inside the brackets). You can then work with the filtered elements the same way as with the XML tree (filter again or select the final elements).
In your case, first you have to choose the correct tc element (filter the element as you need):
Based on the count of elements: //tc[count(./p/*) = 1], or
Based on non existing r element: //tc[not(./p/r)], or
Based on non existing r and hyperlink element: //tc[not(./p/r) and not(./p/hyperlink)]
Based on an existing pPr and a non-existing r (not strictly necessary, because the pPr is filtered in the second step): //tc[./p/pPr and not(./p/r)]
It returns the following XML.
<tc>
<p>
<pPr>pPr</pPr>
</p>
</tc>
Then simply say what you want from the new XML:
Do you want the pPr element? Use: /p/pPr
All together:
//tc[count(./p/*) = 1]/p/pPr
or
//tc[not(./p/r)]/p/pPr
Note: // means find the element anywhere in the document.
Update 1: Hyperlink condition added.
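The two-step idea (first filter the tc elements, then select pPr from what is left) can be sketched in Python as well. This is only an illustration: the tbl root element and the third cell containing a hyperlink are hypothetical additions, and because xml.etree.ElementTree does not support not() or count() in its path syntax, the filter is written as an ordinary Python condition.

```python
import xml.etree.ElementTree as ET

# Un-prefixed sample; the <tbl> root and the third cell are made up.
xml_text = """<tbl>
  <tc><p><pPr>first</pPr><r/></p></tc>
  <tc><p><pPr>pPr</pPr></p></tc>
  <tc><p><pPr>third</pPr><hyperlink/></p></tc>
</tbl>"""

root = ET.fromstring(xml_text)

# Step 1: keep only tc elements whose p has neither r nor hyperlink.
filtered_cells = [
    tc
    for tc in root.iter("tc")
    if tc.find("p/r") is None and tc.find("p/hyperlink") is None
]

# Step 2: select p/pPr from the filtered cells.
selected = [ppr for tc in filtered_cells for ppr in tc.findall("p/pPr")]
texts = [ppr.text for ppr in selected]
print(texts)
```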
I'm constructing a DTD which has a fuel_system element.
I want to restrict the text between <fuel_system> tag. It must be only carbureted or fuel-injected. How can I do this?
I don't mention something like this = > attribute type (carbureted, fuel-injected), because I want to force this rule in <fuel_system> tags, not the attribute of fuel_system.
When defining an element in a DTD, there is no way to restrict the text inside the element. You can only say which child elements it may contain and in what order, or that the element contains text, or a mixture of the two.
So you basically have two options for restricting <fuel-system>: either declare the value as an attribute (<fuel-system type="fuel-injected"/>), or declare child elements <fuel-injected> and <carburated>. The choice between the two depends on what you are trying to describe and what will change depending on the type of fuel system.
(the grammar for the declaration of an element is defined here)
examples: first option, attributes
<!ELEMENT fuel-system EMPTY>
<!ATTLIST fuel-system type (fuel-injected|carburated) #REQUIRED>
second option, child elements
<!ELEMENT fuel-system (fuel-injected|carburated)>
<!ELEMENT fuel-injected ...>
<!ELEMENT carburated ...>
Does it have to be a DTD? Is XML Schema an option?
Using XML Schema you can restrict element text to an enumerated list of values:
<xs:element name="fuel-system">
<xs:simpleType>
<xs:restriction base="xs:string">
<xs:enumeration value="fuel-injected"/>
<xs:enumeration value="carbureted"/>
</xs:restriction>
</xs:simpleType>
</xs:element>
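If neither DTD nor XML Schema validation is available, the same enumeration rule could also be checked in application code. The following Python sketch is just one possible way to do that with the standard library; the function name, the cars/car wrapper elements and the "diesel" value are hypothetical, and the two allowed values are the ones named in the question.

```python
import xml.etree.ElementTree as ET

# Allowed element text, as stated in the question.
ALLOWED_FUEL_SYSTEMS = {"fuel-injected", "carbureted"}


def invalid_fuel_systems(xml_source, tag="fuel_system"):
    """Return the text of every <fuel_system> whose value is not allowed."""
    root = ET.fromstring(xml_source)
    return [
        (element.text or "").strip()
        for element in root.iter(tag)
        if (element.text or "").strip() not in ALLOWED_FUEL_SYSTEMS
    ]


# Hypothetical document: one valid value and one invalid value.
sample = """<cars>
  <car><fuel_system>carbureted</fuel_system></car>
  <car><fuel_system>diesel</fuel_system></car>
</cars>"""

bad_values = invalid_fuel_systems(sample)
print(bad_values)
```

An empty list means every fuel_system element passed the check.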
I am using Ruby to retrieve an XML document with the following format:
<project>
<users>
<person>
<name>LUIS</name>
</person>
<person>
<name>JOHN</name>
</person>
</users>
</project>
I want to know how to produce the following result, with the tags concatenated:
<project>
<users>
<person>
<name>LUIS JOHN</name>
</person>
</users>
</project>
Here is the code I am using:
file = File.new( "proyectos.xml" )
doc3 = Nokogiri::XML(file)
a=0
@participa = doc3.search("person")
@participa.each do |i|
@par = @participa.search("name").map { |node| node.children.text }
@par.each do |i|
puts @par[a]
puts '--'
a = a + 1
end
end
Rather than supply code, here's how to fish:
To parse your XML into Nokogiri, which I recommend highly:
require 'nokogiri'
doc = Nokogiri::XML(<<EOT)
<project>
<users>
<person>
<name>LUIS</name>
</person>
<person>
<name>JOHN</name>
</person>
</users>
</project>
EOT
That gives you a doc variable which is the DOM as a Nokogiri::XML::Document. From that you can search, either for matching nodes or a particular node. search allows you to pass an XPath or CSS accessor to locate what you are looking for. I recommend CSS for most things because it is more readable, but XPath has some great tools to dig into the structure of your XML, so often I end up with both in my code.
So, doc.at('users') is the CSS accessor to find the first users node. doc.search('person') will return all nodes matching the person tag as a NodeSet, which is basically an array which you can enumerate or loop over.
Nokogiri has a text method for a node that lets you get the text content of that node, including all the carriage-returns between nodes that would normally be considered formatting in the XML as it flows down the document. When you have the text of the node, you can apply the normal Ruby string processing methods, such as strip or chomp (or squish, if you have Rails' ActiveSupport loaded), to massage the text into a more usable format.
Nokogiri also has a children= method which lets you redefine the child nodes of a node. You can pass in a node you've created, a NodeSet, or even the text you want rendered into the XML at that point.
In a quick experiment, I have code that does what you want in basically four lines. But, I want to see your work before I share what I wrote.
Finally, puts doc.to_xml will let you easily see if your changes to the document were successful.
Here's how I'd do it:
require 'nokogiri'
doc = Nokogiri::XML(<<EOT)
<project>
<users>
<person>
<name>LUIS</name>
</person>
<person>
<name>JOHN</name>
</person>
</users>
</project>
EOT
The XML is parsed into a DOM now. Search for the users tags, then locate the embedded name tags and extract the text from them. Join the results into a single space-delimited string. Then replace the children of the users tag with the desired results:
doc.search('users').each do |users|
user_names = users.search('name').map(&:text).join(' ')
users.children = "<person><name>#{ user_names }</name></person>"
end
If you output the resulting XML you'll get:
puts doc.to_xml
<?xml version="1.0"?>
<project>
<users><person><name>LUIS JOHN</name></person></users>
</project>
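For readers without Nokogiri at hand, the same three steps (collect the name texts, join them, replace the children of users) can be sketched with Python's standard library. This is a comparison sketch only, not a translation of Nokogiri's API; it assumes every name element has non-empty text, as in the sample.

```python
import xml.etree.ElementTree as ET

xml_text = """<project>
  <users>
    <person>
      <name>LUIS</name>
    </person>
    <person>
      <name>JOHN</name>
    </person>
  </users>
</project>"""

root = ET.fromstring(xml_text)

for users in root.iter("users"):
    # Collect and join all name texts before touching the tree.
    joined_names = " ".join(name.text.strip() for name in users.iter("name"))
    # Remove the existing person children (iterate over a copy).
    for person in list(users):
        users.remove(person)
    # Add a single person/name pair holding the joined text.
    merged_person = ET.SubElement(users, "person")
    ET.SubElement(merged_person, "name").text = joined_names

result = ET.tostring(root, encoding="unicode")
print(result)
```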
I'm trying to parse an xml file
My code looks like:
string path2 = "xmlFile.xml";
XmlDocument xDoc = new XmlDocument();
xDoc.Load(path2);
XmlNodeList xnList = xDoc.DocumentElement["feed"].SelectNodes("entry");
But can't seem to get the listing of nodes. I get the error message- "Use the 'new' keyword to create an object instance." and it seems to be on 'SelectNodes("entry")'
This code worked when I loaded the xml from an rss feed, but not a local file. Can you tell me what I'm doing wrong?
My xml looks like:
<?xml version="1.0"?>
<feed xmlns:media="http://search.yahoo.com/mrss/" xmlns:gr="http://www.google.com/schemas/reader/atom/" xmlns:idx="urn:atom-extension:indexing" xmlns="http://www.w3.org/2005/Atom" idx:index="no" gr:dir="ltr">
<entry gr:crawl-timestamp-msec="1318667375230">
<title type="html">Title 1 text</title>
<summary>summary 1 text text text</summary>
</entry>
<entry gr:crawl-timestamp-msec="1318667375230">
<title type="html">title 2 text</title>
<summary>summary 2 text text text</summary>
</entry>
</feed>
Take the namespace into account:
XmlNamespaceManager mgr = new XmlNamespaceManager(xDoc.NameTable);
mgr.AddNamespace("atom", "http://www.w3.org/2005/Atom");
XmlNodeList xnList = xDoc.SelectNodes("//atom:entry", mgr);
This is the most frequently asked question about XPath: referring to the names of elements that are in a default namespace.
Short answer: search for "XPath default namespace" and understand the problem.
Then use an XmlNamespaceManager instance to add an association between a prefix (say "x") and the default namespace (in your case "http://www.w3.org/2005/Atom").
Finally, replace any Name with x:Name in your XPath expression.
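The default-namespace pitfall is not specific to .NET. As a small illustration, here is the same pattern in Python's standard library: an un-prefixed name finds nothing, while a prefix bound to the Atom namespace finds both entries. The prefix "atom" is an arbitrary choice, and the feed is a trimmed copy of the one in the question.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the feed: the default namespace is the Atom namespace.
feed_xml = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <title type="html">Title 1 text</title>
    <summary>summary 1 text text text</summary>
  </entry>
  <entry>
    <title type="html">title 2 text</title>
    <summary>summary 2 text text text</summary>
  </entry>
</feed>"""

root = ET.fromstring(feed_xml)

# Un-prefixed names mean "no namespace", so nothing matches.
unprefixed = root.findall("entry")

# Bind an arbitrary prefix to the default namespace and use it in the path.
ns = {"atom": "http://www.w3.org/2005/Atom"}
entries = root.findall("atom:entry", ns)
titles = [entry.find("atom:title", ns).text for entry in entries]

print(len(unprefixed), titles)
```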
A test sample of my xml file is shown below:
test.xml
<feed>
<entry>
<title>Link ISBN</title>
<libx:libapp xmlns:libx="http://libx.org/xml/libx2" />
</entry>
<entry>
<title>Link Something</title>
<libx:module xmlns:libx="http://libx.org/xml/libx2" />
</entry>
</feed>
Now, I want to write an xquery which will find all <entry> elements which have <libx:libapp> as a child. Then, for all such entries return the title if the title contains a given keyword (such as Link). So, in my example xml document the xquery should return "Link ISBN".
My sample xquery is shown below:
samplequery.xq (here doc_name is the xml file shown above and libapp_matchkey is a keyword such as 'Link')
declare namespace libx='http://libx.org/xml/libx2';
declare variable $doc_name as xs:string external;
declare variable $libapp_matchkey as xs:string external;
let $feeds_doc := doc($doc_name)
for $entry in $feeds_doc/feed/entry
(: test whether entry has libx:libapp child and has "Link" in its title child :)
where ($entry/libx:libapp and $entry/title/text()[contains(.,$libapp_matchkey)])
return $entry/title/text()
This xquery is returning null instead of the expected result 'Link ISBN'. Why is that?
I want to write an xquery which will find all <entry> elements which have <libx:libapp> as a child. Then, for all such entries return the title if the title contains a given keyword (such as Link).
Just use:
/*/entry[libx:libapp]/title[contains(.,'Link')]/text()
Wrapping this XPath expression in XQuery we get:
declare namespace libx='http://libx.org/xml/libx2';
/*/entry[libx:libapp]/title[contains(.,'Link')]/text()
when applied on the provided XML document:
<feed>
<entry>
<title>Link ISBN</title>
<libx:libapp xmlns:libx="http://libx.org/xml/libx2" />
</entry>
<entry>
<title>Link Something</title>
<libx:module xmlns:libx="http://libx.org/xml/libx2" />
</entry>
</feed>
the wanted, correct result is produced:
Link ISBN
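The same selection can be sketched in Python for a quick sanity check of the logic. This is only an illustration with the standard library; the variable names are made up, and the keyword is hard-coded to 'Link' as in the expression above.

```python
import xml.etree.ElementTree as ET

feed_xml = """<feed>
  <entry>
    <title>Link ISBN</title>
    <libx:libapp xmlns:libx="http://libx.org/xml/libx2" />
  </entry>
  <entry>
    <title>Link Something</title>
    <libx:module xmlns:libx="http://libx.org/xml/libx2" />
  </entry>
</feed>"""

# Prefix binding equivalent to the "declare namespace" line.
ns = {"libx": "http://libx.org/xml/libx2"}
keyword = "Link"

root = ET.fromstring(feed_xml)

# Keep entries that have a libx:libapp child and a title containing the keyword.
matching_titles = [
    entry.findtext("title")
    for entry in root.findall("entry")
    if entry.find("libx:libapp", ns) is not None
    and keyword in (entry.findtext("title") or "")
]

print(matching_titles)
```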