Find an element that only has one other kind of child - xpath

I want to use XPath to find every <blockquote> element that has at least one child <pre> element, no other kinds of child elements, and optionally text nodes as children:
<body><div><!-- arbitrary nesting -->
<blockquote><pre>YES</pre></blockquote>
<blockquote><p>NO</p></blockquote>
<blockquote><pre>NO</pre><p>NO</p></blockquote>
<blockquote><p>NO</p><pre>NO</pre></blockquote>
<blockquote><pre>YES</pre> <pre>YES</pre></blockquote>
<blockquote>NO</blockquote>
</div></body>
This XPath appears to work, but I suspect that it's overly complicated:
//blockquote[pre][not(*[not(name()="pre")])]
Is there a better (less code, more efficient, more DRY) way to select what I want?

//blockquote[pre][count(pre)=count(*)]

Use:
//blockquote[* and not(*[not(self::pre)])]
This selects all blockquote elements in the XML document that have at least one element child and don't have any element child that isn't a pre element.
This is just an application of the double negation law :).
Note that this expression can be more efficient than one that counts all element children, because evaluation can stop as soon as a non-pre child is found.
XSLT-based verification:
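The same double-negation test can be sketched outside XSLT. The following is a minimal sketch in Python, assuming the standard-library `xml.etree.ElementTree`. Its XPath subset has no `not()`, so the predicate `[* and not(*[not(self::pre)])]` is applied in plain Python. The helper names `only_pre_children` and `select_blockquotes` are illustrative only.

```python
import xml.etree.ElementTree as ET

# The sample document from the question (ElementTree's default parser drops the comment).
DOC = """<body><div><!-- arbitrary nesting -->
<blockquote><pre>YES</pre></blockquote>
<blockquote><p>NO</p></blockquote>
<blockquote><pre>NO</pre><p>NO</p></blockquote>
<blockquote><p>NO</p><pre>NO</pre></blockquote>
<blockquote><pre>YES</pre> <pre>YES</pre></blockquote>
<blockquote>NO</blockquote>
</div></body>"""


def only_pre_children(elem):
    """Mirror of [* and not(*[not(self::pre)])]: at least one element child,
    and no element child that is not <pre>. Text content is ignored."""
    children = list(elem)  # element children only; text lives in .text/.tail
    # any() stops at the first non-pre child, like the short-circuit described above.
    return bool(children) and not any(child.tag != "pre" for child in children)


def select_blockquotes(xml_text):
    root = ET.fromstring(xml_text)
    return [bq for bq in root.iter("blockquote") if only_pre_children(bq)]


if __name__ == "__main__":
    for bq in select_blockquotes(DOC):
        print(ET.tostring(bq, encoding="unicode").strip())
```

Using `any()` keeps the "stop at the first offending child" behaviour rather than counting every child.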
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="/">
<xsl:copy-of select="//blockquote[* and not(*[not(self::pre)])]"/>
</xsl:template>
</xsl:stylesheet>
When this transformation is applied on the provided XML document:
<body><div><!-- arbitrary nesting -->
<blockquote><pre>YES</pre></blockquote>
<blockquote><p>NO</p></blockquote>
<blockquote><pre>NO</pre><p>NO</p></blockquote>
<blockquote><p>NO</p><pre>NO</pre></blockquote>
<blockquote><pre>YES</pre> <pre>YES</pre></blockquote>
<blockquote>NO</blockquote>
</div></body>
the XPath expression is evaluated and the selected nodes are copied to the output:
<blockquote>
<pre>YES</pre>
</blockquote>
<blockquote>
<pre>YES</pre>
<pre>YES</pre>
</blockquote>

Related

XSLT3 joining values with separator

I'm pretty new to XSLT and I've been struggling to replicate the solution mentioned here
XSL for-each: how to detect last node?
for longer than I'm willing to admit :(
I've set up this fiddle: https://xsltfiddle.liberty-development.net/naZXVFi
I was hoping I could use just value-of with separator, rather than choose/when, as that seemed more idiomatic.
I can't get the separator to show up,
nor can I select just the text child of each skill; I always get the descendants too. That is to say, I shouldn't see any detail in the output.
bonus: not sure why that meta tag is not self-closing (warning in the HTML section)
Desired output:
skill1, skill2, skill3, skill4, skill5 (no comma space for the last one)
Any help would be greatly appreciated. Thanks.
EDIT: including the code here too:
xml: (need to add ref to xslt):
<?xml version="1.0" encoding="utf-8" ?>
<?xml-stylesheet type="text/xsl" href="test.xsl"?> <!-- not in fiddle -->
<skills>
<skill>skill1</skill>
<skill>skill2</skill>
<skill>skill3
<details>
<detail>detail1</detail>
<detail>detail2</detail>
</details>
</skill>
<skill>skill4</skill>
<skill>skill5</skill>
</skills>
And test.xsl:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:math="http://www.w3.org/2005/xpath-functions/math"
xmlns:map="http://www.w3.org/2005/xpath-functions/map"
xmlns:array="http://www.w3.org/2005/xpath-functions/array"
exclude-result-prefixes="#all"
version="3.0">
<xsl:mode on-no-match="shallow-copy"/>
<xsl:output method="html" indent="yes" html-version="5"/>
<xsl:template match="/">
<html>
<head>
<title>.NET XSLT Fiddle Example</title>
</head>
<body>
<xsl:for-each select="/skills/skill">
<xsl:value-of select="." separator=", "/>
</xsl:for-each>
</body>
</html>
</xsl:template>
</xsl:stylesheet>
In general, with XSLT 2/3 to output a sequence separated by some separator string, you simply use xsl:value-of select="$sequence" with the appropriate separator string in the separator attribute (and no for-each):
<xsl:template match="skills">
<xsl:value-of select="skill/text()[normalize-space()]/normalize-space()" separator=", "/>
</xsl:template>
https://xsltfiddle.liberty-development.net/naZXVFi/1
In most cases you would just need select="skill" separator=", ", but given your descendants and the white space you seem to want to eliminate, the select expression above is a bit more complicated.
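To see what `skill/text()[normalize-space()]/normalize-space()` with `separator=", "` computes, here is a rough Python analogue using only the standard library. It is an illustration of the same selection, not XSLT. The assumptions are that ElementTree represents a skill's direct text nodes as its `.text` plus the `.tail` of each child element, and that `str.split()` stands in for XPath's `normalize-space()`. Python treats a few more characters as whitespace than XPath does. The names `normalize_space` and `join_skills` are hypothetical.

```python
import xml.etree.ElementTree as ET

SKILLS = """<skills>
<skill>skill1</skill>
<skill>skill2</skill>
<skill>skill3
<details>
<detail>detail1</detail>
<detail>detail2</detail>
</details>
</skill>
<skill>skill4</skill>
<skill>skill5</skill>
</skills>"""


def normalize_space(s):
    # Approximation of XPath normalize-space(): trim and collapse whitespace runs.
    return " ".join(s.split())


def join_skills(xml_text, separator=", "):
    root = ET.fromstring(xml_text)
    parts = []
    for skill in root.findall("skill"):
        # Direct text-node children only (skill/text()): descendants such as
        # <detail> are never visited, so their text is not included.
        texts = [skill.text or ""] + [child.tail or "" for child in skill]
        for t in texts:
            norm = normalize_space(t)
            if norm:  # like the [normalize-space()] filter: skip blank text nodes
                parts.append(norm)
    return separator.join(parts)


print(join_skills(SKILLS))
```

The join happens once over the whole sequence, which is the point of using separator on a single value-of instead of inside a for-each.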
Martin has given you the detailed walkthrough to get the final result, including getting rid of the extra spaces, but at a high level, here's how to use xsl:value-of with separator correctly.
You have:
<body>
<xsl:for-each select="/skills/skill">
<xsl:value-of select="." separator=", "/>
</xsl:for-each>
</body>
This says that for each skill node, take the content of that node and display it. Notably, the value-of only sees one skill at a time, so there is nothing to join with the comma separator.
The answer which would get you what you want is:
<body>
<xsl:value-of select="/skills/skill" separator=", "/>
</body>
This says to take the set of skill nodes and display them joined by comma separators. You can see the output at https://xsltfiddle.liberty-development.net/naZXVFi/4

Xpath how do I grab contents of an href based on the contents of the href

How do I grab the contents of an href if it includes a specific word, example:
<a href="contacts.asp">click here</a>
How do I grab 'contacts.asp' based on the fact that it contains the word 'contact'?
I tried variations of //a/@href[contains(@href,'contact')] but don't seem to be getting anywhere.
You are nearly there.
In the contains() test you are already in the context of the href attribute, so your test should be against . rather than the @href your XPath has, which attempts to look for an href attribute of the href attribute. That, of course, won't work.
Try
//a/@href[contains(.,'contact')]
This says "find all href attributes on a elements, such that the href attribute value itself contains contact".
Note that this returns the href attribute node; the library you're using will then have a way to pick out its value.
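Here is one way the "pick out the value" step might look, as a minimal sketch with Python's standard-library ElementTree. It assumes the page parses as well-formed XML, which real HTML often does not. `.//a[@href]` is within ElementTree's limited XPath subset, but `contains()` is not, so the substring test on the attribute value is done in Python. The sample page's first link and the function name `hrefs_containing` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample page; only the contacts.asp link comes from the question.
PAGE = """<html><body>
<a href="index.asp">home</a>
<a href="contacts.asp">click here</a>
</body></html>"""


def hrefs_containing(xml_text, word):
    root = ET.fromstring(xml_text)
    # Select a elements that have an href, then test the attribute value itself,
    # which is what contains(., 'contact') does in the //a/@href[...] step.
    return [a.get("href") for a in root.findall(".//a[@href]") if word in a.get("href")]


print(hrefs_containing(PAGE, "contact"))
```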
In your path you are already below @href, so your contains() test won't work.
Try it like this:
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" omit-xml-declaration="yes" indent="yes" version="1.0" encoding="utf-8"/>
<xsl:template match="/">
<xsl:value-of select="//a[contains(@href,'contact')]/@href"/>
</xsl:template>
</xsl:stylesheet>

Getting nodes under a specific node element

I need help with my problem here, or at least some advice. I am parsing an HTML document using HtmlCleaner with XPath.
I have something like this:
<html>
[code and other <h4> tags]
<h4>Random name</h4>
<a>Text I want to get</a>
<a>Text I want to get 2</a>
<a>Text I want to get 3</a>
<a>Text I want to get 4</a>
<h4> Random name 2 </h4>
<a>Text I don't want to get</a>
[code and other <h4> tags]
</html>
OK. I have several <h4> tags, each followed by some <a> tags with text. My problem is that I don't know how to get all the text belonging to a specific <h4>, something like "h4[i]". I tried something like this, but it didn't work:
String xpath = "h4[" + number + "]//a"; // where number is incremented
Thank you in advance for your help!
Use:
/*/h4[1]/following-sibling::a[not(preceding-sibling::h4[2])]/text()
XSLT-based verification:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<xsl:strip-space elements="*"/>
<xsl:template match="/">
<xsl:copy-of select=
"/*/h4[1]/following-sibling::a[not(preceding-sibling::h4[2])]/text()"/>
</xsl:template>
</xsl:stylesheet>
When this transformation is applied on the following XML document (the provided fragment, wrapped in a single top element to become a well-formed XML document):
<html>
<h4>Random name</h4>
<a>Text I want to get</a>
<a>Text I want to get 2</a>
<a>Text I want to get 3</a>
<a>Text I want to get 4</a>
<h4> Random name 2 </h4>
<a>Text I don't want to get</a>
</html>
The XPath expression is evaluated and all selected (text) nodes are copied to the output:
Text I want to get Text I want to get 2 Text I want to get 3 Text I want to get 4
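The same "siblings between the n-th and the next <h4>" idea can be written as a simple linear scan, which may be easier to parametrize than building an XPath string in a loop. This is a minimal sketch using Python's standard-library ElementTree. It assumes the fragment is wrapped in one top element and that the wanted text sits in <a> siblings of the <h4> elements. The function name `links_after_heading` is hypothetical. The XPath in the docstring is one possible generalization of the answer's expression to any n.

```python
import xml.etree.ElementTree as ET

# The question's fragment, with the text assumed to be wrapped in <a> elements.
HTML = """<html>
<h4>Random name</h4>
<a>Text I want to get</a>
<a>Text I want to get 2</a>
<a>Text I want to get 3</a>
<a>Text I want to get 4</a>
<h4> Random name 2 </h4>
<a>Text I don't want to get</a>
</html>"""


def links_after_heading(xml_text, n):
    """Text of <a> siblings after the n-th <h4> (1-based) and before the next <h4>,
    i.e. /*/h4[n]/following-sibling::a[count(preceding-sibling::h4) = n]/text()."""
    root = ET.fromstring(xml_text)
    result, h4_seen = [], 0
    for child in root:  # element children of the top element, in document order
        if child.tag == "h4":
            h4_seen += 1
        elif child.tag == "a" and h4_seen == n:
            result.append(child.text)
    return result


print(links_after_heading(HTML, 1))
```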

why does this xpath selector fail?

given the following html
<p>
<div class="allpricing">
<p class="priceadorn">
<FONT CLASS="adornmentsText">NOW: </FONT>
<font CLASS="adornmentsText">$1.00</font>
</p>
</div>
</p>
why does
//div[@class="allpricing"]/p[@class="priceadorn"][last()]/font[@class="adornmentsText"][last()]
return the expected value of $1.00
but adding the p element
//p/div[@class="allpricing"]/p[@class="priceadorn"][last()]/font[@class="adornmentsText"][last()]
returns nothing?
You cannot place a div inside a p: the <div> start tag closes the <p> automatically. See
Nesting block level elements inside the <p> tag... right or wrong?
I've often found that letter case was the culprit. XPath 1.0 is case-sensitive, and unless you handle mixed case explicitly, it will fail in a lot of cases.
XPath is case-sensitive.
None of the provided XPath expressions selects any node, because in the provided XML document there is no font element with an attribute named class (the element font has a CLASS attribute and this is different from having a class attribute due to the different capitalization).
Due to the same reason, font and FONT are elements with different names.
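The case-sensitivity point can be checked quickly with Python's standard-library ElementTree, which parses the snippet as XML. Like the XSLT check, it therefore keeps the p > div > p nesting that an HTML parser would repair. This is a small sketch. Only simple path steps with `[@attr='value']` predicates are used, because ElementTree supports just a subset of XPath.

```python
import xml.etree.ElementTree as ET

DOC = """<p>
<div class="allpricing">
<p class="priceadorn">
<FONT CLASS="adornmentsText">NOW: </FONT>
<font CLASS="adornmentsText">$1.00</font>
</p>
</div>
</p>"""

root = ET.fromstring(DOC)  # root is the outer <p>

# Lower-case attribute name: matches nothing, because the attribute is spelled CLASS.
lower = root.findall(".//font[@class='adornmentsText']")

# Exact-case attribute name: matches only the lower-case <font> element,
# since <FONT> is an element with a different name.
exact = root.findall(
    "./div[@class='allpricing']/p[@class='priceadorn']/font[@CLASS='adornmentsText']"
)

print(len(lower), [e.text for e in exact])
```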
These two XPath expressions, when evaluated against the provided XML document, produce the same wanted result:
//div[@class="allpricing"]
/p[@class="priceadorn"]
[last()]
/font[@CLASS="adornmentsText"]
[last()]
and
//p/div[@class="allpricing"]
/p[@class="priceadorn"]
[last()]
/font[@CLASS="adornmentsText"]
[last()]
XSLT-based verification:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:template match="/">
<xsl:copy-of select=
'//div[@class="allpricing"]
/p[@class="priceadorn"]
[last()]
/font[@CLASS="adornmentsText"]
[last()]'/>
=============
<xsl:copy-of select=
'//p/div[@class="allpricing"]
/p[@class="priceadorn"]
[last()]
/font[@CLASS="adornmentsText"]
[last()]
'/>
</xsl:template>
</xsl:stylesheet>
When this transformation is applied on the provided XML document:
<p>
<div class="allpricing">
<p class="priceadorn">
<FONT CLASS="adornmentsText">NOW: </FONT>
<font CLASS="adornmentsText">$1.00</font>
</p>
</div>
</p>
the two expressions are evaluated and the results of this evaluation are copied to the output:
<font CLASS="adornmentsText">$1.00</font>
=============
<font CLASS="adornmentsText">$1.00</font>
You describe your source as an HTML rather than an XML document, but you haven't explained how you parsed it. If you parse it using an HTML parser, the parser will "repair" it to turn it into valid HTML, which means that the tree it constructs doesn't directly reflect what you wrote in the source. XPath sees this "repaired" tree, not the original.

XPath selection while excluding elements having certain attribute values

My first post here - it's a great site and I will certainly do my best to give back as much as I can.
I have seen different manifestations of the following question; however, my attempts to resolve it don't appear to work.
Consider this simple tree:
<root>
<div>
<p>hello</p>
<p>hello2</p>
<p><span class="bad">hello3</span></p>
</div>
</root>
I would like to come up with an XPath expression that will select all child nodes of "div", except for elements that have their "class" attribute equal to "bad".
Here is what I have tried:
/root/div/node()[not (@class='bad')]
... However this doesn't seem to work.
What am I missing here?
Cheers,
Isaac
When I test your XPath against the provided XML document, it does indeed select all child nodes that do not have a class="bad" attribute: these are all the <p> elements in the document.
You will note that the only element with such an attribute is the <span>, which is not a child of div at all (it is a child of the third <p>), so it is not selected either way.
Are you expecting the p node surrounding your span not to be selected?
I have been working with XPath in a Java program I'm writing. If you want to select the nodes that don't have class="bad" (i.e. exclude the <span> but keep the surrounding <p> nodes), you could use:
/root/div/descendant::*[not (@class='bad')]
Otherwise, if you want to select the <p> nodes that don't have a child with class='bad', you can use something like the following:
/root/div/p[not(*[@class='bad'])]
The identity transform just matches and copies everything:
<xsl:template match="@*|node()" >
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
Then you add an empty (null) template that more specifically matches the pattern you want to exclude:
<xsl:template match="span[@class='bad']" />
(You can also add a priority attribute if you want to be more explicit about which template takes precedence.)
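Outside XSLT, the identity-plus-empty-template pattern amounts to "keep the tree, delete the matching elements". Here is a rough analogue using Python's standard-library ElementTree, given as a sketch rather than an equivalent of XSLT template matching. ElementTree stores the text that follows an element in that element's `.tail`. The sketch therefore moves that text onto the previous sibling or the parent before removing the span, so the text node survives as it would in the XSLT output. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

DOC = """<root>
<div>
<p>hello</p>
<p>hello2</p>
<p><span class="bad">hello3</span></p>
</div>
</root>"""


def remove_keeping_tail(parent, child):
    # Preserve the text that follows the removed element (stored in child.tail).
    if child.tail:
        idx = list(parent).index(child)
        if idx > 0:
            prev = parent[idx - 1]
            prev.tail = (prev.tail or "") + child.tail
        else:
            parent.text = (parent.text or "") + child.tail
    parent.remove(child)


def drop_bad_spans(xml_text):
    """Keep everything (identity) except span[@class='bad'] (the empty template)."""
    root = ET.fromstring(xml_text)
    for parent in list(root.iter()):   # snapshot, so removals don't disturb the walk
        for child in list(parent):
            if child.tag == "span" and child.get("class") == "bad":
                remove_keeping_tail(parent, child)
    return root


print(ET.tostring(drop_bad_spans(DOC), encoding="unicode"))
```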
Welcome to SO, Isaac!
I'd try this:
/root/div/*[./*[@class != "bad"]]
This ought to select all child elements (*) of the div element that do not have a descendant element with a class attribute that equals bad.
Edit:
As per @Alejandro's comment:
/root/div/*[not(*/@class = "bad")]
