XPath HTML finding nodes - xpath

I am using HtmlAgilityPack to find HTML <a> nodes that have an href attribute containing a certain string, in my case '/groups/':
HtmlNodeCollection groups = source.DocumentNode.SelectNodes("//a[contains(@href, '/groups/')]");
Although the source contains about 20 such nodes, my code above returns none, which makes me think I'm doing it incorrectly.
Is what I'm doing correct, and if not how can I select nodes that have a certain attribute that has a value that contains a certain string?

Your expression seems correct to me.
You didn't post your source document (or even part of it), so I'll have to guess.
The thing is, XPath 1.0 (which HtmlAgilityPack uses) has no built-in case-insensitive string comparison. If you have an <a> tag whose href attribute contains e.g. /Groups/ or /GROUPS/, it won't be matched. There is a workaround for this:
//a[contains(translate(@href, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz'), '/groups/')]
As another option, you could use LINQ with StringComparison.OrdinalIgnoreCase (this needs using System; and using System.Linq;):
var groups = source.DocumentNode.Descendants("a")
    .Where(a => a.GetAttributeValue("href", string.Empty)
        .IndexOf("/groups/", StringComparison.OrdinalIgnoreCase) != -1)
    .ToList();
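For readers outside .NET, the same case-insensitive href filter can be sketched in Python using only the standard library's html.parser. This is an illustration, not HtmlAgilityPack: the class name GroupLinkFinder and the sample markup are hypothetical.

```python
from html.parser import HTMLParser


class GroupLinkFinder(HTMLParser):
    """Collects href values of <a> tags whose href contains `needle`,
    ignoring case (same idea as the translate() / OrdinalIgnoreCase workarounds)."""

    def __init__(self, needle):
        super().__init__()
        self.needle = needle.casefold()
        self.matches = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names, but not attribute values.
        if tag != "a":
            return
        # An attribute written without a value is reported as None.
        href = dict(attrs).get("href") or ""
        if self.needle in href.casefold():
            self.matches.append(href)


# Hypothetical sample markup; the question's real page is not shown.
SAMPLE = """
<p>
  <a href="/groups/python">one</a>
  <A HREF="/Groups/csharp">two</A>
  <a href="/users/42">three</a>
  <a>no href</a>
</p>
"""

finder = GroupLinkFinder("/groups/")
finder.feed(SAMPLE)
print(finder.matches)  # ['/groups/python', '/Groups/csharp']
```

One design difference: str.casefold() also folds non-ASCII letters, whereas the translate() workaround only maps A-Z.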

Related

Is it possible in XPATH to find an element by attribute value, not by name?

For example I have an XML element:
<input id="optSmsCode" type="tel" name="otp" placeholder="SMS-code">
Suppose I know that some attribute must have the value otp, but I don't know which attribute it is. Is it possible to write an XPath expression like this:
.//input[(contains(*, "otp")) or (contains(*, "ode"))]
Try it like this and see if it works:
from selenium.webdriver.common.by import By
one = '//input/@*[contains(., "otp") or contains(., "ode")]/..'
# driver: an already-created WebDriver instance
print(driver.find_elements(By.XPATH, one))
Edit:
The first argument of contains() has a required cardinality of zero or one. In plain(ish) English, it means you can check only one node at a time to see whether it contains the target string.
So the expression above goes through each attribute of input separately (/@*), checks whether the value of that specific attribute contains the target string and, if it does, goes up to the parent of that attribute (/..), which for an attribute is the element that owns it (input).
This XPath expression selects all <input> elements that have some attribute, whose string value contains "otp" or "ode". Notice that there is no need to "go up to the parent ..."
//input[@*[contains(., 'otp') or contains(., 'ode')]]
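To make the semantics of @* concrete, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. Its limited XPath support has neither contains() nor @*, so the predicate is written as plain Python. The function name and the extra <input> elements are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; only the first <input> comes from the question.
DOC = """
<form>
  <input id="optSmsCode" type="tel" name="otp" placeholder="SMS-code"/>
  <input id="email" type="email" name="mail"/>
  <input id="zip" name="postcode"/>
</form>
"""


def inputs_with_attr_containing(root, needles):
    """Equivalent of //input[@*[contains(., 'otp') or contains(., 'ode')]]:
    keep an <input> if ANY of its attribute values contains ANY needle."""
    return [
        el
        for el in root.iter("input")
        if any(n in value for value in el.attrib.values() for n in needles)
    ]


root = ET.fromstring(DOC)
print([el.get("id") for el in inputs_with_attr_containing(root, ("otp", "ode"))])
# ['optSmsCode', 'zip']
```

The zip input matches only because "postcode" happens to contain "ode". Accidental substring hits like this are why an exact-equality test is preferable when the whole attribute value is known.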
If we know that "otp" or "ode" must be the whole value of the attribute (not just a substring of the value), then this expression is stricter and more efficient to evaluate:
//input[@*[. = 'otp' or . = 'ode']]
In this latter case ("otp" or "ode" are the whole value of the attribute), if we have to compare against many values, an XPath expression of the above form quickly becomes too long. There is a way to simplify such a long expression and do just a single comparison:
//input[@*[contains('|s1|s2|s3|s4|s5|', concat('|', ., '|'))]]
The above expression selects all input elements in the document, that have at least one attribute whose value is one of the strings "s1", "s2", "s3", "s4" or "s5".
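The delimiter trick is easy to check outside XPath. This Python sketch mirrors contains('|s1|s2|s3|s4|s5|', concat('|', ., '|')); the helper name and sample document are hypothetical. Like the XPath version, it assumes that no candidate value itself contains '|'.

```python
import xml.etree.ElementTree as ET


def xpath_style_in_list(value, allowed):
    """Mimics contains('|s1|s2|...|', concat('|', value, '|')).
    Wrapping both the list and the candidate in '|' means only whole items
    match, so a value such as 's' (a mere substring of an item) does not."""
    haystack = "|" + "|".join(allowed) + "|"
    return ("|" + value + "|") in haystack


ALLOWED = ["s1", "s2", "s3", "s4", "s5"]

DOC = """
<form>
  <input id="a" name="s3"/>
  <input id="b" name="s"/>
  <input id="c" name="x" title="s5"/>
</form>
"""

root = ET.fromstring(DOC)
hits = [
    el.get("id")
    for el in root.iter("input")
    if any(xpath_style_in_list(v, ALLOWED) for v in el.attrib.values())
]
print(hits)  # ['a', 'c']
```

Without the delimiters, the test for input b would be equivalent to "s" in "s1s2s3s4s5", which is true even though "s" is not one of the listed values.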

How to select a specific category value in this Xpath expression

I have a feed here. I'm trying to create an XPath expression that returns items that have a category equal to Bananas. Due to the limitations in my XML parser, I can't use namespaces directly to select items.
The expression /rss/channel/item//*[name()='itunes:category'] returns this:
Element='<itunes:category
xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd"
text="Apples"/>'
Element='<itunes:category
xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd"
text="Bananas"/>'
...
And /rss/channel/item//*[name()='itunes:category']/@text returns this:
Attribute='text=Apples'
Attribute='text=Bananas'
...
But I can't figure out how to limit the response to just a single category (e.g., Bananas)?
I want some kind of expression like this:
/rss/channel/item//*[name()='itunes:category' and contains(., 'Bananas')]
But this doesn't work. It's not syntactically valid. What would be the right XPath expression syntax to just return Bananas?
Do you mean you want to filter by an attribute of an item's child, but still return the item node?
/rss/channel/item/*[name()='itunes:category' and contains(@text,'Apples')]/parent::item
or, more simply:
/rss/channel/item[*[name()='itunes:category' and @text='Apples']]
I used Apples in the example because your sample XML file gives 0 results for Bananas.
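Even where the contains(., 'Bananas') attempt is accepted, it cannot match: the category name lives in the text attribute and itunes:category is an empty element, so its string value (.) is the empty string. As a cross-check outside the asker's parser, here is the same filter in Python's xml.etree.ElementTree. Unlike that parser, ElementTree does need the namespace URI (taken from the output shown above). The three-item feed and the function name are made up.

```python
import xml.etree.ElementTree as ET

ITUNES = "http://www.itunes.com/dtds/podcast-1.0.dtd"

# Minimal hypothetical feed; the asker's real feed is not reproduced here.
FEED = f"""
<rss xmlns:itunes="{ITUNES}">
  <channel>
    <item><title>Ep 1</title><itunes:category text="Apples"/></item>
    <item><title>Ep 2</title><itunes:category text="Bananas"/></item>
    <item><title>Ep 3</title><itunes:category text="Apples"/></item>
  </channel>
</rss>
"""


def items_in_category(root, category):
    """Same idea as /rss/channel/item[*[name()='itunes:category' and @text='...']]:
    return <item> elements that have an itunes:category child whose text
    ATTRIBUTE (not its text content, which is empty) equals `category`."""
    result = []
    for item in root.findall("./channel/item"):
        for cat in item.findall(f"{{{ITUNES}}}category"):
            if cat.get("text") == category:
                result.append(item)
                break  # one matching category is enough for this item
    return result


root = ET.fromstring(FEED)
print([i.findtext("title") for i in items_in_category(root, "Apples")])
# ['Ep 1', 'Ep 3']
```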

xpath expression to read value based on value of sibling

I have the XML below and would like to read the value of the Value element whose sibling Name matches 'test2'. I'm using the XPath below, but it doesn't work. Can someone help?
/*[ local-name()='OutputData']/*[ local-name()='OutputDataItem']/*[ local-name()='Name'][normalize-space(.) = 'test2']//*[local-name()='Value']/text()
<get:OutputData>
<get:OutputDataItem>
<get:Name>test1</get:Name>
<get:Value/>
</get:OutputDataItem>
<get:OutputDataItem>
<get:Name>test2</get:Name>
<get:Value>B5B4</get:Value>
</get:OutputDataItem>
<get:OutputDataItem>
<get:Name>test3</get:Name>
<get:Value/>
</get:OutputDataItem>
<get:OutputDataItem>
<get:Name>OP_VCscEncrptCd_VAR</get:Name>
<get:Value/>
</get:OutputDataItem>
</get:OutputData>
Thanks
You were close, but because get:Name and get:Value are siblings, you need to adjust your XPath a little.
Your XPath was trying to select get:Value elements that are descendants of get:Name, rather than its siblings. Move the criterion that filters get:Name into a predicate on get:OutputDataItem, then step down into get:Value:
/*[ local-name()='OutputData']/*[ local-name()='OutputDataItem']
[*[ local-name()='Name'][normalize-space(.) = 'test2']]/*[local-name()='Value']/text()
You could also combine both conditions on get:Name into a single predicate with and:
/*[ local-name()='OutputData']/*[ local-name()='OutputDataItem']
[*[ local-name()='Name' and normalize-space(.) = 'test2']]/*[local-name()='Value']/text()
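Here is the same select-the-item-then-step-down pattern in Python's xml.etree.ElementTree, as a sketch. The question's fragment never declares the get: prefix, so a namespace-aware parser rejects it as-is. The namespace URI urn:example:get is a made-up placeholder needed to parse it, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Placeholder URI: the question's XML does not declare the get: prefix.
GET_NS = "urn:example:get"
NS = {"get": GET_NS}

DOC = f"""
<get:OutputData xmlns:get="{GET_NS}">
  <get:OutputDataItem><get:Name>test1</get:Name><get:Value/></get:OutputDataItem>
  <get:OutputDataItem><get:Name>test2</get:Name><get:Value>B5B4</get:Value></get:OutputDataItem>
  <get:OutputDataItem><get:Name>test3</get:Name><get:Value/></get:OutputDataItem>
</get:OutputData>
"""


def value_for_name(root, name):
    """Pick the OutputDataItem whose Name matches (the predicate step),
    then step down to its Value child, as in the corrected XPath."""
    for item in root.findall("get:OutputDataItem", NS):
        raw = item.findtext("get:Name", default="", namespaces=NS)
        # " ".join(s.split()) approximates XPath normalize-space().
        if " ".join(raw.split()) == name:
            return item.findtext("get:Value", default=None, namespaces=NS)
    return None  # no item with that Name


root = ET.fromstring(DOC)
print(value_for_name(root, "test2"))  # B5B4
```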
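This should also work. Note that local-name() returns the name without its prefix, so compare against Name and Value rather than get:Name and get:Value:
//*[local-name()="Name" and normalize-space()="test2"]/following-sibling::*[local-name()="Value"]/text()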

Inserting a child node when list is empty (XForms)

My problem is the following :
I usually have those data:
<structures>
<structure id="10">
<code>XXX</code>
</structure>
</structures>
so the table I display (a single column: code) is fine.
But in some cases the data is the result of a query with no content, so the data is:
<structures/>
which results in my table not displaying, plus an error.
I am trying to insert, in the case of an empty instance, a single node so that the data would look like:
<structures>
<structure id="0"/>
</structures>
I am trying something like this:
<xforms:action ev:event="xforms-submit-done">
<xforms:insert if="0 = count(instance('{./instance-name}')/root/node())" context="instance('{./instance-name}')/root/node()" origin="xforms:element('structure', '')" />
</xforms:action>
but no node is inserted when I look at the data in the page's inspector.
Any obvious thing I am doing wrong?
There seem to be errors in your XPath if and context expressions:
if="0 = count(instance('{./instance-name}')/root/node())"
context="instance('{./instance-name}')/root/node()"
You are using curly brackets { and }, I assume to get the behavior of attribute value templates (AVTs). But the if and context attributes already contain XPath expressions, so you cannot use AVTs in them. Try instead:
if="0 = count(instance(instance-name)/root/node())"
context="instance(instance-name)/root/node()"
Also, the instance-name path is relative to the current evaluation context, which might not be obvious when reading or writing the expression. I suggest using an absolute path, for example instance('foo')/instance-name, to make things clearer.
You don't show the structure of the other instances, so I can't tell for sure, but your expressions above assume that they have this form:
<xf:instance id="foo">
<some-root-element>
<root>
<structure/>
</root>
</some-root-element>
</xf:instance>
I don't know if that's what you intend.
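Note also that when the instance is empty, .../root/node() selects nothing, and an xforms:insert whose context is an empty node-set has no effect at all. Pointing context at the parent element itself (for example instance(instance-name)/root) lets the origin node be inserted as its child.
Finally, you could replace count(something) = 0 with the XPath 2.0 function empty(something).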
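The transformation the insert is meant to perform is small. Here it is sketched with Python's xml.etree.ElementTree purely to illustrate the intended before/after data. It is not XForms, and the function name is hypothetical. One difference: len() counts only child elements, whereas count(.../node()) also counts text nodes.

```python
import xml.etree.ElementTree as ET


def ensure_placeholder(structures):
    """If <structures> has no child elements, append <structure id="0"/>
    so the table always has at least one row; otherwise leave it alone."""
    if len(structures) == 0:  # analogous to empty(...) or count(...) = 0
        ET.SubElement(structures, "structure", {"id": "0"})
    return structures


empty = ET.fromstring("<structures/>")
print(ET.tostring(ensure_placeholder(empty), encoding="unicode"))
# <structures><structure id="0" /></structures>
```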

How to get H1,H2,H3,... using a single xpath expression

How can I get H1,H2,H3 contents in one single xpath expression?
I know I could do this.
//html/body/h1/text()
//html/body/h2/text()
//html/body/h3/text()
and so on.
Use:
/html/body/*[self::h1 or self::h2 or self::h3]/text()
The following expression is incorrect:
//html/body/*[local-name() = "h1"
or local-name() = "h2"
or local-name() = "h3"]/text()
because it may also select text nodes that are children of elements such as unwanted:h1, different:h2 or someWeirdNamespace:h3.
Another recommendation: avoid using // whenever the structure of the XML document is statically known. Using // often causes significant inefficiency because it makes the processor traverse the complete document (sub)tree rooted at the context node.
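Here is a quick Python (xml.etree.ElementTree) sketch of both points. It walks /html/body explicitly instead of searching with //, and it shows how a name test differs from comparing local-name(). The sample document and the urn:example:weird namespace are hypothetical. child.text is only the first text chunk of each element, a simplification of text().

```python
import xml.etree.ElementTree as ET

# Hypothetical XHTML-like document without a default namespace.
DOC = """
<html xmlns:w="urn:example:weird">
  <body>
    <h1>Title</h1>
    <p>intro</p>
    <h2>Section</h2>
    <w:h3>not an HTML heading</w:h3>
    <h3>Subsection</h3>
  </body>
</html>
"""

HEADINGS = {"h1", "h2", "h3"}

root = ET.fromstring(DOC)
body = root.find("body")  # explicit path from <html>, no // search

# Mirrors /html/body/*[self::h1 or self::h2 or self::h3]/text():
# ElementTree stores w:h3 as "{urn:example:weird}h3", which is not in HEADINGS.
by_name = [c.text for c in body if c.tag in HEADINGS]

# Mirrors the local-name() version: stripping the namespace lets w:h3 through.
by_local_name = [c.text for c in body if c.tag.rsplit("}", 1)[-1] in HEADINGS]

print(by_name)        # ['Title', 'Section', 'Subsection']
print(by_local_name)  # ['Title', 'Section', 'not an HTML heading', 'Subsection']
```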
