xpath - using contains with a wildcard - xpath

I have the following and am trying to see if there's a better approach. I know it can be done using starts-with/contains. I'm testing with Firefox 10, which I believe implements XPath 2.+.
Test node is
<a id="foo">
.
.
.
<a id="foo1">
.
<a id="foo2">
Is there a way to use wildcards to get the foo1/foo2 nodes?
Something like
//a[@id =* 'foo']
or
//a[contains(@id*,'foo')]
Which would say, give me the "a" where the id starts with "foo" but has additional chars. This would then skip the first node, whose id is exactly "foo".
I thought I had seen an article on this, but can't find it!
As I recall, the article stated that xpath had a set of operators that could be used to designate the start/end of a given pattern in a string.
thanks

Use:
//a[@id[starts-with(., 'foo') and string-length() > 3]]

Not a wildcard, but probably what you need nonetheless:
//a[(@id!='foo') and starts-with(@id,'foo')]
See W3C XPath spec.
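For anyone who wants to check the logic of that predicate outside a browser, here is a minimal sketch in Python. It assumes the standard-library xml.etree.ElementTree parser, whose XPath subset has no starts-with(), so the condition is applied in plain Python instead of inside the path. The wrapping div and the extra "bar" node are made up for the test document.

```python
import xml.etree.ElementTree as ET

# Hypothetical test document: the three nodes from the question plus one non-match.
doc = ET.fromstring(
    '<div><a id="foo"/><a id="foo1"/><a id="foo2"/><a id="bar"/></div>'
)

# Same condition as //a[(@id!='foo') and starts-with(@id,'foo')],
# evaluated in Python because ElementTree does not implement starts-with().
matches = [
    a.get("id")
    for a in doc.iter("a")
    if a.get("id", "").startswith("foo") and a.get("id") != "foo"
]

print(matches)  # ['foo1', 'foo2']
```

With a full XPath 1.0 engine the expression from the answer can be passed in directly; the filter above only mirrors its meaning.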

Related

What's wrong with this xpath statement?

Trying to get the color WHITE out of the line of code.
<a href="javascript:void(0)" class="itemAttr current" title="WHITE" data-value="WHITE"><img src="https://gloimg.rglcdn.com/rosegal/pdm-product-pic/Clothing/2019/06/05thumb-img/1559762268621192281.jpg"></a>
I've tried this:
color = driver.find_element_by_xpath("""//p[@id="select-attr-0"]/a[@href="javascript:void(0)"]@title""").click()
I get this error message:
The string '//p[@id="select-attr-0"]/a[@href="javascript:void(0)"]@title' is not a valid XPath expression.
What I want is to get "WHITE".
It looks like you are missing a / before the @title attribute. Try this XPath instead:
//p[@id="select-attr-0"]/a[@href="javascript:void(0)"]/@title
In order to get an attribute value of an element, you need to put '/' before the '@title', so the following should work (provided the parent element p is correctly addressed):
//p[@id="select-attr-0"]/a[@href="javascript:void(0)"]/@title
When working with XPath, it is often useful to use one of the free online testers to get instant feedback on a path, e.g. this one
Try using the below xpath snippet.
//p[@id='select-attr-0']//child::a[@data-value='WHITE']
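A small sketch of the "locate the element, then read the attribute" approach, using only Python's standard library rather than Selenium. The wrapping div and p elements are assumptions inferred from the XPath in the question, and the img child is left out to keep the snippet well-formed XML.

```python
import xml.etree.ElementTree as ET

# Assumed markup: the <a> from the question placed inside the <p> the XPath refers to.
html = """<div>
  <p id="select-attr-0">
    <a href="javascript:void(0)" class="itemAttr current"
       title="WHITE" data-value="WHITE"></a>
  </p>
</div>"""

root = ET.fromstring(html)

# Select the element first, then read its attribute from the element object.
link = root.find(".//p[@id='select-attr-0']/a[@href='javascript:void(0)']")
title = link.get("title")

print(title)  # WHITE
```

In Selenium the same two-step idea would be to find the a element with the path up to the a step and then call get_attribute("title") on the result, since the element finders return elements rather than attribute values.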

xpath query url with one folder depth only

I am using this XPath query successfully:
//div[(@class="result")]//a[contains(@href,"pinterest.com")]/@href
The URL I am running the XPath query against (with simple_html_dom.php) is this one here.
Now, I would like to find results for pinterest.com/one-folder-deep-only and exclude all URLs deeper than one directory, like pinterest.com/one-folder-deep-only/this or pinterest.com/one-folder-deep-only/this/this. I have no idea if there is a way to achieve that. Have googled a lot, but not found anything. Maybe my search terms weren't the best.
Do you have any ideas? Thanks for helping me here.
I am testing the query using the Chrome XPath Helper.
"//" is to evaluate all levels/depths. Instead use only one "/" for the "a" query to only evaluate immediate children
//div[(@id="first-result")]/a[contains(@href,"url.com")]/@href
Note use of / instead of // before the "a" tag.
Try the XPath below to select @href from the required anchors only:
//a[contains(@href, "url.com") and not(contains(substring-after(./@href, 'url.com/'), "/"))]/@href
Solution for XPath 2.0:
//a[contains(@href, "url.com") and count(tokenize(@href, "/"))=2]/@href
Note that if in real HTML source href starts-with "http://url.com" you should specify =4 instead of =2
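To see why both conditions work, here is a rough Python sketch that mirrors them on plain strings. The helper names substring_after and is_one_level, and the sample URLs, are made up for illustration; nothing here runs actual XPath.

```python
def substring_after(text, sep):
    """Mimic XPath substring-after(): text after the first sep, or '' if sep is absent."""
    _, found, tail = text.partition(sep)
    return tail if found else ""


def is_one_level(href, host="url.com"):
    # Mirrors: contains(@href, host) and
    #          not(contains(substring-after(@href, concat(host, '/')), '/'))
    return host in href and "/" not in substring_after(href, host + "/")


hrefs = [
    "http://url.com/one-folder-deep-only",
    "http://url.com/one-folder-deep-only/this",
    "http://url.com/one-folder-deep-only/this/this",
]
kept = [h for h in hrefs if is_one_level(h)]
print(kept)  # ['http://url.com/one-folder-deep-only']

# The tokenize() variant counts the pieces between slashes; with a leading
# "http://" the pieces are 'http:', '', 'url.com', 'one-folder-deep-only'.
print(len(hrefs[0].split("/")))  # 4
```

Note that a trailing slash (url.com/folder/) counts as deeper under both conditions, so it may need to be stripped first.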

How to use substring() with Import.io?

I'm having some issues with XPath and import.io and I hope you'll be able to help me. :)
The html code:
<a href="page.php?var=12345">
For the moment, I manage to extract the content of the href ( page.php?var=12345 ) with this:
./td[3]/a[1]/@href
However, I would like to collect just: 12345
substring might be the solution, but it does not seem to work on import.io the way I use it:
substring(./td[3]/a[1]/@href,13)
Any ideas of what the problem is?
Thanks a lot in advance!
Try using this for the XPath (with the field type set to Text):
.//*[@class='oeil']/a/@href
Then use this for your regex:
([^=]*)$
This will get you the ISBN number you are looking for.
import.io only supports XPath functions when they return a node list.
Your path expression is fine, but perhaps it should be
substring(./td[3]/a[1]/@href,14)
"Does not seem to work" is not a very clear description of what is wrong. Do you get error messages? Is the output wrong? Do you have any code surrounding the path expression you could show?
You can use substring(), but using substring-after() would be even better.
substring-after(/a/@href,'=')
assuming as input the tiny snippet you have shown:
<a href="page.php?var=12345"/>
will select
12345
and taking into account the structure of your input
substring-after(./td[3]/a[1]/@href,'=')
With the leading ., the path ./td[3] only looks at td elements that are immediate children of the current context node. I trust you know what you are doing.
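The off-by-one between 13 and 14 is easy to check by hand, because XPath string positions are 1-based. Below is a minimal Python sketch; xpath_substring is a made-up helper that only approximates the two-argument substring() for integer start positions of 1 or more.

```python
def xpath_substring(text, start):
    """Approximate XPath 1.0 substring(text, start) for integer start >= 1 (1-based)."""
    return text[start - 1:]


href = "page.php?var=12345"

print(xpath_substring(href, 13))  # =12345  (position 13 is the '=' itself)
print(xpath_substring(href, 14))  # 12345

# substring-after(href, '=') avoids counting characters altogether.
after_equals = href.partition("=")[2]
print(after_equals)  # 12345
```

This is why substring-after() is the more robust choice: it keeps working if the page name or parameter name changes length.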

What's the xpath syntax to get tag names?

I'm using Nokogiri to parse a large XML file. Say I've got the following structure:
<menagerie>
<penguin>Pablo</penguin>
<penguin>Mortimer</penguin>
<bull>Ferdinand</bull>
<aardvark>James Cornelius Madison Humphrey Zophar Handlebrush III</aardvark>
</menagerie>
I can count the non-penguins like this:
xml.xpath('//menagerie//*[not(self::penguin)]').length # => 2
But how do I get a list of the tags, like this? (The exact format isn't important; I just want to visually scan the non-penguins.)
bull
aardvark
Update
This gave me the list I wanted - thanks Oded and TMN and delnan!
xml.xpath('//menagerie/*[not(self::penguin)]').each do |node|
  puts node.name()
end
You can use the name() or local-name() XPath function.
See the examples on zvon.
I know it's a bit outdated, but you could use xml.xpath('//menagerie/*[not(self::penguin)]/name()') as the expression. Note the slash, not the dot. A function call as the last step of a path is XPath 2.0 syntax, so this will not work in an XPath 1.0 processor.
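One thing worth spelling out is the difference between "is not a penguin" and "has no penguin child", since a bare not(penguin) predicate tests the latter. Here is a minimal sketch in Python with the standard-library ElementTree (not Nokogiri), using the XML from the question.

```python
import xml.etree.ElementTree as ET

menagerie = ET.fromstring("""<menagerie>
  <penguin>Pablo</penguin>
  <penguin>Mortimer</penguin>
  <bull>Ferdinand</bull>
  <aardvark>James Cornelius Madison Humphrey Zophar Handlebrush III</aardvark>
</menagerie>""")

# Equivalent of *[not(self::penguin)]: children whose own tag is not "penguin".
names = [child.tag for child in menagerie if child.tag != "penguin"]
print(names)  # ['bull', 'aardvark']

# Equivalent of *[not(penguin)]: children that have no <penguin> child.
# None of the four children contains one, so all of them match.
no_penguin_child = [child.tag for child in menagerie if child.find("penguin") is None]
print(len(no_penguin_child))  # 4
```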

Problem running xpath query with namespaces

I'm trying to use an xpath expression to select a node-set in an xml document with different namespaces defined.
The xml looks something like this:
<?POSTEN SND="SE00317644000" REC="5566420989" MSGTYPE="EPIX"?>
<ns:Msg xmlns:ns="http://www.noventus.se/epix1/genericheader.xsd">
<GenericHeader>
<SubsysId>1</SubsysId>
<SubsysType>30003</SubsysType>
<SendDateTime>2009-08-13T14:28:15</SendDateTime>
</GenericHeader>
<m:OrderStatus xmlns:m="http://www.noventus.se/epix1/orderstatus.xsd">
<Header>
<OrderSystemId>Soda SE</OrderSystemId>
<OrderNo>20090811</OrderNo>
<Status>0</Status>
</Header>
<Lines>...
I want to select only "Msg" nodes that have an "OrderStatus" child, so I want to use the following XPath expression: /Msg[count('OrderStatus') > 0]. This doesn't work; I get an error message saying: "Namespace Manager or XsltContext needed. This query has a prefix, variable, or user-defined function".
So I think I want to use an expression that looks something like this: /*[local-name()='Msg'][count('OrderStatus') > 0], but that doesn't seem to work either. Any ideas?
Br,
Andreas
I want to use the following xpath expression: /Msg[count('OrderStatus') > 0] but this won't work since I get an error message saying: "Namespace Manager or XsltContext needed.
This is a FAQ.
In XPath, an unprefixed name is always considered to belong to "no namespace".
However, the elements you want to select are in fact in the "http://www.noventus.se/epix1/genericheader.xsd" namespace.
You have two possible ways to write your XPath expression:
1. Use the facilities of the hosting language to associate prefixes with all the different namespaces to which names in the expression belong. You haven't indicated what the hosting language is in this concrete case, so I can't help you with this. A C# example can be found here.
If you have associated the prefix "xxx" with the namespace "http://www.noventus.se/epix1/genericheader.xsd" and the prefix "yyy" with the namespace "http://www.noventus.se/epix1/orderstatus.xsd", then your expression can be written as:
/xxx:Msg[yyy:OrderStatus]
2. If you don't want to use any prefixes at all, an XPath expression can still be constructed, although it will not be very readable:
/*[local-name() = 'Msg' and *[local-name() = 'OrderStatus']]
Finally, do note:
In order to test if an element x has a child y it isn't necessary to test for a positive count(y). Just use: x[y]
Xpath positions are 1-based. This means that NodeSetExpression[0] never selects a node. You want: NodeSetExpression[1]
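As a rough illustration of the prefix-binding approach in a hosting language, here is a sketch in Python using the standard-library ElementTree. This is an assumption made for the example, since the question does not say which language is in use (the error message suggests .NET). The XML is a shortened, closed version of the document from the question.

```python
import xml.etree.ElementTree as ET

XML = """<ns:Msg xmlns:ns="http://www.noventus.se/epix1/genericheader.xsd">
  <GenericHeader><SubsysId>1</SubsysId></GenericHeader>
  <m:OrderStatus xmlns:m="http://www.noventus.se/epix1/orderstatus.xsd">
    <Header><OrderNo>20090811</OrderNo></Header>
  </m:OrderStatus>
</ns:Msg>"""

# The prefixes used in the query are arbitrary; only the URIs have to match
# the namespaces declared in the document.
NS = {
    "xxx": "http://www.noventus.se/epix1/genericheader.xsd",
    "yyy": "http://www.noventus.se/epix1/orderstatus.xsd",
}

root = ET.fromstring(XML)

# ElementTree stores qualified names as "{namespace-uri}local-name".
is_msg = root.tag == "{%s}Msg" % NS["xxx"]

# Rough equivalent of the test in /xxx:Msg[yyy:OrderStatus].
has_order_status = root.find("yyy:OrderStatus", NS) is not None

# An unprefixed name means "no namespace", so this finds nothing.
unprefixed = root.find("OrderStatus")

print(is_msg, has_order_status, unprefixed)  # True True None
```

The unprefixed lookup returning nothing is the same effect described above: the element exists, but not under a name in "no namespace".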

Resources