How to write XPath expressions to distinguish between results?

I am new to XPath expressions and need help with an issue.
Consider the following document:
<tbody><tr>
<td>By <strong>Bec</strong></td>
<td><strong>Great Support</strong></td>
</tr></tbody>
Here I have to find the text inside the strong tags separately.
Following is my XPath expression:
//tbody//td//strong/text()
It evaluates output as expected:
Bec
Great Support
How can I write XPath expressions to distinguish between the results, i.e. Bec and Great Support?

It's rather unclear what you're trying to do, but the following should succeed in selecting them separately:
//tbody/tr/td[1]/strong
and
//tbody/tr/td[2]/strong
Note that the text() you had at the end is most likely not needed in this case.
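A quick way to confirm that the positional expressions return one string each is to run them against the sample with the JDK's built-in javax.xml.xpath API (XPath 1.0). This is a minimal sketch, assuming Java 9+ and that the question's snippet parses as a standalone, well-formed XML fragment; the class PositionalSelect and its texts helper are illustrative names, not part of any library.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class PositionalSelect {
    // The markup from the question, parsed as a well-formed XML fragment.
    static final String XML =
        "<tbody><tr>"
        + "<td>By <strong>Bec</strong></td>"
        + "<td><strong>Great Support</strong></td>"
        + "</tr></tbody>";

    // Evaluates an XPath 1.0 expression and returns the text content of each matched node.
    static List<String> texts(String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expression, doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(texts("//tbody//td//strong/text()")); // [Bec, Great Support]
        System.out.println(texts("//tbody/tr/td[1]/strong"));    // [Bec]
        System.out.println(texts("//tbody/tr/td[2]/strong"));    // [Great Support]
    }
}
```

The strong element and its text() child yield the same string here, which is why the trailing text() is optional when the host code reads the node's text content.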

Not sure I understand 100%, but if you're trying to get the text of the first and the second strong tags, you can use position() (a 1-based index). Note that td is a child of tr, not of tbody, so the path needs the tr step.
First text:
//tbody/tr/td[position()=1]/strong/text()
Second text:
//tbody/tr/td[position()=2]/strong/text()
This solution only applies to the current sample, though, where your strong tags are inside either the first or the second td tag.

Not sure this is what you're looking for, but assuming you want to retrieve a node based on its text, you can match on the text content with something like:
//tbody//td//strong/text()[.="Bec"]
PS: in [.=""] the dot is an abbreviation for self::node() (thanks JLRishe for pointing out the mistake).
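The text predicate can be checked the same way. The sketch below (same assumptions: Java 9+, the JDK's javax.xml.xpath, the question's markup as a well-formed fragment, an illustrative class name) shows that [.='Bec'] and the spelled-out [self::node()='Bec'] select the same node.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class TextPredicate {
    // The markup from the question, parsed as a well-formed XML fragment.
    static final String XML =
        "<tbody><tr>"
        + "<td>By <strong>Bec</strong></td>"
        + "<td><strong>Great Support</strong></td>"
        + "</tr></tbody>";

    // Evaluates an XPath 1.0 expression and returns the text content of each matched node.
    static List<String> texts(String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expression, doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // "." is shorthand for self::node(), so these two select the same text node.
        System.out.println(texts("//tbody//td//strong/text()[.='Bec']"));
        System.out.println(texts("//tbody//td//strong/text()[self::node()='Bec']"));
        // The predicate can also sit on the element; its string value is compared.
        System.out.println(texts("//strong[.='Great Support']"));
    }
}
```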

Related

Evaluate XPath selector to get text in p and li tags

To automatically replace keywords with links based on a list of keyword-link pairs, I need to get the text inside paragraphs (p) and list items (li) that is not already linked, not inside a script, and not manually excluded. This is for Drupal's Alinks module.
I modified the existing XPath selector as follows and would like feedback on whether it is efficient or could be improved:
//*[p or li]//text()[not(ancestor::a) and not(ancestor::script) and not(ancestor::*[@data-alink-ignore])]
The XPath is meant to work with any HTML5 content, including self-closing tags (not well-formed XML); that's the way the module was designed, and it works quite well.
In order to select text nodes that are descendants of p or li elements but not descendants of a or script elements (or of elements marked with data-alink-ignore), you can use this XPath 1.0:
//*[self::p|self::li]
//text()[
not(ancestor::a|ancestor::script|ancestor::*[@data-alink-ignore])
]
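To see why self::p|self::li matters, the sketch below evaluates both the question's //*[p or li] form and the self:: form, with the same exclusion filter, against a small hypothetical fragment. It assumes Java 9+ and the JDK's javax.xml.xpath; the fragment is kept well-formed because this parser only accepts XML, whereas real HTML5 input would need an HTML-aware parser.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class AlinkTextNodes {
    // Hypothetical page fragment, kept well-formed so an XML parser accepts it.
    static final String XML =
        "<body>"
        + "<p>Plain text <a href=\"/x\">linked</a> more<script>track()</script></p>"
        + "<ul><li>Item <span data-alink-ignore=\"1\">skip me</span></li></ul>"
        + "<div>outside</div>"
        + "</body>";

    // Exclusion filter shared by both expressions below.
    static final String FILTER =
        "[not(ancestor::a|ancestor::script|ancestor::*[@data-alink-ignore])]";

    // Text under the p or li elements themselves.
    static final String SELF_P_OR_LI = "//*[self::p|self::li]//text()" + FILTER;

    // Text under any element that merely HAS a p or li child (here: body and ul).
    static final String HAS_P_OR_LI_CHILD = "//*[p or li]//text()" + FILTER;

    // Returns the value of every matched text node, in document order.
    static List<String> texts(String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expression, doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(texts(SELF_P_OR_LI));      // excludes "outside"
        System.out.println(texts(HAS_P_OR_LI_CHILD)); // also picks up "outside"
    }
}
```

Because body has a p child, the //*[p or li] form reaches every text node in the body, including the div's text, so the self:: form is the one that actually restricts the search to paragraphs and list items.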
Your XPath expression is invalid. You are missing a / before text(). So a valid expression would be
//*[p or li]/text()[not(ancestor::a) and not(ancestor::script) and not(ancestor::*[@data-alink-ignore])]
But without an XML source file it is impossible to tell if this expression would match your desired node.

XPath search from IMPORTXML function in Google Sheets

How do I get the "Div/yield" value from here? I've tried //td[node()='Div/yield'], //td[text()='Div/yield'],
and //td[@data-snapfield='latest_dividend-dividend_yield']/following-sibling::td
@sideshowbarker is correct that there's a newline at the end, so looking for an element with the exact text returns 0 results. Another way to do this (the other is @sideshowbarker's answer) is to look for an element that contains this text. So the first step is:
//td[contains(text(),'Div/yield')]
But you don't need this. Your last attempt is on the right track: you've identified the element you're after, but I think you're looking for its text. So you need to add text() at the end:
//td[@data-snapfield='latest_dividend-dividend_yield']/following-sibling::td/text()
But if you want to use the field name, so that you can use the same XPath for the other fields as well, just combine these:
//td[contains(text(),'Field name')]/following-sibling::td/text()
Now just replace Field name with the field you're after.
e.g. 'Div/yield': //td[contains(text(),'Div/yield')]/following-sibling::td/text()
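The effect of the trailing newline can be reproduced outside Google Sheets. This sketch assumes the row looks roughly like the hypothetical markup below (the real page was not shown, and 1.23/4.56 is a placeholder value) and uses the JDK's javax.xml.xpath (XPath 1.0) on Java 9+.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class DivYieldLookup {
    // Hypothetical row: the label cell's text ends with a newline, as described above.
    static final String XML =
        "<table><tr>"
        + "<td data-snapfield=\"latest_dividend-dividend_yield\">Div/yield\n</td>"
        + "<td>1.23/4.56</td>"
        + "</tr></table>";

    // Evaluates an XPath 1.0 expression and returns the text content of each matched node.
    static List<String> texts(String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expression, doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Exact comparison fails because the text node is "Div/yield\n".
        System.out.println(texts("//td[text()='Div/yield']"));
        // contains() tolerates the trailing newline.
        System.out.println(texts("//td[contains(text(),'Div/yield')]/following-sibling::td/text()"));
        // The attribute-based path, with text() appended.
        System.out.println(texts(
            "//td[@data-snapfield='latest_dividend-dividend_yield']/following-sibling::td/text()"));
    }
}
```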

XPath different in IE and Firefox. Why?

I used Firebug's Inspect Element to capture the XPath in a webpage, and it gave me something like:
//*[@id="Search_Fields_profile_docno_input"]
I used the Bookmarklets technique in IE to capture the XPath of the same object, and I got something like:
//INPUT[@id='Search_Fields_profile_docno_input']
Notice, the first one does not have INPUT instead has an asterisk (*). Why am I getting different XPath expressions? Does it matter which one I use for my tests like:
Selenium.Click("//*[@id='Search_Fields_profile_docno_input']");
OR
Selenium.Click("//INPUT[@id='Search_Fields_profile_docno_input']");
The * in the first expression denotes that it can be any element, while the second one tells Selenium to look ONLY for INPUT fields whose id is Search_Fields_profile_docno_input. The second XPath is better for the following reasons:
It takes more time to find the element using *, because the id of every element has to be checked.
If your HTML code is not "well written", other elements could have the same id, and this could cause your test to fail.
The first one matches any element with a matching ID, whereas the second one restricts matches to <input> elements. If these were CSS expressions it'd be the difference between #Search_Fields_profile_docno_input and input#Search_Fields_profile_docno_input.
Assuming you only use this ID once in your web page, the two XPaths are effectively equivalent. They'll both match the <input id="Search_Fields_profile_docno_input"> element and no other.
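The difference between the two locators only shows up when another element reuses the id. This minimal sketch (Java 9+, the JDK's javax.xml.xpath, hypothetical markup) lists the element names each expression matches. XPath name tests are case-sensitive in XML, so it uses lowercase input to match the markup rather than the INPUT that the IE bookmarklet produced.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class IdLocatorComparison {
    static final String ANY = "//*[@id='Search_Fields_profile_docno_input']";
    static final String INPUT_ONLY = "//input[@id='Search_Fields_profile_docno_input']";

    // Hypothetical markup where the id is used once, as it should be.
    static final String UNIQUE =
        "<form><label for=\"Search_Fields_profile_docno_input\">Doc no</label>"
        + "<input id=\"Search_Fields_profile_docno_input\"/></form>";

    // Hypothetical badly written markup where a span reuses the same id.
    static final String DUPLICATE =
        "<form><span id=\"Search_Fields_profile_docno_input\">hint</span>"
        + "<input id=\"Search_Fields_profile_docno_input\"/></form>";

    // Returns the names of the elements matched by the expression, in document order.
    static List<String> names(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expression, doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getNodeName());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(names(UNIQUE, ANY));           // [input]
        System.out.println(names(UNIQUE, INPUT_ONLY));    // [input]
        System.out.println(names(DUPLICATE, ANY));        // [span, input]
        System.out.println(names(DUPLICATE, INPUT_ONLY)); // [input]
    }
}
```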
There are some good answers to your "why?" question here, but for Selenium use, there's an even better alternative. Since your page element has an ID attribute, use Selenium's ID locator instead of XPath or CSS:
Selenium.Click("id=Search_Fields_profile_docno_input");
This will go directly to the element, and will run quicker than just about any other locator. Note that the syntax is id=value, not id="value".
Given any element in your document, there's an infinite number of XPath expressions that will select it uniquely. Therefore it's entirely reasonable for two different products to generate two different paths.
Google has just released Wicked Good XPath, a rewrite of Cybozu Labs' famous JavaScript-XPath. Link: https://code.google.com/p/wicked-good-xpath/ The rewritten version is 40% smaller and about 30% faster than the original implementation.
You could try it out as a replacement for the XPath library used in Selenium.

Trouble using Xpath "starts with" to parse xhtml

I'm trying to parse a webpage to get posts from a forum.
Each message starts with the following format:
<div id="post_message_somenumber">
and I only want to get the first one.
I tried xpath='//div[starts-with(@id, '"post_message_')]' in YQL without success.
I'm still learning this; does anyone have suggestions?
I think I have a solution that does not require dealing with namespaces.
Here is one that selects all matching div's:
//div[@id[starts-with(.,"post_message")]]
But you said you wanted just the "first one" (I assume you mean the first "hit" in the whole page?). Here is a slight modification that selects just the first matching result:
(//div[@id[starts-with(.,"post_message")]])[1]
These use the dot to represent the id's value within the starts-with() function. You may have to escape special characters in your language.
It works great for me in PowerShell:
# Load a sample xml document
$xml = [xml]'<root><div id="post_message_somenumber"/><div id="not_post_message"/><div id="post_message_somenumber2"/></root>'
# Run the xpath selection of all matching div's
$xml.selectnodes('//div[@id[starts-with(.,"post_message")]]')
Result:
id
--
post_message_somenumber
post_message_somenumber2
Or, for just the first match:
# Run the xpath selection of the first matching div
$xml.selectnodes('(//div[@id[starts-with(.,"post_message")]])[1]')
Result:
id
--
post_message_somenumber
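For readers not using PowerShell, the same two expressions behave identically in Java's built-in javax.xml.xpath. This is a minimal sketch assuming Java 9+, reusing the sample document from the PowerShell example; it also includes the more common starts-with(@id,'post_message') spelling, which selects the same divs.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class StartsWithFirst {
    // Same sample document as the PowerShell example.
    static final String XML =
        "<root><div id=\"post_message_somenumber\"/>"
        + "<div id=\"not_post_message\"/>"
        + "<div id=\"post_message_somenumber2\"/></root>";

    static final String ALL = "//div[@id[starts-with(.,'post_message')]]";
    // Parentheses apply [1] to the whole result set, not per parent element.
    static final String FIRST = "(" + ALL + ")[1]";
    static final String COMMON = "//div[starts-with(@id,'post_message')]";

    // Returns the id attribute of every matched element, in document order.
    static List<String> ids(String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expression, doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(((Element) nodes.item(i)).getAttribute("id"));
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(ids(ALL));    // [post_message_somenumber, post_message_somenumber2]
        System.out.println(ids(FIRST));  // [post_message_somenumber]
        System.out.println(ids(COMMON)); // same as ALL
    }
}
```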
I tried xpath='//div[starts-with(@id, '"post_message_')]' in yql without success. I'm still learning this, anyone have suggestions?
If the problem isn't due to the many nested apostrophes and the unclosed double-quote, then the most likely cause (we can only guess without being shown the XML document) is that a default namespace is used.
Specifying names of elements that are in a default namespace is the most frequently asked question about XPath. If you search for "XPath default namespace" on SO or on the internet, you'll find many sources with the correct solution.
Generally, a special method must be called that binds a prefix (say "x:") to the default namespace. Then, in the XPath expression, every element name "someName" must be replaced by "x:someName".
Here is a good answer how to do this in C#.
Read the documentation of your language/xpath-engine how something similar should be done in your specific environment.
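As one example of such an environment, Java's javax.xml.xpath binds prefixes through a NamespaceContext. The sketch below assumes Java 9+ and a hypothetical XHTML page whose elements are in the default namespace; the prefix x matches the description above and is otherwise arbitrary. The DocumentBuilderFactory is made namespace-aware, because it is not by default.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class DefaultNamespaceLookup {
    static final String XHTML = "http://www.w3.org/1999/xhtml";

    // Hypothetical XHTML page: every element is in the default (XHTML) namespace.
    static final String PAGE =
        "<html xmlns=\"" + XHTML + "\"><body>"
        + "<div id=\"post_message_1\">first</div>"
        + "<div id=\"post_message_2\">second</div>"
        + "</body></html>";

    // Binds the illustrative prefix "x" to the XHTML namespace for use in expressions.
    static final NamespaceContext PREFIXES = new NamespaceContext() {
        @Override
        public String getNamespaceURI(String prefix) {
            return "x".equals(prefix) ? XHTML : XMLConstants.NULL_NS_URI;
        }

        @Override
        public String getPrefix(String namespaceURI) {
            return XHTML.equals(namespaceURI) ? "x" : null;
        }

        @Override
        public Iterator<String> getPrefixes(String namespaceURI) {
            if (XHTML.equals(namespaceURI)) {
                return List.of("x").iterator();
            }
            return Collections.<String>emptyIterator();
        }
    };

    // Returns the id attribute of every element matched by the expression.
    static List<String> ids(String expression) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Namespace-aware parsing records the XHTML namespace on each element.
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(PAGE)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(PREFIXES);
            NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(((Element) nodes.item(i)).getAttribute("id"));
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(ids("//div"));   // unprefixed name: matches only no-namespace elements
        System.out.println(ids("//x:div")); // prefixed name: both divs
        System.out.println(ids("(//x:div[starts-with(@id,'post_message')])[1]"));
    }
}
```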
@FindBy(xpath = "//div[starts-with(@id,'expiredUserDetails') and contains(text(), 'Details')]")
private WebElementFacade ListOfExpiredUsersDetails;
This one gives a list of all div elements on the page whose id starts with expiredUserDetails and whose text contains Details.
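Outside Serenity, the same XPath can be checked with plain javax.xml.xpath. This sketch uses hypothetical ids and text (Java 9+); note that in XPath 1.0, contains(text(), 'Details') only tests the first text node child of each div.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class IdAndTextFilter {
    // Hypothetical page: only the first div satisfies both conditions.
    static final String XML =
        "<root>"
        + "<div id=\"expiredUserDetails_1\">Details for user 1</div>"
        + "<div id=\"expiredUserDetails_2\">Summary</div>"
        + "<div id=\"activeUserDetails\">Details</div>"
        + "</root>";

    static final String EXPRESSION =
        "//div[starts-with(@id,'expiredUserDetails') and contains(text(), 'Details')]";

    // Returns the id attribute of every matched element, in document order.
    static List<String> ids(String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expression, doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(((Element) nodes.item(i)).getAttribute("id"));
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(ids(EXPRESSION)); // [expiredUserDetails_1]
    }
}
```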

xpath expression to select text from link

My HTML file has content like this:
<a class="bf" title="Link to book" href="/book/229920/">book name</a>
Please help me construct an XPath expression to get the link text (book name).
I tried /a, but the expression returns no results.
If the context is the entire document you should probably use // instead of /. Also you may (not sure about that) need to get down one more level to retrieve the text.
I think it should look like this
//a/text()
EDIT: As Tomalak pointed out it's text() not text
Have you tried
//a
?
More specific is better:
//a[@class='bf' and starts-with(@href, '/book/')]
Note that this selects the <a> element. In your host environment it's easy to extract the text value of that node via standard DOM methods (like the .textContent property).
To select the actual text node, see the other answers in this thread.
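For comparison, here is the element-based approach in Java, where org.w3c.dom.Node.getTextContent() plays the role of the browser's .textContent. It is a sketch assuming Java 9+ and a hypothetical page that contains a second, unrelated link, so it also shows how //a on its own can return more than the book link.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class BookLinkText {
    // Hypothetical page: the book link plus an unrelated link.
    static final String PAGE =
        "<html><body>"
        + "<p>See <a class=\"bf\" title=\"Link to book\" href=\"/book/229920/\">book name</a></p>"
        + "<a href=\"/about\">About</a>"
        + "</body></html>";

    // Selects elements and reads their text via the DOM, not via text() in the XPath.
    static List<String> texts(String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(PAGE)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expression, doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(texts("//a"));                                              // [book name, About]
        System.out.println(texts("//a[@class='bf' and starts-with(@href, '/book/')]")); // [book name]
    }
}
```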
It depends also on the rest of your document. If you use // in the beginning all the matching nodes will be returned, which might be too many results in case you have other links in your document.
Apart from that a possible xpath expression is //a/text().
The /a you tried only returns the a-tag itself, if it is the root element. To get the link text you need to append the /text() part.
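The /a versus //a/text() difference can be demonstrated with two hypothetical documents: one where the link is the root element and one where it is nested. This is a minimal sketch assuming Java 9+ and the JDK's javax.xml.xpath.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class RootVsDescendant {
    // Hypothetical document where the link IS the root element.
    static final String ROOT_ONLY =
        "<a class=\"bf\" title=\"Link to book\" href=\"/book/229920/\">book name</a>";

    // Hypothetical document where the link is nested inside other elements.
    static final String NESTED =
        "<html><body><p>" + ROOT_ONLY + "</p></body></html>";

    // Evaluates an XPath 1.0 expression and returns the text content of each matched node.
    static List<String> texts(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expression, doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(texts(ROOT_ONLY, "/a/text()")); // [book name]
        System.out.println(texts(NESTED, "/a"));           // [] because a is not the root
        System.out.println(texts(NESTED, "//a/text()"));   // [book name]
    }
}
```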
