Get element name by containing text - ruby

I'm looking through HTML documents for the text: "Required". What I need to find is the element that holds the text. For example:
<p>... Required</p>
Here I would want the element name: p
However, it might not be in a <p> tag. It could be in any kind of tag, which is where this question differs from some of the other search text Stack Overflow questions.
Right now I'm using:
page.at(':contains("Required")')
but this only gets me the whole html element.

The problem is that the :contains pseudo-class matches any element that has the searched-for text anywhere in its descendants. You need the innermost element that contains the text. Since html is the ancestor of every element, if the page contains the text anywhere then html contains it too, so html is the first matching element.
I’m not sure you can achieve this with CSS, but you can use XPath like this:
page.at_xpath('//*[text()[contains(., "Required")]]')
This finds the first element node that has a text() node as a child that contains Required. When you have that node (if it exists) you can then call name on it to give the name of the element.

For CSS you can do:
page.at('[text()*="Required"]')
It's not real CSS though, or even a jQuery extra.

You should use CSS selectors:
page.css('p').text

Related

XPATH - how to get the text if an element contains a certain class

How do I grab this text here?
I am trying to grab the text here based on that the href contains "#faq-default".
I tried this first of all but it doesn't grab the text, only the actual href name, which is pointless:
//a/@href[contains(., '#faq-default-2')]
There will be many of these hrefs, such as default-2, default-3 so I need to do some kind of contains query, I'd guess?
You are selecting the @href attribute node instead of the a element. So try this instead:
//a[contains(@href, '#faq-default-2')]

What is the proper way to use descendant in XPath

I am trying to find all div elements that have the attribute widget-name and a descendant span tag that has a title attribute.
This is what I am trying.
//div[@widget-name and descendant::span[@title]]
This seems to almost work but it is missing one element in the Nodes Collection it returns.
Never mind.
This is what I needed:
//div[@widget-name and descendant::span[@class='title']]
OK - take it back.
This is not the complete answer.
I am now trying to tweak this to where it returns all except where title is not equal to some text:
//div[@widget-name and descendant::span[@class='title' and [text()[contains(., '{someTextToKeep}'
Anyone see why this would be invalid XPath?
Final answer is:
//div[@widget-name and descendant::span[@class='title' and text()[not(contains(., 'someTextToKeep'))]]]
This XPath should return all divs that:
have a widget-name attribute
have a descendant span element (written with the abbreviated .// syntax) that:
has a class attribute with the value 'title'
contains the text 'someTextToKeep' (if you want to exclude spans with certain text, wrap the contains() in not()).
XPath:
//div[@widget-name and .//span[@class='title'][contains(.,'someTextToKeep')]]

xpath: find a node whose content has a provided string

I have some HTML like this:
<div> Make </div>
And I want to match it based on the fact that the content of the node contains the text "Make".
Put another way "Make" is a substring of the div node's content and I want to make such a match on this node using XPath.
The obvious solution would be
//div[contains(., 'Make')]
but this will find all divs that contain the string "Make" anywhere within their content, so not only will it find the example you've given in the question but also any ancestor div of that one, or any divs where that substring is buried deep in a descendant element.
If you only want cases where that string is directly inside the div with no other intervening elements then you'd have to use the slightly more complex
//div[text()[contains(., 'Make')]]
This is subtly different from
//div[contains(text(), 'Make')]
which would look only in the first text node child of the div, so it would find <div>Make<br/>Break</div> but not <div>Break<br/>Make</div>
If you want to allow for intervening elements other than div, then try
//div[contains(., 'Make')][not(.//div[contains(., 'Make')])]
Seems like this is what you are looking for: //div[contains(text(),'Make')]
If this does not work you can try: //div[contains(.,'Make')]. This will find all divs whose text content, including the text of their descendants, contains 'Make'.
To find that node anywhere in the document, you would need this:
//div[contains(text(), "Make")]

Checking the HTML structure with XPATH, any count of nodes

I want to check the structure of a piece of HTML markup, just the structure.
For example, I need to check that SOMEWHERE inside the div with class 'list-item-canvas' there is an img tag with name='category-pic'.
I write:
//div[@class='list-item-canvas'][1]/*/img[@name='category-pic']
That works if <img> is the second node down, after any ('*') node in the hierarchy. BUT if <img> is somewhere deep in the structure and I do not want to care about the level in the hierarchy, how should I write my XPath query? I would think that instead of '*' I might write '**', but I cannot.
Is it possible?
Use:
(//div[@class='list-item-canvas'])[1]//img[@name='category-pic']
This selects any img the string value of whose name attribute is 'category-pic' and that is a descendant of the first (in document order) div the string value of whose class attribute is 'list-item-canvas'.
Do note the parentheses surrounding the subexpression:
(//div[@class='list-item-canvas'])[1]
this is quite different from:
//div[@class='list-item-canvas'][1]
the latter selects every div element in the document whose class attribute is 'list-item-canvas' and that is the first such div child of its parent, and there may be more than one such element.
Do this:
//div[@class='list-item-canvas'][1]//img[@name='category-pic']
The // before img lets you find any descendant of the div that is an img, instead of just children or grandchildren of the div.
Also are you sure you want the [1] there? It may not be doing what you think.

XPath intersection of two sets

I need to extract all links from a html document having text as the inner element and not a reference to an image. Basically I would like to do a doc.select("//a/attribute::href") for all elements in a tree where doc.select("//a/text()") returns anything. Thanks!
Well, you can write conditions in XPath in a predicate in square brackets, e.g. //a[text()]/@href selects the href attributes of all link (a) elements that have at least one text node child. Or, if you want to make sure there is no img child element in the link, you can use e.g. //a[not(img)]/@href.
