Selenium: Extracting only the text, without any sub-elements, from <p> - selenium-rc

Below is the sample code:
<p>
I want this Text
<sup> not this </sup>
.(Need this too).
<sup> and not this </sup>
</p>
Using Selenium RC, selenium.getText("//...") returns all of the text, including the text inside the <sup> elements.
Is there any way to get the text from <p> without the content of the <sup> tags?
Please let me know. Thanks.

Your only option is to get the text of the three elements and strip out the parts you don't want. That, or resort to using getEval() to run some JavaScript that gets the <p> element's innerHTML property, then remove the parts inside the <sup> elements yourself.
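The "keep the paragraph's own text, drop the <sup> parts" idea can be sketched outside the browser. The following is a minimal illustration in Python using only the standard library's xml.dom.minidom, not Selenium RC itself; the helper name own_text and the SAMPLE string are assumptions made for this example. In a browser, the same loop over childNodes could presumably be run through getEval(), keeping only the text nodes.

```python
from xml.dom import minidom

# The sample markup from the question (assumed to be well-formed XML).
SAMPLE = """<p>
I want this Text
<sup> not this </sup>
.(Need this too).
<sup> and not this </sup>
</p>"""


def own_text(element):
    """Join only the text nodes that are direct children of element."""
    parts = [
        child.data
        for child in element.childNodes
        if child.nodeType == child.TEXT_NODE
    ]
    # Collapse runs of whitespace, similar to XPath normalize-space().
    return " ".join("".join(parts).split())


paragraph = minidom.parseString(SAMPLE).documentElement
print(own_text(paragraph))
```

Because the <sup> elements are element nodes rather than text nodes, their content never enters the result, so no string surgery on innerHTML is needed.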

Related

How to remove everything after specific text with XPath

I am trying to set up a Telegram Instant View for a website.
I have something like this code and want to remove everything after the "remove from here" text:
<p> sample text <p> test</p> remove from here <p>test text</p> </p>
How can I access every text/node after this specific text ("remove from here") and remove them?
Update:
I want to have this result:
<p> sample text <p> test</p> remove from here</p>
Regarding "how can I access every text/node after this specific text": you can use following-sibling::* from XPath to access the nodes on the same level after the one you selected.
Then use the @remove function from the Instant View DSL:
$selected_node: //text()[normalize-space()="remove from here"]
@remove: $selected_node/following-sibling::*
You may want to be more specific with the $selected_node. Depending on your needs, you may want to add predicates to remove only certain types of the following siblings, for example: following-sibling::*[self::node() or self::text()]. Note that * matches element nodes only; use following-sibling::node() if text nodes after the marker must be removed as well.
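To see what "every following sibling of the marker text" means on the sample markup, here is a minimal sketch in Python with the standard library's xml.dom.minidom. It is not Instant View DSL; the function name remove_after_text is hypothetical, and the sketch only looks at direct children of the root element.

```python
from xml.dom import minidom

# The sample markup from the question (nested <p> is fine for an XML parser).
SAMPLE = "<p> sample text <p> test</p> remove from here <p>test text</p> </p>"


def remove_after_text(root, marker):
    """Remove every sibling that follows the text node matching marker.

    Only direct children of root are inspected. Returns True if the
    marker text was found, False otherwise.
    """
    for node in list(root.childNodes):
        if node.nodeType == node.TEXT_NODE and node.data.strip() == marker:
            sibling = node.nextSibling
            while sibling is not None:
                following = sibling.nextSibling  # remember before detaching
                root.removeChild(sibling)
                sibling = following
            return True
    return False


root = minidom.parseString(SAMPLE).documentElement
remove_after_text(root, "remove from here")
print(root.toxml())
```

Walking nextSibling removes both element and text siblings, which corresponds to following-sibling::node() rather than following-sibling::*.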

Detect first non-empty element

After reading the most relevant XPath questions about detecting empty nodes, I still cannot find the first non-empty element. The dataset looks like:
<div>
<p>
<elem> </elem>
</p>
<p>
<elem> </elem>
</p>
<p>
<elem> </elem>
</p>
<p>
<elem>   </elem>
</p>
<p>
<elem>Application</elem>
</p>
<p>
<elem>Other text that should not be detected.</elem>
</p>
<p>
<elem> </elem>
</p>
<p>
<elem>Second application</elem>
</p>
</div>
Basically, the empty elements should not be taken into account, and we only want to detect the first Application element. We've been testing a lot with normalize-space and related functions, but cannot get this working.
The main problem is the empty elements. The check we have right now handles the positioning flawlessly, but fails once the HTML contains such elements:
/div/p[position() < 3]//*[normalize-space()='Application']
So, how can we ignore empty elements? Is this only possible via an additional step in between?
In my definition, an empty element does not have any child nodes, so //*[not(node())] would select all empty elements by that definition. If you want to allow certain text content, then you could check normalize-space after removing it, e.g. //*[not(*) and not(normalize-space(translate(., '&#160;', '')))]. Basically, you need to list, as the second argument of the translate call, all characters that you want to remove before checking with normalize-space. The XPath expression I have written would work inside XSLT, where the numeric character reference is parsed by an XML parser; in general, how to escape characters depends on the host language you use XPath with.
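The translate-then-normalize-space check can be mirrored in ordinary code. The sketch below uses Python's standard xml.dom.minidom and assumes that the "empty" elements in the dataset hold non-breaking spaces (U+00A0), which is a guess about the original data; the names visible_text and first_non_empty, and the shortened SAMPLE, are made up for this illustration.

```python
from xml.dom import minidom

NBSP = "\u00a0"  # the character that the reference &#160; stands for

# Shortened dataset; the NBSP content of the "empty" elements is an assumption.
SAMPLE = (
    "<div>"
    "<p><elem>\u00a0</elem></p>"
    "<p><elem> \u00a0 </elem></p>"
    "<p><elem>Application</elem></p>"
    "<p><elem>Other text that should not be detected.</elem></p>"
    "<p><elem>\u00a0</elem></p>"
    "<p><elem>Second application</elem></p>"
    "</div>"
)


def visible_text(element):
    """Text of element with NBSP removed and whitespace collapsed."""
    raw = "".join(
        node.data
        for node in element.childNodes
        if node.nodeType == node.TEXT_NODE
    )
    # replace() mirrors translate(., NBSP, ''); split()/join mirrors
    # normalize-space(). Python's split() already treats NBSP as
    # whitespace, so the replace is kept only to match the XPath.
    return " ".join(raw.replace(NBSP, "").split())


def first_non_empty(root, tag):
    """Return the text of the first tag element that has visible text."""
    for element in root.getElementsByTagName(tag):
        text = visible_text(element)
        if text:
            return text
    return None


root = minidom.parseString(SAMPLE).documentElement
print(first_non_empty(root, "elem"))
```

The point to note is that XPath's normalize-space() does not strip U+00A0, which is why the answer removes it with translate first.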

Trouble accessing a text with XPath query

I have this HTML snippet:
<div id="overview">
<strong>some text</strong>
<br/>
some other text
<strong>more text</strong>
TEXT I NEED IS HERE
<div id="sub">...</div>
</div>
How can I get the text I am looking for (shown in caps)?
I tried this, and I get an error message saying it is not able to locate the element:
"//div[@id='overview']/strong[position()=2]/following-sibling"
I tried this, and I get the div with id=sub, but not the text (correctly so):
"//div[@id='overview']/*[preceding-sibling::strong[position()=2]]"
Is there any way to get the text, other than doing some string matching or regex on the contents of the overview div?
Thanks.
following-sibling is the axis; you still need to specify the actual node test (in your example, the XPath processor is searching for an element named following-sibling). You separate the axis from the node test with ::.
Try this:
//div[@id='overview']/strong[position()=2]/following-sibling::text()[1]
This specifies the first text node after the second strong in the div.
If you always want the text immediately preceding the <div id="sub">, then you could try
//div[@id='sub']/preceding-sibling::text()[1]
That would give you everything between the </strong> and the opening <div ..., i.e. the upper-case text plus its leading and trailing newlines and whitespace.
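What following-sibling::text()[1] selects can be illustrated with a manual sibling walk. This is a minimal sketch in Python using the standard xml.dom.minidom rather than an XPath engine; the helper name text_after_nth_child is hypothetical, and the result is stripped of the surrounding newlines that the XPath version would return.

```python
from xml.dom import minidom

# The snippet from the question.
SAMPLE = """<div id="overview">
<strong>some text</strong>
<br/>
some other text
<strong>more text</strong>
TEXT I NEED IS HERE
<div id="sub">...</div>
</div>"""


def text_after_nth_child(parent, tag, n):
    """Return the first text node after the n-th <tag> child, stripped."""
    matches = [
        child
        for child in parent.childNodes
        if child.nodeType == child.ELEMENT_NODE and child.tagName == tag
    ]
    if len(matches) < n:
        return None
    # Walk forward along the sibling chain until a text node is found.
    sibling = matches[n - 1].nextSibling
    while sibling is not None:
        if sibling.nodeType == sibling.TEXT_NODE:
            return sibling.data.strip()
        sibling = sibling.nextSibling
    return None


overview = minidom.parseString(SAMPLE).documentElement
print(text_after_nth_child(overview, "strong", 2))
```

The text is a sibling of the second strong, not a child of it, which is why a child-based query finds nothing.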

How to get node text without children?

I use Nokogiri to parse an HTML page with content like this:
<p class="parent">
Useful text
<br>
<span class="child">Useless text</span>
</p>
When I call the method page.css('p.parent').text, Nokogiri returns 'Useful text Useless text'. But I need only 'Useful text'.
How do I get the node text without its children?
XPath includes the text() node test for selecting text nodes, so you could do:
page.xpath('//p[@class="parent"]/text()')
Using XPath to select HTML classes can become quite tricky if the element in question could belong to more than one class, so this might not be ideal.
Fortunately Nokogiri adds the text() selector to CSS, so you can use:
page.css('p.parent > text()')
to get the text nodes that are direct children of p.parent. This will also return some nodes that are whitespace only, so you may have to filter them out.
You should be able to use page.css('p.parent').children.remove.
Then your page.css('p.parent').text will return the text without the children nodes.
Note: the page will be modified by the remove

Get specific element in webdriver containing text

What are some good ways to retrieve a specific element in WebDriver/Selenium2 based only on the text inside the element?
<div class="page">
<ul id="list">
<li>Apple</li>
<li>Orange</li>
<li>Banana</li>
<li>Grape</li>
</ul>
</div>
Essentially, I'd like to write something like this to retrieve the specific element:
@driver.find_element(:id, "list").find_element(:text, "Orange")
This is very similar to how I would use a selector when finding text inside a link (i.e. :link_text or :partial_link_text), but I would like to find elements by text inside normal, non-link elements.
Any suggestions? How do you deal with this issue? (In case you were wondering, I am using Ruby.)
You could do that with XPath. Something like this for your example:
@driver.find_element(:id, "list").find_element(:xpath, './/*[contains(., "Orange")]')
A couple of years late, but I was just going to ask this question and answer it so others could find it...
I used a CSS selector to get all the li elements and then filtered the array based on the text:
@driver.find_elements(css: '#list > li').select {|el| el.text == 'Orange'}.first
You could then .click or .send_keys :return to select the option.
