Extract text and ignore next node - xpath

From this:
<span class="postbody">
<span style="color: #8e2fb6">
<span style="font-weight: bold">nickname</span>
</span>
<br>
Example text
<br>
Example text
<br>
<p class="signature">THIS IS WHAT I DO NOT WANT</p>
</span>
I want to extract:
<br>
Example text
<br>
Example text
<br>
I tried span/text()[1], but it does not seem to work: I always get the unwanted p.signature element as well. Is this even possible?

First you need to load your HTML string into an HtmlDocument or HtmlNode (using the document's Load()/LoadHtml() method).
The ChildNodes collection contains every child of the current node (here, every node directly under span.postbody).
After that, just grab the #text and br nodes. Keep in mind that some of the #text nodes contain only whitespace; you may want to filter those out of the result.
// node is the HtmlNode for span.postbody; Where() needs "using System.Linq;"
var wanted = node.ChildNodes.Where(n => n.Name == "#text" || n.Name == "br"); // IEnumerable<HtmlNode>

You can use the jQuery selector for postbody, then the .text() method, which strips the HTML tags. This also drops the <br> tags. Note that .text() returns the text of all descendants, so the nickname and the signature text are included too.
$('.postbody').text();
An alternative would be to iterate through the children of $('.postbody').

'//text()[preceding-sibling::br and normalize-space()]'
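For comparison, here is a minimal Python sketch of the same idea as the XPath above, using only the standard library's xml.etree.ElementTree. It assumes the markup has been made well-formed first (<br/> instead of <br>), and the helper name text_after_br is made up for illustration. ElementTree does not support text() or preceding-sibling, so the condition is spelled out in Python instead.

```python
import xml.etree.ElementTree as ET

# The question's markup, with <br> closed so that it parses as XML (assumption).
HTML = """<span class="postbody">
<span style="color: #8e2fb6">
<span style="font-weight: bold">nickname</span>
</span>
<br/>
Example text
<br/>
Example text
<br/>
<p class="signature">THIS IS WHAT I DO NOT WANT</p>
</span>"""


def text_after_br(parent):
    """Return the non-blank text nodes of `parent` that have a preceding <br> sibling."""
    found = []
    seen_br = False
    for child in parent:
        if child.tag == "br":
            seen_br = True
        # In ElementTree, the text node that follows an element is its `tail`.
        if seen_br and child.tail and child.tail.strip():
            found.append(child.tail.strip())
    return found


root = ET.fromstring(HTML)
print(text_after_br(root))
```

The nickname and the signature are skipped because they are text inside child elements, not text nodes that sit directly under span.postbody after a br.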

Related

How to select by non-direct child condition in Xpath?

I would like to show an example.
This how the page looks:
<a class="aclass">
<div class="divclass"></div>
<div id="innerclass">
<span class="spanclass">Hello</span>
</div>
</a>
<a class="aclass">
<div class="divclass"></div>
<div id="innerclass">
<span class="spanclass">Pick Delivery Location</span>
</div>
</a>
I want to select anchor tags that have a child (direct or non-direct) span that has the text 'Hello'.
Right now, I do something like this:
//a[@class='aclass'][div/span[text() = 'Hello']]
I want to be able to select without having to select direct children (div in this case), like this:
//a[@class='aclass'][//span[text() = 'Hello']]
However, the second one finds all the anchor tags with the class 'aclass' rather than the one with the span with 'Hello' text.
I hope I worded my question clearly. Please feel free to edit if necessary.
In your attempt, // goes back to the root of the document. Effectively you are saying "Give me the a elements for which there is a matching span anywhere in the document", which is why you get them all.
What you need is the descendant axis:
//a[@class='aclass' and descendant::span[text() = 'Hello']]
Note I have joined the conditions with and, but two separate conditions would also work.
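To make the difference between "anywhere in the document" and "anywhere below this a" concrete, here is a rough Python sketch using the standard library's xml.etree.ElementTree. The <body> wrapper and the function name anchors_with_span_text are assumptions added for the example. ElementTree has no descendant:: axis, so a.iter("span") plays that role here.

```python
import xml.etree.ElementTree as ET

# The page from the question, wrapped in a single root element (assumption).
PAGE = """<body>
<a class="aclass">
<div class="divclass"></div>
<div id="innerclass">
<span class="spanclass">Hello</span>
</div>
</a>
<a class="aclass">
<div class="divclass"></div>
<div id="innerclass">
<span class="spanclass">Pick Delivery Location</span>
</div>
</a>
</body>"""


def anchors_with_span_text(root, text):
    """Return <a class="aclass"> elements that have a descendant span with this text."""
    matches = []
    for a in root.iter("a"):
        if a.get("class") != "aclass":
            continue
        # a.iter("span") only walks the subtree of this <a>, like descendant::span.
        if any(span.text == text for span in a.iter("span")):
            matches.append(a)
    return matches


root = ET.fromstring(PAGE)
print(len(anchors_with_span_text(root, "Hello")))
```

Replacing a.iter("span") with root.iter("span") would reproduce the original problem: every a would match, because the search would no longer be relative to the current a.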

XPath: how to find a node that does not contain text?

I have HTML like this:
...
<div class="grid">
"abc"
<span class="searchMatch">def</span>
</div>
<div class="grid">
<span class="searchMatch">def</span>
</div>
...
I want to get the div that does not contain text, but the XPath
//div[@class='grid' and text()='']
doesn't seem to work. And if I don't know the text that the other divs contain, how can I find the node?
Let's suppose I have inferred the requirement correctly as:
Find all <div> elements with @class='grid' that have no directly-contained non-whitespace text content, i.e. no non-whitespace text content unless it's within a child element like a <span>.
Then the answer to this is
//div[@class='grid' and not(text()[normalize-space(.)])]
You need not() combined with normalize-space():
//div[@class='grid' and not(normalize-space(text()))]
or
//div[@class='grid' and normalize-space(text())='']
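The "no directly-contained non-whitespace text" condition can also be written out by hand. Below is a small Python sketch with the standard library's xml.etree.ElementTree. The <root> wrapper and the helper name has_direct_text are assumptions made for the example. In ElementTree, the direct text nodes of an element are its own text plus the tail of each child.

```python
import xml.etree.ElementTree as ET

# The two divs from the question, wrapped in one root element (assumption).
DOC = """<root>
<div class="grid">
"abc"
<span class="searchMatch">def</span>
</div>
<div class="grid">
<span class="searchMatch">def</span>
</div>
</root>"""


def has_direct_text(elem):
    """True if `elem` has a non-whitespace text node as a direct child."""
    # Direct text nodes: the text before the first child, plus each child's tail.
    direct = [elem.text] + [child.tail for child in elem]
    return any(t and t.strip() for t in direct)


root = ET.fromstring(DOC)
empty_grids = [
    div for div in root.iter("div")
    if div.get("class") == "grid" and not has_direct_text(div)
]
print(len(empty_grids))
```

Text inside the span does not count, which is why the second div is still selected even though it contains "def" further down.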

Xpath: select div that contains class AND whose specific child element contains text

With the help of this SO question I have an almost working xpath:
//div[contains(@class, 'measure-tab') and contains(., 'someText')]
However, this matches two divs: in one, the text someText is in a child td; in the other, it is in a child span.
How do I narrow it down to the one with the span?
<div class="measure-tab">
<!-- table html omitted -->
<td> someText</td>
</div>
<div class="measure-tab"> <-- I want to select this div (and use contains @class)
<div>
<span> someText</span> <-- that contains a deeply nested span with this text
</div>
</div>
To find a div of a certain class that contains a span at any depth containing certain text, try:
//div[contains(@class, 'measure-tab') and contains(.//span, 'someText')]
That said, this solution looks extremely fragile. If the table happens to contain a span with the text you're looking for, the div containing the table will be matched, too. I'd suggest finding a more robust way of filtering the elements, for example by using IDs or the top-level document structure.
You can use ancestor. I find that this is easier to read because the element you are actually selecting is at the end of the path.
//span[contains(text(),'someText')]/ancestor::div[contains(@class, 'measure-tab')]
You could use this XPath:
//div[@class="measure-tab" and .//span[contains(., "someText")]]
Input :
<root>
<div class="measure-tab">
<td> someText</td>
</div>
<div class="measure-tab">
<div>
<div2>
<span>someText2</span>
</div2>
</div>
</div>
</root>
Output :
Element='<div class="measure-tab">
<div>
<div2>
<span>someText2</span>
</div2>
</div>
</div>'
You can change your second condition to check only the span element:
...and contains(div/span, 'someText')]
If the span isn't always inside another div you can also use
...and contains(.//span, 'someText')]
This searches for the span anywhere inside the div.
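As a cross-check of the logic, here is a rough Python sketch that applies the same two conditions by hand with the standard library's xml.etree.ElementTree. The <root> wrapper and the function name tabs_with_span_text are assumptions for the example. It mirrors the .//span[contains(., ...)] form, which tests every descendant span, rather than contains(.//span, ...), which in XPath 1.0 only looks at the first one.

```python
import xml.etree.ElementTree as ET

# The markup from the question, wrapped in one root element (assumption).
DOC = """<root>
<div class="measure-tab">
<td> someText</td>
</div>
<div class="measure-tab">
<div>
<span> someText</span>
</div>
</div>
</root>"""


def tabs_with_span_text(root, needle):
    """Return divs whose class contains 'measure-tab' and that have a span containing `needle`."""
    hits = []
    for div in root.iter("div"):
        # Substring test on the attribute, like contains(@class, 'measure-tab').
        if "measure-tab" not in (div.get("class") or ""):
            continue
        # "".join(span.itertext()) is the string value of the span, like contains(., needle).
        if any(needle in "".join(span.itertext()) for span in div.iter("span")):
            hits.append(div)
    return hits


root = ET.fromstring(DOC)
print(len(tabs_with_span_text(root, "someText")))
```

The first div is skipped because its someText lives in a td, not in a span.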

Use Xpath To Retrieve Elements

HTML Portion:
<div class="abc">
<div style="text-align:left;" itemscope itemtype="xyz">
<h1 itemtype="mno"> I want this text </h1>
</div>
</div>
I am using
$text = $xpath->query('//div[class="abc"]/div/h1');
but I am getting no value. Please help me as I am new to it.
You should try
//div[@class="abc"]/div/h1
The difference is the @ sign before class: attributes are addressed with @ (shorthand for the attribute axis). Without the @ sign, the expression looks for a child element named class.
This returns you the whole h1 node (or, rather, a node-set containing all the matching h1 nodes).
If you only want the text of the element, use the evaluate function with string(), which returns a string rather than a node list:
$text = $xpath->evaluate("string(//div[@class='abc']/div/h1)");
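The effect of the missing @ can be reproduced outside PHP as well. This is a small Python sketch using the standard library's xml.etree.ElementTree, which supports this part of XPath. The <body> wrapper is an assumption, and the valueless itemscope attribute is left out because it is not valid XML.

```python
import xml.etree.ElementTree as ET

# The HTML from the question, wrapped in <body> so the outer div is not the root.
PAGE = """<body>
<div class="abc">
<div style="text-align:left;" itemtype="xyz">
<h1 itemtype="mno"> I want this text </h1>
</div>
</div>
</body>"""

root = ET.fromstring(PAGE)

# Without @, the predicate looks for a child element named "class".
without_at = root.findall(".//div[class='abc']/div/h1")
# With @, the predicate tests the class attribute.
with_at = root.findall(".//div[@class='abc']/div/h1")

print(len(without_at))
print([h1.text.strip() for h1 in with_at])
```

The first query finds nothing because no div has a child element called class; the second one returns the h1.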

how to access this element

I am using Watir to write some tests for a web application. I need to get the text 'Bishop' from the HTML below but can't figure out how to do it.
<div id="dnn_ctr353_Main_ctl00_ctl00_ctl00_ctl07_Field_048b9dfa-bc64-42e4-8bd5-b45385e5f45b_view" style="display: block;">
<div class="workprolabel wpFieldLabel">
<span title="Please select a courtesy title from the list.">Title</span> <span class="validationIndicator wpValidationText"></span>
</div>
<span class="wpFieldViewContent" id="dnn_ctr353_Main_ctl00_ctl00_ctl00_ctl07_Field_048b9dfa-bc64-42e4-8bd5-b45385e5f45b_view_value"><p class="wpFieldValue ">Bishop</p></span>
</div>
Firebug tells me the xpath is:
html/body/form/div[5]/div[6]/div[2]/div[2]/div/div/span/span/div[2]/div[4]/div[1]/span[1]/div[2]/span/p/text()
but I can't get element_by_xpath to pick it up.
You should be able to access the paragraph right away if it's unique:
my_p = browser.p(:class, "wpFieldValue ")
my_text = my_p.text
See HTML Elements Supported by Watir
Try
//span[@id='dnn_ctr353_Main_ctl00_ctl00_ctl00_ctl07_Field_048b9dfa-bc64-42e4-8bd5-b45385e5f45b_view_value']//text()
EDIT:
Maybe this will work
path = "//span[@id='dnn_ctr353_Main_ctl00_ctl00_ctl00_ctl07_Field_048b9dfa-bc64-42e4-8bd5-b45385e5f45b_view_value']/p"
ie.element_by_xpath(path).text
Also check whether the span's id stays constant between page loads.
Maybe you have an extra space in the end of the name?
<p class="wpFieldValue ">
Try one of these (worked for me, please notice trailing space after wpFieldValue in the first example):
browser.p(:class => "wpFieldValue ").text
#=> "Bishop"
browser.span(:id => "dnn_ctr353_Main_ctl00_ctl00_ctl00_ctl07_Field_048b9dfa-bc64-42e4-8bd5-b45385e5f45b_view_value").text
#=> "Bishop"
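The trailing-space point applies to XPath in the same way: an attribute comparison with = is an exact string match. Here is a short Python sketch with the standard library's xml.etree.ElementTree that shows this. The ids field_view and field_view_value are shortened, hypothetical stand-ins for the long generated ids in the question.

```python
import xml.etree.ElementTree as ET

# The fragment from the question, with the long ids shortened (assumption).
FRAGMENT = """<div id="field_view" style="display: block;">
<div class="workprolabel wpFieldLabel">
<span title="Please select a courtesy title from the list.">Title</span> <span class="validationIndicator wpValidationText"></span>
</div>
<span class="wpFieldViewContent" id="field_view_value"><p class="wpFieldValue ">Bishop</p></span>
</div>"""

root = ET.fromstring(FRAGMENT)

# Exact match on the class attribute: the trailing space is part of the value.
with_space = root.find(".//p[@class='wpFieldValue ']")
without_space = root.find(".//p[@class='wpFieldValue']")

# A short relative path through the span's id, instead of the absolute Firebug path.
by_id = root.find(".//span[@id='field_view_value']/p")

print(with_space.text)
print(without_space)
print(by_id.text)
```

If the class value might change, going through the id (or a contains()-style test in a full XPath engine) is usually less brittle than matching the exact class string.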
It seems that at run time the div's style changes from none to block.
In that case, we can take the page source (the entire source or just the div's source) and extract the value from it.
For example:
html = ie.html  # page source, not the visible text
particular_div = html[%r{<div id="dnn_ctr353_Main_ctl00_ctl00_ctl00_ctl07_Field_048b9dfa-bc64-42e4-8bd5-b45385e5f45b_view" style="display: block;">(.*)</span>\s*</div>}im, 1].to_s
value = particular_div[%r{<p class="wpFieldValue ">(.*?)</p>}im, 1]  # fragile: depends on the exact markup
The code above is only a sample, but it should solve your problem.

Resources