Get all immediate children and nothing deeper - xpath

WebElement body = browser.findElement(By.xpath("//body"));
body.findElement(By.xpath("")); // I want to get all child elements
// inside body, but nothing deeper.
Example document.
<html>
<body>
<div>
</div>
<span>
<table>
</table>
</span>
</body>
</html>
Expected result is div and span. I have no control over the documents and they vary greatly.

("*") gives all the child elements of the context node. So use:
body.findElements(By.xpath("*")); // findElements returns every match; findElement returns only the first

Here's another way to get the direct children of an element:
element.findElements(By.xpath("./*")); // plural: returns all children, not just the first
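
The difference between selecting children and selecting descendants can be checked without a browser. The following is a minimal sketch that uses Python's standard xml.etree.ElementTree instead of Selenium (an assumption made for illustration; the thread itself uses the Java bindings) on the example document from the question.

```python
import xml.etree.ElementTree as ET

# Example document from the question.
DOC = """
<html>
  <body>
    <div>
    </div>
    <span>
      <table>
      </table>
    </span>
  </body>
</html>
""".strip()

root = ET.fromstring(DOC)
body = root.find("body")

# "*" and "./*" both select only the element children of the context node.
children = [el.tag for el in body.findall("*")]
children_dot = [el.tag for el in body.findall("./*")]

# ".//*" walks the whole subtree, so the nested table shows up too.
descendants = [el.tag for el in body.findall(".//*")]

print(children)      # ['div', 'span']
print(descendants)   # ['div', 'span', 'table']
```

The table only appears once the descendant form is used, which matches the behaviour described in the answers.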

/html/body/*
This selects only the immediate child elements of body.
Do remember that if you copy all these selected nodes, you also copy their content. So, if you use copy-of, the table will also end up in the resulting document.
Also, I would recommend reading at least the XPath basics; you ask too many similar questions.

Using the child axis instead of the descendant axis may help someone.

Related

When adding text() to my XPath, the number of results are duplicated. Why?

The following Xpath executed in Chrome's web inspector returns the expected number, 13, of nodes
//*[@id="day1"]//span[contains(@class, 'day-time-clock')]
However, when I add text() to it:
//*[@id="day1"]//span[contains(@class, 'day-time-clock')]/text()
it returns 26 nodes. However, only every other hit actually points somewhere in the source code, the others are just "numb".
The end node looks like this
<span class="medium bold day-time-clock">
09:00
<div class="tooltip-box first-free-tip ">
<div class="tooltip-box-inner">
<span class="fa fa-clock-o"></span>
Some text
</div>
</div>
</span>
The code sample above doesn't show exactly how it looks in the web inspector; there are a couple of empty rows in the text of this node.
Why is this happening? And what can I do about it?
Your span elements have multiple text node children. Some of the text node children contain only whitespace. In your example, the outer span element has one text node child containing "....09:00...." where "...." represents whitespace, plus one text node child immediately following the child div element. (Incidentally, my HTML is rusty, but I didn't think that having a div inside a span was allowed.)
Your second (inner) span element contains no text nodes, so /text() on this should select nothing.
Generally, using /text() in XPath is a bad idea unless you have some very good reason and know exactly what you are doing.
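
To see where the extra hits come from, the text node children of the span can be listed directly. This is a rough sketch using Python's standard xml.dom.minidom on the fragment from the question (treating it as well-formed XML is an assumption; a browser's HTML parser may differ in details).

```python
from xml.dom import minidom

# The span from the question, used as a small well-formed fragment.
FRAGMENT = """<span class="medium bold day-time-clock">
    09:00
    <div class="tooltip-box first-free-tip ">
        <div class="tooltip-box-inner">
            <span class="fa fa-clock-o"></span>
            Some text
        </div>
    </div>
</span>"""

span = minidom.parseString(FRAGMENT).documentElement

# Text nodes that are direct children of the outer span,
# which is what span/text() selects.
text_children = [n for n in span.childNodes if n.nodeType == n.TEXT_NODE]

for node in text_children:
    print(repr(node.data))

# Keep only the text nodes that hold something besides whitespace.
non_blank = [n.data.strip() for n in text_children if n.data.strip()]
print(non_blank)  # ['09:00']
```

Each span contributes two text nodes, one of them whitespace only, which would account for 26 hits from 13 spans. In XPath 1.0 the same filter can be written as text()[normalize-space()].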

How to access second element using relative Xpath

Given this page snippet
<section id="mysection">
<div>
<div>
<div>
<a href="">
<div>first</div>
</a>
</div>
<div>
<a href="">
<div>second</div>
</a>
</div>
</div>
</div>
</section>
I want to access the second a-element using relative Xpath. In FF (and locating with Selenium IDE) this
//section[@id='mysection']//a[1]
works but this does not match
//section[@id='mysection']//a[2]
What is wrong with the second expression?
EDIT: Actually I do not care so much about Selenium IDE (just use it for quick verification). I want to get it going with selenium2library in Robot Framework. Here, the output is:
ValueError: Element locator with prefix '(//section[@id' is not supported
for the suggested solution (//section[@id='mysection']//a)[2]
You can use this. It selects the a descendants of the section and returns the second one. This works with an XSLT processor; I hope it works with Selenium too.
//section[@id='mysection']/descendant::a[2]
Try this way instead:
(//section[@id='mysection']//a)[2]
//a[2] looks for an <a> element that is the second <a> child of its parent. Since each parent <div> contains only one <a> child, your XPath didn't match anything.
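
The positional behaviour can be reproduced with Python's standard xml.etree.ElementTree, whose limited XPath subset applies a[1] and a[2] per parent in the same way. This is only a sketch: ElementTree does not support the parenthesised form, so indexing the Python list stands in for (//a)[2].

```python
import xml.etree.ElementTree as ET

# Page snippet from the question.
SNIPPET = """
<section id="mysection">
  <div>
    <div>
      <div>
        <a href=""><div>first</div></a>
      </div>
      <div>
        <a href=""><div>second</div></a>
      </div>
    </div>
  </div>
</section>
""".strip()

section = ET.fromstring(SNIPPET)

# a[1]: every <a> that is the first <a> child of its own parent -> both links.
first_in_parent = section.findall(".//a[1]")

# a[2]: every <a> that is the second <a> child of its parent -> none here.
second_in_parent = section.findall(".//a[2]")

# Stand-in for (//a)[2]: collect all <a> descendants in document order,
# then index the resulting list.
second_overall = section.findall(".//a")[1]

print(len(first_in_parent), len(second_in_parent))  # 2 0
print(second_overall.find("div").text)              # second
```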
With this:
//section[@id='mysection']//a[1]
you are matching every 'a' element that is the first 'a' within its own context (inside one div, for example), but with this
//section[@id='mysection']//a[2]
you are trying to match every 'a' element that is the second 'a' within its context, and none of your nodes contains more than one 'a' element.
The node whose index you increment should therefore be the parent div of those 'a' tags.
Very simple:
//section[@id='mysection']//a[1] - both elements
This is why the previous answer with parentheses around the whole expression is correct.
//section[@id='mysection']//div[1]/a - only the first element
//section[@id='mysection']//div[2]/a - only the second element
Another way to match each 'a' separately:
//section[@id='mysection']//a[div[text()='first']]
//section[@id='mysection']//a[div[text()='second']]
Another way to reach the second a-element is to start from its child
<div>second</div> (call this the bottom-up approach)
instead of starting from the section element
<section id="mysection"> (call this the top-down approach).
Using the div child of the a-element, the solution looks like this:
//div[.='second']/..

XPath/Scrapy crawling weirdly formatted pages

I've been playing around with Scrapy and I see that knowledge of XPath is vital in order to use Scrapy successfully. I have a webpage I'm trying to gather some information from, where the tags are formatted like this:
<div id = "content">
<h1></h1>
<p></p>
<p></p>
<h1></h1>
<p></p>
<p></p>
Now the heading contains a title and the first 'p' contains data1 and the second 'p' contains data2. This seems like a pretty straightforward task, and if this were always the case I would have no problem, i.e. hsx.select('//*[@id="content"]') etc.
The problem is, sometimes there will only be ONE p tag following a header instead of two.
<div id = "content">
<h1></h1>
<p></p> (a)
<h1></h1>
<p></p> (b)
<p></p> (c)
What I would like is this: if a paragraph tag is missing, I want to store that information as blank data in my list. Right now the lists store the first h1, the first paragraph tag (a), and then the paragraph tag under the second h1 (b).
What it should be doing is storing
title -> h1[0]
data1[0] -> (a)
data2[0] ->[]
I hope that makes sense. I've been looking for a good xpath or scrapy solution to do this but I can't seem to find one. Any helpful tips would be awesome. thanks
Use:
//div[@id='content']
/h1[1]/following-sibling::*
[not(position()>2)][self::p]
This selects at most two elements: the first two following siblings of the first h1 child of any div whose id attribute has the string value "content" (we know there must be just one such div), and only those that are p.
If only the first immediate sibling is a p, then the returned node-list contains only one item.
You can check whether the length of the returned node-list is 1 or 2, and use this to build the control of your processing.
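
The same idea, counting the paragraphs that follow each heading and padding the missing ones, can also be written as a plain loop. Below is a minimal sketch using Python's standard xml.etree.ElementTree rather than Scrapy's selectors; the sample data and the group_sections helper are hypothetical, and it collects p elements up to the next h1 instead of applying the exact XPath above.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data shaped like the second layout in the question.
PAGE = """
<div id="content">
  <h1>Title A</h1>
  <p>a</p>
  <h1>Title B</h1>
  <p>b</p>
  <p>c</p>
</div>
""".strip()


def group_sections(content, max_paragraphs=2):
    """Return one (title, data1, data2) tuple per h1, using '' for missing p's."""
    rows = []
    current = None
    for child in content:  # iterating an Element yields its direct children
        if child.tag == "h1":
            current = [child.text or ""]
            rows.append(current)
        elif child.tag == "p" and current is not None:
            # Ignore any paragraphs beyond max_paragraphs for this heading.
            if len(current) <= max_paragraphs:
                current.append(child.text or "")
    # Pad short rows so every row has a title plus max_paragraphs entries.
    return [tuple(r + [""] * (1 + max_paragraphs - len(r))) for r in rows]


rows = group_sections(ET.fromstring(PAGE))
for row in rows:
    print(row)
# ('Title A', 'a', '')
# ('Title B', 'b', 'c')
```

Because every row has the same length, the title, data1 and data2 lists stay aligned even when a paragraph is missing.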
I think you'd want something like this; I'm not 100% sure, though, and it's untested. Note that the trailing /string('') step is only valid in XPath 2.0 or later.
//h1/following-sibling::*[2][self::p]/text()|//h1[not(following-sibling::*[2][self::p])]/string('')

Using Selenium to click on an element which is in nested <div> <li>

My page looks like the code given below in inspect element mode.
I have series of li tags inside div tags, whose ids are dynamically created while I load the page.
I need to click on Summary, intent, conversion elements.
Can anyone please help me how to do this in selenium RC.
The ids are dynamically generated so I cannot use the id option here. For example : the id yui_3_3_0_1_131676060142810944 is dynamically generated. Using xpath also, I could not click on these elements.
Please let me know if there is a way out. It would be very helpful for me.
The actual inspected source is here if it might help in looking into this.
http://paste.ubuntu.com/696262/
Here is the DOM tree with nested div
<div class="aui-helper-clearfix aui-tree-node-content aui-tree-data-content aui-tree-node- content aui-tree-node-selected aui-tree-expanded" id="aui_3_4_0_1_1005">
<div class="aui-tree-hitarea" id="aui_3_4_0_1_1224">
</div><div class="aui-tree-icon" id="aui_3_4_0_1_1214">
</div><div class="aui-tree-label aui-helper-unselectable" id="aui_3_4_0_1_1218">OSS</div> </div>
Here is the xpath that selects the clickable node (for Selenium)
$x("//div[contains(@class,'aui-tree-node-content') and (contains(.,'OSS'))]//div[contains(@class,'aui-tree-hitarea')]")
The obvious answer is:
selenium.click("link=Summary");
...
selenium.click("link=Intent");
...
selenium.click("link=Conversion");
...
A little less obvious would be:
selenium.click("xpath=//*[@id='reports-subtab-summary']/a");
...
selenium.click("xpath=//*[@id='reports-subtab-intent']/a");
...
selenium.click("xpath=//*[@id='reports-subtab-conversions']/a");
...
which has the advantage that it doesn't depend on page-text that might change (due to language translation, etc.).
You can use a CSS path, for example:
html body#gsr div#searchform.jhp form#tsf div.tsf-p div table tbody tr td table tbody tr td#sftab.lst-td div.lst-d table.lst-t tbody tr td table tbody tr td.gsib_a div input#lst-ib.gsfi

Xpath getting node without node child contents

Hey guys, I couldn't get around this. I have HTML structured as follows:
<div class="review-text">
<div id="reviewerprofile">
<div id="revimg"></div>
<div id="reviewr">marc</div>
<div id="revdate">2011-07-06</div>
</div>
this is an awesome review
</div>
What I am trying to get is just the text "this is an awesome review", but every time I query the node I also get the content of its children. I'm using something like this now: ".//div[@class='review-text']". How do I get just that text? Thank you very much.
You're almost there! Just add /text() at the end of your XPath to get the text node.
An XPath expression such as //div returns a set of nodes, in this case div elements. These are in effect pointers to the original nodes in the original tree; the nodes are still connected to their parents, children, ancestors, and siblings. If you see the children of the div element and don't want them, that's not the fault of the XPath processor, it's the fault of whatever software is processing the results returned by the XPath expression.
You can get the text that's an immediate child of the div element by using /text() as suggested. However, that assumes that you know exactly what you are expecting to find in the HTML page - if "awesome" were in italics, it would give you something different.
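
The caveat about italics can be made concrete. The sketch below uses Python's standard xml.dom.minidom and a hypothetical direct_text helper that mimics ./text() by collecting only the text nodes that are direct children of the div; the variant with the i element is invented for illustration.

```python
from xml.dom import minidom


def direct_text(element):
    """Join the text nodes that are direct children of element (like ./text())."""
    parts = [n.data.strip() for n in element.childNodes
             if n.nodeType == n.TEXT_NODE]
    return " ".join(p for p in parts if p)


# Structure from the question.
PLAIN = """<div class="review-text">
    <div id="reviewerprofile">
        <div id="revimg"></div>
        <div id="reviewr">marc</div>
        <div id="revdate">2011-07-06</div>
    </div>
    this is an awesome review
</div>"""

# Hypothetical variant: one word wrapped in <i>, as the answer warns about.
ITALIC = PLAIN.replace("awesome", "<i>awesome</i>")

plain_text = direct_text(minidom.parseString(PLAIN).documentElement)
italic_text = direct_text(minidom.parseString(ITALIC).documentElement)

print(plain_text)   # this is an awesome review
print(italic_text)  # this is an review
```

In the second case the word inside the i element belongs to a child element, so it is no longer among the direct text nodes of the div.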
