XPath in RSelenium for indexing list of values - xpath

Here is an example of html:
<li class="index i1"
<ol id="rem">
<div class="bare">
<h3>
<a class="tlt mhead" href="https://www.myexample.com">
<li class="index i2"
<ol id="rem">
<div class="bare">
<h3>
<a class="tlt mhead" href="https://www.myexample2.com">
I would like to take the value of every href in a element. What makes the list is the class in the first li in which class' name change i1, i2.
So I have a counter and change it when I go to take the value.
i <- 1
stablestr <- "index "
myVal <- paste(stablestr , i, sep="")
so even if try just to access the general lib with myVal index using this
profile<-remDr$findElement(using = 'xpath', "//*/input[#li = myVal]")
profile$highlightElement()
or the href using this
profile<-remDr$findElement(using = 'xpath', "/li[#class=myVal]/ol[#id='rem']/div[#id='bare']/h3/a[#class='tlt']")
profile$highlightElement()
Is there anything wrong with xpath?

Your HTML structure is invalid. Your <li> tags are not closed properly, and it seems you are confusing <ol> with <li>. But for the sake of the question, I assume the structure is as you write, with properly closed <li> tags.
Then, constructing myVal is not right. It will yield "index 1" while you want "index i1". Use "index i" for stablestr.
Now for the XPath:
//*/input[#li = myVal]
This is obviously wrong since there is no input in your XML. Also, you didn't prefix the variable with $. And finally, the * seems to be unnecessary. Try this:
//li[#class = $myVal]
In your second XPath, there are also some errors:
/li[#class=myVal]/ol[#id='rem']/div[#id='bare']/h3/a[#class='tlt']
^ ^ ^
missing $ should be #class is actually 'tlt mhead'
The first two issues are easy to fix. The third one is not. You could use contains(#class, 'tlt'), but that would also match if the class is, e.g., tltt, which is probably not what you want. Anyway, it might suffice for your use-case. Fixed XPath:
/li[#class=$myVal]/ol[#id='rem']/div[#class='bare']/h3/a[contains(#class, 'tlt')]

Related

xPath - Why is this exact text selector not working with the data test id?

I have a block of code like so:
<ul class="open-menu">
<span>
<li data-testid="menu-item" class="menu-item option">
<svg>...</svg>
<div>
<strong>Text Here</strong>
<small>...</small>
</div>
</li>
<li data-testid="menu-item" class="menu-item option">
<svg>...</svg>
<div>
<strong>Text</strong>
<small>...</small>
</div>
</li>
</span>
</ul>
I'm trying to select a menu item based on exact text like so in the dev tools:
$x('.//*[contains(#data-testid, "menu-item") and normalize-space() = "Text"]');
But this doesn't seem to be selecting the element. However, when I do:
$x('.//*[contains(#data-testid, "menu-item")]');
I can see both of the menu items.
UPDATE:
It seems that this works:
$x('.//*[contains(#class, "menu-item") and normalize-space() = "Text"]');
Not sure why using a class in this context works and not a data-testid. How can I get my xpath selector to work with my data-testid?
Why is this exact text selector not working
The fact that both li elements are matched by the XPath expression
if omitting the condition normalize-space() = "Text" is a clue.
normalize-space() returns ... Text Here ... for the first li
in the posted XML and ... Text ... for the second (or some other
content in place of ... from div/svg or div/small) causing
normalize-space() = "Text" to fail.
In an update you say the same condition succeeds. This has nothing to
do with using #class instead of #data-testid; it must be triggered
by some content change.
How can I get my xpath selector to work with my data-testid?
By testing for an exact text match in the li's descendant strong
element,
.//*[#data-testid = "menu-item" and div/strong = "Text"]
which matches the second li. Making the test more robust is usually
in order, e.g.
.//*[contains(#data-testid,"menu-item") and normalize-space(div/strong) = "Text"]
Append /div/small or /descendant::small, for example, to the XPath
expression to extract just the small text.
data-testid="menu-item" is matching both the outer li elements while text content you are looking for is inside the inner strong element.
So, to locate the outer li element based on it's data-testid attribute value and it's inner strong element text value you can use XPath expression like this:
//*[contains(#data-testid, "menu-item") and .//normalize-space() = "Text"]
Or
.//*[contains(#data-testid, "menu-item") and .//*[normalize-space() = "Text"]]
I have tested, both expressions are working correctly

How to select by non-direct child condition in Xpath?

I would like to show an example.
This how the page looks:
<a class="aclass">
<div class="divclass"></div>
<div id="innerclass">
<span class="spanclass">Hello</span>
</div>
</a>
<a class="aclass">
<div class="divclass"></div>
<div id="innerclass">
<span class="spanclass">Pick Delivery Location</span>
</div>
</a>
I want to select anchor tags that have a child (direct or non-direct) span that has the text 'Hello'.
Right now, I do something like this:
//a[#class='aclass'][div/span[text() = 'Hello']]
I want to be able to select without having to select direct children (div in this case), like this:
//a[#class='aclass'][//span[text() = 'Hello']]
However, the second one finds all the anchor tags with the class 'aclass' rather than the one with the span with 'Hello' text.
I hope I worded my question clearly. Please feel free to edit if necessary.
In your attempt, // goes back to the root of the document - effectively you are saying "Give me the as for which there is a span anywhere in the document", which is why you get them all.
What you need is the descendant axis :
//a[#class='aclass' and descendant::span[text() = 'Hello']]
Note I have joined the conditions with and, but two separate conditions would also work.

Get all tags followings a certain with mechanize ? (ruby)

How can I get all elements following once, like :
<div id="exemple">
<h2 class="target">foo</h2>
<p>bla bla</p>
<ul>
<li>bar1</li>
<li>bar2</li>
<li>bar3</li>
</ul>
<h4>baz</h4>
<ul>
<li>lot</li>
</ul>
<div>of</div>
<p>possible</p>
<p>tags</p>
after
</div>
I need to detect <h2 class="target"> and get all tags to the next <h4> and ignore <h4> AND all followings tags (if <h4> not exist, I have to get all tags to the end of parent [here : end of <div>])
The content is dynamic and unpredictable The only rule is : we know there is a target and there is a (or end of element). I need to get all tags beetween both and exclud all others.
With this exemple I need to get the HTML following :
<h2 class="target">foo</h2>
<p>bla bla</p>
<ul>
<li>bar1</li>
<li>bar2</li>
<li>bar3</li>
</ul>
so I can get : target = page.at('#exemple .target')
I know next_sibling method, but how can i test the type of tag of the current node?
I think about something like that to course the node tree :
html = ''
while not target.is_a? 'h4'
html << target.inner_html
target = target.next_sibling
How can I do this?
You can subtract the ones you don't want from your nodeset:
h2 = page.at('h2')
(h2.search('~ *') - h2.search('~ h4','~ h4 ~ *')).each do |el|
# el is not a h4 and does not follow a h4
end
Maybe it makes more sense to use xpath but I can do this without googling.
Your idea of iterating next sibling can work too:
el = page.at('h2 ~ *')
while el && el.name != 'h4'
# do something with el
el = el.at('+ *')
end
Looks like you want to return the h2 element and its following siblings. I'm not clear on whether you want to keep or discard the h4; if you want to keep it the XPath would be:
//h2[#class="target"] | //h2[#class="target"]/following-sibling::*
If you need to exclude the h4:
//h2[#class="target"] | //h2[#class="target"]/following-sibling::*[not(self::h4)]
Edit: If you need to exclude an h4 and anything beyond:
//h2[#class="target"] | //h2[#class="target"]/following-sibling::*[not(self::h4) | not(preceding-sibling::h4)]

Extracting contents from a list split across different divs

Consider the following html
<div id="relevantID">
<div class="column left">
<h1> Section-Header-1 </h1>
<ul>
<li>item1a</li>
<li>item1b</li>
<li>item1c</li>
<li>item1d</li>
</ul>
</div>
<div class="column">
<ul> <!-- Pay attention here -->
<li>item1e</li>
<li>item1f</li>
</ul>
<h1> Section-Header-2 </h1>
<ul>
<li>item2a</li>
<li>item2b</li>
<li>item2c</li>
<li>item2d</li>
</ul>
</div>
<div class="column right">
<h1> Section-Header-3 </h1>
<ul>
<li>item3a</li>
<li>item3b</li>
<li>item3c</li>
<li>item3d</li>
</ul>
</div>
</div>
My objective is to extract the items for each Section headers. However, inconveniently the designer of the webpage decided to break up the data into three columns, adding an additional div (with classes column right etc).
My current method of extraction was using the xpath
for section headers, I use the xpath (get all h1 elements withing a div with given id)
//div[#id="relevantID"]//h1
above returns a list of h1 elements, looping over each element I apply the additional selector, for each matched h1 element, look up the next ul node and retreive all its li nodes.
following-sibling::ul//li
But thanks to the designer's aesthetics, I am failing in the one particular case I've marked in the HTML file. Where the items are split across two different column divs.
I can probably bypass this problem by stripping out the column divs entirely, but I don't think modifying the html to make a selector match is considered good (I haven't seen it needed anywhere in the examples I've browsed so far).
What would be a good way to extract data that has been formatted like this? Full solutions are not neccessary, hints/tips will do. Thanks!
The columns do frustrate use of following-sibling:: and preceding-sibling::, but you could instead use the following:: and preceding:: axis if the columns at least keep the list items in proper document order. (That is indeed the case in your example.)
The following XPath will select all li items, regardless of column, occurring after the "Section-Header-1" h1 and before the "Section-Header-2" h1 header in document order:
//div[#id='relevantID']//li[normalize-space(preceding::h1) = 'Section-Header-1'
and normalize-space(following::h1) = 'Section-Header-2']
Specifically, it selects the following items from your example HTML:
<li>item1a</li>
<li>item1b</li>
<li>item1c</li>
<li>item1d</li>
<li>item1e</li>
<li>item1f</li>
You can combine following-sibling and preceding-sibling to get possible li elements in a div before the h2 and use the union operator |. As example for the second h2:
((//div[#id="relevantID"]//h1)[2]/preceding-sibling::ul//li) |
((//div[#id="relevantID"]//h1)[2]/following-sibling::ul//li)
Result:
<li>item1e</li>
<li>item1f</li>
<li>item2a</li>
<li>item2b</li>
<li>item2c</li>
<li>item2d</li>
As you're already selecting all h1 using //div[#id="relevantID"]//h1 and retrieving all li items for each h1 using as a second step following-sibling::ul//li, you could combine this to following-sibling::ul//li | preceding-sibling::ul//li.

Xpath: match a node only if one sub-node contains special string

First sample:
<ul class="breadcrumbs">
<li>Home</li>
<li>Movies</li>
<li>Thrilling Action</li>
<li><strong>Armageddon</strong></li>
</ul>
Second sample:
<ul class="breadcrumbs">
<li>Home</li>
<li>Food</li>
<li>Sweet rice</li>
<li><strong>Uncle Ben's Boil-In-Bag Rice</strong></li>
</ul>
This is how far I have come:
/html/body//ul[#class='breadcrumbs']/li[2]/a[contains(., 'Movies') or contains(., 'Cool Gadgets')]
Extracts Movies - but I also want it to extract Thrilling Action.
Explained: If the <a>-tag of second <li>-tag contains the strings "Movies" or "Cool Gadgets" I want to extract the <a>-tags of the second and the third <li>-tag.
/html//ul[#class='breadcrumbs']/li[2]/a
/html//ul[#class='breadcrumbs']/li[3]/a
If li[2] dosen't contain "Movies" or "Cool Gadgets", I don't want to extract anything!
If I get it right, you want to match all the <li> tags inside an <ul> if one of the <li> contains a special string. You could use:
//ul[#class="breadcrumbs" and (li[2]/a/text() = "Movies" or li[2]/a/text() = "Cool Gadgets")]/li[position() > 1]/a/text()
Explanation
1) The first part, //ul[#class="breadcrumbs" and (li[2]/a/text() = "Movies" or li[2]/a/text() = "Cool Gadgets")], will check you're in a <ul> tag that fits your needs.
#class="breadcrumbs" does what you might expect, and li[2]/a/text() = "Movies" or li[2]/a/text() = "Cool Gadgets" will return true if your filtering string is present.
Of course, if needed, you can change a/text() = "Movies" into a[contains(text(), "Movies")].
2) Once we know we're in the right place, all we have to do is select the fields you want. This is done by li[position() > 1] which will catch every <li> except the first. Select the text, and you're good to go!
The Document Type Declaration (see DocumentType) associated with this document.
For XML documents without a document type declaration this returns null.
For HTML documents, a DocumentType object may be returned, independently of the presence or absence of document type declaration in the HTML document.
This provides direct access to the DocumentType node, child node of this Document. This node can be set at document creation time and later changed through the use of child nodes manipulation methods, such as Node.insertBefore, or Node.replaceChild.
Note, however, that while some implementations may instantiate different types of Document objects supporting additional features than the "Core", such as "HTML" [DOM Level 2 HTML] , based on the DocumentType specified at creation time, changing it afterwards is very unlikely to result in a change of the features supported.
coolgadgets

Resources