First sample:
<ul class="breadcrumbs">
<li>Home</li>
<li>Movies</li>
<li>Thrilling Action</li>
<li><strong>Armageddon</strong></li>
</ul>
Second sample:
<ul class="breadcrumbs">
<li>Home</li>
<li>Food</li>
<li>Sweet rice</li>
<li><strong>Uncle Ben's Boil-In-Bag Rice</strong></li>
</ul>
This is how far I have come:
/html/body//ul[@class='breadcrumbs']/li[2]/a[contains(., 'Movies') or contains(., 'Cool Gadgets')]
Extracts Movies - but I also want it to extract Thrilling Action.
Explained: If the <a>-tag of second <li>-tag contains the strings "Movies" or "Cool Gadgets" I want to extract the <a>-tags of the second and the third <li>-tag.
/html//ul[@class='breadcrumbs']/li[2]/a
/html//ul[@class='breadcrumbs']/li[3]/a
If li[2] doesn't contain "Movies" or "Cool Gadgets", I don't want to extract anything!
If I get it right, you want to match all the <li> tags inside an <ul> if one of the <li> contains a special string. You could use:
//ul[@class="breadcrumbs" and (li[2]/a/text() = "Movies" or li[2]/a/text() = "Cool Gadgets")]/li[position() > 1]/a/text()
Explanation
1) The first part, //ul[@class="breadcrumbs" and (li[2]/a/text() = "Movies" or li[2]/a/text() = "Cool Gadgets")], will check you're in a <ul> tag that fits your needs.
@class="breadcrumbs" does what you might expect, and li[2]/a/text() = "Movies" or li[2]/a/text() = "Cool Gadgets" will return true if your filtering string is present.
Of course, if needed, you can change a/text() = "Movies" into a[contains(text(), "Movies")].
2) Once we know we're in the right place, all we have to do is select the fields you want. This is done by li[position() > 1] which will catch every <li> except the first. Select the text, and you're good to go!
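The two steps above can be sketched in Python. This is only a sketch under stated assumptions: the sample markup is assumed to wrap each label in an <a> tag (the XPath expects that, although the samples as posted don't show it), and because the standard library's xml.etree.ElementTree supports only a subset of XPath (no "or", no text()), the predicate is evaluated in plain Python. The name breadcrumb_links is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed markup: the first posted sample with the labels wrapped in <a> tags.
SAMPLE = """
<html><body>
<ul class="breadcrumbs">
<li><a>Home</a></li>
<li><a>Movies</a></li>
<li><a>Thrilling Action</a></li>
<li><strong>Armageddon</strong></li>
</ul>
</body></html>
"""


def breadcrumb_links(markup, wanted=("Movies", "Cool Gadgets")):
    """Return the <a> texts of every <li> after the first, but only when the
    second <li> has an <a> whose text is one of `wanted`; otherwise []."""
    root = ET.fromstring(markup)
    results = []
    for ul in root.iterfind(".//ul[@class='breadcrumbs']"):
        items = ul.findall("li")
        # Step 1: the filter, li[2]/a/text() = "Movies" or ... = "Cool Gadgets"
        if len(items) < 2:
            continue
        if not any(a.text in wanted for a in items[1].findall("a")):
            continue
        # Step 2: the selection, li[position() > 1]/a/text()
        for li in items[1:]:
            results.extend(a.text for a in li.findall("a") if a.text)
    return results


print(breadcrumb_links(SAMPLE))
```

With the assumed sample this yields the second and third labels; the last <li> contributes nothing because it holds a <strong> rather than an <a>.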
The Document Type Declaration (see DocumentType) associated with this document.
For XML documents without a document type declaration this returns null.
For HTML documents, a DocumentType object may be returned, independently of the presence or absence of document type declaration in the HTML document.
This provides direct access to the DocumentType node, child node of this Document. This node can be set at document creation time and later changed through the use of child nodes manipulation methods, such as Node.insertBefore, or Node.replaceChild.
Note, however, that while some implementations may instantiate different types of Document objects supporting additional features than the "Core", such as "HTML" [DOM Level 2 HTML] , based on the DocumentType specified at creation time, changing it afterwards is very unlikely to result in a change of the features supported.
I have a block of code like so:
<ul class="open-menu">
<span>
<li data-testid="menu-item" class="menu-item option">
<svg>...</svg>
<div>
<strong>Text Here</strong>
<small>...</small>
</div>
</li>
<li data-testid="menu-item" class="menu-item option">
<svg>...</svg>
<div>
<strong>Text</strong>
<small>...</small>
</div>
</li>
</span>
</ul>
I'm trying to select a menu item based on exact text like so in the dev tools:
$x('.//*[contains(@data-testid, "menu-item") and normalize-space() = "Text"]');
But this doesn't seem to be selecting the element. However, when I do:
$x('.//*[contains(@data-testid, "menu-item")]');
I can see both of the menu items.
UPDATE:
It seems that this works:
$x('.//*[contains(@class, "menu-item") and normalize-space() = "Text"]');
Not sure why using a class in this context works and not a data-testid. How can I get my xpath selector to work with my data-testid?
Why is this exact text selector not working
The fact that both li elements are matched by the XPath expression
if omitting the condition normalize-space() = "Text" is a clue.
normalize-space() returns ... Text Here ... for the first li
in the posted XML and ... Text ... for the second (or some other
content in place of ... from div/svg or div/small) causing
normalize-space() = "Text" to fail.
In an update you say the same condition succeeds. This has nothing to
do with using @class instead of @data-testid; it must be triggered
by some content change.
How can I get my xpath selector to work with my data-testid?
By testing for an exact text match in the li's descendant strong
element,
.//*[@data-testid = "menu-item" and div/strong = "Text"]
which matches the second li. Making the test more robust is usually
in order, e.g.
.//*[contains(@data-testid,"menu-item") and normalize-space(div/strong) = "Text"]
Append /div/small or /descendant::small, for example, to the XPath
expression to extract just the small text.
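The difference between the string value of the whole li and that of its strong descendant can be checked with a short Python sketch. It uses only the standard library; the "..." placeholders from the posted markup are kept as literal text, and normalize_space below only approximates XPath's normalize-space() (Python's split() treats a few more characters as whitespace). The helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# The posted markup, with the "..." placeholders kept as literal text.
MENU = """
<ul class="open-menu">
<span>
<li data-testid="menu-item" class="menu-item option">
<svg>...</svg>
<div>
<strong>Text Here</strong>
<small>...</small>
</div>
</li>
<li data-testid="menu-item" class="menu-item option">
<svg>...</svg>
<div>
<strong>Text</strong>
<small>...</small>
</div>
</li>
</span>
</ul>
"""


def string_value(elem):
    """Concatenate all descendant text, like the XPath string value of a node."""
    return "".join(elem.itertext())


def normalize_space(text):
    """Approximation of XPath normalize-space(): trim and collapse whitespace."""
    return " ".join(text.split())


def strong_text(li):
    """Normalized text of div/strong inside the li, or "" if it is missing."""
    strong = li.find("div/strong")
    return "" if strong is None else normalize_space(string_value(strong))


root = ET.fromstring(MENU)
items = root.findall(".//li[@data-testid='menu-item']")

# What normalize-space() sees when applied to each whole li.
whole = [normalize_space(string_value(li)) for li in items]
# What normalize-space(div/strong) sees.
strong = [strong_text(li) for li in items]
# The li elements selected by the more robust expression.
matches = [li for li in items if strong_text(li) == "Text"]

print(whole)
print(strong)
```

Neither whole-li value equals "Text", while the strong-based test matches only the second li.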
data-testid="menu-item" is matching both the outer li elements while text content you are looking for is inside the inner strong element.
So, to locate the outer li element based on its data-testid attribute value and its inner strong element text value, you can use an XPath expression like this:
//*[contains(@data-testid, "menu-item") and .//normalize-space() = "Text"]
Or
.//*[contains(@data-testid, "menu-item") and .//*[normalize-space() = "Text"]]
I have tested both expressions, and they work correctly.
I am scraping through real estate listings from a certain site that contains multiple pages.
Here, I have summarized a structure nested deep in the DOM. I want to select all list items whose descendants do not have a certain attribute value, such as the one in <div id="nav-ad-container">:
<ul class="photo-cards photo-cards_wow photo-cards_short photo-cards_extra-attribution">
<li>..</li>
<li>..</li>
<li>
<div id="nav-ad-container" class="zsg-aspect-ratio"></div>
</li>
<li>..</li>
<li>..</li>
<li>..</li>
</ul>
However, the attribute's name and value change from page to page.
For example:
@id = 'nav-ad-container' or @class = 'nav-ad-empty'
In general, I want to retrieve the list items that do not contain the name pattern 'nav-ad'.
Things that I've tried with no success (still selects every list item):
xpath + //li[not(contains(@class, 'nav-ad'))]
xpath + //li[not((contains(@class,'nav-ad')) or contains(@id,'nav-ad'))]
Can anyone guide me toward a solution? I feel like I'm pretty close but missing something.
filter by classname of list items or descendants:
//li[not(contains(descendant-or-self::node()/@class,'nav-ad'))]
(not tested)
Try
//li[not(descendant-or-self::node()/@class[contains(.,'nav-ad')])]
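The attempts in the question test the li's own attributes, so they never see the nested div. In XPath 1.0, passing a node-set to contains() also uses only the first node in document order, which is why the predicate form just above (filtering each attribute individually) is the safer one. A rough Python equivalent is sketched below, under assumptions: the sample list is shortened, the item labels are made up, and the check is extended to both class and id (the XPath above only looks at class). The name has_nav_ad is hypothetical.

```python
import xml.etree.ElementTree as ET

# Shortened, assumed version of the listing; the item labels are made up.
LISTING = """
<ul class="photo-cards">
<li>listing 1</li>
<li>listing 2</li>
<li><div id="nav-ad-container" class="zsg-aspect-ratio"></div></li>
<li>listing 3</li>
<li><div class="nav-ad-empty"></div></li>
</ul>
"""


def has_nav_ad(li, pattern="nav-ad", attrs=("class", "id")):
    """True if the li or any descendant has one of `attrs` containing `pattern`.

    li.iter() walks the element itself and all of its descendants, which
    mirrors descendant-or-self::* in the XPath.
    """
    return any(
        pattern in node.get(name, "")
        for node in li.iter()
        for name in attrs
    )


root = ET.fromstring(LISTING)
kept = [li.text for li in root.findall("li") if not has_nav_ad(li)]
print(kept)
```

Both ad variants are dropped here, whether the marker sits in id or in class.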
From this code as below:
<span id="cTDQo7-img" class="z-menu-img"></span> payment
<span id="cTDQo7-img" class="z-menu-img"></span>
"payment"
I would like to build a locator using contains, but the word "payment" appears many times on the page, in items such as payment1, payment2, and payment3.
The id is not unique either.
I tried the expressions below, but none of them work for me.
//a[contains(.,'payment')]
//span[@class='z-menu-img'] [contains(.,'payment')]
//span[@class='z-menu-img'] and [contains(.,'payment')]
//span[@class='z-menu-img'] contains(.,'payment')
Option 1 : Use the other attributes in combination with text
//a[@class='z-menu-cnt z-menu-cnt-img' and normalize-space(.)='payment']
Option 2: Specify the position if you have multiple elements without unique attributes/path
(//a[contains(.,'payment')])[1]
The second XPath identifies the first occurrence of a link containing the text 'payment'. You can change the tag name and index to suit your needs.
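Both options can be illustrated with a small Python sketch. The markup is an assumption: the question only shows the span and the trailing text, so the enclosing <a class="z-menu-cnt z-menu-cnt-img"> is taken from Option 1, and the duplicate ids are left out. It also shows why the span-based attempts find nothing: the span is empty, and "payment" is text that follows it inside the parent.

```python
import xml.etree.ElementTree as ET

# Assumed markup: three menu links whose labels all contain "payment".
MENU = """
<div>
<a class="z-menu-cnt z-menu-cnt-img"><span class="z-menu-img"></span> payment</a>
<a class="z-menu-cnt z-menu-cnt-img"><span class="z-menu-img"></span> payment1</a>
<a class="z-menu-cnt z-menu-cnt-img"><span class="z-menu-img"></span> payment2</a>
</div>
"""


def string_value(elem):
    """Concatenate all descendant text, like the XPath string value of a node."""
    return "".join(elem.itertext())


root = ET.fromstring(MENU)
links = root.findall(".//a")

# The span itself holds no text, so contains(., 'payment') on it is false.
span_values = [string_value(span) for span in root.iter("span")]

# //a[contains(., 'payment')] matches all three links.
containing = [a for a in links if "payment" in string_value(a)]

# Option 1: normalize-space(.) = 'payment' matches only the exact label.
exact = [a for a in links if " ".join(string_value(a).split()) == "payment"]

# Option 2: (//a[contains(., 'payment')])[1] is the first match.
first = containing[0] if containing else None

print(len(containing), len(exact))
```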
Given this XML, what XPath returns all elements whose prop attribute contains Foo (the first three nodes):
<bla>
<a prop="Foo1"/>
<a prop="Foo2"/>
<a prop="3Foo"/>
<a prop="Bar"/>
</bla>
//a[contains(@prop,'Foo')]
Works if I use this XML to get results back.
<bla>
<a prop="Foo1">a</a>
<a prop="Foo2">b</a>
<a prop="3Foo">c</a>
<a prop="Bar">a</a>
</bla>
Edit:
Another thing to note is that while the XPath above will return the correct answer for that particular xml, if you want to guarantee you only get the "a" elements in element "bla", you should as others have mentioned also use
/bla/a[contains(@prop,'Foo')]
By contrast, this searches all "a" elements in the entire XML document, regardless of whether they are nested in a "bla" element:
//a[contains(@prop,'Foo')]
I added this for the sake of thoroughness and in the spirit of stackoverflow. :)
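The difference between the two expressions can be sketched in Python. The standard library's xml.etree.ElementTree does not implement contains(), so the substring test is done in Python here; the extra <nested> element is an assumption added to the sample so the two scopes give different results, and attr_contains is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# The posted XML plus one assumed nested element to show the scope difference.
DOC = """
<bla>
<a prop="Foo1"/>
<a prop="Foo2"/>
<a prop="3Foo"/>
<a prop="Bar"/>
<nested><a prop="Foo4"/></nested>
</bla>
"""


def attr_contains(elements, name, needle):
    """Values of attribute `name` that contain `needle`, in document order."""
    return [e.get(name) for e in elements if needle in e.get(name, "")]


root = ET.fromstring(DOC)

# //a[contains(@prop,'Foo')]: every <a> in the document, at any depth.
anywhere = attr_contains(root.iter("a"), "prop", "Foo")

# /bla/a[contains(@prop,'Foo')]: only <a> elements directly under <bla>.
children = attr_contains(root.findall("a"), "prop", "Foo")

print(anywhere)
print(children)
```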
This XPath will give you all nodes that have attributes containing 'Foo' regardless of node name or attribute name:
//attribute::*[contains(., 'Foo')]/..
Of course, if you're more interested in the contents of the attribute themselves, and not necessarily their parent node, just drop the /..
//attribute::*[contains(., 'Foo')]
descendant-or-self::*[contains(@prop,'Foo')]
Or:
/bla/a[contains(@prop,'Foo')]
Or:
/bla/a[position() <= 3]
Dissected:
descendant-or-self::
The Axis - search through every node underneath and the node itself. It is often better to say this than //. I have encountered some implementations where // means anywhere (descendant or self of the root node). Others use the default axis.
* or /bla/a
The Tag - a wildcard match, and /bla/a is an absolute path.
[contains(@prop,'Foo')] or [position() <= 3]
The condition within [ ]. @prop is shorthand for attribute::prop, as attribute is another search axis. Alternatively you can select the first 3 by using the position() function.
Have you tried something like:
//a[contains(@prop, "Foo")]
I've never used the contains function before but suspect that it should work as advertised...
John C is the closest, but XPath is case sensitive, so the correct XPath would be:
/bla/a[contains(@prop, 'Foo')]
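The case sensitivity is easy to confirm with a short Python sketch, where the substring test stands in for contains(). If a case-insensitive match is wanted, a common XPath 1.0 idiom is to lowercase the attribute first with translate(@prop, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz'); the sketch uses str.lower() for the same purpose. The helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

DOC = """
<bla>
<a prop="Foo1"/>
<a prop="Foo2"/>
<a prop="3Foo"/>
<a prop="Bar"/>
</bla>
"""

root = ET.fromstring(DOC)
elements = root.findall("a")


def contains_exact(name, needle):
    """Case-sensitive substring test, like contains(@name, 'needle')."""
    return [e.get(name) for e in elements if needle in e.get(name, "")]


def contains_ci(name, needle):
    """Case-insensitive variant: lowercase both sides before comparing."""
    needle = needle.lower()
    return [e.get(name) for e in elements if needle in e.get(name, "").lower()]


print(contains_exact("prop", "foo"))  # lowercase needle, case-sensitive
print(contains_exact("prop", "Foo"))
print(contains_ci("prop", "foo"))
```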
If you also need to match the content of the link itself, use text():
//a[contains(@href,"/some_link")][text()="Click here"]
/bla/a[contains(@prop, "foo")]
try this:
//a[contains(@prop,'foo')]
that should work for any "a" tags in the document
For the code above...
//*[contains(@prop,'foo')]
Here is an example of html:
<li class="index i1"
<ol id="rem">
<div class="bare">
<h3>
<a class="tlt mhead" href="https://www.myexample.com">
<li class="index i2"
<ol id="rem">
<div class="bare">
<h3>
<a class="tlt mhead" href="https://www.myexample2.com">
I would like to get the value of every href in the a elements. The list items are distinguished by the class of the outer li, whose name changes: i1, i2.
So I have a counter and change it each time I go to get the value.
i <- 1
stablestr <- "index "
myVal <- paste(stablestr , i, sep="")
So even if I try just to access the general li with the myVal index using this
profile<-remDr$findElement(using = 'xpath', "//*/input[@li = myVal]")
profile$highlightElement()
or the href using this
profile<-remDr$findElement(using = 'xpath', "/li[@class=myVal]/ol[@id='rem']/div[@id='bare']/h3/a[@class='tlt']")
profile$highlightElement()
Is there anything wrong with xpath?
Your HTML structure is invalid. Your <li> tags are not closed properly, and it seems you are confusing <ol> with <li>. But for the sake of the question, I assume the structure is as you write, with properly closed <li> tags.
Then, constructing myVal is not right. It will yield "index 1" while you want "index i1". Use "index i" for stablestr.
Now for the XPath:
//*/input[@li = myVal]
This is obviously wrong since there is no input in your XML. Also, you didn't prefix the variable with $. And finally, the * seems to be unnecessary. Try this:
//li[@class = $myVal]
In your second XPath, there are also some errors:
/li[@class=myVal]/ol[@id='rem']/div[@id='bare']/h3/a[@class='tlt']
           ^                        ^                       ^
           missing $                should be @class        is actually 'tlt mhead'
The first two issues are easy to fix. The third one is not. You could use contains(#class, 'tlt'), but that would also match if the class is, e.g., tltt, which is probably not what you want. Anyway, it might suffice for your use-case. Fixed XPath:
/li[@class=$myVal]/ol[@id='rem']/div[@class='bare']/h3/a[contains(@class, 'tlt')]
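If the driver evaluates the expression as a plain string with no variable bindings (an assumption about the setup, not something the question confirms), $myVal will not be resolved, and one alternative is to splice the counter into the expression before passing it on. The sketch below does that in Python; the same string could be built with paste0 in R. It also swaps the plain contains(@class, 'tlt') for a whole-token test, which avoids the 'tltt' false match mentioned above, and starts the path with // on the assumption that the li is not the document root. The names href_xpath and has_class_token are hypothetical.

```python
def has_class_token(class_attr, token):
    """Whole-token class test, the Python analogue of
    contains(concat(' ', normalize-space(@class), ' '), ' token ')."""
    return token in class_attr.split()


def href_xpath(i):
    """Build the XPath for the i-th list item by splicing the counter in."""
    # Assumption: the class attribute is exactly "index i<N>", as in the sample.
    li_class = f"index i{i}"
    # Pad @class with spaces so 'tlt' only matches as a complete class name.
    token_test = "contains(concat(' ', normalize-space(@class), ' '), ' tlt ')"
    return (
        f"//li[@class='{li_class}']/ol[@id='rem']"
        f"/div[@class='bare']/h3/a[{token_test}]"
    )


print(href_xpath(1))
print(has_class_token("tlt mhead", "tlt"), has_class_token("tltt mhead", "tlt"))
```

The resulting string can then be handed to findElement in place of the literal expression.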