How to select the second <p> element using XPath

I am trying to scrape the full reviews from this webpage (the full text that appears after clicking the 'Read More' button), using RSelenium. I can select and extract the text of the first <p> element, which holds the shortened review, with the code
reviewNodes <- mybrowser$findElements(using = 'xpath', "//p[@id][1]")
But I am not able to extract the full review text with either
reviewNodes <- mybrowser$findElements(using = 'xpath', "//p[@id][2]")
or
reviewNodes <- mybrowser$findElements(using = 'xpath', "//p[@itemprop = 'reviewBody']")
Both return a list of blank elements. I don't know what is wrong. Please help.

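Drop the double slash and use the explicit descendant axis:
/descendant::p[@id][2]
With //, the position predicate is evaluated relative to each parent, so //p[@id][2] selects every <p> that is the second id-bearing <p> child of its parent, whereas /descendant::p[@id][2] selects the second matching <p> in the whole document.
(see the note from the W3C document on XPath I mentioned in this answer)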
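The difference between the two forms can be sketched outside the browser. This is only an illustration in Python's standard library, not RSelenium code: the markup and ids are made up, and because ElementTree implements only a subset of XPath (it has no descendant axis), the document-order selection is emulated by indexing the list of all matches.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the first block has one <p>, the second has two.
PAGE = """
<html><body>
  <div><p id="a">first</p></div>
  <div><p id="b">second</p><p id="c">third</p></div>
</body></html>
"""
root = ET.fromstring(PAGE)

# Like //p[2]: every <p> that is the second <p> child of its own parent.
per_parent = [p.get("id") for p in root.findall(".//p[2]")]

# Like /descendant::p[2]: the second <p> in document order
# (emulated by indexing, since ElementTree has no descendant axis).
second_overall = root.findall(".//p")[1].get("id")

print(per_parent)      # ['c']
print(second_overall)  # b
```

Here the per-parent form skips the first block entirely because it has no second <p>, while the document-order form returns the second <p> on the page.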

As you're dealing with a list, you should first find the list items, e.g. using the CSS selector
div.srm
Starting from these elements, you can then search inside each list item, e.g. using the CSS selector
p[itemprop='reviewBody']
Of course you can also do it in a single expression, but that is not quite as neat imho:
div.srm p[itemprop='reviewBody']
Or in XPath (which I wouldn't recommend):
//div[@class='srm']//p[@itemprop='reviewBody']
If neither of these works for you, then the problem must be somewhere else.
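The two-step approach (find the list items, then search inside each one) can be sketched as follows. This is a standalone illustration with Python's standard library rather than RSelenium, and the markup and review texts are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical page fragment; the real page structure may differ.
PAGE = """
<html><body>
  <div class="srm">
    <p id="r1">Great phone...</p>
    <p itemprop="reviewBody">Great phone, the battery lasts two days.</p>
  </div>
  <div class="srm">
    <p id="r2">Not bad...</p>
    <p itemprop="reviewBody">Not bad, but the screen scratches easily.</p>
  </div>
</body></html>
"""
root = ET.fromstring(PAGE)

# Step 1: find the list items. Note that [@class='srm'] is an exact match
# on the attribute value, unlike the CSS selector div.srm.
items = root.findall(".//div[@class='srm']")

# Step 2: search inside each item for the full review body.
full_reviews = [item.find(".//p[@itemprop='reviewBody']").text for item in items]

for review in full_reviews:
    print(review)
```

Searching per item keeps each review tied to its own container, which makes it easier to notice an item whose full text is missing.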

Related

How to find the exact element path without using xpath

I'm currently trying to locate this check box. I know I can use a xpath to locate it but I'm trying to see if there's a more efficient way of doing it. The problem I'm seeing is that there are multiple div class with the same name. I'm trying to find this specific one and isolate it. I'm trying to make my code more efficient if possible.
Xpath
/html/body/div/div/div/div[1]/cow-data/cat-panel/section/div[1]/div/div/md-checkbox[4]/div[1]
Element path:
<div class="cd-container" cd-gar-ripple="" cd-gar-ripple-checkbox=""><div class="cd-icon"></div></div>
Code I'm trying to use:
find('cd-container').click
The problem I'm seeing is that the div id 'cd-container' has multiple occurrences on the page and thus this doesn't work. I'm trying to see if I can find a more efficient way of doing this.
According to the HTML, cd-container is the value of the class attribute, not the id attribute, so the selector needs a leading dot. Your effective line of code will be:
find('.cd-container').click
If you want to find an element and then return its XPath, use Capybara.
It lets you locate the element by text or CSS selector and then return the path of that element.
For example:
page.find('td', text: 'Column 1').path # Random td with text
page.find('#main').path # ID
page.all('div').select { |element| element.text == 'COoL dIv' }.first.path # First div that matches certain text
page.find('.form > div:nth-of-type(2)').path # Specific structured div
page.all('p div li:nth-child(3)').sample.path # Random li

Why does IMPORTXML with XPATH return unexpected blank row in addition to expected result?

I'm importing into Google Sheets with IMPORTXML with the following XPATH:
=IMPORTXML(A2;"//*[@id='mw-content-text']/div/table[1]/tbody/tr[4]/td[1]/ul/li")
A2 containing the URL (https://stt.wiki/wiki/20th_Century_Pistol).
From the website I want to import the list entries in the "Basic" column and "Crafted From" row of the table.
There are only two list entries in this section of the table:
"x1 Basic Security Codes" and
"x4 Basic Casing"
Therefore, I expected to get only those two list entries as rows in my sheet.
Instead, I got an additional blank row above those two entries. When I change "td[1]" to "td[3]" in the XPATH query however, there are no extra blanks.
I don't understand where the additional blank row is coming from and how I can avoid it.
Google Sheet with desired and actual result
When I looked at the HTML of the URL, there were 2 li tags in the ul tag, so I think that your xpath is correct. From your issue, I suspected that the sup tag might affect this situation, but I'm not sure whether this is the direct reason. So I would like to propose adding the attribute of li to your xpath as follows.
Modified xpath:
Please modify your xpath as follows.
From:
//*[@id='mw-content-text']/div/table[1]/tbody/tr[4]/td[1]/ul/li
To:
//*[@id='mw-content-text']/div/table[1]/tbody/tr[4]/td[1]/ul/li[@style='white-space:nowrap']
By adding [@style='white-space:nowrap'], only the values of li elements with style='white-space:nowrap' are retrieved.
Result:
The formula is =IMPORTXML(A1;"//*[@id='mw-content-text']/div/table[1]/tbody/tr[4]/td[1]/ul/li[@style='white-space:nowrap']"). Please put the URL in the cell "A1".
Note:
Also, you can use the xpath of //*[@id='mw-content-text']/div/table[1]/tbody/tr[4]/td[1]/ul/li[position()>1].
To complete @Tanaike's very neat answer, another expression:
=IMPORTXML(A2;"//th[contains(.,'Crafted')]/following::td[1]//li[contains(@style,'white')]")
If a blank line is added, it's because Google Sheets parses an additional blank li element containing a @style attribute.
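The effect of the attribute filter can be sketched in Python's standard library. The table cell below is assumed markup, not copied from the wiki page: it simply places an empty li with no style attribute ahead of the two real entries, which is one way an extra blank row could arise.

```python
import xml.etree.ElementTree as ET

# Assumed markup for illustration only; the real cell may differ.
CELL = """
<td>
  <ul>
    <li></li>
    <li style="white-space:nowrap">x1 Basic Security Codes</li>
    <li style="white-space:nowrap">x4 Basic Casing</li>
  </ul>
</td>
"""
cell = ET.fromstring(CELL)

# Without a predicate, the empty <li> is returned as a blank entry.
unfiltered = [(li.text or "") for li in cell.findall("./ul/li")]

# With the style predicate, only the styled entries are returned.
filtered = [li.text for li in cell.findall("./ul/li[@style='white-space:nowrap']")]

print(unfiltered)  # ['', 'x1 Basic Security Codes', 'x4 Basic Casing']
print(filtered)    # ['x1 Basic Security Codes', 'x4 Basic Casing']
```

The predicate matches the attribute value exactly, so it would stop matching if the page changed the style string, for instance by adding a space after the colon.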

How xpath works for tags in tags

I am trying to find the xpath for the first name field on the Facebook page, and I ended up with the following xpath: "//div[1]/div[1]/div[1]/div[1]/input[@class='inputtext _58mg _5dba _2ph-']", which is correct. My question is: there are 9 div tags in total on the page, but the xpath reaches the input through only four div steps. I don't understand how it finds the element with just four divs.
The page is the Facebook home page and the element to find with xpath is the First name input box.
Please help me understand how the above xpath finds the element.
I know there are other ways to build the xpath, but I want to know the reason why this one works.
I hope I have provided complete information for the question; if not, let me know.
Well, it's because your xpath starts with //. In plain English, it says: anywhere in the document, find a DIV whose child is a DIV whose child is a DIV whose child is a DIV whose child is your INPUT. In your case, it does find a chain of DIVs ending in the INPUT described by your xpath.
If you replace that // with a single /, it will start from the root of the document, look for the first DIV there and then try to find your input from that point. It won't be able to find it since, like you said, there are 9 DIVs.
Hope that paints a picture. Let me know if you need more explanation.
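A small sketch of this behaviour, using Python's standard library on invented markup (the ids and nesting are hypothetical and much simpler than the real Facebook page, and the [1] position predicates are left out for brevity):

```python
import xml.etree.ElementTree as ET

# Hypothetical nesting: six divs in total wrap the input.
PAGE = """
<html>
  <body>
    <div id="outer1">
      <div id="outer2">
        <div id="a"><div id="b"><div id="c"><div id="d">
          <input class="inputtext" name="firstname"/>
        </div></div></div></div>
      </div>
    </div>
  </body>
</html>
"""
root = ET.fromstring(PAGE)
body = root.find("body")

# Like a leading //: the div/div/div/div/input chain may start at any depth.
anywhere = body.findall(".//div/div/div/div/input")

# Like a leading /: the chain must start at a direct child of <body>.
anchored = body.findall("./div/div/div/div/input")

print(len(anywhere))  # 1
print(len(anchored))  # 0
```

The relative search succeeds because the chain can begin at the div with id "a"; the anchored search fails because four steps down from the top-level div do not reach the input.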

Watir how to click on nested element

I'm trying to click on "Mr" in the drop-down list. I've tried a combination of things but none of them seem to work.
I've even tried xpath, which is usually reliable, but in this case it's failing.
$browser.element(:xpath, "/html/body/div[1]/div[1]/div[1]/div/div[2]/div[1]/div[2]/div/div[2]/div/div[2]/div[2]/form/div[2]/div/div[2]/div/div/div/div/div/ul/li[2]/a").click
The XPath suggested by Saurabh Gaur can be written in a more readable Watir-like fashion using:
$browser.ul(class: 'dropdown-menu').link(text: 'Mr').click
Note that this assumes that there is only one ul element with class dropdown-menu. If there are multiple, you will need to scope the search to the specific dropdown using an element that likely exists higher in the DOM.
However, given there is likely only one link with text "Mr", you can probably get away with simply:
$browser.link(text: 'Mr').click
Given the link is a dialog that switches from hidden to visible, you may need to also wait:
$browser.link(text: 'Mr').when_present.click
Your XPath is positional, which means it depends on the position of each element. It will stop working if elements change position, for example when some action on the page adds new elements.
After seeing your attached image, I came up with the following XPath:
//ul[contains(@class, 'dropdown-menu')]/descendant::span[contains(.,'Mr')]/parent::a
Try this XPath; it may work for you.

WebDriver select element that has ::before

I have 2 elements that have the same attributes but are shown one at a time on the page (when one is shown, the other disappears). The only difference between the two is that the element which is displayed will have the '::before' selector. Is it possible to use an xpath or css selector to retrieve the element based on its id and whether or not it has ::before?
I would also suggest trying the JavaScript solution.
::after and ::before are pseudo-elements, which allow you to insert content onto a page from CSS (without it needing to be in the HTML). The end result is not actually in the DOM, but it appears on the page as if it were: you can see it, but you can't locate it with xpath, for example (https://css-tricks.com/almanac/selectors/a/after-and-before/).
If possible, I would also suggest giving the two elements different IDs, or, if they are in different places in the DOM, building a more specific xpath based on the elements above or below them and then checking whether the element is displayed.
String script = "return window.getComputedStyle(document.querySelector('.analyzer_search_inner.tooltipstered'),':after').getPropertyValue('content')";
Thread.sleep(3000);
JavascriptExecutor js = (JavascriptExecutor) driver;
String content = (String) js.executeScript(script);
System.out.println(content);

Resources