HtmlUnit getByXpath returns null - xpath

I am coding with Groovy, however, I don't believe its a language specific set of questions.
I actually have two questions
First Question
I've run into an issue while using HtmlUnit. It is telling me that what I am trying to grab is null.
The page I'm testing it on is:
http://browse.deviantart.com/resources/applications/psbrushes/?order=9&offset=0#/dbwam4
My code:
client = new WebClient(BrowserVersion.FIREFOX_3)
client.javaScriptEnabled = false
page = client.getPage(url)
//coming up as null
title = page.getByXPath("//html/body/div[4]/div/div[3]/div/div/div/div/div/div/div/div/div/div/h1/a")
println title
This simply prints out: []
Is this because the page uses onclick()? If so, how would I get around that? Enabling javascript creates a mess in my cmd prompt.
Second Question
I am wanting to also get the image but am having trouble because when I attempt to get the XPath (via firebug) it shows up as: //*[#id="gmi-ResViewSizer_img"]
How do I handle that?

First Answer:
/html/body/div[3]/div/div[3]/div/div/div/div/div/div/div/div/div/div/h1/a
Your XPATH was off by one in the predicate filter for the 4th div of the body, it should be the 3rd div. It appears the HTML for the site can/does change from when you had origionally snagged the XPATH using Firebug. You may need to adjust your XPATH to accommodate for potential change and be less sensitive to some differences in document structure.
Maybe something like this:
/html/body//div/h1/a
Second Answer: The XPATH that you listed will work. It may look odd/short(and may not be the most efficient), but // starts at the root node and looks throughout every node in the tree, * matches on any element(to include the img) and the [] predicate filter restricts it to those that have an id attribute who's value equals "gmi-ResViewSizer_img".
There are many other options for XPATHs that could work as well. It will also depend on how often the HTML structure changes. This is one that also works for the page referenced to select that img:
/html/body/div/div/div/div/img[1]

I had the same problem, I solved when I realize iframe tags on page, try call
((HtmlPage)current_page.getFrames()[n].getEnclosedPage()).getElementByXPath(...
where n is the position in frame in iframe collection. It's work for me !!!
Thanks a lot.

Related

How to properly scraping filtered content using XPath Query to Google Sheet?

So, this is about a content from a website which I want to get and put it in my Google Sheets, but I'm having difficulty understanding the class of the content.
target link: https://www.cnbc.com/quotes/?symbol=XAU=
This number is what I want to get from. Picture 1: The part which i want to scrape
And this is what the code looks like in inspector. Picture 2: The code shown in inspector
The target is inside a span attribute but the span attribute looks very difficult to me, so I tried to simplify it using this line of code here =IMPORTXML("https://www.cnbc.com/quotes/?symbol=XAU=","//table[#class='quote-horizontal regular']//tr/td/span")
Picture 3: List is shown when putting the code
After some tries, I am able to get the right target, but it confuse me, Im using this code =IMPORTXML("https://www.cnbc.com/quotes/?symbol=XAU=","//table[#class='quote-horizontal regular']//tr/td/span[#class='last original'][1]")
Picture 4: The right target is shown when the xpath query is more specified
As what you can see in 2nd Picture, 'last original' is not really the full name of the class, when I put the 'last original ng-binding' instead it gave me an error saying imported content is empty
So, correct me if my code is wrong, or accidental worked out somehow because there's another correct way?
How about this answer?
Modified formula 1:
When the name of class is last original and last original ng-binding, how about the following xpath and formula?
=IMPORTXML(A1,"//span[contains(#class,'last original')][1]")
In this case, the URL of https://www.cnbc.com/quotes/?symbol=XAU= is put in the cell "A1".
In this case, //span[contains(#class,'last original')][1] is used as the xpath. The value of span that the name of class includes last original is retrieved. So last original and last original ng-binding can be used.
Modified formula2:
As other xpath, how about the following xpath and formula?
=IMPORTXML(A1,"//meta[#itemprop='price']/#content")
It seems that the value is included in the metadata. So this sample retrieves the value from the metadata.
Reference:
IMPORTXML
To complete #Tanaike's answer, two alternatives :
=IMPORTXML(B2;"//span[#class='year high']")
"Year high" seems always equal to the current stock index value.
Or, with value retrieved from the script element :
=IMPORTXML(B2;"substring-before(substring-after(//script[contains(.,'modApi')],'""last\"":\""'),'\')")
Note : since I'm based in Europe, you need to replace ; with , in the formulas.

Is it possible to use Following and preceding in combination in Selenium?

On this page
https://en.wikipedia.org/wiki/Trinity_Seven#Episode_list
I have:
//*[text()='Reception']//preceding::th[contains(#id, 'ep')]//following::I
But it only registers following.
The default firepath selector is: .//*[#id='mw-content-text']/div/table[5]/tbody/tr/td[1]/I but this kind of selector is known to break quite frequently. Just wondering if there is a better way of doing this and I thought this might be a way.
Thanks!
:)
- You can see that it's getting stuff under the table which is not what I want :S
Try to use below XPath to match required elements:
//th[contains(#id, 'ep')]/following::I[./following::*[text()='Reception']]
This looks more simple
//tr[contains(#class, 'vevent')]//i
Don't overcomplicate things. You need I tag inside each row. So just find row locator tr[contains(#class, 'vevent')] and get it's I
Another good approach in case you want to check that inside of parent element is located some special element, but you want to find some 3rd element is to use such style: //element[./specific]//child , so in your case:
//tr[contains(#class, 'vevent')][./th[contains(#id,'ep')]]//i
so it's I tag inside row that contains #id,'ep' in header

getting attribute via xpath query succesfull in browser, but not in Robot Framework

I have a certain XPATH-query which I use to get the height from a certain HTML-element which returns me perfectly the desired value when I execute it in Chrome via the XPath Helper-plugin.
//*/div[#class="BarChart"]/*[name()="svg"]/*[name()="svg"]/*[name()="g"]/*[name()="rect" and #class="bar bar1"]/#height
However, when I use the same query via the Get Element Attribute-keyword in the Robot Framework
Get Element Attribute//*/div[#class="BarChart"]/*[name()="svg"]/*[name()="svg"]/*[name()="g"]/*[name()="rect" and #class="bar bar1"]/#height
... then I got an InvalidSelectorException about this XPATH.
InvalidSelectorException: Message: u'invalid selector: Unable to locate an
element with the xpath expression `//*/div[#class="BarChart"]/*[name()="svg"]/*
[name()="svg"]/*[name()="g"]/*[name()="rect" and #class="bar bar1"]/`
So, the Robot Framework or Selenium removed the #-sign and everything after it. I thought it was an escape -problem and added and removed some slashes before the #height, but unsuccessful. I also tried to encapsulate the result of this query in the string()-command but this was also unsuccessful.
Does somebody has an idea to prevent my XPATH-query from getting broken?
It looks like you can't include the attribute axis in the XPath itself when you're using Robot. You need to retrieve the element by XPath, and then specify the attribute name outside that. It seems like the syntax is something like this:
Get Element Attribute xpath=(//*/div[#class="BarChart"]/*[name()="svg"]/*[name()="svg"]/*[name()="g"]/*[name()="rect" and #class="bar bar1"])#height
or perhaps (I've never used Robot):
Get Element Attribute xpath=(//*/div[#class="BarChart"]/*[name()="svg"]/*[name()="svg"]/*[name()="g"]/*[name()="rect" and #class="bar bar1"])[1]#height
This documentation says
attribute_locator consists of element locator followed by an # sign and attribute name, for example "element_id#class".
so I think what I've posted above is on the right track.
You are correct in your observation that the keyword seems to removes everything after the final #. More correctly, it uses the # to separate the element locator from the attribute name, and does this by splitting the string at that final # character.
No amount of escaping will solve the problem as the code isn't doing any parsing at this point. This is the exact code (as of this writing...) that performs that operation:
def _parse_attribute_locator(self, attribute_locator):
parts = attribute_locator.rpartition('#')
...
The simple solution is to drop that trailing slash, so your xpath will look like this:
//*/div[#class="BarChart"]/... and #class="bar bar1"]#height`

Make 1 page objects Two Elements ID's to 1 page object Variable

I am using the page object Gem with Watir. During testing I found that I have a field that has the same contents that show in the same location but have separate unique ID's. The difference is before you get to the page.
I tried using Xpaths:
select_list(:selectionSpecial, :xpath => "//select[#id='t_id9' OR #id='t_id7']")
But was met with a script error.
They are static ID's but I want to force them into one variable since that would allow me to use "populate_page_with" feature.
I have a long winded way currently, but I am fishing for a more efficient way that works with the page object Features.
Does anyone know of a way to do this?
Your approach of using xpath can work. The problem is the syntax errors in the xpath selector. It should be:
"//select[#id='t_id9' or #id='t_id7']"
Note:
The start should be a // rather than a \
Using or is case-sensitive; it has to be lower case
There was also a missing closing ' for the first id attribute
Personally, I find css and xpath selectors harder to use. I would go with the id locator with a regex. The following gives the same results, but some will find it easier to read.
select_list(:selectionSpecial, :id => /^t_id(7|9)$/)

using xpath in selenium.get.Text and selenium.click

I have Адреса магазинов on page and want to store text, then click on this link and verify that the page where am I going to contains this text in headers. So I tried to find element by xpath, and selenium.getText get the right result, but selenium.click goes to another link. Where have I made a mistake? Thanks in advance!
String m_1 = selenium.getText("xpath=html/body/div[3]/div[2]/div[1]/h4[1]");
selenium.click("xpath=html/body/div[3]/div[2]/div[1]/h4[1]");
selenium.waitForPageToLoad("30000");
assertTrue(selenium.getText("css=h3").contains(m_1));
page:http://www.svyaznoy.ru/map/
Resume:
using xpath=//descendant::a[#href='/address_shops/'][2] or css=div.deff_one_column a[href='/address_shops/'] get right results
using xpath=//a[#href='/address_shops/'] - Element is not currently visible
xpath=//a[#href='/address_shops/'][2] - Element not found
There is a missing slash at the beginning of the expression. I am kind of surprised this got through at all - the first slash means "begin at root node".
Also, it is better to select the <a> element instead of the <h>. Sometimes it works, sometimes is misclicks, sometimes the click doesn't do anything at all. Try to be as concrete as you can be.
Try this one.
String m1 = selenium.getText("xpath=/html/body/div[3]/div[2]/div/h4/a");
selenium.click("xpath=/html/body/div[3]/div[2]/div/h4/a");
selenium.waitForPageToLoad("30000");
// your variable is named m1, but m_1 was used here
assertTrue(selenium.getText("css=h3").contains(m1));
By the way, there are even better XPath expressions you could use. See the documentation, it really is helpful. Just an example, this would work, too, and is much easier to write and read:
String m1 = selenium.getText("xpath=//a[#href='/address_shops/']");
selenium.click("xpath=//a[#href='/address_shops/']");
Sorry, didn't notice page link. Css for second link can be something like that css=div.deff_one_column a[href='/address_shops/']

Resources