Xpath expression pulling multiple items despite specifying item with [ ] - xpath

I am trying to write an XPath expression which can return the URL associated with the next page of a search.
The URL that leads to the next page of the search is always the href of the a tag following the tag span class="navCurrentPage". I have been trying to use a following-sibling step to pull the next URL. My search in the Chrome console is:
$x('//span[@class="navCurrentPage"][1]/following-sibling::a/@href[1]')
I thought that by specifying @href[1] I would only get back one URL (thinking the [1] chooses the first element in the list), but instead Chrome (and Scrapy) return four URLs. I don't understand why. Please help me understand how to select the one URL that I am looking for.
Here is the URL where you can find the HTML giving me trouble:
https://www.yachtworld.com/core/listing/cache/searchResults.jsp?cit=true&slim=quick&ybw=&sm=3&searchtype=advancedsearch&Ntk=boatsEN&Ntt=&is=false&man=&hmid=102&ftid=101&enid=0&type=%28Sail%29&fromLength=35&toLength=50&fromYear=1985&toYear=2010&fromPrice=&toPrice=&luom=126&currencyid=100&city=&rid=100&rid=101&rid=104&rid=105&rid=107&rid=108&rid=112&rid=114&rid=115&rid=116&rid=128&rid=130&rid=153&pbsint=&boatsAddedSelected=-1
Thank you for the help.

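Operator precedence: //x[1] means /descendant-or-self::node()/child::x[1], which finds every descendant x that is the first x child of its parent. You want (//x)[1], which finds the first node among all the descendants named x. In your expression, @href[1] selects the first href attribute of every following a element, so either wrap the whole path in parentheses, as in (//span[@class="navCurrentPage"]/following-sibling::a/@href)[1], or pick the nearest sibling with following-sibling::a[1]/@href.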
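
The difference can be checked with a small sketch. It uses Python's standard-library xml.etree.ElementTree, which implements only a subset of XPath and cannot evaluate the parenthesized form, so indexing the result list stands in for (//x)[1]. The document and the element name x are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: two parents, each with two <x> children.
doc = ET.fromstring(
    "<root>"
    "<group><x id='a1'/><x id='a2'/></group>"
    "<group><x id='b1'/><x id='b2'/></group>"
    "</root>"
)

# .//x[1] applies [1] per parent: the first <x> child of every parent.
first_per_parent = [e.get("id") for e in doc.findall(".//x[1]")]

# Stand-in for (//x)[1]: collect every <x> in document order, then take one.
first_overall = doc.findall(".//x")[0].get("id")

print(first_per_parent)  # ['a1', 'b1']
print(first_overall)     # a1
```

The per-parent query returns one node for each parent, which is the same effect that produced four URLs in the question.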

An XPath index applies to every matching node, so if you only want the first item, take the first result in Scrapy:
response.xpath('//span[@class="navCurrentPage"][1]/following-sibling::a/@href[1]').extract_first()

Just add .extract_first() or .get() to fetch the first item.
See the Scrapy documentation.

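I've found this very helpful for making sure the brackets are in the right place:
What is the XPath expression to find only the first occurrence?
Also note that XPath positions start at 1, so the first occurrence is [1]; [0] only applies when you index the returned list in a host language such as Python or JavaScript.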

Related

Is it possible to use Following and preceding in combination in Selenium?

On this page
https://en.wikipedia.org/wiki/Trinity_Seven#Episode_list
I have:
//*[text()='Reception']//preceding::th[contains(@id, 'ep')]//following::i
But it only registers following.
The default FirePath selector is .//*[@id='mw-content-text']/div/table[5]/tbody/tr/td[1]/i, but this kind of selector is known to break quite frequently. I am wondering whether there is a better way of doing this, and I thought this might be one.
Thanks!
:)
- You can see that it's getting stuff under the table which is not what I want :S
Try the XPath below to match the required elements:
//th[contains(@id, 'ep')]/following::i[./following::*[text()='Reception']]
This looks simpler:
//tr[contains(@class, 'vevent')]//i
Don't overcomplicate things. You need the i tag inside each row, so just find the row locator tr[contains(@class, 'vevent')] and get its i.
Another good approach, when you want to check that a parent element contains some specific element but you actually want to find a third element, is the pattern //element[./specific]//child. In your case:
//tr[contains(@class, 'vevent')][./th[contains(@id,'ep')]]//i
That is the i tag inside a row whose header cell has an id containing 'ep'.

getting attribute via xpath query successful in browser, but not in Robot Framework

I have an XPath query that I use to get the height of a certain HTML element. It returns exactly the desired value when I execute it in Chrome via the XPath Helper plugin.
//*/div[@class="BarChart"]/*[name()="svg"]/*[name()="svg"]/*[name()="g"]/*[name()="rect" and @class="bar bar1"]/@height
However, when I use the same query via the Get Element Attribute keyword in Robot Framework
Get Element Attribute    //*/div[@class="BarChart"]/*[name()="svg"]/*[name()="svg"]/*[name()="g"]/*[name()="rect" and @class="bar bar1"]/@height
... then I get an InvalidSelectorException about this XPath.
InvalidSelectorException: Message: u'invalid selector: Unable to locate an
element with the xpath expression `//*/div[@class="BarChart"]/*[name()="svg"]/*
[name()="svg"]/*[name()="g"]/*[name()="rect" and @class="bar bar1"]/`
So Robot Framework or Selenium removed the @ sign and everything after it. I thought it was an escaping problem and added and removed some slashes before the @height, but without success. I also tried to wrap the result of this query in string(), but this was also unsuccessful.
Does anybody have an idea how to prevent my XPath query from getting broken?
It looks like you can't include the attribute axis in the XPath itself when you're using Robot. You need to retrieve the element by XPath, and then specify the attribute name outside that. It seems like the syntax is something like this:
Get Element Attribute    xpath=(//*/div[@class="BarChart"]/*[name()="svg"]/*[name()="svg"]/*[name()="g"]/*[name()="rect" and @class="bar bar1"])@height
or perhaps (I've never used Robot):
Get Element Attribute    xpath=(//*/div[@class="BarChart"]/*[name()="svg"]/*[name()="svg"]/*[name()="g"]/*[name()="rect" and @class="bar bar1"])[1]@height
This documentation says
attribute_locator consists of element locator followed by an @ sign and attribute name, for example "element_id@class".
so I think what I've posted above is on the right track.
You are correct in your observation that the keyword seems to remove everything after the final @. More precisely, it uses the @ to separate the element locator from the attribute name, and does this by splitting the string at that final @ character.
No amount of escaping will solve the problem, as the code isn't doing any parsing at this point. This is the exact code (as of this writing...) that performs that operation:
def _parse_attribute_locator(self, attribute_locator):
    parts = attribute_locator.rpartition('@')
    ...
The simple solution is to drop that trailing slash, so your XPath will look like this:
//*/div[@class="BarChart"]/... and @class="bar bar1"]@height
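
The effect of that split can be reproduced with plain Python. This is only a sketch of the string handling, not the library's code: the function name split_attribute_locator is hypothetical, and the shortened locator stands in for the full expression from the question.

```python
def split_attribute_locator(attribute_locator):
    """Split at the last '@', mirroring the rpartition call quoted in the answer."""
    locator, _, attribute = attribute_locator.rpartition("@")
    return locator, attribute


# With "/@height" the element locator keeps a dangling slash, which is invalid XPath.
with_slash = split_attribute_locator('//*[name()="rect" and @class="bar bar1"]/@height')

# Without the slash the locator is a complete expression and the attribute name is separate.
without_slash = split_attribute_locator('//*[name()="rect" and @class="bar bar1"]@height')

print(with_slash)
print(without_slash)
```

Because the split happens at the last @ only, the @class inside the predicate is left untouched in both cases.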

Xpath Multiple Predicates

I am trying to quickly find a specific node using XPath but it seems my multiple predicates are not working. The div I need has a specific class, but there are 3 others that have it. I want to select the fourth one so I did the following:
//div[@class='myCLass' and 4]
However the "4" is being ignored. Any help? I am new to XPath.
Thanks.
If an XPath query returns a node-set, you can use a positional predicate [N] to pick a certain element of it. In your expression the 4 is combined with and, so it is converted to a boolean (any non-zero number is true) instead of being compared with the position, which is why it appears to be ignored.
Use the following query to access the fourth element that matches the @class='myCLass' predicate:
//div[@class='myCLass'][4]
@WilliamNarmontas's answer is an alternative to the syntax shown above.
Alternatively,
//div[@class='myCLass' and position()=4]
The accepted answer works correctly only if all of the div elements have the same parent. Otherwise use:
(//div[@class='myCLass'])[4]

using xpath in selenium.getText and selenium.click

I have an "Адреса магазинов" ("Store addresses") link on the page and want to store its text, then click the link and verify that the headers of the page I land on contain this text. I tried to find the element by XPath: selenium.getText returns the right result, but selenium.click goes to another link. Where have I made a mistake? Thanks in advance!
String m_1 = selenium.getText("xpath=html/body/div[3]/div[2]/div[1]/h4[1]");
selenium.click("xpath=html/body/div[3]/div[2]/div[1]/h4[1]");
selenium.waitForPageToLoad("30000");
assertTrue(selenium.getText("css=h3").contains(m_1));
Page: http://www.svyaznoy.ru/map/
Summary:
using xpath=//descendant::a[@href='/address_shops/'][2] or css=div.deff_one_column a[href='/address_shops/'] gets the right results
using xpath=//a[@href='/address_shops/'] - Element is not currently visible
using xpath=//a[@href='/address_shops/'][2] - Element not found
There is a missing slash at the beginning of the expression. I am kind of surprised this got through at all; the first slash means "begin at root node".
Also, it is better to select the <a> element instead of the <h4>. Clicking the heading sometimes works, sometimes misclicks, and sometimes does nothing at all. Try to be as specific as you can.
Try this one.
String m1 = selenium.getText("xpath=/html/body/div[3]/div[2]/div/h4/a");
selenium.click("xpath=/html/body/div[3]/div[2]/div/h4/a");
selenium.waitForPageToLoad("30000");
// your variable is named m1, but m_1 was used here
assertTrue(selenium.getText("css=h3").contains(m1));
By the way, there are even better XPath expressions you could use. See the documentation, it really is helpful. As an example, this would work too, and is much easier to write and read:
String m1 = selenium.getText("xpath=//a[@href='/address_shops/']");
selenium.click("xpath=//a[@href='/address_shops/']");
Sorry, I didn't notice the page link. The CSS locator for the second link can be something like css=div.deff_one_column a[href='/address_shops/']

Can't get nth node in Selenium

I try to write xpath expressions so that my tests won't be broken by small design changes. So instead of the expressions that Selenium IDE generates, I write my own.
Here's an issue:
//input[@name='question'][7]
This expression doesn't work at all. Input nodes named 'question' are spread across the page. They're not siblings.
I've tried using intermediate expression, but it also fails.
(//input[@name='question'])[2]
error = Error: Element (//input[@name='question'])[2] not found
That's why I suppose Selenium has a wrong implementation of XPath.
According to the XPath docs, the position predicate must filter by the position in the node-set, so it must find the seventh input with the name 'question'. In Selenium this doesn't work. CSS selectors (:nth-of-kind) don't work either.
I had to write an expression that filters their common parents:
//*[contains(@class, 'question_section')][7]//input[@name='question']
Is this a Selenium-specific issue, or am I reading the spec the wrong way? What can I do to make a shorter expression?
Here's an issue:
//input[@name='question'][7]
This expression doesn't work at all.
This is a FAQ.
[] has a higher priority than //.
The above expression selects every input element with @name = 'question' that is the 7th child of its parent, and apparently the parents of the input elements in the document (which is not shown) don't have that many input children.
Use (note the brackets):
(//input[@name='question'])[7]
This selects the 7th input element in the document that satisfies the conditions in the predicate.
Edit:
People who know Selenium (Dave Hunt) suggest that the above expression is written in Selenium as:
xpath=(//input[@name='question'])[7]
If you want the 7th input in the source whose name attribute has the value question, then try the following:
/descendant::input[@name='question'][7]
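
A rough way to see why the unparenthesized form finds nothing is sketched below. It uses Python's standard-library xml.etree.ElementTree, which supports only a limited XPath subset, so list indexing stands in for the parenthesized expression. The page structure, with one input per section, is an assumption, since the original document is not shown.

```python
import xml.etree.ElementTree as ET

# Assumed layout: eight sections, each holding a single question input.
page = ET.Element("form")
for n in range(1, 9):
    section = ET.SubElement(page, "div", {"class": "question_section"})
    ET.SubElement(section, "input", {"name": "question", "value": f"answer-{n}"})

# [7] is evaluated among siblings, and no section has seven inputs.
per_parent = page.findall(".//input[@name='question'][7]")

# Document-order stand-in for (//input[@name='question'])[7].
seventh = page.findall(".//input[@name='question']")[6]

print(len(per_parent))       # 0
print(seventh.get("value"))  # answer-7
```

Note that the Python list index is 0-based, so [6] corresponds to XPath position 7.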

Resources