I have Адреса магазинов on page and want to store text, then click on this link and verify that the page where am I going to contains this text in headers. So I tried to find element by xpath, and selenium.getText get the right result, but selenium.click goes to another link. Where have I made a mistake? Thanks in advance!
String m_1 = selenium.getText("xpath=html/body/div[3]/div[2]/div[1]/h4[1]");
selenium.click("xpath=html/body/div[3]/div[2]/div[1]/h4[1]");
selenium.waitForPageToLoad("30000");
assertTrue(selenium.getText("css=h3").contains(m_1));
page:http://www.svyaznoy.ru/map/
Resume:
using xpath=//descendant::a[#href='/address_shops/'][2] or css=div.deff_one_column a[href='/address_shops/'] get right results
using xpath=//a[#href='/address_shops/'] - Element is not currently visible
xpath=//a[#href='/address_shops/'][2] - Element not found
There is a missing slash at the beginning of the expression. I am kind of surprised this got through at all - the first slash means "begin at root node".
Also, it is better to select the <a> element instead of the <h>. Sometimes it works, sometimes is misclicks, sometimes the click doesn't do anything at all. Try to be as concrete as you can be.
Try this one.
String m1 = selenium.getText("xpath=/html/body/div[3]/div[2]/div/h4/a");
selenium.click("xpath=/html/body/div[3]/div[2]/div/h4/a");
selenium.waitForPageToLoad("30000");
// your variable is named m1, but m_1 was used here
assertTrue(selenium.getText("css=h3").contains(m1));
By the way, there are even better XPath expressions you could use. See the documentation, it really is helpful. Just an example, this would work, too, and is much easier to write and read:
String m1 = selenium.getText("xpath=//a[#href='/address_shops/']");
selenium.click("xpath=//a[#href='/address_shops/']");
Sorry, didn't notice page link. Css for second link can be something like that css=div.deff_one_column a[href='/address_shops/']
Related
I am trying to write an XPath expression which can return the URL associated with the next page of a search.
The URL which leads to the next page of the search is always the href in the a tag following the tag span class="navCurrentPage" I have been trying to use a following-sibling term to pull the next URL. My search in the Chrome console is:
$x('//span[#class="navCurrentPage"][1]/following-sibling::a/#href[1]')
I thought by specifying #href[1] I would only get back one URL (thinking the [1] chooses the first element in list), but instead Chrome (and Scrapy) are returning four URLs. I don't understand why. Please help me to understand how to select the one URL that I am looking for.
Here is the URL where you can find the HTML giving me trouble:
https://www.yachtworld.com/core/listing/cache/searchResults.jsp?cit=true&slim=quick&ybw=&sm=3&searchtype=advancedsearch&Ntk=boatsEN&Ntt=&is=false&man=&hmid=102&ftid=101&enid=0&type=%28Sail%29&fromLength=35&toLength=50&fromYear=1985&toYear=2010&fromPrice=&toPrice=&luom=126¤cyid=100&city=&rid=100&rid=101&rid=104&rid=105&rid=107&rid=108&rid=112&rid=114&rid=115&rid=116&rid=128&rid=130&rid=153&pbsint=&boatsAddedSelected=-1
Thank you for the help.
Operator precedence: //x[1] means /descendant-or-self::node()/child::x[1] which finds every descendant x that is the first child of its parent. You want (//x)[1] which finds the first node among all the descendants named x.
xpath index will apply on all matching records, if you want to get only the first item, get the first instance.
$x('//span[#class="navCurrentPage"][1]/following-sibling::a/#href[1]').extract_first()
just add, .extract_first() or .get() to fetch the first item.
see the scrapy documentation here.
I've found this very helpful to make sure you have the bracket in the right place.
What is the XPath expression to find only the first occurrence?
also, the first occurrence may be [0] not [1]
On this page
https://en.wikipedia.org/wiki/Trinity_Seven#Episode_list
I have:
//*[text()='Reception']//preceding::th[contains(#id, 'ep')]//following::I
But it only registers following.
The default firepath selector is: .//*[#id='mw-content-text']/div/table[5]/tbody/tr/td[1]/I but this kind of selector is known to break quite frequently. Just wondering if there is a better way of doing this and I thought this might be a way.
Thanks!
:)
- You can see that it's getting stuff under the table which is not what I want :S
Try to use below XPath to match required elements:
//th[contains(#id, 'ep')]/following::I[./following::*[text()='Reception']]
This looks more simple
//tr[contains(#class, 'vevent')]//i
Don't overcomplicate things. You need I tag inside each row. So just find row locator tr[contains(#class, 'vevent')] and get it's I
Another good approach in case you want to check that inside of parent element is located some special element, but you want to find some 3rd element is to use such style: //element[./specific]//child , so in your case:
//tr[contains(#class, 'vevent')][./th[contains(#id,'ep')]]//i
so it's I tag inside row that contains #id,'ep' in header
My current issue is to find HTML-Tags inside of property values. I thought it would be easy to search with a query like /jcr:root/content/xgermany//*[jcr:contains(., '<strong>')] order by #jcr:score
It looks like there is a problem with the chars < and > because this query finds everything which has strong in it's property. It finds <strong>Some Text</strong> but also This is a strong man.
Also the Query Builder API didn't helped me.
Is there a possibility to solve it with a XPath or SQL Query or do I have to iterate through the whole content?
I don't fully understand why it finds This is a strong man as a result for '<strong>', but it sounds like the unexpected behavior comes from the "simple search-engine syntax" for the second argument to jcr:contains(). Apparently the < > are just being ignored as "meaningless" punctuation.
You could try quoting the search term:
/jcr:root/content/xgermany//*[jcr:contains(., '"<strong>"')]
though you may have to tweak that if your whole XPath expression is enclosed in double quotes.
Of course this will not be very robust even if it works, since you're trying to find HTML elements by searching for fixed strings, instead of actually parsing the HTML.
If you have an specific jcr:primaryType and the targeted properties you can do something like this
select * from nt:unstructured where text like '%<strong>%'
I tested it , but you need to know the properties you are intererested in.
This is jcr-sql syntax
Start using predicates like a champ this way all of this will make sense to you!
HTML Encode <strong>
HTML Decimal <strong>
Query builder is your friend:
Predicates: (like a CHAMP!)
path=/content/geometrixx
type=nt:unstructured
property=text
property.operation=like
property.value=%<strong>%
Have go here:
http://localhost:4502/libs/cq/search/content/querydebug.html?charset=UTF-8&query=path%3D%2Fcontent%2Fgeometrixx%0D%0Atype%3Dnt%3Aunstructured%0D%0Aproperty%3Dtext%0D%0Aproperty.operation%3Dlike%0D%0Aproperty.value%3D%25%3Cstrong%3E%25
Predicates: (like a CHAMP!)
path=/content/geometrixx
type=nt:unstructured
property=text
property.operation=like
property.value=%<strong>%
Have a go here:
http://localhost:4502/libs/cq/search/content/querydebug.html?charset=UTF-8&query=path%3D%2Fcontent%2Fgeometrixx%0D%0Atype%3Dnt%3Aunstructured%0D%0Aproperty%3Dtext%0D%0Aproperty.operation%3Dlike%0D%0Aproperty.value%3D%25%26lt%3Bstrong%26gt%3B%25
XPath:
/jcr:root/content/geometrixx//element(*, nt:unstructured)
[
jcr:like(#text, '%<strong>%')
]
SQL2 (already covered... NASTY YUK..)
SELECT * FROM [nt:unstructured] AS s WHERE ISDESCENDANTNODE([/content/geometrixx]) and text like '%<strong>%'
Although I'm sure it's entirely possible with a string of predicates, it's possibly heading down the wrong route. Ideally it would be better to parse the HTML when it is stored or published.
The required information would be stored on simple properties on the node in question. The query will then be a lot simpler with just a property = value query, than lots of overly complex query syntax.
It will probably be faster too.
So if you read in your HTML with something like HTMLClient and then parse it with a OSGI service, that can accurately save these properties for you. Every time the HTML is changed the process would update these properties as necessary. Just some thoughts if your SQL is getting too much.
I have a certain XPATH-query which I use to get the height from a certain HTML-element which returns me perfectly the desired value when I execute it in Chrome via the XPath Helper-plugin.
//*/div[#class="BarChart"]/*[name()="svg"]/*[name()="svg"]/*[name()="g"]/*[name()="rect" and #class="bar bar1"]/#height
However, when I use the same query via the Get Element Attribute-keyword in the Robot Framework
Get Element Attribute//*/div[#class="BarChart"]/*[name()="svg"]/*[name()="svg"]/*[name()="g"]/*[name()="rect" and #class="bar bar1"]/#height
... then I got an InvalidSelectorException about this XPATH.
InvalidSelectorException: Message: u'invalid selector: Unable to locate an
element with the xpath expression `//*/div[#class="BarChart"]/*[name()="svg"]/*
[name()="svg"]/*[name()="g"]/*[name()="rect" and #class="bar bar1"]/`
So, the Robot Framework or Selenium removed the #-sign and everything after it. I thought it was an escape -problem and added and removed some slashes before the #height, but unsuccessful. I also tried to encapsulate the result of this query in the string()-command but this was also unsuccessful.
Does somebody has an idea to prevent my XPATH-query from getting broken?
It looks like you can't include the attribute axis in the XPath itself when you're using Robot. You need to retrieve the element by XPath, and then specify the attribute name outside that. It seems like the syntax is something like this:
Get Element Attribute xpath=(//*/div[#class="BarChart"]/*[name()="svg"]/*[name()="svg"]/*[name()="g"]/*[name()="rect" and #class="bar bar1"])#height
or perhaps (I've never used Robot):
Get Element Attribute xpath=(//*/div[#class="BarChart"]/*[name()="svg"]/*[name()="svg"]/*[name()="g"]/*[name()="rect" and #class="bar bar1"])[1]#height
This documentation says
attribute_locator consists of element locator followed by an # sign and attribute name, for example "element_id#class".
so I think what I've posted above is on the right track.
You are correct in your observation that the keyword seems to removes everything after the final #. More correctly, it uses the # to separate the element locator from the attribute name, and does this by splitting the string at that final # character.
No amount of escaping will solve the problem as the code isn't doing any parsing at this point. This is the exact code (as of this writing...) that performs that operation:
def _parse_attribute_locator(self, attribute_locator):
parts = attribute_locator.rpartition('#')
...
The simple solution is to drop that trailing slash, so your xpath will look like this:
//*/div[#class="BarChart"]/... and #class="bar bar1"]#height`
I am coding with Groovy, however, I don't believe its a language specific set of questions.
I actually have two questions
First Question
I've run into an issue while using HtmlUnit. It is telling me that what I am trying to grab is null.
The page I'm testing it on is:
http://browse.deviantart.com/resources/applications/psbrushes/?order=9&offset=0#/dbwam4
My code:
client = new WebClient(BrowserVersion.FIREFOX_3)
client.javaScriptEnabled = false
page = client.getPage(url)
//coming up as null
title = page.getByXPath("//html/body/div[4]/div/div[3]/div/div/div/div/div/div/div/div/div/div/h1/a")
println title
This simply prints out: []
Is this because the page uses onclick()? If so, how would I get around that? Enabling javascript creates a mess in my cmd prompt.
Second Question
I am wanting to also get the image but am having trouble because when I attempt to get the XPath (via firebug) it shows up as: //*[#id="gmi-ResViewSizer_img"]
How do I handle that?
First Answer:
/html/body/div[3]/div/div[3]/div/div/div/div/div/div/div/div/div/div/h1/a
Your XPATH was off by one in the predicate filter for the 4th div of the body, it should be the 3rd div. It appears the HTML for the site can/does change from when you had origionally snagged the XPATH using Firebug. You may need to adjust your XPATH to accommodate for potential change and be less sensitive to some differences in document structure.
Maybe something like this:
/html/body//div/h1/a
Second Answer: The XPATH that you listed will work. It may look odd/short(and may not be the most efficient), but // starts at the root node and looks throughout every node in the tree, * matches on any element(to include the img) and the [] predicate filter restricts it to those that have an id attribute who's value equals "gmi-ResViewSizer_img".
There are many other options for XPATHs that could work as well. It will also depend on how often the HTML structure changes. This is one that also works for the page referenced to select that img:
/html/body/div/div/div/div/img[1]
I had the same problem, I solved when I realize iframe tags on page, try call
((HtmlPage)current_page.getFrames()[n].getEnclosedPage()).getElementByXPath(...
where n is the position in frame in iframe collection. It's work for me !!!
Thanks a lot.