I am new to XPath. I have the HTML source of the web page
http://london.craigslist.co.uk/com/1233708939.html
From that page I want to extract the following data:
Full Date
Email - just below the date
I also want to check whether the button "Reply to this post" exists on the page
http://sfbay.craigslist.org/sfc/w4w/1391399758.html
Can anyone help me write the three XPath expressions for these three items?
You don't need to write these yourself, or even figure them out yourself. If you use the Firebug plugin, go to the page, right-click the element you want, click 'Inspect element', and Firebug will pop up the HTML in a viewer at the bottom of your browser. Right-click the desired element in the HTML viewer and click 'Copy XPath'.
That said, the XPath expression you're looking for (for #3) is:
/html/body/div[4]/form/button
...obtained via the method described above.
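Tools like "Copy XPath" produce an absolute, position-based path like the one above. As a rough sketch of how such a path can be computed (not Firebug's actual algorithm), here is a Ruby helper using REXML, the XML library that ships with Ruby. The function name `absolute_xpath` and the sample markup are hypothetical, and REXML only accepts well-formed markup.

```ruby
require "rexml/document"

# Hypothetical helper: builds an absolute, position-based path such as
# /html/body/div[2]/form/button. A rough approximation of "Copy XPath"
# output, not Firebug's actual algorithm.
def absolute_xpath(element)
  steps = []
  node = element
  # REXML::Document is itself a subclass of REXML::Element, so stop there.
  until node.is_a?(REXML::Document)
    # Count only siblings with the same tag name; XPath positions start at 1.
    siblings = node.parent.elements.to_a(node.name)
    position = siblings.index { |e| e.equal?(node) } + 1
    steps.unshift(siblings.size > 1 ? "#{node.name}[#{position}]" : node.name)
    node = node.parent
  end
  "/" + steps.join("/")
end

# Hypothetical sample page (well-formed, so REXML can parse it).
doc = REXML::Document.new(<<~XML)
  <html><body>
    <div>header</div>
    <div><form><button>Reply to this post</button></form></div>
  </body></html>
XML

button = REXML::XPath.first(doc, "//button")
puts absolute_xpath(button) # => /html/body/div[2]/form/button
```

Paths like this are exact but brittle: inserting one more div before the target shifts every index after it.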
I noticed that the DTD of the first link is HTML 4.01 Transitional, not XHTML, so there's no guarantee that it is a well-formed XML document, and an XML parser may fail to load it. In fact, I see several tags that aren't closed (e.g. <hr>).
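To see why this matters, here is a minimal sketch with REXML, Ruby's bundled XML parser, which (like any strict XML parser) rejects an unclosed HTML 4 `<hr>`. The helper name `parses_as_xml?` and both snippets are hypothetical.

```ruby
require "rexml/document"

# Hypothetical snippets: HTML 4 allows an unclosed <hr>; XML requires <hr/>.
html4 = "<html><body><hr><p>Reply to this post</p></body></html>"
xhtml = "<html><body><hr/><p>Reply to this post</p></body></html>"

# Returns true if REXML (a strict XML parser) accepts the markup.
def parses_as_xml?(source)
  REXML::Document.new(source)
  true
rescue REXML::ParseException
  false
end

puts parses_as_xml?(html4) # => false (the <hr> is never closed)
puts parses_as_xml?(xhtml) # => true
```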
I don't know the first one offhand, and the third one was just answered by Alex, but the second one is /html/body/a[1] (XPath positions start at 1, so a[0] would select nothing).
As for your first page, it simply can't be done, because that is not how XPath works. For an XPath expression to select something, that "something" must be a whole node (such as an element or a text node), not part of one.
The second page is fairly easy, but you need an "id" attribute to do it (or anything else that makes sure your button is unique). For example, if you are sure the text "Reply to this post" uniquely identifies the button, just use:
//button[text()='Reply to this post']
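A bare string in a predicate, as in //button["Reply to this post"], does not compare anything: XPath converts the non-empty string to true, so every button matches. A minimal sketch with Ruby's REXML on hypothetical markup (not the real craigslist page) shows the difference from an explicit text comparison:

```ruby
require "rexml/document"

# Hypothetical markup standing in for the craigslist page.
doc = REXML::Document.new(<<~XML)
  <html><body>
    <form><button>Reply to this post</button></form>
    <form><button>Flag this post</button></form>
  </body></html>
XML

# A bare string literal in a predicate is converted to boolean true
# (any non-empty string is true), so this matches EVERY button.
loose = REXML::XPath.match(doc, '//button["Reply to this post"]')

# Comparing the button's text node with the string matches only the intended one.
exact = REXML::XPath.match(doc, "//button[text()='Reply to this post']")

puts loose.size # => 2
puts exact.size # => 1
```

To test whether the button exists, check whether the exact expression returns any node.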
I'm trying to click on "Mr" in a drop-down list. I've tried a combination of things, but none of them seem to work.
I've even tried an XPath, which is usually reliable, but in this case it's failing.
$browser.element(:xpath, "/html/body/div[1]/div[1]/div[1]/div/div[2]/div[1]/div[2]/div/div[2]/div/div[2]/div[2]/form/div[2]/div/div[2]/div/div/div/div/div/ul/li[2]/a").click
The XPath suggested by Saurabh Gaur can be written in a more readable, Watir-like fashion using:
$browser.ul(class: 'dropdown-menu').link(text: 'Mr').click
Note that this assumes that there is only one ul element with class dropdown-menu. If there are multiple, you will need to scope the search to the specific dropdown using an element that likely exists higher in the DOM.
However, given there is likely only one link with text "Mr", you can probably get away with simply:
$browser.link(text: 'Mr').click
Since the link is in a dropdown that switches from hidden to visible, you may also need to wait:
$browser.link(text: 'Mr').when_present.click
Your XPath is positional, i.e. it depends on each element's position, so it will stop working if elements change position, for example when some action on the page adds new elements.
After looking at your attached image, I came up with the following XPath:
//ul[contains(@class, 'dropdown-menu')]/descendant::span[contains(.,'Mr')]/parent::a
Try this XPath; it may work. :)
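One caveat: contains(., 'Mr') is a substring test, so in a title dropdown that also offers "Mrs" it matches that link too. A minimal sketch with Ruby's REXML on hypothetical dropdown markup (the real page's structure is only known from the screenshot) compares it with an exact text test:

```ruby
require "rexml/document"

# Hypothetical dropdown markup; the class list mimics a typical Bootstrap menu.
doc = REXML::Document.new(<<~XML)
  <div><ul class="btn dropdown-menu">
    <li><a><span>Mr</span></a></li>
    <li><a><span>Mrs</span></a></li>
    <li><a><span>Ms</span></a></li>
  </ul></div>
XML

# The suggested expression: "Mrs" also contains "Mr", so two links match.
loose = REXML::XPath.match(doc,
  "//ul[contains(@class, 'dropdown-menu')]/descendant::span[contains(., 'Mr')]/parent::a")

# Testing the span's text for equality picks only the "Mr" link.
exact = REXML::XPath.match(doc,
  "//ul[contains(@class, 'dropdown-menu')]//span[text()='Mr']/parent::a")

puts loose.size # => 2
puts exact.size # => 1
```

Using contains() on @class is still the right choice, because the class attribute usually holds several space-separated names.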
I have the following HTML:
<input type="button" value="Close List" class="tiny round success button" id="btnSaveCloseListPanel">
The following code does not work:
# browser.button(:value => "Close List").click # does not work - timeout
browser.button(:xpath => "/html/body/center/div/div[9]/div[2]/input[2]").when_present.click
The error is:
Watir::Wait::TimeoutError:
timed out after 60 seconds
when_present(300) does not work.
I found the XPath using Firefox Developer Tools. I used the complete path to avoid any silly errors. I can find the same path manually in IE.
The component is a .NET MVC popup. I think it's called a "panel". The panel is a grandchild of the Internet Explorer tab.
The panel contains a datepicker, a dropdown, a text box, and 3 buttons. I can't find any of these using Watir. I can find anything in the panel's parent (obviously).
The underlying code does not seem to be aware that something actually doesn't exist. To prove that, I tested the following XPath, which is simply the above XPath with the middle bit removed:
browser.button(:xpath => "/html/body/center/div/input[2]").when_present.click
The error is "timeout", rather than "doesn't exist".
So, the code seems to be unaware that:
input[1] does not exist, therefore input[2] cannot exist.
div[2] does not exist.
Therefore there's nothing left to search.
Added:
I'm changing the specific element that I want to find.
Reason: The button in my OP was at the foot of the panel. I was going cross-eyed trying to step upwards through hundreds of lines of HTML. Instead, I'm now using the first field in the panel. All the previous info is still the same.
The first field is a text field with datepicker.
The HTML is:
<input type="text" value="" style="width:82px!important;" readonly="readonly" name="ListDateClosed" id="ListDateClosed" class="hasDatepicker">
Using F12 in Firefox, the XPath is:
/html/body/center/div/div[1]/div[2]/input
But now, with far fewer lines of HTML, I can clearly see that this html tag is not the topmost html tag in the file. The parent of html is an iframe.
I've never used iframes before. Maybe this is what t0mppa was referring to in his comment on the first question.
As an experiment, I modified my XPath to:
browser.text_field(:xpath, '//iframe/html/body/center/div/div[1]/div[2]/input').when_present.set("01-Aug-2014")
But this times out, even with a 3-minute timeout.
Given that the elements are in an iframe, there are two things to note:
Unlike with other elements, you must always tell Watir when an element is inside an iframe.
XPaths (in the context of Watir) cannot be used to cross into frames.
Assuming that there is only 1 iframe on the page, you can explicitly tell Watir to search the first iframe by using the iframe method:
browser.iframe.text_field(:xpath, '//body/center/div/div[1]/div[2]/input').when_present.set("01-Aug-2014")
If there are multiple iframes, you can use the usual locators to be more specific about which iframe. For example, if the iframe had an id:
browser.iframe(id: 'iframe_id')
.text_field(xpath: '//body/center/div/div[1]/div[2]/input')
.when_present
.set("01-Aug-2014")
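Why '//iframe/html/...' finds nothing: the framed page is a separate document, so the iframe element in the outer DOM has no html child. Watir can't be run here, so as a simplified model (an assumption, not how Watir works internally) this sketch uses Ruby's REXML with the outer page and the framed page as two hypothetical documents:

```ruby
require "rexml/document"

# Hypothetical outer page: it contains the <iframe> element, but NOT the
# framed page's <html>, which lives in a separate document.
outer = REXML::Document.new(<<~XML)
  <html><body><iframe id="iframe_id" src="panel.html"/></body></html>
XML

# Hypothetical framed page, modelled as its own document.
inner = REXML::Document.new(<<~XML)
  <html><body><center><div>
    <div><div>label</div><div><input name="ListDateClosed" id="ListDateClosed"/></div></div>
  </div></center></body></html>
XML

# An XPath evaluated against the outer document cannot descend into the frame.
crossing = REXML::XPath.match(outer, "//iframe/html/body/center/div/div[1]/div[2]/input")
puts crossing.size # => 0

# The same path, evaluated against the framed document itself, finds the field.
field = REXML::XPath.first(inner, "//body/center/div/div[1]/div[2]/input")
puts field.attributes["id"] # => ListDateClosed
```

This is the same split that browser.iframe makes explicit: first select the frame, then search inside its document.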
I'm using the Firefox DevTools and I'm using the Inspector tab, in which it displays the HTML tree.
When I use the Search HTML feature, it only searches tags. So suppose I have this:
<div class="lol">textinsidediv</div>
And when I search for 'div', it returns the <div> accordingly. However, if I search for 'textinside' it doesn't match the text inside the content despite the fact that it's starting right there.
My question: How can I search for any arbitrary string within this HTML tree?
(By contrast, Firebug performs a plain text search, as expected.)
The search in the Inspector panel of the Firefox DevTools has supported searching text content since Firefox 45 (see bug 835896).
By the way, since Firebug 2.0 you can also search in the HTML panel using CSS selectors (in addition to the plain text search).
It is not a useless tag-name search; it actually searches by CSS selector (the same selectors you use in CSS, in querySelector in JavaScript, or in jQuery).
So you can search for #id, go through all elements of a certain class by searching for .class, and even find all elements whose attribute contains some text: for example, [class*="o"] should give all elements with the letter o in their class attribute. This is helpful for what designers and developers usually want to find. To find text, press Ctrl+F in the page, then right-click the match and choose Inspect Element.
A good idea is to use Copy Inner HTML, but even better is Edit As HTML. That brings up an in-place panel displaying the full text, and it can be searched with Ctrl+F or Cmd+F.
Ctrl+G to find next, Ctrl+Shift+G to find previous.
Workaround: in the inspector, right-click the outermost tag, then click 'Copy Inner HTML'. Paste in a word processor and search there.
This is now fixed as of Firefox 45.
https://developer.mozilla.org/en-US/docs/Tools/Page_Inspector/How_to/Examine_and_edit_HTML#Searching
See answer.
As an alternative to searching within dev tools...
Right-click anywhere on the page
Select View Source
In the source view you can easily perform a text search by pressing Ctrl+F (Cmd+F on a Mac).
I'm having trouble with my XPaths in the Firefox plugin. I have three text boxes; the first one has ID=login and the others have dynamically generated IDs. For the first one, //input[@id='login'] works fine in the plugin, but as soon as I try something more advanced, it can't find anything. After reading plenty of forum posts, I tried the XPather plugin to generate the XPath expressions, but its long strings full of HTML tags don't work either. In some threads people write "xpath=//...", and I tried that too, with no result.
Thanks for any help!
The two tools I use are Firebug and XPath Checker, both Firefox add-ons.
In Firebug, select the HTML tab, right click on the element and select "Copy XPath"
For XPath Checker, right click on the element in the page and select "View XPath".
You can check this by pasting the result into the Target field of Selenium IDE and then clicking the "Find" button. This tells you what Selenium will find (much faster than having to run the test over and over!)
I've found that you have to "massage" the result somewhat e.g.
If you go to the Google home page and look for the "Images" link at the top left, the XPath Checker gives the image XPath as:
//id('gbar')/x:nobr/x:a[1]
This throws an error:
[error] locator not found: //id('gbar')/x:nobr/x:a[1], error = Error: Invalid xpath [2]: //id('gbar')/x:nobr/x:a[1]
If you remove the id() step, e.g.
//x:nobr/x:a[1]
you'll notice that the Images link gets a green box around it, showing that the IDE has parsed it correctly.
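The error arises because id() is a function, and in XPath 1.0 a function call may only start a path, so "//id(...)" is a syntax error. Dropping it also drops the anchor on "gbar". An alternative that keeps the anchor is an attribute predicate; a sketch with Ruby's REXML on hypothetical, simplified markup (not Google's real page, and without the x: namespace prefix):

```ruby
require "rexml/document"

# Hypothetical, simplified markup modelled on the old Google bar.
doc = REXML::Document.new(<<~XML)
  <html><body>
    <div id="gbar"><nobr><a href="/images">Images</a><a href="/maps">Maps</a></nobr></div>
    <div id="other"><nobr><a href="/x">Other</a></nobr></div>
  </body></html>
XML

# Expressing the id as an attribute predicate keeps the anchor.
link = REXML::XPath.first(doc, "//*[@id='gbar']/nobr/a[1]")
puts link.text # => Images

# Without the anchor, the first link of EVERY nobr matches.
unanchored = REXML::XPath.match(doc, "//nobr/a[1]")
puts unanchored.size # => 2
```

Whether a given Selenium IDE version needs the x: prefixes depends on how it evaluates the page, which this sketch does not test.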
Alright, I found some new information: it does work if I write "xpath=/" followed by the auto-generated expressions from XPather. However, the XPaths are very long and rely heavily on the HTML structure staying the same.
Does anyone have a better idea for XPath expressions (or CSS selectors) I could use to enter information in the following three textbox elements:
TEXTBOX 1
Firebug: input type="text" tabindex="110"
name="6011300f91d9f186d1b7ab1a034827da"
id="input1"
Working xpath in Selenium, retrieved from XPather:
xpath=/html/body[@id='fire']/form[@id='loginForm']/div/div/div[@id='container']/div[@id='content']/div[@id='contentPadding']/div[1]/fieldset[1]/input[@id='input1']
TEXTBOX 2
input type="password" tabindex="110"
name="0de4766295b7a965fc4969da3df6824ba"
xpath=/html/body[@id='fire']/form[@id='loginForm']/div/div/div[@id='container']/div[@id='content']/div[@id='contentPadding']/div[1]/fieldset[2]/input
TEXTBOX 3
input type="password" tabindex="110" class="last" name="6c4e0fcde65e258c3dc7c508a1cc666a"
xpath=/html/body[@id='fire']/form[@id='loginForm']/div/div/div[@id='container']/div[@id='content']/div[@id='contentPadding']/div[1]/fieldset[3]/input
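One possible direction, sketched under assumptions: anchor on the stable form id and on attributes that don't change (input id, fieldset order, input type and class) instead of the full path. The markup below is a hypothetical reconstruction from the attributes and paths quoted above, checked with Ruby's REXML; the real page contains more markup, so verify each expression with Selenium IDE's Find button (using the xpath= prefix that worked above).

```ruby
require "rexml/document"

# Hypothetical reconstruction of the login form from the quoted attributes and paths.
doc = REXML::Document.new(<<~XML)
  <html><body id="fire"><form id="loginForm">
    <div><div><div id="container"><div id="content"><div id="contentPadding"><div>
      <fieldset><input type="text" tabindex="110" name="6011300f91d9f186d1b7ab1a034827da" id="input1"/></fieldset>
      <fieldset><input type="password" tabindex="110" name="0de4766295b7a965fc4969da3df6824ba"/></fieldset>
      <fieldset><input type="password" tabindex="110" class="last" name="6c4e0fcde65e258c3dc7c508a1cc666a"/></fieldset>
    </div></div></div></div></div></div>
  </form></body></html>
XML

# Shorter alternatives that ignore the random name values.
textbox1 = "//form[@id='loginForm']//input[@id='input1']"
textbox2 = "//form[@id='loginForm']//fieldset[2]/input"
textbox3 = "//form[@id='loginForm']//input[@type='password' and @class='last']"

# The long XPather path for textbox 1, with @ in place of #.
long1 = "/html/body[@id='fire']/form[@id='loginForm']/div/div/div[@id='container']" \
        "/div[@id='content']/div[@id='contentPadding']/div[1]/fieldset[1]/input[@id='input1']"

same = REXML::XPath.first(doc, long1).equal?(REXML::XPath.first(doc, textbox1))
puts same # => true
```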
I want to retrieve the XPath of an attribute (for example, the "brand" of a product on a retailer's website).
One way of doing it is to use Firefox add-ons like XPather or XPath Checker: open the website in Firefox and right-click the attribute I'm interested in. This is OK, but I want to capture this information for many attributes, and right-clicking each one may be time-consuming. The other problem is that some attributes I'm interested in may appear only for one product, while others appear only for another product, so I would have to go to that product's page and do it manually again.
Is there an automated or programmatic way of retrieving the XPath of the desired attributes from a website, rather than doing this manually?
Note that not all websites serve valid XML that you can use XPath on...
That said, you should check out HTML parsers that let you use XPath on HTML even when it is not valid XML.
Since you did not specify the technology you are working with, I'll suggest the .NET HTML Agility Pack; if you need something else, search for questions on this topic here on SO.
The solution I use for this kind of thing is to write an xpath something like this:
//*[text()="Brand"]/following-sibling::*
//*[text()="Color"]/following-sibling::*
//*[text()="Size"]/following-sibling::*
//*[text()="Material"]/following-sibling::*
It works by finding all elements (labels) with the text you want and then looking to the next sibling in the HTML. Without a specific URL to see I can't help any further.
This is a generalised version. You can make more specific versions by replacing the asterisks with tag names, and you can navigate differently by replacing the following-sibling axis with another axis.
I use XPaths in import.io to build APIs for this kind of thing all the time. It's just a matter of finding an XPath that's generic enough to find the element no matter where it is on the page, but specific enough to get the right data.
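The label-then-sibling pattern can be run in a loop over all the labels you care about, which avoids right-clicking each attribute. A minimal sketch with Ruby's REXML on hypothetical product-spec markup (real retailer pages will differ, and usually need an HTML parser rather than a strict XML one). It adds [1] so that only the element directly after each label is taken:

```ruby
require "rexml/document"

# Hypothetical product-spec markup; real retailer pages will differ.
doc = REXML::Document.new(<<~XML)
  <table>
    <tr><th>Brand</th><td>Acme</td></tr>
    <tr><th>Color</th><td>Red</td><td>(more colors)</td></tr>
    <tr><th>Size</th><td>M</td></tr>
  </table>
XML

values = {}
%w[Brand Color Size Material].each do |label|
  # Find the label element by its text, then take the first element after it.
  node = REXML::XPath.first(doc, "//*[text()='#{label}']/following-sibling::*[1]")
  values[label] = node&.text # nil when the label is absent on this page
end
p values

# Without [1], following-sibling::* returns every later sibling of the label.
all_after_color = REXML::XPath.match(doc, "//*[text()='Color']/following-sibling::*").size
puts all_after_color # => 2
```

Labels that don't appear on a given product page simply come back as nil, which handles attributes that exist only for some products.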