When trying to select a list element's option I attempted to do:
myvar=ie.select_list(:id, 'myid').option(:text, 'mytext').select
But for some reason while I'm using Watir in irb to access the website and attempting to manipulate any of the items I get this exception.
Watir::Exception::UnknownObjectException: Unable to locate element...etc
I'm looking at page in the browser but using .html isn't showing the full page. It looks like the rest of the page is hidden and I'm not sure how to get into/around this.
irb(main):011:0> ie.html
=> "<HTML><HEAD><TITLE>My Title</TITLE>\r\n
<SCRIPT language=JavaScript type=text/javascript src=\"../../script.js\"></SCRIPT>\r\n</HEAD><FRAMESET id=mainFrameSet name=mainFrameSet rows=100%,0%><FRAME id=frmMain src=\"DefaultT.cfm?ID=2197024\" name=frmMain><FRAME id=frmHidden src=\"Dummy.html\" name=frmHidden scrolling=no></FRAMESET></HTML>"
EDIT:
Looking at this in retrospect I have changed the title so it would more accurately address the issue I was having. It was difficult for a new waiter user to find information like on Watir and Frames. The original title was something like "Using Watir On An Encrypted Site". I have severely edited the question to get to the essence of what I was asking. I can't thank those enough who attempted to answer the ramblings of a new Ruby user with minimal knowledge of the Web and programming in general. Please see previous revisions if necessary.
Based on the html you added, your webpage is using frames. Unlike other elements, you have to explicitly specify the frames you want to use.
You probably want the frame with id 'frmMain', so try:
myvar=ie.frame(:id, 'frmMain').select_list(:id, 'myid').option(:text, 'mytext').select
My guess is that the element is not on the page when you try to access it.
Try this (please notice when_present):
myvar=ie.select_list(:id, 'myid').when_present.option(:text, 'mytext').select
More information: http://watirwebdriver.com/waiting/
Related
I am trying to scrape some data from the following website: https://xrpcharts.ripple.com/
The data I am interested in is Total XRP which you can see immediately below or to the side (depending on your browser) of the circle diagram. So what I first did was inspect the element I am interested in. So I see that it is inside <div class="stat" inside span ng-bind="totalXRP | number:2" class="ng-binding">99,993,056,930.18</span>.
The number 99,993,056,930.18 is what I am interested in.
So I started in a scrapy shell and wrote:
fetch("https://xrpcharts.ripple.com")
I then used chrome to copy the Xpath by right clicking on that place of HTML code, the result chrome gave me was:
/html/body/div[5]/div[3]/div/div/div[2]/div[3]/ul/li[1]/div/span
Then I used the Xpath command to extract the text:
response.xpath('/html/body/div[5]/div[3]/div/div/div[2]/div[3]/ul/li[1]/div/span/text()').extract()
but this gave me an empty list []. I really do not understand what I am doing wrong here. I think I am making an obvious mistake but I dont see it. Thanks in advance!
The bottom line is: you cannot expect the page you see in the browser to be the same page Scrapy would download and have available to work with. Scrapy is not a browser.
This page is quite dynamic and complex and is constructed with the help of multiple asynchronous requests bringing in both the logic and the data. There is also JavaScript executed in the browser that plays an important role in forming and supporting the HTML document object tree.
Scrapy does not have all these things, the thing you get when you do fetch() is just the very first initial "bare bones" HTML page without all the "dynamic content".
I want to click a link with Mechanize that I select with xpath (nokogiri).
How is that possible?
next_page = page.search "//div[#class='grid-dataset-pager']/span[#class='currentPage']/following-sibling::a[starts-with(#class, 'page')][1]"
next_page.click
The problem is that nokogiri element doesn't have click function.
I can't read the href (URL) and send get request because the link has onclick function defined (no href attribute).
If that's not possible, what are the alternatives?
Use page.at instead of page.search when you're trying to find only one element.
You can make your selector simpler (shorter) by using CSS selector syntax:
next_page = page.at('div.grid-dataset-pager > span.currentPage + a[class^="page"]')
You can construct your own Link instance if you have the Nokogiri element, page, and mechanize object to feed the constructor:
next_link = Mechanize::Page::Link.new( next_page, mech, page )
next_link.click
However, you might not need that, because Mechanize#click lets you supply a string with the text of the anchor/button to click on.
# Assuming this link text is unique on the page, which I suspect it is
mech.click next_page.text
Edit after re-reading the question completely: However, none of this is going to help you, because Mechanize is not a web browser! It does not have a JavaScript engine, and thus won't (can't) execute your onclick for you. For this you'll need to use Ruby to control a real web browser, e.g. using Watir or Selenium or Celerity or the like.
In general you would do:
page.link_with(:node => next_link).click
However like Phrogz says, this won't really do what you want.
Why don't you use a hpricot element instead? Mechanize can click on a hpricot element as long as the link has a 'src' or 'href' attribute. Try something along these lines:
page = agent.get("http://www.example.com")
next_page = agent.click((page/"//your/xpath/a"))
Edit After reading Phrogz answer I also realized that this won't really do it. Mechanize doesn't support Javascript yet. With this in mind you have 3 options.
Use a library that controls a real web browser. See #Phrogz answer.
Use Capybara which is an integration testing library but can also be used as a stand alone crawler. I've done this successfully with HTMLUnit which is a also an integration testing library in Java. Capybara comes with Selenium support by default though it also supports Webkit via an external gem. Capybara interprets Javascript out of the box. This blog post might help.
Grok the page that you intend to crawl and use something like HTTPFox to monitor what the onclick Javascript function does and replicate this in your Mechanize script.
Good luck.
Not sure if what I want to do is possible, but what I am hoping to do is somehow gather certain pieces of text from a website, remove the header, footer, background, all formatting, and place it into my application in a scrollview or something similar...
I'll give you an example... Imagine I was making wikipedia's iPhone app, I want to download the information about the wiki on dogs, without the header, side bars etc, just the text. How would I go about doing this?
I understand that for this I have not provided any example code or what I've tried or started, but that's just because in this case I'm lost! That doesn't mean I want full chunks of code either. Any help will do. If this doesn't work, I will just have to make a 'mobile optimised' version of the webpages I want to include in my app.
Thanks
(Edit: the term I was trying to use was 'strip the web page of its HTML coding')
You may be going about this the wrong way, or perhaps even asking the wrong question.
Does the target website have an API or datafeed of some kind?
Can you get the information you need in JSON or XML format directly from the site?
I think you've misunderstood the technology. HTML is merely the framwork on which the formatting and data is hung.
Parsing the HTML page seems like an awfully big headache, I doubt you'll ever be able to get it to work, because almost all sites these days are partially or wholly generated on the server side, the page is only the result.
Some sites hide the information in memory and others get it dynamically through ajax for example, which means that simply trying to get the data by parsing the HTML will get zero data.
Another issue you should be aware of though, is that simply copying the data from generated websites may open yourself up to copyright issues.
You have to parse the html code and search for the part that you want and "throw" away the part that you do not need. This is more or less like bruteforcing and the code of the website should not change otherwise you are screwed. So you have to write the parser by hand with this method. But maybe there is a atom or rss feed and you can parse this one. This will be much more easier and you are not depending on the website layout because the rss/atom feed is just about the data. For parsing rss you could try out NSXMLParser.
And then you have to make a valid html page out of the data and present it in the UIWebView
Does anyone know how to delete an element from the source using Watir? There doesn't seem to be a method for removing elements. Perhaps I'm missing something.
If you know JavaScript, you could execute any JavaScript code on the page.
Example:
browser.execute_script("some javascript code")
I am not a JavaScript ninja, but this question could help you: JavaScript: remove element by id.
Remove elements by css:
browser.execute_script("[...document.querySelectorAll('.some.class')].map(e => {e.parentNode.removeChild(e)})")
We can remove it with javascript. Here's an example to remove a breadcrumbs div element but it's id:
browser.execute_script("bd = document.getElementById('breadcrumbs'); bd.parentNode.removeChild(bd);")
The Purpose for Watir is to do web testing, which is to say drive the browser as if a user was interacting with it. That means doing the things a user could do, clicking on stuff, filling in input fields, etc. It also means being able to verify what is there on the screen that the user can see or interact with.
Since a user cannot delete elements, there is no means by which to do that using the tool.
If the application provides a way for users to 'remove' or 'delete' something, like closing a simulated window, removing a tab etc, then you need to do that by simulating what the user would do (usually clicking on some specific element) in order for that to happen.
I'm writing a sample test with Watir where I navigate around a site with the IE class, issue queries, etc..
That works perfectly.
I want to continue by using PageContainer's methods on the last page I landed on.
For instance, using its HTML method on that page.
Now I'm new to Ruby and just started learning it for Watir.
I tried asking this question on OpenQA, but for some reason the Watir section is restricted to normal members.
Thanks for looking at my question.
edit: here is a simple example
require "rubygems"
require "watir"
test_site = "http://wiki.openqa.org/"
browser = Watir::IE.new
browser.goto(test_site)
# now if I want to get the HTML source of this page, I can't use the IE class
# because it doesn't have a method which supports that
# the PageContainer class, does have a method that supports that
# I'll continue what I want to do in pseudo code
Store HTML source in text file
# I know how to write to a file, so that's not a problem;
# retrieving the HTML is the problem.
# more specifically, using another Watir class is the problem.
Close browser
# end
Currently, the best place to get answers to your Watir questions is the Watir-General email list.
For this question, it would be nice to see more code. Is the application under test (AUT) opening a new window/tab that you were having trouble getting to and therefore wanted to try the PageContainer, or is it just navigating to a second page?
If it is the first one, you want to look at #attach, if it is the second, then I would recommend reading the quick start tutorial.
Edit after code added above:
What I think you missed is that Watir::IE includes the Watir::PageContainer module. So you can call browser.html to get the html displayed on the page to which you've navigated.
I agree. It seems to me that browser.html is what you want.