I am using the save_screenshot(filename) method and noticed a difference between the Selenium driver and the PhantomJS driver (with Capybara and Poltergeist).
Selenium takes a screenshot of the whole page. PhantomJS captures only part of the page, cropping off the bottom.
How do I specify in the PhantomJS driver that I would like the whole page, not just the partial page?
I found my solution.
save_screenshot(filename, :full => true)
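For anyone landing here, a minimal sketch of how that option might be wrapped, assuming a Capybara session backed by Poltergeist. The helper name save_full_page is made up for illustration; the only call taken from the answer above is save_screenshot with :full => true.

```ruby
# Hypothetical helper (not part of Capybara or Poltergeist).
# `session` is assumed to be a Capybara session using the Poltergeist driver,
# e.g. Capybara::Session.new(:poltergeist) after `require 'capybara/poltergeist'`.
def save_full_page(session, filename)
  # :full => true asks Poltergeist for the whole page instead of the viewport.
  session.save_screenshot(filename, :full => true)
  filename
end

# Usage, assuming PhantomJS and the gems are installed:
#   session.visit 'http://example.com'
#   save_full_page(session, 'whole_page.png')
```

Without the option, the same call should fall back to the cropped, viewport-only capture described in the question.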
This seems to be by design, as far as I can tell. Selenium can see the initially loaded HTML, but not the HTML after it's been massaged. I've tried IE, Chrome and PhantomJS, and they all show the same behavior. The built-in Chrome debugger behaves the same way: until you inspect an element on the page, you can't query any of the rendered HTML.
I'm looking for any suggestions about how to scrape the web page. The only option I see right now is finding the Chrome process, triggering the inspector, clicking inside, then running the JavaScript. Needless to say, this sounds fragile.
I also haven't been able to find anything on capturing the Ajax calls from Selenium so that I can make them myself and capture the JSON. When I tried copying a request from the Chrome network tab into Selenium, I got a missing application block message.
Does anyone have any other advice?
Since I can replicate the issue in the Chrome debugger, I don't see posting code as useful. It looks like a design decision.
Ralph
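One thing that may be worth trying before giving up on the DOM: poll until an element that only exists after rendering shows up, then ask the browser to serialize the live document. This is only a sketch under assumptions. The helper name, the selector and the timings are made up; find_elements and execute_script are real selenium-webdriver methods, but whether this helps on the site in question is untested.

```ruby
# Hypothetical helper: poll until a JavaScript-rendered element exists,
# then return the HTML of the live DOM rather than the original response.
# `driver` is assumed to be a Selenium::WebDriver driver instance.
def rendered_html(driver, css, timeout: 10, interval: 0.5)
  deadline = Time.now + timeout
  # find_elements returns an empty array (no exception) while nothing matches.
  while driver.find_elements(css: css).empty?
    raise "timed out waiting for #{css}" if Time.now >= deadline
    sleep interval
  end
  # Serialize the document as it is now, after scripts have modified it.
  driver.execute_script('return document.documentElement.outerHTML')
end

# Usage, with a made-up selector:
#   html = rendered_html(driver, 'table.results tr')
```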
Sadly, I wasn't able to do things in a straightforward way. Instead, I used Selenium to log in and navigate to the page, then used Windows API calls to click inside the window, send ^a^c to copy the data, and click at an absolute location on the button that goes to the next page.
The site is set up so that ^a^c copies the raw data. I don't know whether that's standard for Angular.
Fragile, but it works.
It's possible to take screenshots of web pages in Firefox using the developer toolbar. Is it possible to do this programmatically too, e.g., from the command line? I've tried with Selenium, but with no luck.
edit: I know it's possible to take screenshots using Selenium, but this only gets you the full screen. Using the developer toolbar in Firefox, one can use CSS selectors to select only parts of a page, which is what I want to do (and forgot to specify, sorry). What I've read online is that Selenium cannot access the developer toolbar because it's not part of the DOM.
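A possible route that stays inside Selenium, sketched under the assumption that selenium-webdriver 4.x is in use, where (as far as I know) an element itself responds to save_screenshot. Older versions may not support this. The helper name and the example selector are made up for illustration.

```ruby
# Hypothetical helper. Assumes selenium-webdriver 4.x, where an element
# (not just the driver) responds to save_screenshot.
def save_element_screenshot(driver, css, path)
  element = driver.find_element(css: css) # raises if nothing matches
  element.save_screenshot(path)           # image of just that element
  path
end

# Usage, with a made-up selector:
#   save_element_screenshot(driver, '#content .chart', 'chart.png')
```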
I am trying to get a 400x400 screenshot of Google. I have tried this in both Selenium and Watir with no success.
require 'watir-webdriver'
b = Watir::Browser.new
b.goto 'google.com'
b.window.resize_to(400,400)
b.driver.save_screenshot("screenshot.jpg")
I always get the screenshot with the original browser size.
Any idea how I can save it resized to 400x400?
At present, WebDriver defines a screenshot as a "full page screenshot". That is, the entire DOM should be represented by the image generated by the save_screenshot method. The fact that the Chrome driver doesn't generate a screenshot of the full DOM is a bug in the Chrome driver. So the real answer is that there is no way to generate a screenshot of only the browser viewport using WebDriver.
Having said that, it might be possible to use other programmatic means to accomplish this, depending on your OS. On Windows, for example, it would be pretty easy to get the desktop window's handle (HWND), capture an image of the desktop (using the Windows GetDesktopWindow and PrintWindow APIs), and crop it using the coordinates supplied by the WebDriver Window API.
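The cropping step could be sketched roughly like this. The helper below only computes the rectangle to cut out of a desktop capture; capturing and cropping the image itself is OS-specific and not shown. position and size are real methods on driver.manage.window, while the helper name and the desktop dimensions passed in are assumptions. Note that the rectangle covers the whole browser window, including its title bar and toolbars, not only the viewport.

```ruby
# Hypothetical helper: the rectangle (in desktop pixels) occupied by the
# browser window, clamped to the desktop bounds.
def window_crop_rect(driver, desktop_width, desktop_height)
  window = driver.manage.window
  pos  = window.position # responds to x and y
  size = window.size     # responds to width and height
  left   = pos.x.clamp(0, desktop_width)
  top    = pos.y.clamp(0, desktop_height)
  right  = (pos.x + size.width).clamp(0, desktop_width)
  bottom = (pos.y + size.height).clamp(0, desktop_height)
  { x: left, y: top, width: right - left, height: bottom - top }
end
```

The clamping is there because a maximized window can report a position slightly off-screen, which would otherwise produce a crop rectangle outside the captured image.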
I was able to reproduce the problem with Firefox, but Chrome generates a resized screenshot. My guess is that it is a bug in FirefoxDriver. Check whether the problem has already been reported in the Selenium bug tracker and, if not, report it.
I want to crawl some web pages, like the following:
http://www.youtube.com/user/koglin66/feed?filter=2
but there is a 'load more' button, which triggers an Ajax request:
http://www.youtube.com/channel_ajax?action_load_more_feed_items=1&activity_view=1&paging=1352148528&channel_id=UCCw8aVnsIeu9S6OPQyaQ14g
I want to crawl the whole page.
Manually, I have to click the button repeatedly until there is nothing more to load.
How can I crawl the whole page automatically? Thanks!
Yes, you can use Selenium IDE, or another program or library with a browser core (such as WebKit or IE's ActiveX control), to perform the click action.
You can also try FMiner (http://www.fminer.com/), which can record and replay human actions in a browser to scrape data, but it's not free.
I recently faced the same problem with another website I wanted to scrape. I use Java, and after some research on the web I used Selenium IDE for Firefox, with which you can write Java JUnit test cases that automatically open the web page, click buttons, fill in forms, etc.
It also supports C#, Python, Ruby, etc.
I used it to click the Load More button, and once the page was completely loaded after all the clicks, I saved it manually.
You can download Selenium from their website, and I found this YouTube video useful too: http://www.youtube.com/watch?v=twdDfDOrHC4
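The click-until-done loop could also be scripted rather than recorded. A rough sketch in Ruby, assuming a Selenium WebDriver driver object; the helper name, the selector and the pause length are made up, and a fixed sleep is a crude stand-in for properly waiting on the Ajax response.

```ruby
# Hypothetical helper: keep clicking a "load more" button until it is gone
# (or until max_clicks is reached), and return the number of clicks made.
def click_until_gone(driver, css, max_clicks: 100, pause: 1.0)
  clicks = 0
  while clicks < max_clicks
    # find_elements returns [] when nothing matches; pick a visible match.
    button = driver.find_elements(css: css).find(&:displayed?)
    break unless button # no visible button left: everything is loaded
    button.click
    clicks += 1
    sleep pause # crude wait for the newly requested items to render
  end
  clicks
end

# Usage, with a made-up selector:
#   click_until_gone(driver, 'button.load-more')
#   File.write('page.html', driver.page_source)
```

The max_clicks cap is there so that a feed which never runs out cannot keep the loop going forever.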
I'd like to scrape a website that dynamically generates more content as I scroll down the web browser. I have seen a related post, Auto-Scroll in FireFox, but it doesn't answer my question.
Is it possible to scroll a web page to the end of the page (until no more content is generated by the web server), or for a few refreshes, using Watir WebDriver?
I have recently tried to do something like that, and to my surprise it looks like WebDriver does not have support for scrolling. I did find two workarounds.
This will send space to the browser, and it will scroll down (works on twitter.com, for example):
browser.send_keys :space
This will scroll to the element, and if the element is at the bottom of the page, it will load more content:
browser.element.wd.location_once_scrolled_into_view
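A third option, sketched here under assumptions, is to scroll with JavaScript and stop once the page height stops growing. browser.execute_script is a real Watir method; the helper name, the round limit and the pause are made up, and the fixed sleep may need tuning for slow pages.

```ruby
# Hypothetical helper: scroll to the bottom repeatedly until the document
# height stops changing or max_rounds is reached. Returns the rounds used.
def scroll_to_end(browser, max_rounds: 20, pause: 1.0)
  height_js = 'return document.body.scrollHeight'
  last_height = browser.execute_script(height_js)
  rounds = 0
  while rounds < max_rounds
    browser.execute_script('window.scrollTo(0, document.body.scrollHeight)')
    sleep pause # give the page time to load more content
    rounds += 1
    new_height = browser.execute_script(height_js)
    break if new_height == last_height # nothing new was appended
    last_height = new_height
  end
  rounds
end

# Usage:
#   scroll_to_end(browser, max_rounds: 5) # stop after "a few refreshes"
```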