I'd like to scrape a website that dynamically generates more content as I scroll down in the browser. I have seen a related post, Auto-Scroll in FireFox, but it doesn't answer my question.
Is it possible, using Watir webdriver, to scroll a webpage until the end of the page is reached (i.e., no more content is generated by the web server), or for a set number of refreshes?
I recently tried to do something like that, and to my surprise it looks like webdriver does not have built-in support for scrolling. I did find two workarounds.
This will send the space key to the browser, which will scroll it down (works on twitter.com, for example):
browser.send_keys :space
This will scroll to the element, and if the element is at the bottom of the page, it will load more content:
browser.element.wd.location_once_scrolled_into_view
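If you need to keep scrolling until nothing more loads, either workaround can be wrapped in a loop that compares the page height before and after each scroll. Here is a minimal sketch with watir-webdriver (the URL and the two-second wait are assumptions; tune them for your site):

require 'watir-webdriver'

browser = Watir::Browser.new :firefox
browser.goto 'http://example.com/infinite-scroll' # hypothetical page

# Scroll until the document height stops growing, i.e. no new content loads.
last_height = browser.execute_script('return document.body.scrollHeight')
loop do
  browser.execute_script('window.scrollTo(0, document.body.scrollHeight)')
  sleep 2 # crude wait for the Ajax content to render
  new_height = browser.execute_script('return document.body.scrollHeight')
  break if new_height == last_height
  last_height = new_height
end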
I am new to Nuxt and decided to give it a shot and make a webpage.
However, I have this weird issue where, when I click on a link, Firefox shows the page scrolled to the bottom, while Chrome automatically scrolls to the top.
This only happens with Firefox and I have no idea what is causing it. Does anyone have an idea?
PS: I am using the footer component in the default layout.
I figured out that when I go directly to the URL, Firefox shows the top of the page as it should, but when using a nuxt-link it loads and shows the bottom.
This seems to be by design as far as I can tell. Selenium can see the initially loaded HTML, but not the HTML after it's been massaged. I've tried IE, Chrome, and PhantomJS, and they all show the same behavior. So does the built-in Chrome debugger: until you inspect an element on the page, you can't query any of the rendered HTML.
I'm looking for any suggestions about how to scrape the web page. The only option I see right now is finding the Chrome process, triggering the inspector, clicking inside it, then running the JavaScript. Needless to say, this sounds fragile.
I also haven't been able to find anything on capturing the Ajax calls from Selenium so I can make them myself and capture the JSON. When I tried copying and pasting from the Chrome network tab into Selenium, I got a missing application block message.
Does anyone have any other advice?
Since I can replicate the issue in the Chrome debugger, I don't see posting code as useful. It looks like a design decision.
Ralph
Sadly, I wasn't able to do things in a straightforward way. Instead, I used Selenium to do the login and navigate to the page, then used Windows API calls to click inside the window, send ^a^c to copy the data, and click at an absolute location on the button to go to the next page.
The site is set up so that ^a^c copies the raw data. I don't know whether that's standard for Angular or not.
Fragile, but it works.
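For what it's worth, the select-all and copy keystrokes can usually be sent from the webdriver itself instead of through raw Windows API calls. A hedged sketch of the same idea in Ruby with Watir, using the third-party clipboard gem to read back what was copied (the URL is an assumption, and the login steps are omitted):

require 'watir-webdriver'
require 'clipboard' # third-party gem for reading the system clipboard

browser = Watir::Browser.new :chrome
browser.goto 'http://example.com/data' # hypothetical page, reached after login

body = browser.body
body.click                     # give the page content keyboard focus
body.send_keys [:control, 'a'] # select all (^a)
body.send_keys [:control, 'c'] # copy (^c)

data = Clipboard.paste # read the copied text from the system clipboard
puts data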
This is the Firefox error page you see when you're offline. If you look at the red marked area, you see that the SVG background and some scripts are loaded from chrome://browser/content/.... What does chrome do there?
Firefox uses the chrome:// protocol to access Mozilla's chrome system, i.e. user interface elements and other resources (chrome here does not refer to Google Chrome).
From developer.mozilla.org:
What is chrome
Chrome is the set of user interface elements of the application window that are outside the window's content area. Toolbars, menu bars, progress bars, and window title bars are all examples of elements that are typically part of the chrome.
Using a chrome URL, we can access those elements in the browser.
For example, we can access the library menu with:
chrome://browser/content/places/places.xul
and the extensions menu with:
chrome://mozapps/content/extensions/extensions.xul
(URLs taken from http://kb.mozillazine.org/Chrome_URLs)
So, it seems that the chrome://browser/content/.. links are fetching Firefox resources needed to display the error page.
It's possible to take screenshots of web pages in Firefox using the developer toolbar. Is it possible to do this programmatically too, e.g., from the command line? I've tried with Selenium, but had no luck.
Edit: I know it's possible to take screenshots using Selenium, but that only gets you the full screen. Using the developer toolbar in Firefox, one can use CSS selectors to capture only parts of a page, which is what I want to do (and forgot to specify, sorry). What I've read online is that Selenium cannot access the developer toolbar because it's not part of the DOM.
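One workaround is to take a full screenshot through the webdriver and then crop it down to the element matched by a CSS selector. A rough sketch in Ruby with Watir and the chunky_png gem (the selector and file names are assumptions, and on high-DPI displays the coordinates may need scaling):

require 'watir-webdriver'
require 'chunky_png' # third-party gem used here to crop the PNG

browser = Watir::Browser.new :firefox
browser.goto 'http://example.com' # hypothetical page

element = browser.element(css: '#sidebar') # hypothetical selector
pos  = element.wd.location # element's x/y position on the page
size = element.wd.size     # element's width and height

browser.screenshot.save 'full.png' # screenshot of the whole page

full = ChunkyPNG::Image.from_file('full.png')
full.crop(pos.x.to_i, pos.y.to_i, size.width.to_i, size.height.to_i).save('element.png')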
I want to crawl some web pages, like the following:
http://www.youtube.com/user/koglin66/feed?filter=2
but there is a 'load more' button, which triggers an Ajax request:
http://www.youtube.com/channel_ajax?action_load_more_feed_items=1&activity_view=1&paging=1352148528&channel_id=UCCw8aVnsIeu9S6OPQyaQ14g
I want to crawl the whole page.
Manually, I have to click on the button repeatedly until there is nothing more to load. How can I crawl the whole page through automation? Thanks!
Yes, you can use Selenium IDE, or another program/library with a browser core to perform the click action, like WebKit or the ActiveX control of IE.
You can also try FMiner (http://www.fminer.com/), which can record and replay human actions in the browser to scrape data, but it's not free.
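If you'd rather script the clicks than record them, the loop is short in any webdriver binding. A minimal sketch with Ruby and watir-webdriver (the button locator is an assumption; inspect the page for the real one):

require 'watir-webdriver'

browser = Watir::Browser.new :firefox
browser.goto 'http://www.youtube.com/user/koglin66/feed?filter=2'

# Keep clicking 'load more' until the button is no longer on the page.
load_more = browser.button(text: /load more/i) # hypothetical locator
while load_more.present?
  load_more.click
  sleep 2 # crude wait for the Ajax response to render
end

File.write('feed.html', browser.html) # save the fully expanded page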
I recently faced the same problem with another website I wanted to scrape. I use Java, and after some research on the web I used the Selenium IDE for Firefox, in which you can write Java JUnit test cases that will automatically open the webpage, click buttons, fill in forms, etc.
It also supports C#, Python, Ruby, etc.
I used it to click the Load More button, and when the page was completely loaded after all the clicks, I saved it manually.
You can download Selenium from their website, and I found this YouTube video useful too: http://www.youtube.com/watch?v=twdDfDOrHC4