How to crawl a web page with Ajax elements

I want to crawl some web pages, like the following:
http://www.youtube.com/user/koglin66/feed?filter=2
However, there is a 'Load more' button that triggers an Ajax request:
http://www.youtube.com/channel_ajax?action_load_more_feed_items=1&activity_view=1&paging=1352148528&channel_id=UCCw8aVnsIeu9S6OPQyaQ14g
I want to crawl the whole page.
Manually, I have to click the button repeatedly until there is nothing more to load.
How can I automate this to crawl the whole page? Thanks!
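Since the button only fires an Ajax request, one option is to skip the button and request that URL directly, following the paging value until it runs out. Below is a minimal sketch under loud assumptions: `fetch_page` is a hypothetical callback that downloads one "load more" response and returns its items plus the next paging value (or None). How that value is pulled out of the real response is not shown in the thread, and the endpoint dates from the question, so it may no longer behave this way.

```python
from typing import Callable, Iterator, List, Optional, Tuple
from urllib.parse import urlencode

# One "load more" response reduced to (items, next paging value).
# Extracting these from the real response is an ASSUMPTION left to fetch_page.
Page = Tuple[List[str], Optional[str]]


def build_load_more_url(channel_id: str, paging: str) -> str:
    """Rebuild the Ajax URL quoted in the question for a given paging value."""
    params = {
        "action_load_more_feed_items": "1",
        "activity_view": "1",
        "paging": paging,
        "channel_id": channel_id,
    }
    return "http://www.youtube.com/channel_ajax?" + urlencode(params)


def crawl_all(first_paging: str,
              fetch_page: Callable[[str], Page],
              max_requests: int = 1000) -> Iterator[str]:
    """Yield every item, following paging values until none is returned.

    fetch_page is HYPOTHETICAL: it should fetch build_load_more_url(...)
    for the given paging value and parse the response.
    """
    token = first_paging
    for _ in range(max_requests):  # hard cap so a bad token cannot loop forever
        items, next_token = fetch_page(token)
        yield from items
        if next_token is None or next_token == token:
            return  # nothing left to load (or the server repeated itself)
        token = next_token
```

Passing the fetcher in keeps the paging logic separate from the HTTP and parsing details, which are the parts most likely to differ from this sketch.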

Yes, you can use Selenium IDE, or another program or library built on a browser engine (such as WebKit or IE's ActiveX control) to perform the clicks.
You can also try FMiner (http://www.fminer.com/), which records and replays human actions in the browser to scrape data, but it is not free.

I recently faced the same problem with another website I wanted to scrape. I use Java, and after some research I used Selenium IDE for Firefox, with which you can write Java JUnit test cases that automatically open the web page, click buttons, fill in forms, etc.
It also supports C#, Python, Ruby, etc.
I used it to click the Load More button, and when the page had loaded completely after all the clicks, I saved it manually.
You can download Selenium from its website, and I also found this YouTube video useful: http://www.youtube.com/watch?v=twdDfDOrHC4
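The "keep clicking until there is nothing more to load" loop from these answers can be sketched independently of the browser. This is a minimal sketch: `click_load_more` is a hypothetical callback that clicks the button once, waits for the new items, and returns False once the button is gone.

```python
from typing import Callable


def click_until_exhausted(click_load_more: Callable[[], bool],
                          max_clicks: int = 500) -> int:
    """Call click_load_more() until it reports that no button is left.

    click_load_more is HYPOTHETICAL: it should click "Load more" once,
    wait for the new items, and return False when the button is missing.
    Returns the number of successful clicks.
    """
    clicks = 0
    # max_clicks guards against a button that never disappears
    while clicks < max_clicks and click_load_more():
        clicks += 1
    return clicks
```

With Selenium's Python bindings, the callback might call `driver.find_elements(By.XPATH, ...)` with a selector for the button (site-specific and not given in the thread), return False when the list is empty, and otherwise click the first element. Afterwards, `driver.page_source` holds the fully expanded page, which can be saved instead of saving it by hand.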

Related

Scraping an Angular website with Selenium and C# returns the Angular script, not the rendered web page

This seems to be by design, as far as I can tell. Selenium can see the initially loaded HTML, but not the HTML after it has been transformed by the scripts. I've tried IE, Chrome and PhantomJS, and they all show the same behavior. So does the built-in Chrome debugger: until you inspect an element on the page, you can't query any of the rendered HTML.
I'm looking for suggestions on how to scrape the web page. The only option I see right now is finding the Chrome process, triggering the inspector, clicking inside it, then running the JavaScript. Needless to say, this sounds fragile.
I also haven't been able to find anything on capturing the Ajax calls from Selenium, so that I could make them myself and capture the JSON. When I tried copying and pasting a request from the Chrome Network tab into Selenium, I got a "missing application block" message.
Does anyone have any other advice?
Since I can replicate the issue in the Chrome debugger, I don't see posting code as useful. It looks like a design decision.
Ralph
Sadly, I wasn't able to do this in a straightforward way. Instead, I used Selenium to log in and navigate to the page, then used Windows API calls to click inside the window, send ^a^c to copy the data, and click at an absolute position on the button to go to the next page.
The site is set up so that ^a^c copies the raw data. I don't know whether that's standard for Angular.
Fragile, but it works.
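The question also raises replaying the Ajax calls to get the JSON directly. Here is a minimal sketch of that idea, resting on assumptions the thread does not confirm: the endpoint accepts a plain GET, and it authenticates through the session cookie from the Selenium login. `cookie_header` converts the list of `{'name': ..., 'value': ...}` dicts that Selenium's `driver.get_cookies()` returns. The URL and any extra headers the real request needs would have to be copied from the Network tab.

```python
import json
import urllib.request
from typing import Any, Callable, Dict, List, Optional


def cookie_header(cookies: List[Dict[str, Any]]) -> str:
    """Build a Cookie header value from Selenium-style cookie dicts."""
    return "; ".join(f"{c['name']}={c['value']}" for c in cookies)


def fetch_json(url: str, cookie: Optional[str] = None,
               opener: Callable = urllib.request.urlopen) -> Any:
    """Replay a GET request seen in the browser and decode its JSON body.

    ASSUMPTION: the endpoint needs only the session cookie; copy any other
    headers the real request sends into `headers`.
    """
    headers = {"Accept": "application/json"}
    if cookie:
        headers["Cookie"] = cookie  # reuse the session from the Selenium login
    request = urllib.request.Request(url, headers=headers)
    with opener(request) as response:  # opener is injectable for testing
        return json.loads(response.read().decode("utf-8"))
```

If this works for a given site, it avoids both the rendered-HTML problem and the clipboard workaround, because the data arrives already structured.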

Switching between browser tabs using Capybara

I'm trying to automate an e-commerce website where clicking a particular link opens multiple browser tabs, and the related pages are displayed in their respective tabs.
The problem is that I want to switch between these tabs to automate the pages within them, but I don't know how to switch between browser tabs using Capybara.
I'm using Capybara with Ruby.
From the Java answer, you need to send Command+T to open a new tab.
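A common pattern for finding the tabs a click opened is to record the window handles before the click and compare them with the handles afterwards. This sketch shows only that comparison. The handle values are whatever strings the driver reports.

```python
from typing import Iterable, List


def new_window_handles(before: Iterable[str], after: Iterable[str]) -> List[str]:
    """Return handles that exist after the click but not before, in 'after' order."""
    seen = set(before)
    return [handle for handle in after if handle not in seen]
```

In Selenium's Python bindings, `before` and `after` would be `driver.window_handles` read around the click, and each new handle can be passed to `driver.switch_to.window(handle)`. Capybara (2.3 and later) wraps the same idea in `window_opened_by` and `within_window`.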

Handling the IE ActiveX popup window using Watir

Is there any way to programmatically click the Yes button in the IE ActiveX popup window using Watir?
Per the documentation, this is not possible:
Watir will drive web applications that are served up as HTML pages in a web browser. Watir will not work with ActiveX plugin components, Java Applets, Macromedia Flash, or other plugin applications. To determine whether Watir can be used to automate a part of a web application, right click on the object and see if the View Source menu option is available. If you can view the HTML source, that object can be automated using Watir.

How to detect Flash content using selenium-webdriver?

I'm automating a website; the entire site is HTML except for a few 'Copy' buttons, which are in Flash. As there is no element for that button, I couldn't verify its functionality (I have to check whether it's clickable and whether clicking it copies the contents to the clipboard). Is there any way to automate that button's functionality by adding a JavaScript snippet or something similar?
I'm using the Ruby API to automate the site.
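The WebDriver can at least confirm that Flash content is present, by scanning the page's HTML for `<object>` or `<embed>` elements that reference a Flash movie. This is a minimal sketch using Python's standard `html.parser`. It identifies Flash by the `application/x-shockwave-flash` type or a `.swf` source, which is a heuristic. It cannot click inside the movie or check the clipboard, in line with the Watir documentation quoted above. The HTML could come from `driver.page_source`, and the Ruby bindings have a method with the same name.

```python
from html.parser import HTMLParser
from typing import List


class FlashFinder(HTMLParser):
    """Collect <object>/<embed> tags that look like Flash movies."""

    FLASH_TYPE = "application/x-shockwave-flash"

    def __init__(self) -> None:
        super().__init__()
        self.found: List[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; self-closing tags
        # like <embed ... /> reach here via the default handle_startendtag.
        if tag not in ("object", "embed"):
            return
        attributes = {name: (value or "") for name, value in attrs}
        source = attributes.get("data") or attributes.get("src") or ""
        is_flash_type = attributes.get("type", "").lower() == self.FLASH_TYPE
        if is_flash_type or source.lower().endswith(".swf"):
            self.found.append(source or tag)


def find_flash(html: str) -> List[str]:
    """Return the sources (or tag names) of Flash embeds found in html."""
    finder = FlashFinder()
    finder.feed(html)
    finder.close()
    return finder.found
```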

Getting Browser Autocomplete of Ajax-Submitted Form Fields Working in IE

I have a page that is heavily managed by Ajax and used all day by my client's employees for data entry.
Before a merger the client was using Firefox, but now has had to switch to IE8.
Firefox would save the form inputs when the forms on this page were submitted via Ajax; IE8 doesn't do this natively.
Losing autocomplete on these forms has quite an effect on how efficiently the employees can use them.
The question:
Is there any way I can get IE8 to save form inputs submitted via Ajax so they can later be used for completion?
Without a browser solution, I may have to go for something like storing the inputs in a database and running a data-driven autocomplete...
Internet Explorer has an AutoComplete feature that may be of assistance.
You can enable AutoComplete by going to Tools, then Internet Options. On the Internet Options screen, go to the Content tab and click the Settings button in the AutoComplete section. In the settings, check the box next to Forms.
Hope that helps!
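If the browser setting is not enough, the question's fallback is to store submitted values on the server and serve them to an autocomplete widget. This is a minimal sketch of the server-side lookup. An in-memory dictionary stands in for the database table. The Ajax endpoint that would return these suggestions and the front-end widget are outside its scope. Matching is case-insensitive by prefix, with the most frequently entered values listed first.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, List


class FieldHistory:
    """In-memory stand-in for the database the question suggests:
    remembers submitted values per field and suggests prefix matches."""

    def __init__(self) -> None:
        # field name -> how many times each value has been submitted
        self._counts: DefaultDict[str, Counter] = defaultdict(Counter)

    def record(self, field: str, value: str) -> None:
        value = value.strip()
        if value:  # ignore blank submissions
            self._counts[field][value] += 1

    def suggest(self, field: str, prefix: str, limit: int = 10) -> List[str]:
        counts = self._counts.get(field, Counter())
        needle = prefix.strip().lower()
        matches = [v for v in counts if v.lower().startswith(needle)]
        # Most frequent first; ties broken alphabetically for stable output.
        matches.sort(key=lambda v: (-counts[v], v))
        return matches[:limit]
```

Each Ajax form submission would call `record` once per field, and the autocomplete widget would query `suggest` as the user types.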
