I would like to write a crawler that supports cookie storage and sessions. There are two different implementations of a Java headless browser. HtmlUnit has better support for JavaScript and perhaps HTML parsing, but is there any reason to use HttpUnit for crawler performance?
There is a relevant article here, from one of the HtmlUnit developers.
It basically says that, apart from JavaScript support, HtmlUnit is more high-level than HttpUnit. HtmlUnit also seems to be more actively developed (two releases in 2014, while HttpUnit has not been updated since 2008).
Related
I'm digging into Node.js now and the whole idea seems brilliant to me. But I'm interested in what the benefits of using Node.js are when developing "traditional" sites with a bit of AJAX and no realtime features. When I say traditional, I mean the sites that one usually builds using MVC frameworks on platforms like PHP, ASP.NET, etc.
I know that the Express framework is popular, but the question is more about what I would gain by switching to Node.js rather than simply "Can I do MVC in Node?".
Node has the advantage of
having a rich open source community with third party modules that solve most problems
having a low level API with a minimal amount of "default" bloat
reducing language context switching
having a decent level of performance
allowing you to manipulate the HTTP server programmatically within your application (see the sketch below)
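To illustrate that last point, here is a minimal sketch using only Node's built-in http module (the port, response body, and timeout are illustrative): the server is a plain object your own code can configure, decorate, and tear down.

```javascript
// Node hands you the HTTP server as an ordinary object,
// rather than hiding it behind a framework.
var http = require('http');

var server = http.createServer(function (req, res) {
  res.writeHead(200, { 'Content-Type': 'text/plain' });
  res.end('Hello from Node\n');
});

// The server object is yours: attach listeners, tune sockets, close it, etc.
server.on('connection', function (socket) {
  socket.setTimeout(30 * 1000); // drop idle connections after 30s
});

server.listen(3000);
```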
I guess this URL is all you need: How to decide when to use Node.js? I am marking this as community wiki.
Can anyone help me understand the difference between Selenium RC and WebDriver, and which one is better and why?
Selenium uses JavaScript to automate web pages. This lets it interact very tightly with web content, and was one of the first automation tools to support Ajax and other heavily dynamic pages. However, this also means Selenium runs inside the JavaScript sandbox. This means you need to run the Selenium-RC server to get around the same-origin policy, which can sometimes cause issues with browser setup.
WebDriver on the other hand uses native automation from each language. While this means it takes longer to support new browsers/languages, it does offer a much closer ‘feel’ to the browser. If you’re happy with WebDriver, stick with it, it’s the future. There are limitations and bugs right now, but if they’re not stopping you, go for it.
Selenium Benefits over WebDriver
Supports many browsers and many languages; WebDriver needs a native implementation for each new language/browser combo.
Very mature and complete API
Currently (Sept 2010) supports JavaScript alerts and confirms better
Benefits of WebDriver Compared to Selenium
Native automation is faster and a little less prone to error and browser-configuration issues
Does not require the Selenium-RC server to be running
Access to the headless HtmlUnit driver can allow really fast tests
Great API
It's explained here.
Selenium-RC uses JavaScript to automate web pages. Therefore it is constrained by what you can do with JavaScript, specifically, it is constrained to the JavaScript sandbox. It also requires the Selenium-RC server. It supports many browsers and many languages.
WebDriver uses native automation and does not have the sandbox constraints of Selenium-RC. It's a little faster and does not require a server.
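To make the difference concrete, here is a minimal WebDriver sketch using the Node bindings (the selenium-webdriver package; the target URL and selector are illustrative). Note that it drives the browser directly, with no Selenium-RC server in between:

```javascript
// npm install selenium-webdriver
var webdriver = require('selenium-webdriver');

// Build a driver for any installed browser with a native driver.
var driver = new webdriver.Builder()
    .forBrowser('firefox')
    .build();

driver.get('http://example.com')
    .then(function () {
        // Query the page through the browser's own automation APIs.
        return driver.findElement(webdriver.By.css('h1')).getText();
    })
    .then(function (text) {
        console.log('Heading:', text);
    })
    .then(function () {
        return driver.quit();
    });
```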
I'm trying to connect to an application that uses Comet and is pretty heavy on JavaScript. I've gone as far as I can with Firebug and HTTP header examination, and am now trying to see what's coming over the wire by writing something using Ruby Mechanize.
However, since I have no client runtime, my approach is to mimic the HTTP requests going back and forth (using Ruby Mechanize). I'm looking at the logs and comparing them to LiveHTTPHeaders output; they're very similar, but the server isn't responding (I don't have access to the server-side code).
Are there tools that could help? Has anyone tried simulating the DOM and JavaScript runtime using something like Rhino, or is that just asking for pain?
The only sane way I've found to run automated tests on web apps involving substantial JavaScript (with or without Comet) is Selenium RC -- basically, mechanizing/automating a real browser from your favorite programming language. (There may be other approaches with a similar architecture, but Selenium is popular and it's what I'm familiar with.) Simulating a browser's DOM and JS is just too painful -- been there, tried that, failed miserably ;-).
Visual Studio 2010 Ultimate edition provides very good testing support for web applications. I have tried the web load test and it was impressive.
Many of the upcoming generation of browsers (FF 3.1, IE8) are going to support cross-domain XMLHttpRequests in one fashion or another (with security concerns, as long as the server opts in, etc.).
Is the same bit of functionality going to be in WebKit?
FF: https://developer.mozilla.org/en/Cross-Site_XMLHttpRequest
IE: http://blogs.msdn.com/ie/archive/2008/06/23/securing-cross-site-xmlhttprequest.aspx
Basic support for this was added to WebKit in May (see this patch). There have been a number of other patches since then cleaning it up and refactoring bits of WebKit to deal with the changes entailed, as well as tracking changes to the spec. Since the spec changed recently (and WebKit was updated to match three days ago), I think it is safe to assume no currently shipping browser supports it, but that most of them will in the future, and the current WebKit nightlies are tracking the standard fairly closely.
I think this is really up to the standard (http://www.w3.org/TR/XMLHttpRequest/) rather than to the browser's framework or JavaScript engine.
In fact, I fully disagree with Microsoft's decision to implement their own stuff that has nothing to do with the W3C standard. The web is a mess today mostly because of Microsoft's ugly implementation of things.
As for WebKit, they seem to be pretty up to date with the W3C.
Here's also a good article about this: http://ajaxian.com/archives/the-fight-for-cross-domain-xmlhttprequest
If you're looking for other ways to communicate cross-domain in an Ajax style (without using the XMLHttpRequest object), you should check out JSONP, which is currently fully supported in all browsers.
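As a rough sketch of how JSONP works (the URL and callback name here are made up): the page injects a script tag pointing at the remote API, and the server replies with JavaScript that calls your named function. This sidesteps the same-origin policy because script tags are not subject to it.

```javascript
// Define the callback the remote server will wrap its data in.
function handleData(data) {
  console.log('Got cross-domain data:', data);
}

// Inject a <script> tag; the server is expected to respond with
// something like: handleData({"items": [...]});
var script = document.createElement('script');
script.src = 'http://api.example.com/items?callback=handleData';
document.getElementsByTagName('head')[0].appendChild(script);
```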
I always have to check each and every browser to see if my website would work. Is there a website I can check it with?
Update:
I don't really want just screenshots (which is what Browsershots does); I want to actually test the posting of my script.
You want a web site to check your web site for JavaScript compatibility? How would you expect it to know how to exercise your interface to trigger the proper interactions? Or are you thinking of it doing some sort of static code analysis? I think you are better off coding against a framework that has solved most of the browser-dependent idiosyncrasies and using it to check for browser capabilities before you use them. jQuery, MooTools, Prototype/Scriptaculous, etc. go a long way in solving these problems for JavaScript.
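For what it's worth, "checking for browser capabilities before you use them" is plain feature detection, which is roughly what those libraries do internally. A classic example of the pattern (the old-IE fallback included):

```javascript
// Feature-detect the XHR object instead of sniffing the browser's name.
function createXHR() {
  if (window.XMLHttpRequest) {
    return new XMLHttpRequest();                   // standards browsers
  }
  if (window.ActiveXObject) {
    return new ActiveXObject('Microsoft.XMLHTTP'); // older IE
  }
  throw new Error('No Ajax support in this browser');
}
```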
Note that you still need to worry about rendering your site, but you already have several answers for how to go about doing that based on web sites. Personally, I just maintain IE/Safari/FF/Opera/Chrome on my workstation and do significant checking in IE/FF and basic checking in Safari/Opera/Chrome.
Even when there exist websites that allow you to see a static snapshot of your site in several browsers, you should really test your page on them yourself, because there can be subtle, and not so subtle, bugs and differences that are only apparent when interacting with the webpage.
You can cover yourself quite a lot by testing in
A Gecko engine browser (Firefox)
A WebKit engine browser (Chrome, Safari, Konqueror)
Opera
and IE6+
John Resig recommends checking the Yahoo graded browser support documentation.
If you write unit tests for your JavaScript, you could use TestSwarm: http://testswarm.com
There are multiple options:
http://ipinfo.info/netrenderer/
http://spoon.net/browsers/
These sites let you run multiple browsers and versions without installing them. You only need to install a plugin.
There are plenty of sites, just Google/Bing for browser compatibility check.
http://browsershots.org/ is a good one.
Most of them just take a snapshot of the site, though, so you might have to check things like menus and dynamic content manually.
BrowserShots might do what you want if you can tell by rendering a particular URL whether or not things will work as expected.
In light of your update, you could still use BrowserShots by creating a page which tests each of your scripts and renders 'pass' or 'fail' as its content depending on whether they work or not.
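A sketch of such a page (the specific checks are placeholders; substitute one per script or feature your site relies on): each check feeds into a single verdict that is rendered as page content, so it shows up in a BrowserShots screenshot.

```javascript
// Run a few representative checks and render the verdict into the page.
window.onload = function () {
  var ok = true;
  try {
    ok = ok && (typeof document.getElementById === 'function');
    ok = ok && (typeof encodeURIComponent === 'function');
    // ...one check per script/feature the site depends on
  } catch (e) {
    ok = false; // any thrown error counts as a failure
  }
  document.body.innerHTML = ok ? '<h1>PASS</h1>' : '<h1>FAIL</h1>';
};
```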
Failing that, Multiple IE is quite useful for running various versions of IE on one PC which can otherwise be problematic.