I'm having a problem accessing a form element on a page I'm fetching with Mechanize.
username_page = agent.get 'https://member.carefirst.com/mos/#/home'
username_form = username_page.form_with(name: 'soloLoginForm')
username_form is nil, even though username_page does contain the page. The page definitely has a form, and its selector is #soloLoginForm, but username_page.body has no form element.
I'm guessing this is some async or dynamic issue. I'm able to grab the form with poltergeist, and I'm looking into doing all my form filling with capybara/poltergeist, but I wonder if there's something simple I'm missing that will allow me to use mechanize, as I'd planned.
It seems that 'https://member.carefirst.com/mos/#/home' uses Angular to render elements of the page. AngularJS requires JavaScript support in the browser or, in your case, a Capybara driver with JavaScript support.
Mechanize doesn't support JavaScript; check this old SO thread. This is probably why it works when you try with Poltergeist.
Check: https://github.com/teamcapybara/capybara#drivers
As stated in the answer by @hernanvicente, the page uses Angular and requires JS (which Mechanize does not support). However, you really want to be using Selenium with headless Chrome rather than Poltergeist nowadays. Poltergeist is equivalent to a roughly 7-year-old version of Safari (because PhantomJS, which it uses for rendering, has been abandoned), so it doesn't support a lot of the JS and CSS used in modern sites. Another advantage of using Selenium with Chrome is that you can easily swap between headless and headed modes to see what's happening when you need to debug things.
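A minimal sketch of that setup, assuming the capybara and selenium-webdriver gems and a local Chrome install. The helper names (chrome_arguments, build_session), the driver names, and the window size are made up for illustration, and the form selector is taken from the question rather than verified against the live page.

```ruby
# Sketch only: requires the capybara and selenium-webdriver gems plus a local Chrome.

# Builds the Chrome command-line flags; pass headless: false to watch the browser.
def chrome_arguments(headless: true)
  args = ['--window-size=1280,800']
  args << '--headless' if headless
  args
end

# Registers a Capybara driver (hypothetical names) and returns a session using it.
def build_session(headless: true)
  require 'capybara'
  require 'selenium-webdriver'

  driver_name = headless ? :chrome_headless_sketch : :chrome_headed_sketch
  Capybara.register_driver(driver_name) do |app|
    options = Selenium::WebDriver::Chrome::Options.new
    chrome_arguments(headless: headless).each { |arg| options.add_argument(arg) }
    Capybara::Selenium::Driver.new(app, browser: :chrome, options: options)
  end
  Capybara::Session.new(driver_name)
end

# Usage (not run here):
#   session = build_session(headless: true)
#   session.visit 'https://member.carefirst.com/mos/#/home'
#   form = session.find('form[name="soloLoginForm"]') # waits for the form to render
```

Switching to build_session(headless: false) opens a visible Chrome window, which is the debugging convenience mentioned above.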
I have been using PhantomJS to render a canvas element on my page using Three.js.
Earlier I was building the page myself, but that did not give me the option of adding a background image through a system URL.
After that I started using a localhost URL, which did not work when I used page.evaluate();
But to my surprise, when I use a Ruby/Watir browser with Selenium to do the same operation using the execute_script method, it works.
I want to know what it is doing differently, so that I can implement it in my PhantomJS script instead of depending on Watir/Selenium.
Thanks in advance.
I'm scraping some HTML pages with Rails, using Nokogiri.
I had some problems when I tried to scrape an AngularJS page because the gem opens the HTML before it has been fully rendered.
Is there some way to scrape this type of page? How can I have the page fully rendered before scraping it?
If you're trying to scrape AngularJS pages in a fully generic fashion, then you're likely going to need something like what @tadman mentioned in the comments (PhantomJS) -- some type of headless browser that fully processes the AngularJS JavaScript and opens the DOM up to inspection afterwards.
If you have a specific site or sites that you are looking to scrape, the path of least resistance is likely to avoid the AngularJS frontend entirely and directly query the API from which the Angular code is pulling content. The standard scenario for many/most AngularJS sites is that they pull down the static JS and HTML code/templates, and then they make ajax calls back to a server (either their own, or some third party API) to get content that will be rendered. If you take a look at their code, you can likely directly query whatever angular is calling (i.e. via $http, ngResource, or restangular). The return data is typically JSON and would be much easier to gather vs. true scraping in the post-rendered html result.
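As a rough sketch of the direct-API approach, using only Ruby's standard library. The endpoint URL and the payload shape ({"items": [{"title": ...}]}) are hypothetical; the real ones have to be read from the site's Angular code or the browser's network tab.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Hypothetical endpoint; replace with the URL the Angular code actually calls.
API_URL = 'https://example.com/api/items?page=1'

# Fetches the raw response body, raising on any non-2xx status.
def fetch_body(url)
  response = Net::HTTP.get_response(URI.parse(url))
  raise "Request failed: #{response.code}" unless response.is_a?(Net::HTTPSuccess)

  response.body
end

# Assumes a payload shaped like {"items": [{"title": "..."}]}.
def extract_titles(json_body)
  data = JSON.parse(json_body)
  data.fetch('items', []).map { |item| item['title'] }
end

# Usage (not run here):
#   puts extract_titles(fetch_body(API_URL))
```

Keeping the parsing separate from the HTTP call makes it easy to check the extraction against a saved response before hitting the server.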
You can use:
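require 'phantomjs'
require 'watir'
require 'nokogiri'
# Start a PhantomJS-backed browser so the page's JavaScript actually runs
b = Watir::Browser.new(:phantomjs)
# URL is the address of the page you want to scrape
b.goto URL
# Parse the rendered HTML with Nokogiri
doc = Nokogiri::HTML(b.html)
Download PhantomJS from http://phantomjs.org/download.html and move the binary to /usr/bin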
I am trying to fill in a web form with my script. It works for web forms that have <form> </form> tags, but my site does not have these in its HTML. Is there any way to fill in such a form in Firefox using AutoIt?
_FF_AutoLogin($uName,$pwd,$url) fails in this case, so I am using
_FFSetValue($uName,$formUID,"id")
_FFSetValue($pwd,$formPID,"id")
Even this does not fill in the fields. Can anyone suggest where I am going wrong? I am using the latest version of Mozilla Firefox along with the MozRepl add-on.
Use _FFSetValueById to set the value of the element based on its ID.
I was trying to parse some HTML content from a site. Nokogiri works perfectly for content loaded the first time.
Now the issue is how to fetch that content which is loaded using AJAX. For example, there is a "see more" link and more items are fetched using AJAX, or consider a case for AJAX-based tabs.
How can I fetch that content?
Using Nokogiri, you won't be able to parse any content that requires a JavaScript runtime to produce it. Nokogiri is an HTML/XML parser, not a web browser.
PhantomJS on the other hand is a web browser, albeit a special kind of browser ;) Take a look at that and have a play.
It isn't completely clear what you want to do, but if you are trying to get access to additional HTML that is loaded by AJAX, then you will need to study the code, figure out what URL is being used for the AJAX request, whether any session IDs or cookies have been set, then create a new URL that reproduces what AJAX is using. Request that, and you should get the new content back.
That can be difficult to do though. As @Nuby said, Mechanize could be a good help, as it is designed to manage cookies and sessions for you in the background. Mechanize uses Nokogiri internally, so if you request a page from Mechanize, you can use Nokogiri searches against it to drill down and extract any particular JavaScript strings. They'll be present as text, so you can use regex or substring matches to get at the particular parameters you need, then construct the new URL and ask Mechanize to get it.
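A small sketch of the regex step, using only the standard library. The loadMore(...) call pattern, its parameter names, and the example.com URLs are invented for illustration; a real page will need its own pattern.

```ruby
require 'uri'

# Hypothetical inline call: loadMore("/items/more", {page: 2, token: "abc123"})
AJAX_CALL = /loadMore\("(?<path>[^"]+)",\s*\{page:\s*(?<page>\d+),\s*token:\s*"(?<token>[^"]+)"\}\)/

# Returns the URL the AJAX call would request, or nil if the pattern is absent.
def ajax_url(base_url, page_source)
  match = AJAX_CALL.match(page_source)
  return nil unless match

  query = URI.encode_www_form(page: match[:page], token: match[:token])
  "#{URI.join(base_url, match[:path])}?#{query}"
end

# Usage with Mechanize (not run here):
#   page = agent.get(base_url)
#   more = agent.get(ajax_url(base_url, page.body))
```

Because the same Mechanize agent makes both requests, any cookies or session state set by the first page are sent along with the second.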
Can I get the correct data/text that is displayed via AJAX using Mechanize in Ruby?
Or is there any other scripting gem that would allow me to do so?
Mechanize cannot read data displayed by JavaScript, because it does not implement a JavaScript engine (in other words, it can't run it). You'll need a browser to do that, or a program that automates a browser to do it for you. WATIR is one such program.
You can use WATIR with WebDriver to drive a headless browser, which runs without a visible window.