Let me set the stage for what I'm trying to accomplish. In a physics class I'm taking, my teacher likes to brag about how impossible it is to cheat in her class, because all of her assignments are done through WebAssign. The way WebAssign works is this: everyone gets the same questions, but the numbers used in each question are randomized, so each student has different numbers and thus a different answer. So I've been writing Ruby scripts that solve the questions for people once they input their specific numbers.
I would like to automate this process using mechanize. I've used mechanize plenty of times before, but I'm having trouble logging in to the site. I'll submit the form and it returns the same page I was just on. You can take a look at the site's source code, at http://webassign.net, and I've also tried using the login at http://webassign.net/login.html with no luck either.
Let me follow all of this up with some ruby code that doesn't do what I want it to:
require 'rubygems'
require 'mechanize'
agent = Mechanize.new
page = agent.get("http://www.webassign.net/login.html")
form = page.forms.last
puts "Enter your username"
form.WebAssignUsername = gets.chomp
puts "Enter your password (Don't worry, we don't save this)"
form.WebAssignPassword = gets.chomp
form.WebAssignInstitution = "trinityvalley.tx"
form.submit #=> Returns original page
If anyone really takes an interest in getting this to work, I would be more than happy to send them a working username and password.
The site could be checking that the Login post variable is set (see the login button). Try adding form.Login = "Login".
Have you tried to use agent.submit(form, form.buttons.first) instead of form.submit?
This worked for me when I tried to submit a form. I tried using form.submit first and it kept returning the original page.
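A possible explanation for why these two suggestions help: as far as I can tell, form.submit with no argument leaves the submit button out of the POST body, while agent.submit(form, button) adds the button's name/value pair, which is what a browser sends when you click it. Here is a minimal sketch of the difference using only the standard library. The field names come from the question; the values, and the assumption that the button is named Login with the value Login, are placeholders.

```ruby
require 'uri'

# Field names are taken from the question; the values are placeholders.
FIELDS = {
  'WebAssignUsername'    => 'student',
  'WebAssignPassword'    => 'secret',
  'WebAssignInstitution' => 'trinityvalley.tx'
}.freeze

# Encode the form fields, optionally appending the clicked button's
# name/value pair the way a browser would.
def login_body(fields, button = nil)
  pairs = fields.to_a
  pairs << button if button
  URI.encode_www_form(pairs)
end

# Without the button (what a bare form.submit appears to send):
puts login_body(FIELDS)

# With the button; 'Login' => 'Login' is an assumption about its name/value:
puts login_body(FIELDS, ['Login', 'Login'])
```

If the server only treats the request as a login attempt when the button field is present, the first body would just get the login page back.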
Try setting the user agent:
agent = Mechanize.new do |a|
a.user_agent_alias = 'Mac Safari'
end
Some sites seem to require that.
Your question seems a little ambiguous: you say you're not having any luck, but what is the problem exactly? Are you getting a different response entirely than when you view the page in a browser? If so, then do what @cam says and analyze the headers; you can do it in Firefox via an extension, or you can do it in Chrome natively. Either way, try to mimic the headers that you see in your browser in your Mechanize user agent. Here is a script that I used to mimic the iTunes request headers when I was data-mining the App Store:
def mimic_itunes( mech_agent )
mech_agent.pre_connect_hooks << lambda {|headers|
headers[:request]['X-Apple-Store-Front'] = X_APPLE_STOREFRONT;
headers[:request]['X-Apple-Tz'] = X_APPLE_TZ;
headers[:request]['X-Apple-Validation'] = X_APPLE_VALIDATION;
}
mech_agent.user_agent = 'iTunes/9.1.1 (Windows; Microsoft Windows 7 x64 Business Edition (Build 7600)) AppleWebKit/531.22.7'
mech_agent
end
Note: the constants in the example are just strings. It's not really important what they are, as long as you know you can put any string there.
Using this approach, you should be able to alter/add any headers that the web application might need.
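To see the same idea without Mechanize, here is a rough sketch that builds (but never sends) a request with Net::HTTP from the standard library and copies a set of browser headers onto it. The header values are made-up placeholders; in practice you would paste in whatever your browser's network inspector shows.

```ruby
require 'net/http'
require 'uri'

# Hypothetical header values; replace them with the ones your browser sends.
BROWSER_HEADERS = {
  'User-Agent'      => 'Mozilla/5.0 (placeholder browser string)',
  'Accept'          => 'text/html,application/xhtml+xml',
  'Accept-Language' => 'en-US,en;q=0.9'
}.freeze

# Build a GET request carrying the given headers. Nothing is sent here.
def build_request(url, headers)
  request = Net::HTTP::Get.new(URI(url))
  headers.each { |name, value| request[name] = value }
  request
end

request = build_request('http://www.webassign.net/login.html', BROWSER_HEADERS)

# Print every header the request would carry, including Net::HTTP's defaults.
request.each_header { |name, value| puts "#{name}: #{value}" }
```

Comparing this printout with the browser's headers is a quick way to spot which one the site might be checking.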
If this is not the problem that you are having, then post more in-depth details of what exactly is happening.
so here it is:
I use Ruby to get user input the easy way. Say I request 2 inputs:
input1 = gets.chomp
input2 = gets.chomp
Now I would like to send this information to say, a search engine that takes these two options separately and does the search. How can I do this? What API/Gems will be helpful for me in this case?
I know that I can take these 2 inputs and insert them into the URL, but it's not that simple, because the URL structure changes depending on the inputs. (I wouldn't want to do it this way anyway.)
It's been a long time since I last programmed in Ruby. I know how to access webpages and things like that, but I want to send data and receive the results back. Any ideas?
If you are talking about some front-end of a site without any API access or sophisticated JS logic, you could simply use mechanize gem which allows you to do something like:
require 'rubygems'
require 'mechanize'
agent = Mechanize.new
page = agent.get('http://google.com/')
form = page.forms.first
form['field_name_1'] = input1
form['field_name_2'] = input2
page = agent.submit(form, form.buttons.first)
puts page
→ Check out the official documentation for more examples
If you are going to use a third-party REST API, you would be better off with something like faraday or another popular gem (depending on your taste and the particular task).
Please correct me if I misunderstood you.
From what I understand, you want to encode your two inputs in a URL, send them to an API and receive the results back.
You can use the Net::HTTP library from the Ruby stdlib. Here's the example with dynamic parameters from the docs:
require 'net/http'
uri = URI('http://example.com/index.html')
params = { :limit => 10, :page => 3 }
uri.query = URI.encode_www_form(params)
res = Net::HTTP.get_response(uri)
puts res.body if res.is_a?(Net::HTTPSuccess)
Or you can use some gems to wrap it up for you. HTTParty seems quite popular. You can do it as simple as
HTTParty.get('http://foo.com/resource.json', query: {limit: 10})
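Since the question mentions that the URL structure changes depending on the inputs, here is a small sketch of one way to handle that: build the query from a hash and drop whatever the user left blank. The base URL and the parameter names (q, lang) are placeholders, not any real search engine's API.

```ruby
require 'uri'

# Build a search URL from whichever inputs were actually filled in.
# Blank or nil values are left out of the query string entirely.
def search_uri(base, params)
  filled = params.reject { |_, value| value.nil? || value.strip.empty? }
  uri = URI(base)
  uri.query = URI.encode_www_form(filled) unless filled.empty?
  uri
end

puts search_uri('http://example.com/search', q: 'ruby gems', lang: 'en')
# prints http://example.com/search?q=ruby+gems&lang=en

puts search_uri('http://example.com/search', q: 'ruby gems', lang: '')
# prints http://example.com/search?q=ruby+gems
```

The resulting URI can be passed straight to Net::HTTP.get_response as in the example above.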
I'm trying to log into Google, so that I can scrape & migrate a private google group.
It doesn't seem to log in over SSL. Any ideas appreciated. I'm using Mechanize and the code is below:
group_signin_url = "https://login page to Google, with referrer url to a private group here"
user = ENV['GOOGLE_USER']
password = ENV['GOOGLE_PASSWORD']
scraper = Mechanize.new
scraper.user_agent = Mechanize::AGENT_ALIASES["Linux Firefox"]
scraper.agent.http.verify_mode = OpenSSL::SSL::VERIFY_NONE
page = scraper.get group_signin_url
google_form = page.form
google_form.Email = user
google_form.Passwd = password
group_page = scraper.submit(google_form, google_form.buttons.first)
pp group_page
I worked with Ian (the OP) on this problem and just felt we should close this thread with some answers based on what we found when we spent some more time on the problem.
1) You can't scrape a Google Group with Mechanize. We managed to get logged in, but the content of the Google Group pages is all rendered in-browser, meaning that HTTP requests, such as those issued by Mechanize, are returned with a few links and no actual content.
We found that we could get page content by the use of Selenium (we used Selenium in Firefox, using the Ruby bindings).
2) The HTML element IDs/classes in Google Groups are obfuscated, but we found that these Selenium commands will pull out the bits you need (until Google change them):
message snippets (click on them to expand messages)
find_elements(:class, 'GFP-UI5CCLB')
elements with name of author
find_elements(:class, 'GFP-UI5CA1B')
elements with content of post
find_elements(:class, 'GFP-UI5CCKB')
elements containing date
find_elements(:class, 'GFP-UI5CDKB') (and then use the attribute[:title] for a full length date string)
3) I have some Ruby code here which scrapes the content programmatically and uploads it into a Discourse forum (which is what we were trying to migrate to).
It's hacky but it kind of works. I recently migrated 2 commercially important Google Groups using this script. I'm up for taking on 'We Scrape Your Google Group' type work, please PM me.
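For anyone wanting to reuse the class names from point 2, here is a rough sketch of how they could be kept in one place so they are easy to update when Google changes them. It is not the migration script mentioned above. The helper name scrape_texts is made up, and driver is assumed to be anything that responds to find_elements(:class, name) and returns elements responding to text, as a Selenium WebDriver instance does. It only collects visible text; for dates you would still read the title attribute as described.

```ruby
# Class names reported above; Google may change them at any time.
GROUP_SELECTORS = {
  snippets: 'GFP-UI5CCLB',
  authors:  'GFP-UI5CA1B',
  bodies:   'GFP-UI5CCKB',
  dates:    'GFP-UI5CDKB'
}.freeze

# Collect the visible text for each kind of element on the current page.
# Returns a hash such as { authors: ['...'], bodies: ['...'], ... }.
def scrape_texts(driver, selectors = GROUP_SELECTORS)
  selectors.each_with_object({}) do |(label, css_class), result|
    result[label] = driver.find_elements(:class, css_class).map(&:text)
  end
end
```

With the selenium-webdriver gem installed, you would create the driver with Selenium::WebDriver.for(:firefox), navigate to the group page, and then call scrape_texts(driver).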
I recently discovered SitePrism via the rubyweekly email.
It looks amazing. I can see it's going to be the future.
The examples I have seen are mostly for cucumber steps.
I am trying to figure out how one would go about using SitePrism with rspec.
Assuming @home_page for the home page, and @login_page for the login page
I can understand that
@home_page.load # => visit @home_page.expanded_url
however, the part I am not sure about is this: if I then click on, for example, the "login" link, and the browser in Capybara goes to the login page, how can I access an instance of the login page without loading it?
@home_page = HomePage.new
@home_page.load
@home_page.login_link.click
# Here I know the login page should be loaded, so I can perhaps do
@login_page = LoginPage.new
@login_page.should be_displayed
@login_page.email_field.set("some@email.com")
@login_page.password_field.set("password")
@login_page.submit_button.click
etc...
That seems like it might work. So, when you know you are supposed to be on a specific page, you create an instance of that page, and somehow the capybara "page" context, as in page.find("a[href='/sessions/new']") is transferred to the last SitePrism object?
I just feel like I am missing something here.
I'll play around and see what I can figure out - just figured I might be missing something.
I am looking through the source, but if anyone has figured this out... feel free to share :)
What you've assumed turns out to be exactly how SitePrism works :) Though you may want to check the epilogue of the readme that explains how to save yourself from having to instantiate page objects all over your test code. Here's an example:
# our pages
class Home < SitePrism::Page
#...
end
class SearchResults < SitePrism::Page
#...
end
# here's the app class that represents our entire site:
class App
def home
Home.new
end
def results_page
SearchResults.new
end
end
# and here's how to use it:
#first line of the test...
@app = App.new
@app.home.load
@app.home.search_field.set "sausages"
@app.home.search_button.click
@app.results_page.should be_displayed
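On the original worry about how the "page" context gets transferred: my understanding is that page objects hold no browser state of their own and simply talk to the one current browser session, so a freshly created page object sees whatever page the browser is on. The sketch below shows that idea in plain Ruby. FakeSession and PageObject are stand-ins invented for illustration, not SitePrism's or Capybara's real classes.

```ruby
# Stand-in for the single browser session shared by the whole test.
module FakeSession
  class << self
    attr_accessor :current_path
  end
end

# Stand-in for a page object base class: it stores no browser state itself.
class PageObject
  # Class-level setter/getter for the path this page lives at.
  def self.path(value = nil)
    @path = value if value
    @path
  end

  # "Visit" the page by pointing the shared session at its path.
  def load
    FakeSession.current_path = self.class.path
  end

  # True when the shared session is currently on this page's path.
  def displayed?
    FakeSession.current_path == self.class.path
  end
end

class HomePage < PageObject
  path '/'

  # Simulates clicking the login link, which moves the shared session.
  def click_login
    FakeSession.current_path = '/sessions/new'
  end
end

class LoginPage < PageObject
  path '/sessions/new'
end

home = HomePage.new
home.load
home.click_login
puts LoginPage.new.displayed? # a brand-new object already sees the login page
```

Because nothing is stored per object, creating page objects on demand (as the App class does) costs nothing and cannot get out of sync with the browser.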
I have to say I am new both to Ruby and to RSpec. Anyway I completed one RSpec script but after refactoring it failed. Here is the original working version:
describe Site do
browser = Watir::Browser.new :ie
site = Site.new(browser, "http://localhost:8080/site")
it "can navigate to any page at the site" do
site.pages_names.each do |page_name|
site.goto(page_name)
site.actual_page.name.should eq page_name
end
end
browser.close
end
and here is the modified version - I wanted to have reported all the pages which were visited during the test:
describe Site do
browser = Watir::Browser.new :ie
site = Site.new(browser, "http://localhost:8080/site")
site.pages_names.each do |page_name|
it "can navigate to #{page_name}" do
site.goto(page_name)
site.actual_page.name.should eq page_name
end
end
browser.close
end
The problem in the latter case is that site gets evaluated to nil within the code block associated with 'it' method.
But when I did this:
...
s = site
it "can navigate to #{page_name}" do
s.goto(page_name)
s.actual_page.name.should eq page_name
end
...
the nil problem was gone but tests failed with the reason "browser was closed"
Apparently I am missing some very basic Ruby knowledge, because the browser reference is not working correctly in the modified script. Where did I go wrong? What refactoring should be applied to make this work?
Thanks for your help!
It's important to understand that RSpec, like many ruby programs, has two runtime stages:
During the first stage, RSpec loads each of your spec files, and executes each of the describe and context blocks. During this stage, the execution of your code defines your examples, the hooks, etc. But your examples and hooks are NOT executed during this stage.
Once RSpec has finished loading the spec files (and all examples have been defined), it executes them.
So...trimming down your example to a simpler form, here's what you've got:
describe Site do
browser = Watir::Browser.new :ie
it 'does something with the browser' do
# do something with the browser
end
browser.close
end
While visually it looks like the browser instance is instantiated, then used in the example, then closed, here's what's really happening:
The browser instance is instantiated
The example is defined (but not run)
The browser is closed
(Later, after all examples have been defined...) The example is run
As O.Powell's answer shows, you can close the browser in an after(:all) hook to delay the closing until after all examples in this example group have run. That said, I'd question if you really need the browser instance at example definition time. Generally you're best off lazily creating resources (such as the browser instance) when examples need them as they are running, rather than during the example definition phase.
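To make the ordering concrete, here is a toy stand-in for RSpec written in plain Ruby. It is not how RSpec is actually implemented, and ToyGroup, after_all, eager_order and hooked_order are names invented for this sketch. It only records when each piece of code runs, first with the browser closed in the describe body and then with the close moved into a hook.

```ruby
# A toy example group that logs when each piece of code runs.
class ToyGroup
  attr_reader :log

  def initialize
    @log = []
    @examples = []
    @after_all = []
  end

  # Definition phase: the block body runs right away,
  # but `it` and `after_all` only store their blocks for later.
  def describe(&block)
    instance_eval(&block)
    self
  end

  def it(name, &block)
    @examples << [name, block]
  end

  def after_all(&block)
    @after_all << block
  end

  # Execution phase: run the stored examples, then the after-all hooks.
  def run
    @examples.each { |_, block| instance_eval(&block) }
    @after_all.each { |block| instance_eval(&block) }
    self
  end
end

# Closing the browser directly in the describe body.
def eager_order
  ToyGroup.new.describe do
    log << 'open browser'
    it('uses the browser') { log << 'example runs' }
    log << 'close browser'
  end.run.log
end

# Closing the browser in a hook that runs after the examples.
def hooked_order
  ToyGroup.new.describe do
    log << 'open browser'
    it('uses the browser') { log << 'example runs' }
    after_all { log << 'close browser' }
  end.run.log
end

p eager_order  # ["open browser", "close browser", "example runs"]
p hooked_order # ["open browser", "example runs", "close browser"]
```

The first ordering is exactly the "browser was closed" failure from the question; the second is what the after(:all) hook buys you.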
I replicated your code above using fake classes for Site and Watir. It worked perfectly. My only conclusion then is that the issue must lie with either one of the above classes. I noticed the Site instance only had to visit one page in your first working version, but has to visit multiple pages in the non working version. There may be an issue there involving the mutation happening inside the instance.
See if this makes a difference:
describe Site do
uri = "http://localhost:8080/site"
browser = Watir::Browser.new :ie
pages_names = Site.new(browser, uri).pages_names
before(:each) { @site = Site.new(browser, uri) }
after(:all) { browser.close }
pages_names.each do |page_name|
it "can navigate to #{page_name}" do
@site.goto(page_name)
@site.actual_page.name.should eq page_name
end
end
end
I am trying to use ruby and Mechanize to parse data on foursquare's website. Here is my code:
require 'rubygems'
require 'mechanize'
agent = Mechanize.new
page = agent.get('https://foursquare.com')
page = agent.click page.link_with(:text => /Log In/)
form = page.forms[1]
form.F12778070592981DXGWJ = ARGV[0]
form.F1277807059296KSFTWQ = ARGV[1]
page = form.submit form.buttons.first
puts page.body
But then, when I run this code, the following error popped up:
C:/Ruby192/lib/ruby/gems/1.9.1/gems/mechanize-2.0.1/lib/mechanize/form.rb:162:in
`method_missing': undefined method `F12778070592981DXGWJ='
for #<Mechanize::Form:0x2b31f70> (NoMethodError)
from four.rb:10:in `<main>'
I checked and found that the two field names on the form object, "F12778070592981DXGWJ" and "F1277807059296KSFTWQ", change every time I open foursquare's webpage.
Has anyone had the same problem before, where the field names change every time you open a webpage? How should I solve this problem?
Our project is about parsing the data on foursquare. So I need to be able to login first.
Mechanize is useful for sites which don't expose an API, but Foursquare has an established REST API already. I'd recommend using one of the Ruby libraries, perhaps foursquare2. These libraries abstract away things like authentication, so you just have to register your app and use the provided keys.
Instead of indexing the form fields by their name, just index them by their order. That way you don't have to worry about the name that changes on each request:
form.fields[0].value = ARGV[0]
form.fields[1].value = ARGV[1]
...
However, like dwhalen said, using the REST API is probably a much better way. That's why it's there.