I'm trying to click the Settings button on the home page, but when I do I get this page back:
#<WWW::Mechanize::Page
{url
#<URI::HTTP:0x1023c5fc0 URL:http://www.facebook.com/editaccount.php?ref=mb&drop>}
{meta}
{title nil}
{iframes}
{frames}
{links}
{forms}>
which is... kind of empty! Is there some problem with the iframes and frames, maybe?
As roja mentioned, following redirects might be what you need. Here's an example of how to do this:
require 'mechanize'
@agent = Mechanize.new
@agent.redirect_ok = :all
@agent.follow_meta_refresh = :anywhere
Then you can pretty much ignore the fact that there are redirects involved - Mechanize will simply put you on the resulting page.
Facebook redirects me to https://register.facebook.com/editaccount.php, which I assume is the final destination. Assuming that WWW::Mechanize is set up to follow HTTPS redirects, you should end up there too.
Much of Facebook, like most modern websites, is generated by JavaScript, which I think WWW::Mechanize is unable to cope with; this could be the source of your problem. I recommend trying to scrape while appending "?_fb_noscript=1" to the URLs you visit. This turns off much of Facebook's JavaScript system and should enable a smoother ride for your little bot.
(Do remember this is only an idea, and doubtless whatever you do is against Facebook's usage policy, which makes you a "baddy." I don't condone such badness and believe that baddies should be forced to go to bed early etc... ad nauseam)
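The "?_fb_noscript=1" suggestion can be sketched with Ruby's standard uri library. This is only a sketch: with_noscript is a hypothetical helper name, and whether the parameter has the described effect is taken from the answer above, not verified here.

```ruby
require 'uri'

# Hypothetical helper: append _fb_noscript=1 to a URL while keeping
# any query string that is already present.
def with_noscript(url)
  uri = URI.parse(url)
  parts = [uri.query, '_fb_noscript=1'].compact.reject(&:empty?)
  uri.query = parts.join('&')
  uri.to_s
end

puts with_noscript('http://www.facebook.com/editaccount.php?ref=mb')
```

The returned string can then be handed to the agent's get call in place of the bare URL.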
I moved a wiki install on lighttpd from https://www.example.com/wiki to the subdomain https://wiki.example.com, so I need to redirect anything wiki-related to the new subdomain.
url.rewrite-once = (
"^/wiki" => "https://wiki.example.com",
)
This gives me an error 404 not found as the browser is still pointed to the old page.
In addition I would like to add a rule to handle pages people already have bookmarked such as sending
https://www.example.com/wiki/index.php?title=Main_Page
to
https://wiki.example.com/index.php?title=Main_Page
I ended up doing this:
url.redirect = (
  "^/wiki/(.*)$" => "https://wiki.example.net/$1",
  "^/wiki/([^?]*)(?:\?(.*))?" => "https://wiki.example.net/index.php?title=$1&$2",
)
This works on 99% of the site. However, there are a few forum threads that no longer display correctly because they are being redirected.
This one works and can view the forum normally
https://www.example.net/forums/showthread.php?796166-Wiki-Skins
This one breaks and tries to redirect
https://www.twcenter.net/forums/showthread.php?796105-Wiki-Extensions-amp-Gadgets
While Stack Overflow is a good resource, please also try reading the primary source documentation for the tool you are asking about. In this case, that is the lighttpd documentation.
You might consider using mod_redirect to redirect the client. See documentation and examples at https://redmine.lighttpd.net/projects/lighttpd/wiki/Docs_ModRedirect
I'm trying to implement a 'Remember Me' feature in the new Padrino 0.11 Admin interface, but having a little bit of trouble due to the differences between it and Rails. Basically, I'm following along with http://railscasts.com/episodes/274-remember-me-reset-password.
I've managed to get the Remember Me and auth_token working handily, and I can see the cookie in the Dev console when I go to look at it. I am having a lot of trouble figuring out how to get the application to do autologin on the cookie when it is present though. I'm sure it's something stupid, but this is where I'm up to.
For instance, Remember Me creates an auth_token and sets it in the cookie just fine (I can see it on localhost in the Chrome dev console) via this code in the sessions controller.
admin/controllers/sessions
post :create do
  if account = Account.authenticate(params[:email], params[:password])
    set_current_account(account)
    if params[:remember_me]
      response.set_cookie('da_app', value: account.auth_token,
                          expires: (Time.now + 1.year + 1.day))
    end
    flash[:success] = "You've successfully logged in as #{account.name}."
    redirect url(:base, :index)
  else
    params[:email], params[:password] = h(params[:email]), h(params[:password])
    flash[:error] = pat('login.error')
    redirect url(:sessions, :new)
  end
end
However, due to my inexperience with Padrino, I'm a little stumped as to where I'd put the bit of logic that runs before an incoming request, checks for the cookie and then logs the user in. I tried the following, which is not perfect and is definitely not working (though I'm not sure why... =< ). In fact, the code block that detects the cookie does not even seem to be firing (which seems pretty basic).
admin/app.rb (not sure this is the right place for it actually)
before '/*' do
  if request.cookies['da_app'].exists?
    set_current_account(Account.find_by_auth_token(request.cookies['da_app']))
    redirect url(:base, :index)
  end
end
So, I'm sure it's probably dead simple to solve, but I'm a bit stumped on this one (and I'm also really trying to avoid using a gem plugin like padrino-warden at the moment, so I can implement this from scratch as an exercise).
(Also, bonus karma points on helping solve this one as I'm implementing this as part of some pro bono work for a global conservation charity.)
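A minimal sketch of the cookie check described above, in plain Ruby so it can be tried outside Padrino. account_from_cookie and the finder callable are hypothetical names; in the app the finder would presumably be something like Account.method(:find_by_auth_token). Note that neither String nor nil responds to exists? in plain Ruby, so the sketch tests for nil/empty instead.

```ruby
# Returns the account that matches the remember-me cookie, or nil.
# `cookies` is a Hash shaped like request.cookies; `finder` is any
# callable mapping a token to an account (hypothetical stand-in).
def account_from_cookie(cookies, finder, cookie_name = 'da_app')
  token = cookies[cookie_name]
  return nil if token.nil? || token.empty?
  finder.call(token)
end

# Stubbed usage: an in-memory lookup instead of a real Account model.
accounts = { 'abc123' => 'alice' }
finder = ->(token) { accounts[token] }
p account_from_cookie({ 'da_app' => 'abc123' }, finder)
p account_from_cookie({}, finder)
```

Inside a before filter, the result would only be passed to set_current_account when it is not nil.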
I am using the Watir-Webdriver library in Ruby to check some pages. I know I can connect through a proxy using
require 'watir-webdriver'
# create a new profile
profile = Selenium::WebDriver::Firefox::Profile.new
# create proxy data for the profile
profile.proxy = Selenium::WebDriver::Proxy.new(
  :http => proxy_address, # address of the proxy to connect through
  :ftp => nil,
  :ssl => nil,
  :no_proxy => nil
)
# create a browser window with this profile
browser = Watir::Browser.new :firefox, :profile => profile
browser.goto "http://www.example.com"
browser.close
However, when I want to connect to the same page multiple times using different proxies, I have to create a new browser for every proxy. Loading (and unloading) the browser takes quite some time.
So, my question: is there any way, using webdriver in Ruby, to change the proxy address Firefox connects through while keeping the browser open?
If you want to test whether a page is blocked when accessed through a proxy server, you can do that through a headless library. I recently had success using mechanize. You can probably use net/http as well.
I am still not sure why you need to change the proxy server for a current session.
require 'mechanize'
session = Mechanize.new
session.set_proxy(host, port, user, pass)
session.user_agent_alias = 'Mac Safari'
session.robots = true # observe robots.txt rules
response = session.get(url)
puts response.code
You need to supply the proxy host/port/user/pass (user/pass are optional) and the url. If you get an exception instead (Mechanize raises Mechanize::ResponseCodeError for error statuses), then the response code was probably not a friendly one.
You may need to use an OS level automation tool to automate going through the FF menus to change the setting as a user would.
For Windows users there is the option of either the new RAutomation tool or AutoIt. Both can be used to automate things at the OS UI level, which would let you go into the browser settings and change the proxy there.
Still I'd think if you are checking a larger number of sites that the overhead to change the proxy settings would not be that much compared to all of the site navigation and waiting for pages to load etc.
Unless you are currently taking a 'row traverse' approach and changing proxy settings multiple times for each site you are checking? If that's the case I would go towards more of a by-column method (if we were to presume each column is a proxy, and each row is a site) and fire up the browser for one proxy, check all the sites, then change the proxy and re-check all the sites. That way you'd only be changing the proxy settings once for each proxy which should not add that much overhead to your script.
It might mean a little more work with storing and then reporting results at the end (if you had been writing them out a line at a time) but that's what hashes or arrays are for.
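The by-column approach can be sketched as a nested loop that fills a hash of results. This is a sketch under assumptions: check_by_proxy, open_session and check are hypothetical names, and the browser start-up is stubbed out so the example runs without Firefox.

```ruby
# Check every site once per proxy ("by column"), so the expensive step
# of opening a session for a proxy happens once per proxy, not once
# per site. `open_session` and `check` are caller-supplied callables.
def check_by_proxy(proxies, sites, open_session, check)
  results = Hash.new { |hash, proxy| hash[proxy] = {} }
  proxies.each do |proxy|
    session = open_session.call(proxy) # e.g. start a browser with this proxy
    begin
      sites.each { |site| results[proxy][site] = check.call(session, site) }
    ensure
      session.close if session.respond_to?(:close)
    end
  end
  results
end

# Stubbed usage: sessions are plain strings here instead of browsers.
opened = []
open_session = ->(proxy) { opened << proxy; proxy }
check = ->(session, site) { "#{site} via #{session}" }
results = check_by_proxy(%w[proxy-a proxy-b], %w[site-1 site-2], open_session, check)
p results
```

With a real browser, open_session would build the profile and return the Watir browser, and check would call goto and inspect the page; the results hash can then be reported in one go at the end.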
I am working on a website hosted on Microsoft's Office Live service. It has a contact form enabling visitors to get in touch with the owner. I want to write a Ruby script that sits on a separate server and which the form will POST to. It will parse the form data and email the details to a preset address. The script should then redirect the browser to a confirmation page.
I have an Ubuntu Hardy machine running nginx and Postfix. Ruby is installed and we shall see about using Thin and its Rack functionality to handle the script. Now it's come to writing the script and I've drawn a blank.
It's been a long time, and if I remember rightly the process is something like:
read HTTP header
parse parameters
send email
send redirect header
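The four steps above can be sketched in plain Ruby. This is only an illustration: handle_contact, the form field names and the confirmation URL are assumptions, and the mail delivery is passed in as a callable so the example runs without a mail server. The return value follows the status/headers/body triplet that Rack uses for responses.

```ruby
require 'uri'

# Hypothetical handler. The web server layer (Rack, CGI, ...) has
# already read the HTTP headers, so this receives the raw form body.
def handle_contact(form_body, deliver, confirmation_url)
  params = URI.decode_www_form(form_body).to_h    # parse parameters
  message = "From: #{params['email']}\n\n#{params['message']}"
  deliver.call(message)                           # send email
  [302, { 'Location' => confirmation_url }, []]   # send redirect header
end

# Stubbed usage: collect the mail in an array instead of sending it.
sent = []
status, headers, _body = handle_contact(
  'email=visitor%40example.com&message=Hello+there',
  ->(mail) { sent << mail },
  'http://www.example.com/thanks.html'
)
p status, headers, sent
```

A real handler should validate the submitted fields before putting them into mail headers.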
Broadly speaking, the question has been answered. Figuring out how to use the answer was more complicated than expected and I thought worth sharing.
First Steps:
I learnt rather abruptly that nginx doesn't directly support CGI scripts. You have to use some other process to run the script and get nginx to proxy requests over. If I was doing this in PHP (which in hindsight I think would have been a more natural choice) I could use something like php-fcgi and expect life would be pretty straightforward.
Ruby and fcgi felt pretty daunting. But if we are abandoning the ideal of loading these things at runtime then Rack is probably the most straightforward solution and Thin includes all we need. Learning how to make basic little apps with them has been profoundly beneficial to a relative Rails newcomer like me. The foundations of a Rails app can seem hidden for a long time and Rack has helped me lift the curtain that little bit further.
Nonetheless, following Yehuda's advice and looking up Sinatra has been another surprise. I now have a basic Sinatra app running in a Thin instance. It communicates with nginx over a unix socket in what I gather is the standard way. Sinatra enables a really elegant way to handle different requests and routes into the app. All you need is a get '/' {} to start handling requests to the virtual host. To add more (in a clean fashion) we just include a routes/script.rb into the main file.
# cgi-bin.rb
# main file loaded as a sinatra app
require 'sinatra'
# load cgi routes (require_relative resolves paths against this file)
require_relative 'routes/default'
require_relative 'routes/contact'
# 404 behaviour
not_found do
  "Sorry, this CGI host does not recognize that request."
end
These route files will call on functionality stored in a separate library of classes:
# routes/contact.rb
# contact controller
require_relative '../lib/contact/contactTarget'
require_relative '../lib/contact/contactPost'
post '/contact/:target/?' do |target|
  # the target for the message is taken from the URL
  msg = ContactPost.new(request, target)
  redirect msg.action, 302
end
The sheer horror of figuring out such a simple thing will stay with me for a while. I was expecting to calmly let nginx know that .rb files were to be executed and to just get on with it. Now that this little sinatra app is up and running, I'll be able to dive straight in if I want to add extra functionality in the future.
Implementation:
The ContactPost class handles the messaging aspect. All it needs to know are the parameters in the request and the target for the email. ContactPost::action kicks everything off and returns an address for the controller to redirect to.
There is a separate ContactTarget class that does some authentication to make sure the specified target accepts messages from the URL given in request.referrer. This is handled in ContactTarget::accept? as we can guess from the ContactPost::action method;
# lib/contact/contactPost.rb
class ContactPost
  # ...
  def action
    return failed unless @target.accept? @request.referer
    if send?
      successful
    else
      failed
    end
  end
  # ...
end
ContactPost::successful and ContactPost::failed each return a redirect address by combining paths supplied with the HTML form with the request.referer URI. All the behaviour is thus specified in the HTML form. Future websites that use this script just need to be listed in the user's own ~/cgi/contact.conf and they'll be away. This is because ContactTarget looks in /home/:target/cgi/contact.conf for the details. Maybe one day this will be inappropriate, but for now it's just fine for my purposes.
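The post does not show how successful and failed build their addresses, so the following is only a guess at the idea: resolving a form-supplied path against the referring page with URI.join from the standard library. redirect_address and its arguments are hypothetical names.

```ruby
require 'uri'

# Resolve a path supplied by the form against the referring page's URI.
# `referer` stands in for request.referer, `path` for a form field.
def redirect_address(referer, path)
  URI.join(referer, path).to_s
end

puts redirect_address('http://www.example.com/contact.html', '/thanks.html')
puts redirect_address('http://www.example.com/pages/contact.html', 'sent.html')
```

An absolute path replaces the referer's path entirely, while a relative one is resolved against the referer's directory.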
The send method is simple enough: it creates an instance of a simple Email class and ships it out. The Email class is pretty much based on the standard usage example given in the Ruby net/smtp documentation;
# lib/email/email.rb
require 'date'
require 'net/smtp'
class Email
  def initialize(from_alias, to, reply, subject, body)
    @from_alias = from_alias
    @from = "cgi_user@host.domain.com"
    @to = to
    @reply = reply
    @subject = subject
    @body = body
  end
  # deliver the message through the local SMTP server
  def send
    Net::SMTP.start('localhost', 25) do |smtp|
      smtp.send_message to_s, @from, @to
    end
  end
  # headers, then a blank line, then the body
  def to_s
    <<END_OF_MESSAGE
From: #{@from_alias}
To: #{@to}
Reply-To: #{@from_alias}
Subject: #{@subject}
Date: #{DateTime::now().to_s}

#{@body}
END_OF_MESSAGE
  end
end
All I need to do is rack up the application, let nginx know which socket to talk to and we're away.
Thank you everyone for your helpful pointers in the right direction! Long live sinatra!
It's all in the Net module, here's an example:
require 'net/http'
# Net::HTTP.new takes a host name, not a full URL
@net = Net::HTTP.new 'www.foo.com', 80
@params = {:name => 'doris', :email => 'doris@foo.com'}
# Create HTTP request
req = Net::HTTP::Post.new( '/script.cgi', {} )
req.set_form_data @params
# Send request
response = @net.start do |http|
  http.read_timeout = 5600
  http.request req
end
Probably the best way to do this would be to use an existing Ruby library like Sinatra:
require "rubygems"
require "sinatra"
# the form POSTs to the script, so handle POST requests
post "/myurl" do
  # params hash available here
  # send email
end
You'll probably want to use MailFactory to send the actual email, but you definitely don't need to be mucking about with headers or parsing parameters.
Ruby's CGI class can be used for writing CGI scripts. Please check: http://www.ruby-doc.org/stdlib/libdoc/cgi/rdoc/index.html
By the way, there is no need to read the HTTP header. Parsing parameters will be easy using the CGI class. Then, send the e-mail and redirect.
I have to access some pages at work and then log into them to report any problems. I was thinking of writing a program to do this.
First, I have to be able to access the pages, then I have to locate the login form and send the info. Currently, I plan on printing true/false for each test (accessibility and login) and then filling the forms myself. I'm hoping to be able to write something to automate this later.
I was thinking of using Ruby, although I haven't coded in it yet, it seems like it'd make the whole thing easier. I've worked the most with Java, though I have some experience with C++ and a bit of experience with C.
Any advice?
You can use Selenium IDE. It is a record and playback tool for simple web tests, which you can then save as tests for Selenium RC in any language you want. I hope it helps.
The Python urllib2 module makes it easy to interact with an HTTP server. You can use urllib2 to read the page and verify the content. You can do a POST with the urlencoded form data and verify the content.
Further, Python has a simple unittest library that will help you structure your tests.
import unittest
import urllib
import urllib2
class TestForm( unittest.TestCase ):
    def testFillInForm( self ):
        data= urllib.urlencode( { "field1": "value", "field2": "value" } )
        response= urllib2.urlopen( "http://localhost/path/to/form", data )
        # check the response, for example its status code
        self.assertEqual( 200, response.getcode() )
if __name__ == "__main__":
    unittest.main()
Ruby, PHP and Python all have easy to use HTTP libraries which make this kind of an operation pretty easy. Any of these languages would work fine.
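A minimal Ruby sketch of the two checks from the question (is the page accessible, and does it contain a login form), using only the standard library. check_page and login_form? are hypothetical names, and spotting a login form by looking for a password field is an assumption that will not fit every page.

```ruby
require 'net/http'
require 'uri'

# True when the HTML contains a password field, a rough sign of a login form.
def login_form?(html)
  !!(html =~ /<input[^>]*type=["']?password/i)
end

# Returns [accessible, has_login_form] for one URL.
# Any network or parsing error counts as "not accessible".
def check_page(url)
  response = Net::HTTP.get_response(URI.parse(url))
  ok = response.is_a?(Net::HTTPSuccess)
  [ok, ok && login_form?(response.body)]
rescue StandardError
  [false, false]
end

# Usage (needs a reachable server):
#   accessible, has_login = check_page('http://localhost/path/to/form')
#   puts accessible, has_login
```

Submitting the login itself is easier with a library that understands forms, such as Mechanize.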
If you want to do this in Ruby, the Mechanize gem would be perfect for this:
require 'mechanize'
agent = Mechanize.new
page = agent.get('http://localhost/path/to/form')
login_form = page.forms.first # assuming the first form is the one we want
login_form.username = 'myusername'
login_form.password = 'mypassword'
page = agent.submit(login_form)
puts page.body # just to see the results
I have found CURL to be really useful and easy to use as well under PHP. Easy to learn.
Handles cookies, HTTPS, etc.
All good.