Ruby: Problems using Mechanize to access my form!

Just for fun, I wrote a very small rails blog (just a hello world).
Now I want to create a post using mechanize.
So I created a Ruby Prog and started coding.
Here is my problem:
Rails creates my form element including all inputs.
In HTML my inputs look like this:
<input type="text" size="30" name="post[title]" id="post_title">
or
<textarea rows="20" name="post[description]" id="post_description" cols="40"></textarea>
Well...
Here is my Ruby Prog using Mechanize:
require 'rubygems'
require 'mechanize'
agent = WWW::Mechanize.new
page = agent.get('http://localhost:3000/posts/new')
target_form = page.form_with(:class => 'new_post')
target_form.post[title] = "test"
target_form.post[description] = "test"
page = agent.submit(target_form)
puts "end"
I know where my error is but I don't know how to fix it.
At target_form.post[title] = "test" it crashes because of
undefined method `name' for nil:NilClass (NoMethodError)
I think (please correct me) it's because the input name is post[title] instead of just post, right?
How can I fix it?

How about
target_form.field_with(:name => "post[title]").value = "test"
target_form.field_with(:name => "post[description]").value = "test"
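Some background on why the full string is needed: Ruby reads target_form.post[title] as a call to a method named post followed by an index lookup with a variable named title, while the HTML field name is the single string "post[title]". The sketch below uses only the standard library (no Mechanize) to show that the brackets are just characters in the name that gets submitted. The FIELDS constant and the encode_fields helper are names made up for this illustration.

```ruby
require 'uri'

# Field names copied verbatim from the HTML "name" attributes in the question.
FIELDS = {
  'post[title]'       => 'test',
  'post[description]' => 'test'
}.freeze

# Encode the fields the way an application/x-www-form-urlencoded body is built.
# (Hypothetical helper, not part of Mechanize.)
def encode_fields(fields)
  URI.encode_www_form(fields)
end

body = encode_fields(FIELDS)
puts body

# Decoding returns the names as plain strings, brackets included.
puts URI.decode_www_form(body).inspect
```

The same literal string is what field_with(:name => ...) matches against, which is why the lookup above works where target_form.post[title] does not.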

Related

How do I print XPath value?

I want to print the contents of an XPath node. Here is what I have:
require "mechanize"
agent = Mechanize.new
agent.get("http://store.steampowered.com/promotion/snowglobefaq")
puts agent.xpath("//*[@id='item_52b3985a70d58']/div[4]")
This returns: <main>: undefined method xpath for #<Mechanize:0x2fa18c0> (NoMethodError).
I just started using Mechanize and have no idea what I'm doing, however, I've used Watir and thought this would work but it didn't.
You can use Nokogiri to parse the page after retrieving it. Here is the example code:
m = Mechanize.new
result = m.get("http://google.com")
html = Nokogiri::HTML(result.body)
divs = html.xpath('//div').map { |div| div.content } # here you can do whatever is needed with the divs
# I've mapped their content into an array
There are two things wrong:
The ID doesn't exist on that page. Try this to see the list of tag IDs available:
require "open-uri"
require 'nokogiri'
doc = Nokogiri::HTML(URI.open("http://store.steampowered.com/promotion/snowglobefaq"))
puts doc.search('[id*="item"]').map{ |n| n['id'] }.sort
The correct chain of methods is agent.page.xpath.
Because there is no sample HTML showing exactly which tag you want, we can't help you much.

Get website headline with Nokogiri

I'm trying to get a website's headline (in Vietnamese) using Nokogiri:
# encoding: utf-8
require 'rubygems'
require 'nokogiri'
require 'open-uri'
page = Nokogiri::HTML(open("http://vnexpress.net"))
list = page.css("a[class='link-topnews']")
puts list[0].text
but it's giving the error:
undefined method `text' for nil:NilClass (NoMethodError)
The weird thing is, with the exact same code, sometimes it does work and gives the correct result:
Triều Tiên dọa hành động với máy bay B-52 của Mỹ
Even when trying to get the title it's giving the same error:
page = Nokogiri::HTML(open("http://vnexpress.net/"))
list = page.css("title")
puts list[0].text
Why does it behave like that? What did I do wrong?
It seems that their server refuses to serve content when you use just Nokogiri with open-uri. I suppose they are checking some headers. You can add headers or use the Mechanize gem:
require 'mechanize'
agent = Mechanize.new
page = agent.get "http://vnexpress.net"
page.search("a.link-topnews").first.text
=> "Triều Tiên dọa hành động với máy bay B-52 của Mỹ"
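For the "add headers" option, here is a minimal sketch using Net::HTTP from the standard library. It assumes the server is looking at the User-Agent header, which the answer above only guesses at, so treat that as unverified. BROWSER_UA, build_request and fetch are names invented for this example.

```ruby
require 'net/http'
require 'uri'

# Hypothetical User-Agent string; any browser-like value could be tried.
BROWSER_UA = 'Mozilla/5.0 (compatible; ExampleFetcher/1.0)'

# Build a GET request carrying explicit headers. Nothing is sent here.
def build_request(url, user_agent = BROWSER_UA)
  uri = URI(url)
  request = Net::HTTP::Get.new(uri)
  request['User-Agent'] = user_agent
  request['Accept'] = 'text/html'
  [uri, request]
end

# Send the request and return the response body (performs network I/O).
def fetch(url)
  uri, request = build_request(url)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.request(request).body
  end
end
```

The string returned by fetch("http://vnexpress.net/") can then be passed to Nokogiri::HTML. Since page.css can return an empty list, checking list[0] for nil before calling text avoids the NoMethodError from the question.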

Ruby: How to search hidden elements with Mechanize

I am trying to find a hidden link with Mechanize in Ruby and click on it.
agent = Mechanize.new
agent.get('http://www.example.com/')
agent.page.link_with(:text => "More Links...")
But this gives me:
=> nil
Actually, I want to click on it:
agent.page.link_with(:text => "More Links...").click
But this raises an error:
undefined method `click' for nil:NilClass
And here is my HTML code:
<div id="rld-4" class="results_links_more highlight_d links_deep" style="display: none;">
<a class="large" href="javascript:;">More Links...</a>
</div>
Mechanize currently doesn't support JavaScript. I'd suggest you try to figure out what the server expects the user agent to send and then replicate this with Mechanize. You can use a tool like HTTPFox, a Firefox add-on that monitors the traffic between a web server and your browser. Once you have this, you can easily replicate it with Mechanize. Something like this:
agent = Mechanize.new
# Doesn't work
# home_page = agent.get('http://requestb.in/')
# agent.click(home_page.link_with(:text => "Create a RequestBin"))
# => undefined method `[]' for nil:NilClass (NoMethodError)
# Works
# The javascript code just makes a POST request with one parameter
request_bin = agent.post("http://requestb.in/api/v1/bins", { "private" => "false" })
puts request_bin.body
That should probably find the link if it's really on the page, but the bigger problem is that clicking on a link with a href of 'javascript:;' doesn't do what you think it does. That's because mechanize is not a full browser with a javascript interpreter, etc.

How to insert a string to a text field using mechanize in ruby?

I know this is a very simple question, but I've been stuck for an hour and I just can't understand how this works.
I need to scrape some stuff from my school's library so I need to insert 'CE' to a text field and then click on a link with text 'Clasificación'. The output is what I am going to use to work. So here is my code.
require 'rubygems'
require 'open-uri'
require 'nokogiri'
require 'mechanize'
url = 'http://biblio02.eld.edu.mx/janium-bin/busqueda_rapida.pl?Id=20110720161008#'
searchStr = 'CE'
agent = Mechanize.new
page = agent.get(url)
searchForm = page.form_with(:method => 'post')
searchForm['buscar'] = searchStr
clasificacionLink = page.link_with(:href => "javascript:onClick=set_index_and_submit(\'51\');").click
page = agent.submit(searchForm,clasificacionLink)
When I run it, it gives me this error
janium.rb:31: undefined method `[]=' for nil:NilClass (NoMethodError)
Thanks!
I think your problem is actually on line 13, not 31, and I'll even tell you why I think that. Not only does your script not have 31 lines but, from the fine manual:
form_with(criteria)
Find a single form matching criteria.
There are several forms on that page that have method="post". Apparently Mechanize returns nil when it can't exactly match the form_with criteria including the single part mentioned in the documentation; so, if your criteria matches more than one thing, form_with returns nil instead of choosing one of the options and you end up trying to do this:
nil['buscar'] = searchStr
But nil doesn't have a []= method so you get your NoMethodError.
If you use this:
searchForm = page.form_with(:name => 'forma')
you'll get past the first part as there is exactly one form with name="forma" on that page. Then you'll have trouble with this:
clasificacionLink = page.link_with(:href => "javascript:onClick=set_index_and_submit(\'51\');").click
page = agent.submit(searchForm, clasificacionLink)
as Mechanize doesn't know what to do with JavaScript (at least mine doesn't). But if you use just this:
page = agent.submit(searchForm)
you'll get a page and then you can continue building and debugging your script.
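The nil failure described above can be reproduced and guarded against in plain Ruby. This is only a sketch: a Hash stands in for the Mechanize form, and set_field and assign_on_nil are hypothetical helpers, not Mechanize methods.

```ruby
# Assign a value only if a form-like object was actually found.
# `form` is anything that responds to []=; a Hash stands in for a form here.
def set_field(form, name, value)
  raise ArgumentError, 'no form matched the search criteria' if form.nil?
  form[name] = value
  form
end

# What the original script effectively did when form_with returned nil.
# Returns the class of the error that Ruby raises.
def assign_on_nil
  nil['buscar'] = 'CE'
rescue NoMethodError => e
  e.class
end

puts assign_on_nil.inspect
puts set_field({}, 'buscar', 'CE').inspect
```

Failing early with a descriptive message points at the form lookup itself rather than at the later assignment, which makes this kind of error easier to track down.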
mu's answer sounds reasonable. I am not sure if this is strictly necessary, but you might also try putting square brackets around searchStr:
searchForm['buscar'] = [searchStr]

hpricot: get image from URL and parse element

i am trying to get the exact URL of an image inside a page and then download it. i haven't yet gotten to the download point, as i am trying to isolate the URL of the image. here is the code:
#!/usr/bin/ruby -w
require 'rubygems'
require 'hpricot'
require 'open-uri'
raw = Hpricot(open("http://www.amazon.com/Weezer/dp/B000003TAW/"))
ele = raw.search("img[@src*=jpg]").first
img = ele.match("(\")(.*?)(\")").captures
puts img[1]
when i run it as it is, i receive:
undefined method `match' for #<Hpricot::Elem:0xb731948c> (NoMethodError)
if i comment out the last 2 lines and add
puts ele
i get:
<img src="http://ecx.images-amazon.com/images/I/51rpVNqXmYL._SL500_AA240_.jpg" style="display:none;" />
which is the correct portion of the page i want to parse. however, the error is when i try to get just the "http://ecx.images-amazon.com/images/I/51rpVNqXmYL._SL500_AA240_.jpg" style="display:none;" part.
i am not totally sure why it can't perform a match, as I understand the search i am running should be getting an array of the image elements and returning the first. so i assumed that i could not run the match on the entire array, so i tried
img = ele[1].match("(\")(.*?)(\")").captures
puts img
and that returns
undefined method `match' for nil:NilClass (NoMethodError)
i am lost. please excuse my ignorance, as i am just beginning to learn ruby. any help is appreciated.
Change this line:
img = ele.match("(\")(.*?)(\")").captures
To:
img = ele[:src]
The reason for the errors is that Hpricot::Elem isn't a String. Try:
ele.respond_to? :match
and you get false.
However, you could do:
ele.to_s.match("(\")(.*?)(\")").captures[1]
The secret is in the to_s.
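To see why index 1 of the captures holds the URL, here is a small sketch in plain Ruby that works on the element's string form (copied from the question's output) rather than on a live Hpricot element. TAG and extract_src are names chosen for this example only.

```ruby
# The element's string form, copied from the output shown in the question.
TAG = '<img src="http://ecx.images-amazon.com/images/I/51rpVNqXmYL._SL500_AA240_.jpg" style="display:none;" />'

# Return the src attribute value from a tag string, or nil if there is none.
def extract_src(tag)
  match = tag.match(/src="(.*?)"/)
  match && match[1]
end

# The original pattern has three groups: opening quote, content, closing quote.
# That is why the content sits at index 1 of the captures.
puts TAG.match("(\")(.*?)(\")").captures[1]

# Anchoring the pattern on src= keeps it from matching another attribute first.
puts extract_src(TAG)
```

Reading the attribute directly with ele[:src], as suggested above, avoids the regular expression altogether.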
