Upload file with Mechanize for Ruby

File upload does not work using:
form.file_upload_with(:name => 'image[1]').file_name = '/tmp/image.jpg'
form.submit
This is an out-of-date example: https://github.com/sparklemotion/mechanize/blob/master/examples/flickr_upload.rb
I tried this on two different sites.
I'm using Mechanize 2.6.0.

Slightly off-topic, but another way to upload files with Mechanize that I found useful, particularly if you don't have an HTML form handy, is to call post on a Mechanize instance with a File object:
a = Mechanize.new
a.post(url, {
"file1" => File.new("/tmp/image.jpg")
})

Try this:
file = File.join( APP_ROOT, 'tmp', 'image.jpg')
form.file_uploads.first.file_name = file

Try:
form_with(:method => /POST/) do |form|
form.file_uploads.first.file_name = '/tmp/image.jpg'
end.submit

Related

Scraping a webpage with Mechanize and Nokogiri and storing data in XML doc

I am trying to scrape a website and store data in XML using Mechanize and Nokogiri. I didn't set up a Rails project and I am only using Ruby and IRB.
I wrote this method:
def mechanize_club
  agent = Mechanize.new
  agent.get("http://www.rechercheclub.applipub-fft.fr/rechercheclub/")
  form = agent.page.forms.first
  form.field_with(:name => 'codeLigue').options[0].select
  form.submit
  page2 = agent.get('http://www.rechercheclub.applipub-fft.fr/rechercheclub/club.do?codeClub=01670001&millesime=2015')
  body = page2.body
  html_body = Nokogiri::HTML(body)
  codeclub = html_body.search('.form').children("tr:first").children("th:first").to_i
  @codeclubs << codeclub
  filepath = '/davidgeismar/Documents/codeclubs.xml'
  builder = Nokogiri::XML::Builder.new(encoding: 'UTF-8') do |xml|
    xml.root {
      xml.codeclubs {
        @codeclubss.each do |c|
          xml.codeclub {
            xml.code_ c.code
          }
        end
      }
    }
  end
  puts builder.to_xml
end
My first problem is that I don't know how to test my code.
I call ruby webscraper.rb in my console, the file is treated I think, but it doesn't create an XML file in the specified path.
Then, more specifically I am quite sure this code is wrong as I didn't get a chance to test it.
Basically what I am trying to do is to submit a form several times:
agent = Mechanize.new
agent.get("http://www.rechercheclub.applipub-fft.fr/rechercheclub/")
form = agent.page.forms.first
form.field_with(:name => 'codeLigue').options[0].select
form.submit
I think this code is OK, but I don't want it to only select options[0]. I want it to select an option, scrape all the data I need, go back to the page, then select options[1], and so on until there are no more options (an iteration, I guess).
"the file is treated I think, but it doesn't create an XML file in the specified path."
There is nothing in your code that creates a file. You print some output, but don't do anything to open or write a file.
Perhaps you should read the IO and File documentation and review how you are using your filepath variable?
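For instance, here is a minimal sketch of the missing write step, assuming the XML is already available as a string (in the question that would be the result of builder.to_xml). The save_xml helper and the temp-directory path are hypothetical names chosen so the snippet runs anywhere:

```ruby
require 'tmpdir'

# Hypothetical helper: write an XML string to disk.
# File.write returns the number of bytes written.
def save_xml(xml, path)
  File.write(path, xml)
end

# Stand-in for builder.to_xml from the question (assumed content).
xml = %(<?xml version="1.0" encoding="UTF-8"?>\n<root><codeclubs/></root>\n)

# Hypothetical destination; replace with your own filepath.
path = File.join(Dir.tmpdir, 'codeclubs.xml')

bytes_written = save_xml(xml, path)
puts "Wrote #{bytes_written} bytes to #{path}"
```

Printing with puts only sends the document to the terminal; the File.write call is what actually creates the file.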
The second problem is that you don't call your method anywhere. Though it's defined and Ruby will see it and parse the method, it has no idea what you want to do with it unless you invoke the method:
def mechanize_club
...
end
mechanize_club()
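As for submitting the form once per option, one possible sketch follows. The method name each_option_page is hypothetical, and reloading the page before every submit is an assumption about how the site behaves. It relies only on Mechanize calls already used in the question (get, forms, field_with, options, select, submit) plus Option#value:

```ruby
# Hypothetical helper: submit the first form on `url` once per option of the
# select list named `field_name`, yielding each option value and result page.
# `agent` is expected to be a Mechanize instance.
def each_option_page(agent, url, field_name)
  # Count the options once so we know how many submissions to make.
  option_count = agent.get(url).forms.first.field_with(name: field_name).options.size

  option_count.times do |index|
    # Reload the page so every submission starts from a fresh form (assumption).
    form = agent.get(url).forms.first
    option = form.field_with(name: field_name).options[index]
    option.select
    yield option.value, form.submit
  end
end

# Usage (requires the mechanize gem and network access):
#   require 'mechanize'
#   url = 'http://www.rechercheclub.applipub-fft.fr/rechercheclub/'
#   each_option_page(Mechanize.new, url, 'codeLigue') do |value, page|
#     puts "#{value}: #{page.uri}"
#   end
```

Inside the block you would do the scraping for that option, for example collecting codes into an array that is later written to the XML file.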

Is it possible to convert Mechanize::File into Mechanize::Page?

I'm having trouble with the Mechanize gem: how do I convert a Mechanize::File into a Mechanize::Page?
Here's my piece of code:
link = page.link_with(:href => %r{/en/users}).click
When the users link is clicked, it goes to the page with the list of users. Now I want to click the first user, but I can't, because link returns a Mechanize::File object.
Any help or suggestions would be great, thanks.
Mechanize uses Content-Type to determine how the resource should be handled. Occasionally websites will not set the mime-types for their resources. Mechanize::File is the default for unset Content-Type.
If you are only dealing with 'text/html', you can follow Jimm Stout's suggestion of using post_connect_hooks:
agent = Mechanize.new do |a|
  a.post_connect_hooks << ->(_, _, response, _) do
    # content_type is nil when the header is missing, so normalize with to_s
    if response.content_type.to_s.empty?
      response.content_type = 'text/html'
    end
  end
end
Just parse the body with nokogiri:
link = page.link_with(:href => %r{/en/users}).click
doc = Nokogiri::HTML link.body
agent.get doc.at('a')[:href]

How to set the Referer header before loading a page with Ruby mechanize?

Is there a straightforward way to set custom headers with Mechanize 2.3?
I tried a former solution but get:
$agent = Mechanize.new
$agent.pre_connect_hooks << lambda { |p|
p[:request]['Referer'] = 'https://wwws.mysite.com/cgi-bin/apps/Main'
}
# ./mech.rb:30:in `<main>': undefined method `pre_connect_hooks' for nil:NilClass (NoMethodError)
The docs say:
get(uri, parameters = [], referer = nil, headers = {}) { |page| ... }
so for example:
agent.get 'http://www.google.com/', [], agent.page.uri, {'foo' => 'bar'}
alternatively you might like:
agent.request_headers = {'foo' => 'bar'}
agent.get url
You misunderstood the code you were copying. There was a newline in the example, but it disappeared in the formatting as it wasn't tagged as code. $agent contains nil since you're trying to use it before it has been initialized. You must initialize the object and then use it. Just try this:
$agent = Mechanize.new
$agent.pre_connect_hooks << lambda { |p| p[:request]['Referer'] = 'https://wwws.mysite.com/cgi-bin/apps/Main' }
For this question I noticed people seem to use:
page = agent.get("http://www.you.com/index_login/", :referer => "http://www.you.com/")
As an aside, now that I've tested this answer, it seems this was not the cause of my actual problem: every visit to the site I'm scraping requires going through the login pages again, even seconds after the first logged-in visit, even though I always load and save the complete cookie jar in YAML format. But that would lead to another question, of course.

How to set the mechanize page encoding?

I'm trying to get a page with an ISO-8859-1 encoding clicking on a link, so the code is similar to this:
page_result = page.link_with( :text => 'link_text' ).click
So far I get the result with a wrong encoding, so I see characters like:
'T�tulo:' instead of 'Título:'
I've tried several approaches, including:
Stating the encoding in the first request using the agent, like:
@page_search = @agent.get(
  :url => 'http://www.server.com',
  :headers => { 'Accept-Charset' => 'ISO-8859-1' } )
Stating the encoding for the page itself:
page_result.encoding = 'ISO-8859-1'
But I must be doing something wrong: a simple puts always shows the wrong characters.
Do you know how to state the encoding?
Thanks in advance,
Added: Executable example:
require 'rubygems'
require 'mechanize'
WWW::Mechanize::Util::CODE_DIC[:SJIS] = "ISO-8859-1"
@agent = WWW::Mechanize.new
@page = @agent.get(
  :url => 'http://www.mcu.es/webISBN/tituloSimpleFilter.do?cache=init&layout=busquedaisbn&language=es',
  :headers => { 'Accept-Charset' => 'utf-8' } )
puts @page.body
Hey you can just do a:
agent.page.encoding = 'utf-8'
Hope it helps!
The previous answer is correct, but in my code it looks slightly different:
agent = Mechanize.new
page = agent.get('http://example.com')
page.encoding = 'windows-1251'
page.search('p').each do |para|
puts para.text
end
Sorry, it was my mistake: I come from a Java background and there strings are internally converted to utf-16. I forgot Ruby doesn't do it. Mechanize was recovering the page flawlessly, but I needed to convert the data via iconv.
Mental note: Ruby stores the strings without converting its encoding.
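For readers on current Ruby versions: iconv was removed from the standard library in Ruby 2.0, and String#encode covers the same conversion. A minimal sketch, assuming the raw bytes really are ISO-8859-1 (the sample bytes below are made up to match the 'Título:' example):

```ruby
# Bytes as they might arrive from an ISO-8859-1 page: "Título:" with í as 0xED.
raw = "T\xEDtulo:".b

# Label the bytes with their real encoding, then transcode them to UTF-8.
utf8 = raw.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)

puts utf8
```

force_encoding only changes the label on the bytes; encode is the step that actually rewrites them.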
Yeah, Mechanize will try to detect the encoding itself (using the NKF core Ruby library to guess the encoding) and sometimes fails.
Maybe this might help:
WWW::Mechanize::Util::CODE_DIC[:SJIS] = "ISO-8859-1"
I'm not too sure about the exact syntax, but I think the CODE_DIC Hash might be a good place to look :)
I had a similar problem a while back.

http PUT a file to S3 presigned URLs using ruby

Anyone got a working example of using Ruby to post to a presigned URL on S3?
I have used both aws-sdk and right_aws.
Here is the code to do this.
require 'rubygems'
require 'aws-sdk'
require 'right_aws'
require 'net/http'
require 'uri'
require 'rack'
access_key_id = 'AAAAAAAAAAAAAAAAA'
secret_access_key = 'ASDFASDFAS4646ASDFSAFASDFASDFSADF'
s3 = AWS::S3.new( :access_key_id => access_key_id, :secret_access_key => secret_access_key)
right_s3 = RightAws::S3Interface.new(access_key_id, secret_access_key, {:multi_thread => true, :logger => nil} )
bucket_name = 'your-bucket-name'
key = "your-file-name.ext"
right_url = right_s3.put_link(bucket_name, key)
right_scan_command = "curl -I --upload-file #{key} '#{right_url.to_s}'"
system(right_scan_command)
bucket = s3.buckets[bucket_name]
form = bucket.presigned_post(:key => key)
uri = URI(form.url.to_s + '/' + key)
uri.query = Rack::Utils.build_query(form.fields)
scan_command = "curl -I --upload-file #{key} '#{uri.to_s}'"
system(scan_command)
Can you provide more information on how a "presigned URL" works? Is it like this:
AWS::S3::S3Object.url_for(self.full_filename,
self.bucket_name, {
:use_ssl => true,
:expires_in => ttl_seconds
})
I use this code to send authenticated clients the URL to their S3 file. I believe this is the "presigned URL" that you're asking about. I haven't used this code for a PUT, so I'm not exactly sure if it's right for you, but it might get you close.
I know this is an older question, but I was wondering the same thing and found an elegant solution in the AWS S3 Documentation.
require 'net/http'
file = "somefile.ext"
url = URI.parse(presigned_url)
Net::HTTP.start(url.host, url.port, :use_ssl => url.scheme == 'https') do |http|
http.send_request("PUT", url.request_uri, File.read(file), {"content-type" => "",})
end
This worked great for my Device Farm uploads.
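A variation on the same idea, sketched with the request built separately from the sending step so it can be inspected first. The helper names and the example URL are hypothetical. The empty Content-Type simply mirrors the snippet above; presumably it stops Net::HTTP from adding its default form content type, which might not match what the URL was signed with.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: build a PUT request for a presigned URL without sending it.
def build_presigned_put(presigned_url, body)
  uri = URI.parse(presigned_url)
  request = Net::HTTP::Put.new(uri.request_uri)
  request.body = body
  request['Content-Type'] = '' # mirrors the empty content-type header used above
  [uri, request]
end

# Hypothetical helper: send the file; TLS is enabled when the URL is https.
def upload_to_presigned_url(presigned_url, path)
  uri, request = build_presigned_put(presigned_url, File.binread(path))
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.request(request)
  end
end

# Made-up URL, used only to show what gets built (nothing is sent here).
uri, request = build_presigned_put(
  'https://example-bucket.s3.amazonaws.com/somefile.ext?Signature=abc', 'data'
)
puts "#{request.method} #{request.path} -> #{uri.host}:#{uri.port}"
```

With a real presigned URL you would call upload_to_presigned_url(presigned_url, "somefile.ext") and check the returned response code.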
Does anything on the s3 library page cover what you need? There are loads of examples there.
There are some generic REST libraries for Ruby; Google for "ruby rest client". See also HTTParty.
I've managed to sort it out. It turns out Net::HTTP in Ruby has some shortcomings. A lot of monkeypatching later, I got it working. More details when I have time. Thanks.
