How to scrape a website with the socksify gem (proxy) - ruby

I am reading through the documentation of the socksify gem on Rubyforge. I have installed the gem successfully, and I have run this documented code successfully to confirm that my local setup can replicate it:
require 'socksify/http'
uri = URI.parse('http://rubyforge.org/')
Net::HTTP.SOCKSProxy('127.0.0.1', 9050).start(uri.host, uri.port) do |http|
http.get(uri.path)
end
# => #<Net::HTTPOK 200 OK readbody=true>
But how do I scrape, e.g., 'http://google.com/' and get the HTML content? I want to parse it with Nokogiri, for example like this:
Nokogiri::HTML(open("http://google.com/"))

require 'socksify/http'
require 'nokogiri'

# addr and port are your SOCKS proxy address and port, e.g. '127.0.0.1' and 9050
http = Net::HTTP.SOCKSProxy(addr, port)
html = http.get(URI('http://google.de'))   # returns the response body as a string
html_doc = Nokogiri::HTML(html)
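Putting the pieces together with the SOCKS proxy from the question (this assumes a proxy such as Tor is actually listening on 127.0.0.1:9050), a complete scrape-and-parse round trip might look roughly like this:

require 'socksify/http'
require 'nokogiri'

uri = URI.parse('http://google.de/')

# Net::HTTP.SOCKSProxy returns a proxy-aware Net::HTTP class;
# use it exactly like Net::HTTP itself.
proxied = Net::HTTP.SOCKSProxy('127.0.0.1', 9050)

body = proxied.start(uri.host, uri.port) do |http|
  http.get(uri.request_uri).body
end

html_doc = Nokogiri::HTML(body)
puts html_doc.at('title').text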

Related

Sending a file with POST to a server in pure Ruby (or a library that doesn't need build tools)

Writing extensions for SketchUp, I need to get around their use of their own Ruby (2.0.0) interpreter. Most importantly, I can't install gems that require build tools.
How can I send a file via POST request to my local server, which does some calculations and answers with a JSON object?
I'm aware that I could use rest-client to send the file, but due to the mentioned restrictions I can't use it (it requires build tools). Is there another comparable way or library that can help me?
require 'net/http'
require 'uri'
url = 'http://foourl.com'
uri = URI.parse(url)
data = File.read('file_path')              # path of the file you want to send
http = Net::HTTP.new(uri.host, uri.port)
request = Net::HTTP::Post.new(uri.request_uri)
request.body = data
request.content_type = 'audio/amr'
response = http.request(request)
(Taken from here)
Don't forget to configure your content type
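Since the server answers with a JSON object, you can parse the response body with the json library that ships with Ruby (no build tools required). A minimal sketch continuing from the code above; the example key is just a placeholder for whatever your server returns:

require 'json'

result = JSON.parse(response.body)
puts result.inspect   # e.g. {"status" => "ok"}, depending on your server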
You can use Ruby's native Net::HTTP library.
Examples:
require 'net/http'
require 'uri'
uri = URI('http://www.example.com/search.cgi')
res = Net::HTTP.post_form(uri, 'q' => 'ruby', 'max' => '50')
puts res.body
or, within an already-opened Net::HTTP session (spelled out in the sketch below):
response = http.post('/cgi-bin/search.rb', 'query=foo')
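The second snippet assumes http is an open Net::HTTP session; written out in full, that could look like this (a sketch with example.com as a placeholder host):

require 'net/http'

Net::HTTP.start('www.example.com', 80) do |http|
  response = http.post('/cgi-bin/search.rb', 'query=foo')
  puts response.body
end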

Checking Facebook Status Using Ruby

I want to check the Facebook status of another page from a Ruby script. First of all, is it possible? I have done the following:
I got a developer account
I got an app key and secret
I installed the json_pure gem
Here is my code:
require 'rubygems'
require 'json/pure'
require 'net/http'
url ="https://graph.facebook.com/user_id/feed?access_token=app_id|app_secret"
uri = URI.parse(URI.encode(url.strip))
# encode to escape special characters
# strip to remove whitespace
req = Net::HTTP::Get.new(uri.to_s)
res = Net::HTTP.start(uri.host, uri.port) {|http|
http.request(req)
}
html = res.body
res = JSON.parse(html)
Here is the error:
C:/Ruby187/lib/ruby/1.8/net/protocol.rb:135:in `sysread': An existing connection
was forcibly closed by the remote host. (Errno::ECONNRESET)
I would recommend using the koala gem instead of Net::HTTP:
@graph = Koala::Facebook::API.new(oauth_access_token)
# e.g. fetch the feed of the page or user you are interested in
status = @graph.get_connections(user_id, "feed")
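If you want to stay with Net::HTTP, the Errno::ECONNRESET in the question is most likely caused by speaking plain HTTP to the HTTPS Graph API endpoint. A sketch along the lines of the question's code (reusing its url variable) with SSL enabled:

require 'net/https'
require 'json'

uri = URI.parse(URI.encode(url.strip))
http = Net::HTTP.new(uri.host, uri.port)
http.use_ssl = true   # talk TLS to graph.facebook.com instead of plain HTTP

res = http.request(Net::HTTP::Get.new(uri.request_uri))
feed = JSON.parse(res.body)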

How do I use Ruby to get JSON back from the Instagram API

I am doing my best to get JSON back from the Instagram API. Here is the code I am trying in my rake task within Rails.
require 'net/http'
url = "https://api.instagram.com/v1/tags/snow/media/recent?access_token=522219.f59def8.95be7b2656ec42c08bff8a159a43d06f"
resp = Net::HTTP.get_response(URI.parse(url))
puts resp.body
All I end up with in the terminal is "rake aborted!
end of file reached"
If you look at the Instagram docs (http://instagram.com/developer/endpoints/tags/) and paste the following URL into your browser, you will get JSON back, so I'm sure I am doing something wrong.
https://api.instagram.com/v1/tags/snow/media/recent?access_token=522219.f59def8.95be7b2656ec42c08bff8a159a43d06f
It has to do with the HTTPS URL; you need to modify your code to use SSL:
require "net/https"
require "uri"
uri = URI.parse("https://api.instagram.com/v1/tags/snow/media/recent?access_token=522219.f59def8.95be7b2656ec42c08bff8a159a43d06f")
http = Net::HTTP.new(uri.host, uri.port)
http.use_ssl = true
http.verify_mode = OpenSSL::SSL::VERIFY_NONE
request = Net::HTTP::Get.new(uri.request_uri)
response = http.request(request)
puts response.body
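To actually work with the JSON rather than just print it, parse the body with the stdlib json library. A small addition to the answer above; the 'data' key is the envelope the Instagram v1 API documents for this endpoint:

require 'json'

result = JSON.parse(response.body)
puts result['data'].length   # number of media items returned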
Alternatively, you could use something like https://github.com/jnunemaker/httparty to consume third-party services.
Looks like you'd need to configure net/http to use SSL because you're using https.
Alternative: with Rails (and open-uri), this will parse the JSON on the fly too:
ActiveSupport::JSON.decode(open(URI.encode(url)).read)
It returns a hash to play with.

Simplest way to do an XMLHttpRequest in Ruby?

I want to do an XMLHttpRequest POST in Ruby. I don't want to use a framework like Watir. Something like Mechanize or Scrubyt would be fine. How can I do this?
Mechanize:
require 'mechanize'
agent = Mechanize.new
agent.post 'http://www.example.com/', :foo => 'bar'
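If the endpoint checks for the XMLHttpRequest marker, Mechanize's post also accepts a headers hash as the third argument; a sketch with a placeholder URL:

require 'mechanize'

agent = Mechanize.new
agent.post('http://www.example.com/',
           { :foo => 'bar' },
           { 'X-Requested-With' => 'XMLHttpRequest' })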
Example with 'net/http' (Ruby 1.9.3):
You only have to add an extra header to your POST request to mark it as an XMLHttpRequest (see below).
require 'net/http'
require 'uri' # convenient for using parts of a URI
uri = URI.parse('http://server.com/path/to/resource')
# create a Net::HTTP object (the client with details of the server):
http_client = Net::HTTP.new(uri.host, uri.port)
# create a POST-object for the request:
your_post = Net::HTTP::Post.new(uri.path)
# the content (body) of your post-request:
your_post.body = 'your content'
# the headers for your POST request (you have to work out beforehand
# which headers are mandatory for your request); for example:
your_post['Content-Type'] = 'put here the content-type'
your_post['Content-Length'] = your_post.body.size.to_s
# ...
# for an XMLHttpRequest you need a header like this:
your_post['X-Requested-With'] = 'XMLHttpRequest'
# send the request to the server:
response = http_client.request(your_post)
# the body of the response:
puts response.body
XMLHttpRequest is a browser concept, but since you're asking about Ruby, I assume all you want to do is simulate such a request from a Ruby script? To that end, there's a gem called HTTParty which is very easy to use.
Here's a simple example (assuming you have the gem - install it with gem install httparty):
require 'httparty'
response = HTTParty.get('http://twitter.com/statuses/public_timeline.json')
puts response.body, response.code, response.message, response.headers.inspect
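Since the question is specifically about a POST, the HTTParty equivalent would be roughly the following (a sketch; the URL, body and header values are placeholders for your endpoint):

require 'httparty'

response = HTTParty.post('http://www.example.com/endpoint',
                         :body    => { :foo => 'bar' },
                         :headers => { 'X-Requested-With' => 'XMLHttpRequest' })
puts response.code, response.body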

Need help connecting to an HTTPS URL with Ruby and passing the source data to Nokogiri

I'm trying to get a list of stations from this webpage - https://web.barclayscyclehire.tfl.gov.uk/maps
I see that they have the list of stations as JSON structures in the JavaScript, so I am trying to connect to the page, pass the data to Nokogiri to get the JavaScript containing the JSON, and then parse each JSON structure individually.
To connect to HTTPS and pass the data to Nokogiri I used this code, available here - https://gist.github.com/1037492
require 'net/https'
require 'nokogiri'
url = "https://example.com"
url = URI.parse( url )
http = Net::HTTP.new( url.host, url.port )
http.use_ssl = true if url.port == 443
http.verify_mode = OpenSSL::SSL::VERIFY_NONE if url.port == 443
path = url.path
path += "?" + url.query unless url.query.nil?
res, data = http.get( path )
case res
when Net::HTTPSuccess, Net::HTTPRedirection
# parse link
doc = Nokogiri::HTML(data)
# do what you want ...
else
return "failed" + res.to_s
end
However, when I try to debug in Aptana Studio 3, before the debugger can stop on any breakpoint I have set, it exits with an invalid return error. Is there something wrong with that code?
And is that the best way to connect to HTTPS and pass data to Nokogiri?
Try it like this:
require 'nokogiri'
require 'open-uri'
require 'openssl'
# globally disable certificate verification (a quick hack; see the note below)
OpenSSL::SSL::VERIFY_PEER = OpenSSL::SSL::VERIFY_NONE
doc = Nokogiri::HTML(open(https_url))   # https_url is the page you want, e.g. the maps URL above
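Note that reassigning OpenSSL::SSL::VERIFY_PEER is a global hack, and Ruby will warn about redefining an already initialized constant. open-uri also accepts a per-call :ssl_verify_mode option, which keeps the change local; a sketch using the maps URL from the question:

require 'nokogiri'
require 'open-uri'
require 'openssl'

doc = Nokogiri::HTML(open('https://web.barclayscyclehire.tfl.gov.uk/maps',
                          :ssl_verify_mode => OpenSSL::SSL::VERIFY_NONE))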
