VCRProxy: Record PhantomJS ajax calls with VCR inside Capybara - ruby

I've already done some research in this field, but didn't find any solution. I have a site where asynchronous AJAX calls are made to Facebook (using JSONP). I'm recording all my HTTP requests on the Ruby side with VCR, so I thought it would be cool to use this feature for AJAX calls as well.
So I played around a bit and came up with a proxy approach. I'm using PhantomJS as a headless browser and Poltergeist for the integration inside Capybara. Poltergeist is configured to use a proxy like this:
Capybara.register_driver :poltergeist_vcr do |app|
  options = {
    :phantomjs_options => [
      "--proxy=127.0.0.1:9100",
      "--proxy-type=http",
      "--ignore-ssl-errors=yes",
      "--web-security=no"
    ],
    :inspector => true
  }
  Capybara::Poltergeist::Driver.new(app, options)
end
Capybara.javascript_driver = :poltergeist_vcr
For testing purposes, I wrote a proxy server based on WEBrick that integrates VCR:
require 'io/wait'
require 'webrick'
require 'webrick/httpproxy'
require 'rubygems'
require 'vcr'

module WEBrick
  class VCRProxyServer < HTTPProxyServer
    def service(*args)
      VCR.use_cassette('proxied') { super(*args) }
    end
  end
end

VCR.configure do |c|
  c.stub_with :webmock
  c.cassette_library_dir = '.'
  c.default_cassette_options = { :record => :new_episodes }
  c.ignore_localhost = true
end

IP   = '127.0.0.1'
PORT = 9100

reader, writer = IO.pipe
pid = fork do
  reader.close
  $stderr = writer
  server = WEBrick::VCRProxyServer.new(:BindAddress => IP, :Port => PORT)
  trap('INT') { server.shutdown }
  server.start
end
raise 'VCR Proxy did not start in 10 seconds' unless reader.wait(10)
This works well for every localhost call, and those get recorded properly: the HTML, JS and CSS files are all captured by VCR. Then I enabled the c.ignore_localhost = true option, because it's useless (in my opinion) to record localhost calls.
Then I tried again, but I had to realize that the AJAX calls made on the page aren't recorded. Even worse, they don't work inside the tests anymore.
So, to come to the point, my question is: why are all calls to JS files on localhost recorded, while JSONP calls to external resources are not? It can't be the JSONP itself, because it's a "normal" AJAX request. Or is there a bug inside PhantomJS so that AJAX calls aren't proxied? If so, how could we fix that?
If it's running, I want to integrate the start and stop procedure inside
------- UPDATE -------
I did some research and came to the following point: the proxy has problems with HTTPS calls, and with binary data transferred over HTTPS.
I started the server and made some curl calls:
curl --proxy 127.0.0.1:9100 http://d3jgo56a5b0my0.cloudfront.net/images/v7/application/stories_view/icons/bug.png
This call gets recorded as it should. The request and response output from the proxy is:
GET http://d3jgo56a5b0my0.cloudfront.net/images/v7/application/stories_view/icons/bug.png HTTP/1.1
User-Agent: curl/7.24.0 (x86_64-apple-darwin12.0) libcurl/7.24.0 OpenSSL/0.9.8r zlib/1.2.5
Host: d3jgo56a5b0my0.cloudfront.net
Accept: */*
Proxy-Connection: Keep-Alive
HTTP/1.1 200 OK
Server: WEBrick/1.3.1 (Ruby/1.9.3/2012-10-12)
Date: Tue, 20 Nov 2012 10:13:10 GMT
Content-Length: 0
Connection: Keep-Alive
But the following call doesn't get recorded; there must be some problem with HTTPS:
curl --proxy 127.0.0.1:9100 https://d3jgo56a5b0my0.cloudfront.net/images/v7/application/stories_view/icons/bug.png
The header output is:
CONNECT d3jgo56a5b0my0.cloudfront.net:443 HTTP/1.1
Host: d3jgo56a5b0my0.cloudfront.net:443
User-Agent: curl/7.24.0 (x86_64-apple-darwin12.0) libcurl/7.24.0 OpenSSL/0.9.8r zlib/1.2.5
Proxy-Connection: Keep-Alive
HTTP/1.1 200 OK
Server: WEBrick/1.3.1 (Ruby/1.9.3/2012-10-12)
Date: Tue, 20 Nov 2012 10:15:48 GMT
Content-Length: 0
Connection: close
So I thought maybe the proxy can't handle HTTPS, but it can (at least I'm getting the output on the console after the cURL call). Then I thought maybe VCR can't mock HTTPS requests. But with the following script, VCR mocks HTTPS requests just fine when I don't use it inside the proxy:
require 'net/http'
require 'openssl'
require 'vcr'

VCR.configure do |c|
  c.hook_into :webmock
  c.cassette_library_dir = 'cassettes'
end

uri = URI("https://d3jgo56a5b0my0.cloudfront.net/images/v7/application/stories_view/icons/bug.png")

VCR.use_cassette('https', :record => :new_episodes) do
  http = Net::HTTP.new(uri.host, uri.port)
  http.use_ssl = true
  http.verify_mode = OpenSSL::SSL::VERIFY_NONE
  response = http.request_get(uri.path)
  puts response.body
end
So what is the problem? VCR handles HTTPS, and the proxy handles HTTPS. Why don't they play together?

So I did some more research, and now I have a very basic example of a working VCR proxy server that handles HTTPS calls as a MITM proxy (if you deactivate the security check in your client). I'd be very happy if someone could contribute and help me bring this thing to life.
Here is the github repo: https://github.com/23tux/vcr_proxy

Puffing Billy is a very nice tool. You need to specify which domains to bypass and which URLs to stub. Stubbing HTTPS URLs is also a bit tricky: you need to include the port, e.g. https://www.example.com:443/path/

Related

Copied and pasted Ruby code from Hubspot API but I get an HTTPUnsupportedMediaType415

I am simply trying to do an HTTP PUT request from a Ruby script, and I am literally copying and pasting 100% of the same thing from HubSpot's example. It works in HubSpot's example, but not in mine.
For example, here's the (nearly) full code from the HubSpot API docs (with my API key redacted):
# https://rubygems.org/gems/hubspot-api-client
require 'uri'
require 'net/http'
require 'openssl'
url = URI("https://api.hubapi.com/crm/v3/objects/deals/4104381XXXX/associations/company/530997XXXX/deal_to_company?hapikey=XXXX")
http = Net::HTTP.new(url.host, url.port)
http.use_ssl = true
http.verify_mode = OpenSSL::SSL::VERIFY_NONE
request = Net::HTTP::Put.new(url)
request["accept"] = 'application/json'
response = http.request(request)
puts response.read_body
When initiated by HubSpot, the response is an HTTP 201, but in my Ruby script I get the following error:
=> #<Net::HTTPUnsupportedMediaType 415 Unsupported Media Type readbody=true>
I have tried directly copying and pasting the exact same thing, but no luck. I would post what I'm using, but it's 100% the same code as above except for the redacted API key, deal, and company IDs. I have also copied and pasted HubSpot's example directly into my Rails console, but I still get the unsupported media type error.
I have also tried adding a body to the request, such as request.body = "hello", but nothing changed.
Any suggestion would be greatly appreciated.
After analyzing a working cURL request and the Ruby script via Burp Suite, I determined that the following HTTP header in the request was the culprit:
Content-Type: application/x-www-form-urlencoded
For whatever reason, the Ruby code in the original post sends this content type by default, even though the user doesn't specify it. Makes no sense.
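A minimal sketch of the fix, assuming the API expects JSON: set the Content-Type header explicitly on the request, so Net::HTTP doesn't fall back to its form-urlencoded default (the URL keeps the redacted placeholders from the question):

```ruby
require 'uri'
require 'net/http'

url = URI("https://api.hubapi.com/crm/v3/objects/deals/4104381XXXX/associations/company/530997XXXX/deal_to_company?hapikey=XXXX")

request = Net::HTTP::Put.new(url)
request["accept"] = 'application/json'
# Without this line, Net::HTTP supplies
# "Content-Type: application/x-www-form-urlencoded" for body-carrying
# request types like PUT:
request["content-type"] = 'application/json'

puts request["content-type"]  # → application/json
```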

Getting the last HTTPS Request in Ruby rest-client (like __getLastRequest in php SoapClient)

I'm using the ruby rest-client to send requests to a web service. My requests are working, but I'd like to see the actual request that was sent to the service.
I can't do this with Wireshark or tcpdump because I'm using HTTPS and don't have access to the server's private key.
In php, when I've used the SoapClient in the past, I've been able to use the __getLastRequest function to see what xml is sent (http://www.php.net/manual/en/soapclient.getlastrequest.php).
Does anyone know the best way for me to see the actual packets sent to the server?
Many thanks,
D.
You can set the environment variable RESTCLIENT_LOG to stdout, stderr or a file name:
test.rb:
require 'rest-client'
RestClient.get "http://www.google.de"
Call:
RESTCLIENT_LOG=stderr ruby test.rb
Output:
RestClient.get "http://www.google.de", "Accept"=>"*/*; q=0.5, application/xml", "Accept-Encoding"=>"gzip, deflate"
# => 200 OK | text/html 10941 bytes
If you use Net::HTTP instead of rest-client, you can use http.set_debug_output $stderr to see the contents of the request and response:
require 'net/http'
require 'openssl'
uri = URI('https://myserverip/myuri')
http = Net::HTTP.new(uri.host, uri.port)
http.set_debug_output $stderr
http.use_ssl = true
http.verify_mode = OpenSSL::SSL::VERIFY_NONE
request = Net::HTTP::Get.new(uri.request_uri)
request.basic_auth 'username', 'password'
response = http.request(request)
#data = response.body
puts response.body
In addition, using Net::HTTP, I can also get it to use a proxy by using something like http = Net::HTTP::Proxy('127.0.0.1','8888').new(uri.host, uri.port).
You can also do the same with rest-client by using RestClient.proxy = "http://127.0.0.1:8888/" just before your RestClient.get(url) etc..
This way I can send the traffic via a tool like Fiddler2 on Windows and then see all the detail I need.
It's just a shame I can't find an equivalent to Fiddler for the Mac since that's where I want to write the code.
It looks like I'll either have to rewrite my code to use Net::HTTP instead of rest-client, or switch to Windows full time ;), unless anyone has any other thoughts.

WEBrick socket returns eof? == true

I'm writing a MITM proxy with WEBrick and SSL support (for mocking out requests with VCR on the client side; see the thread "VCRProxy: Record PhantomJS ajax calls with VCR inside Capybara" above or my GitHub repository https://github.com/23tux/vcr_proxy), and I've made it really far (in my opinion). My setup is that PhantomJS is configured to use a proxy and to ignore SSL errors. That proxy (written with WEBrick) records normal HTTP requests with VCR. If an SSL request is made, the proxy starts another WEBrick server, mounts it at / and rewrites the unparsed_uri of the request, so that not the original server but my just-started WEBrick server is called. The new server then handles the requests, records them with VCR, and so on.
Everything works fine when using cURL to test the MITM proxy. For example, a request like
curl --proxy localhost:11111 --ssl --insecure https://blekko.com/ws/?q=rails+/json -v
gets handled, recorded, and so on.
But: when I try to make the same request with a JSONP AJAX call from inside a page served via Poltergeist, something goes wrong. I debugged it down to the line that causes the problem. It's inside webrick/httpserver.rb in the Ruby source (Ruby 1.9.3), around line 80:
def run(sock)
  while true
    res = HTTPResponse.new(@config)
    req = HTTPRequest.new(@config)
    server = self
    begin
      timeout = @config[:RequestTimeout]
      while timeout > 0
        break if IO.select([sock], nil, nil, 0.5)
        timeout = 0 if @status != :Running
        timeout -= 0.5
      end
      raise HTTPStatus::EOFError if timeout <= 0
      raise HTTPStatus::EOFError if sock.eof?
The last line, raise HTTPStatus::EOFError if sock.eof?, raises an error when doing requests with PhantomJS, because sock.eof? == true:
1.9.3p392 :002 > sock
=> #<OpenSSL::SSL::SSLSocket:0x007fa36885e090>
1.9.3p392 :003 > sock.eof?
=> true
I tried it with the curl command, and there sock.eof? == false, so the error isn't raised and everything works fine:
1.9.3p392 :001 > sock
=> #<OpenSSL::SSL::SSLSocket:0x007fa36b7156b8>
1.9.3p392 :002 > sock.eof?
=> false
I only have very little experience with socket programming in Ruby, so I'm a little stuck.
How can I find out what the difference between the two requests is, based on the sock variable? As far as I can see in Ruby's IO docs, eof? blocks until the other side sends some data or closes the connection. Am I right? But why is the connection closed when the same request (same parameters, same method) is made with PhantomJS, and not closed when using curl?
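The blocking behaviour of eof? can be observed with a plain socket pair (a self-contained sketch, independent of WEBrick):

```ruby
require 'socket'

a, b = UNIXSocket.pair

# eof? blocks until the peer either sends data or closes its end.
b.write("x")
p a.eof?   # false: there is unread data on the socket

a.read(1)  # consume the byte
b.close
p a.eof?   # true: the peer closed and nothing is left to read
```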
Hope somebody can help me to figure this out. thx!
Since this is HTTPS, I bet the client is closing the connection. In HTTPS this can happen when, for example, the server certificate is not valid. What HTTPS library do you use? These libraries can usually be configured to ignore an invalid SSL certificate and continue working.
With curl you are actually doing exactly that with -k (--insecure); without it, the request would not work. Try it without this option: if curl fails, your server certificate is not valid. Note that to get this working you usually either need to turn certificate checking off in the client, or provide it with a certificate it can verify.
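In Ruby's Net::HTTP the same two options look like this (a sketch only; no connection is made here, and the CA file path is hypothetical):

```ruby
require 'net/http'
require 'openssl'

uri = URI('https://blekko.com/ws/?q=rails+/json')
http = Net::HTTP.new(uri.host, uri.port)
http.use_ssl = true

# Strict behaviour, the equivalent of curl without -k:
# fail if the server certificate can't be verified.
http.verify_mode = OpenSSL::SSL::VERIFY_PEER

# Alternative to disabling verification entirely: trust the MITM
# proxy's own CA so certificates it signs are accepted
# (the path below is made up):
# http.ca_file = '/path/to/vcr_proxy_ca.pem'
```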

Determining if http://foo.com redirects to http://www.foo.com

I have a list of ~150 URLs. I need to find out whether each domain resolves to www.domain.com or just domain.com.
There are multiple ways that a domain name could 'resolve' or 'redirect' to another:
Making an HTTP request for foo.com could return an HTTP redirect response code like 301, sending the browser to www.foo.com:
phrogz$ curl -I http://adobe.com
HTTP/1.1 301 Moved Permanently
Date: Mon, 30 Apr 2012 22:19:33 GMT
Server: Apache
Location: http://www.adobe.com/
Content-Type: text/html; charset=iso-8859-1
The web page sent back by the server might include a <meta> redirect:
<meta http-equiv="refresh" content="0; url=http://www.adobe.com/">
The web page sent back by the server might include JavaScript redirection:
location.href = 'http://www.adobe.com';
Which of these do you need to test for?
Reading HTTP Response Header
To detect #1 use the net/http library built into Ruby:
require "net/http"
req = Net::HTTP.new('adobe.com', 80)
response = req.request_head('/')
p response.code, response['Location']
#=> "301"
#=> "http://www.adobe.com/"
Reading HTML Meta Headers
To detect #2, you'll need to actually fetch the page, parse it, and look at the contents. I'd use Nokogiri for this:
require 'open-uri' # …if you don't need #1 also, this is easier
html = open('http://adobe.com').read

require 'nokogiri'
doc = Nokogiri.HTML(html)
if meta = doc.at_xpath('//meta[@http-equiv="refresh"]')
  # Might give you "adobe.com" or "www.adobe.com"
  domain = meta['content'][%r{url=([^/"]+(\.[^/"])+)},1]
end
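If you'd rather avoid the Nokogiri dependency, a rough regex-based sketch of the same check (the HTML string below is just a stand-in for a fetched page):

```ruby
# Stand-in for html = open('http://adobe.com').read
html = '<html><head><meta http-equiv="refresh" content="0; url=http://www.adobe.com/"></head></html>'

# Pull the url= target out of a <meta http-equiv="refresh"> tag, if any.
if url = html[/http-equiv=["']refresh["'][^>]*url=([^"'>]+)/i, 1]
  puts url  # → http://www.adobe.com/
end
```

A regex is more fragile than a real HTML parser, but it is often good enough for a one-off scan of 150 URLs.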
Reading JavaScript
…you're on your own, here. :) You could attempt to parse the JavaScript code yourself, but you'd need to actually run the JS to find out if it ever actually redirects to another page or not.
I've seen this done very successfully with the resolv std library.
require 'resolv'
["google.com", "ruby-lang.org"].map do |domain|
  [domain, Resolv.getaddress(domain)]
end
The mechanize way:
require 'mechanize'
Mechanize.new.head('http://google.com').uri.host
#=> "www.google.com.ph"

How to get long URL from short URL?

Using Ruby, how do I convert the short URLs (tinyURL, bitly etc) to the corresponding long URLs?
I don't use Ruby, but the general idea is to send an HTTP HEAD request to the server, which in turn will return a 301 response (Moved Permanently) with a Location header that contains the long URL:
HEAD /5b2su2 HTTP/1.1
Host: tinyurl.com
Accept: */*
RESPONSE:
HTTP/1.1 301 Moved Permanently
Location: http://stackoverflow.com
Content-type: text/html
Date: Sat, 23 May 2009 18:58:24 GMT
Server: TinyURL/1.6
This is much faster than fetching the actual URL, and you don't really want to fetch the redirected page anyway. It also plays nice with the TinyURL service.
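In Ruby, that HEAD request could look like the sketch below. To keep the example runnable offline, a throwaway local socket server stands in for tinyurl.com and answers with the canned 301 shown above:

```ruby
require 'socket'
require 'net/http'

# Tiny stand-in for tinyurl.com: answer any request with a 301 redirect.
server = TCPServer.new('127.0.0.1', 0)
port   = server.addr[1]

Thread.new do
  client = server.accept
  # Consume the request line and headers up to the blank line.
  while (line = client.gets) && line != "\r\n"; end
  client.write "HTTP/1.1 301 Moved Permanently\r\n" \
               "Location: http://stackoverflow.com\r\n" \
               "Content-Length: 0\r\n\r\n"
  client.close
end

# The actual technique: a HEAD request, reading only the Location header.
res = Net::HTTP.start('127.0.0.1', port) { |http| http.head('/5b2su2') }
puts res.code         # → 301
puts res['Location']  # → http://stackoverflow.com
```

Against the real service you would simply start the connection to tinyurl.com port 80 instead of the local stand-in.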
Look into any HTTP or curl APIs within ruby. It should be fairly easy.
You can use the httpclient rubygem to get the headers
#!/usr/bin/env ruby
require 'rubygems'
require 'httpclient'
client = HTTPClient.new
result = client.head(ARGV[0])
puts result.header['Location']
There is a great wrapper for the bitly API in Python available here:
http://code.google.com/p/python-bitly/
So there must be something similar for Ruby.
