Creating a single page proxy using Ruby Sinatra

I am trying to use Ruby Sinatra to create a simple proxy for a specific web page. I can do it in C#, but I can't work out how to do it in Sinatra. The C# code is below:
<%@ WebHandler Language="C#" Class="Map" %>
using System;
using System.Web;
using System.Net;
using System.IO;

public class Map : IHttpHandler {
    static void CopyStream(Stream input, Stream output)
    {
        byte[] buffer = new byte[0x1000];
        int read;
        while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
            output.Write(buffer, 0, read);
    }

    public void ProcessRequest(HttpContext context)
    {
        string gmapUri = string.Format("http://maps.google.com/maps/api/staticmap{0}", context.Request.Url.Query);
        WebRequest request = WebRequest.Create(gmapUri);
        using (WebResponse response = request.GetResponse())
        {
            context.Response.ContentType = response.ContentType;
            Stream responseStream = response.GetResponseStream();
            CopyStream(responseStream, context.Response.OutputStream);
        }
    }

    public bool IsReusable {
        get { return false; }
    }
}
The Ruby Sinatra code I have tried is as follows:
require 'rubygems'
require 'sinatra'
get '/mapsproxy/staticmap' do
  request.path_info = 'http://maps.google.com/maps/api/staticmap'
  pass
end
I am assuming that the Sinatra one does not work (I get a 404) as it is only passing the request to pages in the same domain. Any help would be greatly appreciated.
EDIT:
With the Tin Man's help I've come up with a nice succinct solution, which works well for me:
get '/proxy/path' do
  URI.parse(<URI> + request.query_string.gsub("|", "%7C")).read
end
Thanks for all the help.

If you want your Sinatra app to retrieve the URL, you'll need to fire up a HTTP client of some sort:
get '/mapsproxy/staticmap' do
  require 'open-uri'
  open('http://maps.google.com/maps/api/staticmap').read
end
I think this will work and is about as minimal as you can get.
You could use HTTPClient if you need more tweakability.
Also, I think that Rack can do it. Sinatra is built on top of Rack, but it's been a while since I played at that level.
I still need to find a way to extract the contentType from the response
From the Open-URI docs:
The opened file has several methods for meta information as follows since
it is extended by OpenURI::Meta.
open("http://www.ruby-lang.org/en") {|f|
  f.each_line {|line| p line}
  p f.base_uri         # <URI::HTTP:0x40e6ef2 URL:http://www.ruby-lang.org/en/>
  p f.content_type     # "text/html"
  p f.charset          # "iso-8859-1"
  p f.content_encoding # []
  p f.last_modified    # Thu Dec 05 02:45:02 UTC 2002
}
For your purposes something like this should work:
content_type = ''
body = open("http://www.ruby-lang.org/en") {|f|
  content_type = f.content_type # "text/html"
  f.read
}
I haven't tested that, but I think the return value of the block will be assigned to body. If that doesn't work then try:
content_type = ''
body = ''
open("http://www.ruby-lang.org/en") {|f|
  content_type = f.content_type # "text/html"
  body = f.read
}
but I think the first will work.
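The block-return behaviour isn't specific to open-uri: any open-style method in Ruby returns the value of its block. A minimal sketch using File.open and a temp file (no network needed) to confirm it:

```ruby
require "tempfile"

# File.open, like OpenURI's open, returns the value of the block it yields to.
file = Tempfile.new("demo")
file.write("hello")
file.flush

content_type = nil
body = File.open(file.path) do |f|
  content_type = "text/plain" # stand-in for f.content_type
  f.read                      # last expression becomes the block's (and open's) return value
end
# body is now "hello", so the first variant works
```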

With the help of the Tin Man and TK-421 I've worked out a solution, see the Sinatra route below:
get '/proxy/path' do
  require 'open-uri'
  uri = URI.parse(<URI>)
  getresult = uri.read
  halt 200, {'Content-Type' => getresult.content_type}, getresult
end
Just replace the <URI> with the page you require, and you're good to go.
After some more playing this is what I've come up with:
get '/proxy/path' do
  URI.parse(<URI> + request.query_string.gsub("|", "%7C")).read
end
As mentioned elsewhere, you need to require 'open-uri' at the top of the code. The reason for the gsub is that URI.parse fails if pipe characters are left in, and my browser doesn't encode them automatically.

Related

Mock method in class using MiniTest

I'm running into an issue mocking a method within a class using Ruby's MiniTest::Mock and stub feature and it's driving me insane. What I am trying to do is set up a way to get an access token by calling a Ruby API which will get the actual access token from a 3rd party site.
This is the class all work will be done in.
class ThisClass
  attr_reader :hauler

  def initialize(hauler)
    @hauler = hauler
  end

  def get_access_token
    access_token = Rails.cache.read("access_token_#{hauler.id}")
    if access_token.blank?
      access_token = ext_access_token
      Rails.cache.write("access_token_#{@hauler.id}", access_token, { expires_in: 3600 })
    end
    access_token
  end

  def ext_access_token
    # call to external url to get access_token
    # Successful response will be { "data": { "authToken": "new-token"} }
    url = URI.parse("http://www.fake-login.com/auth/sign_in")
    res = Net::HTTP.post_form(url, "userName" => @hauler[:client_id], "password" => @hauler[:client_secret])
    json_response = JSON.parse(res.body)
    json_response["data"]["authToken"]
  end
end
The test is as follows
class ThisClassTest < ActiveSupport::TestCase
  test "Get Access Token" do
    hauler = haulers(:one)
    tc = ThisClass.new(hauler)
    mock_client = MiniTest::Mock.new
    mock_client.expect :ext_access_token, "\"{ \"data\": { \"authToken\": \"new-token\"} }\""
    ThisClass.stub(:ext_access_token, mock_client) do
      puts tc.get_access_token
    end
    assert_equal true, true
  end
end
When I run my tests, I get the following error
Error:
ThisClassTest#test_Get_Access_Token:
NameError: undefined method `ext_access_token' for class `ThisClass'
I'm clearly doing something wrong, since all I want is for the ext_access_token method to return the same data string so I can run logic against it, but I'm very much failing. ThisClass is fairly simplistic, but it outlines the setup I'll be going with moving forward for more complex methods based on the return from the external site.
The test can't find the ext_access_token method because it's looking for a class method on ThisClass, rather than an instance method.
So what you need is something like:
tc.stub :ext_access_token, mock_client do
  puts tc.get_access_token
end

How to take full page screenshots with Watir and geckodriver + Firefox?

I upgraded my Watir / Firefox automation stack to the latest version, and added geckodriver with it. I was surprised to see that now screenshots are of the viewport only by default.
require 'watir'
require 'mini_magick'
b = Watir::Browser.new :firefox
b.goto "https://daringfireball.net"
base = b.screenshot.base64
blob = Base64.decode64(base)
image = MiniMagick::Image.read(blob)
image.height
=> 1760 # macOS 'Retina' resolution doubling
b.execute_script "return window.innerHeight"
=> 880 # 880 * 2 = 1760
b.execute_script "return document.documentElement.scrollHeight"
=> 34692
geckodriver does not have any API for full page screenshots, though reintroducing this feature is planned (on an infinite timescale).
How can I take screenshots of the full page with Watir driving Firefox without rolling back my environment?
Using Watir's .execute_script, it is possible to repeatedly take screenshots of the viewport while moving the scroll position. It is then possible to stitch images together using MiniMagick.
I developed the watir-screenshot-stitch gem to encapsulate my best approach to solving this problem, though it comes with caveats, which you can read about there. It is also memory intensive and can be slow.
This is not a true full-page screenshot solution, and I would gladly accept any alternative approaches that improve on this.
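The stitch approach reduces to some simple arithmetic: how many viewport-sized captures cover the page, and at which scroll offsets to take them. A sketch using the heights from the question:

```ruby
# Heights reported in the question (CSS pixels, before Retina doubling).
viewport_height = 880    # window.innerHeight
page_height     = 34_692 # document.documentElement.scrollHeight

# Number of viewport captures needed to cover the whole page.
captures = (page_height.to_f / viewport_height).ceil

# Scroll offsets to capture at; the last one is clamped so the final
# screenshot ends exactly at the bottom of the page (overlapping the
# previous capture rather than scrolling past the end).
offsets = (0...captures).map { |i| [i * viewport_height, page_height - viewport_height].min }
```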
I solved the problem in C#, but I guess the solution can be rewritten in any language. I used a JavaScript library called HTML2Canvas to generate the full page screenshots. Here is the C# code:
[Test]
public void TakingHTML2CanvasFullPageScreenshot()
{
    using (var driver = new ChromeDriver())
    {
        driver.Manage().Timeouts().PageLoad = TimeSpan.FromSeconds(5);
        driver.Navigate().GoToUrl(@"https://automatetheplanet.com");
        IJavaScriptExecutor js = driver;
        var html2canvasJs = File.ReadAllText($"{GetAssemblyDirectory()}html2canvas.js");
        js.ExecuteScript(html2canvasJs);
        string generateScreenshotJS = @"function genScreenshot () {
            var canvasImgContentDecoded;
            html2canvas(document.body, {
                onrendered: function (canvas) {
                    window.canvasImgContentDecoded = canvas.toDataURL(""image/png"");
                }});
        }
        genScreenshot();";
        js.ExecuteScript(generateScreenshotJS);
        var wait = new WebDriverWait(driver, TimeSpan.FromSeconds(10));
        wait.IgnoreExceptionTypes(typeof(InvalidOperationException));
        wait.Until(
            wd =>
            {
                string response = (string)js.ExecuteScript
                    ("return (typeof canvasImgContentDecoded === 'undefined' || canvasImgContentDecoded === null)");
                if (string.IsNullOrEmpty(response))
                {
                    return false;
                }
                return bool.Parse(response);
            });
        wait.Until(wd => !string.IsNullOrEmpty((string)js.ExecuteScript("return canvasImgContentDecoded;")));
        var pngContent = (string)js.ExecuteScript("return canvasImgContentDecoded;");
        pngContent = pngContent.Replace("data:image/png;base64,", string.Empty);
        byte[] data = Convert.FromBase64String(pngContent);
        var tempFilePath = Path.GetTempFileName().Replace(".tmp", ".png");
        Image image;
        using (var ms = new MemoryStream(data))
        {
            image = Image.FromStream(ms);
        }
        image.Save(tempFilePath, ImageFormat.Png);
    }
}
You can find more examples and explanations in the article.
It is now possible to do this in Firefox, employing a geckodriver feature. As far as I know, this feature is not baked into Selenium / probably not a part of the W3C spec.
require 'watir'

browser = Watir::Browser.new :firefox
bridge = browser.driver.session_storage.instance_variable_get(:@bridge)
server_uri = bridge.instance_variable_get(:@http).instance_variable_get(:@server_url)
sid = bridge.instance_variable_get(:@session_id)
driver_path = "session/#{sid}/moz/screenshot/full"
request_url = server_uri.to_s + driver_path
url = URI.parse(request_url)
req = Net::HTTP::Get.new(request_url)
raw = Net::HTTP.start(url.host, url.port) { |http| http.request(req) }.body
base64_screenshot = JSON.parse(raw, symbolize_names: true)[:value]
This approach is also now an option in the watir-screenshot-stitch gem:
require 'watir-screenshot-stitch'
b = Watir::Browser.new :firefox
b.goto "https://github.com/mozilla/geckodriver/issues/570"
base64_screenshot = b.base64_geckodriver

Get image from parsing the response

I am trying to get an image from the response body. Right now this gives me the entire HTML page. I see the tag but cannot specifically retrieve it. Any help would be great!
#Get Request
encoded_response = response.body.force_encoding("UTF-8")
url = URI.parse(encoded_response)
req = Net::HTTP::Get.new(url.to_s)
res = Net::HTTP.start(url.host, url.port) { |http|
  http.request(req)
}
puts res.img
For those about to ask, I had to encode the response because I was getting a Bad URI error.
Have you looked at a parsing library like Nokogiri?
require 'nokogiri'

html = Nokogiri::HTML.parse(response.body.force_encoding("UTF-8"))
image_urls = html.css('img').map { |image_tag| image_tag["src"] }
For "downloading" the image, see here: Download an image from a URL?
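Nokogiri is the right tool for real-world HTML, but the same select-and-map idea can be shown with the stdlib's REXML on a small, well-formed snippet (hypothetical markup):

```ruby
require "rexml/document"

# Hypothetical, well-formed snippet; real pages need Nokogiri's lenient parser.
html = <<~HTML
  <html><body>
    <p>Some text</p>
    <img src="/images/a.png"/>
    <img src="/images/b.jpg"/>
  </body></html>
HTML

# Select every <img> element and collect its src attribute.
doc = REXML::Document.new(html)
image_urls = doc.elements.to_a("//img").map { |img| img.attributes["src"] }
# image_urls == ["/images/a.png", "/images/b.jpg"]
```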

How to export a Confluence "Space" to PDF using remote API

How can I export a Confluence 'space' as a pdf? It looks like it might still be supported in Confluence 5.0 using the XML-RPC API. I cannot find an example of what to call, though.
https://developer.atlassian.com/display/CONFDEV/Remote+API+Specification+for+PDF+Export#RemoteAPISpecificationforPDFExport-XML-RPCInformation
That link says calls should be prefixed with pdfexport, but then doesn't list any of the calls or give an example.
This works using Bob Swift's SOAP library ('org.swift.common:confluence-soap:5.4.1'). I'm using this in a gradle plugin, so you'll need to change a few things
void exportSpaceAsPdf(spaceKey, File outputFile) {
    // Setup Pdf Export Service
    PdfExportRpcServiceLocator serviceLocator = new PdfExportRpcServiceLocator()
    serviceLocator.setpdfexportEndpointAddress("${url}/rpc/soap-axis/pdfexport")
    serviceLocator.setMaintainSession(true)
    def pdfService = serviceLocator.getpdfexport()

    // Login
    def token = pdfService.login(user, password)

    // Perform Export
    def pdfUrl = pdfService.exportSpace(token, spaceKey)

    // Download Pdf
    HttpClient client = new DefaultHttpClient()
    HttpGet httpget = new HttpGet(pdfUrl)
    httpget.addHeader(
        BasicScheme.authenticate(
            new UsernamePasswordCredentials(user, password), "UTF-8", false))
    HttpResponse response = client.execute(httpget)
    HttpEntity entity = response.getEntity()
    if (entity != null) {
        InputStream inputStream = entity.getContent()
        FileOutputStream fos = new FileOutputStream(outputFile)
        int inByte
        while ((inByte = inputStream.read()) != -1)
            fos.write(inByte)
        inputStream.close()
        fos.close()
    } else {
        throw new GradleException("""Cannot Export Space to PDF:
            Space: ${spaceKey}
            Dest: ${outputFile.absolutePath}
            URL: ${pdfUrl}
            Status: ${response.getStatusLine()}
        """)
    }
}
I know this is a PHP example, not Ruby, but you can check out the XML-RPC example in VoycerAG's PHP project on Github at https://github.com/VoycerAG/confluence-xmlrpc-pdf-export/blob/master/src/Voycer/Confluence/Command/PdfExportCommand.php ... hope it helps.
Basically you just need to make a call to the login method and user the authentication token returned to make a call to the exportSpace method. That in turn gives you back a URL which an authenticated user can then download the PDF from.
Turns out the SOAP API is the only currently available API for exporting a space.
Using the Savon library in Ruby here:
require 'savon'

# create a client for the service
# http://<confluence-install>/rpc/soap-axis/pdfexport?wsdl
client = Savon.client(wsdl: 'https://example.atlassian.net/wiki/rpc/soap-axis/pdfexport?wsdl', read_timeout: 200)

# call the 'login' operation
response = client.call(:login, message: {username: "user", password: "pass"})
token = response.body[:login_response][:login_return]

response = client.call(:export_space, message: {token: token, space_key: "SPACE KEY"})

HTTPBuilder - How can I get the HTML content of a web page?

I need to extract the HTML of a web page
I'm using HTTPBuilder in Groovy, making the following GET:
def http = new HTTPBuilder('http://www.google.com/search')
http.request(Method.GET) {
    requestContentType = ContentType.HTML
    response.success = { resp, reader ->
        println "resp: " + resp
        println "READER: " + reader
    }
    response.failure = { resp, reader ->
        println "Failure"
    }
}
The response I get does not contain the same HTML I can see when I view the source of www.google.com/search. In fact, it isn't even HTML, and does not contain the same info I can see in the page source.
I've tried setting different headers (for example, headers.Accept = 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8', headers.Accept = 'text/html', setting the user-agent, etc.), but the result is the same.
How can I get the HTML of www.google.com/search (or any web page) using HTTPBuilder?
Why use HTTPBuilder? You might instead use
def url = "http://www.google.com/".toURL()
println url.text
to extract the content of the webpage.
That happens because HTTPBuilder auto-parses the result based on the content type.
To get the raw HTML, try getting the text from the entity:
def htmlResult = http.get(uri: url, contentType: TEXT) { resp ->
    return resp.getEntity().getContent().getText()
}
