CGI in a Ruby Sinatra server

I am developing a simple web app with Sinatra and Ruby, and I have two files: app.rb is my Sinatra app and test.cgi is a CGI program. I need to execute the CGI script, for example:
#!/usr/bin/env ruby
# encoding: utf-8
# app.rb
require "sinatra"

get "/form" do
  File.read("my_web_form.html")
end

post "/form" do
  # I need to execute the CGI script, but this does not work:
  cgi "test.cgi"
end
My CGI script is written in a custom language (I have an interpreter I created myself), and I am trying to embed it into web apps. Thanks.

I've done some searching and I'm not able to find a way to "render CGI" in the way you're trying (which is the intuitive way).
However, it does seem that you can run Sinatra from a CGI script. See here for a code example.
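For reference, that inverse setup might look roughly like this (a sketch, assuming Rack's bundled CGI handler, Rack::Handler::CGI, which moved to the rackup gem in Rack 3):

#!/usr/bin/env ruby
# a Sinatra app served as a CGI program
require 'sinatra/base'

class App < Sinatra::Base
  get('/') { 'hello from CGI' }
end

# hand the single CGI invocation to the Rack app
Rack::Handler::CGI.run App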
I was actually trying to do this a few days ago, and I guess I gave up. But seeing your question encouraged me to figure it out. See the following example of how to render CGI from Sinatra:
A sample CGI file; say it's at ./app.cgi and chmod +x has been run on it:
#!/usr/bin/env ruby
require "cgi"

cgi = CGI.new("html4")
cgi.out {
  cgi.html {
    cgi.head { "\n" + cgi.title { "This Is a Test" } } +
    cgi.body {
      "\n" + cgi.h1 { "This is a Test" } + "\n"
    }
  }
}
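Running this from a shell (./app.cgi) should print the HTTP headers, a blank line, and then the HTML body; that headers/body split is exactly what the helper below parses.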
A module which defines a render_cgi method:
class RenderCgiError < StandardError
end

module RenderCgi
  def render_cgi(filepath, options={})
    headers_string, body = run_cgi_and_parse_output(filepath, options)
    headers_hash = parse_headers_string(headers_string)
    response = Rack::Response.new
    headers_hash.each { |k, v| response.header[k] = v }
    response.body << body
    response
  end

  private

  def run_cgi_and_parse_output(filepath, options={})
    options_string = options.reduce("") { |str, (k, v)| str << "#{k}=#{v} " }
    # make sure options has at least one key-val pair, otherwise running the CGI may hang
    if options_string.split("=").select { |part| (part&.length || -1) > 0 }.length < 2
      raise(RenderCgiError, "one truthy key and associated truthy val is required for options")
    end
    # the script is executable, so its shebang line picks the right interpreter
    output = `#{filepath} #{options_string}`
    # CGI output separates headers from body with a blank line (\r\n\r\n)
    headers_string, body = output.split("\r\n\r\n", 2)
    [headers_string, body]
  end

  def parse_headers_string(string)
    string.split("\n").reduce({}) do |results, line|
      key, val = line.split(": ")
      results[key.chomp] = val.chomp
      results
    end
  end
end
and a Sinatra app which runs it
require 'sinatra'

class MyApp < Sinatra::Base
  include RenderCgi

  get '/' do
    render_cgi("./app.cgi", { "foo" => "bar" })
  end
end

MyApp.run!
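With this in place, a request to / executes ./app.cgi with foo=bar as its argument string and relays the script's headers and body to the browser. Nothing in the helper is Ruby-specific: it just runs an executable and parses CGI-style output, so it should work just as well for a CGI script written in your own interpreted language, provided the script emits headers, then a blank line, then the body.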


How to test HTTParty API call with Ruby and RSpec

I am using the HTTParty gem to call the GitHub API and fetch the list of a user's repos.
It is a very simple Sinatra application that displays a user's favourite programming language, based on the most common language appearing in their repos.
I am a bit stuck on how to write an RSpec expectation that mocks out the actual API call and instead just checks that JSON data is being returned.
I have a mock .json file but am not sure how to use it in my test.
Any ideas?
github_api.rb
require 'httparty'

class GithubApi
  attr_reader :username, :data, :languages

  def initialize(username)
    @username = username
    @response = HTTParty.get("https://api.github.com/users/#{@username}/repos")
    @data = JSON.parse(@response.body)
  end
end
github_api_spec.rb
require './app/models/github_api'
require 'spec_helper'

describe GithubApi do
  let(:github_api) { GithubApi.new('mock_user') }

  it "receives a json response" do
  end
end
Rest of the files for clarity:
results.rb
require 'httparty'
require_relative 'github_api'

class Results
  def initialize(github_api = Github.new(username))
    @github_api = github_api
    @languages = []
  end

  def get_languages
    @github_api.data.each do |repo|
      @languages << repo["language"]
    end
  end

  def favourite_language
    get_languages
    @languages.group_by(&:itself).values.max_by(&:size).first
  end
end
application_controller.rb
require './config/environment'
require 'sinatra/base'
require './app/models/github_api'

class ApplicationController < Sinatra::Base
  configure do
    enable :sessions
    set :session_secret, "#3x!ilt£"
    set :views, 'app/views'
  end

  get "/" do
    erb :index
  end

  post "/user" do
    @github = GithubApi.new(params[:username])
    @results = Results.new(@github)
    @language = @results.favourite_language
    session[:language] = @language
    session[:username] = params[:username]
    redirect '/results'
  end

  get "/results" do
    @language = session[:language]
    @username = session[:username]
    erb :results
  end

  run! if app_file == $0
end
There are multiple ways you could approach this problem.
You could, as @anil suggested, use a library like webmock to mock the underlying HTTP call. You could also do something similar with VCR (https://github.com/vcr/vcr), which records the result of an actual call to the HTTP endpoint and plays back that response on subsequent requests.
But, given your question, I don't see why you couldn't just use an RSpec double. I'll show you how below. But first, it would be a bit easier to test the code if it were not all in the constructor.
github_api.rb
require 'httparty'

class GithubApi
  attr_reader :username

  def initialize(username)
    @username = username
  end

  def favorite_language
    # method to calculate which language is used most by username
  end

  def languages
    # method to grab languages from repos
  end

  def repos
    # memoize so repeated calls only hit the API once
    @repos ||= begin
      response = HTTParty.get("https://api.github.com/users/#{username}/repos")
      JSON.parse(response.body)
    end
  end
end
Note that you do not need to reference the @username variable in the URL because you have an attr_reader.
github_api_spec.rb
require './app/models/github_api'
require 'spec_helper'

describe GithubApi do
  subject(:api) { described_class.new(username) }

  let(:username) { 'username' }

  describe '#repos' do
    let(:github_url) { "https://api.github.com/users/#{username}/repos" }
    let(:github_response) { instance_double(HTTParty::Response, body: github_response_body) }
    let(:github_response_body) { 'response_body' }

    before do
      allow(HTTParty).to receive(:get).and_return(github_response)
      allow(JSON).to receive(:parse)
      api.repos
    end

    it 'fetches the repos from Github api' do
      expect(HTTParty).to have_received(:get).with(github_url)
    end

    it 'parses the Github response' do
      expect(JSON).to have_received(:parse).with(github_response_body)
    end
  end
end
Note that there is no need to actually load or parse any real JSON. What we're testing here is that we made the correct HTTP call and that we called JSON.parse on the response. Once you start testing the languages method you'd need to actually load and parse your test file, like this:
let(:parsed_response) { JSON.parse(File.read('path/to/test/file.json')) }
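A follow-up spec could then look like this (a sketch, not from the original answer: it nests inside the describe GithubApi block above and assumes languages maps over repos and collects each repo's "language" value, which the stubbed method above leaves unimplemented):

describe '#languages' do
  let(:parsed_response) do
    # stands in for the parsed fixture file
    [{ 'language' => 'Ruby' }, { 'language' => 'Ruby' }, { 'language' => 'JavaScript' }]
  end

  before do
    allow(api).to receive(:repos).and_return(parsed_response)
  end

  it 'collects the language of every repo' do
    expect(api.languages).to eq(%w[Ruby Ruby JavaScript])
  end
end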
You can mock those API calls using webmock (https://github.com/bblimke/webmock) and send back mock.json via the stub. This post, https://robots.thoughtbot.com/how-to-stub-external-services-in-tests, walks you through the setup of webmock with RSpec (the tests in the post mock a GitHub API call too).
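A minimal webmock stub along those lines might look like this (a sketch; the fixture path spec/fixtures/mock.json is an assumption):

require 'webmock/rspec'

stub_request(:get, "https://api.github.com/users/mock_user/repos")
  .to_return(
    status: 200,
    body: File.read('spec/fixtures/mock.json'),
    headers: { 'Content-Type' => 'application/json' }
  )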

Sidekiq mechanize overwritten instance

I am building a simple web spider using Sidekiq and Mechanize.
When I run this for one domain, it works fine. When I run it for multiple domains, it fails. I believe the reason is that web_page gets overwritten when another Sidekiq worker instantiates it, but I am not sure whether that's true or how to fix it.
# my scrape_search controller's create action searches on google.
def create
  @scrape = ScrapeSearch.build(keywords: params[:keywords], profession: params[:profession])
  agent = Mechanize.new
  scrape_search = agent.get('http://google.com/') do |page|
    search_result = page.form...
    search_result.css("h3.r").map do |link|
      result = link.at_css('a')['href'] # Narrowing down to real search results
      @domain = Domain.new(some params)
      ScrapeDomainWorker.perform_async(@domain.url, @domain.id, remaining_keywords)
    end
  end
end
I'm creating a Sidekiq job per domain. Most of the domains I'm looking for should contain just a few pages, so there's no need for sub-jobs per page.
This is my worker:
class ScrapeDomainWorker
  include Sidekiq::Worker
  ...

  def perform(domain_url, domain_id, keywords)
    @domain = Domain.find(domain_id)
    @domain_link = @domain.protocol + '://' + domain_url
    @keywords = keywords
    # First we scrape the homepage and get the first links
    @domain.to_parse = ['/'] # to_parse is an array of PATHS to parse for the domain
    mechanize_path('/')
    @domain.verified << '/' # verified is an Array field containing valid domain paths
    get_paths(@web_page) # Now we should have to_parse populated with homepage links
    @domain.scraped = 1 # Loop counter
    while @domain.scraped < 100
      @domain.to_parse.each do |path|
        @domain.to_parse.delete(path)
        @domain.scraped += 1
        mechanize_path(path) # We create a Nokogiri HTML doc with mechanize for the valid path
        ...
        get_paths(@web_page) # Fire this to repopulate to_parse !!!
      end
    end
    @domain.save
  end

  def mechanize_path(path)
    agent = Mechanize.new
    begin
      @web_page = agent.get(@domain_link + path)
    rescue Exception => e
      puts "Mechanize Exception for #{path} :: #{e.message}"
    end
  end

  def get_paths(web_page)
    paths = web_page.links.map { |link| link.href.gsub((@domain.protocol + '://' + @domain.url), "") } ## This works when I scrape a single domain, but fails with ".gsub for nil" when I scrape a few domains.
    paths.uniq.each do |path|
      @domain.to_parse << path
    end
  end
end
This works when I scrape a single domain, but fails with ".gsub for nil" on web_page when I scrape several domains.
You can wrap your code in another class, and then create an object of that class within your worker:
class ScrapeDomainWrapper
  def initialize(domain_url, domain_id, keywords)
    # ...
  end

  def mechanize_path(path)
    # ...
  end

  def get_paths(web_page)
    # ...
  end
end
And your worker:
class ScrapeDomainWorker
  include Sidekiq::Worker

  def perform(domain_url, domain_id, keywords)
    ScrapeDomainWrapper.new(domain_url, domain_id, keywords)
  end
end
Also, bear in mind that Mechanize::Page#links may be nil.
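The ".gsub for nil" error in the question also suggests some links have no href. A nil-guarded version of the question's get_paths could look like this (a sketch, using the field names from the question):

def get_paths(web_page)
  return if web_page.nil? # mechanize_path rescues errors, so @web_page can be nil

  base = @domain.protocol + '://' + @domain.url
  paths = web_page.links.map(&:href).compact.map { |href| href.gsub(base, '') }
  paths.uniq.each { |path| @domain.to_parse << path }
end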

Efficient way to render ton of JSON on Heroku

I built a simple API with one endpoint. It scrapes files and currently has around 30,000 records. I would ideally like to be able to fetch all of those records as JSON with one HTTP call.
Here is my Sinatra view code:
require 'sinatra'
require 'json'
require 'mongoid'

Mongoid.identity_map_enabled = false

get '/' do
  content_type :json
  Book.all
end
I've tried the following:
Using multi_json:
require './require.rb'
require 'sinatra'
require 'multi_json'

MultiJson.engine = :yajl
Mongoid.identity_map_enabled = false

get '/' do
  content_type :json
  MultiJson.encode(Book.all)
end
The problem with this approach is that I get Error R14 (Memory quota exceeded). I get the same error when I try the 'oj' gem.
I would just concatenate everything into one long Redis string, but Heroku's Redis service is $30 per month for the instance size I would need (> 10 MB).
My current solution is to use a background task that creates objects and stuffs them full of jsonified records at near the MongoDB document size limit (16 MB). The problems with this approach: it still takes nearly 30 seconds to render, and I have to run post-processing on the receiving app to properly extract the JSON from the objects.
Does anyone have a better idea for how I can render JSON for 30k records in one call without switching away from Heroku?
Sounds like you want to stream the JSON directly to the client instead of building it all up in memory. It's probably the best way to cut down memory usage. You could for example use yajl to encode JSON directly to a stream.
Edit: I rewrote the entire code for yajl, because its API is much more compelling and allows for much cleaner code. I also included an example for reading the response in chunks. Here's the streamed JSON array helper I wrote:
require 'yajl'

module JsonArray
  class StreamWriter
    def initialize(out)
      @out = out
      @encoder = Yajl::Encoder.new
      @first = true
    end

    def <<(object)
      @out << ',' unless @first
      @out << @encoder.encode(object)
      @out << "\n"
      @first = false
    end
  end

  def self.write_stream(app, &block)
    app.stream do |out|
      out << '['
      block.call StreamWriter.new(out)
      out << ']'
    end
  end
end
Usage:
require 'sinatra'
require 'mongoid'

Mongoid.identity_map_enabled = false

# use a server that supports streaming
set :server, :thin

get '/' do
  content_type :json
  JsonArray.write_stream(self) do |json|
    Book.all.each do |book|
      json << book.attributes
    end
  end
end
To decode on the client side you can read and parse the response in chunks, for example with em-http. Note that this solution requires the client's memory to be large enough to store the entire objects array. Here's the corresponding streamed parser helper:
require 'yajl'

module JsonArray
  class StreamParser
    def initialize(&callback)
      @parser = Yajl::Parser.new
      @parser.on_parse_complete = callback
    end

    def <<(str)
      @parser << str
    end
  end

  def self.parse_stream(&callback)
    StreamParser.new(&callback)
  end
end
Usage:
require 'em-http'

parser = JsonArray.parse_stream do |object|
  # block is called when we are done parsing the
  # entire array; now we can handle the data
  p object
end

EventMachine.run do
  http = EventMachine::HttpRequest.new('http://localhost:4567').get
  http.stream do |chunk|
    parser << chunk
  end
  http.callback do
    EventMachine.stop
  end
end
Alternative solution
You could actually simplify the whole thing a lot if you give up the need to generate a "proper" JSON array. What the above solution generates is JSON in this form:
[{ ... book_1 ... }
,{ ... book_2 ... }
,{ ... book_3 ... }
...
,{ ... book_n ... }
]
We could, however, stream each book as a separate JSON document and thus reduce the format to the following:
{ ... book_1 ... }
{ ... book_2 ... }
{ ... book_3 ... }
...
{ ... book_n ... }
The code on the server would then be much simpler:
require 'sinatra'
require 'mongoid'
require 'yajl'

Mongoid.identity_map_enabled = false
set :server, :thin

get '/' do
  content_type :json
  encoder = Yajl::Encoder.new
  stream do |out|
    Book.all.each do |book|
      out << encoder.encode(book.attributes) << "\n"
    end
  end
end
As well as the client:
require 'em-http'
require 'yajl'

parser = Yajl::Parser.new
parser.on_parse_complete = Proc.new do |book|
  # this will now be called separately for every book
  p book
end

EventMachine.run do
  http = EventMachine::HttpRequest.new('http://localhost:4567').get
  http.stream do |chunk|
    parser << chunk
  end
  http.callback do
    EventMachine.stop
  end
end
The great thing is that now the client does not have to wait for the entire response, but instead parses every book separately. However, this will not work if one of your clients expects one single big JSON array.

Testing STDOUT output in Rspec

I am trying to build a spec for this statement. It is easy with 'puts'
print "'#{@file}' doesn't exist: Create Empty File (y/n)?"
RSpec 3.0+
RSpec 3.0 added a new output matcher for this purpose:
expect { my_method }.to output("my message").to_stdout
expect { my_method }.to output("my error").to_stderr
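Applied to the question's prompt (assuming the print call is wrapped in a method, here called prompt_create_file), the matcher also accepts a regexp:

expect { prompt_create_file }.to output(/Create Empty File \(y\/n\)\?/).to_stdout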
Minitest
Minitest also has something called capture_io:
out, err = capture_io do
  my_method
end

assert_equal "my message", out
assert_equal "my error", err
RSpec < 3.0 (and others)
For RSpec < 3.0 and other frameworks, you can use the following helper. This will allow you to capture whatever is sent to stdout and stderr, respectively:
require 'stringio'

def capture_stdout(&blk)
  old = $stdout
  $stdout = fake = StringIO.new
  blk.call
  fake.string
ensure
  $stdout = old
end

def capture_stderr(&blk)
  old = $stderr
  $stderr = fake = StringIO.new
  blk.call
  fake.string
ensure
  $stderr = old
end
Now, when you have a method that should print something to the console
def my_method
  # ...
  print "my message"
end
you can write a spec like this:
it 'should print "my message"' do
  printed = capture_stdout do
    my_method # do your actual method call
  end
  printed.should eq("my message")
end
If your goal is only to be able to test this method, I would do it like this:
class Executable
  def initialize(outstream, instream, file)
    @outstream, @instream, @file = outstream, instream, file
  end

  def prompt_create_file
    @outstream.print "'#{@file}' doesn't exist: Create Empty File (y/n)?"
  end
end

# when executing for real, you would do something like
#   Executable.new $stdout, $stdin, ARGV[0]
# when testing, you would do
describe 'Executable' do
  before { @input = '' }

  let(:instream)   { StringIO.new @input }
  let(:outstream)  { StringIO.new }
  let(:filename)   { File.expand_path '../testfile', __FILE__ }
  let(:executable) { Executable.new outstream, instream, filename }

  specify 'prompt_create_file prompts the user to create a new file' do
    executable.prompt_create_file
    outstream.string.should include "Create Empty File (y/n)"
  end
end
However, I want to point out that I would not test a method like this directly. Instead, I'd test the code that uses it. I was talking with a potential apprentice yesterday, and he was doing something very similar, so I sat down with him and we reimplemented a portion of the class; you can see that here.
I also have a blog that talks about this kind of thing.

Is there a way to flush html to the wire in Sinatra

I have a Sinatra app with a long-running process (a web scraper). I'd like the app to flush the results of the crawler's progress as the crawler is running, instead of at the end.
I've considered forking the request and doing something fancy with Ajax, but this is a really basic one-pager app that just needs to output a log to the browser as it happens. Any suggestions?
Update (2012-03-21)
As of Sinatra 1.3.0, you can use the new streaming API:
get '/' do
  stream do |out|
    out << "foo\n"
    sleep 10
    out << "bar\n"
  end
end
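How promptly those chunks reach the client still depends on the web server; some servers buffer the output, so an event-driven server such as Thin is a safe choice for streaming.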
Old Answer
Unfortunately you don't have a stream you can simply flush to (that would not work with Rack middleware). The result returned from a route block simply has to respond to each. The Rack handler will then call each with a block and flush each part of the body to the client as it is yielded.
All Rack response bodies have to respond to each and must always hand strings to the given block. Sinatra takes care of this for you if you just return a string.
A simple streaming example would be:
require 'sinatra'

get '/' do
  result = ["this", " takes", " some", " time"]
  class << result
    def each
      super do |str|
        yield str
        sleep 0.3
      end
    end
  end
  result
end
Now you could simply place all your crawling in the each method:
require 'sinatra'
require 'open-uri' # needed for open(url)

class Crawler
  def initialize(url)
    @url = url
  end

  def each
    yield "opening url\n"
    result = open(@url).read
    yield "searching for foo\n"
    if result.include? "foo"
      yield "found it\n"
    else
      yield "not there, sorry\n"
    end
  end
end

get '/' do
  Crawler.new 'http://mysite'
end
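Sinatra returns the Crawler instance as the response body; the Rack handler then calls its each method and flushes every yielded string to the client while the crawl is still running.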
