Serve HTML files stored on S3 on a Rack app - ruby

Say I have some HTML documents stored on S3 like this:
http://alan.aws-s3-bla-bla.com/posts/1.html
http://alan.aws-s3-bla-bla.com/posts/2.html
http://alan.aws-s3-bla-bla.com/posts/3.html
http://alan.aws-s3-bla-bla.com/posts/1/comments/1.html
http://alan.aws-s3-bla-bla.com/posts/1/comments/2.html
http://alan.aws-s3-bla-bla.com/posts/1/comments/3.html
etc, etc
I'd like to serve these with a Rack (preferably Sinatra) application, mapping the following routes:
get "/posts/:id" do
render "http://alan.aws-s3-bla-bla.com/posts/#{params[:id]}.html"
end
get "/posts/:posts_id/comments/:comments_id" do
render "http://alan.aws-s3-bla-bla.com/posts/#{params[:posts_id]}/comments/#{params[:comments_id]}.html"
end
Is this a good idea? How would I do it?

There would obviously be a wait while you grabbed the file, so you could cache it or set etags etc to help with that. I suppose it depends on how long you want to wait and how often it is accessed, its size etc as to whether it's worth storing the HTML locally or remotely. Only you can work that bit out.
If the last expression in the block is a string that will automatically be rendered, so there's no need to call render as long as you've opened the file as a string.
Here's how to grab an external file and put it into a tempfile:
require 'faraday'
require 'faraday_middleware'
#require 'faraday/adapter/typhoeus' # see https://github.com/typhoeus/typhoeus/issues/226#issuecomment-9919517 if you get a problem with the requiring
require 'typhoeus/adapters/faraday'
configure do
Faraday.default_connection = Faraday::Connection.new(
:headers => { :accept => 'text/plain', # maybe this is wrong
:user_agent => "Sinatra via Faraday"}
) do |conn|
conn.use Faraday::Adapter::Typhoeus
end
end
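require 'tempfile'
helpers do
  def grab_external_html(url)
    response = Faraday.get(url)
    # Tempfile.open with a block closes the file when the block ends,
    # so create the tempfile explicitly and hand back the open file instead.
    # The name prefix must not contain path separators, so don't use the URL itself.
    tempfile = Tempfile.new(['external', '.html'])
    tempfile.binmode
    tempfile.write(response.body)
    tempfile.rewind
    tempfile
  end
end
get "/posts/:id" do
  # build the URL from the route parameter; surely you'd do a bit more here…
  tempfile = grab_external_html("http://alan.aws-s3-bla-bla.com/posts/#{params[:id]}.html")
  tempfile.read
end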
This might work. You may also want to think about closing that tempfile, but the garbage collector and the OS should take care of it.
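The caching idea mentioned above could be sketched like this. It is a minimal in-memory cache with a time-to-live, not a definitive implementation: the class name HtmlCache, the ttl value and the injectable fetcher are all assumptions made for illustration, and the default fetcher simply uses Net::HTTP from the standard library rather than Faraday.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: keeps fetched bodies in memory for `ttl` seconds.
class HtmlCache
  def initialize(ttl: 60, fetcher: ->(url) { Net::HTTP.get(URI(url)) })
    @ttl = ttl
    @fetcher = fetcher   # anything that responds to call(url) and returns a String
    @entries = {}
    @lock = Mutex.new    # routes may run on several threads
  end

  # Returns the cached body if it is still fresh, otherwise fetches it again.
  def fetch(url, now: Time.now)
    @lock.synchronize do
      entry = @entries[url]
      return entry[:body] if entry && now - entry[:at] < @ttl

      body = @fetcher.call(url)
      @entries[url] = { body: body, at: now }
      body
    end
  end
end
```

In a Sinatra app you might create one HtmlCache in a configure block and call cache.fetch(url) as the last expression of the route, so the string it returns is rendered directly.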

Related

Creating a Ruby API

I have been tasked with creating a Ruby API that retrieves YouTube URLs. However, I am not sure of the proper way to create an 'API'... I wrote the code below as a Sinatra server that serves up JSON, but what exactly would be the definition of an API, and would this qualify as one? If this is not an API, how can I make it an API? Thanks in advance.
require 'open-uri'
require 'json'
require 'sinatra'
# get user input
puts "Please enter a search (separate words by commas):"
search_input = gets.chomp
puts
puts "Performing search on YOUTUBE ... go to '/videos' API endpoint to see the results and use the output"
puts
# define query parameters
api_key = 'my_key_here'
search_url = 'https://www.googleapis.com/youtube/v3/search'
params = {
part: 'snippet',
q: search_input,
type: 'video',
videoCaption: 'closedCaption',
key: api_key
}
# use search_url and query parameters to construct a url, then open and parse the result
uri = URI.parse(search_url)
uri.query = URI.encode_www_form(params)
# Kernel#open no longer opens URLs in Ruby 3.0+, so call URI.open from open-uri
result = JSON.parse(URI.open(uri).read)
# class to define attributes of each video and format into eventual json
class Video
  attr_accessor :title, :description, :url
  def initialize
    @title = nil
    @description = nil
    @url = nil
  end
  def to_hash
    {
      'title' => @title,
      'description' => @description,
      'url' => @url
    }
  end
  def to_json
    self.to_hash.to_json
  end
end
# create an array with top 3 search results
results_array = []
result["items"].take(3).each do |video|
  @video = Video.new
  @video.title = video["snippet"]["title"]
  @video.description = video["snippet"]["description"]
  @video.url = video["snippet"]["thumbnails"]["default"]["url"]
  results_array << @video.to_json.gsub!(/\"/, '\'')
end
# define the API endpoint
get '/videos' do
results_array.to_json
end
An "API = Application Programming Interface" is, simply, something that another program can reliably use to get a job done, without having to busy its little head about exactly how the job is done.
Perhaps the simplest thing to do now, if possible, is to go back to the person who "tasked" you with this, and to ask him/her, "well, what do you have in mind?" The best API that you can design, in this case, will be the one that is most convenient for the people (who are writing the programs which ...) who will actually have to use it. "Don't guess. Ask!"
A very common strategy for an API, in a language like Ruby, is to define a class which represents "this application's connection to this service." Anyone who wants to use the API does so by calling some function which will return a new instance of this class. Thereafter, the program uses this object to issue and handle requests.
The requests, also, are objects. To issue a request, you first ask the API-connection object to give you a new request-object. You then fill-out the request with whatever particulars, then tell the request object to "go!" At some point in the future, and by some appropriate means (such as a callback ...) the request-object informs you that it succeeded or that it failed.
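As a rough sketch of that connection-object and request-object pattern: the names Connection, Request, new_request, on_success, on_failure and go! are hypothetical, chosen only to mirror the description above, and the transport is any callable you supply rather than a real service.

```ruby
# Hypothetical API surface: one connection object hands out request objects.
class Connection
  # transport is anything that responds to call(params) and returns a result
  def initialize(transport)
    @transport = transport
  end

  def new_request
    Request.new(self)
  end

  # Used by Request; the client never needs to call this directly.
  def perform(params)
    @transport.call(params)
  end
end

class Request
  attr_reader :params

  def initialize(connection)
    @connection = connection
    @params = {}
    @on_success = ->(_result) {}
    @on_failure = ->(_error) {}
  end

  def on_success(&block)
    @on_success = block
    self
  end

  def on_failure(&block)
    @on_failure = block
    self
  end

  # Runs the request and reports the outcome through the callbacks.
  def go!
    result = @connection.perform(params)
  rescue StandardError => e
    @on_failure.call(e)
  else
    @on_success.call(result)
  end
end
```

A client would fill in request.params, attach a callback with on_success, and call go!, without knowing what the transport does behind the scenes.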
"A whole lot of voodoo-magic might have taken place," between the request object and the connection object which spawned it, but the client does not have to care. And that, most of all, is the objective of any API. "It Just Works.™"
I think they want you to create a third-party library. Imagine you are schizophrenic for a while.
Joe wants to build a Sinatra application to list some YouTube videos, but he is lazy and he does not want to do the dirty work, he just wants to drop something in, give it some credentials, ask for urls and use them, finito.
Joe asks Bob to implement it for him and he gives him his requirements: "Bob, I need YouTube library. I need it to do:"
# Please note that I don't know how YouTube API works, just guessing.
client = YouTube.new(api_key: 'hola')
video_urls = client.videos # => ['https://...', 'https://...', ...]
And Bob says "OK." and spends a day in his interactive console.
So first, you should figure out how you are going to use your not-yet-existing lib, if you can – sometimes you just don't know yet.
Next, build that library based on the requirements, then drop it in your Sinatra app and you're done. Does that help?
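What Bob ends up with might look roughly like the sketch below. It reuses the search URL and query parameters from the question; everything else is an assumption for illustration: the fetcher argument (so the class can be exercised without network access), the videos(query, limit:) signature, and the shape of the response, where each item is assumed to carry its video id under item['id']['videoId'].

```ruby
require 'json'
require 'net/http'
require 'uri'

# Hypothetical library class: wraps the search call behind a small interface.
class YouTube
  SEARCH_URL = 'https://www.googleapis.com/youtube/v3/search'

  # fetcher is anything that takes a URI and returns the response body as a String
  def initialize(api_key:, fetcher: ->(uri) { Net::HTTP.get(uri) })
    @api_key = api_key
    @fetcher = fetcher
  end

  # Returns watch URLs for the first `limit` search results.
  def videos(query, limit: 3)
    uri = URI.parse(SEARCH_URL)
    uri.query = URI.encode_www_form(part: 'snippet', q: query, type: 'video', key: @api_key)
    items = JSON.parse(@fetcher.call(uri)).fetch('items', [])
    items.take(limit).map { |item| "https://www.youtube.com/watch?v=#{item['id']['videoId']}" }
  end
end
```

The Sinatra route then shrinks to something like YouTube.new(api_key: 'hola').videos('cats').to_json, and the library can be tested on its own by passing a fake fetcher.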

How can I download multiple .xlsx files using axlsx gem?

Hi, I'm having trouble downloading multiple files with axlsx. The problem is that I'm sending an array of IDs to the controller and asking it to download each report using the render command. It raises an AbstractController::DoubleRenderError. I was thinking of overriding the error but realized that's a bad idea, and I don't know what else to do... Any suggestions? Thanks.
My controller code looks like this:
def download_report
  params[:user_id].each do |user_id|
    @report = Report.find_by(:user_id => user_id)
    render :xlsx => "download_report", :filename => "#{@report.user.last_name}.xlsx"
  end
end
My axlsx template:
wb = xlsx_package.workbook
wb.add_worksheet(name: "Reports") do |sheet|
wb.styles do |s|
# template code
end
end
Rails has a built-in expectation that you call render once per request, and the browser is going to expect one response per request. So, you are going to have to do something else!
You can use render_to_string, and combine the results into a zip file, serving that. See the bottom of this response.
Or, you could create a single spreadsheet and have each user's report show up on their own worksheet.
Or, on the client side, you could use javascript to request each spreadsheet and download each one separately.
The zip one would be something like this code, which uses render_to_string, rubyzip, and send_data:
def download_report
  # rubyzip 1.0 and later names this class Zip::OutputStream
  compressed_filestream = Zip::ZipOutputStream.write_buffer do |zos|
    params[:user_id].each do |user_id|
      @report = Report.find_by(:user_id => user_id)
      content = render_to_string :xlsx => "download_report", :filename => "#{@report.user.last_name}.xlsx"
      zos.put_next_entry("user_#{user_id}.xlsx")
      zos.print content
    end
  end
  compressed_filestream.rewind
  send_data compressed_filestream.read, :filename => 'download_report.zip', :type => "application/zip"
end
Axlsx requires rubyzip, so you should have it already. And you probably want to look up each user and use their name for the spreadsheet entry, unless you already have it some other way.

Why is Net::HTTP timing out when I try to access a Prawn Generated PDF?

I am using Prawn to generate a PDF from my controller, and when accessed directly at the URL, e.g. localhost:3000/responses/1.pdf, it works flawlessly.
However, when I try to generate this file on the fly for inclusion in a Mailer, everything freezes up and it times out.
I have tried various methods for generating / attaching the file and none have changed the outcome.
I also tried modifying the timeout for Net::HTTP to no avail, it just takes LONGER to time out.
If I run this command on the Rails Console, I receive a PDF data stream.
Net::HTTP.get('127.0.0.1',"/responses/1.pdf", 3000)
But if I include this code in my controller, it times out.
I have tried two different methods, and both fail repeatedly.
Method 1
Controller:
http = Net::HTTP.new('localhost', 3000)
http.read_timeout = 6000
file = http.get(response_path(@response, :format => 'pdf')) #timeout here
ResponseMailer.confirmComplete(@response,file).deliver #deliver the mail!
Method 1 Mailer:
def confirmComplete(response,file)
email_address = response.supervisor_id
attachments["test.pdf"] = {:mime_type => "application/pdf", :content=> file}
mail to: email_address, subject: 'Thank you for your feedback!'
end
The above code times out.
Method 2 Controller:
ResponseMailer.confirmComplete(@response).deliver #deliver the mail!
Method 2 Mailer:
def confirmComplete(response)
email_address = response.supervisor_id
attachment "application/pdf" do |a|
a.body = Net::HTTP.get('127.0.0.1',"/responses/1.pdf", 3000) #timeout here
a.filename = "test.pdf"
end
mail to: email_address, subject: 'Thank you for your feedback!'
end
If I switch the a.body and a.filename, it errors out first with
undefined method `filename=' for #<Mail::Part:0x007ff620e05678>
Every example I find has a different syntax or suggestion but none fix the problem that Net::HTTP times out. Rails 3.1, Ruby 1.9.2
The problem is that, in development, you're only running one server process, which is busy generating the email. That process is sending another request (to itself) to generate a PDF and waiting for a response. The request for the PDF is basically standing in line at the server so that it can get its PDF, but the server is busy generating the email and waiting to get the PDF before it can finish. And thus, you're waiting forever.
What you need to do is start up a second server process...
script/rails server -p 3001
and then get your PDF with something like ...
args = ['127.0.0.1','/responses/1.pdf']
args << 3001 unless Rails.env == 'production'
file = Net::HTTP.get(*args)
As an aside, depending on what server you're running on your production machine, you might run into issues with pointing at 127.0.0.1. You might need to make that dynamic and point to the full domain when in production, but that should be easy.
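The waiting-on-yourself situation can be reproduced without Rails. The sketch below is only a toy model under stated assumptions: TinyServer, its workers: option and the email_with_pdf method are made-up names, and a thread pulling jobs off a queue stands in for a server process. With one worker, the inner request can never be picked up while the outer one is still running, so it times out; with two workers it completes.

```ruby
require 'timeout'

# Toy stand-in for a server: a fixed number of workers pull jobs off a queue.
class TinyServer
  def initialize(workers:)
    @queue = Queue.new
    @threads = Array.new(workers) do
      Thread.new do
        while (job = @queue.pop)
          job.call
        end
      end
    end
  end

  # Enqueue a request and block until a worker has produced its result,
  # raising Timeout::Error if nothing arrives within `wait` seconds.
  def request(wait: 0.5, &handler)
    result = Queue.new
    @queue << -> { result << handler.call }
    Timeout.timeout(wait) { result.pop }
  end
end

# Handles an "email" request that asks the same server for a "PDF".
def email_with_pdf(server)
  server.request do
    begin
      server.request(wait: 0.2) { "%PDF data" }
    rescue Timeout::Error
      :timed_out
    end
  end
end

single = email_with_pdf(TinyServer.new(workers: 1)) # the only worker is busy
double = email_with_pdf(TinyServer.new(workers: 2)) # a second worker serves the PDF
puts "one worker: #{single.inspect}, two workers: #{double.inspect}"
```

Starting a second server process on another port, as suggested above, plays the role of the second worker here.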
I agree with https://stackoverflow.com/users/811172/jon-garvin's analysis that you're only running one server process, but I would mention another solution. Refactor your PDF generation so you don't depend on your controller.
If you're using Prawnto, I'm guessing you have a view like
# app/views/response.pdf.prawn
pdf.text "Hello world"
Move this to your Response model: (or somewhere else more appropriate, like a presenter)
# app/models/response.rb
require 'tmpdir'
class Response < ActiveRecord::Base
  def pdf_path
    return @pdf_path if @pdf_generated == true
    @pdf_path = File.join(Dir.tmpdir, rand(1e11).to_s)
    Prawn::Document.generate(@pdf_path) do |pdf|
      pdf.text "Hello world"
    end
    @pdf_generated = true
    @pdf_path
  end
  def pdf_cleanup
    if @pdf_generated and File.exist?(@pdf_path.to_s)
      File.unlink @pdf_path
    end
  end
end
Then in your ResponsesController you can do:
# app/controllers/responses_controller.rb
def show
  @response = Response.find params[:id]
  respond_to do |format|
    # this sends the PDF to the browser (doesn't email it)
    format.pdf { send_file @response.pdf_path, :type => 'application/pdf', :disposition => 'attachment', :filename => 'test.pdf' }
  end
end
And in your mailer you can do:
# this sends an email with the PDF attached
def confirm_complete(response)
email_address = response.supervisor_id
attachments['test.pdf'] = {:mime_type => "application/pdf", :content => File.read(response.pdf_path, :binmode => true) }
mail to: email_address, subject: 'Thank you for your feedback!'
end
Since you created it in the tmpdir, the operating system will usually clean it up eventually (many systems clear the temp directory on reboot). You can also call the cleanup function yourself once the file has been sent.
One final note: you might want to use a different model name like SupervisorReport or something, since Response might get you into namespacing trouble later.

Sinatra, progress bar in upload form

I'm developing a Sinatra app that consists of an upload form, with a progress bar indicating how much of the upload has completed.
The process, as described by Ryan Dahl, is the following:
HTTP upload progress bars are rather obfuscated- they typically involve a process running on the server keeping track of the size of the tempfile that the HTTP server is writing to, then on the client side an AJAX call is made every couple seconds to the server during the upload to ask for the progress of the upload.
Every upload has a random session-id, and to keep track of the association I employ a class variable in my app (I know, that's horrible; if you've got better ideas, please tell me):
configure do
  @@assoc = {}
end
I have a POST route for the upload, and a GET one for the AJAX polling.
Inside the POST route I save the association of session-id, Tempfile, and total size.
post '/files' do
  tmp = params[:file][:tempfile]
  # from here on, @@assoc[@sid] should have a value, even in other routes
  @@assoc[@sid] = { :file => tmp, :size => env['CONTENT_LENGTH'] }
  File.open("#{options.filesdir}/#{filename}", 'w+') do |file|
    file << tmp.read
  end
end
In the GET route, I calculate the percentage based on the Tempfile's current size:
get '/status/:sid' do
  h = @@assoc[params[:sid]]
  unless h.nil?
    percentage = (h[:file].size / h[:size].to_f) * 100
    "#{percentage}%"
  else
    "0%"
  end
end
The problem is that until the POST request has completed (i.e., after it has read all of the Tempfile), h.nil? returns true, which doesn't really make sense as I've just assigned @@assoc[@sid] a value in the other route.
So, what am I missing here?
EDIT: I've tried
set :reload, false
set :environment, :production
config { @@assoc ||= {} }
I also tried throwing a relational db at it (SQLite with DataMapper)
Neither worked.
I think I got what the problem is:
tmp = params[:file][:tempfile] doesn't return until the file has been fully received.
@@assoc[@sid] = { :file => tmp, :size => env['CONTENT_LENGTH'] }
should be
@@assoc[params[:sid]] = { :file => tmp, :size => env['CONTENT_LENGTH'] }
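On the side question of a better place than a class variable to keep the association, one option is a small lock-protected registry object. This is only a sketch: UploadProgress and its start, advance and percentage methods are hypothetical names. It also does not by itself fix the timing issue described above, because something that sees the bytes as they arrive (not the route body, which runs only after the upload is complete) would have to call advance.

```ruby
# Hypothetical registry: tracks received and total bytes per session id.
class UploadProgress
  def initialize
    @uploads = {}
    @lock = Mutex.new # the POST and the polling GET may run on different threads
  end

  # Register a new upload; total_bytes may be a String such as CONTENT_LENGTH.
  def start(sid, total_bytes)
    @lock.synchronize { @uploads[sid] = { received: 0, total: total_bytes.to_i } }
  end

  # Record that `bytes` more bytes have arrived for this upload.
  def advance(sid, bytes)
    @lock.synchronize { @uploads[sid][:received] += bytes if @uploads.key?(sid) }
  end

  # Percentage in the range 0.0..100.0; unknown uploads report 0.0.
  def percentage(sid)
    @lock.synchronize do
      upload = @uploads[sid]
      return 0.0 if upload.nil? || upload[:total].zero?

      [upload[:received] * 100.0 / upload[:total], 100.0].min
    end
  end
end
```

The status route would then return "#{progress.percentage(params[:sid])}%" for a single shared progress object.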

How to access html request parameters for a .rhtml page served by webrick?

I'm using webrick (the built-in ruby webserver) to serve .rhtml
files (html with ruby code embedded --like jsp).
It works fine, but I can't figure out how to access parameters
(e.g. http://localhost/mypage.rhtml?foo=bar)
from within the ruby code in the .rhtml file.
(Note that I'm not using the rails framework, only webrick + .rhtml files)
Thanks
According to the source code of erbhandler it runs the rhtml files this way:
Module.new.module_eval{
meta_vars = servlet_request.meta_vars
query = servlet_request.query
erb.result(binding)
}
So the binding should contain a query (which contains a hash of the query string) and a meta_vars variable (which contains a hash of the environment, like SERVER_NAME) that you can access inside the rhtml files (and the servlet_request and servlet_response might be available too, but I'm not sure about them).
If that is not the case you can also try querying the CGI parameter ENV["QUERY_STRING"] and parse it, but this should only be as a last resort (and it might only work with CGI files).
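A minimal sketch of that last-resort approach: the answer does not name a parser, so URI.decode_www_form from the standard library is an assumption here, and parse_query_string is a made-up helper name.

```ruby
require 'uri'

# Turns a raw query string such as ENV["QUERY_STRING"] into a Hash.
# A missing (nil) query string yields an empty Hash; repeated keys keep the last value.
def parse_query_string(query_string)
  URI.decode_www_form(query_string.to_s).to_h
end

params = parse_query_string("foo=bar&name=a+b")
puts params["foo"]  # bar
puts params["name"] # a b
```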
This is the solution:
(suppose the request is http://your.server.com/mypage.rhtml?foo=bar)
<html>
<body>
This is my page (mypage.rhtml, served by webrick)
<%
# embedded ruby code
puts servlet_request.query["foo"] # this simply prints bar on the console
%>
</body>
</html>
You don't give many details, but I imagine you have a servlet to serve the files you will process with erb, and by default the web server serves any static file in a public directory.
require 'webrick'
include WEBrick
require 'erb'
s = HTTPServer.new( :Port => 8080,:DocumentRoot => Dir::pwd + "/public" )
class MyServlet < HTTPServlet::AbstractServlet
  def do_GET(req, response)
    File.open('public/my.rhtml','r') do |f|
      @template = ERB.new(f.read)
    end
    response.body = @template.result(binding)
    response['Content-Type'] = "text/html"
  end
end
s.mount("/my", MyServlet)
trap("INT"){
s.shutdown
}
s.start
This example is limited: whenever you go to /my, the same file is processed. You should instead construct the file path based on the request path. Note the important word here, "request": everything you need is there.
To get the HTTP header parameters, use req[header_name]. To get the parameters in the query string, use req.query[param_name]. req is the HTTPRequest object passed to the servlet.
Once you have the parameter you need, you have to bind it to the template. In the example we pass the binding object from self (binding is defined in Kernel, and it represents the context where code is executing), so every local variable defined in the do_GET method would be available in the template. However, you can create your own binding for example passing a Proc object and pass it to the ERB processor when calling 'result'.
Everything together, your solution would look like:
def do_GET(req, response)
  File.open('public/my.rhtml','r') do |f|
    @template = ERB.new(f.read)
  end
  foo = req.query["foo"]
  response.body = @template.result(binding)
  response['Content-Type'] = "text/html"
end
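The binding mechanism can be tried on its own, without WEBrick. In this sketch render_page is a made-up helper and a plain Hash stands in for req.query; the point is only that a local variable defined before calling result(binding) is visible inside the template.

```ruby
require 'erb'

# Renders an ERB template with the local variable `foo` taken from the query hash.
def render_page(template_source, query)
  foo = query["foo"] # visible to the template through this method's binding
  ERB.new(template_source).result(binding)
end

puts render_page("<p>foo is <%= foo %></p>", { "foo" => "bar" })
# prints: <p>foo is bar</p>
```

In a real page you would probably want to escape the value, for example with ERB::Util.html_escape, before writing it into HTML.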
Browsing the documentation, it looks like you should have an HTTPRequest from which you can get the query string. You can then use parse_query to get a name/value hash.
Alternatively, it's possible that just calling query() will give you the hash directly... my Ruby-fu isn't quite up to it, but you might want to at least give it a try.

Resources