Building a downloadable sitemap zip file in Heroku - ruby

I'm building a web tool in Heroku / Ruby Sinatra that scrapes a web domain and downloads all specified filetypes (it should provide a zip file of the sitemap of the domain's filetypes to download).
I am trying to figure out how to build a ZipFile on Heroku. How do I set the home directory? Then once I have the ZipFile, how do I link to it so it's downloadable?
Here is some of the relevant code so far:
anemone.after_crawl do
  puts "Crawl finished. Gathering files, preparing download..."
  datasets.each do |url|
    u = URI.parse(url.to_s)
    Net::HTTP.start(u.host) { |http|
      resp = http.get(u.path)
      if u.path[0] == "/"
        u.path[0] = ''
      end
      full_path = u.path.split("/")
      i = 0
      len = full_path.size
      filename = full_path[-1]
      Zip::ZipFile.open(u.host + ".zip", Zip::ZipFile::CREATE) { |zipfile|
        while i < (len-1) do
          directory = full_path[i]
          unless File.directory?(directory)
            zipfile.mkdir(directory)
          end
          Dir.chdir directory
          i+=1
        end
        zipfile.add(filename);
        while (i > 0) do
          Dir.chdir File.expand_path("..",Dir.pwd)
          i-=1
        end
      }
    }
  end
end

The Heroku filesystem is mostly read-only, but you should be able to temporarily stash the zipfile on /tmp:
Zip::ZipFile.open("#{RAILS_ROOT}/tmp/" + u.host + ".zip", Zip::ZipFile::CREATE)
You'll probably want to use send_file in a "downloads" controller to allow users to download the file. You'll want to build in error handling in case the temporary file disappears before the user downloads it (e.g., if the dyno restarted between zipfile creation and download).
EDIT
The documentation I linked is apparently outdated. RAILS_ROOT is the Rails 2 way to refer to the directory root, but the Rails 3 way (Rails.root) doesn't work either--in Heroku it refers to the ./app folder.
However, you can use the Heroku base filesystem /tmp folder, like this:
Zip::ZipFile.open("/tmp/" + u.host + ".zip", Zip::ZipFile::CREATE)
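A minimal sketch of how the pieces above could fit together in a Sinatra app. The helper names zip_path_for and zip_ready? and the /download/:host route are hypothetical. Only the standard library is used in the runnable part; the zip call and the Sinatra route are shown as comments and are untested here.

```ruby
require 'tmpdir'

# Hypothetical helper: build the archive path under the system temp
# directory (Dir.tmpdir, typically /tmp) instead of the app directory.
def zip_path_for(host)
  # Keep only characters that are safe in a filename.
  safe_host = host.to_s.gsub(/[^A-Za-z0-9.\-]/, '_')
  File.join(Dir.tmpdir, "#{safe_host}.zip")
end

# Hypothetical helper: the temp file can vanish (for example after a
# dyno restart), so check before trying to send it.
def zip_ready?(path)
  File.file?(path) && File.size(path) > 0
end

# In the crawler, the call from the answer would then become:
#   Zip::ZipFile.open(zip_path_for(u.host), Zip::ZipFile::CREATE) { |zipfile| ... }
#
# A Sinatra download route could look roughly like this:
#   get '/download/:host' do
#     path = zip_path_for(params[:host])
#     halt 404, 'Archive expired, please run the crawl again.' unless zip_ready?(path)
#     send_file path, :filename => File.basename(path), :type => 'application/zip'
#   end
```

Checking for the file right before send_file is what covers the case where the dyno restarted between building the archive and the download request.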

Related

Generate Filename Before Downloading

I'm trying to download the latest backup of data during my chef run, but it's trying to download the file before the filename is generated. What's the best approach for doing this? All I want to do is generate a filename based on the time and download it.
The below code gives the error undefined method 'latest_backup' for Custom resource aws_s3_file from cookbook aws.
ruby_block "generate file name" do
  block do
    require 'time'
    latest_backup = "NOT-SET"
    utc_now = Time.now.utc
    utc_midday = Time.new(Time.new.year, Time.new.month, Time.new.day, 22, 00, 1 ).utc
    utc_midnight = Time.new(Time.new.year, Time.new.month, Time.new.day, 10, 00, 1 ).utc
    if (utc_now < utc_midday) && (utc_now > utc_midnight )
      latest_backup = "data_" + Time.now.strftime("%Y%m%d") + "-00001.tgz"
    elsif (utc_now > utc_midday ) && (utc_now < utc_midnight)
      latest_backup = "data_" + Time.now.strftime("%Y%m%d") + "-120001.tgz"
    end
  end
  action :create
end
aws_s3_file "/root/backup.tgz" do
  remote_path "backup-dir/#{latest_backup}"
  bucket "my-backups-bucket"
  region "ap-southeast-2"
end
You can't set a local variable across contexts like that. Since nothing in that code requires waiting until converge time, you can just run the code outside of a ruby_block and have it be a normal local variable.
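A rough sketch of that suggestion. The helper name latest_backup_name is hypothetical, and the cut-over hours and filename suffixes are simply carried over from the question, so adjust them to the real backup schedule. The recipe part is shown as a comment because it only runs inside Chef.

```ruby
# Hypothetical helper: plain Ruby, evaluated when the recipe is compiled.
# Assumption: the 10:00-22:00 UTC window and both suffixes are taken from
# the question's code and may not match the actual backup schedule.
def latest_backup_name(now = Time.now.utc)
  suffix = (10...22).cover?(now.hour) ? '00001' : '120001'
  "data_#{now.strftime('%Y%m%d')}-#{suffix}.tgz"
end

# In the recipe, outside any ruby_block:
#   latest_backup = latest_backup_name
#   aws_s3_file "/root/backup.tgz" do
#     remote_path "backup-dir/#{latest_backup}"
#     bucket "my-backups-bucket"
#     region "ap-southeast-2"
#   end
```

Because latest_backup is now an ordinary local variable in the recipe, it is visible inside the aws_s3_file block when the resource is compiled.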

Ruby shoes4 package over 1.5GB

I've made a small CLI script in ruby to manage a small shop for a friend, but then he wanted me to make a GUI for him, so I looked around and found shoes4.
So I downloaded it, created a small test, and ran:
./bin/shoes -p swt:jar ./path/to/app.rb
and left it to create the package. Then I got a warning from the system that I was running low on disk space, so I checked the jar file: it was over 1.5GB and still not done packaging. The code is very small and basic:
require 'yaml'
Shoes.app do
  button "Add client" do
    filename = ask_open_file
    para File.read(filename)
    clients = YAML.load_file(filename)
    id = clients[clients.length - 1][0].to_i + 1
    name = ask("Enter the client's full name: ")
    items = ask("Enter list of items.")
    clients[id] = ["ID = "+ id.to_s,"Name = "+ name,"list of items:\n"+ items]
    File.open(filename, 'w') { |f| YAML.dump(clients, f) }
    alert("Added new patient.")
  end
  button "Exit" do
    exit()
  end
end
Any idea why this small app is more than 1.5GB? Or did I package it the wrong way?
The packager will include everything in the directory of your shoes script and below.
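One way to see what would get pulled in is to total the size of everything under the script's directory before packaging. This is only a sketch using the standard library; the helper name directory_size is hypothetical and it does not call into Shoes at all.

```ruby
require 'find'

# Hypothetical helper: total size in bytes of every regular file under dir,
# i.e. roughly what a "package this directory and below" step would pick up.
def directory_size(dir)
  total = 0
  Find.find(dir) do |path|
    total += File.size(path) if File.file?(path)
  end
  total
end

# Example: check the folder that holds the Shoes script.
# puts directory_size(File.dirname('./path/to/app.rb'))
```

If the number is far larger than the script itself, moving the script into its own otherwise empty directory should keep the package small.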

Tempfile.new vs. File.open on Heroku

I'm capturing/creating user entered text into files from my app, attempting to temporarily store them in my Heroku tmp directory, then upload them to a cloud service such as Google Drive.
In using Tempfile I can successfully upload, but when using File.open I get the following error when attempting to upload:
ArgumentError (wrong number of arguments (1 for 0))
The error is on the call:
@client.upload_file_by_folder_id(save_path, @folder_id)
Where @client is a session with the cloud service, save_path is the location of the attached file for upload and @folder_id is the folder they should go into.
When I use Tempfile.new I am successful in doing so:
tempfile = Tempfile.new([final_filename, '.txt'], Rails.root.join('tmp','text-temp'))
tempfile.binmode
tempfile.write msgbody
tempfile.close
save_path = tempfile.path
upload_file = @client.upload_file_by_folder_id(save_path, @folder_id)
tempfile.unlink
File.open code is:
path = 'tmp/text-temp'
filename = "#{final_filename}.txt"
save_path = Rails.root.join(path, filename)
File.open(save_path, 'wb') do |file|
  file.write(msgbody)
  file.close
end
upload_file = @client.upload_file_by_folder_id(save_path, @folder_id)
File.delete(save_path)
Could it be that the File.path is a string, and Tempfile.path is the full path (but not as a string)? When I put out each, they look identical.
I'd like to use File as I don't want to change the filename of the existing attachments I'm uploading, whereas Tempfile appends to the filename.
Any and all assistance is greatly appreciated. Thanks!
In order for it to work using File, I needed to set the save_path to a string:
save_path.to_s
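The difference can be reproduced outside Rails. Rails.root is a Pathname, so Rails.root.join returns a Pathname, while Tempfile#path returns a String; both print the same, which is why they looked identical. In this sketch the '/app' root and the helper name text_save_path are stand-ins, not part of the original code.

```ruby
require 'pathname'
require 'tempfile'

# Stand-in for Rails.root (assumption: any Pathname behaves the same way).
root = Pathname.new('/app')

# Hypothetical helper: build the save path and hand back a String,
# which is what the upload call needed.
def text_save_path(root, filename)
  root.join('tmp/text-temp', filename).to_s
end

puts root.join('tmp/text-temp', 'note.txt').class  # Pathname
puts text_save_path(root, 'note.txt').class        # String

tempfile = Tempfile.new(['note', '.txt'])
puts tempfile.path.class                           # String
tempfile.close
tempfile.unlink
```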

"resources"-directory for ruby gem

I'm currently experimenting with creating my own gem in Ruby. The gem requires some static resources (say an icon in ICO format). Where do I put such resources within my gem directory tree and how do I access them from code?
Also, parts of my extension are native C code and I would like the C-parts to have access to the resources too.
You can put resources anywhere you want, except in the lib directory. Since it will be part of Ruby's load path, the only files that should be there are the ones that you want people to require.
For example, I usually store translated text in the i18n/ directory. For icons, I'd just put them in resources/icons/.
As for how to access these resources... I ran into this problem enough that I wrote a little gem just to avoid repetition.
Basically, I was doing this all the time:
def Your::Gem.root
  # Current file is /home/you/code/your/lib/your/gem.rb
  File.expand_path '../..', File.dirname(__FILE__)
end
Your::Gem.root
# => /home/you/code/your/
I wrapped this up into a nice DSL, added some additional convenience stuff and ended up with this:
class Your::Gem < Jewel::Gem
  root '../..'
end
root = Your::Gem.root
# => /home/you/code/your/
# No more joins!
path = root.resources.icons 'your.ico'
# => /home/you/code/your/resources/icons/your.ico
As for accessing your resources in C, path is just a Pathname. You can pass it to a C function as a string, open the file and just do what you need to do. You can even return an object to the Ruby world:
VALUE your_ico_new(VALUE klass, VALUE path) {
    char * ico_file = NULL;
    struct your_ico * ico = NULL;
    ico_file = StringValueCStr(path);
    ico = your_ico_load_from_file(ico_file); /* Implement this */
    return Data_Wrap_Struct(your_ico_class, your_ico_mark, your_ico_free, ico);
}
Now you can access it from Ruby:
ico = Your::Ico.new path
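If you would rather not pull in an extra gem, the same lookup can be sketched with the standard library alone. The module name YourGem, the resources/ folder and the assumption that this file sits at lib/your_gem.rb are all illustrative.

```ruby
require 'pathname'

module YourGem
  # Assumes this file is <gem root>/lib/your_gem.rb, so the root is one level up.
  ROOT = Pathname.new(File.expand_path('..', File.dirname(__FILE__)))

  # Hypothetical helper: absolute path to a file under <gem root>/resources.
  def self.resource_path(*parts)
    ROOT.join('resources', *parts)
  end
end

# Convert to a String before handing the path to C code,
# since StringValueCStr expects a String.
icon = YourGem.resource_path('icons', 'your.ico').to_s
```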

Is it possible to rename a file before the user right-clicks + save as with Carrierwave + S3 + Heroku + Content-Disposition?

Is it possible to rename a file before the user right-clicks + save as with Carrierwave + S3 + Heroku + Content-Disposition? I'm thinking of sanitizing file names (e.g. 193712391231.flv) before the file is saved on the S3 server and saving the original file name in a column in the db.
When a user decides to download the file (right-click and save as), I can't serve / send it as 193712391231.flv. Instead, I would like to send the file with its original file name.
How can this be implemented?
Using Carrierwave. I've come across this:
uploader = Video.first.attachment
uploader.retrieve_from_store!(File.basename(Video.first.attachment.url))
uploader.cache_stored_file!
send_file uploader.file.path
This won't be served by S3, because it first caches the file in the local filesystem and then sends it to the browser, which takes up a whole web process (Dyno in Heroku).
If anyone has any ideas, please suggest.
Akshully, you can:
# minimal example,
# here using mongoid, but it doesn't really matter
class Media
  field :filename, type: String # i.e. "cute-puppy"
  field :extension, type: String # i.e. "mp4"
  mount_uploader :media, MediaUploader
end
class MediaUploader < CarrierWave::Uploader::Base
  # Files on S3 are only accessible via signed URLS:
  @fog_public = false
  # Signed URLS expire after ...:
  @fog_authenticated_url_expiration = 2.hours # in seconds from now, (default is 10.minutes)
  # MIME-Type and filename that the user will see:
  def fog_attributes
    {
      "Content-Disposition" => "attachment; filename*=UTF-8''#{model.filename}",
      "Content-Type" => MIME::Types.type_for(model.extension).first.content_type
    }
  end
  # ...
end
The url that model.media.url yields will then return the following headers:
Accept-Ranges:bytes
Content-Disposition:attachment; filename*=UTF-8''yourfilename.mp4
Content-Length:3926746
Content-Type:video/mpeg
Date:Thu, 28 Feb 2013 10:09:14 GMT
Last-Modified:Thu, 28 Feb 2013 09:53:50 GMT
Server:AmazonS3
...
The browser will then force a download (instead of opening in browser) and use the filename you set, regardless of the file name you used to store the file in the bucket. The only drawback of this is that the Content-Disposition header is set when Carrierwave creates the file, so you couldn't for example use different filenames on the same file for different users.
In that case you could use RightAWS to generate the signed URL:
class Media
  def to_url
    s3_key = "" # the 'path' to the file in the S3 bucket
    request_header = {}
    response_header = {
      "response-content-disposition" => "attachment; filename*=UTF-8''#{filename_with_extension}",
      "response-content-type" => MIME::Types.type_for(extension).first.content_type
    }
    RightAws::S3Generator.new(
      Settings.aws.key,
      Settings.aws.secret,
      :port => 80,
      :protocol => 'http').
      bucket(Settings.aws.bucket).
      get(s3_key, 2.hours, request_header, response_header)
  end
end
EDIT: It is not necessary to use RightAWS, uploader#url supports overriding response headers, the syntax is just a bit confusing (as is everything with CarrierWave, imho, but it's still awesome):
Media.last.media.url(query: {"response-content-disposition" => "attachment; filename*=UTF-8''huhuhuhuhu"})
# results in:
# => https://yourbucket.s3.amazonaws.com/media/512f292be75ab5a46f000001/yourfile.mp4?response-content-disposition=attachment%3B%20filename%2A%3DUTF-8%27%27huhuhuhuhu&AWSAccessKeyId=key&Signature=signature%3D&Expires=1362055338
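One caveat with the snippets above: they interpolate the filename straight into filename*=UTF-8''. That extended form expects the value to be percent-encoded, so original names containing spaces or non-ASCII characters would need encoding first. A minimal sketch using only the standard library; the helper name attachment_disposition is hypothetical.

```ruby
require 'erb'

# Hypothetical helper: build a Content-Disposition value whose
# filename* part is percent-encoded UTF-8.
def attachment_disposition(filename)
  "attachment; filename*=UTF-8''#{ERB::Util.url_encode(filename)}"
end

puts attachment_disposition('cute puppy.mp4')
# attachment; filename*=UTF-8''cute%20puppy.mp4
```

The resulting string can be used wherever the examples above build the header by hand, for instance as the value of "response-content-disposition".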
If you're sending the file to the user direct from S3 you've no option. If you route the file through a dyno you can call it what you want, but you're using a dyno for the entire duration of the download.
I would store the files in a user friendly manner where possible and use folders to organise them.

Resources