Ruby TempFile behaviour among different classes - ruby

Our processing server works mainly with TempFiles as it makes things easier on our side: no need to take care of deleting them as they get garbage collected or handle name collisions, etc.
Lately, we are having problems with TempFiles getting GCed too early in the process. Specially with one of our services that will convert a Foo file from a url to some Bar file and upload it to our servers.
For sake of clarity I added bellow a case scenario in order to make discussion easier and have an example at hand.
This workflow does the following:
Get a url as parameter
Download the Foo file as a TempFile
Duplicate it to a new TempFile
Download the related assets to TempFiles
Link the related assets into the local dup TempFile
Convert the Foo to Bar format
Upload it to our server
At times the conversion fail and everything points to the fact that our local Foo file is pointing to related assets that have been created and GCed before the conversion.
My two questions:
Is it possible that my TempFiles get GCed too early? I read about Ruby GCed system it was very conservative to avoid those scenarios.
How can I avoid this from happening? I could try to save all related assets from download_and_replace_uri(node) and passing them as a return to keep it alive while the instance of ConvertService is still existing. But I'm not sure if this would solve it.
myfile.foo
{
"buffers": [
{ "uri": "http://example.com/any_file.jpg" },
{ "uri": "http://example.com/any_file.png" },
{ "uri": "http://example.com/any_file.jpmp3" }
]
}
main.rb
ConvertService.new('http://example.com/myfile.foo')
ConvertService
class ConvertService
def initialize(url)
#url = url
#bar_file = Tempfile.new
end
def call
import_foo
convert_foo
upload_bar
end
private
def import_foo
#foo_file = ImportService.new(#url).call.edited_file
end
def convert_foo
`create-bar "#{#foo_file.path}" "#{#bar_file.path}"`
end
def upload_bar
UploadBarService.new(#bar_file).call
end
end
ImportService
class ImportService
def initialize(url)
#url = url
#edited_file ||= Tempfile.new
end
def call
download
duplicate
replace
end
private
def download
#original = DownloadFileService.new(#url).call.file
end
def duplicate
FileUtils.cp(#original.path, #edited_file.path)
end
def replace
file = File.read(#edited_file.path)
json = JSON.parse(file, symbolize_names: true)
json[:buffers]&.each do |node|
node[:uri] = DownloadFileService.new(node[:uri]).call.file.path
end
write_to_disk(#edited_file.path, json.to_json)
end
end
DownloadFileService
module Helper
class DownloadFileService < ApplicationHelperService
def initialize(url)
#url = url
#file = Tempfile.new
end
def call
uri = URI.parse(#url)
Net::HTTP.start(
uri.host,
uri.port,
use_ssl: uri.scheme == 'https'
) do |http|
response = http.request(Net::HTTP::Get.new(uri.path))
#file.binmode
#file.write(response.body)
#file.flush
end
end
end
end
UploadBarService
module Helper
class UploadBarService < ApplicationHelperService
def initialize(file)
#file = file
end
def call
HTTParty.post('http://example.com/upload', body: { file: #file })
# NOTE: End points returns the url for the uploaded file
end
end
end

Because of the complexity of your code and missing parts which may be obfuscated to us, the simple answer to your problem is to insure that your tempfile instance objects remain in memory throughout the lifecycle in which they are needed, otherwise they will get garbage collected immediately, removing the tempfile from the file system, and will lead to the the missing tempfile state you've encountered.
The Ruby Document for Tempfile states "When a Tempfile object is garbage collected, or when the Ruby interpreter exits, its associated temporary file is automatically deleted."
As per comments, others may find this conversation helpful when running into this problem.

Related

Testing input/output with rspec and plain ruby

I am trying to create a test for a FileProcessor that reads from a text file, passes it to another class and then writes output. I made a test file and am able to access but it feels bulky. I'm also going to need to test that it writes the output in a new file and I am not sure how to set this up. I've seen a lot of tutorials but they are be rails centric. My goal is to get rid of writing the path in the test and to clean up the generated output files after each test.
describe FileProcessor do
test_file = File.dirname(__FILE__) + '/fixtures/test_input.txt'
output_file = File.dirname(__FILE__) + '/fixtures/test_output.txt'
subject {FileProcessor.new(test_file, output_file)}
describe '#read_file' do
it 'reads a file' do
expect(subject.read_file).to eq('This is a test.')
end
end
def write_file(str)
File.open("#{output_file}", "w+") { |file| file.write(str) }
end
end
How about using StringIO:
require 'stringio'
class FileProcessor
def initialize(infile, outfile)
#infile = infile
#outfile = outfile
#content = nil
end
def read_file
#content ||= #infile.read
end
def write_file(text)
#outfile.write(text)
end
end
describe FileProcessor do
let(:outfile) { StringIO.new }
subject(:file_processor) do
infile = StringIO.new('This is a test')
FileProcessor.new(infile, outfile)
end
describe '#read_file' do
it "returns correct text" do
expect(file_processor.read_file).to eq("This is a test")
end
end
describe '#write_file' do
it "writes correct text" do
file_processor.write_file("Hello world")
outfile.rewind
expect(outfile.read).to eq("Hello world")
end
end
end
There's not a great way to avoid writing the path of your input file. You could move that into a helper method, but on the other hand having the path in the test has the benefit that someone else (or you six months from now) looking at the code will know immediately where the test data comes from.
As for the output file, the simplest solution is to use Ruby's built-in Tempfile class. Tempfile.new is like File.new, except that it automatically puts the file in /tmp (or wherever your OS's temporary file directory is) and gives it a unique name. This way you don't have to worry about cleaning it up, because the next time you run the test it'll use a file with a different name (and your OS will automatically delete the file). For example:
require 'tempfile'
describe FileProcessor do
let(:test_file_path) { File.dirname(__FILE__) + '/fixtures/test_input.txt' }
let(:output_file) { Tempfile.new('test_output.txt').path }
subject { FileProcessor.new(test_file_path, output_file.path) }
describe '#read_file' do
it 'reads a file' do
expect(subject.read_file).to eq('This is a test.')
end
end
end
Using let (instead of just assigning a local variable) ensures that each example will use its own unique output file. In RSpec you should almost always prefer let.
If you want to get really serious, you could instead use the FakeFS gem, which mocks all of Ruby's built-in file-related classes (File, Pathname, etc.) so you're never writing to your actual filesystem. Here's a quick tutorial on using FakeFS: http://www.bignerdranch.com/blog/fake-it/

read json in Ruby and set variables for use in another class

The need here is to read a json file and to make the variables which is done from one class and use them with in another class. What I have so far is
helper.rb
class MAGEINSTALLER_Helper
#note nonrelated items removed
require 'fileutils'
#REFACTOR THIS LATER
def load_settings()
require 'json'
file = File.open("scripts/installer_settings.json", "rb")
contents = file.read
file.close
#note this should be changed for a better content check.. ie:valid json
#so it's a hack for now
if contents.length > 5
begin
parsed = JSON.parse(contents)
rescue SystemCallError
puts "must redo the settings file"
else
puts parsed['bs_mode']
parsed.each do |key, value|
puts "#{key}=>#{value}"
instance_variable_set("#" + key, value) #better way?
end
end
else
puts "must redo the settings file"
end
end
#a method to provide feedback simply
def download(from,to)
puts "completed download for #{from}\n"
end
end
Which is called in a file of Pre_start.rb
class Pre_start
#note nonrelated items removed
def initialize(params=nil)
puts 'World'
mi_h = MAGEINSTALLER_Helper.new
mi_h.load_settings()
bs_MAGEversion=instance_variable_get("#bs_MAGEversion") #doesn't seem to work
file="www/depo/newfile-#{bs_MAGEversion}.tar.gz"
if !File.exist?(file)
mi_h.download("http://www.dom.com/#{bs_MAGEversion}/file-#{bs_MAGEversion}.tar.gz",file)
else
puts "mage package exists"
end
end
end
the josn file is valid json and is a simple object (note there is more just showing the relevant)
{
"bs_mode":"lite",
"bs_MAGEversion":"1.8.0.0"
}
The reason I need to have a json settings file is that I will need to pull settings from a bash script and later a php script. This file is the common thread that is used to pass settings each share and need to match.
Right now I end up with an empty string for the value.
The instance_variable_setis creating the variable inside MAGEINSTALLER_Helper class. That's the reason why you can't access these variables.
You can refactor it into a module, like this:
require 'fileutils'
require 'json'
module MAGEINSTALLER_Helper
#note nonrelated items removed
#REFACTOR THIS LATER
def load_settings()
content = begin
JSON.load_file('scripts/installer_settings.json')
rescue
puts 'must redo the settings file'
{} # return an empty Hash object
end
parsed.each {|key, value| instance_variable_set("##{key}", value)}
end
#a method to provide feedback simply
def download(from,to)
puts "completed download for #{from}\n"
end
end
class PreStart
include MAGEINSTALLER_Helper
#note nonrelated items removed
def initialize(params=nil)
puts 'World'
load_settings # The method is available inside the class
file="www/depo/newfile-#{#bs_MAGEversion}.tar.gz"
if !File.exist?(file)
download("http://www.dom.com/#{#bs_MAGEversion}/file-#{#bs_MAGEversion}.tar.gz",file)
else
puts "mage package exists"
end
end
end
I refactored a little bit to more Rubish style.
On this line:
bs_MAGEversion=instance_variable_get("#bs_MAGEversion") #doesn't seem to work
instance_variable_get isn't retrieving from the mi_h Object, which is where your value is stored. The way you've used it, that line is equivalent to:
bs_MAGEversion=#bs_MAGEversion
Changing it to mi_h.instance_variable_get would work. It would also be painfully ugly ruby. But I sense that's not quite what you're after. If I read you correctly, you want this line:
mi_h.load_settings()
to populate #bs_MAGEversion and #bs_mode in your Pre_start object. Ruby doesn't quite work that way. The closest thing to what you're looking for here would probably be a mixin, as described here:
http://www.ruby-doc.org/docs/ProgrammingRuby/html/tut_modules.html
We do something similar to this all the time in code at work. The problem, and solution, is proper use of variables and scoping in the main level of your code. We use YAML, you're using JSON, but the idea is the same.
Typically we define a constant, like CONFIG, which we load the YAML into, in our main code, and which is then available in all the code we require. For you, using JSON instead:
require 'json'
require_relative 'helper'
CONFIG = JSON.load_file('path/to/json')
At this point CONFIG would be available to the top-level code and in "helper.rb" code.
As an alternate way of doing it, just load your JSON in either file. The load-time is negligible and it'll still be the same data.
Since the JSON data should be static for the run-time of the program, it's OK to use it in a CONSTANT. Storing it in an instance variable only makes sense if the data would vary from instance to instance of the code, which makes no sense when you're loading data from a JSON or YAML-type file.
Also, notice that I'm using a method from the JSON class. Don't go through the rigamarole you're using to try to copy the JSON into the instance variable.
Stripping your code down as an example:
require 'fileutils'
require 'json'
CONTENTS = JSON.load_file('scripts/installer_settings.json')
class MAGEINSTALLER_Helper
def download(from,to)
puts "completed download for #{from}\n"
end
end
class Pre_start
def initialize(params=nil)
file = "www/depo/newfile-#{ CONFIG['bs_MAGEversion'] }.tar.gz"
if !File.exist?(file)
mi_h.download("http://www.dom.com/#{ CONFIG['bs_MAGEversion'] }/file-#{ CONFIG['bs_MAGEversion'] }.tar.gz", file)
else
puts "mage package exists"
end
end
end
CONFIG can be initialized/loaded in either file, just do it from the top-level before you need to access the contents.
Remember, Ruby starts executing it at the top of the first file and reads downward. Code that is outside of def, class and module blocks gets executed as it's encountered, so the CONFIG initialization will happen as soon as Ruby sees that code. If that happens before you start calling your methods and creating instances of classes then your code will be happy.

Two versions of each blog post in Jekyll

I need two versions of each of my posts in a very simple Jekyll setup: The public facing version and a barebones version with branding specifically for embedding.
I have one layout for each type:
post.html
post_embed.html
I could accomplish this just fine by making duplicates of each post file with different layouts in the front matter, but that's obviously a terrible way to do it. There must be a simpler solution, either at the level of the command line or in the front matter?
Update:
This SO question covers creating JSON files for each post. I really just need a generator to loop through each post, alter one value in the YAML front matter (embed_page=True) and feed it back to the same template. So each post is rendered twice, once with embed_page true and one with it false. Still don't have a full grasp of generators.
Here's my Jekyll plugin to accomplish this. It's probably absurdly inefficient, but I've been writing in Ruby for all of two days.
module Jekyll
# override write and destination functions to taking optional argument for pagename
class Post
def destination(dest, pagename)
# The url needs to be unescaped in order to preserve the correct filename
path = File.join(dest, CGI.unescape(self.url))
path = File.join(path, pagename) if template[/\.html$/].nil?
path
end
def write(dest, pagename="index.html")
path = destination(dest, pagename)
puts path
FileUtils.mkdir_p(File.dirname(path))
File.open(path, 'w') do |f|
f.write(self.output)
end
end
end
# the cleanup function was erasing our work
class Site
def cleanup
end
end
class EmbedPostGenerator < Generator
safe true
priority :low
def generate(site)
site.posts.each do |post|
if post.data["embeddable"]
post.data["is_embed"] = true
post.render(site.layouts, site.site_payload)
post.write(site.dest, "embed.html")
post.data["is_embed"] = false
end
end
end
end
end

Accessing/Dealing with Variables in Ruby

Let me preface by stating I'm a "new" programmer - an IT guy trying his hand at his first "real" problem after working through various tutorials.
So - here is what I'm trying to do. I'm watching a directory for a .csv file - it will be in this format: 999999_888_filename.csv
I want to return each part of the "_" filename as a variable to pass on to another program/script for some other task. I have come up w/ the following code:
require 'rubygems'
require 'fssm'
class Watcher
def start
monitor = FSSM::Monitor.new(:directories => true)
monitor.path('/data/testing/uploads') do |path|
path.update do |base, relative, ftype|
output(relative)
end
path.create do |base, relative, ftype|
output(relative)
end
path.delete { |base, relative, ftype| puts "DELETED #{relative} (#{ftype})" }
end
monitor.run
end
def output(relative)
puts "#{relative} added"
values = relative.split('_',)
sitenum = values[0]
numrecs = values[1]
filename = values[2]
puts sitenum
end
end
My first "puts" gives me the full filename (it's just there to show me the script is working), and the second puts returns the 'sitenum'. I want to be able to access this "outside" of this output method. I have this file (named watcher.rb) in a libs/ folder and I have a second file in the project root called 'monitor.rb' which contains simply:
require './lib/watcher'
watcher = Watcher.new
watcher.start
And I can't figure out how to access my 'sitenum', 'numrecs' and 'filename' from this file. I'm not sure if it needs to be a variable, instance variable or what. I've played around w/ attr_accessible and other things, and nothing works. I decided to ask here since I've been spinning my wheels for a couple of things, and I'm starting to confuse myself by searching on my own.
Thanks in advance for any help or advice you may have.
At the top of the Watcher class, you're going to want to define three attr_accessor declarations, which give the behavior you want. (attr_reader if you're only reading, attr_writer if you're only writing, attr_accessor if both.)
class Watcher
attr_accessor :sitenum, :numrecs, :filename
...
# later on, use # for class variables
...
#sitenum = 5
...
end
Now you should have no problem with watcher.sitenum etc. Here's an example.
EDIT: Some typos.
In addition to Jordan Scales' answer, these variable should initialized
class Watcher
attr_accessor :sitenum, :numrecs, :filename
def initialize
#sitenum = 'default value'
#numrecs = 'default value'
#filename = 'default value'
end
...
end
Otherwise you'll get uninformative value nil

Dynamically change the LogDevice of Ruby Logger

Is it possible to dynamically change the LogDevice of Ruby Logger?
If so, it would allow for some un-obtrusive changes to my existing codebase.
Currently Ruby Logger uses StringIO as the LogDevice:
#logDevice = StringIO.new("", "r+")
#log = Logger.new(#logDevice) // a reference to this is used by many objects
// both are instance vars
...
#log.info('some log') // Logging activity
...
// Before program ends, transmit logs to a server
Can LogDevice be dynamically changed to continue logging to a file?
(dynamic change because initially the filename is not known.)
Or if log device cannot be change can the StringIO object start writing to a file?
Instead of doing the above, I could write to a temporary log file, but wanted to check if the above can be done because it would be a less obstrusive change to the existing codebase.
The object you give to the logger just has to implement the 'write' and 'close' methods, so you can easily write your own 'io':
class MyIO
def initialize
#file = nil
#history = StringIO.new "", "w"
end
def file=(filename)
#file = File.open(filename, 'a+')
#file.write #history.string if #history
#history = nil
end
def write(data)
#history.write(data) if #history
#file.write(data) if #file
end
def close
#file.close if #file
end
end
create the logger with an instance of that and keep a reference to the instance. Then, whenever you know the filename, just set it with the 'file=' method.

Resources