I have a function which generates some data. Right now, this data is written to a file. Once the file is complete, I upload it via HTTParty:
require 'httparty'

url = "..."

def generate_data(file)
  file << "First line of data\n"
  sleep 1
  file << "Second line of data\n"
  sleep 1
  file << "Third line of data\n"
end

File.open('payload.txt', 'w+') do |file|
  generate_data(file)
  file.rewind
  HTTParty.post(url, body: { file: file })
end
As it happens, generate_data takes a while. I would like to speed the script up and avoid writing to disk by interleaving the generation of the data with the upload. How could I do this using HTTParty?
I was looking for something like StringIO that could be used as a fixed-size FIFO buffer: the generate_data function writes to it (and blocks when the buffer is full) while the HTTParty.post call reads from it (and blocks when the buffer is empty). However, I failed to find anything like that.
You need to use streaming, via the body_stream option:
HTTParty.put(
  'http://localhost:3000/train',
  body_stream: StringIO.new('foo')
)
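A StringIO built up front won't interleave generation and upload, though. To get the blocking FIFO behavior described in the question, one option (a sketch of mine, not something from the answer above) is IO.pipe: the write end blocks when the OS pipe buffer is full, while Net::HTTP reads from the other end through body_stream. Net::HTTP needs either a Content-Length or chunked transfer encoding when streaming from an IO, hence the extra header. url and generate_data are assumed to be the ones from the question:

require 'httparty'

# An OS pipe acts as a fixed-size FIFO: writes block when it is full,
# reads block when it is empty.
reader, writer = IO.pipe

producer = Thread.new do
  begin
    generate_data(writer)  # write to the pipe instead of a file
  ensure
    writer.close           # signals EOF so the upload can finish
  end
end

HTTParty.post(url,
              body_stream: reader,
              headers: { 'Transfer-Encoding' => 'chunked' })
producer.join
reader.close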
I'm attempting to download a ~2 GB file and write it to a file locally, but I'm running into an Errno::EINVAL error.
Here's the applicable code:
require 'open-uri'
require 'ruby-progressbar'
require 'pry'

File.open(local_file, "wb") do |tempfile|
  puts "Downloading the backup..."
  pbar = nil
  open(backup_url,
       :read_timeout => nil,
       :content_length_proc => lambda do |content_length|
         if content_length&.positive?
           pbar = ProgressBar.create(:total => content_length)
         end
       end,
       :progress_proc => ->(size) { pbar&.progress = size }) do |retrieved|
    begin
      tempfile.binmode
      tempfile << retrieved.read
      tempfile.close
    rescue Exception => e
      binding.pry
    end
  end
end
Read your file in chunks.
The line causing the issue is here:
tempfile << retrieved.read
This reads the entire contents into memory before writing them to the tempfile. If the content is small, this isn't a big deal, but if it is quite large (how large depends on the system, configuration, OS and available resources), this can cause an Errno::EINVAL error, like Invalid argument @ io_fread or Invalid argument @ io_write.
To work around this, read the content in chunks and write each chunk to the tempfile. Something like this:
tempfile.write( retrieved.read( 1024 ) ) until retrieved.eof?
This will get chunks of 1024 bytes and write each chunk to the tempfile until retrieved reaches the end of the file (i.e. .eof?).
If retrieved is a String rather than an IO (so its read doesn't take a size parameter), you can wrap it in a StringIO, like this:
retrieved_io = StringIO.new( retrieved )
tempfile.write( retrieved_io.read( 1024 ) ) until retrieved_io.eof?
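Alternatively (my addition, not part of the answer above), IO.copy_stream does the same chunked read/write loop internally and works with the IO yielded by open-uri:

# Copies from the open-uri IO into the already-open tempfile
# without loading the whole body into memory.
IO.copy_stream(retrieved, tempfile)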
Let's say you have the following code:
from_file, to_file = ARGV
puts "Copying from #{from_file} to #{to_file}"
#in_file = open(from_file)
#indata = in_file.read
indata = open(from_file).read # Combined in_file and indata.
puts "The input file is #{indata.length} bytes long."
puts "Does the output file exist? #{File.exist?(to_file)}"
puts "Ready, hit RETURN to continue or CTRL-C to abort."
$stdin.gets
out_file = open(to_file, 'w')
out_file.write(indata)
puts "Alright, all done."
out_file.close
#in_file.close
How would you close the file descriptor opened for indata? You would need to close the File that was opened, but indata is really (open(from_file)).read, so no reference to the File object is kept.
P.S. Since it's a script, the file will be closed automatically upon exit. But let's assume we're running a long-lived backend service, and since we don't know when the garbage collector will kick in, we need to close it explicitly. What would you do?
If you are just copying the file...
you could just use FileUtils.cp:
FileUtils.cp("from_file", "to_file")
or even shell-out to the operating system and do it with a system command.
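For example (my sketch; assumes a POSIX cp is available):

# Shelling out to the OS copy command; returns true on success.
system("cp", "from_file", "to_file")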
Let's suppose you want to do something to the input file before writing it to the output file.
If from_file is not large,...
you could "gulp it" into a string using IO.read:
str = IO.read(from_file)
manipulate str as desired to obtain new_str, then blast it to the output file using IO.write:
IO.write("to_file", new_str)
Note that for the class File:
File < IO #=> true # File inherits IO's methods
which is why you often see this written File.read(...) and File.write(...).
If from_file is large, read a line, write a line...
provided the changes to be made are done for each line separately.
f = File.open("to_file", "w") # or File.new("to_file", "w")
IO.foreach("from_file") do |line|
  # < modify line to produce new_line >
  f.puts new_line
end
f.close
foreach closes "from_file" when it's finished. If f.close is not present, Ruby will close "to_file" when the File object is garbage collected, but that may not happen promptly, so it's a good idea to close it explicitly as soon as you're done with it.
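A variant of the same loop (my sketch, with upcase standing in for whatever per-line change you need) that uses the block form of File.open, so "to_file" is closed automatically even if an exception is raised:

File.open("to_file", "w") do |f|
  IO.foreach("from_file") do |line|
    f.puts line.upcase # placeholder transformation
  end
end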
Passing File.open a block is generally a nice way to go about things, so I’ll offer it up as an alternative even if it doesn’t seem to be quite what you asked.
indata = File.open(from_file) do |f|
  f.read
end
I want to copy the contents of one file to another using Ruby's file methods.
How can I do it using a simple Ruby program using file methods?
There is a very handy method for this, the IO.copy_stream method; see the output of ri copy_stream.
Example usage:
File.open('src.txt', 'w') do |f|
  f.puts 'Some text'
end

IO.copy_stream('src.txt', 'dest.txt')
For those that are interested, here's a variation of the IO.copy_stream, File.open + block answer(s) (written against Ruby 2.2.x, three years too late).
require 'tempfile'

copy = Tempfile.new('copy') # Tempfile.new requires a basename on Ruby 2.2
File.open(file, 'rb') do |input_stream| # `file` is the path of the source file
  File.open(copy, 'wb') do |output_stream|
    IO.copy_stream(input_stream, output_stream)
  end
end
As a precaution, I would recommend using a buffer unless you can guarantee that the whole file always fits into memory:
File.open("source", "rb") do |input|
  File.open("target", "wb") do |output|
    while buff = input.read(4096)
      output.write(buff)
    end
  end
end
Here is my implementation:
class File
  def self.copy(source, target)
    File.open(source, 'rb') do |infile|
      File.open(target, 'wb') do |outfile|
        while buffer = infile.read(4096)
          outfile << buffer
        end
      end
    end
  end
end
Usage:
File.copy sourcepath, targetpath
Here is a simple way of doing that using Ruby file operation methods:
source_file, destination_file = ARGV
script = $0
input = File.open(source_file)
data_to_copy = input.read() # gather the data using read() method
puts "The source file is #{data_to_copy.length} bytes long"
output = File.open(destination_file, 'w')
output.write(data_to_copy) # write up the data using write() method
puts "File has been copied"
output.close()
input.close()
You can also use File.exist? to check whether the file exists beforehand; it returns true if it does.
Here's a fast and concise way to do it.
# Open first file, read it, store it, then close it
input = File.open(ARGV[0]) {|f| f.read() }
# Open second file, write to it, then close it
output = File.open(ARGV[1], 'w') {|f| f.write(input) }
An example of running this would be:
$ ruby this_script.rb from_file.txt to_file.txt
This runs this_script.rb and takes in two arguments through the command line: the first, in our case, is from_file.txt (the file being copied from) and the second is to_file.txt (the file being copied to).
You can also use File.binread and File.binwrite if you wish to hold onto the file contents for a bit. (Other answers use an instant copy_stream instead.)
If the contents are binary rather than plain text, such as images, plain File.read and File.write can mangle the data on platforms (such as Windows) where text mode differs from binary mode.
require 'tempfile'

temp_image = Tempfile.new('image.jpg')
actual_img = IO.binread('image.jpg')
IO.binwrite(temp_image, actual_img)
Source: binread,
binwrite.
In my app, I have the following code:
File.open "filename", "w" do |file|
  file.write("text")
end
I want to test this code via RSpec. What are the best practices for doing this?
I would suggest using StringIO for this and making sure your SUT accepts a stream to write to instead of a filename. That way, different files or outputs can be used (more reusable), including a StringIO (good for testing).
So in your test code (assuming your SUT instance is sutObject and the serializer is named writeStuffTo):
testIO = StringIO.new
sutObject.writeStuffTo testIO
testIO.string.should == "Hello, world!"
StringIO behaves like an open file, so if the code already works with a File object, it will work with a StringIO.
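For concreteness, here's a minimal sketch (mine; the class and its behavior are hypothetical, only the names match the test above) of such a stream-accepting SUT:

# A serializer that writes to whatever IO-like object it is given:
# production code can pass an open File, tests can pass a StringIO.
class Greeter
  def writeStuffTo(io)
    io.write("Hello, world!")
  end
end

sutObject = Greeter.new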
For very simple i/o, you can just mock File. So, given:
def foo
  File.open "filename", "w" do |file|
    file.write("text")
  end
end
then:
describe "foo" do
  it "should create 'filename' and put 'text' in it" do
    file = mock('file')
    File.should_receive(:open).with("filename", "w").and_yield(file)
    file.should_receive(:write).with("text")
    foo
  end
end
However, this approach falls flat in the presence of multiple reads/writes: simple refactorings which do not change the final state of the file can cause the test to break. In that case (and possibly in any case) you should prefer @Danny Staple's answer.
This is how to mock File (with RSpec 3.4), so you can write to a buffer and check its content later:
it 'How to mock File.open for write with rspec 3.4' do
  @buffer = StringIO.new
  @filename = "somefile.txt"
  @content = "the content of the file"
  allow(File).to receive(:open).with(@filename, 'w').and_yield(@buffer)
  # call the function that writes to the file
  File.open(@filename, 'w') { |f| f.write(@content) }
  # read the buffer and check its content
  expect(@buffer.string).to eq(@content)
end
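One caveat (my addition): stubbing File.open like this intercepts matching calls for the whole example. If other code in the example still needs the real File.open, forward unmatched calls to the original implementation first:

# Unstubbed arguments fall through to the real File.open.
allow(File).to receive(:open).and_call_original
allow(File).to receive(:open).with(@filename, 'w').and_yield(@buffer)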
You can use fakefs. It stubs the filesystem and creates files in memory.
You can check with
File.exist? "filename"
whether the file was created, and you can also just read it with
File.open
and run expectations on its contents.
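A minimal sketch of what that might look like (my addition, assuming the fakefs gem and the foo method from the earlier answer):

require 'fakefs/spec_helpers'

describe 'foo' do
  include FakeFS::SpecHelpers # activates the in-memory filesystem per example

  it "creates 'filename' containing 'text'" do
    foo
    expect(File.exist?('filename')).to be true
    expect(File.read('filename')).to eq('text')
  end
end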
For someone like me who needs to modify multiple files in multiple directories (e.g. a generator for Rails), I use a temp folder:
require 'tmpdir'

Dir.mktmpdir do |dir|
  Dir.chdir(dir) do
    # Generate a clean Rails folder
    Rails::Generators::AppGenerator.start ['foo', '--skip-bundle']

    File.open(File.join(dir, 'foo.txt'), 'w') { |f| f.write("write your stuff here") }
    expect(File.exist?(File.join(dir, 'foo.txt'))).to eq(true)
  end
end
I have a small application that processes emails downloaded from an IMAP server with fetchmail. The processing consists of finding base64-encoded attachments with an XML file inside.
Here is the code (somewhat stripped):
def extract_data_from_mailfile(mailfile)
  begin
    mail = TMail::Mail.load(mailfile)
  rescue
    return nil
  end

  bodies_found = []
  if mail.multipart? then
    mail.parts.each do |m|
      bodies_found << m.body
    end
  end

  ## Let's parse the parts we found in the mail to see if one of them
  ## looks XML-ish. Hacky, but works for now.
  bodies_found.each do |body|
    if body =~ /^<\?XML /i then
      return body
    end
  end

  return nil # Nothing found.
end
This works great, but on large XML files (typically mailfiles >600k), it breaks.
>> mail.parts[1].body.size
=> 487424 <-- should have been larger - doesn't include the end of the file
Base64 decoding doesn't happen automatically either. This is what I get when I try to decode manually:
>> Base64::decode64(mail.parts[1].body)
[...] ll="SMTP"></Sendt><Sendt"
That's part of the XML file, but it has been clipped.
Any way to get the entire attachment? Any tips?
I see your code breaks out of the loop at the first XML fragment found. Perhaps the larger messages divide their XML into smaller chunks inside the same multipart MIME message? You would then need to collect all the matching bodies and concatenate them:
mail.parts[1].body[0] + mail.parts[1].body[1]
(PS. It's a long shot, I haven't tried this)
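Following that idea, a sketch (mine, and just as untested as the answer's suggestion) of how the final loop in extract_data_from_mailfile could collect every XML-looking part instead of returning the first one:

# Inside extract_data_from_mailfile, replacing the bodies_found.each loop:
# join all XML-looking bodies instead of returning only the first match.
xml_bodies = bodies_found.select { |body| body =~ /^<\?XML /i }
return xml_bodies.join unless xml_bodies.empty?
return nil # Nothing found.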