I wrote a simple FTP client that downloads some zip files from a client's site. For all intents and purposes, the code looked like this:
require 'net/ftp'
require 'tempfile'
require 'fileutils'

ftp = Net::FTP.new
ftp.connect 'ftp.server.com'
ftp.login 'user', 'pwd'
ftp.binary = true
t = Tempfile.new 'file'   # opened in text mode (no binmode)
ftp.getbinaryfile('remotefile', nil) {|data| t << data}
t.close
ftp.close
FileUtils.mv t.path, '/path/to/file'
This ran fine and dandy when it was running on a Linux box, but when the code got moved to a Windows box the binary data started getting corrupted and I had to set the tempfile into binmode before writing to it.
My question: is there any way to "fix", i.e. undo, the translation that was applied when the zip files were originally downloaded and corrupted, so I can get those files back? Essentially, going from the translated data back to the original binary?
Some further info from the Windows box the code was running on:
t = Tempfile.new('file')
t.external_encoding # -> nil
t.internal_encoding # -> nil
Encoding.default_internal # -> nil
Encoding.default_external.name # -> "IBM437"
I think the data gets corrupted while being saved to the file, not while downloading.
On Windows, text file lines are separated with CR+LF. If you open a file in text mode and write an LF byte ("\n") into it, the LF automatically gets replaced with CR+LF.
Zip files are binary files. Use binary mode to work with them.
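A minimal sketch of both points, assuming the only damage was the text-mode LF to CR+LF translation on write and that no character-set transcoding happened (the Tempfile's external_encoding and internal_encoding were both nil). That translation adds exactly one CR in front of every LF and leaves all other bytes alone, so removing the CR that immediately precedes each LF should restore the original bytes. If anything else touched the files (for example a later text-mode read), this will not be enough. The helper names `text_mode_write`, `undo_text_mode_write`, `repair_file` and `save_chunks_binary` are hypothetical; `text_mode_write` only simulates Windows text mode so the round trip can be checked on any platform.

```ruby
require 'tempfile'
require 'fileutils'

# Simulates a Windows text-mode write: every LF gains a CR in front of it.
def text_mode_write(bytes)
  bytes.b.gsub("\n", "\r\n")
end

# Inverse of the above: drop the one CR that was inserted before each LF.
# A CR that was already in the original data survives, because only the CR
# immediately preceding an LF is removed ("\r\r\n" becomes "\r\n").
def undo_text_mode_write(bytes)
  bytes.b.gsub("\r\n", "\n")
end

# Repairs a file damaged by a text-mode write (assumption: nothing else changed it).
def repair_file(src, dest)
  File.binwrite(dest, undo_text_mode_write(File.binread(src)))
end

# The forward fix: collect downloaded chunks (e.g. the blocks yielded by
# Net::FTP#getbinaryfile) in a Tempfile switched to binary mode, then move it.
def save_chunks_binary(chunks, dest)
  t = Tempfile.new('file')
  t.binmode
  chunks.each { |data| t << data }
  t.close
  FileUtils.mv(t.path, dest)
end
```

Usage for an already-damaged download would be `repair_file('corrupted.zip', 'repaired.zip')`.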
I have a file which is a compressed image. Its size on disk on Windows is 125,966,232 bytes. I uploaded it to S3 using the Ruby aws-s3 gem. Its size on S3, according to the properties pane, is also 125,966,232 bytes.
When I download it to disk using the web browser and the image's public URL, it downloads fine, and its size is consistent. It also uncompresses fine with my uncompression utility.
When I download the file from the S3 bucket to disk using RestClient (1.6.7), its size on disk after downloading is 126,456,885 bytes, 490,653 bytes bigger. This "successful" download cannot be uncompressed with my uncompression utility, and running the download repeatedly with the same S3 file always produces a file of exactly 126,456,885 bytes.
require 'rest_client'
local_file = "C:\\test\\test_download.cap"
s3_bucket = "my-bucket-not"
remote_S3_file_url = "https://s3.amazonaws.com/#{s3_bucket}/test_download.cap"
File.open(local_file, "w") do |f|
  f.write RestClient.get(remote_S3_file_url)
end
What do I have to do to ensure that the downloaded file is exactly the same size and/or decompresses properly?
I'd recommend not saving the file as text but instead as binary.
You're using:
File.open(local_file, "w")
'w' means:
"w" Write-only, truncates existing file
to zero length or creates a new file for writing.
Use the 'wb' mode for saving the file instead. Without 'b', line endings will be converted to Windows format (each LF becomes CR+LF), effectively ballooning the size and corrupting the file's contents:
"b" Binary file mode
Suppresses EOL <-> CRLF conversion on Windows. And
sets external encoding to ASCII-8BIT unless explicitly
specified.
So use:
File.open(local_file, 'wb')
See "IO Open Mode" for more information.
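A minimal sketch of the fix with the network call left out, so it runs anywhere: `save_binary` (a hypothetical helper) writes the payload in 'wb' mode, and `text_mode_size` (also hypothetical) shows the arithmetic behind the growth on Windows. Each LF byte in the payload gains one CR in text mode, so the size difference reported in the question should equal the number of LF bytes in the original image; that is an inference, not something measured here. In the real code the payload would be the response from `RestClient.get(remote_S3_file_url)`.

```ruby
# Writes the payload byte-for-byte: 'wb' disables CRLF translation on Windows
# and sets the external encoding to ASCII-8BIT.
def save_binary(path, payload)
  File.open(path, 'wb') { |f| f.write(payload) }
end

# Size that a text-mode ('w') write would produce on Windows:
# one extra CR for every LF byte in the payload.
def text_mode_size(payload)
  payload.bytesize + payload.b.count("\n")
end

# Hypothetical usage with the question's variables (needs rest-client and network):
#   save_binary(local_file, RestClient.get(remote_S3_file_url))
```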
I have a tar.gz file saved on disk and I want to leave it packed there, but I need to open one file within the archive, read from it and save some information somewhere.
File structure:
base_folder
  file_i_need.txt
  other_folder
    other_file
Code (it's not much; I tried ten million different ways and this is what's left):
require 'rubygems/package'
require 'zlib'

def self.open_file(file)
  uncompressed_file = Gem::Package::TarReader.new(Zlib::GzipReader.open(file))
  uncompressed_file.rewind
end
When I run it in a console, I get:
<Gem::Package::TarReader:0x007fbaac178090>
and I can run commands on the entries. I just haven't figured out how to open an entry and read from it without saving it unpacked to disk. I mainly need the string from the text file.
Any help appreciated. I might just be missing something...
TarReader is Enumerable; iterating over it yields Gem::Package::TarReader::Entry objects, and each entry can be read like an IO.
So, to retrieve the text content of a file by its name, one might do:
require 'rubygems/package'
require 'zlib'

uncompressed = Gem::Package::TarReader.new(Zlib::GzipReader.open(file))
text = uncompressed.detect do |f|
  f.full_name == 'base_folder/file_i_need.txt'
end.read
#⇒ Hello, I’m content of the text file, located inside gzipped tar
Hope it helps.
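A self-contained sketch of the same idea, assuming the `Gem::Package::TarReader` and `Gem::Package::TarWriter` classes that ship with RubyGems. It builds a small .tar.gz with the question's layout (the file contents are made up), then reads one entry straight out of the gzip stream without unpacking anything to disk. The entry is read inside the `each` block, while the stream is still positioned at that entry's data. `read_tar_gz_entry` and `build_sample_tar_gz` are hypothetical helper names.

```ruby
require 'rubygems/package'
require 'zlib'

# Returns the contents of the entry named `wanted` inside the .tar.gz at
# `path`, or nil if there is no such entry. Nothing is unpacked to disk.
def read_tar_gz_entry(path, wanted)
  Zlib::GzipReader.open(path) do |gz|
    reader = Gem::Package::TarReader.new(gz)
    reader.each do |entry|
      # Read now, before the reader advances to the next header.
      return entry.read if entry.full_name == wanted
    end
    nil
  end
end

# Builds a small archive for testing. add_file_simple is used because it
# writes the header up front with a known size; add_file, as far as I know,
# needs to seek back, which a gzip stream cannot do.
def build_sample_tar_gz(path)
  files = {
    'base_folder/file_i_need.txt' => "Hello from inside the archive\n",
    'base_folder/other_folder/other_file' => 'not needed'
  }
  Zlib::GzipWriter.open(path) do |gz|
    Gem::Package::TarWriter.new(gz) do |tar|
      files.each do |name, data|
        tar.add_file_simple(name, 0o644, data.bytesize) { |io| io.write(data) }
      end
    end
  end
end
```

With a real archive, `read_tar_gz_entry(file, 'base_folder/file_i_need.txt')` returns the text as a String.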
I'm having trouble downloading Word documents from Dropbox using an app controlled by a Ruby program. (I would like to be able to download any file from Dropbox.)
The example code they provide works well for "downloading" a .txt file, but if I use the same code to download a .docx file, the "downloaded" file won't open in Word because it is "corrupted."
The code I'm using:
contents = @client.get_file(path + filename)
open(filename, 'w') {|f| f.puts contents }
For example, path could be '/' and filename could be 'aFile.docx'. The code runs, but the resulting aFile.docx cannot be opened. I am aware that this simply grabs the contents of the file, creates a new file, and writes the contents into it.
Try this:
open(filename, 'wb') { |f| f.write contents }
Two changes from your code:
I used the file mode wb to specify that I'm going to write binary data. I don't think this makes a difference on Linux and OS X, but it matters on Windows.
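I used write instead of puts. puts appends a newline unless the data already ends with one, so the saved file would not be byte-for-byte identical to the download; write outputs the bytes exactly as given. I assume this, together with text mode on Windows, is the source of the "corruption."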
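A quick way to see the write/puts difference using only core Ruby. The fake payload stands in for `contents` from the question, and `save_with_write` / `save_with_puts` are hypothetical helpers. Both open the file in 'wb' mode so that only the puts behavior is being compared.

```ruby
# IO#write: the bytes go to disk unchanged.
def save_with_write(path, data)
  File.open(path, 'wb') { |f| f.write(data) }
end

# IO#puts: a trailing "\n" is added unless data already ends with one.
def save_with_puts(path, data)
  File.open(path, 'wb') { |f| f.puts(data) }
end
```

For a payload that does not end in a newline, the puts version produces a file one byte longer than the payload.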
I need to upload Word and Excel files on my site.
I created an upload form, upload the file, and save it like this:
f = File.new("public/files/#{user.id.to_s}/filename", "w+")
f.write params[:file].read
f.close
Word and Excel files must be saved as binary data.
Sadly, the file mode "b" is only for Windows, and I'm on Linux.
What to do?
Yours,
Joern
Binary file mode "b" may appear with any of the key letters (r, r+, w, w+, a, a+), so you can write f = File.new("public/files/#{user.id.to_s}/filename", "w+b").
And the "b" mode is not only for Windows. The Ruby documentation describes it as: "Binary file mode. May appear with any of the key letters r, r+, w, w+, a, a+. Suppresses EOL <-> CRLF conversion on Windows. And sets external encoding to ASCII-8BIT unless explicitly specified." Nothing there says "b" is Windows-only; it only notes that line-ending handling differs between Windows and Linux. So you can use "w+b" on both Linux and Windows.
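A minimal sketch showing that "w+b" works on any platform: the file is opened read/write, truncated or created, in binary mode, and the bytes read back match what was written. `save_upload` is a hypothetical helper; in the question the data would be `params[:file].read`.

```ruby
# Hypothetical helper: store an uploaded payload with mode "w+b"
# and return the bytes read back from the file.
def save_upload(path, data)
  File.open(path, 'w+b') do |f|
    f.write(data)
    f.rewind
    # binmode? is true on every platform when "b" is in the mode string.
    raise IOError, 'expected binary mode' unless f.binmode?
    f.read
  end
end

# Hypothetical usage in the question's controller:
#   save_upload("public/files/#{user.id}/filename", params[:file].read)
```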
I am using rubyzip on Windows to zip up a directory.
When I unzip the archive, some of the files are smaller than they were.
Zipping should be a lossless operation, so I am wondering why this is happening.
Here is the code I am using:
require 'rubygems'
require 'find'
require 'zip/zip'
output = "c:/temp/test.zip"
zos = Zip::ZipOutputStream.new(output)
path = "C:/temp/profile"
::Find.find(path) do |file|
  next if File.directory?(file)
  entry = file.sub("#{path}/", '')
  zos.put_next_entry(entry)
  zos << File.read(file)
end
zos.close
The affected files are from a Firefox profile: cert8.db and key3.db.
Running the same code under JRuby on Linux with the same files works as expected: all the files are the same size.
Any ideas why this is a problem on Windows?
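I think the problem is that you are reading the files as text, not as binary. The two modes differ in how they handle line breaks and end-of-file markers: on Windows, reading in text mode turns each CR+LF into a single LF, so the data you add to the archive is shorter than the file on disk.
Try File.open(file, 'rb') { |f| f.read } (or the shorthand File.binread(file)) instead of File.read(file).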
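A sketch of the question's directory walk with the binary read swapped in, kept independent of rubyzip so it runs on its own. `each_file_binary` is a hypothetical helper that yields each relative entry name with the file's exact bytes; the commented usage shows where the question's `zos` calls would go.

```ruby
require 'find'

# Yields (entry_name, bytes) for every regular file under root. Reading with
# File.binread means no CR+LF -> LF translation can shorten a file on Windows.
def each_file_binary(root)
  Find.find(root) do |file|
    next if File.directory?(file)
    yield file.sub("#{root}/", ''), File.binread(file)
  end
end

# Usage with the question's rubyzip stream:
#   each_file_binary(path) do |entry, bytes|
#     zos.put_next_entry(entry)
#     zos << bytes
#   end
```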