Download from S3 doesn't work using RestClient - ruby

I have a file which is a compressed image. Its size on disk on Windows is 125,966,232 bytes. I uploaded it to S3 using the Ruby aws-S3 gem. Its size on S3, from the properties pane, is also 125,966,232 bytes.
When I download it to disk using the web browser and the image's public URL, it downloads fine, and its size is consistent. It also uncompresses fine with my uncompression utility.
When I download the file from the S3 bucket to disk using RestClient (1.6.7), its size on disk after downloading is 126,456,885 bytes, which is 490,653 bytes bigger. The download completes without error, but the file cannot be uncompressed with my uncompression utility. Repeating the download with the same S3 file always produces a file of the same size, 126,456,885 bytes.
require 'rest_client'
local_file = "C:\\test\\test_download.cap"
s3_bucket = "my-bucket-not"
remote_S3_file_url = "https://s3.amazonaws.com/#{s3_bucket}/test_download.cap"
File.open(local_file, "w") do |f|
  f.write RestClient.get remote_S3_file_url
end
What do I have to do to ensure that the downloaded file is exactly the same size and/or decompresses properly?

I'd recommend not saving the file as text but instead as binary.
You're using:
File.open(local_file, "w")
'w' means:
"w" Write-only, truncates existing file
to zero length or creates a new file for writing.
Use the 'wb' mode for saving the file instead. Without 'b', Windows writes every LF byte in the data as CR+LF, which inflates the file and corrupts its contents:
"b" Binary file mode
Suppresses EOL <-> CRLF conversion on Windows. And
sets external encoding to ASCII-8BIT unless explicitly
specified.
So use:
File.open(local_file, 'wb')
See "IO Open Mode" for more information.
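As a minimal sketch of why the file grows, the snippet below imitates the text-mode translation in plain Ruby so the effect can be seen on any platform. The helper names `simulate_text_mode_write` and `save_binary` and the sample bytes are made up for illustration; they are not part of RestClient or the aws-s3 gem.

```ruby
require 'tmpdir'

# Imitates what Windows text mode does on write:
# every LF byte (0x0A) is emitted as CR+LF (0x0D 0x0A).
def simulate_text_mode_write(data)
  data.b.gsub("\n", "\r\n")
end

# Writes the bytes unchanged on every platform ('b' disables the translation).
def save_binary(path, data)
  File.open(path, 'wb') { |f| f.write(data) }
end

# Stand-in for compressed data; it happens to contain two LF bytes.
payload = "\x1F\x8B\n\x00\n\xFF".b

Dir.mktmpdir do |dir|
  path = File.join(dir, 'test_download.cap')
  save_binary(path, payload)
  puts "binary mode size:    #{File.size(path)}"
end
puts "simulated text size: #{simulate_text_mode_write(payload).bytesize}"
puts "LF bytes in payload: #{payload.count("\n")}"
```

If this explanation holds for the file in the question, the extra bytes in the corrupted download should equal the number of LF bytes in the original file. With the code from the question, the only change needed is the mode string passed to File.open.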

Related

How can i re-compress DLL files of xamarin application

I have a Xamarin application whose files are compressed using lz4. I could easily decompress the files using lz4.block.decompress, but I couldn't compress them again. Can anyone help me with that?
I used the following script to decompress the file, and I could read the content of the DLL file. But when I modified something and tried to recompress it to patch the APK file, I couldn't compress the file in the right format.
import sys
import lz4.block

# input_filepath and output_filepath are set elsewhere in the script
with open(input_filepath, "rb") as xalz_file:
    data = xalz_file
    header = data.read(8)
    if header[:4] != b"XALZ":
        sys.exit("The input file does not contain the expected magic bytes, aborting ...")
    payload = data.read()
decompressed = lz4.block.decompress(payload)
with open(output_filepath, "wb") as output_file:
    output_file.write(decompressed)
print("result written to file")
Note that I couldn't decompress or compress the DLL file using lz4tools.

Read the file names or the number of files in tar.gz

I have a tar.gz file, which holds multiple csv files archived. I need to read the list of the file names or at least the number of files.
This is what I tried:
require 'zlib'
file = Zlib::GzipReader.open('test/data/file_name.tar.gz')
file.each_line do |line|
  p line
end
but this only prints each line in the csv files, not the file names. I also tried this:
require 'zlib'
Zlib::GzipReader.open('test/data/file_name.tar.gz') { |f|
  p f.read
}
which reads similarly, but character by character instead of line by line.
Any idea how I could get the list of file names or at least the number of files within the archive?
You need to use a tar reader on the uncompressed output.
".tar.gz" means that two processes were applied to generate the file. First a set of files were "tarred" to make a ".tar" file which contains a sequence of (file header block, uncompressed file data) units. Then that was gzipped as a single stream of bytes, to make the ".tar.gz". In reality, the .tar file was very likely never stored anywhere, but generated as a stream of bytes and gzipped on the fly to write out the .tar.gz file directly.
To get the contents, you reverse the process, ungzipping, and then feeding the result of that to a tar reader to interpret the file header blocks and extract the data. Again, you can ungzip and read the tarred file contents on the fly, with no need to store the intermediate .tar file.
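The two-step reversal described above can be sketched in Ruby roughly as follows. This is one possible approach, not the only one: it assumes the tar reader that ships with RubyGems (Gem::Package::TarReader) is acceptable, and the method name `tar_gz_file_names` is made up for this example.

```ruby
require 'zlib'
require 'rubygems/package'

# Returns the names of the regular files inside a gzipped tar archive.
# `io` is any readable IO positioned at the start of the .tar.gz data.
def tar_gz_file_names(io)
  names = []
  # Step 1: ungzip on the fly; no intermediate .tar file is stored.
  Zlib::GzipReader.wrap(io) do |gz|
    # Step 2: let a tar reader interpret the header blocks.
    Gem::Package::TarReader.new(gz) do |tar|
      tar.each { |entry| names << entry.full_name if entry.file? }
    end
  end
  names
end

# Possible usage with the path from the question:
#   File.open('test/data/file_name.tar.gz', 'rb') do |f|
#     names = tar_gz_file_names(f)
#     p names        # list of file names
#     p names.size   # number of files
#   end
```

Opening the archive with 'rb' keeps the compressed bytes intact on Windows as well.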

Ruby Dropbox APP: How to download a word document

I'm having troubles trying to download word documents from a dropbox using an APP controlled by a ruby program. (I would like to have the ability to download any file from a dropbox).
The code they provide is great for "downloading" a .txt file, but if you try using the same code to download a .docx file, the "downloaded" file won't open in word due to "corruption."
The code I'm using:
contents = @client.get_file(path + filename)
open(filename, 'w') {|f| f.puts contents }
For variable examples, path could be '/', and filename could be 'aFile.docx'. This works, but the file, aFile.docx, that is created can not be opened. I am aware that this is simply grabbing the contents of the file and then creating a new file and inserting the contents.
Try this:
open(filename, 'wb') { |f| f.write contents }
Two changes from your code:
I used the file mode wb to specify that I'm going to write binary data. I don't think this makes a difference on Linux and OS X, but it matters on Windows.
I used write instead of puts. puts appends a newline whenever the data does not already end with one, so it can alter arbitrary binary data, while write outputs the bytes exactly as given. I assume this contributes to the "corruption."
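A small sketch of the difference between the two calls, using a few made-up bytes in place of the real document. The helper name `write_bytes` and the file names are hypothetical, and nothing here touches the Dropbox client.

```ruby
require 'tmpdir'

# Saves raw bytes: binary mode, and write instead of puts.
def write_bytes(path, contents)
  File.open(path, 'wb') { |f| f.write(contents) }
end

# Stand-in for the downloaded data; note that it does not end in "\n".
contents = "PK\x03\x04\x00\xFF".b

Dir.mktmpdir do |dir|
  with_write = File.join(dir, 'write.docx')
  with_puts  = File.join(dir, 'puts.docx')

  write_bytes(with_write, contents)
  File.open(with_puts, 'wb') { |f| f.puts contents } # puts appends "\n" here

  puts "write kept the data intact: #{File.binread(with_write) == contents}"
  puts "extra bytes added by puts:  #{File.size(with_puts) - contents.bytesize}"
end
```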

Undoing Encoding Translations

I wrote a simple ftp client that was downloading some zip files from a client site. For all intents and purposes the code looked like this:
ftp = Net::FTP.new
ftp.connect 'ftp.server.com'
ftp.login 'user', 'pwd'
ftp.binary = true
t = Tempfile.new 'file'
ftp.getbinaryfile('remotefile', nil) {|data| t << data}
t.close
ftp.close
FileUtils.mv t, '/path/to/file'
This ran fine and dandy when it was running on a Linux box, but when the code got moved to a Windows box the binary data started getting corrupted and I had to set the tempfile into binmode before writing to it.
My question: Is there any way I can "fix" or undo the encoding translations that were done when the zip files were originally downloaded and corrupted to get those files back, essentially going from the encoding back to binary?
Some further info from the Windows box the code was running from
t = Tempfile.new('file')
t.external_encoding # -> nil
t.internal_encoding # -> nil
Encoding.default_internal # -> nil
Encoding.default_external.name # -> "IBM437"
I think the data get corrupted while saving into the file, not while downloading.
On Windows, text file lines are separated with CR+LF. If you open a file in text mode and write an LF byte into it, the LF automatically gets replaced with CR+LF.
Zip files are binary files. Use binary mode to work with them.
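A minimal sketch of the binmode fix mentioned in the question, with no FTP connection involved. The helper name `buffer_chunks` is made up, and `chunks` stands in for the blocks of data that the download would yield.

```ruby
require 'tempfile'

# Appends each chunk to a tempfile that has been switched to binary mode,
# so no newline translation is applied on any platform.
# Returns the closed (but not deleted) Tempfile.
def buffer_chunks(chunks)
  t = Tempfile.new('file')
  t.binmode                       # must happen before the first write
  chunks.each { |data| t << data }
  t.close                         # flush and close; the file stays on disk
  t
end

# Made-up sample data containing CR and LF bytes.
chunks = ["PK\x03\x04".b, "\r\n\n\x00".b]
t = buffer_chunks(chunks)
puts File.binread(t.path) == chunks.join
```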

not in gzip format (Zlib::GzipFile::Error) - while decompressing the gzip file in Ruby

I am trying to decompress a file with following Ruby code.
File.open("file_compressed.gz") do |compressed|
  File.open("file_decomp","w") do |decompressed|
    gz = Zlib::GzipReader.new(compressed)
    result = gz.read
    decompressed.write(result)
    gz.close
  end
end
But I am getting following error -
not in gzip format (Zlib::GzipFile::Error)
./features/support/abc/abc_file.rb:44:in `initialize'
When I decompress the same file using gzip command on Mac it produced the correct uncompressed output.
For following command I can see -
$ file file_compressed.gz
file_compressed.gz: gzip compressed data, from FAT filesystem (MS-DOS, OS/2, NT)
Do I need to put any header data while I create the compressed file with Zlib? Because when I use the inflate method instead of the GzipReader I get following error -
incorrect header check (Zlib::DataError)
./features/support/abc/abc_file.rb:69:in `inflate'
If you're on a platform that doesn't use LF delimiters, but CR+LF, you may need to open the file in binary mode for reading:
File.open("file_compressed.gz", "rb") do |compressed|
# ...
end
This should also avoid interpreting the input stream as anything but 8-bit binary.
Be sure to open your output file the same way using "wb" as the flag.
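Putting both flags together, the decompression from the question might look like the sketch below. The method name `gunzip_file` is made up for this example, and it reads the whole file into memory, as the original code does.

```ruby
require 'zlib'

# Decompresses the gzip file at `src` into `dest`.
# Both files are opened in binary mode so no bytes are translated.
def gunzip_file(src, dest)
  File.open(src, 'rb') do |compressed|
    File.open(dest, 'wb') do |decompressed|
      # wrap closes the reader (and the underlying file) when the block ends.
      Zlib::GzipReader.wrap(compressed) do |gz|
        decompressed.write(gz.read)
      end
    end
  end
end

# Possible usage with the file names from the question:
#   gunzip_file("file_compressed.gz", "file_decomp")
```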
