losing data when zipping files

I am using rubyzip on Windows to zip up a directory.
When I unzip the archive, some of the files are smaller than the originals.
Zipping should be a lossless operation, so I am wondering why this is happening.
Here is the code I am using:
require 'rubygems'
require 'find'
require 'zip/zip'
output = "c:/temp/test.zip"
zos = Zip::ZipOutputStream.new(output)
path = "C:/temp/profile"
::Find.find(path) do |file|
  next if File.directory?(file)
  entry = file.sub("#{path}/", '')
  zos.put_next_entry(entry)
  zos << File.read(file)
end
zos.close
The specific files that have the issue come from a Firefox profile: cert8.db and key3.db.
Running the same code under JRuby on Linux with the same files works as expected: all the files keep their original size.
Any ideas why this is a problem on Windows?

I think the problem is that you are reading the files as text rather than as binary. On Windows the two modes differ in how they handle things such as line breaks and the EOF character, so text mode can drop or alter bytes in a binary file.
Try File.open(file,'rb'){|f|f.read} instead of File.read(file).
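As a rough illustration of the binary-mode read suggested above, the sketch below writes a few bytes that text mode on Windows is known to be sensitive to (a CRLF pair and a 0x1A byte) and reads them back in 'rb' mode. The helper names read_bytes and round_trip_size and the sample data are made up for this example; they are not part of rubyzip.

```ruby
require 'tmpdir'

# Read a file's raw bytes; 'rb' asks for no newline or EOF translation.
def read_bytes(path)
  File.open(path, 'rb') { |f| f.read }
end

# Hypothetical check: write bytes to a temp file, read them back in
# binary mode, and report how many bytes came back.
def round_trip_size(bytes)
  Dir.mktmpdir do |dir|
    path = File.join(dir, 'sample.db')
    File.open(path, 'wb') { |f| f.write(bytes) }
    read_bytes(path).bytesize
  end
end

# Sample data containing a CRLF pair and a 0x1A byte (assumed test input).
sample = "key\r\nvalue\x1Atail".b
puts round_trip_size(sample) == sample.bytesize
```

In the zipping loop from the question, the same idea would mean passing the result of a binary read to zos << instead of File.read(file). File.binread(file) should be an equivalent shortcut on Ruby 1.9 and later.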

Related

Read the content of the only file in a zip file in Ruby

I have a zip file in Ruby at a particular location on the file system. There is only 1 file in that zip file. I want to read the content of that file. How can I do it (without knowing the name of the file up-front)? I've tried looking at various libraries/ways but the APIs were either outdated, or the libraries weren't maintained for years.

You can use RubyZip:
require 'zip'
a = Zip::File.open(path_to_zip_file) { |z| z.first.get_input_stream.read }

How to find text file in same directory

I am trying to read a list of baby names from the year 1880 in CSV format. My program, when run in the terminal on OS X, returns an error indicating that yob1880.txt doesn't exist.
No such file or directory @ rb_sysopen - /names/yob1880.txt (Errno::ENOENT)
from names.rb:2:in `<main>'
The location of both the script and the text file is /Users/*****/names.
lines = []
File.expand_path('../yob1880.txt', __FILE__)
IO.foreach('../yob1880.txt') do |line|
  lines << line
  if lines.size >= 1000
    lines = FasterCSV.parse(lines.join) rescue next
    store lines
    lines = []
  end
end
store lines

If you're running the script from the /Users/*****/names directory, and the files also exist there, you should simply remove the "../" from your pathnames to prevent looking in /Users/***** for the files.
Use this approach to referencing your files, instead:
File.expand_path('yob1880.txt', __FILE__)
IO.foreach('yob1880.txt') do |line|
Note that the File.expand_path is doing nothing at the moment, as the return value is not captured or used for any purpose; it simply consumes resources when it executes. Depending on your actual intent, it could realistically be removed.
Going deeper on this topic, it may be better for the script to be explicit about the directory in which it locates files. Consider these approaches:
Change to the directory in which the script exists, prior to opening files
Dir.chdir(File.dirname(File.expand_path(__FILE__)))
IO.foreach('yob1880.txt') do |line|
This explicitly requires that the script and the data be stored relative to one another; in this case, they would be stored in the same directory.
Provide a specific path to the files
# do not use Dir.chdir or File.expand_path
IO.foreach('/Users/****/yob1880.txt') do |line|
This can work if the script is used in a small, contained environment, such as your own machine, but it will be brittle if the data is moved to another directory or to another machine. Generally, this approach is not useful, except for short-lived scripts for personal use.
Never put a script using this approach into production use.
Work only with files in the current directory
# do not use Dir.chdir or File.expand_path
IO.foreach('yob1880.txt') do |line|
This will work if you run the script from the directory in which the data exists, but will fail if run from another directory. This approach typically works better when the script detects the contents of the directory, rather than requiring certain files to already exist there.
Many Linux/Unix utilities, such as cat and grep, use this approach unless command-line options override it.
Accept a command-line option to find data files
require 'optparse'
base_directory = "."
OptionParser.new do |opts|
  opts.banner = "Usage: example.rb [options]"
  # Store the directory itself; Dir.chdir returns 0, not a path
  opts.on('-d', '--dir NAME', 'Directory name') { |v| base_directory = File.expand_path(v) }
end.parse!  # without parse!, the options are never read from ARGV
IO.foreach(File.join(base_directory, 'yob1880.txt')) do |line|
  # do lines
end
This will give your script a -d or --dir option with which to specify the directory in which to find files.
Use a configuration file to find data files
This code would allow you to use a YAML configuration file to define where the files are located:
require 'yaml'
config_filename = File.expand_path("~/yob/config.yml")
# Expects a top-level "base" key holding the data directory
config = YAML.load_file(config_filename)
base_directory = config["base"]
IO.foreach(File.join(base_directory, 'yob1880.txt')) do |line|
  # do lines
end
This doesn't include any error handling related to finding and loading the config file, but it gets the point across. For additional information on using a YAML config file with error handling, see my answer on Asking user for information, and never having to ask again.
Final thoughts
You have the tools to establish ways to locate your data files. You can even mix-and-match solutions for a more sophisticated solution. For instance, you could default to the current directory (or the script directory) when no config file exists, and allow the command-line option to manually override the directory, when necessary.
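One possible way to combine these approaches is sketched below: start from a default directory, let a config file override it, and let a command-line option override both. The method name resolve_base_directory, its parameters, and the "base" config key are assumptions made for this illustration, and error handling is kept to a minimum.

```ruby
require 'optparse'
require 'yaml'

# Hypothetical helper: pick the data directory from, in increasing
# priority, a default, a YAML config file, and a -d/--dir option.
def resolve_base_directory(argv, config_filename, default_dir = Dir.pwd)
  base = default_dir

  # The config file is optional; use its "base" key only when present.
  if File.exist?(config_filename)
    config = YAML.load_file(config_filename)
    base = config['base'] if config.is_a?(Hash) && config['base']
  end

  # parse (without !) leaves the caller's argv array untouched.
  OptionParser.new do |opts|
    opts.banner = "Usage: example.rb [options]"
    opts.on('-d', '--dir NAME', 'Directory name') { |v| base = v }
  end.parse(argv)

  File.expand_path(base)
end

base_directory = resolve_base_directory(ARGV, File.expand_path('~/yob/config.yml'))
data_file = File.join(base_directory, 'yob1880.txt')
```

The precedence order here is only one reasonable choice; a script that should always prefer its own directory could pass File.dirname(File.expand_path(__FILE__)) as the default instead of the current directory.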

Here's a technique I always use when I want to normalize the current working directory for my scripts. This is a good idea because in most cases you code your script and place the supporting files in the same folder, or in a sub-folder of the main script.
This resets the current working directory to the folder in which the script is located. After that, it's much easier to figure out the paths to everything:
# Reset working directory to same folder as current script file
Dir.chdir(File.dirname(File.expand_path(__FILE__)))
After that you can open your data file with just:
IO.foreach('yob1880.txt')

Ruby Dropbox APP: How to download a word document

I'm having trouble downloading Word documents from Dropbox using an app controlled by a Ruby program. (I would like to be able to download any file from a Dropbox account.)
The code they provide works well for "downloading" a .txt file, but if you use the same code to download a .docx file, the "downloaded" file won't open in Word due to "corruption."
The code I'm using:
contents = @client.get_file(path + filename)
open(filename, 'w') {|f| f.puts contents }
For example, path could be '/' and filename could be 'aFile.docx'. This runs, but the file that is created, aFile.docx, cannot be opened. I am aware that this simply grabs the contents of the file, creates a new file, and inserts the contents.

Try this:
open(filename, 'wb') { |f| f.write contents }
Two changes from your code:
I used the file mode wb to specify that I'm going to write binary data. I don't think this makes a difference on Linux and OS X, but it matters on Windows.
I used write instead of puts. puts appends a newline when the data does not already end with one, while write outputs exactly the bytes it is given. Together with the text-mode translation on Windows, I assume this is the source of the "corruption."
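To make the difference between puts and write concrete, the following sketch writes the same bytes both ways and compares the resulting file sizes. The helper name sizes_after_write and the payload are invented for this example; the payload merely stands in for whatever the Dropbox client returns.

```ruby
require 'tmpdir'

# Hypothetical helper: write the same data with puts and with write,
# both in binary mode, and return the size of each resulting file.
def sizes_after_write(contents)
  Dir.mktmpdir do |dir|
    with_puts  = File.join(dir, 'puts.bin')
    with_write = File.join(dir, 'write.bin')
    File.open(with_puts, 'wb')  { |f| f.puts contents }
    File.open(with_write, 'wb') { |f| f.write contents }
    { puts: File.size(with_puts), write: File.size(with_write) }
  end
end

# Stand-in for downloaded binary data that does not end in a newline.
payload = "PK\x03\x04binary".b
result = sizes_after_write(payload)
puts "original: #{payload.bytesize}, write: #{result[:write]}, puts: #{result[:puts]}"
```

Only the file produced with write should match the original byte count; the one produced with puts picks up an extra trailing newline.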

File paths in Ruby

So I want to make a file path relative to the directory it is in, in Ruby.
I have a project, and I want it to be able to find the file no matter what directory the project is unzipped into. (Say the code is run on different machines, for example) I can't figure it out for the life of me.
It seems for requires that I can do this:
require File.dirname(__FILE__) + '/comparison'
What can I do for a file that is in a different directory than my src folder?
Instead of listing,
file = 'C:/whole path/long/very_long/file.txt'
I'd like to say:
file = 'file.txt'
or
file = File.helpful_method + 'file.txt'

file = File.join(File.dirname(__FILE__), '..', 'another_dir', 'file.txt')
Replace '..', 'another_dir' with the relative path segments that reach 'file.txt'.
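A small sketch of this idea, wrapped in a helper so that the path is also normalized: project_path is a made-up name, and the script path is passed in as an argument (it would normally be __FILE__) so the example can run anywhere.

```ruby
# Hypothetical helper: build an absolute path from the location of a
# script plus relative segments, collapsing any '..' along the way.
def project_path(script_path, *segments)
  File.expand_path(File.join(File.dirname(script_path), *segments))
end

# In a real script the first argument would be __FILE__.
file = project_path('/project/src/main.rb', '..', 'another_dir', 'file.txt')
puts file
```

Because File.expand_path resolves the '..' segment, the result no longer depends on the current working directory or on where the project was unzipped.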

If you're running Ruby 1.9.2 or later, you can use require_relative instead:
require_relative '../somewhere/file.rb'
This doesn't solve the general problem of referring to files by their relative path, but if all you're doing is requiring the file, it should work.

RubyZip - files from different directories have path in zip

I'm trying to use RubyZip to package up some files. At the moment I have a method which happily zips one particular directory and its sub-directories.
def zip_directory(zipfile)
  Dir["#{@directory_to_zip}/**/**"].reject{|f| reject_file(f)}.each do |file_path|
    file_name = file_path.sub(@directory_to_zip+'/','')
    zipfile.add(file_name, file_path)
  end
end
However, I want to include a file from a completely different folder. I have the following method to solve this:
def zip_additional(zipfile)
  additional_files.reject{|f| reject_file(f)}.each do |file_path|
    file_name = file_path.split('\\').last
    zipfile.add(file_name, file_path)
  end
end
While the file is added, it also copies the directory structure instead of placing the file at the root of the folder. This is really annoying and makes it more difficult to work with.
How can I get around this?
Thanks
Ben

Zip libraries usually have a setting to include (or exclude) the full path; check that setting.

Turns out it was because the filename had the full path in it. My split didn't work because the path used a / instead of a \. With the path removed from the filename, it just worked.
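One way to avoid splitting on a specific separator is to let Ruby extract the last path component with File.basename. The sketch below assumes a helper named root_entry_name, which is not part of RubyZip, and an example path invented for illustration.

```ruby
# Hypothetical helper: the name a file should get at the root of the
# archive, taken from the last component of its path.
def root_entry_name(file_path)
  File.basename(file_path)
end

# Forward-slash path, as in the case described above.
puts root_entry_name('other/folder/report.txt')
```

In zip_additional this would replace the split('\\').last call. On Windows, File.basename should accept both / and \ as separators, though that is worth verifying on the Ruby version in use.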
