Output All Paths in a Zip File in Ruby

I am trying to check a zip file for file paths that exceed a specified character limit. I was able to do this easily with a regular folder path using:
Dir.glob("#{root_path}/*")
And then iterating through the paths given from the glob and comparing their lengths to my given character limit. Is there a way to do this with a zip file without actually unzipping it?
Any help is much appreciated.

I ended up using the rubyzip gem's Zip::File class, which has an entries method. If you open the zip with that library, you can get every file path by iterating over each entry and reading its name, a string holding the entry's path inside the archive.
For example:
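require 'zip' # rubyzip gem

paths = []
# The block form closes the archive when the block finishes.
Zip::File.open(my_zip) do |zip|
  zip.entries.each do |entry|
    paths << entry.name # path of the entry inside the archive, e.g. "dir/file.txt"
  end
end
paths now holds every path in the zip file as a string, ready to compare against your character limit.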
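If installing rubyzip is not an option, the same names can be read without extracting anything by parsing the archive's central directory, which the ZIP format stores at the end of the file. The sketch below is a stdlib-only alternative, not what the answer above used; the method names zip_entry_names and long_zip_paths are hypothetical, and it assumes a single-disk, non-ZIP64 archive.

```ruby
# Minimal stdlib-only ZIP central-directory reader (a sketch, not rubyzip).
# Assumptions: single-disk, non-ZIP64 archive; +io+ is binary and supports
# #size, #seek and #read (File opened with "rb" and StringIO both do).
EOCD_SIGNATURE   = "PK\x05\x06".b # end-of-central-directory record
CDH_SIGNATURE    = "PK\x01\x02".b # central directory file header
EOCD_FIXED_SIZE  = 22             # EOCD length without the trailing comment
MAX_COMMENT_SIZE = 0xFFFF         # the archive comment is at most 65535 bytes

def zip_entry_names(io)
  size = io.size
  # The EOCD record sits at the end, followed only by the optional comment.
  tail_size = [size, EOCD_FIXED_SIZE + MAX_COMMENT_SIZE].min
  io.seek(size - tail_size)
  tail = io.read(tail_size).to_s.b
  eocd = tail.rindex(EOCD_SIGNATURE)
  raise ArgumentError, "not a zip archive (no EOCD record)" unless eocd

  # EOCD bytes 10..19: total entries (2), central directory size (4), offset (4).
  count, cd_size, cd_offset = tail.byteslice(eocd + 10, 10).unpack("vVV")
  if count == 0xFFFF || cd_offset == 0xFFFFFFFF
    raise NotImplementedError, "ZIP64 archives are not handled by this sketch"
  end

  io.seek(cd_offset)
  cd = io.read(cd_size).to_s.b
  pos = 0
  Array.new(count) do
    raise ArgumentError, "corrupt central directory" unless cd.byteslice(pos, 4) == CDH_SIGNATURE
    flags = cd.byteslice(pos + 8, 2).unpack1("v")
    name_len, extra_len, comment_len = cd.byteslice(pos + 28, 6).unpack("vvv")
    name = cd.byteslice(pos + 46, name_len)
    # Flag bit 11 marks UTF-8 names; other names are left as raw bytes.
    name.force_encoding(flags & 0x800 != 0 ? Encoding::UTF_8 : Encoding::BINARY)
    pos += 46 + name_len + extra_len + comment_len
    name
  end
end

# Entry paths longer than +limit+ characters.
def long_zip_paths(io, limit)
  zip_entry_names(io).select { |name| name.length > limit }
end
```

For example, File.open("archive.zip", "rb") { |f| long_zip_paths(f, 100) } would list entries longer than 100 characters (100 is an arbitrary example limit). A full library covers ZIP64 and other edge cases this sketch deliberately skips.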

Related

Why are Ruby's file related types string-based (stringly typed)?

e.g. Dir.entries returns an array of strings vs an array containing File or Dir instances.
Most methods are class methods on the Dir and File types; the instances are anemic in comparison.
There is no Dir#folders or Dir#files; instead I explicitly loop over Dir.entries, build the path (File.expand_path) for each item, and check File.directory?.
Simple use-cases like get all .svg files in this directory seem to require a number of hoops/loops/checks. Am I using Ruby wrong or does this facet of Ruby seem very un-ruby-ish?
Depending on your needs, File or Dir might do just fine.
When you need to chain commands and (rightfully) feel it is un-ruby-ish to use only class methods with string parameters, you can use Pathname. It is part of the standard library.
Examples
Dirs and Files
require 'pathname'
my_folder = Pathname.new('./')
dirs, files = my_folder.children.partition(&:directory?)
# dirs is now an Array of Pathnames pointing to subdirectories of my_folder
# files is now an Array of Pathnames pointing to files inside my_folder
All .svg files
If there might be folders with a .svg extension, filter the pathnames returned by Pathname.glob:
svg_files = Pathname.glob("folder/*.svg").select(&:file?)
If you want a dedicated method, you can reopen Pathname:
class Pathname
  # All regular files directly inside this directory.
  def files
    children.select(&:file?)
  end
end
a_dir = Pathname.new('folder/')
p a_dir.files.find_all { |f| f.extname == '.svg' }
Iterating the Directory tree
Pathname#find will help.
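As a sketch of the Pathname#find approach, the method below walks a tree, skips hidden directories with Find.prune (Pathname#find is built on the Find module, so prune works inside its block), and collects .svg files in any letter case. The name svg_files_under and the choice to skip hidden directories are illustrative assumptions.

```ruby
require 'pathname'
require 'find'

# Recursively collect .svg files (any letter case) below +root+,
# returned as Pathnames relative to +root+, skipping hidden directories.
def svg_files_under(root)
  root = Pathname.new(root)
  found = []
  root.find do |path|
    if path.directory?
      # Do not descend into hidden directories such as .git (root itself is kept).
      Find.prune if path != root && path.basename.to_s.start_with?(".")
    elsif path.file? && path.extname.casecmp?(".svg")
      found << path.relative_path_from(root)
    end
  end
  found.sort
end
```

Directories whose names end in .svg never reach the elsif branch, so no extra filtering is needed.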
Until you open the file it is just a path (string).
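To find all .svg files:
svgs = Dir.glob(File.join('/path/to/dir', '*.svg'))
On Windows, file paths are case-insensitive, but on Linux and most other Unix-like systems file.svg and file.SVG are different files (macOS volumes are case-insensitive by default).
To get both .svg and .SVG files, pass the File::FNM_CASEFOLD flag. Note that Ruby's Dir.glob documentation has stated that case sensitivity depends on the system and that File::FNM_CASEFOLD is ignored, so check that the flag has the effect you want on your platform.
To get .svg files recursively, use the pattern **/*.svg:
svgs = Dir.glob('/path/to/dir/**/*.svg', File::FNM_CASEFOLD)
If you expect directories whose names end in .svg, filter them out: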
svgs.reject! { |path| File.directory?(path) }
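If you would rather not depend on File::FNM_CASEFOLD, a character-class pattern matches the extension in any letter case on every platform. This sketch assumes Ruby 2.5 or later for the base: keyword of Dir.glob; svg_paths is a hypothetical name, and it returns paths relative to root.

```ruby
# Find .svg files in any letter case without relying on File::FNM_CASEFOLD:
# [sS][vV][gG] matches each letter of the extension in either case.
def svg_paths(root)
  Dir.glob("**/*.[sS][vV][gG]", base: root)
     .reject { |rel| File.directory?(File.join(root, rel)) } # drop dirs named *.svg
     .sort
end
```

Using base: also avoids problems when root itself contains glob metacharacters such as [ or {.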

Wildcard file requires in Ruby

As an example:
Dir[File.dirname(__FILE__) + "/support/**/*.rb"].each { |f| require f }
This is how RSpec requires all of the ruby files in the support directory and all subdirectories. I know this has to do with "/**/*". What does this mean in Ruby? How does it work?
File.dirname(__FILE__) is the directory containing the current file. ** and * are glob wildcards, as in Unix shells. Appending "/support/**/*.rb" to that directory matches every file ending in .rb at any depth below the support sub-directory, including files directly inside support itself.
Passing that to Dir[] gives the array of such files. each iterates over such files, and require loads each file.
I believe the /**/ part means any directory, at any depth (including none), and *.rb means any file that ends with the .rb extension, regardless of its name.
So, basically, you get every .rb file in any folder under
#{current_dir}/support/#{any_dirs}/#{any_file}.rb
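To see the difference between * and ** concretely, the sketch below builds a throwaway directory tree (the file names are made up for illustration) and globs it with and without **/.

```ruby
require 'tmpdir'
require 'fileutils'

# Build a temporary tree, then compare "support/*.rb" with "support/**/*.rb".
def glob_demo
  Dir.mktmpdir do |dir|
    %w[support/a.rb support/x/b.rb support/x/y/c.rb support/readme.txt].each do |rel|
      path = File.join(dir, rel)
      FileUtils.mkdir_p(File.dirname(path))
      File.write(path, "")
    end
    one_level = Dir.glob("support/*.rb", base: dir).sort
    any_depth = Dir.glob("support/**/*.rb", base: dir).sort
    [one_level, any_depth] # Dir.mktmpdir returns the block's value
  end
end

one_level, any_depth = glob_demo
p one_level # => ["support/a.rb"]
p any_depth # => ["support/a.rb", "support/x/b.rb", "support/x/y/c.rb"]
```

So **/ matches zero or more directory levels, while a single * never crosses a /.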

ruby return section of directory name

I have a directory, which contains a series of folders, which are of the pattern YYYY-MM-DD_NUMBER . If I am navigating through one of these folders using Dir, how can I return part of the folder name that contains YYYY-MM-DD ?
For example, 2013-05-23_160332 would be the name of one folder, inside a larger directory called main_dir. I use Dir to get some file names and store them in an array, like so:
array = Dir["/main_dir/**/data/*.csv"]
I then iterate through the array and print the files. How can I also return/print the part of the title directory that I am currently accessing with each iteration (again, in the form of YYYY-MM-DD)?
I might do something like this.
re = Regexp.new('\d{4}-\d{2}-\d{2}')
array.each do |path|
  puts path[re] # first YYYY-MM-DD found in the path, or nil if there is none
  # other processing ...
end
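A slightly stricter sketch anchors the match on the YYYY-MM-DD_NUMBER folder itself, so a date that appears only in a file name is not picked up, and optionally converts the result to a Date. DATED_FOLDER, folder_date and folder_date_as_date are hypothetical names, and the pattern assumes / as the separator, which is what Dir returns.

```ruby
require 'date'

# Matches "/YYYY-MM-DD_<digits>/" and captures the date part.
DATED_FOLDER = %r{/(\d{4}-\d{2}-\d{2})_\d+/}

# The YYYY-MM-DD string of the dated folder in +path+, or nil.
def folder_date(path)
  path[DATED_FOLDER, 1]
end

# Same, parsed into a Date object (nil when there is no dated folder).
def folder_date_as_date(path)
  str = folder_date(path)
  str && Date.strptime(str, "%Y-%m-%d")
end

# Usage with the glob from the question:
# Dir["/main_dir/**/data/*.csv"].each { |path| puts "#{folder_date(path)}  #{path}" }
```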

Ruby FTP Separating files from Folders

I'm trying to crawl FTP and pull down all the files recursively.
Up until now I was trying to pull down a directory with
ftp.list.each do |entry|
  if entry.split(/\s+/)[0][0, 1] == "d"
    out[:dirs] << entry.split.last unless black_dirs.include? entry.split.last
  else
    out[:files] << entry.split.last unless black_files.include? entry.split.last
  end
end
But it turns out that taking the text after the last space gives the wrong name for files and directories whose names contain spaces.
Need a little help on the logic here.
You can avoid recursion by listing all files at once, if the server supports ** patterns in NLST:
files = ftp.nlst('**/*.*')
Directories are not included in the list, but each name still contains the full FTP path.
EDIT
I'm assuming that each file name contains a dot and directory names don't. Thanks for mentioning that, @Niklas B.
There is a huge variety of FTP servers around.
We have clients who use obscure proprietary, Windows-based servers, and the file listings they return look completely different from the Linux versions.
So for each file or directory entry, I try changing into it; if that fails, I treat it as a file :)
The following method is "bullet proof":
require 'net/ftp'

# Checks whether the given file_name is actually a file:
# if we can chdir into it, it is a directory.
def is_ftp_file?(ftp, file_name)
  ftp.chdir(file_name)
  ftp.chdir('..')
  false
rescue
  true
end
file_names = ftp.nlst.select { |fname| is_ftp_file?(ftp, fname) }
Works like a charm, but please note: if the FTP directory has tons of files in it - this method takes a while to traverse all of them.
You can also use a regular expression. I put one together; please verify that it works for you, as I don't know whether your directory listing looks different. Named captures require Ruby 1.9 or later.
reg = /^(?<type>.{1})(?<mode>\S+)\s+(?<number>\d+)\s+(?<owner>\S+)\s+(?<group>\S+)\s+(?<size>\d+)\s+(?<mod_time>.{12})\s+(?<path>.+)$/
match = entry.match(reg)
You are able to access the elements by name then
match[:type] contains 'd' if the entry is a directory and '-' if it is a regular file ('l' marks a symbolic link).
All the other elements are there as well. Most importantly match[:path].
Assuming that the FTP server returns Unix-like file listings, the following code works. At least for me.
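# "d" plus the rest of the mode string, link count (any number of digits),
# owner, group, size, month, day, time or year, then the name.
regex = /^d\S+\s+\d+\s+\S+\s+\S+\s+\d+\s+\w+\s+\d+\s+[\d:]+\s(.+)/
ftp.ls.each do |line|
  if (dir = line.match(regex))
    puts dir[1]
  end
end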
dir[1] contains the name of the directory (given that the inspected line actually represents a directory).
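Another way around the spaces-in-names problem from the question, assuming the usual nine-column Unix ls -l layout, is to split each line at most eight times so that everything after the time/year column stays together as the name. parse_unix_list_line is a hypothetical helper; Windows-style (DOS) listings would need a different parser.

```ruby
# Parse one Unix-style "ls -l" line into [type, name], or nil for lines
# such as "total 8". Assumed columns: mode, links, owner, group, size,
# month, day, time/year, name. The split limit of 9 keeps spaces in names.
def parse_unix_list_line(line)
  fields = line.chomp.lstrip.split(/\s+/, 9)
  return nil if fields.size < 9
  type = case fields[0][0]
         when "d" then :directory
         when "l" then :symlink
         when "-" then :file
         else :other
         end
  name = fields[8]
  # Symlinks are shown as "name -> target"; keep only the link name.
  name = name.split(" -> ", 2).first if type == :symlink
  [type, name]
end
```

In the question's loop, the name would go to out[:dirs] when the type is :directory and to out[:files] when it is :file.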
As @Alex pointed out, using patterns in filenames for this is hardly reliable. Directories CAN have dots in their names (.ssh for example), and listings can be very different on different servers.
His method works, but as he himself points out, takes too long.
I prefer using the #size method of Net::FTP.
It returns the size of a file, and on servers that reject the SIZE command for directories it raises Net::FTPPermError.
# Reuses an existing Net::FTP connection instead of opening a new one per call.
def item_is_file?(ftp, item)
  ftp.size(item).is_a?(Numeric)
rescue Net::FTPPermError
  false
end
I'll add my solution to the mix...
Using ftp.nlst('**/*.*') did not work for me... server doesn't seem to support that ** syntax.
The chdir trick with a rescue seems expensive and hackish.
Assuming that all files have at least one char, a single period, and then an extension, I did a simple recursion.
def list_all_files(ftp, folder)
  entries = ftp.nlst(folder)
  file_regex = /.+\..*/ # at least one char, a dot, then the extension
  files = entries.select { |e| e.match(file_regex) }
  subfolders = entries.reject { |e| e.match(file_regex) }
  subfolders.each do |subfolder|
    files += list_all_files(ftp, subfolder)
  end
  files
end
nlst seems to return the full path of whatever it finds, non-recursively, so each time you get a listing, separate the files from the folders, then process every folder you find recursively and collect all the file results.
To call it, pass a starting folder, for example:
files = list_all_files(ftp, "my_starting_folder/my_sub_folder")
files = list_all_files(ftp, ".")
files = list_all_files(ftp, "")
files = list_all_files(ftp, nil)
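list_all_files treats any entry without a dot as a folder, so a dotless file such as README makes it call nlst on a file; on a server that answers NLST README with just README, that recursion never ends. The variant below is a sketch that assumes exactly that echo behaviour (it varies between servers): it treats a self-echo as a file and keeps a set of visited paths. list_all_files_safe is a hypothetical name.

```ruby
require 'set'

# Recursive NLST walk that tolerates dotless file names.
# Assumption: NLST on a plain file returns just that file's name.
def list_all_files_safe(ftp, folder, seen = Set.new)
  return [] unless seen.add?(folder)     # already visited: stop
  entries = ftp.nlst(folder)
  return [folder] if entries == [folder] # the name was echoed back: a file
  entries.flat_map do |entry|
    if entry.match?(/.+\..*/)            # same "name.ext" heuristic as above
      [entry]
    else
      list_all_files_safe(ftp, entry, seen)
    end
  end
end
```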

RubyZip - files from different directories have path in zip

I'm trying to use RubyZip to package up some files. At the moment I have a method which happily zips one particular directory and its sub-directories.
def zip_directory(zipfile)
  Dir["#{@directory_to_zip}/**/**"].reject { |f| reject_file(f) }.each do |file_path|
    file_name = file_path.sub(@directory_to_zip + '/', '')
    zipfile.add(file_name, file_path)
  end
end
However, I also want to include a file from a completely different folder. I have the following method to solve this:
def zip_additional(zipfile)
  additional_files.reject { |f| reject_file(f) }.each do |file_path|
    file_name = file_path.split('\\').last
    zipfile.add(file_name, file_path)
  end
end
While the file is added, it also copies the directory structure instead of placing the file at the root of the folder. This is really annoying and makes it more difficult to work with.
How can I get around this?
Thanks
Ben
Zip libraries often have a setting to include (or exclude) the full path of each file; check whether yours has one.
Turns out it was because the filename had the full path in it. My split didn't work because the path used / instead of \. With the path removed from the filename, it just worked.
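A sketch of the fix described above: take the last path component whichever separator the path uses, so the file lands at the root of the archive. flat_entry_name is a hypothetical helper; on Windows, File.basename already accepts both separators, but on other platforms it only splits on /.

```ruby
# Last component of +file_path+, splitting on both "/" and "\".
def flat_entry_name(file_path)
  file_path.split(%r{[/\\]}).last
end

# In zip_additional: zipfile.add(flat_entry_name(file_path), file_path)
```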
