Ruby: return section of directory name

I have a directory that contains a series of folders named with the pattern YYYY-MM-DD_NUMBER. If I am navigating through one of these folders using Dir, how can I return the part of the folder name that contains YYYY-MM-DD?
For example, 2013-05-23_160332 would be the name of a folder, and it would be part of a larger directory called main_dir. I use Dir to get access to some file names and store them in an array, like so:
array = Dir["/main_dir/**/data/*.csv"]
I then iterate through the array and print the files. How can I also return/print the part of the enclosing folder name that I am currently accessing on each iteration (again, in the form of YYYY-MM-DD)?

I might do something like this:
re = Regexp.new('\d{4}-\d{2}-\d{2}')
array.each do |folder|
  puts folder[re]
  # other processing of the matched path ...
end
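For example, applied to one of the paths matched by the glob above (the CSV file name here is hypothetical), the regex slice returns just the date part:
path = "/main_dir/2013-05-23_160332/data/example.csv"
puts path[/\d{4}-\d{2}-\d{2}/]   # => 2013-05-23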


Why are Ruby's file related types string-based (stringly typed)?

e.g. Dir.entries returns an array of strings vs an array containing File or Dir instances.
Most methods are on the Dir and File classes themselves. The instances are anemic in comparison.
There is no Dir#folders or Dir#files - instead I explicitly:
- loop over Dir.entries
- build the path (File.expand_path) for each item
- check File.directory?
Simple use cases like "get all .svg files in this directory" seem to require a number of hoops/loops/checks. Am I using Ruby wrong, or does this facet of Ruby seem very un-ruby-ish?
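For reference, the explicit dance described above looks roughly like this (a sketch of the verbose pattern being questioned; the directory name is hypothetical):
dir     = "some_dir"                         # hypothetical directory
entries = Dir.entries(dir)                   # plain strings, including "." and ".."
paths   = entries.map { |e| File.expand_path(e, dir) }
files   = paths.reject { |p| File.directory?(p) }
svgs    = files.select { |p| File.extname(p) == ".svg" }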
Depending on your needs, File or Dir might do just fine.
When you need to chain calls and (rightfully) think it feels un-ruby-ish to only use class methods with string parameters, you can use Pathname. It is part of the standard library.
Examples
Dirs and Files
require 'pathname'
my_folder = Pathname.new('./')
dirs, files = my_folder.children.partition(&:directory?)
# dirs is now an Array of Pathnames pointing to subdirectories of my_folder
# files is now an Array of Pathnames pointing to files inside my_folder
All .svg files
If for some reason there might be folders with a .svg extension, you can just filter the pathnames returned by Pathname.glob:
svg_files = Pathname.glob("folder/*.svg").select(&:file?)
If you want a specific syntax:
class Pathname
  def files
    children.select(&:file?)
  end
end

a_dir = Pathname.new('folder/')
p a_dir.files.find_all { |f| f.extname == '.svg' }
Iterating the Directory tree
Pathname#find will help.
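A minimal sketch of walking a tree with Pathname#find (the folder name is hypothetical):
require 'pathname'

Pathname.new('folder/').find do |path|
  # find descends the tree depth-first, yielding every entry as a Pathname
  puts path if path.directory?
end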
Until you open the file it is just a path (string).
To open all .svg files
svgs = Dir.glob(File.join('/path/to/dir', '*.svg'))
On Windows, case doesn't matter in file paths, but on all unixoid systems (Linux, macOS, ...) file.svg is different from file.SVG.
To get all .svg and .SVG files you need the File::FNM_CASEFOLD flag.
If you want to get .svg files recursively, you need **/*.svg:
svgs = Dir.glob('/path/to/dir/**/*.svg', File::FNM_CASEFOLD)
If you expect directories ending in .svg, then filter them out:
svgs.reject! { |path| File.directory?(path) }

Keeping track of current directory per user

I am currently creating a client/server application which is trying to keep track of multiple connected users' current directories by pairing their unique identifier (username) with a new Dir object in an array of hashes, like so:
users = []
user = {:user => "userN", :dir => Dir.new(".")}
users.push(user)
...
However, when accessing the :dir key within the users hash, I can't seem to use the object's methods properly.
For example:
users[0][:dir].chdir("../")
Returns undefined method `chdir' for #<Dir:.>
Likewise, the entries method, which is supposed to accept one argument naming the directory to list, accepts no arguments here, and when called with no arguments it only lists the directory that was set when the Dir object was created.
Is there a simple way to keep track of a user's pseudo location within the filesystem?
Edit: I found the Pathname class and it sort of implements what I need. I am just wondering now if there is a cleaner way of implementing the cd and ls commands when using it.
# Simulate a single user's default directory starting point
$dir = Pathname.pwd

# Create a backup of the current directory, change to the new directory,
# test to see if the directory exists and if not return to the backup
def cd(dir)
  backup = $dir
  $dir += dir
  $dir = backup if !($dir.directory?)
end

# Take the array of Pathname objects from entries, convert them
# to their string values and return the sorted array
def ls(dir)
  $dir.entries.map { |pathobject| pathobject.to_s }.sort
end
Your problem actually isn't that you're using the hash incorrectly; it's that Dir.chdir is a global method that changes the working directory of the whole current process. Dir.entries is similar.
If you're trying to keep track of a path on a per-user basis, you could store it as a File, which can also point at a directory. That is, a directory can be represented by a File, so even though it's called a "file", it can still hold a directory path.
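One way to follow that advice without touching the process-wide working directory is to store a plain path per user and always pass it explicitly to the class methods. A rough sketch (the helper names change_dir and list_dir are made up for illustration):
users = []
users << { :user => "userN", :dir => File.expand_path(".") }

# "cd" for one user: update only the stored path, never the process cwd
def change_dir(user, target)
  candidate = File.expand_path(target, user[:dir])
  user[:dir] = candidate if File.directory?(candidate)
  user[:dir]
end

# "ls" for one user: list the entries of the stored path
def list_dir(user)
  Dir.entries(user[:dir]).sort
end

change_dir(users[0], "../")
list_dir(users[0])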
The answer to my question, as I've found out, is to use the Pathname class.
It allows you to use the += operator to traverse the file system, although you will have to manually implement many checks to make sure that where you are going to traverse to actually exists.
When I implemented my ls command I simply mapped the output of Pathname#entries and sorted the results.
def ls(pathname)
  pathname.entries.map { |pathobject| pathobject.to_s }.sort
end
This gives you a sorted array of strings for all the entries in the directory the Pathname is set to.
For cd you need to make sure the target directory exists and, if not, revert to the previous known-good directory. Note that reassigning the pathname parameter inside the method does not affect the caller's variable, so the method returns the resulting Pathname and the caller reassigns it:
def cd(pathname, directory_to_move_to)
  directory_backup = pathname
  pathname += directory_to_move_to
  pathname = directory_backup if !(pathname.directory?)
  pathname
end
Example usage:
my_pathname = Pathname.pwd
my_pathname = cd(my_pathname, "../")
ls(my_pathname)

Remove certain characters from several files

I want to remove the following characters from several files in a folder. What I have so far is this:
str.delete! '!##$%^&*()'
which I think will work to remove the characters. What do I need to do to make it run through all the files in the folder?
You clarified your question, stating you want to remove certain characters from the contents of files in a directory. I created a straightforward way to traverse a directory (and, optionally, its subdirectories) and remove specified characters from the file contents. I used String#delete like you started with. If you want to remove more advanced patterns, you might want to change it to String#gsub with regular expressions (see the sketch after the example below).
The example below will traverse a tmp directory (and all subdirectories) relative to the current working directory and remove all occurrences of !, $, and # inside the files found. You can of course also pass an absolute path, e.g., C:/some/dir. Notice I do not filter on file type; I assume they are all text files. You can of course add a file-extension check if you wish.
def replace_in_files(dir, chars, subdirs = true)
  Dir[dir + '/*'].each do |file|
    if File.directory?(file)   # Traverse inner directories if subdirs == true
      replace_in_files(file, chars, subdirs) if subdirs
    else                       # Replace file contents
      replaced = File.read(file).delete(chars)
      File.write(file, replaced)
    end
  end
end

replace_in_files('tmp', '!$#')
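If you do need pattern-based removal instead of fixed characters, the same traversal works with String#gsub; a sketch under the same assumptions (the helper name gsub_in_files is made up):
def gsub_in_files(dir, pattern, replacement = '')
  Dir[dir + '/*'].each do |file|
    if File.directory?(file)
      gsub_in_files(file, pattern, replacement)
    else
      File.write(file, File.read(file).gsub(pattern, replacement))
    end
  end
end

# Removes any of !, $ and # from the files under tmp, like the example above
gsub_in_files('tmp', /[!$#]/)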
I think this might work, although I'm a little shaky on the Dir class in Ruby.
Dir.foreach('/path/to/dir') do |file|
  file.delete '!##$%^&*()'
end
There's a more general version of your question here: Iterate through every file in one directory
Hopefully a more thorough answer will be forthcoming, but maybe this'll get you where you need to go.
Dir.foreach('filepath') do |f|
  next if Dir.exist?("filepath/#{f}")
  file = File.new("filepath/#{f}", 'r+')
  text = file.read.delete("'!##$%^&*()")
  file.rewind
  file.write(text)
  file.truncate(file.pos)   # drop leftover bytes if the new text is shorter
  file.close
end
The reason you can't do
file.write(file.read.delete("'!##$%^&*()"))
is that file.read leaves the "cursor" at the end of the text. Instead of writing over the file, you would be appending to the file, which isn't what you want.
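A tiny illustration of that cursor behaviour (the scratch file name is made up; IO#pos reports where the next read or write happens):
File.write("example.txt", "abc")   # create a scratch file
f = File.new("example.txt", "r+")
f.read        # => "abc"  -- cursor is now at the end of the file
f.pos         # => 3
f.write("X")  # appends: the file now contains "abcX"
f.rewind
f.pos         # => 0      -- a write here would overwrite from the start
f.close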
You could also add a method to the File class that reads the contents and then moves the cursor back to the beginning of the file.
class File
  def newRead
    data = self.read
    self.rewind
    data
  end
end
Dir.foreach('filepath') do |f|
  next if Dir.exist?("filepath/#{f}")
  file = File.new("filepath/#{f}", 'r+')
  file.write(file.newRead.delete("'!##$%^&*()"))
  file.truncate(file.pos)   # again, drop leftover bytes if the text shrank
  file.close
end

Ruby FTP Separating files from Folders

I'm trying to crawl an FTP server and pull down all the files recursively.
Up until now I was trying to pull down a directory with
ftp.list.each do |entry|
  if entry.split(/\s+/)[0][0, 1] == "d"
    out[:dirs] << entry.split.last unless black_dirs.include? entry.split.last
  else
    out[:files] << entry.split.last unless black_files.include? entry.split.last
  end
end
But it turns out that if you split each listing line on whitespace and take the last element, file and directory names containing spaces are parsed incorrectly.
Need a little help on the logic here.
You can avoid recursion if you list all files at once:
files = ftp.nlst('**/*.*')
Directories are not included in the list, but the full FTP path is still available in each name.
EDIT
I'm assuming that each file name contains a dot and directory names don't. Thanks for mentioning it, @Niklas B.
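Because the returned names are full paths, you can still recover the directory structure from them if needed (a small sketch under the same dot-based assumption):
files = ftp.nlst('**/*.*')
dirs  = files.map { |path| File.dirname(path) }.uniq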
There are a huge variety of FTP servers around.
We have clients who use some obscure proprietary, Windows-based servers and the file listing returned by them look completely different from Linux versions.
So what I ended up doing is: for each file/directory entry I try changing directory into it, and if this doesn't work, I consider it a file :)
The following method is "bullet proof":
# Checks if the given file_name is actually a file.
def is_ftp_file?(ftp, file_name)
  ftp.chdir(file_name)
  ftp.chdir('..')
  false
rescue
  true
end

file_names = ftp.nlst.select { |fname| is_ftp_file?(ftp, fname) }
Works like a charm, but please note: if the FTP directory has tons of files in it - this method takes a while to traverse all of them.
You can also use a regular expression. I put one together; please verify whether it works for you as well, since I don't know whether your dir listing looks different. You have to use Ruby 1.9 (for the named capture groups), btw.
reg = /^(?<type>.{1})(?<mode>\S+)\s+(?<number>\d+)\s+(?<owner>\S+)\s+(?<group>\S+)\s+(?<size>\d+)\s+(?<mod_time>.{12})\s+(?<path>.+)$/
match = entry.match(reg)
You are able to access the elements by name then
match[:type] contains a 'd' if it's a directory, a space if it's a file.
All the other elements are there as well. Most importantly match[:path].
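Putting the named captures to use might look roughly like this (a sketch; it assumes ftp is an open Net::FTP connection and that the server's listing matches the regex above):
reg = /^(?<type>.{1})(?<mode>\S+)\s+(?<number>\d+)\s+(?<owner>\S+)\s+(?<group>\S+)\s+(?<size>\d+)\s+(?<mod_time>.{12})\s+(?<path>.+)$/
dirs, files = [], []
ftp.list.each do |entry|
  match = entry.match(reg)
  next unless match
  if match[:type] == 'd'
    dirs << match[:path]
  else
    files << match[:path]
  end
end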
Assuming that the FTP server returns Unix-like file listings, the following code works. At least for me.
regex = /^d[r|w|x|-]+\s+[0-9]\s+\S+\s+\S+\s+\d+\s+\w+\s+\d+\s+[\d|:]+\s(.+)/
ftp.ls.each do |line|
  if dir = line.match(regex)
    puts dir[1]
  end
end
dir[1] contains the name of the directory (given that the inspected line actually represents a directory).
As @Alex pointed out, using patterns in filenames for this is hardly reliable. Directories CAN have dots in their names (.ssh for example), and listings can look very different on different servers.
His method works but, as he himself points out, it takes too long.
I prefer using the .size method from Net::FTP.
It returns the size of a file, or throws an error if the file is a directory.
def item_is_file?(item)
  ftp = Net::FTP.new(host, username, password)
  begin
    if ftp.size(item).is_a? Numeric
      true
    end
  rescue Net::FTPPermError
    return false
  end
end
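Usage could then look like this (a sketch; host, username and password are assumed to be defined elsewhere, as in the method above):
ftp = Net::FTP.new(host, username, password)
files = ftp.nlst.select { |name| item_is_file?(name) }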
I'll add my solution to the mix...
Using ftp.nlst('**/*.*') did not work for me... server doesn't seem to support that ** syntax.
The chdir trick with a rescue seems expensive and hackish.
Assuming that all file names contain at least one character, then a single period, and then an extension, I did a simple recursion.
def list_all_files(ftp, folder)
  entries = ftp.nlst(folder)
  file_regex = /.+\.{1}.*/
  files = entries.select { |e| e.match(file_regex) }
  subfolders = entries.reject { |e| e.match(file_regex) }
  subfolders.each do |subfolder|
    files += list_all_files(ftp, subfolder)
  end
  files
end
nlst seems to return the full path to whatever it finds, non-recursively... so each time you get a listing, separate the files from the folders, and then process any folder you find recursively. Collect all the file results.
To call it, you can pass a starting folder:
files = list_all_files(ftp, "my_starting_folder/my_sub_folder")
files = list_all_files(ftp, ".")
files = list_all_files(ftp, "")
files = list_all_files(ftp, nil)

Recursively find folder names only (not files)

Is it possible to display the folder names only, recursively? I know how to display the files from a specific folder using the following command:
Dir.glob("/home/test/**/*.pdf")
or
Dir['/home/test/**/*.*']
But I need to display folder names only.
You put a slash at the end of the glob pattern, like this:
Dir["**/"].each {|x| puts x}
