Determine name of last subfolder in a path (Ruby) - ruby

New to Ruby. I'm trying to figure out how to grab the name of a folder. I have this:
path = Dir["#{some_base_path}/*/*"]
Which gives me something like this:
path: ["/tmp/animals/cats/Fluffy"]
What I want is to know the name of the last subfolder - in this case Fluffy.
I've tried variations of Pathname and File.basename, but I always run into no implicit conversion of Array into String (TypeError) errors.
What would be the best way to do this?

You already have your path. There is this neat thing in programming languages called tokenization:
You can split a string via a single character or more.
Starting with your array
paths = ["/tmp/animals/cats/Fluffy"]
=> ["/tmp/animals/cats/Fluffy"]
You could take the first element (which is your path string)
path = paths.first
=> "/tmp/animals/cats/Fluffy"
and tokenize it with Ruby
tokens = path.split("/")
=> ["", "tmp", "animals", "cats", "Fluffy"]
and then return the last element of the array of "tokens".
tokens.last
=> "Fluffy"
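The TypeError in the question comes from passing the whole Array returned by Dir[] to File.basename, which expects a single String. A minimal sketch, assuming every entry is a path string (the literal array stands in for the Dir[] result), calls File.basename per element instead of splitting by hand:

```ruby
# Dir[...] returns an Array of path strings; a literal stands in for it here.
paths = ["/tmp/animals/cats/Fluffy"]

# File.basename takes one String, so apply it to each element.
names = paths.map { |path| File.basename(path) }

# For a single path, pick one element first.
last_name = File.basename(paths.first)
```

Here names holds one folder name per matched path, which is useful when the glob matches more than one directory.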

# Get array of subdirectories arrays
directories =
Dir[pattern].
filter_map { |filename| filename.split("/") if File.directory?(filename) }
# Get maximum subdirectories depth
max_depth = directories.max_by(&:size).size
# Get all subdirectories (tree leaves) with maximum depth
directories.filter_map { |dirs| dirs.last if dirs.size == max_depth }

Related

How to verify that the last character in a string is a number

I need to check if the last character in a string is a digit, and if so, increment it.
I have a directory structure of /u01/app/oracle/... and that's where it goes off the rails. Sometimes it ends with the version number, sometimes it ends with dbhome_1 (or 2, or 3), and sometimes, I have to assume, it will take some other form. If it ends with dbhome_X, I need to parse that and bump that final digit, if it is a digit.
I use split to split the directory structure on '/', and use include? to check if the final element is something like "dbhome". As long as my directory structure ends with dbhome_X it seems to work. As I was testing, though, I tried a path that ended with dbhome, and found that my check for the last character being a digit didn't work.
db_home = '/u01/app/oracle/product/11.2.0/dbhome'
if db_home.split('/')[-1].include?('dbhome')
homedir=db_home.split('/')[-1]
if homedir[-1].to_i.is_a? Numeric
homedir=homedir[0...-1]+(homedir[-1].to_i+1).to_s
new_path="/"+db_home.split('/')[1...-1].join("/")+"/"+homedir.to_s
end
else
new_path=db_home+"/dbhome_1"
end
puts new_path
I did not expect the output to be /u01/app/oracle/product/11.2.0/dbhom1 - it seems to have fallen into the if block that added 1 to the final character.
If I set the initial path to /u01/app/.../dbhome_1, I get the expected /u01/app/.../dbhome_2 as the output.
You could use a regular expression to make matching a tad bit easier
if !!(db_home[/.*dbhome.*\z/]) ..
You could use a regex:
/[0-9]$/.match("How3").nil?
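For context on why the original check always passes: String#to_i returns 0 for a string with no leading digits, and 0 is still a Numeric. A small sketch of a stricter test follows; ends_with_digit? is a hypothetical helper name, not something from the question:

```ruby
# "e".to_i is 0, and 0.is_a?(Numeric) is true, so the original test never fails.
always_true = "e".to_i.is_a?(Numeric)

# Hypothetical helper: true only when the final character is a decimal digit.
def ends_with_digit?(str)
  str.match?(/\d\z/)
end
```

With this, ends_with_digit?("dbhome_1") is true and ends_with_digit?("dbhome") is false, so a path ending in plain dbhome no longer falls into the increment branch.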
I need to check if the last character in a string is a digit, and if
so, increment it.
This is one option:
s = 'string9'
s[-1].then { |last| last.to_i.to_s == last ? [s[0..-2], last.to_i+1].join : s }
#=> "string10"
'/u01/app/11.2.0/dbhome'.sub(/\d\z/) { |s| s.succ }
#=> "/u01/app/11.2.0/dbhome"
'/u01/app/11.2.0/dbhome9'.sub(/\d\z/) { |s| s.succ }
#=> "/u01/app/11.2.0/dbhome10"
This is a starting point if you're running Ruby v2.6+:
fname = 'filename1'
fname[/\d+$/].then { |digits|
fname[/\d+$/] = digits.to_i.next.to_s if digits
}
fname # => "filename2"
And it's safe if the filename doesn't end with a digit:
fname = 'filename'
fname[/\d+$/].then { |digits|
fname[/\d+$/] = digits.to_i.next.to_s if digits
}
fname # => "filename"
I'm not sure if I like doing it that way better than the more traditional way which works with much older Rubies:
digits = fname[/\d+$/]
fname[/\d+$/] = digits.to_i.next.to_s if digits
except for the fact that digits gets stuck into the variable space after only being used once. There's probably worse things that happen in my code though.
This is taking advantage of String's [] and []= methods.
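As a sketch of how those two methods fit together, the snippet below wraps the idea in a method that works on a copy, so the caller's string is left alone. The name bump_trailing_number is an assumption made for illustration, and \z is used here instead of $ to anchor at the very end of the string:

```ruby
# Hypothetical helper: increment a trailing run of digits, if there is one.
def bump_trailing_number(name)
  result = name.dup                 # work on a copy; String#[]= mutates in place
  digits = result[/\d+\z/]          # String#[] with a Regexp returns the match or nil
  # String#[]= with a Regexp raises IndexError when nothing matches,
  # so only assign when a match was found.
  result[/\d+\z/] = digits.to_i.next.to_s if digits
  result
end
```

For example, bump_trailing_number('dbhome_9') gives "dbhome_10", while a name without trailing digits comes back unchanged.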

Use ruby to remove a part of a string on each entry in an array where it exists

I have a list of file paths, for example
[
'Useful',
'../Some.Root.Directory/Path/Interesting',
'../Some.Root.Directory/Path/Also/Interesting'
]
(I mention that they're file paths in case there is something that makes this task easier because they're files but they can be considered simply a set of strings some of which may start with a particular string)
and I need to make this into a set of pairs so that I have the original list but also
[
'Useful',
'Interesting',
'Also/Interesting'
]
I expected I'd be able to do this
'../Some.Root.Directory/Path/Interesting'.gsub!('../Some.Root.Directory/Path/', '')
or
'../Some.Root.Directory/Path/Interesting'.gsub!('\.\.\/Some\.Root\.Directory\/Path\/', '')
but neither of those replaces the provided string/pattern with an empty string...
So in irb
puts '../Some.Root.Directory/Path/Interesting'.gsub('\.\.\/Some\.Root\.Directory\/Path\/', '')
outputs
../Some.Root.Directory/Path/Interesting
and the desired output is
Interesting
How can I do this?
NB the path will be passed in so really I have
file_path.gsub!(removal_path, '')
If you are positive that strings start with removal_path you can do:
string[removal_path.size..-1]
to get the remaining part.
If you want to get pairs of the original paths and the shortened ones, you can use sub in combination with map:
a = [
'../Some.Root.Directory/Path/Interesting',
'../Some.Root.Directory/Path/Also/Interesting'
]
b = a.map do |v|
[v, v.sub('../Some.Root.Directory/Path/', '')]
end
puts b
This will return an Array of arrays - each sub-array contains the original path plus the shortened one. As noted by @sawa - you can simply use sub instead of gsub, since you want to replace only a single occurrence.
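A note on why the escaped version in the question does nothing: when the pattern passed to gsub is a String rather than a Regexp, it is matched literally, so the backslashes are looked for as actual characters. If the goal is only to drop a leading path, String#delete_prefix (Ruby 2.5+) is one alternative; the sketch below assumes removal_path is passed in as a plain string, as in the question:

```ruby
paths = [
  'Useful',
  '../Some.Root.Directory/Path/Interesting',
  '../Some.Root.Directory/Path/Also/Interesting'
]
removal_path = '../Some.Root.Directory/Path/'

# delete_prefix returns a copy without the prefix, or an unchanged copy
# when the string does not start with it.
pairs = paths.map { |path| [path, path.delete_prefix(removal_path)] }
```

Unlike sub, this only ever removes the text at the start of the string, so a path that happens to contain removal_path in the middle is left alone.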

Custom serialize and parse methods in Ruby

I have developed this class Directory that somewhat emulates a directory using hashes. I have difficulty figuring out how to write the serialize and parse methods. The returned string from the serialize method should look something like this:
2:README:19:string:Hello world!spec.rb:20:string:describe RBFS1:rbfs:4:0:0:
Now to explain what exactly this means. This is the master directory, and the 2 up front is the number of files. Then we have the file name README, and after that the length of the file's contents, 19, followed by the contents themselves, represented with a string that I get from the parse method of the other class in the module. After that comes the second file. Notice that the two files are not separated by :, since we don't need a separator when we know the string length. So, laid out a little more clearly:
<file count><file1_data><file2_data>1:rbfs:4:0:0:, here <file1_data>, encompasses the name, length and contents part.
Now the 1:rbfs:4:0:0: means we have one sub-directory with name rbfs, 4 representing the length of its contents as a string and 0:0: representing that it's empty: no files and no sub-directories. Here is another example:
0:1:directory1:40:0:1:directory2:22:1:README:9:number:420: which is equivalent to:
.
`-- directory1
`-- directory2
`-- README
I have no problem with the files part, and I know how to get the number of directories and their names, but for the other part I have no idea what to do. I know that recursion is the best answer, but I have no clue what the base case of that recursion should be or how to implement it. Solving this will also help greatly in figuring out how to do the parse method by reverse engineering it.
The code is below:
module RBFS
class File
... # here I have working `serialize` and `parse` methods for `File`
end
class Directory
attr_accessor :content
def initialize
@content = {}
end
def add_file (name,file)
@content[name]=file
end
def add_directory(name, subdirectory = nil)
if subdirectory
@content[name] = subdirectory
else
@content[name] = RBFS::Directory.new
end
end
def serialize
...?
end
def self.parse (string)
...?
end
end
end
PS: I check the kind of values in the hash with the is_a? method.
Another example for @Jordan:
2:file1:17:string:Test test?file2:10:number:4322:direc1:34:0:1:dir2:22:1:README:9:number:420:direc2::1:README2:9:number:33:0
...should be this structure (if I've formulated it right):
. ->file1,file2
`-- direc1,.....................................direc2 -> README2
`-- dir2(subdirectory of direc1) -> README
direc1 contains only a directory and no files, while direc2 contains only a file.
You can see that the master directory doesn't specify its string length while all others do.
Okay, let's work through this iteratively, starting with your example:
str = "2:README:19:string:Hello world!spec.rb:20:string:describe RBFS1:rbfs:4:0:0:"
entries = {} # No entries yet!
The very first thing we need to know is how many files there are, and we know that's the number before the first ":":
num_entries, rest = str.split(':', 2)
num_entries = Integer(num_entries)
# num_entries is now 2
# rest is now "README:19:string:Hello world!spec.rb:20:string:describe RBFS1:rbfs:4:0:0:"
(The second argument to split says "I only want 2 pieces," so it stops splitting after the first :.) We use Integer(n) instead of n.to_i because it's stricter. (to_i will convert "10xyz" to 10; Integer will raise an error, which is what we want here.)
Now we know we have two files. We don't know anything else yet, but what's left of our string is this:
README:19:string:Hello world!spec.rb:20:string:describe RBFS1:rbfs:4:0:0:
The next thing we can get is the name and length of the first file.
name, len, rest = rest.split(':', 3)
len = Integer(len)
# name = "README"
# len = 19
# rest = "string:Hello world!spec.rb:20:string:describe RBFS1:rbfs:4:0:0:"
Cool, now we have the name and length of the first file, so we can get its content:
content = rest.slice!(0, len)
# content = "string:Hello world!"
# rest = "spec.rb:20:string:describe RBFS1:rbfs:4:0:0:"
entries[name] = content
# entries = { "README" => "string:Hello world!" }
We used rest.slice!, which removes len characters from the front of the string (modifying it in place) and returns them, so content is just what we want (string:Hello world!) and rest is everything that was after it. Then we added it to the entries Hash. One file down, one to go!
For the second file, we do the exact same thing:
name, len, rest = rest.split(':', 3)
len = Integer(len)
# name = "spec.rb"
# len = 20
# rest = "string:describe RBFS1:rbfs:4:0:0:"
content = rest.slice!(0, len)
# content = "string:describe RBFS"
# rest = "1:rbfs:4:0:0:"
entries[name] = content
# entries = { "README" => "string:Hello world!",
# "spec.rb" => "string:describe RBFS" }
Since we do the exact same thing twice, obviously we should do this in a loop! But before we write that, we need to get organized. So far we have two discrete steps: First, get the number of files. Second, get those files' contents. We also know we'll need to get the number of directories and the directories. We'll take a guess at how this'll look:
def parse(serialized)
files, rest = parse_files(serialized)
# `files` will be a Hash of file names and their contents and `rest` will be
# the part of the string we haven't parsed yet
directories, rest = parse_directories(rest)
# `directories` will be a Hash of directory names and their contents
files.merge(directories)
end
def parse_files(serialized)
# Get the number of files from the beginning of the string
num_entries, rest = serialized.split(':', 2)
num_entries = Integer(num_entries)
entries = {}
# `rest` now starts with the first file (e.g. "README:19:...")
num_entries.times do
name, len, rest = rest.split(':', 3) # get the file name and length
len = Integer(len)
content = rest.slice!(0, len) # get the file contents from the beginning of the string
entries[name] = content # add it to the hash
end
[ entries, rest ]
end
def parse_directories(serialized)
# TBD...
end
That parse_files method is a bit long for my taste, though, so how about we split it up?
def parse_files(serialized)
# Get the number of files from the beginning of the string
num_entries, rest = serialized.split(':', 2)
num_entries = Integer(num_entries)
entries = {}
# `rest` now starts with the first file (e.g. "README:19:...")
num_entries.times do
name, content, rest = parse_file(rest)
entries[name] = content # add it to the hash
end
[ entries, rest ]
end
def parse_file(serialized)
name, len, rest = serialized.split(':', 3) # get the name and length of the file
len = Integer(len)
content = rest.slice!(0, len) # use the length to get its contents
[ name, content, rest ]
end
Clean!
Now, I'm going to give you a big spoiler: Since the serialization format is reasonably well-designed, we don't actually need a parse_directories method, because it would do exactly the same thing as parse_files. The only difference is that after this line:
name, content, rest = parse_file(rest)
...we want to do something different if we're parsing directories instead of files. In particular, we want to call parse(content), which will do all of this over again on the directory's contents. Since it's pulling double-duty now, we should probably change its name to something more general like parse_entries, and we also need to give it another argument to tell it when to do that recursion.
Rather than post more code here, I've posted my "finished" product over in this Gist.
Now, I know that doesn't help you with the serialize part, but hopefully it'll help get you started. serialize is the easier part because there are plenty of questions and answers on SO about recursively iterating over a Hash.
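For readers without access to the Gist, the parse_entries idea described above can be sketched roughly as follows. This is not the Gist's code, only one reading of the description: the recurse flag name is an assumption, file contents are kept as raw strings (in the real module they would presumably go through RBFS::File.parse), and directories become nested Hashes rather than RBFS::Directory objects.

```ruby
# Parse one serialized directory: files first, then sub-directories.
def parse(serialized)
  files, rest = parse_entries(serialized, false)
  directories, _rest = parse_entries(rest, true)
  files.merge(directories)
end

# Read "<count>:" followed by that many "<name>:<len>:<content>" entries.
# When recurse is true, each entry's content is itself a serialized directory.
def parse_entries(serialized, recurse)
  count, rest = serialized.split(':', 2)
  entries = {}
  Integer(count).times do
    name, len, rest = rest.split(':', 3)
    content = rest.slice!(0, Integer(len)) # take exactly len characters
    entries[name] = recurse ? parse(content) : content
  end
  [entries, rest]
end
```

The recursion bottoms out on an empty directory, serialized as 0:0:, because both counts are zero and neither loop body runs.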

Using a Hash of function names, search file and return with line numbers

Given a hash that contains function names like "find_by_user", "find_by_id", ...
I want to search in a directory of files, and return an object that has each file name, along with the line numbers of where the function name occurred.
I have this so far:
files = Dir.glob(@folder_path)
files.each do |file_name|
content = File.read(file_name)
end
This will be scanning a few hundred files.
Here's the basic functionality you need:
# Given a path to a file and a regex,
# return an array of paired filename+line number matches
def matching_lines( file_path, regex )
name = File.basename(file_path)
File.readlines(file_path)
.map.with_index(1){ |line,i| [name,line,i] } # number lines from 1, as editors and grep do
.select{ |name,line,i| line =~ regex }
.map{ |name,line,i| [name,i] }
end
You can choose to use this as you like, iterating over multiple files and/or patterns, or using Regexp.union to create a pattern matching any one of a set of strings.
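One way to combine these pieces is sketched below. The method name find_function_calls and the shape of the result (a Hash from file path to an Array of 1-based line numbers) are assumptions for illustration; the question's hash of function names would be passed as its keys or values, whichever holds the names.

```ruby
# Hypothetical helper: map each file path to the line numbers (starting at 1)
# on which any of the given function names appears.
def find_function_calls(paths, function_names)
  pattern = Regexp.union(function_names) # matches any one of the names literally

  paths.each_with_object({}) do |path, hits|
    line_numbers = File.foreach(path)
                       .with_index(1)
                       .select { |line, _number| line.match?(pattern) }
                       .map { |_line, number| number }
    hits[path] = line_numbers unless line_numbers.empty?
  end
end
```

A call such as find_function_calls(Dir.glob("app/**/*.rb"), %w[find_by_user find_by_id]) would then scan every matched file; the glob pattern here is only an example. File.foreach reads one line at a time, which keeps memory use low across a few hundred files.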
However: this is what grep was made for:
C:\>grep --line-number Nokogiri *.rb
push_nav_to_docs.rb:13: nav_dom = Nokogiri.XML(IO.read(NAV))
push_nav_to_docs.rb:39: landing = Nokogiri.XML(html)
push_nav_to_docs.rb:53: doc = Nokogiri.XML(IO.read(doc_path))
push_nav_to_docs.rb:73: if File.exists?(toc_path) && toc = Nokogiri.XML(IO.read(toc_path)).at('ul')
push_nav_to_docs.rb:104: container << Nokogiri.make("<ul/>").tap do |ul|
In Ruby you could call this code and get the output you want via:
lookfor = "Nokogiri"
grepped = `grep --line-number #{lookfor} *.rb`
results = grepped.scan(/^(.+?):(\d+)/)
#=> [["push_nav_to_docs.rb", "13"], ["push_nav_to_docs.rb", "39"], ["push_nav_to_docs.rb", "53"], ["push_nav_to_docs.rb", "73"], ["push_nav_to_docs.rb", "104"]]
Grep can also recurse into directories, match only particular file names, take regular expressions as patterns, and more.

How do I convert a Ruby string with brackets to an array?

I would like to convert the following string into an array/nested array:
str = "[[this, is],[a, nested],[array]]"
newarray = # this is what I need help with!
newarray.inspect # => [['this','is'],['a','nested'],['array']]
You'll get what you want with YAML.
But there is a little problem with your string. YAML expects a space after each comma. So we need this
str = "[[this, is], [a, nested], [array]]"
Code:
require 'yaml'
str = "[[this, is],[a, nested],[array]]"
### transform your string in a valid YAML-String
str.gsub!(/(\,)(\S)/, "\\1 \\2")
YAML::load(str)
# => [["this", "is"], ["a", "nested"], ["array"]]
You could also treat it as almost-JSON. If the strings really are only letters, like in your example, then this will work:
require 'json'
JSON.parse(yourarray.gsub(/([a-z]+)/,'"\1"'))
If they could have arbitrary characters (other than [ ] , ), you'd need a little more:
JSON.parse("[[this, is],[a, nested],[array]]".gsub(/, /,",").gsub(/([^\[\]\,]+)/,'"\1"'))
For a laugh:
ary = eval("[[this, is],[a, nested],[array]]".gsub(/(\w+?)/, "'\\1'") )
=> [["this", "is"], ["a", "nested"], ["array"]]
Disclaimer: You definitely shouldn't do this as eval is a terrible idea, but it is fast and has the useful side effect of throwing an exception if your nested arrays aren't valid
Looks like a basic parsing task. Generally the approach you are going to want to take is to create a recursive function with the following general algorithm
base case (input doesn't begin with '[') return the input
recursive case:
split the input on ',' (you will need to find commas only at this level)
for each sub string call this method again with the sub string
return array containing the results from this recursive method
The only slightly tricky part here is splitting the input on a single ','. You could write a separate function for this that would scan through the string and keep a count of the open brackets minus the closed brackets seen so far. Then only split on commas when the count is equal to zero.
Make a recursive function that takes the string and an integer offset, and "reads" out an array. That is, have it return an array or string (that it has read) and an integer offset pointing after the array. For example:
s = "[[this, is],[a, nested],[array]]"
yourFunc(s, 1) # returns ['this', 'is'] and 11.
yourFunc(s, 2) # returns 'this' and 6.
Then you can call it with another function that provides an offset of 0, and makes sure that the finishing offset is the length of the string.
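The offset-based reader described above might look something like the sketch below. The names read_value and parse_nested are hypothetical, and the code assumes elements are bare words that contain no commas or brackets.

```ruby
# Read one value (a bare word or a bracketed array) starting at offset.
# Returns the value and the offset just past it.
def read_value(str, offset)
  offset += 1 while str[offset] == ' ' # skip leading spaces

  if str[offset] == '['
    items = []
    offset += 1 # step over '['
    until str[offset] == ']'
      raise ArgumentError, 'unterminated array' if offset >= str.length
      item, offset = read_value(str, offset)
      items << item
      offset += 1 while str[offset] == ' '
      offset += 1 if str[offset] == ',' # step over the separator
    end
    [items, offset + 1] # step over ']'
  else
    finish = str.index(/[,\]]/, offset) || str.length
    [str[offset...finish].strip, finish]
  end
end

# Wrapper: start at offset 0 and insist the whole string was consumed.
def parse_nested(str)
  value, offset = read_value(str, 0)
  raise ArgumentError, "unexpected trailing input at #{offset}" unless offset == str.length
  value
end
```

With the string from the question, read_value(s, 1) returns the array of 'this' and 'is' together with offset 11, and read_value(s, 2) returns 'this' together with 6, matching the example calls above.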

Resources