Avoiding making multiple calls to Find.find("./") in Ruby - ruby

I am not sure what is the best strategy for this. I have a class, where I can search the filesystem for a certain pattern of files. I want to execute Find.find("./") only once. how would I approach this:
def files_pattern(pattern)
Find.find("./") do |f|
if f.include? pattern
#fs << f
end
end
end

Remembering the (usually computationally intensive) result of a method call so that you don't need to recalculate it next time the method is called is known as memoization so you will probably want to read more about that.
One way of achieving that it Ruby is to use a little wrapper class that stores the result in an instance variable. e.g.
class Finder
def initialize(pattern)
#pattern = pattern
end
def matches
#matches ||= find_matches
end
private
def find_matches
fs = []
Find.find("./") do |f|
if f.include? #pattern
fs << f
end
end
fs
end
end
And then you can do:
irb(main):089:0> f = Finder.new 'xml'
=> #<Finder:0x2cfc568 #pattern="xml">
irb(main):090:0> f.matches
find_matches
=> ["./example.xml"]
irb(main):091:0> f.matches # won't result in call to find_matches
=> ["./example.xml"]
Note: the ||= operator performs an assignment only if the variable on the left hand side does evaluates to false. i.e. #matches ||= find_matches is shorthand for #matches = #matches || find_matches where find_matches will only be called the first time due to short circuit evaluation. There are lots of other questions explaining it on Stackoverflow.
Slight variation: You could change your method to return a list of all files and then use methods from Enumerable such as grep and select to perform multiple searches against the same list of files. Of course, this has the downside of keeping the entire list of files in memory. Here is an example though:
def find_all
fs = []
Find.find("./") do |f|
fs << f
end
fs
end
And then use it like:
files = find_all
files.grep /\.xml/
files.select { |f| f.include? '.cpp' }
# etc

If I understand your question correctly you want to run Find.find to assign the result to an instance variable. You can move what is now the block to a separate method and call that to return only files matching your pattern.
Only problem is that if the directory contains many files, you are holding a big array in memory.

how about system "find / -name #{my_pattern}"

Related

Ruby Enumerable#find returning mapped value

Does Ruby's Enumerable offer a better way to do the following?
output = things
.find { |thing| thing.expensive_transform.meets_condition? }
.expensive_transform
Enumerable#find is great for finding an element in an enumerable, but returns the original element, not the return value of the block, so any work done is lost.
Of course there are ugly ways of accomplishing this...
Side effects
def constly_find(things)
output = nil
things.each do |thing|
expensive_thing = thing.expensive_transform
if expensive_thing.meets_condition?
output = expensive_thing
break
end
end
output
end
Returning from a block
This is the alternative I'm trying to refactor
def costly_find(things)
things.each do |thing|
expensive_thing = thing.expensive_transform
return expensive_thing if expensive_thing.meets_condition?
end
nil
end
each.lazy.map.find
def costly_find(things)
things
.each
.lazy
.map(&:expensive_transform)
.find(&:meets_condition?)
end
Is there something better?
Of course there are ugly ways of accomplishing this...
If you had a cheap operation, you'd just use:
collection.map(&:operation).find(&:condition?)
To make Ruby call operation only "on a as-needed basis" (as the documentation says), you can simply prepend lazy:
collection.lazy.map(&:operation).find(&:condition?)
I don't think this is ugly at all—quite the contrary— it looks elegant to me.
Applied to your code:
def costly_find(things)
things.lazy.map(&:expensive_transform).find(&:meets_condition?)
end
I would be inclined to create an enumerator that generates values thing.expensive_transform and then make that the receiver for find with meets_condition? in find's block. For one, I like the way that reads.
Code
def costly_find(things)
Enumerator.new { |y| things.each { |thing| y << thing.expensive_transform } }.
find(&:meets_condition?)
end
Example
class Thing
attr_reader :value
def initialize(value)
#value = value
end
def expensive_transform
self.class.new(value*2)
end
def meets_condition?
value == 12
end
end
things = [1,3,6,4].map { |n| Thing.new(n) }
#=> [#<Thing:0x00000001e90b78 #value=1>, #<Thing:0x00000001e90b28 #value=3>,
# #<Thing:0x00000001e90ad8 #value=6>, #<Thing:0x00000001e90ab0 #value=4>]
costly_find(things)
#=> #<Thing:0x00000001e8a3b8 #value=12>
In the example I have assumed that expensive_things and things are instances of the same class, but if that is not the case the code would need to be modified in the obvious way.
I don't think there is a "obvious best general solution" for your problem, which is also simple to use. You have two procedures involved (expensive_transform and meets_condition?), and you also would need - if this were a library method to use - as a third parameter the value to return, if no transformed element meets the condition. You return nil in this case, but in a general solution, expensive_transform might also yield nil, and only the caller knows what unique value would indicate that the condition as not been met.
Hence, a possible solution within Enumerable would have the signature
class Enumerable
def find_transformed(default_return_value, transform_proc, condition_proc)
...
end
end
or something similar, so this is not particularily elegant either.
You could do it with a single block, if you agree to merge the semantics of the two procedures into one: You have only one procedure, which calculates the transformed value and tests it. If the test succeeds, it returns the transformed value, and if it fails, it returns the default value:
class Enumerable
def find_by(default_value, &block)
result = default_value
each do |element|
result = block.call(element)
break if result != default_value
end
end
result
end
You would use it in your case like this:
my_collection.find_by(nil) do |el|
transformed_value = expensive_transform(el)
meets_condition?(transformed_value) ? transformed_value : nil
end
I'm not sure whether this is really intuitive to use...

Puts arrays in file using ruby

This is a part of my file:
project(':facebook-android-sdk-3-6-0').projectDir = new File('facebook-android-sdk-3-6-0/facebook-android-sdk-3.6.0/facebook')
project(':Forecast-master').projectDir = new File('forecast-master/Forecast-master/Forecast')
project(':headerListView').projectDir = new File('headerlistview/headerListView')
project(':library-sliding-menu').projectDir = new File('library-sliding-menu/library-sliding-menu')
I need to extract the names of the libs. This is my ruby function:
def GetArray
out_file = File.new("./out.txt", "w")
File.foreach("./file.txt") do |line|
l=line.scan(/project\(\'\:(.*)\'\).projectDir/)
File.open(out_file, "w") do |f|
l.each do |ch|
f.write("#{ch}\n")
end
end
puts "#{l} "
end
end
My function returns this:
[]
[["CoverFlowLibrary"]]
[["Android-RSS-Reader-Library-master"]]
[["library"]]
[["facebook-android-sdk-3-6-0"]]
[["Forecast-master"]]
My problem is that I find nothing in out_file. How can I write to a file? Otherwise, I only need to get the name of the libs in the file.
Meditate on this:
"project(':facebook-android-sdk-3-6-0').projectDir'".scan(/project\(\'\:(.*)\'\).projectDir/)
# => [["facebook-android-sdk-3-6-0"]]
When scan sees the capturing (...), it will create a sub-array. That's not what you want. The knee-jerk reaction is to flatten the resulting array of arrays but that's really just a band-aid on the code because you chose the wrong method.
Instead consider this:
"project(':facebook-android-sdk-3-6-0').projectDir'"[/':([^']+)'/, 1]
# => "facebook-android-sdk-3-6-0"
This is using String's [] method to apply a regular expression with a capture and return that captured text. No sub-arrays are created.
scan is powerful and definitely has its place, but not for this sort of "find one thing" parsing.
Regarding your code, I'd do something like this untested code:
def get_array
File.new('./out.txt', 'w') do |out_file|
File.foreach('./file.txt') do |line|
l = line[/':([^']+)'/, 1]
out_file.puts l
puts l
end
end
end
Methods in Ruby are NOT camelCase, they're snake_case. Constants, like classes, start with a capital letter and are CamelCase. Don't go all Java on us, especially if you want to write code for a living. So GetArray should be get_array. Also, don't start methods with "get_", and don't call it array; Use to_a to be idiomatic.
When building a regular expression start simple and do your best to keep it simple. It's a maintainability thing and helps to reduce insanity. /':([^']+)'/ is a lot easier to read and understand, and accomplishes the same as your much-too-complex pattern. Regular expression engines are greedy and lazy and want to do as little work as possible, which is sometimes totally evil, but once you understand what they're doing it's possible to write very small/succinct patterns to accomplish big things.
Breaking it down, it basically says "find the first ': then start capturing text until the next ', which is what you're looking for. project( can be ignored as can ).projectDir.
And actually,
/':([^']+)'/
could really be written
/:([^']+)'/
but I felt generous and looked for the leading ' too.
The problem is that you're opening the file twice: once in:
out_file = File.new("./out.txt", "w")
and then once for each line:
File.open(out_file, "w") do |f| ...
Try this instead:
def GetArray
File.open("./out.txt", "w") do |f|
File.foreach("./file.txt") do |line|
l=line.scan(/project\(\'\:(.*)\'\).projectDir/)
l.each do |ch|
f.write("#{ch}\n")
end # l.each
end # File.foreach
end # File.open
end # def GetArray

Functionally find mapping of first value that passes a test

In Ruby, I have an array of simple values (possible encodings):
encodings = %w[ utf-8 iso-8859-1 macroman ]
I want to keep reading a file from disk until the results are valid. I could do this:
good = encodings.find{ |enc| IO.read(file, "r:#{enc}").valid_encoding? }
contents = IO.read(file, "r:#{good}")
...but of course this is dumb, since it reads the file twice for the good encoding. I could program it in gross procedural style like so:
contents = nil
encodings.each do |enc|
if (s=IO.read(file, "r:#{enc}")).valid_encoding?
contents = s
break
end
end
But I want a functional solution. I could do it functionally like so:
contents = encodings.map{|e| IO.read(f, "r:#{e}")}.find{|s| s.valid_encoding? }
…but of course that keeps reading files for every encoding, even if the first was valid.
Is there a simple pattern that is functional, but does not keep reading the file after a the first success is found?
If you sprinkle a lazy in there, map will only consume those elements of the array that are used by find - i.e. once find stops, map stops as well. So this will do what you want:
possible_reads = encodings.lazy.map {|e| IO.read(f, "r:#{e}")}
contents = possible_reads.find {|s| s.valid_encoding? }
Hopping on sepp2k's answer: If you can't use 2.0, lazy enums can be easily implemented in 1.9:
class Enumerator
def lazy_find
self.class.new do |yielder|
self.each do |element|
if yield(element)
yielder.yield(element)
break
end
end
end
end
end
a = (1..100).to_enum
p a.lazy_find { |i| i.even? }.first
# => 2
You want to use the break statement:
contents = encodings.each do |e|
s = IO.read( f, "r:#{e}" )
s.valid_encoding? and break s
end
The best I can come up with is with our good friend inject:
contents = encodings.inject(nil) do |s,enc|
s || (c=File.open(f,"r:#{enc}").valid_encoding? && c
end
This is still sub-optimal because it continues to loop through encodings after finding a match, though it doesn't do anything with them, so it's a minor ugliness. Most of the ugliness comes from...well, the code itself. :/

A twist on directory walking in Ruby

I'd like to do the following:
Given a directory tree:
Root
|_dirA
|_dirB
|_file1
|_file2
|_dirC
|_dirD
|_dirE
|_file3
|_file4
|_dirF
|_dirG
|_file5
|_file6
|_file7
... I'd like to walk the directory tree and build an array that contains the path to the first file in each directory that has at least one file. The overall structure may be quite large with many more files than directories, so I'd like to capture just the path to the first file without iterating through all the files in a given directory. One file is enough. For the above tree, the result should look like an array that contains only:
root/dirB/file1
root/dirC/dirD/dirE/file3
root/dirF/dirG/file5
I've played with the Dir and Find options in ruby, but my approach feels too brute-force-ish.
Is there an efficient way to code this functionality? It feels like I am missing some ruby trick here.
Many thanks!
Here's my approach:
root="/home/subtest/tsttree/"
Dir.chdir(root)
dir_list=Dir.glob("**/*/") #this invokes recursion
result=Array.new
dir_list.each do |d|
Dir.chdir(root + d)
Dir.open(Dir.pwd).each do |filename|
next if File.directory? filename #some directories may contain only other directories so exclude them
result.push(d + filename)
break
end
end
puts result
Works, but seems messy.
require 'pathname'
# My answer to stackoverflow question posted here:
# http://stackoverflow.com/questions/12684736/a-twist-on-directory-walking-in-ruby
class ShallowFinder
def initialize(root)
#matches = {}
#root = Pathname(root)
end
def matches
while match = next_file
#matches[match.parent.to_s] = match
end
#matches.values
end
private
def next_file
#root.find do |entry|
Find.prune if previously_matched?(entry)
return entry if entry.file?
end
nil
end
def previously_matched?(entry)
return unless entry.directory?
#matches.key?(entry.to_s)
end
end
puts ShallowFinder.new('Root').matches
Outputs:
Root/B/file1
Root/C/D/E/file3
Root/F/G/file5

Visitor Pattern in Ruby, or just use a Block?

Hey there, I have read the few posts here on when/how to use the visitor pattern, and some articles/chapters on it, and it makes sense if you are traversing an AST and it is highly structured, and you want to encapsulate the logic into a separate "visitor" object, etc. But with Ruby, it seems like overkill because you could just use blocks to do nearly the same thing.
I would like to pretty_print xml using Nokogiri. The author recommended that I use the visitor pattern, which would require I create a FormatVisitor or something similar, so I could just say "node.accept(FormatVisitor.new)".
The issue is, what if I want to start customizing all the stuff in the FormatVisitor (say it allows you to specify how nodes are tabbed, how attributes are sorted, how attributes are spaced, etc.).
One time I want the nodes to have 1 tab for each nest level, and the attributes to be in any order
The next time, I want the nodes to have 2 spaces, and the attributes in alphabetical order
The next time, I want them with 3 spaces and with two attributes per line.
I have a few options:
Create an options hash in the constructor (FormatVisitor.new({:tabs => 2})
Set values after I have constructed the Visitor
Subclass the FormatVisitor for each new implementation
Or just use blocks, not the visitor
Instead of having to construct a FormatVisitor, set values, and pass it to the node.accept method, why not just do this:
node.pretty_print do |format|
format.tabs = 2
format.sort_attributes_by {...}
end
That's in contrast to what I feel like the visitor pattern would look like:
visitor = Class.new(FormatVisitor) do
attr_accessor :format
def pretty_print(node)
# do something with the text
#format.tabs = 2 # two tabs per nest level
#format.sort_attributes_by {...}
end
end.new
doc.children.each do |child|
child.accept(visitor)
end
Maybe I've got the visitor pattern all wrong, but from what I've read about it in ruby, it seems like overkill. What do you think? Either way is fine with me, just wondering what how you guys feel about it.
Thanks a lot,
Lance
In essence, a Ruby block is the Visitor pattern without the extra boilerplate. For trivial cases, a block is sufficient.
For example, if you want to perform a simple operation on an Array object, you would just call the #each method with a block instead of implementing a separate Visitor class.
However, there are advantages in implementing a concrete Visitor pattern under certain cases:
For multiple, similar but complex operations, Visitor pattern provides inheritance and blocks don't.
Cleaner to write a separate test suite for Visitor class.
It's always easier to merge smaller, dumb classes into a larger smart class than separating a complex smart class into smaller dumb classes.
Your implementation seems mildly complex, and Nokogiri expects a Visitor instance that impelment #visit method, so Visitor pattern would actually be a good fit in your particular use case. Here is a class based implementation of the visitor pattern:
FormatVisitor implements the #visit method and uses Formatter subclasses to format each node depending on node types and other conditions.
# FormatVisitor implments the #visit method and uses formatter to format
# each node recursively.
class FormatVistor
attr_reader :io
# Set some initial conditions here.
# Notice that you can specify a class to format attributes here.
def initialize(io, tab: " ", depth: 0, attributes_formatter_class: AttributesFormatter)
#io = io
#tab = tab
#depth = depth
#attributes_formatter_class = attributes_formatter_class
end
# Visitor interface. This is called by Nokogiri node when Node#accept
# is invoked.
def visit(node)
NodeFormatter.format(node, #attributes_formatter_class, self)
end
# helper method to return a string with tabs calculated according to depth
def tabs
#tab * #depth
end
# creates and returns another visitor when going deeper in the AST
def descend
self.class.new(#io, {
tab: #tab,
depth: #depth + 1,
attributes_formatter_class: #attributes_formatter_class
})
end
end
Here the implementation of AttributesFormatter used above.
# This is a very simple attribute formatter that writes all attributes
# in one line in alphabetical order. It's easy to create another formatter
# with the same #initialize and #format interface, and you can then
# change the logic however you want.
class AttributesFormatter
attr_reader :attributes, :io
def initialize(attributes, io)
#attributes, #io = attributes, io
end
def format
return if attributes.empty?
sorted_attribute_keys.each do |key|
io << ' ' << key << '="' << attributes[key] << '"'
end
end
private
def sorted_attribute_keys
attributes.keys.sort
end
end
NodeFormatters uses Factory pattern to instantiate the right formatter for a particular node. In this case I differentiated text node, leaf element node, element node with text, and regular element nodes. Each type has a different formatting requirement. Also note, that this is not complete, e.g. comment nodes are not taken into account.
class NodeFormatter
# convience method to create a formatter using #formatter_for
# factory method, and calls #format to do the formatting.
def self.format(node, attributes_formatter_class, visitor)
formatter_for(node, attributes_formatter_class, visitor).format
end
# This is the factory that creates different formatters
# and use it to format the node
def self.formatter_for(node, attributes_formatter_class, visitor)
formatter_class_for(node).new(node, attributes_formatter_class, visitor)
end
def self.formatter_class_for(node)
case
when text?(node)
Text
when leaf_element?(node)
LeafElement
when element_with_text?(node)
ElementWithText
else
Element
end
end
# Is the node a text node? In Nokogiri a text node contains plain text
def self.text?(node)
node.class == Nokogiri::XML::Text
end
# Is this node an Element node? In Nokogiri an element node is a node
# with a tag, e.g. <img src="foo.png" /> It can also contain a number
# of child nodes
def self.element?(node)
node.class == Nokogiri::XML::Element
end
# Is this node a leaf element node? e.g. <img src="foo.png" />
# Leaf element nodes should be formatted in one line.
def self.leaf_element?(node)
element?(node) && node.children.size == 0
end
# Is this node an element node with a single child as a text node.
# e.g. <p>foobar</p>. We will format this in one line.
def self.element_with_text?(node)
element?(node) && node.children.size == 1 && text?(node.children.first)
end
attr_reader :node, :attributes_formatter_class, :visitor
def initialize(node, attributes_formatter_class, visitor)
#node = node
#visitor = visitor
#attributes_formatter_class = attributes_formatter_class
end
protected
def attribute_formatter
#attribute_formatter ||= #attributes_formatter_class.new(node.attributes, io)
end
def tabs
visitor.tabs
end
def io
visitor.io
end
def leaf?
node.children.empty?
end
def write_tabs
io << tabs
end
def write_children
v = visitor.descend
node.children.each { |child| child.accept(v) }
end
def write_attributes
attribute_formatter.format
end
def write_open_tag
io << '<' << node.name
write_attributes
if leaf?
io << '/>'
else
io << '>'
end
end
def write_close_tag
return if leaf?
io << '</' << node.name << '>'
end
def write_eol
io << "\n"
end
class Element < self
def format
write_tabs
write_open_tag
write_eol
write_children
write_tabs
write_close_tag
write_eol
end
end
class LeafElement < self
def format
write_tabs
write_open_tag
write_eol
end
end
class ElementWithText < self
def format
write_tabs
write_open_tag
io << text
write_close_tag
write_eol
end
private
def text
node.children.first.text
end
end
class Text < self
def format
write_tabs
io << node.text
write_eol
end
end
end
To use this class:
xml = "<root><aliens><alien><name foo=\"bar\">Alf<asdf/></name></alien></aliens></root>"
doc = Nokogiri::XML(xml)
# the FormatVisitor accepts an IO object and writes to it
# as it visits each node, in this case, I pick STDOUT.
# You can also use File IO, Network IO, StringIO, etc.
# As long as it support the #puts method, it will work.
# I'm using the defaults here. ( two spaces, with starting depth at 0 )
visitor = FormatVisitor.new(STDOUT)
# this will allow doc ( the root node ) to call visitor.visit with
# itself. This triggers the visiting of each children recursively
# and contents written to the IO object. ( In this case, it will
# print to STDOUT.
doc.accept(visitor)
# Prints:
# <root>
# <aliens>
# <alien>
# <name foo="bar">
# Alf
# <asdf/>
# </name>
# </alien>
# </aliens>
# </root>
With the above code, you can change node formatting behaviors by constructing extra subclasses of NodeFromatters and plug them into the factory method. You can control the formatting of attributes with various implementation of the AttributesFromatter. As long as you adhere to its interface, you can plug it into the attributes_formatter_class argument without modifying anything else.
List of design patterns used:
Visitor Pattern: handle node traversal logic. ( Also interface requirement by Nokogiri. )
Factory Pattern, used to determine formatter based on node types and other formatting conditions. Note, if you don't like the class methods on NodeFormatter, you can extract them into NodeFormatterFactory to be more proper.
Dependency Injection (DI / IoC), used to control the formatting of attributes.
This demonstrates how you can combine a few patterns together to achieve the flexibility you desire. Although, if you need those flexibility is something you have to decide.
I would go with what is simple and works. I don't know the details, but what you wrote compared with the Visitor pattern, looks simpler. If it also works for you, I would use that. Personally, I am tired with all these techniques that ask you to create a huge "network" of interelated classes, just to solve one small problem.
Some would say, yeah, but if you do it using patterns then you can cover many future needs and blah blah. I say, do now what works and if the need arises, you can refactor in the future. In my projects, that need almost never arises, but that's a different story.

Resources