I am working on a compiler written in Ruby and I am currently at the semantic analysis stage (type checking). I have an AST that I need to visit in two ways, pre-order and post-order, and I was wondering what the best way to do this in Ruby is. I know that passing a block to each is essentially the Visitor pattern, but since I need to visit in two ways (pre and post) and Ruby doesn't support method overloading, I am not sure how to approach this.
(Note: I am trying to have the Node objects control how they are visited, so my Visitor isn't bloated)
Here is what I am thinking about trying:
Two accept methods for each Node class, accept_pre and accept_post, that call the corresponding accept_pre and accept_post methods of the other Nodes:
class Node
  def initialize(a, b, c)
    @a, @b, @c = a, b, c
  end
  # pre-order: visit this node first, then its children
  def accept_pre(visitor)
    visitor.visit_node(self)
    @a.accept_pre visitor
    @b.accept_pre visitor
    @c.accept_pre visitor
  end
  # post-order: visit the children first, then this node
  def accept_post(visitor)
    @c.accept_post visitor
    @b.accept_post visitor
    @a.accept_post visitor
    visitor.visit_node(self)
  end
end
Is there a better way to do this? Can it be done with .each, even though I need two orderings?
Any help would be appreciated.
You could fold the two orderings into one accept method by using a traversal option argument. You can certainly use each over the node's members to dispatch to the children's accept methods.
class Node
  def initialize(a, b, c)
    @a, @b, @c = a, b, c
  end
  def accept(visitor, traversal=:pre)
    visitor.visit(self) if traversal == :pre
    order = traversal == :pre ? :each : :reverse_each
    [@a,@b,@c].send(order) { |e| e.accept(visitor, traversal) }
    visitor.visit(self) if traversal == :post
  end
end
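To see the two orderings side by side, here is a minimal runnable sketch of the same idea. Leaf, Branch and RecordingVisitor are hypothetical names invented for this illustration, and the sketch assumes leaves simply visit themselves.

```ruby
# Leaf and Branch are hypothetical stand-ins for real AST node classes.
class Leaf
  attr_reader :name

  def initialize(name)
    @name = name
  end

  # A leaf has no children, so both orderings just visit the leaf itself.
  def accept(visitor, traversal = :pre)
    visitor.visit(self)
  end
end

class Branch
  attr_reader :name

  def initialize(name, *children)
    @name = name
    @children = children
  end

  def accept(visitor, traversal = :pre)
    visitor.visit(self) if traversal == :pre
    order = traversal == :pre ? :each : :reverse_each
    @children.public_send(order) { |child| child.accept(visitor, traversal) }
    visitor.visit(self) if traversal == :post
  end
end

# Records the order in which nodes are visited.
class RecordingVisitor
  attr_reader :names

  def initialize
    @names = []
  end

  def visit(node)
    @names << node.name
  end
end

tree = Branch.new(:root, Leaf.new(:a), Branch.new(:b, Leaf.new(:c)))

pre = RecordingVisitor.new
tree.accept(pre, :pre)
p pre.names   # => [:root, :a, :b, :c]

post = RecordingVisitor.new
tree.accept(post, :post)
p post.names  # => [:c, :b, :a, :root]
```

Because the ordering lives in accept, the visitor itself stays small and only needs a single visit method.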
I am attempting to search a simple tree to find a target value. My instructions are to use recursion. I understand that recursion means (in part) that the method is called from within itself. In that sense the following code succeeds. However, I also understand (I think) that recursion requires setting up an inquiry that cannot be resolved until the base case is reached, at which time the answer is passed back in the chain and each unresolved inquiry can be resolved. My code does not really do that (I think). I need a higher power to show me (and hopefully others) the way.
class Tree
attr_accessor :payload, :children
def initialize(payload, children)
@payload = payload
@children = children
end
def self.test (node, target)
if node.payload == target
print "Bingo!"
else
print node.payload
end
node.children.each do |child|
Tree.test(child, target)
end
end
end
# The "Leafs" of a tree, elements that have no children
deep_fifth_node = Tree.new(5, [])
eleventh_node = Tree.new(11, [])
fourth_node = Tree.new(4, [])
# The "Branches" of the tree
ninth_node = Tree.new(9, [fourth_node])
sixth_node = Tree.new(6, [deep_fifth_node, eleventh_node])
seventh_node = Tree.new(7, [sixth_node])
shallow_fifth_node = Tree.new(5, [ninth_node])
# The "Trunk" of the tree
trunk = Tree.new(2, [seventh_node, shallow_fifth_node])
Tree.test(trunk, 5)
Assuming your function calls itself (and yours appears to), you're using recursion. See https://softwareengineering.stackexchange.com/questions/25052/in-plain-english-what-is-recursion for a good discussion on recursion.
To prevent infinite recursion, you need a base case that guarantees the calls eventually unwind completely rather than continuing forever. Quite often the base case is reached only when the computation is fully complete, but in a search that doesn't mean you can't find what you're looking for before you reach it.
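As a rough sketch of that last point, the search below returns as soon as it finds a match and hands the answer back up the chain of callers. SearchNode and find_first are hypothetical names used only for this illustration; they are not part of the question's code.

```ruby
# SearchNode is a hypothetical stand-in for the Tree class in the question.
SearchNode = Struct.new(:payload, :children)

# Returns the first node whose payload equals target, or nil if none does.
def find_first(node, target)
  return node if node.payload == target   # found: stop before reaching a leaf

  node.children.each do |child|
    found = find_first(child, target)     # ask each subtree in turn
    return found if found                 # pass the answer back up the chain
  end

  nil                                     # base case: nothing matched below here
end

leaf  = SearchNode.new(11, [])
inner = SearchNode.new(6, [SearchNode.new(5, []), leaf])
root  = SearchNode.new(2, [SearchNode.new(7, [inner])])

p find_first(root, 11).equal?(leaf)  # => true
p find_first(root, 42)               # => nil
```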
Yes, your function is using recursion: It calls itself.
Here is a proposal for your method:
def find_nodes_with_payload?(target, result = [])
if payload == target
result << self
end
children.each do |child|
child.find_nodes_with_payload?(target, result)
end
result
end
The method will collect all the nodes in a (sub)tree that contain a specific target value. Does this solve your problem?
I modified your original code in following ways:
Made it an instance method and removed the 'node' argument
Renamed it from 'test' to something more descriptive, IMHO
added an array which collects all the nodes that contain the payload
changed the logic to: do nothing but recursively call the method again for all the children if the condition does not match
You can make it simple.
def search(tree, target)
trav = ->(node) { node.val == target ? puts("Bingo") : node.leaves.each(&trav) }
trav[tree]
end
Tree creation can also be simplified.
Node = Struct.new(:val, :leaves)
add = ->(tree) { tree ? tree.map {|k,v| Node.new(k, add[v])} : [] }
So we can recursively create the tree and find the value.
sample_tree = {2 => {7 => {6 => {4=>{}, 11=>{}}}, 5=> {9 => {}, 4=>{}}}}
tree = add[sample_tree]
search(tree.first, 5)
I have a network of nodes, each node influencing the state of some other nodes (imagine an Excel spreadsheet with cells values depending on other cells through formulas).
I'm wondering what is the cleanest way to implement this in Ruby ?
Of course I could have one process per node, but how will it perform if the number of nodes increases? And I'm sure there are libraries for that, but I can't find an up-to-date one.
Thanks for your help !
Update: Sounds like EventMachine might do the job... but it seems more adapted to a small number of "nodes"
This sounds like a good situation for the observer pattern. This is a sample of that in ruby:
require 'observer'
class Node
  attr_accessor :id
  @@current_node_id = 0
  def initialize
    @@current_node_id += 1
    # assign the instance variable; a bare "id = ..." would only
    # create a local variable and leave the node's id nil
    @id = @@current_node_id
  end
  include Observable
  attr_reader :value
  protected
  def value=(new_value)
    return if @value == new_value
    old_value = @value
    @value = new_value
    changed
    notify_observers(id, old_value, @value)
  end
end
class ValueNode < Node
def initialize(initial_value)
super()
@value = initial_value
end
def value=(new_value)
super(new_value)
end
end
class SumNode < Node
def initialize(*nodes)
super()
@value = nodes.map(&:value).inject(0, &:+)
nodes.each do |node|
node.add_observer(self)
end
end
def update(id, old_value, new_value)
self.value = self.value - old_value + new_value
end
end
def test
v1 = ValueNode.new 4
v2 = ValueNode.new 8
sum = SumNode.new(v1, v2)
sum2 = SumNode.new(v1, sum)
v2.value = 10
p sum.value
p sum2.value
end
test()
Notice how the value of SumNode isn't recalculated every time it is requested - instead it is updated when one of its value nodes is updated. This works recursively, so that inner SumNodes also trigger updates. As the notification includes the unique id of the node, it is possible to write more complex Node types, such as ones that contain formulas.
See http://www.ruby-doc.org/stdlib/libdoc/observer/rdoc/index.html for more details on Observable
This sounds similar to the oft-used Twitter paradigm, where updates by one user are pushed to all its followers. To do this efficiently, you should store two lists for a given person: one with the people he follows and one with the people that follow him. You can do the same for a list of nodes. When a node changes you can quickly look up the nodes that are influenced by this node. When a relationship disappears you will need the 'forward' list to know from which lists to 'remove' the reverse relationship.
You can store these lists in two-dimensional arrays, or in something like Redis. I don't really understand how EventMachine would fit in.
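A minimal in-memory sketch of the two-list idea might look like this. DependencyGraph and its method names are hypothetical, and it uses hashes of sets rather than two-dimensional arrays or Redis.

```ruby
require 'set'

# DependencyGraph is a hypothetical name. It keeps both directions of every
# relationship so that lookups never require a scan over all nodes.
class DependencyGraph
  def initialize
    @influences    = Hash.new { |hash, key| hash[key] = Set.new } # forward list
    @influenced_by = Hash.new { |hash, key| hash[key] = Set.new } # reverse list
  end

  def add(source, target)
    @influences[source] << target
    @influenced_by[target] << source
  end

  def remove(source, target)
    @influences[source].delete(target)
    @influenced_by[target].delete(source)
  end

  # Nodes that must be updated when source changes.
  def affected_by(source)
    @influences[source].to_a
  end

  # Drops a node, using its own lists to find the entries to clean up elsewhere.
  def delete_node(node)
    @influences[node].each    { |target| @influenced_by[target].delete(node) }
    @influenced_by[node].each { |source| @influences[source].delete(node) }
    @influences.delete(node)
    @influenced_by.delete(node)
  end
end

graph = DependencyGraph.new
graph.add(:a1, :c1)
graph.add(:b1, :c1)
graph.add(:c1, :d1)

p graph.affected_by(:a1)  # => [:c1]
graph.delete_node(:c1)
p graph.affected_by(:a1)  # => []
```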
If you have a network graph of dependencies and you want it to scale, a graph database is the best solution. Neo4J is a popular, powerful database for tracking this type of dependency.
There are several ways to interface with Neo4J from Ruby:
You can use JRuby and its java interface.
Use its REST API
Use neo4j.rb or one of the other Ruby interface libraries.
Could someone please explain what exactly recursion is (and how it works in Ruby, if that's not too much to ask for). I came across a lengthy code snippet relying on recursion and it confused me (I lost it now, and it's not entirely relevant).
A recursive function/method calls itself. For a recursive algorithm to terminate you need a base case (i.e. a condition under which the function does not call itself) and you also need to make sure that you get closer to that base case in each recursive call. Let's look at a very simple example:
def countdown(n)
return if n.zero? # base case
puts n
countdown(n-1) # getting closer to base case
end
countdown(5)
5
4
3
2
1
Some problems can be expressed very elegantly with recursion, e.g. many mathematical functions are defined recursively.
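The factorial is one such function: 0! = 1 and n! = n * (n - 1)!. A small sketch of that definition in Ruby (the argument check is an addition for safety, not part of the mathematical definition):

```ruby
# Direct translation of the recursive definition of n!.
def factorial(n)
  raise ArgumentError, "n must be a non-negative integer" unless n.is_a?(Integer) && n >= 0

  return 1 if n.zero?    # base case: 0! = 1
  n * factorial(n - 1)   # recursive case: n! = n * (n - 1)!
end

p factorial(5)  # => 120
```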
To understand recursion, you first need to understand recursion.
Now, on a serious note, a recursive function is one that calls itself. One classic example of this construct is the fibonacci sequence:
def fib(n)
return n if (0..1).include? n
fib(n-1) + fib(n-2) if n > 1
end
Recursive functions give you great power, but that also comes with great responsibility (pun intended) and some risk. For instance, you could end up with a stack overflow (I'm on a roll) if the recursion goes too deep :-)
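When depth is a concern, one common way out is to rewrite the function as a loop. The following is a sketch of an iterative Fibonacci; fib_iterative is a name chosen for this example only.

```ruby
# Iterative Fibonacci: same results as the recursive version for n >= 0,
# but it uses a loop, so the call stack does not grow with n.
def fib_iterative(n)
  raise ArgumentError, "n must be a non-negative integer" unless n.is_a?(Integer) && n >= 0

  a, b = 0, 1
  n.times { a, b = b, a + b }  # slide the pair (F(k), F(k+1)) forward n times
  a
end

p fib_iterative(10)  # => 55
p fib_iterative(30)  # => 832040
```

It also avoids the repeated work of the doubly recursive version, which recomputes the same values many times.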
Ruby on Rails example:
Recursion will generate an array of a document's ancestors (its parent, the parent's parent, and so on).
a/m/document.rb
class Document < ActiveRecord::Base
belongs_to :parent, class_name: 'Document'
  def self.get_ancestors(who, tree = [])
    # tree is an accumulator passed down the calls, so each top-level call
    # starts with a fresh array instead of sharing state on the Document class
    if who.parent.nil?
      return tree
    else
      tree << who.parent
      get_ancestors(who.parent, tree)
    end
  end
  def ancestors
    @ancestors ||= Document.get_ancestors(self)
  end
end
console:
d = Document.last
d.ancestors.collect(&:id)
# => [570, 569, 568]
https://gist.github.com/equivalent/5063770
Typically recursion is about methods calling themselves, but maybe what you encountered were recursive structures, i.e. objects referring to themselves. Ruby 1.9 handles these really well:
h = {foo: 42, bar: 666}
parent = {child: h}
h[:parent] = parent
h.inspect # => {:foo=>42, :bar=>666, :parent=>{:child=>{...}}}
x = []
y = [x]
x << y
x.inspect # => [[[...]]]
x == [x] # => true
I find that last line quite wicked; I blogged about this kind of issue with comparing recursive structures a couple of years ago.
I just wanted to add to the answers in here that every recursion algorithm is composed of two things:
base case
recursive case
The base case is what tells your algorithm to stop, so that it does not continue forever or run out of memory.
The recursive case is the part that calls the method again, but with arguments that have been shaved down or shrunk so that each call moves closer to the base case.
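A small sketch with both parts labeled, this time on a nested structure. count_leaves is a hypothetical helper that counts the non-array elements of arbitrarily nested arrays.

```ruby
# Counts the non-array elements inside arbitrarily nested arrays.
def count_leaves(item)
  return 1 unless item.is_a?(Array)         # base case: a plain value counts as one

  item.sum { |child| count_leaves(child) }  # recursive case: each child is a smaller problem
end

p count_leaves([1, [2, 3], [[4], []]])  # => 4
```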
My friends and I are working on some basic Ruby exercises to get a feel for the language, and we've run into an interesting behavior that we're yet unable to understand. Basically, we're creating a tree data type where there's just one class, node, which contains exactly one value and an array of zero or more nodes. We're using rspec's autospec test runner. At one point we started writing tests to disallow infinite recursion (a circular tree structure).
Here's our test:
it "breaks on a circular reference, which we will fix later" do
tree1 = Node.new 1
tree2 = Node.new 1
tree2.add_child tree1
tree1.add_child tree2
(tree1 == tree2).should be_false
end
Here's the Node class:
class Node
  attr_accessor :value
  attr_reader :nodes
  def initialize initial_value = nil
    @value = initial_value
    @nodes = []
  end
  def add_child child
    @nodes.push child
    @nodes.sort! { |node1, node2| node1.value <=> node2.value }
  end
  def == node
    return (@value == node.value) && (@nodes == node.nodes)
  end
end
We expect the last line of the test to result in an infinite recursion until the stack overflows, because it should continually compare the child nodes with each other and never find a leaf node. (We're under the impression that the == operator on an array will iterate over the array and call == on each child, based on the array page of RubyDoc.) But if we throw a puts into the == method to see how often it's called, we discover that it's called exactly three times and then the test passes.
What are we missing?
Edit: Note that if we replace be_false in the test with be_true then the test fails. So it definitely thinks the arrays are not equal, it's just not recursing over them (aside from the three distinct calls to ==).
If you click on the method name of the RubyDoc you linked to, you will see the source (in C) of the Array#== method:
{
// [...]
if (RARRAY(ary1)->len != RARRAY(ary2)->len) return Qfalse;
if (rb_inspecting_p(ary1)) return Qfalse;
return rb_protect_inspect(recursive_equal, ary1, ary2);
}
This implementation (specifically the "recursive_equal") suggests that Array#== already implements the infinite recursion protection you're after.
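If you want comparable protection in your own comparison code rather than relying on Array#==, one option is to remember which pairs of nodes are already being compared. The sketch below is only an illustration: TreeNode and same_tree? are hypothetical names, and it treats a pair that is already under comparison as equal, which is a design choice and differs from the result your test observed.

```ruby
require 'set'

# TreeNode is a hypothetical, simplified version of the Node class above.
class TreeNode
  attr_reader :value, :nodes

  def initialize(value)
    @value = value
    @nodes = []
  end

  def add_child(child)
    @nodes << child
    self
  end
end

# Structural comparison that terminates on circular trees by tracking
# the pairs of nodes that have already been entered.
def same_tree?(a, b, in_progress = Set.new)
  pair = [a.object_id, b.object_id]
  return true if in_progress.include?(pair)  # cycle: this pair is already being compared

  in_progress << pair
  a.value == b.value &&
    a.nodes.size == b.nodes.size &&
    a.nodes.zip(b.nodes).all? { |x, y| same_tree?(x, y, in_progress) }
end

tree1 = TreeNode.new(1)
tree2 = TreeNode.new(1)
tree2.add_child(tree1)
tree1.add_child(tree2)

p same_tree?(tree1, tree2)            # => true
p same_tree?(tree1, TreeNode.new(2))  # => false
```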
Hey there, I have read the few posts here on when/how to use the visitor pattern, and some articles/chapters on it, and it makes sense if you are traversing an AST and it is highly structured, and you want to encapsulate the logic into a separate "visitor" object, etc. But with Ruby, it seems like overkill because you could just use blocks to do nearly the same thing.
I would like to pretty_print xml using Nokogiri. The author recommended that I use the visitor pattern, which would require I create a FormatVisitor or something similar, so I could just say "node.accept(FormatVisitor.new)".
The issue is, what if I want to start customizing all the stuff in the FormatVisitor (say it allows you to specify how nodes are tabbed, how attributes are sorted, how attributes are spaced, etc.).
One time I want the nodes to have 1 tab for each nest level, and the attributes to be in any order
The next time, I want the nodes to have 2 spaces, and the attributes in alphabetical order
The next time, I want them with 3 spaces and with two attributes per line.
I have a few options:
Create an options hash in the constructor (FormatVisitor.new({:tabs => 2}))
Set values after I have constructed the Visitor
Subclass the FormatVisitor for each new implementation
Or just use blocks, not the visitor
Instead of having to construct a FormatVisitor, set values, and pass it to the node.accept method, why not just do this:
node.pretty_print do |format|
format.tabs = 2
format.sort_attributes_by {...}
end
That's in contrast to what I feel like the visitor pattern would look like:
visitor = Class.new(FormatVisitor) do
attr_accessor :format
def pretty_print(node)
# do something with the text
@format.tabs = 2 # two tabs per nest level
@format.sort_attributes_by {...}
end
end.new
doc.children.each do |child|
child.accept(visitor)
end
Maybe I've got the visitor pattern all wrong, but from what I've read about it in Ruby, it seems like overkill. What do you think? Either way is fine with me, I'm just wondering how you guys feel about it.
Thanks a lot,
Lance
In essence, a Ruby block is the Visitor pattern without the extra boilerplate. For trivial cases, a block is sufficient.
For example, if you want to perform a simple operation on an Array object, you would just call the #each method with a block instead of implementing a separate Visitor class.
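For the block-configured style shown in the question, a method can yield an options object to the block before doing its work. This is a rough sketch under invented names: FormatOptions and render_tag are hypothetical, they are not Nokogiri APIs, and the sketch only formats a single self-closing tag.

```ruby
# FormatOptions and render_tag are hypothetical names for this illustration.
FormatOptions = Struct.new(:indent, :attribute_sorter)

# Renders one self-closing tag. The optional block receives the options
# object and may change it before rendering happens.
def render_tag(name, attributes, depth: 0)
  options = FormatOptions.new("  ", ->(keys) { keys })  # defaults: two spaces, original order
  yield options if block_given?

  keys  = options.attribute_sorter.call(attributes.keys)
  attrs = keys.map { |key| %( #{key}="#{attributes[key]}") }.join
  "#{options.indent * depth}<#{name}#{attrs}/>"
end

line = render_tag("img", { "src" => "a.png", "alt" => "x" }, depth: 1) do |format|
  format.indent = "\t"
  format.attribute_sorter = ->(keys) { keys.sort }
end

p line  # => "\t<img alt=\"x\" src=\"a.png\"/>"
```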
However, there are advantages in implementing a concrete Visitor pattern under certain cases:
For multiple similar but complex operations, the Visitor pattern lets you share code through inheritance, which blocks don't.
It is cleaner to write a separate test suite for a Visitor class.
It's always easier to merge smaller, dumb classes into a larger smart class than separating a complex smart class into smaller dumb classes.
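The inheritance point can be sketched with two small visitors, where the second reuses the traversal of the first and overrides only what differs. All class names here (TextNode, ListNode, PlainRenderer, ShoutingRenderer) are hypothetical and unrelated to Nokogiri.

```ruby
# TextNode and ListNode are hypothetical node classes with double dispatch.
TextNode = Struct.new(:text) do
  def accept(visitor)
    visitor.visit_text(self)
  end
end

ListNode = Struct.new(:items) do
  def accept(visitor)
    visitor.visit_list(self)
  end
end

class PlainRenderer
  def visit_text(node)
    node.text
  end

  def visit_list(node)
    node.items.map { |item| bullet + item.accept(self) }.join("\n")
  end

  private

  def bullet
    "- "
  end
end

# Reuses the list traversal and overrides only the parts that differ.
class ShoutingRenderer < PlainRenderer
  def visit_text(node)
    super.upcase
  end

  private

  def bullet
    "* "
  end
end

list = ListNode.new([TextNode.new("alpha"), TextNode.new("beta")])
puts list.accept(PlainRenderer.new)
puts list.accept(ShoutingRenderer.new)
```

With blocks, sharing visit_list between the two variants would need an extra helper; with classes it comes from the superclass.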
Your implementation seems mildly complex, and Nokogiri expects a Visitor instance that implements a #visit method, so the Visitor pattern would actually be a good fit for your particular use case. Here is a class-based implementation of the visitor pattern:
FormatVisitor implements the #visit method and uses Formatter subclasses to format each node depending on node types and other conditions.
# FormatVisitor implements the #visit method and uses a formatter to format
# each node recursively.
class FormatVisitor
  attr_reader :io
  # Set some initial conditions here.
  # Notice that you can specify a class to format attributes here.
  def initialize(io, tab: "  ", depth: 0, attributes_formatter_class: AttributesFormatter)
    @io = io
    @tab = tab
    @depth = depth
    @attributes_formatter_class = attributes_formatter_class
  end
  # Visitor interface. This is called by a Nokogiri node when Node#accept
  # is invoked.
  def visit(node)
    NodeFormatter.format(node, @attributes_formatter_class, self)
  end
  # helper method to return a string with tabs calculated according to depth
  def tabs
    @tab * @depth
  end
  # creates and returns another visitor when going deeper in the AST
  def descend
    # plain keyword arguments: a braced hash literal here raises
    # ArgumentError on Ruby 3
    self.class.new(@io,
      tab: @tab,
      depth: @depth + 1,
      attributes_formatter_class: @attributes_formatter_class)
  end
end
Here is the implementation of the AttributesFormatter used above.
# This is a very simple attribute formatter that writes all attributes
# in one line in alphabetical order. It's easy to create another formatter
# with the same #initialize and #format interface, and you can then
# change the logic however you want.
class AttributesFormatter
attr_reader :attributes, :io
def initialize(attributes, io)
@attributes, @io = attributes, io
end
def format
return if attributes.empty?
sorted_attribute_keys.each do |key|
io << ' ' << key << '="' << attributes[key] << '"'
end
end
private
def sorted_attribute_keys
attributes.keys.sort
end
end
NodeFormatter uses the Factory pattern to instantiate the right formatter for a particular node. In this case I differentiate between text nodes, leaf element nodes, element nodes with text, and regular element nodes. Each type has different formatting requirements. Also note that this is not complete, e.g. comment nodes are not taken into account.
class NodeFormatter
# convenience method to create a formatter using #formatter_for
# factory method, and calls #format to do the formatting.
def self.format(node, attributes_formatter_class, visitor)
formatter_for(node, attributes_formatter_class, visitor).format
end
# This is the factory that creates different formatters
# and use it to format the node
def self.formatter_for(node, attributes_formatter_class, visitor)
formatter_class_for(node).new(node, attributes_formatter_class, visitor)
end
def self.formatter_class_for(node)
case
when text?(node)
Text
when leaf_element?(node)
LeafElement
when element_with_text?(node)
ElementWithText
else
Element
end
end
# Is the node a text node? In Nokogiri a text node contains plain text
def self.text?(node)
node.class == Nokogiri::XML::Text
end
# Is this node an Element node? In Nokogiri an element node is a node
# with a tag, e.g. <img src="foo.png" /> It can also contain a number
# of child nodes
def self.element?(node)
node.class == Nokogiri::XML::Element
end
# Is this node a leaf element node? e.g. <img src="foo.png" />
# Leaf element nodes should be formatted in one line.
def self.leaf_element?(node)
element?(node) && node.children.size == 0
end
# Is this node an element node with a single child as a text node.
# e.g. <p>foobar</p>. We will format this in one line.
def self.element_with_text?(node)
element?(node) && node.children.size == 1 && text?(node.children.first)
end
attr_reader :node, :attributes_formatter_class, :visitor
def initialize(node, attributes_formatter_class, visitor)
@node = node
@visitor = visitor
@attributes_formatter_class = attributes_formatter_class
end
protected
def attribute_formatter
@attribute_formatter ||= @attributes_formatter_class.new(node.attributes, io)
end
def tabs
visitor.tabs
end
def io
visitor.io
end
def leaf?
node.children.empty?
end
def write_tabs
io << tabs
end
def write_children
v = visitor.descend
node.children.each { |child| child.accept(v) }
end
def write_attributes
attribute_formatter.format
end
def write_open_tag
io << '<' << node.name
write_attributes
if leaf?
io << '/>'
else
io << '>'
end
end
def write_close_tag
return if leaf?
io << '</' << node.name << '>'
end
def write_eol
io << "\n"
end
class Element < self
def format
write_tabs
write_open_tag
write_eol
write_children
write_tabs
write_close_tag
write_eol
end
end
class LeafElement < self
def format
write_tabs
write_open_tag
write_eol
end
end
class ElementWithText < self
def format
write_tabs
write_open_tag
io << text
write_close_tag
write_eol
end
private
def text
node.children.first.text
end
end
class Text < self
def format
write_tabs
io << node.text
write_eol
end
end
end
To use this class:
xml = "<root><aliens><alien><name foo=\"bar\">Alf<asdf/></name></alien></aliens></root>"
doc = Nokogiri::XML(xml)
# the FormatVisitor accepts an IO object and writes to it
# as it visits each node, in this case, I pick STDOUT.
# You can also use File IO, Network IO, StringIO, etc.
# As long as it supports the #<< method, it will work.
# I'm using the defaults here. ( two spaces, with starting depth at 0 )
visitor = FormatVisitor.new(STDOUT)
# this will allow doc ( the root node ) to call visitor.visit with
# itself. This triggers the visiting of each child recursively,
# with the contents written to the IO object. ( In this case, it will
# print to STDOUT. )
doc.accept(visitor)
# Prints:
# <root>
#   <aliens>
#     <alien>
#       <name foo="bar">
#         Alf
#         <asdf/>
#       </name>
#     </alien>
#   </aliens>
# </root>
With the above code, you can change node formatting behavior by writing extra subclasses of NodeFormatter and plugging them into the factory method. You can control the formatting of attributes with different implementations of AttributesFormatter. As long as a class adheres to its interface, you can pass it as the attributes_formatter_class argument without modifying anything else.
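As a sketch of such a replacement, the class below keeps the same #initialize(attributes, io) and #format interface but writes one attribute per line in reverse alphabetical order. OnePerLineAttributesFormatter is a hypothetical name, and the example feeds it a plain Hash and a StringIO instead of Nokogiri attributes so that it runs on its own.

```ruby
require 'stringio'

# Hypothetical alternative to AttributesFormatter with the same interface:
# #initialize(attributes, io) and #format.
class OnePerLineAttributesFormatter
  def initialize(attributes, io)
    @attributes, @io = attributes, io
  end

  # Writes each attribute on its own indented line, in reverse alphabetical order.
  def format
    @attributes.keys.sort.reverse_each do |key|
      @io << "\n    " << key << '="' << @attributes[key].to_s << '"'
    end
  end
end

io = StringIO.new
OnePerLineAttributesFormatter.new({ "alt" => "logo", "src" => "foo.png" }, io).format
p io.string  # => "\n    src=\"foo.png\"\n    alt=\"logo\""
```

Assuming the classes above are loaded, it would be plugged in with FormatVisitor.new(STDOUT, attributes_formatter_class: OnePerLineAttributesFormatter).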
List of design patterns used:
Visitor pattern: handles the node traversal logic. (It is also the interface Nokogiri requires.)
Factory pattern: used to determine the formatter based on node types and other formatting conditions. Note that if you don't like the class methods on NodeFormatter, you can extract them into a NodeFormatterFactory to be more proper.
Dependency Injection (DI / IoC): used to control the formatting of attributes.
This demonstrates how you can combine a few patterns to achieve the flexibility you desire. Whether you actually need that flexibility is something you have to decide.
I would go with what is simple and works. I don't know the details, but what you wrote looks simpler than the Visitor pattern. If it also works for you, I would use that. Personally, I am tired of all these techniques that ask you to create a huge "network" of interrelated classes just to solve one small problem.
Some would say, yeah, but if you do it using patterns then you can cover many future needs and blah blah. I say, do now what works and if the need arises, you can refactor in the future. In my projects, that need almost never arises, but that's a different story.