I have a tree data-structure implemented in ruby. I'm using it to represent a parse-tree.
It works, as you might expect, by having many node objects, each containing useful values as well as an array of references to its child nodes.
I've written a simple method to traverse the tree; it works like this:
def depth_first_traversal(node, &block)
  if node.has_children?
    depth_first_traversal(node.children[0], &block)
    yield node
    depth_first_traversal(node.children[1], &block)
  else
    yield node
  end
end
The issue is that for each tree I only explicitly hold a reference to the root node. Thus far I've just been using my recursive traversal to access all the other nodes.
Now I need to change the values of the nodes in the tree and I'm not sure how to do it.
How could I modify this traversal so that I could modify each element in the tree, instead of just passing a reference to them in to &block?
--- EDIT: ---
Apologies for the lack of detail, I was trying to make my question broad and useful.
The 'value' of a node in the tree consists of several instance variables in each instance of the node object. Let's call them #value and #type. There are getter and setter methods for them.
The tree is a binary tree - but that may change later. I also don't think that's the aspect of the problem I'm struggling with:
My tree explicitly creates the Node #root. All other nodes in the tree are created in a loop. So a typical node is accessible, for example as "the child of the child of the root" and in no other manner.
In other words, searching this structure of pointers is my only means of accessing the nodes.
If ruby passes exclusively by value, any value yielded (like in the above method) will be a copy of this object, not the object itself.
So I'm confused about how I should modify values in any tree, not just this one.
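On the pass-by-value worry: Ruby passes object references by value, so the node a block receives is the very object stored in the tree, not a copy. Calling a setter on it changes the tree; only reassigning the block parameter itself has no effect outside the block. A minimal sketch, assuming a hypothetical Node class whose #value, #type, and children array follow the question's description rather than its real code:

```ruby
# Hypothetical Node: attribute names follow the question's #value and #type.
class Node
  attr_accessor :value, :type, :children

  def initialize(value, type, children = [])
    @value = value
    @type = type
    @children = children
  end

  def has_children?
    !children.empty?
  end
end

# Same in-order traversal as the question's method.
def depth_first_traversal(node, &block)
  if node.has_children?
    depth_first_traversal(node.children[0], &block)
    yield node
    depth_first_traversal(node.children[1], &block)
  else
    yield node
  end
end

root = Node.new("+", :op, [Node.new("1", :str), Node.new("2", :str)])

# The block receives the real nodes, so setters mutate the tree in place.
depth_first_traversal(root) do |node|
  next unless node.type == :str
  node.value = Integer(node.value)
  node.type = :int
end

# Rebinding the block parameter only changes a local variable; the tree is untouched.
depth_first_traversal(root) { |node| node = nil }
```

After this runs, root's children hold the integers 1 and 2 with type :int, and root itself is unchanged.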
If I understand you correctly, you could probably do something like this:
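def df_tree_map(node, &block)
  if node.has_children?
    # store each rebuilt subtree back into the parent's children array
    node.children[0] = df_tree_map(node.children[0], &block)
    node = yield node # the block must return a node
    node.children[1] = df_tree_map(node.children[1], &block)
  else
    node = yield node
  end
  node # return the (possibly replaced) node; call as root = df_tree_map(root) { ... }
end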
Obviously this is going to have consequences for the tree structure, but that might be a benefit. The critical point, though, is that your block will need to return a node rather than any old thing. Returning a string, for example, isn't going to work the way it does with Array#map, because a node inherently has children.
Another solution is to allow the map function to modify the contents of the node but not the structure. I'm taking a little liberty here, as you didn't post the instance variables your nodes have access to, but it should make enough sense:
def df_tree_map(node, &block)
  if node.has_children?
    df_tree_map(node.children[0], &block)
    node.contents = yield node.contents
    df_tree_map(node.children[1], &block)
  else
    node.contents = yield node.contents
  end
end
Here, I'm not passing the node itself to the block, but rather the contents. This way, the tree structure cannot be altered by the map. It seems like it might be more consistent with the Array#map function, but it might not do what you're looking for.
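Since the question notes the tree may not stay binary, the same contents-only map can visit every child instead of children[0] and children[1]. A sketch assuming a hypothetical TreeNode with a contents accessor and a children array; with more than two children there is no single "in-order" slot, so this version visits each node before its children (pre-order), which is a design choice rather than something the answer specifies:

```ruby
# Hypothetical node: only `contents` and `children` are assumed.
class TreeNode
  attr_accessor :contents, :children

  def initialize(contents, children = [])
    @contents = contents
    @children = children
  end
end

# Replaces each node's contents with the block's result (pre-order).
# The tree's shape is never touched; the bang marks in-place mutation.
def df_tree_map!(node, &block)
  node.contents = yield node.contents
  node.children.each { |child| df_tree_map!(child, &block) }
  node
end

root = TreeNode.new(1, [TreeNode.new(2), TreeNode.new(3, [TreeNode.new(4)])])
df_tree_map!(root) { |c| c * 10 }
```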
Related
How can I rewrite the following code to be more Ruby-wayish? I'm thinking about inject but can't figure out how to do it.
def nested_page_path(page)
  path = "/#{page.slug}"
  while page.parent_id do
    path.prepend "/#{page.parent.slug}"
    page = page.parent
  end
  path
end
Input is an AR object, that has 0-5 consecutive parents. And output is something like '/pages/services/law'.
If you know for sure that there are no cycles in your parenting, you can do it recursively, i.e. with a function that calls itself. Five levels of nesting should be just fine; trouble could arise with thousands.
def nested_page_path(page)
  return "" if page.nil? # or whatever marks the root
  "#{nested_page_path(page.parent)}/#{page.slug}"
end
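Because nested_page_path only calls parent and slug, it can be tried outside Rails. A sketch using a hypothetical Page Struct as a stand-in for the ActiveRecord model; the slugs mirror the example output from the question:

```ruby
# Stand-in for the ActiveRecord model: only `slug` and `parent` are used.
Page = Struct.new(:slug, :parent)

def nested_page_path(page)
  return "" if page.nil? # walked past the root
  "#{nested_page_path(page.parent)}/#{page.slug}"
end

pages = Page.new("pages")
services = Page.new("services", pages)
law = Page.new("law", services)

nested_page_path(law) # => "/pages/services/law"
```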
But bear in mind that the approach above, as well as yours, will fetch each parent in a separate query. That's fine when you already have them loaded, but if not, you're in a bit of N+1 query trouble.
An easy workaround is caching. You can rebuild the nested path of this object and its descendants on before_save: that is some significant overhead on each write. There is a much better way.
By using nested sets you can get the object's hierarchy branch in just one query. Like this:
'/' + page.self_and_ancestors.pluck(:slug).join('/')
# self_and_ancestors: nested sets' goodness
What that query does is essentially "fetch me pages ordered by left bound, ranges of which enclose my own". I'm using awesome_nested_set in my examples.
SELECT "pages"."slug" FROM "pages"
WHERE ("pages"."lft" <= 42) AND ("pages"."rgt" >= 88)
ORDER BY "pages"."lft"
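The nested-set condition can be illustrated without a database. A sketch assuming each row is a hypothetical Row Struct with slug, lft, and rgt; self_and_ancestors here is a plain function mimicking the SQL above, not awesome_nested_set's implementation:

```ruby
Row = Struct.new(:slug, :lft, :rgt)

# pages encloses services and about; services encloses law.
ROWS = [
  Row.new("pages", 1, 8),
  Row.new("services", 2, 5),
  Row.new("law", 3, 4),
  Row.new("about", 6, 7)
].freeze

# Same test as the SQL: every row whose [lft, rgt] range encloses the target's,
# ordered by left bound (root first).
def self_and_ancestors(rows, target)
  rows.select { |r| r.lft <= target.lft && r.rgt >= target.rgt }.sort_by(&:lft)
end

law = ROWS.find { |r| r.slug == "law" }
about = ROWS.find { |r| r.slug == "about" }

"/" + self_and_ancestors(ROWS, law).map(&:slug).join("/") # => "/pages/services/law"
```

Note that about is excluded for law because its range (6, 7) does not enclose (3, 4).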
Without knowing your object structure it's difficult. But something recursive like this should do:
def nested_page_path(page)
  path = "/#{page.slug}"
  return path unless page.parent_id
  # the parent's path already starts with '/', so no extra separator is needed
  path.prepend(nested_page_path(page.parent))
end
Not sure inject is the simple answer since it operates on an Enumerable and you don’t have an obvious enumerable to start with.
I’d suggest something like this (not unlike your solution):
def nested_page_path(page)
  pages = [page]
  pages << pages.last.parent while pages.last.parent
  '/' + pages.reverse.map(&:slug).join('/')
end
There’s scope for reducing repetition there, but that’s more or less what I’d go with.
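On the original inject question: once the chain of parents is wrapped in an Enumerator, ordinary Enumerable methods, inject included, apply. A sketch assuming only slug and parent (a stand-in Struct, not the AR model); ancestry is a hypothetical helper name:

```ruby
Page = Struct.new(:slug, :parent)

# Yields the page, then its parent, grandparent, ... up to the root.
def ancestry(page)
  Enumerator.new do |y|
    current = page # fresh local each run, so the enumerator can be reused
    while current
      y << current
      current = current.parent
    end
  end
end

def nested_page_path(page)
  ancestry(page).map(&:slug).reverse.inject("") { |path, slug| "#{path}/#{slug}" }
end

law = Page.new("law", Page.new("services", Page.new("pages")))
nested_page_path(law) # => "/pages/services/law"
```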
Is it safe to delete a Node from a NodeSet during iteration? I'm pulling some links out of a bunch of <a> tags but want to remove the tags from the set altogether if the link is invalid.
def get_links(nodeset)
  links = nodeset.map do |node|
    begin
      URI.join(node.document.url, node.get_attribute('href'))
    rescue URI::InvalidURIError
      nodeset.delete(node) # Is this safe?
      nil
    end
  end
  links.compact
end
In your example code I think you're not separating your actions well. Don't manipulate your nodeset array inside the map; it's not that you can't do it, it's that you shouldn't, for clarity and ease of maintenance. "Map" the URLs separately from removing the bad ones.
At a minimum I'd do something more like:
def get_valid_links(nodeset)
  doc_url = nodeset.first.document.url
  links = nodeset.map do |node|
    begin
      URI.join(doc_url, node['href'])
    rescue URI::InvalidURIError
      nil
    end
  end
  links.compact
end
nodeset = get_valid_links(nodeset)
Doing it that way doesn't alter nodeset unless you explicitly say so, by assigning the compacted/mapped value returned from get_valid_links. That keeps the purpose of the method very clear, and it has no side effects.
I think this is one of those cases where "POLS" ("Principle Of Least Surprise") would kick in. Having the side-effect of munging nodeset inside the method could be very surprising to someone who's trying to maintain the code or use it in a library, and it'd be hard to work around.
From experience, I'd recommend being very careful about appending the contents of href attributes to a base URL and expecting the result to be good or useful. Remember that an href can be a JavaScript link (javascript:...), which will produce an ugly, unusable URL.
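The separation recommended above, plus the warning about javascript: hrefs, can be sketched without Nokogiri by working on plain href strings. This assumes the base URL and hrefs have already been extracted (e.g. with node['href']); valid_links is a hypothetical name, filter_map needs Ruby 2.7+, and keeping only http/https is one possible policy, not the only one:

```ruby
require "uri"

# Returns absolute http(s) URIs; unparseable or non-web hrefs are dropped.
# The input array is never modified.
def valid_links(base_url, hrefs)
  hrefs.filter_map do |href|
    uri = URI.join(base_url, href)
    uri if uri.is_a?(URI::HTTP) # URI::HTTPS is a subclass of URI::HTTP
  rescue URI::InvalidURIError
    nil
  end
end

hrefs = ["b", "http://x.org/", "javascript:void(0)", "http://bad host/"]
links = valid_links("http://example.com/a/", hrefs)
```

Here "javascript:void(0)" parses fine but is filtered out by the scheme check, while "http://bad host/" is rejected by the rescue.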
The difference between Enumerable#each and Enumerable#map is whether it returns the receiver or the mapped result. Getting back to the receiver is trivial, and you usually do not need to continue a method chain after each, as in each{...}.another_method (I have probably never seen such a case, and even if you want to get back to the receiver, you can do that with tap). So I think all or most cases where Enumerable#each is used could be replaced by Enumerable#map. Am I wrong? If I am right, what is the purpose of each? Is map slower than each?
Edit:
I know that there is a common practice to use each when you are not interested in the return value. I am not interested in whether such practice exists, but am interested in whether such practice makes sense other than from the point of view of convention.
The difference between map and each is more important than whether one returns a new array and the other doesn't. The important difference is in how they communicate your intent.
When you use each, your code says "I'm doing something for each element." When you use map, your code says "I'm creating a new array by transforming each element."
So while you could use map in place of each, performance notwithstanding, the code would now be lying about its intent to anyone reading it.
The choice between map and each should be decided by the desired end result: a new array or no new array. The result of map can be huge and/or silly:
p(("aaaa".."zzzz").map { |word| puts word }) # prints every word, then a huge and useless array of nils
I agree with what you said. Enumerable#each simply returns the original object it was called on, while Enumerable#map collects the return value of the block for each element into a new array and returns that array, leaving the original unchanged.
Since each simply returns the original object, it is preferable to map in cases where you only need to iterate or traverse over elements.
In fact, each is the simple, universal replacement for a traditional for loop, and it is much preferred over for loops in Ruby.
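A small check of the return values discussed here: each hands back the very same receiver (identity, not just equality), while map builds a new array from the block's results and leaves the receiver alone.

```ruby
numbers = [1, 2, 3]

each_result = numbers.each { |n| n * 2 } # block results are discarded
map_result = numbers.map { |n| n * 2 }   # block results are collected

# each_result is the receiver itself; map_result is a separate, new array.
```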
You can see the significant difference between map and each when you're composing these enumerators.
For example, you need to get a new array with indexes in it (with_index yields the element first, then its index):
array.each.with_index.map { |element, index| [index, element] }
Or, for example, you just need to apply some method to all elements in an array and print the result without changing the original array:
m = 2.method(:+)
[1,2,3].each { |a| puts m.call(a) } #=> prints 3, 4, 5
There are plenty of other examples where the difference between each and map is key when writing code in a functional style.
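Both each and map return an Enumerator when called without a block, so with_index composes with either; what differs is what the whole chain returns. A sketch with arbitrary example strings:

```ruby
letters = %w[a b c]

# map.with_index collects the block's results into a new array.
labelled = letters.map.with_index { |letter, i| "#{i}:#{letter}" }

# each.with_index is run for its side effect and returns the receiver.
printed = []
returned = letters.each.with_index { |letter, i| printed << "#{i}:#{letter}" }
```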
In C#, you could do something like this:
public IEnumerable<int> GetItems()
{
    for (int i = 0; i < 10000000; i++) {
        yield return i;
    }
}
This returns an enumerable sequence of 10 million integers without ever allocating a collection in memory of that length.
Is there a way of doing an equivalent thing in Ruby? The specific example I am trying to deal with is the flattening of a rectangular array into a sequence of values to be enumerated. The return value does not have to be an Array or Set, but rather some kind of sequence that can only be iterated/enumerated in order, not by index. Consequently, the entire sequence need not be allocated in memory concurrently. In .NET, this is IEnumerable and IEnumerable<T>.
Any clarification on the terminology used here in the Ruby world would be helpful, as I am more familiar with .NET terminology.
EDIT
Perhaps my original question wasn't really clear enough -- I think the fact that yield has very different meanings in C# and Ruby is the cause of confusion here.
I don't want a solution that requires my method to use a block. I want a solution that has an actual return value. A return value allows convenient processing of the sequence (filtering, projection, concatenation, zipping, etc).
Here's a simple example of how I might use get_items:
things = obj.get_items.select { |i| !i.thing.nil? }.map { |i| i.thing }
In C#, any method returning IEnumerable that uses a yield return causes the compiler to generate a finite state machine behind the scenes that caters for this behaviour. I suspect something similar could be achieved using Ruby's continuations, but I haven't seen an example and am not quite clear myself on how this would be done.
It does indeed seem possible that I might use Enumerable to achieve this. A simple solution would be to use an Array (which includes module Enumerable), but I do not want to create an intermediate collection with N items in memory when it's possible to just provide them lazily and avoid any memory spike at all.
If this still doesn't make sense, then consider the above code example. get_items returns an enumeration, upon which select is called. What is passed to select is an instance that knows how to provide the next item in the sequence whenever it is needed. Importantly, the whole collection of items hasn't been calculated yet. Only when select needs an item will it ask for it, and the latent code in get_items will kick into action and provide it. This laziness carries along the chain, such that select only draws the next item from the sequence when map asks for it. As such, a long chain of operations can be performed on one data item at a time. In fact, code structured in this way can even process an infinite sequence of values without running out of memory.
So, this kind of laziness is easily coded in C#, and I don't know how to do it in Ruby.
I hope that's clearer (I'll try to avoid writing questions at 3AM in future.)
This has been supported by Enumerator since Ruby 1.9 (and back-ported to 1.8.7). See Generator: Ruby.
Cliche example:
fib = Enumerator.new do |y|
  y.yield i = 0
  y.yield j = 1
  while true
    k = i + j
    y.yield k
    i = j
    j = k
  end
end
100.times { puts fib.next() }
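One caveat for the chaining the question describes: calling select or map directly on an Enumerator is eager and would never finish on an infinite sequence like fib. Enumerator#lazy (Ruby 2.0+) gives the pull-through behaviour instead. A sketch that rebuilds the same Fibonacci enumerator with loop:

```ruby
fib = Enumerator.new do |y|
  a, b = 0, 1
  loop do
    y << a
    a, b = b, a + b
  end
end

# Each stage pulls one element at a time; nothing beyond what first(5) needs is computed.
first_even = fib.lazy.select(&:even?).first(5)
```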
Your specific example is equivalent to 10000000.times, but let's assume for a moment that the times method didn't exist and you wanted to implement it yourself; it'd look like this:
class Integer
  def my_times
    return enum_for(:my_times) unless block_given?
    i = 0
    while i < self
      yield i
      i += 1
    end
  end
end
10000.my_times # Returns an Enumerator which will let
               # you iterate over the numbers from 0 to 10000 (exclusive)
Edit: To clarify my answer a bit:
In the above example, my_times can be (and is) used without a block, and it will return an Enumerator (which is Enumerable) that lets you iterate over the numbers from 0 to n-1. So it is exactly equivalent to your example in C#.
This works using the enum_for method. The enum_for method takes as its argument the name of a method, which will yield some items. It then returns an instance of class Enumerator (which includes the module Enumerable), which when iterated over will execute the given method and give you the items which were yielded by the method. Note that if you only iterate over the first x items of the enumerable, the method will only execute until x items have been yielded (i.e. only as much as necessary of the method will be executed) and if you iterate over the enumerable twice, the method will be executed twice.
Since 1.8.7 it has become common to define methods which yield items so that, when called without a block, they return an Enumerator which lets the user iterate over those items lazily. This is done by adding the line return enum_for(:name_of_this_method) unless block_given? to the beginning of the method, as I did in my example.
I don't have much Ruby experience, but what C# does with yield return is usually known as lazy evaluation or lazy execution: providing answers only as they are needed. It's not about allocating memory, it's about deferring computation until actually needed, expressed in a way similar to simple linear execution (rather than the underlying iterator-with-state-saving).
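The two claims above (the method body only runs as far as needed, and runs again each time the Enumerator is iterated) can be checked with a counter. A sketch with a hypothetical Source class; get_items is borrowed from the question as a name only, and the infinite loop is an illustrative assumption:

```ruby
class Source
  attr_reader :produced

  def initialize
    @produced = 0
  end

  # Infinite sequence 0, 1, 2, ...; without a block, returns an Enumerator.
  def get_items
    return enum_for(:get_items) unless block_given?
    i = 0
    loop do
      @produced += 1 # count how many items were actually generated
      yield i
      i += 1
    end
  end
end

src = Source.new
items = src.get_items     # nothing has run yet
first_three = items.first(3)
after_first = src.produced
items.first(2)            # iterating again re-runs get_items from the start
after_second = src.produced
```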
A quick google turned up a ruby library in beta. See if it's what you want.
C# ripped the 'yield' keyword right out of Ruby; see Implementing Iterators for more.
As for your actual problem, you presumably have an array of arrays and want to create a one-way iteration over the complete length of the list? Perhaps look at array.flatten as a starting point; if the performance is acceptable, you probably don't need to go much further.
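If flatten's intermediate copy does matter, a method can walk rows and cells itself and use the enum_for idiom, so no flattened array is allocated. A sketch; each_cell is a hypothetical name and the grid is an ordinary array of arrays:

```ruby
# Yields every cell row by row; without a block, returns an Enumerator.
def each_cell(grid)
  return enum_for(:each_cell, grid) unless block_given?
  grid.each do |row|
    row.each { |cell| yield cell }
  end
end

grid = [[1, 2, 3], [4, 5, 6]]

# Filter and project one cell at a time, as in the question's select/map chain.
odd_tens = each_cell(grid).lazy.select(&:odd?).map { |n| n * 10 }.to_a
```

A similar lazy stream can also be had with grid.lazy.flat_map { |row| row }; the explicit method just makes the iteration order visible.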
Here is a clever trick to enable hash autovivification in ruby (taken from facets):
# File lib/core/facets/hash/autonew.rb, line 19
def self.autonew(*args)
  leet = lambda { |hsh, key| hsh[key] = new(&leet) }
  new(*args, &leet)
end
Although it works (of course), I find it really frustrating that I can't figure out how this two-liner does what it does.
leet is passed in as the default. So just accessing h['new_key'] somehow invokes it and creates 'new_key' => {}.
Now, I'd expect h['new_key'] to return the default value object rather than evaluate it. That is, I'd expect 'new_key' => {} not to be created automatically. So how does leet actually get called? Especially with two parameters?
The standard new method for Hash accepts a block. This block is called in the event of trying to access a key in the Hash which does not exist. The block is passed the Hash itself and the key that was requested (the two parameters) and should return the value that should be returned for the requested key.
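The difference between a default block that only computes a value and one that also stores it is what makes autonew work. A small sketch with arbitrary keys:

```ruby
# Block only computes a value: the key is NOT stored.
plain = Hash.new { |hsh, key| "no #{key}" }
plain[:x]

# Block assigns into the hash: the key IS stored on first access.
storing = Hash.new { |hsh, key| hsh[key] = [] }
storing[:x] << 1
```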
You will notice that the leet lambda does 2 things. It returns a new Hash with leet itself as the block for handling defaults. This is the behaviour which allows autonew to work for Hashes of arbitrary depth. It also assigns this new Hash to hsh[key] so that next time you request the same key you will get the existing Hash rather than a new one being created.
It's also worth noting that this code can be made into a one-liner as follows:
def self.autonew(*args)
  new(*args) { |hsh, key| hsh[key] = Hash.new(&hsh.default_proc) }
end
The call to Hash#default_proc returns the proc that was used to create the parent, so we have a nice recursive setup here.
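The same recursive default_proc setup can be written without facets. A sketch; note the side effect that merely reading a missing key creates it:

```ruby
# Every missing key gets a new Hash sharing the same default proc,
# so nesting works to any depth.
tree = Hash.new { |hsh, key| hsh[key] = Hash.new(&hsh.default_proc) }

tree[:a][:b][:c] = 1
tree[:x] # merely reading a missing key creates it
```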
I talk about a similar case to this on my blog.
Alternatively, you might consider my xkeys gem. It's a module that you can use to extend arrays or hashes to facilitate nested access.
If you look for something that doesn't exist yet, you get a nil value (or another value or an exception if you prefer) without creating anything by looking. It can also append to the end of arrays.
You can opt to autovivify either hashes or arrays for integer keys (but just once for the entire structure).