ruby create array from query result - ruby

What is the optimal way to create an array from query result cursor?
results = #db.query("SELECT * FROM mytable")
I would like to turn results.each into an array and store it back in results.

You haven't given much context (which db gem you're using etc).
However, I can think of a couple of ways off the top of my head:
reuslts.to_a # might not work (depends on how you're accessing the db)
a = []
results.each { |r| a << r } # will definitely work assuming `each` is defined
Bear in mind that the #query method likely returns a generator for a good reason. Turning the results into an array would be a pointless memory hog (and time-consuming). Using #each is definitely preferable as it will be more efficient.

Related

Fetch from hash with either Singular or Plural

I get the following input hash in my ruby code
my_hash = { include: 'a,b,c' }
(or)
my_hash = { includes: 'a,b,c' }
Now I want the fastest way to get 'a,b,c'
I currently use
def my_includes
my_hash[:include] || my_hash[:includes]
end
But this is very slow because it always checks for :include keyword first then if it fails it'll look for :includes. I call this function several times and the value inside this hash can keep changing. Is there any way I can optimise and speed up this? I won't get any other keywords. I just need support for :include and :includes.
Caveats and Considerations
First, some caveats:
You tagged this Rails 3, so you're probably on a very old Ruby that doesn't support a number of optimizations, newer Hash-related method calls like #fetch_values or #transform_keys!, or pattern matching for structured data.
You can do all sorts of things with your Hash lookups, but none of them are likely to be faster than a Boolean short-circuit when assuming you can be sure of having only one key or the other at all times.
You haven't shown any of the calling code, so without benchmarks it's tough to see how this operation can be considered "slow" in any general sense.
If you're using Rails and not looking for a pure Ruby solution, you might want to consider ActiveModel::Dirty to only take action when an attribute has changed.
Use Memoization
Regardless of the foregoing, what you're probably missing here is some form of memoization so you don't need to constantly re-evaluate the keys and extract the values each time through whatever loop feels slow to you. For example, you could store the results of your Hash evaluation until it needs to be refreshed:
attr_accessor :includes
def extract_includes(hash)
#includes = hash[:include] || hash[:includes]
end
You can then call #includes or #includes= (or use the #includes instance variable directly if you like) from anywhere in scope as often as you like without having to re-evaluate the hashes or keys. For example:
def count_includes
#includes.split(?,).count
end
500.times { count_includes }
The tricky part is basically knowing if and when to update your memoized value. Basically, you should only call #extract_includes when you fetch a new Hash from somewhere like ActiveRecord or a remote API. Until that happens, you can reuse the stored value for as long as it remains valid.
You could work with a modified hash that has both keys :include and :includes with the same values:
my_hash = { include: 'a,b,c' }
my_hash.update(my_hash.key?(:include) ? { includes: my_hash[:include] } :
{ include: my_hash[:includes] })
#=> {:include=>"a,b,c", :includes=>"a,b,c"}
This may be fastest if you were using the same hash my_hash for multiple operations. If, however, a new hash is generated after just a few interrogations, you might see if both the keys :include and :includes can be included when the hash is constructed.

Is there a similar solution as Array#wrap for hashes?

I use Array.wrap(x) all the time in order to ensure that Array methods actually exist on an object before calling them.
What is the best way to similarly ensure a Hash?
Example:
def ensure_hash(x)
# TODO: this is what I'm looking for
end
values = [nil,1,[],{},'',:a,1.0]
values.all?{|x| ensure_hash(x).respond_to?(:keys) } # true
The best I've been able to come up with so far is:
Hash::try_convert(x) || {}
However, I would prefer something more elegant.
tl; dr: In an app with proper error handling, there is no "easy, care-free" way to handle something that may or may not be hashy.
From a conceptual standpoint, the answer is no. There is no similar solution as Array.wrap(x) for hashes.
An array is a collection of values. Single values can be stored outside of arrays (e.g. x = 42) , so it's a straight-forward task to wrap a value in an array (a = [42]).
A hash is a collection of key-value pairs. In ruby, single key-value pairs can't exist outside of a hash. The only way to express a key-value pair is with a hash: h = { v: 42 }
Of course, there are a thousand ways to express a key-value pair as a single value. You could use an array [k, v] or a delimited string `"k:v" or some more obscure method.
But at that point, you're no longer wrapping, you're parsing. Parsing relies on properly formatted data and has multiple points of failure. No matter how you look at it, if you find yourself in a situation where you may or may not have a hash, that means you need to write a proper chunk of code for data validation and parsing (or refactor your upstream code so that you can always expect a hash).

Ruby way of looping through nested objects

How can I rewrite the following code to be more Ruby-wayish? I'm thinking about inject but can't figure out how to do it.
def nested_page_path(page)
path = "/#{page.slug}"
while page.parent_id do
path.prepend "/#{page.parent.slug}"
page = page.parent
end
path
end
Input is an AR object, that has 0-5 consecutive parents. And output is something like '/pages/services/law'.
If you know for sure that there are no cycles in your parenting, you can do that recursively, i. e. with a function that calls itself. 5-level nesting should do just fine, trouble could arise with thousands.
def nested_page_path(page)
return "" if page.nil? # Or whatever that is root
"#{nested_page_path(page.parent)}/#{page.slug}"
end
But bear in mind, that the approach above, as well as yours, will fetch each object in a separate query. It's fine when you already have them fetched, but if not, you're in a bit of N+1 query trouble.
An easy workaround is caching. You can rebuild the nested path of this object and its descendants on before_save: that is some significant overhead on each write. There is a much better way.
By using nested sets you can get the object's hierarchy branch in just one query. Like this:
page.self_and_ancestors.pluck(:slug).join('/')
# ^
# Nested sets' goodness
What that query does is essentially "fetch me pages ordered by left bound, ranges of which enclose my own". I'm using awesome_nested_set in my examples.
SELECT "pages"."slug" FROM "pages"
WHERE ("pages"."lft" <= 42) AND ("pages"."rgt" >= 88)
ORDER BY "pages"."lft"
Without knowing your object structure it's difficult. But something recursive like this should do:
def nested_page_path(page)
path = "/#{page.slug}"
return path unless page.parent_id
path.prepend "#{nested_page_path(page.parent)}/"
end
Not sure inject is the simple answer since it operates on an Enumerable and you don’t have an obvious enumerable to start with.
I’d suggest something like this (not unlike your solution)
def nested_page_path(page)
pages = [page]
pages << pages.last.parent while pages.last.parent
'/' + pages.reverse.map(&:slug).join('/')
end
There’s scope for reducing repetition there, but that’s more or less what I’d go with.

Ruby equivalent of C#'s 'yield' keyword, or, creating sequences without preallocating memory

In C#, you could do something like this:
public IEnumerable<T> GetItems<T>()
{
for (int i=0; i<10000000; i++) {
yield return i;
}
}
This returns an enumerable sequence of 10 million integers without ever allocating a collection in memory of that length.
Is there a way of doing an equivalent thing in Ruby? The specific example I am trying to deal with is the flattening of a rectangular array into a sequence of values to be enumerated. The return value does not have to be an Array or Set, but rather some kind of sequence that can only be iterated/enumerated in order, not by index. Consequently, the entire sequence need not be allocated in memory concurrently. In .NET, this is IEnumerable and IEnumerable<T>.
Any clarification on the terminology used here in the Ruby world would be helpful, as I am more familiar with .NET terminology.
EDIT
Perhaps my original question wasn't really clear enough -- I think the fact that yield has very different meanings in C# and Ruby is the cause of confusion here.
I don't want a solution that requires my method to use a block. I want a solution that has an actual return value. A return value allows convenient processing of the sequence (filtering, projection, concatenation, zipping, etc).
Here's a simple example of how I might use get_items:
things = obj.get_items.select { |i| !i.thing.nil? }.map { |i| i.thing }
In C#, any method returning IEnumerable that uses a yield return causes the compiler to generate a finite state machine behind the scenes that caters for this behaviour. I suspect something similar could be achieved using Ruby's continuations, but I haven't seen an example and am not quite clear myself on how this would be done.
It does indeed seem possible that I might use Enumerable to achieve this. A simple solution would be to us an Array (which includes module Enumerable), but I do not want to create an intermediate collection with N items in memory when it's possible to just provide them lazily and avoid any memory spike at all.
If this still doesn't make sense, then consider the above code example. get_items returns an enumeration, upon which select is called. What is passed to select is an instance that knows how to provide the next item in the sequence whenever it is needed. Importantly, the whole collection of items hasn't been calculated yet. Only when select needs an item will it ask for it, and the latent code in get_items will kick into action and provide it. This laziness carries along the chain, such that select only draws the next item from the sequence when map asks for it. As such, a long chain of operations can be performed on one data item at a time. In fact, code structured in this way can even process an infinite sequence of values without any kinds of memory errors.
So, this kind of laziness is easily coded in C#, and I don't know how to do it in Ruby.
I hope that's clearer (I'll try to avoid writing questions at 3AM in future.)
It's supported by Enumerator since Ruby 1.9 (and back-ported to 1.8.7). See Generator: Ruby.
Cliche example:
fib = Enumerator.new do |y|
y.yield i = 0
y.yield j = 1
while true
k = i + j
y.yield k
i = j
j = k
end
end
100.times { puts fib.next() }
Your specific example is equivalent to 10000000.times, but let's assume for a moment that the times method didn't exist and you wanted to implement it yourself, it'd look like this:
class Integer
def my_times
return enum_for(:my_times) unless block_given?
i=0
while i<self
yield i
i += 1
end
end
end
10000.my_times # Returns an Enumerable which will let
# you iterate of the numbers from 0 to 10000 (exclusive)
Edit: To clarify my answer a bit:
In the above example my_times can be (and is) used without a block and it will return an Enumerable object, which will let you iterate over the numbers from 0 to n. So it is exactly equivalent to your example in C#.
This works using the enum_for method. The enum_for method takes as its argument the name of a method, which will yield some items. It then returns an instance of class Enumerator (which includes the module Enumerable), which when iterated over will execute the given method and give you the items which were yielded by the method. Note that if you only iterate over the first x items of the enumerable, the method will only execute until x items have been yielded (i.e. only as much as necessary of the method will be executed) and if you iterate over the enumerable twice, the method will be executed twice.
In 1.8.7+ it has become to define methods, which yield items, so that when called without a block, they will return an Enumerator which will let the user iterate over those items lazily. This is done by adding the line return enum_for(:name_of_this_method) unless block_given? to the beginning of the method like I did in my example.
Without having much ruby experience, what C# does in yield return is usually known as lazy evaluation or lazy execution: providing answers only as they are needed. It's not about allocating memory, it's about deferring computation until actually needed, expressed in a way similar to simple linear execution (rather than the underlying iterator-with-state-saving).
A quick google turned up a ruby library in beta. See if it's what you want.
C# ripped the 'yield' keyword right out of Ruby- see Implementing Iterators here for more.
As for your actual problem, you have presumably an array of arrays and you want to create a one-way iteration over the complete length of the list? Perhaps worth looking at array.flatten as a starting point - if the performance is alright then you probably don't need to go too much further.

Ruby: Why is Array.sort slow for large objects?

A colleague needed to sort an array of ActiveRecord objects in a Rails app. He tried the obvious Array.sort! but it seemed surprisingly slow, taking 32s for an array of 3700 objects. So just in case it was these big fat objects slowing things down, he reimplemented the sort by sorting an array of small objects, then reordering the original array of ActiveRecord objects to match - as shown in the code below. Tada! The sort now takes 700ms.
That really surprised me. Does Ruby's sort method end up copying objects about the place rather than just references? He's using Ruby 1.8.6/7.
def self.sort_events(events)
event_sorters = Array.new(events.length) {|i| EventSorter.new(i, events[i])}
event_sorters.sort!
event_sorters.collect {|es| events[es.index]}
end
private
# Class used by sort_events
class EventSorter
attr_reader :sqn
attr_reader :time
attr_reader :index
def initialize(index, event)
#index = index
#sqn = event.sqn
#time = event.time
end
def <=>(b)
#time != b.time ? #time <=> b.time : #sqn <=> b.sqn
end
end
sort definitely does not copy the objects. One difference that I can imagine between the code using EventSorter and the code without it (which you didn't supply, so I have to guess) is that EventSorter calls event.sqn and event.time exactly once and stores the result in variables. During the sorting only the variables need to be accessed. The original version presumably called sqn and time each time the sort-block was invoked.
If this is the case, it can be fixed by using sort_by instead of sort. sort_by only calls the block once per object and then uses the cached results of the block for further comparisons.
Just as an explanation of what is likely happening and how to deal with it...
Sorting tends to look at an element multiple times so an expensive lookup into the object or structure will become very costly very quickly.
A Schwartzian Transform is commonly used when sorting arrays of complex objects or structures. The basic idea is to pre-compute a simple value that accurately reflects the big structure or object, then sort the values, then use the resulting sorted array to refer back to the thing they came from.
http://en.wikipedia.org/wiki/Schwartzian_transform
Nothing answers questions like this better than the actual language source code. Array#sort! uses sort_internal() which is defined in array.c:
sort_internal()
(Yes, I know that's the sources for 1.8.4 but I can't find 1.8.6 ones online right and am pretty sure this hasn't changed.)

Resources