Concurrent, non-blocking, fixed-size list? - ruby

I'm looking to implement a data structure with the following characteristics:
Operations
Push: Add an element to the front of the list.
Read: Read all elements in the list
Behavior
Fixed-size: The list should not grow beyond a specified threshold, and it should automatically truncate from the end (oldest item) if that threshold is exceeded. This does not need to be strictly enforced, but the list should eventually be truncated once it passes the threshold.
Concurrency-safe: The structure should safely accommodate multiple parallel pushers and readers
Non-blocking: This is the real problem. I'd like to use an implementation without locks. Many threads should be able to push/read simultaneously if possible. A less-desirable, but acceptable option would be an implementation that has locks, but minimizes contention between multiple pushers/readers. I'm familiar with reader-writer locks, but those assume infrequent writes, which is not my use-case.
Optional but nice-to-have
Write-read consistency: If a single thread pushes to the structure, a read immediately following should contain the written item. This would be nice, but I'm wondering whether excluding this requirement could make the above requirements easier to implement.
I'm mostly a novice in concurrent data structures. Does an example of such a data structure exist? Ring buffers are interesting, but I don't think they can be non-blocking. Linked-lists are promising, but the concurrency-safe, non-blocking requirements complicate the implementation considerably.
I have found some good papers on implementing non-blocking linked lists using atomic CAS (compare-and-swap) operations, but the fixed-size requirement throws a bit of a wrench into those. Maybe that idea can be adapted to a fixed-size list?
For what it's worth, I'm interested in using this in Ruby. I understand that MRI has the global-interpreter-lock, which makes this a bit useless for MRI, but other Ruby runtimes could take advantage of this, and I'm thinking of it as a learning exercise to grow my concurrent programming skills.

Analysis
This question might be a better fit on Software Engineering, rather than here on Stack Overflow, as it seems to be more of a design question. That said, I suggest using thread-safe arrays, or delegating resource contention to an MVCC database if you can't redesign your application to avoid a singular shared object altogether.
Recommendations
You can implement a thread-safe list or simulate a circular buffer using Concurrent::Array with the #unshift and #pop methods. You can also choose to externalize locking to something like a database, where Ruby's GIL is largely irrelevant to the underlying queue or locking mechanisms. However, to the best of my knowledge, there's no way to create a truly lockless concurrent access object in Ruby, although implementing your own multiversion concurrency control might come close.
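For example, here is a rough sketch of that circular-buffer idea on top of Concurrent::Array (the class name and API shape are mine; the truncation is best-effort, which matches the "eventually truncated" requirement in the question):
require 'concurrent-ruby'
class BoundedLog
  def initialize(max_size)
    @max_size = max_size
    @list = Concurrent::Array.new # thread-safe Array subclass from concurrent-ruby
  end
  # Newest items go to the front; older items eventually fall off the back.
  def push(item)
    @list.unshift(item)
    @list.pop while @list.size > @max_size
  end
  # Returns a snapshot of up to max_size items, newest first.
  def read
    @list.first(@max_size)
  end
end
Each individual Concurrent::Array call is synchronized internally, but push as a whole is not atomic, so readers may briefly see the list one element over the threshold, which the question explicitly allows.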
The low-hanging fruit is probably to externalize your reads and writes to an MVCC-capable database such as PostgreSQL. If you can't or won't do that, you may need to accept the trade-offs inherent in the ACID properties and performance characteristics of your application and data structures. In particular, the use of a single shared data structure is a design decision you should perhaps re-evaluate if you can.
Before you start down that path, just make sure that you have a real performance problem to solve. While there are certainly cases where locks add noticeable overhead, many real-world applications are sufficiently performant even with Ruby's GIL in the mix. Your mileage may certainly vary.

You might consider creating a class such as the following. I don't consider this to be complete. Moreover, I have not considered non-blocking issues, which is a broad topic that is not specific to this class.
class TruncatedList
  attr_reader :max_size

  def initialize(max_size = Float::INFINITY)
    @max_size = max_size
    @list = []
  end

  def >>(obj)
    @list.pop if @list.size == @max_size
    @list.unshift(obj)
  end

  def unshift(*arr)
    arr.each do |obj|
      @list.pop if @list.size == @max_size
      self >> obj
    end
  end

  def <<(obj)
    if @list.size < @max_size
      @list << obj
    else
      @list
    end
  end

  def push(*arr)
    arr.each do |obj|
      break(@list) if @list.size == @max_size
      @list << obj
    end
  end

  def shift(n = 1)
    return @list if @list.empty?
    case n
    when 0
      nil
    when 1
      @list.shift
    else
      @list.shift([n, @list.size].min)
    end
  end

  def pop(n = 1)
    return nil if @list.empty?
    case n
    when 0
      nil
    when 1
      @list.pop
    else
      @list.pop([n, @list.size].min)
    end
  end

  def inspect
    @list.to_s
  end
  alias to_s inspect

  def to_a
    @list
  end

  def size
    @list.size
  end
end
Here is an example of how this class might be used.
t = TruncatedList.new(6)
#=> #<TruncatedList:0x00007fe2db0512a0 @max_size=6, @list=[]>
t.inspect
#=> "[]"
t.to_a
#=> []
t >> 1
#=> [1]
t.inspect
#=> "[1]"
t.unshift(2,3)
#=> [2, 3]
t.inspect
#=> "[3, 2, 1]"
t.unshift(4,5,6,7,8)
#=> [4, 5, 6, 7, 8]
t.inspect
#=> "[8, 7, 6, 5, 4, 3]"
t.to_a
#=> [8, 7, 6, 5, 4, 3]

I think I came up with an interesting solution that meets the requirements.
Theory
We use a linked-list as the foundation, but add thread-safety and truncation on top.
Pushes
During pushes, we accomplish thread-safety by using a compare-and-set operation. The push will succeed only if another thread has not already pushed to the last-known list head. If the push fails, we simply retry until it succeeds.
Truncation
When the first node is pushed, we designate it as the "prune node". As items get pushed to the list, that node is pushed further down, but we maintain a reference to it. When the list reaches capacity, we break the link on the "prune node" to allow the following nodes to be garbage collected. Then we set the newest node as the new "prune node". This way, the list size never exceeds capacity * 2. For example, with a capacity of 3: node 1 is the first prune node; after the third push its (still empty) tail is cut and node 3 becomes the prune node; after three more pushes node 3's tail is cut, unlinking nodes 1 and 2 for garbage collection, and node 6 takes over, leaving nodes 3 through 6 in the chain.
Reads
Because it's a linked list without arbitrary insertions, the nodes are never rearranged, so reads are mostly consistent. We dereference the head when we start reading the list. We never read more elements than the configured capacity. If the list is truncated during a read, it's possible that we might not read enough nodes (this could be mitigated by saving the prune node when starting enumeration so that pruned nodes could still be read while the enumerator is active).
Thoughts
I'm pretty happy with the truncation mechanism, but it seems likely that a Mutex-based solution would perform just as well as or even better than the CAS solution (a Mutex-based sketch is included after the implementation below for comparison). It likely depends on how heavily contended the push operation is, and would need to be benchmarked.
require 'concurrent-ruby'

class SizedList
  attr_reader :capacity

  class Node
    attr_reader :value
    attr_reader :nxt

    def initialize(value, nxt = nil)
      @value = value
      @nxt = Concurrent::AtomicReference.new(nxt)
      @count = Concurrent::AtomicFixnum.new(0)
    end

    def increment
      @count.increment
    end
  end

  def initialize(capacity)
    @capacity = capacity
    @head = Node.new(nil)
    @prune_node = Concurrent::AtomicReference.new
  end

  def push(element)
    succeeded = false
    node = nil
    # Maybe should just use a mutex for this write instead of CAS.
    # It needs to be benchmarked.
    until succeeded
      first = @head.nxt.get
      node = Node.new(element, first)
      succeeded = @head.nxt.compare_and_set(first, node)
    end
    # Every Nth node, where N = @capacity, is designated as the "prune node".
    # Once we push N times, we drop all the nodes after the prune node by setting
    # its nxt value to nil.
    # Then we set the newest node as the new prune node.
    @prune_node.compare_and_set(nil, node) if @prune_node.get.nil?
    prune_node = @prune_node.get
    count = prune_node.increment
    if count >= @capacity
      if @prune_node.compare_and_set(prune_node, node)
        prune_node.nxt.set(nil)
      end
    end
    nil
  end

  def each(&block)
    enum = Enumerator.new do |yielder|
      current = @head
      # Here we just iterate through the list, but limit the results to @capacity.
      @capacity.times do
        current = current.nxt.get
        break if current.nil?
        yielder.yield(current.value)
      end
    end
    block ? enum.each(&block) : enum
  end
end
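For comparison, here is a minimal Mutex-based variant of the same idea (my sketch, not benchmarked; the names are mine). Pushes are serialized behind a single lock, which is the baseline the CAS version above needs to beat under contention:
class MutexSizedList
  attr_reader :capacity

  Node = Struct.new(:value, :nxt)

  def initialize(capacity)
    @capacity = capacity
    @head = Node.new(nil, nil)
    @mutex = Mutex.new
    @prune_node = nil
    @pushes_since_prune = 0
  end

  def push(element)
    @mutex.synchronize do
      node = Node.new(element, @head.nxt)
      @head.nxt = node
      @prune_node ||= node
      @pushes_since_prune += 1
      if @pushes_since_prune >= @capacity
        @prune_node.nxt = nil # drop everything older than the prune node
        @prune_node = node    # the newest node becomes the next prune node
        @pushes_since_prune = 0
      end
    end
    nil
  end

  def each
    return enum_for(:each) unless block_given?
    # Lock only to grab a consistent starting point, then walk freely.
    # A concurrent prune may cut the chain mid-walk and shorten the read,
    # which is the same caveat as the CAS version above.
    current = @mutex.synchronize { @head.nxt }
    @capacity.times do
      break if current.nil?
      yield current.value
      current = current.nxt
    end
  end
end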

Related

Overriding the << method for instance variables

Let's suppose I have this class:
class Example
  attr_accessor :numbers

  def initialize(numbers = [])
    @numbers = numbers
  end

  private

  def validate!(number)
    number >= 0 || raise(ArgumentError)
  end
end
I would like to run the #validate! on any new number before pushing it into the numbers:
example = Example.new([1, 2, 3])
example.numbers # [1, 2, 3]
example.numbers << 4
example.numbers # [1, 2, 3, 4]
example.numbers << -1 # raise ArgumentError
Below is the best I can do but I'm really not sure about it.
Plus it works only on <<, not on push. I could add it, but there is a risk of an infinite loop...
Is there a more "regular" way to do it? I couldn't find any official process for that.
class Example
  attr_accessor :numbers

  def initialize(numbers = [])
    @numbers = numbers
    bind = self # so the instance is usable inside the singleton block
    @numbers.singleton_class.send(:define_method, :<<) do |value|
      # here, self refers to the @numbers array, so use bind to refer to the instance
      bind.send(:validate!, value)
      push(value)
    end
  end

  private

  def validate!(number)
    number >= 0 || raise(ArgumentError)
  end
end
Programming is a lot like real life: it is not a good idea to just run around and let strangers touch your private parts.
You are solving the wrong problem. You are trying to regulate what strangers can do when they play with your private parts, but instead you simply shouldn't let them touch your privates in the first place.
class Example
  def initialize(numbers = [])
    @numbers = numbers.clone
  end

  def numbers
    @numbers.clone.freeze
  end

  def <<(number)
    validate(number)
    @numbers << number
    self
  end

  private

  def validate(number)
    raise ArgumentError, "number must be non-negative, but is #{number}" unless number >= 0
  end
end
example = Example.new([1, 2, 3])
example.numbers # [1, 2, 3]
example << 4
example.numbers # [1, 2, 3, 4]
example << -1 # raise ArgumentError
Let's look at all the changes I made one-by-one.
cloning the initializer argument
You are taking a mutable object (an array) from an untrusted source (the caller). You should make sure that the caller cannot do anything "sneaky". In your first code, I can do this:
ary = [1, 2, 3]
example = Example.new(ary)
ary << -1
Since you simply took the array I handed you, I can still do anything I want to it!
And even in the hardened version, I can do this:
ary = [1, 2, 3]
example = Example.new(ary)
class << ary
  remove_method :<<
end
ary << -1
Or, I can freeze the array before I hand it to you, which makes it impossible to add a singleton method to it.
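For instance, running the original singleton-method version against a frozen input fails at construction time (I believe the exact exception class and message vary by Ruby version):
ary = [1, 2, 3].freeze
Example.new(ary)
# => FrozenError: the singleton class of a frozen object is itself frozen,
#    so the define_method call in initialize cannot add the custom <<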
Even without the safety aspects, you should still do this, because you violate another real-life rule: Don't play with other people's toys! I am handing you my array, and then you mutate it. In the real world, that would be considered rude. In programming, it is surprising, and surprises breed bugs.
cloning in the getter
This goes to the heart of the matter: the @numbers array is my private internal state. I should never hand that to strangers. If you don't hand the @numbers array out, then none of the problems you are protecting against can even occur.
You are trying to protect against strangers mutating your internal state, and the solution to that is simple: don't give strangers your internal state!
The freeze is technically not necessary, but I like it to make clear to the caller that this is just a view into the state of the example object, and they are only allowed to view what I want them to.
And again, even without the safety aspects, this would still be a bad idea: by exposing your internal implementation to clients, you can no longer change the internal implementation without breaking clients. If you change the array to a linked list, your clients are going to break, because they are used to getting an array that you can randomly index, but you can't randomly index a linked list, you always have to traverse it from the front.
The example is unfortunately too small and simple to judge that, but I would even question why you are handing out arrays in the first place. What do the clients want to do with those numbers? Maybe it is enough for them to just iterate over them, in which case you don't need to give them a whole array, just an iterator:
class Example
  def each(...)
    return enum_for(__callee__) unless block_given?
    @numbers.each(...)
    self
  end
end
If the caller wants an array, they can still easily get one by calling to_a on the Enumerator.
Note that I return self. This has two reasons:
It is simply the contract of each. Every other object in Ruby that implements each returns self. If this were Java, this would be part of the Iterable interface.
Otherwise, I would accidentally leak the internal state that I work so hard to protect! As I just wrote: every implementation of each returns self, so what does @numbers.each return? It returns @numbers, which means my whole Example#each method would return @numbers, which is exactly the thing I am trying to hide!
Implement << myself
Instead of handing out my internal state and having the caller append to it, I control what happens with my internal state. I implement my own version of << in which I can check for whatever I want and make sure no invariants of my object are violated.
Note that I return self. This has two reasons:
It is simply the contract of <<. Every other object in Ruby that implements << returns self. If this were Java, this would be part of the Appendable interface.
Otherwise, I would accidentally leak the internal state that I work so hard to protect! As I just wrote: every implementation of << returns self, so what does @numbers << number return? It returns @numbers, which means my whole Example#<< method would return @numbers, which is exactly the thing I am trying to hide!
Drop the bang
In Ruby, method names that end with a bang mean "This method is more surprising than its non-bang counterpart". In your case, there is no non-bang counterpart, so the method shouldn't have a bang.
Don't abuse boolean operators for control flow
… or at least if you do, use the keyword versions (and / or) instead of the symbolic ones (&& / ||).
But really, you should avoid it altogether. do or die is idiomatic in Perl, but not in Ruby.
Technically, I have changed the return value of your method: it used to return true for a valid value, now it returns nil. But you ignore its return value anyway, so it doesn't matter.
validate is probably not a good name for the method, though. I would expect a method named validate to return a boolean result, not raise an exception.
An exceptional message
You should add messages to your exceptions that tell the programmer what went wrong. Another possibility is to create more specific exceptions, e.g.
class NegativeNumberError < ArgumentError; end
But that would be overkill in this case. In general, if you expect code to "read" your exception, create a new class, if you expect humans to read your exception, then a message is enough.
Encapsulation, Data Abstraction, Information Hiding
Those are three subtly different but related concepts, and they are among the most important concepts in programming. We always want to hide our internal state and encapsulate it behind methods that we control.
Encapsulation to the max
Some people (including myself) don't particularly like even the object itself playing with its internal state. Personally, I even encapsulate private instance variables that are never exposed behind getters and setters. The reason is that this makes the class easier to subclass: you can override and specialize methods, but not instance variables. So, if I use the instance variable directly, a subclass cannot "hook" into those accesses.
Whereas if I use getter and setter methods, the subclass can override those (or only one of those).
Note: the example is too small and simple, so I had some real trouble coming up with a good name (there is not enough in the example to understand how the variable is used and what it means), so eventually, I just gave up, but you will see what I mean about using getters and setters:
class Example
  class NegativeNumberError < ArgumentError; end

  def initialize(numbers = [])
    self.numbers_backing = numbers.clone
  end

  def each(...)
    return enum_for(__callee__) unless block_given?
    numbers_backing.each(...)
    self
  end

  def <<(number)
    validate(number)
    numbers_backing << number
    self
  end

  private

  attr_accessor :numbers_backing

  def validate(number)
    raise NegativeNumberError unless number >= 0
  end
end
example = Example.new([1, 2, 3])
example.each.to_a # [1, 2, 3]
example << 4
example.each.to_a # [1, 2, 3, 4]
example << -1 # raise NegativeNumberError
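To make the subclassing point concrete, here is a hypothetical subclass (my example, not part of the original answer) that hooks into every access of the backing collection by overriding the private reader:
class AuditedExample < Example
  private

  # Overriding the private reader lets the subclass observe (or redirect)
  # every access to the backing collection without touching the instance variable.
  def numbers_backing
    puts 'backing collection accessed'
    super
  end
end
audited = AuditedExample.new([1, 2, 3])
audited << 4        # prints the audit line, then appends
audited.each.to_a   # [1, 2, 3, 4]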

Is Ruby Array#[]= threadsafe for a preallocated array? Can this be made lockless?

I've written some code in ruby to process items in an array via a threadpool. In the process, I've preallocated a results array which is the same size as the passed-in array. Within the threadpool, I'm assigning items in the preallocated array, but the indexes of those items are guaranteed to be unique. With that in mind, do I need to surround the assignment with a Mutex#synchronize?
Example:
SIZE = 1000000000

def collect_via_threadpool(items, pool_count = 10)
  processed_items = Array.new(items.count, nil)
  index = -1
  length = items.length
  mutex = Mutex.new
  items_mutex = Mutex.new
  [pool_count, length, 50].min.times.collect do
    Thread.start do
      while (i = mutex.synchronize { index = index + 1 }) < length do
        processed_items[i] = yield(items[i])
        # ^ do I need to synchronize around this? `processed_items` is preallocated
      end
    end
  end.each(&:join)
  processed_items
end

items = collect_via_threadpool(SIZE.times.to_a, 100) do |item|
  item.to_s
end

raise unless items.size == SIZE
items.each_with_index do |item, index|
  raise unless item.to_i == index
end
puts 'success'
(This test code takes a long time to run, but appears to print 'success' every time.)
It seems like I would want to surround the Array#[]= with Mutex#synchronize just to be safe, but my question is:
Within Ruby's specification is this code defined as safe?
Nothing in Ruby is specified to be thread safe other than Mutex (and thus anything derived from it). If you want to know if your specific code is thread safe, you'll need to look at how your implementation handles threads and arrays.
For MRI, calling Array.new(n, nil) does actually allocate memory for the entire array, so if your threads are guaranteed to not share indices your code will work. It's as safe as having multiple threads operate on distinct variables without a mutex.
However for other implementations, Array.new(n, nil) might not allocate a whole array, and assigning to indices later could involve reallocations and memory copies, which could break catastrophically.
So while your code may work (in MRI at least), don't rely on it. While we're on the topic, Ruby's threads aren't even specified to actually run in parallel. So if you're trying to avoid mutexes because you think you might see some performance boost, maybe you should rethink your approach.
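If you'd rather not rely on that implementation detail, one way to sidestep the question entirely (a sketch of the same pool shape, not the only option) is to have each thread collect results locally and merge them on the main thread after joining, so that no array is ever written to by more than one thread:
def collect_via_threadpool(items, pool_count = 10)
  index = -1
  length = items.length
  mutex = Mutex.new
  per_thread_pairs = [pool_count, length].min.times.collect do
    Thread.new do
      local = [] # only this thread ever writes here
      while (i = mutex.synchronize { index += 1 }) < length
        local << [i, yield(items[i])]
      end
      local
    end
  end.map(&:value) # Thread#value joins and returns each thread's pairs
  results = Array.new(length)
  per_thread_pairs.each { |pairs| pairs.each { |i, value| results[i] = value } }
  results
end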

What is the preferred way to loop in Ruby?

Why is each loop preferred over for loop in Ruby? Is there a difference in time complexity or are they just syntactically different?
Yes, these are two different ways of iterating, but I hope this benchmark helps.
require 'benchmark'
a = Array(1..100000000)
sum = 0
Benchmark.realtime {
  a.each { |x| sum += x }
}
This takes 5.866932 sec
a = Array(1..100000000)
sum = 0
Benchmark.realtime {
  for x in a
    sum += x
  end
}
This takes 6.146521 sec.
Though this isn't a rigorous way to benchmark, and there are other constraints too, on a single machine each seems to be a bit faster than for.
The variable referencing an item in the iteration is temporary and has no significance outside of the iteration, so it is better if it is hidden from the outside. With external iterators, such a variable is located outside of the iteration block. In the following, e is useful only within do ... end, but it is written outside of the block, separated from where it is used, which does not read naturally:
for e in [:foo, :bar] do
  ...
end
With internal iterators, the block variable is defined right inside the block, where it is used. It is easier to read:
[:foo, :bar].each do |e|
  ...
end
This visibility issue is not just for a programmer. With respect to visibility in the sense of scope, the variable for an external iterator is accessible outside of the iteration:
for e in [:foo] do; end
e # => :foo
whereas with an internal iterator, a block variable is invisible from outside:
[:foo].each do |e|; end
e # => undefined local variable or method `e'
The latter is better from the point of view of encapsulation.
When you want to nest the loops, the order of variables would be somewhat backwards with external iterators:
for a in [[:foo, :bar]] do
  for e in a do
    ...
  end
end
but with internal iterators, the order is more straightforward:
[[:foo, :bar]].each do |a|
  a.each do |e|
    ...
  end
end
With external iterators, you can only use hard-coded Ruby syntax, and you also have to remember the matching between the keyword and the method that is internally called (for calls each), but for internal iterators, you can define your own, which gives flexibility.
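For example, here is a minimal sketch of defining your own internal iterator (the class name is made up): a class only has to provide each, and including Enumerable layers map, select, inject and friends on top of it:
class Countdown
  include Enumerable

  def initialize(from)
    @from = from
  end

  # Defining each is all Enumerable needs; the other iterator methods come for free.
  def each
    return enum_for(:each) unless block_given?
    @from.downto(1) { |n| yield n }
    self
  end
end
Countdown.new(3).map { |n| n * 10 } #=> [30, 20, 10]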
each is the Ruby Way. It implements the Iterator pattern, which has decoupling benefits.
Check also this: "for" vs "each" in Ruby
An interesting question. There are several ways of looping in Ruby. I have noted a design principle in Ruby: when there are multiple ways of doing the same thing, there are usually subtle differences between them, and each case has its own unique use, its own problem that it solves. So in the end you end up needing to be able to write (and not just read) all of them.
As for the question about the for loop, this is similar to my earlier question about whether the for loop is a trap.
Basically there are 2 main explicit ways of looping, one is by iterators (or, more generally, blocks), such as
[1, 2, 3].each { |e| puts e * 10 }
[1, 2, 3].map { |e| e * 10 }
# etc., see Array and Enumerable documentation for more iterator methods.
Connected to this way of iterating is the class Enumerator, which you should strive to understand.
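As a quick taste of what Enumerator adds on top of plain blocks (this illustration is mine): external iteration with next, and lazy chaining over unbounded sources.
e = [1, 2, 3].each # calling each without a block returns an Enumerator
e.next             #=> 1 (external iteration: you pull values when you want them)
e.next             #=> 2
# Lazy enumerators let you chain over unbounded sources:
(1..Float::INFINITY).lazy.map { |x| x * 10 }.first(3) #=> [10, 20, 30]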
The other way is Pascal-ish looping by while, until and for loops.
for y in [1, 2, 3]
  puts y
end

x = 0
while x < 3
  puts x; x += 1
end
# same for the until loop
Like if and unless, while and until have their tail form, such as
a = 'alligator'
a.chop! until a.chars.last == 'g'
#=> 'allig'
The third very important way of looping is implicit looping, or looping by recursion. Ruby is extremely malleable, all classes are modifiable, hooks can be set up for various events, and this can be exploited to produce most unusual ways of looping. The possibilities are so endless that I don't even know where to start talking about them. Perhaps a good place is the blog by Yusuke Endoh, a well known artist working with Ruby code as his artistic material of choice.
To demonstrate what I mean, consider this loop
class Object
  def method_missing(sym)
    s = sym.to_s
    if s.chars.last == 'g' then s else eval s.chop end
  end
end
alligator
#=> "allig"
Aside from readability issues, the for loop iterates in Ruby land whereas each does it from native code, so in principle each should be more efficient when iterating over all the elements of an array.
Loop with each:
arr.each {|x| puts x}
Loop with for:
for i in 0...arr.length
  puts arr[i]
end
In the each case we are just passing a code block to a method implemented in the machine's native code (fast code), whereas in the for case, all code must be interpreted and run taking into account all the complexity of the Ruby language.
However for is more flexible and lets you iterate in more complex ways than each does, for example, iterating with a given step.
EDIT
I hadn't realized that you can step over a range by using the step method before calling each, so the flexibility I claimed for the for loop is actually unjustified.
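For reference, stepping without a for loop can look roughly like this:
arr = %w[a b c d e f]
(0...arr.length).step(2) { |i| puts arr[i] } # every other element, by index
(1..10).step(3).each { |n| puts n }          # 1, 4, 7, 10 -- stepping over a range's values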

Chaining partition, keep_if etc

[1,2,3].partition.inject(0) do |acc, x|
  x>2    # this line is intended to be used by `partition`
  acc+=x # this line is intended to be used by `inject`
end
I know that I can write the above stanza using different methods, but this is not important here.
What I want to ask is: why would somebody want to use partition (or other methods like keep_if, delete_if) at the beginning of the "chain"?
In my example, after I chained inject I couldn't use partition. I can write the above stanza using each:
[1,2,3].each.inject(0) do |acc, x|
  x>2    # this line is intended to be used by `partition`
  acc+=x # this line is intended to be used by `inject`
end
and it will be the same, right?
I know that x>2 will be discarded (and not used) by partition. Only acc+=x will do the job (summing all elements in this case).
I only wrote that to show my "intention": I want to use partition in the chain like this [].partition.inject(0).
I know that the above code won't work as I intended, and I know that I can chain after the block (}.map, as mentioned by Neil Slater).
I wanted to know why and when partition (and other methods like keep_if, delete_if, etc.) behaves like each (just returning the elements of the array, as partition does in the above cases).
In my example, partition.inject, partition behaved like each because partition could not receive the condition (x>2).
However, partition.with_index (as mentioned by Boris Stitnicky) works (I can partition the array and use the index for whatever I want):
shuffled_array
  .partition
  .with_index { |element, index|
    element > index
  }
PS: This is not a question about how to get the sum of the elements that are bigger than 2.
This is an interesting situation. Looking at your code examples, you are obviously new to Ruby and perhaps also to programming. Yet you managed to ask a very difficult question that basically concerns the Enumerator class, one of the least publicly understood classes, especially since Enumerator::Lazy was introduced. To me, your question is difficult enough that I am not able to provide a comprehensive answer. Yet the remarks about your code would not fit into a comment under the OP. That's why I'm adding this non-answer.
First of all, let us notice a few awful things in your code:
Useless lines. In both blocks, the x>2 line is useless, because its return value is discarded.
[1,2,3].partition.inject(0) do |x, acc|
  x>2 # <---- return value of this line is never used
  acc+=x
end
[1,2,3].each.inject(0) do |x, acc|
  x>2 # <---- return value of this line is never used
  acc+=x
end
I will ignore this useless line when discussing your code examples further.
Useless #each method. It is useless to write
[1,2,3].each.inject(0) do |x, acc|
  acc+=x
end
This is enough:
[1,2,3].inject(0) do |x, acc|
  acc+=x
end
Useless use of #partition method. Instead of:
[1,2,3].partition.inject(0) do |x, acc|
  acc+=x
end
You can just write this:
[1,2,3].inject(0) do |x, acc|
  acc+=x
end
Or, as I would write it, this:
[ 1, 2, 3 ].inject :+
But then, you ask a deep question about using the #partition method in its enumerator mode. Having discussed the trivial newbie problems of your code, we are left with the question of how exactly the enumerator-returning versions of #partition, #keep_if etc. should be used, or rather, what the interesting ways of using them are, because everyone knows that we can use them for chaining:
array = [ *1..6 ]
shuffled_array = array.shuffle # randomly shuffles the array elements
shuffled_array
  .partition                     # partition enumerator comes into play
  .with_index { |element, index| # method Enumerator#with_index comes into play
    element > index              # and partitions elements into those greater
  }                              # than their index, and those smaller
And also like this:
e = partition_enumerator_of_array = array.partition
# And then, we can partition the array in many ways:
e.each &:even? # partitions into odd / even numbers
e.each { rand() > 0.5 } # partitions the array randomly
# etc.
An easily understood advantage is that instead of writing longer:
array.partition &:even?
You can write shorter:
e.each &:even?
But I am basically sure that enumerators provide more power to the programmer than just chaining collection methods and shortening code a little bit. Because different enumerators do very different things. Some, such as #map! or #reject!, can even modify the collection on which they operate. In this case, it is imaginable that one could combine different enumerators with the same block to do different things. This ability to vary not just the blocks, but also the enumerators to which they are passed, gives combinatorial power, which can very likely be used to make some otherwise lengthy code very concise. But I am unable to provide a very useful concrete example of this.
In sum, the Enumerator class is here mainly for chaining, and to use chaining, programmers do not really need to understand Enumerator in detail. But I suspect that the correct habits regarding the use of Enumerator might be as difficult to learn as, for instance, correct habits of parametrized subclassing. I suspect I have not grasped the most powerful ways to use enumerators yet.
I think that the result [3, 3] is what you are looking for here - partitioning the array into smaller and larger numbers then summing each group. You seem to be confused about how you give the block "rules" to the two different methods, and have merged what should be two blocks into one.
If you need the net effects of many methods that each take a block, then you can chain after any block, by adding the .method after the close of the block like this: }.each or end.each
Also note that if you create partitions, you are probably wanting to sum over each partition separately. To do that you will need an extra link in the chain (in this case a map):
[1,2,3].partition { |x| x > 2 }.map do |part|
  part.inject(0) do |acc, x|
    x + acc
  end
end
# => [3, 3]
(You also got the accumulator and current value wrong way around in the inject, and there is no need to assign to the accumulator, Ruby does that for you).
The .inject is no longer in a method chain, instead it is inside a block. There is no problem with blocks inside other blocks, in fact you will see this very often in Ruby code.
I have chained .partition and .map in the above example. You could also write the above like this:
[1,2,3].partition do |x|
  x > 2
end.map do |part|
  part.inject(0) do |acc, x|
    x + acc
  end
end
. . . although when chaining with short blocks, I personally find it easier to use the { } syntax instead of do end, especially at the start of a chain.
If it all starts to look complex, there is not usually a high cost to assigning the results of the first part of a chain to a local variable, in which case there is no chain at all.
parts = [1,2,3].partition { |x| x > 2 }
parts.map do |part|
  part.inject(0) do |acc, x|
    x + acc
  end
end

Ruby find in array with offset

I'm looking for a way to do the following in Ruby in a cleaner way:
class Array
  def find_index_with_offset(offset, &block)
    self[offset..-1].find(&block)
  end
end
offset = array.find_index {|element| element.meets_some_criterion?}
the_object_I_want =
array.find_index_with_offset(offset+1) {|element| element.meets_another_criterion?}
So I'm searching a Ruby array for the index of some object and then I do a follow-up search to find the first object that matches some other criterion and has a higher index in the array. Is there a better way to do this?
What do I mean by cleaner: something that doesn't involve explicitly slicing the array. When you do this a couple of times, calculating the slicing indices gets messy fast. I'd like to keep operating on the original array. It's easier to understand and less error-prone.
NB. In my actual code I haven't monkey-patched Array, but I want to draw attention to the fact that I expect I'm duplicating existing functionality of Array/Enumerable
Edits
Fixed location of offset + 1 as per Mladen Jablanović's comment; rewrite error
Added explanation of 'cleaner' as per Mladen Jablanović's comment
Cleaner is obviously a subjective matter here. If you aim for short, I don't think you could do better than that. If you want to be able to chain multiple such finds, or you are bothered by the slicing, you can do something like this:
module Enumerable
  def find_multi(*procs)
    return nil if procs.empty?
    find do |e|
      if procs.first.call(e)
        procs.shift
        next true if procs.empty?
      end
      false
    end
  end
end
a = (1..10).to_a
p a.find_multi(lambda{|e| e % 5 == 0}, lambda{|e| e % 3 == 0}, lambda{|e| e % 4 == 0})
#=> 8
Edit: And if you're not concerned with performance, you could do something like:
array.drop_while { |element|
  !element.meets_some_criterion?
}.drop(1).find { |element|
  element.meets_another_criterion?
}
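If the intermediate arrays built by drop_while and drop are the performance concern, a lazy variant of the same chain (Ruby 2.0+) avoids materializing them; this is my sketch of the same idea:
the_object_I_want = array.lazy
  .drop_while { |element| !element.meets_some_criterion? }
  .drop(1)
  .find { |element| element.meets_another_criterion? }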
