I have learned two array sorting methods in Ruby:
array = ["one", "two", "three"]
array.sort.reverse!
or:
array = ["one", "two", "three"]
array.sort { |x,y| y<=>x }
And I am not able to differentiate between the two. Which method is better and how exactly are they different in execution?
Both lines do the same (create a new array, which is reverse sorted). The main argument is about readability and performance. array.sort.reverse! is more readable than array.sort{|x,y| y<=>x} - I think we can agree here.
For the performance part, I created a quick benchmark script, which gives the following on my system (ruby 1.9.3p392 [x86_64-linux]):
user system total real
array.sort.reverse 1.330000 0.000000 1.330000 ( 1.334667)
array.sort.reverse! 1.200000 0.000000 1.200000 ( 1.198232)
array.sort!.reverse! 1.200000 0.000000 1.200000 ( 1.199296)
array.sort{|x,y| y<=>x} 5.220000 0.000000 5.220000 ( 5.239487)
Run times are pretty constant for multiple executions of the benchmark script.
array.sort.reverse (with or without !) is way faster than array.sort{|x,y| y<=>x}. Thus, I recommend that.
Here is the script as a Reference:
#!/usr/bin/env ruby
require 'benchmark'
Benchmark.bm do|b|
master = (1..1_000_000).map(&:to_s).shuffle
a = master.dup
b.report("array.sort.reverse ") do
a.sort.reverse
end
a = master.dup
b.report("array.sort.reverse! ") do
a.sort.reverse!
end
a = master.dup
b.report("array.sort!.reverse! ") do
a.sort!.reverse!
end
a = master.dup
b.report("array.sort{|x,y| y<=>x} ") do
a.sort{|x,y| y<=>x}
end
end
There really is no difference here. Both methods return a new array.
For the purposes of this example, simpler is better. I would recommend array.sort.reverse because it is much more readable than the alternative. Passing blocks to methods like sort should be saved for arrays of more complex data structures and user-defined classes.
Edit: While destructive methods (anything ending in a !) are good for performance games, it was pointed out that they aren't required to return an updated array, or anything at all for that matter. It is important to keep this in mind because array.sort.reverse! could very likely return nil. If you wish to use a destructive method on a newly generated array, you should prefer calling .reverse! on a separate line instead of having a one-liner.
Example:
array = array.sort
array.reverse!
should be preferred to
array = array.sort.reverse!
Reverse! is Faster
There's often no substitute for benchmarking. While it probably makes no difference in shorter scripts, the #reverse! method is significantly faster than sorting using the "spaceship" operator. For example, on MRI Ruby 2.0, and given the following benchmark code:
require 'benchmark'
array = ["one", "two", "three"]
loops = 1_000_000
Benchmark.bmbm do |bm|
bm.report('reverse!') { loops.times {array.sort.reverse!} }
bm.report('spaceship') { loops.times {array.sort {|x,y| y<=>x} }}
end
the system reports that #reverse! is almost twice as fast as using the combined comparison operator.
user system total real
reverse! 0.340000 0.000000 0.340000 ( 0.344198)
spaceship 0.590000 0.010000 0.600000 ( 0.595747)
My advice: use whichever is more semantically meaningful in a given context, unless you're running in a tight loop.
With comparison as simple as your example, there is not much difference, but as the formula for comparison gets complicated, it is better to avoid using <=> with a block because the block you pass will be evaluated for each element of the array, causing redundancy. Consider this:
array.sort{|x, y| some_expensive_method(x) <=> some_expensive_method(y)}
In this case, some_expensive_method will be evaluated for each possible pair of element of array.
In your particular case, use of a block with <=> can be avoided with reverse.
array.sort_by{|x| some_expensive_method(x)}.reverse
This is called Schwartzian transform.
In playing with tessi's benchmarks on my machine, I've gotten some interesting results. I'm running ruby 2.0.0p195 [x86_64-darwin12.3.0], i.e., latest release of Ruby 2 on an OS X system. I used bmbm rather than bm from the Benchmark module. My timings are:
Rehearsal -------------------------------------------------------------
array.sort.reverse: 1.010000 0.000000 1.010000 ( 1.020397)
array.sort.reverse!: 0.810000 0.000000 0.810000 ( 0.808368)
array.sort!.reverse!: 0.800000 0.010000 0.810000 ( 0.809666)
array.sort{|x,y| y<=>x}: 0.300000 0.000000 0.300000 ( 0.291002)
array.sort!{|x,y| y<=>x}: 0.100000 0.000000 0.100000 ( 0.105345)
---------------------------------------------------- total: 3.030000sec
user system total real
array.sort.reverse: 0.210000 0.000000 0.210000 ( 0.208378)
array.sort.reverse!: 0.030000 0.000000 0.030000 ( 0.027746)
array.sort!.reverse!: 0.020000 0.000000 0.020000 ( 0.020082)
array.sort{|x,y| y<=>x}: 0.110000 0.000000 0.110000 ( 0.107065)
array.sort!{|x,y| y<=>x}: 0.110000 0.000000 0.110000 ( 0.105359)
First, note that in the Rehearsal phase that sort! using a comparison block comes in as the clear winner. Matz must have tuned the heck out of it in Ruby 2!
The other thing that I found exceedingly weird was how much improvement array.sort.reverse! and array.sort!.reverse! exhibited in the production pass. It was so extreme it made me wonder whether I had somehow screwed up and passed these already sorted data, so I added explicit checks for sorted or reverse-sorted data prior to performing each benchmark.
My variant of tessi's script follows:
#!/usr/bin/env ruby
require 'benchmark'
class Array
def sorted?
(1...length).each {|i| return false if self[i] < self[i-1] }
true
end
def reversed?
(1...length).each {|i| return false if self[i] > self[i-1] }
true
end
end
master = (1..1_000_000).map(&:to_s).shuffle
Benchmark.bmbm(25) do|b|
a = master.dup
puts "uh-oh!" if a.sorted?
puts "oh-uh!" if a.reversed?
b.report("array.sort.reverse:") { a.sort.reverse }
a = master.dup
puts "uh-oh!" if a.sorted?
puts "oh-uh!" if a.reversed?
b.report("array.sort.reverse!:") { a.sort.reverse! }
a = master.dup
puts "uh-oh!" if a.sorted?
puts "oh-uh!" if a.reversed?
b.report("array.sort!.reverse!:") { a.sort!.reverse! }
a = master.dup
puts "uh-oh!" if a.sorted?
puts "oh-uh!" if a.reversed?
b.report("array.sort{|x,y| y<=>x}:") { a.sort{|x,y| y<=>x} }
a = master.dup
puts "uh-oh!" if a.sorted?
puts "oh-uh!" if a.reversed?
b.report("array.sort!{|x,y| y<=>x}:") { a.sort!{|x,y| y<=>x} }
end
I want to see if one string contains a keyword in a keyword list.
I have the following function:
def needfilter?(src)
["keyowrd_1","keyowrd_2","keyowrd_3","keyowrd_4","keyowrd_5"].each do |kw|
return true if src.include?(kw)
end
false
end
Can this code block be simplified in to one line sentence?
I know it can be simplified to:
def needfilter?(src)
!["keyowrd_1","keyowrd_2","keyowrd_3","keyowrd_4","keyowrd_5"].select{|c| src.include?(c)}.empty?
end
But this approach is not so efficient if the keyword array list is very long.
Looks like a nice use case for Enumerable#any? method:
def needfilter?(src)
["keyowrd_1","keyowrd_2","keyowrd_3","keyowrd_4","keyowrd_5"].any? do |kw|
src.include? kw
end
end
I was curious what's the fastest solution and I created a benchmark of all answers up to now.
I modified steenslag answer a bit. For tuning reasons I create the regexp only once not for each test.
require 'benchmark'
KEYWORDS = ["keyowrd_1","keyowrd_2","keyowrd_3","keyowrd_4","keyowrd_5"]
TESTSTRINGS = ['xx', 'xxx', "keyowrd_2"]
N = 10_000 #Number of Test loops
def needfilter_orig?(src)
["keyowrd_1","keyowrd_2","keyowrd_3","keyowrd_4","keyowrd_5"].each do |kw|
return true if src.include?(kw)
end
false
end
def needfilter_orig2?(src)
!["keyowrd_1","keyowrd_2","keyowrd_3","keyowrd_4","keyowrd_5"].select{|c| src.include?(c)}.empty?
end
def needfilter_any?(src)
["keyowrd_1","keyowrd_2","keyowrd_3","keyowrd_4","keyowrd_5"].any? do |kw|
src.include? kw
end
end
def needfilter_regexp?(src)
!!(src =~ Regexp.union(KEYWORDS))
end
def needfilter_regexp_init?(src)
!!(src =~ $KEYWORDS_regexp)
end
def needfilter_split?(src)
(src.split(/ /) & KEYWORDS).empty?
end
Benchmark.bmbm(10) {|b|
b.report('orig') { N.times { TESTSTRINGS.each{|src| needfilter_orig?(src)} } }
b.report('orig2') { N.times { TESTSTRINGS.each{|src| needfilter_orig2?(src) } } }
b.report('any') { N.times { TESTSTRINGS.each{|src| needfilter_any?(src) } } }
b.report('regexp') { N.times { TESTSTRINGS.each{|src| needfilter_regexp?(src) } } }
b.report('regexp_init') {
$KEYWORDS_regexp = Regexp.union(KEYWORDS) # Initialize once
N.times { TESTSTRINGS.each{|src| needfilter_regexp_init?(src) } }
}
b.report('split') { N.times { TESTSTRINGS.each{|src| needfilter_split?(src) } } }
} #Benchmark
Result:
Rehearsal -----------------------------------------------
orig 0.094000 0.000000 0.094000 ( 0.093750)
orig2 0.093000 0.000000 0.093000 ( 0.093750)
any 0.110000 0.000000 0.110000 ( 0.109375)
regexp 0.578000 0.000000 0.578000 ( 0.578125)
regexp_init 0.047000 0.000000 0.047000 ( 0.046875)
split 0.125000 0.000000 0.125000 ( 0.125000)
-------------------------------------- total: 1.047000sec
user system total real
orig 0.078000 0.000000 0.078000 ( 0.078125)
orig2 0.109000 0.000000 0.109000 ( 0.109375)
any 0.078000 0.000000 0.078000 ( 0.078125)
regexp 0.579000 0.000000 0.579000 ( 0.578125)
regexp_init 0.046000 0.000000 0.046000 ( 0.046875)
split 0.125000 0.000000 0.125000 ( 0.125000)
The solution with regular expressions is the fastest, if you create the regexp only once.
def need_filter?(src)
!!(src =~ /keyowrd_1|keyowrd_2|keyowrd_3|keyowrd_4|keyowrd_5/)
end
The =~ method returns a fixnum or nil. The double bang converts that to a boolean.
This is the way I'd do it:
def needfilter?(src)
keywords = Regexp.union("keyowrd_1","keyowrd_2","keyowrd_3","keyowrd_4","keyowrd_5")
!!(src =~ keywords)
end
This solution has:
No iteration
Single regexp using Regexp.union
Should be fast for even a large set of keywords. Note that hardcoding the keywords in the method is not ideal, but I'm assuming that was just for the sake of the example.
I think that
def need_filter?(src)
(src.split(/ /) & ["keyowrd_1","keyowrd_2","keyowrd_3","keyowrd_4","keyowrd_5"]).empty?
end
will work as you expect (as it's described in Array include any value from another array?) and will be faster than any? and include?.
Example:
[12,23,987,43
What is the fastest, most efficient way to remove the "[",
using maybe a chop() but for the first character?
Similar to Pablo's answer above, but a shade cleaner :
str[1..-1]
Will return the array from 1 to the last character.
'Hello World'[1..-1]
=> "ello World"
I kind of favor using something like:
asdf = "[12,23,987,43"
asdf[0] = ''
p asdf
# >> "12,23,987,43"
I'm always looking for the fastest and most readable way of doing things:
require 'benchmark'
N = 1_000_000
puts RUBY_VERSION
STR = "[12,23,987,43"
Benchmark.bm(7) do |b|
b.report('[0]') { N.times { "[12,23,987,43"[0] = '' } }
b.report('sub') { N.times { "[12,23,987,43".sub(/^\[+/, "") } }
b.report('gsub') { N.times { "[12,23,987,43".gsub(/^\[/, "") } }
b.report('[1..-1]') { N.times { "[12,23,987,43"[1..-1] } }
b.report('slice') { N.times { "[12,23,987,43".slice!(0) } }
b.report('length') { N.times { "[12,23,987,43"[1..STR.length] } }
end
Running on my Mac Pro:
1.9.3
user system total real
[0] 0.840000 0.000000 0.840000 ( 0.847496)
sub 1.960000 0.010000 1.970000 ( 1.962767)
gsub 4.350000 0.020000 4.370000 ( 4.372801)
[1..-1] 0.710000 0.000000 0.710000 ( 0.713366)
slice 1.020000 0.000000 1.020000 ( 1.020336)
length 1.160000 0.000000 1.160000 ( 1.157882)
Updating to incorporate one more suggested answer:
require 'benchmark'
N = 1_000_000
class String
def eat!(how_many = 1)
self.replace self[how_many..-1]
end
def first(how_many = 1)
self[0...how_many]
end
def shift(how_many = 1)
shifted = first(how_many)
self.replace self[how_many..-1]
shifted
end
alias_method :shift!, :shift
end
class Array
def eat!(how_many = 1)
self.replace self[how_many..-1]
end
end
puts RUBY_VERSION
STR = "[12,23,987,43"
Benchmark.bm(7) do |b|
b.report('[0]') { N.times { "[12,23,987,43"[0] = '' } }
b.report('sub') { N.times { "[12,23,987,43".sub(/^\[+/, "") } }
b.report('gsub') { N.times { "[12,23,987,43".gsub(/^\[/, "") } }
b.report('[1..-1]') { N.times { "[12,23,987,43"[1..-1] } }
b.report('slice') { N.times { "[12,23,987,43".slice!(0) } }
b.report('length') { N.times { "[12,23,987,43"[1..STR.length] } }
b.report('eat!') { N.times { "[12,23,987,43".eat! } }
b.report('reverse') { N.times { "[12,23,987,43".reverse.chop.reverse } }
end
Which results in:
2.1.2
user system total real
[0] 0.300000 0.000000 0.300000 ( 0.295054)
sub 0.630000 0.000000 0.630000 ( 0.631870)
gsub 2.090000 0.000000 2.090000 ( 2.094368)
[1..-1] 0.230000 0.010000 0.240000 ( 0.232846)
slice 0.320000 0.000000 0.320000 ( 0.320714)
length 0.340000 0.000000 0.340000 ( 0.341918)
eat! 0.460000 0.000000 0.460000 ( 0.452724)
reverse 0.400000 0.000000 0.400000 ( 0.399465)
And another using /^./ to find the first character:
require 'benchmark'
N = 1_000_000
class String
def eat!(how_many = 1)
self.replace self[how_many..-1]
end
def first(how_many = 1)
self[0...how_many]
end
def shift(how_many = 1)
shifted = first(how_many)
self.replace self[how_many..-1]
shifted
end
alias_method :shift!, :shift
end
class Array
def eat!(how_many = 1)
self.replace self[how_many..-1]
end
end
puts RUBY_VERSION
STR = "[12,23,987,43"
Benchmark.bm(7) do |b|
b.report('[0]') { N.times { "[12,23,987,43"[0] = '' } }
b.report('[/^./]') { N.times { "[12,23,987,43"[/^./] = '' } }
b.report('[/^\[/]') { N.times { "[12,23,987,43"[/^\[/] = '' } }
b.report('sub+') { N.times { "[12,23,987,43".sub(/^\[+/, "") } }
b.report('sub') { N.times { "[12,23,987,43".sub(/^\[/, "") } }
b.report('gsub') { N.times { "[12,23,987,43".gsub(/^\[/, "") } }
b.report('[1..-1]') { N.times { "[12,23,987,43"[1..-1] } }
b.report('slice') { N.times { "[12,23,987,43".slice!(0) } }
b.report('length') { N.times { "[12,23,987,43"[1..STR.length] } }
b.report('eat!') { N.times { "[12,23,987,43".eat! } }
b.report('reverse') { N.times { "[12,23,987,43".reverse.chop.reverse } }
end
Which results in:
# >> 2.1.5
# >> user system total real
# >> [0] 0.270000 0.000000 0.270000 ( 0.270165)
# >> [/^./] 0.430000 0.000000 0.430000 ( 0.432417)
# >> [/^\[/] 0.460000 0.000000 0.460000 ( 0.458221)
# >> sub+ 0.590000 0.000000 0.590000 ( 0.590284)
# >> sub 0.590000 0.000000 0.590000 ( 0.596366)
# >> gsub 1.880000 0.010000 1.890000 ( 1.885892)
# >> [1..-1] 0.230000 0.000000 0.230000 ( 0.223045)
# >> slice 0.300000 0.000000 0.300000 ( 0.299175)
# >> length 0.320000 0.000000 0.320000 ( 0.325841)
# >> eat! 0.410000 0.000000 0.410000 ( 0.409306)
# >> reverse 0.390000 0.000000 0.390000 ( 0.393044)
Here's another update on faster hardware and a newer version of Ruby:
2.3.1
user system total real
[0] 0.200000 0.000000 0.200000 ( 0.204307)
[/^./] 0.390000 0.000000 0.390000 ( 0.387527)
[/^\[/] 0.360000 0.000000 0.360000 ( 0.360400)
sub+ 0.490000 0.000000 0.490000 ( 0.492083)
sub 0.480000 0.000000 0.480000 ( 0.487862)
gsub 1.990000 0.000000 1.990000 ( 1.988716)
[1..-1] 0.180000 0.000000 0.180000 ( 0.181673)
slice 0.260000 0.000000 0.260000 ( 0.266371)
length 0.270000 0.000000 0.270000 ( 0.267651)
eat! 0.400000 0.010000 0.410000 ( 0.398093)
reverse 0.340000 0.000000 0.340000 ( 0.344077)
Why is gsub so slow?
After doing a search/replace, gsub has to check for possible additional matches before it can tell if it's finished. sub only does one and finishes. Consider gsub like it's a minimum of two sub calls.
Also, it's important to remember that gsub, and sub can also be handicapped by poorly written regex which match much more slowly than a sub-string search. If possible anchor the regex to get the most speed from it. There are answers here on Stack Overflow demonstrating that so search around if you want more information.
We can use slice to do this:
val = "abc"
=> "abc"
val.slice!(0)
=> "a"
val
=> "bc"
Using slice! we can delete any character by specifying its index.
Ruby 2.5+
As of Ruby 2.5 you can use delete_prefix or delete_prefix! to achieve this in a readable manner.
In this case "[12,23,987,43".delete_prefix("[").
More info here:
Official docs
https://blog.jetbrains.com/ruby/2017/10/10-new-features-in-ruby-2-5/
https://bugs.ruby-lang.org/issues/12694
'invisible'.delete_prefix('in') #=> "visible"
'pink'.delete_prefix('in') #=> "pink"
N.B. you can also use this to remove items from the end of a string with delete_suffix and delete_suffix!
'worked'.delete_suffix('ed') #=> "work"
'medical'.delete_suffix('ed') #=> "medical"
Docs
https://bugs.ruby-lang.org/issues/13665
Edit:
Using the Tin Man's benchmark setup, it looks pretty quick too (under the last two entries delete_p and delete_p!). Doesn't quite pip the previous faves for speed, though is very readable.
2.5.0
user system total real
[0] 0.174766 0.000489 0.175255 ( 0.180207)
[/^./] 0.318038 0.000510 0.318548 ( 0.323679)
[/^\[/] 0.372645 0.001134 0.373779 ( 0.379029)
sub+ 0.460295 0.001510 0.461805 ( 0.467279)
sub 0.498351 0.001534 0.499885 ( 0.505729)
gsub 1.669837 0.005141 1.674978 ( 1.682853)
[1..-1] 0.199840 0.000976 0.200816 ( 0.205889)
slice 0.279661 0.000859 0.280520 ( 0.285661)
length 0.268362 0.000310 0.268672 ( 0.273829)
eat! 0.341715 0.000524 0.342239 ( 0.347097)
reverse 0.335301 0.000588 0.335889 ( 0.340965)
delete_p 0.222297 0.000832 0.223129 ( 0.228455)
delete_p! 0.225798 0.000747 0.226545 ( 0.231745)
I prefer this:
str = "[12,23,987,43"
puts str[1..-1]
>> 12,23,987,43
If you always want to strip leading brackets:
"[12,23,987,43".gsub(/^\[/, "")
If you just want to remove the first character, and you know it won't be in a multibyte character set:
"[12,23,987,43"[1..-1]
or
"[12,23,987,43".slice(1..-1)
Inefficient alternative:
str.reverse.chop.reverse
For example : a = "One Two Three"
1.9.2-p290 > a = "One Two Three"
=> "One Two Three"
1.9.2-p290 > a = a[1..-1]
=> "ne Two Three"
1.9.2-p290 > a = a[1..-1]
=> "e Two Three"
1.9.2-p290 > a = a[1..-1]
=> " Two Three"
1.9.2-p290 > a = a[1..-1]
=> "Two Three"
1.9.2-p290 > a = a[1..-1]
=> "wo Three"
In this way you can remove one by one first character of the string.
Easy way:
str = "[12,23,987,43"
removed = str[1..str.length]
Awesome way:
class String
def reverse_chop()
self[1..self.length]
end
end
"[12,23,987,43".reverse_chop()
(Note: prefer the easy way :) )
Thanks to #the-tin-man for putting together the benchmarks!
Alas, I don't really like any of those solutions. Either they require an extra step to get the result ([0] = '', .strip!) or they aren't very semantic/clear about what's happening ([1..-1]: "Um, a range from 1 to negative 1? Yearg?"), or they are slow or lengthy to write out (.gsub, .length).
What we are attempting is a 'shift' (in Array parlance), but returning the remaining characters, rather than what was shifted off. Let's use our Ruby to make this possible with strings! We can use the speedy bracket operation, but give it a good name, and take an arg to specify how much we want to chomp off the front:
class String
def eat!(how_many = 1)
self.replace self[how_many..-1]
end
end
But there is more we can do with that speedy-but-unwieldy bracket operation. While we are at it, for completeness, let's write a #shift and #first for String (why should Array have all the fun‽‽), taking an arg to specify how many characters we want to remove from the beginning:
class String
def first(how_many = 1)
self[0...how_many]
end
def shift(how_many = 1)
shifted = first(how_many)
self.replace self[how_many..-1]
shifted
end
alias_method :shift!, :shift
end
Ok, now we have a good clear way of pulling characters off the front of a string, with a method that is consistent with Array#first and Array#shift (which really should be a bang method??). And we can easily get the modified string as well with #eat!. Hm, should we share our new eat!ing power with Array? Why not!
class Array
def eat!(how_many = 1)
self.replace self[how_many..-1]
end
end
Now we can:
> str = "[12,23,987,43" #=> "[12,23,987,43"
> str.eat! #=> "12,23,987,43"
> str #=> "12,23,987,43"
> str.eat!(3) #=> "23,987,43"
> str #=> "23,987,43"
> str.first(2) #=> "23"
> str #=> "23,987,43"
> str.shift!(3) #=> "23,"
> str #=> "987,43"
> arr = [1,2,3,4,5] #=> [1, 2, 3, 4, 5]
> arr.eat! #=> [2, 3, 4, 5]
> arr #=> [2, 3, 4, 5]
That's better!
str = "[12,23,987,43"
str[0] = ""
class String
def bye_felicia()
felicia = self.strip[0] #first char, not first space.
self.sub(felicia, '')
end
end
Using regex:
str = 'string'
n = 1 #to remove first n characters
str[/.{#{str.size-n}}\z/] #=> "tring"
I find a nice solution to be str.delete(str[0]) for its readability, though I cannot attest to it's performance.
list = [1,2,3,4]
list.drop(1)
# => [2,3,4]
List drops one or more elements from the start of the array, does not mutate the array, and returns the array itself instead of the dropped element.