How %r(..) differs from /../ in Regexp creation in Ruby? [closed] - ruby

Closed 10 years ago.
I am using Ruby 1.9.3 and am new to the platform.
From the docs I learned that a Regexp can be created in either of these ways:
%r{pattern}
/pattern/
Is there any difference between the two styles above in terms of matching speed, restrictions on where each can or cannot be used, and so on?
I found one, shown below:
irb(main):006:0> s= '2/3'
=> "2/3"
irb(main):008:0> /2\/3/ =~ s
=> 0
irb(main):009:0> %r(2/3) =~ s
=> 0
irb(main):010:0> exit
The one difference I found between %r(..) and /../ is that we don't need \ to escape /. Are there any more, from your practical experience?
EDIT
As per @akashspeaking's suggestion I tried this and found what he said:
> re=%r(2/3)
=> /2\/3/ # Displayed in /../ form, which suggests Ruby internally converts %r(..) to /../, a step it would not need for a pattern written as /../ directly.
>
From the above it is very clear theoretically that %r(..) is slower than the /../.
Can anyone help me by executing quickbm(10000000) { /2\/3/ =~ s } and quickbm(10000000) { %r(2/3) =~ s } to measure the execution time? I don't have the required benchmark gem installed here, but I am curious to know the output of the two. If anyone has it, could you try it on your terminal and paste the details here?
Thanks

There is absolutely no difference in %r/foo/ and /foo/.
irb(main):001:0> %r[foo]
=> /foo/
irb(main):002:0> %r{foo}
=> /foo/
irb(main):003:0> /foo/
=> /foo/
The source script will be analyzed by the interpreter at startup and both will be converted to a regexp, which, at run-time, will be the same.
The only difference is the source-code, not the executable. Try this:
require 'benchmark'
str = (('a'..'z').to_a * 256).join + 'foo'
n = 1_000_000
puts RUBY_VERSION, n
puts
Benchmark.bm do |b|
b.report('%r') { n.times { str[%r/foo/] } }
b.report('/') { n.times { str[/foo/] } }
end
Which outputs:
1.9.3
1000000
user system total real
%r 8.000000 0.000000 8.000000 ( 8.014767)
/ 8.000000 0.000000 8.000000 ( 8.010062)
That's on an old MacBook Pro running 10.8.2. Think about it, that's 6,656,000,000 (26 * 256 * 1,000,000) characters being searched and both returned what's essentially the same value. Coincidence? I think not.
Running this on a machine and getting an answer that varies significantly between the two tests on that CPU would indicate a difference in run-time performance of the two syntactically different ways of specifying the same thing. I seriously doubt that will happen.
EDIT:
Running it multiple times shows the randomness in action. I adjusted the code a bit to make it do five loops across the benchmarks this morning. The system was scanning the disk while running the tests so they took a little longer, but they still show minor random differences between the two runs:
require 'benchmark'
str = (('a'..'z').to_a * 256).join + 'foo'
n = 1_000_000
puts RUBY_VERSION, n
puts
regex = 'foo'
Benchmark.bm(2) do |b|
5.times do
b.report('%r') { n.times { str[%r/#{ regex }/] } }
b.report('/') { n.times { str[/#{ regex }/] } }
end
end
And the results:
# user system total real
%r 12.440000 0.030000 12.470000 ( 12.475312)
/ 12.420000 0.030000 12.450000 ( 12.455737)
%r 12.400000 0.020000 12.420000 ( 12.431750)
/ 12.400000 0.020000 12.420000 ( 12.417107)
%r 12.430000 0.030000 12.460000 ( 12.467275)
/ 12.390000 0.020000 12.410000 ( 12.418452)
%r 12.400000 0.030000 12.430000 ( 12.432781)
/ 12.390000 0.020000 12.410000 ( 12.412609)
%r 12.410000 0.020000 12.430000 ( 12.427783)
/ 12.420000 0.020000 12.440000 ( 12.449336)
Running about two seconds later:
# user system total real
%r 12.360000 0.020000 12.380000 ( 12.390146)
/ 12.370000 0.030000 12.400000 ( 12.391151)
%r 12.370000 0.020000 12.390000 ( 12.397819)
/ 12.380000 0.020000 12.400000 ( 12.399413)
%r 12.410000 0.020000 12.430000 ( 12.440236)
/ 12.420000 0.030000 12.450000 ( 12.438158)
%r 12.560000 0.040000 12.600000 ( 12.969364)
/ 12.640000 0.050000 12.690000 ( 12.810051)
%r 13.160000 0.120000 13.280000 ( 14.624694) # <-- opened new browser window
/ 12.650000 0.040000 12.690000 ( 13.040637)
There is no consistent difference in speed.

Here I found one difference between %r(..) and /../ is we don't need
to use \ to escape /.
That is their primary use. Unlike strings, whose delimiters change their semantics, the only real differences between the regular expression literals are the delimiters themselves.
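A minimal sketch of that point, assuming a pattern that contains several slashes. The path and variable names are made up for illustration; both literals should behave the same when matching:

```ruby
# Illustrative values only: a path with several slashes.
path = "/usr/local/bin"

# With /../ every slash in the pattern has to be escaped.
slash_literal = /^\/usr\/local\/(\w+)/

# With %r{..} the slashes can be written as they are.
percent_literal = %r{^/usr/local/(\w+)}

# Both literals capture the same text from the same string.
slash_capture   = path[slash_literal, 1]
percent_capture = path[percent_literal, 1]

puts slash_capture
puts percent_capture
```

The only thing that changes between the two is how much escaping the source code needs.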

See also the thread "The Ruby %r{ } expression" and two paragraphs of this doc: http://www.ruby-doc.org/core-1.9.3/Regexp.html
There is no difference, except that %r lets you use other symbols as delimiters instead of //.

If you use %r notation, you can use an arbitrary symbol as delimiter. For example, you can write a regex as any of the following (and more):
%r{pattern}
%r[pattern]
%r(pattern)
%r!pattern!
This can be useful if your regex contains lots of '/'.
Note: no matter which delimiter you use, the regex is stored and displayed in the default form, i.e.
%r:pattern: will be shown as /pattern/
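As a quick check of the note above, the following sketch builds the same pattern with several delimiters and compares the results. The variable names are only for illustration:

```ruby
# The same pattern written with different %r delimiters.
variants = [%r{foo}, %r[foo], %r(foo), %r!foo!, %r:foo:]

# Each one compares equal to the plain /foo/ literal...
all_equal = variants.all? { |re| re == /foo/ }

# ...and each one is displayed in the default /../ form.
displayed = variants.map(&:inspect).uniq

puts all_equal
puts displayed.inspect
```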

Related

Manipulate string in ruby

I have a group of string variables that look something like "height_low". I want to use something clean, like gsub, to get rid of the underscore and everything after it, so the result is "height". Does someone have a solution for this? Thanks.
Try this:
strings.map! {|s| s.split('_').first}
Shorter:
my_string.split('_').first
The unavoidable regex answer. (Assuming strings is an array of strings.)
strings.map! { |s| s[/^.+?(?=_)/] }
FWIW, solutions based on String#split perform poorly because they have to parse the whole string and allocate an array. Their performance degrades as the number of underscores increases. The following performs better:
string[0, string.index("_") || string.length]
Benchmark results (with number of underscores in parentheses):
user system total real
String#split (0) 0.640000 0.000000 0.640000 ( 0.650323)
String#split (1) 0.760000 0.000000 0.760000 ( 0.759951)
String#split (9) 2.180000 0.010000 2.190000 ( 2.192356)
String#index (0) 0.610000 0.000000 0.610000 ( 0.625972)
String#index (1) 0.580000 0.010000 0.590000 ( 0.589463)
String#index (9) 0.600000 0.000000 0.600000 ( 0.605253)
Benchmarks:
strings = ["x", "x_x", "x_x_x_x_x_x_x_x_x_x"]
Benchmark.bm(16) do |bm|
strings.each do |string|
bm.report("String#split (#{string.count("_")})") do
1000000.times { string.split("_").first }
end
end
strings.each do |string|
bm.report("String#index (#{string.count("_")})") do
1000000.times { string[0, string.index("_") || string.length] }
end
end
end
Try as below using str[regexp, capture] → new_str or nil:
If a Regexp is supplied, the matching portion of the string is returned. If a capture, which may be a capture group index or name, follows the regular expression, that component of the MatchData is returned instead.
strings.map { |s| s[/(.*?)_.*$/,1] }
If you're looking for something "like gsub", why not just use gsub?
"height_low".gsub(/_.*$/, "") #=> "height"
In my opinion though, this is a bit cleaner:
"height_low".split('_').first #=> "height"
Another option is to use partition:
"height_low".partition("_").first #=> "height"
Learn to think in terms of searches vs. replacements. It's usually easier, faster, and cleaner to search for, and extract, what you want, than it is to search for, and strip, what you don't want.
Consider this:
'a_b_c'[/^(.*?)_/, 1] # => "a"
It looks for only what you want, which is the text from the start of the string, up to _. Everything preceding _ is captured, and returned.
The alternates:
'a_b_c'.sub(/_.+$/, '') # => "a"
'a_b_c'.gsub(/_.+$/, '') # => "a"
have to scan from the first _ through to the end of the string before the match is complete and the string can be truncated.
Here's a little benchmark showing how that affects things:
require 'fruity'
compare do
string_capture { 'a_b_c'[/^(.*?)_/, 1] }
string_sub { 'a_b_c'.sub(/_.+$/, '') }
string_gsub { 'a_b_c'.gsub(/_.+$/, '') }
look_ahead { 'a_b_c'[/^.+?(?=_)/] }
string_index { 'a_b_c'[0, 'a_b_c'.index("_") || 'a_b_c'.length] }
end
# >> Running each test 8192 times. Test will take about 1 second.
# >> string_index is faster than string_capture by 19.999999999999996% ± 10.0%
# >> string_capture is similar to look_ahead
# >> look_ahead is faster than string_sub by 70.0% ± 10.0%
# >> string_sub is faster than string_gsub by 2.9x ± 0.1
Again, searching is going to be faster than any sort of replace, so think about what you're doing.
The downside of the "search" regex-based tactics like "string_capture" and "look_ahead" is that they don't handle a missing _, so if there's any question whether your string will have _, use the "string_index" method, which falls back to string.length to grab the entire string.
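The fallback behaviour described above can be checked with a small sketch. The helper name prefix_before_underscore is hypothetical and not part of any answer; it simply wraps the index-based slice:

```ruby
# Hypothetical helper wrapping the index-based slice from the answers.
def prefix_before_underscore(string)
  # Fall back to the full length when there is no underscore.
  string[0, string.index("_") || string.length]
end

with_underscore    = "height_low"
without_underscore = "height"

# The index-based slice handles both inputs.
index_with    = prefix_before_underscore(with_underscore)
index_without = prefix_before_underscore(without_underscore)

# The regex "search" tactics return nil when no underscore is present.
capture_without   = without_underscore[/^(.*?)_/, 1]
lookahead_without = without_underscore[/^.+?(?=_)/]

puts index_with
puts index_without
puts capture_without.inspect
puts lookahead_without.inspect
```

If nil is an acceptable signal for "no underscore found", the regex versions are still usable; otherwise the index-based slice avoids an extra nil check.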

Fastest way to check if a string matches a regexp in ruby?

What is the fastest way to check if a string matches a regular expression in Ruby?
My problem is that I have to "egrep" through a huge list of strings to find the ones that match a regexp that is given at runtime. I only care about whether the string matches the regexp, not where it matches, nor what the content of the matching groups is. I hope this assumption can be used to reduce the amount of time my code spends matching regexps.
I load the regexp with
pattern = Regexp.new(ptx).freeze
I have found that string =~ pattern is slightly faster than string.match(pattern).
Are there other tricks or shortcuts that can be used to make this test even faster?
Starting with Ruby 2.4.0, you may use Regexp#match?:
pattern.match?(string)
Regexp#match? is explicitly listed as a performance enhancement in the release notes for 2.4.0, as it avoids object allocations performed by other methods such as Regexp#match and =~:
Regexp#match?
Added Regexp#match?, which executes a regexp match without creating a back reference object and changing $~ to reduce object allocation.
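A minimal sketch of what the release note describes, assuming Ruby 2.4 or later. The method names here are made up for illustration; each check runs in its own method so that $~ starts out as nil:

```ruby
# Requires Ruby 2.4+ for Regexp#match?.

# "egrep"-style filter: keep only the lines that match.
def matching_lines(lines, pattern)
  lines.select { |line| pattern.match?(line) }
end

# Returns $~ after match?; the predicate leaves it untouched.
def last_match_after_predicate(pattern, string)
  pattern.match?(string)
  $~
end

# Returns $~ after =~; the operator stores a MatchData there.
def last_match_after_operator(pattern, string)
  pattern =~ string
  $~
end

pattern = Regexp.new('a+b').freeze
lines   = %w[aacaabc xyz ab ba]

selected       = matching_lines(lines, pattern)
after_predicate = last_match_after_predicate(pattern, "aacaabc")
after_operator  = last_match_after_operator(pattern, "aacaabc")

puts selected.inspect
puts after_predicate.inspect
puts after_operator.inspect
```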
This is a simple benchmark:
require 'benchmark'
"test123" =~ /1/
=> 4
Benchmark.measure{ 1000000.times { "test123" =~ /1/ } }
=> 0.610000 0.000000 0.610000 ( 0.578133)
"test123"[/1/]
=> "1"
Benchmark.measure{ 1000000.times { "test123"[/1/] } }
=> 0.718000 0.000000 0.718000 ( 0.750010)
irb(main):019:0> "test123".match(/1/)
=> #<MatchData "1">
Benchmark.measure{ 1000000.times { "test123".match(/1/) } }
=> 1.703000 0.000000 1.703000 ( 1.578146)
So =~ is faster, but it depends on what you want as the return value. If you just want to check whether the text matches a regex, use =~.
This is the benchmark I have run after finding some articles around the net.
With 2.4.0 the winner is re.match?(str) (as suggested by @wiktor-stribiżew). On previous versions, re =~ str seems to be fastest, although str =~ re is almost as fast.
#!/usr/bin/env ruby
require 'benchmark'
str = "aacaabc"
re = Regexp.new('a+b').freeze
N = 4_000_000
Benchmark.bm do |b|
b.report("str.match re\t") { N.times { str.match re } }
b.report("str =~ re\t") { N.times { str =~ re } }
b.report("str[re] \t") { N.times { str[re] } }
b.report("re =~ str\t") { N.times { re =~ str } }
b.report("re.match str\t") { N.times { re.match str } }
if re.respond_to?(:match?)
b.report("re.match? str\t") { N.times { re.match? str } }
end
end
Results MRI 1.9.3-p551:
$ ./bench-re.rb | sort -t $'\t' -k 2
user system total real
re =~ str 2.390000 0.000000 2.390000 ( 2.397331)
str =~ re 2.450000 0.000000 2.450000 ( 2.446893)
str[re] 2.940000 0.010000 2.950000 ( 2.941666)
re.match str 3.620000 0.000000 3.620000 ( 3.619922)
str.match re 4.180000 0.000000 4.180000 ( 4.180083)
Results MRI 2.1.5:
$ ./bench-re.rb | sort -t $'\t' -k 2
user system total real
re =~ str 1.150000 0.000000 1.150000 ( 1.144880)
str =~ re 1.160000 0.000000 1.160000 ( 1.150691)
str[re] 1.330000 0.000000 1.330000 ( 1.337064)
re.match str 2.250000 0.000000 2.250000 ( 2.255142)
str.match re 2.270000 0.000000 2.270000 ( 2.270948)
Results MRI 2.3.3 (there is a regression in regex matching, it seems):
$ ./bench-re.rb | sort -t $'\t' -k 2
user system total real
re =~ str 3.540000 0.000000 3.540000 ( 3.535881)
str =~ re 3.560000 0.000000 3.560000 ( 3.560657)
str[re] 4.300000 0.000000 4.300000 ( 4.299403)
re.match str 5.210000 0.010000 5.220000 ( 5.213041)
str.match re 6.000000 0.000000 6.000000 ( 6.000465)
Results MRI 2.4.0:
$ ./bench-re.rb | sort -t $'\t' -k 2
user system total real
re.match? str 0.690000 0.010000 0.700000 ( 0.682934)
re =~ str 1.040000 0.000000 1.040000 ( 1.035863)
str =~ re 1.040000 0.000000 1.040000 ( 1.042963)
str[re] 1.340000 0.000000 1.340000 ( 1.339704)
re.match str 2.040000 0.000000 2.040000 ( 2.046464)
str.match re 2.180000 0.000000 2.180000 ( 2.174691)
What about re === str (case compare)?
Since it evaluates to true or false and has no need for storing matches, returning match index and that stuff, I wonder if it would be an even faster way of matching than =~.
OK, I tested this. =~ is still faster than ===, even if you have multiple capture groups; however, === is faster than the other options.
BTW, what good is freeze? I couldn't measure any performance boost from it.
Depending on how complicated your regular expression is, you could possibly just use simple string slicing. I'm not sure about the practicality of this for your application or whether or not it would actually offer any speed improvements.
'testsentence'['stsen']
=> 'stsen' # evaluates to true
'testsentence'['koala']
=> nil # evaluates to false
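A small sketch of the plain-string check above, assuming the pattern really is a fixed substring with no regex features. String#include? is shown alongside the slice because it returns true or false directly:

```ruby
haystack = 'testsentence'

# Slicing with a String returns the substring or nil.
slice_hit  = haystack['stsen']
slice_miss = haystack['koala']

# String#include? answers the same question with a boolean.
include_hit  = haystack.include?('stsen')
include_miss = haystack.include?('koala')

puts slice_hit.inspect
puts slice_miss.inspect
puts include_hit
puts include_miss
```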
What I am wondering is if there is any strange way to make this check even faster, maybe exploiting some strange method in Regexp or some weird construct.
Regexp engines vary in how they implement searches, but, in general, anchor your patterns for speed, and avoid greedy matches, especially when searching long strings.
The best thing to do, until you're familiar with how a particular engine works, is to do benchmarks and add/remove anchors, try limiting searches, use wildcards vs. explicit matches, etc.
The Fruity gem is very useful for quickly benchmarking things, because it's smart. Ruby's built-in Benchmark code is also useful, though you can write tests that fool you by not being careful.
I've used both in many answers here on Stack Overflow, so you can search through my answers and will see lots of little tricks and results to give you ideas of how to write faster code.
The biggest thing to remember is, it's bad to prematurely optimize your code before you know where the slowdowns occur.
To complete Wiktor Stribiżew's and Dougui's answers, I would say that /regex/.match?("string") is about as fast as "string".match?(/regex/).
Ruby 2.4.0 (10 000 000 ~2 sec)
2.4.0 > require 'benchmark'
=> true
2.4.0 > Benchmark.measure{ 10000000.times { /^CVE-[0-9]{4}-[0-9]{4,}$/.match?("CVE-2018-1589") } }
=> #<Benchmark::Tms:0x005563da1b1c80 @label="", @real=2.2060338060000504, @cstime=0.0, @cutime=0.0, @stime=0.04000000000000001, @utime=2.17, @total=2.21>
2.4.0 > Benchmark.measure{ 10000000.times { "CVE-2018-1589".match?(/^CVE-[0-9]{4}-[0-9]{4,}$/) } }
=> #<Benchmark::Tms:0x005563da139eb0 @label="", @real=2.260814556000696, @cstime=0.0, @cutime=0.0, @stime=0.010000000000000009, @utime=2.2500000000000004, @total=2.2600000000000007>
Ruby 2.6.2 (100 000 000 ~20 sec)
irb(main):001:0> require 'benchmark'
=> true
irb(main):005:0> Benchmark.measure{ 100000000.times { /^CVE-[0-9]{4}-[0-9]{4,}$/.match?("CVE-2018-1589") } }
=> #<Benchmark::Tms:0x0000562bc83e3768 @label="", @real=24.60139879199778, @cstime=0.0, @cutime=0.0, @stime=0.010000999999999996, @utime=24.565644999999996, @total=24.575645999999995>
irb(main):004:0> Benchmark.measure{ 100000000.times { "CVE-2018-1589".match?(/^CVE-[0-9]{4}-[0-9]{4,}$/) } }
=> #<Benchmark::Tms:0x0000562bc846aee8 @label="", @real=24.634255946999474, @cstime=0.0, @cutime=0.0, @stime=0.010046, @utime=24.598276, @total=24.608321999999998>
Note: times vary; sometimes /regex/.match?("string") is faster and sometimes "string".match?(/regex/). The differences may be due only to machine activity.

Simple way for removing all non word characters

I'd like to remove all characters that are not letters from a string, in the simplest way.
For example
from "a,sd3 31ds" to "asdds"
I can do it with something like this:
"a,sd3 31ds".gsub(/\W/, "").gsub(/\d/,"")
# => "asdds"
but it looks a little bit awkward. Maybe it is possible to merge these regexes into one?
"a,sd3 31ds".gsub(/(\W|\d)/, "")
I would go for the regexp /[\W\d]+/. It is potentially faster than e.g. /(\W|\d)/.
require 'benchmark'
N = 500_000
Regexps = [ "(\\W|\\d)", "(\\W|\\d)+", "(?:\\W|\\d)", "(?:\\W|\\d)+",
"\\W|\\d", "[\\W\\d]", "[\\W\\d]+" ]
Benchmark.bm(15) do |x|
Regexps.each do | re_str |
re = Regexp.new(re_str)
x.report("/#{re_str}/:") { N.times { "a,sd3 31ds".gsub(re, "") }}
end
end
gives (with ruby 2.0.0p195 [x64-mingw32])
user system total real
/(\W|\d)/: 1.950000 0.000000 1.950000 ( 1.951437)
/(\W|\d)+/: 1.794000 0.000000 1.794000 ( 1.787569)
/(?:\W|\d)/: 1.857000 0.000000 1.857000 ( 1.855515)
/(?:\W|\d)+/: 1.638000 0.000000 1.638000 ( 1.626698)
/\W|\d/: 1.856000 0.000000 1.856000 ( 1.865506)
/[\W\d]/: 1.732000 0.000000 1.732000 ( 1.754596)
/[\W\d]+/: 1.622000 0.000000 1.622000 ( 1.617705)
You can do this with the regex "OR".
"205h2n0bn r0".gsub(/\W|\d/, "")
will do the trick :)
What about
"a,sd3 31ds".gsub(/\W|\d/,"")
You can always join regular expressions by | to express an "or".
You can try this regex:
\P{L}
which matches anything that is not a Unicode letter, but I don't know whether Ruby supports this class.
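A short sketch comparing the regex variants suggested so far, assuming Ruby 1.9 or later, where \p{L} and \P{L} are available in regular expressions. It also shows one difference worth knowing: \W does not match the underscore, while \P{L} does.

```ruby
input = "a,sd3 31ds"

# Character class combining non-word characters and digits.
by_class = input.gsub(/[\W\d]+/, "")

# Unicode property: everything that is not a letter.
by_property = input.gsub(/\P{L}/, "")

# Underscore counts as a word character, so \W leaves it in place.
underscore_input = "a_b1"
class_keeps_underscore    = underscore_input.gsub(/[\W\d]+/, "")
property_drops_underscore = underscore_input.gsub(/\P{L}/, "")

puts by_class
puts by_property
puts class_keeps_underscore
puts property_drops_underscore
```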
A non regex solution:
"a,sd3 31ds".delete('^A-Za-z')

Remove "@" sign and everything after it in Ruby

I am working on an application where I need to pass on anything before the "@" sign from the user's email address as his/her first name and last name. For example, if the user has the email address "user@example.com", then when the user submits the form I remove "@example.com" from the email and assign "user" as the first and last name.
I have done research but was not able to find a way of doing this in Ruby. Any suggestions?
You can split on "@" and just use the first part.
email.split("@")[0]
That will give you the first part before the "@".
To catch anything before the @ sign:
my_string = "user@example.com"
substring = my_string[/[^@]+/]
# => "user"
Just split at the @ symbol and grab what went before it.
string.split('@')[0]
String#split will be useful. Given a string and an argument, it returns an array, splitting the string up into separate elements on that argument. So if you had:
e = "test@testing.com"
e.split("@")
#=> ["test", "testing.com"]
Thus you would take e.split("@")[0] for the first part of the address.
Use gsub and a regular expression:
first_name = email.gsub(/@[^\s]+/,"")
irb(main):011:0> Benchmark.bmbm do |x|
irb(main):012:1* email = "user@domain.type"
irb(main):013:1> x.report("split"){100.times{|n| first_name = email.split("@")[0]}}
irb(main):014:1> x.report("regex"){100.times{|n| first_name = email.gsub(/@[a-z.]+/,"")}}
irb(main):015:1> end
Rehearsal -----------------------------------------
split 0.000000 0.000000 0.000000 ( 0.000000)
regex 0.000000 0.000000 0.000000 ( 0.001000)
-------------------------------- total: 0.000000sec
user system total real
split 0.000000 0.000000 0.000000 ( 0.001000)
regex 0.000000 0.000000 0.000000 ( 0.000000)
=> [#<Benchmark::Tms:0x490b810 @label="", @stime=0.0, @real=0.00100016593933105, @utime=0.0, @cstime=0.0, @total=0.0, @cutime=0.0>, #<Benchmark::Tms:0x4910bb0 @label="", @stime=0.0, @real=0.0, @utime=0.0, @cstime=0.0, @total=0.0, @cutime=0.0>]
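The approaches above can be put side by side in a short sketch. The helper before_separator is hypothetical and takes the separator as a parameter (here the at sign of an email address); it uses String#partition, which also copes with input that contains no separator at all:

```ruby
# Hypothetical helper: everything before the first separator.
def before_separator(string, separator)
  # partition returns [head, separator, tail]; head is the whole
  # string when the separator is absent.
  string.partition(separator).first
end

email = "user@example.com"

by_split     = email.split("@")[0]
by_slice     = email[/[^@]+/]
by_gsub      = email.gsub(/@[^\s]+/, "")
by_partition = before_separator(email, "@")

# Input without a separator comes back unchanged.
no_separator = before_separator("user", "@")

puts by_split
puts by_slice
puts by_gsub
puts by_partition
puts no_separator
```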

Is there a performance gain in using single quotes vs double quotes in ruby?

Do you know if using double quotes instead of single quotes decreases performance in any meaningful way in Ruby 1.8 and 1.9?
So if I type
question = 'my question'
is it faster than
question = "my question"
I imagine that ruby tries to figure out if something needs to be evaluated when it encounters double quotes and probably spends some cycles doing just that.
Summary: no speed difference; this great collaborative Ruby style guide recommends being consistent. I now use 'string' unless interpolation is needed (option A in the guide) and like it, but you will typically see more code with "string".
Details:
Theoretically, it can make a difference when your code is parsed, but not only should you not care about parse time in general (negligible compared to execution time), you won't be able to find a significant difference in this case.
The important thing is that when it gets executed it will be exactly the same.
Benchmarking this only shows a lack of understanding of how Ruby works. In both cases, the strings will get parsed to a tSTRING_CONTENT (see the source in parse.y). In other words, the CPU will go through the exact same operations when creating 'string' or "string". The exact same bits will flip the exact same way. Benchmarking this will only show differences that are not significant and due to other factors (GC kicking in, etc.); remember, there can't be any difference in this case! Micro benchmarks like these are difficult to get right. See my gem fruity for a decent tool for this.
Note that if there is interpolation of the form "...#{...}...", this gets parsed to a tSTRING_DBEG, a bunch of tSTRING_DVAR for each expression in #{...} and a final tSTRING_DEND. That's only if there is interpolation, though, which is not what the OP is about.
I used to suggest you use double quotes everywhere (makes it easier to actually add that #{some_var} later on), but I now use single quotes unless I need interpolation, \n, etc... I like it visually and it's slightly more explicit, since there's no need to parse the string to see if it contains any expression.
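To make the distinction concrete, here is a minimal sketch of where the two quote styles agree and where they differ. The variable names are only for illustration:

```ruby
# Without escapes or interpolation the two literals are the same string.
single = 'a string'
double = "a string"
same_content = (single == double)

# Escape sequences are only processed inside double quotes.
single_escape = '\n'   # backslash followed by n
double_escape = "\n"   # a single newline character

# Interpolation is only performed inside double quotes.
name = 'Ruby'
single_interp = 'Hello #{name}'   # kept as literal text
double_interp = "Hello #{name}"   # name is substituted

puts same_content
puts single_escape.length
puts double_escape.length
puts single_interp
puts double_interp
```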
$ ruby -v
ruby 1.9.3p0 (2011-10-30 revision 33570) [x86_64-darwin11.0.0]
$ cat benchmark_quotes.rb
# As of Ruby 1.9 Benchmark must be required
require 'benchmark'
n = 1000000
Benchmark.bm(15) do |x|
x.report("assign single") { n.times do; c = 'a string'; end}
x.report("assign double") { n.times do; c = "a string"; end}
x.report("concat single") { n.times do; 'a string ' + 'b string'; end}
x.report("concat double") { n.times do; "a string " + "b string"; end}
end
$ ruby benchmark_quotes.rb
user system total real
assign single 0.110000 0.000000 0.110000 ( 0.116867)
assign double 0.120000 0.000000 0.120000 ( 0.116761)
concat single 0.280000 0.000000 0.280000 ( 0.276964)
concat double 0.270000 0.000000 0.270000 ( 0.278146)
Note: I've updated this to make it work with newer Ruby versions, and cleaned up the header, and run the benchmark on a faster system.
This answer omits some key points. See especially these other answers concerning interpolation and the reason there is no significant difference in performance when using single vs. double quotes.
No one happened to measure concatenation vs interpolation though:
$ ruby -v
ruby 1.8.7 (2008-08-11 patchlevel 72) [i686-darwin9.6.2]
$ cat benchmark_quotes.rb
require 'benchmark'
n = 1000000
Benchmark.bm do |x|
x.report("assign single") { n.times do; c = 'a string'; end}
x.report("assign double") { n.times do; c = "a string"; end}
x.report("assign interp") { n.times do; c = "a string #{'b string'}"; end}
x.report("concat single") { n.times do; 'a string ' + 'b string'; end}
x.report("concat double") { n.times do; "a string " + "b string"; end}
end
$ ruby -w benchmark_quotes.rb
user system total real
assign single 2.600000 1.060000 3.660000 ( 3.720909)
assign double 2.590000 1.050000 3.640000 ( 3.675082)
assign interp 2.620000 1.050000 3.670000 ( 3.704218)
concat single 3.760000 1.080000 4.840000 ( 4.888394)
concat double 3.700000 1.070000 4.770000 ( 4.818794)
Specifically, note assign interp = 2.62 vs concat single = 3.76.
As icing on the cake, I also find interpolation to be more readable than 'a' + var + 'b' especially with regard to spaces.
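One practical side of the readability point: interpolation converts the value to a string for you, while + needs an explicit to_s. A small sketch, with count as an illustrative variable:

```ruby
count = 3

# Interpolation calls to_s on the value implicitly.
interpolated = "a #{count} string"

# Concatenation needs the conversion spelled out.
concatenated = 'a ' + count.to_s + ' string'

# Without to_s, String#+ raises a TypeError for an Integer.
error_class = nil
begin
  'a ' + count
rescue TypeError => e
  error_class = e.class
end

puts interpolated
puts concatenated
puts error_class
```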
No difference - unless you're using #{some_var} style string interpolation. But you only get the performance hit if you actually do that.
Modified from Zetetic's example:
require 'benchmark'
n = 1000000
Benchmark.bm do |x|
x.report("assign single") { n.times do; c = 'a string'; end}
x.report("assign double") { n.times do; c = "a string"; end}
x.report("assign interp") { n.times do; c = "a #{n} string"; end}
x.report("concat single") { n.times do; 'a string ' + 'b string'; end}
x.report("concat double") { n.times do; "a string " + "b string"; end}
x.report("concat interp") { n.times do; "a #{n} string " + "b #{n} string"; end}
end
output
user system total real
assign single 0.370000 0.000000 0.370000 ( 0.374599)
assign double 0.360000 0.000000 0.360000 ( 0.366636)
assign interp 1.540000 0.010000 1.550000 ( 1.577638)
concat single 1.100000 0.010000 1.110000 ( 1.119720)
concat double 1.090000 0.000000 1.090000 ( 1.116240)
concat interp 3.460000 0.020000 3.480000 ( 3.535724)
Single quotes can be very slightly faster than double quotes, depending on the implementation, because the lexer doesn't have to check for #{} interpolation markers. Note that this is a parse-time cost, not a run-time cost.
That said, the actual question was whether using double quoted strings "decreases performance in any meaningful way", to which the answer is a decisive "no". The difference in performance is so incredibly small that it is completely insignificant compared to any real performance concerns. Don't waste your time.
Actual interpolation is a different story, of course. 'foo' will be almost exactly 1 second faster than "#{sleep 1; nil}foo".
Thought I'd add a comparison of 1.8.7 and 1.9.2. I ran them a few times. Variance was about +-0.01.
require 'benchmark'
n = 1000000
Benchmark.bm do |x|
x.report("assign single") { n.times do; c = 'a string'; end}
x.report("assign double") { n.times do; c = "a string"; end}
x.report("assign interp") { n.times do; c = "a #{n} string"; end}
x.report("concat single") { n.times do; 'a string ' + 'b string'; end}
x.report("concat double") { n.times do; "a string " + "b string"; end}
x.report("concat interp") { n.times do; "a #{n} string " + "b #{n} string"; end}
end
ruby 1.8.7 (2010-08-16 patchlevel 302) [x86_64-linux]
assign single 0.180000 0.000000 0.180000 ( 0.187233)
assign double 0.180000 0.000000 0.180000 ( 0.187566)
assign interp 0.880000 0.000000 0.880000 ( 0.877584)
concat single 0.550000 0.020000 0.570000 ( 0.567285)
concat double 0.570000 0.000000 0.570000 ( 0.570644)
concat interp 1.800000 0.010000 1.810000 ( 1.816955)
ruby 1.9.2p0 (2010-08-18 revision 29036) [x86_64-linux]
user system total real
assign single 0.140000 0.000000 0.140000 ( 0.144076)
assign double 0.130000 0.000000 0.130000 ( 0.142316)
assign interp 0.650000 0.000000 0.650000 ( 0.656088)
concat single 0.370000 0.000000 0.370000 ( 0.370663)
concat double 0.370000 0.000000 0.370000 ( 0.370076)
concat interp 1.420000 0.000000 1.420000 ( 1.412210)
Double quotes take twice as many key strikes to type as single quotes. I'm always in a hurry. I use single quotes. :) And yes, I consider that a "performance gain". :)
There is no significant difference in either direction. It would have to be huge for it to matter.
Except for times when you are sure that there is an actual problem with timing, optimize for programmer maintainability.
The costs of machine time are very, very small. The costs of programmer time to write code and maintain it are huge.
What good is an optimization that saves seconds, or even minutes, of runtime over thousands of runs if it means that the code is harder to maintain?
Pick a style and stick with it, but do not pick that style based on statistically insignificant milliseconds of runtime.
I tried the following:
def measure(t)
single_measures = []
double_measures = []
double_quoted_string = ""
single_quoted_string = ''
single_quoted = 0
double_quoted = 0
t.times do |i|
t1 = Time.now
single_quoted_string << 'a'
t1 = Time.now - t1
single_measures << t1
t2 = Time.now
double_quoted_string << "a"
t2 = Time.now - t2
double_measures << t2
if t1 > t2
single_quoted += 1
else
double_quoted += 1
end
end
puts "Single quoted did took longer in #{((single_quoted.to_f/t.to_f) * 100).round(2)} percent of the cases"
puts "Double quoted did took longer in #{((double_quoted.to_f/t.to_f) * 100).round(2)} percent of the cases"
single_measures_avg = single_measures.inject{ |sum, el| sum + el }.to_f / t
double_measures_avg = double_measures.inject{ |sum, el| sum + el }.to_f / t
puts "Single did took an average of #{single_measures_avg} seconds"
puts "Double did took an average of #{double_measures_avg} seconds"
puts "\n"
end
both = 10.times do |i|
measure(1000000)
end
And these are the outputs:
1.
Single quoted did took longer in 32.33 percent of the cases
Double quoted did took longer in 67.67 percent of the cases
Single did took an average of 5.032084099982639e-07 seconds
Double did took an average of 5.171539549983464e-07 seconds
2.
Single quoted did took longer in 26.9 percent of the cases
Double quoted did took longer in 73.1 percent of the cases
Single did took an average of 4.998066229983696e-07 seconds
Double did took an average of 5.223457359986066e-07 seconds
3.
Single quoted did took longer in 26.44 percent of the cases
Double quoted did took longer in 73.56 percent of the cases
Single did took an average of 4.97640888998877e-07 seconds
Double did took an average of 5.132918459987151e-07 seconds
4.
Single quoted did took longer in 26.57 percent of the cases
Double quoted did took longer in 73.43 percent of the cases
Single did took an average of 5.017136069985988e-07 seconds
Double did took an average of 5.004514459988143e-07 seconds
5.
Single quoted did took longer in 26.03 percent of the cases
Double quoted did took longer in 73.97 percent of the cases
Single did took an average of 5.059069689983285e-07 seconds
Double did took an average of 5.028807639983705e-07 seconds
6.
Single quoted did took longer in 25.78 percent of the cases
Double quoted did took longer in 74.22 percent of the cases
Single did took an average of 5.107472039991399e-07 seconds
Double did took an average of 5.216212339990241e-07 seconds
7.
Single quoted did took longer in 26.48 percent of the cases
Double quoted did took longer in 73.52 percent of the cases
Single did took an average of 5.082368429989468e-07 seconds
Double did took an average of 5.076817109989933e-07 seconds
8.
Single quoted did took longer in 25.97 percent of the cases
Double quoted did took longer in 74.03 percent of the cases
Single did took an average of 5.077162969990005e-07 seconds
Double did took an average of 5.108381859991112e-07 seconds
9.
Single quoted did took longer in 26.28 percent of the cases
Double quoted did took longer in 73.72 percent of the cases
Single did took an average of 5.148080479983138e-07 seconds
Double did took an average of 5.165793929982176e-07 seconds
10.
Single quoted did took longer in 25.03 percent of the cases
Double quoted did took longer in 74.97 percent of the cases
Single did took an average of 5.227828659989748e-07 seconds
Double did took an average of 5.218296609988378e-07 seconds
If I made no mistake, it seems to me that both take approximately the same time, even though single quoted is slightly faster in most cases.
It's certainly possible depending on the implementation, but the scanning portion of the interpreter should only look at each character once. It will need just an additional state (or possibly a set of states) and transitions to handle #{} blocks. In a table-based scanner that's going to be a single lookup to determine the transition, and it will be happening for each character anyway. When the parser gets the scanner output, it's already known that it will have to eval code in the block. So the overhead is only really the memory overhead in the scanner/parser to handle the #{} block, which you pay for either way.
Unless I'm missing something (or misremembering compiler construction details), which is also certainly possible :)
~ > ruby -v
jruby 1.6.7 (ruby-1.8.7-p357) (2012-02-22 3e82bc8) (Java HotSpot(TM) 64-Bit Server VM 1.6.0_37) [darwin-x86_64-java]
~ > cat qu.rb
require 'benchmark'
n = 1000000
Benchmark.bm do |x|
  x.report("assign single") { n.times do; c = 'a string'; end}
  x.report("assign double") { n.times do; c = "a string"; end}
  x.report("concat single") { n.times do; 'a string ' + 'b string'; end}
  x.report("concat double") { n.times do; "a string " + "b string"; end}
end
~ > ruby qu.rb
user system total real
assign single 0.186000 0.000000 0.186000 ( 0.151000)
assign double 0.062000 0.000000 0.062000 ( 0.062000)
concat single 0.156000 0.000000 0.156000 ( 0.156000)
concat double 0.124000 0.000000 0.124000 ( 0.124000)
There's one you all missed: the here document (heredoc).
Try this:
require 'benchmark'
mark = <<EOS
a string
EOS
n = 1000000
Benchmark.bm do |x|
  x.report("assign here doc") {n.times do; mark; end}
end
It gave me
`asign here doc 0.141000 0.000000 0.141000 ( 0.140625)`
and
'concat single quotes 1.813000 0.000000 1.813000 ( 1.843750)'
'concat double quotes 1.812000 0.000000 1.812000 ( 1.828125)'
So it's certainly better than concatenating and writing all those puts.
I would like to see Ruby taught more along the lines of a document manipulation language.
After all, don't we really do that in Rails, Sinatra, and running tests?
I modded Tim Snowhite's answer.
require 'benchmark'
n = 1000000
attr_accessor = :a_str_single, :b_str_single, :a_str_double, :b_str_double
@a_str_single = 'a string'
@b_str_single = 'b string'
@a_str_double = "a string"
@b_str_double = "b string"
@did_print = false
def reset!
  @a_str_single = 'a string'
  @b_str_single = 'b string'
  @a_str_double = "a string"
  @b_str_double = "b string"
end
Benchmark.bm do |x|
  x.report('assign single ') { n.times do; c = 'a string'; end}
  x.report('assign via << single') { c =''; n.times do; c << 'a string'; end}
  x.report('assign double ') { n.times do; c = "a string"; end}
  x.report('assing interp ') { n.times do; c = "a string #{'b string'}"; end}
  x.report('concat single ') { n.times do; 'a string ' + 'b string'; end}
  x.report('concat double ') { n.times do; "a string " + "b string"; end}
  x.report('concat single interp') { n.times do; "#{@a_str_single}#{@b_str_single}"; end}
  x.report('concat single << ') { n.times do; @a_str_single << @b_str_single; end}
  reset!
  # unless @did_print
  #   @did_print = true
  #   puts @a_str_single.length
  #   puts " a_str_single: #{@a_str_single} , b_str_single: #{@b_str_single} !!"
  # end
  x.report('concat double interp') { n.times do; "#{@a_str_double}#{@b_str_double}"; end}
  x.report('concat double << ') { n.times do; @a_str_double << @b_str_double; end}
end
Results:
jruby 1.7.4 (1.9.3p392) 2013-05-16 2390d3b on Java HotSpot(TM) 64-Bit Server VM 1.7.0_10-b18 [darwin-x86_64]
user system total real
assign single 0.220000 0.010000 0.230000 ( 0.108000)
assign via << single 0.280000 0.010000 0.290000 ( 0.138000)
assign double 0.050000 0.000000 0.050000 ( 0.047000)
assing interp 0.100000 0.010000 0.110000 ( 0.056000)
concat single 0.230000 0.010000 0.240000 ( 0.159000)
concat double 0.150000 0.010000 0.160000 ( 0.101000)
concat single interp 0.170000 0.000000 0.170000 ( 0.121000)
concat single << 0.100000 0.000000 0.100000 ( 0.076000)
concat double interp 0.160000 0.000000 0.160000 ( 0.108000)
concat double << 0.100000 0.000000 0.100000 ( 0.074000)
ruby 1.9.3p429 (2013-05-15 revision 40747) [x86_64-darwin12.4.0]
user system total real
assign single 0.100000 0.000000 0.100000 ( 0.103326)
assign via << single 0.160000 0.000000 0.160000 ( 0.163442)
assign double 0.100000 0.000000 0.100000 ( 0.102212)
assing interp 0.110000 0.000000 0.110000 ( 0.104671)
concat single 0.240000 0.000000 0.240000 ( 0.242592)
concat double 0.250000 0.000000 0.250000 ( 0.244666)
concat single interp 0.180000 0.000000 0.180000 ( 0.182263)
concat single << 0.120000 0.000000 0.120000 ( 0.126582)
concat double interp 0.180000 0.000000 0.180000 ( 0.181035)
concat double << 0.130000 0.010000 0.140000 ( 0.128731)
I too thought that single-quoted strings might be quicker for Ruby to parse. That doesn't seem to be the case.
Anyway, I think the benchmarks above are measuring the wrong thing.
It stands to reason that either version will be parsed into the same internal string representation, so to find out which is quicker to parse we shouldn't be measuring performance with string variables, but rather Ruby's speed of parsing strings.
generate.rb:
10000.times do
('a'..'z').to_a.each {|v| print "#{v}='This is a test string.'\n" }
end
#Generate sample ruby code with lots of strings to parse
$ ruby generate.rb > single_q.rb
#Get the double quote version
$ tr \' \" < single_q.rb > double_q.rb
#Compare execution times
$ time ruby single_q.rb
real 0m0.978s
user 0m0.920s
sys 0m0.048s
$ time ruby double_q.rb
real 0m0.994s
user 0m0.940s
sys 0m0.044s
Repeated runs don't seem to make much difference. It still takes pretty much the same time to parse either version of the string.
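The same comparison can also be sketched inside a single Ruby process, which keeps interpreter startup time out of the measurement. This is a rough sketch under a few assumptions: it uses `RubyVM::InstructionSequence.compile`, which is specific to MRI (it will not work on JRuby), it generates 1,000 repetitions instead of 10,000 to keep the run short, and the variable names are made up for illustration. It times parsing and compiling only; the generated code is never executed.

```ruby
require 'benchmark'

# Build source text full of single-quoted literals, mirroring generate.rb
# (1,000 repetitions here, an arbitrary choice to keep the run short).
single_src = Array.new(1_000) do
  ('a'..'z').map { |v| "#{v}='This is a test string.'\n" }.join
end.join

# Same program with double quotes, equivalent to the `tr` step above.
double_src = single_src.tr("'", '"')

# Time parse + compile only. RubyVM::InstructionSequence is MRI-specific.
single_time = Benchmark.realtime { RubyVM::InstructionSequence.compile(single_src) }
double_time = Benchmark.realtime { RubyVM::InstructionSequence.compile(double_src) }

puts format('single: %.4fs', single_time)
puts format('double: %.4fs', double_time)
```

As with the `time` runs, a single measurement is noisy, so it is worth repeating the run several times before reading anything into a small difference.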

Resources