Random string generation with same pattern - ruby

I want to generate a random string with the pattern:
number-number-letter-SPACE-letter-number-number
for example "81b t15", "12a x13". How can I generate something like this? I tried generating each char and joining them into one string, but it does not look efficient.

Nums = (0..9).to_a
Ltrs = ("A".."Z").to_a + ("a".."z").to_a
def rand_num; Nums.sample end
def rand_ltr; Ltrs.sample end
"#{rand_num}#{rand_num}#{rand_ltr} #{rand_ltr}#{rand_num}#{rand_num}"
# => "71P v33"

Have you looked at the randexp gem?
It works like this:
> /\d\d\w \w\d\d/.gen
=> "64M c82"
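One caveat worth checking: in Ruby regular expressions \w matches digits and the underscore as well as letters, so a pattern built on \w can put a digit or "_" in the letter positions. The sketch below uses only core Regexp to show the difference. Whether randexp can generate from a bracket class such as [a-zA-Z] is not confirmed by the source, so treat that substitution as an assumption to verify; LOOSE and STRICT are names made up for this example.

```ruby
# \w is shorthand for [a-zA-Z0-9_], so it is wider than "a letter".
LOOSE  = /\A\d\d\w \w\d\d\z/
# Assumed stricter pattern: letters only in the two middle positions.
STRICT = /\A\d\d[a-zA-Z] [a-zA-Z]\d\d\z/

samples = ["81b t15", "123 _45"]
samples.each do |s|
  puts "#{s.inspect}: loose=#{LOOSE.match?(s)} strict=#{STRICT.match?(s)}"
end
```

The second sample passes the loose pattern but not the strict one, which is the case to watch for if the letter positions must really be letters.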

Ok here's another entry for the competition :D
module RandomString
LETTERS = (("A".."Z").to_a + ("a".."z").to_a)
LETTERS_SIZE = LETTERS.size
SPACE = " "
FORMAT = [:number, :number, :letter, :space, :letter, :number, :number]
class << self
def generate
chars.join
end
def generate2
"#{number}#{number}#{letter} #{letter}#{number}#{number}"
end
private
def chars
FORMAT.collect{|char_class| send char_class}
end
def letter
LETTERS[rand(LETTERS_SIZE)]
end
def number
rand 10
end
def space
SPACE
end
end
end
And you use it like:
50.times { puts RandomString.generate }
Out of curiosity, I made a benchmark of all the solutions presented here. Here are the results:
JRuby:
user system total real
kimmmo 1.490000 0.000000 1.490000 ( 0.990000)
kimmmo2 0.600000 0.010000 0.610000 ( 0.479000)
sawa 0.960000 0.040000 1.000000 ( 0.533000)
hp4k 2.050000 0.230000 2.280000 ( 1.234000)
brian 17.700000 0.170000 17.870000 ( 14.867000)
MRI 2.0:
user system total real
kimmmo 0.900000 0.000000 0.900000 ( 0.908601)
kimmmo2 0.410000 0.000000 0.410000 ( 0.406443)
sawa 0.570000 0.000000 0.570000 ( 0.568935)
hp4k 4.940000 0.000000 4.940000 ( 4.945404)
brian 25.860000 0.010000 25.870000 ( 25.870011)

You can do it this way:
(0..9).to_a.sample(2).join + ('a'..'z').to_a.sample + " " + ('a'..'z').to_a.sample + (0..9).to_a.sample(2).join
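Note that Array#sample(2) draws two distinct elements, so the line above can never produce a repeated digit such as "11". If repeats should be possible, each position can be drawn independently. A minimal sketch, where DIGITS, LOWER, pick and random_code are hypothetical names introduced here:

```ruby
DIGITS = ('0'..'9').to_a
LOWER  = ('a'..'z').to_a

# Draw n characters independently, so the same character may repeat.
def pick(chars, n = 1)
  Array.new(n) { chars.sample }.join
end

# number-number-letter-SPACE-letter-number-number
def random_code
  "#{pick(DIGITS, 2)}#{pick(LOWER)} #{pick(LOWER)}#{pick(DIGITS, 2)}"
end

puts random_code
```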

Related

Why is storing a value in an instance variable more expensive than looking up a hash?

I ran a benchmark to see whether memoizing attributes was faster than reading from a configuration hash. The code below is an example. Can anybody explain the results?
The Test
require 'benchmark'
class MyClassWithStuff
DEFAULT_VALS = { one: '1', two: 2, three: 3, four: 4 }
def memoized_fetch
@value ||= DEFAULT_VALS[:one]
end
def straight_fetch
DEFAULT_VALS[:one]
end
end
TIMES = 10000
CALL_TIMES = 1000
Benchmark.bmbm do |test|
test.report("Memoized") do
TIMES.times do
instance = MyClassWithStuff.new
CALL_TIMES.times { |i| instance.memoized_fetch }
end
end
test.report("Fetched") do
TIMES.times do
instance = MyClassWithStuff.new
CALL_TIMES.times { |i| instance.straight_fetch }
end
end
end
Results
Rehearsal --------------------------------------------
Memoized 1.500000 0.010000 1.510000 ( 1.510230)
Fetched 1.330000 0.000000 1.330000 ( 1.342800)
----------------------------------- total: 2.840000sec
user system total real
Memoized 1.440000 0.000000 1.440000 ( 1.456937)
Fetched 1.260000 0.000000 1.260000 ( 1.269904)

Count total characters in an Array of Strings in Ruby?

How would I count the total number of characters in an array of strings in Ruby? Assume I have the following:
array = ['peter' , 'romeo' , 'bananas', 'pijamas']
I'm trying:
array.each do |counting|
puts counting.count "array[]"
end
but, I'm not getting the desired result. It appears I am counting something other than the characters.
I searched for the count method but I haven't had any luck or found a good source of info. Basically, I'd like to get the total number of characters inside the array as output.
Wing's Answer will work, but just for fun here are a few alternatives
['peter' , 'romeo' , 'bananas', 'pijamas'].inject(0) {|c, w| c += w.length }
or
['peter' , 'romeo' , 'bananas', 'pijamas'].join.length
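The real issue is that string.count is not the method you're looking for: it counts how many characters of the string occur in the set of characters you pass in, not the string's length.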
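A small sketch contrasting String#count with String#length, using the array from the question (per_word and total are names chosen for this example):

```ruby
array = ['peter', 'romeo', 'bananas', 'pijamas']

# count("array[]") counts occurrences of the characters a, r, y, [ and ].
per_word = array.map { |word| word.count("array[]") }
p per_word        # => [1, 1, 3, 2]

# length counts every character; summing the lengths gives the total.
total = array.map(&:length).reduce(0, :+)
p total           # => 24
```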
Or...
a.map(&:size).reduce(:+) # from Andrew: reduce(0, :+)
Another alternative:
['peter' , 'romeo' , 'bananas', 'pijamas'].join('').size
An interesting result :)
>> array = []
>> 1_000_000.times { array << 'foo' }
>> Benchmark.bmbm do |x|
>> x.report('mapreduce') { array.map(&:size).reduce(:+) }
>> x.report('mapsum') { array.map(&:size).sum }
>> x.report('inject') { array.inject(0) { |c, w| c += w.length } }
>> x.report('joinsize') { array.join('').size }
>> x.report('joinsize2') { array.join.size }
>> end
Rehearsal ---------------------------------------------
mapreduce 0.220000 0.000000 0.220000 ( 0.222946)
mapsum 0.210000 0.000000 0.210000 ( 0.210070)
inject 0.150000 0.000000 0.150000 ( 0.158709)
joinsize 0.120000 0.000000 0.120000 ( 0.116889)
joinsize2 0.070000 0.000000 0.070000 ( 0.071718)
------------------------------------ total: 0.770000sec
user system total real
mapreduce 0.220000 0.000000 0.220000 ( 0.228385)
mapsum 0.210000 0.000000 0.210000 ( 0.207359)
inject 0.160000 0.000000 0.160000 ( 0.156711)
joinsize 0.120000 0.000000 0.120000 ( 0.116652)
joinsize2 0.080000 0.000000 0.080000 ( 0.069612)
so it looks like array.join.size has the lowest runtime
a = ['peter' , 'romeo' , 'bananas', 'pijamas']
count = 0
a.each {|s| count += s.length}
puts count
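On Ruby 2.4 or later, Array#sum also accepts a block, which avoids the intermediate array that map builds. This variant is not part of the benchmark above, so no timing is claimed for it:

```ruby
a = ['peter', 'romeo', 'bananas', 'pijamas']

# sum with a block adds up the block's return value for each element.
total = a.sum { |s| s.length }
puts total  # => 24
```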

Why is == faster than eql?

I read in the documentation for the String class that eql? is a strict equality operator, without type conversion, and that == is an equality operator which tries to convert its second argument to a String. The C source code for these methods confirms that:
The eql? source code:
static VALUE
rb_str_eql(VALUE str1, VALUE str2)
{
if (str1 == str2) return Qtrue;
if (TYPE(str2) != T_STRING) return Qfalse;
return str_eql(str1, str2);
}
The == source code:
VALUE
rb_str_equal(VALUE str1, VALUE str2)
{
if (str1 == str2) return Qtrue;
if (TYPE(str2) != T_STRING) {
if (!rb_respond_to(str2, rb_intern("to_str"))) {
return Qfalse;
}
return rb_equal(str2, str1);
}
return str_eql(str1, str2);
}
But when I tried to benchmark these methods, I was surprised that == is faster than eql? by up to 20%!
My benchmark code is:
require "benchmark"
RUN_COUNT = 100000000
first_string = "Woooooha"
second_string = "Woooooha"
time = Benchmark.measure do
RUN_COUNT.times do |i|
first_string.eql?(second_string)
end
end
puts time
time = Benchmark.measure do
RUN_COUNT.times do |i|
first_string == second_string
end
end
puts time
And results:
Ruby 1.9.3-p125:
26.420000 0.250000 26.670000 ( 26.820762)
21.520000 0.200000 21.720000 ( 21.843723)
Ruby 1.9.2-p290:
25.930000 0.280000 26.210000 ( 26.318998)
19.800000 0.130000 19.930000 ( 19.991929)
So, can anyone explain why the simpler eql? method is slower than the == method when I run it for two equal strings?
The reason you are seeing a difference is not related to the implementation of == vs eql? but is due to the fact that Ruby optimizes operators (like ==) to avoid going through the normal method lookup when possible.
We can verify this in two ways:
Create an alias for == and call that instead. You'll get similar results to eql? and thus slower results than ==.
Compare using send :== and send :eql? instead and you'll get similar timings; the speed difference disappears because Ruby only uses the optimization for direct calls to the operators, not when using send or __send__.
Here's code that shows both:
require 'fruity'
first = "Woooooha"
second = "Woooooha"
class String
alias same_value? ==
end
compare do
with_operator { first == second }
with_same_value { first.same_value? second }
with_eql { first.eql? second }
end
compare do
with_send_op { first.send :==, second }
with_send_eql { first.send :eql?, second }
end
Results:
with_operator is faster than with_same_value by 2x ± 0.1
with_same_value is similar to with_eql
with_send_eql is similar to with_send_op
If you're curious, the optimizations for operators are in insns.def.
Note: this answer applies only to Ruby MRI; I would be surprised if there was a speed difference in JRuby or Rubinius, for instance.
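On MRI the difference is also visible in the compiled bytecode. The sketch below uses RubyVM::InstructionSequence, which is MRI-specific; the exact instruction names are an implementation detail and may differ between versions, so treat the output as something to inspect rather than rely on. eq_code and eql_code are names chosen for this example.

```ruby
# MRI only: compile two expressions (without running them) and
# compare the instructions the VM would execute.
eq_code  = RubyVM::InstructionSequence.compile("a == b").disasm
eql_code = RubyVM::InstructionSequence.compile("a.eql?(b)").disasm

# == typically compiles to a specialized opt_eq instruction, while
# eql? goes through a generic send instruction.
puts eq_code.lines.grep(/opt_eq|send/)
puts eql_code.lines.grep(/opt_eq|send/)
```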
When doing benchmarks, don't use times, because it invokes a block on each of the RUN_COUNT iterations. The extra time taken as a result affects all benchmarks equally in absolute terms, but that makes it harder to notice a relative difference:
require "benchmark"
RUN_COUNT = 10_000_000
FIRST_STRING = "Woooooha"
SECOND_STRING = "Woooooha"
def times_eq_question_mark
RUN_COUNT.times do |i|
FIRST_STRING.eql?(SECOND_STRING)
end
end
def times_double_equal_sign
RUN_COUNT.times do |i|
FIRST_STRING == SECOND_STRING
end
end
def loop_eq_question_mark
i = 0
while i < RUN_COUNT
FIRST_STRING.eql?(SECOND_STRING)
i += 1
end
end
def loop_double_equal_sign
i = 0
while i < RUN_COUNT
FIRST_STRING == SECOND_STRING
i += 1
end
end
1.upto(10) do |i|
method_names = [:times_eq_question_mark, :times_double_equal_sign, :loop_eq_question_mark, :loop_double_equal_sign]
method_times = method_names.map {|method_name| Benchmark.measure { send(method_name) } }
puts "Run #{i}"
method_names.zip(method_times).each do |method_name, method_time|
puts [method_name, method_time].join("\t")
end
puts
end
gives
Run 1
times_eq_question_mark 3.500000 0.000000 3.500000 ( 3.578011)
times_double_equal_sign 2.390000 0.000000 2.390000 ( 2.453046)
loop_eq_question_mark 3.110000 0.000000 3.110000 ( 3.140525)
loop_double_equal_sign 2.109000 0.000000 2.109000 ( 2.124932)
Run 2
times_eq_question_mark 3.531000 0.000000 3.531000 ( 3.562386)
times_double_equal_sign 2.469000 0.000000 2.469000 ( 2.484295)
loop_eq_question_mark 3.063000 0.000000 3.063000 ( 3.109276)
loop_double_equal_sign 2.109000 0.000000 2.109000 ( 2.140556)
Run 3
times_eq_question_mark 3.547000 0.000000 3.547000 ( 3.593635)
times_double_equal_sign 2.437000 0.000000 2.437000 ( 2.453047)
loop_eq_question_mark 3.063000 0.000000 3.063000 ( 3.109275)
loop_double_equal_sign 2.140000 0.000000 2.140000 ( 2.140557)
Run 4
times_eq_question_mark 3.547000 0.000000 3.547000 ( 3.578011)
times_double_equal_sign 2.422000 0.000000 2.422000 ( 2.437422)
loop_eq_question_mark 3.094000 0.000000 3.094000 ( 3.140524)
loop_double_equal_sign 2.140000 0.000000 2.140000 ( 2.140557)
Run 5
times_eq_question_mark 3.578000 0.000000 3.578000 ( 3.671758)
times_double_equal_sign 2.406000 0.000000 2.406000 ( 2.468671)
loop_eq_question_mark 3.110000 0.000000 3.110000 ( 3.156149)
loop_double_equal_sign 2.109000 0.000000 2.109000 ( 2.156181)
Run 6
times_eq_question_mark 3.562000 0.000000 3.562000 ( 3.562386)
times_double_equal_sign 2.407000 0.000000 2.407000 ( 2.468671)
loop_eq_question_mark 3.109000 0.000000 3.109000 ( 3.124900)
loop_double_equal_sign 2.125000 0.000000 2.125000 ( 2.234303)
Run 7
times_eq_question_mark 3.500000 0.000000 3.500000 ( 3.546762)
times_double_equal_sign 2.453000 0.000000 2.453000 ( 2.468671)
loop_eq_question_mark 3.031000 0.000000 3.031000 ( 3.171773)
loop_double_equal_sign 2.157000 0.000000 2.157000 ( 2.156181)
Run 8
times_eq_question_mark 3.468000 0.000000 3.468000 ( 3.656133)
times_double_equal_sign 2.454000 0.000000 2.454000 ( 2.484296)
loop_eq_question_mark 3.093000 0.000000 3.093000 ( 3.249896)
loop_double_equal_sign 2.125000 0.000000 2.125000 ( 2.140556)
Run 9
times_eq_question_mark 3.563000 0.000000 3.563000 ( 3.593635)
times_double_equal_sign 2.453000 0.000000 2.453000 ( 2.453047)
loop_eq_question_mark 3.125000 0.000000 3.125000 ( 3.124900)
loop_double_equal_sign 2.141000 0.000000 2.141000 ( 2.156181)
Run 10
times_eq_question_mark 3.515000 0.000000 3.515000 ( 3.562386)
times_double_equal_sign 2.453000 0.000000 2.453000 ( 2.453046)
loop_eq_question_mark 3.094000 0.000000 3.094000 ( 3.140525)
loop_double_equal_sign 2.109000 0.000000 2.109000 ( 2.156181)
equal? is reference equality
== is value equality
eql? is value and type equality
The third method, eql? is normally used to test if two objects have the same value as well as the same type. For example:
puts "integer == to float: #{25 == 25.0}"
puts "integer eql? to float: #{25.eql? 25.0}"
gives:
integer == to float: true
integer eql? to float: false
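The three predicates can also be compared side by side. A minimal sketch using only core classes; b is made with dup so that it is a distinct object with the same contents:

```ruby
a = "Woooooha"
b = a.dup          # same contents, different object

p a.equal?(b)      # => false (different objects)
p a == b           # => true  (same value)
p a.eql?(b)        # => true  (same value, both Strings)

p 25 == 25.0       # => true  (numeric value matches)
p 25.eql?(25.0)    # => false (Integer vs Float)
p 25.equal?(25)    # => true  (small integers are the same object)
```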
So I thought since eql? does more checking it would be slower, and for strings it is, at least on my Ruby 1.9.3. So I figured it must be type dependent and did some tests.
When an integer and a float are compared, eql? is a bit faster. When integers are compared, == is much faster, by up to 2x. Wrong theory, back to start.
The next theory, that the result depends on whether the two values have the same type, proved to be true: when they are of the same type == is always faster, and eql? is faster when the types are different, again by up to 2x.
Don't have the time to compare all types but I'm sure you'll get varying results, although the same kind of comparison always gives similar results. Can somebody prove me wrong?
Here are my results from the test of the OP:
16.863000 0.000000 16.863000 ( 16.903000) 2 strings with eql?
14.212000 0.000000 14.212000 ( 14.334600) 2 strings with ==
13.213000 0.000000 13.213000 ( 13.245600) integer and floating with eql?
14.103000 0.000000 14.103000 ( 14.200400) integer and floating with ==
13.229000 0.000000 13.229000 ( 13.410800) 2 same integers with eql?
9.406000 0.000000 9.406000 ( 9.410000) 2 same integers with ==
19.625000 0.000000 19.625000 ( 19.720800) 2 different integers with eql?
9.407000 0.000000 9.407000 ( 9.405800) 2 different integers with ==
21.825000 0.000000 21.825000 ( 21.910200) integer with string with eql?
43.836000 0.031000 43.867000 ( 44.074200) integer with string with ==

How do I see if one string contains a keyword in a keyword list?

I want to see if one string contains a keyword in a keyword list.
I have the following function:
def needfilter?(src)
["keyowrd_1","keyowrd_2","keyowrd_3","keyowrd_4","keyowrd_5"].each do |kw|
return true if src.include?(kw)
end
false
end
Can this code block be simplified into a one-line expression?
I know it can be simplified to:
def needfilter?(src)
!["keyowrd_1","keyowrd_2","keyowrd_3","keyowrd_4","keyowrd_5"].select{|c| src.include?(c)}.empty?
end
But this approach is not so efficient if the keyword array list is very long.
Looks like a nice use case for the Enumerable#any? method:
def needfilter?(src)
["keyowrd_1","keyowrd_2","keyowrd_3","keyowrd_4","keyowrd_5"].any? do |kw|
src.include? kw
end
end
I was curious which solution is fastest, so I created a benchmark of all the answers so far.
I modified steenslag's answer a bit: for tuning reasons I create the regexp only once, not for each test.
require 'benchmark'
KEYWORDS = ["keyowrd_1","keyowrd_2","keyowrd_3","keyowrd_4","keyowrd_5"]
TESTSTRINGS = ['xx', 'xxx', "keyowrd_2"]
N = 10_000 #Number of Test loops
def needfilter_orig?(src)
["keyowrd_1","keyowrd_2","keyowrd_3","keyowrd_4","keyowrd_5"].each do |kw|
return true if src.include?(kw)
end
false
end
def needfilter_orig2?(src)
!["keyowrd_1","keyowrd_2","keyowrd_3","keyowrd_4","keyowrd_5"].select{|c| src.include?(c)}.empty?
end
def needfilter_any?(src)
["keyowrd_1","keyowrd_2","keyowrd_3","keyowrd_4","keyowrd_5"].any? do |kw|
src.include? kw
end
end
def needfilter_regexp?(src)
!!(src =~ Regexp.union(KEYWORDS))
end
def needfilter_regexp_init?(src)
!!(src =~ $KEYWORDS_regexp)
end
def needfilter_split?(src)
!(src.split(/ /) & KEYWORDS).empty?
end
Benchmark.bmbm(10) {|b|
b.report('orig') { N.times { TESTSTRINGS.each{|src| needfilter_orig?(src)} } }
b.report('orig2') { N.times { TESTSTRINGS.each{|src| needfilter_orig2?(src) } } }
b.report('any') { N.times { TESTSTRINGS.each{|src| needfilter_any?(src) } } }
b.report('regexp') { N.times { TESTSTRINGS.each{|src| needfilter_regexp?(src) } } }
b.report('regexp_init') {
$KEYWORDS_regexp = Regexp.union(KEYWORDS) # Initialize once
N.times { TESTSTRINGS.each{|src| needfilter_regexp_init?(src) } }
}
b.report('split') { N.times { TESTSTRINGS.each{|src| needfilter_split?(src) } } }
} #Benchmark
Result:
Rehearsal -----------------------------------------------
orig 0.094000 0.000000 0.094000 ( 0.093750)
orig2 0.093000 0.000000 0.093000 ( 0.093750)
any 0.110000 0.000000 0.110000 ( 0.109375)
regexp 0.578000 0.000000 0.578000 ( 0.578125)
regexp_init 0.047000 0.000000 0.047000 ( 0.046875)
split 0.125000 0.000000 0.125000 ( 0.125000)
-------------------------------------- total: 1.047000sec
user system total real
orig 0.078000 0.000000 0.078000 ( 0.078125)
orig2 0.109000 0.000000 0.109000 ( 0.109375)
any 0.078000 0.000000 0.078000 ( 0.078125)
regexp 0.579000 0.000000 0.579000 ( 0.578125)
regexp_init 0.046000 0.000000 0.046000 ( 0.046875)
split 0.125000 0.000000 0.125000 ( 0.125000)
The solution with regular expressions is the fastest, if you create the regexp only once.
def need_filter?(src)
!!(src =~ /keyowrd_1|keyowrd_2|keyowrd_3|keyowrd_4|keyowrd_5/)
end
The =~ method returns the index of the match (an Integer) or nil. The double bang converts that to a boolean.
This is the way I'd do it:
def needfilter?(src)
keywords = Regexp.union("keyowrd_1","keyowrd_2","keyowrd_3","keyowrd_4","keyowrd_5")
!!(src =~ keywords)
end
This solution has:
No iteration
Single regexp using Regexp.union
Should be fast for even a large set of keywords. Note that hardcoding the keywords in the method is not ideal, but I'm assuming that was just for the sake of the example.
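A variant that keeps the keyword list out of the method and builds the regexp only once, in line with the benchmark's observation that the regexp is fastest when it is not rebuilt per call. This is a minimal sketch: KEYWORDS_RE is a hypothetical constant name, and String#match? assumes Ruby 2.4 or later.

```ruby
KEYWORDS = %w[keyowrd_1 keyowrd_2 keyowrd_3 keyowrd_4 keyowrd_5].freeze
# Built once at load time; Regexp.union escapes each keyword.
KEYWORDS_RE = Regexp.union(KEYWORDS)

def needfilter?(src)
  # match? returns true or false directly, so no !! is needed.
  src.match?(KEYWORDS_RE)
end

p needfilter?("xx keyowrd_2 yy")  # => true
p needfilter?("xxx")              # => false
```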
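I think that
def need_filter?(src)
!(src.split(/ /) & ["keyowrd_1","keyowrd_2","keyowrd_3","keyowrd_4","keyowrd_5"]).empty?
end
will work as you expect, provided the keywords appear in src as whole space-separated words (as described in "Array include any value from another array?"). Unlike include?, it does not match a keyword inside a longer word. It may be faster than any? with include? for long keyword lists, although in the benchmark above it was slower for five keywords.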

Delete specific parts of a String in Ruby

I have a String str = "abcdefghij", and I want to set str2 to str minus the 4th to 6th characters (assuming a 0-based index).
Is it possible to do this in one go? slice! seems to do it, but it requires at least 3 statements (duplicating, slicing, and then using the string).
A common way is to do it like this:
str = "abcdefghij"
str2 = str.dup
str2[4..6] = ''
# => "abcdhij"
but it still requires two steps.
If the part you want to keep is contiguous, then you can do it in one step:
str2 = str[2..5]
# => "cdef"
Depending on what exactly you're deleting, http://ruby-doc.org/core/classes/String.html#M001201 might be an option.
You could probably do obscene things with regexes:
"abcdefghij".sub(/(.{4}).{3}/) { $1 }
But that's gross.
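One way to fold the copy and the cut into a single expression is Object#tap, which yields the copy to a block and then returns it. The sketch below checks a few single-expression variants against each other; the variable names are made up for this example, and no claim is made about their relative speed.

```ruby
str = "abcdefghij"

concat = str[0, 4] + str[7..-1]               # keep the parts around the gap
tapped = str.dup.tap { |s| s[4..6] = '' }     # copy, then cut in place
sliced = str.dup.tap { |s| s.slice!(4..6) }   # same idea with slice!
regexp = str.sub(/\A(.{4}).{3}/) { $1 }       # keep 4 chars, drop the next 3

p [concat, tapped, sliced, regexp].uniq       # => ["abcdhij"]
p str                                         # => "abcdefghij" (unchanged)
```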
I went ahead with using the following:
str = "abcdefghij"
str2 = str[0, 4] + str[7..-1]
It turned out to be faster and cleaner than the other solutions presented. Here's a mini benchmark.
require 'benchmark'
str = "abcdefghij"
times = 1_000_000
Benchmark.bmbm do |bm|
bm.report("1 step") do
times.times do
str2 = str[0, 4] + str[7..-1]
end
end
bm.report("3 steps") do
times.times do
str2 = str.dup
str2[4..6] = ''
str2
end
end
end
Output on Ruby 1.9.2
Rehearsal -------------------------------------------
1 step 0.950000 0.010000 0.960000 ( 0.955288)
3 steps 1.250000 0.000000 1.250000 ( 1.250415)
---------------------------------- total: 2.210000sec
user system total real
1 step 0.960000 0.000000 0.960000 ( 0.950541)
3 steps 1.250000 0.010000 1.260000 ( 1.254416)
Edit: Update for <<.
Script:
require 'benchmark'
str = "abcdefghij"
times = 1_000_000
Benchmark.bmbm do |bm|
bm.report("1 step") do
times.times do
str2 = str[0, 4] + str[7..-1]
end
end
bm.report("3 steps") do
times.times do
str2 = str.dup
str2[4..6] = ''
str2
end
end
bm.report("1 step using <<") do
times.times do
str2 = str[0, 4] << str[7..-1]
end
end
end
Output on Ruby 1.9.2
Rehearsal ---------------------------------------------------
1 step 0.980000 0.010000 0.990000 ( 0.979944)
3 steps 1.270000 0.000000 1.270000 ( 1.265495)
1 step using << 0.910000 0.010000 0.920000 ( 0.909705)
------------------------------------------ total: 3.180000sec
user system total real
1 step 0.980000 0.000000 0.980000 ( 0.985154)
3 steps 1.280000 0.000000 1.280000 ( 1.281310)
1 step using << 0.930000 0.000000 0.930000 ( 0.916511)
