String concatenation in Ruby

I am looking for a more elegant way of concatenating strings in Ruby.
I have the following line:
source = "#{ROOT_DIR}/" << project << "/App.config"
Is there a nicer way of doing this?
And for that matter what is the difference between << and +?

You can do that in several ways:
As you have shown, with <<, but that is not the usual way
With string interpolation
source = "#{ROOT_DIR}/#{project}/App.config"
with +
source = "#{ROOT_DIR}/" + project + "/App.config"
The second method (interpolation) seems to be more efficient in terms of memory and speed from what I've seen (not measured, though). All three methods raise an uninitialized constant error (NameError) when ROOT_DIR is not defined; if ROOT_DIR is nil, the interpolation simply inserts an empty string.
When dealing with pathnames, you may want to use File.join to avoid getting the path separator wrong.
In the end, it is a matter of taste.

The + operator is the normal concatenation choice. It is fine for one-off concatenation, though it allocates a new string every time, so it is not the fastest option when appending repeatedly.
The difference between + and << is that << changes the object on its left hand side, and + doesn't.
irb(main):001:0> s = 'a'
=> "a"
irb(main):002:0> s + 'b'
=> "ab"
irb(main):003:0> s
=> "a"
irb(main):004:0> s << 'b'
=> "ab"
irb(main):005:0> s
=> "ab"
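One practical consequence of the in-place behaviour, sketched here with made-up variable names: every variable that refers to the same string sees a << append, whereas + leaves the original alone.

```ruby
# Two names for the same String object (the names are illustrative).
original = +"a"             # unary + returns an unfrozen string
same_object = original

combined = original + "b"   # new object; original is untouched
original << "c"             # mutates in place; same_object sees the change

puts combined                       # ab
puts original                       # ac
puts same_object                    # ac
puts original.equal?(same_object)   # true
```

This is worth keeping in mind when the string on the left of << was passed in as a method argument, because the caller's string changes too.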

If you are just concatenating paths you can use Ruby's own File.join method.
source = File.join(ROOT_DIR, project, 'App.config')
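A small sketch of why File.join helps, using placeholder values rather than the asker's real ROOT_DIR and project: it collapses duplicate separators where the pieces meet, which plain interpolation does not.

```ruby
# Placeholder values for illustration; note the stray slashes.
root_dir = "/srv/apps/"
project  = "/billing"

joined       = File.join(root_dir, project, "App.config")
interpolated = "#{root_dir}/#{project}/App.config"

puts joined        # /srv/apps/billing/App.config
puts interpolated  # /srv/apps///billing/App.config
```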

from http://greyblake.com/blog/2012/09/02/ruby-perfomance-tricks/
Using << (also available as concat) is far more efficient than +=, as the latter creates a temporary object and rebinds the variable to that new object.
require 'benchmark'
N = 1000
BASIC_LENGTH = 10
5.times do |factor|
length = BASIC_LENGTH * (10 ** factor)
puts "_" * 60 + "\nLENGTH: #{length}"
Benchmark.bm(10, '+= VS <<') do |x|
concat_report = x.report("+=") do
str1 = ""
str2 = "s" * length
N.times { str1 += str2 }
end
modify_report = x.report("<<") do
str1 = "s"
str2 = "s" * length
N.times { str1 << str2 }
end
[concat_report / modify_report]
end
end
output:
____________________________________________________________
LENGTH: 10
user system total real
+= 0.000000 0.000000 0.000000 ( 0.004671)
<< 0.000000 0.000000 0.000000 ( 0.000176)
+= VS << NaN NaN NaN ( 26.508796)
____________________________________________________________
LENGTH: 100
user system total real
+= 0.020000 0.000000 0.020000 ( 0.022995)
<< 0.000000 0.000000 0.000000 ( 0.000226)
+= VS << Inf NaN NaN (101.845829)
____________________________________________________________
LENGTH: 1000
user system total real
+= 0.270000 0.120000 0.390000 ( 0.390888)
<< 0.000000 0.000000 0.000000 ( 0.001730)
+= VS << Inf Inf NaN (225.920077)
____________________________________________________________
LENGTH: 10000
user system total real
+= 3.660000 1.570000 5.230000 ( 5.233861)
<< 0.000000 0.010000 0.010000 ( 0.015099)
+= VS << Inf 157.000000 NaN (346.629692)
____________________________________________________________
LENGTH: 100000
user system total real
+= 31.270000 16.990000 48.260000 ( 48.328511)
<< 0.050000 0.050000 0.100000 ( 0.105993)
+= VS << 625.400000 339.800000 NaN (455.961373)

Since this is a path I'd probably use array and join:
source = [ROOT_DIR, project, 'App.config'] * '/'

Here's another benchmark inspired by this gist. It compares concatenation (+), appending (<<) and interpolation (#{}) for dynamic and predefined strings.
require 'benchmark'
# we will need the CAPTION and FORMAT constants:
include Benchmark
count = 100_000
puts "Dynamic strings"
Benchmark.benchmark(CAPTION, 7, FORMAT) do |bm|
bm.report("concat") { count.times { 11.to_s + '/' + 12.to_s } }
bm.report("append") { count.times { 11.to_s << '/' << 12.to_s } }
bm.report("interp") { count.times { "#{11}/#{12}" } }
end
puts "\nPredefined strings"
s11 = "11"
s12 = "12"
Benchmark.benchmark(CAPTION, 7, FORMAT) do |bm|
bm.report("concat") { count.times { s11 + '/' + s12 } }
bm.report("append") { count.times { s11 << '/' << s12 } }
bm.report("interp") { count.times { "#{s11}/#{s12}" } }
end
output:
Dynamic strings
user system total real
concat 0.050000 0.000000 0.050000 ( 0.047770)
append 0.040000 0.000000 0.040000 ( 0.042724)
interp 0.050000 0.000000 0.050000 ( 0.051736)
Predefined strings
user system total real
concat 0.030000 0.000000 0.030000 ( 0.024888)
append 0.020000 0.000000 0.020000 ( 0.023373)
interp 3.160000 0.160000 3.320000 ( 3.311253)
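Conclusion: interpolation in MRI looks heavy here, but the comparison is skewed. The "append" report mutates s11 in place (<< appends to its receiver), so after 100,000 iterations s11 is roughly 300,000 characters long, and the "interp" report that follows is interpolating that huge string.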
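A variant of the predefined-strings benchmark that keeps s11 intact could look like the sketch below. The only change is an assumed fix of calling dup before appending; no timings are quoted here because they depend on the machine and Ruby version.

```ruby
require 'benchmark'

count = 100_000
s11 = "11"
s12 = "12"

Benchmark.bm(7) do |bm|
  bm.report("concat") { count.times { s11 + '/' + s12 } }
  # dup first, so << appends to a throwaway copy and not to s11 itself
  bm.report("append") { count.times { s11.dup << '/' << s12 } }
  bm.report("interp") { count.times { "#{s11}/#{s12}" } }
end

puts s11  # still 11
```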

I'd prefer using Pathname:
require 'pathname' # pathname is in stdlib
Pathname(ROOT_DIR) + project + 'App.config'
about << and + from ruby docs:
+: Returns a new String containing other_str concatenated to str
<<: Concatenates the given object to str. If the object is a Fixnum between 0 and 255, it is converted to a character before concatenation.
So the differences are what happens to the first operand (<< modifies it in place, while + returns a new string and therefore uses more memory) and what happens when the argument is a Fixnum (Integer since Ruby 2.4): << appends the character with that code, while + raises a TypeError.
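The Integer case can be tried directly. This is a minimal sketch; the values are arbitrary.

```ruby
s = +"a"
s << 98                 # 98 is the code point of "b"
puts s                  # ab

begin
  "a" + 98              # + does not convert its argument
rescue TypeError => e
  puts "TypeError: #{e.message}"
end

with_to_s = "a" + 98.to_s
puts with_to_s          # a98
```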

Let me share my experience with this.
I had a query that returned 32k records. For each record I called a method to format the database record into a formatted string, and then concatenated that onto a String that, at the end of the whole process, would be written to a file on disk.
My problem was that as the records went by, at around 24k, concatenating the String became painfully slow.
I was doing that using the regular '+' operator.
When I changed to '<<' it was like magic. It was really fast.
That reminded me of my old days, around 1998, when I was using Java and concatenating Strings with '+', and changed from String to StringBuffer (and now we Java developers have StringBuilder).
I believe that + versus << in the Ruby world is the same as + versus StringBuilder.append in the Java world.
The first allocates a whole new string and copies both operands into it every time; the other appends to the existing object in place.
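For the scenario described (format each record, then write everything to a file), one option is to append to an IO-like object rather than to a growing String. The sketch below uses StringIO as a stand-in for the file; the record fields and the format_record helper are hypothetical.

```ruby
require 'stringio'

# Stand-in for the database rows; the field names are made up.
records = [
  { id: 1, name: "Ada" },
  { id: 2, name: "Grace" }
]

# Hypothetical formatter, playing the role of the "format one record" method.
def format_record(record)
  "#{record[:id]};#{record[:name]}\n"
end

# StringIO behaves like a file; File.open(path, "w") { |io| ... } works the same way.
io = StringIO.new
records.each { |record| io << format_record(record) }

result = io.string
puts result
```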

Concatenation you say? How about #concat method then?
a = 'foo'
a.object_id #=> some number
a.concat 'bar' #=> "foobar"
a.object_id #=> same as before -- string a remains the same object
In all fairness, concat behaves like << (newer Rubies also let concat take several arguments).

Here are more ways to do this:
"String1" + "String2"
"#{String1} #{String2}"
String1<<String2
And so on ...

You can also use % as follows:
source = "#{ROOT_DIR}/%s/App.config" % project
The %s placeholder works in single-quoted strings as well; only the #{} interpolation needs double quotes.
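When more than one value has to be substituted, % takes an array. A short sketch with placeholder values:

```ruby
root_dir = "/srv/apps"   # placeholder value
project  = "billing"     # placeholder value

# An array fills several %s slots; the template can be single-quoted.
with_percent = '%s/%s/App.config' % [root_dir, project]

# Kernel#format is the equivalent method-call form.
with_format = format('%s/%s/App.config', root_dir, project)

puts with_percent  # /srv/apps/billing/App.config
puts with_format   # /srv/apps/billing/App.config
```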

You can concatenate in string definition directly:
nombre_apellido = "#{customer['first_name']} #{customer['last_name']} #{order_id}"

You may use the + or << operator, or the equivalent .concat method, which appends in place just like <<. Chain the calls on the new string, so that project itself is not modified:
source = "#{ROOT_DIR}/".concat(project).concat("/App.config")

Situation matters, for example:
# this will not work
output = ''
Users.all.each do |user|
output + "#{user.email}\n"
end
# the output will be ''
puts output
# this will do the job
output = ''
Users.all.each do |user|
output << "#{user.email}\n"
end
# will get the desired output
puts output
In the first example, concatenating with the + operator builds a new string on each iteration and throws it away, so output is never updated. In the second example, the << operator appends to the output object itself on each iteration. So, for this type of situation, << is better.
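Two other ways to get the same result, sketched with a plain array standing in for the user records: reassigning with +=, or building the pieces and joining them once.

```ruby
# Plain array standing in for the user records.
emails = ["ann@example.com", "bob@example.com"]

# + alone discards its result; += rebinds the variable to the new string.
output = ''
emails.each { |email| output += "#{email}\n" }

# Building the pieces first and joining once avoids the accumulator entirely.
joined = emails.map { |email| "#{email}\n" }.join

puts output == joined  # true
```

Note that += still allocates a new string on every pass, so for long lists << or join is the cheaper choice.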

For your particular case you could also use Array#join when constructing file path type of string:
string = [ROOT_DIR, project, 'App.config'].join('/')
This has a pleasant side effect of automatically converting different types to string:
['foo', :bar, 1].join('/')
=>"foo/bar/1"

For Puppet:
$username = 'lala'
notify { "Hello ${username.capitalize}":
withpath => false,
}

Related

Without Converting to a String, How Many Digits Does a Fixnum Have?

I want to find the length of a Fixnum, num, without converting it into a String.
In other words, how many digits are in num without calling the .to_s() method:
num.to_s.length
puts Math.log10(1234).to_i + 1 # => 4
You could add it to Integer (Fixnum was merged into Integer in Ruby 2.4) like this:
class Integer
def num_digits
Math.log10(self).to_i + 1
end
end
puts 1234.num_digits # => 4
Ruby 2.4 has an Integer#digits method, which returns an Array containing the digits.
num = 123456
num.digits
# => [6, 5, 4, 3, 2, 1]
num.digits.count
# => 6
EDIT:
To handle negative numbers (thanks @MatzFan), use the absolute value via Integer#abs:
-123456.abs.digits
# => [6, 5, 4, 3, 2, 1]
Sidenote for Ruby 2.4+
I ran some benchmarks on the different solutions, and Math.log10(x).to_i + 1 is actually a lot faster than x.to_s.length. The comment from @Wayne Conrad is out of date. The new solution with digits.count is trailing far behind, especially with larger numbers:
with_10_digits = 2_040_240_420
print Benchmark.measure { 1_000_000.times { Math.log10(with_10_digits).to_i + 1 } }
# => 0.100000 0.000000 0.100000 ( 0.109846)
print Benchmark.measure { 1_000_000.times { with_10_digits.to_s.length } }
# => 0.360000 0.000000 0.360000 ( 0.362604)
print Benchmark.measure { 1_000_000.times { with_10_digits.digits.count } }
# => 0.690000 0.020000 0.710000 ( 0.717554)
with_42_digits = 750_325_442_042_020_572_057_420_745_037_450_237_570_322
print Benchmark.measure { 1_000_000.times { Math.log10(with_42_digits).to_i + 1 } }
# => 0.140000 0.000000 0.140000 ( 0.142757)
print Benchmark.measure { 1_000_000.times { with_42_digits.to_s.length } }
# => 1.180000 0.000000 1.180000 ( 1.186603)
print Benchmark.measure { 1_000_000.times { with_42_digits.digits.count } }
# => 8.480000 0.040000 8.520000 ( 8.577174)
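One caveat about the Math.log10 approach: it goes through Float, so it cannot handle 0 or negative numbers, and for large values close to a power of ten the rounding can make the result off by one. The sketch below (digit_count is a hypothetical helper, not from the answers) counts digits with Integer arithmetic only and can be used as a cross-check.

```ruby
# Hypothetical helper: digit count using only Integer arithmetic.
def digit_count(n)
  n = n.abs
  count = 1
  while n >= 10
    n /= 10
    count += 1
  end
  count
end

big = 999_999_999_999_999          # 15 digits, just below 10**15
puts digit_count(big)              # 15
puts big.to_s.length               # 15
puts Math.log10(big).to_i + 1      # may be off by one because of Float rounding
puts digit_count(0)                # 1 (Math.log10(0) is -Infinity, so .to_i raises)
```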
Although the top-voted loop is nice, it isn't very Ruby-like and will be slow for large numbers. .to_s is a built-in function and will therefore be much faster. Built-in functions are almost universally far faster than hand-written loops or iterators.
Another way:
def ndigits(n)
n=n.abs
(1..1.0/0).each { |i| return i if (n /= 10).zero? }
end
ndigits(1234) # => 4
ndigits(0) # => 1
ndigits(-123) # => 3
If you don't want to use regex, you can use this method:
def self.is_number(string_to_test)
is_number = false
# use to_f to handle float value and to_i for int
string_to_compare = string_to_test.to_i.to_s
string_to_compare_handle_end = string_to_test.to_i
# string has to be the same
if(string_to_compare == string_to_test)
is_number = true
end
# length for fixnum in ruby
size = Math.log10(string_to_compare_handle_end).to_i + 1
# size has to be the same
if(size != string_to_test.length)
is_number = false
end
is_number
end
You don't have to get fancy, you could do as simple as this.
def l(input)
output = 1
while input >= 10**output
output += 1
end
return output
end
puts l(456)
It can be a solution to find out the length/count/size of a fixnum.
irb(main):004:0> x = 2021
=> 2021
irb(main):005:0> puts x.to_s.length
4
=> nil
irb(main):006:0>

How do I test if a string contains two or more vowels in ruby?

How do I test if a string contains two or more vowels?
I have the following code, but it only tests 2 vowels adjacent to each other. I just want to know if the string contains two or more vowels regardless of where they appear in the string.
if /[aeiouy]{2,}/.match(word)
puts word
end
You could use scan which returns an array with all the matches:
if word.scan(/[aeiou]/).count >= 2
puts word
end
You could use something like:
/[aeiouy].*?[aeiouy]/
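Wrapped in a predicate, that pattern might be used as follows. The helper name is made up, y is treated as a vowel as in the question, and the /i flag is an added assumption so that capitals count too.

```ruby
# Hypothetical helper built on the pattern above.
# Regexp#match? needs Ruby 2.4 or newer.
def two_vowels?(word)
  /[aeiouy].*?[aeiouy]/i.match?(word)
end

%w[banana test TESTA rhythm tsk].each do |word|
  puts "#{word}: #{two_vowels?(word)}"
end
# banana: true
# test: false
# TESTA: true
# rhythm: false
# tsk: false
```

The lazy .*? lets any characters sit between the two vowels, which is what the original {2,} quantifier did not allow.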
First some questions:
What is a vowel? In your example you have y. In my eyes, y is not a vowel. What about umlauts?
Only lowercase letters, or capitals as well?
In my example you can adapt the constant VOWELS to your definition.
I think the easiest way is to count the vowels with String#count.
Below is an example with three variants, a to c.
You wrote about two vowels, not two different vowels. My solutions a and b accept any two vowels, even if they are the same one. Variant c only matches if there are at least two different vowels in the word.
VOWELS = 'aeiouyAEIOUY'
%w{
test
teste
testa
}.each{|word|
puts 'a: ' + word if word.count(VOWELS) > 1
puts 'b: ' + word if /[#{VOWELS}].*?[#{VOWELS}]/ =~ word
puts 'c: ' + word if word.scan(/[#{VOWELS}]/).uniq.count > 1
}
I made a benchmark. The count solution is the fastest.
require 'benchmark'
N = 10_000 #Number of Test loops
VOWELS = 'aeiouyAEIOUY'
TESTDATA = %w{
test
teste
testa
}
Benchmark.bmbm(10) {|b|
b.report('count') { N.times { TESTDATA.each{|word| word.count(VOWELS) > 1} } }
b.report('regex') { N.times { TESTDATA.each{|word| /[#{VOWELS}].*?[#{VOWELS}]/ =~ word} } }
b.report('scab') { N.times { TESTDATA.each{|word| word =~ /[#{VOWELS}].*?[#{VOWELS}]/ } } }
b.report('scan/uniq') { N.times { TESTDATA.each{|word| word.scan(/[#{VOWELS}]/).uniq.count > 1 } } }
} #Benchmark
Result:
Rehearsal ---------------------------------------------
count 0.031000 0.000000 0.031000 ( 0.031250)
regex 0.562000 0.000000 0.562000 ( 0.562500)
scab 0.516000 0.000000 0.516000 ( 0.515625)
scan/uniq 0.437000 0.000000 0.437000 ( 0.437500)
------------------------------------ total: 1.546000sec
user system total real
count 0.031000 0.000000 0.031000 ( 0.031250)
regex 0.500000 0.000000 0.500000 ( 0.515625)
scab 0.500000 0.000000 0.500000 ( 0.500000)
scan/uniq 0.422000 0.000000 0.422000 ( 0.437500)

Rabin Karp Implementation too slow in Ruby

I have been working on a small plagiarism detection engine which uses ideas from MOSS.
I need a rolling hash function, inspired by the Rabin-Karp algorithm.
The code I wrote:
#!/usr/bin/env ruby
#Designing a rolling hash function.
#Inspired from the Rabin-Karp Algorithm
module Myth
module Hasher
#Defining a Hash Structure
#A hash is a integer value + line number where the word for this hash existed in the source file
Struct.new('Hash',:value,:line_number)
#For hashing a piece of text we need two parameters
#k-->For building units of k-gram hashes
#q-->Prime which lets calculations stay within range
def calc_hash(text_to_process,k,q)
text_length=text_to_process.length
radix=26
highorder=(radix**(text_length-1))%q
#Individual hashes for k-grams
text_hash=0
#The entire k-grams hashes list for the document
text_hash_string=""
#Preprocessing
for c in 0...k do
text_hash=(radix*text_hash+text_to_process[c].ord)%q
end
text_hash_string << text_hash.to_s << " "
loop=text_length-k
for c in 0..loop do
puts text_hash
text_hash=(radix*(text_hash-text_to_process[c].ord*highorder)+(text_hash[c+k].ord))%q
text_hash_string << text_hash_string << " "
end
end
end
end
I am running it with values -->
calc_hash(text,5,101) where text is a String input.
The code is very slow. Where am I going wrong?
Look at Ruby-Prof, a profiler for Ruby. Use gem install ruby-prof to install it.
Once you have some ideas where the code is lagging, you can use Ruby's Benchmark to try different algorithms to find the fastest.
Nose around on StackOverflow and you'll see lots of places where we'll use Benchmark to test various methods to see which is the fastest. You'll also get an idea of different ways to set up the tests.
For instance, looking at your code, I wasn't sure whether an append, <<, was better than concatenating using + or using string interpolation. Here's the code to test that and the results:
require 'benchmark'
include Benchmark
n = 1_000_000
bm(13) do |x|
x.report("interpolate") { n.times { foo = "foo"; bar = "bar"; "#{foo}#{bar}" } }
x.report("concatenate") { n.times { foo = "foo"; bar = "bar"; foo + bar } }
x.report("append") { n.times { foo = "foo"; bar = "bar"; foo << bar } }
end
ruby test.rb; ruby test.rb
user system total real
interpolate 1.090000 0.000000 1.090000 ( 1.093071)
concatenate 0.860000 0.010000 0.870000 ( 0.865982)
append 0.750000 0.000000 0.750000 ( 0.753016)
user system total real
interpolate 1.080000 0.000000 1.080000 ( 1.085537)
concatenate 0.870000 0.000000 0.870000 ( 0.864697)
append 0.750000 0.000000 0.750000 ( 0.750866)
I was wondering about the effects of using fixed strings versus variables when appending strings, based on @Myth17's comment below:
require 'benchmark'
include Benchmark
n = 1_000_000
bm(13) do |x|
x.report("interpolate") { n.times { foo = "foo"; bar = "bar"; "#{foo}#{bar}" } }
x.report("concatenate") { n.times { foo = "foo"; bar = "bar"; foo + bar } }
x.report("append") { n.times { foo = "foo"; bar = "bar"; foo << bar } }
x.report("append2") { n.times { foo = "foo"; bar = "bar"; "foo" << bar } }
x.report("append3") { n.times { foo = "foo"; bar = "bar"; "foo" << "bar" } }
end
Resulting in:
ruby test.rb;ruby test.rb
user system total real
interpolate 1.330000 0.000000 1.330000 ( 1.326833)
concatenate 1.080000 0.000000 1.080000 ( 1.084989)
append 0.940000 0.010000 0.950000 ( 0.937635)
append2 1.160000 0.000000 1.160000 ( 1.165974)
append3 1.400000 0.000000 1.400000 ( 1.397616)
user system total real
interpolate 1.320000 0.000000 1.320000 ( 1.325286)
concatenate 1.100000 0.000000 1.100000 ( 1.090585)
append 0.930000 0.000000 0.930000 ( 0.936956)
append2 1.160000 0.000000 1.160000 ( 1.157424)
append3 1.390000 0.000000 1.390000 ( 1.392742)
The values are different than my previous test because the code is being run on my laptop.
Appending to a variable is faster than appending to a string literal because of the extra allocation: each time a literal such as "foo" is evaluated, Ruby has to create a new String object before it can append to it.
The big lesson here is we can make a more informed decision when we're writing code because we know what runs faster. At the same time, the differences are not very big, since most code isn't running 1,000,000 loops. Your mileage might vary.
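Apart from the micro-benchmarks, the posted calc_hash has a few lines that look unintended: the loop appends text_hash_string to itself (so the string doubles on every iteration, which alone would explain the slowness), it indexes text_hash (an Integer) where text_to_process was probably meant, and highorder is computed from the text length rather than from k. A minimal sketch of a k-gram rolling hash without those issues follows; kgram_hashes, the default radix of 256, and returning an array are my own choices, not the asker's design.

```ruby
# Sketch of a k-gram rolling hash (Rabin-Karp style); names are illustrative.
def kgram_hashes(text, k, q, radix = 256)
  return [] if text.length < k

  # Weight of the leading character in a k-character window.
  high_order = (radix**(k - 1)) % q

  # Hash of the first window, computed directly.
  hash = 0
  text[0, k].each_char { |ch| hash = (radix * hash + ch.ord) % q }
  hashes = [hash]

  # Slide the window: drop text[i], append text[i + k].
  (0...(text.length - k)).each do |i|
    hash = (radix * (hash - text[i].ord * high_order) + text[i + k].ord) % q
    hashes << hash
  end
  hashes
end

p kgram_hashes("the quick brown fox", 5, 101)
```

Each slide is constant work, so the whole pass is linear in the length of the text. Ruby's % always returns a non-negative result for a positive modulus, so the subtraction needs no extra adjustment.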

Convert Input Value to Integer or Float, as Appropriate Using Ruby

I believe I have a good answer to this issue, but I wanted to make sure ruby-philes didn't have a much better way to do this.
Basically, given an input string, I would like to convert the string to an integer, where appropriate, or a float, where appropriate. Otherwise, just return the string.
I'll post my answer below, but I'd like to know if there is a better way out there.
Ex:
to_f_or_i_or_s("0523.49") #=> 523.49
to_f_or_i_or_s("0000029") #=> 29
to_f_or_i_or_s("kittens") #=> "kittens"
I would avoid using regex whenever possible in Ruby. It's notoriously slow.
def to_f_or_i_or_s(v)
((float = Float(v)) && (float % 1.0 == 0) ? float.to_i : float) rescue v
end
# Proof of Ruby's slow regex
def regex_float_detection(input)
input.match('\.')
end
def math_float_detection(input)
input % 1.0 == 0
end
require 'benchmark'
n = 100_000
Benchmark.bm(30) do |x|
x.report("Regex") { n.times { regex_float_detection("1.1") } }
x.report("Math") { n.times { math_float_detection(1.1) } }
end
# user system total real
# Regex 0.180000 0.000000 0.180000 ( 0.181268)
# Math 0.050000 0.000000 0.050000 ( 0.048692)
A more comprehensive benchmark:
def wattsinabox(input)
input.match('\.').nil? ? Integer(input) : Float(input) rescue input.to_s
end
def jaredonline(input)
((float = Float(input)) && (float % 1.0 == 0) ? float.to_i : float) rescue input
end
def muistooshort(input)
case(input)
when /\A\s*[+-]?\d+\.\d+\z/
input.to_f
when /\A\s*[+-]?\d+(\.\d+)?[eE]\d+\z/
input.to_f
when /\A\s*[+-]?\d+\z/
input.to_i
else
input
end
end
n = 1_000_000
Benchmark.bm(30) do |x|
x.report("wattsinabox") { n.times { wattsinabox("1.1") } }
x.report("jaredonline") { n.times { jaredonline("1.1") } }
x.report("muistooshort") { n.times { muistooshort("1.1") } }
end
# user system total real
# wattsinabox 3.600000 0.020000 3.620000 ( 3.647055)
# jaredonline 1.400000 0.000000 1.400000 ( 1.413660)
# muistooshort 2.790000 0.010000 2.800000 ( 2.803939)
def to_f_or_i_or_s(v)
v.match('\.').nil? ? Integer(v) : Float(v) rescue v.to_s
end
A pile of regexes might be a good idea if you want to handle numbers in scientific notation (which String#to_f does):
def to_f_or_i_or_s(v)
case(v)
when /\A\s*[+-]?\d+\.\d+\z/
v.to_f
when /\A\s*[+-]?\d+(\.\d+)?[eE]\d+\z/
v.to_f
when /\A\s*[+-]?\d+\z/
v.to_i
else
v
end
end
You could mash both to_f cases into one regex if you wanted.
This will, of course, fail when fed '3,14159' in a locale that uses a comma as a decimal separator.
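Depends on security requirements: eval executes whatever code the string contains, so this is only an option for trusted input. Note also that a bare rescue only catches StandardError, so a string that fails to parse as Ruby (a SyntaxError) is not rescued.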
def to_f_or_i_or_s s
eval(s) rescue s
end
I used this method
def to_f_or_i_or_s(value)
return value if value[/[a-zA-Z]/]
i = value.to_i
f = value.to_f
i == f ? i : f
end
CSV has converters which do this.
require "csv"
strings = ["0523.49", "29","kittens"]
strings.each{|s|p s.parse_csv(converters: :numeric).first}
#523.49
#29
#"kittens"
However for some reason it converts "00029" to a float.
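A likely explanation, inferred from how Integer() parses strings rather than stated in the answer: a leading zero is read as an octal prefix, the digit 9 is not valid octal, so the integer conversion fails and the float conversion picks the value up instead. The sketch below shows the three relevant calls.

```ruby
# Integer() treats a leading 0 as an octal prefix, so the digit 9 is rejected.
begin
  Integer("00029")
rescue ArgumentError => e
  puts "ArgumentError: #{e.message}"
end

as_float   = Float("00029")
as_integer = Integer("00029", 10)  # base given explicitly

puts as_float    # 29.0
puts as_integer  # 29
```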

What is the easiest way to remove the first character from a string?

Example:
[12,23,987,43
What is the fastest, most efficient way to remove the "[",
using maybe a chop() but for the first character?
Similar to Pablo's answer above, but a shade cleaner :
str[1..-1]
This returns the substring from index 1 to the last character.
'Hello World'[1..-1]
=> "ello World"
I kind of favor using something like:
asdf = "[12,23,987,43"
asdf[0] = ''
p asdf
# >> "12,23,987,43"
I'm always looking for the fastest and most readable way of doing things:
require 'benchmark'
N = 1_000_000
puts RUBY_VERSION
STR = "[12,23,987,43"
Benchmark.bm(7) do |b|
b.report('[0]') { N.times { "[12,23,987,43"[0] = '' } }
b.report('sub') { N.times { "[12,23,987,43".sub(/^\[+/, "") } }
b.report('gsub') { N.times { "[12,23,987,43".gsub(/^\[/, "") } }
b.report('[1..-1]') { N.times { "[12,23,987,43"[1..-1] } }
b.report('slice') { N.times { "[12,23,987,43".slice!(0) } }
b.report('length') { N.times { "[12,23,987,43"[1..STR.length] } }
end
Running on my Mac Pro:
1.9.3
user system total real
[0] 0.840000 0.000000 0.840000 ( 0.847496)
sub 1.960000 0.010000 1.970000 ( 1.962767)
gsub 4.350000 0.020000 4.370000 ( 4.372801)
[1..-1] 0.710000 0.000000 0.710000 ( 0.713366)
slice 1.020000 0.000000 1.020000 ( 1.020336)
length 1.160000 0.000000 1.160000 ( 1.157882)
Updating to incorporate one more suggested answer:
require 'benchmark'
N = 1_000_000
class String
def eat!(how_many = 1)
self.replace self[how_many..-1]
end
def first(how_many = 1)
self[0...how_many]
end
def shift(how_many = 1)
shifted = first(how_many)
self.replace self[how_many..-1]
shifted
end
alias_method :shift!, :shift
end
class Array
def eat!(how_many = 1)
self.replace self[how_many..-1]
end
end
puts RUBY_VERSION
STR = "[12,23,987,43"
Benchmark.bm(7) do |b|
b.report('[0]') { N.times { "[12,23,987,43"[0] = '' } }
b.report('sub') { N.times { "[12,23,987,43".sub(/^\[+/, "") } }
b.report('gsub') { N.times { "[12,23,987,43".gsub(/^\[/, "") } }
b.report('[1..-1]') { N.times { "[12,23,987,43"[1..-1] } }
b.report('slice') { N.times { "[12,23,987,43".slice!(0) } }
b.report('length') { N.times { "[12,23,987,43"[1..STR.length] } }
b.report('eat!') { N.times { "[12,23,987,43".eat! } }
b.report('reverse') { N.times { "[12,23,987,43".reverse.chop.reverse } }
end
Which results in:
2.1.2
user system total real
[0] 0.300000 0.000000 0.300000 ( 0.295054)
sub 0.630000 0.000000 0.630000 ( 0.631870)
gsub 2.090000 0.000000 2.090000 ( 2.094368)
[1..-1] 0.230000 0.010000 0.240000 ( 0.232846)
slice 0.320000 0.000000 0.320000 ( 0.320714)
length 0.340000 0.000000 0.340000 ( 0.341918)
eat! 0.460000 0.000000 0.460000 ( 0.452724)
reverse 0.400000 0.000000 0.400000 ( 0.399465)
And another using /^./ to find the first character:
require 'benchmark'
N = 1_000_000
class String
def eat!(how_many = 1)
self.replace self[how_many..-1]
end
def first(how_many = 1)
self[0...how_many]
end
def shift(how_many = 1)
shifted = first(how_many)
self.replace self[how_many..-1]
shifted
end
alias_method :shift!, :shift
end
class Array
def eat!(how_many = 1)
self.replace self[how_many..-1]
end
end
puts RUBY_VERSION
STR = "[12,23,987,43"
Benchmark.bm(7) do |b|
b.report('[0]') { N.times { "[12,23,987,43"[0] = '' } }
b.report('[/^./]') { N.times { "[12,23,987,43"[/^./] = '' } }
b.report('[/^\[/]') { N.times { "[12,23,987,43"[/^\[/] = '' } }
b.report('sub+') { N.times { "[12,23,987,43".sub(/^\[+/, "") } }
b.report('sub') { N.times { "[12,23,987,43".sub(/^\[/, "") } }
b.report('gsub') { N.times { "[12,23,987,43".gsub(/^\[/, "") } }
b.report('[1..-1]') { N.times { "[12,23,987,43"[1..-1] } }
b.report('slice') { N.times { "[12,23,987,43".slice!(0) } }
b.report('length') { N.times { "[12,23,987,43"[1..STR.length] } }
b.report('eat!') { N.times { "[12,23,987,43".eat! } }
b.report('reverse') { N.times { "[12,23,987,43".reverse.chop.reverse } }
end
Which results in:
# >> 2.1.5
# >> user system total real
# >> [0] 0.270000 0.000000 0.270000 ( 0.270165)
# >> [/^./] 0.430000 0.000000 0.430000 ( 0.432417)
# >> [/^\[/] 0.460000 0.000000 0.460000 ( 0.458221)
# >> sub+ 0.590000 0.000000 0.590000 ( 0.590284)
# >> sub 0.590000 0.000000 0.590000 ( 0.596366)
# >> gsub 1.880000 0.010000 1.890000 ( 1.885892)
# >> [1..-1] 0.230000 0.000000 0.230000 ( 0.223045)
# >> slice 0.300000 0.000000 0.300000 ( 0.299175)
# >> length 0.320000 0.000000 0.320000 ( 0.325841)
# >> eat! 0.410000 0.000000 0.410000 ( 0.409306)
# >> reverse 0.390000 0.000000 0.390000 ( 0.393044)
Here's another update on faster hardware and a newer version of Ruby:
2.3.1
user system total real
[0] 0.200000 0.000000 0.200000 ( 0.204307)
[/^./] 0.390000 0.000000 0.390000 ( 0.387527)
[/^\[/] 0.360000 0.000000 0.360000 ( 0.360400)
sub+ 0.490000 0.000000 0.490000 ( 0.492083)
sub 0.480000 0.000000 0.480000 ( 0.487862)
gsub 1.990000 0.000000 1.990000 ( 1.988716)
[1..-1] 0.180000 0.000000 0.180000 ( 0.181673)
slice 0.260000 0.000000 0.260000 ( 0.266371)
length 0.270000 0.000000 0.270000 ( 0.267651)
eat! 0.400000 0.010000 0.410000 ( 0.398093)
reverse 0.340000 0.000000 0.340000 ( 0.344077)
Why is gsub so slow?
After doing a search/replace, gsub has to check for possible additional matches before it can tell if it's finished. sub only does one and finishes. Consider gsub like it's a minimum of two sub calls.
Also, it's important to remember that gsub, and sub can also be handicapped by poorly written regex which match much more slowly than a sub-string search. If possible anchor the regex to get the most speed from it. There are answers here on Stack Overflow demonstrating that so search around if you want more information.
We can use slice to do this:
val = "abc"
=> "abc"
val.slice!(0)
=> "a"
val
=> "bc"
Using slice! we can delete any character by specifying its index.
Ruby 2.5+
As of Ruby 2.5 you can use delete_prefix or delete_prefix! to achieve this in a readable manner.
In this case "[12,23,987,43".delete_prefix("[").
More info here:
Official docs
https://blog.jetbrains.com/ruby/2017/10/10-new-features-in-ruby-2-5/
https://bugs.ruby-lang.org/issues/12694
'invisible'.delete_prefix('in') #=> "visible"
'pink'.delete_prefix('in') #=> "pink"
N.B. you can also use this to remove items from the end of a string with delete_suffix and delete_suffix!
'worked'.delete_suffix('ed') #=> "work"
'medical'.delete_suffix('ed') #=> "medical"
Docs
https://bugs.ruby-lang.org/issues/13665
Edit:
Using the Tin Man's benchmark setup, it looks pretty quick too (under the last two entries delete_p and delete_p!). Doesn't quite pip the previous faves for speed, though is very readable.
2.5.0
user system total real
[0] 0.174766 0.000489 0.175255 ( 0.180207)
[/^./] 0.318038 0.000510 0.318548 ( 0.323679)
[/^\[/] 0.372645 0.001134 0.373779 ( 0.379029)
sub+ 0.460295 0.001510 0.461805 ( 0.467279)
sub 0.498351 0.001534 0.499885 ( 0.505729)
gsub 1.669837 0.005141 1.674978 ( 1.682853)
[1..-1] 0.199840 0.000976 0.200816 ( 0.205889)
slice 0.279661 0.000859 0.280520 ( 0.285661)
length 0.268362 0.000310 0.268672 ( 0.273829)
eat! 0.341715 0.000524 0.342239 ( 0.347097)
reverse 0.335301 0.000588 0.335889 ( 0.340965)
delete_p 0.222297 0.000832 0.223129 ( 0.228455)
delete_p! 0.225798 0.000747 0.226545 ( 0.231745)
I prefer this:
str = "[12,23,987,43"
puts str[1..-1]
>> 12,23,987,43
If you always want to strip leading brackets:
"[12,23,987,43".gsub(/^\[/, "")
If you just want to remove the first character (on Ruby 1.8, only if you know the string contains no multibyte characters; from Ruby 1.9 on, string indexes count characters, not bytes):
"[12,23,987,43"[1..-1]
or
"[12,23,987,43".slice(1..-1)
Inefficient alternative:
str.reverse.chop.reverse
For example : a = "One Two Three"
1.9.2-p290 > a = "One Two Three"
=> "One Two Three"
1.9.2-p290 > a = a[1..-1]
=> "ne Two Three"
1.9.2-p290 > a = a[1..-1]
=> "e Two Three"
1.9.2-p290 > a = a[1..-1]
=> " Two Three"
1.9.2-p290 > a = a[1..-1]
=> "Two Three"
1.9.2-p290 > a = a[1..-1]
=> "wo Three"
In this way you can remove one by one first character of the string.
Easy way:
str = "[12,23,987,43"
removed = str[1..str.length]
Awesome way:
class String
def reverse_chop()
self[1..self.length]
end
end
"[12,23,987,43".reverse_chop()
(Note: prefer the easy way :) )
Thanks to @the-tin-man for putting together the benchmarks!
Alas, I don't really like any of those solutions. Either they require an extra step to get the result ([0] = '', .strip!) or they aren't very semantic/clear about what's happening ([1..-1]: "Um, a range from 1 to negative 1? Yearg?"), or they are slow or lengthy to write out (.gsub, .length).
What we are attempting is a 'shift' (in Array parlance), but returning the remaining characters, rather than what was shifted off. Let's use our Ruby to make this possible with strings! We can use the speedy bracket operation, but give it a good name, and take an arg to specify how much we want to chomp off the front:
class String
def eat!(how_many = 1)
self.replace self[how_many..-1]
end
end
But there is more we can do with that speedy-but-unwieldy bracket operation. While we are at it, for completeness, let's write a #shift and #first for String (why should Array have all the fun‽‽), taking an arg to specify how many characters we want to remove from the beginning:
class String
def first(how_many = 1)
self[0...how_many]
end
def shift(how_many = 1)
shifted = first(how_many)
self.replace self[how_many..-1]
shifted
end
alias_method :shift!, :shift
end
Ok, now we have a good clear way of pulling characters off the front of a string, with a method that is consistent with Array#first and Array#shift (which really should be a bang method??). And we can easily get the modified string as well with #eat!. Hm, should we share our new eat!ing power with Array? Why not!
class Array
def eat!(how_many = 1)
self.replace self[how_many..-1]
end
end
Now we can:
> str = "[12,23,987,43" #=> "[12,23,987,43"
> str.eat! #=> "12,23,987,43"
> str #=> "12,23,987,43"
> str.eat!(3) #=> "23,987,43"
> str #=> "23,987,43"
> str.first(2) #=> "23"
> str #=> "23,987,43"
> str.shift!(3) #=> "23,"
> str #=> "987,43"
> arr = [1,2,3,4,5] #=> [1, 2, 3, 4, 5]
> arr.eat! #=> [2, 3, 4, 5]
> arr #=> [2, 3, 4, 5]
That's better!
str = "[12,23,987,43"
str[0] = ""
class String
def bye_felicia()
felicia = self.strip[0] #first char, not first space.
self.sub(felicia, '')
end
end
Using regex:
str = 'string'
n = 1 #to remove first n characters
str[/.{#{str.size-n}}\z/] #=> "tring"
I find str.delete(str[0]) a nice solution for its readability, though I cannot attest to its performance. Be aware that String#delete removes every occurrence of that character, not just the first one.
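The difference shows up as soon as the first character appears again later in the string. A quick sketch with a made-up input:

```ruby
str = "[12,[23],987"

deleted = str.delete(str[0])   # removes every "["
sliced  = str[1..-1]           # removes only the first character

puts deleted  # 12,23],987
puts sliced   # 12,[23],987
```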
list = [1,2,3,4]
list.drop(1)
# => [2,3,4]
Array#drop skips one or more elements from the start of the array. It does not mutate the array, and it returns a new array with the remaining elements rather than the dropped element.

Resources