String#split strange behavior [duplicate] - ruby

I observed a strange behavior of the split method on a String.
"1..2".split('..') # => ['1', '2']
"1..2".split('..', 2) # => ['1', '2']
"..2".split('..') # => ['', '2']
"..2".split('..', 2) # => ['', '2']
Everything is as expected so far, but now:
"1..".split('..') # => ['1']
"1..".split('..', 2) # => ['1', '']
I would expect the first call to return the same result as the second.
Does anyone have a good explanation for why "1..".split('..') returns an array with just one element? Or is it an inconsistency in Ruby?

According to the Ruby String documentation for split:
If the limit parameter is omitted, trailing null fields are suppressed.
Regarding the limit parameter, the Ruby documentation isn't totally complete. Here is a little more detail:
If limit is positive, split returns at most that number of fields. The last element of the returned array is the "rest of the string", or a single null string ("") if there are fewer fields than limit and there's a trailing delimiter in the original string.
Examples:
"2_3_4_".split('_',3)
=> ["2", "3", "4_"]
"2_3_4_".split('_',4)
=> ["2", "3", "4", ""]
If limit is zero [not mentioned in the documentation], split appears to return all of the parsed fields, and no trailing null string ("") element if there is a trailing delimiter in the original string. I.e., it behaves as if limit were not present. (It may be implemented as a default value.)
Example:
"2_3_4_".split('_',0)
=> ["2", "3", "4"]
If limit is negative, split returns all of the parsed fields and a trailing null string element if there is a trailing delimiter in the original string.
Example:
"2_3_4".split('_',-2)
=> ["2", "3", "4"]
"2_3_4".split('_',-5)
=> ["2", "3", "4"]
"2_3_4_".split('_',-2)
=> ["2", "3", "4", ""]
"2_3_4_".split('_',-5)
=> ["2", "3", "4", ""]
It would seem that something a little more useful or interesting could have been done with the negative limit.
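The four cases described above can be checked side by side on the string from the question. This is a minimal sketch; the variable names are only illustrative.

```ruby
# Compare how String#split treats a trailing delimiter for each kind of limit.
s = "1.."

no_limit = s.split('..')      # limit omitted: trailing empty field is suppressed
zero     = s.split('..', 0)   # limit 0: behaves like an omitted limit
positive = s.split('..', 2)   # positive limit: at most 2 fields, trailing "" kept
negative = s.split('..', -1)  # negative limit: no cap on fields, trailing "" kept

p no_limit  # => ["1"]
p zero      # => ["1"]
p positive  # => ["1", ""]
p negative  # => ["1", ""]
```

So if trailing empty fields matter, passing a negative limit such as -1 is the usual way to keep them without capping the number of fields.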

Related

Convert array string objects into hashes?

I have an array:
a = ["us.production => 1", "us.stats => 1", "us.stats.total_active => 1", "us.stats.inactive => 0"]
How can I modify it into a hash object? e.g.:
h = {"us.production" => 1, "us.stats" => 1, "us.stats.total_active" => 1, "us.stats.inactive" => 0}
Thank you,
If the pattern in your strings is consistent, you can try the following:
h = a.map { |x| x.split(' => ') }.to_h
# => {"us.production"=>"1", "us.stats"=>"1", "us.stats.total_active"=>"1", "us.stats.inactive"=>"0"}
It is better to use split(/\s*=>\s*/) rather than split(' => '), since it tolerates varying whitespace around the arrow.
You can split every string with String#split and then convert an array of pairs into a hash with Array#to_h:
a = ["us.production => 1", "us.stats => 1", "us.stats.total_active => 1", "us.stats.inactive => 0"]
pairs = a.map{|s| s.split(/\s*=>\s*/)}
# => [["us.production", "1"], ["us.stats", "1"], ["us.stats.total_active", "1"], ["us.stats.inactive", "0"]]
pairs.to_h
# => {"us.production"=>"1", "us.stats"=>"1", "us.stats.total_active"=>"1", "us.stats.inactive"=>"0"}
/\s*=>\s*/ is a regular expression that matches any amount of whitespace (\s*), then =>, then again any amount of whitespace. Because it is used as the String#split delimiter, the matched part of the string does not appear in the resulting pair.
The other answers to date are incorrect because they leave the values as strings, whereas the spec is that they be integers. That can be easily corrected. One way is to change s.split(/\s*=>\s*/) in @mrzasa's answer to k,v = s.split(/\s*=>\s*/); [k,v.to_i]. Another way is to tack .transform_values(&:to_i) onto the ends of the expressions given in those answers. I expect the authors of those answers either didn't notice that integers were required or intended to leave the (rather uninteresting) conversion as an exercise for the OP.
To make a single pass through the array and avoid the creation of a temporary array and local variables (other than block variables), I suggest using Enumerable#each_with_object (rather than map and to_h), and use regular expressions to extract both keys and values (rather than using String#split):
a = ["us.production => 1", "us.stats=>1", "us.stats.total_active => 1"]
a.each_with_object({}) { |s,h| h[s[/.*[^ ](?= *=>)/]] = s[/\d+\z/].to_i }
#=> {"us.production"=>1, "us.stats"=>1, "us.stats.total_active"=>1}
The first regular expression reads: "match zero or more characters (.*) followed by a character that is not a space ([^ ]), provided that it is followed by zero or more spaces ( *) and then the string =>". (?= *=>) is a positive lookahead.
The second regular expression reads: "match one or more digits (\d+) at the end of the string (the anchor \z)". If that string could represent a negative integer, change that regex to /-?\d+\z/ (? makes the minus sign optional).
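The two corrections mentioned above for the split-based answers can be written out as follows. This is a sketch using the array from the question; h1 and h2 are just illustrative names, and Hash#transform_values assumes Ruby 2.4 or later.

```ruby
a = ["us.production => 1", "us.stats => 1", "us.stats.total_active => 1", "us.stats.inactive => 0"]

# Variant 1: convert each value to an Integer while building the pairs.
h1 = a.map { |s| k, v = s.split(/\s*=>\s*/); [k, v.to_i] }.to_h

# Variant 2: build a hash of strings first, then convert all values.
h2 = a.map { |s| s.split(/\s*=>\s*/) }.to_h.transform_values(&:to_i)

p h1
# => {"us.production"=>1, "us.stats"=>1, "us.stats.total_active"=>1, "us.stats.inactive"=>0}
p h1 == h2 # => true
```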

Remove empty value before converting string to array

ids = "1, 4, 5, "
ids.split(',') # => ["1", " 4", " 5", " "]
ids.split(',').map(&:to_i) # => [1, 4, 5, 0]
How do I remove that empty value before it becomes a zero?
You can also use String#scan:
ids = "1,4,5,"
ids.scan(/\d+/).map(&:to_i)
# => [1, 4, 5]
This doesn't happen in Ruby 2.2+:
ids = "1,4,5,"
ids.split(',')
# => ["1", "4", "5"]
RUBY_VERSION # => "2.2.0"
The simple thing to do is run a preflight check on your data and normalize it to what it's supposed to be, BEFORE trying to process it:
ids = "1,4,5,"
ids.chop! if ids[-1] == ','
ids # => "1,4,5"
ids.split(',')
# => ["1", "4", "5"]
You could be a bit more rigorous in the test, since the end of the line might also contain whitespace which would throw off the cleanup.
Also, you're dealing with comma-delimited data, so consider using the built-in CSV class, which is designed to work with such strings.
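As one possible way to make the cleanup tolerant of stray whitespace, the fields can be stripped and the blank ones dropped before converting. This is only a sketch; the input string with spaces after the commas is an assumed example.

```ruby
# Assumed input: comma-separated ids with spaces and a trailing delimiter.
ids = "1, 4, 5, "

fields = ids.split(',').map(&:strip)  # remove whitespace around each field
clean  = fields.reject(&:empty?)      # drop fields that are now empty
numbers = clean.map(&:to_i)

p numbers # => [1, 4, 5]
```

Rejecting empty fields before calling to_i is what keeps the blank trailing entry from turning into 0.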

Regex to catch groups of same digits in Ruby

Is it possible to catch all groups of identical digits in a string with a regex in Ruby? I'm not familiar with regex.
I mean: regex on "1112234444" will produce ["111", "22", "3", "4444"]
I know I can use (\d)(\1*), but it gives me two groups in each match: ["1", "11"], ["2", "2"], ["3", ""], ["4", "444"]
How can I get 1 group in each match? Thanks.
Here, give this a shot:
((\d)\2*)
You can use this regex
((\d)\2*)
Group 1 captures the value you need.
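To show how group 1 can be pulled out in practice, here is a small sketch with String#scan. When a regex has capture groups, scan returns one array of groups per match, so the first element of each is kept.

```ruby
s = "1112234444"

# Each match yields [whole_run, single_digit]; keep only the whole run.
matches = s.scan(/((\d)\2*)/)
runs = matches.map(&:first)

p matches # => [["111", "1"], ["22", "2"], ["3", "3"], ["4444", "4"]]
p runs    # => ["111", "22", "3", "4444"]
```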
My first quick answer was rightfully criticized for having no explanation for the code. So here's another one, better in all respects ;-)
We exploit the fact that the elements whose runs we want are digits and they are easy to enumerate by hand. So we construct a readable regex which means "a run of zeros, or a run of ones, ... or a run of nines". And we use the right method for the job, String#scan:
irb> "1112234444".scan(/0+|1+|2+|3+|4+|5+|6+|7+|8+|9+/)
=> ["111", "22", "3", "4444"]
For the record, here's my original answer:
irb> s = "1112234444"
=> "1112234444"
irb> rx = /(0+|1+|2+|3+|4+|5+|6+|7+|8+|9+)/
=> /(0+|1+|2+|3+|4+|5+|6+|7+|8+|9+)/
irb> s.split(rx).reject(&:empty?)
=> ["111", "22", "3", "4444"]

Break up variable into array in Ruby

I'd like to take a variable that I have, and turn it into an array separated by the character of my choosing. In the example below, that separator is %
dump = "1%2%3%apple%car%yellow"
into
Array= [1,2,3,apple,car,yellow]
Use String#split:
"1%2%3%apple%car%yellow".split('%')
# => ["1", "2", "3", "apple", "car", "yellow"]
(Note that every element of the returned array is a string, even the ones containing digits.)
From the docs:
split(pattern=$;, [limit]) → anArray
Divides str into substrings based on a delimiter, returning an array of these substrings.
You can pass a string like above ('%'), or a regular expression.
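For illustration, here is a sketch of the regular-expression form, plus an optional step that turns the purely numeric fields into integers. The second input string and the conversion step are assumptions added for the example, not part of the question.

```ruby
dump = "1%2%3%apple%car%yellow"

# A regex delimiter can match several separators at once (here % or ;).
parts = "1%2;3%apple".split(/[%;]/)
p parts # => ["1", "2", "3", "apple"]

# Hypothetical post-processing: convert fields made only of digits to Integers.
mixed = dump.split('%').map { |x| x =~ /\A\d+\z/ ? x.to_i : x }
p mixed # => [1, 2, 3, "apple", "car", "yellow"]
```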

Match sequences of consecutive characters in a string

I have the string "111221" and want to match all sets of consecutive equal integers: ["111", "22", "1"].
I know that there is a special regex thingy to do that but I can't remember and I'm terrible at Googling.
Using regex in Ruby 1.8.7+:
s = "111221"
p s.scan(/((\d)\2*)/).map(&:first)
#=> ["111", "22", "1"]
This works because (\d) captures any digit, and then \2* captures zero-or-more of whatever that group (the second opening parenthesis) matched. The outer (…) is needed to capture the entire match as a result in scan. Finally, scan alone returns:
[["111", "1"], ["22", "2"], ["1", "1"]]
…so we need to run through and keep just the first item in each array. In Ruby 1.8.6 (which doesn't have Symbol#to_proc for convenience):
p s.scan(/((\d)\2*)/).map{ |x| x.first }
#=> ["111", "22", "1"]
With no Regex, here's a fun one (matching any char) that works in Ruby 1.9.2:
p s.chars.chunk{|c|c}.map{ |n,a| a.join }
#=> ["111", "22", "1"]
Here's another version that should work even in Ruby 1.8.6:
p s.scan(/./).inject([]){|a,c| (a.last && a.last[0]==c[0] ? a.last : a)<<c; a }
# => ["111", "22", "1"]
"111221".gsub(/(.)(\1)*/).to_a
#=> ["111", "22", "1"]
This uses the form of String#gsub that does not have a block and therefore returns an enumerator. It appears gsub was bestowed with that option in v2.0.
I found that this works. The regex first captures a single character in one group, then captures any further repetitions of that character in a second group. scan therefore returns an array of two-element arrays, where the first element is the initial character and the second is any additional repeats of it. Joining each pair gives the runs of repeated characters:
input = "WWBWWWWBBBWWWWWWWB3333!!!!"
repeated_chars = input.scan(/(.)(\1*)/)
# => [["W", "W"], ["B", ""], ["W", "WWW"], ["B", "BB"], ["W", "WWWWWW"], ["B", ""], ["3", "333"], ["!", "!!!"]]
repeated_chars.map(&:join)
# => ["WW", "B", "WWWW", "BBB", "WWWWWWW", "B", "3333", "!!!!"]
As an alternative, I found that I could create a new Regexp object that matches one or more occurrences of each unique character in the input string, as follows:
input = "WWBWWWWBBBWWWWWWWB3333!!!!"
regexp = Regexp.new("#{input.chars.uniq.join("+|")}+")
#=> regexp created for this example will look like: /W+|B+|3+|!+/
and then use that Regex object as an argument for scan to split out all the repeated characters, as follows:
input.scan(regexp)
# => ["WW", "B", "WWWW", "BBB", "WWWWWWW", "B", "3333", "!!!!"]
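One caveat with building the pattern by string interpolation: if the input contains regex metacharacters such as ".", "+" or "(", the generated pattern can misbehave or fail to compile. A possible variation, sketched here with an assumed input string, escapes each character with Regexp.escape first.

```ruby
# Assumed input containing characters that are special in a regex.
input = "a..b++++(("

# Escape every unique character, then allow one or more repetitions of each.
pattern = input.chars.uniq.map { |c| "#{Regexp.escape(c)}+" }.join("|")
regexp = Regexp.new(pattern)

runs = input.scan(regexp)
p runs # => ["a", "..", "b", "++++", "(("]
```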
You can try this (C#):
string str = "111221";
string pattern = @"(\d)(\1)+";
Hope this helps.
