The code '////'.split('/') returns [], while I expected ['', '', '', '', '']. If this is a feature of Ruby, why was it designed this way?
By default, splitting a string that consists only of delimiters returns an empty array.
To keep the empty fields, pass a negative limit as the second argument to split:
'////'.split('/',-1)
=>
["", "", "", "", ""]
If the limit parameter is omitted, trailing null fields are suppressed. If limit is a positive number, at most that number of fields will be returned (if limit is 1, the entire string is returned as the only entry in an array). If negative, there is no limit to the number of fields returned, and trailing null fields are not suppressed
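As a rough illustration of the three cases in the quoted documentation, here is a minimal sketch. The helper name split_modes and the sample string are made up for this example; only String#split itself comes from Ruby.

```ruby
# Hypothetical helper for illustration: runs split with each kind of limit.
def split_modes(str, sep)
  {
    default:  str.split(sep),      # limit omitted: trailing empty fields dropped
    positive: str.split(sep, 2),   # at most 2 fields, the rest is left unsplit
    negative: str.split(sep, -1)   # no limit, trailing empty fields kept
  }
end

modes = split_modes('a//b//', '/')
p modes[:default]   # => ["a", "", "b"]
p modes[:positive]  # => ["a", "/b//"]
p modes[:negative]  # => ["a", "", "b", "", ""]
```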
Investigating the behaviour of the split method suggests that this is a deliberate shortcut: it simply drops the empty array elements after the last non-empty field, as shown below:
'////'.split('/')
=> []
'//a//'.split('/')
=> ["", "", "a"]
This design provides a convenience for parsing strings with trailing delimiters. For example:
'1␣2␣3␣␣'.split('␣') will now give ['1', '2', '3'] rather than ['1', '2', '3', '', ''].
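This feature exists purely to simplify the workflow.
However, I don't like this feature because it breaks the purity of this method. To achieve the effect above explicitly, you would only need to strip the trailing delimiters first (String#rstrip takes no argument in Ruby, so something like sub(/␣+\z/, '') is needed) before calling split('␣') on '1␣2␣3␣␣'.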
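The explicit alternative mentioned above could be sketched as follows. The helper name split_without_trailing is hypothetical, and a comma stands in for the ␣ delimiter; the idea is to strip trailing delimiters by hand and then split with a negative limit so that nothing else is suppressed.

```ruby
# Hypothetical helper: removes trailing delimiters explicitly, then splits
# with limit -1 so that split itself suppresses nothing.
def split_without_trailing(str, sep)
  trailing = /(?:#{Regexp.escape(sep)})+\z/  # one or more delimiters at the end
  str.sub(trailing, '').split(sep, -1)
end

p split_without_trailing('1,2,3,,', ',')   # => ["1", "2", "3"]
p split_without_trailing(',,1,,2,', ',')   # => ["", "", "1", "", "2"]
```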
I get this result (notice that the first "" is for the preceding empty match):
"babab".split("b")
# => ["", "a", "a"]
By replacing "a" with an empty string in the input above as follows,
"bbb".split("b")
I expected to get the following result:
["", "", ""]
But in reality, I get:
[]
What is the logic behind this?
The logic is described in the documentation:
If the limit parameter is omitted, trailing null fields are suppressed.
Trailing empty fields are removed, but not leading ones.
If, by any chance, what you were asking is "yeah, but where's the logic in that?", then imagine we're parsing some CSV.
fname,sname,id,email,status
,,1,sergio#example.com,
We want the first two positions to remain empty (rather than be removed, which would make fname become 1 and sname become sergio#example.com).
We care less about trailing empty fields. Removed or kept, they don't shift data.
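To make the CSV argument concrete, here is a small sketch. The helper name row_to_hash is invented for this example, and it uses a plain split rather than a real CSV parser.

```ruby
# Hypothetical helper: pairs header names with the fields of one row.
def row_to_hash(header_line, row_line)
  keys   = header_line.split(',')
  values = row_line.split(',')   # default limit: trailing empty field dropped
  keys.zip(values).to_h          # zip fills the missing trailing value with nil
end

header = 'fname,sname,id,email,status'
row    = ',,1,sergio#example.com,'
record = row_to_hash(header, row)

p record['fname']   # => ""   (leading empty field kept, so nothing shifts)
p record['id']      # => "1"
p record['status']  # => nil  (trailing empty field was dropped)
```

Because the leading empty fields survive, id and email still line up with their columns; only the trailing status is missing.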
I have a string "990822". I want to know if the string starts with "99".
I could achieve this by getting the first two characters of the string and then checking whether they are equal to "99". How do I get the first two characters of a string?
You can use String#start_with?:
"990822".start_with?("99") #=> true
Consider using the method start_with?.
s = "990822"
=> "990822"
s.start_with? "99"
=> true
You can use a range to index into the string:
"990822"[0...2]
# => "99"
See the String docs
To get the first two characters, the most straightforward way is:
"990822"[0, 2] # => "99"
Using a range inside the method [] is less straightforward, and it also creates a range object that is immediately discarded, which is wasteful.
However, the whole question is actually an XY problem: the real goal is to test the prefix, and start_with? does that directly.
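For comparison, a short sketch of both approaches from this thread. The helper names first_two and starts_with_99? are made up for illustration.

```ruby
# Hypothetical helper names, for illustration only.
def first_two(str)
  str[0, 2]                # start index 0, length 2
end

def starts_with_99?(str)
  str.start_with?('99')    # answers the real question directly
end

p first_two('990822')          # => "99"
p first_two('990822') == '99'  # => true
p starts_with_99?('990822')    # => true
p starts_with_99?('190822')    # => false
```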
Is there a way to obtain:
"[][][]".split('[]')
#=> ["", "", ""]
instead of
#=>[]
without having to write a function?
The behavior is surprising here because sometimes irb would respond as expected:
"[]a".split('[]')
#=> ["", "a"]
From the docs:
If the limit parameter is omitted, trailing null fields are suppressed. If limit is a positive number, at most that number of fields will be returned (if limit is 1, the entire string is returned as the only entry in an array). If negative, there is no limit to the number of fields returned, and trailing null fields are not suppressed.
And so:
"[][][]".split("[]", -1)
# => ["", "", "", ""]
This yields four empty strings rather than your three, but if you think about it, it's the only result that makes sense. If you split ,,, on each comma, you would expect to get four empty strings as well, since there's one empty item "before" the first comma and one "after" the last.
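The "one more field than delimiters" reasoning can be checked with a small sketch. The helper names field_count and delimiter_count are hypothetical, and the relationship is only claimed here for non-empty input strings.

```ruby
# Hypothetical helpers for illustration.
def field_count(str, sep)
  str.split(sep, -1).length   # negative limit keeps every field
end

def delimiter_count(str, sep)
  str.scan(sep).length        # number of non-overlapping delimiter matches
end

[[',,,', ','], ['[][][]', '[]'], ['[]a[][]', '[]']].each do |str, sep|
  puts "#{str.inspect}: #{delimiter_count(str, sep)} delimiters, #{field_count(str, sep)} fields"
end
```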
String#split takes two arguments: a pattern to split on, and a limit to the number of results returned. In this case, limit can help us.
The documentation for String#split says:
If the limit parameter is omitted, trailing null fields are suppressed. If limit is a positive number, at most that number of fields will be returned (if limit is 1, the entire string is returned as the only entry in an array).
The key phrase here is "trailing null fields are suppressed". In other words, if you have extra, empty matches at the end of the string, they'll be dropped from the result unless you have set a limit.
Here's an example:
"[]a[][]".split("[]")
#=> ["", "a"]
You might expect to get ["", "a", "", ""], but because trailing null fields are suppressed, everything after the last non-empty match (the a) is dropped.
We could set a limit, and only get that many results:
"[]a[][]".split("[]", 3)
#=> ["", "a", "[]"]
In this case, since we've asked for 3 results, the last [] is ignored and forms part of the last result. This is useful when we know how many results we expect, but not so useful in your specific case.
Fortunately, the docs continue:
If negative, there is no limit to the number of fields returned, and trailing null fields are not suppressed.
In other words, we can pass a limit of -1, and get all the matches, even the trailing empty ones:
"[]a[][]".split('[]', -1)
#=> ["", "a", "", ""]
This even works when all the matches are empty:
"[][][]".split('[]', -1)
#=> ["", "", "", ""]
I've been searching for an answer to this in Ruby for a little while now and haven't found a good solution. What I am trying to figure out is how to split a string wherever the next character doesn't match the previous one and collect the groupings into an array, i.e.
'aaaabbbbzzxxxhhnnppp'
becomes
['aaaa', 'bbbb', 'zz', 'xxx', 'hh', 'nn', 'ppp']
I know I could just iterate over each char in the string and check for a change, but I am curious whether there's anything built-in that could tackle this in an elegant manner.
Doable with a simple regex:
'aaaabbbbzzxxxhhnnppp'.scan(/((.)\2*)/).map{|x| x[0]}
=> ["aaaa", "bbbb", "zz", "xxx", "hh", "nn", "ppp"]
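As an alternative to the regex, here is a sketch using Enumerable#chunk_while (available since Ruby 2.3). This is a different technique from the scan shown above, and the helper name group_runs is made up for the example.

```ruby
# Hypothetical helper: groups consecutive identical characters.
def group_runs(str)
  str.each_char                        # enumerator over single characters
     .chunk_while { |a, b| a == b }    # keep chunking while neighbours are equal
     .map(&:join)                      # turn each array of chars back into a string
end

p group_runs('aaaabbbbzzxxxhhnnppp')
# => ["aaaa", "bbbb", "zz", "xxx", "hh", "nn", "ppp"]
```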
I've got a string Unnecessary:12357927251data and I need to select everything after the colon and the digits. I am doing it with a Regexp.
string.scan(/:\d+.+$/)
This will give me :12357927251data, but can I select only the part I need, the .+ (data)?
Anything in parentheses in a regexp will be captured as a group, which you can access in $1, $2, etc. or by using [] on a match object:
string.match(/:\d+(.+)$/)[1]
If you use scan with capturing groups, you will get an array of arrays of the groups:
"Unnecessary:123data\nUnnecessary:5791next".scan(/:\d+(.+)$/)
=> [["data"], ["next"]]
Use parentheses in your regular expression and the result will be broken out into an array. For example:
x='Unnecessary:12357927251data'
x.scan(/(:\d+)(.+)$/)
=> [[":12357927251", "data"]]
x.scan(/:\d+(.+$)/).flatten
=> ["data"]
Assuming that you are trying to get the string 'data' from your string, then you can use:
string.match(/.*:\d*(.*)/)[1]
String#match returns a MatchData object. You can then index into that MatchData object to find the part of the string that you want.
(The element at index 0 of the MatchData is the entire matched text, and the element at index 1 is the part of the string captured by the first pair of parentheses.)
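To show what the MatchData indices hold, here is a brief sketch using the pattern from the question. The helper names describe_match and capture_or_nil are hypothetical. String#match returns nil when nothing matches, so the sketch guards against that.

```ruby
# Hypothetical helper: exposes the pieces of a MatchData object.
def describe_match(str)
  m = str.match(/:\d+(.+)$/)
  return nil if m.nil?                  # no match: avoid calling [] on nil
  { whole: m[0], group: m[1], before: m.pre_match }
end

# Hypothetical helper: String#[] with a regexp and a capture index
# returns the capture, or nil if there is no match.
def capture_or_nil(str)
  str[/:\d+(.+)$/, 1]
end

info = describe_match('Unnecessary:12357927251data')
p info[:whole]   # => ":12357927251data"
p info[:group]   # => "data"
p info[:before]  # => "Unnecessary"
p capture_or_nil('no colon here')  # => nil
```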
Try this: /(?<=\:)\d+.+$/
It changes the colon to a positive look-behind so that the colon does not appear in the output (the digits are still part of the match). Note that the colon is not a metacharacter on its own, so escaping it with a backslash is harmless but not required.
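A short sketch of what the look-behind version returns, next to a variant that uses \K (which discards everything matched before it) to drop the digits as well. The \K variant is an alternative swapped in here, not part of the answer above, and both helper names are hypothetical.

```ruby
# Hypothetical helpers for illustration.
def with_lookbehind(str)
  str.scan(/(?<=:)\d+.+$/)   # colon excluded, digits still matched
end

def with_keep(str)
  str.scan(/:\d+\K.+$/)      # \K drops the colon and digits from the result
end

s = 'Unnecessary:12357927251data'
p with_lookbehind(s)  # => ["12357927251data"]
p with_keep(s)        # => ["data"]
```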
Using IRB
irb(main):004:0> "Unnecessary:12357927251data".scan(/:\d+(.+)$/)
=> [["data"]]