I'm trying to use Ruby to split to the right of a number.
For example: H2SO4
How do you do this?
I'd like to output ["H2", "SO4"]
x.split(/\d+/) yields: ["H", "SO"]
x.split(//) yields: ["H", "2", "S", "O", "4"]
Both cool but not exactly what I'm looking for.
x.scan(/[A-Za-z]*\d+/)
This means break it into groups, each of which contains 0 or more letters, then 1 or more digits. Or if the non-digits can be anything:
x.scan(/\D*\d+/)
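As a quick check of the scan approach, here is a minimal sketch. The method names are made up for illustration, and the lookaround split is an alternative added here, not part of the answer above. It may be preferable when the last part has no digits, since scan silently drops that part.

```ruby
# Hypothetical helper names, for illustration only.

# Groups of zero or more ASCII letters followed by one or more digits.
def letter_digit_groups(str)
  str.scan(/[A-Za-z]*\d+/)
end

# Alternative: split at every zero-width position that has a digit
# on its left and a non-digit on its right.
def split_after_numbers(str)
  str.split(/(?<=\d)(?=\D)/)
end

p letter_digit_groups("H2SO4")   #=> ["H2", "SO4"]
p split_after_numbers("H2SO4")   #=> ["H2", "SO4"]

# A trailing part without digits is dropped by scan but kept by split.
p letter_digit_groups("H2O")     #=> ["H2"]
p split_after_numbers("H2O")     #=> ["H2", "O"]
```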
I have a string:
"N8383"
I want to split on the character and maintain it to get:
["N", "8383"]
I tried the following:
"N8383".split(/[A-Z]/)
which gives me:
["", "8383"]
I want to match some more example strings like:
N344 344N S555 555S
String#split is a bad fit for this problem for the reasons others have stated. I would approach it like this, using String#scan instead:
str_parts = "N8383".scan(/[[:alpha:]]+/)
num_parts = "N8383".scan(/[[:digit:]]+/)
This will give you something to work with if the strings contain multiple string parts and/or multiple numeric parts.
This expression:
%w[N344 344N S555 555S].map do |str|
  next str.scan(/[[:alpha:]]+/), str.scan(/[[:digit:]]+/)
end
Will return:
[
[["N"], ["344"]],
[["N"], ["344"]],
[["S"], ["555"]],
[["S"], ["555"]]
]
Although you are scanning each string twice, I think it's a better solution than 1. trying to come up with a complex regex that backtracks to return the parts in the right order, or 2. reprocessing the results to put the parts in the right order. Especially if the strings are as short as they are in the examples you've provided. That being said, if scanning each string twice really rankles you, here's another way to do it:
str_parts, num_parts = str.scan(/([[:alpha:]]+)|([[:digit:]]+)/).transpose.each(&:compact!)
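To make the single-pass version easier to follow, here is a sketch that wraps it in a method. The name split_alpha_digit is hypothetical, and the guard for strings with no matches is an addition of mine: transposing an empty array yields an empty array, so the destructuring would otherwise produce nils.

```ruby
# Hypothetical helper: one scan, then regroup the captures by kind.
def split_alpha_digit(str)
  # Each match is a pair [alpha_or_nil, digits_or_nil],
  # e.g. "344N" => [[nil, "344"], ["N", nil]]
  pairs = str.scan(/([[:alpha:]]+)|([[:digit:]]+)/)
  return [[], []] if pairs.empty?

  # transpose turns the rows into two columns (all alpha captures,
  # all digit captures); compact drops the nil placeholders.
  pairs.transpose.map(&:compact)
end

str_parts, num_parts = split_alpha_digit("344N")
p str_parts  #=> ["N"]
p num_parts  #=> ["344"]

p split_alpha_digit("N12M3P")  #=> [["N", "M", "P"], ["12", "3"]]
```

Unlike the version with each(&:compact!), this one builds new arrays with map instead of modifying them in place; the result is the same.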
Okay, given the examples, you could use the following regex:
/(?=[A-Z])|(?<=[A-Z])/
This will look ahead (?=) for a single character [A-Z] or look behind (?<=) for a single character [A-Z]. Since these are zero-length assertions, the split is placed between the characters rather than consuming a character, e.g.
%w{N8383 N344 344N S555 555S}.map {|s| s.split(/(?=[A-Z])|(?<=[A-Z])/) }
#=> [["N", "8383"], ["N", "344"], ["344", "N"], ["S", "555"], ["555", "S"]]
However, this regex is specific to the given cases and does not generalize beyond them. For example, I have no idea what the desired output is for "N344S", but right now it will be ["N", "344", "S"], and worse yet, "NSS344S" will be ["N", "S", "S", "344", "S"].
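If the desired behavior is to keep runs of letters together, one possible variation is to split only where a letter run meets a digit run. This is a sketch under that assumption; the constant and method names are hypothetical and the question never states what multi-letter input should produce.

```ruby
# Hypothetical: split only at letter/digit boundaries, so consecutive
# uppercase letters stay together.
BOUNDARY = /(?<=[A-Z])(?=\d)|(?<=\d)(?=[A-Z])/

def split_on_boundaries(str)
  str.split(BOUNDARY)
end

p %w[N8383 344N].map { |s| split_on_boundaries(s) }
#=> [["N", "8383"], ["344", "N"]]

p split_on_boundaries("NSS344S")
#=> ["NSS", "344", "S"]
```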
def doit(str)
  str.scan(/\d+|\p{L}+/)
end
doit "N123" #=> ["N", "123"]
doit "123N" #=> ["123", "N"]
doit "N123M" #=> ["N", "123", "M"]
doit "N12M3P" #=> ["N", "12", "M", "3", "P"]
doit "123" #=> ["123"]
doit "NMN" #=> ["NMN"]
doit "" #=> []
If I have a string such as "aabbbbccdddeffffgg" and I wanted to split the string into this array: ["aa", "bbbb", "cc", "ddd", "e", "ffff", "gg"], how would I go about that?
I know of string.split(/.../) (with however many periods you put there), but it doesn't account for groups of uneven length. The point of the problem I'm working on is to take two strings and see if there are three characters in a row in one string and two in a row in the other. I tried
letter_count_1 = {}
str1.each_char do |let|
  letter_count_1[let] = str1.count(let)
end
But that gives the count for the total amount of each character in the string, and some of the inputs are randomized with the same letter in multiple places, like, "aabbbacccdba"
So how do you split the string up by character?
You can use a regex with a back reference and the scan() method:
str = "aabbbbccdddeffffgg"
groups = []
str.scan(/((.)\2*)/) { |x| groups.push(x[0]) }
groups will look like this afterwards:
["aa", "bbbb", "cc", "ddd", "e", "ffff", "gg"]
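Applied to the original problem, the run splitting could be used roughly like this. This is only a sketch: the method names are hypothetical, and run_of? assumes "n in a row" means a run of at least n identical characters, which the question does not spell out.

```ruby
# Split a string into runs of identical characters.
# The outer group is the whole run; group 2 is the single character
# that \2* repeats. Note that . does not match a newline by default.
def runs(str)
  str.scan(/((.)\2*)/).map(&:first)
end

# Hypothetical check: does the string contain a run of at least
# `length` identical characters?
def run_of?(str, length)
  runs(str).any? { |run| run.length >= length }
end

p runs("aabbbacccdba")
#=> ["aa", "bbb", "a", "ccc", "d", "b", "a"]

p run_of?("aabbbacccdba", 3)  #=> true
p run_of?("aabbcc", 3)        #=> false
```

Because the runs are taken in order, a letter that reappears later in the string ("a" in the example) starts a new run instead of being added to an overall count.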
Here is a non-regexp version
str = "aabbbbccdddeffffgg"
p str.chars.chunk(&:itself).map{|x|x.last.join} #=> ["aa", "bbbb", "cc", "ddd", "e", "ffff", "gg"]
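Another non-regexp sketch, assuming Ruby 2.3 or later where Enumerable#chunk_while is available: it compares each character with its neighbor and keeps them in the same chunk while they are equal. The method name is made up for illustration.

```ruby
# Group adjacent equal characters without a regexp (Ruby 2.3+).
def runs_by_neighbor(str)
  str.each_char.chunk_while { |a, b| a == b }.map(&:join)
end

p runs_by_neighbor("aabbbbccdddeffffgg")
#=> ["aa", "bbbb", "cc", "ddd", "e", "ffff", "gg"]
```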
/((\w)\2)/ finds repeating letters. I was hoping to avoid the two dimensional array that is produced by ignoring the letter matching second capture group like this: /((?:\w)\2)/. It seems that's not possible. Any ideas why?
You don't need any capture groups:
str = [*'a+'..'z+', *'A+'..'Z+', *'0+'..'9+', '_+'].join('|')
#=> "a+|b+| ... |z+|A+|B+| ... |Z+|0+|1+| ... |9+|_+"
"aaabbcddd".scan(/#{str}/)
#=> ["aaa", "bb", "c", "ddd"]
but if you insist on having one:
"aaabbcddd".scan(/(#{str})/).flatten(1)
#=> ["aaa", "bb", "c", "ddd"]
Is this cheating? You did ask if it was possible.
If you mean you're using String#scan, you can post-process the result with Enumerable#map to return only the first item of each match:
'helloo'.scan(/((\w)\2)/)
# => [["ll", "l"], ["oo", "o"]]
'helloo'.scan(/((\w)\2)/).map { |m| m[0] }
# => ["ll", "oo"]
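As for the why: a backreference needs a capturing group to refer to, so the letter group cannot be made non-capturing. What can be dropped is the outer group, provided the method used returns whole matches instead of capture groups. A minimal sketch using String#gsub without a replacement or block, which returns an enumerator over the full matches (method names are hypothetical):

```ruby
# Only one capture group is needed for the backreference.
# gsub(pattern) without replacement or block enumerates whole matches,
# so no nested array is produced.
def doubled_letters(str)
  str.gsub(/(\w)\1/).to_a
end

def word_char_runs(str)
  str.gsub(/(\w)\1*/).to_a
end

p doubled_letters("helloo")    #=> ["ll", "oo"]
p word_char_runs("aaabbcddd")  #=> ["aaa", "bb", "c", "ddd"]
```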
Is it possible to catch all groups of the same digit in a string with a regex in Ruby? I'm not familiar with regex.
I mean: regex on "1112234444" will produce ["111", "22", "3", "4444"]
I know I can use (\d)(\1*), but it gives me 2 groups in each match: ["1", "11"], ["2", "2"], ["3", ""], ["4", "444"]
How can I get 1 group in each match? Thanks.
Here, give this a shot:
((\d)\2*)
You can use this regex
((\d)\2*)
Group 1 captures the value you need.
My first quick answer was rightfully criticized for having no explanation for the code. So here's another one, better in all respects ;-)
We exploit the fact that the elements whose runs we want are digits and they are easy to enumerate by hand. So we construct a readable regex which means "a run of zeros, or a run of ones, ... or a run of nines". And we use the right method for the job, String#scan:
irb> "1112234444".scan(/0+|1+|2+|3+|4+|5+|6+|7+|8+|9+/)
=> ["111", "22", "3", "4444"]
For the record, here's my original answer:
irb> s = "1112234444"
=> "1112234444"
irb> rx = /(0+|1+|2+|3+|4+|5+|6+|7+|8+|9+)/
=> /(0+|1+|2+|3+|4+|5+|6+|7+|8+|9+)/
irb> s.split(rx).reject(&:empty?)
=> ["111", "22", "3", "4444"]
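The two variants can be compared side by side. In this sketch the alternation is built from a range instead of typed by hand, and the constant and method names are hypothetical. It also shows why the split version needs reject: when the pattern has a capture group, split includes the captured delimiters in its result, and the text between adjacent delimiters is an empty string.

```ruby
# Build /0+|1+|...|9+/ from the digit range.
DIGIT_RUNS = Regexp.new(("0".."9").map { |d| "#{d}+" }.join("|"))

def digit_runs(str)
  str.scan(DIGIT_RUNS)
end

# Wrapping the pattern in a capture group makes split return the
# delimiters themselves, interleaved with the (empty) text between them.
def split_with_capture(str)
  str.split(/(#{DIGIT_RUNS.source})/)
end

p digit_runs("1112234444")
#=> ["111", "22", "3", "4444"]

p split_with_capture("1112234444")
#=> ["", "111", "", "22", "", "3", "", "4444"]

p split_with_capture("1112234444").reject(&:empty?)
#=> ["111", "22", "3", "4444"]
```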
I have the string "111221" and want to match all sets of consecutive equal integers: ["111", "22", "1"].
I know that there is a special regex thingy to do that but I can't remember and I'm terrible at Googling.
Using regex in Ruby 1.8.7+:
p s.scan(/((\d)\2*)/).map(&:first)
#=> ["111", "22", "1"]
This works because (\d) captures any digit, and then \2* matches zero or more repetitions of whatever that group (the second opening parenthesis) matched. The outer (…) is needed to capture the entire match as a result in scan. Finally, scan alone returns:
[["111", "1"], ["22", "2"], ["1", "1"]]
…so we need to run through and keep just the first item in each array. In Ruby 1.8.6+ (which doesn't have Symbol#to_proc for convenience):
p s.scan(/((\d)\2*)/).map{ |x| x.first }
#=> ["111", "22", "1"]
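Since scan hands back both captures, they can also be put to use instead of discarded. A small sketch (the method name is hypothetical) that turns each match into the digit and the length of its run:

```ruby
# Each scan result is [whole_run, single_digit]; destructure both.
def run_lengths(str)
  str.scan(/((\d)\2*)/).map { |run, digit| [digit, run.length] }
end

p run_lengths("111221")
#=> [["1", 3], ["2", 2], ["1", 1]]
```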
With no Regex, here's a fun one (matching any char) that works in Ruby 1.9.2:
p s.chars.chunk{|c|c}.map{ |n,a| a.join }
#=> ["111", "22", "1"]
Here's another version that should work even in Ruby 1.8.6:
p s.scan(/./).inject([]){|a,c| (a.last && a.last[0]==c[0] ? a.last : a)<<c; a }
# => ["111", "22", "1"]
"111221".gsub(/(.)(\1)*/).to_a
#=> ["111", "22", "1"]
This uses the form of String#gsub that takes neither a replacement nor a block and therefore returns an enumerator over the matches. That form appears to have been available since at least Ruby 1.9.
I found that this works. It first captures a single character in one group, and then captures any repetitions of that same character in a second group. This results in an array of two-element arrays, with the first element of each array being the initial match and the second element being any additional repeated characters that match the first character. These arrays are joined back together to get an array of repeated characters:
input = "WWBWWWWBBBWWWWWWWB3333!!!!"
repeated_chars = input.scan(/(.)(\1*)/)
# => [["W", "W"], ["B", ""], ["W", "WWW"], ["B", "BB"], ["W", "WWWWWW"], ["B", ""], ["3", "333"], ["!", "!!!"]]
repeated_chars.map(&:join)
# => ["WW", "B", "WWWW", "BBB", "WWWWWWW", "B", "3333", "!!!!"]
As an alternative, I found that I could create a new Regexp object to match one or more occurrences of each unique character in the input string, as follows:
input = "WWBWWWWBBBWWWWWWWB3333!!!!"
regexp = Regexp.new("#{input.chars.uniq.join("+|")}+")
#=> regexp created for this example will look like: /W+|B+|3+|!+/
and then use that Regex object as an argument for scan to split out all the repeated characters, as follows:
input.scan(regexp)
# => ["WW", "B", "WWWW", "BBB", "WWWWWWW", "B", "3333", "!!!!"]
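One caveat with building the pattern from the input: characters such as "." or "+" have a special meaning in a regex. The following sketch is a variation on the approach above, not the original answer's code; it escapes each character with Regexp.escape first. The method name is hypothetical, and the guard for an empty string is my addition, since an empty pattern would match the empty string.

```ruby
# Build /W+|B+|.../ from the input's unique characters, escaping
# regex metacharacters so they are matched literally.
def runs_via_dynamic_regexp(input)
  return [] if input.empty?

  alternatives = input.chars.uniq.map { |c| "#{Regexp.escape(c)}+" }
  input.scan(Regexp.new(alternatives.join("|")))
end

p runs_via_dynamic_regexp("WWBWWWWBBBWWWWWWWB3333!!!!")
#=> ["WW", "B", "WWWW", "BBB", "WWWWWWW", "B", "3333", "!!!!"]

p runs_via_dynamic_regexp("..++aa(")
#=> ["..", "++", "aa", "("]
```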
You can try this (C#):
string str = "111221";
string pattern = @"(\d)(\1)+";
Hope this helps.