Ruby regex for a split every four characters not working - ruby

I'm trying to split a sizeable string every four characters. This is how I'm trying to do it:
big_string.split(/..../)
This is yielding an empty array. As far as I can see, this should be working. It even does when I plug it into an online Ruby regex tester.

Try scan instead:
$ irb
>> "abcd1234beefcake".scan(/..../)
=> ["abcd", "1234", "beef", "cake"]
or
>> "abcd1234beefcake".scan(/.{4}/)
=> ["abcd", "1234", "beef", "cake"]
If the number of characters isn't divisible by 4, you can also grab the remaining characters:
>> "abcd1234beefcakexyz".scan(/.{1,4}/)
=> ["abcd", "1234", "beef", "cake", "xyz"]
(The {1,4} will greedily grab between 1 and 4 characters)
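As a minimal sketch of the same idea, the pattern can be wrapped in a small helper. The name chunks_of is hypothetical and not part of the answer above; the /m flag is an added assumption so that "." also matches newlines:

```ruby
# Hypothetical helper: cut +str+ into pieces of at most +size+ characters.
def chunks_of(str, size)
  # The /m flag lets "." match newlines as well, so no character is skipped.
  str.scan(/.{1,#{size}}/m)
end

chunks_of("abcd1234beefcakexyz", 4) # => ["abcd", "1234", "beef", "cake", "xyz"]
chunks_of("ab\ncd", 4)              # => ["ab\nc", "d"]
"ab\ncd".scan(/.{1,4}/)             # => ["ab", "cd"] (the newline is dropped)
```

Without /m, a newline ends the current piece and is left out of the result, which may or may not be what you want.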

Hmm, I don't know what Rubular is doing there or why, but
big_string.split(/..../)
translates to
"split the string at every four-character sequence"
which correctly results in something like
["", "", "", "abc"]
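A short sketch of this behaviour, using made-up sample strings: split treats every four-character match as a delimiter, so only the text between matches survives, and trailing empty fields are dropped.

```ruby
even = "abcd1234beefcake" # length divisible by four
odd  = "abcd1234beefabc"  # three characters left over

no_fields = even.split(/..../) # => []
leftover  = odd.split(/..../)  # => ["", "", "", "abc"]

# With a capture group, split keeps the delimiters too.
with_groups = even.split(/(....)/).reject(&:empty?)
# => ["abcd", "1234", "beef", "cake"]
```

That is why the string from the question comes back empty when its length is a multiple of four.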

Whoops.
str = 'asdfasdfasdf'
c = 0
out = []
inum = 4
# Round up so that a trailing partial group is kept as well
(str.length.to_f / inum).ceil.times do
  out.push(str[c, inum]) # take inum characters starting at offset c
  c += inum
end
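The same loop can probably be written more compactly with Enumerable#each_slice. This is a sketch with a slightly longer sample string, chosen here to show a leftover group:

```ruby
str = 'asdfasdfasdfxy'

# Walk the characters four at a time and glue each group back together.
out = str.each_char.each_slice(4).map(&:join)
# => ["asdf", "asdf", "asdf", "xy"]
```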

Related

How to make a repeated string to the left be deleted without using While?

For example, I have this string of only numbers:
0009102
If I convert it to integer Ruby automatically gives me this value:
9102
That's correct. But my program gives me different types of numbers:
2229102 desired output => 9102
9999102 desired output => 102
If you look at them, I have treated 2 and 9 as zeros, since leading zeros are deleted automatically. It is easy to delete them with a while loop, but I must avoid that.
In other words, how do you make 'n' on the left be considered a zero for Ruby?
"2229102".sub(/\A(\d)\1*/, "") #=> "9102"
The regular expression reads, "match the first digit in the string (\A is the beginning-of-string anchor) in capture group 1 ((\d)), followed by zero or more characters (*) that equal the contents of capture group 1 (\1)". String#sub replaces that match with an empty string.
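To see how that regex behaves on the inputs from the question, here is a small sketch. The method name strip_leading_run is hypothetical:

```ruby
# Hypothetical wrapper around the regex above: remove the run of
# identical digits at the start of the string.
def strip_leading_run(digits)
  digits.sub(/\A(\d)\1*/, "")
end

strip_leading_run("0009102") # => "9102"
strip_leading_run("2229102") # => "9102"
strip_leading_run("9999102") # => "102"
strip_leading_run("9102")    # => "102" (a single leading digit is a run of length one)
```

Note the last case: the first digit is always removed, even when it is not repeated.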
Try with Enumerable#chunk_while:
s = '222910222'
s.each_char.chunk_while(&:==).drop(1).join
#=> "910222"
Where s.each_char.chunk_while(&:==).to_a #=> [["2", "2", "2"], ["9"], ["1"], ["0"], ["2", "2", "2"]]
Similar to the solution of iGian you could also use drop_while.
s = '222910222'
s.each_char.each_cons(2).drop_while { |a, b| a == b }.map(&:last).join
#=> "910222"
# or
s.each_char.drop_while.with_index(-1) { |c, i| i < 0 || c == s[i] }.join
#=> "910222"
You can also try this way:
s = '9999102938'
s.chars.then{ |chars| chars[chars.index(chars.uniq[1])..-1] }.join
#=> "102938"

How to extract numbers from an array of strings? (I'm using regex)

I have an array of strings
test= ["ChangeServer<br/>Test: 3-7<br/>PinCode:DFSFSDFB04008<br/>ShipCode:DFADFSDFSDM-000D3<br/>SomeCode:sdfsdf", "werwerwe", "adfsdfsd",
"sdfsdfsdfsd<br/>Test: 9<br/>PinCode:ADFSDF4NS0<br/>ShipCode:FADFSDFD-9ZM170<br/>"]
I want to grab the numbers after Test:, which in the above array of strings are 3, 4, 5, 6, 7 (the range 3-7) and 9.
Desired output:
["3","4","5","6","7","9"]
What I tried so far
test.join.scan(/(?<=Test: )[0-9]+/)
=> ["3", "7"]
How to deal with range?
Second test case:
test= ["ChangeServer<br/>Test: 3-7<br/>PinCode:DFSFSDFB04008<br/>ShipCode:DFADFSDFSDM-000D3<br/>SomeCode:sdfsdf", "werwerwe", "adfsdfsd",
"sdfsdfsdfsd<br/>Test: 9<br/>PinCode:ADFSDF4NS0<br/>ShipCode:FADFSDFD-9ZM170<br/>", "sdfsdfsdfsd<br/>Test: 15-18<br/>PinCode:ADFSDF4NS0<br/>ShipCode:FADFSDFD-9ZM170<br/>"]
Desired output:
["3","4","5","6","7","9","15","16","17","18"]
There are a lot of ways you could solve this. I'd probably do it this way:
test.flat_map do |s|
_, m, n = *s.match(/Test:\s*(\d+)(?:-(\d+))?/)
m ? (m..n||m).to_a : []
end
See it in action on repl.it: https://repl.it/JFwT/13
Or, more succinctly:
test.flat_map {|s| s.match(/Test:\s*(\d+)(?:-(\d+))?/) { $1..($2||$1) }.to_a }
https://repl.it/JFwT/11
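The snippets above build ranges of strings. A variation that converts the bounds to integers first may be easier to reason about. This is a sketch only: the names TEST_FIELD and expand_tests, and the shortened sample data, are assumptions made for illustration:

```ruby
# Same pattern as above: a number after "Test:", optionally followed by "-number".
TEST_FIELD = /Test:\s*(\d+)(?:-(\d+))?/

# Hypothetical helper: expand every "Test: m" or "Test: m-n" into its numbers.
def expand_tests(strings)
  strings.flat_map do |s|
    m = s.match(TEST_FIELD)
    next [] unless m            # strings without a Test field contribute nothing

    first = m[1].to_i
    last  = (m[2] || m[1]).to_i # a single number is a range of one
    (first..last).map(&:to_s)
  end
end

sample = ["x<br/>Test: 3-7<br/>PinCode:DFSFSDFB04008", "werwerwe",
          "y<br/>Test:9<br/>", "z<br/>Test: 15-18<br/>"]
expand_tests(sample)
# => ["3", "4", "5", "6", "7", "9", "15", "16", "17", "18"]
```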
You could create a new Range for each range found (i.e. N-N) using the splat operator (i.e. *) and combine the results, like this (see note 1):
test.join.scan(/(?<=Test: )[0-9-]+/)
.flat_map { |r| Range.new(*r.split('-').values_at(0, -1)).to_a }
#=> ["3", "4", "5", "6", "7", "9"]
This will work for both examples.
Note 1: notice the added - next to 0-9 in the regex.
Is there a way to include both Test: 1 (with a space between
Test: and 1) and Test:1 (without a space between Test: and 1)?
Yes, update your regex (change where space is placed) and add an additional map to get rid of those spaces:
test.join
.scan(/(?<=Test:)[ 0-9-]+/)
.map(&:strip)
.flat_map { |r| Range.new(*r.split('-').values_at(0, -1)).to_a }
And here's a shortened option using two captures in the regex, as suggested by Jordan.
test.join
.scan(/Test:\s*(\d+)(?:-(\d+))?/)
.flat_map { |m,n| (m..n||m).to_a }
Just out of curiosity:
test.
join.
scan(/(?<=Test: )[\d-]+/).
map { |e| e.gsub(/\A\d+\Z/) { |m| "#{m}..#{m}" }.gsub('-', '..') }.
map(&method(:eval)).
flat_map(&:to_a)

Ruby Split string at character difference using regex

I'm currently working on a problem that involves splitting a string into groups of identical consecutive characters.
For example,
"111223334456777" #=> ['111','22','333','44','5','6','777']
The way I am currently doing it is by comparing each character with the previous one and starting a new group whenever they differ.
res = []
str = "111223334456777"
group = str[0]
(1...str.length).each do |i|
if str[i] != str[i-1]
res << group
group = str[i]
else
group << str[i]
end
end
res << group
res #=> ['111','22','333','44','5','6','777']
I want to see if I can use regex to do this, which will make this process a lot easier. I understand I could just put this block of code in a method, but I'm curious if regex can be used here.
So what I want to do is
str.split(/some regex/)
to produce the same result. I thought about positive lookahead, but I can't figure out how to have regex recognize that the character is different.
Does anyone have an idea if this is possible?
The chunk_while method is what you're looking for here:
str.chars.chunk_while { |b,a| b == a }.map(&:join)
That will break anything where the current character a doesn't match the previous character b. If you want to restrict to just numbers you can do some pre-processing.
There are a lot of very handy methods in Enumerable that are worth exploring, and each new version of Ruby seems to add more of them.
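A small sketch of the chunk_while approach and of its mirror image, slice_when. The last line is one possible reading of "pre-processing" (keeping only the digit runs of a mixed string) and is an assumption, not something the answer spells out:

```ruby
str = "111223334456777"

# chunk_while keeps extending a chunk while the block is true...
runs = str.each_char.chunk_while { |prev, cur| prev == cur }.map(&:join)
# => ["111", "22", "333", "44", "5", "6", "777"]

# ...whereas slice_when starts a new chunk when the block is true.
same = str.each_char.slice_when { |prev, cur| prev != cur }.map(&:join)

# Assumed pre-processing: pull out the digit sequences first, then chunk each one.
digit_runs = "ab1122c3".scan(/\d+/).flat_map do |digits|
  digits.each_char.chunk_while { |a, b| a == b }.map(&:join)
end
# => ["11", "22", "3"]
```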
str = "111333224456777"
str.scan /0+|1+|2+|3+|4+|5+|6+|7+|8+|9+/
#=> ["111", "333", "22", "44", "5", "6", "777"]
or
str.gsub(/(\d)\1*/).to_a
#=> ["111", "333", "22", "44", "5", "6", "777"]
The latter uses the (underused) form of String#gsub that takes one argument and no block, returning an enumerator. It merely generates matches and has nothing to do with character replacement.
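Because that form of gsub returns an Enumerator, the matches can be fed straight into other Enumerable methods. As an illustration (the run-length pairs are an added example, not part of the answer):

```ruby
str = "111333224456777"

matches = str.gsub(/(\d)\1*/) # no replacement and no block: an Enumerator
matches.class                 # => Enumerator
matches.to_a                  # => ["111", "333", "22", "44", "5", "6", "777"]

# Each yielded element is the full match, so it can be mapped directly.
lengths = matches.map { |run| [run[0], run.length] }
# => [["1", 3], ["3", 3], ["2", 2], ["4", 2], ["5", 1], ["6", 1], ["7", 3]]
```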
For fun, here are several other ways to do that.
str.scan(/((\d)\2*)/).map(&:first)
str.split(/(?<=(.))(?!\1)/).each_slice(2).map(&:first)
str.each_char.slice_when(&:!=).map(&:join)
str.each_char.chunk(&:itself).map { |_,a| a.join }
str.each_char.chunk_while(&:==).map(&:join)
str.gsub(/(?<=(.))(?!\1)/, ' ').split
str.gsub(/(.)\1*/).reduce([], &:<<)
str[1..-1].each_char.with_object([str[0]]) {|c,a| a.last[-1]==c ? (a.last<<c) : a << c}
Another option utilises the group_by method, which returns a hash with each individual digit as a key and an array of its occurrences as the value.
"111223334456777".split('').group_by { |i| i }.values.map(&:join)
#=> ["111", "22", "333", "44", "5", "6", "777"]
Although it doesn't use a regex, someone else may find it useful.
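One thing to keep in mind with group_by: it collects every occurrence of a character, not just consecutive ones, so it only matches the other approaches when no character reappears later in the string. A quick sketch with a made-up input:

```ruby
str = "1121"

# group_by merges both runs of "1" into one group.
grouped = str.chars.group_by(&:itself).values.map(&:join)
# => ["111", "2"]

# chunk_while keeps the runs separate.
chunked = str.chars.chunk_while { |a, b| a == b }.map(&:join)
# => ["11", "2", "1"]
```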

Ruby: Insert Multiple Values Into String

Suppose we have the string "aaabbbccc" and want to use String#insert to convert it to "aaa<strong>bbb</strong>ccc". Is this the best way to insert multiple values into a Ruby string using String#insert, or can multiple values be added simultaneously?
string = "aaabbbccc"
opening_tag = '<strong>'
opening_index = 3
closing_tag = '</strong>'
closing_index = 6
string.insert(opening_index, opening_tag)
closing_index = 6 + opening_tag.length # I don't really like this
string.insert(closing_index, closing_tag)
Is there a way to simultaneously insert multiple substrings into a Ruby string so the closing tag does not need to be offset by the length of the first substring that is added? I would like something like this one liner:
string.insert(3 => '<strong>', 6 => '</strong>') # => "aaa<strong>bbb</strong>ccc"
Let's have some fun. How about
class String
def splice h
self.each_char.with_index.inject('') do |accum,(c,i)|
accum + h.fetch(i,'') + c
end
end
end
"aaabbbccc".splice(3=>"<strong>", 6=>"</strong>")
=> "aaa<strong>bbb</strong>ccc"
(you can encapsulate this however you want, I just like messing with built-ins because Ruby lets me)
How about inserting from right to left?
string = "aaabbbccc"
string.insert(6, '</strong>')
string.insert(3, '<strong>')
string # => "aaa<strong>bbb</strong>ccc"
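The right-to-left idea generalises to any number of insertions. Here is a minimal sketch, assuming non-negative indices; the name insert_all is hypothetical and not a built-in String method:

```ruby
# Hypothetical helper: apply several insertions given as { index => text }.
# Working from the highest index down means earlier insertions never shift
# the positions of the ones still to come.
def insert_all(str, insertions)
  insertions
    .sort_by { |index, _text| -index }
    .each_with_object(str.dup) { |(index, text), out| out.insert(index, text) }
end

insert_all("aaabbbccc", { 3 => "<strong>", 6 => "</strong>" })
# => "aaa<strong>bbb</strong>ccc"
```

The helper works on a copy, so the original string is left untouched.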
Another option is to assign to the slice between the two indices:
opening_tag = '<strong>'
opening_index = 3
closing_tag = '</strong>'
closing_index = 6
string = "aaabbbccc"
string[opening_index...closing_index] =
opening_tag + string[opening_index...closing_index] + closing_tag
#=> "<strong>bbb</strong>"
string
#=> "aaa<strong>bbb</strong>ccc"
If your string is comprised of three groups of consecutive characters, and you'd like to insert the opening tag between the first two groups and the closing tag between the last two groups, regardless of the size of each group, you could do that like this:
def stuff_tags(str, tag)
str.scan(/((.)\2*)/)
.map(&:first)
.insert( 1, "<#{tag}>")
.insert(-2, "<\/#{tag}>")
.join
end
stuff_tags('aaabbbccc', 'strong') #=> "aaa<strong>bbb</strong>ccc"
stuff_tags('aabbbbcccccc', 'weak') #=> "aa<weak>bbbb</weak>cccccc"
I will explain the regex used by scan, but first would like to show how the calculations proceed for the string 'aaabbbccc':
a = 'aaabbbccc'.scan(/((.)\2*)/)
#=> [["aaa", "a"], ["bbb", "b"], ["ccc", "c"]]
b = a.map(&:first)
#=> ["aaa", "bbb", "ccc"]
c = b.insert( 1, "<strong>")
#=> ["aaa", "<strong>", "bbb", "ccc"]
d = c.insert(-2, "<\/strong>")
#=> ["aaa", "<strong>", "bbb", "</strong>", "ccc"]
d.join
#=> "aaa<strong>bbb</strong>ccc"
We need two capture groups in the regex. The first (having the first left parenthesis) captures the string we want. The second captures the first character, (.). This is needed so that we can require that it be followed by zero or more copies of that character, \2*.
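If the numbered groups are hard to follow, the same pattern can be written with named groups. This is only a sketch; the constant RUN and the group names run and char are chosen here for illustration:

```ruby
# (?<char>.) captures one character; \k<char>* requires zero or more
# copies of it; (?<run>...) wraps the whole run we want to keep.
RUN = /(?<run>(?<char>.)\k<char>*)/

pairs = 'aaabbbccc'.scan(RUN)
# => [["aaa", "a"], ["bbb", "b"], ["ccc", "c"]]

runs = pairs.map(&:first)
# => ["aaa", "bbb", "ccc"]
```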
Here's another way this can be done:
def stuff_tags(str, tag)
str.chars.chunk {|c| c}
.map {|_,a| a.join}
.insert( 1, "<#{tag}>")
.insert(-2, "<\/#{tag}>")
.join
end
The calculations of a and b above change to the following:
a = 'aaabbbccc'.chars.chunk {|c| c}
#=> #<Enumerator: #<Enumerator::Generator:0x000001021622d8>:each>
# a.to_a => [["a",["a","a","a"]],["b",["b","b","b"]],["c",["c","c","c"]]]
b = a.map {|_,a| a.join }
#=> ["aaa", "bbb", "ccc"]

Partition/split a string by character set in Ruby

How can I separate different character sets in my string? For example, if I had these charsets:
[a-z]
[A-Z]
[0-9]
[\s]
{everything else}
And this input:
thisISaTEST***1234pie
Then I want to separate the different character sets, for example, if I used a newline as the separating character:
this
IS
a
TEST
***
1234
pie
I've tried this regex, with a positive lookahead:
'thisISaTEST***1234pie'.gsub(/(?=[a-z]+|[A-Z]+|[0-9]+|[\s]+)/, "\n")
But apparently the +s aren't being greedy, because I'm getting:
t
h
# (snip)...
S
T***
1
# (snip)...
e
I snipped out the irrelevant parts, but as you can see each character is counting as its own charset, except the {everything else} charset.
How can I do this? It does not necessarily have to be by regex. Splitting them into an array would work too.
The difficult part is matching whatever does not match the rest of the regex. Forget about that, and think of a way to mix the non-matching parts together with the matching parts.
"thisISaTEST***1234pie"
.split(/([a-z]+|[A-Z]+|\d+|\s+)/).reject(&:empty?)
# => ["this", "IS", "a", "TEST", "***", "1234", "pie"]
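To see why this works, it helps to look at what split returns before the empty strings are rejected. In this sketch the variable names are illustrative:

```ruby
input = "thisISaTEST***1234pie"

# Because the pattern is wrapped in a capture group, split returns the
# delimiters (the matched runs) as well as the text between them.
pieces = input.split(/([a-z]+|[A-Z]+|\d+|\s+)/)
# => ["", "this", "", "IS", "", "a", "", "TEST", "***", "1234", "", "pie"]

# The "everything else" run is simply the text left between two matches.
lines = pieces.reject(&:empty?).join("\n")
```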
In the ASCII character set, apart from alphanumerics and space, there are thirty-two "punctuation" characters, which are matched with the property construct \p{punct}.
To split your string into sequences of a single category, you can write
str = 'thisISaTEST***1234pie'
p str.scan(/\G(?:[a-z]+|[A-Z]+|\d+|\s+|[\p{punct}]+)/)
output
["this", "IS", "a", "TEST", "***", "1234", "pie"]
Alternatively, if your string contains characters outside the ASCII set, you could write the whole thing in terms of properties, like this
p str.scan(/\G(?:\p{lower}+|\p{upper}+|\p{digit}+|\p{space}+|[^\p{alnum}\p{space}]+)/)
Here are two solutions.
String#scan with a regular expression
str = "thisISa\n TEST*$*1234pie"
r = /[a-z]+|[A-Z]+|\d+|\s+|[^a-zA-Z\d\s]+/
str.scan r
#=> ["this", "IS", "a", "\n ", "TEST", "*$*", "1234", "pie"]
Because of ^ at the beginning of [^a-zA-Z\d\s] that character class matches any character other than letters (lower and upper case), digits and whitespace.
Use Enumerable#slice_when (see note 1)
First, a helper method:
def type(c)
case c
when /[a-z]/ then 0
when /[A-Z]/ then 1
when /\d/ then 2
when /\s/ then 3
else 4
end
end
For example,
type "f" #=> 0
type "P" #=> 1
type "3" #=> 2
type "\n" #=> 3
type "*" #=> 4
Then, with str = 'thisISaTEST***1234pie' (the string from the question):
str.each_char.slice_when { |c1,c2| type(c1) != type(c2) }.map(&:join)
#=> ["this", "IS", "a", "TEST", "***", "1234", "pie"]
1. slice_when made its debut in Ruby v2.2.
Non-word, non-space chars can be covered with [^\w\s], so:
"thisISaTEST***1234pie".scan /[a-z]+|[A-Z]+|\d+|\s+|[^\w\s]+/
#=> ["this", "IS", "a", "TEST", "***", "1234", "pie"]
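One caveat with [^\w\s]: \w also matches the underscore, which none of the other alternatives cover, so scan silently skips underscores. A quick sketch with a made-up input:

```ruby
# "_" is a word character, so [^\w\s] does not match it and scan drops it.
with_w = "snake_case".scan(/[a-z]+|[A-Z]+|\d+|\s+|[^\w\s]+/)
# => ["snake", "case"]

# Spelling the negated class out explicitly keeps the underscore.
explicit = "snake_case".scan(/[a-z]+|[A-Z]+|\d+|\s+|[^a-zA-Z\d\s]+/)
# => ["snake", "_", "case"]
```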
