ruby and regex grouping - ruby

Here is the code
string = "Looking for the ^[cows]"
footnote = string[/\^\[(.*?)\]/]
I was hoping that footnote would equal cows
What I get is footnote equals ^[cows]
Any help?
Thanks!

You can specify which capture group you want with a second argument to []:
string = "Looking for the ^[cows]"
footnote = string[/\^\[(.*?)\]/, 1]
# footnote == "cows"

According to the String documentation, the #[] method takes a second parameter, an integer, which determines the matching group returned:
a = "hello there"
a[/[aeiou](.)\1/] #=> "ell"
a[/[aeiou](.)\1/, 0] #=> "ell"
a[/[aeiou](.)\1/, 1] #=> "l"
a[/[aeiou](.)\1/, 2] #=> nil
You should use footnote = string[/\^\[(.*?)\]/, 1]

If you want to capture subgroups, you can use Regexp#match:
r = /\^\[(.*?)\]/
r.match(string) # => #<MatchData "^[cows]" 1:"cows">
r.match(string)[0] # => "^[cows]"
r.match(string)[1] # => "cows"
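One thing the answers above don't spell out is what happens when the pattern doesn't match at all. A minimal sketch, assuming the same example string; the `no_match` input is made up for illustration:

```ruby
string   = "Looking for the ^[cows]"
no_match = "Looking for nothing"   # hypothetical input without a footnote
pattern  = /\^\[(.*?)\]/

# String#[] with a group index simply returns nil when nothing matches.
found   = string[pattern, 1]     # "cows"
missing = no_match[pattern, 1]   # nil

# Regexp#match returns nil on failure, so guard before indexing into it.
via_match  = pattern.match(string)&.[](1)     # "cows"
safe_miss  = pattern.match(no_match)&.[](1)   # nil
```

So `string[regex, 1]` is probably the more convenient form when you only need one group and a missing match is possible.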

An alternative to using a capture group, and then retrieving its contents, is to match only what you want. Here are three ways of doing that.
#1 Use a positive lookbehind and a positive lookahead
string[/(?<=\[).*?(?=\])/]
#=> "cows"
#2 Use match but forget (\K) and a positive lookahead
string[/\[\K.*?(?=\])/]
#=> "cows"
#3 Use String#gsub
string.gsub(/.*?\[|\].*/,'')
#=> "cows"
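The three approaches agree on the example string, but they may differ when there are no brackets to find. A rough comparison, where `without_brackets` is a hypothetical input and the variable names are mine:

```ruby
with_brackets    = "Looking for the ^[cows]"
without_brackets = "Looking for the cows"   # hypothetical input

lookaround = /(?<=\[).*?(?=\])/   # 1: lookbehind + lookahead
reset      = /\[\K.*?(?=\])/      # 2: \K + lookahead
strip      = /.*?\[|\].*/         # 3: pattern removed by gsub

a = with_brackets[lookaround]          # "cows"
b = with_brackets[reset]               # "cows"
c = with_brackets.gsub(strip, '')      # "cows"

# With no brackets, String#[] returns nil, but gsub has nothing to
# remove and hands back the whole string unchanged.
d = without_brackets[lookaround]       # nil
e = without_brackets[reset]            # nil
f = without_brackets.gsub(strip, '')   # "Looking for the cows"
```

If you need to tell "no footnote" apart from a real result, the first two are likely the safer choice.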

Related

How to make a repeated string to the left be deleted without using While?

For example, I have this string of only numbers:
0009102
If I convert it to integer Ruby automatically gives me this value:
9102
That's correct. But my program gives me different types of numbers:
2229102 desired output => 9102
9999102 desired output => 102
If you look at them, I have treated 2 and 9 as zeros, since leading zeros are deleted automatically. It is easy to delete them with a while loop, but I must avoid that.
In other words, how do you make 'n' on the left be considered a zero for Ruby?
"2229102".sub(/\A(\d)\1*/, "") #=> "9102"
The regular expression reads, "match the first digit in the string (\A is the beginning-of-string anchor) in capture group 1 ((\d)), followed by zero or more characters (*) that equal the contents of capture group 1 (\1)". String#sub replaces that match with an empty string.
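To check that regex against all the inputs from the question, here is a small sketch. The method name `strip_leading_run` is made up for illustration:

```ruby
# Hypothetical helper: remove the leading run of whatever digit comes first.
def strip_leading_run(digits)
  digits.sub(/\A(\d)\1*/, "")
end

r1 = strip_leading_run("0009102")   # "9102"
r2 = strip_leading_run("2229102")   # "9102"
r3 = strip_leading_run("9999102")   # "102"

# The first digit is always removed, even when it is not repeated,
# and a string made of a single repeated digit becomes empty.
r4 = strip_leading_run("9102")      # "102"
r5 = strip_leading_run("7777")      # ""
```

Whether `r4` is what you want depends on whether the first digit should always count as the "zero"; the question reads that way, but that is an assumption.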
Try with Enumerable#chunk_while:
s = '222910222'
s.each_char.chunk_while(&:==).drop(1).join
#=> "910222"
Where s.each_char.chunk_while(&:==).to_a #=> [["2", "2", "2"], ["9"], ["1"], ["0"], ["2", "2", "2"]]
Similar to the solution of iGian you could also use drop_while.
s = '222910222'
s.each_char.each_cons(2).drop_while { |a, b| a == b }.map(&:last).join
#=> "910222"
# or
s.each_char.drop_while.with_index(-1) { |c, i| i < 0 || c == s[i] }.join
#=> "910222"
You can also try this way:
s = '9999102938'
s.chars.then{ |chars| chars[chars.index(chars.uniq[1])..-1] }.join
#=> "102938"

Ruby string char chunking

I have a string "wwwggfffw" and want to break it up into an array as follows:
["www", "gg", "fff", "w"]
Is there a way to do this with regex?
"wwwggfffw".scan(/((.)\2*)/).map(&:first)
scan is a little funny, as it will return either the match or the subgroups depending on whether there are subgroups; we need to use subgroups to ensure repetition of the same character ((.)\1), but we'd prefer it if it returned the whole match and not just the repeated letter. So we need to make the whole match into a subgroup so it will be captured, and in the end we need to extract just the match (without the other subgroup), which we do with .map(&:first).
EDIT to explain the regexp ((.)\2*) itself:
( start group #1, consisting of
( start group #2, consisting of
. any one character
) and nothing else
\2 followed by the content of the group #2
* repeated any number of times (including zero)
) and nothing else.
So in wwwggfffw, (.) captures w into group #2; then \2* captures any additional number of w. This makes group #1 capture www.
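To make the two groups visible, here is what scan returns before the map, plus a variant that avoids the outer group altogether. This is only a sketch; the variable names are mine:

```ruby
str = "wwwggfffw"

# With two capture groups, scan yields [group 1, group 2] per match.
pairs = str.scan(/((.)\2*)/)
# [["www", "w"], ["gg", "g"], ["fff", "f"], ["w", "w"]]

runs = pairs.map(&:first)
# ["www", "gg", "fff", "w"]

# String#gsub called with only a pattern returns an enumerator over the
# whole matches, so the outer group is not needed here.
runs_via_gsub = str.gsub(/(.)\1*/).to_a
# ["www", "gg", "fff", "w"]
```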
You can use back references, something like
'wwwggfffw'.scan(/((.)\2*)/).map{ |s| s[0] }
will work
Here's one that's not using regex but works well:
def chunk(str)
  chars = str.chars
  chars.inject([chars.shift]) do |arr, char|
    if arr[-1].include?(char)
      arr[-1] << char
    else
      arr << char
    end
    arr
  end
end
In my benchmarks it's faster than the regex answers here (with the example string you gave, at least).
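If you want to check the speed claim on your own machine, something like the following could work. It is a sketch using the standard Benchmark module; the method names `chunk_runs` and `scan_runs` and the iteration count are my own choices, and timings will vary by Ruby version and input.

```ruby
require 'benchmark'

# Copy of the inject-based answer, renamed for this comparison.
def chunk_runs(str)
  chars = str.chars
  chars.inject([chars.shift]) do |arr, char|
    if arr[-1].include?(char)
      arr[-1] << char
    else
      arr << char
    end
    arr
  end
end

# The regex answer wrapped in a method.
def scan_runs(str)
  str.scan(/((.)\2*)/).map(&:first)
end

input      = "wwwggfffw"
iterations = 10_000   # arbitrary; raise it for steadier numbers

Benchmark.bm(8) do |x|
  x.report("inject:") { iterations.times { chunk_runs(input) } }
  x.report("scan:")   { iterations.times { scan_runs(input) } }
end
```

No timings are shown here on purpose, since they depend on the environment.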
Another non-regex solution, this one using Enumerable#slice_when, which made its debut in Ruby v.2.2:
str.each_char.slice_when { |a,b| a!=b }.map(&:join)
#=> ["www", "gg", "fff", "w"]
Another option is:
str.scan(Regexp.new(str.squeeze.each_char.map { |c| "(#{c}+)" }.join)).first
#=> ["www", "gg", "fff", "w"]
Here the steps are as follows
s = str.squeeze
#=> "wgfw"
a = s.each_char
#=> #<Enumerator: "wgfw":each_char>
This enumerator generates the following elements:
a.to_a
#=> ["w", "g", "f", "w"]
Continuing
b = a.map { |c| "(#{c}+)" }
#=> ["(w+)", "(g+)", "(f+)", "(w+)"]
c = b.join
#=> "(w+)(g+)(f+)(w+)"
r = Regexp.new(c)
#=> /(w+)(g+)(f+)(w+)/
d = str.scan(r)
#=> [["www", "gg", "fff", "w"]]
d.first
#=> ["www", "gg", "fff", "w"]
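One caveat with building the regex from the string itself: characters such as * or . would be interpreted as regex syntax. A possible tweak, assuming you want them taken literally, is to pass each character through Regexp.escape. The input below is a made-up example:

```ruby
str = "aa**b"   # hypothetical input containing a regex metacharacter

# Escape each squeezed character before wrapping it in a group.
source  = str.squeeze.each_char.map { |c| "(#{Regexp.escape(c)}+)" }.join
pattern = Regexp.new(source)   # equivalent to /(a+)(\*+)(b+)/

runs = str.scan(pattern).first
# ["aa", "**", "b"]
```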
Here's one more way of doing it without a regex:
'wwwggfffw'.chars.chunk(&:itself).map{ |s| s[1].join }
# => ["www", "gg", "fff", "w"]

Ruby: Insert Multiple Values Into String

Suppose we have the string "aaabbbccc" and want to use the String#insert to convert the string to "aaa<strong>bbb</strong>ccc". Is this the best way to insert multiple values into a Ruby string using String#insert or can multiple values simultaneously be added:
string = "aaabbbccc"
opening_tag = '<strong>'
opening_index = 3
closing_tag = '</strong>'
closing_index = 6
string.insert(opening_index, opening_tag)
closing_index = 6 + opening_tag.length # I don't really like this
string.insert(closing_index, closing_tag)
Is there a way to simultaneously insert multiple substrings into a Ruby string so the closing tag does not need to be offset by the length of the first substring that is added? I would like something like this one liner:
string.insert(3 => '<strong>', 6 => '</strong>') # => "aaa<strong>bbb</strong>ccc"
Let's have some fun. How about
class String
  def splice h
    self.each_char.with_index.inject('') do |accum,(c,i)|
      accum + h.fetch(i,'') + c
    end
  end
end
"aaabbbccc".splice(3=>"<strong>", 6=>"</strong>")
#=> "aaa<strong>bbb</strong>ccc"
(you can encapsulate this however you want, I just like messing with built-ins because Ruby lets me)
How about inserting from right to left?
string = "aaabbbccc"
string.insert(6, '</strong>')
string.insert(3, '<strong>')
string # => "aaa<strong>bbb</strong>ccc"
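The right-to-left idea generalizes to any number of insertions: sort the positions in descending order so that earlier inserts never shift later ones. A minimal sketch, where `insert_all` is a made-up helper and not a built-in String method:

```ruby
# Hypothetical helper: insert several substrings given as { index => text }.
# Works on a copy, inserting at the highest index first so that the
# remaining (lower) indices still refer to the original positions.
def insert_all(str, inserts)
  result = str.dup
  inserts.sort_by { |index, _text| -index }.each do |index, text|
    result.insert(index, text)
  end
  result
end

original = "aaabbbccc"
tagged   = insert_all(original, { 3 => "<strong>", 6 => "</strong>" })
# tagged   is "aaa<strong>bbb</strong>ccc"
# original is still "aaabbbccc"
```

This keeps the hash-style call the question asked for without reopening String.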
opening_tag = '<strong>'
opening_index = 3
closing_tag = '</strong>'
closing_index = 6
string = "aaabbbccc"
string[opening_index...closing_index] =
  opening_tag + string[opening_index...closing_index] + closing_tag
#=> "<strong>bbb</strong>"
string
#=> "aaa<strong>bbb</strong>ccc"
If your string is comprised of three groups of consecutive characters, and you'd like to insert the opening tag between the first two groups and the closing tag between the last two groups, regardless of the size of each group, you could do that like this:
def stuff_tags(str, tag)
  str.scan(/((.)\2*)/)
     .map(&:first)
     .insert( 1, "<#{tag}>")
     .insert(-2, "<\/#{tag}>")
     .join
end
stuff_tags('aaabbbccc', 'strong') #=> "aaa<strong>bbb</strong>ccc"
stuff_tags('aabbbbcccccc', 'weak') #=> "aa<weak>bbbb</weak>cccccc"
I will explain the regex used by scan, but first would like to show how the calculations proceed for the string 'aaabbbccc':
a = 'aaabbbccc'.scan(/((.)\2*)/)
#=> [["aaa", "a"], ["bbb", "b"], ["ccc", "c"]]
b = a.map(&:first)
#=> ["aaa", "bbb", "ccc"]
c = b.insert( 1, "<strong>")
#=> ["aaa", "<strong>", "bbb", "ccc"]
d = c.insert(-2, "<\/strong>")
#=> ["aaa", "<strong>", "bbb", "</strong>", "ccc"]
d.join
#=> "aaa<strong>bbb</strong>ccc"
We need two capture groups in the regex. The first (having the first left parenthesis) captures the string we want. The second captures the first character, (.). This is needed so that we can require that it be followed by zero or more copies of that character, \2*.
Here's another way this can be done:
def stuff_tags(str, tag)
  str.chars.chunk {|c| c}
     .map {|_,a| a.join}
     .insert( 1, "<#{tag}>")
     .insert(-2, "<\/#{tag}>")
     .join
end
The calculations of a and b above change to the following:
a = 'aaabbbccc'.chars.chunk {|c| c}
#=> #<Enumerator: #<Enumerator::Generator:0x000001021622d8>:each>
# a.to_a => [["a",["a","a","a"]],["b",["b","b","b"]],["c",["c","c","c"]]]
b = a.map {|_,a| a.join }
#=> ["aaa", "bbb", "ccc"]

Find just part of string with a regex

I have a string like so:
"#[30:Larry Middleton]"
I want to return just 30. The 30 will always be digits, and can be one or more digits long.
I've tried:
user_id = result.match(/#\[(\d+):.*]/)
But that returns everything. How can I get back just 30?
If that's really all your string, you don't need to match the rest of the pattern; just match the consecutive digits:
irb(main):001:0> result = "#[30:Larry Middleton]"
#=> "#[30:Larry Middleton]"
irb(main):002:0> result[/\d+/]
#=> "30"
However, if you need to match this as part of a larger string that might have digits elsewhere:
irb(main):004:0> result[/#\[(\d+):.*?\]/]
#=> "#[30:Larry Middleton]"
irb(main):005:0> result[/#\[(\d+):.*?\]/,1]
#=> "30"
irb(main):006:0> result[/#\[(\d+):.*?\]/,1].to_i
#=> 30
If you need the name also:
irb(main):002:0> m = result.match /#\[(\d+):(.*?)\]/
#=> #<MatchData "#[30:Larry Middleton]" 1:"30" 2:"Larry Middleton">
irb(main):003:0> m[1]
#=> "30"
irb(main):004:0> m[2]
#=> "Larry Middleton"
In Ruby 1.9 and later you can even name the matches, instead of using the capture number:
irb(main):005:0> m = result.match /#\[(?<id>\d+):(?<name>.*?)\]/
#=> #<MatchData "#[30:Larry Middleton]" id:"30" name:"Larry Middleton">
irb(main):006:0> m[:id]
#=> "30"
irb(main):007:0> m[:name]
#=> "Larry Middleton"
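If you want all the named groups at once, MatchData#named_captures (available in Ruby 2.4 and later, as far as I know) returns them as a hash keyed by group name. A short sketch using the same pattern; the variable names are mine:

```ruby
result = "#[30:Larry Middleton]"
m = result.match(/#\[(?<id>\d+):(?<name>.*?)\]/)

# Hash with string keys, one entry per named group.
fields = m.named_captures
# { "id" => "30", "name" => "Larry Middleton" }

# Captures are always strings, so convert explicitly when a number is needed.
user_id = Integer(fields["id"])   # 30
```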
And if you need to find many of these:
irb(main):008:0> result = "First there was #[30:Larry Middleton], age 17, and then there was #[42:Phrogz], age unknown."
irb(main):015:0> result.scan /#\[(\d+):.*?\]/
#=> [["30"], ["42"]]
irb(main):016:0> result.scan(/#\[(\d+):.*?\]/).flatten.map(&:to_i)
#=> [30, 42]
irb(main):017:0> result.scan(/#\[(\d+):(.*?)\]/).each{ |id,name| puts "#{name} is #{id}" }
Larry Middleton is 30
Phrogz is 42
Try this:
user_id = result.match(/#\[(\d+):.*]/)[1]
You forgot to escape ']':
user_id = result.match(/#\[(\d+):.*\]/)[1]
I don't know Ruby, but if it supports lookbehinds and lookaheads:
user_id = result.match(/(?<=#\[)\d+(?=:)/)[0]
If not, you should have some way of retrieving a subpattern from the match; again, I wouldn't know how.
I prefer String#scan for most of my regex needs, here's what I would do:
result.scan(/#\[(\d+):/).flatten.map(&:to_i).first
For your second question about getting the name:
result.scan(/(\d+):([A-Za-z ]+)\]$/).flatten[1]
scan always returns an array of the matched substrings:
"#[123:foo bars]".scan(/\d+/) #=> ["123"]
If the pattern contains groups in parentheses, each match instead becomes a sub-array holding that match's captures:
"#[123:foo bars]".scan(/(\d+):(\w+)/) #=> [["123", "foo"]]
That's why we have to call flatten on results involving sub-patterns:
[["123", "foo"]].flatten #=> ["123", "foo"]
scan also always returns strings, which is why the conversion to integers is needed in the first example:
["123"].map(&:to_i) #=> [123]
Hope this is helpful.

How to get a substring of text?

I have text with length ~700. How do I get only ~30 of its first characters?
If you have your text in your_text variable, you can use:
your_text[0..29]
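For what it's worth, there are a few equivalent spellings, and none of them fail when the text is shorter than 30 characters. A quick sketch, with a stand-in string since the real text isn't shown:

```ruby
your_text = "x" * 700   # stand-in for the ~700-character text

a = your_text[0..29]         # inclusive range, 30 characters
b = your_text[0...30]        # exclusive range, same 30 characters
c = your_text[0, 30]         # start index and length
d = your_text.slice(0, 30)   # slice behaves the same as []

# A shorter string is simply returned whole.
short = "hi"[0, 30]          # "hi"
```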
Use String#slice, also aliased as [].
a = "hello there"
a[1] #=> "e"
a[1,3] #=> "ell"
a[1..3] #=> "ell"
a[6..-1] #=> "there"
a[6..] #=> "there" (requires Ruby 2.6+)
a[-3,2] #=> "er"
a[-4..-2] #=> "her"
a[12..-1] #=> nil
a[-2..-4] #=> ""
a[/[aeiou](.)\1/] #=> "ell"
a[/[aeiou](.)\1/, 0] #=> "ell"
a[/[aeiou](.)\1/, 1] #=> "l"
a[/[aeiou](.)\1/, 2] #=> nil
a["lo"] #=> "lo"
a["bye"] #=> nil
Since you tagged it Rails, you can use truncate:
http://api.rubyonrails.org/classes/ActionView/Helpers/TextHelper.html#method-i-truncate
Example:
truncate(@text, :length => 17)
excerpt is nice to know too; it lets you display an excerpt of a text, like so:
excerpt('This is an example', 'an', :radius => 5)
# => ...s is an exam...
http://api.rubyonrails.org/classes/ActionView/Helpers/TextHelper.html#method-i-excerpt
If you need it in Rails, you can use first:
'1234567890'.first(5) # => "12345"
There is also last:
'1234567890'.last(2) # => "90"
Alternatively, check from/to:
"hello".from(1).to(-2) # => "ell"
If you want a string, then the other answers are fine, but if what you're looking for is the first few letters as characters you can access them as a list:
your_text.chars.take(30)
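A small variation, in case the text is long: chars builds an array of every character first, while each_char.first(30) stops after 30. A sketch with a stand-in string; the variable names are mine:

```ruby
your_text = ("a".."z").to_a.join * 27   # stand-in text, 702 characters

eager = your_text.chars.take(30)        # builds all 702 elements, keeps 30
lazy  = your_text.each_char.first(30)   # stops iterating after 30

# Both give the same characters as slicing and then splitting.
same = (eager == lazy) && (eager.join == your_text[0, 30])   # true
```

For a 700-character string the difference is unlikely to matter in practice.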

Resources