Find just part of string with a regex - ruby

I have a string like so:
"#[30:Larry Middleton]"
I want to return just 30. The ID will always consist of digits and can be one or more digits long.
I've tried:
user_id = result.match(/#\[(\d+):.*]/)
But that returns everything. How can I get back just 30?

If that's really all your string, you don't need to match the rest of the pattern; just match the consecutive integers:
irb(main):001:0> result = "#[30:Larry Middleton]"
#=> "#[30:Larry Middleton]"
irb(main):002:0> result[/\d+/]
#=> "30"
However, if you need to match this as part of a larger string that might have digits elsewhere:
irb(main):004:0> result[/#\[(\d+):.*?\]/]
#=> "#[30:Larry Middleton]"
irb(main):005:0> result[/#\[(\d+):.*?\]/,1]
#=> "30"
irb(main):006:0> result[/#\[(\d+):.*?\]/,1].to_i
#=> 30
If you need the name also:
irb(main):002:0> m = result.match /#\[(\d+):(.*?)\]/
#=> #<MatchData "#[30:Larry Middleton]" 1:"30" 2:"Larry Middleton">
irb(main):003:0> m[1]
#=> "30"
irb(main):004:0> m[2]
#=> "Larry Middleton"
In Ruby 1.9 and later you can even name the matches, instead of using the capture number:
irb(main):005:0> m = result.match /#\[(?<id>\d+):(?<name>.*?)\]/
#=> #<MatchData "#[30:Larry Middleton]" id:"30" name:"Larry Middleton">
irb(main):006:0> m[:id]
#=> "30"
irb(main):007:0> m[:name]
#=> "Larry Middleton"
And if you need to find many of these:
irb(main):008:0> result = "First there was #[30:Larry Middleton], age 17, and then there was #[42:Phrogz], age unknown."
irb(main):015:0> result.scan /#\[(\d+):.*?\]/
#=> [["30"], ["42"]]
irb(main):016:0> result.scan(/#\[(\d+):.*?\]/).flatten.map(&:to_i)
#=> [30, 42]
irb(main):017:0> result.scan(/#\[(\d+):(.*?)\]/).each{ |id,name| puts "#{name} is #{id}" }
Larry Middleton is 30
Phrogz is 42

Try this:
user_id = result.match(/#\[(\d+):.*]/)[1]

You forgot to escape ']':
user_id = result.match(/#\[(\d+):.*\]/)[1]
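One caveat with calling [1] directly on the result of match: when the pattern does not match, match returns nil, and nil[1] raises NoMethodError. A minimal sketch of two nil-safe variants follows. The helper name extract_user_id is hypothetical, and the &. operator assumes Ruby 2.3 or later.

```ruby
# Hypothetical helper: returns the id as an Integer, or nil when the
# string does not contain a "#[digits:...]" token.
def extract_user_id(str)
  # String#[] with a capture index returns nil instead of raising
  # when the pattern does not match.
  id = str[/#\[(\d+):.*?\]/, 1]
  id && id.to_i
end

p extract_user_id("#[30:Larry Middleton]")  # prints 30
p extract_user_id("no token here")          # prints nil

# The same idea with match and safe navigation:
m = "no token here".match(/#\[(\d+):.*?\]/)
p m&.[](1)                                  # prints nil
```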

I don't know Ruby, but if it supports lookbehinds and lookaheads:
user_id = result.match(/(?<=#\[)\d+(?=:)/)
If not, you should have some way of retrieving subpattern from the match - again, I wouldn't know how.
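For what it's worth, Ruby's regex engine (1.9 and later) does support both lookbehind and lookahead. A small sketch, using String#[] so that the matched text comes back as a plain string rather than a MatchData object:

```ruby
result = "#[30:Larry Middleton]"

# (?<=#\[) asserts that "#[" precedes the digits; (?=:) asserts that ":"
# follows them. Neither assertion becomes part of the match.
user_id = result[/(?<=#\[)\d+(?=:)/]
p user_id        # prints "30"
p user_id.to_i   # prints 30
```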

I prefer String#scan for most of my regex needs, here's what I would do:
results.scan(/#\[(\d+):/).flatten.map(&:to_i).first
For your second question about getting the name:
results.scan(/(\d+):([A-Za-z ]+)\]$/).flatten[1]
scan always returns an array of matching substrings:
"#[123:foo bars]".scan(/\d+/) #=> ["123"]
If the pattern contains groups in parentheses, each match becomes a sub-array holding one string per group:
"#[123:foo bars]".scan(/(\d+):(\w+)/) #=> [["123", "foo"]]
That's why we have to call flatten on results involving sub-patterns:
[["123", "foo"]].flatten #=> ["123", "foo"]
Also, it always returns strings, which is why the conversion to integer is needed in the first example:
"123".to_i #=> 123
Hope this is helpful.
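To make the difference between the two scan forms concrete, here is a small sketch on a string with two tokens (the sample string is made up for illustration):

```ruby
str = "#[123:foo bars] and #[45:baz]"

# No groups: one string per match.
p str.scan(/\d+/)              # prints ["123", "45"]

# With groups: one sub-array per match, one entry per group.
pairs = str.scan(/(\d+):(\w+)/)
p pairs                        # prints [["123", "foo"], ["45", "baz"]]

# Convert the ids to integers while keeping each pair together.
converted = pairs.map { |id, name| [id.to_i, name] }
p converted                    # prints [[123, "foo"], [45, "baz"]]
```

Mapping over the pairs instead of flattening keeps each id next to its name, which may be easier to work with when there are several matches.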

Related

Generate a hash of all letters and digits

Using ruby, how do I make a hash of each letter in the alphabet (keys) and 1-26 (values) ?
I need to create a hash with "a" to "z" in keys and 1 to 26 in values but I do not want to write myself alphabet = {'a'=>1,'b'=>2,....'y'=>25,'z'=>26}
I need this in my code to print alphabet[i] if alphabet.key?(i)
('a'..'z').each.with_index(1).to_h
#=> {"a"=>1, "b"=>2, "c"=>3, "d"=>4, "e"=>5, "f"=>6, "g"=>7, "h"=>8, "i"=>9, "j"=>10,
# "k"=>11, "l"=>12, "m"=>13, "n"=>14, "o"=>15, "p"=>16, "q"=>17, "r"=>18, "s"=>19,
# "t"=>20, "u"=>21, "v"=>22, "w"=>23, "x"=>24, "y"=>25, "z"=>26}
Steps:
('a'..'z') - create a Range of alphabetic letters "a" through "z" inclusive
each - returns an Enumerator
with_index(1) - returns an Enumerator of each element of the initial Range combined with its index (starting at 1) e.g. [["a",1],["b",2],...]
to_h - convert the Enumerator to a Hash
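The steps above can be inspected one at a time. A short sketch (the variable names are just for illustration):

```ruby
letters = ('a'..'z')                  # Range of "a" through "z"
pairs   = letters.each.with_index(1)  # Enumerator yielding ["a", 1], ["b", 2], ...

p pairs.first(3)       # prints [["a", 1], ["b", 2], ["c", 3]]

alphabet = pairs.to_h
p alphabet['c']        # prints 3
p alphabet.key?('C')   # prints false
p alphabet.size        # prints 26
```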
Update:
A bit more esoteric but this will also work
enum = Enumerator.produce('a') {|e| e == 'z' ? raise(StopIteration) : e.succ }.tap do |e|
e.define_singleton_method(:[]) {|elem| find_index(elem)&.+(1) }
e.define_singleton_method(:to_h) { with_index(1).to_h }
end
enum['w']
#=> 23
enum['W']
#=> nil
enum.to_h
#=> {"a"=>1, "b"=>2, "c"=>3, "d"=>4, "e"=>5, "f"=>6, "g"=>7, "h"=>8, "i"=>9, "j"=>10,
# "k"=>11, "l"=>12, "m"=>13, "n"=>14, "o"=>15, "p"=>16, "q"=>17, "r"=>18, "s"=>19,
# "t"=>20, "u"=>21, "v"=>22, "w"=>23, "x"=>24, "y"=>25, "z"=>26}
With two ranges, zip and to_h
('a'..'z').zip(1..26).to_h
Hash[('a'..'z').zip(1.upto(26))]
Depending on requirements you may be able to save memory by using an empty hash with a default proc.
h = Hash.new do |_h,k|
k.is_a?(String) && k.match?(/\A[a-z]\z/) ? (k.ord - 96) : nil
end
#=> {}
h['a'] #=> 1
h['z'] #=> 26
h['R'] #=> nil
h['cat'] #=> nil
h[2] #=> nil
h[{a:1}] #=> nil
See Hash::new and String#match?.
The regular expression reads, "match the beginning of the string (\A) followed by one lowercase letter ([a-z]) followed by the end of the string (\z)". [a-z] denotes a character class.
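A quick sketch of why \A and \z are used here rather than ^ and $: in Ruby, ^ and $ match at line boundaries, so a multi-line string could slip through. The variable names are hypothetical.

```ruby
single_letter = /\A[a-z]\z/
line_anchored = /^[a-z]$/

p "a".match?(single_letter)       # prints true
p "ab".match?(single_letter)      # prints false
p "a\nb".match?(single_letter)    # prints false
p "a\nb".match?(line_anchored)    # prints true, since ^ and $ match per line
```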
If all lowercase letters must comprise the hash's keys one may write the following.
('a'..'z').to_h { |c| [c, c.ord - 96] }
#=> {"a"=>1, "b"=>2,..., "y"=>25, "z"=>26}
See Enumerable#to_h.
There have been better answers given already, but here's an entirely different option using a times loop to simply increment the keys and values of a starter hash using next:
h = {"a" => 1}
25.times {h[h.keys.last.next] = h.values.last.next}
h
#=> {"a"=>1, "b"=>2, "c"=>3, "d"=>4, "e"=>5, "f"=>6, "g"=>7, "h"=>8, "i"=>9, "j"=>10, "k"=>11, "l"=>12, "m"=>13, "n"=>14, "o"=>15, "p"=>16, "q"=>17, "r"=>18, "s"=>19, "t"=>20, "u"=>21, "v"=>22, "w"=>23, "x"=>24, "y"=>25, "z"=>26}

How to match given characters in string

Given:
fruits = %w[Banana Apple Orange Grape]
chars = 'ep'
how can I print all elements of fruits that have all characters of chars? I tried the following:
fruits.each{|fruit| puts fruit if !(fruit=~/["#{chars}"]/i).nil?}
but I see 'Orange' in the result, which does not have the 'p' character in it.
p fruits.select { |fruit| chars.delete(fruit.downcase).empty? }
["Apple", "Grape"]
String#delete returns a copy of chars with all characters in delete's argument deleted.
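To see why the result is empty exactly when every character of chars occurs in the fruit, here is a small sketch of the intermediate values:

```ruby
chars = 'ep'

# Every character of chars that occurs in the argument is removed.
p chars.delete('apple')    # prints ""
p chars.delete('orange')   # prints "p"
p chars.delete('banana')   # prints "ep"
```

One caveat: the argument to String#delete is a character selector, so a leading ^ negates the set and a - between two characters denotes a range. Plain lowercase words like these are unaffected.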
Just for fun, here's how you might do this with a regular expression, thanks to the magic of positive lookahead:
fruits = %w[Banana Apple Orange Grape]
p fruits.grep(/(?=.*e)(?=.*p)/i)
# => ["Apple", "Grape"]
This is nice and succinct, but the regex is a bit occult, and it gets worse if you want to generalize it:
def match_chars(arr, chars)
expr_parts = chars.chars.map {|c| "(?=.*#{Regexp.escape(c)})" }
arr.grep(Regexp.new(expr_parts.join, true))
end
p match_chars(fruits, "ar")
# => ["Orange", "Grape"]
Also, I'm pretty sure this would be outperformed by most or all of the other answers.
fruits = ["Banana", "Apple", "Orange", "Grape"]
chars = 'ep'.chars
fruits.select { |fruit| (fruit.split('') & chars).length == chars.length }
#=> ["Apple", "Grape"]
chars.each_char.with_object(fruits.dup){|e, a| a.select!{|s| s.include?(e)}}
# => ["Apple", "Grape"]
To print:
puts chars.each_char.with_object(fruits.dup){|e, a| a.select!{|s| s.include?(e)}}
I'm an absolute beginner, but here's what worked for me
fruits = %w[Banana Apple Orange Grape]
chars = 'ep'
fruits.each {|fruit| puts fruit if fruit.include?('e') && fruit.include?('p')}
Here is one more way to do this:
fruits.select {|f| chars.downcase.chars.all? {|c| f.downcase.include?(c)} }
Try this: first split chars into an array of characters (chars.split("")), then check that all of them are present in the word.
fruits.select{|fruit| chars.split("").all? {|char| fruit.include?(char)}}
#=> ["Apple", "Grape"]

retrieve numbers from a string with regex

I have a string which returns duration in the below format.
"152M0S" or "1H22M32S"
I need to extract hours, minutes and seconds from it as numbers.
I tried like the below with regex
video_duration.scan(/(\d+)?.(\d+)M(\d+)S/)
But it does not return what I expected. Does anyone have any idea where I am going wrong here?
"1H22M0S".scan(/\d+/)
#=> ["1", "22", "0"]
You can use this expression: /((?<h>\d+)H)?(?<m>\d+)M(?<s>\d+)S/.
"1H22M32S".match(/((?<h>\d+)H)?(?<m>\d+)M(?<s>\d+)S/)
#=> #<MatchData "1H22M32S" h:"1" m:"22" s:"32">
"152M0S".match(/((?<h>\d+)H)?(?<m>\d+)M(?<s>\d+)S/)
#=> #<MatchData "152M0S" h:nil m:"152" s:"0">
A question mark after a group makes it optional. To access the data after a successful match, use $~[:h].
If you want to extract the numbers, you could do this:
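As a sketch of how the named groups might be used, here is a hypothetical helper that converts such a string into total seconds. The \A and \z anchors and the method name are assumptions added for illustration, not part of the expression above.

```ruby
# Hypothetical helper: total seconds for strings like "1H22M32S".
def duration_in_seconds(str)
  m = str.match(/\A((?<h>\d+)H)?(?<m>\d+)M(?<s>\d+)S\z/)
  return nil unless m
  # m[:h] is nil when the hours part is absent, and nil.to_i is 0.
  m[:h].to_i * 3600 + m[:m].to_i * 60 + m[:s].to_i
end

p duration_in_seconds("1H22M32S")  # prints 4952
p duration_in_seconds("152M0S")    # prints 9120
p duration_in_seconds("oops")      # prints nil
```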
"1H22M32S".match(/(?<hour>(\d+))H(?<min>(\d+))M(?<sec>(\d+))S/i).captures
# => ["1", "22", "32"]
"1H22M32S".match(/(?<hour>(\d+))H(?<min>(\d+))M(?<sec>(\d+))S/i)['min']
# => "22"
"1H22M32S".match(/(?<hour>(\d+))H(?<min>(\d+))M(?<sec>(\d+))S/i)['hour']
# => "1"
Me, I'd hashify:
def hashify(str)
str.gsub(/\d+[HMS]/).with_object({}) { |s,h| h[s[-1]] = s.to_i }
end
hashify "152M0S" #=> {"M"=>152, "S"=>0}
hashify "1H22M32S" #=> {"H"=>1, "M"=>22, "S"=>32}
hashify "32S22M11H" #=> {"S"=>32, "M"=>22, "H"=>11}
hashify "1S" #=> {"S"=>1}

ruby and regex grouping

Here is the code
string = "Looking for the ^[cows]"
footnote = string[/\^\[(.*?)\]/]
I was hoping that footnote would equal cows
What I get is footnote equals ^[cows]
Any help?
Thanks!
You can specify which capture group you want with a second argument to []:
string = "Looking for the ^[cows]"
footnote = string[/\^\[(.*?)\]/, 1]
# footnote == "cows"
According to the String documentation, the #[] method takes a second parameter, an integer, which determines the matching group returned:
a = "hello there"
a[/[aeiou](.)\1/] #=> "ell"
a[/[aeiou](.)\1/, 0] #=> "ell"
a[/[aeiou](.)\1/, 1] #=> "l"
a[/[aeiou](.)\1/, 2] #=> nil
You should use footnote = string[/\^\[(.*?)\]/, 1]
If you want to capture subgroups, you can use Regexp#match:
r = /\^\[(.*?)\]/
r.match(string) # => #<MatchData "^[cows]" 1:"cows">
r.match(string)[0] # => "^[cows]"
r.match(string)[1] # => "cows"
An alternative to using a capture group and then retrieving its contents is to match only what you want. Here are three ways of doing that.
#1 Use a positive lookbehind and a positive lookahead
string[/(?<=\[).*?(?=\])/]
#=> "cows"
#2 Use match but forget (\K) and a positive lookahead
string[/\[\K.*?(?=\])/]
#=> "cows"
#3 Use String#gsub
string.gsub(/.*?\[|\].*/,'')
#=> "cows"
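Note that these alternatives key on any "[", not specifically on "^[". If the text might contain other bracketed parts, a sketch like the following, with the caret included in the pattern, may be closer to the original intent (the sample string is made up):

```ruby
string = "See [note] and the ^[cows]"

# The bare "[" lookbehind picks up the first bracketed text it finds.
first_bracket = string[/(?<=\[).*?(?=\])/]
p first_bracket                    # prints "note"

# Including the caret in the lookbehind (or before \K) restricts the
# match to footnote markers.
with_lookbehind = string[/(?<=\^\[).*?(?=\])/]
with_keep       = string[/\^\[\K.*?(?=\])/]
p with_lookbehind                  # prints "cows"
p with_keep                        # prints "cows"
```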

How to get a substring of text?

I have text with length ~700. How do I get only ~30 of its first characters?
If you have your text in your_text variable, you can use:
your_text[0..29]
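A quick sketch of how this behaves at the edges; the sample strings are made up. Slicing past the end of a shorter string simply returns what is there:

```ruby
your_text = "a" * 700
p your_text[0..29].length   # prints 30
p your_text[0, 30].length   # prints 30 (start, length form)

short = "hello"
p short[0..29]              # prints "hello"
p ""[0..29]                 # prints ""
```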
Use String#slice, also aliased as [].
a = "hello there"
a[1] #=> "e"
a[1,3] #=> "ell"
a[1..3] #=> "ell"
a[6..-1] #=> "there"
a[6..] #=> "there" (requires Ruby 2.6+)
a[-3,2] #=> "er"
a[-4..-2] #=> "her"
a[12..-1] #=> nil
a[-2..-4] #=> ""
a[/[aeiou](.)\1/] #=> "ell"
a[/[aeiou](.)\1/, 0] #=> "ell"
a[/[aeiou](.)\1/, 1] #=> "l"
a[/[aeiou](.)\1/, 2] #=> nil
a["lo"] #=> "lo"
a["bye"] #=> nil
Since you tagged it Rails, you can use truncate:
http://api.rubyonrails.org/classes/ActionView/Helpers/TextHelper.html#method-i-truncate
Example:
truncate(@text, :length => 17)
Excerpt is nice to know too, it lets you display an excerpt of a text Like so:
excerpt('This is an example', 'an', :radius => 5)
# => ...s is an exam...
http://api.rubyonrails.org/classes/ActionView/Helpers/TextHelper.html#method-i-excerpt
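Outside of a Rails view, something similar can be approximated in plain Ruby. The following is only a sketch, not the Rails implementation: the method name truncate_text is hypothetical, and it assumes the total length (including the "..." omission) should not exceed length.

```ruby
# Hypothetical plain-Ruby helper; not the Rails implementation.
def truncate_text(text, length: 30, omission: "...")
  return text if text.length <= length
  # Reserve room for the omission marker inside the length budget.
  keep = [length - omission.length, 0].max
  text[0, keep] + omission
end

p truncate_text("Once upon a time in a world far far away", length: 17)
# prints "Once upon a ti..."
p truncate_text("short")   # prints "short"
```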
If you need it in Rails, you can use first:
'1234567890'.first(5) # => "12345"
There is also last:
'1234567890'.last(2) # => "90"
Alternatively, check from/to:
"hello".from(1).to(-2) # => "ell"
If you want a string, then the other answers are fine, but if what you're looking for is the first few letters as characters you can access them as a list:
your_text.chars.take(30)

Resources