retrieve numbers from a string with regex - ruby

I have a string which returns duration in the below format.
"152M0S" or "1H22M32S"
I need to extract hours, minutes and seconds from it as numbers.
I tried the following regex:
video_duration.scan(/(\d+)?.(\d+)M(\d+)S/)
But it does not return what I expected. Does anyone have any idea where I am going wrong here?

"1H22M0S".scan(/\d+/)
#=> ["1", "22", "0"]

You can use this expression: /((?<h>\d+)H)?(?<m>\d+)M(?<s>\d+)S/.
"1H22M32S".match(/((?<h>\d+)H)?(?<m>\d+)M(?<s>\d+)S/)
#=> #<MatchData "1H22M32S" h:"1" m:"22" s:"32">
"152M0S".match(/((?<h>\d+)H)?(?<m>\d+)M(?<s>\d+)S/)
#=> #<MatchData "152M0S" h:nil m:"152" s:"0">
The question mark after the group makes it optional. To access the data after a successful match, use $~[:h].
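A minimal sketch that wraps the expression above in a helper returning integers. The name parse_duration, the DURATION constant and the hash keys are illustrative and not part of the original answer; the sketch also assumes the whole string must be a duration (hence the \A and \z anchors) and treats a missing hours part as 0.

```ruby
# Hypothetical helper: parse_duration and DURATION are illustrative names.
# (?: ... )? makes the hours part optional without adding a numbered group.
DURATION = /\A(?:(?<h>\d+)H)?(?<m>\d+)M(?<s>\d+)S\z/

def parse_duration(str)
  m = DURATION.match(str)
  return nil unless m
  # m[:h] is nil when the optional hours part is absent; nil.to_i is 0.
  { hours: m[:h].to_i, minutes: m[:m].to_i, seconds: m[:s].to_i }
end

d = parse_duration("152M0S")
puts d[:hours]    # prints 0
puts d[:minutes]  # prints 152
puts d[:seconds]  # prints 0
```

Returning nil for input that does not match lets the caller decide how to handle malformed strings.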

If you want to extract the numbers, you could do this:
"1H22M32S".match(/(?<hour>(\d+))H(?<min>(\d+))M(?<sec>(\d+))S/i).captures
# => ["1", "22", "32"]
"1H22M32S".match(/(?<hour>(\d+))H(?<min>(\d+))M(?<sec>(\d+))S/i)['min']
# => "22"
"1H22M32S".match(/(?<hour>(\d+))H(?<min>(\d+))M(?<sec>(\d+))S/i)['hour']
# => "1"

Me, I'd hashify:
def hashify(str)
  str.gsub(/\d+[HMS]/).with_object({}) { |s,h| h[s[-1]] = s.to_i }
end
hashify "152M0S" #=> {"M"=>152, "S"=>0}
hashify "1H22M32S" #=> {"H"=>1, "M"=>22, "S"=>32}
hashify "32S22M11H" #=> {"S"=>32, "M"=>22, "H"=>11}
hashify "1S" #=> {"S"=>1}
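One possible follow-up, sketched under the assumption that the end goal is a single number of seconds. The SECONDS_PER table and the total_seconds method are hypothetical names added for illustration; hashify is repeated from the answer above so the snippet runs on its own.

```ruby
def hashify(str)
  str.gsub(/\d+[HMS]/).with_object({}) { |s, h| h[s[-1]] = s.to_i }
end

# Hypothetical lookup table: seconds represented by each unit letter.
SECONDS_PER = { "H" => 3600, "M" => 60, "S" => 1 }

# Hypothetical helper: sums every unit found in the string.
def total_seconds(str)
  hashify(str).sum { |unit, n| n * SECONDS_PER[unit] }
end

puts total_seconds("1H22M32S")  # prints 4952
puts total_seconds("152M0S")    # prints 9120
```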

Related

Ruby string char chunking

I have a string "wwwggfffw" and want to break it up into an array as follows:
["www", "gg", "fff", "w"]
Is there a way to do this with regex?
"wwwggfffw".scan(/((.)\2*)/).map(&:first)
scan is a little funny, as it will return either the match or the subgroups depending on whether there are subgroups; we need to use subgroups to ensure repetition of the same character ((.)\1), but we'd prefer it if it returned the whole match and not just the repeated letter. So we need to make the whole match into a subgroup so it will be captured, and in the end we need to extract just the match (without the other subgroup), which we do with .map(&:first).
EDIT to explain the regexp ((.)\2*) itself:
( start group #1, consisting of
( start group #2, consisting of
. any one character
) and nothing else
\2 followed by the content of the group #2
* repeated any number of times (including zero)
) and nothing else.
So in wwwggfffw, (.) captures w into group #2; then \2* captures any additional number of w. This makes group #1 capture www.
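To make the role of the two groups concrete, this short sketch prints what scan returns before .map(&:first) is applied. The variable names pairs and runs are only for illustration.

```ruby
pairs = "wwwggfffw".scan(/((.)\2*)/)
# Each element is [group #1, group #2]: the whole run and its single character.
p pairs  #=> [["www", "w"], ["gg", "g"], ["fff", "f"], ["w", "w"]]

# Keeping only group #1 gives the runs themselves.
runs = pairs.map(&:first)
p runs   #=> ["www", "gg", "fff", "w"]
```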
You can use back references, something like
'wwwggfffw'.scan(/((.)\2*)/).map{ |s| s[0] }
will work
Here's one that's not using regex but works well:
def chunk(str)
  chars = str.chars
  chars.inject([chars.shift]) do |arr, char|
    if arr[-1].include?(char)
      arr[-1] << char
    else
      arr << char
    end
    arr
  end
end
In my benchmarks it's faster than the regex answers here (with the example string you gave, at least).
Another non-regex solution, this one using Enumerable#slice_when, which made its debut in Ruby v.2.2:
str.each_char.slice_when { |a,b| a!=b }.map(&:join)
#=> ["www", "gg", "fff", "w"]
Another option is:
str.scan(Regexp.new(str.squeeze.each_char.map { |c| "(#{c}+)" }.join)).first
#=> ["www", "gg", "fff", "w"]
Here the steps are as follows
s = str.squeeze
#=> "wgfw"
a = s.each_char
#=> #<Enumerator: "wgfw":each_char>
This enumerator generates the following elements:
a.to_a
#=> ["w", "g", "f", "w"]
Continuing
b = a.map { |c| "(#{c}+)" }
#=> ["(w+)", "(g+)", "(f+)", "(w+)"]
c = b.join
#=> "(w+)(g+)(f+)(w+)"
r = Regexp.new(c)
#=> /(w+)(g+)(f+)(w+)/
d = str.scan(r)
#=> [["www", "gg", "fff", "w"]]
d.first
#=> ["www", "gg", "fff", "w"]
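One caveat with building a regex from the string itself: characters such as + or ( have a special meaning in a pattern. The sketch below is a variation on the steps above that passes each character through Regexp.escape first; the method name runs and the empty-string guard are additions for illustration, not part of the original answer.

```ruby
# Hypothetical wrapper around the squeeze-based approach.
def runs(str)
  return [] if str.empty?
  # Escape each distinct run character so regex metacharacters are matched literally.
  pattern = str.squeeze.each_char.map { |c| "(#{Regexp.escape(c)}+)" }.join
  str.scan(Regexp.new(pattern)).first
end

p runs("wwwggfffw")  #=> ["www", "gg", "fff", "w"]
p runs("a++b")       #=> ["a", "++", "b"]
```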
Here's one more way of doing it without a regex:
'wwwggfffw'.chars.chunk(&:itself).map{ |s| s[1].join }
# => ["www", "gg", "fff", "w"]

How do you replace a string with a hash collection value in Ruby?

I have a hash collection:
my_hash = {"1" => "apple", "2" => "bee", "3" => "cat"}
What syntax would I use to replace the first occurrence of the key with hash collection value in a string?
eg my input string:
str = "I want a 3"
The resulting string would be:
str = "I want a cat"
My one liner:
hash.each { |k, v| str[k] &&= v }
or using String#sub! method:
hash.each { |k, v| str.sub!(k, v) }
"I want a %{b}" % {c: "apple", b: "bee", a: "cat"}
=> "I want a bee"
Assuming Ruby 1.9 or later:
str.gsub /\d/, my_hash
I didn't understand your problem, but you can try this:
my_hash = {"1" => "apple", "2" => "bee", "3" => "cat"}
str = "I want a 3"
str.gsub(/[[:word:]]+/).each do |word|
  my_hash[word] || word
end
#=> "I want a cat"
:D
Just to add point free style abuse to fl00r's answer:
my_hash = {"1" => "apple", "2" => "bee", "3" => "cat"}
my_hash.default_proc = Proc.new {|hash, key| key}
str = "I want a 3"
str.gsub(/[[:word:]]+/).each(&my_hash.method(:[]))
my_hash = {"1" => "apple", "2" => "bee", "3" => "cat"}
str = "I want a 3"
If there isn't any general pattern for the strings you want to substitute, you can use:
str.sub /#{my_hash.keys.map { |s| Regexp.escape s }.join '|'}/, my_hash
But if there is one, the code becomes much simpler, e.g.:
str.sub /[0-9]+/, my_hash
If you want to substitute all the occurrences, not only the first one, use gsub.
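A small sketch of the difference between sub and gsub when a hash is passed as the replacement. The input string with two keys in it is made up for the illustration.

```ruby
my_hash = { "1" => "apple", "2" => "bee", "3" => "cat" }
str = "I want a 3 and a 1"

first_only = str.sub(/[0-9]+/, my_hash)    # replaces the first match only
all        = str.gsub(/[0-9]+/, my_hash)   # replaces every match

puts first_only  # prints "I want a cat and a 1"
puts all         # prints "I want a cat and a apple"

# Matched text that is not a key in the hash is replaced with an empty string.
missing = "I want a 9".sub(/[0-9]+/, my_hash)
puts missing.inspect  # prints "I want a "
```

If unknown keys should be left untouched, the block form with my_hash.fetch(m, m) avoids the empty replacement.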
You can use String#sub in Ruby 1.9:
string.sub(key, hash[key])
The following code replaces every word in str that is a key in my_hash with the corresponding value and leaves all other words unchanged. Note that gsub replaces all such occurrences, not only the first:
str.gsub(/\w+/) { |m| my_hash.fetch(m,m)}
=> "I want a cat"

Find just part of string with a regex

I have a string like so:
"#[30:Larry Middleton]"
I want to return just 30. The 30 will always be digits, and can be one or more digits long.
I've tried:
user_id = result.match(/#\[(\d+):.*]/)
But that returns everything. How can I get back just 30?
If that's really all your string, you don't need to match the rest of the pattern; just match the consecutive integers:
irb(main):001:0> result = "#[30:Larry Middleton]"
#=> "#[30:Larry Middleton]"
irb(main):002:0> result[/\d+/]
#=> "30"
However, if you need to match this as part of a larger string that might have digits elsewhere:
irb(main):004:0> result[/#\[(\d+):.*?\]/]
#=> "#[30:Larry Middleton]"
irb(main):005:0> result[/#\[(\d+):.*?\]/,1]
#=> "30"
irb(main):006:0> result[/#\[(\d+):.*?\]/,1].to_i
#=> 30
If you need the name also:
irb(main):002:0> m = result.match /#\[(\d+):(.*?)\]/
#=> #<MatchData "#[30:Larry Middleton]" 1:"30" 2:"Larry Middleton">
irb(main):003:0> m[1]
#=> "30"
irb(main):004:0> m[2]
#=> "Larry Middleton"
In Ruby 1.9 you can even name the matches, instead of using the capture number:
irb(main):005:0> m = result.match /#\[(?<id>\d+):(?<name>.*?)\]/
#=> #<MatchData "#[30:Larry Middleton]" id:"30" name:"Larry Middleton">
irb(main):006:0> m[:id]
#=> "30"
irb(main):007:0> m[:name]
#=> "Larry Middleton"
And if you need to find many of these:
irb(main):008:0> result = "First there was #[30:Larry Middleton], age 17, and then there was #[42:Phrogz], age unknown."
irb(main):015:0> result.scan /#\[(\d+):.*?\]/
#=> [["30"], ["42"]]
irb(main):016:0> result.scan(/#\[(\d+):.*?\]/).flatten.map(&:to_i)
#=> [30, 42]
irb(main):017:0> result.scan(/#\[(\d+):(.*?)\]/).each{ |id,name| puts "#{name} is #{id}" }
Larry Middleton is 30
Phrogz is 42
Try this:
user_id = result.match(/#\[(\d+):.*]/)[1]
You forgot to escape ']':
user_id = result.match(/#\[(\d+):.*\]/)[1]
I don't know Ruby, but if it supports lookbehinds and lookaheads:
user_id = result.match(/(?<=#\[)\d+(?=:)/)
If not, there should be some way of retrieving a subpattern from the match; again, I wouldn't know how.
I prefer String#scan for most of my regex needs, here's what I would do:
result.scan(/#\[(\d+):/).flatten.map(&:to_i).first
For your second question about getting the name:
result.scan(/(\d+):([A-Za-z ]+)\]$/).flatten[1]
Scan will always return an array of sub string matches:
"#[123:foo bars]".scan(/\d+/) #=> ['123']
If you include groups in parens, then each match becomes a sub-array holding the captures of those "sub-patterns":
"#[123:foo bars]".scan(/(\d+):(\w+)/) #=> [['123', 'foo']]
That's why we have to do flatten on results involving sub-patterns:
[['123', 'foo']].flatten #=> ['123', 'foo']
Also it always returns strings, that's why conversion to integer is needed in the first example:
'123'.to_i #=> 123
Hope this is helpful.
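A short sketch of how scan behaves with and without capture groups, using the same sample string. The variable names s, id and name are only for illustration.

```ruby
s = "#[123:foo bars]"

# Without groups: an array of the matched strings.
p s.scan(/\d+/)          #=> ["123"]

# With groups: one sub-array per match, holding that match's captures.
p s.scan(/(\d+):(\w+)/)  #=> [["123", "foo"]]

# The captures of the first match can be destructured directly.
id, name = s.scan(/(\d+):(\w+)/).first
puts id.to_i + 1         # prints 124
puts name                # prints foo
```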

Check value in loop

I'm iterating through an array:
@fileArray.each() {
|x|
}
How can I access the value x to check if it begins with a specific string?
test = ['abc', 'bcef', 'abcdef']
p test.select{|word| word.start_with?('abc')}
#=> ["abc", "abcdef"]
# or the very short:
test.grep(/^abc/)
#=> ["abc", "abcdef"]
This seems to do the trick!
test = ['abc', 'bcabcef', 'abcdef']
test.each do |x|
  if x.match(/^abc/)
    puts x
  end
end
Outputs:
abc
abcdef
You could use select.
["a","ab","b","ac","c"].select{|x| x[0] == "a"}
=> ["a", "ab", "ac"]
If not, then you can just do
x[0..5] == "String"
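Putting the pieces together for the loop in the question, here is a sketch that checks each element inside the block. The array contents and the names file_array and matches are made up for the example.

```ruby
file_array = ["abc.txt", "bcabc.txt", "abcdef.rb", "xyz.rb"]

# Inside the block, x is the current element, so it can be tested directly.
file_array.each do |x|
  puts x if x.start_with?("abc")
end
# prints:
# abc.txt
# abcdef.rb

# start_with? also accepts several prefixes and returns true if any of them matches.
matches = file_array.select { |x| x.start_with?("abc", "xyz") }
p matches  #=> ["abc.txt", "abcdef.rb", "xyz.rb"]
```

Compared with x[0..5] == "String", start_with? does not require counting the prefix length by hand.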

How to get a substring of text?

I have text with length ~700. How do I get only ~30 of its first characters?
If you have your text in your_text variable, you can use:
your_text[0..29]
Use String#slice, also aliased as [].
a = "hello there"
a[1] #=> "e"
a[1,3] #=> "ell"
a[1..3] #=> "ell"
a[6..-1] #=> "there"
a[6..] #=> "there" (requires Ruby 2.6+)
a[-3,2] #=> "er"
a[-4..-2] #=> "her"
a[12..-1] #=> nil
a[-2..-4] #=> ""
a[/[aeiou](.)\1/] #=> "ell"
a[/[aeiou](.)\1/, 0] #=> "ell"
a[/[aeiou](.)\1/, 1] #=> "l"
a[/[aeiou](.)\1/, 2] #=> nil
a["lo"] #=> "lo"
a["bye"] #=> nil
Since you tagged it Rails, you can use truncate:
http://api.rubyonrails.org/classes/ActionView/Helpers/TextHelper.html#method-i-truncate
Example:
truncate(@text, :length => 17)
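Outside Rails, similar behavior can be approximated in plain Ruby. This is only a sketch: truncate_plain is a hypothetical name, it is not Rails' implementation, and it assumes the length limit should include the omission string.

```ruby
# Hypothetical helper, not the Rails implementation.
def truncate_plain(text, length, omission = "...")
  return text if text.length <= length
  # Reserve room for the omission so the result is exactly `length` characters.
  text[0, length - omission.length] + omission
end

your_text = "The quick brown fox jumps over the lazy dog"
puts truncate_plain(your_text, 17)  # prints "The quick brow..."
puts truncate_plain("short", 17)    # prints "short"
```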
excerpt is nice to know too; it lets you display an excerpt of a text, like so:
excerpt('This is an example', 'an', :radius => 5)
# => ...s is an exam...
http://api.rubyonrails.org/classes/ActionView/Helpers/TextHelper.html#method-i-excerpt
If you need it in Rails, you can use first:
'1234567890'.first(5) # => "12345"
There is also last:
'1234567890'.last(2) # => "90"
Alternatively, check from/to:
"hello".from(1).to(-2) # => "ell"
If you want a string, then the other answers are fine, but if what you're looking for is the first few letters as individual characters, you can get them as an array:
your_text.chars.take(30)

Resources