How to get a substring of text? - ruby

I have a text about 700 characters long. How do I get only its first ~30 characters?

If your text is in a your_text variable, you can use:
your_text[0..29]

Use String#slice, also aliased as [].
a = "hello there"
a[1] #=> "e"
a[1,3] #=> "ell"
a[1..3] #=> "ell"
a[6..-1] #=> "there"
a[6..] #=> "there" (requires Ruby 2.6+)
a[-3,2] #=> "er"
a[-4..-2] #=> "her"
a[12..-1] #=> nil
a[-2..-4] #=> ""
a[/[aeiou](.)\1/] #=> "ell"
a[/[aeiou](.)\1/, 0] #=> "ell"
a[/[aeiou](.)\1/, 1] #=> "l"
a[/[aeiou](.)\1/, 2] #=> nil
a["lo"] #=> "lo"
a["bye"] #=> nil
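A minimal sketch for the original question (first 30 characters), assuming the text is a String; `first_chars` is a hypothetical helper name. The `[start, length]` form returns the whole string when it is shorter than the requested length, and Ruby indexes by characters rather than bytes, so multibyte text is not cut in the middle of a character.

```ruby
# Hypothetical helper: return at most the first n characters of text.
def first_chars(text, n)
  # String#[](start, length) stops at the end of the string,
  # so a short string comes back unchanged instead of raising.
  text[0, n]
end

long_text = "x" * 700
puts first_chars(long_text, 30).length   # 30
puts first_chars("short", 30)            # short
puts first_chars("héllo wörld", 4)       # héll
```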

Since you tagged it Rails, you can use truncate:
http://api.rubyonrails.org/classes/ActionView/Helpers/TextHelper.html#method-i-truncate
Example:
truncate(text, :length => 17)
excerpt is also nice to know; it lets you display an excerpt of a text, like so:
excerpt('This is an example', 'an', :radius => 5)
# => ...s is an exam...
http://api.rubyonrails.org/classes/ActionView/Helpers/TextHelper.html#method-i-excerpt
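Outside a Rails view, a rough plain-Ruby approximation of what truncate does by default: the result, including the "..." omission, is at most `length` characters long. `truncate_text` is a hypothetical name, and this sketch ignores Rails options such as `:separator` as well as HTML escaping.

```ruby
# Hypothetical approximation of Rails' truncate (no :separator, no escaping).
def truncate_text(text, length, omission = "...")
  return text.dup if text.length <= length
  # Reserve room for the omission so the total stays within `length`.
  stop = [length - omission.length, 0].max
  text[0, stop] + omission
end

puts truncate_text("Once upon a time in a world far far away", 17)
# Once upon a ti...
puts truncate_text("Short", 17)
# Short
```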

If you need it in Rails, you can use ActiveSupport's String#first:
'1234567890'.first(5) # => "12345"
There is also String#last:
'1234567890'.last(2) # => "90"
Alternatively, check String#from and String#to:
"hello".from(1).to(-2) # => "ell"
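Without ActiveSupport, the same results can be had with plain String#[]. This is a sketch for the example strings only: `s[-2, 2]` returns nil for a string shorter than 2 characters, whereas ActiveSupport's `last` returns the whole string.

```ruby
s = "1234567890"

# Like first(5): the first 5 characters
puts s[0, 5]                  # 12345

# Like last(2): the last 2 characters (assumes s has at least 2 characters)
puts s[-2, 2]                 # 90

# Like from(1).to(-2): drop the first character, then keep everything
# up to the second-to-last character of what remains
puts "hello"[1..-1][0..-2]    # ell
```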

If you want a string, the other answers are fine, but if you are looking for the first few characters as an array of one-character strings, you can take them from chars:
your_text.chars.take(30)
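`chars` builds an array of all ~700 characters before taking 30. A small variation, not from the original answer: `each_char.first(30)` stops enumerating after 30 characters and returns the same array; `join` turns either result back into a string.

```ruby
your_text = "abcdefghij" * 70          # sample 700-character text

a = your_text.chars.take(30)         # builds the full array, then takes 30
b = your_text.each_char.first(30)    # stops after 30 characters

puts a == b                # true
p a.first(3)               # ["a", "b", "c"]
puts b.join.length         # 30
```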

Related

Generate a hash of all letters and digits

Using Ruby, how do I make a hash with each letter of the alphabet as keys and 1-26 as values?
I need to create a hash with "a" to "z" as keys and 1 to 26 as values, but I do not want to write out alphabet = {'a'=>1,'b'=>2,....'y'=>25,'z'=>26} myself.
I need this in my code to print alphabet[i] if alphabet.key?(i)
('a'..'z').each.with_index(1).to_h
#=> {"a"=>1, "b"=>2, "c"=>3, "d"=>4, "e"=>5, "f"=>6, "g"=>7, "h"=>8, "i"=>9, "j"=>10,
# "k"=>11, "l"=>12, "m"=>13, "n"=>14, "o"=>15, "p"=>16, "q"=>17, "r"=>18, "s"=>19,
# "t"=>20, "u"=>21, "v"=>22, "w"=>23, "x"=>24, "y"=>25, "z"=>26}
Steps:
('a'..'z') - create a Range of alphabetic letters "a" through "z" inclusive
each - returns an Enumerator
with_index(1) - returns an Enumerator of each element of the initial Range combined with its index (starting at 1) e.g. [["a",1],["b",2],...]
to_h - convert the Enumerator to a Hash
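The steps above can be checked one at a time. This sketch uses the shorter range 'a'..'e' only to keep the printed output small; the behaviour is the same for 'a'..'z'.

```ruby
letters = ('a'..'e')

step1 = letters.each          # Enumerator over "a".."e"
step2 = step1.with_index(1)   # Enumerator of [letter, index] pairs, index starting at 1
p step2.to_a                  # [["a", 1], ["b", 2], ["c", 3], ["d", 4], ["e", 5]]

hash = step2.to_h             # each [letter, index] pair becomes a key/value entry
p hash["c"]                   # 3
```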
Update:
A bit more esoteric, but this will also work (Enumerator.produce requires Ruby 2.7+):
enum = Enumerator.produce('a') {|e| e == 'z' ? raise(StopIteration) : e.succ }.tap do |e|
  e.define_singleton_method(:[]) {|elem| find_index(elem)&.+(1) }
  e.define_singleton_method(:to_h) { with_index(1).to_h }
end
enum['w']
#=> 23
enum['W']
#=> nil
enum.to_h
#=> {"a"=>1, "b"=>2, "c"=>3, "d"=>4, "e"=>5, "f"=>6, "g"=>7, "h"=>8, "i"=>9, "j"=>10,
# "k"=>11, "l"=>12, "m"=>13, "n"=>14, "o"=>15, "p"=>16, "q"=>17, "r"=>18, "s"=>19,
# "t"=>20, "u"=>21, "v"=>22, "w"=>23, "x"=>24, "y"=>25, "z"=>26}
Using zip on two ranges, then to_h:
('a'..'z').zip(1..26).to_h
Hash[('a'..'z').zip(1.upto(26))]
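A quick way to confirm that these one-liners build the same hash as each.with_index(1).to_h:

```ruby
a = ('a'..'z').each.with_index(1).to_h
b = ('a'..'z').zip(1..26).to_h
c = Hash[('a'..'z').zip(1.upto(26))]

puts a == b && b == c   # true
puts a.size             # 26
puts a['w']             # 23
```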
Depending on your requirements, you may be able to save memory by using an empty hash with a default proc.
h = Hash.new do |_h,k|
  k.is_a?(String) && k.match?(/\A[a-z]\z/) ? (k.ord - 96) : nil
end
#=> {}
h['a'] #=> 1
h['z'] #=> 26
h['R'] #=> nil
h['cat'] #=> nil
h[2] #=> nil
h[{a:1}] #=> nil
See Hash::new and String#match?.
The regular expression reads, "match the beginning of the string (\A), followed by one lowercase letter ([a-z]), followed by the end of the string (\z)." [a-z] denotes a character class.
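One reason to anchor with \A and \z rather than ^ and $: in Ruby, ^ and $ match at the start and end of every line, so a multi-line string can slip through. A quick check:

```ruby
single_letter = /\A[a-z]\z/
line_based    = /^[a-z]$/

puts single_letter.match?("q")        # true
puts single_letter.match?("cat")      # false
puts line_based.match?("cat\nq")      # true: "q" sits alone on its own line
puts single_letter.match?("cat\nq")   # false: the whole string must be one letter
```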
If the hash must actually contain every lowercase letter as a key, one may write the following (Enumerable#to_h with a block requires Ruby 2.6+).
('a'..'z').to_h { |c| [c, c.ord - 96] }
#=> {"a"=>1, "b"=>2,..., "y"=>25, "z"=>26}
See Enumerable#to_h.
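Both the default-proc hash and the to_h block rely on the lowercase ASCII letters having consecutive code points, starting at 'a'.ord == 97, so subtracting 96 yields 1 through 26. The mapping in both directions:

```ruby
puts 'a'.ord          # 97
puts 'a'.ord - 96     # 1
puts 'z'.ord - 96     # 26
puts((23 + 96).chr)   # w
```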
There have been better answers given already, but here's an entirely different option: a times loop that repeatedly adds the next key and value to a starter hash, using String#next and Integer#next:
h = {"a" => 1}
25.times {h[h.keys.last.next] = h.values.last.next}
h
#=> {"a"=>1, "b"=>2, "c"=>3, "d"=>4, "e"=>5, "f"=>6, "g"=>7, "h"=>8, "i"=>9, "j"=>10, "k"=>11, "l"=>12, "m"=>13, "n"=>14, "o"=>15, "p"=>16, "q"=>17, "r"=>18, "s"=>19, "t"=>20, "u"=>21, "v"=>22, "w"=>23, "x"=>24, "y"=>25, "z"=>26}

Ruby string char chunking

I have a string "wwwggfffw" and want to break it up into an array as follows:
["www", "gg", "fff", "w"]
Is there a way to do this with regex?
"wwwggfffw".scan(/((.)\2*)/).map(&:first)
scan is a little funny: it returns either the whole matches or the captured groups, depending on whether the pattern contains any groups. We need a group to require repetition of the same character ((.)\1 on its own), but we'd prefer to get back the whole match rather than just the repeated letter. So we wrap the whole match in a group as well, so that it is captured, and at the end we keep only that outer group (dropping the inner one) with .map(&:first). Adding the outer group shifts the numbering, which is why the backreference becomes \2 in the final pattern.
EDIT to explain the regexp ((.)\2*) itself:
( start group #1, consisting of
( start group #2, consisting of
. any one character
) and nothing else
\2 followed by the content of the group #2
* repeated any number of times (including zero)
) and nothing else.
So in wwwggfffw, (.) captures w into group #2; then \2* captures any additional number of w. This makes group #1 capture www.
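The effect of the extra outer group is easy to see by comparing the raw scan results:

```ruby
str = "wwwggfffw"

# Only the inner group: scan returns just the repeated letter of each run
p str.scan(/(.)\1*/)                  # [["w"], ["g"], ["f"], ["w"]]

# Outer group around the whole run: each result holds [run, letter]
p str.scan(/((.)\2*)/)                # [["www", "w"], ["gg", "g"], ["fff", "f"], ["w", "w"]]

# Keep only group #1, the whole run
p str.scan(/((.)\2*)/).map(&:first)   # ["www", "gg", "fff", "w"]
```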
You can use backreferences; something like
'wwwggfffw'.scan(/((.)\2*)/).map{ |s| s[0] }
will work.
Here's one that's not using regex but works well:
def chunk(str)
  chars = str.chars
  chars.inject([chars.shift]) do |arr, char|
    if arr[-1].include?(char)
      arr[-1] << char
    else
      arr << char
    end
    arr
  end
end
In my benchmarks it's faster than the regex answers here (with the example string you gave, at least).
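A minimal sketch for checking that speed claim on your own machine with the standard benchmark library. chunk_inject is a hypothetical name for a copy of the inject-based method above, renamed so it is self-contained here; the iteration count is arbitrary and timings depend on Ruby version and input, so no numbers are shown.

```ruby
require 'benchmark'

# Copy of the inject-based answer, for a self-contained comparison.
def chunk_inject(str)
  chars = str.chars
  chars.inject([chars.shift]) do |arr, char|
    arr[-1].include?(char) ? arr[-1] << char : arr << char
    arr
  end
end

# The scan-based answer, for comparison.
def chunk_regex(str)
  str.scan(/((.)\2*)/).map(&:first)
end

str = "wwwggfffw"
n = 50_000

Benchmark.bm(7) do |x|
  x.report("inject") { n.times { chunk_inject(str) } }
  x.report("regex")  { n.times { chunk_regex(str) } }
end
```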
Another non-regex solution, this one using Enumerable#slice_when, which was added in Ruby 2.2:
str.each_char.slice_when { |a,b| a!=b }.map(&:join)
#=> ["www", "gg", "fff", "w"]
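A related option, not from the original answer: Enumerable#chunk_while (added in Ruby 2.3) is the mirror image of slice_when; it keeps neighbours together while the block is true instead of splitting where the block is true.

```ruby
str = "wwwggfffw"

p str.each_char.slice_when { |a, b| a != b }.map(&:join)   # ["www", "gg", "fff", "w"]
p str.each_char.chunk_while { |a, b| a == b }.map(&:join)  # ["www", "gg", "fff", "w"]
```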
Another option is:
str.scan(Regexp.new(str.squeeze.each_char.map { |c| "(#{c}+)" }.join)).first
#=> ["www", "gg", "fff", "w"]
Here the steps are as follows:
s = str.squeeze
#=> "wgfw"
a = s.each_char
#=> #<Enumerator: "wgfw":each_char>
This enumerator generates the following elements:
a.to_a
#=> ["w", "g", "f", "w"]
Continuing
b = a.map { |c| "(#{c}+)" }
#=> ["(w+)", "(g+)", "(f+)", "(w+)"]
c = b.join
#=> "(w+)(g+)(f+)(w+)"
r = Regexp.new(c)
#=> /(w+)(g+)(f+)(w+)/
d = str.scan(r)
#=> [["www", "gg", "fff", "w"]]
d.first
#=> ["www", "gg", "fff", "w"]
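One caveat with the squeeze approach: each character is interpolated into the pattern unescaped, so regex metacharacters such as . or + can change the meaning of the pattern or make it invalid. A sketch of the same steps with Regexp.escape; `runs` is a hypothetical method name.

```ruby
def runs(str)
  # Escape each character so metacharacters are matched literally.
  pattern = str.squeeze.each_char.map { |c| "(#{Regexp.escape(c)}+)" }.join
  str.scan(Regexp.new(pattern)).first
end

p runs("wwwggfffw")   # ["www", "gg", "fff", "w"]
p runs("..++")        # ["..", "++"]
```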
Here's one more way of doing it without a regex:
'wwwggfffw'.chars.chunk(&:itself).map{ |s| s[1].join }
# => ["www", "gg", "fff", "w"]
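chunk yields [key, elements] pairs, which is why the answer reads s[1]. Looking at the intermediate value, and destructuring the pair instead:

```ruby
pairs = 'wwwggfffw'.chars.chunk(&:itself).to_a
p pairs
# [["w", ["w", "w", "w"]], ["g", ["g", "g"]], ["f", ["f", "f", "f"]], ["w", ["w"]]]

# Destructuring names the parts of each pair instead of indexing with s[1]
p pairs.map { |_char, run| run.join }   # ["www", "gg", "fff", "w"]
```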

retrieve numbers from a string with regex

I have a string that holds a duration in one of the following formats:
"152M0S" or "1H22M32S"
I need to extract the hours, minutes and seconds from it as numbers.
I tried the following regex:
video_duration.scan(/(\d+)?.(\d+)M(\d+)S/)
But it does not return what I expected. Does anyone have an idea where I am going wrong?
"1H22M0S".scan(/\d+/)
#=> ["1", "22", "0"]
You can use this expression: /((?<h>\d+)H)?(?<m>\d+)M(?<s>\d+)S/.
"1H22M32S".match(/((?<h>\d+)H)?(?<m>\d+)M(?<s>\d+)S/)
#=> #<MatchData "1H22M32S" h:"1" m:"22" s:"32">
"152M0S".match(/((?<h>\d+)H)?(?<m>\d+)M(?<s>\d+)S/)
#=> #<MatchData "152M0S" h:nil m:"152" s:"0">
A question mark after a group makes it optional. To access the data after a successful match, use $~[:h] ($~ holds the last MatchData).
If you want to extract the numbers, you could do:
"1H22M32S".match(/(?<hour>(\d+))H(?<min>(\d+))M(?<sec>(\d+))S/i).captures
# => ["1", "22", "32"]
"1H22M32S".match(/(?<hour>(\d+))H(?<min>(\d+))M(?<sec>(\d+))S/i)['min']
# => "22"
"1H22M32S".match(/(?<hour>(\d+))H(?<min>(\d+))M(?<sec>(\d+))S/i)['hour']
# => "1"
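Note that the pattern just above requires the H part, so it returns nil for "152M0S". A variation that combines it with the optional hour group from the earlier answer and converts all captures at once, assuming Ruby 2.4+ for MatchData#named_captures and Hash#transform_values; DURATION and duration_parts are hypothetical names. nil.to_i is 0, so a missing hour becomes 0.

```ruby
DURATION = /(?:(?<hour>\d+)H)?(?<min>\d+)M(?<sec>\d+)S/

def duration_parts(str)
  m = DURATION.match(str)
  return nil unless m   # no match, e.g. a string without M and S parts
  m.named_captures.transform_values(&:to_i)
end

p duration_parts("1H22M32S")["min"]   # 22
p duration_parts("152M0S")["hour"]    # 0
```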
Me, I'd hashify:
def hashify(str)
  str.gsub(/\d+[HMS]/).with_object({}) { |s,h| h[s[-1]] = s.to_i }
end
hashify "152M0S" #=> {"M"=>152, "S"=>0}
hashify "1H22M32S" #=> {"H"=>1, "M"=>22, "S"=>32}
hashify "32S22M11H" #=> {"S"=>32, "M"=>22, "H"=>11}
hashify "1S" #=> {"S"=>1}
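Since hashify leaves out the units that are absent, turning its result into a single number only needs defaults. A sketch; total_seconds is a hypothetical helper, and hashify is repeated so the block is self-contained.

```ruby
def hashify(str)
  str.gsub(/\d+[HMS]/).with_object({}) { |s, h| h[s[-1]] = s.to_i }
end

# Hypothetical follow-on: missing units default to 0.
def total_seconds(str)
  h = hashify(str)
  h.fetch("H", 0) * 3600 + h.fetch("M", 0) * 60 + h.fetch("S", 0)
end

p total_seconds("1H22M32S")   # 4952
p total_seconds("152M0S")     # 9120
```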

ruby and regex grouping

Here is the code
string = "Looking for the ^[cows]"
footnote = string[/\^\[(.*?)\]/]
I was hoping that footnote would equal cows
What I get is footnote equals ^[cows]
Any help?
Thanks!
You can specify which capture group you want with a second argument to []:
string = "Looking for the ^[cows]"
footnote = string[/\^\[(.*?)\]/, 1]
# footnote == "cows"
According to the String documentation, the #[] method takes a second parameter, an integer, which determines the matching group returned:
a = "hello there"
a[/[aeiou](.)\1/] #=> "ell"
a[/[aeiou](.)\1/, 0] #=> "ell"
a[/[aeiou](.)\1/, 1] #=> "l"
a[/[aeiou](.)\1/, 2] #=> nil
You should use footnote = string[/\^\[(.*?)\]/, 1]
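String#[] also accepts a capture group name in place of the index, and it returns nil when nothing matches. A small sketch; the group name note is hypothetical.

```ruby
string = "Looking for the ^[cows]"

p string[/\^\[(.*?)\]/, 1]              # "cows"
p string[/\^\[(?<note>.*?)\]/, "note"]  # "cows"
p "no footnote here"[/\^\[(.*?)\]/, 1]  # nil
```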
If you want to capture subgroups, you can use Regexp#match:
r = /\^\[(.*?)\]/
r.match(string) # => #<MatchData "^[cows]" 1:"cows">
r.match(string)[0] # => "^[cows]"
r.match(string)[1] # => "cows"
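One thing to watch with Regexp#match: it returns nil when there is no match, so calling [1] on the result directly raises NoMethodError. A sketch that checks first:

```ruby
r = /\^\[(.*?)\]/

["Looking for the ^[cows]", "no footnote here"].each do |text|
  m = r.match(text)
  # m is nil when nothing matches, so check before indexing
  puts m ? m[1] : "(no footnote)"
end
# cows
# (no footnote)
```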
An alternative to using a capture group and then retrieving its contents is to match only what you want. Here are three ways of doing that.
#1 Use a positive lookbehind and a positive lookahead
string[/(?<=\[).*?(?=\])/]
#=> "cows"
#2 Use \K (which drops everything matched before it from the result) and a positive lookahead
string[/\[\K.*?(?=\])/]
#=> "cows"
#3 Use String#gsub
string.gsub(/.*?\[|\].*/,'')
#=> "cows"
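The lookaround version also extends to strings with several footnotes: String#[] returns only the first match, while scan returns all of them. A sketch with a made-up second footnote:

```ruby
text = "Looking for the ^[cows] and the ^[pigs]"

p text[/(?<=\^\[).*?(?=\])/]       # "cows" (only the first match)
p text.scan(/(?<=\^\[).*?(?=\])/)  # ["cows", "pigs"]
```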

Find just part of string with a regex

I have a string like so:
"#[30:Larry Middleton]"
I want to return just 30, which will always be digits and can be of any length (one or more digits).
I've tried:
user_id = result.match(/#\[(\d+):.*]/)
But that returns everything. How can I get back just 30?
If that's really your whole string, you don't need to match the rest of the pattern; just match the run of digits:
irb(main):001:0> result = "#[30:Larry Middleton]"
#=> "#[30:Larry Middleton]"
irb(main):002:0> result[/\d+/]
#=> "30"
However, if you need to match this as part of a larger string that might have digits elsewhere:
irb(main):004:0> result[/#\[(\d+):.*?\]/]
#=> "#[30:Larry Middleton]"
irb(main):005:0> result[/#\[(\d+):.*?\]/,1]
#=> "30"
irb(main):006:0> result[/#\[(\d+):.*?\]/,1].to_i
#=> 30
If you need the name also:
irb(main):002:0> m = result.match /#\[(\d+):(.*?)\]/
#=> #<MatchData "#[30:Larry Middleton]" 1:"30" 2:"Larry Middleton">
irb(main):003:0> m[1]
#=> "30"
irb(main):004:0> m[2]
#=> "Larry Middleton"
In Ruby 1.9 and later you can even name the matches instead of using the capture number:
irb(main):005:0> m = result.match /#\[(?<id>\d+):(?<name>.*?)\]/
#=> #<MatchData "#[30:Larry Middleton]" id:"30" name:"Larry Middleton">
irb(main):006:0> m[:id]
#=> "30"
irb(main):007:0> m[:name]
#=> "Larry Middleton"
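Named groups have one more Ruby-specific shortcut: when a regex literal with named groups is on the left of =~, Ruby assigns each group to a local variable of the same name. This only works with a literal regex (without interpolation) on the left-hand side, not with a regex stored in a variable.

```ruby
result = "#[30:Larry Middleton]"

if /#\[(?<id>\d+):(?<name>.*?)\]/ =~ result
  puts id.to_i   # 30
  puts name      # Larry Middleton
end
```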
And if you need to find many of these:
irb(main):008:0> result = "First there was #[30:Larry Middleton], age 17, and then there was #[42:Phrogz], age unknown."
irb(main):015:0> result.scan /#\[(\d+):.*?\]/
#=> [["30"], ["42"]]
irb(main):016:0> result.scan(/#\[(\d+):.*?\]/).flatten.map(&:to_i)
#=> [30, 42]
irb(main):017:0> result.scan(/#\[(\d+):(.*?)\]/).each{ |id,name| puts "#{name} is #{id}" }
Larry Middleton is 30
Phrogz is 42
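When each match has two groups, scan returns [id, name] pairs, which convert directly into a lookup hash. A sketch keyed by integer id:

```ruby
result = "First there was #[30:Larry Middleton], age 17, and then there was #[42:Phrogz], age unknown."

people = result.scan(/#\[(\d+):(.*?)\]/).map { |id, name| [id.to_i, name] }.to_h
p people.keys   # [30, 42]
p people[42]    # "Phrogz"
```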
Try this:
user_id = result.match(/#\[(\d+):.*]/)[1]
You forgot to escape ']':
user_id = result.match(/#\[(\d+):.*\]/)[1]
I don't know Ruby, but if it supports lookbehinds and lookaheads:
user_id = result.match(/(?<=#\[)\d+(?=:)/)[0]
If not, you should have some way of retrieving a subpattern from the match; again, I wouldn't know how.
I prefer String#scan for most of my regex needs; here's what I would do:
result.scan(/#\[(\d+):/).flatten.map(&:to_i).first
For your second question about getting the name:
result.scan(/(\d+):([A-Za-z ]+)\]$/).flatten[1]
scan always returns an array of substring matches:
"#[123:foo bars]".scan(/\d+/) #=> ['123']
If you include capture groups (parens) in the pattern, each match is returned as a subarray holding what those groups captured:
"#[123:foo bars]".scan(/(\d+):(\w+)/) #=> [['123', 'foo']]
That's why we have to flatten results involving groups:
[['123', 'foo']].flatten #=> ['123', 'foo']
It also always returns strings; that's why the conversion to integer is needed in the first example:
['123'].map(&:to_i) #=> [123]
Hope this is helpful.
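One caveat with flatten: when there are several matches, it also merges them into one flat list and loses the pairing between groups. Picking a group per match keeps them apart, as this sketch shows:

```ruby
s = "#[1:foo] #[2:bar]"

pairs = s.scan(/(\d+):(\w+)/)
p pairs           # [["1", "foo"], ["2", "bar"]]
p pairs.flatten   # ["1", "foo", "2", "bar"]

ids = pairs.map(&:first).map(&:to_i)
p ids             # [1, 2]
```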

Resources