Extract the last word in sentence/string? - ruby

I have an array of strings, of different lengths and contents.
Now I'm looking for an easy way to extract the last word from each string, without knowing how long that word is or how long the string is.
Something like:
array.each { |string| puts string.fetch(" ", last) }

This should work just fine
"my random sentence".split.last # => "sentence"
To exclude punctuation, delete it:
"my random sentence..,.!?".split.last.delete('.!?,') #=> "sentence"
To get the "last words" as an array from an array you collect:
["random sentence...", "lorem ipsum!!!"].collect { |s| s.split.last.delete('.!?,') } # => ["sentence", "ipsum"]
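One thing the one-liners above do not cover is an empty or whitespace-only string, where split.last returns nil and the chained delete raises NoMethodError. A minimal sketch of a nil-safe variant follows; the helper name last_word is hypothetical and not part of the original answer.

```ruby
# Hypothetical helper: returns the last word without the listed punctuation,
# or nil when the string contains no words at all.
def last_word(sentence)
  word = sentence.split.last   # nil for "" or whitespace-only input
  word&.delete('.!?,')         # safe navigation skips delete when word is nil
end

sentences = ["random sentence...", "lorem ipsum!!!", "   ", ""]
last_words = sentences.map { |s| last_word(s) }
p last_words # => ["sentence", "ipsum", nil, nil]
```

Call compact on the result if the nil entries are not wanted.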

array_of_strings = ["test 1", "test 2", "test 3"]
array_of_strings.map{|str| str.split.last} #=> ["1","2","3"]

["one two", "three four five"].collect { |s| s.split.last }
# => ["two", "five"]

"a string of words!".match(/(.*\s)*(.+)\Z/)[2] #=> 'words!' catches from the last whitespace on. That would include the punctuation.
To extract that from an array of strings, use it with collect:
["a string of words", "Something to say?", "Try me!"].collect {|s| s.match(/(.*\s)*(.+)\Z/)[2] } #=> ["words", "say?", "me!"]

The problem with all of these solutions is that they only consider spaces as word separators. Using a regex, you can treat any non-word character as a word separator. Here is what I use:
str = 'Non-space characters, like foo=bar.'
str.split(/\W/).last
# "bar"
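As a rough illustration of what that split produces: adjacent separators (such as ", ") leave an empty string in the middle of the result. Scanning for runs of word characters is one possible alternative that avoids those empty entries. The variable names below are only for illustration.

```ruby
str = 'Non-space characters, like foo=bar.'

# Splitting on every single non-word character leaves an empty string
# between the comma and the space that follows it.
pieces = str.split(/\W/)
p pieces # => ["Non", "space", "characters", "", "like", "foo", "bar"]

# Scanning for runs of word characters returns only the words themselves.
last_by_scan = str.scan(/\w+/).last
p last_by_scan # => "bar"
```

Note that \w also matches digits and underscores, so both approaches treat "foo_1" as a single word.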

This is the simplest way I can think of.
hostname> irb
irb(main):001:0> str = 'This is a string.'
=> "This is a string."
irb(main):002:0> words = str.split(/\s+/).last
=> "string."
irb(main):003:0>

Related

ruby regexp to find a word that does not contain digits

I want to use a regular expression to get an enumerator that yields only the words that contain no digits. What is the best way to do that?
I have tried the following:
regexp= /(?=\w+)(?=^(?:(?!\d+).)*$)/
"this is a number 1234".split(regexp) # ["this is a number 1234"]
where I expected (?=\w+) to ensure that the token is a word, and (?=^(?:(?!\d+).)*$) to ensure that it does not contain any digits.
I expected an output:
["this", "is", "a", "number"]
scan is easier than split for this:
regexp = /\b[[:alpha:]]+\b/
p "this is a number 1234".scan(regexp)
# => ["this", "is", "a", "number"]
Try the following:
p "this is a number 1234".scan(/\D+/).first.split(' ')
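The two answers agree on the sample string but can differ when the digits are not at the end. A small comparison, using a made-up input string, might look like this:

```ruby
# Made-up input where the number sits in the middle of the sentence.
text = "this is 1234 a number"

# Approach 1: pick out every run of letters that stands alone as a word.
alpha_words = text.scan(/\b[[:alpha:]]+\b/)
p alpha_words # => ["this", "is", "a", "number"]

# Approach 2: take the text before the first digit and split it on spaces.
# Everything after the first digit is dropped.
before_first_digit = text.scan(/\D+/).first.split(' ')
p before_first_digit # => ["this", "is"]
```

If the numbers can appear anywhere in the string, the first approach is probably the safer choice.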

Scanning through a hash and return a value if true

Based on my hash, I want to match it if it's in the string:
def conv
str = "I only have one, two or maybe sixty"
hash = {:one => 1, :two => 2, :six => 6, :sixty => 60 }
str.match( Regexp.union( hash.keys.to_s ) )
end
puts conv # => <blank>
The above does not work but this only matches "one":
str.match( Regexp.union( hash[0].to_s ) )
Edited:
Any idea how to match "one", "two" and "sixty" in the string exactly?
If my string has "sixt", it returns "6", and that should not happen based on @Cary's answer.
You need to convert each element of hash.keys to a string, rather than converting the array hash.keys to a string, and you should use String#scan rather than String#match. You may also need to play around with the regex until it returns everything you want and nothing you don't want.
Let's first look at your example:
str = "I only have one, two or maybe sixty"
hash = {:one => 1, :two => 2, :six => 6, :sixty => 60}
We might consider constructing the regex with word breaks (\b) before and after each word we wish to match:
r0 = Regexp.union(hash.keys.map { |k| /\b#{k.to_s}\b/ })
#=> /(?-mix:\bone\b)|(?-mix:\btwo\b)|(?-mix:\bsix\b)|(?-mix:\bsixty\b)/
str.scan(r0)
#=> ["one", "two", "sixty"]
Without the word breaks, scan would return ["one", "two", "six"], as "sixty" in str would match "six". (Word breaks are zero-width. One before a string requires that the string be preceded by a non-word character or be at the beginning of the string. One after a string requires that the string be followed by a non-word character or be at the end of the string.)
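The effect of the word breaks can be checked side by side. This is only a sketch; the names keys, plain and bounded are illustrative, and Regexp.escape is added as a precaution in case a key ever contains regex metacharacters.

```ruby
str  = "I only have one, two or maybe sixty"
keys = %w[one two six sixty]

# Alternation without word breaks: "six" wins inside "sixty".
plain = Regexp.union(keys)
p str.scan(plain)   # => ["one", "two", "six"]

# Each alternative wrapped in \b...\b only matches whole words.
bounded = Regexp.union(keys.map { |k| /\b#{Regexp.escape(k)}\b/ })
p str.scan(bounded) # => ["one", "two", "sixty"]
```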
Depending on your requirements, word breaks may not be sufficient or suitable. Suppose, for example (with hash above):
str = "I only have one, two, twenty-one or maybe sixty"
and we do not wish to match "twenty-one". However,
str.scan(r0)
#=> ["one", "two", "one", "sixty"]
One option would be to use a regex that demands that matches be preceded by whitespace or be at the beginning of the string, and be followed by whitespace or be at the end of the string:
r1 = Regexp.union(hash.keys.map { |k| /(?<=^|\s)#{k.to_s}(?=\s|$)/ })
str.scan(r1)
#=> ["sixty"]
(?<=^|\s) is a positive lookbehind; (?=\s|$) is a positive lookahead.
Well, that avoided the match of "twenty-one" (good), but we no longer matched "one" or "two" (bad) because of the comma following each of those words in the string.
Perhaps the solution here is to first remove punctuation, which allows us to then apply either of the above regexes:
str.tr('.,?!:;-','')
#=> "I only have one two twentyone or maybe sixty"
str.tr('.,?!:;-','').scan(r0)
#=> ["one", "two", "sixty"]
str.tr('.,?!:;-','').scan(r1)
#=> ["one", "two", "sixty"]
You may also want to change / at the end of the regex to /i to make the match insensitive to case.¹
¹ Historical note for readers who want to know why 'a' is called lower case and 'A' is called upper case.
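Since the question asks for the values rather than the matched words, one possible follow-up step is to look each match up in the hash again. The sketch below assumes the keys are lowercase symbols, as in the example, and uses the /i flag mentioned above; the variable names are illustrative.

```ruby
str  = "I only have One, two or maybe sixty"
hash = { :one => 1, :two => 2, :six => 6, :sixty => 60 }

# Case-insensitive, whole-word pattern for every key.
pattern = Regexp.union(hash.keys.map { |k| /\b#{Regexp.escape(k.to_s)}\b/i })

# Normalise each match back to the symbol form used as the hash key.
values = str.scan(pattern).map { |word| hash[word.downcase.to_sym] }
p values # => [1, 2, 60]
```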

How could I split string and keep the whitespaces, as well?

I did the following in Python:
s = 'This is a text'
re.split(r'(\W)', s)
# => ['This', ' ', 'is', ' ', 'a', ' ', 'text']
It worked just great. How do I do the same split in Ruby?
I've tried this, but it eats up my whitespace:
s = "This is a text"
s.split(/[\W]/)
# => ["This", "is", "a", "text"]
From the String#split documentation:
If pattern contains groups, the respective matches will be returned in
the array as well.
This works in Ruby the same as in Python. Square brackets specify character classes, not capture groups, so use parentheses to keep the separators:
"foo bar baz".split(/(\W)/)
# => ["foo", " ", "bar", " ", "baz"]
toro2k's answer is most straightforward. Alternatively,
string.scan(/\w+|\W+/)
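The two approaches behave differently when several separators occur in a row. A short comparison on a made-up string:

```ruby
# Made-up input with a comma followed by two spaces.
text = "foo,  bar"

# split with a capture group keeps each separator as its own element and
# inserts empty strings between adjacent separators.
by_split = text.split(/(\W)/)
p by_split # => ["foo", ",", "", " ", "", " ", "bar"]

# scan groups consecutive separators into a single element.
by_scan = text.scan(/\w+|\W+/)
p by_scan  # => ["foo", ",  ", "bar"]

# Both results can be joined back into the original string.
p by_split.join == text && by_scan.join == text # => true
```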

Remove phrases in Array from string

I need to remove some phrases from a string in Ruby. The phrases are defined inside an array. It could look like this:
remove = ["Test", "Another One", "Something Else"]
Then I want to check and remove these from a given string.
"This is a Test" => "This is a "
"This is Another One" => "This is "
"This is Another Two" => "This is Another Two"
Using Ruby 1.9.3 and Rails 3.2.6.
ary = ["Test", "Another One", "Something Else", "(RegExp i\s escaped)"]
string.gsub(Regexp.union(ary), '')
Regexp.union can be used to compile an array of strings (or regexps) into a single regexp, so only a single search and replace is required.
Regexp.union ['string', /regexp?/i] #=> /string|(?i-mx:regexp?)/
Simplest (but not most efficient):
# Non-mutating
cleaned = str
remove.each{ |s| cleaned = cleaned.gsub(s,'') }
# Mutating
remove.each{ |s| str.gsub!(s,'') }
More efficient (but less clear):
# Non-mutating
cleaned = str.gsub(Regexp.union(remove), '')
# Mutating
str.gsub!(Regexp.union(remove), '')
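Applied to the examples from the question, the Regexp.union approach can be sketched as follows. The inputs array is only for illustration, and the last two lines show that Regexp.union escapes plain strings, so phrases containing regex metacharacters are matched literally.

```ruby
remove  = ["Test", "Another One", "Something Else"]
pattern = Regexp.union(remove)

inputs  = ["This is a Test", "This is Another One", "This is Another Two"]
cleaned = inputs.map { |s| s.gsub(pattern, '') }
p cleaned # => ["This is a ", "This is ", "This is Another Two"]

# Strings passed to Regexp.union are escaped, so "+" is matched literally.
literal = Regexp.union(["1+1"])
p "1+1 = 2".gsub(literal, '') # => " = 2"
```

The trailing space left behind after a removed phrase can be trimmed with strip if it is not wanted.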

putting enumeration with spaces in rails collection

irb(main):001:0> t = %w{this is a test}
=> ["this", "is", "a", "test"]
irb(main):002:0> t.size
=> 4
irb(main):003:0> t = %w{"this is" a test}
=> ["\"this", "is\"", "a", "test"]
irb(main):004:0> t.size
=> 4
In the end I expected t.size to be 3.
As suggested, each space has to be escaped ...which turns out to be a lot of work. What other options are there? I have a list of about 30 words that I need to put in a collection because I am showing them as checkboxes using simple_form
Why not just use a normal array so no one has to visually parse all the escaping to figure out what's going on? This is pretty clear:
t = [
'this is',
'a',
'test'
]
and the people maintaining your code won't hate you for using %w{} when it isn't appropriate or when they mess things up because they didn't see your escaped whitespace.
You need to escape the space with a '\', like t = %w{this\ is a test}, if you don't want that space to be a splitter.
Escape the space using \:
%w{this\ is a test}
You can escape the space %w{this\ is a test} to get ['this is', 'a', 'test'], but in general I wouldn't use %w unless the intention is to split on whitespace.
As others have pointed out use the %w{} construct when spaces are the separator for the words. If you have items that must be quoted and still want to use the construct you can do:
> %w{a test here}.unshift("This is")
=> ["This is", "a", "test", "here"]
require 'csv'
str = '"this is" a test'
p CSV.parse_line(str, col_sep: ' ')
#=> ["this is", "a", "test"]
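To compare the options above: the escaped %w form and the plain array produce the same three elements. If the list really does arrive as a quoted string, Shellwords from the standard library is another possible parser. It is not mentioned in the answers above and is shown here only as an alternative to the CSV approach.

```ruby
require 'shellwords'

escaped = %w{this\ is a test}          # backslash keeps the space inside one word
plain   = ['this is', 'a', 'test']     # ordinary array literal
parsed  = Shellwords.split('"this is" a test') # shell-style quoting rules

p escaped.size # => 3
p escaped == plain && parsed == plain # => true
```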

Resources