How could I split string and keep the whitespaces, as well? - ruby

I did the following in Python:
import re
s = 'This is a text'
re.split(r'(\W)', s)
# => ['This', ' ', 'is', ' ', 'a', ' ', 'text']
It worked just great. How do I do the same split in Ruby?
I've tried this, but it eats up my whitespace:
s = "This is a text"
s.split(/[\W]/)
# => ["This", "is", "a", "text"]

From the String#split documentation:
If pattern contains groups, the respective matches will be returned in
the array as well.
This works in Ruby the same as in Python. Square brackets specify character classes, not match groups, so use parentheses instead:
"foo bar baz".split(/(\W)/)
# => ["foo", " ", "bar", " ", "baz"]

toro2k's answer is the most straightforward. Alternatively, use String#scan:
string.scan(/\w+|\W+/)
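The two approaches above agree on single spaces but can differ when delimiters repeat. A minimal sketch of that difference follows; the helper names are hypothetical and exist only for this illustration.

```ruby
# Hypothetical helper names, used only for this illustration.

# split with a capture group: every single delimiter character
# is returned as its own array element.
def split_keeping_delimiters(text)
  text.split(/(\W)/)
end

# scan: consecutive non-word characters are grouped into one run.
def word_and_gap_runs(text)
  text.scan(/\w+|\W+/)
end

p split_keeping_delimiters("This is a text")
# => ["This", " ", "is", " ", "a", " ", "text"]
p word_and_gap_runs("This is a text")
# => ["This", " ", "is", " ", "a", " ", "text"]

# With two spaces in a row the results differ:
p split_keeping_delimiters("a  b") # => ["a", " ", "", " ", "b"]
p word_and_gap_runs("a  b")        # => ["a", "  ", "b"]
```

Which one fits depends on whether each delimiter character or each run of delimiters should be preserved.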

Related

How to split string into an array into two separate strings

I have the following Ruby on Rails params:
<ActionController::Parameters {"type"=>["abc, def"], "format"=>:json, "controller"=>"order", "action"=>"index"} permitted: false>
I want to check if there's a , in the string, then separate it into two strings like below and update type in params.
<ActionController::Parameters {"type"=>["abc", "def"], "format"=>:json, "controller"=>"order", "action"=>"index"} permitted: false>
I tried to do like below:
params[:type][0].split(",") #=> ["abc", " def"]
but I am not sure why there's a space before the second string.
How can I achieve that?
There's a space after the comma in your string, and split(",") only removes the comma, so the space stays at the start of the second element.
You have a few options: remove the whitespace first and then split; split on ', ' so that both the comma and the following space are consumed; or split on the comma and strip each resulting element:
string = 'abc, def'
p string.split ',' # ["abc", " def"]
p string.split ', ' # ["abc", "def"]
p string.delete(' ').split ',' # ["abc", "def"]
p string.split(',').map &:strip # ["abc", "def"]
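To apply this back to the params from the question, something like the following might work. It is only a sketch: a plain Hash stands in for ActionController::Parameters, and split_comma_values is a hypothetical helper name.

```ruby
# Hypothetical helper: splits every entry on commas and trims
# the whitespace around each resulting piece.
def split_comma_values(values)
  values.flat_map { |value| value.split(',').map(&:strip) }
end

# Assumption: a plain Hash stands in for ActionController::Parameters.
params = { type: ["abc, def"], format: :json }
params[:type] = split_comma_values(params[:type])

p params[:type] # => ["abc", "def"]
```

Entries without a comma pass through unchanged, so the helper can be applied whether or not the string contains one.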

How do I split a string on capitals unless preceded by a '+'

I have a CamelCased string, which I would like to split into individual words at the capitals, unless the capital is preceded by a '+':
Splitting on the caps is fairly simple in Ruby: s.split(/(?=[A-Z])/)
But I can't figure out how to add the "except after '+'" part.
For example:
s = "FooBashFizz+BuzzXBar"
p s.split(/(?=[A-Z])/)
=> ["Foo", "Bash", "Fizz+", "Buzz", "X", "Bar"]
desired:
=> ["Foo", "Bash", "Fizz+Buzz", "X", "Bar"]
Add a negative lookbehind at the start.
irb(main):001:0> s = "FooBashFizz+BuzzXBar"
=> "FooBashFizz+BuzzXBar"
irb(main):002:0> s.split(/(?<!\+)(?=[A-Z])/)
=> ["Foo", "Bash", "Fizz+Buzz", "X", "Bar"]
Explanation:
(?<!\+) asserts that the preceding character is not a + symbol.
(?=[A-Z]) asserts that the following character is an uppercase letter.
Alternative using String#scan. This also works in Ruby 1.8.
s = "FooBashFizz+BuzzXBar"
s.scan(/[A-Z][a-z]*(?:\+[A-Z][a-z]*)*/)
# => ["Foo", "Bash", "Fizz+Buzz", "X", "Bar"]
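The split and scan versions give the same result for the example string, but they are not interchangeable for every input. The sketch below wraps both in hypothetical helper methods and shows one case where they differ: text that starts with a lowercase letter.

```ruby
# Hypothetical constant and method names, used only for this comparison.
LOOKBEHIND_SPLIT = /(?<!\+)(?=[A-Z])/
WORD_SCAN = /[A-Z][a-z]*(?:\+[A-Z][a-z]*)*/

# Splits before each capital that is not preceded by '+'.
def split_on_capitals(text)
  text.split(LOOKBEHIND_SPLIT)
end

# Collects capitalized words, joining those linked by '+'.
def scan_capital_words(text)
  text.scan(WORD_SCAN)
end

p split_on_capitals("FooBashFizz+BuzzXBar")
# => ["Foo", "Bash", "Fizz+Buzz", "X", "Bar"]
p scan_capital_words("FooBashFizz+BuzzXBar")
# => ["Foo", "Bash", "Fizz+Buzz", "X", "Bar"]

# split keeps a leading lowercase run, scan silently drops it:
p split_on_capitals("fooBar")  # => ["foo", "Bar"]
p scan_capital_words("fooBar") # => ["Bar"]
```

The split version never discards characters, while the scan version only returns what its pattern matches.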

How to break a string into two arrays in Ruby

Is there a way to extract the strings removed by String#split into a separate array?
s = "This is a simple, uncomplicated sentence."
a = s.split( /,|\./ ) #=> [ "This is a simple", "uncomplicated sentence" ]
x = ... => should contain [ ",", "." ]
Note that the actual regex I need to use is much more complex than this example.
Something like this?
a = s.scan( /,|\./ )
When you want both the matched delimiters and the substrings in between as in Stefan's comment, then you should use split with captures.
"This is a simple, uncomplicated sentence."
.split(/([,.])/)
# => ["This is a simple", ",", " uncomplicated sentence", "."]
If you want to separate them into different arrays, then do:
a, x =
"This is a simple, uncomplicated sentence."
.split(/([,.])/).each_slice(2).to_a.transpose
a # => ["This is a simple", " uncomplicated sentence"]
x # => [",", "."]
or
a =
"This is a simple, uncomplicated sentence."
.split(/([,.])/)
a.select.with_index{|_, i| i.even?}
# => ["This is a simple", " uncomplicated sentence"]
a.select.with_index{|_, i| i.odd?}
# => [",", "."]
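One caveat with the each_slice(2).to_a.transpose variant is that the slices must all have the same length, which is not the case when the string does not end with a delimiter. A possible alternative, sketched here with a hypothetical helper name, uses partition.with_index. It assumes the pattern contains exactly one capture group, so pieces and delimiters alternate.

```ruby
# Hypothetical helper: returns [pieces, delimiters].
# Assumes `pattern` has exactly one capture group, so that split
# yields pieces at even indices and delimiters at odd indices.
def split_with_delimiters(text, pattern)
  text.split(pattern).partition.with_index { |_, index| index.even? }
end

pieces, delimiters =
  split_with_delimiters("This is a simple, uncomplicated sentence.", /([,.])/)
p pieces     # => ["This is a simple", " uncomplicated sentence"]
p delimiters # => [",", "."]

# Also works when the text does not end with a delimiter:
pieces, delimiters =
  split_with_delimiters("This is a simple, uncomplicated sentence", /([,.])/)
p pieces     # => ["This is a simple", " uncomplicated sentence"]
p delimiters # => [","]
```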
try this:
a = s.split(/,/)[1..-1]

Extract the last word in sentence/string?

I have an array of strings, of different lengths and contents.
Now I'm looking for an easy way to extract the last word from each string, without knowing how long that word or the string is.
Something like this (pseudocode):
array.each { |string| puts string.fetch(" ", last) }
This should work just fine
"my random sentence".split.last # => "sentence"
To exclude punctuation, delete it:
"my random sentence..,.!?".split.last.delete('.!?,') #=> "sentence"
To get the last word of each string in an array, use collect:
["random sentence...", "lorem ipsum!!!"].collect { |s| s.split.last.delete('.!?,') } # => ["sentence", "ipsum"]
array_of_strings = ["test 1", "test 2", "test 3"]
array_of_strings.map{|str| str.split.last} #=> ["1","2","3"]
["one two", "three four five"].collect { |s| s.split.last }
=> ["two", "five"]
"a string of words!".match(/(.*\s)*(.+)\Z/)[2] #=> 'words!'
This captures everything after the last whitespace, so it includes the punctuation.
To extract that from an array of strings, use it with collect:
["a string of words", "Something to say?", "Try me!"].collect {|s| s.match(/(.*\s)*(.+)\Z/)[2] } #=> ["words", "say?", "me!"]
The problem with all of these solutions is that they only consider spaces as word separators. With a regex you can treat any non-word character as a separator. Here is what I use:
str = 'Non-space characters, like foo=bar.'
str.split(/\W/).last
# "bar"
This is the simplest way I can think of.
hostname> irb
irb(main):001:0> str = 'This is a string.'
=> "This is a string."
irb(main):002:0> words = str.split(/\s+/).last
=> "string."
irb(main):003:0>
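As a variation on the answers above, scanning for runs of word characters and taking the last one avoids having to list the punctuation to delete. This is a minimal sketch; last_word is a hypothetical helper name, and it returns nil when a string contains no word characters.

```ruby
# Hypothetical helper: returns the last run of word characters,
# or nil when the string contains none.
def last_word(str)
  str.scan(/\w+/).last
end

sentences = ["random sentence...", "lorem ipsum!!!", "foo=bar.", "..."]
p sentences.map { |s| last_word(s) }
# => ["sentence", "ipsum", "bar", nil]
```

Note that \w does not match apostrophes or hyphens, so a word such as "don't" would be reported as "t".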

putting enumeration with spaces in rails collection

irb(main):001:0> t = %w{this is a test}
=> ["this", "is", "a", "test"]
irb(main):002:0> t.size
=> 4
irb(main):003:0> t = %w{"this is" a test}
=> ["\"this", "is\"", "a", "test"]
irb(main):004:0> t.size
=> 4
In the end I expected t.size to be 3.
As suggested, each space has to be escaped ...which turns out to be a lot of work. What other options are there? I have a list of about 30 words that I need to put in a collection because I am showing them as checkboxes using simple_form
Why not just use a normal array so no one has to visually parse all the escaping to figure out what's going on? This is pretty clear:
t = [
'this is',
'a',
'test'
]
and the people maintaining your code won't hate you for using %w{} when it isn't appropriate or when they mess things up because they didn't see your escaped whitespace.
You need to escape the space with a '\', like t = %w{this\ is a test}, if you don't want that space to act as a separator.
Escape the space using \:
%w{this\ is a test}
You can escape the space, %w{this\ is a test}, to get ['this is', 'a', 'test'], but in general I wouldn't use %w unless the intention is to split on whitespace.
As others have pointed out use the %w{} construct when spaces are the separator for the words. If you have items that must be quoted and still want to use the construct you can do:
> %w{a test here}.unshift("This is")
=> ["This is", "a", "test", "here"]
require 'csv'
str = '"this is" a test'
p CSV.parse_line(str, col_sep: ' ') # keyword form; Ruby 3 rejects a positional options hash here
#=> ["this is", "a", "test"]
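If the list really does arrive as one string with quoted phrases, the standard library's Shellwords module is another option alongside CSV. The sketch below compares it with the escaped %w form and a plain array literal; split_quoted is a hypothetical wrapper name.

```ruby
require 'shellwords'

# Hypothetical wrapper: splits on whitespace while honoring
# shell-style quoting, so "this is" stays together.
def split_quoted(str)
  Shellwords.split(str)
end

escaped = %w[this\ is a test]        # backslash-escaped space inside %w
literal = ['this is', 'a', 'test']   # plain array literal
quoted  = split_quoted('"this is" a test')

p escaped # => ["this is", "a", "test"]
p quoted  # => ["this is", "a", "test"]
p escaped == literal && quoted == literal # => true
```

For a fixed list of about 30 entries, the plain array literal is probably still the easiest to read and maintain.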
