How to break a string into two arrays in Ruby - ruby

Is there a way to extract the strings removed by String#split into a separate array?
s = "This is a simple, uncomplicated sentence."
a = s.split( /,|\./ ) #=> [ "This is a simple", "uncomplicated sentence" ]
x = ... => should contain [ ",", "." ]
Note that the actual regex I need to use is much more complex than this example.

Something like this?
a = s.scan( /,|\./ )

When you want both the matched delimiters and the substrings in between as in Stefan's comment, then you should use split with captures.
"This is a simple, uncomplicated sentence."
.split(/([,.])/)
# => ["This is a simple", ",", " uncomplicated sentence", "."]
If you want to separate them into different arrays, then do:
a, x =
"This is a simple, uncomplicated sentence."
.split(/([,.])/).each_slice(2).to_a.transpose
a # => ["This is a simple", " uncomplicated sentence"]
x # => [",", "."]
or
a =
"This is a simple, uncomplicated sentence."
.split(/([,.])/)
a.select.with_index{|_, i| i.even?}
# => ["This is a simple", " uncomplicated sentence"]
a.select.with_index{|_, i| i.odd?}
# => [",", "."]
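One caveat with the each_slice(2).to_a.transpose form: if the string does not end with a delimiter, the last slice has only one element and transpose raises an IndexError. The following is a minimal sketch that sidesteps this, assuming the pattern has no capture groups of its own. The name split_with_delimiters is a hypothetical helper, not part of Ruby:

```ruby
# Hypothetical helper (not a built-in): returns [pieces, delimiters].
# Assumes `pattern` contains no capture groups of its own.
def split_with_delimiters(text, pattern)
  # Wrapping the pattern in a group makes String#split keep the matches.
  tokens = text.split(/(#{pattern})/)
  # Even positions hold the text between delimiters, odd positions the delimiters.
  tokens.partition.with_index { |_, i| i.even? }
end

pieces, delimiters =
  split_with_delimiters("This is a simple, uncomplicated sentence.", /,|\./)
pieces     # => ["This is a simple", " uncomplicated sentence"]
delimiters # => [",", "."]
```

Because partition does not require the two halves to be the same length, an input such as "a,b" yields ["a", "b"] and [","] instead of raising.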

Try this:
a = s.split(/,/)[1..-1]

Related

Ruby: Matching a delimiter with Regex

I'm trying to solve this with a regex pattern, and even though my test passes with this solution, I would like split to only have ["1", "2"] inside the array. Is there a better way of doing this?
irb testing:
s = "//;\n1;2" # when given a delimiter of ';'
s2 = "1,2,3" # should read between commas
s3 = "//+\n2+2" # should read between delimiter of '+'
s.split(/[,\n]|[^0-9]/)
=> ["", "", "", "", "1", "2"]
Production:
module StringCalculator
  def self.add(input)
    solution = input.scan(/\d+/).map(&:to_i).reduce(0, :+)
    input.end_with?("\n") ? nil : solution
  end
end
Test:
context 'when given a newline delimiter' do
  it 'should read between numbers' do
    expect(StringCalculator.add("1\n2,3")).to eq(6)
  end
  it 'should not end in a newline' do
    expect(StringCalculator.add("1,\n")).to be_nil
  end
end
context 'when given different delimiter' do
  it 'should support that delimiter' do
    expect(StringCalculator.add("//;\n1;2")).to eq(3)
  end
end
Very simple using String#scan :
s = "//;\n1;2"
s.scan(/\d/) # => ["1", "2"]
/\d/ - A digit character ([0-9])
Note:
If the string contains multi-digit numbers, as below, use /\d+/ instead.
s = "//;\n11;2"
s.scan(/\d+/) # => ["11", "2"]
You're getting data that looks like this string: //1\n212
If you're getting the data as a file, then treat it as two separate lines. If it's a string, then, again, treat it as two separate lines. In either case it'd look like
//1
212
when output.
If it's a string:
input = "//1\n212".split("\n")
delimiter = input.first[2] # => "1"
values = input.last.split(delimiter) # => ["2", "2"]
If it's a file:
line = File.foreach('foo.txt')
delimiter = line.next[2] # => "1"
values = line.next.chomp.split(delimiter) # => ["2", "2"]
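The header-then-values approach above could be wrapped up roughly as follows. This is only a sketch: parse_numbers is a hypothetical name, and it assumes an optional "//<delimiter>\n" header with "," as the default delimiter and newlines always accepted, which is what the question's examples suggest:

```ruby
# Hypothetical helper: pull the numbers out of a string-calculator input.
# Assumes an optional "//<delimiter>\n" header and "," as the default delimiter.
def parse_numbers(input)
  if input.start_with?("//")
    header, body = input.split("\n", 2)
    delimiter = header[2..-1]
  else
    body = input
    delimiter = ","
  end
  # Regexp.escape keeps delimiters such as "+" from being read as regex syntax.
  body.to_s.split(/#{Regexp.escape(delimiter)}|\n/).map(&:to_i)
end

parse_numbers("//;\n1;2") # => [1, 2]
parse_numbers("//+\n2+2") # => [2, 2]
parse_numbers("1\n2,3")   # => [1, 2, 3]
```

Unlike scanning for digits, this keeps working when the delimiter is itself a digit, as in "//1\n212".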

How could I split string and keep the whitespaces, as well?

I did the following in Python:
s = 'This is a text'
re.split('(\W)', s)
# => ['This', ' ', 'is', ' ', 'a', ' ', 'text']
It worked just great. How do I do the same split in Ruby?
I've tried this, but it eats up my whitespace:
s = "This is a text"
s.split(/[\W]/)
# => ["This", "is", "a", "text"]
From the String#split documentation:
If pattern contains groups, the respective matches will be returned in
the array as well.
This works the same in Ruby as in Python. Square brackets specify character classes, not capture groups; use parentheses instead:
"foo bar baz".split(/(\W)/)
# => ["foo", " ", "bar", " ", "baz"]
toro2k's answer is most straightforward. Alternatively,
string.scan(/\w+|\W+/)
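The two answers differ once the text contains runs of more than one separator character. A small comparison, using a made-up input with a double space, shows the behaviour of each form:

```ruby
text = "This  is a text" # note the double space after "This"

no_group = text.split(/\W/)        # separators are dropped
one_char = text.split(/(\W)/)      # each separator character kept on its own
runs     = text.split(/(\W+)/)     # runs of separators kept together
scanned  = text.scan(/\w+|\W+/)    # same result as the previous line here

no_group # => ["This", "", "is", "a", "text"]
one_char # => ["This", " ", "", " ", "is", " ", "a", " ", "text"]
runs     # => ["This", "  ", "is", " ", "a", " ", "text"]
scanned  # => ["This", "  ", "is", " ", "a", " ", "text"]
```

If the empty strings produced by /(\W)/ are unwanted, the /(\W+)/ or scan forms are probably the better fit.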

Remove phrases in Array from string

I need to remove some phrases from a string in Ruby. The phrases are defined inside an array. It could look like this:
remove = ["Test", "Another One", "Something Else"]
Then I want to check and remove these from a given string.
"This is a Test" => "This is a "
"This is Another One" => "This is "
"This is Another Two" => "This is Another Two"
Using Ruby 1.9.3 and Rails 3.2.6.
ary = ["Test", "Another One", "Something Else", "(RegExp i\s escaped)"]
string.gsub(Regexp.union(ary), '')
Regexp.union can be used to compile an array of strings (or regexps) into a single regexp, so only one search and replace is needed.
Regexp.union ['string', /regexp?/i] #=> /string|(?i-mx:regexp?)/
Simplest (but not most efficient):
# Non-mutating
cleaned = str
remove.each{ |s| cleaned = cleaned.gsub(s,'') }
# Mutating
remove.each{ |s| str.gsub!(s,'') }
More efficient (but less clear):
# Non-mutating
cleaned = str.gsub(Regexp.union(remove), '')
# Mutating
str.gsub!(Regexp.union(remove), '')
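Applied to the examples from the question, the Regexp.union version can be sketched like this. The method name strip_phrases is hypothetical; the phrases are the ones given above:

```ruby
REMOVE = ["Test", "Another One", "Something Else"].freeze

# Hypothetical helper: returns a copy of `text` with every phrase removed.
# Regexp.union escapes plain strings, so characters like "." are matched literally.
def strip_phrases(text, phrases)
  text.gsub(Regexp.union(phrases), "")
end

strip_phrases("This is a Test", REMOVE)      # => "This is a "
strip_phrases("This is Another One", REMOVE) # => "This is "
strip_phrases("This is Another Two", REMOVE) # => "This is Another Two"
```

Note that the trailing space is left in place, as in the question's expected output; chain .strip if that is not wanted.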

Extract the last word in sentence/string?

I have an array of strings, of different lengths and contents.
Now I'm looking for an easy way to extract the last word from each string, without knowing how long that word or the string is.
Something like:
array.each { |string| puts string.fetch(" ", last) }
This should work just fine
"my random sentence".split.last # => "sentence"
To exclude punctuation, delete it
"my random sentence..,.!?".split.last.delete('.!?,') #=> "sentence"
To get the "last words" as an array from an array you collect
["random sentence...", "lorem ipsum!!!"].collect { |s| s.split.last.delete('.!?,') } # => ["sentence", "ipsum"]
array_of_strings = ["test 1", "test 2", "test 3"]
array_of_strings.map{|str| str.split.last} #=> ["1","2","3"]
["one two", "three four five"].collect { |s| s.split.last }
=> ["two", "five"]
"a string of words!".match(/(.*\s)*(.+)\Z/)[2] #=> "words!"
This catches everything from the last whitespace on, so it includes the punctuation.
To extract that from an array of strings, use it with collect:
["a string of words", "Something to say?", "Try me!"].collect {|s| s.match(/(.*\s)*(.+)\Z/)[2] } #=> ["words", "say?", "me!"]
The problem with all of these solutions is that they only consider spaces as word separators. With a regex you can treat any non-word character as a separator. Here is what I use:
str = 'Non-space characters, like foo=bar.'
str.split(/\W/).last
# => "bar"
This is the simplest way I can think of.
hostname> irb
irb(main):001:0> str = 'This is a string.'
=> "This is a string."
irb(main):002:0> words = str.split(/\s+/).last
=> "string."
irb(main):003:0>
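The answers above trade off between keeping and dropping trailing punctuation. One way to drop it without listing the punctuation characters is to scan for word characters instead of splitting. This is a sketch only; last_word is a hypothetical helper name:

```ruby
# Hypothetical helper: the last run of word characters in `text`,
# or nil when the string contains none.
def last_word(text)
  text.scan(/\w+/).last
end

last_word("my random sentence..,.!?")            # => "sentence"
last_word("Non-space characters, like foo=bar.") # => "bar"
last_word("")                                    # => nil

["random sentence...", "lorem ipsum!!!"].map { |s| last_word(s) }
# => ["sentence", "ipsum"]
```

Since the result can be nil, callers working on arbitrary input may want to follow the map with compact.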

putting enumeration with spaces in rails collection

irb(main):001:0> t = %w{this is a test}
=> ["this", "is", "a", "test"]
irb(main):002:0> t.size
=> 4
irb(main):003:0> t = %w{"this is" a test}
=> ["\"this", "is\"", "a", "test"]
irb(main):004:0> t.size
=> 4
In the end I expected t.size to be 3.
As suggested, each space has to be escaped ...which turns out to be a lot of work. What other options are there? I have a list of about 30 words that I need to put in a collection because I am showing them as checkboxes using simple_form
Why not just use a normal array so no one has to visually parse all the escaping to figure out what's going on? This is pretty clear:
t = [
'this is',
'a',
'test'
]
and the people maintaining your code won't hate you for using %w{} when it isn't appropriate or when they mess things up because they didn't see your escaped whitespace.
You need to escape the space with a backslash, like t = %w{this\ is a test}, if you don't want that space to act as a separator.
Escape the space using \:
%w{this\ is a test}
You can escape the space %w{this\ is a test} to get ['this is', 'a', 'test'], but in general I wouldn't use %w unless the intention is to split on whitespace.
As others have pointed out use the %w{} construct when spaces are the separator for the words. If you have items that must be quoted and still want to use the construct you can do:
> %w{a test here}.unshift("This is")
=> ["This is", "a", "test", "here"]
require 'csv'
str = '"this is" a test'
p CSV.parse_line(str, col_sep: ' ')
#=> ["this is", "a", "test"]
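To compare the options side by side: the escaped %w form and the plain array give the same three-element result. As a further possibility not mentioned in the answers above, the standard library's Shellwords module splits a quoted string the way a shell would, which may be convenient if the list already exists as text. A short sketch:

```ruby
require 'shellwords'

escaped = %w{this\ is a test}                     # backslash keeps the space
plain   = ['this is', 'a', 'test']                # ordinary array literal
quoted  = Shellwords.split('"this is" a test')    # shell-style quoting

escaped      # => ["this is", "a", "test"]
quoted       # => ["this is", "a", "test"]
escaped.size # => 3
```

For a fixed list of about 30 labels, the plain array is likely still the most readable choice.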
