putting enumeration with spaces in rails collection - ruby

irb(main):001:0> t = %w{this is a test}
=> ["this", "is", "a", "test"]
irb(main):002:0> t.size
=> 4
irb(main):003:0> t = %w{"this is" a test}
=> ["\"this", "is\"", "a", "test"]
irb(main):004:0> t.size
=> 4
In the end I expected t.size to be 3.
As suggested, each space has to be escaped ...which turns out to be a lot of work. What other options are there? I have a list of about 30 words that I need to put in a collection because I am showing them as checkboxes using simple_form

Why not just use a normal array so no one has to visually parse all the escaping to figure out what's going on? This is pretty clear:
t = [
'this is',
'a',
'test'
]
and the people maintaining your code won't hate you for using %w{} when it isn't appropriate or when they mess things up because they didn't see your escaped whitespace.

You need to escape the space with a '\', like t = %w{this\ is a test} if you dont want that space to be a splitter.

Escape the space using \:
%w{this\ is a test}

You can escape the space %w{this\ is a test} to get ['this is', 'a', 'test'], but in general I wouldn't use %w unless then intention is to split on whitespace.

As others have pointed out use the %w{} construct when spaces are the separator for the words. If you have items that must be quoted and still want to use the construct you can do:
> %w{a test here}.unshift("This is")
=> ["This is", "a", "test", "here"]

require 'csv'
str = '"this is" a test'
p CSV.parse_line(str,{:col_sep=>' '})
#=> ["this is", "a", "test"]

Related

gsub method and regex (case sensitive and case insensitive)

In ruby, I want to substitute some letters in a string, is there a better way of doing this?
string = "my random string"
string.gsub(/a/, "#").gsub(/i/, "1").gsub(/o/, "0")`
And if I want to substitute both "a" and "A" with a "#", I know I can do .gsub(/a/i, "#"), but what if I want to substitute every "a" with an "e" and every "A" with an "E"? Is there a way of abstracting it instead of specifying both like .gsub(/a/, "e").gsub(/A/, "E")?
You can use a Hash. eg:
h = {'a' => '#', 'b' => 'B', 'A' => 'E'}
"aAbBcC".gsub(/[abA]/, h)
# => "#EBBcC"
Not really an answer to your question, but more an other way to proceed: use the translation:
'aaAA'.tr('aA', 'eE')
# => 'eeEE'
For the same transformation, you can also use the ascii table:
'aaAA'.gsub(/a/i) {|c| (c.ord + 4).chr}
# => 'eeEE'
other example (the last character is used by default):
'aAaabbXXX'.tr('baA', 'B#')
# => '####BBXXX'
Here are two variants of #Santosh's answer:
str ="aAbBcC"
h = {'a' => '#', 'b' => 'B', 'A' => 'E'}
#1
str.gsub(/[#{h.keys.join}]/, h) #=> "#EBBcC"
#2
h.default_proc = ->(_,k) { k }
str.gsub(/./, h) #=> "#EBBcC"
These offer better maintainability should h could change in future
You can also pass gsub a block
"my random string".gsub(/[aoi]/) do |match|
case match; when "a"; "#"; when "o"; "0"; when "i"; "I" end
end
# => "my r#nd0m strIng"
The use of a hash is of course much more elegant in this case, but if you have complex rules of substitution it can come in handy to dedicate a class to it.
"my random string".gsub(/[aoi]/) {|match| Substitute.new(match).result}
# => "my raws0m strAINng"

How do I split a string on capitals unless preceded by a '+'

I have a CamelCased string, which I would like to split into individual words at the capitals, unless the capital is preceded by a '+':
Splitting on the caps is fairly simple in Ruby: s.split(/(?=[A-Z])/)
But I can't figure out how to add the "except after '+'" part.
For example:
s = "FooBashFizz+BuzzXBar"
p s.split(/(?=[A-Z])/)
=> ["Foo", "Bash", "Fizz+", "Buzz", "X", "Bar"]
desired:
=> ["Foo", "Bash", "Fizz+Buzz", "X", "Bar"]
Add a negative lookbehind at the start.
irb(main):001:0> s = "FooBashFizz+BuzzXBar"
=> "FooBashFizz+BuzzXBar"
irb(main):002:0> s.split(/(?<!\+)(?=[A-Z])/)
=> ["Foo", "Bash", "Fizz+Buzz", "X", "Bar"]
Explanation:
(?<!\+) Asserts that the preceding character would be any but not a + symbol.
(?=[A-Z]) Asserts that the following character must be an uppercase letter.
Alternative using String#scan. This also works in Ruby 1.8.
s = "FooBashFizz+BuzzXBar"
s.scan(/[A-Z][a-z]*(?:\+[A-Z][a-z]*)*/)
# => ["Foo", "Bash", "Fizz+Buzz", "X", "Bar"]

How could I split string and keep the whitespaces, as well?

I did the following in Python:
s = 'This is a text'
re.split('(\W)', s)
# => ['This', ' ', 'is', ' ', 'a', 'text']
It worked just great. How do I do the same split in Ruby?
I've tried this, but it eats up my whitespace.:
s = "This is a text"
s.split(/[\W]/)
# => ["This", "is", "a", "text"]
From the String#split documentation:
If pattern contains groups, the respective matches will be returned in
the array as well.
This works in Ruby the same as in Python, square brackets are for specify character classes, not match groups:
"foo bar baz".split(/(\W)/)
# => ["foo", " ", "bar", " ", "baz"]
toro2k's answer is most straightforward. Alternatively,
string.scan(/\w+|\W+/)

Splitting string using whitespace while keeping \n as a separate element

I am using Ruby and looking for a way to read in a sample string with the following text:
"This is a test
file, dog cat bark
meow woof woof"
and split elements into an array of characters based on whitespace, but to keep the \n value in the array as a separate element.
I know I can use the string.split(/\n/) to get
["this is a test", "file, dog cat bark", "meow woof woof"]
Also string.split(/ /) yields
["this", "is", "a", "test\nfile,", "dog", "cat", "bark\nmeow", "woof", "woof"]
But I am looking for a way to get:
["this", "is", "a", "test", "\n", "file,", "dog", "cat", "bark", "\n", "meow", "woof", "woof"]
Is there any way to accomplish this using Ruby?
It's a strange thing to do but:
string.split /(?=\n)|(?<=\n)| /
#=> ["This", "is", "a", "test", "\n", "file,", "dog", "cat", "bark", "\n", "meow", "woof", "woof"]
You could turn your logic around a bit and look for what you want instead of looking for the delimiters between what you want. A simple scan like this should do the trick:
>> s.scan(/\S+|\n+/)
=> ["This", "is", "a", "test", "\n", "file,", "dog", "cat", "bark", "\n", "meow", "woof", "woof"]
That assumes that repeated \n should be a single token of course.
This isn't particularly elegant, but you could try replacing "\n" with " \n " (note the spaces surrounding \n), and then split the resulting string on / /.
This is an odd request, and perhaps, if you told us WHY you want to do that, we could help you do it in a more straightforward and conventional fashion.
It looks like you're trying to split the words and still know where your original line-ends were. Having the lines split into individual words is useful for many things, but keeping the line-ends... not so much in my experience.
When I'm dealing with text and need to break the lines up for processing, I do it this way:
text = "This is a test
file, dog cat bark
meow woof woof"
data = text.lines.map(&:split)
At this point, data looks like:
[["This", "is", "a", "test"],
["file,", "dog", "cat", "bark"],
["meow", "woof", "woof"]]
I know that each sub-array was a separate line, so if I need to process by lines I can do it using an iterator like each or map, or to reconstruct the original text I can join(" ") the sub-array elements, then join("\n") the resulting lines:
data.map{ |a| a.join(' ') }.join("\n")
=> "This is a test\nfile, dog cat bark\nmeow woof woof"

Extract the last word in sentence/string?

I have an array of strings, of different lengths and contents.
Now i'm looking for an easy way to extract the last word from each string, without knowing how long that word is or how long the string is.
something like;
array.each{|string| puts string.fetch(" ", last)
This should work just fine
"my random sentence".split.last # => "sentence"
to exclude punctuation, delete it
"my rando­m sente­nce..,.!?".­split.last­.delete('.­!?,') #=> "sentence"
To get the "last words" as an array from an array you collect
["random sentence...",­ "lorem ipsum!!!"­].collect { |s| s.spl­it.last.delete('.­!?,') } # => ["sentence", "ipsum"]
array_of_strings = ["test 1", "test 2", "test 3"]
array_of_strings.map{|str| str.split.last} #=> ["1","2","3"]
["one two",­ "thre­e four five"­].collect { |s| s.spl­it.last }
=> ["two", "five"]
"a string of words!".match(/(.*\s)*(.+)\Z/)[2] #=> 'words!' catches from the last whitespace on. That would include the punctuation.
To extract that from an array of strings, use it with collect:
["a string of words", "Something to say?", "Try me!"].collect {|s| s.match(/(.*\s)*(.+)\Z/)[2] } #=> ["words", "say?", "me!"]
The problem with all of these solutions is that you only considering spaces for word separation. Using regex you can capture any non-word character as a word separator. Here is what I use:
str = 'Non-space characters, like foo=bar.'
str.split(/\W/).last
# "bar"
This is the simplest way I can think of.
hostname> irb
irb(main):001:0> str = 'This is a string.'
=> "This is a string."
irb(main):002:0> words = str.split(/\s+/).last
=> "string."
irb(main):003:0>

Resources