I need to remove some phrases from a string in Ruby. The phrases are defined inside an array. It could look like this:
remove = ["Test", "Another One", "Something Else"]
Then I want to check and remove these from a given string.
"This is a Test" => "This is a "
"This is Another One" => "This is "
"This is Another Two" => "This is Another Two"
Using Ruby 1.9.3 and Rails 3.2.6.
ary = ["Test", "Another One", "Something Else", "(RegExp i\s escaped)"]
string.gsub(Regexp.union(ary), '')
Regexp.union compiles an array of strings (or regexps) into a single regexp, so the whole job needs only one search-and-replace pass. Plain strings are escaped automatically, so any special characters in them match literally.
Regexp.union ['string', /regexp?/i] #=> /string|(?i-mx:regexp?)/
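As a quick check of the approach above, here is a minimal sketch that runs the three strings from the question through one union pattern. The names pattern, inputs, cleaned, literal and dotted are only illustrative. The last two lines show the escaping: a "." inside a phrase matches a literal dot, not any character.

```ruby
remove = ["Test", "Another One", "Something Else"]

# Build the pattern once and reuse it for every string.
pattern = Regexp.union(remove)

inputs  = ["This is a Test", "This is Another One", "This is Another Two"]
cleaned = inputs.map { |s| s.gsub(pattern, '') }
p cleaned

# Strings passed to Regexp.union are escaped, so "a.b" does not match "axb".
literal = Regexp.union(["a.b"])
dotted  = "axb a.b".gsub(literal, '')
p dotted
```

Building the pattern outside the loop avoids recompiling it for each string.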
Simplest (but not most efficient):
# Non-mutating
cleaned = str
remove.each{ |s| cleaned = cleaned.gsub(s,'') }
# Mutating
remove.each{ |s| str.gsub!(s,'') }
More efficient (but less clear):
# Non-mutating
cleaned = str.gsub(Regexp.union(remove), '')
# Mutating
str.gsub!(Regexp.union(remove), '')
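One thing to watch with the mutating variants: gsub! returns nil when nothing was replaced, so its return value should not be chained or assigned blindly. A small sketch, with illustrative variable names (str2, result, result2 are not from the question):

```ruby
remove  = ["Test", "Another One", "Something Else"]
pattern = Regexp.union(remove)

# No phrase matches here, so gsub! leaves the string alone and returns nil.
str    = "This is Another Two".dup
result = str.gsub!(pattern, '')
p result
p str

# A phrase matches here, so gsub! modifies the string in place and returns it.
str2    = "This is a Test".dup
result2 = str2.gsub!(pattern, '')
p result2
```

If you need the cleaned string as a value, the non-mutating gsub is the safer choice.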
Related
Is there a way to extract the strings removed by String#split into a separate array?
s = "This is a simple, uncomplicated sentence."
a = s.split( /,|\./ ) #=> [ "This is a simple", "uncomplicated sentence" ]
x = ... => should contain [ ",", "." ]
Note that the actual regex I need to use is much more complex than this example.
Something like this?
a = s.scan( /,|\./ )
When you want both the matched delimiters and the substrings in between as in Stefan's comment, then you should use split with captures.
"This is a simple, uncomplicated sentence."
.split(/([,.])/)
# => ["This is a simple", ",", " uncomplicated sentence", "."]
If you want to separate them into different arrays, then do:
a, x =
"This is a simple, uncomplicated sentence."
.split(/([,.])/).each_slice(2).to_a.transpose
a # => ["This is a simple", " uncomplicated sentence"]
x # => [",", "."]
or
a =
"This is a simple, uncomplicated sentence."
.split(/([,.])/)
a.select.with_index{|_, i| i.even?}
# => ["This is a simple", " uncomplicated sentence"]
a.select.with_index{|_, i| i.odd?}
# => [",", "."]
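A caveat on the each_slice/transpose variant: it relies on the string ending with a delimiter, so that split returns an even number of elements. When the string ends with ordinary text, the last slice has only one element and transpose raises IndexError. The sketch below uses a made-up input string and partition.with_index, which does not have that restriction.

```ruby
# Example input that does NOT end with a delimiter (my own test string).
s = "one, two. three"

# The capture group keeps the delimiters in the result.
parts = s.split(/([,.])/)
p parts

# Even indexes are the text pieces, odd indexes are the delimiters.
a, x = parts.partition.with_index { |_, i| i.even? }
p a
p x
```

This is the same idea as the select.with_index answer, done in one pass.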
try this:
a = s.split(/,/)[1..-1]
Say I want to read 10 inputs in a loop and store them in an array. Each input will be a string, a line, or a JSON string.
I'm aware of Ruby's upto and gets.chomp but I'm looking for a simple and lazy technique like:
n=10
arr = []
loop(n) { arr.push getline } #Just an example to share my thought. Will not work
Don't know if this is "simple and lazy" enough:
irb> 3.times.collect { gets.chomp }
foo
bar
baz
# => ["foo", "bar", "baz"]
Use Array.new with a block:
Array.new(3){gets.chomp}
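(1..3).map { gets.strip }
This works nicely and also strips stray whitespace before and after each entry. Use strip rather than strip!, because strip! returns nil when there is nothing to strip.
Valid in 1.9 and 2.0.
>> (1..3).map { gets.strip }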
Hello
1
2
=> ["Hello", "1", "2"]
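All of the one-liners above call a method on the result of gets, which is nil once the input runs out, so they raise NoMethodError if fewer lines arrive than expected. A minimal sketch of a variant that simply stops at end of input follows. read_lines is a hypothetical helper name, and StringIO stands in for $stdin so the example can run without typing.

```ruby
require 'stringio'

# Hypothetical helper: read at most n lines from any IO-like object,
# stripping surrounding whitespace (including the trailing newline).
def read_lines(io, n)
  io.each_line.first(n).map(&:strip)
end

# Only three lines are available although five are requested.
input = StringIO.new("  foo \nbar\nbaz")
lines = read_lines(input, 5)
p lines
```

For real keyboard input you would call read_lines($stdin, 10).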
The following code:
str = "1, hello,2"
puts str
arr = str.split(",")
puts arr.inspect
arr.collect { |x| x.strip! }
puts arr.inspect
produces the following result:
1, hello,2
["1", " hello", "2"]
["1", "hello", "2"]
This is as expected. The following code:
str = "1, hello,2"
puts str
arr = (str.split(",")).collect { |x| x.strip! }
puts arr.inspect
Does however produce the following output:
1, hello,2
[nil, "hello", nil]
Why do I get these nil values? Why can't I call .collect directly on the split array?
Thanks for the help!
The #collect method will return an array of the values returned by each block's call. In your first example, you're modifying the actual array contents with #strip! and use those, while you neglect the return value of #collect.
In the second case, you use the #collect result. Your problem is that #strip! will either return a string or nil, depending on its result – especially, it'll return nil if the string wasn't modified.
Therefore, use #strip (without the exclamation mark):
1.9.3-p194 :005 > (str.split(",")).collect { |x| x.strip }
=> ["1", "hello", "2"]
Because #strip! returns nil if the string was not altered.
In your early examples you were not using the result of #collect, just modifying the strings with #strip!. Using #each in that case would have made the non-functional imperative loop a bit more clear. One normally uses #map / #collect only when using the resulting new array.
Your last approach looks good: you wrote a functional map but left the #strip! in. Just take out the !.
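To make the difference concrete, here is a short sketch using the string from the question. The variable names with_bang, without_bang and shorthand are only illustrative.

```ruby
str = "1, hello,2"

# strip! returns nil for "1" and "2" because they had nothing to strip.
with_bang = str.split(",").collect { |x| x.strip! }
p with_bang

# strip always returns a string, changed or not.
without_bang = str.split(",").collect { |x| x.strip }
p without_bang

# The same thing using the symbol-to-proc shorthand.
shorthand = str.split(",").map(&:strip)
p shorthand
```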
I have an array of strings, of different lengths and contents.
Now I'm looking for an easy way to extract the last word from each string, without knowing how long that word is or how long the string is.
Something like:
array.each{|string| puts string.fetch(" ", last) }
This should work just fine
"my random sentence".split.last # => "sentence"
To exclude punctuation, delete it:
"my random sentence..,.!?".split.last.delete('.!?,') #=> "sentence"
To get the "last words" as an array from an array of strings, use collect:
["random sentence...", "lorem ipsum!!!"].collect { |s| s.split.last.delete('.!?,') } # => ["sentence", "ipsum"]
array_of_strings = ["test 1", "test 2", "test 3"]
array_of_strings.map{|str| str.split.last} #=> ["1","2","3"]
["one two", "three four five"].collect { |s| s.split.last }
=> ["two", "five"]
"a string of words!".match(/(.*\s)*(.+)\Z/)[2] #=> 'words!'
This catches everything from the last whitespace on, so it includes the punctuation.
To extract that from an array of strings, use it with collect:
["a string of words", "Something to say?", "Try me!"].collect {|s| s.match(/(.*\s)*(.+)\Z/)[2] } #=> ["words", "say?", "me!"]
The problem with all of these solutions is that they only consider spaces as word separators. With a regex you can treat any non-word character as a separator. Here is what I use:
str = 'Non-space characters, like foo=bar.'
str.split(/\W/).last
# => "bar"
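A related sketch, assuming "word" means a run of \w characters (letters, digits and underscore): scan collects the words directly instead of splitting around them, and it returns nil for a string with no words rather than raising. last_word is a hypothetical helper name, and the sample strings are taken from the answers above plus an empty string.

```ruby
# Hypothetical helper: last run of word characters, or nil if there is none.
def last_word(str)
  str.scan(/\w+/).last
end

sentences = [
  "my random sentence..,.!?",
  "Non-space characters, like foo=bar.",
  ""
]
words = sentences.map { |s| last_word(s) }
p words
```

Note that \w treats an apostrophe as a separator, so a contraction such as "don't" would yield "t".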
This is the simplest way I can think of.
hostname> irb
irb(main):001:0> str = 'This is a string.'
=> "This is a string."
irb(main):002:0> words = str.split(/\s+/).last
=> "string."
irb(main):003:0>
irb(main):001:0> t = %w{this is a test}
=> ["this", "is", "a", "test"]
irb(main):002:0> t.size
=> 4
irb(main):003:0> t = %w{"this is" a test}
=> ["\"this", "is\"", "a", "test"]
irb(main):004:0> t.size
=> 4
In the end I expected t.size to be 3.
As suggested, each space has to be escaped, which turns out to be a lot of work. What other options are there? I have a list of about 30 words that I need to put in a collection, because I am showing them as checkboxes using simple_form.
Why not just use a normal array so no one has to visually parse all the escaping to figure out what's going on? This is pretty clear:
t = [
'this is',
'a',
'test'
]
and the people maintaining your code won't hate you for using %w{} when it isn't appropriate or when they mess things up because they didn't see your escaped whitespace.
You need to escape the space with a '\', like t = %w{this\ is a test}, if you don't want that space to act as a separator.
Escape the space using \:
%w{this\ is a test}
You can escape the space, %w{this\ is a test}, to get ['this is', 'a', 'test'], but in general I wouldn't use %w unless the intention is to split on whitespace.
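A small sketch comparing the three spellings discussed so far. The variable names escaped, plain and quoted are only illustrative.

```ruby
# Backslash-escaped space: "this is" stays one element.
escaped = %w{this\ is a test}

# Ordinary array literal with the same contents.
plain = ['this is', 'a', 'test']

# Double quotes have no special meaning inside %w, so they become part of the words.
quoted = %w{"this is" a test}

p escaped
p escaped == plain
p quoted.size
```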
As others have pointed out use the %w{} construct when spaces are the separator for the words. If you have items that must be quoted and still want to use the construct you can do:
> %w{a test here}.unshift("This is")
=> ["This is", "a", "test", "here"]
require 'csv'
str = '"this is" a test'
p CSV.parse_line(str, col_sep: ' ') # keyword form; a positional options hash is rejected by Ruby 3
#=> ["this is", "a", "test"]