Gsub-ing multiple substrings to an empty string - ruby

I often remove substrings from strings by doing this:
"don't use bad words like".gsub("bad", "").gsub("words", "").gsub("like", "")
What's a more concise/better way of excising long lists of substrings from a string in Ruby?

I would go with nronas' answer; however, people tend to forget about Regexp.union:
str = "don't use bad words like"
str.gsub(Regexp.union('bad', 'words', 'like'), '')
# or
str.gsub(Regexp.union(['bad', 'words', 'like']), '')
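One practical difference from a hand-written alternation is that Regexp.union escapes regex metacharacters in the substrings, so each one is matched literally. A minimal sketch of that behaviour; the helper name remove_substrings and the second sample string are made up for illustration:

```ruby
# Hypothetical helper (not from the answers here): removes every literal
# occurrence of the given substrings from str.
def remove_substrings(str, substrings)
  # Regexp.union escapes metacharacters such as "." and "+",
  # so each substring is matched literally.
  str.gsub(Regexp.union(substrings), '')
end

remove_substrings("don't use bad words like", %w[bad words like])
#=> "don't use   "

remove_substrings("a.b aXb c+", ['a.b', 'c+'])
#=> " aXb "
```

Note that the surrounding spaces are left behind, which is why the first result ends in three spaces.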

You can always use a regex with gsub, like:
str = "don't use bad words like"
str.gsub(/bad|words|like/, '')
I hope that helps

Edit2: Upon reflection, I think what I have below (or any solution that first breaks the string into an array of words) is really what you want. Suppose:
str = "Becky darned her socks before playing badmitten."
bad_words = ["bad", "darn", "socks"]
Which of the following would you want?
str.gsub(Regexp.union(*bad_words), '')
#=> "Becky ed her  before playing mitten."
or
(str.split - bad_words).join(' ')
#=> "Becky darned her before playing badmitten."
Alternatively,
bad_words.reduce(str.split) { |arr,bw| arr.delete(bw); arr }.join(' ')
#=> "Becky darned her before playing badmitten."
:2tidE
Edit1: I've come to my senses and purged my solution. It was much too elaborate (and inefficient) for such a simple problem. I've just left an observation. :1tidE
If you want to end up with just a single space between words, you need to take a different tack:
(str.split - bad_words).join(' ')
#=> "don't use"
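If splitting on whitespace is too coarse (for example, when a bad word is followed by punctuation), one possible middle ground is a regex with word boundaries, followed by a whitespace cleanup. This is only a sketch; remove_whole_words is a made-up name, and squeezing spaces assumes you do not need to preserve the original spacing:

```ruby
# Hypothetical helper: removes only whole words, then tidies the whitespace.
def remove_whole_words(str, bad_words)
  # The \b anchors keep "bad" from matching inside "badmitten".
  pattern = /\b(?:#{Regexp.union(bad_words).source})\b/
  str.gsub(pattern, '').squeeze(' ').strip
end

remove_whole_words("Becky darned her socks before playing badmitten.", %w[bad darn socks])
#=> "Becky darned her before playing badmitten."

remove_whole_words("don't use bad words like", %w[bad words like])
#=> "don't use"
```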

I already suggested this to Cary, but it's here:
bad_words = %w[bad words like]
h = Hash.new{|h, k| k}.merge(bad_words.product(['']).to_h)
"don't use bad words like".gsub(/\w+/, h)
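For anyone puzzled by how this works: when gsub is given a hash, each match is looked up as a key and replaced with the corresponding value. The default block makes every unlisted word map to itself. Here is the same idea spelled out step by step (remove_listed_words is a made-up name):

```ruby
# Hypothetical helper: same technique as the one-liner, written out.
def remove_listed_words(str, bad_words)
  # Unknown keys fall back to the key itself, so other words are left alone.
  replacements = Hash.new { |_hash, word| word }
  bad_words.each { |word| replacements[word] = '' }
  # Each run of word characters is looked up in the hash.
  str.gsub(/\w+/, replacements)
end

remove_listed_words("don't use bad words like", %w[bad words like])
#=> "don't use   "

remove_listed_words("badly done", %w[bad])
#=> "badly done"
```

Because the lookup is done per whole \w+ match, "badly" is not touched by a "bad" entry.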

Related

How to sort a string x by the order in which they appear in string y?

I'm trying to create a method sort_by_letter that takes two string arguments and sorts the first by each letter in the order they appear in the second string.
x = "cat"
y = "kndttayc"
sort_by_letter(x, y)
#=> "tac"
Try this:
x.each_char.sort_by { |str| y.index str }.join
Subush's answer is straightforward, and works fine, but if the strings become long, and you want to care about efficiency, you might also want to do this:
h = y.each_char.with_index.to_h
#=> {"k"=>0, "n"=>1, "d"=>2, "t"=>4, "a"=>5, "y"=>6, "c"=>7}
x.each_char.sort_by{|c| h[c]}.join
#=> "tac"
Note: See Simple Lime's comment on the question. Subush's answer and mine are each correct under a different interpretation of that point (y.index uses a letter's first position in y, while the hash keeps its last).
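Putting the pieces together as the method the question asks for might look like the sketch below. Two things here are assumptions that the thread does not settle: a letter repeated in y is ranked by its first position, and letters of x that do not occur in y are placed last.

```ruby
# Sketch of sort_by_letter under the assumptions stated above.
def sort_by_letter(x, y)
  position = {}
  # ||= keeps the first index seen for each letter of y.
  y.each_char.with_index { |char, index| position[char] ||= index }
  # Letters missing from y get y.length, which sorts after every real index.
  x.each_char.sort_by { |char| position.fetch(char, y.length) }.join
end

sort_by_letter("cat", "kndttayc")
#=> "tac"
```

The hash is built once, so each letter of x costs a single lookup instead of a scan of y.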

How to select certain pattern in an array

Here is my array:
a = ['a','b','c', 'C!', 'D!']
I would like to select any upcase letters followed by the ! character and display them. I was trying:
puts a.select! {|i| i.upcase + "!"}
which gave me null set. Any help would be greatly appreciated.
puts a.grep(/[A-Z]!/)
will do.
Try the following:
a.select {|i| i =~ /[A-Z]!/}
Here's another way using the Regexp match method in Ruby.
a.select { |letter| /[A-Z]!/.match(letter) }
Also, one note: consider a more meaningful and contextually relevant variable name than "i" in a.select! {|i| i.upcase + "!"}. For example, I chose the name "letter", although there may be a more meaningful name. It's just a good naming practice that a lot of Ruby programmers tend to follow. Same thing applies to the array named a.
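As for why the original attempt printed nothing: the block i.upcase + "!" always returns a string, which is truthy, so select! removes no elements, and select! returns nil when it makes no changes. Also note that /[A-Z]!/ matches anywhere inside a string. If the elements must be exactly one uppercase letter followed by "!", anchoring the pattern is one option. A small sketch (shouted_letters is a made-up name):

```ruby
# Hypothetical helper: keeps only strings that are exactly one uppercase
# letter followed by "!".
def shouted_letters(items)
  # \A and \z anchor the match to the whole string.
  items.grep(/\A[A-Z]!\z/)
end

shouted_letters(['a', 'b', 'c', 'C!', 'D!'])
#=> ["C!", "D!"]

# Without the anchors, longer strings containing the pattern also match.
['C!', 'abC!def'].grep(/[A-Z]!/)
#=> ["C!", "abC!def"]
```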

Ruby - Deleting every word from an array

In my app, I have to monitor what users type. So I have to prevent any bad words from the web site. Just for example, suppose all my bad words were in this array.
bad_words = ['bad', 'evil', 'terrible', 'villain', 'enemy']
If a user typed those, I would like them to be deleted. Here was one thing I tried.
bad_words.each {|word| string.gsub(word, '')}
Help is appreciated.
You can use a gem to do the cleanup:
https://github.com/tjackiw/obscenity
Including the gem gives you methods like:
Obscenity.configure { |config| config.blacklist = bad_words }
and then:
Obscenity.sanitize(string)
Here's one way:
bad_words = ['bad', 'evil', 'terrible', 'villain', 'enemy']
orig_str =
"Evil is embodied by a terrible villain named 'Bad' who plays badmitten"
no_bad_str = orig_str.gsub(/(?<=^|\W)\w+(?=\W|$)/) { |w|
(bad_words.include?(w.downcase)) ? '' : w }
#=> " is embodied by a   named '' who plays badmitten"
(?<=^|\W) is a positive lookbehind
(?=\W|$) is a positive lookahead
Can bad, evil and terrible words sneak by? Of course. Some examples for orig_str:
badbadbad
evilterribleenemy
eviloff
flyingevil
Good luck!
You can either do
bad_words.each {|word| string = string.gsub(word, '')}
or
bad_words.each {|word| string.gsub!(word, '')}
Either should work. The issue with your original was that gsub returns a new string rather than modifying the old one, unlike the two solutions I have proposed above.
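To make the difference concrete, here is a small sketch that cleans a copy and leaves the caller's string alone. The helper name strip_words and the sample sentence are invented for the example:

```ruby
# Hypothetical helper: returns a cleaned copy and leaves the input untouched.
def strip_words(string, bad_words)
  cleaned = string.dup                             # work on a copy
  bad_words.each { |word| cleaned.gsub!(word, '') } # gsub! mutates in place
  cleaned
end

original = "a bad and evil plan"
strip_words(original, ['bad', 'evil'])
#=> "a  and  plan"
original
#=> "a bad and evil plan"
```

One thing to keep in mind is that gsub! returns nil when nothing was replaced, so its return value should not be chained.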
You can use Regexp.union to create a regular expression containing all the words in your list:
bad_words = ['bad', 'evil', 'terrible', 'villain', 'enemy']
Regexp.union(bad_words)
# => /bad|evil|terrible|villain|enemy/
string.gsub(Regexp.union(bad_words), '')
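The pattern built by Regexp.union from plain strings is case-sensitive, so "Evil" or "BAD" would slip through. If that matters, one way around it is to rebuild the pattern from its source with the IGNORECASE option. A sketch, with remove_ignoring_case as a made-up name:

```ruby
# Hypothetical helper: like the gsub above, but ignores case.
def remove_ignoring_case(string, bad_words)
  # Rebuilding the union from its source with IGNORECASE makes
  # "Evil" and "BAD" match as well.
  pattern = Regexp.new(Regexp.union(bad_words).source, Regexp::IGNORECASE)
  string.gsub(pattern, '')
end

remove_ignoring_case("Evil plans are BAD", ['bad', 'evil'])
#=> " plans are "

# For comparison, the case-sensitive union leaves this string unchanged.
"Evil plans are BAD".gsub(Regexp.union(['bad', 'evil']), '')
#=> "Evil plans are BAD"
```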

Ruby count # of matches in string from array

I have a string, for example:
'This is a test string'
and an array:
['test', 'is']
I need to find out how many elements of the array are present in the string (in this case, it would be 2). What's the best/Ruby way of doing this? Also, I am doing this thousands of times, so please keep efficiency in mind.
What I tried so far:
array.each do |el|
string.include? el #increment counter
end
Thanks
['test', 'is'].count{ |s| /\b#{s}\b/ =~ 'This is a test string' }
Edit: adjusted for full word matching.
['test', 'is'].count { |e| 'This is a test string'.split.include? e }
Your question is ambiguous.
If you are counting the occurrences, then:
('This is a test string'.scan(/\w+/).map(&:downcase) & ['test', 'is']).length
If you are counting the tokens, then:
(['test', 'is'] & 'This is a test string'.scan(/\w+/).map(&:downcase)).length
You can further speed up the calculation by replacing Array#& by some operation using a Hash (or Set).
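A rough sketch of the Set idea follows. Keep in mind that Array#& drops duplicates, so if repeated occurrences should each be counted, a separate pass is needed. Both helper names below are invented, and lowercasing everything simply mirrors the snippets above:

```ruby
require 'set'

# Hypothetical helper: how many entries of words appear in text at least once.
def count_words_present(words, text)
  tokens = text.scan(/\w+/).map(&:downcase).to_set
  words.count { |word| tokens.include?(word.downcase) }
end

# Hypothetical helper: how many tokens of text are listed in words,
# counting repeats each time.
def count_occurrences(words, text)
  wanted = words.map(&:downcase).to_set
  text.scan(/\w+/).count { |token| wanted.include?(token.downcase) }
end

count_words_present(['test', 'is'], 'This is a test string')  #=> 2
count_occurrences(['test', 'is'], 'Is this a test? It is.')   #=> 3
```

If the same array is checked against thousands of strings, building the wanted set once outside the loop should save some work.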
Kyle's answer gave you the simple practical way of doing the job. But looking at it, allow me to remark that more efficient algorithms exist to solve your problem, when n (string length and/or number of matched strings) climbs to millions. We commonly encounter such problems in biology.
The following will work provided there are no duplicates in the string or the array.
str = "This is a test string"
arr = ["test", "is"]
match_count = arr.size - (arr - str.split).size # 2 in this example

how to count the words of a string in ruby

I want to do something like this:
def get_count(string)
  string.split(' ').count
end
I think there might be a better way; String may have a built-in method to do this.
count does work on arrays, but length (or size) is the more usual way to get the number of elements:
def get_count(string)
  string.split(' ').length
end
Edit: If your string is really long, creating an array from it by splitting will need more memory, so here's a way that avoids the intermediate array:
def get_count(string)
  (0..(string.length-1)).inject(1){|m,e| m += string[e].chr == ' ' ? 1 : 0 }
end
If the only word boundary is a single space, just count them.
puts "this sentence has five words".count(' ')+1 # => 5
If there are spaces, line endings, tabs, commas followed by a space, etc. between the words, then scanning for word boundaries is a possibility:
puts "this, is.\tfour words".scan(/\b/).size/2
I know this is an old question, but this might help someone stumbling here. Counting words is a complicated problem. What is a "word"? Do numbers and special characters count as words? Etc...
I wrote the words_counted gem for this purpose. It's a highly flexible, customizable string analyser. You can ask it to analyse any string for word count, word occurrences, and exclude words/characters using regexp, strings, and arrays.
counter = WordsCounted::Counter.new("Hello World!", exclude: "World")
counter.word_count #=> 1
counter.words #=> ["Hello"]
Etc...
The documentation and full source are on GitHub.
Using a regular expression will also cover multiple spaces:
sentence.split(/\s+/).size
String doesn't have anything pre-built to do what you wanted. You can define a method in your class or extend the String class itself for what you want to do:
def word_count(string)
  return 0 if string.empty?
  string.split.size
end
Regex split on any non-word character:
string.split(/\W+/).size
...although it counts a word containing an apostrophe as two words, so depending on how small the margin of error needs to be, you might want to build your own regex.
I recently found that String#count is faster than splitting up the string by over an order of magnitude.
Unfortunately, String#count only accepts a string, not a regular expression. Also, it would count two adjacent spaces as two things, rather than a single thing, and you'd have to handle other whitespace characters separately.
p " some word\nother\tword.word|word".strip.split(/\s+/).size #=> 4
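To illustrate the trade-off: String#count treats its argument as a set of characters, so it can cover tabs and newlines, but it still counts every separator character individually. A small comparison, using made-up helper names and a made-up sample string:

```ruby
# Hypothetical helper: counts separator characters and adds one.
def count_by_separators(text)
  # String#count treats its argument as a set of characters.
  text.count(" \t\n") + 1
end

# Hypothetical helper: counts the pieces produced by split.
def count_by_split(text)
  # split with no argument splits on runs of whitespace and
  # ignores leading and trailing whitespace.
  text.split.size
end

text = "two  spaces\tand\nmore"
count_by_separators(text)  #=> 5
count_by_split(text)       #=> 4
```

So the faster approach is only safe when the input is known to use exactly one separator character between words.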
I'd rather check for word boundaries directly:
"Lorem Lorem Lorem".scan(/\w+/).size
#=> 3
If you need to match rock-and-roll as one word, you could do like:
"Lorem Lorem Lorem rock-and-roll".scan(/[\w-]+/).size
#=> 4
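A slightly stricter variant, if stray hyphens should not count and apostrophes should stay inside a word, could look like this. The definition of a "word" used here (word characters optionally joined by single apostrophes or hyphens) is an assumption, not something the question specifies:

```ruby
# Hypothetical helper: a "word" is a run of word characters, optionally
# joined by single apostrophes or hyphens.
def word_count(text)
  text.scan(/\w+(?:['-]\w+)*/).size
end

word_count("Lorem Lorem Lorem rock-and-roll")  #=> 4
word_count("don't stop - believing")           #=> 3
```

The group is non-capturing on purpose, because scan returns the captured groups instead of the whole match when a pattern contains capture groups.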
