Remove a string pattern and symbols from string - ruby

I need to strip the phrase "not" and hashtags (#) from a string. (I also have to get rid of spaces and caps lock and return the results in arrays, but I've got the latter three taken care of.)
Expectation:
"not12345" #=> ["12345"]
" notabc " #=> ["abc"]
"notone, nottwo" #=> ["one", "two"]
"notCAPSLOCK" #=> ["capslock"]
"##doublehash" #=> ["doublehash"]
"h#a#s#h" #=> ["hash"]
"#notswaggerest" #=> ["swaggerest"]
This is the code I have
def some_method(string)
string.split(", ").map{|n| n.sub(/(not)/,"").downcase.strip}
end
All of the above tests do what I need except for the hash ones. I don't know how to get rid of the hashes; I have tried modifying the regex part: n.sub(/(#not)/), n.sub(/#(not)/), n.sub(/[#]*(not)/), to no avail. How can I make the regex remove #?

arr = ["not12345", " notabc", "notone, nottwo", "notCAPSLOCK",
"##doublehash:", "h#a#s#h", "#notswaggerest"]
arr.flat_map { |str| str.downcase.split(',').map { |s| s.gsub(/#|not|\s+/,"") } }
#=> ["12345", "abc", "one", "two", "capslock", "doublehash:", "hash", "swaggerest"]
When the block variable str is set to "notone, nottwo",
s = str.downcase
#=> "notone, nottwo"
a = s.split(',')
#=> ["notone", " nottwo"]
b = a.map { |s| s.gsub(/#|not|\s+/,"") }
#=> ["one", "two"]
Because I used Enumerable#flat_map, "one" and "two" are added to the array being returned. When str #=> "notCAPSLOCK",
s = str.downcase
#=> "notcapslock"
a = s.split(',')
#=> ["notcapslock"]
b = a.map { |s| s.gsub(/#|not|\s+/,"") }
#=> ["capslock"]

Here is one more solution that uses a different technique: capturing what you want rather than dropping what you don't want (for the most part).
a = ["not12345", " notabc", "notone, nottwo",
"notCAPSLOCK", "##doublehash:","h#a#s#h", "#notswaggerest"]
a.map do |s|
s.downcase.delete("#").scan(/(?<=not)\w+|^[^not]\w+/)
end
#=> [["12345"], ["abc"], ["one", "two"], ["capslock"], ["doublehash"], ["hash"], ["swaggerest"]]
I had to delete the # because of h#a#s#h; otherwise delete could have been avoided with a regex like /(?<=not|^#[^not])\w+/

You can use this regex to solve your problem. It handles the test cases where the spaces, hashes and "not" all come at the start of the string.
/^\s*#*(not)*/
^ means match start of string
\s* matches any space at the start
#* matches 0 or more #
(not)* matches the phrase "not" zero or more times.
Note: this regex only strips a leading prefix. It won't work for cases where "not" comes before "#" (not#hash would return #hash) or where "#" appears inside the string (h#a#s#h is left unchanged).
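As a rough sketch of how that pattern behaves with String#sub (the constant and method names are made up for illustration; only the regex comes from the answer), it may help to run it against a few of the inputs:

```ruby
# Hypothetical names; only the regex itself is taken from the answer above.
LEADING_NOISE = /^\s*#*(not)*/

def strip_leading_noise(text)
  # sub replaces only the first match, which is anchored at the start of a line
  text.sub(LEADING_NOISE, "")
end

p strip_leading_noise("#notswaggerest") #=> "swaggerest"
p strip_leading_noise("##doublehash")   #=> "doublehash"
p strip_leading_noise("not#hash")       #=> "#hash"
p strip_leading_noise("h#a#s#h")        #=> "h#a#s#h" (inner hashes are untouched)
```

So the pattern is probably best combined with something that removes the remaining # characters.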

Fun problem because it can use the most common string functions in Ruby:
result = values.map do |string|
string.strip # Remove spaces in front and back.
.tr('#','') # Transform single characters. In this case remove #
.gsub('not','') # Substitute patterns
.split(', ') # Split into arrays.
end
p result #=>[["12345"], ["abc"], ["one", "two"], ["CAPSLOCK"], ["doublehash"], ["hash"], ["swaggerest"]]
I prefer this way rather than a regexp as it is easy to understand the logic of each line.
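If the lowercase requirement from the question matters, a variant of the same chain could look like the sketch below. The method name clean_tags is hypothetical, downcase is an addition, and String#delete is swapped in for tr('#', '') since it does the same job here.

```ruby
# Hypothetical helper: same chain as above, plus downcase, with delete instead of tr.
def clean_tags(string)
  string.strip            # remove spaces in front and back
        .downcase         # normalise caps lock input
        .delete('#')      # drop every hash character
        .gsub('not', '')  # remove the phrase "not"
        .split(', ')      # split into an array
end

p clean_tags("notCAPSLOCK")    #=> ["capslock"]
p clean_tags("notone, nottwo") #=> ["one", "two"]
p clean_tags("h#a#s#h")        #=> ["hash"]
```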

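In Ruby regular expressions, # starts a comment only in extended (/x) mode, and #{...} triggers interpolation in any regex literal. A lone # matches literally, but escaping the octothorpe is always safe: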
"#foo".sub(/\#/, "") #=> "foo"

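A small sketch of the three cases from the last answer, in case it helps to see them side by side (the variable names are just for illustration):

```ruby
plain    = "#foo".sub(/#/, "")   # a lone # is a literal character outside /x mode
escaped  = "#foo".sub(/\#/, "")  # escaping it is harmless
# In /x mode whitespace is ignored and an unescaped # starts a comment:
extended = "#foo".sub(/\# # strip the leading hash/x, "")

p [plain, escaped, extended] #=> ["foo", "foo", "foo"]
```
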
Related

How to skip over whitespaces in .map. Ruby

This is my code:
def weirdcase (string)
string.chars.map.with_index { |letter, index|
unless index.odd?;
letter = letter.upcase
else
letter
end }.compact.join("")
end
This is what it's supposed to do:
"ThIs Is A TeSt"
And this is what I got:
"ThIs iS A TeSt"
It's giving me the wrong string in return because it's counting/including the white spaces in my code. All I need to do is find a way to skip the white spaces, then I'm good to go.
Thanks!
The problem
I assume that the objective is to capitalize, for each word, all letters at even indices (the first letter of the word having index zero).
Here are two ways to do that. Both methods use String#gsub with a regular expression. Depending on requirements it may be necessary to change str.gsub... to str.downcase.gsub... for both methods.
Use a regular expression to match one- or two-character strings, two if possible, and capitalize those strings.
def weirdcase(str)
str.gsub(/(?<=\A| |[^ ]{2})[^ ]{1,2}/) { |s| s.capitalize }
end
weirdcase "this is a sentence for testing"
#=> "ThIs Is A SeNtEnCe FoR TeStInG"
The regular expression reads, "match one or two characters other than spaces, two if possible ([^ ]{1,2}), that are immediately preceded by one of the following: the beginning of the string (\A), a space, or two characters other than spaces". (?<=\A| |[^ ]{2}) is a positive lookbehind.
s.capitalize invokes the method String#capitalize on the match.
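To see which chunks the regular expression actually picks out, one could pass it to String#scan. This is only an illustrative sketch; the constant name PAIRS is not part of the answer.

```ruby
# Hypothetical constant name; the pattern is the one used in weirdcase above.
PAIRS = /(?<=\A| |[^ ]{2})[^ ]{1,2}/

p "this is".scan(PAIRS) #=> ["th", "is", "is"]
p "a test".scan(PAIRS)  #=> ["a", "te", "st"]
p "for".scan(PAIRS)     #=> ["fo", "r"]
```

Each chunk is then capitalized, which upcases its first character and downcases the second, if there is one.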
Use a cycling enumerator
def weirdcase(str)
enum = [:upcase, :downcase].cycle
str.gsub(/./) do |s|
if s == ' '
enum.rewind
' '
else
s.public_send(enum.next)
end
end
end
weirdcase "this is a sentence for testing"
#=> "ThIs Is A SeNtEnCe FoR TeStInG"
The regular expression /./ matches each character in the string.
See Array#cycle, Enumerator#rewind, Enumerator#next and Object#public_send.
Note the following.
enum = [:upcase, :downcase].cycle
#=> #<Enumerator: [:upcase, :downcase]:cycle>
enum.next
#=> :upcase
enum.next
#=> :downcase
enum.next
#=> :upcase
enum.rewind
#=> #<Enumerator: [:upcase, :downcase]:cycle>
enum.next
#=> :upcase
enum.next
#=> :downcase
... ad infinitum

Trying to remove punctuation without using regex

I am trying to remove punctuation from an array of words without using regular expressions. In the example below,
str = ["He,llo!"]
I want:
result # => ["Hello"]
I tried:
alpha_num="abcdefghijklmnopqrstuvwxyz0123456789"
result= str.map do |punc|
punc.chars {|ch|alpha_num.include?(ch)}
end
p result
But it returns ["He,llo!"] without any change. Can't figure out where the problem is.
The block you pass to chars only evaluates include? to true or false, and that result is discarded. Try using select to filter out the unwanted characters instead.
result = str.map {|txt| txt.chars.select {|c| alpha_num.include?(c.downcase)}}
.map {|chars| chars.join('')}
p result
str=["He,llo!"]
alpha_num="abcdefghijklmnopqrstuvwxyz0123456789"
Program
v=[]<<str.map do |x|
x.chars.map do |c|
alpha_num.chars.map.include?(c.downcase) ? c : nil
end
end.flatten.compact.join
p v
Output
["Hello"]
exclusions = ((32..126).map(&:chr) - [*'a'..'z', *'A'..'Z', *'0'..'9']).join
#=> " !\"\#$%&'()*+,-./:;<=>?@[\\]^_`{|}~"
arr = ['He,llo!', 'What Ho!']
arr.map { |word| word.delete(exclusions) }
#=> ["Hello", "WhatHo"]
If you could use a regular expression and truly only wanted to remove punctuation, you could write the following.
arr.map { |word| word.gsub(/[[:punct:]]/, '') }
#=> ["Hello", "What Ho"]
See String#delete. Note that arr is not modified.
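Another regex-free option, sketched here under the assumption that only ASCII letters and digits should survive, is to give String#delete a negated character set (a leading ^ means "everything except"). The constant and method names are made up for the example.

```ruby
# Hypothetical names. "^a-zA-Z0-9" means: every character except ASCII letters and digits.
KEEP_ALNUM = "^a-zA-Z0-9"

def strip_non_alnum(words)
  # delete returns a new string, so the input array is left untouched
  words.map { |word| word.delete(KEEP_ALNUM) }
end

p strip_non_alnum(["He,llo!", "What Ho!"]) #=> ["Hello", "WhatHo"]
```

Like the exclusions approach, this also removes spaces.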

Ruby method that uppercases even indexed letters and lowercases odd

Directions:
Write a method that accepts a string, and returns the same string with all even indexed characters in each word upper cased, and all odd indexed characters in each word lower cased. The indexing just explained is zero based, so the zero-ith index is even, therefore that character should be upper cased.
The passed in string will only consist of alphabetical characters and spaces(' '). Spaces will only be present if there are multiple words. Words will be separated by a single space(' ').
My code:
(someone please refactor or explain to me a cleaner/shorter solution)
def weirdcase(string)
arr = string.split(' ')
arr.map! {|word|
char = word.chars
char.each_with_index do |letter, i|
i % 2 == 0 ? letter.upcase! : letter.downcase!
end
}
arr.map! {|a| a.push(' ').join('')}
x = arr.join('').to_s
x[0...-1]
end
This is one way you could do that, using Array#cycle to create an enumerator and String#gsub to replace every character in the string with its value upcased or downcased.
def weirdcase(str)
enum = [:upcase, :downcase].cycle
str.gsub(/./) do |s|
if s == ' '
enum.rewind
s
else
s.public_send(enum.next)
end
end
end
weirdcase "Mary had a little lamb"
#=> "MaRy HaD A LiTtLe LaMb"
By making gsub's argument /./ each character in the string is replaced by the value returned by the block, which, if that character is not a space, is that character either upcased or downcased, depending on the symbol generated by the enumerator enum, which alternates between :upcase and :downcase for each word.
Note that
enum = [:upcase, :downcase].cycle
#=> #<Enumerator: [:upcase, :downcase]:cycle>
enum.next
#=> :upcase
enum.next
#=> :downcase
enum.next
#=> :upcase
and so on. See also Enumerator#next.
Enumerator#rewind is needed to begin anew the alternating of case with each word.
One could replace s.public_send(enum.next) with
enum.next == :upcase ? s.upcase : s.downcase
You could also use gsub to change two adjacent characters at a time:
def weirdcase(string)
string.gsub(/(.)(.?)/) { "#{$1.upcase}#{$2.downcase}" }
end
weirdcase "Mary had a little lamb"
#=> "MaRy hAd a lItTlE LaMb"
The ? makes the second character optional, which is needed for odd-length strings:
weirdcase "foo"
#=> "FoO"
Or using each_char and with_index:
def weirdcase(string)
string.each_char.map.with_index { |char, index|
if index.odd?
char.downcase
else
char.upcase
end
}.join
end
If you want to change each word separately:
"Mary had a little lamb".split(' ').map { |word| weirdcase(word) }.join(' ')
#=> "MaRy HaD A LiTtLe LaMb"
or again with gsub:
"Mary had a little lamb".gsub(/\S+/) { |word| weirdcase(word) }
#=> "MaRy HaD A LiTtLe LaMb"
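Putting the last two ideas together, a single method that handles each word separately might look like this sketch (the name weirdcase_words is hypothetical):

```ruby
# Hypothetical method: applies the even/odd index rule to every run of non-space characters.
def weirdcase_words(string)
  string.gsub(/\S+/) do |word|
    word.each_char.with_index.map { |char, index|
      index.even? ? char.upcase : char.downcase
    }.join
  end
end

p weirdcase_words("Mary had a little lamb") #=> "MaRy HaD A LiTtLe LaMb"
```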

Ruby gsub match concatenation

Given a string of digits, I am trying to insert '-' between odd numbers and '*' between even numbers. The solution below:
def DashInsertII(num)
num = num.chars.map(&:to_i)
groups = num.slice_when {|x,y| x.odd? && y.even? || x.even? && y.odd?}.to_a
puts groups.to_s
groups.map! do |array|
if array[0].odd?
array.join(" ").gsub(" ", "-")
else
array.join(" ").gsub(" ", "*")
end
end
d = %w{- *}
puts groups.join.chars.to_s
groups = groups.join.chars
# Have to account for 0 because Coderbyte thinks 0 is neither even nor odd, which is false.
groups.each_with_index do |char,index|
if d.include? char
if (groups[index-1] == "0" || groups[index+1] == "0")
groups.delete_at(index)
end
end
end
groups.join
end
is very convoluted, and I was wondering if I could do something like this:
"99946".gsub(/[13579][13579]/) {|s,x| s+"-"+x}
where s is the first odd, x the second. Usually when I substitute, I replace the matched term, but here I want to keep the matched term and insert a character between the pattern. This would make this problem much simpler.
This will work for you:
"99946".gsub(/[13579]+/) {|s| s.split("").join("-") }
# => "9-9-946"
It's roughly similar to what you tried. It captures multiple consecutive odd digits, and uses the gsub block to split and then join them separated by the "-".
This will include both solutions working together:
"99946".gsub(/[13579]+/) {|s| s.split("").join("-") }.gsub(/[02468]+/) {|s| s.split("").join("*") }
# => "9-9-94*6"
The accepted answer illustrates the logic required to solve the problem well. However, I'd suggest that in production code it be simplified somewhat so that it is easier to read and understand.
In particular, we are doing the same thing twice with different arguments, so it would help the reader to make that obvious by writing a method or lambda that both uses call. For example:
do_pair = ->(string, regex, delimiter) do
string.gsub(regex) { |s| s.chars.join(delimiter) }
end
Then, one can call it like this:
do_pair.(do_pair.('999434432', /[13579]+/, '-'), /[02468]+/, '*')
This could be simplified even further:
do_pair = ->(string, odd_or_even) do
regex = (odd_or_even == :odd) ? /[13579]+/ : /[02468]+/
delimiter = (odd_or_even == :odd) ? '-' : '*'
string.gsub(regex) { |s| s.chars.join(delimiter) }
end
One advantage to this approach is that it makes obvious both the fact that we are processing two cases, odd and even, and the values we are using for those two cases. It can then be called like this:
do_pair.(do_pair.('999434432', :odd), :even)
This could also be done in a method, of course, and that would be fine. The reason I suggested a lambda is that it's pretty minimal logic and it is used in only one (albeit compound) expression in a single method.
This is admittedly more verbose, but breaks down the logic for the reader into more easily digestible chunks, reducing the cognitive cost of understanding it.
The ordinary way to do that is:
"99946"
.gsub(/(?<=[13579])(?=[13579])/, "-")
.gsub(/(?<=[2468])(?=[2468])/, "*")
# => "9-9-94*6"
or
"99946".gsub(/(?<=[13579])()(?=[13579])|(?<=[2468])()(?=[2468])/){$1 ? "-" : "*"}
# => "9-9-94*6"
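For what it's worth, here is the first variant wrapped in a method (the name insert_separators is made up). The lookbehind and lookahead match the empty position between two digits, so gsub inserts the separator without consuming anything. Note that the even class as written leaves out 0, presumably to mirror the Coderbyte quirk mentioned in the question, so zeros never get a separator next to them.

```ruby
# Hypothetical wrapper around the two zero-width substitutions shown above.
def insert_separators(digits)
  digits
    .gsub(/(?<=[13579])(?=[13579])/, "-") # between two odd digits
    .gsub(/(?<=[2468])(?=[2468])/, "*")   # between two even digits, 0 excluded
end

p insert_separators("99946") #=> "9-9-94*6"
p insert_separators("2046")  #=> "204*6"
```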
"2899946".each_char.chunk { |c| c.to_i.even? }.map { |even, arr|
arr.join(even ? '*' : '-') }.join
#=> "2*89-9-94*6"
The steps:
enum0 = "2899946".each_char
#=> #<Enumerator: "2899946":each_char>
We can convert enum0 to an array to see the elements it will generate:
enum0.to_a
#=> ["2", "8", "9", "9", "9", "4", "6"]
Continuing,
enum1 = enum0.chunk { |c| c.to_i.even? }
#=> #<Enumerator: #<Enumerator::Generator:0x007fa733024b58>:each>
enum1.to_a
#=> [[true, ["2", "8"]], [false, ["9", "9", "9"]], [true, ["4", "6"]]]
a = enum1.map { |even, arr| arr.join(even ? '*' : '-') }
#=> ["2*8", "9-9-9", "4*6"]
a.join
#=> "2*89-9-94*6"
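A similar sketch using Enumerable#chunk_while (available since Ruby 2.3) instead of chunk, grouping neighbours that share the same parity. The method name is hypothetical, and unlike the lookaround version this one treats 0 as even.

```ruby
# Hypothetical method: group runs of same-parity digits, then join each run.
def dash_insert_runs(digits)
  digits.each_char
        .chunk_while { |a, b| a.to_i.even? == b.to_i.even? }
        .map { |run| run.join(run.first.to_i.even? ? '*' : '-') }
        .join
end

p dash_insert_runs("2899946") #=> "2*89-9-94*6"
```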

Regexp to match repeated substring

I would like to verify a string containing repeated substrings. The substrings have a particular structure. Whole string has a particular structure (substring split by "|"). For instance, the string can be:
1=23.00|6=22.12|12=21.34|112=20.34
1=23.00|6=22.12|12=21.34
1=23.00|12=21.34
1=23.00**
How can I check that all repeated substrings match a regexp? I tried to check it with:
"1=23.00|6=22.12|12=21.34".match(/([1-9][0-9]*[=][0-9\.]+)+/)
But checking gives true even when several substrings do not match the regexp:
"1=23.00|6=ass|=21.34".match(/([1-9][0-9]*[=][0-9\.]+)+/)
# => #<MatchData "1=23.00" 1:"1=23.00">
The question is whether every repeated substring matches a regex. I understand that the substrings are separated by the character | or $/, the latter being the end of a line. We first need to obtain the repeated substrings:
a = str.split(/[#{$/}\|]/)
.map(&:strip)
.group_by {|s| s}
.select {|_,v| v.size > 1 }
.keys
Next we specify whatever regex you wish to use. I am assuming it is this:
REGEX = /[1-9][0-9]*=[1-9]+\.[0-9]+/
but it could be altered if you have other requirements.
As we wish to determine if all repeated substrings match the regex, that is simply:
a.all? {|s| s =~ REGEX}
Here are the calculations:
str =<<_
1=23.00|6=22.12|12=21.34|112=20.34
1=23.00|6=22.12|12=21.34
1=23.00|12=21.34
1=23.00**
_
c = str.split(/[#{$/}\|]/)
#=> ["1=23.00", "6=22.12", "12=21.34", "112=20.34", "1=23.00",
# "6=22.12", "12=21.34", "1=23.00", "12=21.34", "1=23.00**"]
d = c.map(&:strip)
# same as c, possibly not needed or not wanted
e = d.group_by {|s| s}
# => {"1=23.00" =>["1=23.00", "1=23.00", "1=23.00"],
# "6=22.12" =>["6=22.12", "6=22.12"],
# "12=21.34" =>["12=21.34", "12=21.34", "12=21.34"],
# "112=20.34"=>["112=20.34"], "1=23.00**"=>["1=23.00**"]}
f = e.select {|_,v| v.size > 1 }
#=> {"1=23.00"=>["1=23.00", "1=23.00" , "1=23.00"],
# "6=22.12"=>["6=22.12", "6=22.12"],
# "12=21.34"=>["12=21.34", "12=21.34", "12=21.34"]}
a = f.keys
#=> ["1=23.00", "6=22.12", "12=21.34"]
a.all? {|s| s =~ REGEX}
#=> true
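On Ruby 2.7 or later, the group_by/select/keys steps could probably be shortened with Enumerable#tally. This is just a sketch with a made-up method name and a hard-coded "\n" in place of $/:

```ruby
# Hypothetical helper: returns the substrings that occur more than once.
def repeated_parts(text)
  text.split(/[\n|]/)                       # split on newlines and pipes
      .map(&:strip)
      .tally                                # {"1=23.00" => 2, ...}
      .select { |_, count| count > 1 }
      .keys
end

sample = "1=23.00|6=22.12\n1=23.00|12=21.34\n"
p repeated_parts(sample) #=> ["1=23.00"]
```

The result can then be checked with all? exactly as above.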
This will return true if there are any duplicates, false if there are not:
s = "1=23.00|6=22.12|12=21.34|112=20.34|3=23.00"
arr = s.split(/\|/).map { |pair| pair.gsub(/\d+=/, "") }
arr != arr.uniq # => true
If you want to solve it with a regexp alone (not Ruby code), you should match the whole string, not substrings. I added the [|] symbol and a line-ending anchor to your regexp, and it should work the way you want.
([1-9][0-9]*[=][0-9\.]+[|]*)+$
Try it out.
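A stricter sketch of the same idea anchors both ends and requires a | between pairs. The constant and method names are hypothetical, and the pair format (digits, a dot, digits after the =) is an assumption about what counts as valid:

```ruby
# Hypothetical names; the key=value shape is an assumption, adjust as needed.
PAIR       = /[1-9][0-9]*=[0-9]+\.[0-9]+/
WHOLE_LINE = /\A#{PAIR}(?:\|#{PAIR})*\z/

def valid_line?(line)
  # match? returns true only if the entire line is a |-separated list of pairs
  line.match?(WHOLE_LINE)
end

p valid_line?("1=23.00|6=22.12|12=21.34") #=> true
p valid_line?("1=23.00|6=ass|=21.34")     #=> false
p valid_line?("1=23.00|")                 #=> false
```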

Resources