Trying to remove punctuation without using regex - ruby

I am trying to remove punctuation from an array of words without using regular expression. In below eg,
str = ["He,llo!"]
I want:
result # => ["Hello"]
I tried:
alpha_num="abcdefghijklmnopqrstuvwxyz0123456789"
result= str.map do |punc|
punc.chars {|ch|alpha_num.include?(ch)}
end
p result
But it returns ["He,llo!"] without any change. Can't figure out where the problem is.

include? block returns true/false, try use select function to filter illegal characters.
result = str.map {|txt| txt.chars.select {|c| alpha_num.include?(c.downcase)}}
.map {|chars| chars.join('')}
p result

str=["He,llo!"]
alpha_num="abcdefghijklmnopqrstuvwxyz0123456789"
Program
v=[]<<str.map do |x|
x.chars.map do |c|
alpha_num.chars.map.include?(c.downcase) ? c : nil
end
end.flatten.compact.join
p v
Output
["Hello"]

exclusions = ((32..126).map(&:chr) - [*'a'..'z', *'A'..'Z', *'0'..'9']).join
#=> " !\"\#$%&'()*+,-./:;<=>?#[\\]^_`{|}~"
arr = ['He,llo!', 'What Ho!']
arr.map { |word| word.delete(exclusions) }
#=> ["Hello", "WhatHo"]
If you could use a regular expression and truly only wanted to remove punctuation, you could write the following.
arr.map { |word| word.gsub(/[[:punct:]]/, '') }
#=> ["Hello", "WhatHo"]
See String#delete. Note that arr is not modified.

Related

How to use gsub with a file in Ruby?

Hey I've a little problem, I've a string array text_word and I want to replace some letters with my file transform.txt, my file looks like this:
/t/ 3
/$/ 1
/a/ !
But when I use gsub I get an Enumerator back, does anyone know how to fix this?
text_transform= Array.new
new_words= Array.new
File.open("transform.txt", "r") do |fi|
fi.each_line do |words|
text_transform << words.chomp
end
end
text_transform.each do |transform|
text_word.each do |words|
new_words << words.gsub(transform)
end
end
You can see String#gsub
If the second argument is a Hash, and the matched text is one of its
keys, the corresponding value is the replacement string.
Also you can use IO::readlines
File.readlines('transform.txt', chomp: true).map { |word| word.gsub(/[t$a]/, 't' => 3, '$' => 1, 'a' => '!') }
gsub returns an Enumerator when you provide just one argument (the pattern). If you want to replace just add the replacement string:
pry(main)> 'this is my string'.gsub(/i/, '1')
"th1s 1s my str1ng"
You need to refactor your code:
text_transform = Array.new
new_words = Array.new
File.open("transform.txt", "r") do |fi|
fi.each_line do |words|
text_transform << words.chomp.strip.split # "/t/ 3" -> ["/t/", "3"]
end
end
text_transform.each do |pattern, replacement| # pattern = "/t/", replacement = "3"
text_word.each do |words|
new_words << words.gsub(pattern, replacement)
end
end

how I could do a gsub with array elements?

How I could replaces a string like this
I think something like this
inputx.gsub(/variable1/,string1.split(";")[i])
But I dont know How I could do this code
name1;variable1
name;variable1
name3;variable1
by
dog;watch;rock
For obtain this
name1;dog
name;watch
name3;rock
string1 => dog;watch;rock ; this string Im trying to split for replace each string variable1
Please help me
subst = "dog;watch;rock".split ';'
input.gsub(/variable1/) do subst.shift end
#⇒ "name1;dog \n name;watch \n name3;rock"
Given (assuming) this input:
inputx = <<-EOD
name1;variable1
name;variable1
name3;variable1
EOD
#=> "name1;variable1\nname;variable1\nname3;variable1\n"
string1 = 'dog;watch;rock'
#=> "dog;watch;rock"
You can chain gsub and with_index to perform a replacement based on its index:
inputx.gsub('variable1').with_index { |_, i| string1.split(';')[i] }
#=> "name1;dog\nname;watch\nname3;rock\n"
You could also perform the split beforehand:
values = string1.split(';')
#=> ["dog", "watch", "rock"]
inputx.gsub('variable1').with_index { |_, i| values[i] }
#=> "name1;dog\nname;watch\nname3;rock\n"
I'm not sure there's a way to do it using .gsub(). One simple way to achieve what you want to is the following:
str = "dog;watch;rock"
array = str.split(";")
array.each_with_index do |str, i|
array[i] = "name#{i + 1};#{str}"
end
puts array
Output:
name1;dog
name2;watch
name3;rock
file intro2 => dog;watch;rock
file intro
name1;variable1
name;variable1
name3;variable1
ruby code
ruby -e ' n=0; input3= File.read("intro");string1= File.read("intro2") ;input3x=input3.gsub("variable1") { val =string1.split(";")[n].to_s; n+=1; val } ;print input3x' >gggf

Remove a string pattern and symbols from string

I need to clean up a string from the phrase "not" and hashtags(#). (I also have to get rid of spaces and capslock and return them in arrays, but I got the latter three taken care of.)
Expectation:
"not12345" #=> ["12345"]
" notabc " #=> ["abc"]
"notone, nottwo" #=> ["one", "two"]
"notCAPSLOCK" #=> ["capslock"]
"##doublehash" #=> ["doublehash"]
"h#a#s#h" #=> ["hash"]
"#notswaggerest" #=> ["swaggerest"]
This is the code I have
def some_method(string)
string.split(", ").map{|n| n.sub(/(not)/,"").downcase.strip}
end
All of the above test does what I need to do except for the hash ones. I don't know how to get rid of the hashes; I have tried modifying the regex part: n.sub(/(#not)/), n.sub(/#(not)/), n.sub(/[#]*(not)/) to no avail. How can I make Regex to remove #?
arr = ["not12345", " notabc", "notone, nottwo", "notCAPSLOCK",
"##doublehash:", "h#a#s#h", "#notswaggerest"].
arr.flat_map { |str| str.downcase.split(',').map { |s| s.gsub(/#|not|\s+/,"") } }
#=> ["12345", "abc", "one", "two", "capslock", "doublehash:", "hash", "swaggerest"]
When the block variable str is set to "notone, nottwo",
s = str.downcase
#=> "notone, nottwo"
a = s.split(',')
#=> ["notone", " nottwo"]
b = a.map { |s| s.gsub(/#|not|\s+/,"") }
#=> ["one", "two"]
Because I used Enumerable#flat_map, "one" and "two" are added to the array being returned. When str #=> "notCAPSLOCK",
s = str.downcase
#=> "notcapslock"
a = s.split(',')
#=> ["notcapslock"]
b = a.map { |s| s.gsub(/#|not|\s+/,"") }
#=> ["capslock"]
Here is one more solution that uses a different technique of capturing what you want rather than dropping what you don't want: (for the most part)
a = ["not12345", " notabc", "notone, nottwo",
"notCAPSLOCK", "##doublehash:","h#a#s#h", "#notswaggerest"]
a.map do |s|
s.downcase.delete("#").scan(/(?<=not)\w+|^[^not]\w+/)
end
#=> [["12345"], ["abc"], ["one", "two"], ["capslock"], ["doublehash"], ["hash"], ["swaggerest"]]
Had to delete the # because of h#a#s#h otherwise delete could have been avoided with a regex like /(?<=not|^#[^not])\w+/
You can use this regex to solve your problem. I tested and it works for all of your test cases.
/^\s*#*(not)*/
^ means match start of string
\s* matches any space at the start
#* matches 0 or more #
(not)* matches the phrase "not" zero or more times.
Note: this regex won't work for cases where "not" comes before "#", such as not#hash would return #hash
Fun problem because it can use the most common string functions in Ruby:
result = values.map do |string|
string.strip # Remove spaces in front and back.
.tr('#','') # Transform single characters. In this case remove #
.gsub('not','') # Substitute patterns
.split(', ') # Split into arrays.
end
p result #=>[["12345"], ["abc"], ["one", "two"], ["CAPSLOCK"], ["doublehash"], ["hash"], ["swaggerest"]]
I prefer this way rather than a regexp as it is easy to understand the logic of each line.
Ruby regular expressions allow comments, so to match the octothorpe (#) you can escape it:
"#foo".sub(/\#/, "") #=> "foo"

Ruby getting the longest word of a sentence

I'm trying to create method named longest_word that takes a sentence as an argument and The function will return the longest word of the sentence.
My code is:
def longest_word(str)
words = str.split(' ')
longest_str = []
return longest_str.max
end
The shortest way is to use Enumerable's max_by:
def longest(string)
string.split(" ").max_by(&:length)
end
Using regexp will allow you to take into consideration punctuation marks.
s = "lorem ipsum, loremmm ipsummm? loremm ipsumm...."
first longest word:
s.split(/[^\w]+/).max_by(&:length)
# => "loremmm"
# or using scan
s.scan(/\b\w+\b/).max_by(&:length)
# => "loremmm"
Also you may be interested in getting all longest words:
s.scan(/\b\w+\b/).group_by(&:length).sort.last.last
# => ["loremmm", "ipsummm"]
It depends on how you want to split the string. If you are happy with using a single space, than this works:
def longest(source)
arr = source.split(" ")
arr.sort! { |a, b| b.length <=> a.length }
arr[0]
end
Otherwise, use a regular expression to catch whitespace and puntuaction.
def longest_word(sentence)
longest_word = ""
words = sentence.split(" ")
words.each do |word|
longest_word = word unless word.length < longest_word.length
end
longest_word
end
That's a simple way to approach it. You could also strip the punctuation using a gsub method.
Funcional Style Version
str.split(' ').reduce { |r, w| w.length > r.length ? w : r }
Another solution using max
str.split(' ').max { |a, b| a.length <=> b.length }
sort_by! and reverse!
def longest_word(sentence)
longw = sentence.split(" ")
longw.sort_by!(&:length).reverse!
p longw[0]
end
longest_word("once upon a time long ago a very longword")
If you truly want to do it in the Ruby way it would be:
def longest(sentence)
sentence.split(' ').sort! { |a, b| b.length <=> a.length }[0]
end
This is to strip the word from the extra chars
sen.gsub(/[^0-9a-z ]/i, '').split(" ").max_by(&:length)
Find Longest word in a string
sentence = "Hi, my name is Mesut. There is longestword here!"
def longest_word(string)
long = ""
string.split(" ").each do |sent|
if sent.length >= long.length
long = sent
end
end
return long
end
p longest_word(sentence)

How do I get the match data for all occurrences of a Ruby regular expression in a string?

I need the MatchData for each occurrence of a regular expression in a string. This is different than the scan method suggested in Match All Occurrences of a Regex, since that only gives me an array of strings (I need the full MatchData, to get begin and end information, etc).
input = "abc12def34ghijklmno567pqrs"
numbers = /\d+/
numbers.match input # #<MatchData "12"> (only the first match)
input.scan numbers # ["12", "34", "567"] (all matches, but only the strings)
I suspect there is some method that I've overlooked. Suggestions?
You want
"abc12def34ghijklmno567pqrs".to_enum(:scan, /\d+/).map { Regexp.last_match }
which gives you
[#<MatchData "12">, #<MatchData "34">, #<MatchData "567">]
The "trick" is, as you see, to build an enumerator in order to get each last_match.
My current solution is to add an each_match method to Regexp:
class Regexp
def each_match(str)
start = 0
while matchdata = self.match(str, start)
yield matchdata
start = matchdata.end(0)
end
end
end
Now I can do:
numbers.each_match input do |match|
puts "Found #{match[0]} at #{match.begin(0)} until #{match.end(0)}"
end
Tell me there is a better way.
I’ll put it here to make the code available via a search:
input = "abc12def34ghijklmno567pqrs"
numbers = /\d+/
input.gsub(numbers) { |m| p $~ }
The result is as requested:
⇒ #<MatchData "12">
⇒ #<MatchData "34">
⇒ #<MatchData "567">
See "input.gsub(numbers) { |m| p $~ } Matching data in Ruby for all occurrences in a string" for more information.
I'm surprised nobody mentioned the amazing StringScanner class included in Ruby's standard library:
require 'strscan'
s = StringScanner.new('abc12def34ghijklmno567pqrs')
while s.skip_until(/\d+/)
num, offset = s.matched.to_i, [s.pos - s.matched_size, s.pos - 1]
# ..
end
No, it doesn't give you the MatchData objects, but it does give you an index-based interface into the string.
input = "abc12def34ghijklmno567pqrs"
n = Regexp.new("\\d+")
[n.match(input)].tap { |a| a << n.match(input,a.last().end(0)+1) until a.last().nil? }[0..-2]
=> [#<MatchData "12">, #<MatchData "34">, #<MatchData "567">]

Resources