How to use gsub with a file in Ruby? - ruby

Hey I've a little problem, I've a string array text_word and I want to replace some letters with my file transform.txt, my file looks like this:
/t/ 3
/$/ 1
/a/ !
But when I use gsub I get an Enumerator back, does anyone know how to fix this?
text_transform= Array.new
new_words= Array.new
File.open("transform.txt", "r") do |fi|
fi.each_line do |words|
text_transform << words.chomp
end
end
text_transform.each do |transform|
text_word.each do |words|
new_words << words.gsub(transform)
end
end

You can see String#gsub
If the second argument is a Hash, and the matched text is one of its
keys, the corresponding value is the replacement string.
Also you can use IO::readlines
File.readlines('transform.txt', chomp: true).map { |word| word.gsub(/[t$a]/, 't' => 3, '$' => 1, 'a' => '!') }

gsub returns an Enumerator when you provide just one argument (the pattern). If you want to replace just add the replacement string:
pry(main)> 'this is my string'.gsub(/i/, '1')
"th1s 1s my str1ng"
You need to refactor your code:
text_transform = Array.new
new_words = Array.new
File.open("transform.txt", "r") do |fi|
fi.each_line do |words|
text_transform << words.chomp.strip.split # "/t/ 3" -> ["/t/", "3"]
end
end
text_transform.each do |pattern, replacement| # pattern = "/t/", replacement = "3"
text_word.each do |words|
new_words << words.gsub(pattern, replacement)
end
end

Related

Trying to remove punctuation without using regex

I am trying to remove punctuation from an array of words without using regular expression. In below eg,
str = ["He,llo!"]
I want:
result # => ["Hello"]
I tried:
alpha_num="abcdefghijklmnopqrstuvwxyz0123456789"
result= str.map do |punc|
punc.chars {|ch|alpha_num.include?(ch)}
end
p result
But it returns ["He,llo!"] without any change. Can't figure out where the problem is.
include? block returns true/false, try use select function to filter illegal characters.
result = str.map {|txt| txt.chars.select {|c| alpha_num.include?(c.downcase)}}
.map {|chars| chars.join('')}
p result
str=["He,llo!"]
alpha_num="abcdefghijklmnopqrstuvwxyz0123456789"
Program
v=[]<<str.map do |x|
x.chars.map do |c|
alpha_num.chars.map.include?(c.downcase) ? c : nil
end
end.flatten.compact.join
p v
Output
["Hello"]
exclusions = ((32..126).map(&:chr) - [*'a'..'z', *'A'..'Z', *'0'..'9']).join
#=> " !\"\#$%&'()*+,-./:;<=>?#[\\]^_`{|}~"
arr = ['He,llo!', 'What Ho!']
arr.map { |word| word.delete(exclusions) }
#=> ["Hello", "WhatHo"]
If you could use a regular expression and truly only wanted to remove punctuation, you could write the following.
arr.map { |word| word.gsub(/[[:punct:]]/, '') }
#=> ["Hello", "WhatHo"]
See String#delete. Note that arr is not modified.

Ruby regex to get text blocks including delimiters

When using scan in Ruby, we are searching for a block within a text file.
Sample file:
sometextbefore
begin
sometext
end
sometextafter
begin
sometext2
end
sometextafter2
We want the following result in an array:
["begin\nsometext\nend","begin\nsometext2\nend"]
With this scan method:
textfile.scan(/begin\s.(.*?)end/m)
we get:
["sometext","sometext2"]
We want the begin and end still in the output, not cut off.
Any suggestions?
You may remove the capturing group completely:
textfile.scan(/begin\s.*?end/m)
See the IDEONE demo
The String#scan method returns captured values only if you have capturing groups defined inside the pattern, thus a non-capturing one should fix the issue.
UPDATE
If the lines inside the blocks must be trimmed from leading/trailing whitespace, you can just use a gsub against each matched block of text to remove all the horizontal whitespace (with the help of \p{Zs} Unicode category/property class):
.scan(/begin\s.*?end/m).map { |s| s.gsub(/^\p{Zs}+|\p{Zs}+$/, "") }
Here, each match is passed to a block where /^\p{Zs}+|\p{Zs}+$/ matches either the start of a line with 1+ horizontal whitespace(s) (see ^\p{Zs}+), or 1+ horizontal whitespace(s) at the end of the line (see \p{Zs}+$).
See another IDEONE demo
Here's another approach, using Ruby's flip-flop operator. I cannot say I would recommend this approach, but Rubiests should understand how the flip-flop operator works.
First let's create a file.
str =<<_
some
text
at beginning
begin
some
text
1
end
some text
between
begin
some
text
2
end
some text at end
_
#=> "some\ntext\nat beginning\nbegin\n some\n text\n 1\nend\n...at end\n"
FName = "text"
File.write(FName, str)
Now read the file line-by-line into the array lines:
lines = File.readlines(FName)
#=> ["some\n", "text\n", "at beginning\n", "begin\n", " some\n", " text\n",
# " 1\n", "end\n", "some text\n", "between\n", "begin\n", " some\n",
# " text\n", " 2\n", "end\n", "some text at end\n"]
We can obtain the desired result as follows.
lines.chunk { |line| true if line =~ /^begin\s*$/ .. line =~ /^end\s*$/ }.
map { |_,arr| arr.map(&:strip).join("\n") }
#=> ["begin\nsome\ntext\n1\nend", "begin\nsome\ntext\n2\nend"]
The two steps are as follows.
First, select and group the lines of interest, using Enumerable#chunk with the flip-flop operator.
a = lines.chunk { |line| true if line =~ /^begin\s*$/ .. line =~ /^end\s*$/ }
#=> #<Enumerator: #<Enumerator::Generator:0x007ff62b981510>:each>
We can see the objects that will be generated by this enumerator by converting it to an array.
a.to_a
#=> [[true, ["begin\n", " some\n", " text\n", " 1\n", "end\n"]],
# [true, ["begin\n", " some\n", " text\n", " 2\n", "end\n"]]]
Note that the flip-flop operator is distinguished from a range definition by making it part of a logical expression. For that reason we cannot write
lines.chunk { |line| line =~ /^begin\s*$/ .. line =~ /^end\s*$/ }.to_a
#=> ArgumentError: bad value for range
The second step is the following:
b = a.map { |_,arr| arr.map(&:strip).join("\n") }
#=> ["begin\nsome\ntext\n1\nend", "begin\nsome\ntext\n2\nend"]
Ruby has some great methods in Enumerable. slice_before and slice_after can help with this sort of problem:
string = <<EOT
sometextbefore
begin
sometext
end
sometextafter
begin
sometext2
end
sometextafter2
EOT
ary = string.split # => ["sometextbefore", "begin", "sometext", "end", "sometextafter", "begin", "sometext2", "end", "sometextafter2"]
.slice_after(/^end/) # => #<Enumerator: #<Enumerator::Generator:0x007fb1e20b42a8>:each>
.map{ |a| a.shift; a } # => [["begin", "sometext", "end"], ["begin", "sometext2", "end"], []]
ary.pop # => []
ary # => [["begin", "sometext", "end"], ["begin", "sometext2", "end"]]
If you want the resulting sub-arrays joined then that's an easy step:
ary.map{ |a| a.join("\n") } # => ["begin\nsometext\nend", "begin\nsometext2\nend"]

Build list of substrings created by separating a string by a match

I have a string:
"a_b_c_d_e"
I would like to build a list of substrings that result from removing everything after a single "_" from the string. The resulting list would look like:
['a_b_c_d', 'a_b_c', 'a_b', 'a']
What is the most rubyish way to achieve this?
s = "a_b_c_d_e"
a = []
s.scan("_"){a << $`} #`
a # => ["a", "a_b", "a_b_c", "a_b_c_d"]
You can split the string on the underscore character into an Array. Then discard the last element of the array and collect the remaining elements in another array joined by underscores. Like this:
str = "a_b_c_d_e"
str_ary = str.split("_") # will yield ["a","b","c","d","e"]
str_ary.pop # throw out the last element in str_ary
result_ary = [] # an empty array where you will collect your results
until str_ary.empty?
result_ary << str_ary.join("_") #collect the remaining elements of str_ary joined by underscores
str_ary.pop
end
# result_ary = ["a_b_c_d","a_b_c","a_b","a"]
Hope this helps.
I am not sure about “most rubyish”, my solutions would be:
str = 'a_b_c_d_e'
(items = str.split('_')).map.with_index do |_, i|
items.take(i + 1).join('_')
end.reverse
########################################################
(items = str.split('_')).size.downto(1).map do |e|
items.take(e).join('_')
end
########################################################
str.split('_').inject([]) do |memo, l|
memo << [memo.last, l].compact.join('_')
end.reverse
########################################################
([items]*items.size).map.with_index(&:take).map do |e|
e.join('_')
end.reject(&:empty?).reverse
My fave:
([str]*str.count('_')).map.with_index do |s, i|
s[/\A([^_]+_){#{i + 1}}/][0...-1]
end.reverse
Ruby ships with a module for abbreviation.
require "abbrev"
puts ["a_b_c_d_e".tr("_","")].abbrev.keys[1..-1].map{|a| a.chars*"_"}
# => ["a_b_c_d", "a_b_c", "a_b", "a"]
It works on an Array with words - just one in this case. Most work is removing and re-placing the underscores.

Counting frequency of symbols

So I have the following code which counts the frequency of each letter in a string (or in this specific instance from a file):
def letter_frequency(file)
letters = 'a' .. 'z'
File.read(file) .
split(//) .
group_by {|letter| letter.downcase} .
select {|key, val| letters.include? key} .
collect {|key, val| [key, val.length]}
end
letter_frequency(ARGV[0]).sort_by {|key, val| -val}.each {|pair| p pair}
Which works great, but I would like to see if there is someway to do something in ruby that is similar to this but to catch all the different possible symbols? ie spaces, commas, periods, and everything in between. I guess to put it more simply, is there something similar to 'a' .. 'z' that holds all the symbols? Hope that makes sense.
You won't need a range when you're trying to count every possible character, because every possible character is a domain. You should only create a range when you specifically need to use a subset of said domain.
This is probably a faster implementation that counts all characters in the file:
def char_frequency(file_name)
ret_val = Hash.new(0)
File.open(file_name) {|file| file.each_char {|char| ret_val[char] += 1 } }
ret_val
end
p char_frequency("1003v-mm") #=> {"\r"=>56, "\n"=>56, " "=>2516, "\xC9"=>2, ...
For reference I used this test file.
It may not use much Ruby magic with Ranges but a simple way is to build a character counter that iterates over each character in a string and counts the totals:
class CharacterCounter
def initialize(text)
#characters = text.split("")
end
def character_frequency
character_counter = {}
#characters.each do |char|
character_counter[char] ||= 0
character_counter[char] += 1
end
character_counter
end
def unique_characters
character_frequency.map {|key, value| key}
end
def frequency_of(character)
character_frequency[character] || 0
end
end
counter = CharacterCounter.new("this is a test")
counter.character_frequency # => {"t"=>3, "h"=>1, "i"=>2, "s"=>3, " "=>3, "a"=>1, "e"=>1}
counter.unique_characters # => ["t", "h", "i", "s", " ", "a", "e"]
counter.frequency_of 't' # => 3
counter.frequency_of 'z' # => 0

Replacing a char in Ruby with another character

I'm trying to replace all spaces in a string with '%20', but it's not producing the result I want.
I'm splitting the string, then going through each character. If the character is " " I want to replace it with '%20', but for some reason it is not being replaced. What am I doing wrong?
def twenty(string)
letters = string.split("")
letters.each do |char|
if char == " "
char = '%20'
end
end
letters.join
end
p twenty("Hello world is so played out")
Use URI.escape(...) for proper URI encoding:
require 'uri'
URI.escape('a b c') # => "a%20b%20c"
Or, if you want to roll your own as a fun learning exercise, here's my solution:
def uri_escape(str, encode=/\W/)
str.gsub(encode) { |c| '%' + c.ord.to_s(16) }
end
uri_escape('a b!c') # => "a%20%20b%21c"
Finally, to answer your specific question, your snippet doesn't behave as expected because the each iterator does not mutate the target; try using map with assignment (or map!) instead:
def twenty(string)
letters = string.split('')
letters.map! { |c| (c == ' ') ? '%20' : c }
letters.join
end
twenty('a b c') # => "a%20b%20c"
If you want to first split the string on spaces, you could do this:
def twenty(string)
string.split(' ').join('%20')
end
p twenty("Hello world is so played out")
#=> "Hello%20world%20is%20so%20played%20out"
Note that this is not the same as
def twenty_with_gsub(string)
string.gsub(' ', '%20')
end
for if
string = 'hi there'
then
twenty(string)
#=> "hi%20there"
twenty_with_gsub(string)
#=> "hi%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20there"

Resources