How to optimize code removing unwanted characters - ruby

This code is designed for a problem where the user's computer has a bug: every time he/she hits the backspace button, it displays a '<' symbol instead. The program should fix this and output the intended string, treating each '<' as a backspace. The input string can be up to 10^6 characters long, and it will only include lowercase letters and '<'.
My code seems to be executing correctly but, when I submit it, the website says it exceeded the time limit for test 5/25. The amount of time given is 1 second. Also, if there are only '<' symbols it should produce no output.
For example,
"hellooo<< my name is matthe<<"
would output
"hello my name is matt"
and
"ssadfas<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<"
would output nothing, etc.
Here is the code:
input = gets.chomp
while input[/[[:lower:]]</]
input.gsub!(/[[:lower:]]</, "")
end
input.gsub!(/</, "")
puts"#{input}"
In the code above I stay in the while loop if there are any instances where a lowercase letter is in front of a '<'. Anywhere a lowercase letter is followed by a '<' it is replaced with nothing. Once the while loop is exited if there are any '<' symbols left, they are replaced with nothing. Then the final string is displayed.
I created a test which I think is worst case scenario for my code:
input = ("a" + "<" + "a")*10000000
#input = gets.chomp
while input[/[[:lower:]]</]
input.gsub!(/[[:lower:]]</, "")
end
input.gsub!(/</, "")
puts"#{input}"
I made the program stop between the creation of the string and the execution of the while loop and then ran it completely to be able to eyeball if it was taking longer than a second. It seemed to take much longer than 1 second.
How can it be modified to be faster or is there a much better way to do this?

Your approach is good but you get better performance if you adapt the regular expression.
Cary, I hope you don't mind that I included your excellent solution in the benchmark.
Benchmark done on a MRI ruby 2.3.0p0 (2015-12-25 revision 53290) [x64-mingw32]
I use .dup on my sample string to make sure none of the methods changes the input sample.
require 'benchmark'
input = ""
10_000_000.times{input << ['a','<'].sample}
def original_method inp
while inp[/[[:lower:]]</]
inp.gsub!(/[[:lower:]]</, "")
end
inp.gsub(/</, "")
end
def better_method inp
tuple = /[^<]</
while inp[tuple]
inp.gsub!(inp[tuple], "")
end
inp.gsub(/</, "")
end
def backspace str
bs_count = 0
str.reverse.each_char.with_object([]) do |s, arr|
if s == '<'
bs_count += 1
else
bs_count.zero? ? arr.unshift(s) : bs_count -= 1
end
end.join
end
puts original_method(input.dup).length
puts better_method(input.dup).length
puts backspace(input.dup).length
Benchmark.bm do |x|
x.report("original_method") { original_method(input.dup) }
x.report("backspace ") { backspace(input.dup) }
x.report("better_method ") { better_method(input.dup) }
end
gives
3640
3640
3640
                  user     system      total        real
original_method  3.494000   0.016000   3.510000 (  3.510709)
backspace        1.872000   0.000000   1.872000 (  1.862550)
better_method    1.155000   0.031000   1.186000 (  1.187495)

def backspace(str)
bs_count = 0
str.reverse.each_char.with_object([]) do |s, arr|
if s == '<'
bs_count += 1
else
bs_count.zero? ? arr.unshift(s) : bs_count -= 1
end
end.join
end
backspace "Now is the<< tim<e fo<<<r every<<<one to chill ou<<<<t"
#=> "Now is t tier evone to chilt"
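For comparison, here is a minimal sketch of a single-pass alternative that uses an array as a stack. It is not one of the benchmarked methods above, and the name apply_backspaces is only an illustration. Each character is pushed, and each '<' pops the most recent character (popping an empty array is harmless), so the work should grow linearly with the input length instead of rescanning the string repeatedly.

```ruby
# Hypothetical helper: treats every '<' as a backspace, in one pass.
def apply_backspaces(str)
  stack = []
  str.each_char do |ch|
    if ch == '<'
      stack.pop          # returns nil on an empty stack, which is fine here
    else
      stack << ch
    end
  end
  stack.join
end

puts apply_backspaces("hellooo<< my name is matthe<<")
```

A string made up only of '<' symbols yields an empty string, which matches the requirement in the question.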

Related

Why is my input method for the hangman game failing to function properly?

I have this method that gets input from the user and checks it in a while condition. If the user enters anything that isn't a string, or enters more than one character, the method should prompt the user again for valid input, basically adhering to the hangman rules. Here's the code:
class Hangman
def initialize
dictionary = File.open('5desk.txt',"r")
line = dictionary.readlines
@word = line[rand(1..line.length)]
@length = @word.length
random = @word.length - rand(@word.length/2)
random.times do
@word[rand(@word.length)] = "_"
end
end
This method fails to function properly.
def get_input
puts @word
puts "Letter Please?"
@letter = gets.chomp
while !@letter.kind_of? String || @letter.length != 1
puts "Invalid input,try again!"
@letter = gets.chomp
end
end
end
Game = Hangman.new
Game.get_input
class Hangman
Stop right there! Why create a class considering that you would only create a single instance of it? There's no need for one. A few methods and one instance variable are sufficient.
Generate secret words randomly
I assume the file '5desk.txt' contains one secret word per line and you will be selecting one randomly. So begin by gulping the entire file into an array held by an instance variable (as opposed to reading the file line-by-line). I assume '5desk.txt' contains the three words shown below.
@secret_words = File.readlines('5desk.txt', chomp: true)
#=> ["cat", "violin", "whoops"]
See the doc for the class method IO::readlines (notes 1 and 2 below). The option chomp: true removes the newline character from the end of each line.
This method closes the file after it has been read. (You used File::open. When doing so you need to close the file when you are finished with it: f = File.open(fname)...f.close.)
You need a method to randomly choose a secret_word.
def fetch_secret_word
@secret_words.sample
end
fetch_secret_word
#=> "violin"
See Array#sample. You could have instead used
@secret_words[rand(@secret_words.size)]
See Kernel#rand. The first and last words in @secret_words are @secret_words[0] and @secret_words[@secret_words.size-1]. Therefore, where you wrote
@word = line[rand(1..line.length)]
it should have been
@word = line[rand(0..line.length-1)]
which is the same as
@word = line[rand(line.length)]
Now let's create a method for playing the game, passing an argument that equals the maximum number of incorrect guesses the player has before losing.
def play_hangman(max_guesses)
First get a secret word:
secret_word = fetch_secret_word
Let us suppose that secret_word #=> "violin"
Initialize objects
Next, initialize the number of incorrect guesses and an image of the secret word:
incorrect_guesses = 0
secret_word_image = "-" * secret_word.size
#=> "------"
So we now have
def play_hangman(max_guesses)
secret_word = fetch_secret_word
incorrect_guesses = 0
secret_word_image = "-" * secret_word.size
Loop over guesses
Now we need to loop over the player's guesses. I suggest you use Kernel#loop, in conjunction with the keyword break, for all your looping needs. (For now, forget about while and until, and never use for.) The first thing we will do in the loop is to obtain the guess of a letter from the player, which I'll do by calling a method:
loop do
guess = get_letter(secret_word_image)
...<to be completed>
end
def get_letter(secret_word_image)
loop do
puts secret_word_image
print "Gimme a letter: "
letter = gets.chomp.downcase
break letter if letter.match?(/\A[a-z]\z/)
puts "That's not a letter. Try again."
end
end
guess = get_letter(secret_word_image)
#=> "b"
Here this method returns "b" (the guess) and displays:
------
Gimme a letter: &
That's not a letter. Try again.
------
Gimme a letter: 3
That's not a letter. Try again.
------
Gimme a letter: b
See if letter guessed is in secret word
Now we need to see which, if any, of the hidden letters equal guess. Again, let's make this a method (note 3 below).
def hidden_letters(guess, secret_word, secret_word_image)
(0..secret_word.size-1).select do |i|
guess == secret_word[i] && secret_word_image[i] == '-'
end
end
Suppose guess #=> "i". Then:
idx = hidden_letters(guess, secret_word, secret_word_image)
#=> [1,4]
There are two "i"'s, at indices 1 and 4. Had there been no hidden letters "i" the method would have returned an empty array.
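As a self-contained sketch of this step, the snippet below finds the still-hidden positions of a guess and then fills them into a copy of the image. The helper reveal is hypothetical (it is not part of the answer's code) and is only there to show how the returned indices are used.

```ruby
# Indices where the guessed letter occurs and is still hidden ('-').
def hidden_letters(guess, secret_word, secret_word_image)
  secret_word.size.times.select do |i|
    guess == secret_word[i] && secret_word_image[i] == '-'
  end
end

# Hypothetical helper: returns a new image with the guessed letter filled in.
def reveal(guess, secret_word, secret_word_image)
  image = secret_word_image.dup
  hidden_letters(guess, secret_word, secret_word_image).each { |i| image[i] = guess }
  image
end

p hidden_letters("i", "violin", "------")
puts reveal("i", "violin", "------")
```

Guessing a letter that has already been revealed returns an empty array, so a repeated guess counts as incorrect under this scheme.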
Before continuing, let's look at how our play_hangman method is coming along.
def play_hangman(max_guesses)
secret_word = fetch_secret_word
incorrect_guesses = 0
secret_word_image = "-" * secret_word.size
loop do
unless secret_word_image.include?('-')
puts "You've won. The secret word is '#{secret_word}'!"
break
end
guess = get_letter(secret_word_image)
idx = hidden_letters(guess, secret_word, secret_word_image)
...<to be completed>
end
Process a guess
We now have to carry out one course of action if the array idx is empty and another if it is not.
case idx.size
when 0
puts "Sorry, no #{guess}'s"
incorrect_guesses += 1
if incorrect_guesses == max_guesses
puts "Oh, my, you've used up all your guesses, but"
puts "we'd like you take home a bar of soap"
break
end
else
puts idx.size == 1 ? "There is 1 #{guess}!" :
"There are #{idx.size} #{guess}'s!"
idx.each { |i| secret_word_image[i] = guess }
if secret_word_image == secret_word
puts "You've won!! The secret word is '#{secret_word}'!"
break
end
end
Complete method
So now let's look at the full method (which calls fetch_secret_word, get_letter and hidden_letters).
def play_hangman(max_guesses)
secret_word = fetch_secret_word
incorrect_guesses = 0
secret_word_image = "-" * secret_word.size
loop do
guess = get_letter(secret_word_image)
idx = hidden_letters(guess, secret_word, secret_word_image)
case idx.size
when 0
puts "Sorry, no #{guess}'s"
incorrect_guesses += 1
if incorrect_guesses == max_guesses
puts "Oh, my, you've used up all your guesses,\n" +
"but we'd like you take home a bar of soap"
return
end
else
puts idx.size == 1 ? "There is 1 #{guess}!" :
"There are #{idx.size} #{guess}'s!"
idx.each { |i| secret_word_image[i] = guess }
if secret_word_image == secret_word
puts "You've won!! The secret word is '#{secret_word}'!"
return
end
end
end
end
Play the game!
Here is an example play of the game.
play_hangman(4)
------
Gimme a letter: #
That's not a letter. Try again.
------
Gimme a letter: e
Sorry, no e's
------
Gimme a letter: o
There is 1 o!
--o---
Gimme a letter: i
There are 2 i's!
-io-i-
Gimme a letter: l
There is 1 l!
-ioli-
Gimme a letter: v
There is 1 v!
violi-
Gimme a letter: r
Sorry, no r's
violi-
Gimme a letter: s
Sorry, no s's
violi-
Gimme a letter: t
Sorry, no t's
Oh, my, you've used up all your guesses,
but we'd like you take home a bar of soap
1 The class File has no (class) method readlines. So how can we write File.readlines? It's because File is a subclass of IO (File.superclass #=> IO) and therefore inherits IO's methods. One commonly sees IO class methods invoked with File as the receiver.
2 Ruby's class methods are referenced mod::meth (e.g., Array::new), where mod is the name of a module (which may be a class) and meth is the method. Instance methods are referenced mod#meth (e.g., Array#join).
3 Some Rubyists prefer to write (0..secret_word.size-1) with three dots: (0...secret_word.size). I virtually never use three dots because I find it tends to create bugs. The one exception is when creating an infinite range that excludes the endpoint (e.g., 1.0...1.5).

How to speed up Ruby script? Or shell script alternative?

I have a Ruby script that does the following to a text file:
removes non-ASCII lines
removes lines containing "::" (two colons in a row)
if there is more than one ":" present in the line (which aren't directly next to each other), it only keeps the strings on both sides of the last colon.
removes leading whitespace
removes unusual control characters
The problem is, I'm working with files that have ~20 million lines, and my script says it'll take ~45 minutes to run.
Is there a way to majorly speed this up? Or, is there a significantly quicker way to handle this in shell?
require 'ruby-progressbar'
class String
def strip_control_characters()
chars.each_with_object("") do |char, str|
str << char unless char.ascii_only? and (char.ord < 32 or char.ord == 127)
end
end
def strip_control_and_extended_characters()
chars.each_with_object("") do |char, str|
str << char if char.ascii_only? and char.ord.between?(32,126)
end
end
end
class Numeric
def percent_of(n)
self.to_f / n.to_f * 100.0
end
end
def clean(file_in,file_out)
if !File.exists?(file_in)
puts "File '#{file_in}' does not exist."
return
end
File.delete(file_out) if File.exist?(file_out)
`touch #{file_out}`
deleted = 0
count = 0
line_count = `wc -l "#{file_in}"`.strip.split(' ')[0].to_i
puts "File has #{line_count} lines. Cleaning..."
progressbar = ProgressBar.create(total: line_count, length: 100, format: 'Progress |%B| %a %e')
IO.foreach(file_in) {|x|
if x.ascii_only?
line = x.strip_control_and_extended_characters.strip
if line == ""
deleted += 1
next
end
if line.include?("::")
deleted += 1
next
end
split = line.split(":")
c = split.count
if c == 1
deleted += 1
next
end
if c > 2
line = split.last(2).join(":")
end
if line != ""
File.open(file_out, 'a') { |f| f.puts(line) }
else
deleted += 1
end
else
deleted += 1
end
progressbar.progress += 1
}
puts "Deleted #{deleted} lines."
end
Here is one of your big problems:
if line != ""
File.open(file_out, 'a') { |f| f.puts(line) }
end
So your program needs to open and close the output file millions of times because it is doing that for every single line. Each time it opens it, since it is being opened in append mode, your system might have to do a lot of work to find the end of the file.
You should really change your program to open the output file once at the beginning and only close it at the end. Also, run strace to see what your Ruby I/O operations are doing behind the scenes; it should buffer up the writes and then send them to the OS in blocks of about 4 kilobytes at a time; it shouldn't issue a write system call for every single line.
To further improve the performance, you should use a Ruby profiling tool to see which functions are taking the most time.
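A minimal sketch of the "open once" idea, assuming a simplified filter that only drops blank lines: the output file is opened a single time in block form, so Ruby buffers the writes and closes the file when the block ends. The helper name copy_nonempty_lines and the file names are made up for illustration and stand in for the real cleaning logic.

```ruby
require 'tmpdir'

# Hypothetical helper: copies stripped, non-empty lines from src to dst.
# The output file is opened once instead of once per line.
def copy_nonempty_lines(src_path, dst_path)
  kept = 0
  File.open(dst_path, 'w') do |out|
    File.foreach(src_path) do |raw|
      line = raw.strip
      next if line.empty?
      out.puts(line)
      kept += 1
    end
  end
  kept
end

# Small demonstration on temporary files.
Dir.mktmpdir do |dir|
  src = File.join(dir, 'in.txt')
  dst = File.join(dir, 'out.txt')
  File.write(src, "  a:b\n\n c:d \n")
  puts copy_nonempty_lines(src, dst)
end
```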
You can improve the speed by changing your String additions to variations on:
class String
def strip_control_characters()
gsub(/[[:cntrl:]]+/, '')
end
def strip_control_and_extended_characters()
strip_control_characters.gsub(/[^[:ascii:]]+/, '')
end
end
str = (0..255).to_a.map { |b| b.chr }.join # => "\x00\x01\x02\x03\x04\x05\x06\a\b\t\n\v\f\r\x0E\x0F\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1A\e\x1C\x1D\x1E\x1F !\"\#$%&'()*+,-./0123456789:;<=>?#ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\x7F\x80\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8A\x8B\x8C\x8D\x8E\x8F\x90\x91\x92\x93\x94\x95\x96\x97\x98\x99\x9A\x9B\x9C\x9D\x9E\x9F\xA0\xA1\xA2\xA3\xA4\xA5\xA6\xA7\xA8\xA9\xAA\xAB\xAC\xAD\xAE\xAF\xB0\xB1\xB2\xB3\xB4\xB5\xB6\xB7\xB8\xB9\xBA\xBB\xBC\xBD\xBE\xBF\xC0\xC1\xC2\xC3\xC4\xC5\xC6\xC7\xC8\xC9\xCA\xCB\xCC\xCD\xCE\xCF\xD0\xD1\xD2\xD3\xD4\xD5\xD6\xD7\xD8\xD9\xDA\xDB\xDC\xDD\xDE\xDF\xE0\xE1\xE2\xE3\xE4\xE5\xE6\xE7\xE8\xE9\xEA\xEB\xEC\xED\xEE\xEF\xF0\xF1\xF2\xF3\xF4\xF5\xF6\xF7\xF8\xF9\xFA\xFB\xFC\xFD\xFE\xFF"
str.strip_control_characters
# => " !\"\#$%&'()*+,-./0123456789:;<=>?#ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\x80\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8A\x8B\x8C\x8D\x8E\x8F\x90\x91\x92\x93\x94\x95\x96\x97\x98\x99\x9A\x9B\x9C\x9D\x9E\x9F\xA0\xA1\xA2\xA3\xA4\xA5\xA6\xA7\xA8\xA9\xAA\xAB\xAC\xAD\xAE\xAF\xB0\xB1\xB2\xB3\xB4\xB5\xB6\xB7\xB8\xB9\xBA\xBB\xBC\xBD\xBE\xBF\xC0\xC1\xC2\xC3\xC4\xC5\xC6\xC7\xC8\xC9\xCA\xCB\xCC\xCD\xCE\xCF\xD0\xD1\xD2\xD3\xD4\xD5\xD6\xD7\xD8\xD9\xDA\xDB\xDC\xDD\xDE\xDF\xE0\xE1\xE2\xE3\xE4\xE5\xE6\xE7\xE8\xE9\xEA\xEB\xEC\xED\xEE\xEF\xF0\xF1\xF2\xF3\xF4\xF5\xF6\xF7\xF8\xF9\xFA\xFB\xFC\xFD\xFE\xFF"
str.strip_control_and_extended_characters
# => " !\"\#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~"
Use the built-in gsub method along with the POSIX character-sets instead of iterating over the strings and testing each character.
As @Myst said though, monkey-patching is rude. Use refinements, or create some methods and pass in the string:
def strip_control_characters(str)
str.gsub(/[[:cntrl:]]+/, '')
end
def strip_control_and_extended_characters(str)
strip_control_characters(str).gsub(/[^[:ascii:]]+/, '')
end
str = (0..255).to_a.map { |b| b.chr }.join # => "\x00\x01\x02\x03\x04\x05\x06\a\b\t\n\v\f\r\x0E\x0F\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1A\e\x1C\x1D\x1E\x1F !\"\#$%&'()*+,-./0123456789:;<=>?#ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\x7F\x80\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8A\x8B\x8C\x8D\x8E\x8F\x90\x91\x92\x93\x94\x95\x96\x97\x98\x99\x9A\x9B\x9C\x9D\x9E\x9F\xA0\xA1\xA2\xA3\xA4\xA5\xA6\xA7\xA8\xA9\xAA\xAB\xAC\xAD\xAE\xAF\xB0\xB1\xB2\xB3\xB4\xB5\xB6\xB7\xB8\xB9\xBA\xBB\xBC\xBD\xBE\xBF\xC0\xC1\xC2\xC3\xC4\xC5\xC6\xC7\xC8\xC9\xCA\xCB\xCC\xCD\xCE\xCF\xD0\xD1\xD2\xD3\xD4\xD5\xD6\xD7\xD8\xD9\xDA\xDB\xDC\xDD\xDE\xDF\xE0\xE1\xE2\xE3\xE4\xE5\xE6\xE7\xE8\xE9\xEA\xEB\xEC\xED\xEE\xEF\xF0\xF1\xF2\xF3\xF4\xF5\xF6\xF7\xF8\xF9\xFA\xFB\xFC\xFD\xFE\xFF"
strip_control_characters(str)
# => " !\"\#$%&'()*+,-./0123456789:;<=>?#ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\x80\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8A\x8B\x8C\x8D\x8E\x8F\x90\x91\x92\x93\x94\x95\x96\x97\x98\x99\x9A\x9B\x9C\x9D\x9E\x9F\xA0\xA1\xA2\xA3\xA4\xA5\xA6\xA7\xA8\xA9\xAA\xAB\xAC\xAD\xAE\xAF\xB0\xB1\xB2\xB3\xB4\xB5\xB6\xB7\xB8\xB9\xBA\xBB\xBC\xBD\xBE\xBF\xC0\xC1\xC2\xC3\xC4\xC5\xC6\xC7\xC8\xC9\xCA\xCB\xCC\xCD\xCE\xCF\xD0\xD1\xD2\xD3\xD4\xD5\xD6\xD7\xD8\xD9\xDA\xDB\xDC\xDD\xDE\xDF\xE0\xE1\xE2\xE3\xE4\xE5\xE6\xE7\xE8\xE9\xEA\xEB\xEC\xED\xEE\xEF\xF0\xF1\xF2\xF3\xF4\xF5\xF6\xF7\xF8\xF9\xFA\xFB\xFC\xFD\xFE\xFF"
strip_control_and_extended_characters(str)
# => " !\"\#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~"
Moving on...
`touch #{file_out}`
is a problem too. You're creating a sub-shell every time that runs, executing touch and then tearing it down, which is a slow operation. Let Ruby do it:
=== Implementation from FileUtils
------------------------------------------------------------------------------
touch(list, noop: nil, verbose: nil, mtime: nil, nocreate: nil)
------------------------------------------------------------------------------
Updates modification time (mtime) and access time (atime) of file(s) in list.
Files are created if they don't exist.
FileUtils.touch 'timestamp'
FileUtils.touch Dir.glob('*.c'); system 'make'
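As a sketch of how the delete-then-touch step could be done without shelling out, the snippet below uses FileUtils.rm_f and FileUtils.touch from the standard library. The helper name reset_output_file and the file name are assumptions for illustration only.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical helper: leaves an empty file at path, with no sub-shell.
def reset_output_file(path)
  FileUtils.rm_f(path)   # no error if the file does not exist
  FileUtils.touch(path)  # creates the file when it is missing
end

# Small demonstration in a temporary directory.
Dir.mktmpdir do |dir|
  path = File.join(dir, 'cleaned.txt')
  File.write(path, "stale contents\n")
  reset_output_file(path)
  puts File.size(path)   # prints 0
end
```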
Finally, learn to benchmark code as you develop. Take the time to think of a couple ways to do something, then test them against each other and find out which is the fastest. I use Fruity, because it handles issues that the Benchmark class doesn't, but do one or the other. You can find a lot of tests I did here for various things by searching SO for my user and "benchmark".
require 'fruity'
class String
def strip_control_characters()
chars.each_with_object("") do |char, str|
str << char unless char.ascii_only? and (char.ord < 32 or char.ord == 127)
end
end
def strip_control_and_extended_characters()
chars.each_with_object("") do |char, str|
str << char if char.ascii_only? and char.ord.between?(32,126)
end
end
end
def strip_control_characters2(str)
str.gsub(/[[:cntrl:]]+/, '')
end
def strip_control_and_extended_characters2(str)
strip_control_characters2(str).gsub(/[^[:ascii:]]+/, '')
end
str = (0..255).to_a.map { |b| b.chr }.join
str.strip_control_characters # => " !\"\#$%&'()*+,-./0123456789:;<=>?#ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\x80\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8A\x8B\x8C\x8D\x8E\x8F\x90\x91\x92\x93\x94\x95\x96\x97\x98\x99\x9A\x9B\x9C\x9D\x9E\x9F\xA0\xA1\xA2\xA3\xA4\xA5\xA6\xA7\xA8\xA9\xAA\xAB\xAC\xAD\xAE\xAF\xB0\xB1\xB2\xB3\xB4\xB5\xB6\xB7\xB8\xB9\xBA\xBB\xBC\xBD\xBE\xBF\xC0\xC1\xC2\xC3\xC4\xC5\xC6\xC7\xC8\xC9\xCA\xCB\xCC\xCD\xCE\xCF\xD0\xD1\xD2\xD3\xD4\xD5\xD6\xD7\xD8\xD9\xDA\xDB\xDC\xDD\xDE\xDF\xE0\xE1\xE2\xE3\xE4\xE5\xE6\xE7\xE8\xE9\xEA\xEB\xEC\xED\xEE\xEF\xF0\xF1\xF2\xF3\xF4\xF5\xF6\xF7\xF8\xF9\xFA\xFB\xFC\xFD\xFE\xFF"
strip_control_characters2(str) # => " !\"\#$%&'()*+,-./0123456789:;<=>?#ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\x80\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8A\x8B\x8C\x8D\x8E\x8F\x90\x91\x92\x93\x94\x95\x96\x97\x98\x99\x9A\x9B\x9C\x9D\x9E\x9F\xA0\xA1\xA2\xA3\xA4\xA5\xA6\xA7\xA8\xA9\xAA\xAB\xAC\xAD\xAE\xAF\xB0\xB1\xB2\xB3\xB4\xB5\xB6\xB7\xB8\xB9\xBA\xBB\xBC\xBD\xBE\xBF\xC0\xC1\xC2\xC3\xC4\xC5\xC6\xC7\xC8\xC9\xCA\xCB\xCC\xCD\xCE\xCF\xD0\xD1\xD2\xD3\xD4\xD5\xD6\xD7\xD8\xD9\xDA\xDB\xDC\xDD\xDE\xDF\xE0\xE1\xE2\xE3\xE4\xE5\xE6\xE7\xE8\xE9\xEA\xEB\xEC\xED\xEE\xEF\xF0\xF1\xF2\xF3\xF4\xF5\xF6\xF7\xF8\xF9\xFA\xFB\xFC\xFD\xFE\xFF"
str.strip_control_and_extended_characters # => " !\"\#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~"
strip_control_and_extended_characters2(str) # => " !\"\#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~"
compare do
scc { str.strip_control_characters }
scc2 { strip_control_characters2(str) }
end
# >> Running each test 512 times. Test will take about 1 second.
# >> scc2 is faster than scc by 10x ± 1.0
and:
compare do
scec { str.strip_control_and_extended_characters }
scec2 { strip_control_and_extended_characters2(str) }
end
# >> Running each test 256 times. Test will take about 1 second.
# >> scec2 is faster than scec by 5x ± 1.0
There seem to be only two possible approaches to optimizing this:
Concurrency.
If your machine is a Unix/Linux based machine that has a multi-core CPU, you can take advantage of the multi-cores by using fork, dividing up the work between different processes.
Multi-threading might not work as well as you'd expect with Ruby, since there's a GIL (Global Interpreter Lock) that prevents multiple threads from running Ruby code at the same time.
Code optimizations.
These include minimizing system calls (such as the File.open) and minimizing any temporary objects.
I would start with this approach before I moved on to fork, mainly due to the extra coding required when using fork.
The first approach requires a large rewrite of the script, while the second approach might be more easily achieved.
For example, the following approach minimizes some system calls (such as the File's open, close and write system calls):
require 'ruby-progressbar'
class String
def strip_control_characters()
chars.each_with_object("") do |char, str|
str << char unless char.ascii_only? and (char.ord < 32 or char.ord == 127)
end
end
def strip_control_and_extended_characters()
chars.each_with_object("") do |char, str|
str << char if char.ascii_only? and char.ord.between?(32,126)
end
end
end
class Numeric
def percent_of(n)
self.to_f / n.to_f * 100.0
end
end
def clean(file_in,file_out)
if !File.exist?(file_in)
puts "File '#{file_in}' does not exist."
return
end
File.delete(file_out) if File.exist?(file_out)
`touch #{file_out}`
deleted = 0
count = 0
line_count = `wc -l "#{file_in}"`.strip.split(' ')[0].to_i
puts "File has #{line_count} lines. Cleaning..."
progressbar = ProgressBar.create(total: line_count, length: 100, format: 'Progress |%B| %a %e')
file_fd = File.open(file_out, 'a')
buffer = "".dup
IO.foreach(file_in) {|x|
if x.ascii_only?
line = x.strip_control_and_extended_characters.strip
if line == ""
deleted += 1
next
end
if line.include?("::")
deleted += 1
next
end
split = line.split(":")
c = split.count
if c == 1
deleted += 1
next
end
if c > 2
line = split.last(2).join(":")
end
if line != ""
buffer << line << "\n"
else
deleted += 1
end
else
deleted += 1
end
if buffer.length >= 2048
file_fd.write(buffer)
buffer.clear
end
progressbar.progress += 1
}
file_fd.write(buffer)
buffer.clear
file_fd.close
puts "Deleted #{deleted} lines."
end
P.S.
I would avoid monkey patching - it's rude.
After posting this I read @DavidGrayson's answer, which pinpoints an issue with your code's performance in a much shorter and more succinct answer.
I up-voted his answer, as I think you'll get a big performance gain from this simple change.

Ruby program for searching words in a file

I started with Ruby yesterday, I only have some experience with C.
Now I'm trying to write a program that gets a file and a word to search in that file from ARGV, and prints how many times the word appeared. Got rid of any error, but it prints 0 anyway when I test it.
if ARGV.size !=2
puts "INSERT A FILE AND A WORD OR A CHAR TO SEARCH FOR"
exit 1
else
file = File.open(ARGV[0], mode = "r")
word = ARGV[1]
if !file
puts "ERROR: INVALID INPUT FILE"
exit 1
end
while true
begin
i = 0
count_word = 0
string = []
string[i] = file.readline
if string[i].upcase.include? word.upcase
count_word += 1
end
i += 1
rescue EOFError
break
end
end
print "The word searched is ", word, " Frequency: ", count_word, "\n"
end
I hope you could tell me what's wrong (I believe I do something wrong when counting), thanks in advance.
A great thing about Ruby is that it operates at a much higher level of abstraction. Here is a snippet that does what you want:
if ARGV.size != 2
puts "Provide file to be searched in and word to be found"
exit 1
end
file = ARGV[0]
word = ARGV[1]
count = 0
File.foreach(file) { |line| count += 1 if line.downcase.include?(word.downcase) }
puts "The word searched is #{word} Frequency: #{count}"
As you can see, the language provides a lot of features like string interpolation, enumeration of the file contents, etc.
There are a handful of problems with the code you provided. They range from styling issues like indentation, to incorrect assumptions about the language like the if !file check (File.open raises an exception when the file cannot be opened; it does not return nil), to strange decisions overall, like using an array when you only need the current line.
I suggest you look at http://tryruby.org/ . It is very short and will give you a feel for the Ruby way of doing things. It also covers your question (processing files).
As a general note, when you post a question on Stack Overflow, please include the code in the question rather than linking to an external page. This way people can read through it faster and edit it, and the code won't be lost if the other site goes down. You can still link to external pages if you want to show the snippet in action.
Hope this helps. The error is that you put this part:
i = 0
count_word = 0
string = []
inside the while loop, which resets your counter to zero on every iteration, even if the word was found. To correct this, move it before the loop:
if ARGV.size !=2
puts "INSERT A FILE AND A WORD OR A CHAR TO SEARCH FOR"
exit 1
else
file = File.open(ARGV[0], mode = "r")
word = ARGV[1]
if !file
puts "ERROR: INVALID INPUT FILE"
exit 1
end
i = 0
count_word = 0
string = []
while true
begin
string[i] = file.readline
if string[i].upcase.include? word.upcase
count_word += 1
end
i += 1
rescue EOFError
break
end
end
print "The word searched is ", word, " Frequency: ", count_word, "\n"
end
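Note that both versions above count the lines that contain the word, not the total number of occurrences. If a line can contain the word more than once, a sketch along the following lines may be closer to "how many times the word appeared". The helper name count_occurrences is hypothetical; it relies on String#scan, which returns every non-overlapping match.

```ruby
# Hypothetical helper: case-insensitive count of non-overlapping
# occurrences of word in text.
def count_occurrences(text, word)
  text.downcase.scan(word.downcase).size
end

sample = "Ruby ruby RUBY\nno match here\nruby again"
puts count_occurrences(sample, "ruby")
```

For a real file, the text could come from File.read(ARGV[0]), assuming the file fits comfortably in memory.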

Truncate string when it is too long

I have two strings:
short_string = "hello world"
long_string = "this is a very long long long .... string" # suppose more than 10000 chars
I want to change the default behavior of print to:
puts short_string
# => "hello world"
puts long_string
# => "this is a very long long....."
The long_string is only partially printed. I tried to change String#to_s, but it didn't work. Does anyone know how to do it like this?
Update:
Actually, I want it to work smoothly, meaning the following cases should also work fine:
> puts very_long_str
> puts [very_long_str]
> puts {:a => very_long_str}
So I think the behavior belongs to String.
Thanks all anyway.
First of all, you need a method to truncate a string, either something like:
def truncate(string, max)
string.length > max ? "#{string[0...max]}..." : string
end
Or by extending String: (it's not recommended to alter core classes, though)
class String
def truncate(max)
length > max ? "#{self[0...max]}..." : self
end
end
Now you can call truncate when printing the string:
puts "short string".truncate(20)
#=> short string
puts "a very, very, very, very long string".truncate(20)
#=> a very, very, very, ...
Or you could just define your own puts:
def puts(string)
super(string.truncate(20))
end
puts "short string"
#=> short string
puts "a very, very, very, very long string"
#=> a very, very, very, ...
Note that Kernel#puts takes a variable number of arguments, you might want to change your puts method accordingly.
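One way to handle several arguments, sketched under the assumption that every argument is converted with to_s first: truncate each one, then hand the results to the regular puts. The name truncate_all and the default limit of 20 are illustrative choices, not part of Ruby or of the code above.

```ruby
# Same truncation rule as the standalone method above.
def truncate(string, max)
  string.length > max ? "#{string[0...max]}..." : string
end

# Hypothetical helper: truncates every argument after converting it with to_s.
def truncate_all(*args, max: 20)
  args.map { |arg| truncate(arg.to_s, max) }
end

puts(*truncate_all("short string", "a very, very, very, very long string"))
```

Because the helper leaves Kernel#puts untouched, other code that prints long strings keeps its normal behavior.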
This is how Ruby on Rails does it in their String#truncate method as a monkey-patch:
class String
def truncate(truncate_at, options = {})
return dup unless length > truncate_at
options[:omission] ||= '...'
length_with_room_for_omission = truncate_at - options[:omission].length
stop = if options[:separator]
rindex(options[:separator], length_with_room_for_omission) ||
length_with_room_for_omission
else
length_with_room_for_omission
end
"#{self[0...stop]}#{options[:omission]}"
end
end
Then you can use it like this
'And they found that many people were sleeping better.'.truncate(25, omission: '... (continued)')
# => "And they f... (continued)"
You can write a wrapper around puts that handles truncation for you:
def pleasant(string, length = 32)
raise 'Pleasant: Length should be greater than 3' unless length > 3
truncated_string = string.to_s
if truncated_string.length > length
truncated_string = truncated_string[0...(length - 3)]
truncated_string += '...'
end
puts truncated_string
truncated_string
end
Truncate naturally
I want to propose a solution that truncates naturally. I fell in love with the String#truncate method offered by Ruby on Rails. It was already mentioned by @Oto Brglez above. Unfortunately I couldn't rewrite it for pure Ruby, so I wrote this function.
def truncate(content, max)
if content.length > max
truncated = ""
collector = ""
content = content.split(" ")
content.each do |word|
word = word + " "
collector << word
truncated << word if collector.length < max
end
truncated = truncated.strip.chomp(",").concat("...")
else
truncated = content
end
return truncated
end
Example
Test: I am a sample phrase to show the result of this function.
NOT: I am a sample phrase to show the result of th...
BUT: I am a sample phrase to show the result of...
Note: I'm open for improvements because I'm convinced that there is a shorter solution possible.
You can just use this syntax:
"mystring"[0..MAX_LENGTH]
[5] pry(main)> "hello world"[0..10]
=> "hello world"
[6] pry(main)> "hello world why"[0..10]
=> "hello world"
[7] pry(main)> "hello"[0..10]
=> "hello"
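There's no need to check whether the string actually exceeds the maximum length. Note that the inclusive range 0..n keeps n + 1 characters; use 0...n to keep exactly n.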

Misbehaving Case Statement

I'm messing around in Ruby some more. I have a file containing a class with two methods and the following code:
if __FILE__ == $0
seq = NumericSequence.new
puts "\n1. Fibonacci Sequence"
puts "\n2. Pascal\'s Triangle"
puts "\nEnter your selection: "
choice = gets
puts "\nExcellent choice."
choice = case
when 1
puts "\n\nHow many fibonacci numbers would you like? "
limit = gets.to_i
seq.fibo(limit) { |x| puts "Fibonacci number: #{x}\n" }
when 2
puts "\n\nHow many rows of Pascal's Triangle would you like?"
n = gets.to_i
(0..n).each {|num| seq.pascal_triangle_row(num) \
{|row| puts "#{row} "}; puts "\n"}
end
end
How come if I run the code and supply option 2, it still runs the first case?
Your case syntax is wrong. Should be like this:
case choice
when '1'
some code
when '2'
some other code
end
Take a look here.
You also need to compare your variable against strings, as gets reads and returns user input as a string. That string includes the trailing newline, so call gets.chomp before comparing.
Your bug is this: choice = case should be case choice.
You're providing a case statement with no "default" object, so the first clause, when 1, always returns true.
Effectively, you've written: choice = if 1 then ... elsif 2 then ... end
And, as Mladen mentioned, compare strings to strings or convert to int: choice = gets.to_i
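Putting both fixes together, here is a small sketch that passes the value to case and converts the input to an integer first. The method name describe_choice and the returned labels are made up for illustration; in the real program each branch would run the corresponding sequence code.

```ruby
# Hypothetical helper: maps raw user input (as returned by gets) to a label.
def describe_choice(input)
  case input.to_i          # "2\n".to_i is 2, so the newline is not a problem
  when 1 then "fibonacci"
  when 2 then "pascal"
  else "unknown"
  end
end

puts describe_choice("2\n")
```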
