I am writing a bowling score calculator in Ruby that is defined and tested using RSpec. It currently runs, but only passes 5 of the 8 input tests. Here is the code for my implementation:
class ScoreKeeper
def calculate(input)
unless input.is_a? String
raise ArgumentError, "Score Keeper will only accept string types for score calculation."
end
# Thanksgiving Turkey Edge Case
return 300 if input == "xxxxxxxxxxxx"
# Calculate Score
throws = input.gsub(/-/, "0").split(//)
score = 0
throws.each_with_index do |ball, i|
current_throw = i
last_throw = throws[i - 1] || "0"
lastlast_throw = throws[i - 2] || "0"
next_throw = throws[i + 1] || "0"
if current_throw == 0
last_throw = 0
lastlast_throw = 0
end
if current_throw == 1
lastlast_throw = 0
end
working_value = 0
# Add numbers directly (unless part of a spare frame)
if ((1..9) === ball.to_i)
working_value = ball.to_i
end
# Add strike as 10 points
if ball == "x"
working_value = 10
end
# Add spare as number of remaining pins from last throw
if ball == "/"
if last_throw == "/" || last_throw == "x"
raise ArgumentError, "Invalid score string. A spare cannot immediately follow a strike or spare."
end
working_value = 10 - last_throw.to_i
end
# Strike / Spare Bonus
if last_throw == "x" || last_throw == "/" || lastlast_throw == "x"
score += working_value
end
# Add current throw value
score += working_value
end
if score > 300 || score < 0
raise ArgumentError, "Invalid score string. Impossible score detected."
end
score
end
end
I can't tell why my code is not calculating a proper score in every test case.
The RSpec:
require "./score_keeper"
describe ScoreKeeper do
describe "calculating score" do
let(:score_keeper) { described_class.new }
context "when rolls are valid" do
{
"xxxxxxxxxxxx" => 300,
"--------------------" => 0,
"9-9-9-9-9-9-9-9-9-9-" => 90,
"5/5/5/5/5/5/5/5/5/5/5" => 150,
"14456/5/---17/6/--2/6" => 82,
"9/3561368153258-7181" => 86,
"9-3/613/815/0/8-7/8-" => 121,
"x3/61xxx2/9-7/xxx" => 193
}.each do |bowling_stats, score|
it "returns #{score} for #{bowling_stats}" do
expect(score_keeper.calculate(bowling_stats)).to eq score
end
end
end
end
end
The failing inputs are:
"5/5/5/5/5/5/5/5/5/5/5" (expected: 150, got: 155),
"x3/61xxx2/9-7/xxx" (expected: 193, got: 223),
"14456/5/---17/6/--2/6" (expected: 82, got: 88)
The first thing I see is your use of gsub:
input.gsub(/-/, "0")
You're not assigning the string returned by gsub to anything, and instead you're throwing it away.
input = '#0#'
input.gsub('0', '-') # => "#-#"
input # => "#0#"
I suspect you're thinking of the mutating gsub! but instead I suggest simply passing the value to split:
_frames = input.gsub(/-/, "0").split(//)
Your code is not idiomatic Ruby; there are a number of things you should do differently:
Instead of if !input.is_a? String use:
unless input.is_a? String
raise ArgumentError, "Score Keeper will only accept string types for score calculation."
end
It's considered better to use unless than a negated test.
Instead of
if input == "xxxxxxxxxxxx"
return 300
end
use a "trailing if":
return 300 if input == "xxxxxxxxxxxx"
Don't name variables with a leading _. _frames should be frames.
Don't name variables like lastFrame, lastlastFrame and workingValue with mixed-case AKA "camelCase". We use "snake_case" for Ruby variables and methods and "CamelCase" (with a leading capital) for classes and modules. It_is_a matterOfReadability.
Don't end lines with a trailing ;:
workingValue = 0;
The only time we use a trailing semicolon is when we're using multiple statements on a single line, which should be extremely rare. Just don't do that unless you know why and when you should.
Consider the potential problem you have here:
"12".include?('1') # => true
"12".include?('2') # => true
"12".include?('12') # => true
While your code might skirt that issue, don't write code like that and think about side-effects. Perhaps you want to really test to see if the value is an integer between 1 and 9?
((1 .. 9) === '1'.to_i) # => true
((1 .. 9) === '2'.to_i) # => true
((1 .. 9) === '12'.to_i) # => false
Instead of using
return score
you can simply use
score
Ruby returns the value of the last expression evaluated in the method; you don't have to explicitly return it.
Indent your code properly. Your future self will appreciate it when you have to dive back into code to debug something. Consistently use two-space indents.
Use whitespace liberally to separate your code into readable blocks. It doesn't affect the run-time speed of your code and it makes it a lot easier to read. Again, your future self will appreciate it.
While it might seem nit-picking, those little things go a long way when coding in a team of developers, and failing to do those things can land you in the hot seat during a code-review.
Your problem appears to be that, for your first two frames, you're adding the last two frames. Consider the following:
arr = [1,2,3,4,5,6,7,8,9]
arr.each_with_index do |num, i|
puts "current number #{num}"
puts arr[i-1]
puts arr[i-2]
end
I think you need an if statement to handle the first two frames, because a negative index wraps around to the end of the array: at index 0, arr[i-1] is arr[-1], the last element.
So you need something like:
arr = [1,2,3,4,5,6,7,8,9]
arr.each_with_index do |num, i|
  puts "current number #{num}"
  if i == 0
    puts "no previous frame"
  elsif i == 1
    puts "#{arr[i-1]} can be added to frame 2"
  else
    puts "#{arr[i-1]} can be added to frame 1"
    puts "#{arr[i-2]} can be added to frame 2"
  end
end
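Another way to sidestep the look-behind bookkeeping is to convert the string to pin counts first and then walk it frame by frame, looking ahead for bonuses. This is only a sketch of that alternative, not the asker's method: pins_per_roll and frame_score are hypothetical names, and it assumes the input is a valid, complete ten-frame game in the question's notation (x, /, - and digits).

```ruby
# Hypothetical helper: turn the score string into pins knocked down per roll.
def pins_per_roll(input)
  rolls = []
  input.each_char do |ch|
    case ch
    when "x" then rolls << 10
    when "/" then rolls << 10 - rolls.last # a spare clears what the previous roll left
    when "-" then rolls << 0
    else rolls << ch.to_i
    end
  end
  rolls
end

# Hypothetical scorer: exactly ten frames are scored; any rolls after the
# tenth frame are only ever read as bonus look-ahead.
def frame_score(input)
  rolls = pins_per_roll(input)
  i = 0
  total = 0
  10.times do
    if rolls[i] == 10                              # strike: next two rolls are the bonus
      total += 10 + rolls[i + 1].to_i + rolls[i + 2].to_i
      i += 1
    elsif rolls[i].to_i + rolls[i + 1].to_i == 10  # spare: next roll is the bonus
      total += 10 + rolls[i + 2].to_i
      i += 2
    else                                           # open frame
      total += rolls[i].to_i + rolls[i + 1].to_i
      i += 2
    end
  end
  total
end

puts frame_score("5/5/5/5/5/5/5/5/5/5/5") # 150
puts frame_score("14456/5/---17/6/--2/6") # 82
puts frame_score("x3/61xxx2/9-7/xxx")     # 193
```

With this layout the bonus rolls after the tenth frame are only read as look-ahead and are never scored as rolls of their own.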
Could you please help me improve this piece of code?
def print_last_frame(result, frame, i)
line = ''
if frame.strike?
if frame.result.reduce(:+) == 30
line += "X\t X\t X"
elsif frame.result.reduce(:+) == 20
line += "X\t #{frame.result[1]}\t /"
else
line += "X\t #{frame.result[1]}\t #{frame.result[2]}"
end
elsif frame.spare?
if frame.result.reduce(:+) == 20
line += "#{frame.result[0]}\t /\t X"
else
line += "#{frame.result[0]}\t /\t #{frame.result[2]}"
end
else
line += "#{result.shots[i]}\t #{result.shots[i+1]}"
end
line
end
I'm concerned about the conditionals.
The first idea I came up with: using a Hash with Hash#default:
line = ""
strike = {20 => "20", 30 => "30"}
strike.default = "other"
line += strike[0]
line
#=> "other"
I did not test this, but you should be able to write something like:
def print_last_frame(result, frame, i)
  rolls = frame.result # renamed so it does not shadow the result argument
  shots = result.shots
  cases = {
    strike: {20 => "X\t #{rolls[1]}\t /",
             30 => "X\t X\t X"},
    spare: {20 => "#{rolls[0]}\t /\t X"}
  }
  cases[:strike].default = "X\t #{rolls[1]}\t #{rolls[2]}"
  cases[:spare].default = "#{rolls[0]}\t /\t #{rolls[2]}"
  cases.default = {}
  cases.default.default = "#{shots[i]}\t #{shots[i+1]}"
  cases[frame.value][rolls.sum]
end
This assumes frame.value returns :strike, :spare or nil.
I suggest you perform your calculations as follows.
def last_frame(result, frame, i)
case
when frame.strike?
strike(frame)
when frame.spare?
spare(frame)
else
shots(result, i)
end
end
def strike(frame)
case frame.result.sum
when 30
"X\t X\t X"
when 20
"X\t #{frame.result[1]}\t /"
else
"X\t #{frame.result[1]}\t #{frame.result[2]}"
end
end
def spare(frame)
case frame.result.sum
when 20
"#{frame.result[0]}\t /\t X"
else
"#{frame.result[0]}\t /\t #{frame.result[2]}"
end
end
def shots(result, i)
"#{result.shots[i]}\t #{result.shots[i+1]}"
end
Here are some arguments for organizing your calculations this way.
breaking it into four methods makes it clear which arguments of last_frame are used by each case considered (strike and spare use only frame; shots uses result and i)
breaking it into four methods facilitates testing
there is no need for the variable line, initialized to an empty string then appended to (line += ...), as line is only set to a single value (i.e., there is no accumulation).
using puts last_frame(result, frame, i) to print results, rather than incorporating puts in the method, increases the method's flexibility
I have a preference for using case statements rather than if-elsif-else-end constructs, in part because I think they look a bit neater (even when case has no argument, as in last_frame).
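To illustrate the point about testing, here is a rough sketch of how strike and spare can be exercised in isolation. FakeFrame is a hypothetical stand-in for the asker's frame object (only result is needed), and the two methods are copied from above in compact form.

```ruby
# Hypothetical stand-in for the real frame class; only #result is used here.
FakeFrame = Struct.new(:result)

def strike(frame)
  case frame.result.sum
  when 30 then "X\t X\t X"
  when 20 then "X\t #{frame.result[1]}\t /"
  else "X\t #{frame.result[1]}\t #{frame.result[2]}"
  end
end

def spare(frame)
  case frame.result.sum
  when 20 then "#{frame.result[0]}\t /\t X"
  else "#{frame.result[0]}\t /\t #{frame.result[2]}"
  end
end

# Each pair is [actual, expected].
checks = [
  [strike(FakeFrame.new([10, 10, 10])), "X\t X\t X"],
  [strike(FakeFrame.new([10, 7, 3])),   "X\t 7\t /"],
  [strike(FakeFrame.new([10, 4, 2])),   "X\t 4\t 2"],
  [spare(FakeFrame.new([6, 4, 10])),    "6\t /\t X"],
  [spare(FakeFrame.new([6, 4, 3])),     "6\t /\t 3"]
]

checks.each do |actual, expected|
  puts "#{actual == expected ? 'ok  ' : 'FAIL'} #{expected.inspect}"
end
```

Because neither method prints anything, the checks only need to compare returned strings.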
I wrote a simple guess the number game. But it keeps looping even when I input the correct number. Please help, thanks!
puts "Pick a number between 0 - 1000."
user_guess = gets.chomp.to_i
my_num = rand(831)
guess_count = 0
until user_guess == my_num do
if user_guess == my_num
guess_count += 1
puts "you got it!"
elsif user_guess <= 830
guess_count += 1
puts "higher"
else user_guess >= 1000
guess_count += 1
puts "lower"
end
end
puts "You guessed my number in #{guess_count} attempts. Not bad"
The part of the code that asks for a number from the user is outside the loop, so it will not repeat after the answer is checked. If you want to ask the user to guess again when their guess is wrong, that code needs to be inside the loop.
my_num = rand(831)
guess_count = 0
keep_going = true
while keep_going do
puts "Pick a number between 0 - 1000."
user_guess = gets.chomp.to_i
if user_guess == my_num
guess_count += 1
puts "you got it!"
keep_going = false
elsif user_guess <= 830
guess_count += 1
puts "higher"
else user_guess >= 1000
guess_count += 1
puts "lower"
end
end
puts "You guessed my number in #{guess_count} attempts. Not bad"
This code still has some bugs in it that stop the game from working correctly, though; see if you can spot what they are.
As @Tobias has answered your question, I would like to take some time to suggest how you might make your code more Ruby-like.
Firstly, while you could use a while or until loop, I suggest you rely mainly on the method Kernel#loop for most loops you will write. This simply causes looping to continue within loop's block until the keyword break is encountered (see note 1 at the end of this answer). It is much like while true or until false (commonly used in some languages) but I think it reads better. More importantly, the use of loop protects computations within its block from prying eyes. (See the section Other considerations below for an example of this point.)
You can also exit loop's block by executing return or exit, but normally you will use break.
My second main suggestion is that for this type of problem you use a case statement rather than an if/elsif/else/end construct. Let's first do that using ranges.
Use a case statement with ranges
my_num = rand(831)
guess_count = 0
loop do
print "Pick a number between 0 and 830: "
guess_count += 1
case gets.chomp.to_i
when my_num
puts "you got it!"
break
when 0..my_num-1
puts "higher"
else
puts "lower"
end
end
There are a few things to note here.
I used print rather than puts so the user will enter their response on the same line as the prompt.
guess_count is incremented regardless of the user's response so that can be done before the case statement is executed.
there is no need to assign the user's response (gets.chomp.to_i) to a variable.
case statements compare values with the appropriate case equality method ===.
With regard to the last point, here we are comparing an integer (gets.chomp.to_i) with another integer (my_num) and with a range (0..my_num-1). In the first instance, Integer#=== is used, which is equivalent to Integer#==. For ranges the method Range#=== is used.
Suppose, for example, that my_num = 100 and gets.chomp.to_i #=> 50. The case statement then reads as follows.
case 50
when 100
puts "you got it!"
break
when 0..99
puts "higher"
else
puts "lower"
end
Here we find that 100 === 50 #=> false and (0..99) === 50 #=> true, so "higher" is displayed. (0..99) === 50 returns true because the integer (on the right of ===) is covered by the range (on the left). That is not the same as 50 === (0..99), which loosely reads, "(0..99) is a member of 50", so false is returned.
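The asymmetry of === is easy to check directly. The snippet below is just an illustration with made-up values; the receiver on the left decides which === method runs.

```ruby
puts((0..99) === 50)      # true:  Range#=== asks whether 50 lies in the range
puts(50 === (0..99))      # false: Integer#=== compares 50 with a Range object
puts(Integer === 50)      # true:  Module#=== asks whether 50 is an Integer
puts(50 === Integer)      # false: 50 is not equal to the class Integer
puts(/\A#/ === "# note")  # true:  Regexp#=== asks whether the string matches
```

This is why the value being tested goes after case and the patterns go after when.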
Here are a couple more examples of how case statements can be used to advantage because of their reliance on the triple equality method.
case obj
when Integer
obj + 10
when String
obj.upcase
when Array
obj.reverse
...
end
case str
when /\A#/
puts "A comment"
when /\blaunch missiles\b/
big_red_button.push
...
end
Use a case statement with the spaceship operator <=>
The spaceship operator is used by Ruby's Array#sort and Enumerable#sort methods, but has other uses, as in case statements. Here we can use Integer#<=> to compare two integers.
my_num = rand(831)
guess_count = 0
loop do
print "Pick a number between 0 and 830: "
guess_count += 1
case gets.chomp.to_i <=> my_num
when 0
puts "you got it!"
break
when -1
puts "higher"
else # 1
puts "lower"
end
end
In other applications the spaceship operator might be used to compare strings (String#<=>), arrays (Array#<=>), Date objects (Date#<=>) and so on.
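A few quick examples of <=> on those other types, using arbitrary sample values. The last line shows that <=> returns nil when the two operands cannot be compared.

```ruby
require "date"

puts("apple" <=> "banana")                           # -1
puts([1, 2, 3] <=> [1, 2, 4])                        # -1
puts(Date.new(2020, 1, 2) <=> Date.new(2020, 1, 1))  # 1
puts((1 <=> "a").inspect)                            # nil
```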
Use a hash
Hashes can often be used as an alternative to case statements. Here we could write the following.
response = { -1=>"higher", 0=>"you got it!", 1=>"lower" }
my_num = rand(831)
guess_count = 0
loop do
print "Pick a number between 0 and 830: "
guess_count += 1
guess = gets.chomp.to_i
puts response[guess <=> my_num]
break if guess == my_num
end
Here we need the value of gets.chomp.to_i twice, so I've saved it to a variable.
Other considerations
Suppose we write the following:
i = 0
while i < 5
i += 1
j = i
end
j #=> 5
j following the loop is found to equal 5.
If we instead use loop:
i = 0
loop do
i += 1
j = i
break if i == 5
end
j #=> NameError (undefined local variable or method 'j')
Although while and loop both have access to i, loop confines local variables created in its block to the block. That's because blocks create a new scope, which is good coding practice. while and until do not use blocks. We generally don't want code following the loop to have access to local variables created within the loop, which is one reason for favouring loop over while and until.
Lastly, the keyword break can also be used with an argument whose value is returned by loop. For example:
def m
i = 0
loop do
i += 1
break 5*i if i == 10
end
end
m #=> 50
or
i = 0
n = loop do
i += 1
break 5*i if i == 10
end
n #=> 50
1. If you examine the doc for Kernel#loop you will see that executing break from within loop's block is equivalent to raising a StopIteration exception.
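As a small illustration of that note: Kernel#loop rescues StopIteration, so an external enumerator that runs out of elements ends the loop without an explicit break. The names enum and seen are only for this example.

```ruby
enum = [1, 2, 3].each   # an external enumerator
seen = []

loop do
  seen << enum.next     # enum.next raises StopIteration after the third element
end

p seen # [1, 2, 3]
```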
# Character Counter
class String
  def count_char
    @lcase_count, @upcase_count, @num_count, @spl_char_count = [0, 0, 0, 0]
    each_char { |char|
      if ('a'..'z').cover?(char)
        @lcase_count += 1
      elsif ('A'..'Z').cover?(char)
        @upcase_count += 1
      elsif ('0'..'9').cover?(char)
        @num_count += 1
      else
        @spl_char_count += 1
      end
    }
    return @lcase_count, @upcase_count, @num_count, @spl_char_count
  end
end
input = ARGV[0]
if ARGV.empty?
puts 'Please provide an input'
exit
end
puts 'Lowercase characters = %d' % [input.count_char[0]]
puts 'Uppercase characters = %d' % [input.count_char[1]]
puts 'Numeric characters = %d' % [input.count_char[2]]
puts 'Special characters = %d' % [input.count_char[3]]
Traceback (most recent call last):
1: from new.rb:25:in `<main>'
new.rb:3:in `count_char': can't modify frozen String (FrozenError)
As far as I can tell, I didn't modify the string, so I'm not sure why I'm getting a FrozenError.
You are monkeypatching the String class and at the same time introducing new instance variables to String, which is already a terrible design decision because, unless you are the author of the String class, you don't know whether those variables already exist. Then, in your code, you modify the variables by incrementing them. Setting an instance variable on a frozen object raises FrozenError, and since ARGV is an array of frozen strings, you get the error.
Using instance variables here is absolutely unnecessary. Just use normal local variables.
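A minimal sketch of both points, assuming Ruby 2.5 or later (where FrozenError exists). count_chars is a hypothetical plain method that takes the string as an argument instead of patching String, and it keeps its counters in local variables.

```ruby
# Hypothetical replacement: no monkeypatching, no instance variables.
def count_chars(str)
  lower = upper = digits = other = 0
  str.each_char do |char|
    case char
    when "a".."z" then lower += 1
    when "A".."Z" then upper += 1
    when "0".."9" then digits += 1
    else other += 1
    end
  end
  [lower, upper, digits, other]
end

frozen = "Ab1!".freeze

begin
  # Same kind of failure as in the question: an ivar write on a frozen object.
  frozen.instance_variable_set(:@lcase_count, 0)
rescue FrozenError => e
  puts "#{e.class} raised"
end

p count_chars(frozen) # [1, 1, 1, 1]
```

Reading from a frozen string is fine; only writes to the object, including its instance variables, are rejected.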
It's impossible to tell exactly what is wrong with your code; it looks like one of the instance variables you use is initialized as a string or similar. Introducing instance variables in foreign classes is not good practice in general, and you also abuse each for reducing. Here is idiomatic Ruby code for your task:
class String
def count_char
each_char.with_object(
{lcase_count: 0, upcase_count: 0, num_count: 0, spl_char_count: 0}
) do |char, acc|
case char
when 'a'..'z' then acc[:lcase_count] += 1
when 'A'..'Z' then acc[:upcase_count] += 1
when '0'..'9' then acc[:num_count] += 1
else acc[:spl_char_count] += 1
end
end
end
end
Please note that this code deals with the simple Latin alphabet only. A better approach would be to match regular expressions with Unicode properties, like:
lcase_count = scan(/\p{Lower}/).count
upcase_count = scan(/\p{Upper}/).count
...
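A rough sketch of that regular-expression approach, assuming the lowercase \p{...} form (the uppercase \P{...} form negates the property). The sample string is made up and includes two accented letters to show that they are counted as well.

```ruby
str = "H\u00e9llo W\u00f6rld 42!" # "Héllo Wörld 42!"

lower  = str.scan(/\p{Lower}/).count
upper  = str.scan(/\p{Upper}/).count
digits = str.scan(/\p{Digit}/).count
other  = str.length - lower - upper - digits

p [lower, upper, digits, other] # [8, 2, 2, 3]
```

The "other" count here is simply whatever is left over: two spaces and the exclamation mark.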
You can try the following:
class String
def count_char
chars = { lcase_count: 0 ,upcase_count: 0, num_count: 0, spl_char_count: 0 }
each_char do |char|
if ('a'..'z').cover?(char)
chars[:lcase_count] += 1
elsif ('A'..'Z').cover?(char)
chars[:upcase_count] += 1
elsif ('0'..'9').cover?(char)
chars[:num_count] += 1
else
chars[:spl_char_count] += 1
end
end
return chars
end
end
str = 'Asdssd'
# => "Asdssd"
str.count_char
# => {:lcase_count=>5, :upcase_count=>1, :num_count=>0, :spl_char_count=>0}
str.count_char[:upcase_count]
# => 1
I couldn't find documentation stating that the strings in ARGV are frozen, but that seems to be the case.
You can use dup to fix your error:
input = ARGV[0].dup
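For illustration, with a frozen literal standing in for ARGV[0]: dup returns an unfrozen copy, so the copy can be modified while the original stays frozen.

```ruby
frozen = "abc".freeze # stand-in for ARGV[0]
copy = frozen.dup

puts frozen.frozen? # true
puts copy.frozen?   # false

copy << "d"         # modifying the copy is allowed
puts copy           # abcd
```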
I have a Ruby script that does the following to a text file:
removes non-ASCII lines
removes lines containing "::" (two colons in a row)
if there is more than one ":" present in the line (which aren't directly next to each other), it only keeps the strings on both sides of the last colon.
removes leading whitespace
removes unusual control characters
The problem is, I'm working with files that have ~20 million lines, and my script says it'll take ~45 minutes to run.
Is there a way to majorly speed this up? Or, is there a significantly quicker way to handle this in shell?
require 'ruby-progressbar'
class String
def strip_control_characters()
chars.each_with_object("") do |char, str|
str << char unless char.ascii_only? and (char.ord < 32 or char.ord == 127)
end
end
def strip_control_and_extended_characters()
chars.each_with_object("") do |char, str|
str << char if char.ascii_only? and char.ord.between?(32,126)
end
end
end
class Numeric
def percent_of(n)
self.to_f / n.to_f * 100.0
end
end
def clean(file_in,file_out)
if !File.exist?(file_in)
puts "File '#{file_in}' does not exist."
return
end
File.delete(file_out) if File.exist?(file_out)
`touch #{file_out}`
deleted = 0
count = 0
line_count = `wc -l "#{file_in}"`.strip.split(' ')[0].to_i
puts "File has #{line_count} lines. Cleaning..."
progressbar = ProgressBar.create(total: line_count, length: 100, format: 'Progress |%B| %a %e')
IO.foreach(file_in) {|x|
if x.ascii_only?
line = x.strip_control_and_extended_characters.strip
if line == ""
deleted += 1
next
end
if line.include?("::")
deleted += 1
next
end
split = line.split(":")
c = split.count
if c == 1
deleted += 1
next
end
if c > 2
line = split.last(2).join(":")
end
if line != ""
File.open(file_out, 'a') { |f| f.puts(line) }
else
deleted += 1
end
else
deleted += 1
end
progressbar.progress += 1
}
puts "Deleted #{deleted} lines."
end
Here is one of your big problems:
if line != ""
File.open(file_out, 'a') { |f| f.puts(line) }
end
So your program needs to open and close the output file millions of times because it is doing that for every single line. Each time it opens it, since it is being opened in append mode, your system might have to do a lot of work to find the end of the file.
You should really change your program to open the output file once at the beginning and only close it at the end. Also, run strace to see what your Ruby I/O operations are doing behind the scenes; it should buffer up the writes and then send them to the OS in blocks of about 4 kilobytes at a time; it shouldn't issue a write system call for every single line.
To further improve the performance, you should use a Ruby profiling tool to see which functions are taking the most time.
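A minimal sketch of the "open once" advice. clean_once is a hypothetical, simplified stand-in for the asker's clean method (no progress bar, no deleted count), and the sample input is invented. The point is only that File.open wraps the whole loop, so the output file is opened and closed exactly once.

```ruby
require "tmpdir"

# Hypothetical, simplified version of the asker's clean method.
def clean_once(file_in, file_out)
  kept = 0
  File.open(file_out, "w") do |out|            # opened a single time
    File.foreach(file_in) do |raw|
      next unless raw.ascii_only?              # drop non-ASCII lines
      line = raw.gsub(/[^\x20-\x7E]/, "").strip # drop control characters
      next if line.empty? || line.include?("::")
      parts = line.split(":")
      next if parts.size < 2                   # no usable colon
      out.puts(parts.last(2).join(":"))        # buffered by Ruby's IO layer
      kept += 1
    end
  end                                          # closed (and flushed) here
  kept
end

Dir.mktmpdir do |dir|
  src = File.join(dir, "in.txt")
  dst = File.join(dir, "out.txt")
  File.write(src, "a:b:c\nskip::me\nnocolon\n  x:y\ncaf\u00e9:z\n")
  clean_once(src, dst)
  print File.read(dst) # prints "b:c" and "x:y" on separate lines
end
```

Opening with "w" also creates or truncates the file, so the separate delete and touch steps are not needed in this variant.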
You can improve the speed by changing your String additions to variations on:
class String
def strip_control_characters()
gsub(/[[:cntrl:]]+/, '')
end
def strip_control_and_extended_characters()
strip_control_characters.gsub(/[^[:ascii:]]+/, '')
end
end
str = (0..255).to_a.map { |b| b.chr }.join # => "\x00\x01\x02\x03\x04\x05\x06\a\b\t\n\v\f\r\x0E\x0F\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1A\e\x1C\x1D\x1E\x1F !\"\#$%&'()*+,-./0123456789:;<=>?#ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\x7F\x80\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8A\x8B\x8C\x8D\x8E\x8F\x90\x91\x92\x93\x94\x95\x96\x97\x98\x99\x9A\x9B\x9C\x9D\x9E\x9F\xA0\xA1\xA2\xA3\xA4\xA5\xA6\xA7\xA8\xA9\xAA\xAB\xAC\xAD\xAE\xAF\xB0\xB1\xB2\xB3\xB4\xB5\xB6\xB7\xB8\xB9\xBA\xBB\xBC\xBD\xBE\xBF\xC0\xC1\xC2\xC3\xC4\xC5\xC6\xC7\xC8\xC9\xCA\xCB\xCC\xCD\xCE\xCF\xD0\xD1\xD2\xD3\xD4\xD5\xD6\xD7\xD8\xD9\xDA\xDB\xDC\xDD\xDE\xDF\xE0\xE1\xE2\xE3\xE4\xE5\xE6\xE7\xE8\xE9\xEA\xEB\xEC\xED\xEE\xEF\xF0\xF1\xF2\xF3\xF4\xF5\xF6\xF7\xF8\xF9\xFA\xFB\xFC\xFD\xFE\xFF"
str.strip_control_characters
# => " !\"\#$%&'()*+,-./0123456789:;<=>?#ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\x80\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8A\x8B\x8C\x8D\x8E\x8F\x90\x91\x92\x93\x94\x95\x96\x97\x98\x99\x9A\x9B\x9C\x9D\x9E\x9F\xA0\xA1\xA2\xA3\xA4\xA5\xA6\xA7\xA8\xA9\xAA\xAB\xAC\xAD\xAE\xAF\xB0\xB1\xB2\xB3\xB4\xB5\xB6\xB7\xB8\xB9\xBA\xBB\xBC\xBD\xBE\xBF\xC0\xC1\xC2\xC3\xC4\xC5\xC6\xC7\xC8\xC9\xCA\xCB\xCC\xCD\xCE\xCF\xD0\xD1\xD2\xD3\xD4\xD5\xD6\xD7\xD8\xD9\xDA\xDB\xDC\xDD\xDE\xDF\xE0\xE1\xE2\xE3\xE4\xE5\xE6\xE7\xE8\xE9\xEA\xEB\xEC\xED\xEE\xEF\xF0\xF1\xF2\xF3\xF4\xF5\xF6\xF7\xF8\xF9\xFA\xFB\xFC\xFD\xFE\xFF"
str.strip_control_and_extended_characters
# => " !\"\#$%&'()*+,-./0123456789:;<=>?#ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~"
Use the built-in gsub method along with the POSIX character-sets instead of iterating over the strings and testing each character.
As @Myst said though, monkey-patching is rude. Use refinements, or create some methods and pass in the string:
def strip_control_characters(str)
str.gsub(/[[:cntrl:]]+/, '')
end
def strip_control_and_extended_characters(str)
strip_control_characters(str).gsub(/[^[:ascii:]]+/, '')
end
str = (0..255).to_a.map { |b| b.chr }.join # => "\x00\x01\x02\x03\x04\x05\x06\a\b\t\n\v\f\r\x0E\x0F\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1A\e\x1C\x1D\x1E\x1F !\"\#$%&'()*+,-./0123456789:;<=>?#ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\x7F\x80\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8A\x8B\x8C\x8D\x8E\x8F\x90\x91\x92\x93\x94\x95\x96\x97\x98\x99\x9A\x9B\x9C\x9D\x9E\x9F\xA0\xA1\xA2\xA3\xA4\xA5\xA6\xA7\xA8\xA9\xAA\xAB\xAC\xAD\xAE\xAF\xB0\xB1\xB2\xB3\xB4\xB5\xB6\xB7\xB8\xB9\xBA\xBB\xBC\xBD\xBE\xBF\xC0\xC1\xC2\xC3\xC4\xC5\xC6\xC7\xC8\xC9\xCA\xCB\xCC\xCD\xCE\xCF\xD0\xD1\xD2\xD3\xD4\xD5\xD6\xD7\xD8\xD9\xDA\xDB\xDC\xDD\xDE\xDF\xE0\xE1\xE2\xE3\xE4\xE5\xE6\xE7\xE8\xE9\xEA\xEB\xEC\xED\xEE\xEF\xF0\xF1\xF2\xF3\xF4\xF5\xF6\xF7\xF8\xF9\xFA\xFB\xFC\xFD\xFE\xFF"
strip_control_characters(str)
# => " !\"\#$%&'()*+,-./0123456789:;<=>?#ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\x80\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8A\x8B\x8C\x8D\x8E\x8F\x90\x91\x92\x93\x94\x95\x96\x97\x98\x99\x9A\x9B\x9C\x9D\x9E\x9F\xA0\xA1\xA2\xA3\xA4\xA5\xA6\xA7\xA8\xA9\xAA\xAB\xAC\xAD\xAE\xAF\xB0\xB1\xB2\xB3\xB4\xB5\xB6\xB7\xB8\xB9\xBA\xBB\xBC\xBD\xBE\xBF\xC0\xC1\xC2\xC3\xC4\xC5\xC6\xC7\xC8\xC9\xCA\xCB\xCC\xCD\xCE\xCF\xD0\xD1\xD2\xD3\xD4\xD5\xD6\xD7\xD8\xD9\xDA\xDB\xDC\xDD\xDE\xDF\xE0\xE1\xE2\xE3\xE4\xE5\xE6\xE7\xE8\xE9\xEA\xEB\xEC\xED\xEE\xEF\xF0\xF1\xF2\xF3\xF4\xF5\xF6\xF7\xF8\xF9\xFA\xFB\xFC\xFD\xFE\xFF"
strip_control_and_extended_characters(str)
# => " !\"\#$%&'()*+,-./0123456789:;<=>?#ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~"
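Refinements are mentioned above but not shown, so here is a rough sketch of what that option could look like. StringCleaning and Demo are hypothetical names. The refined method is visible only where using is active, so String is not changed for the rest of the program.

```ruby
module StringCleaning
  refine String do
    def strip_control_characters
      gsub(/[[:cntrl:]]+/, "")
    end
  end
end

module Demo
  using StringCleaning # active only inside this module body
  CLEANED = "a\tb\x00c\x7Fd".strip_control_characters
end

p Demo::CLEANED                              # "abcd"
p "x".respond_to?(:strip_control_characters) # false
```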
Moving on...
`touch #{file_out}`
is a problem too. You're creating a sub-shell every time that runs, executing touch and then tearing it down, which is a slow operation. Let Ruby do it:
=== Implementation from FileUtils
------------------------------------------------------------------------------
touch(list, noop: nil, verbose: nil, mtime: nil, nocreate: nil)
------------------------------------------------------------------------------
Updates modification time (mtime) and access time (atime) of file(s) in list.
Files are created if they don't exist.
FileUtils.touch 'timestamp'
FileUtils.touch Dir.glob('*.c'); system 'make'
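In the script that would look roughly like this. The temporary directory and file name are only there to keep the example self-contained.

```ruby
require "fileutils"
require "tmpdir"

Dir.mktmpdir do |dir|
  path = File.join(dir, "out.txt")
  FileUtils.touch(path)  # creates the empty file without spawning a sub-shell
  puts File.exist?(path) # true
  puts File.size(path)   # 0
end
```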
Finally, learn to benchmark code as you develop. Take the time to think of a couple ways to do something, then test them against each other and find out which is the fastest. I use Fruity, because it handles issues that the Benchmark class doesn't, but do one or the other. You can find a lot of tests I did here for various things by searching SO for my user and "benchmark".
require 'fruity'
class String
def strip_control_characters()
chars.each_with_object("") do |char, str|
str << char unless char.ascii_only? and (char.ord < 32 or char.ord == 127)
end
end
def strip_control_and_extended_characters()
chars.each_with_object("") do |char, str|
str << char if char.ascii_only? and char.ord.between?(32,126)
end
end
end
def strip_control_characters2(str)
str.gsub(/[[:cntrl:]]+/, '')
end
def strip_control_and_extended_characters2(str)
strip_control_characters2(str).gsub(/[^[:ascii:]]+/, '')
end
str = (0..255).to_a.map { |b| b.chr }.join
str.strip_control_characters # => " !\"\#$%&'()*+,-./0123456789:;<=>?#ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\x80\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8A\x8B\x8C\x8D\x8E\x8F\x90\x91\x92\x93\x94\x95\x96\x97\x98\x99\x9A\x9B\x9C\x9D\x9E\x9F\xA0\xA1\xA2\xA3\xA4\xA5\xA6\xA7\xA8\xA9\xAA\xAB\xAC\xAD\xAE\xAF\xB0\xB1\xB2\xB3\xB4\xB5\xB6\xB7\xB8\xB9\xBA\xBB\xBC\xBD\xBE\xBF\xC0\xC1\xC2\xC3\xC4\xC5\xC6\xC7\xC8\xC9\xCA\xCB\xCC\xCD\xCE\xCF\xD0\xD1\xD2\xD3\xD4\xD5\xD6\xD7\xD8\xD9\xDA\xDB\xDC\xDD\xDE\xDF\xE0\xE1\xE2\xE3\xE4\xE5\xE6\xE7\xE8\xE9\xEA\xEB\xEC\xED\xEE\xEF\xF0\xF1\xF2\xF3\xF4\xF5\xF6\xF7\xF8\xF9\xFA\xFB\xFC\xFD\xFE\xFF"
strip_control_characters2(str) # => " !\"\#$%&'()*+,-./0123456789:;<=>?#ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\x80\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8A\x8B\x8C\x8D\x8E\x8F\x90\x91\x92\x93\x94\x95\x96\x97\x98\x99\x9A\x9B\x9C\x9D\x9E\x9F\xA0\xA1\xA2\xA3\xA4\xA5\xA6\xA7\xA8\xA9\xAA\xAB\xAC\xAD\xAE\xAF\xB0\xB1\xB2\xB3\xB4\xB5\xB6\xB7\xB8\xB9\xBA\xBB\xBC\xBD\xBE\xBF\xC0\xC1\xC2\xC3\xC4\xC5\xC6\xC7\xC8\xC9\xCA\xCB\xCC\xCD\xCE\xCF\xD0\xD1\xD2\xD3\xD4\xD5\xD6\xD7\xD8\xD9\xDA\xDB\xDC\xDD\xDE\xDF\xE0\xE1\xE2\xE3\xE4\xE5\xE6\xE7\xE8\xE9\xEA\xEB\xEC\xED\xEE\xEF\xF0\xF1\xF2\xF3\xF4\xF5\xF6\xF7\xF8\xF9\xFA\xFB\xFC\xFD\xFE\xFF"
str.strip_control_and_extended_characters # => " !\"\#$%&'()*+,-./0123456789:;<=>?#ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~"
strip_control_and_extended_characters2(str) # => " !\"\#$%&'()*+,-./0123456789:;<=>?#ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~"
compare do
scc { str.strip_control_characters }
scc2 { strip_control_characters2(str) }
end
# >> Running each test 512 times. Test will take about 1 second.
# >> scc2 is faster than scc by 10x ± 1.0
and:
compare do
scec { str.strip_control_and_extended_characters }
scec2 { strip_control_and_extended_characters2(str) }
end
# >> Running each test 256 times. Test will take about 1 second.
# >> scec2 is faster than scec by 5x ± 1.0
There seem to be only two possible approaches to optimizing this:
Concurrency.
If your machine is a Unix/Linux based machine that has a multi-core CPU, you can take advantage of the multi-cores by using fork, dividing up the work between different processes.
Multi-threading might not work as well as you'd expect with Ruby, since there's a GIL (Global Interpreter Lock) that prevents multiple threads from running Ruby code in parallel.
Code optimizations.
These include minimizing system calls (such as the File.open) and minimizing any temporary objects.
I would start with this approach before I moved on to fork, mainly due to the extra coding required when using fork.
The first approach requires a large rewrite of the script, while the second approach might be more easily achieved.
For example, the following approach minimizes some system calls (such as the File's open, close and write system calls):
require 'ruby-progressbar'
class String
def strip_control_characters()
chars.each_with_object("") do |char, str|
str << char unless char.ascii_only? and (char.ord < 32 or char.ord == 127)
end
end
def strip_control_and_extended_characters()
chars.each_with_object("") do |char, str|
str << char if char.ascii_only? and char.ord.between?(32,126)
end
end
end
class Numeric
def percent_of(n)
self.to_f / n.to_f * 100.0
end
end
def clean(file_in,file_out)
if !File.exist?(file_in)
puts "File '#{file_in}' does not exist."
return
end
File.delete(file_out) if File.exist?(file_out)
`touch #{file_out}`
deleted = 0
count = 0
line_count = `wc -l "#{file_in}"`.strip.split(' ')[0].to_i
puts "File has #{line_count} lines. Cleaning..."
progressbar = ProgressBar.create(total: line_count, length: 100, format: 'Progress |%B| %a %e')
file_fd = File.open(file_out, 'a')
buffer = "".dup
IO.foreach(file_in) {|x|
if x.ascii_only?
line = x.strip_control_and_extended_characters.strip
if line == ""
deleted += 1
next
end
if line.include?("::")
deleted += 1
next
end
split = line.split(":")
c = split.count
if c == 1
deleted += 1
next
end
if c > 2
line = split.last(2).join(":")
end
if line != ""
buffer << line << "\n" # append in place; += would allocate a new string for every line
else
deleted += 1
end
else
deleted += 1
end
if buffer.length >= 2048
file_fd.puts(buffer)
buffer.clear
end
progressbar.progress += 1
}
file_fd.puts(buffer)
buffer.clear
file_fd.close
puts "Deleted #{deleted} lines."
end
P.S.
I would avoid monkey patching - it's rude.
After posting this I read @DavidGrayson's answer, which pinpoints an issue with your code's performance much more succinctly.
I up-voted his answer, as I think you'll get a big performance gain from this simple change.
This function is supposed to take a string and return the characters in reverse order.
def reverse(string)
reversedString = "";
i = string.length - 1
while i >= 0
reversedString = reversedString + string[i]
i -= 1
end
puts reversedString
end
however all the tests return false:
puts(
'reverse("abc") == "cba": ' + (reverse("abc") == "cba").to_s
)
puts(
'reverse("a") == "a": ' + (reverse("a") == "a").to_s
)
puts(
'reverse("") == "": ' + (reverse("") == "").to_s
)
Does anyone see what the problem is?
Try to use the default String class reverse method like this:
"Hello World".reverse
"Hello World".reverse!
Check Ruby's String class API at https://ruby-doc.org/core-2.4.0/String.html
If you want to write your own method, you could iterate over the characters and prepend each one, like this:
string = String.new
"Hello World".chars.each { | c | string.prepend c }
The problem is your function isn't returning its result, it's printing it. It needs to return reversedString.
As a rule of thumb, functions should return their result. Another function should format and print it.
def reverse(string)
reversedString = "";
i = string.length - 1
while i >= 0
reversedString = reversedString + string[i]
i -= 1
end
return reversedString
end
Note: This was probably an exercise, but Ruby already has String#reverse.
It's good that you're writing tests, but the way you're writing them it's hard to tell what went wrong. Look into a Ruby testing framework like MiniTest.
require "minitest/autorun"
class TestReverse < Minitest::Test
def test_reverse
assert_equal "cba", reverse("abc")
assert_equal "a", reverse("a")
assert_equal "", reverse("")
end
end
That would have told you that your function is returning nil.
1) Failure:
TestReverse#test_reverse [test.rb:16]:
Expected: "cba"
Actual: nil
To make this more Ruby-like yet avoid using the built-in String#reverse method you'd do this:
def reverse(string)
string.chars.reverse.join('')
end
Remember that in Ruby the result of the last operation is automatically the return value of the method. In your case the last operation is puts which always returns nil, eating your value. You want to pass it through.
Try to design methods with a simple mandate, that is, this function should focus on doing one job and one job only: reversing a string. Displaying it is beyond that mandate, so that's a job for another method, like perhaps the caller.
To avoid calling any sort of reverse method at all:
def reverse(string)
result = ''
length = string.length
length.times do |i|
result << string[length - 1 - i]
end
result
end
You can avoid for almost completely, and while most of the time, if you use things like times or ranges (0..n) to iterate.
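As one more variation on the same idea, Integer#downto can produce the indices in reverse order directly. reverse_with_downto is a hypothetical name used only for this sketch.

```ruby
def reverse_with_downto(string)
  # Walk the indices from the last one down to 0 and collect the characters.
  (string.length - 1).downto(0).map { |i| string[i] }.join
end

p reverse_with_downto("abc") # "cba"
p reverse_with_downto("")    # ""
```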
puts prints and returns nil, so the whole method returns nil. If, for debugging reasons, you want to inspect what your method is returning, use p, which returns its argument (reversedString in this case).
def reverse(string)
reversedString = ""
i = string.length - 1
while i >= 0
reversedString = reversedString + string[i]
i -= 1
end
p reversedString # !!!
end
With that change, all 3 tests return true.
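The difference is easy to see side by side. Both of these hypothetical helpers print their argument, but only the one using p hands it back to the caller.

```ruby
def show_with_puts(value)
  puts value # prints, then returns nil
end

def show_with_p(value)
  p value    # prints value.inspect, then returns value
end

a = show_with_puts("cba")
b = show_with_p("cba")

p a # nil
p b # "cba"
```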
If I was going to do this, I'd probably take advantage of an array:
ary = 'foo bar baz'.chars
reversed_ary = []
ary.size.times do
reversed_ary << ary.pop
end
reversed_ary.join # => "zab rab oof"
pop removes the last character from the array and returns it, so basically it's walking backwards through ary, nibbling at the end and pushing each character onto the end of reversed_ary, effectively reversing the array.
Alternately it could be done using a string:
ary = 'foo bar baz'.chars
reversed_str = ''
ary.size.times do
reversed_str << ary.pop
end
reversed_str # => "zab rab oof"
or:
reversed_str += ary.pop
I just saw that @tadman did a similar thing with the string. His would run more quickly but this is more readable, at least to my eyes.