Ruby program for searching words in a file - ruby

I started with Ruby yesterday, I only have some experience with C.
Now I'm trying to write a program that gets a file and a word to search in that file from ARGV, and prints how many times the word appeared. Got rid of any error, but it prints 0 anyway when I test it.
if ARGV.size !=2
puts "INSERT A FILE AND A WORD OR A CHAR TO SEARCH FOR"
exit 1
else
file = File.open(ARGV[0], mode = "r")
word = ARGV[1]
if !file
puts "ERROR: INVALID INPUT FILE"
exit 1
end
while true
begin
i = 0
count_word = 0
string = []
string[i] = file.readline
if string[i].upcase.include? word.upcase
count_word += 1
end
i += 1
rescue EOFError
break
end
end
print "The word searched is ", word, " Frequency: ", count_word, "\n"
end
I hope you could tell me what's wrong (I believe I do something wrong when counting), thanks in advance.

A great thing about Ruby is that it operates at a much higher level of abstraction. Here is a snippet that does what you want:
if ARGV.size != 2
puts "Provide file to be searched in and word to be found"
exit 1
end
file = ARGV[0]
word = ARGV[1]
count = 0
# File.foreach reads line by line and closes the file when it is done
File.foreach(file) { |line| count += 1 if line.downcase.include?(word.downcase) }
puts "The word searched is #{word} Frequency: #{count}"
As you can see, the language provides a lot of features like string interpolation, enumeration of the file contents, etc.
There are a handful of problems with the code you provided, ranging from styling issues like indentation, to incorrect assumptions about the language like the if !file check (File.open raises an exception such as Errno::ENOENT when the file cannot be opened; it never returns nil or false), to strange decisions overall, such as using an array when you only need the current line.
I suggest you look at http://tryruby.org/. It is very short and will give you a feel for the Ruby way of doing things. It also covers your question (processing files).
As a general note, when you post a question on Stack Overflow, please include the code in the question rather than linking to an external page. This way people can read through it faster and edit it, and the code won't be lost if the other site goes down. You can still link to external pages if you want to show the snippet in action.

Hope this helps. The mistake is that you put this part:
i = 0
count_word = 0
string = []
inside the while loop, which resets your counter to zero on every iteration even if the word was found. To correct this, move those lines before the loop:
if ARGV.size !=2
puts "INSERT A FILE AND A WORD OR A CHAR TO SEARCH FOR"
exit 1
else
file = File.open(ARGV[0], mode = "r")
word = ARGV[1]
if !file
puts "ERROR: INVALID INPUT FILE"
exit 1
end
i = 0
count_word = 0
string = []
while true
begin
string[i] = file.readline
if string[i].upcase.include? word.upcase
count_word += 1
end
i += 1
rescue EOFError
break
end
end
print "The word searched is ", word, " Frequency: ", count_word, "\n"
end
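Note that both versions above count the lines that contain the word, so a line with the word twice still adds only 1. If the goal is the total number of occurrences, a minimal sketch could look like the following. The helper name count_occurrences is made up for illustration, and Enumerable#sum assumes Ruby 2.4 or newer.

```ruby
# Count every case-insensitive occurrence of `word` in the file at `path`.
# count_occurrences is a hypothetical helper name, not part of the original post.
def count_occurrences(path, word)
  # Regexp.escape keeps characters such as "." or "*" literal.
  pattern = /#{Regexp.escape(word)}/i
  # File.foreach yields one line at a time and closes the file afterwards.
  File.foreach(path).sum { |line| line.scan(pattern).size }
end

# Possible use from the command line:
#   puts count_occurrences(ARGV[0], ARGV[1])
```

Like include?, this matches substrings, so searching for "cat" also counts "category".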

Related

How to optimize code removing unwanted characters

This code is designed for a problem where the user's computer has a bug: every time he/she hits the backspace key, it displays a '<' symbol. The program should fix this and output the intended string, treating each '<' as a backspace. The input string can be up to 10^6 characters long, and it will only include lowercase letters and '<'.
My code seems to be executing correctly but, when I submit it, the website says it exceeded the time limit for test 5/25. The amount of time given is 1 second. Also, if there are only '<' symbols it should produce no output.
For example,
"hellooo<< my name is matthe<<"
would output
"hello my name is matt"
and
"ssadfas<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<"
would output nothing, etc.
Here is the code:
input = gets.chomp
while input[/[[:lower:]]</]
input.gsub!(/[[:lower:]]</, "")
end
input.gsub!(/</, "")
puts input
In the code above I stay in the while loop if there are any instances where a lowercase letter is in front of a '<'. Anywhere a lowercase letter is followed by a '<' it is replaced with nothing. Once the while loop is exited if there are any '<' symbols left, they are replaced with nothing. Then the final string is displayed.
I created a test which I think is worst case scenario for my code:
input = ("a" + "<" + "a")*10000000
#input = gets.chomp
while input[/[[:lower:]]</]
input.gsub!(/[[:lower:]]</, "")
end
input.gsub!(/</, "")
puts input
I made the program stop between the creation of the string and the execution of the while loop and then ran it completely to be able to eyeball if it was taking longer than a second. It seemed to take much longer than 1 second.
How can it be modified to be faster or is there a much better way to do this?
Your approach is good, but you get better performance if you adapt the regular expression.
Cary, I hope you don't mind that I also included your excellent solution in the benchmark.
The benchmark was run on MRI ruby 2.3.0p0 (2015-12-25 revision 53290) [x64-mingw32].
I use .dup on my sample string to make sure none of the methods changes the input sample.
require 'benchmark'
input = ""
10_000_000.times{input << ['a','<'].sample}
def original_method inp
while inp[/[[:lower:]]</]
inp.gsub!(/[[:lower:]]</, "")
end
inp.gsub(/</, "")
end
def better_method inp
tuple = /[^<]</
while inp[tuple]
inp.gsub!(inp[tuple], "")
end
inp.gsub(/</, "")
end
def backspace str
bs_count = 0
str.reverse.each_char.with_object([]) do |s, arr|
if s == '<'
bs_count += 1
else
bs_count.zero? ? arr.unshift(s) : bs_count -= 1
end
end.join
end
puts original_method(input.dup).length
puts better_method(input.dup).length
puts backspace(input.dup).length
Benchmark.bm do |x|
x.report("original_method") { original_method(input.dup) }
x.report("backspace ") { backspace(input.dup) }
x.report("better_method ") { better_method(input.dup) }
end
gives
3640
3640
3640
                      user     system      total        real
original_method   3.494000   0.016000   3.510000 (  3.510709)
backspace         1.872000   0.000000   1.872000 (  1.862550)
better_method     1.155000   0.031000   1.186000 (  1.187495)
def backspace(str)
bs_count = 0
str.reverse.each_char.with_object([]) do |s, arr|
if s == '<'
bs_count += 1
else
bs_count.zero? ? arr.unshift(s) : bs_count -= 1
end
end.join
end
backspace "Now is the<< tim<e fo<<<r every<<<one to chill ou<<<<t"
#=> "Now is t tier evone to chilt"
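Another common way to handle this kind of problem is a single left-to-right pass that uses an array as a stack, so each character is pushed and popped at most once. The sketch below is not benchmarked against the numbers above, and the method name apply_backspaces is made up for illustration.

```ruby
# Apply '<' as a backspace in one left-to-right pass.
# apply_backspaces is a hypothetical name used only for this sketch.
def apply_backspaces(str)
  stack = []
  str.each_char do |ch|
    if ch == '<'
      stack.pop        # pop on an empty array returns nil, so extra '<' are ignored
    else
      stack.push(ch)
    end
  end
  stack.join
end

puts apply_backspaces("hellooo<< my name is matthe<<")
```

Because no step rescans the string, the amount of work grows linearly with the input length, unlike the repeated gsub! passes in the question.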

How to determine whether input is empty or enter is pressed

I have a task to read an unlimited number of words into an array, one per line, and, when Enter is pressed on an empty line, print these words in reverse order. How can I detect that Enter was pressed on an empty line?
Code is here:
word = []
puts "Enter word"
add = 0
until add == ????
word.push gets.chomp
add = word.last
end
puts word.reverse
Here's a possible solution, with comments. I didn't see any useful role being played by your add variable, so I ignored it. I also believe in prompting the user regularly so they know the program is still engaged with them, so I moved the user-prompt inside the loop.
word = [] # Start with an empty array
# Use loop when the terminating condition isn't known at the beginning
# or end of the repetition, but rather it's determined in the middle
loop do
print 'Enter word: ' # I like to prompt the user each time.
response = gets.chomp # Read the response and clean it up.
break if response.empty? # No response? Time to bail out of the loop!
word << response # Still in the loop? Append the response to the array.
end
puts word.reverse # Now that we're out of the loop, reverse and print
You may or may not prefer to use strip rather than chomp. With strip, the loop would also stop when the user enters a line containing only whitespace.
Here, this is a modified version of your code and it works as requested.
word = []
puts "Enter word"
add = 0
while add != -1
ans = gets.chomp
# check for the empty line before storing it, so it is not printed back
if ans == ""
puts word.reverse
exit
end
word.push ans
add += 1
end
puts word.reverse
This is another version, using (as you did originally) the until loop.
word = []
puts "Enter word"
add = 0
until add == Float::INFINITY
ans = gets.chomp
# check for the empty line before storing it, so it is not printed back
if ans == ""
puts word.reverse
exit
end
word.push ans
add += 1
end
puts word.reverse
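One case the versions above do not cover is end of input (for example Ctrl-D, or input piped from a file): gets then returns nil, and calling chomp on nil raises NoMethodError. A minimal sketch that treats end of input like an empty line is shown below. The helper name read_words and its io parameter are assumptions made so the logic can be tried without a keyboard.

```ruby
require 'stringio'

# Read words from `io` until an empty (or whitespace-only) line or end of input.
# read_words is a hypothetical helper; `io` defaults to standard input.
def read_words(io = $stdin)
  words = []
  while (line = io.gets)     # gets returns nil at end of input, ending the loop
    word = line.strip
    break if word.empty?
    words << word
  end
  words
end

# Example with canned input instead of the keyboard:
puts read_words(StringIO.new("one\ntwo\nthree\n\nignored\n")).reverse
```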

Ruby code efficiency

Is there a way to make this code shorter and simpler?
loop do
if possibleSet.split(" ").map(&:to_i).any? {|e| (e<0 || e>12)}
print "Please enter valid numbers (between 1 and 12): "
possibleSet = gets
errorinput = false
else
errorinput = true
end
break if errorinput
end
Refactored a bit :)
loop do
print "Please enter valid numbers (between 1 and 12): "
possibleSet = gets.chomp
break unless possibleSet.split(" ").map(&:to_i).any? {|e| (e<0 || e>12)}
end
The code below will check input for correctness:
input = loop do
print "Please enter valid numbers (between 1 and 12): "
# ⇓⇓⇓ as many spaces as user wants
input = gets.chomp.split(/\s+/).map(&:to_i) rescue []
break input unless input.empty? || input.any? { |i| !(0..12).include? i }
end
This parses the user input in an array (not exactly the same behavior, but I hope it is cleaner and you can work from there)
set = []
until !set.empty? && set.all? { |i| (1..12).include?(i) } do
set = gets.split(' ').map(&:to_i)
end
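All of the versions above rely on to_i, which turns non-numeric text such as "abc" into 0. If that matters, the validation can be pulled into a small method that rejects such tokens. This is only a sketch: parse_set is a hypothetical name, the 1..12 range is taken from the prompt text, and the exception: false keyword assumes Ruby 2.6 or newer.

```ruby
# Parse a line such as "3 7 12" into integers, or return nil if it is invalid.
# parse_set is a hypothetical helper; the 1..12 range comes from the prompt text.
def parse_set(line)
  # Integer(token, 10, exception: false) returns nil for non-numeric tokens.
  numbers = line.split.map { |token| Integer(token, 10, exception: false) }
  return nil if numbers.empty?
  return nil unless numbers.all? { |n| n && n.between?(1, 12) }
  numbers
end

p parse_set("3 7 12")
p parse_set("3 x 12")
```

A caller can then keep prompting until parse_set returns something other than nil.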

find the target string from a large file

I want to write a class that can find a target string in a txt file and output the line number and the position.
class ReadFile
def find_string(filename, string)
line_num = 0
IO.readlines(filename).each do |line|
line_num += 1
if line.include?(string)
puts line_num
puts line.index(string)
end
end
end
end
a= ReadFile.new
a.find_string('test.txt', "abc")
If the txt file is very large (1 GB, 10 GB, ...), the performance of this method is very poor.
Is there a better solution?
Use foreach to efficiently read a single line from the file at a time and with_index to track the line number (0-based):
IO.foreach(filename).with_index do |line, index|
if found = line.index(string)
puts "#{index+1}, #{found+1}"
break # skip this if you want to find more than 1 result
end
end
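See here for a good explanation of why readlines is giving you performance problems. In short, readlines loads the entire file into an array before iteration starts, whereas foreach yields one line at a time.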
This is a variant of @PinnyM's answer. It uses find, which I think is more descriptive than looping and breaking, but does the same thing. It does carry a small penalty: once the line is found, the offset where the string begins has to be determined with a second search of that line.
line, index = IO.foreach(filename).with_index.find { |line,index|
line.include?(string) }
if line
puts "'#{string}' found in line #{index + 1}, " +
"beginning in column #{line.index(string)+1}"
else
puts "'#{string}' not found"
end
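If every occurrence is wanted rather than just the first, the same line-at-a-time reading can collect all positions. The following is a sketch under that assumption; find_all is a hypothetical helper name, and it returns 1-based line numbers and columns.

```ruby
# Return [line_number, column] pairs (both 1-based) for every occurrence.
# find_all is a hypothetical helper name used for this sketch.
def find_all(filename, string)
  matches = []
  # with_index(1) starts the line counter at 1 instead of 0.
  File.foreach(filename).with_index(1) do |line, line_number|
    offset = 0
    while (found = line.index(string, offset))
      matches << [line_number, found + 1]
      offset = found + 1   # keep searching to the right of this match
    end
  end
  matches
end
```

Only one line is held in memory at a time, although the result array itself grows with the number of matches.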

Naming a new file with timestamp?

I'm trying to write a simple command line program that lets me keep track of how many times I get distracted during a study session.
I'm getting an argument error when I run this code saying my file.open has invalid arguments. If I want to name each new_session file with a timestamp, what would be a simple solution?
def first_question
puts "Would you like to start a new session?"
answer = gets.chomp
answer = answer.downcase
if answer == "yes"
new_session
else
puts "Ok. Would you like to review a previous session?"
prev_session = gets.chomp
prev_session.downcase
if prev_session == "yes"
#GET AND REVIEW PREVIOUS SESSIONS
elsif prev_session == "no"
puts "Well if you don't want a new session, and you don't want to review your old sessions, then you're SOL."
else
"That's not an acceptable response."
first_question
end
end
end
def new_session
distractions = 0
d = File.open("Session"+Time.now.to_s , 'w'){|f| f.write(distractions) }
puts "What would you like to do (add track(s) or review tracks)?"
request = gets.chomp
request.downcase
if request == "add track"
distractions = distractions.to_i + 1
puts "You have #{distractions} tracks in this session."
elsif request == "add tracks"
puts "How many times have you been distracted since we last met?"
answer = gets.chomp
distractions = distractions.to_i + answer.to_i
puts "You have #{distractions} tracks."
elsif request == "review tracks"
puts distractions
end
File.open( d , 'w') {|f| f.write(distractions) }
end
first_question
Most of your code is messy and redundant. The problem you are referring to, though, comes from here:
d = File.open("Session"+Time.now.to_s , 'w'){|f| f.write(distractions) }
d will be the number of bytes written to the file, and thus an Integer (a Fixnum before Ruby 2.4). You can't open an Integer, which is what you're trying to do in the last line of the function.
Further,
request = gets.chomp
request.downcase
The second line here does nothing.
You have two File.open statements:
d = File.open("Session"+Time.now.to_s , 'w'){|f| f.write(distractions) }
and:
File.open( d , 'w') {|f| f.write(distractions) }
Your error code will tell you which one is wrong, but, from looking at them, I'd say it's the second one.
d will be assigned the result of the block for the first File.open, which is going to be the result of f.write(distractions):
The File.open docs say:
The value of the block will be returned from File.open.
The IO#write docs (the method that f.write calls) say:
Returns the number of bytes written.
As a result, you are assigning d a number of bytes, then trying to create a file with an integer for a filename, which is an error because a filename MUST be a string.
That leads to a bigger problem, which is, your code makes no sense.
d = File.open("Session"+Time.now.to_s , 'w'){|f| f.write(distractions) } writes a 0 to the file created by "Session"+Time.now.to_s.
request.downcase converts the contents of request to lowercase and immediately throws it away. Perhaps you meant request.downcase!, but it'd be better to write:
request = gets.chomp
request.downcase
As:
request = gets.chomp.downcase
distractions = distractions.to_i + 1? distractions is already 0, which is a Fixnum, so calling to_i on it before adding 1 does nothing. Simply do:
distractions += 1
distractions = distractions.to_i + answer.to_i should be:
distractions += answer.to_i
File.open( d , 'w') {|f| f.write(distractions) } won't update your original file, because it's trying to open a file using an integer as the name. Even if it succeeded, it would write to an entirely different file, not the one created earlier, which would still contain only the single 0. Instead, d should be the name of the file previously created.
Consider this:
def new_session
distractions = 0
puts "What would you like to do (add track(s) or review tracks)?"
request = gets.chomp.downcase
case request
when "add track"
distractions += 1
puts "You have #{distractions} tracks in this session."
when "add tracks"
puts "How many times have you been distracted since we last met?"
distractions += gets.chomp.to_i
puts "You have #{distractions} tracks."
when "review tracks"
puts distractions
end
File.write( "Session" + Time.now.to_s, distractions)
end
This code is cleaner, and makes more sense now.
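On the naming part of the question: Time.now.to_s produces text like "2024-01-31 14:05:09 +0100", which contains spaces and colons, and colons are not allowed in Windows file names. Whether that contributed to the error in the question is not established, but a name built with Time#strftime avoids the issue either way. The helper name session_filename and the name format below are assumptions for illustration.

```ruby
# Build a file name such as "Session_2024-01-31_14-05-09.txt".
# session_filename and the chosen format are hypothetical, for illustration only.
def session_filename(time = Time.now)
  "Session_#{time.strftime('%Y-%m-%d_%H-%M-%S')}.txt"
end

puts session_filename(Time.new(2024, 1, 31, 14, 5, 9))
```

The name can be computed once at the start of new_session, stored in a variable, and passed to File.write at the end, so the same file is used throughout.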

Resources