How do I create this File Input and Output assignment in Ruby - ruby

I have an assignment that I am not sure what to do. Was wondering if anyone could help. This is it:
Create a program that allows the user to input how many hours they exercised for today. Then the program should output the total of how many hours they have exercised for all time. To allow the program to persist beyond the first run the total exercise time will need to be written and retrieved from a file.
My code is this so far:
myFileObject2 = File.open("exercise.txt")
myFileObjecit2.read
puts "This is an exercise log. It keeps track of the number hours of exercise."
hours = gets.to_f
myFileObject2.close

Write your code like:
File.open("exercise.txt", "r") do |fi|
file_content = fi.read
puts "This is an exercise log. It keeps track of the number hours of exercise."
hours = gets.chomp.to_f
end
Ruby's File.open takes a block. When that block exits File will automatically close the file. Don't use the non-block form unless you are absolutely positive you know why you should do it another way.
chomp the value you get from gets. This is because gets won't return until it sees a trailing END-OF-LINE, which is usually a "\n" on Mac OS and *nix, or "\r\n" on Windows. Failing to remove that with chomp is the cause of much weeping and gnashing of teeth in unaware developers.
The rest of the program is left for you to figure out.
The code will fail if "exercise.txt" doesn't already exist. You need to figure out how to deal with that.
Using read is bad form unless you are absolutely positive the file will always fit in memory because the entire file will be read at once. Once it is in memory, it will be one big string of data so you'll have to figure out how to break it into an array so you can iterate it. There are better ways to handle reading than read so I'd study the IO class, plus read what you can find on Stack Overflow. Hint: Don't slurp your files.

Related

Incrementally reading logs

Looked around with numerous search strings but can't find anything quite like this:
I'm writing a custom log parser (ala analog or webalizer except not for webserver) and I want to be able to skip the hard work for the lines that have already been parsed. I have thought about using a history file like webalizer but have no idea how it actually works internally and my C is pretty poor.
I've considered hashing each line and writing the hashes out, then parsing the history file for their presence but I think this will perform poorly.
The only other method I can think of is storing the line number of the last parse and skipping until that number is reached the next time round. What happens when the log is rotated I am not sure.
Any other ideas would be appreciated. I will be writing the parser in ruby but tips in a similar language will help as well.
The solutions I can think of right now are bound to be brittle.
Even if you store the line number and later realize it would be past the length of the current file, what happens if old lines have been trimmed? You would start reading (well) after the last position.
If, on the other hand, you are sure your log files won't be tampered with and they will only be rotated, I only see two ways of doing what you want, and I'm not sure the second is applicable to you.
Anyway, here goes.
First solution
You store the last line you parsed along with a timestamp. At the next run, you consider all the rotated log files sorting them by their last modified date, figure out which one you read last time, and start reading from there.
I didn't think this through, there might be funny corner cases you will need to handle.
Second solution
You create a background script that continuously watches the log file. A quick search on Google turned out this gem, but I'm not sure if that's even an option for you. Even then, you might want to integrate this solution with the previous one just in case your daemon will get interrupted (because that's clearly bound to happen at some point).
As you read the file and parse the lines keep track of the byte count. Save that. On next read, try to seek to that byte offset in the file. If the file is smaller than the byte count, it's a new file so start at the beginning.

app is not stopping at "gets" - is my online interpreter forcing a timeout?

While starting an assignment (Towers of Hanoi), I leave my code in a very basic state while I ponder the logic of how to continue.
while arr3.count < 6
puts "Move ring FROM which tower?"
from = gets.chomp
puts "Move ring TO which tower?"
to = gets.chomp
end
Before I can start building the rest of the app, however, gets seems to fall through without any input from me, and the second puts displays on the screen. This continues looping every, say, 30 seconds or so. Should I assume this is a feature of online interpreters (like codeacademy labs)?
Now I'm distracted from continuing the assignment and have to find a better place to do my code.
I'm installing Aptana (based on some advice on this forum) to see if I can get a better environment to do my assignments. Or do most people use a text editor then run their .rb file through the windows console window?
Thx

Why doesn't IO#seek work for TCPSocket?

I wrote some simple code to learn the structure of a TCPSocket. I thought it's like an IO stream so I tried to use seek to move the "reading position" back a byte:
socket.gets #=> hello world
socket.seek(-5, IO::SEEK_CUR)
socket.gets #=> hello world # this should return world
but, it gives me an error:
server.rb:11:in `seek': Illegal seek (Errno::ESPIPE)
Does anyone have an idea why this doesn't work?
If this was the case then the socket needs to keep all data around if someone would decides to seek backwards (and how would forward seek work, block for more data?). You could probably quite easy write a wrapper class around a socket that keeps track of a position and also buffers all data or blocks if needed etc.
But maybe you could try to use IO#bytes or IO#chars in combination with Enumerator#peek?
TCP/IP would be more like having a series of files on disk, where you can only read forward a file at a time. The files have to be read sequentially, and you can't jump ahead or back. It's not capable of random I/O, like you can do on a disk, it's more like a serial connection you can only read as things appear.
In order to do what you want you have to build a buffer, where you append each block (i.e., file), reconstructing the entire message. If you want to look backwards at any point, you have to look in your buffer. If you want to look forward you have to wait for that block to be received and read and appended.
That's a simple explanation. It's possible to request blocks be resent in IP but really, at the level we normally work at, we're only reading forward.

Fastest Way to Parse a Large File in Ruby

I have a simple text file that is ~150mb. My code will read each line, and if it matches certain regexes, it gets written to an output file.
But right now, it just takes a long time to iterate through all of the lines of the file (several minutes) doing it like
File.open(filename).each do |line|
# do some stuff
end
I know that it is the looping through the lines of the file that is taking a while because even if I do nothing with the data in "#do some stuff", it still takes a long time.
I know that some unix programs can parse large files like this almost instantly (like grep), so I am wondering why ruby (MRI 1.9) takes so long to read the file, and is there some way to make it faster?
It's not really fair to compare to grep because that is a highly tuned utility that only scans the data, it doesn't store any of it. When you're reading that file using Ruby you end up allocating memory for each line, then releasing it during the garbage collection cycle. grep is a pretty lean and mean regexp processing machine.
You may find that you can achieve the speed you want by using an external program like grep called using system or through the pipe facility:
`grep ABC bigfile`.split(/\n/).each do |line|
# ... (called on each matching line) ...
end
File.readlines.each do |line|
#do stuff with each line
end
Will read the whole file into one array of lines. It should be a lot faster, but it takes more memory.
You should read it into the memory and then parse. Of course it depends on what you are looking for. Don't expect miracle performance from ruby, especially comparing to c/c++ programs which are being optimized for past 30 years ;-)

Increasing the Loading Speed of Large Files

There are two large text files (Millions of lines) that my program uses. These files are parsed and loaded into hashes so that the data can be accessed quickly. The problem I face is that, currently, the parsing and loading is the slowest part of the program. Below is the code where this is done.
database = extractDatabase(#type).chomp("fasta") + "yml"
revDatabase = extractDatabase(#type + "-r").chomp("fasta.reverse") + "yml"
#proteins = Hash.new
#decoyProteins = Hash.new
File.open(database, "r").each_line do |line|
parts = line.split(": ")
#proteins[parts[0]] = parts[1]
end
File.open(revDatabase, "r").each_line do |line|
parts = line.split(": ")
#decoyProteins[parts[0]] = parts[1]
end
And the files look like the example below. It started off as a YAML file, but the format was modified to increase parsing speed.
MTMDK: P31946 Q14624 Q14624-2 B5BU24 B7ZKJ8 B7Z545 Q4VY19 B2RMS9 B7Z544 Q4VY20
MTMDKSELVQK: P31946 B5BU24 Q4VY19 Q4VY20
....
I've messed around with different ways of setting up the file and parsing them, and so far this is the fastest way, but it's still awfully slow.
Is there a way to improve the speed of this, or is there a whole other approach I can take?
List of things that don't work:
YAML.
Standard Ruby threads.
Forking off processes and then retrieving the hash through a pipe.
In my usage, reading all or part the file into memory before parsing usually goes faster. If the database sizes are small enough this could be as simple as
buffer = File.readlines(database)
buffer.each do |line|
...
end
If they're too big to fit into memory, it gets more complicated, you have to setup block reads of data followed by parse, or threaded with separate read and parse threads.
Why not use the solution devised through decades of experience: a database, say SQLlite3?
(To be different, although I'd first recommend looking at (Ruby) BDB and other "NoSQL" backend-engines, if they fit your need.)
If fixed-sized records with a deterministic index are used then you can perform a lazy-load of each item through a proxy object. This would be a suitable candidate for a mmap. However, this will not speed up the total access time, but will merely amortize the loading throughout the life-cycle of the program (at least until first use and if some data is never used then you get the benefit of never loading it). Without fixed-sized records or deterministic index values this problem is more complex and starts to look more like a traditional "index" store (eg. a B-tree in an SQL back-end or whatever BDB uses :-).
The general problems with threading here are:
The IO will likely be your bottleneck around Ruby "green" threads
You still need all the data before use
You may be interested in the Widefinder Project, just in general "trying to get faster IO processing".
I don't know too much about Ruby but I have had to deal with the problem before. I found the best way was to split the file up into chunks or separate files then spawn threads to read each chunk in at a single time. Once the partitioned files are in memory combining the results should be fast. Here is some information on Threads in Ruby:
http://rubylearning.com/satishtalim/ruby_threads.html
Hope that helps.

Resources