Reading previous line of file with Ruby - ruby

How do I read the previous line of a file, i.e. the opposite of IO.gets? I initially thought I could set IO.lineno to the line number I wanted to read, but that doesn't work as expected. How do you actually read the previous line?

One simple way is to remember the previous line you read:
prev = nil
File.foreach('_vimrc') do |line|
  p [line, prev] # do whatever processing here
  prev = line
end

There are a couple of ways; which one fits depends on how much information you have at your disposal.
If you know the length of the last line you read, the easiest approach is
# assume f is the File object
len = last_line_length # length in bytes, including the line terminator
f.seek(-len, IO::SEEK_CUR)
Of course, if you don't have that information, things become a little less nice. You could use the same approach to walk backwards (one byte at a time) until you hit a newline marker, or take lineno and re-read from the beginning. Something like
lines = f.lineno
f.rewind
(lines - 1).times { f.gets }
However, as far as I know, there is no direct mechanism to just go back N lines.
As an aside, you should know that while you can assign to IO#lineno, doing so does not actually change the position in the file, and it also ruins the accuracy of the counter for reads after that point.
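The byte-by-byte walk described above could be sketched roughly as follows. The helper name gets_backward is hypothetical (it is not part of Ruby's IO API), and the sketch assumes "\n" line terminators and a seekable file:

```ruby
require 'tempfile'

# Hypothetical helper, not part of Ruby's IO API: returns the line that ends
# at the current position and leaves the position at the start of that line,
# so repeated calls walk backwards through the file.
def gets_backward(io)
  return nil if io.pos.zero?
  line_end = io.pos
  io.seek(-1, IO::SEEK_CUR)            # step over the terminator of the line we want
  while io.pos > 0
    io.seek(-1, IO::SEEK_CUR)          # look at the byte before the position
    break if io.getbyte == 10          # 10 is "\n"; getbyte leaves us just after it
    io.seek(-1, IO::SEEK_CUR)          # not a newline: undo the getbyte and go on
  end
  line_start = io.pos
  line = io.read(line_end - line_start)
  io.seek(line_start)                  # ready for the next backward read
  line
end

Tempfile.create('lines') do |f|
  f.write("a\nbb\nccc\n")
  f.rewind
  2.times { f.gets }        # position is now just after "bb\n"
  p gets_backward(f)        # "bb\n"
  p gets_backward(f)        # "a\n"
  p gets_backward(f)        # nil (beginning of file)
end
```

Seeking one byte at a time is slow on long lines; reading backwards in blocks would be faster, at the cost of more bookkeeping.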

The Elif gem has a gets method that is the opposite of IO.gets.
$ sudo gem install elif
$ irb
require "elif"
last_syslog = Elif.open('/var/log/syslog') { |file| file.gets }

Saw an excellent suggestion on comp.lang.ruby -- use IO.tell to keep track of the position of each line in the file so you can seek directly to it:
File.open "foo" do |io|
  idx = [io.tell]
  while l = io.gets
    p l
    idx << io.tell
  end
end
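Once the offsets are recorded, going back to any earlier line is a single seek. A minimal sketch, with hypothetical helper names line_offsets and line_at:

```ruby
require 'tempfile'

# Hypothetical helper: record the byte offset at which each line starts.
def line_offsets(io)
  io.rewind
  offsets = [io.tell]
  offsets << io.tell while io.gets
  offsets.pop            # the last entry is end-of-file, not a line start
  offsets
end

# Hypothetical helper: read line number n (0-based) via its recorded offset.
def line_at(io, offsets, n)
  io.seek(offsets.fetch(n))
  io.gets
end

Tempfile.create('lines') do |f|
  f.write("a\nbb\nccc\n")
  offsets = line_offsets(f)
  p offsets                  # [0, 2, 5]
  p line_at(f, offsets, 2)   # "ccc\n"
  p line_at(f, offsets, 1)   # "bb\n", the line before it
end
```

The index costs one integer per line, so it trades a little memory for constant-time access to any previous line.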

Related

In ruby, file.readlines.each not faster than file.open.each_line, why?

Just to analyze my iis log (BONUS: happened to know that iislog is encoded in ASCII, errrr..)
Here's my ruby code
1. readlines
Dir.glob("*.log").each do |filename|
  File.readlines(filename, :encoding => "ASCII").each do |line|
    # comment line
    if line[0] == '#'
      next
    else
      line_content = line.downcase
      # just care about first one
      matched_keyword = keywords.select { |e| line_content.include? e }[0]
      total_count += 1 if extensions.any? { |e| line_content.include? e }
      hit_count[matched_keyword] += 1 unless matched_keyword.nil?
    end
  end
end
2. open
Dir.glob("*.log").each do |filename|
  File.open(filename, :encoding => "ASCII").each_line do |line|
    # comment line
    if line[0] == '#'
      next
    else
      line_content = line.downcase
      # just care about first one
      matched_keyword = keywords.select { |e| line_content.include? e }[0]
      total_count += 1 if extensions.any? { |e| line_content.include? e }
      hit_count[matched_keyword] += 1 unless matched_keyword.nil?
    end
  end
end
"readlines" reads the whole file into memory, so why is "open" with each_line always a bit faster?
I tested it a couple of times on Windows 7 with Ruby 1.9.3.
Both readlines and open.each_line read the file only once. Ruby buffers IO objects, so it reads a block of data (e.g. 64 KB) from disk at a time to minimize the cost of disk reads. There should be little difference in the time spent in the disk-read step.
When you call readlines, Ruby constructs an empty array [], then repeatedly reads a line of the file and pushes it onto the array. At the end it returns the array containing all lines of the file.
When you call each_line, Ruby reads a line of the file and yields it to your block. When you have finished processing that line, Ruby reads the next one. It keeps reading lines until there is no more content in the file.
The difference between the two methods is that readlines has to append the lines to an array. When the file is large, Ruby might have to reallocate and copy the underlying array (at the C level) one or more times to enlarge it.
Digging into the source, readlines is implemented by io_s_readlines, which calls rb_io_readlines. rb_io_readlines calls rb_io_getline_1 to fetch a line and rb_ary_push to push the result onto the array it returns.
each_line is implemented by rb_io_each_line, which calls rb_io_getline_1 to fetch a line, just like readlines, and yields the line to your block with rb_yield.
So with each_line there is no need to store the lines in a growing array, and therefore no array resizing or copying.
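To compare the two approaches on your own machine, something like the following could be used. It is only a sketch: the method names and the generated sample log are made up for illustration, and the timings will vary with file size, Ruby version and platform.

```ruby
require 'benchmark'
require 'tempfile'

# Builds the full array of lines first, then iterates over it.
def count_with_readlines(path)
  count = 0
  File.readlines(path).each { |line| count += 1 unless line.start_with?('#') }
  count
end

# Yields one line at a time; no array of all lines is ever built.
def count_with_foreach(path)
  count = 0
  File.foreach(path) { |line| count += 1 unless line.start_with?('#') }
  count
end

Tempfile.create('sample_log') do |f|
  f.puts '# comment line'
  50_000.times { |i| f.puts "GET /page#{i}.html 200" }
  f.flush
  Benchmark.bm(10) do |x|
    x.report('readlines') { count_with_readlines(f.path) }
    x.report('foreach')   { count_with_foreach(f.path) }
  end
end
```

Both methods return the same count; only the memory behaviour, and possibly the timing, differs.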

Why doesn't IO#lineno always indicate the next read point for IO#read?

f = File.open('path_to_file', 'r')
f.lineno
#=> 0
f.gets
#=> "this is the content of the first line"
f.lineno
#=> 1 # lineno corresponds to the next read point of IO#gets
f.rewind
f.lineno
#=> 0
f.read
#=> "all the content in the file"
f.lineno
#=> 0 # lineno is still at the beginning
f.read
#=> "" # but I can't get anything; the read point seems to have reached the end of the file, so f.lineno should be 3 instead of 0
Or is there any other way to know the next read point of an IO stream?
f.lineno
#=> 0
According to the Ruby IO docs, lineno doesn't tell you the position in the stream. Rather, it tells you how many times gets has been called. As the read method doesn't use gets, the lineno value doesn't change.
What you probably want is pos, which tells you the current offset in the file in bytes. You can also set pos to jump to a different spot in the file.
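A small demonstration of the difference, assuming "\n" line endings. The helper read_state is hypothetical and exists only for this example:

```ruby
require 'tempfile'

# Hypothetical helper: snapshot of the three values being compared.
def read_state(io)
  { lineno: io.lineno, pos: io.pos, eof: io.eof? }
end

Tempfile.create('demo') do |f|
  f.write("first\nsecond\nthird\n")
  f.rewind
  f.gets
  p read_state(f)   # lineno is 1, pos is 6 (just after "first\n")
  f.rewind
  f.read
  p read_state(f)   # lineno is still 0, but pos is at the end and eof is true
  f.pos = 6         # jump straight to the start of the second line
  p f.gets          # "second\n"
end
```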

Ruby: What's an elegant way to pick a random line from a text file?

I've seen some really beautiful examples of Ruby and I'm trying to shift my thinking to be able to produce them instead of just admire them. Here's the best I could come up with for picking a random line out of a file:
def pick_random_line
  random_line = nil
  File.open("data.txt") do |file|
    file_lines = file.readlines()
    random_line = file_lines[Random.rand(0...file_lines.size())]
  end
  random_line
end
I feel like it's gotta be possible to do this in a shorter, more elegant way without storing the entire file's contents in memory. Is there?
There is already a random entry selector built into the Ruby Array class: sample().
def pick_random_line
  File.readlines("data.txt").sample
end
You can do it without storing anything except the most recently-read line and the current candidate for the returned random line.
def pick_random_line
  chosen_line = nil
  File.foreach("data.txt").each_with_index do |line, number|
    chosen_line = line if rand < 1.0/(number+1)
  end
  return chosen_line
end
So the first line is chosen with probability 1/1 = 1; the second line is chosen with probability 1/2, so half the time it keeps the first one and half the time it switches to the second.
Then the third line is chosen with probability 1/3 - so 1/3 of the time it picks it, and the other 2/3 of the time it keeps whichever one of the first two it picked. Since each of them had a 50% chance of being chosen as of line 2, they each wind up with a 1/3 chance of being chosen as of line 3.
And so on. At line N, every line from 1-N has an even 1/N chance of being chosen, and that holds all the way through the file (as long as the file isn't so huge that 1/(number of lines in file) is less than epsilon :)). And you only make one pass through the file and never store more than two lines at once.
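The general case can be written out as a short derivation. Line k is picked when it is read with probability 1/k, and it survives each later line j with probability 1 - 1/j, so after N lines:

```latex
\[
P(\text{line } k \text{ is the final pick})
  = \frac{1}{k} \prod_{j=k+1}^{N} \left(1 - \frac{1}{j}\right)
  = \frac{1}{k} \prod_{j=k+1}^{N} \frac{j-1}{j}
  = \frac{1}{k} \cdot \frac{k}{N}
  = \frac{1}{N}
\]
```

The product telescopes, which is why every line ends up with the same 1/N chance regardless of where it sits in the file.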
EDIT: You're not going to get a really concise solution with this algorithm, but you can turn it into a one-liner if you want to:
def pick_random_line
  File.foreach("data.txt").each_with_index.reduce(nil) { |picked,pair|
    rand < 1.0/(1+pair[1]) ? pair[0] : picked }
end
This function does exactly what you need.
It's not a one-liner. But it works with textfiles of any size (except zero size, maybe :).
def random_line(filename)
  blocksize, line = 1024, ""
  File.open(filename) do |file|
    initial_position = rand(File.size(filename)-1)+1 # random pointer position. Not a line number!
    pos = Array.new(2).fill( initial_position ) # array [prev_position, current_position]
    # Find beginning of current line
    begin
      pos.push([pos[1]-blocksize, 0].max).shift # calc new position
      file.pos = pos[1] # move pointer backward within file
      offset = (n = file.read(pos[0] - pos[1]).rindex(/\n/) ) ? n+1 : nil
    end until pos[1] == 0 || offset
    file.pos = pos[1] + offset.to_i
    # Collect line text till the end
    begin
      data = file.read(blocksize)
      line.concat((p = data.index(/\n/)) ? data[0,p.to_i] : data)
    end until file.eof? or p
  end
  line
end
Try it:
filename = "huge_text_file.txt"
100.times { puts random_line(filename).force_encoding("UTF-8") }
Negligible (imho) drawbacks:
- The longer the line, the higher the chance it'll be picked.
- It doesn't take into account the "\r" line separator (Windows-specific). Use files with Unix-style line endings!
This is not much better than what you came up with, but at least it's shorter:
def pick_random_line
lines = File.readlines("data.txt")
lines[rand(lines.length)]
end
One thing you can do to make your code more Rubyish is omitting the parentheses. Use readlines and size instead of readlines() and size().
A one liner:
def pick_random_line(file)
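  # $(...) replaces the nested backticks, which would end the Ruby literal early;
  # needs a shell that provides $RANDOM (e.g. bash)
  `head -$((${RANDOM} % $(wc -l < #{file}) + 1)) #{file} | tail -1`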
end
If you protest that it's not Ruby, go find a talk in this year's Euruko titled Ruby is unlike a Banana.
PS: Ignore SO's incorrect syntax highlighting.
Here is a shorter version of Mark's excellent answer, though not as short as Dave's:
def pick_random_line(number = 0, chosen_line = "")
  # number starts at 0 so that the first line is picked with probability 1/1
  File.foreach("data.txt") { |line| chosen_line = line if rand < 1.0 / (number += 1) }
  chosen_line
end
Stat the file, pick a random number between zero and the size of the file, seek to that byte in the file. Scan until the next newline, then read and return the next line (assuming you're not at the end of the file).
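A rough sketch of that seek-based idea follows. The method name random_line_by_seek and the rng: parameter are hypothetical, and wrapping around to the first line when the random position lands in the last line is an assumption added here, since the description above leaves that case open:

```ruby
require 'tempfile'

# Hypothetical helper: seek to a random byte, skip the rest of that line,
# and return the line that follows it.
def random_line_by_seek(path, rng: Random.new)
  File.open(path, 'rb') do |file|
    size = file.size
    return nil if size.zero?
    file.seek(rng.rand(size))    # random byte offset in 0...size
    file.gets                    # discard the (probably partial) current line
    file.rewind if file.eof?     # assumption: wrap around to the first line
    file.gets
  end
end

Tempfile.create('data') do |f|
  f.write("alpha\nbeta\ngamma\n")
  f.flush
  3.times { puts random_line_by_seek(f.path) }
end
```

This never reads more than two lines, but it is not uniform: a line is returned whenever the random byte falls in the line before it, so lines that follow long lines are picked more often.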

Useful file output from reading a file (ruby/rails environment)

I have a model connected to a log, so I'm beginning to build ways to use that info with the model and pass it around elsewhere.
This method:
def read_log
  counter = 1
  f = File.open(self.log_file_path, 'r')
  while (line = f.gets)
    puts "#{counter}: #{line}"
    counter = counter + 1
  end
end
works and dumps the log to the command line, but it returns nil: the output goes to stdout, so when I call the method I get nothing back. How can I read the contents into a more useful format? I need to read this into a controller variable for a template in a Rails web page. It is basic, but something I haven't done yet.
contents = f.read
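Now contents contains... the contents. I'm not sure what "useful" means in your context, but you can do things like split on newlines to get each line.
You can also get an enumerator over the lines via f.each_line (f.lines in older Ruby versions; it was deprecated in Ruby 2.0 and is no longer available in recent versions). Whether that's more useful, I'm not sure.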
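If the goal is to hand the numbered lines to a view, one option is to return them as an array instead of printing them. A minimal sketch; the method name numbered_log_lines is hypothetical and takes the path as an argument rather than reading it from the model:

```ruby
require 'tempfile'

# Hypothetical variant of read_log that returns data instead of printing it.
def numbered_log_lines(path)
  File.foreach(path).with_index(1).map do |line, number|
    "#{number}: #{line.chomp}"
  end
end

Tempfile.create('log') do |f|
  f.write("started\nfinished\n")
  f.flush
  p numbered_log_lines(f.path)   # ["1: started", "2: finished"]
end
```

In a controller the result could then be assigned to an instance variable and iterated over in the template. File.foreach also closes the file when it is done, which the original read_log does not.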

Read, edit, and write a text file line-wise using Ruby

Is there a good way to read, edit, and write files in place in Ruby?
In my online search I've found stuff suggesting to read it all into an array, modify said array, then write everything out. I feel like there should be a better solution, especially if I'm dealing with a very big file.
Something like:
myfile = File.open("path/to/file.txt", "r+")
myfile.each do |line|
  myfile.replace_puts('blah') if line =~ /myregex/
end
myfile.close
Where replace_puts would write over the current line, rather than (over)writing the next line as it currently does because the pointer is at the end of the line (after the separator).
So then every line that matches /myregex/ will be replaced with 'blah'. Obviously what I have in mind is a bit more involved than that, as far as processing, and would be done in one line, but the idea is the same - I want to read a file line by line, and edit certain lines, and write out when I'm done.
Maybe there's a way to just say "rewind back to just after the last separator"? Or some way of using each_with_index and write via a line index number? I couldn't find anything of the sort, though.
The best solution I have so far is to read things line-wise, write them out to a new (temp) file line-wise (possibly edited), then replace the old file with the new temp file. Again, I feel like there should be a better way - I don't think I should have to create a new 1 GB file just to edit some lines in an existing 1 GB file.
In general, there's no way to make arbitrary edits in the middle of a file. It's not a deficiency of Ruby. It's a limitation of the file system: Most file systems make it easy and efficient to grow or shrink the file at the end, but not at the beginning or in the middle. So you won't be able to rewrite a line in place unless its size stays the same.
There are two general models for modifying a bunch of lines. If the file is not too large, just read it all into memory, modify it, and write it back out. For example, adding "Kilroy was here" to the beginning of every line of a file:
path = '/tmp/foo'
lines = IO.readlines(path).map do |line|
  'Kilroy was here ' + line
end
File.open(path, 'w') do |file|
  file.puts lines
end
Although simple, this technique has a danger: If the program is interrupted while writing the file, you'll lose part or all of it. It also needs to use memory to hold the entire file. If either of these is a concern, then you may prefer the next technique.
You can, as you note, write to a temporary file. When done, rename the temporary file so that it replaces the input file:
require 'tempfile'
require 'fileutils'
path = '/tmp/foo'
temp_file = Tempfile.new('foo')
begin
  File.open(path, 'r') do |file|
    file.each_line do |line|
      temp_file.puts 'Kilroy was here ' + line
    end
  end
  temp_file.close
  FileUtils.mv(temp_file.path, path)
ensure
  temp_file.close
  temp_file.unlink
end
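Since the rename (FileUtils.mv) is atomic when the temporary file and the target are on the same file system, the rewritten input file will pop into existence all at once. If the program is interrupted, either the file will have been rewritten, or it will not. There's no possibility of it being partially rewritten. (Tempfile.new creates its file in the system temp directory by default; if that is on a different file system, FileUtils.mv falls back to copying, which is not atomic. Passing the target's directory as the second argument to Tempfile.new avoids this.)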
The ensure clause is not strictly necessary: The file will be deleted when the Tempfile instance is garbage collected. However, that could take a while. The ensure block makes sure that the tempfile gets cleaned up right away, without having to wait for it to be garbage collected.
If you want to overwrite a file line by line, you'll have to ensure the new line has the same length as the original line. If the new line is longer, part of it will be written over the next line. If the new line is shorter, the remainder of the old line just stays where it is.
The tempfile solution is really much safer. But if you're willing to take a risk:
File.open('test.txt', 'r+') do |f|
  old_pos = 0
  f.each do |line|
    f.pos = old_pos # this is the 'rewind'
    f.print line.gsub('2010', '2011')
    old_pos = f.pos
  end
end
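To reduce the risk a little, the overwrite can refuse to proceed when a replacement would change the length of a line. A sketch under the assumption of "\n" line endings; the method name replace_same_length is hypothetical:

```ruby
require 'tempfile'

# Hypothetical helper: overwrite matching lines in place, but only when the
# new line has exactly the same byte length as the line it replaces.
def replace_same_length(path, pattern, replacement)
  File.open(path, 'r+') do |f|
    line_start = 0
    while (line = f.gets)
      new_line = line.gsub(pattern, replacement)
      if new_line != line
        if new_line.bytesize != line.bytesize
          raise ArgumentError, "line at byte #{line_start} would change length"
        end
        f.seek(line_start)   # go back to the start of the line just read
        f.write(new_line)    # same size, so the next line is left untouched
      end
      line_start = f.pos     # start of the next line
    end
  end
end

Tempfile.create('years') do |f|
  f.write("year 2010\nother\n")
  f.flush
  replace_same_length(f.path, '2010', '2011')
  print File.read(f.path)    # year 2011, then other
end
```

Lines changed before the offending one stay changed when the error is raised, so this guards against corrupting neighbouring lines but is still not atomic.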
If the line size does change, this is a possibility:
File.open('test.txt', 'r+') do |f|
  out = ""
  f.each do |line|
    out << line.gsub(/myregex/, 'blah')
  end
  f.pos = 0
  f.print out
  f.truncate(f.pos)
end
Just in case you are using Rails or Facets, or you otherwise depend on Rails' ActiveSupport, you can use the atomic_write extension to File:
File.atomic_write('path/file') do |file|
  file.write('your content')
end
Behind the scenes, this will create a temporary file which it will later move to the desired path, taking care of closing the file for you.
It further clones the file permissions of the existing file or, if there isn't one, of the current directory.
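Without ActiveSupport, a similar effect can be approximated with the standard library. This is only a sketch of the general idea, not ActiveSupport's implementation: the method name atomic_rewrite is hypothetical, and unlike atomic_write it does not copy the original file's permissions.

```ruby
require 'tempfile'
require 'tmpdir'

# Hypothetical helper: pass every line of the file through the given block,
# writing the results to a temp file in the SAME directory as the target,
# then rename the temp file over the target.
def atomic_rewrite(path)
  temp = Tempfile.create(['rewrite', '.tmp'], File.dirname(path))
  temp_path = temp.path
  File.foreach(path) { |line| temp.write(yield(line)) }
  temp.close
  File.rename(temp_path, path)   # same directory, hence same file system
ensure
  temp.close if temp && !temp.closed?
  # Only still present if something failed before the rename.
  File.delete(temp_path) if temp_path && File.exist?(temp_path)
end

Dir.mktmpdir do |dir|
  path = File.join(dir, 'foo.txt')
  File.write(path, "one\ntwo\n")
  atomic_rewrite(path) { |line| 'Kilroy was here ' + line }
  print File.read(path)
end
```

Creating the temp file next to the target is the important design choice: a rename within one file system replaces the target in a single step, whereas a move across file systems degrades to a copy.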
You can write in the middle of a file, but you have to be careful to keep the string you write the same length as the one you overwrite; otherwise you overwrite some of the following text. Here is an example using File#seek: IO::SEEK_CUR makes the offset relative to the current position of the file pointer, which is at the end of the line that was just read. The +1 is for the CR character at the end of the line; this assumes Windows CRLF line endings read in text mode, and with plain LF endings the +1 would seek one byte too far back.
look_for = "bbb"
replace_with = "xxxxx"
File.open(DATA, 'r+') do |file|
  file.each_line do |line|
    if (line[look_for])
      file.seek(-(line.length + 1), IO::SEEK_CUR)
      file.write line.gsub(look_for, replace_with)
    end
  end
end
__END__
aaabbb
bbbcccddd
dddeee
eee
After running the script, the data section at the end of it looks like this, which I assume is not what you had in mind:
aaaxxxxx
bcccddd
dddeee
eee
With that caveat in mind, this technique is much faster than the classic 'read and write to a new file' method.
See these benchmarks on a 1.7 GB file of music data.
For the classic approach I used Wayne's technique.
The benchmark uses the .bmbm method so that file caching doesn't skew the results much. Tests were done with MRI Ruby 2.3.0 on Windows 7.
The strings were actually replaced; I checked with both methods.
require 'benchmark'
require 'tempfile'
require 'fileutils'
look_for = "Melissa Etheridge"
replace_with = "Malissa Etheridge"
very_big_file = 'D:\Documents\muziekinfo\all.txt'.gsub('\\','/')
def replace_with file_path, look_for, replace_with
  File.open(file_path, 'r+') do |file|
    file.each_line do |line|
      if (line[look_for])
        file.seek(-(line.length + 1), IO::SEEK_CUR)
        file.write line.gsub(look_for, replace_with)
      end
    end
  end
end
def replace_with_classic path, look_for, replace_with
  temp_file = Tempfile.new('foo')
  File.foreach(path) do |line|
    if (line[look_for])
      temp_file.write line.gsub(look_for, replace_with)
    else
      temp_file.write line
    end
  end
  temp_file.close
  FileUtils.mv(temp_file.path, path)
ensure
  temp_file.close
  temp_file.unlink
end
Benchmark.bmbm do |x|
  x.report("adapt ") { 1.times {replace_with very_big_file, look_for, replace_with}}
  x.report("restore ") { 1.times {replace_with very_big_file, replace_with, look_for}}
  x.report("classic adapt ") { 1.times {replace_with_classic very_big_file, look_for, replace_with}}
  x.report("classic restore") { 1.times {replace_with_classic very_big_file, replace_with, look_for}}
end
Which gave
Rehearsal ---------------------------------------------------
adapt             6.989000   0.811000   7.800000 (  7.800598)
restore           7.192000   0.562000   7.754000 (  7.774481)
classic adapt    14.320000   9.438000  23.758000 ( 32.507433)
classic restore  14.259000   9.469000  23.728000 ( 34.128093)
----------------------------------------- total: 63.040000sec
                      user     system      total        real
adapt             7.114000   0.718000   7.832000 (  8.639864)
restore           6.942000   0.858000   7.800000 (  8.117839)
classic adapt    14.430000   9.485000  23.915000 ( 32.195298)
classic restore  14.695000   9.360000  24.055000 ( 33.709054)
So the in-place replacement was about 4 times faster.
