I want to gain some insight into how Ruby manages file buffering. I looked elsewhere for the answers, but I guess I'm not asking the right questions.
In an IRB session I opened a file for reading:
f = File.open('somefile.txt', 'r')
Using this command:
puts f.gets
prints out the first line of somefile.txt. If I repeat the puts f.gets command, I get the second line, and so on.
My questions are:
Is the file buffer being altered by gets?
If the answer to question 1 is yes, then is there any way to see all the lines that still remain in the buffer?
If the answer to question 2 is no, then I'm assuming that gets has some record of the last line of the file that it read. Is there any way to find out the value of this line index?
f.lineno will give you the current line number
f.pos will give you the current offset in bytes
Related
Let's say I want to combine several massive files into one and then uniq! the one (THAT alone might take a hot second)
It's my understanding that File.readlines() loads ALL the lines into memory. Is there a way to read it line by line, sort of like how node.js pipe() system works?
One of the great things about Ruby is that you can do file IO in a block:
File.open("test.txt", "r").each_line do |row|
puts row
end # file closed here
so things get cleaned up automatically. Maybe it doesn't matter on a little script but it's always nice to know you can get it for free.
you aren't operating on the entire file contents at once, and you don't need to store the entirety of each line either if you use readline.
file = File.open("sample.txt", 'r')
while !file.eof?
line = file.readline
puts line
end
Large files are best read by streaming methods like each_line as shown in the other answer or with foreach which opens the file and reads line by line. So if the process doesn't request to have the whole file in memory you should use the streaming methods. While using streaming the required memory won't increase even if the file size increases opposing to non-streaming methods like readlines.
File.foreach("name.txt") { |line| puts line }
uniq! is defined on Array, so you'll have to read the files into an Array anyway. You cannot process the file line-by-line because you don't want to process a file, you want to process an Array, and an Array is a strict in-memory data structure.
Probably a simple question, but I need to delete the contents of a file after a specific line number? So I wan't to keep the first e.g 5 lines and delete the rest of the contents of a file. I have been searching for a while and can't find a way to do this, I am an iOS developer so Ruby is not a language I am very familiar with.
That is called truncate. The truncate method needs the byte position after which everything gets cut off - and the File.pos method delivers just that:
File.open("test.csv", "r+") do |f|
f.each_line.take(5)
f.truncate( f.pos )
end
The "r+" mode from File.open is read and write, without truncating existing files to zero size, like "w+" would.
The block form of File.open ensures that the file is closed when the block ends.
I'm not aware of any methods to delete from a file so my first thought was to read the file and then write back to it. Something like this:
path = '/path/to/thefile'
start_line = 0
end_line = 4
File.write(path, File.readlines(path)[start_line..end_line].join)
File#readlines reads the file and returns an array of strings, where each element is one line of the file. You can then use the subscript operator with a range for the lines you want
This isn't going to be very memory efficient for large files, so you may want to optimise if that's something you'll be doing.
I want to read a file in and show how large it is. .count is acting like .count! and changing the size of my input file buffer. so now logfile.each doesn't iterate. What's going on?
logfile = open(input_fspec)
puts "logfile size: #{logfile.count} lines"
count will read all the lines from the input in order to do the counting. If you want to read the lines again (e.g. using readline or each) then you will need to call logfile.rewind to move back to the start of the file.
In fact, what count is actually returning is the number of lines that have not been read yet. For example, if you had already read through the file and called count afterwards then it would return 0.
You could do this instead before you even open it:
File.size("input_fspec")
I tried searching for this, but couldn't find much. It seems like something that's probably been asked before (many times?), so I apologize if that's the case.
I was wondering what the fastest way to parse certain parts of a file in Ruby would be. For example, suppose I know the information I want for a particular function is between lines 500 and 600 of, say, a 1000 line file. (obviously this kind of question is geared toward much large files, I'm just using those smaller numbers for the sake of example), since I know it won't be in the first half, is there a quick way of disregarding that information?
Currently I'm using something along the lines of:
while buffer = file_in.gets and file_in.lineno <600
next unless file_in.lineno > 500
if buffer.chomp!.include? some_string
do_func_whatever
end
end
It works, but I just can't help but think it could work better.
I'm very new to Ruby and am interested in learning new ways of doing things in it.
file.lines.drop(500).take(100) # will get you lines 501-600
Generally, you can't avoid reading file from the start until the line you are interested in, as each line can be of different length. The one thing you can avoid, though, is loading whole file into a big array. Just read line by line, counting, and discard them until you reach what you look for. Pretty much like your own example. You can just make it more Rubyish.
PS. the Tin Man's comment made me do some experimenting. While I didn't find any reason why would drop load whole file, there is indeed a problem: drop returns the rest of the file in an array. Here's a way this could be avoided:
file.lines.select.with_index{|l,i| (501..600) === i}
PS2: Doh, above code, while not making a huge array, iterates through the whole file, even the lines below 600. :( Here's a third version:
enum = file.lines
500.times{enum.next} # skip 500
enum.take(100) # take the next 100
or, if you prefer FP:
file.lines.tap{|enum| 500.times{enum.next}}.take(100)
Anyway, the good point of this monologue is that you can learn multiple ways to iterate a file. ;)
I don't know if there is an equivalent way of doing this for lines, but you can use seek or the offset argument on an IO object to "skip" bytes.
See IO#seek, or see IO#open for information on the offset argument.
Sounds like rio might be of help here. It provides you with a lines() method.
You can use IO#readlines, that returns an array with all the lines
IO.readlines(file_in)[500..600].each do |line|
#line is each line in the file (including the last \n)
#stuff
end
or
f = File.new(file_in)
f.readlines[500..600].each do |line|
#line is each line in the file (including the last \n)
#stuff
end
I've opened a file in ruby with the options a+. I can seek to the middle of the file and read from it but when I try to write the writes always go to the end. How do I write to a position in the middle?
jpg = File.new("/tmp/bot.jpg", "a+")
jpg.seek 24
puts jpg.getc.chr
jpg.seek 24
jpg.write "R"
jpg.seek 28
jpg.write "W"
puts jpg.pos
jpg.close
The R and W both end up at the end of the file.
I know I can only overwrite existing bytes, that is ok, that is what I want to do.
This behavior is exactly what you request with the "a+" mode: ensure all writes always go to the end, while allowing reading and seeking (seeking only meaningful for reading of course, given the mode). Use "r+" if you don't want all writes to always go to the end.