I've opened a file in Ruby with mode "a+". I can seek to the middle of the file and read from it, but when I try to write, the writes always go to the end. How do I write to a position in the middle?
jpg = File.new("/tmp/bot.jpg", "a+")
jpg.seek 24
puts jpg.getc.chr
jpg.seek 24
jpg.write "R"
jpg.seek 28
jpg.write "W"
puts jpg.pos
jpg.close
The R and W both end up at the end of the file.
I know I can only overwrite existing bytes; that's OK, it's what I want to do.
This behavior is exactly what you request with the "a+" mode: all writes go to the end, while reading and seeking are still allowed (seeking is only meaningful for reading, of course, given the mode). Use "r+" if you don't want all writes to go to the end.
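A minimal sketch of the "r+" approach, assuming a small scratch file created with Tempfile in place of /tmp/bot.jpg (offsets 4 and 8 are arbitrary stand-ins for 24 and 28). The "b" flag is added only because the original file is a JPEG; it is not what makes seek-then-write work.

```ruby
require "tempfile"

# Hypothetical scratch file standing in for /tmp/bot.jpg, so the sketch runs anywhere.
tmp = Tempfile.new("bot")
tmp.write("0123456789")
tmp.close

# "r+b": read/write starting at offset 0, no truncation and no forced appends;
# the "b" (binary mode) avoids newline/encoding translation for files like JPEGs.
File.open(tmp.path, "r+b") do |f|
  f.seek(4)
  f.write("R")   # overwrites the byte at offset 4
  f.seek(8)
  f.write("W")   # overwrites the byte at offset 8
end

result = File.binread(tmp.path)
puts result      # prints 0123R567W9
tmp.unlink
```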
Related
How do you open a file, read it, then write to it replacing the entire contents, and close it?
I could do this:
contents = nil # declared outside the blocks so both can see it
File.open('foo.bin', 'r') do |f|
  contents = f.read
end
# do something with the contents
File.open('foo.bin', 'w') do |f|
  f.print contents
end
But that takes two opens and two closes, and doubling the IO steps seems like a total waste, not to mention much harder on the disk given how often this is likely to happen in my script.
Is there a way to open, read, overwrite, then close?
First, if you haven't profiled your code, do it now. An extra file open/close is unlikely to be the cause of your slowdown; profiling will show where the real issue is.
I'm not convinced this will be any faster, but here are the general steps to do this with a single open and close.
Open for read & write.
Read the whole file (not line by line).
Truncate the file.
Go back to the beginning.
Write.
Close.
In Ruby, you do that like so:
# Open the file for read/write.
File.open("test.data", "r+") do |f|
  # Read the whole file
  contents = f.read
  # Truncate the file
  f.truncate(0)
  # Jump back to the beginning
  f.rewind
  # Write the new content
  f.write("new stuff\n")
end
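In practice the new content is usually built from what was read. A sketch of that variant, using a hypothetical temp file in place of test.data and an arbitrary gsub as the edit: here the file is rewritten first and then truncated at the final write position, which trims any leftover bytes when the new text is shorter than the old. This reorders the steps listed above but still uses a single open and close.

```ruby
require "tempfile"

# Hypothetical sample file standing in for "test.data".
tmp = Tempfile.new("test_data")
tmp.write("old stuff\nmore old stuff\n")
tmp.close

File.open(tmp.path, "r+") do |f|
  contents = f.read                     # read the whole file
  updated = contents.gsub("old ", "")   # build the new content from the old
  f.rewind                              # jump back to offset 0
  f.write(updated)                      # overwrite from the beginning
  f.truncate(f.pos)                     # cut off leftover bytes when the new text is shorter
end

result = File.read(tmp.path)
tmp.unlink
```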
Say I have the following Ruby code which, given a hash of insert positions, reads a file and creates a new file with extra text inserted at those positions:
insertpos = {14=>25, 16=>25}
File.open('file.old', 'r') do |oldfile|
  File.open('file.new', 'w') do |newfile|
    oldfile.each_with_index do |line, linenum|
      inserthere = insertpos[linenum]
      line.insert(inserthere, "foo") unless inserthere.nil?
      newfile.write(line)
    end
  end
end
Now, instead of creating that new file, I would like to modify this original (old) file. Can someone give me a hint on how to modify the code? Thanks!
At a very fundamental level, this is an extremely difficult thing to do, in any language, on any operating system. Envision a file as a contiguous series of bytes on disk (this is a very simplistic scenario, but it serves to illustrate the point). You want to insert some bytes in the middle of the file. Where do you put those bytes? There's no place to put them! You would have to basically "shift" the existing bytes after the insertion point "down" by the number of bytes you want to insert. If you're inserting multiple sections into an existing file, you would have to do this multiple times! It will be extremely slow, and you will run a high risk of corrupting your data if something goes awry.
You can, however, overwrite existing bytes and/or append to the end of the file. Most Unix utilities give the appearance of modifying files by creating new files and swapping them with the old. Some more sophisticated schemes, such as those used by databases, allow inserts in the middle of files by 1. reserving space for such operations (when the data is first written), 2. allowing non-contiguous blocks of data within the file through indexing and other techniques, and/or 3. using copy-on-write schemes where a new version of the data is written to the end of the file and the old version is invalidated by overwriting an indicator of some kind. You most likely don't want to go through all this trouble for your simple use case!
Anyway, you've already found the best way to do what you're trying to do. The only thing you're missing is a FileUtils.mv('file.new', 'file.old') at the very end (after require 'fileutils') to replace the old file with the new one. Please let me know in the comments if I can help explain this any further.
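Putting the question's loop and that FileUtils.mv call together gives roughly the following sketch. The scratch directory and the 20-line sample content are assumptions so it runs on its own; note that insertpos keys are 0-based line indexes because they come from each_with_index. Since both files live in the same directory, File.rename(new_path, old_path) should work equally well here.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical scratch directory and sample data: 20 lines, so indexes 14 and 16 exist.
dir = Dir.mktmpdir
old_path = File.join(dir, "file.old")
new_path = File.join(dir, "file.new")
File.write(old_path, (0...20).map { |i| "line number #{i} of the sample file\n" }.join)

insertpos = { 14 => 25, 16 => 25 }  # {0-based line index => character column}

File.open(old_path, "r") do |oldfile|
  File.open(new_path, "w") do |newfile|
    oldfile.each_with_index do |line, linenum|
      inserthere = insertpos[linenum]
      line.insert(inserthere, "foo") unless inserthere.nil?
      newfile.write(line)
    end
  end
end

# The missing step: move the new file over the old one.
FileUtils.mv(new_path, old_path)

lines = File.readlines(old_path)
FileUtils.remove_entry(dir)
```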
(Of course, you can read the entire file into memory, make your changes, and overwrite the old file with the updated contents, but I don't believe that's what you're asking here.)
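For comparison, a sketch of that in-memory alternative, assuming the file comfortably fits in memory; the Tempfile stands in for file.old and the sample lines are made up.

```ruby
require "tempfile"

# Hypothetical sample file standing in for 'file.old'.
tmp = Tempfile.new("file_old")
tmp.write((0...20).map { |i| "line number #{i} of the sample file\n" }.join)
tmp.close

insertpos = { 14 => 25, 16 => 25 }  # {0-based line index => character column}

lines = File.readlines(tmp.path)    # whole file in memory, one String per line
# Assumes every index in insertpos exists; lines[i] would be nil otherwise.
insertpos.each { |linenum, col| lines[linenum].insert(col, "foo") }
File.write(tmp.path, lines.join)    # overwrite the original in place

result = File.readlines(tmp.path)
tmp.unlink
```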
Here's something that hopefully serves your purpose:
# 'source' param is a string, the entire source text
# 'lines' param is an array, a list of 0-based line indexes to insert after
# 'new' param is a string, the text to add
def insert(source, lines, new)
  results = []
  source.split("\n").each_with_index do |line, idx|
    if lines.include?(idx)
      results << (line + new)
    else
      results << line
    end
  end
  results.join("\n")
end
File.open("foo", "w") do |f|
  10.times do |i|
    f.write("#{i}\n")
  end
end
puts "initial text: \n\n"
txt = File.read("foo")
puts txt
puts "\n\n after inserting at lines 1,3, and 5: \n\n"
result = insert(txt, [1,3,5], "\nfoo")
puts result
Running this shows:
initial text:
0
1
2
3
4
5
6
7
8
9
after inserting at lines 1,3, and 5:
0
1
foo
2
3
foo
4
5
foo
6
7
8
9
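The answer above only prints the result. To actually change the file, one option is to write the returned string back; this sketch assumes a temp file in place of foo and restores the trailing newline that split/join removes.

```ruby
require "tempfile"

# Condensed copy of the insert helper above so this sketch runs on its own;
# the third parameter is renamed to `text` for readability.
def insert(source, lines, text)
  source.split("\n").each_with_index.map { |line, idx|
    lines.include?(idx) ? line + text : line
  }.join("\n")
end

# Hypothetical temp file standing in for "foo".
tmp = Tempfile.new("foo")
tmp.write((0...10).map { |i| "#{i}\n" }.join)
tmp.close

# split/join drops the final newline, so add it back before overwriting.
updated = insert(File.read(tmp.path), [1, 3, 5], "\nfoo") + "\n"
File.write(tmp.path, updated)

result = File.read(tmp.path)
tmp.unlink
```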
If it's a relatively simple operation, you can do it with a Ruby one-liner, like this:
ruby -i -lpe '$_.reverse!' thefile.txt
(found e.g. at https://gist.github.com/KL-7/1590797).
Let's say I want to combine several massive files into one and then uniq! the result (THAT alone might take a hot second).
It's my understanding that File.readlines() loads ALL the lines into memory. Is there a way to read a file line by line, sort of like how Node.js's pipe() works?
One of the great things about Ruby is that you can do file IO in a block:
File.open("test.txt", "r") do |f|
  f.each_line do |row|
    puts row
  end
end # file closed here
That way things get cleaned up automatically. Maybe it doesn't matter in a little script, but it's always nice to know you get it for free.
With readline you aren't operating on the entire file contents at once; only the current line is held in memory.
file = File.open("sample.txt", 'r')
until file.eof?
  line = file.readline
  puts line
end
file.close
Large files are best read with streaming methods, such as each_line as shown in the other answer, or foreach, which opens the file and reads it line by line. If the process doesn't need the whole file in memory, use the streaming methods: their memory use does not grow with the file size, unlike non-streaming methods such as readlines.
File.foreach("name.txt") { |line| puts line }
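Applied to the question's first step (combining several files), a streaming sketch might look like this; the two small input files and their names are hypothetical stand-ins for the massive ones.

```ruby
require "tmpdir"

result = nil
Dir.mktmpdir do |dir|
  # Hypothetical input files standing in for the "massive" ones;
  # each is assumed to end with a newline so lines don't run together.
  inputs = %w[a.txt b.txt].map { |name| File.join(dir, name) }
  File.write(inputs[0], "apple\nbanana\n")
  File.write(inputs[1], "banana\ncherry\n")

  combined = File.join(dir, "combined.txt")
  File.open(combined, "w") do |out|
    inputs.each do |path|
      # foreach yields one line at a time, so no input file is held in memory whole
      File.foreach(path) { |line| out.write(line) }
    end
  end

  result = File.read(combined)
end
```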
uniq! is defined on Array, so you'll have to read the files into an Array anyway. You cannot process the file line-by-line because you don't want to process a file, you want to process an Array, and an Array is a strict in-memory data structure.
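If the goal is only to drop duplicate lines, one swapped-in alternative to uniq! (not what the answer describes) is to stream the lines and remember the ones already seen in a Set. This still keeps every distinct line in memory, so the point above holds when most lines are unique; what it avoids is an Array holding all lines, duplicates included. StringIO objects stand in for the input and output files here.

```ruby
require "set"
require "stringio"

# StringIO stands in for the combined file; with a real file you would use
# File.foreach("combined.txt") and an output File (hypothetical names).
input = StringIO.new("apple\nbanana\nbanana\ncherry\napple\n")
output = StringIO.new

seen = Set.new
input.each_line do |line|
  # Set#add? returns nil when the line is already in the set, so repeats are skipped
  output.write(line) if seen.add?(line)
end

result = output.string
```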
Probably a simple question, but how do I delete the contents of a file after a specific line number? For example, I want to keep the first 5 lines and delete the rest of the file. I have been searching for a while and can't find a way to do this; I'm an iOS developer, so Ruby is not a language I'm very familiar with.
That is called truncating. The truncate method needs the byte position after which everything gets cut off, and the file's pos method delivers just that:
File.open("test.csv", "r+") do |f|
  f.each_line.take(5)
  f.truncate(f.pos)
end
The "r+" mode of File.open opens the file for reading and writing without truncating it to zero size, unlike "w+".
The block form of File.open ensures that the file is closed when the block ends.
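A self-contained sketch of the same idea, using a hypothetical temp file in place of test.csv and calling gets in a loop as an alternative way of stepping past the first 5 lines:

```ruby
require "tempfile"

# Hypothetical 10-line file standing in for test.csv.
tmp = Tempfile.new("test_csv")
tmp.write((1..10).map { |i| "row #{i}\n" }.join)
tmp.close

keep = 5
File.open(tmp.path, "r+") do |f|
  keep.times { f.gets }   # step past the first `keep` lines
  f.truncate(f.pos)       # drop every byte after the current offset
end

result = File.read(tmp.path)
tmp.unlink
```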
I'm not aware of any methods to delete part of a file, so my first thought was to read the file and then write back to it. Something like this:
path = '/path/to/thefile'
start_line = 0
end_line = 4
File.write(path, File.readlines(path)[start_line..end_line].join)
File.readlines reads the file and returns an array of strings, where each element is one line of the file. You can then use the subscript operator with a range to select the lines you want.
This isn't going to be very memory efficient for large files, so you may want to optimise if that's something you'll be doing.
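One way to reduce memory use, sketched under the assumption that you only keep lines from the start of the file: File.foreach without a block returns an Enumerator, and first(n) stops after n lines instead of reading everything. The temp file stands in for /path/to/thefile.

```ruby
require "tempfile"

# Hypothetical 10-line file standing in for '/path/to/thefile'.
tmp = Tempfile.new("thefile")
tmp.write((1..10).map { |i| "line #{i}\n" }.join)
tmp.close

path = tmp.path
start_line = 0
end_line = 4

# Without a block, File.foreach returns an Enumerator; first(n) stops
# after n lines rather than reading the whole file into an Array.
kept = File.foreach(path).first(end_line + 1).drop(start_line)
File.write(path, kept.join)

result = File.read(path)
tmp.unlink
```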
I want to gain some insight into how Ruby manages file buffering. I looked elsewhere for the answers, but I guess I'm not asking the right questions.
In an IRB session I opened a file for reading:
f = File.open('somefile.txt', 'r')
Using this command:
puts f.gets
prints out the first line of somefile.txt. If I repeat the puts f.gets command, I get the second line, and so on.
My questions are:
1. Is the file buffer being altered by gets?
2. If the answer to question 1 is yes, then is there any way to see all the lines that still remain in the buffer?
3. If the answer to question 2 is no, then I'm assuming that gets has some record of the last line of the file that it read. Is there any way to find out the value of this line index?
f.lineno gives you the current line number (strictly, the number of times gets has been called).
f.pos gives you the current offset in bytes.
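A short sketch showing both in action, plus one way to look at the remaining lines (question 2) without losing your place: save pos, read the rest, then seek back. As far as I know, Ruby doesn't expose the internal read buffer through a public method, so this works with positions rather than the buffer itself. The three-line temp file is a stand-in for somefile.txt.

```ruby
require "tempfile"

# Hypothetical three-line file standing in for somefile.txt.
tmp = Tempfile.new("somefile")
tmp.write("first\nsecond\nthird\n")
tmp.close

f = File.open(tmp.path, "r")
f.gets               # "first\n"
f.gets               # "second\n"
line_no = f.lineno   # 2: gets has been called twice
offset = f.pos       # 13: bytes consumed so far ("first\n" + "second\n")

# To peek at what is left: remember the offset, read the rest, then seek back.
remaining = f.read   # "third\n"
f.seek(offset)       # seek changes the position but not lineno
next_line = f.gets   # "third\n" again
f.close
tmp.unlink
```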