How to read lines of a file in Ruby

I was trying to use the following code to read lines from a file. But when reading a file, the contents are all in one line:
line_num=0
File.open('xxx.txt').each do |line|
print "#{line_num += 1} #{line}"
end
But this file prints each line separately.
I have to use stdin, like ruby my_prog.rb < file.txt, where I can't assume what the line-ending character is that the file uses. How can I handle it?

Ruby does have a method for this:
File.readlines('foo').each do |line|
puts(line)
end
http://ruby-doc.org/core-1.9.3/IO.html#method-c-readlines

File.foreach(filename).with_index do |line, line_num|
puts "#{line_num}: #{line}"
end
This will execute the given block for each line in the file without slurping the entire file into memory. See: IO::foreach.
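If 1-based numbering is wanted, as in the question's output, with_index accepts a starting offset. A minimal sketch; the helper name print_numbered and its out parameter are made up for illustration:

```ruby
# Hypothetical helper: prints each line of the file prefixed with a
# 1-based line number. Lines are read one at a time via File.foreach.
def print_numbered(filename, out = $stdout)
  File.foreach(filename).with_index(1) do |line, line_num|
    out.print "#{line_num}: #{line}"
  end
end
```

Calling print_numbered('xxx.txt') would write the numbered lines to standard output.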

I believe my answer covers your new concerns about handling any type of line endings since both "\r\n" and "\r" are converted to Linux standard "\n" before parsing the lines.
To support the "\r" EOL character along with the regular "\n", and "\r\n" from Windows, here's what I would do:
line_num=0
text=File.read('xxx.txt') # File.read closes the file when it is done
text.gsub!(/\r\n?/, "\n")
text.each_line do |line|
print "#{line_num += 1} #{line}"
end
Of course this could be a bad idea on very large files since it means loading the whole file into memory.
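Since the question mentions reading from stdin, the same normalization can be wrapped so it accepts any IO object. This is only a sketch: the helper name each_normalized_line is an assumption, and like the code above it loads the whole input into memory.

```ruby
# Hypothetical helper: reads everything from io (for example $stdin),
# converts "\r\n" and bare "\r" to "\n", and yields each line in turn.
# Note: the whole input is held in memory at once.
def each_normalized_line(io)
  io.read.gsub(/\r\n?/, "\n").each_line { |line| yield line }
end

# Usage with redirected input (ruby my_prog.rb < file.txt):
#   line_num = 0
#   each_normalized_line($stdin) { |line| print "#{line_num += 1} #{line}" }
```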

Your first file has Mac Classic line endings (that’s "\r" instead of the usual "\n"). Open it with
File.open('foo').each("\r") do |line|
to specify the line endings.
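File.foreach takes the separator as its second argument too, and closes the file when it finishes. A sketch, assuming the file really does use bare "\r" endings; the helper name classic_mac_lines is hypothetical:

```ruby
# Hypothetical helper: splits the file on "\r" and returns the lines
# with that separator removed.
def classic_mac_lines(path)
  File.foreach(path, "\r").map { |line| line.chomp("\r") }
end
```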

I'm partial to the following approach for files that have headers:
File.open(file, "r") do |fh|
header = fh.readline
# Process the header
while(line = fh.gets) != nil
#do stuff
end
end
This allows you to process a header line (or lines) differently than the content lines.
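The same idea as a small self-contained sketch. The helper name read_with_header is made up, and it uses gets rather than readline for the header, so an empty file yields nil instead of raising EOFError:

```ruby
# Hypothetical helper: returns the header line and an array of the
# remaining lines, all with their line endings removed.
def read_with_header(path)
  File.open(path, "r") do |fh|
    header = fh.gets                   # nil if the file is empty
    rows = fh.each_line.map(&:chomp)   # continues after the header line
    [header && header.chomp, rows]
  end
end
```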

It is because of the line endings in each line.
Use Ruby's chomp method to delete the trailing "\n" or "\r" at the end.
line_num=0
File.open('xxx.txt').each do |line|
print "#{line_num += 1} #{line.chomp}"
end

how about gets ?
myFile=File.open("paths_to_file","r")
while(line=myFile.gets)
# do stuff with line
end

Don't forget that if you are concerned about reading in a file that might have huge lines that could swamp your RAM during runtime, you can always read the file piece-meal. See "Why slurping a file is bad".
File.open('file_path', 'rb') do |io|
while chunk = io.read(16 * 1024) do
something_with_the chunk
# like stream it across a network
# or write it to another file:
# other_io.write chunk
end
end
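As one concrete use of the chunked loop above, here is a sketch that copies a file without holding more than one chunk in memory. The helper name copy_in_chunks and its parameters are assumptions for illustration:

```ruby
# Hypothetical helper: copies src to dst in fixed-size chunks so that
# only one chunk is held in memory at a time. Returns the bytes copied.
def copy_in_chunks(src, dst, chunk_size = 16 * 1024)
  copied = 0
  File.open(src, 'rb') do |input|
    File.open(dst, 'wb') do |output|
      # read returns nil at end of file, which ends the loop
      while (chunk = input.read(chunk_size))
        output.write(chunk)
        copied += chunk.bytesize
      end
    end
  end
  copied
end
```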

Related

How to delete lines from multiple files

I'm trying to read a file (d:\mywork\list.txt) line by line and search if that string occurs in any of the files (one by one) in a particular directory (d:\new_work).
If present in any of the files (may be one or more) I want to delete the string (car\yrui3,) from the respective files and save the respective file.
list.txt:
car\yrui3,
dom\09iuo,
id\byt65_d,
rfc\some_one,
desk\aa_tyt_99,
.........
.........
Directory having multiple files: d:\new_work:
Rollcar-access.txt
Mycar-access.txt
Newcar-access.txt
.......
......
My code:
value=File.open('D:\\mywork\\list.txt').read
value.gsub!(/\r\n?/, "\n")
value.each_line do |line|
line.chomp!
print "For the string: #{line}"
Dir.glob("D:/new_work/*-access.txt") do |fn|
print "checking files:#{fn}\n"
text = File.read(fn)
replace = text.gsub(line.strip, "")
File.open(fn, "w") { |file| file.puts replace }
end
end
The issue is, values are not getting deleted as expected. Also, text is empty when I tried to print the value.
There are a number of things wrong with your code, and you're not safely handling your file changes.
Meditate on this untested code:
ACCESS_FILES = Dir.glob("D:/new_work/*-access.txt")
File.foreach('D:/mywork/list.txt') do |target|
target = target.strip.sub(/,$/, '')
ACCESS_FILES.each do |filename|
new_filename = "#{filename}.new"
old_filename = "#{filename}.old"
File.open(new_filename, 'w') do |fileout|
File.foreach(filename) do |line_in|
fileout.puts line_in unless line_in[target]
end
end
File.rename(filename, old_filename)
File.rename(new_filename, filename)
File.delete(old_filename)
end
end
In your code you use:
File.open('D:\\mywork\\list.txt').read
Instead, a shorter and clearer way would be to use:
File.read('D:/mywork/list.txt')
Ruby will automatically adjust the pathname separators based on the OS so always use forward slashes for readability. From the IO documentation:
Ruby will convert pathnames between different operating system conventions if possible. For instance, on a Windows system the filename "/gumby/ruby/test.rb" will be opened as "\gumby\ruby\test.rb".
The problem using read is it isn't scalable. Imagine if you were doing this in a long term production system and your input file had grown into the TB range. You'd halt the processing on your system until the file could be read. Don't do that.
Instead use foreach to read line-by-line. See "Why is "slurping" a file not a good practice?". That'll remove the need for
value.gsub!(/\r\n?/, "\n")
value.each_line do |line|
line.chomp!
While
Dir.glob("D:/new_work/*-access.txt") do |fn|
is fine, its placement isn't. You're doing it for every line processed in your file being read, wasting CPU. Read it first and store the value, then iterate over that value repeatedly.
Again,
text = File.read(fn)
has scalability issues. Using foreach is a better solution. Again.
Replacing the text using gsub is fast, but it doesn't outweigh the potential problems of scalability when line-by-line IO is just as fast and sidesteps the issue completely:
replace = text.gsub(line.strip, "")
Opening and writing to the same file as you were reading is an accident waiting to happen in a production environment:
File.open(fn, "w") { |file| file.puts replace }
A better practice is to write to a separate, new, file, rename the old file to something safe, then rename the new file to the old file's name. This preserves the old file in case the code or machine crashes mid-save. Then, when that's finished it's safe to remove the old file. See "How to search file text for a pattern and replace it with a given value" for more information.
A final recommendation is to strip all the trailing commas from your input file. They're not accomplishing anything and are only making you do extra work to process the file.
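The write-then-rename practice described above can be sketched for a single file as follows. The helper name remove_string_from_file and the ".new"/".old" suffixes are assumptions. Unlike the untested code earlier, which drops whole matching lines, this variant removes only the matched string, which is closer to what the question asks for.

```ruby
# Hypothetical helper: removes every occurrence of target from the
# file at path by writing a new file and swapping it into place.
def remove_string_from_file(path, target)
  new_path = "#{path}.new"
  old_path = "#{path}.old"
  File.open(new_path, 'w') do |out|
    # A String pattern is matched literally, so backslashes are safe.
    File.foreach(path) { |line| out.write(line.gsub(target, '')) }
  end
  File.rename(path, old_path)   # keep the original until the swap is done
  File.rename(new_path, path)
  File.delete(old_path)
end
```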
I just ran your code and it works as expected on my machine. My best guess is that you're not taking the commas at the end of each line in list.txt into account. Try removing them with an extra chomp!:
value=File.open('D:\\mywork\\list.txt').read
value.gsub!(/\r\n?/, "\n")
value.each_line do |line|
line.chomp!
line.chomp!(",")
print "For the string: #{line}"
Dir.glob("D:/new_work/*-access.txt") do |fn|
print "checking files:#{fn}\n"
text = File.read(fn)
replace = text.gsub(line.strip, "")
File.open(fn, "w") { |file| file.puts replace }
end
end
By the way, you shouldn't need this line: value.gsub!(/\r\n?/, "\n") since you're chomping all the newlines away anyway, and chomp can recognize \r\n by default.

Deleting a specific line in a text file?

How can I delete a single, specific line from a text file? For example the third line, or any other line. I tried this:
line = 2
file = File.open(filename, 'r+')
file.each { last_line = file.pos unless file.eof? }
file.seek(last_line, IO::SEEK_SET)
file.close
Unfortunately, it does nothing. I tried a lot of other solutions, but nothing works.
I don't think you can do that safely because of file system limitations.
If you really want to edit in place, you could read the file into memory, edit it, and then replace the old file. But beware that there are at least two problems with this approach. First, if your program stops in the middle of rewriting, you will get an incomplete file. Second, if your file is too big, it will eat your memory.
file_lines = ''
IO.readlines(your_file).each do |line|
file_lines += line unless <put here your condition for removing the line>
end
<extra string manipulation to file_lines if you wanted>
File.open(your_file, 'w') do |file|
file.puts file_lines
end
Something along those lines should work, but using a temporary file is much safer and is the standard approach:
require 'fileutils'
File.open(output_file, "w") do |out_file|
File.foreach(input_file) do |line|
out_file.puts line unless <put here your condition for removing the line>
end
end
FileUtils.mv(output_file, input_file)
Your condition could be anything that identifies the unwanted line. For example, file_lines += line unless line.chomp == "aaab" would remove the line "aaab".
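For the case in the question, deleting a line by its number, the temporary-file approach might look like this sketch. The helper name delete_line and the ".tmp" suffix are assumptions:

```ruby
require 'fileutils'

# Hypothetical helper: deletes the line with the given 1-based number
# by copying every other line to a temporary file and moving it back.
def delete_line(path, line_number)
  tmp_path = "#{path}.tmp"
  File.open(tmp_path, 'w') do |out|
    File.foreach(path).with_index(1) do |line, index|
      out.write(line) unless index == line_number
    end
  end
  FileUtils.mv(tmp_path, path)
end
```

For example, delete_line(filename, 3) would remove the third line and leave the rest untouched.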

Ruby console overwrites line when printing out lines in a file

I have started learning Ruby and I have come across an annoying problem. I have imported a text file into my program and I want to iterate over the lines in it and print them out to the screen.
When I do this, the console overwrites the last printed out line and writes the new one on top. Why is this happening and how can I solve it?
Here is my code:
passwords = File.open('C:\Users\Ryan\Desktop\pw.txt', 'r')
lines = passwords.gets
for line in lines
puts line
end
Update:
The loop is acting very strange. I put a sleep statement into it and all it did was sleep once then continue to output the lines. I would have expected it to sleep before outputting each line. Example below:
passwords.each do |line|
sleep 1
puts line.chomp
end
Update 2:
I just created a new text file and typed some random stuff into it for testing and it works fine. Looks like the original file had some bad characters/encoding which messed up the printing to the console.
Do you have an EOL (AKA end-of-line) problem? Try this:
passwords = File.open('C:\Users\Ryan\Desktop\pw.txt', 'r')
lines = passwords.readlines
lines.each { |line| puts line.chomp }
passwords.close
The chomp call will strip off any \n, \r, or \r\n line endings, then puts will append the native EOL.
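A quick check of that behaviour. It also shows that chomp only touches the end of the string, so a "\r" in the middle of a line survives. On most terminals a carriage return moves the cursor back to the start of the line, which may be one way output ends up overwriting itself.

```ruby
# Each common line ending is removed by a single chomp call.
endings = ["\n", "\r", "\r\n"]
chomped = endings.map { |eol| "secret#{eol}".chomp }
# chomped is ["secret", "secret", "secret"]

# chomp removes only one trailing line ending; inner "\r" characters stay.
leftover = "abc\rxy\n".chomp
# leftover is "abc\rxy"
```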
File.open('C:\Users\Ryan\Desktop\pw.txt') do |file|
while not file.eof?
puts file.readline.chomp
end
end
or
File.read("file").each_line { |line| puts line.chomp }
In the end I found out that the text file was the cause of my problem. I created a new one with the same content and it started working how I intended.

How to get a particular line from a file

Is it possible to extract a particular line from a file knowing its line number? For example, just get the contents of line N as a string from file "text.txt"?
You could get it by index from readlines.
line = IO.readlines("file.txt")[42]
Only use this if it's a small file.
Try one of these two solutions:
file = File.open "file.txt"
#1 solution would eat a lot of RAM
p [*file][n-1]
#2 solution would not
n.times{ file.gets }
p $_
file.close
def get_line_from_file(path, line)
result = nil
File.open(path, "r") do |f|
while line > 0
line -= 1
result = f.gets
end
end
return result
end
get_line_from_file("/tmp/foo.txt", 20)
This is a good solution because:
You don't use File.read, thus you don't read the entire file into memory. Doing so could become a problem if the file is 20MB large and you read often enough so GC doesn't keep up.
You only read from the file until the line you want. If your file has 1000 lines, getting line 20 will only read the 20 first lines into Ruby.
You can replace gets with readline if you want to raise an error (EOFError) instead of returning nil when passing an out-of-bounds line.
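A variant of the same idea using File.foreach, offered as a sketch only. The helper name nth_line is made up. It stops reading as soon as the line is found, and File.foreach closes the file even on the early return.

```ruby
# Hypothetical helper: returns line n (1-based), or nil if the file has
# fewer than n lines. Reading stops as soon as the line is found.
def nth_line(path, n)
  File.foreach(path).with_index(1) do |line, index|
    return line if index == n
  end
  nil
end
```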
File has a nice lineno method.
def get_line(filename, lineno)
File.open(filename,'r') do |f|
f.gets until f.lineno == lineno - 1 || f.eof? # the eof? check avoids looping forever on short files
f.gets
end
end
linenumber=5
save=nil # declared outside the block so the value survives it
open("file").each_with_index{|line,ind|
if ind+1==linenumber
save=line
# break or exit if needed.
end
}
or
linenumber=5
f=open("file")
while line=f.gets
if $. == linenumber # $. is line number
print "#{f.lineno} #{line}" # another way
# break # break or exit if needed
end
end
f.close
If you just want to get the line and do nothing else, you can use this one-liner (print returns nil, so the two calls are joined with ; rather than and; otherwise exit would never run):
ruby -ne '(print $_; exit) if $.==5' file
If you want a one-liner and do not care about memory usage, use (assuming lines are numbered from 1)
lineN = IO.readlines('text.txt')[n-1]
or
lineN = f.readlines[n-1]
if you already have file opened.
Otherwise it would be better to do like this:
lineN = File.open('text.txt') do |f|
(n-1).times { f.gets } # skip lines preceding line N
f.gets # read line N contents
end
These solutions work if you want only one line from a file, or if you want multiple lines from a file small enough to be read repeatedly. Large files (for example, 10 million lines) take much longer to search for a specific line so it's better to get the necessary lines sequentially in a single read so the large file doesn't get read multiple times.
Create a large file:
File.open('foo', 'a') { |f| f.write((0..10_000_000).to_a.join("\n")) }
Pick which lines will be read from it and make sure they're sorted:
lines = [9_999_999, 3_333_333, 6_666_666].sort
Print out those lines:
File.open('foo') do |f|
lines.each_with_index do |line, index|
(line - (index.zero? ? 0 : lines[index - 1]) - 1).times { f.gets }
puts f.gets
end
end
This solution works for any number of lines, does not load the entire file into memory, reads as few lines as possible, and only reads the file one time.
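The same single-pass idea can also be written without the skip arithmetic by checking each line number against a set. A sketch; the helper name fetch_lines is hypothetical:

```ruby
require 'set'

# Hypothetical helper: returns a hash of line number (1-based) => line
# for the requested numbers. The file is read once, and reading stops
# after the highest requested line.
def fetch_lines(path, wanted)
  wanted = wanted.to_set
  return {} if wanted.empty?
  last = wanted.max
  found = {}
  File.foreach(path).with_index(1) do |line, index|
    found[index] = line if wanted.include?(index)
    break if index >= last
  end
  found
end
```

The requested numbers do not need to be sorted here, since membership is checked per line rather than by counting skipped lines.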

Ruby: Length of a line of a file in bytes?

I'm writing this little HelloWorld as a followup to this and the numbers do not add up
filename = "testThis.txt"
total_bytes = 0
file = File.new(filename, "r")
file.each do |line|
total_bytes += line.unpack("U*").length
end
puts "original size #{File.size(filename)}"
puts "Total bytes #{total_bytes}"
The result is not the same as the file size. I think I just need to know what format I need to plug in... or maybe I've missed the point entirely. How can I measure the file size line by line?
Note: I'm on Windows, and the file is encoded as type ANSI.
Edit: This produces the same results!
filename = "testThis.txt"
total_bytes = 0
file = File.new(filename, "r")
file.each_byte do |whatever|
total_bytes += 1
end
puts "Original size #{File.size(filename)}"
puts "Total bytes #{total_bytes}"
so anybody who can help now...
IO#gets works the same as if you were capturing input from the command line: the "Enter" isn't sent as part of the input; neither is it passed when #gets is called on a File or other subclass of IO, so the numbers are definitely not going to match up.
See the relevant Pickaxe section
May I enquire why you're so concerned about the line lengths summing to the file size? You may be solving a harder problem than is necessary...
Aha. I think I get it now.
Lacking a handy iPod (or any other sort, for that matter), I don't know if you want exactly 4K chunks, in which case IO#read(4000) would be your friend (4000 or 4096?) or if you're happier to break by line, in which case something like this ought to work:
class Chunkifier
def Chunkifier.to_chunks(path)
chunks, current_chunk_size = [""], 0
File.readlines(path).each do |line|
line.chomp! # strips off \n, \r or \r\n depending on OS
if chunks.last.size + line.size >= 4_000 # 4096?
chunks.last.chomp! # remove last line terminator
chunks << ""
end
chunks.last << line + "\n" # or whatever terminator you need
end
chunks
end
end
if __FILE__ == $0
require 'test/unit'
class TestFile < Test::Unit::TestCase
def test_chunking
chs = Chunkifier.to_chunks(PATH)
chs.each do |chunk|
assert 4_000 >= chunk.size, "chunk is #{chunk.size} bytes long"
end
end
end
end
Note the use of IO#readlines to get all the text in one slurp: #each or #each_line would do as well. I used String#chomp! to ensure that whatever the OS is doing, the bytes at the end are removed, so that \n or whatever can be forced into the output.
I would suggest using File#write, rather than #print or #puts for the output, as the latter have a tendency to deliver OS-specific newline sequences.
If you're really concerned about multi-byte characters, consider taking the each_byte or unpack("C*") options and monkey-patching String, something like this:
class String
def size_in_bytes
self.unpack("C*").size
end
end
The unpack version is about 8 times faster than the each_byte one on my machine, btw.
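On Ruby 1.9 and later, String#bytesize is built in and returns the byte count directly, so the monkey patch may not be needed. A small sketch comparing it with length and with the unpack approach:

```ruby
text = "h\u00e9llo"       # the \u escape yields a UTF-8 encoded string
chars = text.length       # 5 characters
bytes = text.bytesize     # 6 bytes: U+00E9 takes two bytes in UTF-8
same = text.unpack("C*").size == bytes   # unpack("C*") counts bytes too
```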
You might try IO#each_byte, e.g.
total_bytes = 0
file_name = "test_this.txt"
File.open(file_name, "r") do |file|
file.each_byte {|b| total_bytes += 1}
end
puts "Original size #{File.size(file_name)}"
puts "Total bytes #{total_bytes}"
That, of course, doesn't give you a line at a time. Your best option for that is probably to go through the file via each_byte until you encounter \r\n. The IO class provides a bunch of pretty low-level read methods that might be helpful.
You potentially have several overlapping issues here:
Linefeed characters \r\n vs. \n (as per your previous post). Also EOF file character (^Z)?
Definition of "size" in your problem statement: do you mean "how many characters" (taking into account multi-byte character encodings) or do you mean "how many bytes"?
Interaction of the $KCODE global variable (deprecated in ruby 1.9. See String#encoding and friends if you're running under 1.9). Are there, for example, accented characters in your file?
Your format string for #unpack. I think you want C* here if you really want to count bytes.
Note also the existence of IO#each_line (just so you can throw away the while and be a little more ruby-idiomatic ;-)).
The issue is that when you save a text file on Windows, your line breaks are two characters (characters 13 and 10) and therefore 2 bytes; when you save it on Linux there is only 1 (character 10). However, Ruby reports both of these as a single character "\n" (character 10). What's worse is that if you're on Linux with a Windows file, Ruby will give you both characters.
So, if you know that your files always come from Windows text files and are processed on Windows, every time you get a newline character you can add 1 to your count. Otherwise it's a couple of conditionals and a little state machine.
BTW there's no EOF 'character'.
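Assuming the mismatch does come from text-mode newline translation on Windows, another option is to open the file in binary mode, where no translation happens, and count bytes per line. A sketch; the helper name total_line_bytes is made up:

```ruby
# Hypothetical helper: sums the byte length of every line, reading in
# binary mode so that "\r\n" is not translated to "\n" on Windows.
def total_line_bytes(path)
  total = 0
  File.open(path, 'rb') do |file|
    file.each_line { |line| total += line.bytesize }
  end
  total
end
```

Read this way, each line still includes its own line ending, so the total should match File.size for the same path.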
f = File.new("log.txt")
begin
while (line = f.readline)
line.chomp! # plain chomp returns a new string and leaves line unchanged
puts line.length
end
rescue EOFError
f.close
end
Here is a simple solution, presuming that the current file pointer is set to the start of a line in the read file:
last_pos = file.pos
next_line = file.gets
current_pos = file.pos
backup_dist = last_pos - current_pos
file.seek(backup_dist, IO::SEEK_CUR)
in this example "file" is the file from which you are reading. To do this in a loop:
last_pos = file.pos
begin loop
next_line = file.gets
current_pos = file.pos
backup_dist = last_pos - current_pos
last_pos = current_pos
file.seek(backup_dist, IO::SEEK_CUR)
end loop
