Reading text file lines in Ruby

I would like to scan each line in a text file, EXCEPT the first line.
I would usually do:
while line = file.gets do
...
...etc
end
but line = file.gets reads EVERY single line starting from the first.
How do I read from the second line onwards?

Why not simply call file.gets once and discard the result:
file.gets
while line = file.gets
# code here
end

I would do it in a simple fashion:
IO.readlines('filename').drop(1).each do |line| # drop the first array element
# do any proc here
end

Do you actually want to avoid reading the first line, or just avoid doing something with it? If you are OK with reading the line but want to skip processing it, you can use lineno to ignore it during processing, as follows:
f = File.new "/tmp/xx"
while line = f.gets do
puts line unless f.lineno == 1
end
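The answers above can also be combined into a small helper that never loads the whole file. This is only a sketch: the name each_line_after_first is made up for illustration, and the demo writes its own throwaway file.

```ruby
require 'tempfile'

# Hypothetical helper: yields every line of the file at `path` except the first.
def each_line_after_first(path)
  File.foreach(path).with_index do |line, index|
    next if index.zero? # skip the first line
    yield line
  end
end

# Demo on a temporary file so the sketch is self-contained.
Tempfile.create('demo') do |tmp|
  tmp.write("header\nfirst\nsecond\n")
  tmp.flush
  each_line_after_first(tmp.path) { |line| puts line }
end
```

File.foreach reads one line at a time, so this should behave like the file.gets version while closing the file for you.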

Related

In Ruby, why aren't variables interchangeable within code blocks?

I have a file called "file1.txt":
Ruby
programming
is fun
In files.rb, which I'm calling from IRB, I have:
File.open('file1.txt', 'r') do |file|
while line = file.gets
puts "** " + line.chomp + " **" #--> why can't I use file.gets.chomp?
end
end
Why aren't line and file.gets interchangeable on line 3? If I replace line with file.gets, the code does not work, and I am a little perplexed, considering that
line = file.gets
and
file.gets = line
should be interchangeable, but in this case they are not, as it gives me an error. The code works with line.chomp.
I tried getting rid of the while code block, and simply writing
puts file.gets
and it outputs a line of text from file1.txt, but it does not work inside the while statement on line 3.
I'm not really into Ruby, but I think it is because every call to file.gets reads and returns the next line. With while line = file.gets, the loop condition has already read one line, so calling file.gets again inside the loop reads the line after it. On the final iteration the condition returns the last line; there are no more lines in the file, so the second file.gets returns nil, and calling chomp on nil raises an error (NoMethodError).
This is untested, but your code can be reduced to:
File.foreach('file1.txt') do |line|
puts "** " + line.chomp + " **"
end
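To see why line and file.gets are not interchangeable, here is a minimal sketch. It uses StringIO as a stand-in for the file (my assumption, to keep the example self-contained); gets behaves the same way on a real File.

```ruby
require 'stringio'

# StringIO stands in for file1.txt here.
io = StringIO.new("Ruby\nprogramming\nis fun\n")

seen = []
while line = io.gets
  seen << line.chomp # `line` is the line the condition just read
end

at_eof = io.gets # nothing is left, so gets returns nil instead of a string

p seen   # the three lines, without their newlines
p at_eof # nil; calling at_eof.chomp would raise NoMethodError
```

Each call to gets moves forward by one line, so a second call inside the loop body never sees the same line as the condition did.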

Ruby read file line issue

I'm trying to read a file into a string. For instance, I tried reading this file:
123456
23456
3456
456
56
6
I tried:
contents = File.open("test.txt", "rb").read
print contents
IO.foreach('test.txt') do |line|
print line
end
File.open('test.txt', 'r').each_line do |line|
print line
end
but I seem to get a single line that overwrites its contents with each new line. I get 666666.
The issue has to be that the file uses the CR line terminator (or your terminal is messed up and not responding to LF). print does not move to a new line by default (use puts if that's what you want), and each_line does not strip the line terminator. So what happens is that print "123456\r" prints 123456 and then returns the cursor to the start of the line without moving to the next line (so the cursor is on the 1). Then, when you print "23456\r", it overwrites the first five characters and again comes back to the start, the current state being 234566. In the end, 566666 is overwritten by "6\r", giving the final 666666.
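If the file really does use CR terminators, one way around it is to split on every newline style yourself. A rough sketch, where split_any_newline is a name invented for this example and the sample string imitates the file from the question:

```ruby
# Hypothetical helper: splits text on CRLF, a lone CR, or LF.
def split_any_newline(text)
  text.split(/\r\n|\r|\n/)
end

# Sample data imitating a file with CR-only line endings.
sample = "123456\r23456\r3456\r"
split_any_newline(sample).each { |line| puts line }
```

With a real file you would pass File.read('test.txt') instead of the sample string; puts then adds a proper newline after each piece.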
Why not try the simple solution
# ruby sample code.
# process every line in a text file with ruby (version 1).
file='test.txt'
File.readlines(file).each do |line|
puts line
end
Second approach
# ruby sample code.
# process every line in a text file with ruby (version 2).
file='test.txt'
f = File.open(file, "r")
f.each_line { |line|
puts line
}
f.close

Force the file opened with "r+" to end

There is a file with some marker word in it:
qwerty
I am the marker!
zxcvbn
123456
I want to overwrite the rest of the file after the marker with an unknown number of lines:
qwerty
I am the marker!
inserted line #1
inserted line #2
inserted line #3
But if too few lines are inserted, the tail of the old content, which I do not need, is still there:
qwerty
I am the marker!
inserted line #1
123456
Here is my code (simplified):
File.open("file.txt", "r+") do |file|
file.gets "marker"
file.gets
lines_to_insert.each do |line|
file.puts line
end
# I wish I could do file.put_EOF here
end
File.open("file.txt", "r+") do |file|
file.gets "marker"
file.gets
lines_to_insert.each do |line|
file.puts line
end
# EOF here
file.truncate(file.pos)
end
Making use of File#pos to specify where to truncate.
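Here is the same idea as a self-contained sketch you can run against a temporary file. The helper name replace_after_marker and its parameters are mine, not part of the answer; the explicit flush is only a precaution and may well be unnecessary.

```ruby
require 'tempfile'

# Hypothetical helper: overwrites everything after the marker line with
# `new_lines`, then cuts the file off at the current position.
def replace_after_marker(path, marker, new_lines)
  File.open(path, 'r+') do |file|
    file.gets(marker)           # read up to and including the marker text
    file.gets                   # consume the rest of the marker line
    new_lines.each { |line| file.puts line }
    file.flush                  # precaution: push buffered writes out first
    file.truncate(file.pos)     # drop whatever old content is left
  end
end

# Demo with the file contents from the question.
Tempfile.create('demo') do |tmp|
  tmp.write("qwerty\nI am the marker!\nzxcvbn\n123456\n")
  tmp.flush
  replace_after_marker(tmp.path, 'marker', ['inserted line #1'])
  puts File.read(tmp.path)
end
```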
How about using a temp file?
require 'fileutils'
File.open("file.tmp", "w") do |tmp_file|
  File.open("file.txt", "r+") do |file|
    file.readlines.each do |line|
      # add each line of the original file up to and including marker line
      tmp_file.puts line
      break if line.include? "marker" # or however you're indicating marker
    end
    # add new lines
    lines_to_insert.each do |line|
      tmp_file.puts line
    end
  end
end
FileUtils.mv 'file.tmp', 'file.txt'
This will guarantee a file with a proper EOF line and not a hacky set of lines at the end that are nothing but newline characters or spaces.
Why not fill an array with each line, by using something like this:
array = File.read("file.txt").split("\n")
Then you can find the index of the line that contains the word marker:
marker_index = array.index { |line| line.include?('marker') }
Then replace everything at an index > marker_index with the lines you want to insert.
Finally, concatenate all the strings in your array (don't forget to add your \n back in) and write the result back to your file.
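Put together, those steps might look like the sketch below. rewrite_after_marker is a hypothetical name, and it works on a string rather than a file so that the logic is easy to check on its own.

```ruby
# Hypothetical helper: keeps the lines up to and including the marker line,
# then appends `new_lines`. Returns the text unchanged if no marker is found.
def rewrite_after_marker(text, marker, new_lines)
  lines = text.split("\n")
  marker_index = lines.index { |line| line.include?(marker) }
  return text if marker_index.nil?
  (lines[0..marker_index] + new_lines).join("\n") + "\n"
end

original = "qwerty\nI am the marker!\nzxcvbn\n123456\n"
puts rewrite_after_marker(original, 'marker', ['inserted line #1'])
```

To apply it to the file, something like File.write('file.txt', rewrite_after_marker(File.read('file.txt'), 'marker', lines_to_insert)) should do, at the cost of holding the whole file in memory.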

Regular Expression matching in ruby, checking for white space

I am trying to check a file for whitespace at the beginning of each line. I want the whitespace at the beginning of the lines to be consistent: all lines should start with spaces, or all with tabs. I wrote the code below, but it isn't working for me. If one line begins with a space and another line begins with a tab, it should print a warning or something.
file = File.open("file_tobe_checked","r") #I'm opening the file to be checked
while (line = file.gets)
a=(line =~ /^ /).nil?
b=(line =~/^\t/).nil?
if a==false && b==false
print "The white spaces at the beginning of each line are not consistent"
end
end
file.close
This is one solution where you don't read the file or the extracted lines array twice:
#!/usr/bin/env ruby
file = ARGV.shift
tabs = spaces = false
File.readlines(file).each do |line|
line =~ /^\t/ and tabs = true
line =~ /^ / and spaces = true
if spaces and tabs
puts "The white spaces at the beginning of each line are not consistent."
break
end
end
Usage:
ruby script.rb file_to_be_checked
And it may be more efficient to compare lines with these:
line[0] == "\t" and tabs = true
line[0] == ' ' and spaces = true
You may also prefer each_line over readlines: each_line reads the file line by line instead of reading all the lines in one shot:
File.open(file).each_line do |line|
How important is it that you check for the whitespace (and warn/notify accordingly)? If you are aiming to just correct the whitespace, .strip is great at taking care of errant whitespace. It also removes the trailing newline, so write each line back with puts rather than write.
lines_array = File.readlines(file_to_be_checked)
File.open(file_to_be_checked, "w") do |f|
  lines_array.each do |line|
    # Modify the line as you need and write the result
    f.puts(line.strip)
  end
end
I assume that no line can begin with one or more spaces followed by a tab, or vice-versa.
To merely conclude that there are one or more inconsistencies within the file is not very helpful in dealing with the problem. Instead you might consider giving the line number of the first line that begins with a space or tab, then giving the line numbers of all subsequent lines that begin with a space or tab that does not match the first line found with such. You could do that as follows (sorry, untested).
def check_file(fname)
  first_white = nil
  File.foreach(fname).with_index(1) do |line, line_no|
    preface = line[/\A[ \t]/] # leading space or tab, nil otherwise
    next unless preface
    # report the first indented line, then every line that disagrees with it
    if first_white.nil? || preface != first_white
      puts "Line #{line_no} begins with a #{preface == "\t" ? "tab" : "space"}"
    end
    first_white ||= preface
  end
end
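The answer above assumes that no single line mixes spaces and tabs in its indentation. If you want to drop that assumption, a sketch like the following could flag such lines. mixed_indent_lines is a made-up name, and it takes the file contents as a string.

```ruby
# Hypothetical helper: returns the 1-based numbers of the lines whose leading
# whitespace contains both a space and a tab.
def mixed_indent_lines(text)
  text.each_line.with_index(1).select do |line, _line_no|
    indent = line[/\A[ \t]*/] # leading run of spaces/tabs, possibly empty
    indent.include?(' ') && indent.include?("\t")
  end.map { |_line, line_no| line_no }
end

sample = "a\n \tb\n\tc\n  d\n\t e\n"
p mixed_indent_lines(sample)
```

For a real file you would call it as mixed_indent_lines(File.read(fname)).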

How to get a particular line from a file

Is it possible to extract a particular line from a file knowing its line number? For example, just get the contents of line N as a string from file "text.txt"?
You could get it by index from readlines.
line = IO.readlines("file.txt")[42]
Only use this if it's a small file.
Try one of these two solutions:
file = File.open "file.txt"
#1 solution would eat a lot of RAM
p [*file][n-1]
#2 solution would not
n.times{ file.gets }
p $_
file.close
def get_line_from_file(path, line)
result = nil
File.open(path, "r") do |f|
while line > 0
line -= 1
result = f.gets
end
end
return result
end
get_line_from_file("/tmp/foo.txt", 20)
This is a good solution because:
You don't use File.read, thus you don't read the entire file into memory. Doing so could become a problem if the file is 20MB large and you read often enough so GC doesn't keep up.
You only read from the file until the line you want. If your file has 1000 lines, getting line 20 will only read the 20 first lines into Ruby.
You can replace gets with readline if you want to raise an error (EOFError) instead of returning nil when passing an out-of-bounds line.
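The readline variant mentioned in the last point could look roughly like this. fetch_line! is a hypothetical name, and the sketch takes any IO-like object (a StringIO in the demo) instead of a path to keep it self-contained.

```ruby
require 'stringio'

# Hypothetical helper: returns line `line_number` (1-based) from `io`,
# raising EOFError when the input has fewer lines than that.
def fetch_line!(io, line_number)
  result = nil
  line_number.times { result = io.readline }
  result
end

io = StringIO.new("first\nsecond\nthird\n")
puts fetch_line!(io, 2)

begin
  fetch_line!(StringIO.new("only one line\n"), 5)
rescue EOFError
  puts "line number is out of bounds"
end
```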
File has a nice lineno method.
def get_line(filename, lineno)
File.open(filename,'r') do |f|
f.gets until f.lineno == lineno - 1 || f.eof? # stop at EOF so a short file cannot loop forever
f.gets
end
end
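linenumber = 5
save = nil # declared outside the block so it is still visible afterwards
File.open("file") do |f|
  f.each_with_index do |line, ind|
    if ind + 1 == linenumber
      save = line
      break # no need to read the rest of the file
    end
  end
end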
or
linenumber=5
f=open("file")
while line=f.gets
if $. == linenumber # $. is line number
print "#{f.lineno} #{line}" # another way
# break # break or exit if needed
end
end
f.close
If you just want to get the line and do nothing else, you can use this one-liner:
ruby -ne '(print $_; exit) if $. == 5' file
If you want a one-liner and do not care about memory usage, use (assuming lines are numbered from 1)
lineN = IO.readlines('text.txt')[n-1]
or
lineN = f.readlines[n-1]
if you already have file opened.
Otherwise it would be better to do like this:
lineN = File.open('text.txt') do |f|
(n-1).times { f.gets } # skip lines preceding line N
f.gets # read line N contents
end
These solutions work if you want only one line from a file, or if you want multiple lines from a file small enough to be read repeatedly. Large files (for example, 10 million lines) take much longer to search for a specific line so it's better to get the necessary lines sequentially in a single read so the large file doesn't get read multiple times.
Create a large file:
File.open('foo', 'a') { |f| f.write((0..10_000_000).to_a.join("\n")) }
Pick which lines will be read from it and make sure they're sorted:
lines = [9_999_999, 3_333_333, 6_666_666].sort
Print out those lines:
File.open('foo') do |f|
lines.each_with_index do |line, index|
(line - (index.zero? ? 0 : lines[index - 1]) - 1).times { f.gets }
puts f.gets
end
end
This solution works for any number of lines, does not load the entire file into memory, reads as few lines as possible, and only reads the file one time.
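A variation on the same single-pass idea, in case the wanted line numbers are not sorted: look them up in a Set and stop after the largest one. This is only a sketch; pick_lines is a hypothetical name, and it accepts any IO-like object so it can be tried without creating a large file.

```ruby
require 'set'
require 'stringio'

# Hypothetical helper: returns a hash of line number => line for the requested
# 1-based line numbers, reading `io` once and stopping after the last one.
def pick_lines(io, wanted_numbers)
  wanted = wanted_numbers.to_set
  last = wanted.max
  found = {}
  return found if last.nil? # nothing was requested
  io.each_line.with_index(1) do |line, number|
    found[number] = line if wanted.include?(number)
    break if number >= last
  end
  found
end

io = StringIO.new("a\nb\nc\nd\ne\n")
p pick_lines(io, [4, 2])
```

With a real file you would wrap the call in File.open('foo') { |f| pick_lines(f, lines) }.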
