Ruby - how to read first n lines from file into array

For some reason, I can't find any tutorial mentioning how to do this...
So, how do I read the first n lines from a file?
I've come up with:
while File.open('file.txt') and count <= 3 do |f|
...
count += 1
end
end
but it is not working and it also doesn't look very nice to me.
Just out of curiosity, I've tried things like:
File.open('file.txt').10.times do |f|
but that didn't really work either.
So, is there a simple way to read just the first n lines without having to load the whole file?
Thank you very much!

Here is a one-line solution:
lines = File.foreach('file.txt').first(10)
I was worried that it might not close the file promptly (it might only close the file after the garbage collector deletes the Enumerator returned by File.foreach). However, I used strace and found that if you call File.foreach without a block, it returns an enumerator, and each time you call #first on that enumerator it opens the file, reads as much as it needs, and then closes the file. That's nice, because it means you can use the line of code above and Ruby will not keep the file open any longer than it needs to.
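A quick way to try the one-liner is to write a small temporary file and take its first lines. The helper name `first_lines` and the use of `Tempfile` are only for illustration:

```ruby
require 'tempfile'

# Hypothetical helper: returns the first n lines of the file at path,
# newlines included. Without a block, File.foreach returns an Enumerator,
# and #first stops reading once it has n lines.
def first_lines(path, n)
  File.foreach(path).first(n)
end

Tempfile.create('demo') do |f|
  f.write("one\ntwo\nthree\nfour\n")
  f.flush
  p first_lines(f.path, 2) # prints ["one\n", "two\n"]
end
```

If the file has fewer than n lines, #first simply returns all of them.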

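There are many ways you can approach this problem in Ruby. Here's one way (the block's return value is assigned to lines, so the array is still available after the file has been closed):
lines = File.open('Gemfile') do |f|
  10.times.map { f.readline }
end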
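Note that `readline` raises `EOFError` if the file has fewer than 10 lines. A variant that tolerates short files, sketched here as a hypothetical `first_lines_safe` helper, uses `gets`, which returns `nil` at end of file, and drops the `nil`s with `compact`:

```ruby
# Hypothetical helper: reads up to n lines. IO#gets returns nil at end of
# file, so compact removes the padding when the file is shorter than n lines.
def first_lines_safe(path, n)
  File.open(path) do |f|
    n.times.map { f.gets }.compact
  end
end

# Usage: first_lines_safe('Gemfile', 10)
```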

File.foreach('file.txt').with_index do |line, i|
  break if i >= 10
  puts line
end

File inherits from IO, and IO includes Enumerable, whose methods include #first.
Passing an integer to first(n) will return the first n items in the enumerable collection. For a File object, each item is a line in the file. Using the block form of File.open closes the file as soon as the lines have been read:
File.open('filename.txt', 'r') { |f| f.first(10) }
This returns an array of the lines including the \n line breaks.
You may want to #join them to create a single whole string.
File.open('filename.txt', 'r') { |f| f.first(10).join }
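If you want the lines without their trailing line breaks, one option is to `chomp` each one. `first_lines_chomped` is a hypothetical name for this sketch:

```ruby
# Hypothetical helper: first n lines with the trailing "\n" (or "\r\n")
# removed. The block form of File.open closes the file afterwards.
def first_lines_chomped(path, n)
  File.open(path, 'r') { |f| f.first(n).map(&:chomp) }
end
```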

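You could try the following:
`head -n 10 file`.lines
(String#lines keeps each line intact; split with no argument would also break lines apart at spaces.) It's not really "pure Ruby", and it depends on the external head command, but that's rarely a requirement these days.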
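If you do shell out to `head`, passing the command as an array to `IO.popen` runs it without a shell, so a file name containing spaces or shell metacharacters is passed through unchanged. This sketch assumes a Unix-like system with `head` on the `PATH`; `head_lines` is a hypothetical helper:

```ruby
# Hypothetical helper: runs `head -n N -- path` without a shell and
# returns its output as an array of lines (newlines included).
def head_lines(path, n)
  IO.popen(['head', '-n', n.to_s, '--', path], &:readlines)
end
```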

Related

How to read a file more than one time in Cinch and Ruby

I have this code:
on :message, "something" do |m|
  m.reply file.read.lines[2]
end
...which works, but only once. When I try it again or use the same code but with a different file, it doesn't work. Can someone help me do this?
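The reason you're getting this behavior depends on how you're defining file. If it is an IO object that is opened once and reused, the first file.read moves the read position to the end of the file, so every later file.read returns an empty string.
But with no other information it's still possible to give a (hopefully) working example: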
on :message, "something" do |m|
  first_line_to_read = 0
  last_line_to_read = 2
  lines_of_text = file.read.split("\n")
  first_line_to_read.upto(last_line_to_read).each do |idx|
    m.reply lines_of_text[idx]
  end
end
Hopefully this example is clear. It sends separate replies for each line of the text that is within the bounds of the first_line_to_read and last_line_to_read indexes.
Some important concepts are:
Read the entire file into a string and store it to a variable. If for whatever reason you can't call file.read multiple times, this will store the result of calling it the first time.
Split the string by newlines
Use an iterator to go one-by-one through the desired lines of text
Inside the iterator block, use the m variable, which is defined in the enclosing scope, to send the message.
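To see the works-only-once behaviour outside Cinch, this sketch assumes file is an IO object opened once (the question doesn't say how it is defined): the second `read` returns an empty string, and calling `rewind` first fixes it. `nth_line` is a hypothetical helper:

```ruby
require 'tempfile'

# Hypothetical helper: rewinds before reading, so repeated calls on the
# same open IO object keep returning the requested line.
def nth_line(file, idx)
  file.rewind
  file.read.lines[idx]
end

Tempfile.create('cinch') do |f|
  f.write("line 0\nline 1\nline 2\n")
  f.flush
  File.open(f.path) do |file|
    p file.read.lines[2] # prints "line 2\n"
    p file.read          # prints "" because the position is now at end of file
    p nth_line(file, 2)  # prints "line 2\n" again
  end
end
```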

How to reset value of local variable within loop?

I'd like to point out I tried quite extensively to find a solution for this and the closest I got was this. However I couldn't see how I could use map to solve my issue here. I'm brand new to Ruby so please bear that in mind.
Here's some code I'm playing with (simplified):
def base_word input
  input_char_array = input.split('') # split string to array of chars
  @file.split("\n").each do |dict_word|
    input_text = input_char_array
    dict_word.split('').each do |char|
      if input_text.include? char.downcase
        input_text.slice!(input_text.index(char))
      end
    end
  end
end
I need to reset the value of input_text back to the original value of input_char_array after each cycle, but from what I gather since Ruby is reference-based, the modifications I make with the line input_text.slice!(input_text.index(char)) are reflected back in the original reference, and I end up assigning input_text to an empty array fairly quickly as a result.
How do I mitigate that? As mentioned I've tried to use .map but maybe I haven't fully wrapped my head around how I ought to go about it.
You can get an independent copy by duplicating the array at the start of each cycle. This, obviously, has some RAM usage implications.
input_text = input_char_array.dup
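The important part is that `dup` is called inside the loop, so each dictionary word starts from a fresh copy. A minimal sketch, with a hypothetical `remaining_letters` method and an inline word list standing in for the question's `@file`:

```ruby
# Hypothetical helper: for each dictionary word, removes one matching
# character per letter from a fresh copy of the input's characters.
def remaining_letters(input, dictionary_words)
  input_char_array = input.chars
  dictionary_words.map do |dict_word|
    input_text = input_char_array.dup # independent copy for this word
    dict_word.each_char do |char|
      idx = input_text.index(char.downcase)
      input_text.slice!(idx) if idx
    end
    input_text.join
  end
end

p remaining_letters('listen', %w[list ten]) # prints ["en", "lis"]
```

Because every word gets its own copy, input_char_array is never emptied.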
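The Short and Quite Frankly Not Very Good Answer
Using slice! modifies the array in place: it removes the selected element and returns it, so every variable that refers to that same array (here both input_text and input_char_array) sees the change.
If you use plain old slice instead, it won't modify input_text; it only returns the selected element, so you have to work with its return value.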
The Longer and Quite Frankly Much Better Answer
In Ruby, code nested four levels deep is often a smell. Let's refactor, and avoid the need to reset a loop at all.
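Instead of splitting the file by newline, we'll use Ruby's built-in File class to read the lines. Memoizing the result (the ||= operator) avoids reloading the file each time it's referenced, if we're running this more than once. It is the array of lines that should be memoized, not the open File object: once a File has been read to the end, iterating over it again yields nothing.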
def dictionary
  @dict ||= File.readlines('/path/to/dictionary', chomp: true)
end
We could also immediately make all the words lowercase when we open the file, since every character is downcased individually in the original example.
def downcased_dictionary
  @downcased_dict ||= File.readlines('/path/to/dictionary', chomp: true).map(&:downcase)
end
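A small sketch of why memoizing an open `File` object is risky: once `each` has read to the end of the file, a second `each` on the same object yields nothing, while an array of lines can be iterated as often as needed. A temporary file stands in for the dictionary, and `downcased_words` is a hypothetical name:

```ruby
require 'tempfile'

# Hypothetical helper: reads the dictionary once into an array of
# lowercase words which, unlike an open File, can be iterated repeatedly.
def downcased_words(path)
  File.readlines(path, chomp: true).map(&:downcase)
end

Tempfile.create('dict') do |f|
  f.write("Apple\nBanana\n")
  f.flush

  File.open(f.path) do |file|
    p file.each.to_a.size # prints 2
    p file.each.to_a.size # prints 0: the File is already at end of file
  end

  p downcased_words(f.path) # prints ["apple", "banana"]
end
```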
Next, we'll use Ruby's built-in file and string functions, including #each_char, to do the comparisons and output the results. We don't need to convert any inputs into Arrays (at all!), because #include? works on strings, and #each_char iterates over the characters of a string.
We'll decompose the string-splitting into its own method, so the loop logic and string logic can be understood more clearly.
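Lastly, by building a new string at each step (String#sub returns a copy) instead of modifying it in place with #slice!, we never overwrite input_text and entirely avoid the need to reset the variable later.
def base_word(input)
  input_text = input.to_s # Coerce in case it's not a string
  # For each dictionary word, strip one occurrence of each of its characters
  # from the input; returns one leftover string per word
  dictionary.map do |word|
    word.chomp.downcase.each_char.reduce(input_text) { |remaining, char| slice_base_word(remaining, char) }
  end
end
def slice_base_word(input, char)
  # String#sub returns a new string without the first occurrence of char
  input.include?(char) ? input.sub(char, '') : input
end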

Learn Ruby the Hard Way ex17 extra credit 3 - consolidating to one line

For exercise 17, through searching other responses I was able to condense the following into one line (as asked in the extra credit #3)
from_file, to_file = ARGV
script = $0
input = File.open(from_file)
indata = input.read()
output = File.open(to_file, 'w')
output.write(indata)
output.close()
input.close()
I was able to condense it into:
from_file, to_file = ARGV
script = $0
File.open(to_file, 'w') {|f| f.write IO.read(from_file)}
Is there a better/different way to condense this into 1 line?
Can someone help explain the line I created? I created this from various questions/answers unrelated to this question. I have tried looking up exactly what I did but I am still a little lost and want a full understanding of it.
Similar to using IO::read to simplify "just read the whole file into a string", you can use IO::write to "just write the string to the file":
from_file, to_file = ARGV
IO.write(to_file, IO.read(from_file))
Since you don't use script, it can be removed. If you really want to get things down to one line, you can do:
IO.write(ARGV[1], IO.read(ARGV[0]))
I personally find this just as comprehensible, and the lack of error checking is equivalent.
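A different one-liner, if the goal is simply copying: `IO.copy_stream` accepts file names as well as IO objects, so `IO.copy_stream(ARGV[0], ARGV[1])` does the whole job. It is wrapped here in a hypothetical `copy_file` method so it can be tried directly:

```ruby
# Hypothetical wrapper; in the exercise script the body alone,
# IO.copy_stream(ARGV[0], ARGV[1]), is the whole one-liner.
# Given paths, copy_stream opens and closes both files itself and
# returns the number of bytes copied.
def copy_file(from_file, to_file)
  IO.copy_stream(from_file, to_file)
end
```

Unlike IO.read, this does not build the whole file as one string in your script before writing it.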
You're using File#open with a block to open to_file in write-only mode ('w'). Inside the block you have access to the open file as f, and the file will be closed for you when the block terminates. IO::read reads the entire contents of from_file, which you then pass to IO#write on f (File is a subclass of IO), writing those contents to f (which is the open, write-only File for to_file).
There are always different ways of doing things:
Using File.open with a block is a good approach here. I like that to_file and from_file are declared in variables. So I think this is a good and readable solution that is not overly verbose.
The basic approach here is swapping out open/close operations with the cleaner File.open method with a block. File.open with a block will open a file, run the block, and then close the file, which is exactly what is needed here. Because the method automatically opens and closes the file, we are able to remove the boilerplate code that appears in the initial example. IO.read is another shortcut method that allows us to open/read/close the file without all of the open/close boilerplate. This is an exercise to learn more about Ruby's standard File/IO library, and in this case swapping out the more verbose methods is sufficient to reduce things to a single line.
I'm just a complete beginner, but this works for me:
open(ARGV[1], 'w').write(open(ARGV[0]).read)
It doesn't look elegant to me, but it works.
Edit: in case it's not clear, this is my attempt to put the entire script on one line.

Check if file contains string

So I found this question on here, but I'm having an issue with the output and how to handle it with an if statement. This is what I have, but it always says true, even if the word monitor does not exist in the file:
if File.readlines("testfile.txt").grep(/monitor/)
  # do something
end
Should it be something like == "nil"? I'm quite new to Ruby and not sure what the outputs would be.
I would use:
if File.readlines("testfile.txt").grep(/monitor/).any?
or
if File.readlines("testfile.txt").any?{ |l| l['monitor'] }
Using readlines has scalability issues though as it reads the entire file into an array. Instead, using foreach will accomplish the same thing without the scalability problem:
if File.foreach("testfile.txt").grep(/monitor/).any?
or
if File.foreach("testfile.txt").any?{ |l| l['monitor'] }
See "Why is "slurping" a file not a good practice?" for more information about the scalability issues.
Enumerable#grep does not return a boolean; it returns an array (how would you have access to the matches without passing a block otherwise?).
If no matches are found it returns an empty array, and [] evaluates to true. You'll need to check the size of the array in the if statement, i.e.:
if File.readlines("testfile.txt").grep(/monitor/).size > 0
# do something
end
The documentation should be your first resource for questions like this.
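In Ruby only `nil` and `false` are falsy, which is why an empty array still passes the `if`. A quick check, with a literal array of lines standing in for the file and a hypothetical `contains_match?` helper:

```ruby
# Hypothetical helper: true if any line matches pattern.
def contains_match?(lines, pattern)
  lines.grep(pattern).any?
end

lines = ["alpha\n", "beta\n"]   # stand-in for File.readlines("testfile.txt")
matches = lines.grep(/monitor/)
p matches                       # prints []
p(matches ? 'truthy' : 'falsy') # prints "truthy"
p contains_match?(lines, /monitor/) # prints false
```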
Grep will give you an array of all found 'monitor's. But you don't want an array, you want a boolean: is there any 'monitor' string in this file?
This one reads as little of the file as needed:
if File.open('test.txt').lines.any?{|line| line.include?('monitor')}
  p 'do something'
end
readlines reads the whole file; lines returns an enumerator that reads it line by line.
Update: IO#lines is deprecated (it was removed in Ruby 3.0); use #each_line instead:
if File.open('test.txt').each_line.any?{|line| line.include?('monitor')}
  p 'do something'
end
If anyone is looking for a way to display the last line of a file where that string occurs, just do:
File.readlines('dir/testfile.txt').select { |l| l.match(/monitor/) }.last
Example file:
monitor 1
monitor 2
something else
you'll get
monitor 2
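`readlines` holds the whole file in memory. For a large file, a sketch that streams it with `File.foreach` and keeps only the most recent matching line (`last_matching_line` is a hypothetical name):

```ruby
# Hypothetical helper: returns the last line matching pattern, or nil.
# Only the current line and the latest match are kept in memory.
def last_matching_line(path, pattern)
  last = nil
  File.foreach(path) { |line| last = line if line.match?(pattern) }
  last
end
```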
I generally skip Ruby and use the command-line utilities for this, as they tend to be faster.
`grep "monitor" "testfile.txt" > /dev/null`
$?.success? #=> true if zero exit status, false otherwise.
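If you do shell out, `Kernel#system` with separate arguments runs the command without a shell and returns `true` for a zero exit status and `false` otherwise, and `grep -q` suppresses the output, so no redirect is needed. This assumes a Unix-like system with `grep` installed; `file_contains?` is a hypothetical name:

```ruby
# Hypothetical helper: true if grep finds pattern in path, false if not.
# system returns nil if grep could not be run at all; !! turns that into false.
def file_contains?(path, pattern)
  !!system('grep', '-q', '--', pattern, path)
end
```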

How can I handle large files in Ruby?

I'm pretty new to programming, so be gentle. I'm trying to extract ISBN numbers from a library database .dat file. I have written code that works, but it only searches through about half of the 180MB file. How can I adjust it to search the whole file? Or how can I write a program that will split the .dat file into manageable chunks?
edit: Here's my code:
export = File.new("resultsfinal.txt","w+")
File.open("bibrec2.dat").each do |line|
  line.scan(/[a]{1}[1234567890xX]{10}\W/) do |x|
    export.puts x
  end
  line.scan(/[a]{1}[1234567890xX]{13}/) do |x|
    export.puts x
  end
end
You should try catching exceptions to check whether the problem really is in the read block.
Just so you know, I have already written a script with much the same syntax that searches a really big file (~8GB) without problems.
export = File.new("resultsfinal.txt","w+")
File.open("bibrec2.dat").each do |line|
  begin
    line.scan(/[a]{1}[1234567890xX]{10}\W/) do |x|
      export.puts x
    end
    line.scan(/[a]{1}[1234567890xX]{13}/) do |x|
      export.puts x
    end
  rescue => e
    puts "Problem while adding the result: #{e.message}"
  end
end
The main thing is to clean up and combine the regexes for performance benefits. Also, you should always use block syntax with files to ensure the file descriptors are closed properly. File.foreach, like File#each, doesn't load the whole file into memory; it reads one line at a time:
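File.open("resultsfinal.txt","w+") do |output|
  File.foreach("bibrec2.dat") do |line|
    # "a", then 10 ISBN characters, then either 3 more (13-digit) or a non-word character
    matches = line.scan(/a[\dxX]{10}(?:[\dxX]{3}|\W)/)
    output.puts matches unless matches.empty? # puts [] would write a blank line
  end
end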
export = File.new("resultsfinal.txt", "w+") # output file, as in the question
file = File.new("bibrec2.dat", "r")
while (line = file.gets)
  line.scan(/[a]{1}[1234567890xX]{10}\W/) do |x|
    export.puts x
  end
  line.scan(/[a]{1}[1234567890xX]{13}/) do |x|
    export.puts x
  end
end
file.close
export.close
As to the performance issue, I can't see anything particularly worrying about the file size: 180MB shouldn't pose any problems. What happens to memory use when you're running your script?
I'm not sure, however, that your Regular Expressions are doing what you want. This, for example:
/[a]{1}[1234567890xX]{10}\W/
does (I think) this:
one "a". Do you really want to match for an "a"? "a" would suffice, rather than "[a]{1}", in that case.
exactly 10 of (digit or "x" or "X")
a single "non-word" character i.e. not a-z, A-Z, 0-9 or underscore
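A quick way to see what the two patterns match is to run them on a few lines. The sample strings below are made up for illustration, not taken from the question's data file:

```ruby
TEN_DIGIT      = /[a]{1}[1234567890xX]{10}\W/
THIRTEEN_DIGIT = /[a]{1}[1234567890xX]{13}/

# Made-up sample lines
samples = ['a0306406152 ', 'a9780306406157', 'b0306406152 ']
samples.each do |s|
  p [s, s.scan(TEN_DIGIT), s.scan(THIRTEEN_DIGIT)]
end
# The first sample matches only TEN_DIGIT, and the match includes the
# trailing space (the \W); the second matches only THIRTEEN_DIGIT;
# the third matches neither because it has no "a".
```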
There are a couple of sample ISBN matchers here and here, although they seem to be matching something more like the format that we see on the back cover of a book and I'm guessing your input file has stripped out some of that formatting.
You can look into using File#truncate and IO#seek and employ a binary-search-style approach (split the file in half and search each half separately). #truncate is destructive (it permanently shortens the file), so run it on a copy of the file (I know this is a hassle).
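require 'fileutils'
middle = File.size("my_huge_file.dat") / 2
# Work on a copy ('first_half.dat' is an arbitrary name): truncating shortens the file on disk
FileUtils.cp("my_huge_file.dat", "first_half.dat")
File.truncate("first_half.dat", middle)
# run search algorithm on 'first_half.dat'
File.open("my_huge_file.dat") do |huge_file|
  huge_file.seek(middle) # the second half starts where the first half ends
  # run search algorithm from here
end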
The code is largely untested, brittle and incomplete (a split at an arbitrary byte offset can cut a line, or an ISBN, in two). But I hope it gives you a platform to build on.
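For the other half of the question (splitting the .dat file into manageable pieces), a line-based alternative to truncating is to stream the file and write every N lines to a new part file, which never cuts a line in two. `split_into_parts` and the `.partN` naming are hypothetical:

```ruby
# Hypothetical helper: writes every lines_per_part lines of path to
# path.part0, path.part1, ... and returns the list of part file names.
# Only one chunk of lines is held in memory at a time.
def split_into_parts(path, lines_per_part)
  File.foreach(path).each_slice(lines_per_part).with_index.map do |lines, i|
    part = "#{path}.part#{i}"
    File.write(part, lines.join)
    part
  end
end
```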
If you are programming on a modern operating system and the computer has enough memory (say 512 MB), Ruby should have no problem reading the entire file into memory.
Things typically get iffy when you get to about a 2 GB working set on a typical 32-bit OS.

Resources