Running the following code
Dir.foreach(FileUtils.pwd()) do |f|
if f.end_with?('log')
File.open(f) do |file|
if File.size(f) > MAX_FILE_SIZE
puts f
puts file.ctime
puts file.mtime
# zipping the file
orig = f
Zlib::GzipWriter.open('arch_log.gz') do |gz|
gz.mtime = File.mtime(orig)
gz.orig_name = orig
gz.write IO.binread(orig)
puts "File has been archived"
end
#deleting the file
begin
File.delete(f)
puts "File has been deleted"
rescue Exception => e
puts "File #{f} can not be deleted"
puts " Error #{e.message}"
puts "======= Please remove file manually =========="
end
end
end
end
end
Also, the files are pretty heavy, more than 1GB each. Any help would be appreciated.
If the files you are reading are > 1GB, you have to have that much memory free at a minimum, because IO.binread is going to slurp that amount in.
You'd be better off to load a known amount and loop over the input until it's completely read, reading and writing in chunks.
From the docs:
IO.binread(name, [length [, offset]] ) -> string
------------------------------------------------------------------------------
Opens the file, optionally seeks to the given offset, then returns
length bytes (defaulting to the rest of the file). binread ensures
the file is closed before returning. The open mode would be "rb:ASCII-8BIT".
IO.binread("testfile") #=> "This is line one\nThis is line two\nThis is line three\nAnd so on...\n"
IO.binread("testfile", 20) #=> "This is line one\nThi"
IO.binread("testfile", 20, 10) #=> "ne one\nThis is line "
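The chunked approach described above could look roughly like the following. This is a minimal sketch, not the original poster's code: gzip_in_chunks is a hypothetical helper name, and the 1 MiB chunk size is an arbitrary assumption.

```ruby
require 'zlib'

CHUNK_SIZE = 1024 * 1024 # 1 MiB per read; an arbitrary choice for this sketch

# Hypothetical helper: gzip `source` into `target` without loading
# the whole file into memory at once.
def gzip_in_chunks(source, target, chunk_size = CHUNK_SIZE)
  Zlib::GzipWriter.open(target) do |gz|
    # Header fields have to be set before the first write.
    gz.mtime = File.mtime(source)
    gz.orig_name = File.basename(source)
    File.open(source, 'rb') do |input|
      # read(n) returns nil at end of file, which ends the loop.
      while (chunk = input.read(chunk_size))
        gz.write(chunk)
      end
    end
  end
end
```

Note that the code in the question writes every log to the same 'arch_log.gz', so each archive overwrites the previous one. Deriving the target name from the source, for example gzip_in_chunks(f, "#{f}.gz"), would avoid that.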
I am trying to import a large text file (approximately 2 million rows of numbers at 260MB) into an array, make edits to the array, and then write the results to a new text file, by writing:
file_data = File.readlines("massive_file.txt")
file_data = file_data.map!(&:strip)
file_data.each do |s|
s.gsub!(/,.*\z/, "")
end
File.open("smaller_file.txt", 'w') do |f|
f.write(file_data.map(&:strip).uniq.join("\n"))
end
However, I have received the error failed to allocate memory (NoMemoryError). How can I allocate more memory to complete the task? Or, ideally, is there another method I can use where I can avoid having to re-allocate memory?
You can read the file line by line:
require 'set'
require 'digest/md5'
file_data = File.new('massive_file.txt', 'r')
file_output = File.new('smaller_file.txt', 'w')
unique_lines_set = Set.new
while (line = file_data.gets)
line.strip!
line.gsub!(/,.*\z/, "")
# Check if the line is unique
line_hash = Digest::MD5.hexdigest(line)
if not unique_lines_set.include? line_hash
# It is unique so add its hash to the set
unique_lines_set.add(line_hash)
# Write the line in the output file
file_output.puts(line)
end
end
file_data.close
file_output.close
You can try reading and writing one line at once:
new_file = File.open('smaller_file.txt', 'w')
File.open('massive_file.txt', 'r') do |file|
file.each_line do |line|
new_file.puts line.strip.gsub(/,.*\z/, "")
end
end
new_file.close
The only thing left to do is to find the duplicated lines.
Alternatively, you can read the file in chunks, which should be faster than reading it line by line:
FILENAME="massive_file.txt"
MEGABYTE = 1024*1024
class File
def each_chunk(chunk_size=MEGABYTE) # or n*MEGABYTE
yield read(chunk_size) until eof?
end
end
filedata = ""
open(FILENAME, "rb") do |f|
f.each_chunk() {|chunk|
chunk.gsub!(/,.*\z/, "")
filedata += chunk
}
end
ref: https://stackoverflow.com/a/1682400/3035830
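One thing to watch with chunked reads is that a chunk boundary can fall in the middle of a line, and a pattern anchored with \z only touches the last line of each chunk. A rough sketch of one way around this, keeping the incomplete tail of each chunk until the next read, is shown here. The helper names each_line_by_chunk and strip_and_dedupe are hypothetical, and the Set still holds every unique line in memory.

```ruby
require 'set'

MEGABYTE = 1024 * 1024

# Hypothetical helper: reads the file in fixed-size chunks but yields
# only complete lines (without the trailing newline).
def each_line_by_chunk(path, chunk_size = MEGABYTE)
  leftover = String.new
  File.open(path, 'rb') do |f|
    while (chunk = f.read(chunk_size))
      leftover << chunk
      lines = leftover.split("\n", -1)
      leftover = lines.pop # the last piece may be an incomplete line
      lines.each { |line| yield line }
    end
  end
  yield leftover unless leftover.empty?
end

# Hypothetical helper: applies the question's clean-up to each line and
# writes every distinct result once.
def strip_and_dedupe(input_path, output_path, chunk_size = MEGABYTE)
  seen = Set.new
  File.open(output_path, 'wb') do |out|
    each_line_by_chunk(input_path, chunk_size) do |line|
      cleaned = line.strip.gsub(/,.*\z/, '')
      # Set#add? returns nil when the element is already present.
      out.puts(cleaned) if seen.add?(cleaned)
    end
  end
end
```

Usage would be something like strip_and_dedupe('massive_file.txt', 'smaller_file.txt').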
I need to check what mode a file object was opened in, e.g. r, r+, w, a, etc.
thefile = File.open(filename, method)
The check has to use the object thefile and not just the filename.
On POSIX platforms, you can call IO#fcntl with F_GETFL to get the file status flags:
require 'fcntl'
def filemode(io)
flags = io.fcntl(Fcntl::F_GETFL)
case flags & Fcntl::O_ACCMODE
when Fcntl::O_RDONLY
'r'
when Fcntl::O_WRONLY
(flags & Fcntl::O_APPEND).zero? ? 'w' : 'a'
when Fcntl::O_RDWR
(flags & Fcntl::O_APPEND).zero? ? 'r+ / w+' : 'a+'
end
end
File.open('test.txt', 'r') { |f| puts filemode(f) } #=> r
File.open('test.txt', 'w') { |f| puts filemode(f) } #=> w
File.open('test.txt', 'a+') { |f| puts filemode(f) } #=> a+
fcntl's return value is a bitwise OR of the individual O_* flags (the numeric values are platform-specific, so the ones below are only an example):
Fcntl::O_RDONLY # 0
Fcntl::O_WRONLY # 1
Fcntl::O_RDWR # 2
Fcntl::O_APPEND # 4
Fcntl::O_NONBLOCK # 8
Fcntl::O_ACCMODE can be used to mask the file access modes.
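To illustrate the masking, here is a small sketch for POSIX platforms. describe_flags is a hypothetical helper name, not part of Ruby. It compares against the Fcntl constants rather than hard-coded numbers, so it does not depend on the numeric values of a particular system.

```ruby
require 'fcntl'

# Hypothetical helper: reports the access mode and whether O_APPEND is set.
def describe_flags(io)
  flags = io.fcntl(Fcntl::F_GETFL)
  # Mask off everything except the access-mode bits before comparing.
  access = case flags & Fcntl::O_ACCMODE
           when Fcntl::O_RDONLY then :read_only
           when Fcntl::O_WRONLY then :write_only
           when Fcntl::O_RDWR   then :read_write
           end
  { access: access, append: (flags & Fcntl::O_APPEND) != 0 }
end
```

Calling it as File.open('test.txt', 'a') { |f| p describe_flags(f) } should report write-only access with the append flag set.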
Further information:
http://man7.org/linux/man-pages/man2/fcntl.2.html
http://www.gnu.org/software/libc/manual/html_node/File-Status-Flags.html
http://ruby-doc.org/stdlib/libdoc/fcntl/rdoc/Fcntl.html
I am not going to write the full script for you, but will give you a hint.
Suppose you have an IO opened:
io = File.open("/tmp/foo", "r")
Assuming that io was created successfully, you can tell whether it is opened for writing by attempting write:
begin
io.write("")
rescue IOError => e
puts e.message
end
#=> not opened for writing
Make sure to copy the file before attempting this in order not to lose the file in case the mode was "w" or "w+".
Proceed along the same lines to distinguish the other modes.
I read multiple files and try to write their data to a YAML file, but I don't know why I get nothing in my YAML file.
Do you have any idea where I might be making a mistake?
a = array.size
i = 0
array.each do |f|
while i < a
puts array[i]
output = File.new('/home/zyriuse/documents/Ruby-On-Rails/script/Api_BK/licence.yml', 'w')
File.readlines(f).each do |line|
output.puts line
output.puts line.to_yaml
#output.puts YAML::dump(line)
end
i += 1
end
end
There are two problems.
First, you are initializing i to zero too early. While processing the first file f, the while loop handles JUST that first file as many times as you have files in the array; for every following file, i is already >= a, so nothing is done with them.
Second, you are calling File.new with mode 'w' on every iteration, which truncates the output file each time and wipes out what the previous iteration wrote.
This might work better...
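require 'yaml'

# Open the output once, outside the loop, so earlier files are not overwritten.
# The block form also closes (and flushes) the file when done.
File.open('licence.yml', 'w') do |output|
  array.each do |f|
    puts f
    File.readlines(f).each do |line|
      output.puts line
      output.puts line.to_yaml
    end
  end
end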
I am trying to read file lines from a directory containing about 200 text files, however, I can't get Ruby to read them line-by-line. I did it before, using one text file, not reading them from a directory.
I can get the file names as strings, but I am struggling to open them and read each line.
Here are some of the methods I've tried.
Method 1:
def readdirectory
  @filearray = []
  Dir.foreach('mydirectory') do |i|
    # puts i.class
    @filearray.push(i)
    @filearray.each do |s|
      # @words = IO.readlines('s')
      puts s
    end # do
    # puts @words
  end # do
end # readdirectory
Method 2:
def tryread
Dir.foreach('mydir'){
|x| IO.readlines(x)
}
end#tryread
Method 3:
def tryread
Dir.foreach('mydir') do |s|
File.readlines(s).each do |line|
sentence =line.split
end#inner do
end #do
end#tryread
With every attempt to open the string passed by the loop function, I keep getting the error:
Permission denied - . (Errno::EACCES)
Run the script with sudo: sudo ruby reader.rb (or whatever your filename is).
Since permissions are process-based, you cannot read files that require elevated permissions if the reading process does not have them.
The only solutions are either to run the script with more permissions or to call another process that is already running with higher permissions to read the files for you.
Thanks for all the replies. I did a bit of trial and error and got it to work. This is the syntax I used:
Dir.entries('lemmatised').each do |s|
  if !File.directory?(s)
    file = File.open("pathname/#{s}", 'r')
    file.each_line do |line|
      count += 1
      @words << line.split(/[^a-zA-Z]/)
    end # inner do
    puts @words
  end # if
end # do
Try this one,
#it'll hold the lines
f = []
#here test directory contains all the files,
#write the path as per the your computer,
#mine's as you can see, below
#fetch filenames and keep in sorted order
a = Dir.entries("c:/Users/lordsangram/desktop/test")
#read the files, line by line
Dir.chdir("c:/Users/lordsangram/desktop/test")
#start at i = 2 to skip the first two elements of array a,
#which are assumed here to be "." and ".." and have no associated files
2.upto(a.length-1) do |i|
File.readlines("#{a[i]}").each do |line|
f.push(line)
end
end
f.each do |l|
puts l
end
@the Tin Man -> you need to avoid processing "." and "..", which are listed by Dir.foreach and cause the permission denied error. A simple if should fix all your approaches.
Dir.foreach(ARGV[0]) do |f|
if f != "." and f != ".."
# code to process file
# example
# File.open(ARGV[0] + "\\" + f) do |file|
# end
end
end
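Putting the pieces of this thread together, a minimal sketch for reading every file in a directory line by line might look like this. each_line_in_directory is a hypothetical helper name. It joins the directory and the entry name into a full path, since Dir.foreach yields bare names, and it skips anything that is not a regular file.

```ruby
# Hypothetical helper: yields (path, line) for each line of each
# regular file directly inside `dir`.
def each_line_in_directory(dir)
  Dir.foreach(dir) do |name|
    path = File.join(dir, name)
    # Skips ".", ".." and any subdirectories.
    next unless File.file?(path)
    File.foreach(path) { |line| yield path, line }
  end
end
```

For example, each_line_in_directory('mydir') { |path, line| puts "#{path}: #{line}" } prints every line prefixed with the file it came from.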
I need to read a file in MB chunks, is there a cleaner way to do this in Ruby:
FILENAME="d:\\tmp\\file.bin"
MEGABYTE = 1024*1024
size = File.size(FILENAME)
open(FILENAME, "rb") do |io|
read = 0
while read < size
left = (size - read)
cur = left < MEGABYTE ? left : MEGABYTE
data = io.read(cur)
read += data.size
puts "READ #{cur} bytes" #yield data
end
end
Adapted from the Ruby Cookbook page 204:
FILENAME = "d:\\tmp\\file.bin"
MEGABYTE = 1024 * 1024
class File
def each_chunk(chunk_size = MEGABYTE)
yield read(chunk_size) until eof?
end
end
open(FILENAME, "rb") do |f|
f.each_chunk { |chunk| puts chunk }
end
Disclaimer: I'm a ruby newbie and haven't tested this.
Alternatively, if you don't want to monkeypatch File:
until my_file.eof?
do_something_with( my_file.read( bytes ) )
end
For example, streaming an uploaded tempfile into a new file:
# tempfile is a File instance
File.open( new_file, 'wb' ) do |f|
# Read in small 64 KiB chunks (2**16 bytes) to limit memory usage
f.write(tempfile.read(2**16)) until tempfile.eof?
end
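The same read-and-write loop can be wrapped in a small method. In the sketch below, copy_in_chunks is a hypothetical helper name and the 64 KiB default is just the value used above. For a plain copy, the core method IO.copy_stream(src, dst) does the same job without an explicit loop.

```ruby
# Hypothetical helper: copies a file in fixed-size chunks so that only
# one chunk is held in memory at a time.
def copy_in_chunks(source_path, target_path, chunk_size = 2**16)
  File.open(source_path, 'rb') do |src|
    File.open(target_path, 'wb') do |dst|
      # read(n) returns nil once the end of the file is reached.
      while (chunk = src.read(chunk_size))
        dst.write(chunk)
      end
    end
  end
end
```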
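You can use IO#each(sep, limit) with sep set to nil, so that the input is split only by the limit (an empty string would select paragraph mode instead), for example:
chunk_size = 1024
File.open('/path/to/file.txt') do |file|
  file.each(nil, chunk_size) do |chunk|
    puts chunk
  end
end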
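A quick way to see how the limit splits the input is to collect the size of each chunk. This sketch assumes binary mode, so that the limit is counted in plain bytes. chunk_sizes is a hypothetical helper name.

```ruby
# Hypothetical helper: returns the byte size of every chunk that
# IO#each yields for a nil separator and the given limit.
def chunk_sizes(path, chunk_size)
  sizes = []
  File.open(path, 'rb') do |file|
    file.each(nil, chunk_size) { |chunk| sizes << chunk.bytesize }
  end
  sizes
end
```

Every chunk except possibly the last one should be exactly chunk_size bytes long.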
If you check out the ruby docs:
http://ruby-doc.org/core-2.2.2/IO.html
there's a line that goes like this:
IO.foreach("testfile") {|x| print "GOT ", x }
The only caveat is that this process can read the temp file faster than the stream is generated, so, IMO, some latency should be thrown in:
IO.foreach("/tmp/streamfile") {|line|
  ParseLine.parse(line)
  sleep 0.3 # pause, as this process will discontinue if it doesn't allow some buffering
}
}
https://ruby-doc.org/core-3.0.2/IO.html#method-i-read gives an example of iterating over fixed length records with read(length):
# iterate over fixed length records
open("fixed-record-file") do |f|
while record = f.read(256)
# ...
end
end
If length is a positive integer, read tries to read length bytes without any conversion (binary mode). It returns nil if an EOF is encountered before anything can be read. Fewer than length bytes are returned if an EOF is encountered during the read. In the case of an integer length, the resulting string is always in ASCII-8BIT encoding.
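The behaviour described in that passage can be checked with a short sketch. record_sizes is a hypothetical helper name and 256 is simply the record length from the example above. The final record is shorter when the file size is not a multiple of the record length, and the loop ends when read returns nil.

```ruby
RECORD_SIZE = 256

# Hypothetical helper: reads fixed-length records and returns the
# byte size of each one.
def record_sizes(path, record_size = RECORD_SIZE)
  sizes = []
  File.open(path, 'rb') do |f|
    # read(length) returns nil at EOF, which terminates the loop.
    while (record = f.read(record_size))
      sizes << record.bytesize
    end
  end
  sizes
end
```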
FILENAME="d:/tmp/file.bin"
class File
MEGABYTE = 1024*1024
def each_chunk(chunk_size=MEGABYTE)
yield self.read(chunk_size) until self.eof?
end
end
open(FILENAME, "rb") do |f|
f.each_chunk {|chunk| puts chunk }
end
It works, mbarkhau. I just moved the constant definition to the File class and added a couple of "self"s for clarity's sake.