Ruby: parsing a directory and reading the first row of each file

Below is the code that is supposed to read the directory and, for each file entry, print the first row of the file. The issue seems to be that x is not visible, so the file is not being parsed.
Dir.foreach("C:/fileload/src") do |file_name|
x = file_name
puts x
f = File.open("C:/fileload/src/" +x)
f.readlines[1..1].each do |line|
puts line
end
end

Why are you assigning x to file_name? You can use file_name directly. And if you are only reading the first line of the file, why not try this?
#!/usr/bin/ruby
dir = "C:/fileload/src"
Dir.foreach(dir) do |file_name|
full = File.join(dir, file_name)
if File.file?(full)
f = File.open(full)
puts f.first
f.close
end
end
You should use File.join to safely combine paths in Ruby. I also use File.file? to check that each entry is a regular file before opening it.

You have no visibility issue with x. You should be using File::join or Pathname#+ to build your file paths. You should exclude non-files from consideration. With [1..1] you're selecting the second line, not the first, because array indices start at 0. Here's a cleaner, correct replacement for your sample code.
dir = "C:/fileload/src"
Dir.foreach(dir).
map { |fn| File.join(dir,fn) }.
select { |fn| File.file?(fn) }.
each { |fn| puts File.readlines(fn).first }
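If only the first line is needed, File.readlines reads the whole file just to discard the rest. Here is a minimal sketch that stops after one line per file. The helper name first_lines and the temporary directory (standing in for C:/fileload/src) are assumptions made for illustration, not part of the posts above.

```ruby
require "tmpdir"

# Hypothetical helper: maps each regular file in dir to its first line.
def first_lines(dir)
  Dir.entries(dir).sort.each_with_object({}) do |name, result|
    path = File.join(dir, name)
    # File.file? is false for ".", ".." and subdirectories, so they are skipped.
    next unless File.file?(path)
    # File.foreach yields lines one at a time; first stops after one line.
    result[name] = File.foreach(path).first
  end
end

# Demo against a throwaway directory.
Dir.mktmpdir do |dir|
  File.write(File.join(dir, "a.txt"), "alpha\nbeta\n")
  File.write(File.join(dir, "b.txt"), "gamma\n")
  Dir.mkdir(File.join(dir, "sub"))
  first_lines(dir).each { |name, line| puts "#{name}: #{line}" }
end
```

An empty file maps to nil here, so callers may want to handle that case.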

Related

Ruby - iterate tasks with files

I am struggling to iterate tasks with files in Ruby.
(Purpose of the program = every week, I have to save 40 pdf files off the school system containing student scores, then manually compare them to last week's pdfs and update one spreadsheet with every student who has passed their target this week. This is a task for a computer!)
I have converted a pdf file to text, and my program then extracts the correct data from the text files and turns each student into an array [name, score, house group]. It then checks each new array against the data in the csv file, and adds any new results.
My program works on a single pdf file, because I've manually typed in:
f = File.open('output\agb summer report.txt')
agb = []
f.each_line do |line|
agb.push line
end
But I have a whole folder of pdf files that I want to run the program on iteratively. I've also had problems when I try to write each result to a new-named file.
I've tried things with variables and code blocks, but I now don't think you can use a variable in that way?
Dir.foreach('output') do |ea|
f = File.open(ea)
agb = []
f.each_line do |line|
agb.push line
end
end
^ This doesn't work. I've also tried exporting the directory names to an array, and doing something like:
a.each do |ea|
var = '\'output\\' + ea + '\''
f = File.open(var)
agb = []
f.each_line do |line|
agb.push line
end
end
I think I'm fundamentally confused about the sorts of object File and Dir are? I've searched a lot and haven't found a solution yet. I am fairly new to Ruby.
Anyway, I'm sure this can be done - my current backup plan is to copy my program 40 times with different details, but that sounds absurd. Please offer thoughts?
You're very close. Dir.foreach() will return the name of the files whereas File.open() is going to want the path. A crude example to illustrate this:
directory = 'example_directory'
Dir.foreach(directory) do |file|
# Assuming Unix style filesystem, skip . and .. (this also skips hidden dotfiles)
next if file.start_with? '.'
# Simply puts the contents
path = File.join(directory, file)
puts File.read(path)
end
Use Globbing for File Lists
You need to use Dir.glob to get your list of files. For example, given three PDF files in /tmp/pdf, you collect them with a glob like so:
Dir.glob('/tmp/pdf/*pdf')
# => ["/tmp/pdf/1.pdf", "/tmp/pdf/2.pdf", "/tmp/pdf/3.pdf"]
Dir.glob('/tmp/pdf/*pdf').class
# => Array
Once you have a list of filenames, you can iterate over them with something like:
Dir.glob('/tmp/pdf/*pdf').each do |pdf|
# The trailing "-" makes pdftotext write the text to standard output
text = %x(pdftotext "#{pdf}" -)
# do something with your textual data
end
If you're on a Windows system, then you might need a gem like pdf-reader or something else from Ruby Toolbox that suits you better to actually parse the PDF. Regardless, you should use globbing to create a file list; what you do after that depends on what kind of data the file actually holds. IO#read and descendants like File#read are good places to start.
Handling Text Files
If you're dealing with text files rather than PDF files, then something like this will get you started:
Dir.glob('/tmp/pdf/*txt').each do |text|
# Do something with your textual data. In this case, just
# dump the files to standard output.
p File.read(text)
end
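Applied to the original question, the glob approach could replace the 40 copies of the program with one loop. The sketch below is only an illustration: the helper name load_reports, the sample file contents, and the temporary directory are assumptions, and chomp: true needs Ruby 2.4 or later.

```ruby
require "tmpdir"

# Hypothetical helper: returns { "report name" => [lines...] } for every
# .txt file directly inside dir.
def load_reports(dir)
  Dir.glob(File.join(dir, "*.txt")).sort.each_with_object({}) do |path, reports|
    # Dir.glob returns paths that already include dir, so they can be
    # passed straight to File.readlines.
    reports[File.basename(path, ".txt")] = File.readlines(path, chomp: true)
  end
end

# Demo against a throwaway directory with made-up data.
Dir.mktmpdir do |dir|
  File.write(File.join(dir, "agb summer report.txt"), "Ann 42 Red\nBob 37 Blue\n")
  File.write(File.join(dir, "notes.md"), "ignored\n")
  load_reports(dir).each do |name, lines|
    puts "#{name}: #{lines.length} lines"
  end
end
```

Each report's lines stay grouped under its file name, so the per-file processing can run inside a single each loop.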
You can use Dir.new("./") to get all the files in the current directory, so something like this should work:
file_names = Dir.new "./"
file_names.each do |file_name|
if file_name.end_with? ".txt"
f = File.open(file_name)
agb = []
f.each_line do |line|
agb.push line
end
end
end
By the way, you can just use agb = f.to_a to convert the file contents into an array where each element is a line from the file.
file_names = Dir.new "./"
file_names.each do |file_name|
if file_name.end_with? ".txt"
f = File.open file_name
agb = f.to_a
# do whatever processing you need to do
end
end
If you assign your target folder as a glob pattern like /path/to/your/folder/*.txt, it will only iterate over text files.
2.2.0 :009 > target_folder = "/home/ziya/Desktop/etc3/example_folder/*.txt"
=> "/home/ziya/Desktop/etc3/example_folder/*.txt"
2.2.0 :010 > Dir[target_folder].each do |texts|
2.2.0 :011 > puts texts
2.2.0 :012?> end
/home/ziya/Desktop/etc3/example_folder/ex4.txt
/home/ziya/Desktop/etc3/example_folder/ex3.txt
/home/ziya/Desktop/etc3/example_folder/ex2.txt
/home/ziya/Desktop/etc3/example_folder/ex1.txt
iteration over text files is ok
2.2.0 :002 > Dir[target_folder].each do |texts|
2.2.0 :003 > File.open(texts, 'w') {|file| file.write("your content\n")}
2.2.0 :004?> end
results
2.2.0 :008 > system ("pwd")
/home/ziya/Desktop/etc3/example_folder
=> true
2.2.0 :009 > system("for f in *.txt; do cat $f; done")
your content
your content
your content
your content

Line replacement in directories clears the whole files?

I am trying to recursively replace a whole line in the index.html files of a directory and its sub-directories.
The code below finds the right lines using the variable "pattern", but when I run it, it removes everything from my index.html files.
pattern = "Keyword"
replacement = "<td width=\"30\"><img src=\"styles/img/trans.gif\" width=\"30\"></td>"
Dir.glob('/Users/root/Desktop/directory/test/**/index.html') do |item|
next unless File.file?(item)
File.open(item, "w+:ASCII-8BIT") do |f|
f.each_line do |line|
if line.match(pattern)
my_line = line
line.sub(my_line, replacement)
end
end
end
end
What am I doing wrong ?
You need to read the file first, build the expected output, and then write it:
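Dir.glob('/Users/root/Desktop/directory/test/**/index.html') do |item|
next unless File.file?(item)
# Read every line first, swapping in the replacement where the pattern matches
output = IO.readlines(item).map do |line|
if line.match(pattern)
# keep the line break that the original line carried
replacement + "\n"
else
line
end
end
# Only now reopen the file for writing, which truncates it
File.open(item, "w+:ASCII-8BIT") do |f|
f.write output.join
end
end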
You use File.open with open mode w+ which, according to Ruby documentation, is:
"w+" Read-write, truncates existing file to zero length or creates a new file for reading and writing.
To read the file and print some lines, use r:
File.open(item, "r:ASCII-8BIT")
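The effect of the open mode can be checked directly. This is a small sketch, assuming a hypothetical helper readable_after_open and a throwaway file; it reports how much of the old content is still readable once the file has been opened in a given mode.

```ruby
require "tmpdir"

# Hypothetical helper: writes a sample file, reopens it with the given mode,
# and returns whatever can still be read from it.
def readable_after_open(mode)
  Dir.mktmpdir do |dir|
    path = File.join(dir, "index.html")
    File.write(path, "<p>Keyword</p>\n")
    File.open(path, mode) { |f| f.read }
  end
end

puts readable_after_open("r").inspect   # read-only: old content intact
puts readable_after_open("r+").inspect  # read-write without truncation
puts readable_after_open("w+").inspect  # read-write, truncated on open
```

With "w+" the file is already empty by the time each_line runs, which is why the loop in the question never sees a line.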

Script to append files

I am trying to write a script to do the following:
There are two directories A and B. In directory A, there are files called "today" and "today1". In directory B, there are three files called "today", "today1" and "otherfile".
I want to loop over the files in directory A and append the files that have similar names in directory B to the files in Directory A.
I wrote the method below to handle this but I am not sure if this is on track or if there is a more straightforward way to handle such a case?
Please note I am running the script from directory B.
def append_data_to_daily_files
directory = "B"
Dir.entries('B').each do |file|
fileName = file
next if file == '.' or file == '..'
File.open(File.join(directory, file), 'a') {|file|
Dir.entries('.').each do |item|
next if !(item.match(/fileName/))
File.open(item, "r")
file<<item
item.close
end
#file.puts "hello"
file.close
}
end
end
In my opinion, your append_data_to_daily_files() method is trying to do too many things -- which makes it difficult to reason about. Break down the logic into very small steps, and write a simple method for each step. Here's a start along that path.
require 'set'
def dir_entries(dir)
Dir.chdir(dir) {
return Dir.glob('*').to_set
}
end
def append_file_content(target, source)
File.open(target, 'a') { |fh|
fh.write(IO.read(source))
}
end
def append_common_files(target_dir, source_dir)
ts = dir_entries(target_dir)
ss = dir_entries(source_dir)
common_files = ts.intersection(ss)
common_files.each do |file_name|
t = File.join(target_dir, file_name)
s = File.join(source_dir, file_name)
append_file_content(t, s)
end
end
# Run script like this:
# ruby my_script.rb A B
append_common_files(*ARGV)
By using a Set, you can easily figure out the common files. By using glob you can avoid the hassle of filtering out the dot-directories. By designing the code to take its directory names from the command line (rather than hard-coding the names in the script), you end up with a potentially re-usable tool.
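The set intersection can be tried in isolation with the file names from the question. A minimal sketch, where common_names is a hypothetical helper introduced only for this illustration:

```ruby
require "set"

# Hypothetical helper: names present in both listings, sorted for stable output.
def common_names(target_names, source_names)
  (target_names.to_set & source_names.to_set).to_a.sort
end

# Directory A holds "today" and "today1"; directory B also holds "otherfile".
puts common_names(%w[today today1], %w[today today1 otherfile]).inspect
# prints ["today", "today1"]
```

Plain arrays support & as well, so a Set is a design choice here rather than a requirement.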
My solution....
def append_old_logs_to_daily_files
directory = "B"
#For each file in the folder "B"
Dir.entries('B').each do |file|
fileName = file
#skip dot directories
next if file == '.' or file == '..'
#Open each file
File.open(File.join(directory, file), 'a') {|file|
#Get each log file from the current directory in turn
Dir.entries('.').each do |item|
next if item == '.' or item == '..'
#that matches the day we are looking for
next if !(item.match(fileName))
#Read the log file
logFilesToBeCopied = File.open(item, "r")
contents = logFilesToBeCopied.read
file<<contents
end
file.close
}
end
end

How to open and read files line-by-line from a directory?

I am trying to read file lines from a directory containing about 200 text files, however, I can't get Ruby to read them line-by-line. I did it before, using one text file, not reading them from a directory.
I can get the file names as strings, but I am struggling to open them and read each line.
Here are some of the methods I've tried.
Method 1:
def readdirectory
@filearray = []
Dir.foreach('mydirectory') do |i|
# puts i.class
@filearray.push(i)
@filearray.each do |s|
# @words =IO.readlines('s')
puts s
end#do
# puts @words
end#do
end#readdirectory
Method 2:
def tryread
Dir.foreach('mydir'){
|x| IO.readlines(x)
}
end#tryread
Method 3:
def tryread
Dir.foreach('mydir') do |s|
File.readlines(s).each do |line|
sentence =line.split
end#inner do
end #do
end#tryread
With every attempt to open the string passed by the loop function, I keep getting the error:
Permission denied - . (Errno::EACCES)
Run the script with sudo: sudo ruby reader.rb (or whatever your filename is).
Since permissions are process based you can not read files with elevated permissions if the process reading does not have them.
Only solutions are either to run the script with more permissions or call another process which is already running with higher permissions to read for you.
Thanks for all the replies. I did a bit of trial and error and got it to work. This is the syntax I used:
Dir.entries('lemmatised').each do |s|
if !File.directory?(s)
file = File.open("pathname/#{s}", 'r')
file.each_line do |line|
count+=1
@words<<line.split(/[^a-zA-Z]/)
end # inner do
puts @words
end #if
end #do
Try this one,
# it'll hold the lines
f = []
# the test directory contains all the files;
# adjust the path for your computer
dir = "c:/Users/lordsangram/desktop/test"
# fetch the entry names in sorted order, dropping "." and ".."
# (Dir.entries does not guarantee where they appear in the list)
a = Dir.entries(dir).sort - [".", ".."]
# read the files, line by line
Dir.chdir(dir)
a.each do |name|
File.readlines(name).each do |line|
f.push(line)
end
end
f.each do |l|
puts l
end
@the Tin Man: you need to avoid processing "." and "..", which are listed by Dir.foreach and cause the permission denied error. A simple if should fix all your approaches.
Dir.foreach(ARGV[0]) do |f|
if f != "." and f != ".."
# code to process file
# example
# File.open(ARGV[0] + "\\" + f) do |file|
# end
end
end
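On Ruby 2.5 or later, Dir.children returns the entries without "." and "..", so the explicit comparison is not needed. The sketch below assumes that Ruby version; the helper name lines_in_directory and the temporary directory are made up for the illustration.

```ruby
require "tmpdir"

# Hypothetical helper: returns every line from the regular files in dir.
def lines_in_directory(dir)
  Dir.children(dir).sort.flat_map do |name|
    # Dir.children gives bare names, so join them with dir before opening.
    path = File.join(dir, name)
    # Subdirectories are skipped; only regular files are read.
    File.file?(path) ? File.readlines(path) : []
  end
end

# Demo against a throwaway directory.
Dir.mktmpdir do |dir|
  File.write(File.join(dir, "one.txt"), "first\nsecond\n")
  File.write(File.join(dir, "two.txt"), "third\n")
  Dir.mkdir(File.join(dir, "nested"))
  puts lines_in_directory(dir)
end
```

Joining the name with the directory also avoids the other common failure in this thread, opening a bare file name relative to the wrong working directory.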

Read Certain Lines from File

Hi just getting into Ruby, and I am trying to learn some basic file reading commands, and I haven't found any solid sources yet.
I am trying to go through certain lines from that file, til the end of the file.
So in the file where it says FILE_SOURCES I want to read all the sources til end of file, and place them in a file.
I found printing the whole file, and replacing words in the file, but I just want to read certain parts in the file.
Usually you follow a pattern like this if you're trying to extract a section from a file that's delimited somehow:
open(filename) do |f|
state = nil
while (line = f.gets)
case (state)
when nil
# Look for the line beginning with "FILE_SOURCES"
if (line.match(/^FILE_SOURCES/))
state = :sources
end
when :sources
# Stop printing if you hit something starting with "END"
if (line.match(/^END/))
state = nil
else
print line
end
end
end
end
You can change from one state to another depending on what part of the file you're in.
I would do it like this (assuming you can read the entire file into memory):
source_lines = IO.readlines('source_file.txt')
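# Index of the first line after the FILE_SOURCES marker
start_line = source_lines.index{ |line| line =~ /FILE_SOURCES/ } + 1
File.open( 'other_file.txt', 'w' ) do |f|
# readlines keeps each line's trailing newline, so join without a separator
f << source_lines[ start_line..-1 ].join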
end
Relevant methods:
IO.readlines to read the lines into an array
Array#index to find the index of the first line matching a regular expression
File.open to create a new file on disk (and automatically close it when done)
Array#[] to get the subset of lines from the index to the end
If you can't read the entire file into memory, then I'd do a simpler variation on @tadman's state-based one:
started = false
File.open( 'other_file.txt', 'w' ) do |output|
IO.foreach( 'source_file.txt' ) do |line|
if started then
output << line
elsif line =~ /FILE_SOURCES/
started = true
end
end
end
Welcome to Ruby!
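File.open("file_to_read.txt", "r") {|f|
# Skip lines up to and including the FILE_SOURCES marker
# (stop at end of file if the marker is missing)
line = f.gets
until line.nil? || line.include?("FILE_SOURCES")
line = f.gets
end
# Copy everything after the marker; the block form of File.open
# closes both files automatically
File.open("file_to_write.txt", "w") {|new_file|
f.each_line {|line|
new_file.puts(line)
}
}
}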
IO functions have no idea what "lines" in a file are. There's no straightforward way to skip to a certain line in a file; you have to read from the start and ignore the lines you don't need.
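Reading and ignoring does not have to mean loading the whole file. A minimal sketch, assuming a hypothetical helper line_range and a throwaway file, that walks the file lazily and stops as soon as the wanted lines have been collected:

```ruby
require "tmpdir"

# Hypothetical helper: returns `count` lines starting at 1-based line `first`.
def line_range(path, first, count)
  # File.foreach without a block returns an enumerator; lazy makes drop
  # and first read only as many lines as they need.
  File.foreach(path).lazy.drop(first - 1).first(count)
end

# Demo against a throwaway file.
Dir.mktmpdir do |dir|
  path = File.join(dir, "sample.txt")
  File.write(path, "a\nb\nc\nd\ne\n")
  puts line_range(path, 2, 3).inspect
end
```

The earlier lines are still read from disk, but they are discarded one at a time instead of being held in memory.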

Resources