Ruby - iterate tasks with files

I am struggling to iterate tasks with files in Ruby.
(Purpose of the program = every week, I have to save 40 pdf files off the school system containing student scores, then manually compare them to last week's pdfs and update one spreadsheet with every student who has passed their target this week. This is a task for a computer!)
I have converted a pdf file to text, and my program then extracts the correct data from the text files and turns each student into an array [name, score, house group]. It then checks each new array against the data in the csv file, and adds any new results.
My program works on a single pdf file, because I've manually typed in:
f = File.open('output\agb summer report.txt')
agb = []
f.each_line do |line|
agb.push line
end
But I have a whole folder of pdf files that I want to run the program on iteratively. I've also had problems when I try to write each result to a new-named file.
I've tried things with variables and code blocks, but I now don't think you can use a variable in that way?
Dir.foreach('output') do |ea|
f = File.open(ea)
agb = []
f.each_line do |line|
agb.push line
end
end
^ This doesn't work. I've also tried exporting the directory names to an array, and doing something like:
a.each do |ea|
var = '\'output\\' + ea + '\''
f = File.open(var)
agb = []
f.each_line do |line|
agb.push line
end
end
I think I'm fundamentally confused about the sorts of object File and Dir are? I've searched a lot and haven't found a solution yet. I am fairly new to Ruby.
Anyway, I'm sure this can be done - my current backup plan is to copy my program 40 times with different details, but that sounds absurd. Please offer thoughts?

You're very close. Dir.foreach() yields only the bare name of each entry, whereas File.open() wants a path that includes the directory. A crude example to illustrate this:
directory = 'example_directory'
Dir.foreach(directory) do |file|
# Assuming Unix style filesystem, skip . and .. (this also skips any other dotfiles)
next if file.start_with? '.'
# Simply puts the contents
path = File.join(directory, file)
puts File.read(path)
end
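Applied to the question's case, the same idea might look like the sketch below. The method name read_text_files is made up for illustration, and it assumes the converted reports are .txt files sitting directly in one directory.

```ruby
# Hypothetical helper: read every .txt file in a directory.
# Returns a hash mapping each bare file name to an array of its lines.
def read_text_files(directory)
  results = {}
  Dir.foreach(directory) do |name|
    # Dir.foreach yields bare names, so rebuild the full path first
    path = File.join(directory, name)
    # Skip ".", "..", subdirectories and anything that is not a .txt file
    next unless File.file?(path) && File.extname(name) == '.txt'
    results[name] = File.readlines(path)
  end
  results
end
```

Calling read_text_files('output') would then give one array of lines per report, which replaces the hand-typed File.open line.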

Use Globbing for File Lists
You can use Dir.glob to get your list of files. For example, given three PDF files in /tmp/pdf, you collect them with a glob like so:
Dir.glob('/tmp/pdf/*pdf')
# => ["/tmp/pdf/1.pdf", "/tmp/pdf/2.pdf", "/tmp/pdf/3.pdf"]
Dir.glob('/tmp/pdf/*pdf').class
# => Array
Once you have a list of filenames, you can iterate over them with something like:
Dir.glob('/tmp/pdf/*pdf').each do |pdf|
# the trailing "-" makes pdftotext print to stdout instead of writing a .txt file
text = %x(pdftotext "#{pdf}" -)
# do something with your textual data
end
If you're on a Windows system, then you might need a gem like pdf-reader or something else from Ruby Toolbox that suits you better to actually parse the PDF. Regardless, you should use globbing to create a file list; what you do after that depends on what kind of data the file actually holds. IO#read and descendants like File#read are good places to start.
Handling Text Files
If you're dealing with text files rather than PDF files, then something like this will get you started:
Dir.glob('/tmp/pdf/*txt').each do |text|
# Do something with your textual data. In this case, just
# dump the files to standard output.
p File.read(text)
end
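The question also mentions trouble writing each result to a new-named file. A minimal sketch of one way to do that is below. The method name write_results, the "_results" suffix and the placeholder processing step (strip whitespace, drop blank lines) are all assumptions made for illustration, not anything from the original program.

```ruby
require 'fileutils'

# Hypothetical helper: process every .txt file in input_dir and write the
# result to a file of a derived name in output_dir.
# Returns the list of paths that were written.
def write_results(input_dir, output_dir)
  FileUtils.mkdir_p(output_dir)
  Dir.glob(File.join(input_dir, '*.txt')).sort.map do |path|
    # "output/agb summer report.txt" -> "agb summer report"
    base = File.basename(path, '.txt')
    target = File.join(output_dir, "#{base}_results.txt")
    # Placeholder processing: strip whitespace and drop blank lines
    lines = File.readlines(path).map(&:strip).reject(&:empty?)
    File.write(target, lines.join("\n") + "\n")
    target
  end
end
```

The output name is built from the input name with File.basename, so a variable holding the path is all that is needed.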

You can use Dir.new("./") to get all the entries in the current directory, so something like this should work.
file_names = Dir.new "./"
file_names.each do |file_name|
if file_name.end_with? ".txt"
f = File.open(file_name)
agb = []
f.each_line do |line|
agb.push line
end
end
end
By the way, you can just use agb = f.to_a to convert the file contents into an array where each element is a line from the file.
file_names = Dir.new "./"
file_names.each do |file_name|
if file_name.end_with? ".txt"
f = File.open file_name
agb = f.to_a
# do whatever processing you need to do
end
end
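One detail worth adding: File.open without a block leaves the file open until it is closed explicitly or garbage collected. A small sketch of two alternatives follows. The method names are hypothetical, and the chomp: keyword needs Ruby 2.4 or later.

```ruby
# Block form: the file is closed automatically when the block returns.
def lines_of(path)
  File.open(path) { |f| f.to_a }
end

# File.readlines opens, reads and closes in one call;
# chomp: true removes the trailing newline from each line.
def chomped_lines_of(path)
  File.readlines(path, chomp: true)
end
```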

If you assign your target folder as a glob pattern like /path/to/your/folder/*.txt, it will only iterate over text files.
2.2.0 :009 > target_folder = "/home/ziya/Desktop/etc3/example_folder/*.txt"
=> "/home/ziya/Desktop/etc3/example_folder/*.txt"
2.2.0 :010 > Dir[target_folder].each do |texts|
2.2.0 :011 > puts texts
2.2.0 :012?> end
/home/ziya/Desktop/etc3/example_folder/ex4.txt
/home/ziya/Desktop/etc3/example_folder/ex3.txt
/home/ziya/Desktop/etc3/example_folder/ex2.txt
/home/ziya/Desktop/etc3/example_folder/ex1.txt
Iteration over the text files works. Next, writing to each of them (note that mode 'w' replaces the existing contents):
2.2.0 :002 > Dir[target_folder].each do |texts|
2.2.0 :003 > File.open(texts, 'w') {|file| file.write("your content\n")}
2.2.0 :004?> end
Results:
2.2.0 :008 > system ("pwd")
/home/ziya/Desktop/etc3/example_folder
=> true
2.2.0 :009 > system("for f in *.txt; do cat $f; done")
your content
your content
your content
your content
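Since mode 'w' truncates a file before writing, a loop like the one above discards whatever the text files held. If the aim is to add to a file instead, append mode is one option. A minimal sketch, with a made-up method name:

```ruby
# Hypothetical helper: add one line to the end of a file without
# touching the existing contents. Mode 'a' creates the file if missing.
def append_line(path, text)
  File.open(path, 'a') { |file| file.puts(text) }
end
```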

Related

How to properly automate xml to xls

I have been getting a lot of XML files recently that I want to analyse in Excel. Instead of using the XML conversion built into (newer versions of) Excel, I want to use Ruby code that converts a number of files automatically.
I am not very familiar with REXML, however. After half a day's work I got the code to convert just one(!) XML node. This is how it looks:
require 'rexml/document'
Dir.glob("FILES/archive/*.xml") do |eksemel|
puts "converting #{eksemel}"
filename = (/\d+/.match(eksemel)).to_s
xml_file = File.open("#{eksemel}", "r")
csv_file = File.new("#{filename}.csv", "w")
xml = REXML::Document.new( xml_file )
counter = 0
xml.elements.each("RESULTS") do |e|
e.elements.each("component") do |f|
f.elements.each("paragraph") do |g|
counter = counter + 1
csv_file.puts g.text
end
end
end
end
Is there a way to a) have Ruby discover the element names and their number automatically instead of my defining them, and b) save all of these as separate columns in a CSV file?
It isn't clear what you are using counter for. It would also help if you clarified what kind of structure the XML file has (for instance, are there many <paragraph> elements within each <component> element?). But here is a cleaner way to write what I think you are shooting for:
require 'rexml/document'
require 'csv'
Dir.glob('FILES/archive/*.xml') do |eksemel|
puts "converting #{eksemel}"
# I assume you are creating a .csv file with the same name as your .xml file
xml_file = File.new(eksemel)
csv_file = CSV.open(eksemel.sub(/\.xml$/, '.csv'), 'w')
xml = REXML::Document.new(xml_file)
counter = xml.elements.to_a('RESULTS//component//paragraph').length
xml.elements.each('RESULTS//component') do |component|
# write the text of each <paragraph>, not the element objects themselves
csv_file << component.elements.to_a('paragraph').map(&:text)
end
[xml_file, csv_file].each {|f| f.close}
end
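To try the one-row-per-component idea without any files, here is a small sketch that works on an XML string. The method name components_to_csv is hypothetical, and the element names RESULTS, component and paragraph are taken from the question; the real documents may be shaped differently.

```ruby
require 'rexml/document'
require 'csv'

# Hypothetical helper: turn an XML string into CSV text, with one row per
# <component> and one column per <paragraph> directly inside it.
def components_to_csv(xml_text)
  doc = REXML::Document.new(xml_text)
  CSV.generate do |csv|
    doc.elements.each('RESULTS//component') do |component|
      # Element#text returns the text content of each <paragraph>
      csv << component.elements.to_a('paragraph').map(&:text)
    end
  end
end
```

Rows may have different numbers of columns if the components hold different numbers of paragraphs.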

In Ruby- Parsing Directory and reading first row of the file

Below is the piece of code that is supposed to read the directory and, for each file entry, print the first row of the file. The issue is that x is not visible, so the file is not being parsed.
Dir.foreach("C:/fileload/src") do |file_name|
x = file_name
puts x
f = File.open("C:/fileload/src/" +x)
f.readlines[1..1].each do |line|
puts line
end
end
Why are you assigning x to file_name? You can use file_name directly. And if you are only reading the first line of the file, why not try this?
#!/usr/bin/ruby
dir = "C:/fileload/src"
Dir.foreach(dir) do |file_name|
full = File.join(dir, file_name)
if File.file?(full)
f = File.open(full)
puts f.first
f.close
end
end
You should use File.join to safely combine paths in Ruby. I also checked that you are opening a file using the File.file? method.
You have no visibility issue with x. You should be using File::join or Pathname#+ to build your file paths. You should exclude non-files from consideration. You're selecting the second line, not the first with [1..1]. Here's a cleaner correct replacement for your sample code.
dir = "C:/fileload/src"
Dir.foreach(dir).
map { |fn| File.join(dir,fn) }.
select { |fn| File.file?(fn) }.
each { |fn| puts File.readlines(fn).first }
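File.readlines loads the whole file just to take one line. If the files are large, reading only the first line is cheaper. A sketch of that variant is below, with a made-up method name. It assumes Ruby 2.5 or later for Dir.children, which leaves out "." and "..".

```ruby
# Hypothetical helper: return the first line of every regular file in dir,
# in file-name order. Empty files contribute nil.
def first_lines(dir)
  Dir.children(dir).sort
     .map { |name| File.join(dir, name) }
     .select { |path| File.file?(path) }
     .map { |path| File.open(path, &:gets) } # gets reads a single line
end
```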

How to open and read files line-by-line from a directory?

I am trying to read file lines from a directory containing about 200 text files, however, I can't get Ruby to read them line-by-line. I did it before, using one text file, not reading them from a directory.
I can get the file names as strings, but I am struggling to open them and read each line.
Here are some of the methods I've tried.
Method 1:
def readdirectory
@filearray = []
Dir.foreach('mydirectory') do |i|
# puts i.class
@filearray.push(i)
@filearray.each do |s|
# @words =IO.readlines('s')
puts s
end#do
# puts @words
end#do
end#readdirectory
Method 2:
def tryread
Dir.foreach('mydir'){
|x| IO.readlines(x)
}
end#tryread
Method 3:
def tryread
Dir.foreach('mydir') do |s|
File.readlines(s).each do |line|
sentence =line.split
end#inner do
end #do
end#tryread
With every attempt to open the string passed by the loop function, I keep getting the error:
Permission denied - . (Errno::EACCES)
Try sudo ruby reader.rb, or whatever your filename is.
Since permissions are process-based, you cannot read files that need elevated permissions if the process reading them does not have those permissions.
The only solutions are either to run the script with more permissions or to call another process, already running with higher permissions, to read for you.
Thanks for all the replies. I did a bit of trial and error and got it to work. This is the syntax I used:
Dir.entries('lemmatised').each do |s|
if !File.directory?(s)
file = File.open("pathname/#{s}", 'r')
file.each_line do |line|
count+=1
@words<<line.split(/[^a-zA-Z]/)
end # inner do
puts @words
end #if
end #do
Try this one,
#it'll hold the lines
f = []
#here test directory contains all the files,
#write the path as per the your computer,
#mine's as you can see, below
#fetch filenames and keep in sorted order
a = Dir.entries("c:/Users/lordsangram/desktop/test")
#read the files, line by line
Dir.chdir("c:/Users/lordsangram/desktop/test")
#start at i = 2 to skip the first two elements of array a,
#assumed here to be "." and "..", which are not regular files
2.upto(a.length-1) do |i|
File.readlines("#{a[i]}").each do |line|
f.push(line)
end
end
f.each do |l|
puts l
end
@the Tin Man -> you need to avoid processing "." and "..", which are listed by Dir.foreach and give the permission denied error. A simple if should fix all your approaches.
Dir.foreach(ARGV[0]) do |f|
if f != "." and f != ".."
# code to process file
# example
# File.open(ARGV[0] + "\\" + f) do |file|
# end
end
end
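Putting the two points from this thread together (skip "." and "..", and join the directory onto each name), a sketch along the lines of Method 3 might look like this. The method name words_in_directory is hypothetical, and Dir.each_child needs Ruby 2.5 or later.

```ruby
# Hypothetical helper: collect the whitespace-separated words from every
# regular file in dir.
def words_in_directory(dir)
  words = []
  Dir.each_child(dir) do |name|    # each_child never yields "." or ".."
    path = File.join(dir, name)    # the bare name alone is relative to the cwd
    next unless File.file?(path)   # skip subdirectories
    File.foreach(path) { |line| words.concat(line.split) }
  end
  words
end
```

The order of the files is whatever the filesystem returns, so sort the names first if the order matters.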

Strange number conversion while reading a csv file with ruby

I've got a strange problem in Ruby on Rails.
There is a CSV file, made with Excel 2003:
5437390264172534;Mark;5
I have a page with an upload input and I read the file like this:
file = params[:upload]['datafile']
file.read.split("\n").each do |line|
num,name,type = line.split(";")
logger.debug "row: #{num} #{name} #{type}"
end
etc
So, finally I've got the following:
num = 5437...2534
name = Mark
type = 5
Why does num have such a strange value?
I also tried it like this:
str = file.read
csv = CSV.parse(str)
csv.each do |line|
RAILS_DEFAULT_LOGGER.info "######## #{line.to_yaml}"
end
but again I got
######## ---
- !str:CSV::Cell "5437...2534;Mark;5"
The CSV file is in win1251 (I can't change the file encoding)
The Ruby file is in UTF-8
Ruby version 1.8.4
Rails version 2.0.2
If it indeed has a strange value, it probably has to do with the code you didn't post. Edit your question, and include the smallest bit of code that will run independently and still produce your questionable output.
split() returns an array of strings. So the first value of your CSV file is a String, not a Bignum. Maybe you need num.to_i, or a test like num.is_a?(Bignum) somewhere in your code.
file = File.open("test.csv", "r")
# Just getting the first line
line = file.gets
num,name,type = line.split(";")
# split() returns an array of String
puts num.class
puts num
# Make num a number
puts num.to_i.class
puts num.to_i
file.close
Running that file here gives me this:
$ ruby test.rb
String
5437390264172534
Bignum
5437390264172534
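Separately from the num question, the CSV.parse attempt kept the whole line in one cell because the default column separator is a comma. Below is a sketch for a current Ruby (not 1.8.4) that sets the separator and converts from Windows-1251. The method name is made up, and this does not claim to explain the odd num value, which the thread never pinned down.

```ruby
require 'csv'

# Hypothetical helper: parse semicolon-separated text that arrives as
# Windows-1251 bytes, returning rows of UTF-8 strings.
def parse_semicolon_rows(raw_bytes)
  # Label the bytes with their real encoding, then transcode to UTF-8
  text = raw_bytes.dup.force_encoding('Windows-1251').encode('UTF-8')
  CSV.parse(text, col_sep: ';')
end
```

Every field is still a String, so num.to_i is needed wherever a number is expected.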

Search for text in files in the path using ruby

I need to search all the *.c source files in the path for references to each *.h header, in order to find unused C headers. I wrote a Ruby script but it feels very clumsy.
I create an array with all C files and an array with all the H files.
I iterate over the header file array. For each header I open each C file and look for a reference to the header.
Is there an easier or better way?
require 'ftools'
require 'find'
# add a file search
class File
def self.find(dir, filename="*.*", subdirs=true)
Dir[ subdirs ? File.join(dir.split(/\\/), "**", filename) : File.join(dir.split(/\\/), filename) ]
end
end
files = File.find(".", "*.c", true)
headers = File.find(".", "*.h", true)
headers.each do |file|
#puts "Searching for #{file}(#{File.basename(file)})"
found = 0
files.each do |cfile|
#puts "searching in #{cfile}"
if File.read(cfile).downcase.include?(File.basename(file).downcase)
found += 1
end
end
puts "#{file} used #{found} times"
end
As already pointed out, you can use Dir.glob to simplify your file-finding. You could also consider switching your loops, which would mean opening each C file once, instead of once per H file.
I'd consider going with something like the following, which ran on the Ruby source in 3 seconds:
# collect the File.basename for all h files in tree
hfile_names = Dir.glob("**/*.h").collect{|hfile| File.basename(hfile) }
h_counts = Hash.new(0) # somewhere to store the counts
Dir.glob("**/*.c").each do |cfile| # enumerate the C files
file_text = File.read(cfile) # downcase here if necessary
hfile_names.each do |hfile|
h_counts[hfile] += 1 if file_text.include?(hfile)
end
end
h_counts.each { |file, found| puts "#{file} used #{found} times" }
EDIT: That won't list H files not referenced in any C files. To be certain to catch those, the hash would have to be explicitly initialised:
h_counts = {}
hfile_names.each { |hfile| h_counts[hfile] = 0 }
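Folding the explicit initialisation into the loop gives something like the sketch below. The method name header_usage and its root parameter are hypothetical. Like the code above, it matches on the bare header name anywhere in the C file, so a name mentioned in a comment still counts.

```ruby
# Hypothetical helper: count, for each header under root, how many .c files
# mention its base name. Headers that are never mentioned stay at 0.
def header_usage(root)
  headers = Dir.glob(File.join(root, '**', '*.h'))
               .map { |h| File.basename(h) }
               .uniq
  counts = headers.each_with_object({}) { |h, acc| acc[h] = 0 }
  Dir.glob(File.join(root, '**', '*.c')).each do |cfile|
    text = File.read(cfile) # each C file is read exactly once
    headers.each { |h| counts[h] += 1 if text.include?(h) }
  end
  counts
end
```

The unused headers are then header_usage('.').select { |_, n| n.zero? }.keys.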
To search *.c and *.h files, you could use Dir.glob
irb(main):012:0> Dir.glob("*.[ch]")
=> ["test.c", "test.h"]
To search across any subdirectory, you can pass **/*
irb(main):013:0> Dir.glob("**/*.[ch]")
=> ["src/Python-2.6.2/Demo/embed/demo.c", "src/Python-2.6.2/Demo/embed/importexc.c",
.........
Well, once you've found your .c files, you can do this to them:
1) open the file and store the text in a variable
2) use 'grep' : http://ruby-doc.org/core/classes/Enumerable.html#M003121
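The grep step can also be applied line by line without storing the whole file. A minimal sketch with a made-up method name is below. The pattern only looks for preprocessor include lines and is an assumption about what counts as a reference.

```ruby
# Hypothetical helper: return the #include lines of a C source file.
# File.foreach without a block returns an enumerator over the lines,
# and Enumerable#grep keeps those matching the pattern.
def include_lines(path)
  File.foreach(path).grep(/^\s*#\s*include\b/).map(&:strip)
end
```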
FileList in the Rake API is very useful for this. Just be aware of the list size growing larger than you have memory to handle. :)
http://rake.rubyforge.org/
