Search for text in files in the path using ruby - ruby

I need to search all the *.c source files in the path for references to each *.h header, in order to find unused C headers. I wrote a Ruby script but it feels very clumsy.
I create an array with all C files and an array with all the H files.
I iterate over the header file array. For each header I open each C file and look for a reference to the header.
Is there an easier or better way?
require 'find'
# add a file search
class File
def self.find(dir, filename="*.*", subdirs=true)
Dir[ subdirs ? File.join(dir.split(/\\/), "**", filename) : File.join(dir.split(/\\/), filename) ]
end
end
files = File.find(".", "*.c", true)
headers = File.find(".", "*.h", true)
headers.each do |file|
#puts "Searching for #{file}(#{File.basename(file)})"
found = 0
files.each do |cfile|
#puts "searching in #{cfile}"
if File.read(cfile).downcase.include?(File.basename(file).downcase)
found += 1
end
end
puts "#{file} used #{found} times"
end

As already pointed out, you can use Dir.glob to simplify your file-finding. You could also consider switching your loops, which would mean opening each C file once, instead of once per H file.
I'd consider going with something like the following, which ran on the Ruby source in 3 seconds:
# collect the File.basename for all h files in tree
hfile_names = Dir.glob("**/*.h").collect{|hfile| File.basename(hfile) }
h_counts = Hash.new(0) # somewhere to store the counts
Dir.glob("**/*.c").each do |cfile| # enumerate the C files
file_text = File.read(cfile) # downcase here if necessary
hfile_names.each do |hfile|
h_counts[hfile] += 1 if file_text.include?(hfile)
end
end
h_counts.each { |file, found| puts "#{file} used #{found} times" }
EDIT: That won't list H files not referenced in any C files. To be certain to catch those, the hash would have to be explicitly initialised:
h_counts = {}
hfile_names.each { |hfile| h_counts[hfile] = 0 }
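Putting the two pieces of this answer together, a minimal sketch might look like the following. The method name count_header_uses and its root parameter are made up for illustration. Like the code above, it uses a plain substring test, so a header whose name is contained in another header's name would be over-counted.

```ruby
# Sketch: count how many C files mention each header, including headers
# that are never mentioned. `count_header_uses` is a hypothetical name.
def count_header_uses(root = ".")
  header_names = Dir.glob(File.join(root, "**", "*.h")).map { |h| File.basename(h) }.uniq

  # Start every header at zero so unused ones still appear in the result.
  counts = header_names.each_with_object({}) { |name, acc| acc[name] = 0 }

  Dir.glob(File.join(root, "**", "*.c")).each do |cfile|
    text = File.read(cfile) # each C file is read exactly once
    header_names.each { |name| counts[name] += 1 if text.include?(name) }
  end
  counts
end

# Example: list only the headers that no C file mentions.
# count_header_uses(".").each { |name, n| puts name if n.zero? }
```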

To search *.c and *.h files, you could use Dir.glob:
irb(main):012:0> Dir.glob("*.[ch]")
=> ["test.c", "test.h"]
To search across any subdirectory, you can pass **/*
irb(main):013:0> Dir.glob("**/*.[ch]")
=> ["src/Python-2.6.2/Demo/embed/demo.c", "src/Python-2.6.2/Demo/embed/importexc.c",
.........
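If both lists are needed separately, one recursive glob can be split afterwards. This is only a sketch; split_sources is a hypothetical helper name, not something from the question.

```ruby
# Sketch: one recursive glob, then split the result by extension.
# `split_sources` is a hypothetical helper name.
def split_sources(root = ".")
  paths = Dir.glob(File.join(root, "**", "*.[ch]"))
  # partition returns [elements for which the block is true, the rest]
  c_files, h_files = paths.partition { |path| File.extname(path) == ".c" }
  { c: c_files.sort, h: h_files.sort }
end

# Example: split_sources(".")[:h] lists every header under the current directory.
```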

Well, once you've found your .c files, you can do this to them:
1) open the file and store the text in a variable
2) use 'grep' : http://ruby-doc.org/core/classes/Enumerable.html#M003121
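A rough sketch of those two steps, assuming a regular expression is an acceptable way to describe what you are looking for. Here matching_lines is a hypothetical helper name, and the file is streamed line by line rather than stored whole in a variable.

```ruby
# Sketch: return the lines of a file that match a pattern, using Enumerable#grep.
# `matching_lines` is a hypothetical helper name.
def matching_lines(path, pattern)
  # File.foreach without a block returns an Enumerator over the lines,
  # so the file is read line by line rather than loaded whole.
  File.foreach(path).grep(pattern).map(&:chomp)
end

# Example: matching_lines("main.c", /#\s*include/)
```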

FileList in the Rake API is very useful for this. Just be aware of the list size growing larger than you have memory to handle. :)
http://rake.rubyforge.org/

Related

How do I open each file in a directory with Ruby?

I need to open each file inside a directory. My attempt at this looks like:
Dir.foreach('path/to/directory') do |filename|
next if filename == '.' || filename == '..'
puts "working on #{filename}"
# this is where it crashes
file = File.open(filename, 'r')
#some code
file.close
# more code
end
My code keeps crashing at File.open(filename, 'r'). I'm not sure what filename should be.
Dir.foreach yields bare file names, so the name passed to File.open must include the directory path unless that directory is the current working directory:
path = 'path/to/directory'
Dir.foreach(path) do |filename|
next if filename == '.' || filename == '..'
puts "working on #{filename}"
file = File.open("#{path}/#{filename}", 'r')
#some code
file.close
# more code
end
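A slightly more defensive variant of the same idea, as a sketch: it builds the path with File.join, skips anything that is not a regular file, and uses the block form of File.open so the file is closed automatically. The helper name each_file_in is an assumption for illustration.

```ruby
# Sketch: visit every regular file in a directory and hand it to a block.
# `each_file_in` is a hypothetical helper name.
def each_file_in(dir)
  Dir.foreach(dir) do |name|
    full_path = File.join(dir, name)  # Dir.foreach yields bare names only
    next unless File.file?(full_path) # skips ".", ".." and subdirectories
    # The block form of File.open closes the file even if the block raises.
    File.open(full_path, "r") { |file| yield name, file }
  end
end

# Example:
# each_file_in("path/to/directory") { |name, file| puts "#{name}: #{file.size} bytes" }
```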
I recommend using Find.find.
While we can use various methods from the Dir class, many of them, such as Dir.entries and Dir.glob without a block, build the complete list of files before returning, which can be costly if we're recursively searching multiple directories or have a huge number of files embedded in the directories.
Instead, Find.find will walk the directories, returning both the directories and files as each is found. A simple check lets us decide which we want to continue processing or whether we want to skip it. The documentation has this example which should be easy to understand:
The Find module supports the top-down traversal of a set of file paths.
For example, to total the size of all files under your home directory, ignoring anything in a “dot” directory (e.g. $HOME/.ssh):
require 'find'
total_size = 0
Find.find(ENV["HOME"]) do |path|
if FileTest.directory?(path)
if File.basename(path)[0] == ?.
Find.prune # Don't look any further into this directory.
else
next
end
else
total_size += FileTest.size(path)
end
end
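Adapted to the question of opening each file, the same traversal might be sketched like this. The name each_visible_file is hypothetical, and the decision to skip "dot" directories is carried over from the documentation example rather than from the question.

```ruby
require 'find'

# Sketch: walk a tree with Find.find and yield each non-directory path,
# skipping "dot" directories. `each_visible_file` is a hypothetical name.
def each_visible_file(root)
  Find.find(root) do |path|
    if File.directory?(path)
      # Never prune the starting directory itself, even if it is ".".
      Find.prune if path != root && File.basename(path).start_with?(".")
    else
      yield path
    end
  end
end

# Example:
# each_visible_file("path/to/directory") { |path| puts File.read(path).length }
```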
I'd go for Dir.glob or Find.find, but not Dir.foreach, as it also returns . and .. which you don't want.
Dir.glob('something/*').each do |filename|
next if File.directory?(filename)
do_something_with_the_file(filename)
end

Ruby - iterate tasks with files

I am struggling to iterate tasks with files in Ruby.
(Purpose of the program = every week, I have to save 40 pdf files off the school system containing student scores, then manually compare them to last week's pdfs and update one spreadsheet with every student who has passed their target this week. This is a task for a computer!)
I have converted a pdf file to text, and my program then extracts the correct data from the text files and turns each student into an array [name, score, house group]. It then checks each new array against the data in the csv file, and adds any new results.
My program works on a single pdf file, because I've manually typed in:
f = File.open('output\agb summer report.txt')
agb = []
f.each_line do |line|
agb.push line
end
But I have a whole folder of pdf files that I want to run the program on iteratively. I've also had problems when I try to write each result to a new-named file.
I've tried things with variables and code blocks, but I now don't think you can use a variable in that way?
Dir.foreach('output') do |ea|
f = File.open(ea)
agb = []
f.each_line do |line|
agb.push line
end
end
^ This doesn't work. I've also tried exporting the directory names to an array, and doing something like:
a.each do |ea|
var = '\'output\\' + ea + '\''
f = File.open(var)
agb = []
f.each_line do |line|
agb.push line
end
end
I think I'm fundamentally confused about the sorts of object File and Dir are? I've searched a lot and haven't found a solution yet. I am fairly new to Ruby.
Anyway, I'm sure this can be done - my current backup plan is to copy my program 40 times with different details, but that sounds absurd. Please offer thoughts?
You're very close. Dir.foreach() will return the name of the files whereas File.open() is going to want the path. A crude example to illustrate this:
directory = 'example_directory'
Dir.foreach(directory) do |file|
# Assuming Unix style filesystem, skip . and ..
next if file.start_with? '.'
# Simply puts the contents
path = File.join(directory, file)
puts File.read(path)
end
Use Globbing for File Lists
You need to use Dir.glob to get your list of files. For example, given three PDF files in /tmp/pdf, you collect them with a glob like so:
Dir.glob('/tmp/pdf/*pdf')
# => ["/tmp/pdf/1.pdf", "/tmp/pdf/2.pdf", "/tmp/pdf/3.pdf"]
Dir.glob('/tmp/pdf/*pdf').class
# => Array
Once you have a list of filenames, you can iterate over them with something like:
Dir.glob('/tmp/pdf/*pdf').each do |pdf|
text = %x(pdftotext "#{pdf}" -) # the trailing "-" sends the text to stdout instead of a .txt file
# do something with your textual data
end
If you're on a Windows system, then you might need a gem like pdf-reader or something else from Ruby Toolbox that suits you better to actually parse the PDF. Regardless, you should use globbing to create a file list; what you do after that depends on what kind of data the file actually holds. IO#read and descendants like File#read are good places to start.
Handling Text Files
If you're dealing with text files rather than PDF files, then something like this will get you started:
Dir.glob('/tmp/pdf/*txt').each do |text|
# Do something with your textual data. In this case, just
# dump the files to standard output.
p File.read(text)
end
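Since the question also mentions writing each result to a newly named file, here is a sketch of one way to do that. The method name process_reports and the ".out" suffix are assumptions for illustration, and the real processing is left to the caller's block.

```ruby
# Sketch: read every .txt file in a folder into an array of lines and write a
# result file with a new name next to it. The ".out" suffix and the name
# `process_reports` are assumptions for illustration.
def process_reports(dir)
  Dir.glob(File.join(dir, "*.txt")).each do |path|
    lines = File.readlines(path, chomp: true) # one array element per line
    result = yield(lines)                     # caller-supplied processing

    # Build the new name from the old one, e.g. "agb.txt" -> "agb.out"
    out_path = File.join(dir, File.basename(path, ".txt") + ".out")
    File.write(out_path, result.join("\n") + "\n")
  end
end

# Example: drop blank lines from every report in the "output" folder.
# process_reports("output") { |lines| lines.reject(&:empty?) }
```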
You can use Dir.new("./") to enumerate the entries in the current directory, so something like this should work:
file_names = Dir.new "./"
file_names.each do |file_name|
if file_name.end_with? ".txt"
f = File.open(file_name)
agb = []
f.each_line do |line|
agb.push line
end
end
end
By the way, you can just use agb = f.to_a to convert the file contents into an array where each element is a line from the file:
file_names = Dir.new "./"
file_names.each do |file_name|
if file_name.end_with? ".txt"
f = File.open file_name
agb = f.to_a
# do whatever processing you need to do
end
end
If you assign your target folder as a glob pattern like /path/to/your/folder/*.txt, Dir[] will only iterate over text files.
2.2.0 :009 > target_folder = "/home/ziya/Desktop/etc3/example_folder/*.txt"
=> "/home/ziya/Desktop/etc3/example_folder/*.txt"
2.2.0 :010 > Dir[target_folder].each do |texts|
2.2.0 :011 > puts texts
2.2.0 :012?> end
/home/ziya/Desktop/etc3/example_folder/ex4.txt
/home/ziya/Desktop/etc3/example_folder/ex3.txt
/home/ziya/Desktop/etc3/example_folder/ex2.txt
/home/ziya/Desktop/etc3/example_folder/ex1.txt
iteration over text files is ok
2.2.0 :002 > Dir[target_folder].each do |texts|
2.2.0 :003 > File.open(texts, 'w') {|file| file.write("your content\n")}
2.2.0 :004?> end
results
2.2.0 :008 > system ("pwd")
/home/ziya/Desktop/etc3/example_folder
=> true
2.2.0 :009 > system("for f in *.txt; do cat $f; done")
your content
your content
your content
your content

Script to append files

I am trying to write a script to do the following:
There are two directories A and B. In directory A, there are files called "today" and "today1". In directory B, there are three files called "today", "today1" and "otherfile".
I want to loop over the files in directory A and append the files that have similar names in directory B to the files in Directory A.
I wrote the method below to handle this but I am not sure if this is on track or if there is a more straightforward way to handle such a case?
Please note I am running the script from directory B.
def append_data_to_daily_files
directory = "B"
Dir.entries('B').each do |file|
fileName = file
next if file == '.' or file == '..'
File.open(File.join(directory, file), 'a') {|file|
Dir.entries('.').each do |item|
next if !(item.match(/fileName/))
File.open(item, "r")
file<<item
item.close
end
#file.puts "hello"
file.close
}
end
end
In my opinion, your append_data_to_daily_files() method is trying to do too many things -- which makes it difficult to reason about. Break down the logic into very small steps, and write a simple method for each step. Here's a start along that path.
require 'set'
def dir_entries(dir)
Dir.chdir(dir) {
return Dir.glob('*').to_set
}
end
def append_file_content(target, source)
File.open(target, 'a') { |fh|
fh.write(IO.read(source))
}
end
def append_common_files(target_dir, source_dir)
ts = dir_entries(target_dir)
ss = dir_entries(source_dir)
common_files = ts.intersection(ss)
common_files.each do |file_name|
t = File.join(target_dir, file_name)
s = File.join(source_dir, file_name)
append_file_content(t, s)
end
end
# Run script like this:
# ruby my_script.rb A B
append_common_files(*ARGV)
By using a Set, you can easily figure out the common files. By using glob you can avoid the hassle of filtering out the dot-directories. By designing the code to take its directory names from the command line (rather than hard-coding the names in the script), you end up with a potentially re-usable tool.
My solution....
def append_old_logs_to_daily_files
directory = "B"
#For each file in the folder "B"
Dir.entries('B').each do |file|
fileName = file
#skip dot directories
next if file == '.' or file == '..'
#Open each file
File.open(File.join(directory, file), 'a') {|target|
#Get each log file from the current directory in turn
Dir.entries('.').each do |item|
next if item == '.' or item == '..'
#that matches the day we are looking for
next if !(item.match(fileName))
#Read the log file and append its contents
target << File.read(item)
end
#the block form of File.open closes the target file automatically
}
end
end

How do I get all the files names in one folder using Ruby?

These are in a folder:
This_is_a_very_good_movie-y08iPnx_ktA.mp4
myMovie2-lKESbDzUwUg.mp4
his_is_another_movie-lKESbDzUwUg.mp4
How do I fetch the first part of the string mymovie1 from the file by giving the last part, y08iPnx_ktA? Something like:
get_first_part("y08iPnx_kTA") #=> "This_is_a_very_good_movie"
Break the problem into parts. The method get_first_part should go something like:
1) Use Dir to get a listing of files.
2) Iterate over each file.
3) Extract the "name" ('This_is_a_very_good_movie') and the "tag" ('y08iPnx_ktA'). The same regex should be used for each file.
4) If the "tag" matches what is being looked for, return "name".
Happy coding.
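The steps in this answer could be sketched as follows. The regex assumes file names of the form "name-tag.ext", with the tag being everything between the last hyphen and the extension; that layout is inferred from the three example names, and find_name_by_tag is a hypothetical method name.

```ruby
# Sketch of the steps above. Assumes names shaped like "<name>-<tag>.<ext>",
# where the tag is everything after the last hyphen.
# `find_name_by_tag` is a hypothetical name.
def find_name_by_tag(dir, tag)
  Dir.entries(dir).each do |fname|
    # The same regex is applied to every entry; "." and ".." never match it.
    match = fname.match(/\A(?<name>.+)-(?<tag>[^-]+)\.[^.]+\z/)
    next unless match
    return match[:name] if match[:tag] == tag
  end
  nil # no file carries that tag
end

# Example: find_name_by_tag(".", "y08iPnx_ktA")
```

Note that the comparison is case-sensitive, so the tag must be spelled exactly as it appears in the file name.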
Play around in the REPL and have fun :-)
def get_first_part(path, suffix)
Dir.entries(path).find do |fname|
File.basename(fname, File.extname(fname)).end_with?(suffix)
end.split(suffix).first
end
This kind of expands on the answer from @Steve Wilhelm, except that it doesn't use glob (there's no need for it when we're only working with filenames), avoids Regexp, and passes File.extname(fname) to the File.basename call so you don't have to include the file extension. It also returns the string "This_is_a_very_good_movie" instead of an array of files.
This will of course raise if no file can be found. If you just want to return nil when there is no match:
def get_first_part(path, suffix)
file = Dir.entries(path).find do |fname|
File.basename(fname, File.extname(fname)).end_with?(suffix)
end
file.split(suffix).first if file
end
Can it be done cleaner than this? REVISED based on @Tin Man's suggestion
def get_first_part(path, suffix)
Dir.glob(path + "*" + suffix + "*").map { |x| File.basename(x).gsub(Regexp.new("#{suffix}\.*$"),'') }
end
puts get_first_part("/path/to/files/", "-y08iPnx_kTA")
If the filenames only have a single hyphen:
path = '/Users/greg/Desktop/test'
target = 'rb'
def get_files(path, target)
Dir.chdir(path) do
return Dir["*#{ target }*"].map{ |f| f.split('-').first }
end
end
puts get_files(path, 'y08iPnx_ktA')
# >> This_is_a_very_good_movie
If there are multiple hyphens:
def get_files(path, target)
Dir.chdir(path) do
return Dir["*#{ target }*"].map{ |f| f.split(target).first.chop }
end
end
puts get_files(path, 'y08iPnx_ktA')
# >> This_is_a_very_good_movie
If the code is assumed to be running from inside the directory containing the files, then Dir.chdir can be removed, simplifying things to either:
puts Dir["*#{ target }*"].map{ |f| f.split('-').first }
# >> This_is_a_very_good_movie
or
puts Dir["*#{ target }*"].map{ |f| f.split(target).first.chop }
# >> This_is_a_very_good_movie

how do i get Ruby FileList to pick up files without a name, like .htaccess on windows

I want to search my filesystem for any files with the extension .template.
The below works fine for everything except .htaccess.template
FileList.new(File.join(root, '**', '*.template')).each do |file|
# do stuff with file
end
because windows doesn't like nameless files, grrrr
How do I make this work on Windows? This code works fine on Linux....
How about
Dir.glob([".*.template", "*.template"])
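Another option, sketched here with plain Dir.glob rather than Rake's FileList, is the File::FNM_DOTMATCH flag, which lets the * wildcard match a leading dot. The helper name template_files is made up for illustration, and it is worth checking the behaviour on your own platform and Ruby version.

```ruby
# Sketch: find *.template files recursively, including names that start
# with a dot. `template_files` is a hypothetical helper name.
def template_files(root)
  # File::FNM_DOTMATCH lets "*" match a leading dot, so a file such as
  # ".htaccess.template" is included alongside "first.template".
  Dir.glob(File.join(root, "**", "*.template"), File::FNM_DOTMATCH).sort
end

# Example: template_files(".").each { |file| puts file }
```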
Assuming that FileList here is the FileList class from rake then the problem is in Ruby's underlying Dir class (which is used by FileList) not matching files starting with . for the * wildcard. The relevant portion of rake.rb is
# Add matching glob patterns.
def add_matching(pattern)
Dir[pattern].each do |fn|
self << fn unless exclude?(fn)
end
end
Below is an ugly hack that overrides add_matching to also include files starting with . Hopefully someone else will be along to suggest a more elegant solution.
class Rake::FileList
def add_matching(pattern)
files = Dir[pattern]
# ugly hack to include files starting with . on Windows
if RUBY_PLATFORM =~ /mswin/
parts = File.split(pattern)
# if filename portion of the pattern starts with * also
# include the files matching '.' + the same pattern
if parts.last[0] == ?*
files += Dir[File.join(parts[0...-1] << '.' + parts.last)]
end
end
files.each do |fn|
self << fn unless exclude?(fn)
end
end
end
Update: I have just tested this on Linux here and the files starting with . are not included either. e.g. If I have a directory /home/mikej/root with 2 subdirectories a and b where each contains first.template and .other.template then
Rake::FileList.new('/home/mikej/root/**/*.template')
=> ["/home/mikej/root/a/first.template", "/home/mikej/root/b/first.template"]
so I would double check the behaviour on Linux and verify that there isn't something else causing the difference in behaviour.
