I am having an issue with regular expressions. Basically, I have a folder that contains subfolders as well as files, and I have to check those folders for certain words. The words I have to check for are listed in a file called words.txt.
This is the code that I have so far in Ruby:
def check_words
  array_of_words = File.readlines('words.txt')
  re = Regexp.union(array_of_words)
  new_array_of_words = [/\b(?:#{re.source})\b/]
  Dir['source_test/**/*'].select { |f| File.file?(f) }.each do |filepath|
    new_array_of_words.each do |word|
      puts File.foreach(filepath).include?(word)
    end
  end
end
When I execute this code I keep getting false, even though some of the files inside the folders/subfolders contain those words.
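new_array_of_words is an array holding a single regex, and include? compares each line of the file to its argument with ==, so a line (a string) is never equal to a regex (it also doesn't make much sense to iterate over a one-element array). On top of that, File.readlines keeps the trailing newline of every word, so those newlines become part of the union regex.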
You can keep using the regex, but use regex methods instead.
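A minimal sketch of the regex approach, assuming words.txt holds one word per line: the words are read with chomp: true so no trailing newlines end up in the pattern, and each line is tested with match? instead of being compared with include?. The method names word_regex and files_with_words are hypothetical. Files that are not valid text in the default encoding (binaries, for example) may make match? raise ArgumentError, so real code might need to skip them.

```ruby
# Assumption: words.txt contains one word per line.
# Build a single word-boundary regex from the word list.
def word_regex(words_file)
  # chomp: true drops the trailing "\n" that readlines would otherwise keep
  words = File.readlines(words_file, chomp: true).map(&:strip).reject(&:empty?)
  /\b(?:#{Regexp.union(words).source})\b/
end

# Return every regular file below dir (recursively) that contains at least one word.
def files_with_words(dir, words_file)
  re = word_regex(words_file)
  Dir[File.join(dir, '**', '*')].select do |path|
    File.file?(path) && File.foreach(path).any? { |line| line.match?(re) }
  end
end

# Example (hypothetical paths):
# puts files_with_words('source_test', 'words.txt')
```

Because of the \b anchors, a file containing only "apples" does not count as a match for the word "apple".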
You can also fix your current code as follows:
arr = File.readlines('/home/afifit/Tests/word.txt').map(&:strip)
Dir['yourdir/**/*'].select { |f| File.file?(f) }.each do |filepath|
  arr.each do |word|
    # true if any line of the file contains the word
    puts File.foreach(filepath).any? { |line| line.include?(word) }
  end
end
I also used strip to remove surrounding whitespace, including the trailing newline that readlines keeps on each line.
I have a collection of folders (within a folder) that all need to be renamed based on their contents.
Specifically, I'd like to rename "/working_directory/my_folder/my_file.extension" to "/working_directory/my_file/my_file.extension".
There are a few other files within /my_folder/. How might I do this recursively using Ruby?
I'm new to Ruby and programming. So far I have tried just extracting the file names, but have not had much luck. Below is my attempt at iterating through the folders; it cycles through /working_directory/ every time Find.find is called. The intent is to search only /working_directory/my_folder/ for the file with the .fls extension.
require 'find'
Path = "/working_directory/"
Dir.foreach(Path) do |file|
  puts file
  new_dir = Path+file
  puts new_dir
  Find.find(new_dir) do |i| # this is intended to be /working_directory/my_folder/
    fls_file << i if i =~ /.*\.fls$/
    puts fls_file
  end
end
Assuming my_file is to be chosen by its extension, one might do:
require 'fileutils'

Dir["/working_directory/**/*"].select do |dir_or_file|
  File.directory? dir_or_file # select only directories, recursively
end.inject({}) do |memo, dir|
  new_name = Dir["#{dir}/*.extension"]
  unless new_name.size == 1 # check if the folder contains exactly one matching file
    puts "Multiple/No choices; cannot rename dir [#{dir}] ⇒ skipping..."
    next memo # skip if the condition is not met
  end
  my_file = new_name.first[/[^\/]+(?=\.extension\z)/] # get my_file
  memo[dir] = dir.sub(/[^\/]+\z/, my_file) unless dir.end_with?("/#{my_file}")
  memo
end.sort_by { |old, _| -old.count('/') }.each do |old, neu| # deepest directories first
  # dry run to make sure everything is OK
  puts "Gonna rename #{old} to #{neu}"
  # uncomment the lines below once you are certain the code works properly
  # neu_folder = neu[/(.*?)([^\/]+\z)/, 1]
  # FileUtils.mkdir neu_folder unless File.exist? neu_folder
  # FileUtils.mv old, neu # rename
end
The renaming is done after the main processing so that the list being iterated stays consistent. It might be done in the first loop instead, but collecting old => neu pairs in a hash first also makes it easy to sort them so the deepest directories are renamed first, which keeps a parent's rename from invalidating the paths of its subdirectories.
The code relies heavily on parsing strings with regexps.
my_file = new_name.first[/[^\/]+(?=\.extension\z)/] # get my_file
This line gets the new folder name by taking the tail of the path that contains no slashes and is followed by .extension at the very end of the string (see positive lookahead).
memo[dir] = dir.sub(/[^\/]+\z/, my_file) unless dir.end_with?("/#{my_file}")
This line adds an entry to the accumulator hash, mapping the old directory path to the same path with its last segment (the old folder name) replaced by the new one; directories that already have the right name are skipped.
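The same renaming can be sketched with File.basename and File.dirname instead of regex parsing. This is a variant, not the answer's code: rename_dirs_after_files is a hypothetical name, the extension is passed in (e.g. '.extension'), directories are processed deepest first, and nothing is moved unless dry_run: false is given.

```ruby
require 'fileutils'

# Hypothetical helper: for every directory under root that contains exactly one
# file with the given extension, rename the directory after that file.
# Returns the [old, new] pairs; pass dry_run: false to actually rename.
def rename_dirs_after_files(root, ext, dry_run: true)
  dirs = Dir.glob(File.join(root, '**', '*')).select { |p| File.directory?(p) }
  # Deepest directories first, so renaming a parent cannot invalidate
  # the paths of children that are still waiting to be renamed.
  dirs.sort_by! { |d| -d.count(File::SEPARATOR) }

  dirs.filter_map do |dir|
    candidates = Dir.glob(File.join(dir, "*#{ext}")).select { |f| File.file?(f) }
    next unless candidates.size == 1

    new_name = File.basename(candidates.first, ext) # "my_file.extension" -> "my_file"
    new_dir = File.join(File.dirname(dir), new_name)
    next if new_dir == dir || File.exist?(new_dir)  # already named, or name taken

    FileUtils.mv(dir, new_dir) unless dry_run
    [dir, new_dir]
  end
end

# Example: rename_dirs_after_files('/working_directory', '.extension').each { |o, n| puts "#{o} -> #{n}" }
```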
I have a file with the following data:
other data
user1=name1
user2=name2
user3=name3
other data
To extract the names, I do the following:
names = File.open('resource.cfg', 'r') do |f|
  f.grep(/[a-z][a-z][0-9]/)
end
which returns the following array:
user1=name1
user2=name2
user3=name3
but I only want the name part:
name1
name2
name3
Right now I'm doing this after the file step:
names = names.map do |name|
  name[7..9]
end
Is there a better way to do this, within the file step?
You could do it like this, using String#scan with a regex:
Code
File.read(FNAME).scan(/(?<==)[A-Za-z]+\d+$/)
Explanation
Let's start by constructing a file:
FNAME = "my_file"
lines =<<_
other data
user1=name1
user2=name2
user3=name3
other data
_
File.write(FNAME,lines)
We can confirm the file contents:
puts File.read(FNAME)
other data
user1=name1
user2=name2
user3=name3
other data
Now run the code:
File.read(FNAME).scan(/(?<==)[A-Za-z]+\d+$/)
#=> ["name1", "name2", "name3"]
A word about the regex I used.
(?<=...)
is called a "positive lookbehind". Whatever is inserted in place of the dots must immediately precede the match, but is not part of the match (which is why lookarounds are sometimes referred to as "zero-width" assertions). We want the match to follow an equals sign, so the positive lookbehind is as follows:
(?<==)
This is followed by one or more letters, then one or more digits, then an end-of-line, which comprise the pattern to be matched. You could of course change this if you have different requirements, such as names being lowercase or beginning with a capital letter, a specified number of digits, and so on.
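For comparison, here is a sketch of the same extraction without a lookbehind, using a capture group instead (the method names are hypothetical). With a group in the regex, scan returns one array per match, hence the flatten; the second variant reads line by line, which also fits inside the "file step" the question asks about.

```ruby
# Variant 1: scan with a capture group. When the regex has groups, scan returns
# an array of arrays (one per match), so flatten it.
def values_via_scan(path)
  File.read(path).scan(/=(\w+)$/).flatten
end

# Variant 2: line by line. String#[] with a group index returns just that group
# (or nil when the line does not match), and filter_map drops the nils.
def values_via_foreach(path)
  File.foreach(path).filter_map { |line| line[/\A\w+=(\w+)$/, 1] }
end
```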
Is your code working as you have posted it?
names = File.open('resource.cfg', 'r') { |f| f.grep(/[a-z][a-z][0-9]/) }
names = names.map { |name| name[7..9] }
=> ["ame", "ame", "ame"]
You could make it into a neat little one-liner like this:
names = File.readlines('resource.cfg').grep(/=(\w*)/) { |x| x.split('=')[1].chomp }
You can do it all in a single step:
names = File.open('resource.cfg', 'r') do |f|
  f.grep(/[a-z][a-z][0-9]/).map { |x| x.split('=')[1].chomp }
end
I have a directory with about 100,000 small files (each file is about 1 kB).
I need to get a list of these files and iterate over it to find files with the same name but different case (the files are on a Linux ext4 file system).
Currently, I use some code like this:
def similar_files_in_folder(file_path, folder, exclude_folders = false)
  files = Dir.glob(file_path, File::FNM_CASEFOLD)
  files_set = files.select { |f| f.start_with?(folder) }
  return files_set unless exclude_folders
  files_set.reject { |entry| File.directory? entry }
end
dir_entries = Dir.entries(#directory) - ['.', '..']
dir_entries.map do |file_name|
  similar_files_in_folder(file_name, #directory)
end
The issue with this approach is that the snippet takes a very long time to finish: several hours on my system.
Is there another way to achieve the same goal but much faster in Ruby?
Limitation: I can't load the file list into memory once and then just compare the downcased names, because new files keep appearing in #directory.
So I need to scan #directory on each iteration.
Thanks for any hint.
If I understand your code correctly, this already returns an array of all those 100k filenames:
dir_entries = Dir.entries(#directory) - ['.', '..']
#=> ["foo.txt", "bar.txt", "BAR.txt", ...]
I would group this array by the lowercase filename:
dir_entries.group_by(&:downcase)
#=> {"foo.txt"=>["foo.txt"], "bar.txt"=>["bar.txt", "BAR.txt"], ... }
And select the ones with more than one occurrence:
dir_entries.group_by(&:downcase).select { |k, v| v.size > 1 }
#=> {"bar.txt"=>["bar.txt", "BAR.txt"], ...}
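Wrapped in a method, this is one directory read plus one grouping pass per call, instead of a separate Dir.glob (a fresh scan of all ~100k entries) for every entry. Calling it again on each iteration re-reads the directory, so newly appeared files are picked up. A sketch, with case_duplicates as a hypothetical name and Dir.children (Ruby 2.5+) used so "." and ".." never appear:

```ruby
# Return groups of entries in dir whose names differ only by case,
# keyed by the downcased name.
def case_duplicates(dir)
  Dir.children(dir)                      # one directory scan, no "." or ".."
     .group_by(&:downcase)
     .select { |_key, names| names.size > 1 }
end
```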
What I meant by my comment was that you could search for a string as you traverse the filesystem, instead of first building up a huge array of all possible files and only then searching. I wrote something similar to Linux find <path> | grep --color -i <pattern>, except that it highlights the pattern only in the basename:
require 'find'

# find files whose basename matches a pattern (and output results to console)
def find_similar(s, opts = {})
  # by default, path is '.', case insensitive, no bash terminal coloring
  opts[:verbose] ||= false
  opts[:path] ||= '.'
  opts[:insensitive] = true if opts[:insensitive].nil?
  opts[:color] ||= false
  boldred = "\e[1m\e[31m\\1\e[0m" # contains an escaped \1 for regex
  puts "searching for \"#{s}\" in \"#{opts[:path]}\", insensitive=#{opts[:insensitive]}..." if opts[:verbose]
  reg = opts[:insensitive] ? /(#{s})/i : /(#{s})/
  Find.find(opts[:path]) do |path|
    dir, base = File.dirname(path), File.basename(path)
    next unless base =~ reg
    if opts[:color]
      puts "#{dir}/#{base.gsub(reg, boldred)}"
    else
      puts path
    end
  end
end
time = Time.now
#find_similar('LOg', :color=>true) #similar to find . | grep --color -i LOg
find_similar('pYt', :path=>'c:/bin/sublime3/', :color=>true, :verbose=>true)
puts "search took #{Time.now-time}sec"
Example output (Cygwin) omitted; the script also works when run from cmd.exe.
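Applying the same traverse-as-you-go idea to the original question, a sketch (each_case_clash is a hypothetical name) can walk the tree once with Find.find and report a case-only clash as soon as it is seen, rather than building the full list first. It still remembers the names it has already passed, but each call is a single fresh walk, so new files are picked up the next time it runs.

```ruby
require 'find'

# Walk path once and yield [earlier_path, path] whenever an entry's name equals
# an earlier name in the same directory except for case.
def each_case_clash(path)
  return enum_for(__method__, path) unless block_given?

  seen = Hash.new { |h, k| h[k] = {} } # dirname => { downcased basename => first path seen }
  Find.find(path) do |p|
    dir, base = File.split(p)
    key = base.downcase
    if (first = seen[dir][key])
      yield first, p
    else
      seen[dir][key] = p
    end
  end
end

# Example: each_case_clash('.') { |a, b| puts "#{a} <-> #{b}" }
```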
Please forgive my ignorance; I am new to Ruby.
I know how to search a string, or even a single file with a regular expression:
str = File.read('example.txt')
match = str.scan(/[0-9A-Za-z]{8,8}/)
puts match[1]
I know how to search for a static phrase in multiple files and directories:
pattern = "hello"
Dir.glob('/home/bob/**/*').each do |file|
  next unless File.file?(file)
  File.open(file) do |f|
    f.each_line do |line|
      puts "#{pattern}" if line.include?(pattern)
    end
  end
end
I cannot figure out how to use my regexp against multiple files and directories. Any and all help is much appreciated.
Well, you're quite close. First make pattern a Regexp object:
pattern = /hello/
Or if you are trying to make a Regexp from a String (like passed in on the command line), you might try:
pattern = Regexp.new("hello")
# or use first argument for regexp
pattern = Regexp.new(ARGV[0])
Now, when you are searching, line is a String, so you can use match or scan to match it against your pattern:
f.each_line do |line|
  if line.match(pattern)
    puts Regexp.last_match(0) # the matched text ($0 would be the script name)
  end
  # or
  if (match_data = line.match(pattern))
    puts match_data[0]
  end
  # or to see multiple matches
  unless (matches = line.scan(pattern)).empty?
    p matches
  end
end
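Putting the pieces together, a sketch of a recursive regex search (grep_tree is a hypothetical name) that collects [file, line number, matched text] for every match. Regexp.last_match(0) inside the scan block gives the whole match even if the pattern contains groups. The rescue that skips files which are not valid text in the default encoding is an assumption about what you want to happen with binaries.

```ruby
# Collect [file, line_number, matched_text] for every match of pattern in every
# regular file under root (recursively).
def grep_tree(root, pattern)
  results = []
  Dir.glob(File.join(root, '**', '*')).each do |file|
    next unless File.file?(file)
    File.foreach(file).with_index(1) do |line, lineno|
      line.scan(pattern) { results << [file, lineno, Regexp.last_match(0)] }
    end
  rescue ArgumentError
    # e.g. "invalid byte sequence": assume a binary file and skip the rest of it
    next
  end
  results
end

# Example:
# grep_tree('/home/bob', /[0-9A-Za-z]{8}/).each { |file, n, m| puts "#{file}:#{n}: #{m}" }
```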
Is there a way to open a file case-insensitively in Ruby under Linux? For example, given the string foo.txt, can I open the file FOO.txt?
One possible way would be to read all the filenames in the directory and manually search the list for the required file, but I'm looking for a more direct method.
One approach would be to write a little method to build a case insensitive glob for a given filename:
def ci_glob(filename)
  glob = ''
  filename.each_char do |c|
    glob += c.downcase != c.upcase ? "[#{c.downcase}#{c.upcase}]" : c
  end
  glob
end
irb(main):024:0> ci_glob('foo.txt')
=> "[fF][oO][oO].[tT][xX][tT]"
and then you can do:
filename = Dir.glob(ci_glob('foo.txt')).first
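A slightly more defensive variant of ci_glob (a sketch, not the answer's code): glob metacharacters such as * and ? in the filename are escaped with a backslash so they match literally instead of acting as wildcards. It still maps case one character at a time, so it is only an approximation for letters whose upper- and lowercase forms are not single characters.

```ruby
# Build a glob that matches filename regardless of letter case, escaping
# glob metacharacters so they are matched literally.
def ci_glob(filename)
  filename.each_char.map do |c|
    if c.downcase != c.upcase
      "[#{c.downcase}#{c.upcase}]"  # letter: accept either case
    elsif '*?[]{}\\'.include?(c)
      "\\#{c}"                      # metacharacter: escape it
    else
      c
    end
  end.join
end

# Example: Dir.glob(ci_glob('foo.txt')).first
```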
Alternatively, you can write the directory search you suggested quite concisely, e.g.:
filename = Dir.glob('*').find { |f| f.downcase == 'foo.txt' }
Prior to Ruby 3.1 it was possible to use the FNM_CASEFOLD option to make glob case-insensitive, e.g.:
filename = Dir.glob('foo.txt', File::FNM_CASEFOLD).first
if filename
  # use filename here
else
  # no matching file
end
The documentation suggested FNM_CASEFOLD couldn't be used with glob, but it did actually work in older Ruby versions. However, as mentioned by lildude in the comments, the behaviour has now been brought in line with the documentation, so this approach shouldn't be used.
You can use Dir.glob with the FNM_CASEFOLD flag to get a list of all filenames that match the given name except for case. You can then just use first on the resulting array to get any result back, or use min_by to get the one that most closely matches the case of the original.
def find_file(f)
  Dir.glob(f, File::FNM_CASEFOLD).min_by do |f2|
    f.chars.zip(f2.chars).count { |c1, c2| c1 != c2 }
  end
end
system "touch foo.bar"
system "touch Foo.Bar"
Dir.glob("FOO.BAR", File::FNM_CASEFOLD) #=> ["foo.bar", "Foo.Bar"]
find_file("FOO.BAR") #=> "Foo.Bar"
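Because, as noted above for Ruby 3.1, FNM_CASEFOLD should no longer be relied on with glob, here is a sketch of the same "closest case" idea that does the case-insensitive comparison itself with String#casecmp? and keeps the min_by scoring. find_file_ci is a hypothetical name; it returns the entry name (not a full path), or nil when nothing matches.

```ruby
# Return the entry in dir whose name equals name ignoring case, preferring the
# one with the fewest differing characters; nil if there is none.
def find_file_ci(name, dir = '.')
  Dir.children(dir)
     .select { |entry| entry.casecmp?(name) }
     .min_by { |entry| name.chars.zip(entry.chars).count { |a, b| a != b } }
end
```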