Ruby: Searching a regular expression across multiple files in multiple directories - ruby

Please forgive my ignorance, I am new to Ruby.
I know how to search a string, or even a single file with a regular expression:
str = File.read('example.txt')
match = str.scan(/[0-9A-Za-z]{8,8}/)
puts match[1]
I know how to search for a static phrase in multiple files and directories:
pattern = "hello"
Dir.glob('/home/bob/**/*').each do |file|
  next unless File.file?(file)
  File.open(file) do |f|
    f.each_line do |line|
      puts "#{pattern}" if line.include?(pattern)
    end
  end
end
I can not figure out how to use my regexp against multiple files and directories. Any and all help is much appreciated.

Well, you're quite close. First make pattern a Regexp object:
pattern = /hello/
Or if you are trying to make a Regexp from a String (like passed in on the command line), you might try:
pattern = Regexp.new("hello")
# or use first argument for regexp
pattern = Regexp.new(ARGV[0])
Now when you are searching, line is a String. You can use match or scan to get the results of it matching against your pattern.
f.each_line do |line|
  if line.match(pattern)
    puts $& # $& holds the text of the last match ($0 is the script name)
  end
  # or
  if !(match_data = line.match(pattern)).nil?
    puts match_data[0]
  end
  # or to see multiple matches
  unless (matches = line.scan(pattern)).empty?
    p matches
  end
end
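Putting the directory walk and the regexp together, a minimal sketch might look like the following. The method name grep_tree and the shape of its return value are hypothetical and not part of the original answer, and the sketch assumes the files are text that can be read line by line.

```ruby
# Hypothetical helper: returns [path, line_number, matches] for every line
# under root that contains at least one match of pattern.
def grep_tree(root, pattern)
  results = []
  Dir.glob(File.join(root, '**', '*')).each do |path|
    next unless File.file?(path) # skip directories
    File.foreach(path).with_index(1) do |line, lineno|
      # scrub replaces invalid byte sequences so scan does not raise on binary data
      matches = line.scrub.scan(pattern)
      results << [path, lineno, matches] unless matches.empty?
    end
  end
  results
end

# Possible usage (the path is the one from the question):
# grep_tree('/home/bob', /[0-9A-Za-z]{8}/).each do |path, lineno, matches|
#   puts "#{path}:#{lineno}: #{matches.join(', ')}"
# end
```

Collecting the results instead of printing them inside the loop keeps the search reusable; the caller decides how to report the matches.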

Related

Checking for words inside folders/subfolders and files

I am having an issue with regular expressions. I have a folder that contains subfolders as well as files, and I have to check those files for certain words. The words to check for are listed in a file called words.txt.
This is the code that I have so far in Ruby:
def check_words
  array_of_words = File.readlines('words.txt')
  re = Regexp.union(array_of_words)
  new_array_of_words = [/\b(?:#{re.source})\b/]
  Dir['source_test/**/*'].select{|f| File.file?(f) }.each do |filepath|
    new_array_of_words.each do |word|
      puts File.foreach(filepath).include?(word)
    end
  end
end
When I execute this code I keep getting false, even though some of the files inside the folders/subfolders contain those words.
new_array_of_words is an array holding a single regex, and File.foreach(filepath).include?(word) only checks whether some line is equal to word, which a regex never is (iterating over a one-element array doesn't add anything either).
You can keep the regex, but then use regex methods such as match? or =~ on the file contents.
Alternatively, you can fix your current code as follows:
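# strip removes the trailing newline that readlines leaves on every word
arr = File.readlines('/home/afifit/Tests/word.txt').map(&:strip)
Dir['yourdir/**/*'].select { |f| File.file?(f) }.each do |filepath|
  content = File.read(filepath)
  arr.each do |word|
    # substring check against the whole file, not equality with a single line
    puts content.include?(word)
  end
end
I also used strip to remove any unnecessary whitespace and newlines.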
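For the regex route mentioned above, a minimal sketch could look like this. The helper names words_regex and files_with_words are hypothetical, and the sketch assumes the files are small enough to read into memory.

```ruby
# Hypothetical helper: builds one word-boundary regex from a list of words.
# Regexp.union escapes each string, so words are matched literally.
def words_regex(words)
  cleaned = words.map(&:strip).reject(&:empty?)
  /\b(?:#{Regexp.union(cleaned).source})\b/
end

# Hypothetical helper: returns the paths of all files under root whose
# contents contain at least one of the words.
def files_with_words(root, words)
  re = words_regex(words)
  Dir[File.join(root, '**', '*')].select do |path|
    File.file?(path) && File.read(path).scrub.match?(re)
  end
end

# Possible usage with the names from the question:
# puts files_with_words('source_test', File.readlines('words.txt'))
```

Stripping the words before building the regex matters here, because File.readlines keeps the trailing newline on every entry.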

Ruby : line.include?"#{varStrTextSearch}"

I have a file ImageContainer.xml with text as follow:
<leftArrowImage>/apps/mcaui/PAL/Arrows/C0004OptionNavArrowLeft.png</leftArrowImage>
<rightArrowImage>/apps/mcaui/PAL/Arrows/C0003OptionNavArrowRight.png</rightArrowImage>
Now, I am searching for C0004OptionNavArrowLeft.png and C0003OptionNavArrowRight.png in that file.
Code is:
@LangFileName = "ZZZPNG.txt"
fileLangInput = File.open(@LangFileName)
fileLangInput.each_line do |varStrTextSearch|
  puts "\nSearching ==>" + varStrTextSearch
  Dir.glob("**/*.*") do |file_name|
    fileSdfInput = File.open(file_name)
    fileSdfInput.each_line do |line|
      if line.include?"#{varStrTextSearch}"
        puts"Found"
      end
    end
  end
end
Here varStrTextSearch is a string variable that takes different string values.
The problem is that it finds C0004OptionNavArrowLeft.png but not C0003OptionNavArrowRight.png.
Can someone tell me what I am doing wrong?
My guess is, newline chars are the problem.
fileLangInput.each_line do |varStrTextSearch|
varStrTextSearch here will contain a \n char at the end. And if your XML is not consistently formatted (for example, like this)
<leftArrowImage>
/apps/mcaui/PAL/Arrows/C0004OptionNavArrowLeft.png
</leftArrowImage>
<rightArrowImage>/apps/mcaui/PAL/Arrows/C0003OptionNavArrowRight.png</rightArrowImage>
Then your problem can be reproduced (there's no newline char after "C0003OptionNavArrowRight", so it can't be found).
Solution? Remove the unwanted whitespace.
fileSdfInput.each_line do |line|
  if line.include? varStrTextSearch.chomp # read the docs on String#chomp
    puts "Found"
  end
end
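The effect of the trailing newline can be reproduced without any files. In this small sketch a StringIO stands in for the search-terms file (an assumption made only to keep the example self-contained), and the XML line is the one from the question.

```ruby
require 'stringio'

# Stand-in for the search-terms file; each_line keeps the trailing "\n".
terms = StringIO.new("C0004OptionNavArrowLeft.png\nC0003OptionNavArrowRight.png\n")
line  = "<rightArrowImage>/apps/mcaui/PAL/Arrows/C0003OptionNavArrowRight.png</rightArrowImage>"

raw_hits     = []
chomped_hits = []
terms.each_line do |term|
  raw_hits     << term.chomp if line.include?(term)       # term still ends in "\n"
  chomped_hits << term.chomp if line.include?(term.chomp) # newline removed first
end

p raw_hits     # => []
p chomped_hits # => ["C0003OptionNavArrowRight.png"]
```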

Search and replace multiple words in file via Ruby

Good afternoon!
I am pretty new to Ruby and want to code a basic search and replace function in Ruby.
When you call the function, you can pass parameters (search pattern, replacing word).
This works like this: multiedit(pattern1, replacement1, pattern2, replacement2, ...)
Now, I want my function to read a text file, search for pattern1 and replace it with replacement1, search for pattern2 and replace it with replacement2, and so on. Finally, the altered text should be written to another text file.
I've tried to do this with an until loop, but only the very first pattern is replaced while all the following patterns are ignored (in this example, only apple is replaced with fruit). I think the problem is that I always reread the original unaltered text, but I can't figure out a solution. Can you help me? Calling the function the way I do is important to me.
def multiedit(*_patterns)
  return puts "Number of search patterns does not match number of replacement strings!" if (_patterns.length % 2 > 0)
  f = File.open("1.txt", "r")
  g = File.open("2.txt", "w")
  i = 0
  until i >= _patterns.length do
    f.each_line {|line|
      output = line.sub(_patterns[i], _patterns[i+1])
      g.puts output
    }
    i+=2
  end
  f.close
  g.close
end
multiedit("apple", "fruit", "tomato", "veggie", "steak", "meat")
Can you help me out?
Thank you very much in advance!
Regards
Your loop was kind of inside-out ... do this instead ...
f.each_line do |line|
  _patterns.each_slice 2 do |a, b|
    line.sub! a, b
  end
  g.puts line
end
Perhaps the most efficient way to evaluate all the patterns for every line is to build a single regexp from all the search patterns and use the hash replacement form of String#gsub
def multiedit(*patterns)
  raise ArgumentError, "Number of search patterns does not match number of replacement strings!" if patterns.length.odd?
  # Pair up the arguments: { "apple" => "fruit", "tomato" => "veggie", ... }
  replacements = Hash[*patterns]
  # Escape each search string and join the alternatives into one regexp
  regexp = Regexp.new(replacements.keys.map { |k| Regexp.quote(k) }.join('|'))
  File.open("2.txt", "w") do |out|
    IO.foreach("1.txt") do |line|
      out.puts line.gsub(regexp, replacements)
    end
  end
end
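The hash form of String#gsub can be tried on an in-memory string first. The helper name replace_all below is hypothetical; it is only a sketch of the same idea without the file handling.

```ruby
# Hypothetical helper: applies pattern/replacement pairs to text in one pass.
def replace_all(text, *pairs)
  raise ArgumentError, 'expected pattern/replacement pairs' if pairs.length.odd?
  replacements = Hash[*pairs]
  # Regexp.union escapes the string keys and joins them with "|"
  regexp = Regexp.union(replacements.keys)
  # With a hash as second argument, gsub looks up each matched text in the hash
  text.gsub(regexp, replacements)
end

puts replace_all("apple, tomato and steak", "apple", "fruit", "tomato", "veggie", "steak", "meat")
# => fruit, veggie and meat
```

One difference from calling sub! once per pattern: the text is scanned only once, so a replacement string is never itself rewritten by a later pattern.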
An easier and better method is to use ERB.
http://apidock.com/ruby/ERB

Ruby Regex not matching

I'm writing a short class to extract email addresses from documents. Here is my code so far:
# Class to scrape documents for email addresses
class EmailScraper
  EmailRegex = /\A[\w+\-.]+@[a-z\d\-.]+\.[a-z]+\z/i
  def EmailScraper.scrape(doc)
    email_addresses = []
    File.open(doc) do |file|
      while line = file.gets
        temp = line.scan(EmailRegex)
        temp.each do |email_address|
          puts email_address
          emails_addresses << email_address
        end
      end
    end
    return email_addresses
  end
end
if EmailScraper.scrape("email_tests.txt").empty?
  puts "Empty array"
else
  puts EmailScraper.scrape("email_tests.txt")
end
My "email_tests.txt" file looks like so:
example@live.com
another_example90@hotmail.com
example3@diginet.ie
When I run this script, all I get is the "Empty array" printout. However, when I fire up irb and type in the regex above, strings of email addresses match it, and the String.scan function returns an array of all the email addresses in each string. Why is this working in irb and not in my script?
Several things (some already mentioned and expanded upon below):
\z matches to the end of the string, which with IO#gets will typically include a \n character. \Z (upper case 'z') matches the end of the string unless the string ends with a \n, in which case it matches just before.
the typo of emails_addresses
using \A and \Z is fine while the entire line is or is not an email address. You say you're seeking to extract addresses from documents, however, so I'd consider using \b at each end to extract emails delimited by word boundaries.
you could use File.foreach()... rather than the clumsy-looking File.open...while...gets thing
I'm not convinced by the Regex - there's a substantial body of work already around:
There's a smarter one here: http://www.regular-expressions.info/email.html (clicking on that odd little in-line icon takes you to a piece-by-piece explanation). It's worth reading the discussion, which points out several potential pitfalls.
Even more mind-bogglingly complex ones may be found here.
class EmailScraper
  EmailRegex = /\A[\w+\-.]+@[a-z\d\-.]+\.[a-z]+\Z/i # changed \z to \Z
  def EmailScraper.scrape(doc)
    email_addresses = []
    File.foreach(doc) do |line| # less code, same effect
      temp = line.scan(EmailRegex)
      temp.each do |email_address|
        email_addresses << email_address
      end
    end
    email_addresses # "return" isn't needed
  end
end
result = EmailScraper.scrape("email_tests.txt") # store it so we don't print them twice if successful
if result.empty?
  puts "Empty array"
else
  puts result
end
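The \b suggestion from the list above, for pulling addresses out of running text, might be sketched as follows. The constant name EMAIL_IN_TEXT and the method extract_emails are hypothetical, and the pattern is the simple one from the question with the anchors swapped, not a complete description of valid email syntax.

```ruby
# Illustrative pattern only: \b on both ends instead of \A and \Z, so an
# address can appear anywhere inside a line.
EMAIL_IN_TEXT = /\b[\w+\-.]+@[a-z\d\-.]+\.[a-z]+\b/i

# Hypothetical helper: returns every address-like substring found in text.
def extract_emails(text)
  text.scan(EMAIL_IN_TEXT)
end

p extract_emails("Contact example@live.com or another_example90@hotmail.com.\n")
# => ["example@live.com", "another_example90@hotmail.com"]
```

Because the pattern is no longer anchored to the whole string, the trailing newline and the sentence-ending period do not prevent a match.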
Looks like you're putting the results into emails_addresses, but are returning email_addresses. This would mean that you're always returning the empty array you defined for email_addresses, making the "Empty array" response correct.
You have a typo, try with:
class EmailScraper
  EmailRegex = /\A[\w+\-.]+@[a-z\d\-.]+\.[a-z]+\z/i
  def EmailScraper.scrape(doc)
    email_addresses = []
    File.open(doc) do |file|
      while line = file.gets
        temp = line.scan(EmailRegex)
        temp.each do |email_address|
          puts email_address
          email_addresses << email_address
        end
      end
    end
    return email_addresses
  end
end
if EmailScraper.scrape("email_tests.txt").empty?
  puts "Empty array"
else
  puts EmailScraper.scrape("email_tests.txt")
end
You ended the pattern with \z; try \Z instead. According to http://www.regular-expressions.info/ruby.html, \Z matches at the end of the string or just before a final line break, whereas \z matches only at the very end.
Otherwise, try ^ and $ (matching the start and the end of a line); this worked for me on Regexr.
When you read the file, the line terminator makes the regex fail. In irb, the string probably has no line terminator. If that is the case, chomp the lines first.
regex = /\A[\w+\-.]+@[a-z\d\-.]+\.[a-z]+\z/i
line_from_irb = "example@live.com"
line_from_file = line_from_irb + "\n" # a line read from a file keeps its newline
p line_from_irb.scan(regex) # => ["example@live.com"]
p line_from_file.scan(regex) # => []

Create regular expression from string

Is there any way to create the regex /func:\[sync\] displayPTS/ from string func:[sync] displayPTS?
The story behind this question is that I have several string patterns to search for in a text file and I don't want to write the same thing again and again.
File.open($f).readlines.reject {|l| not l =~ /"#{string1}"/}
File.open($f).readlines.reject {|l| not l =~ /"#{string2}"/}
Instead, I want to have a function to do the job:
def filter string
#build the reg pattern from string
File.open($f).readlines.reject {|l| not l =~ pattern}
end
filter string1
filter string2
s = "func:[sync] displayPTS"
# => "func:[sync] displayPTS"
r = Regexp.new(s)
# => /func:[sync] displayPTS/
r = Regexp.new(Regexp.escape(s))
# => /func:\[sync\]\ displayPTS/
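Applied to the filter function from the question, the escaped form might be used as in the sketch below. The signature is an assumption: unlike the question's version, this filter takes the lines as an argument instead of reading the global $f, which keeps the example self-contained.

```ruby
# Sketch of the question's filter: keeps only the lines that contain
# string literally. Taking lines as a parameter is an assumption.
def filter(lines, string)
  pattern = Regexp.new(Regexp.escape(string))
  lines.grep(pattern)
end

lines = ["func:[sync] displayPTS 100\n", "func:s displayPTS\n"]
p filter(lines, "func:[sync] displayPTS")
# => ["func:[sync] displayPTS 100\n"]
```

Without Regexp.escape, [sync] would be read as a character class and the second line would match as well. To filter a file, File.readlines(path) can supply the lines.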
I like Bob's answer, but just to save the time on your keyboard:
string = 'func:\[sync] displayPTS'
/#{string}/
If the strings are just strings, you can combine them into one regular expression, like so:
targets = [
  "string1",
  "string2",
].collect do |s|
  Regexp.escape(s)
end.join('|')
target = Regexp.new(targets)
And then:
lines = File.readlines('/tmp/bar').reject do |line|
  line !~ target
end
s !~ regexp is equivalent to not s =~ regexp, but easier to read.
Avoid using File.open without closing the file. The file will remain open until the discarded file object is garbage collected, which could be long enough that your program will run out of file handles. If you need to do more than just read the lines, then:
File.open(path) do |file|
# do stuff with file
end
Ruby will close the file at the end of the block.
You might also consider whether using find_all and a positive match would be easier to read than reject and a negative match. The fewer negatives the reader's mind has to go through, the clearer the code:
lines = File.readlines('/tmp/bar').find_all do |line|
  line =~ target
end
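The escape-and-join step and the positive match can be combined more compactly. This is a sketch under the assumption that the targets are plain strings; the method name lines_matching is hypothetical.

```ruby
# Hypothetical helper: returns the lines of the file at path that contain
# any of the given strings, matched literally.
def lines_matching(path, strings)
  # Regexp.union escapes string arguments and joins them with "|"
  target = Regexp.union(strings)
  # File.foreach without a block returns an enumerator and closes the file
  # once iteration finishes; grep keeps the lines that match target
  File.foreach(path).grep(target)
end

# Possible usage with the path from the answer:
# lines = lines_matching('/tmp/bar', ["string1", "string2"])
```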
How about using %r{}:
my_regex = "func:[sync] displayPTS"
File.open($f).readlines.reject { |l| not l =~ %r{#{Regexp.escape(my_regex)}} } # escape so [sync] is matched literally
