Good afternoon!
I am pretty new to Ruby and want to code a basic search and replace function in Ruby.
When you call the function, you can pass parameters (search pattern, replacing word).
This works like this: multiedit(pattern1, replacement1, pattern2, replacement2, ...)
Now, I want my function to read a text file, search for pattern1 and replace it with replacement1, search for pattern2 and replace it with replacement2 and so on. Finally, the altered text should be written to another text file.
I've tried to do this with an until loop, but only the very first pattern is replaced while all the following patterns are ignored (in this example, only apple is replaced with fruit). I think the problem is that I always reread the original unaltered text, but I can't figure out a solution. Can you help me? Calling the function the way I am doing it is important for me.
def multiedit(*_patterns)
return puts "Number of search patterns does not match number of replacement strings!" if (_patterns.length % 2 > 0)
f = File.open("1.txt", "r")
g = File.open("2.txt", "w")
i = 0
until i >= _patterns.length do
f.each_line {|line|
output = line.sub(_patterns[i], _patterns[i+1])
g.puts output
}
i+=2
end
f.close
g.close
end
multiedit("apple", "fruit", "tomato", "veggie", "steak", "meat")
Can you help me out?
Thank you very much in advance!
Regards
Your loop was kind of inside-out ... do this instead ...
f.each_line do |line|
_patterns.each_slice 2 do |a, b|
line.sub! a, b
end
g.puts line
end
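As a minimal sketch of that loop order, here is a version that works on an array of lines instead of files, so it can be run without 1.txt and 2.txt. The name multiedit_lines is made up for this illustration and is not part of the original code.

```ruby
# Hypothetical helper: for each line, apply every search/replacement pair in turn.
def multiedit_lines(lines, *patterns)
  raise ArgumentError, "patterns must come in search/replacement pairs" if patterns.length.odd?

  lines.map do |line|
    edited = line.dup
    # each_slice(2) yields [search, replacement] pairs in argument order.
    patterns.each_slice(2) { |search, replacement| edited.sub!(search, replacement) }
    edited
  end
end

puts multiedit_lines(["apple and tomato\n", "steak\n"],
                     "apple", "fruit", "tomato", "veggie", "steak", "meat")
```

Note that sub! only replaces the first occurrence of each pattern on a line; gsub! would replace all of them.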
Perhaps the most efficient way to evaluate all the patterns for every line is to build a single regexp from all the search patterns and use the hash replacement form of String#gsub:
def multiedit(*patterns)
  raise ArgumentError, "Number of search patterns does not match number of replacement strings!" if patterns.length.odd?
  # Pair up the arguments: { search => replacement, ... }
  replacements = Hash[*patterns]
  # One alternation of all search strings, each escaped so it matches literally
  regexp = Regexp.new(replacements.keys.map { |k| Regexp.quote(k) }.join('|'))
  File.open("2.txt", "w") do |out|
    IO.foreach("1.txt") do |line|
      out.puts line.gsub(regexp, replacements)
    end
  end
end
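To see the hash form of String#gsub on its own, here is a small sketch that works on a single string rather than on files. The helper name multireplace is an assumption for this example, and Regexp.union is used in place of the manual quote-and-join step (it escapes plain strings the same way).

```ruby
# Hypothetical helper: apply all search/replacement pairs to one string in a single pass.
def multireplace(text, *pairs)
  raise ArgumentError, "expected search/replacement pairs" if pairs.length.odd?

  replacements = Hash[*pairs]
  # Regexp.union escapes each string and joins them with "|".
  regexp = Regexp.union(replacements.keys)
  # With a hash as second argument, gsub looks up each matched text in the hash.
  text.gsub(regexp, replacements)
end

result = multireplace("apple, tomato and steak",
                      "apple", "fruit", "tomato", "veggie", "steak", "meat")
puts result
```

Because the text is scanned only once, a replacement is never re-examined by a later pattern, which differs from applying sub! repeatedly.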
An easier and better method is to use ERB.
http://apidock.com/ruby/ERB
I'm doing a rather chaotic experiment with a goofy Markov Chain twitter bot. The current version of the bot opens a CSV file of my tweet archive, strips out things like links and whatnot and leaves only plain text. Works like a charm. Love it!
PATH_TO_TWEETS_CSV = 'tweets.csv'
PATH_TO_TWEETS_CLEAN = 'liber_markov.txt'
csv_text = CSV.parse(File.read(PATH_TO_TWEETS_CSV))
File.open(PATH_TO_TWEETS_CLEAN, 'w') do |file|
csv_text.reverse.each do |row|
tweet_text = row[5].gsub(/(?:f|ht)tps?:\/[^\s]+/, '').gsub(/\n/,' ')
file.write("#{tweet_text}\n")
end
end
However.
I'd like to take an insane step forward and sift through the file a second time, stripping out all but every fourth word, effectively removing 75% of the content. Is there a regex that can handle that?
I don't know about a regex solution specifically, but you could do this:
File.open(PATH_TO_TWEETS_CLEAN, 'w') do |file|
csv_text.reverse.each do |row|
clean_text = row[5].gsub(/(?:f|ht)tps?:\/[^\s]+/, '').gsub(/\n/,' ')
tweet_text = clean_text.split.select.with_index { |_, i| i % 4 == 0 }.join(' ')
file.write("#{tweet_text}\n")
end
end
I'd probably do it using each_slice:
File.open(PATH_TO_TWEETS_CLEAN, 'w') do |file|
csv_text.reverse.each do |row|
tweet_text = row[5].gsub(/(?:f|ht)tps?:\/[^\s]+/, '').gsub(/\n/,' ')
tweet_text = tweet_text.split.each_slice(4).map(&:first).join(' ')
file.write("#{tweet_text}\n")
end
end
The accepted answer is fine, but since you asked about regular expressions, I thought I'd show you how it can be done. Here's a Regexp to start with:
/((\S+\s+){3})\S+\s*/
I've chosen to take "word" to mean any sequence of non-whitespace characters. This matches any word (\S+) followed by one or more whitespace characters (\s+), three times, followed by any word and zero or more whitespace characters (zero so it can match the last word in the string). Here's how you would use it:
tweet_text = "I'm doing a rather chaotic experiment with a goofy Markov Chain twitter bot."
tweet_text.gsub(/((\S+\s+){3})\S+\s*/, '\1')
# => I'm doing a chaotic experiment with goofy Markov Chain bot.
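Note that the expression above drops every fourth word and keeps the other three. If the goal is the one from the question (keep only one word in four), a regexp can do that too. The following is a sketch under the same definition of "word"; the method name every_fourth_word is invented for this example.

```ruby
# Keep the first word of every group of four and drop the three that follow it.
# "Word" means any run of non-whitespace characters.
def every_fourth_word(text)
  # (\S+) captures the word to keep; (?:\s+\S+){0,3} consumes up to three
  # following words so they are skipped by the next scan step.
  text.scan(/(\S+)(?:\s+\S+){0,3}/).flatten.join(' ')
end

sample = "I'm doing a rather chaotic experiment with a goofy Markov Chain twitter bot."
puts every_fourth_word(sample)
# => I'm chaotic goofy bot.
```

This gives the same words as split.each_slice(4).map(&:first), so the each_slice version is probably still the more readable choice.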
I have a script that telnets into a box, runs a command, and saves the output. I run another script after that which parses through the output file, comparing it to key words that are located in another file for matching. If a line is matched, it should save the entire line (from the original telnet-output) to a new file.
Here is the portion of the script that deals with parsing text:
def parse_file
filter = []
temp_file = File.open('C:\Ruby193\scripts\PARSED_TRIAL.txt', 'a+')
t = File.open('C:\Ruby193\scripts\TRIAL_output_log.txt')
filter = File.open('C:\Ruby193\scripts\Filtered_text.txt').readlines
t.each do |line|
filter.each do |segment|
if (line =~ /#{segment}/)
temp_file.puts line
end
end
end
t.close()
temp_file.close()
end
Currently, it only matches the last string in the filter array and saves that match to temp_file. It looks like the loop does not run through all the strings in the array, or does not save them all. I have five strings in the text file Filtered_text.txt, but only the line matching the last one is written to temp_file.
This (untested code) will duplicate the original code, only more succinctly and idiomatically:
filter = Regexp.union(File.open('C:\Ruby193\scripts\Filtered_text.txt').readlines.map(&:chomp))
File.open('C:\Ruby193\scripts\PARSED_TRIAL.txt', 'a+') do |temp_file|
File.foreach('C:\Ruby193\scripts\TRIAL_output_log.txt') do |l|
temp_file.puts l if (l[filter])
end
end
To give you an idea what is happening:
Regexp.union(%w[a b c])
=> /a|b|c/
This gives you a regular expression that'll walk through the string looking for any substring matches. It's a case-sensitive search.
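If you want to close those holes, so that only whole words match regardless of case, use something like:
Regexp.new(
  '\b(?:' + Regexp.union(
    File.readlines('C:\Ruby193\scripts\Filtered_text.txt').map(&:chomp)
  ).source + ')\b',
  Regexp::IGNORECASE
)
which, using the same sample input array as above, would result in:
/\b(?:a|b|c)\b/i
The non-capturing group matters: without it the \b anchors would bind only to the first and last alternatives.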
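A likely reason the original loop only matched the last keyword is that readlines keeps the trailing newline on every entry, so each interpolated pattern also requires a newline right after the keyword. The sketch below simulates that with made-up keywords and a made-up log line; none of these values come from the original files.

```ruby
# Simulate readlines on a keyword file: every entry except possibly the
# last one still ends in "\n". (Sample data, assumed for illustration.)
raw_keywords = ["error\n", "timeout\n", "refused"]
line = "connection timeout on port 23\n"

# Interpolating the raw keyword keeps the newline inside the pattern,
# so /timeout\n/ only matches when the keyword is the last thing on the line.
raw_hits = raw_keywords.select { |k| line =~ /#{k}/ }

# Chomping first and building one escaped alternation matches anywhere.
filter = Regexp.union(raw_keywords.map(&:chomp))

puts raw_hits.inspect
puts line[filter].inspect
```

With the newline still attached, none of the keywords match this line, while the chomped union finds "timeout".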
Please forgive my ignorance, I am new to Ruby.
I know how to search a string, or even a single file with a regular expression:
str = File.read('example.txt')
match = str.scan(/[0-9A-Za-z]{8,8}/)
puts match[1]
I know how to search for a static phrase in multiple files and directories
pattern = "hello"
Dir.glob('/home/bob/**/*').each do |file|
next unless File.file?(file)
File.open(file) do |f|
f.each_line do |line|
puts "#{pattern}" if line.include?(pattern)
end
end
end
I can not figure out how to use my regexp against multiple files and directories. Any and all help is much appreciated.
Well, you're quite close. First make pattern a Regexp object:
pattern = /hello/
Or if you are trying to make a Regexp from a String (like passed in on the command line), you might try:
pattern = Regexp.new("hello")
# or use first argument for regexp
pattern = Regexp.new(ARGV[0])
Now when you are searching, line is a String. You can use match or scan to get the results of it matching against your pattern.
f.each_line do |line|
if line.match(pattern)
puts $&
end
# or
if !(match_data = line.match(pattern)).nil?
puts match_data[0]
end
# or to see multiple matches
unless (matches = line.scan(pattern)).empty?
p matches
end
end
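Putting the pieces together, a directory walk with a regexp might look like the sketch below. The helper name scan_tree and the sample files are assumptions made for this example; it also assumes the files are text, since binary files can raise encoding errors when scanned.

```ruby
require 'tmpdir'

# Hypothetical helper: return [path, match] pairs for every regexp match
# found in the regular files below root.
def scan_tree(root, pattern)
  results = []
  Dir.glob(File.join(root, '**', '*')).sort.each do |path|
    next unless File.file?(path)

    File.foreach(path) do |line|
      # With a block, scan yields each match instead of building an array.
      line.scan(pattern) { |m| results << [path, m] }
    end
  end
  results
end

# Demo on a throwaway directory so the sketch runs anywhere.
Dir.mktmpdir do |dir|
  File.write(File.join(dir, 'a.txt'), "id ABCD1234 ok\nnothing here\n")
  Dir.mkdir(File.join(dir, 'sub'))
  File.write(File.join(dir, 'sub', 'b.txt'), "x 12345678 y\n")

  scan_tree(dir, /\b[0-9A-Za-z]{8}\b/).each do |path, m|
    puts "#{File.basename(path)}: #{m}"
  end
end
```

If the pattern contains capture groups, scan yields arrays of captures rather than the whole match.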
Is there any way to create the regex /func:\[sync\] displayPTS/ from string func:[sync] displayPTS?
The story behind this question is that I have several string patterns to search for in a text file and I don't want to write the same thing again and again.
File.open($f).readlines.reject {|l| not l =~ /"#{string1}"/}
File.open($f).readlines.reject {|l| not l =~ /"#{string2}"/}
Instead, I want to have a function to do the job:
def filter string
#build the reg pattern from string
File.open($f).readlines.reject {|l| not l =~ pattern}
end
filter string1
filter string2
s = "func:[sync] displayPTS"
# => "func:[sync] displayPTS"
r = Regexp.new(s)
# => /func:[sync] displayPTS/
r = Regexp.new(Regexp.escape(s))
# => /func:\[sync\]\ displayPTS/
I like Bob's answer, but just to save the time on your keyboard:
string = 'func:\[sync] displayPTS'
/#{string}/
If the strings are just strings, you can combine them into one regular expression, like so:
targets = [
"string1",
"string2",
].collect do |s|
Regexp.escape(s)
end.join('|')
target = Regexp.new(targets)
And then:
lines = File.readlines('/tmp/bar').reject do |line|
line !~ target
end
s !~ regexp is equivalent to not s =~ regexp, but easier to read.
Avoid using File.open without closing the file. The file will remain open until the discarded file object is garbage collected, which could be long enough that your program will run out of file handles. If you need to do more than just read the lines, then:
File.open(path) do |file|
# do stuff with file
end
Ruby will close the file at the end of the block.
You might also consider whether using find_all and a positive match would be easier to read than reject and a negative match. The fewer negatives the reader's mind has to go through, the clearer the code:
lines = File.readlines('/tmp/bar').find_all do |line|
line =~ target
end
How about using %r{}:
my_regex = "func:[sync] displayPTS"
File.open($f).readlines.reject { |l| not l =~ %r{#{Regexp.escape(my_regex)}} }
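Combining the suggestions in this thread, the filter function from the question could be sketched as follows. The name filter_lines and its path argument are assumptions (the question used a global $f), and Enumerable#grep stands in for the reject/find_all variants.

```ruby
require 'tempfile'

# Hypothetical helper: return the lines of the file at path that contain
# the literal text needle.
def filter_lines(path, needle)
  # Escape the string so characters like [ and ] are matched literally.
  pattern = Regexp.new(Regexp.escape(needle))
  # File.readlines opens and closes the file itself; grep keeps matching lines.
  File.readlines(path).grep(pattern)
end

Tempfile.create('log') do |f|
  f.write("func:[sync] displayPTS 42\nfunc:s displayPTS\nother line\n")
  f.flush
  puts filter_lines(f.path, "func:[sync] displayPTS")
end
```

Without the escape step, [sync] would be read as a character class, and the second sample line would match as well.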
What is the best way to validate a gets input against a very long word list (a list of all the English words available)?
I am currently playing with readlines to manipulate the text, but before there's any manipulation, I would like to first validate the entry against the list.
The simplest way, but by no means the fastest, is to simply search against the word list each time. If the word list is in an array:
if word_list.index word
#manipulate word
end
If, however, you had the word list as a separate file (with each word on a separate line), then we'll use File.foreach to find it:
if File.foreach("word.list") {|x| break x if x.chomp == word}
#manipulate word
end
Note that foreach does not strip off the trailing newline character(s), so we get rid of them with String#chomp.
Here's a simple example using a Set, though Mark Johnson is right,
a bloom filter would be more efficient.
require 'set'
WORD_RE = /\w+/
# Read in the default dictionary (from /usr/share/dict/words),
# and put all the words into a set
WORDS = Set.new(File.read('/usr/share/dict/words').scan(WORD_RE))
# read the input line by line
STDIN.each_line do |line|
# find all the words in the line that aren't contained in our dictionary
unrecognized = line.scan(WORD_RE).find_all { |term| not WORDS.include? term }
# if none were found, the line is valid
if unrecognized.empty?
puts "line is valid"
else # otherwise, the line contains some words not in our dictionary
puts "line is invalid, could not recognize #{unrecognized.inspect}"
end
end
Are you reading the list from a file?
Can't you keep it all in memory?
Maybe a finger tree would help you.
If not, there isn't much more you can do than read a chunk of data from the file and grep through it.
Read the word list into memory, and for each word, make an entry into a hash table:
def init_word_tester
  @words = {}
  File.foreach("word.list") {|word|
    @words[word.chomp] = 1
  }
end
Now you can just check every word against your hash:
def test_word word
  return @words[word]
end