Search for word in text file and display (Ruby) - ruby

I'm trying to take input from the user, search through a text file (case insensitively), and then display the match from the file if it matches (with the case of the word in the file). I don't know how to get the word from the file, here's my code:
found = 0
words = []
puts "Please enter a word to add to text file"
input = gets.chomp
#Open text file in read mode
File.open("filename", "r+") do |f|
f.each do |line|
if line.match(/\b#{input}\b/i)
puts "#{input} was found in the file." # <--- I want to show the matched word here
#Set to 1 indicating word was found
found = 1
end
end
end

So, what you want to do is to store the result of the match method, you can then get the actual matched word out of that, ie.
if m = line.match( /\b#{input}\b/i )
puts "#{m[0]} was found in the file."
# ... etc.
end
Update
Btw, you didn't ask - but I would use scan in this case, so that I got an array of the matched words on each line (for when there's more than one match on the same line), something like this:
if m = line.scan( /\b#{input}\b/i )
puts "Matches found on line #{f.lineno}: #{m.join(', ')}"
# ... etc.
end

If you don't need to report the locations of the matches and the file is not overly large, you could just do this:
File.read("testfile").scan /\b#{input}\b/i
Let's try it:
text = <<THE_END
Now is the time for all good people
to come to the aid of their local
grocers, because grocers are important
to our well-being. One more thing.
Grocers sell ice cream, and boy do I
love ice cream.
THE_END
input = "grocers"
F_NAME = "text"
File.write(F_NAME, text)
File.read(F_NAME).scan /\b#{input}\b/i
# => ["grocers", "grocers", "Grocers"]
File.read(F_NAME) returns the entire text file in a single string. scan /\b#{input}\b/i is sent to that string.

Related

Deleting lines containing specific words in text file in Ruby

I have a text file like this:
User accounts for \\AGGREP-1
-------------------------------------------------------------------------------
Administrator users grzesieklocal
Guest scom SUPPORT_8855
The command completed successfully.
First line is empty line. I want to delete every empty lines in this file and every line containing words "User accounts for", "-------", "The command". I want to have only lines containing users. I don't want to delete only first 4 and the last one lines, because it can be more users in some systems and file will contain more lines.
I load file using
a = IO.readlines("test.txt")
Is any way to delete lines containing specific words?
Solution
This structure reads the file line by line, and write a new file directly :
def unwanted?(line)
line.strip.empty? ||
line.include?('User accounts') ||
line.include?('-------------') ||
line.include?('The command completed')
end
File.open('just_users.txt', 'w+') do |out|
File.foreach('test.txt') do |line|
out.puts line unless unwanted?(line)
end
end
If you're familiar with regexp, you could use :
def unwanted?(line)
line =~ /^(User accounts|------------|The command completed|\s*$)/
end
Warning from your code
The message warning: string literal in condition appears when you try to use :
string = "nothing"
if string.include? "a" or "b"
puts "FOUND!"
end
It outputs :
parse_text.rb:16: warning: string literal in condition
FOUND!
Because it should be written :
string = 'nothing'
if string.include?('a') || string.include?('b')
puts "FOUND!"
end
See this question for more info.
IO::readlines returns an array, so you could use Array#select to select just the lines you need. Bear in mind that this means that your whole input file will be in memory, which might be a problem, if the file is really large.
An alternative approach would be to use IO::foreach, which processes one line at a time:
selected_lines = []
IO.foreach('test.txt') { |line| selected_lines << line if line_matches_your_requirements }

Extract names from File using Ruby and Grep

I have a file with the following data:
other data
user1=name1
user2=name2
user3=name3
other data
to extract the names I do the following
names = File.open('resource.cfg', 'r') do |f|
f.grep(/[a-z][a-z][0-9]/)
end
which returns the following array
user1=name1
user2=name2
user3=name3
but I really want only the name part
name1
name2
name3
Right now I'm doing this after the file step:
names = names.map do |name|
name[7..9]
end
is there a better way to do? with the file step
You could do it like this, using String#scan with a regex:
Code
File.read(FNAME).scan(/(?<==)[A-Za-z]+\d+$/)
Explanation
Let's start by constructing a file:
FNAME = "my_file"
lines =<<_
other data
user1=name1
user2=name2
user3=name3
other data
_
File.write(FNAME,lines)
We can confirm the file contents:
puts File.read(FNAME)
other data
user1=name1
user2=name2
user3=name3
other data
Now run the code::
File.read(FNAME).scan(/(?<==)[A-Za-z]+\d+$/)
#=> ["name1", "name2", "name3"]
A word about the regex I used.
(?<=...)
is called a "positive lookbehind". Whatever is inserted in place of the dots must immediately precede the match, but is not part of the match (and for that reason is sometimes referred to as as "zero-length" group). We want the match to follow an equals sign, so the "positive lookbehind" is as follows:
(?<==)
This is followed by one or more letters, then one or more digits, then an end-of-line, which comprise the pattern to be matched. You could of course change this if you have different requirements, such as names being lowercase or beginning with a capital letter, a specified number of digits, and so on.
Is your code working as you have posted it?
names = File.open('resource.cfg', 'r') { |f| f.grep(/[a-z][a-z][0-9]/) }
names = names.map { |name| name[7..9] }
=> ["ame", "ame", "ame"]
You could make it into a neat little one-liner by writing it as such:
names = File.readlines('resource.cfg').grep(/=(\w*)/) { |x| x.split('=')[1].chomp }
You can do it all in a single step:
names = File.open('resource.cfg', 'r') do |f|
f.grep(/[a-z][a-z][0-9]/).map {|x| x.split('=')[1]}
end

How do I create an array from a txt file converted by a HTML file (Ruby)?

I'm trying to complete the first task to our assignment:
Get 5 regular emails and 5 advance-­‐fee fraud emails (aka spam). Convert them all into text files and then turn each into an array of words (split may help here). Then use a bunch of regular expressions to search the array of words looking for keywords to classify which files are spam or not. If you want to get fancy you could give each array a spam-­‐score out of 10.
Open HTML page and read file.
Strip script, links etc from file.
Have body/para on its own.
Open text file (file2) & write to it (UTF-8).
Pass content from HTML document (file 1).
Now put the words from text file (file2) into an array and later split.
Go through array finding any words that are considered spam and print message to screen stating if the email is a spam or not.
Here is my code:
require 'nokogiri'
file = File.open("EMAILS/REG/Membership.htm", "r")
doc = Nokogiri::HTML(file)
#What ever is passed from elements to the newFile is being put into the new array however the euro sign doesn't appear correctly
elements = doc.xpath("/html/body//p").text
#puts elements
newFile = File.open("test1.txt", "w")
newFile.write(elements)
newFile.close()
#I want to open the file again and print the lines to the screen
#
array_of_words = {}
puts "\n\tRetrieving test1.txt...\n\n"
File.open("test1.txt", "r:UTF-8").each_line do |line|
words = line.split(' ')
words.each do |word|
puts "#{word}"
#array_of_words[word] = gets.chomp.split(' ')
end
end
EDITED: Here I've edited the file, however, I'm unable to retrieve the UTF-8 encoding of the euro sign in the array (see the image).
require 'nokogiri'
doc = Nokogiri::HTML(File.open("EMAILS/REG/Membership.htm", "r:UTF-8"))
#What ever is passed from elements to the newFile is being put into the new
#array however the euro sign doesn't appear correctly
elements = doc.xpath("//p").text
#puts elements
File.write("test1.txt", elements)
puts "\n\tRetrieving test1.txt...\n\n"
#I want to open the file again and print the lines to the screen
#
word_array = Array.new
File.read("test1.txt").each_line do |line|
line.split(' ').each do |word|
puts "#{word}"
word_array << word
end
end
Because this is an assignment, I'm not going to try to answer how you're supposed to do this; You're supposed to figure it out on your own.
What I will do is show you how you should have written what you've already done, and point you in a direction:
require 'nokogiri'
doc = Nokogiri::HTML(File.read("EMAILS/REG/Membership.htm"))
# What ever is passed from elements to the newFile is being put into the new
# array however the euro sign doesn't appear correctly
elements = doc.xpath("//p").text
File.write("test1.txt", elements)
print "\n\tRetrieving test1.txt...\n\n"
# I want to open the file again and print the lines to the screen
word_hash = {}
File.open("test1.txt", "r:UTF-8").each_line do |line|
line.split(' ').each do |word|
puts "#{word}"
#word_hash[word] = gets.chomp.split(' ')
end
end
Many of Ruby's IO methods, and File's by inheritance, can take advantage of blocks, which automatically close the stream when the block exits. Use that capability as leaving files open throughout the run-time of an app is not good.
array_of_words = {} doesn't define an array, it's a hash.
#array_of_words[word] = gets.chomp.split(' ') wouldn't work because of where gets wants to read from. By default it's STDIN, which would be the console, meaning the keyboard. You've already got word at that point so do something with it.
But think, you're basically creating the basis for a Bayesian Filter. You need to be counting the number of occurrences of words, so merely assigning the word to the hash won't get you what you want to know, you need to know how many times a particular word was seen. Stack Overflow has a lot of questions answered about how to count the number of words found in a string, so search for those.
You're making things harder for yourself. You already have the paragraph text in elements so there's no need to read test1.txt after writing to it. Then use String#split without arguments to split on all whitespace.

Compare Arrays for matching string

I have a script that telnets into a box, runs a command, and saves the output. I run another script after that which parses through the output file, comparing it to key words that are located in another file for matching. If a line is matched, it should save the entire line (from the original telnet-output) to a new file.
Here is the portion of the script that deals with parsing text:
def parse_file
filter = []
temp_file = File.open('C:\Ruby193\scripts\PARSED_TRIAL.txt', 'a+')
t = File.open('C:\Ruby193\scripts\TRIAL_output_log.txt')
filter = File.open('C:\Ruby193\scripts\Filtered_text.txt').readlines
t.each do |line|
filter.each do |segment|
if (line =~ /#{segment}/)
temp_file.puts line
end
end
end
t.close()
temp_file.close()
end
Currently, it is only saving the last run string located in array filter and saving that to temp_file. It looks like the loop does not run all the strings in the array, or does not save them all. I have five strings placed inside the text file Filtered_text.txt. It only prints my last matched line into temp_file.
This (untested code) will duplicate the original code, only more succinctly and idiomatically:
filter = Regexp.union(File.open('C:\Ruby193\scripts\Filtered_text.txt').readlines.map(&:chomp))
File.open('C:\Ruby193\scripts\PARSED_TRIAL.txt', 'a+') do |temp_file|
File.foreach('C:\Ruby193\scripts\TRIAL_output_log.txt') do |l|
temp_file.puts l if (l[filter])
end
end
To give you an idea what is happening:
Regexp.union(%w[a b c])
=> /a|b|c/
This gives you a regular expression that'll walk through the string looking for any substring matches. It's a case-sensitive search.
If you want to close those holes, use something like:
Regexp.new(
'\b' + Regexp.union(
File.open('C:\Ruby193\scripts\Filtered_text.txt').readlines.map(&:chomp)
).source + '\b',
Regexp::IGNORECASE
)
which, using the same sample input array as above would result in:
/\ba|b|c\b/i

Simplest way to validate an input against a file of words?

What is the best way to validate a gets input against a very long word list (a list of all the English words available)?
I am currently playing with readlines to manipulate the text, but before there's any manipulation, I would like to first validate the entry against the list.
The simplest way, but by no means the fastest, is to simply search against the word list each time. If the word list is in an array:
if word_list.index word
#manipulate word
end
If, however, you had the word list as a separate file (with each word on a separate line), then we'll use File#foreach to find it:
if File.foreach("word.list") {|x| break x if x.chomp == word}
#manipulate word
end
Note that foreach does not strip off the trailing newline character(s), so we get rid of them with String#chomp.
Here's a simple example using a Set, though Mark Johnson is right,
a bloom filter would be more efficient.
require 'set'
WORD_RE = /\w+/
# Read in the default dictionary (from /usr/share/dict/words),
# and put all the words into a set
WORDS = Set.new(File.read('/usr/share/dict/words').scan(WORD_RE))
# read the input line by line
STDIN.each_line do |line|
# find all the words in the line that aren't contained in our dictionary
unrecognized = line.scan(WORD_RE).find_all { |term| not WORDS.include? term }
# if none were found, the line is valid
if unrecognized.empty?
puts "line is valid"
else # otherwise, the line contains some words not in our dictionary
puts "line is invalid, could not recognize #{unrecognized.inspect}"
end
end
are you reading the list from a file?
can't you have it all in memory?
maybe a finger tree may help you
if not, there's not more than "read a chunk of data from the file and grep into"
Read the word list into memory, and for each word, make an entry into a hash table:
def init_word_tester
#words = {}
File.foreach("word.list") {|word|
#words[word.chomp] = 1
}
end
now you can just check every word against your hash:
def test_word word
return #words[word]
end

Resources