Ruby Regex not matching - ruby

I'm writing a short class to extract email addresses from documents. Here is my code so far:
# Class to scrape documents for email addresses
class EmailScraper
EmailRegex = /\A[\w+\-.]+#[a-z\d\-.]+\.[a-z]+\z/i
def EmailScraper.scrape(doc)
email_addresses = []
File.open(doc) do |file|
while line = file.gets
temp = line.scan(EmailRegex)
temp.each do |email_address|
puts email_address
emails_addresses << email_address
end
end
end
return email_addresses
end
end
if EmailScraper.scrape("email_tests.txt").empty?
puts "Empty array"
else
puts EmailScraper.scrape("email_tests.txt")
end
My "email_tests.txt" file looks like so:
example#live.com
another_example90#hotmail.com
example3#diginet.ie
When I run this script, all I get is the "Empty array" printout. However, when I fire up irb and type in the regex above, strings of email addresses match it, and the String.scan function returns an array of all the email addresses in each string. Why is this working in irb and not in my script?

Several things (some already mentioned and expanded upon below):
\z matches to the end of the string, which with IO#gets will typically include a \n character. \Z (upper case 'z') matches the end of the string unless the string ends with a \n, in which case it matches just before.
the typo of emails_addresses
using \A and \Z is fine as long as each line is either exactly one email address or contains none. You say you're seeking to extract addresses from documents, however, so I'd consider using \b at each end to extract emails delimited by word boundaries.
you could use File.foreach()... rather than the clumsy-looking File.open...while...gets thing
I'm not convinced by the Regex - there's a substantial body of work already around:
There's a smarter one here: http://www.regular-expressions.info/email.html (clicking on that odd little in-line icon takes you to a piece-by-piece explanation). It's worth reading the discussion, which points out several potential pitfalls.
Even more mind-bogglingly complex ones may be found here.
class EmailScraper
EmailRegex = /\A[\w+\-.]+#[a-z\d\-.]+\.[a-z]+\Z/i # changed \z to \Z
def EmailScraper.scrape(doc)
email_addresses = []
File.foreach(doc) do |line| # less code, same effect
temp = line.scan(EmailRegex)
temp.each do |email_address|
email_addresses << email_address
end
end
email_addresses # "return" isn't needed
end
end
result = EmailScraper.scrape("email_tests.txt") # store it so we don't print them twice if successful
if result.empty?
puts "Empty array"
else
puts result
end
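The difference between \z, \Z and \b can be checked directly. The following is a minimal sketch that uses the #-separated sample addresses exactly as they appear in the question; the constant names and the sample sentence are made up for illustration.

```ruby
# A line as IO#gets or File.foreach would return it: trailing newline included.
line = "example#live.com\n"

STRICT   = /\A[\w+\-.]+#[a-z\d\-.]+\.[a-z]+\z/i  # \z: very end of the string only
LENIENT  = /\A[\w+\-.]+#[a-z\d\-.]+\.[a-z]+\Z/i  # \Z: also just before a final "\n"
EMBEDDED = /\b[\w+\-.]+#[a-z\d\-.]+\.[a-z]+\b/i  # \b: anywhere between word boundaries

strict_hits  = line.scan(STRICT)   # the trailing "\n" keeps \z from matching
lenient_hits = line.scan(LENIENT)  # \Z tolerates the trailing "\n"

# Word boundaries let addresses be pulled out of running text.
sentence = "Write to example#live.com or another_example90#hotmail.com today.\n"
embedded_hits = sentence.scan(EMBEDDED)

p strict_hits
p lenient_hits
p embedded_hits
```

Only the \b version finds addresses that share a line with other text, which is usually what "extracting from documents" needs.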

Looks like you're putting the results into emails_addresses, but are returning email_addresses. This would mean that you're always returning the empty array you defined for email_addresses, making the "Empty array" response correct.

You have a typo, try with:
class EmailScraper
EmailRegex = /\A[\w+\-.]+#[a-z\d\-.]+\.[a-z]+\z/i
def EmailScraper.scrape(doc)
email_addresses = []
File.open(doc) do |file|
while line = file.gets
temp = line.scan(EmailRegex)
temp.each do |email_address|
puts email_address
email_addresses << email_address
end
end
end
return email_addresses
end
end
if EmailScraper.scrape("email_tests.txt").empty?
puts "Empty array"
else
puts EmailScraper.scrape("email_tests.txt")
end

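You used \z at the end; try \Z instead (see http://www.regular-expressions.info/ruby.html). Lowercase \z matches only at the very end of the string, while uppercase \Z also matches just before a trailing newline, which is what a line read from a file ends with.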
Otherwise try to use ^ and $ (matching the start and the end of a row) this worked for me here on Regexr

When you read the file, the end of line is making the regex fail. In irb, there probably is no end of line. If that is the case, chomp the lines first.
regex=/\A[\w+\-.]+#[a-z\d\-.]+\.[a-z]+\z/i
line_from_irb = "example#live.com"
line_from_file = line_from_irb + "\n"
p line_from_irb.scan(regex) # => ["example#live.com"]
p line_from_file.scan(regex) # => []

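The chomp suggestion can be sketched end to end as follows. This assumes each address sits on its own line, keeps the question's \A...\z pattern and #-separated sample data, and uses a temporary file as a stand-in for email_tests.txt; the names WHOLE_LINE_EMAIL and scrape_lines are hypothetical.

```ruby
require "tempfile"

# Same pattern as in the question, still anchored with \A and \z.
WHOLE_LINE_EMAIL = /\A[\w+\-.]+#[a-z\d\-.]+\.[a-z]+\z/i

# Collects every line of the file at +path+ that is exactly one address.
# chomp strips the trailing newline so that \z can match.
def scrape_lines(path)
  File.foreach(path).flat_map { |line| line.chomp.scan(WHOLE_LINE_EMAIL) }
end

# Stand-in for email_tests.txt so the example is self-contained.
scraped = Tempfile.create("email_tests") do |file|
  file.write("example#live.com\nanother_example90#hotmail.com\nnot an address\n")
  file.flush
  scrape_lines(file.path)
end

p scraped
```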
Related

create comma separated string in the First element of an array Ruby

So this may seem odd, and I have done quite a bit of googling, however, I am not really a programmer, (sysops) and trying to figure out how to pass data to the AWS API in the required format, which does seem a little odd.
So, working with resources in AWS, I need to pass tags which are keys and values. The key is a string. The value is a comma separated string, in the first element of an array. So in Ruby terms, looks like this.
{env => ["stage,qa,dev"]}
and not
{env => ["stage","qa","dev"]}
I've created an admittedly not very pretty little app that will allow me to run SSM documents on targeted instances in AWS.
I can get the string into an array element using this class I created
class Tags
attr_accessor :tags
def initialize
@tags = {"env" => nil ,"os" => nil ,"group" => nil }
end
def set_values()
puts "please enter value/s for the following keys, using spaces or commas for multiple values"
@tags.each { |key,value|
print "enter #{key} value/s: "
@tags[key] = [gets.strip.chomp]
@tags[key] = Validate.multi_value(tags[key])
}
end
end
I then call this Validate.multi_value passing in the created Array, but it spits an array of my string value back.
class Validate
def self.multi_value(value)
if value.any?{ |sub_string| sub_string.include?(",") || sub_string.include?(" ") }
value = value[0].split(/[,\s]+/)
return value
else
return value
end
end
end
Using pry, I've seen it gets for example ["stage dev qa"] then the if statement does work, then it spits out ["stage","dev","qa"].
and I need it to output ["stage,dev,qa"] but for the life of me, I can't make it work.
I hope that's clear.
If you have any suggestions, I'd be most grateful.
I'm not hugely experienced at Ruby and there may be class methods that I've missed.
If your arrays are always coming through in the format ["stage dev qa"] then first we need to split the one string into the parts we want:
arr = ["stage dev qa"]
arr.first.split(' ')
=> ["stage", "dev", "qa"]
Then we need to join them with the comma:
arr.first.split(' ').join(',')
=> "stage,dev,qa"
And finally we need to wrap it in an array:
[arr.first.split(' ').join(',')]
=> ["stage,dev,qa"]
All together:
def transform_array(arr)
[arr.first.split(' ').join(',')]
end
transform_array(['stage dev qa'])
=> ['stage,dev,qa']
More info: How do I convert an array of strings into a comma-separated string?
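Since the prompt in the question allows either spaces or commas, the same split-and-join idea can be extended to both separators. A minimal sketch, assuming the input is always a one-element array; the method name to_tag_value is hypothetical:

```ruby
# Collapses any run of commas and/or whitespace into a single comma and
# returns the result wrapped in a one-element array.
def to_tag_value(input)
  parts = input.first.split(/[,\s]+/).reject(&:empty?)  # drop the empty piece a leading separator leaves
  [parts.join(",")]
end

p to_tag_value(["stage dev qa"])
p to_tag_value(["stage, dev,qa"])
p to_tag_value(["stage"])
```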
I see no point in creating a class here when a simple method would do.
def set_values
["env", "os", "group"].map do |tag|
puts "Please enter values for #{tag}, using spaces or commas"
print "to separate multiple values: "
gets.strip.gsub(/[ ,]+/, ',')
end
end
Suppose, when asked, the user enters "stage dev,qa" (for "env"), "OS X" (for "os") and "Hell's Angels" (for "group"). Then:
set_values
#=> ["stage,dev,qa", "OS,X", "Hell's,Angels"]
If, as I suspect, you only wish to convert spaces to commas for "env" and not for "os" or "group", write:
def set_values
puts "Please enter values for env, using spaces or commas"
print "to separate multiple values: "
[gets.strip.gsub(/[ ,]+/, ',')] +
["os", "group"].map do |tag|
print "Please enter value for #{tag}: "
gets.strip
end
end
set_values
#=> ["stage,dev,qa", "OS X", "Hell's Angels"]
See Array#map, String#gsub and Array#+.
gets.strip.gsub(/[ ,]+/, ',') merely chains the two operations s = gets.strip and s.gsub(/[ ,]+/, ','). Chaining is commonplace in Ruby.
The regular expression used by gsub reads, "match one or more spaces or commas", [ ,] being a character class, requiring one of the characters in the class be matched, + meaning that one or more of those spaces or commas are to be matched. If the string were "a , b,, c" there would be two matches, " , " and ",, "; gsub would convert both to a single comma.
Using print rather than puts displays the user's entry on the same line as the prompt, immediately after ": ", rather than on the next line. That is of course purely stylistic.
Often one would write gets.chomp rather than gets.strip. chomp removes only a trailing newline, whereas strip removes all whitespace (newlines included) at both the beginning and the end of the string. strip is probably best in this case.
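The behaviour of chomp, strip and the gsub pattern can be checked on a literal string. In this small sketch the padded input stands in for what gets might return, and the variable names are only for illustration:

```ruby
raw = "  stage dev,qa \n"  # stand-in for a padded line returned by gets

chomped  = raw.chomp   # only the trailing newline is removed
stripped = raw.strip   # leading and trailing whitespace are removed

# One or more spaces/commas in a row become a single comma.
normalized = stripped.gsub(/[ ,]+/, ",")
collapsed  = "a , b,, c".gsub(/[ ,]+/, ",")

p chomped
p stripped
p normalized
p collapsed
```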
What do you think about this? Everything is handled in the Validate.multi_value method. I don't know if you wanted to remove repeated separators, but I did just in case, so a
"this string,, has too many,,, , spaces"
will become
"this,string,has,too,many,spaces"
and not
"this,,,,string,,,has,too,,many,,,,,,spaces"
Here's the code
class Tags
attr_accessor :tags
# initializes the class (no change)
#
def initialize
@tags = {"env" => nil ,"os" => nil ,"group" => nil }
end
# request and assign the values <- SOME CHANGES
#
def set_values
puts "please enter value/s for the following keys, using spaces or commas for multiple values"
@tags.each do |key,value|
print "enter #{key} value/s: "
@tags[key] = Validate.multi_value( gets )
end
end
end
class Validate
# Sets the array
#
def self.multi_value(value)
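# Remove leading and trailing whitespace, then remove line breaks and tabs
# (String#delete with several arguments deletes only their intersection,
# so the characters are passed as one set), replace all spaces with commas,
# then remove repetitions
#
[ value.strip.delete("\n\r\t").gsub(" ", ",").squeeze(",") ]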
end
end
EDITED, thanks lacostenycoder

Insert text before the end of a file

I am trying to write a script that will insert a text before the last end tag within a Ruby file. For example, I want to insert the following:
def hello
puts "hello!"
end
within the following file, just before the end of the class:
class ApplicationController < ActionController::Base
# Prevent CSRF attacks by raising an exception.
# For APIs, you may want to use :null_session instead.
protect_from_forgery with: :exception
helper_method :authenticated?, :current_user
def current_user?
session[:current_user]
end
end
The result should look like this:
class ApplicationController < ActionController::Base
# Prevent CSRF attacks by raising an exception.
# For APIs, you may want to use :null_session instead.
protect_from_forgery with: :exception
helper_method :authenticated?, :current_user
def current_user?
session[:current_user]
end
def hello
puts "hello!"
end
end
I have tried to find a regex that would match the last occurrence of end and replace it with the block I want to add, but all the regexes I have tried match the first end only. I tried these:
end(?=[^end]*$)
end(?!.*end)
(.*)(end)(.*)
To replace the string, I do the following (maybe the EOL characters are screwing up the matching?):
file_to_override = File.read("app/controllers/application_controller.rb")
file_to_override = file_to_override.sub(/end(?=[^end]*$)/, "#{new_string}\nend")
EDIT: I also tried the solution provided in How to replace the last occurrence of a substring in ruby? but strangely, it replaces all occurrences of end.
What am I doing wrong? Thanks!
The approach explained in the post is working here, too. You just need to re-organize capturing groups and use the /m modifier that forces . to match newline symbols, too.
new_string = <<EOS
def hello
puts "Hello!"
end
EOS
file_to_override = <<EOS
class ApplicationController < ActionController::Base
# Prevent CSRF attacks by raising an exception.
# For APIs, you may want to use :null_session instead.
protect_from_forgery with: :exception
helper_method :authenticated?, :current_user
def current_user?
session[:current_user]
end
end
EOS
file_to_override=file_to_override.gsub(/(.*)(\nend\b.*)/m, "\\1\n#{new_string}\\2")
puts file_to_override
See IDEONE demo
The /(.*)(\nend\b.*)/m pattern will match and capture into Group 1 all the text up to the last whole word (due to the \n before and \b after) end preceded with a line feed, and will place the line feed, "end" and whatever remains into Group 2. In the replacement, we back-reference the captured substrings with backreferences \1 and \2 and also insert the string we need to insert.
If there are no other words after the last end, you could even use a /(.*)(\nend\s*\z)/m regex.
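A short sketch of the \z-anchored variant follows; the sample class, the addition string and the variable names are made up for illustration. It also shows why the first attempt from the question stops at the first end. [^end] is a character class (any character other than e, n or d), and $ in Ruby matches at the end of every line, so that lookahead is already satisfied right after the first end.

```ruby
source   = "class A\n  def a\n    'hi'\n  end\nend\n"
addition = "  def hello\n    puts \"hello!\"\n  end"

# Group 1: everything before the final "end" line.
# Group 2: the newline, "end" and any trailing whitespace up to the end of the string.
LAST_END = /(.*)(\nend\s*\z)/m

# The block form of sub avoids backslash escaping in the replacement string.
patched = source.sub(LAST_END) { "#{$1}\n#{addition}#{$2}" }
puts patched

# For comparison: where the question's first pattern matches versus the real last "end".
first_hit = source =~ /end(?=[^end]*$)/
last_hit  = source.rindex("end")
p [first_hit, last_hit]
```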
Suppose you read the file into the string text:
text = <<_
class A
def a
'hi'
end
end
_
and wish to insert the string to_enter:
to_enter = <<_
def hello
puts "hello!"
end
_
before the last end. You could write
r = /
.*\n # match any number of any character (greedily), up to and including a newline
\K # discard everything matched so far
(?=\s*end\b) # match indenting spaces and "end" followed
# by a word break in a positive lookahead
/mx # multi-line and extended/free-spacing regex definition modes
puts text.sub(r, to_enter)
(prints)
class A
def a
'hi'
end
def hello
puts "hello!"
end
end
Note that sub is replacing an empty string with to_enter.
Edit: Answer from Wiktor is exactly what I was looking for. Leaving the following too because it works as well.
Finally, I gave up on replacing using a regex. Instead, I use the position of the last end:
positions = file_to_override.enum_for(:scan, /end/).map { Regexp.last_match.begin(0) }
Then, before writing the file, I add what I need within the string at last position - 1:
new_string = <<EOS
def hello
puts "Hello!"
end
EOS
file_to_override[positions.last - 1] = "\n#{new_string}\n"
File.open("app/controllers/application_controller.rb", 'w') {|file| file.write(file_to_override)}
This works but it doesn't look like idiomatic Ruby to me.
You can also find and replace the last occurrence of "end" (note that this will also match the end in # Hello my friend, but see below) like this
# Our basics: In this text ...
original_content = "# myfile.rb\n"\
"module MyApp\n"\
" class MyFile\n"\
" def myfunc\n"\
" end\n"\
" end\n"\
"end\n"
# ...we want to inject this:
substitute = "# this will come to a final end!\n"\
"end\n"
# Now find the last end ...
idx = original_content.rindex("end") # => index of last "end"(69)
# ... and substitute it
original_content[idx..idx+3] = substitute # idx..idx+3 covers "end" (3 characters) plus the trailing newline
This solution is somewhat more old-school (dealing with indexes in strings felt much cooler some years ago) and in this form more "vulnerable", but it spares you from having to sit down and digest the regexps. Don't get me wrong, regular expressions are a tool of incredible power and the minutes spent learning them are worth it.
That said, you can use all the regular expressions from the other answers also with rindex (e.g. rindex(/ *end/)).
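As a rough sketch of the rindex-with-regexp idea, here is a version that only accepts an end at the start of a line, so that a trailing comment containing "friend" is not picked up. The sample text and variable names are hypothetical.

```ruby
text     = "class A\n  def a\n  end\nend\n# bye, my friend\n"
addition = "  def hello\n    puts \"hello!\"\n  end\n"

plain_idx = text.rindex("end")     # lands inside "friend" in the trailing comment
regex_idx = text.rindex(/^end\b/)  # last "end" that starts a line

# String#insert places the addition before the character at regex_idx.
patched = text.dup.insert(regex_idx, addition)
puts patched
```

This still treats the file as plain text, so an unindented end inside a heredoc or a multi-line comment would confuse it just as it would confuse the regexp answers.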

odd rake behavior - is it broken?

EDIT: figured it out. This is telling me my ruby is producing an infinite loop. Now if I could only figure out how to fix the loop ...
I run a rake test and this is all that outputs to the terminal:
(in /home/macs/Desktop/projects/odin/odin3_ruby/learn_ruby)
#translate
The test was working fine before I changed my ruby to this:
def translate(x)
vowel = /\b[aeiou]*/
array = x.split("")
until array[0]==vowel do
it = array[0]
array.push(it)
array.delete(it)
end
new = array.join("")
new+="ay"
end
I don't remember exactly what my ruby code was before I changed it.
If you are curious to see my rake file it is this. By the way I downloaded this file from the tutorial website and I am positive I didn't change it at all.
# # Topics
#
# * modules
# * strings
#
# # Pig Latin
#
# Pig Latin is a made-up children's language that's intended to be confusing. It obeys a few simple rules (below) but when it's spoken quickly it's really difficult for non-children (and non-native speakers) to understand.
#
# Rule 1: If a word begins with a vowel sound, add an "ay" sound to the end of the word.
#
# Rule 2: If a word begins with a consonant sound, move it to the end of the word, and then add an "ay" sound to the end of the word.
#
# (There are a few more rules for edge cases, and there are regional variants too, but that should be enough to understand the tests.)
#
# See <http://en.wikipedia.org/wiki/Pig_latin> for more details.
#
#
require "pig_latin"
describe "#translate" do
it "translates a word beginning with a vowel" do
s = translate("apple")
s.should == "appleay"
end
it "translates a word beginning with a consonant" do
s = translate("banana")
s.should == "ananabay"
end
it "translates a word beginning with two consonants" do
s = translate("cherry")
s.should == "errychay"
end
it "translates two words" do
s = translate("eat pie")
s.should == "eatay iepay"
end
it "translates a word beginning with three consonants" do
translate("three").should == "eethray"
end
it "counts 'sch' as a single phoneme" do
s = translate("school")
s.should == "oolschay"
end
it "counts 'qu' as a single phoneme" do
s = translate("quiet")
s.should == "ietquay"
end
it "counts 'qu' as a consonant even when it's preceded by a consonant" do
s = translate("square")
s.should == "aresquay"
end
it "translates many words" do
s = translate("the quick brown fox")
s.should == "ethay ickquay ownbray oxfay"
end
# Test-driving bonus:
# * write a test asserting that capitalized words are still capitalized (but with a different initial capital letter, of course)
# * retain the punctuation from the original phrase
end
You have an array of Strings:
array = x.split("")
but you're comparing those strings with a Regexp:
vowel = /\b[aeiou]*/
#...
until array[0]==vowel do
a == b is false for every String a and Regexp b so you're just writing this in a complicated way:
until false do
Perhaps you mean to say:
until array[0] =~ vowel do
That should stop your infinite loop but your loop still doesn't make much sense. You do this:
it = array[0]
array.push(it)
array.delete(it)
So you grab the first element, push it onto the end of the array, and then delete everything from array that matches it (note that Array#delete deletes all matches). Why not just array.delete(it)? Or, if you only want to toss the first entry, use shift.
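The points above can be sketched as follows. Note one extra wrinkle in the original pattern: because of the trailing *, /\b[aeiou]*/ can match zero vowels, so with =~ it matches every letter and the loop would never move anything. The sketch therefore uses /[aeiou]/. It covers only the single-word, consonant-moving rules, not the "qu" rule or multiple words, and the method name rotate_consonants is made up.

```ruby
# Literal comparison never succeeds; pattern matching does.
equal_check = ("a" == /[aeiou]/)     # false: a String is never == a Regexp
empty_match = ("b" =~ /\b[aeiou]*/)  # 0 (truthy): the * allows a zero-length match
real_match  = ("b" =~ /[aeiou]/)     # nil: "b" is not a vowel

# Moves leading consonants to the end, one at a time, then appends "ay".
# The loop is bounded by the word length so a word without vowels cannot spin forever.
def rotate_consonants(word)
  letters = word.chars
  word.length.times do
    break if letters.first =~ /[aeiou]/
    letters.push(letters.shift)  # shift removes only the first element
  end
  letters.join + "ay"
end

p [equal_check, empty_match, real_match]
p rotate_consonants("apple")
p rotate_consonants("banana")
p rotate_consonants("cherry")
```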

Is there a simple way in ruby to call next word in a string?

Long story short I am checking a rather lengthy error log. I would like to find and parse the ip address associated with each error.
example I want to parse
client: 12.345.678.910
def check_file( file, string )
result = []
File.open( file ) do |io|
io.each do |line|
result << parse_ip( line ) if line.include? string
end
end
result
end
def parse_ip( flag )
flag = flag.split.find_all{|word| /^client:.+/.match word}
ip = flag. # need to grab ip here
ip
end
Is there a simple way to get next word?
I am just not sure how to grab the characters following "client:"
Any assistance is appreciated.
EDIT: syntax error
There are several ways to do it. For your example, you could extract the ip with a single regexp capture:
def parse_ip( flag )
m = /\bclient:\s*([\d\.]+)/.match flag
m && m[1]
end
If you really prefer to tokenize with split for other reasons, you can use Enumerable#drop_while to scan to the key, then index to the next token:
def parse_ip( flag )
flag.split.drop_while{|token| token !~ /^client:/}[1]
end
"client: 12.345.678.910"[/client:\s*(.*?)\s*$/, 1]
=> "12.345.678.910"
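Putting the capture approach together with the line filter from the question might look like the sketch below. The log lines are invented (only the "client: <ip>" fragment comes from the question), and client_ip is a hypothetical helper name.

```ruby
# Hypothetical log lines; only the "client: <ip>" fragment comes from the question.
LOG = <<~TEXT
  [error] connection refused, client: 12.345.678.910, server: example.com
  [info] worker started
  [error] upstream timed out, client: 10.0.0.7, server: example.com
TEXT

# Returns the token after "client:", or nil when the line has none.
def client_ip(line)
  line[/\bclient:\s*([\d.]+)/, 1]
end

ips = LOG.each_line
         .select { |line| line.include?("[error]") }
         .map { |line| client_ip(line) }
         .compact

p ips
```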

Ruby: Use condition result in condition block

I have such code
reg = /(.+)_path/
if reg.match('home_path')
puts reg.match('home_path')[0]
end
This will eval regex twice :(
So...
reg = /(.+)_path/
result = reg.match('home_path')
if result
puts result[0]
end
But this keeps the variable result in memory until it goes out of scope.
I have one functional-programming idea
/(.+)_path/.match('home_path').compact.each do |match|
puts match[0]
end
But it seems there should be a better solution, shouldn't there?
There are special global variables (their names start with $) that contain results of the last regexp match:
r = /(.+)_path/
# $n - the n-th group of the last successful match ($1 is the first group, $2 the second, ...)
puts $1 if r.match('home_path')
# => home
# $& - the string matched by the last successful match
puts $& if r.match('home_path')
# => home_path
You can find full list of predefined global variables here.
Note, that in the examples above puts won't be executed at all if you pass a string that doesn't match the regexp.
And speaking about general case you can always put assignment into condition itself:
if m = /(.+)_path/.match('home_path')
puts m[0]
end
Though, many people don't like that as it makes code less readable and gives a good opportunity for confusing = and ==.
My personal favorite (w/ 1.9+) is some variation of:
if /(?<prefix>.+)_path/ =~ "home_path"
puts prefix
end
If you really want a one-liner: puts /(?<prefix>.+)_path/ =~ 'home_path' ? prefix : false
See the Ruby Docs for a few limitations of named captures and #=~.
From the docs: If a block is given, invoke the block with MatchData if match succeed.
So:
/(.+)_path/.match('home_path') { |m| puts m[1] } # => home
/(.+)_path/.match('homepath') { |m| puts m[1] } # prints nothing
How about...
if m=/regex here/.match(string) then puts m[0] end
A neat one-line solution, I guess :)
how about this ?
puts $~ if /regex/.match("string")
$~ is a special variable that stores the last regexp match. more info: http://www.regular-expressions.info/ruby.html
Actually, this can be done with no conditionals at all. (The expression evaluates to "" if there is no match.)
puts /(.+)_path/.match('home_xath').to_a[0].to_s
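For comparison, three of the approaches above can be wrapped in small methods, which also keeps any intermediate variable scoped to the method body. This is only a sketch; the constant and method names are made up.

```ruby
PATH_PATTERN = /(.+)_path/

# String#[] with a regexp and a group index returns the capture, or nil.
def prefix_by_index(name)
  name[PATH_PATTERN, 1]
end

# Regexp#match with a block returns the block's value, or nil without a match.
def prefix_by_block(name)
  PATH_PATTERN.match(name) { |m| m[1] }
end

# Assignment inside the condition; the MatchData stays local to the method.
def prefix_by_assignment(name)
  if (m = PATH_PATTERN.match(name))
    m[1]
  end
end

p prefix_by_index("home_path")
p prefix_by_block("home_path")
p prefix_by_assignment("home_path")
p prefix_by_index("homepath")
```

All three evaluate the regexp only once per call.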

Resources