Ruby Regex Odd Error, What is going on?

I have the following program:
class Matcher
  include Enumerable
  def initialize(string, match)
    @string = string
    @match = match
  end
  def each
    @string.scan(/[@match]/) do |pattern|
      yield pattern
    end
  end
end
mch = Matcher.new("the quickbrown fox", "aeiou")
puts mch.inject {|x, n| x+n}
It is supposed to match the characters aeiou in the string "the quickbrown fox".
No matter what I put as the pattern, it oddly prints out the characters thc. What's going on?

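@string.scan(/[@match]/) do |pattern| is incorrect: inside a regex literal, @match is not evaluated, so /[@match]/ is simply a character class containing @, m, a, t, c and h, which is why you only get t, h and c back. #{@match} is what you're looking for, as in /[#{@match}]/.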
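A minimal sketch of the corrected class, assuming the goal is to yield every character of the string that also appears in the match string. Wrapping @match in Regexp.escape is my own addition, not part of the answer: it keeps characters such as ] or ^ in the match string from changing the meaning of the character class.

```ruby
class Matcher
  include Enumerable

  def initialize(string, match)
    @string = string
    @match = match
  end

  # Yields every character of @string that appears in @match.
  def each
    # #{...} interpolates the instance variable into the regex literal.
    # Regexp.escape (an addition) neutralizes ], ^, - and \ inside the class.
    @string.scan(/[#{Regexp.escape(@match)}]/) do |pattern|
      yield pattern
    end
  end
end

mch = Matcher.new("the quickbrown fox", "aeiou")
puts mch.inject { |x, n| x + n }  # prints euioo
```

Because the class includes Enumerable, to_a, map, select and friends work on the matches too, e.g. mch.to_a gives the individual vowels.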

Related

How can I create a method that checks if a string starts with a capitalized letter?

So far I have:
def capitalized?(str)
  str[0] == str[0].upcase
end
The problem with this is that it returns true for strings like "12345", "£$%^&" and "9ball", etc. I would like it to only return true if the first character is a capital letter.
You can use match? (Ruby 2.4+) to return true if the first character is an uppercase letter in the range A to Z:
def capitalized?(str)
  str.match?(/\A[A-Z]/)
end
p capitalized?("12345") # false
p capitalized?("fooo") # false
p capitalized?("Fooo") # true
You can also pass a regular expression to start_with? (Ruby 2.5+):
p 'Foo'.start_with?(/[A-Z]/) # true
p 'foo'.start_with?(/[A-Z]/) # false
There's probably a nicer way to do it with a regex, but to keep this plain Ruby, you can make a range of capital letters:
capital_letters = ("A".."Z")
Then you can check whether your first letter is in that range:
def capitalized?(str)
  capital_letters = ("A".."Z")
  capital_letters.include?(str[0])
end
Or a bit shorter:
def capitalized?(str)
  ("A".."Z").include?(str[0])
end
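I would avoid character ranges if possible, because without knowing the encoding, you can never be sure what is in a range. In your case, a range is unnecessary. A simple
/\A[[:upper:]]/ =~ str
would do. (Use \A rather than ^ here: in Ruby, ^ matches at the start of every line, not just at the start of the string.) See the Ruby Regexp documentation for the definition of POSIX character classes.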
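To make the difference between these answers concrete, here is a small comparison sketch (the method names capitalized_ascii? and capitalized_posix? are just illustrative). It assumes UTF-8 source and strings and Ruby 2.4+ for match?: [A-Z] only accepts the 26 ASCII capitals, [[:upper:]] also accepts non-ASCII uppercase letters such as É, and ^ can match after any newline whereas \A cannot.

```ruby
# Illustrative helpers comparing the two approaches (requires Ruby 2.4+).
def capitalized_ascii?(str)
  str.match?(/\A[A-Z]/)        # only the 26 ASCII capitals
end

def capitalized_posix?(str)
  str.match?(/\A[[:upper:]]/)  # any uppercase letter, including non-ASCII
end

["Fooo", "fooo", "12345", "9ball", "Élan", "élan"].each do |s|
  puts "#{s.inspect}: ascii=#{capitalized_ascii?(s)} posix=#{capitalized_posix?(s)}"
end

# In Ruby, ^ matches at the start of ANY line; \A only at the start of the string.
multi_line = "lower case\nUpper case"
p multi_line.match?(/^[[:upper:]]/)   # => true  (matches the "U" on line 2)
p multi_line.match?(/\A[[:upper:]]/)  # => false
```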
def capitalized?(str)
  str[0] != str[0].downcase
end
capitalized? "Hello" #=> true
capitalized? "hello" #=> false
capitalized? "007, I presume" #=> false
capitalized? "$100 for that?" #=> false
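A simple solution:
def capitalized?(str)
  str == str.capitalize
end
Note that String#capitalize also downcases the rest of the string, so this returns false for "Hello World", and it still returns true for "12345", since capitalize leaves digits unchanged.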

Insert text before the end of a file

I am trying to write a script that will insert text before the last end keyword in a Ruby file. For example, I want to insert the following:
def hello
  puts "hello!"
end
within the following file, just before the end of the class:
class ApplicationController < ActionController::Base
  # Prevent CSRF attacks by raising an exception.
  # For APIs, you may want to use :null_session instead.
  protect_from_forgery with: :exception
  helper_method :authenticated?, :current_user
  def current_user?
    session[:current_user]
  end
end
The result should look like this:
class ApplicationController < ActionController::Base
  # Prevent CSRF attacks by raising an exception.
  # For APIs, you may want to use :null_session instead.
  protect_from_forgery with: :exception
  helper_method :authenticated?, :current_user
  def current_user?
    session[:current_user]
  end
  def hello
    puts "hello!"
  end
end
I have tried to find a regex that would match the last occurrence of end and replace it with the block I want to add, but all the regexes I have tried match only the first end. I tried these:
end(?=[^end]*$)
end(?!.*end)
(.*)(end)(.*)
To replace the string, I do the following (maybe the EOL characters are screwing up the matching?):
file_to_override = File.read("app/controllers/application_controller.rb")
file_to_override = file_to_override.sub(/end(?=[^end]*$)/, "#{new_string}\nend")
EDIT: I also tried the solution provided in "How to replace the last occurrence of a substring in ruby?", but strangely, it replaces all occurrences of end.
What am I doing wrong? Thanks!
The approach explained in that post works here too. You just need to reorganize the capturing groups and use the /m modifier, which makes . match newline characters as well.
new_string = <<EOS
  def hello
    puts "Hello!"
  end
EOS
file_to_override = <<EOS
class ApplicationController < ActionController::Base
  # Prevent CSRF attacks by raising an exception.
  # For APIs, you may want to use :null_session instead.
  protect_from_forgery with: :exception
  helper_method :authenticated?, :current_user
  def current_user?
    session[:current_user]
  end
end
EOS
file_to_override = file_to_override.gsub(/(.*)(\nend\b.*)/m, "\\1\n#{new_string}\\2")
puts file_to_override
The /(.*)(\nend\b.*)/m pattern matches and captures into Group 1 all the text up to the last whole word end that is preceded by a line feed (the \n before it and the \b after it make it a whole word), and places that line feed, "end" and whatever remains into Group 2. In the replacement, we refer to the captured substrings with the backreferences \1 and \2 and insert the new string between them.
If there are no other words after the last end, you could even use a /(.*)(\nend\s*\z)/m regex.
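A sketch wrapping that regex in a reusable method (insert_before_last_end is a hypothetical name). It uses the block form of sub rather than a "\\1...\\2" replacement string, on the assumption that the inserted code could contain backslashes, which a replacement string would try to interpret. It also assumes the class's closing end sits at column 0 while the inner ends are indented, so only the final one matches \nend\b.

```ruby
# insert_before_last_end is a hypothetical helper; the regex is the one above.
def insert_before_last_end(source, snippet)
  # (.*) is greedy and /m lets . cross newlines, so group 1 runs up to the
  # LAST "\nend" (followed by a word break); group 2 keeps "\nend" and the rest.
  # The block form means backslashes in snippet are not treated as \1, \2, etc.
  source.sub(/(.*)(\nend\b.*)/m) { "#{$1}\n#{snippet.chomp}#{$2}" }
end

original = <<~RUBY
  class ApplicationController < ActionController::Base
    def current_user?
      session[:current_user]
    end
  end
RUBY

# Already indented one level so it sits inside the class body.
snippet = "  def hello\n    puts \"Hello!\"\n  end\n"

result = insert_before_last_end(original, snippet)
puts result
```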
Suppose you read the file into the string text:
text = <<_
class A
  def a
    'hi'
  end
end
_
and wish to insert the string to_enter:
to_enter = <<_
  def hello
    puts "hello!"
  end
_
before the last end. You could write
r = /
  .*              # match any number of any character (greedily)
  \K              # discard everything matched so far
  (?=\n\s*end\b)  # match end-of-line, indenting spaces, and "end" followed
                  # by a word break in a positive lookahead
/mx               # multi-line and extended/free-spacing regex definition modes
puts text.sub(r, "\n" + to_enter.chomp)
(prints)
class A
  def a
    'hi'
  end
  def hello
    puts "hello!"
  end
end
Note that sub is replacing an empty string (the position just before the newline that precedes the last end) with the inserted text, which is why that text is given a leading newline and has its trailing newline chomped.
Edit: Wiktor's answer is exactly what I was looking for. I'm leaving the following here too, because it works as well.
Finally, I gave up on replacing using a regex. Instead, I use the position of the last end:
positions = file_to_override.enum_for(:scan, /end/).map { Regexp.last_match.begin(0) }
Then, before writing the file, I insert what I need into the string at index positions.last - 1 (the newline just before the last end):
new_string = <<EOS
  def hello
    puts "Hello!"
  end
EOS
file_to_override[positions.last - 1] = "\n#{new_string}\n"
File.open("app/controllers/application_controller.rb", 'w') {|file| file.write(file_to_override)}
This works but it doesn't look like idiomatic Ruby to me.
You can also find and replace the last occurrence of "end" like this (note that this will also match the end in # Hello my friend, but see below):
# Our basics: In this text ...
original_content = "# myfile.rb\n"\
                   "module MyApp\n"\
                   "  class MyFile\n"\
                   "    def myfunc\n"\
                   "    end\n"\
                   "  end\n"\
                   "end\n"
# ...we want to inject this:
substitute = "# this will come to a final end!\n"\
             "end\n"
# Now find the last end ...
idx = original_content.rindex("end") # => 69, the index of the last "end"
# ... and substitute it
original_content[idx..idx+3] = substitute # idx..idx+3 covers "end\n" (4 characters)
This solution is somewhat more old-school (dealing with indexes in strings felt much cooler some years ago) and, in this form, more "vulnerable", but it saves you from having to sit down and digest the regexps. Don't get me wrong, regular expressions are a tool of incredible power and the minutes spent learning them are worth it.
That said, you can also use the regular expressions from the other answers with rindex (e.g. rindex(/ *end/)).
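Along the same lines, here is a sketch combining String#rindex with a line-anchored regex and String#insert (the helper name is hypothetical). Anchoring with ^ means only an end at the start of a line counts, which sidesteps the "# Hello my friend" problem, assuming the closing end of the outer module or class is not indented.

```ruby
# Hypothetical helper: insert snippet right before the last "end" that starts
# a line. ^ anchors to line starts, so the "end" in "friend" is ignored.
def insert_before_last_top_level_end(source, snippet)
  idx = source.rindex(/^end\b/)   # offset of the last column-0 "end", or nil
  raise ArgumentError, "no top-level end found" if idx.nil?
  source.dup.insert(idx, snippet) # dup leaves the caller's string untouched
end

original = "# Hello my friend\n" \
           "module MyApp\n" \
           "  def myfunc\n" \
           "  end\n" \
           "end\n"

result = insert_before_last_top_level_end(original, "  # inserted here\n")
puts result
```

To write the result back, something like File.write("app/controllers/application_controller.rb", result) should do; File.write replaces the file's previous contents.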

Re-capitalizing the first word

So it's apparent this question has been asked before, but what I'm actually asking is specific to the code I am writing. Basically, I'm capitalizing the words (titleizing). My method is not optimized, and it does go in circles, so just bear with me. I can't seem to re-capitalize the first word of the title once I've made it lowercase again. I have written comments in the code, so you can breeze through it without analyzing the entire thing. I'm not asking you to write new code, because I can just google that. I'm more interested in why my solutions aren't working.
input: "the hamster and the mouse"
output: "the Hamster and the Mouse"
WHAT I WANT: "The Hamster and the Mouse"
class String
  def titleize
    #regex reads: either beginning of string or whitespace followed by alpha
    self.gsub(/(\A|\s)[a-z]/) do |letter|
      letter.upcase!
    end
  end
end
class Book
  attr_accessor :title
  def title=(title)
    @title = title.titleize #makes every word capitalized
    small_words = %w[In The And A An Of]
    words = @title.split(" ")
    #makes all the "small_words" uncapitalized again
    words.each do |word|
      if small_words.include?(word)
        word.downcase!
      end
    end
    words[0][0].upcase! #doesnt work
    @title = words.join(" ")
    #NEED TO MAKE FIRST WORD CAPITALIZED EVEN IF ITS A "small_word"
    @title[0].upcase! #also doesnt work
  end
end
Replace words[0][0].upcase! with words[0] = words[0].titleize. This will titleize the first word in the title, which is what you want.
You also don't need @title[0].upcase!.
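On the "why" part of the question: String#[] returns a new String object, so words[0][0].upcase! upcases that one-character copy and then throws it away; the same goes for @title[0].upcase!. Assigning back with String#[]=, or mutating the array element itself rather than a slice of it, does change the original. A small demonstration (the variable names are illustrative):

```ruby
word = "the"

first = word[0]   # String#[] returns a NEW one-character string
first.upcase!     # this mutates only that copy
p word            # => "the"  (unchanged)
p first           # => "T"

# Assigning through String#[]= does change the original string.
word[0] = word[0].upcase
p word            # => "The"

# Mutating the array element itself (not a slice of it) also works.
words = "the Hamster and the Mouse".split(" ")
words[0].capitalize!
p words.join(" ") # => "The Hamster and the Mouse"
```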
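Change the last line from:
@title[0].upcase!
To:
@title.capitalize!
Be aware that String#capitalize! also downcases every character after the first, so on "the Hamster and the Mouse" this gives "The hamster and the mouse".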
EDIT:
I rewrote the class. It has fewer lines and doesn't need a regex or the String#titleize method.
class Book
  attr_reader :title
  def title=(title)
    small_words = ["in", "the", "and", "a", "an", "of"]
    @title = title.split.each do |word|
      small_words.include?(word.downcase) ? word.downcase! : word.capitalize!
    end
    @title[0].capitalize!
    @title = @title.join(" ")
  end
end
new_book = Book.new
new_book.title="the hamster and the mouse"
new_book.title # => "The Hamster and the Mouse"

How do I capitalize all words in a string apart from small words in the middle and in the beginning?

I have a long string, "the silver rider on his back and the palm tree". I would like to write a Ruby method that capitalizes all words except "on", "the", and "and" in the middle of the sentence, but capitalizes "the" at the beginning.
Here is what I have so far:
def title(word)
  small_words = %w[on the and]
  word.split(' ').map do |w|
    unless small_words.include?(w)
      w.capitalize
    else
      w
    end
  end.join(' ')
end
This code does most of what I need, but I don't know how to include (or, for that matter, exclude) the "the" at the beginning of the sentence.
This will capitalize all the words, except for the stop words (your small words) that aren't the first in the sentence.
def title(sentence)
  stop_words = %w{a an and the or for of nor} #there is no definitive list of stop words, so you may edit it according to your needs.
  sentence.split.each_with_index.map{|word, index| stop_words.include?(word) && index > 0 ? word : word.capitalize }.join(" ")
end
It’s easiest to just forget about the special case of the first letter initially and then handle it after doing everything else:
def title(sentence)
  small_words = %w[on the and]
  capitalized_words = sentence.split(' ').map do |word|
    small_words.include?(word) ? word : word.capitalize
  end
  capitalized_words.first.capitalize!
  capitalized_words.join(' ')
end
This also capitalizes any “small word” at the beginning, not just “the”—but I think that’s probably what you want anyway.
A simple mod to your existing code would make it work:
def title( word )
  small_words = %w[on the and]
  word.split(' ').map.with_index do |w, i|
    unless (small_words.include? w) and (i > 0)
      w.capitalize
    else
      w
    end
  end.join(' ')
end
SmallWords = %w[on the and]
def title word
  word.gsub(/[\w']+/){
    SmallWords.include?($&) && $~.begin(0).zero?.! ? $& : $&.capitalize
  }
end
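The last answer leans on Ruby's special match globals: $& is the matched word and $~ is the MatchData, whose begin(0) is the word's offset in the string. A more explicit sketch of the same idea, reading the MatchData through Regexp.last_match inside the gsub block (SMALL_WORDS is just a renamed constant, not from the answer):

```ruby
SMALL_WORDS = %w[on the and].freeze

# Capitalize every word except small words that are not at the very start.
def title(sentence)
  sentence.gsub(/[\w']+/) do |word|
    match = Regexp.last_match        # the same MatchData the answer reads as $~
    at_start = match.begin(0).zero?  # begin(0) = offset of this word
    (SMALL_WORDS.include?(word) && !at_start) ? word : word.capitalize
  end
end

puts title("the silver rider on his back and the palm tree")
```

Like the original, this treats only a word at offset 0 as "first", so a sentence with leading spaces or punctuation would keep a lowercase small word at its start.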

Ruby Regex not matching

I'm writing a short class to extract email addresses from documents. Here is my code so far:
# Class to scrape documents for email addresses
class EmailScraper
  EmailRegex = /\A[\w+\-.]+@[a-z\d\-.]+\.[a-z]+\z/i
  def EmailScraper.scrape(doc)
    email_addresses = []
    File.open(doc) do |file|
      while line = file.gets
        temp = line.scan(EmailRegex)
        temp.each do |email_address|
          puts email_address
          emails_addresses << email_address
        end
      end
    end
    return email_addresses
  end
end
if EmailScraper.scrape("email_tests.txt").empty?
  puts "Empty array"
else
  puts EmailScraper.scrape("email_tests.txt")
end
My "email_tests.txt" file looks like so:
example@live.com
another_example90@hotmail.com
example3@diginet.ie
When I run this script, all I get is the "Empty array" printout. However, when I fire up irb and type in the regex above, strings of email addresses match it, and String#scan returns an array of all the email addresses in each string. Why does this work in irb and not in my script?
Several things (some already mentioned and expanded upon below):
- \z matches only at the very end of the string, which with IO#gets will typically include a trailing \n character. \Z (upper-case 'Z') matches at the end of the string unless the string ends with a \n, in which case it matches just before it.
- The typo emails_addresses (it should be email_addresses).
- Using \A and \Z is fine as long as each line is either entirely an email address or not one at all. You say you're seeking to extract addresses from documents, however, so I'd consider using \b at each end to extract emails delimited by word boundaries.
- You could use File.foreach(...) rather than the clumsy-looking File.open...while...gets construct.
- I'm not convinced by the regex; there's a substantial body of work already around:
  - There's a smarter one here: http://www.regular-expressions.info/email.html (clicking on that odd little in-line icon takes you to a piece-by-piece explanation). It's worth reading the discussion, which points out several potential pitfalls.
  - Even more mind-bogglingly complex ones may be found here.
class EmailScraper
  EmailRegex = /\A[\w+\-.]+@[a-z\d\-.]+\.[a-z]+\Z/i # changed \z to \Z
  def EmailScraper.scrape(doc)
    email_addresses = []
    File.foreach(doc) do |line| # less code, same effect
      temp = line.scan(EmailRegex)
      temp.each do |email_address|
        email_addresses << email_address
      end
    end
    email_addresses # "return" isn't needed
  end
end
result = EmailScraper.scrape("email_tests.txt") # store it so we don't print them twice if successful
if result.empty?
  puts "Empty array"
else
  puts result
end
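Putting this answer's suggestions together (File.foreach, and \b instead of \A/\Z so that addresses can be found anywhere in a line), a sketch might look like the following. The character classes are kept from the question, so it inherits that regex's limitations; EMAIL_REGEX is a renamed constant, and the temporary file only stands in for email_tests.txt.

```ruby
require "tempfile"

class EmailScraper
  # \b word boundaries instead of \A/\Z, so addresses are found anywhere in a
  # line. The character classes are unchanged from the question's regex.
  EMAIL_REGEX = /\b[\w+\-.]+@[a-z\d\-.]+\.[a-z]+\b/i

  # Returns every address found in the file at +path+, in order.
  def self.scrape(path)
    # Without a block, File.foreach returns an Enumerator over the lines.
    File.foreach(path).flat_map { |line| line.scan(EMAIL_REGEX) }
  end
end

# A temporary file stands in for email_tests.txt in this demo.
emails = Tempfile.create("email_tests") do |f|
  f.puts "example@live.com"
  f.puts "Write to another_example90@hotmail.com for details."
  f.puts "example3@diginet.ie"
  f.flush
  EmailScraper.scrape(f.path)
end

puts emails.empty? ? "Empty array" : emails
```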
Looks like you're putting the results into emails_addresses, but returning email_addresses. This means you always return the empty array you defined as email_addresses, which makes the "Empty array" response correct. (In fact, emails_addresses is never defined, so once the regex does match something, that line will raise a NameError.)
You have a typo, try with:
class EmailScraper
  EmailRegex = /\A[\w+\-.]+@[a-z\d\-.]+\.[a-z]+\z/i
  def EmailScraper.scrape(doc)
    email_addresses = []
    File.open(doc) do |file|
      while line = file.gets
        temp = line.scan(EmailRegex)
        temp.each do |email_address|
          puts email_address
          email_addresses << email_address
        end
      end
    end
    return email_addresses
  end
end
if EmailScraper.scrape("email_tests.txt").empty?
  puts "Empty array"
else
  puts EmailScraper.scrape("email_tests.txt")
end
You used \z at the end; try \Z instead. According to http://www.regular-expressions.info/ruby.html, it has to be an upper-case Z to match at the end of a string that ends with a newline.
Otherwise, try ^ and $ (which match the start and end of a line); this worked for me on Regexr.
When you read the file, the end-of-line character makes the regex fail. In irb, there is probably no end of line. If that is the case, chomp the lines first.
regex=/\A[\w+\-.]+@[a-z\d\-.]+\.[a-z]+\z/i
line_from_irb = "example@live.com"
line_from_file = line_from_irb + "\n"
p line_from_irb.scan(regex) # => ["example@live.com"]
p line_from_file.scan(regex) # => []
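A quick side-by-side check of the fixes discussed in these answers (switching to \Z, or chomping the line and keeping \z), assuming lines arrive with a trailing "\n" as they do from IO#gets and File.foreach. The variable names are illustrative:

```ruby
# Same regex as above, once with \z and once with \Z.
ends_with_lower_z = /\A[\w+\-.]+@[a-z\d\-.]+\.[a-z]+\z/i  # \z: only the very end
ends_with_upper_z = /\A[\w+\-.]+@[a-z\d\-.]+\.[a-z]+\Z/i  # \Z: end, or before a final \n

line_from_file = "example@live.com\n"  # what IO#gets / File.foreach yields

p line_from_file.scan(ends_with_lower_z)        # => []
p line_from_file.scan(ends_with_upper_z)        # => ["example@live.com"]
p line_from_file.chomp.scan(ends_with_lower_z)  # => ["example@live.com"]
```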

Resources