How to pull the email address out of this string? - ruby

Here are two possible email string scenarios:
email = "Joe Schmoe <joe#example.com>"
email = "joe#example.com"
I always only want joe#example.com.
So what would the regex or method be that would account for both scenarios?

This passes your examples:
def find_email(string)
  string[/<([^>]*)>$/, 1] || string
end
find_email "Joe Schmoe <joe#example.com>" # => "joe#example.com"
find_email "joe#example.com" # => "joe#example.com"

If you know your email is always going to be inside the < >, you can take a substring, using the positions of those two characters as the start and end indexes.

If those are the only two formats, don't use a regex. Just use simple string parsing. If you find a "<>" pair, pull the email address out from between them; if you don't find those characters, treat the whole string as the email address.
Regexes are great when you need them, but for very simple patterns the cost of compiling and running a regex can be higher than plain string manipulation. Sticking to what is core in the language, rather than loading extra libraries, will almost always be faster than going a different route.
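The string-parsing approach described above could be sketched roughly like this. The method name extract_email is made up for illustration, and the sketch assumes at most one < > pair per string:

```ruby
# Hypothetical helper (not from the original answer): pull the address
# out using only String#index, String#rindex and slicing.
def extract_email(string)
  open_at  = string.index("<")
  close_at = string.rindex(">")
  if open_at && close_at && close_at > open_at
    # Take everything strictly between the angle brackets.
    string[(open_at + 1)...close_at].strip
  else
    # No bracket pair found: treat the whole string as the address.
    string.strip
  end
end

extract_email("Joe Schmoe <joe#example.com>") # => "joe#example.com"
extract_email("joe#example.com")              # => "joe#example.com"
```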

If you are willing to load an extra library, this has already been solved in the TMail gem:
http://lindsaar.net/2008/4/13/tip-5-cleaning-up-an-verifying-an-email-address-with-ruby-on-rails
TMail::Address.parse('Mikel A. <spam#lindsaar.net>').spec
=> "spam#lindsaar.net"

Related

Clean string to get Email with Regex [closed]

I have Ruby code that extracts email addresses from a page. My code outputs the email address, but it also captures other text.
I would like to pull the actual email out of this string. Sometimes, the string will include a mailto, sometimes it will not. I was trying to get the single word that occurs before the #, and anything that comes after the # by using a split, but I'm having trouble. Any ideas? Thanks!
href="mailto:someonesname#domain.rr.com"> | Email</a></td>
Use something prebuilt:
require 'uri'
addresses = URI.extract(<<EOT, :mailto)
this is some text. mailto:foo#bar.com and more text
and some more http://foo#bar.com text
href="mailto:someonesname#domain.rr.com"> | Email</a></td>
EOT
addresses # => ["mailto:foo#bar.com", "mailto:someonesname#domain.rr.com"]
URI comes with Ruby, and the pattern used to parse out URIs is well tested. It's not bullet-proof, but it works pretty well. If you're getting false-positives, you can use a select, reject or grep block to filter out the unwanted entries returned.
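As a rough sketch of that filtering step, assume you already hold an array like the one shown above. The extra "mailto:" entry and the domain filter are invented here purely for illustration:

```ruby
# Sample input modelled on the output above; the bare "mailto:" entry is
# a made-up false positive added for this sketch.
addresses = ["mailto:foo#bar.com", "mailto:someonesname#domain.rr.com", "mailto:"]

# Strip the scheme, then drop anything that ends up empty.
emails = addresses
  .map { |uri| uri.sub(/\Amailto:/, "") }
  .reject(&:empty?)
# emails => ["foo#bar.com", "someonesname#domain.rr.com"]

# grep keeps only the entries matching a pattern, here one domain suffix.
rr_only = emails.grep(/\.rr\.com\z/)
# rr_only => ["someonesname#domain.rr.com"]
```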
If you can't count on having mailto:, the problem becomes harder, because email addresses aren't simple to parse; there's too much variation in them. The problem is akin to validating an email address using a pattern, because, again, the format for addresses varies too much. "Using a regular expression to validate an email address" and "JavaScript Email Validation when there are (soon to be) 1000's of TLD's?" are good reads for more information.
This should also work nicely though won't account for invalid email formats - it will simply extract the email address based on your two use cases.
string[/[^\"\:](\w+#.*)(?=\")/]
This should work
inputstring[/href="[^"]+"/][6 .. -2].gsub("mailto:", "")
Explanation:
Grab the href attribute and its contents
Remove the href= and quotes
Remove the mailto: if it's there
Example:
irb(main):021:0> test = "href=\"mailto:francesco#hawaii.rr.com\"> | Email DuVin</a></td>"
=> "href=\"mailto:francesco#hawaii.rr.com\"> | Email DuVin</a></td>"
irb(main):022:0> test[/href="[^"]+"/][6 .. -2].gsub("mailto:", "")
=> "francesco#hawaii.rr.com"
irb(main):023:0> test = "href=\"francesco#hawaii.rr.com\"> | Email DuVin</a></td>"
=> "href=\"francesco#hawaii.rr.com\"> | Email DuVin</a></td>"
irb(main):024:0> test[/href="[^"]+"/][6 .. -2].gsub("mailto:", "")
=> "francesco#hawaii.rr.com"
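One caveat with the one-liner above: if the input has no href attribute, the first [] returns nil and the chained [6 .. -2] raises NoMethodError. A possible variation, offered only as a sketch (email_from_href is a hypothetical name), uses a capture group so a missing attribute simply yields nil:

```ruby
# Hypothetical helper: returns the href value without an optional
# "mailto:" prefix, or nil when no href="..." attribute is present.
def email_from_href(html)
  html[/href="(?:mailto:)?([^"]+)"/, 1]
end

email_from_href('href="mailto:francesco#hawaii.rr.com"> | Email DuVin</a></td>')
# => "francesco#hawaii.rr.com"
email_from_href('href="francesco#hawaii.rr.com"> | Email DuVin</a></td>')
# => "francesco#hawaii.rr.com"
email_from_href('<td>no link here</td>')
# => nil
```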

String parse using regex

I have a string which is a function call. I want to parse it and obtain the parameters:
"add_location('http://abc.com/page/1/','This is the title, it is long',39.677765,-45.4343,34454,'http://abc.com/images/image_1.jpg')"
It has a total of 6 parameters and is a mixture of urls, integers and decimals. I can't figure out the regex for the split method which I will be using. Please help!
This is what I have come up with - which is wrong.
/('(.*\/[0-9]*)',)|([0-9]*,)/
Treating the string like a CSV might work:
require 'csv'
str = "add_location('http://abc.com/page/1/','This is the title, it is long',39.677765,-45.4343,34454,'http://abc.com/images/image_1.jpg')"
p CSV.parse(str[13..-2], :quote_char => "'").first
# => ["http://abc.com/page/1/", "This is the title, it is long", "39.677765", "-45.4343", "34454", "http://abc.com/images/image_1.jpg"]
Assuming all non-numeric parameters are enclosed in single quotes, as in your example
string.scan( /'.+?'|[-0-9.]+/ )
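Note that the scan above keeps the surrounding single quotes and returns every value as a string. A variation with capture groups, sketched here under the same assumption (every non-numeric parameter is single-quoted and contains no embedded single quotes), drops the quotes and converts the numbers:

```ruby
call = "add_location('http://abc.com/page/1/','This is the title, it is long',39.677765,-45.4343,34454,'http://abc.com/images/image_1.jpg')"

# With groups, scan yields [quoted, number] pairs; exactly one is non-nil.
params = call.scan(/'(.*?)'|(-?\d+(?:\.\d+)?)/).map do |quoted, number|
  if quoted
    quoted                 # quoted parameter, quotes already removed
  elsif number.include?(".")
    number.to_f            # decimal parameter
  else
    number.to_i            # integer parameter
  end
end

p params
# => ["http://abc.com/page/1/", "This is the title, it is long", 39.677765, -45.4343, 34454, "http://abc.com/images/image_1.jpg"]
```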
You really don't want to be parsing things this complex with a reg-ex; it just won't work in the long run. I'm not sure if you just want to parse this one string, or if there are lots of strings in this form which vary in exact contents. If you give a bit more info about your end goal, you might be able to get some more detailed help.
For parsing things this complex in the general case, you really want to perform proper tokenization (i.e. lexical analysis) of the string. In the past with Ruby, I've had good experiences doing this with Citrus. It's a nice gem for parsing complex tokens/languages like you're trying to do. You can find more about it here:
https://github.com/mjijackson/citrus

What does <<DESC mean in ruby?

I am learning Ruby, and in the book I use, there is an example code like this
#...
restaurant = Restaurant.new
restaurant.name = "Mediterrano"
restaurant.description = <<DESC
One of the best Italian restaurants in the Kings Cross area,
Mediterraneo will never leave you disappointed
DESC
#...
Can someone explain to me what <<DESC means in the above example? How does it differ from the common string double quote?
It is used to create multiline strings. Basically, <<DESC tells Ruby to treat everything that follows, up to the next line containing only DESC, as a single string. The name DESC is not special; any identifier can be used as the delimiter.
a = <<STRING
Here
is
a
multiline
string
STRING
The << operator is followed by an identifier that marks the end of the document. The end mark is called the terminator. The lines of text prior to the terminator are joined together, including the newlines and any other whitespace.
http://en.wikibooks.org/wiki/Ruby_Programming/Here_documents
It allows the creation of multi-line string constants in a readable way. See http://en.wikibooks.org/wiki/Ruby_Programming/Here_documents.
It is called a heredoc, or here document. It allows you to write multiline strings. You can test it in your terminal!
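To compare a heredoc with an ordinary double-quoted string, here is a small sketch. The variable names and sample text are made up; the <<~ form needs Ruby 2.3 or later:

```ruby
name = "Mediterraneo"

# A plain heredoc interpolates like a double-quoted string and keeps
# the newline at the end of every line.
plain = <<DESC
Welcome to #{name},
open every day
DESC

# Quoting the identifier with single quotes turns interpolation off.
raw = <<'DESC'
Welcome to #{name}
DESC

# The "squiggly" heredoc strips the common leading indentation.
tidy = <<~DESC
  Welcome to #{name},
  open every day
DESC

plain == "Welcome to Mediterraneo,\nopen every day\n" # => true
tidy == plain                                          # => true
```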

Convert Ruby string to *nix filename-compatible string

In Ruby I have an arbitrary string, and I'd like to convert it to something that is a valid Unix/Linux filename. It doesn't matter what it looks like in its final form, as long as it is visually recognizable as the string it started as. Some possible examples:
"Here's my string!" => "Heres_my_string"
"* is an asterisk, you see" => "is_an_asterisk_you_see"
Is there anything built-in (maybe in the file libraries) that will accomplish this (or close to this)?
By your specifications, you could accomplish this with a regex replacement. This regex will match all characters other than letters, digits, whitespace, underscores and hyphens:
s/[^\w\s_-]+//g
This will remove any extra whitespace in between words, as shown in your examples:
s/(^|\b\s)\s+($|\s?\b)/\\1\\2/g
And lastly, replace the remaining spaces with underscores:
s/\s+/_/g
Here it is in Ruby:
def friendly_filename(filename)
  filename.gsub(/[^\w\s_-]+/, '')
          .gsub(/(^|\b\s)\s+($|\s?\b)/, '\\1\\2')
          .gsub(/\s+/, '_')
end
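If collapsing whitespace inside the string is not a concern, a shorter variation might be enough. This is only a sketch (simple_filename is a hypothetical name), and it relies on the fact that the only bytes a *nix filename cannot contain are "/" and NUL, both of which the first substitution removes:

```ruby
# Hypothetical, simplified variant: drop everything except word
# characters, whitespace and hyphens, trim the ends, then join the
# remaining words with underscores.
def simple_filename(name)
  name.gsub(/[^\w\s-]+/, "").strip.gsub(/\s+/, "_")
end

simple_filename("Here's my string!")         # => "Heres_my_string"
simple_filename("* is an asterisk, you see") # => "is_an_asterisk_you_see"
```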
I realize the question was asked about plain Ruby, and that this serves a different purpose than producing a *nix-compatible filename, but if you are using Rails, there is a method called parameterize that should help.
In rails console:
"Here's my string!".parameterize # => "here-s-my-string"
"* is an asterisk, you see".parameterize # => "is-an-asterisk-you-see"
Since parameterize produces URL-safe output, I think it should work for filenames as well :)
You can read more about it here:
http://api.rubyonrails.org/classes/ActiveSupport/Inflector.html#method-i-parameterize
There are also a whole lot of other helpful methods.

how to convert strings like "this is an example" to "this-is-an-example" under ruby

How do I convert strings like "this is an example" to "this-is-an-example" under ruby?
The simplest version:
"this is an example".tr(" ", "-")
#=> "this-is-an-example"
You could also do something like this, which is slightly more robust and easier to extend by updating the regular expression:
"this is an example".gsub(/\s+/, "-")
#=> "this-is-an-example"
The above will replace each chunk of white space (any combination of multiple spaces, tabs, newlines) with a single dash.
See the String class reference for more details about the methods that can be used to manipulate strings in Ruby.
If you are trying to generate a string that can be used in a URL, you should also consider stripping other non-alphanumeric characters (especially the ones that have special meaning in URLs), or replacing them with an alphanumeric equivalent (example, as suggested by Rob Cameron in his answer).
If you are trying to make something that is a good URL slug, there are lots of ways to do it.
Generally, you want to remove everything that is not a letter or number, and then replace all whitespace characters with dashes.
So:
s = "this is an 'example'"
s = s.gsub(/\W+/, ' ').strip
s = s.gsub(/\s+/,'-')
At the end s will equal "this-is-an-example"
I used the source code from a ruby testing library called contest to get this particular way to do it.
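Outside of Rails, the same idea can be wrapped in a small helper. This is a sketch only: slugify is a hypothetical name, and it assumes ASCII input, so accented letters would be dropped rather than transliterated:

```ruby
# Hypothetical helper: lowercase the string, turn every run of
# non-alphanumeric characters into one dash, then trim stray dashes.
def slugify(text)
  text.downcase
      .gsub(/[^a-z0-9]+/, "-")
      .gsub(/\A-+|-+\z/, "")
end

slugify("this is an 'example'") # => "this-is-an-example"
slugify("Hello, world!")        # => "hello-world"
```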
If you're using Rails take a look at parameterize(), it does exactly what you're looking for:
http://api.rubyonrails.org/classes/ActiveSupport/CoreExtensions/String/Inflections.html#M001367
foo = "Hello, world!"
foo.parameterize # => "hello-world"
