Extract email addresses from a block of text - ruby

How can I create an array of email addresses contained within a block of text?
I've tried
addrs = text.scan(/ .+?@.+? /).map{|e| e[1...-1]}
but (not surprisingly) it doesn't work reliably.

How about this for a (slightly) better regular expression? Use it case-insensitively, since it only lists A-Z:
\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b
You can find it on the "Email Regex" page.
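Just an FYI, the problem with your regex is that it allows only one kind of separator (a space) before and after an email address, and because .+? can also run across spaces, a match can pull in neighbouring words. It would also match an "@" standing alone between spaces.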
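A minimal Ruby sketch of using that pattern, assuming `@` as the separator and the `/i` flag (needed because the pattern only lists uppercase letters). The `extract_emails` helper and the sample text are hypothetical:

```ruby
# Pattern from the answer, with "@" as the separator. The /i flag is
# needed because the character classes only list uppercase A-Z.
EMAIL_RE = /\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b/i

# Hypothetical helper: returns every address-like substring, in order.
# With no capture groups in EMAIL_RE, String#scan returns plain strings.
def extract_emails(text)
  text.scan(EMAIL_RE)
end

text = "Contact alice@example.com or Bob.Smith+news@mail.example.org, not foo @ bar."
p extract_emails(text)
# => ["alice@example.com", "Bob.Smith+news@mail.example.org"]
```

Duplicates are kept (add `.uniq` if needed). Because of the `{2,4}` limit, addresses whose top-level domain is longer than four letters, such as `.museum`, are not found.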

Related

Can a valid email address contain a newline character?

I'm dealing with some email parsing and validation and I'm wondering if a valid email address can contain a newline character.
There's no way for an e-mail address to contain a newline.
This should help: How to validate an email address using a regular expression?
A syntax diagram of e-mail addresses contains no 0x0a (line feed) character anywhere.
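A Ruby validator for that kind of parsing should make sure a newline can't slip through. In Ruby, `^` and `$` match at line boundaries, so only `\A` and `\z` anchor to the whole string. This is a minimal sketch. The loose pattern `[^@\s]+@[^@\s]+` is only an illustrative stand-in, not a real address validator:

```ruby
# In Ruby, ^ and $ match at the start/end of any *line*, so a pattern
# anchored with them can accept a string that contains a newline.
LINE_ANCHORED = /^[^@\s]+@[^@\s]+$/

# \A and \z anchor to the start/end of the whole string. (\Z would still
# allow a single trailing newline, so \z is the safer choice.)
STRING_ANCHORED = /\A[^@\s]+@[^@\s]+\z/

input = "alice@example.com\nsomething else"
p input.match?(LINE_ANCHORED)    # => true  (the first line matches)
p input.match?(STRING_ANCHORED)  # => false (the newline is rejected)
```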

Is there a printable character which is not available for use in SMS messages?

I need a printable character which is not available in mobile SMS messages. The reason is that I have a file with a bunch of data, and one of the data fields is SMS text. It is dummy data, of course.
I need to extract this field. The tool I am using for this asks for a field separator, which it uses to split the fields into a CSV file, and its default field separator is the comma.
Now the problem is that whenever a comma occurs in the SMS text, the tool splits the rest of the SMS text off into a separate field.
So my question is: how do I find a single character that I can use as a field separator in this case?
I think you can encode the text using Base64 before sending SMS, and then decode after receiving. Please see: https://en.wikipedia.org/wiki/Base64.
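This idea also works for the separator problem. You would Base64-encode the SMS field when the file is written, and decode it after extraction. The encoded text can never contain a comma. This is a minimal sketch. The helper names are hypothetical, and it uses core `Array#pack`/`String#unpack1` rather than the `base64` library:

```ruby
# Base64 output uses only A-Z, a-z, 0-9, "+", "/" and "=", so a comma
# can never appear inside an encoded field.
def encode_field(text)
  # 'm0' is strict Base64 (no line breaks), same as Base64.strict_encode64.
  [text].pack('m0')
end

def decode_field(encoded)
  # unpack1 returns raw bytes (ASCII-8BIT); re-tag them as UTF-8 text.
  encoded.unpack1('m0').force_encoding(Encoding::UTF_8)
end

sms = "Hi, see you at 5, ok?"
encoded = encode_field(sms)
p encoded.include?(",")          # => false
p decode_field(encoded) == sms   # => true
```

The trade-off is size: Base64 emits 4 characters for every 3 input bytes. The field is also unreadable until it is decoded.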
You may want to have a look at the GSM charset spec. Be aware of the 7-bit / 8-bit encodings and of how the different (human) languages are encoded.
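A pragmatic way to choose the separator is to scan the data you actually have for printable characters that never occur in it. This is a minimal sketch with a hypothetical helper and sample string. It only shows that a character is absent from the data you scanned, not that it can never appear in future messages:

```ruby
# Hypothetical helper: printable ASCII characters (space excluded, codes
# 33-126) that do not occur anywhere in the given data.
def unused_printable_chars(data)
  (33..126).map(&:chr).reject { |c| data.include?(c) }
end

sample = "Hi, how are you? Call me at 5."
free = unused_printable_chars(sample)
p free.include?(",")  # => false
p free.include?("|")  # => true
```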

Ruby Regex to extract domain from email address

I have no real previous experience using regex, just saying.
I want to extract domain names from email addresses with the below format.
richardc@mydomain.com
so that the regex returns just: mydomain
With an explanation of how/why it works if possible!
Cheers
Here, (...) captures the domain name in group \1, and the whole string is replaced with that capture, so only the domain name is left at the end.
email = 'richardc@mydomain.com'
domain = email.gsub(/.+@([^.]+).+/, '\1')
# => "mydomain"
.+ means one or more of any character (except \n), so it basically matches the whole email string, while ([^.]+) (one or more characters that are not a dot) captures the domain name.
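A variant of the same capture uses `String#[]` with a group index, which returns only the captured part. This is a minimal sketch, and the `domain_label` name is hypothetical:

```ruby
# Hypothetical helper: the label right after "@", up to the first dot.
# String#[] with a capture index returns that group, or nil if no match.
def domain_label(email)
  email[/@([^.]+)/, 1]
end

p domain_label("richardc@mydomain.com")  # => "mydomain"
p domain_label("not an address")         # => nil
```

One difference is that `gsub` returns the string unchanged when nothing matches, while this returns `nil`, which is easier to detect. Like the original pattern, it stops at the first dot, so `user@mail.mydomain.com` yields `"mail"`.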
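If you want to take the parsing route instead, the mail gem will do the job (note that it returns the full domain, including the top-level domain):
require 'mail'
Mail::Address.new("richardc@mydomain.com").domain # => "mydomain.com"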

How can I sort an array of emails by the email provider?

So I dumped all the emails from a DB into a txt file, and I'm looking to sort them by email provider, basically anything that comes after the @ sign.
I know I can use regex to validate each email.
However, how do I indicate that I want to sort them by anything that comes after the @ sign?
"I know I can use regex to validate each email."
Careful! The range of valid e-mail addresses is much wider than most people think. The only correct regexes for e-mail validation are on the order of a page in length. If you must use a regex, just check for the @ and one dot.
"However, how do I indicate that I want to sort them by anything that comes after the @ sign?"
email_addresses.sort_by { |addr| addr.split('@').last }
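The same `sort_by` idea can be extended into a slightly more defensive version. This minimal sketch uses a hypothetical `sort_by_provider` helper. It ignores case in the provider and breaks ties by the full address, and it uses `rpartition` so that a stray line without an `@` doesn't raise:

```ruby
# Sort addresses by provider (the text after the last "@"), ignoring case,
# then by the whole address so ties come out in a predictable order.
def sort_by_provider(addresses)
  addresses.sort_by do |addr|
    # rpartition never returns nil, so lines without "@" don't raise.
    provider = addr.rpartition('@').last
    [provider.downcase, addr.downcase]
  end
end

emails = ["zed@gmail.com", "amy@yahoo.com", "bob@Gmail.com"]
p sort_by_provider(emails)
# => ["bob@Gmail.com", "zed@gmail.com", "amy@yahoo.com"]
```

For the dumped file, `sort_by_provider(File.readlines("emails.txt", chomp: true))` would read one address per line, where `emails.txt` stands in for your actual file name.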

VBScript Regular Expressions to check IP address validity with some additional characters

How do I write VB script Irregular expression syntax to check VBparam (IP address validity)? The last octet of each IP address can be a range between IPs (x-y), and another IP can be added after a "," separator.
Examples of VBparam:
VBparam=172.17.202.1-20
VBparam=172.17.202.1-10,192.9.200.1-100
VBparam=172.17.202.1-10,192.9.200.1-100,180.1.1.1-20
THX yael
I believe the term you're looking for is "regular expression", not "irregular" - might help when google searching. I don't know enough VB to provide a complete script, but the pattern you're looking for is:
(\d{1,3}\.){3}\d{1,3}(\-\d{1,3})?(,(\d{1,3}\.){3}\d{1,3}(\-\d{1,3})?)*
This will not validate that X < Y, or that each octet is in the proper range, e.g. 999.999.999.999 would be accepted. You can't practically check X < Y with a regular expression alone, so you'll need to use the captured groups to validate that yourself in the script. If you wish to validate that octets are in the proper range (0-255), replace \d{1,3} with ((1\d{2})|(2[0-4]\d)|(25[0-5])|\d{1,2}) each time it appears in the above pattern. Also note that the pattern is not anchored, so wrap it in ^ and $ if the whole string must match rather than just part of it.
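The answer leaves the X < Y check to the script. This is a minimal sketch of that approach in Ruby, not VBScript. It uses the answer's 0-255 octet alternation, `\A`/`\z` anchors, and a hypothetical `valid_vbparam?` helper that compares the last octet with the range end:

```ruby
# Octet 0-255, using the alternation from the answer (leading zeros such as
# "01" are still accepted).
OCTET = /(?:1\d{2}|2[0-4]\d|25[0-5]|\d{1,2})/
# One entry: a.b.c.d with an optional "-y" range on the last octet.
ENTRY = /(?:#{OCTET}\.){3}#{OCTET}(?:-#{OCTET})?/
# Whole string: entries separated by commas, anchored with \A and \z.
VBPARAM = /\A#{ENTRY}(?:,#{ENTRY})*\z/

# Hypothetical helper: syntax check with the regex, then the x < y check
# that a regex can't easily express.
def valid_vbparam?(str)
  return false unless VBPARAM.match?(str)

  str.split(',').all? do |entry|
    ip, range_end = entry.split('-')
    next true if range_end.nil?   # plain IP, no range
    ip.split('.').last.to_i < range_end.to_i
  end
end

p valid_vbparam?("172.17.202.1-10,192.9.200.1-100,180.1.1.1-20")  # => true
p valid_vbparam?("172.17.202.20-10")                               # => false
p valid_vbparam?("999.999.999.999")                                # => false
```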
