Ruby regular expression to detect an invalid email address such as "hi@myio..io" - ruby

I want to detect an invalid email address such as "hi@myio..io".
VALID_EMAIL_REGEX = /\A[\w+\-.]+@[a-z\d\-.]+\.[a-z]+\z/i
VALID_EMAIL_REGEX_FULL = /\A[\w+\-.]+@[a-z\d\-]+(\.[a-z]+)*\.[a-z]+\z/i
The first one fails to reject it. The second succeeds.
I don't understand how this part makes the difference: (\.[a-z]+)*\.[a-z]+
Thank you!

The better answer is that using a regular expression to match email addresses is a bad idea. For one, not all valid addresses are active: hd1@jsc.d8u.us is me, while hd2@jsc.d8u.us is a valid email address by every RFC in existence, but it's not an active email account.
If you want to do email address validation, you could do worse than to set up a web service that does nothing more than take a string and pass it to JavaMail's address parsing (InternetAddress.parse()), which throws an exception if the parse fails and returns the address if it succeeds. Sample code below:
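import java.io.IOException;
import javax.mail.internet.AddressException;
import javax.mail.internet.InternetAddress;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class ValidationServlet extends HttpServlet {
    protected void doHead(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
        String candid8Address = request.getParameter("email");
        try {
            InternetAddress.parse(candid8Address);
            response.setStatus(HttpServletResponse.SC_OK);
        } catch (AddressException e) {
            // Report an unparseable address as 400 Bad Request
            response.setStatus(HttpServletResponse.SC_BAD_REQUEST);
        }
    }
}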
Let me know if you need further assistance...

In the first one, the character class after the @, [a-z\d\-.], includes a literal ., so the domain part can match any run of dots, including "..". Remove the . from that class so it only matches letters, digits and hyphens. It should be:
/\A[\w+\-.]+@[a-z\d\-]+\.[a-z]+\z/i
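To see what the extra group changes, the patterns from this thread can be run against a few addresses. This is only a sketch: the patterns are written with @ as the separator, and the constant names FIRST, FULL and NO_DOT, the helper results_for, and the sample addresses other than "hi@myio..io" are made up for illustration.

```ruby
# Hypothetical constant names; the patterns are the ones discussed in this thread.
FIRST  = /\A[\w+\-.]+@[a-z\d\-.]+\.[a-z]+\z/i            # domain class contains "."
FULL   = /\A[\w+\-.]+@[a-z\d\-]+(\.[a-z]+)*\.[a-z]+\z/i  # dots only between labels
NO_DOT = /\A[\w+\-.]+@[a-z\d\-]+\.[a-z]+\z/i             # exactly one dot in the domain

# Illustrative inputs: a double dot, a plain domain, and a subdomain.
SAMPLES = ['hi@myio..io', 'hi@myio.io', 'hi@mail.myio.io'].freeze

# Returns one boolean per sample, in order.
def results_for(pattern)
  SAMPLES.map { |address| pattern.match?(address) }
end

p results_for(FIRST)   # => [true, true, true]
p results_for(FULL)    # => [false, true, true]
p results_for(NO_DOT)  # => [false, true, false]
```

With the dot taken out of the character class, the only way to get a dot into the domain is through an explicit \. followed by at least one letter, so ".." can no longer match. The (\.[a-z]+)* group is what still lets multi-label domains such as mail.myio.io through; without it, only a single dot is accepted.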

Try this to validate the email address (note that the dot in the domain part has to be escaped, otherwise it matches any character):
/^([\w.%+-]+)@([\w-]+\.)+([\w]{2,})$/i

Try the following:
/\A([^@\s]+)@((?:[-a-z0-9]+\.)+[a-z]{2,})\Z/i

I once found this one:
# RFC822 Email Address Regexp
# ---------------------------
#
# Originally written by Cal Henderson
# c.f. http://iamcal.com/publish/articles/php/parsing_email/
#
# Translated to Ruby by Tim Fletcher, with changes suggested by Dan Kubb.
#
# Licensed under a Creative Commons Attribution-ShareAlike 2.5 License
# http://creativecommons.org/licenses/by-sa/2.5/
#
# (see: http://tfletcher.com/lib/rfc822.rb)
RFC822 = begin
qtext = '[^\\x0d\\x22\\x5c\\x80-\\xff]'
dtext = '[^\\x0d\\x5b-\\x5d\\x80-\\xff]'
atom = '[^\\x00-\\x20\\x22\\x28\\x29\\x2c\\x2e\\x3a-' +
'\\x3c\\x3e\\x40\\x5b-\\x5d\\x7f-\\xff]+'
quoted_pair = '\\x5c[\\x00-\\x7f]'
domain_literal = "\\x5b(?:#{dtext}|#{quoted_pair})*\\x5d"
quoted_string = "\\x22(?:#{qtext}|#{quoted_pair})*\\x22"
domain_ref = atom
sub_domain = "(?:#{domain_ref}|#{domain_literal})"
word = "(?:#{atom}|#{quoted_string})"
domain = "#{sub_domain}(?:\\x2e#{sub_domain})*"
local_part = "#{word}(?:\\x2e#{word})*"
# The following line was needed to change for ruby 1.9
# was: addr_spec = "#{local_part}\\x40#{domain}"
addr_spec = Regexp.new("#{local_part}\\x40#{domain}", nil, 'n')
pattern = /\A#{addr_spec}\z/
end.freeze

Related

Get IPv4 out of a Ruby string that contains both IPv4 and IPv6?

I have a string from the X-Forwarded-For header that contains both IPv4 and IPv6 addresses.
I need to pull just the IPv4 address from the string.
It's comma-separated, but the order of them changes so I can't just split and pull the second item.
Example: header = 2600:1740:8540:cff9:1c50:617:c9c5:63f7, 165.154.107.112
I ultimately just want 165.154.107.112.
I'm using Ruby 2.5.1 (and this happens to be inside a Rails 5.2.0 app, for what it's worth).
Assuming your header is always as you've posted:
require 'ipaddr'
header = "165.154.107.112, 2600:1740:8540:cff9:1c50:617:c9c5:63f7"
ip = header.split(', ').select { |addr| IPAddr.new(addr).ipv4? }.pop
# => "165.154.107.112"
header = "2600:1740:8540:cff9:1c50:617:c9c5:63f7, 165.154.107.112, 166.155.108.113"
header.split(/\s?,\s?/).find { |s| IPAddr.new(s).ipv4? }
#=> "165.154.107.112"
or
header.split(/,\s+/).select { |s| IPAddr.new(s).ipv4? }
#=> ["165.154.107.112", "166.155.108.113"]
See IPAddr::new and IPAddr#ipv4?.
If "header = " is part of the string str, replace header.split with str[/\d.+/].split.
If the string may contain text that is not a valid IP address, you could write the following.
header.split(/\s?,\s?/).find { |s| (IPAddr.new(s) rescue nil)&.ipv4? }
IPAddr.new('cat'), for example, raises the exception IPAddr::InvalidAddressError (invalid address). &. is Ruby's safe navigation operator, which made its debut in v2.3.
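The same idea can be wrapped in a small method with an explicit rescue instead of the inline rescue modifier. This is a sketch only: the method name first_ipv4 is made up, and it assumes the header is a comma-separated list in which some entries may not be IP addresses at all.

```ruby
require 'ipaddr'

# Hypothetical helper: returns the first IPv4 entry of a comma-separated
# header value, or nil when no entry is an IPv4 address.
def first_ipv4(header)
  candidates = header.split(',').map(&:strip).reject(&:empty?)
  candidates.find do |candidate|
    begin
      IPAddr.new(candidate).ipv4?
    rescue IPAddr::Error
      # IPAddr::InvalidAddressError is a subclass of IPAddr::Error,
      # so entries that are not addresses are simply skipped.
      false
    end
  end
end

header = '2600:1740:8540:cff9:1c50:617:c9c5:63f7, cat, 165.154.107.112'
puts first_ipv4(header)  # prints 165.154.107.112
```

Rescuing only IPAddr::Error keeps unrelated exceptions visible, which the bare rescue modifier would swallow.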

How to pass symbol as parameter of post request in ruby?

This code works well
Geokit::default_units = :miles #:kms, :nms, :meters
But this code raises errors
puts params[:unit] # miles
Geokit::default_units = params[:unit] #:miles, :kms, :nms, :meters
What is wrong with this?
That's because everything that comes through params is a string. If you want a symbol, consider using .to_sym:
params = { unit: 'miles' }
p params[:unit].class # String
p params[:unit].to_sym.class # Symbol
Have you confirmed that params[:unit] is actually a symbol, and not a string?
Geokit::default_units = params[:unit].to_sym
If the above solves your problem, then you didn't have a symbol in there to start with (likely, if params has been read from an HTTP request)
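Since the value comes from an HTTP request, it may be worth checking it against the known units before converting it. A minimal sketch, assuming the four units listed in the question's comment are the only acceptable ones; the helper name unit_from_param and the :miles fallback are made up for illustration.

```ruby
# Units taken from the comment in the question; the helper name and the
# :miles fallback are assumptions for illustration.
ALLOWED_UNITS = %w[miles kms nms meters].freeze

# Converts a request parameter to one of the allowed unit symbols,
# falling back to the default for anything else (including nil).
def unit_from_param(value, default: :miles)
  unit = value.to_s
  ALLOWED_UNITS.include?(unit) ? unit.to_sym : default
end

params = { unit: 'kms' }                      # stand-in for the request params
puts unit_from_param(params[:unit]).inspect   # prints :kms
```

The result could then be assigned with Geokit::default_units = unit_from_param(params[:unit]), so an unexpected string never reaches the setter.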

What are the AppName, AppPublisher and AppVersion header values for a WSE 2012 R2 WebApi call?

I'm trying to query my Server 2012 Essentials R2 server to determine the most recent Client Backup time for a given Device, so I can display nag screens at signon for forgetful users. (They're on laptops, so I can't depend on the machine being available during the automatic window.)
The closest thing in the way of documentation I've been able to find is this: (https://msdn.microsoft.com/en-us/library/jj713757.aspx)
GET services/builtin/DeviceManagement.svc/devices/index/{index}/count/{count}
But it requires a preceding call to get the token: (https://msdn.microsoft.com/en-us/library/jj713753.aspx)
GET https://www.contoso.com/services/builtin/session.svc/login HTTP/1.1
Accept: application/xml
Host: servername
Authorization: Basic VXNlcjpQYXNzd29yZCE=
AppName: Sample App Name
AppPublisher: publisher
AppVersion: 1.0
Does anyone know what the values for those last three headers should be—or how to discover them—for a standard WSE 2012 R2 installation? The documentation provides no assistance here.
Or if someone knows a better way to accomplish this, please let me know.
OK, I got it working. The code is below.
As it turns out, the value of the AppName header is irrelevant—it can be any string, but it can't be empty.
I already knew it couldn't be empty from a look at the WSE source in Wssg.WebApi.Framework in the GAC, but the code is decoupled to the point that it's next to impossible to find out what process picks up the RemoteConnectionClientInfo object once it gets dropped into the HTTP session.
The part that was misleading me was—go figure—the documentation itself.
There's a bang (!) after the password on the Authentication page, suggesting that it should trail the actual password prior to encoding. This was why I was getting an authentication error, which in turn I was (mistakenly) attributing to the statement in the documentation: "Add Appname, Apppublisher, and Appversion values in HTTP header fields. These values are also required to log on."
So once I cleared all that up, I sailed right in.
And there are other errors in the documentation. On the Devices page we are told that the Host header should be set to the domain name, and that a Content-Length header should be added.
These are both incorrect. The Host header should be the server's hostname, and there should be no Content-Length header (a GET request carries no body, so there is no length to declare).
AND...! After all this, I find that the Device info returned doesn't contain the most recent backup time. I'll have to dig further for that. But at least now I can connect.
So Microsoft's incomplete, inaccurate and sloppy documentation has cost me a day's work. Hopefully somebody else can use this and avoid the pain I went through.
Imports System.Linq
Imports System.Net.Http
Imports System.Net.Http.Headers
Imports System.Text

Module Main
  Public Sub Main()
    Dim aCredentials() As Byte
    Dim _
      oAuthenticateUri,
      oDeviceListUri As Uri
    Dim _
      sCanary,
      sCookie,
      sDevices As String

    aCredentials = Encoding.ASCII.GetBytes($"{USERNAME}:{PASSWORD}")

    Using oClient As New HttpClient
      oAuthenticateUri = New Uri($"https://{HOST}/services/builtin/session.svc/login")
      oDeviceListUri = New Uri($"https://{HOST}/services/builtin/devicemanagement.svc/devices/index/0/count/99")

      oClient.DefaultRequestHeaders.Accept.Add(New MediaTypeWithQualityHeaderValue("application/xml"))
      oClient.DefaultRequestHeaders.Authorization = New AuthenticationHeaderValue("Basic", Convert.ToBase64String(aCredentials))
      oClient.DefaultRequestHeaders.Host = HOST
      oClient.DefaultRequestHeaders.Add("AppPublisher", String.Empty)
      oClient.DefaultRequestHeaders.Add("AppVersion", String.Empty)
      ' AppName may be any string, but it must not be empty
      oClient.DefaultRequestHeaders.Add("AppName", "None")

      Using oAuthenticateResponse As HttpResponseMessage = oClient.GetAsync(oAuthenticateUri).Result
        If oAuthenticateResponse.IsSuccessStatusCode Then
          sCanary = oAuthenticateResponse.Headers.Single(Function(Pair) Pair.Key = CANARY_HEADER).Value(0)
          sCookie = Split(oAuthenticateResponse.Headers.Single(Function(Pair) Pair.Key = COOKIE_HEADER).Value(0), ";")(0)

          oClient.DefaultRequestHeaders.Clear()
          oClient.DefaultRequestHeaders.Host = HOST
          oClient.DefaultRequestHeaders.Add(CANARY_HEADER, sCanary)
          oClient.DefaultRequestHeaders.Add(COOKIE_HEADER, sCookie)

          Using oDeviceListResponse As HttpResponseMessage = oClient.GetAsync(oDeviceListUri).Result
            If oDeviceListResponse.IsSuccessStatusCode Then
              sDevices = oDeviceListResponse.Content.ReadAsStringAsync.Result
            Else
              Console.WriteLine("{0} ({1})", oDeviceListResponse.StatusCode, oDeviceListResponse.ReasonPhrase)
            End If
          End Using
        Else
          Console.WriteLine("{0} ({1})", oAuthenticateResponse.StatusCode, oAuthenticateResponse.ReasonPhrase)
        End If
      End Using
    End Using
  End Sub

  Private Const CANARY_HEADER As String = "Canary"
  Private Const COOKIE_HEADER As String = "Set-Cookie"
  Private Const USERNAME As String = "domain.admin"
  Private Const PASSWORD As String = "admin.password"
  Private Const HOST As String = "server"
End Module
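The point about the bang can be checked by decoding the sample Authorization value from the documentation. The sketch below is in Ruby rather than VB.NET, purely as a quick check; the helper names decode_basic and encode_basic are made up for illustration.

```ruby
# The sample Authorization value from the login documentation quoted above.
DOCUMENTED_HEADER = 'Basic VXNlcjpQYXNzd29yZCE='

# Splits "Basic <base64>" into its user and password parts.
def decode_basic(header_value)
  encoded = header_value.split(' ', 2).last
  user, password = encoded.unpack1('m').split(':', 2)
  { user: user, password: password }
end

# Builds a Basic Authorization value from "user:password" (no line breaks).
def encode_basic(user, password)
  'Basic ' + ["#{user}:#{password}"].pack('m0')
end

credentials = decode_basic(DOCUMENTED_HEADER)
puts credentials[:user]      # prints User
puts credentials[:password]  # prints Password!
```

The documented header decodes to the user "User" and the password "Password!", so the exclamation mark is part of the sample password itself. Only the real user:password pair should be encoded, with nothing appended.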

How to retain the untokenizable character in MaxEntTagger?

I'm using MaxentTagger for POS tagging and sentence splitting with the following code:
MaxentTagger tagger = new MaxentTagger("models/left3words-wsj-0-18.tagger");
@SuppressWarnings("unchecked")
List<Sentence<? extends HasWord>> sentences = MaxentTagger.tokenizeText(new BufferedReader(new StringReader(out2)));
for (Sentence<? extends HasWord> sentence : sentences) {
content.append(sentence + "\n");
Sentence<TaggedWord> tSentence = MaxentTagger.tagSentence(sentence);
out.append(tSentence.toString(false) + "\n");
}
The problem is it will complain there are untokenizable characters in the text. And the tagged output will omit those untokenizable characters. So for example, the original text is:
Let Σ be a finite set of function symbols, the signature.
where Σ is in big5 code. But the program will show the following warning message:
Untokenizable: Σ (first char in decimal: 931)
and the tagged output is:
Let/VB be/VB a/DT finite/JJ set/NN of/IN function/NN symbols/NNS ,/, the/DT signature/NN ./.
The split sentence I got is:
Let be a finite set of function symbols , the signature .
My question is how to retain these untokenizable characters?
I've tried modifying the model's props file, but with no luck:
tagger training invoked at Sun Sep 21 23:03:26 PDT 2008 with arguments:
model = left3words-wsj-0-18.tagger
arch = left3words,naacl2003unknowns,wordshapes(3)
trainFile = /u/nlp/data/pos-tagger/train-wsj-0-18 ...
encoding = Big5
initFromTrees = false
Any suggestion?
Thanks for Prof. Manning's help. But I ran into the same issue when using the parser tree.
The sequel
I need to get the parse tree of a sentence, so I used the following code:
PTBTokenizer<Word> ptb = PTBTokenizer.newPTBTokenizer(new StringReader(sentences));
List<Word> words = ptb.tokenize();
Tree parseTree2 = lp.apply(words);
TreebankLanguagePack tlp = new PennTreebankLanguagePack();
GrammaticalStructureFactory gsf = tlp.grammaticalStructureFactory();
GrammaticalStructure gs = gsf.newGrammaticalStructure(parseTree2);
But I don't know how to set PTBTokenizer for resolving the issue of untokenizable characters this time.
If I use the factory method to create a PTBTokenizer object, I don't know how to connect it to the StringReader.
List<Word> words = ptb.getTokenizer(new StringReader(sentences));
doesn't work.
The Stanford tokenizer accepts a variety of options to control tokenization, including how characters it doesn't know about are handled. However, to set them, you currently have to instantiate your own tokenizer. But that's not much more difficult than what you have above. The following complete program makes a tokenizer with options and then tags using it.
The "noneKeep" option means that it logs no messages about unknown characters but keeps them and turns each into a single character token. You can learn about the other options in the PTBTokenizer class javadoc.
NOTE: you seem to be using a rather old version of the tagger. (We got rid of the Sentence class and started just using Lists of tokens about 2 years ago, probably around the same time these options were added to the tokenizer.) So you may well have to upgrade to the latest version. At any rate, the code below will only compile correctly against a more recent version of the tagger.
import java.io.*;
import java.util.*;
import edu.stanford.nlp.ling.*;
import edu.stanford.nlp.process.*;
import edu.stanford.nlp.objectbank.TokenizerFactory;
import edu.stanford.nlp.tagger.maxent.MaxentTagger;

/** This demo shows user-provided sentences (i.e., {@code List<HasWord>})
 *  being tagged by the tagger. The sentences are generated by direct use
 *  of the DocumentPreprocessor class.
 */
class TaggerDemo2 {
  public static void main(String[] args) throws Exception {
    if (args.length != 2) {
      System.err.println("usage: java TaggerDemo modelFile fileToTag");
      return;
    }
    MaxentTagger tagger = new MaxentTagger(args[0]);
    // "untokenizable=noneKeep": log nothing, keep each unknown character as its own token
    TokenizerFactory<CoreLabel> ptbTokenizerFactory =
        PTBTokenizer.factory(new CoreLabelTokenFactory(), "untokenizable=noneKeep");
    BufferedReader r =
        new BufferedReader(new InputStreamReader(new FileInputStream(args[1]), "utf-8"));
    PrintWriter pw = new PrintWriter(new OutputStreamWriter(System.out, "utf-8"));
    DocumentPreprocessor documentPreprocessor = new DocumentPreprocessor(r);
    documentPreprocessor.setTokenizerFactory(ptbTokenizerFactory);
    for (List<HasWord> sentence : documentPreprocessor) {
      List<TaggedWord> tSentence = tagger.tagSentence(sentence);
      pw.println(Sentence.listToString(tSentence, false));
    }
    // Flush the buffered output before the program exits
    pw.close();
  }
}

Directly Downloading a File From an RSS feed Using Ruby - Handling Redirects

I'm writing a program in Ruby that downloads a file from an RSS feed to my local hard drive. Previously, I'd written this application in Perl and figured a great way to learn Ruby would be to recreate this program using Ruby code.
In the Perl program (which works), I was able to download the original file directly from the server it was hosted on (keeping the original file name) and it worked great. In the Ruby program (which isn't working), I have to sort of "stream" the data from the file I want into a new file that I've created on my hard drive. Unfortunately, this isn't working and the "streamed" data is always coming back empty. My assumption is that there is some sort of redirect that Perl can handle to retrieve the file directly that Ruby cannot.
I'm going to post both programs (they're relatively small) and hope that this helps solve my issue. If you have questions, please let me know. As a side note, I pointed this program at a more static URL (a jpeg) and it downloaded the file just fine. This is why I'm theorizing that some sort of redirect is causing issues.
The Ruby Code (That Doesn't Work)
require 'net/http';
require 'open-uri';
require 'rexml/document';
require 'sqlite3';
# Create new SQLite3 database connection
db_connection = SQLite3::Database.new('fiend.db');
# Make sure I can reference records in the query result by column name instead of index number
db_connection.results_as_hash = true;
# Grab all TV shows from the shows table
query = '
SELECT
id,
name,
current_season,
last_episode
FROM
shows
ORDER BY
name
';
# Run through each record in the result set
db_connection.execute(query) { |show|
# Pad the current season number with a zero for later use in a search query
season = '%02d' % show['current_season'].to_s;
# Calculate the next episode number and pad with a zero
next_episode = '%02d' % (Integer(show['last_episode']) + 1).to_s;
# Store the name of the show
name = show['name'];
# Generate the URL of the RSS feed that will hold the list of torrents
feed_url = URI.encode("http://btjunkie.org/rss.xml?query=#{name} S#{season}E#{next_episode}&o=52");
# Generate a simple string that denotes the show, season and episode number being retrieved
episode_id = "#{name} S#{season}E#{next_episode}";
puts "Loading feed for #{name}..";
# Store the response from the download of the feed
feed_download_response = Net::HTTP.get_response(URI.parse(feed_url));
# Store the contents of the response (in this case, XML data)
xml_data = feed_download_response.body;
puts "Feed Loaded. Parsing items.."
# Create a new REXML Document and pass in the XML from the Net::HTTP response
doc = REXML::Document.new(xml_data);
# Loop through each <item> in the feed
doc.root.each_element('//item') { |item|
# Find and store the URL of the torrent we wish to download
torrent_url = item.elements['link'].text + '/download.torrent';
puts "Downloading #{episode_id} from #{torrent_url}";
## This is where crap stops working
# Open Connection to the host
Net::HTTP.start(URI.parse(torrent_url).host, 80) { |http|
# Create a torrent file to dump the data into
File.open("#{episode_id}.torrent", 'wb') { |torrent_file|
# Try to grab the torrent data
data = http.get(torrent_url[19..torrent_url.size], "User-Agent" => "Mozilla/4.0").body;
# Write the data to the torrent file (the data is always coming back blank)
torrent_file.write(data);
# Close the torrent file
torrent_file.close();
}
}
break;
}
}
The Perl Code (That Does Work)
use strict;
use XML::Parser;
use LWP::UserAgent;
use HTTP::Status;
use DBI;
my $dbh = DBI->connect("dbi:SQLite:dbname=fiend.db", "", "", { RaiseError => 1, AutoCommit => 1 });
my $userAgent = new LWP::UserAgent; # Create new user agent
$userAgent->agent("Mozilla/4.0"); # Spoof our user agent as Mozilla
$userAgent->timeout(20); # Set timeout limit for request
my $currentTag = ""; # Stores what tag is currently being parsed
my $torrentUrl = ""; # Stores the data found in any node
my $isDownloaded = 0; # 1 or zero that states whether or not we've downloaded a particular episode
my $shows = $dbh->selectall_arrayref("SELECT id, name, current_season, last_episode FROM shows ORDER BY name");
my $id = 0;
my $name = "";
my $season = 0;
my $last_episode = 0;
foreach my $show (@$shows) {
$isDownloaded = 0;
($id, $name, $season, $last_episode) = (@$show);
$season = sprintf("%02d", $season); # Pad the season with a leading zero (e.g. 6 becomes 06)
$last_episode = sprintf("%02d", ($last_episode + 1)); # Increment the last episode by one and pad it with a leading zero (e.g. 5 becomes 06)
print("Checking $name S" . $season . "E" . "$last_episode \n");
my $request = new HTTP::Request(GET => "http://btjunkie.org/rss.xml?query=$name S" . $season . "E" . $last_episode . "&o=52"); # Retrieve the torrent feed
my $rssFeed = $userAgent->request($request); # Store the feed in a variable for later access
if($rssFeed->is_success) { # We retrieved the feed
my $parser = new XML::Parser(); # Make a new instance of XML::Parser
$parser->setHandlers # Set the functions that will be called when the parser encounters different kinds of data within the XML file.
(
Start => \&startHandler, # Handles start tags
End => \&endHandler, # Handles end tags
Char => \&DataHandler # Handles data inside of start and end tags
);
$parser->parsestring($rssFeed->content); # Parse the feed
}
}
#
# Called every time XML::Parser encounters a start tag
# @param: $parseInstance {object} | Instance of the XML::Parser. Passed automatically when feed is parsed.
# @param: $element {string} | The name of the XML element being parsed (e.g. "title"). Passed automatically when feed is parsed.
# @attributes {array} | An array of all of the attributes of $element
# @returns: void
#
sub startHandler {
my($parseInstance, $element, %attributes) = @_;
$currentTag = $element;
}
#
# Called every time XML::Parser encounters anything that is not a start or end tag (i.e, all the data in between tags)
# @param: $parseInstance {object} | Instance of the XML::Parser. Passed automatically when feed is parsed.
# @param: $element {string} | The name of the XML element being parsed (e.g. "title"). Passed automatically when feed is parsed.
# @attributes {array} | An array of all of the attributes of $element
# @returns: void
#
sub DataHandler {
my($parseInstance, $element, %attributes) = @_;
if($currentTag eq "link" && $element ne "\n") {
$torrentUrl = $element;
}
}
#
# Called every time XML::Parser encounters an end tag
# @param: $parseInstance {object} | Instance of the XML::Parser. Passed automatically when feed is parsed.
# @param: $element {string} | The name of the XML element being parsed (e.g. "title"). Passed automatically when feed is parsed.
# @attributes {array} | An array of all of the attributes of $element
# @returns: void
#
sub endHandler {
my($parseInstance, $element, %attributes) = @_;
if($element eq "item" && $isDownloaded == 0) { # We just finished parsing an <item> element so let's attempt to download a torrent
print("DOWNLOADING: $torrentUrl" . "/download.torrent \n");
system("echo.|lwp-download " . $torrentUrl . "/download.torrent"); # We echo the "return " key into the command to force it to skip any file-overwrite prompts
if(unlink("download.torrent.html")) { # We tried to download a 'locked' torrent
$isDownloaded = 0; # Forces program to download next torrent on list from current show
}
else {
$isDownloaded = 1;
$dbh->do("UPDATE shows SET last_episode = '$last_episode' WHERE id = '$id'"); # Update DB with new show information
}
}
}
Yes, the URLs you are retrieving appear to return a 302 (redirect). Net::HTTP requires (and allows) you to handle the redirect yourself. You typically use a recursive technique like the one AboutRuby mentioned (although this thread, http://www.ruby-forum.com/topic/142745, suggests you should look not only at the 'Location' field but also for META REFRESH in the response).
open-uri will handle redirects for you if you're not interested in the low-level interaction:
require 'open-uri'
# On Ruby 3.0 and later, Kernel#open no longer opens URLs; call URI.open(torrent_url) instead.
File.open("#{episode_id}.torrent", 'wb') {|torrent_file| torrent_file.write open(torrent_url).read}
get_response returns an instance of a class from the HTTPResponse hierarchy. It's usually HTTPSuccess, but if there's a redirect, it will be HTTPRedirection. A simple recursive method that follows redirects can solve this. How to handle this correctly is described in the Net::HTTP docs under the heading "Following Redirection."
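The recursive approach described above might look roughly like the following. Treat it as a sketch in the spirit of the "Following Redirection" example, not as the documented code: the method name fetch_following_redirects, the default limit of 5, and the injectable http_get parameter (there only so the logic can be tried without a network) are all assumptions.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: fetches url and follows redirects up to `limit` hops.
# `http_get` is any callable that takes a URI and returns a Net::HTTPResponse.
def fetch_following_redirects(url, limit: 5, http_get: Net::HTTP.method(:get_response))
  raise ArgumentError, 'too many HTTP redirects' if limit <= 0

  response = http_get.call(URI.parse(url))

  case response
  when Net::HTTPSuccess
    response
  when Net::HTTPRedirection
    location = response['location']
    raise 'redirect without a Location header' if location.nil?

    # Location may be relative, so resolve it against the current URL.
    next_url = URI.join(url, location).to_s
    fetch_following_redirects(next_url, limit: limit - 1, http_get: http_get)
  else
    # Raises an exception for any other non-2xx response.
    response.value
  end
end
```

In the script from the question, the body of the final response could then be written out with something like File.binwrite("#{episode_id}.torrent", fetch_following_redirects(torrent_url).body). The limit guards against redirect loops.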

Resources