I need to create a regular expression to validate a person's first and second name. The second name is optional because some people don't have one. A space may appear between the two names, but it cannot be at the end of the string.
Example:
"Juan Perez" is valid
"Juan Perez " is invalid because there is a space character at the end of the string
You could use the regex below, which uses an optional group:
^[A-Za-z]+(?:\s[A-Za-z]+)?$
The optional group (?:\s[A-Za-z]+)? matches only when a whitespace character is followed by one or more letters. It won't match a lone trailing space, and, together with the $ anchor, it ensures that the string ends with a letter.
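A minimal Ruby sketch of the two-name check, assuming the validation runs in Ruby (the question doesn't name a language). It swaps ^ and $ for \A and \z: in Ruby, ^ and $ match at line boundaries, so a multi-line string whose last line is a valid name would still pass. TWO_NAMES and valid_name? are illustrative names.

```ruby
# Illustrative constant: \A and \z anchor the whole string in Ruby,
# whereas ^ and $ anchor individual lines.
TWO_NAMES = /\A[A-Za-z]+(?:\s[A-Za-z]+)?\z/

# True for one name, or two names separated by a single whitespace
# character, with nothing before or after.
def valid_name?(name)
  TWO_NAMES.match?(name)
end

["Juan Perez", "Juan", "Juan Perez ", "Juan  Perez"].each do |candidate|
  puts "#{candidate.inspect}: #{valid_name?(candidate)}"
end

# With ^ and $, a string whose last line is valid still matches:
puts "Juan Perez \nabc".match?(/^[A-Za-z]+(?:\s[A-Za-z]+)?$/)  # true
puts valid_name?("Juan Perez \nabc")                             # false
```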
Update:
If the name contains not only a first name but also a middle name, last name, and so on, you could use the regex below. * repeats the previous token zero or more times, whereas ? after a token makes that token optional (directly after * or +, it makes the quantifier lazy instead).
^[A-Za-z]+(?:\s[A-Za-z]+)*$
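A short Ruby comparison of * and ? on a four-part name, anchored with \A and \z so the whole string is checked rather than a single line; the constant names are illustrative.

```ruby
# * allows any number of additional " Name" groups; ? allows at most one.
MANY_NAMES      = /\A[A-Za-z]+(?:\s[A-Za-z]+)*\z/
OPTIONAL_SECOND = /\A[A-Za-z]+(?:\s[A-Za-z]+)?\z/

name = "Juan Carlos Perez Lopez"
puts MANY_NAMES.match?(name)            # true: the group repeats three times
puts OPTIONAL_SECOND.match?(name)       # false: ? permits only one extra name
puts MANY_NAMES.match?("Juan Carlos ")  # false: trailing whitespace
```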
How about a way that doesn't require repeating the char class:
^\b(\s?[[:alpha:]]+)+$
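One practical difference of [[:alpha:]] in Ruby, sketched below with the same \A/\z anchoring assumption: on UTF-8 strings it matches any Unicode letter, so accented names pass, unlike [A-Za-z]. ALPHA_NAMES is an illustrative name.

```ruby
# [[:alpha:]] is a POSIX bracket expression; on UTF-8 strings Ruby treats
# it as "any Unicode letter", so accented names are accepted too.
ALPHA_NAMES = /\A\b(\s?[[:alpha:]]+)+\z/

puts ALPHA_NAMES.match?("José Pérez")   # true
puts ALPHA_NAMES.match?("Juan Perez ")  # false: ends with whitespace
puts ALPHA_NAMES.match?(" Juan")        # false: \b needs a letter first
```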
Has there ever been an implementation of the field function (page 311) in the various flavors of Pick/UniBasic etc. that would operate on a delimiter of more than one character?
The documented implementations I can find stipulate a single character as the delimiter argument; if the delimiter has more than one character, only its first character is used as the delimiter instead of the entire string.
I am asking this because there are many instances in the commercial and custom software I maintain where I see attempts to use a multi-character delimiter with the FIELD statement. It seems the programmers expected a different result from what actually happens.
jBASE does allow for this. From the FIELD docs:
This function returns a multi-character delimited field from within a string. It takes the general form:
FIELD(string, delimiter, occurrence{, extractCount})
where:
string specifies the string, from which the field(s) is to be extracted.
delimiter specifies the character or characters that delimit the fields within the dynamic array.
occurrence should evaluate to an integer of value 1 or higher. It specifies the delimiter used as the starting point for the extraction.
extractCount is an integer that specifies the number of fields to extract. If omitted, assumes one.
Additionally, an example from the docs:
in_Value = "AAAA : BBjBASEBB : CCCCC"
CRT FIELD(in_Value , "jBASE", 1)
Producing output:
AAAA : BB
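To make the multi-character semantics concrete, here is a rough Ruby emulation of the jBASE behaviour described above. It is only a sketch: the method name field and its edge-case handling (an empty string when the occurrence is past the last field, an error for an occurrence below 1) are assumptions, not documented jBASE behaviour.

```ruby
# Hypothetical emulation of jBASE-style FIELD(string, delimiter, occurrence{, extractCount}).
# The delimiter is treated as a literal substring of any length.
def field(string, delimiter, occurrence, extract_count = 1)
  raise ArgumentError, "occurrence must be >= 1" if occurrence < 1

  # Split on the literal delimiter; -1 keeps empty trailing fields.
  # A Regexp is used because String#split(" ") has special awk-style behaviour.
  parts = string.split(Regexp.new(Regexp.escape(delimiter)), -1)

  # parts[start, length] can return nil when start is out of range.
  (parts[occurrence - 1, extract_count] || []).join(delimiter)
end

in_value = "AAAA : BBjBASEBB : CCCCC"
puts field(in_value, "jBASE", 1)   # AAAA : BB
puts field(in_value, "jBASE", 2)   # BB : CCCCC
puts field("a::b::c", "::", 2, 2)  # b::c
```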
Update 2020-08-13 (adding context for OpenQM):
As an official comment since we maintain both jBASE and OpenQM, I felt it worth calling out that OpenQM does not allow multi-character delimiters for FIELD().
I want to replace a pattern in Ruby only if the letter immediately after the pattern is one of a given set.
Example: replace "αυ" with "av" ONLY IF the next letter after "αυ" is one of the following: α|γ|δ|λ|μ|ν|ρ|σμ|ω
Of course, this code will not work; I suppose I need a more complicated regex that matches one of the letters after the pattern.
string.gsub!("αυ", "av") if string =~ /α|γ|δ|λ|μ|ν|ρ|σμ|ω/
Thanks for any suggestion.
Use a positive lookahead:
string.gsub!(/αυ(?=α|γ|δ|λ|μ|ν|ρ|σμ|ω)/, "av")
See the Rubular demo
Details
αυ - a αυ substring
(?=α|γ|δ|λ|μ|ν|ρ|σμ|ω) - a positive lookahead that requires one of the alternatives inside it to follow immediately, without including that alternative in the match value, i.e. it is left in the resulting string.
You may also "contract" the single-char alternations into a character class, [αγδλμνρω]:
/αυ(?=[αγδλμνρω]|σμ)/
See another Rubular demo. σμ cannot be put inside a character class because it consists of two characters, and a character class matches exactly one character.
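A small Ruby check that the alternation and character-class forms behave the same; the sample words are made up for illustration.

```ruby
# Replace "αυ" with "av" only when one of the listed letters (or "σμ") follows.
# The lookahead (?=...) checks the next characters without consuming them,
# so the following letter stays in the result.
ALTERNATION = /αυ(?=α|γ|δ|λ|μ|ν|ρ|σμ|ω)/
CHAR_CLASS  = /αυ(?=[αγδλμνρω]|σμ)/

%w[αυγ αυσμ αυκ αυσ].each do |word|
  puts "#{word} -> #{word.gsub(CHAR_CLASS, "av")}"
end
# αυγ -> avγ
# αυσμ -> avσμ
# αυκ -> αυκ
# αυσ -> αυσ
```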
I have a Ruby app that uses the first string matched by a regex: my_url.match(/my_regex/).first
The strings are a list of URLs that contain either an address or a postcode, and from each of them I need to extract the postcode or address using a regex.
Sample URLs:
http://www.adresses.co.uk/avon/bath-city
http://www.adresses.co.uk/postcode/rm107jj
My regex:
\.co\.uk\/postcode\/([^\/]*)|\.co\.uk\/(?!postcode)([^\/]*\/[^\/]*)
My problem is that for non-postcode URLs, the first captured value from this regex is nil (see on Rubular).
How can I rewrite this regex so that it skips nil matches, or so that the first capture is never nil? I need to solve this with the regex itself, not with Ruby code.
Here's a regex that captures in group #1 everything after postcode/ if it's present, or else everything after .co.uk/:
\.co\.uk\/(?:postcode\/)?([^\/\n]+(?:\/[^\/\n]+)?)
Note that this will give unexpected results if there are unwanted path elements at the end of a postcode link, such as:
http://www.adresses.co.uk/postcode/rm107jj/oops
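A quick Ruby check of this regex against the sample URLs. MatchData itself has no first method; this sketch reads group 1 with String#[] and a capture index instead. URL_PART is an illustrative name.

```ruby
# Group 1: everything after "postcode/" if present, otherwise up to two
# path elements after ".co.uk/".
URL_PART = /\.co\.uk\/(?:postcode\/)?([^\/\n]+(?:\/[^\/\n]+)?)/

urls = [
  "http://www.adresses.co.uk/avon/bath-city",
  "http://www.adresses.co.uk/postcode/rm107jj",
  "http://www.adresses.co.uk/postcode/rm107jj/oops"
]

urls.each do |url|
  # String#[](regexp, 1) returns capture group 1, or nil if there is no match.
  puts url[URL_PART, 1]
end
# avon/bath-city
# rm107jj
# rm107jj/oops   <- the unexpected result mentioned above
```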
UPDATE: Based on the comments, it looks like you want to match just the last path element. But we can't simply capture the second element, because there might be only one:
http://www.adresses.co.uk/west-midlands
We can, however, make the first element optional:
\.co\.uk\/(?:[^\/\n]+\/)?([^\/\n]+)
Notice how I used a non-capturing group for the optional portion, so the part you want is still captured in group #1.
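The same kind of check for the updated regex, where only the last path element ends up in group 1. LAST_PART is an illustrative name.

```ruby
# Optional leading path element (non-capturing), then the last element in group 1.
LAST_PART = /\.co\.uk\/(?:[^\/\n]+\/)?([^\/\n]+)/

[
  "http://www.adresses.co.uk/avon/bath-city",
  "http://www.adresses.co.uk/postcode/rm107jj",
  "http://www.adresses.co.uk/west-midlands"
].each { |url| puts url[LAST_PART, 1] }
# bath-city
# rm107jj
# west-midlands
```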
...
I am new to Cucumber with Capybara. I have been given an application to test whose flow is: after a form is submitted, an email is sent to the user containing a link to another app. To access that app, we have to open the email and click the link, which redirects to the app. I don't have access to the email account. Is there any way to extract that link and continue with the flow?
Please suggest some possible ways to do it.
Regards,
Abhisek Das
In your test, use whatever means you need in order to trigger the sending of the email by your application. Once the email is sent, use a regular expression to find the URL from the link within the email body (note this will work only for an email that contains a single link), and then visit the path from that URL with Capybara to continue with your test:
path_regex = /(?:"https?\:\/\/.*?)(\/.*?)(?:")/
email = ActionMailer::Base.deliveries.last
path = email.body.match(path_regex)[1]
visit(path)
Regular expression explained
A regular expression (regex) itself is demarcated by forward slashes, and this regex in particular consists of three groups, each demarcated by pairs of parentheses. The first and third groups both begin with ?:, indicating that they are non-capturing groups, while the second is a capturing group (no ?:). I will explain the significance of this distinction below.
The first group, (?:"https?\:\/\/.*?), is a:
non-capturing group, ?:
that matches a single double quote, "
we match a quote since we anticipate the URL to be in the href="..." attribute of a link tag
followed by the string http
optionally followed by a lowercase s, s?
the question mark makes the preceding match, in this case s, optional
followed by a colon and two forward slashes, \:\/\/
note the backslashes, which are used to escape characters that otherwise have a special meaning in a regex
followed by a wildcard, .*?, which will match any character any number of times up until the next match in the regex is reached
the period, or wildcard, matches any character except a newline (unless the regex has the m modifier)
the asterisk, *, repeats the preceding token zero or more times, as many as the rest of the regex allows
the question mark makes this a lazy match, meaning the wildcard will match as few characters as possible while still allowing the next match in the regex to be satisfied
The second group, (\/.*?) is a capturing group that:
matches a single forward slash, \/
this will match the first forward slash after the host portion of the URL (e.g. the slash at the end of http://www.example.com/) since the slashes in http:// were already matched by the first group
followed by another lazy wildcard, .*?
The third group, (?:"), is:
another non-capturing group, ?:
that matches a single double quote, "
And thus, our second group will match the portion of the URL starting with the forward slash after the host and going up to, but not including, the double quote at the end of our href="...".
When we call the match method using our regex, it returns an instance of MatchData, which behaves much like an array. The element at index 0 is a string containing the entire matched string (from all of the groups in the regex), while elements at subsequent indices contain only the portions of the string matched by the regex's capturing groups (only our second group, in this case). Thus, to get the match of our second group, which is the path we want to visit using Capybara, we grab the element at index 1.
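To see the regex and the MatchData indices outside of ActionMailer, here is a sketch on a made-up HTML string with two links. The greedy variant is not part of the answer; it is included only to show why the lazy .*? matters.

```ruby
# Lazy version from the answer: each .*? stops at the first point where
# the rest of the pattern can match.
lazy_regex = /(?:"https?\:\/\/.*?)(\/.*?)(?:")/
# Greedy variant for comparison only: .* runs as far as it can, then backtracks.
greedy_regex = /(?:"https?\:\/\/.*)(\/.*)(?:")/

# Hypothetical email body with two links.
body = '<a href="http://a.example/one">1</a> <a href="http://b.example/two">2</a>'

match = body.match(lazy_regex)
puts match[0]                     # "http://a.example/one" (quotes are part of the match)
puts match[1]                     # /one  (path of the first link)
puts body.match(greedy_regex)[1]  # /two  (the greedy match spans both links)
```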
You can use Nokogiri to parse the email body and find the link you want to click.
Suppose you want to click a link labelled "Change my password":
require 'nokogiri'
email = ActionMailer::Base.deliveries.last
html = Nokogiri::HTML(email.html_part.body.to_s)
target_url = html.at("a:contains('Change my password')")['href']
visit target_url
I think this is more semantic and robust than using regular expressions. For example, it still works when the email contains many links.
If you're using or willing to use the capybara-email gem, there's now a simpler way of doing this. Let's say you've generated an email to recipient@email.com, which contains the link 'fancy link'.
Then you can just do this in your test suite:
open_email('recipient@email.com') # Allows the current_email method
current_email.click_link 'fancy link'
I was reading this and I did not understand it. I have two questions.
What is the difference between ([aeiou]) and [aeiou]?
What does <\1> mean?
"hello".sub(/([aeiou])/, '<\1>') #=> "h<e>llo"
Here is how it is documented:
If replacement is a String it will be substituted for the matched text. It may contain back-references to the pattern’s capture groups of the form "\d", where d is a group number, or "\k<n>", where n is a group name. If it is a double-quoted string, both back-references must be preceded by an additional backslash. However, within replacement the special match variables, such as $&, will not refer to the current match.
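The back-reference forms from that documentation, sketched in Ruby (the group name vowel is arbitrary):

```ruby
# Numbered back-reference in a single-quoted replacement string.
puts "hello".sub(/([aeiou])/, '<\1>')                   # h<e>llo

# In a double-quoted string the backslash itself must be escaped.
puts "hello".sub(/([aeiou])/, "<\\1>")                  # h<e>llo

# Without the extra backslash, "\1" is an octal escape, so the
# replacement inserts the control character U+0001 instead.
p "hello".sub(/([aeiou])/, "<\1>")

# Named capture group referenced with \k<name>.
puts "hello".sub(/(?<vowel>[aeiou])/, '<\k<vowel>>')    # h<e>llo
```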
Character Classes
A character class is delimited with square brackets ([, ]) and lists characters that may appear at that point in the match. /[ab]/ means a or b, as opposed to /ab/ which means a followed by b.
Hopefully the above definition makes clear what [aeiou] is.
Capturing
Parentheses can be used for capturing. The text enclosed by the nth group of parentheses can be subsequently referred to with n. Within a pattern use the backreference \n; outside of the pattern use MatchData[n].
Hopefully the above definition makes clear what ([aeiou]) is.
([aeiou]): whichever character from the class [...] is found first in the string "hello" becomes the value of \1 (i.e. the first capture group). In this example the value of \1 is e, which is replaced by <e> (as specified by <\1>). That's how String#sub produces "h<e>llo" from "hello". Without the parentheses, [aeiou] would still match e, but nothing would be captured, so there would be no group for \1 to refer to.
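The difference between the two patterns can be checked directly in Ruby; this sketch uses the same "hello" example:

```ruby
# With parentheses, the vowel is captured as group 1 and \1 inserts it.
puts "hello".sub(/([aeiou])/, '<\1>')   # h<e>llo

# Without parentheses the match still happens, but there is no group 1,
# so \1 is replaced with an empty string. \0 always means the whole match.
puts "hello".sub(/[aeiou]/, '<\1>')     # h<>llo
puts "hello".sub(/[aeiou]/, '<\0>')     # h<e>llo

# Outside the replacement, MatchData#[] reads the same group.
puts "hello".match(/([aeiou])/)[1]      # e
```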
The doc you posted says:
It may contain back-references to the pattern’s capture groups of the form "\d", where d is a group number, or "\k<n>", where n is a group name.
So \1 refers to whatever was captured by the first () group, i.e. one of [aeiou], and the replacement <\1> inserts that character between angle brackets.