Regex to find whether the same word appears more than twice

I am new to regex.
If the same word appears more than twice, I want to match it or show it.
Example:
124#abc.com , 345#abc.com,.........................

Since you have asked a general question, I can only give you a general answer. Adding the /g flag at the end of your expression requests a global match, meaning that it will find all occurrences of your expression rather than stopping at the first one.
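The /g flag belongs to JavaScript/Perl-style regex literals. In Python's re module (used here only for illustration), re.findall and re.finditer fill the same "find all occurrences" role. This is a minimal sketch under a few assumptions: a "word" is any run of \w characters, the sample text is a hypothetical extension of the question's example (keeping its # separators), and "more than 2" means three or more occurrences. The first approach counts global matches in ordinary code. The second stays inside the regex by using a backreference within a lookahead.

```python
import re
from collections import Counter

# Hypothetical sample text modeled on the question's example.
text = "124#abc.com , 345#abc.com , 678#abc.com , 9#xyz.com"

# Approach 1: collect every word (findall is the "global match"),
# then count the words in ordinary code.
counts = Counter(re.findall(r"\w+", text))
repeated = [word for word, n in counts.items() if n > 2]
print(repeated)  # ['abc', 'com']

# Approach 2: regex only. A word matches if a lookahead can find the same
# word (backreference \1) at least two more times further along the string.
pattern = re.compile(r"\b(\w+)\b(?=(?:.*?\b\1\b){2})")
found = []
for m in pattern.finditer(text):
    if m.group(1) not in found:  # an early occurrence and a later one can both match
        found.append(m.group(1))
print(found)  # ['abc', 'com']
```

Counting in code is usually easier to read. The lookahead version is shown only because the question asks for a pure regex.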

Related

simple symbol regex solution

The problem I'm looking at says an input is valid only if every letter in the string is surrounded by '+' symbols. So "+d++" and "+d+==+a+" are valid, but these are not:
"f++d+"
"3+a=+b+"
"++d+=c+"
I tried to solve this with a regex, since it's essentially a string pattern matching problem: /(+[a-z][^+])|([^+.][a-z]+)/. However, this does not cover patterns where the letters are at the beginning or end of the string. I need help with something more comprehensive.
You could try the following:
/^\+{0,2}[a-z0-9]+\+{0,2}(=*\+{0,2}[a-z0-9]+\+{0,2})*$/
You could use the below regex.
^(?:[^\w\n]*\+[a-z]+\+)+[^\w\n]*$
If you also want to match +f+g+, where a single + sits between two letters, put the + that follows [a-z]+ inside a positive lookahead assertion. That way it is checked but not consumed.
^(?:[^\w\n]*\+[a-z]+(?=\+))+[^\w\n]*$
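The sketch below runs both patterns from this answer over the question's examples using Python's re module. Python is chosen only for illustration, since the answer does not name an engine. It shows that both patterns accept the two valid inputs and reject the three invalid ones, and that only the lookahead version accepts +f+g+.

```python
import re

# Both patterns are copied from the answer above.
strict = re.compile(r"^(?:[^\w\n]*\+[a-z]+\+)+[^\w\n]*$")
# Second version: the trailing + is only checked by a lookahead, so one +
# between two letters (as in +f+g+) can serve both of them.
shared = re.compile(r"^(?:[^\w\n]*\+[a-z]+(?=\+))+[^\w\n]*$")

cases = ["+d++", "+d+==+a+", "f++d+", "3+a=+b+", "++d+=c+", "+f+g+"]
results = {s: (bool(strict.match(s)), bool(shared.match(s))) for s in cases}
for s, (a, b) in results.items():
    print(f"{s:10} strict={a} shared={b}")
```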

Ruby Koans - Why are repetition operators called "greedy" [duplicate]

What do these two terms (greedy and lazy) mean, explained in an understandable way?
Greedy will consume as much as possible. From http://www.regular-expressions.info/repeat.html we see the example of trying to match HTML tags with <.+>. Suppose you have the following:
<em>Hello World</em>
You may think that <.+> (. means any non-newline character and + means one or more) would only match the <em> and the </em>. In reality it will be very greedy and go from the first < to the last >. This means it will match <em>Hello World</em> instead of what you wanted.
Making it lazy (<.+?>) will prevent this. By adding the ? after the +, we tell it to repeat as few times as possible, so the first > it comes across is where the matching stops.
I'd encourage you to download RegExr, a great tool that will help you explore Regular Expressions - I use it all the time.
'Greedy' means match the longest possible string.
'Lazy' means match the shortest possible string.
For example, the greedy h.+l matches 'hell' in 'hello', but the lazy h.+?l matches 'hel'.
Greedy quantifier | Lazy quantifier | Description
*                 | *?              | Star Quantifier: 0 or more
+                 | +?              | Plus Quantifier: 1 or more
?                 | ??              | Optional Quantifier: 0 or 1
{n}               | {n}?            | Quantifier: exactly n
{n,}              | {n,}?           | Quantifier: n or more
{n,m}             | {n,m}?          | Quantifier: between n and m
Add a ? to a quantifier to make it ungreedy, i.e. lazy.
Example:
test string: stackoverflow
greedy regex: s.*o   output: stackoverflo
lazy regex: s.*?o   output: stacko
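Here is a small Python check of the stackoverflow example. It also tries the bounded {n,m} / {n,m}? pair from the table on a made-up digit string.

```python
import re

text = "stackoverflow"
greedy = re.search(r"s.*o", text).group()   # .* runs to the end, backs off to the last o
lazy = re.search(r"s.*?o", text).group()    # .*? grows one char at a time, stops at the first o
print(greedy)  # stackoverflo
print(lazy)    # stacko

# The same ? suffix works on a bounded quantifier from the table.
digits = "123456"
print(re.search(r"\d{2,4}", digits).group())   # 1234
print(re.search(r"\d{2,4}?", digits).group())  # 12
```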
Greedy means your expression will match as large a group as possible; lazy means it will match the smallest group possible. For this string:
abcdefghijklmc
and this expression:
a.*c
A greedy match will match the whole string, and a lazy match will match just the first abc.
As far as I know, most regex engines are greedy by default. Adding a question mark after a quantifier enables lazy matching.
As Andre S mentioned in a comment:
Greedy: Keep searching until condition is not satisfied.
Lazy: Stop searching once condition is satisfied.
Refer to the example below for what is greedy and what is lazy.
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Test {
    public static void main(String[] args) {
        String money = "100000000999";
        String greedyRegex = "100(0*)";
        Pattern pattern = Pattern.compile(greedyRegex);
        Matcher matcher = pattern.matcher(money);
        while (matcher.find()) {
            System.out.println("I'm greedy and I want " + matcher.group() + " dollars. This is the most I can get.");
        }
        String lazyRegex = "100(0*?)";
        pattern = Pattern.compile(lazyRegex);
        matcher = pattern.matcher(money);
        while (matcher.find()) {
            System.out.println("I'm too lazy to get so much money, only " + matcher.group() + " dollars is enough for me");
        }
    }
}
The result is:
I'm greedy and I want 100000000 dollars. This is the most I can get.
I'm too lazy to get so much money, only 100 dollars is enough for me
Taken from www.regular-expressions.info:
Greediness: Greedy quantifiers first try to repeat the token as many times as possible, and gradually give up matches as the engine backtracks to find an overall match.
Laziness: Lazy quantifiers first repeat the token as few times as required, and gradually expand the match as the engine backtracks through the regex to find an overall match.
From Regular expression:
The standard quantifiers in regular expressions are greedy, meaning they match as much as they can, only giving back as necessary to match the remainder of the regex.
By using a lazy quantifier, the expression tries the minimal match first.
Greedy matching. The default behavior of regular expressions is to be greedy. That means it tries to extract as much as possible until it conforms to a pattern even when a smaller part would have been syntactically sufficient.
Example:
import re
text = "<body>Regex Greedy Matching Example </body>"
re.findall('<.*>', text)
#> ['<body>Regex Greedy Matching Example </body>']
Instead of matching till the first occurrence of ‘>’, it extracted the whole string. This is the default greedy or ‘take it all’ behavior of regex.
Lazy matching, on the other hand, ‘takes as little as possible’. This can be achieved by adding a ? after the quantifier (here, .*? instead of .*).
Example:
re.findall('<.*?>', text)
#> ['<body>', '</body>']
If you want only the first match to be retrieved, use the search method instead.
re.search('<.*?>', text).group()
#> '<body>'
Source: Python Regex Examples
Greedy Quantifiers are like the IRS
They’ll take as much as they can. For example, this regex: .* matches all of:
$50,000
Bye-bye bank balance.
See here for an example: Greedy-example
Non-greedy quantifiers - they take as little as they can
Ask for a tax refund: the IRS suddenly becomes non-greedy and returns as little as possible, i.e. it uses this quantifier:
(.{2,5}?)([0-9]*) against this input: $50,000
The first group is non-greedy and only matches $5, so I get a $5 refund against the $50,000 input.
See here: Non-greedy-example.
Why do we need greedy vs non-greedy?
It becomes important when you are trying to match only certain parts of an input. Sometimes you want to match as little as possible; sometimes you want to match as much as possible. Nothing more to it.
You can play around with the examples in the links posted above.
(Analogy used to help you remember).
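To see exactly what each group captures in the IRS analogy, here is a Python sketch. It runs the lazy pattern from the answer and, for contrast, the same pattern with the ? removed. The greedy variant is an addition and is not part of the answer.

```python
import re

amount = "$50,000"

# Lazy {2,5}?: group 1 takes the minimum of 2 characters, then [0-9]*
# takes the digits that immediately follow (just one, before the comma).
lazy = re.search(r"(.{2,5}?)([0-9]*)", amount)
print(lazy.groups())    # ('$5', '0')

# Greedy {2,5}: group 1 takes as many characters as allowed (5).
greedy = re.search(r"(.{2,5})([0-9]*)", amount)
print(greedy.groups())  # ('$50,0', '00')
```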
Greedy means it will keep consuming input that fits your pattern until nothing more fits and it can look no further.
Lazy will stop as soon as it encounters the first match for the pattern you requested.
One common example that I often encounter is \s*-\s*? in the regex ([0-9]{2}\s*-\s*?[0-9]{7}).
The first \s* is greedy because of the *. After the two digits, it consumes as many whitespace characters as possible before looking for the dash "-". The second \s*? is lazy because of the *?. It first tries to match no whitespace at all, and takes more whitespace one character at a time only when the following [0-9]{7} cannot match otherwise.
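This Python sketch applies the regex above to hypothetical sample inputs. It shows that the lazy \s*? still grows when the rest of the pattern needs it to. It also shows that \s*? keeps nothing when a following token can absorb the spaces instead.

```python
import re

phone = re.compile(r"[0-9]{2}\s*-\s*?[0-9]{7}")

# [0-9]{7} cannot start on a space, so the engine backtracks and lets
# \s*? take one more space at a time until the digits line up.
print(phone.search("12   -   1234567").group())  # 12   -   1234567
print(phone.search("12-1234567").group())        # 12-1234567

# When a later greedy \s* can take the spaces, the lazy group keeps none.
m = re.search(r"-(\s*?)(\s*)\d", "-   1")
print(repr(m.group(1)), repr(m.group(2)))        # '' '   '
```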
Best shown by example. Take the string 192.168.1.1 and the greedy regex \b.+\b.
You might think this would give you the first octet, but it actually matches the whole string. Why? Because .+ is greedy, and a greedy match consumes every character in 192.168.1.1 until it reaches the end of the string. This is the important bit! Only then does the engine try the third token (\b). If \b fails there, the engine backtracks one character at a time until \b matches. Here \b happens to match right at the end of the string, which is why the whole string is returned.
If the input were a 4GB string with 192.168.1.1 at the start, you could easily see how consuming everything first and then backtracking would cause an issue.
To make a regex non-greedy (lazy), put a question mark after the quantifier, e.g.
*?
??
+?
Now token 2 (.+?) matches a single character, and the engine then tries the next token (\b) before letting .+? take another character. So it creeps along gingerly.
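Here is the 192.168.1.1 example run in Python, comparing greedy and lazy:

```python
import re

ip = "192.168.1.1"
greedy = re.search(r"\b.+\b", ip).group()   # .+ runs to the end; \b matches there
lazy = re.search(r"\b.+?\b", ip).group()    # .+? stops at the first word boundary
print(greedy)  # 192.168.1.1
print(lazy)    # 192
```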
To give extra clarification on laziness, here is an example that may not be intuitive at first glance. It illustrates the idea of "gradually expands the match" from Suganthan Madhavan Pillai's answer.
input -> some.email#domain.com#
regex -> ^.*?#$
This regex will match the input. At first glance, somebody could say the lazy part (.*?#) will stop at the first # and then check that the input ends there ($). Following this logic, one would conclude there is no match, because the input doesn't end after the first #.
But as you can see, this is not the case. Even in non-greedy (lazy) mode, the regex keeps expanding until it reaches the second #, which gives the minimal match that satisfies the rest of the pattern.
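The email example can be checked with Python's re.search. The second pattern drops the $ anchor, a variation added here for contrast, to show where the lazy match stops when nothing forces it to continue.

```python
import re

s = "some.email#domain.com#"
anchored = re.search(r"^.*?#$", s).group()   # $ forces .*? to keep expanding
unanchored = re.search(r"^.*?#", s).group()  # nothing after #, so it stops early
print(anchored)    # some.email#domain.com#
print(unanchored)  # some.email#
```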
Try to understand the following behavior:
using System;
using System.Text.RegularExpressions;
var input = "0014.2";
Regex r1 = new Regex("\\d+.{0,1}\\d+");
Regex r2 = new Regex("\\d*.{0,1}\\d*");
Console.WriteLine(r1.Match(input).Value); // "0014.2"
Console.WriteLine(r2.Match(input).Value); // "0014.2"
input = " 0014.2";
Console.WriteLine(r1.Match(input).Value); // "0014.2"
Console.WriteLine(r2.Match(input).Value); // " 0014"
input = " 0014.2";
Console.WriteLine(r1.Match(input).Value); // "0014.2"
Console.WriteLine(r2.Match(input).Value); // ""

Precedence of Ruby regular expressions?

I am reviewing regular expressions and cannot understand why a regular expression won't match a given string, specifically:
regex = /(ab*)+(bc)?/
mystring = "abbc"
The regex matches "abb" but leaves off the c. I tested this using Rubular and in IRB and don't understand why the regex doesn't match the entire string. I thought that (ab*)+ would match "ab" and then (bc)? would match "bc".
Am I missing something in terms of precedence for regular expression operations?
By default, regular expressions match the first part of the pattern as much as possible, and the engine does not backtrack to make later sections match if it doesn't have to. Since you made (bc) optional, (ab*) can match as much as it wants (the + repetition after it has little to do here). The engine has already found a match, so it never backtracks to try other alternatives.
If you want the whole string to be matched (which will force some backtracking in this case) make sure you anchor both ends of the string:
regex = /^(ab*)+(bc)?$/
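Here is a Python sketch of the same two patterns. Python's engine is used only for illustration, since the question used Rubular and IRB. One difference to keep in mind: in Ruby itself, ^ and $ match at line boundaries, so \A and \z are the stricter whole-string anchors. For a single-line string like "abbc" the result is the same.

```python
import re

s = "abbc"

# Unanchored: (ab*)+ greedily takes "abb", (bc)? then matches nothing,
# and a match has already been found, so no backtracking happens.
loose = re.search(r"(ab*)+(bc)?", s)
print(loose.group(0), loose.groups())        # abb ('abb', None)

# Anchored: $ fails after "abb", so the engine backtracks b* to "ab"
# and lets (bc)? match "bc".
anchored = re.search(r"^(ab*)+(bc)?$", s)
print(anchored.group(0), anchored.groups())  # abbc ('ab', 'bc')
```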
The regex has two groups, so it looks for two parts in your string.
The first part matches abb, because (ab*) means an a followed by zero or more b's, and you have two b's. Only c is left after that, so the optional second group (bc) matches nothing.

ruby regex: match URL recurring pattern

I want to be able to match all of the following cases using Ruby 1.8.7:
/pages/multiedit/16801,16809,16817,16825,16833
/pages/multiedit/16801,16809,16817
/pages/multiedit/16801
/pages/multiedit/1,3,5,7,8,9,10,46
I currently have:
\/pages\/multiedit\/\d*
This matches up to the first set of numbers. For example:
"/pages/multiedit/16801,16809,16817,16825,16833"[/\/pages\/multiedit\/\d*/]
# => "/pages/multiedit/16801"
See http://rubular.com/r/ruFPx5yIAF for example.
Thanks for the help, regex gods.
\/pages\/multiedit\/\d+(?:,\d+)*
Example: http://rubular.com/r/0nhpgki6Gy
Edit: Updated to use a non-capturing group (?:...) so nothing is captured, although the performance hit of capturing would be negligible. (Thanks, Tin Man)
The currently accepted answer of
\/pages\/multiedit\/[\d,]+
may not be a good idea because that will also match the following strings
.../pages/multiedit/,,,
.../pages/multiedit/,1,
My answer requires there be at least one digit before the first comma, and at least one digit between commas, and it must end with a digit.
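Here is a Python sketch comparing the two patterns discussed here, plus the question's original \d* pattern. It assumes the path is the entire string, hence re.fullmatch, whereas the answers use an unanchored search. The two invalid URLs omit the elided "..." prefix from the answer. Forward slashes need no escaping in a Python string; they are escaped above only because of Ruby's /.../ literal syntax.

```python
import re

strict = re.compile(r"/pages/multiedit/\d+(?:,\d+)*")  # this answer
loose = re.compile(r"/pages/multiedit/[\d,]+")         # the accepted answer

good = [
    "/pages/multiedit/16801,16809,16817,16825,16833",
    "/pages/multiedit/16801,16809,16817",
    "/pages/multiedit/16801",
    "/pages/multiedit/1,3,5,7,8,9,10,46",
]
bad = ["/pages/multiedit/,,,", "/pages/multiedit/,1,"]

# The question's original pattern stops at the first comma.
print(re.search(r"/pages/multiedit/\d*", good[0]).group())  # /pages/multiedit/16801

# fullmatch: the whole string must fit the pattern.
for url in good + bad:
    print(url, bool(strict.fullmatch(url)), bool(loose.fullmatch(url)))
```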
I'd use:
/\/pages\/multiedit\/[\d,]+/
Here's a demonstration of the pattern at http://rubular.com/r/h7VLZS1W1q
[\d,]+ means "find one or more digits or commas".
The reason \d* doesn't work is that it means "find zero or more digits". As soon as the pattern search runs into a comma, it stops. You have to tell the engine that it's OK to find both digits and commas.

Ruby string does not match expression

I have this Ruby regular expression:
(a|bc)(d?|e)*
When I use Rubular to test possible strings against this expression, there are some strings where I don't understand why they don't fit.
For example, with the string "ade" it matches "ad" but not the "e". Can anyone help?
The second part of the regular expression you entered (d?|e)* is the problem. Putting the ? on the d says, match d 0 or 1 times. When you run through the string ade, the regex matches a, then d, then d 0 times... If you instead changed it to (a|bc)(d|e)*, it would match ade, and seem to have the semantics that you're looking for.
(d?|e)* stops too early. The d? alternative is tried first, and once it matches the empty string, the repetition ends, so the e alternative is "short-circuited" by the alternation and never gets a chance.
I don't know why you put a question mark there. Just use
(a|bc)(d|e)*
and it will be fine.
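Here is a short Python check that the suggested pattern accepts whole strings made of a or bc followed by any mix of d and e. The sample strings are made up for illustration. re.fullmatch is used so that the entire string must match, which is stricter than what Rubular highlights.

```python
import re

fixed = re.compile(r"(a|bc)(d|e)*")
samples = ["ade", "bcdee", "a", "ab"]
results = {s: bool(fixed.fullmatch(s)) for s in samples}
print(results)                       # {'ade': True, 'bcdee': True, 'a': True, 'ab': False}
print(fixed.match("ade").group())    # ade
```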
