Ruby Regular Expression for Parcelify (Shopify) - ruby

I'm trying to write a Regex in Ruby for a shipping query.
If postcodes match MK1 - MK10, MK19, MK43, MK46 or MK77, then allow it.
If postcodes match NN1 - NN7, NN12, NN13, NN29 or NN77, then allow it.
If postcodes match MK11 - MK18 then don't allow it.
My trouble is that in the UK our postcodes are a bit funny where you can put MK1 1TS and MK11TS and they're considered the same. By not allowing MK11, MK11TY could be misread as MK11.
I've written a regex below, and so far it will disallow MK111TS and MK11\s1TS, and allow MK1\s1TS but not MK11TS. Any help would be greatly appreciated, I've only tested this for MK11 so far.
^((?!MK11\d).)*$&^((?!MK11\s\d).)*$|(MK(1 |2 |3 |4 |5 |6 |7 |8 |9 |10 ))|(MK19)|(MK43)|(MK46)|(MK77)|(NN1)|(NN2)|(NN3)|(NN4)|(NN5)|(NN6)|(NN7)|(NN12)|(NN13)|(NN29)|(NN77)
Thanks in advance.

r = /
(?: # begin non-capture group
MK # match characters
(?:1|2|3|4|5|6|7|8|9|10|19|43|46|77) # match one of the choices
| # or
NN # match characters
(?:1|2|3|4|5|6|7|12|13|29|77) # match one of the choices
) # end non-capture group
(?![^\sA-Z]) # do not match a space or cap letter
/ix # case indifferent and free-spacing
# regex definition mode
This is conventionally written
r = /(?:MK(?:1|2|...|10|19|...|77)|NN(?:1|2|...|7|12|13|29|77))(?![^\sA-Z])/i
"MK4 abc def MK11MK19ghi NN6 jkl NN13 NN29NN77".scan(r)
# => ["MK4", "NN6", "NN13", "NN29", "NN77"]
"MK11" is not matched because "11" is not in the list. "MK19" is not matched because it is followed by a character that is neither a space nor a capital letter.
Alternatively, one could write
s = (['MK'].product(%w{1 2 3 4 5 6 7 8 9 10 19 43 46 77}).map(&:join) +
['NN'].product(%w{1 2 3 4 5 6 7 12 13 29 77}).map(&:join)).join('|')
# => "MK1|MK2|...|MK10|MK19|MK43|MK46|MK77|NN1|NN2|...|NN7|NN12|NN13|NN29|NN77"
r = /(?:#{s})(?![^\sA-Z])/i
#=> /(?:MK1|MK2|...|MK10|MK19|...|MK77|NN1|NN2|...|NN7|NN12|NN13|NN29|NN77)(?![^\sA-Z])/
If the remainder of the postal code is to be included in the regex, perhaps something like the following could be done.
suffixes = %w|ES AB CD EF|.join('|')
#=> "ES|AB|CD|EF"
Then replace (?![^\sA-Z])/x with the following.
\s? # optionally match a space
(?:#{suffixes}) # match a valid suffix in a non-capture group
(?!\S) # do not match a non-whitespace char (negative lookahead)
/ix # case-indifferent and free-spacing regex definition mode
Note the negative lookahead is satisfied if the suffix is at the end of the string.

Now I have written the following to match the postcodes format exactly:
#format: Area Code, Localities accepted, whitespace (MKor not), any digit, any single character, any single character
((MK|mk|Mk|mK)(?:1|2|3|4|5|6|7|8|9|10|19|43|46|77)\s\d[A-Za-z][A-Za-z]) #with whitespace
|
((MK|mk|Mk|mK)(?:1|2|3|4|5|6|7|8|9|10|19|43|46|77)\d[A-Za-z][A-Za-z]) #without whitespace
|
((NN|nn|Nn|nN)(?:1|2|3|4|5|6|7|12|13|29|77)\s\d[A-Za-z][A-Za-z]) #with whitespace
|
((NN|nn|Nn|nN)(?:1|2|3|4|5|6|7|12|13|29|77)\d[A-Za-z][A-Za-z]) #without whitespace
This works for my purposes, I got here using Cary's answer, which has been extremely helpful. Thank you and have marked up.

Related

Replace pattern in string with value from a ruby array

I have a string like this
"base: [_0x3e63[241], _0x3e63[242]],
gray: [_0x3e63[243], _0x3e63[244], _0x3e63[245], _0x3e63[246], _0x3e63[247], _0x3e63[248], _0x3e63[249], _0x3e63[250], _0x3e63[251], _0x3e63[252]],
red: [_0x3e63[253], _0x3e63[254], _0x3e63[255], _0x3e63[256], _0x3e63[257], _0x3e63[258], _0x3e63[259], _0x3e63[260], _0x3e63[261], _0x3e63[262]],
pink: [_0x3e63[263], _0x3e63[264], _0x3e63[265], _0x3e63[266], _0x3e63[267], _0x3e63[268], _0x3e63[269], _0x3e63[270], _0x3e63[271], _0x3e63[272]],
grape: [_0x3e63[273], _0x3e63[274], _0x3e63[275], _0x3e63[276], _0x3e63[277], _0x3e63[278], _0x3e63[279], _0x3e63[280], _0x3e63[281], _0x3e63[282]],
violet: [_0x3e63[283], _0x3e63[284], _0x3e63[285], _0x3e63[286], _0x3e63[287], _0x3e63[288], _0x3e63[289], _0x3e63[290], _0x3e63[291], _0x3e63[292]],
indigo: [_0x3e63[293], _0x3e63[294], _0x3e63[295], _0x3e63[296], _0x3e63[297], _0x3e63[298], _0x3e63[299], _0x3e63[300], _0x3e63[301], _0x3e63[302]],
blue: [_0x3e63[303], _0x3e63[304], _0x3e63[305], _0x3e63[306], _0x3e63[307], _0x3e63[308], _0x3e63[309], _0x3e63[310], _0x3e63[311], _0x3e63[312]],
cyan: [_0x3e63[313], _0x3e63[314], _0x3e63[315], _0x3e63[316], _0x3e63[317], _0x3e63[318], _0x3e63[319], _0x3e63[320], _0x3e63[321], _0x3e63[322]],
teal: [_0x3e63[323], _0x3e63[324], _0x3e63[325], _0x3e63[326], _0x3e63[327], _0x3e63[328], _0x3e63[329], _0x3e63[330], _0x3e63[331], _0x3e63[332]],
green: [_0x3e63[333], _0x3e63[334], _0x3e63[335], _0x3e63[336], _0x3e63[337], _0x3e63[338], _0x3e63[339], _0x3e63[340], _0x3e63[341], _0x3e63[342]],
lime: [_0x3e63[343], _0x3e63[344], _0x3e63[345], _0x3e63[346], _0x3e63[347], _0x3e63[348], _0x3e63[349], _0x3e63[350], _0x3e63[351], _0x3e63[352]],
yellow: [_0x3e63[353], _0x3e63[354], _0x3e63[355], _0x3e63[356], _0x3e63[357], _0x3e63[358], _0x3e63[359], _0x3e63[360], _0x3e63[361], _0x3e63[362]],
orange: [_0x3e63[363], _0x3e63[364], _0x3e63[365], _0x3e63[366], _0x3e63[367], _0x3e63[368], _0x3e63[369], _0x3e63[370], _0x3e63[371], _0x3e63[372]]"
_0x3e63 is a ruby array with the values.
_0x3e63 = ["#f783ac", "#faa2c1", "#fcc2d7", "#ffdeeb", "#fff0f6", "#862e9c", "#9c36b5", "#ae3ec9", "#be4bdb", "#cc5de8", "#da77f2", "#e599f7", "#eebefa", "#f3d9fa", "#f8f0fc", "#5f3dc4", "#6741d9", "#7048e8", "#7950f2", "#845ef7", "#9775fa", "#b197fc", "#d0bfff", "#e5dbff", "#f3f0ff", "#364fc7", "#3b5bdb", "#4263eb", "#4c6ef5", "#5c7cfa", "#748ffc", "#91a7ff", "#bac8ff", "#dbe4ff", "#edf2ff", "#1864ab", "#1971c2", "#1c7ed6", "#228be6", "#339af0", "#4dabf7", "#74c0fc", "#a5d8ff", "#d0ebff", "#e7f5ff", "#0b7285", "#0c8599", "#1098ad", "#15aabf", "#22b8cf", "#3bc9db", "#66d9e8", "#99e9f2", "#c5f6fa", "#e3fafc", "#087f5b", "#099268", "#0ca678", "#12b886", "#20c997", "#38d9a9", "#63e6be", "#96f2d7", "#c3fae8", "#e6fcf5", "#2b8a3e", "#2f9e44", "#37b24d", "#40c057", "#51cf66", "#69db7c", "#8ce99a", "#b2f2bb", "#d3f9d8", "#ebfbee", "#5c940d", "#66a80f", "#74b816", "#82c91e", "#94d82d", "#a9e34b", "#c0eb75", "#d8f5a2", "#e9fac8", "#f4fce3", "#e67700", "#f08c00", "#f59f00", "#fab005", "#fcc419", "#ffd43b", "#ffe066", "#ffec99", "#fff3bf", "#fff9db", "#d9480f", "#e8590c"]
I cannot find a way to retrieve from the string _0x3e63[xxxxxxx] replacing it with the right value....
Use String#gsub with a block.
Assuming your input string is stored in the variable input, the following code does the replacement and displays the result:
puts input.gsub(/_0x3e63\[(\d+)\]/){|s| _0x3e63[$1.to_i]}
(The array _0x3e63 you posted in the question does not contain enough values to have indices like 247 or 251 but the code works nevertheless.)
The code is very simple. The regular expression /_0x3e63\[(\d+)\]/ matches any string that starts with _0x3e63[, continues with one or more digits (\d+) and ends with ].
For each match the block is executed and the value returned by the block is used to replace the matched piece of the original string.
The replacement uses $1 (that contains the sub-string that matches the first capturing group) as an index into the array _0x3e63. Because the value of $1 is a string, .to_i is used to convert it to a number (required to be used as index in the array).
We are given:
str =<<~END
base: [arr[6], arr[3]],
gray: [arr[0], arr[4], arr[1], arr[5]],
red: [arr[2]]
END
#=> "base: [arr[6], arr[3]],\ngray: [arr[0], arr[4], arr[1], arr[5]],\nred: [arr[2]]\n"
and
arr = ["#f783ac", "#faa2c1", "#fcc2d7", "#ffdeeb", "#fff0f6", "#862e9c",
"#9c36b5"]
We can perform the required replacements by using String#gsub with a regular expression and Kernel#eval:
puts str.gsub(/\barr\[\d+\]/) { |s| eval s }
base: [#9c36b5, #ffdeeb],
gray: [#f783ac, #fff0f6, #faa2c1, #862e9c],
red: [#fcc2d7]
The regular expression preforms the following operations:
\b # match a word break (to avoid matching 'gnarr')
arr\[ # match string 'arr['
\d+ # match 1+ digits
\] # match ']'
Rubular
One must be cautious about using eval (to avoid launching missiles inadvertently, for example), but as long as the matches of the string can be trusted it's a perfectly safe and useful method.

How to parse username, ID or whole part using Ruby Regex in this sentence?

I have a sentences like this:
Hello #[Pratha](user:1), did you see #[John](user:3)'s answer?
And what I want to is get #[Pratha](user:1) and #[John](user:3). Either their names and ids or just as texts as I quoted so that i can explode and parse name and id myself.
But there is an issue here. Names Pratha and John may include non-abc characters like ', ,, -, + , etc... But not [] and ()
What I tried so far:
c = ''
f = c.match(/(?:\s|^)(?:#(?!(?:\d+|\w+?_|_\w+?)(?:\s(\[)|$)))(\w+)(?=\s|$)/i)
But no success.
You may use
/#\[([^\]\[]*)\]\([^()]*:(\d+)\)/
See the regex demo
Details
# - a # char
\[ - a [
([^\]\[]*) - Group 1: 0+ chars other than [ and ]
\] - a ] char
\( - a ( char
[^()]*- 0+ chars other than ( and )
: - a colon
(\d+) - Group 2: 1 or more digits
\) - a ) char.
Sample Ruby code:
s = "Hello #[Pratha](user:1), did you see #[John](user:3)'s answer?"
rx = /#\[([^\]\[]*)\]\([^()]*:(\d+)\)/
res = s.scan(rx)
puts res
# = > [["Pratha", "1"], ["John", "3"]]
"Hello #[Pratha](user:1), did you see #[John](user:3)'s answer?".scan(/#.*?\)/)
#⇒ ["#[Pratha](user:1)", "#[John](user:3)"]
Since the line is not coming from the user input, you might rely on that the part you are interested in starts with # and ends with ).
You could use 2 capturing groups to get the names and the id's:
#\[([^]]+)]\([^:]+:([^)]+)\)
That will match
# Match literally
\[ Match [
([^]]+) 1st capturing group which matches not ] 1+ times using a negated character class.
\( Match literally
[^:]+: Match not :, then match :
([^)]+) 2nd capturing group which matches not ) 1+ times
\) Match )
Regex demo | Ruby demo

Ruby Regex find numbers not surrounded by alphabetical characters [duplicate]

I have a regex expression that I'm using to find all the words in a given block of content, case insensitive, that are contained in a glossary stored in a database. Here's my pattern:
/($word)/i
The problem is, if I use /(Foo)/i then words like Food get matched. There needs to be whitespace or a word boundary on both sides of the word.
How can I modify my expression to match only the word Foo when it is a word at the beginning, middle, or end of a sentence?
Use word boundaries:
/\b($word)\b/i
Or if you're searching for "S.P.E.C.T.R.E." like in Sinan Ünür's example:
/(?:\W|^)(\Q$word\E)(?:\W|$)/i
To match any whole word you would use the pattern (\w+)
Assuming you are using PCRE or something similar:
Above screenshot taken from this live example: http://regex101.com/r/cU5lC2
Matching any whole word on the commandline with (\w+)
I'll be using the phpsh interactive shell on Ubuntu 12.10 to demonstrate the PCRE regex engine through the method known as preg_match
Start phpsh, put some content into a variable, match on word.
el#apollo:~/foo$ phpsh
php> $content1 = 'badger'
php> $content2 = '1234'
php> $content3 = '$%^&'
php> echo preg_match('(\w+)', $content1);
1
php> echo preg_match('(\w+)', $content2);
1
php> echo preg_match('(\w+)', $content3);
0
The preg_match method used the PCRE engine within the PHP language to analyze variables: $content1, $content2 and $content3 with the (\w)+ pattern.
$content1 and $content2 contain at least one word, $content3 does not.
Match a number of literal words on the commandline with (dart|fart)
el#apollo:~/foo$ phpsh
php> $gun1 = 'dart gun';
php> $gun2 = 'fart gun';
php> $gun3 = 'farty gun';
php> $gun4 = 'unicorn gun';
php> echo preg_match('(dart|fart)', $gun1);
1
php> echo preg_match('(dart|fart)', $gun2);
1
php> echo preg_match('(dart|fart)', $gun3);
1
php> echo preg_match('(dart|fart)', $gun4);
0
variables gun1 and gun2 contain the string dart or fart. gun4 does not. However it may be a problem that looking for word fart matches farty. To fix this, enforce word boundaries in regex.
Match literal words on the commandline with word boundaries.
el#apollo:~/foo$ phpsh
php> $gun1 = 'dart gun';
php> $gun2 = 'fart gun';
php> $gun3 = 'farty gun';
php> $gun4 = 'unicorn gun';
php> echo preg_match('(\bdart\b|\bfart\b)', $gun1);
1
php> echo preg_match('(\bdart\b|\bfart\b)', $gun2);
1
php> echo preg_match('(\bdart\b|\bfart\b)', $gun3);
0
php> echo preg_match('(\bdart\b|\bfart\b)', $gun4);
0
So it's the same as the previous example except that the word fart with a \b word boundary does not exist in the content: farty.
Using \b can yield surprising results. You would be better off figuring out what separates a word from its definition and incorporating that information into your pattern.
#!/usr/bin/perl
use strict; use warnings;
use re 'debug';
my $str = 'S.P.E.C.T.R.E. (Special Executive for Counter-intelligence,
Terrorism, Revenge and Extortion) is a fictional global terrorist
organisation';
my $word = 'S.P.E.C.T.R.E.';
if ( $str =~ /\b(\Q$word\E)\b/ ) {
print $1, "\n";
}
Output:
Compiling REx "\b(S\.P\.E\.C\.T\.R\.E\.)\b"
Final program:
1: BOUND (2)
2: OPEN1 (4)
4: EXACT (9)
9: CLOSE1 (11)
11: BOUND (12)
12: END (0)
anchored "S.P.E.C.T.R.E." at 0 (checking anchored) stclass BOUND minlen 14
Guessing start of match in sv for REx "\b(S\.P\.E\.C\.T\.R\.E\.)\b" against "S.P
.E.C.T.R.E. (Special Executive for Counter-intelligence,"...
Found anchored substr "S.P.E.C.T.R.E." at offset 0...
start_shift: 0 check_at: 0 s: 0 endpos: 1
Does not contradict STCLASS...
Guessed: match at offset 0
Matching REx "\b(S\.P\.E\.C\.T\.R\.E\.)\b" against "S.P.E.C.T.R.E. (Special Exec
utive for Counter-intelligence,"...
0 | 1:BOUND(2)
0 | 2:OPEN1(4)
0 | 4:EXACT (9)
14 | 9:CLOSE1(11)
14 | 11:BOUND(12)
failed...
Match failed
Freeing REx: "\b(S\.P\.E\.C\.T\.R\.E\.)\b"
For Those who want to validate an Enum in their code you can following the guide
In Regex World you can use ^ for starting a string and $ to end it. Using them in combination with | could be what you want :
^(Male)$|^(Female)$
It will return true only for Male or Female case.
If you are doing it in Notepad++
[\w]+
Would give you the entire word, and you can add parenthesis to get it as a group. Example: conv1 = Conv2D(64, (3, 3), activation=LeakyReLU(alpha=a), padding='valid', kernel_initializer='he_normal')(inputs). I would like to move LeakyReLU into its own line as a comment, and replace the current activation. In notepad++ this can be done using the follow find command:
([\w]+)( = .+)(LeakyReLU.alpha=a.)(.+)
and the replace command becomes:
\1\2'relu'\4 \n # \1 = LeakyReLU\(alpha=a\)\(\1\)
The spaces is to keep the right formatting in my code. :)
use word boundaries \b,
The following (using four escapes) works in my environment: Mac, safari Version 10.0.3 (12602.4.8)
var myReg = new RegExp(‘\\\\b’+ variable + ‘\\\\b’, ‘g’)
Get all "words" in a string
/([^\s]+)/g
Basically ^/s means break on spaces (or match groups of non-spaces)
Don't forget the g for Greedy
Try it:
"Not the answer you're looking for? Browse other questions tagged regex word-boundary or ask your own question.".match(/([^\s]+)/g)
→ (17) ['Not', 'the', 'answer', "you're", 'looking', 'for?', 'Browse', 'other', 'questions', 'tagged', 'regex', 'word-boundary', 'or', 'ask', 'your', 'own', 'question.']

How to capture part of a sentence that starts with a verb and finishes with nouns

I am trying to use NLTK package to capture the following chunk in a sentence:
verb + smth + noun
or it may be
verb + smth + noun + and + noun
I truthfully spent entire day messing with regex, but still nothing proper is produced..
I was looking at this tutorial which wasn't much of help.
When you have an idea of what those somethings that might come in between are, there is a relatively easy method using NLTK's CFG. This is most certainly not the most efficient way. For a comprehensive analysis, consult NLTK's book on chapter 8.
We have two patterns as you mentioned:
<verb> ... <noun>
<verb> ... <noun> "and" <noun>
We should assemble a list of VPs and NPs and also the range of possible words that could happen in between. As a silly little example:
grammar = nltk.CFG.fromstring("""
% start S
S -> VP SOMETHING NP
VP -> V
SOMETHING -> WORDS SOMETHING
SOMETHING ->
NP -> N 'and' N
NP -> N
V -> 'told' | 'scolded' | 'loved' | 'respected' | 'nominated' | 'rescued' | 'included'
N -> 'this' | 'us' | 'them' | 'you' | 'I' | 'me' | 'him'|'her'
WORDS -> 'among' | 'others' | 'not' | 'all' | 'of'| 'uhm' | '...' | 'let'| 'finish' | 'certainly' | 'maybe' | 'even' | 'me'
""")
Now suppose this is the list of the sentences we want to use our filter against:
sentences = ['scolded me and you', 'included certainly uhm maybe even her and I', 'loved me and maybe many others','nominated others not even him', 'told certainly among others uhm let me finish ... us and them', 'rescued all of us','rescued me and somebody else']
As you can see, the third and the last phrases don't pass the filter. We can check whether the rest match the pattern:
def sentence_filter(sent, grammar):
rd_parser = nltk.RecursiveDescentParser(grammar)
try:
for p in rd_parser.parse(sent):
print("SUCCESS!")
except:
print("Doesn't match the filter...")
for s in sentences:
s = s.split()
sentence_filter(s, grammar)
When we run this, we get this result:
>>>
SUCCESS!
SUCCESS!
Doesn't match the filter...
SUCCESS!
SUCCESS!
SUCCESS!
Doesn't match the filter...
>>>

How do you exclude a specific character pattern with regular expressions

I working with some regular expression matching and I'm trying to figure out how you would exclude a specific character pattern. Specifically, I want to exclude the following pattern:
5 - #in words: digit, space, dash & space)
I know how to exclude the components individually: [^5 ^-] but I'm looking to exclude the specific pattern. Is this possible?
Update - I'm using Ruby as my programming language.
Here is some sample input and desired output.:
Input: 1 - Blue-Stork Stables; 2 - Young, Robert, S.; 3 - Seahorse Stable; 4 - Carney, Elvis; 5 - Guerrero, Juan, Carlos-Martin; 6 - Dubb, Michael; 7 - Summers, Hope; 8 - DTH Stables; 9 - Peebles, Matthew\n
the desired output would be:
Output: Blue-Stork Stables; Young, Robert, S.; Seahorse Stable; Carney, Elvis; Guerrero, Juan, Carlos-Marting; Dubb, Michael; Summers, Hope; DTH Stables; Peebles, Matthew\n
Please take note of the dashes on Blue-Stork Stables and Juan Carlos-Martin.
EDIT: So you mean "remove", not "exclude". No problem:
result = subject.gsub(/\d+ - /, '')
transforms your input into the desired output. I've taken the liberty to allow more than one digit (after all, if numbers reach 10 or higher, you probably want to remove those entirely, too. Right?).
(Old answer for "historical reasons")
Depending on what you mean by "exclude", it appears that you're looking for negative lookahead assertions:
^(?!.*\d - )
will fail on strings that contain 5 - anywhere and succeed on all other strings:
"5 - " // fail
"5 -" // match
"abc5 - xyz" // fail
"foobar5 - " // fail

Resources