I'm trying to build a regular expression with Rubular to match either:
On Feb 23, 2011, at 10:22 , James Bond wrote:
OR
On Feb 23, 2011, at 10:22 AM , James Bond wrote:
Here's what I have so far, but for some reason it's not matching. Any ideas?
(On.* (?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) \d{1,2}, [12]\d{3}.* at \d{1,2}:\d{1,2} (?:AM|PM),.*wrote:)
How can I make the AM/PM text optional? Either match AM/PM or neither?
This seems to catch the date info. I purposely captured in groups, making it easier to build a real date:
regex = /^On (\w+ \d+, \d+), \w+ (\S+) (\w*)\s*,/
[
'On Feb 23, 2011, at 10:22 , James Bond wrote:',
'On Feb 23, 2011, at 10:22 AM , James Bond wrote:'
].each do |str|
  # On a successful match, $1 is the date, $2 the time, $3 the AM/PM (empty when absent)
  str =~ regex
  puts "#{$1} #{$2} #{$3}"
end
# >> Feb 23, 2011 10:22
# >> Feb 23, 2011 10:22 AM
I purposely didn't try to match on the month names. Your sample strings look like quote headers from email messages. Those are very standard and generated by software, so you should see a lot of consistency in the format, which allows some simplification in the regex. If you can't trust them, then match on the month name abbreviations to help avoid false positives. The same applies to the day, year, and time values.
The important thing in the regex is how to deal with the AM/PM when it's missing.
Maybe this:
(On\s+(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s+\d{1,2},\s+[12]\d{3},\s+at\s+\d{1,2}:\d{1,2}\s+(?:AM|PM)*,.*wrote:)
However, if you can verify that only these lines have this shape, you don't need such an elaborate regex. If they always start with "On" and end with "wrote:", your regex might simply be /^On.*wrote:/
Just use the question mark operator after any group you want to be optional, so in this case:
(?:(?:AM|PM) )?
Be sure to match the space as well; otherwise the strings without AM/PM would need to include two spaces. The solution with (?:AM|PM)* would also match AMAMPM, so that's probably not what you want. But why do you match those groups without creating backreferences? Aren't you going to use the values?
For info on backreferences:
http://www.regular-expressions.info/brackets.html
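As a rough sketch of the optional-group advice, here is the question's pattern with the AM/PM part (and its trailing space) wrapped in an optional group. It assumes the header always has a space before the comma, as in the two sample lines; the names QUOTE_HEADER and meridiem are made up for illustration.

```ruby
# Hypothetical pattern: the question's regex with "(?:(AM|PM) )?" in place of
# the mandatory "(?:AM|PM)". The inner group captures AM/PM when it is present.
QUOTE_HEADER = /\AOn (?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) \d{1,2}, [12]\d{3}, at \d{1,2}:\d{2} (?:(AM|PM) )?, .*wrote:\z/

# Returns "AM" or "PM", nil when the header has no AM/PM,
# or :no_match when the line is not a quote header at all.
def meridiem(line)
  m = QUOTE_HEADER.match(line)
  m ? m[1] : :no_match
end

p meridiem('On Feb 23, 2011, at 10:22 , James Bond wrote:')        # => nil
p meridiem('On Feb 23, 2011, at 10:22 AM , James Bond wrote:')     # => "AM"
p meridiem('On Feb 23, 2011, at 10:22 AMAMPM , James Bond wrote:') # => :no_match
```

Because the group is optional as a whole (`?`) rather than repeatable (`*`), a run such as AMAMPM is rejected.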
I use TextExpander on Windows and have googled for a solution to my problem. All the solutions I've seen so far are written in AppleScript, which doesn't work on Windows.
The format I need looks like this:
Monday, 1st August 2016
Tuesday, 2nd August 2016
Wednesday, 3rd August 2016
Thursday, 4th August 2016
and so on. Notice the parts in bold.
I've tried using the date/time tools included with TextExpander but the st, nd, rd and th ordinals are not included.
I don't know any scripting at all. Is there a script for this long date format that will work on Windows?
If you leave out the conditional "st", "nd", "rd", or "th" after the date, then:
%A, %d %B %Y
In this example I made the Day of the week bold to demonstrate how the result will show up bold as well. Of course it is only an option.
If you do want the extra suffix with nothing bold, then:
%A, %d%fillpopup:name=Suffix:default=:st:nd:rd:th% %B %Y
I hope this helps!
I am still struggling with some Ruby regex syntax despite the extensive documentation online. I have an array of strings and I am looking for the strings that include a number (with any number of digits) but not a specific kind of number (for instance, years from 19XX to 201X).
I managed to get the regex for "the line contains a number":
.*\p{N}.*
I also managed to get "exclude the line if this number is a year":
(?!19\d\d|20[0-1]\d)\d{4}
But I fail to combine the two. I would need something that would intuitively be written like this:
(.*\p{N}.*)&&(?!19\d\d|20[0-1]\d)\d{4}
But I am not sure how an AND operator can be used.
Here it is:
^(?!.*19\d\d.*)(?!.*20[01]\d.*)(.*\p{N}.*)$
You want a string that:
(?!.*19\d\d.*) doesn't contain 19xx
(?!.*20[01]\d.*) doesn't contain 200x or 201x
(.*\p{N}.*) contains at least one digit
In a regex pattern, && simply matches the literal characters &&; it is not an AND operator.
If you want to capture numbers that are not in the range 1900-2019 you can replace with:
(?!\b19\d\d\b)(?!\b20[01]\d\b)(\b\p{N}+\b)
You can test it here
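A minimal sketch of how the combined pattern could be used to filter an array in Ruby. The sample strings and the constant name NUMBER_BUT_NO_YEAR are invented for illustration.

```ruby
# The answer's pattern: each negative lookahead is an "and not" condition
# checked from the start of the string; the final group requires a digit.
NUMBER_BUT_NO_YEAR = /^(?!.*19\d\d.*)(?!.*20[01]\d.*)(.*\p{N}.*)$/

# Hypothetical sample data
lines = [
  'room 42',
  'born in 1985',
  'released 2014',
  'no digits here',
  'version 3',
  'item 20000'
]

# Array#grep keeps the elements the pattern matches
kept = lines.grep(NUMBER_BUT_NO_YEAR)
p kept # => ["room 42", "version 3"]
```

Note that 'item 20000' is rejected too: without word boundaries, the lookahead sees "2000" inside the longer number. The \b variant shown above is the one to reach for if that matters.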
While the solution by Thomas is probably the best one, another option is to go without negation and just select everything that matches:
re = /\D(
[03-9]\d*|
(?:1|2|20)(?=\D)|
1[0-8]\d*|
19\d?(?=\D)|
19\d{3,}|
20[2-9]\d*|
20[01]?(?=\D)|
20[01]\d{2,}
)/x
▶ 'Here 2014 and 1945 and 1878 and 20000 and 2 and 19 and 195 and 203'.scan re
#⇒ [["1878"], ["20000"], ["2"], ["19"], ["195"], ["203"]]
What would be a good practice to match strings while the user is typing them in?
I would like to take this regex for matching a time as an example:
/(\d\d?)(:\d\d?)?\s?(am|pm)?/ matches 12:30 am, 9 pm, 23:00 perfectly fine.
It only becomes tricky when you want to provide feedback while the user is typing in that string.
12: is not matched, and neither is 12:30 p.
My solution would be to use a second, independent regex for matching incomplete strings, one that includes all the input possibilities:
/(\d\d?)(:\d\d?|:)?\s?(|a|p|am|pm)?/ will match 12: and 12:30 p just fine.
Is there a better, more elegant way, to do this?
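For illustration only, here is how the two-regex approach from the question could be wired up in Ruby. The \A and \z anchors are an addition (the patterns in the question are unanchored), and the names COMPLETE, PARTIAL, and classify are hypothetical.

```ruby
# The question's two patterns, anchored so the whole input must match.
COMPLETE = /\A(\d\d?)(:\d\d?)?\s?(am|pm)?\z/
PARTIAL  = /\A(\d\d?)(:\d\d?|:)?\s?(|a|p|am|pm)?\z/

# Classifies what the user has typed so far:
#   :complete - a full time such as "12:30 am"
#   :partial  - a prefix that could still become a full time, such as "12:"
#   :invalid  - anything else
def classify(input)
  return :complete if COMPLETE.match?(input)
  return :partial  if PARTIAL.match?(input)
  :invalid
end

['12:30 am', '9 pm', '23:00', '12:', '12:30 p', '12:30 x'].each do |input|
  puts "#{input.inspect} -> #{classify(input)}"
end
```

Checking the strict pattern first means a finished input is never reported as merely partial.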
I'm trying to write a negative lookahead for a regular expression that will ignore lines of boilerplate in the text of an email, specifically the bit that goes like:
> On Sat, Apr 27, 2013 at 11:39 PM, Jane Smith <jane.smith#example.com> wrote:
I want to match all the digits that are not in my negative lookahead. I tried this:
(?!(?:^>?*\sOn\s.*wrote:\s?)$)\d
But that always matches inside that line. I'm particularly confused because this regex:
(?:^>?*\sOn\s.*wrote:\s?)$
matches that entire line. Obviously I'm missing something, but I have no idea what it is. Thanks for any help.
Try this pattern, but don't forget to remove the empty matches:
> On .*+\n> wrote:|(\d++)
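One likely reason the lookahead in the question never blocks anything: it is evaluated at the position of each digit, and ^ cannot match in the middle of a line, so the negative lookahead always succeeds there. A different route, sketched below, is to drop the header lines first and then scan what remains. The header pattern is a simplified assumption based on the sample line, and the names QUOTE_HEADER_LINE and digits_outside_headers are made up.

```ruby
# Hypothetical, simplified header pattern: optional ">" markers, then
# "On ... wrote:" filling the rest of the line.
QUOTE_HEADER_LINE = /\A>*\s*On\s.*wrote:\s*\z/

# Returns every digit in the text that is not on a quote-header line.
def digits_outside_headers(text)
  text.each_line
      .reject { |line| line.chomp.match?(QUOTE_HEADER_LINE) }
      .flat_map { |line| line.scan(/\d/) }
end

email = <<~TEXT
  > On Sat, Apr 27, 2013 at 11:39 PM, Jane Smith <jane.smith#example.com> wrote:
  > See you at 5
  Room 12
TEXT

p digits_outside_headers(email) # => ["5", "1", "2"]
```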
I have the following types of strings.
BILL SMITH (USA)
WINTHROP (FR)
LORD AT WAR (GB)
KIM SMITH
With these strings, I have the following constraints:
1. all caps
2. can be 2 to 18 characters long
3. should not have any white spaces or carriage returns at the end
4. the country abbreviation inside the parens should be excluded
5. some of the names will not have the country in parens and they should be matched too
After applying my regular expression I'd like to get the following:
BILL SMITH (USA) => BILL SMITH
WINTHROP (FR) => WINTHROP
LORD AT WAR (GB) => LORD AT WAR
KIM SMITH => KIM SMITH
I came up with the following regular expression but I'm not getting any matches:
String.scan(\([A-Z \s*]{1,18})(^?!(\([A-Z]{1,3}\)))\)
I've been banging my head on this for a while, so if anyone can point out the error I'd appreciate it.
UPDATE:
I've gotten some great responses; however, so far none of the regular expression solutions have met all the constraints. The tricky part seems to be that some of the strings have the country in parentheses and some don't. In one case, strings without the country were not being matched, and in another the regex returned the correct string along with the country abbreviation without the parentheses. (See the comments on the second response.) One point of clarification: everything I will be matching is at the start of the string. Not sure if that helps or not. Thanks again for all your help.
Here's one solution:
^((?:[A-Z]|\s){2,18}+?)(?:\s\([A-Z]+\))?$
See it on Rubular. Note that it counts 18 characters before the parenthesis; I'm not sure how you want it to behave specifically. If you want to make sure the whole line isn't more than 18 characters, I suggest you just do unless line.length < 18 ... Similarly, if you want to make sure there is no whitespace at the end, I recommend you use line.strip. That will greatly reduce the complexity of the Regexp you need and make your code more readable.
Edit: also works when no parentheses are used after the name.
The biggest error is that you wrote (^?!...) where you meant (?=...). The former means "an optional start-of-line anchor, followed by !, followed by ..., inside a capture group"; the latter means "a position in the string that is followed by ...". Fixing that, as well as making a few other tweaks and adding the requirement that the initial string end with a letter, we get:
([A-Z\s]{1,17}[A-Z])(?=\s*\([A-Z]{1,3}\))
Update based on OP comments: Since this will always match at the start of a string, you can use \A to "anchor" your pattern to the start of the string. You can then get rid of the lookahead assertion. This:
\A[A-Z][A-Z\s]{0,16}[A-Z]
matches start-of-string, followed by an uppercase letter, followed by up to 16 characters that are either uppercase letters or whitespace characters, followed by an uppercase letter.
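A quick sketch of the anchored pattern applied to the four sample strings from the question. The constant name NAME is an assumption for illustration.

```ruby
# Anchored pattern from the update: a letter, up to 16 letters or whitespace
# characters, then a final letter (2 to 18 characters in total).
NAME = /\A[A-Z][A-Z\s]{0,16}[A-Z]/

samples = ['BILL SMITH (USA)', 'WINTHROP (FR)', 'LORD AT WAR (GB)', 'KIM SMITH']

# String#[] with a Regexp returns the matched text, or nil if there is no match
samples.each { |s| puts "#{s} => #{s[NAME]}" }
# BILL SMITH (USA) => BILL SMITH
# WINTHROP (FR) => WINTHROP
# LORD AT WAR (GB) => LORD AT WAR
# KIM SMITH => KIM SMITH
```

Because the match must end on a letter, the space before the opening parenthesis is never included. One thing to watch: a name longer than 18 characters is not rejected, it is cut off after 18 characters.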
You can also just use gsub to remove the part(s) you don't want. To remove everything in parentheses you could do:
str.gsub(/\s*\([^)]*\)/, '')
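Putting the gsub idea together with the constraints from the question could look roughly like this. The helper name clean_name and the choice to return nil for invalid input are assumptions, not part of the original answers.

```ruby
# Strips a parenthesised suffix and surrounding whitespace, then checks the
# question's constraints: uppercase letters and spaces only, 2 to 18 characters.
# Returns the cleaned name, or nil if the constraints are not met.
def clean_name(str)
  name = str.gsub(/\s*\([^)]*\)/, '').strip
  name.match?(/\A[A-Z ]{2,18}\z/) ? name : nil
end

p clean_name('BILL SMITH (USA)') # => "BILL SMITH"
p clean_name('KIM SMITH  ')      # => "KIM SMITH"
p clean_name('bill smith (USA)') # => nil
```

Splitting the work into "remove what you don't want" and "validate what is left" keeps both regular expressions short.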