How to check if a string contains any form of an apostrophe, not just the single quote, in Ruby

I am using the following code to check if a string contains an apostrophe:
string.scan(/’|'/)
I have included two types of single quotation mark because I found that using just the standard ' did not catch some strings that contain an apostrophe written as ’.
My concern is that if I am checking strings that may contain other fonts or styles, my regex won't catch the apostrophe.
Is there a more general approach that would catch all forms of an apostrophe?

The straight single quote is the generic vertical quotation mark:
straight single quote (')
Curly quotes are the quotation marks used in good typography. There are two curly single-quote characters:
the opening single quote (‘)
the closing single quote (’)
Going by the above three variants, you may try this:
string.scan(/['‘’]/)
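A quick check of what that character class matches. The sample string is made up for illustration. scan returns every match as an array, which is handy if you want to count the apostrophes rather than just detect one.

```ruby
# Sample text mixing straight and curly single quotes (made up for illustration).
text = "It’s Bob's ‘book’"

# scan collects every match of the character class into an array.
matches = text.scan(/['‘’]/)

puts matches.length               # => 4
puts matches.any?                 # => true
puts "its".scan(/['‘’]/).empty?   # => true
```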

Those would probably be the most common ones:
/[‘’']/
If you just need to check whether a string matches a regex, you shouldn't use scan:
"apostrophe's" =~ /[‘’']/ #=> 10
=~ stops at the first match and returns its index (or nil if there is no match).
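If you want to cover more of the characters that show up as apostrophes in real text, you can widen the class. The extra code points below (U+201B single high-reversed-9 quotation mark, U+02BC modifier letter apostrophe, U+FF07 fullwidth apostrophe) are an assumption about what your data may contain, not an exhaustive list, and the helper name contains_apostrophe? is hypothetical. String#match? (Ruby 2.4+) returns a plain boolean without building MatchData, which suits a yes/no check.

```ruby
# Apostrophe-like characters: ASCII ', U+2018, U+2019, U+201B, U+02BC, U+FF07.
# This set is an assumption; add or remove characters to suit your data.
APOSTROPHE_LIKE = /['‘’‛ʼ＇]/

# Hypothetical helper: true if the string contains any apostrophe-like character.
def contains_apostrophe?(str)
  str.match?(APOSTROPHE_LIKE)
end

puts contains_apostrophe?("it's")   # => true
puts contains_apostrophe?("it’s")   # => true
puts contains_apostrophe?("itʼs")   # => true
puts contains_apostrophe?("its")    # => false

# =~ still returns the index of the first match (or nil).
puts "apostrophe's" =~ APOSTROPHE_LIKE  # => 10
```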

Related

How to use escaped colon in thymeleaf th:text?

This is what happens when I try to use a colon in th:text:
and a backslash doesn't seem to fix it:
How can I use the colon symbol in th:text?
If you want to place a literal into th:text, you have to use single quotes: th:text="'7:00AM'". See the Thymeleaf documentation on literals.
(By contrast, something like th:text="7_00AM" is valid because it is a literal token. Such tokens can only use a subset of characters, but they do not need enclosing single quotes.)

Ruby regex syntax for "not matching one of the following"

Nice simple regex syntax question for you.
I have a block of text and I want to find instances of href=" or href=' that are NOT followed by either [ or http://
I can get "not followed by [" with
record.body =~ /href=['"](?!\[)/
and i can get "not followed by http://" with
record.body =~ /href=['"](?!http\:\/\/)/
But I can't quite work out how to combine the two.
Just to be clear: I want to find bad strings like this
`href="www.foo.com"`
but I'm OK with (i.e. don't want to find) strings like this
`href="http://www.foo.com"`
`href="[registration_url]"`
Combine both by using the alternation operator:
href=['"](?!http\:\/\/|\[)
To be more specific:
href=(['"])(?!http\:\/\/|\[)(?:(?!\1).)*\1
This handles both single-quoted and double-quoted values in the href attribute, and it won't match strings like href='foo.com" or href="foo.com' (mismatched quotes).
(['"]) captures either a double or a single quote. (?!http\:\/\/|\[) asserts that the captured quote is not followed by http:// or [; if that check passes, matching moves on to the next part of the pattern. (?:(?!\1).)* matches any character other than the captured quote, zero or more times, and \1 then matches the closing quote, which must be the same character as the opening one.
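A small Ruby check of the quote-matching pattern against the examples from the question; the HTML snippet is made up for illustration. Because the pattern contains a capture group, a plain scan would return only the captured quote, so this sketch collects the full match from Regexp.last_match inside a block.

```ruby
body = <<~HTML
  <a href="www.foo.com">bad</a>
  <a href="http://www.foo.com">ok</a>
  <a href="[registration_url]">ok</a>
  <a href='www.bar.com'>bad</a>
HTML

# Opening quote is captured as \1 and must not be followed by http:// or [;
# the value then runs up to the matching closing quote.
pattern = /href=(['"])(?!http:\/\/|\[)(?:(?!\1).)*\1/

bad = []
body.scan(pattern) { bad << Regexp.last_match[0] }

p bad  # => ["href=\"www.foo.com\"", "href='www.bar.com'"]

# The shorter lookahead-only version is enough for a yes/no check:
p body =~ /href=['"](?!http:\/\/|\[)/  # => 3
```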
Use an alternation with the pipe symbol | to combine the lookahead conditions:
(?!http\:\/\/|\[)
So, to match the hrefs, you can use the following regex:
href=\"((?!http\:\/\/|\[)[^\"]+?)\"
See demo on Rubular.com.
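Applied in Ruby, the capture group in this pattern holds the attribute value itself, so scan returns just the offending values. A sketch with a made-up input string; note that this pattern only handles double-quoted attributes.

```ruby
body = '<a href="www.foo.com">x</a> <a href="http://a.com">y</a> <a href="[reg]">z</a>'

# Group 1 captures the value when it does not start with http:// or [.
# scan returns one array of captures per match, hence the flatten.
urls = body.scan(/href="((?!http:\/\/|\[)[^"]+?)"/).flatten

p urls  # => ["www.foo.com"]
```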

Lookahead containing the same token as left/right anchors

Got a variation of the classic "regex quoted strings" problem. I need to pick out strings that look like this:
"foo bar bar"
from a long string like this
token token "maybe quoted token that can also contain spaces"
Each of the tokens can be quoted or unquoted (this is easy to take care of using alternating groups), but sometimes I have quoted strings that contain literal quotes (not escaped in any way). The only usable thing is that those inner quotes never have spaces on either side, since that would create a delimiter. Those tokens look like this: "foo-bar"baz"
My initial thought was /"(?:[^"]|" )*"/ but that doesn't seem to work because a token like this: "here is some"quotes" gets split in two.
How should I do this? Platform is Ruby 2.1
Use this:
"(?:[^"]|"\w)+"
or
"(?:[^"]|"\S)+"
You can play with sample strings in the regex demo.
Explanation
" matches the opening quote
The non-capturing group (?:[^"]|"\w) matches...
One [^"] non-quote character, OR |
One quote and a word character "\w
+ one or more times
" closing quote
Further Refinements
If you want to allow quotes in other contexts, for instance escaped quotes, just add them to the alternation:
"(?:\\"|[^"]|"\w)+"
To allow quotes to be followed not just by a word char but any non-space:
"(?:\\"|[^"]|"\S)+"
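To try the \S variant in Ruby, here is a sketch on a made-up line that combines the question's two example tokens:

```ruby
line = 'token token "here is some"quotes" "foo-bar"baz" plain'

# A quote inside the token is allowed only when a non-space character follows it;
# a quote followed by a space (or the end) closes the token.
quoted = line.scan(/"(?:[^"]|"\S)+"/)

p quoted  # => ["\"here is some\"quotes\"", "\"foo-bar\"baz\""]
```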
This one may also suit your needs:
".*?"(?!\S)
To match also non-quoted tokens:
".*?"(?!\S)|\S+
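The lazy pattern with a negative lookahead can split a whole line into tokens, quoted or not. A sketch on a made-up line that combines the question's examples:

```ruby
line = 'token token "here is some"quotes" "foo-bar"baz" plain'

# A quoted token ends at the first quote that is NOT followed by a non-space
# character; otherwise fall back to a run of non-space characters.
tokens = line.scan(/".*?"(?!\S)|\S+/)

p tokens
# => ["token", "token", "\"here is some\"quotes\"", "\"foo-bar\"baz\"", "plain"]
```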

Interpolation within single quotes

How can I perform interpolation within single quotes?
I tried something like this but there are two problems.
string = 'text contains "#{search.query}"'
It doesn't work.
I need the final string to have the dynamic content wrapped in double quotes like so:
'text contains "candy"'
Probably seems strange but the gem that I'm working with requires this.
You can use %{text contains "#{search.query}"}, which interpolates like a double-quoted string, if you don't want to escape the double quotes as in "text contains \"#{search.query}\"".
Another option is a format string with named references, filled in by String#%:
'Hi, %{professor}, You have been invited for %{booking_type}. You may accept, reject or keep discussing more about this offer' % {professor: 'Mr. Ashaan', booking_type: 'Dinner'}
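Use
%Q(text contains "#{search.query}")
%Q(...) behaves like a double-quoted string, so it interpolates, and the double quotes inside it need no escaping. Note that the lowercase %q(...) behaves like a single-quoted string and does not interpolate, so %q(text contains "#{search.query}") would keep #{search.query} literally. %Q works just as well if you want single quotes inside:
%Q(text contains '#{text}')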
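A runnable comparison of these options. The search object is a hypothetical Struct standing in for the asker's object; only its query method matters here.

```ruby
# Hypothetical stand-in for the asker's search object.
Search = Struct.new(:query)
search = Search.new("candy")

escaped   = "text contains \"#{search.query}\""
percent   = %{text contains "#{search.query}"}
upper_q   = %Q(text contains "#{search.query}")
lower_q   = %q(text contains "#{search.query}")  # single-quote rules: no interpolation
formatted = 'text contains "%{query}"' % { query: search.query }

puts escaped   # text contains "candy"
puts lower_q   # text contains "#{search.query}"
puts [escaped, percent, upper_q, formatted].uniq.size  # => 1
```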

Backslash + captured group within Ruby regular expression

How do I escape a backslash before a captured group?
Example:
"foo+bar".gsub(/(\+)/, '\\\1')
What I expect (and want):
foo\+bar
what I unfortunately get:
foo\\1bar
How do I escape here correctly?
As others have said, you need to escape everything in that string twice. So in your case the solution is to use '\\\\\1' or '\\\\\\1'. But since you asked why, I'll try to explain that part.
The reason is that the replacement string is parsed twice: once by Ruby and once by the underlying regular expression engine, for which \1 is its own escape sequence. (It's probably easier to understand with double-quoted strings, since single quotes introduce an ambiguity where '\\1' and '\1' are equivalent but '\' and '\\' are not.)
So for example, a simple replacement here with a captured group and a double quoted string would be:
"foo+bar".gsub(/(\+)/, "\\1") #=> "foo+bar"
This passes the string \1 to the regexp engine, which it understands as a reference to a capture group. In Ruby string literals, "\1" means something else entirely (ASCII character 1).
What we actually want in this case is for the regexp engine to receive \\\1. It also understands \ as an escape character, so \\1 is not sufficient and will simply evaluate to the literal output \1. So, we need \\\1 in the regexp engine, but to get to that point we need to also make it past Ruby's string literal parser.
To do that, we take our desired regexp input and double every backslash again to get through Ruby's string literal parser. \\\1 therefore requires "\\\\\\1". With single quotes, one backslash can be omitted, because \1 is not a valid escape sequence in single quotes and is kept literally.
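These escaping rules can be checked directly. The last variant uses the block form of gsub, which is a different technique from the one in this answer: a block's return value is inserted as-is, so only Ruby's own string escaping applies and the regexp engine never sees a \1 to interpret.

```ruby
input = "foo+bar"

# What the question tried: the engine receives \\1, i.e. an escaped backslash and a literal 1.
p input.gsub(/(\+)/, '\\\1')       # => "foo\\1bar"

# The engine receives \\\1: an escaped backslash followed by the group reference.
p input.gsub(/(\+)/, '\\\\\1')     # => "foo\\+bar"
p input.gsub(/(\+)/, "\\\\\\1")    # => "foo\\+bar"

# Block form: the return value is used literally, no backreference parsing.
p input.gsub(/(\+)/) { "\\#{$1}" } # => "foo\\+bar"

puts input.gsub(/(\+)/, '\\\\\1')  # prints foo\+bar
```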
Addendum
One of the reasons this problem is usually hidden is the use of /.+/ style regexp literals, which Ruby treats in a special way to avoid the need to double-escape everything. (Of course, this doesn't apply to gsub replacement strings.) But you can still see it in action if you pass a string literal instead of a regexp literal to Regexp.new:
Regexp.new("\.").match("a") #=> #<MatchData "a">
Regexp.new("\\.").match("a") #=> nil
As you can see, we had to double-escape the . for the regexp engine to treat it as a literal dot: "." and "\." both evaluate to . in a double-quoted string, but the engine itself needs to receive a backslash followed by a dot.
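One way to see exactly what the regexp engine receives is Regexp#source. The Regexp.escape line is an addition here, not part of the answer: it builds the escaped source from a literal string for you.

```ruby
# "\." is just "." in a double-quoted string, so the engine sees an unescaped dot.
p Regexp.new("\.").source          # => "."
# "\\." reaches the engine as \. (a literal dot).
p Regexp.new("\\.").source         # => "\\."
# Single quotes keep the backslash, so '\.' works too.
p Regexp.new('\.').source          # => "\\."

p Regexp.new("\.").match?("a")     # => true
p Regexp.new("\\.").match?("a")    # => false

# Regexp.escape escapes metacharacters in a literal string.
p Regexp.escape(".")               # => "\\."
```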
This happens because of double escaping. You need five backslashes in this case:
"foo+bar".gsub(/([+])/, '\\\\\1')
Adding two more backslashes escapes this properly:
irb(main):011:0> puts "foo+bar".gsub(/(\+)/, '\\\\\1')
foo\+bar
=> nil
