Flex regular expression in multiple match condition

I can use a flex regular expression to match "hello world" including its double quotes. The regular expression is ["][^"\n]*["], and from that match I can extract the hello world string without the double quotes. However, what happens if the string is "hello"""world"? How can I use a flex regular expression to get hello"""world as the result? Thanks.

You can solve this by allowing a pair of double quotes as an alternative to any single character in the string. At the moment, a character in the string is [^"\n]. To also allow a doubled quote, write [^"\n]|\"\". A complete example flex program would look like this:
%option noyywrap
stringchar ([^\"\r\n]|\"\")
ws [ \t\r\n]+
%%
{ws} printf("Whitespace: \"%s\"\n", yytext);
\"{stringchar}*\" printf("String: %s\n", yytext);
[^ \t\r\n]+ printf("Not a string: \"%s\"\n", yytext);
%%
int main(void) { return yylex(); }
Note also that flex has trouble matching adjoining tokens (see "How to make lex/flex recognize tokens not separated by whitespace?" for details). Because flex always picks the longest match, "Hello"World" is matched as a single "Not a string" token rather than as a String followed by a "Not a string".
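For comparison, the same idea ("a string character is either a non-quote or a doubled quote") can be written as an ordinary Ruby regexp. This is a minimal sketch that assumes a doubled quote ("") is the only escape inside a string; STRING_RE and unquote are illustrative names, not part of flex.

```ruby
# A string character is either anything except a quote or line break,
# or a doubled quote "" (mirrors the flex definition stringchar).
STRING_RE = /"(?:[^"\r\n]|"")*"/

# Hypothetical helper: strip the outer quotes and collapse each "" to ".
def unquote(token)
  token[1...-1].gsub('""', '"')
end

m = STRING_RE.match('say "a ""quoted"" word" now')
puts m[0]           # prints "a ""quoted"" word"
puts unquote(m[0])  # prints a "quoted" word
```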

Related

How to check if a string contains any form of an apostrophe, not just the single quote, in Ruby

I am using the following code to check if a string contains an apostrophe:
string.scan(/’|'/)
I included two kinds of single quotation mark because I found that using just the standard ' did not catch some strings that use ’ as the apostrophe.
My concern is that if I check strings that use other fonts or styles, my regex won't catch the apostrophe.
Is there a more general approach that would catch all forms of an apostrophe?
The straight single quote is the generic vertical quotation mark:
straight single quote (')
Curly quotes are the quotation marks used in good typography. There are two curly single quote characters:
the opening single quote (‘)
the closing single quote (’)
Going by the above three variants, you may try this:
string.scan(/['‘’]/)
Those would probably be the most common ones:
/[‘’']/
If you just need to check whether a string matches a regex, you shouldn't use scan:
"apostrophe's" =~ /[‘’']/ #=> 10
=~ stops at the first match and returns its index.
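Ruby regexps have no built-in class meaning "any apostrophe", so one practical approach is to list the code points you care about explicitly. The set below is an assumption (it adds U+02BC and U+FF07 to the three characters above), and match? requires Ruby 2.4 or later; apostrophe? and normalize_apostrophes are hypothetical helper names.

```ruby
# Assumed set of apostrophe-like characters; extend or trim as needed:
#   U+0027 APOSTROPHE
#   U+2018 LEFT SINGLE QUOTATION MARK
#   U+2019 RIGHT SINGLE QUOTATION MARK
#   U+02BC MODIFIER LETTER APOSTROPHE
#   U+FF07 FULLWIDTH APOSTROPHE
APOSTROPHE_RE = /[\u0027\u2018\u2019\u02BC\uFF07]/

# match? returns true/false, stops at the first match and does not set $~.
def apostrophe?(str)
  str.match?(APOSTROPHE_RE)
end

# Replace every listed variant with a plain ASCII apostrophe.
def normalize_apostrophes(str)
  str.gsub(APOSTROPHE_RE, "'")
end

puts apostrophe?("it\u2019s")            # prints true
puts apostrophe?("its")                  # prints false
puts normalize_apostrophes("it\u2019s")  # prints it's
```

Normalizing once on input keeps later checks simple, since only the straight ' needs to be matched afterwards.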

Need to match a string containing "file://\\" and "report"

I need to match a string that contains both "file://\\" and "report".
If I use the regular expression (file://\\\\)(.*)\\\\report\\\\(.*), it works fine.
But if I use the expression (file://\\\\)(.*)\\report\\(.*), it gives errors.
My question is: why do I need to use four backslashes (\\\\) to match the one backslash before and after the report string?
#include <iostream>
#include <regex>
#include <string>
int main() {
    std::wstring target(L"file://\\\\Example\\report\\001");
    std::wsmatch wideMatch;
    std::wregex wrx(L"(file://\\\\)(.*)\\\\report\\\\(.*)");
    if (std::regex_match(target.cbegin(), target.cend(), wideMatch, wrx))
        std::wcout << L"The matching text is:" << wideMatch.str() << std::endl;
}
Can someone please explain? Thanks in advance.
Backslashes are special both in string literals and in regular expressions. To match a backslash in a regular expression you need to escape it by adding a second backslash. To get those two backslashes into a string literal, you must escape each of them as well, which is why you end up writing four backslashes.
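The same two layers appear in Ruby whenever a pattern is built from a string with Regexp.new, so it can serve as a quick way to experiment. A small sketch using the path from the question (variable names are illustrative); a regexp literal skips the string layer, much as a C++11 raw string literal such as LR"((file://\\)(.*)\\report\\(.*))" would.

```ruby
# The target from the question: file://\\Example\report\001
# (inside single quotes, \\ is one backslash)
target = 'file://\\\\Example\\report\\001'

# String layer: "\\\\" becomes \\, which the regex engine reads as ONE literal backslash.
via_string = Regexp.new("(file://\\\\)(.*)\\\\report\\\\(.*)")

# A regexp literal has no string layer, so two backslashes are enough.
via_literal = %r{(file://\\)(.*)\\report\\(.*)}

m = target.match(via_string)
puts m[2]                                  # prints \Example
puts m[3]                                  # prints 001
puts target.match(via_literal)[3] == m[3]  # prints true

# With only two backslashes in the source, the engine receives \r (carriage return)
# and \( (a literal paren), which leaves an unmatched ")" and raises RegexpError.
begin
  Regexp.new("(file://\\\\)(.*)\\report\\(.*)")
rescue RegexpError => e
  puts "RegexpError: #{e.message}"
end
```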

Ruby regex syntax for "not matching one of the following"

Nice simple regex syntax question for you.
I have a block of text and I want to find instances of href=" or href=' which are NOT followed by either [ or http://
I can get "not followed by [" with
record.body =~ /href=['"](?!\[)/
and I can get "not followed by http://" with
record.body =~ /href=['"](?!http\:\/\/)/
But I can't quite work out how to combine the two.
Just to be clear: I want to find bad strings like this
`href="www.foo.com"`
but I'm OK with (i.e. don't want to find) strings like this
`href="http://www.foo.com"`
`href="[registration_url]"`
Combine the two by using the alternation operator inside the lookahead:
href=['"](?!http\:\/\/|\[)
To be more specific, it would be:
href=(['"])(?!http\:\/\/|\[)(?:(?!\1).)*\1
This handles both single-quoted and double-quoted strings in the href part, and it won't match strings with unmatched quotes such as href='foo.com" or href="foo.com'.
(['"]) captures a double quote or a single quote. (?!http\:\/\/|\[) asserts that the captured quote is not followed by http:// or [; if that holds, matching moves on to the next part of the pattern. (?:(?!\1).)* matches any character other than the captured quote, zero or more times. \1 then matches the closing quote, which must be the same character that was captured.
Use an alternation with the pipe (|) symbol to combine the lookahead conditions:
(?!http\:\/\/|\[)
So, to match the hrefs, you can use the following regex:
href=\"((?!http\:\/\/|\[)[^\"]+?)\"
See the demo on Rubular.com.
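A quick way to check the combined lookahead against the strings from the question; samples is illustrative data, and match? needs Ruby 2.4 or later.

```ruby
# Combined negative lookahead: the opening quote must not be followed by
# http:// or [, and the closing quote must be the same character (\1).
BAD_HREF = /href=(['"])(?!http:\/\/|\[)(?:(?!\1).)*\1/

samples = [
  'href="www.foo.com"',         # bad: should be reported
  'href="http://www.foo.com"',  # fine
  'href="[registration_url]"',  # fine
  %q{href='foo.com"},           # unmatched quotes: not reported
]

bad = samples.select { |s| s.match?(BAD_HREF) }
p bad  # prints ["href=\"www.foo.com\""]
```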

Lookahead containing the same token as left/right anchors

I've got a variation of the classic "regex quoted strings" problem. I need to pick out strings that look like this:
"foo bar bar"
from a long string like this
token token "maybe quoted token that can also contain spaces"
Each of the tokens can be quoted or unquoted (this is easy to take care of using alternation), but sometimes I have quoted strings that contain literal quotes inside them (not escaped in any way). The only usable thing is that those quotes never have spaces on either side (since that would create a delimiter). Those tokens look like this: "foo-bar"baz"
My initial thought was /"(?:[^"]|" )*"/ but that doesn't seem to work because a token like this: "here is some"quotes" gets split in two.
How should I do this? Platform is Ruby 2.1
Use this:
"(?:[^"]|"\w)+"
or
"(?:[^"]|"\S)+"
You can play with sample strings in the regex demo.
Explanation
" matches the opening quote
The non-capturing group (?:[^"]|"\w) matches either...
One [^"] non-quote character, OR |
One quote and a word character "\w
+ one or more times
" closing quote
Further Refinements
If you want to allow quotes in other contexts, for instance escaped quotes, just add them to the alternation:
"(?:\\"|[^"]|"\w)+"
To allow quotes to be followed not just by a word char but any non-space:
"(?:\\"|[^"]|"\S)+"
This one may also suit your needs:
".*?"(?!\S)
To match also non-quoted tokens:
".*?"(?!\S)|\S+
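Both approaches can be tried on a whole line with String#scan. This is a sketch assuming tokens are separated by whitespace and the input is a single line (. does not match a newline); the sample line and constant names are illustrative.

```ruby
line = 'token token "here is some"quotes" "foo-bar"baz" plain'

# Quoted token: a quote may appear inside only when followed by a non-space;
# otherwise fall back to a run of non-space characters.
QUOTE_AWARE = /"(?:[^"]|"\S)+"|\S+/

# Lazy alternative: the closing quote must be followed by whitespace or the end.
LAZY = /".*?"(?!\S)|\S+/

p line.scan(QUOTE_AWARE)
# prints ["token", "token", "\"here is some\"quotes\"", "\"foo-bar\"baz\"", "plain"]
p line.scan(LAZY) == line.scan(QUOTE_AWARE)  # prints true
```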

Backslash + captured group within Ruby regular expression

How do I escape a backslash before a captured group?
Example:
"foo+bar".gsub(/(\+)/, '\\\1')
What I expect (and want):
foo\+bar
what I unfortunately get:
foo\\1bar
How do I escape here correctly?
As others have said, you need to escape everything in that string twice. So in your case the solution is to use '\\\\\1' or '\\\\\\1'. But since you asked why, I'll try to explain that part.
The reason is that the replacement string is parsed twice: once by Ruby and once by the underlying regular expression engine, for which \1 is its own escape sequence. (It's probably easier to understand with double-quoted strings, since single quotes introduce an ambiguity where '\\1' and '\1' are equivalent but '\' and '\\' are not.)
So for example, a simple replacement here with a captured group and a double quoted string would be:
"foo+bar".gsub(/(\+)/, "\\1") #=> "foo+bar"
This passes the string \1 to the regexp engine, which it understands as a reference to a capture group. In Ruby string literals, "\1" means something else entirely (ASCII character 1).
What we actually want in this case is for the regexp engine to receive \\\1. It also understands \ as an escape character, so \\1 is not sufficient and will simply evaluate to the literal output \1. So, we need \\\1 in the regexp engine, but to get to that point we need to also make it past Ruby's string literal parser.
To do that, we take our desired regexp input and double every backslash again to get it through Ruby's string literal parser. \\\1 therefore requires "\\\\\\1". With single quotes, one backslash can be omitted, because \1 is not a valid escape sequence in single quotes and is treated literally.
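The variants described above can be compared side by side. The block form of gsub is included as an alternative: its return value is inserted as-is, without backslash processing, so only Ruby's string-literal layer applies, and $1 is set inside the block.

```ruby
# Each replacement string must reach gsub as \\\1:
# \\ -> a literal backslash, \1 -> the first capture group.
single   = "foo+bar".gsub(/(\+)/, '\\\\\1')   # five backslashes in single quotes
double   = "foo+bar".gsub(/(\+)/, "\\\\\\1")  # six backslashes in double quotes

# Block form: the block's return value is used literally, so "\\" is one backslash.
by_block = "foo+bar".gsub(/(\+)/) { "\\" + $1 }

puts single    # prints foo\+bar
puts double    # prints foo\+bar
puts by_block  # prints foo\+bar
```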
Addendum
One of the reasons this problem is usually hidden is the use of /.../-style regexp literals, which Ruby treats in a special way to avoid the need to double-escape everything. (Of course, this doesn't apply to gsub replacement strings.) But you can still see it in action if you pass a string literal instead of a regexp literal to Regexp.new:
Regexp.new("\.").match("a") #=> #<MatchData "a">
Regexp.new("\\.").match("a") #=> nil
As you can see, we had to double-escape the . for the regexp engine to treat it as a literal dot: "." and "\." both evaluate to . in double-quoted strings, but we need the engine itself to receive \. (a backslash followed by a dot).
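When the text to match comes from a string rather than a regexp literal, the standard Regexp.escape method adds the regex-level backslashes for you, so the doubling does not have to be done by hand. A minimal sketch; the variable name is illustrative.

```ruby
# Regexp.escape returns the two-character string \. for ".".
escaped = Regexp.escape(".")
puts escaped                          # prints \.
p Regexp.new(escaped).match("a")      # prints nil
p Regexp.new(escaped).match("a.")[0]  # prints "."

# Without escaping, "." is a wildcard, as in the Regexp.new("\.") example.
p Regexp.new(".").match("a")[0]       # prints "a"
```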
This happens because of double string escaping. You should use five backslashes in this case.
"foo+bar".gsub(/([+])/, '\\\\\1')
Adding two more backslashes escapes it properly.
irb(main):011:0> puts "foo+bar".gsub(/(\+)/, '\\\\\1')
foo\+bar
=> nil
