The expression contained an invalid collating element name - c++14

My regex object is created with the expression below:
std::string regE("\\s*[0-9A-Za-z_.]+\\s*=\\s*[a-zA-Z0-9._()\\s-,/*+!~\"'?<>\\[\\]{}|^%$##]+");
std::regex r(regE);
and I am getting the exception below at runtime:
The expression contained an invalid collating element name
terminate called after throwing an instance of 'std::__1::regex_error'
what(): The expression contained an invalid collating element name.

The error originates from this place:
\\s*[0-9A-Za-z_.]+\\s*=\\s*[a-zA-Z0-9._()\\s-,/*+!~\"'?<>\\[\\]{}|^%$##]+
                                           ~^~
The dash character, -, gains a special meaning inside a bracket expression, i.e., it specifies a range of characters. Because there's no such range as [\\s-,], that is, starting from \s and ending with ,, an error is reported.
In order to parse - literally, it must be at the beginning of the character sequence enclosed by brackets (excluding the ^ negation operator), at the end, or escaped with \.
Also note that C++ supports raw string literals, which avoid the need to escape backslashes and quotes and so make regular expressions more readable. With that in mind, the corrected and simplified regular expression could be:
std::regex r(R"(\s*[0-9A-Za-z_.]+\s*=\s*[a-zA-Z0-9._()\s\-,/*+!~"'?<>\[\]{}|^%$##]+)");

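In my expression I had made two mistakes:
1) \s*[0-9A-Za-z_.]+\s*=\s*[a-zA-Z0-9._()\s-
the last '-' creates a character range that is out of order, so it must be escaped or moved to the end of the bracket expression
2) /*+!~\"'?<>\[\]{}|^%$##]+
the first '/' character would be an unescaped delimiter in syntaxes that wrap the pattern in /.../, where it must be escaped with a backslash (\). std::regex takes the pattern as a plain string with no delimiters, so this escape is not required there.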

Related

Ruby method gsub with string '+'

I've found an interesting thing in Ruby. Does anybody know why it behaves this way?
I tried '+'.gsub!('+', '\+') and expected "\\+" but got "" (an empty string).
gsub is implemented, after some indirection, as rb_sub_str_bang in C, which calls rb_reg_regsub.
Now, gsub is supposed to allow the replacement string to contain backreferences. That is, if you pass a regular expression as the first argument and that regex defines a capture group, then your replacement string can include \1 to indicate that that capture group should be placed at that position.
That behavior evidently still happens if you pass an ordinary, non-regex string as the pattern. Your verbatim string obviously won't have any capture groups, so it's a bit silly in this case. But trying to replace, for instance, + with \1 in the string + will give the empty string, since \1 says to go get the first capture group, which doesn't exist and hence is vacuously "".
Now, you might be thinking: + isn't a number. And you'd be right. You're replacing + with \+. There are several other backreferences allowed in your replacement string. I couldn't find any official documentation where these are written down, but the source code does quite fine. To summarize the code:
Digits \1 through \9 refer to numbered capture groups.
\k<...> refers to a named capture group, with the name in angled brackets.
\0 or \& refer to the whole substring that was matched, so (\0) as a replacement string would enclose the match in parentheses.
\` (a backslash followed by a backtick) refers to the entire string up to the match.
\' refers to the entire string following the match.
\+ refers to the final capture group, i.e. the one with the highest number.
\\ is a literal backslash.
(Most of these are based on Perl variables of a similar name)
So, in your examples,
\+ as the replacement string says "take the last capture group". There is no capture group, so you get the empty string.
\- is not a valid backreference, so it's replaced verbatim.
\ok is, likewise, not a backreference, so it's replaced verbatim.
In \\+, Ruby eats the first backslash sequence, so the actual string at runtime is \+, equivalent to the first example.
For \\\+, Ruby processes the first backslash sequence, so we get \\+ by the time the replacement function sees it. \\ is a literal backslash, and + is no longer part of an escape sequence, so we get \+.
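As a rough illustration of the rules above, here is a small sketch that runs each kind of replacement string through gsub. The helper name replace_plus and the sample string 'a-b' are made up for this example. The last line uses the block form of gsub, which is one way to avoid the backreference handling, since a block's return value is inserted as-is.

```ruby
# Hypothetical helper: replace the '+' in "+" using the given replacement string.
def replace_plus(replacement)
  '+'.gsub('+', replacement)
end

# Single-quoted '\+' is the two characters \ and +, which gsub reads as
# "last capture group". There is none, so the result is the empty string.
p replace_plus('\+')

# '\-' is not a backreference, so both characters are kept verbatim.
p replace_plus('\-')

# '\\\\+' reaches gsub as \\+ : an escaped backslash followed by a plain '+'.
p replace_plus('\\\\+')

# \0 is the whole match; \` and \' are the text before and after the match.
p 'a-b'.gsub('-', '[\0]')
p 'a-b'.gsub('-', '<\`>')
p 'a-b'.gsub('-', "<\\'>")

# With a block, the return value is inserted without backreference processing.
p '+'.gsub('+') { '\+' }
```

If the goal is simply to escape a string for later use in a regex, Regexp.escape('+') is probably the more direct tool.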

What does "1\/1." mean in Ruby?

I am learning Ruby and I have something to match with (/^1\/1. Guess a word from an anagram [RUBY]{4}$/)
Please, what does "1\/1." mean in this expression? Can anyone explain what's going on?
Thanks
Generally speaking, a backslash in a regular expression escapes the next character, so that it's treated as an ordinary character rather than whatever its special meaning would be. For instance a* matches zero or more of the letter a, but a\* matches, literally, an a followed by a star. Since most regular expressions in Ruby are wrapped in the delimiter /, we can't directly put forward slashes in our regex. If we had written
/^1/1. Guess a word from an anagram [RUBY]{4}$/
then the regex would be /^1/ and the rest of the line would be a very confusing syntax error. This is for the same reason that we can't put " characters directly inside a "-delimited string.
So the backslash makes Ruby treat it as an actual slash in the expression rather than as a delimiter.
/^1\/1. Guess a word from an anagram [RUBY]{4}$/
This literally matches a 1 followed by a slash followed by a 1 at the start of the line. The unescaped . that follows matches any single character, including the period in the text.
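To see the escaped slash in action, here is a minimal sketch. The sample strings GOOD and BAD are invented for this example. It also shows the %r{...} literal, which uses different delimiters so that the slash needs no backslash at all.

```ruby
# Both literals describe the same pattern; %r{...} avoids escaping the slash.
SLASH_DELIMITED = /^1\/1. Guess a word from an anagram [RUBY]{4}$/
PERCENT_R       = %r{^1/1. Guess a word from an anagram [RUBY]{4}$}

# Hypothetical sample inputs, made up for this example.
GOOD = '1/1. Guess a word from an anagram YRUB'
BAD  = '1-1. Guess a word from an anagram YRUB'

p SLASH_DELIMITED.match?(GOOD) # the text has 1, then /, then 1
p PERCENT_R.match?(GOOD)       # same pattern, different delimiters
p SLASH_DELIMITED.match?(BAD)  # '-' is not a slash, so this does not match
```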

'%' express any characters, is there any special character for only one character?

In Oracle '%' stands for any characters in that position.
Example:
Select * from table where id like '%1'
This matches anything that ends with the number 1: XXXXXXXXXXXX1, 99999999991.
Is there any other character to express exactly one character?
Example of what I mean (I'm going to use ~ as that reserved character):
Select * from table where id like '~1'
In this case only 91, x1, X1, etc. would be selected, but XX1 wouldn't, as only one ~ was used.
Select * from table where id like '~~~1'
xxx1, 9991, 8881, etc....
Hope I explained myself; English is not my native language.
A single-character wildcard is represented by an underscore, _.
You can refer to Oracle's LIKE condition documentation:
like_condition::= char1 [ NOT ] { LIKE | LIKEC | LIKE2 | LIKE4 } char2 [ ESCAPE esc_char ]
In this syntax:
char1 is a character expression, such as a character column, called the search value.
char2 is a character expression, usually a literal, called the pattern.
esc_char is a character expression, usually a literal, called the escape character.
[...]
The pattern can contain special pattern-matching characters:
An underscore (_) in the pattern matches exactly one character (as opposed to one byte in a multibyte character set) in the value.
A percent sign (%) in the pattern can match zero or more characters (as opposed to bytes in a multibyte character set) in the value. The pattern '%' cannot match a null.
Use an _ underscore.

Need to match a string containing the string file: and report in the string

Need to match a string containing the string "file://\\" and "report" in the string.
If I use the regular expression (file://\\\\)(.*)\\\\report\\\\(.*) it works fine.
But if I use the expression (file://\\\\)(.*)\\report\\(.*) it gives errors.
My question is: why do I need four backslashes (\\\\) to match the single backslash before and after the report string?
wstring target(L"file://\\\\Example\\report\\001");
wsmatch wideMatch;
wregex wrx(L"(file://\\\\)(.*)\\\\report\\\\(.*)");
if (regex_match(target.cbegin(), target.cend(), wideMatch, wrx))
    wcout << L"The matching text is:" << wideMatch.str() << endl;
Can someone please answer? Thanks in advance.
Backslashes are special in both string literals and regular expressions. To match a backslash in a regular expression you need to escape it with a second backslash, so the regex engine must receive \\. To write those two backslashes in a string literal, each of them must be escaped in turn, which is why you end up with four backslashes in the source.

Backslash + captured group within Ruby regular expression

How do I escape a backslash before a captured group?
Example:
"foo+bar".gsub(/(\+)/, '\\\1')
What I expect (and want):
foo\+bar
what I unfortunately get:
foo\\1bar
How do I escape here correctly?
As others have said, you need to escape everything in that string twice. So in your case the solution is to use '\\\\\1' or '\\\\\\1'. But since you asked why, I'll try to explain that part.
The reason is that replacement sequence is being parsed twice--once by Ruby and once by the underlying regular expression engine, for whom \1 is its own escape sequence. (It's probably easier to understand with double-quoted strings, since single quotes introduce an ambiguity where '\\1' and '\1' are equivalent but '\' and '\\' are not.)
So for example, a simple replacement here with a captured group and a double quoted string would be:
"foo+bar".gsub(/(\+)/, "\\1") #=> "foo+bar"
This passes the string \1 to the regexp engine, which it understands as a reference to a capture group. In Ruby string literals, "\1" means something else entirely (ASCII character 1).
What we actually want in this case is for the regexp engine to receive \\\1. It also understands \ as an escape character, so \\1 is not sufficient and will simply evaluate to the literal output \1. So, we need \\\1 in the regexp engine, but to get to that point we need to also make it past Ruby's string literal parser.
To do that, we take our desired regexp input and double every backslash again to get through Ruby's string literal parser. \\\1 therefore requires "\\\\\\1". In the case of single quotes one backslash can be omitted, as \1 is not a valid escape sequence in single quotes and is treated literally.
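The two layers of parsing can be checked directly. This is only a sketch, and the helper name escape_plus is made up for the example. It prints the characters each literal actually contains and then the result of using it as a replacement. The final line uses a block, which skips the backreference layer entirely.

```ruby
# Hypothetical helper: escape the '+' in "foo+bar" with the given replacement.
def escape_plus(replacement)
  'foo+bar'.gsub(/(\+)/, replacement)
end

# Characters that gsub actually receives from each literal.
p '\\\1'.chars     # two backslashes and "1": escaped backslash, then a plain 1
p '\\\\\1'.chars   # three backslashes and "1": escaped backslash, then \1
p "\\\\\\1".chars  # the same three backslashes and "1", double-quoted

puts escape_plus('\\\1')     # the attempt from the question
puts escape_plus('\\\\\1')   # single-quoted fix
puts escape_plus("\\\\\\1")  # double-quoted fix

# A block skips backreference parsing, so only Ruby's own escaping applies.
puts 'foo+bar'.gsub(/\+/) { |m| "\\#{m}" }
```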
Addendum
One of the reasons this problem is usually hidden is thanks to the use of /.+/ style regexp quotes, which Ruby treats in a special way to avoid the need to double escape everything. (Of course, this doesn't apply to gsub replacement strings.) But you can still see it in action if you use a string literal instead of a regexp literal in Regexp.new:
Regexp.new("\.").match("a") #=> #<MatchData "a">
Regexp.new("\\.").match("a") #=> nil
As you can see, we had to double-escape the . for it to be understood as a literal . by the regexp engine, since "." and "\." both evaluate to . in double-quoted strings, but we need the engine itself to receive \..
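A short sketch of the same point, with constant names invented for this example: inspecting Regexp#source shows what the engine actually received, and Regexp.escape or a single-quoted string are two ways to avoid counting backslashes by hand.

```ruby
ANY_CHAR      = Regexp.new("\.")                # Ruby string is ".", engine sees .
LITERAL_DOT   = Regexp.new("\\.")               # Ruby string is "\.", engine sees \.
ESCAPED       = Regexp.new(Regexp.escape('.'))  # let Ruby add the backslash
SINGLE_QUOTED = Regexp.new('\.')                # single quotes keep the backslash

# source returns the pattern text as the regexp engine received it.
p ANY_CHAR.source
p LITERAL_DOT.source

p ANY_CHAR.match?('a')     # an unescaped dot matches any character
p LITERAL_DOT.match?('a')  # an escaped dot matches only a literal '.'
p LITERAL_DOT.match?('.')
```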
This happens due to double string escaping. You should use 5 backslashes in this case.
"foo+bar".gsub(/([+])/, '\\\\\1')
Adding \ two more times escapes this properly.
irb(main):011:0> puts "foo+bar".gsub(/(\+)/, '\\\\\1')
foo\+bar
=> nil
