Here are the examples:
Transfer-Encoding = "Transfer-Encoding" ":" 1#transfer-coding
Upgrade = "Upgrade" ":" 1#product
Server = "Server" ":" 1*( product | comment )
delta-seconds = 1*DIGIT
Via = "Via" ":" 1#( received-protocol received-by [ comment ] )
chunk-extension= *( ";" chunk-ext-name [ "=" chunk-ext-val ] )
http_URL = "http:" "//" host [ ":" port ] [ abs_path [ "?" query ]]
date3 = month SP ( 2DIGIT | ( SP 1DIGIT ))
Questions are:
What is the 1#transfer-coding (the 1# regarding the rule transfer-coding)? Same with 1#product.
What does 1 times x mean, as in 1*( product | comment )? Or 1*DIGIT.
What do the brackets mean, as in [ comment ]? The parens (...) group it all, but what about the [...]?
What does the *(...) mean, as in *( ";" chunk-ext-name [ "=" chunk-ext-val ] )?
What do the nested square brackets mean, as in [ abs_path [ "?" query ]]? Nested optional values? It doesn't make sense.
What does 2DIGIT and 1DIGIT mean, where do those come from / get defined?
I may have missed where these are defined, but knowing these would help clarify how to parse the grammar definitions they use in the RFCs.
I get the rest of the grammar notation, just not these few remaining pieces.
Update: Looks like this is a good start.
Square brackets enclose an optional element sequence:
[foo bar]
is equivalent to
*1(foo bar).
Specific Repetition: nRule
A rule of the form:
<n>element
is equivalent to
<n>*<n>element
That is, exactly <n> occurrences of <element>. Thus, 2DIGIT is a
2-digit number, and 3ALPHA is a string of three alphabetic
characters.
Variable Repetition: *Rule
The operator "*" preceding an element indicates repetition. The full
form is:
<a>*<b>element
where <a> and <b> are optional decimal values, indicating at least
<a> and at most <b> occurrences of the element.
Default values are 0 and infinity so that *<element> allows any
number, including zero; 1*<element> requires at least one;
3*3<element> allows exactly 3; and 1*2<element> allows one or two.
But what I'm still missing is what the # means?
Update 2: Found it I think!
#RULE: LISTS
A construct "#" is defined, similar to "*", as follows:
<l>#<m>element
indicating at least <l> and at most <m> elements, each separated
by one or more commas (","). This makes the usual form of lists
very easy; a rule such as '(element *("," element))' can be shown
as "1#element".
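To make the list rule concrete, here is a minimal Ruby sketch of a check for <l>#<m>element, based on the simplified expansion quoted above. The helper name list_rule_match? and the TOKEN pattern are hypothetical, and TOKEN is a deliberately simplified stand-in for a real element rule. RFC 2616 also allows empty list elements that do not count toward the total, so the sketch drops them.

```ruby
# Simplified stand-in for an element rule such as transfer-coding (assumption).
TOKEN = /[A-Za-z0-9-]+/

# Hypothetical helper: true if text is a comma-separated list containing
# between min and max elements, each of which matches element in full.
def list_rule_match?(text, element, min: 0, max: Float::INFINITY)
  # Split on commas, trim surrounding whitespace, ignore empty elements.
  items = text.split(",").map(&:strip).reject(&:empty?)
  items.size.between?(min, max) &&
    items.all? { |item| item.match?(/\A#{element}\z/) }
end

puts list_rule_match?("gzip, chunked", TOKEN, min: 1) # true: two elements
puts list_rule_match?("", TOKEN, min: 1)              # false: 1# needs one element
puts list_rule_match?("gzip chunked", TOKEN, min: 1)  # false: no comma separator
```

So 1#transfer-coding reads as "one or more transfer-coding values separated by commas".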
Also, what do these mean?
1*2DIGIT
2*4DIGIT
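Going by the Variable Repetition rule quoted above, 1*2DIGIT means one or two digits and 2*4DIGIT means two to four digits. A rough way to picture this is as a regex quantifier. The sketch below is only an illustration; the repeat helper and the DIGIT constant are hypothetical names, not part of any RFC tooling.

```ruby
# ABNF DIGIT, written as an explicit ASCII range.
DIGIT = /[0-9]/

# Hypothetical helper: turns ABNF "<a>*<b>element" into a Ruby regex.
# A missing upper bound (nil) means "no limit", as in 1*DIGIT.
def repeat(element, min = 0, max = nil)
  /(?:#{element.source}){#{min},#{max}}/
end

one_or_two  = /\A#{repeat(DIGIT, 1, 2)}\z/ # 1*2DIGIT
two_to_four = /\A#{repeat(DIGIT, 2, 4)}\z/ # 2*4DIGIT
exactly_two = /\A#{repeat(DIGIT, 2, 2)}\z/ # 2DIGIT, i.e. 2*2DIGIT

puts one_or_two.match?("42")     # true
puts one_or_two.match?("123")    # false
puts two_to_four.match?("2024")  # true
puts two_to_four.match?("1")     # false
puts exactly_two.match?("07")    # true
```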
I have a query in my project that uses REGEXP_REPLACE.
I tried to find out how it works by searching, but I only found this:
\w+ Matches a word character (that is, an alphanumeric or underscore
(_) character).
I was not able to find an explanation of '"\w+\":'. Why are the "" used, and what is meant by '{|}|"',''?
UPDATE (SELECT data,data_value FROM TEMP) t
SET t.DATA_VALUE=REGEXP_REPLACE(REGEXP_REPLACE(t.data, '"\w+\":',''),'{|}|"','');
Can you please tell me how it works?
This appears to be a regular expression for stripping keys and enclosing brackets from a JSON string. Unfortunately, if this is the case, then it does not work in all situations.
The regular expression
'"\w+\":'
will match:
A " double quotation mark;
\w+ one-or-more word (a-z or A-Z or 0-9 or _) characters;
\" another double quotation mark - note: the \ character is not necessary; then
A : colon.
So:
REGEXP_REPLACE(
'{"key":"value","key2":"value with \"quote"}',
'"\w+":', -- Pattern matched
'' -- Replacement string
)
Will output:
{"value","value with \"quote"}
The second pattern {|}|" will match either a {, or a } or a " character (and could have been equivalently written as [{}"]) so:
REGEXP_REPLACE(
'{"value","value with \"quote"}',
'{|}|"', -- Pattern matched
'' -- Replacement string
)
Will output:
value,value with \quote
Which is fine, until (like my example) you have an escaped double quote (or curly braces) in the value string; in which case those will also get stripped leaving the escape character.
(Note: you would not typically find this, but it is possible to include escaped quotes in the key. In {"keywith\":quote":"value"} the first pattern matches nothing, because \ is not a word character, so the key is never removed; the second replacement then produces keywith\:quote:value, which is not the intended output.)
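The two replacements can be tried outside the database as well. The following is a rough Ruby sketch, not Oracle code: it assumes Ruby's regex engine treats these two simple patterns the same way Oracle does, and the method name strip_json_keys is made up for illustration.

```ruby
# Hypothetical helper mirroring the nested REGEXP_REPLACE calls.
def strip_json_keys(text)
  without_keys = text.gsub(/"\w+":/, "") # step 1: drop "key": prefixes
  without_keys.gsub(/[{}"]/, "")         # step 2: drop {, } and " characters
end

# Single quotes keep the backslash, so the value contains \" literally.
json = '{"key":"value","key2":"value with \"quote"}'

puts json.gsub(/"\w+":/, "")  # {"value","value with \"quote"}
puts strip_json_keys(json)    # value,value with \quote
```

The stray backslash in the final line is the problem described above: the escaped quote inside the value is stripped, but its escape character is left behind.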
If parsing JSON is what you are trying to do (pre-Oracle 12) then you can use:
REGEXP_REPLACE(
'{"key":"value","key2":"value with \"quote","keywith\":quote":"value with \"{}"}',
'^{|"(\\"|[^"])+":(")?((\\"|[^"])+?)\2((,)|})',
'\3\6'
)
Which outputs:
value,value with \"quote,value with \"{}
Or in Oracle 12 you can do:
SELECT *
FROM JSON_TABLE(
'{"key":"value","key2":"value with \"quote","keywith\":quote":"value with \"{}"}',
'$.*' NULL ON ERROR
COLUMNS (
value VARCHAR2(4000) PATH '$'
)
)
Which outputs:
VALUE
-----------------
value
value with "quote
value with "{}
Syntax: REGEXP_REPLACE( string, pattern [, replacement_string [, start_position [, nth_appearance [, match_parameter ] ] ] ] )
| means "or" (there can be more than two alternatives); a comma inside braces means "at least", as in {n,} (at least n times).
Source: https://www.techonthenet.com/oracle/functions/regexp_replace.php
'"\w+\":' why these "" are used and what is mean by '{|}|"',''
\w matches a word character and + means one or more times. The \" on either side of \w+ lets the " characters themselves be matched.
This expression takes one string, changes it, and then uses the result as the input for the next change. Good luck figuring the rest out. Regular expressions aren't too bad; they are pretty intuitive once you get the basics down.
In Python we can use the pass statement as a placeholder.
What is the equivalent in Go?
A ; or something else?
The Go Programming Language Specification
Empty statements
The empty statement does nothing.
EmptyStmt = .
Notation
The syntax is specified using Extended Backus-Naur Form (EBNF):
Production = production_name "=" [ Expression ] "." .
Expression = Alternative { "|" Alternative } .
Alternative = Term { Term } .
Term = production_name | token [ "…" token ] | Group | Option | Repetition .
Group = "(" Expression ")" .
Option = "[" Expression "]" .
Repetition = "{" Expression "}" .
Productions are expressions constructed from terms and the following
operators, in increasing precedence:
| alternation
() grouping
[] option (0 or 1 times)
{} repetition (0 to n times)
Lower-case production names are used to identify lexical tokens.
Non-terminals are in CamelCase. Lexical tokens are enclosed in double
quotes "" or back quotes ``.
The form a … b represents the set of characters from a through b as
alternatives. The horizontal ellipsis … is also used elsewhere in the
spec to informally denote various enumerations or code snippets that
are not further specified. The character … (as opposed to the three
characters ...) is not a token of the Go language.
The empty statement is empty. In EBNF (Extended Backus–Naur Form) form: EmptyStmt = . or an empty string.
For example,
for {
}
var no bool
if true {
} else {
no = true
}
I know, or at least I think I know, what this does (string.split(/\?|\.|!/).size); splits the string at every ending punctuation into an array and then gets the size of the array.
The part I am confused with is (/\?|\.|!/).
Thank you for your explanation.
Regular expressions are surrounded by slashes / /
The backslash before the question mark and dot means use those characters literally (don't interpret them as special instructions)
The vertical pipes are "or"
So you have / then question mark \? then "or" | then period \. then "or" | then exclamation point ! then / to end the expression.
/\?|\.|!/
It's a Regular Expression. That particular one matches any '?', '.' or '!' in the target string.
You can learn more about them here: http://regexr.com/
A regular expression splitting on the char "a" would look like this: /a/. A regular expression splitting on "a" or "b" is like this: /a|b/. So splitting on "?", "!" and "." would look like /?|!|./ - but it does not. Unfortunately, "?", and "." have special meaning in regexps which we do not want in this case, so they must be escaped, using "\".
A way to avoid this is to use Regexp.union("?","!",".") which results in /\?|!|\./
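As a quick illustration of the Regexp.union approach, here is a small sketch. The sample text is made up; only Regexp.union and String#split from the Ruby core library are used.

```ruby
# Regexp.union escapes each string, so ? and . are matched literally.
terminators = Regexp.union("?", "!", ".")
puts terminators.inspect  # /\?|!|\./

text = "Is it late? Yes! Go home."
sentences = text.split(terminators)

p sentences          # ["Is it late", " Yes", " Go home"]
puts sentences.size  # 3
```

Note that split drops the trailing empty string after the final period, which is why the size matches the number of sentences here.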
(/\?|\.|!/)
Working outside in:
The parentheses () here are the method-call parentheses of split; they are not part of the regular expression, so nothing is captured.
The // tell Ruby you're using a Regular Expression.
\? Matches any ?
\. Matches any .
! Matches any !
The preceding \ tells Ruby we want to find these specific characters in the string, rather than using them as special characters.
Special characters (that need to be escaped to be matched) are:
. | ( ) [ ] { } + \ ^ $ * ?.
There is a nice guide to Ruby RegEx at:
http://rubular.com/ & http://www.tutorialspoint.com/ruby/ruby_regular_expressions.htm
For SO answers that involve regular expressions, I often use the "extended" mode, which makes them self-documenting. This one would be:
r = /
\? # match a question mark
| # or
\. # match a period
| # or
! # match an exclamation mark
/x # extended mode
str = "Out, damn'd spot! out, I say!—One; two: why, then 'tis time to " +
"do't.—Hell is murky.—Fie, my lord, fie, a soldier, and afeard?"
str.split(r)
#=> ["Out, damn'd spot",
# " out, I say",
# "—One; two: why, then 'tis time to do't",
# "—Hell is murky",
# "—Fie, my lord, fie, a soldier, and afeard"]
str.split(r).size #=> 5
@steenslag mentioned Regexp::union. You could also use Regexp::new to write (with single quotes):
r = Regexp.new('\?|\.|!')
#=> /\?|\.|!/
but it really doesn't buy you anything here. You might find it useful in other situations, however.
Apparently I still don't understand exactly how it works ...
Here is my problem: I'm trying to match numbers in strings such as:
910 -6.258000 6.290
That string should give me an array like this:
[910, -6.2580000, 6.290]
while the string
blabla9999 some more text 1.1
should not be matched.
The regex I'm trying to use is
/([-]?\d+[.]?\d+)/
but it doesn't do exactly that. Could someone help me?
It would be great if the answer could clarify the use of the parentheses in the matching.
Here's a pattern that works:
/^[^\d]+?\d+[^\d]+?\d+[\.]?\d+$/
Note that [^\d]+ means at least one non digit character.
On second thought, here's a more generic solution that doesn't need to deal with regular expressions:
str.gsub(/[^\d.-]+/, " ").split.collect{|d| d.to_f}
Example:
str = "blabla9999 some more text -1.1"
Parsed:
[9999.0, -1.1]
The different kinds of brackets have different meanings.
[] defines a character class, which means that one character is matched if it is part of this class.
() defines a capturing group: the part of the string matched by the pattern inside the parentheses is stored in a variable.
You did not define any anchors so your pattern will match your second string
blabla9999 some more text 1.1
      ^^^^ here           ^^^ and here
Maybe this is more what you wanted
^(\s*-?\d+(?:\.\d+)?\s*)+$
See it here on Regexr
^ anchors the pattern to the start of the string and $ to the end.
It allows whitespace \s before and after each number, and an optional fraction part (?:\.\d+)?. The whole group must match at least once.
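A minimal sketch of how the anchored pattern could be used to accept the first string and reject the second. The helper name extract_numbers is hypothetical, and \A and \z are swapped in for ^ and $ because in Ruby ^ and $ match at line boundaries rather than at the ends of the string.

```ruby
# The line must consist only of whitespace-separated numbers.
NUMBER_LINE = /\A(?:\s*-?\d+(?:\.\d+)?\s*)+\z/
# A single number: optional minus sign, digits, optional fraction.
NUMBER = /-?\d+(?:\.\d+)?/

# Hypothetical helper: returns the numbers in the line, or nil if the
# line contains anything other than numbers and whitespace.
def extract_numbers(line)
  return nil unless line.match?(NUMBER_LINE)
  line.scan(NUMBER).map { |n| n.include?(".") ? n.to_f : n.to_i }
end

p extract_numbers("910 -6.258000 6.290")            # [910, -6.258, 6.29]
p extract_numbers("blabla9999 some more text 1.1")  # nil
```

The outer group is non-capturing here because scan does the extraction; a capturing group that repeats would only keep its last match.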
Maybe /(-?\d+(\.\d+)?)+/:
irb(main):010:0> "910 -6.258000 6.290".scan(/(\-?\d+(\.\d+)?)+/).map{|x| x[0]}
=> ["910", "-6.258000", "6.290"]
str = " 910 -6.258000 6.290"
# (?:\.\d+)? makes the fraction optional, so single-digit integers match too
str.scan(/-?\d+(?:\.\d+)?/).map(&:to_f)
# => [910.0, -6.258, 6.29]
If you don't want integers to be converted to floats, try this:
str = " 910 -6.258000 6.290"
str.scan(/-?\d+(?:\.\d+)?/).map do |ns|
  ns[/\./] ? ns.to_f : ns.to_i
end
# => [910, -6.258, 6.29]
I need to be able to match a certain string ('[', then any number of equals signs or none, then '['), and then I need to match a matching close bracket (']', then the same number of equals signs, then ']') after some other match rules ((options{greedy=false;}:.)* if you must know). I have no clue how to do this in ANTLR. How can I do it?
An example: I need to match [===[whatever arbitrary text ]===] but not [===[whatever arbitrary text ]==].
I need to do it for an arbitrary number of equals signs as well, so therein lies the problem: how do I get it to match the same number of equals signs in the close as in the open? The parser rules supplied so far don't seem to help.
You can't easily write a lexer for it; you need parsing rules. Two rules should be sufficient: one is responsible for matching the brackets, the other for matching the equals signs.
Something like this:
braces : '[' ']'
| '[' equals ']'
;
equals : '=' equals '='
| '=' braces '='
;
This should cover the use case you described. I am not absolutely sure, but you may have to use a predicate in the first alternative of 'equals' to avoid ambiguous interpretations.
Edit:
It is hard to integrate your greedy rule and at the same time avoid a lexer context switch or something similar (which is hard in ANTLR). But if you are willing to integrate a little bit of Java into your grammar, you can write a lexer rule.
The following example grammar shows how:
grammar TestLexer;
SPECIAL : '[' { int counter = 0; } ('=' { counter++; } )+ '['
          (options{greedy=false;}:.)*
          ']' ('=' { counter--; } )+
          { if(counter != 0) throw new RecognitionException(input); }
          ']'
;
ID : ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*
;
WS : ( ' '
| '\t'
| '\r'
| '\n'
) {$channel=HIDDEN;}
;
rule : ID
| SPECIAL
;
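To check the counting idea independently of ANTLR, the same requirement can be sketched in Ruby with a regex backreference. This is a different technique from the lexer rule above, not a translation of it. The names LONG_BRACKET and long_bracket_body are made up, and unlike the rule above the sketch also accepts zero equals signs.

```ruby
# \1 must repeat exactly the run of = captured after the first bracket.
# The m flag lets . match newlines inside the body.
LONG_BRACKET = /\A\[(=*)\[(.*?)\]\1\]\z/m

# Hypothetical helper: returns the enclosed text, or nil when the opening
# and closing brackets do not carry the same number of equals signs.
def long_bracket_body(text)
  match = LONG_BRACKET.match(text)
  match && match[2]
end

p long_bracket_body("[===[whatever arbitrary text ]===]")  # "whatever arbitrary text "
p long_bracket_body("[===[whatever arbitrary text ]==]")   # nil
p long_bracket_body("[[plain]]")                           # "plain"
```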
Your tags mention lexing, but your question itself doesn't. What you're trying to do is non-regular, so I don't think it can be done as part of lexing (though I don't remember if ANTLR's lexer is strictly regular -- it's been a couple of years since I last used ANTLR).
What you describe should be possible in parsing, however. Here's the grammar for what you described:
thingy : LBRACKET middle RBRACKET;
middle : EQUAL middle EQUAL
| LBRACKET RBRACKET;