Here are the examples:
Transfer-Encoding = "Transfer-Encoding" ":" 1#transfer-coding
Upgrade = "Upgrade" ":" 1#product
Server = "Server" ":" 1*( product | comment )
delta-seconds = 1*DIGIT
Via = "Via" ":" 1#( received-protocol received-by [ comment ] )
chunk-extension= *( ";" chunk-ext-name [ "=" chunk-ext-val ] )
http_URL = "http:" "//" host [ ":" port ] [ abs_path [ "?" query ]]
date3 = month SP ( 2DIGIT | ( SP 1DIGIT ))
Questions are:
What is the 1#transfer-coding (the 1# regarding the rule transfer-coding)? Same with 1#product.
What does 1 times x mean, as in 1*( product | comment )? Or 1*DIGIT.
What do the brackets mean, as in [ comment ]? The parens (...) group it all, but what about the [...]?
What does the *(...) mean, as in *( ";" chunk-ext-name [ "=" chunk-ext-val ] )?
What do the nested square brackets mean, as in [ abs_path [ "?" query ]]? Nested optional values? It doesn't make sense.
What does 2DIGIT and 1DIGIT mean, where do those come from / get defined?
I may have missed where these are defined, but knowing these would help clarify how to parse the grammar definitions they use in the RFCs.
I get the rest of the grammar notation, just not these few remaining pieces.
Update: Looks like this is a good start.
Square brackets enclose an optional element sequence:
[foo bar]
is equivalent to
*1(foo bar).
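Read this way, the nested brackets in [ abs_path [ "?" query ]] are simply an optional group that contains another optional group: the query may only appear when abs_path is present. A minimal Python sketch of that reading, using deliberately simplified stand-ins for host, port, abs_path and query (these patterns and the helper name is_http_url are assumptions for illustration, not the RFC definitions):

```python
import re

# Simplified stand-ins for the RFC rules (assumptions, not the full grammar).
HOST = r"[A-Za-z0-9.-]+"
PORT = r"\d+"
ABS_PATH = r"/[^?\s]*"
QUERY = r"[^\s]*"

# http_URL = "http:" "//" host [ ":" port ] [ abs_path [ "?" query ]]
# Each [ ... ] becomes an optional non-capturing group (?: ... )?
HTTP_URL = re.compile(
    rf"http://{HOST}(?::{PORT})?(?:{ABS_PATH}(?:\?{QUERY})?)?"
)

def is_http_url(text):
    """Return True if text matches the simplified http_URL rule."""
    return HTTP_URL.fullmatch(text) is not None
```

With these stand-ins, a URL with a path and query matches, while a query with no preceding path does not, because the inner optional group sits inside the outer one.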
Specific Repetition: nRule
A rule of the form:
<n>element
is equivalent to
<n>*<n>element
That is, exactly <n> occurrences of <element>. Thus, 2DIGIT is a
2-digit number, and 3ALPHA is a string of three alphabetic
characters.
Variable Repetition: *Rule
The operator "*" preceding an element indicates repetition. The full
form is:
<a>*<b>element
where <a> and <b> are optional decimal values, indicating at least
<a> and at most <b> occurrences of the element.
Default values are 0 and infinity so that *<element> allows any
number, including zero; 1*<element> requires at least one;
3*3<element> allows exactly 3; and 1*2<element> allows one or two.
But what I'm still missing is what the # means?
Update 2: Found it I think!
#RULE: LISTS
A construct "#" is defined, similar to "*", as follows:
<l>#<m>element
indicating at least <l> and at most <m> elements, each separated
by one or more commas (","). This makes the usual form of lists
very easy; a rule such as '(element *("," element))' can be shown
as "1#element".
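So 1#transfer-coding would be a comma-separated list with at least one transfer-coding in it. A rough Python sketch of checking such a list, assuming whitespace around the commas can be ignored and that empty elements (from doubled commas) do not count; the function name parse_list is made up for illustration, and quoted strings containing commas are not handled:

```python
def parse_list(value, minimum=1):
    """Split a header value on commas, as in '<l>#element' with l = minimum."""
    # Strip surrounding whitespace from each piece.
    elements = [part.strip() for part in value.split(",")]
    # Assumption: empty pieces (e.g. from ",,") are not counted as elements.
    elements = [element for element in elements if element]
    if len(elements) < minimum:
        raise ValueError(f"expected at least {minimum} element(s)")
    return elements
```

For example, parse_list("gzip, chunked") returns the two codings as separate strings, and an empty value raises ValueError because 1# requires at least one element.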
Also, what do these mean?
1*2DIGIT
2*4DIGIT
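Going by the variable-repetition rule quoted above, 1*2DIGIT should mean one or two digits and 2*4DIGIT two to four digits. One way to check that reading is to translate the repetition prefix into a regular-expression quantifier. The Python sketch below does this for the <a>*<b> and <n> forms; the helper name abnf_repeat is hypothetical and only these two prefix forms are handled:

```python
import re

DIGIT = r"[0-9]"

def abnf_repeat(prefix, element):
    """Turn a repetition prefix such as '1*2', '2*', '*' or '3' into a regex."""
    if "*" in prefix:
        low, high = prefix.split("*")
        low = low or "0"            # default minimum is 0
        quantifier = f"{{{low},{high}}}"  # empty high means no upper bound
    else:
        quantifier = f"{{{prefix}}}"      # <n>element: exactly n occurrences
    return f"(?:{element}){quantifier}"

one_or_two = re.compile(abnf_repeat("1*2", DIGIT))
two_to_four = re.compile(abnf_repeat("2*4", DIGIT))
```

Here one_or_two.fullmatch accepts "7" and "42" but not "123", and two_to_four.fullmatch accepts "12" through "1234" but not a single digit.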
I have code that extracts variable names from expressions, e.g.:
expr = "a + b *2"
expr.split(/\W+/).reject { |s| s.to_i.to_s == s || s.to_f.to_s == s || s == "" }
But if expr contains pointers, e.g.:
expr = "*a + b -*c"
the split removes the * from them. Is there any way to extract the pointers as well?
Rather than split/reject, I'd recommend scan:
expr.scan(/\*?\w+/)
#=> ["*a", "b", "*c"]
The regular expression looks for an optional * followed by one or more word chars.
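One caveat: \w also matches digits, so on the first example ("a + b *2") this pattern would return "*2" as well. The same idea carries over to other regex engines; here is a small Python sketch, assuming variable names start with a letter or underscore so that numeric literals are skipped (the function name extract_variables is made up for illustration):

```python
import re

# Optional leading '*', then an identifier that must not start with a digit.
VARIABLE = re.compile(r"\*?[A-Za-z_]\w*")

def extract_variables(expr):
    """Return variable names in expr, keeping a leading '*' on pointers."""
    return VARIABLE.findall(expr)
```

Note that a pattern like this cannot tell multiplication from dereferencing: in "a*b" the * would be attached to b. Distinguishing the two needs a real tokenizer or parser.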
If the true quotation takes zero arguments, I can use the when word, because the implicit false quotation also takes zero arguments (it does nothing).
But when I want to consume an argument, I need an else branch just to clean up the stack. If the logic were more complex, I imagine refactoring could become tedious and error-prone. Is there an easier way?
: print-if-dir ( directory-entry -- ) dup directory? [ name>> . ] [ drop ] if ;
You can use smart-when* for this:
USE: combinators.smart
: print-if-string ( object -- ) [ string? ] [ . ] smart-when* ;
Testing this out in the listener:
scratchpad: 2 print-if-string ! Nothing happens
scratchpad: "2" print-if-string ! Prints "2"
"2"
Hi, I have a string like { Its A Very Good Day! Isn't It }. I have to change the first letter of every word to lower case, but the spaces must be preserved. For changing to upper case I used the following code, but I do not know how to keep the spaces as well.
The code is:
set wordlist { Its A Very Good Day! Isn't It }
set newlistupper [list]
for {set i 0} {$i < [llength $wordlist]} {incr i} {
set word [lindex $wordlist $i]
set newupper [string toupper $word 0 0]
lappend newlistupper $newupper
}
puts $newlistupper
I want to know how to keep the spaces in the output as well. Please help.
I would modify your current script a little bit like so:
set wordlist { Its A Very Good Day! Isn't It }
set newlistlower [list]
foreach word [regexp -inline -all -- {\W+|\w+} $wordlist] {
set newlower [string tolower $word 0 0]
lappend newlistlower $newlower
}
puts [join $newlistlower ""]
[regexp -inline -all -- {\W+|\w+} $wordlist] splits the string into runs of word characters and runs of non-word characters, which means you get to keep spaces, punctuation and so on.
foreach then hands you each piece in turn (the spaces go through the loop as well, but string tolower does not change them).
This will also work on strings such as:
set wordlist {Its A Very Good Day! Isn't It--RIGHT }
to give:
{its a very good day! isn't it--rIGHT }
(The braces are there to show that the trailing space on the right is kept.)
You can use a similar technique to my answer for your previous question:
set sentence { Its A Very Good Day! Isn't It }
set lc [subst -nob -nov [regsub -all {\s[[:upper:]]} $sentence {[string tolower "&"]}]]
puts ">$lc<"
> its a very good day! isn't it <
Another way to do it
% regexp -all -inline -indices {\s[[:upper:]]} $sentence
{1 2} {5 6} {7 8} {12 13} {17 18} {22 23} {28 29}
% set lc $sentence
Its A Very Good Day! Isn't It
% foreach match [regexp -all -inline -indices {\s[[:upper:]]} $sentence ] {
set lc [string replace $lc {*}$match [string tolower [string range $lc {*}$match]]]
}
% puts ">$lc<"
> its a very good day! isn't it <
I'm trying to use a DCG to split a string into two parts separated by spaces. E.g. 'abc def' should give me back "abc" & "def". The program & DCG are below.
main:-
prompt(_, ''),
repeat,
read_line_to_codes(current_input, Codes),
(
Codes = end_of_file
->
true
;
processData(Codes),
fail
).
processData(Codes):-
(
phrase(data(Part1, Part2), Codes)
->
format('~s, ~s\n', [ Part1, Part2 ])
;
format('Didn''t recognize data.\n')
).
data([ P1 | Part1 ], [ P2 | Part2 ]) --> [ P1 | Part1 ], spaces(_), [ P2 | Part2 ].
spaces([ S | S1 ]) --> [ S ], { code_type(S, space) }, (spaces(S1); "").
This works correctly. But I found that having to type [ P1 | Part1 ] and [ P2 | Part2 ] was really verbose. So I tried replacing every [ P1 | Part1 ] with Part1, and likewise every [ P2 | Part2 ] with Part2, in the definition of data, i.e. the following.
data(Part1, Part2) --> Part1, spaces(_), Part2.
That's much easier to type, but it gave me an "Arguments are not sufficiently instantiated" error. So it looks like an unbound variable isn't automatically interpreted as a list of codes in a DCG. Is there any other way to make this less verbose? My intent is to use DCGs where I would use regular expressions in other programming languages.
Your intuition is correct; the term-expansion procedure for DCGs (at least in SWI-Prolog, but should apply to others) with your modified version of data gives the following:
?- listing(data).
data(A, D, B, F) :-
phrase(A, B, C),
spaces(_, C, E),
phrase(D, E, F).
As you can see, the variables Part1 and Part2 in your DCG rule have been translated into calls to phrase/3 rather than being treated as lists; you need to state explicitly that they are lists for them to be treated as such.
I can suggest an alternative version which is more general. Consider the following bunch of DCG rules:
data([A|As]) -->
spaces(_),
chars([X|Xs]),
{atom_codes(A, [X|Xs])},
spaces(_),
data(As).
data([]) --> [].
chars([X|Xs]) --> char(X), !, chars(Xs).
chars([]) --> [].
spaces([X|Xs]) --> space(X), !, spaces(Xs).
spaces([]) --> [].
space(X) --> [X], {code_type(X, space)}.
char(X) --> [X], {\+ code_type(X, space)}.
Take a look at the first clause at the top; the data rule now attempts to match 0-to-many spaces (as many as possible, because of the cut), then one-to-many non-space characters to construct an atom (A) from the codes, then 0-to-many spaces again, then recurses to find more atoms in the string (As). What you end up with is a list of atoms which appeared in the input string without any spaces. You can incorporate this version into your code with the following:
processData(Codes) :-
% parse the list of codes into a list of atoms, one per word
(phrase(data(AtomList), Codes) ->
% concatenate the atoms into a single one delimited by commas
concat_atom(AtomList, ', ', Atoms),
write_ln(Atoms)
;
format('Didn''t recognize data.\n')
).
This version breaks a string apart with any number of spaces between words, even if they appear at the start and end of the string.