How to describe a quoted string in EBNF - ebnf

How do I describe a quoted string (like in C, Java, etc) in EBNF notation?
I was thinking of this (see below), but the AnyCharacter part will also match the double quotes (").
QuotedString = '"' AnyCharacter* '"' ;
In other words, how do I match all characters except the double quote character ("), but still allow escapes (\")?

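You could do something like
string = '"' ( printable-char | nested-quote )* '"' ;
where
printable-char = letter | digit | '~' | '#' | '%' | '_' | '$' | '&' | "'" | '-' | '+' | '/' ;
where
letter = 'A'..'Z' | 'a'..'z' | extended-ascii ;
and
digit = '0'..'9' ;
and nested-quote is whatever escape sequence you allow for an embedded quote (for example '\' '"').
You get the general idea.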
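For comparison, the usual shape of such a rule is: a quote, then any number of items that are either an ordinary character (anything except a quote or a backslash) or a backslash followed by one character, then a closing quote. Below is a minimal Ruby sketch of that shape as a regular expression, assuming backslash is the escape character. The names QUOTED_STRING and sample are made up for illustration.

```ruby
# Mirrors: '"' ( ordinary-char | escape )* '"'
#   ordinary-char = any character except '"' and '\'
#   escape        = '\' followed by any single character
QUOTED_STRING = /"(?:[^"\\]|\\.)*"/

sample = 'say "he said \"hi\"" now'   # single quotes keep the backslashes
matched = sample[QUOTED_STRING]
puts matched
```

The match covers the whole quoted string including the escaped inner quotes, because a quote preceded by a backslash is consumed by the escape alternative and never ends the string.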

Related

How to remove special characters and multiple spaces from a string [closed]

I want to remove all special characters (including spaces) from a string's beginning and end and replace consecutive spaces with one. For example,
" !:;:§" this string is normal. "§$"§"$"§$ $"$§" "
should become:
"this string is normal"
I want to allow ! and ? at the end of the string.
" !:;:§" this string is normal? "§$"§"$"§$ $"$§" "
" !:;:§" this string is very normal! "§$"§"$"§$ $"$§" "
" !:;:§" this string is very normal!? "§$"§"$"§$ $"$§" "
should become:
"this string is normal?"
"this string is normal!"
"this string is normal!?"
This is all for getting nice titles in an app.
Can someone help me please? Or does anyone know a good regex command for nice titles?
Do it step by step:
str.
gsub(/\A\W+/, ''). # remove garbage from the very beginning
gsub(/\W*\z/) { |m| m[/\A\p{Punct}*/] }. # leave trailing punctuation
gsub(/\s{2,}/, ' ') # squeeze
R = /
(?: # begin a non-capture group
\p{Alnum}+ # match one or more alphanumeric characters
[ ]+ # match one or more spaces
)* # end non-capture group and execute zero or more times
\p{Alnum}+ # match one or more alphanumeric characters
[!?]* # match zero or more characters '!' and '?'
/x # free-spacing regex definition mode
def extract(str)
str[R].squeeze(' ')
end
arr = [
' !:;:§" this string is normal? "§$"§"$"§$ $"$§" ',
' !:;:§" this string is very normal! "§$"§"$"§$ $"$§" ',
' !:;:§" this string is very normal!? "§$"§"$"§$ $"$§" ',
' !:;:§" cette chaîne est normale? "§$"§"$"§$ $"$§" '
]
arr.each { |s| puts extract(s) }
prints
this string is normal?
this string is very normal!
this string is very normal!?
cette chaîne est normale?
See the doc for \p{Alnum} in Regexp (search for "\p{} construct").
I wrote the regular expression in free-spacing mode in order to document each step. It would conventionally be written as follows.
/(?:\p{Alnum}+ +)*\p{Alnum}+[!?]*/
Notice that in free-spacing mode I put a space in a character class. Had I not done so the space would have been removed before the regular expression was evaluated.
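The point about spaces in free-spacing mode can be checked directly. The sketch below uses two throwaway patterns (the variable names are made up here) to show that a bare space is dropped under /x while a space inside a character class is kept.

```ruby
# In free-spacing mode (/x) a bare space in the pattern is ignored,
# while a space inside a character class is kept.
bare_space  = /a b/x     # behaves like /ab/
class_space = /a[ ]b/x   # matches "a", a space, then "b"

puts bare_space.match?('a b')    # false
puts bare_space.match?('ab')     # true
puts class_space.match?('a b')   # true
puts class_space.match?('ab')    # false
```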
If non-alphanumeric characters, other than spaces, are permitted in the interior of the string, change the regular expression to the following.
def extract(str)
str.gsub(R,'')
end
R = /
\A # match the beginning of the string
[^\p{Alnum}]+ # match one or more non-alphanumeric characters
| # or
[^\p{Alnum}!?] # match a character other than an alphanumeric, '!' or '?'
[^\p{Alnum}]+ # match one or more non-alphanumeric characters
\z # match the end of the string
| # or
[ ] # match a space...
(?=[ ]) # ...followed by a space
/x # free-spacing regex definition mode
extract ' !:;:§" this string $$ is abnormal? "§$" $"$§" '
prints
"this string $$ is abnormal?"
This regex will remove:
Question and exclamation marks that are not preceded by a "normal" character or by another question or exclamation mark.
Whitespace characters that are not preceded by a "normal" character.
All other non-"normal" characters.
The word "very".
(I assume "normal" characters in this case are 0..9, a..z and A..Z.)
str = '" !:;:§" this string is very normal!? "§$"§"$"§$ $"$§" "'
str.gsub(/
(?:\bvery\s+) |
(?:(?<![A-Za-z\d!?])[!?]) |
(?:(?<![A-Za-z\d])\s) |
[^A-Za-z\s\d!?]
/x, '')
=> "this string is normal!?"

tr command: strange behavior with | and \

Let's say I have a file test.txt with contents:
+-foo.bar:2.4
| bar.foo:1.1:test
\| hello.goobye:3.3.3
\|+- baz.yeah:4
I want to use the tr command to delete all instances of the following set of characters:
{' ', '+', '-', '|', '\'}
I've done some pretty extensive research on this but found no clear, concise answers.
This is the command that works:
input:
cat test.txt | tr -d "[:blank:]|\\\+-"
output:
foo.bar:2.4
bar.foo:1.1:test
hello.goobye:3.3.3
baz.yeah:4
I experimented with many combinations of that set and I found out that the '-' was being treated as a range indicator (like... [a-z]) and therefore must be put at the end. But I have two main questions:
1) Why must the backslash be double escaped in order to be included in the set?
2) Why does putting the '|' at the end of the set string cause the tr program to delete everything in the file except for trailing new line characters?
Like this:
tr -d '\-|\\+[:blank:] ' < file
You have to escape the - because it denotes ranges of characters, as in:
tr -d '1-5'
and must therefore be escaped if you mean a literal hyphen. You can also put it at the end. (learned that, thanks! :) )
Furthermore, the \ must be escaped when you mean a literal \, because it introduces escape sequences.
The remaining characters need not be escaped.
Why must the \ be doubly escaped in your example?
It's because you are using a double-quoted ("") string to quote the char set. The shell interprets backslashes inside a double-quoted string, so \\ there becomes a single literal \ before tr sees it. Try:
echo "\+"
echo "\\+"
echo "\\\+"
To avoid double-escaping the \, you can just use single quotes, as in my example above.
Why does putting the '|' at the end of the set string cause the tr program to delete everything in the file except for trailing new line characters?
Following CharlesDuffy's comment: having the | at the end also means that the unescaped - was no longer at the end, so it described a range of characters, and the actual range depends on where the - sat in the set.
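The range effect is easy to reproduce outside the shell. Ruby's String#delete uses a similar set notation (a hyphen between two characters is a range, a hyphen at the end is literal, and a backslash escapes the next character), so the sketch below uses it as a stand-in for tr -d. It is an analogue under that assumption, not tr itself, and it does not use classes such as [:blank:]. With | placed last, +-| is the range from + (0x2B) to | (0x7C), which covers the digits, the letters, . and :.

```ruby
line = "\\|+- baz.yeah:4\n"   # one line of test.txt, written as a Ruby string

# Hyphen in the middle: "+-|" is a range from '+' to '|'.
range_result = line.delete(" \\\\+-|")

# Hyphen last: every character in the set is literal.
literal_result = line.delete(" |\\\\+-")

p range_result     # "\n"
p literal_result   # "baz.yeah:4\n"
```

Only the newline survives the first call, which matches the "everything deleted except trailing newlines" symptom from the question.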
Another approach is to define the allowed chars:
$ tr -cd '[:alnum:]:.\n' <file
foo.bar:2.4
bar.foo:1.1:test
hello.goobye:3.3.3
baz.yeah:4
Or perhaps delete the leading non-word chars on each line:
$ sed -E 's/\W+//' file

How to escape special chars powershell

I am using the code below to send some keys to automate some process in my company.
$wshell = New-Object -ComObject wscript.shell;
$wshell.SendKeys("here comes my string");
The problem is that the string that gets sent must be sanitized to escape some special chars, as described here.
For example, {, [, + and ~ must be escaped as {{}, {[}, {+} and {~}.
So I am wondering: is there an easy, clean way to do the replacement in the string? I don't want to use tons of string.replace("{","{{}"); string.replace("[","{[}") calls.
What is the right way to do this?
You can use a Regular Expression (RegEx for short) to do this. RegEx is used for pattern matching, and works great for what you need. Ironically, you need to escape the characters for RegEx before defining the RegEx pattern, so we'll make an array of the special characters, escape them, join them all with | (which indicates OR), and then replace on that with the -replace operator.
$SendKeysSpecialChars = '{','}','[',']','~','+','^','%'
$ToEscape = ($SendKeysSpecialChars|%{[regex]::Escape($_)}) -join '|'
"I need to escape [ and } but not # or !, but I do need to for %" -replace "($ToEscape)",'{$1}'
That produces:
I need to escape {[} and {}} but not # or !, but I do need to for {%}
Just put the first two near the beginning of the script, then use the replace as needed. Or make a function that you can call that'll take care of the replace and the SendKeys call for you.
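The same technique (build one alternation from the escaped special characters, then wrap each match in braces) can be sketched in Ruby for comparison. This only illustrates the pattern-building step and does not call SendKeys. The names SEND_KEYS_SPECIAL, SPECIAL_PATTERN and escape_for_send_keys are made up here, and the character list is the one from the answer.

```ruby
SEND_KEYS_SPECIAL = ['{', '}', '[', ']', '~', '+', '^', '%'].freeze

# Regexp.union escapes each string and joins them with "|".
SPECIAL_PATTERN = Regexp.union(SEND_KEYS_SPECIAL)

# Wrap every special character in braces, e.g. "[" becomes "{[}".
def escape_for_send_keys(text)
  text.gsub(SPECIAL_PATTERN) { |char| "{#{char}}" }
end

puts escape_for_send_keys('I need to escape [ and } but not # or !, but I do need to for %')
```

Using a block with gsub avoids having to think about backreference syntax in the replacement string.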
You can use Here Strings.
Note: Here Strings were designed for multi-line strings, but you can still use them to escape expression characters.
As stated on this website.
A here string is a single-quoted or double-quoted string which can
span multiple lines. Expressions in single-quoted strings are not
evaluated.
All the lines in a here-string are interpreted as strings,
even though they are not enclosed in quotation marks.
Example:
To declare a here-string, the text itself must start on a new line after the opening delimiter (PowerShell syntax).
$string = @'
{ [ + ~ ! £ $ % ^ & ( ) _ - # ~ # '' ""
'@
Output: { [ + ~ ! £ $ % ^ & ( ) _ - # ~ # '' ""

Where did the character go?

I matched a string against a regex:
s = "`` `foo`"
r = /(?<backticks>`+)(?<inline>.+)\g<backticks>/
And I got:
s =~ r
$& # => "`` `foo`"
$~[:backticks] # => "`"
$~[:inline] # => " `foo"
Why is $~[:inline] not "` `foo"? Since $& is s, I expect:
$~[:backticks] + $~[:inline] + $~[:backticks]
to be s, but it is not, one backtick is gone. Where did the backtick go?
It is actually expected. Look:
(?<backticks>`+) - matches 1+ backticks and stores them in the named capture group "backticks" (there are two backticks). Then...
(?<inline>.+) - 1+ characters other than a newline are matched into the "inline" named capture group. It grabs all the string and backtracks to yield characters to the recursed subpattern that is actually the "backticks" capture group. So,...
\g<backticks> - this is a subexpression call, not a backreference: it runs the `+ pattern again and finds 1 backtick at the end of the string. That satisfies the condition to match 1+ backticks. The named capture "backticks" buffer is overwritten here.
The matching works like this:
"`` `foo`"
 ||1
   | 2 |
        |3
And then 1 becomes 3, and since 1 and 3 are the same group, you see one backtick.
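The difference between a subexpression call and a backreference can be seen side by side. \g<backticks> re-runs the pattern `+ and overwrites the capture, whereas \k<backticks> matches the exact text the group has already captured. Whether \k is what the question actually needs is an assumption; the sketch below only contrasts the two (the names call_re, backref_re, m and m2 are illustrative).

```ruby
s = "`` `foo`"

call_re    = /(?<backticks>`+)(?<inline>.+)\g<backticks>/   # re-runs `+
backref_re = /(?<backticks>`+)(?<inline>.+)\k<backticks>/   # same text again

m = call_re.match(s)
p m[:backticks]   # "`"  (overwritten by the call at the end)
p m[:inline]      # " `foo"

m2 = backref_re.match(s)
# With a backreference the three pieces add up to the whole string again.
p m2[:backticks] + m2[:inline] + m2[:backticks] == s
```

With the backreference, the opening group has to settle on a single backtick, because only one backtick is available at the end to repeat it.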

Escape status within a string literal as argument of `String#tr`

There is something mysterious to me about the escape status of a backslash within a single quoted string literal as argument of String#tr. Can you explain the contrast between the three examples below? I particularly do not understand the second one. To avoid complication, I am using 'd' here, which does not change the meaning when escaped in double quotation ("\d" = "d").
'\\'.tr('\\', 'x') #=> "x"
'\\'.tr('\\d', 'x') #=> "\\"
'\\'.tr('\\\d', 'x') #=> "x"
Escaping in tr
The first argument of tr works much like a bracketed character class in regular expressions. You can put ^ at the start of the expression to negate the matching (replace anything that doesn't match) and use e.g. a-f to match a range of characters. Because ^ and - have these special meanings, tr also processes backslash escapes itself, so you can write \^ and \- for the literal characters.
print 'abcdef'.tr('b-e', 'x') # axxxxf
print 'abcdef'.tr('b\-e', 'x') # axcdxf
Escaping in Ruby single quote strings
Furthermore, in single-quoted strings Ruby keeps a backslash as a literal character unless it is escaping another backslash or a single quote.
# Single quotes
print '\\' # \
print '\d' # \d
print '\\d' # \d
print '\\\d' # \\d
# Double quotes
print "\\" # \
print "\d" # d
print "\\d" # \d
print "\\\d" # \d
The examples revisited
With all that in mind, let's look at the examples again.
'\\'.tr('\\', 'x') #=> "x"
The string defined as '\\' becomes the literal string \ because the first backslash escapes the second. No surprises there.
'\\'.tr('\\d', 'x') #=> "\\"
The string defined as '\\d' becomes the literal string \d. The tr engine, in turn, uses the backslash in the literal string to escape the d. Result: tr replaces instances of d with x, and the backslash in the receiver is left alone.
'\\'.tr('\\\d', 'x') #=> "x"
The string defined as '\\\d' becomes the literal \\d. First \\ becomes \. Then \d stays \d, i.e. the backslash is preserved. (This particular behavior is different from double-quoted strings, where the backslash would be eaten alive, leaving only a lonesome d.)
The literal string \\d then makes tr replace all characters that are either a backslash or a d with the replacement string.
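One way to see the two layers separately is to print the characters that actually reach tr, then run tr on a receiver that contains both a d and a backslash. A small sketch (the receiver 'd\\' is chosen here for illustration; it is not from the question):

```ruby
receiver = 'd\\'   # two characters: d and a backslash

['\\', '\\d', '\\\d'].each do |set|
  # set.chars shows what tr receives after Ruby's own unescaping.
  puts "#{set.chars.inspect} -> #{receiver.tr(set, 'x').inspect}"
end
```

The first set replaces only the backslash, the second only the d, and the third both, which lines up with the three cases discussed above.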
