I have tried this expression on a few (online) scheme interpreters/parsers and sometimes get different answers. For the following expression:
(display "okkk") \; "ok;" ;"ok"
What would it display and/or return? Why? For example:
Are \ acceptable outside of an s-expression, and do they escape the next character?
How is a string interpreted outside an expression, or is that invalid?
That probably depends on what Scheme syntax your implementation may support.
One might for example expect:
(display "okkk") -> displays okkk
\; -> error: unbound variable
"ok;" -> displays nothing, but returns the string
;"ok" -> end of line comment
Related
I am running string-match using the pattern [ \[\]a-zA-Z0-9_:.,/-]+ to match a sample text Text [a,b]. Although the pattern works on regex101, when I run it on scheme it returns #f. Here is the regex101 link.
This is the function I am running
(string-match "[ \\[\\]a-zA-Z0-9_:.,/-]+" "Text [a,b]")
Why isn't it working on scheme but works eleswhere? Am I missing something?
After discussing the issue on the guile gnu mailing list, I found out that Guile's (ice-9 regex) library uses POSIX extended regular expressions. And this flavor of regular expression doesn't support escaping in character classes [..], hence that's why it wasn't matching the strings.
However, I used the following function as a workaround and it works:
(string-match "[][a-zA-Z]+" "Text[ab]")
I don't see anything wrong with your regular expression syntax as it is quoted correctly so I assume there must be a bug in Guile, or the regexp library it uses, where \] just isn't interpreted the correct way inside brackets. I found a workaround by using the octal code point values instead:
(string-match "[A-Za-z\\[\\0135]+" "Text [a,b]")
; ==> #("Text [a,b]" (0 . 4))
Your regular expression isn't very good. It matches any combination of those chars so "]/Te,3.xt[2" also matches. If you are expecting a string like "Something [something, something]" I would rather have made /[A-Z][a-z0-9]+ [[a-z0-9]+,[a-z0-9]+]/ instead. eg.
(define pattern "[A-Z][a-z0-9]+ \\[[a-z0-9]+,[a-z0-9]+\\]")
(string-match pattern "Test [q,w]") ; ==> #("Test [q,w]" (0 . 10))
(string-match pattern "Be100 [sub,45]") ; ==> #("Be100 [sub,45]" (0 . 14))
Coming from a C++: it always seems like magic to me that some whitespace has an effect on the validity or semantics of the script. Here's an example:
echo a 2 > &1
bash: syntax error near unexpected token `&'
echo a 2 >&1
a 2
echo a 2>&1
a
echo a 2>& 1
a
Looking at this didn't help much. My main problem is that it does not feel consistent; and I am in a state of confusion.
I'm trying to find out how bash tokenizes its scripts. A general description thereof to clear up any confusion would be appreciated.
Edit:
I am NOT looking for redirections specifically. They just came up as example. Other examples:
A="something"
A = "something"
if [$x = $y];
if [ $x = $y ];
Why isn't there a space necessary between ] and ;? Why does assignment require an immediate equal sign? ...
2>&1 is a single operator token, so any whitespace that breaks it up will change the meaning of the command. It just happens to be a parameterized token, which means the shell will further tokenize it to determine what exactly the operator does. The general form is n>&m, where n is the file descriptor you are redirecting, and m is the descriptor you are copying to. In this case, you are saying that the standard error (2) of the command should be copied to whatever standard output (1) is currently open on.
The examples you gave have the behavior they do for good reason.
Redirection sources default to FD 1. Thus, >&1 is legitimate syntax on its own -- it redirects FD 1 to FD 1 -- meaning allowing whitespace before the > would result in an ambiguous syntax: The parser couldn't tell if the preceding token was its own word or a redirection source.
Nothing other than a FD number is valid under >&, unless you're in a very new bash which allows a variable to be dereferenced to retrieve a FD number. In any event, anything immediately following >& is known to be a file descriptor, so allowing optional whitespace creates no ambiguity there.
a = 1 is parsed as a legitimate command, not a syntax error: It runs the command a with the first argument = and the second argument 1. Disallowing whitespace within assignments eliminates this ambiguity. Similarly, a= foo has a separate and distinct meaning: It exports an environment variable a with an empty value while running the command foo. Relaxing the whitespace rules would disallow both of these legitimate commands.
[ is a command, not special syntax known to the parser; thus, [foo tries to find a command (named, say, /usr/bin/[foo), requiring whitespace.
; takes precedence in the parser as a statement separator, rather than being treated as part of a word, unless quoted or escaped. The same is true of & (another separator), or a newline.
The thing is, there's no single general rule which will explain all this; you need to read and learn the language syntax. Fortunately, there's not very much syntax: Almost all commands are "simple commands", which follow very simple and clear rules. You're asking about, and we're explaining, some of the exceptions to that; there are other exceptions, such as [[ ]] in bash, but they're small enough in total that they can be learned.
Other suggested resources:
http://aosabook.org/en/bash.html (The Architecture of Open Source Applications; chapter on bash)
http://mywiki.wooledge.org/BashParser (Wooledge wiki high-level description of the parser -- though this focuses more on expansion rules than tokenization)
http://mywiki.wooledge.org/BashGuide (an introductory guide to bash syntax in general, written with more of a focus on accuracy and best practices than some competing materials).
Im trying to use CLISP on Windows. So, when I start it in Command line I see next
*** - SYSTEM::DRIVER: Character #\u0414 cannot be represented in the character set CHARSET:cp437
Break 1 [3]>
How can I fix this?
This is an FAQ:
What do these error messages mean: “invalid byte #x94 in CHARSET:ASCII conversion” and “character #\u00B3 cannot be represented in the character set CHARSET:ASCII”?
This means that you are trying to read (“invalid byte”) or write (“character cannot be represented”) a non-ASCII character from (or to) a character stream which has ASCII :EXTERNAL-FORMAT. The default is described in -Edomain encoding.
This may also be caused by filesystem access. If you have files with names incompatible with your CUSTOM:*PATHNAME-ENCODING*, filesystem access (e.g., DIRECTORY) will SIGNAL this ERROR. You will need to set CUSTOM:*PATHNAME-ENCODING* or pass -Edomain encoding to CLISP. Using a “1:1” encoding, such as CHARSET:ISO-8859-1, should help you avoid this error.
Note that this error may be signaled by the “Print” part of the read-eval-print loop and not by the function you call. E.g., if file "foo" contains non-ASCII characters, you will see such an error when you type
(WITH-OPEN-FILE (s "foo"
:direction :input
:EXTERNAL-FORMAT CHARSET:ISO-8859-1)
(READ-LINE s))
If instead you type
(WITH-OPEN-FILE (s "foo"
:direction :input
:EXTERNAL-FORMAT CHARSET:ISO-8859-1)
(SETQ l (READ-LINE s))
NIL)
CLISP will just print NIL and signal the error when you type l.
cp437 seems to indicate a code page. Code page 437 is "US-ASCII" if I remember correctly, that is only 7 bits. It seems that you need to configure your "Command line" to display unicode.
I want to write a code in DrRacket that accepts a multiple words from the command prompt and does converts them to a list of string. For eg. if I enter hello how do you do in the prompt, it should convert it into a list '("hello" "how" "do" "you" "do"). Is it possible in DrRacket?
i tried this:
(define inp-lst (read))
On running this code, an input bar is shown in the command promt. but when i enter the above line, the value of inp-lst turns out to be just 'hello. Can someone help?
As a first step, type your input between quotes, like this:
(define inp-lst (read))
"hello how do you do"
Now, you can create a list of strings doing this:
(regexp-split #px" " inp-lst)
> '("hello" "how" "do" "you" "do")
EDIT :
As has been pointed out in the comments, read-line might be a better alternative:
(define inp-lst (read-line))
(regexp-split #px" " inp-lst)
> '("hello" "how" "do" "you" "do")
Using read-line, you don't need to surround the typed text with quotes.
The 'read' function reads one expression which in your case it the single symbol 'hello.' Your intention is to read one line, terminated by #\newline, get a single string and then split it by #\space. Try 'read-line'
I have written a function match-rewriter that is just match-lambda except that it returns its argument if no match is found. match-rewriter is part of a larger function. Here is a portion of the code:
((match-rewriter
(`(PARAMS: (,<arg>))
`(Success))
(`(,<func> . ,<args>)
`(Failure))
)ls)
This function call:
(annotate '(PARAMS: (y))
returns Failure
In another post someone pointed out that this works:
#lang racket
(match `(PARAMS: (y))
[`(PARAMS: (,var)) 'yep]
[otherise 'nope])
returning yep
I verified that it works but I can't figure out why the same pattern isn't being matched in match-rewriter.
Strangely, if I just run this code manually substituting '(PARAMS: (y)) for "ls" it works. Which really confuses me.
Any advice is appreciated.