Getting ASCII code with Gforth - char

When entering 'a' in Gforth, the ASCII code of the character (the same number that the word KEY would put on the stack after pressing a) is put onto the stack.
This does not work with, for example, ' ' (space). Instead:
' ' ok
.s <1> 34384939008 ok
The number “should” be 32. What explains this behavior? And what can be done about it – aside from manually putting the ASCII number corresponding to ' ' (space) on the stack?

This 'a' syntax is quite new to Forth. It's added as an extension on top of the traditional syntax which parses everything into whitespace-delimited tokens. So 'a' is one atomic token, which is then parsed as a character literal.
Now, ' ' isn't an atomic token, since it contains a space character. Rather, it's parsed as two ' tokens. It's actually perfectly valid Forth code, because ' is a Forth word (called "tick"). In your example, the first tick operates on the second. The result, 34384939008, is the xt for '.
What to do instead? The traditional words for getting the ASCII code of a character are CHAR and [CHAR]. The first works in interpreted mode, and the second in compiled mode. But they don't work for the particular case of the space character, because again, all whitespace is parsed away.
However, there is another word which pushes the ASCII code of the space character: BL.
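The tokenization described above can be mimicked outside Forth. The following is only a rough Ruby model of whitespace-delimited parsing, not Gforth's actual parser; the method name forth_tokens is made up for illustration.

```ruby
# Model of the parsing step only: the input line is split into
# whitespace-delimited tokens before anything is interpreted.
def forth_tokens(line)
  line.split
end

p forth_tokens("'a'")   # one token, which Gforth then reads as a character literal
p forth_tokens("' '")   # two tokens: tick followed by tick
p " ".ord               # 32, the value that BL pushes
```

The second call shows why ' ' can never reach the character-literal stage: the space inside it has already been consumed as a delimiter.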

Related

Backspace character does not work in the Go playground

I am new to Go. Just learnt the various uses of fmt.Println(). I tried the following stuff in the official playground but got a pretty unexpected output. Please explain where I have gone wrong in my understanding.
input: fmt.Println("hi\b", "there!")
output: hi� there!
expected: h there!
input: fmt.Println("hi", '\b', "there!")
output: hi 8 there!
expected: hithere!... assuming runes are not appended with spaces
input: fmt.Println("hi", "\bthere!")
output: hi �there!
expected: hithere!
(Note: above, the placeholder character has been substituted by U+FFFD, as the original character does not render consistently between environments.)
Your program outputs exactly what you told it to. The problem is mostly with your output viewer.
Control characters and sequences only have their expected effect when sent to a compatible virtual console (or a physical terminal, or a printer or teletypewriter; but the latter are pretty rare these days). What the Go playground does is capture the output of your program as-is and send it unmodified to the browser to display. The browser does not interpret terminal control codes (other than the newline character, and even that only sometimes); instead, it expects formatting to be conveyed via HTML markup. Since the backspace character does not have an assigned glyph, browsers will usually display a placeholder glyph instead, or sometimes nothing at all.
You would get a similar effect if, when running your Go program on your local machine, you redirected its output into a text file and then opened the file in a text editor: the editor will not interpret any escape sequence contained in the text file. Sometimes it will even actively prevent control characters from being interpreted by the terminal displaying the editor (if it happens to be a console-based editor), by substituting a conventional symbolic representation of the character, like ^H.
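To make the difference concrete, here is a minimal Ruby sketch of what a terminal does with a backspace and what a browser or editor does not. It assumes the simplest possible behavior (backspace moves the cursor one cell left without erasing, and later characters overwrite); real terminals handle many more cases, and the method name render_with_backspace is hypothetical.

```ruby
# Simulate how a simple terminal renders one line that contains backspaces.
def render_with_backspace(text)
  cells = []    # what is currently visible on the line
  cursor = 0    # column where the next character will be written
  text.each_char do |ch|
    if ch == "\b"
      cursor -= 1 if cursor > 0   # move left; nothing is erased yet
    else
      cells[cursor] = ch          # overwrite whatever is under the cursor
      cursor += 1
    end
  end
  cells.join
end

puts render_with_backspace("hi\b there!")   # h there!
puts render_with_backspace("hi \bthere!")   # hithere!
```

The two inputs are the byte sequences that the first and third Println calls produce (minus the trailing newline), and the rendered lines match the output the question expected.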
In the middle example, the '\b' literal evaluates to an integer with the value of the character’s Unicode code point number (what Go terms a ‘rune’). This is explained in the specification:
A rune literal represents a rune constant, an integer value identifying a Unicode code point. A rune literal is expressed as one or more characters enclosed in single quotes, as in 'x' or '\n'. Within the quotes, any character may appear except newline and unescaped single quote. A single quoted character represents the Unicode value of the character itself, while multi-character sequences beginning with a backslash encode values in various formats.
Since '\b' represents U+0008, what is passed to fmt.Println is the integer value 8. The function then prints the integer as its decimal representation, instead of interpreting it as a character code.
The first thing to check is your terminal. The effect of '\b' is terminal-dependent, so check whether the terminal running your program handles it as "move the cursor one character back" (most Unix-like systems will; I don't know about Windows). Your first and third examples work exactly as you expect on my terminal (st).
input: fmt.Println("hi", '\b', "there!")
output: hi 8 there!
expected: hithere!... assuming runes are not appended with spaces
Here your assumption does not match what package fmt does:
For each Printf-like function, there is also a Print function that takes no format and is equivalent to saying %v for every operand. Another variant Println inserts blanks between operands and appends a newline.
fmt formats a rune under %v as %d, not %c, so '\b' is printed as the string "8" (ASCII value 56), not as the character '\b' (ASCII value 8). Println also puts a space between every pair of operands, so the rune gets a space on each side.
What Println does for this input is:
print string "hi"
print space
format number 8 then print string "8"
print space
print string "there!"
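The steps above can be modeled in a few lines. This is a rough Ruby model of the behavior just described, not Go's implementation, and println_model is a made-up name.

```ruby
# Rough model of Println: every operand is turned into its default text
# form, operands are joined with single spaces, and a newline is appended.
def println_model(*operands)
  operands.map(&:to_s).join(" ") + "\n"
end

rune_b = "\b".ord   # 8, the integer that the literal '\b' stands for
print println_model("hi", rune_b, "there!")   # hi 8 there!
```

Because the rune arrives as the integer 8, its text form is the digit "8", which is why no backspace byte ever reaches the output.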
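To debug problems such as rendering invisible characters, I suggest using the encoding/hex package. For example:
package main

import (
	"encoding/hex"
	"fmt"
	"os"
)

func main() {
	// Dumper wraps stdout and writes a hex dump of everything written to it.
	d := hex.Dumper(os.Stdout)
	// Close flushes the final, partially filled line of the dump.
	defer d.Close()
	fmt.Fprintln(d, "hi", '\b', "there!")
}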
Output: 00000000 68 69 20 38 20 74 68 65 72 65 21 0a |hi 8 there!.|
Playground: https://go.dev/play/p/F-I2mdh43K7

Replacing chars in string

I have this code:
inspect w-string1 replacing all x'C48D' by 'c'
But I get this error from the compiler:
Operand has wrong size
Is there any way to replace several characters with a single character using the INSPECT statement, or must I do it myself with a PERFORM loop?
In INSPECT ... REPLACING, the string being replaced and its replacement must be the same length. Here x'C48D' is two bytes long and 'c' is one, hence the error. The only way to replace multiple characters by a different number of characters is to write your own loop to do it.
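The loop itself is simple in any language. As a language-neutral illustration, here is a sketch in Ruby rather than COBOL; the method name replace_bytes is hypothetical, and the same scan-and-copy logic would go inside a PERFORM loop with a source index and a separate output buffer.

```ruby
# Copy source to a new buffer byte by byte, emitting `replacement`
# whenever the byte sequence `pattern` starts at the current position.
def replace_bytes(source, pattern, replacement)
  src = source.b
  pat = pattern.b
  raise ArgumentError, "pattern must not be empty" if pat.empty?

  out = String.new(encoding: Encoding::BINARY)
  i = 0
  while i < src.bytesize
    if src.byteslice(i, pat.bytesize) == pat
      out << replacement.b
      i += pat.bytesize      # skip the whole matched sequence
    else
      out << src.byteslice(i, 1)
      i += 1
    end
  end
  out
end

p replace_bytes("\xC4\x8Daj".b, "\xC4\x8D".b, "c")   # "caj"
```

Because the output is built separately from the input, the replacement may be shorter or longer than the pattern, which is exactly what INSPECT cannot do.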

Replace non-word characters, unless given sequence matches

I have a string like this:
"Jim-Bob's email ###hl###address###endhl### is: jb#example.com"
I want to replace all non-word characters (symbols and whitespace), except the ### delimiters.
I'm currently using:
str.gsub(/[^\w#]+/, 'X')
which yields:
"JimXBobXsXemailX###hl###address###endhl###XisXjb#exampleXcom"
In practice, this is good enough, but it offends me for two reasons:
The # in the email address is not replaced.
The use of [^\w] instead of \W feels sloppy.
How do I replace all non-word characters, unless those characters make up the ###hl### or ###endhl### delimiter strings?
str.gsub(/(###.*?###|\w+)|./) { $1 || "X" }
# => "JimXBobXsXemailX###hl###address###endhl###XisXXjbXexampleXcom"
This approach uses the fact that alternations work like a case structure: the first alternative that matches consumes the corresponding string, and no further matching is done on it. Thus, ###.*?### will consume a marker (like ###hl###); nothing else will be matched inside it. We also match any sequence of word characters. If either of those is captured, we can just return it as-is ($1). If not, then we match any other character (i.e. not inside a marker, and not a word character) and replace it with "X".
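Note that ###.*?### accepts anything between two ### runs, not only hl and endhl. If only those two delimiters should survive, the same alternation trick works with an explicit list. This is a sketch of that variation; the names DELIMITERS, PATTERN and mask_non_word are made up for illustration.

```ruby
DELIMITERS = ["###hl###", "###endhl###"].freeze

# Regexp.union escapes each string and joins them with "|".
# The /m flag lets "." match newlines as well.
PATTERN = /(#{Regexp.union(DELIMITERS)}|\w+)|./m

def mask_non_word(str)
  # $1 is set when a delimiter or a run of word characters matched;
  # otherwise a single other character matched and is replaced by "X".
  str.gsub(PATTERN) { $1 || "X" }
end

puts mask_non_word("Jim-Bob's email ###hl###address###endhl### is: jb#example.com")
# JimXBobXsXemailX###hl###address###endhl###XisXXjbXexampleXcom
```

With this pattern a stray ### that is not part of a listed delimiter is masked like any other symbol.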
Regarding your second point, I think you are asking too much; there is no simple way to avoid that.
Regarding the first point, a simple way is to temporarily replace "###" with a character that never occurs in your input (say "\r", if your system never produces it), and restore it afterwards:
"Jim-Bob's email ###hl###address###endhl### is: jb#example.com"
.gsub("###", "\r").gsub(/[^\w\r]/, "X").gsub("\r", "###")
# => "JimXBobXsXemailX###hl###address###endhl###XisXXjbXexampleXcom"

What's the difference between /\t+|,/ and /[\t+,]/ when splitting a string using Ruby?

I have a string separated by \t and ,, but the number of \t is not fixed, for example:
a = "seg1\tseg2\t\tseg3,seg4"
seg2 and seg3 are separated by two \t.
So I tried to split it with
a.split(/\t+|,/)
which prints the right answer:
["seg1", "seg2", "seg3", "seg4"]
I also tried this:
a.split(/[\t+,]/)
but the answer is
["seg1", "seg2", "", "seg3", "seg4"]
Why does Ruby print different results?
Because \t+ inside [] does not mean "one or more tabs", it means "a tab or a plus". Since it finds two consecutive tabs, it splits twice, and the string in the middle becomes empty.
Most special characters, like . + * ? etc., become "regular" characters when placed inside a character class ([...]). There are some exceptions, like ^ (which negates the class when placed at the beginning), - (which forms a range between two characters), \ (which escapes the next character(s), just like it does outside a class) and ] (which closes the class; an unescaped [ is also special there). So, [\t+,] actually means '\t' or '+' or ','.
Unfortunately, I don't know of any reference for the full set of characters that need or don't need escaping inside a character class. When in doubt, I tend to escape just to be sure. In any case, a character class always matches a single character only; if you want something different, you must put your quantifier outside the class. (For example: [\t,]+, if you also accept two commas in a row; otherwise, your first regex is really the correct one.)
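A short Ruby session makes the three cases easy to compare. The input is the string from the question; the last line uses a made-up input to show that + inside the brackets is a literal plus sign.

```ruby
a = "seg1\tseg2\t\tseg3,seg4"

p a.split(/\t+|,/)    # ["seg1", "seg2", "seg3", "seg4"]
p a.split(/[\t+,]/)   # ["seg1", "seg2", "", "seg3", "seg4"]
p a.split(/[\t,]+/)   # ["seg1", "seg2", "seg3", "seg4"]

# Inside the brackets "+" is an ordinary character, so it splits on a plus:
p "1+2".split(/[\t+,]/)   # ["1", "2"]
```

The third pattern gives the same result as the first on this input, but it would also merge a tab followed by a comma into a single separator.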

Parse /var/mail/username file in Ruby

For some reason I need to fetch emails from the /var/mail/username file. It seems to be an append-only file.
My question is: is it safe to parse the content of the /var/mail/username file by relying on the first line, From username#host Mon Jun 20 16:50:15 2011? What if a similar pattern is found inside the email body?
Furthermore, is there any opensource ruby script available for reference?
Yes, that seems like more or less the right way to parse the mbox format - from a quick scan of the RFC specification:
The structure of the separator lines vary across implementations, but usually contain the exact character sequence of "From", followed by a single Space character (0x20), an email address of some kind, another Space character, a timestamp sequence of some kind, and an end-of-line marker.
And...
Many implementations are also known to escape message body lines that begin with the character sequence of "From ", so as to prevent confusion with overly-liberal parsers that do not search for full separator lines. In the common case, a leading Greater-Than symbol (0x3E) is used for this purpose (with "From " becoming ">From "). However, other implementations are known not to escape such lines unless they are immediately preceded by a blank line or if they also appear to contain an email address and a timestamp. Other implementations are also known to perform secondary escapes against these lines if they are already escaped or quoted, while others ignore these mechanisms altogether.
Update:
There's also this: https://github.com/meh/ruby-mbox
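For reference, here is a minimal sketch of the parsing idea in plain Ruby. It assumes the simplest convention described in the quoted text: every line starting with "From " is a separator, and body lines were escaped once as ">From ". As the quote warns, real mailboxes vary, so treat this as a starting point only; the method name split_mbox and the sample data are made up.

```ruby
# Split mbox text into messages, one hash per message.
def split_mbox(text)
  messages = []
  current = nil
  text.each_line do |line|
    if line.start_with?("From ")
      # A separator line closes the previous message and opens a new one.
      messages << current if current
      current = { separator: line.chomp, lines: [] }
    elsif current
      # Undo one level of ">From " escaping in headers and body.
      current[:lines] << line.sub(/\A>From /, "From ")
    end
  end
  messages << current if current
  messages
end

sample = <<~MBOX
  From alice@example.com Mon Jun 20 16:50:15 2011
  Subject: first

  >From here on this is body text.

  From bob@example.com Mon Jun 20 17:00:00 2011
  Subject: second

  Hello.
MBOX

split_mbox(sample).each { |m| puts m[:separator] }
```

A mailbox written without escaping would be split incorrectly at any body line that starts with "From ", which is the risk the question asks about.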

Resources