Where does the extra 'D' come from in dup1.go? - go

I'm new to Go and learning it now. I'm reading "The Go Programming Language" and trying to run the dup1 example on my Mac, but I noticed a very weird issue: the output of the count contains an extra "D". Does anyone have any idea why?
> go run dup1New.go test
test
test
hello
hello
world
3D test
2 hello
> cat dup1New.go
package main

import (
	"bufio"
	"fmt"
	"os"
)

func main() {
	counts := make(map[string]int)
	input := bufio.NewScanner(os.Stdin)
	for input.Scan() {
		counts[input.Text()]++
	}
	// NOTE: ignoring potential errors from input.Err()
	for line, n := range counts {
		if n > 1 {
			fmt.Printf("%d\t%s\n", n, line)
		}
	}
}
go version go1.13.5 darwin/amd64

You're getting that D character from Ctrl+D because of the echoctl option in your terminal device interface. You can turn it off by running this command in your shell/terminal:
stty -echoctl
Ref: man stty

As wlisrausr answered, this is in part from your MacOS Terminal stty settings. (You probably should not turn off echoctl, though.)
To be more complete: when you type the CTRL+D sequence to signal EOF,[1] the tty driver[2] "displays" the character as the two-character sequence ^D, but then prints two backspace or CTRL+H characters. More precisely, it does so as long as the ECHOCTL flag is set in the lflags control field in the underlying tty settings.
The window that is displaying the interactive Terminal session treats output as directives to draw particular characters, move (position) the cursor, and have other interesting effects. Some character codes, particularly those in the range 0x20 (32 decimal) through 0x7e (126 decimal), are displayable ASCII characters. Others are control characters, some of which introduce ANSI escape codes, or Unicode characters that have been encoded in UTF-8. Go itself uses UTF-8 extensively, to encode runes, so Go's use of UTF-8 dovetails nicely with Terminal's use of UTF-8.[3]
The CTRL+H, ASCII code 8—which they call BACKSPACE or BS—has the effect of moving the cursor back one display-column. That is, it is a cursor-positioning control code. (There are many of these; see the ANSI escape codes page. This stuff has a very long history, going back to just after the first glass tty.)
So, the CTRL+D has been displayed as ^D, but the cursor is positioned over the ^ (hat or caret or circumflex) character. Now you, in your Go program, send to the Terminal display-handling code, a sequence of ASCII codes: 3, which is 0x33 or 51 decimal; then TAB or CTRL+I or ASCII Horizontal Tab (HT), which is code 9; then the ASCII codes for the letters test (0x74, 0x65, 0x73, 0x74), then a newline or CTRL+J or ASCII NL, which is code 10.
Like backspace, a horizontal tab is a cursor positioning operation. It directs the terminal (or window emulation of terminal) to move the cursor to the next tab-stop, without changing anything else on the display. So you first overwrite the ^ with 3, leaving 3D visible, and the cursor positioned over the letter D. Then you have Terminal move the cursor to column 9 (columns are numbered from 1 and the default tab stop is at every eighth column) and display the word test, and then move the cursor to column 1 of a new line. The result is that the line shows:
3D test
(with exactly six blank positions between D and the first t). On the newly exposed or created line, which is currently all-blank, you print the character 2, move to column 9, and print the letters hello (and another newline directive).
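The sequence of events above can be replayed with a small simulation. This is only a sketch: renderLine is a hypothetical helper, not part of any library, and it models just three behaviours (a printable ASCII byte overwrites the cell under the cursor, BS moves the cursor left one column, HT moves it to the next tab stop, assumed to be every eighth column).

```go
package main

import (
	"fmt"
	"strings"
)

// renderLine simulates how a very simple terminal would draw one line.
// It is an illustration only; real terminals handle far more codes.
func renderLine(input string) string {
	cells := []byte{}
	col := 0 // zero-based cursor column
	for i := 0; i < len(input); i++ {
		c := input[i]
		switch {
		case c == '\b': // BS: move left, erase nothing
			if col > 0 {
				col--
			}
		case c == '\t': // HT: jump to the next multiple of 8
			col = (col/8 + 1) * 8
		case c >= 0x20 && c <= 0x7e: // printable: overwrite the cell
			for len(cells) <= col {
				cells = append(cells, ' ')
			}
			cells[col] = c
			col++
		}
	}
	return strings.TrimRight(string(cells), " ")
}

func main() {
	echoed := "^D\b\b" // what the tty driver writes when ECHOCTL is set
	fmt.Printf("%q\n", renderLine(echoed+"3\ttest"))
}
```

The ^ is overwritten by the 3, while the D is never touched because the tab only moves the cursor past it.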
[1] In fact, control-D simply pushes the accumulating line through the "input canonization" queue as is. If the line is empty, this sends a zero-length record up the tty's read side. Reading zero bytes from a file or device-file is interpreted as EOF by many systems, including Go's os.File reader. If you type a partial line, without a terminating newline, and then use control-D to send it, you can no longer edit that partial line, and a reader that is reading and is not concerned with newlines will have obtained the data and be using it at this point. A second control-D is then required to signal the EOF: the reader simply got the non-newline terminated input from the first control-D.
[2] This link describes Linux tty drivers, but Linux tty drivers are derived from the same common ancestor behind MacOS tty drivers.
[3] This is not an accident, even though the Go folks are not the Darwin folks: again, all this stuff goes back (via different paths) to some common ancestors.

Related

Backspace character does not work in the Go playground

I am new to Go. Just learnt the various uses of fmt.Println(). I tried the following stuff in the official playground but got a pretty unexpected output. Please explain where I have gone wrong in my understanding.
input: fmt.Println("hi\b", "there!")
output: hi� there!
expected: h there!
input: fmt.Println("hi", '\b', "there!")
output: hi 8 there!
expected: hithere!... assuming runes are not appended with spaces
input: fmt.Println("hi", "\bthere!")
output: hi �there!
expected: hithere!
(Note: above, the placeholder character has been substituted by U+FFFD, as the original character does not render consistently between environments.)
Your program outputs exactly what you told it to. The problem is mostly with your output viewer.
Control characters and sequences only have their expected effect when sent to a compatible virtual console (or a physical terminal, or a printer or teletypewriter; but the latter are pretty rare these days). What the Go playground does is capture the output of your program as-is and send it unmodified to the browser to display. The browser does not interpret terminal control codes (other than the newline character, and even that only sometimes); instead, it expects formatting to be conveyed via HTML markup. Since the backspace character does not have an assigned glyph, browsers will usually display a placeholder glyph instead, or sometimes nothing at all.
You would get a similar effect if, when running your Go program on your local machine, you redirected its output into a text file and then opened the file in a text editor: the editor will not interpret any escape sequence contained in the text file; sometimes it will even actively prevent control characters from being interpreted by the terminal displaying the editor (if it happens to be a console-based editor), by substituting a symbolic, conventional representation of the character like ^H.
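The ^H convention (caret notation) is easy to reproduce: a control character is shown as ^ followed by the character whose code is 0x40 higher. A minimal sketch, in which caretNotation is a made-up helper name and the decision to pass newline and tab through unchanged is an assumption, not something any particular editor is known to do:

```go
package main

import (
	"fmt"
	"strings"
)

// caretNotation replaces ASCII control characters with their ^X form.
// Newline and tab are left alone so the text keeps its layout.
func caretNotation(s string) string {
	var b strings.Builder
	for i := 0; i < len(s); i++ {
		c := s[i]
		switch {
		case c < 0x20 && c != '\n' && c != '\t':
			b.WriteByte('^')
			b.WriteByte(c ^ 0x40) // 0x08 becomes 'H', 0x04 becomes 'D'
		case c == 0x7f:
			b.WriteString("^?") // DEL is conventionally shown as ^?
		default:
			b.WriteByte(c)
		}
	}
	return b.String()
}

func main() {
	fmt.Println(caretNotation("hi\b there!"))
}
```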
In the middle example, the '\b' literal evaluates to an integer with the value of the character’s Unicode code point number (what Go terms a ‘rune’). This is explained in the specification:
A rune literal represents a rune constant, an integer value identifying a Unicode code point. A rune literal is expressed as one or more characters enclosed in single quotes, as in 'x' or '\n'. Within the quotes, any character may appear except newline and unescaped single quote. A single quoted character represents the Unicode value of the character itself, while multi-character sequences beginning with a backslash encode values in various formats.
Since '\b' represents U+0008, what is passed to fmt.Println is the integer value 8. The function then prints the integer as its decimal representation, instead of interpreting it as a character code.
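To see the difference directly, here is a small sketch that formats the same rune three ways. The helper name runeForms is invented for this illustration; the verbs %v, %c and %q are the standard fmt ones.

```go
package main

import "fmt"

// runeForms formats one rune as Println would (%v, the decimal code point),
// as the character itself (%c), and as a quoted Go rune literal (%q).
func runeForms(r rune) (value, char, quoted string) {
	return fmt.Sprintf("%v", r), fmt.Sprintf("%c", r), fmt.Sprintf("%q", r)
}

func main() {
	r := '\b'
	value, char, quoted := runeForms(r)
	fmt.Printf("%%v gives %s, %%c gives %d byte(s), %%q gives %s\n", value, len(char), quoted)

	// Converting the rune to a string first makes Println emit the
	// character itself; %q is used here only to make it visible.
	fmt.Printf("%q\n", fmt.Sprintln("hi", string(r), "there!"))
}
```

If the intent is to send a backspace, passing "\b" (a string) or string(r) instead of the bare rune literal is the usual way to do it.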
The first thing to check is your terminal: '\b' is terminal-dependent. Check whether the terminal running your program handles it as "move cursor one character back" (most Unix-like systems will; I don't know about Windows). Your first and third examples work exactly as you expect on my terminal (st).
input: fmt.Println("hi", '\b', "there!")
output: hi 8 there!
expected: hithere!... assuming runes are not appended with spaces
Here your assumption is not what package fmt does:
For each Printf-like function, there is also a Print function that takes no format and is equivalent to saying %v for every operand. Another variant Println inserts blanks between operands and appends a newline.
Package fmt formats a rune under %v as %d, not %c, so '\b' is formatted as "8" (ASCII value 56), not as the backspace character (ASCII value 8). Println also puts a space on each side of the rune, as it does between any two operands.
What Println does for this input is:
print string "hi"
print space
format number 8 then print string "8"
print space
print string "there!"
To debug problems such as invisible characters in output, I suggest using the encoding/hex package. For example:
package main

import (
	"encoding/hex"
	"fmt"
	"os"
)

func main() {
	d := hex.Dumper(os.Stdout)
	defer d.Close()
	fmt.Fprintln(d, "hi", '\b', "there!")
}
Output: 00000000 68 69 20 38 20 74 68 65 72 65 21 0a |hi 8 there!.|
Playground: https://go.dev/play/p/F-I2mdh43K7

What does writing "\r\027[1A\027[K" to stdout do?

I came across some code for a chat application in the terminal (in OCaml) and saw this string (in ASCII?) "\r\027[1A\027[K" being printed to the terminal before a new user message is printed.
I have tried googling the literals one by one, so I know that "\r" stands for carriage return and \027 for ESC in ASCII, but what do "[1A" and "[K" do? What character encoding is this?
And finally, what is the aggregate effect of this command?
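ESC followed by [ introduces a control sequence. A is the control sequence for "cursor up", and [1A moves the cursor up 1 line. K erases from the cursor to the end of the line; because the \r has already moved the cursor to the first column, that is the whole line. So \x1b[1A\x1b[K moves up one line and deletes it (replaces it with spaces).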
Of course, that is only valid if the terminal that receives that string recognizes the control sequences. Not all do.
See https://en.wikipedia.org/wiki/ANSI_escape_code
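The \027 can look like an error for \033, but it is not: in OCaml string literals a backslash followed by three digits is a decimal escape, so \027 is character 27 (ESC), the same byte that C-style strings write as octal \033.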
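For comparison, here is a sketch of the same byte sequence built in Go, where ESC (byte 27) is written as "\x1b" in hex or "\033" in octal. The constant names are invented for this example, and the effect still depends on the terminal recognising the sequences.

```go
package main

import "fmt"

const (
	esc          = "\x1b"      // ESC, byte 27 (0x1b, octal 033)
	cursorUp1    = esc + "[1A" // CUU: move the cursor up one line
	eraseToEOL   = esc + "[K"  // EL: erase from the cursor to the end of the line
	clearPrevRow = "\r" + cursorUp1 + eraseToEOL
)

func main() {
	// Show the raw bytes without letting the terminal interpret them.
	fmt.Printf("%q is %d bytes\n", clearPrevRow, len(clearPrevRow))

	// On a terminal that understands the sequences, the "typing..." line
	// is replaced by the message that follows.
	fmt.Println("typing...")
	fmt.Print(clearPrevRow)
	fmt.Println("alice: hello")
}
```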

Clearing the screen by printing a character?

I'm using chez-scheme and I can't find a way to clear the screen completely. (If someone knows a better way than printing I'd be interested in that too but it's not my question here)
From what I can find clearing the screen by ^L (control-L) or giving the clear command (in bash at least) is equivalent to outputting ASCII character 12: Form feed.
However, printing this does nothing. If I use (display (integer->char 12)) it just prints a newline. Another way to encode this character is \f (analogous to \n for newline), but in Python print("\f") as well as in Scheme (display "\f") is just a newline.
Is my understanding of the meaning of ASCII 12 just wrong, or are implementations lacking?
Is there any way to clear the screen that should work across languages, analogous to \n for a newline?
If you want to clear the screen, the "ANSI" sequence in a printf
\033[2J
clears the entire screen, e.g.,
printf '\033[2J'
The command-line clear program uses this, along with moving the cursor to the "home" position, again an "ANSI" sequence:
\033[H
The program gets the information from the terminal database. For example, for TERM=vt100, it might see this (using \E as \033):
clear=\E[H\E[J$<50>
(the $<50> indicates padding needed for real VT100s). You might notice that the 2 is absent from this string. That is because the cursor is first moved to the home (upper left) position, and the 2 (entire screen) is not necessary. Eliminating that from the string made VT100s a little faster.
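As a rough illustration of the home-then-erase pair described above, here is a sketch that writes it directly. It hard-codes the sequences instead of consulting the terminal database, which is an assumption that only holds for terminals accepting these "ANSI" codes; the names clearScreen, cursorHome, eraseBelow and eraseScreen are made up for the example.

```go
package main

import (
	"io"
	"os"
)

const (
	cursorHome  = "\x1b[H"  // move the cursor to the upper left corner
	eraseBelow  = "\x1b[J"  // ED with the default Ps = 0: erase below the cursor
	eraseScreen = "\x1b[2J" // ED with Ps = 2: erase the entire screen
)

// clearScreen homes the cursor and then erases everything below it,
// mirroring the vt100 "clear" string without the padding.
func clearScreen(w io.Writer) error {
	_, err := io.WriteString(w, cursorHome+eraseBelow)
	return err
}

func main() {
	if err := clearScreen(os.Stdout); err != nil {
		os.Exit(1)
	}
}
```

A program that has to work across terminal types would normally ask the terminal database (as clear does) rather than hard-code the bytes.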
On the other hand, if you just want to reset the terminal, you can use the VT100-style RIS:
\033c
but that has side-effects, besides not being in ECMA-48. These bug reports were for side-effects of \033c:
Debian Bug report logs - #60377
"reset" broken for dumb terminals
Debian Bug report logs - #239205
"reset changes a unicode console to non-unicode"
Further reading:
Why doesn't the screen clear when I type control/L?
XTerm Control Sequences
CSI Ps J Erase in Display (ED).
Ps = 0 -> Erase Below (default).
Ps = 1 -> Erase Above.
Ps = 2 -> Erase All.
Ps = 3 -> Erase Saved Lines (xterm).
ECMA-48: Control Functions for Coded Character Sets
You can print \033c which resets the terminal:
petite -q <<< '(display "\033c")'
\033 is escape and c is literal c.
I can't give you any information about how widely this is supported.

An obscure one: Documented VT100 'soft-wrap' escape sequence?

When connected to a remote BASH session via SSH (with the terminal type set to vt100), the console command line will soft-wrap when the cursor hits column 80.
What I am trying to discover is if the <space><carriage return> sequence that gets sent at this point is documented anywhere?
For example sending the following string
std::string str = "0123456789" // 1
"0123456789"
"0123456789" // 3
"0123456789"
"0123456789" // 5
"012345678 9"
"0123456789_" // 7
"0123456789"
"0";
gets the following response back from the host (Linux Mint as it happens)
01234567890123456789012345678901234567890123456789012345678<WS><WS><CR>90123456789_01234567890
The behaviour observed is not really part of bash; rather, it is part of the behaviour of the readline library. It doesn't happen if you simply use echo (which is a bash builtin) to output enough text to force an automatic line wrap, nor does it happen if bash produces an error message which is wider than the console. (Try, for example, the command . with an argument of more than 80 characters not corresponding to any existing file.)
So it's not an official "soft-wrap sequence", nor is it part of any standard. Rather, it's a pragmatic solution to one of the many irritating problems related to console display management.
There is an ambiguity in terminal implementation of line wrapping:
1. The terminal wraps after a character is inserted at the rightmost position.
2. The terminal wraps just before the next character is sent.
As a result, it is not possible to reliably send a newline after the last column position. If the terminal had already wrapped (option 1 above), then the newline will create an extra blank line. Otherwise (option 2), the following newline will be "eaten".
These days, almost all terminals follow some variant of option 2, which was the behaviour of the DEC VT-100 terminal. In the vocabulary of the terminfo terminal description database, this is called xenl: the "eat-newline-glitch".
There are actually two possible subvariants of option 2. In the one actually implemented by the VT-100 (and xterm), the cursor ends up in an anomalous state at the end of the line; effectively, it is one character position off the screen, so you can still backspace the cursor in the same line. Other historic terminals "ate" the newline, but positioned the cursor at the beginning of the next line anyway, so that a backspace would not be possible. (Unless the terminal has the bw capability.)
This creates a problem for programs which need to accurately keep track of the cursor position, even for apparently simple applications like echoing input. (Obviously, the easiest way to echo input is to let the terminal do that itself, but that precludes being able to implement extra control characters like tab completion.) Suppose the user has entered text right up to the right margin, and then types the backspace character to delete the last character typed. Normally, you could implement a backspace-delete by outputting a cub1 (move left 1) code and then an el (clear to end of line). (It's more complicated if the deletion is in the middle of a line, but the principle is the same.)
However, if the cursor could possibly be at the beginning of the next line, this won't work. If you knew the cursor was at the beginning of the next, you could move up and then to the right before doing the el, but that wouldn't work if the cursor was still on the same line.
Historically, what was considered "correct" was to force the cursor to the next line with a hard return. (Following quote is taken from the file terminfo.src found in the ncurses distribution. I don't know who wrote it or when):
# Note that the <xenl> glitch in vt100 is not quite the same as on the Concept,
# since the cursor is left in a different position while in the
# weird state (concept at beginning of next line, vt100 at end
# of this line) so all versions of vi before 3.7 don't handle
# <xenl> right on vt100. The correct way to handle <xenl> is when
# you output the char in column 80, immediately output CR LF
# and then assume you are in column 1 of the next line. If <xenl>
# is on, am should be on too.
But there is another way to handle the issue which doesn't require you to even know whether the terminal has the xenl "glitch" or not: output a space character, after which the terminal will definitely have line-wrapped, and then return to the leftmost column.
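The claim that space followed by carriage return works on both kinds of terminal can be checked with a toy model that tracks only the cursor. This is a deliberately simplified sketch, not a description of any real emulator: the term type, its fields and cursorAfter are hypothetical, the width is assumed to be 80, and line feed is modelled as "move down one row" in both cases.

```go
package main

import (
	"fmt"
	"strings"
)

// term tracks the cursor of an imaginary terminal. With deferred set it
// wraps just before the next printable character (option 2); otherwise it
// wraps right after the last column is written (option 1).
type term struct {
	width    int
	deferred bool
	row, col int
	pending  bool // deferred model only: a wrap is due before the next printable
}

func (t *term) write(s string) {
	for i := 0; i < len(s); i++ {
		switch s[i] {
		case '\r':
			t.col, t.pending = 0, false
		case '\n':
			t.row++
			t.pending = false
		default: // treat everything else as a printable character
			if t.pending {
				t.row, t.col, t.pending = t.row+1, 0, false
			}
			switch {
			case t.col < t.width-1:
				t.col++
			case t.deferred:
				t.pending = true // stay in the last column for now
			default:
				t.row, t.col = t.row+1, 0
			}
		}
	}
}

// cursorAfter fills exactly one line and then sends tail.
func cursorAfter(deferred bool, tail string) (row, col int) {
	t := &term{width: 80, deferred: deferred}
	t.write(strings.Repeat("x", t.width) + tail)
	return t.row, t.col
}

func main() {
	for _, tail := range []string{"\r\n", " \r"} {
		r1, c1 := cursorAfter(false, tail)
		r2, c2 := cursorAfter(true, tail)
		fmt.Printf("%q: immediate wrap -> (%d,%d), deferred wrap -> (%d,%d)\n", tail, r1, c1, r2, c2)
	}
}
```

In this model CR LF leaves the two kinds of terminal on different rows, while space then CR leaves both at the start of the next row, which is the property the trick relies on.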
As it turns out, this trick has another benefit if the terminal emulator is xterm (and probably other such emulators), which allows you to select a "word" by double-clicking on it. If the automatic line wrap happens in the middle of a word, it would be ideal if you could still select the entire word even though it is split over two lines. If you follow the suggestion in the terminfo file above, then xterm will (quite reasonably) treat the split word as two words, because they have an explicit newline between them. But if you let the terminal wrap automatically, xterm treats the result as a single word. (It does this despite the output of the space character, presumably because the space character was overwritten.)
In short, the SP CR sequence is not in any way a standardized feature of the VT100 terminal. Rather, it is a pragmatic response to a specific feature of terminal descriptions combined with the observed behaviour of a specific (and common) terminal emulator. Variants of this code can be found in a variety of codebases, and although as far as I know it is not part of any textbook or formal documentation, it is certainly part of terminal-handling folkcraft [note 2].
In the case of readline, you'll find a comment in the code which is much more telegraphic than this answer: [note 1]
/* If we're at the right edge of a terminal that supports xn, we're
ready to wrap around, so do so. This fixes problems with knowing
the exact cursor position and cut-and-paste with certain terminal
emulators. In this calculation, TEMP is the physical screen
position of the cursor. */
(xn is the short form of xenl.)
Notes
1. The comment is at line 1326 of display.c in the current view of the git repository as I type this answer. In future versions it may be at a different line number, and the provided link will therefore not work. If you notice that it has changed, please feel free to correct the link.
2. In the original version of this answer, I described this procedure as "part of terminal handling folklore", in which I used the word "folklore" to describe knowledge passed down from programmer to programmer rather than being part of the canon of academic texts and international standards. While "folklore" is often used with a negative connotation, I use it without such prejudice. "lore" (according to wiktionary) refers to "all the facts and traditions about a particular subject that have been accumulated over time through education or experience", and is derived from an Old Germanic word meaning "teach". Folklore is therefore the accumulated education and experience of the "folk", as opposed to the establishment: in Eric S. Raymond's analogy of the Cathedral and the Bazaar, folklore is the knowledge base of the Bazaar.
This usage raised the eyebrows of at least one highly-skilled practitioner, who suggested the use of the word "esoteric" to describe this bit of information about terminal-handling. "Esoteric" (again according to wiktionary) applies to information "intended for or likely to be understood by only a small number of people with a specialized knowledge or interest, or an enlightened inner circle", being derived from the Greek ἐσωτερικός, "inner circle". (In other words, the knowledge of the Cathedral.)
While the semantic discussion is, at least, amusing, I changed the text by using the hopefully less emotionally-charged word "folkcraft".
There is more than one reason for making line-wrapping a special case (and "folklore" seems an inappropriate term):
The xterm FAQ entry "That description of wrapping is odd, say more?" is one of many places discussing vt100 line-wrapping.
vim and screen both take care to not use cursor-addressing to avoid the wrapping, since that would interfere with selecting a wrapped line in xterm. Instead (and the sample seems to show bash doing this too) they send a series of printable characters which step across the margin before sending other control sequences which would prevent the line-wrapping flag from being set in xterm. This is noted in xterm's manual page:
Logical words and lines selected by double- or triple-clicking may wrap
across more than one screen line if lines were wrapped by xterm itself
rather than by the application running in the window.
As for "comments in code" - there certainly are, to explain to maintainers what should not be changed. This from Sven Mascheck's XTerm resource file gives a good explanation:
! Wether this works also with _wrapped_ selections, depends on
! - the terminal emulator: Neither MIT X11R5/6 nor Suns openwin xterm
! know about that. Use the 'xfree xterm' or 'rxvt'. Both compile on
! all major platforms.
! - It only works if xterm is wrapping the line itself
! (not always really obvious for the user, though).
! - Among the different vi's, vim actually supports this with a
! clever and little hackish trick (see screen.c):
!
! But before: vim inspects the _name_ of the value of TERM.
! This must be similar to "xterm" (like "xterm-xfree86", which is
! better than "xterm-color", btw, see his FAQ).
! The terminfo entry _itself_ doesn't matter here
! (e.g.: 'xterm' and 'vs100' are the same entry, but with
! the latter it doesn't work).
!
! If vim has to wrap a word, it appends a space at the first part,
! this space will be wrapped by xterm. Going on with writing, vim
! in turn then positions the cursor again at the _beginning_ of this
! next line. Thus, the space is not visible. But xterm now believes
! that the two lines are actually a single one--as xterm _has_ done
! some wrapping also...
The comment which @rici quotes came from the terminfo file which Eric Raymond incorporated from SCO in 1995. The history section of the terminfo source refers to this. Some of the material in that is based on the BSD termcap sources, but differs, as one would notice when comparing the BSD termcap in this section with ncurses. The four paragraphs beginning with the "not quite" are the same (aside from line-wrapping) as in the SCO file. Here is a cut/paste from that file:
# # --------------------------------
#
# dec: DEC (DIGITAL EQUIPMENT CORPORATION)
#
# Manufacturer: DEC (DIGITAL EQUIPTMENT CORP.)
# Class: II
#
# Info:
# Note that xenl glitch in vt100 is not quite the same as concept,
# since the cursor is left in a different position while in the
# weird state (concept at beginning of next line, vt100 at end
# of this line) so all versions of vi before 3.7 don't handle
# xenl right on vt100. The correct way to handle xenl is when
# you output the char in column 80, immediately output CR LF
# and then assume you are in column 1 of the next line. If xenl
# is on, am should be on too.
#
# I assume you have smooth scroll off or are at a slow enough baud
# rate that it doesn't matter (1200? or less). Also this assumes
# that you set auto-nl to "on", if you set it off use vt100-nam
# below.
#
# The padding requirements listed here are guesses. It is strongly
# recommended that xon/xoff be enabled, as this is assumed here.
#
# The vt100 uses rs2 and rf rather than is2/tbc/hts because the
# tab settings are in non-volatile memory and don't need to be
# reset upon login. Also setting the number of columns glitches
# the screen annoyingly. You can type "reset" to get them set.
#
# smkx and rmkx, given below, were removed.
# smkx=\E[?1h\E=, rmkx=\E[?1l\E>,
# Somtimes smkx and rmkx are included. This will put the auxilliary keypad in
# dec application mode, which is not appropriate for SCO applications.
vt100|vt100-am|dec vt100 (w/advanced video),
If you compare the two, the ncurses version has angle brackets added around the terminfo capability names, and a minor grammatical change was made in the first sentence. But the author of the comment clearly was not Raymond.

Forward delete character?

To dynamically delete a character from a string, you can use the \b character.
puts "Hello\b World!" #=> Hell World!
\b basically does the same thing as a backspace. Is there a character that emulates a forward delete?
In the execution of:
puts "Hello\b World!"
The \b doesn't delete the prior character. This is a common misconception since a backspace used on a keyboard will delete the previously typed character prior to the cursor on screen and from the keyboard input buffer. That behavior occurs because of how the keyboard input software of the operating system is designed.
In the case of the above puts, the o still exists in the string. What happens is that, when displayed, the backspace causes the o to be overwritten by the following space. This occurs because the o is displayed first, followed by the backspace (output cursor is backed up one character position), followed by the space, in sequence.
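The same point can be checked byte by byte. A small sketch in Go, whose string literals use the same \b escape as the Ruby one (the constant name greeting is just for illustration):

```go
package main

import "fmt"

// greeting is the string from the question, written as a Go literal.
const greeting = "Hello\b World!"

func main() {
	fmt.Println(len(greeting))           // the backspace adds a byte, it removes nothing
	fmt.Printf("%q\n", greeting)         // quoted form shows the \b still in place
	fmt.Println(greeting[4] == 'o')      // the o is still in the string
	fmt.Println(greeting[5] == '\b')     // and is followed by the BS byte
}
```

Only the display differs: a terminal draws the o, steps back, and then draws the space over it.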
If you could have such a case where:
puts "Hello<del> World!"
would display HelloWorld!, then that would mean the output of the value of <del> would somehow cause the following output of the space character to not occur. In other words, the <del> would have the function of, "whatever the next character is that comes for the output, skip it". I don't believe such a control character exists in Windows or Linux output, although I suppose it would be possible to write an output driver that would have that behavior for some defined control character.
You might even be able to do something like this:
"Hello W<left-arrow><left-arrow><del><right-arrow>orld!"
Which would display HelloWorld if your terminal is set up to accept control characters that move the cursor left or right. But it still obviously isn't the same functionality as the "delete in the future" case.

Resources