Backspace character does not work in the Go playground

I am new to Go. Just learnt the various uses of fmt.Println(). I tried the following stuff in the official playground but got a pretty unexpected output. Please explain where I have gone wrong in my understanding.
input: fmt.Println("hi\b", "there!")
output: hi� there!
expected: h there!
input: fmt.Println("hi", '\b', "there!")
output: hi 8 there!
expected: hithere!... assuming runes are not appended with spaces
input: fmt.Println("hi", "\bthere!")
output: hi �there!
expected: hithere!
(Note: above, the placeholder character has been substituted by U+FFFD, as the original character does not render consistently between environments.)

Your program outputs exactly what you told it to. The problem is mostly with your output viewer.
Control characters and sequences only have their expected effect when sent to a compatible virtual console (or a physical terminal, or a printer or teletypewriter; but the latter are pretty rare these days). What the Go playground does is capture the output of your program as-is and send it unmodified to the browser to display. The browser does not interpret terminal control codes (other than the newline character, and even that only sometimes); instead, it expects formatting to be conveyed via HTML markup. Since the backspace character does not have an assigned glyph, browsers will usually display a placeholder glyph instead, or sometimes nothing at all.
You would get a similar effect if, when running your Go program on your local machine, you redirected its output into a text file and then opened the file in a text editor: the editor will not interpret any escape sequence contained in the text file; sometimes it will even actively prevent control characters from being interpreted by the terminal displaying the editor (if it happens to be a console-based editor), by substituting a symbolic, conventional representation of the character like ^H.
In the middle example, the '\b' literal evaluates to an integer with the value of the character’s Unicode code point number (what Go terms a ‘rune’). This is explained in the specification:
A rune literal represents a rune constant, an integer value identifying a Unicode code point. A rune literal is expressed as one or more characters enclosed in single quotes, as in 'x' or '\n'. Within the quotes, any character may appear except newline and unescaped single quote. A single quoted character represents the Unicode value of the character itself, while multi-character sequences beginning with a backslash encode values in various formats.
Since '\b' represents U+0008, what is passed to fmt.Println is the integer value 8. The function then prints the integer as its decimal representation, instead of interpreting it as a character code.
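A small playground-runnable illustration of that: the rune is just a number until you explicitly format it as a character, and what the last line looks like on screen then depends on whether the viewer honours the backspace:
package main

import "fmt"

func main() {
    fmt.Println('\b')                 // prints 8: the rune literal is just the integer 8
    fmt.Printf("%q\n", '\b')          // prints '\b': %q quotes the rune
    fmt.Printf("hi%c there!\n", '\b') // %c writes the raw backspace byte into the output
}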

First thing to check is your terminal: '\b' is terminal dependent, so check whether the terminal running your program handles it as "move the cursor one character back" (most Unix-like terminals will; I don't know about Windows). Your first and third examples work exactly as you expect on my terminal (st).
input: fmt.Println("hi", '\b', "there!")
output: hi 8 there!
expected: hithere!... assuming runes are not appended with spaces
Here your assumption does not match what package fmt does:
For each Printf-like function, there is also a Print function that takes no format and is equivalent to saying %v for every operand. Another variant Println inserts blanks between operands and appends a newline.
fmt handles %v for a rune as %d, not %c, so '\b' is formatted as the string "8" (ASCII value 56), not as the backspace character (ASCII value 8). Println also puts a blank around the rune, because it inserts blanks between every pair of operands; see the snippet after the breakdown below for how to print the character itself.
What Println does for this input is:
print string "hi"
print space
format number 8 then print string "8"
print space
print string "there!"
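If the goal is to emit the backspace character itself rather than the digit 8, format the rune with %c or convert it to a string first; a minimal sketch (whether it then erases anything depends, as above, on the terminal):
package main

import "fmt"

func main() {
    fmt.Printf("%s%c%s\n", "hi", '\b', "there!") // writes hi, a backspace byte, there!
    fmt.Println("hi" + string('\b') + "there!")  // same bytes via a string conversion
}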
To debug problems like this, where invisible characters are involved, I suggest using the encoding/hex package. For example:
package main

import (
    "encoding/hex"
    "fmt"
    "os"
)

func main() {
    // Wrap stdout in a hex dumper so every byte written is shown in a hex dump.
    d := hex.Dumper(os.Stdout)
    defer d.Close()
    fmt.Fprintln(d, "hi", '\b', "there!")
}
Output: 00000000 68 69 20 38 20 74 68 65 72 65 21 0a |hi 8 there!.|
Playground: https://go.dev/play/p/F-I2mdh43K7

Related

How to encode a TAB character in a Code128 barcode using only raw ZPL

In the past, we've used ZPL to create Code39 barcodes with a TAB character encoded in the middle using something similar to the following:
*USERNAME$IPASSWORD*
The $I in the middle gets translated to a TAB by the barcode scanners we use.
Now we have a need to do the same thing, but using Code128. With Code39, all the text needs to be uppercase (unless you're using Code39Extended, which supports lowercase letters). Because some of the data that is going to be encoded will be lowercase, we need to use Code128 B for most of the barcode, switching to Code128 A in the middle to encode the TAB character, then back to Code128 B for the final part.
Looking through the "ZPL II Programming Guide", it should be as easy as:
>:username>7{TAB}>6PA55w0rd
The >: at the beginning sets the subset to B, the >7 changes the subset to A, and the >6 changes the subset back to B. The problem I'm having (and haven't found a solution after almost a week of searching) is: How do I encode a TAB character using only text?
Use the ^FH (field hexadecimal encoding) command immediately prior to your field data. Based on your example:
^FH_^FD>:username>7_09>6PA55w0rd^FS
Where the underscore '_' is used as the escape character and 09 is the hex value for tab.
Also note that if the chosen escape character appears in the user name or password, you will need to escape it as well.
I tried what Mark Warren suggested, but unfortunately, it didn't work. It did, however, get me looking back through the ZPL II Programming Guide and I found the following, which I had overlooked before:
Code 128, Subsets A and C are programmed in pairs of digits, 00 to 99, in the field data string.
...
In Subset A, each pair of digits results in a single character being encoded in the bar code...
So, since 73 equates to a TAB in Subset A, I tried the following:
>:username>773>6PA55w0rd
And it worked!
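For completeness, a rough sketch in Go of assembling that field data programmatically; the buildCode128Data helper and the credential values are hypothetical, and the invocation codes (>: for subset B, >7 for subset A, >6 back to B, digit pair 73 for TAB in subset A) are the ones described above:
package main

import "fmt"

// buildCode128Data assembles Code 128 field data that starts in subset B,
// switches to subset A just long enough to encode a TAB (digit pair "73"),
// then switches back to subset B. Helper name and values are illustrative.
func buildCode128Data(username, password string) string {
    return ">:" + username + ">7" + "73" + ">6" + password
}

func main() {
    data := buildCode128Data("username", "PA55w0rd")
    // Embed the field data in a minimal label; the barcode parameters are illustrative.
    fmt.Printf("^XA^BCN,120,Y,N,N^FD%s^FS^XZ\n", data)
}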

Where does the extra 'D' come from in dup1.go?

I'm new to golang and learning it now. I'm reading "The Go Programming Language" book and trying to run the dup1 example on my Mac. But I noticed a very weird issue. The output of the count contains an extra "D". Anyone has any idea why?
> go run dup1New.go test
test
test
hello
hello
world
3D test
2 hello
> cat dup1New.go
package main

import (
    "bufio"
    "fmt"
    "os"
)

func main() {
    counts := make(map[string]int)
    input := bufio.NewScanner(os.Stdin)
    for input.Scan() {
        counts[input.Text()]++
    }
    // NOTE: ignoring potential errors from input.Err()
    for line, n := range counts {
        if n > 1 {
            fmt.Printf("%d\t%s\n", n, line)
        }
    }
}
go version go1.13.5 darwin/amd64
You're getting that D character from Ctrl+D because of the echoctl option in your terminal device interface. You can easily turn it off by running this command in your shell/terminal:
stty -echoctl
Ref: man stty
As wlisrausr answered, this is in part from your MacOS Terminal stty settings. (You probably should not turn off echoctl, though.)
To be more complete: when you type the CTRL+D sequence to signal EOF,1 the tty driver2 "displays" the character as the two-character sequence ^D, but then prints two backspace or CTRL+H characters. More precisely, it does so as long as the ECHOCTL flag is set in the lflags control field in the underlying tty settings.
The window that is displaying the interactive Terminal session is treating output as directives to draw particular characters, move (position) the cursor, and have other interesting effects. Some character codes, particularly those in the range 0x20 (32 decimal) through 0x7e (126 decimal), are displayable ASCII characters. Others are control characters or ANSI escape sequences, or Unicode characters that have been encoded in UTF-8. Go itself uses UTF-8 extensively, to encode runes, so Go's use of UTF-8 dovetails nicely with Terminal's use of UTF-8.3
CTRL+H, ASCII code 8 (which the ASCII standard names BACKSPACE or BS), has the effect of moving the cursor back one display-column. That is, it is a cursor-positioning control code. (There are many of these; see the ANSI escape codes page. This stuff has a very long history, going back to just after the first glass tty.)
So, the CTRL+D has been displayed as ^D, but the cursor is positioned over the ^ (hat or caret or circumflex) character. Now your Go program sends, to the Terminal's display-handling code, a sequence of ASCII codes: the digit 3, which is 0x33 or 51 decimal; then TAB or CTRL+I or ASCII Horizontal Tab (HT), which is code 9; then the ASCII codes for the letters test (0x74, 0x65, 0x73, 0x74); then a newline or CTRL+J or ASCII NL, which is code 10.
Like backspace, a horizontal tab is a cursor positioning operation. It directs the terminal (or window emulation of terminal) to move the cursor to the next tab-stop, without changing anything else on the display. So you first overwrite the ^ with 3, leaving 3D visible, and the cursor positioned over the letter D. Then you have Terminal move the cursor to column 9 (columns are numbered from 1 and the default tab stop is at every eighth column) and display the word test, and then move the cursor to column 1 of a new line. The result is that the line shows:
3D test
(with exactly six blank positions between D and the first t). On the newly exposed or created line, which is currently all-blank, you print the character 2, move to column 9, and print the letters hello (and another newline directive).
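As a rough illustration (run it in a local terminal, not the playground, since it relies on the terminal interpreting backspace and tab), this sketch reproduces the same sequence of control characters:
package main

import "fmt"

func main() {
    // What the tty driver echoes for CTRL+D when ECHOCTL is set:
    fmt.Print("^D")   // the two-character representation ...
    fmt.Print("\b\b") // ... followed by two backspaces, leaving the cursor on the ^
    // What dup1 then writes for the line "test" seen 3 times:
    fmt.Print("3\ttest\n") // the 3 overwrites the ^, leaving "3D", then tab to column 9
}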
1In fact, control-D simply pushes the accumulating line through the "input canonization" queue as is. If the line is empty, this sends a zero-length record up the tty's read side. Reading zero bytes from a file or device-file is interpreted as EOF by many systems, including Go's os.File reader. If you type a partial line, without a terminating newline, and then use control-D to send it, you can no longer edit that partial line, and a reader that is reading and is not concerned with newlines will have obtained the data and be using it at this point. A second control-D is then required to signal the EOF: the reader simply got the non-newline terminated input from the first control-D.
2This link describes Linux tty drivers, but Linux tty drivers are derived from the same common ancestor behind MacOS tty drivers.
3This is not an accident, even though the Go folks are not the Darwin folks: again, all this stuff goes back (via different paths) to some common ancestors.
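To see the zero-length-read behaviour described in footnote 1 for yourself, a small sketch (run locally in a terminal; type CTRL+D on an empty line, or type a partial line first):
package main

import (
    "fmt"
    "os"
)

func main() {
    buf := make([]byte, 64)
    n, err := os.Stdin.Read(buf)
    fmt.Printf("read %d bytes, err = %v\n", n, err)
    // CTRL+D on an empty line:      read 0 bytes, err = EOF
    // "abc" then CTRL+D (no enter): read 3 bytes, err = <nil>
}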

Is using ASCII 10 inside a HL7 segment a valid way to represent a new line?

Is placing an ASCII 10 (0A) character somewhere inside a segment of an HL7 message to represent a new line character valid?
From what I can see, it is recommended to use \X0D\ or \X0D0A\ to represent a new line character for plain-text format HL7. Is using just the 0A ASCII character explicitly invalid HL7?
To respond to the question "Is using just the 0A ASCII character explicitly invalid HL7?":
The character 0A is not mentioned anywhere in the HL7 specs as being special.
Extract from the HL7 2.5 US specs:
2.5.4 Message delimiters
In constructing a message, certain special characters are used. They are the segment terminator, the field
separator, the component separator, subcomponent separator, repetition separator, and escape character. The
segment terminator is always a carriage return (in ASCII, a hex 0D). The other delimiters are defined in the
MSH segment, with the field delimiter in the 4th character position, and the other delimiters occurring as in
the field called Encoding Characters, which is the first field after the segment ID. The delimiter values used
in the MSH segment are the delimiter values used throughout the entire message. In the absence of other
considerations, HL7 recommends the suggested values found in Figure 2-1 delimiter values.
Strictly speaking this would mean that you could use the character 0A just as any of the characters other than the 6 previously mentioned.
<end of "formal" reply>
That being said, I concur with Dale H. that you should stay away from using this character in the content of an HL7 message. Since most editors (except old-fashioned Notepad on Windows) will display this character as a new line, you might mistakenly think that a segment was truncated or malformed. And I've had at least one instance where the interface engine indeed handled that character as a segment terminator (which in itself is invalid; that interface engine build was modified to not do this anymore).
So better avoid this. But in situations where you don't control the output, it doesn't seem to be a formally disallowed character...
Linefeeds (0x0A) are not allowed in HL7 messages. If you edit messages with notepad, wordpad and many other text editors, they will convert carriage returns (0x0D) to CR/LF (0x0D 0x0A) and if you save, you now have a corrupt HL7 message. Avoid LFs (0x0A).
If you only send 0A then there is no way to determine that you wanted ASCII 10/line feed and it would be assumed you wanted a zero and an A.
With standard HL7, where the escape character is \, then yes, the recommended way would be \X0A\. The \X marks the start of hexadecimal data, followed by two-character hexadecimal values, ending with a \.
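As an illustration only (the helper name and field values are hypothetical; confirm with the receiving system what it actually accepts), a sketch of replacing raw line breaks with the hex escapes before the text goes into a field:
package main

import (
    "fmt"
    "strings"
)

// escapeLineBreaks replaces raw CR and LF bytes with HL7 hex escapes,
// assuming the message uses the default escape character \.
func escapeLineBreaks(s string) string {
    s = strings.ReplaceAll(s, "\r\n", `\X0D0A\`)
    s = strings.ReplaceAll(s, "\r", `\X0D\`)
    s = strings.ReplaceAll(s, "\n", `\X0A\`)
    return s
}

func main() {
    note := "line one\nline two"
    fmt.Println("OBX|1|TX|||" + escapeLineBreaks(note))
    // OBX|1|TX|||line one\X0A\line two
}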
That being said, if you are sending this data to a system, they should be able to tell you what they accept for line feeds. I've seen systems that use \.br\ or the repetition character ~ to indicate a new line. And sometimes they want repeating segments; in the example below, each OBX segment is a new line of a report in the system.
OBX|1|TX|||This is line one
OBX|2|TX|||This is line two

code128 barcode with tilde and asterisk

I am maintaining a printing program that now requires printing both a ~ and an * in a code128 barcode in zpl.
Currently, I am using the code below that uses the ^FH to represent the tilde in hex:
^BCN,120,Y,N,N,N^FH^FDSPECIAL*MAKE_7e123456^FS
The barcode prints as 'SPECIALMAKE123456', i.e. without the * and ~. Is it possible to print the tilde and asterisk in a ZPL Code 128 barcode?
As a quick guess, since I don't have a ZPLII printer immediately available, I'd try
^BCN,120,Y,N,N,A^FH^FDSPECIAL*MAKE_7e123456^FS
(note A before the ^FH = Auto-select codeset)
Perhaps also forcing a codeset by ...^FH^FD>:SPECIAL*... may work, but subset B is the default in any case...
I located my old A300 printer, and was able to produce the required interpretation line using each of
^BCN,120,Y,N,N,A^FH^FDSPECIAL*MAKE_7E123456^FS
^BCN,120,Y,N,N,A^FH^FDSPECIAL_2AMAKE_7E123456^FS
Can't find my scanner to verify at present - but the computer room is a mite tidier...
It may depend on the type of barcode.
For example, to print in Code 128 you have to switch to code B with the invocation characters >:
And: to print a tilde ~, type >=. To print ^, type ><. To print >, type >0.
Look in the ZPL documentation for the table of Code 128 Invocation Characters.
My sample ZPL code:
^XA
^BY2,3,95^FT0,206^BCN,,Y,N
^FD>:caret >< bigger >0 tilde >= end^FS
^PQ1,1,1,Y^XZ

How do I escape a Unicode string with Ruby?

I need to encode/convert a Unicode string to its escaped form, with backslashes. Anybody know how?
In Ruby 1.8.x, String#inspect may be what you are looking for, e.g.
>> multi_byte_str = "hello\330\271!"
=> "hello\330\271!"
>> multi_byte_str.inspect
=> "\"hello\\330\\271!\""
>> puts multi_byte_str.inspect
"hello\330\271!"
=> nil
In Ruby 1.9 if you want multi-byte characters to have their component bytes escaped, you might want to say something like:
>> multi_byte_str.bytes.to_a.map(&:chr).join.inspect
=> "\"hello\\xD8\\xB9!\""
In both Ruby 1.8 and 1.9 if you are instead interested in the (escaped) unicode code points, you could do this (though it escapes printable stuff too):
>> multi_byte_str.unpack('U*').map{ |i| "\\u" + i.to_s(16).rjust(4, '0') }.join
=> "\\u0068\\u0065\\u006c\\u006c\\u006f\\u0639\\u0021"
To use a Unicode character in Ruby, use the "\uXXXX" escape, where XXXX is the code point as four hexadecimal digits. See http://leejava.wordpress.com/2009/03/11/unicode-escape-in-ruby/
If you have Rails kicking around you can use the JSON encoder for this:
require 'active_support'
x = ActiveSupport::JSON.encode('µ')
# x is now "\u00b5"
The usual non-Rails JSON encoder doesn't "\u"-ify Unicode.
There are two components to your question as I understand it: Finding the numeric value of a character, and expressing such values as escape sequences in Ruby. Further, the former depends on what your starting point is.
Finding the value:
Method 1a: from Ruby with String#dump:
If you already have the character in a Ruby String object (or can easily get it into one), this may be as simple as displaying the string in the repl (depending on certain settings in your Ruby environment). If not, you can call the #dump method on it. For example, with a file called unicode.txt that contains some UTF-8 encoded data in it – say, the currency symbols €£¥$ (plus a trailing newline) – running the following code (executed either in irb or as a script):
s = File.read("unicode.txt", :encoding => "utf-8") # this may be enough, from irb
puts s.dump # this will definitely do it.
... should print out:
"\u20AC\u00A3\u00A5$\n"
Thus you can see that € is U+20AC, £ is U+00A3, and ¥ is U+00A5. ($ is not converted, since it's straight ASCII, though it's technically U+0024. The code below could be modified to give that information, if you actually need it. Or just add leading zeroes to the hex values from an ASCII table – or reference one that already does so.)
(Note: a previous answer suggested using #inspect instead of #dump. That sometimes works, but not always. For example, running ruby -E UTF-8 -e 'puts "\u{1F61E}".inspect' prints an unhappy face for me, rather than an escape sequence. Changing inspect to dump, though, gets me the escape sequence back.)
Method 1b: with Ruby using String#encode and rescue:
Now, if you're trying the above with a larger input file, the above may prove unwieldy – it may be hard to even find escape sequences in files with mostly ASCII text, or it may be hard to identify which sequences go with which characters. In such a case, one might replace the second line above with the following:
encodings = {}           # hash to store mappings in
s.split("").each do |c|  # loop through each "character"
  begin
    c.encode("ASCII")    # try to encode it to ASCII
  rescue Encoding::UndefinedConversionError   # but if that fails
    encodings[c] = $!.error_char.dump         # capture a dump, mapped to the source character
  end
end
# And then print out all the captured non-ASCII characters:
encodings.each do |char, dumped|
  puts "#{char} encodes to #{dumped}."
end
With the same input as above, this would then print:
€ encodes to "\u20AC".
£ encodes to "\u00A3".
¥ encodes to "\u00A5".
Note that it's possible for this to be a bit misleading. If there are combining characters in the input, the output will print each component separately. For example, for input of 🙋🏾 ў ў, the output would be:
🙋 encodes to "\u{1F64B}".
🏾 encodes to "\u{1F3FE}".
ў encodes to "\u045E".
у encodes to "\u0443". ̆
encodes to "\u0306".
This is because 🙋🏾 is actually encoded as two code points: a base character (🙋 - U+1F64B), with a modifier (🏾, U+1F3FE; see also). Similarly with one of the letters: the first, ў, is a single pre-combined code point (U+045E), while the second, ў – though it looks the same – is formed by combining у (U+0443) with the modifier ̆ (U+0306 - which may or may not render properly, including on this page, since it's not meant to stand alone). So, depending on what you're doing, you may need to watch out for such things (which I leave as an exercise for the reader).
Method 2a: from web-based tools: specific characters:
Alternatively, if you have, say, an e-mail with a character in it, and you want to find the code point value to encode, if you simply do a web search for that character, you'll frequently find a variety of pages that give unicode details for the particular character. For example, if I do a google search for ✓, I get, among other things, a wiktionary entry, a wikipedia page, and a page on fileformat.info, which I find to be a useful site for getting details on specific unicode characters. And each of those pages lists the fact that that check mark is represented by unicode code point U+2713. (Incidentally, searching in that direction works well, too.)
Method 2b: from web-based tools: by name/concept:
Similarly, one can search for unicode symbols to match a particular concept. For example, I searched above for unicode check marks, and even on the Google snippet there was a listing of several code points with corresponding graphics, though I also find this list of several check mark symbols, and even a "list of useful symbols" which has a bunch of things, including various check marks.
This can similarly be done for accented characters, emoticons, etc. Just search for the word "unicode" along with whatever else you're looking for, and you'll tend to get results that include pages that list the code points. Which then brings us to putting that back into ruby:
Representing the value, once you have it:
The Ruby documentation for string literals describes two ways to represent unicode characters as escape sequences:
\unnnn Unicode character, where nnnn is exactly 4 hexadecimal digits ([0-9a-fA-F])
\u{nnnn ...} Unicode character(s), where each nnnn is 1-6 hexadecimal digits ([0-9a-fA-F])
So for code points with a 4-digit representation, e.g. U+2713 from above, you'd enter (within a string literal that's not in single quotes) this as \u2713. And for any unicode character (whether or not it fits in 4 digits), you can use braces ({ and }) around the full hex value for the code point, e.g. \u{1f60d} for 😍. This form can also be used to encode multiple code points in a single escape sequence, separating characters with whitespace. For example, \u{1F64B 1F3FE} would result in the base character 🙋 plus the modifier 🏾, thus ultimately yielding the abstract character 🙋🏾 (as seen above).
This works with shorter code points, too. For example, that currency character string from above (€£¥$) could be represented with \u{20AC A3 A5 24} – requiring only 2 digits for three of the characters.
You can use Unicode characters directly if you just add # encoding: UTF-8 to the top of your file. Then you can freely use ä, ǹ, ú and so on in your source code.
Try this gem. It converts Unicode or non-ASCII punctuation and symbols to the nearest ASCII punctuation and symbols:
https://github.com/qwuen/punctuate
example usage:
"100٪".punctuate
=> "100%"
the gem uses the reference in https://lexsrv3.nlm.nih.gov/LexSysGroup/Projects/lvg/current/docs/designDoc/UDF/unicode/DefaultTables/symbolTable.html for the conversion.
