Is there a more efficient way to handle string escaping in this function? - go

I'm migrating some existing code from another language. The following function is more or less a 1-1 migration, but since the language is new to me, I'd like to know if there are better or more efficient ways to build the escaped string:
func influxEscape(str string) string {
var chars = map[string]bool{
"\\": true,
"\"": true,
",": true,
"=": true,
" ": true,
}
var escapeStr = ""
for i := 0; i < len(str); i++ {
var char = string(str[i])
if chars[char] == true {
escapeStr += "\\" + char
} else {
escapeStr += char
}
}
return escapeStr
}
This code performs escaping to make string values compatible with the InfluxDB line protocol.

This should be a comment, but it needs too much room for that.
One more thing to consider—which I mentioned in a comment on Burak Serdar's answer—is what happens when your input string is not valid UTF-8.
Remember that a Go string is a byte sequence. It need not be valid Unicode. It may be intended to represent valid Unicode, or it may not. For instance, it could be ISO-Latin-1 or something else that might not play well with UTF-8.
If it is non-UTF-8, using a range loop on it will translate each invalid sequence to the invalid rune. (See the linked Go blog post.) If it is intended to be valid UTF-8, this may be a plus, and of course, you can check for the resulting RuneError.
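Your original loop works byte by byte and never escapes anything above ASCII DEL (127 or 0x7f). Note, though, that string(str[i]) converts the byte to the UTF-8 encoding of the code point with that value, so each byte above 0x7f is re-encoded as two bytes rather than copied unchanged. If the bytes in the string are something like ISO-Latin-1, this may be the correct behavior. If not, you may be passing invalid, unsanitized input to this other program. If you are deliberately sanitizing input, you must find out what kind of input it expects and do a complete job of sanitizing it.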
(I still have scars from being forced to cope with a really poor XML encoder coupled to an old database from some number of jobs ago, so I tend to be extra-cautious here.)

This should be somewhat equivalent to your code:
out := bytes.Buffer{}
for _, x := range str {
	if strings.IndexRune(`\",= `, x) != -1 {
		out.WriteRune('\\')
	}
	out.WriteRune(x)
}
return out.String()
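Another option, not benchmarked here, is to let strings.NewReplacer do the work. The sketch below assumes the same five characters as the question; the names influxReplacer and escapeWithReplacer are invented for this example. A Replacer is safe for concurrent use, so it can be built once at package level.

```go
package main

import (
	"fmt"
	"strings"
)

// influxReplacer prefixes each special character with a backslash.
// The pairs are (old, new).
var influxReplacer = strings.NewReplacer(
	`\`, `\\`,
	`"`, `\"`,
	`,`, `\,`,
	`=`, `\=`,
	` `, `\ `,
)

// escapeWithReplacer is a hypothetical drop-in for influxEscape.
func escapeWithReplacer(s string) string {
	return influxReplacer.Replace(s)
}

func main() {
	fmt.Println(escapeWithReplacer(`cpu load,host=a "b"\c`))
	// prints: cpu\ load\,host\=a\ \"b\"\\c
}
```

Because all of the characters to escape are single ASCII bytes, bytes above 0x7f pass through unchanged, so multi-byte UTF-8 sequences stay intact.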

Related

Normalizing text input to ASCII

I am building a small tool which parses a user's input and finds common pitfalls in writing and flags them so the user can improve their text. So far everything works well except for text that has curly quotes instead of normal ASCII straight quotes. I have a hack now which does a string replacement for opening and closing single curly quotes and opening and closing double curly quotes, like so:
cleanedData := bytes.Replace([]byte(data), []byte("’"), []byte("'"), -1)
I feel like there must be a better way to handle this in the stdlib so I can also convert other non-ascii characters to an ascii equivalent. Any help would be greatly appreciated.
The strings.Map function looks to me like what you want.
I don't know of a generic 'ToAscii' type function, but Map has a nice approach for mapping runes to other runes.
Example (updated):
package main

import (
	"fmt"
	"strings"
)

func main() {
	data := "Hello “Frank” or ‹François› as you like to be ‘called’"
	fmt.Printf("Original: %s\n", data)
	cleanedData := strings.Map(normalize, data)
	fmt.Printf("Cleaned: %s\n", cleanedData)
}

// normalize maps curly and angle quotes to their ASCII equivalents
// and returns every other rune unchanged.
func normalize(in rune) rune {
	switch in {
	case '“', '‹', '”', '›':
		return '"'
	case '‘', '’':
		return '\''
	}
	return in
}
Output:
Original: Hello “Frank” or ‹François› as you like to be ‘called’
Cleaned: Hello "Frank" or "François" as you like to be 'called'

Differences between strings.Contains and strings.ContainsAny in Golang

In the source code:
// Contains returns true if substr is within s.
func Contains(s, substr string) bool {
return Index(s, substr) >= 0
}
// ContainsAny returns true if any Unicode code points in chars are within s.
func ContainsAny(s, chars string) bool {
return IndexAny(s, chars) >= 0
}
the only difference seems to be substr versus the Unicode code points in chars. I wrote some tests to compare them, and their behavior seems to be identical. I don't understand when to use which.
I think the two functions are totally different. Contains is used to detect whether a string contains a substring. ContainsAny is used to detect whether a string contains any of the characters in the provided string.
The Contains function reports whether a substring is within the string, whereas the ContainsAny function reports whether any Unicode code points in chars are within the string. Look at the documentation.
func main() {
fmt.Println(strings.Contains("seafood", "aes"))
fmt.Println(strings.ContainsAny("seafood", "aes"))
fmt.Println(strings.Contains("iiii", "ui"))
fmt.Println(strings.ContainsAny("iiii", "ui"))
}
The output is:
false
true
false
true
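A few more inputs make the difference easier to see, including the empty string, where the two functions disagree. This is only an illustrative sketch; the helper compare is not part of the strings package.

```go
package main

import (
	"fmt"
	"strings"
)

// compare is a hypothetical helper that runs both functions on the same input.
func compare(s, arg string) (asSubstring, asCharSet bool) {
	return strings.Contains(s, arg), strings.ContainsAny(s, arg)
}

func main() {
	for _, arg := range []string{"foo", "oof", "xyz", ""} {
		sub, anyOf := compare("seafood", arg)
		fmt.Printf("%q: Contains=%t ContainsAny=%t\n", arg, sub, anyOf)
	}
}
```

"foo" matches both ways, "oof" matches only as a set of characters, "xyz" matches neither, and the empty string counts as a substring of everything but contains no characters to look for.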

How do I do cursor-up in Go?

How do I do “cursor-up” in Go? (Clear-to-end-of-line would also be good to know). (All platforms).
To elaborate and show the context, I’m writing a test program in Go that requires the input of some parameters (via console) that are stored in a text file and used as defaults for the next usage. I want to have some very rudimentary console “editing” features.
Currently it is fairly primitive because I don’t want to go deeply into console editing, I just want something fairly basic but also not too basic.
In the example below from my test program, the String variable “sPrompt” contains the prompt for the input, and to the right it shows the default and then there are backspace characters to position the cursor so that the default is not overwritten – like I said, very basic.
When the operator enters the input and it is invalid, I'd like to display an error message. In either case I then want to move the cursor up to the line just displayed/entered and, if there was an error, redisplay the original line, or, if the input was correct, display just the prompt and the new parameter.
I did read somewhere that ReadLine() should be avoided, but it appears to do the job.
Example:
func fInputString(sPrompt string, asValid []string, sDefault string) (sInput string, tEnd bool) {
	oBufReader := bufio.NewReader(os.Stdin)
	for {
		print("\n" + sPrompt)
		vLine, _, _ := oBufReader.ReadLine()
		sInput = strings.ToLower(string(vLine))
		if sInput == "end" {
			return "", true
		}
		if sInput == "" {
			sInput = sDefault
		}
		// check for valid input //
		for _, sVal := range asValid {
			if sInput == sVal {
				return sInput, false
			}
		}
	}
}
This is how sPrompt is constructed (not meant to be optimized):
if sDefault != "" {
for len(sPrompt) < 67 {
sPrompt += " "
}
sPrompt += sDefault
for iBack := 20 + len(sDefault); iBack > 0; iBack-- {
sPrompt += "\b"
}
}
You can use the control strings that tput emits to move the cursor around the screen:
tput sc to save the cursor position
tput rc to restore the cursor position
Then use these strings in a Go program:
package main
import (
"fmt"
"time"
)
const sc = "\u001B7"
const rc = "\u001B8"
func main() {
fmt.Print(sc)
for i := 1; i <= 10; i++ {
fmt.Print(rc + sc)
fmt.Println(i, "one")
fmt.Println(i, "two")
fmt.Println(i, "three")
fmt.Println(i, "four")
fmt.Println(i, "five")
time.Sleep(time.Second)
}
}
Should work in most terminals.
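For the literal "cursor-up" and "clear-to-end-of-line" operations, the standard ANSI/VT100 sequences are ESC [ n A and ESC [ K. The sketch below assumes a terminal that interprets those sequences (some Windows consoles may not); the names cursorUp and clearToEOL and the sample prompt text are made up for this example.

```go
package main

import (
	"fmt"
	"strconv"
)

// clearToEOL erases from the cursor to the end of the current line.
const clearToEOL = "\x1b[K"

// cursorUp returns the sequence that moves the cursor up n lines.
func cursorUp(n int) string {
	return "\x1b[" + strconv.Itoa(n) + "A"
}

func main() {
	fmt.Println("Width [80]: abc")          // the line the operator typed
	fmt.Println("error: width must be a number")
	// Go back up two lines, return to column 0, wipe the line and redraw it.
	fmt.Print(cursorUp(2), "\r", clearToEOL, "Width [80]: \n")
	// The cursor is now on the old error line; erase it as well.
	fmt.Print(clearToEOL)
}
```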
Make your own little shell
Rather than reinventing the wheel, use a library that does exactly what you want.
A popular option is the readline library, which is apparently available for Windows as well. It is used, for example, by bash (zsh ships its own line editor, ZLE). There are some Go wrappers for it:
https://github.com/shavac/readline
https://github.com/bobappleyard/readline
I personally would recommend bobappleyard/readline, as it is better documented and has a nicer API (less C-like). There does not seem to be a special build tag for Windows, so you might have to write one yourself, but that should not be that hard.
termbox and its (pure) Go implementation termbox-go, which @bgp already pointed out, do not seem well suited to simply reading a line, as they are intended more for full-screen console applications. You would also have to code the up/down matching and history yourself.
.ReadLine()
The documentation is right to say that you should not use this, as it does not handle anything for you. For example, when reading from a stream that emits partial data, you have no guarantee that you will get a full line from ReadLine. Use ReadSlice('\n') for that.

How to ignore fields with sscanf (%* is rejected)

I wish to ignore a particular field whilst processing a string with sscanf.
Man page for sscanf says
An optional '*' assignment-suppression character: scanf() reads input as directed by the conversion specification, but discards the input. No corresponding pointer argument is required, and this specification is not included in the count of successful assignments returned by scanf().
Attempting to use this in Golang, to ignore the 3rd field:
if c, err := fmt.Sscanf(str, " %s %d %*d %d ", &iface.Name, &iface.BTx, &iface.BytesRx); err != nil || c != 3 {
compiles OK, but at runtime err is set to:
bad verb %* for integer
Golang doco doesn't specifically mention the %* conversion specification, but it does say,
Package fmt implements formatted I/O with functions analogous to C's printf and scanf.
It doesn't indicate that %* is not implemented, so... Am I doing it wrong? Or has it just been quietly omitted? ...but then, why does it compile?
To the best of my knowledge there is no such verb (as the format specifiers are called in the fmt package) for this task. What you can do, however, is specify some verb and ignore its value. This is not particularly memory friendly, though. Ideally this would work:
fmt.Scan(&a, _, &b)
Sadly, it doesn't. So your next best option is to declare the variables and ignore the one you don't want:
var a, b, c int
fmt.Scanf("%d %v %d", &a, &b, &c)
fmt.Println(a, c)
%v reads a space-separated token. Depending on what you're scanning, you may be able to fast-forward the stream to the position you need. See this answer for details on seeking in buffers. If you're reading from stdin, or you don't know how long your input may be, you seem to be out of luck here.
It doesn't indicate that %* is not implemented, so... Am I doing it
wrong? Or has it just been quietly omitted? ...but then, why does it
compile?
It compiles because, for the compiler, a format string is just a string like any other. The content of that string is evaluated at run time by functions of the fmt package. Some C compilers may check format strings for correctness, but this is a feature, not the norm. With Go, the go vet command will try to warn you about format string errors with mismatched arguments.
Edit:
For the special case of needing to parse a row of integers and just caring for some of them, you
can use fmt.Scan in combination with a slice of integers. The following example reads 3 integers
from stdin and stores them in the slice named vals:
ints := make([]interface{}, 3)
vals := make([]int, len(ints))
for i := range ints {
	ints[i] = interface{}(&vals[i])
}
fmt.Scan(ints...)
fmt.Println(vals)
This is probably shorter than the conventional split/trim/strconv chain. It makes a slice of pointers
which each points to a value in vals. fmt.Scan then fills these pointers. With this you can even
ignore most of the values by assigning the same pointer over and over for the values you don't want:
ignored := 0
for i := range ints {
	if i == 0 || i == 2 {
		ints[i] = interface{}(&vals[i])
	} else {
		ints[i] = interface{}(&ignored)
	}
}
The example above assigns the address of ignored to every element except the first and the third, so the unwanted values all overwrite the same variable and are effectively ignored.
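Applied to the original question, the same idea can be sketched with Sscanf and a throwaway variable for the third field. The struct ifaceStats, its field names, and the sample line are assumptions made for this example, not the asker's actual types.

```go
package main

import "fmt"

// ifaceStats is a hypothetical stand-in for the question's iface value.
type ifaceStats struct {
	Name    string
	BytesTx int
	BytesRx int
}

// parseStats scans four fields and discards the third.
func parseStats(line string) (ifaceStats, error) {
	var s ifaceStats
	var skipped int // receives the unwanted field and is never used
	n, err := fmt.Sscanf(line, "%s %d %d %d", &s.Name, &s.BytesTx, &skipped, &s.BytesRx)
	if err != nil {
		return s, err
	}
	if n != 4 {
		return s, fmt.Errorf("parsed %d fields, want 4", n)
	}
	return s, nil
}

func main() {
	s, err := parseStats("eth0 1024 7 2048")
	fmt.Println(s.Name, s.BytesTx, s.BytesRx, err)
}
```

Note that the count returned by Sscanf includes the discarded field, so the check is against 4 rather than 3.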

How to be definite about the number of whitespace fmt.Fscanf consumes?

I am trying to implement a PPM decoder in Go. PPM is an image format that consists of a plaintext header and then some binary image data. The header looks like this (from the spec):
Each PPM image consists of the following:
A "magic number" for identifying the file type. A ppm image's magic number is the two characters "P6".
Whitespace (blanks, TABs, CRs, LFs).
A width, formatted as ASCII characters in decimal.
Whitespace.
A height, again in ASCII decimal.
Whitespace.
The maximum color value (Maxval), again in ASCII decimal. Must be less than 65536 and more than zero.
A single whitespace character (usually a newline).
I try to decode this header with the fmt.Fscanf function. The following call to fmt.Fscanf parses the header (not addressing the caveat explained below):
var magic string
var width, height, maxVal uint
fmt.Fscanf(input,"%2s %d %d %d",&magic,&width,&height,&maxVal)
The documentation of fmt states:
Note: Fscan etc. can read one character (rune) past the input they
return, which means that a loop calling a scan routine may skip some
of the input. This is usually a problem only when there is no space
between input values. If the reader provided to Fscan implements
ReadRune, that method will be used to read characters. If the reader
also implements UnreadRune, that method will be used to save the
character and successive calls will not lose data. To attach ReadRune
and UnreadRune methods to a reader without that capability, use
bufio.NewReader.
As the very next character after the final whitespace is already the beginning of the image data, I have to be certain about how much whitespace fmt.Fscanf consumed after reading MaxVal. My code must work on whatever reader was provided by the caller, and parts of it must not read past the end of the header. Wrapping the input in a buffered reader is therefore not an option; the buffered reader might read more from the input than I actually want to read.
Some testing suggests that parsing a dummy character at the end solves the issues:
var magic string
var width, height, maxVal uint
var dummy byte
fmt.Fscanf(input,"%2s %d %d %d%c",&magic,&width,&height,&maxVal,&dummy)
Is that guaranteed to work according to the specification?
No, I would not consider that safe. While it works now, the documentation states that the function reserves the right to read past the value by one character unless you have an UnreadRune() method.
By wrapping your reader in a bufio.Reader, you can ensure the reader has an UnreadRune() method. You will then need to read the final whitespace yourself.
buf := bufio.NewReader(input)
fmt.Fscanf(buf,"%2s %d %d %d",&magic,&width,&height,&maxVal)
buf.ReadRune() // remove next rune (the whitespace) from the buffer.
Edit:
As we discussed in the chat, you can assume the dummy char method works and then write a test so you know when it stops working. The test can be something like:
func TestFmtBehavior(t *testing.T) {
	// Use MultiReader to prevent r from implementing io.RuneScanner.
	r := io.MultiReader(bytes.NewReader([]byte("data x")))
	n, err := fmt.Fscanf(r, "%s%c", new(string), new(byte))
	if n != 2 || err != nil {
		t.Error("failed scan", n, err)
	}
	// The dummy char read 1 extra char (the space) past "data",
	// so exactly one byte ("x") should still remain.
	if n, err := r.Read(make([]byte, 5)); n != 1 {
		t.Error("assertion failed", n, err)
	}
}
