Unicode being output literally instead of as Unicode - Go

I am creating an IRC bot in Go as a first project to get to grips with the language. One of the bot's functions is to grab data from the TVmaze API and display it in the channel.
I have imported an env package which allows the bot admin to define how the output is displayed.
For example: SHOWSTRING="#showname# - #status# - #network.name#"
I am trying to add support for IRC formatting, which is toggled with the control character \u0002, as in \u0002this is bold\u0002.
I have a function which generates the string that is returned and displayed in the channel:
func generateString(show Show) string {
	str := os.Getenv("SHOWSTRING")
	r := strings.NewReplacer(
		"#ID#", string(show.ID),
		"#showname#", show.Name,
		"#status#", show.Status,
		"#network.name#", show.Network.Name,
	)
	result := r.Replace(str)
	return result
}
From what I have read, I think I need to use the rune type instead of string and then convert the runes back into a string before output.
I am using the https://github.com/thoj/go-ircevent package for interacting with IRC.
Although I think that using rune is the correct way to go, I have tried a few things that have confused me.
If I add \u0002 to the SHOWSTRING in the env, it returns \u0002House\u0002 - Ended - Fox. I am doing this with con.Privmsg(roomName, tvmaze.ShowLookup("house")).
However, if I try con.Privmsg(roomName, "\u0002This should be bold\u0002"), it outputs bold text.
What is the best option here? If it is converting the string into runes and then back to a string, how do I go about doing that?

I needed to use strconv.Unquote() on the value returned by the function.
The new generateString function now outputs the correct string and looks like this:
func generateString(show Show) string {
	str := os.Getenv("SHOWSTRING")
	r := strings.NewReplacer(
		"#ID#", string(show.ID),
		"#showname#", show.Name,
		"#status#", show.Status,
		"#network.name#", show.Network.Name,
	)
	result := r.Replace(str)
	ret, err := strconv.Unquote(`"` + result + `"`)
	if err != nil {
		fmt.Println("Error unquoting the string")
	}
	return ret
}
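For anyone hitting the same thing: the reason this works is that the environment variable holds the six literal characters \u0002 rather than the control byte itself, and strconv.Unquote interprets the escape the same way the Go compiler interprets it in a source-code string literal. A minimal sketch (the sample show data is made up):

package main

import (
	"fmt"
	"strconv"
)

func main() {
	// The env var contains a literal backslash-u sequence, not the control byte.
	raw := `\u0002House\u0002 - Ended - Fox`

	// Wrapping the value in quotes and unquoting it interprets the \u0002
	// escapes, producing the single 0x02 control character IRC uses for bold.
	decoded, err := strconv.Unquote(`"` + raw + `"`)
	if err != nil {
		fmt.Println("unquote failed:", err)
		return
	}
	fmt.Printf("%q\n", decoded) // "\x02House\x02 - Ended - Fox"
}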

Related

How to convert the string representation of a Terraform set of strings to a slice of strings

I have a Terratest test where I get an output from Terraform like so: s := "[a b]". The Terraform output is value = toset([resource.name]), i.e. a set of strings.
fmt.Printf("%T", s) reports that it is a string. I need to iterate over it to perform further validation.
I tried the approach below, but it errors:
var v interface{}
if err := json.Unmarshal([]byte(s), &v); err != nil {
	fmt.Println(err)
}
My current implementation to convert to a slice is:
s := "[a b]"
s1 := strings.Fields(strings.Trim(s, "[]"))
for _, v := range s1 {
	fmt.Println("v -> " + v)
}
Looking for suggestions on my current approach, or alternative ways to convert to an array/slice that I should be considering. I appreciate any input. Thanks.
Actually, your current implementation seems just fine.
You can't use JSON unmarshaling because JSON strings must be enclosed in double quotes (").
Instead, strings.Fields does just that: it splits a string around one or more characters that match unicode.IsSpace, which are \t, \n, \v, \f, \r, and the space character.
Moreover, this also works if Terraform sends an empty set as [], as stated in the documentation:
returning [...] an empty slice if s contains only white space.
...which includes the case of s being empty "" altogether.
In case you need additional control over this, you can use strings.FieldsFunc, which accepts a function of type func(rune) bool, so you can decide for yourself what constitutes a "space". But since your input string comes from Terraform, I guess it's going to be well behaved enough.
There may be third-party packages that already implement this functionality, but unless your program already imports them, I think the native solution based on the standard library is always preferable.
unicode.IsSpace also includes the higher runes 0x85 and 0xA0; if the input contains any of those, strings.Fields falls back to calling FieldsFunc(s, unicode.IsSpace).
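For illustration, a minimal sketch of strings.FieldsFunc with a custom splitter; the separator set used here (space, brackets, comma) is just an assumption to show the shape of the call:

package main

import (
	"fmt"
	"strings"
)

func main() {
	s := "[a b]"
	// Treat spaces, brackets and commas as separators, so the surrounding
	// [ ] never needs a separate Trim step.
	fields := strings.FieldsFunc(s, func(r rune) bool {
		return r == ' ' || r == '[' || r == ']' || r == ','
	})
	fmt.Printf("%q\n", fields) // ["a" "b"]
}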
package main

import (
	"fmt"
	"strings"
)

func main() {
	src := "[a b]"
	dst := strings.Split(src[1:len(src)-1], " ")
	fmt.Println(dst)
}
https://play.golang.org/p/KVY4r_8RWv6
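One thing to keep in mind with this Split-based version: if Terraform sends an empty set [], then src[1:len(src)-1] is the empty string and strings.Split returns a one-element slice containing "", whereas strings.Fields returns an empty slice. The Fields approach therefore degrades more gracefully in that case.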

Getting EOF on 2nd prompt when using a file as Stdin (Golang)

I am trying to do functional testing of a CLI app, similar to this approach.
Since the command asks for a few inputs at the prompt, I am putting them in a file and setting it as the command's stdin.
cmd := exec.Command(path.Join(dir, binaryName), "myArg")
tmpfile := setStdin("TheMasterPassword\nSecondAnswer\n12121212\n")
cmd.Stdin = tmpfile
output, err := cmd.CombinedOutput()
setStdin just creates a temp file, writes the string to it, and returns the *os.File.
Now, I am expecting TheMasterPassword to be the first input, and that works. But for the second input I always get Critical Error: EOF.
The function I am using to ask for and read user input is this:
func Ask(question string, minLen int) string {
	reader := bufio.NewReader(os.Stdin)
	for {
		fmt.Printf("%s: ", question)
		response, err := reader.ReadString('\n')
		ExitIfError(err)
		if len(response) >= minLen {
			return strings.TrimSpace(response)
		} else {
			fmt.Printf("Provide at least %d character.\n", minLen)
		}
	}
}
Can you please help me to find out what's going wrong?
Thanks a lot!
Adding setStdin as requested
func setStdin(userInput string) *os.File {
	tmpfile, err := ioutil.TempFile("", "test_stdin_")
	util.ExitIfError(err)
	_, err = tmpfile.Write([]byte(userInput))
	util.ExitIfError(err)
	_, err = tmpfile.Seek(0, 0)
	util.ExitIfError(err)
	return tmpfile
}
It pretty much looks like your app calls Ask() whenever it wants a single input line.
Inside Ask() you create a new bufio.Reader to read from os.Stdin. Know that bufio.Reader, as its name suggests, uses buffered reading, meaning it may read more data from its source than what is returned by its methods (Reader.ReadString() in this case). So if you only use it to read one (or a few) lines and then throw away the reader, you also throw away the buffered, unread data.
So next time you call Ask() again, attempting to read from os.Stdin, you will not continue from where you left off...
To fix this issue, create only a single bufio.Reader from os.Stdin (store it in a global variable, for example) and always use that one reader inside Ask(). That way buffered, unread data is not lost between Ask() calls. Of course this is not safe to call from multiple goroutines, but neither is reading from a single os.Stdin.
For example:
var reader = bufio.NewReader(os.Stdin)

func Ask(question string, minLen int) string {
	// use the global reader here...
}
Also note that using bufio.Scanner would be easier in your case. But again, bufio.Scanner may also read more data from its source than needed, so you have to use a single, shared bufio.Scanner here too. Also note that Reader.ReadString() returns a string containing the delimiter (a line ending in \n in your case), which you probably have to trim, while Scanner.Text() (with the default line-splitting function) strips that before returning the line. That is another simplification you can take advantage of.
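A minimal sketch of what the shared-Scanner variant could look like (the error handling and the prompts here are assumptions, since the original ExitIfError helper isn't shown):

package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// A single shared scanner, so buffered input is not lost between calls.
var scanner = bufio.NewScanner(os.Stdin)

// Ask prompts with question and returns a line of at least minLen characters.
func Ask(question string, minLen int) string {
	for {
		fmt.Printf("%s: ", question)
		if !scanner.Scan() {
			// EOF or a read error: nothing more can be read.
			fmt.Println("no more input:", scanner.Err())
			os.Exit(1)
		}
		response := strings.TrimSpace(scanner.Text())
		if len(response) >= minLen {
			return response
		}
		fmt.Printf("Provide at least %d characters.\n", minLen)
	}
}

func main() {
	first := Ask("First answer", 1)
	second := Ask("Second answer", 1)
	fmt.Println(first, second)
}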

Scandinavian characters not working in go-lang go-instagram API bindings

Hi, I'm trying to wrap my head around what seems to be a problem with multibyte support in this open source library (https://github.com/carbocation/go-instagram/). I am using the code below to retrieve information about the Swedish tag for "blue". However, I get an empty array when trying.
fmt.Println("Starting instagram download.")
client := instagram.NewClient(nil)
client.ClientID = "myid"
media, _, _ := client.Tags.RecentMedia("blå", nil)
fmt.Println(media)
I have tried using the API through the browser and there are several pictures tagged with the tag. I have also tried the code snippet with English tags like blue, and that returns the latest pictures as well. I would be glad if anyone could explain why this might happen. I'd like to update the lib so it supports multibyte tags, but I haven't got the Go knowledge required. Is this a Go problem or a problem with the library?
Thank you
The problem is in validTagName():
// Strip out things we know Instagram won't accept. For example, hyphens.
func validTagName(tagName string) (bool, error) {
	//\W matches any non-word character
	reg, err := regexp.Compile(`\W`)
	if err != nil {
		return false, err
	}
	if reg.MatchString(tagName) {
		return false, nil
	}
	return true, nil
}
In Go, \W matches precisely [^0-9A-Za-z_], so a letter like å counts as a non-word character and the tag is rejected before any request is even made. This validation check is incorrect.
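As a sketch of a fix (not the library's actual code), a Unicode-aware version of the check could use the \p{L} and \p{N} classes instead:

package main

import (
	"fmt"
	"regexp"
)

// validTagName reports whether tagName contains only letters, digits or
// underscores in any script. Hypothetical replacement, for illustration only.
func validTagName(tagName string) (bool, error) {
	// \p{L} is any Unicode letter, \p{N} is any Unicode number.
	reg, err := regexp.Compile(`[^\p{L}\p{N}_]`)
	if err != nil {
		return false, err
	}
	return !reg.MatchString(tagName), nil
}

func main() {
	for _, tag := range []string{"blå", "blue", "not-valid"} {
		ok, _ := validTagName(tag)
		fmt.Println(tag, ok) // blå true, blue true, not-valid false
	}
}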

How to stop json.Marshal from escaping < and >?

package main

import (
	"encoding/json"
	"fmt"
)

type Track struct {
	XmlRequest string `json:"xmlRequest"`
}

func main() {
	message := new(Track)
	message.XmlRequest = "<car><mirror>XML</mirror></car>"
	fmt.Println("Before Marshal", message)
	messageJSON, _ := json.Marshal(message)
	fmt.Println("After marshal", string(messageJSON))
}
Is it possible to make json.Marshal not escape < and >? I currently get:
{"xmlRequest":"\u003ccar\u003e\u003cmirror\u003eXML\u003c/mirror\u003e\u003c/car\u003e"}
but I am looking for something like this:
{"xmlRequest":"<car><mirror>XML</mirror></car>"}
As of Go 1.7, you still cannot do this with json.Marshal(). The source code for json.Marshal shows:
> err := e.marshal(v, encOpts{escapeHTML: true})
The reason json.Marshal always does this is:
String values encode as JSON strings coerced to valid UTF-8,
replacing invalid bytes with the Unicode replacement rune.
The angle brackets "<" and ">" are escaped to "\u003c" and "\u003e"
to keep some browsers from misinterpreting JSON output as HTML.
Ampersand "&" is also escaped to "\u0026" for the same reason.
This means you cannot even do it by writing a custom func (t *Track) MarshalJSON(); you have to use something that does not satisfy the json.Marshaler interface.
So the workaround is to write your own function:
func (t *Track) JSON() ([]byte, error) {
	buffer := &bytes.Buffer{}
	encoder := json.NewEncoder(buffer)
	encoder.SetEscapeHTML(false)
	err := encoder.Encode(t)
	return buffer.Bytes(), err
}
https://play.golang.org/p/FAH-XS-QMC
If you want a generic solution for any struct, you could do:
func JSONMarshal(t interface{}) ([]byte, error) {
	buffer := &bytes.Buffer{}
	encoder := json.NewEncoder(buffer)
	encoder.SetEscapeHTML(false)
	err := encoder.Encode(t)
	return buffer.Bytes(), err
}
https://play.golang.org/p/bdqv3TUGr3
In Go 1.7 they added a new option to fix this:
encoding/json: add Encoder.DisableHTMLEscaping. This provides a way to disable the escaping of <, >, and & in JSON strings.
The relevant function is
func (*Encoder) SetEscapeHTML(on bool)
which should be applied to an Encoder:
enc := json.NewEncoder(os.Stdout)
enc.SetEscapeHTML(false)
Simple example: https://play.golang.org/p/SJM3KLkYW-
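Putting that together with the Track struct from the question, a minimal sketch (Go 1.7 or later):

package main

import (
	"encoding/json"
	"os"
)

type Track struct {
	XmlRequest string `json:"xmlRequest"`
}

func main() {
	message := &Track{XmlRequest: "<car><mirror>XML</mirror></car>"}

	// Encode straight to stdout with HTML escaping turned off.
	enc := json.NewEncoder(os.Stdout)
	enc.SetEscapeHTML(false)
	if err := enc.Encode(message); err != nil {
		panic(err)
	}
	// Prints: {"xmlRequest":"<car><mirror>XML</mirror></car>"}
}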
This doesn't answer the question directly, but it could be an answer if you're looking for a way to deal with json.Marshal escaping < and >...
Another way to solve the problem is to replace the escaped characters in the json.RawMessage with the actual UTF-8 characters after the json.Marshal() call.
This also works for characters other than < and >. (I used to do this to make non-English letters human-readable in JSON :D)
func _UnescapeUnicodeCharactersInJSON(_jsonRaw json.RawMessage) (json.RawMessage, error) {
	str, err := strconv.Unquote(strings.Replace(strconv.Quote(string(_jsonRaw)), `\\u`, `\u`, -1))
	if err != nil {
		return nil, err
	}
	return []byte(str), nil
}
func main() {
	// Both are valid JSON.
	var jsonRawEscaped json.RawMessage   // JSON raw with escaped unicode chars
	var jsonRawUnescaped json.RawMessage // JSON raw with unescaped unicode chars

	// '\u263a' == '☺'
	jsonRawEscaped = []byte(`{"HelloWorld": "\uC548\uB155, \uC138\uC0C1(\u4E16\u4E0A). \u263a"}`) // "\\u263a"
	jsonRawUnescaped, _ = _UnescapeUnicodeCharactersInJSON(jsonRawEscaped)                        // "☺"

	fmt.Println(string(jsonRawEscaped))   // {"HelloWorld": "\uC548\uB155, \uC138\uC0C1(\u4E16\u4E0A). \u263a"}
	fmt.Println(string(jsonRawUnescaped)) // {"HelloWorld": "안녕, 세상(世上). ☺"}
}
https://play.golang.org/p/pUsrzrrcDG-
I hope this helps someone.
Here's my workaround:
// Marshal is a UTF-8 friendly marshaler. Go's json.Marshal is not UTF-8
// friendly because it replaces the valid UTF-8 and JSON characters "&", "<",
// ">" with the "slash u" unicode escaped forms (e.g. \u0026). It preemptively
// escapes for HTML friendliness. Where text may include any of these
// characters, json.Marshal should not be used. Playground of Go breaking a
// title: https://play.golang.org/p/o2hiX0c62oN
func Marshal(i interface{}) ([]byte, error) {
	buffer := &bytes.Buffer{}
	encoder := json.NewEncoder(buffer)
	encoder.SetEscapeHTML(false)
	err := encoder.Encode(i)
	return bytes.TrimRight(buffer.Bytes(), "\n"), err
}
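The bytes.TrimRight call is there because Encoder.Encode writes a trailing newline after each value, which plain json.Marshal does not.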
No, you can't.
A third-party JSON package might be the choice rather than the standard json library.
More detail: https://github.com/golang/go/issues/8592
I had a requirement to store XML inside JSON :puke:
At first I was having significant difficulty unmarshalling that XML after passing it through JSON, but my issue was actually due to trying to unmarshal the XML string as a json.RawMessage. I actually needed to unmarshal it as a string and then coerce it into []byte for xml.Unmarshal.
type xmlInJson struct {
	Data string `json:"data"`
}

var response xmlInJson
err := json.Unmarshal(xmlJsonData, &response)

var xmlData someOtherStructThatMatchesTheXmlFormat
err = xml.Unmarshal([]byte(response.Data), &xmlData)
A custom function is not really the best solution. How about using another library to solve this?
I use gabs.
Install it with:
go get github.com/Jeffail/gabs
and use it like this:
message := new(Track)
resultJson, _ := gabs.Consume(message)
fmt.Println(string(resultJson.EncodeJSON()))
I solved the problem like this.

Is there a way of cleaning up this Go code?

I am just beginning to learn Go, and have made a function which parses markdown files with a header containing some metadata (the files are blog posts).
Here is an example:
---
Some title goes here
19 September 2012
---
This is some content, read it.
I've written this function, which works, but I feel it's quite verbose and messy. I've had a look at the various strings packages, but I don't know enough about Go and its best practices to know what I should be doing differently. If I could get some tips to clean this up, I would appreciate it. (Also, I know that I shouldn't be neglecting that error.)
type Post struct {
	Title string
	Date  string
	Body  string
}

func loadPost(title string) *Post {
	filename := title + ".md"
	file, _ := ioutil.ReadFile("posts/" + filename)
	fileString := string(file)
	str := strings.Split(fileString, "---")
	meta := strings.Split(str[1], "\n")
	title = meta[1]
	date := meta[2]
	body := str[2]
	return &Post{Title: title, Date: date, Body: body}
}
I think it's not bad. A couple of suggestions:
The hard-coded slash in "posts/" is platform-dependent. You can use path/filepath.Join to avoid that.
There is bytes.Split, so you don't need the string(file).
You can create the Post without repeating the fields: &Post{title, date, body}
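A small sketch of the first two suggestions together (portable path joining, and splitting the raw bytes without converting the whole file to a string first); the file name here is just an example, and the posts directory comes from the question:

package main

import (
	"bytes"
	"fmt"
	"io/ioutil"
	"path/filepath"
)

func main() {
	// filepath.Join uses the right separator on every platform.
	filename := filepath.Join("posts", "my-first-post.md")

	file, err := ioutil.ReadFile(filename)
	if err != nil {
		fmt.Println("read failed:", err)
		return
	}

	// bytes.Split works on the raw file contents, so string(file) is not needed.
	parts := bytes.Split(file, []byte("---"))
	fmt.Println("number of parts:", len(parts))
}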
Alternatively, you could find out where the body starts with LastIndex(s, "--") and use that to index the file contents accordingly. This avoids the allocation of using Split.
const sep = "--"

func loadPost(content string) *Post {
	sepLength := len(sep)
	i := strings.LastIndex(content, sep)
	headers := content[sepLength:i]
	body := content[i+sepLength+1:]
	meta := strings.Split(headers, "\n")
	return &Post{meta[1], meta[2], body}
}
I agree that it's not bad. I'll add a couple of other ideas.
As Thomas showed, you don't need the intermediate variables title, date, and body. Try, though,
return &Post{
	Title: meta[1],
	Date:  meta[2],
	Body:  body,
}
It's true that you can leave the field names out, but I sometimes like them to keep the code self-documenting. (I think go vet likes them too.)
I fuss over strings versus byte slices, but probably more than I should. Since you're reading the file in one gulp, you probably don't need to worry about this. Converting everything to one big string and then slicing up the string is a handy way of doing things, just remember that you're pinning the entire string in memory if you keep any part of it. If your files are large or you have lots of them and you only end up keeping, say, the meta for most of them, this might not be the way to go.
There's just one blog entry per file? If so, I think I'll propose a variant of Thomas's suggestion. Verify the first bytes are --- (or your file is corrupt), then use strings.Index(fileString[3:], "---"). Split is more appropriate when you have an unknown number of segments. In your case you're just looking for that single separator after the meta. Index will find it after searching the meta and be done, without searching through the whole body. (And anyway, what if the body contained the string "---"?)
Finally, some people would use regular expressions for this. I still haven't warmed up to regular expressions, but anyway, it's another approach.
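A minimal sketch of that Index-based variant (the error messages and the TrimSpace calls are my own additions, not part of the original suggestion):

package main

import (
	"fmt"
	"strings"
)

type Post struct {
	Title string
	Date  string
	Body  string
}

const sep = "---"

// loadPost parses a post from the raw file contents, using a single
// Index call to find the separator that closes the header.
func loadPost(content string) (*Post, error) {
	if !strings.HasPrefix(content, sep) {
		return nil, fmt.Errorf("post does not start with %q", sep)
	}
	rest := content[len(sep):]
	i := strings.Index(rest, sep)
	if i < 0 {
		return nil, fmt.Errorf("header not terminated with %q", sep)
	}
	meta := strings.Split(strings.TrimSpace(rest[:i]), "\n")
	if len(meta) < 2 {
		return nil, fmt.Errorf("header is missing the title or date")
	}
	body := strings.TrimSpace(rest[i+len(sep):])
	return &Post{Title: meta[0], Date: meta[1], Body: body}, nil
}

func main() {
	post, err := loadPost("---\nSome title goes here\n19 September 2012\n---\nThis is some content, read it.")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", post)
}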
Sonia has some great suggestions. Below is my take which accounts for problems you might encounter when parsing the header.
http://play.golang.org/p/w-XYyhPj9n
package main

import (
	"fmt"
	"strings"
)

const sep = "---"

type parseError struct {
	msg string
}

func (e *parseError) Error() string {
	return e.msg
}

func parse(s string) (header []string, content string, err error) {
	if !strings.HasPrefix(s, sep) {
		return header, content, &parseError{"content does not start with `---`!"}
	}
	arr := strings.SplitN(s, sep, 3)
	if len(arr) < 3 {
		return header, content, &parseError{"header was not terminated with `---`!"}
	}
	header = strings.Split(strings.TrimSpace(arr[1]), "\n")
	content = strings.TrimSpace(arr[2])
	return header, content, nil
}

func main() {
	//
	f := `---
Some title goes here
19 September 2012
---
This is some content, read it. --Anonymous`
	header, content, err := parse(f)
	if err != nil {
		panic(err)
	}
	for i, val := range header {
		fmt.Println(i, val)
	}
	fmt.Println("---")
	fmt.Println(content)
	//
	f = `---
Some title goes here
19 September 2012
This is some content, read it.`
	_, _, err = parse(f)
	fmt.Println("Error:", err)
	//
	f = `
Some title goes here
19 September 2012
---
This is some content, read it.`
	_, _, err = parse(f)
	fmt.Println("Error:", err)
}
