Escape unicode characters in json encode golang

Given the following example:
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

func main() {
	buf := new(bytes.Buffer)
	enc := json.NewEncoder(buf)
	toEncode := []string{"hello", "wörld"}
	enc.Encode(toEncode)
	fmt.Println(buf.String())
}
I would like to have the output presented with escaped Unicode characters:
["hello","w\u00f6rld"]
Rather than:
["hello","wörld"]
I have attempted to write a function that quotes the Unicode characters using strconv.QuoteToASCII and to feed the results to Encode(). However, that results in double escaping, because the encoder then escapes the backslash that QuoteToASCII added:
func quotedUnicode(data []string) []string {
for index, element := range data {
quotedUnicode := strconv.QuoteToASCII(element)
// get rid of additional quotes
quotedUnicode = strings.TrimSuffix(quotedUnicode, "\"")
quotedUnicode = strings.TrimPrefix(quotedUnicode, "\"")
data[index] = quotedUnicode
}
return data
}
Encoding the slice returned by quotedUnicode produces:
["hello","w\\u00f6rld"]
How can I ensure that the output from json.Encode contains correctly escaped Unicode characters?

Related

How to split a string by a delimiter in Golang

I want to split a string by a delimiter, and the result should be a slice.
For example:
The given string might be like this:
foo := "foo=1&bar=2&&a=8"
The result should be like
result := []string{
	"foo=1",
	"bar=2&",
	"a=8",
}
I can use strings.Split(foo, "&") to split it, but the result does not meet my requirements:
// the result with strings.Split()
result := []string{
	"foo=1",
	"bar=2",
	"",
	"a=8",
}
Why not use url.ParseQuery? If you need a special character like & (which is the delimiter) not to be treated as such, it needs to be escaped (%26):
//m, err := url.ParseQuery("foo=1&bar=2&&a=8")
m, err := url.ParseQuery("foo=1&bar=2%26&a=8") // escape bar's & as %26
fmt.Println(m) // map[a:[8] bar:[2&] foo:[1]]
https://play.golang.org/p/zG3NEL70HxE
You would typically URL encode parameters like so:
q := url.Values{}
q.Add("foo", "1")
q.Add("bar", "2&")
q.Add("a", "8")
fmt.Println(q.Encode()) // a=8&bar=2%26&foo=1
https://play.golang.org/p/hSdiLtgVj7m

Convert one unicode string to array

How can I convert this string into an array or slice in Golang?
The separators are the Unicode characters \u001e and \u001d.
inputString := "\u001e456\u001dBernard Janv\u001d0022000\u001d250\u001d804\u001d1169\u001d\u001d168"
Use strings.FieldsFunc to split the string on either separator:
inputString := "\u001e456\u001dBernard Janv\u001d0022000\u001d250\u001d804\u001d1169\u001d\u001d168"
parts := strings.FieldsFunc(inputString, func(r rune) bool {
return r == '\u001d' || r == '\u001e'
})
for _, part := range parts {
fmt.Println(part)
}
https://play.golang.org/p/x_le2P3h8ry

How in golang to remove the last letter from the string?

Let's say I have a string called varString.
varString := "Bob,Mark,"
How do I remove the last letter from the string? In my case, it's the second comma.
> How to remove the last letter from the string?
In Go, character strings are UTF-8 encoded. Unicode UTF-8 is a variable-length character encoding which uses one to four bytes per Unicode character (code point).
For example,
package main
import (
"fmt"
"unicode/utf8"
)
func trimLastChar(s string) string {
r, size := utf8.DecodeLastRuneInString(s)
if r == utf8.RuneError && (size == 0 || size == 1) {
size = 0
}
return s[:len(s)-size]
}
func main() {
s := "Bob,Mark,"
fmt.Println(s)
s = trimLastChar(s)
fmt.Println(s)
}
Playground: https://play.golang.org/p/qyVYrjmBoVc
Output:
Bob,Mark,
Bob,Mark
Here's a much simpler method that works for Unicode strings too:
func removeLastRune(s string) string {
	r := []rune(s)
	if len(r) == 0 {
		return s // nothing to remove; avoids slicing r[:-1]
	}
	return string(r[:len(r)-1])
}
Playground link: https://play.golang.org/p/ezsGUEz0F-D
Something like this:
s := "Bob,Mark,"
s = s[:len(s)-1]
Note that this does not work if the last character takes more than one byte (anything outside ASCII), and it panics on an empty string.
newStr := strings.TrimRightFunc(str, func(r rune) bool {
return !unicode.IsLetter(r) // or any other validation can go here
})
This trims every character that isn't a letter from the right-hand side, not just the last one.

Replace a character at a specific location in a string

I know about the function strings.Replace(). It works if you know exactly what to replace and how many occurrences there are. But what can I do if I want to replace a char at only a known position? I'm thinking of something like this:
randLetter := getRandomChar()
myText := "This is my text"
randPos := rand.Intn(len(myText) - 1)
newText := myText[:randPos] + randLetter + myText[randPos+1:]
But this does not replace the char at randPos, just inserts the randLetter at that position. Right?
I've written some code to replace the character found at indexofcharacter with the replacement. It may not be the best method, but it works fine.
https://play.golang.org/p/9CTgHRm6icK
// replaceAtPosition replaces the character at the 1-based position
// indexofcharacter with replacement (position 1 is the first character).
func replaceAtPosition(originaltext string, indexofcharacter int, replacement string) string {
	runes := []rune(originaltext)
	partOne := string(runes[:indexofcharacter-1])
	partTwo := string(runes[indexofcharacter:])
	return partOne + replacement + partTwo
}
UTF-8 is a variable-length encoding, so index by rune rather than by byte. For example,
package main
import "fmt"
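// insertChar replaces (rather than inserts) the rune at rune index i with c;
// if i is out of range, s is returned unchanged.
func insertChar(s string, c rune, i int) string {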
if i >= 0 {
r := []rune(s)
if i < len(r) {
r[i] = c
s = string(r)
}
}
return s
}
func main() {
s := "Hello, 世界"
fmt.Println(s)
s = insertChar(s, 'X', len([]rune(s))-1)
fmt.Println(s)
}
Output:
Hello, 世界
Hello, 世X
A string is a read-only slice of bytes, so you can't replace anything in it in place.
A single rune can consist of multiple bytes, so you should convert the string to an intermediate, mutable slice of runes anyway (randLetter must then be a rune):
myText := []rune("This is my text")
randPos := rand.Intn(len(myText) - 1)
myText[randPos] = randLetter
fmt.Println(string(myText))

How to stop json.Marshal from escaping < and >?

package main
import "fmt"
import "encoding/json"
type Track struct {
XmlRequest string `json:"xmlRequest"`
}
func main() {
message := new(Track)
message.XmlRequest = "<car><mirror>XML</mirror></car>"
fmt.Println("Before Marshal", message)
messageJSON, _ := json.Marshal(message)
fmt.Println("After marshal", string(messageJSON))
}
Is it possible to make json.Marshal not escape < and >? I currently get:
{"xmlRequest":"\u003ccar\u003e\u003cmirror\u003eXML\u003c/mirror\u003e\u003c/car\u003e"}
but I am looking for something like this:
{"xmlRequest":"<car><mirror>XML</mirror></car>"}
As of Go 1.7, you still cannot do this with json.Marshal(). The source code for json.Marshal shows:
> err := e.marshal(v, encOpts{escapeHTML: true})
The reason json.Marshal always does this is:
String values encode as JSON strings coerced to valid UTF-8,
replacing invalid bytes with the Unicode replacement rune.
The angle brackets "<" and ">" are escaped to "\u003c" and "\u003e"
to keep some browsers from misinterpreting JSON output as HTML.
Ampersand "&" is also escaped to "\u0026" for the same reason.
This means you cannot even do it by writing a custom func (t *Track) MarshalJSON(), because json.Marshal also escapes the output of a json.Marshaler; you have to use a separate method that does not implement the json.Marshaler interface.
So the workaround is to write your own function:
func (t *Track) JSON() ([]byte, error) {
buffer := &bytes.Buffer{}
encoder := json.NewEncoder(buffer)
encoder.SetEscapeHTML(false)
err := encoder.Encode(t)
return buffer.Bytes(), err
}
https://play.golang.org/p/FAH-XS-QMC
If you want a generic solution for any struct, you could do:
func JSONMarshal(t interface{}) ([]byte, error) {
buffer := &bytes.Buffer{}
encoder := json.NewEncoder(buffer)
encoder.SetEscapeHTML(false)
err := encoder.Encode(t)
return buffer.Bytes(), err
}
https://play.golang.org/p/bdqv3TUGr3
Go 1.7 added a new option to fix this:
> encoding/json: add Encoder.DisableHTMLEscaping
> This provides a way to disable the escaping of <, >, and & in JSON strings.
The relevant method is
func (enc *Encoder) SetEscapeHTML(on bool)
which is called on an Encoder:
enc := json.NewEncoder(os.Stdout)
enc.SetEscapeHTML(false)
Simple example: https://play.golang.org/p/SJM3KLkYW-
This doesn't answer the question directly, but it could help if you're looking for a way to deal with json.Marshal escaping < and >.
Another way to solve the problem is to turn those escaped characters in the json.RawMessage back into plain UTF-8 characters after the json.Marshal() call.
It works for characters other than < and > as well. (I used to do this to make non-English letters human-readable in JSON :D)
func _UnescapeUnicodeCharactersInJSON(_jsonRaw json.RawMessage) (json.RawMessage, error) {
str, err := strconv.Unquote(strings.Replace(strconv.Quote(string(_jsonRaw)), `\\u`, `\u`, -1))
if err != nil {
return nil, err
}
return []byte(str), nil
}
func main() {
// Both are valid JSON.
var jsonRawEscaped json.RawMessage // json raw with escaped unicode chars
var jsonRawUnescaped json.RawMessage // json raw with unescaped unicode chars
// '\u263a' == '☺'
jsonRawEscaped = []byte(`{"HelloWorld": "\uC548\uB155, \uC138\uC0C1(\u4E16\u4E0A). \u263a"}`) // "\\u263a"
jsonRawUnescaped, _ = _UnescapeUnicodeCharactersInJSON(jsonRawEscaped) // "☺"
fmt.Println(string(jsonRawEscaped)) // {"HelloWorld": "\uC548\uB155, \uC138\uC0C1(\u4E16\u4E0A). \u263a"}
fmt.Println(string(jsonRawUnescaped)) // {"HelloWorld": "안녕, 세상(世上). ☺"}
}
https://play.golang.org/p/pUsrzrrcDG-
I hope this helps someone.
Here's my workaround:
// Marshal is a UTF-8 friendly marshaler. Go's json.Marshal is not UTF-8
// friendly because it replaces the valid UTF-8 and JSON characters "&", "<",
// ">" with the "slash u" unicode escaped forms (e.g. \u0026). It preemptively
// escapes for HTML friendliness. Where text may include any of these
// characters, json.Marshal should not be used. Playground of Go breaking a
// title: https://play.golang.org/p/o2hiX0c62oN
func Marshal(i interface{}) ([]byte, error) {
buffer := &bytes.Buffer{}
encoder := json.NewEncoder(buffer)
encoder.SetEscapeHTML(false)
err := encoder.Encode(i)
return bytes.TrimRight(buffer.Bytes(), "\n"), err
}
No, you can't do this with json.Marshal itself.
A third-party JSON package might be a better choice than the standard library's encoding/json.
More detail: https://github.com/golang/go/issues/8592
I had a requirement to store XML inside JSON.
At first I had significant difficulty unmarshaling that XML after passing it via JSON, but my issue was actually that I was trying to unmarshal the XML string as a json.RawMessage. I needed to unmarshal it as a string and then convert it to []byte for xml.Unmarshal.
type xmlInJson struct {
	Data string `json:"data"`
}
var response xmlInJson
err := json.Unmarshal(xmlJsonData, &response) // check err
var xmlData someOtherStructThatMatchesTheXmlFormat
err = xml.Unmarshal([]byte(response.Data), &xmlData) // check err
A custom function is not necessarily the best solution.
How about using another library to solve this?
I use gabs.
Install:
go get github.com/Jeffail/gabs
Usage:
message := new(Track)
resultJson, _ := gabs.Consume(message)
fmt.Println(string(resultJson.EncodeJSON()))
That's how I solved the problem.
