How to convert ansi text to utf8 - utf-8

How to convert ansi text to utf8 in Go?
I am trying to convert ansi string to utf8 string.

Go only has UTF-8 strings. You can convert something to a UTF8 string using the conversion described here from a byte[]:
http://golang.org/doc/go_spec.html#Conversions

Here is newer method.
package main
import (
"bytes"
"fmt"
"io/ioutil"
"golang.org/x/text/encoding/traditionalchinese"
"golang.org/x/text/transform"
)
func Decode(s []byte) ([]byte, error) {
I := bytes.NewReader(s)
O := transform.NewReader(I, traditionalchinese.Big5.NewDecoder())
d, e := ioutil.ReadAll(O)
if e != nil {
return nil, e
}
return d, nil
}
func main() {
s := []byte{0xB0, 0xAA}
b, err := Decode(s)
fmt.Println(string(b))
fmt.Println(err)
}
I were use iconv-go to do such convert, you must know what's your ANSI code page, in my case, it is 'big5'.
package main
import (
"fmt"
//iconv "github.com/djimenez/iconv-go"
iconv "github.com/andelf/iconv-go"
"log"
)
func main() {
ibuf := []byte{170,76,80,67}
var obuf [256]byte
// Method 1: use Convert directly
nR, nW, err := iconv.Convert(ibuf, obuf[:], "big5", "utf-8")
if err != nil {
log.Fatalln(err)
}
log.Println(nR, ibuf)
log.Println(obuf[:nW])
fmt.Println(string(obuf[:nW]))
// Method 2: build a converter at first
cv, err := iconv.NewConverter("big5", "utf-8")
if err != nil {
log.Fatalln(err)
}
nR, nW, err = cv.Convert(ibuf, obuf[:])
if err != nil {
log.Fatalln(err)
}
log.Println(string(obuf[:nW]))
}

I've written a function that was useful for me, maybe someone else can use this. It converts from Windows-1252 to UTF-8. I've converted some code points that Windows-1252 treats as chars but Unicode considers to be control characters (http://en.wikipedia.org/wiki/Windows-1252)
func fromWindows1252(str string) string {
var arr = []byte(str)
var buf bytes.Buffer
var r rune
for _, b := range(arr) {
switch b {
case 0x80:
r = 0x20AC
case 0x82:
r = 0x201A
case 0x83:
r = 0x0192
case 0x84:
r = 0x201E
case 0x85:
r = 0x2026
case 0x86:
r = 0x2020
case 0x87:
r = 0x2021
case 0x88:
r = 0x02C6
case 0x89:
r = 0x2030
case 0x8A:
r = 0x0160
case 0x8B:
r = 0x2039
case 0x8C:
r = 0x0152
case 0x8E:
r = 0x017D
case 0x91:
r = 0x2018
case 0x92:
r = 0x2019
case 0x93:
r = 0x201C
case 0x94:
r = 0x201D
case 0x95:
r = 0x2022
case 0x96:
r = 0x2013
case 0x97:
r = 0x2014
case 0x98:
r = 0x02DC
case 0x99:
r = 0x2122
case 0x9A:
r = 0x0161
case 0x9B:
r = 0x203A
case 0x9C:
r = 0x0153
case 0x9E:
r = 0x017E
case 0x9F:
r = 0x0178
default:
r = rune(b)
}
buf.WriteRune(r)
}
return string(buf.Bytes())
}

There is no way to do it without writing the conversion yourself or using a third-party package. You could try using this: http://code.google.com/p/go-charset

golang.org/x/text/encoding/charmap package has functions exactly for this problem
import "golang.org/x/text/encoding/charmap"
func DecodeWindows1250(enc []byte) string {
dec := charmap.Windows1250.NewDecoder()
out, _ := dec.Bytes(enc)
return string(out)
}
func EncodeWindows1250(inp string) []byte {
enc := charmap.Windows1250.NewEncoder()
out, _ := enc.String(inp)
return out
}
Edit: undefined: ba is replace enc

Related

Using regular expressions in Go to Identify a common pattern

I'm trying to parse this string goats=1\r\nalligators=false\r\ntext=works.
contents := "goats=1\r\nalligators=false\r\ntext=works"
compile, err := regexp.Compile("([^#\\s=]+)=([a-zA-Z0-9.]+)")
if err != nil {
return
}
matchString := compile.FindAllStringSubmatch(contents, -1)
my Output looks like [[goats=1 goats 1] [alligators=false alligators false] [text=works text works]]
What I'm I doing wrong in my expression to cause goats=1 to be valid too? I only want [[goats 1]...]
For another approach, you can use the strings package instead:
package main
import (
"fmt"
"strings"
)
func parse(s string) map[string]string {
m := make(map[string]string)
for _, kv := range strings.Split(s, "\r\n") {
a := strings.Split(kv, "=")
m[a[0]] = a[1]
}
return m
}
func main() {
m := parse("goats=1\r\nalligators=false\r\ntext=works")
fmt.Println(m) // map[alligators:false goats:1 text:works]
}
https://golang.org/pkg/strings

Convert slice of string input from console to slice of numbers

I'm trying to write a Go script that takes in as many lines of comma-separated coordinates as the user wishes, split and convert the string of coordinates to float64, store each line as a slice, and then append each slice in a slice of slices for later usage.
Example inputs are:
1.1,2.2,3.3
3.14,0,5.16
Example outputs are:
[[1.1 2.2 3.3],[3.14 0 5.16]]
The equivalent in Python is
def get_input():
print("Please enter comma separated coordinates:")
lines = []
while True:
line = input()
if line:
line = [float(x) for x in line.replace(" ", "").split(",")]
lines.append(line)
else:
break
return lines
But what I wrote in Go seems way too long (pasted below), and I'm creating a lot of variables without the ability to change variable type as in Python. Since I literally just started writing Golang to learn it, I fear my script is long as I'm trying to convert Python thinking into Go. Therefore, I would like to ask for some advice as to how to write this script shorter and more concise in Go style? Thank you.
package main
import (
"fmt"
"os"
"bufio"
"strings"
"strconv"
)
func main() {
inputs := get_input()
fmt.Println(inputs)
}
func get_input() [][]float64 {
fmt.Println("Please enter comma separated coordinates: ")
var inputs [][]float64
scanner := bufio.NewScanner(os.Stdin)
for scanner.Scan() {
if len(scanner.Text()) > 0 {
raw_input := strings.Replace(scanner.Text(), " ", "", -1)
input := strings.Split(raw_input, ",")
converted_input := str2float(input)
inputs = append(inputs, converted_input)
} else {
break
}
}
return inputs
}
func str2float(records []string) []float64 {
var float_slice []float64
for _, v := range records {
if s, err := strconv.ParseFloat(v, 64); err == nil {
float_slice = append(float_slice, s)
}
}
return float_slice
}
Using only string functions:
package main
import (
"bufio"
"fmt"
"os"
"strconv"
"strings"
)
func main() {
scanner := bufio.NewScanner(os.Stdin)
var result [][]float64
var txt string
for scanner.Scan() {
txt = scanner.Text()
if len(txt) > 0 {
values := strings.Split(txt, ",")
var row []float64
for _, v := range values {
fl, err := strconv.ParseFloat(strings.Trim(v, " "), 64)
if err != nil {
panic(fmt.Sprintf("Incorrect value for float64 '%v'", v))
}
row = append(row, fl)
}
result = append(result, row)
}
}
fmt.Printf("Result: %v\n", result)
}
Run:
$ printf "1.1,2.2,3.3
3.14,0,5.16
2,45,76.0, 45 , 69" | go run experiment2.go
Result: [[1.1 2.2 3.3] [3.14 0 5.16] [2 45 76 45 69]]
With given input, you can concatenate them to make a JSON string and then unmarshal (deserialize) that:
func main() {
var lines []string
for {
var line string
fmt.Scanln(&line)
if line == "" {
break
}
lines = append(lines, "["+line+"]")
}
all := "[" + strings.Join(lines, ",") + "]"
inputs := [][]float64{}
if err := json.Unmarshal([]byte(all), &inputs); err != nil {
fmt.Println(err)
return
}
fmt.Println(inputs)
}

Comments out of order after adding item to Go AST

The following test attempts to use AST to add fields to a struct. The fields are added correctly, but the comments are added out of order. I gather the position may need to be specified manually, but I've so far drawn a blank finding an answer.
Here's a failing test: http://play.golang.org/p/RID4N30FZK
Here's the code:
package generator
import (
"bytes"
"fmt"
"go/ast"
"go/parser"
"go/printer"
"go/token"
"testing"
)
func TestAst(t *testing.T) {
source := `package a
// B comment
type B struct {
// C comment
C string
}`
fset := token.NewFileSet()
file, err := parser.ParseFile(fset, "", []byte(source), parser.ParseComments)
if err != nil {
t.Error(err)
}
v := &visitor{
file: file,
}
ast.Walk(v, file)
var output []byte
buf := bytes.NewBuffer(output)
if err := printer.Fprint(buf, fset, file); err != nil {
t.Error(err)
}
expected := `package a
// B comment
type B struct {
// C comment
C string
// D comment
D int
// E comment
E float64
}
`
if buf.String() != expected {
t.Error(fmt.Sprintf("Test failed. Expected:\n%s\nGot:\n%s", expected, buf.String()))
}
/*
actual output = `package a
// B comment
type B struct {
// C comment
// D comment
// E comment
C string
D int
E float64
}
`
*/
}
type visitor struct {
file *ast.File
}
func (v *visitor) Visit(node ast.Node) (w ast.Visitor) {
if node == nil {
return v
}
switch n := node.(type) {
case *ast.GenDecl:
if n.Tok != token.TYPE {
break
}
ts := n.Specs[0].(*ast.TypeSpec)
if ts.Name.Name == "B" {
fields := ts.Type.(*ast.StructType).Fields
addStructField(fields, v.file, "int", "D", "D comment")
addStructField(fields, v.file, "float64", "E", "E comment")
}
}
return v
}
func addStructField(fields *ast.FieldList, file *ast.File, typ string, name string, comment string) {
c := &ast.Comment{Text: fmt.Sprint("// ", comment)}
cg := &ast.CommentGroup{List: []*ast.Comment{c}}
f := &ast.Field{
Doc: cg,
Names: []*ast.Ident{ast.NewIdent(name)},
Type: ast.NewIdent(typ),
}
fields.List = append(fields.List, f)
file.Comments = append(file.Comments, cg)
}
I believe I have gotten it to work. As stated in my comment above, the main points required are:
Specifically set the buffer locations including the Slash and NamePos
Use token.File.AddLine to add new lines at specific offsets (calculated using the positions from item 1)
Overallocate the source buffer so token.File.Position (used by printer.Printer and token.File.Addline don't fail range checks on the source buffer
Code:
package main
import (
"bytes"
"fmt"
"go/ast"
"go/parser"
"go/printer"
"go/token"
"testing"
)
func main() {
tests := []testing.InternalTest{{"TestAst", TestAst}}
matchAll := func(t string, pat string) (bool, error) { return true, nil }
testing.Main(matchAll, tests, nil, nil)
}
func TestAst(t *testing.T) {
source := `package a
// B comment
type B struct {
// C comment
C string
}`
buffer := make([]byte, 1024, 1024)
for idx,_ := range buffer {
buffer[idx] = 0x20
}
copy(buffer[:], source)
fset := token.NewFileSet()
file, err := parser.ParseFile(fset, "", buffer, parser.ParseComments)
if err != nil {
t.Error(err)
}
v := &visitor{
file: file,
fset: fset,
}
ast.Walk(v, file)
var output []byte
buf := bytes.NewBuffer(output)
if err := printer.Fprint(buf, fset, file); err != nil {
t.Error(err)
}
expected := `package a
// B comment
type B struct {
// C comment
C string
// D comment
D int
// E comment
E float64
}
`
if buf.String() != expected {
t.Error(fmt.Sprintf("Test failed. Expected:\n%s\nGot:\n%s", expected, buf.String()))
}
}
type visitor struct {
file *ast.File
fset *token.FileSet
}
func (v *visitor) Visit(node ast.Node) (w ast.Visitor) {
if node == nil {
return v
}
switch n := node.(type) {
case *ast.GenDecl:
if n.Tok != token.TYPE {
break
}
ts := n.Specs[0].(*ast.TypeSpec)
if ts.Name.Name == "B" {
fields := ts.Type.(*ast.StructType).Fields
addStructField(v.fset, fields, v.file, "int", "D", "D comment")
addStructField(v.fset, fields, v.file, "float64", "E", "E comment")
}
}
return v
}
func addStructField(fset *token.FileSet, fields *ast.FieldList, file *ast.File, typ string, name string, comment string) {
prevField := fields.List[fields.NumFields()-1]
c := &ast.Comment{Text: fmt.Sprint("// ", comment), Slash: prevField.End() + 1}
cg := &ast.CommentGroup{List: []*ast.Comment{c}}
o := ast.NewObj(ast.Var, name)
f := &ast.Field{
Doc: cg,
Names: []*ast.Ident{&ast.Ident{Name: name, Obj: o, NamePos: cg.End() + 1}},
}
o.Decl = f
f.Type = &ast.Ident{Name: typ, NamePos: f.Names[0].End() + 1}
fset.File(c.End()).AddLine(int(c.End()))
fset.File(f.End()).AddLine(int(f.End()))
fields.List = append(fields.List, f)
file.Comments = append(file.Comments, cg)
}
Example: http://play.golang.org/p/_q1xh3giHm
For Item (3), it is also important to set all the overallocated bytes to spaces (0x20), so that the printer doesn't complain about null bytes when processing them.
I know that this answer might be a little late. But for the benefit of others, I found a reference to this library in the following GitHub issue
https://github.com/golang/go/issues/20744
The library is called dst and it can convert a go ast to dst and vice versa.
https://github.com/dave/dst
In ast, Comments are stored by their byte offset instead of attached to nodes. Dst solves this by attaching the comments to its respective nodes so that re-arranging nodes doesn't break the output/tree.
The library works as advertized and I haven't found any issues so far.
Note: There is also a subpackage called dst/dstutil which is compatible with golang.org/x/tools/go/ast/astutil

Go: Removing accents from strings

I'm new to Go and I'm trying to implement a function to convert accented characters into their non-accented equivalent. I'm attempting to follow the example given in this blog (see the heading 'Performing magic').
What I've attempted to gather from this is:
package main
import (
"fmt"
"unicode"
"bytes"
"code.google.com/p/go.text/transform"
"code.google.com/p/go.text/unicode/norm"
)
func isMn (r rune) bool {
return unicode.Is(unicode.Mn, r) // Mn: nonspacing marks
}
func main() {
r := bytes.NewBufferString("Your Śtring")
t := transform.Chain(norm.NFD, transform.RemoveFunc(isMn), norm.NFC)
r = transform.NewReader(r, t)
fmt.Println(r)
}
It does not work in the slightest and I quite honestly don't know what it means anyway. Any ideas?
Note that Go 1.5 (August 2015) or Go 1.6 (Q1 2016) could introduce a new runes package, with transform operations.
That includes (in runes/example_test.go) a runes.Remove function, which will help transform résumé into resume:
func ExampleRemove() {
t := transform.Chain(norm.NFD, runes.Remove(runes.In(unicode.Mn)), norm.NFC)
s, _, _ := transform.String(t, "résumé")
fmt.Println(s)
// Output:
// resume
}
This is still being reviewed though (April 2015).
r should be or type io.Reader, and you can't print r like that. First, you need to read the content to a byte slice:
var (
s = "Your Śtring"
b = make([]byte, len(s))
r io.Reader = strings.NewReader(s)
)
t := transform.Chain(norm.NFD, transform.RemoveFunc(isMn), norm.NFC)
r = transform.NewReader(r, t)
r.Read(b)
fmt.Println(string(b))
This works, but for some reason it returns me "Your Stri", two bytes less than needed.
This here is the version which actually does what you need, but I'm still not sure why the example from the blog works so strangely.
s := "Yoùr Śtring"
b := make([]byte, len(s))
t := transform.Chain(norm.NFD, transform.RemoveFunc(isMn), norm.NFC)
_, _, e := t.Transform(b, []byte(s), true)
if e != nil { panic(e) }
fmt.Println(string(b))

Go - How to convert binary string as text to binary bytes?

I have a text dump file with strings like this one:
x\x9cK\xb42\xb5\xaa.\xb6\xb2\xb0R\xcaK-\x09J\xccKOU
I need to convert them to []byte.
Can someone please suggest how this can be done in Go?
The python equivalent is decode('string_escape').
Here is one way of doing it. Note this isn't a complete decode of the python string_escape format, but may be sufficient given the example you've given.
playground link
package main
import (
"fmt"
"log"
"regexp"
"strconv"
)
func main() {
b := []byte(`x\x9cK\xb42\xb5\xaa.\xb6\xb2\xb0R\xcaK-\x09J\xccKOU`)
re := regexp.MustCompile(`\\x([0-9a-fA-F]{2})`)
r := re.ReplaceAllFunc(b, func(in []byte) []byte {
i, err := strconv.ParseInt(string(in[2:]), 16, 64)
if err != nil {
log.Fatalf("Failed to convert hex: %s", err)
}
return []byte{byte(i)}
})
fmt.Println(r)
fmt.Println(string(r))
}
I did have the idea of using the json decoder, but unfortunately it doesn't understand the \xYY syntax.
Here's how you might approach write a little parser (if you needed to support other esc things in the future):
import (
"fmt"
"encoding/hex"
)
func decode(bs string) ([]byte,error) {
in := []byte(bs)
res := make([]byte,0)
esc := false
for i := 0; i<len(in); i++ {
switch {
case in[i] == '\\':
esc = true
continue
case esc:
switch {
case in[i] == 'x':
b,err := hex.DecodeString(string(in[i+1:i+3]))
if err != nil {
return nil,err
}
res = append(res, b...)
i = i+2
default:
res = append(res, in[i])
}
esc = false
default:
res = append(res, in[i])
}
}
return res,nil
}
playground

Resources