Encoding a string to its ASCII representation on varying length of strings - go

I want to encode a string in Go using ASCII encoding like my C# function below:
public static byte[] StrToByteArray(string str)
{
System.Text.ASCIIEncoding encoding = new System.Text.ASCIIEncoding();
return encoding.GetBytes(str);
}
I know how to do it using the below function:
package main

import (
"encoding/ascii85"
"fmt"
)
func main() {
dst := make([]byte, 25, 25)
dst2 := make([]byte, 25, 25)
ascii85.Encode(dst, []byte("Hello, playground"))
fmt.Println(dst)
ascii85.Decode(dst2, dst, false)
fmt.Println(string(dst2))
}
Currently it is hard coded to a length of 25. How could I adjust the length based on the size of the string?

ascii85.MaxEncodedLen() returns the maximum number of output bytes for the given number of input bytes. You can use this upper bound to size the destination slice.
The actual number of bytes written is returned by ascii85.Encode(). If you pass a bigger slice to Encode(), you must use this number to slice the destination; bytes beyond it are "garbage".
The same goes for ascii85.Decode(): it returns the number of bytes written, and you must use that to slice the destination if you passed a bigger slice.
Since decoding may fail (invalid input), you should also check the returned error.
Finally, since the encoded data need not end on a complete 32-bit block, pass flush=true so Decode treats the given input as the end of the stream and processes it completely instead of waiting for more input.
The final, corrected code:
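package main

import (
	"encoding/ascii85"
	"fmt"
)

func main() {
	s := []byte("Hello, playground")
	// Upper bound on the encoded size for len(s) input bytes.
	maxlen := ascii85.MaxEncodedLen(len(s))
	dst := make([]byte, maxlen)
	n := ascii85.Encode(dst, s)
	dst = dst[:n] // keep only the bytes Encode actually wrote
	fmt.Println(string(dst))
	// Decoding dst yields exactly len(s) bytes, so maxlen is large enough here.
	dst2 := make([]byte, maxlen)
	n, _, err := ascii85.Decode(dst2, dst, true) // flush=true: dst is the whole input
	if err != nil {
		panic(err)
	}
	dst2 = dst2[:n]
	fmt.Println(string(dst2))
}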
Which outputs (try it on the Go Playground):
87cURD_*#MCghU%Ec6)<A,
Hello, playground
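If you need this in more than one place, the same steps can be wrapped in two small helpers. This is a sketch: encode85 and decode85 are hypothetical names, not part of encoding/ascii85. For decoding it swaps in the package's streaming decoder (ascii85.NewDecoder read with io.ReadAll, Go 1.16+) instead of Decode, because a 'z' in the input stands for four zero bytes, so the decoded output of arbitrary ascii85 text can be longer than the text itself and is awkward to size up front.

```go
package main

import (
	"bytes"
	"encoding/ascii85"
	"fmt"
	"io"
)

// encode85 returns the ascii85 encoding of src, sized with MaxEncodedLen
// and trimmed to the number of bytes Encode actually wrote.
func encode85(src []byte) []byte {
	dst := make([]byte, ascii85.MaxEncodedLen(len(src)))
	n := ascii85.Encode(dst, src)
	return dst[:n]
}

// decode85 decodes ascii85 data of any length. The streaming decoder
// grows the output as needed, so no size has to be guessed up front
// (a 'z' expands to four zero bytes).
func decode85(src []byte) ([]byte, error) {
	return io.ReadAll(ascii85.NewDecoder(bytes.NewReader(src)))
}

func main() {
	for _, s := range []string{"Hello, playground", "", "a", "\x00\x00\x00\x00"} {
		enc := encode85([]byte(s))
		dec, err := decode85(enc)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%q -> %s -> %q\n", s, enc, dec)
	}
}
```

The streaming decoder flushes the final partial block when it reaches the end of the reader, which plays the same role as flush=true in the code above.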

System.Text.ASCIIEncoding and the encoding/ascii85 package do different things. System.Text.ASCIIEncoding encodes text to ASCII by replacing characters outside the ASCII range with ?. The encoding/ascii85 package encodes binary data to ascii85, a.k.a. base85.
The following Go function replicates the C# function in the question:
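import "unicode/utf8"

// StrToByteArray mimics System.Text.ASCIIEncoding.GetBytes: every rune
// outside the ASCII range (>= utf8.RuneSelf, i.e. 0x80) becomes '?'.
func StrToByteArray(str string) []byte {
	result := make([]byte, 0, len(str))
	for _, r := range str {
		if r >= utf8.RuneSelf {
			r = '?'
		}
		result = append(result, byte(r))
	}
	return result
}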
If you know that the string only contains ASCII characters, then a conversion will do the trick:
func StrToByteArray(str string) []byte { return []byte(str) }
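To see the difference between the two versions, the sketch below runs both on a string containing one non-ASCII character (Go source files are UTF-8, so "é" is stored as the two bytes c3 a9). StrToByteArray is repeated from above so the block compiles on its own.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// StrToByteArray replaces every non-ASCII rune with a single '?'.
func StrToByteArray(str string) []byte {
	result := make([]byte, 0, len(str))
	for _, r := range str {
		if r >= utf8.RuneSelf {
			r = '?'
		}
		result = append(result, byte(r))
	}
	return result
}

func main() {
	s := "Bien joué"
	fmt.Printf("%s\n", StrToByteArray(s)) // Bien jou?
	fmt.Printf("% x\n", []byte(s))        // 42 69 65 6e 20 6a 6f 75 c3 a9
}
```

So the plain conversion is only a drop-in replacement when the input is pure ASCII; otherwise it returns the UTF-8 bytes rather than '?'.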

Related

Writing binary data to a file (from hexdump)

I'm trying to do something very simple in Go, but I can't find any resources.
I receive a hex dump and I want to write it to a file, but the contents of the two files (src and dst) don't match at all. So far the only way I've found is to manually add \x every 2 characters.
I tried looping over my string and adding \x; the string looks identical, but the output is very different.
This works when done manually:
binary.Write(f, binary.LittleEndian, []byte("\x00\x00\x00\x04\x0A\xFA\x64\xA7\x00\x03\x31\x30"))
But I can't manage to do the same starting from the string "000000040afa64a700033130"...
What I currently do (this is what I do in Python 3):
text := "000000040afa64a700033130"
j := 0
f, _ := os.OpenFile("gotest", os.O_WRONLY|os.O_CREATE, 0600)
for i := 0; i < len(text); i += 2 {
if (i + 2) <= len(text) {
j = i + 2
}
value, _ := strconv.ParseInt(hex, 16, 8)
binary.Write(f, binary.LittleEndian,value)
s = append(s, value)
}
If your hex data is in the form of a string and you want to write the raw bytes, you'll have to convert it first; the easiest way is to use hex.Decode.
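package main

import (
	"encoding/hex"
	"io/ioutil"
	"log"
)

func main() {
	stringData := []byte("48656c6c6f20476f7068657221")
	hexData := make([]byte, hex.DecodedLen(len(stringData)))
	// hex.Decode(dst, src): the decoded bytes go into the first argument.
	if _, err := hex.Decode(hexData, stringData); err != nil {
		log.Fatal(err)
	}
	// Create (or truncate) the file and write the raw bytes to it.
	if err := ioutil.WriteFile("filename", hexData, 0644); err != nil {
		log.Fatal(err)
	}
}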
Depending on your use case, you could switch to ioutil.WriteFile. It writes the given byte slice to a file, creating the file if it doesn't exist or truncating it if it already exists.

How to convert []byte to C hex format 0x...?

func main() {
str := hex.EncodeToString([]byte("go"))
fmt.Println(str)
}
This code prints 676f. How can I print it C-style, as 0x67, 0x6f?
I couldn't find any function in the hex module that would achieve what you want. However, we can use a custom buffer to write in our desired format.
package main
import (
"bytes"
"fmt"
)
func main() {
originalBytes := []byte("go")
result := make([]byte, 0, 5*len(originalBytes)) // start empty; "0x67 " is 5 bytes per input byte
buff := bytes.NewBuffer(result)
for _, b := range originalBytes {
fmt.Fprintf(buff, "0x%02x ", b)
}
fmt.Println(buff.String())
}
Runnable example: https://goplay.space/#fyhDJ094GgZ
Here's a solution that produces the result as specified in the question. Specifically, there's a ", " between each byte and no trailing space.
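package main

import (
	"fmt"
	"strings"
)

func main() {
	p := []byte("go")
	var buf strings.Builder
	if len(p) > 0 {
		// Each byte needs 4 chars ("0x67") and each separator 2 (", ").
		buf.Grow(len(p)*6 - 2)
		for i, b := range p {
			if i > 0 {
				buf.WriteString(", ")
			}
			fmt.Fprintf(&buf, "0x%02x", b)
		}
	}
	result := buf.String()
	fmt.Println(result) // 0x67, 0x6f
}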
The strings.Builder type is used to avoid allocating memory on the final conversion to a string. Another answer uses bytes.Buffer, which does allocate memory at this step.
The builder is initially sized large enough to hold the representation of each byte and the separators. Another answer ignores the size of the separators.
Try this on the Go playground.
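As an alternative that avoids the loop, fmt's %x verb already supports this layout for byte slices: the space flag separates bytes with a space, and the # flag prefixes every byte with 0x, so "% #x" yields "0x67 0x6f". A sketch that then widens the separator to ", " (cHex is a hypothetical name; it builds an intermediate string, so the strings.Builder version above may be preferable in hot paths):

```go
package main

import (
	"fmt"
	"strings"
)

// cHex formats p as C-style hex literals separated by ", ".
// "% #x" prints each byte as 0x.. separated by single spaces;
// ReplaceAll then turns each space into ", ".
func cHex(p []byte) string {
	return strings.ReplaceAll(fmt.Sprintf("% #x", p), " ", ", ")
}

func main() {
	fmt.Println(cHex([]byte("go"))) // 0x67, 0x6f
}
```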

bytes.String() vs bytes.Bytes() in Go

Consider a text file like this:
Some text
here.
---
More text
another line.
---
Third part of text.
I want to split it into three parts, divided by the --- separator. The parts should be stored in a map.
Here are two programs that are exactly the same except for the type stored in the map.
When I use string, everything works fine:
KEY: 0
Some text
here.
KEY: 1
More text
another line.
KEY: 2
Third part of text.
https://play.golang.org/p/IcGdoUNcTEe
When I use []byte, things get messed up:
KEY: 0
Third part of teKEY: 1
Third part of text.
ne.
KEY: 2
Third part of text.
https://play.golang.org/p/jqLhCrqsvOs
Why?
Program 1 (string):
func main() {
parts := parseParts([]byte(input))
for k, v := range parts {
fmt.Printf("KEY: %d\n%s", k, v)
}
}
func parseParts(input []byte) map[int]string {
parts := map[int]string{}
s := bufio.NewScanner(bytes.NewReader(input))
buf := bytes.Buffer{}
i := 0
for s.Scan() {
if s.Text() == "---" {
parts[i] = buf.String()
buf.Reset()
i++
continue
}
buf.Write(s.Bytes())
buf.WriteString("\n")
}
parts[i] = buf.String()
return parts
}
Program 2 ([]byte):
func main() {
parts := parseParts([]byte(input))
for k, v := range parts {
fmt.Printf("KEY: %d\n%s", k, v)
}
}
func parseParts(input []byte) map[int][]byte {
parts := map[int][]byte{}
s := bufio.NewScanner(bytes.NewReader(input))
buf := bytes.Buffer{}
i := 0
for s.Scan() {
if s.Text() == "---" {
parts[i] = buf.Bytes()
buf.Reset()
i++
continue
}
buf.Write(s.Bytes())
buf.WriteString("\n")
}
parts[i] = buf.Bytes()
return parts
}
In the string version,
parts[i] = buf.String()
sets parts[i] to a new string every time. In the []byte version,
parts[i] = buf.Bytes()
sets parts[i] to a byte slice backed by the same array every time. That backing array ends up holding the text of the last part, while each slice keeps the length it had when it was created, which is why all three slices show the same content cut off at different places.
You could replace the byte slice line
parts[i] = buf.Bytes()
with something like this:
bb := buf.Bytes()
b := make([]byte, len(bb))
copy(b, bb)
parts[i] = b
in order to get the behavior to match the string version. But the string version is easier and better matches what you seem to be trying to do.
The difference is that bytes.Buffer.String copies the memory, while bytes.Buffer.Bytes does not. Quoting the documentation,
The slice is valid for use only until the next buffer modification (that is, only until the next call to a method like Read, Write, Reset, or Truncate).
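A minimal sketch of that rule in isolation: a String() result and an explicit copy survive Reset and a later write, while the slice returned by Bytes() ends up showing the new contents. aliasDemo is a hypothetical name. That the old slice shows "xyz" (rather than something else) reflects how bytes.Buffer currently reuses its array after Reset; the documentation only promises that the slice is no longer valid after the modification.

```go
package main

import (
	"bytes"
	"fmt"
)

// aliasDemo writes "abc", takes a String() copy, a Bytes() view and an
// explicit copy, then resets the buffer and writes "xyz".
func aliasDemo() (s string, view, copied []byte) {
	var buf bytes.Buffer
	buf.WriteString("abc")
	s = buf.String()                              // independent copy
	view = buf.Bytes()                            // shares the buffer's array
	copied = append([]byte(nil), buf.Bytes()...) // explicit copy
	buf.Reset()
	buf.WriteString("xyz") // reuses the same backing array
	return s, view, copied
}

func main() {
	s, view, copied := aliasDemo()
	fmt.Println(s, string(view), string(copied)) // abc xyz abc
}
```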

Golang: How to convert an image.image to uint16

I am trying to use the go-skeltrack library with some depth images I have (not using freenect). For that I need to modify the provided example by replacing the Kinect images with my own. To do so, I have to read an image and convert it to a []uint16 variable. The code I tried is:
file, err := os.Open("./images/4.png")
if err != nil {
fmt.Println("4.png file not found!")
os.Exit(1)
}
defer file.Close()
fileInfo, _ := file.Stat()
var size int64 = fileInfo.Size()
bytes := make([]byte, size)
// read file into bytes
buffer := bufio.NewReader(file)
_, err = buffer.Read(bytes)
integerImage := binary.BigEndian.Uint16(bytes)
onDepthFrame(integerImage)
Where onDepthFrame is a function which has the form
func onDepthFrame(depth []uint16).
But I am getting the following error while compiling:
./skeltrackOfflineImage.go:155: cannot use integerImage (type uint16) as type []uint16 in argument to onDepthFrame
Which, of course, refers to the fact that I generated a single integer instead of a slice. I am quite confused about how type conversion works in Go. Please help!
binary.BigEndian.Uint16 converts two bytes (in a slice) to a 16-bit value using big endian byte order. If you want to convert bytes to a slice of uint16, you should use binary.Read:
// This reads 10 uint16s from file.
slice := make([]uint16, 10)
err := binary.Read(file, binary.BigEndian, slice)
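Here is a self-contained sketch of the same binary.Read call, reading from an in-memory byte slice instead of a file so the values are easy to check. bytesToUint16s is a hypothetical helper. As the next answer points out, the bytes of a PNG file are encoded (compressed) image data rather than raw pixel values, so this only makes sense for data that really consists of raw 16-bit samples.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// bytesToUint16s decodes big-endian byte pairs into a []uint16.
// binary.Read fills the whole slice or returns an error.
func bytesToUint16s(b []byte) ([]uint16, error) {
	if len(b)%2 != 0 {
		return nil, fmt.Errorf("odd number of bytes: %d", len(b))
	}
	out := make([]uint16, len(b)/2)
	err := binary.Read(bytes.NewReader(b), binary.BigEndian, out)
	return out, err
}

func main() {
	vals, err := bytesToUint16s([]byte{0x01, 0x02, 0xff, 0xfe})
	if err != nil {
		panic(err)
	}
	fmt.Println(vals) // [258 65534]
}
```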
It sounds like you're looking to get raw pixels. If that's the case, I don't recommend reading the file as binary directly. It means you would need to parse the file format yourself since image files contain more information than just the raw pixel values. There are already tools in the image package to deal with that.
This code should get you on the right track. It reads RGBA values, so it ends up with a flat []uint8 of length width * height * 4, since there are four values per pixel.
https://play.golang.org/p/WUgHQ3pRla
import (
"bufio"
"fmt"
"image"
"os"
// for decoding png files
_ "image/png"
)
// RGBA attempts to load an image from file and return the raw RGBA pixel values.
func RGBA(path string) ([]uint8, error) {
file, err := os.Open(path)
if err != nil {
return nil, err
}
defer file.Close()
img, _, err := image.Decode(bufio.NewReader(file))
if err != nil {
return nil, err
}
switch trueim := img.(type) {
case *image.RGBA:
return trueim.Pix, nil
case *image.NRGBA:
return trueim.Pix, nil
}
return nil, fmt.Errorf("unhandled image format")
}
I'm not entirely sure where the uint16 values you need should come from, but presumably they're per-pixel data, so the code should be very similar to this, except that the switch on trueim should likely check for something other than *image.RGBA. Take a look at the other image types in https://golang.org/pkg/image.

String to UCS-2

I want to port my Python program, which converts a Unicode string to a UCS-2 hex string, to Go.
In Python, it's quite simple:
u"Bien joué".encode('utf-16-be').encode('hex')
-> 004200690065006e0020006a006f007500e9
I am a beginner in Go and the simplest way I found is:
package main
import (
"fmt"
"strings"
)
func main() {
str := "Bien joué"
fmt.Printf("str: %s\n", str)
ucs2HexArray := []rune(str)
s := fmt.Sprintf("%U", ucs2HexArray)
a := strings.Replace(s, "U+", "", -1)
b := strings.Replace(a, "[", "", -1)
c := strings.Replace(b, "]", "", -1)
d := strings.Replace(c, " ", "", -1)
fmt.Printf("->: %s", d)
}
str: Bien joué
->: 004200690065006E0020006A006F007500E9
I'm fairly sure this is not efficient. How can I improve it?
Make this conversion a function; then you can easily improve the conversion algorithm later. For example,
package main
import (
"fmt"
"strings"
"unicode/utf16"
)
func hexUTF16FromString(s string) string {
hex := fmt.Sprintf("%04x", utf16.Encode([]rune(s)))
return strings.Replace(hex[1:len(hex)-1], " ", "", -1)
}
func main() {
str := "Bien joué"
fmt.Println(str)
hex := hexUTF16FromString(str)
fmt.Println(hex)
}
Output:
Bien joué
004200690065006e0020006a006f007500e9
NOTE:
You say "convert an unicode string to a UCS-2 string" but your Python example uses UTF-16:
u"Bien joué".encode('utf-16-be').encode('hex')
The Unicode Consortium
UTF-16 FAQ
Q: What is the difference between UCS-2 and UTF-16?
A: UCS-2 is obsolete terminology which refers to a Unicode
implementation up to Unicode 1.1, before surrogate code points and
UTF-16 were added to Version 2.0 of the standard. This term should now
be avoided.
UCS-2 does not describe a data format distinct from UTF-16, because
both use exactly the same 16-bit code unit representations. However,
UCS-2 does not interpret surrogate code points, and thus cannot be
used to conformantly represent supplementary characters.
Sometimes in the past an implementation has been labeled "UCS-2" to
indicate that it does not support supplementary characters and doesn't
interpret pairs of surrogate code points as characters. Such an
implementation would not handle processing of character properties,
code point boundaries, collation, etc. for supplementary characters.
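In practice the distinction only matters for characters outside the Basic Multilingual Plane. A small sketch (fitsUCS2 is a hypothetical name) that checks whether a Go string can be represented in UCS-2 at all, i.e. whether its UTF-16 encoding would contain no surrogate pairs:

```go
package main

import "fmt"

// fitsUCS2 reports whether every rune in s lies in the Basic Multilingual
// Plane (U+0000..U+FFFF). Only such text encodes identically in UCS-2 and
// UTF-16; supplementary characters need a UTF-16 surrogate pair.
func fitsUCS2(s string) bool {
	for _, r := range s {
		if r > 0xFFFF {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(fitsUCS2("Bien joué")) // true
	fmt.Println(fitsUCS2("😀"))         // false: U+1F600 is outside the BMP
}
```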
For anything other than trivially short input (and possibly even then), I'd use the golang.org/x/text/encoding/unicode package to convert to UTF-16 (as @peterSo and @JimB point out, this is slightly different from the obsolete UCS-2).
The advantage (over unicode/utf16) of using this (and the golang.org/x/text/transform package) is that you get BOM support, big or little endian, and that you can encode/decode short strings or bytes, but you can also apply this as a filter to an io.Reader or to an io.Writer to transform your data as you process it instead of all up front (i.e. for a large stream of data you don't need to have it all in memory at once).
E.g.:
package main
import (
"bytes"
"fmt"
"io"
"io/ioutil"
"log"
"strings"
"golang.org/x/text/encoding/unicode"
"golang.org/x/text/transform"
)
const input = "Bien joué"
func main() {
// Get a `transform.Transformer` for encoding.
e := unicode.UTF16(unicode.BigEndian, unicode.IgnoreBOM)
t := e.NewEncoder()
// For decoding, allows a Byte Order Mark at the start to
// switch to corresponding Unicode decoding (UTF-8, UTF-16BE, or UTF-16LE)
// otherwise we use `e` (UTF-16BE without BOM):
t2 := unicode.BOMOverride(e.NewDecoder())
_ = t2 // we don't show/use this
// If you have a string:
str := input
outstr, n, err := transform.String(t, str)
if err != nil {
log.Fatal(err)
}
fmt.Printf("string: n=%d, bytes=%02x\n", n, []byte(outstr))
// If you have a []byte:
b := []byte(input)
outbytes, n, err := transform.Bytes(t, b)
if err != nil {
log.Fatal(err)
}
fmt.Printf("bytes: n=%d, bytes=%02x\n", n, outbytes)
// If you have an io.Reader for the input:
ir := strings.NewReader(input)
r := transform.NewReader(ir, t)
// Now just read from r as you normally would and the encoding will
// happen as you read, good for large sources to avoid pre-encoding
// everything. Here we'll just read it all in one go though which negates
// that benefit (normally avoid ioutil.ReadAll).
outbytes, err = ioutil.ReadAll(r)
if err != nil {
log.Fatal(err)
}
fmt.Printf("reader: len=%d, bytes=%02x\n", len(outbytes), outbytes)
// If you have an io.Writer for the output:
var buf bytes.Buffer
w := transform.NewWriter(&buf, t)
_, err = fmt.Fprint(w, input) // or io.Copy from an io.Reader, or whatever
if err != nil {
log.Fatal(err)
}
fmt.Printf("writer: len=%d, bytes=%02x\n", buf.Len(), buf.Bytes())
}
// Whichever of these you need, you could of course
// put it in a single simple function. E.g.:
// NewUTF16BEWriter returns a new writer that wraps w
// by transforming the bytes written into UTF-16-BE.
func NewUTF16BEWriter(w io.Writer) io.Writer {
e := unicode.UTF16(unicode.BigEndian, unicode.IgnoreBOM)
return transform.NewWriter(w, e.NewEncoder())
}
// ToUTF16BE converts UTF-8 `b` into UTF-16BE.
func ToUTF16BE(b []byte) ([]byte, error) {
e := unicode.UTF16(unicode.BigEndian, unicode.IgnoreBOM)
out, _, err := transform.Bytes(e.NewEncoder(), b)
return out, err
}
Gives:
string: n=10, bytes=004200690065006e0020006a006f007500e9
bytes: n=10, bytes=004200690065006e0020006a006f007500e9
reader: len=18, bytes=004200690065006e0020006a006f007500e9
writer: len=18, bytes=004200690065006e0020006a006f007500e9
The standard library's unicode/utf16 package provides utf16.Encode() (https://golang.org/pkg/unicode/utf16/#Encode) for this purpose.
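A sketch of that approach without the fmt.Sprintf/strings.Replace round trip: utf16.Encode produces the code units (including surrogate pairs), binary.BigEndian.PutUint16 writes each one as two bytes, and hex.EncodeToString produces lowercase hex like the Python snippet in the question. hexUTF16BE is a hypothetical name.

```go
package main

import (
	"encoding/binary"
	"encoding/hex"
	"fmt"
	"unicode/utf16"
)

// hexUTF16BE encodes s as UTF-16 code units (surrogate pairs for runes
// above U+FFFF), writes each unit big-endian, and hex-encodes the bytes.
func hexUTF16BE(s string) string {
	units := utf16.Encode([]rune(s))
	buf := make([]byte, 2*len(units))
	for i, u := range units {
		binary.BigEndian.PutUint16(buf[2*i:], u)
	}
	return hex.EncodeToString(buf)
}

func main() {
	fmt.Println(hexUTF16BE("Bien joué")) // 004200690065006e0020006a006f007500e9
	fmt.Println(hexUTF16BE("😀"))         // d83dde00 (a surrogate pair)
}
```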
