Golang Serialize go-radix Tree to file? - go

I've been working on serializing a radix tree (used for indexing) to a file in Go. The radix tree nodes store 64-bit roaring bitmaps (see https://github.com/RoaringBitmap/roaring). The following code is what I am using, and the output I get when trying to load it back into memory:
serializedTree := i.index.ToMap()
encodeFile, err := os.Create(fmt.Sprintf("./serialized/%s/%s", appindex.name, i.field))
if err != nil {
    panic(err)
}
e := gob.NewEncoder(encodeFile)
err = e.Encode(serializedTree)
encodeFile.Close()
// Turn it back for testing
decodeFile, err := os.Open(fmt.Sprintf("./serialized/%s/%s", appindex.name, i.field))
defer decodeFile.Close()
d := gob.NewDecoder(decodeFile)
decoded := make(map[string]interface{})
err = d.Decode(&decoded)
fmt.Println("before decode", serializedTree)
fmt.Println("after decode", decoded)
if err != nil {
    fmt.Println("!!! Error serializing", err)
    panic(err)
}
Output:
before decode map[dan:{1822509180252590512} dan1:{6238704462486574203} goodman:{1822509180252590512,6238704462486574203}]
after decode map[]
!!! Error serializing EOF
panic: EOF
goroutine 1 [running]:
main.(*appIndexes).SerializeIndex(0xc000098240)
(I understand the decode is empty because the gob package doesn't modify on EOF error)
I've noticed that when trying with bytes directly, only 15 bytes are being stored on disk (which is way too few). Trying with the encoding/json package's json.Marshal() and json.Unmarshal(), I see 33 bytes stored, but they load in empty (the roaring bitmaps are gone):
post encode map[dan:map[] dan1:map[] goodman:map[]]
I feel like this has something to do with the fact that I am trying to serialize a map[string]interface{} rather than something like a map[string]int, but I am still fairly green with golang.
See https://repl.it/#danthegoodman/SelfishMoralCharactermapping#main.go for an example and my testing.

I believe I fixed it by converting the map[string]interface{} into a map[string]*roaring64.Bitmap before writing to disk, decoding it back into a map[string]*roaring64.Bitmap, and then converting it back to a map[string]interface{}:
m2 := make(map[string]*roaring64.Bitmap)
// Convert m1 to m2
for key, value := range m1 {
    m2[key] = value.(*roaring64.Bitmap)
}
fmt.Println("m1", m1)
fmt.Println("m2", m2)
encodeFile, err := os.Create("./test")
if err != nil {
    panic(err)
}
e := gob.NewEncoder(encodeFile)
err = e.Encode(m2)
encodeFile.Close()
// Turn it back for testing
decodeFile, err := os.Open("./test")
defer decodeFile.Close()
d := gob.NewDecoder(decodeFile)
decoded := make(map[string]*roaring64.Bitmap)
err = d.Decode(&decoded)
fmt.Println("before decode", m2)
fmt.Println("after decode", decoded)
if err != nil {
    fmt.Println("!!! Error serializing", err)
    panic(err)
}
m3 := make(map[string]interface{})
// Convert m2 to m3
for key, value := range m2 {
    m3[key] = value
}
afterDecTree := radix.NewFromMap(m3)
See https://repl.it/#danthegoodman/VictoriousUtterMention#main.go for a working example

Related

How to read arbitrary amounts of data directly from a file in Go?

Without reading the contents of a file into memory, how can I read "x" bytes from the file so that I can specify what x is for every separate read operation?
I see that the Read method of various Readers takes a byte slice of a certain length and I can read from a file into that slice. But in that case the size of the slice is fixed, whereas what I would like to do, ideally, is something like:
func main() {
    f, err := os.Open("./file.txt")
    if err != nil {
        panic(err)
    }
    someBytes := f.Read(2)
    someMoreBytes := f.Read(4)
}
bytes.Buffer has a Next method which behaves very closely to what I would want, but it requires an existing buffer to work, whereas I'm hoping to read an arbitrary amount of bytes from a file without needing to read the whole thing into memory.
What is the best way to accomplish this?
Thank you for your time.
Use this function:
// readN reads and returns n bytes from the reader.
// On error, readN returns the partial bytes read and
// a non-nil error.
func readN(r io.Reader, n int) ([]byte, error) {
    // Allocate buffer for result
    b := make([]byte, n)
    // ReadFull ensures buffer is filled or error is returned.
    n, err := io.ReadFull(r, b)
    return b[:n], err
}
Call it like this:
someBytes, err := readN(f, 2)
if err != nil { /* handle error here */ }
someMoreBytes, err := readN(f, 4)
if err != nil { /* handle error here */ }
You can do something like this:
f, err := os.Open("/tmp/dat")
check(err) // check is a helper that panics on a non-nil error
b1 := make([]byte, 5)
n1, err := f.Read(b1)
check(err)
fmt.Printf("%d bytes: %s\n", n1, string(b1[:n1]))
For more reading, please check the site.

Processing data in chunks with io.ReadFull results in corrupted file?

I'm trying to download and decrypt HLS streams by using io.ReadFull to process the data in chunks to conserve memory. Irrelevant parts of the code have been left out for simplicity.
func main() {
    f, _ := os.Create("out.ts")
    for _, v := range mediaPlaylist {
        resp, _ := http.Get(v.URI)
        for {
            r, err := decryptHLS(key, iv, resp.Body)
            if err != nil && err == io.EOF {
                break
            } else if err != nil && err != io.ErrUnexpectedEOF {
                panic(err)
            }
            io.Copy(f, r)
        }
    }
}
func decryptHLS(key []byte, iv []byte, r io.Reader) (io.Reader, error) {
    block, _ := aes.NewCipher(key)
    buf := make([]byte, 8192)
    mode := cipher.NewCBCDecrypter(block, iv)
    n, err := io.ReadFull(r, buf)
    if err != nil && err != io.ErrUnexpectedEOF {
        return nil, err
    }
    mode.CryptBlocks(buf, buf)
    return bytes.NewReader(buf[:n]), err
}
At first this seems to work: the file size is correct and there are no errors during download, but the video is corrupted. Not completely, as the file is still recognized as a video, but both image and sound are distorted.
If I change the code to use ioutil.ReadAll instead, the final video files will no longer be corrupted:
func main() {
    f, _ := os.Create("out.ts")
    for _, v := range mediaPlaylist {
        resp, _ := http.Get(v.URI)
        segment, _ := ioutil.ReadAll(resp.Body)
        r := decryptHLS(key, iv, &segment)
        io.Copy(f, r)
    }
}
func decryptHLS(key []byte, iv []byte, s *[]byte) io.Reader {
    block, _ := aes.NewCipher(key)
    mode := cipher.NewCBCDecrypter(block, iv)
    mode.CryptBlocks(*s, *s)
    return bytes.NewReader(*s)
}
Any ideas why it works correctly when reading the entire segment into memory, and not when using io.ReadFull and processing it in chunks?
Internally, NewCBCDecrypter makes a copy of your iv, so each call to decryptHLS starts a fresh CBC chain from the initial IV rather than from the IV state left by the previous chunk's decryption. Create the decrypter once, and you should be able to keep reusing it to decrypt chunk by chunk (assuming each chunk's size is a multiple of the cipher's block size).

Base64 encode/decode results in corrupted output

I'm trying to write some convenience wrapper funcs that base64-encode and -decode byte slices. (Can't understand why this is not conveniently provided in the stdlib.)
However this code (in playground):
func b64encode(b []byte) []byte {
    encodedData := &bytes.Buffer{}
    encoder := base64.NewEncoder(base64.URLEncoding, encodedData)
    defer encoder.Close()
    encoder.Write(b)
    return encodedData.Bytes()
}
func b64decode(b []byte) ([]byte, error) {
    dec := base64.NewDecoder(base64.URLEncoding, bytes.NewReader(b))
    buf := &bytes.Buffer{}
    _, err := io.Copy(buf, dec)
    if err != nil {
        return nil, err
    }
    return buf.Bytes(), nil
}
func main() {
    b := []byte("hello")
    e := b64encode(b)
    d, err := b64decode(e)
    if err != nil {
        log.Fatalf("could not decode: %s", err)
    }
    fmt.Println(string(d))
}
generates truncated output when I try to print it:
hel
What's going on?
The defer executes when the function ends. That is AFTER the return statement has been evaluated.
The following works: https://play.golang.org/p/sYn-W6fZh1
func b64encode(b []byte) []byte {
    encodedData := &bytes.Buffer{}
    encoder := base64.NewEncoder(base64.URLEncoding, encodedData)
    encoder.Write(b)
    encoder.Close()
    return encodedData.Bytes()
}
That being said, if it really is all in memory, you can avoid creating an encoder entirely. Instead, you can do something like:
func b64encode(b []byte) []byte {
    ret := make([]byte, base64.URLEncoding.EncodedLen(len(b)))
    base64.URLEncoding.Encode(ret, b)
    return ret
}
An added benefit of doing it this way is that it is more efficient, since it only needs to allocate once. It also allows you to stop ignoring the errors returned by the Write and Close methods.

Add prefix to io.Reader

I've written a little server which receives a blob of data in the form of an io.Reader, adds a header and streams the result back to the caller.
My implementation isn't particularly efficient as I'm buffering the blob's data in-memory so that I can calculate the blob's length, which needs to form part of the header.
I've seen some examples of io.Pipe() with io.TeeReader but they're more for splitting an io.Reader into two, and writing them away in parallel.
The blobs I'm dealing with are around 100KB, so not huge but if my server gets busy, memory's going to quickly become an issue...
Any ideas?
func addHeader(in io.Reader) (out io.Reader, err error) {
    buf := new(bytes.Buffer)
    if _, err = io.Copy(buf, in); err != nil {
        return
    }
    header := bytes.NewReader([]byte(fmt.Sprintf("header:%d", buf.Len())))
    return io.MultiReader(header, buf), nil
}
I appreciate it's not a good idea to return interfaces from functions but this code isn't destined to become an API, so I'm not too concerned with that bit.
In general, the only way to determine the length of data in an io.Reader is to read until EOF. There are ways to determine the length of the data for specific types.
func addHeader(in io.Reader) (out io.Reader, err error) {
    n := 0
    switch v := in.(type) {
    case *bytes.Buffer:
        n = v.Len()
    case *bytes.Reader:
        n = v.Len()
    case *strings.Reader:
        n = v.Len()
    case io.Seeker:
        cur, err := v.Seek(0, io.SeekCurrent)
        if err != nil {
            return nil, err
        }
        end, err := v.Seek(0, io.SeekEnd)
        if err != nil {
            return nil, err
        }
        _, err = v.Seek(cur, io.SeekStart)
        if err != nil {
            return nil, err
        }
        n = int(end - cur)
    default:
        var buf bytes.Buffer
        if _, err := buf.ReadFrom(in); err != nil {
            return nil, err
        }
        n = buf.Len()
        in = &buf
    }
    header := strings.NewReader(fmt.Sprintf("header:%d", n))
    return io.MultiReader(header, in), nil
}
This is similar to how the net/http package determines the content length of the request body.

Reading from serial port with while-loop

I’ve written a short program in Go to communicate with a sensor through a serial port:
package main

import (
    "fmt"
    "github.com/tarm/goserial"
    "time"
)

func main() {
    c := &serial.Config{Name: "/dev/ttyUSB0", Baud: 9600}
    s, err := serial.OpenPort(c)
    if err != nil {
        fmt.Println(err)
    }
    _, err = s.Write([]byte("\x16\x02N0C0 G A\x03\x0d\x0a"))
    if err != nil {
        fmt.Println(err)
    }
    time.Sleep(time.Second / 2)
    buf := make([]byte, 40)
    n, err := s.Read(buf)
    if err != nil {
        fmt.Println(err)
    }
    fmt.Println(string(buf[:n]))
    s.Close()
}
It works fine, but after writing to the port I have to wait about half a second before I can start reading from it. I would like to use a while-loop instead of time.Sleep to read all incoming data. My attempt doesn’t work:
buf := make([]byte, 40)
n := 0
for {
    n, _ := s.Read(buf)
    if n > 0 {
        break
    }
}
fmt.Println(string(buf[:n]))
I guess buf gets overwritten after every loop pass. Any suggestions?
Your problem is that Read() will return whenever it has some data; it won't wait for all the data. See the io.Reader specification for more info.
What you want to do is read until you reach some delimiter. I don't know exactly what format you are trying to use, but it looks like maybe \x0a is the end delimiter.
In which case you would use a bufio.Reader like this
reader := bufio.NewReader(s)
reply, err := reader.ReadBytes('\x0a')
if err != nil {
    panic(err)
}
fmt.Println(reply)
Which will read data until the first \x0a.
I guess buf gets overwritten after every loop pass. Any suggestions?
Yes, buf will get overwritten with every call to Read().
A timeout on the file handle would be the approach I would take.
s, _ := os.OpenFile("/dev/ttyS0", syscall.O_RDWR|syscall.O_NOCTTY|syscall.O_NONBLOCK, 0666)
t := syscall.Termios{
    Iflag:  syscall.IGNPAR,
    Cflag:  syscall.CS8 | syscall.CREAD | syscall.CLOCAL | syscall.B115200,
    Cc:     [32]uint8{syscall.VMIN: 0, syscall.VTIME: uint8(20)}, // 2.0s timeout
    Ispeed: syscall.B115200,
    Ospeed: syscall.B115200,
}
// Apply the termios settings with an ioctl (needs the unsafe package)
syscall.Syscall6(syscall.SYS_IOCTL, uintptr(s.Fd()),
    uintptr(syscall.TCSETS), uintptr(unsafe.Pointer(&t)),
    0, 0, 0)
// Send message
_, _ = s.Write([]byte("Test message"))
// Receive reply
for {
    buf := make([]byte, 128)
    n, err := s.Read(buf)
    if err != nil { // err will equal io.EOF on timeout
        break
    }
    fmt.Printf("%v\n", string(buf[:n]))
}
Also note that if there is no more data to read and there is no error, os.File.Read() will return io.EOF, as documented.
