limitation on bytes.Buffer? - go

I am trying to gzip a slice of bytes using the package "compress/gzip". I write 45976 bytes into a bytes.Buffer, but when I try to uncompress the content using a gzip.Reader and its Read function, I find that not all of the content is recovered. Are there limitations on bytes.Buffer, and is there a way to bypass or work around them? Here is my code (edit):
func compress_and_uncompress() {
    var buf bytes.Buffer
    w := gzip.NewWriter(&buf)
    i, err := w.Write([]byte(long_string))
    if err != nil {
        log.Fatal(err)
    }
    w.Close()

    b2 := make([]byte, 80000)
    r, _ := gzip.NewReader(&buf)
    j, err := r.Read(b2)
    if err != nil {
        log.Fatal(err)
    }
    r.Close()

    fmt.Println("Wrote:", i, "Read:", j)
}
Output from testing (with a chosen string as long_string) would give:
Wrote: 45976 Read: 32768

Continue reading to get the remaining 13208 bytes. The first read returns 32768 bytes, the second read returns 13208 bytes, and the third read returns zero bytes and EOF.
For example,
package main

import (
    "bytes"
    "compress/gzip"
    "fmt"
    "io"
    "log"
)

func compress_and_uncompress() {
    var buf bytes.Buffer
    w := gzip.NewWriter(&buf)
    i, err := w.Write([]byte(long_string))
    if err != nil {
        log.Fatal(err)
    }
    w.Close()

    b2 := make([]byte, 80000)
    r, _ := gzip.NewReader(&buf)
    j := 0
    for {
        n, err := r.Read(b2[:cap(b2)])
        b2 = b2[:n]
        j += n
        if err != nil {
            if err != io.EOF {
                log.Fatal(err)
            }
            if n == 0 {
                break
            }
        }
        fmt.Println(len(b2))
    }
    r.Close()

    fmt.Println("Wrote:", i, "Read:", j)
}

var long_string string

func main() {
    long_string = string(make([]byte, 45976))
    compress_and_uncompress()
}
Output:
32768
13208
Wrote: 45976 Read: 45976

Use ioutil.ReadAll. The contract for io.Reader says a single Read doesn't have to return all the data, and there is a good reason for that, to do with the sizes of internal buffers. ioutil.ReadAll reads from an io.Reader until EOF and returns everything it read.
Eg (untested)
import "io/ioutil"
func compress_and_uncompress() {
var buf bytes.Buffer
w := gzip.NewWriter(&buf)
i,err := w.Write([]byte(long_string))
if err!=nil {
log.Fatal(err)
}
w.Close()
r, _ := gzip.NewReader(&buf)
b2, err := ioutil.ReadAll(r)
if err!=nil {
log.Fatal(err)
}
r.Close()
fmt.Println("Wrote:", i, "Read:", len(b2))
}

If the read from gzip.NewReader does not return the whole expected slice, you can simply keep re-reading until you have received all the data in the buffer.
Regarding your problem that subsequent reads did not append to the end of the slice but instead wrote over its beginning: the answer can be found in the implementation of gzip's Read function, which includes
208	z.digest.Write(p[0:n])
Each call to Read fills the supplied slice p starting at index 0, so if you pass the same slice every time, every read lands at the beginning of it.
This can be solved in the following manner:
func compress_and_uncompress(long_string string) {
    // Writer
    var buf bytes.Buffer
    w := gzip.NewWriter(&buf)
    i, err := w.Write([]byte(long_string))
    if err != nil {
        log.Fatal(err)
    }
    w.Close()

    // Reader
    var j, k int
    b2 := make([]byte, 80000)
    r, _ := gzip.NewReader(&buf)
    for j = 0; ; j += k {
        k, err = r.Read(b2[j:]) // Add the offset here
        if err != nil {
            if err != io.EOF {
                log.Fatal(err)
            } else {
                break
            }
        }
    }
    r.Close()

    fmt.Println("Wrote:", i, "Read:", j)
}
The result will be:
Wrote: 45976 Read: 45976
Also, after testing with a string of 45976 characters, I can confirm that the output matches the input exactly, with the second part correctly appended after the first.
Source for gzip.Read: http://golang.org/src/pkg/compress/gzip/gunzip.go?s=4633:4683#L189

Related

Binary Encoding/Decoding File in Golang Gives Different Checksum

I'm working on encoding and decoding files in golang. I specifically do need the 2D array that I'm using; this is just test code to show the point. I'm not entirely sure what I'm doing wrong: I'm attempting to convert the file into a list of uint32 numbers, and then take those numbers and convert them back to a file. The problem is that when I do this the file looks fine but the checksum doesn't line up. I suspect I'm doing something wrong in the conversion to uint32. I have to do the switch/case because I have no way of knowing for sure how many bytes I'll read at the end of a given file.
package main

import (
    "bufio"
    "encoding/binary"
    "fmt"
    "io"
    "os"
)

const (
    headerSeq = 8
    body      = 24
)

type part struct {
    Seq  int
    Data uint32
}

func main() {
    f, err := os.Open("speech.pdf")
    if err != nil {
        panic(err)
    }
    defer f.Close()
    reader := bufio.NewReader(f)
    b := make([]byte, 4)
    o := make([][]byte, 0)
    var value uint32
    for {
        n, err := reader.Read(b)
        if err != nil {
            if err != io.EOF {
                panic(err)
            }
        }
        if n == 0 {
            break
        }
        fmt.Printf("len array %d\n", len(b))
        fmt.Printf("len n %d\n", n)
        switch n {
        case 1:
            value = uint32(b[0])
        case 2:
            value = uint32(uint32(b[1]) | uint32(b[0])<<8)
        case 3:
            value = uint32(uint32(b[2]) | uint32(b[1])<<8 | uint32(b[0])<<16)
        case 4:
            value = uint32(uint32(b[3]) | uint32(b[2])<<8 | uint32(b[1])<<16 | uint32(b[0])<<24)
        }
        fmt.Println(value)
        bs := make([]byte, 4)
        binary.BigEndian.PutUint32(bs, value)
        o = append(o, bs)
    }

    fo, err := os.OpenFile("test.pdf", os.O_APPEND|os.O_WRONLY|os.O_CREATE, 0600)
    if err != nil {
        panic(err)
    }
    defer fo.Close()
    for _, ba := range o {
        _, err := fo.Write(ba)
        if err != nil {
            panic(err)
        }
    }
}
So, you want to write and read arrays of varying length in a file.
import "encoding/binary"
// You need a consistent byte order for reading and writing multi-byte data types
const order = binary.LittleEndian
var dataToWrite = []byte{ ... ... ... }
var err error
// To write a recoverable array of varying length
var w io.Writer
// First, encode the length of data that will be written
err = binary.Write(w, order, int64(len(dataToWrite)))
// Check error
err = binary.Write(w, order, dataToWrite)
// Check error
// To read a variable length array
var r io.Reader
var dataLen int64
// First, we need to know the length of data to be read
err = binary.Read(r, order, &dataLen)
// Check error
// Allocate a slice to hold the expected amount of data
dataReadIn := make([]byte, dataLen)
err = binary.Read(r, order, dataReadIn)
// Check error
This pattern works not just with byte, but with any other fixed-size data type. See binary.Write for specifics about the encoding.
If the size of the encoded data is a big concern, you can save some bytes by storing the array length as a varint with binary.PutVarint and binary.ReadVarint.
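For example, here is a minimal sketch of the varint variant, following the same conventions as the snippet above (w is the io.Writer and r is the io.Reader; r is wrapped in a bufio.Reader because binary.ReadVarint wants an io.ByteReader):
import (
    "bufio"
    "encoding/binary"
    "io"
)

// Write: encode the length as a varint, then write the raw bytes
lenBuf := make([]byte, binary.MaxVarintLen64)
n := binary.PutVarint(lenBuf, int64(len(dataToWrite)))
_, err = w.Write(lenBuf[:n])
// Check error
_, err = w.Write(dataToWrite)
// Check error

// Read: recover the length first, then read exactly that many bytes
br := bufio.NewReader(r)
dataLen, err := binary.ReadVarint(br)
// Check error
dataReadIn := make([]byte, dataLen)
_, err = io.ReadFull(br, dataReadIn)
// Check error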

Read line of numbers in Go

I have the following input, where the first line is N, the count of numbers, and the second line contains N numbers separated by spaces:
5
2 1 0 3 4
In Python I can read numbers without specifying its count (N):
_ = input()
numbers = list(map(int, input().split()))
How can I do the same in Go? Or do I have to know exactly how many numbers there are?
You can iterate through a file line by line using bufio, and the strings package can split a string into a slice. So that gets us something like:
package main

import (
    "bufio"
    "fmt"
    "os"
    "strconv"
    "strings"
)

func main() {
    readFile, err := os.Open("data.txt")
    if err != nil {
        panic(err)
    }
    defer readFile.Close()

    fileScanner := bufio.NewScanner(readFile)
    fileScanner.Split(bufio.ScanLines)

    for fileScanner.Scan() {
        // get next line from the file
        line := fileScanner.Text()

        // split it into a list of space-delimited tokens
        chars := strings.Split(line, " ")

        // create a slice of ints the same length as
        // the chars slice
        ints := make([]int, len(chars))

        for i, s := range chars {
            // convert string to int
            val, err := strconv.Atoi(s)
            if err != nil {
                panic(err)
            }
            // update the corresponding position in the
            // ints slice
            ints[i] = val
        }

        fmt.Printf("%v\n", ints)
    }
}
Which given your sample data will output:
[5]
[2 1 0 3 4]
Since you know the delimiter and you only have 2 lines, this is also a more compact solution:
package main

import (
    "fmt"
    "os"
    "regexp"
    "strconv"
    "strings"
)

func main() {
    parts, err := readRaw("data.txt")
    if err != nil {
        panic(err)
    }
    n, nums, err := toNumbers(parts)
    if err != nil {
        panic(err)
    }
    fmt.Printf("%d: %v\n", n, nums)
}

// readRaw reads the file in input and returns the numbers inside as a slice of strings
func readRaw(fn string) ([]string, error) {
    b, err := os.ReadFile(fn)
    if err != nil {
        return nil, err
    }
    return regexp.MustCompile(`\s`).Split(strings.TrimSpace(string(b)), -1), nil
}

// toNumbers plays with the input string to return the data as a slice of int
func toNumbers(parts []string) (int, []int, error) {
    n, err := strconv.Atoi(parts[0])
    if err != nil {
        return 0, nil, err
    }
    nums := make([]int, 0)
    for _, p := range parts[1:] {
        num, err := strconv.Atoi(p)
        if err != nil {
            return n, nums, err
        }
        nums = append(nums, num)
    }
    return n, nums, nil
}
The output would be:
5: [2 1 0 3 4]

Slice automatically be sorted?

While creating my own pipeline to practice with goroutines, I ran into something particularly weird.
I use rand.Perm to generate some int numbers (randomly, obviously), write them to an io.Writer, and then read them back from an io.Reader. Since it's a binary source I print them, and they come out sorted!
Here's the code:
func RandomSource(tally int) chan int {
    out := make(chan int)
    sli := rand.Perm(tally)
    fmt.Println(sli)
    go func() {
        for num := range sli {
            out <- num
        }
        close(out)
    }()
    return out
}

func ReaderSource(reader io.Reader) chan int {
    out := make(chan int)
    go func() {
        buffer := make([]byte, 8)
        for {
            n, err := reader.Read(buffer)
            if n > 0 {
                v := int(binary.BigEndian.Uint64(buffer))
                out <- v
            }
            if err != nil {
                break
            }
        }
        close(out)
    }()
    return out
}

func WriterSink(writer io.Writer, in chan int) {
    for v := range in {
        buffer := make([]byte, 8)
        binary.BigEndian.PutUint64(buffer, uint64(v))
        writer.Write(buffer)
    }
}

func main() {
    fileName := "small.in"
    file, err := os.Create(fileName)
    if err != nil {
        log.Fatal(err)
    }
    defer file.Close()

    p := RandomSource(500)
    WriterSink(file, p)

    file, err = os.Open(fileName)
    if err != nil {
        log.Fatal(err)
    }
    defer file.Close()

    p = ReaderSource(file)
    for v := range p {
        fmt.Println(v)
    }
}
range returns an index as the first value for an array or slice, which always goes from 0 up to len - 1. Use for _, num := range sli { if you want to iterate over the values themselves rather than the set of indices.
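For reference, a minimal sketch of the corrected goroutine body (only the range clause changes):
go func() {
    // _, num: discard the index and take the value from sli
    for _, num := range sli {
        out <- num
    }
    close(out)
}()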

reader.ReadLine() doesn't advance after a scanner.Scan() call

The code below reads its values from this file:
2 3\n
1.0 2.0 3.0\n
-1.0 -2.0 -3.0\n
And should print:
[ {1 2 3}, {-1 -2 -3} ]
But instead I get this:
[{2 [31 2 3]} {0 []}] strconv.ParseFloat: parsing "3.0-1.0": invalid syntax
It seems that the reader.ReadLine() stays at the same location. Is there a simpler way to scan lines, then values inside each line?
package main

import (
    "bufio"
    "bytes"
    "fmt"
    "os"
    "strconv"
    "strings"
)

type Example struct {
    classLabel int
    attributes []float64
}

func NewExample(classLabel int, attributes []float64) *Example {
    return &Example{classLabel, attributes}
}

func readFile(path string) ([]Example, error) {
    var (
        result       []Example
        err          error
        file         *os.File
        part         []byte
        size         int
        attributeNum int
    )
    if file, err = os.Open(path); err != nil {
        return result, err
    }
    defer file.Close()

    reader := bufio.NewReader(file)
    buffer := bytes.NewBuffer(make([]byte, 0))

    if part, _, err = reader.ReadLine(); err != nil {
        return result, err
    }
    buffer.Write(part)
    newLine := buffer.String()
    fmt.Println("newLine=" + newLine)

    r := strings.NewReader(newLine)
    scanner := bufio.NewScanner(r)
    scanner.Split(bufio.ScanWords)

    if scanner.Scan() {
        size, err = strconv.Atoi(scanner.Text())
        if err != nil {
            return result, err
        }
    }
    fmt.Println("size=" + strconv.Itoa(size))

    if scanner.Scan() {
        attributeNum, err = strconv.Atoi(scanner.Text())
        if err != nil {
            return result, err
        }
    }
    fmt.Println("attributeNum=" + strconv.Itoa(attributeNum))

    result = make([]Example, size)
    var classLabel int
    var attributes []float64

    for k := 0; k < size; k++ {
        if part, _, err = reader.ReadLine(); err != nil {
            return result, err
        }
        buffer.Write(part)
        newLine := buffer.String()
        fmt.Println("newLine=" + newLine)

        r := strings.NewReader(newLine)
        scanner := bufio.NewScanner(r)
        scanner.Split(bufio.ScanWords)

        if scanner.Scan() {
            classLabel, err = strconv.Atoi(scanner.Text())
            if err != nil {
                return result, err
            }
        }
        fmt.Println("classLabel=" + strconv.Itoa(classLabel))

        for i := 0; i < attributeNum; i++ {
            var attribute float64
            if scanner.Scan() {
                attribute, err = strconv.ParseFloat(scanner.Text(), 64)
                if err != nil {
                    return result, err
                }
                attributes = append(attributes, attribute)
                fmt.Println("attribute=" + strconv.FormatFloat(attribute, 'f', -1, 64))
            }
        }
        result[k] = *NewExample(classLabel, attributes)
    }
    return result, scanner.Err()
}

func main() {
    example, err := readFile("test.txt")
    fmt.Println(example, err)
}
When you do this inside the for loop:
buffer.Write(part)
newLine := buffer.String()
fmt.Println("newLine=" + newLine)
The next line gets appended to buffer. That is, before the loop begins, buffer contains 2 3; then after reading 1.0 2.0 3.0, it gets appended to buffer, so the content becomes 2 31.0 2.0 3.0, which is what you store in newLine. That's where things start to go sideways.
You probably want to clear the buffer before reading each new line:
buffer.Reset()
buffer.Write(part)
newLine := buffer.String()
fmt.Println("newLine=" + newLine)
But then you will have further problems still, here:
if scanner.Scan() {
    classLabel, err = strconv.Atoi(scanner.Text())
    if err != nil {
        return result, err
    }
}
Since the line contains 1.0 2.0 3.0, the strconv.Atoi is going to fail.
I don't understand the purpose of this snippet; perhaps you can delete it (or comment it out).
With the above fixed, you will still have one more problem, on this line:
attributes = append(attributes, attribute)
Since attributes is never reset, it keeps growing. That is, after the first line it will contain 1 2 3, and after the second line it will contain 1 2 3 -1 -2 -3.
You can correct that by moving the declaration of attributes inside the outer loop, like this:
var attributes []float64
for i := 0; i < attributeNum; i++ {
    var attribute float64
    if scanner.Scan() {
        attribute, err = strconv.ParseFloat(scanner.Text(), 64)
        if err != nil {
            return result, err
        }
        attributes = append(attributes, attribute)
        fmt.Println("attribute=" + strconv.FormatFloat(attribute, 'f', -1, 64))
    }
}
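As for the question of a simpler way to scan lines and then the values inside each line: one common approach is a single bufio.Scanner over the file, reading one line per Scan call and splitting each line with strings.Fields. A rough sketch only (not a drop-in replacement for readFile; it assumes file, result, and attributeNum are in scope as above):
scanner := bufio.NewScanner(file)
scanner.Scan() // header line, e.g. "2 3"

for scanner.Scan() { // one data line per iteration
    var attributes []float64
    for _, field := range strings.Fields(scanner.Text()) {
        v, err := strconv.ParseFloat(field, 64)
        if err != nil {
            return result, err
        }
        attributes = append(attributes, v)
    }
    // build an Example from attributes here
}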

golang scan a line of numbers from stdin

I'm trying to read input from stdin like
3 2 1<ENTER>
and save it in a list of ints. At the moment my code looks like this:
nums = make([]int, 0)
var i int
for {
    _, err := fmt.Scan(&i)
    if err != nil {
        if err == io.EOF {
            break
        }
        log.Fatal(err)
    }
    nums = append(nums, i)
}
At the moment the program never leaves the for loop. I can't find an easy way to check for a newline character in the documentation. How would I do this?
Edit:
Since I know that there will almost certainly be four numbers, I tried the following:
var i0,i1,i2,i3 int
fmt.Scanf("%d %d %d %d\n", &i0, &i1, &i2, &i3)
but this only scanned the first number and then exited the program. I'm not sure if that's because of the z-shell I'm using.
Edit:
To clarify, the program will pause and ask the user to input a list of n numbers separated by spaces and terminated with a newline. These numbers should be stored in an array.
Ok, I decided to bring out the large bufio hammer and solve it like this:
in := bufio.NewReader(os.Stdin)

line, err := in.ReadString('\n')
if err != nil {
    log.Fatal(err)
}

strs := strings.Split(line[0:len(line)-1], " ")
nums := make([]int, len(strs))
for i, str := range strs {
    if nums[i], err = strconv.Atoi(str); err != nil {
        log.Fatal(err)
    }
}
It does seem like an awful lot of code, but it works.
It seems that you want https://golang.org/pkg/fmt/#Fscanln
Something like
ok := func(err error) {
    if err != nil {
        panic(err)
    }
}
for {
    var i, j, k int
    _, err := fmt.Fscanln(os.Stdin, &i, &j, &k)
    ok(err)
    fmt.Println(i, j, k)
}
I suggest using the "bufio" package with its Scan() method.
Following is code where I read two lines from stdin and store the lines in a slice.
Hope this helps you.
package main

import (
    "bufio"
    "fmt"
    "os"
    "strconv"
    "strings"
)

func ReadInput() []string {
    var lines []string
    scanner := bufio.NewScanner(os.Stdin)
    for scanner.Scan() {
        lines = append(lines, scanner.Text())
        //count, _ := strconv.Atoi(lines[0])
        if len(lines) == 2 {
            break
        }
    }
    if err := scanner.Err(); err != nil {
        fmt.Fprintln(os.Stderr, err)
    }
    return lines
}

func main() {
    lines := ReadInput()
    count, _ := strconv.Atoi(lines[0])
    num := strings.Fields(lines[1])
    if count != len(num) {
        os.Exit(0)
    }
    // Do whatever you want here
}
Two lines will be accepted. The first line has the count; the second line has all the numbers. You can modify the code to suit your requirements.
Example:
3
1 5 10
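If you then need the numbers as ints rather than strings, a minimal follow-up sketch (reusing the num slice from the code above) would be:
nums := make([]int, len(num))
for i, s := range num {
    n, err := strconv.Atoi(s)
    if err != nil {
        fmt.Fprintln(os.Stderr, err)
        os.Exit(1)
    }
    nums[i] = n
}
fmt.Println(nums)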
