Change field in byte representation of JSON object - go

I write an A object to a file f.
a := A{42}
bytes, _ := json.MarshalIndent(a, "", "\t")
f.Write(bytes)
Where A looks like:
type A struct {
    A int `json:"a"`
}
Then I change field of this object and write it to the file:
a.A = 666
f.Write(bytes)
As a result, I see only
{
    "a": 42
}{
    "a": 42
}
While I expected:
{
    "a": 42
}{
    "a": 666
}
I know that I can overcome this by calling json.MarshalIndent again. But I need to do a lot of (~10^6) writes to the file, so calling json.MarshalIndent over and over seems like a heavy task.
How can I directly change a bytes variable?
Code is located at https://play.golang.org/p/8CMpwehMidR

You have no choice but to marshal repeatedly. Use a *json.Encoder for improved ergonomics and efficiency:
package main

import (
    "encoding/json"
    "log"
    "os"
)

type A struct {
    A int `json:"a"`
}

func main() {
    f, err := os.Create("foo.json")
    if err != nil {
        log.Fatal(err)
    }
    defer f.Close()

    enc := json.NewEncoder(f)
    enc.SetIndent("", "\t")

    a := A{42} // Using a pointer may improve performance if A is large.
    enc.Encode(a)

    a.A = 666
    enc.Encode(a)
}
Wrapping the file with a buffered writer may also improve performance, depending on how quickly you can compute successive values for the As and how fast the disk is.
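As a minimal sketch of that suggestion (assuming the same A type as above; the deferred flush is essential, and it runs before the deferred close, so no buffered bytes are lost):

package main

import (
    "bufio"
    "encoding/json"
    "log"
    "os"
)

type A struct {
    A int `json:"a"`
}

func main() {
    f, err := os.Create("foo.json")
    if err != nil {
        log.Fatal(err)
    }
    defer f.Close()

    w := bufio.NewWriter(f) // batches many small writes into fewer syscalls
    defer w.Flush()         // deferred last, so it runs before f.Close()

    enc := json.NewEncoder(w)
    enc.SetIndent("", "\t")

    var a A
    for i := 0; i < 1000000; i++ {
        a.A = i
        if err := enc.Encode(&a); err != nil {
            log.Fatal(err)
        }
    }
}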

You can use the standard library to replace bytes within the given byte slice:
https://golang.org/pkg/bytes/#Replace
package main

import (
    "bufio"
    "bytes"
    "encoding/json"
    "os"
)

type A struct {
    A int `json:"a"`
}

func main() {
    out := bufio.NewWriterSize(os.Stdout, 20)
    // defer out.Flush() // commented for demonstration purposes. Uncomment this to finalize the flush.

    a := A{42}
    b, _ := json.MarshalIndent(a, "", "\t")
    out.Write(b)

    b = bytes.Replace(b, []byte("42"), []byte("666"), -1)
    out.Write(b)
}
Doing this is not recommended, but it is ultimately possible.
I included a buffered writer to demonstrate the other answers and comments; don't forget to flush it.
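If the real goal is ~10^6 writes of a tiny object with a fixed shape, another option (a hand-rolled sketch, not a general solution) is to keep the JSON skeleton as constant bytes and re-render only the number with strconv into one reusable buffer. This assumes the structure never changes shape:

package main

import (
    "bufio"
    "os"
    "strconv"
)

func main() {
    out := bufio.NewWriter(os.Stdout)
    defer out.Flush()

    buf := make([]byte, 0, 64) // reused across iterations, no per-write allocation
    for i := 0; i < 3; i++ {
        buf = buf[:0]
        buf = append(buf, "{\n\t\"a\": "...)       // constant prefix, matches MarshalIndent's layout
        buf = strconv.AppendInt(buf, int64(i), 10) // only the value is re-rendered
        buf = append(buf, "\n}"...)
        out.Write(buf)
    }
}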


How to parse a JSON string returned from scanner.Text()

Objects like the below can be parsed quite easily using the encoding/json package.
[
  {"something":"foo"},
  {"something-else":"bar"}
]
The trouble I am facing is when there are multiple dicts returned from the server, like this:
{"something":"foo"}
{"something-else":"bar"}
This can't be parsed directly, so I have been reshaping it into a JSON array first with the code below:
correct_format := strings.Replace(string(resp_body), "}{", "},{", -1)
json_output := "[" + correct_format + "]"
I am trying to parse Common Crawl data (see example).
How can I do this?
Assuming your input is really a series of valid JSON documents, use a json.Decoder to decode them:
package main

import (
    "encoding/json"
    "fmt"
    "io"
    "log"
    "strings"
)

var input = `
{"foo": "bar"}
{"foo": "baz"}
`

type Doc struct {
    Foo string
}

func main() {
    dec := json.NewDecoder(strings.NewReader(input))
    for {
        var doc Doc
        err := dec.Decode(&doc)
        if err == io.EOF {
            // all done
            break
        }
        if err != nil {
            log.Fatal(err)
        }
        fmt.Printf("%+v\n", doc)
    }
}
Playground: https://play.golang.org/p/ANx8MoMC0yq
If your input really is what you've shown in the question, that's not JSON and you have to write your own parser.
It seems like each line is its own JSON object.
You may get away with the following code, which will restructure this output into correct JSON:
package main

import (
    "fmt"
    "strings"
)

func main() {
    base := `{"trolo":"lolo"}
{"trolo2":"lolo2"}`
    delimited := strings.Replace(base, "\n", ",", -1)
    final := "[" + delimited + "]"
    fmt.Println(final)
}
You should be able to use the encoding/json library on final now.
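For example, a short continuation of that idea (assuming each object maps string keys to string values):

package main

import (
    "encoding/json"
    "fmt"
    "log"
    "strings"
)

func main() {
    base := `{"trolo":"lolo"}
{"trolo2":"lolo2"}`
    final := "[" + strings.Replace(base, "\n", ",", -1) + "]"

    var docs []map[string]string
    if err := json.Unmarshal([]byte(final), &docs); err != nil {
        log.Fatal(err)
    }
    fmt.Println(docs[0]["trolo"], docs[1]["trolo2"]) // lolo lolo2
}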
Another option would be to parse each incoming line, line by line, and then add each one to a collection in code (i.e. a slice). Go provides a line scanner for this; a concrete, runnable version follows the sketch below.
yourCollection := []yourObject{}
scanner := bufio.NewScanner(YOUR_SOURCE)
for scanner.Scan() {
    obj, err := PARSE_JSON_INTO_yourObject(scanner.Text())
    if err != nil {
        // something
    }
    yourCollection = append(yourCollection, obj)
}
if err := scanner.Err(); err != nil {
    fmt.Fprintln(os.Stderr, "reading standard input:", err)
}
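Here is that sketch made concrete (reading from stdin, and assuming each line has the shape {"something": "foo"} as in the question):

package main

import (
    "bufio"
    "encoding/json"
    "fmt"
    "os"
)

type Doc struct {
    Something string `json:"something"`
}

func main() {
    docs := []Doc{}
    scanner := bufio.NewScanner(os.Stdin)
    for scanner.Scan() {
        var d Doc
        if err := json.Unmarshal(scanner.Bytes(), &d); err != nil {
            fmt.Fprintln(os.Stderr, "skipping bad line:", err)
            continue
        }
        docs = append(docs, d)
    }
    if err := scanner.Err(); err != nil {
        fmt.Fprintln(os.Stderr, "reading standard input:", err)
    }
    fmt.Printf("parsed %d documents\n", len(docs))
}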
You can read the NDJSON from the file line by line, parse each line, and then apply your logic to it. In the sample below, instead of reading from a file, I have used an array of JSON strings.
package main

import (
    "encoding/json"
    "fmt"
)

type NestedObject struct {
    D string
    E string
}

type OuterObject struct {
    A string
    B string
    C []NestedObject
}

func main() {
    myJsonString := []string{`{"A":"1","B":"2","C":[{"D":"100","E":"10"}]}`, `{"A":"11","B":"21","C":[{"D":"1001","E":"101"}]}`}
    for index, each := range myJsonString {
        fmt.Printf("Index value [%d] is [%v]\n", index, each)
        var obj OuterObject
        json.Unmarshal([]byte(each), &obj)
        fmt.Printf("a: %v, b: %v, c: %v", obj.A, obj.B, obj.C)
        fmt.Println()
    }
}
Output:
Index value [0] is [{"A":"1","B":"2","C":[{"D":"100","E":"10"}]}]
a: 1, b: 2, c: [{100 10}]
Index value [1] is [{"A":"11","B":"21","C":[{"D":"1001","E":"101"}]}]
a: 11, b: 21, c: [{1001 101}]
Try it on the Go Playground.

How to improve the speed of reading a large file line by line in Go

I'm trying to figure out the fastest way of reading a large file line by line and checking if the line contains a string. The file I'm testing on is about 680 MB in size:
package main

import (
    "bufio"
    "fmt"
    "os"
    "strings"
)

func main() {
    f, err := os.Open("./crackstation-human-only.txt")
    if err != nil {
        panic(err)
    }
    defer f.Close()

    scanner := bufio.NewScanner(f)
    for scanner.Scan() {
        if strings.Contains(scanner.Text(), "Iforgotmypassword") {
            fmt.Println(scanner.Text())
        }
    }
}
After building the program and timing it on my machine, it takes over 3 seconds:
./speed 3.13s user 1.25s system 122% cpu 3.563 total
After increasing the buffer
buf := make([]byte, 64*1024)
scanner.Buffer(buf, bufio.MaxScanTokenSize)
It gets a little better
./speed 2.47s user 0.25s system 104% cpu 2.609 total
I know it can get better, because other tools manage to do it in under a second without any kind of indexing:
0.33s user 0.14s system 94% cpu 0.501 total
What seems to be the bottleneck with this approach?
LAST EDIT
This is a "line-by-line" solution to the problem that takes trivial time, it prints the entire matching line.
package main

import (
    "bytes"
    "fmt"
    "io/ioutil"
)

func main() {
    dat, _ := ioutil.ReadFile("./jumble.txt")
    i := bytes.Index(dat, []byte("Iforgotmypassword"))
    if i != -1 {
        var x int
        var y int
        for x = i; x > 0; x-- {
            if dat[x] == byte('\n') {
                break
            }
        }
        for y = i; y < len(dat); y++ {
            if dat[y] == byte('\n') {
                break
            }
        }
        fmt.Println(string(dat[x : y+1]))
    }
}
real 0m0.421s
user 0m0.068s
sys 0m0.352s
ORIGINAL ANSWER
If you just need to see if the string is in a file, why not use regex?
Note: I kept the data as a byte array instead of converting to string.
package main

import (
    "fmt"
    "io/ioutil"
    "regexp"
)

var regex = regexp.MustCompile(`Ilostmypassword`)

func main() {
    dat, _ := ioutil.ReadFile("./jumble.txt")
    if regex.Match(dat) {
        fmt.Println("Yes")
    }
}
jumble.txt is an 859 MB file of jumbled text with newlines included.
Running with time ./code I get:
real 0m0.405s
user 0m0.064s
sys 0m0.340s
To try and answer your comment: I don't think the bottleneck is inherently coming from searching line by line; Go uses an efficient algorithm for searching strings/runes.
I think the bottleneck comes from the IO reads: when the program reads from the file, it is normally not first in line in the queue of reads, so it must wait until it can read in order to start actually comparing. Thus, when you are reading in over and over, you are being forced to wait for your turn in IO.
To give you some math, if your buffer size is 64 * 1024 (i.e. 65,536 bytes) and your file is 1 GB, dividing 1 GB by 65,536 bytes gives roughly 15,000 reads needed to check the entire file. Whereas in my method, I read the entire file "at once" and check against that constructed array.
Another thing I can think of is just the sheer number of loop iterations needed to move through the file and the time needed for each iteration:
Given the following code:
dat, _ := ioutil.ReadFile("./jumble.txt")
sdat := bytes.Split(dat, []byte{'\n'})
for _, l := range sdat {
    if bytes.Equal([]byte("Iforgotmypassword"), l) {
        fmt.Println("Yes")
    }
}
I calculated that each iteration takes on average 32 nanoseconds; the string Iforgotmypassword was on line 100000000 in my file, so the execution time for this loop was roughly 32 nanoseconds * 100000000 ~= 3.2 seconds.
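For reference, a rough way to measure that per-line cost yourself (a hedged sketch, not the author's actual benchmark harness):

package main

import (
    "bytes"
    "fmt"
    "io/ioutil"
    "time"
)

func main() {
    dat, _ := ioutil.ReadFile("./jumble.txt")
    sdat := bytes.Split(dat, []byte{'\n'})

    needle := []byte("Iforgotmypassword")
    start := time.Now()
    for _, l := range sdat {
        if bytes.Equal(needle, l) {
            fmt.Println("Yes")
        }
    }
    elapsed := time.Since(start)
    // Dividing total time by line count approximates the per-iteration cost.
    fmt.Printf("%v total, %v per line\n", elapsed, elapsed/time.Duration(len(sdat)))
}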
Using my own 700 MB test file with your original code, the time was just over 7 seconds.
With grep it was 0.49 seconds.
With this program, which doesn't print out the line and just says yes, it was 0.082 seconds:
package main

import (
    "bytes"
    "fmt"
    "io/ioutil"
    "os"
)

func check(e error) {
    if e != nil {
        panic(e)
    }
}

func main() {
    find := []byte(os.Args[1])
    dat, err := ioutil.ReadFile("crackstation-human-only.txt")
    check(err)
    if bytes.Contains(dat, find) {
        fmt.Print("yes")
    }
}
H. Ross's answer is awesome, but it reads the whole file into memory, which may not be feasible if your file is too big. If you still want to scan line by line, perhaps because you're searching for multiple items, I found that using scanner.Bytes() instead of scanner.Text() improves speed slightly on my machine, from 2.244s for the original question to 1.608s. bufio's scanner.Bytes() method doesn't allocate any additional memory, whereas Text() creates a string from its buffer.
package main

import (
    "bufio"
    "bytes"
    "fmt"
    "os"
)

// uses scanner.Bytes to avoid allocation.
func main() {
    f, err := os.Open("./crackstation-human-only.txt")
    if err != nil {
        panic(err)
    }
    defer f.Close()

    scanner := bufio.NewScanner(f)
    toFind := []byte("Iforgotmypassword")
    for scanner.Scan() {
        if bytes.Contains(scanner.Bytes(), toFind) {
            fmt.Println(scanner.Text())
        }
    }
}
You might try using goroutines to process multiple lines in parallel:
lines := make(chan string, numWorkers*2) // numWorkers is assumed defined, e.g. runtime.NumCPU(); room in the channel keeps the reader from blocking

go func(scanner *bufio.Scanner, out chan<- string) {
    for scanner.Scan() {
        out <- scanner.Text()
    }
    close(out)
}(scanner, lines)

var wg sync.WaitGroup
wg.Add(numWorkers)
for i := 0; i < numWorkers; i++ {
    go func(in <-chan string) {
        defer wg.Done()
        for text := range in {
            if strings.Contains(text, "Iforgotmypassword") {
                fmt.Println(text)
            }
        }
    }(lines)
}
wg.Wait()
I'm not sure how much this will really speed things up, as it depends on what kind of hardware you have available; it sounds like you're looking for more than a 5x speed improvement, which you might only notice on hardware that can run, say, 8 parallel worker threads. Feel free to use lots of worker goroutines. Good luck.

How to read packed binary data in Go?

I'm trying to figure out the best way to read a packed binary file in Go that was produced by Python like the following:
import struct
f = open('tst.bin', 'wb')
fmt = 'iih' #please note this is packed binary: 4byte int, 4byte int, 2byte int
f.write(struct.pack(fmt,4, 185765, 1020))
f.write(struct.pack(fmt,4, 185765, 1022))
f.close()
I have been tinkering with some of the examples I've seen on GitHub and a few other sources, but I can't seem to get anything working correctly (the update below shows a working method). What is the idiomatic way to do this sort of thing in Go?
UPDATE and WORKING
package main

import (
    "encoding/binary"
    "fmt"
    "io"
    "os"
)

func main() {
    fp, err := os.Open("tst.bin")
    if err != nil {
        panic(err)
    }
    defer fp.Close()

    lineBuf := make([]byte, 10) // 4-byte int, 4-byte int, 2-byte int per line
    for {
        _, err := fp.Read(lineBuf)
        if err == io.EOF {
            break
        }
        aVal := int32(binary.LittleEndian.Uint32(lineBuf[0:4])) // same as: int32(uint32(b[0]) | uint32(b[1])<<8 | uint32(b[2])<<16 | uint32(b[3])<<24)
        bVal := int32(binary.LittleEndian.Uint32(lineBuf[4:8]))
        cVal := int16(binary.LittleEndian.Uint16(lineBuf[8:10])) // same as: int16(uint32(b[0]) | uint32(b[1])<<8)
        fmt.Println(aVal, bVal, cVal)
    }
}
A portable and rather easy way to handle the problem is Google's "Protocol Buffers". Though this is too late now since you got it working, I put some effort into explaining and coding it, so I am posting it anyway.
You can find the code on https://github.com/mwmahlberg/ProtoBufDemo
You need to install protocol buffers for Python, using your preferred method (pip, OS package management, source), and for Go.
The .proto file
The .proto file is rather simple for our example. I called it data.proto
syntax = "proto2";
package main;
message Demo {
required uint32 A = 1;
required uint32 B = 2;
// A shortcomning: no 16 bit ints
// We need to make this sure in the applications
required uint32 C = 3;
}
Now you need to call protoc on the file and have it provide the code for both Python and Go:
protoc --go_out=. --python_out=. data.proto
which generates the files data_pb2.py and data.pb.go. Those files provide the language specific access to the protocol buffer data.
When using the code from github, all you need to do is to issue
go generate
in the source directory.
The Python code
import data_pb2

def main():
    # We create an instance of the message type "Demo"...
    data = data_pb2.Demo()
    # ...and fill it with data
    data.A = long(5)
    data.B = long(5)
    data.C = long(2015)

    print "* Python writing to file"
    f = open('tst.bin', 'wb')
    # Note that "data.SerializeToString()" counterintuitively
    # writes binary data
    f.write(data.SerializeToString())
    f.close()

    f = open('tst.bin', 'rb')
    read = data_pb2.Demo()
    read.ParseFromString(f.read())
    f.close()

    print "* Python reading from file"
    print "\tDemo.A: %d, Demo.B: %d, Demo.C: %d" % (read.A, read.B, read.C)

if __name__ == '__main__':
    main()
We import the file generated by protoc and use it. Not much magic here.
The Go File
package main

//go:generate protoc --python_out=. data.proto
//go:generate protoc --go_out=. data.proto

import (
    "fmt"
    "os"

    "github.com/golang/protobuf/proto"
)

func main() {
    // Note that we do not handle any errors for the sake of brevity
    d := Demo{}
    f, _ := os.Open("tst.bin")
    fi, _ := f.Stat()

    // We create a buffer which is big enough to hold the entire message
    b := make([]byte, fi.Size())
    f.Read(b)
    proto.Unmarshal(b, &d)

    fmt.Println("* Go reading from file")
    // Note the explicit pointer dereferences, as the fields are pointers
    fmt.Printf("\tDemo.A: %d, Demo.B: %d, Demo.C: %d\n", *d.A, *d.B, *d.C)
}
Note that we do not need to explicitly import, as the package of data.proto is main.
The result
After generation the required files and compiling the source, when you issue
$ python writer.py && ./ProtoBufDemo
the result is
* Python writing to file
* Python reading from file
Demo.A: 5, Demo.B: 5, Demo.C: 2015
* Go reading from file
Demo.A: 5, Demo.B: 5, Demo.C: 2015
Note that the Makefile in the repository offers a shortcut for generating the code, compiling the .go files, and running both programs:
make run
The Python format string is iih, meaning two 32-bit signed integers and one 16-bit signed integer (see the docs). You can simply use your first example but change the struct to:
package main

import (
    "encoding/binary"
    "fmt"
    "io"
    "os"
)

type binData struct {
    A int32
    B int32
    C int16
}

func main() {
    fp, err := os.Open("tst.bin")
    if err != nil {
        panic(err)
    }
    defer fp.Close()

    for {
        thing := binData{}
        err := binary.Read(fp, binary.LittleEndian, &thing)
        if err == io.EOF {
            break
        }
        fmt.Println(thing.A, thing.B, thing.C)
    }
}
Note that the Python packing didn't specify the endianness explicitly, but if you're sure the system that ran it generated little-endian binary, this should work.
Edit: Added main() function to explain what I mean.
Edit 2: Capitalized struct fields so binary.Read could write into them.
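As a side note, here is a sketch of producing the same tst.bin from Go rather than Python, using binary.Write with the struct above. It is assumption-laden: it hardcodes little-endian and relies on encoding/binary writing the fields back to back with no padding, which matches Python's struct.pack('iih', ...) layout on typical x86 systems:

package main

import (
    "encoding/binary"
    "os"
)

type binData struct {
    A int32
    B int32
    C int16
}

func main() {
    f, err := os.Create("tst.bin")
    if err != nil {
        panic(err)
    }
    defer f.Close()

    // binary.Write lays the struct out field by field: 4 + 4 + 2 = 10 bytes per record.
    for _, t := range []binData{{4, 185765, 1020}, {4, 185765, 1022}} {
        if err := binary.Write(f, binary.LittleEndian, &t); err != nil {
            panic(err)
        }
    }
}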
As I mentioned in my post, I'm not sure this is THE idiomatic way to do this in Go but this is the solution that I came up with after a fair bit of tinkering and adapting several different examples. Note again that this unpacks 4 and 2 byte int into Go int32 and int16 respectively. Posting so that there is a valid answer in case someone comes looking. Hopefully someone will post a more idiomatic way of accomplishing this but for now, this works.
package main

import (
    "encoding/binary"
    "fmt"
    "io"
    "os"
)

func main() {
    fp, err := os.Open("tst.bin")
    if err != nil {
        panic(err)
    }
    defer fp.Close()

    lineBuf := make([]byte, 10) // 4-byte int, 4-byte int, 2-byte int per line
    for {
        _, err := fp.Read(lineBuf)
        if err == io.EOF {
            break
        }
        aVal := int32(binary.LittleEndian.Uint32(lineBuf[0:4])) // same as: int32(uint32(b[0]) | uint32(b[1])<<8 | uint32(b[2])<<16 | uint32(b[3])<<24)
        bVal := int32(binary.LittleEndian.Uint32(lineBuf[4:8]))
        cVal := int16(binary.LittleEndian.Uint16(lineBuf[8:10])) // same as: int16(uint32(b[0]) | uint32(b[1])<<8)
        fmt.Println(aVal, bVal, cVal)
    }
}
Try the binpacker library.
Example:
Pack some example data:
buffer := new(bytes.Buffer)
packer := binpacker.NewPacker(buffer)
unpacker := binpacker.NewUnpacker(buffer)
packer.PushByte(0x01)
packer.PushUint16(math.MaxUint16)
Unpack:
var val1 byte
var val2 uint16
var err error
val1, err = unpacker.ShiftByte()
val2, err = unpacker.ShiftUint16()
Or:
var val1 byte
var val2 uint16
var err error
unpacker.FetchByte(&val1).FetchUint16(&val2)
unpacker.Error() // Make sure error is nil
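Putting the fragments together, a runnable sketch. This assumes the import path github.com/zhuangsirui/binpacker and the constructor signatures exactly as shown in the fragments above; newer versions of the library may differ (for instance by taking a byte order argument):

package main

import (
    "bytes"
    "fmt"
    "math"

    "github.com/zhuangsirui/binpacker"
)

func main() {
    buffer := new(bytes.Buffer)
    packer := binpacker.NewPacker(buffer)
    unpacker := binpacker.NewUnpacker(buffer)

    packer.PushByte(0x01)
    packer.PushUint16(math.MaxUint16)

    var val1 byte
    var val2 uint16
    unpacker.FetchByte(&val1).FetchUint16(&val2)
    if err := unpacker.Error(); err != nil { // make sure no fetch failed
        panic(err)
    }
    fmt.Println(val1, val2) // 1 65535
}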

Reading from stdin in golang

I'm trying to read from stdin in Go, as I'm implementing a driver for Erlang. I have the following code:
package main

import (
    "bufio"
    "fmt"
    "os"
    "time"
)

func main() {
    go func() {
        stdout := bufio.NewWriter(os.Stdin)
        p := []byte{121, 100, 125, '\n'}
        stdout.Write(p)
    }()

    stdin := bufio.NewReader(os.Stdin)
    values := make([]byte, 4, 4)
    for {
        fmt.Println("b")
        if read_exact(stdin) > 0 {
            stdin.Read(values)
            fmt.Println("a")
            give_func_write(values)
        } else {
            continue
        }
    }
}

func read_exact(r *bufio.Reader) int {
    bits := make([]byte, 3, 3)
    a, _ := r.Read(bits)
    if a > 0 {
        r.Reset(r)
        return 1
    }
    return -1
}

func give_func_write(a []byte) bool {
    fmt.Println("Yahu")
    return true
}
However, it seems that give_func_write is never reached. I tried to start a goroutine that writes to standard input after 2 seconds to test this.
What am I missing here?
Also, about the line r.Reset(r): is this valid in Go? What I tried to achieve is to simply restart reading from the beginning of the file. Is there a better way?
EDIT
After having played around, I found that the code gets stuck at a, _ := r.Read(bits) in the read_exact function.
I guess that I will need to have a protocol in which I send a \n to
make the input work and at the same time discard it when reading it
No, you don't. Stdin is line-buffered only if it's bound to terminal. You can run your program prog < /dev/zero or cat file | prog.
bufio.NewWriter(os.Stdin).Write(p)
You probably don't want to write to stdin. See "Writing to stdin and reading from stdout" for details.
Well, it's not particularly clear to me what you're trying to achieve. I'm assuming that you just want to read data from stdin in fixed-size chunks. Use io.ReadFull for this. Or, if you want to use buffers, you can use Reader.Peek or Scanner to ensure that a specific number of bytes is available. I've changed your program to demonstrate the usage of io.ReadFull:
package main

import (
    "fmt"
    "io"
    "time"
)

func main() {
    input, output := io.Pipe()

    go func() {
        defer output.Close()
        for _, m := range []byte("123456") {
            output.Write([]byte{m})
            time.Sleep(time.Second)
        }
    }()

    message := make([]byte, 3)
    _, err := io.ReadFull(input, message)
    for err == nil {
        fmt.Println(string(message))
        _, err = io.ReadFull(input, message)
    }
    if err != io.EOF {
        panic(err)
    }
}
You can easily split it in two programs and test it that way. Just change input to os.Stdin.
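For instance, a minimal sketch of the reading half with input swapped for os.Stdin (feed it with something like echo -n 123456 | go run reader.go; the file name is just an example):

package main

import (
    "fmt"
    "io"
    "os"
)

func main() {
    message := make([]byte, 3) // read in fixed-size 3-byte chunks
    for {
        _, err := io.ReadFull(os.Stdin, message)
        if err == io.EOF {
            break // clean end of input
        }
        if err != nil {
            panic(err) // includes io.ErrUnexpectedEOF for a short trailing chunk
        }
        fmt.Println(string(message))
    }
}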

Get terminal size in Go

How do I get the tty size with Go? I am trying to do this by executing the stty size command, but I can't get the code right.
package main

import (
    "fmt"
    "log"
    "os/exec"
)

func main() {
    out, err := exec.Command("stty", "size").Output()
    fmt.Printf("out: %#v\n", out)
    fmt.Printf("err: %#v\n", err)
    if err != nil {
        log.Fatal(err)
    }
}
Output:
out: []byte{}
err: &exec.ExitError{ProcessState:(*os.ProcessState)(0xc200066520)}
2013/05/16 02:35:57 exit status 1
exit status 1
I think this is because Go spawns the process without any relation to the current tty it is working in. How can I attach the command to the current terminal in order to get its size?
I just wanted to add a new answer since I ran into this problem recently. There is a terminal package which lives inside the official ssh package https://godoc.org/golang.org/x/crypto/ssh/terminal.
This package provides a method to easily get the size of a terminal.
width, height, err := terminal.GetSize(0)
0 would be the file descriptor of the terminal you want the size of. To get the fd of your current terminal, you can always do int(os.Stdin.Fd()).
Under the covers it uses a syscall to get the terminal size for the given fd.
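Putting it together, a minimal sketch (using the import path linked above):

package main

import (
    "fmt"
    "log"
    "os"

    "golang.org/x/crypto/ssh/terminal"
)

func main() {
    // GetSize issues the appropriate syscall for the given file descriptor.
    width, height, err := terminal.GetSize(int(os.Stdin.Fd()))
    if err != nil {
        log.Fatal(err)
    }
    fmt.Printf("width: %d, height: %d\n", width, height)
}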
I was stuck on a similar problem. Here is what I ended up with.
It doesn't use a subprocess, so might be desirable in some situations.
import (
    "syscall"
    "unsafe"
)

type winsize struct {
    Row    uint16
    Col    uint16
    Xpixel uint16
    Ypixel uint16
}

func getWidth() uint {
    ws := &winsize{}
    retCode, _, errno := syscall.Syscall(syscall.SYS_IOCTL,
        uintptr(syscall.Stdin),
        uintptr(syscall.TIOCGWINSZ),
        uintptr(unsafe.Pointer(ws)))
    if int(retCode) == -1 {
        panic(errno)
    }
    return uint(ws.Col)
}
It works if you give the child process access to the parent's stdin:
package main

import (
    "fmt"
    "log"
    "os"
    "os/exec"
)

func main() {
    cmd := exec.Command("stty", "size")
    cmd.Stdin = os.Stdin
    out, err := cmd.Output()
    fmt.Printf("out: %#v\n", string(out))
    fmt.Printf("err: %#v\n", err)
    if err != nil {
        log.Fatal(err)
    }
}
Yields:
out: "36 118\n"
err: <nil>
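To actually use the result, you still have to parse the "rows cols" string that stty prints; a small sketch of that step using fmt.Sscan:

package main

import (
    "fmt"
    "log"
    "os"
    "os/exec"
)

func main() {
    cmd := exec.Command("stty", "size")
    cmd.Stdin = os.Stdin
    out, err := cmd.Output()
    if err != nil {
        log.Fatal(err)
    }

    var rows, cols int
    // stty size prints "rows cols" followed by a newline.
    if _, err := fmt.Sscan(string(out), &rows, &cols); err != nil {
        log.Fatal(err)
    }
    fmt.Printf("rows: %d, cols: %d\n", rows, cols)
}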
You can use the golang.org/x/term package (https://pkg.go.dev/golang.org/x/term).
Example
package main

import "golang.org/x/term"

func main() {
    if term.IsTerminal(0) {
        println("in a term")
    } else {
        println("not in a term")
    }
    width, height, err := term.GetSize(0)
    if err != nil {
        return
    }
    println("width:", width, "height:", height)
}
Output
in a term
width: 228 height: 27
Since no one else here has presented a cross-platform solution that will work on both Windows and Unix, I went ahead and put together a library that supports both.
https://github.com/nathan-fiscaletti/consolesize-go
package main

import (
    "fmt"

    "github.com/nathan-fiscaletti/consolesize-go"
)

func main() {
    cols, rows := consolesize.GetConsoleSize()
    fmt.Printf("Rows: %v, Cols: %v\n", rows, cols)
}
If anyone's interested I made a package to make this easier.
https://github.com/wayneashleyberry/terminal-dimensions
package main

import (
    "fmt"

    terminal "github.com/wayneashleyberry/terminal-dimensions"
)

func main() {
    x, _ := terminal.Width()
    y, _ := terminal.Height()
    fmt.Printf("Terminal is %d wide and %d high", x, y)
}
I have an implementation that uses the tcell module. Under the hood it still uses an approach based on calling native syscalls/DLLs, but if you're searching for terminal dimensions, there is a good chance you will need that package anyway:
package main

import (
    "fmt"

    "github.com/gdamore/tcell"
)

func main() {
    screen, _ := tcell.NewScreen()
    screen.Init()
    w, h := screen.Size()
    screen.Fini() // restore the terminal before printing
    fmt.Println(w, h)
}
