Decode very big json to array of structs - go

I have a web application that exposes a REST API; it receives JSON as input and performs transformations on that JSON.
Here is my code:
func (a *API) getAssignments(w http.ResponseWriter, r *http.Request) {
    var document DataPacket
    err := json.NewDecoder(r.Body).Decode(&document)
    if err != nil {
        a.handleJSONParseError(err, w)
        return
    }
    // transformations
}
The JSON I receive is an array of objects. An external application uses mine and sends it very large JSON files (300-400 MB). Decoding this JSON all at once takes a very long time and a large amount of memory.
Is there any way to work with this JSON as a stream and decode the objects in the collection one by one?

First, read the documentation.
Package json
import "encoding/json"
func (*Decoder) Decode
func (dec *Decoder) Decode(v interface{}) error
Decode reads the next JSON-encoded value from its input and stores it
in the value pointed to by v.
Example (Stream): This example uses a Decoder to decode a streaming array of JSON
objects.
Playground: https://play.golang.org/p/o6hD-UV85SZ
package main

import (
    "encoding/json"
    "fmt"
    "log"
    "strings"
)

func main() {
    const jsonStream = `
    [
        {"Name": "Ed", "Text": "Knock knock."},
        {"Name": "Sam", "Text": "Who's there?"},
        {"Name": "Ed", "Text": "Go fmt."},
        {"Name": "Sam", "Text": "Go fmt who?"},
        {"Name": "Ed", "Text": "Go fmt yourself!"}
    ]
`
    type Message struct {
        Name, Text string
    }
    dec := json.NewDecoder(strings.NewReader(jsonStream))

    // read open bracket
    t, err := dec.Token()
    if err != nil {
        log.Fatal(err)
    }
    fmt.Printf("%T: %v\n", t, t)

    // while the array contains values
    for dec.More() {
        var m Message
        // decode an array value (Message)
        err := dec.Decode(&m)
        if err != nil {
            log.Fatal(err)
        }
        fmt.Printf("%v: %v\n", m.Name, m.Text)
    }

    // read closing bracket
    t, err = dec.Token()
    if err != nil {
        log.Fatal(err)
    }
    fmt.Printf("%T: %v\n", t, t)
}
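Applied to the handler from the question, a minimal sketch might look like this (assuming the request body is a JSON array whose elements each decode into a DataPacket; processDocument is a hypothetical stand-in for the transformation step):

func (a *API) getAssignments(w http.ResponseWriter, r *http.Request) {
    dec := json.NewDecoder(r.Body)

    // Read the opening bracket of the array.
    if _, err := dec.Token(); err != nil {
        a.handleJSONParseError(err, w)
        return
    }

    // Decode one element at a time, so only a single DataPacket
    // is held in memory at any moment.
    for dec.More() {
        var document DataPacket
        if err := dec.Decode(&document); err != nil {
            a.handleJSONParseError(err, w)
            return
        }
        processDocument(document) // hypothetical transformation step
    }

    // Read the closing bracket of the array.
    if _, err := dec.Token(); err != nil {
        a.handleJSONParseError(err, w)
        return
    }
}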

Related

How to Parse big json file with Golang [duplicate]

I have a massive JSON array stored in a file ("file.json")
I need to iterate through the array and do some operation on each element.
err = json.Unmarshal(dat, &all_data)
This causes an out-of-memory error - I'm guessing because it loads everything into memory first.
Is there a way to stream the JSON element by element?
There is an example of this sort of thing in the encoding/json documentation; it is the same streaming example reproduced in the answer above.
So, as commenters suggested, you could use the streaming API of "encoding/json" for reading one string at a time:
r := ... // get some io.Reader (e.g. open the big array file)
d := json.NewDecoder(r)

// read "["
d.Token()

// read strings one by one
for d.More() {
    s, _ := d.Token()
    // do something with s which is the newly read string
    fmt.Printf("read %q\n", s)
}

// (optionally) read "]"
d.Token()
Note that for simplicity I've left out the error handling, which needs to be implemented.
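For completeness, a sketch of the same loop with the error handling filled in (log.Fatal is just a stand-in for whatever error handling fits your program):

d := json.NewDecoder(r)

// read "["
if _, err := d.Token(); err != nil {
    log.Fatal(err)
}

// read strings one by one
for d.More() {
    s, err := d.Token()
    if err != nil {
        log.Fatal(err)
    }
    fmt.Printf("read %q\n", s)
}

// read "]"
if _, err := d.Token(); err != nil {
    log.Fatal(err)
}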

Golang REST api request with token auth to json array response

EDIT
This is the working code in case someone finds it useful. The title of this question was originally
"How to parse a list of dicts in golang".
That title was incorrect because I was referencing terms I'm familiar with from Python.
package main

import (
    "encoding/json"
    "fmt"
    "io/ioutil"
    "log"
    "net/http"
)

// Regional struct
type Region []struct {
    Region      string `json:"region"`
    Description string `json:"Description"`
    ID          int    `json:"Id"`
    Name        string `json:"Name"`
    Status      int    `json:"Status"`
    Nodes       []struct {
        NodeID    int    `json:"NodeId"`
        Code      string `json:"Code"`
        Continent string `json:"Continent"`
        City      string `json:"City"`
    } `json:"Nodes"`
}

// working request and response
func main() {
    url := "https://api.geo.com"
    // Create a Bearer string by appending the access token
    var bearer = "TOK:" + "TOKEN"
    // Create a new request using http
    req, err := http.NewRequest("GET", url, nil)
    if err != nil {
        log.Println("Error creating the request.\n[ERROR] -", err)
    }
    // Add the authorization header to the req
    req.Header.Add("Authorization", bearer)
    // This is what the response from the API looks like:
    // regionJson := `[{"region":"GEO:ABC","Description":"ABCLand","Id":1,"Name":"ABCLand [GEO-ABC]","Status":1,"Nodes":[{"NodeId":17,"Code":"LAX","Continent":"North America","City":"Los Angeles"},{"NodeId":18,"Code":"LBC","Continent":"North America","City":"Long Beach"}]},{"region":"GEO:DEF","Description":"DEFLand","Id":2,"Name":"DEFLand","Status":1,"Nodes":[{"NodeId":15,"Code":"NRT","Continent":"Asia","City":"Narita"},{"NodeId":31,"Code":"TYO","Continent":"Asia","City":"Tokyo"}]}]`
    // Send the req using an http Client
    client := &http.Client{}
    resp, err := client.Do(req)
    if err != nil {
        log.Println("Error on response.\n[ERROR] -", err)
    }
    defer resp.Body.Close()
    body, err := ioutil.ReadAll(resp.Body)
    if err != nil {
        log.Println("Error while reading the response bytes:", err)
    }
    // Region is already a slice type, so decode the JSON array into it directly
    var regions Region
    json.Unmarshal(body, &regions)
    fmt.Printf("Regions: %+v", regions)
}
Have a look at this playground example for some pointers.
Here's the code:
package main

import (
    "encoding/json"
    "log"
)

func main() {
    b := []byte(`
[
    {"key": "value", "key2": "value2"},
    {"key": "value", "key2": "value2"}
]`)
    var mm []map[string]string
    if err := json.Unmarshal(b, &mm); err != nil {
        log.Fatal(err)
    }
    for _, m := range mm {
        for k, v := range m {
            log.Printf("%s [%s]", k, v)
        }
    }
}
I reformatted the API response you included because it is not valid JSON.
In Go it's necessary to define types to match the JSON schema.
I don't know why the API appends % to the end of the result so I've ignored that. If it is included, you will need to trim the results from the file before unmarshaling.
What you get from the unmarshaling is a slice of maps. Then, you can iterate over the slice to get each map and then iterate over each map to extract the keys and values.
Update
In your updated question you include a different JSON schema, and this change must be reflected in the Go code by updating the types. There are some other errors in your code. Per my comment, I encourage you to spend some time learning the language.
package main

import (
    "bytes"
    "encoding/json"
    "io/ioutil"
    "log"
)

// Response is a type that represents the API response
type Response []Record

// Record is a type that represents the individual records
// The name Record is arbitrary as it is unnamed in the response
// Golang supports struct tags to map the JSON properties
// e.g. JSON "region" maps to a Golang field "Region"
type Record struct {
    Region      string `json:"region"`
    Description string `json:"description"`
    ID          int    `json:"id"`
    Nodes       []Node
}

type Node struct {
    NodeID int    `json:"NodeId"`
    Code   string `json:"Code"`
}

func main() {
    // A slice of byte representing your example response
    b := []byte(`[{
        "region": "GEO:ABC",
        "Description": "ABCLand",
        "Id": 1,
        "Name": "ABCLand [GEO-ABC]",
        "Status": 1,
        "Nodes": [{
            "NodeId": 17,
            "Code": "LAX",
            "Continent": "North America",
            "City": "Los Angeles"
        }, {
            "NodeId": 18,
            "Code": "LBC",
            "Continent": "North America",
            "City": "Long Beach"
        }]
    }, {
        "region": "GEO:DEF",
        "Description": "DEFLand",
        "Id": 2,
        "Name": "DEFLand",
        "Status": 1,
        "Nodes": [{
            "NodeId": 15,
            "Code": "NRT",
            "Continent": "Asia",
            "City": "Narita"
        }, {
            "NodeId": 31,
            "Code": "TYO",
            "Continent": "Asia",
            "City": "Tokyo"
        }]
    }]`)

    // To more closely match your code, create a Reader
    rdr := bytes.NewReader(b)

    // This matches your code, read from the Reader
    body, err := ioutil.ReadAll(rdr)
    if err != nil {
        // Use Printf to format strings
        log.Printf("Error while reading the response bytes\n%s", err)
    }

    // Initialize a variable of type Response
    resp := &Response{}

    // Try unmarshaling the body into it
    if err := json.Unmarshal(body, resp); err != nil {
        log.Fatal(err)
    }

    // Print the result
    log.Printf("%+v", resp)
}
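To pull individual fields back out of the decoded value, dereference the *Response and index into the nested slices; a small illustrative snippet (continuing from the resp variable above, and assuming the response is non-empty):

// resp is a *Response, so dereference it before indexing.
first := (*resp)[0]
log.Printf("region = %s", first.Region)
log.Printf("first node code = %s", first.Nodes[0].Code)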

Reading and Unmarshalling API results in Golang

In the below program I'm extracting some data from an API.
It outputs rather complex data.
When I call ioutil.ReadAll(resp.Body), the result is of type []uint8.
If I try to read the result directly, it's just a seemingly random array of integers.
However, I'm able to read it if I convert it to a string using string(diskinfo).
But I want to use this in a struct, and I'm having trouble unmarshalling.
package main

import (
    "encoding/json"
    "fmt"
    "io/ioutil"
    "net/http"
    "net/url"
    "reflect"
)

type ApiResults struct {
    results []struct {
        statement_id int `json.statement_id`
        series       []struct {
            name string `json.name`
            tags struct {
                host string `json.host`
            }
            columns []string      `json.columns`
            values  []interface{} `json.values`
        }
    }
}

func main() {
    my_url := "my_url"
    my_qry := fmt.Sprintf("my_query")
    resp, err := http.Get(my_url + url.QueryEscape(my_qry))
    if err != nil {
        fmt.Printf("ERROR: %v\n", err)
    } else {
        fmt.Println(reflect.TypeOf(resp))
        diskinfo, _ := ioutil.ReadAll(resp.Body)
        fmt.Println(reflect.TypeOf(diskinfo))
        fmt.Println(diskinfo)
        fmt.Println(string(diskinfo))
        diskinfo_string := string(diskinfo)
        data := ApiResults{}
        json.Unmarshal([]byte(diskinfo_string), &data)
        //fmt.Printf("Values = %v\n", data.results.series.values)
        //fmt.Printf("Server = %v\n", data.results.series.tags.host)
    }
}
If I view the data as a string, I get this (formatted):
{"results":[
{"statement_id":0,
"series":[
{"name":"disk",
"tags":{"host":"myServer1"},
"columns":["time","disk_size"],
"values":[["2021-07-07T07:53:32.291490387Z",1044]]},
{"name":"disk",
"tags":{"host":"myServer2"},
"columns":["time","disk_size"],
"values":[["2021-07-07T07:53:32.291490387Z",1046]]}
]}
]}
I think my ApiResults struct is also structured incorrectly because the API results have info for multiple hosts.
But first, I suspect the data has to be passed to the struct in a different format. Once I do that, I can probably figure out how to read from the struct next.
ioutil.ReadAll already provides the data as a []byte, so you can pass it straight to json.Unmarshal.
import (
    "encoding/json"
    "io/ioutil"
    "net/http"
)

func toStruct(res *http.Response) (*ApiResults, error) {
    body, err := ioutil.ReadAll(res.Body)
    if err != nil {
        return nil, err
    }
    defer res.Body.Close()

    data := ApiResults{}
    if err := json.Unmarshal(body, &data); err != nil {
        return nil, err
    }
    // Return a pointer to match the declared *ApiResults return type
    return &data, nil
}
There also seems to be an issue with your struct. The correct way to use struct tags is shown below. Also, the fields need to be exported for the json tags (used by json.Unmarshal) to work - starting them with an uppercase letter does it.
type ApiResults struct {
    Results []struct {
        StatementId int `json:"statement_id"`
        Series      []struct {
            Name string `json:"name"`
            Tags struct {
                Host string `json:"host"`
            } `json:"tags"`
            Columns []string      `json:"columns"`
            Values  []interface{} `json:"values"`
        } `json:"series"`
    } `json:"results"`
}
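With the exported fields and corrected tags in place, the commented-out prints from the question become plain field accesses; a small sketch (assuming the body has been read as above and contains at least one result and one series):

data := ApiResults{}
if err := json.Unmarshal(body, &data); err != nil {
    log.Fatal(err)
}
// Index into the nested slices to reach the values from the question.
fmt.Printf("Server = %v\n", data.Results[0].Series[0].Tags.Host)
fmt.Printf("Values = %v\n", data.Results[0].Series[0].Values)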

Stream data from a request response into a CSV in go

I have something like a data pipeline.
API response (~10k rows) as JSON
=> Sanitize some of the data into a new structure
=> Create a CSV file
I can currently do that by getting the full response and processing it step by step.
I was wondering if there's a simpler way to stream the response, reading it into CSV right away and writing to the file as it goes over the request-response.
Current code:
I will have JSON like { "name": "Full Name", ...(20 columns) }, and that data repeats about 10-20k times with different values.
For the request:

var res *http.Response
if res, err = client.Do(request); err != nil {
    return errors.Wrap(err, "failed to perform request")
}

For the unmarshal:

var record []RecordStruct
if err = json.NewDecoder(res.Body).Decode(&record); err != nil {
    return err
}

For the CSV:

var row []byte
if row, err = csvutil.Marshal(record); err != nil {
    return err
}
To stream an array of JSON objects, you have to decode the nested objects instead of the root object. To do this you need to read the data using tokens (check out the Token method). According to the documentation:
Token returns the next JSON token in the input stream. At the end of the input stream, Token returns nil, io.EOF.
Token guarantees that the delimiters [ ] { } it returns are properly nested and matched: if Token encounters an unexpected delimiter in the input, it will return an error.
The input stream consists of basic JSON values—bool, string, number, and null—along with delimiters [ ] { } of type Delim to mark the start and end of arrays and objects. Commas and colons are elided.
That means you can decode the document part by part; there is an official example of how to do it here.
Here is a code snippet that shows how to combine the JSON streaming technique with writing the results to a CSV file:
package main

import (
    "encoding/csv"
    "encoding/json"
    "log"
    "os"
    "strings"
)

type RecordStruct struct {
    Name string `json:"name"`
    Info string `json:"info"`
    // ... any field you want
}

func (rs *RecordStruct) CSVRecord() []string {
    // Here we form the data for the CSV writer
    return []string{rs.Name, rs.Info}
}

const jsonData = `[
    { "name": "Full Name", "info": "..."},
    { "name": "Full Name", "info": "..."},
    { "name": "Full Name", "info": "..."},
    { "name": "Full Name", "info": "..."},
    { "name": "Full Name", "info": "..."}
]`

func main() {
    // Create a file for storing our result
    file, err := os.Create("result.csv")
    if err != nil {
        log.Fatalln(err)
    }
    defer file.Close()

    // Create a CSV writer using the standard "encoding/csv" package
    var w = csv.NewWriter(file)

    // Put your reader here. In this case I use strings.Reader.
    // If you are getting data over http, it will be resp.Body.
    var jsonReader = strings.NewReader(jsonData)

    // Create a JSON decoder using the "encoding/json" package
    decoder := json.NewDecoder(jsonReader)

    // Token returns the next JSON token in the input stream.
    // At the end of the input stream, Token returns nil, io.EOF.
    // In this case our first token is '[', i.e. the array start
    _, err = decoder.Token()
    if err != nil {
        log.Fatalln(err)
    }

    // More reports whether there is another element in the
    // current array or object being parsed.
    for decoder.More() {
        var record RecordStruct
        // Decode just one item from our array
        if err := decoder.Decode(&record); err != nil {
            log.Fatalln(err)
        }
        // Convert and write our record to the csv file
        if err := writeToCSV(w, record.CSVRecord()); err != nil {
            log.Fatalln(err)
        }
    }

    // Our last token is ']', i.e. the array end
    _, err = decoder.Token()
    if err != nil {
        log.Fatalln(err)
    }
}

func writeToCSV(w *csv.Writer, record []string) error {
    if err := w.Write(record); err != nil {
        return err
    }
    w.Flush()
    return nil
}
You can also use third-party packages like github.com/bcicen/jstream.
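For reference, a minimal sketch of what that looks like with jstream, based on that package's README (the second argument to NewDecoder is the emit depth; 1 emits each element nested directly inside the top-level array):

package main

import (
    "fmt"
    "log"
    "os"

    "github.com/bcicen/jstream"
)

func main() {
    f, err := os.Open("file.json")
    if err != nil {
        log.Fatal(err)
    }
    defer f.Close()

    // Stream each top-level array element without loading the whole file.
    for mv := range jstream.NewDecoder(f, 1).Stream() {
        fmt.Printf("%v\n", mv.Value)
    }
}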

Unmarshal unknown JSON fields without structs

I am trying to unmarshal a JSON object which has an optional array. I am doing this without a struct, and this is what I've got so far:
import (
    "encoding/json"
    "fmt"
)

func main() {
    jo := `
    {
        "given_name": "Akshay Raj",
        "name": "Akshay",
        "country": "New Zealand",
        "family_name": "Gollahalli",
        "emails": [
            "name#example.com"
        ]
    }
    `
    var raw map[string]interface{}
    err := json.Unmarshal([]byte(jo), &raw)
    if err != nil {
        panic(err)
    }
    fmt.Println(raw["emails"][0])
}
The emails field may or may not be present. I know I could use a struct and unmarshal twice, once with and once without the array. When I try to get index 0 via raw["emails"][0], I get the following error:
invalid operation: raw["emails"][0] (type interface {} does not support indexing)
Is there a way to index into the emails field?
Update 1
I can do something like fmt.Println(raw["emails"].([]interface{})[0]) and it works. Is this the only way?
The easiest way is with a struct. There's no need to unmarshal twice.
type MyStruct struct {
    // ... other fields
    Emails []string `json:"emails"`
}
This will work, regardless of whether the JSON input contains the emails field. When it is missing, your resulting struct will just have an uninitialized Emails field.
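A quick runnable check of that behavior, with the emails field absent from the input (a minimal sketch):

package main

import (
    "encoding/json"
    "fmt"
)

type MyStruct struct {
    Emails []string `json:"emails"`
}

func main() {
    var s MyStruct
    // Note: no "emails" key in the input.
    if err := json.Unmarshal([]byte(`{"name": "Akshay"}`), &s); err != nil {
        panic(err)
    }
    // Emails is a nil slice: zero length and safe to range over.
    fmt.Println(s.Emails == nil, len(s.Emails))
}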
You can use type assertions. The Go tutorial on type assertions is here.
A Go playground link applying type assertions to your problem is here. For ease of reading, that code is replicated below:
package main

import (
    "encoding/json"
    "fmt"
)

func main() {
    jo := `
    {
        "given_name": "Akshay Raj",
        "name": "Akshay",
        "country": "New Zealand",
        "family_name": "Gollahalli",
        "emails": [
            "name#example.com"
        ]
    }
    `
    var raw map[string]interface{}
    err := json.Unmarshal([]byte(jo), &raw)
    if err != nil {
        panic(err)
    }
    emails, ok := raw["emails"]
    if !ok {
        panic("do this when no 'emails' key")
    }
    emailsSlice, ok := emails.([]interface{})
    if !ok {
        panic("do this when 'emails' value is not a slice")
    }
    if len(emailsSlice) == 0 {
        panic("do this when 'emails' slice is empty")
    }
    email, ok := (emailsSlice[0]).(string)
    if !ok {
        panic("do this when 'emails' slice contains non-string")
    }
    fmt.Println(email)
}
As always, you can use additional libraries to work with your JSON data. For example, with the gojsonq package it would look like this:
package main

import (
    "fmt"

    "github.com/thedevsaddam/gojsonq"
)

func main() {
    json := `
    {
        "given_name": "Akshay Raj",
        "name": "Akshay",
        "country": "New Zealand",
        "family_name": "Gollahalli",
        "emails": [
            "name#example.com"
        ]
    }
    `
    first := gojsonq.New().JSONString(json).Find("emails.[0]")
    if first != nil {
        fmt.Println(first.(string))
    } else {
        fmt.Println("There aren't any emails")
    }
}
