How to create an AVRO file using Go? - go

I'm trying to create an AVRO file using Go. So far I tried a couple of libraries and I have some code.
The problem is that I can work with the data but don't know how to serialize it to store it. Here's the code I got from github.com/hamba/avro with some small modifications.
import (
"fmt"
"github.com/hamba/avro"
"log"
)
type SimpleRecord struct {
A int64 `avro:"a"`
B string `avro:"b"`
}
func main() {
schema, err := avro.Parse(`{
"type": "record",
"name": "simple",
"namespace": "hamba",
"fields" : [
{"name": "a", "type": "long"},
{"name": "b", "type": "string"}
]
}`)
if err != nil {
log.Fatal(err)
}
in := SimpleRecord{A: 27, B: "foo"}
data, err := avro.Marshal(schema, in)
if err != nil {
log.Fatal(err)
}
fmt.Println(data)
}
This block of code prints:
[54 6 102 111 111]
This line corresponds to the avro encoding of the data. And it seems like this is all I need to store, but I don't know how to create the file itself.
I tried:
mode := int(0644)
permissions := os.FileMode(mode)
err = ioutil.WriteFile("file.avro", data, permissions)
if err != nil {
log.Fatal(err)
}
And it generates a file. However, when I try to read it as an AVRO file using Python fastavro library, I get the error ValueError: cannot read header - is it an avro file?.
But according to the docs (https://godoc.org/github.com/hamba/avro#example-Marshal): "Marshal returns the Avro encoding of v." Marshal(schema Schema, v interface{}) ([]byte, error), so data should be of type []byte.

Avro defines the data encoding format only which can be packaged as messages or files. So, for file storage should use Avro OCF - Avro Object Container Files. Here is a working hamba avro ocf encoder example.
In my code I've encoded multiple rows to upload it to BigQuery (error checks, init, and close are omitted for clarity):
f, err := os.Open("/your/avro/file.avro")
enc, err := ocf.NewEncoder(schema, w, ocf.WithCodec(ocf.Snappy))
for _, item := range items {
enc.Encode(item)
}

Related

BigQuery - Fetch table data from Go Server in Cloud Run

I am able to fetch the Big Query table output as json through Golang server. But is there a way to fetch the schema directly instead of defining it as ColStatsRow? Also, any way to make this better.
type ColStatsRow struct {
COLUMN_NAME string `bigquery:"column_name"`
COLUMN_VALUE string `bigquery:"column_value"`
FREQUENCY int64 `bigquery:"frequency"`
}
// getOutput prints results from a query
func getOutput(w http.ResponseWriter, iter *bigquery.RowIterator) error {
var rows []ColStatsRow
for {
var row ColStatsRow
err := iter.Next(&row)
if err == iterator.Done {
out, err := json.Marshal(rows)
if err != nil {
return fmt.Errorf("error marshalling results: %v", err)
}
w.Write([]byte(out))
return nil
}
if err != nil {
return fmt.Errorf("error iterating through results: %v", err)
}
rows = append(rows, row)
}
}
Thank you.
If you're after the schema for the result, it's available on the RowIterator.
If you mean you want to more dynamically process the rows without a specific struct, usually some combination of checking the schema and/or leveraging a type switch is the way to go about this.
According to the documentation you can specify a JSON schema file like this:
[
{
"description": "[DESCRIPTION]",
"name": "[NAME]",
"type": "[TYPE]",
"mode": "[MODE]"
},
{
"description": "[DESCRIPTION]",
"name": "[NAME]",
"type": "[TYPE]",
"mode": "[MODE]"
}
]
and then you can write this schema using the following command:
bq show --schema --format=prettyjson project_id:dataset.table > path_to_file

Extract avro schema

Similar to this question
How to extract schema for avro file in python
Is there a way to read in an avro file in golang without knowing the schema beforehand and extract a schema?
How about something like this (adapted code from https://github.com/hamba/avro/blob/master/ocf/ocf.go):
package main
import (
"github.com/hamba/avro"
"log"
"os"
)
// HeaderSchema is the Avro schema of a container file header.
var HeaderSchema = avro.MustParse(`{
"type": "record",
"name": "org.apache.avro.file.Header",
"fields": [
{"name": "magic", "type": {"type": "fixed", "name": "Magic", "size": 4}},
{"name": "meta", "type": {"type": "map", "values": "bytes"}},
{"name": "sync", "type": {"type": "fixed", "name": "Sync", "size": 16}}
]
}`)
var magicBytes = [4]byte{'O', 'b', 'j', 1}
const (
schemaKey = "avro.schema"
)
// Header represents an Avro container file header.
type Header struct {
Magic [4]byte `avro:"magic"`
Meta map[string][]byte `avro:"meta"`
Sync [16]byte `avro:"sync"`
}
func main() {
r, err := os.Open("path/my.avro")
if err != nil {
log.Fatal(err)
}
defer r.Close()
reader := avro.NewReader(r, 1024)
var h Header
reader.ReadVal(HeaderSchema, &h)
if reader.Error != nil {
log.Println("decoder: unexpected error: %v", reader.Error)
}
if h.Magic != magicBytes {
log.Println("decoder: invalid avro file")
}
schema, err := avro.Parse(string(h.Meta[schemaKey]))
if err != nil {
log.Println(err)
}
log.Println(schema)
}
Both https://github.com/hamba/avro and https://github.com/linkedin/goavro can decode Avro OCF files (which it sounds like is what you have) without an explicit schema file.
Once you've created a new reader/decoder, you can retrieve the metadata, which includes the schema at key avro.schema: https://pkg.go.dev/github.com/hamba/avro/ocf#Decoder.Metadata and https://pkg.go.dev/github.com/linkedin/goavro#OCFReader.MetaData

Decode very big json to array of structs

I have a web application which have a REST API, get JSON as input and perform transformations of this JSON.
Here is my code:
func (a *API) getAssignments(w http.ResponseWriter, r *http.Request) {
var document DataPacket
err := json.NewDecoder(r.Body).Decode(&document)
if err != nil {
a.handleJSONParseError(err, w)
return
}
// transformations
JSON which I get is a collection of structs. External application use my application and send me very big json files (300-400MB). Decode this json at the one moment of time takes a very big time and amount of memory.
Is there any way to work with this json as stream and decode structs from this collection one by one ?
First, read the documentation.
Package json
import "encoding/json"
func (*Decoder) Decode
func (dec *Decoder) Decode(v interface{}) error
Decode reads the next JSON-encoded value from its input and stores it
in the value pointed to by v.
Example (Stream): This example uses a Decoder to decode a streaming array of JSON
objects.
Playground: https://play.golang.org/p/o6hD-UV85SZ
package main
import (
"encoding/json"
"fmt"
"log"
"strings"
)
func main() {
const jsonStream = `
[
{"Name": "Ed", "Text": "Knock knock."},
{"Name": "Sam", "Text": "Who's there?"},
{"Name": "Ed", "Text": "Go fmt."},
{"Name": "Sam", "Text": "Go fmt who?"},
{"Name": "Ed", "Text": "Go fmt yourself!"}
]
`
type Message struct {
Name, Text string
}
dec := json.NewDecoder(strings.NewReader(jsonStream))
// read open bracket
t, err := dec.Token()
if err != nil {
log.Fatal(err)
}
fmt.Printf("%T: %v\n", t, t)
// while the array contains values
for dec.More() {
var m Message
// decode an array value (Message)
err := dec.Decode(&m)
if err != nil {
log.Fatal(err)
}
fmt.Printf("%v: %v\n", m.Name, m.Text)
}
// read closing bracket
t, err = dec.Token()
if err != nil {
log.Fatal(err)
}
fmt.Printf("%T: %v\n", t, t)
}

How to read config.json in Go lang?

I have used the following code in filLib.go:
func LoadConfiguration(filename string) (Configuration, error) {
bytes, err := ioutil.ReadFile(filename)
if err != nil {
return Configuration{}, err
}
var c Configuration
err = json.Unmarshal(bytes, &c)
if err != nil {
return Configuration{}, err
}
return c, nil
}
But ioutil.ReadFile(filename) return *os.PathError.
Both the files config.json and filLib.go are in same folder.
The path of *.go file is not directly relevant to the working directory of the executing compiled code. Verify where your code thinks it actually is (compare to where you think it should be :).
import(
"os"
"fmt"
"log"
)
func main() {
dir, err := os.Getwd()
if err != nil {
log.Fatal(err)
}
fmt.Println(dir)
}
The issue might be with the filename you're providing. Below is the code sample that working fine for me.
func loadConfig() {
var AppConfig Conf
raw, err := ioutil.ReadFile("conf/conf.json")
if err != nil {
log.Println("Error occured while reading config")
return
}
json.Unmarshal(raw, &AppConfig)
}
I found this library enter link description here
It is a very simple and easy to use configuration library, allowing Json based config files for your Go application. Configuration provider reads configuration data from config.json file. You can get the string value of a configuration, or bind an interface to a valid JSON section by related section name convention parameter.
Consider the following config.json file:
{
"ConnectionStrings": {
"DbConnection": "Server=.;User Id=app;Password=123;Database=Db",
"LogDbConnection": "Server=.;User Id=app;Password=123;Database=Log"
},
"Caching": {
"ApplicationKey": "key",
"Host": "127.0.01"
},
"Website": {
"ActivityLogEnable": "true",
"ErrorMessages": {
"InvalidTelephoneNumber": "Invalid Telephone Number",
"RequestNotFound": "Request Not Found",
"InvalidConfirmationCode": "Invalid Confirmation Code"
}
},
"Services": {
"List": [
{
"Id": 1,
"Name": "Service1"
},
{
"Id": 2,
"Name": "Service2"
},
{
"Id": 3,
"Name": "Service3"
}
]
}
}
The following code displays how to access some of the preceding configuration settings. You can get config value via GetSection function with specifying Json sections as string parameter split by ":"
c, err := jsonconfig.GetSection("ConnectionStrings:DbConnection")
Any valid Json is a valid configuration type. You can also bind a struct via jsonconfig. For example, Caching configuration can be bind to valid struct:
type Caching struct {
ApplicationKey string
Host string
}
var c Caching
err = jsonconfig.Bind(&c, "Caching")

how I can mapping complex JSON to other JSON

I am trying to build aggregation services, to all third party APIs that's I used,
this aggregation services taking json values coming from my main system and it will put this value to key equivalent to third party api key then, aggregation services it will send request to third party api with new json format.
example-1:
package main
import (
"encoding/json"
"fmt"
"log"
"github.com/tidwall/gjson"
)
func main() {
// mapping JSON
mapB := []byte(`
{
"date": "createdAt",
"clientName": "data.user.name"
}
`)
// from my main system
dataB := []byte(`
{
"createdAt": "2017-05-17T08:52:36.024Z",
"data": {
"user": {
"name": "xxx"
}
}
}
`)
mapJSON := make(map[string]interface{})
dataJSON := make(map[string]interface{})
newJSON := make(map[string]interface{})
err := json.Unmarshal(mapB, &mapJSON)
if err != nil {
log.Panic(err)
}
err = json.Unmarshal(dataB, &dataJSON)
if err != nil {
log.Panic(err)
}
for i := range mapJSON {
r := gjson.GetBytes(dataB, mapJSON[i].(string))
newJSON[i] = r.Value()
}
newB, err := json.MarshalIndent(newJSON, "", " ")
if err != nil {
log.Println(err)
}
fmt.Println(string(newB))
}
output:
{
"clientName": "xxx",
"date": "2017-05-17T08:52:36.024Z"
}
I use gjson package to get values form my main system request in simple way from a json document.
example-2:
import (
"encoding/json"
"fmt"
"log"
"github.com/tidwall/gjson"
)
func main() {
// mapping JSON
mapB := []byte(`
{
"date": "createdAt",
"clientName": "data.user.name",
"server":{
"google":{
"date" :"createdAt"
}
}
}
`)
// from my main system
dataB := []byte(`
{
"createdAt": "2017-05-17T08:52:36.024Z",
"data": {
"user": {
"name": "xxx"
}
}
}
`)
mapJSON := make(map[string]interface{})
dataJSON := make(map[string]interface{})
newJSON := make(map[string]interface{})
err := json.Unmarshal(mapB, &mapJSON)
if err != nil {
log.Panic(err)
}
err = json.Unmarshal(dataB, &dataJSON)
if err != nil {
log.Panic(err)
}
for i := range mapJSON {
r := gjson.GetBytes(dataB, mapJSON[i].(string))
newJSON[i] = r.Value()
}
newB, err := json.MarshalIndent(newJSON, "", " ")
if err != nil {
log.Println(err)
}
fmt.Println(string(newB))
}
output:
panic: interface conversion: interface {} is map[string]interface {}, not string
I can handle this error by using https://golang.org/ref/spec#Type_assertions, but what if this json object have array and inside this array have json object ....
my problem is I have different apis, every api have own json schema, and my way for mapping json only work if
third party api have json key value only, without nested json or array inside this array json object.
is there a way to mapping complex json schema, or golang package to help me to do that?
EDIT:
After comment interaction and with updated question. Before we move forward, I would like to mention.
I just looked at your example-2 Remember one thing. Mapping is from one form to another form. Basically one known format to targeted format. Each data type have to handled. You cannot do generic to generic mapping logically (technically feasible though, would take more time & efforts, you can play around on this).
I have created sample working program of one approach; it does a mapping of source to targeted format. Refer this program as a start point and use your creativity to implement yours.
Playground link: https://play.golang.org/p/MEk_nGcPjZ
Explanation: Sample program achieves two different source format to one target format. The program consist of -
Targeted Mapping definition of Provider 1
Targeted Mapping definition of Provider 2
Provider 1 JSON
Provider 2 JSON
Mapping function
Targeted JSON marshal
Key elements from program: refer play link for complete program.
type MappingInfo struct {
TargetKey string
SourceKeyPath string
DataType string
}
Map function:
func mapIt(mapping []*MappingInfo, parsedResult gjson.Result) map[string]interface{} {
mappedData := make(map[string]interface{})
for _, m := range mapping {
switch m.DataType {
case "time":
mappedData[m.TargetKey] = parsedResult.Get(m.SourceKeyPath).Time()
case "string":
mappedData[m.TargetKey] = parsedResult.Get(m.SourceKeyPath).String()
}
}
return mappedData
}
Output:
Provider 1 Result: map[date:2017-05-17 08:52:36.024 +0000 UTC clientName:provider1 username]
Provider 1 JSON: {
"clientName": "provider1 username",
"date": "2017-05-17T08:52:36.024Z"
}
Provider 2 Result: map[date:2017-05-12 06:32:46.014 +0000 UTC clientName:provider2 username]
Provider 2 JSON: {
"clientName": "provider2 username",
"date": "2017-05-12T06:32:46.014Z"
}
Good luck, happy coding!
Typically Converting/Transforming one structure to another structure, you will have to handle this with application logic.
As you mentioned in the question:
my problem is I have different apis, every api have own json schema
This is true for every aggregation system.
One approach to handle this requirement effectively; is to keep mapping of keys for each provider JSON structure and targeted JSON structure.
For example: This is an approach, please go with your design as you see fit.
JSON structures from various provider:
// Provider 1 : JSON structrure
{
"createdAt": "2017-05-17T08:52:36.024Z",
"data": {
"user": {
"name": "xxx"
}
}
}
// Provider 2 : JSON structrure
{
"username": "yyy"
"since": "2017-05-17T08:52:36.024Z",
}
Mapping for target JSON structure:
jsonMappingByProvider := make(map[string]string)
// Targeted Mapping for Provider 1
jsonMappingByProvider["provider1"] = `
{
"date": "createdAt",
"clientName": "data.user.name"
}
`
// Targeted Mapping for Provider 2
jsonMappingByProvider["provider2"] = `
{
"date": "since",
"clientName": "username"
}
`
Now, based the on the provider you're handling, get the mapping and map the response JSON into targeted structure.
// get the mapping info by provider
mapping := jsonMappingByProvider["provider1"]
// Parse the response JSON
// Do the mapping
This way you can control each provider and it's mapping effectively.

Resources