Similar to this question: How to extract schema for avro file in python
Is there a way to read an Avro file in Go without knowing the schema beforehand, and extract the schema from it?
How about something like this (adapted code from https://github.com/hamba/avro/blob/master/ocf/ocf.go):
package main

import (
	"log"
	"os"

	"github.com/hamba/avro"
)

// HeaderSchema is the Avro schema of a container file header.
var HeaderSchema = avro.MustParse(`{
	"type": "record",
	"name": "org.apache.avro.file.Header",
	"fields": [
		{"name": "magic", "type": {"type": "fixed", "name": "Magic", "size": 4}},
		{"name": "meta", "type": {"type": "map", "values": "bytes"}},
		{"name": "sync", "type": {"type": "fixed", "name": "Sync", "size": 16}}
	]
}`)

// magicBytes is the marker every Avro object container file starts with.
var magicBytes = [4]byte{'O', 'b', 'j', 1}

// schemaKey is the metadata key under which the writer stores the schema JSON.
const schemaKey = "avro.schema"

// Header represents an Avro container file header.
type Header struct {
	Magic [4]byte           `avro:"magic"`
	Meta  map[string][]byte `avro:"meta"`
	Sync  [16]byte          `avro:"sync"`
}

func main() {
	r, err := os.Open("path/my.avro")
	if err != nil {
		log.Fatal(err)
	}
	defer r.Close()

	// Decode only the header; the data blocks after it are never read.
	reader := avro.NewReader(r, 1024)
	var h Header
	reader.ReadVal(HeaderSchema, &h)
	if reader.Error != nil {
		log.Fatalf("decoder: unexpected error: %v", reader.Error)
	}
	if h.Magic != magicBytes {
		log.Fatal("decoder: invalid avro file")
	}

	// The writer's schema is stored as JSON text in the header metadata.
	schema, err := avro.Parse(string(h.Meta[schemaKey]))
	if err != nil {
		log.Fatal(err)
	}
	log.Println(schema)
}
Both https://github.com/hamba/avro and https://github.com/linkedin/goavro can decode Avro OCF files (which it sounds like is what you have) without an explicit schema file.
Once you've created a new reader/decoder, you can retrieve the metadata, which includes the schema at key avro.schema: https://pkg.go.dev/github.com/hamba/avro/ocf#Decoder.Metadata and https://pkg.go.dev/github.com/linkedin/goavro#OCFReader.MetaData
Related
I'm trying to create an Avro file using Go. So far I've tried a couple of libraries, and I have some code.
The problem is that I can work with the data but don't know how to serialize it to store it. Here's the code I got from github.com/hamba/avro with some small modifications.
package main

import (
	"fmt"
	"log"

	"github.com/hamba/avro"
)

type SimpleRecord struct {
	A int64  `avro:"a"`
	B string `avro:"b"`
}

func main() {
	schema, err := avro.Parse(`{
		"type": "record",
		"name": "simple",
		"namespace": "hamba",
		"fields" : [
			{"name": "a", "type": "long"},
			{"name": "b", "type": "string"}
		]
	}`)
	if err != nil {
		log.Fatal(err)
	}
	in := SimpleRecord{A: 27, B: "foo"}
	data, err := avro.Marshal(schema, in)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(data)
}
This block of code prints:
[54 6 102 111 111]
This is the Avro encoding of the data, and it seems like this is all I need to store, but I don't know how to create the file itself.
I tried:
mode := int(0644)
permissions := os.FileMode(mode)
err = ioutil.WriteFile("file.avro", data, permissions)
if err != nil {
	log.Fatal(err)
}
And it generates a file. However, when I try to read it as an Avro file using the Python fastavro library, I get the error ValueError: cannot read header - is it an avro file?
But according to the docs (https://godoc.org/github.com/hamba/avro#example-Marshal): "Marshal returns the Avro encoding of v." Marshal(schema Schema, v interface{}) ([]byte, error), so data should be of type []byte.
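Marshal gives you only the Avro binary encoding of a value; the Avro specification separately defines how such data is packaged, either as messages or as files. For file storage you should use Avro OCF (Object Container Files), which wraps the data in a header (magic bytes, the schema, a sync marker) that readers such as fastavro expect. Here is a working hamba/avro ocf encoder example.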
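In my code I've encoded multiple rows to upload them to BigQuery (error checks and the setup of schema and items are omitted for clarity):
f, err := os.Create("/your/avro/file.avro") // os.Create, not os.Open: Open is read-only
enc, err := ocf.NewEncoder(schema, f, ocf.WithCodec(ocf.Snappy)) // schema is the schema JSON string
for _, item := range items {
	enc.Encode(item)
}
enc.Close() // flushes any buffered records to f
f.Close()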
I'm having issues making github.com/xeipuuv/gojsonschema work for my REST API that I'm currently building.
The procedure should look like this:
1. The user sends a request to /api/books/create (in this case I'm sending a PUT request).
2. The user supplies the body parameters name and content.
3. The server converts these body parameters into JSON.
4. The server validates the JSON against a JSON schema.
5. The server performs the request.
Or at least, that is how it should work.
I get this error when trying to validate the JSON and I have no clue how to fix it.
http: panic serving [::1]:58611: parse {"name":"1","content":"2"}: first path segment in URL cannot contain colon
type CreateParams struct {
	Name    string
	Content string
}

func Create(w http.ResponseWriter, r *http.Request) {
	r.ParseForm()
	data := &CreateParams{
		Name:    r.Form.Get("name"),
		Content: r.Form.Get("Content"),
	}
	jsonData, err := json.Marshal(data)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(string(jsonData))
	schema := `{
		"required": [
			"Name",
			"Content"
		],
		"properties": {
			"Name": {
				"$id": "#/properties/Name",
				"type": "string",
				"title": "The Name Schema",
				"default": "",
				"examples": [
					"1"
				],
				"minLength": 3,
				"pattern": "^(.*)$"
			},
			"Content": {
				"$id": "#/properties/Content",
				"type": "string",
				"title": "The Content Schema",
				"default": "",
				"examples": [
					"2"
				],
				"pattern": "^(.*)$"
			}
		}
	}`
	schemaLoader := gojsonschema.NewStringLoader(schema)
	documentLoader := gojsonschema.NewReferenceLoader(string(jsonData))
	result, err := gojsonschema.Validate(schemaLoader, documentLoader)
	if err != nil {
		panic(err.Error())
	}
	if result.Valid() {
		fmt.Printf("The document is valid\n")
	} else {
		fmt.Printf("The document is not valid. see errors :\n")
		for _, desc := range result.Errors() {
			fmt.Printf("- %s\n", desc)
		}
	}
}
My first thought was that it breaks because r.ParseForm() outputs things in a weird way, but I'm not sure anymore.
Note that I would like to have a "universal" method as I'm dealing with all kinds of requests: GET, POST, PUT, etc. But if you have a better solution I could work with that.
Any help is appreciated!
My use case is parsing varying JSON schemas into new struct types, which will then be used with an ORM to fetch data from a SQL database. Since Go is compiled, I believe there is no out-of-the-box solution, but is there any hack available to do this without creating a separate Go process? I tried reflection, but could not find a satisfactory approach.
Currently I am using the a-h/generate library, which does generate the structs, but I am stuck on how to load these new struct types into the Go runtime.
EDIT
Example JSON Schema:
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "Address",
  "id": "Address",
  "type": "object",
  "description": "address",
  "properties": {
    "houseName": {
      "type": "string",
      "description": "House Name",
      "maxLength": 30
    },
    "houseNumber": {
      "type": "string",
      "description": "House Number",
      "maxLength": 4
    },
    "flatNumber": {
      "type": "string",
      "description": "Flat",
      "maxLength": 15
    },
    "street": {
      "type": "string",
      "description": "Address 1",
      "maxLength": 40
    },
    "district": {
      "type": "string",
      "description": "Address 2",
      "maxLength": 30
    },
    "town": {
      "type": "string",
      "description": "City",
      "maxLength": 20
    },
    "county": {
      "type": "string",
      "description": "County",
      "maxLength": 20
    },
    "postcode": {
      "type": "string",
      "description": "Postcode",
      "maxLength": 8
    }
  }
}
The library above includes a command-line tool, which generates the following struct type for the JSON above:
// Code generated by schema-generate. DO NOT EDIT.

package main

// Address address
type Address struct {
	County      string `json:"county,omitempty"`
	District    string `json:"district,omitempty"`
	FlatNumber  string `json:"flatNumber,omitempty"`
	HouseName   string `json:"houseName,omitempty"`
	HouseNumber string `json:"houseNumber,omitempty"`
	Postcode    string `json:"postcode,omitempty"`
	Street      string `json:"street,omitempty"`
	Town        string `json:"town,omitempty"`
}
The issue is how to use this struct type in the program without recompiling. One hack is to start a new Go process, but that doesn't seem like a good way to do it. Another way is to write my own parser for unmarshalling JSON schemas, something like:
b := []byte(`{"Name":"Wednesday","Age":6,"Parents":["Gomez","Morticia"]}`)
var f interface{}
json.Unmarshal(b, &f)
m := f.(map[string]interface{})
for k, v := range m {
	switch vv := v.(type) {
	case string:
		fmt.Println(k, "is string", vv)
	case float64: // encoding/json decodes every JSON number into float64 here
		fmt.Println(k, "is float64", vv)
	case int: // never matches when decoding into interface{}
		fmt.Println(k, "is int", vv)
	case []interface{}:
		fmt.Println(k, "is an array:")
		for i, u := range vv {
			fmt.Println(i, u)
		}
	default:
		fmt.Println(k, "is of a type I don't know how to handle")
	}
}
Can someone suggest some pointers on where to look? Thanks.
So it looks like you're trying to implement your own JSON marshalling. That's no biggie: the standard encoding/json package already supports that. Just have your type implement the MarshalJSON and UnmarshalJSON methods (cf. the first example in the docs). Assuming some fields will be shared (e.g. schema, id, type), you can create a unified type like this:
// poor naming, but we need this level of wrapping here
type Data struct {
	Metadata
}

type Metadata struct {
	Schema      string          `json:"$schema"`
	Type        string          `json:"type"`
	Description string          `json:"description"`
	Id          string          `json:"id"`
	Properties  json.RawMessage `json:"properties"`
	Address     *Address        `json:"-"`
	Name        *Name           `json:"-"` // used by the "name" case below
	// other types go here, too
}
Now all properties will be unmarshalled into a json.RawMessage field (essentially a []byte field). What you can do in your custom unmarshal method now is something like this:
func (d *Data) UnmarshalJSON(b []byte) error {
	meta := Metadata{}
	// unmarshal common fields
	if err := json.Unmarshal(b, &meta); err != nil {
		return err
	}
	// Assuming the Type field contains the value that allows you to determine what data you're actually unmarshalling
	switch meta.Type {
	case "address":
		meta.Address = &Address{} // initialise field
		if err := json.Unmarshal([]byte(meta.Properties), meta.Address); err != nil {
			return err
		}
	case "name":
		meta.Name = &Name{}
		if err := json.Unmarshal([]byte(meta.Properties), meta.Name); err != nil {
			return err
		}
	default:
		return errors.New("unknown message type")
	}
	// all done
	d.Metadata = meta // assign to embedded
	// optionally: clean up the Properties field, as it contains raw JSON, and is exported
	d.Metadata.Properties = json.RawMessage{}
	return nil
}
You can do pretty much the same thing for marshalling. First work out what type you're actually working with, then marshal that object into the properties field, and then marshal the entire structure:
func (d Data) MarshalJSON() ([]byte, error) {
	var (
		prop []byte
		err  error
	)
	switch {
	case d.Metadata.Address != nil:
		prop, err = json.Marshal(d.Address)
	case d.Metadata.Name != nil:
		prop, err = json.Marshal(d.Name) // will only work if field isn't masked, better to be explicit
	default:
		err = errors.New("no properties to marshal") // handle in whatever way is best
	}
	if err != nil {
		return nil, err
	}
	d.Metadata.Properties = json.RawMessage(prop)
	return json.Marshal(d.Metadata) // marshal the unified type here
}
I have Eclipse set up and ready to go.
What I want is to print the data returned by search.RetrieveFeeds() for debugging.
In essence, I'd like to see an array of Feed structs, so I know it processed the JSON data file.
When I run it, I get this error:
main\main.go:10:38: multiple-value search.RetrieveFeeds() in single-value context
I looked here: Multiple values in single-value context but it was a bit over my head.
Here is my project layout so far:
Data JSON File Path
src/data/data.json
Data for JSON File
[
  {
    "site": "npr",
    "link": "http://www.npr.org/rss/rss.php?id=1001",
    "type": "rss"
  },
  {
    "site": "cnn",
    "link": "http://rss.cnn.com/rss/cnn_world.rss",
    "type": "rss"
  },
  {
    "site": "foxnews",
    "link": "http://feeds.foxnews.com/foxnews/world?format=xml",
    "type": "rss"
  },
  {
    "site": "nbcnews",
    "link": "http://feeds.nbcnews.com/feeds/topstories",
    "type": "rss"
  }
]
Main Method File Path
src/main/main.go
Code for Main Method
package main

import (
	"fmt"
	"search"
)

func main() {
	fmt.Println("Hello")
	var feeds = search.RetrieveFeeds()
	fmt.Printf("%v", feeds)
}
Search File Path
src/search/feed.go
Search File Code
package search

import (
	"encoding/json"
	"os"
)

const dataFile = "data/data.json"

type Feed struct {
	Name string `json:"site"`
	URI  string `json:"link"`
	Type string `json:"type"`
}

func RetrieveFeeds() ([]*Feed, error) {
	file, err := os.Open(dataFile)
	if err != nil {
		return nil, err
	}
	defer file.Close()

	var feeds []*Feed
	err = json.NewDecoder(file).Decode(&feeds)
	return feeds, err
}
UPDATE
I changed the JSON data file path to:
const dataFile = "src/data/data.json"
And now the debug dump says:
<nil>
Given the following string from a MediaWiki API request:
str = ` {
  "query": {
    "pages": {
      "66984": {
        "pageid": 66984,
        "ns": 0,
        "title": "Main Page",
        "touched": "2012-11-23T06:44:22Z",
        "lastrevid": 1347044,
        "counter": "",
        "length": 28,
        "redirect": "",
        "starttimestamp": "2012-12-15T05:21:21Z",
        "edittoken": "bd7d4a61cc4ce6489e68c21259e6e416+\\"
      }
    }
  }
}`
How can I get the edittoken using Go's encoding/json package, keeping in mind that the 66984 key will keep changing?
When you have a changing key like this, the best way to deal with it is a map. In the example below I've used structs up to the point where the key changes, and switched to a map from there. I've linked a working example as well.
http://play.golang.org/p/ny0kyafgYO
package main

import (
	"encoding/json"
	"fmt"
)

type query struct {
	Query struct {
		Pages map[string]interface{}
	}
}

func main() {
	str := `{"query":{"pages":{"66984":{"pageid":66984,"ns":0,"title":"Main Page","touched":"2012-11-23T06:44:22Z","lastrevid":1347044,"counter":"","length":28,"redirect":"","starttimestamp":"2012-12-15T05:21:21Z","edittoken":"bd7d4a61cc4ce6489e68c21259e6e416+\\"}}}}`
	q := query{}
	err := json.Unmarshal([]byte(str), &q)
	if err != nil {
		panic(err)
	}
	for _, p := range q.Query.Pages {
		fmt.Printf("edittoken = %s\n", p.(map[string]interface{})["edittoken"].(string))
	}
}
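As a variant of the answer's code (not what the linked playground uses), the map's value type can be a struct too, which removes the type assertions. page, response and editTokens are names made up for this sketch; page lists only the fields it needs, since encoding/json ignores the others. With several pages, the order of the returned tokens follows map iteration and is therefore not fixed.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// page declares only the fields we care about; encoding/json skips the rest.
type page struct {
	PageID    int    `json:"pageid"`
	Title     string `json:"title"`
	EditToken string `json:"edittoken"`
}

// response uses structs down to the changing key, then a map with typed values.
type response struct {
	Query struct {
		Pages map[string]page `json:"pages"`
	} `json:"query"`
}

// editTokens returns the edit token of every page in the response.
func editTokens(body []byte) ([]string, error) {
	var r response
	if err := json.Unmarshal(body, &r); err != nil {
		return nil, err
	}
	tokens := make([]string, 0, len(r.Query.Pages))
	for _, p := range r.Query.Pages {
		tokens = append(tokens, p.EditToken)
	}
	return tokens, nil
}

func main() {
	body := []byte(`{"query": {"pages": {"66984": {"pageid": 66984, "title": "Main Page", "edittoken": "bd7d4a61cc4ce6489e68c21259e6e416+\\"}}}}`)
	tokens, err := editTokens(body)
	if err != nil {
		panic(err)
	}
	fmt.Println(tokens)
}
```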
Note that if you use the &indexpageids=true parameter in the API request URL, the result will contain a "pageids" array, like so:
str = ` {
  "query": {
    "pageids": [
      "66984"
    ],
    "pages": {
      "66984": {
        "pageid": 66984,
        "ns": 0,
        "title": "Main Page",
        "touched": "2012-11-23T06:44:22Z",
        "lastrevid": 1347044,
        "counter": "",
        "length": 28,
        "redirect": "",
        "starttimestamp": "2012-12-15T05:21:21Z",
        "edittoken": "bd7d4a61cc4ce6489e68c21259e6e416+\\"
      }
    }
  }
}`
So you can use pageids[0] to look up the changing key, which will likely make things easier.
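A sketch of the pageids approach, assuming the request included indexpageids so that the response carries the pageids array shown above; firstEditToken, page and response are hypothetical names for this sketch.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// page keeps only the field this sketch needs.
type page struct {
	EditToken string `json:"edittoken"`
}

// response assumes the request used indexpageids, so "pageids" lists the keys of "pages".
type response struct {
	Query struct {
		PageIDs []string        `json:"pageids"`
		Pages   map[string]page `json:"pages"`
	} `json:"query"`
}

// firstEditToken returns the edit token of the page named by pageids[0].
func firstEditToken(body []byte) (string, error) {
	var r response
	if err := json.Unmarshal(body, &r); err != nil {
		return "", err
	}
	if len(r.Query.PageIDs) == 0 {
		return "", fmt.Errorf("response has no pageids")
	}
	id := r.Query.PageIDs[0]
	p, ok := r.Query.Pages[id]
	if !ok {
		return "", fmt.Errorf("page %s not found in pages", id)
	}
	return p.EditToken, nil
}

func main() {
	body := []byte(`{"query": {"pageids": ["66984"], "pages": {"66984": {"pageid": 66984, "title": "Main Page", "edittoken": "bd7d4a61cc4ce6489e68c21259e6e416+\\"}}}}`)
	token, err := firstEditToken(body)
	if err != nil {
		panic(err)
	}
	fmt.Println(token)
}
```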