How to extract a field from a protobuf message without a schema - Go

According to this issue, the protoreflect package provides APIs for accessing the "unknown fields" of protobuf messages, but I don't see any way to use it if there isn't any existing schema. Basically, I want to perform a "weak decode", similar to what the JSON unmarshaller does if the output is of type map[string]interface{}.
The example from the documentation looks like this:
err := UnmarshalOptions{DiscardUnknown: true}.Unmarshal(b, m)
where b is the input byte slice and m is the output message, which needs to be initialised somehow, as you can see here. I was thinking that dynamicpb can be used for this purpose, but it doesn't look possible without having an existing MessageDescriptor and that's where I got stuck...

I was able to achieve this using the low level protowire package. Here is a full example, where I extract two fields of type uint64 (which happen to be assigned field numbers 4 and 5 in the original schema):
import "google.golang.org/protobuf/encoding/protowire"
func getData(src []byte) (creationTime, expiryTime uint64, err error) {
remaining := src
for len(remaining) > 0 {
fieldNum, wireType, n := protowire.ConsumeTag(remaining)
if n < 0 {
return 0, 0, fmt.Errorf("failed to consume tag: %w", protowire.ParseError(n))
}
remaining = remaining[n:]
switch fieldNum {
case 4: // Expiry time
if wireType != protowire.VarintType {
return 0, 0, fmt.Errorf("unexpected type for expiry time field: %d", wireType)
}
expiryTime, n = protowire.ConsumeVarint(remaining)
case 5: // Creation time
if wireType != protowire.VarintType {
return 0, 0, fmt.Errorf("unexpected type for creation time field: %d", wireType)
}
creationTime, n = protowire.ConsumeVarint(remaining)
default:
n = protowire.ConsumeFieldValue(fieldNum, wireType, remaining)
}
if n < 0 {
return 0, 0, fmt.Errorf("failed to consume value for field %d: %w", fieldNum, protowire.ParseError(n))
}
remaining = remaining[n:]
}
return
}
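To see it work end to end, here is a quick round-trip sketch (the timestamp values are made up for illustration): it encodes fields 4 and 5 with the same protowire package, plus an unrelated field that getData should skip, and then decodes them:
func main() {
	var b []byte
	b = protowire.AppendTag(b, 1, protowire.BytesType) // unrelated field, skipped by getData
	b = protowire.AppendBytes(b, []byte("ignored"))
	b = protowire.AppendTag(b, 4, protowire.VarintType) // expiry time
	b = protowire.AppendVarint(b, 1700000000)
	b = protowire.AppendTag(b, 5, protowire.VarintType) // creation time
	b = protowire.AppendVarint(b, 1600000000)

	creation, expiry, err := getData(b)
	if err != nil {
		panic(err)
	}
	fmt.Println(creation, expiry) // 1600000000 1700000000
}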

Related

How to get column data from golang apache-arrow?

I am using apache-arrow/go to read parquet data.
I can parse the data into a table using apache-arrow.
reader, err := ipc.NewReader(buf, ipc.WithAllocator(alloc))
if err != nil {
	log.Println(err.Error())
	return nil
}
defer reader.Release()

records := make([]array.Record, 0)
for reader.Next() {
	rec := reader.Record()
	rec.Retain()
	defer rec.Release()
	records = append(records, rec)
}
table := array.NewTableFromRecords(reader.Schema(), records)
Here I can get the column info from table.Column(index), such as:
for i := range table.Schema().Fields() {
	a := table.Column(i)
	log.Println(a)
}
But the Column struct is defined as
type Column struct {
	field arrow.Field
	data  *Chunked
}
and the println result looks like
["WARN" "WARN" "WARN" "WARN" "WARN" "WARN" "WARN" "WARN" "WARN" "WARN"]
However, this is not a string or a slice. Is there any way I can get the data of each column as a string or as []interface{}?
Update:
I found that I can use a type assertion to get an element from the column:
log.Println(col.(*array.Int64).Value(0))
But I am not sure if this is the recommended way to use it.
When working with Arrow data, there are a couple of concepts to understand:
Array: Metadata + contiguous buffers of data
Record Batch: A schema + a collection of Arrays that are all the same length.
Chunked Array: A group of Arrays of varying lengths but all the same data type. This allows you to treat multiple Arrays as one single column of data without having to copy them all into a contiguous buffer.
Column: just a Field + a Chunked Array
Table: A collection of Columns allowing you to treat multiple non-contiguous arrays as a single large table without having to copy them all into contiguous buffers.
In your case, you're reading multiple record batches (groups of contiguous Arrays) and treating them as a single large table. There are a few different ways you can work with the data:
One way is to use a TableReader:
tr := array.NewTableReader(tbl, 5)
defer tr.Release()

for tr.Next() {
	rec := tr.Record()
	for _, col := range rec.Columns() {
		// do something with col (an arrow.Array)
	}
}
Another way would be to interact with the columns directly as you were in your example:
for i := 0; i < int(table.NumCols()); i++ {
	col := table.Column(i)
	for _, chunk := range col.Data().Chunks() {
		// do something with chunk (an arrow.Array)
	}
}
Either way, you eventually have an arrow.Array to deal with, which is an interface containing one of the typed Array types. At this point you are going to have to switch on something; you could type-switch on the Array itself:
switch arr := col.(type) {
case *array.Int64:
	// do stuff with arr
case *array.Int32:
	// do stuff with arr
case *array.String:
	// do stuff with arr
...
}
Alternatively, you could switch on the data type:
switch col.DataType().ID() {
case arrow.INT64:
	// type assertion needed: col.(*array.Int64)
case arrow.INT32:
	// type assertion needed: col.(*array.Int32)
...
}
For getting the data out of the array, primitive types which are stored contiguously tend to have a *Values method which will return a slice of the type. For example array.Int64 has Int64Values() which returns []int64. Otherwise, all of the types have .Value(int) methods which return the value at a particular index as you showed in your example.
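To make that concrete, here's a minimal sketch combining both ideas (the helper name arrayToAny is my own, only two types are handled, null checks are omitted, and the import paths assume the v9 module mentioned in the next answer):
// arrayToAny copies the values of an arrow.Array into a []interface{}.
func arrayToAny(arr arrow.Array) []interface{} {
	out := make([]interface{}, arr.Len())
	switch a := arr.(type) {
	case *array.Int64:
		// Contiguous primitive storage exposes a bulk accessor.
		for i, v := range a.Int64Values() {
			out[i] = v
		}
	case *array.String:
		// Non-primitive types use the per-index Value method.
		for i := 0; i < a.Len(); i++ {
			out[i] = a.Value(i)
		}
	default:
		// The arrow.Array interface has no generic per-element
		// getter; extend the switch for other concrete types.
	}
	return out
}
Calling arrayToAny(chunk) inside either loop above gives you a []interface{} per chunk.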
Hope this helps!
Make sure you use v9
(import "github.com/apache/arrow/go/v9/arrow"), because it implements json.Marshaler (via go-json).
Use "github.com/goccy/go-json" for the Marshaler (because of this).
Then you can use a TableReader to Marshal the columns and Unmarshal the result into []any.
In your example it might look like this:
import (
	"github.com/apache/arrow/go/v9/arrow"
	"github.com/apache/arrow/go/v9/arrow/array"
	"github.com/apache/arrow/go/v9/arrow/memory"
	"github.com/goccy/go-json"
)

...

tr := array.NewTableReader(tabel, 6)
defer tr.Release()

// fmt.Printf("tabel.NumRows() = %+v\n", tabel.NumRows())
// fmt.Printf("tabel.NumCols() = %+v\n", tabel.NumCols())

// keySlice keeps the column order of the data source.
keySlice := make([]string, 0, tabel.NumCols())
res := make(map[string][]any, 0)
var key string
for tr.Next() {
	rec := tr.Record()
	for i, col := range rec.Columns() {
		key = rec.ColumnName(i)
		if res[key] == nil {
			res[key] = make([]any, 0)
			keySlice = append(keySlice, key)
		}
		// Marshal the column to JSON, then unmarshal into []any.
		var tmp []any
		b2, err := json.Marshal(col)
		if err != nil {
			panic(err)
		}
		if err := json.Unmarshal(b2, &tmp); err != nil {
			panic(err)
		}
		res[key] = append(res[key], tmp...)
	}
}
fmt.Println("res", res)

Size control when logging an unknown number of parameters

The Problem:
Right now I'm logging my SQL query and the args related to that query, but what happens if my args weigh a lot? Say, 100MB?
The Solution:
I want to iterate over the args and, once they exceed 0.5MB, take the args up to that point and log only those (of course, I'll use the entire args set in the actual SQL query).
Where I'm stuck:
I find it hard to determine the size on disk of an interface{}.
How can I print it? (Is there a nicer way to do it than %v?)
The concern is mainly the first part: how can I find the size? I'd need to know the type, whether it's an array, on the stack or the heap, etc.
If code helps, here is my code structure (everything sits in the dal package, in a util file):
package dal

import (
	"fmt"
)

const limitedLogArgsSizeB = 100000 // ~ 0.1MB

func parsedArgs(args ...interface{}) string {
	currentSize := 0
	var res string
	for i := 0; i < len(args); i++ {
		currentEleSize := getSizeOfElement(args[i])
		if currentSize+currentEleSize > limitedLogArgsSizeB {
			break
		}
		currentSize += currentEleSize
		res = fmt.Sprintf("%s, %v", res, args[i])
	}
	return "[" + res + "]"
}

func getSizeOfElement(e interface{}) (sizeInBytes int) {
	// this is the part I'm missing
	return
}
So as you can see I expect to get back from parsedArgs() a string that looks like:
"[4378233, 33, true]"
for completeness, the query that goes with it:
INSERT INTO Person (id,age,is_healthy) VALUES ($0,$1,$2)
so to demonstrate the point of all of this:
lets say the first two args are equal exactly to the threshold of the size limit that I want to log, I will only get back from the parsedArgs() the first two args as a string like this:
"[4378233, 33]"
I can provide further details upon request, Thanks :)
Getting the memory size of arbitrary values (arbitrary data structures) is not impossible but "hard" in Go. For details, see How to get memory size of variable in Go?
The easiest solution could be to produce the data to be logged in memory, and simply truncate it before logging (e.g. if it's a string or a byte slice, just slice it). This is, however, not the most efficient solution (slower, and requires more memory).
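For illustration, a minimal sketch of that simple approach, assuming args is the []interface{} from your parsedArgs (limit value and variable names are mine):
s := fmt.Sprintf("%v", args)
const limit = 512 * 1024 // ~0.5MB, per the question
if len(s) > limit {
	s = s[:limit] // may cut a multi-byte rune in half; usually fine for a log
}
log.Println(s)
Note that this formats everything up front, even the part that gets thrown away.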
Instead I would achieve what you want differently. I would try to assemble the data to be logged, but I would use a special io.Writer as the target (which may be targeted at your disk or at an in-memory buffer) which keeps track of the bytes written to it, and once a limit is reached, it could discard further data (or report an error, whatever suits you).
You can see a counting io.Writer implementation here: Size in bits of object encoded to JSON?
type CounterWr struct {
	io.Writer
	Count int
}

func (cw *CounterWr) Write(p []byte) (n int, err error) {
	n, err = cw.Writer.Write(p)
	cw.Count += n
	return
}
We can easily change it to become a functional limited-writer:
type LimitWriter struct {
	io.Writer
	Remaining int
}

func (lw *LimitWriter) Write(p []byte) (n int, err error) {
	if lw.Remaining == 0 {
		return 0, io.EOF
	}
	if lw.Remaining < len(p) {
		p = p[:lw.Remaining]
	}
	n, err = lw.Writer.Write(p)
	lw.Remaining -= n
	return
}
And you can use the fmt.FprintXXX() functions to write into a value of this LimitWriter.
An example writing to an in-memory buffer:
buf := &bytes.Buffer{}
lw := &LimitWriter{
	Writer:    buf,
	Remaining: 20,
}

args := []interface{}{1, 2, "Looooooooooooong"}
fmt.Fprint(lw, args)
fmt.Printf("%d %q", buf.Len(), buf)
This will output (try it on the Go Playground):
20 "[1 2 Looooooooooooon"
As you can see, our LimitWriter only allowed 20 bytes (LimitWriter.Remaining) to be written, and the rest were discarded.
Note that in this example I assembled the data in an in-memory buffer, but in your logging system you can write directly to your logging stream, just wrap it in LimitWriter (so you can completely omit the in-memory buffer).
Optimization tip: if you have the arguments as a slice, you may optimize the truncated rendering by using a loop, and stop printing arguments once the limit is reached.
An example doing this:
buf := &bytes.Buffer{}
lw := &LimitWriter{
	Writer:    buf,
	Remaining: 20,
}

args := []interface{}{1, 2, "Loooooooooooooooong", 3, 4, 5}
io.WriteString(lw, "[")
for i, v := range args {
	if _, err := fmt.Fprint(lw, v, " "); err != nil {
		fmt.Printf("Breaking at argument %d, err: %v\n", i, err)
		break
	}
}
io.WriteString(lw, "]")
fmt.Printf("%d %q", buf.Len(), buf)
Output (try it on the Go Playground):
Breaking at argument 3, err: EOF
20 "[1 2 Loooooooooooooo"
The good thing about this is that once we reach the limit, we don't have to produce the string representation of the remaining arguments that would be discarded anyway, saving some CPU (and memory) resources.

Get data from Twitter Library search into a struct in Go

How do I append the output from a Twitter search to the Data field in the SearchTwitterOutput{} struct?
Thanks!
I am using a Twitter library to search Twitter based on a query input. The search returns an array of strings (I believe); I am able to fmt.Println the data, but I need the data as a struct.
type SearchTwitterOutput struct {
	Data string
}

func (SearchTwitter) execute(input SearchTwitterInput) (*SearchTwitterOutput, error) {
	credentials := Credentials{
		AccessToken:       input.AccessToken,
		AccessTokenSecret: input.AccessTokenSecret,
		ConsumerKey:       input.ConsumerKey,
		ConsumerSecret:    input.ConsumerSecret,
	}
	client, err := GetUserClient(&credentials)
	if err != nil {
		return nil, err
	}
	// search through the tweet and returns a
	search, _, err := client.Search.Tweets(&twitter.SearchTweetParams{
		Query: input.Text,
	})
	if err != nil {
		println("PANIC")
		panic(err.Error())
		return &SearchTwitterOutput{}, err
	}
	for k, v := range search.Statuses {
		fmt.Printf("Tweet %d - %s\n", k, v.Text)
	}
	return &SearchTwitterOutput{
		Data: "test", // data is a string for now; it can be anything
	}, nil
}
// Data field is a string type for now; it can be anything.
// I use "test" as a placeholder, bc IDK...
Result from fmt.Printf("Tweet %d - %s\n", k, v.Text):
Tweet 0 - You know I had to do it to them! #JennaJulien #Jenna_Marbles #juliensolomita #notjulen Got my first hydroflask ever…
Tweet 1 - RT #brenna_hinshaw: I was in J2 today and watched someone fill their hydroflask with vanilla soft serve... what starts here changes the wor…
Tweet 2 - I miss my hydroflask :(
This is my second week working with Go and I am new to development. Any help would be great.
It doesn't look like the client is just returning you a slice of strings. The range syntax you're using (for k, v := range search.Statuses) returns two values for each iteration, the index in the slice (in this case k), and the object from the slice (in this case v). I don't know the type of search.Statuses - but I know that strings don't have a .Text field or method, which is how you're printing v currently.
To your question:
Is there any particular reason to return just a single struct with a Data field rather than directly returning the output of the twitter client?
Your function signature could look like this instead:
func (SearchTwitter) execute(input SearchTwitterInput) ([]<client response struct>, error)
And then you could operate on the text in those objects wherever this function is called.
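For instance, assuming the commonly used dghubble/go-twitter client (which the three-value return of client.Search.Tweets suggests), search.Statuses is a []twitter.Tweet that you could return directly; a sketch:
func (SearchTwitter) execute(input SearchTwitterInput) ([]twitter.Tweet, error) {
	// ... credential and client setup as in the question ...
	search, _, err := client.Search.Tweets(&twitter.SearchTweetParams{
		Query: input.Text,
	})
	if err != nil {
		return nil, err
	}
	return search.Statuses, nil // each Tweet carries its Text field
}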
If you're dead-set on placing the data in your own struct, you could return a slice of them ([]*SearchTwitterOutput), in which case you could build a single SearchTwitterOutput in the for loop you're currently printing the tweets in and append it to the output list. That might look like this:
var output []*SearchTwitterOutput
for k, v := range search.Statuses {
	fmt.Printf("Tweet %d - %s\n", k, v.Text)
	output = append(output, &SearchTwitterOutput{
		Data: v.Text,
	})
}
return output, nil
But if your goal really is to return all of the results concatenated together and placed inside a single struct, I would suggest building a slice of strings (containing the text you want), and then joining them with the delimiter of your choosing. Then you could place the single output string in your return object, which might look something like this:
var outputStrings []string
for k, v := range search.Statuses {
	fmt.Printf("Tweet %d - %s\n", k, v.Text)
	outputStrings = append(outputStrings, v.Text)
}
output := strings.Join(outputStrings, ",")
return &SearchTwitterOutput{
	Data: output,
}, nil
Though I would caution: it might be tricky to find a delimiter that will never show up in a tweet.
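One way to sidestep the delimiter problem entirely (a sketch, assuming a JSON-encoded Data string is acceptable downstream and encoding/json is imported) is to marshal the slice of texts instead of joining them:
b, err := json.Marshal(outputStrings) // e.g. ["tweet one","tweet two"]
if err != nil {
	return nil, err
}
return &SearchTwitterOutput{Data: string(b)}, nil
json.Unmarshal can then recover the individual tweets losslessly.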

Get current position in stream from net/html tokenizer

I'm trying to figure out if there's a way to get the current character position of a tag using the golang.org/x/net/html tokenizer library?
Simplified code looks like:
func LookForForm(body string) {
	reader := strings.NewReader(body)
	tokenizer := html.NewTokenizer(reader)
	idx := 0
	lastIdx := 0
	for {
		token := tokenizer.Next()
		lastIdx = idx
		idx = int(reader.Size()) - int(reader.Len())
		switch token {
		case html.ErrorToken:
			return
		case html.StartTagToken:
			t := tokenizer.Token()
			tagName := strings.ToLower(t.Data)
			if tagName == "form" {
				fmt.Printf("found form at %d\n", lastIdx)
				return
			}
		}
	}
}
This doesn't work (I think) because the reader is not read character-by-character but in chunks, so my Size - Len calculation is invalid. The tokenizer maintains two private span structs (https://github.com/golang/net/blob/master/html/token.go, line 147), but I don't know how to access them.
One possible solution that just occurred to me is to make a "reader" that only reads a single character at a time, so my Size and Len calculations are always correct. But that seems like a hack, and any suggestions would be appreciated.
You might be able to accomplish what you are trying to do (though not quite what you want) with careful arithmetic using the Tokenizer's Buffered method, which returns the slice of bytes currently buffered but not yet tokenized. But I don't think you will get what you want, as <div><form></form></div> would probably buffer the whole string before giving you the first div token. In that case the size of the buffered content doesn't help you compute the position.
Tokenizing a markup language with nested structure will almost always require buffering the input. The private span attribute is of little use anyway: it is only an offset into the tokenizer's buffer, not an absolute position in the reader.
Since the html Tokenizer does not provide an API for the raw position of a tag in the original data, to get what you want I would probably just do a strings.Index or bytes.Index on the raw bytes of the token to find its position:
strings.Index(body, string(tokenizer.Raw()))
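Note that strings.Index finds the first occurrence, so identical tokens (repeated text or tags) can match an earlier position. A sketch of this approach that advances a search offset to avoid that (my own elaboration, not part of the original answer):
offset := 0
for {
	tt := tokenizer.Next()
	if tt == html.ErrorToken {
		break
	}
	raw := string(tokenizer.Raw()) // raw bytes of the current token
	pos := strings.Index(body[offset:], raw)
	if pos < 0 {
		break // should not happen when body is the tokenized input
	}
	pos += offset
	offset = pos + len(raw)
	if tt == html.StartTagToken {
		if name, _ := tokenizer.TagName(); strings.EqualFold(string(name), "form") {
			fmt.Printf("found form at %d\n", pos)
			return
		}
	}
}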
A non-buffering reader ended up working ok for me. The implementation of the reader looks something like:
package rule

import (
	"io"
)

// Reader reads a string one byte at a time, so the tokenizer's
// consumption always matches the reader's position exactly.
type Reader struct {
	s        string
	i        int64
	z        int64
	prevRune int64 // index of the previously read rune or -1
}

// NewReader is the constructor used in the loop below.
func NewReader(s string) *Reader {
	return &Reader{s: s, z: int64(len(s)), prevRune: -1}
}

func (r *Reader) String() string {
	return r.s
}

func (r *Reader) Len() int {
	if r.i >= r.z {
		return 0
	}
	return int(r.z - r.i)
}

func (r *Reader) Size() int64 {
	return r.z
}

func (r *Reader) Pos() int64 {
	return r.i
}

func (r *Reader) Read(b []byte) (int, error) {
	if r.i >= r.z {
		return 0, io.EOF
	}
	r.prevRune = -1
	b[0] = r.s[r.i] // deliver exactly one byte per call
	r.i++
	return 1, nil
}
Then the position arithmetic in the tokenizer loop is straightforward:
reader := NewReader(body)
tokenizer := html.NewTokenizer(reader)
lastIdx := 0

tokenLoop:
for {
	token := tokenizer.Next()
	switch token {
	case html.ErrorToken:
		break tokenLoop
	case html.EndTagToken, html.TextToken, html.CommentToken, html.SelfClosingTagToken:
		lastIdx = int(reader.Pos())
	case html.StartTagToken:
		t := tokenizer.Token()
		tagName := strings.ToLower(t.Data)
		if tagName == "form" {
			fmt.Printf("found form at %d\n", lastIdx)
			return
		}
		// Track the position for non-form start tags too, so lastIdx
		// always points at the start of the next token.
		lastIdx = int(reader.Pos())
	}
}

In the Go language, how do I unmarshal JSON into an array of objects?

I have the following JSON, and I want to parse it into an array of objects:
{
"1001": {"level":10, "monster-id": 1001, "skill-level": 1, "aimer-id": 301}
"1002": {"level":12, "monster-id": 1002, "skill-level": 1, "aimer-id": 302}
"1003": {"level":16, "monster-id": 1003, "skill-level": 2, "aimer-id": 303}
}
Here is what I tried, but it failed:
type Monster struct {
	MonsterId  int32
	Level      int32
	SkillLevel int32
	AimerId    int32
}

type MonsterCollection struct {
	Pool map[string]Monster
}

func (mc *MonsterCollection) FromJson(jsonStr string) {
	var data interface{}
	b := []byte(jsonStr)
	err := json.Unmarshal(b, &data)
	if err != nil {
		return
	}
	m := data.(map[string]interface{})
	i := 0
	for k, v := range m {
		monster := new(Monster)
		monster.Level = v["level"]
		monster.MonsterId = v["monster-id"]
		monster.SkillLevel = v["skill-level"]
		monster.AimerId = v["aimer-id"]
		mc.Pool[i] = monster
		i++
	}
}
The compiler complains about v["level"]:
invalid operation: index of type interface{}
This code has many errors in it. To start with, the JSON isn't valid JSON: you are missing the commas between the key pairs in your top-level object. I added the commas and pretty-printed it for you:
{
	"1001": {
		"level": 10,
		"monster-id": 1001,
		"skill-level": 1,
		"aimer-id": 301
	},
	"1002": {
		"level": 12,
		"monster-id": 1002,
		"skill-level": 1,
		"aimer-id": 302
	},
	"1003": {
		"level": 16,
		"monster-id": 1003,
		"skill-level": 2,
		"aimer-id": 303
	}
}
Your next problem (the one you asked about) is that m := data.(map[string]interface{}) makes m a map[string]interface{}. That means when you index it, as with v in your range loop, the type is interface{}. You need to type-assert it again with v.(map[string]interface{}), and then type-assert each value you read from that map.
I also notice that you then attempt mc.Pool[i] = monster, where i is an int and mc.Pool is a map[string]Monster. An int is not a valid key for that map.
Your data looks very rigid, so I would let Unmarshal do most of the work for you. Instead of providing it a map[string]interface{}, you can provide it a map[string]Monster.
Here is a quick example. As well as changing how the unmarshalling works, I also added an error return. The error return is useful for finding bugs. That error return is what told me you had invalid json.
type Monster struct {
	MonsterId  int32 `json:"monster-id"`
	Level      int32 `json:"level"`
	SkillLevel int32 `json:"skill-level"`
	AimerId    int32 `json:"aimer-id"`
}

type MonsterCollection struct {
	Pool map[string]Monster
}

func (mc *MonsterCollection) FromJson(jsonStr string) error {
	var data = &mc.Pool
	b := []byte(jsonStr)
	return json.Unmarshal(b, data)
}
I posted a working example to goplay: http://play.golang.org/p/4EaasS2VLL
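For reference, a quick usage sketch (with the commas fixed and the input abbreviated to two monsters; assumes the usual fmt and log imports). json.Unmarshal allocates the nil Pool map for you:
func main() {
	jsonStr := `{
		"1001": {"level":10, "monster-id":1001, "skill-level":1, "aimer-id":301},
		"1002": {"level":12, "monster-id":1002, "skill-level":1, "aimer-id":302}
	}`
	var mc MonsterCollection
	if err := mc.FromJson(jsonStr); err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%+v\n", mc.Pool["1002"])
	// Output: {MonsterId:1002 Level:12 SkillLevel:1 AimerId:302}
}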
Slightly off to one side: you asked for an array of objects when you actually needed a map.
If you do need an array (actually a slice), see:
http://ioblocks.blogspot.com/2014/09/loading-arrayslice-of-objects-from-json.html
