How to get columns data from golang apache-arrow? - go

I am using apache-arrow/go to read parquet data.
I can parse the data into a table using apache-arrow:
reader, err := ipc.NewReader(buf, ipc.WithAllocator(alloc))
if err != nil {
log.Println(err.Error())
return nil
}
defer reader.Release()
records := make([]array.Record, 0)
for reader.Next() {
rec := reader.Record()
rec.Retain()
defer rec.Release()
records = append(records, rec)
}
table := array.NewTableFromRecords(reader.Schema(), records)
Here, I can get the column info from table.Column(index), such as:
for i, _ := range table.Schema().Fields() {
a := table.Column(i)
log.Println(a)
}
But the Column struct is defined as
type Column struct {
field arrow.Field
data *Chunked
}
and the println result is like
["WARN" "WARN" "WARN" "WARN" "WARN" "WARN" "WARN" "WARN" "WARN" "WARN"]
However, this is not a string or a slice. Is there any way I can get the data of each column as a string or []interface{}?
Update:
I found that I can use a type assertion to get an element from col:
log.Println(col.(*array.Int64).Value(0))
But I am not sure if this is the recommended way to use it.

When working with Arrow data, there are a few concepts to understand:
Array: Metadata + contiguous buffers of data
Record Batch: A schema + a collection of Arrays that are all the same length.
Chunked Array: A group of Arrays of varying lengths but all the same data type. This allows you to treat multiple Arrays as one single column of data without having to copy them all into a contiguous buffer.
Column: Is just a Field + a Chunked Array
Table: A collection of Columns allowing you to treat multiple non-contiguous arrays as a single large table without having to copy them all into contiguous buffers.
In your case, you're reading multiple record batches (groups of contiguous Arrays) and treating them as a single large table. There are a few different ways you can work with the data:
One way is to use a TableReader:
tr := array.NewTableReader(tbl, 5)
defer tr.Release()
for tr.Next() {
rec := tr.Record()
for i, col := range rec.Columns() {
// do something with the Array
}
}
Another way would be to interact with the columns directly as you were in your example:
for i := 0; i < table.NumCols(); i++ {
col := table.Column(i)
for _, chunk := range col.Data().Chunks() {
// do something with chunk (an arrow.Array)
}
}
Either way, you eventually have an arrow.Array to deal with, which is an interface containing one of the typed Array types. At this point you are going to have to switch on something. You could type switch on the type of the Array itself:
switch arr := col.(type) {
case *array.Int64:
// do stuff with arr
case *array.Int32:
// do stuff with arr
case *array.String:
// do stuff with arr
...
}
Alternatively, you could switch on the ID of the data type:
switch col.DataType().ID() {
case arrow.INT64:
// type assertion needed col.(*array.Int64)
case arrow.INT32:
// type assertion needed col.(*array.Int32)
...
}
For getting the data out of the array, primitive types which are stored contiguously tend to have a *Values method which will return a slice of the type. For example array.Int64 has Int64Values() which returns []int64. Otherwise, all of the types have .Value(int) methods which return the value at a particular index as you showed in your example.
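To make the "type switch, then call Value(i)" pattern concrete, here is a minimal sketch that flattens the chunks of one column into a []interface{}. So that it runs without the Arrow module, it uses hypothetical stand-in types (chunk, int64Chunk, stringChunk) in place of arrow.Array, *array.Int64 and *array.String; only the Len() and Value(i) methods described above are assumed, and null handling is left out.

```go
package main

import "fmt"

// chunk is a hypothetical stand-in for arrow.Array; only Len is assumed here.
type chunk interface{ Len() int }

// int64Chunk and stringChunk are hypothetical stand-ins for
// *array.Int64 and *array.String.
type int64Chunk struct{ vals []int64 }
type stringChunk struct{ vals []string }

func (c *int64Chunk) Len() int           { return len(c.vals) }
func (c *int64Chunk) Value(i int) int64  { return c.vals[i] }
func (c *stringChunk) Len() int          { return len(c.vals) }
func (c *stringChunk) Value(i int) string { return c.vals[i] }

// columnValues walks every chunk of a column, type switches on the concrete
// chunk type, and collects the values into one []interface{}.
func columnValues(chunks []chunk) ([]interface{}, error) {
	var out []interface{}
	for _, ch := range chunks {
		switch arr := ch.(type) {
		case *int64Chunk:
			for i := 0; i < arr.Len(); i++ {
				out = append(out, arr.Value(i))
			}
		case *stringChunk:
			for i := 0; i < arr.Len(); i++ {
				out = append(out, arr.Value(i))
			}
		default:
			return nil, fmt.Errorf("unsupported chunk type %T", ch)
		}
	}
	return out, nil
}

func main() {
	col := []chunk{
		&stringChunk{[]string{"WARN", "WARN"}},
		&stringChunk{[]string{"INFO"}},
	}
	vals, err := columnValues(col)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(vals) // [WARN WARN INFO]
}
```

With the real library, the same loop would range over col.Data().Chunks() and switch on the array types shown earlier.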
Hope this helps!

Make sure you use v9
(import "github.com/apache/arrow/go/v9/arrow"), because its arrays implement json.Marshaler (from go-json).
Use "github.com/goccy/go-json" as the marshaler for the same reason.
Then you can use a TableReader to marshal each column and unmarshal it into a []any.
For your example, it might look like this:
import (
"fmt"
"github.com/apache/arrow/go/v9/arrow"
"github.com/apache/arrow/go/v9/arrow/array"
"github.com/apache/arrow/go/v9/arrow/memory"
"github.com/goccy/go-json"
)
...
tr := array.NewTableReader(tabel, 6)
defer tr.Release()
// fmt.Printf("tbl.NumRows() = %+v\n", tbl.NumRows())
// fmt.Printf("tbl.NumColumn = %+v\n", tbl.NumCols())
// keySlice is for sorting same as data source
keySlice := make([]string, 0, tabel.NumCols())
res := make(map[string][]any, 0)
var key string
for tr.Next() {
rec := tr.Record()
for i, col := range rec.Columns() {
key = rec.ColumnName(i)
if res[key] == nil {
res[key] = make([]any, 0)
keySlice = append(keySlice, key)
}
var tmp []any
b2, err := json.Marshal(col)
if err != nil {
panic(err)
}
err = json.Unmarshal(b2, &tmp)
if err != nil {
panic(err)
}
// fmt.Printf("key = %s\n", key)
// fmt.Printf("tmp = %+v\n", tmp)
res[key] = append(res[key], tmp...)
}
}
fmt.Println("res", res)

Related

Append to golang slice passed as empty interface

How to append to empty interface (that has been verified to be a *[]struct)?
func main() {
var mySlice []myStruct // myStruct can be any struct (dynamic)
decode(&mySlice, "...")
}
func decode(dest interface{}, src string) {
// assume dest has been verified to be *[]struct
var modelType reflect.Type = getStructType(dest)
rows, fields := getRows(src)
for _, row := range rows {
// create new struct of type modelType and assign all fields
model := reflect.New(modelType)
for _, field := range fields {
fieldValue := getRowValue(row, field)
model.Elem().FieldByName(field).Set(fieldValue)
}
castedModelRow := model.Elem().Interface()
// append model to dest; how to do this?
// dest = append(dest, castedModelRow)
}
}
Things I've tried:
This simply panics: reflect: call of reflect.Append on ptr Value (as we pass &mySlice instead of mySlice)
dest = reflect.Append(reflect.ValueOf(dest), reflect.ValueOf(castedModelRow))
This works but doesn't set the value back to dest... in main func, len(mySlice) remains 0 after decode function is called.
func decode(dest interface{}, src string) {
...
result := reflect.MakeSlice(reflect.SliceOf(modelType), rowCount, rowCount)
for _, row := range rows {
...
result = reflect.Append(result, reflect.ValueOf(castedModelRow))
}
dest = reflect.ValueOf(result)
}
Here's how to fix the second decode function shown in the question. The statement
dest = reflect.ValueOf(result)
modifies local variable dest, not the caller's value. Use the following statement to modify the caller's slice:
reflect.ValueOf(dest).Elem().Set(result)
The code in the question appends decoded elements after the elements created in reflect.MakeSlice. The resulting slice has len(rows) zero values followed by len(rows) decoded values. Fix by changing
result = reflect.Append(result, reflect.ValueOf(castedModelRow))
to:
result.Index(i).Set(model)
Here's the updated version of the second decode function in the question:
func decode(dest interface{}, src string) {
var modelType reflect.Type = getStructType(dest)
rows, fields := getRows(src)
result := reflect.MakeSlice(reflect.SliceOf(modelType), len(rows), len(rows))
for i, row := range rows {
model := reflect.New(modelType).Elem()
for _, field := range fields {
fieldValue := getRowValue(row, field)
model.FieldByName(field).Set(fieldValue)
}
result.Index(i).Set(model)
}
reflect.ValueOf(dest).Elem().Set(result)
}
Run it on the Playground.
You were very close with your original solution. You just had to dereference the pointer before calling the append operation. This approach is helpful if dest already has some elements and you don't want to lose them by creating a new slice.
tempDest := reflect.ValueOf(dest).Elem()
tempDest = reflect.Append(tempDest, reflect.ValueOf(model.Interface()))
As @I Love Reflection pointed out, you finally need to set the new slice back through the pointer.
reflect.ValueOf(dest).Elem().Set(tempDest)
Overall Decode:
var modelType reflect.Type = getStructType(dest)
rows, fields := getRows(src)
tempDest := reflect.ValueOf(dest).Elem()
for _, row := range rows {
model := reflect.New(modelType).Elem()
for _, field := range fields {
fieldValue := getRowValue(row, field)
model.FieldByName(field).Set(fieldValue)
}
tempDest = reflect.Append(tempDest, reflect.ValueOf(model.Interface()))
}
reflect.ValueOf(dest).Elem().Set(tempDest)
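For reference, here is a self-contained sketch of the same dereference, append, set-back sequence that can be run as-is. The user type, the appendDecoded name, and the map-based rows are assumptions made for illustration; they stand in for the question's getStructType, getRows and getRowValue helpers, which are not shown.

```go
package main

import (
	"fmt"
	"reflect"
)

// user is a hypothetical struct used only for this example.
type user struct {
	Name string
	Age  int
}

// appendDecoded appends one new element per row to the slice dest points to.
// dest must be a pointer to a slice of structs; each row maps field names to values.
func appendDecoded(dest interface{}, rows []map[string]interface{}) error {
	ptr := reflect.ValueOf(dest)
	if ptr.Kind() != reflect.Ptr || ptr.Elem().Kind() != reflect.Slice {
		return fmt.Errorf("dest must be a pointer to a slice, got %T", dest)
	}
	slice := ptr.Elem() // dereference: this is the caller's slice
	elemType := slice.Type().Elem()
	if elemType.Kind() != reflect.Struct {
		return fmt.Errorf("slice element must be a struct, got %s", elemType)
	}
	for _, row := range rows {
		elem := reflect.New(elemType).Elem() // addressable zero struct
		for name, v := range row {
			f := elem.FieldByName(name)
			val := reflect.ValueOf(v)
			// Skip unknown fields, unexported fields, nil values and type mismatches.
			if !f.IsValid() || !f.CanSet() || !val.IsValid() || !val.Type().AssignableTo(f.Type()) {
				continue
			}
			f.Set(val)
		}
		slice = reflect.Append(slice, elem)
	}
	ptr.Elem().Set(slice) // write the grown slice back through the pointer
	return nil
}

func main() {
	users := []user{{Name: "existing", Age: 1}}
	rows := []map[string]interface{}{
		{"Name": "Ann", "Age": 30},
		{"Name": "Bob", "Age": 41},
	}
	if err := appendDecoded(&users, rows); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(users) // [{existing 1} {Ann 30} {Bob 41}]
}
```

The important line is the final ptr.Elem().Set(slice): reflect.Append returns a new slice value, so without it the caller never sees the appended elements.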

Get data from Twitter Library search into a struct in Go

How do I append the output from a Twitter search to the Data field of the SearchTwitterOutput{} struct?
Thanks!
I am using a Twitter library to search Twitter based on a query input. The search returns an array of strings (I believe). I am able to fmt.Println the data, but I need the data as a struct.
type SearchTwitterOutput struct {
Data string
}
func (SearchTwitter) execute(input SearchTwitterInput) (*SearchTwitterOutput, error) {
credentials := Credentials{
AccessToken: input.AccessToken,
AccessTokenSecret: input.AccessTokenSecret,
ConsumerKey: input.ConsumerKey,
ConsumerSecret: input.ConsumerSecret,
}
client, err := GetUserClient(&credentials)
if err != nil {
return nil, err
}
// search through the tweet and returns a
search, _ , err := client.Search.Tweets(&twitter.SearchTweetParams{
Query: input.Text,
})
if err != nil {
println("PANIC")
panic(err.Error())
return &SearchTwitterOutput{}, err
}
for k, v := range search.Statuses {
fmt.Printf("Tweet %d - %s\n", k, v.Text)
}
return &SearchTwitterOutput{
Data: "test", //data is a string for now it can be anything
}, nil
}
//Data field is a string type for now it can be anything
//I use "test" as a placeholder, bc IDK...
Result from fmt.Printf("Tweet %d - %s\n", k, v.Text):
Tweet 0 - You know I had to do it to them! #JennaJulien #Jenna_Marbles #juliensolomita #notjulen Got my first hydroflask ever…
Tweet 1 - RT #brenna_hinshaw: I was in J2 today and watched someone fill their hydroflask with vanilla soft serve... what starts here changes the wor…
Tweet 2 - I miss my hydroflask :(
This is my second week working with go and new to development. Any help would be great.
It doesn't look like the client is just returning you a slice of strings. The range syntax you're using (for k, v := range search.Statuses) returns two values for each iteration, the index in the slice (in this case k), and the object from the slice (in this case v). I don't know the type of search.Statuses - but I know that strings don't have a .Text field or method, which is how you're printing v currently.
To your question:
Is there any particular reason to return just a single struct with a Data field rather than directly returning the output of the twitter client?
Your function signature could look like this instead:
func (SearchTwitter) execute(input SearchTwitterInput) ([]<client response struct>, error)
And then you could operate on the text in those objects in wherever this function was called.
If you're dead-set on placing the data in your own struct, you could return a slice of them ([]*SearchTwitterOutput), in which case you could build a single SearchTwitterOutput in the for loop you're currently printing the tweets in and append it to the output list. That might look like this:
var output []*SearchTwitterOutput
for k, v := range search.Statuses {
fmt.Printf("Tweet %d - %s\n", k, v.Text)
output = append(output, &SearchTwitterOutput{
Data: v.Text,
})
}
return output, nil
But if your goal really is to return all of the results concatenated together and placed inside a single struct, I would suggest building a slice of strings (containing the text you want), and then joining them with the delimiter of your choosing. Then you could place the single output string in your return object, which might look something like this:
var outputStrings []string
for k, v := range search.Statuses {
fmt.Printf("Tweet %d - %s\n", k, v.Text)
outputStrings = append(outputStrings, v.Text)
}
output := strings.Join(outputStrings, ",")
return &SearchTwitterOutput{
Data: output,
}, nil
Though I would caution that it might be tricky to find a delimiter that will never show up in a tweet.
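One way to sidestep the delimiter problem is to make Data a []string, so each tweet keeps its own entry. The sketch below assumes that change; the status type is a hypothetical stand-in for whatever the client returns in search.Statuses (only a Text field is assumed), and SearchOutput and collectTexts are names made up for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// status is a hypothetical stand-in for the client's tweet type.
type status struct{ Text string }

// SearchOutput holds one entry per tweet, so no delimiter is needed.
type SearchOutput struct {
	Data []string
}

// collectTexts copies the text of every status into a SearchOutput.
func collectTexts(statuses []status) *SearchOutput {
	out := &SearchOutput{Data: make([]string, 0, len(statuses))}
	for _, s := range statuses {
		out.Data = append(out.Data, s.Text)
	}
	return out
}

func main() {
	out := collectTexts([]status{
		{Text: "first tweet, with a comma"},
		{Text: "second tweet"},
	})
	fmt.Println(len(out.Data)) // 2
	// Joining is still possible at display time if a single string is wanted.
	fmt.Println(strings.Join(out.Data, " | "))
}
```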

Map seems to drop values in recursion

I've been working on a problem and I figured I would demonstrate it using a Pokemon setup. I am reading from a file, parsing it, and creating structs from the contents. This normally isn't a problem, except now I need to implement interface-like inheritance of traits. I don't want duplicate skills, so I figured I could use a map to emulate a set data structure. However, it seems that in the transitive phase of my recursive parsePokemonFile function (see the implementsComponent case), I am losing values in my map.
I am using the inputs like such:
4 files
Ratatta:
name=Ratatta
skills=Tackle:normal,Scratch:normal
Bulbosaur:
name=Bulbosaur
implements=Ratatta
skills=VineWhip:leaf
Oddish:
name=Oddish
implements=Ratatatt
skills=Acid:poison
Venosaur:
name=Venosaur
implements=bulbosaur,oddish
I'm expecting the output for the following code to be something like
Begin!
{Venosaur [{VineWhip leaf} {Acid poison} {Tackle normal} {Scratch normal}]}
but instead I get
Begin!
{Venosaur [{VineWhip leaf} {Acid poison}]}
What am I doing wrong? Could it be a logic error? Or am I making an assumption about the map holding values that I shouldn't?
package main
import (
"bufio"
"fmt"
"os"
"strings"
)
// In order to create a set of pokemon abilities and for ease of creation and lack of space being taken up
// We create an interfacer capability that imports the skills and attacks from pokemon of their previous evolution
// This reduces the amount of typing of skills we have to do.
// Algorithm is simple. Look for the name "implements=x" and then add x into set.
// Unfortunately it appears that the set is dropping values on transitive implements interfaces
func main() {
fmt.Println("Begin!")
dex, err := parsePokemonFile("Venosaur")
if err != nil {
fmt.Printf("Got error: %v\n", err)
}
fmt.Printf("%v\n", dex)
}
type pokemon struct {
Name string
Skills []skill
}
type skill struct {
SkillName string
Type string
}
func parsePokemonFile(filename string) (pokemon, error) {
file, err := os.Open(filename)
if err != nil {
return pokemon{}, err
}
defer file.Close()
scanner := bufio.NewScanner(file)
var builtPokemon pokemon
for scanner.Scan() {
component, returned := parseLine(scanner.Text())
switch component {
case nameComponent:
builtPokemon.Name = returned
case skillsComponent:
skillsStrings := strings.Split(returned, ",")
var skillsArr []skill
// split skills and add them into pokemon skillset
for _, skillStr := range skillsStrings {
skillPair := strings.Split(skillStr, ":")
skillsArr = append(skillsArr, skill{SkillName: skillPair[0], Type: skillPair[1]})
}
builtPokemon.Skills = append(builtPokemon.Skills, skillsArr...)
case implementsComponent:
implementsArr := strings.Split(returned, ",")
// create set to remove duplicates
skillsSet := make(map[*skill]bool)
for _, val := range implementsArr {
// recursively call the pokemon files and get full pokemon
implementedPokemon, err := parsePokemonFile(val)
if err != nil {
return pokemon{}, err
}
// sieve out the skills into a set
for _, skill := range implementedPokemon.Skills {
skillsSet[&skill] = true
}
}
// append final set into the currently being built pokemon
for x := range skillsSet {
builtPokemon.Skills = append(builtPokemon.Skills, *x)
}
}
}
return builtPokemon, nil
}
type component int
// components to denote where to put our strings when it comes time to assemble what we've parsed
const (
nameComponent component = iota
implementsComponent
skillsComponent
)
func parseLine(line string) (component, string) {
arr := strings.Split(line, "=")
switch arr[0] {
case "name":
return nameComponent, arr[1]
case "implements":
return implementsComponent, arr[1]
case "skills":
return skillsComponent, arr[1]
default:
panic("Invalid field found")
}
}
This has nothing to do with Golang maps dropping any values.
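The problem is that you are using a map of skill pointers and not skills. Pointers are compared by address, not by content, so two pointers to identical skill values are different keys. In addition, before Go 1.22, &skill inside the range loop is the address of the same loop variable on every iteration, so each loop adds a single key that ends up pointing at the last skill it visited.
skillsSet := make(map[*skill]bool)
If you change this to map[skill]bool (and store the skill values themselves instead of &skill), this should work. You may try it out!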
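Here is a small runnable sketch of the value-keyed set. The skill struct is copied from the question; mergeSkills is a hypothetical helper name, and it appends in input order rather than ranging over the map, so the result is deterministic (map iteration order in Go is not).

```go
package main

import "fmt"

type skill struct {
	SkillName string
	Type      string
}

// mergeSkills appends every skill from src that is not already in dst.
// Skills are compared by value because the map key is skill, not *skill.
func mergeSkills(dst []skill, src ...skill) []skill {
	seen := make(map[skill]bool, len(dst))
	for _, s := range dst {
		seen[s] = true
	}
	for _, s := range src {
		if !seen[s] {
			seen[s] = true
			dst = append(dst, s)
		}
	}
	return dst
}

func main() {
	fromBulbosaur := []skill{{"VineWhip", "leaf"}, {"Tackle", "normal"}, {"Scratch", "normal"}}
	fromOddish := []skill{{"Acid", "poison"}, {"Tackle", "normal"}, {"Scratch", "normal"}}

	var merged []skill
	merged = mergeSkills(merged, fromBulbosaur...)
	merged = mergeSkills(merged, fromOddish...)
	fmt.Println(merged)
	// [{VineWhip leaf} {Tackle normal} {Scratch normal} {Acid poison}]
}
```

A struct can be used as a map key as long as all of its fields are comparable, which is the case for two strings.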

Redshift returns a []uint8 instead of an integer, converting between them returns incorrect values

I have a service which takes a SQL Query, runs the query on Amazon Redshift, using the database/sql drivers. However, I can't convert the result to a struct, because the queries are big data tasks on various tables, not created within this service. So I have to return a 'loose' data structure. I'm parsing the data returned into JSON and storing it in S3.
However, I'm having some odd issues with the data types returned. For numeric columns, the queries return a slice of uint8s ([]uint8) instead of a numeric value. I understand that this is because the database driver can't have an opinion on what to convert it to, because the conversion could be imprecise. But I can't seem to convert between []uint8 and an integer either.
Here's my code that queries the database:
// Execute executes SQL commands
func (r *Runner) Execute(query string, args ...interface{}) (types.Results, error) {
var results types.Results
rows, err := r.db.Query(query, args...)
if err != nil {
return results, err
}
columns, _ := rows.Columns()
colNum := len(columns)
values := make([]interface{}, colNum)
for i := range values {
var ii interface{}
values[i] = &ii
}
for rows.Next() {
rows.Scan(values...)
result := make(types.Result)
for i, colName := range columns {
rawValue := *(values[i].(*interface{}))
if reflect.TypeOf(rawValue).String() == "[]uint8" {
byteVal := rawValue.([]byte)
val := Intfrombytes(byteVal)
log.Println("Converted:", val)
}
result[colName] = rawValue
}
results = append(results, result)
}
return results, nil
}
I created the following function to attempt to convert between []uint8 and uint16.
func Intfrombytes(bytes []uint8) uint16 {
bits := binary.LittleEndian.Uint16(bytes)
return bits
}
However, if I insert 200 into that table, I get back 12339. The approach feels pretty flaky, generally. I'm doubting my decision to use Go for this as I'm dealing with undefined, loose data structures.
Is there a better approach to generic queries such as my example, or is there a way I can convert my numeric results into an integer?
I think you might actually be interpreting a string as binary data ([]uint8 is the same type as []byte): the driver is returning the number as its text representation. See https://play.golang.org/p/Rfpey2NPiI7
originalValue := []uint8{0x32, 0x30, 0x30} // "200"
bValue := []byte(originalValue) // byte is a uint8 anyway
fmt.Printf("Converted to uint16: %d\n", binary.LittleEndian.Uint16(bValue))
fmt.Printf("Actual value: %s", string(bValue))
This has bitten me before when dealing with pq and some crypto code.
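If the bytes really are the text form of the number, parsing them as text is enough. Below is a minimal sketch of that idea; normalize is a hypothetical helper name, and guessing the type from the text content is a heuristic I am assuming is acceptable here (a text column that happens to contain "200" would also come back as an integer).

```go
package main

import (
	"fmt"
	"strconv"
)

// normalize converts a scanned value for loosely typed output.
// A []byte is treated as text: it becomes an int64 if it parses as an
// integer, a float64 if it parses as a float, and a string otherwise.
// Any other type is returned unchanged.
func normalize(raw interface{}) interface{} {
	b, ok := raw.([]byte)
	if !ok {
		return raw
	}
	s := string(b)
	if n, err := strconv.ParseInt(s, 10, 64); err == nil {
		return n
	}
	if f, err := strconv.ParseFloat(s, 64); err == nil {
		return f
	}
	return s
}

func main() {
	fmt.Println(normalize([]byte("200")))   // 200
	fmt.Println(normalize([]byte("1.5")))   // 1.5
	fmt.Println(normalize([]byte("hello"))) // hello
}
```

In the Execute loop this would replace the Intfrombytes call, for example result[colName] = normalize(rawValue). If guessing is too loose, rows.ColumnTypes() from database/sql reports each column's database type name, which could drive the conversion instead.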

how to access nested Json key values in Golang

Team,
I am new to programming.
After unmarshaling the JSON I have the data shown below, which contains nested key values. I can access the flat key values, but how do I access the nested ones?
Here is the data after unmarshaling:
tables:[map[name:basic__snatpool_members] map[name:net__snatpool_members] map[name:optimizations__hosts] map[columnNames:[name] name:pool__hosts rows:[map[row:[ry.hj.com]]]] traffic_group:/Common/traffic-group-1
I can access flat key values with the following code:
p.TrafficGroup = m["traffic_group"].(string)
Here is the complete function:
func dataToIapp(name string, d *schema.ResourceData) bigip.Iapp {
var p bigip.Iapp
var obj interface{}
jsonblob := []byte(d.Get("jsonfile").(string))
err := json.Unmarshal(jsonblob, &obj)
if err != nil {
fmt.Println("error", err)
}
m := obj.(map[string]interface{}) // Important: to access property
p.Name = m["name"].(string)
p.Partition = m["partition"].(string)
p.InheritedDevicegroup = m["inherited_devicegroup"].(string)
return p
}
Note: This may not work with your JSON structure. I inferred what it would be based on your question but without the actual structure, I cannot guarantee this to work without modification.
If you want to access them in a map, you need to assert that the interface pulled from the first map is actually a map. So you would need to do this:
tmp := m["tables"]
tables, ok := tmp.(map[string]interface{})
if !ok {
// error handling here
}
p.Name = tables["name"].(string)
But instead of accessing the unmarshaled JSON as a map[string]interface{}, why don't you create structs that match your JSON output?
type JSONRoot struct {
Name string `json:"name"`
Partition string `json:"partition"`
InheritedDevicegroup string `json:"inherited_devicegroup"`
Tables map[string]string `json:"tables"` //Ideally, this would be a map of structs
}
Then in your code:
func dataToIapp(name string, d *schema.ResourceData) bigip.Iapp {
var p bigip.Iapp
var obj JSONRoot
jsonblob := []byte(d.Get("jsonfile").(string))
err := json.Unmarshal(jsonblob, &obj)
if err != nil {
fmt.Println("error", err)
}
p.Name = obj.Name
p.Partition = obj.Partition
p.InheritedDevicegroup = obj.InheritedDevicegroup
p.Name = obj.Tables["name"]
return p
}
JSON objects are unmarshaled into map[string]interface{}, JSON arrays into []interface{}, same applies for nested objects/arrays.
So for example if a key/index maps to a nested object you need to type assert the value to map[string]interface{} and if the key/index maps to an array of objects you first need to assert the value to []interface{} and then each element to map[string]interface{}.
e.g. (for brevity this code is not guarding against panic)
tables := obj.(map[string]interface{})["tables"]
table1 := tables.([]interface{})[0]
name := table1.(map[string]interface{})["name"]
namestr := name.(string)
However, if it's the case that the json you are parsing is not dynamic but instead has a specific structure you should define a struct type that mirrors that structure and unmarshal the json into that.
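As a sketch of the struct-based approach, the types below mirror the shape that the printed map in the question suggests: tables is an array of objects, each with a name and optionally columnNames and rows. The sample JSON is reconstructed from that printout and is an assumption, as are the type names iappDoc, table and tableRow.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// tableRow, table and iappDoc are hypothetical types inferred from the
// printed map in the question; adjust them to the real JSON.
type tableRow struct {
	Row []string `json:"row"`
}

type table struct {
	Name        string     `json:"name"`
	ColumnNames []string   `json:"columnNames"`
	Rows        []tableRow `json:"rows"`
}

type iappDoc struct {
	TrafficGroup string  `json:"traffic_group"`
	Tables       []table `json:"tables"`
}

// sample is reconstructed from the question's output, not the real file.
const sample = `{
  "traffic_group": "/Common/traffic-group-1",
  "tables": [
    {"name": "basic__snatpool_members"},
    {"name": "pool__hosts", "columnNames": ["name"], "rows": [{"row": ["ry.hj.com"]}]}
  ]
}`

// parseIapp unmarshals the JSON document into the typed structure.
func parseIapp(data []byte) (iappDoc, error) {
	var doc iappDoc
	err := json.Unmarshal(data, &doc)
	return doc, err
}

func main() {
	doc, err := parseIapp([]byte(sample))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(doc.TrafficGroup)
	for _, t := range doc.Tables {
		for _, r := range t.Rows {
			fmt.Println(t.Name, r.Row) // nested values, no type assertions needed
		}
	}
}
```

Fields that are missing from a given table, such as rows on the first entry, are simply left at their zero value.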
All you have to do is repeatedly access the map via a type switch or type assertion:
switch val := m["tables"].(type) {
case string:
fmt.Println("tables is a string")
case float64:
// encoding/json decodes JSON numbers into float64, never int
fmt.Println("tables is a number")
// This is your case, since JSON is unmarshaled to type []interface{} and map[string]interface{}
case []interface{}:
fmt.Println("tables is a slice of interface{}")
for _, tb := range val {
if t, ok := tb.(map[string]interface{}); ok {
// Now it's accessible
fmt.Println(t["name"])
}
}
default:
fmt.Println("unknown type")
}
You might want to handle errors better than this.
To read more, check out my writing from a while ago https://medium.com/code-zen/dynamically-creating-instances-from-key-value-pair-map-and-json-in-go-feef83ab9db2.
