Writing a struct's fields and values of different types to a file in Go

I'm writing a simple program that takes input from a form, populates an instance of a struct with the received data, and then writes that data to a file.
I'm a bit stuck at the moment with figuring out the best way to iterate over the populated struct and write its contents to the file.
The struct in question contains 3 different types of fields (ints, strings, []strings).
I can iterate over them but I am unable to get their actual type.
Inspecting my posted code below with print statements reveals that each field's type comes back as a struct rather than the expected string, int, etc.
The desired output format is plain text.
For example:
field_1="value_1"
field_2=10
field_3=["a", "b", "c"]
Anyone have any ideas? Perhaps I'm going about this the wrong way entirely?
func (c *Config) writeConfigToFile(file *os.File) {
    listVal := reflect.ValueOf(c)
    element := listVal.Elem()
    for i := 0; i < element.NumField(); i++ {
        field := element.Field(i)
        myType := reflect.TypeOf(field)
        if myType.Kind() == reflect.Int {
            file.Write(field.Bytes())
        } else {
            file.WriteString(field.String())
        }
    }
}

Instead of using the Bytes method on reflect.Value, which does not work the way you intended here, you can use either the strconv package or the fmt package to format your fields.
Here's an example using fmt:
func (c *Config) writeConfigToFile(file *os.File) {
    rv := reflect.ValueOf(c).Elem()
    for i := 0; i < rv.NumField(); i++ {
        fi := rv.Field(i)
        var s string
        switch fi.Kind() {
        case reflect.String:
            s = fmt.Sprintf("%q", fi.String())
        case reflect.Int:
            s = fmt.Sprintf("%d", fi.Int())
        case reflect.Slice:
            if fi.Type().Elem().Kind() != reflect.String {
                continue
            }
            s = "["
            for j := 0; j < fi.Len(); j++ {
                s = fmt.Sprintf("%s%q, ", s, fi.Index(j).String())
            }
            s = strings.TrimRight(s, ", ") + "]"
        default:
            continue
        }
        sf := rv.Type().Field(i)
        if _, err := fmt.Fprintf(file, "%s=%s\n", sf.Name, s); err != nil {
            panic(err)
        }
    }
}
Playground: https://play.golang.org/p/KQF3CicVzA

Why not use the built-in gob package to store your struct values?
I use it to store different structures, one per line, in files. During decoding, you can test the type conversion or provide a hint in a wrapper - whichever is faster for your given use case.
You'd treat each line as a buffer when encoding, and decode it again when reading the line back. You can even gzip/zlib/compress, encrypt/decrypt, etc. the stream in real time.
No point in re-inventing the wheel when you have a polished and armorall'd wheel already at your disposal.
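For reference, a minimal sketch of the gob round trip might look like the following; the Config fields here are assumed purely for illustration, gob only encodes exported fields, and the output is binary rather than the plain-text format asked about above:

package main

import (
    "bytes"
    "encoding/gob"
    "fmt"
    "log"
)

// Config fields are assumed here for illustration.
type Config struct {
    Field1 string
    Field2 int
    Field3 []string
}

func main() {
    in := Config{Field1: "value_1", Field2: 10, Field3: []string{"a", "b", "c"}}

    // Encode the struct to a buffer; an *os.File works the same way,
    // since gob writes to any io.Writer.
    var buf bytes.Buffer
    if err := gob.NewEncoder(&buf).Encode(in); err != nil {
        log.Fatal(err)
    }

    // Decode it back into a fresh value.
    var out Config
    if err := gob.NewDecoder(&buf).Decode(&out); err != nil {
        log.Fatal(err)
    }
    fmt.Printf("%+v\n", out)
}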

Alternative to using strings.Builder in conjunction with fmt.Sprintf

I am learning about the strings package in Go and I am trying to build up a simple error message.
I read that strings.Builder is a very efficient way to join strings, and that fmt.Sprintf lets me do some string interpolation.
With that said, I want to understand the best way to join a lot of strings together. For example, here is what I have:
func generateValidationErrorMessage(err error) string {
    errors := []string{}
    for _, err := range err.(validator.ValidationErrors) {
        var b strings.Builder
        b.WriteString(fmt.Sprintf("[%s] failed validation [%s]", err.Field(), err.ActualTag()))
        if err.Param() != "" {
            b.WriteString(fmt.Sprintf("[%s]", err.Param()))
        }
        errors = append(errors, b.String())
    }
    return strings.Join(errors, "; ")
}
Is there another/better way to do this? Is using s1 + s2 considered worse?
You can use fmt to print directly to a strings.Builder: fmt.Fprintf(&builder, "format string", args).
The fmt functions beginning with Fprint... (meaning "file print") allow you to print to any io.Writer, such as an *os.File or a *strings.Builder.
Also, rather than using multiple builders and joining all their strings at the end, just use a single builder and keep writing to it. If you want to add a separator, you can do so easily within the loop:
var builder strings.Builder
for i, v := range values {
    if i > 0 {
        // unless this is the first item, add the separator before it.
        fmt.Fprint(&builder, "; ")
    }
    fmt.Fprintf(&builder, "some format %v", v)
}
var output = builder.String()
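Putting the two suggestions together, the function from the question could be written with a single builder. A sketch, assuming the same validator.ValidationErrors input as the original:

func generateValidationErrorMessage(err error) string {
    var b strings.Builder
    for i, fieldErr := range err.(validator.ValidationErrors) {
        if i > 0 {
            b.WriteString("; ") // separator between individual messages
        }
        fmt.Fprintf(&b, "[%s] failed validation [%s]", fieldErr.Field(), fieldErr.ActualTag())
        if fieldErr.Param() != "" {
            fmt.Fprintf(&b, "[%s]", fieldErr.Param())
        }
    }
    return b.String()
}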

How to get columns data from golang apache-arrow?

I am using apache-arrow/go to read parquet data.
I can parse the data into a table using apache-arrow.
reader, err := ipc.NewReader(buf, ipc.WithAllocator(alloc))
if err != nil {
    log.Println(err.Error())
    return nil
}
defer reader.Release()
records := make([]array.Record, 0)
for reader.Next() {
    rec := reader.Record()
    rec.Retain()
    defer rec.Release()
    records = append(records, rec)
}
table := array.NewTableFromRecords(reader.Schema(), records)
Here I can get the column info from table.Column(index), such as:
for i, _ := range table.Schema().Fields() {
    a := table.Column(i)
    log.Println(a)
}
But the Column struct is defined as
type Column struct {
    field arrow.Field
    data  *Chunked
}
and the println result is like
["WARN" "WARN" "WARN" "WARN" "WARN" "WARN" "WARN" "WARN" "WARN" "WARN"]
However, this is not a string or a slice. Is there any way I can get the data of each column as a string or []interface{}?
Update:
I found that I can get an element from the column with a type assertion:
log.Println(col.(*array.Int64).Value(0))
But I am not sure if this is the recommended way to use it.
When working with Arrow data, there are a couple of concepts to understand:
Array: Metadata + contiguous buffers of data
Record Batch: A schema + a collection of Arrays that are all the same length.
Chunked Array: A group of Arrays of varying lengths but all the same data type. This allows you to treat multiple Arrays as one single column of data without having to copy them all into a contiguous buffer.
Column: Is just a Field + a Chunked Array
Table: A collection of Columns allowing you to treat multiple non-contiguous arrays as a single large table without having to copy them all into contiguous buffers.
In your case, you're reading multiple record batches (groups of contiguous Arrays) and treating them as a single large table. There are a few different ways you can work with the data:
One way is to use a TableReader:
tr := array.NewTableReader(tbl, 5)
defer tr.Release()
for tr.Next() {
    rec := tr.Record()
    for i, col := range rec.Columns() {
        // do something with the Array
    }
}
Another way would be to interact with the columns directly as you were in your example:
for i := 0; i < table.NumCols(); i++ {
    col := table.Column(i)
    for _, chunk := range col.Data().Chunks() {
        // do something with chunk (an arrow.Array)
    }
}
Either way, you eventually have an arrow.Array to deal with, which is an interface containing one of the typed Array types. At this point you are going to have to switch on something; you could type switch on the type of the Array itself:
switch arr := col.(type) {
case *array.Int64:
    // do stuff with arr
case *array.Int32:
    // do stuff with arr
case *array.String:
    // do stuff with arr
...
}
Alternately, you could type switch on the data type:
switch col.DataType().ID() {
case arrow.INT64:
    // type assertion needed: col.(*array.Int64)
case arrow.INT32:
    // type assertion needed: col.(*array.Int32)
...
}
For getting the data out of the array, primitive types which are stored contiguously tend to have a *Values method which will return a slice of the type. For example array.Int64 has Int64Values() which returns []int64. Otherwise, all of the types have .Value(int) methods which return the value at a particular index as you showed in your example.
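For example, a small sketch of pulling values out of a single chunk (assuming an int64 column and a string column; chunk here is the arrow.Array from the loops above):

switch arr := chunk.(type) {
case *array.Int64:
    vals := arr.Int64Values() // []int64 view of the underlying buffer
    fmt.Println(vals)
case *array.String:
    for i := 0; i < arr.Len(); i++ {
        fmt.Println(arr.Value(i)) // value at index i as a Go string
    }
}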
Hope this helps!
Make sure you use v9
(import "github.com/apache/arrow/go/v9/arrow"), because it has json.Marshaler implemented (via go-json).
Use "github.com/goccy/go-json" for the marshaling (because of this).
Then you can use a TableReader to Marshal each column and Unmarshal it into a []any.
In your example it might look like this:
import (
    "github.com/apache/arrow/go/v9/arrow"
    "github.com/apache/arrow/go/v9/arrow/array"
    "github.com/apache/arrow/go/v9/arrow/memory"
    "github.com/goccy/go-json"
)
...
tr := array.NewTableReader(tabel, 6)
defer tr.Release()
// fmt.Printf("tbl.NumRows() = %+v\n", tbl.NumRows())
// fmt.Printf("tbl.NumColumn = %+v\n", tbl.NumCols())
// keySlice is for sorting same as data source
keySlice := make([]string, 0, tabel.NumCols())
res := make(map[string][]any, 0)
var key string
for tr.Next() {
    rec := tr.Record()
    for i, col := range rec.Columns() {
        key = rec.ColumnName(i)
        if res[key] == nil {
            res[key] = make([]any, 0)
            keySlice = append(keySlice, key)
        }
        var tmp []any
        b2, err := json.Marshal(col)
        if err != nil {
            panic(err)
        }
        err = json.Unmarshal(b2, &tmp)
        if err != nil {
            panic(err)
        }
        // fmt.Printf("key = %s\n", key)
        // fmt.Printf("tmp = %+v\n", tmp)
        res[key] = append(res[key], tmp...)
    }
}
fmt.Println("res", res)

How to transform HTML entities via io.Reader

My Go program makes HTTP requests whose response bodies are large JSON documents in which strings encode the ampersand character & as &amp; (presumably due to some Microsoft platform quirk?). My program needs to convert those entities back to the ampersand character in a way that is compatible with json.Decoder.
An example response might look like the following:
{"name":"A&B","comment":"foo&bar"}
Whose corresponding object would be as below:
pkg.Object{Name:"A&B", Comment:"foo&bar"}
The documents come in various shapes so it's not feasible to convert the HTML entities after decoding. Ideally it would be done by wrapping the response body reader in another reader that performs the transformation.
Is there an easy way to wrap the http.Response.Body in some io.ReadCloser which replaces all instances of &amp; with & (or in the general case, replaces any string X with string Y)?
I suspect this is possible with x/text/transform but don't immediately see how. In particular, I'm concerned about edge cases wherein an entity spans batches of bytes. That is, one batch ends with &am and the next batch starts with p;, for example. Is there some library or idiom that gracefully handles that situation?
If you don't want to rely on an external package like transform.Reader, you can write a custom io.Reader wrapper.
The following will handle the edge case where the find element may span two Read() calls:
type fixer struct {
    r        io.Reader // source reader
    fnd, rpl []byte    // find & replace sequences
    partial  int       // track partial find matches from previous Read()
}

// Read satisfies io.Reader interface
func (f *fixer) Read(b []byte) (int, error) {
    off := f.partial
    if off > 0 {
        copy(b, f.fnd[:off]) // copy any partial match from previous `Read`
    }
    n, err := f.r.Read(b[off:])
    n += off
    if err != io.EOF {
        // no need to check for partial match, if EOF, as that is the last Read!
        f.partial = partialFind(b[:n], f.fnd)
        n -= f.partial // lop off any partial bytes
    }
    fixb := bytes.ReplaceAll(b[:n], f.fnd, f.rpl)
    return copy(b, fixb), err // preserve err as it may be io.EOF etc.
}
Along with this helper (which could probably use some optimization):
// returns number of matched bytes, if byte-slice ends in a partial-match
func partialFind(b, find []byte) int {
    for n := len(find) - 1; n > 0; n-- {
        if bytes.HasSuffix(b, find[:n]) {
            return n
        }
    }
    return 0 // no match
}
Working playground example.
Note: to test the edge-case logic, one could use a narrowReader to ensure short Reads and force a match to be split across Reads, like this: validation playground example
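As a usage sketch (the URL and target type here are made up; fixer is the type defined above), the wrapper can sit between the response body and the JSON decoder:

resp, err := http.Get("https://example.com/api/data") // hypothetical endpoint
if err != nil {
    log.Fatal(err)
}
defer resp.Body.Close()

// Rewrite &amp; to & as the decoder pulls bytes through the wrapper.
body := &fixer{r: resp.Body, fnd: []byte("&amp;"), rpl: []byte("&")}

var doc map[string]any
if err := json.NewDecoder(body).Decode(&doc); err != nil {
    log.Fatal(err)
}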
You need to create a transform.Transformer that replaces your characters.
So we need one that transforms an old []byte to a new []byte while preserving all other data. An implementation could look like this:
type simpleTransformer struct {
    Old, New []byte
}

// Transform transforms `t.Old` bytes to `t.New` bytes.
// The current implementation assumes that len(t.Old) >= len(t.New), but it also seems to work when len(t.Old) < len(t.New) (this has not been tested extensively)
func (t *simpleTransformer) Transform(dst, src []byte, atEOF bool) (nDst, nSrc int, err error) {
    // Get the position of the first occurrence of `t.Old` so we can replace it
    var ci = bytes.Index(src[nSrc:], t.Old)
    // Loop over the slice until we can't find any occurrences of `t.Old`
    // also make sure we don't run into index out of range panics
    for ci != -1 && nSrc < len(src) {
        // Copy source data before `nSrc+ci` that doesn't need transformation
        copied := copy(dst[nDst:nDst+ci], src[nSrc:nSrc+ci])
        nDst += copied
        nSrc += copied
        // Copy new data with transformation to `dst`
        nDst += copy(dst[nDst:nDst+len(t.New)], t.New)
        // Skip the rest of old bytes in the next iteration
        nSrc += len(t.Old)
        // search for the next occurrence of `t.Old`
        ci = bytes.Index(src[nSrc:], t.Old)
    }
    // Mark the rest of data as not completely processed if it contains a start element of `t.Old`
    // (e.g. if the end is `&amp` and we're looking for `&amp;`)
    // This data will not yet be copied to `dst` so we can work with it again
    // If it is at the end (`atEOF`), we don't need to do the check anymore as the string might just end with `&amp`
    if bytes.Contains(src[nSrc:], t.Old[0:1]) && !atEOF {
        err = transform.ErrShortSrc
        return
    }
    // Copy rest of data that doesn't need any transformations
    // The for loop processed everything except this last chunk
    copied := copy(dst[nDst:], src[nSrc:])
    nDst += copied
    nSrc += copied
    return nDst, nSrc, err
}

// To satisfy the transform.Transformer interface
func (t *simpleTransformer) Reset() {}
The implementation has to make sure that it deals with characters that are split between multiple calls of the Transform method, which is why it returns transform.ErrShortSrc to tell the transform.Reader that it needs more information about the next bytes.
This can now be used to replace characters in a stream:
var input = strings.NewReader(`{"name":"A&amp;B","comment":"foo&amp;bar"}`)
r := transform.NewReader(input, &simpleTransformer{[]byte(`&amp;`), []byte(`&`)})
io.Copy(os.Stdout, r) // Instead of io.Copy, use the JSON decoder to read from `r`
Output:
{"name":"A&B","comment":"foo&bar"}
You can also see this in action on the Go Playground.

Get data from Twitter Library search into a struct in Go

How do I append the output from a Twitter search to the Data field of the SearchTwitterOutput struct?
Thanks!
I am using a Twitter library to search Twitter based on a query input. The search returns an array of strings (I believe); I am able to fmt.Println the data, but I need the data as a struct.
type SearchTwitterOutput struct {
    Data string
}
func (SearchTwitter) execute(input SearchTwitterInput) (*SearchTwitterOutput, error) {
    credentials := Credentials{
        AccessToken:       input.AccessToken,
        AccessTokenSecret: input.AccessTokenSecret,
        ConsumerKey:       input.ConsumerKey,
        ConsumerSecret:    input.ConsumerSecret,
    }
    client, err := GetUserClient(&credentials)
    if err != nil {
        return nil, err
    }
    // search through the tweet and returns a
    search, _, err := client.Search.Tweets(&twitter.SearchTweetParams{
        Query: input.Text,
    })
    if err != nil {
        println("PANIC")
        panic(err.Error())
        return &SearchTwitterOutput{}, err
    }
    for k, v := range search.Statuses {
        fmt.Printf("Tweet %d - %s\n", k, v.Text)
    }
    return &SearchTwitterOutput{
        Data: "test", //data is a string for now it can be anything
    }, nil
}
//Data field is a string type for now it can be anything
//I use "test" as a placeholder, bc IDK...
Result from fmt.Printf("Tweet %d - %s\n", k, v.Text):
Tweet 0 - You know I had to do it to them! #JennaJulien #Jenna_Marbles #juliensolomita #notjulen Got my first hydroflask ever…
Tweet 1 - RT #brenna_hinshaw: I was in J2 today and watched someone fill their hydroflask with vanilla soft serve... what starts here changes the wor…
Tweet 2 - I miss my hydroflask :(
This is my second week working with Go and I am new to development. Any help would be great.
It doesn't look like the client is returning you just a slice of strings. The range syntax you're using (for k, v := range search.Statuses) returns two values for each iteration: the index in the slice (in this case k) and the object from the slice (in this case v). I don't know the type of search.Statuses, but I know that strings don't have a .Text field or method, which is how you're printing v currently.
To your question:
Is there any particular reason to return just a single struct with a Data field rather than directly returning the output of the twitter client?
Your function signature could look like this instead:
func (SearchTwitter) execute(input SearchTwitterInput) ([]<client response struct>, error)
And then you could operate on the text in those objects wherever this function is called.
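A sketch of that alternative, assuming the go-twitter client the question appears to use (where search.Statuses is a []twitter.Tweet); the client setup is elided:

func (SearchTwitter) execute(input SearchTwitterInput) ([]twitter.Tweet, error) {
    // ... credential and client setup as in the question ...
    search, _, err := client.Search.Tweets(&twitter.SearchTweetParams{
        Query: input.Text,
    })
    if err != nil {
        return nil, err
    }
    return search.Statuses, nil
}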
If you're dead-set on placing the data in your own struct, you could return a slice of them ([]*SearchTwitterOutput), in which case you could build a single SearchTwitterOutput in the for loop you're currently printing the tweets in and append it to the output list. That might look like this:
var output []*SearchTwitterOutput
for k, v := range search.Statuses {
    fmt.Printf("Tweet %d - %s\n", k, v.Text)
    output = append(output, &SearchTwitterOutput{
        Data: v.Text,
    })
}
return output, nil
But if your goal really is to return all of the results concatenated together and placed inside a single struct, I would suggest building a slice of strings (containing the text you want), and then joining them with the delimiter of your choosing. Then you could place the single output string in your return object, which might look something like this:
var outputStrings []string
for k, v := range search.Statuses {
    fmt.Printf("Tweet %d - %s\n", k, v.Text)
    outputStrings = append(outputStrings, v.Text)
}
output := strings.Join(outputStrings, ",")
return &SearchTwitterOutput{
    Data: output,
}, nil
Though I would caution that it might be tricky to find a delimiter that will never show up in a tweet...

How to get size of struct containing data structures in Go?

I'm currently trying to get the size of a complex struct in Go.
I've read solutions that use reflect and unsafe, but neither of these helps with structs that contain arrays or maps (or any other field that is a pointer to an underlying data structure).
Example:
type testStruct struct {
    A     int
    B     string
    C     struct{}
    items map[string]string
}
How would I find out the correct byte size of the above if items contains a few values in it?
You can get very close to the amount of memory required by the structure and its content by using the package reflect. You need to iterate over the fields and obtain the size of each field. For example:
func getSize(v interface{}) int {
    size := int(reflect.TypeOf(v).Size())
    switch reflect.TypeOf(v).Kind() {
    case reflect.Slice:
        s := reflect.ValueOf(v)
        for i := 0; i < s.Len(); i++ {
            size += getSize(s.Index(i).Interface())
        }
    case reflect.Map:
        s := reflect.ValueOf(v)
        keys := s.MapKeys()
        size += int(float64(len(keys)) * 10.79) // approximation from https://golang.org/src/runtime/hashmap.go
        for i := range keys {
            size += getSize(keys[i].Interface()) + getSize(s.MapIndex(keys[i]).Interface())
        }
    case reflect.String:
        size += reflect.ValueOf(v).Len()
    case reflect.Struct:
        s := reflect.ValueOf(v)
        for i := 0; i < s.NumField(); i++ {
            if s.Field(i).CanInterface() {
                size += getSize(s.Field(i).Interface())
            }
        }
    }
    return size
}
This obtains the size of v using reflect and then, for the types supported in this example (slices, maps, strings, and structs), computes the memory required by the content stored in them. You would need to add any other types that you need to support here.
There are a few details to work out:
Private fields are not counted.
For structs we are double-counting the basic types.
For number two, you can filter the basic types out before doing the recursive call when handling structs; you can check the kinds in the documentation for the reflect package.
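For illustration, a rough usage sketch with the testStruct from the question (the field values here are made up):

ts := testStruct{
    A:     1,
    B:     "hello",
    items: map[string]string{"key": "value"},
}
// Prints an approximate byte count; per the caveats above, the unexported
// items map is skipped and the exported basic fields are counted twice.
fmt.Println(getSize(ts))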
