Need a faster way to list all datasets/tables in a project - Go

I am creating a utility that needs to be aware of all the datasets/tables that exist in my BigQuery project. My current code for getting this information is as follows (using the Go API):
func populateExistingTableMap(service *bigquery.Service, cloudCtx context.Context, projectId string) (map[string]map[string]bool, error) {
    tableMap := map[string]map[string]bool{}
    call := service.Datasets.List(projectId)
    //call.Fields("datasets/datasetReference")
    if err := call.Pages(cloudCtx, func(page *bigquery.DatasetList) error {
        for _, v := range page.Datasets {
            if tableMap[v.DatasetReference.DatasetId] == nil {
                tableMap[v.DatasetReference.DatasetId] = map[string]bool{}
            }
            table_call := service.Tables.List(projectId, v.DatasetReference.DatasetId)
            //table_call.Fields("tables/tableReference")
            if err := table_call.Pages(cloudCtx, func(page *bigquery.TableList) error {
                for _, t := range page.Tables {
                    tableMap[v.DatasetReference.DatasetId][t.TableReference.TableId] = true
                }
                return nil
            }); err != nil {
                return errors.New("Error Parsing Table")
            }
        }
        return nil
    }); err != nil {
        return tableMap, err
    }
    return tableMap, nil
}
For a project with about 5000 datasets, each with up to 10 tables, this code takes almost 15 minutes to return. Is there a faster way to iterate through the names of all existing datasets/tables? I have tried using the Fields method to return only the fields I need (you can see those lines commented out above), but that results in only 50 (exactly 50) of my datasets being returned.
Any ideas?

Here is an updated version of my code, with concurrency, that reduced the processing time from about 15 minutes to 3 minutes.
func populateExistingTableMap(service *bigquery.Service, cloudCtx context.Context, projectId string) (map[string]map[string]bool, error) {
    tableMap := map[string]map[string]bool{}
    call := service.Datasets.List(projectId)
    //call.Fields("datasets/datasetReference")
    if err := call.Pages(cloudCtx, func(page *bigquery.DatasetList) error {
        var wg sync.WaitGroup
        wg.Add(len(page.Datasets))
        for _, v := range page.Datasets {
            if tableMap[v.DatasetReference.DatasetId] == nil {
                tableMap[v.DatasetReference.DatasetId] = map[string]bool{}
            }
            // Pass the inner map into the goroutine: each inner map is then
            // written by exactly one goroutine, and the outer map is only
            // touched on this goroutine, avoiding a data race on tableMap.
            go func(tables map[string]bool, datasetID string) {
                defer wg.Done()
                table_call := service.Tables.List(projectId, datasetID)
                //table_call.Fields("tables/tableReference")
                if err := table_call.Pages(cloudCtx, func(page *bigquery.TableList) error {
                    for _, t := range page.Tables {
                        tables[t.TableReference.TableId] = true
                    }
                    return nil // NOTE: returning a non-nil error stops pagination.
                }); err != nil {
                    // TODO: Handle error.
                    fmt.Println(err)
                }
            }(tableMap[v.DatasetReference.DatasetId], v.DatasetReference.DatasetId)
        }
        wg.Wait()
        return nil // NOTE: returning a non-nil error stops pagination.
    }); err != nil {
        // TODO: Handle error.
        return tableMap, err
    }
    return tableMap, nil
}
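A note on the Fields problem mentioned in the question: a likely explanation for getting exactly 50 datasets is that restricting the field mask to datasets/datasetReference also strips nextPageToken from the response, so Pages sees an empty token and stops after the first page (the default page size is 50). Requesting nextPageToken alongside the fields you need should restore pagination while keeping the responses small. A minimal sketch against the same bigquery.Service calls (datasetID stands for the dataset ID from the loop above):

// Sketch: keep the partial-response optimization, but also request
// nextPageToken so Pages can advance past the first page.
call := service.Datasets.List(projectId)
call.Fields("datasets/datasetReference", "nextPageToken")

tableCall := service.Tables.List(projectId, datasetID)
tableCall.Fields("tables/tableReference", "nextPageToken")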

Related

rows.Next() halts after some number of rows

I'm a newbie in Golang, so it may be simple for professionals, but I got stuck with no idea what to do next.
I'm making a migration app that extracts some data from an Oracle DB and, after some conversion, inserts it into Postgres one by one.
The same native query in the DB console returns about 400k rows and takes about 13 sec to finish.
Extracting the data from Oracle with rows.Next() shows some strange behavior:
the first 25 rows are extracted fast enough, then there is a pause of a few seconds, then 25 new rows, until it pauses "forever".
Here is the function:
func GetHrTicketsFromOra() (*sql.Rows, error) {
    rows, err := oraDB.Query("select id,STATE_ID,REMEDY_ID,HEADER,CREATE_DATE,TEXT,SOLUTION,SOLUTION_USER_LOGIN,LAST_SOLUTION_DATE from TICKET where SOLUTION_GROUP_ID = 5549")
    if err != nil {
        println("Error while getting rows from Ora")
        return nil, err
    }
    log.Println("Finished legacy tickets export")
    return rows, err
}
And here is where I export the data:
func ConvertRows(rows *sql.Rows, c chan util.ArchTicket, m chan int) error {
    log.Println("Conversion start")
    defer func(rows *sql.Rows) {
        err := rows.Close()
        if err != nil {
            log.Println("ORA connection closed", err)
            return
        }
    }(rows)
    for rows.Next() {
        log.Println("Reading the ticket")
        ot := util.OraTicket{}
        at := util.ArchTicket{}
        err := rows.Scan(&ot.ID, &ot.StateId, &ot.RemedyId, &ot.Header, &ot.CreateDate, &ot.Text, &ot.Solution, &ot.SolutionUserLogin, &ot.LastSolutionDate)
        if err != nil {
            log.Println("Error while reading row", err)
            return err
        }
        at = convertLegTOArch(ot)
        c <- at
    }
    if err := rows.Err(); err != nil {
        log.Println("Error while reading row", err)
        return err
    }
    m <- 1
    return nil
}
UPD. I use the "github.com/sijms/go-ora/v2" driver.
UPD2. It seems the root cause of the problem is in the TEXT and SOLUTION fields of the result rows. They are varchar and can be quite big. Removing them from the query cuts the execution time from 13 sec to 258 ms. But I still have no idea what to do about that.
UPD3.
Minimal reproducible example
package main

import (
    "database/sql"
    "log"

    _ "github.com/sijms/go-ora/v2"
)

var oraDB *sql.DB
var con = "oracle://login:password@ora_db:1521/database"

func InitOraDB(dataSourceName string) error {
    var err error
    oraDB, err = sql.Open("oracle", dataSourceName)
    if err != nil {
        return err
    }
    return oraDB.Ping()
}

func GetHrTicketsFromOra() {
    var ot string
    rows, err := oraDB.Query("select TEXT from TICKET where SOLUTION_GROUP_ID = 5549")
    if err != nil {
        println("Error while getting rows from Ora")
        return
    }
    defer rows.Close()
    for rows.Next() {
        log.Println("Reading the ticket")
        err := rows.Scan(&ot)
        if err != nil {
            log.Println("Reading failed", err)
        }
        log.Println("Read:")
    }
    log.Println("Finished legacy tickets export")
}

func main() {
    err := InitOraDB(con)
    if err != nil {
        log.Println("Error connection Ora")
    }
    GetHrTicketsFromOra()
}
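A note on the 25-row bursts: go-ora fetches rows from the server in prefetch batches, and 25-row bursts are consistent with a small client-side prefetch size, so each pause is likely a network round trip for the next batch, made slower by the large TEXT and SOLUTION values. If your driver version supports it, raising the prefetch size via the connection string may reduce the number of round trips. A sketch; the PREFETCH_ROWS option name and value are assumptions to verify against the go-ora documentation:

// Sketch: same connection URL as above, with a larger prefetch batch.
// PREFETCH_ROWS is a go-ora connection-string option (verify the exact
// name and supported range for your driver version).
var con = "oracle://login:password@ora_db:1521/database?PREFETCH_ROWS=500"

Fetching fewer, larger batches will not make a slow per-row read of big varchar values free, but it should smooth out the 25-row pauses.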

Atomically Execute commands across Redis Data Structures

I want to execute some Redis commands atomically (HDEL, SADD, HSET, etc.). I see the Watch feature in go-redis for implementing transactions; however, since I am not going to modify the value of a key (i.e. use SET, GET, etc.), does it make sense to use Watch to execute it as a transaction, or would just wrapping the commands in a TxPipeline be good enough?
Approach 1: Using Watch

func sampleTransaction() error {
    transactionFunc := func(tx *redis.Tx) error {
        // Queue both commands so they execute in a single MULTI/EXEC.
        _, err := tx.TxPipelined(context.Background(), func(pipe redis.Pipeliner) error {
            pipe.SAdd(context.Background(), "redis-set-key", "value1")
            pipe.HDel(context.Background(), "redis-hash-key", "value1")
            return nil
        })
        return err
    }
    retries := 10
    // Retry if one of the watched keys has been changed.
    for i := 0; i < retries; i++ {
        fmt.Println("tries", i)
        // Watch takes the function itself (not its result) plus the keys to watch.
        err := redisClient.Watch(context.Background(), transactionFunc, "redis-set-key", "redis-hash-key")
        if err == nil {
            // Success.
            return nil
        }
        if err == redis.TxFailedErr {
            continue
        }
        return err
    }
    return fmt.Errorf("transaction failed after %d retries", retries)
}
Approach 2: Just wrapping in TxPipelined

func sampleTransaction() error {
    // TxPipelined wraps the queued commands in a single MULTI/EXEC.
    _, err := redisClient.TxPipelined(context.Background(), func(pipe redis.Pipeliner) error {
        pipe.SAdd(context.Background(), "redis-set-key", "value1")
        pipe.HDel(context.Background(), "redis-hash-key", "value1")
        return nil
    })
    return err
}
As far as I know, plain pipelines do not guarantee atomicity. If you need atomicity, use a Lua script.
https://pkg.go.dev/github.com/mediocregopher/radix.v3#NewEvalScript
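Since the question uses go-redis, here is a minimal sketch of the Lua approach with that client's Script helper (key and member names are taken from the examples above; adjust to your schema). The whole script executes atomically on the server, so no Watch/retry loop is needed:

// Sketch: both commands run inside one Lua script, which Redis
// executes atomically.
var moveScript = redis.NewScript(`
    redis.call("SADD", KEYS[1], ARGV[1])
    redis.call("HDEL", KEYS[2], ARGV[1])
    return 1
`)

func sampleAtomic(ctx context.Context, rdb *redis.Client) error {
    return moveScript.Run(ctx, rdb,
        []string{"redis-set-key", "redis-hash-key"}, // KEYS
        "value1", // ARGV
    ).Err()
}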

How to bulk remove objects in minio with golang

I'm trying to bulk remove objects in minio as described here:
objectsCh := make(chan minio.ObjectInfo)

// Send object names that need to be removed to objectsCh.
go func() {
    defer close(objectsCh)
    // List all objects from a bucket-name with a matching prefix.
    for object := range minioClient.ListObjects(context.Background(), "my-bucketname", minio.ListObjectsOptions{Prefix: "my-prefixname", Recursive: true}) {
        if object.Err != nil {
            log.Fatalln(object.Err)
        }
        objectsCh <- object
    }
}()

opts := minio.RemoveObjectsOptions{
    GovernanceBypass: true,
}

for rErr := range minioClient.RemoveObjects(context.Background(), "my-bucketname", objectsCh, opts) {
    fmt.Println("Error detected during deletion: ", rErr)
}
Here I can ListObjects by bucket name and prefix. However, I'm struggling to find an approach where I can list objects by, for example, a slice of object names which I want to remove, or any other way. So my question is: how can I properly generate a list of objects for arbitrary object names in a given bucket? Or is there any other way to remove objects by their names? Thanks.
func DeleteItemInMinio(ctx context.Context, item []string) (string, error) {
    // NOTE: this uses the older minio-go (pre-v7) client API.
    minioClient, err := minio.New("test.com", os.Getenv("MINIO_ACCESS_KEY"), os.Getenv("MINIO_SECRET_KEY"), true)
    if err != nil {
        log.Println(err)
        return "", err
    }
    for _, val := range item {
        if err = minioClient.RemoveObject("my-bucketname", val); err != nil {
            return "", err
        }
    }
    return "success", nil
}
and call it with:

r.POST("/test/delete", func(c *gin.Context) {
    item := []string{"golang.png", "phplogo.jpg"}
    execute.DeleteItemInMinio(context.Background(), item)
})
I tried it and it works, in case you still need it.
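If you want to stay on the v7 API used in the question, you do not need ListObjects at all: you can feed your slice of names straight into the channel that RemoveObjects consumes. A sketch, assuming RemoveObjects only needs the Key field of each ObjectInfo populated (bucket and object names are placeholders):

// Sketch: bulk-remove a known slice of object names by sending
// minio.ObjectInfo values with just the Key set.
objectsCh := make(chan minio.ObjectInfo)
go func() {
    defer close(objectsCh)
    for _, name := range []string{"golang.png", "phplogo.jpg"} {
        objectsCh <- minio.ObjectInfo{Key: name}
    }
}()
opts := minio.RemoveObjectsOptions{GovernanceBypass: true}
for rErr := range minioClient.RemoveObjects(context.Background(), "my-bucketname", objectsCh, opts) {
    fmt.Println("Error detected during deletion: ", rErr)
}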

Can I have nested bucket under a nested bucket in boltdb?

This is what I have to create nested buckets. It does not return any error, but it fails at creating a nested bucket under another nested bucket.
func CreateNestedBuckets(buckets []string) error {
    err := db.Update(func(tx *bolt.Tx) error {
        var bkt *bolt.Bucket
        var err error
        first := true
        for _, bucket := range buckets {
            log.Error(bucket)
            if first {
                bkt, err = tx.CreateBucketIfNotExists([]byte(bucket))
                first = false
            } else {
                bkt, err = bkt.CreateBucketIfNotExists([]byte(bucket))
            }
            if err != nil {
                log.Error("error creating nested bucket")
                return err
            }
        }
        return nil
    })
    if err != nil {
        log.Error("error creating nested bucket!!!")
        return err
    }
    return nil
}
Short answer: yes! You can have nested buckets: https://twitter.com/boltdb/status/454730212010254336
Long answer: your code works fine! Here are some things to check, though:
Are you checking the correct bolt database file? The bolt db file will be created in the directory you run your code from, unless you've specified an absolute path.
Does your input actually contain enough elements to create a nested structure?
I've run your code with the following setup (a couple of small changes, but nothing major) and it works fine:
package main

import (
    "log"
    "os"
    "time"

    "github.com/boltdb/bolt"
)

var dbname = "test.bdb"
var dbperms os.FileMode = 0770
var options = &bolt.Options{Timeout: 1 * time.Second}

func main() {
    var names []string
    names = append(names, "bucketOne")
    names = append(names, "bucketTwo")
    names = append(names, "bucketThree")
    if err := CreateNestedBuckets(names); err != nil {
        log.Fatal(err)
    }
}

// CreateNestedBuckets - Function to create
// nested buckets from an array of Strings
func CreateNestedBuckets(buckets []string) error {
    db, dberr := bolt.Open(dbname, dbperms, options)
    if dberr != nil {
        log.Fatal(dberr)
    }
    defer db.Close()
    err := db.Update(func(tx *bolt.Tx) error {
        var bkt *bolt.Bucket
        var err error
        first := true
        for _, bucket := range buckets {
            log.Println(bucket)
            if first {
                bkt, err = tx.CreateBucketIfNotExists([]byte(bucket))
                first = false
            } else {
                bkt, err = bkt.CreateBucketIfNotExists([]byte(bucket))
            }
            if err != nil {
                log.Println("error creating nested bucket")
                return err
            }
        }
        return nil
    })
    if err != nil {
        log.Println("error creating nested bucket!!!")
        return err
    }
    return nil
}
To test you can cat the file through the strings command:
cat test.bdb | strings
bucketThree
bucketTwo
bucketOne
If you're on Windows, I'm not sure what the equivalent command is, but you can open the file with Notepad and inspect it manually. It won't be pretty, but you should still see the name of your buckets in there.
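Alternatively, you can check programmatically. A minimal sketch using the same bolt API, assuming db is an open *bolt.DB and the bucket names from the demo above:

// Sketch: walk back down the nesting to confirm each level exists.
// (*bolt.Bucket).Bucket returns nil when the child bucket is missing.
err := db.View(func(tx *bolt.Tx) error {
    b := tx.Bucket([]byte("bucketOne"))
    if b == nil {
        return fmt.Errorf("bucketOne missing")
    }
    if b = b.Bucket([]byte("bucketTwo")); b == nil {
        return fmt.Errorf("bucketTwo missing")
    }
    if b.Bucket([]byte("bucketThree")) == nil {
        return fmt.Errorf("bucketThree missing")
    }
    return nil
})
if err != nil {
    log.Fatal(err)
}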
On another note, your error handling is going to result in very similar messages being printed in succession. Here's a slightly cleaner solution you can use:
// CreateNestedBucketsNew - function to create
// nested buckets from an array of Strings - my implementation
func CreateNestedBucketsNew(buckets []string) (err error) {
    err = db.Update(func(tx *bolt.Tx) (err error) {
        var bkt *bolt.Bucket
        for index, bucket := range buckets {
            if index == 0 {
                bkt, err = tx.CreateBucketIfNotExists([]byte(bucket))
            } else {
                bkt, err = bkt.CreateBucketIfNotExists([]byte(bucket))
            }
            if err != nil {
                return fmt.Errorf("Error creating nested bucket [%s]: %v", bucket, err)
            }
        }
        return err
    })
    return err
}
fste89's demo above has a bug; here is a corrected version:
package main

import (
    "fmt"
    "time"

    "github.com/boltdb/bolt"
)

func CreateNestedBuckets(fatherTable string, sonTables []string) error {
    db, dberr := bolt.Open("your file path", 0600, &bolt.Options{Timeout: 1 * time.Second})
    if dberr != nil {
        fmt.Println(dberr)
        return dberr
    }
    defer db.Close()
    err := db.Update(func(tx *bolt.Tx) error {
        bkFather, err := tx.CreateBucketIfNotExists([]byte(fatherTable))
        if err != nil {
            return err
        }
        for _, ta := range sonTables {
            fmt.Println(ta)
            _, err = bkFather.CreateBucketIfNotExists([]byte(ta))
            if err != nil {
                fmt.Println("error creating nested bucket")
                return err
            }
        }
        return nil
    })
    if err != nil {
        fmt.Println("error creating nested bucket!!!")
        return err
    }
    return nil
}

func main() {
    t := []string{"cc", "1", "2", "3"}
    fmt.Println(CreateNestedBuckets("sb", t))
}
Output:
cc
1
2
3
<nil>
And the same function with a real database path:

func CreateNestedBuckets(fatherTable string, sonTables []string) error {
    db, dberr := bolt.Open("E:\\OneDrive\\code\\go\\project\\transmission\\static\\localstorage.db", 0600, &bolt.Options{Timeout: 1 * time.Second})
    if dberr != nil {
        fmt.Println(dberr)
        return dberr
    }
    defer db.Close()
    err := db.Update(func(tx *bolt.Tx) error {
        bkFather, err := tx.CreateBucketIfNotExists([]byte(fatherTable))
        if err != nil {
            return err
        }
        for _, ta := range sonTables {
            fmt.Println(ta)
            _, err = bkFather.CreateBucketIfNotExists([]byte(ta))
            if err != nil {
                fmt.Println("error creating nested bucket")
                return err
            }
        }
        return nil
    })
    if err != nil {
        fmt.Println("error creating nested bucket!!!")
        return err
    }
    return nil
}

Go goroutines not being executed

I am trying to achieve some sort of multi-thread processing over here.
func (m *Map) Parse(mapData Node) error {
    wg := &sync.WaitGroup{}
    for _, node := range mapData.child {
        wg.Add(1)
        go parseChild(node, m, wg)
    }
    wg.Wait()
    close(errors)
    return nil
}
func parseChild(node Node, m *Map, wg *sync.WaitGroup) {
    defer wg.Done()
    var nodeType uint8
    if err := binary.Read(node.data, binary.LittleEndian, &nodeType); err != nil {
        errors <- err
    }
    if nodeType == OTBMNodeTowns {
        for _, town := range node.child {
            var nodeType uint8
            if err := binary.Read(town.data, binary.LittleEndian, &nodeType); err != nil {
                errors <- err
                return
            }
            if nodeType != OTBMNodeTown {
                errors <- fmt.Errorf("Parsing map towns: expected %v got %v", OTBMNodeTown, nodeType)
                return
            }
            currentTown := Town{}
            if err := binary.Read(town.data, binary.LittleEndian, &currentTown.ID); err != nil {
                errors <- err
                return
            } else if currentTown.Name, err = town.ReadString(); err != nil {
                errors <- err
                return
            } else if currentTown.TemplePosition, err = town.ReadPosition(); err != nil {
                errors <- err
                return
            }
            m.Towns = append(m.Towns, currentTown)
            errors <- fmt.Errorf("This should be called: %v", nodeType)
            return
        }
    }
}
But my goroutines never send anything to the errors channel. It seems the main thread is not even waiting for the goroutines to finish.
I have no idea what I am missing here. I'm waiting for all routines to finish using wg.Wait, but it doesn't seem to be working as I think it should.
And yes, the slice is populated with at least 3 results. This is the errors channel:
var (
    errors = make(chan error, 0)
)

func init() {
    go errChannel()
}

func errChannel() {
    for {
        select {
        case err := <-errors:
            log.Println(err)
        }
    }
}
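One thing worth pointing out in the snippet above (not necessarily the root cause of the missing sends): once Parse calls close(errors), the select-based consumer keeps receiving the zero value from the closed channel and logs <nil> in a tight loop. Ranging over the channel is the more idiomatic consumer, since it exits once the channel is closed and drained:

// Sketch: a range loop stops cleanly after close(errors), instead of
// spinning on the closed channel like the select version above.
func errChannel() {
    for err := range errors {
        log.Println(err)
    }
}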
