I have a Go program connected to a BigQuery table. This is the table's schema:
name STRING NULLABLE
age INTEGER NULLABLE
amount INTEGER NULLABLE
I have succeeded in querying the data of this table and printing all rows to the console with this code:
ctx := context.Background()
client, err := bigquery.NewClient(ctx, projectID)
if err != nil {
log.Fatal(err)
}
q := client.Query("SELECT * FROM test.test_user LIMIT 1000")
it, err := q.Read(ctx)
if err != nil {
log.Fatal(err)
}
for {
var values []bigquery.Value
err := it.Next(&values)
if err == iterator.Done {
break
}
if err != nil {
// TODO: Handle error.
}
fmt.Println(values)
}
And I have also succeeded in inserting data into the table from a struct using this code:
type test struct {
Name string
Age int
Amount int
}
u := client.Dataset("testDS").Table("test_user").Uploader()
savers := []*bigquery.StructSaver{
{Struct: test{Name: "Jack", Age: 23, Amount:123}, InsertID: "id1"},
}
if err := u.Put(ctx, savers); err != nil {
log.Fatal(err)
}
fmt.Printf("rows inserted!!")
Now, what I am failing to do is update rows. What I want to do is select all the rows and update all of them with an operation (for example: amount = amount * 2).
How can I achieve this using Go?
Updating rows is not specific to Go or to any other client library. If you want to update data in BigQuery, you need to use DML (Data Manipulation Language) via SQL. So essentially you already have the main part working (running a query); you just need to change the SQL to use DML.
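For example, a minimal sketch that reuses the query code you already have (assuming the same ctx, client and test.test_user table from your snippets; note that BigQuery requires a WHERE clause on UPDATE, hence the WHERE true):

q := client.Query("UPDATE test.test_user SET amount = amount * 2 WHERE true")
// DML statements run as a job; wait for it to finish and check for errors.
job, err := q.Run(ctx)
if err != nil {
	log.Fatal(err)
}
status, err := job.Wait(ctx)
if err != nil {
	log.Fatal(err)
}
if err := status.Err(); err != nil {
	log.Fatal(err)
}
fmt.Println("rows updated")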
But a word of caution: BigQuery is an OLAP service; don't use it for OLTP. Also, there are quotas on DML usage, so make sure you familiarise yourself with them.
Essentially, using GORM, my current code looks something like this:
var res []*modelExample
DB.Model(&modelExample{}).
Order("task_id").
Find(&res)
What I do with res is manually loop through it and append the models with the same task_id into one list, and then append this list to be worked on. I need to do this because there are specific operations I need to perform on specific columns, which I can't do in GORM.
However, is there a more efficient way to do this, where I return a list of lists that I can then loop over and run my operation on each element?
You should be able to achieve your needs with the following code snippet:
package main
import (
"fmt"
"gorm.io/driver/postgres"
"gorm.io/gorm"
)
type modelExample struct {
TaskId int
Name string
}
func main() {
dsn := "host=localhost user=postgres password=postgres dbname=postgres port=5432 sslmode=disable"
db, err := gorm.Open(postgres.Open(dsn), &gorm.Config{})
if err != nil {
panic(err)
}
db.AutoMigrate(&modelExample{})
// here you should populate the database with some data
// querying
res := make(map[int][]modelExample, 0)
rows, err := db.Table("model_examples").Select("task_id, name").Rows()
if err != nil {
panic(err)
}
defer rows.Close()
// scanning
for rows.Next() {
var taskId int
var name string
if err := rows.Scan(&taskId, &name); err != nil {
panic(err)
}
if _, isFound := res[taskId]; !isFound {
res[taskId] = []modelExample{{taskId, name}}
continue
}
res[taskId] = append(res[taskId], modelExample{taskId, name})
}
// always good idea to check for errors when scanning
if err = rows.Err(); err != nil {
panic(err)
}
for _, v := range res {
fmt.Println(v)
}
}
After the initial setup, let's take a closer look at the querying section.
First, you're going to get all the records from the table. The records you get are stored in the rows variable.
In the for loop, you scan all of the records. Each record will be either added as a new map entry or appended to an existing one (if the taskId is already present in the map).
This is the easiest way to create different lists based on a specific column (e.g. the TaskId). Actually, from what I understood, you need to split the records rather than grouping them with an aggregation function (e.g. COUNT, SUM, and so on).
The other code I added was just put in for clarity.
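If you'd rather not scan the rows manually, a minimal alternative sketch (assuming the same modelExample type and db handle as above) is to let Gorm load the records into a slice and group them in plain Go:

// Load all records ordered by task_id, then group them by TaskId in a map.
var records []modelExample
if err := db.Order("task_id").Find(&records).Error; err != nil {
	panic(err)
}
grouped := make(map[int][]modelExample)
for _, r := range records {
	grouped[r.TaskId] = append(grouped[r.TaskId], r)
}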
Let me know if this solves your issue or if you need something else, thanks!
I'm looking to retrieve table names from my PostgreSQL database. Now, I know that in Go you can use database/sql with the pq driver, but I'm using GORM for the queries in my REST API.
The table_name type in PostgreSQL is "information_schema.sql_identifier". This is what I was trying to do, but the type isn't string.
var tables []string
if err := db.Table("information_schema.tables").Select("table_name").Where("table_schema = ?", "public").Find(&tables).Error; err != nil {
panic(err)
}
TL;DR
To select a single column's values into a slice using Gorm, you can use the db.Pluck helper:
var tables []string
if err := db.Table("information_schema.tables").Where("table_schema = ?", "public").Pluck("table_name", &tables).Error; err != nil {
panic(err)
}
TS;WM
Considering this, a SELECT statement returns a set of rows with one or more columns. In order to map those to Go code, we need some sort of struct so that Gorm can understand which column maps to which field of the struct. Even when you select only a single column, it's still a struct, just with a single field.
type Table struct {
TableName string
// more fields if needed...
}
So your output variable should be []*Table:
var tables []*Table
if err := db.Table("information_schema.tables").Select("table_name").Where("table_schema = ?", "public").Find(&tables).Error; err != nil {
panic(err)
}
Note: it could be []Table as well if you don't need to modify the elements inside the slice.
If you don't want to define the struct, you can use the db.Pluck function, which is just a helper for this sort of code:
rows, err := db.Table("information_schema.tables").Select("table_name").Where("table_schema = ?", "public").Rows()
if err != nil {
panic(err)
}
defer rows.Close()
var tables []string
var name string
for rows.Next() {
rows.Scan(&name)
tables = append(tables, name)
}
I have a chapters table with about 2,000,000 rows, and I want to update each row under some specific conditions:
func main(){
rows, err := db.Query("SELECT id FROM chapters where title = 'custom_type'")
if err != nil {
panic(err)
}
for rows.Next() {
var id int
_ = rows.Scan(&id)
fmt.Println(id)
go updateRowForSomeReason(id)
}
}
func updateRowForSomeReason(id int) {
rows, err := db.Query(fmt.Sprintf("SELECT id FROM chapters where parent_id = %v", id))
if err != nil {
panic(err) // <----- the panic occurs here
}
for rows.Next() {
// ignore update code for simplify
}
}
Inside updateRowForSomeReason, I execute an update statement for each row.
It works for a few seconds, and after that this error is printed:
323005
323057
323125
323244
323282
323342
323459
323498
323556
323618
323693
325343
325424
325468
325624
325816
326001
326045
326082
326226
326297
panic: sql: database is closed
This doesn't seem to be a Go problem as such, more a question of how to optimally structure your SQL within your code. You're taking a result set from executing a query on 2,000,000 rows:
rows, err := db.Query("SELECT id FROM chapters where title = 'custom_type'")
then executing another query for each row in this result set:
rows, err := db.Query(fmt.Sprintf("SELECT id FROM chapters where parent_id = %v", id))
and then executing some more code for each of those, apparently one by one:
for rows.Next() {
// ignore update code for simplify
}
This is effectively two levels of nested statements, which is a very inefficient way of working: you load all these results into program memory and then execute independent UPDATE statements one by one:
SELECT
+---->SELECT
+---->UPDATE
Instead, you could be doing all the work in the database itself, which would be much more efficient. You don't show what the UPDATE statement is, but this is the key part. Let's say you want to set a publish flag. You could do something like this:
UPDATE chapters
SET publish=true
WHERE parent_id in
(SELECT id FROM chapters
WHERE title='custom_type')
RETURNING id;
By using a nested query, you can combine all three separate queries into a single one. The database has all the information it needs to optimise the operation and build the most efficient query plan, and you only execute a single db.Query operation. The RETURNING clause lets you retrieve a list of the ids that ended up being updated by the operation. So the code would be as simple as:
func main(){
rows, err := db.Query("UPDATE chapters SET publish=true WHERE parent_id in" +
"(SELECT id FROM chapters WHERE title='custom_type')" +
"RETURNING id;")
if err != nil {
panic(err)
}
defer rows.Close()
for rows.Next() {
var id int
_ = rows.Scan(&id)
fmt.Println(id)
}
}
Is there any way to get the rows that have been updated by an update command in Gorm, in a single operation?
I know this is like a million years old, but for the sake of completeness, here's the Gorm way of doing it: clauses.
result := r.Gdb.Model(&User{}).Clauses(clause.Returning{}).Where("id = ?", "000-000-000").Updates(content)
Ref: Gorm Returning Data From Modified Rows
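For a slightly fuller picture, here's a minimal sketch of the same idea (assuming GORM v2 with the Postgres driver, a User model, and the gorm.io/gorm/clause import; it mirrors the referenced docs rather than your exact models):

// Update matching rows and scan the modified rows back into the slice.
var users []User
result := db.Model(&users).
	Clauses(clause.Returning{}).
	Where("role = ?", "admin").
	Update("salary", gorm.Expr("salary * ?", 2))
if result.Error != nil {
	log.Fatal(result.Error)
}
// users now holds the updated rows; result.RowsAffected reports how many.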
It's not pretty, but since you are using postgres you can do:
realDB := db.DB()
rows, err := realDB.Query("UPDATE some_table SET name = 'a' WHERE name = 'b' RETURNING id, name")
//you could probably do db.Raw but I'm not sure
if err != nil {
log.Fatal(err)
}
defer rows.Close()
for rows.Next() {
var id int
var name string
err := rows.Scan(&id, &name)
if err != nil {
log.Fatal(err)
}
log.Println(id, name)
}
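Regarding the db.Raw note in the comment above, a hedged sketch of that route (assuming GORM v2 and a hypothetical updatedRow struct matching the returned columns) might look like this:

type updatedRow struct {
	ID   int
	Name string
}
var updated []updatedRow
// Raw + Scan executes the statement and scans the RETURNING rows.
if err := db.Raw("UPDATE some_table SET name = 'a' WHERE name = 'b' RETURNING id, name").
	Scan(&updated).Error; err != nil {
	log.Fatal(err)
}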
This is a decent solution if you know the number of rows you're updating is relatively small (<1000)
var ids []int
var results []YourModel
// Get the list of rows that will be affected
db.Where("YOUR CONDITION HERE").Table("your_table").Select("id").Find(&ids)
query := db.Where("id IN (?)", ids)
// Do the update
query.Model(&YourModel{}).Updates(YourModel{field: "value"})
// Get the updated rows
query.Find(&results)
This is safe against race conditions since it uses the list of IDs to do the update instead of repeating the WHERE clause. As you can imagine this becomes a bit less practical when you start talking about thousands of rows.
I have a BigQuery table with this schema:
name STRING NULLABLE
age INTEGER NULLABLE
amount INTEGER NULLABLE
And I can successfully insert into the table with this code:
ctx := context.Background()
client, err := bigquery.NewClient(ctx, projectID)
if err != nil {
log.Fatal(err)
}
u := client.Dataset(dataSetID).Table("test_user").Uploader()
savers := []*bigquery.StructSaver{
{Struct: test{Name: "Sylvanas", Age: 23, Amount: 123}},
}
if err := u.Put(ctx, savers); err != nil {
log.Fatal(err)
}
fmt.Printf("rows inserted!!")
This works fine because the table is already created on BigQuery. What I want to do now is delete the table if it exists and create it again from code:
type test struct {
Name string
Age int
Amount int
}
if err := client.Dataset(dataSetID).Table("test_user").Delete(ctx); err != nil {
log.Fatal(err)
}
fmt.Printf("table deleted")
t := client.Dataset(dataSetID).Table("test_user")
// Infer table schema from a Go type.
schema, err := bigquery.InferSchema(test{})
if err := t.Create(ctx,
&bigquery.TableMetadata{
Name: "test_user",
Schema: schema,
}); err != nil {
log.Fatal(err)
}
fmt.Printf("table created with the test schema")
This is also working nicely: it deletes the table and creates it again with the schema inferred from my test struct.
The problem comes when I try to do the above insert after the delete/create process. No error is thrown, but no data is inserted (the insert works fine if I comment out the delete/create part).
What am I doing wrong?
Do I need to commit the create-table transaction somehow in order to insert, or maybe close the DB connection?
According to this old answer, it might take up to 2 min for a BigQuery streaming buffer to be properly attached to a deleted and immediately re-created table.
I have run some tests, and in my case it took just a few seconds until the table was available, instead of the 2 to 5 minutes reported in other questions. The resulting code is quite different from yours, but the concepts should still apply.
What I tried is, instead of inserting the rows directly, adding them to a buffered channel and waiting until you can verify that the current table is properly saving values before you start sending them.
I've used a much simpler struct to run my tests (so it was easier to write the code):
type row struct {
ByteField []byte
}
I generated my rows the following way:
func generateRows(rows chan<- *Row) {
for {
randBytes := make([]byte, 100)
_, _ = rand.Read(randBytes)
rows <- &row{randBytes}
time.Sleep(time.Millisecond * 500) // use whatever frequency you need to insert rows at
}
}
Notice how I'm sending the rows to the channel. Instead of generating them, you just have to get them from your data source.
The next part is finding a way to check whether the table is properly saving the rows. What I did was try to insert one of the buffered rows into the table, read that row back, and verify that everything is OK. If the row is not properly returned, it's sent back to the buffer.
func unreadyTable(rows chan *row) bool {
client, err := bigquery.NewClient(context.Background(), project)
if err != nil {return true}
r := <-rows // get a row to try to insert
uploader := client.Dataset(dataset).Table(table).Uploader()
if err := uploader.Put(context.Background(), r); err != nil {rows <- r;return true}
i, err := client.Query(fmt.Sprintf("select * from `%s.%s.%s`", project, dataset, table)).Read(context.Background())
if err != nil {rows <- r; return true}
var testRow []bigquery.Value
if err := i.Next(&testRow); err != nil {rows <- r;return true}
if reflect.DeepEqual(&row{testRow[0].([]byte)}, r) {return false} // there's probably a better way to check if it's equal
rows <- r;return true
}
With a function like that, we only need to add for ; unreadyTable(rows); time.Sleep(time.Second) {} to block until it's safe to insert the rows.
Finally, we put everything together:
func main() {
// initialize a channel where the rows will be sent
rows := make(chan *row, 1000) // make it big enough to hold several minutes of rows
// start generating rows to be inserted
go generateRows(rows)
// create the BigQuery client
client, err := bigquery.NewClient(context.Background(), project)
if err != nil {/* handle error */}
// delete the previous table
if err := client.Dataset(dataset).Table(table).Delete(context.Background()); err != nil {/* handle error */}
// create the new table
schema, err := bigquery.InferSchema(row{})
if err != nil {/* handle error */}
if err := client.Dataset(dataset).Table(table).Create(context.Background(), &bigquery.TableMetadata{Schema: schema}); err != nil {/* handle error */}
// wait for the table to be ready
for ; unreadyTable(rows); time.Sleep(time.Second) {}
// once it's ready, upload indefinitely
for {
if len(rows) > 0 { // if there are uninserted rows, create a batch and insert them
uploader := client.Dataset(dataset).Table(table).Uploader()
insert := make([]*row, min(500, len(rows))) // create a batch of all the rows on buffer, up to 500
for i := range insert {insert[i] = <-rows}
go func(insert []*row) { // do the actual insert async
if err := uploader.Put(context.Background(), insert); err != nil {/* handle error */}
}(insert)
} else { // if there are no rows waiting to be inserted, wait and check again
time.Sleep(time.Second)
}
}
}
Note: Since math.Min() does not like ints, I had to include func min(a,b int)int{if a<b{return a};return b}.
Here's my full working example.