Create TimescaleDB hypertable using golang batch query

My golang service needs to create a hypertable dynamically in TimescaleDB. I use the pgx driver. My code is the following (I removed the error handling):
func (s *Storage) CreateHyperTableBatch(ctx context.Context, tableName string) error {
    indexName := "idx_" + tableName
    conn, _ := s.pool.Acquire(ctx)
    defer conn.Release()

    b := pgx.Batch{}
    b.Queue(fmt.Sprintf(`create table if not exists public.%s (
        ts timestamptz primary key not null,
        data jsonb not null
    );`, tableName))
    b.Queue(fmt.Sprintf(`create index if not exists public.%s on public.%s using gin (data);`, indexName, tableName))
    b.Queue(fmt.Sprintf(`select create_hypertable('%s', 'ts', if_not_exists => true);`, tableName))

    batchResult := conn.SendBatch(ctx, &b)
    defer func() { _ = batchResult.Close() }()
    _, _ = batchResult.Exec()
    return nil
}
Exec() returns an error:
failed to create tableTwo: failed to run query: ERROR: relation "tabletwo" does not exist (SQLSTATE 42P01)
I added the schema name to the query (as you can see), but this doesn't help. If I split it into three queries, everything works fine:
func (s *Storage) CreateHyperTable(ctx context.Context, tableName string) error {
    indexName := "idx_" + tableName
    conn, _ := s.pool.Acquire(ctx)
    defer conn.Release()

    _, _ = conn.Exec(ctx, fmt.Sprintf(`create table if not exists %s (
        ts timestamptz primary key not null,
        data jsonb not null
    );`, tableName))
    _, _ = conn.Exec(ctx, fmt.Sprintf(`create index if not exists %s on %s using gin (data);`, indexName, tableName))
    _, _ = conn.Exec(ctx, fmt.Sprintf(`select create_hypertable('%s', 'ts', if_not_exists => true);`, tableName))
    return nil
}
I suppose the problem is that TimescaleDB needs the plain table to be created and committed before it can create a hypertable. Is that right, or what is the problem? Has anyone encountered this issue, and how did you solve it?

Can't give a direct answer on your pgx question, but yes, you need to create a table first before trying to create a hypertable. So I'd try to do the former as a separate transaction and see if that succeeds.
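For what it's worth, here is a rough sketch of that split (untested, assuming pgx v4 and the same Storage type as in the question): the CREATE TABLE runs and commits on its own, and only the index and the create_hypertable call share a batch. Table names are still interpolated with Sprintf, so they must come from a trusted source.
func (s *Storage) CreateHyperTableSplit(ctx context.Context, tableName string) error {
    conn, err := s.pool.Acquire(ctx)
    if err != nil {
        return err
    }
    defer conn.Release()

    // Create and commit the plain table first, outside the batch.
    if _, err := conn.Exec(ctx, fmt.Sprintf(`create table if not exists public.%s (
        ts timestamptz primary key not null,
        data jsonb not null
    );`, tableName)); err != nil {
        return err
    }

    // The index and the hypertable conversion can then be batched together.
    b := pgx.Batch{}
    b.Queue(fmt.Sprintf(`create index if not exists idx_%s on public.%s using gin (data);`, tableName, tableName))
    b.Queue(fmt.Sprintf(`select create_hypertable('public.%s', 'ts', if_not_exists => true);`, tableName))

    br := conn.SendBatch(ctx, &b)
    defer func() { _ = br.Close() }()
    if _, err := br.Exec(); err != nil { // create index
        return err
    }
    if _, err := br.Exec(); err != nil { // create_hypertable
        return err
    }
    return nil
}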

Related

Large batch Inserts Using Snowflake & Go

I am retrieving payloads from a REST API, which I then want to insert into a Snowflake table.
My current process is to use the Snowflake DB connection and iterate over a slice of structs (which contain my data from the API). However, this doesn't seem efficient or optimal. Everything is loading successfully, but I am trying to figure out how to optimize a large number of inserts for potentially thousands of records. Perhaps there needs to be a separate channel for insertions instead of inserting synchronously?
General code flow:
import (
    "database/sql"
    "fmt"
    "sync"
    "time"

    _ "github.com/snowflakedb/gosnowflake"
)

func ETL() {
    var wg sync.WaitGroup
    ch := make(chan []*Response)
    defer close(ch)

    // Create requests to API
    for _, request := range requests {
        // All of this flows fine without issue
        wg.Add(1)
        go func(request Request) {
            defer wg.Done()
            resp, _ := request.Get()
            ch <- resp
        }(request)
    }

    // Connect to Snowflake
    // This is not a problem
    connString := fmt.Sprintf(config...)
    db, _ := sql.Open("snowflake", connString)
    defer db.Close()

    // Collect responses from our channel
    results := make([][]*Response, len(requests))
    for i := range results {
        results[i] = <-ch
        for _, res := range results[i] {
            // transform is just a function to flatten my structs into entries
            // that I would like to insert into Snowflake. This is not a bottleneck.
            entries := transform(res)
            // Load the data into Snowflake, passing the flattened entries
            // as well as the db connection.
            if err := load(entries, db); err != nil {
                fmt.Println(err)
            }
        }
    }
}
type Entry struct {
    field1     string
    field2     string
    statusCode int
}

func load(entries []*Entry, db *sql.DB) error {
    start := time.Now()
    for i, entry := range entries {
        fmt.Printf("Loading entry %d\n", i)
        stmt := `INSERT INTO tbl (field1, field2, updated_date, status_code)
                 VALUES (?, ?, CURRENT_TIMESTAMP(), ?)`
        _, err := db.Exec(stmt, entry.field1, entry.field2, entry.statusCode)
        if err != nil {
            fmt.Println(err)
            return err
        }
    }
    fmt.Println("Load time: ", time.Since(start))
    return nil
}
Instead of INSERTing individual rows, collect rows into files; each time you push one of these files to S3/GCS/Azure, it will be loaded immediately.
I wrote a post detailing these steps:
https://medium.com/snowflake/lightweight-batch-streaming-to-snowflake-with-gcp-pub-sub-1790ab76da31
With the appropriate storage integration, this would auto-ingest the files:
create pipe temp.public.meetup202011_pipe
auto_ingest = true
integration = temp_meetup202011_pubsub_int
as
copy into temp.public.meetup202011_rsvps
from @temp_fhoffa_gcs_meetup;
Also check these considerations:
https://www.snowflake.com/blog/best-practices-for-data-ingestion/
Soon: If you want to send individual rows and ingest them in real time into Snowflake - that's in development (https://www.snowflake.com/blog/snowflake-streaming-now-hiring-help-design-and-build-the-future-of-big-data-and-stream-processing/).
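To make the file-based path concrete, here is a small sketch in Go (an illustration only: the entry struct with exported fields, the bucket name, and the object naming are assumptions, and it presumes a GCS bucket that the storage integration above watches). It buffers flattened entries into one newline-delimited JSON object and uploads it, after which the pipe auto-ingests the file.
import (
    "context"
    "encoding/json"
    "fmt"
    "time"

    "cloud.google.com/go/storage"
)

// entry mirrors the question's Entry struct, but with exported fields
// so that encoding/json can serialize it.
type entry struct {
    Field1     string `json:"field1"`
    Field2     string `json:"field2"`
    StatusCode int    `json:"status_code"`
}

// uploadBatch writes one newline-delimited JSON file per batch of entries
// to the GCS bucket watched by the Snowpipe storage integration.
func uploadBatch(ctx context.Context, bucket string, entries []*entry) error {
    client, err := storage.NewClient(ctx)
    if err != nil {
        return err
    }
    defer client.Close()

    object := fmt.Sprintf("batches/batch-%d.ndjson", time.Now().UnixNano())
    w := client.Bucket(bucket).Object(object).NewWriter(ctx)
    enc := json.NewEncoder(w) // Encode appends a newline after each object
    for _, e := range entries {
        if err := enc.Encode(e); err != nil {
            _ = w.Close()
            return err
        }
    }
    return w.Close() // the object is only committed once Close succeeds
}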

MonetDB Insert Statement Returning id support in Golang

I am trying to run this insert statement using Go, which should return an id:
INSERT INTO users (u_email,u_phone,u_password_hash) VALUES (?,?,?) RETURNING u_user_id;
It seems there is no support for that in MonetDB.
Well, I tried to get the id by the unique email after the insert is done, without the "RETURNING u_user_id" part, as below:
func GetByEmail(db *sql.DB) {
    userid := `SELECT u_user_id FROM users WHERE u_email = ?;`
    var id UserResponse
    email := `baicmr@email.com`
    stmt, err := db.Prepare(userid)
    if err != nil {
        log.Panic(err)
    }
    defer stmt.Close()

    result := stmt.QueryRow(email)
    if getErr := result.Scan(&id.ID); getErr != nil {
        log.Panic(getErr)
    }
    fmt.Printf("id.ID: %v\n", id.ID)
}
The result.Scan() line panics:
panic: runtime error: index out of range [0] with length 0
What am I doing wrong?
Can you please open an issue/feature request over at https://github.com/MonetDB/MonetDB-Go? This looks like a bug.
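In the meantime, a possible workaround (just a sketch built on your own SELECT-by-email idea, with a hypothetical InsertUser helper) is to run the INSERT and the id lookup inside a single database/sql transaction, so the lookup is guaranteed to see the row you just wrote:
func InsertUser(db *sql.DB, email, phone, hash string) (int64, error) {
    tx, err := db.Begin()
    if err != nil {
        return 0, err
    }
    defer tx.Rollback() // no-op after a successful Commit

    if _, err := tx.Exec(
        `INSERT INTO users (u_email, u_phone, u_password_hash) VALUES (?, ?, ?);`,
        email, phone, hash,
    ); err != nil {
        return 0, err
    }

    // Look the id up by the unique email within the same transaction.
    var id int64
    if err := tx.QueryRow(
        `SELECT u_user_id FROM users WHERE u_email = ?;`, email,
    ).Scan(&id); err != nil {
        return 0, err
    }
    return id, tx.Commit()
}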

Using GORM to retrieve tables names from Postgresql

I'm looking to retrieve table names from my PostgreSQL database. Now, I know in Go you can use database/sql and the pq driver, but I'm using GORM for the queries in my REST API.
The table_name column in PostgreSQL has the type "information_schema.sql_identifier". This is what I was trying to do, but that type isn't a string.
var tables []string
if err := db.Table("information_schema.tables").Select("table_name").Where("table_schema = ?", "public").Find(&tables).Error; err != nil {
    panic(err)
}
TL;DR
To select a single column's values into a slice using Gorm, you can use the db.Pluck helper:
var tables []string
if err := db.Table("information_schema.tables").Where("table_schema = ?", "public").Pluck("table_name", &tables).Error; err != nil {
    panic(err)
}
TS;WM
The SELECT statement returns a set of rows with one or more columns. In order to map those to Go code, we need a struct so that Gorm can understand which column is mapped to which field of the struct. Even when you select only a single column, it is still a struct, just with a single field.
type Table struct {
    TableName string
    // more fields if needed...
}
So your output variable should be []*Table:
var tables []*Table
if err := db.Table("information_schema.tables").Select("table_name").Where("table_schema = ?", "public").Find(&tables).Error; err != nil {
    panic(err)
}
Note: it could be []Table as well if you don't want to modify the element inside the slice.
If you don't want to define the struct, you can use the db.Pluck function, which is just a helper for this sort of code:
rows, err := db.Table("information_schema.tables").Select("table_name").Where("table_schema = ?", "public").Rows()
if err != nil {
    panic(err)
}
defer rows.Close()

var tables []string
var name string
for rows.Next() {
    rows.Scan(&name)
    tables = append(tables, name)
}

How to update a bigquery row in golang

I have a Go program connected to a BigQuery table. This is the table's schema:
name STRING NULLABLE
age INTEGER NULLABLE
amount INTEGER NULLABLE
I have succeeded in querying the data of this table and printing all rows to the console with this code:
ctx := context.Background()
client, err := bigquery.NewClient(ctx, projectID)
if err != nil {
    log.Fatal(err)
}
q := client.Query("SELECT * FROM test.test_user LIMIT 1000")
it, err := q.Read(ctx)
if err != nil {
    log.Fatal(err)
}
for {
    var values []bigquery.Value
    err := it.Next(&values)
    if err == iterator.Done {
        break
    }
    if err != nil {
        // TODO: Handle error.
    }
    fmt.Println(values)
}
And I have also succeeded in inserting data into the table from a struct using this code:
type test struct {
    Name   string
    Age    int
    Amount int
}

u := client.Dataset("testDS").Table("test_user").Uploader()
savers := []*bigquery.StructSaver{
    {Struct: test{Name: "Jack", Age: 23, Amount: 123}, InsertID: "id1"},
}
if err := u.Put(ctx, savers); err != nil {
    log.Fatal(err)
}
fmt.Printf("rows inserted!!")
Now, what I am failing to do is update rows. What I want to do is select all the rows and update all of them with an operation (for example: amount = amount * 2).
How can I achieve this using Go?
Updating rows is not specific to Go, or any other client library. If you want to update data in BigQuery, you need to use DML (Data Manipulation Language) via SQL. So, essentially you already have the main part working (running a query) - you just need to change this SQL to use DML.
But, a word of caution: BigQuery is an OLAP service. Don't use it for OLTP. Also, there are quotas on DML usage. Make sure you familiarise yourself with them.
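As a rough sketch of what that looks like with the Go client (reusing the client setup from the question; the table name and the amount = amount * 2 operation are the question's example, and the WHERE true clause is there because BigQuery UPDATE statements require a WHERE):
ctx := context.Background()
client, err := bigquery.NewClient(ctx, projectID)
if err != nil {
    log.Fatal(err)
}
// Run the update as a DML statement; BigQuery rewrites the rows server-side.
q := client.Query("UPDATE test.test_user SET amount = amount * 2 WHERE true")
job, err := q.Run(ctx)
if err != nil {
    log.Fatal(err)
}
status, err := job.Wait(ctx) // block until the DML job finishes
if err != nil {
    log.Fatal(err)
}
if err := status.Err(); err != nil {
    log.Fatal(err)
}
fmt.Println("rows updated!!")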

Golang code is not inserting into a BigQuery table after I have created it from code

I have a BigQuery table with this schema:
name STRING NULLABLE
age INTEGER NULLABLE
amount INTEGER NULLABLE
And I can successfully insert into the table with this code:
ctx := context.Background()
client, err := bigquery.NewClient(ctx, projectID)
if err != nil {
    log.Fatal(err)
}
u := client.Dataset(dataSetID).Table("test_user").Uploader()
savers := []*bigquery.StructSaver{
    {Struct: test{Name: "Sylvanas", Age: 23, Amount: 123}},
}
if err := u.Put(ctx, savers); err != nil {
    log.Fatal(err)
}
fmt.Printf("rows inserted!!")
This works fine because the table is already created in BigQuery. What I want to do now is delete the table if it exists and create it again from code:
type test struct {
    Name   string
    Age    int
    Amount int
}

if err := client.Dataset(dataSetID).Table("test_user").Delete(ctx); err != nil {
    log.Fatal(err)
}
fmt.Printf("table deleted")

t := client.Dataset(dataSetID).Table("test_user")
// Infer table schema from a Go type.
schema, err := bigquery.InferSchema(test{})
if err != nil {
    log.Fatal(err)
}
if err := t.Create(ctx,
    &bigquery.TableMetadata{
        Name:   "test_user",
        Schema: schema,
    }); err != nil {
    log.Fatal(err)
}
fmt.Printf("table created with the test schema")
This also works nicely because it deletes the table and creates it again with the schema inferred from my test struct.
The problem comes when I try to do the above insert after the delete/create process. No error is thrown, but no data is inserted (and the insert works fine if I comment out the delete/create part).
What am I doing wrong?
Do I need to commit the create-table transaction somehow in order to insert, or do I maybe need to close the DB connection?
According to this old answer, it might take up to 2 min for a BigQuery streaming buffer to be properly attached to a deleted and immediately re-created table.
I have run some tests, and in my case it just took a few seconds until the table was available, instead of the 2~5 min reported on other questions. The resulting code is quite different from yours, but the concepts should apply.
What I tried is, instead of directly inserting the rows, adding them on a buffered channel, and wait until you can verify that the current table is properly saving the values before start sending them.
I've used a much simpler struct to run my tests (so it was easier to write the code):
type row struct {
    ByteField []byte
}
I generated my rows the following way:
func generateRows(rows chan<- *row) {
    for {
        randBytes := make([]byte, 100)
        _, _ = rand.Read(randBytes)
        rows <- &row{randBytes}
        time.Sleep(time.Millisecond * 500) // use whatever frequency you need to insert rows at
    }
}
Notice how I'm sending the rows to the channel. Instead of generating them, you just have to get them from your data source.
The next part is finding a way to check if the table is properly saving the rows. What I did was trying to insert one of the buffered rows into the table, recover that row, and verify if everything is OK. If the row is not properly returned, send it back to the buffer.
func unreadyTable(rows chan *row) bool {
    client, err := bigquery.NewClient(context.Background(), project)
    if err != nil {return true}
    r := <-rows // get a row to try to insert
    uploader := client.Dataset(dataset).Table(table).Uploader()
    if err := uploader.Put(context.Background(), r); err != nil {rows <- r; return true}
    i, err := client.Query(fmt.Sprintf("select * from `%s.%s.%s`", project, dataset, table)).Read(context.Background())
    if err != nil {rows <- r; return true}
    var testRow []bigquery.Value
    if err := i.Next(&testRow); err != nil {rows <- r; return true}
    if reflect.DeepEqual(&row{testRow[0].([]byte)}, r) {return false} // there's probably a better way to check if it's equal
    rows <- r
    return true
}
With a function like that, we only need to add for ; unreadyTable(rows); time.Sleep(time.Second) {} to block until it's safe to insert the rows.
Finally, we put everything together:
func main() {
    // initialize a channel where the rows will be sent
    rows := make(chan *row, 1000) // make it big enough to hold several minutes of rows
    // start generating rows to be inserted
    go generateRows(rows)
    // create the BigQuery client
    client, err := bigquery.NewClient(context.Background(), project)
    if err != nil {/* handle error */}
    // delete the previous table
    if err := client.Dataset(dataset).Table(table).Delete(context.Background()); err != nil {/* handle error */}
    // create the new table
    schema, err := bigquery.InferSchema(row{})
    if err != nil {/* handle error */}
    if err := client.Dataset(dataset).Table(table).Create(context.Background(), &bigquery.TableMetadata{Schema: schema}); err != nil {/* handle error */}
    // wait for the table to be ready
    for ; unreadyTable(rows); time.Sleep(time.Second) {}
    // once it's ready, upload indefinitely
    for {
        if len(rows) > 0 { // if there are uninserted rows, create a batch and insert them
            uploader := client.Dataset(dataset).Table(table).Uploader()
            insert := make([]*row, min(500, len(rows))) // create a batch of all the rows on buffer, up to 500
            for i := range insert {insert[i] = <-rows}
            go func(insert []*row) { // do the actual insert async
                if err := uploader.Put(context.Background(), insert); err != nil {/* handle error */}
            }(insert)
        } else { // if there are no rows waiting to be inserted, wait and check again
            time.Sleep(time.Second)
        }
    }
}
Note: Since math.Min() does not like ints, I had to include func min(a,b int)int{if a<b{return a};return b}.
Here's my full working example.
