I would like to be able to specify maxResults when using the golang BigQuery library. It isn't clear how to do this, though. I don't see it as an option in the documentation, and I have browsed the source to try to find it, but I only see sporadic usage in functionality that doesn't seem related to queries. Is there a way to work around this?
I think there is no implemented method in the SDK for that, but after looking a bit, I found this one: request
You could try to execute an HTTP GET specifying the parameters (you can find an example of the use of parameters here: query_parameters).
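If you go that route, here is a rough sketch of calling the underlying REST API from Go instead of hand-building the HTTP request. It is only an illustration and assumes the low-level generated client google.golang.org/api/bigquery/v2, a placeholder project ID, and a trivial query; the MaxResults field on the request corresponds to the maxResults parameter:
package main

import (
    "context"
    "fmt"
    "log"

    bq "google.golang.org/api/bigquery/v2"
)

func main() {
    ctx := context.Background()
    // Hypothetical project ID; replace with your own.
    const projectID = "your-project-id"

    svc, err := bq.NewService(ctx)
    if err != nil {
        log.Fatal(err)
    }
    // MaxResults here maps onto the maxResults parameter of the REST API.
    resp, err := svc.Jobs.Query(projectID, &bq.QueryRequest{
        Query:      "SELECT 1",
        MaxResults: 10,
    }).Context(ctx).Do()
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println("rows in first page:", len(resp.Rows))
}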
By default, the Google API iterators manage page size for you. The RowIterator returns a single row per call to Next, backed internally by fetched pages that rely on the backend to select an appropriate size.
If, however, you want a fixed maximum page size, you can use the google.golang.org/api/iterator package to iterate page by page with a specific size. The size, in this case, corresponds to maxResults for BigQuery's query APIs.
See https://github.com/googleapis/google-cloud-go/wiki/Iterator-Guidelines for more general information about advanced iterator usage.
Here's a quick test to demonstrate with the RowIterator in bigquery. It executes a query that returns a row for each day in October, and iterates by page:
// The package name here is arbitrary; these are the imports the test needs.
package bqpager_test

import (
    "context"
    "testing"

    "cloud.google.com/go/bigquery"
    "google.golang.org/api/iterator"
)

func TestQueryPager(t *testing.T) {
    ctx := context.Background()
    pageSize := 5
    client, err := bigquery.NewClient(ctx, "your-project-id here")
    if err != nil {
        t.Fatal(err)
    }
    defer client.Close()
    // One row per day in October 2022 (31 rows total).
    q := client.Query("SELECT * FROM UNNEST(GENERATE_DATE_ARRAY('2022-10-01','2022-10-31', INTERVAL 1 DAY)) as d")
    it, err := q.Read(ctx)
    if err != nil {
        t.Fatalf("query failure: %v", err)
    }
    // Page through the results; the page size maps to maxResults on the BigQuery query APIs.
    pager := iterator.NewPager(it, pageSize, "")
    var fetchedPages int
    for {
        var rows [][]bigquery.Value
        nextToken, err := pager.NextPage(&rows)
        if err != nil {
            t.Fatalf("NextPage: %v", err)
        }
        fetchedPages++
        if len(rows) > pageSize {
            t.Errorf("page size exceeded, got %d want %d", len(rows), pageSize)
        }
        t.Logf("(next token %s) page size: %d", nextToken, len(rows))
        if nextToken == "" {
            break
        }
    }
    // 31 rows at 5 rows per page yields 7 pages.
    wantPages := 7
    if fetchedPages != wantPages {
        t.Fatalf("fetched %d pages, wanted %d pages", fetchedPages, wantPages)
    }
}
I have a Go program connected to a BigQuery table. This is the table's schema:
name STRING NULLABLE
age INTEGER NULLABLE
amount INTEGER NULLABLE
I have succeeded at querying the data of this table and printing all rows to the console with this code:
ctx := context.Background()
client, err := bigquery.NewClient(ctx, projectID)
q := client.Query("SELECT * FROM test.test_user LIMIT 1000")
it, err := q.Read(ctx)
if err != nil {
    log.Fatal(err)
}
for {
    var values []bigquery.Value
    err := it.Next(&values)
    if err == iterator.Done {
        break
    }
    if err != nil {
        // TODO: Handle error.
    }
    fmt.Println(values)
}
And I have also succeeded in inserting data into the table from a struct using this code:
type test struct {
    Name   string
    Age    int
    Amount int
}

u := client.Dataset("testDS").Table("test_user").Uploader()
savers := []*bigquery.StructSaver{
    {Struct: test{Name: "Jack", Age: 23, Amount: 123}, InsertID: "id1"},
}
if err := u.Put(ctx, savers); err != nil {
    log.Fatal(err)
}
fmt.Printf("rows inserted!!")
Now, what I am failing to do is update rows. What I want to do is select all the rows and update all of them with an operation (for example: amount = amount * 2).
How can I achieve this using Go?
Updating rows is not specific to Go, or any other client library. If you want to update data in BigQuery, you need to use DML (Data Manipulation Language) via SQL. So, essentially you already have the main part working (running a query) - you just need to change this SQL to use DML.
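For example, here is a minimal sketch (reusing ctx and client from the code above, and the test.test_user table and amount column from the question) of running a DML UPDATE that doubles amount:
q := client.Query("UPDATE test.test_user SET amount = amount * 2 WHERE true")
job, err := q.Run(ctx)
if err != nil {
    log.Fatal(err)
}
// Wait for the DML job to finish and surface any job-level error.
status, err := job.Wait(ctx)
if err != nil {
    log.Fatal(err)
}
if err := status.Err(); err != nil {
    log.Fatal(err)
}
fmt.Println("rows updated")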
But, a word of caution: BigQuery is an OLAP service. Don't use it for OLTP. Also, there are quotas on using DML. Make sure you familiarise yourself with them.
I am trying to read images (stored in the LONG RAW datatype) from an external Oracle database using Go code.
When database/sql's rows.Next() is called, the following error is returned:
ORA-01406: fetched column value was truncated
rows.Next() works fine for reading BLOB images from an MSSQL database.
Example code:
db, err := sql.Open("oci8", getDSN()) // function to get connection details
if err != nil {
    fmt.Println(err)
    return
}
defer db.Close()

rows, err := db.Query("SELECT image FROM sysadm.all_images")
if err != nil {
    fmt.Println(err)
    return
}
defer rows.Close()

for rows.Next() {
    var id string
    var data []byte
    rows.Scan(&id, &data)
}
fmt.Println("Total errors", rows.Err())
I hope someone can help me fix this issue or pinpoint the problem area.
I assume you are using go-oci8 as the driver.
Based on this issue, https://github.com/mattn/go-oci8/pull/71, someone ran into the same error as yours and managed to fix it by modifying some code in the driver.
As per this commit, the problem was solved by increasing the value of oci8cols[i].size in the file $GOPATH/src/github.com/mattn/go-oci8/oci8.go. I think in your case you have bigger blob data, which is why that revision is still not enough.
case C.SQLT_NUM:
    oci8cols[i].kind = C.SQLT_CHR
    oci8cols[i].size = int(lp * 4) // <==== THIS VALUE
    oci8cols[i].pbuf = C.malloc(C.size_t(oci8cols[i].size) + 1)
So, try to increase the multiplier, like:
oci8cols[i].size = int(lp * 12) // <==== OR GREATER
My question is specific to the "gopkg.in/olivere/elastic.v2" package I am using.
I am trying to return all documents that match my query:
termQuery := elastic.NewTermQuery("item_id", item_id)
searchResult, err := es.client.Search().
    Index(index).
    Type(SegmentsType). // search segments type
    Query(termQuery).   // specify the query
    Sort("time", true). // sort by "time" field, ascending
    From(0).Size(9).
    Pretty(true). // pretty print request and response JSON
    Do()          // execute
if err != nil {
    // Handle error
    return timeline, err
}
The problem is that I get an internal server error if I increase the size to something large. If I eliminate the line that states:
From(0).Size(9).
then the default is used (10 documents). How may I return all documents?
Using a scroller to retrieve all results is just a bit different, and in the interest of brevity I'm not including much of the error handling that you might need.
Basically, you just need to change your code slightly from Search to Scroller, and then loop with the Scroller, calling Do and handling pages of results.
termQuery := elastic.NewTermQuery("item_id", item_id)
scroller := es.client.Scroller().
    Index(index).
    Type(SegmentsType).
    Query(termQuery).
    Sort("time", true).
    Size(1)

docs := 0
for {
    res, err := scroller.Do(context.TODO())
    if err == io.EOF {
        // No remaining documents matching the search, so break out of the 'forever' loop.
        break
    }
    if err != nil {
        // Handle other errors as needed (omitted for brevity).
        break
    }
    for _, hit := range res.Hits.Hits {
        // JSON-parse or do whatever with each document retrieved from your index.
        item := make(map[string]interface{})
        if err := json.Unmarshal(*hit.Source, &item); err != nil {
            // Handle error.
        }
        docs++
    }
}
I am getting started with RethinkDB; I have never used it before. I am giving it a try together with GoRethink, following this tutorial.
To sum up this tutorial, there are two programs:
The first one updates entries infinitely.
for {
    var scoreentry ScoreEntry
    pl := rand.Intn(1000)
    sc := rand.Intn(6) - 2
    res, err := r.Table("scores").Get(strconv.Itoa(pl)).Run(session)
    if err != nil {
        log.Fatal(err)
    }
    err = res.One(&scoreentry)
    scoreentry.Score = scoreentry.Score + sc
    _, err = r.Table("scores").Update(scoreentry).RunWrite(session)
}
And the second one receives these changes and logs them.
res, err := r.Table("scores").Changes().Run(session)
var value interface{}
if err != nil {
    log.Fatalln(err)
}
for res.Next(&value) {
    fmt.Println(value)
}
In the statistics that RethinkDB shows, I can see that there are 1.5K reads and writes per second. But in the console of the second program, I see only about 1 or 2 changes per second.
Why does this occur? Am I missing something?
This code:
r.Table("scores").Update(scoreentry).RunWrite(session)
Probably doesn't do what you think it does. This attempts to update every document in the table by merging scoreentry into it. This is why the RethinkDB console is showing so many writes per second: every time you run that query it's resulting in thousands of writes.
Usually you want to update documents inside of ReQL, like so:
r.Table("scores").Get(strconv.Itoa(pl)).Update(func(row r.Term) interface{} {
    return map[string]interface{}{"Score": row.GetField("Score").Add(sc)}
}).RunWrite(session)
If you need to do the update in Go code, though, you can replace just that one document like so:
r.Table("scores").Get(strconv.Itoa(pl)).Replace(scoreentry).RunWrite(session)
I'm not sure why it is quite that slow; it could be because, by default, each query blocks until the write has been completely flushed. I would first add some kind of instrumentation to see which operation is being so slow. There are also a couple of ways that you can improve the performance:
Set the Durability of the write using UpdateOpts
_, err = r.Table("scores").Update(scoreentry, r.UpdateOpts{
    Durability: "soft",
}).RunWrite(session)
Execute each query in a goroutine to allow your code to run multiple queries in parallel (you may need to use a pool of goroutines instead, but this code is just a simplified example):
for {
    go func() {
        var scoreentry ScoreEntry
        pl := rand.Intn(1000)
        sc := rand.Intn(6) - 2
        res, err := r.Table("scores").Get(strconv.Itoa(pl)).Run(session)
        if err != nil {
            log.Fatal(err)
        }
        err = res.One(&scoreentry)
        scoreentry.Score = scoreentry.Score + sc
        _, err = r.Table("scores").Update(scoreentry).RunWrite(session)
    }()
}
I think I made a silly mistake somewhere, but I could not figure out where for a long time already :( The code is rough, I am just testing things.
It deletes, but for some reason not all of the documents; I rewrote it to delete them one by one, and that went OK.
I use the official package for Couchbase, http://github.com/couchbase/gocb
Here is the code:
var items []gocb.BulkOp
myQuery := gocb.NewN1qlQuery([Selecting ~ 283k documents from 1.5mln])
rows, err := myBucket.ExecuteN1qlQuery(myQuery, nil)
checkErr(err)

var idToDelete map[string]interface{}
for rows.Next(&idToDelete) {
    items = append(items, &gocb.RemoveOp{Key: idToDelete["id"].(string)})
}
if err := rows.Close(); err != nil {
    fmt.Println(err.Error())
}
if err := myBucket.Do(items); err != nil {
    fmt.Println(err.Error())
}
This way it deleted ~70k documents; I ran it again and it deleted 43k more...
Then I just let it delete them one by one, and that worked fine:
//var items []gocb.BulkOp
myQuery := gocb.NewN1qlQuery([Selecting ~ 180k documents from ~1.3mln])
rows, err := myBucket.ExecuteN1qlQuery(myQuery, nil)
checkErr(err)

var idToDelete map[string]interface{}
for rows.Next(&idToDelete) {
    //items = append(items, &gocb.RemoveOp{Key: idToDelete["id"].(string)})
    _, err := myBucket.Remove(idToDelete["id"].(string), 0)
    checkErr(err)
}
if err := rows.Close(); err != nil {
    fmt.Println(err.Error())
}
//err = myBucket.Do(items)
By default, N1QL queries use a consistency level called 'not bounded', which does not wait for in-flight index updates. Thus, your second run of the program will query whatever index state is valid at the time of the query, rather than considering all of your previous mutations by waiting until the index is up to date. You can read more about this in Couchbase's Developer Guide; it looks like you'll want to request the RequestPlus level for your myQuery through the Consistency method on the query.
This kind of eventually consistent secondary indexing is pretty powerful, because it gives you, as the developer, the ability to decide what level of consistency you want to pay for, since index recalculations have a cost.
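For example, a rough sketch (assuming the gocb 1.x N1qlQuery API already used in the question's code) of requesting request-plus consistency:
// Same query as in the question, but ask the server to wait until the index
// has caught up with all prior mutations before returning rows.
myQuery := gocb.NewN1qlQuery([Selecting ~ 283k documents from 1.5mln])
myQuery.Consistency(gocb.RequestPlus)
rows, err := myBucket.ExecuteN1qlQuery(myQuery, nil)
checkErr(err)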