I'm getting started with RethinkDB and have never used it before. I'm trying it out with Gorethink, following this tutorial.
To sum up this tutorial, there are two programs:
The first one updates entries in an infinite loop:
for {
var scoreentry ScoreEntry
pl := rand.Intn(1000)
sc := rand.Intn(6) - 2
res, err := r.Table("scores").Get(strconv.Itoa(pl)).Run(session)
if err != nil {
log.Fatal(err)
}
err = res.One(&scoreentry)
scoreentry.Score = scoreentry.Score + sc
_, err = r.Table("scores").Update(scoreentry).RunWrite(session)
}
The second one receives these changes and logs them:
res, err := r.Table("scores").Changes().Run(session)
var value interface{}
if err != nil {
log.Fatalln(err)
}
for res.Next(&value) {
fmt.Println(value)
}
In the statistics RethinkDB shows, I can see about 1.5K reads and writes per second. But in the console of the second program, I only see about 1 or 2 changes per second.
Why does this occur? Am I missing something?
This code:
r.Table("scores").Update(scoreentry).RunWrite(session)
That query probably doesn't do what you think it does. It attempts to update every document in the table by merging scoreentry into each one. That is why the RethinkDB console shows so many writes per second: every time you run that query, it results in thousands of writes.
Usually you want to update documents inside of ReQL, like so:
r.Table("scores").Get(strconv.Itoa(pl)).Update(func(row r.Term) interface{} {
	// the increment is computed on the server, so no read round trip is needed
	return map[string]interface{}{"Score": row.Field("Score").Add(sc)}
}).RunWrite(session)
If you need to do the update in Go code, though, you can replace just that one document like so:
r.Table("scores").Get(strconv.Itoa(pl)).Replace(scoreentry).RunWrite(session)
I'm not sure why it's that slow; it could be because, by default, each query blocks until the write has been completely flushed to disk. I would first add some instrumentation to see which operation is so slow. There are also a couple of ways you can improve the performance:
Set the Durability of the write using UpdateOpts:
_, err = r.Table("scores").Update(scoreentry, r.UpdateOpts{
Durability: "soft",
}).RunWrite(session)
Execute each query in a goroutine so your code can run multiple queries in parallel (you may need to use a pool of goroutines instead, but this code is just a simplified example):
for {
	go func() {
		var scoreentry ScoreEntry
		pl := rand.Intn(1000)
		sc := rand.Intn(6) - 2
		res, err := r.Table("scores").Get(strconv.Itoa(pl)).Run(session)
		if err != nil {
			log.Fatal(err)
		}
		// check the read and write errors instead of silently dropping them
		if err := res.One(&scoreentry); err != nil {
			log.Println(err)
			return
		}
		scoreentry.Score = scoreentry.Score + sc
		if _, err := r.Table("scores").Update(scoreentry).RunWrite(session); err != nil {
			log.Println(err)
		}
	}()
}
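The loop above starts goroutines without any limit, which can exhaust memory and database connections. Below is a minimal sketch of the bounded goroutine pool the answer mentions. It uses only the standard library and assumes a hypothetical `updateScore` function that stands in for the Get/Update queries; it is not part of Gorethink.

```go
package main

import (
	"fmt"
	"sync"
)

// updateScore is a hypothetical stand-in for the Get + Update round trip
// to RethinkDB. Replace its body with the real queries.
func updateScore(player int) error {
	_ = player
	return nil
}

// runPool runs updateScore for every job using at most `workers`
// goroutines and returns how many jobs completed without error.
func runPool(jobs []int, workers int) int {
	if workers < 1 {
		workers = 1
	}
	jobCh := make(chan int)
	var (
		wg sync.WaitGroup
		mu sync.Mutex
		ok int
	)
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for player := range jobCh {
				if err := updateScore(player); err != nil {
					continue // a real program would log the error
				}
				mu.Lock()
				ok++
				mu.Unlock()
			}
		}()
	}
	// Sends block while every worker is busy, which caps the number of
	// queries in flight at `workers`.
	for _, j := range jobs {
		jobCh <- j
	}
	close(jobCh)
	wg.Wait()
	return ok
}

func main() {
	jobs := make([]int, 1000)
	for i := range jobs {
		jobs[i] = i // player ids
	}
	fmt.Println(runPool(jobs, 8)) // prints 1000
}
```

The unbuffered channel is the throttle here: the producer can only hand out a new job when a worker is free, so the worker count directly bounds concurrent queries.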
I would like to specify maxResults when using the Go BigQuery library, but it isn't clear how to do this. I don't see it as an option in the documentation, and when I browsed the source I only found sporadic uses in functionality that seems unrelated to queries. Is there a way to work around this?
I don't think the SDK implements a method for that, but after looking around I found this one: request
You could try executing an HTTP GET request that specifies the parameters (you can find an example of how to use the parameters here: query_parameters).
By default, the Google API iterators manage the page size for you. The RowIterator returns one row at a time, backed internally by fetched pages whose size the backend selects.
If however you want to specify a fixed max page size, you can use the google.golang.org/api/iterator package to iterate by pages while specifying a specific size. The size, in this case, corresponds to maxResults for BigQuery's query APIs.
See https://github.com/googleapis/google-cloud-go/wiki/Iterator-Guidelines for more general information about advanced iterator usage.
Here's a quick test to demonstrate with the RowIterator in bigquery. It executes a query that returns a row for each day in October, and iterates by page:
import (
	"context"
	"testing"

	"cloud.google.com/go/bigquery"
	"google.golang.org/api/iterator"
)

func TestQueryPager(t *testing.T) {
ctx := context.Background()
pageSize := 5
client, err := bigquery.NewClient(ctx, "your-project-id here")
if err != nil {
t.Fatal(err)
}
defer client.Close()
q := client.Query("SELECT * FROM UNNEST(GENERATE_DATE_ARRAY('2022-10-01','2022-10-31', INTERVAL 1 DAY)) as d")
it, err := q.Read(ctx)
if err != nil {
t.Fatalf("query failure: %v", err)
}
pager := iterator.NewPager(it, pageSize, "")
var fetchedPages int
for {
var rows [][]bigquery.Value
nextToken, err := pager.NextPage(&rows)
if err != nil {
t.Fatalf("NextPage: %v", err)
}
fetchedPages = fetchedPages + 1
if len(rows) > pageSize {
t.Errorf("page size exceeded, got %d want %d", len(rows), pageSize)
}
t.Logf("(next token %s) page size: %d", nextToken, len(rows))
if nextToken == "" {
break
}
}
wantPages := 7
if fetchedPages != wantPages {
t.Fatalf("fetched %d pages, wanted %d pages", fetchedPages, wantPages)
}
}
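The test expects 7 pages because October has 31 days, and 31 rows split into pages of at most 5 rows gives ⌈31/5⌉ = 7 pages (six full pages and one page with a single row). Here is a small standard-library sketch of that arithmetic. It uses a hypothetical `pageSizes` helper and assumes every page is filled up to the limit. In practice the backend may return smaller pages, which is why the test above only asserts the upper bound on each page.

```go
package main

import "fmt"

// pageSizes returns the size of each page when `total` rows are fetched
// with at most maxResults rows per page (assuming full pages).
func pageSizes(total, maxResults int) []int {
	if maxResults <= 0 {
		return nil
	}
	var sizes []int
	for remaining := total; remaining > 0; remaining -= maxResults {
		if remaining < maxResults {
			sizes = append(sizes, remaining)
		} else {
			sizes = append(sizes, maxResults)
		}
	}
	return sizes
}

func main() {
	sizes := pageSizes(31, 5)      // one row per day in October 2022
	fmt.Println(len(sizes), sizes) // prints 7 [5 5 5 5 5 5 1]
}
```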
I'm new to Go, so sorry for the silly question in advance!
I'm using the Gin framework and want to make multiple database queries (database/sql + lib/pq) within the same handler:
userIds := []int{}
bookIds := []int{}
var id int
/* Handling first query here */
rows, err := pgClient.Query(getUserIdsQuery)
defer rows.Close()
if err != nil {
return
}
for rows.Next() {
err := rows.Scan(&id)
if err != nil {
return
}
userIds = append(userIds, id)
}
/* Handling second query here */
rows, err = pgClient.Query(getBookIdsQuery)
defer rows.Close()
if err != nil {
return
}
for rows.Next() {
err := rows.Scan(&id)
if err != nil {
return
}
bookIds = append(bookIds, id)
}
I have a couple of questions about this code (any improvements and best practices would be appreciated):
Does Go properly handle defer rows.Close() in this case? I reassign the rows variable later in the code, so will both result sets be tracked and properly closed at the end of the function?
Is it OK to reuse the shared id variable, or should I redeclare it inside the rows.Next() loop?
What's a better approach for having even more queries within one handler? Should I have some kind of Writer that accepts a query and a slice and populates the slice with the retrieved ids?
Thanks.
I've never worked with the go-pg library, so my answer focuses mostly on points that are not specific to go-pg.
Go evaluates the receiver of a deferred method call when the defer statement executes, so each defer rows.Close() closes the result set that rows pointed to at that moment, and both result sets are closed when the function returns. Note that each defer should come after the err check: if Query fails, rows is nil and the deferred Close would panic. Even so, defining two variables, such as userRows and bookRows, is cleaner.
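Here is a standard-library sketch of that behaviour. It uses a hypothetical `fakeRows` type in place of *sql.Rows. Even though the variable is reassigned, the two deferred calls close two different values, and they run in last-in-first-out order.

```go
package main

import "fmt"

// fakeRows is a hypothetical stand-in for *sql.Rows that records Close calls.
type fakeRows struct {
	name   string
	closed *[]string
}

func (r *fakeRows) Close() error {
	*r.closed = append(*r.closed, r.name)
	return nil
}

// handler mimics the question's pattern: defer Close, then reassign rows.
func handler(closed *[]string) {
	rows := &fakeRows{name: "users", closed: closed}
	defer rows.Close() // receiver evaluated now: will close "users"

	rows = &fakeRows{name: "books", closed: closed}
	defer rows.Close() // receiver evaluated now: will close "books"
}

func main() {
	var closed []string
	handler(&closed)
	fmt.Println(closed) // prints [books users]
}
```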
Again, I haven't used go-pg, but based on a quick look at its documentation, I believe you won't need to iterate through the rows and scan each id manually; the library seems to provide an API like this:
userIds := []int{}
_, err := pgClient.Query(&userIds, "select id from users where ...", args...)
Regarding your second question, it depends on what you mean by "OK". Since you're iterating synchronously, I don't think reusing id would result in bugs, but as a matter of style, I wouldn't do it.
I think the best approach in your case is this:
// repo layer
func getUserIds(args whatever) ([]int, error) {...}
// these can be exposed, based on your packaging logic
func getBookIds(args whatever) ([]int, error) {...}
// service layer, or wherever you want to aggregate both queries
func getUserAndBookIds() ([]int, []int, error) {
	userIds, err := getUserIds(...)
	if err != nil {
		return nil, nil, err
	}
	bookIds, err := getBookIds(...)
	if err != nil {
		return nil, nil, err
	}
	return userIds, bookIds, nil // errors were handled above
}
I think this code is easier to read/maintain. You won't face the variable reassignment and other issues.
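The question also suggests a reusable "Writer" that takes a query and fills a slice. One way to do this is with plain database/sql, which is what the question actually uses (rather than go-pg). The sketch below is a minimal example of that approach. `queryIDs`, `scanIDs` and the `rowScanner` interface are hypothetical names. `stubRows` exists only so the example runs without a database.

```go
package main

import (
	"context"
	"database/sql"
	"fmt"
)

// rowScanner is the subset of *sql.Rows that scanIDs needs;
// *sql.Rows satisfies it, and so does stubRows below.
type rowScanner interface {
	Next() bool
	Scan(dest ...interface{}) error
	Err() error
}

// scanIDs reads a single integer column from every row.
func scanIDs(rows rowScanner) ([]int, error) {
	var ids []int
	for rows.Next() {
		var id int // declared per row, so nothing leaks between iterations
		if err := rows.Scan(&id); err != nil {
			return nil, err
		}
		ids = append(ids, id)
	}
	// Err reports any error that ended the iteration early.
	return ids, rows.Err()
}

// queryIDs runs a query that selects one integer column and returns the ids.
func queryIDs(ctx context.Context, db *sql.DB, query string, args ...interface{}) ([]int, error) {
	rows, err := db.QueryContext(ctx, query, args...)
	if err != nil {
		return nil, err
	}
	defer rows.Close() // only after the error check, when rows is non-nil
	return scanIDs(rows)
}

// stubRows feeds fixed ids to scanIDs so the sketch runs without a database.
type stubRows struct {
	ids []int
	pos int
}

func (s *stubRows) Next() bool { s.pos++; return s.pos <= len(s.ids) }

func (s *stubRows) Scan(dest ...interface{}) error {
	*(dest[0].(*int)) = s.ids[s.pos-1]
	return nil
}

func (s *stubRows) Err() error { return nil }

func main() {
	ids, err := scanIDs(&stubRows{ids: []int{3, 5, 8}})
	fmt.Println(ids, err) // prints [3 5 8] <nil>
}
```

With such a helper, each additional query in a handler becomes one call, e.g. `userIds, err := queryIDs(ctx, pgClient, getUserIdsQuery)`, and every result set is closed inside the helper.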
You can take a look at the go-pg documentation for more details on how to improve your queries.
I'm trying to run SQL queries from a Go application using the official Tarantool client. The only way I know how to do it is with conn.Eval, as shown below, but I never receive any errors: I can drop non-existent tables and insert rows with duplicate keys, and I never find out that something went wrong.
resp, err := conn.Eval("box.execute([[TRUNCATE TABLE not_exists;]])", []interface{}{})
// err is always nil
// resp.Error is always empty
Can you point out how to get the errors, or the right way to run SQL queries?
Thanks for the question!
I have talked to the team and we have two options for you. Here is the first one:
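resp, err := conn.Eval("return box.execute([[TRUNCATE TABLE \"not_exists\";]])", []interface{}{})
if err != nil {
	log.Fatalln(err)
}
// box.execute returns (result) on success and (nil, err) on failure,
// so a second returned value signals an SQL error
if len(resp.Tuples()) > 1 {
	fmt.Println("Error", resp.Tuples()[1])
} else {
	fmt.Println("Result", resp.Tuples()[0])
}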
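The first option relies on Lua's multiple return values. On success `box.execute` returns the result, and on failure it returns `nil, err`, so a second value means the statement failed. Below is a standard-library sketch of that check. It assumes the values returned by Eval are available as a plain `[]interface{}`. The `executeResult` helper is hypothetical and not part of go-tarantool.

```go
package main

import (
	"errors"
	"fmt"
)

// executeResult interprets the values returned by `return box.execute(...)`:
// (result) on success, (nil, err) on failure.
func executeResult(data []interface{}) (interface{}, error) {
	if len(data) > 1 && data[1] != nil {
		return nil, fmt.Errorf("box.execute failed: %v", data[1])
	}
	if len(data) == 0 {
		return nil, errors.New("box.execute returned nothing")
	}
	return data[0], nil
}

func main() {
	// simulated failure: nil result plus an error value
	if _, err := executeResult([]interface{}{nil, "simulated error"}); err != nil {
		fmt.Println(err)
	}
	// simulated success: a single result value
	res, err := executeResult([]interface{}{"ok"})
	fmt.Println(res, err)
}
```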
And here is the second one:
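resp, err := conn.Eval("local data, err = box.execute(...) return data or box.error(err)", []interface{}{
	`TRUNCATE table "not_exists";`,
})
// box.error(err) raises the SQL error, so it arrives in Go as err
if err != nil {
	log.Fatalln(err)
}
fmt.Println("Result", resp.Tuples())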
I hope that helps! And if it doesn't - let me know and we will look into this one more time.
I'm importing data into Neo4j using neoism, and I have performance issues with bigger imports: 1000 nodes take about 8 s. Here is the part of the code that imports 100 nodes.
It's quite basic code and needs improvement. Can anyone help me improve it?
var wg sync.WaitGroup
for _, itemProps := range items {
wg.Add(1)
go func(i interface{}) {
s := time.Now()
cypher := neoism.CypherQuery{
Statement: fmt.Sprintf(`
CREATE (%v)
SET i = {Props}
RETURN i
`, ItemLabel),
Parameters: neoism.Props{"Props": i},
}
if err := database.ExecuteCypherQuery(cypher); err != nil {
utils.Error(fmt.Sprintf("error ImportItemsNeo4j! %v", err))
wg.Done()
return
}
utils.Info(fmt.Sprintf("import Item success! took: %v", time.Since(s)))
wg.Done()
}(itemProps)
}
wg.Wait()
AFAIK neoism still uses the old APIs; you should use cq instead: https://github.com/go-cq/cq
You should also batch your creates, i.e. either send multiple statements per request (e.g. 100 statements per request),
or, even better, send a list of parameters to a single Cypher query,
e.g. where {data} is a list like [{id:1},{id:2},...]:
UNWIND {data} as props
CREATE (n:Label) SET n = props
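Below is a standard-library sketch of the batching idea. It splits the items into chunks of 100 and sends each chunk as the `data` parameter of one UNWIND query. `chunk`, `buildRequests` and `batchRequest` are hypothetical helpers. Actually sending each request is left out because it depends on the driver (cq, neoism, ...). The statement uses `$data` because `{data}` is the old parameter syntax and Neo4j 4.0 and later only accept `$data`.

```go
package main

import "fmt"

// The UNWIND statement from the answer, using the $param syntax.
const importStatement = `UNWIND $data AS props
CREATE (n:Label) SET n = props`

// batchRequest pairs the statement with one chunk of node properties.
type batchRequest struct {
	Statement  string
	Parameters map[string]interface{}
}

// chunk splits items into consecutive slices of at most size elements.
func chunk(items []map[string]interface{}, size int) [][]map[string]interface{} {
	if size <= 0 {
		size = len(items)
	}
	var out [][]map[string]interface{}
	for size < len(items) {
		out = append(out, items[:size])
		items = items[size:]
	}
	if len(items) > 0 {
		out = append(out, items)
	}
	return out
}

// buildRequests turns the items into one request (one round trip) per chunk.
func buildRequests(items []map[string]interface{}, size int) []batchRequest {
	var reqs []batchRequest
	for _, c := range chunk(items, size) {
		reqs = append(reqs, batchRequest{
			Statement:  importStatement,
			Parameters: map[string]interface{}{"data": c},
		})
	}
	return reqs
}

func main() {
	items := make([]map[string]interface{}, 250)
	for i := range items {
		items[i] = map[string]interface{}{"id": i + 1}
	}
	// prints 100, 100 and 50: three round trips instead of 250
	for _, r := range buildRequests(items, 100) {
		fmt.Println(len(r.Parameters["data"].([]map[string]interface{})))
	}
}
```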
I think I made a silly mistake somewhere, but I haven't been able to figure out where for a long time :( The code is rough; I'm just testing things.
It deletes documents, but for some reason not all of them. When I rewrote it to delete them one by one, that worked fine.
I use the official Couchbase package: http://github.com/couchbase/gocb
Here is the code:
var items []gocb.BulkOp
myQuery := gocb.NewN1qlQuery([Selecting ~ 283k documents from 1.5mln])
rows, err := myBucket.ExecuteN1qlQuery(myQuery, nil)
checkErr(err)
var idToDelete map[string]interface{}
for rows.Next(&idToDelete) {
items = append(items, &gocb.RemoveOp{Key: idToDelete["id"].(string)})
}
if err := rows.Close(); err != nil {
fmt.Println(err.Error())
}
if err := myBucket.Do(items);err != nil {
fmt.Println(err.Error())
}
This way it deleted ~70k documents; when I ran it again, it deleted 43k more.
Then I just let it delete one by one, and it worked fine:
//var items []gocb.BulkOp
myQuery := gocb.NewN1qlQuery([Selecting ~ 180k documents from ~1.3mln])
rows, err := myBucket.ExecuteN1qlQuery(myQuery, nil)
checkErr(err)
var idToDelete map[string]interface{}
for rows.Next(&idToDelete) {
//items = append(items, &gocb.RemoveOp{Key: idToDelete["id"].(string)})
_, err := myBucket.Remove(idToDelete["id"].(string), 0)
checkErr(err)
}
if err := rows.Close(); err != nil {
fmt.Println(err.Error())
}
//err = myBucket.Do(items)
By default, N1QL queries use a consistency level called 'not bounded'. Thus, when you run the program a second time, the query uses whatever index state is available at that moment, rather than waiting until the index has caught up with all of your previous mutations. You can read more about this in Couchbase's Developer Guide; it looks like you'll want to set RequestPlus consistency on myQuery through the query's Consistency method (e.g. myQuery.Consistency(gocb.RequestPlus)).
This eventually consistent secondary indexing is powerful because it lets you, as a developer, decide what level of consistency you want to pay for, since index recalculations have a cost.