Elastic Go cannot find document - elasticsearch

Sorry if this is a dumb question. I'm using a service that was built with an Elasticsearch client for Go. I run the service, and it seems like the Elasticsearch server has an index for the data. However, when I try to query that data with http://129.94.14.234:9200/chromosomes/chromosome/1, I get {"_index":"chromosomes","_type":"chromosome","_id":"1","_version":1,"found":true,"_source":{"id":"1","length":249250621}}. I checked that the SQL query against the database returns that data. The question now is: how do I check that my Elasticsearch index actually has that data? Or, if anyone can tell me what might be wrong in the code, that would be great as well.
Here's the code that, I assume, adds the documents to the chromosomes index.
func (c ChromosomeIndexer) AddDocuments(db *sql.DB, client *elastic.Client, coordID int) {
    sqlQuery := fmt.Sprintf("SELECT seq_region.name, seq_region.length FROM seq_region WHERE seq_region.`name` REGEXP '^[[:digit:]]{1,2}$|^[xXyY]$|(?i)^mt$' AND seq_region.`coord_system_id` = %d", coordID)
    stmtOut, err := db.Prepare(sqlQuery)
    check(err)
    defer stmtOut.Close()

    rows, err := stmtOut.Query()
    check(err)
    defer rows.Close()

    // chromoFn scans one row and adds a corresponding bulk index request.
    chromoFn := func(rows *sql.Rows, bulkRequest *elastic.BulkService) {
        var name string
        var length int
        err = rows.Scan(&name, &length)
        check(err)
        chromo := Chromosome{ID: name, Length: length}
        fmt.Printf("chromoID: %s\n", chromo.ID)
        req := elastic.NewBulkIndexRequest().
            OpType("index").
            Index("chromosomes").
            Type("chromosome").
            Id(chromo.ID).
            Doc(chromo)
        bulkRequest.Add(req)
    }
    elasticutil.IterateSQL(rows, client, chromoFn)
}
This service has other indices that I can query without any problem; I only have trouble when querying the chromosomes data.
Please let me know if I should post more code to give more context on the problem. I just started with Go and Elasticsearch, and reading the documentation has only led to more confusion.
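A minimal sketch of one way to check what the index actually contains, using a count query on the same olivere/elastic client (this assumes a v5-style client whose Do takes a context; it is an illustration, not code from the post):

count, err := client.Count("chromosomes").Type("chromosome").Do(context.Background())
if err != nil {
    // handle the error
}
fmt.Printf("chromosomes index contains %d documents\n", count)

If the count is lower than the number of rows the SQL query returns, the bulk request is probably not being executed, or some documents are failing to index.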

Related

Multiple queries to Postgres within the same function

I'm new to Go, so sorry for the silly question in advance!
I'm using the Gin framework and want to make multiple queries to the database within the same handler (database/sql + lib/pq).
userIds := []int{}
bookIds := []int{}
var id int

/* Handling first query here */
rows, err := pgClient.Query(getUserIdsQuery)
defer rows.Close()
if err != nil {
    return
}
for rows.Next() {
    err := rows.Scan(&id)
    if err != nil {
        return
    }
    userIds = append(userIds, id)
}

/* Handling second query here */
rows, err = pgClient.Query(getBookIdsQuery)
defer rows.Close()
if err != nil {
    return
}
for rows.Next() {
    err := rows.Scan(&id)
    if err != nil {
        return
    }
    bookIds = append(bookIds, id)
}
I have a couple of questions regarding this code (any improvements and best practices would be appreciated):
Does Go properly handle defer rows.Close() in such a case? I mean, I reassign the rows variable later in the code, so will the compiler track both and properly close them at the end of the function?
Is it OK to reuse the shared id variable, or should I redeclare it inside the rows.Next() loop?
What's the better approach when there are even more queries within one handler? Should I have some kind of Writer that accepts a query and a slice and populates the slice with the retrieved ids?
Thanks.
I've never worked with the go-pg library, and my answer mostly focuses on the other points, which are generic and not specific to Go or go-pg.
Regardless of the fact that rows holds the same reference while being shared between the two queries (so one rows.Close() call would suffice, unless the library has some special implementation), defining two variables is cleaner, like userRows and bookRows.
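A minimal sketch of that split, with each error checked before its Close is deferred (pgClient and the query names are taken from the question):

userRows, err := pgClient.Query(getUserIdsQuery)
if err != nil {
    return
}
defer userRows.Close()

bookRows, err := pgClient.Query(getBookIdsQuery)
if err != nil {
    return
}
defer bookRows.Close()

Note that the original snippet defers rows.Close() before checking err; if Query fails, rows is nil and the deferred Close will panic, so check the error first.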
Although, as I said, I have not worked with go-pg, I believe you won't need to iterate through the rows and scan the id for every row manually; the library appears to provide an API like this (based on a quick look at the documentation):
userIds := []int{}
err := pgClient.Query(&userIds, "select id from users where ...", args...)
Regarding your second question, it depends on what you mean by "OK". Since you're doing synchronous iteration, I don't think it would lead to bugs, but when it comes to coding style, I personally wouldn't do it.
I think that the best thing to do in your case is this:
// repo layer
func getUserIds(args whatever) ([]int, error) {...}
// these can be exported, based on your packaging logic
func getBookIds(args whatever) ([]int, error) {...}

// service layer, or wherever you want to aggregate both queries
func getUserAndBookIds() ([]int, []int, error) {
    userIds, err := getUserIds(...)
    // potential error handling
    bookIds, err := getBookIds(...)
    // potential error handling
    return userIds, bookIds, nil // you have done the error handling earlier
}
I think this code is easier to read/maintain. You won't face the variable reassignment and other issues.
You can take a look at the go-pg documentation for more details on how to improve your queries.

Get a single item from the query Firestore

I'm trying to get a single document from Firestore (Firebase) using Golang. I know that it is easy if you have an id; you can write something like this:
database.Collection("profiles").Doc(userId).Get(ctx)
In my case I need to find a specific document using a Where condition, and this is where I get stuck. So far I've only been able to come up with the following:
database.Collection("users").Where("name", "==", "Mark").Limit(1).Documents(ctx).GetAll()
And it is obviously not the best solution, since I am looking for only one (basically the first) document that matches the condition, and using GetAll() seems really weird. What would be the best approach?
The application can call Next and Stop to get a single document:
func getOne(ctx context.Context, q firestore.Query) (*firestore.DocumentSnapshot, error) {
    it := q.Limit(1).Documents(ctx)
    defer it.Stop()
    snap, err := it.Next()
    if err == iterator.Done {
        err = fmt.Errorf("no matching documents")
    }
    return snap, err
}
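A possible call site, assuming database is the *firestore.Client from the question (the error handling here is just a sketch):

snap, err := getOne(ctx, database.Collection("users").Where("name", "==", "Mark"))
if err != nil {
    log.Fatal(err)
}
fmt.Println(snap.Data())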

Query max date in elasticsearch

I need to get, in Go, the latest date from a datetime field of the documents in an Elasticsearch index. Basically, I have this query that I run directly against Elasticsearch and it returns just what I need:
GET localhost:9200/index-name/_search
{
  "aggs": {
    "max_date": { "max": { "field": "dateTime" } }
  }
}
I need to do that same query in Go. I saw that there is a MaxAggregation in the olivere/elastic library, but I'm not sure how to use it. Does someone know how to do that?
Here's how to get started with the Go library.
After you instantiate a client, you can do the following:
maxDateAgg := elastic.NewMaxAggregation().Field("dateTime")
builder := client.Search().Index("index-name").Pretty(true)
builder = builder.Aggregation("max_date", maxDateAgg)
After that, call .Do(ctx) on the builder.
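A minimal sketch of that last step, assuming an olivere/elastic v5+ client where Do takes a context (the variable names are placeholders):

ctx := context.Background()
searchResult, err := builder.Do(ctx)
if err != nil {
    log.Fatal(err)
}
// The max aggregation comes back as a single metric value; for a date field it is epoch millis.
if maxDate, found := searchResult.Aggregations.Max("max_date"); found && maxDate.Value != nil {
    fmt.Println("max dateTime (epoch millis):", *maxDate.Value)
}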
I managed to make it work with one of the answers and some things I already had:
maxDateAgg := elastic.NewMaxAggregation().Field("dateTime")
builder := esClient.Search().Index("index-name").Pretty(true)
builder = builder.Aggregation("max_datetime", maxDateAgg)
builder = builder.Size(1).Sort("dateTime", false)
searchResult, err := builder.Do(ctx)
This returns all the fields of the document with the max datetime, so I made a struct and json.Unmarshal-ed the search hit's _source into it, so I could take only the datetime field from the struct.
For those wondering how to get the date:
ctx := context.Background()
termQuery := elastic.NewMatchAllQuery()
agg := elastic.NewMaxAggregation().Field("le_nested.last_updated")
// client is an *elastic.Client instance
sr, err := client.Search("your_index").Query(termQuery).Aggregation("max_date", agg).Do(ctx)
if err != nil {
    m := fmt.Sprint("error getting last updated date: ", err)
    fmt.Println(m)
}
max, found := sr.Aggregations.MaxBucket("max_date")
if !found {
    m := fmt.Sprint("max_date aggregation not found: ", err)
    fmt.Println(m)
}
fmt.Println("MAX DATE", max.ValueAsString)
tm := int64(*max.Value) / 1000
mu := time.Unix(tm, 0)
fmt.Println("DATE", mu)
ES returns the dates in millis, so there's no need to unmarshal anything.

Insert thousand nodes to neo4j using golang

I am importing data into neo4j using neoism, and I have some issues importing big data: 1000 nodes take 8s. Here is the part of the code that imports 100 nodes.
It's quite basic code that needs improvement; can anyone help me improve it?
var wg sync.WaitGroup
for _, itemProps := range items {
    wg.Add(1)
    go func(i interface{}) {
        s := time.Now()
        cypher := neoism.CypherQuery{
            Statement: fmt.Sprintf(`
                CREATE (%v)
                SET i = {Props}
                RETURN i
            `, ItemLabel),
            Parameters: neoism.Props{"Props": i},
        }
        if err := database.ExecuteCypherQuery(cypher); err != nil {
            utils.Error(fmt.Sprintf("error ImportItemsNeo4j! %v", err))
            wg.Done()
            return
        }
        utils.Info(fmt.Sprintf("import Item success! took: %v", time.Since(s)))
        wg.Done()
    }(itemProps)
}
wg.Wait()
Afaik neoism still uses old APIs; you should use cq instead: https://github.com/go-cq/cq
You should also batch your creates, i.e. either send multiple statements per request (e.g. 100 statements per request), or, even better, send a list of parameters to a single Cypher query, e.g. where {data} is [{id:1},{id:2},...]:
UNWIND {data} AS props
CREATE (n:Label) SET n = props
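A rough sketch of that batched form using the neoism types already present in the question (database, ItemLabel, items and utils come from the question's code; this is illustrative rather than a tested drop-in):

batch := make([]interface{}, 0, len(items))
for _, itemProps := range items {
    batch = append(batch, itemProps)
}
cypher := neoism.CypherQuery{
    Statement: fmt.Sprintf(`
        UNWIND {data} AS props
        CREATE (n:%v)
        SET n = props
    `, ItemLabel),
    Parameters: neoism.Props{"data": batch},
}
if err := database.ExecuteCypherQuery(cypher); err != nil {
    utils.Error(fmt.Sprintf("error importing items: %v", err))
}

One request carrying a parameter list avoids both the per-node HTTP round trip and the unbounded number of goroutines.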

golang couchbase gocb.RemoveOp - doesnt removes all

I think I made a silly mistake somewhere, but I could not figure out where for a long time :( The code is rough; I'm just testing things.
It deletes documents, but for some reason not all of them. I have rewritten it to delete them one by one, and that worked OK.
I use the official package for Couchbase: http://github.com/couchbase/gocb
Here is the code:
var items []gocb.BulkOp
myQuery := gocb.NewN1qlQuery([Selecting ~ 283k documents from 1.5mln])
rows, err := myBucket.ExecuteN1qlQuery(myQuery, nil)
checkErr(err)

var idToDelete map[string]interface{}
for rows.Next(&idToDelete) {
    items = append(items, &gocb.RemoveOp{Key: idToDelete["id"].(string)})
}
if err := rows.Close(); err != nil {
    fmt.Println(err.Error())
}
if err := myBucket.Do(items); err != nil {
    fmt.Println(err.Error())
}
This way it deleted ~70k documents; I ran it again and it deleted 43k more.
Then I just let it delete them one by one, and it worked fine:
//var items []gocb.BulkOp
myQuery := gocb.NewN1qlQuery([Selecting ~ 180k documents from ~1.3mln])
rows, err := myBucket.ExecuteN1qlQuery(myQuery, nil)
checkErr(err)

var idToDelete map[string]interface{}
for rows.Next(&idToDelete) {
    //items = append(items, &gocb.RemoveOp{Key: idToDelete["id"].(string)})
    _, err := myBucket.Remove(idToDelete["id"].(string), 0)
    checkErr(err)
}
if err := rows.Close(); err != nil {
    fmt.Println(err.Error())
}
//err = myBucket.Do(items)
By default, queries against N1QL use a consistency level called 'not bounded'. Thus, your second run of the program queries whatever index state is valid at the time of the query, rather than waiting for the index to catch up with all of your previous mutations. You can read more about this in Couchbase's Developer Guide, and it looks like you'll want to apply the RequestPlus consistency level to your myQuery through the query's consistency method.
This kind of eventually consistent secondary indexing, and the flexibility it gives, is pretty powerful because it lets you as a developer decide what level of consistency you want to pay for, since index recalculations have a cost.
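A minimal sketch of applying that to the code above, assuming the gocb 1.x API used in the question (double-check the Consistency setter against your gocb version):

// Ask the query to wait until the index reflects all mutations made before it was issued.
myQuery.Consistency(gocb.RequestPlus)
rows, err := myBucket.ExecuteN1qlQuery(myQuery, nil)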
