Why is searchResult.TotalHits() different from len(searchResult.Hits.Hits)? - elasticsearch

I work with the golang elastic 5 API to run queries in Elasticsearch. I check the number of hits with searchResult.TotalHits() and it gives me a large number (more than 100), but when I iterate over the hits I get only 10 entities. len(searchResult.Hits.Hits) is also 10.
I tried different queries, and when I select fewer than 10 entities it works well.
query = elastic.NewBoolQuery()
ctx := context.Background()
query = query.Must(
    elastic.NewTermQuery("key0", "term"),
    elastic.NewWildcardQuery("key1", "*term2*"),
    elastic.NewWildcardQuery("key3", "*.*"),
    elastic.NewRangeQuery("timestamp").From(fromTime).To(toTime),
)
searchResult, err = client.Search().Index("index").
    Query(query).Pretty(true).Do(ctx)
fmt.Printf("TotalHits(): %v", searchResult.TotalHits())                  // gives 482
fmt.Printf("length of the hits array: %v", len(searchResult.Hits.Hits)) // gives 10
for _, hit := range searchResult.Hits.Hits {
    var tweet Tweet
    _ = json.Unmarshal(*hit.Source, &tweet)
    fmt.Printf("entity: %s", tweet) // prints only 10 entities
}
What am I doing wrong? Are there batches in the SearchResult or what could be the solution?

It's not specified in your question, so please comment if you're using a different client library (e.g. the official client), but it appears you're using github.com/olivere/elastic. Based on that assumption, what you're seeing is the default result set size of 10. The TotalHits number is how many documents match your query in total; the Hits number is how many were returned in the current result set, which you can manipulate using Size, Sort and From. Size is documented as:
Size is the number of search hits to return. Defaults to 10.
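For example, here is a minimal sketch that raises the page size; it reuses the client, query and ctx values from the question, and the page size of 100 is just an illustrative choice:

// Fetch up to 100 hits per request instead of the default 10
// (reusing client, query and ctx from the question).
searchResult, err := client.Search().
    Index("index").
    Query(query).
    Size(100). // page size; the default is 10
    From(0).   // offset; advance by the page size for the next page
    Do(ctx)
if err != nil {
    // handle the error
}
fmt.Printf("TotalHits: %d, returned: %d\n",
    searchResult.TotalHits(), len(searchResult.Hits.Hits))

If you need all 482 matching documents rather than a single page, the same library also provides a scroll service (client.Scroll), which iterates over the full result set in batches.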

Related

How to write a general Go/Gin function for 14 point-awarding behaviors, and then call or check it in one place?

There are about 14 behaviors that send points to users, such as user registration, login, purchase, and chat. The requirement is that the existing interface code in Go must not be changed.
id | motion    | points | remark
1  | /register | 5      | a user has only one chance to get these points
2  | /login    | 5      | every day the user has one chance to get points
3  | /comment  | 3      | every day the user has five chances to get points
4  | /pay      | 10     | every day the user has three chances to get points
5  | /invite   | 10     | every day the user has three chances to get points
6  | /send     | 10     | every day the user has three chances to get points
7  | /purchase | 10     | every day the user has three chances to get points
A user can earn at most 100 points per day.
Through the following partial code for /register, every user gets 5 points after registering successfully.
func (r *User) CreateUser(CreateUser *CreateUserModel, c *lmhttp.Context) {
    err := r.userDB.insertScoreTx(CreateUser.UserID, 1, 5)
    if err != nil {
        r.Error("Send points failed", zap.Error(err))
        c.ResponseError(errors.New("Send points failed"))
        return
    }
    c.Response(LoginUserDetail(UserModel, token, r.ctx))
}
And the data in the user_points table is as below:
id | user_id | points | points_type
1  | 1       | 5      | 1
Can I write a general function for sending points and then call or check it in one place, i.e. decide whether to award points whenever an interface request succeeds, for example wherever each interface sends its success response?
// Response OK
func (c *Context) ResponseOK() {
    c.JSON(http.StatusOK, gin.H{
        "status": http.StatusOK,
    })
}
Thanks so much for any advice
You can definitely do what you're asking. I don't know Go, but some web servers have a hook to run a function on every successful request (see the middleware sketch at the end of this answer). In that function you can check, based on the URL, whether to add points for the user. You are already storing the type of points, but you also need to store how many chances the user has used up for the day.
Then you can write the logic for whether to add points for each of the rules.
If Go has no such hook that runs on every request, you can go as basic as a function that takes in an action name, e.g. 'register' or 'login', and call it manually from every handler where you want to add points.
You also need to reset the data based on the date, since all the rules are per-day.
My best advice for that is to also store the current date on every user_points entry. When checking whether to add points, if the stored date is not today, set the number of used-up chances to 1 and the date to the current time.
e.g. table:
id | user_id | points | points_type | chances_used | date
1  | 1       | 5      | 1           | 2            | 14:33 09/02/2023
e.g logic from inside the function:
if type == 'comment':
points = db.get(user_points).filter(user.id=1, points_type=3)
if points.date == date.today():
points.chances_used += 1
if points.chances_used < 5
user.points += 3
if points.date < date.today():
points.chances_used = 1
points.date = date.now()
user.points += 3
Though, thinking about it now, this means you would lose the record of where a user's points came from, because I'm using user.points to store the running total.
You might want to just add a new row to the table every time instead. Rather than fetching with something like db.get(user_points).filter(user_id=1, points_type=3), just get the user's latest row for that points_type, use that row to check the date and get the chances_used, but save your data into a new row.
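In Gin specifically, the hook described above can be written as middleware. Here is a minimal sketch under stated assumptions: awardPoints is a hypothetical helper (not from the question's code), and the user id is assumed to have been stored in the context by an earlier auth middleware:

package main

import (
    "net/http"

    "github.com/gin-gonic/gin"
)

// awardPoints is a hypothetical helper: it would look up the rule for
// the route, check chances_used and the daily 100-point cap, and
// insert a user_points row if points are due.
func awardPoints(userID, route string) {}

// pointsMiddleware runs the handler first and awards points only if
// the request succeeded.
func pointsMiddleware() gin.HandlerFunc {
    return func(c *gin.Context) {
        c.Next() // run the actual handler
        if c.Writer.Status() == http.StatusOK {
            userID := c.GetString("userID")   // assumes an auth middleware set this
            awardPoints(userID, c.FullPath()) // e.g. "/register", "/login"
        }
    }
}

func main() {
    r := gin.Default()
    r.Use(pointsMiddleware())
    r.Run()
}

This keeps the existing handlers untouched, which matches the requirement not to change the previous interface code.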
You could create a dedicated struct or function to handle scores. From what I see in the code, in order to apply points to the user you need to pass a few pieces of information, such as userDB, UserId, points_type and points, the same as what you already have in the code. If you want the function to be stateless and available everywhere, you can define it in a dedicated userpoints package and simply export it:
func SendPointsTo(db userDB, userId, pointsType, pointsCount int) error {
    err := db.insertScoreTx(userId, pointsType, pointsCount)
    if err != nil {
        return errors.New("sending points failed")
    }
    return nil
}
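For example, the registration handler could then call it like this (a sketch reusing the names from the question's code):

// Inside CreateUser, after the user record has been created:
if err := userpoints.SendPointsTo(r.userDB, CreateUser.UserID, 1, 5); err != nil {
    r.Error("Send points failed", zap.Error(err))
    c.ResponseError(err)
    return
}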
Don't mix the logic of creating a user with other actions - it introduces confusion and breaks the single-responsibility principle. It's better to make a dedicated createUserHandler that performs the chain of steps defined by the contract; part of that contract would then be the additional SendPointsTo call.
Is that what you mean? If not, can you state the problem more precisely?

NRediSearch - Getting total documents matched count

Is there a way to get the total count of matched results when calling the Aggregate function?
Note that I'm not using the Aggregate function to aggregate results, but as an advanced search query, because the Search function does not allow sorting by multiple fields.
RediSearch returns the total count of matched documents, but I can't find a way to get this number using the NRediSearch library.
With NRediSearch
Using NRediSearch, you would need to build and execute an aggregation that runs a GROUPBY 0 with the COUNT reducer. Say you have a person-idx index and you want to count all the Person documents in Redis:
var client = new Client("person-idx", muxer.GetDatabase());
var result = await client.AggregateAsync(new AggregationBuilder().GroupBy(new List<string>(), new List<Reducer>{Reducers.Count()}));
Console.WriteLine(result.GetResults().First().Values.First());
This will get the count you are looking for.
With Redis.OM
There's a newer library, Redis.OM, which you can also use to make these aggregations a bit simpler; the same operation would be done with the following:
var peopleAggregations = provider.AggregationSet<Person>();
Console.WriteLine(peopleAggregations.Count());

Using a single query for multiple searches in ElasticSearch

I have a dataset with documents that are identifiable by three fields, let's say "name", "timestamp" and "country". I use elasticsearch-dsl-py, but I can read native Elasticsearch queries, so I can accept those as answers as well.
Here's my code to get a single document by the three fields:
def get(name, timestamp, country):
    search = Item.search()
    search = search.filter("term", name=name)
    search = search.filter("term", timestamp=timestamp)
    search = search.filter("term", country=country)
    search = search[:1]
    return search.execute()[0]
This is all good, but sometimes I'll need to get 200+ items and calling this function means 200 queries to ES.
What I'm looking for is a single query that will take a list of the three field-identifiers and return all the documents matching it, no matter the order.
I've tried using ORs + ANDs but unfortunately the performance is still poor, although at least I'm not making 200 round trips to the server.
def get_batch(list_of_identifiers):
    search = Item.search()
    batch_query = None
    for ref in list_of_identifiers:
        sub_query = Q("match", name=ref["name"])
        sub_query &= Q("match", timestamp=ref["timestamp"])
        sub_query &= Q("match", country=ref["country"])
        if not batch_query:
            batch_query = sub_query
        else:
            batch_query |= sub_query
    search = search.filter(batch_query)
    return search.scan()
Is there a faster/better approach to this problem?
Is using a multi-search going to be faster than using should/must (OR/AND) clauses in a single query?
EDIT: I tried multi-search and there was virtually no difference in the time. We're talking about seconds here. For 6 items it takes 60ms to get the result, for 200 items we're talking about 4-5 seconds.

How to return the N documents closest to a specific key from a couchdb view

I have a view on a couchdb database which exposes a certain document property as a key:
function (doc) {
    if (doc.docType && doc.docType === 'CARD') {
        if (doc.elo) {
            emit(doc.elo, doc._id);
        } else {
            emit(1000, doc._id);
        }
    }
}
I'm interested in querying this db for the (say) 25 documents with keys closest to a given input. The only thing I can think to do is to set a search range and make repeated queries until I have enough results:
// pouchdb's query fcn
async function getNresultsClosestToK(key: number, limit: number) {
    let range = 20;
    let cards;
    do {
        cards = await this.db.query('elo', {
            limit,
            startkey: (key - range).toString(),
            endkey: (key + range).toString()
        });
        range += 20;
    } while (cards.rows.length < limit);
    return cards;
}
But this may require several calls and is inefficient. Is there a way to pass a single key and a limit to CouchDB and have it return the limit documents closest to the supplied key?
If I understand correctly, you want to query for a specific key, then return 12 results before the key, the key itself, and 12 results after the key, for a total of 25 results.
The most direct way to do this is with two queries against your view, with the proper combination of startkey, limit, and descending values.
For example, to get the key itself, and the 12 values following, query your view with these options:
startkey: <your key>
limit: 13
descending: false
Then to get the 12 entries before your key, perform a query with the following options:
startkey: <your key>
limit: 13
descending: true
This will give you two result sets, each with (a maximum of) 13 items. Note that your target key will be repeated (it's in each result set). You'll then need to combine the two result sets.
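To make the combining step concrete, here is a minimal sketch of the merge logic, written in Go purely for illustration; queryView is a hypothetical stand-in for the two view queries above, not a real client call:

package closest

import "sort"

// Row is a simplified view row: the emitted key (elo) and the doc id.
type Row struct {
    Key float64
    ID  string
}

// queryView is a hypothetical helper standing in for a CouchDB view
// query with the given startkey, limit and descending options.
func queryView(startkey float64, limit int, descending bool) []Row {
    return nil // placeholder for the actual view request
}

// closestTo returns up to 25 rows closest to key, following the
// two-query approach described above.
func closestTo(key float64) []Row {
    after := queryView(key, 13, false) // the key itself + 12 following
    before := queryView(key, 13, true) // the key itself + 12 preceding

    // The target key appears in both result sets; drop one copy.
    if len(before) > 0 && len(after) > 0 && before[0].ID == after[0].ID {
        before = before[1:]
    }
    merged := append(before, after...)

    // Keep the 25 rows nearest the target, regardless of side.
    sort.Slice(merged, func(i, j int) bool {
        return abs(merged[i].Key-key) < abs(merged[j].Key-key)
    })
    if len(merged) > 25 {
        merged = merged[:25]
    }
    return merged
}

func abs(x float64) float64 {
    if x < 0 {
        return -x
    }
    return x
}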
Note this does have a few limitations:
It returns a maximum of 26 rows, i.e. 25 unique results once the duplicated target key is removed. If your data does not contain 12 values before or after your target key, you'll get fewer results.
If you have duplicate keys, you may get unexpected results. In particular:
If your target key is duplicated, you'll get 25 - N unique results (where N is the number of duplicates of your target key)
If your non-target keys are duplicated, you have no way of guaranteeing which of the duplicate keys will be returned, so performing the same query multiple times may result in different return values.

ArangoDb - How to count number of filtered results before limiting them

db.query(aql `
    FOR doc IN ${collection}
        FILTER REGEX_TEST(CONCAT(VALUES(doc, true)), ${queryStr}, true)
        SORT doc[${sortBy}] ${dir}
        LIMIT ${start}, ${count}
        RETURN doc._key
`)
.then(cursor => {
    cb(cursor._result)
}, err => console.log(err))
I have the above AQL query. I want to count the total number of filtered results before limiting them per page (for pagination purposes).
I think the issue is similar to these: MySQL - How to count rows before pagination?, Find total number of results in mySQL query with offset+limit.
I want to do the same in ArangoDB with AQL, and part of the solution may be this: How to count number of elements with AQL?
So, what is the most efficient solution for my requirement with AQL?
You can set the fullCount flag to true in the options for creating the cursor. The result will then have an extra attribute with the sub-attributes stats and fullCount.
You can then get the fullCount attribute via cursor.extra.stats.fullCount. This attribute contains the number of documents in the result before the last LIMIT in the query was applied. See the HTTP documentation.
In addition, you should use the explain feature to analyse your query. In your case, the query will always make a full collection scan, so it won't scale well.
Update
I added the fullCount flag to your code. Keep in mind that the fullCount attribute only appears if the number of results before the last LIMIT is higher than the number of results after it.
db.query(aql `
    FOR doc IN ${collection}
        FILTER REGEX_TEST(CONCAT(VALUES(doc, true)), ${queryStr}, true)
        SORT doc[${sortBy}] ${dir}
        LIMIT ${start}, ${count}
        RETURN {family: doc.family, group: doc.group}
`, {count: true, options: {fullCount: true}})
.then(cursor => { console.log(cursor) }, err => console.log(err))
