Elasticsearch termvectors result page size

GET /myindex/voc/100/_termvectors?pretty=true
{
"fields":["fields.bodyText"],
"term_statistics" : true,
"filter" : {
"min_doc_freq" : 50,
"max_doc_freq" : 60
}
}
This API returns only part of the results.
Is there something like
"from" : 0, "size" : 10,
as in the _search API pagination?

Yes, in the _search API: from is the offset of the first hit to return, and size is the number of hits to return.
So if you have something like this:
"from" : 0, "size" : 10,
it will return the first ten hits of the result set.
Note that these are _search parameters. For _termvectors, the number of terms returned per field is limited by the max_num_terms option of the filter block, which defaults to 25, so raising it may be what you need.
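Independently of any server-side option, the terms that a _termvectors call returns can also be sliced on the client. The following is only a minimal sketch in Python, assuming the usual response shape (term_vectors, then the field name, then terms); the helper name page_terms and the sample response are hypothetical and not part of Elasticsearch.

```python
def page_terms(response, field, offset=0, size=10):
    """Return one page of (term, stats) pairs from a _termvectors response.

    Terms are sorted alphabetically so that pages stay stable between calls.
    """
    terms = response.get("term_vectors", {}).get(field, {}).get("terms", {})
    ordered = sorted(terms.items())
    return ordered[offset:offset + size]


# Hypothetical, hand-written response used only for illustration.
sample = {
    "term_vectors": {
        "fields.bodyText": {
            "terms": {
                "gamma": {"doc_freq": 52, "term_freq": 1},
                "alpha": {"doc_freq": 55, "term_freq": 3},
                "beta": {"doc_freq": 60, "term_freq": 2},
            }
        }
    }
}

first_page = page_terms(sample, "fields.bodyText", offset=0, size=2)
print([term for term, _ in first_page])
```

Sorting before slicing is a design choice made here so that offset and size behave like from and size do in _search; any other stable ordering (for example by doc_freq) would work as well.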

Related

Differentiating _delete_by_query tasks in a multi-tenant index

Scenario:
I have an index with a bunch of multi-tenant data in Elasticsearch 6.x. This data is frequently deleted (via _delete_by_query) and populated by the tenants.
When issuing a _delete_by_query request with wait_for_completion=false, supplying a query JSON to delete a tenant's data, I am able to see generic task information via the _tasks API. The problem is that, with a large number of tenants, it is not immediately clear who is deleting data at any given time.
My question is this:
Is there a way I can view the query that the _delete_by_query task is operating on? Or can I attach an additional parameter to the URL that is stored in the task to differentiate them?
Side note: looking at the docs: https://www.elastic.co/guide/en/elasticsearch/reference/6.6/tasks.html I see there is a description field in the _tasks API response that has the query as a String, however, I do not see that level of detail in my description field:
"description" : "delete-by-query [myindex]"
Thanks in advance
One way to identify queries is to add the X-Opaque-Id HTTP header to your requests.
For instance, when deleting all tenant data for, say, user 3, you can issue the following command:
curl -XPOST -H 'X-Opaque-Id: 3' -H 'Content-type: application/json' 'http://localhost:9200/my-index/_delete_by_query?wait_for_completion=false' -d '{"query":{"term":{"user": 3}}}'
You then get a task ID, and when checking the related task document, you'll be able to identify which task is/was deleting which tenant data thanks to the headers section which contains your HTTP header:
"_source" : {
"completed" : true,
"task" : {
"node" : "DB0GKYZrTt6wuo7d8B8p_w",
"id" : 20314843,
"type" : "transport",
"action" : "indices:data/write/delete/byquery",
"status" : {
"total" : 3,
"updated" : 0,
"created" : 0,
"deleted" : 3,
"batches" : 1,
"version_conflicts" : 0,
"noops" : 0,
"retries" : {
"bulk" : 0,
"search" : 0
},
"throttled_millis" : 0,
"requests_per_second" : -1.0,
"throttled_until_millis" : 0
},
"description" : "delete-by-query [deletes]",
"start_time_in_millis" : 1570075424296,
"running_time_in_nanos" : 4020566,
"cancellable" : true,
"headers" : {
"X-Opaque-Id" : "3" <--- user 3
}
},
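To see at a glance which tenant owns which running task, the headers section can be read back from a _tasks listing and grouped. This is a rough sketch in Python, assuming the response has the usual nodes / tasks nesting and that each task carries a headers object as in the excerpt above; the function name tasks_by_opaque_id and the sample data are made up for illustration.

```python
from collections import defaultdict


def tasks_by_opaque_id(tasks_response):
    """Group task IDs by the X-Opaque-Id header found in a _tasks response."""
    grouped = defaultdict(list)
    for node in tasks_response.get("nodes", {}).values():
        for task_id, task in node.get("tasks", {}).items():
            # Tasks started without the header are collected under "<none>".
            opaque_id = task.get("headers", {}).get("X-Opaque-Id", "<none>")
            grouped[opaque_id].append(task_id)
    return dict(grouped)


# Hypothetical response, trimmed to the fields the helper reads.
sample_tasks = {
    "nodes": {
        "DB0GKYZrTt6wuo7d8B8p_w": {
            "tasks": {
                "DB0GKYZrTt6wuo7d8B8p_w:20314843": {
                    "action": "indices:data/write/delete/byquery",
                    "headers": {"X-Opaque-Id": "3"},
                },
                "DB0GKYZrTt6wuo7d8B8p_w:20314900": {
                    "action": "indices:data/write/delete/byquery",
                    "headers": {},
                },
            }
        }
    }
}

owners = tasks_by_opaque_id(sample_tasks)
print(owners)
```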

How to apply aggregations on grouped fields in Elasticsearch?

On my eCommerce store I want to only include the first item in each group (grouped by item_id) in the final results. At the same time I don't want to lose my aggregations (little numbers next to attributes that indicate how many items with that attribute are found).
Here is a little example:
Suppose I make a search for items and only 25 show up. This is the result for the color aggregation that I currently get:
black (65)
green (32)
white (13)
And I want it to be:
black (14)
green (6)
white (5)
The numbers should amount to the total number the user actually sees on the page.
How could I achieve that with Elasticsearch? I have tried both Grouping (Top Hits) and Field Collapsing and both don't seem to fit my use case. Solr does it almost by default with its Grouping functionality.
It should be rather easy. When you ask for an aggregation, you are simply sending a request to the _search endpoint. Example:
POST /exams/_search
{
"aggs" : {
"avg_grade" : { "avg" : { "field" : "grade" } }
}
}
In the example above, you get the aggregation over all the documents.
If you want the aggregation for specific documents only, add a query to the request body, like:
POST /exams/_search
{
"query": {
"bool" : {
"must" : {
"query_string" : {
"query" : "some query string here"
}
},
"filter" : {
"term" : { "user" : "kimchy" }
}
}
},
"aggs" : {
"avg_grade" : { "avg" : { "field" : "grade" } }
}
}
You can send the size and from parameters as well.
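As an illustration of how the pieces fit into one request, here is a small Python sketch that assembles such a body; build_search_body is a hypothetical helper, not an Elasticsearch client function. Keep in mind that aggregations are computed over every document matching the query, not only over the page selected by from and size.

```python
import json


def build_search_body(query, aggs, from_=0, size=10):
    """Combine a query, aggregations and paging options into one _search body."""
    return {"from": from_, "size": size, "query": query, "aggs": aggs}


body = build_search_body(
    query={"bool": {"filter": {"term": {"user": "kimchy"}}}},
    aggs={"avg_grade": {"avg": {"field": "grade"}}},
    size=25,
)
# The printed JSON can be sent as the body of POST /exams/_search.
print(json.dumps(body, indent=2))
```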

Elasticsearch Switch between previous/next record from search result

I am getting the results based on various filters in the Elasticsearch which also includes pagination.
Now I need to navigate to the previous and next record of those search results when a record from the results is open.
Is there a way to achieve this through Elasticsearch?
You could use the from and size parameters of the Search API.
GET /_search
{
"from" : 0, "size" : 10,
"query" : {
"term" : { "user" : "kimchy" }
}
}
or
GET /_search?from=0&size=10
{
"query" : {
"term" : { "user" : "kimchy" }
}
}
Note the default value for size is 10.
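For previous/next navigation, one possible approach is to remember the position of the open record in the result set and request a small window around it. The sketch below is in Python and assumes the same query and a stable sort order are reused for every request; neighbour_window is a hypothetical helper name.

```python
def neighbour_window(position, total_hits):
    """Return (from, size) covering the hits just before and after `position`.

    `position` is the zero-based index of the open record in the full,
    consistently sorted result set.
    """
    if not 0 <= position < total_hits:
        raise ValueError("position is outside the result set")
    start = max(position - 1, 0)
    end = min(position + 1, total_hits - 1)
    return start, end - start + 1


# Record number 5 (zero-based) out of 100 hits: fetch hits 4, 5 and 6.
from_, size = neighbour_window(5, 100)
print({"from": from_, "size": size})
```

The window shrinks at either end of the result set, so the first record has no previous neighbour and the last record has no next one.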

Ordering term aggregation buckets by sub-aggregation result values

I have two questions about the query seen on this capture:
How do I order by value in the sum_category field in the results?
I use respsize again in the query but it's not correct as you can see below.
Even if I make only an aggregation, why do all the documents come with the result? I mean, if I make a group by query in SQL it retrieves only grouped data, but Elasticsearch retrieves all documents as if I made a normal search query. How do I skip them?
Try this:
{
"query" : {
"match_all" : {}
},
"size" : 0,
"aggs" : {
"categories" : {
"terms" : {
"field" : "category",
"size" : 999999,
"order" : {
"sum_category" : "desc"
}
},
"aggs" : {
"sum_category" : {
"sum" : {
"field" : "respsize"
}
}
}
}
}
}
1) See the note in (2) for what your sort is doing. To order the categories by the value of sum_category, see the order portion of the terms aggregation. There is an old, closed issue related to this (https://github.com/elastic/elasticsearch/issues/4643), but it worked fine for me with Elasticsearch v1.5.2.
2) Although your request has no match_all query, that is probably what you are getting results for, so the sort you specified is applied to those hits. To avoid getting them back, I added the size: 0 portion.
Do you want buckets for all the categories? I noticed you did not specify a size for the main aggregation; that is what the size: 999999 portion is for.
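Reading the result of the request above could look like the following Python sketch. It assumes the standard aggregation response layout (aggregations, then the aggregation name, then buckets); the sample response is invented for illustration and category_sums is a hypothetical helper.

```python
def category_sums(search_response):
    """Return (category, summed respsize) pairs in the order the server sent them."""
    buckets = search_response["aggregations"]["categories"]["buckets"]
    return [(bucket["key"], bucket["sum_category"]["value"]) for bucket in buckets]


# Invented response: with "size": 0 the hits array is empty and only the
# buckets, already ordered by sum_category descending, carry data.
sample_response = {
    "hits": {"total": 6, "hits": []},
    "aggregations": {
        "categories": {
            "buckets": [
                {"key": "video", "doc_count": 2, "sum_category": {"value": 9000.0}},
                {"key": "image", "doc_count": 3, "sum_category": {"value": 1200.0}},
                {"key": "text", "doc_count": 1, "sum_category": {"value": 40.0}},
            ]
        }
    },
}

sums = category_sums(sample_response)
print(sums)
```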

elasticsearch number of facets returned

I have faceted queries working with elasticsearch 0.19.9. However I would like to return all facets that have a count > 0.
According to the documentation I should be able to:
{
"query" : {
"match_all" : { }
},
"facets" : {
"tag" : {
"terms" : {
"field" : "tag",
"all_terms" : true
}
}
}
}
As I understand, this should give me all facets even if count is 0.
However, this still returns only the top 10 facets by count, which is the default size. The only thing that seems to affect the number of returned facets is actually setting "size" : N, where N is the number of facets to return.
I could set this to a really high number, but that just seems too hack-ish.
Any ideas as to what I may be doing wrong?
You're not doing anything wrong. You figured it out correctly! There is an open issue on GitHub to make the terms facet similar to the Terms Stats facet, which allows you to set size=0 in order to get all the terms back. For now you just need to use a high value, which is a bit tricky, I agree. On the other hand, be careful not to return too many entries!
{
"query" : {
"match_all" : { }
},
"facets" : {
"tag" : {
"terms" : {
"field" : "tag",
"size" : 2147483647,
"all_terms" : false
}
}
}
}
The only way to remove the "count: 0" entries is to set "all_terms" to false and set size to as high a number as your Elasticsearch instance allows (the value in the example above, 2147483647, is the largest value of a signed 32-bit integer, which is also the integer maximum in 32-bit PHP).
It may not be the best way, but this is the only known approach so far. Facet filter doesn't work with this at present (unless they updated and improved Elasticsearch to do it).
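If zero-count entries still come back (for example because all_terms was left on), they can also be dropped on the client. This is only a sketch in Python, assuming the legacy terms facet response layout (facets, then the facet name, then a terms list of term/count objects); non_empty_facet_terms and the sample data are hypothetical.

```python
def non_empty_facet_terms(search_response, facet_name):
    """Return (term, count) pairs of a terms facet, skipping zero counts."""
    facet = search_response.get("facets", {}).get(facet_name, {})
    return [
        (entry["term"], entry["count"])
        for entry in facet.get("terms", [])
        if entry["count"] > 0
    ]


# Invented response for illustration only.
sample_facets = {
    "facets": {
        "tag": {
            "_type": "terms",
            "terms": [
                {"term": "search", "count": 12},
                {"term": "lucene", "count": 4},
                {"term": "unused", "count": 0},
            ],
        }
    }
}

tags = non_empty_facet_terms(sample_facets, "tag")
print(tags)
```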
