Select and Update all matching Documents - elasticsearch

We are trying to do the following, and any help would be appreciated.
Say you run a search and 100,000 documents match.
We would like to increment a counter in each matching document and, at the same time, select the first page (say, the first 50 hits).
Can this be done in one operation, or perhaps with two operations run in parallel?

You could try a multi search request for this kind of maneuver:
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-multi-search.html
Basically, you put several queries into a single multi search request; Elasticsearch runs them in parallel and returns a list of responses, one per query.
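As a rough sketch of what such a request looks like: the multi search endpoint takes newline-delimited JSON, alternating a header line and a search body line. The helper name `build_msearch_body`, the index name `my-index`, and the queries are assumptions made up for illustration. Note that multi search only bundles searches, so the counter increment would still need a separate update request.

```python
import json


def build_msearch_body(searches):
    """Serialize (header, body) pairs into the newline-delimited JSON
    format that the _msearch endpoint expects."""
    lines = []
    for header, body in searches:
        lines.append(json.dumps(header))
        lines.append(json.dumps(body))
    # The request body has to end with a newline character.
    return "\n".join(lines) + "\n"


# Hypothetical index name and queries, for illustration only.
searches = [
    ({"index": "my-index"},
     {"query": {"match": {"title": "foo"}}, "from": 0, "size": 50}),
    ({"index": "my-index"},
     {"query": {"match_all": {}}, "size": 0}),
]

payload = build_msearch_body(searches)
print(payload)
```

The resulting string would be sent as the body of a request to `/_msearch` with the `application/x-ndjson` content type.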

You can use update by query through NEST. Let me know if you are still facing any issues.

You can use update by query to do this:
{
"script": {
// fieldName is the field you want to increment in each matching document
"source": "ctx._source.fieldName += params.val",
"lang": "painless",
"params": {
// amount added to the counter for every document the query matches
"val": 1
}
},
"query": {
// your match condition
}
}
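A minimal Python sketch of the two request bodies involved, assuming the same query is reused for the update and for fetching the first page. The helper names, the `viewCount` field, and the sample query are hypothetical. These are still two separate requests (`_update_by_query` and `_search`), so they are not applied atomically.

```python
import json


def build_increment_request(query, field, amount=1):
    """Body for POST /<index>/_update_by_query that adds `amount` to `field`
    in every document matching `query`."""
    return {
        "script": {
            "lang": "painless",
            # The field name is spliced into the script; the amount is a param.
            "source": f"ctx._source.{field} += params.val",
            "params": {"val": amount},
        },
        "query": query,
    }


def build_first_page_request(query, size=50):
    """Body for POST /<index>/_search that returns the first `size` hits."""
    return {"from": 0, "size": size, "query": query}


# Hypothetical query and counter field, for illustration only.
query = {"match": {"title": "elasticsearch"}}
update_body = build_increment_request(query, "viewCount")
search_body = build_first_page_request(query)

print(json.dumps(update_body, indent=2))
print(json.dumps(search_body, indent=2))
```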

Related

ElasticSearch 7.7 how can I increase the count of results of whole index

I understand that there is a hard-coded limit in Elasticsearch of 10k results per query. What I want to know is whether there is any way to search within this 10k limit while still showing the count of all results for the query.
So, suppose 1M results match a certain query: the count should show 1M instead of the 10k limit.
Thank you.
Yes, you can.
Add the attribute below to your search query:
{
"track_total_hits": true
}
It will show the total count along with the default results.
Elasticsearch also provides a /_count API that returns the count of all hits for a query:
GET /index/_count
{
// your search query here
"query": {
"match_all": {}
}
}
You can add "from" and "size" to page through specific hits in the response.
Example:
GET index/_search
{
"from": 0,
"size": 100,
"query": {
"match_all": {}
}
}
The search response from Elasticsearch also has a field, response['hits']['total']['value'], which holds the hit count, but it has its limitations: unless "track_total_hits" is set to true, the value is capped at 10,000 by default.
NOTE: the /_count API doesn't support "from" and "size"; it only gives you the total count.
For more details, see the Elasticsearch Count API documentation.
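The sketch below shows one way to build a search body with "track_total_hits" and to read the total back from a response. The helper names are assumptions, and the two response dictionaries are hand-written samples of the response shape, not output from a real cluster.

```python
def build_counted_search(query, size=10):
    """Search body that asks Elasticsearch to count all matching hits."""
    return {"track_total_hits": True, "size": size, "query": query}


def describe_total(response):
    """Turn hits.total into a readable string.

    A relation of "eq" means the value is exact; "gte" means it is only a
    lower bound because counting stopped at the tracking threshold.
    """
    total = response["hits"]["total"]
    prefix = "" if total["relation"] == "eq" else "at least "
    return f"{prefix}{total['value']} hits"


# Hand-written sample responses (assumed shapes, for illustration only).
capped = {"hits": {"total": {"value": 10000, "relation": "gte"}, "hits": []}}
exact = {"hits": {"total": {"value": 1000000, "relation": "eq"}, "hits": []}}

body = build_counted_search({"match_all": {}})
print(body)
print(describe_total(capped))  # at least 10000 hits
print(describe_total(exact))   # 1000000 hits
```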

ElasticSearch: Use Query to get single document ranking

I am trying to use ElasticSearch to compute a ranking. I'm not sure if this is possible and am trying to find out what my options might be. I need to run a query on all documents, sort them descending and then just return what number position in the list a specific record is located.
For example, I want to find out Julie's class ranking. I have a record for each student in Julie's grade containing their name and GPA, and I want to perform one query that tells me her rank within her grade.
I am hoping there is an ES guru out there that can help because otherwise I am going to need to run a regular query, get back max 10,000 records and figure it out from there.
This cannot be done in a single query.
First you need to get Julie's GPA, and then count the documents that have a GPA higher than hers. Her rank is that count plus one.
{
"query": {
"range": {
"gpa": {
"gt": 8 // Julie's GPA
}
}
},
"aggs": {
"count": {
"value_count": {
"field": "name.keyword" // counts documents where gpa is greater than 8
}
}
}
}
A better option is to store the rank in the document itself at indexing time.
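The two-step logic can be sketched in plain Python over an in-memory list, standing in for the two Elasticsearch requests. The student records and the `rank_of` helper are made up for illustration.

```python
def rank_of(students, name):
    """Rank = 1 + number of students with a strictly higher GPA."""
    # Step 1: look up the target student's GPA.
    target = next(s for s in students if s["name"] == name)
    # Step 2: count the students whose GPA is higher (the "gt" range query).
    higher = sum(1 for s in students if s["gpa"] > target["gpa"])
    return higher + 1


# Hypothetical sample data.
students = [
    {"name": "Julie", "gpa": 8.0},
    {"name": "Amit", "gpa": 9.1},
    {"name": "Chen", "gpa": 7.4},
    {"name": "Rosa", "gpa": 8.6},
]

print(rank_of(students, "Julie"))  # 3
```

With this definition, students with equal GPAs share the same rank.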

Elasticsearch query to remove an delete value from inconsistent array of comma-separated values

Recently, I posted the following about adding a string to existing (inconsistent) arrays in documents: ElasticSearch query to populate or append a value to a field
The marked solution is working perfectly.
But now I need to understand how to delete one of the 5-character codes from the arrays. Assuming I now need to delete the code 'ABCDE' from the documents, while leaving the other codes in the array untouched, what would that query look like?
The script below removes every occurrence of the given value from the array, leaving the other values untouched.
Please test it before running it on real data.
{
"script": {
"source": "ctx._source.customCategories.removeAll(Collections.singleton(params.catg))",
"lang": "painless",
"params": {
"catg": "c"
}
}
}
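To make the effect of removeAll(Collections.singleton(...)) concrete, here is a rough Python equivalent applied to in-memory documents. The sample documents and the `remove_code` helper are hypothetical. Unlike the Painless script, this sketch deliberately skips documents where the field is missing or is not a list.

```python
def remove_code(doc, field, code):
    """Drop every occurrence of `code` from the list stored in doc[field]."""
    values = doc.get(field)
    if isinstance(values, list):
        doc[field] = [v for v in values if v != code]
    return doc


# Hypothetical documents, including a duplicate code and a missing field.
docs = [
    {"customCategories": ["ABCDE", "FGHIJ", "ABCDE"]},
    {"customCategories": ["KLMNO"]},
    {},
]

cleaned = [remove_code(d, "customCategories", "ABCDE") for d in docs]
print(cleaned)
```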

nested count aggregations in elasticsearch

I have a type in Elasticsearch where each user can post any number of posts (the fields are "userid" and "post"). Now I need the count of users who posted 0 posts, 1 post, 2 posts, and so on. How do I do it? I think it needs some nested aggregations, but I don't know how to proceed. Thanks in advance!
The best way of doing this is to add a separate field that stores the number of posts.
Scripts are not very efficient (the values are re-evaluated each time a query executes), whereas a stored count is indexed properly, which makes queries and aggregations very fast.
Of course, you need to be sure you update this count each time you update the document.
You can use a script in an aggregation:
POST index_name/type_name/_search
{
"aggs": {
"group By Post Count": {
"terms": {
"script" : "doc['post'].size()"
}
}
}
}
Make sure you enable scripting.
Hope this helps.
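The shape of the desired result (number of users per post count) can be sketched in Python as a count of counts. This assumes one record per post with a "userid" field; the sample data and the `users_by_post_count` helper are made up. Users with zero posts have no post records, so they only appear if a separate list of all user IDs is supplied.

```python
from collections import Counter


def users_by_post_count(posts, all_user_ids=()):
    """Map each post count to the number of users with that many posts."""
    # First level: posts per user.
    per_user = Counter(p["userid"] for p in posts)
    # Users without any post record get an explicit count of zero.
    for uid in all_user_ids:
        per_user.setdefault(uid, 0)
    # Second level: how many users share each post count.
    return Counter(per_user.values())


# Hypothetical sample data.
posts = [
    {"userid": "u1", "post": "a"},
    {"userid": "u1", "post": "b"},
    {"userid": "u2", "post": "c"},
    {"userid": "u3", "post": "d"},
    {"userid": "u3", "post": "e"},
]

histogram = users_by_post_count(posts, all_user_ids=["u1", "u2", "u3", "u4"])
print(dict(histogram))
```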

To Select documents having same startDate and endDate

I have some documents, each with startDate and endDate date fields. I need all documents where both values are the same. I couldn't find a query that does this.
Elasticsearch supports script filters, which you can use in this case.
Something like this is what you need:
POST /<yourIndex>/<yourType>/_search
{
"query": {
"filtered": {
"filter": {
"script": {
"script": "doc['startDate'].value == doc['endDate'].value"
}
}
}
}
}
This can be achieved in two ways:
Index solution - While indexing, add an additional field called isDateSame and set it to true or false based on the values of startDate and endDate. Then you can easily query on that field. This is the most optimized solution.
Script solution - Elasticsearch keeps the indexed data in field data, which is more like a reverse of the reverse (inverted) index. Using a script, you can access any indexed field and do the comparison. This is pretty fast, but not as good as the first option. You can use the following query for the same.
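For the index solution, a minimal Python sketch of the enrichment step might look like the following. The `with_is_date_same` helper and the sample documents are assumptions; only the isDateSame field name comes from the answer above. The term query at the end is one plausible way to filter on the flag once it is indexed.

```python
from datetime import date


def with_is_date_same(doc):
    """Return a copy of the document with the isDateSame flag added."""
    enriched = dict(doc)
    enriched["isDateSame"] = doc["startDate"] == doc["endDate"]
    return enriched


# Hypothetical documents, as they would look just before indexing.
docs = [
    {"id": 1, "startDate": date(2015, 3, 1), "endDate": date(2015, 3, 1)},
    {"id": 2, "startDate": date(2015, 3, 1), "endDate": date(2015, 4, 1)},
]

indexed = [with_is_date_same(d) for d in docs]

# Search body that selects documents by the precomputed flag.
same_date_query = {"query": {"term": {"isDateSame": True}}}

print([d["id"] for d in indexed if d["isDateSame"]])  # [1]
```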
