Elastic App Search returns different total_results for different current_page

I am implementing lazy loading with Elastic App Search.
My initial request looks like this:
{
"query": "",
"page": {
"current": 1,
"size": 10
},
"sort": {
"editedat": "desc"
}
}
This correctly results in the following response:
{
"meta": {
"alerts": [],
"warnings": [],
"precision": 3,
"page": {
"current": 1,
"total_pages": 5,
"total_results": 41,
"size": 10
},
"engine": {
"name": "myengine",
"type": "default"
},
"request_id": "71805727-9c0a-496b-95a9-bb317345807c"
},
"results": [
{
// the 10 results
...
}
]
}
When my app then requests the second page, the response looks different.
Request:
{
"query": "",
"page": {
"current": 2,
"size": 10
},
"sort": {
"editedat": "desc"
}
}
Response:
{
"meta": {
"alerts": [],
"warnings": [],
"precision": 3,
"page": {
"current": 2,
"total_pages": 2,
"total_results": 18,
"size": 10
},
"engine": {
"name": "myengine",
"type": "default"
},
"request_id": "5d402099-e25d-41c9-af80-b961b78c5a94"
},
"results": [
{
// the 8 results
...
}
]
Now it suddenly reports two pages and 18 results in total, but my first request reported five pages with 41 items in total, which is the correct amount.
Am I missing something simple? Is this a bug? Do I have to take another approach?
Thanks for your help and experience.
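One observation that may help narrow this down: each response is internally consistent, since total_pages equals ceil(total_results / size) in both cases (41 results give 5 pages, 18 results give 2). So the second request appears to count a different result set rather than miscount pages. Below is a minimal Python sketch of a client-side guard that flags such a change while lazy loading. The helper name check_page_meta is hypothetical and not part of any App Search client.

```python
import math


def check_page_meta(meta_page, expected_total=None):
    """Validate the "meta.page" object of a search response.

    Returns total_results so it can be passed as expected_total
    when checking the next page.
    """
    total = meta_page["total_results"]
    size = meta_page["size"]
    # total_pages should always be ceil(total_results / size).
    if meta_page["total_pages"] != math.ceil(total / size):
        raise ValueError("total_pages does not match total_results")
    # While lazy loading, the total should stay stable between pages.
    if expected_total is not None and total != expected_total:
        raise ValueError(
            f"total_results changed from {expected_total} to {total}"
        )
    return total


# The two "meta.page" objects from the responses above.
first = {"current": 1, "total_pages": 5, "total_results": 41, "size": 10}
second = {"current": 2, "total_pages": 2, "total_results": 18, "size": 10}

total = check_page_meta(first)
try:
    check_page_meta(second, expected_total=total)
except ValueError as err:
    print(err)  # total_results changed from 41 to 18
```

Logging the request body next to each failing check would show whether the two requests really differ only in page.current.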

Related

Filter documents out of the facet count in enterprise search

We use enterprise search indexes to store items that can be tagged by multiple tenants.
For example:
[
{
"id": 1,
"name": "document 1",
"tags": [
{ "company_id": 1, "tag_id": 1, "tag_name": "bla" },
{ "company_id": 2, "tag_id": 1, "tag_name": "bla" }
]
}
]
I'm looking for a way to retrieve all documents with only the tags of company 1.
This request:
{
"query": "",
"facets": {
"tags": {
"type": "value"
}
},
"sort": {
"created": "desc"
},
"page": {
"size": 20,
"current": 1
}
}
comes back with:
...
"facets": {
"tags": [
{
"type": "value",
"data": [
{
"value": "{\"company_id\":1,\"tag_id\":1,\"tag_name\":\"bla\"}",
"count": 1
},
{
"value": "{\"company_id\":2,\"tag_id\":1,\"tag_name\":\"bla\"}",
"count": 1
}
]
}
],
}
...
Can I modify the request so that I get no tags with "company_id" = 2?
I have a solution that strips the extra data from the results after they are retrieved, but I'm looking for a better one.
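For reference, the post-processing workaround mentioned above could look roughly like the following Python sketch. It assumes the facet values are JSON-encoded strings, as in the response shown. The function name filter_facet_values is made up for illustration.

```python
import json


def filter_facet_values(facet_data, company_id):
    """Keep only facet entries whose JSON-encoded value belongs to company_id."""
    kept = []
    for entry in facet_data:
        tag = json.loads(entry["value"])  # each facet value is a JSON string
        if tag.get("company_id") == company_id:
            kept.append(entry)
    return kept


# The "data" array of the "tags" facet from the response above.
facet_data = [
    {"value": "{\"company_id\":1,\"tag_id\":1,\"tag_name\":\"bla\"}", "count": 1},
    {"value": "{\"company_id\":2,\"tag_id\":1,\"tag_name\":\"bla\"}", "count": 1},
]

company_1 = filter_facet_values(facet_data, company_id=1)
print(len(company_1))  # 1
```

This only trims the facet list on the client; the counts are still computed over all tenants' tags on the server side.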

ElasticSearch - Combine filters & Composite Query to get unique fields combinations

Well, I am quite a newbie with ES, and when it comes to aggregations there are no words in the dictionary to describe my level :p
Today I am trying to write a query that does something similar to a SQL DISTINCT, but on top of filters. Given this document (an abstraction of the real situation, of course):
{
"id": "1",
"createdAt": 1626783747,
"updatedAt": 1626783747,
"isAvailable": true,
"kind": "document",
"classification": {
"id": 1,
"name": "a_name_for_id_1"
},
"structure": {
"material": "cartoon",
"thickness": 5
},
"shared": true,
"objective": "stackoverflow"
}
Most of the data in the document above can vary, but some values repeat across documents, such as classification.id, kind and structure.material.
So, to fulfill my requirements, I would like to "group by" these 3 fields in order to get each unique combination. Going further, with the following data I should get the following possibilities:
[{
"id": "1",
"createdAt": 1626783747,
"updatedAt": 1626783747,
"isAvailable": true,
"kind": "document",
"classification": {
"id": 1,
"name": "a_name_for_id_1"
},
"structure": {
"material": "cartoon",
"thickness": 5
},
"shared": true,
"objective": "stackoverflow"
},
{
"id": "2",
"createdAt": 1626783747,
"updatedAt": 1626783747,
"isAvailable": true,
"kind": "document",
"classification": {
"id": 2,
"name": "a_name_for_id_2"
},
"structure": {
"material": "iron",
"thickness": 3
},
"shared": true,
"objective": "linkedin"
},
{
"id": "3",
"createdAt": 1626783747,
"updatedAt": 1626783747,
"isAvailable": false,
"kind": "document",
"classification": {
"id": 2,
"name": "a_name_for_id_2"
},
"structure": {
"material": "paper",
"thickness": 1
},
"shared": false,
"objective": "tiktok"
},
{
"id": "4",
"createdAt": 1626783747,
"updatedAt": 1626783747,
"isAvailable": true,
"kind": "document",
"classification": {
"id": 3,
"name": "a_name_for_id_3"
},
"structure": {
"material": "cartoon",
"thickness": 5
},
"shared": false,
"objective": "snapchat"
},
{
"id": "5",
"createdAt": 1626783747,
"updatedAt": 1626783747,
"isAvailable": true,
"kind": "document",
"classification": {
"id": 3,
"name": "a_name_for_id_3"
},
"structure": {
"material": "paper",
"thickness": 1
},
"shared": true,
"objective": "twitter"
},
{
"id": "6",
"createdAt": 1626783747,
"updatedAt": 1626783747,
"isAvailable": false,
"kind": "document",
"classification": {
"id": 3,
"name": "a_name_for_id_3"
},
"structure": {
"material": "iron",
"thickness": 3
},
"shared": true,
"objective": "facebook"
}
]
Based on the above, I should get the following results in the "buckets" (kind, classification.id, structure.material):
document 1 cartoon
document 2 iron
document 2 paper
document 3 cartoon
document 3 paper
document 3 iron
Of course, for the sake of this example (and to keep it simple), there are no duplicates yet.
However, on top of that, I need some "pre-filters", as I only want:
Documents that are available: isAvailable=true
Documents whose structure thickness is between 2 and 4 inclusive: 2 <= structure.thickness <= 4
Documents that are shared: shared=true
So, compared to the first set of results, I should then get only the following combinations:
document 1 cartoon -> not a valid result, thickness > 4
document 2 iron
document 2 paper -> not a valid result, isAvailable != true
document 3 cartoon -> not a valid result, thickness > 4
document 3 paper -> not a valid result, thickness < 2
document 3 iron -> not a valid result, isAvailable != true
If you're still reading, well.. thanks! xD
So, as you can see, I need every combination of the fields following the static pattern kind <> classification_id <> structure_material among the documents that match the filters on isAvailable, thickness and shared.
Regarding the output, the hits don't matter to me, as I don't need the documents but only the combinations kind <> classification_id <> structure_material :)
Thanks for any help :)
Max
You can go with a cardinality aggregation on top of your existing filters. Please check this URL and let me know if you have any questions.
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-cardinality-aggregation.html
Thanks to a colleague, I could finally get it working as expected!
QUERY
GET index-latest/_search
{
"size": 0,
"query": {
"bool": {
"filter": [
{
"term": {
"isAvailable": true
}
},
{
"range": {
"structure.thickness": {
"gte": 2,
"lte": 4
}
}
},
{
"term": {
"shared": true
}
}
]
}
},
"aggs": {
"my_agg_example": {
"composite": {
"size": 10,
"sources": [
{
"kind": {
"terms": {
"field": "kind.keyword",
"order": "asc"
}
}
},
{
"classification_id": {
"terms": {
"field": "classification.id",
"order": "asc"
}
}
},
{
"structure_material": {
"terms": {
"field": "structure.material.keyword",
"order": "asc"
}
}
}
]
}
}
}
}
The given result is then:
{
"took": 11,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": null,
"hits": []
},
"aggregations": {
"my_agg_example": {
"after_key": {
"kind": "document",
"classification_id": 2,
"structure_material": "iron"
},
"buckets": [
{
"key": {
"kind": "document",
"classification_id": 2,
"structure_material": "iron"
},
"doc_count": 1
}
]
}
}
}
So, as we can see, we get the following bucket:
{
"key": {
"kind": "document",
"classification_id": 2,
"structure_material": "iron"
},
"doc_count": 1
}
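The after_key in the response is what the composite aggregation uses for paging: sending it back as the "after" parameter of the composite aggregation asks for the buckets that follow. Here is a minimal Python sketch that builds the follow-up request body from the previous response. The helper name next_composite_body is hypothetical, and actually sending the request is left out.

```python
import copy


def next_composite_body(body, response, agg_name):
    """Return a copy of body that resumes the composite aggregation after
    the last bucket of response, or None when there is no after_key."""
    after_key = response["aggregations"][agg_name].get("after_key")
    if after_key is None:
        return None
    nxt = copy.deepcopy(body)  # leave the original request untouched
    nxt["aggs"][agg_name]["composite"]["after"] = after_key
    return nxt


# Trimmed versions of the request and response shown above.
body = {
    "size": 0,
    "aggs": {
        "my_agg_example": {
            "composite": {
                "size": 10,
                "sources": [
                    {"kind": {"terms": {"field": "kind.keyword"}}},
                    {"classification_id": {"terms": {"field": "classification.id"}}},
                    {"structure_material": {"terms": {"field": "structure.material.keyword"}}},
                ],
            }
        }
    },
}
response = {
    "aggregations": {
        "my_agg_example": {
            "after_key": {
                "kind": "document",
                "classification_id": 2,
                "structure_material": "iron",
            },
            "buckets": [],
        }
    }
}

next_body = next_composite_body(body, response, "my_agg_example")
print(next_body["aggs"]["my_agg_example"]["composite"]["after"])
```

Repeating this until the helper returns None (or the buckets come back empty) walks through all combinations, which matters once there are more than "size" of them.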
Note: Be careful with the type of your fields. Putting .keyword on classification.id resulted in no results in the buckets. .keyword should only be used on string fields (as far as I understand, correct me if I am wrong).
As expected, we have the following result (compared to the initial question):
document 2 iron
Note: Be careful, the order of the elements within the aggs.<name>.composite.sources does play a role in the returned results.
Thanks!
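As a sanity check of the expected output, the same filter-then-distinct logic can be reproduced locally on the six sample documents. This is only a plain Python sketch of what the query is meant to compute, not of how Elasticsearch executes it. The tuples keep just the fields that matter here.

```python
from collections import Counter

# (isAvailable, kind, classification.id, structure.material,
#  structure.thickness, shared) for sample documents 1 to 6.
docs = [
    (True, "document", 1, "cartoon", 5, True),
    (True, "document", 2, "iron", 3, True),
    (False, "document", 2, "paper", 1, False),
    (True, "document", 3, "cartoon", 5, False),
    (True, "document", 3, "paper", 1, True),
    (False, "document", 3, "iron", 3, True),
]

# Pre-filters: available, thickness within [2, 4], shared.
matching = [
    d for d in docs
    if d[0] and 2 <= d[4] <= 4 and d[5]
]

# "Group by" kind, classification.id and structure.material.
buckets = Counter((kind, cid, material) for _, kind, cid, material, _, _ in matching)

for key, doc_count in sorted(buckets.items()):
    print(key, doc_count)  # ('document', 2, 'iron') 1
```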

Calculate the counts of last snapshot of a record in ElasticSearch

I am storing snapshots of data in Elasticsearch. I want to run a count aggregation on the latest snapshot of each entry, in order to know what state my current (latest) data is in.
I have something like this:
[
{
"id": 2,
"state": "deleted",
"timestamp": "2019-11-20T18:18:09+00:00"
},
{
"id": 2,
"state": "published",
"timestamp": "2019-11-19T18:18:09+00:00"
},
{
"id": 3,
"state": "published",
"timestamp": "2019-10-17T18:18:09+00:00"
},
{
"id": 3,
"state": "draft",
"timestamp": "2019-10-16T18:18:09+00:00"
}
]
I tried this:
POST /snapshots/_search
{
"query": {
"match_all": {}
},
"size": 0,
"aggs": {
"2": {
"terms": {
"field": "state.keyword"
},
"aggs": {
"1": {
"top_hits": {
"size": 1,
"sort": [
{
"timestamp": {
"order": "desc"
}
}
]
}
}
}
}
}
}
But the problem is that it first creates a bucket per state and only then sorts and computes the top_hits within each bucket, so instead of
deleted = 1
published = 1
draft = 0
It returns
deleted = 1
published = 1
draft = 1
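To make the intended semantics explicit, here is a small Python sketch that computes the desired counts locally on the sample data: first keep the latest snapshot per id, then count states. It only illustrates the expected result and is not an Elasticsearch query. The function name latest_state_counts is made up for this example.

```python
from collections import Counter
from datetime import datetime

snapshots = [
    {"id": 2, "state": "deleted", "timestamp": "2019-11-20T18:18:09+00:00"},
    {"id": 2, "state": "published", "timestamp": "2019-11-19T18:18:09+00:00"},
    {"id": 3, "state": "published", "timestamp": "2019-10-17T18:18:09+00:00"},
    {"id": 3, "state": "draft", "timestamp": "2019-10-16T18:18:09+00:00"},
]


def latest_state_counts(snapshots, states):
    """Count the state of the most recent snapshot of each id."""
    latest = {}  # id -> (timestamp, state) of the newest snapshot seen
    for snap in snapshots:
        ts = datetime.fromisoformat(snap["timestamp"])
        if snap["id"] not in latest or ts > latest[snap["id"]][0]:
            latest[snap["id"]] = (ts, snap["state"])
    counts = Counter(state for _, state in latest.values())
    # Report every known state, including those with no latest snapshot.
    return {state: counts.get(state, 0) for state in states}


print(latest_state_counts(snapshots, ["deleted", "published", "draft"]))
# {'deleted': 1, 'published': 1, 'draft': 0}
```

The key point is the order of operations: grouping by id has to happen before grouping by state, which is the opposite of what the terms-then-top_hits query does.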

Storing JSON array string elasticsearch Bug

I am observing some strange behavior in Elasticsearch 5.2 and it's impossible to debug, as no errors are thrown and I can't find similar issues or documentation online.
I'm storing a JSON array as a "string" in Elasticsearch (using Python's json.dumps()); long story short, I have to do it this way. However, when I run a DSL query, only the JSON arrays (stored as a single string) containing 1 object are shown. If an array contains more than 1 object, no bucket is returned for it. I'm storing them in a field called "metadata".
I'm very confused as to why only a subset of the data is shown and other data (with more than 1 object in the JSON array) is ignored. The data is encoded as a string. I know for a fact that the data is stored in the index: I can see it in Kibana's Discover view, where large JSON strings with multiple objects show up.
Example 1 (JSON string w/ 1 object):
[{"score": 0.8829717636108398, "height": 0.875460147857666, "width":
0.3455989360809326, "y": 0.08105117082595825, "x": 0.5616265535354614, "note": "box1"}]
Example 2:
[{"score": 0.8829717636108398, "height": 0.875460147857666, "width":
0.3455989360809326, "y": 0.08105117082595825, "x": 0.5616265535354614, "note": "box1"}, {"score": 0.6821991136108398, "height":
0.875460147857666, "width": 0.3455989360809326, "y": 0.08105117082595825, "x": 0.5616265535354614, "note": "box2"}]
Here is my query:
{
"query": {
"bool": {
"must": [
{
"query_string": {
"analyze_wildcard": true,
"query": "*"
}
},
{
"range": {
"created_at": {
"gte": 1508012482796,
"lte": 1508014282797,
"format": "epoch_millis"
}
}
}
],
"must_not": []
}
},
"size": 0,
"_source": {
"excludes": []
},
"aggs": {
"5": {
"terms": {
"field": "metadata.keyword",
"size": 31,
"order": {
"_count": "desc"
}
}
}
}
}
This query only returns strings with 1 object. See below:
{
"took": 4,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 4214,
"max_score": 0,
"hits": []
},
"aggregations": {
"5": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 35,
"buckets": [
{
"key": "[]",
"doc_count": 102
},
{
"key": "{}",
"doc_count": 8
},
{
"key": "[{\"score\": 0.9015679955482483, \"height\": 0.8632315695285797, \"width\": 0.343660831451416, \"y\": 0.08102986216545105, \"x\": 0.5559845566749573, \"note\": \"box11\"}]",
"doc_count": 6
},
{
"key": "[{\"score\": 0.6365205645561218, \"height\": 0.9410756528377533, \"width\": 0.97696852684021, \"y\": 0.04701271653175354, \"x\": 0.013666868209838867, \"note\": \"box17\"}]",
"doc_count": 4
},
...
}
As observed, only JSON strings with 1 object (i.e. [{..}]) are returned/visible. Strings with multiple objects (i.e. [{...},{...}]) are completely ignored.
More clarifications:
It's using the default mappings.
I am able to get the JSON string (regardless of the number of objects) when querying by document id, or using "match" on exact field values.
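If you're using the default mapping, this is most probably because the keyword sub-field that dynamic mapping creates for strings has an ignore_above: 256 setting, so longer strings are not indexed in metadata.keyword. The mapping looks like this:
{
"mappings": {
"my_type": {
"properties": {
"metadata": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}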
}
}
}
}
You can increase that limit in order to index your JSON strings longer than 256 characters. Documents that are already indexed have to be reindexed to pick up the new limit.
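A quick way to check this explanation against the two examples from the question is to compare their lengths with the limit. The Python sketch below rebuilds both strings with json.dumps(), as the question describes. The constant name IGNORE_ABOVE is only for illustration.

```python
import json

IGNORE_ABOVE = 256  # limit on the keyword field in the mapping above

box = {
    "score": 0.8829717636108398,
    "height": 0.875460147857666,
    "width": 0.3455989360809326,
    "y": 0.08105117082595825,
    "x": 0.5616265535354614,
    "note": "box1",
}

# Example 1: one object. Example 2: two objects.
one = json.dumps([box])
two = json.dumps([box, dict(box, score=0.6821991136108398, note="box2")])

for label, value in (("1 object", one), ("2 objects", two)):
    # ignore_above is compared against the number of characters.
    status = "indexed" if len(value) <= IGNORE_ABOVE else "ignored"
    print(label, len(value), status)
```

The single-object string stays under 256 characters while the two-object string exceeds it, which matches the buckets that show up in the terms aggregation.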

Elasticsearch - Show index-wide count for each returned result based from a given term

Firstly, I apologise if the terminology I use is incorrect; I am learning Elasticsearch day by day and may use the wrong phrases.
After spending several days trying to figure this out and pulling my hair out, I seem to be hitting a brick wall every time.
I am trying to get Elasticsearch to provide a document count for each returned result. I will provide an example below.
{
"suggest": {
"text": "aberdeen",
"city": {
"completion": {
"field": "city_suggest",
"size": "2"
}
},
"street": {
"completion": {
"field": "street_suggest",
"size": "2"
}
}
},
"size": 0,
"aggs": {
"meta": {
"filter": {
"term": {
"city.raw": "aberdeen"
}
},
"aggs": {
"name": {
"terms": {
"field": "city.raw"
}
}
}
}
}
}
The above query returns the following results:
{
"took": 37,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 1870535,
"max_score": 0,
"hits": []
},
"aggregations": {
"meta": {
"doc_count": 119196,
"name": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Aberdeen",
"doc_count": 119196
}
]
}
}
},
"suggest": {
"city": [
{
"text": "Aberdeen",
"offset": 0,
"length": 8,
"options": [
{
"text": "Aberdeen",
"score": 100
}
]
}
],
"street": [
{
"text": "Aberdeen",
"offset": 0,
"length": 8,
"options": [
{
"text": "Davidson House, Aberdeen, AB15",
"score": 80
},
{
"text": "Bruce House, Aberdeen, AB15",
"score": 80
}
]
}
]
}
}
The result I am trying to achieve is an overall document count for each returned result. For example, the returned street address "Davidson House, Aberdeen, AB15" would show how many documents in the index match that address, and the same would be repeated for each result, including the city, similar to how the aggregated city currently shows the overall count:
{
"key": "Aberdeen",
"doc_count": 119196
}
Here is an example of something similar in production
The problem I believe I face with aggregations is that I do not know in advance which values will be returned; otherwise I could predefine them in aggregations, as I did for the city, and request the overall count of each result that way.
To give an overall example, this is how I pictured a possible working result:
"suggest": {
"city": [
{
"text": "Aberdeen",
"offset": 0,
"length": 8,
"options": [
{
"text": "Aberdeen",
"score": 100,
"total_addresses": 196152
}
]
}
],
"street": [
{
"text": "Aberdeen",
"offset": 0,
"length": 8,
"options": [
{
"text": "Davidson House, Aberdeen, AB15",
"score": 80,
"total_addresses": 158
},
{
"text": "Bruce House, Aberdeen, AB15",
"score": 80,
"total_addresses": 30
}
]
}
]
}
In terms of the Elasticsearch version, I have two dev servers running Elasticsearch 2.3 and 5.5 to see whether the newer version would make a difference. Unfortunately it did not, so I have been using 2.3 rather than 5.5.
Any help or advice would be greatly appreciated. Thanks all.
You need to split your query in two. First use the suggest API to gather suggestions, then run the aggregation on the results. The drawback of this solution is that you pair a crazy fast suggestion (less than a millisecond, if you're lucky) with a longer-running aggregation. If that's OK for you, this might be a good approach.
Another idea might be to have a separate suggestion index with pre-aggregated data that contains such a count; this index gets recreated regularly in the background.
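A rough Python sketch of the first approach: take the "suggest" part of the first response and build the body of a second request that counts documents per suggestion with a filters aggregation. The field name street.raw is an assumption (the question only shows city.raw), and the helper name build_count_request is hypothetical. The term values must also match how those fields are actually indexed.

```python
def build_count_request(suggest_response, fields):
    """Build a search body with one named filter per suggestion option.

    fields maps a suggester name to the exact-match field to count on.
    """
    named_filters = {}
    for suggester, field in fields.items():
        for entry in suggest_response.get(suggester, []):
            for option in entry["options"]:
                text = option["text"]
                named_filters[text] = {"term": {field: text}}
    return {
        "size": 0,
        "aggs": {"counts": {"filters": {"filters": named_filters}}},
    }


# The "suggest" section of the response shown in the question (trimmed).
suggest_response = {
    "city": [{"text": "Aberdeen", "options": [{"text": "Aberdeen", "score": 100}]}],
    "street": [
        {
            "text": "Aberdeen",
            "options": [
                {"text": "Davidson House, Aberdeen, AB15", "score": 80},
                {"text": "Bruce House, Aberdeen, AB15", "score": 80},
            ],
        }
    ],
}

body = build_count_request(
    suggest_response, {"city": "city.raw", "street": "street.raw"}
)
print(sorted(body["aggs"]["counts"]["filters"]["filters"]))
```

In the response to that second request, each named bucket under "counts" carries a doc_count that can be merged back into the matching suggestion option as total_addresses.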
