Elasticsearch multiple indices suggestions - elasticsearch

I have the following problem. This is my implementation of a "did you mean" query. If I use only one index, the results fit perfectly. If I use multiple indices, I don't get any results.
Does this query only work for single indices?
GET index1/_search
{
"suggest": {
"text": "exmple",
"multi_phrase": {
"phrase": {
"field": "all",
"size": 5,
"gram_size": 3,
"collate": {
"query": {
"source": {
"bool": {
"must": [
{
"match_all": {}
}
],
"filter": {
"multi_match": {
"query": "{{suggestion}}",
"type": "cross_fields",
"fields": [
"name",
"name2"
],
"operator": "AND",
"lenient": true
}
}
}
}
},
"params": {
"field_name": "all"
}
}
}
}
}
}
If I run this query against a single index, everything works fine. If I use multiple indices, the results are empty.
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 2,
"successful": 2,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 0,
"max_score": 0,
"hits": []
},
"suggest": {
"multi_phrase": [
{
"text": "example",
"offset": 0,
"length": 9,
"options": []
}
]
}
}

I found the solution on my own: I have to use the confidence parameter.
The confidence level defines a factor applied to the input phrases
score which is used as a threshold for other suggest candidates. Only
candidates that score higher than the threshold will be included in
the result. For instance a confidence level of 1.0 will only return
suggestions that score higher than the input phrase. If set to 0.0 the
top N candidates are returned. The default is 1.0.
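As a rough illustration of the fix described above, here is a minimal Python sketch that builds the phrase suggester body with confidence set explicitly. The helper name build_phrase_suggest is hypothetical, the collate block from the question is left out for brevity, and the value 0.0 is only an assumption taken from the quoted description (the answer does not say which value was used).

```python
import json


def build_phrase_suggest(text, field="all", confidence=0.0, size=5, gram_size=3):
    """Build a phrase-suggester request body as a plain dict.

    Hypothetical helper: field names mirror the question; confidence=0.0
    is an assumed value meaning "return the top N candidates".
    """
    return {
        "suggest": {
            "text": text,
            "multi_phrase": {
                "phrase": {
                    "field": field,
                    "size": size,
                    "gram_size": gram_size,
                    # Threshold factor applied to the input phrase's score.
                    "confidence": confidence,
                }
            },
        }
    }


body = build_phrase_suggest("exmple")
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of GET index1,index2/_search; lowering confidence only relaxes the score threshold, so the candidates may still need filtering.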

Related

Elasticsearch aggregation shows incorrect total

Elasticsearch version is 7.4.2
I suck at Elasticsearch and I'm trying to figure out what's wrong with this query.
{
"size": 10,
"from": 0,
"query": {
"bool": {
"must": [
{
"exists": {
"field": "firstName"
}
},
{
"query_string": {
"query": "*",
"fields": [
"params.display",
"params.description",
"params.name",
"lastName"
]
}
},
{
"match": {
"status": "DONE"
}
}
],
"filter": [
{
"term": {
"success": true
}
}
]
}
},
"sort": {
"createDate": "desc"
},
"collapse": {
"field": "lastName.keyword",
"inner_hits": {
"name": "lastChange",
"size": 1,
"sort": [
{
"createDate": "desc"
}
]
}
},
"aggs": {
"total": {
"cardinality": {
"field": "lastName.keyword"
}
}
}
}
It returns:
"aggregations": {
"total": {
"value": 429896
}
}
So ~430k results, but in pagination we stop getting results around the 426k mark. Meaning, when I run the query with
{
"size": 10,
"from": 427000,
...
}
I get:
{
"took": 2215,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 10000,
"relation": "gte"
},
"max_score": null,
"hits": []
},
"aggregations": {
"total": {
"value": 429896
}
}
}
But if I change from to be 426000 I still get results.
You are comparing the cardinality aggregation value of your field lastName.keyword with the total number of documents in the index, which are two different things.
You can check the total number of documents in your index using the count API. The from/size parameters are defined at the query level, i.e. they control which of the documents matching your search query are returned. Because you have not set track_total_hits, the total shows 10000 with relation gte, which means there are at least 10k documents matching your search query.
As for your aggregation, it returns 429896 in both cases because it does not depend on the from/size you specify for your query.
I was surprised to find out that the cardinality aggregation has a precision control (precision_threshold). Setting it to the maximum value was the solution for me.
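A minimal sketch of what that could look like, assuming the request is built as a Python dict. The helper name with_precise_total is hypothetical; 40000 is the documented maximum for precision_threshold, and track_total_hits is added only to get an exact hit count instead of the capped 10000.

```python
import json


def with_precise_total(query, field="lastName.keyword", precision_threshold=40000):
    """Wrap a query with an exact hit count and a high-precision cardinality agg.

    Hypothetical helper; the field name comes from the question.
    """
    return {
        "size": 10,
        "from": 0,
        # Ask for the exact number of matching documents.
        "track_total_hits": True,
        "query": query,
        "aggs": {
            "total": {
                "cardinality": {
                    "field": field,
                    # Counts below this threshold are expected to be close to exact.
                    "precision_threshold": precision_threshold,
                }
            }
        },
    }


request = with_precise_total({"term": {"success": True}})
print(json.dumps(request, indent=2))
```

The cardinality aggregation is still an approximation, so for ~430k distinct values a small error can remain even at the maximum threshold.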

Value does not exist

I am familiar with checking if a field exists using the exists query. I am wondering if there is a way to check if a value does not exist instead; something like this:
GET /_search
{
"query": {
"bool": {
"must_not": {
"exists": {
"field": "user",
"value": "id"
}
}
}
}
}
Update:
I want to add that it is a compound query so counting the result will not work.
If you want to check whether a particular field value exists, you can simply use a match query. There is no need to use an exists query with a must_not clause.
If a document matching the field value is in your index, it will be counted in the search result. hits.total.value will give you the count of matching documents.
Adding a working example
Index Data:
{
"user": "abc"
}
Search Query:
{
"size":0,
"query": {
"match": {
"user": "abc"
}
}
}
Search Result:
"hits": {
"total": {
"value": 1, // note this
"relation": "eq"
},
"max_score": null,
"hits": []
}
Search Query:
{
"size":0,
"query": {
"match": {
"user": "def"
}
}
}
Search Result:
"hits": {
"total": {
"value": 0, // note this
"relation": "eq"
},
"max_score": null,
"hits": []
}
Another option is to use the count API:
GET /_count
{
"query": {
"match": {
"user": "def"
}
}
}
Search Result:
{
"count": 0, // note this
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
}
}
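Since the question mentions a compound query, one possible way to fold the check into it is a must_not clause around a match query. This is only a sketch under the assumption that the goal is to exclude documents whose user field matches a given value; the helper name exclude_value is hypothetical.

```python
import json


def exclude_value(field, value, must=None):
    """Build a bool query that drops documents where `field` matches `value`.

    Hypothetical helper; `must` holds the other clauses of the compound query.
    """
    return {
        "query": {
            "bool": {
                # Other clauses of the compound query; match_all if none given.
                "must": must if must is not None else [{"match_all": {}}],
                # Documents matching this clause are excluded from the result.
                "must_not": [{"match": {field: value}}],
            }
        }
    }


query = exclude_value("user", "id")
print(json.dumps(query, indent=2))
```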

Elastic Search: Exact phrase to have higher score + keep analysis

I am using a simple multi_match query on a specific type of an index.
As per the below screenshot, when the search query is Hyderabad, In or even Hyderabad, Ind, the intended result appears with a lower score. In this case, the intended result is Hyderabad, India (INHYD)
EDIT: Below is my updated query:
{
"query": {
"bool": {
"should": [
{
"match": {
"_all": {
"query": "Hyderabad, In",
"fuzziness": 4,
"prefix_length": 2,
"operator": "and"
}
}
}
]
}
}
}
Below is the full ElasticSearch trace:
Elasticsearch TRACE: 2016-08-17T07:33:21Z
-> POST http://192.168.99.100:9200/shipwaves/ports/_search?size=10&from=0&default_operator=AND
{
"query": {
"bool": {
"should": [
{
"match": {
"_all": {
"query": "Hyderabad, In",
"fuzziness": 4,
"prefix_length": 2,
"operator": "and"
}
}
}
]
}
}
}
<- 200
{
"took": 8,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 0,
"max_score": null,
"hits": []
}
}
How can I tweak the query to give me the intended result a higher score?
Note: I have learnt that using not_analyzed index will make this work, but I do want analysis to be done so phrases like "Hydrabad" (a letter missing) are matched. So I am keeping the analysis to simple.
Elastic Search v2.3.5

ElasticSearch - Average aggregation/sort over multivalued non-unique numeric fields

I am trying to handle sorting over the average of a multivalued field called 'rating_average'. In the example I'm giving you, the values for this field are [1, 2, 2]. I'm expecting the average to be (1+2+2)/3 = 1.66666667. In reality I'm getting 1.5 as the average.
After a few tests and analyzing extended stats, I've discovered that this happens because the average is calculated over the unique values only. So statistical operators are applied over the set [1, 2] instead of [1, 2, 2]. I've also confirmed this by adding an aggregations section to my query to double check that the average calculated for the sort block is identical to the one in the stats aggregation.
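The arithmetic behind the two averages can be checked with a few lines of Python. This is plain arithmetic on the values from the question, not a reproduction of how Elasticsearch computes them internally.

```python
ratings = [1, 2, 2]

# Expected: average over all values, duplicates included.
avg_all = sum(ratings) / len(ratings)

# Observed: average over the distinct values only.
unique = sorted(set(ratings))
avg_unique = sum(unique) / len(unique)

# The same distinct values also explain the other extended_stats figures.
count = len(unique)
total = sum(unique)
sum_of_squares = sum(v * v for v in unique)
variance = sum_of_squares / count - avg_unique ** 2

print(avg_all, avg_unique, count, total, sum_of_squares, variance)
```

With duplicates dropped, count is 2, the sum is 3, the sum of squares is 5 and the variance is 0.25, which matches the rating_stats block in the response.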
An example document is the following:
{
"_source": {
"content_uri": "http://data.semint.co.uk/resource/testContent1",
"rating_average": [
"1",
"2",
"2"
],
"forDesk": "http://data.semint.co.uk/resource/kMFMJd1rtKD"
}
}
The query I'm performing is the following:
{
"from": 0,
"size": 20,
"aggs": {
"rating_stats": {
"extended_stats": {
"field": "rating_average"
}
}
},
"query": {
"filtered": {
"filter": {
"bool": {
"must": [
{
"terms": {
"mediaType": [
"http://data.semint.co.uk/resource/testMediaType3"
],
"execution": "and"
}
}
]
}
}
}
},
"fields": [ "content_uri", "rating_average"],
"sort": [
{
"rating_average": {
"order": "desc",
"mode": "avg"
}
}
]
}
And these are the results I get from executing the query over the document aforementioned.
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 1,
"max_score": null,
"hits": [
{
"_index": "travel_content6",
"_type": "semantic-index",
"_id": "http://data.semint.co.uk/resource/testContent1",
"_score": null,
"fields": {
"content_uri": [
"http://data.semint.co.uk/resource/testContent1"
],
"rating_average": [1, 2, 2]
},
"sort": [
1.5
]
}
]
},
"aggregations": {
"rating_stats": {
"count": 2,
"min": 1,
"max": 2,
"avg": 1.5,
"sum": 3,
"sum_of_squares": 5,
"variance": 0.25,
"std_deviation": 0.5,
"std_deviation_bounds": {
"upper": 2.5,
"lower": 0.5
}
}
}
}

Elastic Search fulltext search query and filters

I wanna perform a full-text search, but I also wanna use one or many possible filters. The simplified structure of my document, when searching with /things/_search?q=*foo*:
{
"took": 5,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 1,
"hits": [
{
"_index": "things",
"_type": "thing",
"_id": "63",
"_score": 1,
"fields": {
"name": [
"foo bar"
],
"description": [
"this is my description"
],
"type": [
"inanimate"
]
}
}
]
}
}
This works well enough, but how do I combine filters with a query? Let's say I wanna search for "foo" in an index with multiple documents, but I only want to get those with type == "inanimate"?
This is my attempt so far:
{
"query": {
"filtered": {
"query": {
"query_string": {
"query": "*foo*"
}
},
"filter": {
"bool": {
"must": {
"term": { "type": "inanimate" }
}
}
}
}
}
}
When I remove the filter part, it returns an accurate set of document hits. But with this filter-definition it does not return anything, even though I can manually verify that there are documents with type == "inanimate".
Since you have not defined an explicit mapping, the type field is analyzed, while a term query looks for an exact match. You need to add "index": "not_analyzed" to the type field, and then your query will work.
This will give you the correct documents:
{
"query": {
"match": {
"type": "inanimate"
}
}
}
But this is not the solution; you need to define an explicit mapping, as I said.
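For illustration, the explicit mapping could look roughly like the sketch below, written as Python dicts. It assumes a legacy Elasticsearch version (1.x/2.x), where the string type with "index": "not_analyzed" and the filtered query are available; the type name thing is taken from the response in the question, and the variable names are hypothetical.

```python
import json

# Index-creation body: keep "type" as a single unanalyzed token
# so that a term filter can match it exactly.
create_index_body = {
    "mappings": {
        "thing": {
            "properties": {
                "type": {"type": "string", "index": "not_analyzed"}
            }
        }
    }
}

# The original query from the question, unchanged: full-text part plus term filter.
search_body = {
    "query": {
        "filtered": {
            "query": {"query_string": {"query": "*foo*"}},
            "filter": {"bool": {"must": {"term": {"type": "inanimate"}}}},
        }
    }
}

print(json.dumps(create_index_body, indent=2))
print(json.dumps(search_body, indent=2))
```

Existing documents have to be reindexed after the mapping change for the term filter to see the unanalyzed values.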
