Elastic ngram prioritise whole words - elasticsearch

I am trying to build an autocomplete with several million possible values. I have managed to do it with two different methods: match and ngram. The problem is that match requires the user to type whole words, while ngram returns poor results. Is there a way to return ngram results only if there are no match results?
Method 1: match
Returns very relevant results but requires user to type a full word
//mapping
analyzer: {
std_english: {
type: 'standard',
stopwords: '_english_',
},
}
//search
query: {
bool: {
must: [
{ term: { semanticTag: type } },
{ match: { search } }
]}
}
Method 2: ngram
Returns poor matches
//mapping
analysis: {
filter: {
autocomplete_filter: {
type: 'edge_ngram',
min_gram: 1,
max_gram: 20,
},
},
analyzer: {
autocomplete: {
type: 'custom',
tokenizer: 'standard',
filter: ['lowercase', 'autocomplete_filter'],
},
},
}
//search
query: {
bool: {
must: [
{ term: { semanticTag: type } },
{ match: {
term: {
query: search,
operator: 'and',
}
}
}
]}
}

Try changing query to something like this -
{
"query": {
"bool": {
"must": [
{
"term": {
"semanticTag": "type"
}
},
{
"match_phrase_prefix": {
"fieldName": {
"query": "valueToSearch"
}
}
}
]
}
}
}
You can use match_phrase_prefix. With it the user does not need to type a whole word: the last term typed is treated as a prefix, so any document whose indexed field contains a word starting with what the user typed is returned.
Note that this also matches words in the middle of the indexed text, not only the first word.
For example, if the data indexed in a field is "lorem ipsum" and the user types "ips", you will get that document along with any other documents containing a word that starts with "ips".
You can use either the standard analyzer or a custom one; check which better suits your use case. Going by the information in the question, the approach above works well with the standard analyzer.
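As a minimal sketch of the query above, the request body can be built from the user's input as a plain dictionary. The function name `build_autocomplete_query` is hypothetical, and `fieldName` is the placeholder field from the answer, not a real mapping; sending the body to a cluster is left out.

```python
import json


def build_autocomplete_query(semantic_tag, text, field="fieldName"):
    """Build a search body that filters on semanticTag and prefix-matches `text`.

    `field` defaults to the placeholder name used in the answer.
    """
    return {
        "query": {
            "bool": {
                "must": [
                    {"term": {"semanticTag": semantic_tag}},
                    {"match_phrase_prefix": {field: {"query": text}}},
                ]
            }
        }
    }


# The last word ("ips") is the part treated as a prefix.
body = build_autocomplete_query("type", "lorem ips")
print(json.dumps(body, indent=2))
```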

Related

Elasticsearch return only results that match array of ids

Is it possible to use elastic search to query only within a set of roomIds?
I tried using bool and should:
query: {
bool: {
must: [
{
multi_match: {
operator: 'and',
query: keyword,
fields: ['content'],
type: 'most_fields'
}
},
{ term: { users: caller } },
{
bool: {
should:
term: {
room: [list of roomIds]
}
}
}
]
}
},
It works but when I have more than 1k roomIds I get "search_phase_execution_exception".
Is there a better way to do this? Thanks.
To match against an array of values, use a terms query instead of term:
query: {
bool: {
must: [
{
multi_match: {
operator: 'and',
query: keyword,
fields: ['content'],
type: 'most_fields'
}
},
{ term: { users: caller } },
{
bool: {
should: [
{ terms: { room: roomIds } } // roomIds: the array of room IDs
]
}
}
]
}
},
From the documentation:
By default, Elasticsearch limits the terms query to a maximum of
65,536 terms. This includes terms fetched using terms lookup. You can
change this limit using the index.max_terms_count setting.
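A rough sketch of how the request body could be assembled with a single terms clause, with a guard against the default limit quoted above. The helper name `build_room_query` and the sample IDs are made up for illustration; the field names come from the question.

```python
MAX_TERMS_COUNT = 65536  # default index.max_terms_count, per the quoted documentation


def build_room_query(keyword, caller, room_ids, max_terms=MAX_TERMS_COUNT):
    """Build the search body with one `terms` clause covering all room IDs."""
    room_ids = list(dict.fromkeys(room_ids))  # drop duplicates, keep order
    if len(room_ids) > max_terms:
        raise ValueError(
            f"{len(room_ids)} room IDs exceed the terms limit of {max_terms}"
        )
    return {
        "query": {
            "bool": {
                "must": [
                    {
                        "multi_match": {
                            "operator": "and",
                            "query": keyword,
                            "fields": ["content"],
                            "type": "most_fields",
                        }
                    },
                    {"term": {"users": caller}},
                    {"terms": {"room": room_ids}},
                ]
            }
        }
    }


# 2000 hypothetical room IDs fit in a single terms clause.
body = build_room_query("hello", "user-1", [f"room-{i}" for i in range(2000)])
print(len(body["query"]["bool"]["must"][2]["terms"]["room"]))
```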

Elastic search query using match_phrase_prefix and fuzziness at the same time?

I am new to elastic search, so I am struggling a bit to find the optimal query for our data.
Imagine I want to match the following word "Handelsstandens Boldklub".
Currently, I'm using the following query:
{
query: {
bool: {
should: [
{
match: {
name: {
query: query, slop: 5, type: "phrase_prefix"
}
}
},
{
match: {
name: {
query: query,
fuzziness: "AUTO",
operator: "and"
}
}
}
]
}
}
}
It currently lists the word if I search for "Hand", but if I search for "Handle" the word is no longer listed because of the typo. However, if I type all the way to "Handlesstandens" it is listed again, as the fuzziness catches the typo, but only once I have typed the whole word.
Is it somehow possible to do phrase_prefix and fuzziness at the same time? So in the above case, if I make a typo on the way, it will still list the word?
So in this case, if I search for "Handle", it will still match the word "Handelsstandens Boldklub".
Or what other workarounds are there to achieve the above experience? I like the phrase_prefix matching as it also supports sloppy matching (hence I can search for "Boldklub han" and it will list the result).
Or can the above be achieved by using the completion suggester?
Okay, so after investigating Elasticsearch further, I came to the conclusion that I should use ngrams.
Here is a really good explanation of what they do and how they work:
https://qbox.io/blog/an-introduction-to-ngrams-in-elasticsearch
Here are the settings and mapping I used (this is elasticsearch-rails syntax):
settings analysis: {
filter: {
ngram_filter: {
type: "ngram",
min_gram: "2",
max_gram: "20"
}
},
analyzer: {
ngram_analyzer: {
type: "custom",
tokenizer: "standard",
filter: ["lowercase", "ngram_filter"]
}
}
} do
mappings do
indexes :name, type: "string", analyzer: "ngram_analyzer"
indexes :country_id, type: "integer"
end
end
And the query (this query actually searches two different indexes at the same time):
{
query: {
bool: {
should: [
{
bool: {
must: [
{ match: { "club.country_id": country.id } },
{ match: { name: query } }
]
}
},
{
bool: {
must: [
{ match: { country_id: country.id } },
{ match: { name: query } }
]
}
}
],
minimum_should_match: 1
}
}
}
But basically you should just do a match or multi_match query, depending on how many fields you want to search in.
I hope someone finds this helpful, as I was personally thinking too much in terms of fuzziness instead of ngrams (which I didn't know about before). This led me in the wrong direction.
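To see why ngrams tolerate the typo, here is a small pure-Python sketch that mimics an ngram filter with min_gram 2 and max_gram 20 on a single lowercased token. It is an approximation for illustration, not Elasticsearch's implementation, and it assumes the same analyzer is applied to the search text.

```python
def ngrams(token, min_gram=2, max_gram=20):
    """Return every substring of `token` with length min_gram..max_gram, lowercased."""
    token = token.lower()
    return {
        token[i:i + n]
        for n in range(min_gram, max_gram + 1)
        for i in range(len(token) - n + 1)
    }


indexed = ngrams("Handelsstandens")  # grams stored for the indexed name
typed = ngrams("Handle")             # grams produced from the misspelled input

# Grams the typo still shares with the indexed word, shortest first.
shared = sorted(typed & indexed, key=lambda g: (len(g), g))
print(shared)  # ['an', 'ha', 'nd', 'and', 'han', 'hand']
```

Because a match query only needs some of these grams to overlap, "Handle" can still hit "Handelsstandens". The same overlap is also why very small min_gram values tend to return loosely related results.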

Elastic search returning wrong results

I am running a query against elastic search but the results returned are wrong. The idea is that I can check against a range of fields with individual queries. But when I pass the following query, items which don't have the included lineup are returned.
query: {
bool: {
must: [
{match:{"lineup.name":{query:"The 1975"}}}
]
}
}
The objects are events, which look like this:
{
title: 'Glastonbury',
country: 'UK',
lineup: [
{
name: 'The 1975',
genre: 'Indie',
headliner: false
}
]
},
{
title: 'Reading',
country: 'UK',
lineup: [
{
name: 'The Strokes',
genre: 'Indie',
headliner: true
}
]
}
In my case both of these events are returned.
The mapping can be seen here:
https://jsonblob.com/567e8f10e4b01190df45bb29
You need to use a match_phrase query. A match query looks for either "The" or "1975"; it finds "The" in "The Strokes" and therefore returns that event as well.
Try:
{
"query": {
"bool": {
"must": [
{
"match": {
"lineup.name": {
"query": "The 1975",
"type": "phrase"
}
}
}
]
}
}
}
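The difference can be illustrated with a toy simulation in plain Python. It assumes simple lowercase whitespace tokenization and the default OR operator for match; real analyzers and scoring are more involved, so treat this as a sketch of the idea only.

```python
def tokens(text):
    """Very rough stand-in for an analyzer: lowercase and split on whitespace."""
    return text.lower().split()


def match_any(query, value):
    """Like a `match` query with the default OR operator: any shared token is a hit."""
    return bool(set(tokens(query)) & set(tokens(value)))


def match_phrase(query, value):
    """Like a phrase match: the query tokens must appear consecutively and in order."""
    q, v = tokens(query), tokens(value)
    return any(v[i:i + len(q)] == q for i in range(len(v) - len(q) + 1))


names = ["The 1975", "The Strokes"]
any_hits = [n for n in names if match_any("The 1975", n)]
phrase_hits = [n for n in names if match_phrase("The 1975", n)]
print(any_hits)     # both names, because each contains "the"
print(phrase_hits)  # only the exact phrase
```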

Elasticsearch match nested field against array of values

I'm trying to apply a terms query on a nested field using mongoid-elasticsearch and ElasticSearch 2.0. This has been quite frustrating, since trial and error didn't pay off much and the docs on the subject are rather sparse.
Here is my query:
{
"query": {
"nested": {
"path": "awards",
"query": {
"bool": {
"must": [
{ "match": { "awards.year": "2010"}}
]
}
}
},
"nested":{
"path": "procuring_entity",
"query": {
"bool": {
"must": [
{ "terms": { "procuring_entity.country": ["ES", "PL"]}}
]
}
}
}
}
}
While "match" and "term" work just fine, when combined with the "terms" query it returns no results, even though it should. My mappings look like this:
elasticsearch!({
prefix_name: false,
index_name: 'documents',
index_options: {
mappings: {
document: {
properties: {
procuring_entity: {
type: "nested"
},
awards: {
type: "nested"
}
}
}
}
},
wrapper: :load
})
If "nested" doesn't count as an analyzer (which, as far as I know, it doesn't), then there's no problem with that. As for the second example, I don't think it's the case, since the array of values that it's matched against comes from an external source.
Is terms query possible on nested fields? Am I doing something wrong?
Is there any other way to match a nested field against multiple values?
Any thoughts would be much appreciated.
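I think you would need to change the mappings for your nested types for this. The terms query does not analyze its input, so it only matches values exactly as they are stored in the index; an analyzed string field would typically index "ES" as the lowercased token "es", which is why the query effectively needs a not_analyzed field. If you update your mapping to something like: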
elasticsearch!({
prefix_name: false,
index_name: 'documents',
index_options: {
mappings: {
document: {
properties: {
procuring_entity: {
type: 'nested',
properties: {
country: {
'type': 'string',
'index': 'not_analyzed'
}
}
},
awards: {
type: 'nested'
}
}
}
}
},
wrapper: :load
})
I think the query should work if you do that.
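A toy model of the mismatch, in plain Python: an analyzed field is approximated by lowercasing and splitting on whitespace, while a not_analyzed field keeps the value verbatim. The helper names are hypothetical and this is only meant to show why the exact lookup fails, not how Elasticsearch stores terms internally.

```python
def index_tokens(value, analyzed):
    """Tokens stored for a field value: lowercased words if analyzed, else the raw value."""
    return set(value.lower().split()) if analyzed else {value}


def terms_query(values, indexed):
    """Exact, unanalyzed lookup: a hit if any queried value equals a stored token."""
    return any(v in indexed for v in values)


analyzed_field = index_tokens("ES", analyzed=True)    # stored as {"es"}
exact_field = index_tokens("ES", analyzed=False)      # stored as {"ES"}

hit_on_analyzed = terms_query(["ES", "PL"], analyzed_field)
hit_on_exact = terms_query(["ES", "PL"], exact_field)
print(hit_on_analyzed, hit_on_exact)  # False True
```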

In Elasticsearch how to use multiple term filters when number of terms are not fixed they can vary?

I know that multiple term filters should be combined with bool, but the problem is that I don't know how many terms there are going to be. For example, I want to filter results on strings with OR ("aa", "bb", "cc", "dd", "ee"), so that the search returns anything containing any of the strings. Sometimes this array will have 10, 15 or 20 entries. How can I handle a variable number of terms in the filter? My code is given below.
var stores = docs.stores; // **THIS IS MY ARRAY OF STRINGS**
client.search({
index: 'merchants',
type: shop_type,
body: {
query: {
filtered: {
filter: {
bool: {
must: [
{
// term: { 'jeb_no': stores }, // HERE HOW TO FILTER ALL ARRAY STRINGS WITH OR CONDITION
}
]
}
}
}
}, script_fields : {
"area": {
"script" : "doc['address.area2']+doc['address.area1']"
}
}
}
})
I think this will do. Use terms instead of term; it accepts an array of any length:
{
"query": {
"bool": {
"must": [
{
"terms": {
"jeb_no": stores
}
}
]
}
}
}
