ElasticSearch match multiple fields with different values

I can actually perform a simple match search with this:
query: { match: { searchable: { query: search } } }
This works well; my searchable field is analyzed in my mapping.
Now I want to perform a search on multiple fields: one string field, and all the others numeric.
My mapping:
mappings dynamic: 'false' do
  indexes :searchable, analyzer: "custom_index_analyzer", search_analyzer: "custom_search_analyzer"
  indexes :year, type: "integer"
  indexes :country_id, type: "integer"
  indexes :region_id, type: "integer"
  indexes :appellation_id, type: "integer"
  indexes :category_id, type: "integer"
end

def as_indexed_json(options={})
  as_json(
    only: [:searchable, :year, :country_id, :region_id, :appellation_id, :category_id]
  )
end
I have tried this:
query: {
  filtered: {
    query: {
      match: {
        searchable: search
      }
    },
    filter: {
      term: {
        country_id: "1"
      },
      term: {
        region_id: "2"
      },
      term: {
        appellation_id: "34"
      }
    }
  }
},
sort: {
  _score: {
    order: :desc
  },
  year: {
    order: :desc,
    ignore_unmapped: true
  }
},
size: 100
It works, but it always gives me 100 results for the appellation_id sent (34), even when the searchable field is very far from the search text.
I have also tried a bool query:
self.search(query: {
  bool: {
    must: [
      {
        match: {
          country_id: "1"
        },
        match: {
          region_id: "2"
        },
        match: {
          appellation_id: "34"
        },
        match: {
          searchable: search
        }
      }
    ]
  }
},
sort: {
  _score: {
    order: :desc
  },
  year: {
    order: :desc,
    ignore_unmapped: true
  }
},
size: 100
)
But it will give me all results matching the searchable field, without taking the wanted appellation_id into account.
My goal is to get the best results and performance: ask ES to give me all data with country_id=X, region_id=Y and appellation_id=Z, and then perform a match on that set of results with the searchable field, so I don't get results too far from the search text.
Thanks.

As you may know, the Elasticsearch match query returns results based on a relevance score. You can try using a term query instead of match for an exact term match. Also, I guess your bool query structure should look like this:
bool: {
  must: [
    { match: {
      country_id: "1"
    }},
    { match: {
      region_id: "2"
    }},
    { match: {
      appellation_id: "34"
    }},
    { match: {
      searchable: search
    }}
  ]
}
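The structure matters because in the original attempts the repeated match/term keys all live in one hash, so later keys overwrite earlier ones and only the last clause (appellation_id) actually reaches Elasticsearch. Since the numeric IDs are exact matches that shouldn't influence scoring, a variant worth trying on ES 2.x and later (a sketch using the field names from the question; on those versions filtered was replaced by bool with filter) keeps only the text match in query context and moves the IDs into filter context:

query: {
  bool: {
    must: [
      { match: { searchable: search } }
    ],
    filter: [
      { term: { country_id: 1 } },
      { term: { region_id: 2 } },
      { term: { appellation_id: 34 } }
    ]
  }
}

Every filter clause must match (they are ANDed), but filters are not scored and can be cached, so _score reflects only how well searchable matches the query text.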

Related

MongoDB Atlas search autocomplete for partial and exact matching

Documents
{'name': 'name whatever'}, {'name': 'foo whatever'}, ...
Search index
{
  "mappings": {
    "dynamic": false,
    "fields": {
      "name": [
        {
          "type": "string"
        },
        {
          "maxGrams": 100,
          "type": "autocomplete"
        }
      ]
    }
  },
  "storedSource": true
}
I want to search by what, whatever, and name whatever.
It seems to work when I search for what and whatever:
// for what
{
  index: 'indexName',
  autocomplete: {
    query: 'what',
    path: 'name'
  }
}
// for whatever
{
  index: 'indexName',
  autocomplete: {
    query: 'whatever',
    path: 'name'
  }
}
But searching for name whatever does not work the way I expected:
{
  index: 'indexName',
  autocomplete: {
    query: 'name whatever',
    path: 'name'
  }
}
This returns name whatever but also foo whatever.
How can I get only name whatever?
I had a similar issue and I believe the answer was to include tokenOrder: 'sequential' in the search, so your query would look like this:
{
  index: 'indexName',
  autocomplete: {
    query: 'name whatever',
    path: 'name',
    tokenOrder: 'sequential'
  }
}
https://www.mongodb.com/docs/atlas/atlas-search/autocomplete/#token-order-example
The description for using sequential tokenOrder states:
sequential
Indicates tokens in the query must appear adjacent to each other or in the order specified in the query in the documents. Results contain only documents where the tokens appear sequentially.
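For context, the autocomplete operator runs inside a $search aggregation stage, so the full call (a minimal sketch; the collection name items is an assumption) would look something like:

db.items.aggregate([
  {
    $search: {
      index: 'indexName',
      autocomplete: {
        query: 'name whatever',
        path: 'name',
        tokenOrder: 'sequential'
      }
    }
  }
])

With tokenOrder: 'sequential', the tokens name and whatever must appear adjacent and in that order, so foo whatever (which only matched the whatever token) no longer comes back.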

Elastic ngram prioritise whole words

I am trying to build an autocomplete with several million possible values. I have managed to do it with two different methods: match and ngram. The problem is that match requires the user to type whole words, while ngram returns poor results. Is there a way to only return ngram results when there are no match results?
Method 1: match
Returns very relevant results but requires the user to type a full word
//mapping
analyzer: {
  std_english: {
    type: 'standard',
    stopwords: '_english_',
  },
}
//search
query: {
  bool: {
    must: [
      { term: { semanticTag: type } },
      { match: { search } }
    ]
  }
}
Method 2: ngram
Returns poor matches
//mapping
analysis: {
  filter: {
    autocomplete_filter: {
      type: 'edge_ngram',
      min_gram: 1,
      max_gram: 20,
    },
  },
  analyzer: {
    autocomplete: {
      type: 'custom',
      tokenizer: 'standard',
      filter: ['lowercase', 'autocomplete_filter'],
    },
  },
}
//search
query: {
  bool: {
    must: [
      { term: { semanticTag: type } },
      {
        match: {
          term: {
            query: search,
            operator: 'and',
          }
        }
      }
    ]
  }
}
Try changing the query to something like this:
{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "semanticTag": "type"
          }
        },
        {
          "match_phrase_prefix": {
            "fieldName": {
              "query": "valueToSearch"
            }
          }
        }
      ]
    }
  }
}
You can use match_phrase_prefix. With it, the user does not need to type the whole word: anything the user types that matches the start of a term in the indexed field will be returned.
Just note that this will also pull in results that match from the middle of indexed documents.
For example, if one of the indexed fields contains "lorem ipsum" and the user types "ips", this whole document will be returned along with the other documents that start with "ips".
You can go with either the standard analyzer or a custom one; you have to check which better suits your use case. Based on the information available in the question, the above approach works well with the standard analyzer.
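To directly address "prioritise whole words", one common pattern (a sketch; name and name.exact are hypothetical fields mapped with the autocomplete analyzer and the standard analyzer respectively) is to require the ngram match for recall and add a boosted whole-word clause so full-word hits rank first:

query: {
  bool: {
    must: [
      { term: { semanticTag: type } },
      { match: { name: { query: search, operator: 'and' } } }
    ],
    should: [
      { match: { 'name.exact': { query: search, boost: 5 } } }
    ]
  }
}

The should clause is optional for matching but adds to _score when it hits, so documents containing the typed words in full float above pure ngram matches.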

Elasticsearch return only results that match array of ids

Is it possible to use Elasticsearch to query only within a set of roomIds?
I tried using bool and should:
query: {
  bool: {
    must: [
      {
        multi_match: {
          operator: 'and',
          query: keyword,
          fields: ['content'],
          type: 'most_fields'
        }
      },
      { term: { users: caller } },
      {
        bool: {
          should: {
            term: {
              room: [list of roomIds]
            }
          }
        }
      }
    ]
  }
},
It works, but when I have more than 1k roomIds I get a "search_phase_execution_exception".
Is there a better way to do this? Thanks.
For array search you should be using a terms query instead of term:
query: {
  bool: {
    must: [
      {
        multi_match: {
          operator: 'and',
          query: keyword,
          fields: ['content'],
          type: 'most_fields'
        }
      },
      { term: { users: caller } },
      {
        bool: {
          should: {
            terms: {
              room: [list of roomIds]
            }
          }
        }
      }
    ]
  }
},
From the documentation:
By default, Elasticsearch limits the terms query to a maximum of 65,536 terms. This includes terms fetched using terms lookup. You can change this limit using the index.max_terms_count setting.
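Since neither the user check nor the room restriction needs to affect relevance, a further variant (a sketch based on the query above; roomIds stands in for your list) is to move both into filter context, which skips scoring and lets Elasticsearch cache the clauses:

query: {
  bool: {
    must: [
      {
        multi_match: {
          operator: 'and',
          query: keyword,
          fields: ['content'],
          type: 'most_fields'
        }
      }
    ],
    filter: [
      { term: { users: caller } },
      { terms: { room: roomIds } }
    ]
  }
}

If the list ever grows past the quoted limit, the documentation's terms lookup (fetching the ID list from a stored document) is the escape hatch rather than inlining everything into the query.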

Elastic search query using match_phrase_prefix and fuzziness at the same time?

I am new to elastic search, so I am struggling a bit to find the optimal query for our data.
Imagine I want to match the following word "Handelsstandens Boldklub".
Currently, I'm using the following query:
{
  query: {
    bool: {
      should: [
        {
          match: {
            name: {
              query: query, slop: 5, type: "phrase_prefix"
            }
          }
        },
        {
          match: {
            name: {
              query: query,
              fuzziness: "AUTO",
              operator: "and"
            }
          }
        }
      ]
    }
  }
}
It currently lists the word if I search for "Hand", but if I search for "Handle" the word is no longer listed, since I made a typo. However, if I get to the end with "Handlesstandens" it is listed again, as the fuzziness catches the typo, but only once I have typed the whole word.
Is it somehow possible to do phrase_prefix and fuzziness at the same time? So in the above case, if I make a typo on the way, it will still list the word?
So in this case, if I search for "Handle", it will still match the word "Handelsstandens Boldklub".
Or what other workarounds are there to achieve the above experience? I like the phrase_prefix matching as it also supports sloppy matching (hence I can search for "Boldklub han" and it will list the result).
Or can the above be achieved by using the completion suggester?
Okay, so after investigating Elasticsearch even further, I came to the conclusion that I should use ngrams.
Here is a really good explanation of what they are and how they work:
https://qbox.io/blog/an-introduction-to-ngrams-in-elasticsearch
Here are the settings and mapping I used (this is elasticsearch-rails syntax):
settings analysis: {
  filter: {
    ngram_filter: {
      type: "ngram",
      min_gram: "2",
      max_gram: "20"
    }
  },
  analyzer: {
    ngram_analyzer: {
      type: "custom",
      tokenizer: "standard",
      filter: ["lowercase", "ngram_filter"]
    }
  }
} do
  mappings do
    indexes :name, type: "string", analyzer: "ngram_analyzer"
    indexes :country_id, type: "integer"
  end
end
And the query (it actually searches two different indexes at the same time):
{
  query: {
    bool: {
      should: [
        {
          bool: {
            must: [
              { match: { "club.country_id": country.id } },
              { match: { name: query } }
            ]
          }
        },
        {
          bool: {
            must: [
              { match: { country_id: country.id } },
              { match: { name: query } }
            ]
          }
        }
      ],
      minimum_should_match: 1
    }
  }
}
But basically you should just do a match or multi_match query, depending on how many fields you want to search in.
I hope someone finds this helpful. I was personally thinking too much in terms of fuzziness instead of ngrams (which I didn't know about before), and that led me in the wrong direction.
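To see why this setup tolerates mid-word typos, you can run the query text through the analyzer by hand on recent Elasticsearch versions (a sketch; the index name clubs is an assumption):

POST /clubs/_analyze
{
  "analyzer": "ngram_analyzer",
  "text": "Handle"
}

The ngram filter emits every 2-20 character substring, so "Handle" produces grams such as "ha", "han" and "hand", many of which also occur in "Handelsstandens"; the match query ORs the grams together, so the document still scores even with the typo.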

Stemming and highlighting for phrase search

My Elasticsearch index is full of large English-text documents. When I search for "it is rare", I get 20 hits with that exact phrase and when I search for "it is rarely" I get a different 10. How can I get all 30 hits at once?
I've tried creating a multi-field with the english analyzer (below), but if I search in that field then I only get results matching parts of the phrase (e.g., documents matching it or is or rare) instead of the whole phrase.
"mappings" : {
...
"text" : {
"type" : "string",
"fields" : {
"english" : {
"type" : "string",
"store" : true,
"term_vector" : "with_positions_offsets_payloads",
"analyzer" : "english"
}
}
},
...
Figured it out!
1. Store two fields: one for the text content (text) and a sub-field with the English-stemmed words (text.english).
2. Create a custom analyzer, based on the default English analyzer, that doesn't strip stop words.
3. Highlight both fields and check each when displaying results to the user.
Here's my index configuration:
{
  mappings: {
    documents: {
      properties: {
        title: { type: 'string' },
        text: {
          type: 'string',
          term_vector: 'with_positions_offsets_payloads',
          fields: {
            english: {
              type: 'string',
              analyzer: 'english_nostop',
              term_vector: 'with_positions_offsets_payloads',
              store: true
            }
          }
        }
      }
    }
  },
  settings: {
    analysis: {
      filter: {
        english_stemmer: {
          type: 'stemmer',
          language: 'english'
        },
        english_possessive_stemmer: {
          type: 'stemmer',
          language: 'possessive_english'
        }
      },
      analyzer: {
        english_nostop: {
          tokenizer: 'standard',
          filter: [
            'english_possessive_stemmer',
            'lowercase',
            'english_stemmer'
          ]
        }
      }
    }
  }
}
And here's what a query looks like:
{
  query: {
    query_string: {
      query: <query>,
      fields: ['text.english'],
      analyzer: 'english_nostop'
    }
  },
  highlight: {
    fields: {
      'text.english': {},
      'text': {}
    }
  }
}
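Step 3 in client code might look like this (a sketch; the hit shape follows the standard Elasticsearch response, and the optional chaining assumes a modern JavaScript runtime):

// Prefer the stemmed sub-field's highlights, fall back to the raw field
const fragments =
  hit.highlight?.['text.english'] ?? hit.highlight?.['text'] ?? [];

Because a phrase may only be highlighted in one of the two fields (stemmed matches land in text.english, exact ones in text), checking both ensures every hit shows a snippet.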
