MongoDB Atlas search autocomplete for partial and exact matching - mongodb-atlas-search

Documents
{'name': 'name whatever'}, {'name': 'foo whatever'}, ...
Search index
{
  "mappings": {
    "dynamic": false,
    "fields": {
      "name": [
        {
          "type": "string"
        },
        {
          "maxGrams": 100,
          "type": "autocomplete"
        }
      ]
    }
  },
  "storedSource": true
}
I want to be able to search by what, whatever, and name whatever.
It seems to work fine when I search for what and whatever:
// for what
{
  index: 'indexName',
  autocomplete: {
    query: 'what',
    path: 'name'
  }
}
// for whatever
{
  index: 'indexName',
  autocomplete: {
    query: 'whatever',
    path: 'name'
  }
}
But searching for name whatever does not work the way I expected:
{
  index: 'indexName',
  autocomplete: {
    query: 'name whatever',
    path: 'name'
  }
}
This returns name whatever but also foo whatever.
How can I get only name whatever?

I had a similar issue, and I believe the answer was to include tokenOrder: 'sequential' in the search, so your query would look like this:
{
  index: 'indexName',
  autocomplete: {
    query: 'name whatever',
    path: 'name',
    tokenOrder: 'sequential'
  }
}
https://www.mongodb.com/docs/atlas/atlas-search/autocomplete/#token-order-example
The description for using sequential tokenOrder states:
sequential
Indicates tokens in the query must appear adjacent to each other or in the order specified in the query in the documents. Results contain only documents where the tokens appear sequentially.
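Put together, a minimal sketch of the full pipeline as it might be run from mongosh (the items collection name and the trailing stages are placeholders, not part of the original question):
db.items.aggregate([
  {
    $search: {
      index: 'indexName',
      autocomplete: {
        query: 'name whatever',
        path: 'name',
        tokenOrder: 'sequential'
      }
    }
  },
  // placeholder stages: cap the result set and return only the matched names
  { $limit: 10 },
  { $project: { _id: 0, name: 1 } }
])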

Related

MongoDB full text search, autocomplete on two fields

I am trying to implement MongoDB atlas search, and the objective is autocomplete on 2 fields.
I currently have this implementation:
const searchStep = {
  $search: {
    // Read more about compound here:
    // https://docs.atlas.mongodb.com/reference/atlas-search/compound/
    compound: {
      must: [
        {
          autocomplete: {
            query,
            path: 'name',
          },
        },
        {
          autocomplete: {
            query,
            path: 'description',
          },
        },
      ],
    },
  },
}
This does not seem to work; it only returns results when there is a match on both the name AND the description. How can I fix this so that the query matches on either name or description?
I now tried using the wildcard option:
{
  wildcard: {
    query,
    path: ['name', 'description'],
    allowAnalyzedField: true,
  }
}
But the wildcard solution does not seem to work - no relevant results are returned...
If you are trying to match on name or description, use should: instead of must:.
must requires that all of the subqueries match, whereas should requires that only one of them does.
const searchStep = {
  $search: {
    // Read more about compound here:
    // https://docs.atlas.mongodb.com/reference/atlas-search/compound/
    compound: {
      should: [
        {
          autocomplete: {
            query,
            path: 'name',
          },
        },
        {
          autocomplete: {
            query,
            path: 'description',
          },
        },
      ],
    },
  },
}
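If you also want matches on name to rank above matches on description, the autocomplete operator accepts a score boost; a sketch with an arbitrary boost value:
const searchStep = {
  $search: {
    compound: {
      should: [
        {
          autocomplete: {
            query,
            path: 'name',
            // boost so that name matches score higher than description matches
            score: { boost: { value: 3 } },
          },
        },
        {
          autocomplete: {
            query,
            path: 'description',
          },
        },
      ],
    },
  },
}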

Elastic ngram prioritise whole words

I am trying to build an autocomplete with several million possible values. I have managed to do it with two different methods, match and ngram. The problem is that match requires the user to type whole words, while ngram returns poor results. Is there a way to only return ngram results when there are no match results?
Method 1: match
Returns very relevant results, but requires the user to type a full word.
//mapping
analyzer: {
  std_english: {
    type: 'standard',
    stopwords: '_english_',
  },
}
//search
query: {
  bool: {
    must: [
      { term: { semanticTag: type } },
      { match: { search } }
    ]
  }
}
Method 2: ngram
Returns poor matches
//mapping
analysis: {
  filter: {
    autocomplete_filter: {
      type: 'edge_ngram',
      min_gram: 1,
      max_gram: 20,
    },
  },
  analyzer: {
    autocomplete: {
      type: 'custom',
      tokenizer: 'standard',
      filter: ['lowercase', 'autocomplete_filter'],
    },
  },
}
//search
query: {
  bool: {
    must: [
      { term: { semanticTag: type } },
      {
        match: {
          term: {
            query: search,
            operator: 'and',
          }
        }
      }
    ]
  }
}
Try changing query to something like this -
{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "semanticTag": "type"
          }
        },
        {
          "match_phrase_prefix": {
            "fieldName": {
              "query": "valueToSearch"
            }
          }
        }
      ]
    }
  }
}
You can use match_phrase_prefix. With this, the user does not need to type the whole word; anything the user types that matches the start of a term in the indexed field data will be returned.
Just a note that this will also pull results from any middle words of the indexed documents as well.
For example, if the data indexed in one of the fields is "lorem ipsum" and the user types "ips", then this whole document will be returned along with other documents containing a term that starts with "ips".
You can go with either the standard or a custom analyzer; you will have to check which analyzer better suits your use case. Based on the information available in the question, the approach above works well with the standard analyzer.
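If you still want full-word matches to rank ahead of prefix matches, one common pattern (a sketch only, using the same placeholder fieldName and valueToSearch, not tested against your mapping) is to put both clauses in a bool should, with the exact match boosted:
{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "semanticTag": "type"
          }
        }
      ],
      "should": [
        {
          "match": {
            "fieldName": {
              "query": "valueToSearch",
              "boost": 2
            }
          }
        },
        {
          "match_phrase_prefix": {
            "fieldName": {
              "query": "valueToSearch"
            }
          }
        }
      ],
      "minimum_should_match": 1
    }
  }
}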

ElasticSearch match multiple fields with different values

I can actually perform a simple match search with this:
query: {match: {searchable: {query:search}}}
This works well; my searchable field is analyzed in my mapping.
Now I want to perform a search on multiple fields: one string field, while all the others are numeric.
My mapping:
mappings dynamic: 'false' do
  indexes :searchable, analyzer: "custom_index_analyzer", search_analyzer: "custom_search_analyzer"
  indexes :year, type: "integer"
  indexes :country_id, type: "integer"
  indexes :region_id, type: "integer"
  indexes :appellation_id, type: "integer"
  indexes :category_id, type: "integer"
end

def as_indexed_json(options={})
  as_json(
    only: [:searchable, :year, :country_id, :region_id, :appellation_id, :category_id]
  )
end
I have tried this:
query: {
  filtered: {
    query: {
      match: {
        searchable: search
      }
    },
    filter: {
      term: {
        country_id: "1"
      },
      term: {
        region_id: "2"
      },
      term: {
        appellation_id: "34"
      }
    }
  }
},
sort: {
  _score: {
    order: :desc
  },
  year: {
    order: :desc,
    ignore_unmapped: true
  }
},
size: 100
It works, but it gives me 100 results in all cases for the appellation_id sent (34), even when the searchable field is very far from the search text.
I have also tried a BOOL query:
self.search(
  query: {
    bool: {
      must: [
        {
          match: {
            country_id: "1"
          },
          match: {
            region_id: "2"
          },
          match: {
            appellation_id: "34"
          },
          match: {
            searchable: search
          }
        }
      ]
    }
  },
  sort: {
    _score: {
      order: :desc
    },
    year: {
      order: :desc,
      ignore_unmapped: true
    }
  },
  size: 100
)
But it gives me all results matching the searchable field and does not take the wanted appellation_id into account.
My goal is to get the best results and performance: ask ES to give me all data with country_id=X, region_id=Y and appellation_id=Z, and then perform the match on that set of results with the searchable field, without getting results too far from reality for the searchable text.
Thanks.
As you may know, the Elasticsearch match query returns results based on a relevance score. You can try to use a term query instead of match for an exact term match. Also, I think your bool query structure should look like this:
bool: {
  must: [
    { match: { country_id: "1" } },
    { match: { region_id: "2" } },
    { match: { appellation_id: "34" } },
    { match: { searchable: search } }
  ]
}
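If your Elasticsearch version supports the bool query's filter clause (2.x and later), a sketch that matches the stated goal more closely is to keep the ID constraints as exact, non-scoring term filters and let only searchable drive the relevance score:
query: {
  bool: {
    filter: [
      { term: { country_id: "1" } },
      { term: { region_id: "2" } },
      { term: { appellation_id: "34" } }
    ],
    must: [
      { match: { searchable: search } }
    ]
  }
}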

Elastic search returning wrong results

I am running a query against Elasticsearch, but the results returned are wrong. The idea is that I can check against a range of fields with individual queries. But when I pass the following query, items which don't have the included lineup are returned.
query: {
  bool: {
    must: [
      { match: { "lineup.name": { query: "The 1975" } } }
    ]
  }
}
The objects are events, which look like this:
{
  title: 'Glastonbury',
  country: 'UK',
  lineup: [
    {
      name: 'The 1975',
      genre: 'Indie',
      headliner: false
    }
  ]
},
{
  title: 'Reading',
  country: 'UK',
  lineup: [
    {
      name: 'The Strokes',
      genre: 'Indie',
      headliner: true
    }
  ]
}
In my case both of these events are returned.
The mapping can be seen here:
https://jsonblob.com/567e8f10e4b01190df45bb29
You need to use a match_phrase query. The match query looks for either The or 1975; it finds The in The Strokes, and that is why you get that result.
Try:
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "lineup.name": {
              "query": "The 1975",
              "type": "phrase"
            }
          }
        }
      ]
    }
  }
}
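On newer Elasticsearch versions, where the type option inside match has been deprecated in favour of a dedicated query, the equivalent would be a match_phrase query (a sketch of the same search):
{
  "query": {
    "bool": {
      "must": [
        {
          "match_phrase": {
            "lineup.name": "The 1975"
          }
        }
      ]
    }
  }
}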

Ruby Elasticsearch API: Returning the latest entry to an index

I've enabled the "_timestamp" field in the index mapping and I successfully retrieved the latest entry to an index using the elasticsearch REST API. The body of the POST request I used looks like this:
{
  "query": {
    "match_all": {}
  },
  "size": "1",
  "sort": [
    {
      "_timestamp": {
        "order": "desc"
      }
    }
  ]
}
Now I'm trying to translate this into the Ruby elasticsearch-api syntax... This is what I have so far:
client = Elasticsearch::Client.new host: 'blahblahblah:9200'
json = client.search index: 'index',
                     type: 'type',
                     body: { query: { match_all: {} } },
                     sort: '_timestamp',
                     size: 1
I've tried several variations on the above code, but nothing seems to return the newest entry. I can't find many examples online using the Ruby elasticsearch API syntax, so any help would be greatly appreciated!
If there is a way to return the latest entry without using the "_timestamp" field, I am open to trying that as well!
I finally found the correct syntax:
json = client.search index: 'index',
                     type: 'type',
                     body: {
                       query: { match_all: {} },
                       size: 1,
                       sort: [{ _timestamp: { order: "desc" } }]
                     }
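As a side note on avoiding _timestamp: that meta-field was deprecated and later removed in newer Elasticsearch versions, so a common alternative is to store an explicit date field on each document and sort on that instead. A sketch using the same client call, assuming a hypothetical created_at date field in your documents:
json = client.search index: 'index',
                     type: 'type',
                     body: {
                       query: { match_all: {} },
                       size: 1,
                       # created_at is a hypothetical field your documents would need to carry
                       sort: [{ created_at: { order: "desc" } }]
                     }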
