MongoDB full text search, autocomplete on two fields

I am trying to implement MongoDB Atlas Search, and the objective is autocomplete on two fields.
I currently have this implementation:
const searchStep = {
  $search: {
    // Read more about compound here:
    // https://docs.atlas.mongodb.com/reference/atlas-search/compound/
    compound: {
      must: [
        {
          autocomplete: {
            query,
            path: 'name',
          },
        },
        {
          autocomplete: {
            query,
            path: 'description',
          },
        },
      ],
    },
  },
}
This does not work as expected: it only returns results when there is a match on both the name AND the description. How can I fix this so that a match on either name or description is returned?
I now tried using the wildcard option:
{
  wildcard: {
    query,
    path: ['name', 'description'],
    allowAnalyzedField: true,
  }
}
But the wildcard solution does not seem to work either: no relevant results are returned...

If you are trying to match on name OR description, use should: instead of must:.
must requires that all of the subqueries match, whereas should requires that only one of them does.
const searchStep = {
  $search: {
    // Read more about compound here:
    // https://docs.atlas.mongodb.com/reference/atlas-search/compound/
    compound: {
      should: [
        {
          autocomplete: {
            query,
            path: 'name',
          },
        },
        {
          autocomplete: {
            query,
            path: 'description',
          },
        },
      ],
    },
  },
}
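If you later combine these should clauses with must or filter clauses, keep in mind that should then becomes optional for matching; the compound operator's minimumShouldMatch option can restore the "at least one of the two fields" requirement. A sketch only (the isPublished filter clause is hypothetical and just there for illustration):
const searchStep = {
  $search: {
    compound: {
      // Hypothetical extra clause, added only to show why minimumShouldMatch matters.
      filter: [{ equals: { path: 'isPublished', value: true } }],
      should: [
        { autocomplete: { query, path: 'name' } },
        { autocomplete: { query, path: 'description' } },
      ],
      // Require at least one of the should clauses to match.
      minimumShouldMatch: 1,
    },
  },
}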

Related

MongoDB Atlas search autocomplete for partial and exact matching

Documents
{'name': 'name whatever'}, {'name': 'foo whatever'}, ...
Search index
{
  "mappings": {
    "dynamic": false,
    "fields": {
      "name": [
        {
          "type": "string"
        },
        {
          "maxGrams": 100,
          "type": "autocomplete"
        }
      ]
    }
  },
  "storedSource": true
}
I want to be able to search by what, whatever, and name whatever.
Searching for what and whatever works as expected:
// for "what"
{
  index: 'indexName',
  autocomplete: {
    query: 'what',
    path: 'name'
  }
}

// for "whatever"
{
  index: 'indexName',
  autocomplete: {
    query: 'whatever',
    path: 'name'
  }
}
But searching for name whatever does not work the way I expected:
{
  index: 'indexName',
  autocomplete: {
    query: 'name whatever',
    path: 'name'
  }
}
This returns name whatever but also foo whatever.
How can I get only name whatever?
I had a similar issue and I believe the answer was to include 'tokenOrder: sequential' in the search - so your query would look like this:
{
  index: 'indexName',
  autocomplete: {
    query: 'name whatever',
    path: 'name',
    tokenOrder: 'sequential'
  }
}
https://www.mongodb.com/docs/atlas/atlas-search/autocomplete/#token-order-example
The description for using sequential tokenOrder states:
sequential
Indicates tokens in the query must appear adjacent to each other or in the order specified in the query in the documents. Results contain only documents where the tokens appear sequentially.
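Not part of the original answer, but since the index above also maps name with the plain string type, another pattern worth sketching is to combine autocomplete (partial matches) with a boosted phrase clause (exact, adjacent-token matches) in a compound query, so that name whatever ranks above foo whatever:
{
  index: 'indexName',
  compound: {
    should: [
      {
        // Exact/adjacent match against the string-typed mapping, boosted to rank first.
        phrase: {
          query: 'name whatever',
          path: 'name',
          score: { boost: { value: 3 } }
        }
      },
      {
        // Partial matches still come back via autocomplete.
        autocomplete: {
          query: 'name whatever',
          path: 'name',
          tokenOrder: 'sequential'
        }
      }
    ]
  }
}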

Elasticsearch structure in the correct and effective way for search engine

I'm building a search engine for my audio store.
I only use 1 index for the audio documents and here is the structure:
{
  id: { type: 'integer' },
  title: { type: 'search_as_you_type' },
  description: { type: 'text' },
  createdAt: { type: 'date' },
  updatedAt: { type: 'date' },
  datePublished: { type: 'date' },
  duration: { type: 'float' },
  categories: {
    type: 'nested',
    properties: {
      id: { type: 'integer' },
      name: { type: 'text' }
    }
  }
}
Searching the audio documents by text and ordering by date published is straightforward.
But I want to make it more powerful: a text search ordered by how trending an audio is, based on its listen times and purchase histories within a specific range (e.g. trending audios for the last 3 months or the last 30 days). So I tweaked the structure as below:
{
  ...previousProperties,
  listenTimes: {
    type: 'nested',
    properties: {
      timestamp: { type: 'date' },
      progress: { type: 'float' }, // value 0-1
    },
  },
  purchaseHistories: {
    type: 'nested',
    properties: {
      timestamp: { type: 'date' }
    },
  },
}
And here is my query to get trending audios for the last 3 months and it worked:
{
  bool: {
    should: [
      {
        nested: {
          path: 'listenTimes',
          query: {
            function_score: {
              query: {
                range: {
                  'listenTimes.timestamp': {
                    gte: $range,
                  },
                },
              },
              functions: [
                {
                  field_value_factor: {
                    field: 'listenTimes.progress',
                    missing: 0,
                  },
                },
              ],
              boost_mode: 'replace',
            },
          },
          score_mode: 'sum',
        },
      },
      {
        nested: {
          path: 'purchaseHistories',
          query: {
            function_score: {
              query: {
                range: {
                  'purchaseHistories.timestamp': {
                    gte: 'now+1d-3M/d',
                  },
                },
              },
              boost: 1.5,
            },
          },
          score_mode: 'sum',
        },
      },
    ],
  },
}
I have some uncertainty about my approach:
The number of listen-time and purchase-history records for each audio is quite big. Is it effective to structure the data like this? I have only tested with sample data and it seems to work fine.
Will Elasticsearch re-index the whole document every time I push new listen-time or purchase-history records into the audio docs?
I'm very new to Elasticsearch, so could someone please give me some advice on this case? Thank you so much!
The first question is a good one. It depends on how you implement it; you will have to watch out for atomicity, since (I'm guessing) you're planning to fetch the number of listen times and then save the incremented value. If you're doing this from one application in one thread and it manages to process everything in time, then you're fine, but you won't be able to scale. I would say that Elasticsearch is not really made for this kind of transaction. The first idea that popped into my brain is saving the numbers into a SQL database and updating Elasticsearch on some schedule; I suppose those results don't have to be updated in real time?
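As a rough sketch of that idea (not from the original answer; it assumes the v8 @elastic/elasticsearch Node client, an audios index, and hypothetical listenCount/purchaseCount fields), a scheduled job could read aggregated counters from SQL and push them to Elasticsearch in a single bulk call:
const { Client } = require('@elastic/elasticsearch')
const client = new Client({ node: 'http://localhost:9200' })

// rows: aggregated in SQL, e.g. [{ audioId, listenCount, purchaseCount }, ...]
async function syncTrendingCounters(rows) {
  // The bulk API expects a flat [action, doc, action, doc, ...] list.
  const operations = rows.flatMap((row) => [
    { update: { _index: 'audios', _id: row.audioId } },
    { doc: { listenCount: row.listenCount, purchaseCount: row.purchaseCount } },
  ])
  const result = await client.bulk({ operations })
  if (result.errors) {
    console.error('Some counter updates failed', result.items)
  }
}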
And about the second question, I'll just post a quote from the Elasticsearch documentation: "The document must still be reindexed, but using update removes some network roundtrips and reduces chances of version conflicts between the GET and the index operation." You can find more in the Update API documentation.
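For completeness, a minimal sketch of such a partial update (same assumptions as above; the painless script and field names just mirror the mapping in the question):
// Append one listen event to the nested listenTimes array of a single audio doc.
async function recordListen(audioId, progress) {
  await client.update({
    index: 'audios',
    id: audioId,
    script: {
      lang: 'painless',
      source: 'if (ctx._source.listenTimes == null) { ctx._source.listenTimes = []; } ctx._source.listenTimes.add(params.event)',
      params: {
        event: { timestamp: new Date().toISOString(), progress: progress },
      },
    },
  })
}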

Elasticsearch return only results that match array of ids

Is it possible to use elastic search to query only within a set of roomIds?
I tried using bool and should:
query: {
  bool: {
    must: [
      {
        multi_match: {
          operator: 'and',
          query: keyword,
          fields: ['content'],
          type: 'most_fields'
        }
      },
      { term: { users: caller } },
      {
        bool: {
          should: [
            {
              term: {
                room: [list of roomIds]
              }
            }
          ]
        }
      }
    ]
  }
},
It works but when I have more than 1k roomIds I get "search_phase_execution_exception".
Is there a better way to do this? Thanks.
For array search you should be using a terms query instead of term:
query: {
  bool: {
    must: [
      {
        multi_match: {
          operator: 'and',
          query: keyword,
          fields: ['content'],
          type: 'most_fields'
        }
      },
      { term: { users: caller } },
      {
        bool: {
          should: [
            {
              terms: {
                room: [list of roomIds]
              }
            }
          ]
        }
      }
    ]
  }
},
From the documentation:
By default, Elasticsearch limits the terms query to a maximum of 65,536 terms. This includes terms fetched using terms lookup. You can change this limit using the index.max_terms_count setting.
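If a large id list is unavoidable, one option (sketched below with the @elastic/elasticsearch Node client; index and field names are placeholders) is to raise that limit, although a terms lookup that reads the caller's room ids from another document usually scales better:
// Raise the per-query terms limit on the index (large terms lists stay expensive).
await client.indices.putSettings({
  index: 'messages',
  settings: { 'index.max_terms_count': 100000 },
})

// Alternative: terms lookup, resolving the id list from a stored document at query time.
const roomFilter = {
  terms: {
    room: {
      index: 'users',   // hypothetical index with one doc per user
      id: caller,       // the calling user's id
      path: 'roomIds',  // field in that doc holding the allowed room ids
    },
  },
}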

elasticsearch: cannot search for # symbol

Our mapping looks like this:
mappings: {
  entry: {
    properties: {
      id: { type: 'string' },
      name: { type: 'string' },
      creationTimestamp: { type: 'date', format: 'date_time' },
      lastTimestamp: { type: 'date', format: 'date_time' }
    }
  }
}
There are docs that contain # symbols in the name, for example "#TITLE#_30". However, I am not able to search for the # symbol. Searching for name:*title* or name:*_30* works fine, but when trying name:*#title* I get no results.
Which tokenizer do I have to use so that this is possible? All our end-users want to do is case-insensitive searches with wildcards.
EDIT
The query looks like this:
query: {
  filtered: {
    filter: {
      bool: {
        must: [
          {
            range: {
              creationTimestamp: {
                gte: startdate.toISOString(),
                lte: enddate.toISOString()
              }
            }
          },
          {
            query: {
              query_string: {
                query: 'name:*title*' // e.g.
              }
            }
          }
        ]
      }
    }
  }
}
P.S. we use es v1.7
EDIT 2
Tried the 2 options from How to modify standard analyzer to include #? but they don't work for me.
Also tried the following:
settings: {
  analysis: {
    analyzer: {
      name_analyzer: {
        type: 'custom',
        tokenizer: 'whitespace',
        filter: ['test']
      }
    },
    filter: {
      test: {
        type: 'word_delimiter',
        preserve_original: true,
        type_table: ['# => ALPHANUM']
      }
    }
  }
}
but this only returns results for name:*#*; any other query doesn't work.
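When debugging this kind of problem, it can help to look at the tokens the analyzer actually produces. A minimal sketch using the legacy elasticsearch JS client (ES 1.x era); the index name is hypothetical and the analyzer name follows the settings above:
// Inspect what the custom analyzer emits for a problematic value.
client.indices.analyze(
  {
    index: 'myindex',          // hypothetical index name
    analyzer: 'name_analyzer',
    text: '#TITLE#_30',
  },
  function (err, resp) {
    if (err) return console.error(err);
    // Each token shows how the value was split, which explains
    // which wildcard patterns can and cannot match.
    console.log(resp.tokens.map(function (t) { return t.token; }));
  }
);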

Ruby Elasticsearch API: Returning the latest entry to an index

I've enabled the "_timestamp" field in the index mapping and I successfully retrieved the latest entry to an index using the elasticsearch REST API. The body of the POST request I used looks like this:
{
  "query": {
    "match_all": {}
  },
  "size": "1",
  "sort": [
    {
      "_timestamp": {
        "order": "desc"
      }
    }
  ]
}
Now I'm trying to translate this into the Ruby elasticsearch-api syntax... This is what I have so far:
client = Elasticsearch::Client.new host: 'blahblahblah:9200'
json = client.search index: 'index',
                     type: 'type',
                     body: { query: { match_all: {} } },
                     sort: '_timestamp',
                     size: 1
I've tried several variations on the above code, but nothing seems to return the newest entry. I can't find many examples online using the Ruby elasticsearch API syntax, so any help would be greatly appreciated!
If there is a way to return the latest entry without using the "_timestamp" field, I am open to trying that as well!
I finally found the correct syntax:
json = client.search index: 'index',
                     type: 'type',
                     body: { query: { match_all: {} }, size: 1, sort: [{ _timestamp: { order: "desc" } }] }
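For comparison with the earlier attempt: here size and sort are passed inside body, with the descending order spelled out explicitly, rather than as top-level arguments to client.search.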
