Elasticsearch flexible search - elasticsearch

Does anyone have experience with Elasticsearch and getting the searches to be more flexible?
Currently, if I query "House", it returns the correct items. But if "Hous" is typed in, nothing is returned. Also, if I search for "O.J." it returns O.J., but if I search for OJ I get nothing.

Use prefix matching:
bool: {
  must: [
    {
      multi_match: {
        query: "your text query",
        type: "phrase_prefix",
        max_expansions: 4,
        fields: ["field1", "field2"]
      }
    }
  ]
}
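You can also add fuzziness, which lets the query match terms that differ by a few characters, but it may yield less accurate results. Note that fuzziness is not supported with the phrase_prefix type, so it needs a different multi_match type such as best_fields.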
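As a rough illustration of the fuzziness suggestion, here is a minimal Python sketch that only builds the request body as a plain dict. The function name build_fuzzy_query and the field names are placeholders, not anything from the question, and the choice of "AUTO" with prefix_length 1 is an assumption you would tune against your own data.

```python
import json


def build_fuzzy_query(text, fields, fuzziness="AUTO", prefix_length=1):
    """Build a bool/must request body with a fuzzy multi_match clause.

    Hypothetical helper: it only assembles a dict, it does not talk to a cluster.
    """
    return {
        "query": {
            "bool": {
                "must": [
                    {
                        "multi_match": {
                            "query": text,
                            "fields": list(fields),
                            # best_fields is the default type; fuzziness is not
                            # accepted with phrase or phrase_prefix.
                            "type": "best_fields",
                            "fuzziness": fuzziness,
                            # Require the first character(s) to match exactly.
                            "prefix_length": prefix_length,
                        }
                    }
                ]
            }
        }
    }


if __name__ == "__main__":
    # Print the JSON body that could be sent to the _search endpoint.
    print(json.dumps(build_fuzzy_query("Hous", ["field1", "field2"]), indent=2))
```

With an analyzer that lowercases terms, "Hous" is one edit away from "house", so a fuzzy clause like this would be expected to match it; whether that holds for your index depends on the mapping.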

Related

Atlas Search Index partial match

I have a test collection with these two documents:
{ _id: ObjectId("636ce11889a00c51cac27779"), sku: 'kw-lids-0009' }
{ _id: ObjectId("636ce14b89a00c51cac2777a"), sku: 'kw-fs66-gre' }
I've created a search index with this definition:
{
  "analyzer": "lucene.standard",
  "searchAnalyzer": "lucene.standard",
  "mappings": {
    "dynamic": false,
    "fields": {
      "sku": {
        "type": "string"
      }
    }
  }
}
If I run this aggregation:
[{
  $search: {
    index: 'test',
    text: {
      query: 'kw-fs',
      path: 'sku'
    }
  }
}]
Why do I get 2 results? I only expected the one with sku: 'kw-fs66-gre' 😬
During indexing, the standard analyzer breaks the string "kw-lids-0009" into 3 tokens [kw][lids][0009], and similarly tokenizes "kw-fs66-gre" as [kw][fs66][gre]. When you query for "kw-fs", the same analyzer tokenizes the query as [kw][fs], and so Lucene matches on both documents, as both have the [kw] token in the index.
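The tokenization described above can be approximated in a few lines of Python. This is only a sketch: approx_standard_tokens is a hypothetical stand-in that splits on non-alphanumeric characters and lowercases, which is not the real lucene.standard implementation, but it reproduces the token lists given in this answer.

```python
import re


def approx_standard_tokens(text):
    """Rough stand-in for lucene.standard: split on non-alphanumerics, lowercase."""
    return [t.lower() for t in re.split(r"[^0-9A-Za-z]+", text) if t]


def matching_skus(query, skus):
    """Return the SKUs that share at least one token with the query."""
    query_tokens = set(approx_standard_tokens(query))
    return [s for s in skus if query_tokens & set(approx_standard_tokens(s))]


if __name__ == "__main__":
    skus = ["kw-lids-0009", "kw-fs66-gre"]
    print(approx_standard_tokens("kw-fs"))   # query tokens
    print(matching_skus("kw-fs", skus))      # both SKUs share the "kw" token
    print(matching_skus("fs66", skus))       # only one SKU has "fs66"
```

The point of the sketch is that a single shared token is enough for a document to match, which is why the "kw" prefix pulls in both SKUs.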
To get the behavior you're looking for, you should index the sku field as type autocomplete and use the autocomplete operator in your $search stage instead of text.
You're still getting 2 results because of the tokenization, i.e., you're still matching on [kw] in both documents. If you search for "fs66", you'll get a single match only. Results are scored by relevance; they are not filtered. You can add {$project: {score: { $meta: "searchScore" }}} to your pipeline to see the difference in score between the matching documents.
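For reference, the pipeline with the score projection added could be assembled like this in Python. It is a sketch that only builds the list of stages; the helper name search_with_score is made up for illustration, and projecting sku alongside the score is an assumption so the output is readable.

```python
def search_with_score(index, query, path):
    """Build a $search pipeline that also exposes each hit's relevance score."""
    return [
        # Same text search as in the question.
        {"$search": {"index": index, "text": {"query": query, "path": path}}},
        # Keep the searched field and attach the relevance score.
        {"$project": {path: 1, "score": {"$meta": "searchScore"}}},
    ]


if __name__ == "__main__":
    for stage in search_with_score("test", "kw-fs", "sku"):
        print(stage)
```

With a driver such as PyMongo, the returned list would be passed to the collection's aggregate method; the document matching both [kw] and a second token would be expected to score higher than the one matching only [kw].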
If you are looking for exact matches only, you can use the keyword analyzer or a custom analyzer that strips the dashes, so that you deal with a single token per field rather than 3.

min_score excluding documents with higher scores

I have a trove of several million documents which I'm querying like this:
const query = {
  min_score: 1,
  query: {
    bool: {
      should: [
        {
          multi_match: {
            query: "David",
            fields: ["displayTitle^2", "synopsisList.text"],
            type: "phrase",
            slop: 2
          }
        },
        {
          nested: {
            path: "contributors",
            query: {
              multi_match: {
                query: "David",
                fields: [
                  "contributors.characterName",
                  "contributors.contributionBy.displayTitle"
                ],
                type: "phrase",
                slop: 2
              }
            },
            score_mode: "sum"
          }
        }
      ]
    }
  }
};
This query gives sane-looking results for a wide range of terms. However, it has a problem with "David", and presumably with others.
"David" crops up fairly regularly in the text. With the min_score option, this query always returns 0 documents. When I remove min_score, I get thousands of documents, the best of which has a score of 22.749.
Does anyone know what I'm doing wrong? I guess min_score doesn't work the way I think it does.
Thanks
The problem I was originally trying to solve was that, when I added some filter clauses to the above query, Elasticsearch returned every document that satisfied the filter, even those with a score of zero. That's how should works: once the bool query has a filter (or must) clause, the should clauses become optional. I didn't realise that I can nest the should inside a must, which achieves the desired effect.
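A minimal sketch of that nesting, written as a Python helper that only builds the request body. The function name build_filtered_query and the example filter on a "type" field are hypothetical; the original post does not show the actual filter clauses.

```python
def build_filtered_query(should_clauses, filter_clauses):
    """Wrap should clauses in a must so that filters alone cannot produce a hit."""
    return {
        "query": {
            "bool": {
                "must": [
                    {
                        "bool": {
                            "should": list(should_clauses),
                            # Stated explicitly: at least one should clause must match.
                            "minimum_should_match": 1,
                        }
                    }
                ],
                # Filters narrow the result set without affecting the score.
                "filter": list(filter_clauses),
            }
        }
    }


if __name__ == "__main__":
    title_clause = {
        "multi_match": {
            "query": "David",
            "fields": ["displayTitle^2", "synopsisList.text"],
            "type": "phrase",
            "slop": 2,
        }
    }
    # The term filter below is a made-up example, not from the original post.
    body = build_filtered_query([title_clause], [{"term": {"type": "film"}}])
    print(body)
```

Because the inner bool sits in a must clause, a document has to match at least one of the text clauses before the filter is of any use, so zero-score documents that merely pass the filter are no longer returned.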

Search within the results got from elasticsearch

Is it possible to search within the results that I get from elasticsearch?
To achieve that currently, I need to run and wait for two searches on Elasticsearch. The first search is:
{ "match": { "title": "foo" } }
It takes 5 seconds and returns 500 docs. Then comes a second search:
{
  "bool": {
    "must": [
      { "match": { "title": "foo" } },
      { "match": { "title": "bar" } }
    ]
  }
}
It takes another 5 seconds and returns 200 docs, which basically has nothing to do with the first search from elasticsearch's perspective.
Instead of doing it this way, I'd like to offer a "search further within the result" option to my users. With this option, users could run a search with additional keywords against the results returned by the first search.
So my scenario is that a user makes a first search with keyword "foo", and gets 500 results on the webpage, and then selects "search further within the result", to make a second search within the 500 results, and hope to get some refined results really quick.
How can I achieve it? Thanks!
What you could do is use the IDS query. Collect all document IDs from the first request, and then post them with a new Bool query that includes an IDS query in a must clause next to the original query. You could efficiently collect the IDs in the first request using the Scroll API. Since you will return the second result sorted anyway, it does not make sense to do any sorting in the first request, so you can speed up the first request.
See:
Scroll API: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-scroll.html
IDS Query: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-ids-query.html
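The second request from this suggestion could be assembled roughly as follows. This is a sketch under stated assumptions: the helper name refine_within_results is made up, the IDs are assumed to have been collected from the first request already, and only the dict for the request body is built here.

```python
def refine_within_results(doc_ids, refinement_query):
    """Build a bool query restricted to previously returned document IDs."""
    return {
        "query": {
            "bool": {
                "must": [
                    # Only documents from the first result set are eligible.
                    {"ids": {"values": list(doc_ids)}},
                    # The additional keyword search the user typed in.
                    refinement_query,
                ]
            }
        }
    }


if __name__ == "__main__":
    # Placeholder IDs standing in for the hits of the first search.
    first_hits = ["1", "2", "3"]
    print(refine_within_results(first_hits, {"match": {"title": "bar"}}))
```

Whether this is faster than simply running the combined bool query again depends on how many IDs are sent and on the cluster, so it is worth measuring before committing to it.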
A post filter is a way to search inside another search.
In your case:
GET _search
{
  "query": {
    "match": {
      "title": "foo"
    }
  },
  "post_filter": {
    "match": {
      "title": "bar"
    }
  }
}
post_filter will be executed on the query result.

Search for multiple incomplete words with Elasticsearch

I have a database of records, each of which has a right and a left field, and both these fields contain text. The database is indexed with Elasticsearch.
I want to search both fields of these records and find the records in which either field contains two or more words with certain prefixes. The search should be specific enough to find only the records that contain all the words in the query, not just some of them.
For example, the query "qui bro" should return the record containing the sentence "The quick brown fox jumped over the lazy dog", but not the one containing the sentence "The quick fox jumped over the lazy dog".
I've seen a description of how to perform prefix queries with Elasticsearch (and can reproduce it when searching for one word in one field).
I've also seen a description of how to perform multi-match queries to search through several fields at once.
But what I need is some combination of these techniques, which would allow me both to search through several fields at once, and to look only for parts of words. And to get only those records that have all the words whose parts are contained in the query.
How can I do that? Any method will do (prefixes, ngrams, whatever).
(P.S.: My question may, to a certain extent, be a duplicate of this one, but since it never was answered, I hope I'm not breaking any rules by asking mine.)
======================================
UPDATED:
I might have solved the first part of the question. Here is the syntax that seems to work in my Rails app (using the elasticsearch-rails gem):
response = Paragraph.search query: {bool: { must: [ { prefix: {right: "qui"}}, {prefix: {right: "bro"}} ] } }
Or, to re-write it in pure Elasticsearch syntax:
{
  "bool": {
    "must": [
      { "prefix": { "right": "qui" }},
      { "prefix": { "right": "bro" }}
    ]
  }
}
So my updated question now is how to combine this prefix search with a multi_match search (to search through both the right and the left fields).
OK, here is a possible answer that seems to work. The code has to search through multiple fields for several incomplete words and return only the records that contain all these words.
Here is the request written in elasticsearch-rails syntax:
response = Paragraph.search query: {bool: { must: [ { multi_match: { query: "qui", type: "phrase_prefix", fields: ["right", "left"]}}, { multi_match: { query: "brow", type: "phrase_prefix", fields: ["right", "left"]}}]}}
Or, re-written in the syntax that is used on the Elasticsearch site:
{
  "query": {
    "bool": {
      "must": [
        {
          "multi_match": {
            "query": "qui",
            "type": "phrase_prefix",
            "fields": ["right", "left"]
          }
        },
        {
          "multi_match": {
            "query": "brow",
            "type": "phrase_prefix",
            "fields": ["right", "left"]
          }
        }
      ]
    }
  }
}
This seems to work. But if somebody has other solutions (particularly if these solutions will make the search case-insensitive), I will be happy to hear them.
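To avoid writing one clause per word by hand, the request can be generated from the user's input. The Python sketch below only builds the request body; the helper name build_all_prefixes_query is hypothetical, and splitting the input on whitespace is an assumption about how partial words are typed.

```python
def build_all_prefixes_query(text, fields):
    """Require every whitespace-separated fragment to prefix-match in some field."""
    fragments = text.split()
    return {
        "query": {
            "bool": {
                # One must clause per fragment, so all fragments have to match.
                "must": [
                    {
                        "multi_match": {
                            "query": fragment,
                            "type": "phrase_prefix",
                            "fields": list(fields),
                        }
                    }
                    for fragment in fragments
                ]
            }
        }
    }


if __name__ == "__main__":
    print(build_all_prefixes_query("qui bro", ["right", "left"]))
```

On case sensitivity: phrase_prefix clauses are analyzed at query time, so with fields that use the default standard analyzer a query such as "Qui" would be expected to match "quick" as well. That depends on the mapping, which the question does not show.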

How do I construct this elasticsearch query object?

My documents are indexed like this:
{
  title: "stuff here",
  description: "stuff here",
  keywords: "stuff here",
  score1: "number here",
  score2: "number here"
}
I want to perform a query that:
Uses the title, description, and keywords fields for matching the text terms.
It doesn't have to be a complete match. E.g., if someone searches "I have a big nose", and "nose" is in one of the document titles but "big" is not, then this document should still be returned.
Edit: I tried this query and it works. Can someone confirm if this is the right way to do it? Thanks.
{
  query: {
    'multi_match': {
      'query': q,
      'fields': ['title^2', 'description', 'keywords'],
    }
  }
}
Your way is definitely the way to go!
The multi_match query is usually the one that you want to expose to end users. The query_string query is similar, but more powerful and more dangerous, since it exposes the Lucene query syntax. Rule of thumb: don't use query_string if you don't need it.
Also, searching on multiple fields is easy: just provide the list of fields you want to search on, as you did, without the need for a bool query.
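If the boosts need to be configurable, the field list can be generated rather than hard-coded. This is a small Python sketch; build_multi_match is a made-up helper name, and the boost values are the ones from the question.

```python
def build_multi_match(text, boosts):
    """Build a multi_match body from a {field: boost} mapping."""
    # A boost of 1 is the default, so it is left off the field name.
    fields = [name if boost == 1 else f"{name}^{boost}" for name, boost in boosts.items()]
    return {"query": {"multi_match": {"query": text, "fields": fields}}}


if __name__ == "__main__":
    body = build_multi_match(
        "I have a big nose",
        {"title": 2, "description": 1, "keywords": 1},
    )
    print(body)
```

The default multi_match behaviour already returns documents that match only some of the words, which is the partial-match requirement in the question.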
Below is code that creates a query you can use. I wrote it in C#, but it works the same way in other languages.
What you need to do is create a BooleanQuery and require that at least 1 of its conditions matches. Then add a condition, with the Occur.SHOULD enum value, for every document field you want checked:
// Namespaces used by this fragment (Lucene.Net 2.9).
using Lucene.Net.Analysis.Standard;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;

// At least one of the SHOULD clauses added below has to match.
BooleanQuery searchQuery = new BooleanQuery();
searchQuery.SetMinimumNumberShouldMatch(1);
QueryParser parser = new QueryParser(Lucene.Net.Util.Version.LUCENE_29, "title", new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_29));
Query titleQuery = parser.Parse("I have a big nose");
searchQuery.Add(titleQuery, BooleanClause.Occur.SHOULD);
parser = new QueryParser(Lucene.Net.Util.Version.LUCENE_29, "description", new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_29));
Query descriptionQuery = parser.Parse("I have a big nose");
searchQuery.Add(descriptionQuery, BooleanClause.Occur.SHOULD);
parser = new QueryParser(Lucene.Net.Util.Version.LUCENE_29, "keywords", new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_29));
Query keywordsQuery = parser.Parse("I have a big nose");
searchQuery.Add(keywordsQuery, BooleanClause.Occur.SHOULD);
// Wrap the per-field clauses in an outer query as a required (MUST) clause.
BooleanQuery queryThatShouldBeExecuted = new BooleanQuery();
queryThatShouldBeExecuted.Add(searchQuery, BooleanClause.Occur.MUST);
Here is the link to an example in java http://www.javadocexamples.com/java_source/org/apache/lucene/search/TestBooleanMinShouldMatch.java.html
The corresponding JSON object for an HTTP POST request would be this:
{
  "bool": {
    "should": [
      {
        "query_string": {
          "query": "I have a big nose",
          "default_field": "title"
        }
      },
      {
        "query_string": {
          "query": "I have a big nose",
          "default_field": "description"
        }
      },
      {
        "query_string": {
          "query": "I have a big nose",
          "default_field": "keywords"
        }
      }
    ],
    "minimum_number_should_match": 1,
    "boost": 1
  }
}
