ElasticSearch - Search middle of words over multiple fields

I'm trying to retrieve documents that contain a search phrase anywhere inside a word, not only at the start of it, across multiple document fields.
For example, "ell" should match a document whose field contains "hello", and this should work on two fields.
I initially went with MultiMatch because of this SO answer. Here is my implementation:
QueryContainer &= Query<VeganItemEstablishmentSearchDto>.MultiMatch(c => c
    .Fields(f => f.Field(p => p.VeganItem.Name).Field(v => v.VeganItem.CompanyName))
    .Query(query)
    .MaxExpansions(2)
    .Slop(2)
    .Name("named_query")
);
But it only matched "hello" when my search phrase began at the start of the word; for example, "ello" did not match.
So I then switched to QueryString because of this SO answer. My implementation was:
QueryContainer &= Query<VeganItemEstablishmentSearchDto>.QueryString(c => c
    .Fields(f => f.Field(p => p.VeganItem.Name).Field(v => v.VeganItem.CompanyName))
    .Query(query)
    .FuzzyMaxExpansions(2)
    .Name("named_query")
);
But that was even worse. It did not search multiple fields, only p.VeganItem.Name, and "ello" still did not match "hello".
How do I use NEST to search for a term that can be in the middle of a word, across multiple document fields?

Wildcard queries are expensive. If you want to control how many characters in the middle of a word can be searched, you can use the n-gram tokenizer instead; it is less expensive and gives you more customisation and flexibility.
I've also written a blog post on implementing autocomplete and the trade-offs of the various approaches in terms of performance and functional requirements.
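To make the n-gram idea concrete, here is a minimal, self-contained C# sketch of what an n-gram tokenizer conceptually produces. It does not call Elasticsearch or NEST; the class and method names (NGramSketch, NGrams, Matches) are hypothetical, and the gram sizes are only an assumption for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: simulates n-gram tokenization in plain C#,
// it is not part of NEST or Elasticsearch.
public static class NGramSketch
{
    // Returns every substring of text whose length is between minGram and maxGram.
    public static List<string> NGrams(string text, int minGram, int maxGram)
    {
        var grams = new List<string>();
        for (int start = 0; start < text.Length; start++)
        {
            for (int len = minGram; len <= maxGram && start + len <= text.Length; len++)
            {
                grams.Add(text.Substring(start, len));
            }
        }
        return grams;
    }

    // True when the lowercased search term equals one of the grams of the field value.
    public static bool Matches(string fieldValue, string term, int minGram, int maxGram)
    {
        return NGrams(fieldValue.ToLowerInvariant(), minGram, maxGram)
            .Contains(term.ToLowerInvariant());
    }
}
```

With gram sizes 3 to 4, "hello" yields hel, hell, ell, ello and llo, so both "ell" and "ello" become exact term matches and no wildcard scan is needed. The trade-off in this sketch is that a term longer than the maximum gram size (here "hello" itself) no longer matches as a single term, which is one reason the gram sizes need tuning.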

You will need a wildcard query for this scenario. For more information about wildcard queries check here, and for NEST wildcard queries check here.
To run a wildcard query in NEST you can do this:
new QueryContainer[]
{
    // query is expected to carry the wildcards already, e.g. "*ell*"
    Query<VeganItemEstablishmentSearchDto>.Wildcard(w => w
        .Field(v => v.VeganItem.CompanyName)
        .Value(query)),
    Query<VeganItemEstablishmentSearchDto>.Wildcard(w => w
        .Field(p => p.VeganItem.Name)
        .Value(query))
}
You should add an asterisk (*) at the beginning and end of your query.
Please keep in mind that wildcard queries are expensive; you might want to achieve this with a different analyzer in your mapping instead.
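As a small illustration of wrapping the user's input in asterisks, the following self-contained C# sketch builds the wildcard value. The helper name WildcardPattern.Contains is hypothetical. Lowercasing is an assumption that the field is indexed with an analyzer that lowercases tokens, and escaping *, ? and the backslash is an assumption that user input should be matched literally.

```csharp
using System.Text;

// Hypothetical helper, not part of NEST: turns raw user input into a
// "contains" pattern suitable for the Value(...) of a wildcard query.
public static class WildcardPattern
{
    public static string Contains(string term)
    {
        var pattern = new StringBuilder("*");
        // Assumption: indexed tokens are lowercased, so the term is lowercased too.
        foreach (char c in term.ToLowerInvariant())
        {
            // Escape characters that the wildcard syntax would otherwise interpret.
            if (c == '*' || c == '?' || c == '\\')
            {
                pattern.Append('\\');
            }
            pattern.Append(c);
        }
        return pattern.Append('*').ToString();
    }
}
```

The result would then be passed as the wildcard value, for example .Value(WildcardPattern.Contains(query)). A leading asterisk is what makes these queries expensive, since every term in the field has to be examined.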

QueryString from this SO answer is what worked for me, both for multiple fields and for the middle of a word. I have not tried Amit's answer yet; I will in the future. This is the quick solution for a beginner:
QueryContainer &= Query<VeganItemEstablishmentSearchDto>
    .QueryString(c => c
        .Name("named_query")
        .Boost(1.1)
        .Fields(f => f.Field(p => p.VeganItem.Name).Field(v => v.VeganItem.CompanyName))
        .Query($"*{query}*")
        .Rewrite(MultiTermQueryRewrite.TopTermsBoost(10))
    );
This also works:
QueryContainer = QueryContainer | Query<VeganItemEstablishmentSearchDto>
    .MatchPhrase(c => c
        .Boost(1.1)
        .Field(f => f.VeganItem.Name)
        .Query(query)
        .Slop(1)
    );
QueryContainer = QueryContainer | Query<VeganItemEstablishmentSearchDto>
    .MatchPhrase(c => c
        .Boost(1.1)
        .Field(f => f.VeganItem.CompanyName)
        .Query(query)
        .Slop(1)
    );
// Wildcard terms are lowercased and matched one word at a time
var terms = query.ToLower().Split(' ');
foreach (var term in terms)
{
    QueryContainer = QueryContainer | Query<VeganItemEstablishmentSearchDto>
        .Wildcard(c => c
            .Value($"*{term}*")
            .Field(f => f.VeganItem.CompanyName)
            .Rewrite(MultiTermQueryRewrite.TopTermsBoost(10))
        );
    QueryContainer = QueryContainer | Query<VeganItemEstablishmentSearchDto>
        .Wildcard(c => c
            .Value($"*{term}*")
            .Field(f => f.VeganItem.Name)
            .Rewrite(MultiTermQueryRewrite.TopTermsBoost(10))
        );
}

Related

Query one field with multiple values in elasticsearch nest

I have a combination of two queries with Elasticsearch and NEST. The first is a full-text search for a specific term; the second filters on another field, the file path. The filter should accept many file paths, and each one may be a partial or a full path. I can query one file path, but I couldn't manage to do it for many file paths. Any suggestions?
Search<SearchResults>(s => s
    .Query(q => q
        .Match(m => m.Field(f => f.Description).Query("Search_term"))
        && q
        .Prefix(t => t.Field(f => f.FilePath).Value("file_Path"))
    )
);
To search for more than one path you can use a bool query in Elasticsearch with Should clauses, which act like a logical OR, so your code should look like this:
Search<SearchResults>(s => s
    .Query(q => q
        .Bool(b => b
            .Should(
                // placeholder path fragments, one clause per path
                bs => bs.Wildcard(p => p.FilePath, "*file_Path*"),
                bs => bs.Wildcard(p => p.FilePath, "*another_file_Path*")
                // ... more clauses as needed
            ))
        && q.Match(m => m.Field(f => f.Description).Query("Search_term"))
    )
);
Also, you should use a wildcard query to get results for paths that may be partial or full. For more information, check the official ES documentation on the bool query and the wildcard query below:
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-bool-query.html
https://www.elastic.co/guide/en/elasticsearch/client/net-api/current/bool-queries.html
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-wildcard-query.html
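When the number of paths is not known up front, the Should clauses can be built from a collection rather than written out by hand. The following is only a sketch under assumptions: it needs the NEST package, the SearchResults class and the PathQueries.Build helper are hypothetical stand-ins, and it reuses the Query<T>.Wildcard, Query<T>.Match and && forms that appear in the snippets above.

```csharp
using System.Collections.Generic;
using System.Linq;
using Nest;

// Hypothetical document type with the two fields used in the question.
public class SearchResults
{
    public string Description { get; set; }
    public string FilePath { get; set; }
}

public static class PathQueries
{
    // Builds: match on Description AND (wildcard path 1 OR wildcard path 2 OR ...).
    public static QueryContainer Build(string searchTerm, IEnumerable<string> pathFragments)
    {
        // One wildcard clause per path fragment, each wrapped in asterisks.
        QueryContainer[] pathClauses = pathFragments
            .Select(path => Query<SearchResults>.Wildcard(w => w
                .Field(f => f.FilePath)
                .Value("*" + path + "*")))
            .ToArray();

        return Query<SearchResults>.Match(m => m
                   .Field(f => f.Description)
                   .Query(searchTerm))
               && Query<SearchResults>.Bool(b => b.Should(pathClauses));
    }
}
```

The returned container could then be passed to the search, for example .Query(q => PathQueries.Build("Search_term", paths)), where paths is whatever list of path fragments the caller has.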

ElasticSearch NEST custom word joiner Analyzer not returning the correct result

I created an autocomplete filter with ElasticSearch using the NEST API. I can't seem to get the word joiner to work.
So basically, if I search for something like Transhex, I also want to get Trans Hex back.
My index looks as follows. I think the WordDelimiter filter might be wrong.
Also, I followed the following article Link. It uses the low-level API, so it is possible that I am doing it completely wrong with the NEST API.
var response = this.Client.CreateIndex(
    "company-index",
    index => index
        .Mappings(ms => ms
            .Map<CompanyDocument>(m => m
                .Properties(p => p
                    .Text(t => t
                        .Name(n => n.CompanyName)
                        .Analyzer("auto-complete")
                        .Fields(ff => ff.Keyword(k => k.Name("keyword")))))))
        .Settings(f => f
            .Analysis(analysis => analysis
                .Analyzers(analyzers => analyzers
                    .Custom("auto-complete", a => a
                        .Tokenizer("standard")
                        .Filters("lowercase", "word-joiner-filter", "auto-complete-filter")))
                .TokenFilters(tokenFilter => tokenFilter
                    .WordDelimiter("word-joiner-filter", t => t.CatenateAll())
                    .EdgeNGram("auto-complete-filter", t => t.MinGram(3).MaxGram(30))))));
UPDATE
I updated the analyzer to look as follows. Note that I changed the tokenizer from standard to keyword.
var response = this.Client.CreateIndex(
    this.indexName,
    index => index
        .Mappings(ms => ms
            .Map<CompanyDocument>(m => m
                .Properties(p => p
                    .Text(t => t
                        .Name(n => n.CompanyName)
                        .Analyzer("auto-complete")
                        .Fields(ff => ff.Keyword(k => k.Name("keyword")))))))
        .Settings(f => f
            .Analysis(analysis => analysis
                .Analyzers(analyzers => analyzers
                    .Custom("auto-complete", a => a
                        .Tokenizer("keyword")
                        .Filters("lowercase", "word-joiner-filter", "auto-complete-filter")))
                .TokenFilters(tokenFilter => tokenFilter
                    .WordDelimiter("word-joiner-filter", t => t.CatenateAll())
                    .EdgeNGram("auto-complete-filter", t => t.MinGram(1).MaxGram(20))))));
The results look as follows.
Search keyword: perfect pools
Results:
    perfect pools -> this is the correct one, at the top
    EXCLUSIVE POOLS
    Perfect Painters
Search keyword: perfectpools or PerfectPools
Results:
    Perfect Hideaways (Pty) Ltd -> this is the wrong one; I would like perfect pools to be displayed
    PERFORMANTA APAC PTY LTD
    Perfect Laser Technologies (PTY) LTD
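Use the keyword tokenizer. The standard tokenizer splits the input into two tokens and then applies the filters to each token separately, so the word delimiter filter never sees both words together and cannot join them.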
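The difference between the two tokenizers can be illustrated with a deliberately simplified, self-contained C# simulation. It does not reproduce the real Elasticsearch filters: it assumes words are separated by single spaces, treats "catenate all" as joining the space-separated parts of one token, and uses hypothetical names throughout (AutoCompleteSketch and its methods).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Simplified simulation only; the real word delimiter and edge n-gram
// filters have more options and rules than shown here.
public static class AutoCompleteSketch
{
    // Prefixes of the token with lengths from minGram up to maxGram.
    public static IEnumerable<string> EdgeNGrams(string token, int minGram, int maxGram)
    {
        for (int len = minGram; len <= Math.Min(maxGram, token.Length); len++)
        {
            yield return token.Substring(0, len);
        }
    }

    // Keyword tokenizer: the whole input is one token, so the word-joining
    // step sees all parts and can also emit them joined together.
    public static HashSet<string> WithKeywordTokenizer(string input, int minGram, int maxGram)
    {
        string[] parts = input.ToLowerInvariant()
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        var tokens = new List<string>(parts) { string.Concat(parts) };
        return new HashSet<string>(tokens.SelectMany(t => EdgeNGrams(t, minGram, maxGram)));
    }

    // Standard tokenizer: each word is already a separate token before the
    // filters run, so there is nothing left to join.
    public static HashSet<string> WithStandardTokenizer(string input, int minGram, int maxGram)
    {
        string[] parts = input.ToLowerInvariant()
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        return new HashSet<string>(parts.SelectMany(t => EdgeNGrams(t, minGram, maxGram)));
    }
}
```

In this simulation, indexing "Perfect Pools" with the keyword variant produces the gram "perfectpools", while the standard variant only produces prefixes of "perfect" and of "pools", which is consistent with the search for perfectpools not finding the company before the tokenizer was changed.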
UPDATE:
I used a search like this one and it seems to work.
var searchResult = EsClient.Search<CompanyDocument>(q => q
    .Index("test_index")
    .Type("companydocument")
    .TrackScores(true)
    .Query(qq =>
    {
        QueryContainer queryContainer = null;
        queryContainer = qq.QueryString(qs => qs
            .Fields(fs => fs.Field(f => f.CompanyName))
            .Query("perfectpools")
            .DefaultOperator(Operator.And)
            .Analyzer("auto-complete"));
        return queryContainer;
    })
    .Sort(sort => sort.Descending(SortSpecialField.Score))
    .Take(10)
);

Using which field matched in a multimatch query in a function score

I have a multimatch query which I am using across 5 fields. I am also using a function score to combine various factors into the score. I would like to add a factor so that the score of results that matched on one particular field is increased (by adding a large number, so that matches on this field always have the highest score).
I know that I can use highlighting to find out which fields were matched, but how can I access that information in the function score script?
Here's what I have so far (using NEST, but that shouldn't make a difference).
var searchResponse = client.Search<TopicCollection.Topic>(s => s
    .Query(q => q
        .FunctionScore(fs => fs
            .Name("function_score_query")
            .Query(q1 => q1
                .MultiMatch(c => c
                    .Fields(f => f
                        .Field(p => p.field1)
                        .Field(p => p.field2) //...etc
                    )
                    .Query(searchTerm)
                )
            )
            .Functions(fun => fun
                .ScriptScore(ss => ss.Script(sc => sc
                    .Inline(
                        //TODO: add 1000 to normalised _score if match is in field1
                    )))
            ).BoostMode(FunctionBoostMode.Replace)
        )
    ).Highlight(h => h
        .Fields(p => p.AllField())
    )
);

How to : ElasticSearch .NET and NEST 5.X Multimatch with wildcard

I have searched through a lot of samples on the Internet, but I still could not find any sample of a wildcard search over more than one field. Can anyone help me with an example? I'm very new to ElasticSearch. Below is what I'm trying to do with a wildcard, but it only works for one field.
How can I combine the Wildcard below with MultiMatch in C#?
var result = client.Search<Metadata>(x => x
    .Index("indexname")
    .Type("Metadata")
    .MatchAll()
    .Query(q => q
        .Wildcard(c => c
            .Name("Query")
            .Boost(1.1)
            .Field(p => p.Title)
            .Value("input*")
            .Rewrite(MultiTermQueryRewrite.TopTermsBoost(10))
        )
    )
);
How can I add multi-field support to it, as in MultiMatch below?
.Fields(f => f
    .Fields(f1 => f1.Title, f2 => f2.Keywords)
)

Elasticsearch NEST - Phrase search

What methods should I use so that my query returns hits containing at least 2 keywords from an input phrase?
For example, if the input is "hello friend", I want the results to contain documents where both "hello" and "friend" appear somewhere in the text. If the input is "hello good friend", I want results where 2 of the 3 keywords appear in the text, or at least for the results with the best combinations to be on top.
If I use code like the one below, I get results that contain "hello" or "friend" but not both.
var searchResults = client.Search<Thread>(s => s
    .Type("threads")
    .From(0)
    .Size(100)
    .Query(q => q
        .Match(qs => qs
            .OnField(p => p.Posttext)
            .Query("hello friend")
        )
    )
    .Highlight(h => h
        .OnFields(
            f => f.OnField("posttext").PreTags("<b>").PostTags("</b>").FragmentSize(150)
        )
    )
);
I can get the results I want with code like this, but it is not flexible because the phrase can have an arbitrary number of words.
var searchResults = client.Search<Thread>(s => s
    .Type("threads")
    .From(0)
    .Size(100)
    .Query(q => q
        .Match(qs => qs
            .OnField(p => p.Posttext)
            .Query("hello")
        )
        &&
        q.Match(qs => qs
            .OnField(p => p.Posttext)
            .Query("friend")
        )
    )
    .Highlight(h => h
        .OnFields(
            f => f.OnField("posttext").PreTags("<b>").PostTags("</b>").FragmentSize(150)
        )
    )
);
I think I am missing something. Please help.
Thanks in advance.
You need to use a phrase query.
Within the match query you need to specify the type as phrase:
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-match-query.html#query-dsl-match-query-phrase
If you go through the article above, I guess you can find a direction for your question.
PS: I only know Elasticsearch from JavaScript.
I found that adding .Operator(Operator.And) to the Match query works in my situation. But I need to investigate phrase search more.
