Elasticsearch: Field level custom scores in text searches - elasticsearch

I just started exploring Elasticsearch. I need to find an approach for specifying custom scores at field level. For example:
I have a collection named blog whose documents have the following format:
{
"_id": "1736hst26672829",
"name": "Learning regular expressions basics",
"author": "John Lee",
"summery": "Here is summery.",
"body": "Content of the blog."
}
If I search for the text 'xyz' in the collection, the results should reflect the following scoring criteria:
match in the field 'name' has priority 1.
match in the author field has the 2nd priority.
match in the summery has 3rd priority.
match in the body has least priority.
I need top 10 results on the basis of the above criteria.

Scoring in Elasticsearch is highly customizable. The following applies to custom scoring at query time. There are various other scoring options as well: by index, in your mapping (and thus applied to every query), on filters or facets, and using boosts or custom scoring.
While the Custom Score Query is generally the most powerful solution, here are the docs for the various custom scoring methods to read up on:
Boosting Query
Custom Boost Factor Query
Custom Score Query
The following is probably the simplest way to apply custom scoring at query time, although I suggest you read up on the Custom Score Query.
"query": {
"filtered": {
"query": {
"bool": {
"must": [
{"multi_match": {
"fields": [
"name^4",
"author^3",
"summery^2",
"body^1"
],
"query": "xyz",
"operator": "AND",
"type": "cross_fields",
"analyzer": "standard"
}}
]
}
}
}
}
For people who find this answer but wish to use NEST, below is the same query using NEST. Use the ^ character to boost specific fields, or use OnFieldsWithBoost to give fields custom boosts. The query is sorted by score.
var query = "xyz";
// Add your field names as lower camelCase strings, which is the ES default.
List<string> searchIn = new List<string>(new string[] {"_id","name","author","summery","body"});
// Assumption: `client` is your ElasticClient and `Blog` is your document class (both names are placeholders).
var result = client.Search<Blog>(s => s
.Type("blogType")
.SortDescending("_score")
.Query(
q => q.MultiMatch(
t => t.OnFields(
searchIn
.Select(qs => qs == "name" ? "name^4" : qs)
.Select(qs => qs == "author" ? "author^3" : qs)
.Select(qs => qs == "summery" ? "summery^2" : qs)
.Select(qs => qs == "body" ? "body" : qs)
)
.Query(query)
)
)
);
If you have the correct (default) mapping in ES (C# object to ES indexed JSON object), you can also use the following in place of OnFields:
t => t.OnFieldsWithBoost(qs => qs.Add(entry => entry.Name, 4.0)
.Add(entry => entry.Author, 3.0)
.Add(entry => entry.Summary, 2.0)
.Add(entry => entry.Body, 1.0))
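For readers calling Elasticsearch from Python rather than NEST, here is a minimal sketch of the same boosted multi_match request built as a plain dict. The helper name boosted_multi_match and its parameters are made up for illustration; the boosts are the ones used in the answer above. The filtered wrapper is left out on the assumption that your version accepts a bare multi_match as the query, and size is set to 10 because the question asks for the top 10 results.

```python
import json


def boosted_multi_match(text, boosts, operator="and", match_type="cross_fields"):
    """Build a multi_match request body with per-field boosts.

    `boosts` maps field name -> boost factor. A boost of 1 is left
    implicit, since that is the default weight of a field.
    """
    fields = [
        name if boost == 1 else f"{name}^{boost}"
        for name, boost in boosts.items()
    ]
    return {
        "size": 10,  # top 10 hits; results are sorted by _score by default
        "query": {
            "multi_match": {
                "query": text,
                "fields": fields,
                "type": match_type,
                "operator": operator,
            }
        },
    }


# Field names (including the "summery" spelling) follow the question's documents.
body = boosted_multi_match("xyz", {"name": 4, "author": 3, "summery": 2, "body": 1})
print(json.dumps(body, indent=2))
```

The resulting dict can be passed as the request body to whichever client you use. Note that the boosts only weight the fields relative to each other; they do not guarantee a strict priority order between fields.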

Related

Filtering DSL Query Search - Elasticsearch

I was reading a few articles and documents on query context and filter context, and learned that it is best to use filter context if you do not need full text search or if scoring does not matter. In my case, I want to return the logs that contain an ID, so I realized I should use a filter context instead of a query context. Besides full text search and scoring, is there a hard baseline that defines when you should use one over the other?
So I went from my original DSL query:
GET /customer-simulation-es-app-logs*/_search
{
"query": {
"match": {
"_id": "mJvG0nkBiU3wk_hrEd-8"
}
}
}
to the filter context:
GET /customer-simulation-es-app-logs*/_search
{
"query": {
"bool": {
"filter": [
{"match": {"_id": "mJvG0nkBiU3wk_hrEd-8"}}
]
}
}
}
Since I want to use NEST to perform the search, I took this approach:
[HttpGet("GetAll/{_id}")]
public async Task<EsSource> GetAll(String _id)
{
var response = await _elasticClient.SearchAsync<EsSource>(s => s
.Index("customer-simulation-es-app-logs*")
.Query(q => q
.Bool(b => b
.Filter(f => f
.Match(m => m.Field("_id").Query(_id))))));
return response?.Documents?.FirstOrDefault();
}
Would this be the correct way to do a filter context using NEST?
That would be the correct way to issue a query with only a filter context. Some additional points that might help:
A term-level query on the _id field, such as a term query, should suffice, as there's no analysis chain involved.
If you know the index that contains the document, the get API would be a better option. Given that a wildcard index pattern is being used, though, the index might not be known.
NEST has convenient operator overloads on queries to make writing bool queries more succinct. The final query can be written more succinctly as
var response = await _elasticClient.SearchAsync<EsSource>(s => s
.Index("customer-simulation-es-app-logs*")
.Query(q => +q
.Match(m => m
.Field("_id")
.Query(_id)
)
)
);
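As a rough illustration of the term-level suggestion above, this is what the request body could look like with a term query on _id inside a bool filter. It is a sketch only: the helper name id_filter_body is hypothetical, and the ID is the one from the question.

```python
def id_filter_body(doc_id):
    """Build a filter-context-only request body matching one document ID.

    A term query is used because _id is not analyzed, so no
    full-text matching is needed. Nothing here contributes to scoring.
    """
    return {
        "query": {
            "bool": {
                "filter": [
                    {"term": {"_id": doc_id}}
                ]
            }
        }
    }


body = id_filter_body("mJvG0nkBiU3wk_hrEd-8")
print(body)
```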

Custom score for exact, phonetic and fuzzy matching in elasticsearch

I have a requirement where there needs to be custom scoring on name. To keep it simple, let's say that if I search for 'Smith' against names in the index, the logic should be:
if input = exact 'Smith' then score = 100%
else
if input = phonetic match then
score = <depending upon fuzziness match of input with name>%
end if
end if;
I'm able to search documents with a fuzziness of 1 but I don't know how to give it custom score depending upon how fuzzy it is. Thanks!
Update:
I went through a post that had the same requirement as mine and it was mentioned that the person solved it by using native scripts. My question still remains, how to actually get the score based on the similarity distance such that it can be used in the native scripts:
The post for reference:
https://discuss.elastic.co/t/fuzzy-query-scoring-based-on-levenshtein-distance/11116
The text to look for in the post:
"For future readers I solved this issue by creating a custom score query and
writing a (native) script to handle the scoring."
You can implement this search logic using the function_score query.
Here is a possible example:
{
"query": {
"function_score": {
"query": { "match": {
"input": "Smith"
} },
"boost": "5",
"functions": [
{
"filter": { "match": { "input.keyword": "Smith" } },
"random_score": {},
"weight": 23
}
]
}
}
}
In this example the mapping has the input field indexed both as text and as keyword (input.keyword is for exact matching). Documents that match the term "Smith" exactly are re-scored higher than the other documents matched by the first query (a match query in this example, but in your case it would be the query with fuzziness).
You can control the re-scoring effect by tuning the weight parameter.
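To make the "query with fuzziness" variant concrete, here is a minimal sketch that builds the function_score body with a fuzzy match as the base query and a weight applied only to exact matches on the keyword sub-field. The helper name exact_over_fuzzy is hypothetical, the field names follow the example above, and random_score is deliberately dropped so that the weight alone decides the boost. This does not derive the score from the edit distance itself; it only lifts exact matches above fuzzy ones.

```python
def exact_over_fuzzy(field, text, fuzziness=1, exact_weight=23):
    """Build a function_score body: fuzzy base query, weighted exact matches.

    Assumes `field` is mapped as text with a `<field>.keyword` sub-field.
    """
    return {
        "query": {
            "function_score": {
                # Base query: matches exact and near (fuzzy) spellings.
                "query": {
                    "match": {field: {"query": text, "fuzziness": fuzziness}}
                },
                "functions": [
                    {
                        # Only documents whose keyword value equals the input
                        # get their score multiplied by the weight.
                        "filter": {"match": {f"{field}.keyword": text}},
                        "weight": exact_weight,
                    }
                ],
                "boost_mode": "multiply",
            }
        }
    }


body = exact_over_fuzzy("input", "Smith")
print(body)
```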

Performing an AND query in elastic search

I have tried looking for another solution to this, but the Bool query in ES seems to not do quite what I am looking for. Or I am just not using it correctly.
In our current implementation of search we are trying to boost performance/reduce memory footprint of each query by changing our query logic. Today, if you search for "The Red Ball" you may get back 5 million documents because ES returns any document that matches "the" OR "red" OR "ball" which means we get back WAAAAAY too many irrelevant documents (mostly because of the "the" term). I would like to change our query to instead use AND so ES would return only documents that match "the" AND "red" AND "ball".
I am using the NEST Client to do this with C# so an example using the client would be best since that seems to be where I cannot figure out what to do. Thanks
You can simply use a query string query with the AND operator.
{
"query": {
"query_string": {
"default_field": "your_field", <--- remove this if you want to search on all fields
"query": "the red ball",
"default_operator": "AND"
}
}
}
or simply
{
"query": {
"query_string": {
"query": "the AND red AND ball"
}
}
}
I do not know C#, but this is how it might look in NEST (everyone, feel free to edit):
client.Search<your_index>(q => q
.Query(qu => qu
.QueryString(qs=>qs
.OnField(x=>your_field).Query("the AND red AND ball")
)
)
);
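The two query_string forms above can also be generated from the user's input instead of being written by hand. The following Python sketch shows both; the function names are made up for illustration, and the input is assumed to be plain words without reserved query_string characters (those would need escaping).

```python
def and_query_string(text, field=None):
    """Build a query_string body that requires every term to match."""
    query_string = {"query": text, "default_operator": "AND"}
    if field is not None:
        # Omit default_field to search across all fields.
        query_string["default_field"] = field
    return {"query": {"query_string": query_string}}


def explicit_and(text):
    """Join whitespace-separated terms with an explicit AND operator."""
    return " AND ".join(text.split())


body = and_query_string("the red ball", field="your_field")
joined = explicit_and("the red ball")
print(body)
print(joined)
```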
I found the appropriate query to make using the NEST client:
SearchDescriptor<BackupEntitySearchDocument> desc = new SearchDescriptor<BackupEntitySearchDocument>();
desc.Query(qq => qq.MultiMatch(m => m.OnFields(_searchFields).Query(query).Operator(Operator.And)));
var searchResp = await _client.SearchAsync<BackupEntitySearchDocument>(desc).ConfigureAwait(false);
Where _searchFields is a List<string> containing the fields to match on and query is the term to search for.

Extracting matching conditions from querystring

The Elasticsearch query is built from a query string with multiple AND / OR operators, i.e. ((Condition 1 OR Condition 2) AND (Condition 3 OR Condition 4 OR Condition 5)), and it returns multiple documents. To find out exactly which conditions matched, I loop through all the resulting documents again and mark the particular conditions. Is there a simpler way to get the matching conditions for each document?
Can anyone provide a better example using the NEST API?
I think what you need is to highlight the data that produced the hit for your query. The highlight functionality of Elasticsearch marks the text in each search result so the user can see why the document matched the query. The marked text is returned in the response.
Please refer to the Elasticsearch documentation to understand how this API works, and to the NEST documentation to see how to use it with the NEST library.
For example, using the Elasticsearch API:
GET /someIndex/someType/_search
{
"query" : {
"match_phrase" : {
"about" : "rock climbing"
}
},
"highlight": {
"fields" : {
"about" : {}
}
}
}
The same with Nest:
var result = _client.Search<someIndex>(s => s
.Query(q => q
.MatchPhrase(qs => qs
.OnField(e => e.about)
.Query("rock climbing")
)
)
.Highlight(h => h
.OnFields(f => f
.OnField(e => e.about)
)
)
);
The response will be of the below form for each search result (notice the highlight part)
"_score": 0.23013961,
"_source": {
"first_name": "John",
"last_name": "Smith",
"age": 25,
"about": "I love to go rock climbing",
"interests": [ "sports", "music" ]
},
"highlight": {
"about": [
"I love to go <em>rock</em> <em>climbing</em>"
]
}
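Once the highlight section is in the response, the per-document "which field matched" information can be read off without re-checking the conditions by hand. Below is a small Python sketch that does this on a response parsed into a dict. The function name matched_fields and the _id values in the sample are invented for the example; the first hit mirrors the response fragment above.

```python
def matched_fields(response):
    """Map each hit's _id to the sorted list of fields that were highlighted."""
    result = {}
    for hit in response.get("hits", {}).get("hits", []):
        # Hits without a highlight section get an empty list.
        result[hit["_id"]] = sorted(hit.get("highlight", {}))
    return result


# Sample response shaped like a search result; the _id values are hypothetical.
sample = {
    "hits": {
        "hits": [
            {
                "_id": "1",
                "_score": 0.23013961,
                "_source": {"about": "I love to go rock climbing"},
                "highlight": {
                    "about": ["I love to go <em>rock</em> <em>climbing</em>"]
                },
            },
            {"_id": "2", "_source": {"about": "I like to collect rock albums"}},
        ]
    }
}

print(matched_fields(sample))
```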

Elastic Search boost query corresponding to first search term

I am using PyElasticsearch (elasticsearch python client library). I am searching strings like Arvind Kejriwal India Today Economic Times and that gives me reasonable results. I was hoping I could increase weight of the first words more in the search query. How can I do that?
res = es.search(index="article-index", fields="url", body={
"query": {
"query_string": {
"query": "keywordstr",
"fields": [
"text",
"title",
"tags",
"domain"
]
}
}
})
I am using the above command to search right now.
Split the given query into multiple terms. In your example these will be Arvind, Kejriwal, ... Now form a query string query (or a field query, or any other that fits the need) for each of the terms. A query string query looks like this:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/0.90/query-dsl-query-string-query.html
{
"query_string" : {
"default_field" : "content",
"query" : "<one of the given term>",
"boost": <any number>
}
}
Now you have multiple queries like the one above with different boost values (depending on which should have higher weight). Combine all of those queries into one query using a bool query. http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-bool-query.html
If you want all of the terms to be present in the result, the query will look like this:
{
"bool" : {
"must" : [q1, q2, q3 ...]
}
}
You can use different options of the bool query. For example, if you want at least 3 of the terms to be present in the result, the query will look like this:
{
"bool" : {
"should" : [q1, q2, q3 ...],
"minimum_should_match" : 3
}
}
theoretically:
split into terms using api
query against terms with different boosting
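The steps above can be sketched in Python as follows. This is only one possible reading of the approach: the function name, the linearly decreasing boost schedule (first_boost, step, floor) and the use of match queries instead of query_string queries for the individual terms are all assumptions made for the example.

```python
def boosted_term_queries(text, field, first_boost=3.0, step=0.5, floor=1.0):
    """Split `text` into terms and give earlier terms a higher boost.

    The boost starts at `first_boost`, drops by `step` per position,
    and never goes below `floor`. Terms are combined with bool/should.
    """
    queries = []
    for position, term in enumerate(text.split()):
        boost = max(floor, first_boost - step * position)
        queries.append({"match": {field: {"query": term, "boost": boost}}})
    return {"query": {"bool": {"should": queries}}}


body = boosted_term_queries("Arvind Kejriwal India Today Economic Times", "text")
for clause in body["query"]["bool"]["should"]:
    print(clause)
```

The dict can then be passed as the body argument of es.search, in the same way as in the question.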
Lucene Query Syntax does the trick. Thanks
http://lucene.apache.org/core/2_9_4/queryparsersyntax.html#Boosting%20a%20Term
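For completeness, here is a minimal Python sketch of the Lucene term-boost syntax mentioned above: each term gets a ^boost suffix and the result is used as the query of a query_string query. The function name and the boost schedule are invented for the example, and terms containing reserved characters would need escaping first, which this sketch does not do.

```python
def caret_boosted(text, first_boost=3.0, step=0.5, floor=1.0):
    """Append a Lucene ^boost suffix to each term, highest boost first."""
    parts = []
    for position, term in enumerate(text.split()):
        boost = max(floor, first_boost - step * position)
        # A boost of 1 is the default, so the suffix is omitted for it.
        parts.append(term if boost == 1.0 else f"{term}^{boost:g}")
    return " ".join(parts)


keywordstr = caret_boosted("Arvind Kejriwal India Today Economic Times")
body = {
    "query": {
        "query_string": {
            "query": keywordstr,
            "fields": ["text", "title", "tags", "domain"],
        }
    }
}
print(keywordstr)
```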

Resources