How to boost Elasticsearch results based on another field? - elasticsearch

Kinda simple use case but cannot come up with good solution.
Basically I have two indexed fields: content and keywords (keyword tokenizer), where content is a long text field and keywords contain important terms within that content. When I query with some long text, I have to boost those results based on the keywords present in the matching document.
I tried querying the complete text on both content and keywords field, but it is too slow or it throws too_many_clauses error for text with more than 40 words.
{"query": {
"match": {
"keywords": {
"query": "some long text",
"analyzer": "custom_analyzer"
}
}
}}
Is there any better way? Would percolator work here?

I can relate this to my application, which is similar to Stackoverflow, which consists of question and answers, for a question, there is subject, body, tags etc.
Subject here relates to your keyword indexed field and body relate to your content indexed field. Normally subject contains the important keywords about the post, which is also the case with you.
Now coming to solution part,
How we solve it by querying both on subject and body indexed fields but boost subject by a factor of 15, which is configurable.
ES query which we use:
{
"query": {
"multi_match" : {
"query" : "this is a test",
"fields" : [ "subject^15", "message" ]
}
}
}
This ES doc also has a similar example where they are boosting a subject field in multi_match query by a factor of 3.
Let me know if you have any questions.

Related

How can I query Elasticsearch to output the exact position of a searched keyword or sentence?

I indexed several documents into my Elasticsearch cluster and queried the Elasticsearch cluster using some keywords and sentences, the output from my query displayed the entire documents where the sentences or keywords where be found.
I want a case where if a query is carried out, it should display just the paragraph where the sentence or keyword can be found and also show the page number it was found.
You can use highlighting functionality with source filtering. So it will show only field which is required and you can hide the remaining field.
You can set _source to false so it will return only highlighted field. If you want to search on different field and highlight on different field then you can set require_field_match to false. Please refer the elastic doc for more referance.
GET /_search
{
"_source":false,
"query": {
"match": { "content": "kimchy" }
},
"highlight": {
"require_field_match":false,
"fields": {
"content": {}
}
}
}

How can I make ElasticSearch yield just the first couple of words for a field?

I'm using ElasticSearch to query a set of rather long documents. Each document has (among other things) a title, a URL and a body.
When presenting the results to the user, I'd like to present just an 'abstract' of each document (along with the title and the URL). However, returning the full body only to trim it client-side seems wasteful.
Alas, I don't have a dedicated 'abstract' field or the like. Hence I wonder: is there a way to make ElasticSearch yield just the beginning (e.g. the first 200 words) of the 'body' field for each hit? I looked at source filtering (which I'm already using in my queries) but that seems to just select/deselect individual fields for the response. I'm rather looking for a way to transform the returned data.
It appears that Script Fields are one way to solve this. Here is an example query which gets the title, uri and a scripted(!) abstract field for each document. The abstract consists of the firsts 200 letters of the actual content field:
{
"query": {
"match": {
"title": "Scripting"
},
},
"_source": ["title", "uri"],
"script_fields": {
"abstract": {
"script": {
"lang": "painless",
"source": "params['_source'].content.substring(0, 200)"
}
}
}
}

Elastic exact matching and substring matching together

I know that Elastic have "keyword" type in order to find something with exact matching. Ex:
"address": { "type": "keyword"}
That's cool. exact matching works!
but I would like to have both "exact matching" and "sub-string" matching. So I decided to create the following mapping:
"address": { "type": "text" , "index": true }
Problem
If I have "text" type, how can I search exact matching string? (not sub-string). I've tried several ways but does not works:
GET testing_index/_search
{
"query" : {
"constant_score" : {
"filter" : {
"term" : {
"address" : "washington"
}
}
}
}
}
or
GET testing_index/_search
{
"query": {
"match": {
"address" : "washington"
}
}
}
I need just something universal mapping:
to find exact string
to find sub-strings
I hope elastic can do this.
By default, text fields use the default analyzer, which drops most punctuation, breaks up text into individual words, and lower cases them. For instance, the standard analyzer would turn the string “Quick Brown Fox!” into the terms [quick, brown, fox]. As you can imagine, this makes it difficult to write an exact match query against the text field. For your use case, I suggest one of 2 options:
store as keyword, and accomplish sub-string-like matching using wildcard or fuzzy queries. Wildcard queries, in particular queries with a leading wildcard, are notoriously slow, so proceed with caution.
store the field twice: one as keyword and one as text. Obvious downside here is bloating the size of the index.
For more background, see the "Term Query" Elasticsearch documentation, and in particular the section on "Why doesn’t the term query match my document?": https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-term-query.html

Search in two fields on elasticsearch with kibana

Assuming I have an index with two fields: title and loc, I would like to search in this two fields and get the "best" match. So if I have three items:
{"title": "castle", "loc": "something"},
{"title": "something castle something", "loc": "something,pontivy,something"},
{"title": "something else", "loc": "something"}
... I would like to get the second one which has "castle" in its title and "pontivy" in its loc. I tried to simplify the example and the base, it's a bit more complicated. So I tried this query, but it seems not accurate (it's a feeling, not really easy to explain):
GET merimee/_search/?
{
"query": {
"multi_match" : {
"query": "castle pontivy",
"fields": [ "title", "loc" ]
}
}
}
Is it the right way to search in various field and get the one which match the in all the fields?
Not sure my question is clear enough, I can edit if required.
EDIT:
The story is: the user type "castle pontivy" and I want to get the "best" result for this query, which is the second because it contains "castle" in "title" and "pontivy" in "loc". In other words I want the result that has the best result in both fields.
As the other posted suggested, you could use a bool query but that might not work for your use case since you have a single search box that you want to query against multiple fields with.
I recommend looking at a Simple Query String query as that will likely give you the functionality you're looking for. See: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-simple-query-string-query.html
So you could do something similar to this:
{
"query": {
"simple_query_string" : {
"query": "castle pontivy",
"fields": ["title", "loc"],
"default_operator": "and"
}
}
}
So this will try to give you the best documents that match both terms in either of those fields. The default operator is set as AND here because otherwise it is OR which might not give you the expected results.
It is worthwhile to experiment with other options available for this query type as well. You might also explore using a Query String query as it gives more flexibility but the Simple Query String term works very well for most cases.
This can be done by using bool type of query and then matching the fields.
GET _search
{
"query":
{
"bool": {"must": [{"match": {"title": "castle"}},{"match": {"loc": "pontivy"}}]
}
}
}

Custom score for exact, phonetic and fuzzy matching in elasticsearch

I have a requirement where there needs to be custom scoring on name. To keep it simple lets say, if I search for 'Smith' against names in the index, the logic should be:
if input = exact 'Smith' then score = 100%
else
if input = phonetic match then
score = <depending upon fuzziness match of input with name>%
end if
end if;
I'm able to search documents with a fuzziness of 1 but I don't know how to give it custom score depending upon how fuzzy it is. Thanks!
Update:
I went through a post that had the same requirement as mine and it was mentioned that the person solved it by using native scripts. My question still remains, how to actually get the score based on the similarity distance such that it can be used in the native scripts:
The post for reference:
https://discuss.elastic.co/t/fuzzy-query-scoring-based-on-levenshtein-distance/11116
The text to look for in the post:
"For future readers I solved this issue by creating a custom score query and
writing a (native) script to handle the scoring."
You can implement this search logic using the rescore function query (docs here).
Here there is a possible example:
{
"query": {
"function_score": {
"query": { "match": {
"input": "Smith"
} },
"boost": "5",
"functions": [
{
"filter": { "match": { "input.keyword": "Smith" } },
"random_score": {},
"weight": 23
}
]
}
}
}
In this example we have a mapping with the input field indexed both as text and keyword (input.keyword is for exact match). We re-score the documents that match exactly the term "Smith" with an higher score respect to the all documents matched by the first query (in the example is a match, but in your case will be the query with fuzziness).
You can control the re-score effect tuning the weight parameter.

Resources