Is there a way to boost matches in long documents in Elasticsearch?

In Elasticsearch the length of the document matters a lot to the final score of the search results. So a match in a field that is just one line long is going to score much higher than a single match in, say, a document with 5 pages of text.
Is there a way to override this behavior, or reliably and repeatedly boost the result to overcome this behavior?

I guess you mean that the length of the matching field is taken into account when computing the score. If you just want to disable this behaviour, you can omit norms while indexing. That way you lose index-time boosting as well, but you're probably not using it, and even if you do need boosting you should prefer query-time boosting, which is far more flexible.
You have to update the mapping for your field like this:
"field_name" : {
"type" : "string",
"omit_norms" : true
}
If you want to override this default behaviour for all your string fields you can use a dynamic template like this:
{
    "type_name" : {
        "dynamic_templates" : [
            {
                "omit_norms_template" : {
                    "match_mapping_type" : "string",
                    "mapping" : {
                        "omit_norms" : true
                    }
                }
            }
        ]
    }
}
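If you do end up needing boosting later, it can be applied at query time instead of index time. As a minimal sketch (the index and field names here are placeholders, not from the question), a match query with a boost looks like this:
POST /my_index/_search
{
    "query": {
        "match": {
            "field_name": {
                "query": "some search terms",
                "boost": 2.0
            }
        }
    }
}
The boost multiplies the score contribution of this clause relative to other clauses in the same query.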

Related

Elastic exact matching and substring matching together

I know that Elastic has a "keyword" type in order to find something with exact matching. Ex:
"address": { "type": "keyword"}
That's cool, exact matching works!
But I would like to have both "exact matching" and "sub-string" matching. So I decided to create the following mapping:
"address": { "type": "text" , "index": true }
Problem
If I have "text" type, how can I search exact matching string? (not sub-string). I've tried several ways but does not works:
GET testing_index/_search
{
    "query" : {
        "constant_score" : {
            "filter" : {
                "term" : {
                    "address" : "washington"
                }
            }
        }
    }
}
or
GET testing_index/_search
{
    "query": {
        "match": {
            "address" : "washington"
        }
    }
}
I just need one universal mapping:
to find exact string
to find sub-strings
I hope elastic can do this.
By default, text fields use the default analyzer, which drops most punctuation, breaks up text into individual words, and lower cases them. For instance, the standard analyzer would turn the string “Quick Brown Fox!” into the terms [quick, brown, fox]. As you can imagine, this makes it difficult to write an exact match query against the text field. For your use case, I suggest one of 2 options:
store as keyword, and accomplish sub-string-like matching using wildcard or fuzzy queries. Wildcard queries, in particular queries with a leading wildcard, are notoriously slow, so proceed with caution.
store the field twice: once as keyword and once as text (see the sketch below). The obvious downside here is bloating the size of the index.
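A minimal sketch of the second option, using a multi-field so the same input is indexed both ways (the sub-field name "exact" and the 7.x-style typeless mapping are assumptions, not something from the question):
PUT testing_index
{
    "mappings": {
        "properties": {
            "address": {
                "type": "text",
                "fields": {
                    "exact": { "type": "keyword" }
                }
            }
        }
    }
}
The match query from the question then runs against "address" for analyzed (sub-string-like) matching, while exact matching goes through the keyword sub-field:
GET testing_index/_search
{
    "query": {
        "term": { "address.exact": "washington" }
    }
}
Keep in mind that keyword matching is case-sensitive unless you add a normalizer.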
For more background, see the "Term Query" Elasticsearch documentation, and in particular the section on "Why doesn’t the term query match my document?": https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-term-query.html

Phrase suggester returns unexpected result when first letter is misspelled

I'm using the Elasticsearch Phrase Suggester for correcting users' misspellings. Everything works as I expect unless the user enters a query whose first letter is misspelled. In that situation the phrase suggester returns nothing or returns unexpected results.
My query for suggestion:
{
    "suggest": {
        "text": "user_query",
        "simple_phrase": {
            "phrase": {
                "field": "title.phrase",
                "collate": {
                    "query": {
                        "inline": {
                            "bool": {
                                "should": [
                                    { "match": { "title": "{{suggestion}}" } },
                                    { "match": { "participants": "{{suggestion}}" } }
                                ]
                            }
                        }
                    }
                }
            }
        }
    }
}
Example when first letter is misspelled:
"simple_phrase" : [
{
"text" : "گاشانچی",
"offset" : 0,
"length" : 11,
"options" : [ {
"text" : "گارانتی",
"score" : 0.00253151
}]
}
]
Example when fifth letter is misspelled:
"simple_phrase" : [
{
"text" : "کاشاوچی",
"offset" : 0,
"length" : 11,
"options" : [ {
"text" : "کاشانچی",
"score" : 0.1121
},
{
"text" : "کاشانجی",
"score" : 0.0021
},
{
"text" : "کاشنچی",
"score" : 0.0020
}]
}
]
I expect these two misspelled queries to return the same suggestions (my expected suggestions are the second set). What is wrong?
P.S.: I'm using this feature for the Persian language.
I have a solution for your problem; you only need to add some fields to your schema.
P.S.: I don't have that much expertise in Elasticsearch, but I have solved the same problem using Solr; you can implement it the same way in Elasticsearch too.
Create a new ngram field and copy all your title values into it.
When you fire a query for a misspelled word and get an empty result, split the word and fire the same query again; you will get results as expected.
Example: suppose the user is searching for the word Akshay but types it as Skshay. Build the query as shown below and you should hopefully get the expected results.
I am giving you a Solr example here; you can achieve the same thing with Elasticsearch (a rough equivalent is sketched after this answer).
(ngram:"skshay" OR ngram:"sk" OR ngram:"ks" OR ngram:"sh" OR ngram:"ha" OR ngram:"ay")
We have split the word into sequential bigrams and fire the query on the ngram field.
Hope it will help you.
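For reference, a rough Elasticsearch equivalent of this Solr workaround might look like the following. The index, field, and analyzer names ("titles", "title.ngram", "bigram_analyzer") are made up for illustration; here the bigrams are produced by an ngram analyzer at index and search time instead of being spelled out in the query:
PUT titles
{
    "settings": {
        "analysis": {
            "analyzer": {
                "bigram_analyzer": {
                    "tokenizer": "bigram_tokenizer",
                    "filter": [ "lowercase" ]
                }
            },
            "tokenizer": {
                "bigram_tokenizer": {
                    "type": "ngram",
                    "min_gram": 2,
                    "max_gram": 2
                }
            }
        }
    },
    "mappings": {
        "properties": {
            "title": {
                "type": "text",
                "fields": {
                    "ngram": {
                        "type": "text",
                        "analyzer": "bigram_analyzer"
                    }
                }
            }
        }
    }
}

POST titles/_search
{
    "query": {
        "bool": {
            "should": [
                { "match": { "title": "skshay" } },
                { "match": { "title.ngram": "skshay" } }
            ]
        }
    }
}
Because the query string is also broken into bigrams, any document sharing at least one bigram with the misspelled word can still match, which mimics the OR-of-bigrams query above.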
From Elasticsearch doc:
https://www.elastic.co/guide/en/elasticsearch/reference/6.8/search-suggesters-phrase.html
prefix_length
The number of minimal prefix characters that must match in order to be a candidate for suggestions. Defaults to 1. Increasing this number improves spellcheck performance. Usually misspellings don't occur in the beginning of terms. (The old name "prefix_len" is deprecated.)
So by default the phrase suggester assumes that the first character is correct, because the default value of prefix_length is 1.
Note: setting this value to 0 is not a good idea, because it has performance implications.
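As a rough sketch, prefix_length can be lowered per request through a direct generator, reusing the title.phrase field from the question (whether 0 is acceptable is exactly the performance trade-off mentioned above):
POST _search
{
    "suggest": {
        "text": "user_query",
        "simple_phrase": {
            "phrase": {
                "field": "title.phrase",
                "direct_generator": [
                    {
                        "field": "title.phrase",
                        "suggest_mode": "always",
                        "prefix_length": 0
                    }
                ]
            }
        }
    }
}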
You need to use the reverse analyzer.
I explained it in this post, so please go and check my answer: Elasticsearch spell check suggestions even if first letter missed.
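As a sketch of that approach under some assumptions (a custom analyzer named "reverse" built from the standard tokenizer plus lowercase and reverse token filters, and a title.reverse sub-field indexed with it; none of these names come from the question), candidates are generated on the reversed terms, so a wrong first letter behaves like a wrong last letter:
PUT my_index
{
    "settings": {
        "analysis": {
            "analyzer": {
                "reverse": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": [ "lowercase", "reverse" ]
                }
            }
        }
    },
    "mappings": {
        "properties": {
            "title": {
                "type": "text",
                "fields": {
                    "reverse": {
                        "type": "text",
                        "analyzer": "reverse"
                    }
                }
            }
        }
    }
}

POST my_index/_search
{
    "suggest": {
        "text": "user_query",
        "simple_phrase": {
            "phrase": {
                "field": "title.phrase",
                "direct_generator": [
                    { "field": "title.phrase", "suggest_mode": "always" },
                    {
                        "field": "title.reverse",
                        "suggest_mode": "always",
                        "pre_filter": "reverse",
                        "post_filter": "reverse"
                    }
                ]
            }
        }
    }
}
The pre_filter and post_filter reverse the misspelled term before candidate generation and reverse the candidates back afterwards.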
And regarding the duplicates, you can use skip_duplicates, which controls whether duplicate suggestions should be filtered out (defaults to false).

How to run Elasticsearch completion suggester query on limited set of documents

I'm using a completion suggester in Elasticsearch on a single field. The type contains documents of several users. Is there a way to limit the returned suggestions to documents that match a specific query?
I'm currently using this query:
{
    "name" : {
        "text" : "Peter",
        "completion" : {
            "field" : "name_suggest"
        }
    }
}
Is there a way to combine this query with a different one, e.g.
{
    "query":{
        "term" : {
            "user_id" : "590c5bd2819c3e225c990b48"
        }
    }
}
Have a look at the context suggester, which is just a specialized completion suggester with filtering capabilities; however, this is still not a regular query filter, just keep that in mind.
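A minimal sketch of what that could look like, assuming user_id is added as a category context on the name_suggest field (the context setup is an assumption, not something from the question):
PUT /my_index
{
    "mappings": {
        "properties": {
            "name_suggest": {
                "type": "completion",
                "contexts": [
                    { "name": "user_id", "type": "category" }
                ]
            }
        }
    }
}

PUT /my_index/_doc/1
{
    "name_suggest": {
        "input": [ "Peter" ],
        "contexts": { "user_id": [ "590c5bd2819c3e225c990b48" ] }
    }
}

POST /my_index/_search
{
    "suggest": {
        "name": {
            "prefix": "Peter",
            "completion": {
                "field": "name_suggest",
                "contexts": {
                    "user_id": [ "590c5bd2819c3e225c990b48" ]
                }
            }
        }
    }
}
Documents indexed without the matching context value are simply not considered for suggestions.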
You can specify both the query and the suggester in your query, like this:
{
"query":{
"term" : {
"user_id" : "590c5bd2819c3e225c990b48"
}
},
"suggest": {
"name" : {
"text" : "Peter",
"completion" : {
"field" : "name_suggest"
}
}
}
}
I have a similar use case, and I've posted my question on the Elasticsearch forum; see here.
From what I've read so far, I don't think you can limit documents with the completion suggester. It essentially builds a finite state transducer (a prefix tree) at index time; this makes it fast, but you lose the flexibility of filtering on additional fields. I don't think the context suggester would work in your case (let me know if I am wrong), because the cardinality of user_id is very high.
I think edge-ngram partial matching is more flexible and might actually work in your use case (a rough sketch follows below).
Let me know what you end up implementing.
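A rough sketch of the edge-ngram alternative; the index name "users", the analyzer name "autocomplete", and the gram sizes are all made up for illustration, and only user_id comes from the question:
PUT /users
{
    "settings": {
        "analysis": {
            "analyzer": {
                "autocomplete": {
                    "tokenizer": "autocomplete_tokenizer",
                    "filter": [ "lowercase" ]
                }
            },
            "tokenizer": {
                "autocomplete_tokenizer": {
                    "type": "edge_ngram",
                    "min_gram": 2,
                    "max_gram": 15,
                    "token_chars": [ "letter" ]
                }
            }
        }
    },
    "mappings": {
        "properties": {
            "name": {
                "type": "text",
                "analyzer": "autocomplete",
                "search_analyzer": "standard"
            },
            "user_id": { "type": "keyword" }
        }
    }
}

POST /users/_search
{
    "query": {
        "bool": {
            "must": { "match": { "name": "Pet" } },
            "filter": { "term": { "user_id": "590c5bd2819c3e225c990b48" } }
        }
    }
}
Unlike the completion suggester, this is an ordinary query, so the user_id term filter (or any other filter) applies as usual.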

ElasticSearch filter on exact url

Let's say I create this document in my index:
PUT /nursery/rhyme/1
{
    "url" : "http://example.com/mary",
    "text" : "Mary had a little lamb"
}
Why does this query not return anything?
POST /nursery/rhyme/_search
{
    "query" : {
        "match_all" : {}
    },
    "filter" : {
        "term" : {
            "url" : "http://example.com/mary"
        }
    }
}
The term query finds documents that contain the exact term specified in the inverted index. When you save the document, the url property is analyzed and it results in the following terms (with the default analyzer): [http, example, com, mary].
So what you currently have in your inverted index is that bunch of terms, and none of them is http://example.com/mary.
What you want is either to not analyze the url property, or to do a match query that splits the query into terms just like at indexing time.
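For the second option, the match query analyzes the search string the same way the field was analyzed, for example:
POST /nursery/rhyme/_search
{
    "query": {
        "match": {
            "url" : "http://example.com/mary"
        }
    }
}
Note that this matches any document sharing those terms (http, example, com, mary), not only the exact URL, so for true exact matching the not-analyzed/keyword approach below is the safer choice.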
Exact match does not work for an analyzed field. A string is analyzed by default, which means the string http://example.com/mary will be split and stored in the inverted index as http, example, com, mary. That's why your query returns no output.
You can make your field not analyzed
{
    "url": {
        "type": "string",
        "index": "not_analyzed"
    }
}
but for this you will have to reindex your index.
Read about not_analyzed and the term query here.
Hope this helps
In Elasticsearch 7.x you have to use the "keyword" type in your mapping properties, which is not analyzed: https://www.elastic.co/guide/en/elasticsearch/reference/current/keyword.html
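A minimal sketch with the keyword type, reusing the document from the question (the typeless 7.x mapping syntax is assumed):
PUT /nursery
{
    "mappings": {
        "properties": {
            "url": { "type": "keyword" },
            "text": { "type": "text" }
        }
    }
}

POST /nursery/_search
{
    "query": {
        "term": {
            "url": "http://example.com/mary"
        }
    }
}
With url stored as a single keyword term, the term filter/query from the question matches the exact URL.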

Exact (not substring) matching in Elasticsearch

{"query":{
"match" : {
"content" : "2"
}
}} matches all the documents whole content contains the number 2, however I would like the content to be exactly 2, no more no less - think of my requirement in a spirit of Java's String.equals.
Similarly, for the second query I would like to match only when the document's content is exactly '3 3' and nothing more or less:
{
    "query": {
        "match" : {
            "content" : "3 3"
        }
    }
}
How could I do exact (String.equals) matching in Elasticsearch?
Without seeing your index type mapping and sample data, it's hard to answer this directly - but I'll try.
Offhand, I'd say this is similar to this answer here (https://stackoverflow.com/a/12867852/382774), where you simply set the content field's index option to not_analyzed in your mapping:
"url" : {
"type" : "string",
"index" : "not_analyzed"
}
Edit: I wasn't clear enough with my original answer, shown above. I did not mean to imply that you should add the example code to your query; I meant that you need to specify in your index type mapping that the url field is of type string and that it is indexed but not analyzed (not_analyzed).
This tells Elasticsearch to not bother analyzing (tokenizing or token filtering) the field when you're indexing your documents - just store it in the index as it exists in the document. For more information on mappings, see http://www.elasticsearch.org/guide/reference/mapping/ for an intro and http://www.elasticsearch.org/guide/reference/mapping/core-types/ for specifics on not_analyzed (tip: search for it on that page).
Update:
The official docs tell us that in newer versions of Elasticsearch you can't define a field as "not_analyzed"; instead you should use "keyword".
For the old version of Elasticsearch:
{
    "foo": {
        "type": "string",
        "index": "not_analyzed"
    }
}
For the new version:
{
    "foo": {
        "type": "keyword",
        "index": true
    }
}
Note that this functionality (the keyword type) is available from Elasticsearch 5.0, and the backward compatibility layer was removed in the Elasticsearch 6.0 release.
Official Doc
You should use filter instead of match.
{
    "query" : {
        "constant_score" : {
            "filter" : {
                "term" : {
                    "content" : 2
                }
            }
        }
    }
}
And you get docs whose content is exactly 2, instead of 20 or 2.1.
