Implementing Fuzziness in Autocomplete in Elasticsearch

I have implemented Elasticsearch autocomplete. This is the query I currently use (Node.js, elasticsearch.js):
body: {
  query: {
    match_phrase_prefix: {
      schoolname: {
        query: clientSearchterm,
        slop: 10,
        max_expansions: 50,
        fuzzy: {
          fuzziness: 2
        }
      }
    }
  }
}
It works just fine. How do I add the fuzziness parameter?

Simple. Just add this:
"fuzzy" : {
  "fuzziness" : 2
}
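
One alternative worth considering, in case the snippet above is rejected by your Elasticsearch version: the sketch below is not the approach from the answer. It swaps match_phrase_prefix for match_bool_prefix, which (assuming Elasticsearch 7.2 or later) accepts a fuzziness parameter that is applied to every term except the final one, which is still matched as a prefix. The helper name buildFuzzyAutocompleteBody and the sample input are hypothetical; schoolname, max_expansions and the fuzziness value come from the question.

```javascript
// Hypothetical helper: builds a search body for fuzzy autocomplete.
// Assumption: Elasticsearch 7.2+ so that match_bool_prefix is available.
function buildFuzzyAutocompleteBody(clientSearchterm) {
  return {
    query: {
      match_bool_prefix: {
        schoolname: {
          query: clientSearchterm,
          // Fuzziness applies to all terms except the last one,
          // which is matched as a prefix.
          fuzziness: 2,
          max_expansions: 50
        }
      }
    }
  };
}

// Sample input with a typo in the first word (made up for illustration).
const fuzzyBody = buildFuzzyAutocompleteBody('harvrd uni');
console.log(JSON.stringify(fuzzyBody, null, 2));
```

The resulting object can be passed as the body of the search call already used in the question. Note that match_bool_prefix does not enforce word order the way a phrase query does, so results may be broader.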

Related

Elasticsearch random_score pushes documents towards the end of results

Here's the logic I am trying to accomplish:
I am using Elasticsearch to display top-selling products and to randomly insert newly created products into the results using the function_score query DSL.
The issue is that I use the random_score function for newly created products. The query does insert new products up to page 2 or 3, but all the remaining newly created products are pushed towards the end of the search results.
Here's the logic written for function_score:
function_score: {
  query: query,
  functions: [
    {
      filter: [
        { terms: { product_type: 'sponsored' } },
        { range: { live_at: { gte: 'CURRENT_DATE - 1.MONTH' } } }
      ],
      random_score: {
        seed: Time.current.to_i / (60 * 10), # new seed every 10 minutes
        field: '_seq_no'
      },
      weight: 0.975
    },
    {
      filter: { range: { live_at: { lt: 'CURRENT_DATE - 1.MONTH' } } },
      linear: {
        weighted_sales_rate: {
          decay: 0.9,
          origin: 0.5520974289580515,
          scale: 0.5520974289580515
        }
      },
      weight: 1
    }
  ],
  score_mode: 'sum',
  boost_mode: 'replace'
}
I then sort on {"_score" => { "order" => "desc" } }.
Let's say there are 100 sponsored products created in the last month. The query above displays 8-10 random products (3 to 4 per page) as I scroll through the first 2 or 3 pages, but the other 90-92 products are displayed on the last few pages of the results. This is because the score that random_score calculates for those 90-92 products is lower than the score calculated by the linear decay function.
Kindly suggest how I can modify this query so that I continue to see newly created products as I navigate through the pages, instead of having new records pushed towards the end of the results.
[UPDATE]
I tried adding a gauss decay function to this query (so that I could somehow modify the score of the products appearing towards the end of the results), like below:
{
  filter: [
    { terms: { product_type: 'sponsored' } },
    { range: { live_at: { gte: 'CURRENT_DATE - 1.MONTH' } } },
    { range: { "_score" => { lt: 0.9 } } }
  ],
  gauss: {
    views_per_age_and_sales: {
      origin: 1563.77,
      scale: 1563.77,
      decay: 0.95
    }
  },
  weight: 0.95
}
But this did not work either.
Links I have referred to:
https://intellipaat.com/community/12391/how-to-get-3-random-search-results-in-elasticserch-query
Query to get random n items from top 100 items in Elastic Search
https://www.elastic.co/guide/en/elasticsearch/reference/7.17/query-dsl-function-score-query.html

I am not sure if this is the best solution, but I was able to accomplish this by wrapping the original query in a function_score query with a script_score function, and by adding a new indexed field called sort_by_views_per_year. Here's how the solution looks:
Link I referred to: https://github.com/elastic/elasticsearch/issues/7783
attribute(:sort_by_views_per_year) do
  object.live_age&.positive? ? object.views_per_year.to_f / object.live_age : 0.0
end
Then, while querying Elasticsearch:
def search
  #...preparation of query...#
  query = original_query(query)
  query = rearrange_low_scoring_docs(query)
  sort = apply_sort opts[:sort]
  Product.search(query: query, sort: sort)
end
I have not changed anything in original_query (it still applies random_score to products that went live within the last month and the linear decay function to older ones).
def rearrange_low_scoring_docs(query)
  {
    function_score: {
      query: query,
      functions: [
        {
          script_score: {
            script: "if (_score.doubleValue() < 0.9) {return 0.9;} else {return _score;}"
          }
        }
      ],
      #score_mode: 'sum',
      boost_mode: 'replace'
    }
  }
end
Then finally my sorting looks like this:
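# The argument is accepted to match the call in `search`; it is unused in this snippet.
def apply_sort(_requested_sort = nil)
  [
    { '_score' => { 'order' => 'desc' } },
    { 'sort_by_views_per_year' => { 'order' => 'desc' } }
  ]
end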
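
To illustrate the effect of this clamp-then-tiebreak approach outside Elasticsearch, here is a small sketch in plain JavaScript. It is not the Ruby code from the answer: clampScore, rankProducts and the sample data are hypothetical and only simulate the resulting ordering, assuming that scores below 0.9 are raised to 0.9 and that ties are then broken by sort_by_views_per_year.

```javascript
// Mirrors the script_score logic above: any score below `floor` is raised to `floor`.
function clampScore(score, floor = 0.9) {
  return score < floor ? floor : score;
}

// Sort by clamped score (descending), then by sort_by_views_per_year (descending),
// matching the two sort keys returned by apply_sort.
function rankProducts(products, floor = 0.9) {
  return products
    .map((p) => ({ ...p, score: clampScore(p.score, floor) }))
    .sort(
      (a, b) =>
        b.score - a.score ||
        b.sort_by_views_per_year - a.sort_by_views_per_year
    );
}

// Made-up sample data, for illustration only.
const ranked = rankProducts([
  { id: 'old-bestseller', score: 0.97, sort_by_views_per_year: 40 },
  { id: 'new-low-random', score: 0.12, sort_by_views_per_year: 900 },
  { id: 'old-midseller', score: 0.55, sort_by_views_per_year: 300 }
]);
console.log(ranked.map((p) => p.id));
```

Because every low score collapses to the same floor value, the secondary sort key decides the order among those documents, which is why the new product with many views per year is no longer pushed to the end.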
It would be very helpful if the Elasticsearch random_score function supported options such as max_doc_to_include and min_score, so that I could use it like this:
{
  filter: [
    { terms: { product_type: 'sponsored' } },
    { range: { live_at: { gte: 'CURRENT_DATE - 1.MONTH' } } }
  ],
  random_score: {
    seed: 123456, # new seed every 10 minutes
    field: '_seq_no',
    max_doc_to_include: 10,
    min_score: 0.9
  },
  weight: 0.975
},

Unknown key for a START_OBJECT in [bool] in elastic search

Elasticsearch returns the error "Unknown key for a START_OBJECT in [bool]".
My (updated) query is below:
var searchParams = {
  index: 'offers',
  body: {
    query: {
      bool: {
        must: {
          query: {
            multi_match: {
              query: query,
              fields: ['title', 'subTitle', 'address', 'description', 'tags', 'shopName'],
              fuzziness: 'AUTO'
            }
          }
        },
        filter: {
          geo_distance: {
            distance: radius,
            location: {
              lat: latitude,
              lon: longitude
            }
          }
        }
      }
    }
  },
  filter_path: 'hits.hits._source',
  pretty: 'true'
};
Can anyone tell me how to combine the geo and fuzzy search queries in Elasticsearch?

The body should look like this (you're missing the query section, and multi_match goes directly under must):
body: {
  query: { // <--- add this
    bool: {
      must: {
        multi_match: {
          query: query,
          fields: ['title', 'subTitle', 'address', 'description', 'tags', 'shopName'],
          fuzziness: 'AUTO'
        }
      },
      filter: {
        geo_distance: {
          distance: radius,
          location: {
            lat: latitude,
            lon: longitude
          }
        }
      }
    }
  }
},
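
For reference, the corrected structure can be wrapped in a small builder. This is a minimal sketch rather than the answerer's code: the function name buildOfferSearch and the sample arguments are hypothetical, while the index name, field list and query shape are taken from the question and answer above.

```javascript
// Hypothetical helper: combines a fuzzy multi_match with a geo_distance filter.
function buildOfferSearch(text, radius, latitude, longitude) {
  return {
    index: 'offers',
    body: {
      query: {
        bool: {
          // Scored clause: multi_match sits directly under `must`,
          // without an extra `query` wrapper.
          must: {
            multi_match: {
              query: text,
              fields: ['title', 'subTitle', 'address', 'description', 'tags', 'shopName'],
              fuzziness: 'AUTO'
            }
          },
          // Non-scored clause: only restricts results to the given radius.
          filter: {
            geo_distance: {
              distance: radius,
              location: { lat: latitude, lon: longitude }
            }
          }
        }
      }
    },
    filter_path: 'hits.hits._source'
  };
}

// Sample values, made up for illustration.
const offerParams = buildOfferSearch('pizza', '5km', 52.52, 13.405);
console.log(JSON.stringify(offerParams.body, null, 2));
```

Keeping the geo condition in filter means it does not influence the relevance score, so the ranking is driven by the fuzzy text match alone.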

Elasticsearch custom sorting / adding filter clauses scores

I have this simple set of documents:
{
  id : 1,
  book_ids : [2,3],
  collection_ids : ['a','b']
},
{
  id : 2,
  book_ids : [1,2]
}
If I run this filter query, it will match both documents:
{
  bool: {
    filter: [
      {
        bool: {
          should: [
            {
              bool: {
                must_not: {
                  exists: {
                    field: 'book_ids'
                  }
                }
              }
            },
            {
              bool: {
                filter: {
                  term: {
                    book_ids: 2
                  }
                }
              }
            }
          ]
        }
      },
      {
        bool: {
          should: [
            {
              bool: {
                must_not: {
                  exists: {
                    field: 'collection_ids'
                  }
                }
              }
            },
            {
              bool: {
                filter: {
                  term: {
                    collection_ids: 'a'
                  }
                }
              }
            }
          ]
        }
      }
    ]
  }
}
The thing is I want to sort these documents, and I would like the first one (id: 1) to be returned first because it matched both the book_ids value and the collection_ids values provided.
A simple sort clause like this one is not working:
[
  'book_ids',
  'collection_ids'
]
because it returns document 2 first, due to the first value of its book_ids array.
Edit: this is a simplified example of the problem I am facing, which has N such clauses in the should clause. Moreover, there is an order between the clauses, as I tried to reflect with the sort snippet: results matching the first clause (book_ids) should appear before results matching the second clause (collection_ids). I am really looking for some kind of SQL sort operation that only takes into account the matching value of the field array. A viable option might be to assign decreasing constant_scores to each term clause, according to the expected sort order, so that ES sums these sub-scores to compute the final score. But I cannot figure out how to do it, or whether it is even possible.
Bonus question:
Is there any way for Elasticsearch to return some kind of new document with only the matching values? Here is what I would expect as a response to the above filter query:
{
  id : 1,
  book_ids : [2],
  collection_ids : ['a']
},
{
  id : 2,
  book_ids : [2]
}

I think you're right about the constant score idea. I think you can do it like this:
{
  query: {
    bool: {
      must: [
        {
          bool: {
            should: [
              {
                bool: {
                  must_not: {
                    exists: {
                      field: 'book_ids'
                    }
                  }
                }
              },
              {
                constant_score: {
                  filter: {
                    term: {
                      book_ids: 2
                    }
                  },
                  boost: 100
                }
              }
            ]
          }
        },
        {
          bool: {
            should: [
              {
                bool: {
                  must_not: {
                    exists: {
                      field: 'collection_ids'
                    }
                  }
                }
              },
              {
                constant_score: {
                  filter: {
                    term: {
                      collection_ids: 'a'
                    }
                  },
                  boost: 50
                }
              }
            ]
          }
        }
      ]
    }
  }
}
I think the only thing you were missing when using constant_score was that the top-level clause needs to be must, not filter. (Filters are not scored; all the scores are 0.)
An alternative would be to put the filter inside a function_score query (but leave it as a filter), and then compute the score as you want (https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-function-score-query.html)
As to the bonus question, it's possible if you use a script field to filter and add a new field like you want (https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-script-fields.html), but it's not possible in a straightforward way. It's probably easier and makes more sense to do that filtering after you receive the result, unless you have very long lists in your values.
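
As a sketch of the "filter after you receive the result" option for the bonus question, the following plain JavaScript trims each returned document down to the values that were searched for. The helper name keepMatchingValues and the shape of the wanted map are hypothetical; the sample documents and expected result come from the question.

```javascript
// Hypothetical helper: keeps only the array values that were asked for.
// `wanted` maps a field name to the values used in the term clauses.
function keepMatchingValues(source, wanted) {
  const result = { id: source.id };
  for (const [field, values] of Object.entries(wanted)) {
    // Skip fields that are missing on this document.
    if (!Array.isArray(source[field])) continue;
    const kept = source[field].filter((v) => values.includes(v));
    if (kept.length > 0) result[field] = kept;
  }
  return result;
}

// The two documents from the question, as they would appear in _source.
const sources = [
  { id: 1, book_ids: [2, 3], collection_ids: ['a', 'b'] },
  { id: 2, book_ids: [1, 2] }
];
const wanted = { book_ids: [2], collection_ids: ['a'] };

console.log(sources.map((doc) => keepMatchingValues(doc, wanted)));
```

This runs in time proportional to the length of the arrays, so it is only a concern when the value lists are very long, as noted above.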

Partial matching not working in this query

Why does the following only match exact, and not partial?
body: {
  query: {
    filtered: {
      filter: {
        bool: {
          should: [
            { query: { match: { "name": "*"+searchterm+"*" }}},
          ]
        }
      }
    }
  }
}
"*"+searchterm+"*" should match any word that contains searchterm, i.e.:
item1
item2
0item
But it only matches the exact searchterm, i.e. only item. Why is this?

If the name field uses the default analyzer, the asterisk wildcard characters are dropped during the analysis phase. Hence you only get results where name is exactly searchterm. You need to use a wildcard query to match any document where the value of the name field contains searchterm.
query: {
  filtered: {
    filter: {
      bool: {
        should: [
          {
            query: {
              wildcard: {
                "name": "*" + searchterm + "*"
              }
            }
          }
        ]
      }
    }
  }
}
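
The filtered query used above was removed in Elasticsearch 5.0, so on newer versions the wildcard query would be placed directly under query. The following is a minimal sketch under that assumption; the helper name buildContainsQuery is hypothetical. Lowercasing the term is a defensive assumption that the name field is indexed with the default (lowercasing) analyzer, and the escaping step is an addition to keep user-supplied * and ? from acting as wildcards.

```javascript
// Hypothetical helper: builds a "contains" wildcard query for the name field.
function buildContainsQuery(searchterm) {
  // Escape characters that have special meaning in wildcard patterns (\ * ?).
  const escaped = searchterm.replace(/[\\*?]/g, '\\$&');
  return {
    query: {
      wildcard: {
        // Assumption: indexed terms are lowercase, so the pattern is lowercased too.
        name: { value: `*${escaped.toLowerCase()}*` }
      }
    }
  };
}

console.log(JSON.stringify(buildContainsQuery('Item')));
```

Patterns that start with a wildcard can be slow on large indices, because every term in the field may need to be checked.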

Elasticsearch: Is it possible to query for a term facet that contains more than a term

Part of my mapping looks like this:
{
  ...
  INFO_NODO: {
    properties: {
      CODIGO: {
        type: string
      },
      ESTADO: {
        type: string
      },
      IN_HOME: {
        type: string
      },
      TEXTO: {
        type: string
      },
      ID_NODO: {
        type: integer
      }
      ...
    }
  }
}
I need to build a facet that returns the fields ID_NODO, TEXTO, IN_HOME, ESTADO, CODIGO, and COUNT, so that I can parse it and feed it to my application. The key is that all these fields except COUNT depend on ID_NODO; that is, if the field INFO_NODO is the same, the rest of the information is the same. Ideally, I would like to make my facet depend on the whole INFO_NODO field and not on its sub-fields.
I found several solutions but I keep either failing to implement them properly or they are just not working. Any thoughts on my weird situation?
EDIT: What I'd need to do is:
{
  "facets": {
    "FACET_X_NODO": {
      "terms": {
        "field": "INFO_NODO"
      }
    }
  }
}
I just can't find the syntax in any documentation, since INFO_NODO is a subdocument and not a field.

If I understood you correctly, you should be able to do something like this:
{
  "query" : {
    "match_all" : { }
  },
  "facets" : {
    "info_node_facet" : {
      "terms" : {
        "script_field" : "_source.INFO_NODO.CODIGO + _source.INFO_NODO.ESTADO",
        "size" : 10
      }
    }
  }
}
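
Facets were removed in Elasticsearch 2.0 in favor of aggregations, so on newer versions a different approach is needed. The sketch below is a swapped-in technique, not the facet syntax from the answer: it groups by INFO_NODO.ID_NODO with a terms aggregation and uses a top_hits sub-aggregation to carry one copy of the INFO_NODO object per bucket. It assumes, as the question states, that ID_NODO determines the other fields. The aggregation names por_nodo and info, the helper toRows and the sample response are hypothetical.

```javascript
// Request body: one bucket per ID_NODO, each carrying one sample INFO_NODO object.
const nodoAggBody = {
  size: 0,
  aggs: {
    por_nodo: {
      terms: { field: 'INFO_NODO.ID_NODO', size: 10 },
      aggs: {
        info: { top_hits: { size: 1, _source: ['INFO_NODO.*'] } }
      }
    }
  }
};

// Flattens the aggregation part of a response body into rows of
// INFO_NODO fields plus COUNT.
function toRows(responseBody) {
  return responseBody.aggregations.por_nodo.buckets.map((bucket) => ({
    ...bucket.info.hits.hits[0]._source.INFO_NODO,
    COUNT: bucket.doc_count
  }));
}

// Hand-written response fragment with made-up values, for illustration only.
const sampleResponse = {
  aggregations: {
    por_nodo: {
      buckets: [
        {
          key: 7,
          doc_count: 3,
          info: {
            hits: {
              hits: [
                { _source: { INFO_NODO: { ID_NODO: 7, CODIGO: 'X1', ESTADO: 'A', IN_HOME: 'S', TEXTO: 'Demo' } } }
              ]
            }
          }
        }
      ]
    }
  }
};

console.log(toRows(sampleResponse));
```

Each row then contains ID_NODO, CODIGO, ESTADO, IN_HOME and TEXTO together with COUNT, which is the shape the question asks for.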
