Elasticsearch: get most similar result

I'm relatively new to Elasticsearch. I need to write an API that returns the result whose name is most similar to a given name.
Example: I want to find the phone whose name is most similar to 'samsung s6'. This is my JSON query:
{
"query": {
"match": {
"title": {
"query": "samsung s6",
"operator": "and"
}
}
},
"track_scores": true
}
and I got (formatted by hand):
6 - Samsung S6 Edge - 5.9510574
5 - Samsung S6 - 7.512151
where the first field is just an ID, the second is the name field that Elasticsearch searched on, and the third is the score.
I tried to sort by _score:
{
"query": {
"match": {
"title": {
"query": "samsung s6",
"operator": "and"
}
}
},
"track_scores": true,
"sort": [
{
"_score": {
"order": "desc"
}
}
]
}
This seems to work fine. But when I try another name, e.g. 'iphone 6', I get:
3 - Apple iPhone 6 - 5.569293
1 - Apple iPhone 6 Plus - 5.8405986
How can I get the result whose name is most similar to the query?
UPDATE:
This is the mapping:
"device_group": {
"properties": {
"id": {
"type": "long"
},
"name": {
"type": "string"
}
}
}

Related

Elasticsearch score from 0 to 1 for searching similar documents to the one that exists

I need to calculate a relative score from 0 to 1 when searching for documents similar to an existing one.
The existing document should have score 1, and every other matching document should be scored relative to it, so its score is <= 1. The existing document itself should be excluded from the results. Is it possible to do this on the Elasticsearch side, rather than calculating the score manually in a programming language, like:
match_doc_score/search_doc_score
Let's imagine we have an index person with this mapping:
{
"properties": {
"person_id": {
"type": "keyword"
},
"fullname": {
"type": "text"
},
"email": {
"type": "keyword"
},
"phone": {
"type": "keyword"
},
"country_of_birth": {
"type": "keyword"
}
}
}
And I have 3 persons inside the index:
Person 1:
{
"person_id": 1,
"fullname": "John Snow",
"email": "john#gmail.com",
"phone": "111-11-11",
"country_of_birth": "Denmark"
}
Person 2:
{
"person_id": 2,
"fullname": "Snow John",
"email": "john#gmail.com",
"phone": "222-22-22",
"country_of_birth": "Denmark"
}
Person 3:
{
"person_id": 3,
"fullname": "Peter Wislow",
"email": "peter#gmail.com",
"phone": "111-11-11",
"country_of_birth": "Denmark"
}
We find persons that are similar to Person 1 by this query:
{
"query": {
"bool": {
"should": [
{
"match": {
"fullname": {
"query": "John Snow",
"boost": 6
}
}
},
{
"term": {
"email": {
"value": "john#gmail.com",
"boost": 5
}
}
},
{
"term": {
"phone": {
"value": "111-11-11",
"boost": 4
}
}
},
{
"term": {
"country_of_birth": {
"value": "Denmark",
"boost": 2
}
}
}
],
"must_not": [
{
"term": {
"person_id": 1
}
}
]
}
}
}
As you can see:
person 1 and person 2 match by: fullname, email, country of birth.
person 1 and person 3 match by: phone, country of birth.
Is it possible to get scores in the 0..1 range, given that the fully matching document (person 1) is in the index?
I know there is a more_like_this query, but real-life search queries can be complicated, so more_like_this is not a good option. Even the Elasticsearch documentation says that if you need more control over the query, you should use boolean query combinations.
I have not tried it, but it looks like the field_value_factor function of the function_score query might solve your problem.
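For comparison, the manual calculation mentioned in the question (match_doc_score/search_doc_score) can be sketched in Python. This is only a client-side sketch, not an Elasticsearch-side solution: it assumes the reference score was obtained by running the same bool query once without the must_not clause and reading the score of person 1. The function name normalize_scores and the score values are made up for illustration.

```python
from typing import Dict, List


def normalize_scores(hits: List[Dict], reference_score: float) -> List[Dict]:
    """Divide each hit's _score by the score the reference document gets
    when the same query is run against itself."""
    if reference_score <= 0:
        raise ValueError("reference_score must be positive")
    normalized = []
    for hit in hits:
        ratio = hit["_score"] / reference_score
        # Clamp to 1.0 in case another document outscores the reference.
        normalized.append({"_id": hit["_id"], "similarity": min(ratio, 1.0)})
    return normalized


# Hypothetical scores, not real Elasticsearch output.
reference_score = 16.0  # assumed score of person 1 for its own query
hits = [{"_id": "2", "_score": 12.0}, {"_id": "3", "_score": 4.0}]
result = normalize_scores(hits, reference_score)
print(result)
```

The clamp is a design choice: depending on field-length normalization, another document can occasionally score higher than the reference document itself.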

Sort Elasticsearch results based on field value

Assuming I have 3 documents (users) who know multiple programming languages, each with an associated score as described below, how can I search across multiple fields (multi-match, for example) and, if a search keyword hits a language, sort by that language's score?
// user1
{
"name": "John Bayes",
"prog_langs": [
{
"name": "python",
"score": 10
},
{
"name": "java",
"score": 500
}
]
}
// user2
{
"name": "John Russel",
"prog_langs": [
{
"name": "python",
"score": 100
},
{
"name": "PHP",
"score": 200
}
]
}
// user3
{
"name": "Terry Guy",
"prog_langs": [
{
"name": "C++",
"score": 600
},
{
"name": "Javascript",
"score": 200
}
]
}
For example, searching for "John python"
should return user1 and user2, with user2 showing up first.
I've been trying to use sort and functions, but I think they always use the lowest/highest/average value of score.
Thanks!
[Edit]
In the meantime I got a test version working without the full-text/multi-match part, and found out I had to make "prog_langs" nested, so I changed the mapping and it works as expected.
Now I'm only missing the part where a full-text search with multi-match is merged into the current query.
Thanks again!
I managed to fix the query and now it's working as expected.
Before posting my solution, a few things to keep in mind:
I made a new mapping and added some nested objects, so my original query had to change (prog_langs is now of type nested).
I wanted at least two fields to match, so both clauses are mandatory and each has to match at least once.
{
"query": {
"bool": {
"must": [
{
"query": {
"match": {
"name": {
"query": "john python",
"boost": 5
}
}
}
},
{
"bool": {
"should": [
{
"nested": {
"path": "prog_langs",
"query": {
"match": {
"prog_langs.name": {
"query": "john python",
"boost": 5
}
}
}
}
}
]
}
}
],
"should": [
{
"function_score": {
"query": {
"match": {
"prog_langs.name": "john python"
}
},
"functions": [
{
"script_score": {
"script": "_score * (1 + doc['prog_langs.score'].value)"
}
}
]
}
}
]
}
},
"highlight": {
"fields": {
"name": {},
"prog_langs.name": {}
}
}
}
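A simplified variant of the same idea can be sketched in Python as a function that builds the request body. This is not the query above: it assumes prog_langs is mapped as nested, wraps the scoring clause in a nested query, and uses field_value_factor instead of a script. The function name build_skill_query is hypothetical, and the body has not been run against a cluster.

```python
import json
from typing import Any, Dict


def build_skill_query(text: str) -> Dict[str, Any]:
    """Build a search body that requires a name match and ranks users by
    the score of the nested language that matches the search text."""
    nested_skill = {
        "nested": {
            "path": "prog_langs",
            # Use the best-matching nested object as the clause score.
            "score_mode": "max",
            "query": {
                "function_score": {
                    "query": {"match": {"prog_langs.name": text}},
                    "field_value_factor": {
                        "field": "prog_langs.score",
                        "missing": 0,
                    },
                    # Replace the text relevance with the stored score.
                    "boost_mode": "replace",
                }
            },
        }
    }
    return {
        "query": {
            "bool": {
                "must": [{"match": {"name": text}}],
                "should": [nested_skill],
            }
        }
    }


print(json.dumps(build_skill_query("john python"), indent=2))
```

With the three sample users, the intent is that user2 (python score 100) ranks above user1 (python score 10), because the nested clause contributes the stored language score to the total.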

ElasticSearch - Fuzzy and strict match with multiple fields

We want to leverage Elasticsearch to find similar objects.
Let's say I have an object with 4 fields:
product_name, seller_name, seller_phone, platform_id.
Similar products can have different product names and seller names across different platforms (fuzzy match).
The phone, however, is strict: a single variation might yield a wrong record (strict match).
What we're trying to create is a query that will:
Take into account all the fields we have for the current record and OR between them.
Require platform_id to be the specific one I want to look at (AND).
Fuzzy-match product_name and seller_name.
Strictly match the phone number, or ignore it in the OR between the fields.
In pseudocode, I would write something like:
((product_name like 'some_product_name') OR (seller_name like 'some_seller_name') OR (seller_phone = 'some_phone')) AND (platform_id = 123)
To do an exact match on seller_phone, I am indexing this field without the ngram analyzer, and using fuzzy queries for product_name and seller_name.
Mapping
PUT index111
{
"settings": {
"analysis": {
"analyzer": {
"edge_n_gram_analyzer": {
"tokenizer": "whitespace",
"filter" : ["lowercase", "edge_gram_filter"]
}
},
"filter": {
"edge_gram_filter" : {
"type" : "ngram",
"min_gram" : 2,
"max_gram": 10
}
}
}
},
"mappings": {
"document_type" : {
"properties": {
"product_name" : {
"type": "text",
"analyzer": "edge_n_gram_analyzer"
},
"seller_name" : {
"type": "text",
"analyzer": "edge_n_gram_analyzer"
},
"seller_phone" : {
"type": "keyword"
},
"platform_id" : {
"type": "keyword"
}
}
}
}
}
Index documents
POST index111/document_type
{
"product_name":"macbok",
"seller_name":"apple",
"seller_phone":"9988",
"platform_id":"123"
}
For the following pseudo-SQL query:
((product_name like 'some_product_name') OR (seller_name like 'some_seller_name') OR (seller_phone = 'some_phone')) AND (platform_id = 123)
Elastic Query
POST index111/_search
{
"query": {
"bool": {
"must": [
{
"term": {
"platform_id": {
"value": "123"
}
}
},
{
"bool": {
"should": [{
"fuzzy": {
"product_name": {
"value": "macbouk",
"boost": 1.0,
"fuzziness": 2,
"prefix_length": 0,
"max_expansions": 100
}
}
},
{
"fuzzy": {
"seller_name": {
"value": "apdle",
"boost": 1.0,
"fuzziness": 2,
"prefix_length": 0,
"max_expansions": 100
}
}
},
{
"term": {
"seller_phone": {
"value": "9988"
}
}
}
]
}
}]
}
}
}
Hope this helps
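The same query shape can be generated from a record, so that missing fields are simply left out of the OR. The Python sketch below is an illustration under assumptions: the function name build_similar_query is hypothetical, and it uses filter plus minimum_should_match instead of the nested bool above, which expresses the same AND/OR logic.

```python
from typing import Any, Dict, List, Optional


def build_similar_query(
    platform_id: str,
    product_name: Optional[str] = None,
    seller_name: Optional[str] = None,
    seller_phone: Optional[str] = None,
) -> Dict[str, Any]:
    """(product_name ~ X OR seller_name ~ Y OR seller_phone = Z) AND platform_id = P."""
    should: List[Dict[str, Any]] = []
    if product_name:
        should.append({"fuzzy": {"product_name": {"value": product_name, "fuzziness": 2}}})
    if seller_name:
        should.append({"fuzzy": {"seller_name": {"value": seller_name, "fuzziness": 2}}})
    if seller_phone:
        # Strict match: no fuzziness on the phone number.
        should.append({"term": {"seller_phone": {"value": seller_phone}}})
    if not should:
        raise ValueError("at least one of the optional fields is required")
    return {
        "query": {
            "bool": {
                "filter": [{"term": {"platform_id": platform_id}}],
                "should": should,
                # Without this, should clauses are optional next to a filter.
                "minimum_should_match": 1,
            }
        }
    }


body = build_similar_query("123", product_name="macbouk", seller_phone="9988")
print(body)
```

Putting platform_id in filter keeps it out of the relevance score, so the ranking reflects only how well the fuzzy and phone clauses match.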

Exact and fuzzy search

My setup:
I have some documents with name "Apple", "Apple delicous", ...
This is my query:
GET p_index/_search
{
"query": {
"bool": {
"should": [
{"match": {
"name": "apple"
}},
{ "fuzzy": {
"name": "apple"
}}
]
}
}
}
I want the exact match to be shown first and then the fuzzy ones:
apple
apple delicous
Second, I am wondering why I do not get any results if I enter only app in the search:
GET p_index/_search
{
"query": {
"bool": {
"should": [
{"match": {
"name": "app"
}},
{ "fuzzy": {
"name": "app"
}}
]
}
}
}
There are two problems here.
1) To give a higher score to an exact match, you could add a "raw" sub-field with "index": "not_analyzed" to your name field, like this:
"name": {
"type": "string",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
}
After that, your query would look like this:
{
"query": {
"bool": {
"should": [
{
"match": {
"name": "apple"
}
},
{
"match": {
"name.raw": {
"query": "apple",
"boost": 5
}
}
}
]
}
}
}
This will give a higher score to the document "apple" than to "apple delicous".
2) To better understand fuzziness, you should go through this and this article.
From the docs:
The fuzziness parameter can be set to AUTO, which results in the following maximum edit distances:
0 for strings of one or two characters
1 for strings of three, four, or five characters
2 for strings of more than five characters
So the reason your fuzzy query did not return apple for app is that the edit distance between those words is 2, while "app" is only a three-letter word, so the fuzziness value is 1. You could get the desired result with the following query:
{
"query": {
"fuzzy": {
"name": {
"value": "app",
"fuzziness": 2
}
}
}
}
I seriously would not recommend using this query, because it will return bizarre results: it will also match cap, arm, pip and a lot of other words, as they fall within an edit distance of 2.
This would be a better query:
{
"query": {
"fuzzy": {
"name": {
"value": "appl"
}
}
}
}
It will return apple.
I hope this helps.
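The edit-distance reasoning above can be checked with a small Python sketch. It uses plain Levenshtein distance as an approximation of what the fuzzy query measures, and auto_fuzziness mirrors the AUTO thresholds quoted from the docs; both function names are made up for this illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute or keep
            ))
        previous = current
    return previous[-1]


def auto_fuzziness(term: str) -> int:
    """Maximum edit distance allowed by AUTO for a term of this length."""
    if len(term) <= 2:
        return 0
    if len(term) <= 5:
        return 1
    return 2


for query in ("app", "appl"):
    distance = levenshtein(query, "apple")
    allowed = auto_fuzziness(query)
    print(query, distance, allowed, distance <= allowed)
```

For "app" the distance to "apple" is 2 but only 1 edit is allowed, so there is no match; for "appl" the distance is 1, which is within the limit.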
I think this will help you.
{"query":{"bool":{"must":[{"function_score":{"query":{"multi_match":{"query":"airetl","fields":["brand_lower"],"boost":1,"fuzziness":"AUTO","prefix_length":1}}}}]}}

Elasticsearch search for a value across multiple fields

My goal is to search for a value across multiple fields and return the distinct matching values together with their counts.
To do this, I understood that I have to use facets.
This is the database schema:
index:
  analysis:
    analyzer:
      custom_search_analyzer:
        type: custom
        tokenizer: standard
        filter: [standard, snowball, lowercase, asciifolding]
      custom_index_analyzer:
        type: custom
        tokenizer: standard
        filter: [standard, snowball, lowercase, asciifolding, custom_filter]
    filter:
      custom_filter:
        type: edgeNGram
        side: front
        min_gram: 1
        max_gram: 20
{
"structure": {
"properties": {
"name": {"type": "string", "search_analyzer": "custom_search_analyzer", "index_analyzer": "custom_index_analyzer"},
"locality": {"type": "string", "search_analyzer": "custom_search_analyzer", "index_analyzer": "custom_index_analyzer"},
"province": {"type": "string", "search_analyzer": "custom_search_analyzer", "index_analyzer": "custom_index_analyzer"},
"region": {"type": "string", "search_analyzer": "custom_search_analyzer", "index_analyzer": "custom_index_analyzer"}
}
}
}
and this is the query that I tried to use:
{
"query": {
"bool": {
"should": [
{
"match": {
"locality": "bolo"
}
},
{
"match": {
"region": "bolo"
}
},
{
"match": {
"name": "bolo"
}
}
]
}
},
"facets": {
"region": {
"query": {
"term": {
"region": "bolo"
}
}
},
"locality": {
"query": {
"term": {
"locality": "bolo"
}
}
},
"name": {
"query": {
"term": {
"name": "bolo"
}
}
}
}
}
Of all the tests I've done, this is the query that comes closest to my desired result. However, it does not give me the count per distinct value of a field; it only gives the total count for the field.
For example, the above query returns the following result:
facets: {
region: {
_type: query
count: 0
}
locality: {
_type: query
count: 2
}
name: {
_type: query
count: 0
}
}
I would like to have a result like this (the syntax is obviously not correct, but it shows what I need):
facets: {
....
locality: {
_type: query
"terms": [
{"term": "Bologna", "count": 1},
{"term": "Bolognano", "count": 1}
]
}
How can I do this?
I have already tried using "terms" instead of "query" in the facets and setting "index: not_analyzed" on the searched fields, but then results are returned only if I search for the exact value, not part of it!
This can be done using aggregations.
The value count aggregation gives you the number of values extracted from the matching documents (it does not deduplicate them),
while the terms aggregation gives you each unique term and its document count.
If you only need a count, the value count aggregation may be enough - http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-metrics-valuecount-aggregation.html. For the output shown in the question (each term with its count), the terms aggregation looks like the closer fit.
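As a rough Python sketch of the terms-aggregation route: for each field, a filter aggregation keeps the documents whose field matches the partial input, and a terms sub-aggregation lists the distinct full values with their counts. The ".raw" not-analyzed sub-field is an assumption (it is not in the mapping above and would have to be added), the function name build_distinct_value_search is hypothetical, and the body has not been run against a cluster.

```python
from typing import Any, Dict, List


def build_distinct_value_search(text: str, fields: List[str], size: int = 10) -> Dict[str, Any]:
    """Search several fields for a partial value and, per field, bucket the
    matching documents by the exact (not analyzed) field value."""
    return {
        "query": {
            "bool": {"should": [{"match": {field: text}} for field in fields]}
        },
        "aggs": {
            field: {
                # Keep only documents where this particular field matched.
                "filter": {"match": {field: text}},
                "aggs": {
                    # Assumed sub-field holding the untouched original value.
                    "values": {"terms": {"field": field + ".raw", "size": size}}
                },
            }
            for field in fields
        },
    }


body = build_distinct_value_search("bolo", ["locality", "region", "name"])
print(sorted(body["aggs"]))
```

With the sample data from the question, the intent is that the locality buckets come back as Bologna with count 1 and Bolognano with count 1.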
