Elasticsearch case-insensitive partial match over multiple fields - elasticsearch

I'm implementing a search box in Elasticsearch and I have an Elasticsearch index with the following mappings:
{
"mappings": {
"properties": {
"name": {
"type": "text"
},
"brand": {
"type": "text"
}
}
}
}
And I'd like, quite simply, to do a query such as (in SQL):
SELECT * FROM <table> WHERE brand ILIKE '%test%' OR name ILIKE '%test%';
I've tried a query such as:
{
"query": {
"query_string": {
"query": "*test*",
"fields": ["brand", "name"]
}
}
}
and that gives me my desired result, however, I've noticed that the docs recommend not using query_string for a search box as it can lead to performance issues.
I then tried a multi_match query:
{
"query": {
"multi_match" : {
"query": "test"
}
}
}
But that yielded no results. Further, when I used an ngram tokenizer, it returned all documents all the time.
I've consulted countless resources on this and even on StackOverflow there are countless unanswered questions regarding this topic. Could somebody explain how this is achieved in the Elasticsearch world, or am I simply using the wrong tool for the job? Thanks.

Since you have not provided the sample documents, I have created complete example, what you are trying to do is very much possible in Elasticsearch, with simple boolean should wildcard queries as shown below
{
"query": {
"bool": {
"should": [
{
"wildcard": {
"name.keyword": {
"value": "*test*"
}
}
},
{
"wildcard": {
"brand.keyword": {
"value": "*test*"
}
}
}
],
"minimum_should_match": 1,
"boost": 1.0
}
}
}
You can test above query on below sample documents
{
"brand" : "test",
"name" : "name foo according to use"
}
{
"brand" : "barand name is foo",
"name" : "name foo according to use"
}
{
"brand" : "barand name is test",
"name" : "name tested according to use"
}
{
"brand" : "barand name is testing",
"name" : "test the name"
}
on above 4 sample documents, query returns below documents
"hits": [
{
"_index": "73885469",
"_id": "1",
"_score": 2.0,
"_source": {
"brand": "barand name is testing",
"name": "test the name"
}
},
{
"_index": "73885469",
"_id": "2",
"_score": 2.0,
"_source": {
"brand": "barand name is test",
"name": "name tested according to use"
}
},
{
"_index": "73885469",
"_id": "4",
"_score": 1.0,
"_source": {
"brand": "test",
"name": "name foo according to use"
}
}
]
Which is i believe your expected documents

Related

Set elasticsearch match precendence

So if a user search with the word covid, i want all those results first, where title of the sentence starts with word covid and then I want all those items where in other parts of the title have the word have word covid. How can I achieve this?
I want more specific answer, how to do that with searchkick.
If you are using the default mapping, You can use the bool query with two should clause, one with match on the text and another is prefix query on .keyword subfield as shown in below example.
Index sample documents
{
"name" : "foo bar"
}
{
"name" : "bar foo"
}
Search query
{
"query": {
"bool": {
"should": [
{
"match": {
"name" : "foo"
}
},
{
"prefix": {
"name.keyword": "foo"
}
}
]
}
}
}
Search results
"hits": [
{
"_index": "71998426",
"_id": "1",
"_score": 1.1823215,
"_source": {
"name": "foo bar"
}
},
{
"_index": "71998426",
"_id": "2",
"_score": 0.18232156,
"_source": {
"name": "bar foo"
}
}
]
Note: first result, having foo bar is scored much higher and comes first in the search hits.
you can even use boost i guess little modification to #Amit code
"query": {
"bool": {
"should": [
{
"match": {
"name" : "covid",
"boost" : 0.5
}
},
{
"prefix": {
"name.keyword": "foo",
"boost" : 1.0
}
}
]
}
}
}```

How to change the order of search results on Elastic Search?

I am getting results from following Elastic Search query:
"query": {
"bool": {
"should": [
{"match_phrase_prefix": {"title": keyword}},
{"match_phrase_prefix": {"second_title": keyword}}
]
}
}
The result is good, but I want to change the order of the result so that the results with matching title comes top.
Any help would be appreciated!!!
I was able to reproduce the issue with sample data and My solution is using a query time boost, as index time boost is deprecated from the Major version of ES 5.
Also, I've created sample data in such a manner, that without boost both the sample data will have a same score, hence there is no guarantee that one which has match comes first in the search result, this should help you understand it better.
1. Index Mapping
{
"mappings": {
"properties": {
"title": {
"type": "text"
},
"second_title" :{
"type" :"text"
}
}
}
}
2. Index Sample docs
a)
{
"title": "opster",
"second_title" : "Dimitry"
}
b)
{
"title": "Dimitry",
"second_title" : "opster"
}
Search query
{
"query": {
"bool": {
"should": [
{
"match_phrase_prefix": {
"title": {
"query" : "dimitry",
"boost" : 2.0 <-- Notice the boost in `title` field
}
}
},
{
"match_phrase_prefix": {
"second_title": {
"query" : "dimitry"
}
}
}
]
}
}
}
Output
"hits": [
{
"_index": "60454337",
"_type": "_doc",
"_id": "1",
"_score": 1.3862944,
"_source": {
"title": "Dimitry", <-- Dimitry in title field has doube score
"second_title": "opster"
}
},
{
"_index": "60454337",
"_type": "_doc",
"_id": "2",
"_score": 0.6931472,
"_source": {
"title": "opster",
"second_title": "Dimitry"
}
}
]
Let me know if you have any doubt understanding it.

elasticsearch - get intermediate scores within 'function_score'

Here's my index
POST /blogs/1
{
"name" : "learn java",
"popularity" : 100
}
POST /blogs/2
{
"name" : "learn elasticsearch",
"popularity" : 10
}
My search query:
GET /blogs/_search
{
"query": {
"function_score": {
"query": {
"match": {
"name": "learn"
}
},
"script_score": {
"script": {
"source": "_score*(1+Math.log(1+doc['popularity'].value))"
}
}
}
}
}
which returns:
[
{
"_index": "blogs",
"_type": "1",
"_id": "AW5fxnperVbDy5wjSDBC",
"_score": 0.58024323,
"_source": {
"name": "learn elastic search",
"popularity": 100
}
},
{
"_index": "blogs",
"_type": "1",
"_id": "AW5fxqmL8cCMCxtBYOyC",
"_score": 0.43638366,
"_source": {
"name": "learn java",
"popularity": 10
}
}
]
Problem: I need to return an extra field in results which would give me raw score (just tf/idf which doesn't take popularity into account)
Things I have explored: script_fields (which doesn't give access to _score at fetch time.
The problem is in the way you are querying, which over-writes the _score variable. Instead if you use sort then _score isn't changed and can be pulled up within the same query.
You can try querying this way :
{
"query": {
"match": {
"name": "learn"
}
},
"sort": [
{
"_script": {
"type": "number",
"script": {
"lang": "painless",
"source": "_score*(1+Math.log(1+doc['popularity'].value))"
},
"order": "desc"
}
},
"_score"
]
}

Name searching in ElasticSearch

I have a index created in ElasticSearch with the field name where I store the whole name of a person: Name and Surname. I want to perform full text search over that field so I have indexed it using the analyzer.
My issue now is that if I search:
"John Rham Rham"
And in the index I had "John Rham Rham Luck", that value has higher score than "John Rham Rham".
Is there any posibility to have better score on the exact field than in the field with more values in the string?
Thanks in advance!
I worked out a small example (assuming you're running on ES 5.x cause of the difference in scoring):
DELETE test
PUT test
{
"settings": {
"similarity": {
"my_bm25": {
"type": "BM25",
"b": 0
}
}
},
"mappings": {
"test": {
"properties": {
"name": {
"type": "text",
"similarity": "my_bm25",
"fields": {
"length": {
"type": "token_count",
"analyzer": "standard"
}
}
}
}
}
}
}
POST test/test/1
{
"name": "John Rham Rham"
}
POST test/test/2
{
"name": "John Rham Rham Luck"
}
GET test/_search
{
"query": {
"function_score": {
"query": {
"match": {
"name": {
"query": "John Rham Rham",
"operator": "and"
}
}
},
"functions": [
{
"script_score": {
"script": "_score / doc['name.length'].getValue()"
}
}
]
}
}
}
This code does the following:
Replace the default BM25 implementation with a custom one, tweaking the B parameter (field length normalisation)
-- You could also change the similarity to 'classic' to go back to TF/IDF which doesn't have this normilisation
Create an inner field for your name field, which counts the number of tokens inside your name field.
Update the score according to the length of the token
This will result in:
"hits": {
"total": 2,
"max_score": 0.3596026,
"hits": [
{
"_index": "test",
"_type": "test",
"_id": "1",
"_score": 0.3596026,
"_source": {
"name": "John Rham Rham"
}
},
{
"_index": "test",
"_type": "test",
"_id": "2",
"_score": 0.26970196,
"_source": {
"name": "John Rham Rham Luck"
}
}
]
}
}
Not sure if this is the best way of doing it, but it maybe point you in the right direction :)

Elasticsearch exact match of specific field(s)

I'm trying to filter my elasticsearch index by specific fields, the "country" field to be exact. However, I keep getting loads of other results (other countries) back that are not exact.
Please could someone point me in the right direction.
I've tried the following searches:
GET http://127.0.0.1:9200/decision/council/_search
{
"query": {
"filtered": {
"filter": {
"term": {
"country": "Algeria"
}
}
}
}
}
Here is an example document:
{
"_index": "decision",
"_id": "54290140ec882c6dac5ae9dd",
"_score": 1,
"_type": "council",
"_source": {
"document": "DEV DOCUMENT"
"id": "54290140ec882c6dac5ae9dd",
"date_updated": 1396448966,
"pdf_file": null,
"reported": true,
"date_submitted": 1375894031,
"doc_file": null,
"country": "Algeria"
}
}
You can use the match_phrase query instead
POST http://127.0.0.1:9200/decision/council/_search
{
"query" : {
"match_phrase" : { "country" : "Algeria"}
}
}

Resources