Elastic search relevance search for parent-child relation types - elasticsearch

I am working with Elasticsearch and need to build a relevance search query, but I am stuck.
Here is how I insert my data; my records use a parent-child mapping:
PUT /myindex
{ "mappings": {
"my_registration": {},
"my_specialities": {
"_parent": {
"type": "my_registration"
}
} } }
PUT myindex/my_registration/100
{
"Pid": "100",
"Name": "name1",
"Age": "28" }
PUT myindex/my_registration/200
{
"Pid": "200",
"Name": "name2",
"Age": "28" }
PUT myindex/my_registration/300
{
"Pid": "300",
"Name": "name3",
"Age": "28" }
PUT myindex/my_specialities/1?parent=100
{ "Pid": "100", "speciality_name": "Orthopedic Surgeon"
}
PUT myindex/my_specialities/2?parent=200
{ "Pid": "200", "speciality_name": "Orthopedic"
}
PUT myindex/my_specialities/3?parent=300
{ "Pid": "300", "speciality_name": "Surgeon"
}
Here are my scenarios:
1- I need to search speciality_name as Orthopedic Surgeon
2- I need to search speciality_name as Surgeon
3- I need to search speciality_name as Orthopedic
4- I need to search speciality_name as Orth etc
5- I need to search speciality_name as Orthop
6- I need to search speciality_name as Orthoepdic
See my sample query below. I expect to get results for all of the above cases (records returned with a relevance score), but I get no results for cases 4 and 5.
/myindex/my_registration/_search
{ "query": { "bool": { "must": [{
"has_child": {
"type": "my_specialities",
"query": {
"fuzzy": {
"speciality_name": {
"value": "Orthop",
"boost": 1.0,
"fuzziness": 2,
"prefix_length": 0,
"max_expansions": 100
}
}
},
"inner_hits": {}
} }] } } }
Please note that I need to return results from the type my_registration, while the search itself is applied to the type my_specialities.
Any suggestions, please? Thanks in advance, and sorry for the lengthy question :)

You are setting the fuzziness to 2. According to the documentation, fuzziness is the Levenshtein edit distance, which means the query only matches terms that need at most two single-character edits to equal the value in the query.
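If you search for Orthop, you need to add at least 4 characters to reach the closest term (Orthopedic). That is why you are getting no results. Note that Elasticsearch caps fuzziness at 2, and AUTO never resolves to more than 2 either, so raising the value will not cover cases 4 and 5. Partial inputs such as Orth and Orthop call for prefix-style matching (for example a prefix or match_phrase_prefix query) rather than fuzziness.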
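To make the edit-distance argument concrete, here is a small Python sketch that computes the plain Levenshtein distance between a few of the inputs from the question and the indexed term. It assumes the field uses the default standard analyzer, so the indexed token is the lowercased "orthopedic", and it lowercases the inputs to keep case out of the picture. The helper name levenshtein is only for illustration; it is not how Elasticsearch computes fuzziness internally.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete a character from a
                current[j - 1] + 1,      # insert a character into a
                previous[j - 1] + cost,  # substitute (or keep) a character
            ))
        previous = current
    return previous[-1]


# Assumption: the indexed token is the lowercased form of "Orthopedic".
indexed_term = "orthopedic"
distances = {q: levenshtein(q, indexed_term) for q in ("orth", "orthop", "orthoepdic")}
print(distances)  # {'orth': 6, 'orthop': 4, 'orthoepdic': 2}
```

Only the misspelling stays within a distance of 2. The fuzzy query also counts a swap of two adjacent characters as a single edit by default, so Orthoepdic is, if anything, closer than this plain Levenshtein number suggests.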

Related

How can I add "context" to an ElasticSearch Query e.g. phrase prefix + fuzzy search

At the moment I'm doing a full text search of HTML documents using a multi_match query with phrase_prefix. The problem I'm trying to solve is that the phrases relate to places. E.g. "The Tower" or "Gold Casino". Now, the same place name may exist in several cities/locations. So I'd like to add "context" to the search by specifying a city name for instance. E.g. "Gold Casino New York".
I believe using just phrase_prefix won't work, unless I do something like increasing slop value, but then a search like "The Blue Tower New York" could be valid when I just want "The Tower" specifically.
Is there a way to combine phrase_prefix with a fuzzy search of sorts? Here is my current query for examination:
{
"_source": false,
"fields": [
{
"field": "source"
},
{
"field": "title"
}
],
"from": 0,
"highlight": {
"fragment_size": 200,
"number_of_fragments": 1,
"fields": {
"seoDescription": {},
"paragraphContent": {}
}
},
"query": {
"multi_match": {
"fields": [
"seoDescription^3",
"paragraphContent"
],
"query": "The Tower New York",
"type": "phrase_prefix"
}
},
"size": 20,
"track_total_hits": true
}

Must match document with keyword if it occurs, must match other document if keyword doesn't occur

I'm looking for a query that follows this rule: if a keyword occurs in the search phrase, return only documents that have that keyword; if the keyword doesn't occur in the search phrase, return only documents that don't have it.
Imagine an index of store products that can be either "regular" or "have something unusual"; when a product is regular, you don't add anything to the search phrase.
For example, if we have these products:
"Nike T-Shirt" (attributes: [])
"Adidas T-Shirt" (attributes: ["collectible"])
If a user searches for "t-shirt", we don't want them to find any collectible items. But when a user searches for "collectible t-shirt", we want them to find only collectible items. There can be several keywords of this kind.
Example:
I have some documents:
[
{
"id": 1,
"name": "First document",
"variants": ["red", "big"]
},
{
"id": 2,
"name": "Second document",
"variants": ["red"]
},
{
"id": 3,
"name": "Third entry",
"variants": ["green", "big"]
}
]
And I have two search phrases that I convert to terms queries:
With a keyword (big) occurrence:
{
"query": {
"bool" : {
"must": {
"match": {
"name": {
"query": "document"
}
}
},
"??? must or must_not ???" : {
"terms": {
"variants": ["some", "big", "search", "phrase"]
}
}
}
}
}
Without a keyword occurrence:
{
"query": {
"bool" : {
"must": {
"match": {
"name": {
"query": "document"
}
}
},
"??? must or must_not ???" : {
"terms": {
"variants": ["some", "search", "phrase"]
}
}
}
}
}
With the first search I want Elasticsearch to return only documents 1 and 3, and with the second search only document 2.
Using bool.must.terms.variants: ["some", "big", "search", "phrase"] would return one document I'm looking for, but using bool.must.terms.variants: ["some", "search", "phrase"] would return no documents.
On the other hand, if I replace must with should, I get both documents correctly ordered by score, but I need to match only the documents that follow the rule above.
Sorry, this may not answer your question; I'm posting it as an answer because I cannot comment yet.
I don't think you can implement that logic with a single query. The logic you describe has two steps:
1- Find records that match the variants.
2- If no records are returned, find records that don't match the variants.
You need the result of the first step to evaluate the second step.
As far as I understand, an Elasticsearch query is a single step: the query is distributed to all shards holding the data, and each shard searches independently and simply returns its results, i.e. it does not coordinate with other shards to check whether they have matches.
Maybe you can try something with aggregations.
As @dna01 mentioned, you need to send two consecutive requests: the first one to find documents that match the keyword and then, if nothing is found, the second one to find documents that don't match the keyword.
You can avoid the extra latency of the second request by using the Multi Search API:
just send both searches in a single request.
Request body example (let the search phrase be "some big search phrase" and the keyword be "big"):
{ }
{ "query": { "bool": {"must": [{"match": {"name": "document" }}, {"terms": {"variants": ["some", "big", "search", "phrase"]}}] } } }
{ }
{ "query": { "bool": {"must": [{"match": {"name": "document" }}, {"terms": {"variants": ["some", "big", "search", "phrase"]}}], "must_not": [{"terms": {"variants": ["big"]}}] } } }
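As a rough illustration of how a client might drive this, here is a Python sketch that serializes the two searches above into the newline-delimited body that the Multi Search API expects, and then picks the first sub-response that returned any hits. The helper names build_msearch_body and first_non_empty, and the sample response at the end, are made up for illustration; sending the request is left out, so no client library is assumed.

```python
import json


def build_msearch_body(queries):
    """Serialize search bodies as NDJSON: one header line, then one body line, per search."""
    lines = []
    for query in queries:
        lines.append(json.dumps({}))  # empty header: use the index given in the request URL
        lines.append(json.dumps(query))
    return "\n".join(lines) + "\n"    # the body has to end with a newline


def first_non_empty(msearch_response):
    """Return the hits of the first sub-response that matched anything, else an empty list."""
    for response in msearch_response["responses"]:
        hits = response["hits"]["hits"]
        if hits:
            return hits
    return []


name_match = {"match": {"name": "document"}}
phrase_terms = {"terms": {"variants": ["some", "big", "search", "phrase"]}}

with_keyword = {"query": {"bool": {"must": [name_match, phrase_terms]}}}
without_keyword = {"query": {"bool": {
    "must": [name_match, phrase_terms],
    "must_not": [{"terms": {"variants": ["big"]}}],
}}}

body = build_msearch_body([with_keyword, without_keyword])

# Hypothetical response shape for illustration only: the first search found nothing.
fake_response = {"responses": [
    {"hits": {"hits": []}},
    {"hits": {"hits": [{"_id": "2"}]}},
]}
chosen = first_non_empty(fake_response)
```

The sub-responses come back in the same order as the searches, so checking them in order reproduces the "first try with the keyword, then fall back" logic on the client side with a single round trip.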

Elasticsearch ordering by field value which is not in the filter

Can somebody please help me write a query that orders the result items by the value of a field when that field is not part of the query in the request? I have this query:
{
"_source": [
"ico",
"name",
"city",
"status"
],
"sort": {
"_score": "desc",
"status": "asc"
},
"size": 20,
"query": {
"bool": {
"should": [
{
"match": {
"normalized": {
"query": "idona",
"analyzer": "standard",
"boost": 3
}
}
},
{
"term": {
"normalized2": {
"value": "idona",
"boost": 2
}
}
},
{
"match": {
"normalized": "idona"
}
}
]
}
}
}
The result is sorted by the status field in ascending alphabetical order. Status contains a few values such as [active, canceled, old, ...], and I need something like a boost for every possible value in the query, e.g. active boost 5, canceled boost 4, old boost 3, and so on. Is it possible to do this? Thanks.
You would need a custom sort using a script to achieve what you want.
I've used a generic match_all query here; you can add your own query logic there. The solution you are looking for is in the sort section of the query below.
Make sure that status is of type keyword.
Custom Sorting Based on Values
POST <your_index_name>/_search
{
"query":{
"match_all":{
}
},
"sort":[
{ "_score": "desc" },
{
"_script":{
"type":"number",
"script":{
"lang":"painless",
"inline":"if(params.scores.containsKey(doc['status'].value)) { return params.scores[doc['status'].value];} return 100000;",
"params":{
"scores":{
"active":5,
"old":4,
"cancelled":3
}
}
},
"order":"desc"
}
}
]
}
In the above query, add your values to the scores section. For example, if you have a status new and want to give it the value 6, your scores would look like this:
{
"scores":{
"active":5,
"old":4,
"cancelled":3,
"new":6
}
}
So basically the documents are first sorted by _score, and the script sort is then applied as the second sort criterion to documents with equal scores.
Note that the script sort uses desc order because, as I understand it, you want to show active documents at the top, followed by the other values. Feel free to play around with it.
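If it helps to see the ordering outside Elasticsearch, here is a small Python sketch that mimics the two sort criteria on a handful of made-up documents: _score descending first, then the status weight descending. The weights and the 100000 fallback are copied from the script above; the documents and the names sort_key and ordered are only for illustration.

```python
# Weights and fallback mirror the params and the default return value of the script.
STATUS_WEIGHTS = {"active": 5, "old": 4, "cancelled": 3}
DEFAULT_WEIGHT = 100000


def sort_key(doc):
    """Sort by _score descending, then by status weight descending."""
    weight = STATUS_WEIGHTS.get(doc["status"], DEFAULT_WEIGHT)
    return (-doc["_score"], -weight)


# Made-up documents for illustration.
docs = [
    {"id": "a", "_score": 1.0, "status": "old"},
    {"id": "b", "_score": 1.0, "status": "active"},
    {"id": "c", "_score": 2.0, "status": "cancelled"},
    {"id": "d", "_score": 1.0, "status": "unknown"},
]

ordered = [doc["id"] for doc in sorted(docs, key=sort_key)]
print(ordered)  # ['c', 'd', 'b', 'a']
```

Two things stand out: the status weight only decides the order among documents with the same _score, and with desc order the large fallback value puts statuses that are missing from scores ahead of active. If unknown statuses should go last instead, a fallback of 0 would be the obvious choice.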
Hope this helps!

Match exactly query in Kibana demo

I am testing default demo of Kibana: Dashboard [eCommerce] Revenue Dashboard.
When I filter from [eCommerce] Controls, for example, setting the category to Men's Accessories I see other categories on [eCommerce] Sales by Category. How can I change that?
I see that the query is built like:
{
"query": {
"match": {
"category.keyword": {
"query": "Men's Accessories",
"type": "phrase"
}
}
}
}
So this translates to:
"query": {
"bool": {
"must": [
{
"match_phrase": {
"category.keyword": {
"query": "Men's Accessories"
}
}
},
How can I change this demo to show exactly the category that I selected?
EDIT:
I'm not looking for a one-off workaround that only works once.
I want to show only one category with a single filter, not one filter plus, for example, three negations. If I change the category to another one, say "Women's Shoes", I want to see only that category by applying just the one filter I chose from the dashboard, not a custom-made filter that I type by hand.
I want to build a visualization that, when the filter is applied, shows exactly one category, not 4 as it does right now.
EDIT:
I created two documents (with brand new Men's TEST_NEW_CATEGORY) in Kibana Dev Tools section with this:
POST kibana_sample_data_ecommerce/_doc/
{
"category": [
"Men's TEST_NEW_CATEGORY"
],
"currency": "EUR",
"customer_first_name": "Youssef",
"customer_full_name": "Youssef Jensen",
"customer_gender": "MALE",
"customer_id": 31,
"customer_last_name": "Jensen",
"customer_phone": "",
"day_of_week": "Saturday",
"day_of_week_i": 5,
"email": "youssef@jensen-family.zzz",
"manufacturer": [
"Low Tide Media"
],
"order_date": "2019-05-15T23:45:36+00:00",
"order_id": 592109,
"products": [
{
"base_price": 49.99,
"discount_percentage": 0,
"quantity": 1,
"manufacturer": "Low Tide Media",
"tax_amount": 0,
"product_id": 12202,
"category": "Men's TEST_NEW_CATEGORY",
"sku": "ZO0396603966",
"taxless_price": 49.99,
"unit_discount_amount": 0,
"min_price": 26.49,
"_id": "sold_product_592109_12202",
"discount_amount": 0,
"created_on": "2016-12-31T23:45:36+00:00",
"product_name": "Moccasins - stone",
"price": 49.99,
"taxful_price": 49.99,
"base_unit_price": 49.99
},
{
"base_price": 28.99,
"discount_percentage": 0,
"quantity": 1,
"manufacturer": "Low Tide Media",
"tax_amount": 0,
"product_id": 15017,
"category": "Men's Clothing",
"sku": "ZO0452704527",
"taxless_price": 28.99,
"unit_discount_amount": 0,
"min_price": 13.63,
"_id": "sold_product_592109_15017",
"discount_amount": 0,
"created_on": "2016-12-31T23:45:36+00:00",
"product_name": "Jumper - off-white",
"price": 28.99,
"taxful_price": 28.99,
"base_unit_price": 28.99
}
],
"sku": [
"ZO0396603966",
"ZO0452704527"
],
"taxful_total_price": 78.98,
"taxless_total_price": 78.98,
"total_quantity": 2,
"total_unique_products": 2,
"type": "order",
"user": "youssef",
"geoip": {
"country_iso_code": "US",
"location": {
"lon": -74,
"lat": 40.8
},
"region_name": "New York",
"continent_name": "North America",
"city_name": "New York"
}
}
Then I clearly see that there is only one category when using the standard filter button:
If you want to filter by a specific category, use a filter instead of terms. A terms aggregation is applied to all the values in the field. Also, notice that I disabled the terms aggregation using the toggle; I'm using only a filter for the use case you mentioned.
You can set it up like below:
You can also add multiple filters like below:
Another option is to add a scripted field and visualize on it, but this case is straightforward (a single field), so that is not required. Scripted fields can be useful when collating data from several fields.
Updated answer:
The demo is restrictive: you cannot change the data or play around with analyzers. In a real setup, though, you could filter the array down to the exact value you want and then apply the filter.
For now, if you filter the array using Dev Tools in Kibana, you will see that it returns all documents that match your category. A document may or may not have extra categories, but it will always have the one you are looking for.
Hence, when you add a filter, you still see other categories in the visualization, since that is how the data is.
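The effect is easy to reproduce outside Kibana. Below is a Python sketch with a few made-up orders whose category field is an array, as in the sample data. The filter keeps every order that contains the selected category, and the bucket counts are then taken over all categories of the remaining orders, which is roughly what a terms aggregation does. The function names and documents are only for illustration.

```python
from collections import Counter

# Made-up orders for illustration; category is an array, as in the eCommerce sample data.
orders = [
    {"id": 1, "category": ["Men's Accessories", "Men's Clothing"]},
    {"id": 2, "category": ["Men's Accessories"]},
    {"id": 3, "category": ["Women's Shoes"]},
    {"id": 4, "category": ["Men's Accessories", "Men's Shoes"]},
]


def filter_by_category(docs, wanted):
    """Keep every document whose category array contains the wanted value."""
    return [doc for doc in docs if wanted in doc["category"]]


def terms_aggregation(docs):
    """Count each category value across the documents, one bucket per value."""
    counts = Counter()
    for doc in docs:
        counts.update(doc["category"])
    return dict(counts)


matched = filter_by_category(orders, "Men's Accessories")
buckets = terms_aggregation(matched)
print(buckets)
# {"Men's Accessories": 3, "Men's Clothing": 1, "Men's Shoes": 1}
```

Order 3 is filtered out, but orders 1 and 4 bring their other categories along, so the chart still shows more than one bucket even though the filter worked as intended.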
This is how I solved it. I had to use 4 filters just to see only Men's Clothing.
The image might be too small to read, so I will write them down.
The first filter :
{
"query": {
"match": {
"category.keyword": {
"query": "Men's Clothing",
"type": "phrase"
}
}
}
}
The second filter:
{
"query": {
"match": {
"category.keyword": {
"query": "Women's Accessories",
"type": "phrase"
}
}
}
}
The third filter:
{
"query": {
"match": {
"category.keyword": {
"query": "Men's Shoes",
"type": "phrase"
}
}
}
}
The fourth filter:
{
"query": {
"match": {
"category.keyword": {
"query": "Men's Accessories",
"type": "phrase"
}
}
}
}
I don't know why, but viewing the query as DSL doesn't show the "is not" operator, so let me show you an image of the "is not" operator.
Then you will see the whole dashboard with only one category, like below.
I have just realized you wanted to see only Men's Accessories, not Men's Clothing. My poor eyesight. However, I hope you get the rough idea. If you want, I can do men's clothing. Let me know. By the way, I tried to do it with just one filter, but somehow the graph wouldn't work.

Umlaut in Elastic Suggesters

I am currently trying to set up a suggester similar to the google misspelling correction. I am using the Elastic Suggesters with the following query:
{
"query": {
"match": {
"name": "iphone hüle"
}
},
"suggest": {
"suggest_name": {
"text": "iphone hüle",
"term": {
"field": "name"
}
}
}
}
It results the following suggestions:
"suggest": {
"suggest_name": [
{
"text": "iphone",
"offset": 0,
"length": 6,
"options": []
},
{
"text": "hule",
"offset": 7,
"length": 4,
"options": [
{
"text": "hulle",
"score": 0.75,
"freq": 162
},
...
{
"text": "hulk",
"score": 0.75,
"freq": 38
}
]
}
]
}
Now the problem I have is in the returned text inside the options and inside the suggest. The text I submitted and the returned text should be "hüle" not "hule". Furthermore the returned option text should actually be "hülle" and not "hulle". As I use the same fields for the query and the suggester I wonder why the umlauts are only missing in the suggester and not in the regular query results.
See a query result here:
"_source": {
...
"name": "Ladegerät für iPhone",
"manufacturer": "Apple",
}
The data you get back in your query result, i.e.
"name": "Ladegerät für iPhone"
is the stored content of the field. It is exactly your source data. Search and obviously also the suggester, however, work on the inverted index, which contains tokens massaged by the analyzer. You are most likely using an analyzer that folds umlauts.
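To illustrate the difference between the stored source and the indexed tokens, here is a rough Python approximation of an analyzer that lowercases and folds diacritics. It uses Unicode decomposition from the standard library and is only a stand-in for whatever analyzer your index really uses (that part is an assumption); the function names fold_to_ascii and analyze are made up.

```python
import unicodedata


def fold_to_ascii(text: str) -> str:
    """Strip diacritics by decomposing characters and dropping the combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def analyze(text: str) -> list:
    """Very rough analyzer stand-in: split on whitespace, lowercase, fold diacritics."""
    return [fold_to_ascii(token.lower()) for token in text.split()]


source = "Ladegerät für iPhone"
print(analyze(source))         # ['ladegerat', 'fur', 'iphone']
print(analyze("iphone hüle"))  # ['iphone', 'hule']
```

The source string keeps its umlauts, while the tokens do not. The term suggester only sees tokens like these, which would explain why it reports "hule" and offers "hulle".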
Strangely enough, I discussed this with a colleague yesterday. We came to the conclusion that we may need a separate field, indexed and not stored, into which we index the non-normalized tokens. We want to use it to fetch suggestion terms. As a bonus, we could perform exact searches on it, i.e. searches that distinguish between Müller and Mueller, Foto and Photo, Rene and René.
