Umlaut in Elastic Suggesters - elasticsearch

I am currently trying to set up a suggester similar to Google's misspelling correction. I am using the Elastic suggesters with the following query:
{
"query": {
"match": {
"name": "iphone hüle"
}
},
"suggest": {
"suggest_name": {
"text": "iphone hüle",
"term": {
"field": "name"
}
}
}
}
It returns the following suggestions:
"suggest": {
"suggest_name": [
{
"text": "iphone",
"offset": 0,
"length": 6,
"options": []
},
{
"text": "hule",
"offset": 7,
"length": 4,
"options": [
{
"text": "hulle",
"score": 0.75,
"freq": 162
},
...
{
"text": "hulk",
"score": 0.75,
"freq": 38
}
]
}
]
}
The problem is the text returned in the suggest entries and in their options. The text I submitted should come back as "hüle", not "hule", and the suggested option should be "hülle", not "hulle". Since I use the same field for the query and the suggester, I wonder why the umlauts are missing only in the suggester output and not in the regular query results.
See a query result here:
"_source": {
...
"name": "Ladegerät für iPhone",
"manufacturer": "Apple",
}

The data you get back in your query result, i.e.
"name": "Ladegerät für iPhone"
is the stored content of the field. It is exactly your source data. Search and obviously also the suggester, however, work on the inverted index, which contains tokens massaged by the analyzer. You are most likely using an analyzer that folds umlauts.
Strangely enough, I discussed this with a colleague yesterday. We concluded that we may need a separate field, indexed but not stored, into which we index the non-normalized tokens, and that we would use it to fetch suggestion terms. As a bonus, it would allow exact searches, i.e. searches that distinguish between Müller and Mueller, Foto and Photo, Rene and René.
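To illustrate the folding described above, here is a minimal Python sketch. The `fold` helper is hypothetical and only approximates what an ASCII-folding token filter does (it is not Elasticsearch's implementation). The `name.raw_terms` sub-field, the `unfolded` analyzer name, and the typeless (7.x-style) mapping layout are likewise assumptions chosen for illustration.

```python
import unicodedata

def fold(token: str) -> str:
    """Roughly imitate ASCII folding: lowercase, then strip combining marks."""
    decomposed = unicodedata.normalize("NFD", token.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

# What a folding analyzer writes to the inverted index, and therefore
# what the term suggester sees and hands back.
print(fold("hüle"))   # hule
print(fold("Hülle"))  # hulle

# Hypothetical mapping: the main field keeps whatever folding analyzer it
# uses today (omitted here); the sub-field only lowercases, so umlauts survive.
index_body = {
    "settings": {"analysis": {"analyzer": {"unfolded": {
        "type": "custom", "tokenizer": "standard", "filter": ["lowercase"]}}}},
    "mappings": {"properties": {"name": {
        "type": "text",
        "fields": {"raw_terms": {"type": "text", "analyzer": "unfolded"}}}}},
}

# Point the suggester at the unfolded sub-field instead of "name".
suggest_body = {"suggest": {"suggest_name": {
    "text": "iphone hüle", "term": {"field": "name.raw_terms"}}}}
```

With a setup along these lines, the regular query can keep using the folded `name` field while the suggester reads terms that still contain "ü".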

Related

How can I add "context" to an ElasticSearch Query e.g. phrase prefix + fuzzy search

At the moment I'm doing a full text search of HTML documents using a multi_match query with phrase_prefix. The problem I'm trying to solve is that the phrases relate to places. E.g. "The Tower" or "Gold Casino". Now, the same place name may exist in several cities/locations. So I'd like to add "context" to the search by specifying a city name for instance. E.g. "Gold Casino New York".
I believe using just phrase_prefix won't work, unless I do something like increasing slop value, but then a search like "The Blue Tower New York" could be valid when I just want "The Tower" specifically.
Is there a way to combine phrase_prefix with a fuzzy search of sorts? Here is my current query for examination:
{
"_source": false,
"fields": [
{
"field": "source"
},
{
"field": "title"
}
],
"from": 0,
"highlight": {
"fragment_size": 200,
"number_of_fragments": 1,
"fields": {
"seoDescription": {},
"paragraphContent": {}
}
},
"query": {
"multi_match": {
"fields": [
"seoDescription^3",
"paragraphContent"
],
"query": "The Tower New York",
"type": "phrase_prefix"
}
},
"size": 20,
"track_total_hits": true
}

Match exactly query in Kibana demo

I am testing the default Kibana demo dashboard: [eCommerce] Revenue Dashboard.
When I filter from [eCommerce] Controls, for example by setting the category to Men's Accessories, I still see other categories in [eCommerce] Sales by Category. How can I change that?
I see that the query is built like:
{
"query": {
"match": {
"category.keyword": {
"query": "Men's Accessories",
"type": "phrase"
}
}
}
}
So this translates to:
"query": {
"bool": {
"must": [
{
"match_phrase": {
"category.keyword": {
"query": "Men's Accessories"
}
}
},
How can I change this demo to show exactly the category that I selected?
EDIT:
I'm not looking for a one-off workaround that only works once.
I want to show only one category using a single filter, not one filter plus, for example, three negations. If I change my category to another one, say "Women's Shoes", I want to see only that category by applying just the one filter that I chose from the dashboard, not some custom-made filter built by typing in words.
I want to make a visualization that, when the filter is applied, shows exactly one category, not 4 like right now.
EDIT:
I created two documents (with brand new Men's TEST_NEW_CATEGORY) in Kibana Dev Tools section with this:
POST kibana_sample_data_ecommerce/_doc/
{
"category": [
"Men's TEST_NEW_CATEGORY"
],
"currency": "EUR",
"customer_first_name": "Youssef",
"customer_full_name": "Youssef Jensen",
"customer_gender": "MALE",
"customer_id": 31,
"customer_last_name": "Jensen",
"customer_phone": "",
"day_of_week": "Saturday",
"day_of_week_i": 5,
"email": "youssef#jensen-family.zzz",
"manufacturer": [
"Low Tide Media"
],
"order_date": "2019-05-15T23:45:36+00:00",
"order_id": 592109,
"products": [
{
"base_price": 49.99,
"discount_percentage": 0,
"quantity": 1,
"manufacturer": "Low Tide Media",
"tax_amount": 0,
"product_id": 12202,
"category": "Men's TEST_NEW_CATEGORY",
"sku": "ZO0396603966",
"taxless_price": 49.99,
"unit_discount_amount": 0,
"min_price": 26.49,
"_id": "sold_product_592109_12202",
"discount_amount": 0,
"created_on": "2016-12-31T23:45:36+00:00",
"product_name": "Moccasins - stone",
"price": 49.99,
"taxful_price": 49.99,
"base_unit_price": 49.99
},
{
"base_price": 28.99,
"discount_percentage": 0,
"quantity": 1,
"manufacturer": "Low Tide Media",
"tax_amount": 0,
"product_id": 15017,
"category": "Men's Clothing",
"sku": "ZO0452704527",
"taxless_price": 28.99,
"unit_discount_amount": 0,
"min_price": 13.63,
"_id": "sold_product_592109_15017",
"discount_amount": 0,
"created_on": "2016-12-31T23:45:36+00:00",
"product_name": "Jumper - off-white",
"price": 28.99,
"taxful_price": 28.99,
"base_unit_price": 28.99
}
],
"sku": [
"ZO0396603966",
"ZO0452704527"
],
"taxful_total_price": 78.98,
"taxless_total_price": 78.98,
"total_quantity": 2,
"total_unique_products": 2,
"type": "order",
"user": "youssef",
"geoip": {
"country_iso_code": "US",
"location": {
"lon": -74,
"lat": 40.8
},
"region_name": "New York",
"continent_name": "North America",
"city_name": "New York"
}
}
Then I clearly see that there is only one category when using the standard filter button:
If you want to filter by a specific category, use a filter instead of terms. A terms aggregation is applied to all the values in the field. Also, notice that I disabled the terms aggregation using the toggle; I'm using only a filter for the use case you mentioned.
You can set it up like below:
You can also add multiple filters like below:
Another option is to add a scripted field and visualize on it, but this case is pretty straightforward (it involves only one field), so that is not required. Scripted fields can be useful, however, when collating data from several fields.
Updated answer:
The demo is restrictive: you cannot change the data or play around with analyzers. In a real setup, though, you can filter an array by exact match on the value you want and then apply the filter.
For now, if you filter the array using Dev Tools in Kibana, you will see that it returns all documents that contain your category. A document may or may not have extra categories, but it will always have the one you are looking for.
Hence, when you add a filter you still see other categories in the visualization, since that is how the data is.
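A small Python sketch of why this happens, using made-up orders (the `orders` list and its `order_id` values are hypothetical; only the array-valued `category` field mirrors the sample data):

```python
from collections import Counter

# Hypothetical orders; as in the sample data, "category" is an array.
orders = [
    {"order_id": 1, "category": ["Men's Accessories", "Men's Clothing"]},
    {"order_id": 2, "category": ["Men's Accessories"]},
    {"order_id": 3, "category": ["Women's Shoes", "Men's Shoes"]},
]

selected = "Men's Accessories"

# The filter keeps whole documents whose array contains the selected value.
matching = [o for o in orders if selected in o["category"]]

# A terms aggregation then counts every value of the field in those documents,
# so categories that merely co-occur with the selected one still get a bucket.
buckets = Counter(c for o in matching for c in o["category"])
print(dict(buckets))
```

Order 1 passes the filter because it contains the selected category, and its second category then shows up as its own bucket.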
This is how I solved it. I had to use 4 filters only to see Men's Clothing.
The image might be too small to recognize, so I will write them down.
The first filter :
{
"query": {
"match": {
"category.keyword": {
"query": "Men's Clothing",
"type": "phrase"
}
}
}
}
The second filter:
{
"query": {
"match": {
"category.keyword": {
"query": "Women's Accessories",
"type": "phrase"
}
}
}
}
The third filter:
{
"query": {
"match": {
"category.keyword": {
"query": "Men's Shoes",
"type": "phrase"
}
}
}
}
The fourth filter:
{
"query": {
"match": {
"category.keyword": {
"query": "Men's Accessories",
"type": "phrase"
}
}
}
}
I don't know why, but when I view the query as DSL it doesn't show the "is not" operator, so let me show you an image of the "is not" operator.
Then you will see the whole dashboard with only one category, like below.
Now I just realized you only wanted to see Men's Accessories, not Men's Clothing. My poor eyesight. However, I hope you got the rough idea. If you want, I can do men's clothing. Let me know. By the way, I tried to do it with just one filter, but somehow the graph wouldn't work.

Elastic search relevance search for parent-child relation types

I am working with Elasticsearch and need to build a relevance search query for my requirements, but I am stuck.
Here is how I insert the data; my records use a parent-child mapping:
PUT /myindex
{ "mappings": {
"my_registration": {},
"my_specialities": {
"_parent": {
"type": "my_registration"
}
} } }
PUT myindex/my_registration/100
{
"Pid": "100",
"Name": "name1",
"Age": "28" }
PUT myindex/my_registration/200
{
"Pid": "200",
"Name": "name2",
"Age": "28" }
PUT myindex/my_registration/300
{
"Pid": "300",
"Name": "name3",
"Age": "28" }
PUT myindex/my_specialities/1?parent=100
{ "Pid": "100", "speciality_name": "Orthopedic Surgeon"
}
PUT myindex/my_specialities/2?parent=200
{ "Pid": "200", "speciality_name": "Orthopedic"
}
PUT myindex/my_specialities/3?parent=300
{ "Pid": "300", "speciality_name": "Surgeon"
}
These are my scenarios:
1- I need to search speciality_name as Orthopedic Surgeon
2- I need to search speciality_name as Surgeon
3- I need to search speciality_name as Orthopedic
4- I need to search speciality_name as Orth etc
5- I need to search speciality_name as Orthop
6- I need to search speciality_name as Orthoepdic
See my sample query below. I expect to get results for all of the above cases (records should come back with a relevance score), but I get an empty result for cases 4 and 5.
/myindex/my_registration/_search
{ "query": { "bool": { "must": [{
"has_child": {
"type": "my_specialities",
"query": {
"fuzzy": {
"speciality_name": {
"value": "Orthop",
"boost": 1.0,
"fuzziness": 2,
"prefix_length": 0,
"max_expansions": 100
}
}
},
"inner_hits": {}
} }] } } }
Please note that I need to return results from the type my_registration, while the search itself is applied to the type my_specialities.
Any suggestions, please? Thanks in advance, and sorry for the lengthy question :)
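You are fixing the fuzziness to 2. According to the documentation, fuzziness is the Levenshtein edit distance, so the query only matches terms that need at most two single-character edits (insertions, deletions or substitutions) to become the value in the query.
If you search for Orthop, you need to add at least 4 characters to reach the closest term (Orthopedic). That is why you get an empty result for cases 4 and 5. Note that Elasticsearch does not accept a fuzziness above 2, and AUTO never exceeds 2 either, so raising the value will not help here; for partial input such as Orth or Orthop, a prefix-style query (for example prefix or match_phrase_prefix) is a better fit.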
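The edit-distance argument can be checked with a short sketch. The `levenshtein` helper below is written for illustration (plain Levenshtein, without transpositions), and the terms are compared in lowercase on the assumption that the field is indexed with an analyzer that lowercases tokens.

```python
def levenshtein(a: str, b: str) -> int:
    """Plain Levenshtein distance: insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]

# Cases 4, 5 and 6 from the question against the indexed term.
for typed in ["orth", "orthop", "orthoepdic"]:
    print(typed, levenshtein(typed, "orthopedic"))
```

This prints distances of 6, 4 and 2 respectively, so only the misspelling in case 6 falls within a fuzziness of 2.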

How to enable elasticsearch auto-complete return only matching word

I need to implement auto-complete, but I'm not sure about the exact strategy. For example, I have the following product:
Highsound Smart Phone Watch for Android (Gray)
When the user starts typing "s", "sm", "smar", I need the word "smart" or "smart watch" to come up rather than the whole phrase: Highsound Smart Phone Watch for Android (Gray)
I looked at how Google, Amazon, etc. do it: they don't display the whole matching record, but only the word ("smart") or a phrase ("smart watch").
Right now I have enabled autocomplete in Elasticsearch according to the following link, but it returns the whole name of the matching record.
https://www.elastic.co/guide/en/elasticsearch/guide/current/_index_time_search_as_you_type.html
Any suggestions?
This is expected. You get back what is inside _source field. You can use highlighting to get back only the word that was matched.
{
"query": {
"match_phrase": {
"name": "sm"
}
},
"highlight": {
"fields": {
"name": {
"fragment_size": 1,
"number_of_fragments": 2
}
}
}
}
I used number_of_fragments: 2 in case more than one word starts with sm. You can also change the fragment size according to your needs. You will get something like this, and you can then use the highlight part in the frontend.
"hits": {
"total": 1,
"max_score": 0.6349302,
"hits": [
{
"_index": "my_index",
"_type": "my_type",
"_id": "5",
"_score": 0.6349302,
"_source": {
"name": "Highsound Smart Phone Watch for Android (Gray)"
},
"highlight": {
"name": [
" <em>Smart</em>"
]
}
}
]
}
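On the client side, the highlighted words can be pulled out of such a response with a few lines. This is only a sketch: `highlighted_terms` is a hypothetical helper, the `response` dict is a trimmed copy of the output above, and the `<em>` tags are assumed to be the default highlight tags.

```python
import re

# Trimmed copy of the response shown above.
response = {"hits": {"hits": [{
    "_source": {"name": "Highsound Smart Phone Watch for Android (Gray)"},
    "highlight": {"name": [" <em>Smart</em>"]},
}]}}

EM_TAG = re.compile(r"<em>(.*?)</em>")

def highlighted_terms(resp: dict, field: str) -> list:
    """Collect the distinct highlighted words for `field`, in first-seen order."""
    seen = {}
    for hit in resp["hits"]["hits"]:
        for fragment in hit.get("highlight", {}).get(field, []):
            for term in EM_TAG.findall(fragment):
                seen.setdefault(term, None)
    return list(seen)

print(highlighted_terms(response, "name"))  # ['Smart']
```

The returned list is what would be shown as auto-complete suggestions instead of the full `_source` name.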

Elastic search aggregation sum

I'm using Elasticsearch 1.0.2 and I want to run a search on it using a query with aggregation functions like sum().
Suppose a single record looks like this:
{
"_index": "outboxpro",
"_type": "message",
"_id": "PAyEom_mRgytIxRUCdN0-w",
"_score": 4.5409594,
"_source": {
"team_id": "1bf5f3f968e36336c9164290171211f3",
"created_user": "1a9d05586a8dc3f29b4c8147997391f9",
"created_ip": "192.168.2.245",
"folder": 1,
"report": [
{
"networks": "ec466c09fd62993ade48c6c4bb8d2da7facebook",
"status": 2,
"info": "OK"
},
{
"networks": "bdc33d8ca941b8f00c2a4e046ba44761twitter",
"status": 2,
"info": "OK"
},
{
"networks": "ad2672a2361d10eacf8a05bd1b10d4d8linkedin",
"status": 5,
"info": "[unauthorized] Invalid or expired token."
}
]
}
}
Let's say I need the count of all successfully posted messages, i.e. entries with status = 2 in the report field. There will be many records in the collection, and I want a report covering all successful posts.
I have tried the following query:
Edit:
{
"size": 2000,
"query": {
"filtered": {
"query": {
"match": {
"team_id": {
"query": "1bf5f3f968e36336c9164290171211f3"
}
}
}
}
},
"aggs": {
"genders": {
"terms": {
"field": "report.status"
}
}
}
}
Please help me find a solution. I am a newbie with Elasticsearch. Is there another aggregation method for this? Your help is much appreciated.
Your script filter is slow on big data and doesn't take advantage of the index. Have you considered parent/child instead of nested? With parent/child you could use aggregations natively and calculate the sum.
You will have to make use of nested mappings here. Do have a look at https://www.elastic.co/guide/en/elasticsearch/guide/current/nested-mapping.html.
And then you will have to do aggregation on nested fields as in https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-nested-aggregation.html.
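As a rough sketch of that approach, the request bodies could be built like this in Python. The aggregation names `reports` and `success` and the `success_count` helper are hypothetical, the mapping assumes the 1.x-style typed layout for the `message` type from the question, and existing documents would need to be reindexed after the mapping change.

```python
import json

# Assumption: "report" is re-mapped as a nested type so that each entry
# is indexed as its own hidden document.
mapping = {"message": {"properties": {"report": {
    "type": "nested",
    "properties": {"status": {"type": "integer"}},
}}}}

# "reports" and "success" are hypothetical aggregation names.
search_body = {
    "size": 0,  # only the aggregation result is needed, not the hits
    "query": {"match": {"team_id": "1bf5f3f968e36336c9164290171211f3"}},
    "aggs": {"reports": {
        "nested": {"path": "report"},
        "aggs": {"success": {"filter": {"term": {"report.status": 2}}}},
    }},
}

def success_count(response: dict) -> int:
    """Read the number of report entries with status 2 from a search response."""
    return response["aggregations"]["reports"]["success"]["doc_count"]

print(json.dumps(search_body, indent=2))
```

The `doc_count` of the inner filter bucket counts individual report entries, not messages, which matches the goal of counting every successful post.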

Resources