Match exactly query in Kibana demo - elasticsearch

I am testing the default Kibana demo: the [eCommerce] Revenue Dashboard.
When I filter from [eCommerce] Controls, for example by setting the category to Men's Accessories, I still see other categories in [eCommerce] Sales by Category. How can I change that?
I see that the query is built like this:
{
"query": {
"match": {
"category.keyword": {
"query": "Men's Accessories",
"type": "phrase"
}
}
}
}
So this translates to:
"query": {
"bool": {
"must": [
{
"match_phrase": {
"category.keyword": {
"query": "Men's Accessories"
}
}
},
How can I change this demo to show exactly the category that I selected?
Example screen:
EDIT:
I'm not looking for a workaround that only works once.
I want to show only one category using a single filter, not one filter plus, for example, three negations. If I change the category to another one, say "Women's Shoes", I want to see only that category by applying just the one filter I chose from the dashboard, not a custom filter built by typing in values.
I want a visualization that, when the filter is applied, shows exactly one category, not four as it does right now.
EDIT:
I created two documents (with a brand-new category, Men's TEST_NEW_CATEGORY) in the Kibana Dev Tools section with this:
POST kibana_sample_data_ecommerce/_doc/
{
"category": [
"Men's TEST_NEW_CATEGORY"
],
"currency": "EUR",
"customer_first_name": "Youssef",
"customer_full_name": "Youssef Jensen",
"customer_gender": "MALE",
"customer_id": 31,
"customer_last_name": "Jensen",
"customer_phone": "",
"day_of_week": "Saturday",
"day_of_week_i": 5,
"email": "youssef#jensen-family.zzz",
"manufacturer": [
"Low Tide Media"
],
"order_date": "2019-05-15T23:45:36+00:00",
"order_id": 592109,
"products": [
{
"base_price": 49.99,
"discount_percentage": 0,
"quantity": 1,
"manufacturer": "Low Tide Media",
"tax_amount": 0,
"product_id": 12202,
"category": "Men's TEST_NEW_CATEGORY",
"sku": "ZO0396603966",
"taxless_price": 49.99,
"unit_discount_amount": 0,
"min_price": 26.49,
"_id": "sold_product_592109_12202",
"discount_amount": 0,
"created_on": "2016-12-31T23:45:36+00:00",
"product_name": "Moccasins - stone",
"price": 49.99,
"taxful_price": 49.99,
"base_unit_price": 49.99
},
{
"base_price": 28.99,
"discount_percentage": 0,
"quantity": 1,
"manufacturer": "Low Tide Media",
"tax_amount": 0,
"product_id": 15017,
"category": "Men's Clothing",
"sku": "ZO0452704527",
"taxless_price": 28.99,
"unit_discount_amount": 0,
"min_price": 13.63,
"_id": "sold_product_592109_15017",
"discount_amount": 0,
"created_on": "2016-12-31T23:45:36+00:00",
"product_name": "Jumper - off-white",
"price": 28.99,
"taxful_price": 28.99,
"base_unit_price": 28.99
}
],
"sku": [
"ZO0396603966",
"ZO0452704527"
],
"taxful_total_price": 78.98,
"taxless_total_price": 78.98,
"total_quantity": 2,
"total_unique_products": 2,
"type": "order",
"user": "youssef",
"geoip": {
"country_iso_code": "US",
"location": {
"lon": -74,
"lat": 40.8
},
"region_name": "New York",
"continent_name": "North America",
"city_name": "New York"
}
}
Then I can clearly see that there is only one category when using the standard filter button:

If you want to filter by a specific category, use a filter instead of terms. A terms aggregation is applied to all the values in the field. Also, notice that I disabled the terms aggregation using the toggle; I'm using only a filter for the use case you mentioned.
You can set it up like below:
You can also add multiple filters like below:
Another option is to add a scripted field and visualize on it, but this case is pretty straightforward (the data comes from one field), so that is not required. However, when collating data from several fields, scripted fields can be useful.
Updated answer:
The demo is restrictive: you cannot change the data or play around with analyzers. But in reality you can filter an array by an exact match for the value you want and then apply the filter.
For now, if you filter the array using Dev Tools in Kibana, you will see that it returns all documents that match your category. A document may or may not have extra categories, but it will always have the one you are looking for.
Hence, when you add a filter you still see other categories in the visualization, since that's how the data is.
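To see why other categories remain after filtering, here is a minimal in-memory sketch in Python. The three sample orders and the helper names are made up for illustration and are not taken from the Kibana sample data. The point is only that a filter on a multi-valued field selects whole documents, and a terms-style count then runs over every value those documents carry, while a filters-style bucket counts only the value you asked for.

```python
from collections import Counter

# Hypothetical orders; `category` is an array, as in the sample data set.
ORDERS = [
    {"order_id": 1, "category": ["Men's Accessories", "Men's Clothing"]},
    {"order_id": 2, "category": ["Men's Accessories"]},
    {"order_id": 3, "category": ["Women's Shoes", "Women's Clothing"]},
]


def filter_by_category(orders, wanted):
    """Keep every order whose category array contains `wanted`."""
    return [o for o in orders if wanted in o["category"]]


def terms_counts(orders):
    """Count each category value across the given orders (terms-style buckets)."""
    return Counter(c for o in orders for c in o["category"])


def filters_count(orders, wanted):
    """Count only orders matching `wanted` (a single filters-style bucket)."""
    return {wanted: len(filter_by_category(orders, wanted))}


selected = filter_by_category(ORDERS, "Men's Accessories")
print(terms_counts(selected))   # still contains "Men's Clothing"
print(filters_count(selected, "Men's Accessories"))
```

Order 1 matches the filter but also carries Men's Clothing, so the terms-style count keeps a bucket for it.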

This is how I solved it. I had to use four filters just to see only Men's Clothing.
The image might be too small to read, so I will write them out.
The first filter:
{
"query": {
"match": {
"category.keyword": {
"query": "Men's Clothing",
"type": "phrase"
}
}
}
}
The second filter:
{
"query": {
"match": {
"category.keyword": {
"query": "Women's Accessories",
"type": "phrase"
}
}
}
}
The third filter:
{
"query": {
"match": {
"category.keyword": {
"query": "Men's Shoes",
"type": "phrase"
}
}
}
}
The fourth filter:
{
"query": {
"match": {
"category.keyword": {
"query": "Men's Accessories",
"type": "phrase"
}
}
}
}
I don't know why, but when I view the query as DSL it doesn't show the "is not" operator, so let me show you an image of the "is not" operator.
Then you will see the whole dashboard with only one category, like below.
I just realized you wanted to see only Men's Accessories, not Men's Clothing. My poor eyesight. However, I hope you get the rough idea. If you want, I can do men's clothing. Let me know. By the way, I tried to do it with just one filter, but somehow the graph wouldn't work.
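As a rough sketch of what the four filters add up to, the Python snippet below builds a single bool query. It rests on two assumptions that the answer does not spell out: that the first filter is the positive one and the other three are negated, and that Kibana places negated ("is not") filters in a must_not clause. The function names are hypothetical.

```python
import json


def match_phrase(field, value):
    """Build one match_phrase clause, like the filters listed above."""
    return {"match_phrase": {field: {"query": value}}}


def only_category_query(keep, exclude, field="category.keyword"):
    """Combine one positive filter with negated ("is not") filters."""
    return {
        "query": {
            "bool": {
                "filter": [match_phrase(field, keep)],
                # Assumption: negated Kibana filters map to must_not.
                "must_not": [match_phrase(field, value) for value in exclude],
            }
        }
    }


query = only_category_query(
    keep="Men's Clothing",
    exclude=["Women's Accessories", "Men's Shoes", "Men's Accessories"],
)
print(json.dumps(query, indent=2))
```

Note that this drops every order that also contains one of the excluded categories, so it narrows the data rather than changing how the chart buckets it.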

Related

How can I add "context" to an ElasticSearch Query e.g. phrase prefix + fuzzy search

At the moment I'm doing a full text search of HTML documents using a multi_match query with phrase_prefix. The problem I'm trying to solve is that the phrases relate to places. E.g. "The Tower" or "Gold Casino". Now, the same place name may exist in several cities/locations. So I'd like to add "context" to the search by specifying a city name for instance. E.g. "Gold Casino New York".
I believe using just phrase_prefix won't work, unless I do something like increasing slop value, but then a search like "The Blue Tower New York" could be valid when I just want "The Tower" specifically.
Is there a way to combine phrase_prefix with a fuzzy search of sorts? Here is my current query for examination:
{
"_source": false,
"fields": [
{
"field": "source"
},
{
"field": "title"
}
],
"from": 0,
"highlight": {
"fragment_size": 200,
"number_of_fragments": 1,
"fields": {
"seoDescription": {},
"paragraphContent": {}
}
},
"query": {
"multi_match": {
"fields": [
"seoDescription^3",
"paragraphContent"
],
"query": "The Tower New York",
"type": "phrase_prefix"
}
},
"size": 20,
"track_total_hits": true
}

Search for two fields but only score once in Elasticsearch

Let's say I have these documents in Elasticsearch:
{
"display_name": "Jose Cummings",
"username": "josecummings"
},
{
"display_name": "Jose Ramirez",
"username": "elite_gamer"
},
{
"display_name": "Lance Abrams",
"username": "abrams1"
},
{
"display_name": "Steve Smith",
"username": "josesmose"
}
I want to run an "as you type" search for Jose that searches against both the display_name and the username fields, which I can do with this:
{
"query": {
"bool": {
"must": {
"multi_match": {
"fields": [
"display_name",
"username"
],
"query": "Jose",
"type": "bool_prefix",
"fuzziness": "AUTO",
"boost": 50
}
}
}
}
}
The issue here is that when I search for Jose, Jose Cummings gets 100 points while Jose Ramirez and Steve Smith only get 50 points, because it seems to sum the scores for the two fields. This essentially rewards a user for having the same display_name as username, which we do not want to happen.
Is there a way to only take the max score from the two fields? I've tried dozens of different combinations now using function_score, boost_mode/score_mode, constant_score, trying to do a should match with multiple match_bool_prefix queries, etc. Nothing I've tried seems to achieve this.
Try this:
{
"query": {
"bool": {
"must": [
{
"multi_match": {
"fields": [
"display_name^50",
"username^50"
],
"query": "Jose",
"type": "bool_prefix",
"fuzziness": "AUTO",
"tie_breaker": 0.3
}
}
]
}
}
}
Notice the effects of the tie_breaker being set to 0.0 as opposed to 0<x<1 and x=1.
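The effect of tie_breaker can be sketched with a few lines of Python. This assumes the dis_max-style combination (best field score plus tie_breaker times the sum of the remaining field scores) and reuses the 50-point figures from the question; the function name is made up for illustration.

```python
def combine_field_scores(field_scores, tie_breaker):
    """Best field score plus tie_breaker times the sum of the other field scores."""
    if not field_scores:
        return 0.0
    best = max(field_scores)
    others = sum(field_scores) - best
    return best + tie_breaker * others


# "Jose Cummings" matches on both fields, 50 points each (figures from the question).
both_fields = [50.0, 50.0]
# "Jose Ramirez" matches on display_name only.
one_field = [50.0]

for tie_breaker in (0.0, 0.3, 1.0):
    print(tie_breaker,
          combine_field_scores(both_fields, tie_breaker),
          combine_field_scores(one_field, tie_breaker))
```

With tie_breaker at 0.0 both users end up with the same score, which is the "max of the two fields" behaviour asked for; at 1.0 the scores are summed again.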
Also note that your bool_prefix scoring behaves like most_fields, but using a match_bool_prefix query instead of a match query.
Perhaps you indeed want the fields to be prefixed with jose. But if the username is, say, cool_jose, it's going to get left out (unless you, for example, apply an other-than-standard analyzer)...

Elastic search relevance search for parent-child relation types

I am working with Elasticsearch and need to build a relevance search query as part of my requirements, but I am stuck on this.
Here is how I insert the data; I have a parent-child mapping for my records:
PUT /myindex
{
  "mappings": {
    "my_registration": {},
    "my_specialities": {
      "_parent": {
        "type": "my_registration"
      }
    }
  }
}
PUT myindex/my_registration/100
{
"Pid": "100",
"Name": "name1",
"Age": "28" }
PUT myindex/my_registration/200
{
"Pid": "200",
"Name": "name2",
"Age": "28" }
PUT myindex/my_registration/300
{
"Pid": "300",
"Name": "name3",
"Age": "28" }
PUT myindex/my_specialities/1?parent=100
{ "Pid": "100", "speciality_name": "Orthopedic Surgeon"
}
PUT myindex/my_specialities/2?parent=200
{ "Pid": "200", "speciality_name": "Orthopedic"
}
PUT myindex/my_specialities/3?parent=300
{ "Pid": "300", "speciality_name": "Surgeon"
}
Here are my scenarios:
1- I need to search speciality_name as Orthopedic Surgeon
2- I need to search speciality_name as Surgeon
3- I need to search speciality_name as Orthopedic
4- I need to search speciality_name as Orth etc
5- I need to search speciality_name as Orthop
6- I need to search speciality_name as Orthoepdic
See my sample query below. I expect to get results for all of the above cases (records need to be returned with a relevance score), but I am getting an empty result for cases 4 and 5.
/myindex/my_registration/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "has_child": {
            "type": "my_specialities",
            "query": {
              "fuzzy": {
                "speciality_name": {
                  "value": "Orthop",
                  "boost": 1.0,
                  "fuzziness": 2,
                  "prefix_length": 0,
                  "max_expansions": 100
                }
              }
            },
            "inner_hits": {}
          }
        }
      ]
    }
  }
}
Please note that I need to return results from the type my_registration, while the search itself is applied to the type my_specialities.
Any suggestions, please? Thanks in advance, and sorry for the lengthy question :)
You are fixing the fuzziness to 2. According to the documentation, fuzziness is the Levenshtein edit distance, which means it will match terms that need at most two single-character edits to equal the value in the query.
If you search for Orthop, you need to add at least 4 characters to reach the closest term (Orthopedic). That's the reason why you are getting no results. You could raise the fuzziness value or try leaving it as AUTO, although Elasticsearch caps the edit distance at 2, so neither will bridge a gap of 4 edits.
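The edit distances involved can be checked with a small Python sketch. It implements plain Levenshtein distance and assumes the indexed term is the lowercased "orthopedic"; Elasticsearch may additionally count two swapped adjacent characters as a single edit, which this sketch does not model.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute or keep
            ))
        previous = current
    return previous[-1]


indexed_term = "orthopedic"  # assumed lowercased by the analyzer
for typed in ("orth", "orthop", "orthoepdic"):
    print(typed, levenshtein(typed, indexed_term))
```

Only the misspelling stays within two edits of the indexed term; the short prefixes from cases 4 and 5 do not.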

What differs between post-filter and global aggregation for faceted search?

A common problem in search interfaces is that you want to return a selection of results,
but might want to return information about all documents. (e.g. I want to see all red shirts, but want to know what
other colors are available).
This is sometimes referred to as "faceted results", or
"faceted navigation". The example from the Elasticsearch reference is quite clear in explaining why / how, so
I've used this as a base for this question.
Summary / Question: It looks like I can use both a post-filter or a global aggregation for this. They both seem to
provide the exact same functionality in a different way. There might be advantages or disadvantages to them that I
don't see? If so, which should I use?
I have included a complete example below with some documents and a query with both types of method based on the example
in the reference guide.
Option 1: post-filter
See the example from the Elasticsearch reference.
What we can do is have more results in our original query, so we can aggregate 'on' those results, and afterwards
filter our actual results.
The example is quite clear in explaining it:
But perhaps you would also like to tell the user how many Gucci shirts are available in other colors. If you just add a terms aggregation on the color field, you will only get back the color red, because your query returns only red shirts by Gucci.
Instead, you want to include shirts of all colors during aggregation, then apply the colors filter only to the search results.
See the example code below for how this would look.
An issue with this is that we cannot use caching. The Elasticsearch guide (not yet available for 5.1) warns about this:
Performance consideration
Use a post_filter only if you need to differentially filter search results and aggregations. Sometimes people will use post_filter for regular searches.
Don’t do this! The nature of the post_filter means it runs after the query, so any performance benefit of filtering (such as caches) is lost completely.
The post_filter should be used only in combination with aggregations, and only when you need differential filtering.
There is however a different option:
Option 2: global aggregations
There is a way to do an aggregation that is not influenced by the search query.
So instead of fetching a lot, aggregating on that, and then filtering, we just get our filtered results but run the aggregations on
everything. Take a look at the reference.
We can get the exact same results. I did not read any warnings about caching for this, but it seems like in the end
we need to do about the same amount of work. So that may be the only omission.
It is a tiny bit more complicated because of the sub-aggregation we need (you can't have global and a filter on the
same 'level').
The only complaint I read about queries using this, is that you might have to repeat yourself if you need to do this
for several items. In the end we can generate most queries, so repeating oneself isn't that much of an issue for my use case,
and I do not really consider this an issue on par with "can not use cache".
Question
It seems both functions overlap at the least, or possibly provide the exact same functionality. This baffles me.
Apart from that, I'd like to know if one or the other has an advantage I haven't seen, and if there is any best practice here?
Example
This is largely from the post-filter reference page, but I added the global filter query.
mapping and documents
PUT /shirts
{
"mappings": {
"item": {
"properties": {
"brand": { "type": "keyword"},
"color": { "type": "keyword"},
"model": { "type": "keyword"}
}
}
}
}
PUT /shirts/item/1?refresh
{
"brand": "gucci",
"color": "red",
"model": "slim"
}
PUT /shirts/item/2?refresh
{
"brand": "gucci",
"color": "blue",
"model": "slim"
}
PUT /shirts/item/3?refresh
{
"brand": "gucci",
"color": "red",
"model": "normal"
}
PUT /shirts/item/4?refresh
{
"brand": "gucci",
"color": "blue",
"model": "wide"
}
PUT /shirts/item/5?refresh
{
"brand": "nike",
"color": "blue",
"model": "wide"
}
PUT /shirts/item/6?refresh
{
"brand": "nike",
"color": "red",
"model": "wide"
}
We are now requesting all red gucci shirts (items 1 and 3), the models we have for these 2 shirts (slim and normal),
and which colors of gucci shirts there are (red and blue).
First, a post filter: get all shirts, aggregate the models for red gucci shirts and the colors for gucci shirts (all colors),
and post-filter for red gucci shirts to show only those as results. (This is a bit different from the example, as we
try to get as close as possible to a clear application of post-filters.)
GET /shirts/_search
{
"aggs": {
"colors_query": {
"filter": {
"term": {
"brand": "gucci"
}
},
"aggs": {
"colors": {
"terms": {
"field": "color"
}
}
}
},
"color_red": {
"filter": {
"bool": {
"filter": [
{
"term": {
"color": "red"
}
},
{
"term": {
"brand": "gucci"
}
}
]
}
},
"aggs": {
"models": {
"terms": {
"field": "model"
}
}
}
}
},
"post_filter": {
"bool": {
"filter": [
{
"term": {
"color": "red"
}
},
{
"term": {
"brand": "gucci"
}
}
]
}
}
}
We could also get all red gucci shirts (our original query), and then do a global aggregation for the model (for all
red gucci shirts) and for color (for all gucci shirts).
GET /shirts/_search
{
"query": {
"bool": {
"filter": [
{ "term": { "color": "red" }},
{ "term": { "brand": "gucci" }}
]
}
},
"aggregations": {
"color_red": {
"global": {},
"aggs": {
"sub_color_red": {
"filter": {
"bool": {
"filter": [
{ "term": { "color": "red" }},
{ "term": { "brand": "gucci" }}
]
}
},
"aggs": {
"keywords": {
"terms": {
"field": "model"
}
}
}
}
}
},
"colors": {
"global": {},
"aggs": {
"sub_colors": {
"filter": {
"bool": {
"filter": [
{ "term": { "brand": "gucci" }}
]
}
},
"aggs": {
"keywords": {
"terms": {
"field": "color"
}
}
}
}
}
}
}
}
Both will return the same information; the second one differs only because of the extra level introduced by the sub-aggregations. The second query looks a bit more complex, but I don't think this is very problematic. A real-world query is generated by code and is probably far more complex anyway; it should be a good query, and if that means complicated, so be it.
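The equivalence can be illustrated with a small in-memory simulation in Python. This is only a sketch of the scoping rules (which documents the hits and the aggregations see), not of how Elasticsearch executes either request; the helper names are invented, and the six shirts are the documents indexed above.

```python
from collections import Counter

SHIRTS = [
    {"id": 1, "brand": "gucci", "color": "red", "model": "slim"},
    {"id": 2, "brand": "gucci", "color": "blue", "model": "slim"},
    {"id": 3, "brand": "gucci", "color": "red", "model": "normal"},
    {"id": 4, "brand": "gucci", "color": "blue", "model": "wide"},
    {"id": 5, "brand": "nike", "color": "blue", "model": "wide"},
    {"id": 6, "brand": "nike", "color": "red", "model": "wide"},
]


def select(docs, **terms):
    """Keep documents whose fields equal all of the given term values."""
    return [d for d in docs if all(d[f] == v for f, v in terms.items())]


def terms_agg(docs, field):
    """Count the values of one field, like a terms aggregation."""
    return dict(Counter(d[field] for d in docs))


def post_filter_search(docs):
    scope = docs  # no query, so the aggregations see every document
    return {
        "colors": terms_agg(select(scope, brand="gucci"), "color"),
        "models": terms_agg(select(scope, brand="gucci", color="red"), "model"),
        # The post_filter narrows only the hits, after aggregating.
        "hits": [d["id"] for d in select(scope, brand="gucci", color="red")],
    }


def global_agg_search(docs):
    hits = select(docs, brand="gucci", color="red")  # the query narrows the hits
    scope = docs  # a global aggregation ignores the query
    return {
        "colors": terms_agg(select(scope, brand="gucci"), "color"),
        "models": terms_agg(select(scope, brand="gucci", color="red"), "model"),
        "hits": [d["id"] for d in hits],
    }


print(post_filter_search(SHIRTS))
print(global_agg_search(SHIRTS))
```

In this toy model the two functions differ only in where the narrowing happens, which mirrors the observation that both requests carry the same information.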
The actual solution we used, while not a direct answer to the question, is basically "neither".
From this elastic blogpost we got the initial hint:
Occasionally, I see an over-complicated search where the goal is to do as much as possible in as few search requests as possible. These tend to have filters as late as possible, completely in contrary to the advise in Filter First. Do not be afraid to use multiple search requests to satisfy your information need. The multi-search API lets you send a batch of search requests.
Do not shoehorn everything into a single search request.
And that is basically what we are doing in the above query: a big bunch of aggregations and some filtering.
Having them run in parallel proved to be much, much quicker. Have a look at the multi-search API.
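As a minimal sketch of what splitting the request could look like, the Python snippet below builds a multi-search body for the shirts example: one search for the hits and one per facet. The way the three searches are split is an assumption for illustration, not the exact requests used in production, and the helper name is hypothetical. The multi-search API expects newline-delimited JSON, with a header line followed by a body line for each search.

```python
import json


def build_msearch_body(index, searches):
    """Serialize searches as newline-delimited JSON: a header line, then a body line."""
    lines = []
    for search in searches:
        lines.append(json.dumps({"index": index}))
        lines.append(json.dumps(search))
    return "\n".join(lines) + "\n"  # the body must end with a newline


red_gucci = {
    "bool": {
        "filter": [
            {"term": {"color": "red"}},
            {"term": {"brand": "gucci"}},
        ]
    }
}

searches = [
    # 1. The actual hits: red gucci shirts.
    {"query": red_gucci},
    # 2. Models available among red gucci shirts (no hits needed).
    {"size": 0, "query": red_gucci,
     "aggs": {"models": {"terms": {"field": "model"}}}},
    # 3. Colors available among all gucci shirts.
    {"size": 0, "query": {"term": {"brand": "gucci"}},
     "aggs": {"colors": {"terms": {"field": "color"}}}},
]

body = build_msearch_body("shirts", searches)
print(body)
```

The resulting string would be sent to the _msearch endpoint with the content type application/x-ndjson; each search can then use a plain filtered query instead of a post_filter or a global aggregation.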
In both cases Elasticsearch will end up doing mostly the same thing. If I had to choose, I think I'd use the global aggregation, which might save you some overhead from having to feed two Lucene collectors at once.

Umlaut in Elastic Suggesters

I am currently trying to set up a suggester similar to Google's misspelling correction. I am using the Elastic Suggesters with the following query:
{
"query": {
"match": {
"name": "iphone hüle"
}
},
"suggest": {
"suggest_name": {
"text": "iphone hüle",
"term": {
"field": "name"
}
}
}
}
It returns the following suggestions:
"suggest": {
"suggest_name": [
{
"text": "iphone",
"offset": 0,
"length": 6,
"options": []
},
{
"text": "hule",
"offset": 7,
"length": 4,
"options": [
{
"text": "hulle",
"score": 0.75,
"freq": 162
},
...
{
"text": "hulk",
"score": 0.75,
"freq": 38
}
]
}
]
}
Now the problem I have is in the returned text inside the options and inside the suggest. The text I submitted and the returned text should be "hüle" not "hule". Furthermore the returned option text should actually be "hülle" and not "hulle". As I use the same fields for the query and the suggester I wonder why the umlauts are only missing in the suggester and not in the regular query results.
See a query result here:
"_source": {
...
"name": "Ladegerät für iPhone",
"manufacturer": "Apple",
}
The data you get back in your query result, i.e.
"name": "Ladegerät für iPhone"
is the stored content of the field. It is exactly your source data. Search and obviously also the suggester, however, work on the inverted index, which contains tokens massaged by the analyzer. You are most likely using an analyzer that folds umlauts.
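The difference between the stored source and the indexed tokens can be sketched in Python. The folding function below only approximates what a folding token filter does, it is an assumption that such a filter is part of the analyzer in question, and the sample value is hypothetical.

```python
import unicodedata


def fold_diacritics(token: str) -> str:
    """Decompose characters and drop combining marks, e.g. the dots of an umlaut."""
    decomposed = unicodedata.normalize("NFD", token)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


source_name = "Hülle für iPhone"  # hypothetical stored value
index_terms = [fold_diacritics(word.lower()) for word in source_name.split()]

print(source_name)  # what _source returns, untouched
print(index_terms)  # what search and the suggester would see
```

Because the suggester draws its options from the folded terms, it can only hand back "hulle", even though the stored document still says "Hülle".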
Strangely enough, I discussed this with a colleague yesterday. We came to the conclusion that we may need a separate field, indexed and not stored, into which we index the non-normalized tokens. We want to use it to fetch suggestion terms. In addition, it may be a useful feature that we can perform exact searches on it, i.e. searches that distinguish between Müller and Mueller, Foto and Photo, Rene and René.
