How can I sort Elasticsearch results properly? - elasticsearch

For example, I have the following two documents with fields Id and Name (the Name field is analyzed):
1,jack-in-box
2,box
When I query for "box", I get both documents, but I actually only want document 2, or at least document 2 ranked above document 1.
How can I write such a query?
I know that doc1 was tokenized to jack, in and box, so a search for box also matches doc1. My current solution is to create another field called name_not_analyzed that is not analyzed. But I have been wondering whether there is a better way to solve this in the query itself, so that I don't have to reindex. Thanks in advance!

As @jgr pointed out in the comments, doc2 should rank above doc1 by default, unless you have your own ranking algorithm, are using a constant score query, or are only using a filter, which gives every document a score of 1.
Now, if you only want doc2, you could use scripting:
{
  "query": {
    "filtered": {
      "filter": {
        "script": {
          "script": "_source.name.toLowerCase()=='box'"
        }
      }
    }
  }
}
The script checks the value in _source itself and lowercases it, so that BOX, Box, etc. also match.
Hope this helps!!
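To make the difference between the analyzed match and the exact-match script concrete, here is a minimal Python sketch. It is only an approximation: the `analyze` helper is a hypothetical stand-in for the analyzer (lowercase, split on non-alphanumerics), and the two documents are copied from the question into an in-memory list.

```python
import re

# In-memory copies of the two documents from the question (illustrative only).
docs = [
    {"id": 1, "name": "jack-in-box"},
    {"id": 2, "name": "box"},
]

def analyze(text):
    """Rough stand-in for an analyzer: lowercase, split on non-alphanumerics."""
    return [tok for tok in re.split(r"[^0-9a-z]+", text.lower()) if tok]

def match_analyzed(term):
    """Ids of documents whose analyzed name contains the term as a token."""
    return [d["id"] for d in docs if term.lower() in analyze(d["name"])]

def match_exact(term):
    """Ids of documents whose whole name equals the term, ignoring case."""
    return [d["id"] for d in docs if d["name"].lower() == term.lower()]

print(analyze("jack-in-box"))  # the tokens described in the question
print(match_analyzed("box"))   # both documents contain the token "box"
print(match_exact("BOX"))      # only the document named exactly "box"
```

The second function mirrors what the script filter does: it compares the whole original value rather than individual tokens.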

Related

Kibana Regex check if a field Value contains another field value

I'm trying to search for documents in which a description field contains the value of a name field (from another document). I tried a regexp query as follows:
GET inventory-full-index/_search
{
  "query": {
    "regexp": {
      "description.description_data.value.keyword": ".*doc['name.keyword'].*"
    }
  }
}
It returns some relevant documents that fit my needs. The problem is that I created a document that contains "python3" in its description, and I made sure there was a document named "python3" as well. The query doesn't return this document, so I have obviously missed something.
Any idea how to fix this?

Query to see if a field contains a string using Query DSL

I am trying to filter Kibana for a field that contains the string "pH". The field is called extra.monitor_value_name. Examples of potential values are Temperature_ABC01, DO_ABC01, or pH_ABC01.
Kibana's Elasticsearch Query DSL does not seem to have a "contains string" operator, so I need to write a custom query.
I am new to Query DSL, can you help me create the query?
Also, is it proper to call it Query DSL? I'm not even sure of proper wording.
Okay! Circling back with an answer to my own question.
My initial problem stemmed from not knowing about field_name vs field_name.keyword. For more on keyword, see: What's the difference between the 'field' and 'field.keyword' fields in Kibana?
Solution 1
Here's the query I ended up using: a regexp query. I found an article useful for figuring out the regexp syntax.
{
  "query": {
    "regexp": {
      "extra.monitor_value_name.keyword": "pH.*"
    }
  }
}
Solution 2
Another way I could have filtered, without Query DSL, was to type this into the search field: extra.monitor_value_name.keyword:pH*.
One interesting thing to note was the .keyword doesn't seem to be necessary with this method. I am not sure why.
Try this as a filter using the Elasticsearch Query DSL (note that a wildcard pattern uses * for "any characters", not the regexp form .*):
{
  "query": {
    "wildcard": {
      "extra.monitor_value_name": {
        "value": "pH*"
      }
    }
  }
}
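The regexp and wildcard queries use different pattern syntaxes, which is easy to mix up. The sketch below approximates both in plain Python against the example values from the question: `re.fullmatch` stands in for a regexp query (which has to match the whole term), and `fnmatchcase` stands in for a wildcard query. Both are approximations and not the actual Lucene matchers; the helper names are made up for illustration.

```python
import re
from fnmatch import fnmatchcase

# Example values of extra.monitor_value_name taken from the question.
values = ["Temperature_ABC01", "DO_ABC01", "pH_ABC01"]

def regexp_matches(pattern, candidates):
    """Approximate a regexp query: the pattern must match the whole value."""
    return [v for v in candidates if re.fullmatch(pattern, v)]

def wildcard_matches(pattern, candidates):
    """Approximate a wildcard query: case-sensitive * and ? globbing."""
    return [v for v in candidates if fnmatchcase(v, pattern)]

print(regexp_matches("pH.*", values))    # regexp syntax: ".*" is any run of characters
print(wildcard_matches("pH*", values))   # wildcard syntax: "*" is any run of characters
print(wildcard_matches("pH.*", values))  # in a wildcard the "." is a literal dot
```

The last call shows why a regexp-style pattern pasted into a wildcard query finds nothing here: none of the values contains a literal dot after pH.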

Getting aggregated results with selected facet

Not sure if this is possible, but I'm running into the current issue:
While being on the page, without any facet selected I run a query with some aggregations on my facets.
For example: on the "ladies shoes" page I run a query with "gender=ladies" and category "shoes" as filter, which gives me all the wanted results. Also there is an aggregation on "brand" which returns me all the brands. However, this also contains brands with a count of 0, since they don't match the "ladies shoes" criteria. But since no facet is selected, I can simply hide them, so the user won't see them.
So far, so good.
Now, when I run a query for "ladies shoes from Nike" (brand=nike as filter), I get the same list of aggregations, but now all the brands have a count of 0, except Nike. Now, it's hard to just hide them, since we want to offer the possibility to filter on multiple (available) brands.
What would be the best approach to this, with as few queries as possible?
For multi-select faceting as in your example, Elasticsearch has a very handy feature: post_filter.
The post_filter is applied to the search hits at the very end of a
search request, after aggregations have already been calculated.
All you need to do is move your Nike brand filter into the post_filter of the request, like this:
{
  "query": {
    ...
  },
  "aggs": {
    ...
  },
  "post_filter": {
    "term": { "brand": "Nike" }
  }
}
This lets you calculate the aggregations over all brands and only afterwards filter the hits down to the selected brand.
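The order of operations is the important part, so here is a small pure-Python simulation of it. This does not talk to Elasticsearch: the `search` helper and the product list are hypothetical, and only the field names gender, category and brand come from the question.

```python
from collections import Counter

# Hypothetical catalogue, for illustration only.
products = [
    {"name": "Air Runner", "gender": "ladies", "category": "shoes", "brand": "Nike"},
    {"name": "Court Classic", "gender": "ladies", "category": "shoes", "brand": "Adidas"},
    {"name": "Trail Pro", "gender": "ladies", "category": "shoes", "brand": "Nike"},
    {"name": "City Walk", "gender": "men", "category": "shoes", "brand": "Puma"},
]

def search(docs, query, post_filter):
    """Mimic the order of operations: query, then aggregations, then post_filter."""
    # 1. The main query decides which documents are considered at all.
    matched = [d for d in docs if all(d[k] == v for k, v in query.items())]
    # 2. Aggregations are computed over everything the query matched.
    brand_counts = Counter(d["brand"] for d in matched)
    # 3. The post filter only narrows the hits that are returned.
    hits = [d for d in matched if all(d[k] == v for k, v in post_filter.items())]
    return hits, dict(brand_counts)

hits, brands = search(
    products,
    query={"gender": "ladies", "category": "shoes"},
    post_filter={"brand": "Nike"},
)
print([h["name"] for h in hits])  # only Nike products are returned as hits
print(brands)                     # counts still cover every ladies-shoes brand
```

Because the brand filter is applied after the counts are taken, the other brands available for "ladies shoes" keep their non-zero counts and can still be offered as facets.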

Not able to understand this Elasticsearch query

{
  "query": {
    "nested": {
      "path": "product_vendors",
      "query": {
        "bool": {
          "must": {
            "bool": {
              "should": [
                { "terms": { "product_vendors.manufacturer_style": ["FSS235D-26", "SG463-1128-5", "SG463-2879-4"] } },
                { "terms": { "product_vendors.id": ["71320"] } }
              ]
            }
          }
        }
      }
    }
  }
}
I have the above Elasticsearch query and am not able to understand it. Could anyone please explain what it means and which documents it will return?
Update: @christinabo, I tried your query and results were returned, but there is a small issue: apart from the matched documents, two additional documents are returned in which only the vendor id matches. May I know why these two extra documents are returned? Do we need to add some attribute to enforce a strict match? Please advise.
By observing the query, I can understand that there is a nested object in the data. I can imagine that it has this structure:
product_vendors: {
'id': 'the_id',
'manufacturer_style': 'some style'
}
In order to query a nested object, you need a nested query. This is why you have the nested keyword there. In a nested query, you need to specify the path (product_vendors) that leads to the embedded fields (id, manufacturer_style).
Then, the query defines a bool query with the must keyword, which means that the query which follows must match in the returned documents. In this case, what must match is another bool query, defined with the should keyword. This contains two terms sub-queries (one for manufacturer_style and one for id) and means that the matching documents should match one or both of them. Each sub-query addresses the embedded field by its full path through the nested object, using dot notation (i.e. product_vendors.manufacturer_style).
I would expect the query to return the documents that match at least one of the terms queries, with the documents that match both receiving a higher score.
I hope that this explanation gives you an overall idea of this query.
More about bool queries can be found in the Elasticsearch documentation.
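The should logic can be sketched in plain Python, which also shows why documents where only the vendor id matches are returned. This is a simulation under assumptions, not the real query engine: the sample documents and their name field are invented, and the stricter `vendor_matches_must` variant is only a guess at what the update is asking for (it would correspond to putting both terms clauses under must instead of should).

```python
STYLES = {"FSS235D-26", "SG463-1128-5", "SG463-2879-4"}
VENDOR_IDS = {"71320"}

def vendor_matches_should(vendor):
    """should: the nested object matches if either terms clause matches."""
    return vendor["manufacturer_style"] in STYLES or vendor["id"] in VENDOR_IDS

def vendor_matches_must(vendor):
    """Stricter variant (assumed intent): both terms clauses are required."""
    return vendor["manufacturer_style"] in STYLES and vendor["id"] in VENDOR_IDS

def search(docs, vendor_predicate):
    """A document is returned if any of its nested product_vendors objects matches."""
    return [d["name"] for d in docs
            if any(vendor_predicate(v) for v in d["product_vendors"])]

# Hypothetical documents, one per interesting case.
docs = [
    {"name": "both", "product_vendors": [{"id": "71320", "manufacturer_style": "FSS235D-26"}]},
    {"name": "style-only", "product_vendors": [{"id": "999", "manufacturer_style": "SG463-1128-5"}]},
    {"name": "id-only", "product_vendors": [{"id": "71320", "manufacturer_style": "OTHER"}]},
    {"name": "neither", "product_vendors": [{"id": "999", "manufacturer_style": "OTHER"}]},
]

print(search(docs, vendor_matches_should))  # everything except "neither"
print(search(docs, vendor_matches_must))    # only the document matching both clauses
```

With should, the "id-only" document is a legitimate match, which is consistent with the extra results described in the update.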

Exclude setting on integer field in term query

My documents contain an integer array field, storing the ids of the tags describing them. Given a specific tag id, I want to extract a list of the top tags that occur most frequently together with the provided one.
I can solve this problem by combining a terms aggregation on the tag id field with a term filter on the same field, but the list I get back obviously always starts with the tag id I provide: all documents matching my filter have that tag, and it is thus the first in the list.
I thought of using the exclude setting to avoid creating the problematic bucket, but as I'm dealing with an integer field, that seems not to be possible: this query
{
  "size": 0,
  "query": {
    "term": {
      "tag_ids": "00001"
    }
  },
  "aggs": {
    "tags": {
      "terms": {
        "size": 3,
        "field": "tag_ids",
        "exclude": "00001"
      }
    }
  }
}
returns an error saying that Aggregation [tags] cannot support the include/exclude settings as it can only be applied to string values.
Is it possible to avoid getting back this bucket?
This is, as of Elasticsearch 1.4, a shortcoming of ES itself.
After the community proposed this change, the functionality has been added and will be included in Elasticsearch 1.5.0.
It's supposed to be fixed since version 1.5.0.
Look at this: https://github.com/elasticsearch/elasticsearch/pull/7727
While the fix is en route, my workaround is to have the aggregation use a script instead of accessing the field directly, and to have that script return the value as a string.
Works well and without measurable performance loss.
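For versions where exclude is not available on numeric fields, another option is to drop the unwanted bucket on the client side. Note that this is a different technique from the script workaround above, and only a sketch: the `top_co_occurring` helper is hypothetical, the bucket list is made-up data in the shape of aggregations.tags.buckets, and it assumes the aggregation was requested with one more bucket than needed.

```python
def top_co_occurring(buckets, selected_tag, size=3):
    """Drop the bucket for the selected tag and keep the first `size` others."""
    others = [b for b in buckets if b["key"] != selected_tag]
    return others[:size]

# Made-up buckets, as if the terms aggregation had been requested with size 4.
buckets = [
    {"key": 1, "doc_count": 40},  # the tag used in the term query, always on top
    {"key": 7, "doc_count": 12},
    {"key": 3, "doc_count": 9},
    {"key": 5, "doc_count": 4},
]

top = top_co_occurring(buckets, selected_tag=1)
print([b["key"] for b in top])  # the selected tag itself is no longer in the list
```

Requesting size + 1 buckets ensures that three co-occurring tags remain after the selected one is removed.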
