How to alphabetically sort a name field in Elasticsearch?

I have the following definition for the name field:
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
How do I sort this alphabetically?
I am trying the following:
{
"query": {
"match": {
"name": "Shoes"
}
},
"sort": {
"keyword": {
"order": "asc"
}
},
"size": 10,
"from": 0,
"sort": []
}
But the sort doesn't seem to be doing anything at all.
I have referred to these threads, but they don't seem to help in my case:
Thread 1
Thread 2

You need to sort on the name.keyword subfield. Your search also contains two sort sections; remove the empty one. The corrected query:
{
"query": {
"match": {
"name": "Shoes"
}
},
"sort": {
"name.keyword": {
"order": "asc"
}
},
"size": 10,
"from": 0
}
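If the request body is built in application code, the two fixes above (sort on the keyword subfield, keep a single sort section) can be enforced in one place. The following is a minimal sketch, not part of the original answer; `build_sorted_search` and its parameter names are hypothetical, and it only constructs the JSON body without sending it to a cluster.

```python
import json


def build_sorted_search(field, text, sort_field, order="asc", size=10, offset=0):
    """Build a match query sorted on a keyword subfield (hypothetical helper)."""
    if order not in ("asc", "desc"):
        raise ValueError("order must be 'asc' or 'desc'")
    return {
        "query": {"match": {field: text}},
        # Exactly one sort section, pointing at the keyword subfield.
        "sort": [{sort_field: {"order": order}}],
        "size": size,
        "from": offset,
    }


body = build_sorted_search("name", "Shoes", "name.keyword")
print(json.dumps(body, indent=2))
```

Because a Python dict cannot hold the same key twice, a body built this way cannot end up with a second, empty sort section.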

How to do sorting on a field with composite aggregation in elastic search

We are using Elasticsearch version 6.8.6 and are trying to sort on a field together with a composite aggregation, but we are not getting the expected results from the aggregation.
This is our mapping:
{
"properties": {
"department": {
"type": "text",
"fields": {
"keyword": {
"ignore_above": 256.0,
"type": "keyword"
}
}
},
"project": {
"type": "text",
"fields": {
"keyword": {
"ignore_above": 256.0,
"type": "keyword"
}
}
},
"billingUnit": {
"type": "text",
"fields": {
"keyword": {
"ignore_above": 256.0,
"type": "keyword"
}
}
},
"billingType": {
"type": "text",
"fields": {
"keyword": {
"ignore_above": 256.0,
"type": "keyword"
}
}
},
"application": {
"type": "text",
"fields": {
"keyword": {
"ignore_above": 256.0,
"type": "keyword"
}
}
},
"environmet": {
"type": "text",
"fields": {
"keyword": {
"ignore_above": 256.0,
"type": "keyword"
}
}
},
"cost": {
"type": "float"
}
}
}
With the following query the sorting does not work; the results are not in alphabetical order:
{
"query": {
"bool": {
"must": [
{
"match_phrase": {
"department": {
"query": "HR",
"slop": 0,
"zero_terms_query": "NONE",
"boost": 1.0
}
}
}
],
"adjust_pure_negative": true,
"boost": 1.0
}
},
"sort": [
{
"project.keyword": {
"order": "desc"
}
}
],
"aggs": {
"TERM_RANGE": {
"composite": {
"size": 10000,
"sources": [
{
"billingUnitKey": {
"terms": {
"field": "billingUnit.keyword",
"missing_bucket": false
}
}
},
{
"billingTypeKey": {
"terms": {
"field": "billingType.keyword",
"missing_bucket": false
}
}
}
]
},
"aggregations": {
"TOTAL": {
"sum": {
"field": "cost"
}
},
"dataHits": {
"top_hits": {
"from": 0,
"size": 1,
"version": false,
"seq_no_primary_term": false,
"explain": false,
"_source": {
"includes": [
"application.keyword",
"environmet.keyword",
],
"excludes": []
},
"docvalue_fields": [
{
"field": "application.keyword"
},
{
"field": "environmet.keyword"
}
]
}
},
"paginate_bucket": {
"bucket_sort": {
"sort": [],
"from": 0,
"size": 100,
"gap_policy": "SKIP"
}
}
}
}
}
}
Sorting works fine with the following query, which has no aggregation:
{
"query": {
"match": {
"department": "HR"
}
},
"size": 100,
"sort": [
{
"project.keyword": {
"order": "desc"
}
}
]
}
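The top-level sort only applies to the hits, not to the aggregation buckets. You should use the order key of the composite aggregation, which is set on each value source: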
https://www.elastic.co/guide/en/elasticsearch/reference/7.8/search-aggregations-bucket-composite-aggregation.html#_order
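As a rough illustration of that advice, the sketch below builds the composite sources from the question with an explicit order on each terms source. The helper name `composite_sources` is made up for this example, and the bucket size is arbitrary. Composite buckets are ordered by the values of their sources, so ordering by project would presumably require adding project.keyword as a source; that part is an assumption about the intended result, not something stated in the question.

```python
def composite_sources(fields, order="asc"):
    """Return composite 'sources' with a sort direction on every terms source.

    fields is a list of (source_name, field_name) pairs (hypothetical helper).
    """
    if order not in ("asc", "desc"):
        raise ValueError("order must be 'asc' or 'desc'")
    return [
        {name: {"terms": {"field": field, "order": order}}}
        for name, field in fields
    ]


agg = {
    "TERM_RANGE": {
        "composite": {
            "size": 100,
            "sources": composite_sources(
                [
                    ("billingUnitKey", "billingUnit.keyword"),
                    ("billingTypeKey", "billingType.keyword"),
                ],
                order="desc",
            ),
        }
    }
}
```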

Nested object aggregation term with mixed nested/non-nested filter

We have facets showing the number of results that will be shown when clicking the filters (and combining them).
Before we introduced nested objects, the following would do the job:
GET /x_v1/_search/
{
"size": 0,
"aggs": {
"FilteredDescriptiveFeatures": {
"filter": {
"bool": {
"must": [
{
"terms": {
"breadcrumbs.categoryIds": [
"category"
]
}
},
{
"terms": {
"products.sterile": [
"0"
]
}
}
]
}
},
"aggs": {
"DescriptiveFeatures": {
"terms": {
"field": "products.descriptiveFeatures",
"size": 1000
}
}
}
}
}
}
This gives the result:
"aggregations": {
"FilteredDescriptiveFeatures": {
"doc_count": 280,
"DescriptiveFeatures": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "somekey",
"doc_count": 42
},
We needed to make products a nested object, though, and I'm currently trying to rewrite the above to work with this change.
My attempt looks like the following. It doesn't give the correct result, though, and doesn't seem to be properly connected to the filter.
GET /x_v2/_search/
{
"size": 0,
"aggs": {
"FilteredDescriptiveFeatures": {
"filter": {
"bool": {
"must": [
{
"terms": {
"breadcrumbs.categoryIds": [
"category"
]
}
},
{
"nested": {
"path": "products",
"query": {
"terms": {
"products.sterile": [
"0"
]
}
}
}
}
]
}
},
"aggs": {
"nested": {
"nested": {
"path": "products"
},
"aggregations": {
"DescriptiveFeatures": {
"terms": {
"field": "products.descriptiveFeatures",
"size": 1000
}
}
}
}
}
}
}
}
This gives the result:
"aggregations": {
"FilteredDescriptiveFeatures": {
"doc_count": 280,
"nested": {
"doc_count": 1437,
"DescriptiveFeatures": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "somekey",
"doc_count": 164
},
I've also tried to put the nested definition higher up so that it contains both the filter and the aggs, but then the filter term breadcrumbs.categoryIds, which is not in the nested object, won't work.
Is what I'm trying to do even possible?
And how can it be solved?
In your FilteredDescriptiveFeatures step, you return all documents that have at least one product with sterile = 0.
But in the nested step that follows, you don't apply this filter again. All nested products of those documents are therefore returned in this step, so your terms aggregation runs over all products, not only those with sterile = 0.
You should move your sterile filter into the nested step. And as Richa points out, you need a reverse_nested aggregation in the final step to count Elasticsearch documents rather than nested product sub-documents.
Could you try this query?
{
"size": 0,
"aggs": {
"filteredCategory": {
"filter": {
"terms": {
"breadcrumbs.categoryIds": [
"category"
]
}
},
"aggs": {
"nestedProducts": {
"nested": {
"path": "products"
},
"aggs": {
"filteredByProductsAttributes": {
"filter": {
"terms": {
"products.sterile": [
"0"
]
}
},
"aggs": {
"DescriptiveFeatures": {
"terms": {
"field": "products.descriptiveFeatures",
"size": 1000
},
"aggs": {
"productCount": {
"reverse_nested": {}
}
}
}
}
}
}
}
}
}
}
}
What I understand from the description is that you want to filter your results on the basis of some nested and non-nested fields and then apply aggregations on the nested field. I created a sample index and data with some nested and non-nested fields, and wrote a query:
Mapping
PUT stack-557722203
{
"mappings": {
"_doc": {
"properties": {
"category": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"user": {
"type": "nested", // NESTED FIELD
"properties": {
"fName": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"lName": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"type": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
}
}
}
Sample Data
POST _bulk
{"index":{"_index":"stack-557722203","_id":"1","_type":"_doc"}}
{"category":"X","user":[{"fName":"A","lName":"B","type":"X"},{"fName":"A","lName":"C","type":"X"},{"fName":"P","lName":"B","type":"Y"}]}
{"index":{"_index":"stack-557722203","_id":"2","_type":"_doc"}}
{"category":"X","user":[{"fName":"P","lName":"C","type":"Z"}]}
{"index":{"_index":"stack-557722203","_id":"3","_type":"_doc"}}
{"category":"X","user":[{"fName":"A","lName":"C","type":"Y"}]}
{"index":{"_index":"stack-557722203","_id":"4","_type":"_doc"}}
{"category":"Y","user":[{"fName":"A","lName":"C","type":"Y"}]}
Query
GET stack-557722203/_search
{
"size": 0,
"query": {
"bool": {
"must": [
{
"nested": {
"path": "user",
"query": {
"term": {
"user.fName.keyword": {
"value": "A"
}
}
}
}
},
{
"term": {
"category.keyword": {
"value": "X"
}
}
}
]
}
},
"aggs": {
"group BylName": {
"nested": {
"path": "user"
},
"aggs": {
"group By lName": {
"terms": {
"field": "user.lName.keyword",
"size": 10
},
"aggs": {
"reverse Nested": {
"reverse_nested": {} // NOTE THIS
}
}
}
}
}
}
}
Output
{
"took": 18,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 0,
"hits": []
},
"aggregations": {
"group BylName": {
"doc_count": 4,
"group By lName": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "B",
"doc_count": 2,
"reverse Nested": {
"doc_count": 1
}
},
{
"key": "C",
"doc_count": 2,
"reverse Nested": {
"doc_count": 2
}
}
]
}
}
}
}
As for the discrepancy in your data, where doc_count went up after you changed the mapping to nested: this is because of the way nested and object (non-nested) documents are stored. See here to understand how they are stored internally. To connect them back to the root document, you can use a reverse_nested aggregation, and then you will get the same result.
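The difference between the two counts can be reproduced outside Elasticsearch. The sketch below is a plain-Python simulation over the sample data above, not how Elasticsearch computes it internally: it counts nested user objects per lName (what the terms aggregation under nested reports) and distinct root documents per lName (what reverse_nested reports). The variable names are made up for this example.

```python
from collections import defaultdict

# The four sample documents from the bulk request above.
docs = {
    "1": {"category": "X", "user": [
        {"fName": "A", "lName": "B", "type": "X"},
        {"fName": "A", "lName": "C", "type": "X"},
        {"fName": "P", "lName": "B", "type": "Y"},
    ]},
    "2": {"category": "X", "user": [{"fName": "P", "lName": "C", "type": "Z"}]},
    "3": {"category": "X", "user": [{"fName": "A", "lName": "C", "type": "Y"}]},
    "4": {"category": "Y", "user": [{"fName": "A", "lName": "C", "type": "Y"}]},
}

# Query: category is X and at least one nested user has fName A.
hits = {
    doc_id: doc
    for doc_id, doc in docs.items()
    if doc["category"] == "X" and any(u["fName"] == "A" for u in doc["user"])
}

nested_count = defaultdict(int)  # nested objects per lName
root_ids = defaultdict(set)      # root documents per lName

for doc_id, doc in hits.items():
    # The nested aggregation sees every nested object of a matching root
    # document, including those that did not match the nested query.
    for user in doc["user"]:
        nested_count[user["lName"]] += 1
        root_ids[user["lName"]].add(doc_id)

reverse_nested_count = {key: len(ids) for key, ids in root_ids.items()}
total_nested = sum(len(doc["user"]) for doc in hits.values())

print(len(hits), total_nested, dict(nested_count), reverse_nested_count)
```

The numbers line up with the output above: 2 hits, 4 nested objects, 2 nested objects each for B and C, and 1 and 2 root documents respectively.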
Hope this helps!!

Boost score based on integer value - Elasticsearch

I'm not very experienced with Elasticsearch and would like to know how to boost a search based on a certain integer value.
This is an example of a document:
{
"_index": "links",
"_type": "db1",
"_id": "mV32vWcBZsblNn1WqTcN",
"_score": 8.115617,
"_source": {
"url": "example.com",
"title": "Example website",
"description": "This is an example website, used for various of examples around the world",
"likes": 9,
"popularity": 543,
"tags": [
{
"name": "example",
"votes": 5
},
{
"name": "test",
"votes": 2
},
{
"name": "testing",
"votes": 1
}
]
}
}
Now in this particular search, the focus is on the tags and I would like to know how to boost the _score and multiply it by the integer in the votes under tags.
If this is not possible (or very hard to achieve), I would simply like to know how to boost the _score by the votes (not under tags)
For example, add 0.1 to the _score for each vote.
This is the current search query I'm using (for searching tags only):
{
"query": {
"nested": {
"path": "tags",
"query": {
"bool":{
"should":{
"match":{
"tags.name":"example,testing,something else"
}
}
}
}
}
}
}
I couldn't find much online, and hope someone can help me out.
How do I boost the _score with an integer value?
Update
For more info, here is the mapping:
{
"links": {
"mappings": {
"db1": {
"properties": {
"url": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"title": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"description": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"likes": {
"type": "long"
},
"popularity": {
"type": "long"
},
"tags": {
"type": "nested",
"properties": {
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"votes": {
"type": "long"
}
}
}
}
}
}
}
}
Update 2
Changed the tags.likes/tags.dislikes to tags.votes, and added a nested property to the tags
This took a long time to figure out. I have learnt so much on my way there.
Here is the final result:
{
"query": {
"nested": {
"path": "tags",
"query": {
"function_score": {
"query": {
"bool": {
"should": [
{
"match": {
"tags.name": "example"
}
},
{
"match": {
"tags.name": "testing"
}
},
{
"match": {
"tags.name": "test"
}
}
]
}
},
"functions": [
{
"field_value_factor": {
"field": "tags.votes"
}
}
],
"boost_mode": "multiply"
}
}
}
}
}
Using an array in should helped a lot, and I was glad I could combine it with function_score.
You are looking for the function score query: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-function-score-query.html
and its field value factor function: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-function-score-query.html#function-field-value-factor
Snippet from documentation:
GET /_search
{
"query": {
"function_score": {
"field_value_factor": {
"field": "tags.dislikes",
"factor": 1.2,
"modifier": "sqrt",
"missing": 1
}
}
}
}
Or use script_score, because your tags field is nested (I'm not sure whether field_value_factor works with a nested structure).
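To get a feel for what these settings do to the score, the arithmetic can be sketched in plain Python. This is only an illustration of the documented formula, in which the modifier is applied to factor times the field value (the snippet above works out to sqrt(1.2 * value)); the function names `field_value_factor` and `combine` are made up for this example, and only a few of the modifiers and boost modes are covered.

```python
import math

# A subset of the documented modifiers, applied to factor * value.
MODIFIERS = {
    "none": lambda x: x,
    "sqrt": math.sqrt,
    "square": lambda x: x * x,
    "log1p": lambda x: math.log10(x + 1),
}


def field_value_factor(value, factor=1.0, modifier="none", missing=None):
    """Compute modifier(factor * value), using 'missing' when the field is absent."""
    if value is None:
        if missing is None:
            raise ValueError("field is missing and no 'missing' default was given")
        value = missing
    return MODIFIERS[modifier](factor * value)


def combine(query_score, function_value, boost_mode="multiply"):
    """Combine the query score with the function value (subset of boost modes)."""
    if boost_mode == "multiply":
        return query_score * function_value
    if boost_mode == "sum":
        return query_score + function_value
    if boost_mode == "replace":
        return function_value
    raise ValueError("unsupported boost_mode: " + boost_mode)


# Multiply the score by the number of votes, as in the accepted query.
print(combine(2.0, field_value_factor(5)))
# Add 0.1 per vote instead, as asked in the question.
print(combine(2.0, field_value_factor(5, factor=0.1), boost_mode="sum"))
```

With the default factor and no modifier, a tag with 5 votes multiplies the score by 5, which can let high vote counts dominate. A modifier such as sqrt or log1p dampens that effect.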

Elasticsearch: Search in keywords ignoring case and accent (via aggregation)

I can search for specific keywords on indexes like this:
GET */_search/?
{
"query": {
"match_all": {}
},
"size": 0,
"aggs": {
"TECH.keyword": {
"terms": {
"field": "TECH.keyword",
"include": ".*mine.*",
"order": {
"_count": "desc"
},
"size": 20
}
}
}
}
Using this query, I can get all entries that have "mine" in their TECH.keyword fields, ordered by "_count": "desc". So, it's OK.
The actual problem is that the index can contain mine, Mine or MINE or even miné in TECH.keyword fields. And I would like to return them all.
Is there a way to search in keywords ignoring case and accent?
The current mapping is:
"TECH": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
You should be able to accomplish this with a normalizer. You can't use an analyzer on keyword fields, but you can use a normalizer. It allows you to use lowercase and asciifolding.
https://www.elastic.co/guide/en/elasticsearch/reference/6.4/normalizer.html
PUT index
{
"settings": {
"analysis": {
"normalizer": {
"my_normalizer": {
"type": "custom",
"char_filter": [],
"filter": ["lowercase", "asciifolding"]
}
}
}
},
"mappings": {
"_doc": {
"properties": {
"foo": {
"type": "keyword",
"normalizer": "my_normalizer"
}
}
}
}
}
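The effect of such a normalizer on the values from the question can be approximated in plain Python. This is a rough emulation only, not Elasticsearch's actual lowercase and asciifolding filters: it lowercases the value and strips combining marks after Unicode decomposition, which covers accented Latin letters such as é but not everything asciifolding handles. The function name `normalize_keyword` is made up for this example.

```python
import unicodedata


def normalize_keyword(value):
    """Approximate a lowercase + asciifolding normalizer (illustrative only)."""
    lowered = value.lower()
    # Decompose accented characters into base letter + combining mark,
    # then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", lowered)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


values = ["mine", "Mine", "MINE", "miné"]
print({normalize_keyword(v) for v in values})
```

Since keyword values are normalized before indexing, aggregations return the normalized values, so all four variants would fall into a single bucket and the include pattern should be written in lowercase without accents. The normalizer has to be part of the mapping when the field is created, so existing data would need to be reindexed.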

Using inner_hits inside an aggregation

I have a collection of documents which all contain an array of nested objects with important data. I want to do an aggregation on these which returns the first document, the last document, and all of the nested objects in that group. I can achieve everything in that list except for the nested objects.
Mapping:
"instances": {
"properties": {
"aggField": {
"type": "string",
"index": "not_analyzed"
},
"id": {
"type": "integer"
},
"nestedObjs": {
"type": "nested",
"properties": {
"key": {
"type": "string",
"index": "not_analyzed"
},
"value": {
"type": "integer"
}
}
},
"timestamp": {
"type": "date",
"format": "dateOptionalTime"
}
}
}
Query:
{
"size" : 0,
"aggs" : {
"agg-buckets" : {
"terms" : {
"field" : "aggField",
"size" : 10
},
"aggs": {
"last-report": {
"top_hits": {
"sort": [
{
"timestamp": {
"order": "desc"
}
}
],
"size": 1
}
},
"first-report": {
"top_hits": {
"sort": [
{
"timestamp": {
"order": "asc"
}
}
],
"size": 1
}
},
"nested-objs": {
"nested": {
"path": "nestedObjs",
"inner_hits": {}
}
}
}
}
}
}
But this fails with:
Parse Failure [Unexpected token START_OBJECT in [nested-objs].]
If I remove the "inner_hits" field it works ok. But it just gives me the document count and not the documents themselves.
What am I doing wrong?
Edit: I'm using ES version 1.7.1
Are you sure that inner_hits is allowed in a nested aggregation (as opposed to a nested query)? I suspect that's what's causing the error.
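One possible alternative, offered here only as a sketch: instead of inner_hits, put a top_hits sub-aggregation under the nested aggregation so that the nested objects themselves are returned. This assumes the Elasticsearch version in use accepts top_hits under a nested aggregation; the helper `nested_with_top_hits` and the sub-aggregation name "objs" are made up for this example.

```python
def nested_with_top_hits(path, size=10):
    """Build a nested aggregation that returns nested objects via top_hits.

    Hypothetical helper; it replaces the rejected "inner_hits": {} entry.
    """
    return {
        "nested": {"path": path},
        "aggs": {
            # top_hits returns the nested objects, not just their count.
            "objs": {"top_hits": {"size": size}}
        },
    }


nested_objs = nested_with_top_hits("nestedObjs", size=100)
```

The resulting dict would take the place of the "nested-objs" entry in the query above.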
