ElasticSearch sort by value - sorting

I have Elasticsearch 5 and would like to sort by the value of a field. Imagine documents with a category field, e.g. genre, whose values could be sci-fi, drama, or comedy. When searching, I would like to order the results so that comedies come first, then sci-fi, and drama last. Within each group I will then order by other criteria. Could somebody point me to how to do this?

Elasticsearch Sort Using Manual Ordering
This is possible in Elasticsearch: you can assign an order based on particular values of a field.
I've implemented what you are looking for with script-based sorting, using a Painless script. The Elasticsearch documentation on both covers the details, but the query below should be enough for what you are looking for.
I'm assuming genre and movie are text fields with a keyword sub-field named raw, as in the mapping below.
PUT sampleindex
{
"mappings": {
"_doc": {
"properties": {
"genre": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
}
}
},
"movie": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
}
}
}
}
}
}
}
You can use the query below to get that ordering:
GET sampleindex/_search
{
"query": {
"match_all": {}
},
"sort": [{
"_script": {
"type": "number",
"script": {
"lang": "painless",
"inline": "if(params.scores.containsKey(doc['genre.raw'].value)) { return params.scores[doc['genre.raw'].value];} return 100000;",
"params": {
"scores": {
"comedy": 0,
"sci-fi": 1,
"drama": 2
}
}
},
"order": "asc"
}
},
{ "movie.raw": { "order": "asc"}
}]
}
Note that I've included sort clauses for both genre and movie: results are sorted by genre first and then, within each genre, by movie.
Hope it helps.
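To make the ordering concrete, here is a minimal Python sketch of what the two sort clauses do, outside Elasticsearch. The rank map and the 100000 fallback are copied from the query above; the sample documents, their titles, and the helper name sort_key are hypothetical and only there for illustration.

```python
# Genre ranks copied from the query's params.scores.
GENRE_RANK = {"comedy": 0, "sci-fi": 1, "drama": 2}
DEFAULT_RANK = 100000  # same fallback the script returns for unknown genres


def sort_key(doc):
    """Mimic the two sort clauses: genre rank first, then movie title."""
    return (GENRE_RANK.get(doc["genre"], DEFAULT_RANK), doc["movie"])


# Hypothetical sample documents, for illustration only.
sample_docs = [
    {"genre": "drama", "movie": "delta"},
    {"genre": "comedy", "movie": "bravo"},
    {"genre": "horror", "movie": "echo"},
    {"genre": "sci-fi", "movie": "charlie"},
    {"genre": "comedy", "movie": "alpha"},
]

ordered = sorted(sample_docs, key=sort_key)
for doc in ordered:
    print(doc["genre"], doc["movie"])
```

Genres missing from the map (horror here) fall to the end, just as the 100000 fallback does in the script.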

Related

Number of nested objects in Elasticsearch

I'm looking for a way to get the number of nested objects in a document, for querying, sorting, etc.
For example, given this index:
PUT my-index-000001
{
"mappings": {
"properties": {
"some_id": {"type": "long"},
"user": {
"type": "nested",
"properties": {
"first": {
"type": "keyword"
},
"last": {
"type": "keyword"
}
}
}
}
}
}
PUT my-index-000001/_doc/1
{
"some_id": 111,
"user" : [
{
"first" : "John",
"last" : "Smith"
},
{
"first" : "Alice",
"last" : "White"
}
]
}
How can I filter by the number of users (e.g. a query fetching all documents with more than XX users)?
I was thinking of using a runtime field, but this gives an error:
GET my-index-000001/_search
{
"runtime_mappings": {
"num": {
"type": "long",
"script": {
"source": "emit(doc['some_id'].value)"
}
},
"num1": {
"type": "long",
"script": {
"source": "emit(doc['user'].size())" // <- this breaks with "No field found for [user] in mapping"
}
}
}
,"fields": [
"num","num1"
]
}
Is it possible, perhaps using aggregations?
It would also be nice to know whether I can sort the results (e.g. all documents with more than XX users, sorted descending by the number of users).
Thanks.
You cannot query this efficiently.
You can use the hack below, but I would only do so for one-time fetching, not for a regular use case: it reads params._source and is therefore really slow when you have a lot of docs.
{
"query": {
"function_score": {
"min_score": 1, # -> min number of nested docs to filter by
"query": {
"match_all": {}
},
"functions": [
{
"script_score": {
"script": "params._source['user'].size()"
}
}
],
"boost_mode": "replace"
}
}
}
It calculates a new score for each doc, equal to the length of the user array, and then drops every doc whose score is below min_score from the results.
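As a rough illustration of that behaviour, the following Python sketch replays the scoring and the min_score cut-off on plain dictionaries. The document with some_id 111 mirrors the example from the question; the other documents and the function name filter_by_user_count are hypothetical. Treating a missing user array as zero is an assumption of the sketch, not something the script above does.

```python
def filter_by_user_count(docs, min_score):
    """Score each doc by its number of nested users and drop low scores."""
    hits = []
    for doc in docs:
        # Plays the role of the script_score; a missing array counts as 0 here.
        score = len(doc.get("user", []))
        # min_score excludes docs whose score is lower than the threshold.
        if score >= min_score:
            hits.append((score, doc["some_id"]))
    # By default, hits come back ordered by score, highest first.
    return sorted(hits, reverse=True)


sample_docs = [
    {"some_id": 111, "user": [{"first": "John", "last": "Smith"},
                              {"first": "Alice", "last": "White"}]},
    {"some_id": 222, "user": [{"first": "Hypothetical", "last": "User"}]},
    {"some_id": 333},
]

print(filter_by_user_count(sample_docs, 2))
```

Since docs with a score equal to min_score are kept, fetching docs with more than XX users means setting min_score to XX + 1.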
The best way to do this is to add a userCount field at indexing time (since you know how many elements there are) and then query that field using a range query. It is simple, efficient, and fast.
Each element of the nested array is a document in itself and is thus not queryable via the root-level document.
If you cannot re-create your index, you can use the _update_by_query endpoint to add that field:
POST my-index-000001/_update_by_query?wait_for_completion=false
{
"script": {
"source": """
ctx._source.userCount = ctx._source.user != null ? ctx._source.user.size() : 0
"""
}
}
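Once userCount exists, the range query and the sort from the question could look roughly like the body built below. This is a sketch, assuming userCount ends up mapped as a numeric field; the helper name user_count_search is hypothetical, and the body would be sent to my-index-000001/_search with whatever client you use.

```python
import json


def user_count_search(min_users):
    """Build a search body: docs with more than `min_users` users, largest first."""
    return {
        # Range query on the precomputed counter.
        "query": {"range": {"userCount": {"gt": min_users}}},
        # Sort descending by the same field.
        "sort": [{"userCount": {"order": "desc"}}],
    }


body = user_count_search(1)
print(json.dumps(body, indent=2))
```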

Elasticsearch - extract values from a flattened field inside a nested type using a runtime field script

I have this mapping (a big document, actually; many fields are excluded for brevity):
{
"items": {
"mappings": {
"dynamic": "false",
"properties": {
"id": {
"type": "keyword"
},
"costs": {
"type": "nested",
"properties": {
"id": {
"type": "keyword"
},
"costs_samples": {
"type": "flattened"
}
}
}
}
}
}
}
costs_samples is a flattened field holding a huge collection of possible costs (sometimes more than 10k entries), based on some dynamic dimensions. I have to highlight that costs_samples cannot live outside costs, since at query time some conditions from costs should be composed with should or match clauses, like (costs.country=this_value AND some_other costs_samples_condition).
I would like to be able to extract a value and inject it as a new runtime field at the costs level, and then use that field to sort, filter, and aggregate.
Something like this:
{
"runtime_mappings": {
"costs.selected_cost": {
"type": "long",
"script": {
"source":
"for (def cost : doc['costs.costs_samples']) { if(cost.values!= null) {emit(Long.parseLong(cost.values.some_dynamic_identity_known_at_query_time.last))} }"
}
}
},
"query":{
"nested": {
"path": "costs",
"query": {
"bool": {
"filter": [
{
"terms": {
"costs.id": ["id-1","id-2"]
}
},
{
"term": {
"costs.selected_cost": 10
}
}
]
}
}
}
},
"fields": ["costs.selected_cost"]
}
The trouble is that selected_cost is not created or returned by ES, and there is no error message.
What did I do wrong? The documentation was not helpful.
It may be worth mentioning that I've also tried two separate document types, items and costs, combined with a kind of join operation, but the results of the performance tests were really poor.
Thanks!
Long story short, there is no way to use a runtime field inside a nested field type.

Elasticsearch - check if Boolean keyword is in text

I'm having trouble with an Elasticsearch query.
I have about 22,000 Boolean keywords (e.g. ("stock market") ("trading"), ("ecology"|"pollution"), etc.) in an index, and I want to query for those documents whose keyword can be found in a certain text.
EDIT:
The mappings are:
{
"mapping": {
"properties": {
"id": {
"type": "long"
},
"word": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
These are examples of what this index's records look like:
"id" : 41074,
"word" : """("Technical-Vocational-Livelihood "|"TVL") ("Track")""",
"id" : 38684,
"word" : """("Special Education Program")""",
Now, I want to keep only the documents whose "word" can be found in a specific text, e.g.:
…Here, a student has the following options: prepare for college education, look for work, or start a business after graduation. K-12, which took effect in 2016, has four tracks: academic, technical-vocational-livelihood (TVL), arts and design and sports. “Before, when you drop out of high school, you are really a dropout; as you are expected to proceed to college or higher education…
Currently, I have this query:
GET client_keywords/_search
{
"query": {
"bool": {
"must": [
{"terms": {
"id": [
41074,
38684,
...
]
}
}
],
"filter": {
"script": {
"script": {
"lang": "painless",
"source": "",
"params": {}
}
}
}
}
}
}
I was thinking of filtering this through ES's simple query API, but I don't know whether it's possible to invoke that API from a script.

Sorting on nested fields containing null values in elasticsearch

I am trying to sort on a nested field in Elasticsearch, but docs whose nested field is null always appear at the top of the list when sorting in ascending order. I want to sort in both ascending and descending order, with the docs whose nested field is null at the end of the sorted list.
This is the sorting query I am using:
{
"query": {
"match_all": {}
},
"sort": {
"_script": {
"type": "string",
"order": "asc",
"script": {
"lang": "painless",
"source": "def val=params['_source'].tags; if(val==null){return '';}else{return params['_source'].tags} "
}
}
}
}
Below is the mapping for the nested 'tags' field:
"tags": {
"type": "nested",
"properties": {
"tag": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
},
"analyzer": "index_analyzer",
"search_analyzer": "search_analyzer"
}
}
}
Sample payload:
"tags": [
{
"tag": "check"
},
{
"tag": "production"
},
{
"tag": "test"
}
]
Since tags is a nested field, you need to apply a nested sort on it, and to make documents without tags come last in the results you only need to use the missing parameter.
NB: We sort on the keyword sub-field of tag, since the sort key must not be analyzed.
Here is the example query:
{
"query": {
"match_all": {}
},
"sort": [
{
"tags.tag.keyword": {
"order": "asc",
"missing": "_last",
"nested": {
"path": "tags"
}
}
}
]
}
Since you are sorting on a multi-valued field, I recommend checking the section of the sort documentation about sort modes for a better understanding of Elasticsearch's behavior in this case.
Hope it helps!
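To illustrate how sort modes and missing: _last interact, here is a small Python sketch that imitates the query on plain dictionaries. It assumes the default sort mode (the lowest value is picked for ascending order, the highest for descending). The document with id 1 uses the sample payload from the question; the other documents and the function name sort_by_tag are hypothetical.

```python
def sort_by_tag(docs, descending=False):
    """Sort docs by nested tag value, always keeping docs without tags last."""
    # Default sort mode: min for ascending order, max for descending order.
    pick = max if descending else min
    with_tags = [d for d in docs if d.get("tags")]
    without_tags = [d for d in docs if not d.get("tags")]
    with_tags.sort(
        key=lambda d: pick(t["tag"] for t in d["tags"]),
        reverse=descending,
    )
    # "missing": "_last" puts docs without a value at the end in both directions.
    return with_tags + without_tags


sample_docs = [
    {"id": 1, "tags": [{"tag": "check"}, {"tag": "production"}, {"tag": "test"}]},
    {"id": 2, "tags": [{"tag": "alpha"}]},
    {"id": 3},
    {"id": 4, "tags": [{"tag": "zeta"}]},
]

print([d["id"] for d in sort_by_tag(sample_docs)])
print([d["id"] for d in sort_by_tag(sample_docs, descending=True)])
```

Document 1 is compared by "check" when ascending and by "test" when descending, which is why a multi-valued field can land in a different relative position depending on the sort direction.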

MySql Order By Value equivalent in ElasticSearch 5.6

ElasticSearch Version: 5.6
I have imported MySQL data into Elasticsearch and added the mappings as required. The following is the mapping for the column application_status.
Mappings:
{
"settings": {
"analysis": {
"analyzer": {
"case_insensitive": {
"type": "custom",
"tokenizer": "keyword",
"filter": ["lowercase"]
}
}
}
},
"mappings": {
"lead": {
"properties": {
"application_status": {
"type": "string",
"analyzer": "case_insensitive",
"fields": {
"keyword": {
"type": "keyword"
}
}
}
}
}
}}
With the above mapping, I am able to do simple sorting (asc or desc) using the following query:
{
"size": 50,
"from": 0,
"sort": [{
"application_status.keyword": {
"order": "asc"
}
}]}
which is the MySQL equivalent of
select * from <table_name> order by application_status asc limit 50;
I need help with the following problem:
I have a MySQL query which sorts based on application_status:
select * from vLoan_application_grid order by CASE WHEN application_status = "IP_QUAL_REASSI" THEN application_status END desc, CASE WHEN application_status = "IP_COMPLE" THEN application_status END desc, CASE WHEN application_status LIKE "IP_FRESH%" THEN application_status END desc, CASE WHEN application_status LIKE "IP_%" THEN application_status END desc
Please help me write the same query in Elasticsearch. I am not able to find an order-by-value equivalent for strings in Elasticsearch. Searching online, I understood that I should use sorting scripts, but I could not find any proper documentation.
I have the following query, which just does a simple sort.
{
"size": 500,
"from": 0,
"query" : {
"match_all": {}
},
"sort": {
"_script": {
"type": "string",
"script": {
"source": "doc['application_status.keyword'].value",
"params": {
"factor": ["IP_QUAL_REASS", "IP_COMPLE"]
}
},
"order": "desc"
}
}}
In the above query, I am not using the params section, as I do not know how to use it with type: string.
I realize I am asking a lot. Any help or links to relevant documentation would be greatly appreciated. I hope the question is clear; I'll provide more details if necessary.
You have two options:
First, and most performant: at indexing time, index an additional numeric field holding a number of your choice that represents the status. At search time, you simply sort by that number instead of by the status.
Second: at search time, use a script that does almost the same thing as the first option, but dynamically. It is less performant (but still quite fast).
Below is the second option:
"sort": {
"_script": {
"type": "number",
"script": {
"source": "if (params.factor[0].containsKey(doc['application_status.keyword'].value)) return params.factor[0].get(doc['application_status.keyword'].value); else return 1000;",
"params": {
"factor": [{
"IP_QUAL_REASS":1,
"IP_COMPLE":2,
"whatever":3
}
]
}
},
"order": "asc"
}
}
If you also want things like LIKE WHATEVER%, my suggestion is to consider an indexing-time change rather than a search-time one, because the script gets more complex. Still, this is the version that handles wildcard (prefix) matches as well:
"sort": {
"_script": {
"type": "number",
"script": {
"source": "if (params.factor[0].containsKey(doc['application_status.keyword'].value)) return params.factor[0].get(doc['application_status.keyword'].value); else { params.wildcard_factors[0].entrySet().stream().filter(kv -> doc['application_status.keyword'].value.startsWith(kv.getKey())).map(Map.Entry::getValue).findFirst().orElse(1000)}",
"params": {
"factor": [
{
"IP_QUAL_REASS": 1,
"IP_COMPLE": 2,
"whatever": 3
}
],
"wildcard_factors": [
{
"REJ_": 66
}
]
}
},
"order": "asc"
}
}
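For the first option (computing the number at indexing time), a minimal Python sketch of the ranking step might look like this. The rank values and the 1000 fallback are copied from the scripts above; the field name application_status_rank and the helper names are hypothetical, and the sketch assumes you can run it over each document before indexing.

```python
# Exact-match ranks, copied from params.factor in the scripts above.
EXACT_RANKS = {"IP_QUAL_REASS": 1, "IP_COMPLE": 2}
# Prefix ranks, copied from params.wildcard_factors; checked in list order.
PREFIX_RANKS = [("REJ_", 66)]
DEFAULT_RANK = 1000  # same fallback as in the scripts


def status_rank(status):
    """Return the numeric sort rank for an application status."""
    if status in EXACT_RANKS:
        return EXACT_RANKS[status]
    for prefix, rank in PREFIX_RANKS:
        if status.startswith(prefix):
            return rank
    return DEFAULT_RANK


def with_rank(doc):
    """Return a copy of the document with the hypothetical rank field added."""
    return {**doc, "application_status_rank": status_rank(doc["application_status"])}


print(with_rank({"application_status": "IP_COMPLE"}))
print(with_rank({"application_status": "REJ_SOMETHING"}))
```

At search time the sort then becomes a plain ascending sort on application_status_rank, with no script involved. Keeping the prefixes in an ordered list makes the priority explicit when more than one prefix could match.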
