How do I make a field sortable in ElasticSearch? - elasticsearch

I have tried, for example:
{
  "sort": [
    {
      "retail_price": {
        "reverse": true
      }
    }
  ]
}
... to no avail. Do I need to map the field a special way in order to enable sorting on it?

The field should satisfy two conditions: 1) it has to be indexed and 2) it shouldn't have more than one value per document or more than one token per field. If retail_price is indexed in your case and it still doesn't work for you, could you post a script that demonstrates the problem?
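For comparison, newer Elasticsearch releases express descending order with "order": "desc" rather than "reverse": true. The following is a minimal sketch of building such a request body, assuming retail_price is indexed as a single-valued numeric field; the helper name sort_by_field is hypothetical and not part of any client library.

```python
from typing import Any, Dict


def sort_by_field(field: str, descending: bool = True) -> Dict[str, Any]:
    """Build a search body fragment that sorts on one field."""
    order = "desc" if descending else "asc"
    return {"sort": [{field: {"order": order}}]}


# Body that could be sent to the _search endpoint of the index.
body = sort_by_field("retail_price")
```

The resulting dict can be serialized with json.dumps and sent as the search request body.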

Related

How can I let ES support mixed type of a field?

I am saving logs to Elasticsearch for analysis, but one particular field arrives with mixed types, which causes an error when indexing the document.
For example, I may save the log below to the index, where uuid is an object:
POST /index-000001/_doc
{
"uuid": {"S": "001"}
}
but from another event, the log would be:
POST /index-000001/_doc
{
"uuid": "001"
}
The second POST fails because uuid is no longer an object, so I get this error: object mapping for [uuid] tried to parse field [uuid] as object, but found a concrete value
What is the best solution for this? I can't change the logs because they come from different applications: the first log is DynamoDB data, while the second is application data. How can I save both types of logs into ES?
If I disable dynamic mapping, I have to specify every field in the index mapping, and any new fields will not be searchable, so I do need dynamic mapping.
There will be many cases like this, so I am looking for a solution that covers all conflicting fields.
It's perfectly possible using ingest pipelines which are run before the indexing process.
The following would be a solution for your particular use case, albeit somewhat onerous:
Create a pipeline:
PUT _ingest/pipeline/uuid_normalize
{
  "description": "Makes sure uuid is a hash map",
  "processors": [
    {
      "script": {
        "source": """
          if (ctx.uuid != null && !(ctx.uuid instanceof java.util.HashMap)) {
            ctx.uuid = ['S': ctx.uuid]; // hash map init
          }
        """
      }
    }
  ]
}
Run the pipeline when ingesting a new doc by adding the pipeline query parameter, as in the second request:
POST /index-000001/_doc
{
  "uuid": {"S": "001"}
}
POST /index-000001/_doc?pipeline=uuid_normalize
{
  "uuid": "001"
}
You could now extend this to be as generic as you like but it is assumed that you know what you expect as input in each and every doc. In other words, unlike dynamic templates, you need to know what you want to safeguard against.
You can read more about painless script operators here.
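To illustrate what "extending this to be more generic" might look like, here is a minimal sketch of the same wrapping rule applied to a table of known conflicting fields. It is written in Python for brevity and runs client-side; it is not the ingest pipeline itself. The WRAP_KEYS table and the normalize function are hypothetical names, and the table still has to list every field you want to safeguard.

```python
from typing import Any, Dict, Mapping

# Hypothetical table: field name -> key under which a bare value is wrapped.
WRAP_KEYS: Dict[str, str] = {"uuid": "S"}


def normalize(doc: Mapping[str, Any],
              wrap_keys: Mapping[str, str] = WRAP_KEYS) -> Dict[str, Any]:
    """Return a copy of doc where listed fields are always objects."""
    out = dict(doc)
    for field, key in wrap_keys.items():
        value = out.get(field)
        # Same condition as the pipeline script: present and not already a map.
        if value is not None and not isinstance(value, dict):
            out[field] = {key: value}
    return out
```

Fields that are absent or already objects pass through unchanged, which mirrors the null check and instanceof test in the script above.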
You just cannot.
You should either normalize all your fields one way or another,
or use two separate fields.
I can suggest using a field like this:
"uuid": {"key": "S", "value": "001"}
and skipping the key when it is not necessary.
But you will have to preprocess your values before ingestion.
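The preprocessing step could be sketched roughly as follows, assuming the object form always carries exactly one key (as in {"S": "001"}). The function name to_key_value is hypothetical.

```python
from typing import Any, Dict


def to_key_value(raw: Any) -> Dict[str, Any]:
    """Convert either input shape into the {"key": ..., "value": ...} form."""
    if isinstance(raw, dict):
        if len(raw) != 1:
            # Assumption: typed values carry exactly one type key.
            raise ValueError("expected exactly one key, e.g. {'S': '001'}")
        key, value = next(iter(raw.items()))
        return {"key": key, "value": value}
    # Bare value: skip the key, as suggested above.
    return {"value": raw}
```

Both input shapes then map to an object, so the uuid field keeps a single mapping type.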

Multiple filter by array of object in Elastic 6.*

I need help building a query over an array of objects in ElasticSearch 6. I have documents that represent some property units with a number of attributes:
{
  "Unit": {
    "Attributes": {
      "Attribute": [
        {
          "Name": "Elevator",
          "Text": "No"
        },
        {
          "Name": "Pet Friendly",
          "Text": "Yes"
        }
        ...
      ]
    }
  }
}
How can I filter my documents to find all pet friendly units or all units without an elevator?
P.S. I am using NEST.
Map Attribute as a nested type, probably with Text mapped as keyword for term level matching. To query, use a bool query with filter clauses, where the clauses will be nested queries.
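As a rough sketch of that advice, the request bodies might look like the following, written as plain Python dicts rather than NEST calls. It assumes both Name and Text are mapped as keyword, and that the 6.x mapping type is named _doc; the helper names attribute_clause and units_query are hypothetical.

```python
from typing import Any, Dict

NESTED_PATH = "Unit.Attributes.Attribute"

# Assumed mapping body for index creation (type name "_doc" is an assumption).
mapping = {
    "mappings": {"_doc": {"properties": {"Unit": {"properties": {
        "Attributes": {"properties": {"Attribute": {
            "type": "nested",
            "properties": {
                "Name": {"type": "keyword"},
                "Text": {"type": "keyword"},
            },
        }}}
    }}}}}
}


def attribute_clause(name: str, text: str) -> Dict[str, Any]:
    """Nested query matching one attribute object by Name and Text."""
    return {"nested": {"path": NESTED_PATH, "query": {"bool": {"filter": [
        {"term": {NESTED_PATH + ".Name": name}},
        {"term": {NESTED_PATH + ".Text": text}},
    ]}}}}


def units_query(*clauses: Dict[str, Any]) -> Dict[str, Any]:
    """Outer bool query whose filter clauses must all match."""
    return {"query": {"bool": {"filter": list(clauses)}}}


pet_friendly = units_query(attribute_clause("Pet Friendly", "Yes"))
no_elevator = units_query(attribute_clause("Elevator", "No"))
```

Because each nested clause is evaluated against a single attribute object, Name and Text are matched on the same array element rather than across different elements.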

ElasticSearch Function Score Query not defaulting to 1 if missing field

I am trying to create a function score in ElasticSearch that (after other filters) gives a higher score to those customers who bought a product recently. For this I have a field "lastPurchaseOn".
The query also needs to return customers who did not buy any product, so I cannot filter on the lastPurchaseOn field being present.
"functions": [
  {
    "exp": {
      "lastPurchaseOn": {
        "scale": "3d"
      }
    }
  },
  ...
]
The problem is that when the field "lastPurchaseOn" is missing, the function returns score 1, when I would really want it to return 0.
Is there a way to make the function return 0 for missing values?
Thanks
The official documentation states it pretty clearly:
If the numeric field is missing in the document, the function will return 1.
The main reason is that the exponential function being used can never return 0. You can try using a linear function instead of exp.
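The difference between the two decay shapes can be sketched numerically. The formulas below follow the decay definitions as I recall them from the function_score documentation (default decay of 0.5, distance measured from the origin in the same unit as scale); treat them as an illustration of the curves for documents that do have the field, not as the engine's exact implementation.

```python
import math


def exp_decay(distance: float, scale: float,
              decay: float = 0.5, offset: float = 0.0) -> float:
    """Exponential decay: approaches 0 but never reaches it."""
    lam = math.log(decay) / scale
    return math.exp(lam * max(0.0, distance - offset))


def linear_decay(distance: float, scale: float,
                 decay: float = 0.5, offset: float = 0.0) -> float:
    """Linear decay: hits exactly 0 once distance >= scale / (1 - decay)."""
    s = scale / (1.0 - decay)
    return max((s - max(0.0, distance - offset)) / s, 0.0)


# With scale = 3 days, both curves give the decay value (0.5) at 3 days,
# but only the linear one drops to 0 further out.
far_exp = exp_decay(300, 3)
far_linear = linear_decay(300, 3)
```

Under these formulas, far_exp stays strictly positive while far_linear is exactly zero.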

Exclude setting on integer field in term query

My documents contain an integer array field, storing the id of tags describing them. Given a specific tag id, I want to extract a list of top tags that occur most frequently together with the provided one.
I can solve this problem by combining a terms aggregation over the tag id field with a term filter over the same field, but the list I get back obviously always starts with the tag id I provide: all documents matching my filter have that tag, and it is thus the first in the list.
I thought of using the exclude setting to avoid creating the problematic bucket, but as I'm dealing with an integer field, that does not seem to be possible. This query
{
  "size": 0,
  "query": {
    "term": {
      "tag_ids": "00001"
    }
  },
  "aggs": {
    "tags": {
      "terms": {
        "size": 3,
        "field": "tag_ids",
        "exclude": "00001"
      }
    }
  }
}
returns an error saying that Aggregation [tags] cannot support the include/exclude settings as it can only be applied to string values.
Is it possible to avoid getting back this bucket?
This is, as of Elasticsearch 1.4, a shortcoming of ES itself.
After the community proposed this change, the functionality has been added and will be included in Elasticsearch 1.5.0.
It's supposed to be fixed since version 1.5.0.
Look at this: https://github.com/elasticsearch/elasticsearch/pull/7727
While it is en route to being fixed, my workaround is to have the aggregation use a script instead of direct access to the field, and let that script return the value as a string.
It works well, with no measurable performance loss.
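A different workaround from the script approach above, not taken from the answers, is to request one extra bucket and drop the unwanted one client-side. The sketch below assumes the buckets come from the aggregations.tags.buckets part of the response, each with a "key" and a "doc_count"; the function name is hypothetical.

```python
from typing import Any, Dict, List


def top_co_occurring(buckets: List[Dict[str, Any]],
                     excluded_key: int, size: int) -> List[Dict[str, Any]]:
    """Drop the bucket for the queried tag and keep the first `size` others."""
    kept = [b for b in buckets if b["key"] != excluded_key]
    return kept[:size]


# Example response fragment, as if the aggregation had asked for size + 1 = 4.
sample = [
    {"key": 1, "doc_count": 40},
    {"key": 7, "doc_count": 12},
    {"key": 3, "doc_count": 9},
    {"key": 5, "doc_count": 4},
]
top3 = top_co_occurring(sample, excluded_key=1, size=3)
```

The aggregation itself would need "size" set to one more than the number of tags wanted, so that removing the queried tag still leaves a full list.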

Can elasticsearch return multiple value fields in a single facet?

I am looking for a way to create a facet such that I can essentially return two values for one key.
For instance, I am attempting to retrieve both an amount and schedule properties of an object. I attempted to use a computed value script, but the calculations that have to be done using the two objects are date based, and require an external library to perform them.
Basically, something along the lines of:
"theFacet": {
  "terms_stats": {
    "key_field": "someKeyProbablyADate",
    "value_field": "amount",
    "value_field": "simpleSchedule"
  }
}
Workarounds are also appreciated. Perhaps some way to return a new dynamic object with both fields?
Sounds like you want to pre-process your data before you index it into a single field, then facet on that.
Something along the lines of a single string containing key#amount#schedule.
Then when you get the faceting results back you can split it up again and run whatever logic you want.
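The pack-and-split idea can be sketched as follows, assuming none of the three parts ever contains the "#" separator. The function names pack and unpack are hypothetical, and values come back as strings, so any date or number parsing is left to the caller.

```python
from typing import Tuple

SEP = "#"


def pack(key: str, amount: object, schedule: str) -> str:
    """Join the three values into the single string to index and facet on."""
    parts = [str(key), str(amount), str(schedule)]
    if any(SEP in p for p in parts):
        raise ValueError("values must not contain the separator")
    return SEP.join(parts)


def unpack(term: str) -> Tuple[str, str, str]:
    """Split a facet term back into (key, amount, schedule) strings."""
    key, amount, schedule = term.split(SEP)
    return key, amount, schedule


term = pack("2014-01-01", 250, "monthly")
```

After faceting on the combined field, each returned term can be passed to unpack and fed into whatever date logic the external library provides.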
Try combining different fields with a script element. For example:
"facets": {
  "facet-name": {
    "terms": {
      "field": "some-field",
      "script": "_source['another-field'] + '/' + term"
    }
  }
}
