Aggregating nested fields of varying datatypes in Elasticsearch - elasticsearch

I have an index based on Products, and one of the fields declared in the mapping is Attributes. This field is a nested type as it will contain two values: key and value. The problem I have is that, depending on the context of the attribute, the datatype of value can vary between an integer and a string.
For example:
{"attributes":[{"key":"StrEx","value":"Red"},{"key":"IntEx","value":2}]}
It seems the datatype for every instance of 'value' within all future nested documents within Attributes is decided by the first data entered. I need to be able to store it as an integer/long datatype so I can perform range queries.
Any help or alternative ideas would be greatly appreciated.

You need a mapping like this one, for the value field:
"value": {
  "type": "string",
  "fields": {
    "as_number": {
      "type": "integer",
      "ignore_malformed": true
    }
  }
}
Basically, your field stays a string, but through fields (a multi-field) Elasticsearch also tries to index each value as a number; values that cannot be parsed are skipped thanks to ignore_malformed.
When you want to run range queries, use value.as_number; for anything else, use value.
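As a sketch of how that would be queried (the index name products and the nested path attributes are assumptions based on the question), a range query against the numeric sub-field could look like this:

POST products/_search
{
  "query": {
    "nested": {
      "path": "attributes",
      "query": {
        "bool": {
          "must": [
            { "term": { "attributes.key": "IntEx" } },
            { "range": { "attributes.value.as_number": { "gte": 1, "lte": 5 } } }
          ]
        }
      }
    }
  }
}

Documents whose "IntEx" attribute held a non-numeric value simply won't have an as_number value and fall out of the range clause.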

Related

Unexpected result using Elasticsearch when dash character is involved

I'm querying Elasticsearch 2.3 using django-haystack, and the query that is executed seems to be the following:
'imaging_telescopes:(*\\"FS\\-60\\"*)'
An object in my Elasticsearch data has the following value for its property imaging_telescopes: "Takahashi FSQ-106N".
This object matches the query, and to me this result is unexpected; I wouldn't want it to match.
My assumption is that it matches because it contains the letters FS, but in my frontend I'm just searching for "FS-60".
How can I modify the query so that it's stricter in looking for objects whose property imaging_telescopes exactly contains some text?
Thanks!
EDIT: this is the mapping of the field:
"imaging_telescopes": {
  "type": "string",
  "analyzer": "snowball"
}
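One way to check the assumption about why it matches (a sketch; the index name is hypothetical) is to run the stored text through the snowball analyzer and inspect the tokens it produces. The dash splits "FSQ-106N" into separate tokens, so a wildcard pattern containing FS can match one of them:

GET myindex/_analyze
{
  "analyzer": "snowball",
  "text": "Takahashi FSQ-106N"
}

If the tokens come back split on the dash, an exact-match behaviour would need a not_analyzed (or keyword-analyzed) sub-field queried instead of the analyzed one.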

How to compare two text fields in elastic search

I have a need to compare two fields in Elasticsearch, but they are text fields.
For normal fields I can use a script to compare using doc['field'].value, but is there a way to do the same for text fields?
See below excerpt from ES DOCS :
By far the fastest most efficient way to access a field value from a script is to use the doc['field_name'] syntax, which retrieves the field value from doc values. Doc values are a columnar field value store, enabled by default on all fields except for analyzed text fields.
There are two ways known to me to access a text value from a script.
Map a keyword representation of the text field as well, and access that field.
{
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "fields": {
          "keyword": { // <======= See this
            "type": "keyword"
          }
        }
      }
    }
  }
}
The keyword representation can be accessed like doc['name.keyword'].value.
It is recommended to index/store a keyword representation for small text fields like 'name' or 'emailId', but not for larger fields like 'description', due to the memory overhead.
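As a sketch of the comparison itself (the field names name and alias are assumptions, and both are presumed to have keyword sub-fields), a Painless script query could filter documents where the two values match:

GET myindex/_search
{
  "query": {
    "bool": {
      "filter": {
        "script": {
          "script": {
            "lang": "painless",
            "source": "doc['name.keyword'].size() != 0 && doc['alias.keyword'].size() != 0 && doc['name.keyword'].value == doc['alias.keyword'].value"
          }
        }
      }
    }
  }
}

The size() checks guard against documents where either field is missing, in line with the null-check advice for doc values.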
Another way is to enable field data on the text field and access that field.
Fielddata is disabled on text fields by default. Set fielddata=true on
[your_field_name] in order to load fielddata in memory by uninverting the
inverted index. Note that this can however use significant memory.
However, using fielddata on text fields is not recommended.
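If you do go that route anyway, fielddata can be enabled on an existing text field with a mapping update (a sketch; the index and field names are hypothetical):

PUT myindex/_mapping
{
  "properties": {
    "description": {
      "type": "text",
      "fielddata": true
    }
  }
}

This takes effect without reindexing, but the uninverted values are loaded into heap memory on first use, which is exactly the cost the warning above is about.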
Note: Please do add details on why you need comparison and what kind of comparison is required on 'text' fields.

Sorting Results efficiently by non mapped/searchable Object properties

In my current index called items I have previously sorted the results of my queries by some property values of a specific object property; let's call this property just oprop. After I realized that with each new key in oprop the total number of used fields on the items index increased, I had to change the mapping on the index. So I set oprop's mapping to dynamic: false, so the keys of oprop are no longer searchable (not indexed).
Now in some of my queries I need to sort the results on the items index by the values of oprop keys. I don't know how ElasticSearch still can give me the possibility to sort on these key values.
Do I need to use scripts for sorting? Do I have access on non indexed data when using scripts?
Somehow I don't think that this is a good approach and I think that in long term I will run into performance issues.
You could use scripts for sorting, since the data will still be stored in the _source field, but that should probably be a last resort. If you know which fields need to be sortable, you could just add those to the mapping and keep oprop as a non-dynamic field otherwise:
"properties": {
  "oprop": {
    "dynamic": false,
    "properties": {
      "sortable_key_1": {
        "type": "keyword" // keyword, so the field is sortable without fielddata
      },
      "sortable_key_2": {
        "type": "keyword"
      }
    }
  }
}
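Assuming the chosen keys are mapped as a sortable type (e.g. keyword; plain text fields cannot be sorted without fielddata), an ordinary field sort then works, sketched here with the placeholder key names from above:

GET items/_search
{
  "query": { "match_all": {} },
  "sort": [
    { "oprop.sortable_key_1": { "order": "asc" } }
  ]
}

This uses doc values and avoids the per-hit _source parsing a script sort would incur.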

Kibana scripted field which loops through an array

I am trying to use the metricbeat http module to monitor F5 pools.
I make a request to the f5 api and bring back json, which is saved to kibana. But the json contains an array of pool members and I want to count the number which are up.
The advice seems to be that this can be done with a scripted field. However, I can't get the script to retrieve the array. eg
doc['http.f5pools.items.monitor'].value.length()
returns in the preview results with the same 'Additional Field' added for comparison:
[
{
"_id": "rT7wdGsBXQSGm_pQoH6Y",
"http": {
"f5pools": {
"items": [
{
"monitor": "default"
},
{
"monitor": "default"
}
]
}
},
"pool.MemberCount": [
7
]
},
If I try
doc['http.f5pools.items']
Or similar I just get an error:
"reason": "No field found for [http.f5pools.items] in mapping with types []"
Googling suggests that the doc construct does not contain arrays?
Is it possible to make a scripted field which can access the set of values? I.e., is my code or the way I'm indexing the data wrong?
If not, is there an alternative approach within Metricbeat? I don't want to have to make a whole new API to do the calculation and add a separate field.
-- update.
Weirdly, it seems that the numeric values in the array do return the expected results, i.e.
doc['http.f5pools.items.ratio']
returns
{
"_id": "BT6WdWsBXQSGm_pQBbCa",
"pool.MemberCount": [
1,
1
]
},
-- update 2
OK, so if the strings in the field have different values then you get all the values; if they are the same you just get one. Why?
I'm adding another answer instead of deleting my previous one, which does not answer the actual question but may still be helpful for someone else in the future.
I found a hint in the same documentation:
Doc values are a columnar field value store
Upon googling this further I found this Doc Values Intro, which says that doc values are essentially an "uninverted index" useful for operations like sorting; my hypothesis is that while sorting you essentially don't want the same values repeated, and hence the data structure they use removes those duplicates. That still did not answer why it works differently for strings than for numbers: numbers are preserved, but strings are filtered down to unique values.
This “uninverted” structure is often called a “column-store” in other
systems. Essentially, it stores all the values for a single field
together in a single column of data, which makes it very efficient for
operations like sorting.
In Elasticsearch, this column-store is known as doc values, and is
enabled by default. Doc values are created at index-time: when a field
is indexed, Elasticsearch adds the tokens to the inverted index for
search. But it also extracts the terms and adds them to the columnar
doc values.
Some more deep-diving into doc values revealed it is a compression technique which actually de-duplicates the values for efficient and memory-friendly operations.
Here's a NOTE given on the link above which answers the question:
You may be thinking "Well that’s great for numbers, but what about
strings?" Strings are encoded similarly, with the help of an ordinal
table. The strings are de-duplicated and sorted into a table, assigned
an ID, and then those ID’s are used as numeric doc values. Which means
strings enjoy many of the same compression benefits that numerics do.
The ordinal table itself has some compression tricks, such as using
fixed, variable or prefix-encoded strings.
Also, if you don't want this behavior, you can disable doc values on the field.
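Disabling doc values is a mapping-time setting, sketched below with hypothetical index and field names. Note that it also disables sorting and aggregations on that field:

PUT myindex
{
  "mappings": {
    "properties": {
      "monitor": {
        "type": "keyword",
        "doc_values": false
      }
    }
  }
}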
OK, solved it.
https://discuss.elastic.co/t/problem-looping-through-array-in-each-doc-with-painless/90648
So, as I discovered, arrays are pre-filtered to only return distinct values (except in the case of ints, apparently?).
The solution is to use params._source instead of doc[].
The answer for why doc doesn't work:
Quoting below:
Doc values are a columnar field value store, enabled by default on all
fields except for analyzed text fields.
Doc-values can only return "simple" field values like numbers, dates,
geo- points, terms, etc, or arrays of these values if the field is
multi-valued. It cannot return JSON objects
Also, important to add a null check as mentioned below:
Missing fields
The doc['field'] will throw an error if field is
missing from the mappings. In painless, a check can first be done with
doc.containsKey('field')* to guard accessing the doc map.
Unfortunately, there is no way to check for the existence of the field
in mappings in an expression script.
Also, here is why _source works
Quoting below:
The document _source, which is really just a special stored field, can
be accessed using the _source.field_name syntax. The _source is loaded
as a map-of-maps, so properties within object fields can be accessed
as, for example, _source.name.first.
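Putting that together, a script field reading the array from _source could count the members in a given state. This is a sketch against the search API's script_fields; the index pattern metricbeat-* and the "up" monitor value are assumptions about the F5 data:

GET metricbeat-*/_search
{
  "script_fields": {
    "members_up": {
      "script": {
        "lang": "painless",
        "source": "def items = params._source.http.f5pools.items; int up = 0; if (items != null) { for (item in items) { if (item.monitor == 'up') { up++; } } } return up;"
      }
    }
  }
}

The null check on items guards documents where the pool array is missing, since _source is loaded as a map-of-maps and absent keys come back as null.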
Responding to your comment with an example:
The keyword here is: It cannot return JSON objects. The field doc['http.f5pools.items'] is a JSON object.
Try running below and see the mapping it creates:
PUT t5/doc/2
{
  "items": [
    {
      "monitor": "default"
    },
    {
      "monitor": "default"
    }
  ]
}
GET t5/_mapping
{
  "t5" : {
    "mappings" : {
      "doc" : {
        "properties" : {
          "items" : {
            "properties" : {
              "monitor" : { // <-- monitor is a property of the items property (an object)
                "type" : "text",
                "fields" : {
                  "keyword" : {
                    "type" : "keyword",
                    "ignore_above" : 256
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}

Elasticsearch: Constant Data Field Type

Is there a way to add an Elasticsearch data field to an index mapping, such that it always returns a constant numeric value?
I know I can just add a numeric datatype, and then reindex everything with the constant, but I would like to avoid reindexing, and I'd also like to be able to change the constant dynamically without reindexing.
Motivation: Our cluster has a lot of different indices. We routinely search multiple indices at once for various reasons. However, when searching multiple indices, our search logic still needs to treat each index slightly differently. One way we could do this is by adding a constant numeric field to each index and then using that field in our search query.
However, because this is a constant, it seems like we should not need to reindex everything (seems pointless to add a constant value to every record).
You could use the _meta field for that purpose:
PUT index1
{
  "mappings": {
    "_meta": {
      "constant": 1
    },
    "properties": {
      ... your fields
    }
  }
}
PUT index2
{
  "mappings": {
    "_meta": {
      "constant": 2
    },
    "properties": {
      ... your fields
    }
  }
}
You can change that constant value anytime, without any need for reindexing. The value is stored at the index level and can be read back anytime by simply retrieving the index mapping with GET index1,index2/_mapping.
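Updating the constant later is a plain mapping update that touches no documents, for example:

PUT index1/_mapping
{
  "_meta": {
    "constant": 3
  }
}

One caveat to keep in mind: _meta is only metadata, so it can be read by your application and used to steer query construction, but it cannot itself be referenced inside a search query the way a real document field could.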
