How can I let ES support mixed types for a field? - elasticsearch

I am saving logs to Elasticsearch for analysis, but I found that a particular field has mixed types, which causes errors when indexing documents.
For example, I may save the log below to the index, where uuid is an object:
POST /index-000001/_doc
{
  "uuid": {"S": "001"}
}
But from another event, the log would be:
POST /index-000001/_doc
{
  "uuid": "001"
}
The second POST will fail because uuid is already mapped as an object, so I get this error: object mapping for [uuid] tried to parse field [uuid] as object, but found a concrete value.
I wonder what the best solution for this is. I can't change the logs because they come from different applications: the first log comes from DynamoDB data, while the second one comes from the application itself. How can I save both types of logs into ES?
If I disable dynamic mapping, I will have to specify all fields in the index mapping, and I won't be able to search any new fields, so I do need dynamic mapping.
There will be many cases like this, so I am looking for a solution that can cover all conflicting fields.

It's perfectly possible using ingest pipelines, which run before the indexing process.
The following would be a solution for your particular use case, albeit somewhat onerous:
Create a pipeline:
PUT _ingest/pipeline/uuid_normalize
{
  "description" : "Makes sure uuid is a hash map",
  "processors" : [
    {
      "script": {
        "source": """
          if (ctx.uuid != null && !(ctx.uuid instanceof java.util.HashMap)) {
            ctx.uuid = ['S': ctx.uuid]; // hash map init
          }
        """
      }
    }
  ]
}
Run the pipeline when ingesting a new doc:
POST /index-000001/_doc
{
  "uuid": {"S": "001"}
}
POST /index-000001/_doc?pipeline=uuid_normalize <------
{
  "uuid": "001"
}
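You can sanity-check the pipeline before relying on it by running it against sample documents with the _simulate API (the sample docs below just mirror the two event shapes from the question):
POST _ingest/pipeline/uuid_normalize/_simulate
{
  "docs": [
    { "_source": { "uuid": "001" } },
    { "_source": { "uuid": { "S": "001" } } }
  ]
}
If you'd rather not pass ?pipeline on every request, you can also set index.default_pipeline in the index settings so the pipeline runs for every document indexed into that index.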
You could now extend this to be as generic as you like, but it is assumed that you know what you expect as input in each and every doc. In other words, unlike dynamic templates, you need to know what you want to safeguard against.
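As a sketch of one possible generalization (the field list and pipeline name here are made up for illustration; a fixed list of conflict-prone fields is assumed):
PUT _ingest/pipeline/normalize_conflicts
{
  "description" : "Wraps known conflict-prone fields in an object when they arrive as concrete values",
  "processors" : [
    {
      "script": {
        "source": """
          // Fields known to arrive either as objects or as plain values
          def fields = ['uuid', 'user_id'];
          for (def f : fields) {
            if (ctx[f] != null && !(ctx[f] instanceof Map)) {
              ctx[f] = ['S': ctx[f]]; // wrap the plain value, mirroring the DynamoDB shape
            }
          }
        """
      }
    }
  ]
}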
You can read more about Painless script operators here.

You just cannot.
You should either normalize all your fields in one way or another, or use two separate fields.
I can suggest using a field like this:
"uuid": {"key": "S", "value": "001"}
and skipping the key when not necessary.
But you will have to preprocess your values before ingestion.
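For illustration, a sketch of what the two events from the question might look like after such preprocessing:
POST /index-000001/_doc
{
  "uuid": { "key": "S", "value": "001" }
}
POST /index-000001/_doc
{
  "uuid": { "value": "001" }
}
Both documents now map uuid as an object, so they no longer conflict.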

Related

Fluent Bit set index mapping

I am trying to set all mapped fields to string, i.e. if a JSON message comes in with the following:
{
  "logDate": "2012-04-23T18:25:43.511Z",
  "logId": 123131,
  "message": {
    "username": "pera",
    "password": "pera123"
  }
}
I need to log every value as a string, i.e. logId should be logged as "logId": "123131".
Is there a way to tell Fluent Bit what index mapping to use, or maybe there is another setting that changes the dynamic type to string?
Maybe you can try adding an index template:
https://www.elastic.co/guide/en/elasticsearch/reference/current/index-templates.html
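A minimal sketch of that idea, assuming a recent Elasticsearch version and that Fluent Bit writes to indices matching logs-* (the template name and index pattern are made up; adjust them to your setup). Dynamic templates re-map incoming numeric fields as keyword, which is indexed and searched as a string:
PUT _index_template/strings_only
{
  "index_patterns": ["logs-*"],
  "template": {
    "mappings": {
      "dynamic_templates": [
        {
          "longs_as_strings": {
            "match_mapping_type": "long",
            "mapping": { "type": "keyword" }
          }
        },
        {
          "doubles_as_strings": {
            "match_mapping_type": "double",
            "mapping": { "type": "keyword" }
          }
        }
      ]
    }
  }
}
With this template in place, a document containing "logId": 123131 gets logId mapped as keyword, so queries treat it as the string "123131" (the stored _source keeps the original JSON unchanged).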

Exclude setting on integer field in term query

My documents contain an integer array field storing the ids of the tags describing them. Given a specific tag id, I want to extract a list of the top tags that occur most frequently together with the provided one.
I can solve this problem by associating a terms aggregation over the tag id field with a term filter over the same field, but the list I get back obviously always starts with the tag id I provide: all documents matching my filter have that tag, so it is the first in the list.
I thought of using the exclude setting to avoid creating the problematic bucket, but since I'm dealing with an integer field, that seems not to be possible: this query
{
  "size": 0,
  "query": {
    "term": {
      "tag_ids": "00001"
    }
  },
  "aggs": {
    "tags": {
      "terms": {
        "size": 3,
        "field": "tag_ids",
        "exclude": "00001"
      }
    }
  }
}
returns an error saying that Aggregation [tags] cannot support the include/exclude settings as it can only be applied to string values.
Is it possible to avoid getting back this bucket?
This is, as of Elasticsearch 1.4, a shortcoming of ES itself.
After the community proposed this change, the functionality has been added and will be included in Elasticsearch 1.5.0.
It's supposed to be fixed since version 1.5.0.
Look at this: https://github.com/elasticsearch/elasticsearch/pull/7727
While it is en route to being fixed, my workaround is to have the aggregation use a script instead of direct access to the field, and let that script use the value as a string.
It works well and without measurable performance loss.
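A sketch of that workaround, assuming the legacy 1.x value-script syntax where _value refers to each value of the aggregated field; converting each tag id to a string makes exclude applicable:
{
  "size": 0,
  "query": {
    "term": {
      "tag_ids": "00001"
    }
  },
  "aggs": {
    "tags": {
      "terms": {
        "size": 3,
        "field": "tag_ids",
        "script": "_value.toString()",
        "exclude": "00001"
      }
    }
  }
}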

Using a combined field as id mapping in ElasticSearch

From this question ("Use existing field as id in elasticsearch") I can see that it is possible.
My question is whether I can do a similar thing, but by concatenating fields.
{
  "RecordID": "a06b0000004SWbdAAG",
  "SystemModstamp": "01/31/2013T07:46:02.000Z",
  "body": "Test Body"
}
And then do something like
{
  "your_mapping" : {
    "_id" : {
      "path" : "RecordID" + "body"
    }
  }
}
So the id is automatically formed from concatenating those fields.
No you can't; you can only make the _id point to a field that's within the document, using dot notation if needed (e.g. level1.level2.id).
I'd suggest having a field that contains the whole id in your documents, or even better taking the id out and providing it in the URL, as configuring a path causes the document to be parsed when it's not otherwise needed.
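A sketch of the latter approach, with the client concatenating the values itself and passing the result in the URL (index and type names are made up; characters such as spaces would need to be stripped or URL-encoded):
PUT /your_index/your_type/a06b0000004SWbdAAG_TestBody
{
  "RecordID": "a06b0000004SWbdAAG",
  "SystemModstamp": "01/31/2013T07:46:02.000Z",
  "body": "Test Body"
}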

Can elasticsearch return multiple value fields in a single facet?

I am looking for a way to create a facet such that I can essentially return two values for one key.
For instance, I am attempting to retrieve both the amount and schedule properties of an object. I attempted to use a computed-value script, but the calculations that have to be done using the two objects are date-based and require an external library to perform them.
Basically, something along the lines of:
"theFacet": {
"terms_stats": {
"key_field": "someKeyProbablyADate",
"value_field": "amount",
"value_field": "simpleSchedule"
}
}
Workarounds are also appreciated. Perhaps some way to return a new dynamic object with both fields?
Sounds like you want to pre-process your data into a single field before you index it, then facet on that.
Something along the lines of a single string containing key#amount#schedule.
Then when you get the faceting results back you can split it up again and run whatever logic you want.
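For illustration, a sketch of such a pre-processed document (the combined field name and index/type names are made up):
POST /my-index/my-type
{
  "someKeyProbablyADate": "2013-01-31",
  "amount": 100,
  "simpleSchedule": "monthly",
  "combined": "2013-01-31#100#monthly"
}
A terms facet on combined then returns keys you can split on # client-side.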
Try combining different fields with a script element. For example:
"facets": {
  "facet-name": {
    "terms": {
      "field": "some-field",
      "script": "_source['another-field'] + '/' + term"
    }
  }
}

How do I make a field sortable in ElasticSearch?

I have tried, for example:
{
  "sort": [
    {
      "retail_price": {
        "reverse": true
      }
    }
  ]
}
... to no avail. Do I need to map the field in a special way in order to enable sorting on it?
The field should satisfy two conditions: 1) it has to be indexed and 2) it shouldn't have more than one value per document or more than one token per field. If retail_price is indexed in your case and it still doesn't work for you, could you post a script that demonstrates the problem?
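As a sketch of a sort-friendly setup (the products index name is made up, and this uses the current order syntax rather than the older reverse flag), assuming retail_price is numeric:
PUT /products
{
  "mappings": {
    "properties": {
      "retail_price": { "type": "double" }
    }
  }
}
GET /products/_search
{
  "sort": [
    { "retail_price": { "order": "desc" } }
  ]
}
For string fields, sort on a non-analyzed variant (keyword, or not_analyzed in older versions), since analyzed text yields multiple tokens per field and can't be sorted on directly.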
