How to get Elasticsearch actual result size - elasticsearch

I am not asking about the count in the search response. What I am asking is: what is the size of the result (_source) on Elasticsearch's disk? Is it possible to find that? The reason I ask is that I need to find which type of source takes up the most space in an index. Thanks in advance.

You can enable the _size field in your mapping, so that this data is created at index time.
{
  "tweet" : {
    "_size" : { "enabled" : true, "store" : true }
  }
}
Check out the size field documentation.
Then you can return this field by adding it to the fields list in the query.
See the Fields documentation for how to do that.
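For example, assuming a tweets index with the mapping above (the index name is hypothetical, and newer versions use "stored_fields" instead of "fields"), you can sort on _size to find the largest documents:

```json
GET tweets/_search
{
  "fields": [ "_size" ],
  "sort": [
    { "_size": { "order": "desc" } }
  ]
}
```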

Related

Kibana display number after comma on metric

I'm actually trying to display all digits after the decimal point in my Kibana data table, but even with a JSON input format it doesn't display as expected...
Do you have an idea how to do this?
Here, for example, I have 2.521, but it can be 0.632, or 0.194...
I only see 0 in the Min, Max, and Avg columns.
In my C# code it is a double, and it is indexed as a number in the Kibana index.
How can I do this, please?
Thanks a lot and best regards
This usually means that your field has been mapped as integer or long. If that's the case, 0.632 is stored as 0 and 2.521 as 2.
You need to make sure that those fields are mapped as float or double in your mapping.
PS: you cannot change the mapping type once the index has been created; you need to create a new index and reindex your data.
You need to pre-create your index with the right mapping types before sending the first document:
PUT webapi-myworkspace-test
{
  "mappings": {
    "properties": {
      "GraphApiResponseTime" : {
        "type" : "double"
      }
    }
  }
}
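If the index already exists with the wrong type, a minimal sketch of the migration (the -v2 index name is hypothetical):

```json
PUT webapi-myworkspace-test-v2
{
  "mappings": {
    "properties": {
      "GraphApiResponseTime" : { "type" : "double" }
    }
  }
}

POST _reindex
{
  "source": { "index": "webapi-myworkspace-test" },
  "dest":   { "index": "webapi-myworkspace-test-v2" }
}
```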

comparing data between different mappings

I am relatively new to Elasticsearch, so I apologise if the terms are not accurate. I have a few indexes, plus a few almost identical indexes with fewer fields in the mapping.
(The original indexes have data; the new ones with fewer fields are empty.)
How can I compare the data and insert the relevant documents into the new indexes with fewer fields?
For example, a document from the original index:
{
  "first_name" : "Dana",
  "last_name" : "Leon",
  "birth_date" : "1990-01-09",
  "social_media" : {
    "facebook_id" : "K8426dN",
    "google_id" : "8764873",
    "linkedin_id" : "Gdna"
  }
}
And a matching document for the new mapping with fewer fields:
{
  "first_name" : "Dana",
  "last_name" : "Leon",
  "social_media" : {
    "facebook_id" : "K8426dN",
    "google_id" : "8764873",
    "linkedin_id" : "Gdna"
  }
}
Thanks
You can use reindex with a script:
https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-reindex.html#docs-reindex-change-name
In the "script" you'll need to specify the fields that you want to remove, like:
ctx._source.remove("birth_date")
The second option is to use an ingest pipeline with the "remove" processor:
https://www.elastic.co/guide/en/elasticsearch/reference/current/remove-processor.html, and to run the reindex with a default pipeline defined in the destination index's settings, but this will be harder to implement.
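A minimal sketch of the first option, assuming hypothetical index names original-index and new-index:

```json
POST _reindex
{
  "source": { "index": "original-index" },
  "dest":   { "index": "new-index" },
  "script": {
    "lang": "painless",
    "source": "ctx._source.remove('birth_date')"
  }
}
```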

Kibana scripted field which loops through an array

I am trying to use the Metricbeat http module to monitor F5 pools.
I make a request to the F5 API and bring back JSON, which is indexed into Elasticsearch and viewed in Kibana. But the JSON contains an array of pool members, and I want to count the number which are up.
The advice seems to be that this can be done with a scripted field. However, I can't get the script to retrieve the array, e.g.
doc['http.f5pools.items.monitor'].value.length()
returns the following in the preview results (with the same 'Additional Field' added for comparison):
[
  {
    "_id": "rT7wdGsBXQSGm_pQoH6Y",
    "http": {
      "f5pools": {
        "items": [
          {
            "monitor": "default"
          },
          {
            "monitor": "default"
          }
        ]
      }
    },
    "pool.MemberCount": [
      7
    ]
  },
If I try
doc['http.f5pools.items']
Or similar, I just get an error:
"reason": "No field found for [http.f5pools.items] in mapping with types []"
Googling suggests that the doc construct does not contain arrays?
Is it possible to make a scripted field which can access the set of values? I.e., is my code or the way I'm indexing the data wrong?
If not, is there an alternative approach within Metricbeat? I don't want to have to make a whole new API to do the calculation and add a separate field.
-- update
Weirdly, it seems that the number values in the array do return the expected results, i.e.
doc['http.f5pools.items.ratio']
returns
{
  "_id": "BT6WdWsBXQSGm_pQBbCa",
  "pool.MemberCount": [
    1,
    1
  ]
},
-- update 2
OK, so if the strings in the field have different values, then you get all of them; if they are the same, you just get one. Why?
I'm adding another answer instead of deleting my previous one, which does not address the actual question but may still be helpful for someone else in the future.
I found a hint in the same documentation:
Doc values are a columnar field value store
Upon googling this further I found this Doc Values intro, which says that doc values are essentially an "uninverted index", useful for operations like sorting; my hypothesis is that while sorting you essentially don't want the same values repeated, and hence the data structure they use removes those duplicates. That still did not answer why it works differently for strings than for numbers: numbers are preserved, but strings are filtered down to unique values.
This “uninverted” structure is often called a “column-store” in other
systems. Essentially, it stores all the values for a single field
together in a single column of data, which makes it very efficient for
operations like sorting.
In Elasticsearch, this column-store is known as doc values, and is
enabled by default. Doc values are created at index-time: when a field
is indexed, Elasticsearch adds the tokens to the inverted index for
search. But it also extracts the terms and adds them to the columnar
doc values.
Some more deep-diving into doc values revealed that it is a compression technique which de-duplicates values for efficient and memory-friendly operations.
Here's a NOTE given on the link above which answers the question:
You may be thinking "Well that’s great for numbers, but what about
strings?" Strings are encoded similarly, with the help of an ordinal
table. The strings are de-duplicated and sorted into a table, assigned
an ID, and then those ID’s are used as numeric doc values. Which means
strings enjoy many of the same compression benefits that numerics do.
The ordinal table itself has some compression tricks, such as using
fixed, variable or prefix-encoded strings.
Also, if you don't want this behavior, you can disable doc values.
OK, solved it.
https://discuss.elastic.co/t/problem-looping-through-array-in-each-doc-with-painless/90648
So, as I discovered, arrays are pre-filtered to only return distinct values (except in the case of ints, apparently).
The solution is to use params._source instead of doc[].
The answer for why doc doesn't work:
Quoting below:
Doc values are a columnar field value store, enabled by default on all
fields except for analyzed text fields.
Doc-values can only return "simple" field values like numbers, dates,
geo- points, terms, etc, or arrays of these values if the field is
multi-valued. It cannot return JSON objects
Also, it is important to add a null check, as mentioned below:
Missing fields
The doc['field'] will throw an error if field is
missing from the mappings. In Painless, a check can first be done with
doc.containsKey('field') to guard accessing the doc map.
Unfortunately, there is no way to check for the existence of the field
in mappings in an expression script.
Also, here is why _source works
Quoting below:
The document _source, which is really just a special stored field, can
be accessed using the _source.field_name syntax. The _source is loaded
as a map-of-maps, so properties within object fields can be accessed
as, for example, _source.name.first.
Responding to your comment with an example:
The keyword here is: it cannot return JSON objects. The field doc['http.f5pools.items'] is a JSON object.
Try running below and see the mapping it creates:
PUT t5/doc/2
{
  "items": [
    {
      "monitor": "default"
    },
    {
      "monitor": "default"
    }
  ]
}
GET t5/_mapping
{
  "t5" : {
    "mappings" : {
      "doc" : {
        "properties" : {
          "items" : {
            "properties" : {
              "monitor" : {   <-- monitor is a property of the "items" property (an object)
                "type" : "text",
                "fields" : {
                  "keyword" : {
                    "type" : "keyword",
                    "ignore_above" : 256
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}
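For completeness, a sketch of a scripted field that counts pool members via params._source. Field names are taken from the question; the "up" monitor value is an assumption, since the sample data only shows "default":

```painless
// Kibana scripted field (Painless). params._source loads the full JSON
// document, including the object array that doc[] cannot return.
int count = 0;
if (params._source.http != null && params._source.http.f5pools != null) {
  for (def item : params._source.http.f5pools.items) {
    if (item.monitor == "up") {   // assumed value for a healthy member
      count++;
    }
  }
}
return count;
```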

How to update a document using index alias

I have created an index "index-000001" with primary shards = 5 and replica = 1. And I have created two aliases
alias-read -> index-000001
alias-write -> index-000001
for indexing and searching purposes. When I do a rollover on alias-write when it reaches its maximum capacity, it creates a new "index-000002" and updates aliases as
alias-read -> index-000001 and index-000002
alias-write -> index-000002
How do I update or delete a document existing in index-000001 (and what if all I know is the document id, but not which index the document resides in)?
Thanks
Updating via an index alias is not directly possible. The best solution is to run a search query, using the document id or a term on a unique field, to find out which index the document lives in; with that index you can then update the document directly.
GET alias-read/{type}/{doc_id} will return the required document if doc_id is known.
If doc_id is not known, find it using a unique field reference:
GET alias-read/_search
{
  "query": {
    "term" : { "field" : "value" }
  }
}
In both cases, you will get a single document as a response.
Once the document is obtained, you can use the "_index" field to get the required index.
POST {index_name}/{type}/{id}/_update
{
  "doc" : {
    "required_field" : "new_value"
  }
}
to update the document.
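Putting both steps together, a sketch in which the field name, values, and resolved index are all hypothetical (the _doc type is used here; adjust to your mapping type on older versions):

```json
GET alias-read/_search
{
  "query": {
    "term" : { "user_id" : "u42" }
  }
}

// Suppose the hit shows "_index" : "index-000001" and "_id" : "1";
// then update that document directly:
POST index-000001/_doc/1/_update
{
  "doc" : { "status" : "inactive" }
}
```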

Elasticsearch transform on search?

I have an issue with data in my Elastic index where certain string fields contain differing values that should be the same, for example X-Box, X Box, and XBox.
I realise that I could add some transforms to my mappings, but that's not really appropriate in this case, as we have data coming in from many sources and the values are unknown until we have received them.
Is it possible to define something like a transform, but at search time? For example, a user searches for 'XBox', but because we have defined it (after having discovered the variances), Elastic knows to also return documents for 'X-Box' and 'X Box'?
Hope that makes sense. Thanks in advance.
A synonym filter is what you are looking for; it can map variants to a common form.
You can refer to this blog for creating the analyzer.
Just use the format shown below:
{
  "filter" : {
    "synonym" : {
      "type" : "synonym",
      "synonyms" : [
        "X-box, x box => xbox",
        "universe, cosmos"
      ]
    }
  }
}
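A fuller sketch of how such a filter plugs into an index; the index, analyzer, and field names here are hypothetical. Applying the analyzer at both index and search time ensures that 'X-Box' (tokenized by the standard tokenizer as x, box) is also rewritten to xbox:

```json
PUT products
{
  "settings": {
    "analysis": {
      "filter": {
        "xbox_synonyms": {
          "type": "synonym",
          "synonyms": [ "x-box, x box => xbox" ]
        }
      },
      "analyzer": {
        "synonym_analyzer": {
          "tokenizer": "standard",
          "filter": [ "lowercase", "xbox_synonyms" ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "analyzer": "synonym_analyzer"
      }
    }
  }
}
```

Note that the lowercase filter runs before the synonym filter, so the rules only need to list lowercase variants.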
