Kilograms to grams conversion processor in Elasticsearch

There is a standard processor that converts human-readable byte values (e.g. gigabytes) into bytes: https://www.elastic.co/guide/en/elasticsearch/reference/current/bytes-processor.html
We have a field weight whose values can be 3kg, 1200g, ...
Is there a similar processor for grams? Or which processor can be used to achieve the same functionality?
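As far as I know there is no dedicated weight/mass processor, but a script processor with a small Painless snippet can normalize such values. A minimal sketch, assuming the incoming field is called weight, writing to a made-up target field weight_in_grams, and that only kg and g occur as units (Kibana Dev Tools syntax):

PUT _ingest/pipeline/weight-to-grams
{
  "processors": [
    {
      "script": {
        "description": "Parse strings like '3kg' or '1200g' into grams",
        "lang": "painless",
        "source": """
          def w = ctx.weight;
          if (w != null) {
            w = w.trim().toLowerCase();
            if (w.endsWith("kg")) {
              ctx.weight_in_grams = Double.parseDouble(w.substring(0, w.length() - 2)) * 1000;
            } else if (w.endsWith("g")) {
              ctx.weight_in_grams = Double.parseDouble(w.substring(0, w.length() - 1));
            }
          }
        """
      }
    }
  ]
}

Note that the kg check has to come before the g check, because "kg" also ends with "g". The pipeline name, target field, and unit handling are only illustrative; real data may need more robust parsing.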

Related

How to calculate the size of an Elasticsearch node?

How can I calculate the required size of the Elasticsearch node for my Shopware 6 instance when I know some KPIs?
For example:
KPI            Value
Customer       5.000
Products       10.000
SalesChannel   2
Languages      1
Categories     20
Is there a (rough) formula to calculate the number of documents or the required size of a node?
https://www.elastic.co/blog/benchmarking-and-sizing-your-elasticsearch-cluster-for-logs-and-metrics is relevant here.
However, what you have provided is only a logical sizing of the data; you will need to figure out what it actually amounts to once you start putting documents into Elasticsearch.
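In practice, the quickest way to get real numbers is to index a representative sample of your catalogue and read the resulting sizes back, for example with the cat indices API, then extrapolate from there:

GET _cat/indices?v&h=index,docs.count,pri.store.size

This is only a rough approach, but measuring a sample tends to be more reliable than any formula, since document size depends heavily on your mappings and analyzers.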

java.lang.OutOfMemoryError: Direct buffer memory during Druid ingestion task

I'm ingesting data into Druid using a Kafka ingestion task.
The test data is 1 message/second. Each message has 100 numeric columns and 100 string columns. Numeric values are random. String values are taken from a pool of 10k random 20-character strings. I have sum, min, and max aggregations for each numeric column.
Config is the following:
Segment granularity: 15 mins.
Intermediate persist period: 2 mins.
druid.processing.buffer.sizeBytes=26214400
druid.processing.numMergeBuffers=2
druid.processing.numThreads=1
The Druid docs say that a sane max direct memory size is
(druid.processing.numThreads + druid.processing.numMergeBuffers + 1) * druid.processing.buffer.sizeBytes
where "the + 1 factor is a fuzzy estimate meant to account for the segment decompression buffers and dictionary merging buffers."
According to the formula I need (1 + 2 + 1) * 26214400 bytes = 104,857,600 bytes, i.e. about 100 MB of direct memory, but I get java.lang.OutOfMemoryError: Direct buffer memory even when I set the max direct memory to 250 MB. The error is not consistent: sometimes it happens, sometimes it doesn't.
My goal is to calculate the max direct memory before I start the task so that the error does not occur during task execution. My guess is that I need to estimate this "+ 1 factor" precisely. How can I do that?
In my experience, that formula has been pretty good, with the caveat that a MB here means 1024 KB, not 1000 KB. But I am quite surprised it still gave you the error with 250 MB. How are you setting the direct memory size? And are you using a MiddleManager with peons? The peons do the actual work, so you have to set the max direct memory on the peons, not on the MiddleManager. You do that with the following parameter in the MiddleManager's runtime.properties. This is what I have on mine:
druid.indexer.runner.javaOptsArray=["-server", "-Xms200m", "-Xmx200m", "-XX:MaxDirectMemorySize=220m", "-Duser.timezone=UTC", "-Dfile.encoding=UTF-8", "-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager", "-XX:+ExitOnOutOfMemoryError", "-XX:+HeapDumpOnOutOfMemoryError", "-XX:HeapDumpPath=/var/log/druid/task/"]
You also have to set the other processing properties this way:
druid.indexer.fork.property.druid.processing.buffer.sizeBytes
druid.indexer.fork.property.druid.processing.numMergeBuffers
druid.indexer.fork.property.druid.processing.numThreads
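Putting it together, a minimal sketch of the MiddleManager runtime.properties with the values from the question wired in (the numbers here are only illustrative, not a recommendation):

druid.indexer.runner.javaOptsArray=["-server", "-Xms200m", "-Xmx200m", "-XX:MaxDirectMemorySize=250m", "-Duser.timezone=UTC", "-Dfile.encoding=UTF-8"]
druid.indexer.fork.property.druid.processing.buffer.sizeBytes=26214400
druid.indexer.fork.property.druid.processing.numMergeBuffers=2
druid.indexer.fork.property.druid.processing.numThreads=1

With those values, the formula above gives (1 + 2 + 1) * 26214400 = 104,857,600 bytes, so MaxDirectMemorySize needs to sit comfortably above 100 MiB.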

Does field length affect elasticsearch performance?

Will the size of the Elasticsearch index decrease (and performance increase due to a reduced memory footprint) if I were to shorten field names?
Field names are stored in the field info file, which has the suffix .fnm.
The field name is just a UTF-8 string there.
You could estimate its current size and work out how much space shorter names would save, but I'm fairly sure the difference would be negligible, so there is very little sense in optimizing field names.
For example, my tiny 500 MB playground index with around 100-150 fields has a total field info file size of 188 kB, which is about 0.04% of the total size.
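If you want to check this on your own index, newer Elasticsearch versions (7.15+) have an analyze index disk usage API that reports how much storage each field takes; the index name below is just a placeholder:

POST /my-index/_disk_usage?run_expensive_tasks=true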

Improve mapping performance on Elasticsearch

My Elasticsearch cluster contains indices with giant mappings, because some of my indices contain up to 60k different fields.
To elaborate a bit on my setup: each index contains information from a single source. Each source has several types of data (what I'll call layers) which are indexed as different types in the index corresponding to the source. Each layer has different attributes (20 on average). To avoid field name collisions, they are indexed as "LayerId_FieldId".
I'm trying to find a way to reduce the size of my mapping (as, to my understanding, it might cause performance issues). One option is having one index per layer (and perhaps spreading large layers over several indices, each responsible for a different time segment). I have around 4000 different layers indexed right now, so let's say that with this method I would have 5000 different indices. Is Elasticsearch fine with that? What should I be worried about (if at all) with such a large number of indices, some of them very small (some of the layers have as few as 100 items)?
A second possible solution is the following. Instead of saving a layer's data in the way it is sent to me, for example:
"LayerX_name" : "John Doe",
"LayerX_age" : 34,
"LayerX_isAdult" : true,
it would be saved as:
"value1_string" : "John Doe",
"value2_number" : 34,
"value3_boolean" : true,
In the latter option, I would have to keep a metadata index which links the generic names to the real field names. In the above example, I need to know that for layer X the field "value1_string" corresponds to "name". Thus, whenever I receive a new document to index, I have to query the metadata in order to know how to map the given fields onto my generic names. This allows me to have a constant-size mapping (say, 50 fields for each value type, so several hundred fields overall). However, it introduces some overhead, and most importantly I feel that it basically reduces my database to a relational one, and I lose the ability to handle documents of arbitrary structure.
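For illustration, such a metadata entry could look something like the document below (the layout and field names are made up, just to make the idea concrete):

{
  "layer": "LayerX",
  "field_map": {
    "value1_string": "name",
    "value2_number": "age",
    "value3_boolean": "isAdult"
  }
}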
Some technical details about my cluster:
Elasticsearch version 2.3.5
22 nodes, 3 of them masters; each node has 16 GB of RAM and 2 TB of disk storage. In total I currently have 6 TB of data spread over 1.2 billion docs, 55 indices, and 1500 shards.
I'd appreciate your input on the two solutions I suggested, or any other alternatives you have in mind!

Mongodb collection _id

By default, the _id field is generated as new ObjectId(), which is 96 bits (12 bytes).
Does the _id size affect collection performance? What if I use 128-bit (160-bit or 256-bit) strings instead of the native ObjectId?
For query performance, it is unlikely to matter. The index on _id is a sorted index implemented as a B-tree, so the actual length of the values doesn't matter much.
Using a longer string as _id will of course make your documents larger. Larger documents mean that fewer documents fit in the RAM cache, which results in worse performance for larger databases. But when that string is part of the document anyway, using it as _id saves space because you no longer need an additional generated _id.
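For example (the sku value below is just a made-up unique string): with the default, the document carries both the generated ObjectId and the application key, whereas reusing the key as _id drops the extra field.

With the default _id:       { "_id": ObjectId("507f1f77bcf86cd799439011"), "sku": "ABC-123" }
Reusing the key as _id:     { "_id": "ABC-123" }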
By default the _id field is indexed (it is the primary key), and if you use a custom value for it (say, a String), it will simply consume more space. It will not have a significant impact on your query performance; index size hardly contributes to query performance. You can verify this with sample code.
