java.lang.OutOfMemoryError: Direct buffer memory during Druid ingestion task

I'm ingesting data into Druid using a Kafka ingestion task.
The test data is 1 message/second. Each message has 100 numeric columns and 100 string columns. Numeric values are random. String values are taken from a pool of 10k random 20-character strings. I have sum, min, and max aggregations for each numeric column.
Config is the following:
Segment granularity: 15 mins.
Intermediate persist period: 2 mins.
druid.processing.buffer.sizeBytes=26214400
druid.processing.numMergeBuffers=2
druid.processing.numThreads=1
The Druid docs say that sane max direct memory size is
(druid.processing.numThreads + druid.processing.numMergeBuffers + 1) *
druid.processing.buffer.sizeBytes
where "The + 1 factor is a fuzzy estimate meant to account for the segment decompression buffers and dictionary merging buffers."
According to the formula I need (1 + 2 + 1) * 26214400 bytes = 104857600 bytes ≈ 100 MB of direct memory, but I get java.lang.OutOfMemoryError: Direct buffer memory even when I set max direct memory to 250 MB. The error is not consistent: sometimes I get it, sometimes I don't.
My goal is to calculate the max direct memory before I start the task so that the error never occurs during task execution. My guess is that I need to estimate this "+1 factor" more precisely. How can I do this?

In my experience that formula has been pretty good, with the caveat that a MB is 1024 KB, not 1000. But I am quite surprised it still gave you the error with 250 MB. How are you setting the direct memory size? And are you using a MiddleManager with Peons? The Peons do the actual work, so you have to set the max direct memory on the Peons, not on the MiddleManager. You do it with the following parameter in the MiddleManager's runtime.properties. This is what I have on mine:
druid.indexer.runner.javaOptsArray=["-server", "-Xms200m", "-Xmx200m", "-XX:MaxDirectMemorySize=220m", "-Duser.timezone=UTC", "-Dfile.encoding=UTF-8", "-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager", "-XX:+ExitOnOutOfMemoryError", "-XX:+HeapDumpOnOutOfMemoryError", "-XX:HeapDumpPath=/var/log/druid/task/"]
You also have to pass the processing properties down to the Peons the same way:
druid.indexer.fork.property.druid.processing.buffer.sizeBytes
druid.indexer.fork.property.druid.processing.numMergeBuffers
druid.indexer.fork.property.druid.processing.numThreads
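For completeness, a sketch of those fork properties with the values from the question filled in, so the Peons inherit the same processing settings the formula was computed from:
druid.indexer.fork.property.druid.processing.buffer.sizeBytes=26214400
druid.indexer.fork.property.druid.processing.numMergeBuffers=2
druid.indexer.fork.property.druid.processing.numThreads=1
Whatever -XX:MaxDirectMemorySize you put in druid.indexer.runner.javaOptsArray should then be at least (1 + 2 + 1) * 26214400 bytes, plus some headroom.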

Related

How to determine what causes ES's query API instability

Normally, my ES query API takes less than 1 s, but sometimes these queries get slow.
The cluster consists of three 32 GB machines (16 GB allocated to ES). The index has 20 primary shards and 1 replica, about 303,000,000 docs, 500 GB of primary storage and 1 TB of total storage.
Here's Kibana's monitoring data (screenshot not reproduced).
Personally, I think it's the result of GC. I want to add machines, but I need to find a reason to convince my manager.
Yes it could be a GC problem. But can you be more specific? What do you mean by slow?
Anyway, it seems the allocated heap is way too large for your needs. You get a collection when the heap reaches 12 GB (75% of 16 GB) and it drops back to 5 GB every time. That generates huge garbage collections.
You should try lowering the heap to around 10 GB and check the impact on performance, GC count, and GC duration.
I recommend you also read this article, https://www.elastic.co/blog/a-heap-of-trouble, especially the "Together We Can Prevent Forest Fires" part.
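For reference, a minimal sketch of the smaller-heap experiment, assuming Elasticsearch 5.x+ where the heap is set in config/jvm.options (older releases use the ES_HEAP_SIZE environment variable instead):
-Xms10g
-Xmx10g
Then compare GC count and GC duration in the same Kibana monitoring charts before and after the change.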

How much load can Cassandra handle on an m1.xlarge instance?

I set up a 3-node Cassandra (1.2.10) cluster on 3 EC2 m1.xlarge instances.
It's based on the default configuration with several guidelines applied, like:
datastax_clustering_ami_2.4
not using EBS; RAID 0 xfs on ephemeral disks instead,
commit logs on separate disk,
RF=3,
6GB heap, 200MB new size (also tested with greater new size/heap values),
enhanced limits.conf.
With 500 writes per second, the cluster works for only a couple of hours. After that it seems unable to respond because of CPU overload (mainly GC + compactions).
Nodes remain Up, but their load is huge and the logs are full of GC info and messages like:
ERROR [Native-Transport-Requests:186] 2013-12-10 18:38:12,412 ErrorMessage.java (line 210) Unexpected exception during request java.io.IOException: Broken pipe
nodetool shows many dropped mutations on each node:
Message type Dropped
RANGE_SLICE 0
READ_REPAIR 7
BINARY 0
READ 2
MUTATION 4072827
_TRACE 0
REQUEST_RESPONSE 1769
Is 500 wps too much for a 3-node cluster of m1.xlarge instances, meaning I should add nodes? Or is it possible to tune GC further? What load are you able to serve with 3 nodes of m1.xlarge? What are your GC configs?
Cassandra is perfectly able to handle tens of thousands of small writes per second on a single node. I just checked on my laptop and got about 29000 writes/second from cassandra-stress on Cassandra 1.2. So 500 writes per second is not really an impressive number even for a single node.
However beware that there is also a limit on how fast data can be flushed to disk and you definitely don't want your incoming data rate to be close to the physical capabilities of your HDDs. Therefore 500 writes per second can be too much, if those writes are big enough.
So first - what is the average size of a write? What is your replication factor? Multiply the number of writes by the replication factor and by the average write size - then you'll know approximately what write throughput the cluster requires. But you should keep some safety margin for other I/O-related tasks like compaction. There are various benchmarks on the Internet suggesting a single m1.xlarge instance should be able to write anywhere between 20 MB/s and 100 MB/s...
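As a rough back-of-the-envelope example, assuming an average write of 10 KB (your real number may differ):
500 writes/s * 3 (RF) * 10 KB ≈ 15 MB/s of raw write throughput
and the disks also have to absorb compaction rewriting that data, so the sustained throughput you need is some multiple of that figure.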
If your cluster has sufficient I/O throughput (e.g. 3x more than needed), yet you observe OOM problems, you should try the following (a sample cassandra.yaml sketch follows the list):
reduce memtable_total_space_in_mb (this will cause C* to flush smaller memtables more often, freeing heap earlier)
lower write_request_timeout_in_ms to e.g. 2 seconds instead of 10 (if you have big writes, you don't want to keep too many of them in the incoming queues, which reside on the heap)
turn off the row cache (if you ever enabled it)
lower the size of the key cache
consider upgrading to Cassandra 2.0, which moved quite a lot of things off-heap (e.g. bloom filters and index-summaries); this is especially important if you just store lots of data per node
add more HDDs and set multiple data directories, to improve flush performance
set larger new generation size; I usually set it to about 800M for a 6 GB heap, to avoid pressure on the tenured gen.
if you're sure memtable flushing lags behind, make sure sstable compression is enabled - this will reduce amount of data physically saved to disk, at the cost of additional CPU cycles
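A minimal cassandra.yaml sketch touching the settings above; the values are illustrative starting points for a 6 GB heap on Cassandra 1.2, not recommendations, so measure before and after:
memtable_total_space_in_mb: 1024      # default is 1/3 of the heap; smaller means more frequent, smaller flushes
write_request_timeout_in_ms: 2000     # fail queued writes sooner instead of letting them pile up on the heap
row_cache_size_in_mb: 0               # row cache disabled
key_cache_size_in_mb: 50              # shrink if the key cache takes noticeable heap
data_file_directories:
    - /mnt/cassandra/data1            # one directory per physical disk spreads flush I/O
    - /mnt/cassandra/data2
The larger new generation from the last point is a JVM setting (e.g. HEAP_NEWSIZE=800M in cassandra-env.sh), not a cassandra.yaml option.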

Hbase response size

I have a bunch of rows in HBase which store varying sizes of data (0.5 MB to 120 MB). When the scanner cache is set to, say, 100, the response sometimes gets too large and the region server dies. I tried but couldn't find a solution. Can someone help me find out:
What is the maximum response size that HBase supports?
Is there a way to limit the response size at the server so that the result is capped at a particular value (the answer to the first question) and returned as soon as the limit is reached?
What happens if a single record exceeds this limit? There should be a way to increase it but I don't know how.
1. What is the maximum response size that HBase supports?
It is Long.MAX_VALUE and represented by the constant DEFAULT_HBASE_CLIENT_SCANNER_MAX_RESULT_SIZE
public static long DEFAULT_HBASE_CLIENT_SCANNER_MAX_RESULT_SIZE = Long.MAX_VALUE;
2. Is there a way to limit the response size at the server so that the result will be limited to a particular value (answer to the first question) so that the result will be returned as soon as the limit is reached?
You could make use of the property hbase.client.scanner.max.result.size to handle this. It lets you set a maximum size, rather than a row count, on what a scanner gets in one go: it is the maximum number of bytes returned by a single call to the scanner's next method.
3. What happens if a single record exceeds this limit? There should be a way to increase it but I don't know how.
The complete record (row) will still be returned even if it exceeds the limit.
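To illustrate point 2, a minimal client-side sketch using the newer Connection/Table API; the table name and the 64 MB cap are arbitrary example values:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;

public class ScanSizeExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // Default byte cap for all scanners created from this configuration.
        conf.setLong("hbase.client.scanner.max.result.size", 64L * 1024 * 1024); // 64 MB
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("my_table"))) {
            Scan scan = new Scan();
            scan.setCaching(10);                      // fewer rows per RPC when rows are large
            scan.setMaxResultSize(64L * 1024 * 1024); // per-scan byte cap, overrides the default
            try (ResultScanner scanner = table.getScanner(scan)) {
                for (Result result : scanner) {
                    // process each row here
                }
            }
        }
    }
}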

What is the ideal bulk size formula in ElasticSearch?

I believe there should be a formula to calculate bulk indexing size in ElasticSearch. Probably the following are the variables of such a formula.
Number of nodes
Number of shards/index
Document size
RAM
Disk write speed
LAN speed
I wonder if anyone knows or uses a mathematical formula. If not, how do people decide their bulk size? By trial and error?
Read ES bulk API doc carefully: https://www.elastic.co/guide/en/elasticsearch/guide/current/indexing-performance.html#_using_and_sizing_bulk_requests
Try with 1 KiB, try with 20 KiB, then with 10 KiB, ... dichotomy (binary search)
Use bulk size in KiB (or equivalent), not document count!
Send data in bulk (no streaming); put shared info such as the index name in the API URL if you can
Remove superfluous whitespace in your data if possible
Disable search index updates, activate it back later
Round-robin across all your data nodes
There is no golden rule for this. Extracted from the doc:
There is no “correct” number of actions to perform in a single bulk call. You should experiment with different settings to find the optimum size for your particular workload.
I derived this information from the Java API's BulkProcessor class. It defaults to 1000 actions or 5 MB; it also allows you to set a flush interval, but this is not set by default. I'm just using the default settings.
I'd suggest using BulkProcessor if you are using the Java API.
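For reference, a minimal sketch of wiring up a BulkProcessor with those limits made explicit, assuming the transport-client-era Java API this answer refers to (package locations differ slightly between ES versions):
import org.elasticsearch.action.bulk.BulkProcessor;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.client.Client;
import org.elasticsearch.common.unit.ByteSizeUnit;
import org.elasticsearch.common.unit.ByteSizeValue;
import org.elasticsearch.common.unit.TimeValue;

public class BulkProcessorExample {
    // Builds a BulkProcessor around an already-constructed Client (e.g. a TransportClient).
    public static BulkProcessor build(Client client) {
        return BulkProcessor.builder(client, new BulkProcessor.Listener() {
            @Override public void beforeBulk(long executionId, BulkRequest request) { }
            @Override public void afterBulk(long executionId, BulkRequest request, BulkResponse response) {
                // inspect response.hasFailures() here
            }
            @Override public void afterBulk(long executionId, BulkRequest request, Throwable failure) {
                // the whole bulk request failed
            }
        })
        .setBulkActions(1000)                               // flush after 1000 actions...
        .setBulkSize(new ByteSizeValue(5, ByteSizeUnit.MB)) // ...or 5 MB, whichever comes first
        .setFlushInterval(TimeValue.timeValueSeconds(5))    // also flush on a timer (off by default)
        .setConcurrentRequests(1)                           // one bulk in flight while the next accumulates
        .build();
    }
}
Documents are then handed to it with bulkProcessor.add(...), and it flushes whenever one of the thresholds is reached; remember to close (or awaitClose) it when indexing is done.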
I was searching about this and found your question :)
I found the following in the Elastic documentation, so I will investigate the size of my documents:
It is often useful to keep an eye on the physical size of your bulk requests. One thousand 1KB documents is very different from one thousand 1MB documents. A good bulk size to start playing with is around 5-15MB in size
In my case, I could not get more than 100,000 records to insert at a time. I started with 13 million, went down to 500,000, and after no success started from the other side: 1,000, then 10,000, then 100,000, which was my max.
I haven't found a better way than trial and error (i.e. the traditional engineering process), as there are many factors beyond hardware influencing indexing speed: the structure/complexity of your index (complex mappings, filters or analyzers), data types, whether your workload is I/O or CPU bound, and so on.
In any case, to demonstrate how variable it can be, I can share my experience, as it seems different from most posted here:
Elastic 5.6 with 10GB heap running on a single vServer with 16GB RAM, 4 vCPU and an SSD that averages 150 MB/s while searching.
I can successfully index documents of wildly varying sizes via the http bulk api (curl) using a batch size of 10k documents (20k lines, file sizes between 25MB and 79MB), each batch taking ~90 seconds. index.refresh_interval is set to -1 during indexing, but that's about the only "tuning" I did, all other configurations are the default. I guess this is mostly due to the fact that the index itself is not too complex.
The vServer is at about 50% CPU, SSD averaging at 40 MB/s and 4GB RAM free, so I could probably make it faster by sending two files in parallel (I've tried simply increasing the batch size by 50% but started getting errors), but after that point it probably makes more sense to consider a different API or simply spreading the load over a cluster.
Actually, there is no clear way of finding out the exact upper limit for a bulk update. An important factor to consider is the request data volume, not only the number of documents.
An excerpt from the link:
How Big Is Too Big?
      The entire bulk request needs to be loaded into memory by the node that receives our request, so the bigger the request, the less memory available for other requests. There is an optimal size of bulk request. Above that size, performance no longer improves and may even drop off. The optimal size, however, is not a fixed number. It depends entirely on your hardware, your document size and complexity, and your indexing and search load.
      Fortunately, it is easy to find this sweet spot: Try indexing typical documents in batches of increasing size. When performance starts to drop off, your batch size is too big. A good place to start is with batches of 1,000 to 5,000 documents or, if your documents are very large, with even smaller batches.
      It is often useful to keep an eye on the physical size of your bulk requests. One thousand 1KB documents is very different from one thousand 1MB documents. A good bulk size to start playing with is around 5-15MB in size.
Actually, I'm facing some problems related to the bulk API. There is one parameter that impacts the bulk API: the number of indices inside a bulk request.

Need suggestions to speed up a Map for a huge inverted index in MATLAB

I need to store a huge amount of data in a Map for an inverted index, and I see that as the Map gets bigger and bigger, it becomes slower and slower. We are talking about a Map container with a very sparse key space that covers 1 to billions.
In one iteration of my program, some numbers are calculated, producing many key-value pairs (possibly thousands) to be stored - this means the Map grows by thousands of entries in every iteration. I see that the first few iterations take 20 seconds or so, but by the 70th iteration or so each one takes 100 seconds or so. I have about 5000 sets of data - that is, I require 5000 iterations to process all of them. With the ever-increasing time per iteration, this will take days to compute, and that is unacceptable.
So is there anything I can do in this case?
You could try using the Java HashMap implementation instead. There is a small overhead every time MATLAB accesses Java routines, but the Java routines usually provide more flexibility. For example:
%Create
map = java.util.HashMap(5e6); %Initialize with room for 5 million entries
%Add data
map.put('key1','value1');
map.put(2,20);
%Get data
out = map.get('key1'); %Get a value
map.containsKey(2); %Check for existence of a key
This will work. But ... it's not clear if it will be faster or not. Only a test will tell.
Also, you will probably get an occasional error when you are developing this way.
Java exception occurred:
java.lang.OutOfMemoryError: Java heap space
at java.util.HashMap.<init>(Unknown Source)
at java.util.HashMap.<init>(Unknown Source)
When this happens, you can use clear java to purge any Java resident information, or allocate less space to the initial HashMap.
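If the heap-space error keeps recurring even with a smaller initial HashMap, the Java heap available to MATLAB can also be increased; depending on the release this is done under Preferences > General > Java Heap Memory, or via a java.opts file in the startup folder containing a JVM option such as (the value here is only an illustration):
-Xmx2048m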
