How to determine the correct heap size for Elasticsearch?

How can I determine the heap size required for 1 GB of logs with a 1-day retention period?
If I take a machine with a 32 GB heap (64 GB RAM), how many GB of logs can I keep on it for 1 day?

It depends on various factors: the number of indexing and search requests, their size, cache utilization, the number of shards and segments, and so on. Heap usage should follow a sawtooth pattern (growing until a garbage collection brings it back down), so instead of guessing the size you should start measuring it.
The good news is that you can start off right by assigning 50% of the machine's RAM as the ES heap size, without crossing 32 GB.
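A rough worked example of that rule of thumb, assuming a 31 GB cap to stay under the compressed-pointer threshold (the helper and the cap value are illustrative, not an official formula):
def recommended_heap_gb(total_ram_gb, cap_gb=31):
    # half of physical RAM, but never above the (assumed) 31 GB cap
    return min(total_ram_gb / 2, cap_gb)
for ram_gb in (16, 64, 256):
    # e.g. 64 GB RAM -> about 31 GB heap, i.e. -Xms31g -Xmx31g
    print(ram_gb, "GB RAM ->", recommended_heap_gb(ram_gb), "GB heap to start with")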

Related

Spark UI: How to understand the min/med/max in DAG

I would like to fully understand the meaning of the min/med/max information.
For example:
scan time total(min, med, max)
34m(3.1s, 10.8s, 15.1s)
means that, across all cores, the min scan time is 3.1s, the max is 15.1s, and the total accumulated time is 34 minutes, right?
then for
data size total (min, med, max)
8.2GB(41.5MB, 42.2MB, 43.6MB)
means that, across all the cores, the max usage is 43.6 MB and the min usage is 41.5 MB, right?
So by the same logic, for the Sort step on the left, 80 MB of RAM has been used by each core.
Now, the executor has 4 cores and 6 GB of RAM. According to the metrics, I think a lot of RAM has been set aside, since each core could use up to around 1 GB of RAM. So I would like to try reducing the number of partitions to force each executor to process more data and reduce the shuffle size; do you think this is theoretically possible?
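For what it's worth, here is a minimal PySpark sketch of the change being considered; the paths, the column name "key", and the partition count of 64 are hypothetical placeholders, while the 4-core / 6 GB executor settings come from the question:
from pyspark.sql import SparkSession
spark = (SparkSession.builder
    .appName("partition-tuning-sketch")
    .config("spark.executor.cores", "4")            # 4 cores per executor, as in the question
    .config("spark.executor.memory", "6g")          # 6 GB per executor, as in the question
    .config("spark.sql.shuffle.partitions", "64")   # fewer, larger shuffle partitions
    .getOrCreate())
df = spark.read.parquet("/data/input")              # hypothetical input path
# The sort forces a shuffle; with fewer partitions, each task handles more data.
df.sort("key").write.mode("overwrite").parquet("/data/output")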

Virtual Memory - Calculating the Virtual Address Space

I am studying for an exam tomorrow and I came across this question:
You are given a memory system with 2MB of virtual memory, 8KB page size,
512 MB of physical memory, TLB contains 16 entries, 2-way set associative.
How many bits are needed to represent the virtual address space?
I was thinking it would be 20 bits, since 2^10 is 1024, so I simply multiply 2^10*2^10 and get 2^20. However, the answer ends up being 21 and I have no idea why.
The virtual address space required is 2 MB.
As you have calculated, 20 bits can accommodate 1 MB (2^20 bytes) of VM space; 2 MB is 2^21 bytes, so you need 21 bits.
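A quick check of the arithmetic with the numbers from the question:
import math
virtual_memory_bytes = 2 * 1024 * 1024               # 2 MB of virtual memory
print(math.ceil(math.log2(virtual_memory_bytes)))    # 21 bits for a virtual address
page_offset_bits = int(math.log2(8 * 1024))          # 8 KB pages -> 13 offset bits
print(21 - page_offset_bits)                         # leaving 8 bits of virtual page number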

ElasticSearch 30.5GB Heap Size Restriction

We use ES to store around 2.5TB of data. We have 12 primary shards and 2 replica shards.
We are currently load testing ES and I read the following article
https://www.elastic.co/guide/en/elasticsearch/guide/current/heap-sizing.html
This article states two important things: first, allocate 50% of memory to Lucene, and second, don't cross the 30.5 GB limit for heap space.
I don't clearly understand the 30.5 GB limit. I understand that if I set 40 GB instead of 30.5 GB I will lose more than I gain (because of compressed pointers), but say I have hardware with around 250 GB of RAM: what are the reasons I should only allocate 30.5 GB and not 120 GB for the heap? Won't I start seeing gains somewhere around a 70-80 GB heap compared to a 30.5 GB heap? Can somebody list all the reasons?
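For reference, the 30.5-32 GB figure comes from compressed object pointers: the JVM stores 32-bit references and scales them by the default 8-byte object alignment, so the largest directly addressable heap works out roughly as follows:
pointer_bits = 32                  # compressed oops are stored as 32-bit offsets
object_alignment_bytes = 8         # scaled by the default 8-byte object alignment
addressable_heap = (2 ** pointer_bits) * object_alignment_bytes
print(addressable_heap / 2 ** 30)  # 32.0 GB
# Above that, the JVM falls back to uncompressed 64-bit pointers, so a heap a
# bit over the threshold actually holds fewer objects than one just under it.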

Can't use more than about 1.5 GB of my 4 GB RAM for a simple sort

I'm using a circa summer 2007 MacBook Pro (x86-64) with a 32KB L1 (I think), 4MB L2, and 4GB RAM; I'm running OS X 10.6.8.
I'm writing a standard radix sort in C++: it copies from one array to another and back again as it sorts (so the memory used is twice the size of the array). I watch it by printing a '.' every million entries moved.
If the array is at most 750 MB then these dots usually move quite fast; however if the array is larger then the whole process crawls to a halt. If I radix sort 512 MB in blocks and then attempt to merge sort the blocks, the first block goes fast and then again the process crawls to a halt. That is, my process only seems to be able to use 1.5 GB of RAM for the sort. What is odd is that I have 4 GB of physical RAM.
I tried just allocating an 8 GB array and walking through it writing each byte and printing a '.' every million bytes; it seems that everything starts to slow down around 1.5 GB and stays at that rate even past 4 GB when I know it must be going to disk; so the OS starts writing pages to disk around 1.5 GB.
I want to use my machine to sort large arrays. How do I tell my OS to give my process at least, say, 3.5 GB of RAM? I tried using mlock(), but that just seems to slow things down even more. Ideas?
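For illustration only, here is a minimal sketch of the page-touching probe described above, written in Python with an anonymous mmap so pages are only committed as they are written; the 8 GB size comes from the question, everything else is an assumption:
import mmap
SIZE = 8 * 1024 ** 3                     # 8 GB anonymous mapping, as in the question
PAGE = mmap.PAGESIZE                     # typically 4 KB
DOT_EVERY = 1024 ** 2                    # print a dot roughly every 1 MB touched
buf = mmap.mmap(-1, SIZE)                # demand-paged, not backed by a file
for offset in range(0, SIZE, PAGE):
    buf[offset:offset + 1] = b"\x01"     # writing the byte commits the page
    if offset % DOT_EVERY == 0:
        print(".", end="", flush=True)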

managed heap fragmentation

I am trying to understand how heap fragmentation works. What does the following output tell me?
Is this heap overly fragmented?
I have 243010 "free objects" with a total of 53304764 bytes. Are those "free objects" spaces in the heap that once contained objects but have since been garbage collected?
How can I force a fragmented heap to clean up?
!dumpheap -type Free -stat
total 243233 objects
Statistics:
MT Count TotalSize Class Name
0017d8b0 243010 53304764 Free
It depends on how your heap is organized. You should have a look at how much memory is allocated in Gen 0, 1, and 2, and how much free memory you have there compared to the total used memory.
If you have 500 MB of managed heap used and 50 MB of it is free, you are doing pretty well. If you do memory-intensive operations like creating many WPF controls and releasing them, you need a lot more memory for a short time, but .NET does not give the memory back to the OS once it has been allocated. The GC tries to recognize allocation patterns and tends to keep your memory footprint high, even though your current heap size is way too big, until your machine runs low on physical memory.
I found it much easier to use psscor2 for .NET 3.5, which has some cool commands like ListNearObj where you can find out which objects are around your memory holes (pinned objects?). With the commands from psscor2 you have much better chances of finding out what is really going on in your heaps. Most commands are also available in SOS.dll in .NET 4 as well.
To answer your original question: yes, free objects are gaps on the managed heap, which can simply be the free memory block after your last allocated object on a GC segment. Or, if you do !DumpHeap with the start address of a GC segment, you see the objects allocated in that managed heap segment along with your free objects, which are GC-collected objects.
These memory holes normally happen in Gen 2. The object addresses before and after a free object tell you which potentially pinned objects sit around your hole. From this you should be able to determine your allocation history and optimize it if you need to.
You can find the addresses of the GC Heaps with
0:021> !EEHeap -gc
Number of GC Heaps: 1
generation 0 starts at 0x101da9cc
generation 1 starts at 0x10061000
generation 2 starts at 0x02aa1000
ephemeral segment allocation context: none
segment begin allocated size
02aa0000 02aa1000 03836a30 0xd95a30(14244400)
10060000 10061000 103b8ff4 0x357ff4(3506164)
Large object heap starts at 0x03aa1000
segment begin allocated size
03aa0000 03aa1000 03b096f8 0x686f8(427768)
Total Size: Size: 0x115611c (18178332) bytes.
------------------------------
GC Heap Size: Size: 0x115611c (18178332) bytes.
There you see that you have heaps at 02aa1000 and 10061000.
With !DumpHeap 02aa1000 03836a30 you can dump the GC Heap segment.
!DumpHeap 02aa1000 03836a30
Address MT Size
...
037b7b88 5b408350 56
037b7bc0 60876d60 32
037b7be0 5b40838c 20
037b7bf4 5b408350 56
037b7c2c 5b408728 20
037b7c40 5fe4506c 16
037b7c50 60876d60 32
037b7c70 5b408728 20
037b7c84 5fe4506c 16
037b7c94 00135de8 519112 Free
0383685c 5b408728 20
03836870 5fe4506c 16
03836880 608c55b4 96
....
There you find your free memory blocks, each of which was an object that has already been GCed. You can dump the surrounding objects (the output is sorted by address) to find out whether they are pinned or have other unusual properties.
You have 50 MB of RAM as free space. This is not good.
Given that .NET allocates blocks of 16 MB from the process, we do indeed have a fragmentation issue.
There are plenty of reasons for fragmentation to occur in .NET.
Have a look here and here.
In your case it is possibly pinning: 53304764 / 243010 comes to about 219.35 bytes per free object, which is much smaller than LOH objects.
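The arithmetic behind that last figure, using the numbers from the !dumpheap output above:
free_bytes = 53304764
free_blocks = 243010
print(round(free_bytes / free_blocks, 2))   # ~219.35 bytes per free block
print(round(free_bytes / 1024 / 1024, 1))   # ~50.8 MB of free space in total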
