How much memory is available for database use in memsql - memory-management

I have created a MemSQL cluster on 7 machines. The cluster status page for one of the machines shows that only 2.83 GB out of 62.86 GB is used, so I am assuming that around 60 GB of memory is available to store data.
But the top command tells another story: it reports about 21.84 GB of memory in use and 41 GB free.
So:
1> Exactly how much memory is available for the database? Is it 60 GB as per the cluster URL, or 41 GB as per the top command?
Note that:
1> memsql-ops is consuming around 13.5 GB of virtual memory.
2> As per top, if we subtract the total of buffered and cached memory from used memory, the result is 2.83 GB, which matches the used memory shown by the cluster URL.

To answer your question, you currently have about 60GB of memory free to be used by any process on your machine including the MemSQL database. Note that MemSQL has some overhead and by default reserves a small percentage of the total memory for overhead. If you visit the status page in the MemSQL Ops UI and view the "Leaf Table Memory" card, you will discover the amount of memory that can be used for data storage within the leaf nodes of your MemSQL cluster.
MemSQL Ops is written in Python which is then embedded into a "single binary" via a packaging tool. Because of this it exhibits a couple of oddities including high VM use. Note that this should not affect the amount of data you can store, as Ops is only consuming 308MB of resident memory on your machine. It should stay relatively constant based on the size of your cluster.
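
The arithmetic behind the two readings can be sketched in Python. This is a minimal illustration, assuming the older top/free convention in which "used" includes buffers and page cache that the kernel can reclaim. The function name is hypothetical, and the 19.01 GB buffers/cache figure is inferred from the question (21.84 - 2.83), not measured.

```python
def memory_for_applications(total_gb, used_gb, buffers_cached_gb):
    """Return (really_used, really_available) in GB.

    Assumes `used_gb` includes buffers and page cache, which the kernel
    can hand back when applications need the memory.
    """
    really_used = used_gb - buffers_cached_gb
    really_available = total_gb - really_used
    return round(really_used, 2), round(really_available, 2)


# Figures from the question; 19.01 GB of buffers/cache is inferred, not measured.
used, available = memory_for_applications(62.86, 21.84, 19.01)
print(f"used by processes: {used} GB, available: {available} GB")
```

Under that assumption the result agrees with the cluster status page (about 2.83 GB used, about 60 GB available), which is why the two tools only appear to disagree.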

Related

Cassandra java process using more memory than its allocated max heap size (Xmx)

We run an Apache Cassandra 3.11.4 cluster on a set of 18 Unix hosts. Each host has 96G of RAM, and we have configured the heap size as -Xms64G -Xmx64G, but the top command (top -M) on the hosts shows actual memory utilization of ~85G on average, i.e. much higher than the allocated heap (64G).
The trend in memory usage is as follows: during startup of the Cassandra daemon, top -M shows the process already occupying ~75G, which is 11G more than the allocated heap size (75G - 64G). This utilization increases over time, reaching a maximum of 85G in just 3-4 hours and staying there, while heap utilization (~40-50%) is normal and GC activity is as usual, with minor GCs kicking in as expected.
We have confirmed that the total off-heap memory used by all the keyspaces is below 2G on each host.
We are unable to trace what else is consuming the RAM in addition to the allocated heap.
Besides heap memory, Cassandra also uses off-heap memory, for example for compression metadata, bloom filters, and some other things. From the documentation (1, 2):
Compression metadata is stored off-heap and scales with data on disk. This often requires 1-3GB of off-heap RAM per terabyte of data on disk, though the exact usage varies with chunk_length_in_kb and compression ratios.
Bloom filters are stored in RAM, but are stored offheap, so operators should not consider bloom filters when selecting the maximum heap size.
You can monitor heap and off-heap memory usage via JMX, for example. (I've seen setups where the bloom filters alone occupied ~40 GB of RAM, but that was heavily dependent on the number of unique partition keys.)
Very large heaps are usually not recommended because they can cause long GC pauses, etc. It of course depends on the workload, but you can try 31 GB or lower (or just use the default settings). Plus, you need to leave memory for the Linux file buffers so that the OS can cache frequently used files. That is why, by default, Cassandra allocates only 1/4 of system memory to the heap.
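
To make the budgeting argument concrete, here is a rough Python sketch. It uses only the 1-3 GB per terabyte figure quoted above for compression metadata; the function names and the 2 TB data size are hypothetical, and bloom filters and other off-heap structures are deliberately left in the "everything else" remainder.

```python
def offheap_compression_estimate_gb(data_on_disk_tb, low_per_tb=1.0, high_per_tb=3.0):
    """Rough (low, high) range of off-heap RAM for compression metadata."""
    return data_on_disk_tb * low_per_tb, data_on_disk_tb * high_per_tb


def memory_budget_gb(total_ram_gb, heap_gb, data_on_disk_tb):
    """Split host RAM into heap, estimated compression metadata, and the rest."""
    low, high = offheap_compression_estimate_gb(data_on_disk_tb)
    return {
        "heap": heap_gb,
        "compression_metadata_low": low,
        "compression_metadata_high": high,
        # Remainder must cover bloom filters, other off-heap use, and page cache.
        "left_for_everything_else_worst_case": total_ram_gb - heap_gb - high,
    }


# 96 GB host and 64 GB heap are from the question; 2 TB on disk is a made-up value.
budget = memory_budget_gb(total_ram_gb=96, heap_gb=64, data_on_disk_tb=2)
smaller_heap = memory_budget_gb(total_ram_gb=96, heap_gb=31, data_on_disk_tb=2)
print(budget["left_for_everything_else_worst_case"])        # 26.0
print(smaller_heap["left_for_everything_else_worst_case"])  # 59.0
```

With these assumed inputs, dropping the heap from 64 GB to 31 GB more than doubles what is left for off-heap structures and the file cache.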

setting up heap memory in jmeter for more than one concurrent script execution

Below is my scenario.
I have 2 test scripts: one might use 5GB to 15GB of heap memory and the other might use 5GB to 12GB.
Suppose I have a machine with 32GB of memory.
While executing the first script, can I assign Xms 1GB and Xmx 22GB (though my script needs 15GB), and for the second script Xms 1GB and Xmx 12GB?
The sum of the maximums goes beyond 32GB (total memory).
In the second case I assign it like this:
for script 1: Xms 22GB, Xmx 22GB
for script 2: Xms 12GB, Xmx 12GB
Sum of max: 34GB.
Does it work like this:
If 12GB is assigned to the first script, is this memory blocked for that process/script, so that I cannot use the unused memory for other processes?
or
If 12GB is assigned to the first script, does it use only as much as it requires, leaving the rest of the memory for other processes? If it works this way, I don't have to assign heap to the two scripts separately.
If you set the minimum heap size via the Xms parameter, the JVM will reserve this memory and it will not be available to other processes.
If you are able to allocate more JVM heap than you have physical RAM, it means your OS will start swapping: dumping memory pages to the hard drive, which extends your computer's memory at the cost of speed, because memory operations are fast and disk operations are very slow.
For example, take my laptop, which has 8 GB of physical RAM, of which 1.2 GB is free. That means I can safely allocate 1 GB of RAM to Java.
However, when I give 8 GB to Java as:
java -Xms8G
it still works.
With 15 GB it still works,
and when I try to allocate 20 GB it fails, because that doesn't fit into physical + virtual memory.
You must avoid swapping, because it means that JMeter will not be able to send requests fast enough even if the system under test could handle them. So measure how much physical RAM you have available and make sure your test does not exceed it. If you cannot achieve that on one machine, you will have to go for distributed testing.
Also, running 2 different scripts "concurrently" is not something you should be doing, because it's not only about memory: a single CPU core can execute only one command at a time, while other commands wait in the queue and are served by context switching, which is a fairly expensive and slow operation.
And last but not least, setting the heap to the maximum is not the best idea, because garbage collections will be less frequent but will last much longer, resulting in throughput drops. Keep heap usage between 30% and 80%, as described in the Optimal Heap Size article.
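
The two checks described in this answer can be sketched in Python. This is only an illustration: the function names and the 2 GB reserved for the operating system are assumptions, not JMeter or JVM settings.

```python
def check_heap_plan(physical_ram_gb, max_heaps_gb, os_reserve_gb=2.0):
    """Check whether the sum of all -Xmx values fits into physical RAM.

    `os_reserve_gb` is an assumed allowance for the OS and other processes.
    """
    total = sum(max_heaps_gb)
    budget = physical_ram_gb - os_reserve_gb
    return {"total_xmx_gb": total, "budget_gb": budget, "fits": total <= budget}


def heap_utilisation_ok(peak_usage_gb, xmx_gb, low=0.30, high=0.80):
    """True when peak heap usage stays within the 30%-80% band of -Xmx."""
    return low <= peak_usage_gb / xmx_gb <= high


# Scenario from the question: 32 GB machine, heaps of 22 GB and 12 GB.
plan = check_heap_plan(32, [22, 12])
print(plan)                         # total of 34 GB does not fit
print(heap_utilisation_ok(15, 22))  # 15 GB peak in a 22 GB heap: inside the band
print(heap_utilisation_ok(12, 12))  # 12 GB peak in a 12 GB heap: heap is too small
```

For the numbers in the question, the plan fails the first check (34 GB of heap on a 32 GB machine), which is the situation where swapping would start.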

What does VIRTUAL_MEMORY_BYTES task counter mean in Hadoop?

The following excerpt from The Definitive Guide provides high-level details, but
what exactly does virtual memory refer to in this task counter?
How should it be interpreted? How is it related to PHYSICAL_MEMORY_BYTES?
Following is an example extract from one of the jobs: physical is approx. 214 GB and virtual is approx. 611 GB.
1. What exactly does virtual memory refer to in this task counter?
Virtual memory here is used to prevent out-of-memory errors in a task if the data does not fit in RAM (physical memory).
The portion that does not fit in RAM is held in virtual memory.
So, while setting up a Hadoop cluster, one is advised to set vm.swappiness = 1 to achieve better performance. On Linux systems, vm.swappiness is set to 60 by default.
The higher the value, the more aggressive the swapping of memory pages.
https://community.hortonworks.com/articles/33522/swappiness-setting-recommendation.html
2. How to interpret it? How is it related to PHYSICAL_MEMORY_BYTES?
Memory pages are swapped from physical memory to virtual memory on disk when there is not enough physical memory.
This is the relation between PHYSICAL_MEMORY_BYTES and VIRTUAL_MEMORY_BYTES.

statement_mem seems to limit the node memory instead of the segment memory

According to the Greenplum documentation, GUCs such as statement_mem and gp_vmem_protect_limit should work at the segment level. The same should apply to a resource queue's memory allowance.
On our system we have 8 primary segments per node. So if I set the statement_mem of a query to 2GB, I would expect the query to consume (if needed) up to 2GB x 8 = 16GB of RAM. But it seems that it uses only 2GB in total per node before starting to write to disk (that is, 2GB/8 per segment). I tried different statement_mem values and saw the same thing.
The max_statement_mem and gp_vmem_protect_limit limits are never reached. RAM usage on the nodes has been monitored using various tools (from GP Command Center to top and free, all the way to the Pivotal-suggested session_level_memory_consumption view).
EDITED FROM HERE
ADDED two documentation sources where statement_mem is defined per segment and not per host. (@Jon Roberts)
The GP best practices guide, at the beginning of page 32, clearly says that if statement_mem is 125MB and we have 8 segments on the server, each query will get 1GB allocated per server.
http://gpdb.docs.pivotal.io/4300/pdf/GPDB43BestPractices.pdf
The article at https://support.pivotal.io/hc/en-us/articles/201947018-Pivotal-Greenplum-GPDB-Memory-Configuration seems to treat statement_mem as segment memory and not host memory. It keeps relating statement_mem to the memory limit of the resource queues as well as to gp_vmem_protect_limit (both parameters defined on a per-segment basis).
This is why I'm getting confused about how to properly manage the memory resources.
Thanks
I incorrectly stated that statement_mem is per host; that is not the case. This link describes the memory at the segment level:
http://gpdb.docs.pivotal.io/4370/guc_config-statement_mem.html#statement_mem
With the default of "eager_free" gp_resqueue_memory_policy, memory gets re-used so the aggregate amount of memory used may look low for a particular query execution. If you change it to "auto" where the memory isn't re-used, the memory usage is more noticeable.
Run an "explain analyze" of your query and see the slices that are used. With eager_free, the memory gets re-used so you may only have a single slice wanting more memory than available such as this one:
(slice18) * Executor memory: 10399K bytes avg x 2 workers, 10399K bytes max (seg0). Work_mem: 8192K bytes max, 13088K bytes wanted.
And for your question on how to manage the resources, most people don't change the default values. A query that spills to disk is usually an indication that the query needs to be revised or the data model needs some work.
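
As a rough aid for reading "explain analyze" output, the Python sketch below flags slices whose Work_mem "wanted" figure exceeds the "max" figure, as in the slice18 line quoted above. It assumes every slice line follows the format of that single sample; real output may differ, and the function name is hypothetical.

```python
import re

# Pattern modelled only on the one sample line quoted in the answer.
SLICE_RE = re.compile(
    r"\((?P<slice>slice\d+)\).*?"
    r"Work_mem:\s*(?P<max_k>\d+)K bytes max,\s*(?P<wanted_k>\d+)K bytes wanted"
)


def spilling_slices(explain_output):
    """Return (slice, max_kb, wanted_kb) for slices that wanted more Work_mem."""
    results = []
    for line in explain_output.splitlines():
        match = SLICE_RE.search(line)
        if match and int(match["wanted_k"]) > int(match["max_k"]):
            results.append(
                (match["slice"], int(match["max_k"]), int(match["wanted_k"]))
            )
    return results


sample = (
    "(slice18) * Executor memory: 10399K bytes avg x 2 workers, "
    "10399K bytes max (seg0). Work_mem: 8192K bytes max, 13088K bytes wanted."
)
print(spilling_slices(sample))  # [('slice18', 8192, 13088)]
```

A slice that appears in this list is a candidate for the query or data-model review suggested above, rather than for raising statement_mem.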

MongoDB process size in task manager

I have been working with MongoDB and inserted up to 1 GB of data into a database collection. I noticed that the process size of MongoDB shown in Task Manager is 25 MB, but the overall memory in the Performance tab of Task Manager keeps growing as I insert data. Why is that 1 GB not part of the process size shown by Task Manager? I know that MongoDB stores it in files, but it still caches part of that data in memory.
MongoDB (<= 2.6) uses memory-mapped files. This means that the database asks the operating system to map the data files to a portion of virtual memory. The operating system then handles moving things in and out of physical memory according to what the database accesses. Your 1GB of data is mapped into virtual memory, but is likely not resident in physical memory since you have not accessed it recently. To see more detailed statistics about MongoDB's memory usage, run db.serverStatus() in the shell and look at the mem section. You can read a bit more about the memory-mapped storage engine in the storage FAQ.
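
A small Python sketch of how the mem section might be read. It assumes the memory-mapped storage engine, where the section reports resident, virtual and mapped sizes in megabytes; the function name and the sample values are made up for illustration.

```python
def summarize_mem(mem):
    """Summarize the `mem` sub-document of db.serverStatus() (values in MB)."""
    resident = mem["resident"]
    mapped = mem.get("mapped", 0)  # only present with memory-mapped storage
    return {
        "resident_mb": resident,
        "virtual_mb": mem["virtual"],
        "mapped_mb": mapped,
        # Lower bound on mapped data that is not currently in physical memory.
        "mapped_not_resident_at_least_mb": max(mapped - resident, 0),
    }


# Hypothetical values, loosely shaped like the situation in the question.
sample_mem = {"resident": 25, "virtual": 2100, "mapped": 1024}
summary = summarize_mem(sample_mem)
print(summary)
```

With a driver such as PyMongo, the input could come from `client.admin.command("serverStatus")["mem"]`; that call is not exercised here because it needs a running server.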
