I have a Cassandra cluster to which I recently added two new nodes. Looking at the stats, I observed that disk I/O on the newly added machines is much higher than on the existing machines.
On checking, I found that the read_ahead_kb OS setting on these machines is 4096, whereas on the other machines it is 4.
I changed the value, but the disk I/O is still the same. Do we need to restart the machines for this OS configuration change to take effect?
Also, is there any other setting I need to look at?
It depends on how you set the readahead value. blockdev --setra counts 512-byte sectors, so the following command sets the readahead for /dev/sda to 4 KB (8 sectors) and takes effect immediately (no reboot necessary):
sudo blockdev --setra 8 /dev/sda
I recommend configuring a udev rule (as described here), as otherwise the change will be lost after a reboot.
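When comparing the setting across nodes, keep the units in mind: blockdev --setra/--getra count 512-byte sectors, while /sys/block/<dev>/queue/read_ahead_kb is expressed in kilobytes. A minimal Python sketch of the conversion and of reading the current value from sysfs follows; the helper names and the default device name sda are illustrative assumptions, not part of any tool.

```python
from pathlib import Path

SECTOR_BYTES = 512  # unit used by `blockdev --setra` / `--getra`


def sectors_to_kb(sectors: int) -> float:
    """Convert a blockdev readahead value (sectors) to kilobytes."""
    return sectors * SECTOR_BYTES / 1024


def kb_to_sectors(kb: int) -> int:
    """Convert a read_ahead_kb value (kilobytes) to blockdev sectors."""
    return kb * 1024 // SECTOR_BYTES


def read_ahead_kb(device: str = "sda") -> int:
    """Read the current readahead in KB from sysfs (Linux only).

    The device name is an assumption; substitute the disk that holds
    the Cassandra data directory.
    """
    path = Path(f"/sys/block/{device}/queue/read_ahead_kb")
    return int(path.read_text().strip())
```

Calling read_ahead_kb() on an old node and on a new node is a quick way to confirm that a change really took effect on the right device.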
Read ahead is one of the most important performance tweaks regarding disk I/O & throughput. Some other things that are important for read performance:
ensure you have plenty of free RAM for the OS page cache
disable swap
use SSDs over spinning disks, especially if you have a read-heavy workload
This guide is a few years old, but many of the OS tuning & hardware recommendations still apply to Cassandra 3.x:
https://tobert.github.io/pages/als-cassandra-21-tuning-guide.html
I came across a weird problem for which I cannot find a solution elsewhere. Maybe you can help me.
I have a system running Ubuntu 20 LTS that hosts six guests (four Ubuntu 20 LTS and two Windows Server 2019). They run quite fast until I take live snapshots. I'm running the guests on QEMU/KVM with QCOW2 files and I use virsh to manage them.
I take the live snapshots (without the RAM state) of the guests with the following command:
virsh snapshot-create-as $VM --no-metadata $timestamp --disk-only --atomic
This almost immediately snapshots all the virtual disks of a particular guest and creates new delta files to which the differences are written. For every guest and every disk I then have the following structure:
base <- snapshot <- live_delta_file
After copying away the snapshots, I commit them to their base files with the following command:
virsh blockcommit $currentVM $disk --base $path_to_base --top $path_to_snapshot --verbose --wait
After that, I delete the snapshots and all of this works without producing any errors.
However, after the snapshots have been taken, and while all the guests keep running without errors, every VM is horribly slow to respond to any shell command. Furthermore, top on the host shows that the RAM usage of each guest has dropped dramatically (e.g. from 25 GB to 2.5 GB for the Windows Server 2019 guest with GUI).
It seems that all the cached data was removed from RAM, which of course strongly reduces performance. However, taking the snapshots (without the --quiesce parameter) should not lead to this behavior, should it? After a reboot of all the guests, everything is fast again (while nothing has changed with respect to the snapshot structure).
Do you have an idea which configuration or situation can lead to such a behavior?
Thank you in advance!
----- EDIT -----
It seems that the actual problem is copying the files away via scp/rsync after the snapshots have been taken: one of these programs (rsync?) eats up all the memory on the host, which causes parts of the guests' RAM to be swapped to disk.
Even after the copy process has finished, the copied data seems to remain in the host cache, and the guests keep using parts of the host's swap space.
This of course explains the bad performance of the guests. It can be fixed by clearing the page cache and the swap space by using the following commands:
sync; echo 1 > /proc/sys/vm/drop_caches
swapoff -a; swapon -a
But be careful: clearing the swap space can take several hours, during which the guests are effectively paused. Either do it at night when they are not in use, or solve the problem at its root, i.e. at the rsync/scp step.
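One possible way to attack the copy step itself is to tell the kernel to drop the pages of the copied data as the copy proceeds, so the host page cache does not fill up. The sketch below is not what rsync or scp do; it is a swapped-in technique using posix_fadvise with POSIX_FADV_DONTNEED, and it assumes the destination is a locally visible path (for example a mounted backup share). The function name and chunk size are illustrative.

```python
import os

CHUNK = 8 * 1024 * 1024  # 8 MiB per read; an arbitrary illustrative size


def copy_without_caching(src, dst, chunk=CHUNK):
    """Copy src to dst, asking the kernel to evict each chunk afterwards.

    Returns the number of bytes copied. On platforms without
    posix_fadvise this degrades to a plain chunked copy.
    """
    can_advise = hasattr(os, "posix_fadvise")
    offset = 0
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            buf = fin.read(chunk)
            if not buf:
                break
            fout.write(buf)
            # Dirty pages cannot be dropped, so push them to disk first.
            fout.flush()
            os.fsync(fout.fileno())
            if can_advise:
                for fd in (fin.fileno(), fout.fileno()):
                    os.posix_fadvise(fd, offset, len(buf),
                                     os.POSIX_FADV_DONTNEED)
            offset += len(buf)
    return offset
```

The fsync per chunk trades some throughput for a bounded cache footprint; the advice is only a hint, so the kernel may still keep some pages.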
I recognize what you describe.
I solved it by making caching and swapping less aggressive, as shown below.
Maybe it can help you too.
(from /etc/sysctl.conf)
# Make the kernel less swappy
vm.swappiness = 5
# Make the kernel free cached dentries and inodes sooner
vm.vfs_cache_pressure = 200
Suppose we reserve 80% of memory for YARN and then, for some reason (a memory leak, for example), the OS and local programs consume 50% of memory.
Will YARN be aware that only 50% is left for it? What is the impact on newly submitted applications?
YARN doesn't monitor the OS for available memory. It's run as a normal process like everything else. So the OS will do what it does whenever more memory is asked for than is available.
RE: MapReduce, most MR jobs typically use far less memory than they request, so in most cases a local process over-consuming memory will not cause any problems. YARN developers have noticed this underutilization pattern and have added a feature, Opportunistic Containers, to maximize node efficiency.
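To make the arithmetic in the question concrete: since YARN keeps handing out containers up to its configured allotment regardless of what else runs on the node, the two shares simply add up. A small Python sketch, assuming a hypothetical 64 GB node (the node size and the function name are illustrative, not from the question):

```python
def overcommit_gb(total_gb, yarn_pct, other_pct):
    """Memory promised beyond physical RAM if YARN fills its allotment.

    yarn_pct:  share of RAM configured for YARN containers (percent)
    other_pct: share actually consumed by the OS and local programs (percent)
    """
    promised_gb = total_gb * (yarn_pct + other_pct) / 100
    return max(0.0, promised_gb - total_gb)


# Hypothetical 64 GB node with the 80% / 50% split from the question.
shortfall = overcommit_gb(64, 80, 50)
```

With these numbers the node is short by roughly 19 GB if every container actually uses what it was granted; that shortfall is what the OS then has to handle through swapping or the OOM killer.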
I am running Cloudera Hadoop on my laptop in an Oracle VirtualBox VM.
I have given the VM 5.6 GB of my 8 GB of RAM and six of my eight cores.
And still I am not able to keep it up and running.
Even without load the services do not stay up, and when I try a query, at least Hive goes down within 20 minutes. Sometimes they go down like dominoes, one after another.
More memory seemed to help somewhat: with 3 GB and all services running, Hue was full of red alerts whenever Hue itself managed to come up. After a reboot it takes 30-60 minutes before the system is up enough to even try running anything on it.
I have found two meaningful messages:
- A warning about swapping.
- A crash message saying that the system had used 26 GB of virtual memory, which was not enough.
My dataset is less than one megabyte, so it is hard to understand why the system would need dozens of gigabytes. Whatever the reason was, it has passed: the system now runs more steadily at around the 5.6 GB I have given it, after I shut down a few services (see my answer to myself).
Still, it is only more stable. Right after I got a swapping warning, Hive went down again. What could cause more or less all Hadoop services to go down when the VM starts to swap?
I don't have enough reputation to post the picture here, but when Hive went down again the VM was swapping 13 pages/second and using 5.9 GB / 5.6 GB. So basically my system starts crashing more or less right after it starts to swap. "428 pages were swapped to disk in the previous 15 minute(s)"
I used the default installation options as far as the hard drive is concerned.
The only addition is a shared folder between Windows and the VM. It behaves somewhat strangely, locking files all the time, so I use it like FTP, only for passing files from one system to the other. I can go days without using it and the system still crashes, so that is not the cause either.
Now that the system is mostly up, services still crash about twice a day: Service Monitor and Hive crash about equally often. After those come Activity Monitor and Event Server, which always seem to crash together. I believe YARN crashes as well, but it comes back up on its own. Last time Hive crashed first, followed by Service Monitor, Hive (a second time), Activity Monitor and Event Server.
Since swap lives on disk, perhaps the problem is with the disk:
# cat /etc/fstab
# swapoff -a
# badblocks -v /dev/VolGroup/lv_swap
Checking blocks 0 to 8388607
Checking for bad blocks (read-only test): done
Pass completed, 0 bad blocks found.
# badblocks -vw /dev/VolGroup/lv_swap
Checking for bad blocks in read-write mode
From block 0 to 8388607
Testing with pattern 0xaa: done
Reading and comparing: done
Testing with pattern 0x55: done
Reading and comparing: done
Testing with pattern 0xff: done
Reading and comparing: done
Testing with pattern 0x00: done
Reading and comparing: done
Pass completed, 0 bad blocks found.
So there is nothing wrong with the swap disk, and I have not noticed any disk errors anywhere else either.
Note that you could also check the file system from the Windows side. But I expect that letting Windows fix a Linux file system has a good chance of destroying it, so I did my checks conservatively; AFAIK these commands are safe to execute on a swap volume that has been switched off (the write-mode test overwrites the volume, so the swap area has to be recreated with mkswap before it is enabled again).
About half of the services kept going down, so giving more specifics would be a long story.
I managed to make the system more stable by shutting down flume, hbase, impala, ks_indexer, oozie, spark and sqoop, and by giving more memory to some of the remaining services that complained they had not been given enough.
I also fixed a couple of things on the Windows side; I am not sure which of these helped:
- MsMpEng.exe kept my hard drive busy. I didn't have permission to kill it, but I lowered its priority to the lowest possible.
- CcmExec.exe got stuck in a loop on my DVD and kept reading it forever. I solved this by taking the DVD out of the drive. Later on I killed the process tree to keep it from bothering me for a while.
I found these using Windows resource manager.
The VM requires 4GB: http://www.cloudera.com/content/cloudera-content/cloudera-docs/DemoVMs/Cloudera-QuickStart-VM/cloudera_quickstart_vm.html You should use that.
I am not clear whether you are using the QuickStart VM though. It's set up to run just the essential services and tuned to conserve memory rather than exploit lots of memory.
It sounds like you are running your own installation, on one virtual machine, on your Windows machine. You may be running an entire cluster's worth of services on one desktop machine. Each of these services has master, worker processes, monitoring processes, etc. You don't need most of them.
You also probably have left memory settings at default suitable for a server-class machine of 16+ GB RAM. Remember these services usually run across many machines, not all on one.
Finally, you're clearly swapping, and that makes things incredibly slow. Remember this is all through a VM too!
Bottom line, use the QuickStart VM if you really want a 1-machine cluster tuned correctly. If you want a real cluster or more services, you need more hardware.
Also consider: cloudera.com/live contains a full CDH 5.1 cluster + sample data, running on demand on AWS. Of course, the advantage of the VM is that you can BYOD, but if you're simply looking for a hands-on Hadoop experience, Live is a great option.
I am not an OS expert, and I am having trouble understanding my server's memory usage. I need your advice to understand the following:
My server has 8 GB RAM and operates as a web server. PHP, MySQL and Apache processes consume the majority of the memory. When I issue the command "free" after the system is rebooted, I normally see something along these lines:
             total       used       free     shared    buffers     cached
Mem:       8059080    2277924    5781156          0        948     310852
-/+ buffers/cache:    1966124    6092956
Swap:      4194296          0    4092668
Obviously, sooner or later the free memory would drop and the cached memory would increase and I assume there is nothing wrong with that since the OS decides to cache it.
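For reference, the "-/+ buffers/cache" row in this older free output is derived from the Mem row: buffers and cache are subtracted from "used" and added to "free", since the kernel can reclaim them on demand. A short Python sketch of that arithmetic using the figures above (the function name is illustrative):

```python
def buffers_cache_row(used, free, buffers, cached):
    """Recompute the '-/+ buffers/cache' row of `free` (values in KB).

    Returns (used_by_applications, available_to_applications).
    """
    return used - buffers - cached, free + buffers + cached


# Figures taken from the Mem row shown above.
app_used, app_free = buffers_cache_row(
    used=2277924, free=5781156, buffers=948, cached=310852
)
```

The second value is the one to watch: a shrinking "free" column on the Mem row is harmless as long as the recomputed free figure stays large.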
What I don't understand is that about 1-2 days after the machine is rebooted, I see a slight increase in used swap. Doesn't this mean that the server has run out of free memory and is using disk I/O instead? How can I find out which processes cause this?
I am asking this question to stackoverflow users because if I ask it to my hosting provider, I am sure they would ask more money to increase RAM.
Thanks.
This is perfectly normal. When the machine starts up, a large number of services also start up. As they run their startup code, read their configuration, and so on, they dirty some pages of memory. Many of these services will never run again. By writing this data to swap, the operating system accomplishes two things:
First, if it ever does encounter memory pressure, it can discard the pages without having to write them first, since it has already written them. Second, it can discard the pages to make more free memory to enlarge the cache.
The alternative is to keep information that hasn't been touched in days in physical memory. And that just doesn't make sense.
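As for finding out which processes have pages in swap: on Linux, each /proc/<pid>/status file reports a VmSwap field (on reasonably recent kernels). A minimal Python sketch that lists processes by swapped-out memory, assuming a mounted /proc and enough privileges to read other users' processes; the function names are illustrative:

```python
import os
import re

_NAME = re.compile(r"^Name:\s+(.+)$", re.MULTILINE)
_VMSWAP = re.compile(r"^VmSwap:\s+(\d+)\s+kB", re.MULTILINE)


def parse_status(text):
    """Return (name, swap_kb) from the text of /proc/<pid>/status.

    swap_kb is 0 when the field is absent (e.g. kernel threads).
    """
    name = _NAME.search(text)
    swap = _VMSWAP.search(text)
    return (name.group(1) if name else "?",
            int(swap.group(1)) if swap else 0)


def swap_by_process(proc="/proc"):
    """Return [(swap_kb, pid, name), ...] sorted by swap usage, largest first."""
    rows = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc, entry, "status")) as fh:
                name, swap_kb = parse_status(fh.read())
        except OSError:
            continue  # process exited meanwhile, or access denied
        if swap_kb:
            rows.append((swap_kb, int(entry), name))
    return sorted(rows, reverse=True)
```

Printing the first few entries of swap_by_process() typically shows long-idle daemons at the top, which matches the explanation above: rarely touched pages are the ones that get written out.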
Hypervisors and Memory Management
I have been using virtual machines for years and never really had any issues. I have primarily used VMware's free single ESXi host and had nothing but success. Because I have never had any issues, I have never delved in much deeper. I have, however, always been very wary of loading the system up and have kept a lot of spare resources handy.
I have recently purchased a new server and we have decided to give Hyper-V a try and see how that goes. We have a fairly small team but utilise lots of servers for testing etc.
My question relates to memory and how much I need to leave free or available for the host machine to run appropriately.
Setup: Dell server, 24 cores, 48 GB RAM
When I run taskmgr in the Windows host instance I see the following:
Physical Memory: 49139
Cached: 14933
Available: 17743
Free: 2982
What exactly do these figures mean? What is the difference between free and available?
My server uses hardly any CPU resources ever and has 10 Production servers running on it without a single user complaint ever about speed of the services.
Am I able to spin up another server with 2 GB RAM, effectively leaving 982 MB free? Or am I starting to push my limits a little?
Thanks for the help.
You shouldn’t use the host partition for anything other than Hyper-V (although you can run security and infrastructure software such as management agents, backup agents and firewalls). Therefore, that 2GB recommendation assumes you aren’t going to run any extra applications or server roles in the parent partition.
Hyper-V doesn’t let you allocate memory directly to the host partition. It essentially uses whatever memory is left over. Therefore, you have to remember to leave 2GB of your host server’s memory not allocated so it’s available for the parent partition.
Source