h2o is not using all processors

I have a server with 48 processors.
The server is not virtualized and H2O sees all 48 processors, but for some reason 16 of them are not being used.
Any advice?
(screenshot of the H2O cluster status output, showing "H2O cluster allowed cores: 32")

It looks like your H2O cluster was launched with 32 cores instead of the full 48; that is what "H2O cluster allowed cores: 32" indicates. To use all the cores, do the following:
Shut down your existing H2O cluster using h2o.shutdown()
Start a new H2O cluster from R using h2o.init(nthreads = -1), which means that it will use all available cores. If for some reason that does not work, try h2o.init(nthreads = 48).
You can also start the H2O cluster from the command line by running java -Xmx30g -jar h2o.jar -nthreads 48, and then connect from inside R with h2o.init().
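For reference, a minimal R sketch of that restart sequence (just a sketch; the max_mem_size value and the h2o.clusterInfo() check are illustrative additions, adjust them to your setup):

library(h2o)
h2o.shutdown(prompt = FALSE)                   # stop the 32-core cluster
h2o.init(nthreads = -1, max_mem_size = "30g")  # -1 means use every core H2O can see
h2o.clusterInfo()                              # "allowed cores" should now report 48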
Feel free to also upgrade to the latest stable version of H2O (3.8.0.2 is slightly outdated; at the time of writing we are at 3.8.1.1).

It looks like this was a limitation of the old version. I am using 3.10 now and testing 3.12, and the issue is fixed.

Related

Not able to create a new cluster in Azure Databricks

I have a free trial with some credits remaining. I want to create a new cluster inside Azure Databricks and write some code in Scala notebooks, but every time I try to create a new cluster it says "terminated". Can someone help with what needs to be done to create a new cluster?
With an Azure free trial subscription, Databricks cannot use a cluster that utilizes more than 4 cores. You are presumably using a Standard cluster, which consumes 8 cores (4 worker cores and 4 driver cores).
So, try creating a ‘Single Node’ cluster, which only consumes 4 cores (driver cores) and therefore does not exceed the limit. You can refer to the following document to understand more about single-node clusters.
https://learn.microsoft.com/en-us/azure/databricks/clusters/single-node
If you need to use a Standard cluster, upgrade your subscription to pay-as-you-go or use the 14-day free trial of Premium DBUs in Databricks. The following link describes a problem like the one you are facing.
https://learn.microsoft.com/en-us/answers/questions/35165/databricks-cluster-does-not-work-with-free-trial-s.html
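If you would rather create the cluster through the Clusters API than the UI, a rough sketch of a single-node cluster definition could look like the following (the workspace URL, token, runtime version, and node type are placeholders; it is the singleNode profile, the local[*] master, and the SingleNode resource-class tag that make it a single-node cluster):

curl -X POST https://<databricks-instance>/api/2.0/clusters/create \
  -H "Authorization: Bearer <personal-access-token>" \
  -d '{
        "cluster_name": "single-node-trial",
        "spark_version": "10.4.x-scala2.12",
        "node_type_id": "Standard_DS3_v2",
        "num_workers": 0,
        "spark_conf": {
          "spark.databricks.cluster.profile": "singleNode",
          "spark.master": "local[*]"
        },
        "custom_tags": { "ResourceClass": "SingleNode" }
      }'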
That is normal. You can create your Scala notebook and then attach and start the cluster from the drop-down menu of the Databricks notebook.

Cannot obtain cAdvisor container metrics on Windows Kubernetes nodes

I have configured a mixed-node Kubernetes cluster. Two worker nodes are Ubuntu Server 18.04.4 and two worker nodes are Windows Server 2019 Standard. I have deployed several Docker containers as deployments/pods to each set of worker nodes (.NET Core apps on Ubuntu and legacy WCF apps on Windows). Everything seems to work as advertised.
I am now at the point where I want to monitor the resources of the pods/containers. I have deployed Prometheus, kube-state-metrics, and metrics-server, and I have Prometheus scraping the nodes. For container metrics, the kubelet/cAdvisor returns everything I need from the Ubuntu nodes, such as container_cpu_usage_seconds_total, container_cpu_cfs_throttled_seconds_total, etc. But the kubelet/cAdvisor on the Windows nodes only gives me some basic information:
http://localhost:8001/api/v1/nodes/[WINDOWS_NODE]/proxy/metrics/cadvisor
# HELP cadvisor_version_info A metric with a constant '1' value labeled by kernel version, OS version, docker version, cadvisor version & cadvisor revision.
# TYPE cadvisor_version_info gauge
cadvisor_version_info{cadvisorRevision="",cadvisorVersion="",dockerVersion="",kernelVersion="10.0.17763.1012",osVersion="Windows Server 2019 Standard"} 1
# HELP container_scrape_error 1 if there was an error while getting container metrics, 0 otherwise
# TYPE container_scrape_error gauge
container_scrape_error 0
# HELP machine_cpu_cores Number of CPU cores on the machine.
# TYPE machine_cpu_cores gauge
machine_cpu_cores 2
# HELP machine_memory_bytes Amount of memory installed on the machine.
# TYPE machine_memory_bytes gauge
machine_memory_bytes 1.7179398144e+10
So while the cAdvisor on the Ubuntu nodes gives me everything I ever wanted about containers and more, the cAdvisor on the Windows nodes only gives me the above.
I have examined the PowerShell scripts that install/configure the kubelet on the Windows nodes, but I don't see or understand how I might configure a switch or config file, if there is some magical setting I am missing, that would enable container metrics to be published when the kubelet/cAdvisor is scraped. Any suggestions?
There is the metrics/resource/v1alpha1 endpoint, but it provides only 4 basic metrics.
Documentation
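For what it is worth, you can see exactly what the Windows kubelet exposes there by querying that endpoint through the API server proxy, the same way as the cadvisor path above (the node name is a placeholder; on newer Kubernetes versions the path is /metrics/resource without the version suffix):

kubectl get --raw /api/v1/nodes/<WINDOWS_NODE>/proxy/metrics/resource/v1alpha1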
I think cAdvisor doesn't support Windows nodes properly; what you see is just an emulated interface with limited metrics.
GitHub issue

Zeppelin Interpreter Memory - driver memory

I'm unsuccessfully trying to increase the driver memory for my Spark interpreter.
I just set spark.driver.memory in interpreter settings and everything looks great at first.
But in the Docker container that Zeppelin runs in, there is
Zeppelin 0.6.2
Spark 2.0.1
2:06 /usr/lib/jvm/java-8-openjdk-amd64/bin/java -cp /usr/zeppelin/int.....-2.7.2/share/hadoop/tools/lib/* -Xmx1g ..... --class org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer /usr/zeppelin/interpreter/spark/zeppelin-spark_2.11-0.6.2.jar 42651
a max heap setting (-Xmx1g) that kind of breaks everything.
My main issue is that I am trying to run the Latent Dirichlet Allocation from MLlib, and it always runs out of memory and simply dies on the driver.
The docker container has 26g RAM now so that should be enough.
Zeppelin itself should be fine with its 1g ram.
But the spark driver simply needs more.
My executor processes have RAM, but the driver is reported in the UI as
Executor ID Address Status RDD Blocks Storage Memory Disk Used Cores Active Tasks Failed Tasks Complete Tasks Total Tasks Task Time (GC Time) Input Shuffle Read Shuffle Write Thread Dump
driver 172.17.0.6:40439 Active 0 0.0 B / 404.7 MB 0.0 B 20 0 0 1 1 1.4 s (0 ms) 0.0 B 0.0 B 0.0 B Thread Dump
which is pretty abysmal.
Setting ZEPPELIN_INTP_MEM='-Xms512m -Xmx12g' does not seem to change anything.
I thought zeppelin-env.sh was not being loaded correctly, so I passed this variable directly via docker create -e ZE..., but that did not change anything.
SPARK_HOME is set and it connects to a standalone Spark cluster. That part works; only the driver runs out of memory.
I also tried starting a local[*] process with 8g driver memory and 6g executor memory, but I get the same abysmal ~450 MB of driver memory.
The interpreter reports a Java heap out-of-memory error, which halts the LDAModel training.
Just came across this in a search while running into the exact same problem! Hopefully you've found a solution by now, but just in case anyone else runs across this issue and is looking for a solution like me, here's the issue:
The process you're looking at here isn't considered an interpreter process by Zeppelin; it's actually a Spark driver process. This means its options are set somewhere other than the ZEPPELIN_INTP_MEM variable. Add this to your zeppelin-env.sh:
export SPARK_SUBMIT_OPTIONS="--driver-memory 12G"
Restart Zeppelin and you should be all set! (Tested and works with the latest 0.7.3; I assume it works with earlier versions too.)
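Since Zeppelin is running in Docker here, the same variable can also be passed straight into the container instead of editing zeppelin-env.sh inside the image; a rough sketch, with the image name as a placeholder:

docker run -e SPARK_SUBMIT_OPTIONS="--driver-memory 12G" <your-zeppelin-image>

Either way, because SPARK_HOME is set, Zeppelin launches the driver through spark-submit, so --driver-memory (rather than ZEPPELIN_INTP_MEM) is what controls the driver heap.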
https://issues.apache.org/jira/browse/ZEPPELIN-1263 fixes this issue. After that you can use any standard Spark configuration; e.g. you can specify the driver memory by setting spark.driver.memory in the Spark interpreter settings.

Ambari scaling memory for all services

Initially I had two machines to set up Hadoop, Spark, HBase, Kafka, ZooKeeper, and MR2. Each of those machines had 16GB of RAM. I used Apache Ambari to set up the two machines with the above-mentioned services.
Now I have upgraded the RAM of each of those machines to 128GB.
How can I now tell Ambari to scale up all its services to make use of the additional memory?
Do I need to understand how the memory is configured for each of these services?
Is this part covered in Ambari documentation somewhere?
Ambari calculates recommended settings for the memory usage of each service at install time, so a change in memory post-install will not scale them up. You would have to edit these settings manually for each service, and to do that you would indeed need an understanding of how memory should be configured for each service. I don't know of any Ambari documentation that recommends memory configuration values for each service. I would suggest one of the following routes:
1) Take a look at each service's documentation (YARN, Oozie, Spark, etc.) and see what it recommends for memory-related parameter configurations.
2) Take a look at the Ambari code that calculates recommended values for these memory parameters and use those equations to come up with new values that account for your increased memory.
I used this https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.4.0/bk_installing_manually_book/content/determine-hdp-memory-config.html
Also, SmartSense is a must: http://docs.hortonworks.com/HDPDocuments/SS1/SmartSense-1.2.0/index.html
We need to define cores, memory, disks, and whether we use HBase or not; the script will then provide the memory settings for YARN and MapReduce.
[root@ttsv-lab-vmdb-01 scripts]# python yarn-utils.py -c 8 -m 128 -d 3 -k True
Using cores=8 memory=128GB disks=3 hbase=True
Profile: cores=8 memory=81920MB reserved=48GB usableMem=80GB disks=3
Num Container=6
Container Ram=13312MB
Used Ram=78GB
Unused Ram=48GB
yarn.scheduler.minimum-allocation-mb=13312
yarn.scheduler.maximum-allocation-mb=79872
yarn.nodemanager.resource.memory-mb=79872
mapreduce.map.memory.mb=13312
mapreduce.map.java.opts=-Xmx10649m
mapreduce.reduce.memory.mb=13312
mapreduce.reduce.java.opts=-Xmx10649m
yarn.app.mapreduce.am.resource.mb=13312
yarn.app.mapreduce.am.command-opts=-Xmx10649m
mapreduce.task.io.sort.mb=5324
Apart from this, there are formulas there to calculate it manually. I tried these settings and they worked for me.
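As a sanity check, the numbers above hang together: 6 containers × 13312 MB = 79872 MB, which is exactly yarn.nodemanager.resource.memory-mb and yarn.scheduler.maximum-allocation-mb, and each -Xmx is about 0.8 × 13312 MB ≈ 10649 MB, i.e. the usual rule of thumb of giving the JVM heap roughly 80% of the container size.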

How to change the Elasticsearch ES_HEAP_SIZE on CentOS

Elasticsearch 1.7.2 on CentOS, 8GB RAM
We see that ES_HEAP_SIZE should be increased to 4g.
The only place this seems to be declared in the ES environment is in /etc/init.d/elasticsearch.
We set it to 4g in this init file and restarted ES, but the JVM "heap_max_in_bytes" (as returned from /_nodes/stats) did not move from the default 1g value.
Where and how can we get control of ES_HEAP_SIZE ?
(I should add: the similar-looking threads here on SO are either dated [they apply to earlier versions of ES, not 1.7.x], are for other platforms [Windows, OS X], or do not work [I have tried them, and you can see many of the responses are tagged 'this is a hack, don't do it'].)
(I should further note that the ES docs document this setting and suggest what to set it to, but do not say how or where.)
Note: Below is for Elasticsearch 1.7.x. For 5.3 and higher, it is different.
Per a comment that is rather buried on How to change Elasticsearch max memory size
On CentOS /etc/sysconfig/elasticsearch is the appropriate place to make these changes.
This has been tested and verified in my CentOS 7 environment. I strongly expect it to also work on CentOS 6.
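For completeness, a minimal sketch of the change (assuming the stock RPM/service layout; on CentOS 7 you may need systemctl restart elasticsearch instead of the service command):

# in /etc/sysconfig/elasticsearch
ES_HEAP_SIZE=4g

# restart and verify the new heap is picked up
sudo service elasticsearch restart
curl -s 'localhost:9200/_nodes/stats?pretty' | grep heap_max_in_bytes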
