Mixed SLURM Cluster using CentOS 7 and Debian 8 - cluster-computing

I am using SLURM on a CentOS 7 cluster.
Is it possible to get SLURM working with different OSes (for example CentOS 7 and Debian 8) on the same cluster?

Yes, Slurm will run, but be aware that programs compiled on one OS will not necessarily run on the other.
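If you do mix OSes, one common way to keep jobs on binary-compatible nodes is to tag each node's OS as a node feature and have jobs request it as a constraint. A minimal sketch, assuming hypothetical node names and a two-OS layout (none of these names come from the question):
# slurm.conf excerpt: tag each node with the OS it runs
NodeName=cn[01-04] CPUs=8 State=UNKNOWN Feature=centos7
NodeName=cn[05-08] CPUs=8 State=UNKNOWN Feature=debian8
# submit a job that must land on CentOS 7 nodes only
$ sbatch --constraint=centos7 job.sh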

Related

How to get HDFS and YARN version programmatically?

I'm writing a Spark program that downloads different jars from Maven depending on the environment it runs on, one for each Hadoop distribution (e.g. CDH, HDP, MapR).
This is necessary because some low-level HDFS and YARN APIs are not shared between these distributions. However, I cannot find any public HDFS or YARN API that reports the version.
Is it possible to do this in Java alone, or do I have to run an external shell command to find out?
In Java, org.apache.hadoop.util.VersionInfo.getVersion() should work.
https://hadoop.apache.org/docs/current/api/org/apache/hadoop/util/VersionInfo.html
For the CLIs, you can use:
$ hadoop version
$ hdfs version
$ yarn version
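If you need just the bare version string for scripting, you can also parse the CLI output; a minimal sketch, assuming the usual first line of the hadoop version report (e.g. "Hadoop 2.7.3"):
$ hadoop version | awk 'NR==1 {print $2}'   # prints e.g. 2.7.3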

Greenplum installation error

While installing Greenplum, we get the error below after running the gpcheck command:
GPCHECK_ERROR : uname -r output is different among hosts.
On two machines we have installed CentOS 6, and on one machine we have installed CentOS 7.
Is it necessary for a Greenplum installation that all hosts have the same OS version?
Or should we ignore this error and go ahead?
You must have the same OS version on all the cluster machines. The Greenplum home directory is used for installing gppkgs (add-ons), which are in fact packaged RPMs. Greenplum initializes an RPM database inside the GPDB home directory for managing add-ons. Whenever you do "gpseginstall" (installation, expansion), GPDB copies the contents of the GPDB home directory to the other hosts. However, an RPM database created on one version of the OS is not valid on another, so you would get errors trying to install, list, or remove packages there.
In general, if you don't plan to use any gppkgs and are using the cluster merely for PoC purposes, this should work, but I would strongly recommend using the same OS version on all the cluster hosts.
It is recommended to have the same OS (kernel). If it is not a production environment, you can try ignoring it, but I have never tested that.
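To see exactly what gpcheck is complaining about, you can compare the kernel release across hosts yourself. A minimal sketch, assuming passwordless SSH and placeholder host names (mdw, sdw1, sdw2 are illustrative, not from the question):
$ for h in mdw sdw1 sdw2; do ssh "$h" 'hostname; uname -r; cat /etc/redhat-release'; done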

Is there a reliable way to run a virtualized Docker / Kubernetes stack on Windows?

Are there any up-to-date guides, or VM images of some Linux VM plus Kubernetes, that I could run on Windows? VMware, VirtualBox, or Vagrant images would all help. I'm trying to set up a development environment. (There is no production environment yet, but it will most likely be self-hosted.)
I tried installing several Vagrant templates for Kubernetes linked from their GitHub documentation, but they were specifically marked as not supported on Windows; I tried compiling Kubernetes 0.15 from source under CoreOS and Boot2Docker, but ran into problems with both.
Since my ops skill set is relatively low, I'd sleep easier if I could use a template set up by someone who knows what they're doing.
If you install Docker on Windows (see the Docker instructions), you can then follow the guide to run Kubernetes locally via Docker, and once you are comfortable with that, try running Multi-Node Kubernetes Using Docker.

mpiexec hangs for remote execution

I have two EC2 instances.
Ubuntu 12.04 running OpenMPI 1.4.3
Ubuntu 14.04 running OpenMPI 1.6.5
I run this command:
mpiexec --hostfile machines ls
where "machines" is a file that contains the IP address of the other server that the command is being run on. Every time, it hangs indefinitely. When I replace the IP address with the server that the command is being run on, it works fine. I can password-less ssh between machines fine.
I tried getting the same version of MPI onto both machines but could not make that work; apt-get installs different versions on the two machines for some reason.
What can I do to make MPI work between machines?

How to find the CDH version of Hadoop

When connecting to a Hadoop cluster, how can I know which version of Hadoop the cluster is running? In particular, this is important for proper configuration of libraries when compiling and packaging Hadoop Java jobs with Maven.
The simplest way, if you have SSH access to a Hadoop node, is to run the command:
$ hadoop version
If you are looking for the CDH version, then check /usr/lib/hadoop/cloudera/cdh_version.properties.
On the CDH cluster I am using, there is no cdh_version.properties (or I couldn't find it).
If your cluster uses "Parcels", you could check which version of cdh is used by doing:
/opt/cloudera/parcels
And you could see the version as the name of the folder:
CDH-5.5.1-1.cdh5.5.1.p0.11
Note: I know that this is a not a general rule for getting which cdh version is used. I am trying to show an alternative way that it worked to me.
You can check the installed version with the following command:
$ cat /usr/lib/hadoop/cloudera/cdh_version.properties
Hope this helps.
