Sharing executables from head node to compute nodes in a Unix HPC cluster - installation

I am working with some CFD software on a Unix cluster. The software is installed on the head node, but not as a module. Is there a way for me to share this software and its executable with the compute nodes so we can run jobs via Slurm job scripts, without having to reinstall the software to a different location on the head node?
So far I have been able to run a test case on the head node, so I know the software is installed correctly and at least working there. I have SSHed to one of the compute nodes and searched for the executable, but have not found a path where I have permission to run it.
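One rough way to check this, sketched below with placeholder names (the install prefix /opt/cfd, the executable cfd_solver and the input file input.cas are made up), is to see whether the install directory already lives on a filesystem the compute nodes mount:

    # Placeholder paths: replace /opt/cfd and cfd_solver with the real install
    # location and executable name on your cluster.

    # 1. On the head node, check which filesystem the install lives on.
    #    An NFS or Lustre mount here usually means it is already shared.
    df -h /opt/cfd

    # 2. From a compute node (via Slurm), confirm the same path is visible.
    srun -N1 ls -ld /opt/cfd/bin/cfd_solver

If the path shows up on a shared mount, a minimal job script (saved as run_cfd.sbatch and submitted with sbatch run_cfd.sbatch) might look like:

    #!/bin/bash
    #SBATCH --job-name=cfd_test
    #SBATCH --nodes=1
    #SBATCH --ntasks=8
    export PATH=/opt/cfd/bin:$PATH
    srun cfd_solver input.cas

If the install path is only on the head node's local disk, the usual fix is to copy or reinstall the software onto a filesystem that is exported to the compute nodes (a shared /home or project directory), rather than onto each compute node individually.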

Related

Copy files from several Windows nodes to a single Linux host

I am trying to copy several files from Windows nodes (Node A and Node B) to the Linux node where Ansible runs (Node C).
My question is: is there a way to copy files directly from the Windows nodes to the Linux node, and if so, how would that work in this situation?
The only workaround I have thought of is using a remote shared folder that all the nodes can write to: use win_copy on the Windows nodes to put the files in that location, and then a copy on the Linux node to pick those files up.
Thank you in advance!
I've just noticed that the right module for this task is fetch; it also works on Windows nodes. It is the reverse of the copy module.
https://docs.ansible.com/ansible/latest/collections/ansible/builtin/fetch_module.html
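For what it's worth, a minimal ad-hoc sketch of the same idea (the inventory group windows_nodes and both paths are placeholders): fetch runs on the controller and pulls each file down into a per-host directory under dest.

    # Pull C:\logs\app.log from every host in the windows_nodes group down to
    # the Ansible controller; files land under /tmp/fetched/<hostname>/...
    ansible windows_nodes -m ansible.builtin.fetch \
      -a 'src=C:\logs\app.log dest=/tmp/fetched/'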

Can I use a Hadoop distribution instead of manually installing?

I am planning to implement a Hadoop cluster with about 5 machines. From some background study, I understood that I need to install Hadoop on each of those machines in order to build the cluster.
Earlier I was planning to install a Linux distribution on each of these machines, then install Hadoop separately, and configure each machine to work in parallel.
Recently I came across some Hadoop distributions, such as Cloudera and Hortonworks. My question is, should I install a distribution such as Cloudera or Hortonworks on each of those machines, or should I install Hadoop separately as I described earlier?
Will using a distribution make my task easier, or would it require more knowledge to manage than a plain Hadoop installation?
I'm a beginner in Hadoop too (~1.5 months). Using a distribution can be very helpful if you use the automated installer (Cloudera Manager for Cloudera or Ambari for Hortonworks). It installs and deploys Hadoop and the services you choose (Hive, Impala, Spark, Hue, ...) across the whole cluster very quickly. The main disadvantage, in my opinion, is that you can't really optimize and customize your installation, but for a first attempt it's much easier to get some simple cases running.
I would highly recommend using a distro rather than doing it manually. Even using a distro will be complicated the first time, as there are a lot of separate services that need to be running, depending on what you want, in addition to a base Hadoop install.
Also, do you intend to have a cluster of just 5 machines? If so, Hadoop may not be the right solution for you. You could potentially run all the masters on a single server and have a 4-node cluster, but that is probably not going to perform all that well. Note that the typical replication factor for HDFS is 3, so 4 nodes is just barely enough. If one or two machines go down, you could easily lose data in a production cluster. Personally I would recommend at least 8 nodes plus one or two servers for the masters, so a total cluster size of 9 or 10, preferably 10.
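As a concrete illustration of that replication point (paths below are just examples), you can inspect and adjust the HDFS replication factor from the command line:

    # Default replication factor configured for the cluster (normally 3).
    hdfs getconf -confKey dfs.replication

    # Replication of one file, and lowering it on a small cluster.
    hdfs dfs -stat %r /user/example/data.csv
    hdfs dfs -setrep -w 2 /user/example/data.csv

Lowering replication below 3 saves space on a tiny cluster but leaves you with less protection against failed nodes.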

Run Flume master on Windows

I'm able to run a Cloudera Flume node on Windows but I can't get a Flume master running.
Is this possible and how can you do it?
I don't think this is possible on Windows unless you are willing to consider a virtual Linux/Unix box to act as the Flume master.
I have not spent any time researching Cygwin as a possible solution.

Can I use Cloudera Hadoop without root access?

A bit of a binary question (okay, not exactly), but I was wondering whether one can configure Cloudera/Hadoop to run on the nodes without root shell access to the node machines (although I can set up passwordless SSH login)?
It appears from their instructions that root access is needed, and yet I found a Hadoop wiki page which suggests root access might not be needed: http://wiki.apache.org/nutch/NutchHadoopTutorial
You can, yes. You'll just have to install from source instead of RPM or DEB. Visit http://archive.cloudera.com/docs/ and click on one of the "Tarball" releases (either CDH2 or CDH3) in the top-right corner.
Once you get the tarball, you'll have to create a hadoop user, set some environment variables, etc.
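Roughly, the unprivileged tarball route looks like this (the tarball name, version and JDK path below are placeholders, and everything stays under your own home directory rather than a dedicated hadoop user):

    # Unpack the CDH tarball somewhere you own; no root needed.
    cd "$HOME"
    tar -xzf hadoop-0.20.2-cdh3u0.tar.gz

    # Point the environment at the unpacked tree and your JDK.
    export HADOOP_HOME="$HOME/hadoop-0.20.2-cdh3u0"
    export JAVA_HOME=/usr/lib/jvm/java      # wherever your JDK lives
    export PATH="$HADOOP_HOME/bin:$PATH"

    # Sanity check that the binaries run as an ordinary user.
    hadoop version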
I encourage you to ask Cloudera questions in Get Satisfaction, where we're more likely to answer your questions.
getsatisfaction.com/cloudera
Thanks, and good luck.

How to set up a low-cost cluster

At my house I have about 10 computers, all with different processors and speeds (all x86 compatible). I would like to cluster them. I have looked at openMosix, but since development on it has stopped I am deciding against using it. I would prefer to use the latest or next-to-latest version of a mainstream Linux distribution (SUSE 11, SUSE 10.3, Fedora 9, etc.).
Does anyone know any good sites (or books) that explain how to get a cluster up and running using free open source applications that are common on most mainstream distributions?
I would like a load-balancing cluster for custom software I will be writing. I cannot use something like Folding@home because I need constant contact with every part of the application. For example, if I were running a simulation, one computer might control where rain is falling while another controls what my herbivores are doing in the simulation.
I recently set up an OpenMPI cluster using Ubuntu. There is an existing write-up at https://wiki.ubuntu.com/MpichCluster .
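If it helps, the smallest possible smoke test for such a setup (hostnames are placeholders, and passwordless SSH between the nodes is assumed) is just running hostname everywhere through mpirun:

    # List the machines and how many processes each may run.
    printf 'node01 slots=2\nnode02 slots=2\n' > hosts

    # Launch 4 copies of hostname across the listed nodes with Open MPI.
    mpirun --hostfile hosts -np 4 hostname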
Your question is too vague. What cluster application do you want to use?
By far the easiest way to set up a "cluster" is to install Folding@home on each of your machines. But I doubt that's really what you're asking for.
I have set up clusters for music/video transcoding using simple bash scripts and ssh shared keys before.
I manage mail server clusters at work.
You only need a cluster if you know what you want to do. Come back with an actual requirement, and someone will suggest a solution.
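A toy version of the bash-scripts-plus-shared-SSH-keys approach mentioned above (hostnames, directories and the ffmpeg command line are all placeholders, and /srv/media is assumed to be on storage every worker can see):

    # Round-robin a directory of transcode jobs across a few worker hosts.
    WORKERS=(node01 node02 node03)
    i=0
    for f in /srv/media/raw/*.avi; do
      host=${WORKERS[$((i % ${#WORKERS[@]}))]}
      out="/srv/media/out/$(basename "${f%.avi}").mp4"
      ssh "$host" "ffmpeg -i '$f' '$out'" &   # relies on passwordless SSH keys
      i=$((i + 1))
    done
    wait   # block until every remote transcode has finished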
Take a look at Rocks. It's a full-blown cluster "distribution" based on CentOS 5.1. It installs everything you need (libs, applications and tools) to run a cluster and is dead simple to install and use. You do all the tweaking and configuration on the master node, and it helps you kickstart all your other nodes. I've recently installed a 1200+ node (over 10,000 cores!) cluster with it, and I would not hesitate to install it on a 4-node cluster, since the effort to install the master node is minimal!
You can either run applications written for cluster libraries such as MPI or PVM, or you can use the queueing system (Sun Grid Engine) to distribute any type of job. Or use distcc to compile the code of your choice on all nodes!
And it's open source, gpl, free, everything that you like!
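To make the queueing-system option a bit more concrete, a Sun Grid Engine job script might look roughly like this (the script contents, job name and the mpi parallel environment name are assumptions; PE names vary by site):

    #!/bin/bash
    #$ -N rain_sim        # job name
    #$ -cwd               # run from the directory the job was submitted from
    #$ -pe mpi 4          # request 4 slots; the PE name is site-specific
    ./simulate --region north

Submit it with qsub sim_task.sh and watch it with qstat; MPI or PVM programs would go through the same queue with their own launch line.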
I think he's looking for something similar to openMosix: some kind of general cluster on top of which any application can run distributed among the nodes. AFAIK there's nothing like that available. MPI-based clusters are the closest thing you can get, but I think you can only run MPI applications on them.
Linux Virtual Server
http://www.linuxvirtualserver.org/
I use PVM and it works. But even with just a nice SSH setup that allows logging in to each machine without entering a password, you can easily launch commands remotely on your different compute nodes.
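For anyone setting that up from scratch, the passwordless-SSH part is just the following (node names are placeholders):

    # Generate a key once on the machine you launch from, then push it out.
    ssh-keygen -t ed25519 -N '' -f ~/.ssh/id_ed25519
    for node in node01 node02 node03; do
      ssh-copy-id "$node"        # copies the public key to the node
      ssh "$node" uptime         # verify passwordless login and run a command
    done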
