Oozie fork running only 2 branches in parallel - hadoop

I am running an Oozie workflow job which has a fork node. The fork node directs the workflow to 4 different sub-workflows, which in turn call shell scripts.
Ideally all 4 shell scripts were supposed to execute in parallel, but for me only 2 shell scripts are executing in parallel.
Could someone help me address this issue?
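A first step in diagnosing this is to look at the workflow's action states while the fork is running. The following is only a diagnostic sketch, assuming the standard Oozie command-line client is available on a gateway host; the job ID is a placeholder.

# Show the status of every action in the workflow. With a 4-way fork you
# would expect to see all four shell actions in RUNNING state at the same
# time; if only two ever reach RUNNING, the others are waiting on something
# (for example, cluster or queue capacity for their launcher jobs).
oozie job -info <workflow-job-id>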

Related

Running a bash script on the nodes srun uses for an MPI job

I can launch an MPI job across multiple compute nodes using a Slurm batch script and srun. As part of the Slurm script, I want to launch a shell script that runs on the nodes the job is using to collect information (using the top command) about the job tasks running on each node. I want the shell script to run at the node level, rather than the task level. The shell script works fine on a single compute node, and for jobs using a single compute node I can run it in the background as part of the Slurm script. But it's not clear how to get it to run on multiple compute nodes using srun. I've tried using multiple srun commands in the Slurm batch script, but the shell script only starts on one compute node.
I figured this out. I created a shell script wrapper to invoke the MPI code, and then in the Slurm script I use srun on the wrapper script. In the wrapper script I have the following conditional to invoke my shell script (sampleTop2.sh) so that one instance runs on each of the allocated compute nodes.
# Only the first task on each node (local rank 0) starts the sampling script.
if (( SLURM_PROCID % SLURM_NTASKS_PER_NODE == 0 ))
then
    ./sampleTop2.sh "$USER" "$SLURMD_NODENAME" 10 &
fi
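For completeness, a minimal sketch of the corresponding Slurm batch script is below; the job parameters and the wrapper name (wrapper.sh) are assumed for illustration. srun starts the wrapper once per task, and the conditional above makes sure sampleTop2.sh is started only once per node.

#!/bin/bash
#SBATCH --job-name=mpi_with_sampling
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=8

# A single srun launches the wrapper on every task; inside the wrapper the
# conditional shown above starts sampleTop2.sh in the background on the
# first task of each node, and every task then runs the MPI binary.
srun ./wrapper.sh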

In Oozie, how would I be able to use script output

I have to create a cron-like coordinator job and collect some logs.
/mydir/sample.sh >> /mydir/cron.log 2>&1
Can I use a simple Oozie workflow, the same kind I use for any shell command?
I'm asking because I've seen that there are specific workflows to execute .sh scripts.
Sure, you can execute a Shell action (on any node in the YARN cluster) or use the Ssh action if you'd like to target specific hosts. Keep in mind that the "/mydir/cron.log" file will be created on the host the action is executed on, and the generated file might not be available to other Oozie actions.
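If later actions do need the output, one common workaround is to have the script write its log locally and then push it to HDFS. Below is a minimal sketch of that idea, assuming an HDFS client is available where the shell action runs; the HDFS target directory is just an example path.

#!/bin/bash
# Run the job and capture stdout/stderr locally, on whatever node the
# shell action happens to be scheduled.
/mydir/sample.sh >> /mydir/cron.log 2>&1

# Copy the log to HDFS so it is visible to other Oozie actions regardless
# of which node they run on.
hdfs dfs -mkdir -p /user/oozie/cron-logs
hdfs dfs -put -f /mydir/cron.log /user/oozie/cron-logs/cron-$(date +%Y%m%d%H%M).log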

Need help to develop a Chef recipe to execute 4 bash scripts on 4 servers in sequence and verify that each script executed successfully

I need help to develop a Chef recipe to automate an application installation.
Here is my requirement:
I have 4 different bash scripts that need to execute on 4 different servers in sequence: script 'A' executes on server '1' and I verify that it executed successfully, then script 'B' executes on server '2' and I again verify execution, and similarly for the other 2 scripts on 2 other servers.
Any ideas on how to develop a recipe for the above requirement?
Thanks in advance.
Nodes are usually independent of each other. If you still want to do this, I would recommend using a data bag that stores the outcome of the scripts, or alternatively a node tag about the outcome, which you can query in the chef-client runs before executing the next script.
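Whichever Chef mechanism you use to share state, the "verify that the script executed successfully" part generally comes down to checking each script's exit status and recording it where the next step can read it. A minimal shell-level sketch of that idea follows; the script name and status file path are made up for illustration.

#!/bin/bash
# Run script A and record its outcome; the recipe (or the next server's
# chef-client run) can check this status before moving on to script B.
if ./scriptA.sh; then
    echo "scriptA=success" > /var/tmp/deploy_status
else
    echo "scriptA=failed" > /var/tmp/deploy_status
    exit 1
fi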

Why does Oozie submit a shell action to YARN?

I am currently learning Oozie and am a little curious about the shell action. I am executing a shell action which contains a shell command like
hadoop jar <jarPath> <FQCN>
While running this action there are two YARN jobs running:
one for the Hadoop job
one for the shell action
I don't understand why a shell action needs YARN for execution. I also tried the email action; it executes without YARN resources.
To answer this question, the difference is between
running a shell script independently (a .sh file, or from the CLI)
running a shell action as part of an Oozie workflow (a shell script in an Oozie shell action)
The first case is very obvious.
In the second case, Oozie launches the shell script via YARN (the resource negotiator) to run it on the cluster where Oozie is installed, and internally runs MR jobs to launch the shell action. So the shell script runs as a YARN application internally. The logs of the Oozie workflow show the way the shell action is launched in Oozie.
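You can observe this directly while the action is running. Assuming you have access to the YARN CLI on the cluster, something like the command below will typically list two applications: the Oozie launcher that wraps the shell action, and the job submitted by the hadoop jar command inside the script.

# List the YARN applications currently running; expect one entry for the
# Oozie launcher (the shell action) and one for the hadoop jar job.
yarn application -list -appStates RUNNING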

Why does scheduling Spark jobs through cron fail (while the same command works when executed in a terminal)?

I am trying to schedule a Spark job using cron.
I have made a shell script and it executes fine from the terminal.
However, when I execute the script using cron it gives me an "insufficient memory to start JVM thread" error.
Every time I start the script from the terminal there is no issue; the problem only appears when the script is started by cron.
Could you kindly suggest something?
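A minimal sketch of the kind of setup being described follows, with assumed file names and paths. Cron starts jobs with a very minimal environment (it does not load the interactive shell profile), which is a frequent source of "works in the terminal, fails under cron" differences, so the wrapper below sets the environment explicitly before calling spark-submit.

# crontab entry (run every day at 02:00); paths are placeholders
0 2 * * * /home/user/run_spark_job.sh >> /home/user/spark_cron.log 2>&1

#!/bin/bash
# run_spark_job.sh - set the environment explicitly, since cron does not
# provide the same environment as an interactive terminal session.
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk   # placeholder path
export SPARK_HOME=/opt/spark                   # placeholder path
export PATH="$SPARK_HOME/bin:$JAVA_HOME/bin:$PATH"

"$SPARK_HOME"/bin/spark-submit \
    --master yarn \
    --driver-memory 2g \
    --executor-memory 2g \
    /home/user/jobs/my_job.py   # placeholder job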
