In Oozie, how would I be able to use script output - shell

I have to create a cron-like coordinator job and collect some logs.
/mydir/sample.sh >> /mydir/cron.log 2>&1
Can I use a simple Oozie workflow, the same one I use for any shell command?
I'm asking because I've seen that there are specific workflow actions for executing .sh scripts.

Sure, you can execute it as a Shell action (on any node in the YARN cluster), or use the SSH action if you'd like to target specific hosts. Keep in mind that the "/mydir/cron.log" file will be created on the host the action is executed on, and the generated file might not be available to other Oozie actions.
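If downstream actions do need the log, one option is to push it to HDFS at the end of the script. A minimal sketch, assuming the HDFS destination path below (adjust it to your own layout):
# run the job and append its output to the local log, as in the question
/mydir/sample.sh >> /mydir/cron.log 2>&1
# copy the log to HDFS so later actions can read it no matter which node
# ran this action (the destination path is a made-up example)
hadoop fs -put -f /mydir/cron.log /user/oozie/logs/cron.log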

Related

How do I write a script to start multiple services on CentOS?

I have a multi-node cluster running Hadoop, Kafka, ZooKeeper and Spark.
I am running the following commands to start the respective services:
$ ./Hadoop/sbin/start-all.sh
$ ./zookeeper/bin/zkServer.sh start
$ ./Kafka/Kafka-server-start.sh ./config/server-properties.sh
$ ./spark/sbin/start-all.sh
and so on.
Can anyone tell me how to write a script to automate this process instead of running each command individually?
Have you tried putting all these commands into a simple shell script and running that script instead? For example, the following is a simple bash script:
#!/bin/bash
# start the HDFS/YARN daemons
./Hadoop/sbin/start-all.sh
# start ZooKeeper (Kafka needs it running before the broker starts)
./zookeeper/bin/zkServer.sh start
# start the Kafka broker
./kafka/kafka-server-start.sh ./config/server-properties.sh
# start the Spark master and workers
./spark/sbin/start-all.sh
# ... and so on
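To use it, save the commands in a file (the name start-services.sh is just an example), make it executable once, and run it:
chmod +x start-services.sh
./start-services.sh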

Script didn't finish execution but cron job started again

I am trying to run a cron job which executes my shell script; the shell script contains Hive and Pig scripts. I am setting the cron job to execute every 2 minutes, but the cron job starts again before my shell script has finished. Is that going to affect my results, or will the next run only start once the script finishes its execution? I am in a bit of a dilemma here. Please help.
Thanks
I think there are two ways to resolve this, a long way and a short way:
Long way (probably most correct):
Use something like Luigi to manage job dependencies, then run it with cron (it won't run more than one instance of the same job).
Luigi will handle all your job dependencies for you, and you can make sure that a particular job only executes once. It's a little more work to set up, but it's really worth it.
Short Way:
Lock files have already been mentioned, but you can do this on HDFS too; that way it doesn't depend on where you run the cron job from.
Instead of checking for a lock file, put a flag on HDFS when you start and finish the job, and have this as a standard thing in all of your cron jobs:
# at start
hadoop fs -touchz /jobs/job1/2016-07-01/_STARTED
# at finish
hadoop fs -touchz /jobs/job1/2016-07-01/_COMPLETED
# then check them before running (hadoop fs -test -e returns 0 if the path exists):
dir=/jobs/job1/2016-07-01
if ! hadoop fs -test -e "$dir/_STARTED" && ! hadoop fs -test -e "$dir/_COMPLETED"; then
    run_job                              # placeholder for your actual job
    hadoop fs -touchz "$dir/_COMPLETED"  # add the completed flag
    hadoop fs -rm "$dir/_STARTED"        # remove the started flag
fi
At the start of the script, have a check:
#!/bin/bash
if [ -e /tmp/file.lock ]; then
    rm /tmp/file.lock  # previous run completed; remove the marker and continue
else
    exit               # no marker: the previous execution has not completed yet
fi
# ... your script here
touch /tmp/file.lock   # mark this run as completed
There are many other ways of achieving the same thing; I am just giving a simple example. (Note that with this scheme you have to create /tmp/file.lock once by hand before the very first run, otherwise the script will always exit immediately.)
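Another common approach is flock(1) from util-linux, which avoids that bootstrap step entirely. A minimal sketch (the lock path /tmp/cronjob.lock is an arbitrary choice):
#!/bin/bash
exec 9>/tmp/cronjob.lock   # open (and create) the lock file on descriptor 9
if ! flock -n 9; then
    exit 1                 # a previous run still holds the lock; skip this run
fi
# ... your Hive and Pig steps here; the lock is released when the script exits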

Why does Oozie submit a shell action to YARN?

I am recently learning Oozie and I am a little curious about the shell action. I am executing a shell action which contains a shell command like
hadoop jar <jarPath> <FQCN>
While running this action there are two YARN jobs running:
one for the hadoop job
one for the shell action
I don't understand why the shell action needs YARN for execution. I also tried the email action; it executes without YARN resources.
To answer this question, the difference is between
running a shell script independently (a .sh file, or from the CLI)
running a shell action as part of an Oozie workflow (a shell script in an Oozie shell action).
The first case is very obvious.
In the second case, Oozie launches the shell script via YARN (the resource negotiator) on the cluster where Oozie is installed: internally it submits a small MapReduce launcher job, and a task of that launcher job is what executes your shell command. So the shell script itself runs as a YARN application, and the hadoop jar command inside it then submits a second YARN application, which is why you see two jobs. The logs of the Oozie workflow show the way the shell action is launched.
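You can observe this while the workflow is running with the standard YARN CLI; no Oozie-specific tooling is needed:
# list running YARN applications; with a shell action that calls `hadoop jar`
# there should be two entries: the Oozie launcher job and the submitted job
yarn application -list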

How is running a script using aws emr script-runner different from running it from bash?

I have used script-runner on AWS EMR, and given that this may look like a very basic (and maybe stupid) question: I have read many documents and no one answers why we need a script runner in EMR, when all it does is execute a script on the master node.
Can the same script not be run using bash?
The script runner is needed when you want to simply execute a script but the entry point expects a jar. For example, submitting an EMR step will execute a "hadoop jar blah ..." command, but if "blah" is a script this will fail. Script-runner becomes the jar that the step expects, and then uses its argument (the path to the script) to execute the shell script.
When you run your script in bash, you need to have the script locally, and you also need to set all the configuration for it to work as you expect.
With the script-runner you have more options: for example, you can run it as part of your cluster launch command, and you can execute a script that is hosted remotely in S3. See the example from the EMR documentation: http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-hadoop-script.html
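As a concrete illustration, a step like the following runs a script hosted in S3 through script-runner on an already-running cluster (the cluster id, bucket and script names are made-up placeholders; the script-runner jar lives in a per-region EMR bucket, us-east-1 here):
aws emr add-steps --cluster-id j-XXXXXXXXXXXXX \
  --steps Type=CUSTOM_JAR,Name=RunScript,ActionOnFailure=CONTINUE,\
Jar=s3://us-east-1.elasticmapreduce/libs/script-runner/script-runner.jar,\
Args=[s3://my-bucket/scripts/my-script.sh]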

Simple script run via cron job doesn't work but works from shell

I am on shared hosting and I'm trying to schedule a cron job to run every now and then. Via cPanel I scheduled my script for execution, but even though (according to my host's support) the cron job runs, the script doesn't seem to do anything. The cron job command I set via cPanel is:
/bin/sh /home1/myusername/public_html/somefolder/cronjob2.sh
and cronjob2.sh is:
#!/bin/bash
/home1/myusername/public_html/somefolder/node_modules/forever/bin/forever stop 0
When I execute via SSH:
/home1/myusername/public_html/somefolder/cronjob2.sh
it stops the forever process as needed. From the cron job it doesn't do anything.
How can I get this working?
EDIT:
So I've tried:
/bin/sh /home1/username/public_html/somefolder/cronjob2.sh >> /tmp/mylog 2>&1
and mylog entries say:
/usr/bin/env: node: No such file or directory
It seems that forever needs to run node, and node cannot be found. How would I fix this?
EDIT2:
Accepted an answer at superuser.com. Thank you all for your help:
https://superuser.com/questions/763261/simple-script-run-via-cronjob-doesnt-work-but-works-from-shell/763288#763288
For cron job lines in a crontab it is not required to specify the kind of shell (or e.g. of perl).
It's enough that your script contains a shebang line.
Therefore you should remove /bin/sh from your cron job line.
Another aspect that might cause your script to behave differently when started interactively and when started by the cron daemon is the possibly different environment, first of all the PATH variable. Therefore check whether your script can be executed in the very restricted environment provided by the cron daemon. You can determine your cron job's environment experimentally by starting a temporary cron job that executes the "env" command and writes its output to a file.
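A minimal sketch of that experiment (the output path is an arbitrary choice; remove the crontab line again once you have the file):
# temporary crontab entry: dump cron's environment to a file once a minute
* * * * * env > /tmp/cron-env.log 2>&1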
One more aspect: have you redirected STDOUT and STDERR of the cron job to a log file and read its content to analyze the issue? You can do it as follows:
your_cron_job >/tmp/any_name.log 2>&1
According to what you wrote, when you run your script via SSH you are using bash, because this is the first line of your script:
#!/bin/bash
However, in the crontab you are forcing the use of sh instead of bash. Are you sure your script is fully compatible with sh? Otherwise, simply replace /bin/sh with /bin/bash in your cron command and test again.
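Given the "/usr/bin/env: node: No such file or directory" error from the EDIT, the likely root cause is that cron's minimal PATH does not include the directory where node is installed. A hedged sketch of a fix inside cronjob2.sh (the node directory below is an assumption; find the real one with `which node` in an SSH session):
#!/bin/bash
# cron's PATH is minimal; prepend the directory that contains the node
# binary so that `/usr/bin/env node` can find it (adjust to your install)
export PATH="/home1/myusername/bin:$PATH"
/home1/myusername/public_html/somefolder/node_modules/forever/bin/forever stop 0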
