I have a 6-node cluster.
When I run an Oozie job, it triggers the job on any of the 6 nodes.
Is there a way to specify the node on which the Oozie shell action should be triggered?
Related
While deploying Hadoop, I want some set of nodes to run the HDFS server but not run any MapReduce tasks.
For example, there are two nodes A and B that run HDFS.
I want to exclude the node A from running any map/reduce task.
How can I achieve it? Thanks
If you do not want to run any MapReduce jobs on a particular node or set of nodes,
stopping the NodeManager daemon would be the simplest option if it is already running.
Run this command on the nodes where MR tasks should not be attempted:
yarn-daemon.sh stop nodemanager
Or exclude the hosts using the property yarn.resourcemanager.nodes.exclude-path in yarn-site.xml
<property>
  <name>yarn.resourcemanager.nodes.exclude-path</name>
  <value>/path/to/excludes.txt</value>
  <description>Path of the file containing the hosts to exclude. Should be readable by the YARN user.</description>
</property>
After adding this property, refresh the ResourceManager:
yarn rmadmin -refreshNodes
The nodes specified in the file will be excluded from attempting MapReduce tasks.
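For reference, excludes.txt is just a plain text file listing one hostname per line; the hostnames below are hypothetical:

nodeA.example.com
nodeB.example.com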
I am answering my own question.
If you use YARN for resource management,
check franklinsijo's answer.
If you use standalone mode (MR1, without YARN),
make a list of the nodes that are allowed to run MR tasks and specify its path as the 'mapred.hosts' property (documented in the mapred-default file: https://hadoop.apache.org/docs/r1.2.1/mapred-default.html).
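For illustration, a minimal sketch of how that property could be set in mapred-site.xml, assuming MR1; the include file path is hypothetical and the file would list one allowed hostname per line:

<property>
  <name>mapred.hosts</name>
  <value>/path/to/includes.txt</value>
  <description>Only the hosts listed in this file are allowed to run MapReduce tasks.</description>
</property>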
Say I have a running Oozie bundle containing coordinators A and B.
I do development work on coordinator B and I want to relaunch coordinator B.
Is there any way to relaunch a coordinator inside an Oozie bundle without restarting the bundle itself? I ask because I don't want to restart coordinator A.
Otherwise, is there a way to add or remove a coordinator from a currently running Oozie bundle?
I wrote something like a custom Oozie FTP action (a simple example is described in "Professional Hadoop Solutions" by Boris Lublinsky, Kevin T. Smith, and Alexey Yakubovich). We have HDFS on node1 and the Oozie server on node2. Node2 also has an HDFS client.
My problem:
The Oozie job is started from node1 (all needed files are located on HDFS on node1).
The Oozie custom FTP action successfully downloads CSV files from FTP onto node2 (where the Oozie server is located).
I should move the files into HDFS and create an external table from the CSV on node1.
I tried to use a Java action and call the fileSystem.moveFromLocalFile(...) method. I also tried to use a Shell action like /usr/bin/hadoop fs -moveFromLocal /tmp\import_folder/filename.csv /user/user_for_import/imported/filename.csv, but it had no effect. All of the actions seem to look for the files on node1. The result is the same if I start the Oozie job from node2.
Question: can I set the node on which the FTP action runs so that it loads files from FTP onto node1? Or is there any other way to get the downloaded files into HDFS besides those described?
Oozie runs all its actions as MR jobs on nodes from the configured MapReduce cluster. There is no way to make Oozie run some actions on a particular node.
Basically, you should use Flume to ingest files into HDFS. Set up a Flume agent on your FTP node.
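A minimal sketch of such an agent, assuming a spooling-directory source watching the FTP download folder and an HDFS sink; the agent name, directories, and HDFS path are hypothetical:

# Flume agent on the FTP node: watch the download directory and write to HDFS on node1
a1.sources = spool
a1.channels = mem
a1.sinks = hdfs-out

a1.sources.spool.type = spooldir
a1.sources.spool.spoolDir = /tmp/import_folder
a1.sources.spool.channels = mem

a1.channels.mem.type = memory

a1.sinks.hdfs-out.type = hdfs
a1.sinks.hdfs-out.hdfs.path = hdfs://node1:8020/user/user_for_import/imported
a1.sinks.hdfs-out.hdfs.fileType = DataStream
a1.sinks.hdfs-out.channel = mem

You would then start the agent with something like: flume-ng agent -n a1 -f ftp-agent.conf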
Oozie allows the user to run a shell script on a particular node via the Oozie ssh action extension.
https://oozie.apache.org/docs/4.2.0/DG_SshActionExtension.html
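For example, a sketch of a workflow ssh action based on the 0.1 schema from that page; the host, paths, and transition targets are hypothetical:

<action name="move-to-hdfs">
    <!-- runs the hadoop CLI on node1 over SSH instead of inside a launcher MR job -->
    <ssh xmlns="uri:oozie:ssh-action:0.1">
        <host>user@node1</host>
        <command>/usr/bin/hadoop</command>
        <args>fs</args>
        <args>-moveFromLocal</args>
        <args>/tmp/import_folder/filename.csv</args>
        <args>/user/user_for_import/imported/filename.csv</args>
    </ssh>
    <ok to="end"/>
    <error to="fail"/>
</action>

Because the command executes on node1 under the given user, the local file paths are resolved on node1 rather than on whichever node the launcher happens to run.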
I ran a YARN MapReduce job with 1 node.
But my job is stuck in the ACCEPTED state, still at 0% complete. I checked with the jps command on my slave; there is no MRAppMaster or YarnChild running to complete the job. On my slave all daemons, such as the DataNode and NodeManager, are running normally. There is no wrong configuration on my master node, because I've tried it before with a different slave and it worked.
How can I fix it? Thanks....
I ran an Oozie coordinator which runs a workflow every hour. I don't have its ID, and when I run the command oozie jobs -oozie http://localhost:11000/oozie it only shows me the workflow jobs and no coordinator. I would like to stop this coordinator from further processing; how can I do that?
First, a tip to avoid having to define the Oozie URL in each command:
export OOZIE_URL=http://localhost:11000/oozie
You can list running coordinators
oozie jobs -jobtype coordinator -filter status=RUNNING
This will return a list displaying the coordinator ID <coord_id> in the first column.
Note that you must have appropriate rights to run the following commands.
Then you can suspend the coordinator
oozie job -suspend <coord_id>
And resume it.
oozie job -resume <coord_id>
But often you have to kill it
oozie job -kill <coord_id>
and redeploy it...
oozie job -config job.properties -run
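For reference, job.properties would point at the deployed coordinator definition; a minimal sketch with hypothetical values:

# hypothetical cluster endpoints
nameNode=hdfs://localhost:8020
jobTracker=localhost:8032
# HDFS directory containing coordinator.xml
oozie.coord.application.path=${nameNode}/user/myuser/coordinators/my-coord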
For coordinator jobs, try this
oozie jobs -jobtype coordinator -oozie http://localhost:11000/oozie
su - {username} -c 'oozie job -oozie http://localhost:11000/oozie -kill {workflow external ID or coordinator external ID}'
To execute this command you need to log in to your Oozie cluster, or you can run it from a local machine; in that case, replace localhost with the address of the box where Oozie is running.
Thanks