Oozie job configuration app directory not found on HDFS - hadoop

I installed a pseudo-distributed version of Cloudera on my Linux box, and ran some simple MapReduce examples with success. However, I'm trying to get Oozie to work, and am completely baffled by the errors I am receiving when attempting to execute a simple job workflow:
tim@phocion:~$ oozie version
Oozie client build version: 3.1.3-cdh4.0.1
Copy the pre-packaged examples to HDFS and execute, per the documentation:
tim@phocion:~$ oozie job -oozie http://phocion:11000/oozie -config /user/tim/examples/apps/map-reduce/job.properties -run
Error: E0504 : E0504: App directory [hdfs://phocion:8020/user/tim/examples/apps/map-reduce] does not exist
Check to see if the file exists:
tim@phocion:~$ hdfs dfs -ls /user/tim/examples/apps/map-reduce
Found 3 items
-rwxr-xr-x 1 tim tim 995 2012-10-03 14:47 /user/tim/examples/apps/map-reduce/job.properties
drwxrwxr-x - tim tim 4096 2012-10-03 14:47 /user/tim/examples/apps/map-reduce/lib
-rwxr-xr-x 1 tim tim 2559 2012-10-03 14:47 /user/tim/examples/apps/map-reduce/workflow.xml
It does. Can I connect to phocion:8020?
tim@phocion:~$ telnet phocion 8020
Trying 127.0.1.1...
Connected to phocion.
Escape character is '^]'.
I can. So, basically, I'm at a total loss as to what this error is trying to tell me - the folder very much does exist. I'm assuming the error is too vague to fully communicate what the issue is, but I've found virtually nothing out there that could point me in the right direction.
I can also replicate this error with other 3rd party tutorials.
I've spent much time poring over configuration files, to the point of not wanting to look at a computer ever again. Maybe I'm overthinking the issue here, but any help would be greatly appreciated.
EDIT: Adding the full job.properties (not too different from the default):
nameNode=hdfs://phocion:8020
jobTracker=phocion:8021
queueName=default
examplesRoot=examples
oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}/apps/map-reduce
outputDir=map-reduce
MORE EDITS: I get the exact same error when the folder DOES NOT exist, and after I put it back into HDFS. My last-ditch idea was that it's a permissions issue, but chmod 777 still gets the same error. Passing the full HDFS path on the command line doesn't fix the issue. Running it under the oozie and even root accounts doesn't work either:
tim@phocion:~$ oozie job -oozie http://phocion:11000/oozie -run -config /home/tim/examples/apps/map-reduce/job.properties -Doozie.wf.application.path=hdfs://phocion:8020/user/tim/examples/apps/map-reduce
Error: E0504 : E0504: App directory [hdfs://phocion:8020/user/tim/examples/apps/map-reduce] does not exist
tim@phocion:~$ hdfs dfs -put examples/ /user/tim/
12/10/04 13:26:43 INFO util.NativeCodeLoader: Loaded the native-hadoop library
tim@phocion:~$ oozie job -oozie http://phocion:11000/oozie -run -config /home/tim/examples/apps/map-reduce/job.properties -Doozie.wf.application.path=hdfs://phocion:8020/user/tim/examples/apps/map-reduce
Error: E0504 : E0504: App directory [hdfs://phocion:8020/user/tim/examples/apps/map-reduce] does not exist
tim@phocion:~$ hdfs dfs -chmod -R 777 /user/tim/examples/
12/10/04 13:28:16 INFO util.NativeCodeLoader: Loaded the native-hadoop library
tim@phocion:~$ oozie job -oozie http://phocion:11000/oozie -run -config /home/tim/examples/apps/map-reduce/job.properties -Doozie.wf.application.path=hdfs://phocion:8020/user/tim/examples/apps/map-reduce
Error: E0504 : E0504: App directory [hdfs://phocion:8020/user/tim/examples/apps/map-reduce] does not exist
tim@phocion:~$ sudo -u oozie oozie job -oozie http://phocion:11000/oozie -run -config /home/tim/examples/apps/map-reduce/job.properties -Doozie.wf.application.path=hdfs://phocion:8020/user/tim/examples/apps/map-reduce
[sudo] password for tim:
Error: E0504 : E0504: App directory [hdfs://phocion:8020/user/tim/examples/apps/map-reduce] does not exist
tim@phocion:~$ sudo -u root oozie job -oozie http://phocion:11000/oozie -run -config /home/tim/examples/apps/map-reduce/job.properties -Doozie.wf.application.path=hdfs://phocion:8020/user/tim/examples/apps/map-reduce
Error: E0504 : E0504: App directory [hdfs://phocion:8020/user/tim/examples/apps/map-reduce] does not exist
Should this command work in theory?
tim@phocion:~$ hdfs dfs -ls hdfs://phocion:8020/user/tim/examples/apps/map-reduce
ls: `hdfs://phocion:8020/user/tim/examples/apps/map-reduce': No such file or directory
This shows up in hadoop-hdfs logs after executing the oozie command:
2012-10-04 13:50:00,152 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Starting log segment at 113297
2012-10-04 13:50:00,874 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Opening connection to http://localhost.localdomain:50090/getimage?getimage=1&txid=113296&storageInfo=-40:2092007576:0:cluster8
2012-10-04 13:50:00,875 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hdfs (auth:SIMPLE) cause:java.net.ConnectException: Connection refused
2012-10-04 13:50:00,876 WARN org.mortbay.log: /getimage: java.io.IOException: GetImage failed. java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:351)
at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:213)
at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:200)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
at java.net.Socket.connect(Socket.java:529)
at java.net.Socket.connect(Socket.java:478)
at sun.net.NetworkClient.doConnect(NetworkClient.java:163)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:395)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:530)
at sun.net.www.http.HttpClient.<init>(HttpClient.java:234)
at sun.net.www.http.HttpClient.New(HttpClient.java:307)
at sun.net.www.http.HttpClient.New(HttpClient.java:324)
at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:970)
at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:911)
at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:836)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1172)

In addition to HarshJ's comment, check your error message:
Error: E0504 : E0504: App directory [hdfs://phocion:8020/user/tim/examples/apps/demo] does not exist
And the hadoop fs -ls listing you provided:
/user/tim/examples/apps/map-reduce/
And play spot the difference:
/user/tim/examples/apps/demo
/user/tim/examples/apps/map-reduce/
Try configuring as follows:
oozie.wf.application.path=/user/tim/examples/apps/map-reduce
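For reference, a minimal job.properties with that change applied might look like the sketch below (nameNode and jobTracker values carried over from the question); a path without the hdfs:// scheme should get resolved against the default filesystem Oozie is configured with:
nameNode=hdfs://phocion:8020
jobTracker=phocion:8021
queueName=default
examplesRoot=examples
oozie.wf.application.path=/user/tim/examples/apps/map-reduce
outputDir=map-reduce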

I had the same issue and fixed it by exporting the correct Oozie URL.
To export it, use the command below:
export OOZIE_URL=http://someip:11000/oozie
To get this Oozie URL, you need to use Hue to connect to your cluster and navigate to Workflows, where you can find a tab called Oozie. Inside it you should see gauges where a lot of properties are listed. Look for the property oozie.servers.
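Once OOZIE_URL is exported you can drop the -oozie flag from every command. A quick sanity check, with a hypothetical host:
export OOZIE_URL=http://someip:11000/oozie
oozie admin -status     # should report: System mode: NORMAL
oozie job -config examples/apps/map-reduce/job.properties -run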

What you need to do is to -copyFromLocal the examples folder to the location specified in the job's config.
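As a rough sketch, assuming the examples directory was extracted into the current working directory and the application path from the question:
hdfs dfs -copyFromLocal examples /user/tim/
hdfs dfs -ls /user/tim/examples/apps/map-reduce    # workflow.xml and lib/ should be listed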

Related

oozie fails with Could not load db driver class: oracle.jdbc.OracleDriver

I am getting the below error while executing a sqoop export command (in a shell script) with Oozie:
"java.lang.RuntimeException: Could not load db driver class: oracle.jdbc.OracleDriver"
The sqoop export works fine from the CLI (edge node).
I have added ojdbc6.jar to the below locations:
/opt/cloudera/parcels/CDH-5.7.1-1.cdh5.7.1.p0.11/lib/sqoop/lib/
(HDFS locations)
/user/oozie/share/lib/sqoop/ and
/user/oozie/share/lib/lib_20161215195933/sqoop
I have also set oozie.use.system.libpath=true in my Oozie job.properties file.
Please guide me if I am missing any setting.
Thanks & Regards,
Sonali
Make sure that you upload the file to the directory /user/oozie/share/lib/sqoop (it could look like /user/oozie/share/lib/lib_${timestamp}/sqoop for Cloudera and HDP).
Check that the ojdbc6.jar file is correct: check that it contains OracleDriver.class and make sure the size of the file is right; it could have been corrupted while downloading.
Check the permissions on the ojdbc6.jar file (if needed, you can try giving it 755 permissions). Check who the owner of the file is; it should be oozie by default.
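A quick way to run those checks, assuming the jar sits in the current local directory and was uploaded to the sharelib path mentioned in the question:
jar tf ojdbc6.jar | grep OracleDriver    # should list oracle/jdbc/OracleDriver.class
ls -lh ojdbc6.jar                        # compare the size with the official download
hdfs dfs -ls /user/oozie/share/lib/sqoop/ojdbc6.jar
hdfs dfs -chmod 755 /user/oozie/share/lib/sqoop/ojdbc6.jar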
Update the Oozie sharelib by executing the below command (run it on the host where the Oozie server is located):
sudo -u oozie oozie admin -oozie http://<Oozie_Server_Host>:11000/oozie -sharelibupdate
Verify sharelib for sqoop:
sudo -u oozie oozie admin -oozie http://<Oozie_Server_Host>:11000/oozie -shareliblist sqoop*
You can always restart the Oozie service; it should update the sharelib.
Create a directory named lib next to your workflow.xml in HDFS and put jars in there. Oozie will automatically make those jars available to all actions in that workflow.
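For example, with a hypothetical application directory (substitute whatever your oozie.wf.application.path points at):
hdfs dfs -mkdir -p /user/yourname/apps/sqoop-export/lib
hdfs dfs -put ojdbc6.jar /user/yourname/apps/sqoop-export/lib/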
Cloudera users should check this article. Especially paragraph 'One Last Thing'.

Not able to access /tmp folder in HDFS

I have started the name node, data node, and MR services on my local machine and all the services are running. Here is the result of the jps command:
kv:~ karan.verma$ jps
4499 SecondaryNameNode
420
4676 NodeManager
4741 JobHistoryServer
5125 Jps
4406 DataNode
4600 ResourceManager
4333 NameNode
And I could easily browse through the "Browse Directory" page of the web UI for the name node. But when I try to browse the /tmp directory, it shows me the following error:
Permission denied: user=root, access=READ_EXECUTE, inode="/tmp":karan.verma:karan.verma:drwxrwx-w-
I tried to change the permissions using following command:
hadoop fs -chown -R karan.verma:karan.verma hdfs://localhost/
hadoop fs -chmod a+w /
but no luck. Please suggest what the issue could be. I executed the above commands with sudo, but still got the same result. Any help?
It looks like you are running as root and the file system is owned by karan.verma.
You can confirm this by running:
whoami
Either su to karan.verma or add root to the karan.verma group.
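For example (assuming a local karan.verma group exists):
su - karan.verma                  # work as the owning user
usermod -a -G karan.verma root    # or add root to the owning group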
Executing the following command solved the issue for me:
hadoop fs -chmod -R 777 hdfs://localhost/

Cannot start running on browser the namenode for Hadoop

This is my first time installing Hadoop on Linux (a Fedora distro) running in a VM (using Parallels on my Mac). I followed every step in this video, including the textual version of it. Then, when I ran it on localhost (or the equivalent value from hostname) on port 50070, I got the following message:
...can't establish a connection to the server at localhost:50070
When I run the jps command, by the way, I don't have the datanode and namenode, unlike at the end of the textual version of the tutorial, which has the following:
While mine has only the following processes running:
6021 NodeManager
3947 SecondaryNameNode
5788 ResourceManager
8941 Jps
When I run the hadoop namenode command, I get some of the following [redacted] errors:
Cannot access storage directory /usr/local/hadoop_store/hdfs/namenode
16/10/11 21:52:45 WARN namenode.FSNamesystem: Encountered exception loading fsimage
org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /usr/local/hadoop_store/hdfs/namenode is in an inconsistent state: storage directory does not exist or is not accessible.
I tried, by the way, to access the above-mentioned directories, and they exist.
Any hint for this newbie? ;-)
You would need to give read and write permissions on the directory /usr/local/hadoop_store/hdfs/namenode to the user with which you are running the services.
Once that's done, you should run the format command: hadoop namenode -format
Then try to start your services.
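A minimal sketch, assuming the services run as the current user:
sudo chown -R $(whoami) /usr/local/hadoop_store/hdfs/namenode
sudo chmod -R 755 /usr/local/hadoop_store/hdfs/namenode
hadoop namenode -format
start-dfs.sh
start-yarn.sh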
Delete the files in /app/hadoop/tmp/*, try formatting the namenode again, and then run start-dfs.sh and start-yarn.sh.

error while running example of oozie job

I tried running my first oozie job by following a blog post.
I used oozie-examples.tar.gz; after extracting it, I placed the examples in HDFS.
I tried running the map-reduce job in it but unfortunately got an error.
I ran the below command:
oozie job -oozie http://localhost:11000/oozie -config /examples/apps/map-reduce/job.properties -run
Got the error:
java.io.IOException: configuration is not specified
    at org.apache.oozie.cli.OozieCLI.getConfiguration(OozieCLI.java:787)
    at org.apache.oozie.cli.OozieCLI.jobCommand(OozieCLI.java:1026)
    at org.apache.oozie.cli.OozieCLI.processCommand(OozieCLI.java:662)
    at org.apache.oozie.cli.OozieCLI.run(OozieCLI.java:615)
    at org.apache.oozie.cli.OozieCLI.main(OozieCLI.java:218)
configuration is not specified
I don't know which configuration it is asking for, as I am using the Cloudera VM and it has all the configurations set in it by default.
oozie job -oozie http://localhost:11000/oozie -config /examples/apps/map-reduce/job.properties -run
The -config parameter takes a local path, not an HDFS path. The workflow.xml needs to be present in HDFS, and its path is defined in the job.properties file with the property:
oozie.wf.application.path=<path to the workflow.xml>
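In other words, something like the following, where the job.properties path is on the local filesystem and only the application path inside it refers to HDFS (the local path shown here is illustrative):
oozie job -oozie http://localhost:11000/oozie -config /home/cloudera/examples/apps/map-reduce/job.properties -run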

Permission denied issue in mapreduce?

I have tried the below command:
hadoop jar /home/cloudera/workspace/para.jar word.Paras examples/wordcount /home/cloudera/Desktop/words/output
The map reduce job starts, but after that it shows the below error. Can anyone please help with this issue?
15/11/04 10:33:57 INFO mapred.JobClient: Task Id : attempt_201511040935_0008_m_000002_0, Status : FAILED
org.apache.hadoop.security.AccessControlException: Permission denied: user=cloudera, access=WRITE, inode="/":hdfs:supergroup:drwxr-xr-x
Do I need to change anything in a config file or in Cloudera Manager?
The exception suggests that you are trying to write to the HDFS root directory "/", which you (user: cloudera) do not have permission to do.
Without knowing what your specific jar does:
I guess that the last argument ("/home/cloudera/Desktop/words/output") is where you wish to place the output.
I guess this is supposed to be within HDFS where /home does not exist.
Try to change this to somewhere where you can write, possibly "/user/cloudera/words/output"
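An illustrative re-run with only the output argument changed (para.jar and word.Paras are taken from the question):
hadoop jar /home/cloudera/workspace/para.jar word.Paras examples/wordcount /user/cloudera/words/output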
There is a set of default directories to be created before you start using the Hadoop cluster (one way to create them is sketched after the list below).
Run the following; it should show you the directories:
$ hadoop fs -ls /
As a sample layout, if you want to run as cloudera, you need the following on HDFS:
/user/cloudera -- the user running the program
/user/hadoop -- your hadoop file system user
/user/mapred -- your mapred user
/tmp -- a temporary directory, which needs to have permission 1777 (hdfs chmod 1777)
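One way to create them, run as the hdfs superuser (adjust the usernames to your cluster; -mkdir -p is available on Hadoop 2 and later):
sudo -u hdfs hadoop fs -mkdir -p /user/cloudera /user/hadoop /user/mapred /tmp
sudo -u hdfs hadoop fs -chown cloudera /user/cloudera
sudo -u hdfs hadoop fs -chmod 1777 /tmp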
HTH.
The last argument that you are passing should be an output path in HDFS, not in the default local file system.
As you are running as the cloudera user, you can point it to /user/cloudera/words/output. But first you need to check whether you have /user/cloudera in your HDFS and whether you have write permission, by issuing the following:
hadoop fs -ls /user/
Once you have it, change your command to the following:
hadoop jar /home/cloudera/workspace/para.jar word.Paras examples/wordcount <path_where_you_have_write_permission_in_HDFS>
