Sqoop task using Oozie workflow ends with connection refused exception

I am trying to execute a Sqoop task using Oozie. The code structure looks like this:
I have a job.properties file on my local file system, which I use to submit the Oozie job. The job.properties file is as follows:
namenode=hdfs://servername:8020
jobtracker=servername:8021
queuename=default
oozie.wf.application.path=${namenode}/user/username/oozie
The command I use to submit the job is:
oozie job -oozie http://localhost:11000/oozie -config \PATH\job.properties -run
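Because oozie.wf.application.path refers to ${namenode}, it can help to see the fully expanded values the submission carries, in particular the jobtracker address that the refused connection points at. The following Java sketch is only a rough imitation of that expansion (plain ${key} references to other keys in the same file). It is not Oozie's EL evaluator, and the class name JobPropertiesResolver is hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: expands ${key} references between entries of a
// job.properties file. A rough imitation only, not Oozie's EL evaluator.
class JobPropertiesResolver {
    private static final Pattern REF = Pattern.compile("\\$\\{([^}]+)\\}");

    static Properties load(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new IllegalStateException(e); // not expected for an in-memory reader
        }
        return props;
    }

    // Replaces each ${key} with the (recursively expanded) value of key.
    // References to unknown keys are left untouched.
    static String resolve(Properties props, String value, int depth) {
        if (depth > 10) {
            throw new IllegalArgumentException("reference cycle or nesting too deep: " + value);
        }
        Matcher m = REF.matcher(value);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String target = props.getProperty(m.group(1));
            String replacement = (target == null) ? m.group() : resolve(props, target, depth + 1);
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Properties p = load(String.join("\n",
                "namenode=hdfs://servername:8020",
                "jobtracker=servername:8021",
                "queuename=default",
                "oozie.wf.application.path=${namenode}/user/username/oozie"));
        // Print every key with its expanded value, sorted by key
        for (String key : new TreeSet<>(p.stringPropertyNames())) {
            System.out.println(key + " = " + resolve(p, p.getProperty(key), 0));
        }
    }
}
```

Running main lists jobtracker = servername:8021 among the resolved values; that is the address to compare against the port the JobTracker actually listens on.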
My workflow.xml is at the HDFS path given as the application path in the properties file (/user/username/oozie). I have also copied sqljdbc4.jar into the same HDFS folder as workflow.xml.
Once I submit my job, I get this error in my oozie.log file.
Caused by: java.net.ConnectException: Call to servername/10.248.92.1:8021 failed on connection exception:
I am not sure what is causing this exception. One odd thing is that 10.248.92.1 is the IP address of that same server.

I see you are running Hortonworks. They recently changed the default jobtracker port to 50300. This gives you two options: 1) change the port to 50300 in your properties file and in any other file or setting that depends on the jobtracker port, or 2) change the jobtracker port back to 8021. If you prefer option 2, here are the instructions.
The jobtracker port is not an easy setting to track down, but for HDP 1.3.2 you can find it in /usr/lib/ambari-server/web/javascripts/apps.js. Look for a block that looks like this:
{
"name": "mapred.job.tracker",
"templateName": ["jobtracker_host"],
"foreignKey": null,
"value": "<templateName[0]>:50300",
"filename": "mapred-site.xml"
},
and change the 50300 to 8021. Unfortunately, that change is not enough for the cluster to pick up the new port. Go to the Ambari Web UI and stop the MapReduce service. Then go to its configuration and make a benign change (you can change it back later if you want). Once you restart MapReduce, that benign change and your change to the jobtracker port will stick, and you will be running on port 8021. If you are running other services that depend on the jobtracker port, stop and restart them too.
Unfortunately you will have to do this every time you upgrade the cluster (unless Hortonworks changes how this is set up). You might want to just bite the bullet and get everything to look for the jobtracker on port 50300.

It means that the JobTracker is either not running or, for some reason, not listening on port 8021. Run telnet servername 8021 to verify this.
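Where telnet is not installed, the same check can be sketched in Java: open a TCP connection with a timeout and treat "connection refused" or a timeout as nothing listening. PortCheck and isListening are hypothetical names used only for illustration.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Java equivalent of "telnet servername 8021": tries to open a TCP
// connection and reports whether anything accepted it.
class PortCheck {
    static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true; // connection accepted: some process is listening
        } catch (IOException e) {
            // ConnectException ("Connection refused"), timeout, unknown host, ...
            return false;
        }
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "servername";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 8021;
        System.out.println(host + ":" + port + " listening? " + isListening(host, port, 3000));
    }
}
```

Running it both on servername itself and from the machine where Oozie runs, and trying port 50300 as well as 8021, can help tell "JobTracker is not running" apart from "JobTracker is listening somewhere else".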

Related

mapreduce tasks only run on namenode

I have built a Hadoop cluster on three machines with the following characteristics:
OS: Ubuntu 14.04 LTS
Hadoop: 2.6.0
NameNode and ResourceManager IP: namenode/192.168.0.100
DataNodes (also NodeManagers) IP: data1/192.168.0.101, data2/192.168.0.102
I have configured all the XML files following the official documentation. When I run the wordcount example program from Eclipse, I want to print which machine is running each map task or reduce task, so here is my code snippet:
// requires: import java.net.InetAddress;
// get the host this task is running on
InetAddress mLocalHost = InetAddress.getLocalHost();
System.out.println("Task on " + mLocalHost);
I put the snippet above into the map and reduce functions and ran the job on Hadoop. However, the console always shows:
Task on namenode/192.168.0.100
As I understand it, these tasks should run on data1 or data2. Can you explain this? What's wrong with my cluster?
What's more:
the JobHistory server (namenode:19888) records nothing,
and the ResourceManager web UI (namenode:8088) just shows Active Nodes: 2, with no further information about the job.
Can you help me? Any help is really appreciated.
Further info from the namenode:
the jps command shows:
12647 Jps
11426 SecondaryNameNode
11217 NameNode
11585 ResourceManager
12033 JobHistoryServer
Where did you put that code? Is it in your driver class? It needs to be in your mapper or reducer so that you can see which node is doing the processing.
Alternatively, look at the ResourceManager web UI at rmipaddress:8088, which shows which node is executing the mappers, along with the logs.
I have found what was wrong. "Run on Hadoop" in Eclipse just starts the job locally, so I need to modify the MyHadoopXML.xml file under the Eclipse plugin's sub-directory. Alternatively, I develop and debug the MapReduce job locally, export the project as a jar, and then run the jar on the cluster with the "hadoop jar" command to verify that the job executes successfully.
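The underlying cause is a configuration setting. When the client-side configuration does not set mapreduce.framework.name to yarn, Hadoop 2.x falls back to its default value, local, and runs every map and reduce task inside the submitting JVM (the LocalJobRunner). That would explain why every task reports namenode/192.168.0.100. The sketch below checks a single mapred-site.xml string. It assumes that file is the one the Eclipse run actually picks up, and it ignores the other configuration sources Hadoop merges. FrameworkCheck and its methods are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper: looks up one property in a Hadoop *-site.xml string
// (<configuration><property><name>..</name><value>..</value></property>...).
// Only this one file is inspected; a real job configuration also merges the
// *-default.xml files, other site files on the classpath and job overrides.
class FrameworkCheck {
    static String property(String siteXml, String name, String defaultValue) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(siteXml.getBytes(StandardCharsets.UTF_8)));
            NodeList props = doc.getElementsByTagName("property");
            for (int i = 0; i < props.getLength(); i++) {
                Element prop = (Element) props.item(i);
                Node n = prop.getElementsByTagName("name").item(0);
                Node v = prop.getElementsByTagName("value").item(0);
                if (n != null && v != null && n.getTextContent().trim().equals(name)) {
                    return v.getTextContent().trim();
                }
            }
            return defaultValue;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse site XML", e);
        }
    }

    // Hadoop 2.x defaults mapreduce.framework.name to "local": tasks then run
    // in the submitting JVM instead of on the NodeManagers.
    static boolean runsLocally(String mapredSiteXml) {
        return property(mapredSiteXml, "mapreduce.framework.name", "local").equals("local");
    }
}
```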

how to start and check job history on hadoop 2.5.2

In the MapReduce web console, each application has a tracking UI link that points to xx:19888/jobhistory/. How do I start the service on port 19888? (I have started 4 services: yarn-resource-manager, yarn-node-manager, hdfs-name-node and hdfs-data-node. What have I missed?)
Has the JobTracker been removed in 2.5.2?
I want to check the job.xml generated for my job. Where can I find it? I have specified "mapreduce.jobtracker.jobhistory.location", but nothing is there.
Thank you.
To access the JobHistory server's web interface, you have to start the hadoop-mapreduce-historyserver service, which binds to port 19888 by default.
If you are running YARN in the cluster, you don't need the JobTracker anymore; the work done by the JobTracker is offloaded to the ResourceManager, NodeManagers and ApplicationMasters. You could still install MRv1 alone, in which case you would install the JobTracker and TaskTrackers (not recommended).
You can check the job.xml from the ResourceManager's UI by navigating to http://RESOURCEMANAGER_HOST:8088/cluster -> select your application's Tracking UI -> select your Job ID -> on the left tab you'll see Configuration. Or, if you already know your job's ID, simply visit this link: http://JOBHISTORY_SERVER:19888/jobhistory/conf/YOUR_JOB_ID.
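The JobHistory link in the answer above follows a fixed pattern, so it can be built from the job ID. The sketch below assumes the usual Hadoop 2.x job ID shape job_<clusterTimestamp>_<sequence> (for example job_1409462238335_0002) and the default port 19888; JobHistoryLinks and confUrl are hypothetical names.

```java
import java.util.regex.Pattern;

// Hypothetical helper: builds the JobHistory URL for a job's configuration
// (job.xml) following http://JOBHISTORY_SERVER:19888/jobhistory/conf/YOUR_JOB_ID.
class JobHistoryLinks {
    static final int DEFAULT_PORT = 19888;

    // MapReduce job IDs look like job_<clusterTimestamp>_<sequence>
    private static final Pattern JOB_ID = Pattern.compile("job_\\d+_\\d+");

    static String confUrl(String historyServerHost, int port, String jobId) {
        if (!JOB_ID.matcher(jobId).matches()) {
            throw new IllegalArgumentException("not a MapReduce job id: " + jobId);
        }
        return "http://" + historyServerHost + ":" + port + "/jobhistory/conf/" + jobId;
    }

    public static void main(String[] args) {
        System.out.println(confUrl("JOBHISTORY_SERVER", DEFAULT_PORT, "job_1409462238335_0002"));
    }
}
```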
$HADOOP_HOME/sbin/mr-jobhistory-daemon.sh --config $HADOOP_HOME/etc/hadoop start historyserver
Then run jps to check that it is running.
Please try 'jps' command to verify all the services are running.
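When several daemons have to be checked, for example from a script, the jps listing can also be checked programmatically. This sketch parses jps output (one "<pid> <MainClass>" per line) and reports which expected daemon names are missing. The names passed in are assumptions about what your cluster should run on that host, and JpsCheck is a hypothetical class. main shells out to jps, so it needs a JDK's jps on the PATH and Java 9 or later (for readAllBytes).

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: checks jps output for expected Hadoop daemons.
class JpsCheck {
    // Each jps line is "<pid> <MainClassName>"; collect the class names.
    static Set<String> runningDaemons(String jpsOutput) {
        Set<String> names = new LinkedHashSet<>();
        for (String line : jpsOutput.split("\\R")) {
            String[] parts = line.trim().split("\\s+", 2);
            if (parts.length == 2) {
                names.add(parts[1].trim());
            }
        }
        return names;
    }

    // Returns the expected daemons that do not appear in the jps output.
    static List<String> missing(String jpsOutput, String... expected) {
        Set<String> running = runningDaemons(jpsOutput);
        List<String> absent = new ArrayList<>();
        for (String daemon : expected) {
            if (!running.contains(daemon)) {
                absent.add(daemon);
            }
        }
        return absent;
    }

    public static void main(String[] args) throws Exception {
        Process jps = new ProcessBuilder("jps").redirectErrorStream(true).start();
        String out = new String(jps.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
        jps.waitFor();
        System.out.println("Missing daemons: " + missing(out, "NameNode", "ResourceManager", "JobHistoryServer"));
    }
}
```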
Try http://jhs_host:port/. The default HTTP port is 19888.
Make sure that your hadoop services are running.

Job Tracker web interface

I followed the tutorial at http://hadoop.apache.org/docs/r2.4.1/hadoop-project-dist/hadoop-common/SingleCluster.html and installed Hadoop 2.4.1 as a pseudo-distributed cluster. I created an Ubuntu VM using Oracle VM and installed Hadoop as described in the link. The setup went fine and I was able to run the examples. However, the JobTracker URL is not working: port 50030 gives a page-not-found error. I also ran netstat on the server, and no process is listening on port 50030. Do I need to start any other service? What are the possible reasons?
You need to execute this:
$HADOOP_HOME/sbin/mr-jobhistory-daemon.sh start historyserver
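Otherwise the job history web UI won't start. (Hadoop 2.x runs YARN and has no JobTracker; the history server's web UI listens on port 19888 by default.)
(In my case, $HADOOP_HOME is /usr/local/hadoop.)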
Check the value of mapred.job.tracker.http.address in mapred-site.xml.
If the port is different, use that one.
Also check whether the JobTracker is running, and check the JobTracker logs.
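mapred.job.tracker.http.address holds a host:port pair. Its Hadoop 1.x default is 0.0.0.0:50030, where 0.0.0.0 means "listen on all interfaces" rather than a host you can browse to. Below is a small sketch that turns such a value into a browser URL, substituting a host name you supply for the wildcard; WebUiUrl is a hypothetical helper.

```java
// Hypothetical helper: converts a Hadoop "host:port" address property into
// "http://host:port/". An empty or wildcard host (0.0.0.0) is replaced with
// the supplied hostname, since 0.0.0.0 only means "bind to all interfaces".
class WebUiUrl {
    static String toUrl(String address, String hostForWildcard) {
        int colon = address.lastIndexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("expected host:port but got " + address);
        }
        String host = address.substring(0, colon).trim();
        int port = Integer.parseInt(address.substring(colon + 1).trim());
        if (host.isEmpty() || host.equals("0.0.0.0")) {
            host = hostForWildcard;
        }
        return "http://" + host + ":" + port + "/";
    }

    public static void main(String[] args) {
        System.out.println(toUrl("0.0.0.0:50030", "localhost"));
    }
}
```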
Open the following URL in your browser:
http://localhost:50030/
This is the JobTracker web UI.

Hadoop 2.2.0 Web UI not showing Job Progress

I have installed single-node Hadoop 2.2.0 from this link. When I run a job from the terminal, it works fine and produces output. The web UIs I used:
- Resource Manager : http://localhost:8088
- Namenode Daemon : http://localhost:50070
But in the ResourceManager's web UI (shown above) I can't see job progress such as Submitted Jobs, Running Jobs, etc.
My /etc/hosts file is as follows:
127.0.0.1 localhost
127.0.1.1 meitpict
My system has IP 192.168.2.96 (I tried removing this IP, but it still didn't work).
The only host:port I configured is in core-site.xml:
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:54310</value>
</property>
I also ran into these problems while executing MapReduce jobs.
What worked best for me: while a MapReduce job is executing, the console on the machine from which you submitted it shows a link for checking the job progress, something like this:
http://node1:19888/jobhistory/job/job_1409462238335_0002/mapreduce/app
Here node1 maps to 192.168.56.101 in my hosts file; that is the NameNode machine.
So while your MapReduce job is running, you can go to the UI link provided by the MapReduce framework.
Once it opens, keep it open; there you can also find details about other jobs, such as when they started and when they finished.
So next time, check your PuTTY console output after submitting the MapReduce job; you will see a link for the current job where you can check its status in the browser.
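The job ID in that console link can also be used to find the job in the ResourceManager UI on port 8088: on YARN, a MapReduce job ID and its application ID share the cluster timestamp and sequence number (job_1409462238335_0002 belongs to application_1409462238335_0002). Here is a sketch that extracts the job ID from a console line or tracking URL and derives the application ID; TrackingLink is a hypothetical name.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: pulls a MapReduce job id out of a tracking URL or
// console line and maps it to the matching YARN application id.
class TrackingLink {
    private static final Pattern JOB_ID = Pattern.compile("job_(\\d+)_(\\d+)");

    // First job_<timestamp>_<sequence> found in the text, if any.
    static Optional<String> jobIdIn(String text) {
        Matcher m = JOB_ID.matcher(text);
        return m.find() ? Optional.of(m.group()) : Optional.empty();
    }

    // Same cluster timestamp and sequence, "application_" prefix.
    static String applicationIdFor(String jobId) {
        Matcher m = JOB_ID.matcher(jobId);
        if (!m.matches()) {
            throw new IllegalArgumentException("not a MapReduce job id: " + jobId);
        }
        return "application_" + m.group(1) + "_" + m.group(2);
    }

    public static void main(String[] args) {
        String link = "http://node1:19888/jobhistory/job/job_1409462238335_0002/mapreduce/app";
        jobIdIn(link).ifPresent(id -> System.out.println(id + " -> " + applicationIdFor(id)));
    }
}
```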
In Hadoop 2.x this problem could also be related to memory issues; see the question "MapReduce in Hadoop 2.2.0 not working".

where is the hadoop task manager UI

I installed Hadoop 2.2 on my Ubuntu box using this tutorial:
http://codesfusion.blogspot.com/2013/11/hadoop-2x-core-hdfs-and-yarn-components.html
Everything worked fine, and now when I open
http://localhost:50070
I can see the management UI for HDFS. Very good!!
But I am going through another tutorial which says there should be a task manager UI running at http://mymachine.com:50030 and http://mymachine.com:50060.
On my machine I cannot open these ports.
I have already run:
start-dfs.sh
start-yarn.sh
start-all.sh
Is something wrong? Why can't I see the task manager UI?
You have installed YARN (MRv2) which runs the ResourceManager. The URL http://mymachine.com:50030 is the web address for the JobTracker daemon that comes with MRv1 and hence you are not able to see it.
To see the ResourceManager UI, check your yarn-site.xml file for the following property:
yarn.resourcemanager.webapp.address
By default, it should point to : resource_manager_hostname:8088
Assuming your ResourceManager runs on mymachine, you should see the ResourceManager UI at http://mymachine.com:8088/
Make sure all your daemons are up and running before you visit the URL for the ResourceManager.
For Hadoop 2 (aka YARN/MRv2): in any Hadoop installation of version 2.x or higher, the ResourceManager UI is on port 8088, e.g. localhost:8088.
For Hadoop 1: in any Hadoop installation older than 2.x (e.g. 1.x or 0.x), the JobTracker UI is on port 50030, e.g. localhost:50030.
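The port rules in these answers can be summarized as a small lookup table. This sketch only covers the defaults named here for 0.x/1.x (MRv1) and 2.x (YARN). Hadoop 3 changed several defaults (for example, the NameNode UI moved to 9870), so the sketch rejects other versions. Any of these ports can be overridden in the *-site.xml files, and DefaultWebUis is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical lookup of default web UI ports by Hadoop major version.
// Covers only 0.x/1.x (MRv1) and 2.x (YARN); site configs may override them.
class DefaultWebUis {
    static Map<String, Integer> forMajorVersion(int major) {
        if (major < 0 || major > 2) {
            throw new IllegalArgumentException("only 0.x, 1.x and 2.x defaults are covered");
        }
        Map<String, Integer> uis = new LinkedHashMap<>();
        uis.put("NameNode (HDFS)", 50070);
        if (major < 2) {
            // MRv1 daemons
            uis.put("JobTracker", 50030);
            uis.put("TaskTracker", 50060);
        } else {
            // YARN / MRv2 replaces the JobTracker UI
            uis.put("ResourceManager", 8088);
            uis.put("JobHistory server", 19888);
        }
        return uis;
    }

    public static void main(String[] args) {
        forMajorVersion(2).forEach((ui, port) -> System.out.println(ui + ": http://localhost:" + port + "/"));
    }
}
```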
By default, the HDFS (NameNode) UI is at:
http://mymachine.com:50070
