I tried to run a simple program in Hadoop using Windows with Cygwin.
I am able to start the namenode.
Starting the jobtracker, however, fails with this exception:
FATAL mapred.JobTracker: java.lang.IllegalArgumentException: Does not contain a valid host:port authority: local
at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:162)
at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:128)
at org.apache.hadoop.mapred.JobTracker.getAddress(JobTracker.java:2560)
at org.apache.hadoop.mapred.JobTracker.<init>(JobTracker.java:2200)
at org.apache.hadoop.mapred.JobTracker.<init>(JobTracker.java:2192)
at org.apache.hadoop.mapred.JobTracker.<init>(JobTracker.java:2186)
at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:300)
at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:291)
at org.apache.hadoop.mapred.JobTracker.main(JobTracker.java:4978)
I have tried everything I could think of to resolve this, but in vain. Any pointers would greatly help me.
hdfs-site.xml configuration:
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9100</value>
</property>
<property>
<name>mapred.job.tracker</name>
<value>localhost:9101</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
The problem is that the following lines should go into mapred-site.xml and NOT hdfs-site.xml:
<property>
<name>mapred.job.tracker</name>
<value>localhost:9101</value>
</property>
By the way, why are you trying to run Hadoop on Windows? For development? Do you not have a Linux machine, or are you reluctant to install one?
One more thing: you usually put this property in core-site.xml, not hdfs-site.xml:
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9100</value>
</property>
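Putting both suggestions together, the three files might look like the sketch below. It only reuses the property names and values from the question. The comments naming each file are annotations, the conf/ location assumes a standard Hadoop 1.x layout, and each <configuration> element belongs in its own file.

```xml
<!-- conf/core-site.xml: default filesystem URI -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9100</value>
  </property>
</configuration>

<!-- conf/mapred-site.xml: jobtracker host:port -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9101</value>
  </property>
</configuration>

<!-- conf/hdfs-site.xml: HDFS-specific settings only -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
```

When mapred.job.tracker is not found in mapred-site.xml, the jobtracker falls back to the value "local", which is what the "Does not contain a valid host:port authority: local" message in the stack trace points to.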
I faced the same issue when working through the "Pseudo Distributed" examples on this page: http://hadoop.apache.org/docs/r1.1.2/single_node_setup.html#PseudoDistributed
It turned out that Hadoop simply wasn't picking up my conf files. The examples at the link above assume you are running from your Hadoop install directory (i.e. /Usr/jane/hadoop-1.1.2). I was trying to run the examples from another directory. I'm sure you could configure Hadoop to recognize other 'conf' directories, but I took the easy route and just started running from my Hadoop directory.
This thread helped me figure it out: https://issues.apache.org/jira/browse/HDFS-2515
I need to do data analysis using Hadoop, so I have installed Hadoop and configured it as shown below. But localhost:9870 is not working, even though I format the namenode every time I work with it. Some articles and answers on this forum mention that 9870 replaced 50070 as the port. I am on Windows 10. I also tried other answers on this forum, but none of them worked. The JAVA_HOME and HADOOP_HOME paths are set, and the paths to Hadoop's bin and sbin directories are also set up. Can anyone please tell me what I am doing wrong here?
I followed this guide for the installation and configuration:
https://medium.com/#pedro.a.hdez.a/hadoop-3-2-2-installation-guide-for-windows-10-454f5b5c22d3
core-site.xml
I have set up the Java path in this xml as well.
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9870</value>
</property>
hdfs-site.xml
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>C:\hadoop-3.2.2\data\namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>C:\hadoop-3.2.2\data\datanode</value>
</property>
mapred-site.xml
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
yarn-site.xml
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.auxservices.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
If you look at the namenode logs, they very likely contain an error about a port already being in use.
The default fs.defaultFS port should be 9000 (see https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SingleCluster.html); you shouldn't change this without good reason.
The Namenode web UI address isn't the value in fs.defaultFS. Its default port is 9870, and it is defined by dfs.namenode.http-address in hdfs-site.xml.
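As a rough sketch of how the two settings differ, the fragments below keep the filesystem RPC address and the web UI address on separate ports. The file-name comments are annotations, and 0.0.0.0:9870 is, to my knowledge, the Hadoop 3 default for dfs.namenode.http-address, so that second property normally does not need to be set at all.

```xml
<!-- core-site.xml: RPC endpoint that HDFS clients connect to -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

<!-- hdfs-site.xml: namenode web UI; shown only to make the default explicit -->
<configuration>
  <property>
    <name>dfs.namenode.http-address</name>
    <value>0.0.0.0:9870</value>
  </property>
</configuration>
```

With fs.defaultFS also pointing at 9870, as in the question, both the RPC server and the web UI would try to bind the same port, which would explain a "port already in use" error in the logs.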
"need to do data analysis"
You can do analysis on Windows without Hadoop by using Spark, Hive, MapReduce, etc. directly; they will have direct access to your machine without being limited by YARN container sizes.
I'm deploying Hadoop at work and I've been troubleshooting for several days. Yesterday it was working perfectly, but today something strange is happening.
I have hadoop.tmp.dir set in core-site.xml, as well as other directories for HDFS (datanode, namenode and secondarynamenode) in hdfs-site.xml. But today, when I format the FS, it creates all the files in /tmp and not in /usr/local/hadoop/tmp, which is the directory I have configured.
$ bin/hdfs namenode -format
[...]
INFO common.Storage: Storage directory /tmp/hadoop-hadoop/dfs/name has been successfully formatted.
[...]
core-site.xml
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/hadoop/tmp</value>
</property>
hdfs-site.xml
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/usr/local/hadoop/hdfs/namenode</value>
</property>
<property>
<name>dfs.namenode.checkpoint.dir</name>
<value>/usr/local/hadoop/hdfs/secondname</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/usr/local/hadoop/hdfs/datanode</value>
</property>
Does anyone have any clue about what's happening?
Thanks!
Make sure this directory exists and has sufficient permissions.
Give the path as file:///usr/local/hadoop/tmp
I found what was wrong, and it was really embarrassing. My hadoop user had bash as its default shell, but the profile wasn't being loaded correctly until I explicitly ran "bash" on the command line.
I spotted it with the printenv command.
I am getting this error after enabling the HDFS plugin in Apache Ranger.
When I run enable-hdfs-plugin.sh, Ranger adds the following configuration to hdfs-site.xml.
<property>
<name>dfs.permissions.enabled</name>
<value>true</value>
</property>
<property>
<name>dfs.permissions</name>
<value>true</value>
</property>
<property>
<name>dfs.namenode.inode.attributes.provider.class</name>
<value>org.apache.ranger.authorization.hadoop.RangerHdfsAuthorizer</value>
</property>
But if I remove the above property and restart my namenode, it starts with no error. Also, when I try to format the namenode it gives me the same error.
This is my install.properties of ranger's hdfs-plugin.
Link ranger-1.0.0-SNAPSHOT-hdfs-plugin/lib/ranger-hdfs-plugin-impl to /var/local/hadoop/hadoop-2.7.3/share/hadoop/hdfs/lib/ranger-hdfs-plugin-impl
Link ranger-1.0.0-SNAPSHOT-hdfs-plugin/lib/ranger-hdfs-plugin-shim-1.0.0-SNAPSHOT.jar to /var/local/hadoop/hadoop-2.7.3/share/hadoop/hdfs/lib/ranger-hdfs-plugin-shim-1.0.0-SNAPSHOT.jar
Link ranger-1.0.0-SNAPSHOT-hdfs-plugin/lib/ranger-plugin-classloader-1.0.0-SNAPSHOT.jar to /var/local/hadoop/hadoop-2.7.3/share/hadoop/hdfs/lib/ranger-plugin-classloader-1.0.0-SNAPSHOT.jar
Follow these instructions, adjusting them to your own file paths. The problem is that the Ranger plugin classloader cannot be found in your Hadoop lib path.
I'm trying to run my first Oozie workflow, a simple <pig> action.
Can anyone help with these two tags:
<job-tracker>[JOB-TRACKER]</job-tracker>
<name-node>[NAME-NODE]</name-node>
As I understand it, these parameters refer to the existing configuration.
I'm using a preconfigured environment, so can you please tell me where to find these values?
If you have access to see Hadoop's conf files, open core-site.xml to find the name node from the below property.
<property>
<name>fs.default.name</name>
<value>hdfs://ec2-1-1-1-1.compute-1.amazonaws.com:9000</value>
</property>
Open mapred-site.xml to find the job tracker.
<property>
<name>mapred.job.tracker</name>
<value>ec2-1-1-1-1.compute-1.amazonaws.com:54311</value>
</property>
Then your values will be:
nameNode=hdfs://ec2-1-1-1-1.compute-1.amazonaws.com:9000
jobTracker=ec2-1-1-1-1.compute-1.amazonaws.com:54311
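For context, here is a minimal sketch of a workflow.xml that consumes those two values once they are defined in job.properties. The workflow name, node names, and script.pig are hypothetical placeholders, and the schema version uri:oozie:workflow:0.2 is an assumption that may need adjusting to the installed Oozie release.

```xml
<workflow-app xmlns="uri:oozie:workflow:0.2" name="pig-example-wf">
  <start to="pig-node"/>
  <action name="pig-node">
    <pig>
      <!-- Both values are resolved from job.properties at submission time -->
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <!-- Hypothetical script name, relative to the workflow application directory -->
      <script>script.pig</script>
    </pig>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail">
    <message>Pig action failed: [${wf:errorMessage(wf:lastErrorNode())}]</message>
  </kill>
  <end name="end"/>
</workflow-app>
```

Keeping the host names in job.properties rather than hard-coding them in the workflow means the same workflow.xml can be submitted against a different cluster by changing only the properties file.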
I am using a Hadoop multi-node setup (1 master, 1 slave).
After running start-mapred.sh on the master, I found the error below in the TaskTracker logs (on the slave):
org.apache.hadoop.mapred.TaskTracker: Failed to get system directory
Can someone help me understand what can be done to avoid this error?
I am using
Hadoop 1.2.0
jetty-6.1.26
java version "1.6.0_23"
mapred-site.xml file
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>master:54311</value>
<description>The host and port that the MapReduce job tracker runs
at. If "local", then jobs are run in-process as a single map
and reduce task.
</description>
</property>
<property>
<name>mapred.map.tasks</name>
<value>1</value>
<description>
define mapred.map tasks to be number of slave hosts
</description>
</property>
<property>
<name>mapred.reduce.tasks</name>
<value>1</value>
<description>
define mapred.reduce tasks to be number of slave hosts
</description>
</property>
</configuration>
core-site.xml
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://master:54310</value>
<description>The name of the default file system. A URI whose
scheme and authority determine the FileSystem implementation. The
uri's scheme determines the config property (fs.SCHEME.impl) naming
the FileSystem implementation class. The uri's authority is used to
determine the host, port, etc. for a filesystem.</description>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hduser/workspace</value>
</property>
</configuration>
It seems that you just added hadoop.tmp.dir and started the job. You need to restart the Hadoop daemons after adding any property to the configuration files. You mentioned in your comment that you added this property at a later stage. This means that all the data and metadata, along with other temporary files, are still in the /tmp directory. Copy all of them from there into your /home/hduser/workspace directory, restart Hadoop and rerun the job.
Do let me know the result. Thank you.
If this is your Windows PC and you are using Cygwin to run Hadoop, then the task tracker will not work.