Hadoop installation trouble - hadoop

I am installing hadoop on my windows PC. I was able to start yarn and dfs. I ran this command hadoop fs -mkdir /in
It displays the following error:
-mkdir: java.net.UnknownHostException: master
I am newbie Please explain me what has to be done ?

The error indicates the host master cannot be resolved to an IP address. You probably need to add a mapping for it in your hosts file so it can be resolved.

Related

Unable to read a file from HDFS using spark shell in ubuntu

I have installed spark and hadoop in standalone modes on ubuntu virtualbox for my learning. I am able to do normal hadoop mapreduce operations on hdfs without using spark. But when I use below code in spark-shell,
val file=sc.textFile("hdfs://localhost:9000/in/file")
scala>file.count()
I get "input path does not exist." error. The core-site.xml has fs.defaultFS with value hdfs://localhost:9000. If I give localhost without the port number, I get "Connection refused" error as it is listening on default port 8020. Hostname and localhost are set to loopback addresses 127.0.0.1 and 127.0.1.1 in etc/hosts.
Kindly let me know how to resolve this issue.
Thanks in advance!
I am able to read and write into the hdfs using
"hdfs://localhost:9000/user/<user-name>/..."
Thank you for your help..
Probably your configuration is alright, but the file is missing, or in an unexpected
location...
1) try:
sc.textFile("hdfs://in/file")
sc.textFile("hdfs:///user/<USERNAME>/in/file")
with USERNAME=hadoop, or your own username
2) try on the command line (outside spark-shell) to access that directory/file :
hdfs dfs -ls /in/file

hortonworks sandbox connection refuse

i just start to learn hadoop with hortonworks sandbox
i have HDP 2.3 on a virtualBox and in the setting i have a Bridged Adapter network and a NAT,
when i start the machine everything is ok i can do some hadoop command i can connect the Ambari at 127.0.0.1:8080
but when i run the script in /etc/lib/hue/tools/start_scripts/gen_hosts.sh
to generate hosts with a different ip address every thing going wrong and i can't execute a simple hadoop command like hadoop fs -ls /user
i get this error
ls: Call From sandbox.hortonworks.com/10.24.244.85 to sandbox.hortonworks.com:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
as i said i just started to learn hadoop and i am not a network expert so i will preciate any help
thank you.
I found that you have to restart the services ( HDFS,MapReduce,Hbase,..) from
Ambari when you generate a host.
Hope this will help someone.
turn on name NameNode / HDFS it will work

Hadoop namenode format not working

I've been trying to install hadoop 2.7.0 on Ubuntu but when i enter the hadoop namenode -format command i get the following message:
Error: Could not find or load main class org.apache.hadoop.hdfs.server.namenode.NameNode
I've triple checked all the configuration files but i can't seem to find where the problem is.
I followed this tutorial : http://www.bogotobogo.com/Hadoop/BigData_hadoop_Install_on_ubuntu_single_node_cluster.php
Can anyone please tell me why is this not working??
You have to add hadoop-hdfs-2.7.0.jar to your hadoop classpath. Just add these lines in $HADOOP_HOME/etc/hadoop/hadoop-env.sh:
export HADOOP_HOME=/path/to/hadoop
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$HADOOP_HOME/share/hadoop/hdfs/hadoop-hdfs-2.7.0.jar
Now, stop all hadoop processes. Try to format namenode now. Post the error if you get any.

Need help adding multiple DataNodes in pseudo-distributed mode (one machine), using Hadoop-0.18.0

I am a student, interested in Hadoop and started to explore it recently.
I tried adding an additional DataNode in the pseudo-distributed mode but failed.
I am following the Yahoo developer tutorial and so the version of Hadoop I am using is hadoop-0.18.0
I tried to start up using 2 methods I found online:
Method 1 (link)
I have a problem with this line
bin/hadoop-daemon.sh --script bin/hdfs $1 datanode $DN_CONF_OPTS
--script bin/hdfs doesn't seem to be valid in the version I am using. I changed it to --config $HADOOP_HOME/conf2 with all the configuration files in that directory, but when the script is ran it gave the error:
Usage: Java DataNode [-rollback]
Any idea what does the error mean? The log files are created but DataNode did not start.
Method 2 (link)
Basically I duplicated conf folder to conf2 folder, making necessary changes documented on the website to hadoop-site.xml and hadoop-env.sh. then I ran the command
./hadoop-daemon.sh --config ..../conf2 start datanode
it gives the error:
datanode running as process 4190. stop it first.
So I guess this is the 1st DataNode that was started, and the command failed to start another DataNode.
Is there anything I can do to start additional DataNode in the Yahoo VM Hadoop environment? Any help/advice would be greatly appreciated.
Hadoop start/stop scripts use /tmp as a default directory for storing PIDs of already started daemons. In your situation, when you start second datanode, startup script finds /tmp/hadoop-someuser-datanode.pid file from the first datanode and assumes that the datanode daemon is already started.
The plain solution is to set HADOOP_PID_DIR env variable to something else (but not /tmp). Also do not forget to update all network port numbers in conf2.
The smart solution is start a second VM with hadoop environment and join them in a single cluster. It's the way hadoop is intended to use.

FAILED: Error in metadata: MetaException(message:Got exception: java.net.ConnectException Call to localhost/127.0.0.1:54310 failed

I am using Ubuntu 12.04, hadoop-0.23.5, hive-0.9.0.
I specified my metastore_db separately to some other place $HIVE_HOME/my_db/metastore_db in hive-site.xml
Hadoop runs fine, jps gives ResourceManager,NameNode,DataNode,NodeManager,SecondaryNameNode
Hive gets started perfectly,metastore_db & derby.log also created,and all hive commands run successfully,I can create databases,table,etc. But after few day later,when I run show databases,or show tables, get below error
FAILED: Error in metadata: MetaException(message:Got exception: java.net.ConnectException Call to localhost/127.0.0.1:54310 failed on connection exception: java.net.ConnectException: Connection refused) FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
I had this problem too and the accepted answer did not help me so will add my solution here for others:
My problem was I had a single machine with a pseudo distributed set up installed with hive. It was working fine with localhost as the host name. However when we decided to add multiple machines to the cluster we also decided to give the machines proper names "machine01, machine 02 etc etc".
I changed all the hadoop conf/*-site.xml files and the hive-site.xml file too but still had the error. After exhaustive research I realized that in the metastore db hive was picking up the URIs not from *-site files, but from the metastore tables in mysql. Where all the hive table meta data was saved are two tables SDS and DBS. Upon changing the DB_LOCATION_URI column and LOCATION in the tables DBS and SDS respectively to point to the latest namenode URI, I was back in business.
Hope this helps others.
reasons for this
If you changed your Hadoop/Hive version,you may be specifying previous hadoop version (which has ds.default.name=hdfs://localhost:54310 in core-site.xml) in your hive-0.9.0/conf/hive-env.sh
file
$HADOOP_HOME may be point to some other location
Specified version of Hadoop is not working
your namenode may be in safe mode ,run bin/hdfs dfsadmin -safemode leave or bin/hadoop dsfadmin -safemode leave
In case of fresh installation
the above problem can be the effect of a name node issue
try formatting the namenode using the command
hadoop namenode -format
1.Turn off your namenode from safe mode. Try the commands below:
hadoop dfsadmin -safemode leave
2.Restart your Hadoop daemons:
sudo service hadoop-master stop
sudo service hadoop-master start

Resources