Introduction: I'm using Ubuntu 18.04.2 LTS on which I'm trying to set up a Hadoop 3.2 Single Node Cluster. The installation goes perfectly fine, and I have Java installed. JPS is working as well.
Issue: I'm trying to connect to the Web GUI at localhost:50070, but I'm unable to. I'm attaching a snippet of my console when I execute ./start-all.sh:
root#it-research:/usr/local/hadoop/sbin# ./start-all.sh
Starting namenodes on [localhost]
Starting datanodes
Starting secondary namenodes [it-research]
Starting resourcemanager
Starting nodemanagers
pdsh#it-research: localhost: ssh exited with exit code 1
root#it-research:/usr/local/hadoop/sbin# jps
6032 Jps
3154 SecondaryNameNode
2596 NameNode
I'm unable to resolve localhost: ssh exited with exit code 1
Solutions I've tried:
Set up password-less SSH
Set up NameNode User
Set up PDSH to work with SSH
I've also added master [myIPAddressv4Here] in /etc/hosts file and tried connecting to master:50070. but still facing the same issue
Expected Behaviour: I should be able to connect to the Web GUI when I go to localhost:50070, but I can't.
Please let me know if there's some more information I should provide.
The port number for Hadoop 3.x is 9870, so localhost:9870 should work.
Related
I am very new to hadoop and am trying to set a psuedo-distributed mode execution with Hadoop-3.1.2.
When I try to start yarn service I get the following error, please see the code snippet below.
$ sbin/start-yarn.sh
Starting resourcemanagers on []
localhost: ERROR: Cannot set priority of resourcemanager process 13209
pdsh#manager-4: localhost: ssh exited with exit code 1
Starting nodemanagers
localhost: ERROR: Cannot set priority of nodemanager process 13366
pdsh#manager-4: localhost: ssh exited with exit code 1
I tried solutions at this stackoverflow question, which is very similar to my problem. But nothing worked out. A problem same as mine is posted in another forum here. However, no solution is available there as well.
Then, I tried another option which I am describing in the following text.
I set following exports in the file sbin/start-yarn.sh.
export HDFS_NAMENODE_USER="root"
export HDFS_DATANODE_USER="root"
export HDFS_SECONDARYNAMENODE_USER="root"
export YARN_RESOURCEMANAGER_USER="root"
export YARN_NODEMANAGER_USER="root"
Then executed with sbin/start-yarn.sh and I got the following error. Please note that I have done all the settings for passwordless ssh for root#localhost.
$ sudo sbin/start-yarn.sh
Starting resourcemanagers on []
localhost: Permission denied (publickey).
pdsh#manager-4: localhost: ssh exited with exit code 255
Starting nodemanagers
localhost: Permission denied (publickey).
pdsh#manager-4: localhost: ssh exited with exit code 255
In addition to the steps suggested by zhao, ephraimbuddy and qitian.
Please make sure that if you have a firewall running than the firewall is not blocking it in anyway. Also make sure that the user with which you are executing the command has enough permissions to update the priorities.
Before running the start-yarn script, try the command: ssh localhost
When you have set passwordless ssh for localhost change the pdsh_rcmd_type value to ssh:
export PDSH_RCMD_TYPE=ssh
this error info actually very confuse me, later i find it happens because i have not correctly config cgroup. so you can firstly check your config make sure they are all right, you can check you resourcemanager logs
I had the same issue, what helped me was the guide I found in this link!
The message "Cannot set priority of resourcemanager process" is misleading. I checked the resource manager logs and found that there was an error as follows
Unexpected close tag </property>; expected </configuration>
I had the same issue and was finally able to solve it. I got ResourceManager and NodeManager to run. If you're running Hadoop 3.3 and up, the issue might be with the java version you're using. hadoop_compatibility
" Apache Hadoop 3.3 and upper supports Java 8 and Java 11 (runtime only)
Please compile Hadoop with Java 8. Compiling Hadoop with Java 11 is not supported"
Solution:
Try switching to java 8.
Then make sure your JAVA_HOME path variables are pointing to java 8 (including any JAVA_HOME path variables in hadoop-env.sh).
If the issue persists, check the error messages in resourcemanager log located in $HADOOP_HOME/logs/.
I am newbie to Hadoop. I tried to do single node cluster set up and while opening Resource manager UI and job history UI, I am getting server not found error.
Please refer the attached image. While executing jps command, I am seeing following O/P:
5023 JobHistoryServer
5554 Jps
4631 ResourceManager
3916 DataNode
4014 NameNode
4124 SecondaryNameNode
4888 NodeManager
I am seeing server not found on these UIs: http://localhost:8088/cluster and http://localhost:19888/jobhistory.
Please assist on how to access these UIs.
Thank you all, for your replies,i am able to access the UIs now.Page was displaying server not found,so i tried to check hostname on box and replaced localhost with hostname/ipaddress,finally able to access both the UIs now.
screenshot
It is my first time in installing Hadoop on my Linux (Fedora distro) running on VM (using Parallel on my Mac). And I followed every step on this video and including the textual version of it.And then when I run it on localhost (or the equivalent value from hostname) in port 50070, I got the following message.
...can't establish a connection to the server at localhost:50070
When I run the jps by the way command I don't have the datanode and namenode unlike at the end of the textual version tutorial which has the following:
While mine has only the following processes running:
6021 NodeManager
3947 SecondaryNameNode
5788 ResourceManager
8941 Jps
When I run the hadoop namenode command I have some of the following [redacted] error:
Cannot access storage directory /usr/local/hadoop_store/hdfs/namenode
16/10/11 21:52:45 WARN namenode.FSNamesystem: Encountered exception loading fsimage
org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /usr/local/hadoop_store/hdfs/namenode is in an inconsistent state: storage directory does not exist or is not accessible.
I tried to access by the way the above mentioned directories and it existed.
Any hint for this newbie? ;-)
You would need to give read and write permission to user with which you are running the services on directory /usr/local/hadoop_store/hdfs/namenode.
Once done, you should run format command using hadoop namenode -format
Then try to start your services.
delete files /app/hadoop/tmp/*
and try again formatting the namenode and then start-dfs.sh & start-yarn.sh
I am new to the spark, After installing the spark using parcels available in the cloudera manager.
I have configured the files as shown in the below link from cloudera enterprise:
http://www.cloudera.com/content/cloudera-content/cloudera-docs/CM4Ent/4.8.1/Cloudera-Manager-Installation-Guide/cmig_spark_installation_standalone.html
After this setup, I have started all the nodes in the spark by running /opt/cloudera/parcels/SPARK/lib/spark/sbin/start-all.sh. But I couldn't run the worker nodes as I got the specified error below.
[root#localhost sbin]# sh start-all.sh
org.apache.spark.deploy.master.Master running as process 32405. Stop it first.
root#localhost.localdomain's password:
localhost.localdomain: starting org.apache.spark.deploy.worker.Worker, logging to /var/log/spark/spark-root-org.apache.spark.deploy.worker.Worker-1-localhost.localdomain.out
localhost.localdomain: failed to launch org.apache.spark.deploy.worker.Worker:
localhost.localdomain: at java.lang.ClassLoader.loadClass(libgcj.so.10)
localhost.localdomain: at gnu.java.lang.MainThread.run(libgcj.so.10)
localhost.localdomain: full log in /var/log/spark/spark-root-org.apache.spark.deploy.worker.Worker-1-localhost.localdomain.out
localhost.localdomain:starting org.apac
When I run jps command, I got:
23367 Jps
28053 QuorumPeerMain
28218 SecondaryNameNode
32405 Master
28148 DataNode
7852 Main
28159 NameNode
I couldn't run the worker node properly. Actually I thought to install a standalone spark where the master and worker work on a single machine. In slaves file of spark directory, I given the address as "localhost.localdomin" which is my host name. I am not aware of this settings file. Please any one cloud help me out with this installation process. Actually I couldn't run the worker nodes. But I can start the master node.
Thanks & Regards,
bips
Please notice error info below:
localhost.localdomain: at java.lang.ClassLoader.loadClass(libgcj.so.10)
I met the same error when I installed and started Spark master/workers on CentOS 6.2 x86_64 after making sure that libgcj.x86_64 and libgcj.i686 had been installed on my server, finally I solved it. Below is my solution, wish it can help you.
It seem as if your JAVA_HOME environment parameter didn't set correctly.
Maybe, your JAVA_HOME links to system embedded java, e.g. java version "1.5.0".
Spark needs java version >= 1.6.0. If you are using java 1.5.0 to start Spark, you will see this error info.
Try to export JAVA_HOME="your java home path", then start Spark again.
I am trying to setup the multinode cluster of Hbase. When i do the jps on slave i get
5780 Jps
5558 HQuorumPeer
5684 HRegionServer
1963 DataNode
2093 TaskTracker
similarly on master i get
4254 SecondaryNameNode
15226 Jps
14982 HMaster
3907 NameNode
14921 HQuorumPeer
4340 JobTracker
EVerything is runnnig properly. But when i try to create table on hbase shell. It gives an error
ERROR: org.apache.hadoop.hbase.PleaseHoldException: org.apache.hadoop.hbase.PleaseHoldException: Master is initializing
regionserver log of my slave(where region server is running):
2013-06-11 13:09:53,119 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Attempting connect to Master server at localhost,60000,137093$
2013-06-11 13:10:53,190 WARN org.apache.hadoop.hbase.regionserver.HRegionServer: Unable to connect to master. Retrying. Error was:
org.apache.hadoop.hbase.ipc.HBaseClient$FailedServerException: This server is in the failed servers list: localhost/127.0.0.1:60000
at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:425)
at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:1124)
at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:974)
at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:86)
at $Proxy8.getProtocolVersion(Unknown Source)
at org.apache.hadoop.hbase.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:138)
at org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:208)
at org.apache.hadoop.hbase.regionserver.HRegionServer.getMaster(HRegionServer.java:2037)
at org.apache.hadoop.hbase.regionserver.HRegionServer.reportForDuty(HRegionServer.java:2083)
at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:744)
at java.lang.Thread.run(Thread.java:722)
2013-06-11 13:10:53,391 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Attempting connect to Master server at localhost,60000,137093$
FYI, i have also took care of /etc/hosts file on both master and slave.
127.0.0.1 localhost
127.0.0.1 naresh-PC
I again did changes in /etc/hosts file 127.0.1.1 to naresh-PC. But still getting this error
2013-06-11 14:51:17,781 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Attempting connect to Master server at naresh-pc,60000,137094$
2013-06-11 14:52:17,817 WARN org.apache.hadoop.hbase.regionserver.HRegionServer: Unable to connect to master. Retrying. Error was:
java.net.UnknownHostException: unknown host: naresh-pc
at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.<init>(HBaseClient.java:276)
at org.apache.hadoop.hbase.ipc.HBaseClient.createConnection(HBaseClient.java:255)
at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:1111)
at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:974)
at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:86)
at $Proxy8.getProtocolVersion(Unknown Source)
at org.apache.hadoop.hbase.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:138)
at org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:208)
at org.apache.hadoop.hbase.regionserver.HRegionServer.getMaster(HRegionServer.java:2037)
at org.apache.hadoop.hbase.regionserver.HRegionServer.reportForDuty(HRegionServer.java:2083)
at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:744)
at java.lang.Thread.run(Thread.java:722)
Try clearing all the states in Zookeeper.
Stop Zookeeper
Wipe the Zookeeper data directory
Start Zookeeper
I was getting the same issue and followed this approach and it worked fine.
You need to change the configuration on the slave node to point at the master. It is currently pointing to localhost and not connecting to the actual master:
"org.apache.hadoop.hbase.ipc.HBaseClient$FailedServerException: This
server is in the failed servers list: localhost/127.0.0.1:60000 at "
I'm hosting my own cluster inside Docker. Here's what worked in my case. I grepped the HBase log file for errors and found "Master passed us a different hostname to use"
`[root#docker-iop bin]# grep ERROR /var/log/hbase/hbase-hbase-regionserver-bi-mgmt01.local.log
2016-10-06 00:05:29,816 ERROR [regionserver/bi-mgmt01.local/111.11.2.3:16020] regionserver.HRegionServer: Master passed us a different hostname to use; was=my-host-name, but now=111.22.33.444'
I mapped my-host-name to 111.22.333.444 in my hosts file, restarted HBase and it worked.
I also had the same issue with a fully distributed hbase cluster with the configuration below.
Master Node (Node-A)
Backup Masters ($HBASE_HOME/conf/backup-masters) (Node-B & Node-C)
3 Replication servers (Node-A, Node-B & Node-C)
RCA:
The backup-masters nodes were attempted to be started when the cluster started.
Solution
I removed the backup masters by making $HBASE_HOME/conf/backup-masters empty in all hbase nodes.
So I had a cluster running without backup masters.
I wonder if the master node and master nodes must not also function as regionservers? The HBase documentation says otherwise though.
I came across the same issue and could not find anything, it turns out I was copy pasting from the Hbase documentation (https://hbase.apache.org/book.html#shell_exercises). I believe some character in there may be creating the error, so try to manually enter:
create 'test', 'cf'
We resolved this issue. Solution is to
stop Hbase
log to zookeeper-client as root
execute command rmr /hbase-unsecure/meta-region-server
start Hbase
We stop/start Hbase through Ambari UI, delete /hbase... through server bash shell.
[root#s1 ~]# zookeeper-client
Connecting to localhost:2181
.......
[zk: localhost:2181(CONNECTED) 0] rmr /hbase-unsecure/meta-region-server
I use docker/docker-compose to set up my distributed hbase, after I make changes, I can not create table in hbase shell.
I docker rm all the related images, and rebuild them. It works. Also, simply rebuilding the images doesn't work...