I have installed hadoop(1.2.1) multinode on 1 master and 2 slaves. Now I am trying to install hbase over it. The problem is that when I start hbase on the master, it only shows one regionserver(the master itself) while the slaves are not being shown on the web browser. On the terminal each slave has its own regionserver but that is not reflected on the browser. Can anyone tell me as to what the problem is?
I had same problem, i solve it by adding port number in hbase.rootdir
And your hbase-site.xml should look like this
<property>
<name>hbase.zookeeper.quorum</name>
<value>master-IP-Adress,slave1-IP,slave2-IP</value>
<description>The directory shared by RegionServers.
</description>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/home/ravi/zookeeper-data</value>
<description>Property from ZooKeeper config zoo.cfg.
The directory where the snapshot is stored.
</description>
</property>
<property>
<name>hbase.rootdir</name>
<value>hdfs://master-IP:50000/hbase</value>
<description>The directory shared by RegionServers.
</description>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
<description>The mode the cluster will be in. Possible values are
false: standalone and pseudo-distributed setups with managed Zookeeper
true: fully-distributed with unmanaged Zookeeper Quorum (see hbase-env.sh)
</description>
</property>
Related
I am trying to set up a multi-node Hadoop cluster between 2 windows devices. I am using Hadoop 2.9.2.
how can I achieve that, please.
after a lot of trial and error the following did the job me.
do same configuration as previous answer by #AbsoluteBeginner.
disable windows firewall on all machines (i think you could keep it on and just mess around with the rules, but thats for you to find out)
hdfs namenode -format all nodes (master and slaves)
make sure that the datanode folder is empty in all 3 nodes (just shift+del)
in master node run start-all.cmd. all the following should appear.
50436 NameNode
54696 NodeManager
54744 DataNode
60028 Jps
7340 ResourceManager
in slave nodes run start-all.cmd. all the following should appear
6116 DataNode
2408 Jps
3208 NodeManager
note the reason that nameode and resource manager isn't appearing, is becuase they are running on master node and already occupy the port, and you only need the master resourcemanger and name node running
note if you saw multi-cluster tutorial of linux the master node also shows SeceondryNameNode when executing jps. not really sure why its not appearing in windows.
go to master:50070, and navigate to data nodes you should see something like this
go to master:8088, and navigate to Node you should see something like this
Install open-ssh server on both of your systems using this guide. Generating a new SSH public and private key pair on your local computer is the first step towards authenticating with a remote server without a password. Add the public key to the authorized_keys and add your hostname to list of known hosts. You can find guides on how to do this by searching the internet.
2.Add your hadoop master and slave ips to your hosts file. Open “C:\Windows\System32\drivers\etc\hosts”
and add
your-master-ip hadoopMaster
your-salve-ip hadoopSlave
you can use these names in your configuration files.
much like Linux systems, these are the steps you have to follow in order to run a Hadoop cluster on windows:
3. First you need to have Java installed on your system and JAVA_HOME must be added to your environment variables. You can download Java from Oracle website and install it.
Download Hadoop binary files from Apache website and extract it.
Note that you shouldn't have space in your folder names or you might encounter problems.
Next you have to add Java and Hadoop home and bin folders to your environment variables. just open start menu and type "environment variable" and open the edit environment variables window from control panel.
Add
HADOOP_HOME=”root of your hadoop extracted folder\hadoop-2.9.2″
HADOOP_BIN=”root of hadoop extracted folder\hadoop-2.9.2\bin”
JAVA_HOME=<Root of your JDK installation>”
Edit your "path" environment variable and add %JAVA_HOME%, %HADOOP_HOME%, %HADOOP_BIN%, %HADOOP_HOME%/sbin to your PATH one by one.
you can validate your additions by opening cmd and type in:
echo %HADOOP_HOME%
echo %HADOOP_BIN%
echo %PATH%
CONFIGURING HADOOP:
10. Open "your hadoop root\hadoop-2.9.2\etc\hadoop\hadoop-env.cmd" and add the following lines to the bottom of the file:
set HADOOP_PREFIX=%HADOOP_HOME%
set HADOOP_CONF_DIR=%HADOOP_PREFIX%\etc\hadoop
set YARN_CONF_DIR=%HADOOP_CONF_DIR%
set PATH=%PATH%;%HADOOP_PREFIX%\bin
11.Open "your-hadoop-root\hadoop-2.9.2\etc\hadoop\hdfs-site.xml" and add the below content:
<property>
<name>dfs.name.dir</name>
<value>your desired address</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>your desired address</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
<property>
<name>dfs.datanode.use.datanode.hostname</name>
<value>false</value>
</property>
<property>
<name>dfs.namenode.datanode.registration.ip-hostname-check</name>
<value>false</value>
</property>
<property>
<name>dfs.namenode.http-address</name>
<value>hadoopMaster:50070</value>
<description>Your NameNode hostname for http access.</description>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>hadoopMaster:50090</value>
<description>Your Secondary NameNode hostname for http access.</description>
</property>
edit your core-site.xml and add:
<property>
<name>fs.default.name</name>
<value>hdfs://hadoopMaster:9000</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>your-temp-directory</value>
<description>A base for other temporary directories.</description>
</property>
Open "root to hadoop\hadoop-2.9.2\etc\hadoop\mapred-site.xml" and add below content within tags. If you don’t see mapred-site.xml then open mapred-site.xml.template file and rename it to mapred-site.xml
<property>
<name>mapred.job.tracker</name>
<value>hadoopMaster:9001</value>
</property>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
14.Edit your yarn-site.xml and add:
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce.shuffle</value>
<description>Long running service which executes on Node Manager(s) and provides MapReduce Sort and Shuffle functionality.</description>
</property>
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
<description>Enable log aggregation so application logs are moved onto hdfs and are viewable via web ui after the application completed. The default location on hdfs is '/log' and can be changed via yarn.nodemanager.remote-app-log-dir property</description>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>hadoopMaster:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>hadoopMaster:8031</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>hadoopMaster:8032</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>hadoopMaster:8033</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>hadoopMaster:8088</value>
</property>
In your slaves file in "root-hadoop-directory/hadoop/bin" add
hadoopSlave
Do these steps on your slave nodes too.
open cmd and cd to your sbin folder in hadoop directory.
18.format your nameNode
hadoop namenode -format
19.run the following command:
start-dfs.sh
then run:
start-yarn.sh
I am trying to integrate my es 2.2.0 version with hadoop HDFS.In my envoirnment,I have 1 master node and 1 data node. On my master node my Es is installed.
But while integrating it with HDFS my resource manager applications jobs get stuck in Accepted state.
Somehow i found link to change my yarn-site.xml settings:
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>2200</value>
<description>Amount of physical memory, in MB, that can be allocated for containers.</description>
</property>
<property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>500</value>
</property>
I have done this also but it is not giving me expected output.
Configuration:-
my core-site.xml
<property>
<name>hadoop.tmp.dir</name>
<value>/app/hadoop/tmp</value>
<description>A base for other temporary directories.
</description> </property>
<property> <name>fs.default.name</name>
<value>
hdfs://localhost:54310
</value>
<description>The name of the default file system. A URI whose scheme and authority determine the FileSystem implementation. The uri's scheme determines the config property (fs.SCHEME.impl) naming the FileSystem implementation class. The uri's authority is used to determine the host, port, etc. for a filesystem.
</description>
</property>
my mapred-site.xml,
<property>
<name>mapred.job.tracker</name>
<value>localhost:54311</value>
<description>The host and port that the MapReduce job tracker runs at. If "local", then jobs are run in-process as a single map and reduce task. </description>
</property>
my hdfs-site.xml,
<property>
<name>dfs.replication</name> <value>1</value> <description>Default block replication. The actual number of replications can be specified when the file is created. The default is used if replication is not specified in create time. </description>
</property>
Please help me how can i change my RM job to running state.So that i can use my elasticsearch data on HDFS.
If the screenshot is correct - you have 0 nodemanager - thus the application can’t start running - you need to start at least 1 nodemanager, so that application master and later tasks can be started.
I have a small HBase/Hadoop cluster. I can login to the master server and run commands against hbase via the hbase shell without a problem. However, when I try to do the same thing from a remote machine I get the below error:
ERROR: 20000 millis timeout while waiting for channel to be ready for connect. h : java.nio.channels.SocketChannel[connection-pending remote=/10.175.224.75:6000]
I ran a netstat on the master server to confirm it was listening on port 60000 and got the following (x.x.x.x is the ip address of the master server).
tcp 0 0 x.x.x.x:60000 : LISTEN
I believe this means that the master server is only listening for connections on port 60000 that originate from itself. Is there a way to configure HBase to accept remote connections? Here is my hbase-site.xml.
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>hbase.master</name>
<value>master:60000</value>
<description>The host and port that the HBase master runs at.
TODO: Support 'local' (All running in single context).
</description>
</property>
<property>
<name>hbase.rootdir</name>
<value>hdfs://master:9000/hbase</value>
<description>The directory shared by RegionServers.
</description>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
<description>The mode the cluster will be in. Possible values are
false: standalone and pseudo-distributed setups with managed Zookeeper
true: fully-distributed with unmanaged Zookeeper Quorum (see hbase-env.sh)
</description>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>localhost</value>
</property>
<property>
<name>zookeeper.znode.parent</name>
<value>/master</value>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/home/hduser/hbase/zookeeper</value>
</property>
</configuration>
I run hbase in a distributed mode. Hbase starts region servers java processes on all nodes, but web ui doesn' show them
http://s1.ipicture.ru/uploads/20120517/16DXTnsU.png
here is hbase-site.xml
<configuration>
<property>
<name>hbase.zookeeper.quorum</name>
<value>10.3.6.44</value>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/home/hadoop/hdfs/zookeeper</value>
</property>
<property>
<name>hbase.rootdir</name>
<value>hdfs://10.3.6.44:9000/hbase</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
btw hadoop cluster is running normally and sees all the datanodes
thanks very much for your help.
problem was with dns and hosts file.
Add this property to your hbase-site.xml file and see if it works for you
name - hbase.zookeeper.property.clientPort
value - 2181
We have installed hadoop cluster. We want to use HBase over it. My hbase-site.xml is below
<property>
<name>hbase.rootdir</name>
<value>hdfs://ali:54310/hbase</value>
<description>The directory shared by RegionServers.
</description>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
<description>
</description>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>ali,reg_server1</value>
<description>The directory shared by region servers.
</description>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
<description>
</description>
</property>
And I have 2 region servers ali and reg_server1. When I open page at http://ali:60010 I see that server reg_server1 has 0 regions. but server ali has n > 0 regions. I put some data to Hbase but, server reg_server1 still has 0 regions. Does it means that this node is not participiating in cluster? How can I resolve it?
Thanks
No, you are ok as long as you see both regionservers in the master's web UI. When you write to an HBase table, it will write to one region (one region is always on one regionserver, in your case, ali). Once you write enough data to make the region exceed the configured max file size, the region will be split and distributed across the two regionservers.