We have installed a Hadoop cluster and want to use HBase on top of it. My hbase-site.xml is below:
<property>
<name>hbase.rootdir</name>
<value>hdfs://ali:54310/hbase</value>
<description>The directory shared by RegionServers.
</description>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
<description>
</description>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>ali,reg_server1</value>
<description>Comma separated list of servers in the ZooKeeper quorum.
</description>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
<description>
</description>
</property>
And I have 2 regionservers, ali and reg_server1. When I open the page at http://ali:60010 I see that server reg_server1 has 0 regions, but server ali has n > 0 regions. I put some data into HBase, but server reg_server1 still has 0 regions. Does it mean that this node is not participating in the cluster? How can I resolve this?
Thanks
No, you are OK as long as you see both regionservers in the master's web UI. When you write to an HBase table, the writes go to one region (a region always lives on exactly one regionserver, in your case ali). Once you write enough data that the region exceeds the configured maximum file size, the region will be split, and the resulting regions will be distributed across the two regionservers.
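For reference, the split threshold is controlled by hbase.hregion.max.filesize. A minimal sketch of lowering it for testing (the 256 MB value here is purely illustrative, not a recommendation):
<property>
<name>hbase.hregion.max.filesize</name>
<value>268435456</value>
<description>Maximum store file size in bytes; a region is split once it grows past this.</description>
</property>
With a smaller threshold you will see regions split and move to the second regionserver sooner; alternatively, you can pre-split the table at creation time.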
Related
I am a beginner with Hadoop and HDFS. I have a situation where I need to connect 3 different PCs: one with a file, one with NiFi, and one with Hadoop+HDFS.
Machine 1: has a .csv file
Machine 2 (personal laptop): has my NiFi instance running on it
Machine 3 (running at my office): has Hadoop+HDFS on it
Now I would like to send the csv file from machine 1 to my database running on machine 3, using the NiFi that is running on machine 2.
I connect to machine 3 over an SSH connection, through what is basically a router at my office.
Question: How can I connect from machine 2 (which has NiFi) to machine 3, so that NiFi can send the file to my Hadoop HBase?
Should I use a public key in the configuration, or do I need a different setup or server?
My Hadoop and HDFS configuration files are as follows.
hbase-site.xml
<configuration>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.rootdir</name>
<value>hdfs://localhost:9000/hbase</value>
</property>
<property>
<name>hbase.zookeeper.property.clientPort</name>
<value>2222</value>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/home/hadoop/zookeeper</value>
</property>
<property>
<name>hbase.wal.provider</name>
<value>filesystem</value>
</property>
<property>
<name>hbase.tmp.dir</name>
<value>./tmp</value>
</property>
</configuration>
core-site.xml
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hdoop/tmpdata</value>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>file:///home/hadoop/hdfs/namenode</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>file:///home/hadoop/hdfs/datanode</value>
</property>
</configuration>
Please look into the configuration files and let me know where I need to change the properties. So far I have installed HDFS in pseudo-distributed mode on machine 3.
Pseudo-distributed and fully distributed modes aren't really any different as far as the configuration goes.
You say only machine 3 has HDFS. Therefore only it needs to be running a NameNode and DataNode, set up in a distributed fashion, meaning that external clients will be able to communicate with it.
More specifically, no config file should be using localhost; use a LAN IP or hostname instead.
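As a sketch, machine 3's core-site.xml would bind the filesystem to an externally reachable name rather than to localhost (machine3.office.lan is a hypothetical hostname standing in for machine 3's actual LAN name or IP):
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://machine3.office.lan:9000</value>
</property>
</configuration>
NiFi on machine 2 would then point its PutHDFS processor at copies of these core-site.xml and hdfs-site.xml files via the Hadoop Configuration Resources property.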
I'm setting up an HBase cluster and ran into a problem: when I write data to the cluster, some nodes remain empty.
HBase status screen: (screenshot)
DFS health screen: (screenshot)
HBase 1.4.10, Hadoop 3.1.2
node-master hbase-site.xml
<configuration>
<property>
<name>hbase.rootdir</name>
<value>hdfs://node-master:9000/hbase</value>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>hdfs://node-master:9000/zookeeper</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.zookeper.quorum</name>
<value>node-master</value>
</property>
<property>
<name>hbase.zookeeper.property.clientPort</name>
<value>2181</value>
</property>
<property>
<name>hbase.client.write.buffer</name>
<value>8388608</value>
</property>
<property>
<name>hbase.client.scanner.caching</name>
<value>10000</value>
</property>
</configuration>
node-master regionservers file (the hadoop workers file is the same):
node1
node2
node3
node4
node5
node6
HBase writes data into regions, where each region is a collection of rows based on sorted keys. When a region reaches the configured size limit, HBase splits it into two regions, and when either of those reaches the limit, it is split again. Each region is assigned to a region server (DataNode).
So your table simply does not have enough regions yet to utilize all the nodes. To balance the data across nodes, you can pre-split the table while creating it, as in the sketch below.
HBase Table Pre-split
Please also read about the HBase hotspotting problem to get a better understanding.
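A minimal HBase shell sketch of pre-splitting (the table name, column family, and split keys are illustrative, not from your setup):
create 'mytable', 'cf', SPLITS => ['k2000', 'k4000', 'k6000']
This creates the table with four regions up front, so the balancer can spread them across the regionservers before any data is written.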
I am trying to integrate Elasticsearch 2.2.0 with Hadoop HDFS. In my environment, I have 1 master node and 1 data node, and Elasticsearch is installed on my master node.
But while integrating it with HDFS, my ResourceManager application jobs get stuck in the ACCEPTED state.
I found a link suggesting I change my yarn-site.xml settings:
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>2200</value>
<description>Amount of physical memory, in MB, that can be allocated for containers.</description>
</property>
<property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>500</value>
</property>
I have done this as well, but it is not giving me the expected result.
Configuration:
my core-site.xml:
<property>
<name>hadoop.tmp.dir</name>
<value>/app/hadoop/tmp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:54310</value>
<description>The name of the default file system. A URI whose scheme and authority determine the FileSystem implementation. The uri's scheme determines the config property (fs.SCHEME.impl) naming the FileSystem implementation class. The uri's authority is used to determine the host, port, etc. for a filesystem.</description>
</property>
my mapred-site.xml:
<property>
<name>mapred.job.tracker</name>
<value>localhost:54311</value>
<description>The host and port that the MapReduce job tracker runs at. If "local", then jobs are run in-process as a single map and reduce task. </description>
</property>
my hdfs-site.xml:
<property>
<name>dfs.replication</name>
<value>1</value>
<description>Default block replication. The actual number of replications can be specified when the file is created. The default is used if replication is not specified in create time.</description>
</property>
Please help me get my RM job into the RUNNING state, so that I can use my Elasticsearch data on HDFS.
If the screenshot is correct, you have 0 NodeManagers, so the application can't start running. You need to start at least one NodeManager so that the ApplicationMaster, and later the tasks, can be started.
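A minimal sketch, assuming a standard Hadoop 2.x layout (paths may differ in your install):
# on the worker host
$HADOOP_HOME/sbin/yarn-daemon.sh start nodemanager
# confirm it registered with the ResourceManager
yarn node -list
Once at least one NodeManager shows up as RUNNING, the job should be able to move out of the ACCEPTED state.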
I have installed Hadoop (1.2.1) multinode on 1 master and 2 slaves. Now I am trying to install HBase on top of it. The problem is that when I start HBase on the master, the web UI shows only one regionserver (the master itself), while the slaves are not shown. In the terminal each slave has its own regionserver process running, but that is not reflected in the browser. Can anyone tell me what the problem is?
I had the same problem; I solved it by adding the port number to hbase.rootdir.
Your hbase-site.xml should look like this:
<property>
<name>hbase.zookeeper.quorum</name>
<value>master-IP-address,slave1-IP,slave2-IP</value>
<description>Comma separated list of servers in the ZooKeeper quorum.
</description>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/home/ravi/zookeeper-data</value>
<description>Property from ZooKeeper config zoo.cfg.
The directory where the snapshot is stored.
</description>
</property>
<property>
<name>hbase.rootdir</name>
<value>hdfs://master-IP:50000/hbase</value>
<description>The directory shared by RegionServers.
</description>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
<description>The mode the cluster will be in. Possible values are
false: standalone and pseudo-distributed setups with managed Zookeeper
true: fully-distributed with unmanaged Zookeeper Quorum (see hbase-env.sh)
</description>
</property>
I run HBase in distributed mode. HBase starts regionserver Java processes on all nodes, but the web UI doesn't show them:
http://s1.ipicture.ru/uploads/20120517/16DXTnsU.png
Here is hbase-site.xml:
<configuration>
<property>
<name>hbase.zookeeper.quorum</name>
<value>10.3.6.44</value>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/home/hadoop/hdfs/zookeeper</value>
</property>
<property>
<name>hbase.rootdir</name>
<value>hdfs://10.3.6.44:9000/hbase</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
</configuration>
BTW, the Hadoop cluster is running normally and sees all the DataNodes.
Thanks very much for your help.
The problem was with DNS and the hosts file.
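For reference, a minimal /etc/hosts sketch of the kind that resolves this (the slave hostnames and addresses here are illustrative; only 10.3.6.44 appears in the thread). Every node must resolve every other node's hostname to its LAN IP, and a node's own hostname must not map to 127.0.0.1:
127.0.0.1 localhost
10.3.6.44 master
10.3.6.45 slave1
10.3.6.46 slave2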
Add this property to your hbase-site.xml file and see if it works for you:
<property>
<name>hbase.zookeeper.property.clientPort</name>
<value>2181</value>
</property>