I am running HBase in distributed mode. HBase starts the region server Java processes on all nodes, but the web UI doesn't show them:
http://s1.ipicture.ru/uploads/20120517/16DXTnsU.png
Here is my hbase-site.xml:
<configuration>
<property>
<name>hbase.zookeeper.quorum</name>
<value>10.3.6.44</value>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/home/hadoop/hdfs/zookeeper</value>
</property>
<property>
<name>hbase.rootdir</name>
<value>hdfs://10.3.6.44:9000/hbase</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
</configuration>
By the way, the Hadoop cluster is running normally and sees all the datanodes.
Thanks very much for your help.
The problem was with DNS and the hosts file.
Add this property to your hbase-site.xml file and see if it works for you:
name: hbase.zookeeper.property.clientPort
value: 2181
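Written out as an hbase-site.xml entry, the property named above would look roughly like this. The value 2181 is ZooKeeper's default client port; adjust it if your ensemble listens elsewhere.

```xml
<!-- Goes inside the <configuration> element of hbase-site.xml -->
<property>
  <name>hbase.zookeeper.property.clientPort</name>
  <value>2181</value>
</property>
```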
Related
I am a beginner with Hadoop and HDFS. I have a situation where I need to connect three different PCs: one holding a file, one running NiFi, and one running Hadoop + HDFS.
Machine 1: will have a .csv file.
Machine 2 (personal laptop): will have NiFi running on it.
Machine 3 (running at my office): will have Hadoop + HDFS on it.
I would like to send a CSV file from machine 1 to my database running on machine 3, using NiFi, which is running on machine 2.
I connect to machine 3 using an SSH connection, which is basically a router at my office.
Question: How can I connect from machine 2, which runs NiFi, to machine 3, so that NiFi can send the file to my Hadoop HBase?
Should I use a public key for the configuration, or should I use a different setup or server?
My Hadoop and HDFS configuration files are as follows:
hbase-site.xml
<configuration>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.rootdir</name>
<value>hdfs://localhost:9000/hbase</value>
</property>
<property>
<name>hbase.zookeeper.property.clientPort</name>
<value>2222</value>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/home/hadoop/zookeeper</value>
</property>
<property>
<name>hbase.wal.provider</name>
<value>filesystem</value>
</property>
<property>
<name>hbase.tmp.dir</name>
<value>./tmp</value>
</property>
</configuration>
core-site.xml
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hdoop/tmpdata</value>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>file:///home/hadoop/hdfs/namenode</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>file:///home/hadoop/hdfs/datanode</value>
</property>
</configuration>
Please look at the configuration files and let me know which properties I need to change. So far I have installed HDFS in pseudo-distributed mode on machine 3.
Pseudo-distributed and fully distributed modes aren't really any different in how they are configured; pseudo-distributed just means all the daemons run on one machine.
You say only machine 3 has HDFS. Therefore only it needs to run a NameNode and DataNode, set up in a distributed fashion, meaning that external clients will be able to communicate with it.
More specifically, no config file should use localhost; use the LAN IP or hostname instead.
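As a rough sketch of that advice, the localhost entries in the question's files could be changed along these lines. The hostname machine3.example.lan is a placeholder, not something from the question; substitute the LAN IP or hostname that machine 2 can actually resolve and reach.

```xml
<!-- core-site.xml: placeholder hostname instead of localhost -->
<property>
  <name>fs.default.name</name>
  <value>hdfs://machine3.example.lan:9000</value>
</property>

<!-- hbase-site.xml: should point at the same NameNode address -->
<property>
  <name>hbase.rootdir</name>
  <value>hdfs://machine3.example.lan:9000/hbase</value>
</property>
```

The property name fs.default.name is kept as it appears in the question; on current Hadoop releases it is a deprecated alias of fs.defaultFS.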
Configuring Hadoop 2.7.1 to retain YARN jobs for longer
I have enabled log aggregation and the jobhistory/timeline server. When a job completes in the ResourceManager, it does show up in the jobhistory server (if you give the correct URL); however, the jobhistory server only lists M/R jobs, not YARN applications.
The problem is that the job is not visible in the timeline server; in fact, no jobs show in the timeline server.
Current yarn-site.xml configuration:
<property>
<name>yarn.timeline-service.hostname</name>
<value>host1</value>
</property>
<property>
<name>yarn.timeline-service.address</name>
<value>${yarn.timeline-service.hostname}:10200</value>
</property>
<property>
<name>yarn.timeline-service.webapp.address</name>
<value>${yarn.timeline-service.hostname}:8188</value>
</property>
<property>
<name>yarn.timeline-service.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.timeline-service.generic-application-history.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.log.server.url</name>
<value>http://${yarn.timeline-service.hostname}:19888/jobhistory/logs/</value>
</property>
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<property>
<name>yarn.nodemanager.log-dirs</name>
<value>/var/vm/apps/hadoop/logs</value>
</property>
<property>
<name>yarn.nodemanager.remote-app-log-dir</name>
<value>/var/vm/apps/hadoop/logs</value>
</property>
Am I providing conflicting configuration by using the jobhistory server AND the timeline server?
At the end of the day, I want the YARN logs persisted to HDFS so I can view them in the web UI over the following days/weeks.
You need to set the mapreduce.job.emit-timeline-data property to true in mapred-site.xml.
This will enable MapReduce jobs to push events to the timeline server.
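In mapred-site.xml, that setting would look something like this (a minimal fragment, to be placed inside the existing configuration element):

```xml
<property>
  <name>mapreduce.job.emit-timeline-data</name>
  <value>true</value>
</property>
```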
I am using the latest Hadoop version, 3.0.0, built from source. I have my timeline service up and running and have configured Hadoop to use it for job history as well. But when I click on History in the ResourceManager UI, I get the error below:
HTTP ERROR 404
Problem accessing /jobhistory/job/job_1444395439959_0001. Reason:
NOT_FOUND
Can someone please point out what I am missing here? The following is my yarn-site.xml:
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<description>The hostname of the Timeline service web application.</description>
<name>yarn.timeline-service.hostname</name>
<value>0.0.0.0</value>
</property>
<property>
<description>Address for the Timeline server to start the RPC server.</description>
<name>yarn.timeline-service.address</name>
<value>${yarn.timeline-service.hostname}:10200</value>
</property>
<property>
<description>The http address of the Timeline service web application.</description>
<name>yarn.timeline-service.webapp.address</name>
<value>${yarn.timeline-service.hostname}:8188</value>
</property>
<property>
<description>The https address of the Timeline service web application.</description>
<name>yarn.timeline-service.webapp.https.address</name>
<value>${yarn.timeline-service.hostname}:8190</value>
</property>
<property>
<description>Handler thread count to serve the client RPC requests.</description>
<name>yarn.timeline-service.handler-thread-count</name>
<value>10</value>
</property>
<property>
<description>Indicate to ResourceManager as well as clients whether
history-service is enabled or not. If enabled, ResourceManager starts
recording historical data that Timeline service can consume. Similarly,
clients can redirect to the history service when applications
finish if this is enabled.</description>
<name>yarn.timeline-service.generic-application-history.enabled</name>
<value>true</value>
</property>
<property>
<description>Store class name for history store, defaulting to file system
store</description>
<name>yarn.timeline-service.generic-application-history.store-class</name>
<value>org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore</value>
</property>
<property>
<description>URI pointing to the location of the FileSystem path where the history will be persisted.</description>
<name>yarn.timeline-service.generic-application-history.fs-history-store.uri</name>
<value>/tmp/yarn/system/history</value>
</property>
<property>
<description>T-file compression types used to compress history data.</description>
<name>yarn.timeline-service.generic-application-history.fs-history-store.compression-type</name>
<value>none</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
</configuration>
And my mapred-site.xml:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>localhost:10200</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>localhost:8188</value>
</property>
<property>
<name>mapreduce.job.emit-timeline-data</name>
<value>true</value>
</property>
</configuration>
JPS output:
6022 NameNode
27976 NodeManager
27859 ResourceManager
6139 DataNode
6310 SecondaryNameNode
28482 ApplicationHistoryServer
29230 Jps
If you want to see the logs through the YARN RM web UI, you need to enable log aggregation. To do that, set the following parameters in yarn-site.xml:
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<property>
<name>yarn.nodemanager.remote-app-log-dir</name>
<value>/app-logs</value>
</property>
<property>
<name>yarn.nodemanager.remote-app-log-dir-suffix</name>
<value>logs</value>
</property>
If you do not enable log aggregation, the NodeManagers store the logs locally. With the above settings, the logs are aggregated in HDFS under "/app-logs/{username}/logs/". Under this folder you can find logs for all the applications run so far. Log retention is determined by the configuration parameter "yarn.log-aggregation.retain-seconds" (how long to retain the aggregated logs).
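For the retention setting mentioned above, a sketch of the yarn-site.xml entry is shown here. The value of 604800 seconds (7 days) is only an illustrative assumption; pick whatever window you need.

```xml
<property>
  <name>yarn.log-aggregation.retain-seconds</name>
  <!-- 604800 s = 7 days; an example value, not a recommendation -->
  <value>604800</value>
</property>
```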
While MapReduce applications are running, you can access the logs from YARN's web UI. Once an application has completed, the logs are served through the Job History Server.
Also, set the following configuration parameter in yarn-site.xml:
<property>
<name>yarn.log.server.url</name>
<value>http://{job-history-hostname}:8188/jobhistory/logs</value>
</property>
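For comparison, here is a sketch of a mapred-site.xml that leaves the Job History Server on its default ports (10020 for RPC, 19888 for the web UI), separate from the timeline server ports 10200 and 8188. The hostname historyhost is a placeholder, and this assumes the Job History Server runs as its own daemon. With such a setup, the port in yarn.log.server.url would be 19888 rather than 8188.

```xml
<!-- historyhost is a hypothetical hostname; replace it with your own -->
<property>
  <name>mapreduce.jobhistory.address</name>
  <value>historyhost:10020</value>
</property>
<property>
  <name>mapreduce.jobhistory.webapp.address</name>
  <value>historyhost:19888</value>
</property>
```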
I have installed Hadoop (1.2.1) as a multi-node cluster with 1 master and 2 slaves. Now I am trying to install HBase on top of it. The problem is that when I start HBase on the master, it only shows one region server (the master itself), while the slaves are not shown in the web UI. On the terminal, each slave has its own region server process, but that is not reflected in the browser. Can anyone tell me what the problem is?
I had the same problem; I solved it by adding the port number to hbase.rootdir.
Your hbase-site.xml should look like this:
<property>
<name>hbase.zookeeper.quorum</name>
<value>master-IP,slave1-IP,slave2-IP</value>
<description>Comma-separated list of servers in the ZooKeeper ensemble.
</description>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/home/ravi/zookeeper-data</value>
<description>Property from ZooKeeper config zoo.cfg.
The directory where the snapshot is stored.
</description>
</property>
<property>
<name>hbase.rootdir</name>
<value>hdfs://master-IP:50000/hbase</value>
<description>The directory shared by RegionServers.
</description>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
<description>The mode the cluster will be in. Possible values are
false: standalone and pseudo-distributed setups with managed Zookeeper
true: fully-distributed with unmanaged Zookeeper Quorum (see hbase-env.sh)
</description>
</property>
I'm very new to HBase clusters. I set up HBase in distributed mode and it starts fine, but when I run the HBase shell I can't create a table; this error is shown:
My hbase-site.xml configuration is:
<property>
<name>hbase.master</name>
<value>matser:60000</value>
</property>
<property>
<name>hbase.rootdir</name>
<value>hdfs://hadoop-namnode:54310/hbase</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>master</value>
</property>
<property>
<name>hbase.zookeeper.property.clientport</name>
<value>2222</value>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>usr/local/hbase/temp</value>
</property>
Could you please help me? Thanks in advance.
The HBase version must be compatible with the Hadoop version. Downgrade HBase and it will work fine.