I tried to start up HBase - hadoop

I tried to run start-hbase.sh, but it failed:
dream#dream-VirtualBox:/usr/local/hbase/bin$ cat ~/.bashrc | tail -n 2
export PATH=$PATH:/usr/local/hadoop/sbin/:/usr/local/hadoop/bin/:/usr/local/hbase/bin/:/usr/local/mahout/bin/
export JAVA_HOME=/usr/lib/jvm/java-7-oracle
dream#dream-VirtualBox:/usr/local/hbase/bin$source ~/.bashrc
dream#dream-VirtualBox:/usr/local/hbase/bin$sh -x ./bin/start-hbase.sh
...(skip)...
./start-hbase.sh: 53: [: unexpected operator
+ /usr/local/hbase/bin/hbase-daemons.sh --config /usr/local/hbase/bin/../conf start zookeeper
Error: Could not find or load main class .usr.lib.jvm.java-7-oracle..bin.java
+ /usr/local/hbase/bin/hbase-daemon.sh --config /usr/local/hbase/bin/../conf start master
starting master, logging to /usr/local/hbase/bin/../logs/hbase-dream-master-dream-VirtualBox.out
Error: Could not find or load main class .usr.lib.jvm.java-7-oracle..bin.java
+ /usr/local/hbase/bin/hbase-daemons.sh --config /usr/local/hbase/bin/../conf --hosts /usr/local/hbase/bin/../conf/regionservers start regionserver
starting regionserver, logging to /usr/local/hbase/bin/../logs/hbase-dream-1-regionserver-dream-VirtualBox.out
Error: Could not find or load main class .usr.lib.jvm.java-7-oracle..bin.java
+ /usr/local/hbase/bin/hbase-daemons.sh --config /usr/local/hbase/bin/../conf --hosts /usr/local/hbase/bin/../conf/backup-masters start master-backup
Looking inside start-hbase.sh, I saw that it tries to run /usr/local/hbase/bin/hbase org.apache.hadoop.hbase.zookeeper.ZKServerTool, which fails.
I am not sure why hbase always throws this error.
dream#dream-VirtualBox:/usr/local/hbase$ /usr/local/hbase/bin/hbase org.apache.hadoop.hbase.zookeeper.ZKServerTool
Error: Could not find or load main class .usr.lib.jvm.java-7-oracle..bin.java
dream#dream-VirtualBox:/usr/local/hbase$ ./bin/hbase shell
Error: Could not find or load main class .usr.lib.jvm.java-7-oracle..bin.java
But when I tried it with sudo, it seemed to more or less work:
dream#dream-VirtualBox:/usr/local/hbase$ sudo ./bin/start-hbase.sh
starting master, logging to /usr/local/hbase/bin/../logs/hbase-root-master-dream-VirtualBox.out
Could not start ZK at requested port of 2181. ZK was started at port: 2182. Aborting as clients (e.g. shell) will not be able to find this ZK quorum.
dream#dream-VirtualBox:/usr/local/hbase$ jps
2869 NameNode
3540 NodeManager
3403 ResourceManager
3237 SecondaryNameNode
3031 DataNode
5666 Jps
dream#dream-VirtualBox:/usr/local/hbase$ sudo jps
5053 HQuorumPeer
2869 NameNode
3540 NodeManager
5857 Jps
3403 ResourceManager
3237 SecondaryNameNode
3031 DataNode
dream#dream-VirtualBox:/usr/local/hbase$ sudo ./bin/hbase shell
2015-08-10 15:41:04,136 WARN [main] util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 1.1.1, rd0a115a7267f54e01c72c603ec53e91ec418292f, Tue Jun 23 14:44:07 PDT 2015
hbase(main):001:0>
My environment
Linux dream-VirtualBox 3.16.0-30-generic #40~14.04.1-Ubuntu SMP Thu Jan 15 17:43:14 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
Java-7-oracle#1.7.0_80
HBase-1.1.1
My HBase settings
conf/hbase-site.xml
<configuration>
<property>
<name>hbase.rootdir</name>
<value>file:///usr/local/hbase</value>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/usr/local/hbase/zookeeper</value>
</property>
</configuration>
~/.bashrc
export JAVA_HOME=/usr/lib/jvm/java-7-oracle
export PATH=$PATH:/usr/local/hadoop/sbin/:/usr/local/hadoop/bin/:/usr/local/hbase/bin/
Would you please give me any help?
Thanks.

First, I was not sure why the exec line in bin/hbase built such a strange command.
bin/hbase:
exec "$JAVA" -Dproc_$COMMAND -XX:OnOutOfMemoryError="kill -9 %p" $HEAP_SETTINGS $HBASE_OPTS $CLASS "$@"
Expanded, it became:
exec /usr/lib/jvm/java-7-oracle/bin/java -DXXXXXX /usr/lib/jvm/java-7-oracle//bin/java -Xmx1000m -DXXXXXX
I think the extra /usr/lib/jvm/java-7-oracle//bin/java needed to be deleted.
I then looked at lines 217-229 of bin/hbase:
217 # If avail, add Hadoop to the CLASSPATH and to the JAVA_LIBRARY_PATH
218 # Allow this functionality to be disabled
219 if [ "$HBASE_DISABLE_HADOOP_CLASSPATH_LOOKUP" != "true" ] ; then
220   HADOOP_IN_PATH=$(PATH="${HADOOP_HOME:-${HADOOP_PREFIX}}/bin:$PATH" which hadoop 2>/dev/null)
221   if [ -f ${HADOOP_IN_PATH} ]; then
222     HADOOP_JAVA_LIBRARY_PATH=$(HADOOP_CLASSPATH="$CLASSPATH" ${HADOOP_IN_PATH} \
223       org.apache.hadoop.hbase.util.GetJavaProperty java.library.path 2>/dev/null)
224     if [ -n "$HADOOP_JAVA_LIBRARY_PATH" ]; then
225       JAVA_LIBRARY_PATH=$(append_path "${JAVA_LIBRARY_PATH}" "$HADOOP_JAVA_LIBRARY_PATH")
226     fi
227     CLASSPATH=$(append_path "${CLASSPATH}" `${HADOOP_IN_PATH} classpath 2>/dev/null`)
228   fi
229 fi
This block does extra work whenever a hadoop executable is found on the PATH, which explains why my user (dream) could not run bin/hbase while root was fine. So I removed the Hadoop directories from PATH, and it seems to work:
dream#dream-VirtualBox:/usr/local/hbase/bin$ ./start-hbase.sh
starting master, logging to /usr/local/hbase/bin/../logs/hbase-dream-master-dream-VirtualBox.out
dream#dream-VirtualBox:/usr/local/hbase/bin$ jps
22956 Jps
2869 NameNode
3540 NodeManager
3403 ResourceManager
3237 SecondaryNameNode
22722 HMaster
3031 DataNode
dream#dream-VirtualBox:/usr/local/hbase/bin$ ./hbase shell
2015-08-10 23:33:44,016 WARN [main] util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 1.1.1, rd0a115a7267f54e01c72c603ec53e91ec418292f, Tue Jun 23 14:44:07 PDT 2015
hbase(main):001:0>
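As an aside, the script excerpt above also shows a switch that should let you keep Hadoop on the PATH and simply skip the lookup. I have not tried this variant myself, but roughly:
# untested alternative: keep Hadoop on PATH but disable the lookup guarded at line 219
echo 'export HBASE_DISABLE_HADOOP_CLASSPATH_LOOKUP=true' >> /usr/local/hbase/conf/hbase-env.sh
/usr/local/hbase/bin/start-hbase.sh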

Add JAVA_HOME in hbase-env.sh:
export JAVA_HOME=/usr/lib/jvm/java-7-oracle
Add the given properties in hbase-site.xml:
<property>
<name>hbase.zookeeper.property.clientPort</name>
<value>2181</value>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<!-- create the zookeeperDir directory beforehand with permission 755 -->
<value>/home/kishore/zookeeperDir</value>
</property>
Make sure your ZooKeeper is running on port 2181.
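Putting it together, the directory step looks roughly like this (the path is the one from the property above; adjust it to your machine, and the nc check assumes netcat is installed):
mkdir -p /home/kishore/zookeeperDir          # create the ZooKeeper data directory first
chmod 755 /home/kishore/zookeeperDir
echo ruok | nc localhost 2181                # a ZooKeeper listening on 2181 answers "imok"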

You ran the script with the sh interpreter:
sh -x ./bin/start-hbase.sh
Use instead
./bin/start-hbase.sh
as you did in
sudo ./bin/start-hbase.sh
This automatically selects the script's own interpreter, which may be different from sh, since the first line of start-hbase.sh says
#!/usr/bin/env bash
The difference between these two ways is explained here: https://askubuntu.com/questions/22910/what-is-the-difference-between-and-sh-to-run-a-script
This solved the problem I had with
bin/start-hbase.sh: 51: [: unexpected operator
I am using hbase-1.1.2 so the line may have changed.
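A quick way to confirm which interpreter the script expects before running it (plain shell, nothing HBase-specific):
head -n 1 ./bin/start-hbase.sh     # should print: #!/usr/bin/env bash
chmod +x ./bin/start-hbase.sh      # make sure it is executable, then run it directly
./bin/start-hbase.sh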

The issue is an already running ZK service.
The error message in the logs you attached clearly states the problem:
Could not start ZK at requested port of 2181. ZK was started at port: 2182. Aborting as clients (e.g. shell) will not be able to find this ZK quorum.
I also faced the same problem; once I stopped the ZK service everything worked well and jps started listing the HMaster service.
I am using Java 8 and HBase 2.2.0.
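To see what is already holding port 2181 before you start HBase, something like this should work (assuming lsof is available; the zkServer.sh path is just wherever your standalone ZooKeeper lives):
sudo lsof -i :2181                        # shows the process currently bound to 2181, if any
# if it is a standalone ZooKeeper you started yourself, stop it first:
# /path/to/zookeeper/bin/zkServer.sh stop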

Related

Hadoop 3.3.0: RPC response has invalid length

I just installed PySpark via Homebrew and I'm currently trying to put stuff into Hadoop.
The Problem
Any interaction with Hadoop is failing.
I followed a tutorial to set up Hadoop 3.3.0 on macOS.
It somehow didn't work out, even though the only things I changed were some versions (specific JDK, MySQL, etc.).
Whenever I try to run any command related to Hadoop, I receive this:
▶ hadoop fs -ls /
2021-05-12 07:45:44,647 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
ls: RPC response has invalid length
Running this code in a notebook:
from pyspark.sql.session import SparkSession
# https://saagie.zendesk.com/hc/en-us/articles/360029759552-PySpark-Read-and-Write-Files-from-HDFS
sparkSession = SparkSession.builder.appName("example-pyspark-read-and-write").getOrCreate()
# Create data
data = [('First', 1), ('Second', 2), ('Third', 3), ('Fourth', 4), ('Fifth', 5)]
df = sparkSession.createDataFrame(data)
# Write into HDFS
df.write.csv("hdfs://localhost:9000/cluster/example.csv")
# Read from HDFS
df_load = sparkSession.read.csv("hdfs://localhost:9000/cluster/example.csv")
df_load.show()
sparkSession.stop()
throws me
Py4JJavaError Traceback (most recent call last)
<ipython-input-5-e25cae5a6cac> in <module>
8
9 # Write into HDFS
---> 10 df.write.csv("hdfs://localhost:9000/cluster/example.csv")
11 # Read from HDFS
12 df_load = sparkSession.read.csv("hdfs://localhost:9000/cluster/example.csv")
/usr/local/Cellar/apache-spark/3.1.1/libexec/python/pyspark/sql/readwriter.py in csv(self, path, mode, compression, sep, quote, escape, header, nullValue, escapeQuotes, quoteAll, dateFormat, timestampFormat, ignoreLeadingWhiteSpace, ignoreTrailingWhiteSpace, charToEscapeQuoteEscaping, encoding, emptyValue, lineSep)
1369 charToEscapeQuoteEscaping=charToEscapeQuoteEscaping,
1370 encoding=encoding, emptyValue=emptyValue, lineSep=lineSep)
-> 1371 self._jwrite.csv(path)
1372
1373 def orc(self, path, mode=None, partitionBy=None, compression=None):
/usr/local/lib/python3.9/site-packages/py4j/java_gateway.py in __call__(self, *args)
1307
1308 answer = self.gateway_client.send_command(command)
-> 1309 return_value = get_return_value(
1310 answer, self.gateway_client, self.target_id, self.name)
1311
/usr/local/Cellar/apache-spark/3.1.1/libexec/python/pyspark/sql/utils.py in deco(*a, **kw)
109 def deco(*a, **kw):
110 try:
--> 111 return f(*a, **kw)
112 except py4j.protocol.Py4JJavaError as e:
113 converted = convert_exception(e.java_exception)
/usr/local/lib/python3.9/site-packages/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
324 value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)
325 if answer[1] == REFERENCE_TYPE:
--> 326 raise Py4JJavaError(
327 "An error occurred while calling {0}{1}{2}.\n".
328 format(target_id, ".", name), value)
Py4JJavaError: An error occurred while calling o99.csv.
: java.io.IOException: Failed on local exception: org.apache.hadoop.ipc.RpcException: RPC response has invalid length; Host Details : local host is: "blkpingu16-MBP.fritz.box/192.xxx.xxx.xx"; destination host is: "localhost":9000;
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:816)
at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1515)
...
at org.apache.spark.sql.DataFrameWriter.csv(DataFrameWriter.scala:979)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:567)
...
at java.base/java.lang.Thread.run(Thread.java:830)
Caused by: org.apache.hadoop.ipc.RpcException: RPC response has invalid length
at org.apache.hadoop.ipc.Client$IpcStreams.readResponse(Client.java:1827)
at org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1173)
at org.apache.hadoop.ipc.Client$Connection.run(Client.java:1069)
There it is: RPC response has invalid length
I have configured and verified all my paths in various config files like
core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
<property>
<name>ipc.maximum.data.length</name>
<value>134217728</value>
</property>
</configuration>
.zshrc
JAVA_HOME="/Library/Java/JavaVirtualMachines/jdk1.8.0_201.jdk/Contents/Home"
...
## JAVA env variables
export JAVA_HOME="/Library/Java/JavaVirtualMachines/adoptopenjdk-8.jdk/Contents/Home"
export PATH=$PATH:$JAVA_HOME/bin
## HADOOP env variables
export HADOOP_HOME="/usr/local/Cellar/hadoop/3.3.0/libexec"
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
export HADOOP_CLASSPATH=${JAVA_HOME}/lib/tools.jar
## HIVE env variables
export HIVE_HOME=/usr/local/Cellar/hive/3.1.2_3/libexec
export PATH=$PATH:/$HIVE_HOME/bin
## MySQL ENV
export PATH=$PATH:/usr/local/Cellar/mysql/8.0.23_1/bin
hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
hadoop-env.sh
export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_141.jdk/Contents/Home
If I start Hadoop, it seems to start all nodes:
▶ $HADOOP_HOME/sbin/start-all.sh
Starting namenodes on [localhost]
Starting datanodes
Starting secondary namenodes [blkpingu16-MBP.fritz.box]
2021-05-12 08:18:15,786 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting resourcemanager
Starting nodemanagers
jps shows that the Hadoop processes are running, along with some Spark ones:
▶ jps
166 Jps
99750 ResourceManager
99544 SecondaryNameNode
99851 NodeManager
98154 SparkSubmit
99405 DataNode
39326 Master
http://localhost:8088/cluster is available and shows the Hadoop dashboard (Yarn, according to the tutorial I followed)
http://localhost:8080 is available and shows the Spark dashboard
http://localhost:9870 is not available (it should show the HDFS NameNode web UI)
My main problem is that I don't know why my NameNode is not there (it should be), and consequently why I can't communicate with HDFS, either from the command line (to put data in) or from notebooks (to read data back).
Something Hadoop-related is broken and I can't figure out how to fix it.
I faced the same issue today and would like to note it here in case anyone hits it too. A quick jps showed me that the NameNode process was not there, although no warning or error showed up.
In the NameNode's .log file I found a java.net.BindException: Problem binding to [localhost:9000], which made me think that port 9000 was being used by another process. I used the command below to check open ports, and indeed the port was held by a Python process (I was only running PySpark at the time). (sudo lsof -i -P -n | grep LISTEN, for anyone who needs it, by the way.)
The solution is pretty straightforward: change the port number in the fs.defaultFS field in etc/hadoop/core-site.xml to another port that is not in use (mine is 9900).
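In shell form, the fix looked roughly like this for me (9900 is just the port I picked; the paths assume the Homebrew layout from the question):
sudo lsof -i -P -n | grep LISTEN | grep 9000    # find out what is already bound to 9000
# edit fs.defaultFS in $HADOOP_HOME/etc/hadoop/core-site.xml to hdfs://localhost:9900,
# update any hdfs://localhost:9000 URLs in your notebooks, then restart HDFS and retry:
$HADOOP_HOME/sbin/stop-dfs.sh
$HADOOP_HOME/sbin/start-dfs.sh
hadoop fs -ls /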

Spark-shell --master yarn stuck

I installed Hadoop and Spark via Homebrew
$ brew list --versions | grep spark
apache-spark 2.2.0
$ brew list --versions | grep hadoop
hadoop 2.8.1 2.8.2 hdfs
where Hadoop 2.8.2 is what I am using.
I followed this post to configure Hadoop. Also, followed this post to configure spark.yarn.archive as:
spark.yarn.archive hdfs://localhost:9000/user/panc25/spark-jars.zip
The following are my Hadoop/Spark related environment setting in my .bash_profile :
# ---------------------
# Hadoop
# ---------------------
export HADOOP_HOME=/usr/local/Cellar/hadoop/2.8.2
export YARN_CONF_DIR=$HADOOP_HOME/libexec/etc/hadoop/
alias hadoop-start="$HADOOP_HOME/sbin/start-dfs.sh;$HADOOP_HOME/sbin/start-yarn.sh"
alias hadoop-stop="$HADOOP_HOME/sbin/stop-yarn.sh;$HADOOP_HOME/sbin/stop-dfs.sh"
# ---------------------
# Apache Spark
# ---------------------
export SPARK_HOME=/usr/local/Cellar/apache-spark/2.2.0/libexec
export PATH=$SPARK_HOME/../bin:$SPARK_HOME/sbin:$PATH
I can successfully start hadoop (hdfs + yarn):
$ hadoop-start
17/11/12 17:08:39 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [localhost]
localhost: starting namenode, logging to /usr/local/Cellar/hadoop/2.8.2/libexec/logs/hadoop-panc25-namenode-mbp13mid2017.local.out
localhost: starting datanode, logging to /usr/local/Cellar/hadoop/2.8.2/libexec/logs/hadoop-panc25-datanode-mbp13mid2017.local.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /usr/local/Cellar/hadoop/2.8.2/libexec/logs/hadoop-panc25-secondarynamenode-mbp13mid2017.local.out
17/11/12 17:08:55 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
starting yarn daemons
starting resourcemanager, logging to /usr/local/Cellar/hadoop/2.8.2/libexec/logs/yarn-panc25-resourcemanager-mbp13mid2017.local.out
localhost: starting nodemanager, logging to /usr/local/Cellar/hadoop/2.8.2/libexec/logs/yarn-panc25-nodemanager-mbp13mid2017.local.out
$ jps
92723 NameNode
93188 Jps
93051 ResourceManager
93149 NodeManager
92814 DataNode
92926 SecondaryNameNode
However, when I start spark-shell --master yarn, it seems to freeze and I don't know what is going on.
What is wrong?
BTW, I could visit the SparkUI http://localhost:4040/, but all pages are blank.
I experienced a similar issue, and it was caused by the fact that I had forgotten to append /conf to the HADOOP_CONF_DIR env variable (/etc/hadoop/conf).
In my case I was running the Spark 2.1 Cloudera distribution and had specified HADOOP_CONF_DIR=/etc/hadoop/conf/:/etc/hive/conf/. For some reason it was getting stuck, so I changed it to HADOOP_CONF_DIR=/etc/hadoop/conf/ and it worked. Still looking for the root cause!
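For the Homebrew layout in the question, that would look something like the following (untested on that exact setup, so treat it as a sketch):
# point Spark at the directory that actually contains core-site.xml and yarn-site.xml
export HADOOP_CONF_DIR=$HADOOP_HOME/libexec/etc/hadoop
export YARN_CONF_DIR=$HADOOP_CONF_DIR
spark-shell --master yarn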

hadoop error: util.NativeCodeLoader (hdfs dfs -ls does not work!)

I have seen a lot of folks having problems with their Hadoop installation. I went through all the related Stack Overflow questions, but could not fix the problem.
The problem is:
hdfs dfs -ls
16/09/27 09:43:42 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
ls: `.': No such file or directory
I am using Ubuntu 16.04 and downloaded the Hadoop stable version 2.7.2 from an Apache mirror:
http://apache.spinellicreations.com/hadoop/common/
I have installed java and ssh already.
which java
java is /usr/bin/java
which javac
javac is /usr/bin/javac
which ssh
ssh is /usr/bin/ssh
echo $JAVA_HOME
/usr/lib/jvm/java-9-openjdk-amd64
Note:
sudo update-alternatives --config java
There are 2 choices for the alternative java (providing /usr/bin/java).
Selection Path Priority Status
------------------------------------------------------------
* 0 /usr/lib/jvm/java-9-openjdk-amd64/bin/java 1091 auto mode
1 /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java 1081 manual mode
2 /usr/lib/jvm/java-9-openjdk-amd64/bin/java 1091 manual mode
Press <enter> to keep the current choice[*], or type selection number:
hadoop environment variables in ~/.bashrc
export JAVA_HOME=/usr/lib/jvm/java-9-openjdk-amd64
export HADOOP_INSTALL=/home/bhishan/hadoop-2.7.2
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_INSTALL/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_INSTALL/lib"
export PATH=$PATH:$HADOOP_HOME/bin
Modification of file:
/home/bhishan/hadoop-2.7.2/etc/hadoop/hadoop-env.sh
I added one line at the end:
export JAVA_HOME=/usr/lib/jvm/java-9-openjdk-amd64
The link to hadoop-env.sh in the pastebin is here:
http://pastebin.com/a3iPjB04
Then I created some empty directories:
/home/bhishan/hadoop-2.7.2/tmp
/home/bhishan/hadoop-2.7.2/etc/hadoop/hadoop_store
/home/bhishan/hadoop-2.7.2/etc/hadoop/hadoop_store/hdfs
/home/bhishan/hadoop-2.7.2/etc/hadoop/hadoop_store/hdfs/datanode
/home/bhishan/hadoop-2.7.2/etc/hadoop/hadoop_store/hdfs/namenode
Modifications to the file: /home/bhishan/hadoop-2.7.2/etc/hadoop/hdfs-site.xml
<property>
<name>dfs.replication</name>
<value>1</value>
<description>Default block replication.
The actual number of replications can be specified when the file is created.
The default is used if replication is not specified in create time.
</description>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/home/bhishan/hadoop-2.7.2/etc/hadoop/hadoop_store/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/home/bhishan/hadoop-2.7.2/etc/hadoop/hadoop_store/hdfs/datanode</value>
</property>
The link in the pastebin is this:
http://pastebin.com/cha7ZBr8
Modifications to the file /home/bhishan/hadoop-2.7.2/etc/hadoop/core-site.xml are the following:
<property>
<name>hadoop.tmp.dir</name>
<value>/home/bhishan/hadoop-2.7.2/tmp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:54310</value>
<description>The name of the default file system. A URI whose scheme and authority determine the FileSystem implementation. The uri's scheme determines the config property (fs.SCHEME.impl) naming the FileSystem implementation class. The uri's authority is used to determine the host, port, etc. for a filesystem.</description>
</property>
The link to the pastebin for core-site.xml is this:
http://pastebin.com/D184DuGB
The modifications to the file /home/bhishan/hadoop-2.7.2/etc/hadoop/mapred-site.xml are given below:
<property>
<name>mapred.job.tracker</name>
<value>localhost:54311</value>
<description>The host and port that the MapReduce job tracker runs at. If "local", then jobs are run in-process as a single map and reduce task.</description>
</property>
The pastebin link is:
http://pastebin.com/nVxs8nMm
When I type hostname in the terminal it says BP.
cat /etc/hosts
127.0.0.1 localhost BP
127.0.1.1 localhost
# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
I have also disabled IPv6:
cat /etc/sysctl.conf
net.ipv6.conf.all.disable_ipv6=1
net.ipv6.conf.default.disable_ipv6=1
net.ipv6.conf.lo.disable_ipv6=1
hadoop descriptions
hadoop version
Hadoop 2.7.2
which hadoop
hadoop is /home/bhishan/hadoop-2.7.2/bin/hadoop
which hdfs
hdfs is /home/bhishan/hadoop-2.7.2/bin/hdfs
Restarting hadoop
cd /home/bhishan/hadoop-2.7.2/sbin
stop-dfs.sh
stop-yarn.sh
cd /home/bhishan/hadoop-2.7.2/tmp && rm -Rf *
hadoop namenode -format
start-dfs.sh
start-yarn.sh
Now the error comes
hdfs dfs -ls
16/09/26 23:53:14 WARN util.NativeCodeLoader: Unable to load
native-hadoop library for your platform... using builtin-java classes
where applicable ls: `.': No such file or directory
checking jps
jps
6688 sun.tools.jps.Jps
3909 SecondaryNameNode
3525 NameNode
4327 NodeManager
4184 ResourceManager
3662 DataNode
checknative
hadoop checknative -a
16/09/27 09:28:18 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Native library checking:
hadoop: false
zlib: false
snappy: false
lz4: false
bzip2: false
openssl: false
16/09/27 09:28:18 INFO util.ExitUtil: Exiting with status 1
Then I installed missing libraries:
a) hadoop version gives Hadoop 2.7.2
b) sudo apt-get install --reinstall zlibc zlib1g zlib1g-dev
From synaptic manager I can see following libraries installed:
zlib1g, zlib1g-dev , zlib1g:i386, zlibc
c) Installed snappy and python-snappy.
d) In Synaptic manager I can see lz4
liblz4-1, liblz4-tool, python-lz4, python3-lz4
e) bzip2 is already installed.
f) openssl is already installed.
All checknative entries are still false and I cannot run hdfs dfs -ls.
I could not find any errors so far. Any help will be appreciated.
Also, I am trying to run Hadoop on a single laptop with four cores. The version is 2.7.2. How about version 3.0? If I have to reinstall Hadoop from scratch, maybe I should go with Hadoop 3. Suggestions will be welcomed.
Related links:
Result of hdfs dfs -ls command
hdfs dfs ls not working after multiple nodes configured
hadoop fs -ls does not work
Namenode not getting started
No Namenode or Datanode or Secondary NameNode to stop
Hadoop 2.6.1 Warning: WARN util.NativeCodeLoader
Hadoop 2.2.0 Setup (Pseudo-Distributed Mode): ERROR// Warn util.NativeCodeLoader: unable to load native-hadoop library
Command "hadoop fs -ls ." does not work
And, also,
hadoop fs -mkdir failed on connection exception
Hadoop cluster setup - java.net.ConnectException: Connection refused
Hadoop (local and host destination do not match) after installing hive
Help will be truly appreciated!
From this error:
hdfs dfs -ls
16/09/27 09:43:42 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
ls: `.': No such file or directory
Ignore the warning about the native libraries - the command should work fine even with that warning.
When you run hdfs dfs -ls with no path, as you have done, it attempts to list the contents of your home directory in HDFS, which is /user/<your_user_name> by default. In this case, I suspect the issue is simply that your user directory does not exist.
Does it work OK if you run:
hadoop fs -ls /
And then do:
hadoop fs -mkdir -p /user/<your_user_name>
hadoop fs -ls
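For example, with the user name from the question (swap in your own login), you can verify the new home directory works by putting a small file into it:
hadoop fs -mkdir -p /user/bhishan      # the user name from the question
hadoop fs -put /etc/hosts .            # copy a small local file into the new HDFS home
hadoop fs -ls                          # should now list the file instead of erroring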

HBase fails to start in single node cluster mode on Mac OSX

I am trying to get a personal HBase development environment set up. I have hdfs and yarn running, but cannot get HBase to start.
I have started up hadoop 2.7.1 by running start-dfs.sh and start-yarn.sh. I have verified these are running by testing hdfs dfs -mkdir /test and running a sample MR job bundled in the examples, and I have browsed HDFS at port 50070.
I have started zookeeper 3.4.6 on port 2181 and set its dataDir. My zoo.cfg has:
dataDir=/Users/.../tools/hd/zookeeper_data
clientPort=2181
I observe its zookeeper_server.PID file in the dataDir I chose, and when I run jps I see the below:
51074 NodeManager
50743 DataNode
50983 ResourceManager
50856 SecondaryNameNode
57848 QuorumPeerMain
58731 Jps
50653 NameNode
QuorumPeerMain above matches the PID in zookeeper_server.PID, as I would expect. Is this expectation correct? From what I have done so far, should any more processes be showing here?
I installed hbase-1.1.2 and configured hbase-site.xml. I set hbase.rootdir to hdfs://localhost:8200/hbase; my hdfs is running at localhost:8200. I set hbase.zookeeper.property.dataDir to my ZooKeeper's dataDir, with the expectation that it will use this property to find the PID of a running ZooKeeper. Is this expectation correct, or have I misunderstood? The config in hbase-site.xml is:
<property>
<name>hbase.rootdir</name>
<value>hdfs://localhost:8020/hbase</value>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>Users/.../tools/hd/zookeeper_data</value>
</property>
When I run start-hbase.sh my server fails to start. I see this log message:
2015-09-26 19:32:43,617 ERROR [main] master.HMasterCommandLine: Master exiting
To investigate I ran hbase master start and got more detail:
2015-09-26 19:41:26,403 INFO [Thread-1] server.NIOServerCnxn: Stat command output
2015-09-26 19:41:26,405 INFO [Thread-1] server.NIOServerCnxn: Closed socket connection for client /127.0.0.1:63334 (no session established for client)
2015-09-26 19:41:26,406 INFO [main] zookeeper.MiniZooKeeperCluster: Started MiniZooKeeperCluster and ran successful 'stat' on client port=2182
Could not start ZK at requested port of 2181. ZK was started at port: 2182. Aborting as clients (e.g. shell) will not be able to find this ZK quorum.
2015-09-26 19:41:26,406 ERROR [main] master.HMasterCommandLine: Master exiting
java.io.IOException: Could not start ZK at requested port of 2181. ZK was started at port: 2182. Aborting as clients (e.g. shell) will not be able to find this ZK quorum.
at org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:214)
at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:139)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:126)
at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:2304)
So I have a few questions:
Should I be trying to set up a zookeeper before running HBase?
Why when I have started a zookeeper and told HBase where its dataDir is, does HBase try to start its own zookeeper?
Anything obviously stupid/misguided in the above?
The script you are using to start HBase, start-hbase.sh, will try to start the following components, in order:
zookeeper
hbase master
hbase regionserver
hbase master-backup
So, you could either stop the ZooKeeper that you started yourself, or start the daemons individually:
# start hbase master
bin/hbase-daemon.sh --config ${HBASE_CONF_DIR} start master
# start region server
bin/hbase-daemons.sh --config ${HBASE_CONF_DIR} --hosts ${HBASE_CONF_DIR}/regionservers start regionserver
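If you instead go with the first option and stop the ZooKeeper you started yourself, that would be something like this (the path is just wherever you unpacked zookeeper 3.4.6):
/path/to/zookeeper-3.4.6/bin/zkServer.sh stop   # stops the standalone QuorumPeerMain
./bin/start-hbase.sh                            # then let HBase manage its own ZK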
HBase standalone starts its own ZooKeeper (if you run start-hbase.sh), but if it fails to start or keep running, the other HBase daemons it needs won't work.
Make sure you explicitly set the properties for your interface lo0 in the hbase-site.xml file:
<property>
<name>hbase.zookeeper.dns.interface</name>
<value>lo0</value>
</property>
<property>
<name>hbase.regionserver.dns.interface</name>
<value>lo0</value>
</property>
<property>
<name>hbase.master.dns.interface</name>
<value>lo0</value>
</property>
I found that when my wifi was on and these entries were missing, ZooKeeper failed to start.
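It also helps to confirm which ZooKeeper is actually answering on 2181 before HBase tries to bind it; the same 'stat' command that appears in the question's log can be sent by hand (needs nc):
echo stat | nc localhost 2181      # prints server stats if a ZooKeeper already owns the port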

Running JAR in Hadoop on Google Cloud using Yarn-client

I want to run a JAR in Hadoop on Google Cloud using yarn-client.
I use this command on the master node of Hadoop:
spark-submit --class find --master yarn-client find.jar
but it returns this error:
15/06/17 10:11:06 INFO client.RMProxy: Connecting to ResourceManager at hadoop-m-on8g/10.240.180.15:8032
15/06/17 10:11:07 INFO ipc.Client: Retrying connect to server: hadoop-m-on8g/10.240.180.15:8032. Already tried 0
time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
What is the problem? In case it is useful, this is my yarn-site.xml:
<?xml version="1.0" ?>
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.nodemanager.remote-app-log-dir</name>
<value>/yarn-logs/</value>
<description>
The remote path, on the default FS, to store logs.
</description>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>hadoop-m-on8g</value>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>5999</value>
</property>
</configuration>

In your case, it looks like the YARN ResourceManager may be unhealthy for unknown reasons; you can try to fix YARN with the following:
sudo sudo -u hadoop /home/hadoop/hadoop-install/sbin/stop-yarn.sh
sudo sudo -u hadoop /home/hadoop/hadoop-install/sbin/start-yarn.sh
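After restarting, a quick check that the ResourceManager is reachable again before resubmitting (standard YARN CLI, nothing deployment-specific):
yarn node -list     # should list your NodeManagers instead of retrying the RPC connection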
However, it looks like you're using the Click-to-Deploy solution; Click-to-Deploy's Spark + Hadoop 2 deployment actually doesn't support Spark on YARN at the moment, due to some bugs and lack of memory configs. You'd normally run into something like this if you just try to run it with --master yarn-client out-of-the-box:
15/06/17 17:21:08 INFO cluster.YarnClientSchedulerBackend: Application report from ASM:
appMasterRpcPort: -1
appStartTime: 1434561664937
yarnAppState: ACCEPTED
15/06/17 17:21:09 INFO cluster.YarnClientSchedulerBackend: Application report from ASM:
appMasterRpcPort: -1
appStartTime: 1434561664937
yarnAppState: ACCEPTED
15/06/17 17:21:10 INFO cluster.YarnClientSchedulerBackend: Application report from ASM:
appMasterRpcPort: 0
appStartTime: 1434561664937
yarnAppState: RUNNING
15/06/17 17:21:15 ERROR cluster.YarnClientSchedulerBackend: Yarn application already ended: FAILED
15/06/17 17:21:15 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/metrics/json,null}
15/06/17 17:21:15 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/stage/kill,null}
The well-supported way to deploy a cluster on Google Compute Engine with Hadoop 2 and Spark configured to run on YARN is to use bdutil. You'd run something like:
./bdutil -P <instance prefix> -p <project id> -b <bucket> -z <zone> -d \
-e extensions/spark/spark_on_yarn_env.sh generate_config my_custom_env.sh
./bdutil -e my_custom_env.sh deploy
# Shorthand for logging in to the master
./bdutil -e my_custom_env.sh shell
# Handy way to run a socks proxy to make it easy to access the web UIs
./bdutil -e my_custom_env.sh socksproxy
# When done, delete your cluster
./bdutil -e my_custom_env.sh delete
With spark_on_yarn_env.sh Spark should default to yarn-client, though you can always re-specify --master yarn-client if you want. You can see a more detailed explanation of the flags available in bdutil with ./bdutil --help. Here are the help entries just for the flags I included above:
-b, --bucket
Google Cloud Storage bucket used in deployment and by the cluster.
-d, --use_attached_pds
If true, uses additional non-boot volumes, optionally creating them on
deploy if they don't exist already and deleting them on cluster delete.
-e, --env_var_files
Comma-separated list of bash files that are sourced to configure the cluster
and installed software. Files are sourced in order with later files being
sourced last. bdutil_env.sh is always sourced first. Flag arguments are
set after all sourced files, but before the evaluate_late_variable_bindings
method of bdutil_env.sh. see bdutil_env.sh for more information.
-P, --prefix
Common prefix for cluster nodes.
-p, --project
The Google Cloud Platform project to use to create the cluster.
-z, --zone
Specify the Google Compute Engine zone to use.

Resources