Bigtop HBase tables disappeared after PC restart - hadoop

I installed Bigtop 0.7.0 on Ubuntu 12.04 and started the master server without any problem with:
sudo hbase master start
I was able to connect with hbase shell and create a table.
After I restarted the PC, I saw that the table was not there anymore.
I read that the problem is that HBase stores tables in /tmp, which is cleared on restart, so I tried to change the configuration in hbase-site.xml to set another folder.
The default hbase-site.xml was:
<configuration/>
(No properties defined)
After I edited hbase-site.xml and tried to start the HBase master again, I received a ZooKeeper client exception saying it was not possible to connect to the server.
Can you please give me some advice on how to configure this correctly, or point out some other problem that I'm not aware of?
EDIT (from the comments):
My hbase-site.xml is:
<configuration>
<!--property>
<name>hbase.rootdir</name>
<value>file://app/hadoop/tmp/hbase</value>
</property>
<property>
<name>hbase.tmp.dir</name>
<value>/app/hadoop/tmp</value>
<property-->
</configuration>
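For reference, a standalone-mode hbase-site.xml that keeps data outside /tmp usually looks like the sketch below. Note that in the configuration above the whole block is commented out (<!--property> ... <property-->) and the second <property> is never closed, so none of it takes effect; also, a local rootdir needs a file:/// URI with three slashes. The /app/hadoop/tmp paths are taken from the question and are assumptions about your layout:
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <!-- three slashes: file:// scheme plus an absolute path -->
    <value>file:///app/hadoop/tmp/hbase</value>
  </property>
  <property>
    <name>hbase.tmp.dir</name>
    <value>/app/hadoop/tmp</value>
  </property>
</configuration>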

Related

Unable to install hadoop on macosx

I am unable to run hadoop on my Mac OS X.
When I run hadoop version -> nothing happens
When I run sudo hadoop version -> it shows my hadoop version. I read somewhere that I shouldn't be using sudo to run hadoop, but anyway this tells me that my hadoop is installed?
Because hadoop version returns nothing, I am unable to start any nodes. Every time I try to start a node with start-dfs.sh, nothing happens either. Does anyone know what's happening here? I've looked through all the configuration files multiple times to ensure that I have set them right. Not sure where the problem lies.
What I did:
Followed the instructions from here
In my config files I edited:
etc/hadoop/core-site.xml:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
etc/hadoop/hdfs-site.xml:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
etc/hadoop/hadoop-env.sh
export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_181.jdk/Contents/Home
In fact, I managed to run everything before. I am not sure what I did after that, but I am unable to start the nodes again. I think it might be because I failed to stop my nodes before I shut down my computer?
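Since hadoop version only works under sudo, one thing worth checking is the ownership and permissions of the Hadoop installation and its log/data directories. A rough sketch of the checks, assuming a /usr/local/hadoop install location (the path is an assumption; adjust it to wherever the tutorial placed Hadoop):
# See which hadoop binary is on the PATH and who owns the installation
which hadoop
ls -l "$(which hadoop)"
# If the install or its log/data directories ended up owned by root
# (e.g. from earlier sudo runs), reclaim them for your user
sudo chown -R "$(whoami)" /usr/local/hadoop
# Then retry without sudo
hadoop version
start-dfs.sh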

hbase.master.port overridden programmatically?

I installed HBase from the Cloudera 5.3.3 distribution, and when I run HBase everything seems to be working fine...
When I try to assign hbase.master.port via /etc/hbase/conf/hbase-site.xml, it does not pick it up from there.
I see this in the master node info at http://MASTERNODE:60010/conf:
<property>
<name>hbase.master.port</name>
<value>0</value>
<source>programatically</source>
</property>
hbase distribution: 0.98.6-cdh5.3.3
What does this 'programmatically' mean and how can I disable/override it?
Answering my own question :(
As I just figured out, HBase standalone mode does not take hbase.master.port into account:
https://github.com/cloudera/hbase/blob/cdh4.5.0-release/src/main/java/org/apache/hadoop/hbase/LocalHBaseCluster.java#L141
standalone mode:
http://www.cloudera.com/content/cloudera/en/documentation/core/v5-2-x/topics/cdh_ig_hbase_standalone_start.html
The only way to assign a port is to set up at least a Pseudo-Distributed Mode;
see this:
http://www.cloudera.com/content/cloudera/en/documentation/core/v5-2-x/topics/cdh_ig_hbase_pseudo_configure.html
This means it's being set in some app/code. Are you using Cloudera Manager? If so, you will need to set it in Cloudera Manager. If you are not using Cloudera Manager, then you will need to modify hbase-site.xml for the HBase cluster and restart the HBase cluster.
Since version 1.4.2 there is the hbase.localcluster.assign.random.ports option, which prevents the configured ports from being overridden (see the sketch below).
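As a rough sketch (assuming HBase 1.4.2 or later, where the answer above says this option exists), disabling random port assignment in hbase-site.xml should make the local/standalone cluster honour the configured master port; the 60000 value here is just an example:
<configuration>
  <!-- Tell the local (standalone) cluster not to pick random ports -->
  <property>
    <name>hbase.localcluster.assign.random.ports</name>
    <value>false</value>
  </property>
  <!-- Example master port; pick whatever port you actually want -->
  <property>
    <name>hbase.master.port</name>
    <value>60000</value>
  </property>
</configuration>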

How do I run sqlline with Phoenix?

When I try to run Phoenix's sqlline.py localhost command, I get
WARN util.DynamicClassLoader: Failed to identify the fs of dir hdfs://localhost:54310/hbase/lib, ignored
java.io.IOException: No FileSystem for scheme: hdfs
at org.apache.hadoop.fs.FileSystem.getFileSystemClass...
and nothing else happens. I also could not get Squirrel to work (it freezes when I click 'list drivers').
As per these instructions, I have copied phoenix-4.2.1-server.jar to my hbase/lib folder and restarted hbase. I have also copied core-site.xml and hbase-site.xml to my phoenix/bin directory.
I have not added 'the phoenix-[version]-client.jar to the classpath of any Phoenix client'
since I do not know what this refers to.
I am using HBase 0.98.6.1-hadoop2, Phoenix 4.2.1 and hadoop 2.2.0.
I fixed the same issue by adding the following setting in
${PHOENIX_HOME}/bin/hbase-site.xml
<property>
<name>fs.hdfs.impl</name>
<value>org.apache.hadoop.hdfs.DistributedFileSystem</value>
</property>
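For reference, the setup described in the question plus the fix above, as a rough shell sketch; the $PHOENIX_HOME, $HBASE_HOME and $HADOOP_CONF_DIR locations are assumptions about your layout:
# Copy the Phoenix server jar into HBase's lib folder and restart HBase
cp $PHOENIX_HOME/phoenix-4.2.1-server.jar $HBASE_HOME/lib/
$HBASE_HOME/bin/stop-hbase.sh && $HBASE_HOME/bin/start-hbase.sh
# Copy core-site.xml and hbase-site.xml next to sqlline, then add the
# fs.hdfs.impl property from the answer above to $PHOENIX_HOME/bin/hbase-site.xml
cp $HADOOP_CONF_DIR/core-site.xml $HBASE_HOME/conf/hbase-site.xml $PHOENIX_HOME/bin/
# Connect
$PHOENIX_HOME/bin/sqlline.py localhost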

How to use Hive without hadoop

I am new to NoSQL solutions and want to play with Hive. But installing HDFS/Hadoop takes a lot of resources and time (maybe it is just my lack of experience, but I have no time for this).
Are there ways to install and use Hive on a local machine without HDFS/Hadoop?
Yes, you can run Hive without Hadoop:
1. Create your warehouse on your local system.
2. Set the default fs to file:///
Then you can run Hive in local mode without a Hadoop installation.
In hive-site.xml:
<?xml version="1.0"?> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>hive.metastore.schema.verification</name>
<value>false</value>
</property>
<property>
<!-- this should eventually be deprecated since the metastore should supply this -->
<name>hive.metastore.warehouse.dir</name>
<value>file:///tmp</value>
<description></description>
</property>
<property>
<name>fs.default.name</name>
<value>file:///tmp</value>
</property>
</configuration>
If you just want to try Hive out before making a decision, you can use a preconfigured VM as @Maltram suggested (Hortonworks, Cloudera, IBM and others all offer such VMs).
What you should keep in mind is that you will not be able to use Hive in production without Hadoop and HDFS, so if that is a problem for you, you should consider alternatives to Hive.
You can't. Just download Hive and run:
./bin/hiveserver2
and you get:
Cannot find hadoop installation: $HADOOP_HOME or $HADOOP_PREFIX must be set or hadoop must be in the path
Hadoop is like a core, and Hive needs some libraries from it.
Update: this answer is out of date. With Hive on Spark it is no longer necessary to have HDFS support.
Hive requires HDFS and MapReduce, so you will need them. The other answer has some merit in recommending a simple, pre-configured means of getting all of the components set up for you.
But the gist of it is: Hive needs Hadoop and MapReduce, so to some degree you will need to deal with them.
The top answer works for me, but it needs a few more setup steps. I spent quite some time searching around to fix multiple problems before I finally got it set up. Here I summarize the steps from scratch:
Download hive, decompress it
Download hadoop, decompress it, put it in the same parent folder as hive
Set up hive-env.sh:
$ cd hive/conf
$ cp hive-env.sh.template hive-env.sh
Add the following environment variables in hive-env.sh (change the paths according to your actual Java/Hadoop versions):
JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_281.jdk/Contents/Home
export PATH=$JAVA_HOME/bin:$PATH
export HADOOP_HOME=${bin}/../../hadoop-3.3.1
Set up hive-site.xml:
$ cd hive/conf
$ cp hive-default.xml.template hive-site.xml
Replace all the ${system:***} variables with constant paths (not sure why these are not recognized on my system).
Set the database path to local with the following properties (copied from the top answer):
<configuration>
<property>
<name>hive.metastore.schema.verification</name>
<value>false</value>
</property>
<property>
<!-- this should eventually be deprecated since the metastore should supply this -->
<name>hive.metastore.warehouse.dir</name>
<value>file:///tmp</value>
<description></description>
</property>
<property>
<name>fs.default.name</name>
<value>file:///tmp</value>
</property>
</configuration>
Set up hive-log4j2.properties (optional, good for troubleshooting):
cp hive-log4j2.properties.template hive-log4j2.properties
Replace all the ${sys:***} variables with constant paths.
Set up metastore_db.
If you run hive directly and then execute any DDL, you will get an error like:
FAILED: HiveException org.apache.hadoop.hive.ql.metadata.HiveException:MetaException(message:Hive metastore database is not initialized. Please use schematool (e.g. ./schematool -initSchema -dbType ...) to create the schema. If needed, don't forget to include the option to auto-create the underlying database in your JDBC connection string (e.g. ? createDatabaseIfNotExist=true for mysql))
In that case you need to recreate metastore_db with the following commands:
$ cd hive/bin
$ rm -rf metastore_db
$ ./schematool -initSchema -dbType derby
Start hive
$ cd hive/bin
$ ./hive
Now you should be able to run Hive on your local file system. One thing to note: the metastore_db will always be created in your current directory. If you start Hive from a different directory, you need to recreate it.
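As a quick smoke test that the local metastore and warehouse work (the table name here is just an example), you can run a small DDL statement straight from the shell:
$ cd hive/bin
$ ./hive -e "CREATE TABLE IF NOT EXISTS smoke_test (id INT); SHOW TABLES;"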
It's completely normal to use Hive without HDFS, although there are a few details one should keep in mind.
As a few commenters mentioned above, you'll still need some .jar files from hadoop-common.
As of today (December 2020) it's difficult to run the Hive/Hadoop 3 pair; use stable Hadoop 2 with Hive 2.
Make sure POSIX permissions are set correctly, so your local Hive can access the warehouse and, eventually, the Derby database location.
Initialize your database with a manual call to schematool.
You can use a site.xml file pointing to the local POSIX filesystem, but you can also set those options in the HIVE_OPTS environment variable (see the sketch after this list).
I covered that, with examples of the errors I've seen, in my blog post.
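A minimal sketch of the HIVE_OPTS approach mentioned above, passing the same properties used in the hive-site.xml examples as -hiveconf options (the file:///tmp locations are taken from those examples and are assumptions about your layout):
export HIVE_OPTS="-hiveconf hive.metastore.schema.verification=false \
  -hiveconf hive.metastore.warehouse.dir=file:///tmp \
  -hiveconf fs.default.name=file:///tmp"
hive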

Why do we need to format HDFS every time we restart the machine?

I have installed Hadoop in pseudo distributed mode on my laptop, OS is Ubuntu.
I have changed the paths where Hadoop stores its data (by default Hadoop stores data in the /tmp folder).
The hdfs-site.xml file looks like this:
<property>
<name>dfs.data.dir</name>
<value>/HADOOP_CLUSTER_DATA/data</value>
</property>
Now whenever I restart the machine and try to start the Hadoop cluster using the start-all.sh script, the data node never starts. I confirmed that the data node does not start by checking the logs and by using the jps command.
Then I:
1. Stopped the cluster using the stop-all.sh script.
2. Formatted HDFS using the hadoop namenode -format command.
3. Started the cluster using the start-all.sh script.
Now everything works fine, even if I stop and start the cluster again. The problem occurs only when I restart the machine and try to start the cluster.
Has anyone encountered a similar problem? Why is this happening, and how can it be solved?
By changing dfs.datanode.data.dir away from /tmp you indeed made the data (the blocks) survive across a reboot. However, there is more to HDFS than just blocks. You need to make sure all the relevant dirs point away from /tmp, most notably dfs.namenode.name.dir (I can't tell what other dirs you have to change, as it depends on your config, but the namenode dir is mandatory and could also be sufficient).
I would also recommend using a more recent Hadoop distribution. BTW, the 1.1 namenode dir setting is dfs.name.dir (see the sketch below).
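A rough sketch of an hdfs-site.xml for Hadoop 1.x, using the 1.x property names mentioned above and reusing the question's /HADOOP_CLUSTER_DATA directory (the exact paths are assumptions):
<configuration>
  <!-- Hadoop 1.x property names; keep both dirs out of /tmp -->
  <property>
    <name>dfs.name.dir</name>
    <value>/HADOOP_CLUSTER_DATA/name</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/HADOOP_CLUSTER_DATA/data</value>
  </property>
</configuration>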
For those who use Hadoop 2.0 or above, the configuration property names may be different.
As this answer points out, go to the /etc/hadoop directory of your hadoop installation.
Open the file hdfs-site.xml. This user configuration overrides the default Hadoop configuration, which is loaded by the Java classloader first.
Add dfs.namenode.name.dir property and set a new namenode dir (default is file://${hadoop.tmp.dir}/dfs/name).
Do the same for dfs.datanode.data.dir property (default is file://${hadoop.tmp.dir}/dfs/data).
For example:
<property>
<name>dfs.namenode.name.dir</name>
<value>/Users/samuel/Documents/hadoop_data/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/Users/samuel/Documents/hadoop_data/data</value>
</property>
Another property where a tmp dir appears is dfs.namenode.checkpoint.dir. Its default value is file://${hadoop.tmp.dir}/dfs/namesecondary.
If you want, you can also easily add this property:
<property>
<name>dfs.namenode.checkpoint.dir</name>
<value>/Users/samuel/Documents/hadoop_data/namesecondary</value>
</property>
