MR scratch error on hive/hbase integration

I'm running Hive and HBase on a two-node Hadoop cluster.
I'm using hadoop-0.20.205.0, hive-0.9.0, hbase-0.92.0, and zookeeper-3.4.2.
Hive and HBase work fine separately. Then I followed this guide to integrate Hive and HBase:
https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration
Hive started without errors, and I created the sample table:
CREATE TABLE hbase_table_1(key int, value string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")
TBLPROPERTIES ("hbase.table.name" = "xyz");
SHOW TABLES in Hive and list or scan in the HBase shell work fine.
But when I run select * from hbase_table_1; in Hive, I get this error:
2012-09-12 11:25:56,975 ERROR ql.Driver (SessionState.java:printError(400)) - FAILED: Hive Internal Error: java.lang.RuntimeException(Error while making MR scratch directory - check filesystem config (null))
java.lang.RuntimeException: Error while making MR scratch directory - check filesystem config (null)
...
Caused by: java.lang.IllegalArgumentException: Wrong FS: hdfs://10.10.10.15:54310/tmp/hive-hadoop/hive_2012-09-12_11-25-56_602_1946700606338541381, expected: hdfs://hadoop01:54310
It says the FS is wrong, but I don't think it's right to configure the FS with such a path. And where should I configure it?
Here are my config files. The IP address of hadoop01 is 10.10.10.15.
hbase-site.xml
<configuration>
<property>
<name>hbase.zookeeper.property.clientPort</name>
<value>2222</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>10.10.10.15</value>
<description>The directory shared by RegionServers.
</description>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/home/hadoop/datas/zookeeper</value>
<description>Property from ZooKeeper's config zoo.cfg.
The directory where the snapshot is stored.
</description>
</property>
<property>
<name>hbase.rootdir</name>
<value>hdfs://hadoop01:54310/hbase</value>
<description>The directory shared by RegionServers.
</description>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
<description>The mode the cluster will be in. Possible values are
false: standalone and pseudo-distributed setups with managed Zookeeper
true: fully-distributed with unmanaged Zookeeper Quorum (see hbase-env.sh)
</description>
</property>
</configuration>
Can anyone help, please?

I solved it myself.
Modify $HADOOP_HOME/conf/core-site.xml and change fs.default.name from the IP address to the hostname, like this:
<property>
<name>fs.default.name</name>
<value>hdfs://hadoop01:54310/</value>
</property>
Make sure that this property and the hbase.rootdir property in hbase-site.xml use the same hostname (or the same IP address).
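The error message explains why the hostname/IP mismatch matters. Hive built the scratch path as hdfs://10.10.10.15:54310/..., but the file system it was checked against identifies itself as hdfs://hadoop01:54310. Hadoop rejects a path whose scheme and authority (host:port) do not match the file system's own URI.

The Java sketch below models that check as a case-insensitive string comparison of scheme and authority, with no DNS lookup. This is an assumed simplification, not Hadoop's actual code; the real check also handles paths that carry no authority. The class and method names are hypothetical.

```java
import java.net.URI;

/**
 * Simplified, hypothetical model of the scheme/authority comparison behind
 * "Wrong FS". Not Hadoop's real implementation: paths without an authority,
 * which Hadoop resolves against the default file system, are not handled.
 */
public class WrongFsCheck {

    // True when the path URI appears to belong to the file system URI.
    // Scheme and authority (host:port) are compared as plain strings,
    // so an IP address and a hostname for the same machine do NOT match.
    static boolean sameFileSystem(URI fsUri, URI pathUri) {
        boolean sameScheme = fsUri.getScheme() != null
                && fsUri.getScheme().equalsIgnoreCase(pathUri.getScheme());
        boolean sameAuthority = fsUri.getAuthority() != null
                && fsUri.getAuthority().equalsIgnoreCase(pathUri.getAuthority());
        return sameScheme && sameAuthority;
    }

    public static void main(String[] args) {
        URI fs = URI.create("hdfs://hadoop01:54310");
        // Scratch dir qualified with the IP address (fs.default.name before the fix)
        URI byIp = URI.create("hdfs://10.10.10.15:54310/tmp/hive-hadoop");
        // Scratch dir qualified with the hostname (fs.default.name after the fix)
        URI byName = URI.create("hdfs://hadoop01:54310/tmp/hive-hadoop");

        System.out.println(sameFileSystem(fs, byIp));   // false -> "Wrong FS"
        System.out.println(sameFileSystem(fs, byName)); // true
    }
}
```

Under this model, the fix works because both URIs then spell the authority identically. Using the IP address in both places would work just as well, as long as the two are consistent.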

Related

Unable to access databases created with hive and run queries on hue

I would like to use Hue as a visualization interface for Hive. HiveServer2 starts fine, and I can work with Hive from the command line without problems.
My Hadoop installation also works (a single node running on localhost). I managed to configure HDFS for Hue, and I can easily browse HDFS files in the Hue interface. But my big problem, for weeks now, is running a Hive query from Hue, even though I configured it according to what I found on the internet. I cannot get it to work and I am stuck.
Your help would be really appreciated.
This is my hive-site.xml:
<?xml version="1.0"?>
<configuration>
<property>
<name>hive.exec.local.scratchdir</name>
<value>/tmp/hive_temp</value>
<description>Local scratch space for Hive jobs</description>
</property>
<property>
<name>hive.execution.engine</name>
<value>mr</value>
<description> Expects one of [mr, tez, spark]. Chooses execution engine. Options are: mr (Map reduce, default)</description>
</property>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://localhost/metastore?createDatabaseIfNotExist=true&amp;useSSL=false</value>
<description>metadata is stored in a MySQL server</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>Driver class name for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hive</value>
<description>Username to use against metastore database</description>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>hivepassword</value>
<description>password to use against metastore database</description>
</property>
<property>
<name>hive.exec.scratchdir</name>
<value>/tmp/hive_tmp</value>
<description>HDFS root scratch dir for Hive jobs which gets created with write all (733) permission</description>
</property>
<property>
<name>hive.exec.scratchdir</name>
<value>/user/hive/warehouse</value>
<description>HDFS root scratch dir for Hive jobs which gets created with write all (733) permission</description>
</property>
</configuration>
And this is the Hive configuration in Hue's pseudo-distributed.ini:
# Host where HiveServer2 is running.
# If Kerberos security is enabled, use fully-qualified domain name (FQDN).
hive_server_host=localhost
# Port where HiveServer2 Thrift server runs on.
hive_server_port=10002
# Hive configuration directory, where hive-site.xml is located
hive_conf_dir=/usr/local/hive/conf

HBase Region servers will not start in Hadoop HA environment

I've created an HBase cluster in a Hadoop HA cluster. My region servers are failing to start with the following exception in the logs:
2017-09-12 11:41:32,116 ERROR [regionserver/my.hostname.com/10.10.30.28:16020] regionserver.HRegionServer: Failed init
java.io.IOException: Failed on local exception: java.net.SocketException: Invalid argument; Host Details : local host is: "my.hostname.com/10.10.30.28"; destination host is: "0.0.0.1":8020;
I'm pretty sure the problem is caused by the Hadoop HA configuration.
I think HBase doesn't understand the nameservice and treats it as an IP address.
Excerpt from core-site.xml:
<property>
<name>fs.defaultFS</name>
<value>hdfs://001</value>
<description>NameNode URI</description>
</property>
Excerpt from hdfs-site.xml:
<property>
<name>dfs.nameservices</name>
<value>001</value>
</property>
My hbase-site.xml:
<configuration>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.rootdir</name>
<value>hdfs://001/hbase</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>zk1:2181,zk2:2181,zk3:2181</value>
</property>
</configuration>
Help?

It was a silly mistake: HBase was missing the path to the Hadoop configuration files, so it never saw the HA settings in hdfs-site.xml. I simply added HADOOP_CONF_DIR to hbase-env.sh.
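Here is a hedged explanation of why HADOOP_CONF_DIR fixed it. 001 is a logical nameservice name that is defined only in hdfs-site.xml. An HDFS client that never loads that file cannot know 001 is not a real host. It therefore treats hdfs://001/hbase like any host URI and connects to 001 on the default NameNode RPC port 8020, which lines up with destination host "0.0.0.1":8020 in the log.

In HDFS HA setups, each nameservice is expected to have a dfs.client.failover.proxy.provider.<nameservice> entry in hdfs-site.xml. The Java sketch below uses the presence of that key as the deciding test. It is a simplified, hypothetical model (the class, the method, and the Map-based config are illustrative), not HDFS's actual client code.

```java
import java.net.URI;
import java.util.Map;

/**
 * Hypothetical, simplified model of how an HDFS client might decide whether
 * the authority in an hdfs:// URI is a logical HA nameservice or a real host.
 * The Map stands in for the configuration the client has actually loaded.
 */
public class NameserviceCheck {

    // Default NameNode RPC port, used when the URI has no explicit port.
    static final int DEFAULT_NN_PORT = 8020;

    static String describeTarget(URI fsUri, Map<String, String> conf) {
        // getAuthority() returns "001" for hdfs://001/hbase (IPv6 not handled here).
        String authority = fsUri.getAuthority();
        int colon = authority.lastIndexOf(':');
        String host = colon >= 0 ? authority.substring(0, colon) : authority;
        int port = colon >= 0
                ? Integer.parseInt(authority.substring(colon + 1))
                : DEFAULT_NN_PORT;

        // HA client settings (normally read from hdfs-site.xml) mark a nameservice.
        if (conf.containsKey("dfs.client.failover.proxy.provider." + host)) {
            return "logical nameservice " + host;
        }
        // Otherwise the authority is treated as an ordinary host:port.
        return "physical host " + host + ":" + port;
    }

    public static void main(String[] args) {
        URI rootDir = URI.create("hdfs://001/hbase");

        // HBase without HADOOP_CONF_DIR: hdfs-site.xml was never loaded.
        Map<String, String> withoutHdfsSite = Map.of();
        // HBase with HADOOP_CONF_DIR: the HA client settings are visible.
        // (Map.of requires Java 9 or later.)
        Map<String, String> withHdfsSite = Map.of(
                "dfs.client.failover.proxy.provider.001",
                "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider");

        System.out.println(describeTarget(rootDir, withoutHdfsSite)); // physical host 001:8020
        System.out.println(describeTarget(rootDir, withHdfsSite));    // logical nameservice 001
    }
}
```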

Hadoop - Tables not displayed in Hive

The problem I am facing is this:
Every time I log in to the Hive CLI, all the databases and tables I created are gone. I can see them in the warehouse directory in the Hadoop web interface, but they do not show up in the CLI. Please help me resolve this issue.
I am using Hadoop 1.0.4 and Hive 1.2.1.
I have configured the warehouse dir, temp dir, and Derby metastore dir in hive-site.xml as per the documentation.
Properties in hive-site.xml:
<property>
<name>hive.exec.scratchdir</name>
<value>/tmp/hive</value>
<description>HDFS root scratch dir for Hive jobs which gets created with write all (733) permission. For each connecting user, an HDFS scratch dir: ${hive.exec.scratchdir}/&lt;username&gt; is created, with ${hive.scratch.dir.permission}.</description>
</property>
<property>
<name>hive.exec.local.scratchdir</name>
<value>/tmp/hadoop/hive</value>
<description>Local scratch space for Hive jobs</description>
</property>
<property>
<name>hive.downloaded.resources.dir</name>
<value>/tmp/hadoop/hive</value>
<description>Temporary local directory for added resources in the remote file system.</description>
</property>
<property>
<name>hive.scratch.dir.permission</name>
<value>700</value>
<description>The permission for the user specific scratch directories that get created.</description>
</property>
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/user/hive/warehouse</value>
<description>location of default database for the warehouse</description>
</property>
<property>
<name>hive.metastore.uris</name>
<value/>
<description>Thrift URI for the remote metastore. Used by metastore client to connect to remote metastore.</description>
</property>
<property>
<name>hive.metastore.connect.retries</name>
<value>3</value>
<description>Number of retries while opening a connection to metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:derby:;databaseName=/usr/hadoop/metastore_db;create=true</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>

Issues in saving bulk data in HBase in Pseudo-distributed mode

I am setting up CDH4 in pseudo-distributed mode.
I have set up Hadoop and, as suggested in the CDH4 installation guide, completed the HDFS demo successfully.
I have also set up Hive and HBase.
To populate data in HBase, I have written a Java client that loads bulk data into HBase (around 1M rows in each of 4 tables).
Now I am facing two issues:
While the Java client is loading the dummy data into HBase, the RegionServer shuts down after around 450,000 rows have been inserted in total.
Using Hive, I am not able to access tables created in HBase; worse, I cannot even create tables from the Hive shell. The HBase shell, though, shows me the data and table structure (whatever was generated before the RegionServer shut down).
I have seen other posts about the same problem. It seems the second issue is related to my /etc/hosts or hive-site.xml, so I am pasting the contents of both.
/etc/hosts
198.251.79.225 u17162752.onlinehome-server.com u17162752
198.251.79.225 default-domain.com
198.251.79.225 hbase.zookeeper.quorum localhost
198.251.79.225 cloudera-vm # Added by NetworkManager
127.0.0.1 localhost.localdomain localhost
127.0.1.1 cloudera-vm-local localhost
hive-site.xml
<configuration>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://localhost:3306/metastore</value>
<description>the URL of the MySQL database</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hive</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>mypassword</value>
</property>
<property>
<name>datanucleus.autoCreateSchema</name>
<value>false</value>
</property>
<property>
<name>datanucleus.fixedDatastore</name>
<value>true</value>
</property>
<property>
<name>hive.metastore.uris</name>
<value>thrift://127.0.0.1:9083</value>
<description>IP address (or fully-qualified domain name) and port of the metastore host</description>
</property>
<property>
<name>hive.support.concurrency</name>
<description>Enable Hive's Table Lock Manager Service</description>
<value>true</value>
</property>
<property>
<name>hive.zookeeper.quorum</name>
<description>Zookeeper quorum used by Hive's Table Lock Manager</description>
<value>zk1.myco.com,zk2.myco.com,zk3.myco.com</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<description>Zookeeper quorum used by Hive's Table Lock Manager</description>
<value>zk1.myco.com,zk2.myco.com,zk3.myco.com</value>
</property>
<property>
<name>hive.server2.authentication</name>
<value>NOSASL</value>
</property>
</configuration>
These issues are keeping me from accomplishing the task I am supposed to do.
Thanks in advance
Abhiskek
PS: This is my first post to this forum, so apologies for anything inappropriate you might find! Thanks for bearing with me.
Hi Tariq, thanks for your response. I have somehow managed to get past this, but now I am facing another issue.
I already have 4 tables in HBase, for which I want to create external tables in the Hive shell. But running the CREATE EXTERNAL TABLE commands in the Hive shell gives the following error:
'ERROR: org.apache.hadoop.hbase.client.NoServerForRegionException: No server address listed in -ROOT- for region .META.,,1.1028785192 containing row'
This error also appears when I do something in the HBase shell.
The other error that comes with it in the HBase shell is related to ZooKeeper. Stack trace:
'WARN zookeeper.ZKUtil: catalogtracker-on- org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation#6a9a56bf- 0x1413718482c0010 Unable to get data of znode /hbase/unassigned/1028785192
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/unassigned/1028785192'
Please help. Thanks!

Impala cannot find com.mysql.jdbc.Driver

I'm trying to set up Cloudera Impala with CDH4 in pseudo-distributed mode on Red Hat 5. I have Hive using JDBC to connect to a MySQL metastore, but I'm having trouble setting up Impala with JDBC. I've been following the instructions here: http://www.cloudera.com/content/cloudera-content/cloudera-docs/Impala/latest/Installing-and-Using-Impala/ciiu_impala_jdbc.html
I've extracted the JARs to a directory and included that directory in $CLASSPATH. I've also included /usr/lib/hive/lib in $CLASSPATH, which has mysql-connector-java-5.1.25-bin.jar.
In both my Hive and Impala conf directories, I have hive-site.xml including the following properties:
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://localhost/metastore</value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hiveuser</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>password</value>
</property>
<property>
<name>datanucleus.autoCreateSchema</name>
<value>false</value>
</property>
<property>
<name>datanucleus.fixedDatastore</name>
<value>true</value>
</property>
But when I run sudo service impala-server restart, the server log has this error:
ERROR common.MetaStoreClientPool: Error initializing Hive Meta Store client
javax.jdo.JDOFatalInternalException: Error creating transactional connection factory
which, according to the log, is caused by this:
Caused by: org.datanucleus.store.rdbms.datasource.DatastoreDriverNotFoundException: The specified datastore driver ("com.mysql.jdbc.Driver") was not found in the CLASSPATH. Please check your CLASSPATH specification, and the name of the driver.
at org.datanucleus.store.rdbms.datasource.dbcp.DBCPDataSourceFactory.makePooledDataSource(DBCPDataSourceFactory.java:80)
at org.datanucleus.store.rdbms.ConnectionFactoryImpl.initDataSourceTx(ConnectionFactoryImpl.java:144)
... 57 more
Is there any step I'm missing to configure Impala with JDBC?

I fixed this by copying mysql-connector-java-5.1.25-bin.jar to /var/lib/impala. For some reason, the startup script points the classpath at that directory when looking for the connector JAR.
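A tiny Java check like the sketch below can confirm whether a JDBC driver class is visible on a given classpath. It tries Class.forName on the driver name, which is the class DataNucleus reported as missing. The caveat is that it only tests the classpath of the JVM that runs it, so it should be launched with the same classpath the Impala service uses.

A likely reason that setting $CLASSPATH in an interactive shell had no effect is that sudo and service usually start init scripts with a reduced environment. The class name DriverCheck is hypothetical.

```java
/**
 * Hypothetical helper: reports whether a JDBC driver class can be loaded
 * by this JVM. It checks only the classpath this JVM was started with.
 */
public class DriverCheck {

    // Try to load (and initialize) the named class; false if it is missing
    // or cannot be linked (e.g. a dependency of the class is absent).
    static boolean isLoadable(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String driver = args.length > 0 ? args[0] : "com.mysql.jdbc.Driver";
        // Show which classpath was actually in effect for this check
        System.out.println("java.class.path = " + System.getProperty("java.class.path"));
        System.out.println(driver + (isLoadable(driver) ? " found" : " NOT found"));
    }
}
```

For example, after compiling, java -cp "/var/lib/impala/*:." DriverCheck com.mysql.jdbc.Driver tests the directory the answer mentions. The quoted * is the standard JVM classpath wildcard for all JARs in a directory.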
