CDAP connectivity with Apache Hive - hadoop

I have a Linux box with CDAP installed, and I have configured the Hive import and export plugins in CDAP.
On the same machine, I have Hadoop with Hive installed. I am able to start all of the Hadoop services (verified using the jps command) and to create and query Hive tables.
The actual problem is when I try to connect to Hive from CDAP. It is unable to connect and throws the error message below.
Connection string: jdbc:hive2://localhost:10000/defaultdb;auth=deligateToken;
Output Directory: /tmp/hive (this directory already exists)
Error:
I tried changing the connection string:
Option 1: jdbc:hive2://localhost:10000/defaultdb;auth=deligateToken; - connection refused error
Option 2: jdbc:hive2:// - unable to instantiate error
Option 3: jdbc:hive2://localhost:10001/defaultdb;auth=deligateToken; - still not working
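Two things stand out. First, the delegation-token parameter is usually spelled auth=delegationToken, not auth=deligateToken. Second, a quick way to rule CDAP out is to open the same URL with a bare JDBC client. Below is a minimal sketch, assuming HiveServer2 is listening on localhost:10000, a database named defaultdb exists, and plain (non-token) authentication is acceptable for the test:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveConnectionCheck {
    public static void main(String[] args) throws Exception {
        // hive-jdbc and its dependencies must be on the classpath.
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        // Same host, port, and database the CDAP plugin points at.
        String url = "jdbc:hive2://localhost:10000/defaultdb";
        try (Connection conn = DriverManager.getConnection(url, "hive", "");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SHOW TABLES")) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }
}

If this fails with the same "connection refused", the problem is HiveServer2 itself (not running, or bound to a different port) rather than CDAP.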

Related

Greenplum issue - HDFS Protocol Installation for GPHDFS access to HDP 2.x cluster

I am getting an error when trying to read an external table using the GPHDFS protocol. Additionally, I am not able to access HDP 2.x files from the Greenplum cluster.
The error:
devdata=# select count(*) from schema.ext_table;
ERROR: external table gphdfs protocol command ended with error. Error occurred during initialization of VM (seg5 slice1 datanode0:40001 pid=13407)
DETAIL:
java.lang.OutOfMemoryError: unable to create new native thread
Command: 'gphdfs://Authorithy/path
More symptoms:
Not able to run the Hadoop list-files command from the gpadmin user on the Greenplum cluster, that is:
gpadmin$ hdfs dfs -ls hdfs://namenode/file/path
What we tried:
Checked the settings related to the gphdfs VM parameters.
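The error text itself offers a hint (an observation, not a confirmed fix for this cluster): java.lang.OutOfMemoryError: unable to create new native thread means the operating system refused to create a thread for the JVM, which on a segment host frequently traces back to the per-user process limit for gpadmin rather than to heap settings. It is worth checking that limit on each segment host, in the same style as the command above:
gpadmin$ ulimit -u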

Hive - issues while starting

I have been using Hive for some time now on Ubuntu, with Hadoop in pseudo-distributed mode. Today, out of nowhere, I am getting an error while starting the Hive shell. I have not made any changes to the configuration at all:
Caused by: MetaException(message:Could not connect to meta store using any of the URIs provided. Most recent failure: org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused
The Hive metastore service is not running. You can start it with the command below (this applies to installations made from packages):
service hive-metastore start
For tarball installations, you can start the metastore with:
hive --service metastore &
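To confirm the metastore is actually listening once it has been started, you can probe its Thrift port; a minimal sketch, assuming the default port 9083 on localhost (adjust to match hive.metastore.uris):

import java.net.InetSocketAddress;
import java.net.Socket;

public class MetastorePortCheck {
    public static void main(String[] args) throws Exception {
        // Connect to the metastore's Thrift port with a 2-second timeout;
        // an exception here means the service is still not reachable.
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress("localhost", 9083), 2000);
            System.out.println("Metastore port is reachable");
        }
    }
}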

Beeline command issue

I am new to Hive and hopefully this is going to be an easy thing to solve for someone with more experience, but I am having trouble doing it on my own.
On my EC2 app server I am running the following command with no error:
beeline -u jdbc:hive2://master
This works on Hive 13, which was installed through a bootstrap action using the latest AMI version. 'master' points to my EMR cluster.
Then I downloaded the source for Hive 14 and built it. I have replaced my /home/hadoop/hive directory with the package that was built.
However, if I try to execute the same command, I get an error:
scan complete in 6ms
Connecting to jdbc:hive2://master
Error: Could not open client transport with JDBC Uri: jdbc:hive2://master:
Cannot open without port. (state=08S01,code=0)
Beeline version 0.14.0 by Apache Hive
0: jdbc:hive2://master (closed)>
Running it with the port provided works correctly:
beeline -u jdbc:hive2://master:10000
I would like to be able to run the command without providing the default port number.
Can anyone point me in the right direction?
Thanks,
Beeline connects to Hive in two modes:
1. Embedded mode: if the Hive client and the Hive server are on the same machine, connect beeline using the URL below:
!connect jdbc:hive2://
2. Remote mode: if the server is on one machine and the client is on another, connect beeline using the URL below:
!connect jdbc:hive2://<host>:<port>
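As the error message above ("Cannot open without port") indicates, the Hive 0.14 driver will not fill the port in for you in remote mode, so for the setup described in the question the URL has to name it explicitly (10000 being HiveServer2's default port):
!connect jdbc:hive2://master:10000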

Hive JDBC connection is giving an error if MR is involved

I am working on a Hive JDBC connection in HDP 2.1.
The code works fine for queries where MapReduce is not involved, such as "select * from tablename". The same code throws an error when the query is modified with a 'where' clause, or when specific column names are given (which runs MapReduce in the background).
I have verified the correctness of the query by executing it in the Hive CLI.
I have also verified the read/write permissions on the table for the user through which I am running the Java JDBC code.
The error is as follows:
java.sql.SQLException: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:275)
at org.apache.hive.jdbc.HiveStatement.executeQuery(HiveStatement.java:355)
at com.testing.poc.hivejava.HiveJDBCTest.main(HiveJDBCTest.java:25)
Today I also got this exception when I submitted a Hive task from Java.
The output:
hive_driver: org.apache.hive.jdbc.HiveDriver
hive_url: jdbc:hive2://10.174.242.28:10000/default
get connection success (Hive connection obtained successfully!)
java.sql.SQLException: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
I tried executing the SQL directly in Hive and it worked well. Then I looked at the log in /var/log/hive/hadoop-cmf-hive-HIVESERVER2-cloud000.log.out and found the cause of the error:
Job Submission failed with exception 'org.apache.hadoop.security.AccessControlException(Permission denied: user=anonymous, access=WRITE, inode="/user":hdfs:supergroup:drwxr-xr-x
Solution
I used the following command:
sudo -u hdfs hadoop fs -chmod -R 777 /
This solved the error!
hive_driver: org.apache.hive.jdbc.HiveDriver
hive_url: jdbc:hive2://cloud000:10000/default
get connection success
(Hive connection obtained successfully!)
Heart beat
(Insert executed successfully!)
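A note on the fix above: chmod -R 777 / opens up the entire HDFS namespace and works, but it is a blunt instrument. Since the log shows the job being rejected for user=anonymous, a narrower alternative is simply to open the JDBC connection as a named user that has write access to its own HDFS home directory. A minimal sketch (the user "hdfs" and the table name are placeholders):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class HiveAsNamedUser {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        // Passing a username here means the MapReduce job no longer
        // runs as "anonymous" and can write under /user/<name>.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:hive2://cloud000:10000/default", "hdfs", "");
             Statement stmt = conn.createStatement()) {
            stmt.execute("select count(*) from tablename");
        }
    }
}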
If you use beeline to execute the same queries, do you see the same behaviour as you get while running your test program?
The beeline client also uses the open source JDBC driver and connects to Hive server, which is similar to what you do in your program. HiveCLI on the other hand has Hive embedded in it and does not connect to a remote Hive server by default. You can use HiveCLI to connect to a remote Hive Server 1 but I don't believe you can use it to connect to Hive Server2 (use beeline for Hive Server 2).
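For example, something like the following (hostname, user, and table are placeholders) would exercise the same code path as the JDBC test program:
beeline -u jdbc:hive2://cloud000:10000/default -n hdfs -e "select count(*) from tablename"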
For this error, you can take a look at the hive.log and hiveserver2.log on the server side to get more insight into what might have caused the MapReduce error.
Hope this helps.
Cheers,
Holman

Hadoop HDFS connection in PowerCenter

I have installed Cloudera's Hadoop QuickStart VM and I am attempting to pass records from my local database to HDFS using a PowerCenter mapping.
I've set up the Hadoop_HDFS_Connection in PowerCenter Workflow Manager, but when I run the workflow I get the following error: "Unable to establish a connection with the specified HDFS host". It gives a "java.net.ConnectException" error when trying to connect to the host name and port.
I think the error may be in the hostname notation. In Cloudera Manager on the VM, the host name is listed as 'localhost.localdomain', but I don't know how to translate this into the PowerCenter connection settings.
Anybody got this connection to work?
Many thanks.
Brian
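One thing worth checking (a suggestion, since no answer was recorded here): 'localhost.localdomain' only resolves from inside the QuickStart VM itself. From the PowerCenter host, the usual approach is to map the VM's IP address to that name in /etc/hosts and point the connection at the NameNode RPC port, which defaults to 8020 on CDH:
# /etc/hosts on the PowerCenter machine (the IP below is a placeholder)
192.168.56.101   localhost.localdomain
The HDFS connection would then use hdfs://localhost.localdomain:8020.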
