DynamoDBStorageHandler Hive connector - hadoop

when I run this command from Hive shell in our EMR cluster:
CREATE EXTERNAL TABLE my_db.my_table
(col1 string, ...)
STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'
TBLPROPERTIES (
"dynamodb.table.name" = "table_name",
"dynamodb.column.mapping" = "col1:col1 ... "
);
I get the following error:
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Got exception: org.apache.hadoop.net.ConnectTimeoutException Call From ip-xx-xx-xx-xxx.ec2.internal/xx.xx.xx.xxx to ip-yy-yy-yy-yyy.ec2.internal:8020 failed on socket timeout exception: org.apache.hadoop.net.ConnectTimeoutException: 20000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=ip-yy-yy-yy-yyy.ec2.internal/yy.yy.yy.yyy:8020]; For more details see: http://wiki.apache.org/hadoop/SocketTimeout)
The EMR cluster is in VPC.
I Tried editing the Inbound/Outbound rules of the security group of the master node, so far with no success.
Thanks, Michael

AWS Support were able to assist me: the problem was that the database location in Glue was pointing to old HDFS address ip-yy-yy-yy-yyy.ec2.internal (different then xx.xx.xx.xxx), according to the master's node of the previous cluster. I changed to location to point to S3 and the problem was resolved.

Related

Create Hive table with Phoenix handler throws NoClassDefFoundError: org.apache.hadoop.hbase.security.SecurityInfo

I want to create hive table on top of phoenix table in emr.
I am facing a NoClassDefFoundError: org.apache.hadoop.hbase.security.SecurityInfo
What I have done so far:
I followed the instructions from https://phoenix.apache.org/hive_storage_handler.html and added phoenix-hive-5.0.0-HBase-2.0.jar to hive-env.sh as well as in hive-site.xml .
Restarted the hive service systemctl restart hive-server2.service
Restarted the metastore systemctl restart hive-hcatalog-server.service
Executed create table command from hue:
create external table ext_table (
i1 int,
s1 string,
f1 float,
d1 decimal
)
STORED BY 'org.apache.phoenix.hive.PhoenixStorageHandler'
TBLPROPERTIES (
"phoenix.table.name" = "ext_table",
"phoenix.zookeeper.quorum" = "localhost",
"phoenix.zookeeper.znode.parent" = "/hbase",
"phoenix.zookeeper.client.port" = "2181",
"phoenix.rowkeys" = "i1",
"phoenix.column.mapping" = "i1:i1, s1:s1, f1:f1, d1:d1"
);
Got an exception: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:java.lang.NoClassDefFoundError: Could not initialize class org.apache.hadoop.hbase.security.SecurityInfo)
I am using emr-6.1.0
HBase 2.2.5
Phoenix 5.0.0
Hive 3.1.2
Anybody has an idea what can be the issue?
Update
I followed the advice from #leftjoin and used ADD JAR from hue to add phoenix-hive jar to classpath. Then I faced jar compatibility issue caused by phoenix hive connector that I use:
phoenix-hive-5.0.0-HBase-2.0.jar.
The newer versions of phoenix connectors are not archived into single bundle that could be downloaded from phoenix website . Instead
the connectors are located now in github repo.
I built the new phoenix-hive connector (versions: Phoenix->5.1.0, Hive->3.1.2, Hbase->2.2) and used it to create the Hive table.
As a result I got another exception, which I am not able to fix:
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. org/apache/phoenix/compat/hbase/CompatSteppingSplitPolicy
I think it is still somehow connected to dependency issues. But no clue what is exactly.
As a workaround put jar into hdfs and execute ADD JAR command before create table and query:
ADD JAR hdfs://path/to/your/jar/phoenix-hive-5.0.0-HBase-2.0.jar;

Greenplum connector giving hostname resolution error in ibm Datas stage

GreenPlum Connection error:
Greenplum_Connector_1,0: The following SQL statement failed: INSERT INTO GPCC_ET_20200903233319813_84175_0 select * from table .
The statement reported the following reason: [SQLCODE=HY000][Native=56,966,976] [IBM(DataDirect OEM)][ODBC Greenplum Wire Protocol driver][Greenplum]ERROR: could not translate host name "hostname_of_machine", port "8001" to address: Name or service not known (cdbutil.c:819)
(seg5 192.168.111.240:6005 pid=38339) (cdbdisp.c:254)(File cdbdisp.c; Line 254; Routine cdbdisp_finishCommand; )
(CC_GPCommon::checkThreadStatusThrow, file CC_GPCommon.cpp, line 808)
As Jon mentioned, "hostname_of_machine" needs to be updated to the real hostname

How to create external table to DynamoDB on Hive 3.1

I'd like to know if it's possible to have an external table pointing to a DynamoDB table on AWS using Hive.
I'm not using AWS EMR, what I'm using is a Hadoop Stack configured through Apache Ambari.
Hive version: Hive 3.1.0.3.1.4.0-315
What I did was:
Downloaded the EMR Dynamo-Hive connector JARS directly from the maven repository: https://mvnrepository.com/artifact/com.amazon.emr
I loaded all the JARS in hive.aux.jars.path:
emr-dynamodb-hadoop-4.12.0.jar
emr-dynamodb-hive-4.12.0.jar
emr-dynamodb-tools-4.12.0.jar
hive1.2-shims-4.12.0.jar
hive1-shims-4.12.0.jar
hive2-shims-4.12.0.jar
hive2-shims-4.15.0.jar
shims-common-4.12.0.jar
shims-loader-4.12.0.jar
But when I try to create the table with:
CREATE EXTERNAL TABLE dynamo_LabDynamoHive
(id double, nome string)
STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'
TBLPROPERTIES (
"dynamodb.table.name" = "LabDynamoHive",
"dynamodb.column.mapping" = "id:id,nome:nome"
);
I get the following error:
INFO : Starting task [Stage-0:DDL] in serial mode
ERROR : FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.RuntimeException: Shim class for Hive version 3.1.1000 does not exist
INFO : Completed executing command(queryId=hive_20200422142624_6ebabdc8-8942-4025-84a8-411505d20895); Time taken: 0.203 seconds
Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.RuntimeException: Shim class for Hive version 3.1.1000 does not exist (state=08S01,code=1)
I know I'm not loading a Shims JAR for Hive 3, but I'd like to know if any of you have tried and succeded in using an external table with DynamoDB using Hive 3 outside of EMR.
Any help or directions would be greatly appreciated!
problem is apparently the source code of this EMR connector is somewhat outdated and lacks Hive 3.x support recently introduced by AWS for EMR 6.0.
However, you can find a working 3.1 implementation here, forked from the official EMR connector: https://github.com/ramsesrm/emr-dynamodb-connector
Installation steps as follow:
1- compile the mentioned code (mvn clean package)
2- install the 3 JARs in your hive.aux.jars.path, along with aws-java-sdk-core and aws-java-sdk-dynamodb JARs from AWS (shim JARs are not required), 5 in total.
Tha's it. Don't forget to specify the region as a TBLPROPERTIES if you're not using the default US one.

Cannot create Hive external table using jdbcStorageHandler

I am running a small cluster in Amazone EMR in order to play with Apache Hive 2.3.5. It is my understanding that Apache Hive can import data from a remote database and have the cluster to run queries. I was following an example that is provided in Apache Hive web documentation (https://cwiki.apache.org/confluence/display/Hive/JdbcStorageHandler) and created the following code:
CREATE EXTERNAL TABLE hive_table
(
col1 int,
col2 string,
col3 date
)
STORED BY 'org.apache.hive.storage.jdbc.JdbcStorageHandler'
TBLPROPERTIES (
'hive.sql.database.type'='POSTGRES',
'hive.sql.jdbc.driver'='org.postgresql.Driver',
'hive.sql.jdbc.url'='jdbc:postgresql://<url>/<dbname>',
'hive.sql.dbcp.username'='<username>',
'hive.sql.dbcp.password'='<password>',
'hive.sql.table'='<dbtable>',
'hive.sql.dbcp.maxActive'='1'
);
But I get the following error:
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.RuntimeException: MetaException(message:org.apache.hadoop.hive.serde2.SerDeException java.lang.IllegalArgumentException: Property hive.sql.query is required.)
According to the documentation, I need to specify either “hive.sql.table” or “hive.sql.query” to tell how to get data from jdbc database. But if I replace hive.sql.table with hive.sql.query I get the following error:
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.RuntimeException: MetaException(message:org.apache.hadoop.hive.serde2.SerDeException java.lang.IllegalArgumentException: No enum constant org.apache.hive.storage.jdbc.conf.DatabaseType.POSTGRES)
I tried looking in the web for a solution and it doesn't look like anyone experience the same issues that I am having. Do I need to modify a config file or am I missing something critical in my code?
I think you are using a version of the jar which doesn't support POSTGRES.
Download the latest jar from this link:
http://repo1.maven.org/maven2/org/apache/hive/hive-jdbc-handler/3.1.2/hive-jdbc-handler-3.1.2.jar
Put this downloaded jar into a hdfs location.
Run hive normally.
Run command: add jar ${HDFS_PATH_TO_DOWNLOADED_JAR}
Run your create table command

hive failed execution error return code 2 from org.apache.hadoop.hive.ql.exec.mapredtask

I have one query. It is executing fine on Hive CLI and returning the result. But when I am executing it with the help of Hive JDBC, I am getting an error below:
java.sql.SQLException: Query returned non-zero code: 9, cause: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
at org.apache.hadoop.hive.jdbc.HiveStatement.executeQuery(HiveStatement.java:192)
What is the problem? Also I am starting the Hive Thrift Server through Shell Script. (I have written a shell script which has command to start Hive Thrift Server) Later I decided to start Hive thrift Server manually by typing command as:
hadoop#ubuntu:~/hive-0.7.1$ bin/hive --service hiveserver
Starting Hive Thrift Server
org.apache.thrift.transport.TTransportException: Could not create ServerSocket on address 0.0.0.0/0.0.0.0:10000.
at org.apache.thrift.transport.TServerSocket.<init>(TServerSocket.java:99)
at org.apache.thrift.transport.TServerSocket.<init>(TServerSocket.java:80)
at org.apache.thrift.transport.TServerSocket.<init>(TServerSocket.java:73)
at org.apache.hadoop.hive.service.HiveServer.main(HiveServer.java:384)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
hadoop#ubuntu:~/hive-0.7.1$
Please help me out from this.
Thanks
For this error :
java.sql.SQLException: Query returned non-zero code: 9, cause: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
at org.apache.hadoop.hive.jdbc.HiveStatement.executeQuer
Go to this link :
http://docs.amazonwebservices.com/ElasticMapReduce/latest/DeveloperGuide/UsingEMR_Hive.html
and add
**hadoop-0.20-core.jar
hive/lib/hive-exec-0.7.1.jar
hive/lib/hive-jdbc-0.7.1.jar
hive/lib/hive-metastore-0.7.1.jar
hive/lib/hive-service-0.7.1.jar
hive/lib/libfb303.jar
lib/commons-logging-1.0.4.jar
slf4j-api-1.6.1.jar
slf4j-log4j12-1.6.1.jar**
to the class path of your project , add this jars from the lib of hadoop and hive, and try the code. and also add the path of hadoop, hive, and hbase(if your are using) lib folder path to the project class path, like you have added the jars.
and for the second error you got
type
**netstat -nl | grep 10000**
if it shows something means hive server is already running. the second error comes only when the port you are specifying is already acquired by some other process, by default server port is 10000 so very with the above netstat command which i said.
Note : suppose you have connected using code exit from ... bin/hive of if you are connected through bin/hive > then code will not connect because i think (not sure) only one client can connect to the hive server.
do above steps hopefully will solve your problem.
NOTE : exit from cli when you are going to execute the code, and dont start cli while code is being executing.
Might be some issue with permission, just try some query like "SELECT * FROM " which won't start MR jobs.
Try to paste these property before the codes.
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;
set hive.auto.convert.join = false;
set hive.exec.max.dynamic.partitions=100000;
set hive.exec.max.dynamic.partitions.pernode=10000;

Resources