Getting error try to select hive table using hcatalog from HAWQ - hortonworks-data-platform

I am using Hortonworks (HDP)sandbox and on top of that install HAWQ 2.0
I'm trying to select hive table using hcatalog but not able to access hive tables form HAWQ. Executing below steps mention in pivotal doc.
postgres=# SET pxf_service_address TO "localhost:51200";
SET
postgres=# select count(*) from hcatalog.default.sample_07;
ERROR: remote component error (500) from 'localhost:51200': type Exception report message Internal server error. Property "METADATA" has no value in current request description The server encountered an internal error that prevented it from fulfilling this request. exception java.lang.IllegalArgumentException: Internal server error. Property "METADATA" has no value in current request (libchurl.c:878)
LINE 1: select count(*) from hcatalog.default.sample_07;

I think there's a missing property in pxf-profile.xml
check if you have <metadata> property under Hive profile
this is newly added profile and if you were using a legacy build it might not have it.
<profile>
<name>Hive</name>
<description>This profile is suitable for using when connecting to Hive</description>
<plugins>
<fragmenter>org.apache.hawq.pxf.plugins.hive.HiveDataFragmenter</fragmenter>
<accessor>org.apache.hawq.pxf.plugins.hive.HiveAccessor</accessor>
<resolver>org.apache.hawq.pxf.plugins.hive.HiveResolver</resolver>
<metadata>org.apache.hawq.pxf.plugins.hive.HiveMetadataFetcher</metadata>
</plugins>
</profile>

Related

Create Hive table with Phoenix handler throws NoClassDefFoundError: org.apache.hadoop.hbase.security.SecurityInfo

I want to create hive table on top of phoenix table in emr.
I am facing a NoClassDefFoundError: org.apache.hadoop.hbase.security.SecurityInfo
What I have done so far:
I followed the instructions from https://phoenix.apache.org/hive_storage_handler.html and added phoenix-hive-5.0.0-HBase-2.0.jar to hive-env.sh as well as in hive-site.xml .
Restarted the hive service systemctl restart hive-server2.service
Restarted the metastore systemctl restart hive-hcatalog-server.service
Executed create table command from hue:
create external table ext_table (
i1 int,
s1 string,
f1 float,
d1 decimal
)
STORED BY 'org.apache.phoenix.hive.PhoenixStorageHandler'
TBLPROPERTIES (
"phoenix.table.name" = "ext_table",
"phoenix.zookeeper.quorum" = "localhost",
"phoenix.zookeeper.znode.parent" = "/hbase",
"phoenix.zookeeper.client.port" = "2181",
"phoenix.rowkeys" = "i1",
"phoenix.column.mapping" = "i1:i1, s1:s1, f1:f1, d1:d1"
);
Got an exception: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:java.lang.NoClassDefFoundError: Could not initialize class org.apache.hadoop.hbase.security.SecurityInfo)
I am using emr-6.1.0
HBase 2.2.5
Phoenix 5.0.0
Hive 3.1.2
Anybody has an idea what can be the issue?
Update
I followed the advice from #leftjoin and used ADD JAR from hue to add phoenix-hive jar to classpath. Then I faced jar compatibility issue caused by phoenix hive connector that I use:
phoenix-hive-5.0.0-HBase-2.0.jar.
The newer versions of phoenix connectors are not archived into single bundle that could be downloaded from phoenix website . Instead
the connectors are located now in github repo.
I built the new phoenix-hive connector (versions: Phoenix->5.1.0, Hive->3.1.2, Hbase->2.2) and used it to create the Hive table.
As a result I got another exception, which I am not able to fix:
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. org/apache/phoenix/compat/hbase/CompatSteppingSplitPolicy
I think it is still somehow connected to dependency issues. But no clue what is exactly.
As a workaround put jar into hdfs and execute ADD JAR command before create table and query:
ADD JAR hdfs://path/to/your/jar/phoenix-hive-5.0.0-HBase-2.0.jar;

How to create external table to DynamoDB on Hive 3.1

I'd like to know if it's possible to have an external table pointing to a DynamoDB table on AWS using Hive.
I'm not using AWS EMR, what I'm using is a Hadoop Stack configured through Apache Ambari.
Hive version: Hive 3.1.0.3.1.4.0-315
What I did was:
Downloaded the EMR Dynamo-Hive connector JARS directly from the maven repository: https://mvnrepository.com/artifact/com.amazon.emr
I loaded all the JARS in hive.aux.jars.path:
emr-dynamodb-hadoop-4.12.0.jar
emr-dynamodb-hive-4.12.0.jar
emr-dynamodb-tools-4.12.0.jar
hive1.2-shims-4.12.0.jar
hive1-shims-4.12.0.jar
hive2-shims-4.12.0.jar
hive2-shims-4.15.0.jar
shims-common-4.12.0.jar
shims-loader-4.12.0.jar
But when I try to create the table with:
CREATE EXTERNAL TABLE dynamo_LabDynamoHive
(id double, nome string)
STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'
TBLPROPERTIES (
"dynamodb.table.name" = "LabDynamoHive",
"dynamodb.column.mapping" = "id:id,nome:nome"
);
I get the following error:
INFO : Starting task [Stage-0:DDL] in serial mode
ERROR : FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.RuntimeException: Shim class for Hive version 3.1.1000 does not exist
INFO : Completed executing command(queryId=hive_20200422142624_6ebabdc8-8942-4025-84a8-411505d20895); Time taken: 0.203 seconds
Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.RuntimeException: Shim class for Hive version 3.1.1000 does not exist (state=08S01,code=1)
I know I'm not loading a Shims JAR for Hive 3, but I'd like to know if any of you have tried and succeded in using an external table with DynamoDB using Hive 3 outside of EMR.
Any help or directions would be greatly appreciated!
problem is apparently the source code of this EMR connector is somewhat outdated and lacks Hive 3.x support recently introduced by AWS for EMR 6.0.
However, you can find a working 3.1 implementation here, forked from the official EMR connector: https://github.com/ramsesrm/emr-dynamodb-connector
Installation steps as follow:
1- compile the mentioned code (mvn clean package)
2- install the 3 JARs in your hive.aux.jars.path, along with aws-java-sdk-core and aws-java-sdk-dynamodb JARs from AWS (shim JARs are not required), 5 in total.
Tha's it. Don't forget to specify the region as a TBLPROPERTIES if you're not using the default US one.

Unable to create a table from Hive CLI - ERROR 23502

I seem to be getting the below exception when I try to create a table using Hive client.
create table if not exists test (id int, name string) comment 'test table';
11:15:32.016 [HiveServer2-Background-Pool: Thread-34] ERROR org.apache.hadoop.hive.metastore.RetryingHMSHandler - Retrying HMSHandler after 2000 ms (attempt 1 of 10) with error: javax.jdo.JDODataStoreException: Insert of object "org.apache.hadoop.hive.metastore.model.MTable#784fafc2" using statement "INSERT INTO TBLS (TBL_ID,CREATE_TIME,DB_ID,LAST_ACCESS_TIME,OWNER,RETENTION,SD_ID,TBL_NAME,TBL_TYPE,VIEW_EXPANDED_TEXT,VIEW_ORIGINAL_TEXT) VALUES (?,?,?,?,?,?,?,?,?,?,?)" failed : Column 'IS_REWRITE_ENABLED' cannot accept a NULL value.
at org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:543)
at org.datanucleus.api.jdo.JDOPersistenceManager.jdoMakePersistent(JDOPersistenceManager.java:720)
Caused by: ERROR 23502: Column 'IS_REWRITE_ENABLED' cannot accept a NULL value.
at org.apache.derby.client.am.ClientStatement.completeExecute(Unknown Source)
at org.apache.derby.client.net.NetStatementReply.parseEXCSQLSTTreply(Unknown Source)
I did search but couldn't find a satisfactory resolution.
Here is my set up:
Hive 2.1.0
OS: Windows
Hadoop: 2.9.2
Derby: 10.14.2.0
What am i missing?
Thanks.
Seems like a compatibility issue with derby. I moved back to a earlier version of derby 10.2.1.1 and the issue went away.

Cannot create Hive external table using jdbcStorageHandler

I am running a small cluster in Amazone EMR in order to play with Apache Hive 2.3.5. It is my understanding that Apache Hive can import data from a remote database and have the cluster to run queries. I was following an example that is provided in Apache Hive web documentation (https://cwiki.apache.org/confluence/display/Hive/JdbcStorageHandler) and created the following code:
CREATE EXTERNAL TABLE hive_table
(
col1 int,
col2 string,
col3 date
)
STORED BY 'org.apache.hive.storage.jdbc.JdbcStorageHandler'
TBLPROPERTIES (
'hive.sql.database.type'='POSTGRES',
'hive.sql.jdbc.driver'='org.postgresql.Driver',
'hive.sql.jdbc.url'='jdbc:postgresql://<url>/<dbname>',
'hive.sql.dbcp.username'='<username>',
'hive.sql.dbcp.password'='<password>',
'hive.sql.table'='<dbtable>',
'hive.sql.dbcp.maxActive'='1'
);
But I get the following error:
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.RuntimeException: MetaException(message:org.apache.hadoop.hive.serde2.SerDeException java.lang.IllegalArgumentException: Property hive.sql.query is required.)
According to the documentation, I need to specify either “hive.sql.table” or “hive.sql.query” to tell how to get data from jdbc database. But if I replace hive.sql.table with hive.sql.query I get the following error:
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.RuntimeException: MetaException(message:org.apache.hadoop.hive.serde2.SerDeException java.lang.IllegalArgumentException: No enum constant org.apache.hive.storage.jdbc.conf.DatabaseType.POSTGRES)
I tried looking in the web for a solution and it doesn't look like anyone experience the same issues that I am having. Do I need to modify a config file or am I missing something critical in my code?
I think you are using a version of the jar which doesn't support POSTGRES.
Download the latest jar from this link:
http://repo1.maven.org/maven2/org/apache/hive/hive-jdbc-handler/3.1.2/hive-jdbc-handler-3.1.2.jar
Put this downloaded jar into a hdfs location.
Run hive normally.
Run command: add jar ${HDFS_PATH_TO_DOWNLOADED_JAR}
Run your create table command

PXF JSON plugin error

Using HDP 2.4 and HAWQ 2.0
Wanted to read json data kept in HDFS path into HAWQ external table?
Followed below steps to add new json plugin into PXF and read data.
Download plugin "json-pxf-ext-3.0.1.0-1.jar" from
https://bintray.com/big-data/maven/pxf-plugins/view#
Copy the plugin into path /usr/lib/pxf.
Create External table
CREATE EXTERNAL TABLE ext_json_mytestfile ( created_at TEXT,
id_str TEXT, text TEXT, source TEXT, "user.id" INTEGER,
"user.location" TEXT,
"coordinates.type" TEXT,
"coordinates.coordinates[0]" DOUBLE PRECISION,
"coordinates.coordinates[1]" DOUBLE PRECISION)
LOCATION ('pxf://localhost:51200/tmp/hawq_test.json'
'?FRAGMENTER=org.apache.hawq.pxf.plugins.hdfs.HdfsDataFragmenter'
'&ACCESSOR=org.apache.hawq.pxf.plugins.json.JsonAccessor'
'&RESOLVER=org.apache.hawq.pxf.plugins.json.JsonResolver'
'&ANALYZER=org.apache.hawq.pxf.plugins.hdfs.HdfsAnalyzer')
FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import')
LOG ERRORS INTO err_json_mytestfile SEGMENT REJECT LIMIT 10 ROWS;
When execute the above DDL table create successfully. After that trying to execute select query
select * from ext_json_mytestfile;
But getting error: -
ERROR: remote component error (500) from 'localhost:51200': type Exception report message java.lang.ClassNotFoundException: org.apache.hawq.pxf.plugins.json.JsonAccessor description The server encountered an internal error that prevented it from fulfilling this request. exception javax.servlet.ServletException: java.lang.ClassNotFoundException: org.apache.hawq.pxf.plugins.json.JsonAccessor (libchurl.c:878) (seg4 sandbox.hortonworks.com:40000 pid=117710) (dispatcher.c:1801)
DETAIL: External table ext_json_mytestfile
Any help would be much appreciated.
It seems that referenced jar file has old package name as com.pivotal.*. The JSON PXF extension is still incubating, the jar pxf-json-3.0.0.jar is built for JDK 1.7 as Single node HDB VM is using JDK 1.7 and uploaded to dropbox.
https://www.dropbox.com/s/9ljnv7jiin866mp/pxf-json-3.0.0.jar?dl=0
Echo'ing the details of the above comments so that the steps are performed correctly to ensure the PXF service recognize the jar file. The below steps assume that Hawq/HDB is managed by Ambari. If not, the manual steps as mentioned by the previous updates should work.
Copy the pxf-json-3.0.0.jar to /usr/lib/pxf/ of all your HAWQ nodes (master and segments).
In Ambari managed PXF, add the below line by going through Ambari Admin -> PXF -> Advanced pxf-public-classpath
/usr/lib/pxf/pxf-json-3.0.0.jar
In Ambari managed PXF, add this snippet to your pxf profile xml at the end by going through Ambari Admin -> PXF -> Advanced pxf-profiles
<profile>
<name>Json</name>
<description>
JSON Accessor</description>
<plugins>
<fragmenter>org.apache.hawq.pxf.plugins.hdfs.HdfsDataFragmenter</fragmenter>
<accessor>org.apache.hawq.pxf.plugins.json.JsonAccessor</accessor>
<resolver>org.apache.hawq.pxf.plugins.json.JsonResolver</resolver>
</plugins>
</profile>
Restart PXF service via Ambari
Did you add the jar file location to /etc//conf/pxf-public.classpath?
Did you try:
copying PXF JSON jar file to /usr/lib/pxf
updating /etc/pxf/conf/pxf-profiles.xml to include the Json plug-in profile if not already present
(per comment above) updating the /etc/pxf/conf/pxf-public.classpath
restarting the PXF service either via Ambari or command line (sudo service pxf-service restart)
likely didn't add json jar in classpath.
Create External Table DDL will always succeed as it was just a definition.
Only when you run queries, HAWQ will check the run time jar dependencies.
Yes, the jar json-pxf-ext-3.0.1.0-1.jar" from https://bintray.com/big-data/maven/pxf-plugins/view# has old package name as com.pivotal.*. The previous update has edited with details to download the correct jar from dropbox

Resources