I am facing an error while creating an external table to push data from Hive to ElasticSearch.
What I have done so far:
1) Successfully set up ElasticSearch-1.4.4, and it is running.
2) Successfully set up Hadoop 1.2.1; all the daemons are up and running.
3) Successfully set up Hive-0.10.0.
4) Configured elasticsearch-hadoop-1.2.0.jar in both Hadoop/lib and Hive/lib (a per-session ADD JAR alternative is sketched after this list).
5) Successfully created a few internal tables in Hive.
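For reference, the jar can also be registered per Hive session instead of being copied into the lib directories; the path below is just a placeholder:
-- register the connector jar for the current session instead of copying it into lib/
ADD JAR /path/to/elasticsearch-hadoop-1.2.0.jar;
-- confirm it is on the session classpath
LIST JARS;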
The error occurs when executing the following command:
CREATE EXTERNAL TABLE drivers_external (
id BIGINT,
firstname STRING,
lastname STRING,
vehicle STRING,
speed STRING)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES('es.nodes'='localhost','es.resource' = 'drivers/driver');
Error is:
Failed with exception org.apache.hadoop.hive.ql.metadata.HiveException: Error in loading storage handler.org.elasticsearch.hadoop.hive.EsStorageHandler
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
Any help?
Finally found the resolution for it...
1) The "elasticsearch-hadoop-1.2.0.jar" jar I was using was bugged one. It didn't have any hadoop/hive packages inside it. (Found this jar on internet and just downloaded it).
Now replaced it by jar from Maven repository "elasticsearch-hadoop-1.3.0.M1.jar".
2) The class "org.elasticsearch.hadoop.hive.**EsStorageHandler**" has been renamed in new elasticsearch jar as "org.elasticsearch.hadoop.hive.**ESStorageHandler**". Note that capital 'S' in 'ES'.
So the new hive command to create External table is :
CREATE EXTERNAL TABLE drivers_external (
id BIGINT,
firstname STRING,
lastname STRING,
vehicle STRING,
speed STRING)
STORED BY 'org.elasticsearch.hadoop.hive.ESStorageHandler'
TBLPROPERTIES('es.nodes'='localhost','es.resource' = 'drivers/driver');
It Worked!
Problem & Question
I am working on a docker-compose stack with Minio for object storage, a Hive Standalone Metastore, and Trino as the query engine. My repo is stored here - https://github.com/rylativity/trino_hive-meta_minio - and steps to reproduce the issue are included at the end of this post.
When I attempt to use the Hive Standalone Metastore, I get an error when attempting to create a table. I am able to create the schema with CREATE SCHEMA hive.taxi;, and I can see the taxi.db folder created in the /opt/warehouse folder in my Hive Metastore container when I run the CREATE SCHEMA command. However, when I run CREATE TABLE hive.taxi.trips (...<col_type_info>...) WITH (external_location = 's3a://test/trips', format = 'PARQUET'), I immediately see a NoSuchObjectException error thrown by the ThriftMetastoreClient in the Trino container's logs saying that the hive.taxi.trips table cannot be found, and then a timeout occurs 60s later:
trino_1 | 2022-06-07T15:09:41.496Z DEBUG dispatcher-query-13 io.trino.plugin.hive.metastore.thrift.ThriftHiveMetastoreClient Invocation of get_table_req(req=GetTableRequest(dbName:taxi, tblName:trips, capabilities:ClientCapabilities(values:[INSERT_ONLY_TABLES]))) took 29.53ms and failed with NoSuchObjectException(message:hive.taxi.trips table not found)
However, if I use a FileHiveMetastore (stored within the Trino container), I am able to create and query the table successfully, which initially leads me to believe that the issue is with my Hive Standalone Metastore setup and not with the other services.
Does anyone know why I might be receiving a NoSuchObjectException - hive.taxi.trips table not found - when attempting to create a table, even though I am able to successfully create the hive.taxi schema first?
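For what it's worth, a quick sanity check from the Trino shell, using the catalog names from the repo, would be:
-- confirm the schema was registered in the standalone metastore
SHOW SCHEMAS FROM hive;
-- should return an empty list rather than erroring before the CREATE TABLE is attempted
SHOW TABLES FROM hive.taxi;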
Steps to Reproduce
Clone the repo - https://github.com/rylativity/trino_hive-meta_minio
Run docker-compose up -d to bring up the containers. The minio-init container will create a 'test' bucket and an access-key/secret to access the bucket.
Run ./dataload_scripts/load_taxidata_to_minio.sh. This will download parquet files containing yellow-cab trip data from the public NYC-TLC bucket and load it into the 'test' bucket in Minio.
Run docker-compose exec trino trino to open a Trino shell inside the Trino container.
Inside the Trino shell, run CREATE SCHEMA hive.taxi; and then run
CREATE TABLE IF NOT EXISTS hive.taxi.trips(
VendorID BIGINT,
tpep_pickup_datetime TIMESTAMP,
tpep_dropoff_datetime TIMESTAMP,
passenger_count DOUBLE,
trip_distance DOUBLE,
PULocationID BIGINT,
DOLocationID BIGINT,
RatecodeID DOUBLE,
store_and_fwd_flag VARCHAR,
dropoff_longitude DOUBLE,
dropoff_latitude DOUBLE,
payment_type BIGINT,
fare_amount DOUBLE,
improvement_surcharge DOUBLE,
congestion_surcharge DOUBLE,
mta_tax DOUBLE,
tip_amount DOUBLE,
tolls_amount DOUBLE,
extra DOUBLE,
airport_fee DOUBLE,
total_amount DOUBLE
)
WITH (
external_location = 's3a://test/taxi',
format = 'PARQUET'
);
If you run docker-compose exec metastore ls /opt/warehouse/, you will see that the taxi.db folder gets created after you run the CREATE SCHEMA... command. If you look at the Trino logs after running the CREATE TABLE... command, you will see the "NoSuchObjectException - hive.taxi.trips does not exist" error.
If you replace "hive" with "filehive" in the CREATE SCHEMA... and CREATE TABLE... commands, you will see that there is no issue when using a Trino FileHiveMetastore.
At any point, to start over, you can run docker-compose down -v and then return to step 1.
I want to create a Hive table on top of a Phoenix table in EMR.
I am facing a NoClassDefFoundError: org.apache.hadoop.hbase.security.SecurityInfo
What I have done so far:
I followed the instructions from https://phoenix.apache.org/hive_storage_handler.html and added phoenix-hive-5.0.0-HBase-2.0.jar to hive-env.sh as well as to hive-site.xml.
Restarted the Hive service: systemctl restart hive-server2.service
Restarted the metastore: systemctl restart hive-hcatalog-server.service
Executed the create table command from Hue:
create external table ext_table (
i1 int,
s1 string,
f1 float,
d1 decimal
)
STORED BY 'org.apache.phoenix.hive.PhoenixStorageHandler'
TBLPROPERTIES (
"phoenix.table.name" = "ext_table",
"phoenix.zookeeper.quorum" = "localhost",
"phoenix.zookeeper.znode.parent" = "/hbase",
"phoenix.zookeeper.client.port" = "2181",
"phoenix.rowkeys" = "i1",
"phoenix.column.mapping" = "i1:i1, s1:s1, f1:f1, d1:d1"
);
Got an exception: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:java.lang.NoClassDefFoundError: Could not initialize class org.apache.hadoop.hbase.security.SecurityInfo)
I am using emr-6.1.0
HBase 2.2.5
Phoenix 5.0.0
Hive 3.1.2
Does anybody have an idea what the issue could be?
Update
I followed the advice from @leftjoin and used ADD JAR from Hue to add the phoenix-hive jar to the classpath. Then I faced a jar compatibility issue caused by the phoenix-hive connector that I use:
phoenix-hive-5.0.0-HBase-2.0.jar.
The newer versions of the Phoenix connectors are not archived into a single bundle that can be downloaded from the Phoenix website. Instead, the connectors are now located in a GitHub repo.
I built the new phoenix-hive connector (versions: Phoenix->5.1.0, Hive->3.1.2, Hbase->2.2) and used it to create the Hive table.
As a result I got another exception, which I am not able to fix:
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. org/apache/phoenix/compat/hbase/CompatSteppingSplitPolicy
I think it is still somehow connected to dependency issues, but I have no clue what exactly.
As a workaround, put the jar into HDFS and execute the ADD JAR command before the create table statement and queries:
ADD JAR hdfs://path/to/your/jar/phoenix-hive-5.0.0-HBase-2.0.jar;
I'd like to know if it's possible to have an external table pointing to a DynamoDB table on AWS using Hive.
I'm not using AWS EMR; what I'm using is a Hadoop stack configured through Apache Ambari.
Hive version: Hive 3.1.0.3.1.4.0-315
What I did was:
Downloaded the EMR Dynamo-Hive connector JARs directly from the Maven repository: https://mvnrepository.com/artifact/com.amazon.emr
I loaded all the JARs via hive.aux.jars.path (a per-session ADD JAR alternative is sketched after this list):
emr-dynamodb-hadoop-4.12.0.jar
emr-dynamodb-hive-4.12.0.jar
emr-dynamodb-tools-4.12.0.jar
hive1.2-shims-4.12.0.jar
hive1-shims-4.12.0.jar
hive2-shims-4.12.0.jar
hive2-shims-4.15.0.jar
shims-common-4.12.0.jar
shims-loader-4.12.0.jar
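If it is easier to test interactively, the same jars can also be registered per Hive session with ADD JAR (the paths below are placeholders):
ADD JAR /path/to/emr-dynamodb-hadoop-4.12.0.jar;
ADD JAR /path/to/emr-dynamodb-hive-4.12.0.jar;
ADD JAR /path/to/emr-dynamodb-tools-4.12.0.jar;
-- confirm they are on the session classpath
LIST JARS;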
But when I try to create the table with:
CREATE EXTERNAL TABLE dynamo_LabDynamoHive
(id double, nome string)
STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'
TBLPROPERTIES (
"dynamodb.table.name" = "LabDynamoHive",
"dynamodb.column.mapping" = "id:id,nome:nome"
);
I get the following error:
INFO : Starting task [Stage-0:DDL] in serial mode
ERROR : FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.RuntimeException: Shim class for Hive version 3.1.1000 does not exist
INFO : Completed executing command(queryId=hive_20200422142624_6ebabdc8-8942-4025-84a8-411505d20895); Time taken: 0.203 seconds
Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.RuntimeException: Shim class for Hive version 3.1.1000 does not exist (state=08S01,code=1)
I know I'm not loading a Shims JAR for Hive 3, but I'd like to know if any of you have tried and succeeded in using an external table with DynamoDB using Hive 3 outside of EMR.
Any help or directions would be greatly appreciated!
The problem is apparently that the source code of this EMR connector is somewhat outdated and lacks the Hive 3.x support recently introduced by AWS for EMR 6.0.
However, you can find a working 3.1 implementation here, forked from the official EMR connector: https://github.com/ramsesrm/emr-dynamodb-connector
Installation steps are as follows:
1- Compile the mentioned code (mvn clean package).
2- Install the 3 JARs in your hive.aux.jars.path, along with the aws-java-sdk-core and aws-java-sdk-dynamodb JARs from AWS (shim JARs are not required), 5 in total.
That's it. Don't forget to specify the region in TBLPROPERTIES if you're not using the default US one.
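For illustration, a minimal sketch of the table DDL with the region set; the property name dynamodb.region is an assumption and may differ by connector version:
-- dynamodb.region is assumed here; adjust the value to your table's region
CREATE EXTERNAL TABLE dynamo_LabDynamoHive
(id double, nome string)
STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'
TBLPROPERTIES (
"dynamodb.table.name" = "LabDynamoHive",
"dynamodb.column.mapping" = "id:id,nome:nome",
"dynamodb.region" = "us-east-1"
);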
I am running a small cluster in Amazon EMR in order to play with Apache Hive 2.3.5. It is my understanding that Apache Hive can import data from a remote database and have the cluster run queries. I was following an example provided in the Apache Hive web documentation (https://cwiki.apache.org/confluence/display/Hive/JdbcStorageHandler) and created the following code:
CREATE EXTERNAL TABLE hive_table
(
col1 int,
col2 string,
col3 date
)
STORED BY 'org.apache.hive.storage.jdbc.JdbcStorageHandler'
TBLPROPERTIES (
'hive.sql.database.type'='POSTGRES',
'hive.sql.jdbc.driver'='org.postgresql.Driver',
'hive.sql.jdbc.url'='jdbc:postgresql://<url>/<dbname>',
'hive.sql.dbcp.username'='<username>',
'hive.sql.dbcp.password'='<password>',
'hive.sql.table'='<dbtable>',
'hive.sql.dbcp.maxActive'='1'
);
But I get the following error:
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.RuntimeException: MetaException(message:org.apache.hadoop.hive.serde2.SerDeException java.lang.IllegalArgumentException: Property hive.sql.query is required.)
According to the documentation, I need to specify either “hive.sql.table” or “hive.sql.query” to tell Hive how to get data from the JDBC database. But if I replace hive.sql.table with hive.sql.query, I get the following error:
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.RuntimeException: MetaException(message:org.apache.hadoop.hive.serde2.SerDeException java.lang.IllegalArgumentException: No enum constant org.apache.hive.storage.jdbc.conf.DatabaseType.POSTGRES)
I tried looking on the web for a solution, and it doesn't look like anyone has experienced the same issues that I am having. Do I need to modify a config file, or am I missing something critical in my code?
I think you are using a version of the jar which doesn't support POSTGRES.
Download the latest jar from this link:
http://repo1.maven.org/maven2/org/apache/hive/hive-jdbc-handler/3.1.2/hive-jdbc-handler-3.1.2.jar
Put this downloaded jar into an HDFS location.
Run hive normally.
Run the command: add jar ${HDFS_PATH_TO_DOWNLOADED_JAR}
Run your create table command.
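Putting the steps together, a minimal session sketch; the HDFS path is a placeholder, and hive.sql.query is shown only as the alternative to hive.sql.table mentioned in the question:
add jar hdfs:///jars/hive-jdbc-handler-3.1.2.jar;
CREATE EXTERNAL TABLE hive_table_via_query
(
col1 int,
col2 string
)
STORED BY 'org.apache.hive.storage.jdbc.JdbcStorageHandler'
TBLPROPERTIES (
'hive.sql.database.type'='POSTGRES',
'hive.sql.jdbc.driver'='org.postgresql.Driver',
'hive.sql.jdbc.url'='jdbc:postgresql://<url>/<dbname>',
'hive.sql.dbcp.username'='<username>',
'hive.sql.dbcp.password'='<password>',
'hive.sql.query'='SELECT col1, col2 FROM <dbtable>',
'hive.sql.dbcp.maxActive'='1'
);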
I'm using the CDH5 quickstart VM... I would like to run this script:
CREATE EXTERNAL TABLE serd(
user_id string,
type string,
title string,
year string,
publisher string,
authors struct<name:string>,
source string)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
STORED AS TEXTFILE;
LOAD DATA LOCAL INPATH '/user/hdfs/data/book-seded-workings-reduced.json/' INTO TABLE serd;
But I got this error:
Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Could not initialize class org.openx.data.jsonserde.objectinspector.JsonObjectInspectorFactory
But following my previous question (Loading JSON file with serde in Cloudera), I've tried to build each serde proposed here: https://github.com/rcongiu/Hive-JSON-Serde
But I always get the same error.
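For reference, a locally built serde jar can also be registered per session before re-running the DDL; the jar path and version below are assumptions:
ADD JAR /path/to/json-serde-1.3.8-jar-with-dependencies.jar;
-- confirm the serde jar is on the classpath before running CREATE TABLE again
LIST JARS;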
Finally, only the Twitter serde worked in my CDH5 VM.