I am trying to create a transactional ORC table in Hive using beeline.
DDL:
CREATE TABLE employee_trans (
id int,
name string,
age int,
gender string)
STORED AS ORC
TBLPROPERTIES ('transactional'='true');
I have also set the below properties:
SET hive.support.concurrency=true;
SET hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
But I am getting the error below:
Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:The table must be bucketed and stored using an ACID compliant format (such as ORC))
Can someone please help!
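For reference, the error text itself asks for bucketing: on Hive versions that enforce it, the transactional table also needs a CLUSTERED BY ... INTO n BUCKETS clause. A minimal sketch, with an illustrative bucketing column and bucket count (not taken from the original DDL):
-- Sketch only: the bucketing column and bucket count are illustrative choices.
CREATE TABLE employee_trans (
id int,
name string,
age int,
gender string)
CLUSTERED BY (id) INTO 4 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional'='true');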
Related
I am trying to load an HBase table from a Hive table. I am using the following approach, and it works fine if the HBase table has only a single column family; however, if it has multiple families it throws an error.
Approach
source table
CREATE EXTERNAL TABLE temp.employee_orc(id String, name String, Age int)
STORED AS ORC
LOCATION '/tmp/employee_orc/table';
Create Hive table with Hbase Serde
CREATE TABLE temp.employee_hbase(id String, name String, age int)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,emp:name,emp:Age')
TBLPROPERTIES("hbase.table.name" = "bda:employee_hbase", "hfile.family.path"="/tmp/employee_hbase/emp", "hive.hbase.generatehfiles"="true");
export the hbase files
SET hive.hbase.generatehfiles=true;
INSERT OVERWRITE TABLE temp.employee_hbase SELECT DISTINCT id, name, Age FROM temp.employee_orc CLUSTER BY id;
Load the hbase table
export HADOOP_CLASSPATH=`hbase classpath`
hadoop jar /usr/hdp/current/hbase-client/lib/hbase-server.jar completebulkload /tmp/employee_hbase/ 'bda:employee_hbase'
Error
I get the following error when the HBase table has multiple column families:
java.lang.RuntimeException: Hive Runtime Error while closing operators: java.io.IOException: Multiple family directories found in hdfs://hadoopdev/apps/hive/warehouse/temp.db/employee_hbase/_temporary/0/_temporary/attempt_1527799542731_1180_r_000000_0
Is there another way to load the HBase table, if not this approach?
As documented for bulk load from Hive to HBase, the target table can only have a single column family with this approach.
Alternatively, you can use HBase's own bulk load, which supports multiple column families, or you can use a separate Hive table for each column family (a rough sketch follows).
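A rough sketch of the one-Hive-table-per-column-family approach, reusing the names from the question; the second column family dept and its column are hypothetical:
-- Sketch: one HFile-generating Hive table per HBase column family.
-- The 'emp' table mirrors the question; the 'dept' family and its column are hypothetical.
CREATE TABLE temp.employee_hbase_emp(id String, name String, age int)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,emp:name,emp:Age')
TBLPROPERTIES("hbase.table.name" = "bda:employee_hbase", "hfile.family.path"="/tmp/employee_hbase/emp", "hive.hbase.generatehfiles"="true");
CREATE TABLE temp.employee_hbase_dept(id String, department String)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,dept:department')
TBLPROPERTIES("hbase.table.name" = "bda:employee_hbase", "hfile.family.path"="/tmp/employee_hbase/dept", "hive.hbase.generatehfiles"="true");
-- Run one INSERT OVERWRITE ... CLUSTER BY id per table, then run completebulkload
-- once per family directory (/tmp/employee_hbase/emp and /tmp/employee_hbase/dept).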
With the Hive query below I am creating a Hive table in ORC format, which should create it successfully:
create table etl_stats.err_mstr_40sq_orc(audt_id int,err_col_lineage_id int,err_cd int, err_dscr string,cntxt_txt string, src_nm string, src_key string)
STORED AS ORC
LOCATION '/user/warehouse/hive';
The table got created successfully, but to cross-check I ran "describe formatted <table_name>;" and got the output below:
InputFormat: org.apache.hadoop.mapred.TextInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
What am I missing?
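One way to cross-check what the metastore actually recorded is SHOW CREATE TABLE (a diagnostic sketch, using the table name from the statement above):
-- Diagnostic: print the DDL Hive has stored for the table.
SHOW CREATE TABLE etl_stats.err_mstr_40sq_orc;
-- If this still reports TextInputFormat rather than the ORC input/output formats,
-- the metastore entry was not produced by the STORED AS ORC statement above
-- (for example, a text-format table with the same name may already have existed).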
I am looking for a command to add columns and update the schema for my Hive external table backed by an Avro schema.
Here is what I have tried so far.
I have a Hive external table with an Avro-backed schema, created with this command:
CREATE EXTERNAL TABLE `person_hourly`(
`personid` string COMMENT '',
`name` string COMMENT ''
)
PARTITIONED BY (
`partitiontime` string)
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION
'hdfs://nameservice1/web/PersonData/'
TBLPROPERTIES (
'avro.schema.url'='hdfs:///schemas/PersonV1.avsc'
)
I would like to add additional columns and update schema for this table.
alter table person_hourly ADD COLUMNS (lastname string ) SET TBLPROPERTIES ('avro.schema.url' = 'hdfs:///schemas/PersonV2.avsc')
But I cannot do this since I get an error
FAILED: ParseException line 1:64 missing EOF at 'SET' near ')'
So I tried adding the column separately, which worked, but I cannot update the schema:
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. at least one column must be specified for the table
The Data Definition Language (DDL) for ALTER TABLE can be found here
ALTER TABLE table_name SET TBLPROPERTIES table_properties;
table_properties:
: (property_name = property_value, property_name = property_value, ... )
And your comment
I tried adding column separately, which worked
I think that's what you should do: add the column, then set the properties.
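As two separate statements (the column name and schema path are taken from the question), roughly:
-- First add the new column, then point the table at the updated Avro schema.
ALTER TABLE person_hourly ADD COLUMNS (lastname string);
ALTER TABLE person_hourly SET TBLPROPERTIES ('avro.schema.url'='hdfs:///schemas/PersonV2.avsc');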
If you modify the schema file in HDFS, it will be detected by Hive. Hive reads the schema at runtime; it doesn't save any schema information when you use an .avsc file through avro.schema.url.
Regards,
Hector
The code below worked for me.
You can change the schema definition in the .avsc file (with proper formatting) and then simply use an ALTER command that sets the path of the updated schema file:
ALTER TABLE table_name SET TBLPROPERTIES ('avro.schema.url'='<path of updated .avsc schema file>');
Upon upgrading a Hive external table from RC to ORC format and running MSCK REPAIR TABLE on it, when I do a select all from the table I get the following error:
Failed with exception java.io.IOException:java.io.IOException: Malformed ORC file hdfs://myServer:port/my_table/prtn_date=yyyymm/part-m-00000__xxxxxxxxxxxxx Invalid postscript length 1
What is the process to be followed for migrating RC-formatted historical data to an ORC-formatted new definition of the same table, if there is one?
Hive doesn't automatically reformat the data when you add partitions. You have two choices:
Leave the old partitions as RC files and make the new partitions ORC.
Move the data to a staging table and use INSERT OVERWRITE to re-write the data as ORC files (a sketch follows below).
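A rough sketch of the second option; the table names are illustrative, and prtn_date is the partition column from the error message:
-- Sketch: my_table_rc_stage holds the old RC data, my_table has been recreated STORED AS ORC.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
INSERT OVERWRITE TABLE my_table PARTITION (prtn_date)
SELECT * FROM my_table_rc_stage;
-- The INSERT rewrites every row through the ORC writer, so the new partition files
-- are genuine ORC files rather than the old RC files.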
Add the row format, input format and output format in the create statement to solve the problem:
create external table xyz
(
a string,
b string)
PARTITIONED BY (
c string)
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.SequenceFileInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat'
LOCATION 'hdfs path';
I have employee data with 3 departments: A, B, C.
I am trying to create a partitioned table on department.
I created the table using the command below.
create external table Parti_Trail (EmployeeID Int, FirstName String, Designation String, Salary Int)
PARTITIONED BY (Department String)
row format delimited fields terminated by ","
location '/user/sree/HiveTrail';
But this did not load my table with the data in location '/user/sree/HiveTrail'.
So I tried to load my table:
LOAD DATA INPATH '/user/aibladmin/HiveTrail' OVERWRITE INTO TABLE Parti_SCDTrail PARTITION(department);
But showing
FAILED: SemanticException org.apache.hadoop.hive.ql.metadata.HiveException: department not found in table's partition spec: {department=null}
Why is that? Am I doing anything wrong?
What happens if we SET hive.exec.dynamic.partition.mode = nonstrict;?
While creating a partitioned table, do we need to keep the data separated into different folders, or does it automatically get separated into different partitions?
For external tables with partitions in Hive, you need to run an ALTER statement to update the metastore with the new partitions, because external tables are not managed by Hive.
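A sketch of such ALTER statements, assuming the data for each department already sits in its own subdirectory under the table location (that layout is an assumption, not something stated in the question):
-- Register each department directory with the metastore.
ALTER TABLE Parti_Trail ADD IF NOT EXISTS PARTITION (Department='A') LOCATION '/user/sree/HiveTrail/A';
ALTER TABLE Parti_Trail ADD IF NOT EXISTS PARTITION (Department='B') LOCATION '/user/sree/HiveTrail/B';
ALTER TABLE Parti_Trail ADD IF NOT EXISTS PARTITION (Department='C') LOCATION '/user/sree/HiveTrail/C';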
Check this link
Hope it helps...!!!