unable to create Parquet file in hive - hadoop

Can anyone please tell me what the error is in the query below?
insert overwrite directory 'user/cloudera/batch' stored as parquet select * from emp;
I am trying to create a Parquet table. I get the below error when using the above command:
cannot recognize input near 'stored' 'as' 'parquet' in select clause

To create a Parquet table, first create the table and store it as Parquet:
CREATE TABLE emp (x INT, y STRING) STORED AS PARQUET;
Now load the data into this table; then you can execute your query:
insert overwrite directory '/user/cloudera/batch' select * from emp;
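A minimal end-to-end sketch, assuming the data starts out as a CSV; the staging table name and file path below are made up. LOAD DATA only moves files without converting their format, so text data should go through a text-format staging table and be converted to Parquet by the INSERT:
-- hypothetical text-format staging table for the raw CSV
CREATE TABLE emp_staging (x INT, y STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
LOAD DATA INPATH '/user/cloudera/emp.csv' INTO TABLE emp_staging;
-- this INSERT rewrites the rows in Parquet format
INSERT OVERWRITE TABLE emp SELECT * FROM emp_staging;
-- then export as before
INSERT OVERWRITE DIRECTORY '/user/cloudera/batch' SELECT * FROM emp;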

Related

Hive/Impala write to HDFS

On Hue, I can write a query using Hive or Impala:
SELECT * FROM database.tablename LIMIT 10
The output appears and I can click "export data" and store it in my HDFS folder user/username/mytestfolder as Parquet. I want to do the exporting from a Hive script, and have tried versions of:
INSERT OVERWRITE DIRECTORY '/user/username/mytestfolder'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED AS PARQUET
SELECT * FROM database.tablename
LIMIT 10;
but it always returns the error: Error while compiling statement: FAILED: SemanticException Error creating temporary folder on: hdfs://bdp01cdhc01-ns/user/username
I don't think INSERT OVERWRITE DIRECTORY is what you want.
You could create a table in the location that you want, using a CREATE TABLE AS SELECT statement:
CREATE TABLE x
STORED AS PARQUET
LOCATION '/user/username/mytestfolder'
AS SELECT * FROM database.tablename LIMIT 10;
Or use CREATE EXTERNAL TABLE x LIKE database.tablename LOCATION 'path';, followed by an INSERT from the other table.
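A sketch of that second approach; the table name x and the location are placeholders. LIKE copies the source table's schema and storage format, and the INSERT then writes the rows into the chosen location:
CREATE EXTERNAL TABLE x LIKE database.tablename
LOCATION '/user/username/mytestfolder';
INSERT OVERWRITE TABLE x
SELECT * FROM database.tablename LIMIT 10;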
But HDFS shouldn't be used to store such small files (only 10 rows).
Also, these clauses are for text files and have no meaning for Parquet:
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
Alternatively, if you have Spark or Pig available, those would also let you save off Hive tables to alternate HDFS locations

Loading an INPATH file into a partitioned table

I'm trying to load a file locally into Hive by running this command:
LOAD DATA INPATH '/data/work/hive/staging/ExampleData.csv' INTO TABLE tablename;
which gives me the error:
SemanticException [Error 10062]: Need to specify partition columns
because the destination table is partitioned (state=42000,code=10062)
An answer I found suggests creating an intermediate table then letting dynamic partitioning kick in to load into a partitioned table.
I've created a table that matches the data and truncated it:
create table temptablename as select * from tablename;
truncate table temptablename;
Then loaded the data using:
LOAD DATA INPATH '/data/work/hive/staging/ExampleData.csv' INTO TABLE temptablename;
How do I 'kick in' dynamic partitioning?
1. Load data into temptablename (without partitions):
create table temptablename (col1, col2, ...);
LOAD DATA INPATH '/data/work/hive/staging/ExampleData.csv' INTO TABLE temptablename;
2. Once you have data in the intermediate table, you can kick in dynamic partitioning with the following command:
INSERT INTO tablename PARTITION(partition_column) SELECT * FROM temptablename;
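Dynamic partitioning is usually disabled by default, so the INSERT above typically needs two session settings first; both are standard Hive properties. A minimal sketch:
SET hive.exec.dynamic.partition=true;
-- nonstrict lets every partition column be resolved dynamically
SET hive.exec.dynamic.partition.mode=nonstrict;
INSERT INTO TABLE tablename PARTITION(partition_column)
SELECT * FROM temptablename;
Hive takes the dynamic partition values from the trailing columns of the SELECT, so the partition column must come last.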

Add partition to hive table with no data

I am trying to create a Hive table which has the same columns as another (partitioned) table. I use the following query:
CREATE TABLE destTable STORED AS PARQUET AS select * from srcTable where 1=2;
Apparently I cannot use 'PARTITIONED BY(col_name)' because destTable must not be partitioned. But I want destTable to be partitioned by a column (the same as srcTable) before I add data to it.
Is there a way to do that?
As you mentioned, destTable cannot be a partitioned table, so there is no way to do this directly. destTable cannot be an external table either.
In this situation you will need to create a temporary staging_table (un-partitioned and Hive-managed) to hold the data.
Step 1: Transfer everything from srcTable to the staging_table
Step 2: Create a partitioned destTable and do:
INSERT OVERWRITE TABLE destTable PARTITION(xxxx)
SELECT * FROM staging_table;
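Put together, a sketch of both steps; the column names x, y and the partition column part_col are hypothetical stand-ins for srcTable's real schema:
-- Step 1: stage everything; a CTAS target is always unpartitioned,
-- so srcTable's partition column becomes an ordinary (trailing) column here
CREATE TABLE staging_table STORED AS PARQUET AS
SELECT * FROM srcTable;
-- Step 2: declare the partitioned destination explicitly, then fill it
CREATE TABLE destTable (x INT, y STRING)
PARTITIONED BY (part_col STRING)
STORED AS PARQUET;
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
INSERT OVERWRITE TABLE destTable PARTITION(part_col)
SELECT * FROM staging_table;
The dynamic partition value is taken from the last column of the SELECT, which is why part_col must end up last in staging_table.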
Hope this helps.

Hive, create table ___ like ___ stored as ___

I have a table in hive stored as text files. I want to move all the data into another table with the same schema but stored as sequence files.
How do I create the second table? I wanted to use the Hive CREATE TABLE ... LIKE command, but it doesn't support STORED AS sequencefile:
hive> create table test_sq like test_t stored as sequencefile;
FAILED: ParseException line 1:33 missing EOF at 'stored' near 'test_t'
I am looking for a programmatic way so that I can replicate the same process for more tables.
CREATE TABLE test_sq LIKE test_t;
just copies the source table definition; the new table contains no rows and keeps the source's storage format. Since you said you have to move all the data, the above query is not suitable.
try this,
CREATE TABLE test_sq ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS SEQUENCEFILE AS SELECT * FROM test_t;
The target cannot be a partitioned table.
The target cannot be an external table.
It copies the structure as well as the data.
Note: if you don't want ROW FORMAT DELIMITED, remove it from the query. You can also add a WHERE clause to copy only selected rows.
Try using create + insert together.
Use the normal DDL statement to create the table.
CREATE TABLE test2 (a INT) STORED AS SEQUENCEFILE;
then use
INSERT INTO TABLE test2 SELECT * FROM test;
Here test is the table stored as TEXTFILE and test2 is the table stored as SEQUENCEFILE.

Hive load specific columns

I am interested in loading specific columns into a table created in Hive.
Is it possible to load the specific columns directly or I should load all the data and create a second table to SELECT the specific columns?
Thanks
Yes, you have to load all the data, like this:
LOAD DATA [LOCAL] INPATH '/Your/Path' [OVERWRITE] INTO TABLE yourTable;
LOCAL means that your file is on your local file system and not in HDFS; OVERWRITE means that the current data in the table will be deleted.
Then you create a second table with only the fields you need and execute a query like:
INSERT OVERWRITE TABLE yourNewTable
SELECT yourColumns
FROM yourOldTable;
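A concrete sketch of that two-table flow; all table and column names here are invented for illustration:
-- wide staging table matching the raw file layout
CREATE TABLE events_raw (id INT, name STRING, payload STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
LOAD DATA INPATH '/data/events.csv' INTO TABLE events_raw;
-- narrow table keeping only the columns of interest
CREATE TABLE events_slim (id INT, name STRING);
INSERT OVERWRITE TABLE events_slim
SELECT id, name FROM events_raw;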
Alternatively, it is suggested to create an external table in Hive mapped onto the data you have, and then create a new table with only the specific columns using the CREATE TABLE AS command:
create table table_name as select statement from table_name;
For example, the statement looks like this:
create table employee as select id as id, emp_name as name from emp;
Try this:
INSERT INTO table_name
(
-- columns you want to insert values into, in lowercase
)
SELECT columns_you_need FROM source_table;
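For instance (hypothetical names; note that naming the target columns in an INSERT needs a reasonably recent Hive, 1.2 or later):
INSERT INTO emp_copy (id, name)
SELECT emp_id, emp_name FROM emp_source;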