Loading in path file to a partitioned table - hadoop

I'm trying to load a file locally into Hive by running this command:
LOAD DATA INPATH '/data/work/hive/staging/ExampleData.csv' INTO TABLE tablename;
which gives me the error:
SemanticException [Error 10062]: Need to specify partition columns
because the destination table is partitioned (state=42000,code=10062)
An answer I found suggests creating an intermediate table then letting dynamic partitioning kick in to load into a partitioned table.
I've created a table that matches the data and truncated it:
create table temptablename as select * from tablename;
truncate table temptablename;
Then loaded the data using:
LOAD DATA INPATH '/data/work/hive/staging/ExampleData.csv' INTO TABLE temptablename;
How do I 'kick in' dynamic partitioning?

1. Load the data into temptablename (a table without partitions):
create table temptablename(col1,col2..);
LOAD DATA INPATH '/data/work/hive/staging/ExampleData.csv' INTO TABLE temptablename;
2. Once the data is in the intermediate table, you can kick in dynamic partitioning with the following command:
INSERT INTO tablename PARTITION(partition_column) SELECT * FROM temptablename;
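A slightly fuller sketch of the second step, in case the insert is rejected with a strict-mode error. The two SET properties are standard Hive settings; col1, col2 and partition_column are placeholder names, not taken from the real table.

```sql
-- Allow dynamic partitioning for this session.
SET hive.exec.dynamic.partition=true;
-- nonstrict lets every partition column be dynamic
-- (strict mode requires at least one static partition value).
SET hive.exec.dynamic.partition.mode=nonstrict;

-- Placeholder column names. The dynamic partition column
-- has to come last in the SELECT list.
INSERT INTO TABLE tablename PARTITION (partition_column)
SELECT col1, col2, partition_column
FROM temptablename;
```

Hive maps the trailing SELECT columns to the partition columns by position, not by name, so listing the columns explicitly is usually safer than SELECT *.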

Related

Regarding Dynamic Partition Table insert without creating staging/temp table in hive

When loading data into a dynamically partitioned table, I know the usual approach is to create
a temp/staging table, load the data into it, and then insert overwrite
into the partitioned table. But in an interview I was asked how to load directly
without a temp/staging table. Please advise what other methods there are to insert.
To load the data directly into the table without a temp/staging table, we can use the command below:
LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE db.tablename [PARTITION (partcol1=val1, partcol2=val2 ...)]
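As a rough illustration of that syntax with a static partition value (the table db.sales, the partition column country and the file path are all made up for this example):

```sql
-- Hypothetical table and path, for illustration only.
-- Every row in the file ends up in the partition country='US',
-- so the file should contain data for that partition only.
LOAD DATA INPATH '/data/staging/sales_us.csv'
INTO TABLE db.sales PARTITION (country='US');
```

Note that this only works when you already know which partition each file belongs to; LOAD DATA moves the file as-is and does not split rows across partitions.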

detailed steps for bulk loading in HBase table

I am new to HBase. Can someone provide a detailed example of how bulk loading can be done into an HBase table?
Say, for example, I have a customer file with 10 columns and 100K rows, and I want to load the file into an HBase table.
I have created an HBase table that is managed by Hive and tried to load it using the LOAD command, but it failed.
It looks like I have to insert into the table from HBase only.
hive (Koushik)> CREATE TABLE hive_hbase_emp_sample(eid int, ename string, esal double)
> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
> WITH SERDEPROPERTIES
> ("hbase.columns.mapping" = ":key,cfstr:enm,cfsal:esl")
> TBLPROPERTIES ("hbase.table.name" = "hive_hbase_emp_sample");
OK
Time taken: 6.404 seconds
hive (Koushik)> load data local inpath '/home/hduser/sample_emp_file' into table hive_hbase_emp_sample;
FAILED: SemanticException [Error 10101]: A non-native table cannot be used as target for LOAD
You cannot directly use LOAD to target a non-native table backed by HBaseStorageHandler. Instead, load the data into a staging table and then insert into your HBase table using SELECT * FROM the staging table.
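A minimal sketch of that staging approach for the table in the question. The staging table name emp_staging and the ',' field delimiter are assumptions; adjust them to match the actual file.

```sql
-- Native staging table with the same columns as hive_hbase_emp_sample.
-- emp_staging and the ',' delimiter are assumptions.
CREATE TABLE emp_staging (eid INT, ename STRING, esal DOUBLE)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- LOAD works here because the staging table is a native Hive table.
LOAD DATA LOCAL INPATH '/home/hduser/sample_emp_file' INTO TABLE emp_staging;

-- The insert goes through the HBase storage handler,
-- so it is allowed where LOAD is not.
INSERT OVERWRITE TABLE hive_hbase_emp_sample
SELECT eid, ename, esal FROM emp_staging;
```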

Insert partitioned data into partitioned hive table

I have stored the data in HDFS using Pig MultiStorage, split on the column id.
So the data is stored as
/output/1/part-0000
/output/2/
/output/3/
Now I have created a partitioned table in Hive and I want to load the data from the /output folder into this partitioned table. Is there any way to achieve this?
First, create a temp Hive table and load all the data from the Pig output into it.
Then load your actual partitioned Hive table from the temp table.
Something like below:
FROM emp_external temp
INSERT OVERWRITE TABLE emp_partition PARTITION(country)
SELECT temp.id, temp.name, temp.dept, temp.sal, temp.country;
Alternatively, you can explore HCatalog for this case.
I'm not sure whether you want to insert the data in the output folder (created from Pig) into an existing table, or load it into a new partitioned Hive table.
If you want to load the data into a new Hive table, you can create a new partitioned table pointing to the output folder.
If you want to load the data into an existing Hive table, you can either create a temp table as @Aman mentioned and do an insert into the destination table,
or
you can just move/copy the files in HDFS from output/ to the Hive table location.
Hope this helps
Assign a Hive schema to the Pig output location, with id as the partition column (ALTER TABLE ... ADD PARTITION). Now both are Hive tables, and you can use a WHERE clause on the partition column to move the data over.
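That last suggestion might look roughly like the sketch below. The table names pig_output and target_table, the columns name and dept, and the tab delimiter are all assumptions; the column list has to match what the Pig output files actually contain.

```sql
-- External table over the Pig output. Column names and delimiter are placeholders.
CREATE EXTERNAL TABLE pig_output (name STRING, dept STRING)
PARTITIONED BY (id INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/output';

-- Directories such as /output/1 do not follow the id=1 naming convention,
-- so each one is registered explicitly with its location.
ALTER TABLE pig_output ADD PARTITION (id=1) LOCATION '/output/1';
ALTER TABLE pig_output ADD PARTITION (id=2) LOCATION '/output/2';
ALTER TABLE pig_output ADD PARTITION (id=3) LOCATION '/output/3';

-- Move one partition's data into the (hypothetical) destination table.
INSERT INTO TABLE target_table PARTITION (id=1)
SELECT name, dept FROM pig_output WHERE id = 1;
```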

FAILED: SemanticException org.apache.hadoop.hive.ql.metadata.HiveException while inserting data into Hive partitioned table

I have employee data with 3 departments: A, B and C.
I am trying to create a partitioned table on department.
I created the table using the command below.
create external table Parti_Trail (EmployeeID Int,FirstName
String,Designation String,Salary Int) PARTITIONED BY (Department
String) row format delimited fields terminated by "," location
'/user/sree/HiveTrail';
But this did not populate my table with the data in location '/user/sree/HiveTrail'.
So I tried to load my table:
LOAD DATA INPATH '/user/aibladmin/HiveTrail' OVERWRITE INTO TABLE Parti_SCDTrail PARTITION(department);
But it shows:
FAILED: SemanticException org.apache.hadoop.hive.ql.metadata.HiveException: department not found in table's partition spec: {department=null}
Why is that? Am I doing anything wrong?
What happens if we SET hive.exec.dynamic.partition.mode = nonstrict;?
When creating a partitioned table, do we need to keep the data separated in different folders, or does it automatically get separated into different partitions?
For external tables with partitions in Hive, you need to run an ALTER TABLE statement to register new partitions in the metastore, because external tables are not managed by Hive.
Hope it helps...!!!
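Two ways this could look for the Parti_Trail table from the question, as a sketch only. The per-department subdirectory and the staging file path are hypothetical; the question does not show how the files are laid out.

```sql
-- Register an existing directory as a partition
-- (the subdirectory path is hypothetical).
ALTER TABLE Parti_Trail ADD PARTITION (Department='A')
LOCATION '/user/sree/HiveTrail/A';

-- Or load a file into one partition by giving the partition value explicitly.
-- The failing statement in the question named the column without a value.
LOAD DATA INPATH '/user/sree/staging/dept_B.csv'
INTO TABLE Parti_Trail PARTITION (Department='B');
```

Either way, the file for a partition should contain only that department's rows, and since Department is a partition column it should not also appear as a field inside the data files.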

Loading Data from a .txt file to Table Stored as ORC in Hive

I have a data file which is in .txt format. I am using the file to load data into Hive tables. When I load the file in a table like
CREATE TABLE test_details_txt(
visit_id INT,
store_id SMALLINT) STORED AS TEXTFILE;
the data is loaded correctly using
LOAD DATA LOCAL INPATH '/home/user/test_details.txt' INTO TABLE test_details_txt;
and I can run a SELECT * FROM test_details_txt; on the table in Hive.
However, if I try to load the data into a table defined as
CREATE TABLE test_details_txt(
visit_id INT,
store_id SMALLINT) STORED AS ORC;
I receive the following error on trying to run a SELECT:
Failed with exception java.io.IOException:java.io.IOException: Malformed ORC file hdfs://master:6000/user/hive/warehouse/test.db/transaction_details/test_details.txt. Invalid postscript.
When loading the data using the LOAD statement above, I do not receive any error or exception.
Is there anything else that needs to be done when using the LOAD DATA INPATH command to store data in an ORC table?
LOAD DATA just copies the files into the table's data location. Hive does not do any transformation while loading data into tables.
So, in this case, the input file /home/user/test_details.txt needs to be in ORC format if you are loading it into an ORC table.
A possible workaround is to create a temporary table with STORED AS TEXTFILE, then LOAD DATA into it, and then copy the data from this table to the ORC table.
Here is an example:
CREATE TABLE test_details_txt( visit_id INT, store_id SMALLINT) STORED AS TEXTFILE;
CREATE TABLE test_details_orc( visit_id INT, store_id SMALLINT) STORED AS ORC;
-- Load into Text table
LOAD DATA LOCAL INPATH '/home/user/test_details.txt' INTO TABLE test_details_txt;
-- Copy to ORC table
INSERT INTO TABLE test_details_orc SELECT * FROM test_details_txt;
Steps:
1. First create a table stored as TEXTFILE (i.e. the default, or whichever format you want for the table).
2. Load data into the text table.
3. Create a table stored as ORC using CREATE TABLE ... AS SELECT * FROM text_table.
4. Select * from the ORC table.
Example:
CREATE TABLE text_table(line STRING);
LOAD DATA INPATH 'path_of_file' OVERWRITE INTO TABLE text_table;
CREATE TABLE orc_table STORED AS ORC AS SELECT * FROM text_table;
SELECT * FROM orc_table; /*(it can now be read)*/
Since Hive does not transform the input data, the formats need to match: either the file is already in ORC format, or you load the text file into a text table in Hive.
ORC is a binary file format, so you cannot directly load text files into ORC tables.
ORC stands for Optimized Row Columnar, which means it stores data in a more optimized way than the other file formats. ORC can reduce the size of the original data by up to 75%, and as a result the speed of data processing also increases. ORC shows better performance than the Text, Sequence and RC file formats.
An ORC file contains row data in groups called stripes, along with a file footer. The ORC format improves performance when Hive is processing the data.
First create a normal table stored as TEXTFILE, load your data into it, and then use an INSERT OVERWRITE query to write the data into the ORC table.
create table table_name1 (schema of the table) row format delimited fields terminated by ',' stored as TEXTFILE;
create table table_name2 (schema of the table) stored as ORC;
load data local inpath 'path of your file' into table table_name1; -- loading data from the local file system
INSERT OVERWRITE TABLE table_name2 SELECT * FROM table_name1;
Now all your data will be stored in ORC format.
A similar procedure applies to the other binary file formats in Hive, i.e. Sequence files, RC files and Parquet files.
You can refer to the link below for more details.
https://acadgild.com/blog/file-formats-in-apache-hive/
Steps to load data into the ORC file format in Hive:
1. Create a normal table using the TEXTFILE format.
2. Load the data into this table as usual.
3. Create a table with the same schema as your normal Hive table, using STORED AS ORCFILE.
4. Use an INSERT OVERWRITE query to copy the data from the TEXTFILE table to the ORC table.
Refer to the blog post "Load data into all file formats in hive" for a hands-on walkthrough.

Resources