Exporting Data from Hive table to Local Machine File System - hadoop

Using the following command:
insert overwrite local directory '/my/local/filesystem/directory/path'
select * from Emp;
overwrites any existing data in /my/local/filesystem/directory/path with the data from Emp.
What I want is to just copy the data of Emp to /my/local/filesystem/directory/path without overwriting it. How can I do that?
The following are my failed attempts:
hive> insert into local directory '/home/cloudera/Desktop/Sumit' select * from appdata;
FAILED: ParseException line 1:12 mismatched input 'local' expecting
TABLE near 'into' in insert clause
hive> insert local directory '/home/cloudera/Desktop/Sumit' select * from appdata;
FAILED: ParseException line 1:0 cannot recognize input near 'insert'
'local' 'directory' in insert clause
Can you please tell me how I can get this solved?

To append to a Hive table you need to use INSERT INTO:
INSERT INTO will append to the table or partition keeping the existing
data intact. (Note: INSERT INTO syntax is only available starting in
version 0.8)
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-InsertingdataintoHiveTablesfromqueries
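For example, appending the rows of Emp into another table (a minimal sketch; Emp_copy is a hypothetical pre-existing target table):
INSERT INTO TABLE Emp_copy SELECT * FROM Emp;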
But you can't use this to append to an existing local file, so another option is to use a bash command.
If you have a file called 'export.hql' and in that file your code is:
select * from Emp;
Then your bash command can be:
hive -f 'export.hql' >> localfile.txt
The -f flag executes the Hive file, and the >> redirection appends the results to the text file.
EDIT:
The command:
hive -f 'export.hql' > localfile.txt
Will save the query results to a new file (overwriting it), not append.
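If you prefer not to keep a separate .hql file, the same append works with an inline query via the -e flag (an equivalent sketch, assuming the hive CLI is on your PATH):
hive -e 'select * from Emp;' >> localfile.txt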

https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-SQLOperations
When using 'LOCAL', 'OVERWRITE' is also required in your HQL.
For example:
INSERT OVERWRITE LOCAL DIRECTORY '/tmp/out' SELECT * FROM test
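As of Hive 0.11 you can also control the output delimiter when writing to a directory (a sketch reusing the test table from the example above):
INSERT OVERWRITE LOCAL DIRECTORY '/tmp/out'
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
SELECT * FROM test;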

Related

Timestamp partitioning in Hive

I am trying to create a timestamp-based partition in Hive, but Hive is creating a date-based partition. Below is my code. Could someone please help?
cat test1.sh
dat=`date +'%Y%m%d %H:%m:%S'`
hive -f load.hql -hiveconf file_load_timestamp=$dat;
cat load.hql
INSERT OVERWRITE table perm.test partition(file_load_timestamp='${hiveconf:file_load_timestamp}')
SELECT a,b FROM work.temp;
dt=20180102/ = the HDFS path is getting created like this.
dt=20180102 103455/ = I expect the HDFS path to be created like this.
When I tried with the %Y%m%d_%H:%m:%S format it works as expected, but I need a space between the date and the timestamp.
To create a folder name in HDFS with a space in between, you need to escape the space with \:
hadoop fs -mkdir test\ 123
creates a folder in HDFS named test 123.
Similarly, Hive stores its partitions in folders named after the partition value. That's why providing the date format %Y%m%d\ %H%m%S helps to create a folder with a space.
Below is tested and working:
INSERT OVERWRITE table person_details1 partition(datelocal='20180102\ 200128') select * from person_details;
datelocal is a String partition column.
Edit: I executed the code; below is the working version:
hduser@Amit:~$ cat test1.sh
#!/bin/sh
dat=`date +'%Y%m%d\ %H%m%S'`
hive -f load.hql -hiveconf datelocal="$dat";
hduser@Amit:~$ cat load.hql
INSERT OVERWRITE table amit.person_details1 partition(datelocal='${hiveconf:datelocal}') select * from amit.person_details;
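To verify, list the table's directory in HDFS and check that the new partition folder contains the space (a quick check, assuming the table sits under the default warehouse path):
hadoop fs -ls /user/hive/warehouse/amit.db/person_details1/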

multi file insert from hive table not working?

Hi, I have 200 GB of data in one of my Hive tables, backed by HBase.
I have to create 142 different files out of that table; currently I am trying with only 3 files.
I want all of the queries to run in parallel at the same time.
I was trying a multi-file insert from the Hive table but am getting a parse exception.
This is the query I was trying:
FROM hbase_table_FinancialLineItem
INSERT OVERWRITE LOCAL DIRECTORY '/hadoop/user/m6034690/FSDI/FinancialLineItem/Japan.txt'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
select * from hbase_table_FinancialLineItem WHERE FilePartition='Japan'
INSERT OVERWRITE LOCAL DIRECTORY '/hadoop/user/m6034690/FSDI/FinancialLineItem/SelfSourcedPrivate.txt'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
select * from hbase_table_FinancialLineItem WHERE FilePartition='SelfSourcedPrivate'
INSERT OVERWRITE LOCAL DIRECTORY '/hadoop/user/m6034690/FSDI/FinancialLineItem/ThirdPartyPrivate.txt'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
select * from hbase_table_FinancialLineItem WHERE FilePartition='ThirdPartyPrivate';
After running this I was getting the error below.
FAILED: ParseException line 7:9 missing EOF at 'from' near '*'
I think it can be solved by removing the repeated from hbase_table_FinancialLineItem inside each SELECT: in the multi-insert form, the source table is named once in the leading FROM clause, and each SELECT carries only its column list and WHERE filter. The parser fails exactly where it meets the second from.
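A corrected sketch of the query above, with the per-SELECT FROM clauses removed (note that each LOCAL DIRECTORY target is a directory, even when named like a file):
FROM hbase_table_FinancialLineItem
INSERT OVERWRITE LOCAL DIRECTORY '/hadoop/user/m6034690/FSDI/FinancialLineItem/Japan.txt'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
SELECT * WHERE FilePartition='Japan'
INSERT OVERWRITE LOCAL DIRECTORY '/hadoop/user/m6034690/FSDI/FinancialLineItem/SelfSourcedPrivate.txt'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
SELECT * WHERE FilePartition='SelfSourcedPrivate'
INSERT OVERWRITE LOCAL DIRECTORY '/hadoop/user/m6034690/FSDI/FinancialLineItem/ThirdPartyPrivate.txt'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
SELECT * WHERE FilePartition='ThirdPartyPrivate';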

Hive : How to execute a query from a file and dump the output in hdfs

I can execute a query from a SQL file and store the output in a local file using:
hive -f /home/Prashasti/test.sql > /home/Prashasti/output.csv
Also, I can store the output of a Hive query in HDFS using:
insert overwrite directory 'user/output' select * from folders;
Is there any way I can run the query from a SQL file and store the output in HDFS too?
Just modify the SQL file and add insert overwrite directory 'user/output' to the front of the query.
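For example, test.sql would then contain (a sketch built from the query and path in the question):
insert overwrite directory 'user/output'
select * from folders;
Run it as before with hive -f /home/Prashasti/test.sql; the results land in HDFS instead of stdout.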

Hive error: ParseException missing EOF

I am not sure what I am doing wrong here:
hive> CREATE TABLE default.testtbl(int1 INT,string1 STRING)
stored as orc
tblproperties ("orc.compress"="NONE")
LOCATION "/user/hive/test_table";
FAILED: ParseException line 1:107 missing EOF at 'LOCATION' near ')'
while the following query works perfectly fine:
hive> CREATE TABLE default.testtbl(int1 INT,string1 STRING)
stored as orc
tblproperties ("orc.compress"="NONE");
OK
Time taken: 0.106 seconds
Am I missing something here? Any pointers will help. Thanks!
Try putting "LOCATION" in front of "tblproperties" like below; it worked for me.
CREATE TABLE default.testtbl(int1 INT,string1 STRING)
stored as orc
LOCATION "/user/hive/test_table"
tblproperties ("orc.compress"="NONE");
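To confirm that the table picked up the custom location, the standard DESCRIBE FORMATTED command prints the Location: field along with the rest of the table metadata:
DESCRIBE FORMATTED default.testtbl;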
It seems even the sample SQL from the book "Programming Hive" got the order wrong. Please refer to the official definition of the CREATE TABLE command:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-CreateTable
@Haiying Wang pointed out that LOCATION is to be put in front of tblproperties.
But I think the error also occurs when LOCATION is specified above STORED AS.
It's better to stick to the correct order:
CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name -- (Note: TEMPORARY available in Hive 0.14.0 and later)
[(col_name data_type [COMMENT col_comment], ... [constraint_specification])]
[COMMENT table_comment]
[PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
[CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
[SKEWED BY (col_name, col_name, ...) -- (Note: Available in Hive 0.10.0 and later)
   ON ((col_value, col_value, ...), (col_value, col_value, ...), ...)
   [STORED AS DIRECTORIES]]
[
[ROW FORMAT row_format]
[STORED AS file_format]
| STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)] -- (Note: Available in Hive 0.6.0 and later)
]
[LOCATION hdfs_path]
[TBLPROPERTIES (property_name=property_value, ...)] -- (Note: Available in Hive 0.6.0 and later)
[AS select_statement]; -- (Note: Available in Hive 0.5.0 and later; not supported for external tables)
Refer: Hive Create Table
Check this post:
Loading Data from a .txt file to Table Stored as ORC in Hive
Also check the source files present at the specified directory /user/hive/test_table. In case the files are in .txt or some other non-ORC format, you can follow the steps in the above post to get past the error.
ParseException line lineNumber missing EOF at '.' near 'schemaName':
I got the above error while trying to execute the following command from a Linux script to truncate a Hive table:
dse -u username -p password hive -e "truncate table keyspace.tablename;"
Fix:
The commands within the script line need to be separated as follows:
dse -u username -p password hive -e "use keyspace; truncate table keyspace.tablename;"
Happy coding!
I got the same error while creating a table in Hive.
I used the DROP command to drop the table and then ran the CREATE TABLE command that I had again.
That worked for me.
If you see this error when running HiveQL from a file with the command "hive -f file.hql", and it points to the first line of your query, it is most likely caused by a forgotten semicolon (;) on a previous query,
since the parser looks for the semicolon (;) as the terminator of each query.
For example:
DROP TABLE IF EXISTS default.emp
create table default.emp (
field1 type,
field2 type)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
STORED AS TEXTFILE
LOCATION 's3://gts-promocube/source-data/Lowes/POS/';
If you save the above in a file and execute it with hive -f, then you'll get the error:
FAILED: ParseException line 2:0 missing EOF at 'CREATE' near emp.
Solution: put a semicolon (;) after the DROP TABLE command above.
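For reference, the corrected file, where only the added semicolon after the DROP TABLE statement changes:
DROP TABLE IF EXISTS default.emp;
create table default.emp (
field1 type,
field2 type)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
STORED AS TEXTFILE
LOCATION 's3://gts-promocube/source-data/Lowes/POS/';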

how to append hadoop job id to hive query result file?

I have a Hive query that does an insert overwrite to the local file system. My query is the following:
insert overwrite local directory '/home/test/dds'
select col1, col2 from test_table where query_ymd='2011-05-15' or query_ymd='2011-05-16' or query_ymd='2011-05-17';
It generates 2 files:
.000000_0.crc
000000_0
I would like the output to be:
attempt_201303210330_19069_r_000000_0
attempt_201303210330_19069_r_000000_0.crc
How can I configure the Hive server or the query?
One HQL statement can launch several jobs, not only one, so you cannot do this.
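If you only need the job id in the file name, a possible workaround (a hypothetical sketch; the id below is copied from the question, and in practice you would read it from the Hive console output) is to rename the output after the query finishes:
mv /home/test/dds/000000_0 /home/test/dds/attempt_201303210330_19069_r_000000_0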
