How to append the Hadoop job ID to the Hive query result file? - hadoop

I have a Hive query that does an insert overwrite to the local file system. My query is the following:
insert overwrite local directory '/home/test/dds'
select col1, col2 from test_table where query_ymd='2011-05-15' or query_ymd='2011-05-16' or query_ymd='2011-05-17';
It generates 2 files:
.000000_0.crc
000000_0
I would like the output to be:
attempt_201303210330_19069_r_000000_0
attempt_201303210330_19069_r_000000_0.crc
How can I configure the Hive server or the query to do this?

A single HQL statement can launch several MapReduce jobs, not just one, so there is no single job ID that Hive could use to name the result file. You cannot do this.

Related

Hive one line command to catch SCHEMA + TABLE NAME info

Is there a way to catch all schema + table name info in a single command through Hive in a similar way to
SELECT * FROM information_schema.tables
from the PostgreSQL world?
Combining show databases and show tables in a loop (here an example) is an answer, but I'm looking for a more compact way to get the same result in a single command.
It's been a while since I worked on Hive queries, but as far as I remember you can use
hive> desc formatted tableName;
or
hive> describe formatted tableName;
It will give you all the relevant information about the table, such as the schema, partition info, and table type (e.g. managed table).
I am not sure if this is exactly what you are looking for.
Another way to query Hive tables is to write Hive scripts, which can be called from the Hadoop terminal rather than from the Hive terminal itself.
$ cat sample.hql        # or create/edit it with: vi sample.hql
use dbName;
select * from tableName;
desc formatted tableName;
# this hql script can be called from outside the hive terminal
$ hive -f sample.hql
Or, without even having to write a script file, you can query Hive directly:
$ hive -e "use dbName; select * from emp;" > text.txt        # use >> instead of > to append
At the database level, you can query as:
hive> use dbName;
hive> set hive.cli.print.current.db=true;
hive(dbName)> describe database dbName;
It will fetch the metadata about the database from the metastore (e.g. MySQL).
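If you have direct access to the metastore database, a more compact single command is to query the metastore itself. A minimal sketch, assuming a MySQL-backed metastore database named metastore with the standard schema (the DBS and TBLS tables joined on DB_ID); the user name and database name are placeholders for your setup:
$ mysql -u hiveuser -p metastore -e "
    SELECT d.NAME AS db_name, t.TBL_NAME AS table_name
    FROM DBS d JOIN TBLS t ON t.DB_ID = d.DB_ID
    ORDER BY d.NAME, t.TBL_NAME;"
This returns every database/table pair in one shot, much like information_schema.tables in PostgreSQL.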

Exit status of hive queries in .hql file

I have multiple Hive queries in hive_queries.hql. I want to keep a log tracking the exit status of the individual queries. Also, if possible, I want to change the individual queries that fetch the data; for example, I want to change the query
"select * from ABC"
to
"load data local inpath '<path>/<folder_name>' select * from ABC"
I want to keep a log tracking the exit status of individual queries
As far as I know, there is no standard way to track the exit status of individual queries run through an .hql file. What you can do instead (a sketch follows this list):
Output your data in Hive table format.
Check for the _SUCCESS file at the warehouse/output location (if it is an external table, or if you are using INSERT OVERWRITE) to determine success or failure.
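A minimal bash sketch of how that could be logged per query, assuming you split each query into its own .hql file and each writes to its own output directory; the file names, paths, and log file name below are placeholders:
#!/bin/bash
# Run each query separately so both the hive CLI exit code and the
# _SUCCESS marker can be checked per query.
for q in query1 query2; do
    hive -f "${q}.hql"
    rc=$?                                   # exit code of this hive run
    # _SUCCESS is written to the output directory by a completed job
    if hdfs dfs -test -e "/user/hive/output/${q}/_SUCCESS"; then
        marker="present"
    else
        marker="missing"
    fi
    echo "$(date '+%F %T') ${q} exit=${rc} _SUCCESS=${marker}" >> hive_queries.log
done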
I want to change the individual queries to fetch the data such as I
want to change the query "select * from ABC" to "load data local
inpath '<path>/<folder_name>' select * from ABC"
There is a trick to use hiveconf to achieve this.
Write your query like
${hiveconf:start_tag}
select * from ABC
This way, you are basically creating a placeholder in the script which can be replaced at runtime. For example:
if you execute the script as
hive -hiveconf start_tag="" -f my_script.hql
Then your query will be executed as
select * from ABC
if you execute the script as
hive -hiveconf start_tag="load data local inpath '<path>/<folder_name>'" -f my_script.hql
Then your query will be executed as
load data local inpath '<path>/<folder_name>'
select * from ABC

Sqoop incremental export using hcatalog?

Is there a way to use Sqoop to do incremental exports? I am using HCatalog integration for Sqoop. I tried using the --last-value and --check-column options that are used for incremental import, but Sqoop gave me an error that the options were invalid.
I have not seen incremental arguments for sqoop export. Another way you could try is to create a control_table in Hive where you keep a log of the table name and the timestamp when it was last exported.
create table if not exists control_table (
table_name string,
export_date timestamp
);
insert into table control_table select 'export_table1' as table_name, from_unixtime(unix_timestamp()) as export_date;
Here export_table1 is the table you want to export incrementally, and it is assumed you have already executed the two statements above.
--execute below at once
--get the timestamp when the table was last exported
create temporary table control_table_now as select table_name, max(export_date) as last_export_date from control_table group by table_name;
--get incremental rows
create table new_export_table1 as select field1, field2, field3, .... timestamp1 from export_table1 e, control_table_now c where c.table_name = 'export_table1' and e.timestamp1 >= c.last_export_date;
--append the control_table for next process
insert into table control_table select 'export_table1' as table_name, from_unixtime(unix_timestamp()) as export_date;
Now, export the new_export_table1 table which is incrementally created using sqoop export command.
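For that final step, a minimal sketch of the sqoop export call with HCatalog integration; the JDBC URL, credentials, and the target database/table names are placeholders to adjust to your environment:
sqoop export \
  --connect jdbc:mysql://dbhost/targetdb \
  --username dbuser -P \
  --table export_table1 \
  --hcatalog-database default \
  --hcatalog-table new_export_table1
Because new_export_table1 only contains rows newer than the last export_date in control_table, the export is effectively incremental even though sqoop export itself has no incremental mode.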
By default, Sqoop does not support incremental mode with HCatalog integration; when you try, it gives the following error:
Append mode for imports is not compatible with HCatalog. Please remove the parameter --append-mode
at org.apache.sqoop.tool.BaseSqoopTool.validateHCatalogOptions(BaseSqoopTool.java:1561)
You can use the query option to make it work, as described in this Hortonworks document.

Hive : How to execute a query from a file and dump the output in hdfs

I can execute a query from a sql file and store the output in a local file using
hive -f /home/Prashasti/test.sql > /home/Prashasti/output.csv
Also, I can store the output of a Hive query in HDFS using:
insert overwrite directory 'user/output' select * from folders;
Is there any way I can run the query from a sql file and store the output in hdfs too?
Just modify the sql file and add insert overwrite directory 'user/output' in front of the query.
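For example, a minimal sketch of the modified test.sql and how it would be run, using the paths and table from the question:
$ cat /home/Prashasti/test.sql
insert overwrite directory 'user/output'
select * from folders;
$ hive -f /home/Prashasti/test.sql
The query output then lands in the HDFS directory instead of being printed to stdout.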

Exporting Data from Hive table to Local Machine File System

Using the following command:
insert overwrite local directory '/my/local/filesystem/directory/path'
select * from Emp;
overwrites all the data that already exists in /my/local/filesystem/directory/path with the data of Emp.
What I want is to just copy the data of Emp to /my/local/filesystem/directory/path without overwriting; how do I do that?
Following are my failed trials:
hive> insert into local directory '/home/cloudera/Desktop/Sumit' select * from appdata;
FAILED: ParseException line 1:12 mismatched input 'local' expecting
TABLE near 'into' in insert clause
hive> insert local directory '/home/cloudera/Desktop/Sumit' select * from appdata;
FAILED: ParseException line 1:0 cannot recognize input near 'insert'
'local' 'directory' in insert clause
Can you please tell me how I can get this solved?
To append to a Hive table you need to use INSERT INTO:
INSERT INTO will append to the table or partition keeping the existing
data intact. (Note: INSERT INTO syntax is only available starting in
version 0.8)
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-InsertingdataintoHiveTablesfromqueries
But you can't use this to append to an existing local file, so another option is to use a bash command.
If you have a file called 'export.hql' and in that file your code is:
select * from Emp;
Then your bash command can be:
hive -f 'export.hql' >> localfile.txt
The -f option executes the Hive file and >> appends the results to the text file.
EDIT:
The command:
hive -f 'export.hql' > localfile.txt
will save the Hive query output to a new file (overwriting it), not append to it.
https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-SQLOperations
When using 'LOCAL', 'OVERWRITE' is also needed in your hql.
For example:
INSERT OVERWRITE LOCAL DIRECTORY '/tmp/out' SELECT * FROM test
