I'm using this code to write the results of a Hive query to the specified file:
INSERT OVERWRITE DIRECTORY '/user/test.user/test.csv'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' ESCAPED BY '"' STORED AS TEXTFILE
SELECT
...
However, I don't want the filename to be test.csv but rather the Unix timestamp, i.e. 1517213651.csv or something like that.
I understand I can't use the concat function to manipulate the filename, but that is as far as I got.
How do I get the timestamp of the moment of query execution to be the filename of my output?
EDIT: We're using Cloudera.
Another option is to put the Hive insert inside a shell script. Define a date variable in the script and then use that variable to build the output path.
TIMESTAMP_VAR=$(date +"%Y-%m-%d-%H-%M-%S")
FILENAME_VAR=/user/test/${TIMESTAMP_VAR}.csv
You can manipulate the timestamp layout in numerous ways.
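For example, a minimal sketch of such a wrapper script (the SELECT and table name here are placeholders; adjust the path and query to your case):

#!/bin/bash
# Unix epoch timestamp taken at the moment the script runs
TIMESTAMP_VAR=$(date +"%s")
OUTPUT_DIR=/user/test.user/${TIMESTAMP_VAR}.csv

# Run the export, writing the result into the timestamped location
hive -e "
INSERT OVERWRITE DIRECTORY '${OUTPUT_DIR}'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
SELECT * FROM your_table;
"

Keep in mind that Hive treats the target as a directory and writes files such as 000000_0 inside it, so the timestamp ends up in the directory name rather than in the file name itself.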
You have to add TalendDate.getDate("CCYYMMDD") to the file path:
"/File1/Output_File_" + TalendDate.getDate("CCYYMMDD") + ".csv"
I have the .prog file from a host program created in Oracle Apps. I am passing a parameter from Oracle Apps to the host program, and I can access it in the .prog file like this, e.g.
echo "5 Concurrent Program Parameter 1 : " ${5}
I need to use this parameter ($5) in the control file (.ctl), where I will insert some columns plus this parameter into a new table, e.g.
LOAD DATA
INSERT INTO TABLE TABLE_NAME
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
TRAILING NULLCOLS
(
COL1,
COL2,
DATA_FROM_PROG (5) => ** here I need to insert that data from the .prog file**
)
I am thinking it would have to be included in this command somehow, so that it creates this control file (or another one) dynamically, but I can't figure out how to pass that parameter and make this work.
I am familiar with this line, which I used in the past for simpler problems:
e.g. sqlldr userid=user/pass data=$5 control=control.ctl
Thanks.
I wouldn't know for sure, as I don't know anything about Oracle Apps nor ".prog" files.
A workaround, from my perspective, would be to (see the sketch after this list):
load only the known data (from the source file)
data_from_prog would be specified as a FILLER field (and populated with NULL values, since TRAILING NULLCOLS is specified)
after the loading session is over, update that column from Oracle Apps - then you'd use a simple UPDATE statement; you're in the (PL/)SQL world, so it is easy to write such a query (at least, I hope so)
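A rough sketch of that idea (the INFILE name and the literal in the UPDATE are placeholders, not the actual Oracle Apps call):

LOAD DATA
INFILE 'source_data.csv'
INSERT INTO TABLE TABLE_NAME
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
TRAILING NULLCOLS
(
COL1,
COL2
-- DATA_FROM_PROG is left out of the load, so it stays NULL in the table
)

-- afterwards, from SQL*Plus / PL/SQL, fill the column in with the parameter value
UPDATE TABLE_NAME
SET DATA_FROM_PROG = 'value_passed_from_oracle_apps'
WHERE DATA_FROM_PROG IS NULL;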
Using a Bash script in the .prog file to create the control file (.ctl) dynamically from scratch seems to be working, and I can use the parameters as well.
So in the .prog file we would have:
echo "5 Concurrent Program Parameter 1 : " ${5} /*this is only to test it*/
/* *Printf* with *>* command will create and edit a file.
Alternative *Printf* with *>>* would append to the file*/
printf "LOAD DATA\n
INFILE 'path_to_csv_file.csv'\n /*this is data for col1, col2 etc*/
INSERT INTO TABLE TABLE_NAME\n
FIELDS TERMINATED BY \',\' OPTIONALLY ENCLOSED BY \'\"\'\n
TRAILING NULLCOLS\n
(COL1,\n
COL2,\n
DATA_FROM_PROG CONSTANT ${5})" > [name and path to control file (e.g./folder/control.ctl)]
This way, when the .prog file is executed it will dynamically create the .ctl file, which will contain the parameter that we want (${5}).
We can also add something like this to run the .ctl file:
sqlldr userid=user/pass control=[path_to_control]control.ctl log=track.log
Also make sure to escape any double quotes " inside the printf string with \ (single quotes don't need escaping inside a double-quoted string), because you will get errors otherwise.
In our environment we do not have access to the Hive metastore to query it directly.
I have a requirement to dynamically generate tablename, columnname pairs for a set of tables.
I was trying to achieve this by running "describe extended $tablename" into a file for all tables and picking up the tablename and column name pairs from that file.
Is there any easier way to do this?
The desired output is like
table1|col1
table1|col2
table1|col3
table2|col1
table2|col2
table3|col1
This script will print the columns in the desired format for a single table. awk parses the output of the describe command, takes only the column name, prefixes it with the table_name variable and "|", and prints one line per column.
#!/bin/bash
#Set table name here
TABLE_NAME=your_schema.your_table
TABLE_COLUMNS=$(hive -S -e "set hive.cli.print.header=false; describe ${TABLE_NAME};" | awk -v table_name="${TABLE_NAME}" -F " " 'f&&!NF{exit}{f=1}f{printf c table_name "|" toupper($1)}{c="\n"}')
# print the collected table|COLUMN pairs
echo "${TABLE_COLUMNS}"
You can easily modify it for generating output for all tables using show tables command for example.
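For example, a rough sketch of that modification (assuming a schema called your_schema and that the column list in the describe output ends at the first blank line):

#!/bin/bash
SCHEMA=your_schema
# list every table in the schema, then describe each one and emit table|COLUMN pairs
for TABLE_NAME in $(hive -S -e "use ${SCHEMA}; show tables;"); do
  hive -S -e "set hive.cli.print.header=false; describe ${SCHEMA}.${TABLE_NAME};" \
    | awk -v table_name="${TABLE_NAME}" 'f&&!NF{exit}{f=1}f{print table_name "|" toupper($1)}'
done

This starts one Hive session per table, so it is slow for a large number of tables, but it keeps the parsing logic identical to the single-table version.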
The easier way is to access the metastore database directly.
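If the metastore happens to be reachable (e.g. a MySQL or PostgreSQL database), a query along these lines returns the same pairs; the table names follow the standard Hive metastore schema, so verify them against your version:

SELECT t.TBL_NAME, c.COLUMN_NAME
FROM DBS d
JOIN TBLS t ON t.DB_ID = d.DB_ID
JOIN SDS s ON s.SD_ID = t.SD_ID
JOIN COLUMNS_V2 c ON c.CD_ID = s.CD_ID
WHERE d.NAME = 'your_schema'
ORDER BY t.TBL_NAME, c.INTEGER_IDX;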
I am a beginner in Hadoop/Hive. I did some research to find a way to export the results of a HiveQL query to CSV.
I am running the below command line in PuTTY:
Hive -e ‘use smartsourcing_analytics_prod; select * from solution_archive_data limit 10;’ > /home/temp.csv;
However, below is the error I am getting:
ParseException line 1:0 cannot recognize input near 'Hive' '-' 'e'
I would appreciate inputs regarding this.
Run your command from outside the hive shell - just from the linux shell.
Run with 'hive' instead of 'Hive'
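For example, from the Linux shell (same query as in the question, just with straight quotes and a lowercase hive):

hive -e 'use smartsourcing_analytics_prod; select * from solution_archive_data limit 10;' > /home/temp.csv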
Just redirecting your output into a csv file won't work as-is, because the columns come out tab-separated. You can do:
hive -e 'YOUR QUERY HERE' | sed 's/[\t]/,/g' > sample.csv
like was offered here: How to export a Hive table into a CSV file?
AkashNegi's answer will also work for you... it is a bit longer though.
One way I do such things is to create an external table with the schema you want. Then do INSERT INTO TABLE target_table ... Look at the example below:
CREATE EXTERNAL TABLE isvaliddomainoutput (email_domain STRING, `count` BIGINT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ","
STORED AS TEXTFILE
LOCATION "/user/cloudera/am/member_email/isvaliddomain";
INSERT INTO TABLE isvaliddomainoutput
SELECT * FROM member_email WHERE isvalid = 1;
Now go to "/user/cloudera/am/member_email/isvaliddomain" and find your data.
Hope this helps.
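If you then want the result as a single local CSV file rather than a directory of part files, something like this should work (the local path is just an example):

hdfs dfs -getmerge /user/cloudera/am/member_email/isvaliddomain /tmp/isvaliddomain.csv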
I am trying to write the Hive table into an HDFS file using the following query:
insert overwrite directory '<HDFS Location>' select customerid,'\t' ,f1,',', f2,',', f3,',', f4,',', f5 from sd_cust_product_recomm_all_emailid_model2 WHERE EMAILID IS NOT NULL;
I am getting control characters and extra separators in the file. The output is something like this:
customer1\t^Af1^A,^Af2^A,^Af3^A,^Af4^A,^Af5^A,
I want the output in the following format:
customer1\tf1,f2,f3,f4,f5
customer2\tf1,f2,f3,f4,f5
with no extra separators or control characters.
Thanks for the help
The default delimiter is the issue. Data written to the filesystem is serialized as text with columns separated by ^A.
By explicitly specifying the field delimiter (comma) and the row delimiter (\n), and concatenating the tab between customerid and f1 rather than selecting '\t' as its own column, you can get the desired layout:
insert overwrite directory '[HDFS Location]'
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n'
select concat(customerid, '\t', f1), f2, f3, f4, f5
from sd_cust_product_recomm_all_emailid_model2
WHERE EMAILID IS NOT NULL;
I have my data in this format:
"123";"mybook1";"2002";"publisher1";
"456";"mybook2;the best seller";"2004";"publisher2";
"789";"mybook3";"2002";"publisher1";
The fields are enclosed in "" and are delimited by ;. Also, the book name may contain ';' in between.
Can you tell me how to load this data from the file into a Hive table?
The below query which I am using now is obviously not working:
create table books (isbn string,title string,year string,publisher string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\;'
If possible I want the userid and year fields to be stored as Int. Please help.
Also, I don't want to use the RegexSerDe approach.
How can I use the sed command from Unix to clean the data and get my output?
I tried to learn about the sed command and found the replace option, so I can remove the " double quotation marks. But how can I handle the extra ; semicolon that comes in the middle of the data?
Please help
I think you can preprocess with sed and then use the MetadataTypedColumnsetSerDe WITH SERDEPROPERTIES
sed -r ':a; s/^([^"]*("[^"]*"[^"]*)*);/\1XXXXX/g; t a; s/;/ /g; s/XXXXX/;/g' file
This sed matches the quote pairs to avoid processing what is between quotes, putting a placeholder on the semicolons outside of quoted text. Afterward it replaces the remaining ;'s (the ones inside the book title) with a space, and finally puts back the semicolons that are outside the quotes.
See here for more on how to load data using Hive, including an example of MetadataTypedColumnsetSerDe WITH SERDEPROPERTIES:
https://svn.apache.org/repos/asf/hive/trunk/serde/README.txt
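If you also strip the quotes in the same sed pass and turn the outer semicolons into commas, the cleaned file can be loaded into a plain ROW FORMAT DELIMITED table instead of using the SerDe. A sketch, assuming the raw file is /tmp/books.csv, the titles contain no commas, and the cleaned copy goes to /tmp/books_clean.csv:

# keep a placeholder on the outside-quote semicolons, blank the inner ones,
# then turn the placeholders into commas and drop the quotes
sed -r ':a; s/^([^"]*("[^"]*"[^"]*)*);/\1XXXXX/g; t a; s/;/ /g; s/XXXXX/,/g; s/"//g' /tmp/books.csv > /tmp/books_clean.csv

hive -e "
CREATE TABLE books (isbn INT, title STRING, year INT, publisher STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
LOAD DATA LOCAL INPATH '/tmp/books_clean.csv' INTO TABLE books;
"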
create external table books (isbn int, title string, year int, publisher string)
row format SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES ('separatorChar' = '\;', 'quoteChar' = '\"')
location 'S3 path/HDFS path for the file';
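One caveat worth noting: OpenCSVSerde treats every column as a string regardless of the declared types, so if isbn and year really need to come back as ints, a casting view on top is one option (the view name here is just an example):

create view books_typed as
select cast(isbn as int) as isbn,
       title,
       cast(year as int) as year,
       publisher
from books;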