How to export Vertica table data to a SQL file - vertica

I want to export table data from Vertica and generate a SQL INSERT script. I have already exported the table schemas and generated a SQL script for them. Is there any way to export the table data as well?
Thank you.

What you can do is write a vsql script that sends its output to a file.
Script exp.sql:
-- don't align
\a
-- tuples only
\t
\pset fieldsep '|'
-- write to workfile.sql
\o workfile.sql
-- get the create table statement into the workfile
SELECT EXPORT_OBJECTS('','public.foo',FALSE);
-- put the COPY command for in-line data into the workfile
SELECT 'COPY public.foo FROM STDIN DELIMITER ''|'';';
-- export the table's data
SELECT * FROM public.foo;
-- append a backslash-dot line to mark the end of the COPY command's in-line data
SELECT '\.';
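To produce workfile.sql, run the script through vsql; a minimal sketch, assuming default connection settings (add -h, -U and the like as your setup requires):
$ vsql -f exp.sql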
Resulting workfile.sql:
CREATE TABLE public.foo
(
id numeric(37,15),
first_name varchar(256),
last_name varchar(256),
hire_dt timestamp
);
CREATE PROJECTION public.foo
(
id,
first_name,
last_name,
hire_dt
)
AS
SELECT foo.id,
foo.first_name,
foo.last_name,
foo.hire_dt
FROM public.foo
ORDER BY foo.id,
foo.first_name,
foo.last_name,
foo.hire_dt
UNSEGMENTED ALL NODES;
COPY public.foo FROM STDIN DELIMITER '|';
1.000000000000000|Arthur|Dent|2017-02-05 00:00:00
2.000000000000000|Ford|Prefect|2017-02-05 00:00:00
3.000000000000000|Zaphod|Beeblebrox|2017-02-05 00:00:00
4.000000000000000|Tricia|McMillan|2017-02-05 00:00:00
[ . . . ]
41.000000000000000|Lunkwill|Lunkwill|2017-02-05 00:00:00
42.000000000000000|Fook|Fook|2017-02-05 00:00:00
\.
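Because the COPY statement reads its in-line data up to the backslash-dot marker, the whole workfile can be replayed as-is to recreate the table, projection and data on another Vertica database; again a sketch assuming default connection settings:
$ vsql -f workfile.sql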

Related

How to store date values in a Hive timestamp column?

I am trying to store date and timestamp values in a timestamp column using Hive. The source file contains date values and sometimes timestamp values.
Is there a way to read both dates and timestamps using the timestamp data type in Hive?
Input:
2015-01-01
2015-10-10 12:00:00.232
2016-02-01
Output I am getting:
null
2015-10-10 12:00:00.232
null
Is it possible to read both kinds of values using the timestamp data type?
DDL:
create external table mytime (id string, t timestamp)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 'hdfs://xxx/data/dev/ind/';
I was able to think of a workaround and tried it with a small set of data:
Load the data with the inconsistent date values into a Hive table, say table1, declaring the column as string.
Create another table, table2, with the timestamp datatype for the required column, and load the data from table1 into table2 using the transformation INSERT OVERWRITE TABLE table2 select id, if(length(tsstr) > 10, tsstr, concat(tsstr, ' 00:00:00')) from table1;
This loads the data in the required format.
The code is below:
create table table1
(
id int,
tsstr string
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
LOCATION '/user/cloudera/hive/table1.tb';
Data:
1,2015-04-15 00:00:00
2,2015-04-16 00:00:00
3,2015-04-17
LOAD DATA LOCAL INPATH '/home/cloudera/data/tsstr' INTO TABLE table1;
create table table2
(
id int,
mytimestamp timestamp
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
LOCATION '/user/cloudera/hive/table2.tb';
INSERT INTO TABLE table2 select id,if(length(tsstr) > 10, tsstr, concat(tsstr,' 00:00:00')) from table1;
The result shows up as expected.
Hive is similar to any other database in terms of datatype mapping, and hence requires uniform values in a given column to be stored under one datatype. The data in your file's second column is non-uniform: some values are in date format while others are in timestamp format.
In order not to lose the date, as suggested by @Kishore, make sure you have a uniform datatype in the file, and get the file with timestamp values such as 2016-01-01 00:00:00.000 where there are only dates.
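If the file cannot be fixed upstream, a read-time alternative along the same lines as the workaround above is to land the raw column as a string and pad bare dates in a view; a minimal sketch, where the names mytime_stg and mytime_v are mine and the location is taken from the question:
-- staging table keeps the second column as a plain string
create external table mytime_stg (id string, t string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LOCATION 'hdfs://xxx/data/dev/ind/';
-- view pads bare dates to full timestamps before casting
create view mytime_v as
select id,
       cast(if(length(t) > 10, t, concat(t, ' 00:00:00')) as timestamp) as t
from mytime_stg;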

SQL Loader from one .dat file into multiple tables doesn't work

I want to load multiple tables from one data file, but it doesn't work.
I use the WHEN condition, but only the first table gets loaded.
For more details:
The SQL scripts for the tables:
CREATE TABLE TECHNOLOGY
(
code, dept, salary, hiredate
);
CREATE TABLE OTHER
(
code, dept, salary, hiredate
);
The data file ulcase5.dat:
100;Thomas;Sales;5000;1000
200;Jason;Technology;5500;2000
300;Mayla;Technology;7000;2000
400;Nisha;Marketing;9500;1000
500; Randy;Technology;6000;3000
600;Bea;Sales;5000;1000
The control file:
LOAD DATA
INFILE 'ulcase5.dat'
-- BADFILE 'ulcase5.bad'
-- DISCARDFILE 'ulcase5.dsc'
-- ONLY THIS TABLE IS LOADED
INTO TABLE TECHNOLOGY APPEND
WHEN salary = 'Technology'
FIELDS TERMINATED BY ";"
OPTIONALLY ENCLOSED BY '"'
(
code, dept, salary, hiredate
)
-- NEVER LOADED.
INTO TABLE OTHER
WHEN salary = 'Sales'
FIELDS TERMINATED BY ";"
OPTIONALLY ENCLOSED BY '"'
(
code, dept, salary, hiredate
)
Need your help, thanks.
Move APPEND before the INTO TABLE line it's currently on:
APPEND
INTO TABLE TECHNOLOGY
Plus, your table, data file and control file are not in sync, but I suspect that's an editing issue?
EDIT: Amended based on your comment.
Try replacing this:
WHEN salary = 'Sales'
with:
WHEN salary != 'Technology'
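Putting both fixes together, here is a sketch of a corrected control file. It assumes the data file's five fields are code, name, dept, salary and hiredate (the question's four-column lists don't match the data, per the sync remark above). Note the POSITION(1) on the first field of the second INTO TABLE clause: with delimited fields, a later clause otherwise continues scanning from where the previous one stopped.
LOAD DATA
INFILE 'ulcase5.dat'
APPEND
INTO TABLE TECHNOLOGY
WHEN dept = 'Technology'
FIELDS TERMINATED BY ";" OPTIONALLY ENCLOSED BY '"'
(code, name, dept, salary, hiredate)
INTO TABLE OTHER
WHEN dept != 'Technology'
FIELDS TERMINATED BY ";" OPTIONALLY ENCLOSED BY '"'
(code POSITION(1), name, dept, salary, hiredate)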

Hive date & timestamp from unix_timestamp()

I need two columns to be populated with the current date (sysdate) and a timestamp.
I have created the table and am inserting data using unix_timestamp(), but I am not able to convert it into Hive's date and timestamp formats.
############ Hive create table #############
create table informatica_p2020.M23_MD_LOC_BKEY(
group_nm string,
loc string,
natural_key string,
loc_sk_id int,
load_date date,
load_time timestamp)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION '/user/spanda20/informatica_p2020/infor_external/m23_md_loc/m23_md_loc_bkey/';
############### Insert into Table ##########
insert overwrite table M23_MD_LOC_BKEY select 'M23' as group_nm,loc,concat('M23','|','LOC') as NATURAL_KEY,
ROW_NUMBER() OVER () as loc_sk_id,
from_unixtime(unix_timestamp(), 'YYYY-MM-DD'),
from_unixtime(unix_timestamp(), 'YYYY-MM-DD HH:MM:SS.SSS') from M23_MD_LOC LIMIT 2 ;
################output of the insert query ############
M23 SY_BP M23|LOC 1 2015-07-183 2015-07-183 16:07:00.000
M23 SY_MX M23|LOC 2 2015-07-183 2015-07-183 16:07:00.000
Regards
Sanjeeb
Instead of from_unixtime(unix_timestamp(), 'YYYY-MM-DD'), try:
from_unixtime(unix_timestamp(), 'yyyy-MM-dd')
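The timestamp expression needs the same treatment: these patterns follow Java's SimpleDateFormat, in which DD means day-of-year (hence the 183), MM always means month, and minutes and seconds are lowercase mm and ss. Since unix_timestamp() only has one-second resolution, a .SSS fraction would always print as 000 anyway. The corrected pair of expressions:
from_unixtime(unix_timestamp(), 'yyyy-MM-dd'),
from_unixtime(unix_timestamp(), 'yyyy-MM-dd HH:mm:ss')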

Hive query output to file

I run a Hive query from Java code.
Example:
"SELECT * FROM table WHERE id > 100"
How can I export the result to a file on HDFS?
The following query will insert the results directly into HDFS:
INSERT OVERWRITE DIRECTORY '/path/to/output/dir' SELECT * FROM table WHERE id > 100;
This command will redirect the output to a text file of your choice:
$hive -e "select * from table where id > 10" > ~/sample_output.txt
This will put the results in tab delimited file(s) under a directory:
INSERT OVERWRITE LOCAL DIRECTORY '/home/hadoop/YourTableDir'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
SELECT * FROM table WHERE id > 100;
I agree with tnguyen80's response. Note that when the query contains a string literal, it is better to wrap the entire query in double quotes.
For example:
$hive -e "select * from table where city = 'London' and id >=100" > /home/user/outputdirectory/city details.csv
The ideal way to do it is to use "INSERT OVERWRITE DIRECTORY '/pathtofile' select * from temp where id > 100" instead of "hive -e 'select * from...' > /filepath.txt".
@sarath
How do I overwrite the file if I want to run another SELECT * command from a different table and write to the same file?
INSERT OVERWRITE LOCAL DIRECTORY '/home/training/mydata/outputs'
SELECT expl , count(expl) as total
FROM (
SELECT explode(splits) as expl
FROM (
SELECT split(words,' ') as splits
FROM wordcount
) t2
) t3
GROUP BY expl ;
This is an example for sarath's question: the above is a word-count job whose results are stored in the outputs directory on the local filesystem. :)
There are two ways to store HQL query results:
Save into HDFS Location
INSERT OVERWRITE DIRECTORY "HDFS Path" ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
SELECT * FROM XXXX LIMIT 10;
Save to Local File
$hive -e "select * from table_Name" > ~/sample_output.txt
$hive -e "select * from table where city = 'London' and id >=100" > /home/user/outputdirectory/city details.csv
Create an external table
Insert data into the table
Optionally drop the table later; this won't delete the underlying file, since it is an external table
Example:
Creating an external table to store the query results at '/user/myName/projectA_additionaData/':
CREATE EXTERNAL TABLE additionaData
(
ID INT,
latitude STRING,
longitude STRING
)
COMMENT 'Additional Data gathered by joining of the identified cities with latitude and longitude data'
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/user/myName/projectA_additionaData/';
Feeding the query results into the temp table
insert into additionaData
Select T.ID, C.latitude, C.longitude
from TWITER T
join CITY C on (T.location_name = C.location);
Dropping the temp table
drop table additionaData;
To directly save the file in HDFS, use the below command:
hive> insert overwrite directory '/user/cloudera/Sample' row format delimited fields terminated by '\t' stored as textfile select * from table where id >100;
This will put the contents in the folder /user/cloudera/Sample in HDFS.
Enter this line into Hive command line interface:
insert overwrite directory '/data/test' row format delimited fields terminated by '\t' stored as textfile select * from testViewQuery;
where testViewQuery is some specific view
To set output directory and output file format and more, try the following:
INSERT OVERWRITE [LOCAL] DIRECTORY directory1
[ROW FORMAT row_format] [STORED AS file_format]
SELECT ... FROM ...
Example:
INSERT OVERWRITE DIRECTORY '/path/to/output/dir'
ROW FORMAT DELIMITED
STORED AS PARQUET
SELECT * FROM table WHERE id > 100;

Add DB timestamp to SQL Loader CSV import

I have a client whose CSV file does not contain any dates. They would like a timestamp indicating when each row was loaded into their Oracle 11g database. The CSV file is supplied by a vendor, so I cannot modify it. I have tried adding a default column value and an "after insert" trigger, but with no luck. (Performance is not an issue, as this is an off-hours process.)
The control file looks like this:
options (skip=1, direct=true, rows=10000)
load data
infile data.dat
badfile sqldatatxtdata.bad
replace
into table LAM.CSV_DATA_TXT
fields terminated by ','
trailing nullcols
(ASSET, ASSET_CLASS, MATURITY, PRICE)
The table looks like this:
create table LAM.CSV_DATA_TXT (
ASSET VARCHAR2(20),
ASSET_CLASS VARCHAR2(50),
MATURITY varchar2(50),
PRICE NUMBER(12,8),
DATE_TIMESTAMP DATE default(SYSTIMESTAMP)
);
Any other ideas? Thanks.
Adding a TIMESTAMP column with a default value of SYSTIMESTAMP ought to work:
SQL> create table t23
2 ( id number not null
3 , ts timestamp default systimestamp not null)
4 /
Table created.
SQL> insert into t23 (id) values (9999)
2 /
1 row created.
SQL> select * from t23
2 /
ID TS
---------- -------------------------
9999 25-APR-11 15.21.01.495000
SQL>
So you'll need to explain in greater detail why it doesn't work in your case. One thing worth checking: your control file specifies direct=true, and direct path loads disable insert triggers and, as far as I know, do not apply column defaults, which would explain both failed attempts.
I note that in your example you have created the column with the DATE datatype. This means the defaulted SYSTIMESTAMP will be truncated to the nearest second. If you want timestamp values, you need to use the TIMESTAMP datatype.
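For reference, a sketch of the question's table with a TIMESTAMP column instead (names copied from the question; the parentheses around the default are merely dropped):
create table LAM.CSV_DATA_TXT (
ASSET VARCHAR2(20),
ASSET_CLASS VARCHAR2(50),
MATURITY VARCHAR2(50),
PRICE NUMBER(12,8),
DATE_TIMESTAMP TIMESTAMP DEFAULT SYSTIMESTAMP -- TIMESTAMP keeps fractional seconds; DATE truncates them
);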
