Vertica: executing COPY command with a default value

I'm trying to run a COPY command that populates the DB based on the contents of a CSV, but one column needs to be hardcoded.
The table column names are:
col1,col2,col3
The file content is (just the numbers; the names above are the DB column names):
1234,5678,5436
What I need is a way to insert data like this; based on my example, I want to put in the DB:
col1 col2 col3
1234 5678 10
Notice: 10 is hardcoded, ignoring the real value of col3 from the file.
Should I use FILLER? If so, what is the command?
My starting point is:
COPY SAMPLE.MYTABLE (col1,col2,col3)
FROM LOCAL 'c:\\1\\test.CSV'
UNCOMPRESSED DELIMITER ',' NULL AS 'NULL' ESCAPE AS '\' RECORD TERMINATOR ' ' ENCLOSED BY '"' DIRECT STREAM NAME 'Identifier_0' EXCEPTIONS 'c:\\1\\test.exceptions'
REJECTED DATA 'c:\\1\\test.rejections' ABORT ON ERROR NO COMMIT;
Can you help with how to load those columns (basically col3)?
Thanks

You just need to use a dummy FILLER column to parse (but ignore) the third value in your CSV. Then use AS with an expression to assign the third table column a literal.
I've added it to your COPY below. However, I'm not sure I understand your RECORD TERMINATOR setting; I'd look at that a little closer. Perhaps you had a copy/paste issue or something.
COPY SAMPLE.MYTABLE (col1, col2, dummy FILLER VARCHAR, col3 AS 10)
FROM LOCAL 'c:\1\test.CSV' UNCOMPRESSED DELIMITER ','
NULL AS 'NULL' ESCAPE AS '\' RECORD TERMINATOR ' '
ENCLOSED BY '"' DIRECT STREAM NAME 'Identifier_0'
EXCEPTIONS 'c:\1\test.exceptions' REJECTED DATA 'c:\1\test.rejections'
ABORT ON ERROR NO COMMIT;
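A quick way to verify the result (a minimal sketch; since the statement above uses NO COMMIT, run it in the same session or COMMIT first):
SELECT col1, col2, col3 FROM SAMPLE.MYTABLE;
-- expected row for the sample data: 1234 | 5678 | 10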

Related

Importing CSV via Oracle SQL*Loader, line break in double-quotes

I'm trying to import a CSV via Oracle SQL*Loader, but I have a problem because some data has line breaks within the double-quotes. For example:
"text.csv"
John,123,New York
Tom,456,Paris
Park,789,"Europe
London, City"
I think that SQL*LOADER uses the line break character to separate records.
This data generates an error "second enclosure string not present"
I use this control file.
(control.txt)
OPTIONS(LOAD=-1, ERRORS=-1)
LOAD DATA
INFILE 'test.csv'
TRUNCATE
INTO TABLE TMP
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
(
field1,
field2,
field3
)
and I execute a command like this
sqlldr 'scott/tiger' control='control.txt' log='result.txt'
I want to import 3 records, not 4 records...
Can SQL*Loader ignore line breaks within double-quotes?
It seems you need to get rid of the carriage return and/or line feed characters.
So replace
field3
with
field3 CHAR(4000) "REPLACE(TRIM(:field3),CHR(13)||CHR(10))"
or
field3 CHAR(4000) "REPLACE(REPLACE(TRIM(:field3),CHR(13)),CHR(10))"
where TRIM() is useful for removing leading and trailing whitespace.
In case you would like to preserve the embedded carriage returns, construct the control file using the "str" (stream) clause on the INFILE option line to set the end-of-record character. It tells sqlldr that hex 0D (carriage return, or ^M) is the record separator, so it will ignore the linefeeds inside the double-quotes:
INFILE 'test.csv' "str x'0D'"
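Putting that together with the control file from the question (a sketch; only the INFILE line changes):
OPTIONS(LOAD=-1, ERRORS=-1)
LOAD DATA
INFILE 'test.csv' "str x'0D'"
TRUNCATE
INTO TABLE TMP
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
(
field1,
field2,
field3
)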

How to insert a parameter from a concurrent program (.prog file) into a table using a dynamically created SQL*Loader control file

I have the .prog file from a host program created in Oracle Apps. I am sending a parameter from Oracle Apps with the host program, and I can access it in the .prog file like this, e.g.:
echo "5 Concurrent Program Parameter 1 : " ${5}
I need to use this parameter ($5) in the control file (.ctl), where I will insert some columns and this parameter into a new table, e.g.:
LOAD DATA
INSERT INTO TABLE TABLE_NAME
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
TRAILING NULLCOLS
(
COL1,
COL2,
DATA_FROM_PROG (5) => ** here I need to insert that data from the .prog file **
)
I am thinking it would have to be included in this command somehow, so that it creates this control file (or another one) dynamically, but I can't figure out how to pass that parameter and make this work.
I am familiar with this line that I used in the past for simpler problems, e.g.:
sqlldr userid=user/pass data=$5 control=control.ctl
Thanks.
I wouldn't know, as I don't know anything about Oracle Apps nor ".prog" files.
A workaround - from my perspective - would be to:
load only known data (from the source file);
specify data_from_prog as a FILLER field (it would then be populated with NULL values, if TRAILING NULLCOLS is specified);
after the loading session is over, update that column from Oracle Apps - then you'd use a simple UPDATE statement (see the sketch below); you're in the (PL/)SQL world, so it is easy to write such a query (at least, I hope so).
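A rough sketch of that follow-up UPDATE, using the names from the question (:parameter_value is just a placeholder for however the value reaches your (PL/)SQL code):
UPDATE table_name
SET data_from_prog = :parameter_value
WHERE data_from_prog IS NULL;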
Using a Bash script in the .prog file to create the control file (.ctl) dynamically from scratch seems to work, and I can use the parameters as well.
So in the .prog file we would have:
echo "5 Concurrent Program Parameter 1 : " ${5} /*this is only to test it*/
/* *Printf* with *>* command will create and edit a file.
Alternative *Printf* with *>>* would append to the file*/
printf "LOAD DATA\n
INFILE 'path_to_csv_file.csv'\n /*this is data for col1, col2 etc*/
INSERT INTO TABLE TABLE_NAME\n
FIELDS TERMINATED BY \',\' OPTIONALLY ENCLOSED BY \'\"\'\n
TRAILING NULLCOLS\n
(COL1,\n
COL2,\n
DATA_FROM_PROG CONSTANT ${5})" > [name and path to control file (e.g./folder/control.ctl)]
This way, when the .prog file is executed, it will dynamically create the .ctl file, which will contain the parameter that we want (${5}).
And we can also add something like this to run the .ctl file:
sqlldr userid=user/pass control=[path_to_control]control.ctl log=track.log
Also make sure to escape any double quotes (") inside the printf string with \, because you will get errors otherwise.
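For reference, the generated control file would then look roughly like this (the last line shows a placeholder for the actual value of ${5}):
LOAD DATA
INFILE 'path_to_csv_file.csv'
INSERT INTO TABLE TABLE_NAME
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
TRAILING NULLCOLS
(COL1,
COL2,
DATA_FROM_PROG CONSTANT <value of parameter 5>)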

String and non-string data getting converted to 'null' for empty fields while exporting into an Oracle table through Hive

I am new to Hadoop and I have a scenario where I have to export a dataset/file from HDFS to an Oracle table using sqoop export. The file has values of 'null' in it, so the same is getting exported into the table as well. I want to know how we can replace 'null' with a blank in the database while exporting.
You can create a TSV file from hive/beeline; in that process you can make nulls come out as blanks with --nullemptystring=true.
Example : beeline -u ${hhiveConnectionString} --outputformat=csv2 --showHeader=false --silent=true --nullemptystring=true --incremental=true -e 'set hive.support.quoted.identifiers =none; select * from someSchema.someTable where whatever > something' > /Your/Local/Location or EdgeNode/exportingfile.tsv
You can then use the created file in the sqoop export for exporting to the Oracle table.
You can also replace the nulls with blanks in the file with Unix sed.
Ex: sed -i 's/null//g' /Your/Local/Location or EdgeNode/exportingfile.tsv
In Oracle, empty strings and NULLs are treated the same for VARCHARs; that is why Oracle internally converts empty strings into NULLs for VARCHAR. When '' is assigned to a CHAR(1) it becomes ' ' (CHAR types are blank-padded strings). See what Tom Kyte says about this: https://asktom.oracle.com/pls/asktom/f?p=100:11:0%3a%3a%3a%3aP11_QUESTION_ID:5984520277372
See this manual: https://www.techonthenet.com/oracle/questions/empty_null.php
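A quick illustration of the VARCHAR behaviour (a minimal sketch; runs in any Oracle session):
SELECT CASE WHEN '' IS NULL THEN 'empty string is treated as NULL'
            ELSE 'empty string is not NULL'
       END AS result
FROM dual;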

Remove spaces and UTF while writing hive table into HDFS files

I am trying to write the Hive table into an HDFS file using the following query:
insert overwrite directory '<HDFS Location>' select customerid,'\t' ,f1,',', f2,',', f3,',', f4,',', f5 from sd_cust_product_recomm_all_emailid_model2 WHERE EMAILID IS NOT NULL;
I am getting the UTF characters and spaces in the file. The output is something like this:
customer1\t^Af1^A,^Af2^A,^Af3^A,^Af4^A,^Af5^A,
The desired output is in the following format:
customer1\tf1,f2,f3,f4,f5
customer2\tf1,f2,f3,f4,f5
with no spaces and no UTF characters.
Thanks for the help
The default delimiter is the issue. Data written to the filesystem is serialized as text with columns separated by ^A.
By explicitly specifying the field delimiter (comma) and the row delimiter (\n), you can overcome the issue.
insert overwrite directory '[HDFS Location]'
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n'
select customerid, '\t', f1, f2, f3, f4, f5
from sd_cust_product_recomm_all_emailid_model2
WHERE EMAILID IS NOT NULL;
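If the goal is literally the customer1<TAB>f1,f2,... layout shown in the question, another option (a sketch, assuming the columns are strings or are cast to string first) is to build each line with concat and write a single column:
insert overwrite directory '[HDFS Location]'
select concat(customerid, '\t', f1, ',', f2, ',', f3, ',', f4, ',', f5)
from sd_cust_product_recomm_all_emailid_model2
WHERE EMAILID IS NOT NULL;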

How to handle a delimiter in Hive

How do we handle data in Hive when \t is in the value and the delimiter is also \t? Suppose, for example, there is a column Street with data type String and value XXX\tYYY, and while creating the table we used \t as the field delimiter. How will the delimiter work? In this case, will the \t in the value also be treated as a delimiter?
If your columns with \t values are enclosed by a quote character like ", then you could use csv-serde to parse the data like this:
Here is a sample dataset that I have loaded:
R1Col1 R1Col2 "R1Col3 MoreData" R1Col4
R2Col2 R2Col2 "R2Col3 MoreData" R2Col4
Register the jar from hive console
hive> add jar /path/to/csv-serde-1.1.2-0.11.0-all.jar;
Create a table with the specified serde and custom properties
hive> create table test_table(c1 string, c2 string, c3 string, c4 string)
> row format serde 'com.bizo.hive.serde.csv.CSVSerde'
> with serdeproperties(
> "separatorChar" = "\t",
> "quoteChar" = "\"",
> "escapeChar" = "\\"
> )
> stored as textfile;
Load your dataset into the table:
hive> load data inpath '/path/to/file/in/hdfs' into table test_table;
Do a select * from test_table to check the results
You could download the csv-serde from here.
It will treat it as a delimiter, yes, the same as if you had a semicolon (;) in the value and told it to split on semicolons: when the text is scanned, the character is interpreted as the edge of the field.
To get around this, I have used sed to find and replace characters before loading the data into Hive, or I have created the Hive table with different delimiters, or left it at the default ^A (\001) and then, when extracting it, used sed on the output to replace the \001 with commas or tabs or whatever I needed. Running sed -i 's/oldval/newval/g' file on the command line will replace the characters in your file in place.
Is there a reason you chose to make the table with \t as the delimiter, instead of the default Hive field delimiter of ^A? Since tab is a fairly common character in text, and Hadoop/Hive is used a lot for handling text, it is tough to find a good character for delimiting.
We have faced the same issue in our data loads into Hadoop clusters. What we did was escape the delimiter (write it as \\t) whenever it appeared inside a data field, and add the below to the table definition.
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' ESCAPED BY '\\' LINES TERMINATED BY '\n'
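In a full table definition that might look like this (a sketch with made-up column names):
CREATE TABLE address_data (
  street STRING,
  city   STRING
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\t'
  ESCAPED BY '\\'
  LINES TERMINATED BY '\n'
STORED AS TEXTFILE;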
