Using SAS to insert rows containing nulls and blanks into a database

I have a db2 table DBTable with columns A, B, C (all of type varchar) which is linked to a library lib in SAS.
I use SAS to generate a dataset ValuesForA with one column whose content I want to write into column A of DBTable, with the additional requirement that column B is filled with ' ' (blank) and column C with (null). So DBTable should look something like this:
| A | B | C |
======================
| 'x' | ' ' | (null) |
| 'y' | ' ' | (null) |
| 'z' | ' ' | (null) |
I cannot find a way to achieve this in SAS, as it treats blanks as nulls.
The simple approach of specifying B as " " just fills that column with (null). I also tried using the nullchar=no option while not specifying a value for C:
proc sql;
  insert into lib.DBTable
    (nullchar=no, A, B)
  select
    A, " " as B
  from ValuesForA;
quit;
However, column C is then also filled with blanks:
| A | B | C |
===================
| 'x' | ' ' | ' ' |
| 'y' | ' ' | ' ' |
| 'z' | ' ' | ' ' |

Try this:
proc sql;
  insert into lib.DBTable
  select
    A, " " as B, null as C
  from ValuesForA;
quit;
This gave me the results you requested using a DB2 temp table with three VARCHAR columns.

I posted the same question on communities.sas.com.
In the end, I used a solution proposed by s_lassen (you may also check out his other answer there).
I give my own description of his solution here:
It seems that inserting blanks and nulls is not possible over the linked libraries. We can, however, write a SAS program that produces an SQL statement, which we can then execute on the database server itself using pass-through SQL.
This is all done by this little script:
/* *** Create a temporary file with the name "tempsas" *** */
Filename tempsas temp;
/* *** Loop through ValuesForA and write into tempsas. *** */
/* (The resulting tempsas is basically one SQL insert statement: all observations are written into one big VALUES clause.) */
Data _null_;
file tempsas;
set ValuesForA end=done;
if _N_=1 then put
'rsubmit;' /
'proc sql;' /
' Connect to DB2(<connect options>);' /
' execute by DB2(' /
' insert into DBTable(A, B, C)' /
' values'
;
put
" ('" a +(-1) "', ' ', null)" @; /* "a" is the value of ValuesForA.A for the current observation;
"+(-1)" removes the trailing blank written by the put statement;
the trailing "@" holds the line so the next put statement writes to the same line */
if done then put
/
' );' /
'quit;' /
'endrsubmit;'
;
else put ',';
run;
/* *** Run the code in tempsas *** */
%include tempsas;
The script creates the file "tempsas", writes the following code into it, and then executes it:
rsubmit;
proc sql;
Connect to DB2(<connect options>);
execute by DB2(
insert into DBTable(A, B, C)
values
('x', ' ', null),
('y', ' ', null),
('z', ' ', null)
);
quit;
endrsubmit;
I guess this solution is only feasible if there are not too many values to be inserted into the database.
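If there are many rows, one workaround (a rough, untested sketch of the same idea; the batch size of 500 is an arbitrary assumption) is to start a new INSERT statement every few hundred observations, so that no single VALUES clause grows too large:
Data _null_;
file tempsas;
set ValuesForA end=done;
/* open the remote session and pass-through connection once */
if _N_=1 then put 'rsubmit;' / 'proc sql;' / ' Connect to DB2(<connect options>);';
/* begin a new INSERT statement every 500 observations */
if mod(_N_, 500)=1 then put ' execute by DB2(' / ' insert into DBTable(A, B, C)' / ' values';
put " ('" a +(-1) "', ' ', null)" @;
/* close the current INSERT at each batch boundary and at the end */
if done or mod(_N_, 500)=0 then put / ' );';
else put ',';
if done then put 'quit;' / 'endrsubmit;';
run;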

Related

Unable to load data into hive table in correct format

I am trying to load the table below, which has two array-typed columns, into Hive.
Base table:
col1 Array<int>   col2 Array<string>
[1,2]             ['a','b','c']
[3,4]             ['d','e','f']
I have created the table in hive as below:
create table base(col1 array<int>,col2 array<string>) row format delimited fields terminated by '\t' collection items terminated by ',';
And then loaded the data as below:
load data local inpath '/home/hduser/Desktop/batch/hiveip/basetable' into table base;
I have used below command:
select * from base;
I got the output as below
[null,null] ["['a'","'b'","'c']"]
[null,null] ["['d'","'e'","'f]"]
I am not getting the data in the correct format.
Please help me find where I am going wrong.
You can change the datatype of col1 to array of string instead of array of int; then you can get the data for col1.
With col1 datatype as array<string>:
hive>create table base(col1 array<string>,col2 array<string>) row format delimited fields terminated by '\t' collection items terminated by ',';
hive>select * from base;
+--------------+------------------------+--+
| col1 | col2 |
+--------------+------------------------+--+
| ["[1","2]"] | ["['a'","'b'","'c']"] |
| ["[3","4]"] | ["['d'","'e'","'f']"] |
+--------------+------------------------+--+
Why this behaviour? Hive is not able to detect the values inside the array as integers, because the 1,2 values are enclosed in [].
Accessing col1 elements:
hive>select col1[0],col1[1] from base;
+------+------+--+
| _c0 | _c1 |
+------+------+--+
| [1 | 2] |
| [3 | 4] |
+------+------+--+
(or)
With col1 datatype as array<int>:
If you don't want to change the datatype, then you need to keep your input file as below, without the [] square brackets around the array (i.e. col1) values.
1,2 ['a','b','c']
3,4 ['d','e','f']
Then create the table just as you mentioned in the question, and Hive can detect the first 1,2 as array elements of int type.
hive> create table base(col1 array<int>,col2 array<string>) row format delimited fields terminated by '\t' collection items terminated by ',';
hive> select * from base;
+--------+------------------------+--+
| col1 | col2 |
+--------+------------------------+--+
| [1,2] | ["['a'","'b'","'c']"] |
| [3,4] | ["['d'","'e'","'f']"] |
+--------+------------------------+--+
Accessing array elements:
hive> select col1[0] from base;
+------+--+
| _c0 |
+------+--+
| 1 |
| 3 |
+------+--+
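A third option (my own sketch, not from the answer above; the table names base_stg and base_clean are hypothetical): load col1 as a plain string, strip the square brackets with regexp_replace, and split it into an array afterwards. Note the result is still array<string>; turning the elements into int would additionally require exploding the array.
hive> create table base_stg(col1 string, col2 array<string>) row format delimited fields terminated by '\t' collection items terminated by ',';
hive> load data local inpath '/home/hduser/Desktop/batch/hiveip/basetable' into table base_stg;
hive> create table base_clean as select split(regexp_replace(col1, '[\\[\\]]', ''), ',') as col1, col2 from base_stg;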

Oracle - Find Best Match between Two tables

My team and I are trying to determine the best way to match two different sets of data. There are no keys that can be joined on, as this data is coming from two separate sources that know nothing about each other. We import this data into two Oracle tables, and once that is done we can begin to look for matches.
Both tables contain a full list of properties (as in real estate). We need to match up the properties in Table1 to any potential matching properties found in Table2: for each and every record in Table1, search Table2 for a potential match and determine the probability of that match. My team and I have decided that the best way to do this would be to compare the address fields from the two tables.
The one catch is that Table1 provides the address in a parsed format, allocating the address number, address street and even the address type to separate columns, while Table2 only contains one column to hold the address. Each table has City, State and Zip columns that can be compared individually.
For Example - See Below Table1 and Table2:
Notice that the primary keys in my pseudo tables below are Key1 and Key2, matching the tables they are in.
+------+-------------+-------------+-----------+-------------+-------+-------+
|                                   TABLE1                                   |
+------+-------------+-------------+-----------+-------------+-------+-------+
| Key1 | Addr_Number | Addr_Street | Addr_Type | City        | State | Zip   |
+------+-------------+-------------+-----------+-------------+-------+-------+
| 1001 | 148         | Panas       | Road      | Robinson    | CA    | 76050 |
| 1005 | 110         | 48th        | Street    | San Juan    | NJ    | 8691  |
| 1009 | 8571        | Commerce    | Loop      | Vallejo     | UT    | 83651 |
| 1059 | 714         | Nettleton   | Avenue    | Vista       | TX    | 29671 |
| 1185 | 1587        | Orchard     | Drive     | Albuquerque | PA    | 77338 |
+------+-------------+-------------+-----------+-------------+-------+-------+
+-------+-------------------+------------+-------+-------+
|                         TABLE2                         |
+-------+-------------------+------------+-------+-------+
| Key2  | Address           | City       | State | Zip   |
+-------+-------------------+------------+-------+-------+
| Ax89f | 148 Panas Road    | Robinson   | CA    | 76050 |
| B184a | 110 48th Street   | San Juan   | NJ    | 08691 |
| B99ff | 8571 Commerce Lp  | Vallejo    | UT    | 83651 |
| D81bc | 714 Nettleton Ave | Vista      | TX    | 29671 |
| F84a2 | 1587 Orachard Dr  | Albuquerqu | PA    | 77338 |
+-------+-------------------+------------+-------+-------+
The goal here is to provide an output that simply displays ALL of the records from Table1 and the best-matched record found in Table2. There could of course be many records that are potential matches, but we want to keep this a one-to-one relationship and not produce duplicates in this initial output. The output should be just one record out of Table1 matched to the best find in Table2.
See below an example of the Desired output I am attempting to create:
+------+-------+---------------+---------------------------+
|                       Matched_Output                      |
+------+-------+---------------+---------------------------+
| Key1 | Key2  | Percent_Match | num_Matched_Records > 90% |
+------+-------+---------------+---------------------------+
| 1001 | Ax89f | 100%          | 5                         | -- all parsed values match
| 1005 | B184a | 98%           | 4                         | -- zip code prefixed with zero in Table2
| 1009 | B99ff | 95%           | 3                         | -- Loop vs Lp
| 1059 | D81bc | 95%           | 2                         | -- Avenue vs Ave
| 1185 | F84a2 | 97%           | 2                         | -- city spelled wrong in Table2, and Drive vs Dr
+------+-------+---------------+---------------------------+
In the output I want to see Key1 from Table1 and the matched record right next to it, showing that it matches the record in Table2 with that Key2. Next, we need to know how well these two records match. There could be many records in Table2 with some probability of matching a record in Table1; in fact, every single record in Table2 can be assigned a percentage anywhere from 0% up to a 100% match.
So now to the main question:
How does one obtain this percentage?
How do I parse the Address column in Table2 so that I can compare it against each of the individual columns that make up the address in Table1, and then apply a comparison algorithm to each parsed value?
So far this is what my team and I have come up with (brainstorming, spitballing, whatever you want to call it).
We have taken a look at a couple of the built-in Oracle functions to obtain the percentages we are looking for, as well as trying to utilize regular expressions. If I could hit up Google and use some of their search algorithms, I would. Obviously I don't have that luxury and must design my own.
regexp_count(table2_city,'(^| )'||REPLACE(table1_city,' ','|')||'($| )') city_score,
regexp_count(table2_city,'(^| )') city_max,
to_char((city_score/city_max)*100, '999G999G999G999G990D00')||'%' city_perc,
The above was just what my team and I used as a proof of concept. We simply selected these values out of the two tables and ran the regexp_count function against those columns. Here are a few other functions that we have taken a look at:
SOUNDEX
REGEXP_LIKE
REGEXP_REPLACE
These functions are great but I'm not sure they can be used in a Single Query between both tables to produce the desired output.
Another idea is that we could create a function that takes as its parameters the address fields we want to compare. That function would then search Table2 for the most probable match and return the Key2 value out of Table2 to the user.
Function(Addr_Number, Addr_Street, Addr_type, City, State) RETURN table2.key2
For example maybe something like this 'could' work:
Select tb1.key1, table2Function(tb1.Addr_Number, tb1.Addr_Street, tb1.Addr_type, tb1.City, tb1.State) As Key2
From Table1 tb1;
Lastly, just know that there are roughly 15k records currently in Table1 and 20k records in Table2. Again... each record in Table1 needs to be checked against each record in Table2 for a potential match.
I'm all ears. And thank you in advance for your feedback.
Use the UTL_MATCH package:
Oracle Setup:
CREATE TABLE Table1 ( Key1, Addr_Number, Addr_Street, Addr_Type, City, State, Zip ) AS
SELECT 1001, 148, 'Panas', 'Road', 'Robinson', 'CA', 76050 FROM DUAL UNION ALL
SELECT 1005, 110, '48th', 'Street', 'San Juan', 'NJ', 8691 FROM DUAL UNION ALL
SELECT 1009, 8571, 'Commerce', 'Loop', 'Vallejo', 'UT', 83651 FROM DUAL UNION ALL
SELECT 1059, 714, 'Nettleton', 'Avenue', 'Vista', 'TX', 29671 FROM DUAL UNION ALL
SELECT 1185, 1587, 'Orchard', 'Drive', 'Albuquerque', 'PA', 77338 FROM DUAL;
CREATE TABLE Table2 ( Key2, Address, City, State, Zip ) AS
SELECT 'Ax89f', '148 Panas Road', 'Robinson', 'CA', '76050' FROM DUAL UNION ALL
SELECT 'B184a', '110 48th Street', 'San Juan', 'NJ', '08691' FROM DUAL UNION ALL
SELECT 'B99ff', '8571 Commerce Lp', 'Vallejo', 'UT', '83651' FROM DUAL UNION ALL
SELECT 'D81bc', '714 Nettleton Ave', 'Vista', 'TX', '29671' FROM DUAL UNION ALL
SELECT 'F84a2', '1587 Orachard Dr', 'Albuquerqu', 'PA', '77338' FROM DUAL;
Query:
SELECT Key1,
Key2,
UTL_MATCH.EDIT_DISTANCE_SIMILARITY(
A.Addr_Number || ' ' || A.Addr_Street || ' ' || A.Addr_Type
|| ' ' || A.City || ' ' || A.State || ' ' || A.Zip,
B.Address || ' ' || B.City || ' ' || B.State || ' ' || B.Zip
) AS Percent_Match,
CASE WHEN UTL_MATCH.EDIT_DISTANCE_SIMILARITY(
A.Addr_Number || ' ' || A.Addr_Street || ' ' || A.Addr_Type,
B.Address
) >= 90
THEN 1
ELSE 0
END
+
CASE WHEN UTL_MATCH.EDIT_DISTANCE_SIMILARITY( A.City, B.City ) >= 90
THEN 1
ELSE 0
END
+
CASE WHEN UTL_MATCH.EDIT_DISTANCE_SIMILARITY( A.State, B.State ) >= 90
THEN 1
ELSE 0
END
+
CASE WHEN UTL_MATCH.EDIT_DISTANCE_SIMILARITY( A.Zip, B.Zip ) >= 90
THEN 1
ELSE 0
END AS Num_Matched
FROM Table1 A
INNER JOIN
Table2 B
ON ( SYS.UTL_MATCH.EDIT_DISTANCE_SIMILARITY(
A.Addr_Number || ' ' || A.Addr_Street || ' ' || A.Addr_Type
|| ' ' || A.City || ' ' || A.State || ' ' || A.Zip,
B.Address || ' ' || B.City || ' ' || B.State || ' ' || B.Zip
) > 80 );
Output:
KEY1 KEY2 PERCENT_MATCH NUM_MATCHED
---------- ----- ------------- -----------
1001 Ax89f 100 4
1005 B184a 97 3
1009 B99ff 95 3
1059 D81bc 92 3
1185 F84a2 88 3
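If you also want to parse Table2.Address into components (the question's second point), a sketch along these lines could work, assuming the addresses keep the "number street... type" shape shown in the sample data:
SELECT Key2,
       REGEXP_SUBSTR( Address, '^\d+' ) AS Addr_Number,
       REGEXP_SUBSTR( Address, '^\d+\s+(.*)\s+\S+$', 1, 1, NULL, 1 ) AS Addr_Street,
       REGEXP_SUBSTR( Address, '\S+$' ) AS Addr_Type
FROM   Table2;
The parsed pieces could then be compared column by column with UTL_MATCH, as in the query above.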
A few thoughts.
First, you may want to take a look at the utl_match package:
https://docs.oracle.com/cd/E18283_01/appdev.112/e16760/u_match.htm
Then: you will surely want to match by ZIP code and state first, perhaps adding leading zeros to the ZIP code where needed - although apparently one of your concerns is typos, not just different packaging of the input data. If there are typos in the ZIP code you can more or less deal with that, but if there are typos in the state, that really sucks.
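For example (a sketch, not part of the original answer): normalizing the ZIP and pre-filtering on ZIP plus state keeps the fuzzy comparison from running over the full 15k x 20k cross join:
SELECT A.Key1, B.Key2
FROM   Table1 A
       JOIN Table2 B
         ON  LPAD( TO_CHAR( A.Zip ), 5, '0' ) = B.Zip
         AND A.State = B.State;
The UTL_MATCH scoring can then be restricted to these candidate pairs instead of the full cross join.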
You may want to score the similarity by city, but often that won't help. For example, for all practical purposes Brooklyn, NY should be seen as matching New York City, NY but there's no way you can do that in your project. So I would put a very low weight on matching by city.
Similar comment about the address type; perhaps you can create a small table with equivalencies, such as Street, Str, Str. or Lane, Ln, Ln. But the fact is often people are not consistent when they give you an address; they may say "Clover Street" to one source and "Clover Avenue" to another. So you may be better off comparing only the street number and the street name.
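Such an equivalence table might look like this (a sketch; the table and column names are made up):
CREATE TABLE addr_type_equiv ( raw_type VARCHAR2(20), canonical_type VARCHAR2(20) );

INSERT INTO addr_type_equiv VALUES ( 'Str',  'Street' );
INSERT INTO addr_type_equiv VALUES ( 'Str.', 'Street' );
INSERT INTO addr_type_equiv VALUES ( 'Ln',   'Lane' );
INSERT INTO addr_type_equiv VALUES ( 'Ln.',  'Lane' );
INSERT INTO addr_type_equiv VALUES ( 'Lp',   'Loop' );
INSERT INTO addr_type_equiv VALUES ( 'Ave',  'Avenue' );
Normalizing both sides' address types through canonical_type before comparing should raise scores for pairs like 'Loop' vs 'Lp'.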
Good luck!

How to access parameters inside a Netezza stored procedure?

This is my stored procedure:
nzsql -u user -pw pass -c "CREATE OR REPLACE PROCEDURE INSERT_LOGIC(varchar(50),varchar(20),varchar(40)) RETURNS BOOL LANGUAGE NZPLSQL AS BEGIN_PROC
DECLARE
t1 ALIAS FOR $1;
t2 ALIAS FOR $2;
t3 ALIAS FOR $3;
BEGIN
INSERT INTO ABC..XYZ
(select '$t1','$t2','$t3' from ABC..PQR limit 10);
END;
END_PROC;"
Using ALIAS FOR is the only way I found on the internet to do this, but I get the following error:
NOTICE: plpgsql: ERROR during compile of INSERT_LOGIC near line 3
ERROR: syntax error, unexpected ERROR, expecting VARIABLE or WORD at or near "t1Stuff"
How do I access the three varchar parameters that I pass to the stored procedure from inside it?
Here is a working example similar to your requirement. I am using two tables, 'tab1' and 'tab2', with the following descriptions:
$ nzsql -d test -c "\d tab1"
Table "TAB1"
Attribute | Type | Modifier | Default Value
-----------+---------------+----------+---------------
COL1 | INTEGER | |
COL2 | CHARACTER(10) | |
COL3 | INTEGER | |
Distributed on hash: "COL1"
$ nzsql -d test -c "\d tab2"
Table "TAB2"
Attribute | Type | Modifier | Default Value
-----------+---------------+----------+---------------
C1 | INTEGER | |
C2 | CHARACTER(10) | |
C3 | INTEGER | |
Distributed on hash: "C1"
Following is the stored procedure code that I used:
CREATE OR REPLACE PROCEDURE INSERT_LOGIC(varchar(50),varchar(20),varchar(40))
RETURNS BOOL
LANGUAGE NZPLSQL
AS
BEGIN_PROC
DECLARE
num_args int4;
sql char(100);
t1 ALIAS FOR $1;
t2 ALIAS FOR $2;
t3 ALIAS FOR $3;
BEGIN
num_args := PROC_ARGUMENT_TYPES.count;
RAISE NOTICE 'Number of arguments: %', num_args;
sql := 'INSERT INTO tab2 SELECT ' || t1 || ',' || t2 || ',' || t3 || ' FROM tab1 LIMIT 10 ';
RAISE NOTICE 'SQL Statement: %', sql;
EXECUTE IMMEDIATE sql;
END;
END_PROC;
Hope this will help!
You're attempting to reference variables by putting a $ in front of the name, which is not valid.
Look at the example in the docs.
DECLARE
logtxt ALIAS FOR $1;
curtime timestamp;
BEGIN
curtime := 'now()';
INSERT INTO logtable VALUES (logtxt, curtime);
RETURN curtime;
END
You should try
INSERT INTO ABC..XYZ
(select t1, t2, t3 from ABC..PQR limit 10);
Though it's possible that the column values won't resolve when used this way. If not, build a dynamic statement and execute it instead.
declare sql varchar;
sql := 'insert into abc..xyz select ' || t1 || ',' || t2 || ',' || t3 || ' from abc..pqr limit 10';
execute immediate sql;
If you're passing values, not column names, as parameters:
declare sql varchar;
sql := 'insert into abc..xyz select ''' || t1 || ''',''' || t2 || ''',''' || t3 || ''' from abc..pqr limit 10';
execute immediate sql;
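For completeness, a sketch of invoking the procedure once it compiles (the column-name arguments here are examples, matching the first variant that passes column names):
nzsql -u user -pw pass -c "CALL INSERT_LOGIC('COL1','COL2','COL3');"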

ORA-00920 when running query through UI Framework

My company works with a UI framework written in PL/SQL which interacts with a Java client program.
A lot of queries need to be passed to it as VARCHAR2 strings so they can be called when needed.
The query
SELECT DISTINCT cod_visum
FROM tbl_visum
WHERE num_firma = 0
AND (code_visum BETWEEN 1 AND 799 OR code_visum BETWEEN 900 AND 999)
AND id_worker IS NULL
OR ( date_valid_to IS NOT NULL
AND p_date_from > NVL (date_valid_to, p_date_from + 1))
ORDER BY 1;
runs fine and returns the expected results when run from the TOAD editor, but when I pass it as a VARCHAR2 to the framework, I keep getting an ORA-00920 (invalid relational operator) and can't find the cause. (Also, the framework catches the exception and only shows me a dialog with the exception number and text >.<)
I've tried several methods of concatenating the variable p_date_from into the VARCHAR2, like:
v_sql_norm VARCHAR2 (450)
:= 'SELECT DISTINCT code_visum
FROM tbl_visum
WHERE num_firma = 0
AND (code_visum BETWEEN 1 AND 799 OR code_visum BETWEEN 900 AND 999)
AND id_worker IS NULL
OR ( date_valid_to IS NOT NULL AND '
|| p_date_from
|| ' > NVL(date_valid_to , '
|| (p_date_from + 1)
|| ')) ORDER BY 1';
I already checked the date formats and tried to convert the dates to strings before concatenation, but everything results in the same exception.
The table used in the query looks like this:
+--------------------+-------+-------------------+
| Column Name | NULL? | DATA TYPE |
+--------------------+-------+-------------------+
| ID_VISUM_ASSIGMENT | N | NUMBER (12) |
| NUM_FIRMA | N | NUMBER (3) |
| CODE_VISUM | N | VARCHAR2 (3 Char) |
| ID_WORKER | Y | NUMBER (10) |
| DATE_VALID_FROM | Y | DATE |
| DATE_VALID_TO | Y | DATE |
+--------------------+-------+-------------------+
Side info: a visum (or visa?) is assigned to a worker starting at a specific date (date_valid_from). It can be assigned to a certain date (date_valid_to), or indefinitely assigned (date_valid_to is NULL).
Thank you in advance, really appreciate any help! :)
edit: yes, cod_visum is a VARCHAR2, but in the query it's used as a NUMBER; I also already tried casting it to a number (and mostly it's cast explicitly ^^)
This bit is wrong: the fourth line mixes a variable with the boilerplate.
OR ( date_valid_to IS NOT NULL AND '
|| p_date_from
|| ' > NVL(date_valid_to , '
|| (p_date_from + 1)
It should be something like
OR ( date_valid_to IS NOT NULL AND '
|| p_date_from
|| ' > NVL(date_valid_to , ('
|| p_date_from
|| ' + 1)'
Also the logic seems wonky, but that's an aside.
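A further note (my own suggestion, not part of the answer above): when a DATE variable is concatenated into a VARCHAR2, it is rendered using the session's implicit NLS date format (e.g. 15-JAN-24), and that unquoted fragment in the generated SQL is a classic way to get ORA-00920. Building the fragment with an explicit TO_CHAR and a DATE literal avoids depending on session settings, roughly like this:
OR ( date_valid_to IS NOT NULL AND DATE '''
|| TO_CHAR (p_date_from, 'YYYY-MM-DD')
|| ''' > NVL (date_valid_to, DATE '''
|| TO_CHAR (p_date_from, 'YYYY-MM-DD')
|| ''' + 1)) ORDER BY 1';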

vsql/vertica, how to copy text input file's name into destination table

I have to copy an input text file (text_file.txt) into a table (table_a). I also need to include the input file's name in the table.
my code is:
\set t_pwd `pwd`
\set input_file '\'':t_pwd'/text_file.txt\''
copy table_a
( column1
,column2
,column3
,FileName :input_file
)
from :input_file
The last line does not copy the input text file's name into the table.
How can I copy the input text file's name into the table (without manually typing the file name)?
Solution 1
This might not be the perfect solution for your job, but I think it will do the job:
You can get the file name, store it in a TBL variable, and then append this variable to the end of each line in the CSV file that you are about to load into Vertica.
Now, depending on your CSV file size, this can be quite time- and CPU-consuming.
export TBL=`ls -1 | grep '\.txt$'`
sed -e 's/$/|'$TBL'/' -i $TBL
Example:
[dbadmin@bih001 ~]$ cat load_data1
1|2|3|4|5|6|7|8|9|10
[dbadmin@bih001 ~]$ export TBL=`ls -1 | grep load`
[dbadmin@bih001 ~]$ sed -e 's/$/|'$TBL'/' -i $TBL
[dbadmin@bih001 ~]$ cat load_data1
1|2|3|4|5|6|7|8|9|10|load_data1
Solution 2
You can use a DEFAULT CONSTRAINT, see example:
1. Create your table with a DEFAULT CONSTRAINT
[dbadmin@bih001 ~]$ vsql
Password:
Welcome to vsql, the Vertica Analytic Database interactive terminal.
Type: \h or \? for help with vsql commands
\g or terminate with semicolon to execute query
\q to quit
dbadmin=> create table TBL (id int ,CSV_FILE_NAME varchar(200) default 'TBL');
CREATE TABLE
dbadmin=> \dt
List of tables
Schema | Name | Kind | Owner | Comment
--------+------+-------+---------+---------
public | TBL | table | dbadmin |
(1 row)
See the DEFAULT CONSTRAINT: it has the 'TBL' default value.
dbadmin=> \d TBL
List of Fields by Tables
Schema | Table | Column | Type | Size | Default | Not Null | Primary Key | Foreign Key
--------+-------+---------------+--------------+------+---------+----------+-------------+-------------
public | TBL | id | int | 8 | | f | f |
public | TBL | CSV_FILE_NAME | varchar(200) | 200 | 'TBL' | f | f |
(2 rows)
2. Now set up your COPY variables, insert some data, and alter the DEFAULT CONSTRAINT to your current :input_file value.
dbadmin=> \set t_pwd `pwd`
dbadmin=> \set CSV_FILE `ls -1 | grep load*`
dbadmin=> \set input_file '\'':t_pwd'/':CSV_FILE'\''
dbadmin=>
dbadmin=>
dbadmin=> insert into TBL values(1);
OUTPUT
--------
1
(1 row)
dbadmin=> select * from TBL;
id | CSV_FILE_NAME
----+---------------
1 | TBL
(1 row)
dbadmin=> ALTER TABLE TBL ALTER COLUMN CSV_FILE_NAME SET DEFAULT :input_file;
ALTER TABLE
dbadmin=> \dt TBL;
List of tables
Schema | Name | Kind | Owner | Comment
--------+------+-------+---------+---------
public | TBL | table | dbadmin |
(1 row)
dbadmin=> \d TBL;
List of Fields by Tables
Schema | Table | Column | Type | Size | Default | Not Null | Primary Key | Foreign Key
--------+-------+---------------+--------------+------+----------------------------+----------+-------------+-------------
public | TBL | id | int | 8 | | f | f |
public | TBL | CSV_FILE_NAME | varchar(200) | 200 | '/home/dbadmin/load_data1' | f | f |
(2 rows)
dbadmin=> insert into TBL values(2);
OUTPUT
--------
1
(1 row)
dbadmin=> select * from TBL;
id | CSV_FILE_NAME
----+--------------------------
1 | TBL
2 | /home/dbadmin/load_data1
(2 rows)
Now you can implement this in your copy script.
Example:
\set t_pwd `pwd`
\set CSV_FILE `ls -1 | grep load*`
\set input_file '\'':t_pwd'/':CSV_FILE'\''
ALTER TABLE TBL ALTER COLUMN CSV_FILE_NAME SET DEFAULT :input_file;
copy TBL from :input_file DELIMITER '|' DIRECT;
Solution 3
Use the LOAD_STREAMS table
Example:
When loading a table give it a stream name - this way you can identify the file name / stream name:
COPY mytable FROM myfile DELIMITER '|' DIRECT STREAM NAME 'My stream name';
Here is how you can query your load_streams table:
=> SELECT stream_name, table_name, load_start, accepted_row_count,
rejected_row_count, read_bytes, unsorted_row_count, sorted_row_count,
sort_complete_percent FROM load_streams;
-[ RECORD 1 ]----------+---------------------------
stream_name | fact-13
table_name | fact
load_start | 2010-12-28 15:07:41.132053
accepted_row_count | 900
rejected_row_count | 100
read_bytes | 11975
input_file_size_bytes | 0
parse_complete_percent | 0
unsorted_row_count | 3600
sorted_row_count | 3600
sort_complete_percent | 100
Makes sense? Hope this helped!
If you do not need to do it purely from inside vsql, it might be possible to cheat a bit and move the logic outside Vertica, into bash for example:
FILE=text_file.txt
(
while read LINE; do
echo "$LINE|$FILE"
done < "$FILE"
) | vsql -c 'copy table_a (...) FROM STDIN'
That way you basically COPY FROM STDIN, adding the filename to each line before it even reaches Vertica.
