Whitespace appearing while exporting data from a Teradata table via BTEQ - shell

I have a BTEQ script which I'm running from shell (ksh). The aim is to export the contents of a Teradata table to a .csv file. The problem is that while exporting, too many white spaces are introduced between columns. I have tried
1. trimming individual columns
2. using CAST to convert each column's datatype to CHAR
but neither seems to help.
The BTEQ code looks something like this (I have used REPORT format since I need file headers):
.EXPORT REPORT FILE = exportfilepath.csv;
.SET SEPARATOR ",";
.SET TITLEDASHES OFF;
.SET RECORDMODE OFF;
.SET WIDTH 65531;
.SET ERRORLEVEL 3807 SEVERITY 0;
select
trim('"' || trim(cast(col1 as char(256))) || '"') AS col1,
trim('"' || trim(cast(col2 as char(256))) || '"') AS col2,
trim(cast(col3 as INTEGER)) AS col3,
trim(cast(col4 as char(6))) AS col4,
trim(col5) AS col5,
trim(cast(col6 as decimal(18,2))) AS col6,
trim(date) AS date
from tableA;
Col1 and col2 have a lot of white space between them. Any help as to how I can remove those white spaces? What else can I do in this case? I cannot decrease the CHAR size since these are names of variable length.
I have added '"' here because col1 and col2 are names with commas in them. Since the exported .csv file needs to be parsed, the format has to be proper.

REPORT format is for printing, i.e. fixed width plus separator: a column cast to CHAR(256) always occupies 256 characters in the output, which is where the extra spaces come from.
To generate comma-delimited data without adding separators and quoting yourself, better use the CSV table operator like this:
WITH cte AS
( SELECT col1, col2, col3, col4, col5, col6, current_date AS dt
  FROM tableA
)
SELECT str (TITLE '')
FROM TABLE
  (CSV(NEW VARIANT_TYPE(cte.col1, cte.col2, cte.col3
                       ,cte.col4, cte.col5, cte.col6
                       ,cte.dt), ',', '"'
      ) RETURNS (str VARCHAR(32000) CHARACTER SET UNICODE)
  ) AS t1;
Or switch to TPT and its DELIMITED format.
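For completeness, here is a minimal sketch of how the CSV-operator approach might be wired into the asker's ksh script. The logon details, the table name tableA, and the column list are placeholders, and note that TITLE '' suppresses the str column heading, so a header row would have to be selected separately if one is still needed:
bteq <<EOF > /dev/null
.LOGON tdpid/user,password;
.EXPORT REPORT FILE = exportfilepath.csv;
.SET TITLEDASHES OFF;
.SET WIDTH 65531;
WITH cte AS
( SELECT col1, col2, col3 FROM tableA )
SELECT str (TITLE '')
FROM TABLE
  (CSV(NEW VARIANT_TYPE(cte.col1, cte.col2, cte.col3), ',', '"')
   RETURNS (str VARCHAR(32000) CHARACTER SET UNICODE)
  ) AS t1;
.EXPORT RESET;
.LOGOFF;
.QUIT;
EOF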

Related

Unequal length between strings after writing to a file - using same delimiter (tab)

I have a short procedure in PL/SQL which uses the UTL_FILE package to create and then write to a .txt file.
The tab is saved in its own variable (v_delimiter), declared as varchar2(5), with a value of chr(9).
The header is saved as a string, concatenated with v_delimiter, and then written to the file.
After that, the rest of the data from an explicit cursor is also written to the file, line by line.
In the end, when I open the txt file, there are unequal widths between some of the strings which make up the header. There are also unequal widths between some of the data from the cursor, which I guess shouldn't happen, since I am using one and the same delimiter (tab) to create both the header and each row from the cursor.
I am using the UTL_FILE.put_line_nchar procedure to write a Unicode line to the file.
I tried it without declaring the delimiter as a variable, using chr(9) literally when concatenating, and the result is always the same.
I am out of ideas as to why this is happening in the final txt file.
v_file_handle UTL_FILE.file_type ;
v_output_path VARCHAR2 (100) := '/Path/to/File' ;
v_file_header VARCHAR2 (32767) ;
v_delimiter VARCHAR2 (5) := chr(9) ;
v_file_handle := UTL_FILE.fopen_nchar (v_output_path,'string_1' || TO_CHAR (SYSDATE, 'dd_mm_yyyy') || '.txt','w', 32767); -- opening
v_file_header :='claimFileIdentifier'|| v_delimiter || 'claimFileOpenedDate'
|| v_delimiter|| 'claimStatus'|| v_delimiter|| 'claimStatusDate'|| v_delimiter|| 'incidentDateTime'|| v_delimiter|| 'incidentPlace'|| v_delimiter|| 'calculationType' ... -- header
UTL_FILE.put_line_nchar ( v_file_handle, v_file_header ) ; --writing header to a file
FOR rec IN cursor_candidates --iterating over a cursor
LOOP
UTL_FILE.put_line_nchar (
v_file_handle,
rec.claimFileIdentifier
|| v_delimiter
|| rec.claimFileOpenedDate
|| v_delimiter
|| rec.claimStatus
|| v_delimiter
|| rec.claimStatusDate
|| v_delimiter
|| rec.incidentDateTime
|| v_delimiter
|| rec.incidentPlace
|| v_delimiter ... ) ; -- writing cursor rows to a file
END LOOP ;
UTL_FILE.fclose ( v_file_handle );
(Screenshot: the final txt file, showing unequal widths between certain strings.)
That's kind of expected, in my opinion. Values are separated by the TAB character, but that doesn't mean the output will look "nice" when you view it as a text file.
For example, the following values are separated by TAB, but they look ugly:
a b c
Littlefoot Scott Tiger
If you e.g. imported that file into Excel and set TAB as the column separator, every value would be in its own column and the output would look pretty.
If you wanted the text file to look nice as well, you'd have to use a different approach, e.g. LPAD numeric values (IDs, salaries, ...), RPAD textual strings (names, addresses, ...), and possibly SUBSTR (to cut long values short), as sketched below.
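A minimal sketch of that padding approach, reusing a few column names from the question (the claims table name and the widths are invented for illustration):
SELECT RPAD(claimStatus, 15)                       -- text: left-aligned, space-padded
       || LPAD(TO_CHAR(claimFileIdentifier), 12)   -- numbers: right-aligned
       || RPAD(SUBSTR(incidentPlace, 1, 30), 30)   -- long text: cut, then pad
  FROM claims;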
This relates to the functionality of the client application that you use to open the output tab-separated values (TSV) file.
If I have the data:
longtitle longtitle2 longtitle3
a b c
1234567890 123456789010234545 1234567788
and I open it in a basic text editor (such as Notepad), then the columns appear unequal across the rows.
However, if I display the same data in an editor that supports TSV files (such as Notepad++), then it displays the same output with equal widths for the columns across the rows.
Importing the data into a spreadsheet application (such as Excel or OpenOffice) separates the TSV into cells and, again, the information is split into proper columns.
Do not change how you are outputting the file; instead, find a better way of viewing the output file with an application that supports formatting tab-separated values.
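For example, from a shell, a TSV can be pretty-printed without touching the file itself; a hedged one-liner using the common Unix column utility (the file name is a placeholder, and option support varies by platform):
column -t -s $'\t' output.txt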

Create Greenplum external table with multi-character delimiter

A Greenplum external table is loading HDFS data; the data is as follows:
S1000001 ^ # ^ 200001 ^ # ^ 300001
S1000002 ^ # ^ 200002 ^ # ^ 300002
The separator is ^ # ^
When loading via a Greenplum external table, only a single-character delimiter can be used. Is there any way to customize the delimiter? An example would be best, thank you.
I tried to modify the Greenplum source code, changing the following check in the copy.c file; creating the table then succeeds, but the loaded data is wrong.
/* single byte encoding such as ascii, latinx and other */
if (strlen(delim) != 1 && !delim_off)
    ereport(ERROR,
            (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
             errmsg("delimiter must be a single one-byte character, or \'off\'")));
Greenplum doesn't support multi-byte delimiters, but you can use this trick instead. First, pick a character that doesn't exist in your data. In this example I'll use '~', but it can be any character that doesn't appear in your data.
create external table ext_example
(data text)
location ('<storage stuff here>')
format 'text' (delimiter as '~');
Next, use split_part to extract the columns you want.
insert into target_table (col1, col2, col3)
select split_part(data, '^ # ^', 1) as col1,
split_part(data, '^ # ^', 2) as col2,
split_part(data, '^ # ^', 3) as col3
from ext_example;
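One caveat, hedged because it depends on the exact source data: with the sample rows above, the blanks surrounding ^ # ^ remain attached to the extracted values, so wrapping each split_part in trim() may be wanted:
insert into target_table (col1, col2, col3)
select trim(split_part(data, '^ # ^', 1)) as col1,
       trim(split_part(data, '^ # ^', 2)) as col2,
       trim(split_part(data, '^ # ^', 3)) as col3
from ext_example;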

Removing the EOL delimiter when inserting into an external table - Oracle

I have included notrim for the rowdata column in my external table, as suggested by Alex (this is a continuation of an earlier question).
But now the end-of-line character is also appended to the rowdata column; I mean the end of line (CR-LF) is joined to the end of rowdata.
I don't want to use substr() or translate(), since the file size is around 1 GB.
My external table creation process :
'CREATE TABLE ' || rec.ext_table_name || ' (ROW_DATA VARCHAR2(4000)) ORGANIZATION EXTERNAL ' ||
'(TYPE ORACLE_LOADER DEFAULT DIRECTORY ' || rec.dir_name || ' ACCESS ' || 'PARAMETERS (RECORDS ' ||
'DELIMITED by NEWLINE NOBADFILE NODISCARDFILE ' ||
'FIELDS REJECT ROWS WITH ALL NULL FIELDS (ROW_DATA POSITION(1:4000) char)) LOCATION (' || l_quote ||
'temp.txt' || l_quote || ')) REJECT LIMIT UNLIMITED'
Is there any other parameter I can add to remove the end-of-line character? Thanks.
EDIT 1:
My file:
Some first line with spaces at end
Some second line with spaces at end
My ext table:
Some first line with spaces at end <EOL>
Some second line with spaces at end <EOL>
To be clearer, I will explain in Java terms (when I assign the column values to strings, it is something like below).
without notrim:
rowdata[1]="Some first line with spaces at end";
rowdata[2]="Some second line with spaces at end";
with notrim:
rowdata[1]="Some first line with spaces at end \n";
rowdata[2]="Some second line with spaces at end \n";
what I want it to be:
rowdata[1]="Some first line with spaces at end ";
rowdata[2]="Some second line with spaces at end ";
the delimiter is also part of rowdata, since notrim is specified.
EDIT2:
Line endings: CRLF
Platform:
Oracle Database 12c Enterprise Edition Release 12.1.0.1.0 - 64bit Production
PL/SQL Release 12.1.0.1.0 - Production
CORE 12.1.0.1.0 Production
TNS for Solaris: Version 12.1.0.1.0 - Production
NLSRTL Version 12.1.0.1.0 - Production
SELECT DUMP(ROW_DATA,1016) FROM EXT_TABLE WHERE ROWNUM = 1;
Typ=1 Len=616 CharacterSet=AL32UTF8:
41,30,30,30,30,30,30,30,30,30,30,31,30,30,30,30,37,36,36,36,44,30,30,30,30,31,32,35,30,38,31,36,32,35,30,38,31,36,31,33,34,37,30,39,44,42,20,41,30,36,31,30,30,30,30,30,30,30,30,30,30,30,30,32,30,30,4d,59,52,20,32,5a,20,30,31,36,30,30,30,31,32,31,32,33,34,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,52,49,42,46,50,58,30,30,30,31,30,30,30,30,30,30,30,30,31,30,36,32,38,30,31,30,32,30,30,47,20,20,20,20,53,20,20,30,30,30,30,30,30,30,30,30,30,30,20,20,20,20,20,20,20,4e,39,32,37,32,20,20,20,20,20,20,30,30,30,30,30,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,30,30,39,39,38,54,45,53,54,52,52,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,54,45,53,54,4f,50,44,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,54,45,53,54,54,52,41,4e,53,49,44,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,54,45,53,54,52,52,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,54,45,53,54,4f,50,44,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,54,45,53,54,54,52,41,4e,53,49,44,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,d
Len should be 615
Your file's line endings are CRLF (suggesting the file is created on Windows?), but your database is running on Solaris. As the documentation says:
If DELIMITED BY NEWLINE is specified, then the actual value used is platform-specific. On UNIX platforms, NEWLINE is assumed to be "\n". On Windows operating systems, NEWLINE is assumed to be "\r\n".
As your database platform is Unix, it's only using the LF (\n) as the record delimiter, so the CR (\r) is left at the end of each record's data. You can either change the line endings in your file, or change the records delimited by clause to look for the Windows line ending:
...
records delimited by "\r\n" nobadfile ...
If you might get files with either type of line ending and can't control that, you could add a preprocessor step to strip any CRs that do exist. Create an executable script file, either in the same directory as the data file or (as Oracle recommends) in a different Oracle-accessible directory, say called remove_cr, which contains:
#!/bin/sh
/usr/bin/sed -e "s/\r$//" "$1"
you can add a call to that in your external table definition, and keep the newline terminator:
...
records delimited by newline nobadfile nodiscardfile
preprocessor 'remove_cr'
...
Make sure you read the security warnings in the documentation, though.
Demo with a temp.txt file with CRLF line endings:
create table t42_ext (
row_data varchar2(4000)
)
organization external
(
type oracle_loader default directory d42 access parameters
(
records delimited by newline nobadfile nodiscardfile
preprocessor 'remove_cr'
fields reject rows with all null fields
(
row_data position(1:4000) char notrim
)
)
location ('temp.txt')
)
reject limit unlimited;
select '<'|| row_data ||'>' from t42_ext;
'<'||ROW_DATA||'>'
--------------------------------------------------------------------------------
<Line1sometext >
<Line2sometext >
<Line3sometext >

Header formatting while spooling a csv file in sqlplus

I am required to spool a csv from a table in Oracle, using sqlplus. Following is the format required:
"HOST_SITE_TX_ID","SITE_ID","SITETX_TX_ID","SITETX_HELP_ID"
"664436565","16","2195301","0"
"664700792","52","1099970","0"
Following is the relevant piece of the shell script I wrote:
sqlplus -s $sql_user/$sql_password@$sid << eof >> /dev/null
set feedback off
set term off
set linesize 1500
set pagesize 11000
--set colsep ,
--set colsep '","'
set trimspool on
set underline off
set heading on
--set headsep $
set newpage none
spool "$folder$filename$ext"
select '"'||PCL_CARRIER_NAME||'","'||SITETX_EQUIP_ID||'","'||SITETX_SITE_STAT||'","'||SITETX_CREATE_DATE||'","'||ADVTX_VEH_WT||'"'
from cvo_admin.MISSING_HOST_SITE_TX_IDS;
spool off
(I have left some statements commented out, to show the things that I tried but couldn't get to work.)
The output I receive is:
'"'||PCL_CARRIER_NAME||'","'||SITETX_EQUIP_ID||'","'||SITETX_SITE_STAT||'","'||SITETX_CREATE_DATE||'","'||ADVTX_VEH_WT||'"'
"TRANSPORT INC","113","00000000","25-JAN-13 10.17.51 AM",""
"TRANSPORT INC","1905","00000000","25-JAN-13 05.06.44 PM","0"
Which shows that the header is messed up: the whole select-list expression is being printed literally as the column heading, while the data rows themselves come out as intended.
Options I am considering:
1) Using colsep
set colsep '","'
spool
select * from TABLE
spool off
This introduces other problems: the data has leading and trailing spaces, and the first and last values in the file are not enclosed by quotes:
HOST_SITE_TX_ID"," SITE_ID"
" 12345"," 16"
" 12345"," 21
I concluded that this method gives me more heartburn than the one I described earlier.
2) Getting the file and using a regex to modify the header.
3) Leaving the header out altogether and manually adding a header string at the beginning of the file, using a script.
Option 2 is more doable, but I was still interested in asking whether there might be a better way to format the header somehow, so that it comes out in a regular csv (comma-delimited, double-quote-bounded) format.
I am looking to do as little hard coding as possible. The table I am exporting has around 40 columns and I am currently running the script for around 4 million records, breaking them into batches of around 10K each. I would really appreciate any suggestions, even ones totally different from my approach; I am a programmer in learning.
One easy way to have a csv with just one header is to do
set embedded on
set pagesize 0
set colsep '|'
set echo off
set feedback off
set linesize 1000
set trimspool on
set headsep off
embedded is a hidden option, but it is important in order to get JUST one header.
This is how I created a header:
set heading off
/* header */
SELECT '"'||PCL_CARRIER_NAME||'","'||SITETX_EQUIP_ID||'","'||SITETX_SITE_STAT||'","'||SITETX_CREATE_DATE||'","'||ADVTX_VEH_WT||'"'
FROM
(
SELECT 'PCL_CARRIER_NAME' AS PCL_CARRIER_NAME
, 'SITETX_EQUIP_ID' AS SITETX_EQUIP_ID
, 'SITETX_SITE_STAT' AS SITETX_SITE_STAT
, 'SITETX_CREATE_DATE' AS SITETX_CREATE_DATE
, 'ADVTX_VEH_WT' AS ADVTX_VEH_WT
FROM DUAL
)
UNION ALL
SELECT '"'||PCL_CARRIER_NAME||'","'||SITETX_EQUIP_ID||'","'||SITETX_SITE_STAT||'","'||SITETX_CREATE_DATE||'","'||ADVTX_VEH_WT||'"'
FROM
(
/* first row */
SELECT to_char(123) AS PCL_CARRIER_NAME
, to_char(sysdate, 'yyyy-mm-dd') AS SITETX_EQUIP_ID
, 'value3' AS SITETX_SITE_STAT
, 'value4' AS SITETX_CREATE_DATE
, 'value5' AS ADVTX_VEH_WT
FROM DUAL
UNION ALL
/* second row */
SELECT to_char(456) AS PCL_CARRIER_NAME
, to_char(sysdate-1, 'yyyy-mm-dd') AS SITETX_EQUIP_ID
, 'value3' AS SITETX_SITE_STAT
, 'value4' AS SITETX_CREATE_DATE
, 'value5' AS ADVTX_VEH_WT
FROM DUAL
) MISSING_HOST_SITE_TX_IDS;
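For illustration (hand-derived from the literals in the query, not spooled from a real run), the union query above should produce output along these lines, with the two sysdate-based values depending on the day it runs:
"PCL_CARRIER_NAME","SITETX_EQUIP_ID","SITETX_SITE_STAT","SITETX_CREATE_DATE","ADVTX_VEH_WT"
"123","2013-01-25","value3","value4","value5"
"456","2013-01-24","value3","value4","value5"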
This is how you add a pipe-delimited header to SQL statements. Once you spool it out, that "something" won't be there:
-- this creates the header
select 'header_column1|header_column2|header_column3' as something
from dual
union all
-- this is where you run the actual sql statement with pipes in it
select
rev.value1 ||'|'||
rev.value2 ||'|'||
'related_Rel' as something
from
...
In Oracle 19 you can use set markup csv on to ensure that csv output is created.
You can also set the delimiter and optional quote, or even spool html if you prefer; the SQL*Plus documentation has the details.
set markup csv on
spool "$folder$filename$ext"
select q'|wow, I can't believe he said "hello, how are you?", can you believe it!|' as text
from dual;
spool off
quit;

SQL*Loader: Dealing with delimiter characters in data

I am loading some data into Oracle via SQL*Loader. The source file is "pipe delimited":
FIELDS TERMINATED BY '|'
But some records contain the pipe character in the data, not as a separator. That breaks the loading of those records, since SQL*Loader treats the in-data pipe characters as field terminators.
Can you point me in a direction to solve this issue?
The data file is about 9 GB, so it is hard to edit manually.
For example,
Loaded row:
ABC|1234567|STR 9 R 25|98734959,32|28.12.2011
Rejected Row:
DE4|2346543|WE| 454|956584,84|28.11.2011
Error:
Rejected - Error on table HSX, column DATE_N.
ORA-01847: day of month must be between 1 and last day of month
DATE_N column is the last one.
You could use no separator at all, and do something like:
field FILLER,
col1 EXPRESSION "REGEXP_REPLACE(:field,'^([^|]*)\\|([^|]*)\\|(.*)\\|([^|]*)\\|([^|]*)\\|([^|]*)$', '\\1')",
col2 EXPRESSION "REGEXP_REPLACE(:field,'^([^|]*)\\|([^|]*)\\|(.*)\\|([^|]*)\\|([^|]*)\\|([^|]*)$', '\\2')",
col3 EXPRESSION "REGEXP_REPLACE(:field,'^([^|]*)\\|([^|]*)\\|(.*)\\|([^|]*)\\|([^|]*)\\|([^|]*)$', '\\3')",
col4 EXPRESSION "REGEXP_REPLACE(:field,'^([^|]*)\\|([^|]*)\\|(.*)\\|([^|]*)\\|([^|]*)\\|([^|]*)$', '\\4')",
col5 EXPRESSION "REGEXP_REPLACE(:field,'^([^|]*)\\|([^|]*)\\|(.*)\\|([^|]*)\\|([^|]*)\\|([^|]*)$', '\\5')",
col6 EXPRESSION "REGEXP_REPLACE(:field,'^([^|]*)\\|([^|]*)\\|(.*)\\|([^|]*)\\|([^|]*)\\|([^|]*)$', '\\6')"
This regexp takes six capture groups (inside parentheses) separated by a vertical bar (which I had to escape, because otherwise it means OR in a regexp). All groups except the third cannot contain a vertical bar ([^|]*), the third group may contain anything (.*), and the regexp must span from the beginning to the end of the line (^ and $).
This way we are sure that the third group will eat all the superfluous separators. This only works because you have only one field that may contain separators. If you want to double-check, you can for example specify that the fourth group starts with a digit (include \d at the beginning of the fourth parenthesized block).
I have doubled all backslashes because we are inside a double-quoted expression, but I am not really sure that I ought to.
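Since the question's sample rows actually have five logical columns, a five-group variant of the same pattern can be sanity-checked directly in SQL first (no doubled backslashes needed here, as this is not inside a double-quoted control-file expression):
select regexp_replace('DE4|2346543|WE| 454|956584,84|28.11.2011',
                      '^([^|]*)\|([^|]*)\|(.*)\|([^|]*)\|([^|]*)$',
                      '\3') as col3
from dual;
-- returns 'WE| 454': the greedy middle group absorbs the embedded pipe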
It looks to me that it's not really possible for SQL*Loader to handle your file, because of the third field: it can contain the delimiter, is not surrounded by quotes, and is of variable length. If the data you provide is an accurate example, though, I can offer a workaround. First, create a table with one VARCHAR2 column whose length matches the maximum length of any line in your file. Then just load the entire file into this table. From there you can extract each column with a query such as:
with CTE as
(select 'ABC|1234567|STR 9 R 25|98734959,32|28.12.2011' as CTETXT
from dual
union all
select 'DE4|2346543|WE| 454|956584,84|28.11.2011' from dual)
select substr(CTETXT, 1, instr(CTETXT, '|') - 1) as COL1
,substr(CTETXT
,instr(CTETXT, '|', 1, 1) + 1
,instr(CTETXT, '|', 1, 2) - instr(CTETXT, '|', 1, 1) - 1)
as COL2
,substr(CTETXT
,instr(CTETXT, '|', 1, 2) + 1
,instr(CTETXT, '|', -1, 1) - instr(CTETXT, '|', 1, 2) - 1)
as COL3
,substr(CTETXT, instr(CTETXT, '|', -1, 1) + 1) as COL4
from CTE
It's not perfect (though it may be adaptable to SQL*Loader) and would need a bit of work if you have more columns or if your third field is not what I think it is. But it's a start.
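The "load the entire file into one column" step of that workaround might use a control file like this hedged sketch (the staging table staging_raw, its CTETXT column, and the file name are invented to line up with the query above):
LOAD DATA
INFILE 'data.txt'
TRUNCATE
INTO TABLE staging_raw
(CTETXT POSITION(1:4000) CHAR)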
OK, I recommend parsing the file and replacing the delimiter.
From the command line in Unix/Linux you would do:
cat current_file | awk -F'|' '{printf( "%s,%s,", $1, $2); for(k=3;k<NF-2;k++) printf("%s|", $k); printf("%s,%s,%s", $(NF-2),$(NF-1),$NF);print "";}' > new_file
This command will not change your current file; it creates a new, comma-delimited file with five fields.
It splits each input line on "|", outputs the first and second fields, rejoins everything from the third field through the third-from-last with "|" into a single third field, and then outputs the last two fields.
You can then try to sqlldr the new_file with the "," delimiter; a control-file sketch follows.
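A hedged control-file sketch for loading the resulting file (table HSX and column DATE_N come from the question's error message; the other column names are invented). One caveat: the sample's decimal commas, e.g. 956584,84, would clash with ',' as the new delimiter, so in practice pick a replacement character that is absent from the data:
LOAD DATA
INFILE 'new_file'
APPEND
INTO TABLE hsx
FIELDS TERMINATED BY ','
(col1, col2, col3, col4, date_n DATE "DD.MM.YYYY")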
UPDATE:
The command can be put in a script (named parse.awk) like this:
#!/usr/bin/awk -f
# parse.awk
BEGIN {FS="|"}
{
printf("%s,%s,", $1, $2);
for(k=3;k<NF-2;k++)
printf("%s|", $k);
printf("%s,%s,%s\n", $(NF-2),$(NF-1),$NF);
}
and you can run it this way:
cat current_file | awk -f parse.awk > new_file
