Avoiding skipped null columns when using terminated and optionally enclosed delimiters - oracle

I regularly upload various tab-delimited text data files with SQL*Loader. The control files always specify TERMINATED BY X'09'. Certain "cells" in those data files may be null, i.e. there is no character between two consecutive tabs. This has always worked like clockwork.
Now, I have run into a specific case where I have to strip a data column of double quotes that may or may not enclose the actual data (a side effect of a text export from Excel).
I tried simply adding OPTIONALLY ENCLOSED BY '"' after the TERMINATED BY clause, and it did work for the file and column in question. However, with this new option, files with null values are no longer decoded correctly: the loader seems to simply skip those values, which causes column shifts and results in either a corrupt load or a load failure.
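For illustration, the FIELDS clause in question looks something like this (table and column names are invented):

-- sketch only: hypothetical table and columns
LOAD DATA
INFILE 'data.txt'
APPEND INTO TABLE target_table
FIELDS TERMINATED BY X'09' OPTIONALLY ENCLOSED BY '"'
(col1, col2, col3)

Note that SQL*Loader also accepts the delimiter clauses per field, e.g. col2 CHAR TERMINATED BY X'09' OPTIONALLY ENCLOSED BY '"', which would restrict the enclosure handling to the one column that needs it.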
For now, my workaround is to drop the new option and run an SQL script that strips the double quotes directly in the database, but that obviously cannot last.
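That interim cleanup amounts to something like this (again with invented names):

-- post-load cleanup sketch: strip enclosing double quotes from one column
UPDATE target_table
   SET col2 = TRIM(BOTH '"' FROM col2)
 WHERE col2 LIKE '"%"';
COMMIT;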
Any ideas?

Related

Matillion S3 load component Issue

I am loading data from an S3 bucket into a landing table via the S3 Load component in the Matillion ETL tool.
I have a record like the one below:
000D3A8B328E|"Rila Borovets" AD||83634A3C|DDFS
The field separator is | and while loading such records I get the following error: "err code: 1214 Delimited value missing end quote".
I want to load the values exactly as they come from the source, without removing the double quotes.
I have millions of records; a few contain double quotes and the rest do not. Only the records with double quotes cause this error.
How do I handle this scenario?
You don't say which Matillion ETL product you're using there (could be Matillion ETL for Snowflake, for Redshift or for Delta Lake on Databricks).
If you're targeting Snowflake, choose File Type CSV; then the only non-default parameter you would need to set in your S3 Load component is to change the Field Delimiter to |.
If you're targeting Redshift, choose Data File Type Delimited, and again the only non-default parameter is setting the Delimiter to |.
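For the Snowflake case, that component configuration corresponds roughly to a COPY statement along the following lines (stage, file, and table names are invented here). Because FIELD_OPTIONALLY_ENCLOSED_BY is left at its default of NONE, embedded double quotes are loaded as literal characters instead of being treated as enclosures:

COPY INTO landing_table
FROM @my_s3_stage/data_file.txt
FILE_FORMAT = (TYPE = CSV FIELD_DELIMITER = '|');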

NiFi ReplaceText Processor inserting Empty strings

I'm trying to convert a fixed-width text file to a pipe-delimited text file. I'm using NiFi's ReplaceText processor to do this. These are my processor configurations:
Replacement Strategy-Regex Replace
Evaluation Mode-Line-by-Line
Line-by-Line Evaluation Mode-All
Search Value- (.{1})(.{4})(.{16})(.{11})(.{14})(.{1})(.{8})(.{16})(.{9})(.{19})(.{5})(.{14})(.{4})(.{33})
Replacement Value- ${'$1':trim()}${literal('|'):unescapeXml()}${'$3':trim()}${literal('|'):unescapeXml()}${'$4':trim()}${literal('|'):unescapeXml()}${'$5':toDecimal()}${literal('|'):unescapeXml()}${'$8':trim()}${literal('|'):unescapeXml()}${'$9':trim():toNumber()}${literal('|'):unescapeXml()}${'$10':trim()}${literal('|'):unescapeXml()}${'$11':toNumber()}${literal('|'):unescapeXml()}${'$12':toDecimal()}${literal('|'):unescapeXml()}${'$13':trim()}${literal('|'):unescapeXml()}${header:substring(63,69)}
I'm splitting each record according to the column lengths provided to me, trimming spaces, and parsing the fields into different types. In this process I observe that some columns in the output file are randomly empty strings, even though the corresponding records in the fixed-width file contain data. I can't figure out why the expression evaluation inserts zero-length strings at random. When I try with a small set of records (about 100) from the original file it works fine. The original file has 12 million records.
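One way to isolate the problem (a debugging sketch assembled only from the pieces above, not a known fix) is to cut the configuration down to a couple of capture groups and reintroduce the rest one at a time until the empty strings reappear, e.g.:

Search Value- (.{1})(.{4})(.*)
Replacement Value- ${'$1':trim()}${literal('|'):unescapeXml()}${'$2':trim()}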

Multi-line insert in Hive

I am trying to insert into a Hive table from files. But it so happens that the last column in the text file has data that spills across multiple lines.
Example data:
col1|col2|col3|this line is spilling into different line
as is this, this is spilling this is spilling this is sp
iliing and so is this
col1|col2|col3|this can be inserted without problem
So the spilled data is treated as a new row instead of wrapping into the last column. I tried using the LINES TERMINATED BY option, but cannot get it to work.
This is a special case of the more general problem of having a newline (end-of-line/record) symbol embedded in a column. Typical CSV file formats put quotation characters around string fields, so detecting embedded newlines in fields is simplified: a newline that appears inside quotes belongs to the field.
You do not have quote characters, but you do have knowledge of the number of fields, so you can detect when a newline would lead to a premature end of the record. Detecting a newline in the last field is harder: you need to notice that subsequent lines do not contain field separators, and assume that those lines are part of the record.
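As an illustration of that heuristic, a small preprocessing script could stitch the spilled lines back together before loading (a sketch only, not Hive functionality; the field count and separator are taken from the example above):

# rejoin_spilled.py - sketch: assumes 4 fields separated by '|' and that
# every real record starts on a line with at least nfields-1 separators
import sys

def records(lines, nfields=4, sep="|"):
    buf = None
    for raw in lines:
        line = raw.rstrip("\n")
        if line.count(sep) >= nfields - 1:  # looks like the start of a new record
            if buf is not None:
                yield buf
            buf = line
        elif buf is not None:               # continuation of the spilled last field
            buf += " " + line
    if buf is not None:
        yield buf

for rec in records(sys.stdin):
    print(rec)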

Oracle PL/SQL package/procedure for processing CSV from CLOB

I am allowing a user to upload a .csv file via OHS/mod_plsql. The file gets saved as a BLOB in the database, which I then convert to a CLOB. I want to take the contents of this CLOB and save them into a table. I already have working code that splits the file on line endings, then splits each resulting string on commas and inserts the records.
What I need is a way to handle the case where a string within the CSV is enclosed in double-quotes and contains a comma. For example:
col1,col2,col3,col4
some,text,more,text
this,text,has,"commas, semicolons, and periods"
My code will know how to process the second line, but not the third. Does anyone have code that is smart enough to treat "commas, semicolons, and periods" as a single token? I could probably hack something together, but I don't trust my regular expression skills enough, and I figure someone else has probably already written something that does this and would be willing to share it.
There's a good CSV parser in the Alexandria PL/SQL Library - CSV_UTIL_PKG.
https://code.google.com/p/plsql-utils/
More info:
http://ora-00001.blogspot.com.au/2010/04/select-from-spreadsheet-or-how-to-parse.html
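If pulling in the whole library is not an option, a quote-aware splitter can also be hand-rolled. Below is a minimal sketch (the function name is invented, and it does not handle doubled "" escapes inside quoted fields):

-- sketch of a quote-aware CSV line splitter; hypothetical name,
-- no support for doubled "" escapes inside quoted fields
CREATE OR REPLACE FUNCTION split_csv_line (p_line IN VARCHAR2)
  RETURN sys.odcivarchar2list
IS
  l_out      sys.odcivarchar2list := sys.odcivarchar2list();
  l_buf      VARCHAR2(4000);
  l_in_quote BOOLEAN := FALSE;
  l_ch       VARCHAR2(1 CHAR);
BEGIN
  FOR i IN 1 .. NVL(LENGTH(p_line), 0) LOOP
    l_ch := SUBSTR(p_line, i, 1);
    IF l_ch = '"' THEN
      l_in_quote := NOT l_in_quote;   -- toggle quoted state, drop the quote itself
    ELSIF l_ch = ',' AND NOT l_in_quote THEN
      l_out.EXTEND;                   -- comma outside quotes ends the field
      l_out(l_out.COUNT) := l_buf;
      l_buf := NULL;
    ELSE
      l_buf := l_buf || l_ch;
    END IF;
  END LOOP;
  l_out.EXTEND;                       -- flush the final field
  l_out(l_out.COUNT) := l_buf;
  RETURN l_out;
END;
/

With that in place, SELECT COLUMN_VALUE FROM TABLE(split_csv_line('this,text,has,"commas, semicolons, and periods"')) returns four rows, with the quoted string coming back as a single token.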

JMeter export graph to CSV exports to multiple columns, write results to file exports to one column

I was annoyed that JMeter writes result data to CSV as one column: when the CSV file is opened in Excel, all values end up in a single column (which requires annoying manual copy/paste work to get to graphs). I then noticed that if I choose Export to CSV on a Listener graph, it actually exports the CSV file with separate columns in Excel, which is great.
Is it possible to have the "Write results to file" write data into separate columns by default as it does with the graph "Export to CSV"? Thanks!
You have at least two options:
Simple Data Writer - the one you are using at the moment.
In the jmeter.properties file (JMETER_HOME\bin\jmeter.properties), uncomment and set jmeter.save.saveservice.default_delimiter=; to use ';' instead of the default ',' as the separator in the CSV files you create with "Write results to file" - this will put the values in separate columns when the file is opened in Excel.
# For use with Comma-separated value (CSV) files or other formats
# where the fields' values are separated by specified delimiters.
jmeter.save.saveservice.default_delimiter=;
Flexible File Writer from the jmeter-plugins pack implements the same functionality and looks to be more customizable.
The idea is the same as above - use ";" to separate the values written to the file:
Write file header: endTimeMillis;responseTime;latency;sentBytes;receivedBytes;isSuccessful;responseCode
Record each sample as: endTimeMillis|;|responseTime|;|latency|;|sentBytes|;|receivedBytes|;|isSuccessful|;|responseCode|\r\n
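A record written with that configuration would then look something like this (the values are made-up examples), which Excel splits into columns on the ';':

1617181920123;532;89;1024;2048;true;200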
Hope this helps.
