SQL*Loader control file for reading comma, tab, or pipe separated/delimited data? - oracle

I have three kinds of data, that is comma, tab, or pipe separated, and I am trying to ingest files with comma, tab, or pipe separated data using a single control file.
Is it possible to load different kinds of delimited data using a single control file?
E.g.:
Test1.csv
content:
firstname,lastname
rachel,green
chandler,bing
Test2.tsv
content:
firstname lastname
rachel green
chandler bing
Test3.psv
content:
firstname|lastname
rachel|green
chandler|bing
My current control file:
test.ctl
load data into table USERNAMES APPEND fields terminated by '\t' (firstname,lastname)
Expecting something like:
load data into table USERNAMES APPEND fields optionally terminated by '\t' or "," or "|" (firstname,lastname)

Nope, unfortunately you can't do it using the same control file; see the docs.

I believe sqlldr only allows one delimiter, so you could pre-process the file with a script to make all delimiters the same first (you'll need to know whether those characters can appear in the data, though, which could get ugly), and technically this changes the incoming data.
Alternatively, you could load the file (after pre-processing as above) into a staging table where some sanity checks can be performed, then load it into the main data table if the sanity checks and validation pass.
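For example, a small pre-processing script along these lines could rewrite the tab- and pipe-delimited files as plain CSV before running sqlldr against a single control file with FIELDS TERMINATED BY ',' (a minimal Python sketch; it assumes the delimiters never appear inside the data, and the output file names are made up):

import csv

def normalize_to_csv(in_path, out_path, delimiter):
    # Rewrite a tab- or pipe-delimited file as comma-delimited so one
    # control file (FIELDS TERMINATED BY ',') can load every file.
    # Assumes the original delimiter never appears inside a field.
    with open(in_path, newline='') as src, open(out_path, 'w', newline='') as dst:
        reader = csv.reader(src, delimiter=delimiter)
        writer = csv.writer(dst)  # quotes any field that happens to contain a comma
        for row in reader:
            writer.writerow(row)

normalize_to_csv('Test2.tsv', 'Test2_normalized.csv', '\t')
normalize_to_csv('Test3.psv', 'Test3_normalized.csv', '|')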


data factory special character in column headers

I have a file I am reading into a blob via Data Factory.
It's formatted in Excel. Some of the column headers have special characters and spaces, which isn't good if you want to take it to CSV or Parquet and then SQL.
Is there a way to correct this in the pipeline?
Example
"Activations in last 15 seconds high+Low" "first entry speed (serial T/a)"
Thanks
Normally, Data Flow can handle this for you by adding a Select transformation with a Rule:
Uncheck "Auto mapping".
Click "+ Add mapping"
For the column name, enter "true()" to process all columns.
Enter an appropriate expression to rename the columns. This example uses regular expressions to remove any character that is not a letter.
SPECIAL CASE
There may be an issue with this if the column name contains forward slashes ("/"). I accidentally came across this in my testing:
Every one of the columns not mapped contains forward slashes. Unfortunately, I cannot explain why this would be the case, as Data Flow is clearly aware of the column name. It can be addressed manually by adding a Fixed rule for EACH offending column, which is obviously less than ideal.
ANOTHER OPTION
The other thing you could try is to pre-process the text file with another Data Flow, using a Source dataset that has no delimiters. This would give you the contents of each row as a single column. If you could get a handle on just the first row, you could remove the special characters, as in the sketch below.
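If pre-processing outside of Data Flow is also an option, the same header clean-up could be sketched in plain Python rather than Data Flow syntax (the file names are placeholders, and it assumes the header names themselves contain no commas):

import re

# Strip every character that is not a letter from the column names in the
# first row, leaving the data rows untouched. 'input.csv' / 'clean.csv' are
# placeholder file names.
with open('input.csv') as src:
    lines = src.readlines()

header_cols = lines[0].rstrip('\n').split(',')
clean_cols = [re.sub(r'[^A-Za-z]', '', col) for col in header_cols]

with open('clean.csv', 'w') as dst:
    dst.write(','.join(clean_cols) + '\n')
    dst.writelines(lines[1:])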

Talend tInputFileDelimited component java.lang.NumberFormatException for CSV file

As a beginner to TOS for BD, I am trying to read two CSV files in Talend Open Studio. I have inferred the metadata schema from the same CSV file, set the first row to be the header, and set the delimiter to comma (,).
In my code:
The tMap will read the CSV file, do a lookup on another CSV file, and generate two outputs: passed and rejected records.
But while running the job I am getting the error below.
Couldn't parse value for column 'Product_ID' in 'row1', value is '4569,Laptop,10'. Details: java.lang.NumberFormatException: For input string: "4569,Laptop,10"
I believe it is considering the entire row as one string and taking it as the value for the "Product_ID" column.
I don't know why that is happening when I have set the delimiter and row separator correctly.
Schema
I can see that no rows are coming out of the first tInputFileDelimited due to the above error.
Job Run
Input component
Any idea what else I can check?
Thanks in advance.
In your last screenshot, you can see that the Field separator of your tFileInputDelimited_1 is ";" and not ",".
I believe that you haven't set up your component to use the metadata you created for your csv file.
So you need to configure the component to use the metadata you've created by selecting Repository under Property Type, and selecting the delimited file metadata.
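Just to illustrate the symptom (plain Python, not Talend-generated code): with the wrong separator the whole line ends up in the first field and the numeric conversion fails, while with the right one it parses cleanly.

line = '4569,Laptop,10'

# With the wrong separator the whole line stays in the first field ...
fields_wrong = line.split(';')     # ['4569,Laptop,10']
# ... so converting that "Product_ID" value to a number fails, which is the
# same failure the job reports as java.lang.NumberFormatException.
# int(fields_wrong[0])  -> ValueError

# With the correct separator there are three fields and the ID parses fine.
fields_right = line.split(',')     # ['4569', 'Laptop', '10']
product_id = int(fields_right[0])  # 4569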

Send a Flat file attachment in the workflow in Informatica Developer

In a mapping we use a delimited flat file having 3 columns. The columns are separated by commas. But I have a requirement where, in between, there is a column whose data contains 2 commas. So how should I process that column in the mapping?
You should have the information quoted with "" so that whatever is within the quotes is skipped; this way you can differentiate between a comma that is part of the data and a comma that acts as a column separator.
We don't know what you have tried, but you could count the number of commas in each line and split accordingly (if possible).
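As an illustration of the quoting idea (plain Python with made-up values, not Informatica): a standard CSV parser keeps a quoted field together even when it contains commas.

import csv
import io

# The third column contains two commas but is wrapped in double quotes,
# so a standard CSV parser still sees exactly three columns.
line = 'A001,EMEA,"Smith, John, Jr."\n'
row = next(csv.reader(io.StringIO(line)))
print(row)   # ['A001', 'EMEA', 'Smith, John, Jr.']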

csv parsing issue when data include comma

I am retrieving data from the DB and joining the fields with commas between them to generate a CSV.
But the problem is that one of the fields is Company Name, and the data includes commas, which leads to a malformed CSV file.
Example: Name, Telephone, Email
AAA, 12345, aaa@mail.com
BBB Co,.Ltd, 43466, bbb@gmail.com
For the record BBB, the generated CSV becomes a problem as it includes "," in the data.
How should I generate a correct CSV for records that include ","?
Most developers handle this situation by using a different character instead of a comma. But I would suggest you look into an old post here:
Dealing with commas in a CSV file
Is your question related to Salesforce APEX?
When the CSV was generated there ought to be an option to enclose the fields in Double quotes so that commas can appear inside the field content. For example "Company, Name","1234","etc."
The CSV generator will also "escape" any double quotes inside a field like this "Some field with \"double\" quotes","123","etc"
This all means you need a CSV parser that can handle these situations.
If your question is related to Salesforce APEX then it is quite difficult to build such a CSV parser because of the limitations Salesforce imposes on the number of statements that can run in any given action.
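Outside of APEX, the quoting looks like this when generating the file (a Python sketch using the sample rows from the question; note that Python's csv module doubles embedded quotes rather than backslash-escaping them, both being common conventions, and the parser just has to match the generator):

import csv
import sys

rows = [
    ['Name', 'Telephone', 'Email'],
    ['AAA', '12345', 'aaa@mail.com'],
    ['BBB Co,.Ltd', '43466', 'bbb@gmail.com'],
]

# QUOTE_MINIMAL only quotes fields that need it, so "BBB Co,.Ltd" is written
# as "BBB Co,.Ltd" while the other fields stay unquoted.
writer = csv.writer(sys.stdout, quoting=csv.QUOTE_MINIMAL)
writer.writerows(rows)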

Reading files in PIG where delemeter comes in data

I want to read a CSV file using Pig. What should I do? I used LOAD and PigStorage(',') but it fails to read the CSV file properly, because wherever it encounters a comma (,) in the data it splits on it. How should I specify the delimiter now if I also have commas in the data?
It's generally impossible to distinguish a comma in the data from a comma used as a delimiter.
You will need to escape the commas that are in your data and use a custom load function (for Pig) that can recognize escaped commas.
Take a look here:
http://ofps.oreilly.com/titles/9781449302641/load_and_store_funcs.html
http://pig.apache.org/docs/r0.7.0/udf.html#Load%2FStore+Functions
Have you had a look at the CSVLoader in the PiggyBank, if you want to read a CSV file? (Of course the file format needs to be valid.)
First make sure you have a valid CSV file. If you haven't, try to change the source file through Excel (if the file is small) or another tool and export a new CSV with a delimiter that works for your data (e.g. \t tab, ;, etc.). Even better would be to do another extract with a "good" delimiter.
An example of your LOAD can then be something like this:
TABLE = LOAD 'input.csv' USING PigStorage(';') AS (site_id: int, name: chararray, ...);
Example of your STORE:
STORE TABLE INTO 'clean.csv' USING PigStorage(','); -- the delimiter that suits you best
