Send a Flat file attachment in the workflow in Informatica Developer

In a mapping we use a comma-delimited flat file with 3 columns. However, one of the columns can itself contain 2 commas within its value. How should I process that column in the mapping?

You should have the information quoted with double quotes, so whatever is within the quotes is skipped when scanning for delimiters. This way you can differentiate between a comma that is part of a piece of information and a comma that acts as a column separator.
We don't know what you have tried, but you could also count the number of commas on each line and split accordingly (if possible).
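For illustration, a quoted record could look like this (hypothetical data; the middle field contains two embedded commas):
id,address,country
1,"12 Main St, Floor 2, Springfield",USA
If your flat file data object in Informatica Developer exposes a text qualifier (quote character) setting, setting it to the double quote should make the parser treat the commas inside the quotes as data rather than as column separators.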

Related

Does a single column CSV file have commas?

When I open my CSV file in Excel it looks like this:
Header
Value1
Value2
Value3
Value4
Value5
I want to know whether this file actually has commas in it. I am aware that if I have multiple columns I will see the commas.
You can easily test that by opening the file in a text editor (e.g. Notepad on Windows). It will show the file as it is in text format, i.e., with commas present (if they are in the file). I would say that if it is a single column, it won't have commas (but rather line breaks between the rows), but if you need to be sure, just open it with a text editor.
https://www.ietf.org/rfc/rfc4180.txt
Given there is only one value in each record, it would not have a comma, per the spec:
Within the header and each record, there may be one or more
fields, separated by commas. Each line should contain the same
number of fields throughout the file. Spaces are considered part
of a field and should not be ignored. The last field in the
record must not be followed by a comma. For example:
aaa,bbb,ccc
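If you prefer a quick command-line check (a sketch assuming a Unix-like shell; data.csv is a hypothetical file name), grep can count the lines that contain a comma:
grep -c ',' data.csv
A result of 0 means there are no commas anywhere in the file.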

SQL*Loader control file for reading comma-, tab-, or pipe-delimited data?

I have three kinds of data, that is, comma-, tab-, or pipe-separated, and I am trying to ingest files with any of these delimiters using a single control file.
Is it possible to load different kinds of delimited data using a single control file?
E.g.:
Test1.csv
content:
firstname,lastname
rachel,green
chandler,bing
Test2.tsv
content:
firstname lastname
rachel green
chandler bing
Test3.psv
content:
firstname|lastname
rachel|green
chandler|bing
My current control file:
test.ctl
load data into table USERNAMES APPEND fields terminated by '\t' (firstname,lastname)
Expecting something like:
load data into table USERNAMES APPEND fields optionally terminated by '\t' or "," or "|" (firstname,lastname)
Nope, unfortunately you can't do it using the same control file (see the SQL*Loader docs).
I believe sqlldr only allows for one delimiter, so you could pre-process the file with a script to make all delimiters the same first. You'll need to know whether those characters could appear in the data, though (it could get ugly), and technically this changes the incoming data.
Alternatively, you could load the pre-processed file into a staging table where some sanity checks can be performed, then load into the main data table once the sanity checks and validation pass.
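As a rough sketch of that pre-processing (assuming a Unix-like shell; file names are taken from the example above), tr can map both tabs and pipes to commas before invoking sqlldr, after which a single control file with fields terminated by ',' works for all three inputs:
tr '\t|' ',,' < Test2.tsv > Test2_normalized.csv
tr '\t|' ',,' < Test3.psv > Test3_normalized.csv
Again, this is only safe if tabs and pipes never occur inside the data itself.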

Field Overflow when column values contain space

I have a simple mapping to read data from a DB table into a comma-delimited CSV file using ODI (11g/12c).
Certain fields have values with words separated by spaces, like "United States of America".
When the data is generated in the CSV, I see that "United" is in one column, "States" is in the next column, and so on. In other words, the data overflows into adjacent columns in spite of the delimiter being set to ",".
IKM: SQL to File Append.
How can we resolve this?

How can I extract any cell value of a column (called, say, "X") from a text file with multiple columns using Bash?

I have a huge file with 100 columns.
I am concerned with one column called 'Location'. I know for a fact that all rows of this column have the same value. I need to get that value through Bash.
Any thoughts on how to go about this?
If the column is always in the same position relative to the other columns (say, the 10th), you could use
cut -d" " -f10
This assumes there is a single space between columns; you can change the delimiter to whatever separates the columns.
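If the position isn't known in advance but the file has a header row, a small awk sketch (data.txt is a hypothetical file name; fields assumed whitespace-separated) can locate the 'Location' column by name and print its value from the first data row:
awk 'NR==1 { for (i=1; i<=NF; i++) if ($i == "Location") c = i; next } { print $c; exit }' data.txt
Since every row carries the same value in that column, printing it once and exiting is enough.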

Talend tInputFileDelimited component java.lang.NumberFormatException for CSV file

As a beginner with TOS for Big Data, I am trying to read two CSV files in Talend Open Studio. I have inferred the metadata schema from the CSV file, set the first row to be the header, and set the delimiter to comma (,).
In my job:
The tMap reads the CSV file, does a lookup against another CSV file, and generates two outputs: passed and rejected records.
But while running the job I am getting the error below.
Couldn't parse value for column 'Product_ID' in 'row1', value is '4569,Laptop,10'. Details: java.lang.NumberFormatException: For input string: "4569,Laptop,10"
I believe it is considering the entire row as one string and taking it as the value for the "Product_ID" column.
I don't know why that is happening when I have set the delimiter and row separator correctly.
Schema (screenshot)
I can see no rows are going out of the first tInputFileDelimited due to the above error.
Job Run (screenshot)
Input component (screenshot)
Any idea what else I can check?
Thanks in advance.
In your last screenshot, you can see that the Field separator of your tFileInputDelimited_1 is ; and not ,.
I believe that you haven't set up your component to use the metadata you created for your CSV file.
So you need to configure the component to use that metadata by selecting Repository under Property Type, and then selecting the delimited file metadata.
