Hoping someone can assist with this query. I am attempting to replicate a CSV output, and everything is fine at the moment aside from one field whose data I am attempting to insert.
One of the fields contains multiple values separated by commas, but they all belong in the same field.
Is there any way in bash to generate this CSV so that the comma-delimited entries in that particular field stay together in a single field, instead of being split into separate fields?
Cheers
wingZero
Details:
Field1, Field2, Field3
Data1, DATA2,DATA22,DATA222, Data3
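For what it's worth, standard CSV handles this by enclosing the multi-valued field in double quotes so the embedded commas are treated as data. A minimal bash sketch, reusing the example values above:

printf '%s\n' 'Field1,Field2,Field3'
printf '%s,"%s",%s\n' 'Data1' 'DATA2,DATA22,DATA222' 'Data3'

which prints:

Field1,Field2,Field3
Data1,"DATA2,DATA22,DATA222",Data3

Any consumer that follows the usual CSV quoting rules will then keep the second field intact.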
Related
I have a CSV file with two columns, for example column A and column B. Column B contains a string value like this: I am, doing good. So when I try to insert this data into a database, only the string "I am" gets inserted. I just want to know what attribute I need to add to the process group so that "I am, doing good" will get inserted into the database.
The attached image shows the attributes in the current process group.
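For reference, the usual fix is to quote the value in the source CSV so that the embedded comma is treated as data rather than a separator. With a placeholder value for column A, the row would need to look something like:

valueA,"I am, doing good"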
I have three kinds of data: comma-, tab-, or pipe-separated. Can I ingest files with comma-, tab-, or pipe-separated data using a single control file?
Is it possible to load different kinds of delimited data using a single control file?
E.g.:
Test1.csv
content:
firstname,lastname
rachel,green
chandler,bing
Test2.tsv
content:
firstname lastname
rachel green
chandler bing
Test3.psv
content:
firstname|lastname
rachel|green
chandler|bing
My current control file:
test.ctl
load data into table USERNAMES APPEND fields terminated by '\t' (firstname,lastname)
Expecting something like:
load data into table USERNAMES APPEND fields optionally terminated by '\t' or "," or "|" (firstname,lastname)
Nope, unfortunately you can't do it using the same control file.
docs
I believe sqlldr only allows one delimiter, so you could pre-process the file with a script to make all the delimiters the same first (you'll need to know whether those characters could appear in the data, though, which could get ugly), and technically this changes the incoming data.
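For example, a pre-processing pass along these lines (a rough sketch using GNU sed, reusing the file names from the question; it assumes tab and pipe never occur inside the field values):

sed -e 's/\t/,/g' Test2.tsv > Test2_normalized.csv
sed -e 's/|/,/g' Test3.psv > Test3_normalized.csv

After that, all three files are comma-separated, and a single control file with fields terminated by ',' can load them.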
Alternatively, you could load the file, after pre-processing as above, into a staging table where some sanity checks could be performed, then load it into the main data table if the sanity checks and validation pass.
As a beginner to TOS for Big Data, I am trying to read two CSV files in Talend Open Studio. I have inferred the metadata schema from the CSV file itself, set the first row to be the header, and set the delimiter to comma (,).
In my code:
The tMap reads the CSV file, does a lookup on another CSV file, and generates two output files: passed and rejected records.
But while running the job I am getting the error below.
Couldn't parse value for column 'Product_ID' in 'row1', value is '4569,Laptop,10'. Details: java.lang.NumberFormatException: For input string: "4569,Laptop,10"
I believe it is treating the entire row as a single string and using it as the value for the "Product_ID" column.
I don't know why that is happening when I have set the delimiter and row separator correctly.
Schema
I can see that no rows are flowing out of the first tFileInputDelimited due to the above error.
Job Run
Input component
Any idea what else I can check?
Thanks in advance.
In your last screenshot, you can see that the Field separator of your tFileInputDelimited_1 is ; and not ,.
I believe that you haven't set up your component to use the metadata you created for your csv file.
So you need to configure the component to use the metadata you've created by selecting Repository under Property Type, and selecting the delimited file metadata.
To count the rows of a CSV file we can use the Get Files Rows Count input step in the ETL tool. How can I find the number of columns of a CSV file?
Just read the first row of the CSV file using Text-File-Input, setting header rows to 0. Usually, the first row contains the field names. If you read the whole row into a single field, you can use Split-Field-To-Rows to get one field name per row, and the number of rows then tells you the number of fields. There are other ways, but this one easily prepares for a subsequent metadata injection - if that's what you have in mind.
No need for metadata injection. In Split-Field-To-Rows, check "Include rownum in output" and give that variable a name. Then apply Sort rows on that variable and use Sample rows; you will then get the number of fields present in the file.
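If a quick check outside the ETL tool is acceptable, the header field count can also be read from a shell (a minimal sketch, assuming a comma delimiter and no embedded commas or line breaks in the header row; file.csv stands in for your actual file):

awk -F',' 'NR==1 { print NF; exit }' file.csv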
I am retrieving data from a DB and merging the fields with a comma between them to generate a CSV.
But the problem is that one of the fields is Company Name, and its data includes commas, which leads to a malformed CSV file.
Example: Name, Telephone, Email
AAA, 12345, aaa@mail.com
BBB Co,.Ltd, 43466, bbb@gmail.com
For the record BBB, the generated CSV becomes a problem, as the data itself includes a comma.
How should I generate correct CSV for such records that include a comma?
Most developers handle this situation by using a different character instead of the comma. But I would suggest you look at an old post here:
Dealing with commas in a CSV file
Is your question related to Salesforce Apex?
When the CSV was generated there ought to be an option to enclose the fields in double quotes so that commas can appear inside the field content. For example "Company, Name","1234","etc."
The CSV generator will also "escape" any double quotes inside a field, conventionally by doubling them, like this: "Some field with ""double"" quotes","123","etc" (some generators use a backslash instead, as in \"double\").
This all means you need a CSV parser that can handle these situations.
If your question is related to Salesforce Apex, then it is quite difficult to build such a CSV parser because of the limitations Salesforce imposes on the number of statements that can run in any given action.
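As an illustration of the quoting convention described above, here is a minimal bash sketch; the csv_field helper is hypothetical, written just for this example:

# wrap a field in double quotes and double any embedded quotes
csv_field() { printf '"%s"' "${1//\"/\"\"}"; }
printf '%s,%s,%s\n' "$(csv_field 'BBB Co,.Ltd')" "$(csv_field '43466')" "$(csv_field 'bbb@gmail.com')"
# prints: "BBB Co,.Ltd","43466","bbb@gmail.com"

A parser that honours quotes will then read the company name back as a single field.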