I'm trying to use Spoon/Kettle to load a plain text file whose fields are separated by ASCII control characters. I can see all the data when I preview the content of the file in Kettle, but no records load when I try to preview rows on the "Content" tab.
According to my research, Kettle should understand my field separator when typed as "$[value]", which in my case is "$[01]". Here's a description of the file structure:
Each file in the feed is in plain text format, separated into columns and rows. Each record has the same set of fields. The following are the delimiters for each field and record:
Field Separator (FS): SOH (ASCII character 1)
Record Separator (RS): STX (ASCII character 2) + "\n"
Any record starting with a "#" and ending with the RS should be treated as a comment by the ingester and ignored. The data provider has also generated a column header line at the beginning of the file, listing field data types.
So my input parameters are:
Filetype: Fixed
Separator: $[01]
Enclosure:
Escape:
...
Format: DOS
Encoding: US-ASCII
Length: Characters
I'm unable to read any records, and I'm not sure this is the correct approach. Would ingesting this data with Java inside of Kettle be a better method?
Any help with this would be much appreciated. Thanks!
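For what it's worth, if the Java route ends up being necessary (for example via a User Defined Java Class step), the parsing logic itself is simple. Here is a minimal standalone sketch, assuming the separators described above; the class name and output handling are illustrative, not Kettle's API:

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class SohFeedReader {
    public static void main(String[] args) throws IOException {
        String content = new String(
                Files.readAllBytes(Paths.get(args[0])), StandardCharsets.US_ASCII);
        // Records end with STX (0x02) plus a newline; the feed is in DOS
        // format, so allow an optional carriage return as well.
        for (String record : content.split("\u0002\r?\n")) {
            if (record.isEmpty() || record.startsWith("#")) {
                continue; // comment (or trailing empty) record: skip it
            }
            // Fields are separated by SOH (0x01); -1 keeps empty fields.
            String[] fields = record.split("\u0001", -1);
            System.out.println(String.join(" | ", fields));
        }
    }
}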
Related
Are there any reserved text characters in SQL*Loader?
Are there any special characters, like &, _, or ", which cannot be loaded into Oracle table columns?
My file's column separator is a pipe (|) character, and I will escape it so it can also appear in my text columns, but are there any other reserved characters which I cannot use in the data fields to be interfaced?
There are none, as far as I can tell.
However, I'd suggest you choose delimiters wisely, because if the text you're loading contains the delimiter, you'll have problems figuring out whether e.g. a pipe sign is a delimiter or part of the text to be loaded.
If you can prepare the input data so that values are optionally enclosed in double quotes, you'll be able to avoid such problems. However, why make it complicated if it can be simple?
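For example, a control file along these lines (a sketch; the file, table, and column names are placeholders) lets the loader treat pipes inside quoted values as data:

LOAD DATA
INFILE 'data.txt'
APPEND
INTO TABLE my_table
FIELDS TERMINATED BY '|' OPTIONALLY ENCLOSED BY '"'
(col1, col2, col3)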
I have three kinds of data: comma-, tab-, and pipe-separated. I'm trying to ingest files with comma-, tab-, or pipe-separated data using a single control file.
Is it possible to load these different kinds of delimited data using a single control file?
E.g.:
Test1.csv
content:
firstname,lastname
rachel,green
chandler,bing
Test2.tsv
content:
firstname lastname
rachel green
chandler bing
Test3.psv
content:
firstname|lastname
rachel|green
chandler|bing
My current control file:
test.ctl
load data into table USERNAMES APPEND fields terminated by '\t' (firstname,lastname)
Expecting something like:
load data into table USERNAMES APPEND fields optionally terminated by '\t' or "," or "|" (firstname,lastname)
Nope, unfortunately you can't do it using the same control file.
See the docs.
I believe sqlldr only allows one delimiter, so you could pre-process the file with a script to make all the delimiters the same first (you'll need to know whether those characters could occur in the data, though, which could get ugly), and technically this changes the incoming data.
Alternatively, you could load the pre-processed file into a staging table where some sanity checks could be performed, then load into the main data table if the sanity checks and validation pass.
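A minimal sketch of such a pre-processor, in Java for illustration (the class name and the choice of comma as the common delimiter are assumptions; this is only safe if none of the three delimiter characters occur inside field values):

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class UnifyDelimiters {
    public static void main(String[] args) throws IOException {
        try (BufferedReader in = Files.newBufferedReader(Paths.get(args[0]));
             BufferedWriter out = Files.newBufferedWriter(Paths.get(args[1]))) {
            String line;
            while ((line = in.readLine()) != null) {
                // Rewrite tabs and pipes as commas so one comma-based
                // control file can load all three feeds.
                out.write(line.replace('\t', ',').replace('|', ','));
                out.newLine();
            }
        }
    }
}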
In a mapping we use a delimited flat file having 3 columns, separated by commas. But I have a requirement where one of the columns contains 2 commas within its value. How should I process that column in the mapping?
You should have the information quoted with double quotes, so whatever is within the quotes is skipped when scanning for delimiters. This way you can differentiate between a comma inside a piece of information and a comma acting as a column separator.
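For example (made-up data), the second field below contains two embedded commas but still parses as a single column:

1001,"New York, NY, USA",100.00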
We don't know what you have tried, but you could count the number of commas on each line and split accordingly (if possible).
I need to read a text file using a Spring Batch process, and below is a sample file:
000115989 AB0001 BC00012 030114 010100 WITHDRAWL FROM SAVING 100.00
It doesn't have any column header, and each column has a fixed length and is delimited by two blank spaces.
I can't use a DelimitedLineTokenizer with two blank spaces as the delimiter, because the columns themselves can also have leading or trailing blank spaces.
Is there any workaround so that I can read each column by its specific length and then trim it?
Take a look at the FixedLengthTokenizer (http://docs.spring.io/spring-batch/trunk/apidocs/org/springframework/batch/item/file/transform/FixedLengthTokenizer.html). This allows you to set how lines are parsed by column ranges instead of by delimiter.
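A minimal sketch of how that might look for the sample line above; the field names and Range boundaries are guesses derived from the sample and would need adjusting to the real record layout:

import org.springframework.batch.item.file.transform.FieldSet;
import org.springframework.batch.item.file.transform.FixedLengthTokenizer;
import org.springframework.batch.item.file.transform.Range;

public class FixedLengthExample {
    public static void main(String[] args) {
        FixedLengthTokenizer tokenizer = new FixedLengthTokenizer();
        tokenizer.setNames(new String[] {
                "account", "code1", "code2", "date", "time", "description", "amount"});
        // Ranges are 1-based and inclusive.
        tokenizer.setColumns(new Range[] {
                new Range(1, 9),    // 000115989
                new Range(12, 17),  // AB0001
                new Range(20, 26),  // BC00012
                new Range(29, 34),  // 030114
                new Range(37, 42),  // 010100
                new Range(45, 65),  // WITHDRAWL FROM SAVING
                new Range(68, 73)   // 100.00
        });
        FieldSet fs = tokenizer.tokenize(
                "000115989  AB0001  BC00012  030114  010100  WITHDRAWL FROM SAVING  100.00");
        // readString() returns the value already trimmed of surrounding spaces.
        System.out.println(fs.readString("description"));
    }
}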
I have a CSV file that I would like to parse with Ruby. The file's data is a mess, with commas and newlines inside the fields, but Excel still reads it properly. If the file could be exported from Excel using the ASCII unit and record separators as the delimiters for the columns and rows, I'd be golden.
Anybody know how to specify those characters in Excel? Thanks!
Use Ruby CSV with this option:
:col_sep
The String placed between each field. This String will be transcoded
into the data’s Encoding before parsing.
See more here: http://ruby-doc.org/stdlib-2.0.0/libdoc/csv/rdoc/CSV.html
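A minimal sketch, assuming a file export.txt written with the ASCII unit separator (0x1F) between fields and the record separator (0x1E) between rows; :row_sep covers the record side:

require "csv"

rows = CSV.read("export.txt", col_sep: "\x1f", row_sep: "\x1e")
rows.each { |fields| p fields }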
I ended up having Google Sheets export the file as JSON. Steps I followed here. There were 10,000 records, and the browser tab crashed when it tried to do all of them, so I had to piecemeal it. I'm sure there's a better way to do it.