I have a CSV file that contains Latin characters such as Østfold.
What should my CTL file look like to load it?
This was resolved by specifying "CHARACTERSET WE8ISO8859P1" in the CTL file.
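WE8ISO8859P1 is Oracle's name for ISO-8859-1 (Latin-1). The character set matters because Ø is stored differently in Latin-1 and UTF-8. The Python sketch below only illustrates that difference using the sample value from the question; the helper name looks_like_utf8 is made up for this example.

```python
# Compare how the sample value is stored in two encodings.
text = "Østfold"

latin1_bytes = text.encode("latin-1")  # Ø is the single byte 0xD8
utf8_bytes = text.encode("utf-8")      # Ø is the two bytes 0xC3 0x98

print(latin1_bytes)
print(utf8_bytes)


def looks_like_utf8(raw: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8 (hypothetical helper)."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False
```

If a sample of the file fails this check, the data is probably in a single-byte encoding such as Latin-1, which is the case the CHARACTERSET clause above addresses.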
I need to import a CSV file and then validate its content. One cell contains the following special characters:
“ !#$%&’()*+,-./:;<=>?#[\]^_`{|}”~
When I read the CSV file with the above characters in a cell using CSV.read("csv_filepath"), I get the following:
“ !\#$%&’()*+,-./:;<=>?#[\\]^_`{|}”~
A backslash (\) is added before # and \. How can I read the exact content?
There is nothing like \n in EBCDIC, and no support for a new line.
How should I convert such a file? There is no delimiter in EBCDIC, so while converting the file, how do I know where a new line starts?
Suggestions please.
Actually there is a new-line character (x'15'), but normal z/OS files do not use it. z/OS is built around fixed-width, VB, VSAM and similar file formats.
Options include:
If it is a text file (unlikely), convert the file to ASCII when it is transferred off the mainframe.
Convert the file to text on the mainframe, then convert it to ASCII when transferring it off the mainframe.
Use a commercial package. Syncsort has DMX-h; there is also Datameer.
If you have a COBOL copybook, look at these open-source packages:
https://wiki.cask.co/display/CE/Plugin+for+COBOL+Copybook+Reader+-+Fixed+Length
https://index.pocketcluster.io/tmalaska-copybookinputformat.html
https://github.com/ianbuss/CopybookHadoop
https://sourceforge.net/projects/coboltocsv/
JRecord can be used with a COBOL copybook, plain Java code, or an XML file description.
A new line in EBCDIC is usually formed from the carriage-return character (hex 0D) and line-feed (hex 25). In ASCII carriage-return is also hex 0D but the line-feed character needs to be converted to hex 0A.
Hope this helps.
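For a fixed-width file with no record separators, the record length (normally taken from the copybook) tells you where each line ends. Here is a minimal Python sketch of that idea. It assumes the data is plain text in code page cp037 and that every record has the same length; the function name and the record length of 10 are made up for illustration. Records with packed-decimal or binary fields need a copybook-aware tool such as the ones listed above.

```python
from typing import Iterator


def ebcdic_records_to_lines(raw: bytes, record_length: int,
                            codec: str = "cp037") -> Iterator[str]:
    """Split fixed-width EBCDIC data into decoded text lines.

    Assumes text-only records; cp037 is an assumed code page.
    """
    if record_length <= 0:
        raise ValueError("record_length must be positive")
    if len(raw) % record_length != 0:
        raise ValueError("data size is not a multiple of the record length")
    for offset in range(0, len(raw), record_length):
        record = raw[offset:offset + record_length]
        # Decode EBCDIC to str and drop the trailing padding spaces.
        yield record.decode(codec).rstrip()


# Two 10-byte records, built here by encoding text to EBCDIC for the demo.
sample = "HELLO     WORLD     ".encode("cp037")
for line in ebcdic_records_to_lines(sample, record_length=10):
    print(line)
```

Writing each yielded line followed by "\n" gives an ordinary text file.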
I am trying to read a file that uses a double colon (::) as its delimiter. I am using CSVExcelStorage, but it gives this error:
could not instantiate 'org.apache.pig.piggybank.storage.CSVExcelStorage' with arguments '[::]'
So is there any way to read a file using custom delimiter?
You can use PigStorage with your custom delimiter.
You are probably missing the quotes.
REGISTER /usr/lib/pig/piggybank.jar;
A = LOAD 'Test.csv' USING org.apache.pig.piggybank.storage.CSVExcelStorage('::');
The question: How (where) can I specify the line terminator string of the DAT file when I pass the name of the DAT file on the command line using the "data" parameter instead of in the CTL file? I am using Oracle 11.2 SQL Loader.
The goal: I need to quickly load a huge amount of data from a CSV file into Oracle 11.2 (or above). The field (column) separator is hex 1F (US character = unit separator), the string delimiter is the double quote, and the record (row) separator is hex 1E (RS character = record separator).
The problem: Using the "stream record format" with "str terminator_string" in SQL Loader works, but only when I specify the name of the DAT file with the "infile" directive inside the CTL file. The name of my DAT file varies, so I pass it on the command line as the "data" parameter, and in that case I do not know how (where) to specify the line terminator string of the DAT file.
Remark: The problem is the same as in the unsolved problem in this question.
Admittedly this is more a workaround than a proper solution, but it should work: use a fixed file name in the control file, then copy, rename, or symlink each data file to that fixed name and process it. Alternatively, keep a control file whose INFILE entry is the placeholder "THE_DAT_FILE", run "sed" to replace the placeholder with the required file name, and invoke sqlldr with the resulting control file.
So, something like:
1. Get the data file F1.
2. Copy or symlink F1 to the_file.dat (symlink assuming Unix/Linux/Cygwin).
3. Run sqlldr with a control file whose INFILE refers to "the_file.dat" and carries the STR clause.
4. When complete, delete/unlink the_file.dat.
5. Repeat 1-4 for the next file(s) F2, ... Fn.
E.g.
for DAT_FILE in *.dat
do
  # Use an absolute target so the link created in /tmp is not dangling
  ln -s "$PWD/$DAT_FILE" /tmp/the_file.dat
  sqlldr .....
  rm /tmp/the_file.dat
done
Or
for DAT_FILE in *.dat
do
  # Replace the placeholder with the real data file name
  sed "s/THE_DAT_FILE/$DAT_FILE/" the_ctl_file > "/tmp/ctl_$DAT_FILE.cf"
  sqlldr ..... control="/tmp/ctl_$DAT_FILE.cf"
done
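The placeholder substitution can also be done without sed. Here is a minimal Python sketch of the same idea. The template text is only an assumption about what such a control file might look like for the separators described in the question: the table name my_table and the columns col1 and col2 are hypothetical, and the clauses should be checked against the SQL*Loader documentation for your version.

```python
PLACEHOLDER = "THE_DAT_FILE"

# Illustrative template only; table and column names are hypothetical.
TEMPLATE = """LOAD DATA
INFILE 'THE_DAT_FILE' "STR X'1E'"
INTO TABLE my_table
FIELDS TERMINATED BY X'1F' OPTIONALLY ENCLOSED BY '"'
(col1, col2)
"""


def render_control_file(template_text: str, dat_file: str) -> str:
    """Return the control file text with the placeholder replaced."""
    if PLACEHOLDER not in template_text:
        raise ValueError("template does not contain the placeholder")
    return template_text.replace(PLACEHOLDER, dat_file)


# Print the control file that would be generated for one data file.
print(render_control_file(TEMPLATE, "/data/f1.dat"))
```

The rendered text would then be written to a temporary file and passed to sqlldr, as in the shell loop above.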
I just ran into a similar situation: I need to use the same control file for a set of files, all of which use the Windows EOL sequence as the end of record and have embedded newlines in text fields.
Rather than code a specific control file for each file with its name in the INFILE directive, I coded the name as /dev/null with the STR clause:
INFILE '/dev/null' "STR '\r\n'"
And then on the sqlldr command line I use the DATA option to specify the actual flat file.
My text file is delimited by the pipe character '|'.
I want to export it to an Excel file (xls) using a script on Unix.
Can anyone please help?
My suggestion would be,
Convert the delimiter | to ,
Save the file with csv extension
Open the file in excel.
Note: If the file contents contain a , anywhere other than as the field separator, this approach will not work.
If you want to convert your file to a real .xls file, you will need a spreadsheet library such as Apache POI (a Java library).
If you just want to open it in Excel, you can open the file directly with Excel and set the separator to |.
Or put every field in " " and use , as the separator. A comma inside a quoted field will then not be an issue, but double quotes within the text will be a problem.
To avoid all of this, you can use some other ASCII character as the separator.
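The comma and double-quote problems mentioned above go away if a CSV library does the quoting. Here is a minimal Python sketch using the standard csv module. It assumes the input never uses quoting itself, so every | is a field separator; the function name pipe_to_csv is made up for this example.

```python
import csv
import io
from typing import TextIO


def pipe_to_csv(src: TextIO, dst: TextIO) -> None:
    """Copy pipe-delimited rows from src to dst as properly quoted CSV."""
    # QUOTE_NONE: treat double quotes in the input as ordinary characters.
    reader = csv.reader(src, delimiter="|", quoting=csv.QUOTE_NONE)
    # The writer quotes fields containing commas and doubles embedded quotes.
    writer = csv.writer(dst, quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
    for row in reader:
        writer.writerow(row)


# Demo on in-memory data; a comma and quotes inside fields are handled.
source = io.StringIO('a|b,c|say "hi"\n1|2|3\n')
target = io.StringIO()
pipe_to_csv(source, target)
print(target.getvalue())
```

With real files, open both with newline="" as the csv module documentation recommends, and give the output a .csv extension so Excel opens it directly.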