Convert EBCDIC data file to ASCII

I have an EBCDIC data file with variable-length records.
The file contains binary (COMP), packed-decimal (COMP-3), display-numeric (PIC 9), and string (PIC X) fields.
How can I convert it to ASCII using a language such as Java, Perl, or COBOL?

First of all, you need to verify that the file is still EBCDIC. Some file transfer programs automatically convert the EBCDIC to ASCII, which corrupts the COMP and COMP-3 fields. You can verify this visually by looking for space characters in the alphanumeric fields. EBCDIC space is x'40'. ASCII space is x'20'.
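If you want to apply that space-byte check programmatically rather than by eye, a small scan of the raw bytes is enough. A minimal Java sketch (the file name data.bin is a placeholder):

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Heuristic check: EBCDIC text is full of x'40' space bytes, ASCII text of x'20'.
public class EbcdicCheck {
    public static void main(String[] args) throws IOException {
        byte[] data = Files.readAllBytes(Paths.get("data.bin"));   // placeholder path
        int ebcdicSpaces = 0, asciiSpaces = 0;
        for (byte b : data) {
            if (b == 0x40) ebcdicSpaces++;       // EBCDIC space
            else if (b == 0x20) asciiSpaces++;   // ASCII space
        }
        System.out.printf("x'40' bytes: %d, x'20' bytes: %d%n", ebcdicSpaces, asciiSpaces);
        System.out.println(ebcdicSpaces > asciiSpaces
                ? "The file still looks like EBCDIC"
                : "The file looks like it was already converted (or is ASCII)");
    }
}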
Assuming that the file is EBCDIC, you have to write three separate conversions: COMP to ASCII, COMP-3 to ASCII, and alphanumeric to alphanumeric. Each field in the record has to be extracted and converted separately.
The alphanumeric to alphanumeric conversion is basically a lookup table for the alphabet and digits. You look up the EBCDIC character and replace it with an ASCII character.
COMP is a big-endian binary (two's complement) format. It's usually 2, 4, or 8 bytes, depending on the PIC clause.
COMP-3 is a packed-decimal format: two digits per byte, with the sign in the low-order nibble of the last byte. For example, 124 looks like x'124F'. In that sign nibble, x'C' is positive, x'D' is negative, and x'F' is unsigned (treated as positive).
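To make the COMP-3 layout concrete, here is a minimal unpacking sketch in Java. The field length and the number of implied decimal places come from the copybook PIC clause; this is illustrative only, not a full-featured converter:

import java.math.BigDecimal;
import java.math.BigInteger;

// COMP-3 (packed decimal): two digit nibbles per byte, except that the low
// nibble of the last byte is the sign (C/F = positive, D = negative).
public final class Comp3 {
    public static BigDecimal unpack(byte[] field, int scale) {
        StringBuilder digits = new StringBuilder();
        for (int i = 0; i < field.length; i++) {
            int hi = (field[i] >> 4) & 0x0F;
            int lo = field[i] & 0x0F;
            digits.append(hi);
            if (i < field.length - 1) {
                digits.append(lo);              // ordinary digit nibble
            } else if (lo == 0x0D) {
                digits.insert(0, '-');          // sign nibble: D = negative
            }                                   // C or F = positive/unsigned
        }
        return new BigDecimal(new BigInteger(digits.toString()), scale);
    }

    public static void main(String[] args) {
        // x'124F' from the example above, no implied decimal places
        System.out.println(unpack(new byte[]{0x12, 0x4F}, 0));   // prints 124
    }
}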

You need to treat each field by itself. Do not run the COMP and COMP-3 fields through the character translation; convert the character fields with the method provided by your language of choice.
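In Java, for example, the "method provided by the language" for the character fields is simply decoding the field's bytes with an EBCDIC charset. Cp037 (US/Canada EBCDIC) is an assumption here; use whatever code page your mainframe actually writes (Cp1047, Cp273, ...):

import java.nio.charset.Charset;

// Decode a PIC X field's raw bytes with an EBCDIC code page.
public class PicXField {
    public static void main(String[] args) {
        byte[] ebcdic = { (byte) 0xC8, (byte) 0xC5, (byte) 0xD3, (byte) 0xD3, (byte) 0xD6,
                          0x40,
                          (byte) 0xE6, (byte) 0xD6, (byte) 0xD9, (byte) 0xD3, (byte) 0xC4 };
        String text = new String(ebcdic, Charset.forName("Cp037"));
        System.out.println(text);   // HELLO WORLD
    }
}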

Yes, it is possible (for Java, look at the JRecord project). JRecord can use a Cobol copybook to read a file in Java. It can also use an Xml description.
Warnings
Before doing anything else
Make sure the file is still EBCDIC! If the file has been through an EBCDIC-to-ASCII conversion, the COMP/COMP-3 fields will be corrupted.
You describe the file as being variable length. Does that mean it has variable-length records? If so, make sure the RDW option was used to transfer the file!
If you do not get the file transfer done correctly, you will end up wasting a lot of time and then redoing what you have already done.
Cobol Copybook
I am assuming you can get a Cobol copybook. If so:
Try editing the file using the RecordEditor. There is an outdated answer here; I will try to provide a more up-to-date one.
You can generate skeleton Java/JRecord code in the RecordEditor. See How do you generate Java/JRecord code for a Cobol copybook.
JRecord Project
The JRecord project will read/write Cobol data files using
A Cobol Copybook
A Xml File Description
File Schema defined in Java
There are three sub-projects that can be used to convert simple Cobol files to
CSV, Xml, or Json files. More complicated files need Java/JRecord:
CobolToCsv
CobolToXml
CobolToJson
JRecord CodeGen
JRecord CodeGen will generate sample Java/JRecord code to Read/Write Cobol files (including Mainframe Cobol) from a Cobol Copybook. JRecord CodeGen is used by the RecordEditor to generate Java/JRecord programs.
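As a rough idea of what the generated code looks like, the sketch below follows the IOBuilder style from the JRecord examples. The copybook name, field name, code page, and file organization are placeholders, and method names can vary between JRecord versions, so treat the CodeGen output as the authoritative version:

import net.sf.JRecord.Common.Constants;
import net.sf.JRecord.Details.AbstractLine;
import net.sf.JRecord.IO.AbstractLineReader;
import net.sf.JRecord.JRecordInterface1;

// Read a mainframe variable-length (VB) EBCDIC file using a Cobol copybook.
public class ReadCobolFile {
    public static void main(String[] args) throws Exception {
        AbstractLineReader reader = JRecordInterface1.COBOL
                .newIOBuilder("MyRecord.cbl")             // placeholder copybook name
                .setFont("cp037")                          // mainframe EBCDIC code page (assumption)
                .setFileOrganization(Constants.IO_VB)      // variable-length records with RDW
                .newReader("myfile.bin");                  // placeholder data file name
        AbstractLine line;
        while ((line = reader.read()) != null) {
            System.out.println(line.getFieldValue("MY-FIELD").asString());  // placeholder field
        }
        reader.close();
    }
}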

Related

How to create a file with unknown character encoding

I would like to test some file character encoding detection functionality, where I input files of type UTF-8, windows-1252, ISO-8859-1, etc.
I also want to input files with unknown character encoding so that the user can be alerted.
I haven't found a good way to create files with an unknown or undetectable character encoding.
head -c1024 /dev/random > /tmp/badencoding
This is almost certainly what you want in practice (1kB of random data), but there isn't really a good definition of "undetectable character encoding." This random file is legal 8-bit ASCII. The fact that it certainly is not meant to be 8-bit ASCII is just a heuristic. So all you're going to wind up doing is testing that your algorithm works in ways that your users probably want it to; there is no ultimate "correct" here without reading the mind of the person who created the file.
An empty text file has an undetectable character encoding (except if it has a Unicode BOM).
But basically, you either have to require the user to tell you which character encoding the file they are giving you uses, or tell them which one to use (or both, if you specify a default but allow it to be overridden, which is what many compilers do).
You can then test the contents for validity against the agreed character encoding. This will catch some errors, but note that many character encodings allow any sequence of byte values, so any content is always valid (even if the character encoding is not the one that was used to write the file).
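A hedged Java sketch of that validity test, using a CharsetDecoder configured to report (rather than silently replace) bad input:

import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;

// Checks whether a byte stream is at least *valid* in an agreed encoding.
// As noted above, single-byte encodings such as windows-1252 accept almost
// any byte sequence, so a "valid" result proves very little on its own.
public class EncodingValidator {
    public static boolean isValid(byte[] data, String charsetName) {
        try {
            Charset.forName(charsetName).newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] bytes = { (byte) 0xC3, (byte) 0x28 };         // malformed UTF-8 sequence
        System.out.println(isValid(bytes, "UTF-8"));          // false
        System.out.println(isValid(bytes, "windows-1252"));   // true
    }
}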
You can then test for consistency with expected values, such as some syntax or allowable character or words, to catch more errors (but you wouldn't necessarily be able to say the character encoding didn't match; it could be just the content is incorrect).
To create files with different character encodings, you could write a program or use a 3rd-party program such as iconv or PowerShell.
If you want an unknown character encoding, just generate a random integer map, convert a file, discard the map, and then not even you will know it.
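A sketch of that random-map idea in Java (file names are placeholders; the map lives only in memory and is thrown away when the program exits):

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.SecureRandom;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Applies a throwaway random byte-substitution map to an existing text file,
// producing data in an "encoding" nobody can name afterwards.
public class ScrambleEncoding {
    public static void main(String[] args) throws IOException {
        List<Integer> map = IntStream.range(0, 256).boxed().collect(Collectors.toList());
        Collections.shuffle(map, new SecureRandom());   // the map is never saved

        byte[] in = Files.readAllBytes(Paths.get("input.txt"));
        byte[] out = new byte[in.length];
        for (int i = 0; i < in.length; i++) {
            out[i] = (byte) (int) map.get(in[i] & 0xFF);
        }
        Files.write(Paths.get("unknown-encoding.txt"), out);
    }
}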
Ultimately, text files are too technical for users to deal with. Give them some other option, such as an open document or spreadsheet format (.odt, .docx, .ods, or .xlsx). These are very easy for programs to read.

Informatica v10.2 - Transformation for ASCII to EBCDIC

I need to convert data from ASCII to EBCDIC in an Informatica Transformation. I have attempted to use CONVERT_BASE in an expression using string datatypes for the currency data, but received a non-fatal error.
I've also googled a fair amount and have been unable to find a solution.
Has anyone encountered and been successful in a situation like this?
In Complex Data Exchange, you do not require a transformer to convert ASCII to EBCDIC format.
To change the encoding from ASCII to EBCDIC format, do the following:
Launch ContentMaster Studio
Go to Project > Properties > Encoding
Change the output encoding to EBCDIC-37 and the byte order to BigEndian.
In case you need to transfer a flat file from the mainframe (EBCDIC) to Linux (ASCII) and preserve the packed-decimal (COMP-3) fields, i.e. not unpack them, you can use a combination of PWX for VSAM/Sequential on the mainframe source and PWX for Flat Files on the Linux machine for the target.
Create appropriate datamaps for the source, and for the target.
On the source side, use datamap functions to create a new field for each of the packed fields, as an unpacked value.
In the mapping, bring in the unpacked value ports, not the packed ones, as numerics.
In the datamap for the target, create only the packed fields.
In the mapping, map the (unpacked) numerics to the packed numeric fields.
PWX should handle the conversions for you.
Note that this includes operations on packed fields, so some signs may get converted from F to C.

How to open a (CSV) file in Oracle and save it in UTF-8 format if it is in another format

Can anyone please advise me on the below issue.
I have an Oracle program that takes a .CSV file as input and processes it. We are now facing an issue: when an extended ASCII character appears in the input file, it trims the next letter after that special character.
We are using the file utility functions Utl_File.Fopen_Nchar() to open the file and Utl_File.Get_Line_Nchar() to read the characters in the file. The program is written so that it should handle multiple languages (Unicode characters) in the input file.
In our analysis we found that when the character encoding of the CSV file is UTF-8, it processes the file successfully even when extended ASCII characters as well as Unicode characters are present. But sometimes we receive the file in 1252 (ANSI - Latin I) format, which causes the trimming problem for extended ASCII characters.
So is there any way to handle this issue? Can we open a (CSV) file in Oracle and save it in UTF-8 format if it's in another format?
Please let me know if any more info is needed.
Thanks in anticipation.
The problem is that when you don't know which encoding your CSV file is saved in, it is not possible to determine the right conversion either. You would screw up your CSV file.
What do you mean by "1252 (ANSI - Latin I)"?
Windows-1252 and ISO-8859-1 are not equal, see the difference here: ISO 8859-1 vs. ISO 8859-15 vs. Windows-1252 vs. Unicode
(Sorry for posting the German Wikipedia, however the English version does not show such a nice table)
You could use the fix_latin command-line tool to convert a file from an unknown mixture of ASCII / Latin-1 / CP1252 / UTF-8 into UTF-8:
fix_latin < input.csv > output.csv
The fix_latin utility is a simple Perl script which is shipped with the Encoding::FixLatin module on CPAN.

FTP batch report with chinese characters to excel

We have a requirement to FTP the batch report to an Excel sheet in .csv format. The batch report contains both single-byte and double-byte characters, for example English and Chinese. The data on the mainframe is in Base64 format, and when it is FTP'ed in either binary or ASCII mode, the resulting .csv spreadsheet shows only junk characters. We need a method to FTP the batch report file so that the transferred report is in a readable format.
Request your help in resolving this issue.
I'm not familiar with Chinese character sets, but if you're not restricted to CSV, you might try formatting an XML document for Excel, in which you can specify the fonts as part of the spreadsheet definition.
Assuming that isn't an option, I would think the Base64 data might need to be translated from EBCDIC to ASCII before transmission and then delivered in BINARY. Otherwise you risk having the data translated to something you didn't expect.
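If the report does arrive with the Base64 text still in EBCDIC (i.e. it was pulled down in binary mode), one way to recover it on the workstation is sketched below. The file names, the Cp037 code page for the Base64 text, and the assumption that the decoded payload is already in a charset Excel can open are all guesses; they depend on how the report was produced:

import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Base64;

// Turn an EBCDIC Base64 report into its original payload: decode the raw bytes
// to text with an EBCDIC code page, then Base64-decode that text.
public class DecodeReport {
    public static void main(String[] args) throws Exception {
        byte[] raw = Files.readAllBytes(Paths.get("report.b64"));        // placeholder name
        String base64Text = new String(raw, Charset.forName("Cp037"));   // EBCDIC -> Base64 characters
        byte[] payload = Base64.getMimeDecoder().decode(base64Text);     // line breaks are ignored
        Files.write(Paths.get("report.csv"), payload);                   // payload's own charset must still be known
    }
}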
Another way to see what is really happening is to send the data as ASCII, retrieve it as BINARY, and then compare the before and after results to see which characters were changed en route during transmission. I recall having to do something similar once to resolve different code sets in Europe vs. the U.S.
I'm not sure any of these suggestions would represent a "solution" to your problem, but these would be ideas that I would explore. I would be interested in hearing how you resolve this.

Unexpected Newlines in files uploaded to z/OS

I'm uploading a file, originally ASCII and converted to EBCDIC, from Windows to z/OS. My problem is that when I checked the file after uploading it, I saw a lot of unexpected newlines.
When I examined its hex dump, I discovered that when the mainframe sees an x'15' it translates it into a newline. The file contains packed decimals, so the hex could contain, say, x'001500001C', but when I upload it the mainframe mistakes that x'15' for a newline. Can anyone help me with this problem?
If you are sending a file that is already in EBCDIC, I believe you should put your FTP client (or library, if the upload is done by your code) into binary (IMAGE type) mode instead of ASCII/EBCDIC mode.
It depends on the type of target "file" that you're uploading to.
If you're uploading to a member that has fixed-length records (e.g., FB 80), you'll need to ensure all the lines are padded out with spaces before you transmit it up (in binary mode).
Text mode transfers are not suitable for binary files (and your files are binary if they contain packed decimals - there's no reliable way for FTP to detect real line-end characters).
You'll need to fix your Windows ASCII-to-EBCDIC converter to be able to generate fixed length records.
The only other option is with a REXX script on the mainframe but this would still require being able to tell the difference between a real end-of-line marker and that marker within the binary data.
You could possibly tell the presence of a packed decimal by virtue of the fact that it consists of BCD nibbles, the last (sign) nibble being 0xC, 0xD, or 0xF, but that could also cause false positives or negatives.
My advice: when you convert it from ASCII to EBCDIC, pad out the lines to the desired record length at the same time.
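A minimal sketch of that padding step, assuming the target is FB with LRECL 80 and that your converter hands you each already-converted record as a byte array (names here are illustrative):

import java.util.Arrays;

// Padding step only: after the ASCII-to-EBCDIC converter has produced one
// record's bytes (character fields converted, packed fields left alone), pad it
// with EBCDIC spaces (x'40') to the dataset's LRECL so the whole file can be
// sent as fixed-length records in BINARY mode, with no line terminators at all.
public final class RecordPadder {
    public static byte[] padToLrecl(byte[] record, int lrecl) {
        byte[] padded = Arrays.copyOf(record, lrecl);   // truncates if over length, zero-fills if under
        if (record.length < lrecl) {
            Arrays.fill(padded, record.length, lrecl, (byte) 0x40);   // x'40' = EBCDIC space
        }
        return padded;
    }

    public static void main(String[] args) {
        byte[] rec = { (byte) 0xC1, (byte) 0xC2, 0x00, 0x15, 0x00, 0x00, 0x1C };
        System.out.println(padToLrecl(rec, 80).length);   // 80
    }
}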
The other point I'd like to raise is that if you just want to look at the files on the mainframe (not use them from any code that requires EBCDIC), the ISPF editor includes a few new commands (as of z/OS 1.9 if I remember correctly).
SOURCE ASCII will display the data as ASCII rather than EBCDIC. In addition, the LF command allows you to massage the ASCII stream in an FB member to correctly fix up line endings.

Resources