JRecord - formatting a file transferred from the mainframe via FTP

I am trying to display a mainframe file in an Eclipse RCP application using the JRecord library. I already have the COBOL copybook as a text file. To accomplish that:

1. I transfer the file from the mainframe to my desktop through the Apache Commons Net FTPClient API.
2. Now I have a text file.
3. I remove the newline and carriage-return characters.
4. I read it via a CobolIoProvider and convert it into an ArrayList of type AbstractLine.

But I have offset issues because of some special characters. Here are the issues:

- When I don't perform step #3, there are offset issues right from record 1; hence I included step #3.
- Even when I perform step #3, the first few thousand records seem to be formatted (or read) by the AbstractLineReader correctly, until it encounters a special character (not sure, but that's my assumption).
Code snippet:
ArrayList<AbstractLine> lines = new ArrayList<AbstractLine>();
InputStream copyStream;
InputStream fis;
try {
    copyStream = new FileInputStream(new File(copybookfile));
    String filec = FileUtils.readFileToString(new File(datafile));
    System.out.println("initial len: " + filec.length());
    filec = filec.replaceAll("\r", "");
    filec = filec.replaceAll("\n", "");
    System.out.println("stripped len: " + filec.length());
    fis = new ByteArrayInputStream(filec.getBytes());

    CobolIoProvider ioProvider = CobolIoProvider.getInstance();
    AbstractLineReader reader = ioProvider
            .newIOBuilder(copyStream, "REQUEST", Convert.FMT_MAINFRAME)
            .newReader(fis);
    AbstractLine line;
    while ((line = reader.read()) != null) {
        lines.add(line);
    }
} catch (FileNotFoundException e) {
    e.printStackTrace();
} catch (IOException e) {
    e.printStackTrace();
}
What am I missing here? Is there additional preprocessing that I need to do for the file transferred from the mainframe?

If it is a text file (no binary data) with \r\n line delimiters, try:
ArrayList<AbstractLine> lines = new ArrayList<AbstractLine>();
InputStream copyStream;
try {
    copyStream = new FileInputStream(new File(copybookfile));
    AbstractLineReader reader = CobolIoProvider.getInstance()
            .newIOBuilder(copyStream, "REQUEST", ICopybookDialects.FMT_MAINFRAME)
            .setFileOrganization(Constants.IO_STANDARD_TEXT_FILE)
            .newReader(datafile);
    AbstractLine line;
    while ((line = reader.read()) != null) {
        lines.add(line);
    }
} catch (FileNotFoundException e) {
    e.printStackTrace();
} catch (IOException e) {
    e.printStackTrace();
}
Note: setFileOrganization tells JRecord what type of file it is, so .setFileOrganization(Constants.IO_STANDARD_TEXT_FILE) tells JRecord it is a text file with \n or \r\n end-of-line markers. Here is a description of FileOrganisation in JRecord.
The special characters worry me, though: if there is a \n in the data, it will be treated as an end-of-line. You may need to do a binary transfer and keep the RDW (Record-Descriptor-Word) if it is a VB file.
If the file contains binary data, you will need to (see the sketch below):

- do a binary transfer (with the RDW if it is a VB file)
- use the appropriate file organisation
- specify EBCDIC (.setFont("cp037") tells JRecord it is US-EBCDIC)
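For example, here is a minimal sketch for that case, assuming a VB file transferred in binary with the RDW kept, and reusing the copyStream / datafile variables from the example above (Constants.IO_VB is JRecord's file organisation for variable-length records with an RDW):

// Sketch only: reading a binary EBCDIC VB file (transferred with the RDW kept).
AbstractLineReader reader = CobolIoProvider.getInstance()
        .newIOBuilder(copyStream, "REQUEST", ICopybookDialects.FMT_MAINFRAME)
        .setFont("cp037")                           // US-EBCDIC
        .setFileOrganization(Constants.IO_VB)       // VB records, RDW kept
        .newReader(datafile);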
I will add a second answer for Generating Code using the RecordEditor
If you are absolutely sure all the records are the same length, you can use the low-level routines to do the reading; see the ReadAqtrans.java program in https://sourceforge.net/p/jrecord/discussion/678634/thread/4b00fed4/.
Basically you would do:
ICobolIOBuilder iobuilder = CobolIoProvider.getInstance()
        .newIOBuilder("copybookFileName", ICopybookDialects.FMT_MAINFRAME)
        .setFont("CP037")
        .setFileOrganization(Constants.IO_FIXED_LENGTH);
LayoutDetail layout = iobuilder.getLayout();

FixedLengthByteReader br
        = new FixedLengthByteReader(layout.getMaximumRecordLength() + 2);
br.open("...");

byte[] bytes;
while ((bytes = br.read()) != null) {
    lines.add(iobuilder.newLine(bytes));
}

Future Reference / Binary File
If the file does contain binary data, you really need to do a binary transfer. You may find the RecordEditor useful.
The RecordEditor 0.98 has a JRecord code generation function. The advantages of using the RecordEditor Generate function are:

- The RecordEditor will try and work out the appropriate file attributes by looking at the file.
- You can try out various attributes (left-hand pane) and see what the file looks like with those attributes (right-hand side).
- When happy, hit the Generate button and the RecordEditor will generate JRecord code.

There are several code templates available:

- Standard - will generate basic JRecord code (with a field-name class)
- lineWrapper - will generate a "wrapper" class with the Cobol fields represented as get/set methods
RecordEditor Generate
In the RecordEditor, select Generate >>> Java~JRecord code for Cobol.
Generate Screen
Enter the Cobol copybook / sample file and adjust the attributes as needed.
Code Template
Next you can select the code template.
Generated Code
Finally, the RecordEditor will generate JRecord code based on the attributes entered.

Related

HL7 message encoding error while parsing the message in map-reduce

I am trying to parse an HL7 message with Hapi in a map-reduce function, and I get an EncodingNotSupportedException when I run the map task. I tried to add \n or \r to the end of each segment, but I am facing the same error. The message is saved in a text file and uploaded to HDFS. Do I need to add something? This is my code:
String v = value.toString();
// Note: StringBufferInputStream is deprecated; it keeps only the low byte of
// each character, so non-Latin text is corrupted before it reaches the parser.
InputStream is = new StringBufferInputStream(v);
is = new BufferedInputStream(is);
Hl7InputStreamMessageStringIterator iter =
        new Hl7InputStreamMessageStringIterator(is);

HapiContext hcontext = new DefaultHapiContext();
Message hapiMsg;
Parser p = hcontext.getGenericParser();
while (iter.hasNext()) {
    String msg = iter.next();
    try {
        hapiMsg = p.parse(msg);
    } catch (EncodingNotSupportedException e) {
        e.printStackTrace();
        return;
    } catch (HL7Exception e) {
        e.printStackTrace();
        return;
    }
}
The sample message:
MSH|^~\&|HIS|RIH|EKG|EKG|20150121002000||ADT^A01||P|2.5.1
EVN||20150121002000|||||CITY GENL HOSP^0133195934^NPI
PID|1||95101100001^^^^PI^CITY GENL HOSP&0133195934&NPI||SNOW^JOHN^^^MR^^L||19560121002000|M||2054-5^White^CDCREC|470 Ocean Ave^^NEW YORK^^11226^USA^C^^29051||^^^^^513^5551212|||||95101100001||||2186-5^White American^CDCREC|||1
PV1||E||E||||||||||1||||||||||||||||||||||||||||||
OBX|1|NM|21612-7^PATIENT AGE REPORTED^LN||60|a^YEAR^UCUM|||||F|||201601131443
OBX|2|NM|21613-7^Urination^LN||2|a^DAY^UCUM|||||F|||19740514201500
DG1|001||4158^Diabetes^I9CDX||19740514201500|A|5478^Non-infectious
DG1|002||2222^Huntington^I9CDX||19610718121500|A|6958^Genetic
Never store HL7 messages as text files, but as binary. Are you sure that the segment delimiters are OK?
Just check your HL7 message after reading it from HDFS, either by printing it to the console or with a debugger, to see whether the message contains only \r as segment delimiter before parsing.
The segment delimiter has to be a \r, i.e. x0d, "carriage return", and not a \n, i.e. x0a, "newline". There are probably some tools, maybe HL7 editors, that accept alternative segment delimiters or write the wrong delimiter, but this is not standard.
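If the delimiters do turn out to be the problem, here is a minimal sketch of a normalization step you could apply before parsing, assuming the file was saved with \n or \r\n line endings (msg and p are the variables from the question's code):

// Sketch only: rewrite every \r\n or bare \n as the \r segment delimiter
// that the HL7 standard (and hence Hapi's parser) expects.
String normalized = msg.replaceAll("\r\n|\n", "\r");
hapiMsg = p.parse(normalized);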

PdfBox: PDF/A-1A to PDF/A-3A

I have the following problem: I want to transform a PDF/A-1A document to a PDF/A-3A.
The original document is validated by Acrobat Reader Pro, so I can assume it is PDF/A-1A conformant.
I try to convert the PDF metadata with the following code:
private PDDocumentCatalog makeA3compliant(PDDocument doc)
        throws IOException, TransformerException {
    PDDocumentCatalog cat = doc.getDocumentCatalog();
    PDMetadata metadata = new PDMetadata(doc);
    cat.setMetadata(metadata);

    XMPMetadata xmp = new XMPMetadata();
    XMPSchemaPDFAId pdfaid = new XMPSchemaPDFAId(xmp);
    xmp.addSchema(pdfaid);

    XMPSchemaDublinCore dc = xmp.addDublinCoreSchema();
    String creator = "TestCr";
    String producer = "testPr";
    dc.addCreator(creator);
    dc.setAbout("");

    XMPSchemaBasic xsb = xmp.addBasicSchema();
    xsb.setAbout("");
    xsb.setCreatorTool(creator);
    xsb.setCreateDate(GregorianCalendar.getInstance());

    PDDocumentInformation pdi = new PDDocumentInformation();
    pdi.setProducer(producer);
    pdi.setAuthor(creator);
    doc.setDocumentInformation(pdi);

    XMPSchemaPDF pdf = xmp.addPDFSchema();
    pdf.setProducer(producer);
    pdf.setAbout("");

    PDMarkInfo markinfo = new PDMarkInfo();
    markinfo.setMarked(true);
    doc.getDocumentCatalog().setMarkInfo(markinfo);

    pdfaid.setPart(3);
    pdfaid.setConformance("A");
    pdfaid.setAbout("");
    metadata.importXMPMetadata(xmp);
    return cat;
}
If I try to validate the new file with Acrobat again, I get a validation error:
CIDset in subset font is incomplete (font contains glyphs that are not listed)
If I try to validate the file with this online validator (http://www.pdf-tools.com/pdf/validate-pdfa-online.aspx), it is a valid PDF/A-3A.
Am I missing something?
Is nobody able to help?
EDIT: Here is the PDF file
This worked for us to be fully PDF/A-3 compliant regarding the CIDset issue:
private void removeCidSet(PDDocumentCatalog catalog) {
    COSName cidSet = COSName.getPDFName("CIDSet");
    // iterate over all pdf pages
    for (Object object : catalog.getAllPages()) {
        if (object instanceof PDPage) {
            PDPage page = (PDPage) object;
            Map<String, PDFont> fonts = page.getResources().getFonts();
            Iterator<String> iterator = fonts.keySet().iterator();
            // iterate over all fonts
            while (iterator.hasNext()) {
                PDFont pdFont = fonts.get(iterator.next());
                if (pdFont instanceof PDType0Font) {
                    PDType0Font typedFont = (PDType0Font) pdFont;
                    if (typedFont.getDescendantFont() instanceof PDCIDFontType2Font) {
                        PDCIDFontType2Font f =
                                (PDCIDFontType2Font) typedFont.getDescendantFont();
                        PDFontDescriptor fontDescriptor = f.getFontDescriptor();
                        if (fontDescriptor instanceof PDFontDescriptorDictionary) {
                            PDFontDescriptorDictionary fontDict =
                                    (PDFontDescriptorDictionary) fontDescriptor;
                            fontDict.getCOSDictionary().removeItem(cidSet);
                        }
                    }
                }
            }
        }
    }
}
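A hypothetical call site, assuming doc is the already-loaded PDDocument and outputFile is wherever the PDF/A-3 result should go:

// Sketch only: strip the incomplete CIDSet entries, then save the output.
// doc and outputFile are placeholders for your own values.
removeCidSet(doc.getDocumentCatalog());
doc.save(outputFile);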
OK, I think I have an answer to your question from the perspective of the callas and/or Adobe technology (and once more, I'm affiliated with callas and its pdfToolbox technology, which is also used inside Acrobat).
According to my research and the people I consulted, your example PDF document contains a font with a CID character set that is incomplete. Why does pdfToolbox or Acrobat say it's a valid PDF/A-1a file but not a valid PDF/A-3a file? Interesting question:

1. The rules for incomplete CID sets changed between PDF/A-1a and PDF/A-3a; they are stricter in PDF/A-3a.
2. While in PDF/A-1a a CID set always had to be there, in PDF/A-3a you can have a valid, compliant file without such a CID set.

So your PDF file contains a CID set (which in itself is valid for both PDF/A-1a and A-3a), but while that CID set is fine for A-1a, it does not contain all the characters required to be A-3a compliant.
To test at least part of this theory, I processed your file through pdfToolbox with a fixup entitled "Remove CIDset if incomplete". That correction (as the name implies) removes the CID set from the file but doesn't change anything else. After doing so, your file validates as a valid A-3a file.
That leaves the question why the pdftools web site claims this is a valid PDF/A-3a file; according to the people I've spoken to, the result from preflight for this file is correct and there should be an error on this file. So perhaps that's something you need to take up with the pdftools guys (and they possibly with callas, to figure out who's finally right).
Feel free to send me a personal message if you want to discuss this further - more discussion on the tools themselves probably becomes off-topic for this public site.

hadoop-1.0.3 SequenceFile.Writer overwrites instead of appending images into a SequenceFile

I am using Hadoop 1.0.3 (I can't really upgrade right now; that's for later). I have around 100 images in my HDFS and I am trying to combine them into a single SequenceFile (default, no compression etc.).
Here's my code:
FSDataInputStream in = null;
BytesWritable value = new BytesWritable();
Text key = new Text();
Path inpath = new Path(fs.getHomeDirectory(), "/user/hduser/input");
Path seq_path = new Path(fs.getHomeDirectory(), "/user/hduser/output/file.seq");
FileStatus[] files = fs.listStatus(inpath);
SequenceFile.Writer writer = null;

for (FileStatus fileStatus : files) {
    inpath = fileStatus.getPath();
    try {
        in = fs.open(inpath);
        byte buffer[] = new byte[in.available()];
        in.read(buffer);
        writer = SequenceFile.createWriter(fs, conf, seq_path,
                key.getClass(), value.getClass());
        writer.append(new Text(inpath.getName()), new BytesWritable(buffer));
    } catch (Exception e) {
        System.out.println("Exception MESSAGES = " + e.getMessage());
        e.printStackTrace();
    }
}
This just goes through all the files in input/ and appends them one by one.
HOWEVER, it just overwrites my sequence file instead of appending to it; I see only the last image in the SequenceFile.
NOTE: I am not closing the writer before the for loop ends. Can anyone help me with this, please?
I am not sure how I can append the images.
Your main issue is with the following line:
writer = SequenceFile.createWriter(fs, conf, seq_path, key.getClass(), value.getClass());
which is inside the for loop, creating a new writer in each pass. It replaces the previous file at the path seq_path, so only the last image is available.
Pull it out of the loop, and the problem should vanish; see the sketch below.
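A minimal sketch of the corrected loop, reusing the variables from the question (the writer is created once before the loop and closed after every image has been appended):

// Create the writer once; creating it inside the loop replaces the file.
SequenceFile.Writer writer = SequenceFile.createWriter(fs, conf, seq_path,
        key.getClass(), value.getClass());
try {
    for (FileStatus fileStatus : files) {
        Path imgPath = fileStatus.getPath();
        FSDataInputStream in = fs.open(imgPath);
        try {
            // size the buffer from the file status; available() is unreliable
            byte[] buffer = new byte[(int) fileStatus.getLen()];
            in.readFully(buffer);
            writer.append(new Text(imgPath.getName()), new BytesWritable(buffer));
        } finally {
            in.close();
        }
    }
} finally {
    writer.close(); // close once, after all images have been appended
}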

Error Appending to IsolatedStorageFile

I am having some problems with the isolated file store. I am trying to append to a file, but when I use the code below, I get an error about invalid arguments on this line:
IsolatedStorageFileStream("Folder\\barcodeinfo.txt", FileMode.Append, FileMode.OpenOrCreate, myStore)
I think it has something to do with FileMode.Append. I am trying to append to the file rather than create a new one.
// Obtain the virtual store for the application.
IsolatedStorageFile myStore = IsolatedStorageFile.GetUserStoreForApplication();

// Create a new folder and call it "Folder".
myStore.CreateDirectory("Folder");

// Specify the file path and options.
using (var isoFileStream = new IsolatedStorageFileStream("Folder\\barcodeinfo.txt",
        FileMode.Append, FileMode.OpenOrCreate, myStore))
{
    // Write the data.
    using (var isoFileWriter = new StreamWriter(isoFileStream))
    {
        isoFileWriter.WriteLine(textBox1.Text);
        isoFileWriter.WriteLine(textBox2.Text);
        isoFileWriter.WriteLine(textBox3.Text);
    }
}
There is no overload that takes two FileModes. It should be:
IsolatedStorageFileStream("Folder\\barcodeinfo.txt", FileMode.Append,
    FileAccess.Write, myStore)
An important thing to note about FileMode.Append is:
[FileMode.Append] Opens the file if it exists and seeks to the end of the file, or creates a new file. Append can only be used in conjunction with Write. Attempting to seek to a position before the end of the file will throw an IOException, and any attempt to read fails and throws a NotSupportedException.
which is why FileAccess.Write is used.
It looks like you have FileMode.Append, FileMode.OpenOrCreate. That is two FileModes. The first parameter should be a FileMode and the second should be a FileAccess.
That should fix your problem.

Encoding conversion for large file

I am faced with a large (~ 18 GB) file, exported from SQL Server as a Unicode text file, which means its encoding is UTF-16 (little endian). The file is now stored in a computer running Linux, but I have not figured out a way to convert it to UTF-8.
At first I tried using iconv, but the file is too large for that. My next approach was using split and converting the files one by one, but that didn't work either - there were a lot of errors during the conversions.
So, any ideas on how to convert this to UTF-8? Any help will be much appreciated.
Since you're using SQL Server, I assume your platform is Windows. In the simplest case you can write a quick and dirty .NET application which reads the source line by line and writes the converted file as it goes. Something like this:
using System;
using System.IO;
using System.Text;

namespace UTFConv {
    class Program {
        static void Main(string[] args) {
            try {
                Encoding encSrc = Encoding.Unicode;
                Encoding encDst = Encoding.UTF8;
                uint lines = 0;
                using (StreamReader src = new StreamReader(args[0], encSrc)) {
                    using (StreamWriter dest = new StreamWriter(args[1], false, encDst)) {
                        string ln;
                        while ((ln = src.ReadLine()) != null) {
                            lines++;
                            dest.WriteLine(ln);
                        }
                    }
                }
                Console.WriteLine("Converted {0} lines", lines);
            } catch (Exception x) {
                Console.WriteLine("Problem converting the file: {0}", x.Message);
            }
        }
    }
}
Just open Visual Studio, start a new C# Console Application project, paste this code in there, compile, and run it from the command line. The first argument is your source file; the second argument is your destination file. It should work.
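Since the file is already sitting on a Linux machine, the same streaming idea also works there without .NET. Here is a minimal Java sketch under the question's assumptions (UTF-16 little-endian input; source and destination paths passed as command-line arguments):

import java.io.*;
import java.nio.charset.StandardCharsets;

// Sketch only: stream-convert a UTF-16LE file to UTF-8 one buffer at a time,
// so the 18 GB size never has to fit in memory. A leading BOM, if present,
// comes through as U+FEFF and may need to be skipped.
public class Utf16ToUtf8 {
    public static void main(String[] args) throws IOException {
        try (Reader src = new BufferedReader(new InputStreamReader(
                     new FileInputStream(args[0]), StandardCharsets.UTF_16LE));
             Writer dst = new BufferedWriter(new OutputStreamWriter(
                     new FileOutputStream(args[1]), StandardCharsets.UTF_8))) {
            char[] buf = new char[64 * 1024];
            int n;
            while ((n = src.read(buf)) != -1) {
                dst.write(buf, 0, n);
            }
        }
    }
}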
