HL7 message encoding error while parsing the message in map-reduce - Hadoop

I am trying to parse an HL7 message with HAPI in a map-reduce function, and I get an EncodingNotSupportedException when I run the map task.
I tried adding \n or \r to the end of each segment, but I am still facing the same error.
The message is saved in a text file that has been uploaded to HDFS. Do I need to add something? This is my code:
String v = value.toString();
InputStream is = new StringBufferInputStream(v);
is = new BufferedInputStream(is);
Hl7InputStreamMessageStringIterator iter =
        new Hl7InputStreamMessageStringIterator(is);
HapiContext hcontext = new DefaultHapiContext();
Message hapiMsg;
Parser p = hcontext.getGenericParser();
while (iter.hasNext()) {
    String msg = iter.next();
    try {
        hapiMsg = p.parse(msg);
    } catch (EncodingNotSupportedException e) {
        e.printStackTrace();
        return;
    } catch (HL7Exception e) {
        e.printStackTrace();
        return;
    }
}
The sample message:
MSH|^~\&|HIS|RIH|EKG|EKG|20150121002000||ADT^A01||P|2.5.1
EVN||20150121002000|||||CITY GENL HOSP^0133195934^NPI
PID|1||95101100001^^^^PI^CITY GENL HOSP&0133195934&NPI||SNOW^JOHN^^^MR^^L||19560121002000|M||2054-5^White^CDCREC|470 Ocean Ave^^NEW YORK^^11226^USA^C^^29051||^^^^^513^5551212|||||95101100001||||2186-5^White American^CDCREC|||1
PV1||E||E||||||||||1||||||||||||||||||||||||||||||
OBX|1|NM|21612-7^PATIENT AGE REPORTED^LN||60|a^YEAR^UCUM|||||F|||201601131443
OBX|2|NM|21613-7^Urination^LN||2|a^DAY^UCUM|||||F|||19740514201500
DG1|001||4158^Diabetes^I9CDX||19740514201500|A|5478^Non-infectious
DG1|002||2222^Huntington^I9CDX||19610718121500|A|6958^Genetic

Never store HL7 messages as a text file, but as binary. Are you sure that the segment delimiters are okay?
Before parsing, check your HL7 message after reading it from HDFS, either by printing it to the console or by using a debugger, to verify that the message contains only \r as the segment delimiter.
The segment delimiter has to be \r, i.e. 0x0d ("carriage return"), not \n, i.e. 0x0a ("newline"). There are probably some tools, maybe HL7 editors, that accept alternative segment delimiters or write the wrong delimiter, but this is not standard.
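If the file does turn out to use \n (or \r\n) between segments, one option is to normalize the delimiters to \r in the mapper before handing the text to HAPI. This is only a minimal sketch, not part of the original answer; it assumes value is the mapper's Text value holding one complete message and the same HAPI imports as the question's snippet:

// Hedged sketch: normalize segment delimiters to \r before parsing.
String raw = value.toString();
String normalized = raw.replace("\r\n", "\r").replace("\n", "\r");

HapiContext context = new DefaultHapiContext();
Parser parser = context.getGenericParser();
try {
    Message hapiMsg = parser.parse(normalized);
} catch (HL7Exception e) {
    // EncodingNotSupportedException is a subclass of HL7Exception
    e.printStackTrace();
}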

Related

How to match EOF condition using ANTLR 4

I am new to ANTLR and am currently writing a lexer for the Cool language in ANTLR 4.
For more about the Cool language, please refer to http://theory.stanford.edu/~aiken/software/cool/cool-manual.pdf.
One rule of the Cool language that I was trying to implement was detecting EOF inside comments (which may be nested) or string constants and reporting it as an error.
This is the rule that I wrote:
ERROR : '(*' (COMMENT|~['(*'|'*)'])*? (~['*)']) EOF {reportError("EOF in comment");}
      | '"' (~[\n"])* EOF {reportError("EOF in string");};
fragment COMMENT : '(*' (COMMENT|~['(*'|'*)'])*? '*)' ;
Here the fragment COMMENT is a recursive rule that I used.
The function reportError used above reports the error; it is given below:
public void reportError(String errorString) {
    setText(errorString);
    setType(ERROR);
}
But when I run it on the test file given below:
"Test String
It gives the following output:
line 1:0 token recognition error at: '"Test String\n'
#name "helloworld.cl"
Clearly the string with EOF in it was not recognised and ERROR was not detected.
Can someone help me point out where I am going wrong, as EOF (and hence the error rule) is somehow not getting detected by the lexer?
If something is not clear, please do mention it.
'"' (~[\n"])* EOF
Here the ~[\n"]* part will stop at the first \n or " or at the end of the file.
If it stops at a ", the rule does not match because the EOF does not match and that's what we want because the string literal is properly terminated.
If it stops at the end of file, then the subsequent EOF will match and you'll get an ERROR token. So that's also what you want.
But if it stops at a \n, the EOF will not match and you won't get an error token even though you'd want one in this case. And since your input ends with a \n, that's exactly the scenario you're running into here. So in addition to EOF, you should also allow for erroneous string literals to end in \n:
'"' (~[\n"])* ('\n' | EOF)
You don't need a dedicated ERROR rule. You can handle that specific situation with an unfinished string directly in your error listener. Your comment rule shouldn't be a fragment, however, as it has to recognize a lexeme on its own that must be handled (fragment rules are meant to be used only inside other lexer rules).
When the lexer reaches a string but cannot finish it due to the end of the input, you can get the offending input from the current lexer state in your error listener. You can then check that to see what exactly wasn't finished, like I do here for three quoted text types in MySQL:
void LexerErrorListener::syntaxError(Recognizer *recognizer, Token *, size_t line,
                                     size_t charPositionInLine, const std::string &, std::exception_ptr ep) {
  // The passed in string is the ANTLR generated error message which we want to improve here.
  // The token reference is always null in a lexer error.
  std::string message;
  try {
    std::rethrow_exception(ep);
  } catch (LexerNoViableAltException &) {
    Lexer *lexer = dynamic_cast<Lexer *>(recognizer);
    CharStream *input = lexer->getInputStream();
    std::string text = lexer->getErrorDisplay(input->getText(misc::Interval(lexer->tokenStartCharIndex, input->index())));
    if (text.empty())
      text = " "; // Should never happen.

    switch (text[0]) {
      case '/':
        message = "Unfinished multiline comment";
        break;
      case '"':
        message = "Unfinished double quoted string literal";
        break;
      case '\'':
        message = "Unfinished single quoted string literal";
        break;
      case '`':
        message = "Unfinished back tick quoted string literal";
        break;
      default:
        // Hex or bin string?
        if (text.size() > 1 && text[1] == '\'' && (text[0] == 'x' || text[0] == 'b')) {
          message = std::string("Unfinished ") + (text[0] == 'x' ? "hex" : "binary") + " string literal";
          break;
        }
        // Something else the lexer couldn't make sense of (likely there is no rule that accepts this input).
        message = "\"" + text + "\" is no valid input at all";
        break;
    }

    owner->addError(message, 0, lexer->tokenStartCharIndex, line, charPositionInLine,
                    input->index() - lexer->tokenStartCharIndex);
  }
}
This code was taken from the parser module in MySQL Workbench.
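For comparison, here is a rough Java-runtime analog of the same idea, registered on the lexer with removeErrorListeners()/addErrorListener(). It is a sketch written against the public ANTLR 4 Java API, not code from the answer above, so the exact messages and field names (e.g. _tokenStartCharIndex) should be checked against your ANTLR version:

import org.antlr.v4.runtime.*;
import org.antlr.v4.runtime.misc.Interval;

// Hedged sketch: inspect the text consumed before the lexer failed and report
// an unfinished string or comment, similar to the C++ listener above.
public class UnfinishedTokenListener extends BaseErrorListener {
    @Override
    public void syntaxError(Recognizer<?, ?> recognizer, Object offendingSymbol,
                            int line, int charPositionInLine,
                            String msg, RecognitionException e) {
        if (!(recognizer instanceof Lexer) || !(e instanceof LexerNoViableAltException)) {
            return;
        }
        Lexer lexer = (Lexer) recognizer;
        CharStream input = lexer.getInputStream();
        // Text from the start of the failed token up to the current position.
        String text = input.getText(Interval.of(lexer._tokenStartCharIndex, input.index()));
        String message;
        if (text.startsWith("\"")) {
            message = "EOF in string constant";
        } else if (text.startsWith("(*")) {
            message = "EOF in comment";
        } else {
            message = "token recognition error at: " + lexer.getErrorDisplay(text);
        }
        System.err.println("line " + line + ":" + charPositionInLine + " " + message);
    }
}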

MVStore Online Back Up

The information in the MVStore docs on backing up a database is a little vague, and I'm not familiar with all the concepts and terminology, so I wanted to see if the approach I came up with makes sense.
I'm a Clojure programmer, so please forgive my Java here:
// db is an MVStore instance
FileStore fs = db.getFileStore();
FileOutputStream fos = new java.io.FileOutputStream(pathToBackupFile);
FileChannel outChannel = fos.getChannel();
try {
    db.commit();
    db.setReuseSpace(false);
    ByteBuffer bb = fs.readFully(0, fs.size());
    outChannel.write(bb);
}
finally {
    outChannel.close();
    db.setReuseSpace(true);
}
Here's what it looks like in Clojure in case my Java is bad:
(defn backup-db
[db path-to-backup-file]
(let [fs (.getFileStore db)
backup-file (java.io.FileOutputStream. path-to-backup-file)
out-channel (.getChannel backup-file)]
(try
(.commit db)
(.setReuseSpace db false)
(let [file-contents (.readFully fs 0 (.size fs))]
(.write out-channel file-contents))
(finally
(.close out-channel)
(.setReuseSpace db true)))))
My approach seems to work, but I wanted to make sure I'm not missing anything or see if there's a better way. Thanks!
P.S. I used the H2 tag because an MVStore tag doesn't exist and I don't have enough reputation to create it.
The docs currently say:
The persisted data can be backed up at any time, even during write operations (online backup). To do that, automatic disk space reuse needs to be first disabled, so that new data is always appended at the end of the file. Then, the file can be copied. The file handle is available to the application. It is recommended to use the utility class FileChannelInputStream to do this.
The classes FileChannelInputStream and FileChannelOutputStream convert a java.nio.FileChannel into a standard InputStream and OutputStream. There is existing H2 code in BackupCommand.java that shows how to use them. We can improve upon it by using Java 9's input.transferTo(output) to copy the data:
public void backup(MVStore s, File backupFile) throws Exception {
    try {
        s.commit();
        s.setReuseSpace(false);
        try (RandomAccessFile outFile = new java.io.RandomAccessFile(backupFile, "rw");
             FileChannelOutputStream output = new FileChannelOutputStream(outFile.getChannel(), false)) {
            try (FileChannelInputStream input = new FileChannelInputStream(s.getFileStore().getFile(), false)) {
                input.transferTo(output);
            }
        }
    } finally {
        s.setReuseSpace(true);
    }
}
Note that when you create the FileChannelInputStream you have to pass false to tell it not to close the underlying file channel when the stream is closed. If you don't do that, it will close the file that your FileStore is trying to use. That code uses try-with-resources syntax to make sure that the output file is properly closed.
In order to try this, I checked out the MVStore code and then modified TestMVStore to add a testBackup() method, which is similar to the existing testSimple() code:
private void testBackup() throws Exception {
    // write some records like testSimple
    String fileName = getBaseDir() + "/" + getTestName();
    FileUtils.delete(fileName);
    MVStore s = openStore(fileName);
    MVMap<Integer, String> m = s.openMap("data");
    for (int i = 0; i < 3; i++) {
        m.put(i, "hello " + i);
    }

    // create a backup
    String fileNameBackup = getBaseDir() + "/" + getTestName() + ".backup";
    FileUtils.delete(fileNameBackup);
    backup(s, new File(fileNameBackup));

    // this throws if you accidentally close the input channel you get from the store
    s.close();

    // open the backup and verify
    s = openStore(fileNameBackup);
    m = s.openMap("data");
    for (int i = 0; i < 3; i++) {
        assertEquals("hello " + i, m.get(i));
    }
    s.close();
}
With your example, you are reading the whole file into a ByteBuffer, which must fit into memory. The stream transferTo method instead uses an internal buffer that is currently (as of Java 11) 8192 bytes.
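If you are stuck on a JDK older than 9 and cannot use transferTo, the same streaming copy can be done by hand with an explicit buffer. This is only a sketch under that assumption; input and output are the streams opened in the backup method above:

// Hedged sketch: manual streaming copy in place of input.transferTo(output),
// for JVMs that predate Java 9. The buffer size is arbitrary.
byte[] buffer = new byte[64 * 1024];
int n;
while ((n = input.read(buffer)) != -1) {
    output.write(buffer, 0, n);
}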

Collecting output from Docker in a Java application

I'm executing some code on Docker in my Java application, using ProcessBuilder to run the code; however, I'm having trouble retrieving the output from it. BufferedReader is not reading anything from the InputStream returned from the container. Is there a specific way to retrieve output from Docker?
I've never had trouble getting output from bash executions before, so I'm thinking maybe Docker does things differently somehow. Any ideas would be appreciated.
Here's a snippet of the code:
Process dockerCommand;
ProcessBuilder builder = new ProcessBuilder("bash", "-c", "sudo docker images");
builder.redirectErrorStream(true);
builder.redirectOutput(ProcessBuilder.Redirect.INHERIT);
builder.redirectError(ProcessBuilder.Redirect.INHERIT);
dockerCommand = builder.start();
dockerCommand.waitFor();

List<String> result = new ArrayList<>();
try (BufferedReader reader = new BufferedReader(new InputStreamReader(dockerCommand.getInputStream())))
{
    String line = reader.readLine();
    while (line != null) {
        result.add(line);
        line = reader.readLine();
    }
}
catch (IOException exc)
{}
The line
builder.redirectOutput(ProcessBuilder.Redirect.INHERIT);
causes bash to receive the same standard output as the parent process, which is presumably your terminal window. This produces misleading results: you do see the Docker image list, but it is printed straight to the terminal by the child process rather than captured through getInputStream().
If I comment that line out and then iterate over the result list, I can see the output from Docker inside the JVM.
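Put another way, the fix is simply to stop inheriting the child's standard output so it stays available on getInputStream(). Here is a minimal corrected sketch of the question's code (not taken verbatim from the answer), assuming the usual java.io and java.util imports:

// Hedged sketch: same command, but without inheriting stdout, so the output
// stays readable through getInputStream().
static List<String> runDockerImages() throws IOException, InterruptedException {
    ProcessBuilder builder = new ProcessBuilder("bash", "-c", "sudo docker images");
    builder.redirectErrorStream(true);   // merge stderr into stdout
    // no redirectOutput(INHERIT): keep stdout piped back to this process
    Process dockerCommand = builder.start();

    List<String> result = new ArrayList<>();
    try (BufferedReader reader =
             new BufferedReader(new InputStreamReader(dockerCommand.getInputStream()))) {
        String line;
        while ((line = reader.readLine()) != null) {
            result.add(line);
        }
    }
    dockerCommand.waitFor();             // wait after draining the output to avoid blocking
    return result;
}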

JRecord - Formatting file transferred from Mainframe

I am trying to display a mainframe file in an Eclipse RCP application using the JRecord library. I already have the COBOL copybook as a text file.
To accomplish that:
1. I transfer the file from the mainframe to my desktop through the Apache Commons Net FTPClient API.
2. Now I have a text file.
3. I remove the newline and carriage return characters.
4. Then I read it via a CobolIoProvider and convert it into an ArrayList of type AbstractLine.
But I have offset issues because of some special characters.
Here are the issues:
When I don't perform step #3, there are offset issues right from record 1; hence I included step #3.
Even when I perform step #3, the first few thousand records seem to be formatted (or read) by the AbstractLineReader correctly until it encounters a special character (not sure, but that's my assumption).
Code snippet:
ArrayList<AbstractLine> lines = new ArrayList<AbstractLine>();
InputStream copyStream;
InputStream fis;
try {
    copyStream = new FileInputStream(new File(copybookfile));
    String filec = FileUtils.readFileToString(new File(datafile));
    System.out.println("initial len: " + filec.length());
    filec = filec.replaceAll("\r", "");
    filec = filec.replaceAll("\n", "");
    System.out.println("initial len: " + filec.length());
    fis = new ByteArrayInputStream(filec.getBytes());

    CobolIoProvider ioProvider = CobolIoProvider.getInstance();
    AbstractLineReader reader = ioProvider.newIOBuilder(copyStream, "REQUEST",
            Convert.FMT_MAINFRAME).newReader(fis);
    AbstractLine line;
    while ((line = reader.read()) != null) {
        lines.add(line);
    }
} catch (FileNotFoundException e) {
    e.printStackTrace();
} catch (IOException e) {
    e.printStackTrace();
}
What am I missing here? Is there additional preprocessing that I need to do for the file transferred from the mainframe?
If it is a text file (no binary data) with \r\n line delimiters, try:
ArrayList<AbstractLine> lines = new ArrayList<AbstractLine>();
InputStream copyStream;
try {
    copyStream = new FileInputStream(new File(copybookfile));
    AbstractLineReader reader = CobolIoProvider.getInstance()
            .newIOBuilder(copyStream, "REQUEST", ICopybookDialects.FMT_MAINFRAME)
            .setFileOrganization(Constants.IO_STANDARD_TEXT_FILE)
            .newReader(datafile);
    AbstractLine line;
    while ((line = reader.read()) != null) {
        lines.add(line);
    }
} catch (FileNotFoundException e) {
    e.printStackTrace();
} catch (IOException e) {
    e.printStackTrace();
}
Note: setFileOrganization tells JRecord what type of file it is. So .setFileOrganization(Constants.IO_STANDARD_TEXT_FILE) tells JRecord it is a text file with \n or \r\n end-of-line markers. Here is a description of FileOrganisation in JRecord.
The special characters worry me, though; if there is a \n in the data, it will be treated as an end-of-line. You may need to do a binary transfer and keep the RDW (Record Descriptor Word) if it is a VB file.
If the file contains binary data, you will need to (see the sketch after this list):
- do a binary transfer (with RDW if it is a VB file)
- use the appropriate file organisation
- specify EBCDIC (.setFont("cp037") tells JRecord the file is US EBCDIC)
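As an illustration only (this is not code from the answer, and Constants.IO_VB is an assumption that should be checked against your JRecord release), the builder for a binary EBCDIC VB file transferred with its RDWs might look like:

// Hedged sketch: reading a binary EBCDIC VB file (RDWs kept) with JRecord.
// Constants.IO_VB is assumed here; check the FileOrganisation values in your version.
AbstractLineReader reader = CobolIoProvider.getInstance()
        .newIOBuilder(copybookfile, ICopybookDialects.FMT_MAINFRAME)
        .setFont("cp037")                          // US EBCDIC
        .setFileOrganization(Constants.IO_VB)      // VB file with Record Descriptor Words
        .newReader(datafile);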
I will add a second answer for generating code using the RecordEditor.
If you are absolutely sure all the records are the same length, you can use the low-level routines to do the reading; see the ReadAqtrans.java program in https://sourceforge.net/p/jrecord/discussion/678634/thread/4b00fed4/
Basically you would do:
ICobolIOBuilder iobuilder = CobolIoProvider.getInstance()
        .newIOBuilder("copybookFileName", ICopybookDialects.FMT_MAINFRAME)
        .setFont("CP037")
        .setFileOrganization(Constants.IO_FIXED_LENGTH);
LayoutDetail layout = iobuilder.getLayout();

FixedLengthByteReader br
        = new FixedLengthByteReader(layout.getMaximumRecordLength() + 2);
br.open("...");

byte[] bytes;
while ((bytes = br.read()) != null) {
    lines.add(iobuilder.newLine(bytes));
}
Future Reference / Binary File
If the file does contain binary data, you really need to do a binary transfer. You may find the RecordEditor useful.
The RecordEditor 0.98 has a JRecord code generation function. The advantages of using the RecordEditor Generate function are:
- The RecordEditor will try to work out the appropriate file attributes by looking at the file.
- You can try out various attributes (left-hand pane) and see what the file looks like with those attributes (right-hand side).
When you are happy, hit the Generate button and the RecordEditor will generate JRecord code. There are several code templates available:
- Standard - will generate basic JRecord code (with a field-name class)
- lineWrapper - will generate a "wrapper" class with the Cobol fields represented as get/set methods
RecordEditor Generate
In the RecordEditor select Generate >>> Java~JRecord code for Cobol
Generate Screen
Enter the Cobol copybook / sample file and adjust the attributes as needed.
Code Template
Next you can select the code template.
Generated Code
Finally, the RecordEditor will generate JRecord code based on the attributes entered.

hadoop-1.0.3 SequenceFile.Writer overwrites instead of appending images into a SequenceFile

I am using Hadoop 1.0.3 (I can't really upgrade right now; that's for later).
I have around 100 images in HDFS and I am trying to combine them into a single SequenceFile (default settings, no compression, etc.).
Here's my code:
FSDataInputStream in = null;
BytesWritable value = new BytesWritable();
Text key = new Text();

Path inpath = new Path(fs.getHomeDirectory(), "/user/hduser/input");
Path seq_path = new Path(fs.getHomeDirectory(), "/user/hduser/output/file.seq");

FileStatus[] files = fs.listStatus(inpath);
SequenceFile.Writer writer = null;

for (FileStatus fileStatus : files) {
    inpath = fileStatus.getPath();
    try {
        in = fs.open(inpath);
        byte buffer[] = new byte[in.available()];
        in.read(buffer);
        writer = SequenceFile.createWriter(fs, conf, seq_path, key.getClass(), value.getClass());
        writer.append(new Text(inpath.getName()), new BytesWritable(buffer));
    } catch (Exception e) {
        System.out.println("Exception MESSAGES = " + e.getMessage());
        e.printStackTrace();
    }
}
This just goes through all the files in input/ and appends them one by one.
HOWEVER, this just overwrites my sequence file instead of appending to it; I see only the last image in the SequenceFile.
NOTE: I am not closing the writer before the for loop ends. Can anyone help me with this, please?
I am not sure how I can append the images.
Your main issue is with the following line:
writer = SequenceFile.createWriter(fs, conf, seq_path, key.getClass(), value.getClass());
which is inside the for loop, creating a new writer on each pass. It replaces the previous file at the path seq_path, so only the last image is available.
Pull it out of the loop, and the problem should vanish.
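In other words, create the writer once before the loop and close it after the loop. Here is a minimal corrected sketch of the question's code (not verbatim from the answer), meant to sit in the same method context as the original snippet:

// Hedged sketch: create the SequenceFile writer once, append every image, then close it.
SequenceFile.Writer writer = SequenceFile.createWriter(fs, conf, seq_path,
        Text.class, BytesWritable.class);
try {
    for (FileStatus fileStatus : fs.listStatus(inpath)) {
        Path p = fileStatus.getPath();
        FSDataInputStream in = fs.open(p);
        byte[] buffer = new byte[(int) fileStatus.getLen()];
        in.readFully(buffer);                 // read the whole image, not just available()
        in.close();
        writer.append(new Text(p.getName()), new BytesWritable(buffer));
    }
} finally {
    writer.close();                           // flush and finish the SequenceFile
}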
