Sequential file processing in webMethods

Can the webMethods file port read files in a specific sequence? For example, if I have 001.xml, 003.xml and 002.xml in the monitoring folder, can webMethods be configured/customized to read the files in filename order, i.e. 001.xml, 002.xml, 003.xml?

AFAIK, the files are read in no particular order. To process them in some order, you'd probably have to read them, publish an event (containing the file name among other data) or store the file somewhere, and then process the published events / stored files in whatever order you like.
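The ordering step itself is simple once the files (or their names) have been captured somewhere you control. As a rough illustration only (not webMethods functionality), here is a Python sketch that processes a staging directory in filename order; the directory path and process_file helper are placeholders:

import os

staging_dir = "/data/staging"   # placeholder: wherever the file port / your service stored the files

def process_file(path):
    # placeholder for the real processing logic
    print("processing", path)

# sorting by filename gives 001.xml, 002.xml, 003.xml, ...
for name in sorted(os.listdir(staging_dir)):
    if name.endswith(".xml"):
        process_file(os.path.join(staging_dir, name))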

Related

How to wait until a specific file arrives in a folder before NiFi's ListFile processor lists the entire contents of the folder

I need to move several hundred files from a Windows source folder to a destination folder together in one operation. The files are named sequentially (e.g. part-0001.csv, part-002.csv). It is not known what the final file in the sequence will be called, and the files will arrive in the source folder over a number of weeks, so it is not possible to tell when the final one will arrive. The users want to use a trigger file (i.e. the arrival of a specifically named file in the folder, e.g. trigger.txt) to cause the flow to start. My first two thoughts were using a first ListFile processor as the input to a second, or as the input to an ExecuteProcess processor that would call a script to start the second one. However, neither of these processors accepts an input, so I am a bit stumped as to how I might achieve this, or indeed whether it is possible with NiFi. Has anyone encountered this use case, and if so, how did you resolve it?

Get files from Ab-initio server to SFTP server

I need a shell script to pull .dat files from the source server to an SFTP server.
Every time the job runs, the shell script has to check whether the table already exists on the SFTP server and get all the files corresponding to that table with a date greater than the existing file (the comparison has to be based on the date in the filename).
Example: yesterday the job ran and the file "table1_extract_20190101.dat" was extracted. On the source server I now have two files, "table1_extract_20190102.dat" and "table1_extract_20190103.dat". The script then has to get both of those files, and so on for each and every table.
Please suggest how this could be implemented.
Thanks
Use the Ab Initio SFTP To component.
Ideally, add it at the end of the graph that creates the files, so all the handling is in one place. The SFTP To component(s) would run in a new phase after the files are written.
Or, create another Ab Initio graph that looks for filenames based on the filename specification used to generate the original files. One risk is being sure the files have been written completely, which is why it is ideal to do this in the original graph. You would need to schedule this graph to run after the first graph completes; a good way to do that is with a plan. Another way, using Control>Center, is to schedule this job after the previous one completes by adding a job dependency.
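If you do end up scripting the transfer yourself, the date-in-filename comparison described in the question could look roughly like the sketch below. This is only an illustration, not Ab Initio functionality: the host, credentials, directories and the LAST_SENT value are placeholders, it assumes filenames of the form table1_extract_YYYYMMDD.dat, and it uses the Python paramiko library for the SFTP upload.

import os
import re
import paramiko

SOURCE_DIR = "/data/extracts"               # placeholder: where the .dat files are written
REMOTE_DIR = "/inbound"                     # placeholder: target directory on the SFTP server
LAST_SENT = "table1_extract_20190101.dat"   # e.g. read this from a per-table state file

def file_date(name):
    # pull the YYYYMMDD part out of names like table1_extract_20190102.dat
    m = re.search(r"_(\d{8})\.dat$", name)
    return m.group(1) if m else None

newer = [f for f in os.listdir(SOURCE_DIR)
         if f.startswith("table1_extract_")
         and file_date(f) is not None
         and file_date(f) > file_date(LAST_SENT)]

transport = paramiko.Transport(("sftp.example.com", 22))   # placeholder host
transport.connect(username="user", password="secret")      # placeholder credentials
sftp = paramiko.SFTPClient.from_transport(transport)
for name in sorted(newer):
    sftp.put(os.path.join(SOURCE_DIR, name), REMOTE_DIR + "/" + name)
sftp.close()
transport.close()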

Spark saveAsTextFile writes empty file - <directory>_$folder$ to S3

rdd.saveAsTextFile("s3n://bucket-name/path") is creating an empty file named after the folder, i.e. [folder-name]_$folder$.
It seems this empty file is used by the hadoop-aws jar (from org.apache.hadoop) to mimic S3 as a Hadoop filesystem.
However, my application writes thousands of files to S3. Since saveAsTextFile creates a folder (from the given path) to write the data (from the RDD), my application ends up creating thousands of these empty [directory-name]_$folder$ files.
Is there a way to stop rdd.saveAsTextFile from writing these empty files?
Stop using s3n and switch to s3a. It's faster and actually supported. That will make this issue go away, along with the atrocious performance problems reading large Parquet/ORC files.
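For reference, switching is mostly a matter of changing the URI scheme and, if needed, supplying fs.s3a credentials. A minimal PySpark sketch, assuming hadoop-aws and the AWS SDK are on the classpath (bucket name and keys are placeholders; credentials can also come from the environment or an instance profile):

from pyspark import SparkConf, SparkContext

conf = SparkConf().setAppName("s3a-example")
conf.set("spark.hadoop.fs.s3a.access.key", "YOUR_ACCESS_KEY")   # placeholder
conf.set("spark.hadoop.fs.s3a.secret.key", "YOUR_SECRET_KEY")   # placeholder
sc = SparkContext(conf=conf)

rdd = sc.parallelize(["line1", "line2"])
rdd.saveAsTextFile("s3a://bucket-name/path")   # s3a:// instead of s3n://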
Also, if your app is creating thousands of small files in S3, you are creating future performance problems: listing and opening files on S3 is slow. Try to combine the source data into larger, columnar-formatted files and use whatever SELECT mechanism your framework has to read only the bits you want.
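As a rough illustration of that suggestion, one common pattern is to compact the many small files into a handful of Parquet files and have downstream jobs read only the columns they need. The paths and partition count below are placeholders:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("compact-small-files").getOrCreate()

# read the many small text files, then rewrite them as a few larger Parquet files
df = spark.read.text("s3a://bucket-name/path/")
df.coalesce(16).write.mode("overwrite").parquet("s3a://bucket-name/compacted/")

# later consumers read back only the column(s) they need
spark.read.parquet("s3a://bucket-name/compacted/").select("value").show(5)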

HDFS "files" that are directories

Background: we are trying to read different file types (CSV or Parquet) into pyspark, and I have the task of writing a program that will determine the file type.
It appears that Parquet files are always directories; a Parquet file shows up in HDFS as a directory.
We also have some CSV files that are directories, where the file name is the directory name and the directory contains several part files. What processes do this?
Why are some files 'files' and some files 'directories'?
It will depend on what process produced those files. For example, when MapReduce produces output, it always produces a directory and then creates one output file per reducer within that directory. This is done so that each reducer can create its output independently.
Judging from Spark's CSV package, it expects to output to a single file. So perhaps the single-file CSVs are being generated by Spark and the directories by MapReduce.
To be as generic as possible, it may be a good idea to do the following: check whether the path in question is a directory. If not, check its extension. If it is, look at the extension of the files inside the directory. This should work for each of your situations.
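As a sketch of that check in PySpark, using the Hadoop FileSystem API through Spark's JVM gateway (detect_format is just an illustrative helper, not a library function, and the spark._jvm / spark._jsc handles are internal but commonly used):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("detect-format").getOrCreate()
hadoop_conf = spark._jsc.hadoopConfiguration()
Path = spark._jvm.org.apache.hadoop.fs.Path

def detect_format(path_str):
    path = Path(path_str)
    fs = path.getFileSystem(hadoop_conf)
    if fs.getFileStatus(path).isDirectory():
        # directory: decide based on the part files inside it
        for status in fs.listStatus(path):
            name = status.getPath().getName()
            if name.startswith("_"):          # skip _SUCCESS and similar markers
                continue
            if name.endswith(".parquet"):
                return "parquet"
            if name.endswith(".csv"):
                return "csv"
        return "unknown"
    # plain file: decide based on its own extension
    if path_str.endswith(".parquet"):
        return "parquet"
    if path_str.endswith(".csv"):
        return "csv"
    return "unknown"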
Note that some input formats (e.g. MapReduce input formats) will only accept directories as inputs, and some (e.g. Spark's textFile) will only accept files/globs of files. You need to be aware of what is expected from the libraries you are interacting with.
All the data on your hard drive consists of files and folders. The basic difference between the two is that files store data, while folders store files and other folders.
Hadoop execution engines generally create a directory and write multiple part files as output, based on the number of reducers or executors used.
When you name an output file abc.csv, it doesn't mean it is a single file containing the data. It's just the output location, which MapReduce (generally) interprets as a new directory to be created, within which it writes the output files (part files).
In the case of Spark, when you write a file (for example using .saveAsTextFile), it may create only a single file.

HL7 Message Document?

Is there a tool which can take 1000 separate HL7 messages and combine them into a single document for 7edit? I need to run a test, and if I can use one document and choose Send All, it will be better than running it manually for each of these 1000 messages.
Yes, there is a way to combine those messages into a single file. You can do that using any integration engine; I will use Mirth in this case.
Follow these steps in sequential order:
Download Mirth Connect using the .exe installer, in case you don't have it already.
Set up your account and do the initial configuration on your local system.
Create a channel called Appending Channel, and set the source inbound and outbound data types to HL7 v2.x.
Go to the Source tab and set the connector type to File Reader. Give the location of the directory where your messages will reside (D:\x\read in my case). Make sure the directory is shared.
You can set Delete file after read to Yes, which will prune the files after they are read from this location. If you choose No, then specify where you want the files to be moved to.
Set Process Batch files to No.
Go to the Destinations tab, create a destination called Appender and make it a File Writer type.
Give the directory (D:\x\Output in my case) where your final file will be placed. Give the file name as final.txt.
Choose Append for the File Exists setting.
In the Template section, drag Raw Data from the list on the right-hand side, or simply type ${message.rawData}.
Save the channel and deploy it.
Place all your messages in the read folder (mentioned above) and wait for Mirth to poll the folder (the default setting is 1000 ms).
Once that is done, open final.txt to see all the messages appended in the same file.
The downside is that even though this process works 100 percent, the appended messages are not separated by any delimiter. So the result will look like the fragment below:
|2688684|||||||||||||||||||||||||199912271408||||||002376853MSH|^~\&|EPIC|EPICADT|
^ End of first message
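If you want each appended message to start on its own line (as in the example shown in the next answer), a small script outside Mirth can do the concatenation and add the missing separator. This is just a sketch; the folder path is a placeholder and each input file is assumed to contain one message:

import glob

with open("combined.hl7", "w") as out:
    for path in sorted(glob.glob(r"D:\x\read\*.hl7")):   # placeholder folder
        with open(path) as f:
            message = f.read().rstrip("\r\n")
        out.write(message + "\n")   # the newline keeps the next MSH segment on its own line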
You don't need any tool for that. 7edit is able to read multi-message files. You just need to append each message into one single text file like this (two ADT messages):
MSH|^~\&|SystemA|CompanyA|SystemB|CompanyB|20121116122025||ADT^A01|101|T|2.5||||||UNICODE UTF-8
EVN|A01|20130823080958
PID|||1000||Lastname^Firstname
PV1||I
MSH|^~\&|SystemA|CompanyA|SystemB|CompanyB|20121116122026||ADT^A01|102|T|2.5||||||UNICODE UTF-8
EVN|A01|20130823080958
PID|||1000||Lastname^Firstname
PV1||I
Open this file with 7edit and you will see multiple messages.
Now you can send all messages at once by pressing Send and then selecting All Messages.
It is that simple; no other tool is necessary (except perhaps something to do the appending into one file).
You could also try HL7Browser (www.nule.org), a tool that is similar to 7Edit, with fewer features but free.
You should be able to open many single-message HL7 files; HL7Browser will cache them in its viewer and should allow you to save them all to a single file.
Hope this helps,
Davide
If you have multiple HL7 files in one folder and want to combine them into one HL7 file, you can do the following:
Create a batch file in this folder named combine.cmd
Write the following into the batch file:
del combined.hl7
for %%f in (*.hl7) do type "%%f" >> combined.hl
move combined.hl combined.hl7
Run the batch file.
Result: all HL7 files in the folder are combined into a single file called "combined.hl7". (The intermediate combined.hl name keeps the output file from being picked up by the for loop itself.)
