What happens if I trigger a transfer in WebSphere MQ FTE but the folder is constantly receiving new files - ibm-mq

I want to know what happens if I program a monitor to trigger a transfer whenever a trigger file is found in directory x and transfer all the .txt files in that folder. What happens if this directory receives other files after the trigger file is created? Are they sent in the same transfer, or will they be sent in another one?
Thanks for your help in advance.

It depends on the timing between when the agent begins processing the transfer request submitted by the monitor and when the extra files are added to the directory that contains the source files to be transferred.
As an example, let's say you monitor directory x to match on the trigger file, "trigger.file". When this file is detected by a poll of the resource monitor, it submits a managed transfer request to the agent that specifies "*.txt", also located in directory x, as the source file specification. In other words, the managed transfer request submitted will transfer any file ending in .txt in directory x (because of the wildcard).
Now, imagine the following timeline of events:
Two .txt files (file1.txt, file2.txt) are added to directory x.
The trigger file (trigger.file) is then created in directory x.
The resource monitor polls and detects the file "trigger.file", which matches the resource monitor's trigger conditions.
The resource monitor then submits a managed transfer request to the agent.
Before the agent processes this request, a new .txt file is added to directory x (file3.txt).
The agent then starts to process the managed transfer request and needs to expand the wildcard source file specification (*.txt) into a concrete list of files. So it lists directory x and picks out the files ending in .txt. At this point there are three such files (file1.txt, file2.txt and file3.txt), and all three are included in the transfer, even though file3.txt was created after the resource monitor triggered on detecting the trigger file.
Once the wildcard has been expanded and the concrete list of files determined, any new .txt file (e.g., file4.txt) will not be transferred until the trigger file is updated or replaced, causing the resource monitor to trigger again.
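For concreteness, here is a rough sketch of how such a monitor might be defined from the command line. The agent, monitor and directory names are invented, and the option names should be verified against the fteCreateTransfer and fteCreateMonitor documentation for your MQ FTE version:

fteCreateTransfer -gt task.xml -sa SRC_AGENT -da DEST_AGENT -dd /dest/dir "/path/to/x/*.txt"
fteCreateMonitor -ma SRC_AGENT -mn TXT_MONITOR -md /path/to/x -mt task.xml -tr "match,trigger.file" -pi 10 -pu seconds

The first command only generates the transfer task XML (it does not start a transfer); the second registers a monitor that polls directory x every 10 seconds and submits that task whenever trigger.file is matched.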
I hope this helps! If you need any further clarification, feel free to ask.

Related

How to transfer files sequentially from one folder to another using apache camel?

I have certain files in one folder:
abc.zip (optional)
def.zip (optional)
ghi.zip (optional)
I want to send them to a destination folder sequentially.
From the destination folder they will be sent to a 3rd-party system.
So, suppose abc.zip is transferred to the destination folder; unless it is picked up by the third-party system, def.zip should not be transferred to the destination folder.
So the destination folder is like a watch folder where I want to check whether the previous file is still present. Only if it is not should the next file be sent.
Is there any way to achieve this using Apache Camel?
In order to accomplish this, it sounds like you'd need to implement an org.apache.camel.component.file.GenericFileProcessStrategy class and set it as the processStrategy on the file component. That way you can check the destination for any files which have an earlier name.
From the docs (emphasis added):
A pluggable org.apache.camel.component.file.GenericFileProcessStrategy
allowing you to implement your own readLock option or similar. Can
also be used when special conditions must be met before a file can be
consumed, such as a special ready file exists. If this option is set
then the readLock option does not apply.
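As an illustration, here is a minimal sketch of such a strategy that only lets a file be consumed when the destination folder is empty (i.e. the third-party system has already picked up the previously delivered file). The destination path is a made-up example, and the method signatures are taken from my reading of the Camel file component, so they should be checked against the Camel version in use:

import java.io.File;
import org.apache.camel.Exchange;
import org.apache.camel.component.file.GenericFile;
import org.apache.camel.component.file.GenericFileEndpoint;
import org.apache.camel.component.file.GenericFileOperations;
import org.apache.camel.component.file.GenericFileProcessStrategy;

public class WaitForEmptyDestinationStrategy implements GenericFileProcessStrategy<File> {

    private final File destination = new File("/path/to/destination"); // made-up path

    @Override
    public void prepareOnStartup(GenericFileOperations<File> operations,
                                 GenericFileEndpoint<File> endpoint) throws Exception {
        // nothing to prepare
    }

    @Override
    public boolean begin(GenericFileOperations<File> operations, GenericFileEndpoint<File> endpoint,
                         Exchange exchange, GenericFile<File> file) throws Exception {
        // Consume the next file only when the destination is empty, i.e. the
        // third-party system has already collected the previous file.
        String[] pending = destination.list();
        return pending == null || pending.length == 0;
    }

    @Override
    public void abort(GenericFileOperations<File> operations, GenericFileEndpoint<File> endpoint,
                      Exchange exchange, GenericFile<File> file) throws Exception { }

    @Override
    public void commit(GenericFileOperations<File> operations, GenericFileEndpoint<File> endpoint,
                       Exchange exchange, GenericFile<File> file) throws Exception { }

    @Override
    public void rollback(GenericFileOperations<File> operations, GenericFileEndpoint<File> endpoint,
                         Exchange exchange, GenericFile<File> file) throws Exception { }
}

The strategy would then be registered in the Camel registry (for example as a bean named waitForEmptyDestination) and referenced on the consumer endpoint, e.g. file:/path/to/source?processStrategy=#waitForEmptyDestination, optionally combined with sortBy=file:name so the files are also picked up in name order.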

Windows Projected File System read only?

I tried to play around with Projected File System to implement a user mode ram drive (previously I had used Dokan). I have two questions:
Is this a read-only projection? I could not find any notification being sent to me when opening the file from, say, Notepad and writing to it.
Is the file actually created on the disk once I use PrjWriteFileData()? From what I have understood, yes.
In that case, what useful thing could one do with this library if there is no writing to the projected files? It seems to me that the only useful thing is to initially create a directory tree from somewhere else (say, a remote repo), but nothing beyond that. Dokan still seems the way to go.
The short answer:
It's not read-only, but you can't write your files directly to a "source" file system via a projected one.
The WriteFileData method is used for populating placeholder files on the "scratch" (projected) file system, so it doesn't affect the "source" file system.
The long answer:
As stated in the comment by @zett42, ProjFS was mainly designed as a remote git file system. The main goal of any file versioning system is to handle multiple versions of files. From this a question arises: do we need to overwrite the file inside a remote repository on every ProjFS file write? That would be disastrous. When working with git you always write files locally, and they are not synced until you push the changes to a remote repository.
When you enumerate files, nothing is written to the local file system. From the ProjFS documentation:
When a provider first creates a virtualization root it is empty on the
local system. That is, none of the items in the backing data store
have yet been cached to disk.
Only after the file is opened does ProjFS create a "placeholder" for it in the local file system; I assume it is a file with a special structure (not a real one).
As files and directories under the virtualization root are opened, the
provider creates placeholders on disk, and as files are read the
placeholders are hydrated with contents.
What "hydrated" is mean? Most likely, it represents a special data structure partially filled with real data. I would imaginge a placeholder as a sponge partially filled with data.
As items are opened, ProjFS requests information from the provider to allow placeholders for those items to be created in the local file system. As item contents are accessed, ProjFS requests those contents from the provider. The result is that from the user's perspective, virtualized files and directories appear similar to normal files and directories that already reside on the local file system.
Only after a file is updated (modified) does it stop being a placeholder - it becomes a "full file/directory":
For files: The file's content (primary data stream) has been modified.
The file is no longer a cache of its state in the provider's store.
Files that have been created on the local file system (i.e. that do
not exist in the provider's store at all) are also considered to be
full files.
For directories: Directories that have been created on the local file
system (i.e. that do not exist in the provider's store at all) are
considered to be full directories. A directory that was created on
disk as a placeholder never becomes a full directory.
It means that on the first write the placeholder is replaced by a real file in the local FS. But how do we keep the "remote" file in sync with the modified one? (1)
When the provider calls PrjWritePlaceholderInfo to write the
placeholder information, it supplies the ContentID in the VersionInfo
member of the placeholderInfo argument. The provider should then
record that a placeholder for that file or directory was created in
this view.
Notice "The provider should then record that a placeholder for that file". It means that in order to sync the file later with a correct view representation we have to remember with which version a modified file is associated. Imagine we are in a git repository and we change the branch. In this case, we may update one file multiple times in different branches. Now, why and when the provider calls PrjWritePlaceholderInfo?
... These placeholders represent the state of the backing store at the
time they were created. These cached items, combined with the items
projected by the provider in enumerations, constitute the client's
"view" of the backing store. From time to time the provider may wish
to update the client's view, whether because of changes in the backing
store, or because of explicit action taken by the user to change their
view.
Once again, imagine switching branches in a git repository; you have to update a file if it's different in another branch. Continuing with question (1): imagine you want to make a "push" from a particular branch. First of all, you have to know which files are modified. If you did not record the placeholder info while the file was being modified, you won't be able to do this correctly (at least for the git repository example).
Remember that a placeholder is replaced by a real file on modification? ProjFS has the OnNotifyFileHandleClosedFileModifiedOrDeleted event. Here is the signature of the callback:
public void NotifyFileHandleClosedFileModifiedOrDeletedCallback(
    string relativePath,
    bool isDirectory,
    bool isFileModified,
    bool isFileDeleted,
    uint triggeringProcessId,
    string triggeringProcessImageFileName)
For our understanding, the most important parameter here is relativePath. It will contain the name of the modified file inside the "scratch" (projected) file system. At this point you also know that the file is a real file (not a placeholder) and that it has already been written to disk (that is, you won't be able to intercept the call before the file is written). Now you may copy it to the desired location (or do it later) - it depends on your goals.
Answering question #2: it seems like PrjWriteFileData is used only for populating the "scratch" file system, and you cannot use it for updating the "source" file system.
Applications:
As for applications, you can still implement a remote file system (instead of using Dokan), but all writes will be cached locally instead of being written directly to the remote location. A couple of use-case ideas:
Distributed File Systems
Online Drive Client
A File System "Dispatcher" (for example, you may write your files in different folders depending on particular conditions)
A File Versioning System (for example, you may preserve different versions of the same file after a modification)
Mirroring data from your app to a file system (for example, you can "project" a text file with indentations to folders, sub-folders and files)
P.S.: I'm not aware of any undocumented APIs, but from my point of view (going by the documentation) we cannot use ProjFS for purposes like a RAM disk, or write files directly to the "source" file system without writing them to the "local" file system first.

Spring batch job start processing file not fully uploaded to the SFTP server

I have a spring-batch job scanning the SFTP server at a given interval. When it finds a new file, it starts the processing.
It works fine for most cases, but there is one case when it doesn't work:
User starts uploading a new file to the SFTP server
Batch job checks the server and finds a new file
It starts processing it
But since the file is still being uploaded, during processing it encounters an unexpected end of input block and an error occurs.
How can I check that file was fully uploaded to the SFTP server before batch job processing starts?
Locking files while uploading / Upload to temporary file name
You may have an automated system monitoring a remote folder, and you want to prevent it from accidentally picking up a file that has not finished uploading yet. As the majority of SFTP and FTP servers (WebDAV being an exception) do not support file locking, you need to prevent the automated system from picking up the file by other means.
Common workarounds are:
Upload a “done” file once the upload of the data files finishes, and have the automated system wait for the “done” file before processing the data files. This is an easy solution, but it won't work in a multi-user environment.
Upload data files to a temporary (“upload”) folder and move them atomically to the target folder once the upload finishes.
Upload data files under a distinct temporary name, e.g. with a .filepart extension, and rename them atomically once the upload finishes. Have the automated system ignore the .filepart files.
Got from here
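For the uploader side, the .filepart approach might look roughly like the following sketch using a Java SFTP client (JSch is used purely as an example library; host, credentials and paths are placeholders):

import com.jcraft.jsch.ChannelSftp;
import com.jcraft.jsch.JSch;
import com.jcraft.jsch.Session;

public class SafeUpload {
    public static void main(String[] args) throws Exception {
        Session session = new JSch().getSession("user", "sftp.example.com", 22);
        session.setPassword("secret");                     // placeholder credentials
        session.setConfig("StrictHostKeyChecking", "no");  // demo only
        session.connect();
        ChannelSftp sftp = (ChannelSftp) session.openChannel("sftp");
        sftp.connect();
        // Upload under a temporary name so the watcher ignores the in-progress file...
        sftp.put("/local/data/report.csv", "/inbound/report.csv.filepart");
        // ...then rename atomically once the upload has finished.
        sftp.rename("/inbound/report.csv.filepart", "/inbound/report.csv");
        sftp.disconnect();
        session.disconnect();
    }
}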
We had a similar problem. Our solution was to configure the spring-batch cron trigger to run the job every 10 minutes (though we could have configured it for 5 minutes, as the file transfer was taking less than 3 minutes), and then read/process all the files created more than 10 minutes earlier. We assume the FTP operation completes within 3 minutes. This gave us some additional flexibility, for example when the spring-batch app was down.
For example, if the batch job triggers at 10:20 AM, we read all the files that were created before 10:10 AM; likewise, the job that runs at 10:30 reads all the files created before 10:20.
Note: once read, the files need to be either deleted or moved to a history folder to avoid duplicate reads.
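A minimal sketch of that "only pick up files older than the interval" rule, applied to a local directory (an SFTP variant would compare against the remote file's modification time reported by the SFTP client instead); the 10-minute cutoff mirrors the interval above:

import java.io.File;
import java.time.Duration;
import java.time.Instant;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class StableFileSelector {
    // Assumption taken from the answer above: a transfer finishes within 3 minutes,
    // so anything older than the 10-minute job interval is considered complete.
    private static final Duration SETTLE_TIME = Duration.ofMinutes(10);

    public static List<File> filesReadyForProcessing(File inboundDir) {
        Instant cutoff = Instant.now().minus(SETTLE_TIME);
        File[] files = inboundDir.listFiles(File::isFile);
        if (files == null) {
            return List.of();
        }
        return Arrays.stream(files)
                // keep only files last modified before the cutoff
                .filter(f -> Instant.ofEpochMilli(f.lastModified()).isBefore(cutoff))
                .collect(Collectors.toList());
    }
}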

Monitoring files entering and leaving a folder (file queue)

I have a folder which is treated as a file queue; every second many files enter and leave it. I want to write a script that will maintain 2 log files: one will log the name of each file and the time at which it entered the file queue, the other will log the name of each file and the time at which it left the queue. Could you please help?
I don't think you can reach that goal with a batch file.
I would prefer a C# console application or a PowerShell script. In both cases FileSystemWatcher would be the class you need.
C# documentation and example
Powershell example
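The same event-driven idea can also be sketched on the JVM with java.nio.file.WatchService, in case a Java-based tool is an option; the watched folder and log file names below are placeholders:

import java.nio.file.*;
import java.time.LocalDateTime;

public class QueueLogger {
    public static void main(String[] args) throws Exception {
        Path queue = Paths.get("/path/to/queue");   // placeholder folder
        WatchService watcher = FileSystems.getDefault().newWatchService();
        queue.register(watcher,
                StandardWatchEventKinds.ENTRY_CREATE,
                StandardWatchEventKinds.ENTRY_DELETE);
        while (true) {
            WatchKey key = watcher.take();           // blocks until events arrive
            for (WatchEvent<?> event : key.pollEvents()) {
                String line = event.context() + " " + LocalDateTime.now() + System.lineSeparator();
                if (event.kind() == StandardWatchEventKinds.ENTRY_CREATE) {
                    // file entered the queue
                    Files.writeString(Paths.get("entered.log"), line,
                            StandardOpenOption.CREATE, StandardOpenOption.APPEND);
                } else if (event.kind() == StandardWatchEventKinds.ENTRY_DELETE) {
                    // file left the queue
                    Files.writeString(Paths.get("left.log"), line,
                            StandardOpenOption.CREATE, StandardOpenOption.APPEND);
                }
            }
            key.reset();
        }
    }
}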

What is the algorithm that dropbox uses to identify list of files/folders changed locally when the app was not running?

I understand that we can identify changes in the file system while the app is running using OS events. I am just wondering: when the app is not running, if I make lots of changes to the file system, such as adding / modifying / deleting / renaming files and folders, what algorithm does Dropbox use to identify these changes? One thing I can think of is comparing the last modified time (LMT) of a file on the file system against the LMT value stored when the app was running. In that case we would have to loop through all the files anyway, and the LMT doesn't change on a rename. I just wanted to see whether there is a better approach, as relying on LMT has its own problems.
Any comments?
I don't know if this is how Dropbox handles it, but here is a strategy that may be useful:
You have a root directory handled by Dropbox. If I were Dropbox, I'd keep hashes for each file I have on the server. Starting from the root, the app would scan the file tree (directories + files) and compute the hashes for each file.
The scan would produce a double-index hash table. Each file and directory would be indexed by its relative path (from the root Dropbox directory); a second index would be built from the hash(es) of each file.
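A minimal sketch of that scan and double index (SHA-256 is used here as an arbitrary hash choice; nothing is implied about the hashing scheme Dropbox actually uses):

import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.util.HashMap;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class LocalIndex {
    // relative path -> content hash
    final Map<String, String> byPath = new HashMap<>();
    // content hash -> relative path (simplified: ignores files with identical content)
    final Map<String, String> byHash = new HashMap<>();

    static LocalIndex scan(Path root) throws Exception {
        LocalIndex index = new LocalIndex();
        try (Stream<Path> walk = Files.walk(root)) {
            for (Path p : walk.filter(Files::isRegularFile).collect(Collectors.toList())) {
                String rel = root.relativize(p).toString();
                String hash = sha256(p);
                index.byPath.put(rel, hash);
                index.byHash.put(hash, rel);
            }
        }
        return index;
    }

    static String sha256(Path p) throws Exception {
        byte[] digest = MessageDigest.getInstance("SHA-256").digest(Files.readAllBytes(p));
        StringBuilder hex = new StringBuilder();
        for (byte b : digest) hex.append(String.format("%02x", b));
        return hex.toString();
    }
}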
Now the app has scanned and established the double-indexed hash table. The server would then send the tuples (relative path, hashes of the file). Let (f, h) be such a file tuple:
The app would try to look the file up through the path index using f:
If there is a result, compare the hashes. If they don't match, update the file on the remote server.
If there is no result, the file may have been deleted OR moved/renamed. The app then tries to look it up through the hash index using h: if there is a match, the file is still there, only under a different path (hence moved or renamed). The app sends this info and the file is moved/renamed accordingly on the server.
If the file is found neither by hash nor by path, it has been deleted from the Dropbox file tree, so we delete it accordingly on the server.
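Continuing the sketch, reconciling one server tuple (f, h) against the two indexes could be expressed as a method added to the hypothetical LocalIndex class above:

    enum Action { UP_TO_DATE, UPDATE_ON_SERVER, MOVED_OR_RENAMED, DELETED }

    // f = relative path reported by the server, h = hash reported by the server
    Action reconcile(String f, String h) {
        String localHash = byPath.get(f);
        if (localHash != null) {
            // Found by path: the server copy only needs updating if the content differs.
            return localHash.equals(h) ? Action.UP_TO_DATE : Action.UPDATE_ON_SERVER;
        }
        // Not found by path: the same content under another path means a move/rename,
        // otherwise the file was deleted locally.
        return byHash.containsKey(h) ? Action.MOVED_OR_RENAMED : Action.DELETED;
    }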
Note that this strategy needs a synchronization mechanism to know, when a match is found, whether the file has to be updated on the client or on the server. This could be achieved by storing the time of the last update run by Dropbox (on both the client and the server) and who performed that last update (on the server).
