I want to understand when and how Spring Batch creates and writes files. Specifically, are files kept open until the batch has completed? I want to know whether the file is still open on my destination server. What is the best way to check that?
My application runs on Unix, on a JBoss server.
You can check the open file descriptors for the process ID (PID) of the application, for example with the lsof command.
I am working on a three-step Spring Batch project. Every 10 minutes it downloads the required text files from FTP to a local directory, processes them, and finally deletes them from the local directory. New files are loaded onto the FTP server every 10 minutes. But what if a problem occurs on the FTP side and no new files are loaded? Then the Spring Batch project downloads the same file and processes it again. So my question is: how can I prevent Spring Batch from processing the same file twice?
Edit: I use the Apache Commons library to download files from FTP.
I am also using MultiResourceItemReader to read 2 text files on each run.
I would use the file name as a job parameter. This will create a job instance for each file.
Since Spring Batch prevents the same job instance from running to completion more than once, each file would be processed only once, so processing the same file twice is avoided by design.
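To make the idea concrete, here is a minimal sketch in plain Java that models the behaviour described above. It is not Spring Batch's API: the class `OncePerFile` and its method `runOnce` are hypothetical names, and the in-memory set stands in for the persistent job repository that Spring Batch would use. Spring Batch itself rejects a second launch of a completed job instance with an exception, whereas this sketch simply returns `false`.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical model: one "job instance" per file name, allowed to complete only once.
class OncePerFile {
    // File names whose processing finished successfully (in memory only, an assumption).
    private final Set<String> completed = new HashSet<>();

    // Runs the task unless this file name has already completed; returns true if it ran.
    boolean runOnce(String fileName, Runnable task) {
        if (completed.contains(fileName)) {
            return false; // same identifying parameter, already completed: skip
        }
        task.run();              // if this throws, the name is not recorded, so a retry is allowed
        completed.add(fileName); // record completion only after success
        return true;
    }

    public static void main(String[] args) {
        OncePerFile jobs = new OncePerFile();
        System.out.println(jobs.runOnce("data-1.txt", () -> {})); // prints true
        System.out.println(jobs.runOnce("data-1.txt", () -> {})); // prints false
    }
}
```

The property that matters is that the file name acts as the identity of the run, so a failed run can be retried while a completed one cannot be repeated.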
If I understand correctly, the normal way of testing Spring Batch is basically to run my application and let JobLauncherTestUtils launch my normal jobs. However, my application reads its input from an external service and writes it to my database. I don't want my tests to write to my production database, and I'd like the test input to be read from files I provide rather than from the external service.
Can anyone point me to an example of how to do this? I'd like to feed a job with a file and then, once the job has finished, check that the database contains what I expect. I guess I could specify an H2 database in application-test.properties, but I have no clue about the input.
The docs at https://docs.spring.io/spring-batch/4.1.x/reference/html/testing.html#testing don't really cover this for me.
Are you reading input files from disk? If so, you can change the input directory for tests only, so that it points to a test file such as src/test/resources/input_dir/your_test_file.xml.
If the input file directory is configured through properties, you could create a properties file used only for tests, with a value like classpath:input_dir/your_test_file.xml (which would live in your project as src/test/resources/input_dir/your_test_file.xml).
If the input file directory is configured in the execution context, you can provide it through the jobExecutionContext parameter of JobLauncherTestUtils.launchStep.
I have a Spring Batch job that scans an SFTP server at a given interval. When it finds a new file, it starts processing it.
It works fine in most cases, but there is one case where it doesn't:
A user starts uploading a new file to the SFTP server.
The batch job checks the server and finds the new file.
It starts processing the file.
Since the file is still being uploaded, the job hits an unexpected end of input during processing, and an error occurs.
How can I check that a file has been fully uploaded to the SFTP server before the batch job starts processing it?
Locking files while uploading / Upload to temporary file name
You may have an automated system monitoring a remote folder and you want to prevent it from accidentally picking up a file that has not finished uploading yet. As the majority of SFTP and FTP servers (WebDAV being an exception) do not support file locking, you need to prevent the automated system from picking up the file some other way.
Common workarounds are:
Upload a “done” file once the upload of the data files finishes, and have the automated system wait for the “done” file before processing the data files. This is an easy solution, but it won’t work in a multi-user environment.
Upload data files to a temporary (“upload”) folder and move them atomically to the target folder once the upload finishes.
Upload data files under a distinct temporary name, e.g. with a .filepart extension, and rename them atomically once the upload finishes. Have the automated system ignore the .filepart files.
Got from here
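As an illustration of the third workaround, the consumer side can be sketched as a simple name filter. This is a minimal sketch that assumes the remote directory listing is already available as a list of file names; the class `UploadFilter`, the method `completed`, and the sample names are hypothetical.

```java
import java.util.List;
import java.util.stream.Collectors;

class UploadFilter {
    // Suffix that the uploader gives to files still in transit (taken from the text above).
    static final String PARTIAL_SUFFIX = ".filepart";

    // Keep only names that do not carry the partial-upload suffix.
    static List<String> completed(List<String> remoteNames) {
        return remoteNames.stream()
                .filter(name -> !name.endsWith(PARTIAL_SUFFIX))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical listing: one finished file, one still uploading.
        List<String> listing = List.of("orders.csv", "invoices.csv.filepart");
        System.out.println(completed(listing)); // prints [orders.csv]
    }
}
```

The filter only helps if the uploader really renames the file after the transfer completes, so both sides have to agree on the suffix.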
We had a similar problem. Our solution was to configure a Spring Batch cron trigger that starts the job every 10 minutes (we could have used 5 minutes, since the file transfer took less than 3 minutes) and to read and process only the files created more than 10 minutes before the run. We assume the FTP operation completes within 3 minutes. This also gave us some additional flexibility, for example when the Spring Batch app was down.
For example, if the batch job is triggered at 10:20 AM, it reads all the files that were created before 10:10 AM; likewise, the job that runs at 10:30 reads all the files created before 10:20.
Note: Once the files have been read, you need to either delete them or move them to a history folder to avoid duplicate reads.
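The cutoff rule above could look roughly like the following sketch. It makes several assumptions: the files are visible in a local (or locally mounted) directory, the last-modified time is used as a stand-in for the creation time, and the names `SettledFiles` and `olderThan` are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class SettledFiles {
    // Regular files in dir whose last-modified time is earlier than (now - minAge).
    static List<Path> olderThan(Path dir, Duration minAge, Instant now) throws IOException {
        Instant cutoff = now.minus(minAge);
        try (Stream<Path> entries = Files.list(dir)) {
            return entries
                    .filter(Files::isRegularFile)
                    .filter(p -> lastModified(p).isBefore(cutoff))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    // Wraps the checked exception so the method can be used inside a stream.
    static Instant lastModified(Path p) {
        try {
            return Files.getLastModifiedTime(p).toInstant();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Directory to scan; defaults to the current directory.
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        olderThan(dir, Duration.ofMinutes(10), Instant.now()).forEach(System.out::println);
    }
}
```

Passing `now` in as a parameter keeps the cutoff tied to the trigger time of the run, which matches the 10:20 and 10:10 example.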
I have a Spring Batch / Spring Integration setup in which multiple servers poll a single file directory. This causes a problem: a file can be picked up and processed by more than one server. I tried putting an NIO lock on the file once a server has picked it up, but the lock also blocks processing, so the server can't read the contents of the file.
Is there a Spring Batch or Spring Integration solution to this problem, or is there a way to rename the file as soon as it is picked up by a node?
Consider using FileSystemPersistentAcceptOnceFileListFilter with a shared MetadataStore: http://docs.spring.io/spring-integration/reference/html/system-management-chapter.html#metadata-store
So, only one instance of your application will be able to pick up a file.
Even if we found a solution for the NIO lock, you should understand that a lock means "do not touch until freed". So once one instance has finished its work and released the lock, another instance is free to pick up the same file. I guess that isn't your goal.
I am referring to the Windows-native ftp.exe application. Out-of-the-box, it seems to overwrite files under any and all circumstances.
Is it possible to prevent overwriting files with ftp.exe? If this cannot be done with specific ftp.exe arguments, can it be done using a batch process to call ftp.exe?
I don't think there are any ftp.exe arguments or ftp commands that do what you want explicitly.
Using a batch process looks like the way to go (if you must stick to this ftp client).
You might have to do something like:
Ftp connect
List files (remote.txt)
Compare remote.txt with local.txt (files you want to upload)
Generate uploadables.txt (containing items from local.txt not in remote.txt)
Ftp connect again
Upload uploadables.txt
Sounds fun, but I better get back to work. :-)
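The comparison step in the list above (building uploadables.txt from local.txt and remote.txt) is a set difference. A minimal sketch, assuming both listings have already been read into lists of file names and that names are compared case-sensitively; the class `UploadPlan` and the sample names are hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class UploadPlan {
    // Names present locally but absent from the remote listing, in local order.
    static Set<String> uploadables(List<String> local, List<String> remote) {
        Set<String> result = new LinkedHashSet<>(local);
        result.removeAll(remote); // drop everything that already exists remotely
        return result;
    }

    public static void main(String[] args) {
        List<String> local = List.of("a.txt", "b.txt", "c.txt"); // contents of local.txt
        List<String> remote = List.of("b.txt");                  // contents of remote.txt
        System.out.println(uploadables(local, remote)); // prints [a.txt, c.txt]
    }
}
```

Only the resulting names would then be passed to the second ftp.exe session, so files that already exist on the server are never sent and cannot be overwritten.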