Loading a CSV file from a shared drive into an MSSQL table using Spring

I am looking for an approach or code base that can fulfill the requirements below.
1. We have a formatted source file on a shared drive with roughly one million records. A new file arrives on this drive every day with a date prefix in its name (e.g. 02-12-2018_abcd.txt).
2. If any failure occurs while reading the file from the shared drive location, the SQL inserts should not be committed.
3. The job should run at a scheduled time.
I found a couple of approaches for reading a file from a shared drive: using a jar to read it; copying the file from the shared drive to the local machine (the application server) and processing it with Spring Batch; or using Spring Integration adapters, inbound channels, etc.
Please suggest the best approach and a Spring code base or Git repository for it. Thanks.

This is a typical use case where Spring Batch can help. You can have a first step (a simple tasklet) that copies the file from the shared drive to the local machine, and then a second step (a chunk-oriented tasklet) that reads the file and inserts the data into the database.
You can find samples here: https://github.com/spring-projects/spring-batch/tree/master/spring-batch-samples
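As a minimal sketch of what the first step could delegate to, the plain-Java helper below builds the day's file name and copies that file from the shared directory to a local working directory. The class name `DailyFileStager`, the method names, and the `dd-MM-yyyy` reading of the prefix are assumptions made for illustration; the question only shows the example name 02-12-2018_abcd.txt. It uses only the JDK, so it is not a Spring Batch API.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

/** Hypothetical helper: the logic a "copy file" tasklet could call. */
final class DailyFileStager {

    // Assumed prefix format, based on the example 02-12-2018_abcd.txt.
    // Whether this is day-month or month-day is not stated in the question.
    private static final DateTimeFormatter PREFIX =
            DateTimeFormatter.ofPattern("dd-MM-yyyy");

    /** Builds the expected file name for a day, e.g. 02-12-2018_abcd.txt. */
    static String fileNameFor(LocalDate day, String suffix) {
        return day.format(PREFIX) + "_" + suffix;
    }

    /**
     * Copies the day's file from the shared directory into a local working
     * directory and returns the local path. Throws if the file is missing,
     * so the job fails before any database work starts.
     */
    static Path stage(Path sharedDir, Path localDir, LocalDate day, String suffix)
            throws IOException {
        Path source = sharedDir.resolve(fileNameFor(day, suffix));
        if (!Files.isRegularFile(source)) {
            throw new NoSuchFileException(source.toString());
        }
        Files.createDirectories(localDir);
        Path target = localDir.resolve(source.getFileName());
        return Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

A tasklet could call `DailyFileStager.stage(...)` with `LocalDate.now()` and hand the returned path to the reader of the second step. Note that a chunk-oriented step commits once per chunk, so a strict "commit nothing on failure" requirement would likely need extra handling, for example loading into a staging table first.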

Related

Moving a file to another location in Apache NiFi

I am trying to load files into a MySQL database using LOAD DATA LOCAL INFILE; however, I am having difficulty moving the files to a new location once they have been successfully imported into MySQL.
Below is a screenshot of the process flow.
My problem is:
I have managed to load the database using MySQL's LOAD DATA LOCAL INFILE, but I cannot move the successfully imported files to the correct directory. The PutFile_sucess & PutFile_fail processors do not work as expected, so I decided to use FetchFile instead. With FetchFile, however, I get an empty file: it just creates the file instead of moving the whole file.
I hope I have made myself clear. I would appreciate any input.
If your issue is removing the file once it has been imported, you could just add a FetchFile processor somewhere after your success path and set its Completion Strategy to Delete File.
However, a better approach would be to load the content of the file into NiFi, parse/split/process it, and then (possibly after regrouping it into batches) ingest the content into MySQL.
Could you improve your question with information such as the format, structure, and content of the file you're trying to load?

Spring batch integration file lock access

I have a Spring Batch Integration setup where multiple servers poll a single file directory. This causes a problem where a file can be picked up and processed by more than one server. I have attempted to put a nio-lock on the file once a server has picked it up, but this locks the file for processing, so the server can't read its contents.
Is there a spring batch/integration solution to this problem or is there a way to rename the file as soon as it is picked up by a node?
Consider using FileSystemPersistentAcceptOnceFileListFilter with a shared MetadataStore: http://docs.spring.io/spring-integration/reference/html/system-management-chapter.html#metadata-store
So, only one instance of your application will be able to pick up a file.
Even if we find a solution for nio-lock, you should understand that lock means "do not touch until freed". Therefore when one instance has done its work, another one is ready to pick up the file. I guess that isn't your goal.
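For the other option raised in the question, renaming the file as soon as a node picks it up, a minimal JDK-only sketch could look like the following. It is not the Spring Integration filter mentioned above; the class name `FileClaimer`, the `nodeId` parameter, and the `.processing` suffix are hypothetical. It relies on an atomic rename, and whether a rename is truly atomic on a given network share depends on the file system, so treat that as an assumption to verify.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Optional;

/** Hypothetical helper: claims a file for one node by renaming it. */
final class FileClaimer {

    /**
     * Tries to claim the file by renaming it to <name>.<nodeId>.processing.
     * Returns the new path if this node won, or empty if the file was
     * already gone because another node renamed it first.
     */
    static Optional<Path> tryClaim(Path file, String nodeId) throws IOException {
        Path claimed = file.resolveSibling(
                file.getFileName() + "." + nodeId + ".processing");
        try {
            // ATOMIC_MOVE asks the file system for an all-or-nothing rename.
            return Optional.of(
                    Files.move(file, claimed, StandardCopyOption.ATOMIC_MOVE));
        } catch (NoSuchFileException e) {
            // The source no longer exists: another node claimed it.
            return Optional.empty();
        }
    }
}
```

Each node would call `tryClaim` before processing and skip the file when the result is empty. The directory poller would also need to ignore names ending in `.processing` so that claimed files are not picked up again.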

Program solution reading through share folders

I have a quick project I am working on for one of our VPs.
We have a few thousand CAD jobs stored on a network file share. Each CAD job has a parent folder, and part of the folder name contains the job number. Inside the folder, there are one or more .ini text files that contain the connection information I need.
What I need is a programmatic way to search through all the folders and extract the job number from each folder name, along with all the connection values from the ini files.
For example, for a folder named CM8252390-3, the job number is 8252390-3. Inside this folder are 3 ini files. Inside the ini files are entries that look like this:
[Connection]
Name=IMP_Acme_3.5
[Origin]
X=-15.044784
Y=19.620095
Z=44.621395
So my program needs to give me the following result
Job Connection
8252390-3 IMP_Acme1_3.5
8252390-3 IMP_Acme2_3.5
8252390-3 IMP_Acme3_3.5
8254260-1 IMP_Acme3_2.4
8254260-1 IMP_Acme3_4.1
...continued for all folders in the network share
Any suggestions on the best way to do this? I am primarily an Oracle PL/SQL developer, but I have some basic Windows batch and Unix shell experience. If I can get the data loaded into Oracle tables, I can search it using PL/SQL tools, but is there a better way using shell, batch, or other tools?
Thank you.
I think this is a job for PowerShell or VBScript. It would be easy to use these tools to write the information you need to one file.
Write this file to an Oracle directory.
Grant read permission on this directory to a database user.
Use utl_file to read the file, or treat the file as an external table and expose it as a view.
Schedule a regular OS job to refresh or rebuild the list.
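The extraction itself can be sketched in a few lines. The answer suggests PowerShell or VBScript; the version below uses Java instead, purely as an illustration of the same scan. The class name `CadJobScanner`, the folder-name pattern (letters followed by digits, a hyphen, and digits), and the choice to read only `Name=` inside the `[Connection]` section are assumptions based on the single example in the question.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical scanner: one "job connection" row per ini file. */
final class CadJobScanner {

    // Assumed folder naming: leading letters, then the job number,
    // e.g. CM8252390-3 -> 8252390-3.
    private static final Pattern JOB = Pattern.compile("^[A-Za-z]+(\\d+-\\d+)$");

    /** Returns sorted rows such as "8252390-3 IMP_Acme1_3.5". */
    static List<String> scan(Path root) throws IOException {
        List<String> rows = new ArrayList<>();
        try (DirectoryStream<Path> folders =
                     Files.newDirectoryStream(root, Files::isDirectory)) {
            for (Path folder : folders) {
                Matcher m = JOB.matcher(folder.getFileName().toString());
                if (!m.matches()) {
                    continue; // not a job folder under the assumed pattern
                }
                try (DirectoryStream<Path> inis =
                             Files.newDirectoryStream(folder, "*.ini")) {
                    for (Path ini : inis) {
                        String name = connectionName(ini);
                        if (name != null) {
                            rows.add(m.group(1) + " " + name);
                        }
                    }
                }
            }
        }
        Collections.sort(rows);
        return rows;
    }

    /** Reads the Name= value of the [Connection] section, or null if absent. */
    static String connectionName(Path ini) throws IOException {
        boolean inConnection = false;
        // ISO-8859-1 is an assumption; it accepts any byte sequence.
        for (String raw : Files.readAllLines(ini, StandardCharsets.ISO_8859_1)) {
            String line = raw.trim();
            if (line.startsWith("[")) {
                inConnection = line.equalsIgnoreCase("[Connection]");
            } else if (inConnection && line.startsWith("Name=")) {
                return line.substring("Name=".length()).trim();
            }
        }
        return null;
    }
}
```

Writing the rows returned by `scan` to a text file, one per line, would give the single file that the Oracle steps above expect.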

Spring Batch: move files to another location

I want to use Spring Batch to perform the upload of files from my server to e.g. Google Drive.
So the steps are:
get files from a specified folder
upload them to Google Drive
update each entry in my DB corresponding to this file (i.e. update path)
My question is: do I necessarily have to do it with a tasklet? If so, do I have to split the job into chunks myself, and will there be no restart-on-failure support?

What's the best way to (programmatically) determine a file's network origin?

For an application I'm writing, I want to programmatically find out which computer on the network a file came from. How can I best accomplish this?
Do I need to monitor network transactions or is this data stored somewhere in Windows?
When a file is copied to the local system, Windows does not keep any record of where it was copied from. So unless the application that created it saved that information in the file, it is lost.
With file auditing, file and directory operations can be tracked, but I don't think that will include the source path for file copies (just who created the file and when).
Yes, it seems like you would either need to detect the file transfer based on interception of network traffic, or if you have the ability to alter the file in some way, use public key cryptography to sign files using a machine-specific key before they are transferred.
Create a service on either the destination computer or the file hosting computers that adds records to an Alternate Data Stream attached to each file, much the way Windows uses the Zone.Identifier stream for files downloaded from the internet.
You can have a background process on machine A which "tags" each file as having been tagged by machine A on such-and-such a date and time. Then when machine B downloads the file, assuming we are using NTFS filesystems, it can see the tag from A. Or, if you can't have a process at the server, you can use NTFS streams on the "client" side via packet sniffing methods as others have described. The bonus here is that future file-copies will retain the data as long as it is between NTFS systems.
Alternative: create a requirement that all file transfers must be done through a Web portal (as opposed to network drag-and-drop). Built in logging. Or other type of file retrieval proxy. Do you have control over procedures such as this?
