Move file to another location in Apache NiFi - apache-nifi

I am trying to load data into a MySQL database using LOAD DATA LOCAL INFILE; however, I am having difficulties moving the files to a new location once they have been successfully imported into MySQL.
Below is a screenshot of the process flow.
My problem is:
I managed to import/load the data using MySQL's LOAD DATA LOCAL INFILE, but when I try to move the successfully imported files to the correct directory, I fail to do so. The PutFile_success and PutFile_fail processors do not work as expected, so I decided to use FetchFile instead, but then I get an empty file: FetchFile just creates it instead of moving the whole file.
I hope I have made myself clear; I would appreciate any input.

If your issue is just to remove the file once it has been imported, you could add a FetchFile processor somewhere after your success part and set its Completion Strategy to Delete File.
However, a better approach would be to load the content of the file into NiFi, then parse/split/process it, and then (possibly regrouping it into batches) ingest the content into MySQL.
Could you maybe improve your question with information such as the format/structure/content of the file you're trying to load?
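For reference, a rough sketch of how the FetchFile processor could be configured for this; the property names come from the processor's documentation, while the paths and values below are only examples rather than anything taken from the original flow:

FetchFile
  File to Fetch              = ${absolute.path}/${filename}
  Completion Strategy        = Move File   (or Delete File)
  Move Destination Directory = /data/imported

With Completion Strategy set to Move File, the source file is moved to the destination directory once its content has been read into the flow file; with Delete File it is removed instead.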

Related

Can I delete file in Nifi after send messages to kafka?

Hi, I'm using NiFi as an ETL tool.
(process flow image)
This is my current process. I use TailFile to detect the CSV file and then send messages to Kafka.
It works fine so far, but I want to delete the CSV file after I send its contents to Kafka.
Is there any way?
Thanks
This depends on why you are using TailFile. From the docs,
"Tails" a file, or a list of files, ingesting data from the file as it is written to the file
TailFile is used to get new lines that are added to the same file, as they are written. If you need to tail a file that is being written to, what condition determines that it is no longer being written to?
However, if you are just consuming complete files from the local file system, then you could use GetFile, which gives you the option to delete the file after it is consumed.
From a remote file system, you could use ListSFTP and FetchSFTP; FetchSFTP has a Completion Strategy to move or delete the file.
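To illustrate both options, a rough sketch of the relevant processor settings; the property names come from the GetFile and FetchSFTP documentation, and the directories and filter are example values only:

GetFile
  Input Directory  = /data/incoming
  File Filter      = .*\.csv
  Keep Source File = false

ListSFTP -> FetchSFTP
  Completion Strategy        = Move File   (or Delete File)
  Move Destination Directory = /data/processed

Keep Source File already defaults to false in GetFile, so the local file is removed as soon as it has been picked up unless you explicitly keep it.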

Shared drive CSV file load to MSSQL table using Spring

I am searching for an approach/code base which can fulfill the requirements below.
1. We have a source file (formatted) in a shared drive which has ~one million records; this drive gets a new file every day with a date prefix on it (e.g. 02-12-2018_abcd.txt).
2. While reading the file from the shared-drive location, if any failure occurs it should not commit the SQL insert.
3. This job should run at a scheduled time.
I found a couple of approaches to read the file from the shared drive, such as a jar that reads it directly; another approach is to copy the file from the shared drive to the local machine (on the application server) and do Spring Batch processing; and another is to use a Spring Integration adapter, inbound channel, etc.
Please suggest the best approach and a Spring code base/Git example for the same. Thanks.
This is a typical use case where Spring Batch can help. You can have a first step (a tasklet) that copies the file from the shared drive to the local machine, and then a second step (a chunk-oriented tasklet) that reads the file and inserts the data into the database.
You can find samples here: https://github.com/spring-projects/spring-batch/tree/master/spring-batch-samples
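As a rough illustration of that two-step layout, here is a minimal Java configuration sketch. The class ImportJobConfig, the paths, the table/column names and the Record domain class (a simple bean with id, name and amount fields) are all hypothetical and would need to be adapted to the real file format and MSSQL schema:

import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import javax.sql.DataSource;
import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.core.step.tasklet.Tasklet;
import org.springframework.batch.item.database.JdbcBatchItemWriter;
import org.springframework.batch.item.database.builder.JdbcBatchItemWriterBuilder;
import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.file.builder.FlatFileItemReaderBuilder;
import org.springframework.batch.repeat.RepeatStatus;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.FileSystemResource;

@Configuration
@EnableBatchProcessing
public class ImportJobConfig {

    // Step 1: tasklet that copies the daily file from the shared drive to a local working directory.
    @Bean
    public Step copyFileStep(StepBuilderFactory steps) {
        Tasklet copyFile = (contribution, chunkContext) -> {
            Path source = Paths.get("//shared-drive/in/02-12-2018_abcd.txt"); // hypothetical source path
            Path target = Paths.get("/tmp/work/input.txt");                   // hypothetical local copy
            Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
            return RepeatStatus.FINISHED;
        };
        return steps.get("copyFileStep").tasklet(copyFile).build();
    }

    // Step 2: chunk-oriented step that reads the local file and inserts into the database.
    // Each chunk is one transaction, so a failure rolls back the inserts of that chunk.
    @Bean
    public Step loadStep(StepBuilderFactory steps, DataSource dataSource) {
        FlatFileItemReader<Record> reader = new FlatFileItemReaderBuilder<Record>()
                .name("recordReader")
                .resource(new FileSystemResource("/tmp/work/input.txt"))
                .delimited()
                .names("id", "name", "amount")              // hypothetical columns
                .targetType(Record.class)
                .build();

        JdbcBatchItemWriter<Record> writer = new JdbcBatchItemWriterBuilder<Record>()
                .dataSource(dataSource)
                .sql("INSERT INTO records (id, name, amount) VALUES (:id, :name, :amount)")
                .beanMapped()
                .build();

        return steps.get("loadStep")
                .<Record, Record>chunk(1000)
                .reader(reader)
                .writer(writer)
                .build();
    }

    @Bean
    public Job importJob(JobBuilderFactory jobs, Step copyFileStep, Step loadStep) {
        return jobs.get("importJob").start(copyFileStep).next(loadStep).build();
    }
}

The scheduled-run requirement could then be covered by launching this job from a Spring @Scheduled method or an external scheduler such as cron or Quartz, and the no-commit-on-failure requirement is handled by the chunk's transaction boundary.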

Processing a huge CSV file uploaded using Spring controller

Let's imagine the following situation: a user, using the admin panel, uploads a CSV file, and that CSV is transformed into a new one with additional data retrieved from the DB. This CSV must be stored somewhere on our server, and we want to perform this transformation asynchronously.
I know about Spring Batch, so I've tried to figure out whether there is any possibility of setting the file of the batch process dynamically. I've made some tests and I've managed to launch a Spring Batch job, but only with a hardcoded file set in the bean constructor.
We are using Grails and the spring-batch plugin. The thing is... is there any better way to process a huge CSV asynchronously without memory errors? I was reviewing this post, Spring batch to upload a CSV file and insert into database accordingly, but I don't know if it is the best approach.
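For background, Spring Batch does allow the input file to be bound at runtime rather than hardcoded in a constructor, through step-scoped beans and job parameters. A minimal sketch, assuming the uploaded CSV has already been written to disk and its path is passed as a job parameter named inputFile (the bean name csvReader and the column names are made up for illustration):

import org.springframework.batch.core.configuration.annotation.StepScope;
import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.file.builder.FlatFileItemReaderBuilder;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.FileSystemResource;

@Configuration
public class CsvReaderConfig {

    // Step-scoped: the resource is resolved from the job parameters when the step runs,
    // so each launch can point at a different uploaded file.
    @Bean
    @StepScope
    public FlatFileItemReader<String[]> csvReader(
            @Value("#{jobParameters['inputFile']}") String inputFile) {
        return new FlatFileItemReaderBuilder<String[]>()
                .name("csvReader")
                .resource(new FileSystemResource(inputFile))
                .delimited()
                .names("col1", "col2")                      // hypothetical columns
                .fieldSetMapper(fieldSet -> fieldSet.getValues())
                .build();
    }
}

The job itself can then be launched asynchronously (for example with a JobLauncher backed by a TaskExecutor), passing the stored file's path as the inputFile job parameter, and the chunk-oriented reading keeps memory usage bounded regardless of the CSV size.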

Elasticsearch - how to store scripts in config/scripts directory

I'm trying to experiment with using scripts in the config/scripts directory. The Elasticsearch docs here say this:
Save the contents of the script as a file called config/scripts/my_script.groovy on every data node in the cluster:
This seems like it's probably really easy, but I'm afraid I don't understand how exactly to put a Groovy file "on every data node in the cluster". Would this normally be done through the command line somehow, or can it be done by manually moving the Groovy file (in Finder on OS X, for example)? I have a test index, but when I look at the file structure on the nodes I'm confused about where to put the Groovy file. Help, pretty please.
You just need to copy the file to each server running Elasticsearch. If you're just running Elasticsearch on your computer, then go to the folder you've installed Elasticsearch into and copy the file into config/scripts in there (you may have to create the folder first). It doesn't matter how the file gets there.
You should see an entry in the logs (or the console if you are running in the foreground) along the lines of
compiling script file [/path/to/elasticsearch/config/scripts/my_script.groovy]
This won't show up straight away: by default Elasticsearch checks for new/updated scripts every 60 seconds (you can change this with the watcher.interval setting).
Since file scripts are deprecated (elastic/elasticsearch#24552 & elastic/elasticsearch#24555), this approach is not going to work anymore.
The API is the only way.
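In recent Elasticsearch versions that means storing scripts through the stored scripts API (and Painless rather than Groovy). A minimal sketch using the low-level Java REST client; the script id my_script, the field name, the params and the host are made up for illustration:

import org.apache.http.HttpHost;
import org.elasticsearch.client.Request;
import org.elasticsearch.client.Response;
import org.elasticsearch.client.RestClient;

public class StoreScript {
    public static void main(String[] args) throws Exception {
        // Connect to a local node; adjust host/port for your cluster.
        RestClient client = RestClient.builder(new HttpHost("localhost", 9200, "http")).build();

        // PUT _scripts/<id> stores the script in the cluster state,
        // so it is available on every node without touching config/scripts.
        Request request = new Request("PUT", "/_scripts/my_script");
        request.setJsonEntity(
            "{ \"script\": { \"lang\": \"painless\", \"source\": \"doc['price'].value * params.factor\" } }");
        Response response = client.performRequest(request);
        System.out.println(response.getStatusLine());

        client.close();
    }
}

Scripts stored this way live in the cluster state, so they reach every node automatically instead of having to be copied to each one by hand.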

Why does my app sometimes create a file "A.myappextension-shm" in addition to the file "A.myappextension"?

I have a Document-based Core Data app that saves with SQLite. While testing, I save to a test file A.myappextension. Sometimes another file, "A.myappextension-shm", is also created. Why is that?
Assuming that A.myappextension is your Core Data persistent store file, it happens because of SQLite journaling. You might also see A.myappextension-wal. Both of these extra files are SQLite journal files, and a lot of your data may actually be stored in them instead of in the main file. If you ever copy these files, or remove them, or do anything else that treats them as files instead of SQLite data, you'll need to copy/remove/whatever all of them.
