I have a NiFi flow that keeps some state with the ListS3 processor.
I have a dev instance and a prod instance.
I want some options for deploying from dev to prod where the state is kept and where I don't have to go in manually and change all the processors and process groups.
It seems this can't be done with templates, based on the following Stack Overflow question:
how does NIFI listfile maintains its timestamp?
edit:
Just so there is no misunderstanding: I want to keep the prod state when deploying.
It sounds like you aren't using NiFi Registry, so you're downloading a flow template and then importing it. This can't preserve state, because the imported flow is not the same flow.
You should be using NiFi Registry to version control your flows, which supports this Dev -> Prod workflow.
Build your flow in Dev NiFi, version to Registry.
In prod, add a new Process Group and select the Import option when it asks you for a name. You'll be able to pick your versioned flow.
Run your flow so that it stores some state. View the processor's state to verify.
Now update the flow in Dev, and commit the local change to Registry.
Then, update the flow in Prod to the latest version from Registry. It will preserve state on the stateful processor.
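The last step can also be scripted instead of clicked through. Here is a minimal sketch in Python, assuming the NiFi REST API exposes `POST /nifi-api/versions/update-requests/process-groups/{id}` and that the body has the shape returned by `GET /nifi-api/versions/process-groups/{id}`. The host name, IDs and helper names are hypothetical, authentication is left out, and the field types may differ between NiFi versions. It only builds the request; nothing is sent until you pass it to `urllib.request.urlopen`.

```python
import copy
import json
import urllib.request


def build_version_update(current_entity, target_version):
    """Return a copy of the version control entity pointing at target_version.

    current_entity is assumed to be the JSON body returned by
    GET /nifi-api/versions/process-groups/{id}.
    """
    entity = copy.deepcopy(current_entity)
    entity["versionControlInformation"]["version"] = target_version
    return entity


def build_update_request(base_url, group_id, entity):
    """Build (but do not send) the POST asking NiFi to change the flow version."""
    url = f"{base_url}/nifi-api/versions/update-requests/process-groups/{group_id}"
    return urllib.request.Request(
        url,
        data=json.dumps(entity).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical values, for illustration only.
sample_entity = {
    "processGroupRevision": {"version": 7},
    "versionControlInformation": {
        "groupId": "pg-1234",
        "registryId": "reg-1",
        "bucketId": "bucket-1",
        "flowId": "flow-1",
        "version": 3,
    },
}

updated = build_version_update(sample_entity, 4)
request = build_update_request("https://prod-nifi.example.com:8443", "pg-1234", updated)
```

The update request runs asynchronously on the NiFi side, so a real script would also need to poll it until it completes before treating the deployment as done.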
For detailed steps on installing & using Registry, see these links:
https://nifi.apache.org/docs/nifi-registry-docs/html/getting-started.html
https://pierrevillard.com/2018/04/09/automate-workflow-deployment-in-apache-nifi-with-the-nifi-registry/
https://alasdairb.com/2021/03/22/nifi-in-production-nifi-registry/
https://docs.cloudera.com/HDPDocuments/HDF3/HDF-3.2.0/versioning-a-dataflow/content/connecting-to-a-nifi-registry.html
https://docs.cloudera.com/HDPDocuments/HDF3/HDF-3.4.0/getting-started-with-nifi-registry/content/import-a-versioned-flow.html
https://docs.cloudera.com/HDPDocuments/HDF3/HDF-3.4.0/getting-started-with-nifi-registry/content/save-changes-to-a-versioned-flow.html
https://docs.cloudera.com/HDPDocuments/HDF3/HDF-3.4.0/getting-started-with-nifi-registry/content/start-version-control-on-a-process-group.html
According to the Jelastic documentation, it is possible to export an environment's configuration and download it so it can be restored at another provider.
However, I have tried two Jelastic providers, and both have disabled the option for exporting private data.
So export/download/upload/import of an environment is not possible.
I was expecting a process similar to the cPanel backup/restore tool.
In fact, there is another way to look at deployment that removes the need to handle data or configuration on the platform: think in terms of CI/CD. Jelastic provides a platform where whatever you have built, kept wherever you develop it (a VCS such as Git, for example), is installed (copied) on top of a specific pre-configured stack that acts like a layer. You don't need to keep the data somewhere in the cloud, because you already have it locally (that is, in a VCS) and make your changes there. Then just do a 'pull' (manually or automatically) on whichever environment you are deploying to (test, staging, production).
Moreover, you can describe any environment type as code and have it created before deploying the data.
The following articles describe each case:
Deployment Guide
Jelastic Packaging Standard for CI/CD Automation
If you would like to handle database backups, check this article:
Scheduling Database Backups
An additional FTP add-on can make it easier to copy data for each instance:
FTP/FTPS Support in Jelastic
We would like to use NiFi Registry with Git as the storage engine. To do that, I modified providers.xml and was able to save the flows there.
Challenges:
There is no two-way sync. We can only save flows modified by a NiFi user; if we modify a flow directly in the Git location, the change is not reflected in NiFi Registry.
There is no review or approval process for NiFi Registry. A user has to log in to the NiFi Registry server, create a branch and issue a pull request.
As a workaround, we can delete the database file (H2) and restart NiFi Registry.
Lastly, everything should be automated in CI/CD, as we do for a regular Maven project.
Any suggestions?
The purpose of the Git storage is mostly to let users visualize the differences through tools like GitHub, or any other tool that supports diffs; by pushing to a remote you also get a remote backup of the flow content. It is not meant to be modified outside of the application, just as you wouldn't bypass an application, go straight into its database and start changing data.
I'm using GitFlowPersistenceProvider in NiFi Registry 0.3. Today I created another NiFi Registry and wanted to load all flows from the previous one using the same provider. Unfortunately nothing happens: neither buckets nor flows are recreated. I tried creating all the buckets manually, but even then no flows are imported.
GitFlowPersistenceProvider documentation states:
When NiFi Registry starts, this provider reads through Git commit
histories and lookup these bucket.yml files to restore Buckets and
Flows for each snapshot version.
What should I do to load existing flows into new NiFi Registry using GitFlowPersistenceProvider?
Unfortunately that documentation is not totally accurate. Currently there are two pieces: a metadata DB, which defaults to an embedded H2 but can also be Postgres, and the flow storage. You would need to restore both in order to spin up a new instance with the same data.
In the next release there is a new feature where if you start a new instance with a completely empty DB (i.e. no buckets) and the git flow provider, then it will restore everything.
You can do the same by stopping NiFi Registry 0.4.0, deleting the database file (if any) and then starting NiFi Registry again so that it rebuilds the database from the Git repo.
https://issues.apache.org/jira/browse/NIFIREG-209
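If you script that reset, it may be safer to move the metadata database aside than to delete it outright. A minimal sketch in Python follows; the directory layout and the function name are hypothetical, so check where your own nifi-registry.properties points the database before using anything like it. NiFi Registry must be stopped first.

```python
import shutil
import time
from pathlib import Path


def move_metadata_db_aside(db_dir, backup_root):
    """Move every entry in db_dir into a new timestamped backup folder.

    Returns the backup folder. Moving instead of deleting keeps a way back
    if the rebuild from Git does not produce what you expect.
    """
    db_dir = Path(db_dir)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    backup = Path(backup_root) / f"registry-db-{stamp}"
    backup.mkdir(parents=True, exist_ok=False)
    for item in sorted(db_dir.iterdir()):
        shutil.move(str(item), str(backup / item.name))
    return backup


# Example call (paths are assumptions, not NiFi Registry defaults):
# move_metadata_db_aside("/opt/nifi-registry/database", "/var/backups")
```

After the move, start NiFi Registry again; with an empty database and the Git provider configured, it should rebuild buckets and flows as described above.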
We are using templates to package up some data transfer jobs between two nifi clusters, one acting as a sender, the other as the receiver. One of our jobs contains a remote process group and all worked fine at the point the template was created.
However, when we deploy the template through our environments (dev, test, pre, prod), it is tedious and annoying to have to manually delete and recreate a remote process group in the user interface. I'd like to automate this to simplify deploying templates and reduce the manual intervention.
(1) Is it possible to update a remote process group and its port configuration through the REST API?
(2) Do I just use the REST API to create a new RPG with the correct configuration?
Does anyone have any experience with this?
There is a JIRA to address this issue [1], which will be worked on in conjunction with some of the ongoing Flow Registry (SDLC for flows) efforts. Until then, the best option would be (2) above.
[1] https://issues.apache.org/jira/browse/NIFI-4526
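Creating the RPG through the REST API could look roughly like the following. This is a minimal sketch in Python, assuming the endpoint `POST /nifi-api/process-groups/{id}/remote-process-groups` and a `targetUris` field in the component, which is my understanding of the API but should be checked against the REST documentation for your NiFi version. The host names, IDs and function name are hypothetical, and authentication is omitted. The request is only built here, not sent.

```python
import json
import urllib.request


def build_create_rpg_request(base_url, parent_group_id, target_uris, x=0.0, y=0.0):
    """Build (but do not send) a POST that creates a remote process group.

    target_uris is the URL of the receiving NiFi; x and y set the position
    of the new component on the canvas.
    """
    payload = {
        # A new component starts at revision version 0.
        "revision": {"version": 0},
        "component": {
            "targetUris": target_uris,
            "position": {"x": x, "y": y},
        },
    }
    url = f"{base_url}/nifi-api/process-groups/{parent_group_id}/remote-process-groups"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical values, for illustration only.
rpg_request = build_create_rpg_request(
    "https://sender-nifi.example.com:8443",
    "parent-pg-id",
    "https://receiver-nifi.example.com:8443/nifi",
    x=100.0,
    y=200.0,
)
```

Keeping the target URL as a parameter lets each environment (dev, test, pre, prod) pass its own receiver address instead of editing the RPG by hand.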
I have tried all the basics of Kubernetes, and if you want to update your application you can use kubectl rolling-update to update the pods one by one without downtime. Now I have read the Kubernetes documentation again and found a new feature called Deployment in version v1beta1. I am confused, since there is this line in the Deployment docs:
Next time we want to update pods, we can just update the deployment again.
Isn't this the role of rolling-update? Any input would be very useful.
A Deployment is an object that lets you define a deployment declaratively.
It encapsulates:
a DeploymentStatus object, which reports the observed number of replicas and their state.
a DeploymentSpec object, which holds the number of replicas, the templateSpec, the selectors, and some other data that deals with deployment behaviour.
You can get a glimpse of actual code here:
https://github.com/kubernetes/kubernetes/blob/5516b8684f69bbe9f4688b892194864c6b6d7c08/pkg/apis/extensions/v1beta1/types.go#L223-L253
You will mostly use Deployments to deploy services/applications, in a declarative manner.
If you want to modify your deployment, update the yaml/json you used without changing the metadata.
In contrast, kubectl rolling-update isn't declarative, no yaml/json involved, and needs an existing replication controller.
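To make "update the yaml/json without changing the metadata" concrete, here is a small sketch in Python that generates the JSON for a Deployment. It assumes the current `apps/v1` API group rather than the `extensions/v1beta1` one discussed above, and the application name and image tags are made up.

```python
import json


def deployment_manifest(name, image, replicas):
    """Return a Deployment manifest as a plain dict (apps/v1 assumed)."""
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": {"app": name}},
            "strategy": {"type": "RollingUpdate"},
            "template": {
                "metadata": {"labels": {"app": name}},
                "spec": {"containers": [{"name": name, "image": image}]},
            },
        },
    }


# Same metadata, different image: applying v2 over v1 is what triggers
# the server-side rolling update.
v1 = deployment_manifest("web", "example/web:1.0", 3)
v2 = deployment_manifest("web", "example/web:1.1", 3)

print(json.dumps(v2, indent=2))
```

Saving the printed JSON to a file and running `kubectl apply -f` on it, first for v1 and later for v2, is the declarative equivalent of a rolling update.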
I have been testing rolling updates of a service using both a replication controller and a declarative Deployment object. With the rc, there appears to be no downtime from a client perspective. But when the Deployment is doing a rolling update, the client gets some errors for a while until the update stabilizes.
This is with Kubernetes 1.2.1.
The main difference is that "kubectl rolling-update" is a client-driven rolling update, whereas the Deployment object gives you a server-side rolling update.