I'm using GitFlowPersistenceProvider in NiFi Registry 0.3. Today I created another NiFi Registry instance and wanted to load all flows from the previous one using the same provider. Unfortunately nothing happens: neither buckets nor flows are recreated. I tried creating all the buckets manually, but even then no flows are imported.
GitFlowPersistenceProvider documentation states:
When NiFi Registry starts, this provider reads through Git commit
histories and lookup these bucket.yml files to restore Buckets and
Flows for each snapshot version.
What should I do to load existing flows into new NiFi Registry using GitFlowPersistenceProvider?
Unfortunately that documentation is not totally accurate. Currently there is a metadata DB, which defaults to an embedded H2 but can also be Postgres, and then the flow storage. You would need to restore both in order to spin up a new instance with the same data.
In the next release there is a new feature: if you start a new instance with a completely empty DB (i.e. no buckets) and the git flow provider, it will restore everything.
You can do the same by stopping NiFi Registry 0.4.0, deleting the database file (if any), and then starting the registry to rebuild the database from the git repo.
https://issues.apache.org/jira/browse/NIFIREG-209
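For reference, here is a minimal sketch of that stop/delete/start workaround as a script. The install path, the bin/nifi-registry.sh script location, and the H2 file name below are typical defaults but are assumptions; adjust them for your environment.

```python
# Minimal sketch of the 0.4.0 rebuild workaround; all paths are assumptions.
import subprocess
from pathlib import Path

REGISTRY_HOME = Path("/opt/nifi-registry")  # assumption: adjust to your install
H2_DB = REGISTRY_HOME / "database" / "nifi-registry-primary.mv.db"  # default H2 file name (verify for your version)

# 1. Stop the registry.
subprocess.run([str(REGISTRY_HOME / "bin" / "nifi-registry.sh"), "stop"], check=True)

# 2. Delete the metadata DB (the git repo keeps the flow content, so it can be rebuilt).
if H2_DB.exists():
    H2_DB.unlink()

# 3. Start it back up; on startup it rebuilds buckets and flows from the git repo.
subprocess.run([str(REGISTRY_HOME / "bin" / "nifi-registry.sh"), "start"], check=True)
```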
I have a NiFi flow that keeps some state with the ListS3 processor.
I have a dev instance and a prod instance.
I want some options for deploying from dev to prod where the state is kept and where I don't have to manually go in and change all the processors and process groups.
It seems like this can't be done with templates, based on the following Stack Overflow question:
how does NIFI listfile maintains its timestamp?
Edit: just so there is no misunderstanding, I want to keep prod state when deploying.
It sounds like you aren't using NiFi Registry, so you're downloading a flow template and then importing it. This can't preserve state, as it's not the same flow.
You should be using NiFi Registry to version control your flows, which supports this Dev -> Prod workflow.
Build your flow in Dev NiFi, version to Registry.
In prod, add a new Process Group and select the Import option when it asks you for a name. You'll be able to pick your versioned flow.
Run your flow so that it stores some state. View the processor's state to verify.
Now update the flow in Dev, and commit the local change to Registry.
Then, update the flow in Prod to the latest version from Registry. It will preserve state on the stateful processor. (A quick way to confirm what Registry holds is sketched below.)
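If you want to sanity-check what Registry actually holds before importing in Prod, here is a hedged sketch against the NiFi Registry REST API; the base URL and port are assumptions for a default unsecured install.

```python
# List buckets and versioned flows over the Registry REST API to confirm
# the commit from Dev is visible. Adjust host/port and add auth as needed.
import requests

REGISTRY = "http://localhost:18080/nifi-registry-api"  # assumption

for bucket in requests.get(f"{REGISTRY}/buckets").json():
    flows = requests.get(f"{REGISTRY}/buckets/{bucket['identifier']}/flows").json()
    for flow in flows:
        print(bucket["name"], "/", flow["name"], "- versions:", flow.get("versionCount"))
```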
For detailed steps on installing & using Registry, see these links:
https://nifi.apache.org/docs/nifi-registry-docs/html/getting-started.html
https://pierrevillard.com/2018/04/09/automate-workflow-deployment-in-apache-nifi-with-the-nifi-registry/
https://alasdairb.com/2021/03/22/nifi-in-production-nifi-registry/
https://docs.cloudera.com/HDPDocuments/HDF3/HDF-3.2.0/versioning-a-dataflow/content/connecting-to-a-nifi-registry.html
https://docs.cloudera.com/HDPDocuments/HDF3/HDF-3.4.0/getting-started-with-nifi-registry/content/import-a-versioned-flow.html
https://docs.cloudera.com/HDPDocuments/HDF3/HDF-3.4.0/getting-started-with-nifi-registry/content/save-changes-to-a-versioned-flow.html
https://docs.cloudera.com/HDPDocuments/HDF3/HDF-3.4.0/getting-started-with-nifi-registry/content/start-version-control-on-a-process-group.html
We would like to use NiFi Registry with git as the storage engine. I modified providers.xml accordingly and was able to save the flows there.
Challenges:
There is no two-way sync. We can only save the flows modified by a NiFi user; if we modify a flow directly in the git location, it will not be reflected in NiFi Registry.
There is no review or approval process for NiFi Registry. A user has to log in to the nifi-registry server, create a branch, and issue a pull request.
As a workaround, we can delete the database file (H2) and restart NiFi Registry.
Lastly, everything should be automated in CI/CD, like what we do for a regular Maven project.
Any suggestions ?
The purpose of the git storage is mostly to let users visualize the differences through tools like GitHub, or any other tool that supports diffs; by pushing to a remote you also get a remote backup of the flow content. It is not meant to be modified outside of the application, just like you wouldn't bypass an application, go straight into its database, and start changing data.
We are using templates to package up some data transfer jobs between two NiFi clusters, one acting as the sender, the other as the receiver. One of our jobs contains a remote process group, and all worked fine at the point the template was created.
However, when we deploy the template through our environments (dev, test, pre, prod), it is tedious and annoying to have to manually delete and recreate the remote process group in the user interface. I'd like to automate this to simplify deploying templates and reduce the manual intervention.
Is it possible to update a remote process group and its port configuration through the REST API?
Or do I just use the REST API to create a new RPG with the correct configuration?
Does anyone have any experience with this?
There is a JIRA to address this issue [1], which will be worked on in conjunction with some of the ongoing Flow Registry (SDLC for flows) efforts. Until then, the best option would be the second one above: use the REST API to create a new RPG with the correct configuration.
[1] https://issues.apache.org/jira/browse/NIFI-4526
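Until that JIRA lands, here is a hedged sketch of that second option using Python's requests against the NiFi 1.x REST API. The endpoint is documented, but payload field names (e.g. targetUri) have shifted between NiFi versions, and the URL, parent group id, and target are assumptions, so verify against the REST API docs for your release.

```python
# Create a fresh remote process group via the NiFi REST API.
import requests

NIFI_URL = "http://localhost:8080/nifi-api"  # assumption
PG_ID = "root"                               # parent process group id; "root" is the root-group alias

payload = {
    "revision": {"version": 0},              # new components start at revision 0
    "component": {
        "targetUri": "https://receiver-host:8443/nifi",  # remote cluster URL (assumption)
        "position": {"x": 100.0, "y": 100.0},
    },
}
resp = requests.post(f"{NIFI_URL}/process-groups/{PG_ID}/remote-process-groups", json=payload)
resp.raise_for_status()
rpg = resp.json()
print("Created RPG", rpg["id"], "targeting", rpg["component"]["targetUri"])
```

You'd still need to connect the RPG's input/output ports into your flow and enable transmission afterwards.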
Is it possible in Azure Service Fabric to run code when a (stateful) microservice is upgraded?
The case I have in mind is state migration. Between one version of a service and the next you may want to update persisted state to a new format. Or maybe delete state that is no longer relevant for the next version of the service.
You could try storing the current version in persistent storage. On service startup, detect the current code package version from the service context and compare it with the stored version.
If it doesn't match, take the necessary steps for data migration and then update the stored version... rinse and repeat.
I'm not aware of any 'native' way to make this work... The service context has a CodePackageModifiedEvent, but I'm not quite sure what that's supposed to do (or when it's triggered).
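Service Fabric services are C#/.NET, but the check-and-migrate pattern itself is language-agnostic. Here is a minimal sketch of it in Python, where the version file and the migrate_state helper are purely hypothetical stand-ins for the service's reliable-state equivalents.

```python
# Sketch of the version-check-on-startup pattern described above.
import json
from pathlib import Path

STATE_DIR = Path("./state")                     # stands in for the service's persisted state
VERSION_FILE = STATE_DIR / "data-version.json"  # hypothetical version marker

CURRENT_CODE_VERSION = "2.0.0"  # in Service Fabric you'd read this from the service context

def migrate_state(from_version: str, to_version: str) -> None:
    """Hypothetical migration: upgrade or delete persisted state as needed."""
    print(f"Migrating persisted state {from_version} -> {to_version}")

def ensure_state_version() -> None:
    STATE_DIR.mkdir(exist_ok=True)
    stored = "0.0.0"
    if VERSION_FILE.exists():
        stored = json.loads(VERSION_FILE.read_text())["version"]
    if stored != CURRENT_CODE_VERSION:
        migrate_state(stored, CURRENT_CODE_VERSION)  # rinse and repeat per version
        VERSION_FILE.write_text(json.dumps({"version": CURRENT_CODE_VERSION}))

ensure_state_version()  # call this on service startup, before serving requests
```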
I've got Solr running as a service on Windows. I used NSSM (http://nssm.cc/) to set up the service to automatically start. The web server is Jetty.
I'd like to have my Solr directory under source control in Git because the configuration changes (and sometimes plugin changes) need to be picked up by all team members. At the very least, I'd like to have the configuration files (solrconfig.xml, schema.xml, stopwords.txt, etc.) under Git control, but ideally, I'd like to put the whole solr directory (including jar and war files) under Git control. Will this pose any problems? I can foresee us pulling commits and switching branches, all while the Solr service is running.
How have other teams configured Solr under source control?
The rule I go by is to check in configuration files (solrconfig.xml, stopwords.txt, dataconfig.xml, etc.).
There are reasons, IMHO, not to check the entire Solr directory into source control:
The Solr directory contains the index data as well as the configuration. It's a bad idea to check in the index, because:
size of the repo will grow
your index isn't a data source. In most cases, it relies on an external source such as an RDBMS to refresh itself. There's a huge risk to data integrity when your database goes out of sync with your Solr index.
Only on the development box do we have Solr and the consuming app deployed on the same machine; otherwise, setting up Solr is independent of the application deploy. Checking the Solr directory into source control would mean unnecessarily big repositories to deploy.
Rather than checking in the whole repository, we ended up checking in the config files plus basic scripts to set up Solr, create the index, start an instance, etc. So every team member could check out the code base, run a couple of build tasks, and get ready to party :)
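As a rough illustration of what those scripts look like (every path and the service name here are assumptions for a Windows/NSSM setup like the one in the question):

```python
# Copy the version-controlled config into the live core, then (re)start Solr.
import shutil
import subprocess
from pathlib import Path

REPO_CONF = Path("./solr-conf")                            # checked-in configs (assumption)
SOLR_CORE_CONF = Path(r"C:\solr\server\solr\mycore\conf")  # live core conf dir (assumption)
SERVICE_NAME = "Solr"                                      # NSSM service name (assumption)

# Stop the service so Jetty isn't holding files while we swap config.
subprocess.run(["nssm", "stop", SERVICE_NAME], check=True)

# Sync the tracked config files (solrconfig.xml, schema.xml, stopwords.txt, ...).
for f in REPO_CONF.glob("*"):
    shutil.copy2(f, SOLR_CORE_CONF / f.name)

subprocess.run(["nssm", "start", SERVICE_NAME], check=True)
```

Stopping the service before swapping config also sidesteps the hazard raised in the question: pulling commits or switching branches while the running Jetty process holds the files.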