NIFI - Dev to Test to Prod - apache-nifi

We are struggling with trying to figure out the best approach for updating processor configurations as a flow progresses through the dev, test, and prod stages. We would really like to avoid manipulating host, port, etc. references in the processors when the flow is deployed to the specific environment. At least in our case, we will have different hosts for things like ElasticSearch, PostGres, etc. How have others handled this?
Things we have considered:
Pull the config from a properties file using expression language. This is great for processors that have EL enabled, but not the case for those where it isn't.
Manipulate the flow xml and overwrite the host, port, etc. configurations. A bit concerned about inadvertently corrupting the xml and how portable this will be across NIFI versions.
Any tips or suggestions would be greatly appreciated. There is a good chance that there is an obvious solution we have neglected to consider.
EDIT:
We are going with the templates that Byran suggested. They will definitely meet our needs and appear to be a good way for us to control configurations across numerous environments.
https://github.com/aperepel/nifi-api-deploy

This discussion comes up frequently, and there is definitely room for improvement here...
You are correct that currently one approach is to extract environment related property values into the bootstrap.conf, and then reference them through expression language so the flow.xml.gz can be moved from one environment to the other. As you mentioned this only works well with properties that support expression language.
In order to make this easier in the future, there is a feature proposal for an idea called a Variable Registry:
https://cwiki.apache.org/confluence/display/NIFI/Variable+Registry
An interesting approach you may want to look at is using templates. There is a GitHub project that can be used to help with this:
https://github.com/aperepel/nifi-api-deploy

You can loook at this post automating NIFI template deployment
For automating NIFI template deployment, there is a tool that works well : https://github.com/hermannpencole/nifi-config
Prepare your nifi development
Create a template on nifi
and download it
Extrac a sample configuration with the tools
Deploy it on production
undeploy the old version with the tools
deploy the template with the tools
update the production configuration with the tools

Related

Managing configuration files for multiple instances of the same application (same environment)

I have multiple instances of the same engine running as windows services on the same environment and system that just have slightly different connection strings as they point to different queues. Other than a couple of lines in the conifg (XML) the rest of the application is exactly the same (config and binaries). When config changes are made this is done to all instances which is time consuming so I am doing some research into the best method of managing the config files in a scalable and version controlled way. Currently I use a batchfile to copy the default engine directory and config over and then find and replace the individual strings. I'd prefer to have a template config that can be updated that pulls in set variables for the connection strings depending on the instance and environment. I understand that this may be possible using chef, puppet or ansible but to my understanding these are more for system configuration as opposed to individual application files? Does anyone know if this is possible with gitlab or AWS? Before committing to the learning curve I'm trying to discern if one of the aforementioned config management tools would be overkill for this scenario or a realistic solution?
I understand that this may be possible using chef, puppet or ansible but to my understanding these are more for system configuration as opposed to individual application files?
Managing individual files, including details of their contents, is a common facet of configuration management. Chef, Puppet, and Ansible can all do this with relative ease.
Does anyone know if this is possible with gitlab or AWS?
No doubt, someone does. And I anticipate, but cannot confirm, that the answer is "yes" for both.
Before committing to the learning curve I'm trying to discern if one of the aforementioned config management tools would be overkill for this scenario or a realistic solution?
A configuration management system would almost certainly be overkill if the particular task you describe is the only thing you are considering them for.
Currently I use a batchfile to copy the default engine directory and
config over and then find and replace the individual strings. I'd
prefer to have a template config that can be updated that pulls in set
variables for the connection strings depending on the instance and
environment.
In the first place, if it ain't broke, don't fix it. On the other hand, if it is broke, and switching to a template-based approach is a reasonable method to resolve the issue, then you can certainly implement that with a for-purpose local script without bringing in all the apparatus of a configuration management system.
In the event that you do decide that the current mechanism needs to be replaced, do, for goodness sake, ditch batchfile. It's one of the worst scripting languages ever inflicted on humanity. PowerShell would be a natural replacement on Windows, but you might also consider Python, or pretty much any programming language you know.

How do I change an attribute for a policy group without having to re-compile all of the policyfiles?

I am trying to shift from environment files in Chef to using Policyfiles. https://docs.chef.io/policy.html. I really like the concept, especially since you can include a policy from policy into another, but I am trying to understand how do a simple attribute change.
For instance, if I want to change a globally-used attribute that may be an error message for a problem that is happening now. ("The system will be down for 10 minutes. Thanks for your patience"). Or perhaps I want to turn off some AB testing with an attribute working as a flag. From what I can tell, the only way I can do this is to change an attribute in the policyfile, and then I need to create a new version of the policy file.
And if the policyfile is an included in many other policyfiles, like in the case of a base policyfile, then I have a lot of work to do for a simple change.
default['production']['maintenance_message'] = 'We will be down for the next 15 minutes!'
default['production']['start_new_feature'] = true
How do I make a simple change to an attribute that affects an entire policy group? Is there a simple way to change an attribute, or do I have to move all my environment properties to a data bag??
OK, I used Chef Support and got an answer: Nope.
This is their response:
"You've called out one of the main reasons we recommend that people use something like Jenkins pipelines to deliver cookbooks to their infrastructure. All that work can be kicked off by a build system recognizing a change in a dependency and initiating new builds for all the downstream consumer jobs.
For what it's worth, I don't really like putting application configurations like that maintenance message example you used in configuration management, I think something like Habitat is a better system for that kind of rapid-change configuration delivery, although you could also go down the route of storing application configuration like that in a system like Hashicorp Vault, Consul, or etcd, and ensuring that whatever apps need to ingest those changes are able to do so without configuration management fighting with the key-value config store.
If that was just an example to illustrate things, ignore the previous comment and refer only to my recommendation to use pipelines to deliver cookbooks, attributes, etc to your infrastructure (and I would generally recommend against data bags these days, but that's mostly a preference thing)."

Apache Nifi - About .Nar File Changes and Nifi Restarting

I want suggestions for my application:
I have Multitenancy in Nifi. For each Process group, I have different Tenants/Users.
For any changes in one Tenant/user like in his custom processor(.nar file will create), we need to copy-paste that .nar file into lib folder and again restart the nifi. But due to this full Nifi server has restarted because of that Each Tenant/User and Processes group get restarted.
So, Please give Some Suggestions So that we can restart only one Tenant/user or process group Or Without Restart Nifi .nar file will reflect?
NiFi does not currently have the kind of warm restart option that you describe, however a lot of the base functionality needed to support it is in the code base and the concept is on the community roadmap.
Some options that might help you today:
Consider segregating the tenants with a high rate of code change into separate development environments. You could possibly leverage the Docker builds to provide flexibility and easy automation. You could then promote the end-of-day versions of your Nars into the 'Production' cluster each night, hopefully without disturbing users.
Consider utilising the NiFi Site-to-Site capability to have linked NiFi environments instead of a single shared one. Processors that change regularly could be called out to and updated in their own schedule
Consider why you are changing processor code so regularly, there may be a better approach than hard coding logic and parameters into the processors - the variable registry, various controller services, flow registry, etc. all provide a very rich featureset.

Spring Profile conditional prop files multiple environment

I have a question, with Spring Profiles. I understand the reason for not using maven profiles because each environment would require another artifact. That makes since. I modified my code to use Spring Profile but the problem I have with Spring Profiles is that it requires you to have a database.property file for each environment, on the server. I have this setup, same setup everyone has seen a hundred times.
src
- main
- resources
-conf
myapp.properties
-env
dev.db.properties
test.db.properties
prod.db.properties
The problem I think with this setup is that, each server would have all the files in the env dir (i.e. dev would have prod.db.properties and test.db.properties files on its server). Is there a way to only copy the files that are needed during the build of maven without using profiles? I haven't been able to figure out a way. If that is the case, then this would seem like a reason to use maven profiles. I may have missed something. Any thoughts would be greatly appreciated.
This seems like a chicken and egg problem to me. If you want your artifact to work on all these 3 environments you need to ship the 3 configurations. Not doing so would lead to the same issue you mentioned originally. It's generally a bad practice to build an artifact with certain coordinates differently according to a profile.
If you do not want to ship the configuration in the artifact itself, you could externalize the definition either through the use of system property or by locating a properties file at a defined place (that you could override for convenience).
You should first point out what your application really is: If you are running "an application in different environments" or if you are running "different applications in their own propritary environments". This are two slightly different concepts:
If you are running an application in different environments its better to put all property files into your jar. Bring it into mind by imagine that you buy a new SUV; you first drive it on a test track, then after on ordinary highways before going offroad to finally enjoy its offroad capabilities. You always use one and the same car in different environments with all its capabilities and driving characteristics. In each environment the car adapts its behaviour and driving characteristics. If you use one application to drive it through different environment, so use the first approach to build all environment-characteristics into one jar.
On the other hand you can also use slightly different cars in different environments. So if you need different cars with its own special driving characteristics for different environments, maybe 4WD or special flood ligths because you are driving by night, you should take the second approach. Back to application: If you need different applications with far different characteristics in different production environments its better to build each application only with the properties it really needs.
Finally you can also merge the two approaches:
my-fun-application-foo.jar for customer foo with properties for test, integration and production environment.
my-fun-application-2047.jar for customer 2047 with properties for test, pre-integration, integration, pre-production and production environment.
Now you should get also an understanding why you shouldn't using profiles for building an application with different flavours.

What parts of application you prefer to be externalized as configuration and why?

What parts of your application are not coded?
I think one of the most obvious examples would be DB credentials - it's considered bad to have them hard coded. And in most of situations it is easy to decide if you want something to be externalized or coded.For me the rules are simple. Some part of the application should be externalized if:
it can and should be changed by non-developer, but not so often to be included in application settings defined in UI (DB credentials, service URLs, etc)
it does not require programming language and seems unnatural being coded (localization)
Do you have anything to add?
This is a little related to this question about spring cfg.
Spring configuration seems less obvious example for me, because in my practice it is never modified by anyone except the developer. And the road of externalizing can take you far away, to the entire project being "configured", not coded - so where to stop?
So please post here some examples from your experience, when you got benefit from having something configured, not coded - like dependency injection configuration in spring, etc.
And if you use spring - how often is configuration changed without recompiling?
Anything that needs to differ between different deployments of your application. That is, anything specific to the environment.
Examples include:
Database connection strings
URLs for web or WCF services
Logging configuration
Any information your application uses that is "data" and that could change depending on where it is installed. Things like:
smtp mail server used to send e-mails
Database connect strings
Paths to file locations / folders used by the app
FTP servers & connect info
Active Directory servers used for authentication
Any links displayed in the application to external information
sources
Warning limit values
I've even put the RegEx filters used to limit the allowable characters
for data entry fields.
Besides the obvious changing stuff (paths, servers, ports, and so on), some people argue that you should be able to easily change whatever might reasonably change, for instance, say you have a generic engine which operates on the business logic (a rule engine).
You would then define the rules on a "configuration file" which ends up being is no less than programming in a DSL instead of in the generic purpose language. Benefits being it's closer to the domain so it's easier and more maintainable, and that you can easily change things that otherwise would demand a new build.
The main argument behind this is that things you assumed would never change always end up changing nonetheless, so you better be prepared.
paths and server names/addresses come to mind..
I agree with your two conditions, which is why I:
Rarely include a config file as part of a Windows or Windows Mobile application (web apps yes).
If I did include a config file meant to be tweaked by end users, it certainly wouldn't be XML.
Employee emails/names since employees can come and go... (you should typically try to keep them out of an application though)
Configuration files should include:
deployment details
DB credentials
file paths
host names
anything that is used in many places but that may change
contact email addresses
options that aren't in the GUI
The last one is a bit open-ended, but very important. I've found it very useful to foresee variables that the client may, in the future, want to change. If changes are infrequent, I or they can edit the config file. If it becomes a frequent thing, it's trivial to add the option to the GUI, which isn't hardcoded.
I would also add encryption keys (which themselves should be encrypted)...
Basically the rule of thumb is information the application needs BEFORE it's regular, functional operation, data that it MUST have on-hand (i.e. local and not networked).
Note that this data should not be dynamically changing or large amounts of it, otherwise it should be in the database.
With Spring apps I actually distinguish between two types of configuration:
Items externalized into property files which are "deploy time" concerns or "environment-specific": server IP's / addresses, file system locations, etc etc
Spring XML configuration which can do lots of things, like indicate the overall application structure, apply behavior via AOP, etc.
I use Spring to wire all the beans in a J2SE application that has no GUI (a transactional switch). That way it's very easy for me to have different configurations in each deployment (we have this thing running in different countries), without having to code anything different.
Another thing I like to have is to manage all the SQL statements separately from the code, when I use plain JDBC (or Spring JDBC). Like in a properties file or XML or something, sometimes even as String properties in the beans that will use the statement (when there is only one bean that will use the statement, such as a DAO).
I am going to use spring JDBC or vanilla JDBC for data persistence, here we have decided to externalize all the SQL from the Java code, so can be better mangable in terms of SQL query tuning and optimization, we don't need to disturb the java code.

Resources