Compare two NiFi instances in different environments - apache-nifi

I want to compare two different environments (Prod, Dev). Our flows are nested about five process groups deep:
NiFi Home --> First --> Second ...
What would be a good approach to see the differences, other than walking through layer by layer?

I'm not aware of anything that can do exactly what you are asking. The closest thing would be using NiFi Registry with versioned flows: you start a flow in dev, save it to the registry, and import it into prod; then you can see any changes made locally to either instance, per process group.

Agree with #Bryuan. If you don't use NiFi Registry, you can compare the flow.xml files with traditional diff software. Each processor gets a unique UUID that you can use to compare content.
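A minimal sketch of that idea in Go, in case a plain diff is too noisy: it pulls every processor out of two flow.xml exports (gunzip conf/flow.xml.gz first) and matches them by UUID. It assumes each <processor> element has <id>, <name> and <class> children, which can vary between NiFi versions, so treat it as a starting point rather than a finished tool:

// flowdiff.go: compare the processors in two NiFi flow.xml exports by UUID.
package main

import (
	"encoding/xml"
	"fmt"
	"os"
)

// processor holds the fields we care about; adjust the tags if your
// NiFi version lays the file out differently (this is an assumption).
type processor struct {
	ID    string `xml:"id"`
	Name  string `xml:"name"`
	Class string `xml:"class"`
}

// loadProcessors walks the whole document and collects every <processor>
// element, no matter how deeply its process group is nested.
func loadProcessors(path string) (map[string]processor, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()

	procs := map[string]processor{}
	dec := xml.NewDecoder(f)
	for {
		tok, err := dec.Token()
		if err != nil {
			break // io.EOF or a parse error; good enough for a sketch
		}
		if se, ok := tok.(xml.StartElement); ok && se.Name.Local == "processor" {
			var p processor
			if err := dec.DecodeElement(&p, &se); err == nil {
				procs[p.ID] = p
			}
		}
	}
	return procs, nil
}

func main() {
	dev, _ := loadProcessors("dev-flow.xml") // placeholder file names
	prod, _ := loadProcessors("prod-flow.xml")

	for id, p := range dev {
		if _, ok := prod[id]; !ok {
			fmt.Printf("only in dev:  %s (%s) %s\n", p.Name, p.Class, id)
		}
	}
	for id, p := range prod {
		if _, ok := dev[id]; !ok {
			fmt.Printf("only in prod: %s (%s) %s\n", p.Name, p.Class, id)
		}
	}
}

Extending the struct with each processor's <property> entries would let you diff individual settings as well, not just presence or absence.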

Related

NiFi Controller Service Reuse and Schema Registry Architecture

When creating Apache NiFi controller services, I'm interested in hearing when it makes sense to create new ones and when to reuse existing ones.
Currently I have a CsvReader and CSVRecordSetWriter at the root process group, and they are reused heavily in child process groups. I have tried to set them up to be as dynamic and flexible as possible to cover the widest range of use cases. I am currently setting the Schema Text property in each like this:
Reader Schema Text: ${avro.schema:notNull():ifElse(${avro.schema}, ${avro.schema.reader})}
Writer Schema Text: ${avro.schema:notNull():ifElse(${avro.schema}, ${avro.schema.writer})}
A very common pattern I have is to map files with different fields from different sources into a common format (common schema). So one thought is to use the ConvertRecord or UpdateRecord processors with the avro.schema.reader and avro.schema.writer attributes set to the input and output schemas. Then I would have the writer always set the avro.schema attribute, so any time I read records again further along in the flow it would default to using avro.schema. It feels dirty to leave the reader and writer schema attributes hanging around. Is there a better way from an architecture standpoint? Why have tons of controller services hanging around at different levels? Aside from some settings that may need to be different for some use cases, am I missing anything?
I'm also curious to hear how others organize their schemas. I don't have a need to reuse them in disparate locations across different processor blocks or to reference different versions, so it seems like a waste to centralize them or to maintain a schema registry server that will also require upgrades and maintenance when I can just use AvroSchemaRegistry.
In the end, I decided it made more sense to split the controller service into two: one for conversions from Schema A to Schema B, and another that uses the same avro.schema attribute that normal/default readers and writers use. This allows explicitly choosing the right pattern when configuring a processor block, rather than relying on the implicit configuration of a single shared service. Plus you get the added benefit of not stopping all flows (just a subset) when you only need to tweak settings for one of the two patterns.

Multiple GPUComputationRenderer instances

I have multiple GPUComputationRenderer instances. I don't use them at the same time; I update one, then update a different one. For some reason they seem to affect each other. I don't have a simple working example; it's part of a complicated project I'm working on. I notice it when I set the two GPUComputationRenderer instances to different sizes. Are they somehow sharing some resource in the background?

How should we design database when working with multiple version of same service

For a microservice-based product, we want to provide backward compatibility.
This means we will have multiple versions of the same service running at the same time.
Problem: when a new version is created, there are changes to the database tables; a few columns are added and a few are altered. If the database is shared between the services, this will impact the older services. What is the best way to handle this?
Can we have database tables with versions?
One known way is to have a different database for each service, which we want to avoid.
You should never be in this situation. If columns are added, you can have a DTO that does not send those newly added columns to older versions. If you have to remove a column, then don't remove it: stop using it in the new APIs. If you need to alter a column, create a new one instead, discard the old one, and let the new API talk to the new column.
Having said that, such changes should be resisted, and if you must make them, you need to make sure you have ways to maintain the sanity of the data. If you stop using an existing column and add a new one, how will you read the data when you look at the whole thing?
What will happen when the new API queries historic data? What will happen when you run a reporting tool on it?
There are many more questions that need to be answered beyond how the API is served and how the services will manage the changes.
Creating a new table can be a solution, but how good or bad it is depends on your use case: what the changes are, what the data means to the service, and its historic significance, i.e. whether you need the older data or can dump it.
I feel this is more of a business decision than a technical one.
As far as backward compatibility is concerned, I try to provide it at the controller level. I try as far as possible to have just one piece of core business logic in my code and to map the older APIs to the newer one, either by providing default values or by doing the required conversions.
I would never want to keep two sets of logic. It takes some effort, but I have been able to find my way. Your case might not be the same as mine, but still try to avoid keeping two tables or two databases for old and new APIs, and try to concentrate the changes needed to manage the old APIs in one place.
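To make the "one core logic, thin mapping at the controller" idea concrete, here is a hedged Go sketch; the service, field names and the default value are invented for the example, and it only illustrates the shape of the approach:

// Hypothetical example: the customer service gained a "region" column in v2.
// Both API versions share one core function and one table; only the thin
// controller layer differs, and v1 fills in a default for the new field.
package main

import (
	"encoding/json"
	"net/http"
)

// Customer is the current (v2) model, matching the current table layout.
type Customer struct {
	ID     string `json:"id"`
	Name   string `json:"name"`
	Region string `json:"region"` // column added in v2
}

// createCustomer is the single piece of core business logic.
func createCustomer(c Customer) error {
	// ... validate and insert into the shared table ...
	return nil
}

func createV2(w http.ResponseWriter, r *http.Request) {
	var c Customer
	if err := json.NewDecoder(r.Body).Decode(&c); err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
	if err := createCustomer(c); err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
	w.WriteHeader(http.StatusCreated)
}

// v1 clients don't know about Region, so the controller supplies a default
// and delegates to the same core logic instead of keeping a second copy of it.
func createV1(w http.ResponseWriter, r *http.Request) {
	var old struct {
		ID   string `json:"id"`
		Name string `json:"name"`
	}
	if err := json.NewDecoder(r.Body).Decode(&old); err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
	c := Customer{ID: old.ID, Name: old.Name, Region: "unknown"} // v1 default
	if err := createCustomer(c); err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
	w.WriteHeader(http.StatusCreated)
}

func main() {
	http.HandleFunc("/v1/customers", createV1)
	http.HandleFunc("/v2/customers", createV2)
	http.ListenAndServe(":8080", nil)
}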
First of all, it's a very good question, and the design is tricky.
You should refer to this answer to get a fair and broad idea.
Can we have database tables with versions?
In my opinion, you can have whatever you want, but this is not recommended because of the kind of complexity it introduces to the system. This is what the answer above concludes too.
What is the best way to handle this?
The way I do it, and have seen it done in a few systems I didn't work on, is that the API is basically treated as a presentation layer, and DB changes that are incompatible with a previous version of the API are avoided.
I.e., let's say there is an API change in a newer version that doesn't require a DB change: no problem, all is well and good, move ahead.
Then let's say there is a new API version that calls for a DB change that will break the existing system / old version. That's not good; try to rework your solution to achieve the same functionality in a way that doesn't break your existing version. If that's not possible (obviously everything is possible!), it's a case of a major product merge & upgrade and needs to be deferred until the old version is discarded.
For this very reason, in the very first attempt, we need to design DB tables & JPA entities to be as complete and as broad as possible, and keep DTOs and entities distinct so that changes are mainly needed on the DTO side and not on the entity side.
Obviously, these are subjective opinions; they will vary on a case-by-case basis and are open to debate.
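As a small, hypothetical Go sketch of the "keep DTOs and entities distinct" advice (names invented for illustration): the entity mirrors the shared table, including a column added for v2, and each API version gets its own DTO built from it, so old clients never see the new column and most version-specific changes stay on the DTO side:

// Hypothetical sketch: one entity per table, one DTO per API version.
package main

import "fmt"

// OrderEntity maps 1:1 to the shared orders table.
type OrderEntity struct {
	ID       string
	Amount   int64
	Currency string // column added in v2
}

// V1 DTO: the old contract, unchanged; it simply omits the new column.
type OrderV1 struct {
	ID     string `json:"id"`
	Amount int64  `json:"amount"`
}

// V2 DTO: exposes the new field.
type OrderV2 struct {
	ID       string `json:"id"`
	Amount   int64  `json:"amount"`
	Currency string `json:"currency"`
}

func toV1(e OrderEntity) OrderV1 { return OrderV1{ID: e.ID, Amount: e.Amount} }
func toV2(e OrderEntity) OrderV2 {
	return OrderV2{ID: e.ID, Amount: e.Amount, Currency: e.Currency}
}

func main() {
	e := OrderEntity{ID: "42", Amount: 1999, Currency: "EUR"}
	fmt.Printf("v1 sees: %+v\nv2 sees: %+v\n", toV1(e), toV2(e))
}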

go-diameter: support for different AVP dictionaries for different network providers (e.g. Ericsson, Nokia) and different nodes (e.g. GGSN, Tango)

We are working on creating a Diameter adapter for an OCS. Currently our AVP dictionary is the one supplied by go-diameter.
We are trying to provide a configurable dictionary to support the following:
Vendor-specific AVPs for different network providers, like Nokia and Ericsson
Support for different network traffic, like VoLTE, GGSN, and Tango
These are the two approaches we are currently considering:
Include a single dictionary with all supported AVPs and have a single release of the Diameter adapter, with the intelligence for identifying which AVPs are required for which node built into the code.
Have a different release for each dictionary we want to support, and deploy whichever one the service provider requires.
I have searched the internet to see if something similar has been done as a proof of concept. I need help identifying which is the better solution to implement.
I'm not familiar with go-diameter, but my suggestion: use one dictionary.
This dictionary should be used for all vendors and providers.
Reasons:
You don't know how many different releases you will end up with, and you might need to support many dictionaries in the end.
If you use several dictionaries, most of the AVPs will be the same in all of them anyway.
The bigger your single dictionary is, the more AVPs you support everywhere, and you are never 100% sure which AVPs might arrive from different clients.
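For what it's worth, a minimal Go sketch of the single-dictionary approach with go-diameter, assuming the fiorix/go-diameter dict package's Default parser and its LoadFile helper (check both against the release you use; the import path and the dictionary file names below are assumptions):

// Sketch: one combined AVP dictionary for all vendors and node types.
package main

import (
	"log"

	"github.com/fiorix/go-diameter/v4/diam/dict" // adjust to your go-diameter version
)

func main() {
	// dict.Default already contains the base AVP definitions shipped with
	// go-diameter; extend it with vendor-specific XML dictionaries.
	vendorDictionaries := []string{
		"dictionaries/nokia_avps.xml",    // placeholder path
		"dictionaries/ericsson_avps.xml", // placeholder path
		"dictionaries/volte_avps.xml",    // placeholder path
	}
	for _, f := range vendorDictionaries {
		if err := dict.Default.LoadFile(f); err != nil {
			log.Fatalf("loading %s: %v", f, err)
		}
	}
	// From here the diam client/server code uses dict.Default as usual, so a
	// single release of the adapter understands AVPs from every provider.
}

Since every AVP definition ends up in one parser, one build of the adapter can decode whatever a Nokia or Ericsson node sends, which matches the reasoning above.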

SSIS Get List of all OLE DB Destinations in Data Flow

I have several SSIS packages that we use to load data from multiple different OLE DB data sources into our DB. Inside each package we have several Data Flow tasks that hold a large number of OLE DB Sources and Destinations. What I'm looking for is a way to get a text output that holds all of the Destination configurations (Sources would be good too, but they're not at the top of my list).
I'm trying to make sure that all my OLE DB Destinations point at the right tables, as I've found a few hiccups, without having to double-click each Data Flow task and check that way; that becomes tedious and is still prone to missing things.
I'm viewing the packages in Visual Studio 2013. Any help is appreciated!
I am not aware of any programmatic way to discover this data, other than building an application to read the XML within the *.dtsx package (a rough sketch of that idea is below). Best advice: pack a lunch and have at it. I know for sure that there is nothing built in with respect to viewing and setting database tables (only server connections).
Though, a suggestion I may add once you have determined the list: create variables to store the unique connection strings and then set those connection strings inside the source/destination components. This will make things easier to manage going forward. In fact, you can take it one step further by setting the same values as parameters instead of variables, which has the added benefit of exposing them on the server. This allows either you or the DBA to set the values as you promote through environments or change server nodes.
Also, I recommend rationalizing that solution into smaller solutions if possible. In my opinion, there is nothing worse than one giant solution that tries to do it all. I am not sure if any of this is helpful, but for what it's worth, I do hope it helps.
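If you do go the read-the-XML route, here is a rough Go sketch of the idea. It assumes the package format used by SSIS 2012+ (which Visual Studio 2013 targets), where OLE DB Destinations carry a componentClassID of Microsoft.OLEDBDestination and the target table sits in the OpenRowset property; older package formats use GUID class IDs, so verify these names against one of your own packages first:

// dtsxscan.go: list OLE DB Destination components and their target tables
// from a .dtsx package by scanning the raw XML.
package main

import (
	"encoding/xml"
	"fmt"
	"os"
	"strings"
)

type property struct {
	Name  string `xml:"name,attr"`
	Value string `xml:",chardata"`
}

type component struct {
	Name    string     `xml:"name,attr"`
	ClassID string     `xml:"componentClassID,attr"`
	Props   []property `xml:"properties>property"`
}

func main() {
	f, err := os.Open("Package.dtsx") // placeholder file name
	if err != nil {
		panic(err)
	}
	defer f.Close()

	dec := xml.NewDecoder(f)
	for {
		tok, err := dec.Token()
		if err != nil {
			break // EOF
		}
		se, ok := tok.(xml.StartElement)
		if !ok || se.Name.Local != "component" {
			continue
		}
		var c component
		if err := dec.DecodeElement(&c, &se); err != nil {
			continue
		}
		if !strings.Contains(c.ClassID, "OLEDBDestination") {
			continue
		}
		for _, p := range c.Props {
			if p.Name == "OpenRowset" { // destination table property (assumption)
				fmt.Printf("%-40s -> %s\n", c.Name, p.Value)
			}
		}
	}
}

Pointing something like this at every *.dtsx in the project folder would give you the text inventory you're after.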
You can use the SSIS Object Model for your needs. An example can be found here. Look in the method IterateAllDestinationComponentnsInPackage for the exact details. To start understanding the code, start in the Start method and follow the path.
Caveats: make sure you use the appropriate Monikers and Class IDs for the Data Flow Tasks and your Destination Components. You can also use this for other Control Flow Tasks and Data Flow Components (for example, Source Components, which seem to be your other need). Just keep in mind the appropriate Monikers and Class IDs.
