I am working on a cloud-based application using Apache NiFi, and for this we need to support multitenancy. But the current NiFi implementation only supports role-based access for users of a single flow.
I understand that the flow state is saved as a single compressed XML file per NiFi instance, so whoever logs into that instance sees the same flow. Our requirement is to create a unique flow for each user login. I tried replicating the state-saving gzipped XML file for each user, but couldn't succeed, because the FlowService/FlowController which loads the XML file is instantiated at application startup and is a singleton. Please correct me if I am wrong with this approach, or is there another solution for adding multitenant support to NiFi? I also wonder about the reason behind NiFi being a single-user application.
Multi-tenant support will be introduced in Apache NiFi 1.0.0. There is a BETA release available [1]. This will support assigning permissions on a per-component basis. However, the different tenants still share a canvas. There have been discussions of introducing a workspace concept that could provide visually separate dataflows.
[1] https://nifi.apache.org/download.html
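For illustration, the per-component permissions in the 1.0.0 line are expressed as access policies, which can also be created through the REST API rather than the UI. Below is a rough sketch using Python's requests library; the host, port, and process group ID are hypothetical, and authentication (access token or client certificate) is omitted for brevity.
import requests

NIFI_API = "https://nifi-host:8443/nifi-api"    # hypothetical secured NiFi instance
PG_ID = "11111111-2222-3333-4444-555555555555"  # hypothetical process group id

# Create a "read" (view the component) policy scoped to a single process group.
# Users or user groups would then be added to the policy's "users"/"userGroups" lists.
policy = {
    "revision": {"version": 0},
    "component": {
        "action": "read",
        "resource": f"/process-groups/{PG_ID}",
    },
}
resp = requests.post(f"{NIFI_API}/policies", json=policy)  # authentication omitted
resp.raise_for_status()
print(resp.json()["id"])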
While creating a NiFi flow, I'm noticing the versions of the components changing.
I understand that the version changes each time the component updates - but what is considered an update of a component?
For example, what causes an update in a connection's version?
I'm trying to find a pattern, but without much luck.
Thanks in advance!
The official documentation states that you can have multiple versions of your flow at the same time:
You have access to information about the version of your Processors, Controller Services, and Reporting Tasks. This is especially useful when you are working within a clustered environment with multiple NiFi instances running different versions of a component or if you have upgraded to a newer version of a processor.
You can opt out of versioning altogether:
Methods of disabling versioning:
NiFi UI: To change the version of a flow, right-click on the versioned process group and select Version→Change version (link).
REST API: send an HTTP DELETE request to /versions/process-groups/{id} with the appropriate ID (a minimal sketch follows after this list).
You can also use the Toolkit CLI to view available versions by executing ./bin/cli.sh registry diff-flow-versions (link).
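For the REST API method, a rough sketch with Python's requests library might look like the one below. The host and process group ID are hypothetical and the instance is assumed to be unsecured; the process group is fetched first because the NiFi API uses optimistic locking and the DELETE needs the current revision version.
import requests

NIFI_API = "http://nifi-host:8080/nifi-api"      # hypothetical unsecured NiFi instance
PG_ID = "a1b2c3d4-0123-1000-0000-000000000000"   # hypothetical versioned process group id

# Fetch the process group to obtain its current revision version (optimistic locking).
pg = requests.get(f"{NIFI_API}/process-groups/{PG_ID}").json()
current_version = pg["revision"]["version"]

# Stop version control on the process group; the "version" query parameter
# carries the revision version obtained above.
resp = requests.delete(f"{NIFI_API}/versions/process-groups/{PG_ID}",
                       params={"version": current_version})
resp.raise_for_status()
print(resp.json())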
All environments are in the same tenant, same Azure Active Directory.
I need to push data from one environment's (Line of Business) Common Data Service to another environment's Common Data Service (Central Enterprise CDS), which reporting runs from.
I've looked into using OData dataflows; however, this seems like more of a manually triggered option.
OData dataflows are meant for and designed to support migration and synchronization of large datasets in Common Data Service in scenarios such as:
A one-time cross-environment or cross-tenant migration is needed (for example, geo-migration).
A developer needs to update an app that is being used in production.
Test data is needed in their development environment to easily build out changes.
Reference: Migrate data between Common Data Service environments using the dataflows OData connector
For continuous data synchronization, use the CDS connector in Power Automate with attribute filters, so that source CDS record updates are propagated to the target CDS entities.
We are using templates to package up some data transfer jobs between two NiFi clusters, one acting as the sender, the other as the receiver. One of our jobs contains a remote process group, and everything worked fine at the point the template was created.
However, when we deploy the template through our environments (dev, test, pre, prod), it is tedious and annoying to have to manually delete and recreate the remote process group in the user interface. I'd like to automate this to simplify deploying templates and reduce manual intervention.
Is it possible to update a remote process group and its port configuration through the REST API?
Do I just use the REST API to create a new RPG with the correct configuration?
Does anyone have any experience with this?
There is a JIRA to address this issue [1], which will be worked on in conjunction with some of the ongoing Flow Registry (SDLC for flows) efforts. Until then, the best option would be (2) above: using the REST API to create a new RPG with the correct configuration (a minimal sketch follows below).
[1] https://issues.apache.org/jira/browse/NIFI-4526
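As a rough sketch of option (2), a remote process group can be created with a POST to the parent process group's remote-process-groups endpoint. The example below uses Python's requests library; the host, parent group ID, and target URI are hypothetical, the instance is assumed to be unsecured, and any remote port tuning would still follow as separate calls.
import requests

NIFI_API = "http://nifi-host:8080/nifi-api"   # hypothetical sending-side NiFi instance
PARENT_PG_ID = "root"                         # alias for the root group, or the UUID of the enclosing group

# Definition of the new remote process group; "targetUris" points at the remote
# NiFi instance (older releases use the singular "targetUri" field instead).
rpg = {
    "revision": {"version": 0},
    "component": {
        "name": "Send to receiver cluster",
        "targetUris": "https://receiver-host:8443/nifi",
        "communicationsTimeout": "30 sec",
    },
}
resp = requests.post(f"{NIFI_API}/process-groups/{PARENT_PG_ID}/remote-process-groups", json=rpg)
resp.raise_for_status()
print(resp.json()["id"])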
I am very new to Hadoop tools, so I am asking this question. I am using Sqoop to push data from my relational DB to HDFS. Now, as a next step, I want to generate some reports using this data stored in HDFS. I have my own custom report format.
I am aware that I can get data from HDFS using Hive, but is it possible to design my own custom reports (web UI) over this? Are there any other tools I can use?
Otherwise, is it possible to deploy an application (containing an HTML GUI and Java APIs) on the same machine so that I can access it via HTTP and see the data present in HDFS?
You can use Tableau for a better experience; though it is paid, it is the best on the market, and you can even customize your graphs or reports with it. You can get a trial version of Tableau from their site. You can use Power BI from Microsoft, which is free and works well with big data. Ambrose was created by Twitter and also has good support (I didn't try this one).
Check Ambrose, as this is what you are looking for. You can access it via an HTTP URL.
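On the last part of the question (a custom web app reading HDFS over HTTP): HDFS itself ships a REST interface called WebHDFS, which a custom GUI backend can call directly. A minimal sketch, assuming WebHDFS is enabled and using hypothetical NameNode address, path, and user values:
import requests

NAMENODE = "http://namenode-host:50070"       # hypothetical NameNode; Hadoop 3.x defaults to port 9870
HDFS_PATH = "/user/etl/reports/summary.csv"   # hypothetical file written by the Sqoop import

# The OPEN operation reads a file; the NameNode redirects to a DataNode and requests follows it.
resp = requests.get(f"{NAMENODE}/webhdfs/v1{HDFS_PATH}",
                    params={"op": "OPEN", "user.name": "hdfs"})
resp.raise_for_status()
print(resp.text[:500])  # first part of the file's contents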
I'm using ASP.NET Identity for authentication and authorization. Since I use Docker in the recommended way (separate containers for building and running), I always get logged out after each deployment. It seems like ASP.NET Core doesn't store the sessions in the database, especially since I can't see any table where they would be.
How can I fix this so that my users don't get logged out after each deployment?
I think I need to store the sessions in the database, but I couldn't find information on how to do this. I found information about using Redis as a session store. This comes close, but I'm not sure whether it also affects the ASP.NET Identity session, or only session stores like TempData. The other problem is that I would like to store the session in my MySQL database using the Pomelo.EntityFrameworkCore.MySql provider.
I found out that using memory storage causes issues with the encryption keys, too. In short, ASP.NET Core uses those keys to protect sensitive data like sessions, so they're not stored in plain text. It seems that ASP.NET Core generates those keys automatically on the first application run.
Because it runs in a Docker container, this results in two big problems:
The encryption key gets lost when the container image is rebuilt. ASP.NET Core generates a new one automatically, but it can't decrypt the existing sessions because they were encrypted with a different key.
A container is isolated, so the default in-memory storage provider for sessions would lose its data after every new deployment.
This could be solved by using storage that runs on a different server than the web server, as I suggested. I couldn't find any MySQL implementation for this, only SQL Server (which means MSSQL). I fixed it by installing a Redis server; it's used for session storage and the encryption keys.
To let ASP.NET Core store its encryption keys in Redis, install the Microsoft.AspNetCore.DataProtection.Redis package and add the following lines to ConfigureServices before AddOptions:
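// Requires: using StackExchange.Redis; and using Microsoft.AspNetCore.DataProtection;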
var redis = ConnectionMultiplexer.Connect($"{redisIpAddress}:{redisPort}");
services.AddDataProtection().PersistKeysToRedis(redis, "DataProtection-Keys");
Note that this has only been part of ASP.NET Core since the 1.1.0 release. Because it has dependencies on other packages of the 1.1.0 branch, I'd assume it doesn't work on the LTS 1.0 release. In that case, you may need to write a custom implementation which is 1.0-compatible. But I haven't tested this, since I'm using 1.1.0 in my project. More info in this article: http://www.tugberkugurlu.com/archive/asp-net-core-authentication-in-a-load-balanced-environment-with-haproxy-and-redis
In summary, I don't think it's a bad idea to use Redis for this instead of a SQL database, because Redis is optimized for serving key-value pairs very fast. And once it's there, Redis can also be used to cache other parts of the application, like (complex) database queries. This can speed up your application and reduce the load on the database server.