How does one setup a Distributed Map Cache for NiFi? - apache-nifi

I'm brand new to NiFi and simply playing around with processors.
I'm trying to incorporate Wait and Notify processors in my testing, but I have to set up a Distributed Map Cache (server and client?).
The NiFi documentation assumes a level of understanding that I do not have.
I've installed memcached on my computer (macOS) and verified that it's running on Port 11211 (default). I've created a DistributedMapCacheClientService and DistributedMapCacheServer under NiFi's CONTROLLER SERVICES, but I'm getting java.net.SocketTimeoutException & other errors.
Is there a good tutorial on this entire topic? Can someone suggest how to move forward?

The DistributedMapCacheClientService and DistributedMapCacheServer do not require any additional software (you do not need memcached).
To create these services, right-click on the canvas, select Configure and then select the Controller Services tab. You can then add new services by clicking the + button on the right and searching by name.
Create a DistributedMapCacheServer with the default parameters (port 4557) and enable it. This starts the built-in cache server.
Create a DistributedMapCacheClientService with Server Hostname set to localhost and the other parameters left at their defaults, then enable it.
Create a simple flow: add a GenerateFlowFile processor, set its Run Schedule, and give it a non-zero File Size in its properties.
Connect it to a PutDistributedMapCache processor, set the Cache Entry Identifier to Key01, and choose your DistributedMapCacheClientService.
Try to run it. As long as port 4557 is not already in use by other software, the cache put should work.
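If you would rather script this than click through the UI, here is a rough sketch against the NiFi REST API with Python. It assumes an unsecured NiFi 1.x on localhost:8080; the endpoint paths, entity shapes, class names, and property names are taken from that line of releases and may need adjusting for your version:

import requests

NIFI = "http://localhost:8080/nifi-api"  # assumption: unsecured NiFi on the default port

# Look up the root process group id
root_id = requests.get(NIFI + "/flow/process-groups/root").json()["processGroupFlow"]["id"]

def create_service(service_type, properties):
    # Create a controller service in the root group; NiFi picks the bundle when it is unambiguous
    body = {"revision": {"version": 0},
            "component": {"type": service_type, "properties": properties}}
    resp = requests.post(NIFI + "/process-groups/" + root_id + "/controller-services", json=body)
    resp.raise_for_status()
    return resp.json()["id"]

def enable_service(service_id):
    # Newer NiFi versions expose a run-status endpoint; older ones require PUTting the whole entity
    revision = requests.get(NIFI + "/controller-services/" + service_id).json()["revision"]
    requests.put(NIFI + "/controller-services/" + service_id + "/run-status",
                 json={"revision": revision, "state": "ENABLED"}).raise_for_status()

# Built-in cache server on port 4557 -- no memcached involved
server_id = create_service(
    "org.apache.nifi.distributed.cache.server.map.DistributedMapCacheServer",
    {"Port": "4557"})

# Client service pointing at that server
client_id = create_service(
    "org.apache.nifi.distributed.cache.client.DistributedMapCacheClientService",
    {"Server Hostname": "localhost", "Server Port": "4557"})

enable_service(server_id)
enable_service(client_id)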

@Darshan: Yes, it will work, because the documentation of DistributedMapCacheClientService says that it:
Provides the ability to communicate with a DistributedMapCacheServer. This can be used in order to share a Map between nodes in a NiFi cluster.

Related

Using Apache Nifi in a docker instance, for a beginner

So, I want, very basically, to be able to spin up a container which runs NiFi with a template I already have. I'm very new to containers and fairly new to NiFi. I think I know how to spin up a NiFi container, but not how to make it automatically run my template every time.
You can use the official apache/nifi Docker image (available on Docker Hub) as a starting point, and use a Docker RUN/COPY command to inject your desired flow. There are three ways to load an existing flow into a NiFi instance.
Export the flow as a template (an XML file containing the exported flow segment) and import it as a template into your running NiFi instance. This requires the "destination" NiFi instance to be running and uses the NiFi API.
Create the flow you want, manually extract the entire flow from the "source" NiFi instance by copying $NIFI_HOME/conf/flow.xml.gz, and overwrite the flow.xml.gz file in the "destination" NiFi's conf directory. This does not require the destination NiFi instance to be running, but it must occur before the destination NiFi starts.
Use the NiFi Registry to version control the original flow segment from the source NiFi and make it available to the destination NiFi. This seems like overkill for your scenario.
I would recommend Option 2, as you presumably already have the flow exactly as you want it. Simply use COPY /src/flow.xml.gz /destination/flow.xml.gz in your Dockerfile.
If you literally want it to "run my template every time", you probably want to ensure that the processors are all in enabled state (showing a "Play" icon) when you copy/save off the flow.xml.gz file, and that in your nifi.properties, nifi.flowcontroller.autoResumeState=true.
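If you do end up going with Option 1 instead, the template import can also be driven through the NiFi REST API. A rough Python sketch, assuming an unsecured NiFi 1.x on localhost:8080 and a hypothetical my_flow_template.xml export; the endpoint paths and the XML response shape may differ in your version:

import xml.etree.ElementTree as ET
import requests

NIFI = "http://localhost:8080/nifi-api"  # assumption: unsecured NiFi on the default port

# Root process group id
root_id = requests.get(NIFI + "/flow/process-groups/root").json()["processGroupFlow"]["id"]

# 1. Upload the exported template XML (the multipart form field is named "template")
with open("my_flow_template.xml", "rb") as f:  # hypothetical export of your flow
    resp = requests.post(NIFI + "/process-groups/" + root_id + "/templates/upload",
                         files={"template": f})
resp.raise_for_status()
template_id = ET.fromstring(resp.content).find(".//id").text  # response is XML, not JSON

# 2. Instantiate the template onto the canvas at the origin
requests.post(NIFI + "/process-groups/" + root_id + "/template-instance",
              json={"templateId": template_id, "originX": 0.0, "originY": 0.0}
              ).raise_for_status()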

How to get the dmgr host and port number dynamically using Jython and Jacl in IBM WebSphere Application Server on Linux?

I need to get Dmgr host and port dynamically to sync the node.
AdminControl.getHost() and AdminControl.getPort()
I am not sure whether it works. Thanks in advance.
Would something like this work instead at the end of your administrative script?
# Save pending configuration changes, then sync the node with the deployment manager
AdminConfig.save()
if (NDInstall == "ND"):  # NDInstall and nodeLongName are set earlier in your script
    nodeSync = AdminControl.completeObjectName("type=NodeSync,node=" + nodeLongName + ",*")
    AdminControl.invoke(nodeSync, "sync")
A save and sync by itself doesn't require nodes or application servers to be down. Depending on the nature of the change you may need to recycle application servers to bring the change into effect. One feature that's in ND to help with high availability is the ability to ripple start servers in a cluster. This way one or more application servers stay up to service requests while a change is 'rippled' into effect.
A cluster is also an administrative unit that can be stopped and started. You can arrange your clusters however you want across your nodes.
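As a sketch of the ripple-start idea mentioned above, it can also be driven from wsadmin (Jython). The cluster name below is a placeholder, and rippleStart is the operation documented on the Cluster MBean, so treat this as an untested outline:

# wsadmin -lang jython: restart cluster members one at a time so some stay up
clusterName = "MyCluster"  # placeholder: your cluster's name
cluster = AdminControl.completeObjectName(
    "cell=" + AdminControl.getCell() + ",type=Cluster,name=" + clusterName + ",*")
AdminControl.invoke(cluster, "rippleStart")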

Solutions for a secure distributed cache

Problem: I want to cache user information such that all my applications can read the data quickly, but I want only one specific application to be able to write to this cache.
I am on AWS, so one solution that occurred to me was a version of memcached with two ports: one port that accepts read commands only and one that accepts reads and writes. I could then use security groups to control access.
Since I'm on AWS, if there are solutions that use out-of-the box memcached or redis, that'd be great.
I suggest you use ElastiCache with one open port at 11211 (Memcached), then create an EC2 instance and set your security group so that only this server can access your ElastiCache cluster. Use this server to filter your applications, so that only one specific application can write to it. You control the access with security groups, scripts, or iptables. If you are not using a VPC, then you can use a cache security group.
I believe you can accomplish this using Redis (instead of Memcached) which is also available via ElastiCache. Once the instance has been created, you will want to create a replication group and associate it to the cache cluster you already launched.
You can then add instances to the replication group. Instances within the replication group are simply replicated from the Master Cache Cluster (single Redis instance) and so are (by default) read-only.
So, in this setup, you have a master node (single endpoint) that you can write to and as many read nodes (multiple endpoints) as you would like.
You can take security a step further and assign different routing rules to the replication group (via the VPC) so that the applications reading data do not have access to the master node (the only one that can write data).
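As a rough illustration of that read/write split with redis-py, where the two endpoint names are placeholders for your replication group's primary and replica endpoints:

import redis

PRIMARY = "my-cache.xxxxxx.0001.use1.cache.amazonaws.com"     # placeholder: primary (read/write) endpoint
REPLICA = "my-cache-ro.xxxxxx.0001.use1.cache.amazonaws.com"  # placeholder: read-only replica endpoint

writer = redis.Redis(host=PRIMARY, port=6379)  # only the one writing application gets this endpoint
reader = redis.Redis(host=REPLICA, port=6379)  # every other application reads from a replica

writer.hset("user:42", mapping={"name": "Alice", "plan": "pro"})
print(reader.hgetall("user:42"))  # replicated copy of the data

# Writes against a replica are rejected by Redis itself ("READONLY ...")
try:
    reader.set("user:42:plan", "enterprise")
except redis.exceptions.ReadOnlyError:
    print("replica refused the write, as expected")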

Accessing Clustered MSMQ with an application

We are switching from a non-clustered to a 2-node clustered MSMQ Windows Server 2008 R2 SP1 Enterprise environment. Previously, when it was non-clustered, we wrote a .NET 3.5 C# Windows Form application to help us manage our environment (so it does tasks such as create queues with the right permissions, read messages, forward messages, etc.). I would like to make this application work with our new cluster.
Per these articles,
http://blog.terranspot.com/2011/07/accessing-microsoft-message-queuing.html
http://blogs.msdn.com/b/johnbreakwell/archive/2008/02/18/clustering-msmq-applications-rule-1.aspx
I understand that I need to add the application as a resource on the cluster, because when I don't, I am accessing the node's MSMQ instance. To help with my debugging, I have turned the local MSMQ services off. No matter what I do, however, the program keeps trying to access the node's instance. I added it as an application resource (with a command line of "Q:\QueueManagerConsole.exe"; Q:\ is the disk shared between the 2 nodes as part of the failover cluster), but when I run it via Windows Explorer, it doesn't see the cluster instance, only the local one. I have seen no way to execute a program from Failover Cluster Manager, so I don't understand what I am doing wrong. I switched the code to access everything via "." (so MessageQueue.GetPrivateQueuesByMachine(".")), which, per my meager understanding, is how you access the local queue. Could someone explain, preferably acting as if I had no clue what I was doing, (a) if this IS possible and (b) HOW to do this correctly?
Hi, I did something similar a while ago. Try to deploy it as a service in a failover cluster; it worked for me to:
configure the app to use the clustered MSMQ
configure the app as a clustered resource
configure the app to connect under the cluster host name
set the permission set required for transport
At least this will give you a good starting point.
I finally got this working by creating a shortcut to the application and putting it on the server that was actually accessing the clustered queues.
Please try adding the following environment variables to the environment used by your application:
_CLUSTER_NETWORK_NAME_
_CLUSTER_NETWORK_HOSTNAME_
with the cluster server name as their value. It worked in the system being developed by my team, which contains a few services that had to access the clustered MSMQ, and it solved the problem.

EC2 database server failover strategy

I am planning to deploy my web app to EC2. I have several webserver instances. I have 1 primary database instance. I have 1 failover database instance. I need a strategy to redirect the webservers to the failover database instance IP when the primary database instance fails.
I was hoping I could use an Elastic IP in my connection strings. But, the webservers are not able to access/ping the Elastic IP. I have several brute force ideas to solve the problem. However, I am trying to find the most elegant solution possible.
I am using all .Net and SQL Server. My connection strings are encrypted.
Does anybody have a strategy for failing over a database instance in EC2 using some form of automation or DNS configuration?
Please let me know.
http://alestic.com/2009/06/ec2-elastic-ip-internal tells you how to use the Elastic IP public DNS.
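If the Elastic IP route does work out (the point of that link is that, from inside EC2, the address's public DNS name resolves to the instance's private IP), the failover step itself is just re-associating the address with the standby instance. A sketch using boto3, which postdates this question; the region, allocation id, and instance id are placeholders:

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")  # placeholder region

ALLOCATION_ID = "eipalloc-0123456789abcdef0"  # placeholder: the Elastic IP's allocation id
STANDBY_INSTANCE_ID = "i-0123456789abcdef0"   # placeholder: the failover database instance

def fail_over_to_standby():
    # Re-point the Elastic IP at the standby; web servers connecting via the
    # address's public DNS name then resolve to the standby's private IP.
    ec2.associate_address(
        AllocationId=ALLOCATION_ID,
        InstanceId=STANDBY_INSTANCE_ID,
        AllowReassociation=True,  # take the address away from the failed primary
    )

if __name__ == "__main__":
    fail_over_to_standby()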
Haven't used EC2 but surely you need to either:
(a) put your front-end into some custom maintenance mode that you define while you switch the IP over, and have the front-end perform the steps required to manage potential data integrity and data loss issues (from the previous server going down and the new server coming up) when it enters and leaves your custom maintenance mode;
OR, for a zero-downtime system:
(b) design the system at the object/relational and transaction levels from the ground up to support zero-downtime failover. It's not something you can bolt on quickly to just any application.
(c) use some database support for automatic failover. I don't know whether SQL Server failover support suitable for your application exists or is appropriate here; I suggest adding a "sql-server" tag to the question to reach the right audience.
If Elastic IPs don't work (which sounds odd, to say the least; shouldn't you talk to EC2 about that?), you may have to be able to instruct your front-end which new database IP to use at the same time as telling it to go from maintenance mode back to normal mode.
If you're willing to shell out a bit of extra money, take a look at Rightscale's tools; they've built custom server images and supporting tools that handle database failover (among many other things). This link explains how to do it with MySQL, so will hopefully show you some principles even though it doesn't use SQL Server.
I always thought there was this possibility in the connection string.
This is taken (but not yet tested) from How to add Failover Partner to a connection string in VB.NET:
If you connect with ADO.NET or the SQL Native Client to a database that is being mirrored, your application can take advantage of the driver's ability to automatically redirect connections when a database mirroring failover occurs. You must specify the initial principal server and database in the connection string, as well as the failover partner server.
Data Source=myServerAddress;Failover Partner=myMirrorServerAddress;
Initial Catalog=myDataBase;Integrated Security=True;
There are of course many other ways to write the connection string using database mirroring; this is just one example pointing out the failover functionality. You can combine this with the other connection string options available.
To broaden Gareth's answer, cloud management software usually solves this type of problem. RightScale is one of them, but you can try enStratus or Scalr (disclaimer: I work at Scalr). These tools provide failover solutions like:
Backups: you can schedule automated snapshots of the EBS volume containing the data
Fault-tolerant database: in the event of a failure, a slave is promoted to master, and the mounted storage is switched over if the failed master and the new master are in the same AZ; otherwise a snapshot of the volume is taken.
If you want to build your own solution, you could replicate the process detailed below that we use at Scalr:
Is there a slave in the same AZ? If so, promote it, switch the EBS volumes (which are limited to a single AZ), switch any Elastic IP you might have, and reconfigure replication of the remaining slaves.
If not, is there a slave fully replicated in another AZ? If so, promote it, then do the above.
If there is no slave in the same AZ and no slave fully replicated in another AZ, then create a snapshot from the master's volume and use this snapshot to create a new volume in an AZ where a slave is running. Then do the above.
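As a sketch of that decision procedure only (the promotion, volume, and replication steps are just described in the returned strings, and the Replica type is made up for the example):

from dataclasses import dataclass
from typing import List

@dataclass
class Replica:
    name: str
    az: str                 # availability zone, e.g. "us-east-1a"
    fully_replicated: bool  # has it fully caught up with the master?

def choose_failover_plan(master_az: str, slaves: List[Replica]) -> str:
    # Mirror the three cases described above, in order of preference.
    same_az = [s for s in slaves if s.az == master_az]
    caught_up_elsewhere = [s for s in slaves if s.az != master_az and s.fully_replicated]

    if same_az:
        return ("promote " + same_az[0].name + ", switch the EBS volume and Elastic IP, "
                "then reconfigure replication of the remaining slaves")
    if caught_up_elsewhere:
        return ("promote " + caught_up_elsewhere[0].name + " in " + caught_up_elsewhere[0].az +
                ", then do the above (minus the EBS volume, which cannot change AZ)")
    return ("snapshot the master's volume, create a new volume from it in an AZ "
            "with a running slave, then do the above")

# Example: master in us-east-1a with a lagging local slave and a caught-up remote slave
print(choose_failover_plan("us-east-1a", [
    Replica("db-2", "us-east-1a", fully_replicated=False),
    Replica("db-3", "us-east-1b", fully_replicated=True),
]))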
