Can I use Apache NiFi as an ESB or a request mediator? - apache-nifi

I've seen Apache NiFi compared to similar ETL tools like Apache Flume, Airflow, and Kafka. These are ETL tools rather than ESBs or request mediators.
ESBs/request mediators can be used to orchestrate web services and expose a single proxy service, which is expected to serve concurrent HTTP requests efficiently.
My question is: can I use Apache NiFi for the same purpose? Can I provide service orchestration and serve proxy service endpoints using NiFi processors such as HandleHttpRequest? Is it designed to handle real-time concurrent requests efficiently?

You brought up a few technologies that are quite different.
Apache NiFi is a dataflow management tool. Unlike Kafka Streams, Airflow, or Apache Flume, it does not require you to write your own code; you can do almost anything you need with the existing processors developed by Apache.
Also, Airflow is a workflow management tool, comparable to Oozie.
NiFi is built for real-time performance, but it is not designed to serve as a REST API. That said, as you mentioned, it can start a flow based on an HTTP request.
Hope this helps.

Related

Spring Boot Microservices load balancing vs cloud load balancing

I am new to microservices (learning phase). I have a question. We deploy microservices to the cloud (e.g. AWS). The cloud already provides load balancing and logs, and we also implement load balancing (Ribbon) and logging (RabbitMQ and Zipkin) in Spring Boot.
What is the difference between these two implementations? Do we need both?
Can someone answer these questions?
Thanks in advance.
Ribbon is a client-side load balancer, which means there is no extra hop between your client and the service. Basically, you keep and maintain a list of services on the client.
With an AWS load balancer, there is an additional hop between the client and the server.
Both have advantages and disadvantages. The former has the advantage of not depending on any specific external solution: with Ribbon and a service discovery mechanism like Eureka, you can deploy your product to any cloud provider or on-premise setup without additional effort. The latter has the advantage of not needing an extra service discovery component or a cached service list on the client, but the additional hop might be an issue if you are running a very high-load system.
Although I don't have much experience with AWS CloudWatch, what I know is that it helps you collect logs from different AWS components into a central place, which is what you are trying to do with your solution.
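The client-side approach Ribbon takes can be sketched in a few lines. This is a hedged, minimal illustration, not Ribbon's actual API: the class name and server addresses are hypothetical, and real Ribbon additionally refreshes its server list from a discovery service such as Eureka and applies pluggable balancing rules.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Minimal sketch of client-side round-robin balancing in the spirit of Ribbon.
// Hypothetical class; real Ribbon refreshes the server list from discovery.
public class ClientSideBalancer {
    private final List<String> servers;               // known service instances
    private final AtomicInteger counter = new AtomicInteger();

    public ClientSideBalancer(List<String> servers) {
        this.servers = servers;
    }

    // Pick the next server directly on the client: no intermediate hop.
    public String choose() {
        int i = Math.floorMod(counter.getAndIncrement(), servers.size());
        return servers.get(i);
    }
}
```

The point of the sketch is the "no extra hop" property: the selection happens in the caller's process, whereas a server-side balancer (ELB/ALB) makes that choice on a separate machine the request must pass through.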

Hadoop Zookeeper understanding

I'm having difficulty understanding the ZooKeeper framework in the Hadoop ecosystem. The main aspects of ZooKeeper I find confusing are how it handles consistency across its nodes, and how it uses its distributed in-memory file system to handle coordination. Any help with these points would be great.
As @Yann said, ZooKeeper is not tied to Hadoop. ZooKeeper is a coordination engine for services. With it, you can build distributed configuration, service discovery, and leader election, among other features.
I suggest reading this post to see how it works for load balancing, and this other one for service discovery. Another nice resource is Apache Curator, a framework that makes it easy to use ZooKeeper from Java or any other JVM language.
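Much of ZooKeeper's coordination model boils down to its one-shot watch mechanism: a client reads a znode and registers a watch, the watch fires exactly once on the next change, and the client must re-register it. The toy class below is an in-memory illustration of that pattern only; it is not the real ZooKeeper API (which you would normally use via Curator), and all names are made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Toy in-memory illustration of ZooKeeper's one-shot watch pattern.
// NOT the real ZooKeeper API; names are illustrative only.
public class ToyZnode {
    private String data;
    private final List<Consumer<String>> watchers = new ArrayList<>();

    public ToyZnode(String data) {
        this.data = data;
    }

    // Read the current data and set a one-shot watch for the next change.
    public String read(Consumer<String> watch) {
        watchers.add(watch);
        return data;
    }

    // Update the data and fire (then discard) all registered watches.
    public void write(String newData) {
        data = newData;
        List<Consumer<String>> toFire = new ArrayList<>(watchers);
        watchers.clear();                 // one-shot semantics
        toFire.forEach(w -> w.accept(newData));
    }
}
```

Recipes like leader election and service discovery are built from this read-and-watch cycle on small znodes, which is why ZooKeeper keeps its data tree in memory and replicates writes across the ensemble for consistency.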

RESTful Microservice failover & load balancing

At the moment we have some monolithic web applications and are trying to migrate the projects to a microservices infrastructure.
The monolithic application uses HAProxy and session replication for failover and load balancing.
Now we are building some RESTful microservices with Spring Boot, but it's not clear to me what the best way is to build the production environment.
Of course, we can run all applications as Unix services and still have a reverse proxy for load balancing and failover. This solution seems very heavy to me and involves a lot of configuration and maintenance; resource management and scaling servers up or down will always be a manual process.
What are the best options for setting up a production environment with 2-3 servers and easy resource management?
Is there a solution that also supports continuous deployment?
I'd recommend looking into service discovery. Netflix describes this as:
A Service Discovery system provides a mechanism for:
Services to register their availability
Locating a single instance of a particular service
Notifying when the instances of a service change
Packages such as Netflix's Eureka could be of help. (EDIT - actually this looks like it might be AWS specific)
This should work well with continuous delivery, as services can make themselves unavailable, be updated, and then register their availability again.
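The three responsibilities quoted above can be sketched as a toy in-memory registry. This is only an illustration under stated assumptions: real discovery systems (Eureka, Consul, ZooKeeper) add health checks, heartbeats, and replication, and every name below is hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.BiConsumer;

// Toy sketch of a service discovery registry's three responsibilities.
public class ServiceRegistry {
    private final Map<String, List<String>> instances = new ConcurrentHashMap<>();
    private final List<BiConsumer<String, List<String>>> listeners = new CopyOnWriteArrayList<>();

    // 1. Services register their availability.
    public void register(String service, String address) {
        instances.computeIfAbsent(service, k -> new CopyOnWriteArrayList<>()).add(address);
        notifyChange(service);
    }

    public void deregister(String service, String address) {
        List<String> list = instances.get(service);
        if (list != null && list.remove(address)) {
            notifyChange(service);
        }
    }

    // 2. Locate a single instance of a particular service.
    public String locate(String service) {
        List<String> list = instances.getOrDefault(service, List.of());
        if (list.isEmpty()) throw new IllegalStateException("no instance of " + service);
        return list.get(ThreadLocalRandom.current().nextInt(list.size()));
    }

    // 3. Notify when the instances of a service change.
    public void onChange(BiConsumer<String, List<String>> listener) {
        listeners.add(listener);
    }

    private void notifyChange(String service) {
        List<String> snapshot = List.copyOf(instances.getOrDefault(service, List.of()));
        listeners.forEach(l -> l.accept(service, snapshot));
    }
}
```

The change notification (responsibility 3) is what makes rolling updates safe: a deploying instance deregisters, clients are told the list shrank, and it re-registers once healthy.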

Is Spring XD the right tool choice?

We're building an M2M IoT platform and part of the ecosystem is a Big Data storage and analytics component.
The platform connects devices at one end and provides a streaming data output using ActiveMQ to interface with the Big Data application layer.
I'm currently designing this middle layer, which accepts machine data, runs real-time processes, and stores the data in a Hadoop storage module.
From what I can see, Spring XD seems able to orchestrate this process, from ingestion to filtering, processing, analytics, and export to Hadoop.
However, I don't know anyone who has done something like this. Has anyone here executed something similar? I need your feedback on the choice of tool for the middleware.
Spring XD works great with RabbitMQ; for ActiveMQ you can use the JMS connector.
For more information, take a look at Spring Integration, which is the main underpinning of Spring XD and has been around for years.
Spring XD runs on YARN and ZooKeeper, which are very solid.
I have seen it used for big data orchestration in a few places.
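As a hedged sketch of what the ingestion-to-Hadoop pipeline could look like (the stream name, JMS destination, and filter expression are hypothetical, and the JMS source would need an ActiveMQ connection factory configured), a Spring XD stream definition along these lines ties the pieces together:

```
xd:> stream create --name machine-data \
  --definition "jms --destination=machine.telemetry | filter --expression=payload.length()>0 | hdfs" \
  --deploy
```

Here the `jms` source consumes device messages from ActiveMQ, the `filter` processor drops empty payloads, and the `hdfs` sink writes the results into Hadoop, which matches the ingest/filter/export flow described in the question.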

How to do load balancing in distributed OSGi?

We deploy two service instances on different machines using CXF Distributed OSGi. We want to give the system load balancing. As far as we know, OSGi doesn't provide any load balancing feature out of the box. Does anyone know how to do it?
Load balancing is a concern that is meant to be implemented by the Topology Manager (TM). It would be useful to read the Remote Service Admin specification, which addresses exactly this kind of question.
As far as I know, the CXF implementation of Remote Services only implements a single TM, which is "promiscuous", i.e. it publishes every available service to every listening framework. It is possible, however, to write your own TM to perform load balancing, failover, etc.
The Remote Services spec is written in such a way that a TM implementation can be developed completely independently of any specific Remote Services implementation.
You should be able to get the complete list of services using a ServiceTracker. A nice way to create a load balancer, then, is to write a proxy service yourself that does the load balancing and publish it locally as a service with a special property, so your business application can use the service without knowing anything about the load-balancing details.
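The proxy idea can be sketched with a JDK dynamic proxy. This is an illustrative sketch under assumptions, not any real CXF/RSA API: in OSGi the delegate list would come from a ServiceTracker and be updated as services come and go, whereas here it is simply passed in, and it is assumed to be non-empty.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: hide several service instances behind one local object that
// round-robins calls and fails over on error. In OSGi the delegates would
// come from a ServiceTracker; names here are illustrative only.
public class LoadBalancingProxy {
    public static Object create(Class<?> iface, List<?> delegates) {
        AtomicInteger next = new AtomicInteger();
        InvocationHandler handler = (proxy, method, args) -> {
            Exception last = null;
            int n = delegates.size();
            for (int attempt = 0; attempt < n; attempt++) {    // failover loop
                Object target = delegates.get(Math.floorMod(next.getAndIncrement(), n));
                try {
                    return method.invoke(target, args);
                } catch (Exception e) {
                    last = e;                                  // try next instance
                }
            }
            throw last;
        };
        return Proxy.newProxyInstance(iface.getClassLoader(),
                new Class<?>[]{iface}, handler);
    }
}
```

Registering the resulting object in the local service registry with a distinguishing property (so consumers select the balanced proxy rather than a raw remote instance) gives the business code a single, transparent service reference.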
