I have been looking into this gist which provides a minimal functional implementation of channelled pub/sub style communication over websockets.
For multiple channels we can have a local hash of EM::Channel instances which can be created on the fly as per requirements. What I am concerned with is how can this setup be scaled to support a cluster of server instances or what alternatives are available to facilitate channeled pub/sub via web-sockets which are usable in clustered deployments ?
The Jet Protocol provides strict pub/sub (no-poll) semantics and is open source. It is far more powerful than subscribing to "Channels" (It is called "Fetching" in the Jet wording.
Related
I was wondering if the following IoT problem/use case would be an intended fit for using Apache NiFi:
I use NB-IoT/LTE-M as connectivity means for sending messages to an IoT cloud platform (e.g. AWS IoT Core, Azure IoT Hub or others). I need a protocol converter/gateway for the messages entering as UDP or TCP and leaving as MQTT. Of course I can develop an UDP/TCP listener/server that listens for the entering messages and publishing them to the desired IoT cloud platform (MQTT) broker. But I was thinking of eventual using Apache NiFi, as it has processors for UDP, TCP and MQTT. However, I was wondering if Apache NiFi is meant for these kind of (IoT) scenarios?
Thanks.
Guy
We are using Apache NiFi to ingest and route IoT data at scale. I had to write a custom processor because of a proprietary IoT protocol, however assembling the rest of the flow has been just drag and drop. Before you invest into developing your own UDP/TCP listener/server at least try NiFi and see if you can solve your problem. With NiFi you can design your directed graphs of data routing pretty fast and have a short learn feedback loop.
Further think about:
What will limit the ability to grow the system?
Which resource constraints are important to pay attention to? E.g. metric volume, velocity, variety, volatility
How big can it get? Do you need resiliency?
With clustered NiFi you can spread your workload to multiple instances and satisfy the growth and resiliency requirement. You can also merge data and throttle its volume to protect downstream systems. The capabilities of NiFi are very versatile.
To answer your question: yes, Apache NiFi is actively used for IoT scenarios. There is even a NiFi IoT tutorial on cloudera: https://www.cloudera.com/tutorials/nifi-in-trucking-iot.html
I am a backend developer and I would like to know what are the common technologies for building real-time servers. I know I could use a service like Firebase, but I really want to create it. I have some experience using Websockets on Java, but I would like to know more ways to achieve a real-time server. When I say real-time, I mean something like Facebook. I also would like to know how to scale real-time servers.
Thank you all!
I've asked the same in multiple forums. Common answer to this question is strangely enough still:
WebSocket
Socket.io
Server-Sent Events (SSE)
But those are mainly ways of transporting or streaming events to the clients. Something needs to be built on top of it. And there are multiple other things to consider, such as:
Considerations for real-time API's
What events to send to the client
How to send each client only the events they need
How to handle authorization for events
Where to keep state on the event subscriptions (for stateless services)
How to recover from missed events due to lost connections and service crashes
Producing events for search-, or pagination queries
How to scale
Publish/Subscribe solutions
There are multiple pub/sub solutions out there, such as:
Pusher
PubNub
SocketCluster
etc.
But because of the limitation of a topic based pub/sub architecture, some of the above questions are still left unanswered and has to be dealt with by yourself. Examples are lost connections, where Pusher has no fallback, neither does SocketCluster, and PubNub has a limited queue.
Resgate - Realtime API Gateway
An alternative to the traditional topic based pub/sub pattern is using a resource-aware realtime API Gateway, such as Resgate.
Instead of the client subscribing to topics, the gateway keeps track on which resources (objects or arrays) that the client has fetched, keeping the client data up to date until it unsubscribes.
As a developer of Resgate, I can really recommend checking it out as it solves all above question, is language agnostic, simple and light-weight, and blazingly fast.
Read more at NATS blog.
Scaling
Let's say you want to scale both in the number of concurrent clients and the number of events that is produced. You will eventually need to ensure each client only gets the data they are interested in through either traditional topic based publish/subscribe, or through resource subscriptions. All above solutions handles that.
I also assume all the above mentioned solutions scales concurrent clients by allowing you to add more nodes/servers that handles the persistent WebSocket connections.
With Resgate, first level of scaling is done by simply running multiple instances (it is a simple executable), and adding a load balancer that distributes the connection evenly between them:
Handling 100M concurrent clients
Let's say a single Resgate instance handles 10000 persistent WebSocket connections, and you can add 10000 Resgates (distributed to multiple data centers) to a single NATS Server. This would allow a total of 100M connections. Of course, depending on your data, you might have other scaling issues as well, such as network traffic ;) .
A second layer of scaling (and adding redundancy) would be to replicate the whole setup to different data centers, and have the services synchronize their data between the data centers using other tools like Kafka, CockroachDB, etc.
Scaling data retrieval
With the traditional publish/subscribe solution that only deals with events, you will also have to handle scaling for the HTTP (REST) requests.
With Resgate, this is not required, as resource data is also fetched over the WebSocket connection. This allows Resgate not only to ensure that resource data and events are synchronized (another issue with separate pub/sub solutions), but also that the data can be cached. If multiple clients requests the same data, Resgate will only need to fetch it from the service once, effectively improving scalability.
Butterfly Server .NET is a real-time server written in C# allowing you to create real-time apps. You can see the source at https://github.com/firesharkstudios/butterfly-server-dotnet.
I have been toying with the idea of using gRPC for 'IoT' type devices; not very constrained things like sensors; more like single board computer inbuilt devices like robots, drones and the like. What is needed a interface between device and centralized controller as the devices are developed separately and possible by other companies. So a versioned interface language is a must; and it should not be just in a word document; something programmable like a header file or WSDL or IDL or ProtocolBuffer. Also between device and controller the behavior should be specified for common use case like registration, re-registration etc.This can be in word file or in some non technical document.
The (rpc) interface specification in Protocol Buffer (ver 3) along with the efficient implementation via gRPC is leading me to choose this over CoAP/ LWM2M (Leshan Java and C++ implementation).
Having used both LWM2M and grPC, I would say that gRPC is more developer friendly; interface definition and implementation is fast,compared to OMA LWM2M process.Of course there is no Observer-Notify in gRPC, but for that MQTT should suffice.
Strictly one cannot compare LWM2M to gRPC. LWM2M is not just the interface but defines behavior in lot of IoT cases like BootStrap, Registration, KeepAlive , SW Upgrade etc and its universal HTTP like GET, PUT on an URL type addressable resource makes it very complete. However most of these behaviors can be custom defined with some effort.
Some of the IoT things which we plan to orchestrate are far from little brained devices like bulbs, and more like robots. Has anyone used gRPC for similar purposes. Any success of failure stories to share
The thing I would miss in gRPC for IoT are the MQTT MQ capabilities like queueing of messages, broker bridging QoS Parameter. Or for CoAP the Out of Band messages over SMS Transport. In this context gRPC is "just" an RPC framework which assumes to be always connected over TCP.
We were thinking in the same solution as Joe's, which includes CoAP + Protobuf:
Protobuf & its IDL are a really helpful way to define and govern our APIs in a centralized way. Google's api-linter as well as AIPs are high quality resources for keeping everything sane.
CoAP for transport: Lightweight protocol for IoT devices. It's available in practically every RTOS/embedded/platform and it's adapted to the low bandwitdh we usually face in this world. Being the protocol chosen by Thread and the support Project Connected Home Over IP is giving to it does also help.
Use Protobuf for the messages no matter they are Request/Response or even event messages pushed by the IoT device. An already solved problem by nanopb which is a protobuf wire compatible C code generator for embedded systems.
Generate our own stubs based on the IDL to create our own wrapper over CoAP and nanopb code. Unary calls are supported and also server streaming by leveraging this to the CoAP observable mechanisms.
I have taken the plunge and used it in a project with connected 'Devices'; These are small computers like Raspberry-pi. Overall it has been good experience; and languages used are C++ and Java mainly and also JavaScript in Node.js . We have used this as Dockerized microservices; Load Balancing is what we have not done; and I read that HTTP/2 based load balancing is tricky; Will update that part; planning to use Kubernetes for that. Overall Container technology with versioned interfaces - GRPC seems like a good fit for (micro) services
I use an esp32 and R_Pi with CoAP and protobufs. As far as I know gRPC is not supported for esp32/8266. I'm pretty happy with it, but didn't do any concrete testing against lwm2m. Implementation is here
Can I get the comparison between RabbitMQ and MSMQ. It will be helpful performance information on different factors are available.
I wrote a blog post a while back comparing MSMQ and RabbitMQ (among others):
http://mikehadlow.blogspot.co.uk/2011/04/message-queue-shootout.html
RabbitMQ gave slightly better performance than MSMQ, but both were comprehensively out performed by ZeroMQ. If performance is your main criteria, you should definitely look at ZeroMQ.
It's worth noting that RabbitMQ and MSMQ are very different beasts. MSMQ is a simple store-and-forward queue. It doesn't provide any messaging patterns, such as pub/sub, or routing. For anything beyond simple point-to-point messaging you'd probably want to use a service bus library such as NServiceBus or MassTransit on top of MSMQ.
RabbitMQ is a sophisticated server product that provides complex messaging patterns, topics and routing out-of-the-box. You also get centralized management and DR, something you'd have to implement yourself if you chose MSMQ.
I am trying to develop a publish/subscribe system.
To this end, I have read some papers and articles regarding it.
And they all talk about Messaging service as an integral part of publish/subscribe system.
My question is, can I develop a publish subscribe system without using MOM like JMS?
Or am I missing or oversimplifying things?
I do not think you are oversimplifying things. There are stand-alone products available that provide advanced functionality based on publish/subscribe, without being part of a larger MOM system.
One of them is a group of products implementing the Data Distribution Service (DDS) specification, as standardized by the Object Management Group (OMG). Check out this Wikipedia entry for a very brief introduction and list of references.
DDS supports many advanced data management features like a strong-typed and content aware databus, distributed state management and historical data access. Its rich set of Quality of Service settings allows to off-load a lot of the complexity from your applications to the middleware. This is all based on the publish/subscribe paradigm.
If you would tell more about your application, then I might be able to point you to similar use cases using this technology -- if you are interested.
It depends what you mean by "MOM". If you think MOM = JMS then yes, there are plenty of pub/sub applications which are not JMS servers (off the top of my head): 0MQ, TIBCO Rendezvous and the many AMQP implementations around.
I guess my definition of MOM is an infrastructure for reliably getting a message from one system to another in an asynchronous manner. Pub/sub is a feature on top of the message transport which allows a message to be distributed to multiple other systems. Once you get beyond the point of opening a socket and stuffing a bunch of bytes down it, I would argue you are in the realm of MOM.
So, no you don't need JMS to do pub/sub....there are plenty of open-source and closed-source alternatives out there. Which one depends on your requirements and skills.
You can look at multicast that provides one to many communication. Multicast does not require MOM, instead it requires multicast enabled IP network. Usually the network routers take care of creating copies of message and delivering messages to destinations.