How to create a UDP-based message broker service - Hadoop

I want to create a UDP-based message broker service.
I have a few dozen sources, each transmitting at a different rate; some of them stream and some forward their data in batches.
I want all the data to go to one destination: a Cloudera Hadoop cluster (running Red Hat 6.6) that will use Kafka/Flume as its message broker.
I need to create the in-between message broker service. It has to be robust and fault tolerant. It can receive the data from the sources using any protocol, but it has to forward the messages using UDP (or any one-way protocol; no ACK/SYN or any other response is allowed).
For that reason it has to use a PUSH mechanism, and the data cannot be pulled by the Hadoop cluster.
As far as I know, Kafka and Flume use TCP to forward messages. I found "udp-kafka-bridge" and "flume-udp-source", but I do not have any experience with them.
The message broker has to be robust and fault tolerant. It has to be able to deal with changing rates of incoming data, and it should preferably operate in near real time.
Do you have any recommendation for tools/architecture I should use?
thank you!
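For illustration only, here is a minimal Java sketch of the one-way push the question describes: each message goes out as a UDP datagram and nothing is read back. The host, port, and payload format are placeholders, not anything prescribed by the question.

```java
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

// Minimal one-way UDP push: no ACK/SYN, no response is ever read back.
// Delivery is best-effort by nature of UDP, so the broker in front of this
// would need its own buffering/retry policy if loss matters.
public class UdpPushSender {
    public static void main(String[] args) throws Exception {
        InetAddress target = InetAddress.getByName("hadoop-edge-node"); // hypothetical host
        int port = 5140;                                                // hypothetical port
        try (DatagramSocket socket = new DatagramSocket()) {
            byte[] payload = "{\"source\":\"sensor-1\",\"value\":42}"
                    .getBytes(StandardCharsets.UTF_8);
            DatagramPacket packet = new DatagramPacket(payload, payload.length, target, port);
            socket.send(packet); // fire-and-forget: the Hadoop side never replies
        }
    }
}
```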

Related

Minimizing bandwidth on multiple Tibco brokers / destinations

My experience with setting up Tibco infrastructure is minimal, so please excuse any misuse of terminology, and correct me where wrong.
I am a developer in an organization where I don't have access to how the backend is setup for Tibco.  However we have bandwidth issues between our regional centers, which I believe is due to how it's setup.
We have a producer that sends a message to multiple "regional" brokers.  However these won't always have a client who needs to subscribe to the messages.
I have 3 questions around this:
For destination bridges: https://docs.tibco.com/pub/ems/8.6.0/doc/html/GUID-174DF38C-4FDA-445C-BF05-0C6E93B20189.html
Is a bridge what would normally be used, to have a producer send the same message to multiple brokers/destinations or is there something else?
It's not clear in the documentation, if a bridge exists to a destination where there is no client consuming a message, does the message still get sent to that destination?  I.e., will this consume bandwidth even with no client wanting it?
If the above is true (and messages are only sent to destinations with a consumer), does this apply to both Topics and Message Selectors?
Is a bridge what would normally be used, to have a producer send the same message to multiple brokers/destinations or is there something else?
A bridge can be used to send messages from one destination to multiple destinations (queues or topics).
Alternatively, Topics can be used to send a message to multiple consumer applications. Topics are not the best solution if a high level of integrity is needed (no message loss, queuing, etc.).
It's not clear in the documentation, if a bridge exists to a destination where there is no client consuming a message, does the message still get sent to that destination? I.e., will this consume bandwidth even with no client wanting it?
If the bridge destination is a queue, messages will be put in the queue.
If the bridge destination is a Topic, messages will be distributed only if there are active consumer applications (or durable subscribers).
If the above is true (and messages are only sent to destinations with a consumer), does this apply to both Topics and Message Selectors?
This applies only to Topics (when there is no durable subscriber).
An alternative approach would be to use routing between EMS servers. In this approach, topic messages are sent to remote EMS servers only when there is a consumer connected to the remote EMS server (or if there is a durable subscriber).
https://docs.tibco.com/pub/ems/8.6.0/doc/html/GUID-FFAAE7C8-448F-4260-9E14-0ACA02F1ED5A.html
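To make the consumer-side behaviour concrete, here is a hedged JMS sketch of a durable topic subscriber with a message selector; the JNDI names, topic name, client ID, and selector are illustrative assumptions, not taken from the question.

```java
import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.Message;
import javax.jms.MessageConsumer;
import javax.jms.Session;
import javax.jms.Topic;
import javax.naming.InitialContext;

// Durable subscriber with a selector: the server retains matching messages
// for subscription "regionalSub" even while this client is disconnected.
public class DurableRegionalSubscriber {
    public static void main(String[] args) throws Exception {
        InitialContext ctx = new InitialContext(); // JNDI settings assumed to be configured
        ConnectionFactory cf = (ConnectionFactory) ctx.lookup("ConnectionFactory");
        Topic topic = (Topic) ctx.lookup("regional.events"); // hypothetical topic

        Connection conn = cf.createConnection();
        try {
            conn.setClientID("regional-client-1"); // required for durable subscriptions
            Session session = conn.createSession(false, Session.AUTO_ACKNOWLEDGE);
            MessageConsumer consumer =
                    session.createDurableSubscriber(topic, "regionalSub", "region = 'EMEA'", false);
            conn.start();
            Message msg = consumer.receive(5000); // wait up to 5 seconds
            System.out.println("Received: " + msg);
        } finally {
            conn.close();
        }
    }
}
```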

shared node wise queue

I am building a proxy server using Java. The application is deployed in Docker containers (multiple instances).
Below are the requirements I am working on:
Clients send HTTP requests to my proxy server.
The proxy server forwards those requests to the destination node server in the order they were received.
When the destination is not reachable, the proxy server stores those requests and forwards them when the destination becomes available again.
Similarly, when a request fails, the request is retried after "X" time.
I implemented a per-node queue (a HashMap keyed by node name, whose value holds the node's reachability status plus a queue of requests in arrival order).
The above solution works well when there is only one instance, but how can I solve this when there are multiple instances? Is there a shared data structure I can use - ActiveMQ, Redis, Kafka, something of that kind? (I am very new to shared memory/processing.)
Any help would be appreciated.
Thanks in advance.
Ajay
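One possible direction (an assumption, not something stated in the question or answer) is to back the per-node queue with Redis lists, so every proxy instance sees the same pending requests for a given destination node. A minimal sketch using the Jedis client, with hypothetical key names and endpoint:

```java
import redis.clients.jedis.Jedis;

// Shared per-node FIFO queue backed by Redis lists, visible to all proxy instances.
public class SharedNodeQueue {
    private final Jedis jedis = new Jedis("redis-host", 6379); // hypothetical Redis endpoint

    public void enqueue(String nodeName, String requestJson) {
        // RPUSH appends to the tail, preserving arrival order.
        jedis.rpush("pending:" + nodeName, requestJson);
    }

    public String dequeue(String nodeName) {
        // LPOP takes from the head; returns null when the queue is empty.
        return jedis.lpop("pending:" + nodeName);
    }

    public void markReachable(String nodeName, boolean reachable) {
        // Reachability flag shared across instances alongside the queue itself.
        jedis.set("reachable:" + nodeName, Boolean.toString(reachable));
    }
}
```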
There is an Open Source REST Proxy for Kafka based on Jetty which you might get some implementation ideas from.
https://github.com/confluentinc/kafka-rest
This proxy doesn't store messages itself because Kafka clusters are highly available for writes and there are typically a minimum of 3 Kafka nodes available for message persistence. The Kafka client in the proxy can be configured to retry if the cluster is temporarily unavailable for writes.
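For reference, that retry behaviour on a plain Java Kafka producer is driven by a few producer configuration properties; a hedged sketch follows, with broker addresses and topic name as placeholders.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

// Producer configured to retry sends while the cluster is temporarily unavailable.
public class RetryingProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "kafka1:9092,kafka2:9092,kafka3:9092"); // placeholders
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("acks", "all");                  // wait for all in-sync replicas
        props.put("retries", Integer.MAX_VALUE);   // keep retrying transient failures
        props.put("delivery.timeout.ms", 120000);  // overall bound on retrying a send

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("proxy-requests", "node-1", "{\"payload\":\"...\"}"));
        }
    }
}
```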

Difference between queue manager and message broker

What is the difference between a WebSphere Message Broker and a queue manager? I guess the queue manager puts messages in the queue, takes messages out of the queue, moves messages to backout queues, etc. So what is the job of the broker?
Does it sit between the publisher and the Queue Manager or between the consumer and the Queue Manager?
WebSphere MQ is software that uses the AMQ (asynchronous messaging) protocol. You can achieve asynchronous messaging between your applications via WebSphere MQ, which will make your infrastructure loosely coupled (applications can keep working even when other applications in the infrastructure are down).
But the applications in your infrastructure may not be able to understand each other's message formats, so just sending the message to the target application may not be enough. You may require transformation of the message.
You can do it by writing your own program using the Websphere MQ API.
Your program should be able to do the following:
1. Pick a message from a specific queue (using MQGET).
2. Understand the message. Say it is an XML message; your program must be able to parse the XML and read the data in it.
3. After reading the input message, build your output message based on the requirements.
4. Either publish the message or put it on some specific queue (say TargetQ) so that the target application can get the message. The target application will then get the message either by issuing MQGET on TargetQ or by subscribing to the topic published from your application.
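A rough sketch of what such a hand-written program might look like with the WebSphere MQ classes for Java; the queue manager name, queue names, and the transformation itself are made up for illustration.

```java
import com.ibm.mq.MQGetMessageOptions;
import com.ibm.mq.MQMessage;
import com.ibm.mq.MQPutMessageOptions;
import com.ibm.mq.MQQueue;
import com.ibm.mq.MQQueueManager;
import com.ibm.mq.constants.CMQC;

// Hand-rolled "mini broker": get a message, transform it, put it on TargetQ.
public class ManualTransformer {
    public static void main(String[] args) throws Exception {
        MQQueueManager qmgr = new MQQueueManager("QM1"); // hypothetical queue manager

        MQQueue inQueue = qmgr.accessQueue("SOURCE.Q", CMQC.MQOO_INPUT_AS_Q_DEF);
        MQMessage inMsg = new MQMessage();
        inQueue.get(inMsg, new MQGetMessageOptions());                 // step 1: MQGET
        String xml = inMsg.readStringOfByteLength(inMsg.getMessageLength()); // step 2: read payload

        String transformed = transform(xml);                          // step 3: build output message

        MQQueue outQueue = qmgr.accessQueue("TARGET.Q", CMQC.MQOO_OUTPUT);
        MQMessage outMsg = new MQMessage();
        outMsg.writeString(transformed);
        outQueue.put(outMsg, new MQPutMessageOptions());               // step 4: MQPUT to TargetQ

        inQueue.close();
        outQueue.close();
        qmgr.disconnect();
    }

    private static String transform(String xml) {
        // Placeholder for parsing the XML and mapping it to the target format.
        return xml;
    }
}
```

This is exactly the kind of plumbing the broker's built-in nodes save you from writing and maintaining yourself.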
But writing your own program will take a lot of development time and effort and also may be a bit complex.
So, IBM provided its own software to do the job: WebSphere Message Broker.
WMB allows you to create programs very easily and a lot faster.
Appropriate nodes in WMB will do all the above steps for you. In fact, it provides many more features than the above steps.
WebSphere MQ still doesn't have an HTTP listener, but Message Broker does. It allows you to host web services and build HTTP-based flows, and to do so in a secure way (it supports SSL).
MQ provides you the infrastructure for messaging: queues and topics (IBM MQ).
IBM Integration Bus (formerly known as WebSphere Message Broker) allows you to apply the common EAI patterns, e.g. routing and transformation.
Hope that helps.
Best,
Patrick
I want to add just two points: Message Broker (now IIB) includes a set of optimized and fast parsers (XML, CSV, etc) and useful mapping nodes (msg-msg, msg-db). MQ is also used for internal configuration messages coming from the Configuration Manager.
WebSphere MQ is a solution for application-to-application communication services regardless of where your applications or data reside. Whether on a single server, separate servers of the same type, or separate servers of different architecture types, WebSphere MQ facilitates communications between applications by sending and receiving message data via messaging queues. Applications then use the information in these messages to interact with Web browsers, business logic, and databases. WebSphere MQ provides a secure and reliable transport layer for moving data unchanged in the form of messages between applications but it is not aware of the content of the messages. WebSphere MQ uses a set of small and standard application programming interfaces (APIs) that support a number of programming languages, including Visual Basic, NATURAL, COBOL, Java, and C across all platforms.
WebSphere Message Broker is built to extend WebSphere MQ, and it is capable of understanding the content of each message that it moves through the Broker. Customers can define the set of operations on each message depending on its content. The message processing nodes supplied with WebSphere Message Broker are capable of processing messages from various sources, such as Java Message Service (JMS) providers, HyperText Transfer Protocol (HTTP) calls, or data read from files. By connecting these nodes with each other, customers can define linked operations on a message as it flows from one application to its destination.
Message Broker can do the following:
Matches and routes communications between services
Converts between different transport protocols
Transforms message formats between requestor and service
Identifies and distributes business events from disparate sources
Together, WebSphere MQ and WebSphere Message Broker deliver a comprehensive publish and subscribe facility, connecting Message Broker’s broad transport and format support to WebSphere MQ’s messaging backbone. WebSphere Message Broker extends the WebSphere MQ publish and subscribe functionality with advanced function such as content-based publish and subscribe by means of an enhanced Publication node. The two products share a common publish and subscribe domain for topic- and content-based operations
MQ is mainly for transporting messages from one system to another.
WMB (WebSphere Message Broker) sits between queue managers, transforming messages and changing their content and format as required by the target system or the business logic being implemented.
Srinu D

Build durable architecture with WebSphere MQ clients

How can you create a durable architecture environment using MQ client and server if the clients don't allow you to persist messages, nor do they allow for assured delivery?
Just trying to figure out how you can build a scalable/durable architecture if the clients don't appear to contain any of the components required to persist data.
Thanks,
S
Middleware messaging was born of the need to persist data locally to mitigate the effects of failures of the remote node or of the network. The idea at the time was that the queue manager was installed locally on the box where the application lives and was treated as part of the transport stack. For instance you might install TCP and WMQ as a transport and some apps would use TCP while others used WMQ.
In the intervening 20 years, the original problems that led to the creation of MQSeries (Now WebSphere MQ) have largely been solved. The networks have improved by several nines of availability and high availability hardware and software clustering have provided options to keep the different components available 24x7.
So the practices in widespread use today to address your question follow two basic approaches. Either make the components highly available so that the client can always find a messaging server, or put a QMgr where the application lives in order to provide local queueing.
The default operation of MQ is that when a message is sent (MQPUT or in JMS terms producer.send), the application does not get a response back on the MQPUT call until the message has reached a queue on a queue manager. i.e. MQPUT is a synchronous call, and if you get a completion code of OK, that means that the queue manager to which the client application is connected has received the message successfully. It may not yet have reached its ultimate destination, but it has reached the protection of an MQ Server, and therefore you can rely on MQ to look after the message and forward it on to where it needs to get to.
Whether client connected, or locally bound to the queue manager, applications sending messages are responsible for their data until an MQPUT call returns successfully. Similarly, receiving applications are responsible for their data once they get it from a successful MQGET (or JMS consumer.receive) call.
There are multiple levels of message protection available.
If you are using non-persistent messages and asynchronous PUTs, then you are effectively saying it doesn't matter too much whether the messages reach their destination (although they generally will).
If you want MQ to really look after your messages, use synchronous PUTs as described above, persistent messages, and perform your PUTs and GETs within transactions (aka syncpoint) so you have full application control over the commit points.
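In JMS terms, the "persistent messages inside a transaction" option from the last point might look roughly like the sketch below; the connection factory and queue name are placeholders.

```java
import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.DeliveryMode;
import javax.jms.MessageProducer;
import javax.jms.Queue;
import javax.jms.Session;
import javax.jms.TextMessage;

// Persistent, transacted send: the message is only committed to the queue
// manager's care once session.commit() returns successfully.
public class ProtectedSender {
    public static void send(ConnectionFactory cf, String queueName, String body) throws Exception {
        Connection conn = cf.createConnection();
        try {
            // true = transacted session (syncpoint); the ack mode is ignored for transacted sessions
            Session session = conn.createSession(true, Session.SESSION_TRANSACTED);
            Queue queue = session.createQueue(queueName);
            MessageProducer producer = session.createProducer(queue);
            producer.setDeliveryMode(DeliveryMode.PERSISTENT); // survive queue manager restarts

            TextMessage msg = session.createTextMessage(body);
            producer.send(msg);   // synchronous: returns once the queue manager has the message
            session.commit();     // application-controlled commit point
        } finally {
            conn.close();
        }
    }
}
```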
If you have very unreliable networks such that you expect to regularly fail to get the messages to a server, and expect to need regular retries such that you need client-side message protection, one option you could investigate is MQ Telemetry (e.g. in WebSphere MQ V7.1) which is designed for low bandwidth and/or unreliable network communications, as a route into the wider MQ.
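If the MQ Telemetry (MQTT) route is explored, a small Eclipse Paho sketch of a QoS 1 publish over an unreliable link might look like this; the broker URL, client ID, and topic are assumptions for illustration.

```java
import org.eclipse.paho.client.mqttv3.MqttClient;
import org.eclipse.paho.client.mqttv3.MqttConnectOptions;
import org.eclipse.paho.client.mqttv3.MqttMessage;

// QoS 1 publish: the client resends until the broker acknowledges the message.
public class TelemetryPublisher {
    public static void main(String[] args) throws Exception {
        MqttClient client = new MqttClient("tcp://mq-telemetry-host:1883", "sensor-42"); // placeholders
        MqttConnectOptions opts = new MqttConnectOptions();
        opts.setCleanSession(false);      // keep subscription/in-flight state across reconnects
        opts.setAutomaticReconnect(true); // reconnect after network drops
        client.connect(opts);

        MqttMessage msg = new MqttMessage("temperature=21.5".getBytes());
        msg.setQos(1);                    // at-least-once delivery
        client.publish("telemetry/sensor-42", msg);
        client.disconnect();
    }
}
```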

ActiveMQ network of brokers connectivity scheme

I need to scale up my ActiveMQ solution so I have defined a network of brokers.
I'm trying to figure out how to connect my producers and consumers to the cluster.
Does each producer have to be connected to a single broker (with the failover URI for availability)? In that case, how can I guarantee the distribution of traffic across the brokers? Do I need to configure the producers so that each connects to a different broker?
Should I apply the same scheme to the consumers?
This makes the application aware of the cluster topology, which I hope can be avoided by a decent cluster.
Tx
Tomer
I strongly suggest you carefully read through the documentation from activemq.apache.org on clustering ActiveMQ. There are a lot of very helpful tips.
From what you have written, I suggest you pay special attention to this; at the bottom of the page it details how you can control, from the server side, the failover/failback configuration for your producers.
For example:
updateClusterClients - if true pass information to connected clients about changes in the topology of the broker cluster
rebalanceClusterClients - if true, connected clients will be asked to rebalance across a cluster of brokers when a new broker joins the network of brokers
updateURIsURL - A URL (or path to a local file) to a text file containing a comma separated list of URIs to use for reconnect in the case of failure
In a production system, I would think that making use of updateURIsURL would make scaling out a lot less painful.
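On the client side, the failover transport mentioned in the question is just part of the broker URL; a hedged example follows, with made-up broker addresses.

```java
import javax.jms.Connection;
import org.apache.activemq.ActiveMQConnectionFactory;

// Client that can reconnect to any broker in the list and accept topology
// updates pushed by brokers configured with updateClusterClients=true.
public class ClusterAwareClient {
    public static void main(String[] args) throws Exception {
        String url = "failover:(tcp://broker1:61616,tcp://broker2:61616)"
                   + "?randomize=true&timeout=3000"; // broker hosts are placeholders
        ActiveMQConnectionFactory factory = new ActiveMQConnectionFactory(url);
        Connection conn = factory.createConnection();
        conn.start();
        // ... create sessions, producers, and consumers as usual ...
        conn.close();
    }
}
```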
