ZeroMQ Pub-Sub implement privileges for specific topics for specific subscribers - zeromq

Is it possible, in ZeroMQ Pub-Sub model, to filter out (disallow) some topic for specific subscribers for security reasons? If not, what other pattern could match such architecture?

I think the only way to get what you want -- while still using pub/sub sockets -- would be to create an xpub/xsub proxy server. You'd have a structure something like:
Privileged clients connecto the upstream publisher, while "restricted" clients connect through the proxy.
Here's an example proxy implementation in Python; a C/C++ solution would use similar logic:
import zmq
import random
import time
ctx = zmq.Context()
upstream = ctx.socket(zmq.XSUB)
downstream = ctx.socket(zmq.XPUB)
upstream.connect("tcp://localhost:3000")
downstream.bind("tcp://127.0.0.1:3001")
poller = zmq.Poller()
poller.register(upstream, zmq.POLLIN)
poller.register(downstream, zmq.POLLIN)
secret_topics = ["topic3"]
while True:
socks = dict(poller.poll())
if upstream in socks and socks[upstream] == zmq.POLLIN:
msg = upstream.recv_multipart()
# We've received a messages from the upstream
# publisher. Let's see if we should block it...
if msg[0].decode() in secret_topics:
# It's a secret message, don't publish it to
# our subscribers!
print("upstream !!", msg)
continue
# If we get this far, publish the message to our
# subscribers.
print("upstream ->", msg)
downstream.send_multipart(msg)
elif downstream in socks and socks[downstream] == zmq.POLLIN:
# This is a message FROM the subscibers TO the
# publisher (i.e., a subscription message)
msg = downstream.recv_multipart()
print("downstream ->", msg)
upstream.send_multipart(msg)
A SUB socket client will connect to this instead of the publisher, and the proxy will filter out messages that have topics matching an item in secret_topics.
The next question becomes, "how do I prevent the client from connecting to the upstream publisher?", to which the answer is probably to implement authentication so that only authorized clients can connect to the upstream publisher, and everything else connects to the filtering proxy (or require different auth for the proxy).

Related

How to connect queues to a ZeroMQ PUB/SUB

Consider the following:
a set of 3 logical services: S1, S2 and S3
two instances of each service are running, so we have the following processes: S1P1, S1P2, S2P1, S2P2, S3P1, S3P2
a ZeroMQ broker running in a single process and reachable by all service processes
A logical service, let's say S1, publishes a message M1 that is of interest to logical services S2 and S3. Only one process of each logical service must receive M1, so let's say S2P1 and S3P2.
I have tried the following, but without success:
broker thread 1 is running a XSUB/XPUB proxy
broker thread 2 is running a ROUTER/DEALER proxy with the ROUTER connected to the XPUB socket and subscribed to everything (for logical S1)
broker thread 3 is running a ROUTER/DEALER proxy with the ROUTER connected to the XPUB socket and subscribed to everything (for logical S2)
broker thread 4 is running a ROUTER/DEALER proxy with the ROUTER connected to the XPUB socket and subscribed to everything (for logical S3)
each logical service process is running a REP socket thread connected to the broker DEALER socket
I figured that the XSUB/XPUB proxy would give me publish/subscribe semantics and that the ROUTER/DEALER proxies would introduce a competition between the REP sockets for the messages sent by the XSUB/XPUB proxy.
How can I combine ZeroMQ sockets to accomplish this?
Update1
I know "without success" isn't helpful, I've tried different configurations and got different errors. The latest configuration I tried is the following:
(XSUB proxy=> XPUB) => (SUB copyLoop=> REQ) => (ROUTER proxy=> DEALER) => REP
The copyLoop goes like this:
public void start() {
context = ZMQ.context(1);
subSocket = context.socket(ZMQ.SUB);
subSocket.connect(subSocketUrl);
subSocket.subscribe("".getBytes());
reqSocket = context.socket(ZMQ.REQ);
reqSocket.connect(reqSocketUrl);
while (!Thread.currentThread().isInterrupted()) {
final Message msg = receiveNextMessage();
resendMessage(msg);
}
}
private Message receiveNextMessage() {
final String header = subSocket.recvStr();
final String entity = subSocket.recvStr();
return new Message(header, entity);
}
private void resendMessage(Message msg) {
reqSocket.sendMore(msg.getKey());
reqSocket.send(msg.getData(), 0);
}
The exception I get is the following:
java.lang.IllegalStateException: Cannot send another request
at zmq.Req.xsend(Req.java:51) ~[jeromq-0.3.4.jar:na]
at zmq.SocketBase.send(SocketBase.java:613) ~[jeromq-0.3.4.jar:na]
at org.zeromq.ZMQ$Socket.send(ZMQ.java:1206) ~[jeromq-0.3.4.jar:na]
at org.zeromq.ZMQ$Socket.sendMore(ZMQ.java:1189) ~[jeromq-0.3.4.jar:na]
at com.xyz.messaging.zeromq.SubReqProxyConnector.resendMessage(SubReqProxyConnector.java:47) ~[classes/:na]
at com.xyz.messaging.zeromq.SubReqProxyConnector.start(SubReqProxyConnector.java:35) ~[classes/:na]
I'm running JeroMQ 0.3.4, Oracle Java 8 JVM and Windows 7.
You seem to be adding in some complexity with your ROUTER connection - you should be able to do everything connected directly to your publisher.
The error you're currently running into is that REQ sockets have a strict message ordering pattern - you are not allowed to send() twice in a row, you must send/receive/send/receive/etc (likewise, REP sockets must receive/send/receive/send/etc). From what it looks like, you're just doing send/send/send/etc on your REQ socket without ever receiving a response. If you don't care about a response from your peer, then you must receive and discard it or use DEALER (or ROUTER, but DEALER makes more sense in your current diagram).
I've created a diagram of how I would accomplish this architecture below - using your basic process structure.
Broker T1 Broker T2 Broker T3 Broker T4
(PUB*)------>(*SUB)[--](DEALER*) -->(*SUB)[--](DEALER*) -->(*SUB)[--](DEALER*)
|_____________________||____| || | ||
|_____________________||_______________________||____| ||
|| || ||
========================|| ==================|| ===========||=
|| || || || || ||
|| || || || || ||
|| || || || || ||
(REP*) (REP*) (REP*) (REP*) (REP*) (REP*)
S1P1 S1P2 S2P1 S2P2 S3P1 S3P2
So, the main difference is that I've ditched your (SUB copyLoop=> REQ) step. Whether you choose XPUB/XSUB vs PUB/SUB is up to you, but I would tend to start simpler unless you currently want to make use of the extra features of XPUB/XSUB.
Obviously this diagram doesn't deal with how information enters your broker, where you currently show an XSUB socket - that's out of scope for the information you've provided thus far, presumably you're able to receive information into your broker successfully already so I won't deal with that.
I assume your broker threads that are dedicated to each service are making intelligent choices on whether to send the message to their service or not? If so, then your choice of having them subscribed to everything should work fine, otherwise more intelligent subscription setups might be necessary.
If you're using a REP socket on your service processes, then the service process must take that message and deal with it asynchronously, never communicating back any details about that message to the broker. It must then respond to each message with an acknowledgement (like "RECEIVED") so that it follows the strict receive/send/receive/send pattern for REP sockets.
If you want any other type of communication about how the service handles that message sent back to the broker, REP is no longer the appropriate socket type for your service processes, and DEALER may no longer be the correct socket type for your broker. If you want some form of load balancing so that you send to the next open service process, you'll need to use ROUTER/REQ and have each service indicate its availability and have the broker hold on to the message until the next service process says its available by sending results back. If you want some other type of message handling, you'll have to indicate what that is so a suitable architecture can be proposed.
Clearly I got mixed up with a few elements:
Sockets have the same API whether you're using it as a client-side socket (Socket.connect) or a server-side socket (Socket.bind)
Sockets have the same API regardless of the type (e.g. Socket.subscribe should not be called on a PUSH socket)
Some socket types require a send/receive response loop (e.g. REQ/REP)
Some nuances in communication patterns (PUSH/PULL vs ROUTER/DEALER)
The difficulty (impossiblity?) in debugging a ZeroMQ setup
So a big thanks to Jason for his incredibly detailed answer (and awesome diagram!) that pointed me to the right direction.
I ended up with the following design:
broker thread 1 is running a fan-out XSUB/XPUB proxy on bind(localhost:6000) and bind(localhost:6001)
broker thread 2 is running a queuing SUB/PUSH proxy on connect(localhost:6001) and bind(localhost:6002); broker threads 3 and 4 use a similar design with different bind port numbers
message producers connect to the broker using a PUB socket on connect(localhost:6000)
message consumers connect to the broker queuing proxy using a PULL socket on connect(localhost:6002)
On top of this service-specific queuing mechanism, I was able to add a similar service-specific fan-out mechanism rather simply:
broker thread runs a SUB/PUB proxy on connect(localhost:6001) and bind(localhost:6003)
message producers still connect to the broker using a PUB socket on connect(localhost:6000)
message consumers connect to the broker fan-out proxy using a SUB socket on connect(localhost:6003)
This has been an interesting ride.

Polling if I can PUSH or send in zmq?

By using 0mq, I am trying to detect if I have made a successful connection to a PULL port, and if I can PUSH. However, it didn't work as I had expected, see the example code below. Poller will return immediately even remote peer hasn't been started to accept connections. Is there a way to fix it?
import sys
import zmq
context = zmq.Context()
pusher = context.socket(zmq.PUSH)
pusher.connect("tcp://localhost:5555")
poller = zmq.Poller()
poller.register(pusher, zmq.POLLOUT)
socks = dict(poller.poll(timeout=1000))
if pusher in socks and socks[pusher] == zmq.POLLOUT:
print("Pusher can push")
else:
print("Failed to connect, exit.")
sys.exit(1)
You would be allowed to send as long as you haven't reached the High Water Mark ( HWM ) of the sending socket - the number of messages allowed to pile up on the sender side.
By default it is set to 1000 as far as I remember.
/Søren

zeromq pattern for multi client pull receive from one server pushing socket

I want to use zeromq for several clients labeled as 1...n to pull receive from one same socket in server pushing messages containing one field as client id. When server push message labeled 1, only client 1 receives it.
one way is to generate the same number of server sockets for clients to one by one connect, which I think there may be a better solution. Thanks a lot.
Use "Publish/Subscribe" pattern and set socket options for filtering messages.
Code for SUB side:
context = zmq.Context()
socket = context.socket(zmq.SUB);
socket.connect ("tcp://localhost:%s" % port)
topicfilter = "10001"
socket.setsockopt(zmq.SUBSCRIBE, topicfilter)
string = socket.recv()
Code for PUB side:
context = zmq.Context()
socket = context.socket(zmq.PUB)
socket.bind("tcp://*:%s" % port)
topic = 10001
messagedata = random.randrange(1,215) - 80
socket.send("%d %d" % (topic, messagedata))
Examples and pattern describe here
Update
Another pattern which you may use Router-Req.
Common idea:
Examples for python here

ZeroMQ: Many-to-one no-reply aynsc messages

I have read through the zguide but haven't found the kind of pattern I'm looking for:
There is one central server (with known endpoint) and many clients (which may come and go).
Clients keep sending hearbeats to the server, but they don't want the server to reply.
Server receives heartbeats, but it does not reply to clients.
Hearbeats sent when clients and server are disconnected should somehow be dropped to prevent a heartbeat flood when they go back online.
The closet I can think of is the DEALER-ROUTER pattern, but since this is meant to be used as an async REQ-REP pattern (no?), I'm not sure what would happen if the server just keep silent on incoming "requests." Also, the DEALER socket would block rather then start dropping heartbeats when the send High Water Mark is reached, which would still result in a heartbeat flood.
The PUSH/PULL pattern should give you what you need.
# Client example
import zmq
class Client(object):
def __init__(self, client_id):
self.client_id = client_id
ctx = zmq.Context.instance()
self.socket = ctx.socket(zmq.PUSH)
self.socket.connect("tcp://localhost:12345")
def send_heartbeat(self):
self.socket.send(str(self.client_id))
# Server example
import zmq
class Server(object):
def __init__(self):
ctx = zmq.Context.instance()
self.socket = ctx.socket(zmq.PULL)
self.socket.bind("tcp://*:12345") # close quote
def receive_heartbeat(self):
return self.socket.recv() # returns the client_id of the message's sender
This PUSH/PULL pattern works with multiple clients as you wish. The server should keep an administration of the received messages (i.e. a dictionary like {client_id : last_received} which is updated with datetime.utcnow() on each received message. And implement some housekeeping function to periodically check the administration for clients with old timestamps.

Publisher finishes before subscriber and messages are lost - why?

Fairly new to zeromq and trying to get a basic pub/sub to work. When I run the following (sub starting before pub) the publisher finishes but the subscriber hangs having not received all the messages - why ?
I think the socket is being closed but the messages have been sent ? Is there a way of ensuring all messages are received ?
Publisher:
import zmq
import random
import time
import tnetstring
context=zmq.Context()
socket=context.socket(zmq.PUB)
socket.bind("tcp://*:5556")
y=0
for x in xrange(5000):
st = random.randrange(1,10)
data = []
data.append(random.randrange(1,100000))
data.append(int(time.time()))
data.append(random.uniform(1.0,10.0))
s = tnetstring.dumps(data)
print 'Sending ...%d %s' % (st,s)
socket.send("%d %s" % (st,s))
print "Messages sent: %d" % x
y+=1
print '*** SERVER FINISHED. # MESSAGES SENT = ' + str(y)
Subscriber :-
import sys
import zmq
import tnetstring
# Socket to talk to server
context = zmq.Context()
socket = context.socket(zmq.SUB)
socket.connect("tcp://localhost:5556")
filter = "" # get all messages
socket.setsockopt(zmq.SUBSCRIBE, filter)
x=0
while True:
topic,data = socket.recv().split()
print "Topic: %s, Data = %s. Total # Messages = %d" % (topic,data,x)
x+=1
In ZeroMQ, clients and servers always try to reconnect; they won't go down if the other side disconnects (because in many cases you'd want them to resume talking if the other side comes up again). So in your test code, the client will just wait until the server starts sending messages again, unless you stop recv()ing messages at some point.
In your specific instance, you may want to investigate using the socket.close() and context.term(). It will block until all the messages have been sent. You also have the problem of a slow joiner. You can add a sleep after the bind, but before you start publishing. This works in a test case, but you will want to really understand what is the solution vs a band-aid.
You need to think of the PUB/SUB pattern like a radio. The sender and receiver are both asynchronous. The Publisher will continue to send even if no one is listening. The subscriber will only receive data if it is listening. If the network goes down in the middle, the data will be lost.
You need to understand this in order to design your messages. For example, if you design your messages to be "idempotent", it doesn't matter if you lose data. An example of this would be a status type message. It doesn't matter if you have any of the previous statuses. The latest one is correct and message loss doesn't matter. The benefits to this approach is that you end up with a more robust and performant system. The downsides are when you can't design your messages this way.
Your example includes a type of message that requires no loss. Another type of message would be transactional. For example, if you just sent the deltas of what changed in your system, you would not be able to lose the messages. Database replication is often managed this way which is why db replication is often so fragile. To try to provide guarantees, you need to do a couple things. One thing is to add a persistent cache. Each message sent needs to be logged in the persistent cache. Each message needs to be assigned a unique id (preferably a sequence) so that the clients can determine if they are missing a message. A second socket (ROUTER/REQ) needs to be added for the client to request the missing messages individually. Alternatively, you could just use the secondary socket to request resending over the PUB/SUB. The clients would then all receive the messages again (which works for the multicast version). The clients would ignore the messages they had already seen. NOTE: this follows the MAJORDOMO pattern found in the ZeroMQ guide.
An alternative approach is to create your own broker using the ROUTER/DEALER sockets. When the ROUTER socket saw each DEALER connect, it would store its ID. When the ROUTER needed to send data, it would iterate over all client IDs and publish the message. Each message should contain a sequence so that the client can know what missing messages to request. NOTE: this is a sort of reimplementation of Kafka from linkedin.

Resources