I was writing a proof of concept for long-polling in Go.
The package generally used for this seems to be https://github.com/jcuga/golongpoll .
But suppose I want to publish an event to the golongpoll.SubscriptionManager from an arbitrary context, particularly when the long-poll API request may be served by one machine while the Kafka event for that consumer group is consumed by another instance in the cluster.
The examples in the documentation do not cover this scenario at all, even though it seems like a common one. One way I can think of is to put a distributed cache like Redis in between and have all the services poll it for changes, but that sounds a bit dumb to me.
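For what it's worth, the Redis idea may be less dumb than it sounds if you subscribe instead of poll: every instance subscribes to a Redis pub/sub channel and republishes into its local long-poll manager. Below is a minimal sketch, assuming the go-redis client and the golongpoll API (StartLongpoll / Publish / SubscriptionHandler); the channel name "events", the category "updates", and the addresses are placeholders, not anything from the original question.

```go
package main

import (
	"context"
	"log"
	"net/http"

	"github.com/jcuga/golongpoll"
	"github.com/redis/go-redis/v9"
)

func main() {
	// Each instance runs its own long-poll manager for its locally connected clients.
	manager, err := golongpoll.StartLongpoll(golongpoll.Options{})
	if err != nil {
		log.Fatalf("failed to create longpoll manager: %v", err)
	}

	// Every instance subscribes to the same Redis channel, so an event
	// published by any instance reaches the clients of all instances.
	rdb := redis.NewClient(&redis.Options{Addr: "localhost:6379"}) // placeholder address
	sub := rdb.Subscribe(context.Background(), "events")           // placeholder channel name

	go func() {
		for msg := range sub.Channel() {
			// Fan the cluster-wide event out to this instance's long-poll subscribers.
			manager.Publish("updates", msg.Payload)
		}
	}()

	http.HandleFunc("/events", manager.SubscriptionHandler)
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```

The instance that happens to consume the Kafka event would then call rdb.Publish(ctx, "events", payload), and the long-poll clients on every instance would see it.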
Related
I'm using RabbitMQ in my pet project (Spring Boot based). In a @Configuration class I declare beans like Queue, Binding, and DirectExchange, so when I run the application, all these exchanges, bindings, and queues are created automatically. I'm wondering whether this is the correct way to configure these RabbitMQ-related "entities". Should I separate this into a step that runs before application startup? For example, calling a series of curl commands against the management HTTP API to create all the needed queues (with their exchanges and bindings) before the application starts. What are the best practices for creating/configuring this routing-related infrastructure?
The thing is, there is no single way of using RabbitMQ. However, there are a few questions I always ask myself before working with any broker; I'll apply them to your question here.
Let's question the approach:
Regarding the creation of exchanges, bindings, queues, etc.:
Try to understand whether the elements you are using are durable. If so, you can create them in your code on startup and apply a simple health check. You also want to check whether your RabbitMQ server persists its data. If not, you'll HAVE to recreate your queues, exchanges, and bindings every time.
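The question is Spring Boot, but the declare-on-startup idea is framework-agnostic. Here is a minimal sketch in Go with the amqp091-go client; the URL and entity names are placeholders. Re-declaring an entity with identical parameters is a no-op in RabbitMQ, which is what makes running this on every boot safe, and it doubles as a health check of the broker connection.

```go
package main

import (
	"log"

	amqp "github.com/rabbitmq/amqp091-go"
)

// declareTopology idempotently creates the durable entities on startup.
func declareTopology(ch *amqp.Channel) error {
	// durable=true, autoDelete=false: survives broker restarts
	// (as long as the server persists its data).
	if err := ch.ExchangeDeclare("user", "direct", true, false, false, false, nil); err != nil {
		return err
	}
	if _, err := ch.QueueDeclare("user:crud", true, false, false, false, nil); err != nil {
		return err
	}
	return ch.QueueBind("user:crud", "user.cmd", "user", false, nil)
}

func main() {
	conn, err := amqp.Dial("amqp://guest:guest@localhost:5672/") // placeholder URL
	if err != nil {
		log.Fatalf("broker unreachable: %v", err)
	}
	defer conn.Close()

	ch, err := conn.Channel()
	if err != nil {
		log.Fatal(err)
	}
	if err := declareTopology(ch); err != nil {
		log.Fatalf("topology declaration failed: %v", err)
	}
	log.Println("RabbitMQ topology ready")
}
```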
Regarding routing & binding queues to exchanges:
There are two major questions you need to consider:
How much does latency matter?
If latency does matter, use direct exchanges as much as you can. The reason is simple: the message goes straight from exchange to queue instead of having to be routed by pattern matching. Routing adds latency, never forget! If those few extra milliseconds won't make a difference for you, keep the following question in mind when deciding how to define your exchanges, bindings, and queues.
How will I use my broker?
Some people use their broker simply to pub/sub messages. This is a perfectly reasonable use case, and a fanout exchange would be the most viable option for it. If you're trying to minimize the number of queues you create, a topic exchange may be interesting as well.
More importantly, are you using your broker exclusively for between-service communication, or is your service going to be both the producer and the consumer? Hell, is it a mix of the two? This boundary needs to be defined clearly, or you'll end up in a mess where you suddenly notice you're consuming messages that could actually have been handled internally by another library, or simply by passing arguments to functions.
Example of how to apply it:
Case: we have a logging service & user service:
1. How will I use my broker:
between-service communication.
2. Which messages do I need to get across?
CRUD operations
login/logout
3. What would my routing table look like (draft)?
| Exchange Name | Exchange Type | Binding          | Queue      |
|---------------|---------------|------------------|------------|
| user          | topic         | user.cmd.*       | user:crud  |
| user          | topic         | user.event.login | user:login |
In the table above you can see that all CRUD operations can be handled by one simple queue. Is this the most efficient approach? That depends on your service. It may be better for user.cmd.create to go to its own queue, say user:created. This is another boundary you'll need to define.
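As a sketch of what that table means in code, here is the draft topology declared with Go's amqp091-go client (ch is assumed to be an already-open *amqp.Channel, as in the earlier startup sketch; the names follow the draft table):

```go
package topology

import (
	amqp "github.com/rabbitmq/amqp091-go"
)

// bindUserTopology declares the draft routing table above:
// one topic exchange "user" with two bound queues.
func bindUserTopology(ch *amqp.Channel) error {
	if err := ch.ExchangeDeclare("user", "topic", true, false, false, false, nil); err != nil {
		return err
	}
	// All CRUD commands (user.cmd.create, user.cmd.delete, ...) share one queue.
	if _, err := ch.QueueDeclare("user:crud", true, false, false, false, nil); err != nil {
		return err
	}
	if err := ch.QueueBind("user:crud", "user.cmd.*", "user", false, nil); err != nil {
		return err
	}
	// Login events get their own queue.
	if _, err := ch.QueueDeclare("user:login", true, false, false, false, nil); err != nil {
		return err
	}
	return ch.QueueBind("user:login", "user.event.login", "user", false, nil)
}
```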
Something that also needs to be mentioned: use your queues and routing keys as pieces of information. Debugging a microservice architecture can be hellish, so applying a general naming convention is most appropriate. There is no single naming convention; it depends on your use case, once again.
Conclusion:
In general, the best practice with any broker is to clearly define the scope of the broker and its underlying elements. If this is not done properly, it does not matter whether you run a health check on startup. It's easy to get lost in the complexity and interesting features RabbitMQ offers. Try to keep it as simple as possible at first and ask yourself: "is it going to cost me a lot of time to refactor/debug/fix later?" If the answer is yes, go back to the first sentence of this paragraph.
Documentation:
AMQP concepts: https://www.rabbitmq.com/tutorials/amqp-concepts.html
Some general best practices: https://www.cloudamqp.com/blog/part2-rabbitmq-best-practice-for-high-performance.html
Routing: https://www.rabbitmq.com/tutorials/tutorial-four-python.html
Latency & throughput: https://blog.rabbitmq.com/posts/2012/05/some-queuing-theory-throughput-latency-and-bandwidth
I am working on a microservice architecture. One of my services is exposed to a source system, which uses it to post data. This microservice publishes the data to Redis using Redis pub/sub, from which it is consumed by a couple of other microservices.
Now, if one of those microservices is down and unable to process the data from Redis pub/sub, I have to retry with the published data when the microservice comes back up. The source cannot push the data again, and manual intervention is not possible, so I thought of three approaches:
1. Additionally, use Redis itself for storing and retrieving the data.
2. Use a database to store the data before publishing. I have many source and target microservices that use Redis pub/sub, and with this approach I would have to insert every request into the DB first and then its response status. I would also have to use a shared database; this adds a couple more exception-handling cases and doesn't look very efficient to me.
3. Use Kafka in place of Redis pub/sub. But traffic is low, which is why I chose Redis pub/sub, and it is not feasible to change now.
In the first two cases, I would have to use a scheduler, and there is a time window within which I have to retry, or else subsequent requests will fail.
Is there any other way to handle these cases?
For point 2:
- Store the data in the DB.
- Create a daemon process that processes the data from that table.
- The daemon process can be configured to suit your needs.
- The daemon process polls the DB and publishes any pending data, then deletes each record once it has been published.
Not in a microservice architecture, but I have seen this approach work efficiently when communicating with third-party services.
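A minimal sketch of such a daemon in Go, assuming the go-redis client, a database/sql handle, and a hypothetical outbox table with id and payload columns (all names here are placeholders):

```go
package outbox

import (
	"context"
	"database/sql"
	"log"
	"time"

	"github.com/redis/go-redis/v9"
)

// runDaemon polls the table, publishes each pending row to Redis,
// and deletes a row only after its publish has succeeded.
func runDaemon(ctx context.Context, db *sql.DB, rdb *redis.Client) {
	ticker := time.NewTicker(5 * time.Second) // poll interval: tune as needed
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			publishPending(ctx, db, rdb)
		}
	}
}

func publishPending(ctx context.Context, db *sql.DB, rdb *redis.Client) {
	rows, err := db.QueryContext(ctx, "SELECT id, payload FROM outbox ORDER BY id LIMIT 100")
	if err != nil {
		log.Printf("poll failed: %v", err)
		return
	}
	defer rows.Close()
	for rows.Next() {
		var id int64
		var payload string
		if err := rows.Scan(&id, &payload); err != nil {
			log.Printf("scan failed: %v", err)
			return
		}
		if err := rdb.Publish(ctx, "events", payload).Err(); err != nil {
			log.Printf("publish failed, row stays for the next tick: %v", err)
			return
		}
		// Delete only after a successful publish, so nothing is lost.
		if _, err := db.ExecContext(ctx, "DELETE FROM outbox WHERE id = $1", id); err != nil {
			log.Printf("delete failed: %v", err)
		}
	}
}
```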
At the very outset, as you mentioned, we do indeed seem to have only three possibilities.
This is one of those situations where you want a handshake from the service after pushing and after processing. To accomplish that, a middleware queuing system is the right shot.
Although a bit more complex to set up, you can use Kafka for streaming this. Configuring producers and consumer groups properly will help you do the job smoothly.
Using a DB for storage would be overkill, considering that in your situation the data is to be processed, not permanently persisted.
BUT, alternatively, storing the data in Redis and reading it in a cron/scheduled job would make your job much simpler. Once the job has run successfully, you may remove the data from the cache and thus save Redis memory.
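A sketch of that idea in Go, assuming the go-redis client; the channel and list names are placeholders. The pub/sub publish serves live consumers, while a Redis list keeps a persistent copy for the scheduled job:

```go
package backlog

import (
	"context"

	"github.com/redis/go-redis/v9"
)

// PublishWithBacklog fires the pub/sub message for live consumers and
// also keeps a copy in a Redis list for the scheduled retry job.
func PublishWithBacklog(ctx context.Context, rdb *redis.Client, payload string) error {
	if err := rdb.Publish(ctx, "events", payload).Err(); err != nil {
		return err
	}
	return rdb.RPush(ctx, "events:backlog", payload).Err()
}

// Drain is the scheduled job: it re-delivers backlog entries to a handler
// and removes each entry from Redis once handled, freeing memory.
func Drain(ctx context.Context, rdb *redis.Client, handle func(string) error) {
	for {
		payload, err := rdb.LPop(ctx, "events:backlog").Result()
		if err == redis.Nil {
			return // backlog is empty
		}
		if err != nil || handle(payload) != nil {
			// Put the entry back and stop; the next scheduled run retries it.
			rdb.LPush(ctx, "events:backlog", payload)
			return
		}
	}
}
```

If you later need ordering or consumer-group semantics, a Redis Stream (XADD/XREADGROUP) is the more robust variant of the same idea.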
If you can comment with more details on the architecture and the implementation, I can update my answer accordingly. :)
I have a REST service; all its requests are persisted to its own relational database. So far, so good. But there is also a small piece of business functionality (email notification, SMS alert) that should run on the newly received/updated data. For this process to work on the data in the background, it needs some way to learn about the persisted data; a message queue would fix the problem. I see three common ways of designing this:
1. The REST service inserts into the database and also publishes to the queue.
The problem here is the distributed transaction: combining two different resource types, the relational database and the queue, in one transaction. Some tools may support this; some may not.
2. As usual, the REST service persists only to its database. Additionally, it inserts the data into another table, which a scheduled job queries and publishes to the queue (from which the background job picks up its work).
The problem I see is the scheduler: not reactive, batch processing, limited by its time slot, not real-time, slow, and so on.
3. The REST endpoint publishes the data directly to a topic. One consumer persists it to the database, while another processes it in the background.
Something like event sourcing. To my understanding, this is a bit complex to implement as the number of services grows. Also, if the DB is down, the persisting service would fail to save the data, yet the background service (say, the emailer) would still send the email, which is functionally wrong. This may lead to inconsistency among the services, functional inconsistency included.
I have also thought of reading the database transaction logs, but that seems more complex and requires tooling and configuration to make it work; it also seems better suited to data-processing systems than to our use case.
What are your thoughts on this? Did I miss anything? How do you manage such scenarios? What should I look out for? Thinking reactive, say with Vert.x?
Apologies if this looks very naive, but I have to ask.
I think the best approach is number 2, with a CDC (change data capture) system like Debezium.
See https://microservices.io/patterns/data/transactional-outbox.html
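The heart of the outbox pattern is that the business write and the outbox write share one local transaction, so a message can never be published for a write that never happened (and vice versa). A minimal Go sketch with database/sql, assuming hypothetical users and outbox tables; a CDC tool like Debezium, or a polling job, then turns outbox rows into queue messages:

```go
package outbox

import (
	"context"
	"database/sql"
)

// CreateUser commits the business row and the outbox row atomically.
func CreateUser(ctx context.Context, db *sql.DB, name string) error {
	tx, err := db.BeginTx(ctx, nil)
	if err != nil {
		return err
	}
	defer tx.Rollback() // no-op once Commit has succeeded

	// The business write...
	if _, err := tx.ExecContext(ctx, "INSERT INTO users(name) VALUES($1)", name); err != nil {
		return err
	}
	// ...and the message it implies, in the same transaction.
	if _, err := tx.ExecContext(ctx,
		"INSERT INTO outbox(topic, payload) VALUES($1, $2)", "user.created", name); err != nil {
		return err
	}
	return tx.Commit()
}
```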
I usually recommend option 3 if you don't need immediate read-after-write consistency. The background job should retry if the database record has not yet been updated by the time it processes the message.
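A sketch of that retry in Go, assuming database/sql and hypothetical table and column names; the backoff values are arbitrary:

```go
package consumer

import (
	"context"
	"database/sql"
	"fmt"
	"time"
)

// processWithRetry backs off until the database row the message refers to
// becomes visible, then performs the side effect (email, SMS, ...).
func processWithRetry(ctx context.Context, db *sql.DB, userID int64, send func() error) error {
	backoff := 100 * time.Millisecond
	for attempt := 0; attempt < 8; attempt++ {
		var name string
		err := db.QueryRowContext(ctx, "SELECT name FROM users WHERE id = $1", userID).Scan(&name)
		if err == nil {
			return send() // the record is visible; safe to do the real work
		}
		if err != sql.ErrNoRows {
			return err // a genuine failure, not just write lag
		}
		time.Sleep(backoff)
		backoff *= 2
	}
	return fmt.Errorf("user %d still not visible after retries", userID)
}
```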
Your post exemplifies why queues shouldn't be used for these types of scenarios. They are good for delivering analytical data or logs, but for task orchestration developers have to reinvent the wheel every time.
The much better approach is to use a task orchestration system like Cadence Workflow, which eliminates the issues you described and makes multi-service orchestration much simpler.
See this presentation that explains the Cadence programming model.
I have an environment with only one app server. Some of my messages take a while to service (10 seconds or so), and I'd like to increase throughput by running multiple instances of my consumer application to process these messages. I've read about the "competing consumer" pattern and gathered that it should be avoided when using MassTransit. According to the MassTransit docs, each receive endpoint should have a unique queue name. I'm struggling to understand how to map this recommendation to my environment. Is it possible to have N consumer instances running where each receives the same message, but only one of them actually acts on it? In other words, can we implement the "competing consumer" pattern across multiple queues instead of one?
Or am I looking at this wrong? Do I really need to look into the "Send" method as opposed to "Publish"? The downside of "Send" is that it requires the sender to have direct knowledge of the endpoint's existence, and I want to be dynamic about the number of consumers/endpoints I have. Is there anything built into MassTransit that could help keep track of how many consumer instances/queues/endpoints can service a particular message type?
Thanks,
Andy
so the "avoid competing consumers" guidance was from when MSMQ was the primary transport. MSMQ would fall over if multiple threads where reading from the queue.
If you are using RabbitMQ, then competing consumers work brilliantly. Competing consumers is the right answer. Each competing consume will use the same receive from endpoint.
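MassTransit gives you this as soon as every instance configures the same receive endpoint name, but the underlying RabbitMQ behavior can be sketched directly with the Go client (queue name and URL are placeholders). Run the same program on N instances: because they all consume from one shared queue, the broker delivers each message to exactly one of them, and prefetch=1 stops a second 10-second message from queuing up behind a busy instance.

```go
package main

import (
	"log"

	amqp "github.com/rabbitmq/amqp091-go"
)

func main() {
	conn, err := amqp.Dial("amqp://guest:guest@localhost:5672/") // placeholder URL
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	ch, err := conn.Channel()
	if err != nil {
		log.Fatal(err)
	}

	// Shared, durable queue: using the same name on every instance
	// is what makes the consumers "compete".
	q, err := ch.QueueDeclare("work", true, false, false, false, nil)
	if err != nil {
		log.Fatal(err)
	}
	// prefetch=1: don't take a second message while a slow one is in flight.
	ch.Qos(1, 0, false)

	msgs, err := ch.Consume(q.Name, "", false, false, false, false, nil)
	if err != nil {
		log.Fatal(err)
	}
	for m := range msgs {
		handle(m.Body) // the 10-second unit of work
		m.Ack(false)   // ack only after the work is done
	}
}

func handle(body []byte) { log.Printf("processed: %s", body) }
```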
I am running an Apache server that receives HTTP requests and connects to a daemon script over ZeroMQ. The script implements the Multithreaded Server pattern (http://zguide.zeromq.org/page:all#header-73): it receives the request, dispatches it to one of its worker threads, performs the action, and responds back to the server, which responds back to the client. Everything is done synchronously, as the client needs to receive a success or failure response to its request.
As the number of users grows into the thousands, I am looking into potentially improving this. The first thing I looked at is the different ZeroMQ patterns, and whether the one I am using is optimal for my scenario. I've read the guide, but I find it challenging to understand all the details of and differences between the patterns. I was looking, for example, at the Load Balancing Message Broker pattern (http://zguide.zeromq.org/page:all#header-73). It seems quite a bit more complicated to implement than what I am currently using, and if I understand things correctly, its advantages are:
Actual load balancing vs the round-robin task distribution that I currently have
Asynchronous requests/replies
Is that everything? Am I missing something? Given the description of my problem and its synchronous requirement, what would you say is the best pattern to use? Lastly, how would the answer change if I wanted to make my setup distributed (i.e., have the Apache server load-balance requests across different machines)? I was thinking of doing that by simply creating yet another layer, based on the Multithreaded Server pattern, and having that layer bridge the communication between the web server and my workers.
Some thoughts about the subject...
Keep it simple
I would try to keep things simple and stay with "plain" ZeroMQ as long as possible. To increase performance, I would simply change your backend script to send requests out through a DEALER socket and move the request-handling code into its own program. Then you could run multiple worker servers on different machines to get more requests handled (see the sketch at the end of this section).
I assume this was the approach you took:
I was thinking of doing that by simply creating yet another layer, based on the Multithreaded Server pattern, and have that layer bridge the communication between the web server and my workers.
The only problem here is that there is no request retry in the backend: if a worker fails to handle a given task, that task is lost forever. However, the worker servers could be written so that they finish handling all the requests they have already received before shutting down. With that kind of setup it is possible to update backend workers without clients noticing any outage. It will not, however, save requests that are lost if a server crashes.
I have the feeling that in common scenarios this kind of approach would be more than enough.
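Here is a minimal sketch of that extra layer, assuming the pebbe/zmq4 Go binding: a ROUTER socket faces the web-server side, a DEALER socket faces the workers, and zmq.Proxy bridges them with fair queuing, so each REQ/REP exchange stays synchronous per client while the work spreads across machines. Ports and host names are placeholders.

```go
package main

import (
	"log"

	zmq "github.com/pebbe/zmq4"
)

// Broker layer: the Apache-facing script connects a REQ socket to :5559;
// workers on any machine connect REP sockets to :5560.
func main() {
	frontend, err := zmq.NewSocket(zmq.ROUTER)
	if err != nil {
		log.Fatal(err)
	}
	defer frontend.Close()
	if err := frontend.Bind("tcp://*:5559"); err != nil { // placeholder ports
		log.Fatal(err)
	}

	backend, err := zmq.NewSocket(zmq.DEALER)
	if err != nil {
		log.Fatal(err)
	}
	defer backend.Close()
	if err := backend.Bind("tcp://*:5560"); err != nil {
		log.Fatal(err)
	}

	// A worker is then just:
	//   sock, _ := zmq.NewSocket(zmq.REP)
	//   sock.Connect("tcp://broker-host:5560")
	//   for { msg, _ := sock.Recv(0); sock.Send(handle(msg), 0) }
	log.Fatal(zmq.Proxy(frontend, backend, nil))
}
```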
Mongrel2
Mongrel2 seems to handle quite a few of the things you have already implemented. It might be worthwhile to check it out. It probably does not completely solve your problem, but it provides tested infrastructure for distributing the workload, which could be used to deliver requests to multithreaded servers running on different machines.
Broker
One solution for increasing the robustness of the setup is a broker. In this scenario the broker's main role would be to provide robustness by implementing a queue for the requests. I understood that all the requests the workers handle are basically of the same type; if requests had different types, the broker could also do lookups to find the correct server for each request.
Using a queue provides a way to ensure that every request is handled by some worker even if worker servers crash. This does not come without a price: the broker is itself a single point of failure, and if it crashes or is restarted, all messages could be lost.
These problems can be avoided, but it requires quite a lot of work: requests could be persisted to disk, servers could be clustered. The need has to be weighed against the payoff. Do you want to spend your time writing a message broker, or the actual system?
If a message broker seems like a good idea, the time required to implement one can be reduced by using an existing product (like RabbitMQ). The negative side effect is that it may come with a lot of unwanted features, and adding new things is not as straightforward as with a self-made broker.
Writing your own broker verges on reinventing the wheel. Many brokers provide similar things: security, logging, a management interface, and so on. It seems likely that these will eventually be needed in a home-made solution too. But if not, a single home-made broker that does one thing and does it well can be a good choice.
Even if a broker product is chosen, I think it is a good idea to hide the broker behind a ZeroMQ proxy: a dedicated piece of code that sends messages to and receives messages from the broker. Then no other part of the system has to know anything about the broker, and it can easily be replaced.
Using a broker is somewhat heavy on developer time: you either need time to implement the broker or time to get used to a product. I would avoid this route until it is clearly needed.
Some links
Comparison between broker and brokerless
RabbitMQ
Mongrel2