I'm using a Google Cloud Platform Cloud Function. I trigger it via Pub/Sub. In the function logs, the messages are appearing in the order in which they were triggered (newest on top). But if I create a subscription to the published topic and view it in the console like this:
cloud beta pubsub subscriptions pull test_sub --limit 1000 --auto-ack
the messages appear in random order.
Any idea why?
Google Cloud Pub/Sub does not guarantee the order of messages. There is no attempt to order messages at all. This would break or complicate sharding and clustering of resources.
To quote Google Cloud:
Even in this simple case, guaranteed ordering of messages would put
severe constraints on throughput.
For a best-case design, your software should not assume, nor rely upon any specific order for messages. Messages should be atomic units that are not dependent on other messages either before or after. If this is the case for your designs, then you will need to implement time windows and process messages independently of delivery/pull.
For more specific information on Pub/Sub message ordering:
Pub/Sub Ordering messages
Related
Recently I am trying to use MassTransit in our microservice ecosystem.
According to MassTransit vocabulary and from documents my understanding is :
Publish: Sends a message to 1 or many subscribers (Pub/Sub Pattern) to propagate the message.
Send: Used to send messages in fire and forget fashion like publish, but instead It is just used for one receiver. The main difference with Publish is that in Send if your destination didn't receive a message, it would return an exception.
Requests: uses request/reply pattern to just send a message and get a response in a different channel to be able to get response value from the receiver.
Now, my question is according to the Microservice concept, to follow the event-driven design, we use Publish to propagate messages(Events) to the entire ecosystem. but what is exactly the usage (use case) of Send here? Just to get an exception if the receiver doesn't exist?
My next question is that is it a good approach to use Publish, Send and Requests in a Microservices ecosystem at the same time? like publish for propagation events, Send for command (fire and forget), and Requests for getting responses from the destination.
----- Update
I also found here which Chris Patterson clear lots of things. It also helps me a lot.
Your question is not related to MassTransit. MassTransit implements well-known messaging patterns thoughtfully described on popular resources such as Enterprise Integration Patterns
As Eben wrote in his answer, the decision of what pattern to use is driven by intent. There are also technical differences in the message delivery mechanics for each pattern.
Send is for commands, you tell some other service to do something. You do not wait for a reply (fire and forget), although you might get a confirmation of the action success or failure by other means (an event, for example).
It is an implementation of the point-to-point channel, where you also can implement competing consumers to scale the processing, but those will be instances of the same service.
With MassTransit using RabbitMQ it's done by publishing messages to the endpoint exchange rather than to the message type exchange, so no other endpoints will get the message even though they can consume it.
Publish is for events. It's a broadcast type of delivery or fan-out. You might be publishing events to which no one is listening, so you don't really know who will be consuming them. You also don't expect any response.
It is an implementation of the publish-subscribe channel.
MassTransit with RabbitMQ creates exchanges for each message type published and publishes messages to those exchanges. Consumers create bindings between their endpoint exchanges and message exchanges, so each consumer service (different apps) will get those in their independent queues.
Request-response can be used for both commands that need to be confirmed, or for queries.
It is an implementation of the request-reply message pattern.
MassTransit has nice diagrams in the docs explaining the mechanics for RabbitMQ.
Those messaging patterns are frequently used in a complex distributed system in different combinations and variations.
The difference between Send and Publish has to do with intent.
As you stated, Send is for commands and Publish is for events. I worked on a large enterprise system once running on webMethods as the integration engine/service bus and only events were used. I can tell you that it was less than ideal. If the distinction had been there between commands and events it would've made a lot more sense to more people. Anyway, technically one needs a message enqueued and on that level it doesn't matter, which is why a queueing mechanism typically would not care about such semantics.
To illustrate this with a silly example: Facebook places and Event on my timeline that one of my friends is having a birthday on a particular day. I can respond directly (send a message) or I could publish a message on my timeline and hope my friend sees it. Another silly example: You send an e-mail to PersonA and CC 4 others asking "Please produce report ABC". PersonA would be expected to produce the report or arrange for it to be done. If that same e-mail went to all five people as the recipient (no CC) then who gets to do it? I know, even for Publish one could have a 1-1 recipient/topic but what if another endpoint subscribed? What would that mean?
So the sender is responsible, still configurable as subscriptions are, to determine where to Send the message to. For my own service bus I use an implementation of an IMessageRouteProvider interface. A practical example in a system I once developed was where e-mails received had to have their body converted to an image for a content store (IBM FileNet P8 if memory serves). For reasons I will not go into the systems were stopped each night at 20h00 and restarted at 6h00 in the morning. This led to a backlog of usually around 8000 e-mails that had to be converted. The conversion endpoint would process a conversion in about 2 seconds but that still takes a while to work through. In the meantime the web front-end folks could request PDF files for conversion to paged TIFF files. Now, these ended up at the end of the queue and they would have to wait hours for that to come back. The solution was to implement another conversion endpoint, with its own queue, and have the web front-end configured to send the same message type, e.g. ConvertDocumentCommand to that "priority" queue for processing. Pretty easy to do. Now, if that had been a publish how would I do that split? The same event going to 2 different endpoints under different circumstances? Well, you could have another subscription store for your system but now you'd need to maintain both. There could be another answer such as coding this logic into the send bit but that is a design choice and would require coding changes.
In my own Shuttle.Esb service bus I only have Send and Publish. For request/response both the sender and receiver have an inbox and a request would be sent (Send) to the receiver and it in turn could reply (also a Send but uses the sender's URI).
We are introducing SNS + SQS to handle event production and propagation in our micro services architecture, which has so far relied on HTTPS calls to communicate with each other. We are considering connecting multiple SQS queues onto one SNS topic. The events in the queues will then be consumed by a lambda or a service running in EC2.
My question is, how generic should the topics be? When should we create new topics?
Say, we have a user domain which needs to publish two events—created and deleted. Two options we are considering are:
OPTION A: Have two topics, "user-created" and "user-deleted". Each topic guarantees a single event type.
the consumers would not have to worry about discarding events that they are not interested in, as they know already know the messages coming from a "user-created" topic is only related to user creations.
multiple different parts of the code publishing to the same topic
OPTION B: Have one topic, "users", that accepts multiple event types
the consumers would have an additional responsibility of filtering through the events or taking different actions depending on the type of the event (they can also configure their queues subscriptions to filter certain event types)
can ensure a single publisher for each topic
Does anyone have a strong preference for either of the options and why would that be?
On a related note, where would you include the cloud configuration for each of the resources? (should the queue resource creation be deployed together with the consumers, or should they live independently from any of the publishers/consumers?)
I think you should go with Option B and keep all events concerning a given "domain" (e.g. "user") in a single topic:
keeps your infrastructure simple
you might introduce services interested in multiple event types (e.g. "create" and "delete"). Its kind of tricky to get the ordering right consuming this from two topics; imagine a "user-delete" event arrive before the "user-create" event
throughput might be an issue, this really depends on your domain (creating and deleting users doesn't sound like a high volume issue)
think about changes in the data structures in your topics, introducing changes in two or more topics simultaniously can get complicated pretty fast
Concerning your other question: Keep your topic/infrastructure configuration separate from your services. It's an individual piece of infrastructure (like a database) and should kept separate; especially if you introduce more consumers & producers to your system.
EDIT: This might be an example "setup":
Repository user-service contains the service/lambda code, cloudformation/terraform templates for the service and its topic subscriptions
Repository sns contains all cloudformation/terraform templates concerning SNS topics
Repository sqs contains all cloudformation/terraform templates concerning SQS topics
You can think about keeping the SNS & SQS infra code in a single repository (the last two), but I would strongly recommend everything specific to a certain service/lambda to be kept in separate repositories.
Generally it helps to think about your topics as a "database", this line of thinking should point you in the right direction for all your questions.
Here is my use case: I am developing a trading application and i want to send incremental stock updates (bidQty etc) to active consumers instead of the whole quote and a snapshot update to a new consumer (to start with).
Now, is it possible to override any ActiveMQ's class (implementors of Topic) to achieve this behavior? Any clues on this would be helpful .
If the same is possible in any other openSource provider, please let me know.
This is NOT a case where you simply can change the implementation of topic. You should actually avoid changing the implementation of core ActiveMQ features to solve specific business requirements. Fixing bugs and adding core messaging features is another thing.
There are multiple ways to solve your use case with regular ActiveMQ features.
Separate Sync and Update channel
I would probably divide the "sync/snapshot" channel from the "incremental update" channel.
One way is to implement the "snapshot-sync" as JMS request/reply where the consumer asks the provider for a sync, then continues to rely on incremental updates pushed via the topic.
Advisory messages and Selectors
You can also implement it all using a single topic using a mix of AdvisoryMessages and JMS Selectors.
An idea (you can do this in many ways):
Introduce two message properties: MsgType and Receiver
Mark each incremental update with MsgType=inc
Mark each snapshot with some client id of the consumer, Receiver=.
Have the producer listen to advisory messages from ActiveMQ and and fire a snapshot/sync message marked with Receiver= and MsgType=snapshot when there is a new client subscribing the stock topic.
The client subscribes with a selector of something like
MsgType='inc' OR (MsgType='snapshot' AND Receiver=<me>)
This way you can trigger snapshot syncs with specific clients as well as incremental updates for all clients.
If you start think about the dynamics you already have, you can probably think of another ten or so solutions.
Retroactive Consumers
You might have some use of a Retroactive Consumer - the example actually shows a scenario similar to yours.
Ref: Official GlassFish 4.0 docs/javaee-tutorial Java EE 7
Firstly, let us start with the destination-type of: topic.
As per GlassFish 4.0 tutorial, section “46.4 Writing High Performance and Scalable JMS Applications”:
This section describes how to use the JMS API to write applications
that can handle high volumes of messages robustly.
In the subsection “46.4.2 Using Shared Durable Subscriptions”:
The SharedDurableSubscriberExample.java client shows how to use shared
durable subscriptions. It shows how shared durable subscriptions
combine the advantages of durable subscriptions (the subscription
remains active when the client is not) with those of shared consumers
(the message load can be divided among multiple clients).
When we run this example as per “46.4.2.1 To Run the ShareDurableSubscriberExample and Producer Clients”, it gives us the same effect/functionality as previous example on destination-type of queue: if we follow “46.2.6.2 To Run the AsynchConsumer and Producer Clients”, points 5 onwards – and modify it slightly using 2 consumer terminal-windows and 1 producer terminal-window.
Yes, section “45.2.2.2 Publish/Subscribe Messaging Style” does mention:
The JMS API relaxes this requirement to some extent by allowing
applications to create durable subscriptions, which receive messages
sent while the consumers are not active. Durable subscriptions provide
the flexibility and reliability of queues but still allow clients to
send messages to many recipients.
.. and anyway section “46.4 Writing High Performance and Scalable ..” examples are queue style – one message per consumer:
Each message added to the topic subscription is received by only one
consumer, similarly to the way in which each message added to a queue
is received by only one consumer.
What is the precise technical answer for: why, in this example, the use of Shared-Durable-Consumer on Topic is supposed to be, and mentioned under, “High Performance and Scalable JMS Application” vs. use of Asynchronous-Consumer on Queue?
I was wonderign about the same issue, so I found out the following link. I understand that John Ament gave you the right reponse, maybe it was just too short to get a full understand.
Basically, when you create a topic you are assuming that only the subscribed consumers will receive its messages. However processing such a message may requires a heavy processing; in such a cases you can create a shared topic using as much threads as you want.
Why not use a queue? The answer is quite simple, if you use a queue only one consumer will be able to handle such a message.
In order to clarify I will give you an example. Let's say a federal court publishes thousand of sentences every day and you have three distinct applications that depends on it.
Application A just copy the sentences to a database.
Application B parse the sentence and try to find out all relation between people around all previously saved sentences.
Application C parse the sentence and try to find out all relation between companies around all previously saved sentences.
You could use a Topic for the sentences, where Application A, B and C would be subscribed. However it easy to see that Application A can process the message very quicly while Application B and C may take some time. An available solution would consist of create a shared subscription for application B and another one to application C, so multiple threads could act on each of them simultaneouly...
...Of course there are other solutions, you could for example use a unshared topic (i.e. a regular one) and post all received messages on a ArrayBlockingQueue that would be handled by a pool of threads some time later; howecer in such a decision the developer would be the one to worry about queue handling.
Hope this can help.
The idea is that you can have multiple readers on a subscription. This allows you to read more messages faster, assuming you have threads available.
JMS Queue :
queued messages are persisted
each message is guaranteed to be delivered once-and-only-once, even no consumer running when the messages are sent.
JMS Shared Subscription :
subscription could have zero to many consumers
if messages sent when there is no subscriber (durable or not), message will never be received.
I am planning to inegrate messaging middleware in my web application. Right now I am tesing different messaging middleware software like RabbitMQ,JMS, HornetQ, etc..
Examples provided with this softwares are working but its not giving as desired results.
So, I want to know that which are the factors which are responsible to improve peformance that one should keep in eyes?
Which are the areas, a developer should take care of to improve the performance of middleware messaging software?
I'm the project lead for HornetQ but I will try to give you a generic answer that could be applied to any message system you choose.
A common question that I see is people asking why a single producer / single consumer won't give you the expected performance.
When you send a message, and are asking confirmation right away, you need to wait:
The message transfer from client to server
The message being persisted on the disk
The server acknowledging receipt of the message by sending a callback to the client
Similarly when you are receiving a message, you ACK to the server:
The ACK is sent from client to server
The ACK is persisted
The server sends back a callback saying that the callback was achieved
And if you need confirmation for all your message-sends and mesage-acks you need to wait these steps as you have a hardware involved on persisting the disk and sending bits on the network.
Message Systems will try to scale up with many producers and many consumers. That is if many are producing they should all use the resources available at the server shared for all the consumers.
There are ways to speed up a single producer or single consumer:
One is by using transactions. So, you minimize the blocks and syncs you perform on disk while persisting at the server and roundtrips on the network. (This is actually the same on any database)
Another one, is by using Callbacks instead of blocking at the consumer. (JMS 2 is proposing a Callback similar to the ConfirmationHandler on HornetQ).
Also: most providers I know will have a performance section on their docs with requirements and suggestions for that specific product. You should look individually at each product