Cache values in Java EE - caching

I'm building a simple message delegation application. Messages are sent on both ends via JMS. I'm using an MDB to process incoming messages, transform them and send them to a target queue. Unfortunately, the same message can be sent to the incoming queue more than once, but forwarding duplicates is not allowed.
So what is the best way to accomplish that?
Since there can be multiple MDBs listening on the incoming queue, I need a single cache where I can store the unique message UUIDs of the incoming messages for at least an hour. How should this cache be accessed? Via a singleton/static class (I'm running Java EE 5 and thus don't have the singleton annotation)?
In addition I think all operations must be synchronized, right? Does that harm performance too much?

@Ingo: are you OK with a database solution? You can use a full-fledged DB server or a simple Apache Derby setup for this.
If so, you can have a simple table where you store each message's unique ID and check against it for uniqueness. This solution has the following benefits:
Simple code.
No need for a time-bound cache (1 hour). You can check the uniqueness of a message forever.
Persistent record of which messages came in.
No need for expensive synchronization; you can rely on the DB isolation level for consistency.
Centralized solution for your possibly many deployments of the application.
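As a rough illustration of the table-based check, here is a minimal sketch in Python, with an in-memory SQLite database standing in for Derby or a full DB server. The table name `seen_messages` and the helper names are hypothetical; the only idea it relies on is that a primary key on the UUID column lets the database reject a duplicate insert, so no application-level synchronization is needed.

```python
import sqlite3

def open_store(path=":memory:"):
    # Hypothetical table; the PRIMARY KEY is what enforces uniqueness.
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS seen_messages (uuid TEXT PRIMARY KEY)"
    )
    conn.commit()
    return conn

def first_time_seen(conn, message_uuid):
    """Return True if the UUID was recorded now, False if it already existed."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO seen_messages (uuid) VALUES (?)", (message_uuid,)
            )
        return True
    except sqlite3.IntegrityError:
        # The database rejected the duplicate key: the message was seen before.
        return False

conn = open_store()
results = [first_time_seen(conn, u) for u in ("a-1", "b-2", "a-1")]
```

In an MDB the same pattern would be an insert inside the message-processing transaction, forwarding the message only when the insert succeeds.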

Microservice failure Scenario

I am working on a microservice architecture. One of my services is exposed to a source system, which uses it to post data. This microservice publishes the data to Redis using Redis pub/sub, and the data is then consumed by a couple of other microservices.
Now, if one of the other microservices is down and cannot process the data from Redis pub/sub, then I have to retry with the published data when that microservice comes back up. The source cannot push the data again and manual intervention is not possible, so I thought of 3 approaches:
1) Additionally using Redis to store and retrieve the data.
2) Using a database to store the data before publishing. I have many source and target microservices that use Redis pub/sub. With this approach, every time I have to insert the request into the DB first and then its response status. I would also have to use a shared database. This approach adds a couple more exception-handling cases and doesn't look very efficient to me.
3) Using Kafka in place of Redis pub/sub. As traffic is low, I used Redis pub/sub, and it is not feasible to change.
In both of the above cases, I have to use a scheduler, and I have a duration within which I have to retry, else subsequent requests will fail.
Is there any other way to handle the above cases?
For point 2:
- Store the data in the DB.
- Create a daemon process which will process the data from the table.
- This daemon process can be configured as needed.
- The daemon process will poll the DB and publish the data, if any. It will also delete the data once published.
I have not used this in a microservice architecture, but I have seen this approach work efficiently when communicating with third-party services.
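The polling steps above can be sketched roughly as follows. This is a minimal Python illustration using an in-memory SQLite table; the `outbox` table, the function names and the `publish` callback are all hypothetical, and in a real setup `publish` would wrap the Redis publish call. A row is deleted only after its publish succeeds, so anything that failed is retried on the next poll.

```python
import sqlite3

def create_outbox(conn):
    conn.execute(
        "CREATE TABLE IF NOT EXISTS outbox ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT NOT NULL)"
    )
    conn.commit()

def enqueue(conn, payload):
    # Store the data before any attempt to publish it.
    with conn:
        conn.execute("INSERT INTO outbox (payload) VALUES (?)", (payload,))

def poll_once(conn, publish, batch_size=100):
    """Publish pending rows in insertion order; delete a row only after success."""
    rows = conn.execute(
        "SELECT id, payload FROM outbox ORDER BY id LIMIT ?", (batch_size,)
    ).fetchall()
    sent = 0
    for row_id, payload in rows:
        try:
            publish(payload)
        except Exception:
            break  # keep this row and the remaining ones for the next poll
        with conn:
            conn.execute("DELETE FROM outbox WHERE id = ?", (row_id,))
        sent += 1
    return sent

conn = sqlite3.connect(":memory:")
create_outbox(conn)
enqueue(conn, "order-1")
enqueue(conn, "order-2")

state = {"consumer_up": False}
delivered = []

def publish(payload):
    # Stand-in for the real publish call; fails while the consumer is down.
    if not state["consumer_up"]:
        raise ConnectionError("consumer unavailable")
    delivered.append(payload)

first_poll = poll_once(conn, publish)   # consumer down, nothing is sent
state["consumer_up"] = True
second_poll = poll_once(conn, publish)  # consumer back, both rows are sent
remaining = conn.execute("SELECT COUNT(*) FROM outbox").fetchone()[0]
```

The daemon itself would simply call `poll_once` on a schedule that fits within the retry window mentioned in the question.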
At the very outset, as you mentioned, we do indeed seem to have only three possibilities.
This is one of those situations where you want a handshake from the service both after pushing and after processing. To accomplish that, a middleware queuing system would be the right choice.
Although a bit more complex to set up, you can use Kafka for streaming this. Configuring producers and consumer groups properly can help you do the job smoothly.
Using a DB to store the data would be overkill, considering the situation where "this data is to be processed and to be persisted".
Alternatively, storing the data in Redis and reading it in a cron job or scheduled job would make your job much simpler. Once the job has run successfully, you can remove the data from the cache and thus save Redis memory.
If you can comment further on the architecture and the implementation, I can update my answer accordingly. :)

Preventing data loss in client authoritative database writes

A project I'm working on requires users to insert themselves into a list on a server. We expect a few hundred users over a weekend and while very unlikely, a collision could happen in which two users submit the list concurrently and one of them is lost. The server has no validation, it simply allows you to get and put data.
I was pointed in the direction of "optimistic locking" but I'm having trouble grasping when exactly the data should be validated and how it prevents this from happening. If one of the clients reads the data, adds itself and then checks again to ensure that the data is the same with the use of an index or timestamp, how does this prevent the other client from doing the same and then one overwriting the other?
I'm trying to understand the flow in the context of two clients getting data and putting data.
The point of optimistic locking is that the decision to accept or reject a write is taken on the server, and is protected against concurrency by a pessimistic transaction or some sort of hardware protection, such as compare-and-swap. So a client requests a write together with some sort of timestamp or version identifier, and the server only accepts the write if the timestamp is still accurate. If it isn't, the client gets some sort of rejection code and has to try again. If it is, the client is told that its write succeeded.
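To make the flow with two clients concrete, here is a small Python sketch of a server-side store that checks a version number on every write. The class and function names are made up for illustration, and a `threading.Lock` plays the role of the server-side protection around the compare-and-update step.

```python
import threading

class VersionedList:
    """Server-side store: a write is accepted only if the version still matches."""

    def __init__(self):
        self._lock = threading.Lock()
        self._version = 0
        self._items = []

    def get(self):
        with self._lock:
            return self._version, list(self._items)

    def put(self, expected_version, new_items):
        # The check and the update happen atomically under the lock.
        with self._lock:
            if expected_version != self._version:
                return False  # stale read: the client must fetch again and retry
            self._items = list(new_items)
            self._version += 1
            return True

def add_user(store, name):
    """Client-side loop: read, modify, attempt the write, retry on rejection."""
    while True:
        version, items = store.get()
        if store.put(version, items + [name]):
            return

store = VersionedList()
v_a, items_a = store.get()                   # client A reads version 0
v_b, items_b = store.get()                   # client B reads version 0
ok_a = store.put(v_a, items_a + ["alice"])   # accepted, version becomes 1
ok_b = store.put(v_b, items_b + ["bob"])     # rejected, version 0 is stale
add_user(store, "bob")                       # B re-reads and retries
final = store.get()
```

Client B's first write is rejected rather than silently overwriting A's entry, which is exactly the lost update the question is worried about.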
This is not the only way to handle receiving data from multiple clients. One popular alternative is to use a reliable messaging system - for example, the Java Message Service specifies an interface for such systems, for which you can find open source implementations. Clients write into the messaging system and can go away as soon as their message is accepted. The server reads requests from the messaging system and acts on them. If the server or the network goes down it's no big deal: the messages will still be there to be read when they come back (typically they are written to disk and have the same level of protection as database data, although if you look at a reliable message queue implementation you may find that it is not, in fact, built on top of a standard database table).
One example of a writeup of the details of optimistic locking is the HTTP ETag specification, e.g. https://en.wikipedia.org/wiki/HTTP_ETag

How to maintain order of messages being processed in a mule flow from VM to JMS using one-way message exchange pattern?

I am using MuleSoft ESB with Anypoint Studio for a project. In one of my flows I am using the one-way message exchange pattern to dispatch from VM (persistent file store VM connector) to JMS, both with XA transactions enabled to avoid losing messages.
Consider a scenario where we send a message every time user updates his/her last name to ESB. For example, let's say user changes last name to 'A', but quickly changes to 'B', so final result is expected to be 'B'.
1) Is it likely that message 'B' gets processed before message 'A' in my case, so that the last name ends up as 'A' instead of 'B'?
2) How do I avoid that apart from using 'request-response' MEP?
3) Is there a way to write unit tests for making sure order of messages being processed is maintained from VM (one-way, xa enabled) to JMS (one-way, xa enabled)?
4) How do I go about testing that manually?
Thank you in advance. Any pointers/help will be appreciated.
It's not likely, since your system would normally react way quicker than a user can submit requests. However, that may be the case during a load peak.
To really ensure message order, you need a single bottleneck (a single instance/thread) in your solution to handle all requests. That is, you need to make sure your processing strategy in Mule is synchronous and that you only have a single consumer on the VM queue. If you have an HA setup with multiple Mule servers, messages can potentially get out of order. In that case, and if the user is initially connected using HTTP, you can get around most of the problem using a load balancer with a sticky session strategy.
A perhaps more robust and scalable solution is to have the user submit its local, high-resolution timestamp with each request. Then you can discard any "obsolete" updates when storing the information in a database. However, that happens in the database rather than in the Mule VM/JMS layer.
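The timestamp check could look roughly like this. It is a minimal in-memory Python sketch with hypothetical names; in a real database the same rule would typically be a conditional update that only applies when the incoming timestamp is newer than the stored one.

```python
class LastNameStore:
    """Keeps, per user, only the update carrying the newest client timestamp."""

    def __init__(self):
        self._rows = {}  # user_id -> (client_timestamp, last_name)

    def apply(self, user_id, client_timestamp, last_name):
        current = self._rows.get(user_id)
        if current is not None and client_timestamp <= current[0]:
            return False  # obsolete update that arrived late: discard it
        self._rows[user_id] = (client_timestamp, last_name)
        return True

    def last_name(self, user_id):
        return self._rows[user_id][1]

store = LastNameStore()
# The user changed the name to 'A' at t=1000 and to 'B' at t=2000,
# but the messages are processed in the wrong order.
applied_b = store.apply("u1", 2000, "B")
applied_a = store.apply("u1", 1000, "A")
final_name = store.last_name("u1")
```

Whatever order the two messages are processed in, the stored value ends up as 'B'.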
For testability - no, I don't think there is a truly satisfying way to be 100% sure messages won't come out of order during any condition by just writing integration tests or performing manual tests. You need to verify the message path theoretically to make sure there is no part where one message can bypass another.

Web server and ZeroMQ patterns

I am running an Apache server that receives HTTP requests and connects to a daemon script over ZeroMQ. The script implements the Multithreaded Server pattern (http://zguide.zeromq.org/page:all#header-73). It successfully receives the request, dispatches it to one of its worker threads, performs the action and responds back to the server, and the server responds back to the client. Everything is done synchronously, as the client needs to receive a success or failure response to its request.
As the number of users is growing into a few thousand, I am looking into potentially improving this. The first thing I looked at is the different ZeroMQ patterns, and whether what I am using is optimal for my scenario. I've read the guide, but I find it challenging to understand all the details and differences across patterns. I was looking, for example, at the Load Balancing Message Broker pattern (http://zguide.zeromq.org/page:all#header-73). It seems quite a bit more complicated to implement than what I am currently using, and if I understand things correctly, its advantages are:
Actual load balancing vs the round-robin task distribution that I currently have
Asynchronous requests/replies
Is that everything? Am I missing something? Given the description of my problem, and the synchronous requirement of it, what would you say is the best pattern to use? Lastly, how would the answer change, if I want to make my setup distributed (i.e. having the Apache server load balance the requests across different machines). I was thinking of doing that by simply creating yet another layer, based on the Multithreaded Server pattern, and have that layer bridge the communication between the web server and my workers.
Some thoughts about the subject...
Keep it simple
I would try to keep things simple and "plain" ZeroMQ for as long as possible. To increase performance, I would simply change your backend script to send requests out from a dealer socket and move the request handling code into its own program. Then you could just run multiple worker servers on different machines to get more requests handled.
I assume this was the approach you took:
I was thinking of doing that by simply creating yet another layer, based on the Multithreaded Server pattern, and have that layer bridge the communication between the web server and my workers.
The only problem here is that there is no request retry in the backend. If a worker fails to handle a given task, it is lost forever. However, one could write the worker servers so that they handle all the requests they have received before shutting down. With this kind of setup it is possible to update backend workers without clients noticing any outage. This will not save requests that get lost if the server crashes.
I have the feeling that in common scenarios this kind of approach would be more than enough.
Mongrel2
Mongrel2 seems to handle quite a few of the things you have already implemented. It might be worthwhile to check it out. It probably does not completely solve your problems, but it provides tested infrastructure to distribute the workload. This could be used to deliver the requests to multithreaded servers running on different machines.
Broker
One solution to increase the robustness of the setup is a broker. In this scenario the broker's main role would be to provide robustness by implementing a queue for the requests. As I understand it, all the requests the workers handle are basically of the same type. If requests had different types, the broker could also do lookups to find the correct server for each request.
Using the queue provides a way to ensure that every request is eventually handled even if worker servers crash. This does not come without a price. The broker is itself a single point of failure. If it crashes or is restarted, all messages could be lost.
These problems can be avoided, but it requires quite a lot of work: the requests could be persisted to disk, and servers could be clustered. The need has to be weighed against the payoff. Does one want to spend time writing a message broker or the actual system?
If a message broker seems like a good idea, the time required to implement one can be reduced by using an existing product (like RabbitMQ). The downside is that there could be a lot of unwanted features, and adding new things is not as straightforward as with a self-made broker.
Writing your own broker could drift toward reinventing the wheel. Many brokers provide similar things: security, logging, a management interface and so on. It seems likely that these will eventually be needed in a home-made solution as well. But if not, a single home-made broker which does one thing and does it well can be a good choice.
Even if a broker product is chosen, I think it is a good idea to hide the broker behind a ZeroMQ proxy, a dedicated piece of code that sends messages to and receives messages from the broker. Then no other part of the system has to know anything about the broker, and it can be easily replaced.
Using a broker is somewhat heavy on developer time. You either need time to implement the broker or time to get used to some product. I would avoid this route until it is clearly needed.
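To show what the queueing role of a broker adds over plain round-robin dispatch, here is a deliberately tiny in-memory Python sketch. It is not ZeroMQ code and all names are hypothetical: workers are plain callables, and a request whose worker fails is put back on the queue instead of being lost. Being purely in-memory, it has the same single-point-of-failure weakness described above.

```python
from collections import deque

class TinyBroker:
    """Queues requests and requeues any request whose worker failed."""

    def __init__(self, max_attempts=3):
        self._pending = deque()
        self._max_attempts = max_attempts
        self.failed = []  # requests given up on after max_attempts

    def submit(self, request):
        self._pending.append((request, 0))

    def dispatch(self, workers):
        """Hand queued requests to workers round-robin until the queue is empty."""
        results = []
        turn = 0
        while self._pending:
            request, attempts = self._pending.popleft()
            worker = workers[turn % len(workers)]
            turn += 1
            try:
                results.append(worker(request))
            except Exception:
                if attempts + 1 < self._max_attempts:
                    # Put the request back so another worker can pick it up.
                    self._pending.append((request, attempts + 1))
                else:
                    self.failed.append(request)
        return results

def crashed_worker(request):
    raise RuntimeError("worker is down")

def healthy_worker(request):
    return request.upper()

broker = TinyBroker()
for req in ("a", "b", "c"):
    broker.submit(req)
results = broker.dispatch([crashed_worker, healthy_worker])
```

Even though every other dispatch lands on the crashed worker, each request is eventually handled by the healthy one.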
Some links
Comparison between broker and brokerless
RabbitMQ
Mongrel2

What is the best way to deliver real-time messages to Client that can not be requested

We need to deliver real-time messages to our clients, but their servers are behind a proxy and we cannot initiate a connection to them, so a webhook approach won't work.
What is the best way to deliver real-time messages considering that:
client that is behind a proxy
client can be off for a long period of time, and all messages must be delivered
the protocol/way must be common enough, so that even a PHP developer could easily use it
I have in mind three variants:
WebSocket - the client opens a WebSocket connection, and we send both the messages that were stored in the DB and the messages coming in real time.
RabbitMQ - all messages are stored in a durable, persistent queue. What if the partner does not read from the queue for some time?
HTTP GET - the partner pulls messages in blocks. With this approach it is hard to pick an optimal pull interval.
Any suggestions would be appreciated. Thanks!
Since you seem to have to store messages when your peer is not connected, the question applies to any other solution equally: what if the peer is not connected and messages are queueing up?
RabbitMQ is great if you want loose coupling: separating the producer and the consumer sides. The broker will store messages for you if no consumer is connected. This can indeed fill up memory and/or disk space on the broker after some time. In that case RabbitMQ raises a resource alarm and blocks publishing connections until memory or disk space is freed.
In general, RabbitMQ is a great tool for messaging-based architectures like the one you describe:
Load balancing: you can use multiple publishers and/or consumers, thus sharing load.
Flexibility: you can configure multiple exchanges/queues/bindings if your business logic needs it. You can easily change routing on the broker without reconfiguring multiple publisher/consumer applications.
Flow control: RabbitMQ also gives you some built-in methods for flow control - if a consumer is too slow to keep up with publishers, RabbitMQ will slow down publishers.
You can refactor the architecture later easily. You can set up multiple brokers and link them via shovel/federation. This is very useful if you need your app to work via multiple data centers.
You can easily spot if one side is slower than the other, since queues will start growing if your consumers can't read fast enough from a queue.
High availability and fault tolerance. RabbitMQ is very good at these (thanks to Erlang).
So I'd recommend it over the other two (which might be good for a small-scale app, but you might outgrow them quickly if requirements change and you need to scale things up).
Edit: something I missed - if it's not vital to deliver all messages, you can configure queues with a TTL (a message is discarded after a timeout) or with a length limit (this caps the number of messages in the queue; once the limit is reached, RabbitMQ drops the oldest messages by default, or can be configured to reject new ones instead).
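As a small illustration of the TTL and length-limit options, the following Python sketch builds the optional queue arguments RabbitMQ understands (`x-message-ttl`, `x-max-length`, `x-overflow`). The helper name, the queue name and the concrete numbers are made up; the commented-out call assumes the pika client and an already open channel.

```python
def bounded_queue_arguments(ttl_ms, max_length, reject_new=False):
    """Build optional queue arguments for a time- and size-bounded queue."""
    args = {
        "x-message-ttl": ttl_ms,     # messages older than this are discarded
        "x-max-length": max_length,  # upper bound on messages kept in the queue
    }
    if reject_new:
        # Without this, RabbitMQ makes room by dropping the oldest messages.
        args["x-overflow"] = "reject-publish"
    return args

# One hour TTL and at most 10,000 queued messages (illustrative values only).
args = bounded_queue_arguments(ttl_ms=3_600_000, max_length=10_000)

# With the pika client these could be passed when declaring the queue, e.g.:
#   channel.queue_declare(queue="partner.messages", durable=True, arguments=args)
```

If every message must be delivered, as in the question, neither limit should be set, and disk usage on the broker has to be monitored instead.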
