DynamoDBMapper transactions vs Distributed Locks - spring-boot

I'm using the Java DynamoDBMapper in my Spring Boot microservice, where I'm using DynamoDB.
I have a question about transaction management:
For a REST API (POST), before allowing the creation of the entity I have to do some checks on the status of the user's objects that are currently saved in the DB. This is not a check on the fields of a specific object.
What I mean is that I have to do something like this:
1. Retrieve the count of the objects that are currently assigned to the user.
2. If this count is <= N, allow the creation of the new object.
Basically I would like to encapsulate these steps into a single 'atomic' operation, in order to avoid creating objects for a user who has already reached the limit, or to block the operation if, at the same time, the user has deleted one of the saved objects.
I'm not able to understand whether I can do this using transactions.
Basically, I would like to understand whether it's possible to do a sort of lock:
I mean, while I'm doing this operation I would like to block other operations for the same user, e.g. deleting an object (via a dedicated API) while I'm between step 1 and step 2.
Should I use transactions (and, if so, how?) or should I use a different approach, such as the one described in Building Distributed Locks with the DynamoDB Lock Client?
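For what it's worth, one way to make the count check and the creation atomic with DynamoDB transactions is to keep a per-user counter item and write it together with the new object in a single TransactWriteItems call, guarded by a condition expression. Below is a rough sketch using the low-level AWS SDK for Java v2 rather than DynamoDBMapper; the table names, attribute names and the limit value are assumptions made up for illustration.

```java
import java.util.Map;
import software.amazon.awssdk.services.dynamodb.DynamoDbClient;
import software.amazon.awssdk.services.dynamodb.model.AttributeValue;
import software.amazon.awssdk.services.dynamodb.model.Put;
import software.amazon.awssdk.services.dynamodb.model.TransactWriteItem;
import software.amazon.awssdk.services.dynamodb.model.TransactWriteItemsRequest;
import software.amazon.awssdk.services.dynamodb.model.TransactionCanceledException;
import software.amazon.awssdk.services.dynamodb.model.Update;

public class UserObjectCreator {

    private static final int LIMIT = 10; // assumed value for N
    private final DynamoDbClient dynamoDb = DynamoDbClient.create();

    /** Atomically increments the user's object count and creates the object;
     *  the whole transaction fails if the user has already reached the limit. */
    public boolean tryCreate(String userId, Map<String, AttributeValue> newObjectItem) {
        TransactWriteItem bumpCounter = TransactWriteItem.builder()
                .update(Update.builder()
                        .tableName("UserCounters")
                        .key(Map.of("userId", AttributeValue.fromS(userId)))
                        .updateExpression("SET objectCount = if_not_exists(objectCount, :zero) + :one")
                        // reject the whole transaction once the limit is reached
                        .conditionExpression("attribute_not_exists(objectCount) OR objectCount < :limit")
                        .expressionAttributeValues(Map.of(
                                ":zero", AttributeValue.fromN("0"),
                                ":one", AttributeValue.fromN("1"),
                                ":limit", AttributeValue.fromN(String.valueOf(LIMIT))))
                        .build())
                .build();
        TransactWriteItem createObject = TransactWriteItem.builder()
                .put(Put.builder()
                        .tableName("UserObjects")
                        .item(newObjectItem)
                        .build())
                .build();
        try {
            dynamoDb.transactWriteItems(TransactWriteItemsRequest.builder()
                    .transactItems(bumpCounter, createObject)
                    .build());
            return true;
        } catch (TransactionCanceledException e) {
            return false; // the limit condition (or another condition) failed
        }
    }
}
```

With this pattern the delete API would decrement the same counter under its own condition expression, so the count stays consistent with concurrent deletes without needing an explicit lock.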

Related

How to avoid concurrent requests to a lambda

I have a ReportGeneration lambda that takes a request from the client and adds the following entries to a DDB table:
Customer ID <hash key>
ReportGenerationRequestID(UUID) <sort key>
ExecutionStartTime
ReportExecutionStatus <workflow status>
where ReportExecutionStatus is the status of the report processing workflow.
I have enabled a DDB stream trigger on this table, and creating an entry in this table kicks off the report generation workflow. This is a multi-step workflow that takes a while to complete.
I am supposed to maintain a history of all report generation requests that a customer has initiated.
What I am trying to do now is avoid concurrent processing requests by the same customer: if a report for a customer is already being generated, don't create another record in DDB.
Option considered:
Query DDB for the customer ID (consistent read):
- From the list, see if any entry is either InProgress or Scheduled.
- If not, create a new one (consistent write).
- Otherwise, return the already existing one.
Issue: if the customer clicks twice within a split second to generate a report, two lambdas can be triggered, causing 2 entries in DDB, and two parallel workflows can be initiated, which is something I don't want.
Can someone recommend the best approach to ensure that there are no concurrent executions (2 workflows) of the same report for the same customer?
In short, when one execution is in progress another one should not start.
You can use a ConditionExpression to only create the entry if it doesn't already exist. If you need to check different items, then you can use DynamoDB Transactions to check whether another item already exists and, if not, create your item.
Those would be the ways to do it with DynamoDB, giving you higher consistency.
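A rough sketch of the simpler, single-item case (AWS SDK for Java v2; the table and attribute names are assumptions taken from the question) could look like this:

```java
import java.util.Map;
import software.amazon.awssdk.services.dynamodb.DynamoDbClient;
import software.amazon.awssdk.services.dynamodb.model.AttributeValue;
import software.amazon.awssdk.services.dynamodb.model.ConditionalCheckFailedException;
import software.amazon.awssdk.services.dynamodb.model.PutItemRequest;

public class ReportRequestWriter {

    private final DynamoDbClient dynamoDb = DynamoDbClient.create();

    /** Creates the request entry only if an item with the same key does not exist yet. */
    public boolean createIfAbsent(String customerId, String requestId) {
        PutItemRequest request = PutItemRequest.builder()
                .tableName("ReportRequests")
                .item(Map.of(
                        "CustomerId", AttributeValue.fromS(customerId),
                        "ReportGenerationRequestID", AttributeValue.fromS(requestId),
                        "ReportExecutionStatus", AttributeValue.fromS("Scheduled")))
                // reject the write if an item with this primary key is already there
                .conditionExpression("attribute_not_exists(CustomerId)")
                .build();
        try {
            dynamoDb.putItem(request);
            return true;
        } catch (ConditionalCheckFailedException e) {
            return false; // an entry already exists
        }
    }
}
```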
Another option would be to use SQS FIFO queues. You can group messages by customer ID, so you wouldn't have concurrent processing of messages for the same customer. Additionally, with this SQS solution you get all the advantages of using SQS, like automated retry mechanisms or a dead-letter queue.
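A minimal sketch of the producing side (AWS SDK for Java v2; the queue URL and the payload handling are assumptions, and the .fifo queue is assumed to have content-based deduplication enabled):

```java
import software.amazon.awssdk.services.sqs.SqsClient;
import software.amazon.awssdk.services.sqs.model.SendMessageRequest;

public class ReportRequestPublisher {

    private final SqsClient sqs = SqsClient.create();
    private final String queueUrl; // URL of the .fifo queue

    public ReportRequestPublisher(String queueUrl) {
        this.queueUrl = queueUrl;
    }

    /** Publishes a report request; messages sharing a group ID are processed in order, one at a time. */
    public void publish(String customerId, String requestPayload) {
        sqs.sendMessage(SendMessageRequest.builder()
                .queueUrl(queueUrl)
                .messageGroupId(customerId) // serializes processing per customer
                .messageBody(requestPayload)
                .build());
    }
}
```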
Limiting the number of concurrent Lambda executions is not possible as far as I know. That is the whole point of AWS Lambda, to easily scale and run multiple Lambdas concurrently.
That said, there is probably a better solution for your problem, using a DynamoDB feature called "Strongly Consistent Reads".
By default, reads from DynamoDB (if you use the AWS SDK) are eventually consistent, causing the behaviour you observed: two writes to the same table are made, but your Lambda was only able to notice one of those writes.
For strongly consistent reads, the documentation states:
When you request a strongly consistent read, DynamoDB returns a response with the most up-to-date data, reflecting the updates from all prior write operations that were successful.
So your Lambda needs to do a strongly consistent read on your table to check whether the customer already has a job running. If there is already a job running, the Lambda does not create a new job.
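As a rough sketch of that check (AWS SDK for Java v2; the table name, key and status values are assumptions based on the question):

```java
import java.util.Map;
import software.amazon.awssdk.services.dynamodb.DynamoDbClient;
import software.amazon.awssdk.services.dynamodb.model.AttributeValue;
import software.amazon.awssdk.services.dynamodb.model.QueryRequest;
import software.amazon.awssdk.services.dynamodb.model.QueryResponse;

public class ReportJobChecker {

    private final DynamoDbClient dynamoDb = DynamoDbClient.create();

    /** Returns true if the customer already has a report job that is not finished. */
    public boolean hasRunningJob(String customerId) {
        QueryRequest request = QueryRequest.builder()
                .tableName("ReportRequests")
                .keyConditionExpression("CustomerId = :cid")
                .filterExpression("ReportExecutionStatus IN (:inProgress, :scheduled)")
                .expressionAttributeValues(Map.of(
                        ":cid", AttributeValue.fromS(customerId),
                        ":inProgress", AttributeValue.fromS("InProgress"),
                        ":scheduled", AttributeValue.fromS("Scheduled")))
                .consistentRead(true) // strongly consistent read on the base table
                .build();
        QueryResponse response = dynamoDb.query(request);
        return response.count() > 0;
    }
}
```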

Implementing static shared counter in microservice architecture

I have a use case where I want to record data in rows and display it to the user.
Multiple users can add these records, and they have to be displayed in order of insertion AND, most importantly, with a sequence number starting from 1.
I have a Spring Boot microservice architecture on the backend, which obviously means I cannot hold state in my Boot application, as I'm going to have multiple running instances.
Another method was to fetch all existing records in the DB, count them, increment the count by 1 and use that as my sequence. I would need to do this every time I do an insert.
But the problem with this second approach is parallel requests, which could result in the same sequence number being given to 2 records.
A third approach is to configure the counter in a DB, but since I am using Cosmos DB, apparently that is also not an option.
Any suggestions as to how I can implement a static, shared counter?

CQRS Event-sourcing and own database per microservice

I have some questions about event sourcing and CQRS in a microservices architecture.
I understand that after a command is sent, some microservice executes it and emits an event. The event store subscribes to the event and saves it in its database. Also, some read model, based on this event, generates and saves optimized data in a read database.
My first question is: can a microservice have its own database and store data inside it too? Or, in the event-sourcing approach, do microservices not have their own databases, with everything stored only inside the event store?
My second question is: when I execute a command in a microservice and need some data for validation purposes, do I need to call the read model, or what? Assuming microservices don't have their own databases, do I have no choice?
Can a microservice have its own database and store data inside it too?
Definitely, a microservice can have its own database. But let's use terms from ES/CQRS. A database can represent an Event Store (an append-only log of immutable events) or a Read Model - some database used to answer queries, populated by processing events.
So, a microservice can have its own read model, populated from events from other microservices.
Or a microservice can process commands and save events to a shared event store.
Or a microservice can process commands and save events to its own event store.
The choice is yours, and it depends on the degree of separation you want to achieve among microservices.
I would put all events that are usually consumed together into the same event store. That means I should be able to query for these events and get a single ordered stream as a result.
When I execute a command in a microservice and need some data for validation purposes, do I need to call the read model, or what?
A command is executed by an Aggregate, which has its own state. This state is built by processing all events for this aggregate, and it is this state that should be used to validate a command.
You cannot/should not talk to read models in the command handler, primarily because those read models are not consistent with the aggregate state. The aggregate state is consistent.
You can query a read model before sending a command (to make sure it can be sent). But in the command handler you need to rely on the aggregate state only.
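To make that concrete, here is a minimal sketch of an aggregate that rebuilds its state from its own events and validates the command against that state only; the class and event names are illustrative, not tied to any particular ES framework.

```java
import java.util.List;

class UserAggregate {

    record UserCreated(String id, String name) {}

    private boolean created;

    // Rehydrate the aggregate by replaying all of its past events.
    static UserAggregate replay(List<Object> pastEvents) {
        UserAggregate aggregate = new UserAggregate();
        pastEvents.forEach(aggregate::apply);
        return aggregate;
    }

    // Command handler: validate against the aggregate's own state, then emit new events.
    List<Object> handleCreateUser(String id, String name) {
        if (created) {
            throw new IllegalStateException("User " + id + " already exists");
        }
        return List.of(new UserCreated(id, name));
    }

    private void apply(Object event) {
        if (event instanceof UserCreated) {
            this.created = true;
        }
    }
}
```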
There is a famous case of registering a user with the requirement of a unique name. As a primary validation, your UI code can query the read model and tell the user that the entered name is taken. If the name is not taken, the UI lets the user issue the command. I'm assuming your aggregate root is the user.
But when processing this command ({id:123, type:CREATE_USER, name:somename}) you cannot check that "somename" is taken, because the aggregate state for user 123 does not contain a list of taken names. You could potentially query some AllUsernames read model, but it can be milliseconds old, and some other user could have taken "somename" already. So in this scenario, you will only detect the duplication when adding the name to the read model, and at that point you can take a compensating action - usually issue a command to suspend the user with the duplicated name and ask him to re-register or change his name somehow.
It may seem strange, but if you have a truly distributed system with several replicas of the user list, you'll have the same problem anyway, so why not just embrace the fact that the data is never fully consistent, and deal with it?

Spring Batch Framework

I am not able to decide whether the Spring Batch framework is applicable to the requirement below. I need experts' input on this.
Following is my requirement:
Read multiple Oracle tables (at least 10 tables, including both transaction and master tables), do complex calculations based on the business rules, and insert / update / delete records in the transaction tables.
I have identified the following two designs:
Design # 1:
ItemReader: Select eligible records from the key transaction table.
ItemProcessor: Fetch additional details from the DB using the key available in the record retrieved by the ItemReader (this would require multiple DB transactions).
Do the validation and computation and add the details to be written to the DB as objects in a list.
ItemWriter: Write the details available in the objects using a CustomItemWriter (insert / update / delete operations).
With this design we can achieve parallel processing, but we increase the number of DB transactions.
Design # 2:
Step # 1
ItemReader: Use a composite ItemReader (a group of ItemReaders) to read all the required tables.
ItemWriter: Save the result sets as lists of objects (one list per table) in the execution context.
Step # 2
ItemReader: Retrieve the lists of objects available in the execution context and group them into one list of objects, based on the business processing, so that the processor can process them.
ItemProcessor:
Process the chunk of objects returned by the ItemReader.
Do the validation and computation and add the details to be written to the DB as objects in a list.
ItemWriter: Write the details available in the objects using a CustomItemWriter (insert / update / delete operations).
With this design we can REDUCE the number of DB transactions, but we delay the processing until all table records are retrieved and stored in the execution context, i.e. we are not using the parallel processing provided by Spring Batch.
Please advise whether the above is feasible using Spring Batch, or whether we need to use a conventional Java program.
The good news is that your problem description matches a very common use case for spring-batch. The bad news is that the problem description is too generic to allow much meaningful input about the specific design beyond the comments already provided.
Spring-batch brings facilities similar to JCL and ISPF from the mainframe world into the Java context.
Spring Batch provides a framework for organizing and managing the boundaries of your process. It is a natural fit for a lot of ETL and big-data operations, but it is not the only way to write these processes.
If your process can be broken down into discrete steps, then Spring Batch is a good choice for you.
The ItemReader should (logically) be an iterator returning a single object representing the start of one logical unit of work (LUW). The LUW object is captured by the chunker and assembled into collections of the size you configure, and then passed to the processor. The result of the processor is then passed to the writer. In the context of an RDBMS-centric process, the commit happens at the end of the writer's operation.
What happens in each of those pieces of the step is 100% whatever you need (plain old Java). The point of the framework is to free you from the complexity and enable you to solve the problem.
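As an illustration of that shape, here is a minimal sketch of a chunk-oriented step using Spring Batch 5's Java configuration; the bean name, the placeholder LuwRecord/LuwResult types and the chunk size are assumptions, not something taken from the question.

```java
import org.springframework.batch.core.Step;
import org.springframework.batch.core.repository.JobRepository;
import org.springframework.batch.core.step.builder.StepBuilder;
import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.transaction.PlatformTransactionManager;

@Configuration
public class LuwStepConfig {

    // Placeholder domain types standing in for the "LUW" objects described above.
    public record LuwRecord(long key) {}
    public record LuwResult(long key, String action) {}

    @Bean
    public Step processLuwStep(JobRepository jobRepository,
                               PlatformTransactionManager txManager,
                               ItemReader<LuwRecord> reader,
                               ItemProcessor<LuwRecord, LuwResult> processor,
                               ItemWriter<LuwResult> writer) {
        return new StepBuilder("processLuwStep", jobRepository)
                .<LuwRecord, LuwResult>chunk(100, txManager) // commit after every chunk of 100 LUWs
                .reader(reader)       // returns one LUW object per call
                .processor(processor) // validation and computation, plain Java
                .writer(writer)       // insert / update / delete; commit happens when the writer finishes
                .build();
    }
}
```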
From my understanding, Spring Batch has nothing to do with database batch operations (or at least the word 'batch' has a different meaning in these two contexts). Spring Batch is used to create processes with multiple steps, and it gives you the chance to restart a process if one of its steps fails (without repeating the previously finished steps).

optimizing large selects in hibernate/jpa with 2nd level cache

I have a user object represented in JPA which has specific sub-types. E.g., think of User, with a subclass Admin and another subclass Power User.
Let's say I have 100k users. I have successfully implemented the second level cache using Ehcache in order to increase performance and have validated that it's working.
http://docs.jboss.org/hibernate/core/3.3/reference/en/html/performance.html#performance-cache
I know it works (i.e., the object is loaded from the cache rather than via an SQL query) when you call the load method. I've verified this via logging at the Hibernate level and by confirming that it's quicker.
However, I actually want to select a subset of all the users. For example, let's say I want to count how many Power Users there are.
Furthermore, my users have an associated ZipCode object, and the ZipCode objects are also second-level cached. What I'd like to do is ask queries like: how many Power Users do I have in New York state?
However, my question is: how do I write a query for this that will hit the second-level cache and not the database? Note that my second-level cache is configured to be read/write, so as new users are added to the system they should automatically be added to the cache. Also, note that I have briefly investigated the query cache, but I'm not sure it's applicable, as that is for queries that are run multiple times. My problem is more a case of: the data should be in the second-level cache anyway, so what do I have to do so that the database doesn't get hit when I run my query?
cheers,
Brian
(...) the data should be in the second-level cache anyway, so what do I have to do so that the database doesn't get hit when I run my query?
If the entities returned by your query are cached, have a look at Query#iterate(). This will trigger a first query to retrieve the list of IDs and then subsequent lookups for each ID... and those lookups would hit the L2 cache.
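For illustration, a count along those lines might look like the sketch below (Hibernate 3.x session API; the PowerUser entity and the zipCode.state path are assumptions based on the question):

```java
import java.util.Iterator;
import org.hibernate.Session;
import org.hibernate.SessionFactory;

public class PowerUserStats {

    private final SessionFactory sessionFactory;

    public PowerUserStats(SessionFactory sessionFactory) {
        this.sessionFactory = sessionFactory;
    }

    /** Counts Power Users in a given state, resolving each entity via the L2 cache where possible. */
    public int countPowerUsersInState(String state) {
        Session session = sessionFactory.getCurrentSession();
        // iterate() first selects only the IDs; each full entity is then looked up
        // by ID, which can be served from the second-level cache instead of the DB.
        Iterator<?> users = session
                .createQuery("from PowerUser u where u.zipCode.state = :state")
                .setParameter("state", state)
                .iterate();
        int count = 0;
        while (users.hasNext()) {
            users.next();
            count++;
        }
        return count;
    }
}
```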
