Spring-Couchbase auto-generated unique ID not for production?

The feature documentation
and the reference documentation of the spring-data-couchbase module say that the built-in 'UNIQUE' ID generation strategy should only be used for test scaffolding. This statement is given without an explanation.
Why is this strategy not suitable for production?
Example usage:
@Document
class Entity(
    @Id
    @GeneratedValue(strategy = GenerationStrategy.UNIQUE)
    val id: String?,
    @Version
    val version: String?,
    @CreatedDate
    val creationTime: LocalDateTime?
)

Writes in Couchbase are asynchronous by default, and the same goes for views and indexes. But if you need strong consistency (read your own write), you should get the document by its key.
So, if you rely on the database to auto-generate the key for you, you will need to wait until the document is actually persisted before you can get the generated ID back. This wait can significantly reduce your overall write throughput.
Generating your own IDs is also considered good practice, but please avoid generating sequential ones (sequential IDs are a known OWASP security flaw).
This is the code I use for id generation:
public String generateId(Class<?> t) {
    // Entity type prefix for readability, plus two random UUIDs for uniqueness
    return t.getSimpleName() + "--" + UUID.randomUUID().toString() + UUID.randomUUID().toString();
}
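A minimal usage sketch, assuming a Spring Data repository called repository and the Entity class from the question (the non-id constructor arguments are just nulls for brevity): because the key is generated client-side, you already know it and can read the document back by key immediately, without waiting for the database to hand an ID back.
Entity entity = new Entity(generateId(Entity.class), null, null);
repository.save(entity);                                            // write with a client-generated key
Entity fresh = repository.findById(entity.getId()).orElseThrow();   // strongly consistent lookup by key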

Related

One to One relationship with axon framework

I'm new to Axon Framework. I have a requirement within an asset management module which I am working on.
In this module different types of assets are built, which need to be paired in a similar fashion to one-to-one relationships in SQL. I am finding it difficult to design an Aggregate for this format.
The business logic validation is as follows:
Two assetIds are inputs. These identifiers resemble aggregate identifiers.
Then, load the asset instances tied to these assetIds and check whether their status is unpaired or paired. If both assets are unpaired, pair them (update the status to paired and add the UUID to associatedAssets). Otherwise, raise an exception.
I have come up with the following Aggregate class:
@Aggregate
@Data
public class AssetAggregate {

    @AggregateIdentifier
    private UUID assetId;
    private String assetType;
    private HashMap<String, String> attributes;
    private String status;
    private String modifier;
    private UUID associatedAsset;
}
My Command Message for pairing is this:
@Data
public class PairAssetCommand {

    private UUID assetAId;
    private UUID assetBId;
}
In the sample you have given, the PairAssetsCommand cannot be handled by a single AssetAggregate, as it spans the consistency boundary of two distinct aggregate instances, namely two different AssetAggregates.
Note that the Aggregate defines the consistency boundary within your command model. Thus any command taken in by it, and all its resulting events (and the state changes that follow), will be regarded as an atomic operation. Making associations between several entities through this can mean two things:
You create a bigger Aggregate class which spans all AssetAggregates.
You have an External Command Handler (i.e. a @CommandHandler outside of an Aggregate) which handles the PairAssetsCommand.
I'd advise against option one, as it will enlarge the consistency boundary towards the entire set of assets in your system. This will eventually become a major bottleneck in maintaining an Aggregate's requirement of "keeping the consistency boundary".
That thus leaves option 2. Let's rephrase the business logic you have defined:
if both the assets are unpaired, then pair them (update the status to paired and add the UUID to associatedAssets); else raise an exception
This means you cannot validate on a single instance, but need to do this on several. Again, you can take two routes to solving this:
Dispatch an AssociateWithAssetCommand to both AssetAggregates and dispatch a compensating command if one of the AssetAggregates is already associated.
Use set-based validation in the external command handler handling the PairAssetsCommand to validate your business logic.
Which of the two is best is left to preference, I'd say. Solution two requires you to have a small query model containing the set of assets and their association statuses. Additionally, this query model needs to be updated in the same transaction as the one in which the association commands occur, and is thus somewhat more complicated.
Hence solution one would be the simplest way to go in your scenario.
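A minimal sketch of option one, assuming Axon's CommandGateway; the AssociateWithAssetCommand and DisassociateFromAssetCommand classes, and the compensating flow itself, are illustrative assumptions rather than code from the question or answer:
import org.axonframework.commandhandling.CommandHandler;
import org.axonframework.commandhandling.gateway.CommandGateway;

public class PairAssetsCommandHandler {

    private final CommandGateway commandGateway;

    public PairAssetsCommandHandler(CommandGateway commandGateway) {
        this.commandGateway = commandGateway;
    }

    @CommandHandler
    public void handle(PairAssetCommand command) {
        // Ask the first aggregate to associate itself with the second one.
        commandGateway.sendAndWait(new AssociateWithAssetCommand(command.getAssetAId(), command.getAssetBId()));
        try {
            // Then ask the second aggregate to associate itself with the first one.
            commandGateway.sendAndWait(new AssociateWithAssetCommand(command.getAssetBId(), command.getAssetAId()));
        } catch (Exception e) {
            // Compensate: the second asset was already paired, so undo the first association.
            commandGateway.sendAndWait(new DisassociateFromAssetCommand(command.getAssetAId(), command.getAssetBId()));
            throw e;
        }
    }
}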

What is the better way to update a JPA (Hibernate) entity: transactional or non-transactional, and why?

I have a situation where I have to make a choice between two options, and it's not clear to me what the difference between those options is. I will be very thankful if somebody could explain which one I should choose and why.
Long story short I have a simple JPA entity (Kotlin language):
@Entity
@Table(name = "account")
data class AccountEntity(
    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    var id: Long,
    var balance: Int,
    @ManyToOne
    var accountType: AccountTypeEntity
)
And in the business-logic layer I want to have a method for updating an account's balance by its accountId. Basically I need to load the account entity by id, then set the new balance, and lastly use the save method provided by the repository. But I also found that I don't need to call the save method explicitly if my method is annotated with @Transactional. So from that point I have two options.
First one
fun updateAccountBalance(id: Long, balance: Int) {
    val account = accountRepository.findById(id).orElseThrow { RuntimeException("No account with id=$id found") }
    account.balance = balance
    accountRepository.save(account)
}
Second one
@Transactional
fun updateAccountBalance(id: Long, balance: Int) {
    val account = accountRepository.findById(id).orElseThrow { RuntimeException("No account with id=$id found") }
    account.balance = balance
}
Firstly, it's not clear to me what the difference between those options will be in terms of the database. Could you please clarify it?
Secondly, I think that in such a method I don't really need a TRANSACTION (in database terms) at all, because I make only one 'write' operation, and it looks redundant to use a transaction just to avoid calling Hibernate's save method explicitly. But maybe I'm wrong and there are reasons to use a transaction even here. So please correct me.
The difference is almost none in this case. The first example also creates a transaction as it will be created by the save() call when there is no running transaction to adopt. It will live for as long as the save() call. In the second example you create a transaction yourself, which will basically live for just as long as the method invocation. Since there is almost no logic in these methods their footprint will be mostly identical.
This is not a great example to try and figure this out, as it is too simplistic. Things get more interesting when you perform more complicated updates to the entity which may touch multiple tables and records at the same time, especially when you start to make changes that cause cascaded persists, updates and deletes when you modify a OneToMany collection.
Imagine a system which processes orders. Orders have order lines. Orders are tied to invoices, and order lines are tied to invoice lines. Maybe orders have parent orders because they're grouped together. Payments are split into bookings and booking lines, which are tied to the orders, order lines, invoices and invoice lines. Imagine what such an entity hierarchy does in a single save() statement.
In such cases it is all the more clear why a function such as save() still creates a transaction; that one save() call can still represent anywhere between one and thousands of statements being executed depending on the complexity of the entity hierarchy. Having the possibility to rollback all changes in case of a failure is a must.
When you start to work with such an entity structure, you will likely gravitate to using a @Transactional setup quite quickly, as you will be running into the infamous lazy initialization error sooner or later.
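To make the cascading point concrete, here is a minimal sketch (the PurchaseOrder/OrderLine names are illustrative, not from the question; jakarta.persistence is assumed, older stacks use javax.persistence) where one save() on the order cascades to all of its lines, so a surrounding @Transactional guarantees that either every row is written or none:
import jakarta.persistence.*;
import java.util.ArrayList;
import java.util.List;

@Entity
@Table(name = "orders") // "order" is a reserved word in most databases
class PurchaseOrder {

    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;

    // Saving or deleting the order cascades to every line:
    // one save() call can emit many INSERT/UPDATE/DELETE statements.
    @OneToMany(mappedBy = "order", cascade = CascadeType.ALL, orphanRemoval = true)
    private List<OrderLine> lines = new ArrayList<>();
}

@Entity
class OrderLine {

    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;

    @ManyToOne(fetch = FetchType.LAZY)
    private PurchaseOrder order;
}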

Should GraphQL DataLoader wrap request to database or wrap requests to service methods?

I have very common GraphQL schema like this (pseudocode):
Post {
commentsPage(skip: Int, limit: Int) {
total: Int
items: [Comment]
}
}
So, to avoid the n+1 problem when requesting multiple Post objects, I decided to use Facebook's DataLoader.
Since I'm working on a NestJS 3-tier layered application (Resolver-Service-Repository), I have a question:
should I wrap my repository methods with DataLoader, or should I wrap my service methods with DataLoader?
Below is an example of a service method that returns a page of Comments (i.e. this method is called from the commentsPage property resolver). Inside the service method I'm using two repository methods (getCount and find):
@Injectable()
export class CommentsService {
  constructor(
    private readonly repository: CommentsRepository,
  ) {}

  async getCommentsPage(postId, dateStart, dateEnd, skip, limit): Promise<PaginatedComments> {
    const total = await this.repository.getCount(postId, dateStart, dateEnd);
    const itemsDocs = await this.repository.find(postId, dateStart, dateEnd, skip, limit);
    const items = this.mapDbResultToGraphQlType(itemsDocs);
    return new PaginatedComments(total, items);
  }
}
So should I create individual instances of Dataloader for each of repository method (#count, #find etc) or should I just wrap my entire service method with Dataloader (so my commentsPage property resolver will just work with Dataloader not with service)?
Disclaimer: I am not an expert in Nest.js but I have written a good bunch of dataloaders as well as worked with automatically generated dataloaders. I hope I can give a bit of insight nonetheless.
What is the actual problem?
While your question seems to be a relatively simple either-or question, it is probably much more difficult than that. I think the actual problem is the following: whether or not to use the dataloader pattern for a specific field needs to be decided on a per-field basis. The repository+service pattern, on the other hand, tries to abstract away this decision by exposing abstract and powerful ways of data access. One way out would be to simply "dataloaderify" every method of your service. Unfortunately, in practice this is not really feasible. Let's explore why!
Dataloader is made for key-value-lookups
Dataloader provides a promise cache to reduce duplicated calls to the database. For this cache to work, all requests need to be simple key-value lookups (e.g. userByIdLoader, postsByUserIdLoader). This quickly becomes insufficient; for example, in your case the request to the repository has a lot of parameters:
this.repository.find(postId, dateStart, dateEnd, skip, limit);
Sure technically you could make { postId, dateStart, dateEnd, skip, limit } your key and then somehow hash the content to generate a unique key.
Writing Dataloader queries is an order of magnitude harder than normal queries
When you implement a dataloader query, it suddenly has to work for a list of the inputs the initial query needed. Here is a simple SQL example:
SELECT * FROM user WHERE id = ?
-- Dataloaded
SELECT * FROM user WHERE id IN ?
Okay now the repository example from above:
SELECT * FROM comment WHERE post_id = ? AND date < ? AND date > ? OFFSET ? LIMIT ?
-- Dataloaded
???
I have sometimes written queries that work for two parameters, and they already become very difficult problems. This is why most dataloaders are simple load-by-id lookups. This thread on Twitter discusses how a GraphQL API should only expose what can be efficiently queried. If you create service methods with strong filter capabilities, you have the same problem even if your GraphQL API does not expose these filters.
Okay so what is the solution?
The first thing, to my understanding, that Facebook does is match fields and service methods very closely. You could do the same. This way you can decide in the service method whether to use a dataloader or not. For example, I don't use dataloaders in root queries (e.g. { getPosts(filter: { createdBefore: "...", user: 234 }) { ... } }) but I do in subfields of types that appear in lists (e.g. { getAllPosts { comments { ... } } }). The root query is not going to be executed in a loop and is therefore not exposed to the n+1 problem.
Your repository now exposes what can be "efficiently queried" (as in Lee's tweet), like foreign/primary key lookups or filtered find-all queries. The service can then wrap, for example, the key lookups in a dataloader. Often I end up filtering small lists in my business logic. I think this is perfectly fine for small apps but might become problematic when you scale. The GraphQL Relay helpers for JavaScript do something similar when you use the connectionFromArray function: the pagination is not done on the database level, and this is probably okay for 90% of connections.
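Since the other examples on this page are Java, here is a minimal sketch of that key-lookup idea with the java-dataloader library rather than NestJS/TypeScript (the Comment type and the findByPostIdIn repository method are assumed for illustration): the loader is keyed by postId only, and one batched query replaces one query per post.
import org.dataloader.BatchLoader;
import org.dataloader.DataLoader;
import org.dataloader.DataLoaderFactory;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

public class CommentLoaders {

    // Key-value lookup: postId -> comments of that post.
    public static DataLoader<Long, List<Comment>> byPostId(CommentsRepository repository) {
        BatchLoader<Long, List<Comment>> batchLoader = postIds ->
                CompletableFuture.supplyAsync(() -> {
                    // One "post_id IN (...)" query for the whole batch.
                    Map<Long, List<Comment>> byPost = repository.findByPostIdIn(postIds).stream()
                            .collect(Collectors.groupingBy(Comment::getPostId));
                    // DataLoader requires results in the same order as the incoming keys.
                    return postIds.stream()
                            .map(id -> byPost.getOrDefault(id, List.<Comment>of()))
                            .collect(Collectors.toList());
                });
        return DataLoaderFactory.newDataLoader(batchLoader);
    }
}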
Some sources to consider
GraphQL before GraphQL - Dan Schafer
Dataloader source code walkthrough - Lee Byron
There is another talk from this year's GraphQL Conf that discusses data access at FB, but I don't think it has been uploaded yet. I might come back when it has been published.

Objectify, efficient relationships. Ref<> vs storing id and duplicating fields

I'm having a hard time understanding Objectify entity relationship concepts. Let's say that I have the entities User and UsersAction.
class User {
    String nick;
}
class UsersAction {
    Date actionDate;
}
Now in the front-end app I want to load many UsersActions and display them, along with the corresponding user's nick. I'm familiar with two ways of dealing with this:
Use Ref<>:
I can put a @Load Ref in UsersAction, so it will create a link between these entities. Later, while loading a UsersAction, Objectify will load the proper User.
class User {
    String nick;
}
class UsersAction {
    @Load Ref<User> user;
    Date actionDate;
}
Store the Id and duplicate the nick in UsersAction:
I can also store the User's Id in UsersAction and duplicate the User's nick while saving a UsersAction.
class User {
    String nick;
}
class UsersAction {
    Long usersId;
    String usersNick;
    Date actionDate;
}
When using Ref<>, as far as I understand, Objectify will load all the needed UsersActions, then all the corresponding Users. When using duplication, Objectify will only need to load the UsersActions and all the data will already be there. Now, my question is: is there a significant difference in performance between these approaches? Efficiency is my priority, but the second solution seems ugly and dangerous to me, since it causes data duplication, and when a User changes his nick I need to update his Actions too.
You're asking whether it is better to denormalize the nickname. It's hard to say without knowing what kinds of queries you plan to run, but generally speaking the answer is probably no. It sounds like premature optimization.
One thing you might consider is making User a @Parent Ref<> of UsersAction. That way the parent will be fetched at the same time as the action, in the same bulk get. As long as it fits your required transaction throughput (no more than about 1 change per second for the whole User entity group), it should be fine.
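A minimal sketch of that layout, assuming Objectify's standard annotations (the Long ids are illustrative; the field names follow the question):
import com.googlecode.objectify.Ref;
import com.googlecode.objectify.annotation.Entity;
import com.googlecode.objectify.annotation.Id;
import com.googlecode.objectify.annotation.Load;
import com.googlecode.objectify.annotation.Parent;
import java.util.Date;

@Entity
class User {
    @Id Long id;
    String nick;
}

@Entity
class UsersAction {
    @Id Long id;
    // The parent ref places each action in its User's entity group, so the User
    // is fetched in the same bulk get as the action; @Load resolves it eagerly.
    @Parent @Load Ref<User> user;
    Date actionDate;
}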

Detecting duplicate table name via unit / integration test

On a couple of occasions now, as a result of copy-paste, I have created two JPA entities with the same table name. i.e.:
@Table(name = "myfirsttable")
public class MyFirstTable { @Id @Column private Long id; }

@Table(name = "myfirsttable")
public class MySecondTable { @Id @Column private Integer id; }
I'm using Spring Test, which means that fortunately at least one test fails when I do this. The trouble is that the failures I see complain about data types. For example, in the above I would see an exception raised from HibernateJpaAutoConfiguration.class such as expected int but found bigint for myfirsttable. If I look at the class which is supposed to map to myfirsttable, I get confused (I'm easily confused), thinking "But it says it's a Long, so surely bigint is the correct mapping?" It can take me a while to work out why I'm seeing that particular message. Similarly, the stack trace may mention being unable to find a field.
So far, there have only been a couple of occasions when I have felt the need to create two different entities pointing at the same table. So, as a means of covering the 99% of cases where two entities pointing at the same table is an error, I was wondering whether there is a simple way to set up a test which would fail in a way that tells me up front that I have created a duplicate table name. I'm thinking of a single test that I can put into all of my projects, which could give me a useful warning identifying this issue.
There are two options that I can see:
You could create a test that just tries to load your ApplicationContext. If that fails, something is wrong. Unfortunately, to find out what exactly is wrong, you'll have to dig through the logs.
The other option would be to write a test that looks at all classes annotated with @Table and checks whether more than one of them uses the same table name. I use a similar test in one of my projects to make sure that no entity class uses primitives. There are libraries (the Reflections library, for example) that make it easier to scan for classes with certain annotations; a sketch follows below.
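A rough sketch of that second option, assuming the org.reflections library and JUnit 5 are on the test classpath (the base package is a placeholder, and jakarta.persistence may need to be javax.persistence on older stacks):
import jakarta.persistence.Table;
import org.junit.jupiter.api.Test;
import org.reflections.Reflections;

import java.util.HashMap;
import java.util.Map;
import java.util.Set;

import static org.junit.jupiter.api.Assertions.assertNull;

class DuplicateTableNameTest {

    @Test
    void tableNamesAreUnique() {
        Reflections reflections = new Reflections("com.example.myapp"); // placeholder base package
        Set<Class<?>> entities = reflections.getTypesAnnotatedWith(Table.class);

        // Map each table name to the first entity seen with it; a second hit is a duplicate.
        Map<String, Class<?>> seen = new HashMap<>();
        for (Class<?> entity : entities) {
            String tableName = entity.getAnnotation(Table.class).name();
            Class<?> previous = seen.put(tableName, entity);
            assertNull(previous, "Table name '" + tableName + "' is used by both " + previous + " and " + entity);
        }
    }
}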

Resources