Comparing entities in JaVers without an id

My application uses jBPM. jBPM processes can be long-lived, and a process can hold variables.
For example, I have a business process for contract approval. When the process starts, an empty contract is created. The contract owner fills in the common properties, then the financial people fill in their fields, security theirs, the lawyers theirs, and so on. When this long process finally finishes, the complete contract with all its data is saved to the database. Until then, the contract exists only as a process variable with id == null (it is saved only at the end).
In some other processes, however, variables are saved multiple times during the process, not only at the end.
I need to track changes in these process entities and use JaVers for that.
The process entities are Hibernate @Entity classes, and by default JaVers treats them as Entities. But the contract's id appears only at the end of the process, so I get ENTITY_INSTANCE_WITH_NULL_ID.
I tried comparing them as ValueObjects:
Javers javers = JaversBuilder.javers()
    .registerValueObject(Contract.class)
    .build();
In some situations the properties of my entities are Hibernate proxies; this happens when an entity is saved multiple times during a process. If I use .withObjectAccessHook(new HibernateUnproxyObjectAccessHook()), it tries to pull in the whole database: object references, references of references, and so on. So that is not a working solution.
How can I get the difference between my objects in all cases, both when the id is null and when the objects are proxies, without selecting all the data from the database? :)
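For illustration, here is a minimal, self-contained sketch of the ValueObject comparison described above (the Contract class and its totalAmount property are simplified stand-ins for the real entity). It diffs two in-memory states even though the id is still null:

import java.math.BigDecimal;
import org.javers.core.Javers;
import org.javers.core.JaversBuilder;
import org.javers.core.diff.Diff;

public class ContractDiffSketch {

    // Simplified stand-in for the real Hibernate @Entity
    static class Contract {
        Long id;                 // still null while the process is running
        BigDecimal totalAmount;
    }

    public static void main(String[] args) {
        Javers javers = JaversBuilder.javers()
                .registerValueObject(Contract.class)   // compare by state, not by id
                .build();

        Contract before = new Contract();
        Contract after = new Contract();
        after.totalAmount = new BigDecimal("100");

        Diff diff = javers.compare(before, after);     // no ENTITY_INSTANCE_WITH_NULL_ID here
        System.out.println(diff.prettyPrint());
    }
}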

Related

Javers global id fragility

Ran into what I believe is a bug or at least a shortcoming with Javers.
We have two applications that use a shared database.
The first application is responsible for creating the database entities.
The second application is a batch processing application that reads and updates the same entities after performing a long running batch process, eventually updating the entity with the results of that process.
The first application uses the audit log of the entity (provided by Javers) to show the various changes to the entity over time.
Here's the issue:
In the first application the package name for the entity is something like this:
a.b.c.Entity
In the second application the package name is slightly different:
a.b.c.d.Entity
The different package naming was done to follow the naming/package conventions of the two applications.
Otherwise the two mappings of the entities are identical. Both applications are set up to use Javers for audit logging.
We observed that the changes to the entity performed by the batch application were not showing up in the audit log in the main application.
As it turns out, Javers relies on the entity package name to identify the object recorded in the j_snapshot table. So we had two separate audit trails for the same database table: one for a.b.c.Entity and another for a.b.c.d.Entity even though they shared the same collection and id.
We were able to resolve the issue by using the same package for the entity in both applications. This works, but it seems rather brittle to me.
A similar scenario could arise if for whatever reason the package name of the mapped entity was changed. All history of the original entity would be essentially lost when looking up the audit changes for the newly renamed entity.
Is there a way to configure Javers to avoid this or are we tied to using the package name in the global_id_key and the object's id as an identifier?
As stated in JaVers' docs, you should use @TypeName:
@TypeName("Person")
class Person {
    @Id String name;
    String position;
}
https://javers.org/documentation/domain-configuration/#entity
@TypeName: a convenient way to name Entities and Value Objects. We recommend using this annotation for all Entities and Value Objects. Otherwise, Javers uses fully-qualified class names in GlobalIds, which hinders refactoring classes committed to JaversRepository.
Important! All classes with @TypeName should be registered in JaversBuilder using withPackagesToScan(String).
https://javers.org/documentation/domain-configuration/#supported-annotations
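For example, a minimal sketch of wiring this up (the package name a.b.c is an assumption taken from the question; the second application would scan a.b.c.d instead). Because both applications annotate the entity with @TypeName("Person"), snapshots from both end up in the same audit trail:

import org.javers.core.Javers;
import org.javers.core.JaversBuilder;

public class AuditSetupSketch {
    public static void main(String[] args) {
        Javers javers = JaversBuilder.javers()
                .withPackagesToScan("a.b.c")   // the package holding the @TypeName-annotated entity
                .build();
        // Snapshots committed through this instance are stored under the logical
        // type name "Person", not under the fully-qualified class name.
    }
}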

Should database primary keys be used to identify entities across microservices?

Given I have two microservices: Service A and Service B.
Service A owns the full customer data and Service B requires a small subset of this data (which it gets from Service A through some bulk load, say).
Both services store customers in their own database.
If Service B then needs to interact with Service A, say to get additional data (e.g. GET /customers/{id}), it clearly needs a unique identifier that is shared between the two services.
Because the ids are GUIDs, I could simply use the PK from Service A when creating the record in Service B, so both PKs match.
However, this sounds extremely fragile. One option is to store the 'external id' (or 'source id') as a separate field in Service B and use that to interact with Service A. This would probably be a string, as one day it may not be a GUID.
Is there a 'best practice' around this?
update
So I've done some more research and found a few related discussions:
Should you expose a primary key in REST API URLs?
Is it a bad practice to expose the database ID to the client in your REST API?
Slugs as Primary Keys
conclusion
I think my idea of trying to keep both Primary Keys for Customer the same across Service A and B was just wrong. This is because:
Clearly PKs are service-implementation specific, so they may be totally incompatible, e.g. a UUID vs an auto-incremented INT.
Even if you could guarantee compatibility, although the two entities both happen to be called 'Customer', they are effectively two (potentially very different) concepts, and Service A and Service B each 'own' their own 'Customer' record. What you may want to do, though, is synchronise some Customer data across those services.
So I now think that each service can expose customer data via its own unique id (in my case the PK GUID), and if one service needs to obtain additional customer data from another service, it must store that other service's identifier/key and use that. So essentially I am back to my 'external id' or 'source id' idea, though perhaps more specifically named, e.g. 'Service B id'.
I think it depends a bit on the data source and your design, but one thing I would avoid is sharing a primary key, whether a GUID or an auto-increment integer, with an external service. Those are internal details of your service, not something other services should take a dependency on.
I would rather have an external id that is better understood by other services and perhaps the business as a whole. It could be a unique customer number, order number, or policy number, as opposed to a raw id; you can also think of it as a "business id". Keep in mind that an external id can also be exposed to an end user, so it is a ubiquitous way of identifying that "entity" across the entire organization and all services, irrespective of whether you have an event-driven design or your services talk through APIs. I would only expose the DB ids to the infrastructure or repository layer. Beyond that, it is only the business/external id.
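As a rough illustration (the class and field names here are assumptions, not taken from either service), the idea is simply to keep the internal primary key and the business-facing external id as separate fields:

import java.util.UUID;

public class Customer {

    // Internal primary key: owned by this service, never shared with other services
    private final UUID id;

    // Business/external id: shared with other services and end users (e.g. a customer number)
    private final String customerNumber;

    public Customer(UUID id, String customerNumber) {
        this.id = id;
        this.customerNumber = customerNumber;
    }

    public UUID getId() {
        return id;
    }

    public String getCustomerNumber() {
        return customerNumber;
    }
}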
Well, if you are familiar with the idea of a Value Object, a business ID is a much better fit for the design.
DDD focuses on the business, and a pure UUID or auto-increment ID can't express that.
Use an ID with business meaning, like a Customer ID, instead of a plain ID.

Single vs. multiple Linq2sql repositories

I have a Users table, Events table, and a mapping of UserEvents. In some parts of my code, I just need user-based stuff. In other parts, I need all of this information. (Especially: given a user, what are the details of each event they are subscribed to?)
If I have one repository just for users and another for users + events + userevents, then the auto-generated Users class is duplicated and the code won't compile until I rename one of them. This is possible but inconvenient. On the other hand, if I only have one repository with all 3 tables, when I just want user info, will it be expensive because LINQ fetches all the data associated with that user id?
In Linq2Sql, is it more expensive if you have more tables in a single dbml/repository?
Linq2Sql uses lazy loading to get additional information. I believe it can be configured to fetch all at once, but that is not the default behavior. If you ask for a user, you will not get events unless you specifically ask for them.
I have a project with 100+ tables in the dbml; as far as I can tell, this does not affect the time to instantiate the DataContext class.

Referencing object's identity before submitting changes in LINQ

Is there a way of knowing the ID of the identity column of a record inserted via InsertOnSubmit beforehand, i.e. before calling the DataContext's SubmitChanges?
Imagine I'm populating some kind of hierarchy in the database, but I wouldn't want to submit changes on each recursive call for each child node (e.g. if I had a Directories table and a Files table and were recreating my filesystem structure in the database).
I'd like to do it this way: I create a Directory object, set its name and attributes,
then InsertOnSubmit it into the DataContext.Directories collection, then reference Directory.ID in its child Files. Currently I need to call SubmitChanges to insert the 'directory' into the database so that the mapping fills in its ID column. But this creates a lot of transactions and database accesses, and I imagine that if I did the inserting in a batch, the performance would be better.
What I'd like to do is to somehow use Directory.ID before committing changes, create all my File and Directory objects in advance, and then do one big submit that puts everything into the database. I'm also open to solving this problem via a stored procedure; I assume the performance would be even better if all operations were done directly in the database.
One way to get around this is to not use an identity column. Instead build an IdService that you can use in the code to get a new Id each time a Directory object is created.
You can implement the IdService by having a table that stores the last id used. When the service starts up have it grab that number. The service can then increment away while Directory objects are created and then update the table with the new last id used at the end of the run.
Alternatively, and a bit safer, when the service starts up have it grab the last id used and then update the last id used in the table by adding 1000 (for example). Then let it increment away. If it uses 1000 ids then have it grab the next 1000 and update the last id used table. Worst case is you waste some ids, but if you use a bigint you aren't ever going to care.
Since the Directory id is now controlled in code you can use it with child objects like Files prior to writing to the database.
Simply putting a lock around id acquisition makes this safe to use across multiple threads. I've been using this in a situation like yours. We're generating a ton of objects in memory across multiple threads and saving them in batches.
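A rough sketch of this block-reservation approach (written in Java for illustration; the IdBlockStore over the "last id used" table is an assumed abstraction):

public class IdService {

    private static final int BLOCK_SIZE = 1000;

    private final IdBlockStore store;   // assumed DAO over the "last id used" table
    private long next;
    private long blockEnd;

    public IdService(IdBlockStore store) {
        this.store = store;
        reserveBlock();
    }

    // Hand out the next id from the reserved block; the lock makes this safe across threads
    public synchronized long nextId() {
        if (next > blockEnd) {
            reserveBlock();             // worst case: leftover ids in the old block are wasted
        }
        return next++;
    }

    private void reserveBlock() {
        // Atomically advance the stored "last id used" by BLOCK_SIZE and get the previous value
        long lastUsed = store.advanceLastIdUsed(BLOCK_SIZE);
        next = lastUsed + 1;
        blockEnd = lastUsed + BLOCK_SIZE;
    }

    // Assumed interface over the table that stores the last id used
    public interface IdBlockStore {
        long advanceLastIdUsed(int blockSize);
    }
}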
This blog post will give you a good start on saving batches in Linq to SQL.
Not sure off the top of my head if there is a way to run a straight SQL query in LINQ, but this query will return the current identity value of the specified table.
USE [database];
GO
DBCC CHECKIDENT ("schema.table", NORESEED);
GO

Mapping Linq Entities and Domain Objects and object tracking

If I map my domain objects to LINQ entities, will I no longer be able to track changes when saving my domain objects? For any change in my model that I wish to make, once I map the object to LINQ entities for submission to the db, will all object values be submitted to the db by LINQ, since it goes through a mapping first? Or would object tracking still be utilized here?
It depends on the O/R mapper you're using. You're referring to Entity Framework, which doesn't do any change tracking inside the entity, and therefore it needs help from you when you re-attach an entity that was previously fetched from the db (so it knows it's not new).
Here's an article from Microsoft about CRUD operations in multi-tiered environments (similar issues to your domain mapping scenario).
Check out the "Update - With Complete Entities" section for the way to do change tracking yourself.
There's another technique where you attach the entity as unmodified and then call .Refresh() with KeepCurrentValues, replacing the original. This would allow you to insert, update, or do nothing as appropriate, at the cost of a database round trip.
