Single vs. multiple Linq2sql repositories - linq

I have a Users table, Events table, and a mapping of UserEvents. In some parts of my code, I just need user-based stuff. In other parts, I need all of this information. (Especially: given a user, what are the details of each event they are subscribed to?)
If I have one repository just for users and another for users + events + userevents, then the auto-created users object is duplicated and the code won't compile until I rename one of them. This is possible but inconvenient. On the other hand, if I only have one repository with all 3 tables, when I just want user info, will it be expensive due to linq getting all the associated data with that user id?
In Linq2Sql, is it more expensive if you have more tables in a single dbml/repository?

Linq2Sql uses lazy loading to get additional information. I believe it can be configured to fetch all at once, but that is not the default behavior. If you ask for a user, you will not get events unless you specifically ask for them.
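To make the lazy-loading point concrete, here is a minimal Python sketch of the general idea. It is not LINQ to SQL's API: `LazyUser`, `fake_event_query`, and `queries_run` are hypothetical names invented for illustration, and the "query" is just a function call that records that it ran.

```python
from typing import Callable, List, Optional


class LazyUser:
    """Toy stand-in for a mapped entity (hypothetical, not a LINQ to SQL class)."""

    def __init__(self, user_id: int, load_events: Callable[[int], List[str]]):
        self.user_id = user_id
        self._load_events = load_events            # hypothetical query callback
        self._events: Optional[List[str]] = None   # nothing fetched yet

    @property
    def events(self) -> List[str]:
        # Related rows are fetched only on first access, then cached.
        if self._events is None:
            self._events = self._load_events(self.user_id)
        return self._events


queries_run: List[int] = []


def fake_event_query(user_id: int) -> List[str]:
    queries_run.append(user_id)  # record that a "query" was issued
    return ["conference", "meetup"]


user = LazyUser(42, fake_event_query)
queries_before_access = len(queries_run)  # the user exists, no event query yet
first = user.events                       # triggers the one and only event query
second = user.events                      # served from the cached list
```

Creating the user costs nothing on the events side; the related data is only paid for when it is actually read.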

I have a project with 100+ tables in the dbml. As far as I can tell, this does not affect the time it takes to instantiate the DataContext class.

Related

What should be projection primary key on query side - CQRS, Event Sourcing, Microservices

I have one thing that confuses me.
I have 2 microservices.
One creates commands and the other consumes commands and produces events (events are stored in an Event Store).
In my example, aggregates have a Guid as their entity ID, and the Guid is created when the aggregate is created.
What confuses me is: should that key (generated on the write side) be transferred via the event to the query side (the microservice that created the command)?
Or should the query side (the projection) have a separate ID in the read DB?
Or should I generate some shared key?
What is the best solution here?
I think it all depends on your setup.
If you are doing CQRS and you have a separate read service (within the same bounded context), then it is up to the read-side service to model the data as it wishes, either reusing the same keys or not.
If you are communicating between two different services (separate bounded contexts), then I recommend you create new primary keys in the receiving service and store the incoming key as a foreign key, just as you would with a relationship between two tables in a SQL database.
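The second recommendation could be sketched as follows, using Python's built-in `sqlite3` as a stand-in for the receiving service's database. The table `customer_view`, the column `source_id`, the event shape, and the function `apply_customer_created` are all assumptions made up for this example.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer_view ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"  # key owned by the receiving service
    " source_id TEXT NOT NULL UNIQUE,"         # key assigned by the sending service
    " name TEXT NOT NULL)"
)


def apply_customer_created(event: dict) -> int:
    """Store the incoming aggregate ID as a reference, not as the local key."""
    cur = conn.execute(
        "INSERT INTO customer_view (source_id, name) VALUES (?, ?)",
        (event["aggregate_id"], event["name"]),
    )
    conn.commit()
    return cur.lastrowid


incoming_id = str(uuid.uuid4())  # Guid generated on the write side
local_id = apply_customer_created({"aggregate_id": incoming_id, "name": "Acme"})

# Later events from the other service are matched through source_id.
row = conn.execute(
    "SELECT id, name FROM customer_view WHERE source_id = ?", (incoming_id,)
).fetchone()
```

The receiving service stays free to change its own keys, while the unique `source_id` column still lets it correlate later events from the sender.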
I think this depends on your requirements. Is there a specific reason to have different keys?
Given that you are using Guids as your PK, it seems simplest to reuse the PKs assigned by the write side.
Some reasons you might want to keep the keys consistent:
- During command processing, an ID was returned to the client. They may have cached it and can reasonably expect to use that key when querying the read side.
- If your write-side data is long-lived and there is a bug in your read-side output, it is going to be much easier to debug what went wrong if your keys are consistent across the write and read sides.
- Entities on the write side will use the write-side Guid PK of another entity as their FK. When you emit an event for such a dependent entity, you want the read side to be able to build the relationship back to the principal.
This is kind of an odd question.
Your primary key on a projection could literally be anything or you might not even have one.
There is no "correct answer" for this question ... It depends entirely on the projection.
What if my projection was, say, just a flattening out of information associated with an aggregate? As an example, we have an "order" and we make a row per order showing summary information about that order. Using an "OrderId" as my primary key would seem to make sense here.
What if my projection was building out counts of orders by Product? Well then using a "ProductItemId" would make a lot more sense.
What if in either of these cases the Ids themselves ("OrderId" and "ProductItemId") could change? Well then using another key might make a lot of sense.
What if this is an append-only table? I might not even want to have a key.
Again, there is no single correct answer here; there are many situations that you may run into.
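A small Python sketch of the cases above, built from one stream of events. The event shape and the names `order_summary`, `orders_per_product`, and `audit_log` are hypothetical; the point is only that each projection picks the key (or no key) that suits how it is queried.

```python
from collections import Counter

# Hypothetical events; in a real system these would come from the event store.
events = [
    {"type": "OrderPlaced", "order_id": "o-1", "product_item_id": "p-1", "total": 30},
    {"type": "OrderPlaced", "order_id": "o-2", "product_item_id": "p-1", "total": 15},
    {"type": "OrderPlaced", "order_id": "o-3", "product_item_id": "p-2", "total": 50},
]

order_summary = {}              # one row per order, keyed by OrderId
orders_per_product = Counter()  # one row per product, keyed by ProductItemId
audit_log = []                  # append-only, no key at all

for event in events:
    order_summary[event["order_id"]] = {"total": event["total"]}
    orders_per_product[event["product_item_id"]] += 1
    audit_log.append(event)
```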

Text search for microservice architectures

I am investigating how to implement text search on a microservice-based system. We will have to search for data that spans more than one microservice.
E.g. say we have two services for managing Organisations and managing Contacts. We should be able to search for organisations by contact details in one search operation.
Our preferred search solution is Elasticsearch. We already have a working solution based on embedded objects (and/or parent-child) where when a parent domain is updated the indexing payload is enriched with the dependent object data, which is held in a cache (we avoid making calls to the service managing child directly for this purpose).
I am wondering if there is a better solution. Is there a microservice pattern applicable to such scenarios?
What I would suggest is not specifically a microservice pattern, but it fits microservices very well: event sourcing.
Event sourcing describes an architectural pattern in which events are generated by different sources. An event triggers zero or more so-called projections, which use the data contained in the event to aggregate information in the form in which it is needed.
This is directly applicable to your problem: whenever the organisation service changes its internal state (an organization is added, removed, or updated), it can fire an event. If an organization is added, a projection can, for example, aggregate the contacts for this organization and store that aggregate. The search is now trivial: look up the organization's ID in the aggregated information (this can be indexed) and get back the contacts associated with this organization. Of course, the same works if contacts are added to the contact service: it just fires a message with the contact creation information, and the corresponding projections alter different aggregates that can again be indexed and searched quickly.
You can have multiple projections responding to a single event - which enables you to aggregate information in many different forms - exactly the way you'd like to query it later. Don't be afraid of duplicated data: event sourcing takes this trade-off intentionally and since this is not the data your business-services rely on and you do not need to alter it manually - this duplication will not hurt you.
If you store the events in the chronological order in which they happened (which I seriously advise you to do!), you can replay these events over and over again. This helps, for example, if a projection was buggy and has to be fixed.
If you're interested, I suggest you read up on event sourcing and look for some kind of event store:
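Replaying a stored log to rebuild a projection could look roughly like this in Python. Everything here is illustrative: the event fields (`seq`, `org_id`, `contact`), the two projection functions, and `replay` are assumptions, and the log is an in-memory list rather than a real event store.

```python
from typing import Callable, Dict, List

Event = dict
State = Dict[str, dict]

# Hypothetical stored events, each with a sequence number giving their order.
event_log: List[Event] = [
    {"seq": 1, "type": "OrganisationAdded", "org_id": "org-1", "name": "Acme"},
    {"seq": 2, "type": "ContactAdded", "org_id": "org-1", "contact": "alice@example.com"},
    {"seq": 3, "type": "ContactAdded", "org_id": "org-1", "contact": "bob@example.com"},
]


def buggy_projection(state: State, event: Event) -> None:
    if event["type"] == "OrganisationAdded":
        state[event["org_id"]] = {"name": event["name"], "contacts": []}
    elif event["type"] == "ContactAdded":
        # Bug: overwrites the contact list instead of appending to it.
        state[event["org_id"]]["contacts"] = [event["contact"]]


def fixed_projection(state: State, event: Event) -> None:
    if event["type"] == "OrganisationAdded":
        state[event["org_id"]] = {"name": event["name"], "contacts": []}
    elif event["type"] == "ContactAdded":
        state[event["org_id"]]["contacts"].append(event["contact"])


def replay(log: List[Event], projection: Callable[[State, Event], None]) -> State:
    """Rebuild a projection from scratch by applying every event in order."""
    state: State = {}
    for event in sorted(log, key=lambda e: e["seq"]):
        projection(state, event)
    return state


broken = replay(event_log, buggy_projection)
rebuilt = replay(event_log, fixed_projection)  # same events, corrected projection
```

Because the events themselves are never modified, fixing the projection code and replaying is enough to repair the derived data.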
event sourcing
event store
We use event sourcing to aggregate an array of different searches in our system, and we aggregate millions of records every day into MongoDB. All projections have their own collection and create their own indexes, and so far we have never had to resort to other systems or patterns such as Elasticsearch.
Let me know if this helped!
Amendment
"use the data contained in the event to aggregate information in the form it is needed"
An event should contain all the information necessary to aggregate more information. For example, if you have an organization creation event, you need to provide at least the organization's name, an ID of some kind, the creation date, the parent organization's ID, and so on. As a rule of thumb, we send all the information we gather in the service that receives the request (don't take it directly from the request ;-) check it first, then write it to the event and send it off), because we do not know what we are going to need in the future. Just stay cautious: payloads should not get too large!
We can now have multiple projections responding to this event: one that adds the organization to its parent's aggregate (to get an easy lookup for all children of a given organization), one that just adds it to the search set of all organizations, and maybe a third that aggregates all the parents of a given child organization so that the lookup for parent organizations is easy and fast.
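The three projections described above can be sketched in Python like this. The event type name `OrganisationCreated`, its fields, the handler table, and `dispatch` are hypothetical, and the sketch assumes events arrive in chronological order so that a parent is always created before its children.

```python
from collections import defaultdict
from typing import Callable, Dict, List

children_by_parent: Dict[str, List[str]] = defaultdict(list)  # parent ID -> child IDs
all_organisations: Dict[str, str] = {}                        # search set: ID -> name
ancestors: Dict[str, List[str]] = {}                          # ID -> parents, nearest first


def project_children(event: dict) -> None:
    if event["parent_id"] is not None:
        children_by_parent[event["parent_id"]].append(event["org_id"])


def project_search_set(event: dict) -> None:
    all_organisations[event["org_id"]] = event["name"]


def project_ancestors(event: dict) -> None:
    parent = event["parent_id"]
    if parent is None:
        ancestors[event["org_id"]] = []
    else:
        # Relies on the parent having been projected already.
        ancestors[event["org_id"]] = [parent] + ancestors.get(parent, [])


handlers: Dict[str, List[Callable[[dict], None]]] = {
    "OrganisationCreated": [project_children, project_search_set, project_ancestors],
}


def dispatch(event: dict) -> None:
    """Hand one event to every projection registered for its type."""
    for handler in handlers.get(event["type"], []):
        handler(event)


for created in [
    {"type": "OrganisationCreated", "org_id": "root", "name": "Acme Group", "parent_id": None},
    {"type": "OrganisationCreated", "org_id": "region", "name": "Acme North", "parent_id": "root"},
    {"type": "OrganisationCreated", "org_id": "branch", "name": "Acme Leeds", "parent_id": "region"},
]:
    dispatch(created)
```

Each lookup the client needs (children, full list, parents) is then a single keyed read against its own projection.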
The same service that processes client requests also processes these events. The motivation is that the schema of the data your projections create is tightly coupled to the way it is read by the service the client interacts with. It does not have to be that way, and the work could be split into two services, but you create an almost invisible dependency there, and releasing the two services independently becomes even more challenging. If you do not mind that additional level of complexity, you can separate the two.
We're currently also considering writing a generic service for aggregating information from events for things like searches, where projections could be scripted. That only makes the invisible dependencies problem less conspicuous, it does not solve it.

optimization of db queries when implementing bounded contexts

In our project we're trying to apply the Bounded Context ideology, and we've run into a fairly obvious performance problem. For example, we have different classes (in different contexts) for representing a user in the system: Person in our core domain's context and User in the security context. So we have two different repositories, one for each aggregate, but they use the same table in the DB and sometimes access the same data.
Is there a common solution to minimize DB round trips in this case? Are there ORMs that deal with it, or should we code a caching system ourselves?
upd: the db is from legacy app, and we'll have to use it "as is"
"So, we have two different repositories for each of the aggregate, but they are using the same table in DB and sometimes accessing the same data."
The fact that you have two aggregates stored in the same table is an indication of a problem with the design. In this case, it seems you have two bounded contexts - a BC for the core domain (Person is here) and an identity/access BC (User is here). The BCs are related and the latter can be seen as upstream from the former. A Person in the core domain has a corresponding User in the identity BC, but they are not exactly the same thing.
Beyond this relationship between the BCs, there are questions regarding ownership of behavior. For example, both a Person and a User may have a name, and what has to be determined is who owns the behavior of changing a name. This can be implemented in several ways. Person may have its own name, with changes propagated to the identity BC. Alternatively, User may own changes to the name, in which case they must be propagated to Person via a synchronization mechanism.
Overall, your problem could be addressed in two ways. First, you can store the Person and User aggregates in different tables. Any given query should only use one of these tables, and they can be synchronized in an eventually consistent manner. Another approach is to decouple the behavioral domain model from a model designed for queries (a read model). This way, you can create a read model designed to serve one or more specific screens and have a customized query, perhaps even outside of an ORM.
If all Users are also Persons (sometimes external services are modeled as special users too), the only data that User and Person should share in the database are their identifiers.
Indeed each entity in a domain model should hold references only to the data that they need to ensure their invariants.
Moreover, I guess that Users are identified by username and Persons are identified by something else (a VAT code or similar).
Thus, the simplest optimization technique is to avoid encapsulating in an entity any information that is not required to ensure its invariants.
Furthermore you simply need an effective context mapping technique to easily pass from User to Person when needed. I use shared identifiers for this.
As an example you can expose the Person's identifier in the User class, so that a simple query to the Person's repository can provide you the data you need.
Finally, I suggest reading the Vaughn Vernon series on Aggregate Root Design.
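As a rough illustration of the shared-identifier mapping described in this answer, here is a Python sketch. The classes, fields, and the in-memory `PersonRepository` are hypothetical; the only thing the two contexts share is `person_id`.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class User:
    """Security context: only what is needed for identity and access."""
    username: str
    person_id: int  # shared identifier pointing into the core domain


@dataclass(frozen=True)
class Person:
    """Core domain context: no login details here."""
    person_id: int
    full_name: str
    vat_code: str


class PersonRepository:
    """In-memory stand-in for the core domain's repository."""

    def __init__(self, people: Iterable[Person]):
        self._by_id: Dict[int, Person] = {p.person_id: p for p in people}

    def get(self, person_id: int) -> Person:
        return self._by_id[person_id]


people = PersonRepository([Person(7, "Ada Lovelace", "VAT-0007")])
user = User(username="ada", person_id=7)

# Crossing from the security context to the core domain is one keyed lookup.
person = people.get(user.person_id)
```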

How to organize the DataClasses.dbml file

I would like to find out how people out there manage the dbml file in a more scalable manner?
Do you have just one DataClasses1.dbml and drag every table into it?
Do you have separate files for separate logical groupings, eg Accounts, HR? If so, how do you visually see the foreign key relationships when one table has links to a table in another dbml file?
Thanks.
It is better to use a single DBML file for all your tables, so that you can see all your relations (foreign keys and so on) together. But it depends entirely on your requirements.
Using Entity Framework (same for linq-to-sql) I like to use separate context classes for distinct parts of the database.
But what is "distinct"?
In most cases, everything related to the core business of an application is too interrelated for a separate context to be meaningful. But almost every application has lateral tasks like authorization, translation, auditing and so on. These are good candidates for separate contexts.
There will still be connections to the business logic though. As you probably know, you cannot join classes from separate contexts in a way that the join is translated to SQL. Only in memory. So it is useful to duplicate some entities in several contexts. So, for instance, both the business context and the authorization context will contain User entities. One context should be responsible for maintenance of the entity and the other one(s) should use it read-only.
Edit
By duplication of entities I mean that two (or more) contexts can have an entity that maps to the same table in the database. Like User. If you like, the business context could be for creating and updating users, the authorization context is (for instance) for adding roles to a specific user, without modifying the user itself.
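To illustrate two contexts mapping to the same table, here is a Python sketch using the standard `sqlite3` module in place of Entity Framework or LINQ to SQL. `BusinessContext`, `AuthorizationContext`, and the `users` and `user_roles` tables are hypothetical names chosen for this example.

```python
import sqlite3
from typing import List

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE user_roles (
        user_id INTEGER NOT NULL REFERENCES users(id),
        role TEXT NOT NULL
    );
    """
)


class BusinessContext:
    """Owns the users table: the only context that creates or updates users."""

    def __init__(self, connection: sqlite3.Connection):
        self.conn = connection

    def create_user(self, name: str) -> int:
        cur = self.conn.execute("INSERT INTO users (name) VALUES (?)", (name,))
        return cur.lastrowid

    def rename_user(self, user_id: int, name: str) -> None:
        self.conn.execute("UPDATE users SET name = ? WHERE id = ?", (name, user_id))


class AuthorizationContext:
    """Maps the same users table, but only reads it; it writes roles only."""

    def __init__(self, connection: sqlite3.Connection):
        self.conn = connection

    def user_name(self, user_id: int) -> str:
        row = self.conn.execute(
            "SELECT name FROM users WHERE id = ?", (user_id,)
        ).fetchone()
        return row[0]

    def add_role(self, user_id: int, role: str) -> None:
        self.conn.execute(
            "INSERT INTO user_roles (user_id, role) VALUES (?, ?)", (user_id, role)
        )

    def roles_of(self, user_id: int) -> List[str]:
        rows = self.conn.execute(
            "SELECT role FROM user_roles WHERE user_id = ? ORDER BY role", (user_id,)
        )
        return [r[0] for r in rows]


business = BusinessContext(conn)
auth = AuthorizationContext(conn)

uid = business.create_user("alice")
business.rename_user(uid, "Alice Smith")
auth.add_role(uid, "admin")
auth.add_role(uid, "editor")
seen_name = auth.user_name(uid)
seen_roles = auth.roles_of(uid)
```

The read-only rule is a convention in this sketch (the authorization class simply has no method that writes to `users`); nothing in the database enforces it.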

Handling passive deletion updates (ie. archiving instead of deleting)

We are developing an application based on DDD principles. We have encountered a couple of problems so far that we can't answer nor can we find the answers on the Internet.
Our application is intended to be a cloud application for multiple companies.
One of the demands is that there are no physical deletions from the database. We make only passive deletion by setting Active property of entities to false. That takes care of Select, Insert and Delete operations, but we don't know how to handle update operations.
An update means changing property values, but it also means that past values are deleted, and there are many reasons why we don't want that. One of the primary reasons is accounting.
If we implement every update as "archive old values" followed by "create new values", we would end up with a great number of duplicate values. For example, a Company has Branches, and Company is the Aggregate Root for Branches. If I change a Company's phone number, I would have to archive the old company and all of its branches and create a completely new company with branches, just for one property. This may seem like a good idea at first, but over time there will be many values which can clog up the database. Phone is maybe an irrelevant property, but changing the Address (if the street name has changed but the company is still in the same physical location) is a far more serious problem.
Currently we are using ASP.NET MVC with EF CF for repository, but one of the demands is that we are able to easily switch, or add, another technology like WPF or WCF. Currently we are using Automapper to map DTO's to Domain entities and vice versa and DTO's are primary source for views, ie. we have no view models. Application is layered according to DDD principle, and mapping occurs in Service Layer.
Another demand is that we mustn't create an initial entity in the database and then fill in the values; an entire aggregate should be stored as a whole.
Any comments or suggestions are appreciated.
We also welcome any changes to the demands (as this is an internal project, and not for a customer) and to the architecture, but only if absolutely necessary.
Thank you.
Have you ever come across event sourcing? Sounds like it could be of use if you're interested in tracking the complete history of aggregates.
To be honest, I would create another table to act as a change log, and insert the old or deleted record into it before updating the live data. Yes, you are creating a lot of records, but you are separating this data from the live records and keeping the live data as lean as possible.
Also, when it comes to cleanup and backup, you have your live data and your changed/deleted data as separate sets. You can routinely back up and trim the old changed/deleted data to reduce its size, depending on how long you have agreed with the supplier or business to keep that data available.
I think this would be the best way to go, as your core functionality will be working on a leaner dataset, and I'm assuming your users won't want to check revisions and deletions of records all the time. By separating the data, you access it only when it is needed, instead of all the time because everything is intermingled.
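A minimal Python sketch of this change-log approach, using the standard `sqlite3` module. The `company` and `company_history` tables and the `update_phone` function are hypothetical; a real schema would carry more columns and probably record who made the change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE company (id INTEGER PRIMARY KEY, phone TEXT NOT NULL);
    CREATE TABLE company_history (
        company_id INTEGER NOT NULL,
        phone TEXT NOT NULL,
        archived_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
    );
    INSERT INTO company (id, phone) VALUES (1, '555-0100');
    """
)


def update_phone(company_id: int, new_phone: str) -> None:
    """Copy the current row into the change log, then update the live row."""
    with conn:  # both statements commit together or roll back together
        conn.execute(
            "INSERT INTO company_history (company_id, phone) "
            "SELECT id, phone FROM company WHERE id = ?",
            (company_id,),
        )
        conn.execute(
            "UPDATE company SET phone = ? WHERE id = ?", (new_phone, company_id)
        )


update_phone(1, "555-0199")

live_phone = conn.execute("SELECT phone FROM company WHERE id = 1").fetchone()[0]
old_phones = [
    r[0]
    for r in conn.execute("SELECT phone FROM company_history WHERE company_id = 1")
]
```

Only the changed row is archived, so a phone number change does not force copies of the company's branches, and the live table keeps exactly one row per company.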
