File Import Validation in domain driven design - validation

I am developing an application and trying to use a DDD approach. I modeled all my entities and aggregates in the domain. Now I would like to keep my validation logic in the domain, but I need to support some bulk import functionality(via a csv file). I need to validate the file and give the user information about the rows that need corrections.
What would be the best DDD way to do this? The validation logic is quite complex and I wouldn't like to duplicate the csv file validation in the domain. Also the file structure is a more flatten structure which is different from the domain aggregate. So even if I validate the aggregate the row information is lost.

What would be the best DDD way to do this? The validation logic is quite complex and I wouldn't like to duplicate the csv file validation in the domain.
To accomplish something of this nature, you can use a Saga or Process Manager. It would maintain your position in the file and issue commands to your Aggregates to perform the file import. If the commands fail or are rejected then you can update the corresponding rows with the error messages that are produced by the domain.
Also the file structure is a more flatten structure which is different from the domain aggregate. So even if I validate the aggregate the row information is lost.
You could group the records in the file if necessary, but the saga should maintain a record of the line numbers and corresponding commands. You may choose to model the file itself as an Aggregate which will provide benefits as well. Eliding domain concepts such as this will make things more difficult in the future, but adding them to the design will come at the cost of time.
I can provide more detail if you can explain: Are you using DDD, CQRS, ES, queues, async, etc? As #Dnomyar explained, you might also help to explain where you are facing actual difficulties.

Related

Text search for microservice architectures

I am investigating into implementing text search on a microservice based system. We will have to search for data that span across more than one microservice.
E.g. say we have two services for managing Organisations and managing Contacts. We should be able to search for organisations by contact details in one search operation.
Our preferred search solution is Elasticsearch. We already have a working solution based on embedded objects (and/or parent-child) where when a parent domain is updated the indexing payload is enriched with the dependent object data, which is held in a cache (we avoid making calls to the service managing child directly for this purpose).
I am wondering if there is a better solution. Is there a microservice pattern applicable to such scenarios?
It's not particularly a microservice pattern I would suggest you, but it fits perfectly into microservices and it's called Event sourcing
Event sourcing describes an architectural pattern in which events are generated by different sources. An event will now trigger 0 or more so called Projections which then use the data contained in the event to aggregate information in the form it is needed.
This is directly applicable to your problem: Whenever the organisation service changes it's internal state (Added / removed / updated an organization) it can fire an event. If an organization is added, it will for example aggregate the contacts to this organization and store this aggregate. The search for it is now trivial: Lookup the organizations id in the aggregated information (this can be indexed) and get back the contacts associated with this organization. Of course the same works if contracts are added to the contract service: It just fires a message with the contract creation information and the corresponding projections now alter different aggregates that can again be indexed and searched quickly.
You can have multiple projections responding to a single event - which enables you to aggregate information in many different forms - exactly the way you'd like to query it later. Don't be afraid of duplicated data: event sourcing takes this trade-off intentionally and since this is not the data your business-services rely on and you do not need to alter it manually - this duplication will not hurt you.
If you store the events in the chronological order they happened (which I seriously advise you to do!) You can 'replay' these events over and over again. This helps for example if a projection was buggy and has to be fixed!
If your're interested I suggest you read up on event sourcing and look for some kind of event store:
event sourcing
event store
We use event sourcing to aggregate an array of different searches in our system and we aggregate millons of records every day into mongodb. All projections have their own collection create their own indexes and until now we never had to resort to different systems / patterns like elastic search or the likes!
Let me know if this helped!
Amendment
use the data contained in the event to aggregate information in the form it is needed
An event should contain all the information necessary to aggregate more information. For example if you have an organization creation event, you need to at least provide some information on what the organizations name is, an ID of some kind, creation date, parent organizations ID etc. As a rule of thumb, we send all the information we gather in the service that gets the request (don't take it directly form the request ;-) check it first, then write it to the event and send it off) because we do not know what we're gonna need in the future. Just stay cautious - payloads should not get too large!
We can now have multiple projections responding to this event: One that adds the organizations to it's parents aggregate (to get an easy lookup for all children of a given organization), one that just adds it to the search set of all organizations and maybe a third that aggregates all the parents of a given child organization so the lookup for the parent organizations is easy and fast.
We have the same service process these events that also process client requests. The motivation behind it is, that the schema of the data that your projections create is tightly coupled to the way it is read by the service that the client interacts with. This does not have to be that way and it could be separated into two services - but you create an almost invisible dependency there and releasing these two services independently becomes even more challenging. But if you do not mind that additional level of complexity - you can separate the two.
We're currently also considering writing a generic service for aggregating information from events for things like searches, where projections could be scripted. That only makes the invisible dependencies problem less conspicuous, it does not solve it.

Create subsets for certain Resources to better fit existing data model?

We are trying to implement a FHIR Rest Server for our application. In our current data model (and thus live data) several FHIR resources are represented by multiple tables, e.g. what would all be Observations are stored in tables for vital values, laboratory values and diagnosis. Each table has an independent, auto-incrementing primary ID, so there are entries with the same ID in different tables. But for GET or DELETE calls to the FHIR server a unique ID is needed. What would be the most sensible way to handle this?
Searching didn't reveal an inherent way of doing this, so I'm considering these two options:
Add a prefix to all (or just the problematic) table IDs, e.g lab-123 and vit-123
Add a UUID to every table and use that as the logical identifier
Both have drawbacks: an ID parser is necessary for the first one and the second requires multiple database calls to identify the correct record.
Is there a FHIR way that allows to split a resource into several sub-resources, even in the Rest URL? Ideally I'd get something like GET server:port/Observation/laboratory/123
Server systems will have all sorts of different divisions of data in terms of how data is stored internally. What FHIR does is provide an interface that tries to hide those variations. So Observation/laboratory/123 would be going against what we're trying to do - because every system would have different divisions and it would be very difficult to get interoperability happening.
Either of the options you've proposed could work. I have a slight leaning towards the first option because it doesn't involve changing your persistence layer and it's a relatively straight-forward transformation to convert between external/fhir and internal.
Is there a FHIR way that allows to split a resource into several
sub-resources, even in the Rest URL? Ideally I'd get something like
GET server:port/Observation/laboratory/123
What would this mean for search? So, what would /Obervation?code=xxx search through? Would that search labs, vitals etc combined, or would you just allow access on /Observation/laboratory?
If these are truly "silos", maybe you could use http://servername/lab/Observation (so swap the last two path parts), which suggests your server has multiple "endpoints" for the different observations. I think more clients will be able to handle that url than the url you suggested.
Best, still, I think is having one of your two other options, for which the first is indeed the easiest to implement.

couchdb validation based on content from existing documents

QUESTION
Is it possible to query other couchdb documents as part of a standard couchdb validation function ?
If not, what is the standard approach for including properties of other documents as part of a validation rule inside a couchdb validation function?
RATIONALE
Consider a run-of-the-mill address book application where the validation function is intended to prevent two or more entries having the same value for the 'e-mail' in one of the address book entry fields.
Consider also an address book application where it is possible to specify validation rules in separate documents, based on whether the postal code is a US-based postal code or something else.
No, it is not possible to query other couchdb documents in a validate_doc_update function. Each runs in isolation passing references only to: the new document, the old document, and user (where applicable).
My personal experience has been there are at least three options for dealing with duplicate checking:
Use Cloudant as your CouchDB provider. They offer a free tier for now if you'd like to experiment, but they guarantee consistency across nodes for a CouchDB database. (See #2)
I've used a secondary "reserve table" for names using the type-key as the ID. Then, you need to check for conflicts if not using a system like Cloudant. Basically, there's a simple document that maintains a key to prevent duplicates. It's not fun code to write given that you need to watch for conflicts. (Even with cloudant, you need to deal with failed requests to write, but it's easier than dealing with timing issues surrounding data replication across multiple nodes).
Use a traditional DB like MySQL for example that can maintain a unique and consistent index for specific data values like you're describing. Store the documents away in CouchDB though. While slightly annoying that you need different data providers, it's reliable.
(Optional: decide that CouchDB isn't a great fit for the type of system you're building)

A Spring DAO that can adapt to changes in the data

For application developers, I suppose the traditional paradigm for writing an application with domain objects that can be persisted to an underlying data store (SQL database for arguments sake), is to write the domain objects and then write (or generate) the table structure. There is a tight coupling between what the domain object looks like and what the structure of underlying data store looks like. So if you want to add a piece of information to your domain object, you add the field to your code and then add a column to the appropriate database table. All familiar?
This is all well and good for data stores that have a well defined structure (I'm mainly talking about SQL databases whereby the tables and columns are pre-defined and fixed), but now a number of alternatives to the ubiquitous SQL database exist and these often do not constrain the data in this way. For instance, MongoDB is a NoSQL database whereby you divide data into collections but aside from that there is no structuring of the data. You don't define new columns when you want to add a new field.
Now to the question: given the flexibility of a data store like MongoDB, how would one go about achieving a similar kind of flexibility in the domain objects that represent this data? So for instance if I'm using Spring and creating my own domain obejcts, when I add a "middleName" field to my data, how can I avoid having to add a "middleName" field to my domain object? I'm looking for some kind of mechanism/approach/framework to dynamically inspect the data and have access to it in my domain object without having to make a code change every time. All ideas welcome.
I think you have a couple of choices:
You can use a dynamic programming language and not have domain objects (clojure for example)
If you're fixed on using java, the mongo java driver returns data in DBObject which is essentially a Map. So the default behavior already provides what you want. It's only when you map the DBObject into domain objects, using a library like morphia (or spring-data), that you even have to worry about domain objects at all.
But, if I was using java, I would stick with the standard convention of domain objects mapped via morphia, because I think adding a field is a very minor inconvenience when compared against the benefits.
I think the question is inherintly paradoxical-
On one hand, you want to have domain objects, i.e. objects that represent the data (and behaviour) of your problem domain.
On the other hand, you say that you don't want your domain objects to be explicitly influenced by changes to the data.
But when you have objects that represent your problem domain, you want to do just that- to represent your problem domain.
So that if, for example, middle name is added, then your representation of the real-life 'User' entity should change to accomodate this change to the real-life user; perhaps not only by adding this piece of data to your object, but also adding some related behaviour (validation of middle name, or some functionality related to it).
In essense, what I'm trying to say here is that when you have (classic OO) domain objects, you may need to change your behaviour / functionality along with your data, and since you don't have any automatic way of changing your behaviour, the question of automatically changing your data becomes irrelevant.
If you don't want behaviour associated with your data, then you essentialy have DTOs, and #Kevin's answer is what you're looking for.
Honestly, it sounds more like you're looking for some kind of blackbox DTO where, like you describe, fields are added or removed "arbitrarily" depending on the data. This makes me inclined to suggest a simple Map to do the job. You can't really have a domain-driven design if your domain model is constantly changing.

UI-centric vs domain-centric data model - pros and cons

How closely does your data model map to your UI and domain model?
The data model can be quite close to the domain model if it has, for example, a Customer table, an Employee table etc.
The UI might not reflect the data model so closely though - for example, there may be multiple forms, all feeding in bits-and-pieces of Customer data along with other miscellaneous bits of data. In this case, one could you have separate tables to hold the data from each form. As required the data can then combined at a future point... Alternatively one could insert the form data directly into a Customer table, so that the data model does not correlate well to the UI.
What has proven to work better for you?
I find it cleaner to map your domain model to the real world problem you are trying to solve.
You can then create viewmodels which act as a bucket of all the data required by your view.
as stated, your UI can change frequently, but this does not usually change the particular domain problem you are tackling...
information can be found on this pattern here:
http://blogs.msdn.com/dphill/archive/2009/01/31/the-viewmodel-pattern.aspx
UI can change according to many needs, so it's generally better to keep data in a domain model, abstracted away from any one UI.
If I have a RESTful service layer, what they are exposing the domain model. In that case , the UI(any particular screen) calls a number of these services and from the domain models collected composes the screen. In this scenario although domain models bubble all the way up to UI the UI layer skims out the necessary data to build its particular screen. There are also some interesting questions on SO about on using domain model(annotated) for persistence.
My point here is the domain models can be a single source of truth. It can do the work of carrying data , encapsulating logic fairly well. I have worked on projects which had a lot of boilerplate code translating each domain model to DTO, VO , DO and what-have-yous. A lot of that looked quite unnecessary and more due to habit in most cases.

Resources