Related
I am new to domain driven design and there is one thing that is bothering me when I'm writing domain model. How to handle domain validation?
I am designing library management system where user can search through books and see if book is on the stock.
If it is not, user can create request for book so some kind of queue is created. Rule is that we don't have any book in stock. Right now I have information about quantity inside book entity and that is not a problem but what if i have different bounded context for requesting books and book catalog. Then I must somehow contact another vertical/service and ask (validate) that book quantity is zero before creating book aggregate.
Also I am checking if user have valid membership card, is book already borrowed by him, do user have active requests for any book.
Things that bother me.
I need to know what exactly to include in aggregate before passing it to domain model because of validations. I am not sure that is safest approach because my validations accuracy will depend from specification/query, etc.
Another very important thing. When application layer method start with execution and something is not valid client will get only validation messages for code that was executed and there is good chance there is more things that are preventing code for execution. This can be really inconvenient if user is filling some form.
First thoughts for solving this problem.
I have command/handler architecture and I am using MediatR so I am thinking to move domain validations between command and handler and that will solve my problems for now but that approach will spread domain knowledge across bounded context and domain model will not be smart enough to guard from not valid actions. More precise I will need to think before executing application method (handler) what I need to validate.
So I am really curious. Is there any clear way of handling domain validations inside domain model?
Is there any clear way of handling domain validations inside domain model?
Yes; they require work and careful thinking.
One aspect of careful thinking is to distinguish message validation from domain logic. Message validation is an isolated thing, a message is valid or not according to the schema of the message -- are all of the required fields present, is the data in the right form, are the numbers in the allowed range, and so on. Really, we're asking the question "did the client fill out the form correctly?"
Integrating a valid message with previously known information (aka, the "state" of the domain model) is a domain logic concern. State is chosen deliberately - the domain model is a state machine for the bookkeeping of your domain.
Depending on your domain, and the information that is available, there can be states that mean that the client doesn't get what they want. "The road less traveled" doesn't mean that things are invalid.
Furthermore, if your system is distributed (different pieces of data are the responsibility of different authorities), then any locally cached copies of that data are necessarily stale, and may be out of date. See Pat Helland's Memories, Guesses, and Apologies. That we will sometimes produce an incorrect answer is an inevitable consequence of distributing the work. If we're responsible, then we performed a cost benefit analysis to ensure that the expected benefits of distributing the work offset the expected risks.
So, I'm working on a CQRS/ES project in which we are having some doubts about how to handle trivial problems that would be easy to handle in other architectures
My scenario is the following:
I have a customer CRUD REST API and each customer has unique document(number), so when I'm registering a new customer I have to verify if there is another customer with that document to avoid duplicity, but when it comes to a CQRS/ES architecture where we have eventual consistency, I found out that this kind of validations can be very hard to address.
It is important to notice that my problem is not across microservices, but between the command application and the query application of the same microservice.
Also we are using eventstore.
My current solution:
So what I do today is, in my command application, before saving the CustomerCreated event, I ask the query application (using PostgreSQL) if there is a customer with that document, and if not, I allow the event to go on. But that doesn't guarantee 100%, right? Because my query can be desynchronized, so I cannot trust it 100%. That's when my second validation kicks in, when my query application is processing the events and saving them to my PostgreSQL, I check again if there is a customer with that document and if there is, I reject that event and emit a compensating event to undo/cancel/inactivate the customer with the duplicated document, therefore finishing that customer stream on eventstore.
Altough this works, there are 2 things that bother me here, the first thing is my command application relying on the query application, so if my query application is down, my command is affected (today I just return false on my validation if query is down but still...) and second thing is, should a query/read model really be able to emit events? And if so, what is the correct way of doing it? Should the command have some kind of API for that? Or should the query emit the event directly to eventstore using some common shared library? And if I have more than one view/read? Which one should I choose to handle this?
Really hope someone could shine a light into these questions and help me this these matters.
For reference, you may want to be reviewing what Greg Young has written about Set Validation.
I ask the query application (using PostgreSQL) if there is a customer with that document, and if not, I allow the event to go on. But that doesn't guarantee 100%, right?
That's exactly right - your read model is stale copy, and may not have all of the information collected by the write model.
That's when my second validation kicks in, when my query application is processing the events and saving them to my PostgreSQL, I check again if there is a customer with that document and if there is, I reject that event and emit a compensating event to undo/cancel/inactivate the customer with the duplicated document, therefore finishing that customer stream on eventstore.
This spelling doesn't quite match the usual designs. The more common implementation is that, if we detect a problem when reading data, we send a command message to the write model, telling it to straighten things out.
This is commonly referred to as a process manager, but you can think of it as the automation of a human supervisor of the system. Conceptually, a process manager is an event sourced collection of messages to be sent to the command model.
You might also want to consider whether you are modeling your domain correctly. If documents are supposed to be unique, then maybe the command model should be using the document number as a key in the book of record, rather than using the customer. Or perhaps the document id should be a function of the customer data, rather than being an arbitrary input.
as far as I know, eventstore doesn't have transactions across different streams
Right - one of the things you really need to be thinking about in general is where your stream boundaries lie. If set validation has significant business value, then you really need to be thinking about getting the entire set into a single stream (or by finding a way to constrain uniqueness without using a set).
How should I send a command message to the write model? via API? via a message broker like Kafka?
That's plumbing; it doesn't really matter how you do it, so long as you are sure that the command runs within its own transaction/unit of work.
So what I do today is, in my command application, before saving the CustomerCreated event, I ask the query application (using PostgreSQL) if there is a customer with that document, and if not, I allow the event to go on. But that doesn't guarantee 100%, right? Because my query can be desynchronized, so I cannot trust it 100%.
No, you cannot safely rely on the query side, which is eventually consistent, to prevent the system to step into an invalid state.
You have two options:
You permit the system to enter in a temporary, pending state and then, eventually, you will bring it into a valid permanent state; for this you could allow the command to pass, yield CustomerRegistered event and using a Saga/Process manager you verify against a uniquely-indexed-by-document-collection and issue a compensating command (not event!), i.e. UnregisterCustomer.
Instead of sending a command, you create&start a Saga/Process that preallocates the document in a uniquely-indexed-by-document-collection and if successfully then send the RegisterCustomer command. You can model the Saga as an entity.
So, in both solution you use a Saga/Process manager. In order for the system to be resilient you should make sure that RegisterCustomer command is idempotent (so you can resend it if the Saga fails/is restarted)
You've butted up against a fairly common problem. I think the other answer by VoicOfUnreason is worth reading. I just wanted to make you aware of a few more options.
A simple approach I have used in the past is to create a lookup table. Your command tries to register the key in a unique constraint table. If it can reserve the key the command can go ahead.
Depending on the nature of the data and the domain you could let this 'problem' occur and raise additional events to mark it. If it is something that's important to the business/the way the application works then you can deal with it either manually or at the time via compensating commands. if the latter then it would make sense to use a process manager.
In some (rare) cases where speed/capacity is less of an issue then you could consider old-fashioned locking and transactions. Admittedly these are much better suited to CRUD style implementations but they can be used in CQRS/ES.
I have more detail on this in my blog post: How to Handle Set Based Consistency Validation in CQRS
I hope you find it helpful.
How we can implement Two recaptcha user control on the same page.
Problem :
we have two views one for tell a friend about the site by sending e-mail and another authoring any note. the tell a friend part is hidden which send e-mail through ajax and note author part is visible so it is making problem when we need both but in different way.
Refactor to avoid usability issue
This doesn't seem reasonable and I would suggest to avoid this at all costs, because you have a serious usability issue if you do need two of them. Why would you need two captchas anyway? The main idea behind a captcha is that it assures that there was a person entering data in the form and not a computer.
So if there's one captcha on the page, you're assured. So if the first one was filled by a person, all other data is as well and you don't need a second one.
But I can see one scenario where two captchas could come into place. And that's when you'd have two <form> elements on the page. So a user can either submit one or the other. In this case user will always submit data from just one form and not both. So you could avoid this as well by either:
separating these two forms into two pages/views with an additional pre-condition page where a user would select one of the two forms
hiding captcha at first, but when a user starts entering data into one of the forms you could move the hidden DIV with captcha inside the form and display it. This way there would only be one captcha on the page and it would be on the form that the user is about to send
The second one is the one you'd want to avoid. If you give us more details what your business problem is, we could give you a much better answer.
Alternatives
Since you described your actual business problem I suggest you take a look at the honey pot trick, that is more frequently used for this kind of scenarios. Because if you used too many captchas on your site, people would get annoyed. They are tedious work, that's for sure. Honey pot trick may help you avoid these unnecessary data entering.
The other question is of course: Are your users logged in when they have these actions available? Especially the editing one. If they are, you can better mitigate this problem. You could set a time limit per user for sending out messages. Like few per minute. That's what a person would do. And of course store the information about sending out these emails, so you can still keep historical track of what users did so you can disable accounts of this gets abused. But when users are logged in they normally don't have to enter captchas since they've already identified themselves during authentication phase.
The ultimate question is of course: Why would a bot send out emails to friends? they wouldn't be able to send any kind of spam would they? What's the point then? It's more likely that bots will abuse your system if they can spam users anyhow. Either by sending email with content or leaving spam comments on your site. These forms need to be bot checked.
I'm struggling to apply RESTful principles to a new web application I'm working on. In particular, it's the idea that to be RESTful, each HTTP request should carry enough information by itself for its recipient to process it to be in complete harmony with the stateless nature of HTTP.
The application allows users to search for medications. The search accepts filters as input, for example, return discontinued medicines, include complimentary therapy etc..etc. In total there are around 30 filters that can be applied.
Additionally, patient details can be entered including the patients age, gender, current medications etc.
To be Restful, should all this information be included with every request? This seems to place a huge overhead on the network. Also, wouldn't the restrictions on URL length, at least for GET, make this unfeasible?
The "Filter As Resource" is a perfect tact for this.
You can PUT the filter definition to the filter resource, and it can return the filter ID.
PUT is idempotent, so even if the filter is already there, you just need to detect that you've seen the filter before, so you can return the proper ID for the filter.
Then, you can add a filter parameter to your other requests, and they can grab the filter to use for the queries.
GET /medications?filter=1234&page=4&pagesize=20
I would run the raw filters through some sort of canonicalization process, just to have a normalized set, so that, e.g. filter "firstname=Bob lastname=Eubanks" is identical to "lastname=Eubanks firstname=Bob". That's just me though.
The only real concern is that, as time goes on, you may need to obsolete some filters. You can simply error out the request should someone make a request with a missing or obsolete filter.
Edit answering question...
Let's start with the fundamentals.
Simply, you want to specify a filter for use in queries, but these filters are (potentially) involved and complicated. If it was simple /medications/1234, this wouldn't be a problem.
Effectively, you always need to send the filter to the query. The question is how to represent that filter.
The fundamental issue with things like sessions in REST systems is that they're typically managed "out of band". When you, say, go and create a medication, you PUT or POST to the medications resource, and you get a reference back to that medication.
With a session, you would (typically) get back a cookie, or perhaps some other token to represent that session. If your PUT to the medications resource created a session also, then, in truth, your request created two resources: a medication, and a session.
Unfortunately, when you use something like a cookie, and you require that cookie for your request, the resource name is no longer the true representation of the resource. Now it's the resource name (the URL), and the cookie.
So, if I do a GET on the resource named /medications/search, and the cookie represents a session, and that session happens to have a filter in it, you can see how in effect, that resource name, /medications/search, isn't really useful at all. I don't have all of the information I need to make effective use, because of the side effect of the cookie and the session and the filter therein.
Now, you could perhaps rewrite the name: /medications/search?session=ABC123, effectively embedding the cookie in the resource name.
But now you run in to the typical contract of sessions, notably that they're short lived. So, that named resource is less useful, long term, not useless, just less useful. Right now, this query gives me interesting data. Tomorrow? Probably not. I'll get some nasty error about the session being gone.
The other problem is that sessions typically are not managed as a resource. For example, they're usually a side effect, vs explicitly managed via GET/PUT/DELETE. Sessions are also the "garbage heap" of web app state. In this case, we're just kind of hoping that the session is properly populated with what is needed for this request. We actually don't really know. Again, it's a side effect.
Now, let's turn it on its head a little bit. Let's use /medications/search?filter=ABC123.
Obviously, casually, this looks identical. We just changed the name from 'session' to 'filter'. But, as discussed, Filters, in this case, ARE a "first class resource". They need to be created, managed, etc. the same as a medication, a JPEG, or any other resource in your system. This is the key distinction.
Certainly, you could treat "sessions" as a first class resource, creating them, putting stuff in them directly, etc. But you can see how, at least from a clarity point of view, a "first class" session isn't really a good abstraction for this case. Using a session, its like going to the cleaners and handing over your entire purse or briefcase. "Yea, the ticket is in there somewhere, dig out what you want, give me my clothes", especially compared to something explicit like a filter.
So, you can see how, at 30,000 feet, there's not a lot of difference in the case between a filter and a session. But when you zoom in, they're quite different.
With the filter resource, you can choose to make them a persistent thing forever and ever. You can expire them, you can do whatever you want. Sessions tend to have pre-conceived semantics: short live, duration of the connection, etc. Filters can have any semantics you want. They're completely separate from what comes with a session.
If I were doing this, how would I work with filters?
I would assume that I really don't care about the content of a filter. Specifically, I doubt I would ever query for "all filters that search by first name". At this juncture, it seems like uninteresting information, so I won't design around it.
Next, I would normalize the filters, like I mentioned above. Make sure that equivalent filters truly are equivalent. You can do this by sorting the expressions, ensuring fieldnames are all uppercase, or whatever.
Then, I would store the filter as an XML or JSON document, whichever is more comfortable/appropriate for the application. I would give each filter a unique key (naturally), but I would also store a hash for the actual document with the filter.
I would do this to be able to quickly find if the filter is already stored. Since I'm normalizing it, I "know" that the XML (say) for logically equivalent filters would be identical. So, when someone goes to PUT, or insert a new filter, I would do a check on the hash to see if it has been stored before. I may well get back more than one (hashes can collide, of course), so I'll need to check the actual XML payloads to see whether they match.
If the filters match, I return a reference to the existing filter. If not, I'd create a new one and return that.
I also would not allow a filter UPDATE/POST. Since I'm handing out references to these filters, I would make them immutable so the references can remain valid. If I wanted a filter by "role", say, the "get all expire medications filter", then I would create a "named filter" resource that associates a name with a filter instance, so that the actual filter data can change but the name remain the same.
Mind, also, that during creation, you're in a race condition (two requests trying to make the same filter), so you would have to account for that. If your system has a high filter volume, this could be a potential bottleneck.
Hope this clarifies the issue for you.
To be Restful, should all this information be included with every request?
No. If it looks like your server is sending (or receiving) too much information, chances are that there are one or more resources you haven't yet identified.
The first and most important step in designing a RESTful system is to identify and name your resources. How would you do that for your system?
From your description, here's one possible set of resources:
User - a user of the system (maybe a doctor or patient (?) - Role might need to be exposed as a resource here)
Medication - the stuff in the bottle, but it also might represent the kind of bottle (quantity and contents), or it might represent a particular bottle - depending on if you're a pharmacy or just a help desk.
Disease - the condition for which a Patient might want to take a Medication.
Patient - a person who might take a Medication
Recommendation - a Medication that might be beneficial to a Patient based on a Disease they suffer from.
Then you could look for relationships among resources;
User has and belongs to many Roles
Medication has and belongs to many Diseases
Disease has many Recommendations.
Patient has and belongs to many Medications and Diseases (poor chap)
Patient has many Recommendations
Recommendation has one Patient and has one Disease
The specifics are probably not right for your particular problem, but the idea is simple: create a network of relationships among your resources.
At this point it might be helpful to think about URI structure, although keep in mind that REST APIs must be hypertext-driven:
# view all Recommendations for the patient
GET http://server.com/patients/{patient}/recommendations
# view all Recommendations for a Medication
GET http://servier.com/medications/{medication}/recommendations
# add a new Recommendation for a Patient
PUT http://server.com/patients/{patient}/recommendations
Because this is REST, you'll spend most of your time defining the media types used to transfer representations of your resources between client and server.
By exposing more resources, you can cut down on the amount of data that needs to be transferred during each request. Also notice there are no query parameters in the URIs. The server can be as stateful as it needs to be to keep track of it all, and each request can be fully self-contained.
REST is for APIs, not (typical) applications. Don't try to wedge a fundamentally stateful interaction into a stateless model just because you read about it on wikipedia.
To be Restful, should all this information be included with every request? This seems to place a huge overhead on the network. Also, wouldn't the restrictions on URL length, at least for GET, make this unfeasible?
The size of parameters is usually insignificant compared to the size of resources the server sends. If you're using such large parameters that they are a network burden, place them on the server once and then use them as resources.
There are no significant restrictions on URL length -- if your server has such a limit, upgrade it. It's probably years old and chock-full of security vulnerabilities anyway.
No all of that does not have to be in every request.
Each resource (medication, patient history, etc) should have a canonical URI that uniquely identifies it. In some applications (eg, Rails-based ones) this will be something like "/patients/1234" or "/drugs/5678" but the URL format is unimportant.
A client that has previously obtained the URI for a resource (such as from a search, or from a link embedded in another resource) can retrieve it using this URI.
Are you working on a RESTful API that other apps will use to search your data? Or are you building a end-user focused web application where users will log in and perform these searches?
If your users are logging in, then you're already stateful as you'll have some type of session cookie to maintain the logged in state. I would go ahead and create a session object that contains all the search filters. If a user hasn't set any filters, then this object will be empty.
Here's a great blog post about using GET vs POST. It mentions a URL length limit set by Internet Explorer of 2,048 characters, so you want to use POST for long requests.
http://carsonified.com/blog/dev/the-definitive-guide-to-get-vs-post/
Hopefully you'll see the problem I'm describing in the scenario below. If it's not clear, please let me know.
You've got an application that's broken into three layers,
front end UI layer, could be asp.net webform, or window (used for editing Person data)
middle tier business service layer, compiled into a dll (PersonServices)
data access layer, compiled into a dll (PersonRepository)
In my front end, I want to create a new Person object, set some properties, such as FirstName, LastName according to what has been entered in the UI by a user, and call PersonServices.AddPerson, passing the newly created Person. (AddPerson doesn't have to be static, this is just for simplicity, in any case the AddPerson will eventually call the Repository's AddPerson, which will then persist the data.)
Now the part I'd like to hear your opinion on is validation. Somewhere along the line, that newly created Person needs to be validated. You can do it on the client side, which would be simple, but what if I wanted to validate the Person in my PersonServices.AddPerson method. This would ensure any person I want to save would be validated and removes any dependancy on the UI layer doing the work. Or maybe, validate both in UI and in by business server layer. Sounds good so far right?
So, for simplicity, I'll update the PersonService.AddPerson method to perform the following validation checks
- Check if FirstName and LastName are not empty
- Ensure this new Person doesn't already exist in my repository
And this method will return True if all validation passes and the Person is persisted, False if Validation fails or if the Person is not persisted.
But this Boolean value that AddPerson returns isn't enough for me at the UI layer to give the user a clear reason why the save process failed. So what's a lonely developer to do? Ultimately, I'd like the AddPerson method to be able to ensure what its about to save is valid, and if not, be able to communicate the reasons why it's not invalid to my UI layer.
Just to get your juices flowing, some ways of solving this could be: (Some of these solutions, in my opinion, suck, but I'm just putting them there so you get an understanding of what I'm trying to solve)
Instead of AddPerson returning a boolean, it can return an int (i.e. 0 = Success, Non Zero equals failure and the number indicates the reason why it failed.
In AddPerson, throw custom exceptions when validation fails. Each type of custom exception would have its own error message. In addition, each custom exception would be unique enough to catch in the UI layer
Have AddPerson return some sort of custom class that would have properties indicating whether validation passed or failed, and if it did fail, what were the reasons
Not sure if this can be done in VB or C#, but attach some sort of property to the Person and its underlying properties. This "attached" property could contain things like validation info
Insert your idea or pattern here
And maybe another here
Apologies for the long winded question, but I definately like to hear your opinion on this.
Thanks!
Multiple layers of validation go well with multi-layer apps.
The UI itself can do the simplest and quickest checks (are all mandatory fields present, are they using the appropriate character sets, etc) to give immediate feedback when the user makes a typo.
However the business logic should have the lion's share of validation responsibilities... and for once it's not a problem if this is "repetitious", i.e., if the business layer re-checks something that should already have been checked in the UI -- the BL should check all the business rules (this double checks on UI's correctness, enables multiple different UI clients that may not all be perfect in their checks -- e.g. a special client on a smart phone which may not have good javascript, and so on -- and, a bit, wards against maliciously hacked clients).
When the business logic saves the "validated" data to the DB, that layer should perform its own checks -- DBs are good at that, and, again, don't worry about some repetition -- it's the DB's job to enforce data integrity (you might want different ways to feed data to it one day, e.g. a "bulk loader" to import a number of Persons from another source, and it's key to ensure that all those ways to load data always respect data integrity rules); some rules such as uniqueness and referential integrity are really best enforced in the DB, in particular, for performance reasons too.
When the DB returns an error message (data not inserted as constraint X would be violated) to the business layer, the latter's job is to reinterpret that error in business terms and feed the results to the UI to inform the user; and of course the BL must similarly provide clear and complete info on business rules violation to the UI, again for display to the user.
A "custom object" is thus clearly "the only way to go" (in some scenarios I'd just make that a JSON object, for example). Keeping the Person object around (to maintain its "validation problems" property) when the DB refused to persist it does not look like a sharp and simple technique, so I don't think much of that option; but if you need it (e.g. to enable "tell me again what was wrong" functionality, maybe if the client went away before the response was ready and needs to smoothly restart later; or, a list of such objects for later auditing, &c), then the "custom validation-failure object" could also be appended to that list... but that's a "secondary issue", the main thing is for the BL to respond to the UI with such an object (which could also be used to provide useful non-error info if the insertion did in fact succeed).
Just a quick (and hopefully helpful) comment: when you're wondering where to place validation, try pretending that, soon, you're going to completely recreate your UI layer using a technology you're not yet so familiar with**. Try to keep out of that layer any validation-like business logic that you know for certain you'd have to rewrite in the new technology.
You'll find exceptions - business logic that ends up in your UI layer regardless, but it's a useful consideration nonetheless.
** Mobile dev, Silverlight, Voice XML, whatever - pretending you don't know the technology of your "new" UI layer helps you abstract your concerns and get less mired in implementation details.
The only important points are:
From the perspective of the front-end(s), the Middle Tier must perform all validation, you never know whether someone is going to try circumventing your front-end validation by talking directly to your Middle Tier (for whatever reason)
The Middle Tier may elect to delegate some of that validation to the DB layer (e.g. data integrity constraints)
You may optionally duplicate some validation in the UI, but that should only be for the sake of performance (to avoid round-trips to the Middle Tier for common scenarios, such as missing mandatory fields, incorrectly formatted data, etc.) These checks should never take the place of doing them in the Middle Tier
Validation should be done at all three levels.
When I am in a project I assume I am making a framework, which most of the time is not the case. Each layer is separate and must check all layers input before doing an operation
Each level can have a different way of doing it, it is not necessary they all use the same, but ideally they should all use the same validation with the ability to customize it.
You never want to let bad data into the database. So you can never trust the data you are getting from the business layer. It needs to be checked.
In the business layer you can never trust the UI layer, and you must check it to prevent un-needed calls to the database layer. The UI layer works the same way.
I disagree with David Basarab's comment that the same validations should be present in all layers. This defies the paradigm of responsibility of layers for one reason. Secondly, though the main intention is to make the layers (or components) loosely coupled, it is also important that a level of responsibility (and hence trust) is endowed on the layers. Though it might be necessary to duplicated some validations in UI and Business Layer (since UI layer can be bypassed by hacking attempts), however, it is not advisable to repeat the validations in each layer. Each layer should perform only those validations which they are responsible for. The biggest flaw in repeting validations in all layers is code redundancy, which can cause maintenance nightmare.
A lot of this is more style than substance. I personally favor returning status objects as a flexible and extensible solution. I would say that I think there are a couple classes of validation in play, the first being "does this person data conform to the contract of what a person is?" and the second being "does this person data violate constraints in the database?" I think the first validation can, and should be done at the client. The second should be done at the middle tier. With this division, you may find that the only reasons the save could fail are 1)violates a uniqueness constrains, or 2)something catastrophic. You could then return false for the first case, and throw an exception for the other.
If tier R is closer to the user (or any input stream you don't control) than tier S then tier S should validate all data received from tier R. This does not mean that tier R shouldn't validate data. It's better for the user if the GUI warns him he's making a mistake before he attempts a new transaction. But no matter how bulletproof the validation in your GUI is, the next tier up should not trust that any validation has taken place.
This assumes your database in completely under your control. If not, you have bigger problems.
Also, you could have the UI pass the data needed to build a Person object through some sort of PersonBuilder object, so that object creation is consolidated in the domain/business layer, and you can keep the Person object in a state that is always consistent. This makes more sense for more complex entities, however even for simple ones, it is good to centralize object creation, just like you centralize persistence, etc.