Modelling calculated fields in REST API - performance

It is common practice for RESTful resources to support field selectors in the query string. For example, if a resource has fields A, B, C and D but the client is interested only in a subset of them (say A and B), then the URL might look like
.../resource/1/?fields=A,B // only A and B are 'selected'
Now suppose we add another property to the resource. This property has no physical storage: it is a computed value. Also suppose that the computation is very expensive.
Now obviously, REST does not care about such things, whether data comes from a file, a DB or a fancy algorithm.
But here comes a dilemma: the 'fields' query parameter is always optional. In my case, omitting 'fields' means "bring all the fields" (much like '*' in SQL):
.../resource/1 // A,B,C,D and E(xpensive) are 'selected'
I am positive that there are many existing clients that use the naive approach (not bothering to specify an explicit list of fields). This means that adding this new heavy property will unintentionally cause a performance regression (possibly a very severe one).
What are the common techniques to cope with these situations?
Alternatives I considered:
1. Add a special notion to the system saying that querying with '*' semantics will not necessarily return ALL the fields (heavy fields are omitted by default). If a client wants them, they must ask for them explicitly.
2. Do not model these extra properties as fields on the resource. Instead, expose a dedicated endpoint that carries out the computation, thus eliminating possible confusion but introducing a REST-RPC style into the system.
3. Make it the client's problem: if they did not bother to be explicit in the first place, tough for them. That is not really an option, as I don't have this privilege.
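The first alternative could be sketched roughly as follows. This is only an illustration under assumptions: the field names, the in-memory `STORED` dictionary and the `compute_e` function are hypothetical stand-ins for the real resource and its expensive computation.

```python
from typing import Callable, Dict, Optional

# Hypothetical resource: A-D are stored values, E is computed on demand.
STORED: Dict[str, int] = {"A": 1, "B": 2, "C": 3, "D": 4}
heavy_calls = {"E": 0}  # counts how often the expensive path runs


def compute_e() -> int:
    """Stand-in for the expensive computation."""
    heavy_calls["E"] += 1
    return sum(STORED.values())


# Heavy fields are registered separately so the default selection can skip them.
HEAVY_FIELDS: Dict[str, Callable[[], int]] = {"E": compute_e}


def select_fields(fields_param: Optional[str]) -> Dict[str, int]:
    """Resolve the 'fields' query parameter into a response body.

    Omitting the parameter returns every cheap field; heavy fields are
    only computed when the client names them explicitly.
    """
    if fields_param:
        wanted = [f.strip() for f in fields_param.split(",") if f.strip()]
    else:
        wanted = list(STORED)

    result: Dict[str, int] = {}
    for name in wanted:
        if name in STORED:
            result[name] = STORED[name]
        elif name in HEAVY_FIELDS:
            result[name] = HEAVY_FIELDS[name]()
        else:
            raise ValueError(f"unknown field: {name}")
    return result
```

With this shape, `select_fields(None)` never touches `compute_e`, so existing clients that omit `fields` keep their current cost, while `select_fields("A,E")` pays for E only because it was requested.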

I like option 1 best. I'd consider representing these fields as referenced objects, which clients could then reach HATEOAS style, through separate calls for the heavy fields. This is similar to how web pages behave -- return the framework and some content, then force extra calls if the user wants the images, videos, etc.
Take a look at this: spring.io/understanding/HATEOAS, this: timelessrepo.com/haters-gonna-hateoas, and this: stackoverflow.com/questions/tagged/hateoas


DDD - Life cycle of Value Objects: Validation & Persistence

I understand what a VO is (immutable, no identity, ...), but I have several questions that came out of discussions with my co-workers.
A) Validation - How precise should it be?
What types of validation should I put into a VO? Only basic ones, or all of them? A good example of a VO is Email with some regexp validation; I've seen it many times. I've worked on several big and medium-size applications where a regexp wasn't good enough because:
system-A: The domain name of the email was validated, e.g. test@gmali.com is an invalid email because the domain gmali.com doesn't exist
system-B: We had a list (service) of banned domains of "temporary email services" because we wanted to avoid "fake accounts"
I cannot imagine putting validation of this kind into a VO, because it requires network communication and the VO would become complicated (and slow).
B) Validation: Names, Titles, Strings... is length part of VO?
A programmer can use the good old string data type. I can imagine a VO such as NotEmptyString, but is it a good approach to create value objects such as:
FirstName (non-empty string with length limitation)
Surname (non-empty string with length limitation)
StreetName(non-empty string with length limitation)
There is no difference between FirstName and Surname, because the application cannot find out whether someone swapped first name and surname in a form. Robert can be a first name and it can also be a surname...
class Person
{
    private string $firstName; // no VO for firstName
    // or
    private FirstName $firstName; // VO just for firstName & length validation
    // or
    private NotEmptyString $firstName; // VO - no max length validation
    // or
    private StringLength50 $firstName; // same as FirstName, different name for sharing
}
Which approach is the best and why?
C) Copy of VO: Providing "Type-Safety" for entity arguments?
This point is similar to the previous one.
Is it good practice to create classes like this:
class Surname extends Name
{
}
class FirstName extends Name
{
}
just "to alias" VO?
D) Persistence: Reading stored VO
This point is closely related to the first one: A) Validation - How precise should it be?. I strongly believe that what is stored in my "storage engine" (DB) is valid - no questions. I don't see any reason why I should validate a VO again when everything was validated during the "persistence step". Even a complex or poorly written regexp could be a performance killer, e.g. when listing several hundred user emails.
I'm lost here... should I validate only the basic stuff and use the same VO during persist and read, or should I have 2 separate VOs for these cases?
E) Persistence/Admin: Something like "god" in the system.
From my experience: in a real-world system, a user with higher privileges can sometimes bypass validation rules, and this is again related to point A). Example:
you (as a regular user of the system) can make an online reservation up to 30 days from today
an admin user can make an online reservation for any date
Should I use only a Date / FutureDate VO, or what?
F) Persistence: Mapping to DB data-types
Is it good practice to closely bind VO and DB (storage engine) data types?
If FirstName can have only 50 chars, should it be defined / mapped to VARCHAR(50)?
Thanks.
A) Validation - How precise should it be?
It's not about precision, it's about invariants & responsibility. A value object (VO) can't possibly have authority on whether or not an email address exists. That's a fact that varies and can't be controlled by the VO. Even if you had code such as the following:
var emailAddress = EmailAddress.of('some@email.com', emailValidityChecker);
The address may not exist a few minutes later, the user may have lost his account password forever, etc.
So what should EmailAddress represent? It should ensure the "format" of the address makes it a usable & useful address in your domain.
For instance, in a system responsible for delivering tax reminders, I had a limitation where I had to use Exchange and it couldn't support certain email formats like addresses with "leading, trailing or consecutive dots in the local-part" (quoting the exact comment I had written).
Even though that's a technical concern in theory, it meant our system couldn't ingest such email addresses and they were completely useless to us, so the ValidEmailAddress VO rejected them in order to fail early (otherwise they generated false positives down the chain).
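A minimal sketch of such a format-only VO, written in Python for brevity. The allowed character set and the "domain must contain a dot" check are assumptions made for illustration; only the rule about leading, trailing or consecutive dots in the local-part comes from the description above.

```python
import re
from dataclasses import dataclass

# Assumed character set for the local-part; real policies vary by domain.
_LOCAL_PART = re.compile(r"[A-Za-z0-9._%+-]+")


@dataclass(frozen=True)
class ValidEmailAddress:
    """Immutable VO that only guarantees a format usable by the system."""

    value: str

    def __post_init__(self) -> None:
        local, sep, domain = self.value.rpartition("@")
        if not sep or not local or not domain:
            raise ValueError("expected local-part@domain")
        if not _LOCAL_PART.fullmatch(local):
            raise ValueError("unsupported characters in local-part")
        # The rule quoted above: no leading, trailing or consecutive dots.
        if local.startswith(".") or local.endswith(".") or ".." in local:
            raise ValueError("leading, trailing or consecutive dots in local-part")
        if "." not in domain:
            raise ValueError("domain must contain a dot")
```

Nothing here talks to the network: whether the mailbox exists is a fact that changes over time and is left to the surrounding flow.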
B) Validation: Names, Titles, Strings... is length part of VO?
I would, even though such lengths might sometimes feel somewhat arbitrary or infrastructure-driven. However, I think it's safe to say that a name with 500 characters is certainly a mistake. Furthermore, validating with reasonable ranges can protect against attacks (e.g. a 1GB name???). Some may argue that it's purely an infrastructure concern and would put the validation at another layer, but I disagree and I think the distinction is unhelpful.
The length rules aren't always arbitrary, for instance a TweetMessage that can't be longer than 280 chars, that's a domain rule.
Does that mean you must have a VO for every possible string in the system? Honestly, I have pushed back in the past, scared of overusing VOs and edging towards VO obsession rather than primitive obsession, but in almost every scenario I wished I had just taken the time to wrap that damn string.
Be pragmatic, but I see more harm in underusing than overusing VOs.
C) Copy of VO: Providing "Type-Safety" for entity arguments?
I most likely wouldn't extend Name just for the sake of reuse here. There's most likely no place where you'd want to interchange a Surname with a FirstName, so polymorphism is pretty useless too. However, the explicit types may help to avoid accidentally interchanging "surname" and "first name".
Independently of whether or not the explicit types are useful, something more useful here might be to aggregate both under a FullName VO, which increases cohesion.
Please beware that overly restrictive name policies have been a huge pain point for many international systems though...
D) Persistence: Reading stored VO
Persisted data lives on the "safe" side and should NOT be validated again when loaded into memory. You should be able to circumvent the validation path when hydrating your VOs.
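One way to circumvent the validation path, sketched in Python. The `from_storage` name and the 50-character limit are assumptions for illustration; the idea is simply a second construction path that trusts the stored value.

```python
class FirstName:
    """VO with a validating constructor and a trusted hydration path."""

    MAX_LENGTH = 50  # assumed limit, taken from the question's example
    __slots__ = ("_value",)

    def __init__(self, value: str) -> None:
        # Path used for new input: every rule is enforced.
        value = value.strip()
        if not value or len(value) > self.MAX_LENGTH:
            raise ValueError("first name must be 1 to 50 characters long")
        self._value = value

    @classmethod
    def from_storage(cls, value: str) -> "FirstName":
        # Path used by the repository: the stored value is trusted,
        # so __init__ (and its validation) is deliberately skipped.
        instance = cls.__new__(cls)
        instance._value = value
        return instance

    @property
    def value(self) -> str:
        return self._value
```

Only the persistence layer should call `from_storage`; everything that handles user input goes through the constructor.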
E) Persistence/Admin: Something like "god" in the system.
VOs are great to enforce their "invariants". An invariant by definition doesn't vary given the context. That's actually something many people misunderstand when they say the "always-valid" approach doesn't work.
That said, even system admins most likely can't make new reservations in the past, so perhaps that can be an invariant of a ReservationDate. Ultimately you would most likely extract the other rules in the context to which they belong.
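To illustrate that split, here is a rough Python sketch. It assumes "not in the past" is the invariant of ReservationDate and treats the 30-day window as a rule of the booking context; `check_booking_window` and the explicit `today` parameter are hypothetical choices made to keep the example testable.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class ReservationDate:
    value: date

    @classmethod
    def create(cls, value: date, today: date) -> "ReservationDate":
        # Invariant: holds for every user, admin or not.
        if value < today:
            raise ValueError("a reservation cannot be made in the past")
        return cls(value)


def check_booking_window(
    reservation: ReservationDate,
    today: date,
    is_admin: bool,
    max_days_ahead: int = 30,
) -> None:
    """Contextual rule: lives in the booking use case, not in the VO."""
    if is_admin:
        return
    if reservation.value > today + timedelta(days=max_days_ahead):
        raise PermissionError("regular users can book at most 30 days ahead")
```

The VO stays always-valid for everyone, while the privilege-dependent rule can change without touching it.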
F) Persistence: Mapping to DB data-types
I think it's more important to reflect the DB limitation in the domain than, inversely, to reflect the domain limitation in the DB. If your DB only accepts 50 chars and you exceed that, some systems will just crash with a very cryptic error message that doesn't even tell you which column overflowed. Validating in the domain makes debugging much quicker. However, you do not necessarily have to match the domain rule in the DB.
DDD, like any other design, is a matter of drawing lines and making abstract rules. Some rules may be very strict, while others may be fluid to some extent. The important thing is to keep consistency as much as possible, rather than striving to build the ultimate-undefeatable domain.
Validation - How precise should it be?
"Heavy" validations should not occur inside a VO. A VO is not very different in nature from the primitive it encapsulates, therefore its validations should be independent of external factors. Recall that even primitives such as byte are validated internally: an exception (sometimes even a compile error) occurs when a byte variable is assigned a value greater than 255.
Advanced validations, on the other hand, belong to the flow part (use-case / interactor / command-handler), since they involve operations beyond the scope of the VO's primitive, such as querying databases or invoking APIs. You can, for example, query a list of valid email providers from the database, check whether the VO's provider is contained in that list, and throw an exception if not. This is simply flow.
You may, of course, decide to have an in-memory static list of email providers, in which case it is perfectly valid to keep it inside the VO and check its primitive against that list. There is no need to communicate with the external world; everything is "local". Would it be scalable? Probably not. But it follows the DDD rule stating that a VO should not "speak" with external resources.
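A small Python sketch of the flow-level check described above. It uses the question's banned-domain variant rather than a list of valid providers, and the names `register_user`, `load_banned_domains` and `BannedEmailDomain` are made up for the example; the loader stands in for whatever database or service call the real system would make.

```python
from dataclasses import dataclass
from typing import Callable, Set


@dataclass(frozen=True)
class Email:
    """VO: checks only what it can know locally."""

    value: str

    def __post_init__(self) -> None:
        if self.value.count("@") != 1:
            raise ValueError("expected exactly one '@'")

    @property
    def domain(self) -> str:
        return self.value.split("@")[1].lower()


class BannedEmailDomain(Exception):
    pass


def register_user(email: Email, load_banned_domains: Callable[[], Set[str]]) -> str:
    """Use case: the 'heavy' validation happens here, in the flow."""
    if email.domain in load_banned_domains():
        raise BannedEmailDomain(email.domain)
    return f"registered {email.value}"
```

Because the loader is passed in, the use case can be tested with a plain lambda, and the VO itself never needs network or database access.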
Validation: Names, Titles, Strings... is length part of VO?
VOs, much like other DDD concepts, should "speak out loud" your business domain, meaning that they should express business semantics. This is why FirstName, Surname and StreetName are good names, while NotEmptyString is less preferable due to the fact it communicates technical rather than business details.
If, for example, your business states that customers with a more-than-50-characters-length name are to be addressed differently than customers with a less-than-50-characters-length name, then you probably should have two VOs, e.g. LongFirstName, ShortFirstName.
True, several VOs may require exactly the same validations, e.g. both StreetName and CityName must start with a capital and length cannot exceed 100. Does this mean we have to make great effort to avoid duplications in the name of "reusability"? I would say no, especially if avoiding duplications means having a single VO named CapitalHeadStringUpTo100Characters. Such name conveys no business purpose. Moreover, if CityName suddenly requires additional validations, breaking CapitalHeadStringUpTo100Characters into two VOs may require much work.
Copy of VO: Providing "Type-Safety" for entity arguments?
Inheritance is a tool provided by development platform, it is more than OK to use it, but only to the point where things get messy or too abstract. Remember, VO only expresses a specific domain-approach principle. The polymorphism OOP principle, on the other hand, which of course may be applied in DDD applications, is tightly coupled with abstraction concepts (i.e. base classes), and I would say it should fit better to the entities model part.
BTW, you can find several implementations of a base VO class on the web.
Persistence: Reading stored VO
System design would matter little if the same validations had to be repeated over and over at different points of a single use case. Unless you have a reason to believe that your database can be altered by external components, it is sufficient to reconstitute an entity from the database without re-validating. Also keep in mind that a typical entity may embed at least one VO, and it is the same VO that is used both in the "persistence step" (when the entity is being constructed) and in the "reading step" (when it is being reconstituted).
Persistence/Admin: Something like "god" in the system.
Multitenant applications can be aware of multiple kinds of users. Software does not care whether one user is more powerful than another; it is only subject to rules. Whether you choose to have a single general entity User with a VO FutureDate that is allowed to be null, or two entities User and Admin with VOs FutureDate (not null) and FutureDate (nullable) respectively, is of less interest here. The important thing is that multitenancy can be achieved through smart usage of dependency injection: the system identifies the user's privileges and infers which factories, services or validations are to be injected.
Persistence: Mapping to DB data-types
It really depends on level of maturity in the DDD field. Applications will always have bugs, and you should have some clue on your business's bug-tolerance level in case you choose to design a lenient database.
Aside from that, keep in mind that no matter how much effort you put into it, your database can probably never reflect the full set of business invariants: limiting a single VO to some length is easy, but setting rules involving multiple VOs (that is, when one VO's validity depends on another VO) is less convenient.

Changing hyperledger-composer resource definition

So as a project matures it will almost certainly be necessary to modify attributes of the resource definitions to cope with additional requirements.
Let's use two trivial examples - to add a country code to a client address, or to remove a middle initial and swap in a middle name field instead.
Currently if the resource definition changes, composer won't read whatever values are extant in the repository. I didn't exhaustively try all combos, but have had to reconstitute my blockchain at least twice because of this problem.
Is there a way to mark fields either as "new" or "deprecated" to get past this that I overlooked? It will be hard to make a case to move a system that can't be changed forward to production.
In the same vein it doesn't seem to like empty or null strings much (at least for participant attributes). Having an "optional" override somewhere would save a lot of extra bounds checking in my application. Is there one of those I missed too?
So you can use the APIs or REST to expose the legacy data? You may be referring to Playground above (it's not really a tool for looking at production data; it's for model prototyping/sandbox/testing type stuff).
On the optional question: you can just mark the field as optional in the model. Example here -> https://github.com/hyperledger/composer-sample-networks/blob/master/packages/pii-network/models/pii.cto#L20

Trying to identify if a data injection method has a name already

Let's say we have a class "Car" that has different pieces of data (maker, model, color, fabrication date, registration date, etc.). The class has no method to fetch the data itself, but it knows to ask for it from another object (passed via the constructor; let's call it DS for short), and the same applies when it needs to save changes.
A method getColor() would be implemented like this
if (!this->loaded('color')) {
    this->askDS('color'); // this will do the necessary work to generate a request to DS
}
return this->information('color');
Nothing too fancy so far. Now comes the part where I want to find out whether it has a name, or whether there are libraries / frameworks that do this already.
DS has a list of methods registered dynamically based on the class that needs data. For car we have:
1. input: car serial number, output: method to use to read the numbers to extract raw values
2. input: car raw color value, output: color code
3. input: car color code, manufacturer, year, model, output: human-readable color (for example navy blue)
Now, neither DS nor any method has a predefined ordered list of commands to get from the serial number to the colour blue, but DS can construct a chain of methods from one set of data, run them in order, and obtain the desired data.
For our example above, DS runs 1, 2, 3 in that order and injects the data resulting from all methods into the object that needed it.
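The chain discovery described above could look roughly like this in Python. It is a sketch under assumptions, not the prototype itself: the `Method` record, `find_chain` and `resolve` are hypothetical names, method 1 is simplified to return the raw colour value directly, and the search is a plain forward pass over property names followed by pruning of steps the target does not need.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Optional, Set

Props = Dict[str, object]


@dataclass(frozen=True)
class Method:
    name: str
    inputs: FrozenSet[str]
    outputs: FrozenSet[str]
    run: Callable[[Props], Props]


def find_chain(methods: List[Method], known: Set[str], target: str) -> Optional[List[Method]]:
    """Plan which methods to run, in order, to obtain `target`."""
    available, ordered = set(known), []
    progress = True
    while target not in available and progress:
        progress = False
        for m in methods:
            # A method is usable once all its inputs exist and it adds something new.
            if m not in ordered and m.inputs <= available and not m.outputs <= available:
                ordered.append(m)
                available |= m.outputs
                progress = True
    if target not in available:
        return None
    # Walk backwards and keep only the steps the target depends on.
    needed, chain = {target}, []
    for m in reversed(ordered):
        if m.outputs & needed:
            chain.append(m)
            needed |= m.inputs
    chain.reverse()
    return chain


def resolve(methods: List[Method], known: Props, target: str) -> object:
    chain = find_chain(methods, set(known), target)
    if chain is None:
        raise LookupError(f"no chain of methods produces {target!r}")
    data = dict(known)
    for m in chain:
        data.update(m.run(data))
    return data[target]


# Hypothetical registrations mirroring methods 1-3 plus a registration lookup.
METHODS = [
    Method("decode_serial", frozenset({"serial"}), frozenset({"raw_color"}),
           lambda p: {"raw_color": 31}),
    Method("raw_to_code", frozenset({"raw_color"}), frozenset({"color_code"}),
           lambda p: {"color_code": {31: "NB"}[p["raw_color"]]}),
    Method("code_to_name", frozenset({"color_code", "manufacturer", "year", "model"}),
           frozenset({"color_name"}), lambda p: {"color_name": {"NB": "navy blue"}[p["color_code"]]}),
    Method("lookup_registration", frozenset({"serial"}), frozenset({"registration"}),
           lambda p: {"registration": "simulated API result"}),
]
```

Calling `resolve(METHODS, {"serial": "SN-1", "manufacturer": "Acme", "year": 2020, "model": "X"}, "color_name")` plans the first three methods and skips the registration lookup, because nothing on the path to the colour name depends on it.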
Now if the car needs registration info, we have method (4) that gets it from the police database with an API request.
So, given:
- a type of model (class/object)
- a list of methods that take a fixed list of input(object properties) and give out a fixed list of output (object properties)
- a class DS that can glue the methods together and run the ones needed for a model to get from property A (serial) to property B (human-readable colour), without the model or DS having a preconfigured way to get this data, finding it as needed instead.
does this have a name, or is it already implemented somewhere?
I've implemented a very basic prototype and it works very nicely, and I think this implementation method has useful features:
if you have a set of methods that run SQL queries and then your app switches to using an API, you only need to change the methods and don't have to touch any other part of the application
when looking for a chain of methods that resolves the 'need' the object has, you can find a method chain, run it, and if it fails keep looking for another chain of methods based on the currently available data - so if you have multiple sources for a piece of data, it can try multiple versions
following from the previous point, I could start with an app that only has SQL queries for data retrieval - when I find out that a part of the app overloads the SQL server, I could add a method that retrieves data from a cache at a lower cost than the one from the database (or multiple layered caches, each with a different cost)
I could probably add business logic to the mix in the same way as the cache, and present different data based on the user's location / options
this requires less coding overall and decouples the data source from the object, making each piece easier to mock/test
what is needed to make this fast is a caching solution for the discovered method chains, since matching hundreds of thousands of methods per model type would be time-consuming, but I don't think this is very hard to do - just store all found chains in memory as you find them, plus some metadata to be able to resume a search from any point in time - when you update the methods, just clear the cache and take a performance hit for the first requests
Thank you for your time
What you describe sounds like a somewhat roundabout way of doing Dependency Injection. Quote:
"Passing the service to the client, rather than allowing a client to
build or find the service, is the fundamental requirement of the
pattern."
Depending on what language you're using, there should be several Dependency Injection frameworks/libraries available.

Best practice with coding system values

I think this should be an easy one, but I haven't found any clear answer on what the best practice would be.
In an application, we keep the current status of an order (open, canceled, shipped, closed ...).
These values cannot change without a code change, but the application should meet the following criteria:
1. status names should be easily displayed in different languages,
2. application can search via freetext status names (like googling for "open")
3. status_id should be available to developer via enum
4. zero headache when adding new statuses
Possible ways we have tackled this so far:
1. having a DB table status with PK(id, language_id) and a separate enum which represents these statuses in the application.
PROS: 1., 2., 3. work out of the box. CONS: 4. an update script needs to be run on every client installation; SQL selects can become large and cumbersome when dealing with a lot of code tables
2. having just an enum:
PROS: 3., 4. CONS: 1., 2. are a total nightmare
3. having enums which populate the database tables on each start of the application:
PROS: 1., 2., 3., 4. work. CONS: some overhead on application start; SQL selects can become large and cumbersome when dealing with a lot of code tables.
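For the third approach, a rough Python sketch of the start-up synchronisation, using the standard sqlite3 module as a stand-in for the real database. The table layout follows the PK(id, language_id) idea above; the `name` column, the `LABELS` dictionary and the German labels are assumptions made for illustration.

```python
import sqlite3
from enum import Enum


class OrderStatus(Enum):
    OPEN = 1
    CANCELED = 2
    SHIPPED = 3
    CLOSED = 4


# Assumed translations; in a real system these might come from resource files.
LABELS = {
    "en": {OrderStatus.OPEN: "Open", OrderStatus.CANCELED: "Canceled",
           OrderStatus.SHIPPED: "Shipped", OrderStatus.CLOSED: "Closed"},
    "de": {OrderStatus.OPEN: "Offen", OrderStatus.CANCELED: "Storniert",
           OrderStatus.SHIPPED: "Versendet", OrderStatus.CLOSED: "Geschlossen"},
}


def sync_status_table(conn: sqlite3.Connection) -> None:
    """Run at application start: the enum is the source of truth."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS status ("
        " id INTEGER NOT NULL,"
        " language_id TEXT NOT NULL,"
        " name TEXT NOT NULL,"
        " PRIMARY KEY (id, language_id))"
    )
    rows = [
        (status.value, language, label)
        for language, labels in LABELS.items()
        for status, label in labels.items()
    ]
    # Upsert so that repeated start-ups do not create duplicates.
    conn.executemany(
        "INSERT OR REPLACE INTO status (id, language_id, name) VALUES (?, ?, ?)", rows
    )
    conn.commit()
```

Adding a status then means adding one enum member and its labels; the next start of the application brings the table up to date without a separate update script.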
What is the most common way of tackling this problem?
Sounds like you summarized it pretty well yourself, and comparing the pros/cons points towards #3. Just one comment when you implement #3 though:
Use a caching mechanism (even a simple HashMap!) and add an option to refresh the cache; this will ease your work when you want to change values (without the need to restart every time!).
I would, and do, use method 3 because it is the best of the lot. You can use resource files to store the translations in and map the enum values to keys in the resource files. Your database can contain the id of the enum for the status.
1. status names should be easily displayed in different languages,
2. application can search via freetext status names (like googling for "open")
These are concerns of the interface layer; you'd better not mix them into your domain model.
I would set up a mapping between the status enum and i18n codes. The mapping could be stored in a file (cached in memory) or hardcoded.
For example, if you use a DTO or view adapter to render your UI:
public class OrderDetailViewAdapter {
    private Order order;
    public String getStatus() {
        return i18nMapper.to(order.getStatus()); // use hardcoded switch case or file impl
    }
}
Or you could do this before populating your DTOs.
You could use a similar solution for goal 2. When the user types text, find the corresponding enum from the mapping and use the enum for the search.
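Both directions of that mapping can be sketched in a few lines of Python. The translations are hardcoded here for brevity, and `status_label` and `search_statuses` are illustrative names, not part of any framework.

```python
from enum import Enum
from typing import Dict, List


class OrderStatus(Enum):
    OPEN = 1
    CANCELED = 2
    SHIPPED = 3
    CLOSED = 4


# Assumed mapping; it could equally be loaded from a file and cached in memory.
TRANSLATIONS: Dict[str, Dict[OrderStatus, str]] = {
    "en": {OrderStatus.OPEN: "Open", OrderStatus.CANCELED: "Canceled",
           OrderStatus.SHIPPED: "Shipped", OrderStatus.CLOSED: "Closed"},
    "de": {OrderStatus.OPEN: "Offen", OrderStatus.CANCELED: "Storniert",
           OrderStatus.SHIPPED: "Versendet", OrderStatus.CLOSED: "Geschlossen"},
}


def status_label(status: OrderStatus, language: str) -> str:
    """Goal 1: render a status in the user's language."""
    return TRANSLATIONS[language][status]


def search_statuses(text: str, language: str) -> List[OrderStatus]:
    """Goal 2: turn free text into enum values to use in the actual query."""
    needle = text.strip().lower()
    return [
        status
        for status, label in TRANSLATIONS[language].items()
        if needle in label.lower()
    ]
```

The search itself then filters orders by the returned enum values, so the domain model and the database only ever see the enum.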
Anyway, the fewer DB tables you use for this, the better.
Personally, I always use a dedicated enum class inside the domain. The only responsibility of this class is holding the status name (OPEN, CANCELED, SHIPPED, ...). The status name is not visible outside the codebase. The status can also be stored in a database field as a string (varchar or similar).
For rendering, depending on the number of use cases, I sometimes implement formatting inside a formatter (e.g. OrderFormatter::formatStatusName(), OrderFormatter::formatAbbreviatedStatusName(), ...). If formatting is needed often, I create a dedicated class with all the formatting styles needed (OrderStatusFormatter::short(), OrderStatusFormatter::abbriviated()...). Of course, an internal mapping is needed to map the status name to the status title, and this is the tricky part. But if you want layering, you can't avoid mapping.
Translation has not been dealt with so far. I translate strings inside templates, so formatters are free of that responsibility. To summarize:
enum inside domain model
formatter inside presentation layer
translation inside template
There is no need to create a special table for order status translations. A better choice would be to implement a generic translation mechanism, separated from your business code.

Which one do you prefer for Searching/Reporting DataTable or DTO or Domain Class?

The project I am currently working on requires a lot of searching/filtering pages. For example, I have a complex search page to get Issues by data, category, unit, ...
The Issue domain class is complex and contains lots of value objects and child objects.
I am wondering how people deal with searching/filtering/reporting for the UI. As far as I know I have 3 options, but none of them makes me happy.
1.) Send parameters to a Repository/DAO to get a DataTable and bind the DataTable to UI controls, for example to an ASP.NET GridView
DataTable dataTable = issueReportRepository.FindBy(specs);
.....
grid.DataSource=dataTable;
grid.DataBind();
In this option I can simply bypass the domain layer and query the database for the given specs. I don't have to build a fully constructed, complex domain object: no need for value objects, child objects, etc. I get the data to be displayed in a DataTable directly from the database and show it in the UI.
But if I have to show a calculated field in the UI, such as a method return value, I have to compute it in the database because I don't have the full domain object. That means duplicating logic, plus the usual DataTable problems such as no IntelliSense.
2.) Send parameters to a Repository/DAO to get DTOs and bind the DTOs to UI controls.
IList<IssueDTO> issueDTOs = issueReportRepository.FindBy(specs);
....
grid.DataSource=issueDTOs;
grid.DataBind();
This option is the same as the one above, but I have to create anemic DTO objects for every search page. Also, different Issue search pages have to show different parts of the Issue objects: IssueSearchDTO, CompanyIssueDTO, MyIssueDTO...
3.) Send parameters to the real Repository class to get fully constructed domain objects.
IList<Issue> issues = issueRepository.FindBy(specs);
//Bind to grid...
I like Domain Driven Design and Patterns. There is no DTO or duplicated logic in this option, but I have to create lots of child and value objects that will not be shown in the UI. It also requires lots of joins to get the full domain object, with a performance cost for needless child objects and value objects.
I don't use any ORM tool. Maybe I could implement lazy loading by hand for this version, but it seems a bit overkill.
Which one do you prefer? Or am I doing it wrong? Are there any suggestions or better ways to do this?
I have a few suggestions, but of course the overall answer is "it depends".
First, you should be using an ORM tool or you should have a very good reason not to be doing so.
Second, implementing Lazy Loading by hand is relatively simple so in the event that you're not going to use an ORM tool, you can simply create properties on your objects that say something like:
private Foo _foo;
public Foo Foo
{
    get {
        if(_foo == null)
        {
            _foo = _repository.Get(id);
        }
        return _foo;
    }
}
Third, performance is something that should be considered initially but should not drive you away from an elegant design. I would argue that you should use (3) initially and only deviate from it if its performance is insufficient. This results in writing the least amount of code and having the least duplication in your design.
If performance suffers you can address it easily in the UI layer using Caching and/or in your Domain layer using Lazy Loading. If these both fail to provide acceptable performance, then you can fall back to a DTO approach where you only pass back a lightweight collection of value objects needed.
This is a great question and I wanted to provide my answer as well. I think the technically best answer is to go with option #3. It provides the ability to best describe and organize the data along with scalability for future enhancements to reporting/searching requests.
However, while this might be the best option overall, IMO it has a huge cost compared to the other two options: the additional design time for all the classes and relationships needed to support the reporting needs (again under the premise that no ORM tool is being used).
I struggle with this in a lot of my applications as well, and the reality is that #2 is the best compromise between time and design. Now if you were asking about your business objects and all their needs, there is no question that a fully laid out and properly designed model is important and there is no substitute. However, reporting and searching are to me a different animal. #2 provides strongly typed data in the anemic classes and is not as primitive as hardcoded values in DataSets like #1, and it still greatly reduces the amount of time needed to complete the design compared to #3.
Ideally I would love to extend my object model to encompass all reporting needs, but sometimes the effort required to do this is so extensive, that creating a separate set of classes just for reporting needs is an easier but still viable option. I actually asked almost this identical question a few years back and was also told that creating another set of classes (essentially DTOs) for reporting needs was not a bad option.
So to wrap it up, #3 is technically the best option, but #2 is probably the most realistic and viable option when considering time and quality together for complex reporting and searching needs.
