2 JSON Schema questions: Is the type keyword required, and what is the difference between Core and Validation?

Okay I have been UP and DOWN the internet and I cannot find an answer that DEFINITIVELY answers the following question.
"Is the type keyword required?" If it is not then can some one, for all that is holy, please, in EXCRUCIATING detail, describe what should happen when it is not provided, validation-wise.
I have found this...
http://json-schema.org/draft/2020-12/json-schema-validation.html#rfc.section.6.1.1
But I have found so many other examples where a schema object can be defined and not have this keyword.
For example I have found this repo with testing examples.
https://github.com/json-schema-org/JSON-Schema-Test-Suite/blob/master/tests/draft7/additionalProperties.json
Here they have a schema at line 5. It does not have a type, but it does look like they are describing an object. Also, on lines 21-25 they describe a test where an array is valid.
Can someone please clarify this for me.
Also, for the second one: what is the difference between Core and Validation as defined here...
https://json-schema.org/specification.html
Thank you in advance.

1. Is the type keyword required?
No. Keywords will respond to instances of the types they're designed for, otherwise they will be ignored (silently pass validation). So
{ "minimum": 5 }
will pass anything as long as it's not a number less than 5. Objects, strings, arrays, etc., all pass. But as soon as you introduce a number, this keyword becomes interested and it'll do its thing.
Every keyword has a type or set of types that it responds to. type is one of the ones that responds to all of them.
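To make that concrete, here is a minimal sketch using the third-party Python jsonschema package (my choice of tool for illustration; it is not mentioned in the original question or answer):

# Minimal sketch, assuming the "jsonschema" package is installed (pip install jsonschema).
from jsonschema import Draft7Validator

validator = Draft7Validator({"minimum": 5})

print(validator.is_valid("hello"))   # True  - strings are ignored by "minimum"
print(validator.is_valid({"a": 1}))  # True  - objects are ignored as well
print(validator.is_valid([1, 2]))    # True  - so are arrays
print(validator.is_valid(7))         # True  - a number, and 7 >= 5
print(validator.is_valid(3))         # False - a number, and 3 < 5

Adding "type": "number" to that schema is what would turn the silently-passing non-number cases into failures.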
2. What are the different specs for?
We (the spec authors) thought it would make things a little simpler if we split the specification into two parts: one for the schema construction keywords (e.g. $id, $schema, allOf, properties, etc.), and one for value validation and annotation (e.g. minimum, minLength, etc.). It does mean that you have to look into several documents in order to create a validator, however.
It also allows us to revise one of them without the other, though we've never done that.
This split was done several iterations ago, and we've just kept it as it seems to work well.
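As a rough illustration of the split, here is a small schema written as a Python dict so the groups can be labelled with comments (the $id URI is made up for the example):

schema = {
    # Defined in the Core spec: identifiers, referencing, and applicators
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "$id": "https://example.com/person.schema.json",
    "properties": {
        "age": {
            # Defined in the Validation spec: assertions about instance values
            "type": "integer",
            "minimum": 0,
        },
        "name": {
            "type": "string",
            "minLength": 1,
        },
    },
}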

Related

Is it a good practice in protobuf3 using optional to check nullability?

I noticed that they brought optional back in protobuf 3.15. I'm trying to use optional to check field presence, but I'm still unclear about the philosophy behind this.
Here is my use case:
I'm providing some services that accept protobuf as my input. But the client side is untrusted from my perspective, so I have to check the nullability of the input protobuf.
The way I expect it to work is:
for a required field, either it's set or it's null;
for an optional field, I don't care - I can just use a default value and that won't cause any problem in my system.
So I end up adding optional to every field that should not be null, so that I can use hasXXX to check presence. This looks weird to me because those fields are actually required from my perspective, but I have to add the optional keyword to them all. I'm not sure whether this is good practice. Proto experts, please give me some suggestions.
Also, the default value doesn't make sense to me at all for nullability checking, since zero or an empty string usually has its own meaning in many scenarios.
The entire point of optional in proto3 is to be able to distinguish between, for example:
no value was specified for field Foo
the field Foo was explicitly assigned the value that happens to be the proto3 default (zero/empty-string/false/etc)
In proto3 without optional: the above both look identical (which is to say: the field is omitted)
If you don't need to distinguish between those two scenarios: you don't need optional, but using optional also isn't likely to hurt much - worst case, a few extra zero/empty-string/false values get written to the wire, but they're small anyway.
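As a sketch of the difference (my own illustration, not from the answer; it assumes a file example.proto compiled with protoc --python_out=. to produce a generated module example_pb2):

# example.proto (assumed):
#   syntax = "proto3";
#   message Request {
#     optional int32 foo = 1;  // "optional" enables explicit presence tracking
#   }

import example_pb2  # hypothetical generated module

req = example_pb2.Request()
print(req.foo)               # 0     - the proto3 default
print(req.HasField("foo"))   # False - nothing was ever assigned

req.foo = 0                  # explicitly assign the default value
print(req.foo)               # 0     - same value as before...
print(req.HasField("foo"))   # True  - ...but now presence is recorded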
Google's API Design guide discourages the usage of the optional keyword. Better practice is to make use of the google.api.field_behavior annotation for describing the field's behaviour.
It is however not recommended to use the optional annotation at all [1]. If one consistently implements the field behaviour annotations then OPTIONAL is redundant and can be omitted.
Check out AIP 203 for an overview of the various behaviour types along with guidelines around the usage of OPTIONAL fields.
In general, Google's API Improvement Proposals are a great reference for good practices in your API design.

How to respect Post.CommentsAllowed if Post and Comment are separate aggregate roots?

In a classic example of 2 aggregate roots like:
class Post
{
    string AuthorId;
    string Title;
    string Content;
    bool AllowComments;
    ...
}
class Comment
{
    string AuthorId;
    string Content;
    DateTime Date;
    ...
}
When creating a new comment, how do I ensure that comments are added only to posts that have Post.AllowComments = true?
Bear in mind that when a user starts writing a comment, Post.AllowComments could very well be true, but in the meantime (while the comment is being written) the post author might change it to false.
Or, even at the time of submission: when we check Post.AreCommentsAllowed() it could return true, but then when we do CommentRepository.Save(comment) it could be false.
Of course, one Post might have many Comments, so it might not be practical to have a single aggregate where Post holds a collection of Comments.
Is there any other solution to this?
PS.
I could do a DB transaction within which I'd check it, but I'm looking for a DDD purist solution.
I'm looking for a DDD purist solution.
Basic idea first: if our Comments logic needs information from our Post aggregate, then what we normally do is pass a copy of that information as an argument.
So our application code would, in this case, get a local copy of AllowComments, presumably by retrieving the handle to the Post aggregate root and invoking some query in its interface.
Within the comments aggregate, you would then be able to use that information as necessary (for instance, as an argument to some branching logic)
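A rough sketch of that shape (my own names throughout - CommentsClosed, add_comment, the repository interfaces - none of this is from the question):

# Sketch only: the application code copies AllowComments out of the Post aggregate
# and hands it to the Comment logic as a plain argument.
class CommentsClosed(Exception):
    pass

class Comment:
    def __init__(self, author_id, content, post_allows_comments):
        if not post_allows_comments:
            raise CommentsClosed("the post does not allow comments")
        self.author_id = author_id
        self.content = content

def add_comment(post_repository, comment_repository, post_id, author_id, content):
    post = post_repository.get(post_id)          # handle to the Post aggregate root
    allow = post.allow_comments                  # query its interface for a copy of the flag
    comment = Comment(author_id, content, post_allows_comments=allow)
    comment_repository.save(comment)

This keeps the Comment aggregate ignorant of Post; the trade-off, as described next, is that the copied flag can go stale between the read and the save.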
Race conditions are hard.
A microsecond difference in timing shouldn’t make a difference to core business behaviors. -- Udi Dahan
If data here must be consistent with data there, then the answer is that we have to lock both when we are making our change.
In an application where information is stored locally, that's pretty straightforward, at least on paper: you just need to acquire locks on both objects at the same time, so that the data doesn't change out from under you. There's a bit of care required to ensure that you don't get deadlocked (aka the dining philosophers problem).
In a distributed system, this can get really nasty.
The usual answer in "modern purist DDD" is that you either relax the consistency requirement (the data that you are reading is allowed to change while you are working) and mitigate the inconsistencies elsewhere (see Memories, Guesses, and Apologies by Pat Helland), OR you change your aggregate design so that all of the information is enclosed within the same aggregate (here, that would mean making the comment entities part of the post aggregate).
Also: creation patterns are weird; you expect that the entity you are intending to create doesn't yet exist in the repository (but off the happy path maybe it does), so that business logic doesn't fit as smoothly into the usual "get the handle from the repository" pattern.
So the conditional logic needs to sneak somewhere else -- maybe into the Post aggregate? Maybe you just leave it in the application code? Ultimately, somebody is going to have to tell the application code whether anything is being saved in the repository.
As far as I can tell, there isn't a broad consensus on how to handle conditional create logic in DDD, just lots of different compromises that might be "good enough" in their local context.

Defining mutations in GraphQL via fields: Is this bad practice?

Suppose you have a user type, and a user has many posts. Then imagine you want to find a user, and delete all of their posts. One way to do this is to implement the following mutation field:
field :deleteAllPosts, types[Types::PostType] do
  argument :user_id, types.String
  resolve ->(obj, args, ctx) {
    posts = Posts.where(user_id: args[:user_id])
    posts.each { |post| post.destroy }
  }
end
Then the query
mutation {
  deleteAllPosts(user_id: 1)
}
will delete all the posts of the user with id 1.
Before I did this, I thought about doing it a different way, which I've not seen anyone else do. I wanted to check that this different way doesn't have any pitfalls, or reasons I shouldn't use it.
The idea is to instead put a deletePost field for PostType, and a findUser field on mutation (which would typically be a query field). Assuming it's obvious how those fields would be defined, I would then make the query
mutation {
  findUser(id: 1) {
    posts {
      deletePost {
        id
      }
    }
  }
}
Is this a bad idea?
Edit in response to feedback: One thing I'm concerned about is the possibility that a user could, in principle, make the deletePost selection inside of a query. But I'm tempted to say that that's "their fault". I'd like to say "this selection can only be made if it is inside of a mutation query", but I don't think that's possible in GraphQL.
In order to avoid the XY problem, here is why I am keen to use this idea rather than the initial one. It feels more expressive (said differently, it feels less redundant). Suppose that, after a while, you decide that you want to delete all the posts for those users belonging to a particular group. Then in what I regard as the 'convention', you should create a whole new mutation field:
field :deleteAllPostsInGroup, types[Types::PostType] do
  argument :group_id, types.String
  resolve ->(obj, args, ctx) {
    posts = Group.find_by(id: args[:group_id]).users.map { |u| u.posts }.flatten
    posts.each { |post| post.destroy }
  }
end
whereas in my suggested convention you just define a trivial findGroup field (but you have to define it on mutation, where it doesn't belong), and make the query:
mutation {
  findGroup(id: 1) {
    users {
      posts {
        deletePost {
          id
        }
      }
    }
  }
}
I suppose that really what I'm trying to do is use a query to find some data, and then mutate the data I've found. I don't know how to do this in GraphQL.
Second Edit: It seems like there is a well-defined component of this question, which I have asked here. It may turn out that one of these questions answers the other, and can be closed, but I don't know which way round yet.
This is basically a code quality issue and is similar to asking about the point of the DRY principle or encapsulation.
A quote from https://graphql.org/learn/queries/ reads:
In REST, any request might end up causing some side-effects on the server, but by convention it's suggested that one doesn't use GET requests to modify data. GraphQL is similar - technically any query could be implemented to cause a data write. However, it's useful to establish a convention that any operations that cause writes should be sent explicitly via a mutation.
This is a good convention as it makes maintenance, testing and debugging easier. Side-effects, whether intentional or not, can be awfully difficult to track and understand, particularly if you have them in GraphQL queries, which can be arbitrarily large and complex. There is nothing preventing you from querying and modifying the same object and its siblings at the same time, and doing this multiple times in one query by simple nesting. It is very easy to get this wrong.
Even if you pull it off, code readability and maintainability suffer. E.g. if you knew that only your mutations ever modified the data, and queries had no effect on it, you would immediately know where to start looking for the implementation of a particular behaviour. It is also a lot easier to reason about how your program works in general.
If you only write small, properly named, granular mutations, you can reason about what they do more easily than you could if you had a complex query which updated different data at different points.
Last but not necessarily least, sticking to conventions is useful if you ever need to transfer your work to someone else.
In short - it is all about making the lives of yourself and others easier in the future.
EDIT
OK, so I see where you are going with this - you want to give the flexibility of a GraphQL query to the mutations. Sure, this particular example would work. The argument against going this way is only about the future; there is no point in discussing it if deletePost is the only operation you will ever define.
If that's not the case, then what if you wanted to delete, let's say, 5 specific user posts? Would you give extra parameters to findGroup and then pass those down the tree? But then why does the findGroup method have to know about what you will do with its results? That kind of defies the idea of a flexible query itself. What if you also wanted to perform mutations on users? More parameters for findGroup? What if users and posts can be queried in a different way - users by domains, posts by categories, etc.? Define the same parameters there too? How would you ensure that with every operation (especially if you do a few of them at once) all the relational links are properly erased in your database?
You would have to imagine every possible combination of queries and query-mutations and code appropriately for them. Since query size is unlimited, that could end up being very hard to do. And even if the purpose of an individual query-mutation (deletePost) is clear and easy to grasp, the overall query would not be. Quickly your queries would become too complex to understand even for you, and you'd probably begin breaking them down into smaller ones which only do specific mutations. This way you'd go back to the original convention, but a more complex version of it. You would probably also end up defining some regular mutations too - how would you update or add posts, for example? That would spread your logic all over the place.
These questions would not occur if you were writing mutations. That's slightly more work in exchange for better maintainability.
These are all potential issues in the future (and there are probably more). If these don't concern you, then please go ahead with the implementation. I personally would run away from a project that did this, but if you are really clever, I don't see anything that would technically completely prevent you from achieving what you want :]

What's a good way to make a type a plural when writing comments?

When writing comments, I sometimes find myself needing to talk about a type (class, struct, etc.) in plural when writing comments, such as:
/*
 * getThings
 * Get a list of --> Things <-- from somewhere.
 */
Thing *getThings(void);
The problem is, the type name is singular (namely, Thing), but I want to talk about them in plural in comments.
If I say Things, it suggests to the reader that it's talking about a type called Things, which is not the case. If I say Thing's, it looks awkward because it's not grammatically correct (it's either possessive or "Thing is", not plural). I could talk around the problem and say "a list of Thing items".
What's a good convention to stick to when writing plurals of types?
Well, depending on the documentation system you're using, you can wrap the name of the type in a special syntax and put the s outside it. For example:
.NET XML comments
Get a list of <see cref="Thing"/>s from somewhere.
doxygen C/C++ comments
Get a list of \link Thing \endlink s from somewhere.
Not 100% certain on the doxygen variant but it should be something like that.
And if you're not using a particular documentation system and thus have no special comments, I'd do something like:
Get a list of [Thing]s from somewhere.
Or you could use ( ) or { }, depending on preference...
I would use the 's' in parentheses.
/* Get a list of Thing(s) from somewhere */

Should a method parameter name specify its unit in its name?

Of the following two options for method parameter names that have a unit as well as a value, which do you prefer and why? (I've used Java syntax, but my question would apply to most languages.)
public void move(int length)
or
public void move(int lengthInMetres)
Option (1) would seem to be sufficient, but I find that when I'm coding/typing, my IDE can indicate to me that I need a length value, but I typically have to break stride and look up the method's documentation to determine the units, so that I pass in the correct value (and not kilometres instead of metres, for example). This can be an annoying interruption to a thought process. Option (2) alleviates this problem, but can be verbose, particularly if your unit is metresPerSecondSquared or some such. Which do you think is best?
I would recommend making your parameter (and method) names as clear as possible, even if they become wordy. You'll be glad when you look at or use the code in 6 months time, or when someone else has to look at your code.
If you think the names are becoming too long, consider rewording them. In your example you could use the parameter name metres (i.e. int metres), which would probably be clear enough. Consider changing the method name, e.g. public void moveMetres(int length).
In Visual Studio, the XML comments generated when you enter 3 comment symbols above a method definition will appear in Intellisense hints when you use the method in other locations. Other IDEs may have similar functionality.
Abbreviations should be used sparingly. If absolutely necessary, only use commonly known and/or relevant industry-standard abbreviations, and be consistent, i.e. use the same abbreviation everywhere.
Take a step back. Write the code then move on to something else. Come back the next day and check to see if the names are still clear.
Peer reviews can help too. Ask someone who knows the programming language (or just thinks logically), but not the specific functionality, if your naming scheme is clear enough or to help brainstorm alternatives. They might be the poor sap who has to maintain your code in the future!
I would prefer the second approach (i.e. lengthInMeters) as it describes the input needed by the method accurately. The fact that you find it confusing to figure out the units when you are writing the code implies it would be much more confusing when you (or someone else) looks at the same piece of code later. As regards the issue of the variable name being longer, you can find ways to abbreviate it (say, mtrsPerSecondSquared).
Also, in defence of the second approach, the book Code Complete mentions research indicating that the effort required to debug a program was minimized when variable names averaged 10 to 16 characters in length.
