I want to write a REST(ful) application with Spring Boot and Spring Data JPA.
Let's assume that for business reasons I have a database with the following tables:
customer(id number, first_name text, last_name text, type text);
customer_type(type text, description text);
where:
id is generated by the database at insertion time
the type column in the customer table is a foreign key to the type column in the customer_type table; customer_type is immutable from the microservice's point of view, just a lookup table.
Assuming I want to create APIs for CRUD operations on a customer but want to minimize api calls when just reading, I suppose I need the following operations:
GET /customer/{id}
POST /customer
PUT /customer/{id}
DELETE /customer/{id}
How should the body be structured?
For the GET operation the response should be:
{
"id":123,
"firstName":"John",
"lastName":"Doe",
"customerType":{
"type":"P",
"description":"Premium Customer"
}
}
But for POST I imagine I need to avoid sending the id and send just the customer type code, since the description is immutable and the client needs the description only for displaying the information on screen. This leads to a request body that differs from the one returned by the GET operation.
The same applies to the PUT operation, but should the id field also be sent? If it is sent, how should I handle the case where the id in the API path differs from the id in the request body?
DELETE should not be a problem since it just deletes the row in customer table.
Thank you
How should the body be structured?
Let's take a step back first and quickly discuss what you are trying to achieve by following a REST architecture, and why and how REST provides those mechanisms.
REST is an architectural style that helps decouple clients from servers by introducing indirection mechanisms. These may seem odd at first, but in the end they give you the level of decoupling that allows servers to introduce changes which clients will naturally adapt to. Such indirection mechanisms include attaching URIs to link-relation names, using form-based representation formats to tell a client how to create requests, content-type negotiation to return representations supported and understood by the other party, and so forth. If you don't need such properties, e.g. because client and server always change in lockstep and communicate via predefined messages, REST is probably not the best style to follow. If, however, you have a server that is contacted by various clients not under your control, or a client that has to contact various servers also not under your direct control, this is where REST truly starts to shine, provided all parties adhere to these concepts.
One of REST's premises is that a server will teach clients everything they need to know in order to construct requests. If you look at the Web, where HTML is used basically everywhere, you will see that HTML defines forms, which allow a server to explain to a client which properties of a resource it expects as input. On top of that, the form also tells the client which HTTP operation to use, which target URI to send the request to, and which media type to represent the state in. In HTML this is usually implicitly given as application/x-www-form-urlencoded, which chains properties together like this:
firstName=Roman&lastName=Vottner&role=Dev
or the like. This is in essence what HATEOAS, or hypertext as the engine of application state, is all about. You use the built-in controls of the exchanged media type to allow your client to progress through its task instead of having to consult external documentation to look up the "API" of some service. For example, a form could state that an input only allows numeric values, that a sub-portion of the form represents a date/time picker widget which a client could render to a user accordingly, or that an element represents a slider with a given range of admissible values, and the like.
What the actual representation you have to send to the server looks like depends on the media type the server instructs you to use. For example, HAL forms uses application/json by default and also specifies that application/x-www-form-urlencoded needs to be supported. Other media types have to be explicitly negotiated between client and server. Ion states that application/json or application/ion+json have to be negotiated via the Content-Type request header.
In plain application/json the url-encoded payload from above could simply be expressed as:
{
"firstName": "Roman",
"lastName": "Vottner",
"role": "Dev"
}
and this is OK as the server basically instructed you to send this data in that format.
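As a small illustration of the point above, the same state can be serialized differently depending on the media type the server asked for. This is only a sketch: the serialize and deserialize helpers are hypothetical names, and a real client would pick the media type from the form or template it received.

```python
import json
from urllib.parse import parse_qsl, urlencode

# The state from the examples above.
state = {"firstName": "Roman", "lastName": "Vottner", "role": "Dev"}


def serialize(data, media_type):
    """Encode the same state according to the instructed media type."""
    if media_type == "application/x-www-form-urlencoded":
        return urlencode(data)
    if media_type == "application/json":
        return json.dumps(data)
    raise ValueError("unsupported media type: " + media_type)


def deserialize(body, media_type):
    """Decode a request body back into a plain dict."""
    if media_type == "application/x-www-form-urlencoded":
        return dict(parse_qsl(body))
    if media_type == "application/json":
        return json.loads(body)
    raise ValueError("unsupported media type: " + media_type)


form_body = serialize(state, "application/x-www-form-urlencoded")
json_body = serialize(state, "application/json")
```

Both bodies decode to the same dict, which is the sense in which the media type, not the URI, acts as the contract for the message.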
There are further media types available that are worth a closer look to see whether they fit your needs. For example, Hydra has a slightly different take on this matter: it connects Linked Data to REST, calls its affordances operations, and allows you to describe resources and their properties through LD classes. The presence of an affordance for a certain resource tells you what you can do with that resource, e.g. update its state, and therefore also which class it belongs to and which properties it has.
This should just illustrate how the negotiated media type ultimately decides what the representation sent to the server needs to look like.
Whether to put resource identifiers in the payload or not depends. Usually resources are identified by their URI/IRI, and this, as a whole, is the identifier of the resource. In your application, though, you will reference related domain objects through their ID, which does not necessarily need to be, and probably also should not be, part of the IRI itself. For example, let's assume we retrieve a resource that represents an order. That order contains the user's name and address, the various items that were ordered including some metadata describing those items, and so on. It usually makes sense in such a case to add the orderId which you use in your application, even though the URI may contain that information already. Users of that API are usually not interested in those URIs but in the actual content, and might never see those URIs if they are hidden behind automated processes or user interfaces. If a user now wants to print out that order, they have all the information needed to file complaints later on, e.g. via phone. In other cases, e.g. if you design a resource to be an all-purpose, clipboard-like copy-and-paste location, an ID does not make any sense unless you allow the user to explicitly reference one of those states directly.
The reason why IDs should not be part of the URI itself stems from the fact that a URI shouldn't change if the actual resource does not change. For example, we have a customer who went through a merger a couple of years ago. They used to expose all their products via their own URIs, which exposed the productId as part of the URI. During the merger they tried to combine the various data models to reduce the number of systems they had to operate, while serving each of their customers with the same data as before, as the underlying products didn't change. As they tried to stay backwards compatible in order to support their customers' legacy systems, they quickly noticed that exposing those productIds as part of the URI was causing them trouble. If they had earlier used a mapping table of, say, exposed UUIDs to internal productIds (again an introduction of indirection), they could have reduced their whole data model, and thus complexity, by a lot, while being able to change the mapping from internal productId to UUID on the fly and still allowing their clients to look up the product information.
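The mapping-table idea from the merger story can be sketched as follows. This is a minimal in-memory illustration, not a description of what that customer built; ProductIdMap and its method names are made up for the example, and a real system would persist the table.

```python
import uuid


class ProductIdMap:
    """Maps stable public UUIDs to internal product ids that may change."""

    def __init__(self):
        self._public_to_internal = {}
        self._internal_to_public = {}

    def expose(self, internal_id):
        """Return the public UUID for an internal id, creating one if needed."""
        existing = self._internal_to_public.get(internal_id)
        if existing is not None:
            return existing
        public_id = str(uuid.uuid4())
        self._public_to_internal[public_id] = internal_id
        self._internal_to_public[internal_id] = public_id
        return public_id

    def resolve(self, public_id):
        """Look up the current internal id behind a public UUID."""
        return self._public_to_internal[public_id]

    def remap(self, public_id, new_internal_id):
        """Point an existing public UUID at a new internal id (e.g. after a data model merge)."""
        old_internal_id = self._public_to_internal[public_id]
        if self._internal_to_public.get(old_internal_id) == public_id:
            del self._internal_to_public[old_internal_id]
        self._public_to_internal[public_id] = new_internal_id
        # Keep the first public id as the preferred one if several point here.
        self._internal_to_public.setdefault(new_internal_id, public_id)
```

Clients only ever see the UUID (inside a URI they treat as opaque), so remap can change the internal id without invalidating any link handed out earlier.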
Long story short, as hopefully can be seen the structure of a representation depends on the exchanged media type. There are loads of different media-types available. Use the ones that allow you to describe resources to clients, such as HAL/HAL forms, Ion, Hydra, .... In regards to URIs, don't overengineer URIs. They are, as a whole, just a pointer to a resource and clients are usually interested in the content, not the URI! As such, make use of indirection-features like link-relation names, content-type negotiation and so forth to help remove the direct coupling of clients to services but instead rely more on the document type exchanged. The media-type here becomes basically the contract of the message. Through mappings on the client and server side resources of various representations can be "translated" to an object which you can use in your application.
As you've tagged your question with spring-boot and spring-data-jpa, you might want to look into spring-hateoas. It supports HAL out of the box. HAL forms can be used via affordances, though the media type needs to be enabled explicitly, otherwise you might miss out on the form template in the responses. Hydra support in spring-hateoas seems to be added through hydra-java, which implements the Spring HATEOAS SPI. While Amazon provides implementations of Ion for various programming languages, including Java, it does not yet support Spring HATEOAS or Spring in general. Here a custom SPI implementation may be necessary.
For PUT operations you need to send the id of the entity that you want to update.
If you want to generate the same response as you would get in GET, then you need to write a DTO and map details accordingly.
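One way to handle the path-versus-body id question is sketched below in framework-neutral Python rather than Spring. The rule it applies (the path id is authoritative, a body id is optional but must match if present) is an assumption, a common choice rather than something the answers here prescribe. put_customer, to_response, BadRequest, and the in-memory table are hypothetical stand-ins for a controller, a DTO mapper, an HTTP 400 response, and the customer table.

```python
# Stand-in for the customer_type lookup table; "P" is taken from the question.
CUSTOMER_TYPES = {"P": "Premium Customer"}


class BadRequest(Exception):
    """Raised when a request body cannot be accepted (would map to HTTP 400)."""


def to_response(row):
    """Map a stored row to the richer response shape returned by GET."""
    return {
        "id": row["id"],
        "firstName": row["first_name"],
        "lastName": row["last_name"],
        "customerType": {
            "type": row["type"],
            "description": CUSTOMER_TYPES[row["type"]],
        },
    }


def put_customer(path_id, body, table):
    """Handle PUT /customer/{id}: the request carries only the type code."""
    body_id = body.get("id")
    if body_id is not None and body_id != path_id:
        raise BadRequest("id in body does not match id in path")
    for field in ("firstName", "lastName", "type"):
        if field not in body:
            raise BadRequest("missing field: " + field)
    if body["type"] not in CUSTOMER_TYPES:
        raise BadRequest("unknown customer type")
    table[path_id] = {
        "id": path_id,
        "first_name": body["firstName"],
        "last_name": body["lastName"],
        "type": body["type"],
    }
    return to_response(table[path_id])
```

The request and response shapes differ on purpose: the client sends only the type code, and the server fills in the description when it builds the response.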
We are developing a FHIR server and have profiled the Condition resource. We have been trying to figure out from the HAPI library documentation whether, in response to a GET search request, we should return the HL7 base resource for Condition or whether we can return our own profiled resource.
We can only find this example in the HAPI documentation, which returns a base resource for Patient.
https://hapifhir.io/hapi-fhir/docs/server_plain/resource_providers.html#resource-providers
If we return our own profiled resource, do we need to validate it, or is the only difference that we insert the meta.profile field in the returned resource? Is there any example of how to return our own profiled resource?
In response to a GET, the default is that servers return the data they have. In some cases, they may need to filter to exclude data the requesting system/user does not have permission to share. You can use a profile to document that set of expectations (i.e. what data the server is capable of exposing/exposing to a given recipient).
When you return an instance, if you happen to be aware that the instance complies with one or more profiles, you're free to list those in Resource.meta. However, there's no general expectation that you do so. Occasionally IGs will set expectations for profile declaration, though this is generally discouraged as it can impose an unnecessary interoperability barrier.
A system is non-conformant if it declares a profile that it turns out the instance is not valid against. Most of the time, this is handled by validating inbound data rather than validating when creating a response, as that's more efficient. However, if you want to validate a response before returning it, you certainly can. (Presumably dropping the profile declaration if it turns out the instance isn't valid against it.)
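The "validate before returning, and drop the declaration if it fails" option could look roughly like this. It is a sketch over plain dicts, not HAPI's API: declare_profile, the is_valid callback, and the example profile URL are all hypothetical, and the stub validator merely stands in for a real profile validator.

```python
# Hypothetical profile URL, used only for illustration.
MY_CONDITION_PROFILE = "http://example.org/fhir/StructureDefinition/my-condition"


def declare_profile(resource, profile_url, is_valid):
    """Return a copy of the resource that lists profile_url in meta.profile
    only if the resource validates against it."""
    result = dict(resource)
    meta = dict(result.get("meta", {}))
    profiles = [p for p in meta.get("profile", []) if p != profile_url]
    if is_valid(resource, profile_url):
        profiles.append(profile_url)
    if profiles:
        meta["profile"] = profiles
    else:
        meta.pop("profile", None)
    if meta:
        result["meta"] = meta
    else:
        result.pop("meta", None)
    return result


def stub_validator(resource, profile_url):
    """Stand-in validator: pretends the profile only requires a 'code' element."""
    return "code" in resource


good = declare_profile(
    {"resourceType": "Condition", "code": {"text": "Asthma"}},
    MY_CONDITION_PROFILE,
    stub_validator,
)
bad = declare_profile(
    {"resourceType": "Condition", "meta": {"profile": [MY_CONDITION_PROFILE]}},
    MY_CONDITION_PROFILE,
    stub_validator,
)
```

Here good ends up with the profile listed in meta.profile, while bad has the declaration removed because the instance does not validate.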
I am trying to understand how to use the FHIR Questionnaire resource, and have a specific question regarding this.
My project is specifically regarding how a citizen in our country could be responding to Questionnaires via a web app, which are then submitted to the FHIR server as QuestionnaireAnswers, to be read/analyzed by a health professional.
A FHIR-based system could have lots of Questionnaires (Qs). Groups of Qs, or even specific Qs, would be targeted towards certain users or groups of users. The display of the questionnaire to the citizen could also be based on a care plan of some sort, for example certain Questionnaires needing to be filled in during the weeks after surgery. The Questionnaires could also be regular ones that need to be filled in every day or week permanently, to support data collection on the state of a chronic disease.
What I'm wondering is if FHIR has a resource which fits into organizing the 'logistics' of displaying the right form to the right person. I can see CarePlan, which seems to partly fit. Or is this something that would typically be handled out-of-FHIR-scope by specific server implementations?
So, to summarize:
Which resource or mechanism would a health professional use to set up that a patient should answer certain Questionnaires, either regularly or as part of for example a follow-up after a surgery. So this would include setting up the schedule for the form(s) to be filled in, and possibly configure what would happen if the form wasn't filled in as required.
Which resource (possibly the same) or mechanism would be used for the patient's web app to retrieve the relevant Questionnaire(s) at a given point in time?
At the moment, the best resource for saying "please capture data of type X on schedule Y" would be DiagnosticOrder, though the description probably doesn't make that clear. (If you'd be willing to click the "Propose a change" link and submit a change request for us to clarify, that'd be great.) If you wanted to order multiple questionnaires, then CarePlan would be a way to group that.
The process of taking a complex schedule (or set of schedules) and turning that into a simple list of "do this now" requests that might be more suitable for a mobile application to deal with is scheduled for DSTU 2.1. Until then, you have a few options for the mobile app:
- have it look at the CarePlan and complex DiagnosticOrder schedule and figure things out itself
- have a server generate a List of mini 1-time DiagnosticOrders and/or Orders identifying the specific "answer" times
- roll your own mechanism using the Other/Basic resource
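The second option, a server turning a schedule into simple one-time "do this now" entries, might be sketched like this. It uses plain dicts rather than FHIR resources, and expand_schedule, due_now, and the "pain-diary" identifier are hypothetical names chosen for the example.

```python
from datetime import date, timedelta


def expand_schedule(questionnaire_id, start, every_days, count):
    """Turn 'every N days, M times' into a flat list of one-time tasks."""
    return [
        {"questionnaire": questionnaire_id, "due": start + timedelta(days=i * every_days)}
        for i in range(count)
    ]


def due_now(tasks, today, answered):
    """Tasks whose due date has arrived and that have not been answered yet.

    'answered' is a set of (questionnaire, due date) pairs.
    """
    return [
        t for t in tasks
        if t["due"] <= today and (t["questionnaire"], t["due"]) not in answered
    ]


tasks = expand_schedule("pain-diary", date(2024, 1, 1), every_days=7, count=3)
open_tasks = due_now(tasks, date(2024, 1, 9), answered={("pain-diary", date(2024, 1, 1))})
```

The app then only needs to ask for the open tasks instead of interpreting the full schedule itself.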
Depending on your timelines, you might want to stay tuned to discussions by the Patient Care and Orders and Observations work groups as they start dealing with the issues around workflow management starting next month in Atlanta.
I want to plan a solution that manages enriched data in my architecture.
To be more clear, I have dozens of micro services.
let's say - Country, Building, Floor, Worker.
All running over a separate NoSql data store.
When I get the data from the worker service I want to present also the floor name (the worker is working on), the building name and country name.
Solution 1.
Client will query all microservices.
Problem - multiple requests and making the client be aware of the structure.
I know multiple requests shouldn't bother me but I believe that returning a json describing the entity in one single call is better.
Solution 2.
Create an orchestration that retrieves the data from multiple services.
Problem - if the data (entity names, for example) is not stored in the same document in the DB it is very hard to sort and filter by these fields.
Solution 3.
Before saving the entity, e.g. worker, call all the other services and fill the relative data (Building Name, Country name).
Problem - when the building name is changed, it doesn't reflect in the worker service.
Solution 4.
(This is the best one I can come up with).
Create a process that subscribes to a broker and receives all entities change.
For each entity change it updates all the relevant entities.
When an entity changes, let's say building name changes, it updates all the documents that hold the building name.
Problem:
Each service has to know what can be updated.
When a trailing update happens it shouldn't publish to the broker again (recursive update), so this can complicate the microservices.
Solution 5.
Keeping everything normalized. Filter and sort in ElasticSearch.
Problem: keeping normalized data in ES is too expensive performance-wise
One thing I saw Netflix do (which I like) is create intermediary services for stuff like this. So maybe add a new intermediary service that calls the other services to gather all the data and then creates the unified output with the Country, Building, Floor, and Worker.
You can even go one step further and try to come up with a scheme for providing as input which resources you want to include in the output.
So I guess this closely matches your solution 2. I notice that you mention for solution 2 that there are concerns with sorting/filtering in the DBs. I think that if you are using NoSQL then it has to be for a reason, and more often than not the reason is performance. If this is done wrong then yes, you will have problems, but if all the searchable fields are properly keyed and indexed (as @Roman Susi mentioned in his bullet points 1 and 2) then I don't see this being a problem. This service will only be as fast as the combination of your other services and data stores, so they have to be fast.
Now you keep your individual microservices as they are, keep the client calling one service, and encapsulate the complexity of merging the data into this new service.
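A rough sketch of such an intermediary service, including the "tell me which resources to include" scheme, is shown below. The four get_* functions stand in for HTTP calls to the individual microservices, and all field names and sample data (floorId, "HQ", "Norway", and so on) are invented for the example.

```python
# In-memory stand-ins for the four microservices' data stores.
COUNTRIES = {"c1": {"id": "c1", "name": "Norway"}}
BUILDINGS = {"b1": {"id": "b1", "name": "HQ", "countryId": "c1"}}
FLOORS = {"f1": {"id": "f1", "name": "12th floor", "buildingId": "b1"}}
WORKERS = {"w1": {"id": "w1", "name": "John", "floorId": "f1"}}


def get_worker(worker_id):
    return WORKERS[worker_id]


def get_floor(floor_id):
    return FLOORS[floor_id]


def get_building(building_id):
    return BUILDINGS[building_id]


def get_country(country_id):
    return COUNTRIES[country_id]


def get_enriched_worker(worker_id, include=("floor", "building", "country")):
    """Gather data from the other services and return one unified document.

    Only the services needed for the requested 'include' names are called.
    """
    wanted = set(include)
    worker = dict(get_worker(worker_id))
    if not wanted:
        return worker
    floor = get_floor(worker["floorId"])
    if "floor" in wanted:
        worker["floorName"] = floor["name"]
    if wanted & {"building", "country"}:
        building = get_building(floor["buildingId"])
        if "building" in wanted:
            worker["buildingName"] = building["name"]
        if "country" in wanted:
            worker["countryName"] = get_country(building["countryId"])["name"]
    return worker
```

A client would call something like get_enriched_worker("w1", include=("floor",)) and stay unaware of how the data is spread across services.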
This is the video that I saw this in (https://www.youtube.com/watch?v=StCrm572aEs)... it's a long video but very informative.
It is hard to advise at the Solution N level, but certain problems can be avoided by following this advice:
Use globally unique identifiers for entities. For example, by assigning key values some kind of URI.
The global ids also simplify updates, because you track what has actually changed, the name or the entity. (entity has one-to-one relation with global URI)
CAP theorem says you can choose only two from CAP. Do you want a CA architecture? Or CP? Or maybe AP? This will strongly affect the way you distribute data.
For "sort and filter" there is MapReduce approach, which can distribute the load of figuring out those things.
Think carefully about the balance of normalization / denormalization. If your services operate on URIs, you can have a service which turns URIs into labels (names, descriptions, etc.), so you do not need to keep the redundant information everywhere and update it. Do not optimize prematurely, but try to keep data normalized as long as possible. This way, the worker may not even need the building name, only its global id, and the microservice looks up the metadata from another microservice.
In other words, minimize the number of keys, shared between services, as part of separation of concerns.
Focus on the underlying model, not the JSON to and from. Right modelling of the data in your system(s) gains you more than saving JSON calls.
As for NoSQL, take a look at the Riak database: it has adjustable CAP properties, IIRC. Even if you do not use it as such, reading its documentation may help you come up with a suitable architecture for your distributed microservices system. (Of course, this applies if you have an essentially parallel system.)
First of all, thanks for your question. It is similar to the Main Problem Of Document DBs: how to sort a collection by a field from another collection? I have my own answer for that, so I'll try to comment on all your solutions:
Solution 1: It is good if the client wants to work with Countries/Buildings/Floors independently. But it does not solve the problem you mentioned in Solution 2: sorting 10k workers by building is going to be slow.
Solution 2: Similar to Solution 1, if all the client wants is a list of enriched workers without knowing how to combine it from multiple pieces.
Solution 3: As you said, unacceptable because of inconsistent data.
Solution 4: Gonna be working, most of the time. But:
Huge data duplication. If you have 20 entities, you are going to have x20 data.
Large complexity. 20 entities -> 20 different procedures to update related data
High coupling. All your services must know each other. A data model change will propagate to every service because of the update procedures.
Questionable eventual consistency. It can be done so data will be consistent after failures but it is not going to be easy
Solution 5: Kind of answer :-)
But - you do not want everything. Keep separated services that serve separated entities and build other services on top of them.
If client wants enriched data - build service that returns enriched data, as in Solution 2.
If client wants to display list of enriched data with filtering and sorting - build a service that provides enriched data with filtering and sorting capability! Likely, implementation of such service will contain ES instance that contains cached and indexed data from lower-level services. Point here is that ES does not have to contain everything or be shared between every service - it is up to you to decide better balance between performance and infrastructure resources.
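The "service with its own cached and indexed data" idea can be sketched with a small in-memory read model standing in for the ES instance. WorkerSearchIndex, its on_*_changed handlers, and the event shapes are assumptions made for illustration; in practice the handlers would be fed by whatever change notifications the lower-level services provide.

```python
class WorkerSearchIndex:
    """Read model that caches just enough data to filter and sort workers by building name."""

    def __init__(self):
        self._workers = {}         # worker id -> {"name", "buildingId"}
        self._building_names = {}  # building id -> name, stored once

    def on_worker_changed(self, worker):
        self._workers[worker["id"]] = {
            "name": worker["name"],
            "buildingId": worker["buildingId"],
        }

    def on_building_changed(self, building):
        # A rename is a single update here, not one update per worker.
        self._building_names[building["id"]] = building["name"]

    def search(self, building_name=None):
        """Return enriched rows, optionally filtered, sorted by building then worker name."""
        rows = [
            {
                "id": worker_id,
                "name": w["name"],
                "buildingName": self._building_names.get(w["buildingId"], ""),
            }
            for worker_id, w in self._workers.items()
        ]
        if building_name is not None:
            rows = [r for r in rows if r["buildingName"] == building_name]
        return sorted(rows, key=lambda r: (r["buildingName"], r["name"]))
```

The index holds only the fields needed for filtering and sorting, which is the balance between performance and infrastructure cost mentioned above.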
This is a case where Linked Data can help you.
Basically the floor attribute for the worker would be a URI (a link) to the floor itself. Any other linked data should be expressed as URIs as well.
Modeled with some JSON-LD it would look like this:
worker = {
  '@id': '/workers/87373',
  name: 'John',
  floor: {
    '@id': '/floors/123'
  }
}
floor = {
  '@id': '/floors/123',
  level: 12,
  building: { '@id': '/buildings/87' }
}
building = {
  '@id': '/buildings/87',
  name: "John's home",
  city: { '@id': '/cities/908' }
}
This way all the client has to do is prepend the BASE URL (like api.example.com) to the @id and make a simple GET call.
To remove the burden of the extra calls from the client (in case it's a slow mobile device), we use the gateway pattern with microservices. The gateway can expand those links with very little effort and augment the returned object. It can also make multiple calls in parallel.
So the gateway will make a GET /floors/123 call and replace the floor object on the worker with the reply.
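The link expansion done by the gateway could be sketched as follows, using the @id key that JSON-LD defines for node identifiers. This is an illustration only: fetch stands in for an HTTP GET against the owning service, RESOURCES is an in-memory stand-in for those services, and the depth parameter is an assumption about how far a gateway would follow links.

```python
from concurrent.futures import ThreadPoolExecutor

# In-memory stand-in for the services behind the gateway.
RESOURCES = {
    "/workers/87373": {"@id": "/workers/87373", "name": "John", "floor": {"@id": "/floors/123"}},
    "/floors/123": {"@id": "/floors/123", "level": 12, "building": {"@id": "/buildings/87"}},
    "/buildings/87": {"@id": "/buildings/87", "name": "John's home", "city": {"@id": "/cities/908"}},
}


def fetch(path):
    """Stand-in for GET <base URL> + path."""
    return RESOURCES[path]


def is_link(value):
    """A bare link is an object that carries nothing but an @id."""
    return isinstance(value, dict) and set(value) == {"@id"}


def expand(resource, depth=1):
    """Replace bare links with the linked resources, following up to 'depth' levels."""
    if depth == 0:
        return resource
    links = {key: value["@id"] for key, value in resource.items() if is_link(value)}
    # Fetch all linked resources of this level in parallel.
    with ThreadPoolExecutor() as pool:
        fetched = dict(zip(links, pool.map(fetch, links.values())))
    expanded = dict(resource)
    for key, target in fetched.items():
        expanded[key] = expand(target, depth - 1)
    return expanded


worker = expand(fetch("/workers/87373"))
worker_deep = expand(fetch("/workers/87373"), depth=2)
```

With depth=1 only the floor is inlined and its building stays a link; depth=2 also inlines the building, leaving the city as a link.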
I'm struggling to apply RESTful principles to a new web application I'm working on. In particular, it's the idea that to be RESTful, each HTTP request should carry enough information by itself for its recipient to process it to be in complete harmony with the stateless nature of HTTP.
The application allows users to search for medications. The search accepts filters as input, for example: return discontinued medicines, include complementary therapy, etc. In total there are around 30 filters that can be applied.
Additionally, patient details can be entered, including the patient's age, gender, current medications, etc.
To be Restful, should all this information be included with every request? This seems to place a huge overhead on the network. Also, wouldn't the restrictions on URL length, at least for GET, make this unfeasible?
The "Filter As Resource" approach is a perfect tack for this.
You can PUT the filter definition to the filter resource, and it can return the filter ID.
PUT is idempotent, so even if the filter is already there, you just need to detect that you've seen the filter before, so you can return the proper ID for the filter.
Then, you can add a filter parameter to your other requests, and they can grab the filter to use for the queries.
GET /medications?filter=1234&page=4&pagesize=20
I would run the raw filters through some sort of canonicalization process, just to have a normalized set, so that, e.g. filter "firstname=Bob lastname=Eubanks" is identical to "lastname=Eubanks firstname=Bob". That's just me though.
The only real concern is that, as time goes on, you may need to obsolete some filters. You can simply error out the request should someone make a request with a missing or obsolete filter.
Edit answering question...
Let's start with the fundamentals.
Simply, you want to specify a filter for use in queries, but these filters are (potentially) involved and complicated. If it was simple /medications/1234, this wouldn't be a problem.
Effectively, you always need to send the filter to the query. The question is how to represent that filter.
The fundamental issue with things like sessions in REST systems is that they're typically managed "out of band". When you, say, go and create a medication, you PUT or POST to the medications resource, and you get a reference back to that medication.
With a session, you would (typically) get back a cookie, or perhaps some other token to represent that session. If your PUT to the medications resource created a session also, then, in truth, your request created two resources: a medication, and a session.
Unfortunately, when you use something like a cookie, and you require that cookie for your request, the resource name is no longer the true representation of the resource. Now it's the resource name (the URL), and the cookie.
So, if I do a GET on the resource named /medications/search, and the cookie represents a session, and that session happens to have a filter in it, you can see how in effect, that resource name, /medications/search, isn't really useful at all. I don't have all of the information I need to make effective use, because of the side effect of the cookie and the session and the filter therein.
Now, you could perhaps rewrite the name: /medications/search?session=ABC123, effectively embedding the cookie in the resource name.
But now you run into the typical contract of sessions, notably that they're short-lived. So, that named resource is less useful, long term, not useless, just less useful. Right now, this query gives me interesting data. Tomorrow? Probably not. I'll get some nasty error about the session being gone.
The other problem is that sessions typically are not managed as a resource. For example, they're usually a side effect, vs explicitly managed via GET/PUT/DELETE. Sessions are also the "garbage heap" of web app state. In this case, we're just kind of hoping that the session is properly populated with what is needed for this request. We actually don't really know. Again, it's a side effect.
Now, let's turn it on its head a little bit. Let's use /medications/search?filter=ABC123.
Obviously, casually, this looks identical. We just changed the name from 'session' to 'filter'. But, as discussed, Filters, in this case, ARE a "first class resource". They need to be created, managed, etc. the same as a medication, a JPEG, or any other resource in your system. This is the key distinction.
Certainly, you could treat "sessions" as a first class resource, creating them, putting stuff in them directly, etc. But you can see how, at least from a clarity point of view, a "first class" session isn't really a good abstraction for this case. Using a session, it's like going to the cleaners and handing over your entire purse or briefcase. "Yea, the ticket is in there somewhere, dig out what you want, give me my clothes", especially compared to something explicit like a filter.
So, you can see how, at 30,000 feet, there's not a lot of difference in the case between a filter and a session. But when you zoom in, they're quite different.
With the filter resource, you can choose to make them a persistent thing forever and ever. You can expire them, you can do whatever you want. Sessions tend to have pre-conceived semantics: short-lived, duration of the connection, etc. Filters can have any semantics you want. They're completely separate from what comes with a session.
If I were doing this, how would I work with filters?
I would assume that I really don't care about the content of a filter. Specifically, I doubt I would ever query for "all filters that search by first name". At this juncture, it seems like uninteresting information, so I won't design around it.
Next, I would normalize the filters, like I mentioned above. Make sure that equivalent filters truly are equivalent. You can do this by sorting the expressions, ensuring fieldnames are all uppercase, or whatever.
Then, I would store the filter as an XML or JSON document, whichever is more comfortable/appropriate for the application. I would give each filter a unique key (naturally), but I would also store a hash for the actual document with the filter.
I would do this to be able to quickly find if the filter is already stored. Since I'm normalizing it, I "know" that the XML (say) for logically equivalent filters would be identical. So, when someone goes to PUT, or insert a new filter, I would do a check on the hash to see if it has been stored before. I may well get back more than one (hashes can collide, of course), so I'll need to check the actual XML payloads to see whether they match.
If the filters match, I return a reference to the existing filter. If not, I'd create a new one and return that.
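The normalize, hash, compare, then reuse-or-create steps described above might look like this. It is a single-process sketch with an in-memory store: FilterStore and its methods are hypothetical names, JSON is used for the stored document, and upper-casing field names is just one of the normalization options mentioned. It does not address concurrent creation.

```python
import hashlib
import json


class FilterStore:
    """Stores immutable, canonicalized filter documents and hands out ids."""

    def __init__(self):
        self._docs_by_id = {}   # filter id -> canonical JSON document
        self._ids_by_hash = {}  # document hash -> list of filter ids
        self._next_id = 1

    @staticmethod
    def canonicalize(filter_def):
        """Upper-case field names and sort keys so equivalent filters serialize identically."""
        normalized = {str(key).upper(): value for key, value in filter_def.items()}
        return json.dumps(normalized, sort_keys=True, separators=(",", ":"))

    def put(self, filter_def):
        """Return the id of an equivalent stored filter, or store a new one."""
        doc = self.canonicalize(filter_def)
        digest = hashlib.sha256(doc.encode("utf-8")).hexdigest()
        # Hashes can collide, so compare the actual documents as well.
        for filter_id in self._ids_by_hash.get(digest, []):
            if self._docs_by_id[filter_id] == doc:
                return filter_id
        filter_id = self._next_id
        self._next_id += 1
        self._docs_by_id[filter_id] = doc
        self._ids_by_hash.setdefault(digest, []).append(filter_id)
        return filter_id

    def get(self, filter_id):
        """Load a filter for use in a query; unknown or obsolete ids raise KeyError."""
        return json.loads(self._docs_by_id[filter_id])


store = FilterStore()
first = store.put({"firstname": "Bob", "lastname": "Eubanks"})
second = store.put({"lastname": "Eubanks", "firstname": "Bob"})
```

Both put calls return the same id, so a request such as GET /medications?filter=<id> resolves to the same stored filter regardless of the order in which the client listed its fields.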
I also would not allow a filter UPDATE/POST. Since I'm handing out references to these filters, I would make them immutable so the references can remain valid. If I wanted a filter by "role", say, the "get all expired medications filter", then I would create a "named filter" resource that associates a name with a filter instance, so that the actual filter data can change but the name remains the same.
Mind, also, that during creation, you're in a race condition (two requests trying to make the same filter), so you would have to account for that. If your system has a high filter volume, this could be a potential bottleneck.
Hope this clarifies the issue for you.
To be Restful, should all this information be included with every request?
No. If it looks like your server is sending (or receiving) too much information, chances are that there are one or more resources you haven't yet identified.
The first and most important step in designing a RESTful system is to identify and name your resources. How would you do that for your system?
From your description, here's one possible set of resources:
User - a user of the system (maybe a doctor or patient (?) - Role might need to be exposed as a resource here)
Medication - the stuff in the bottle, but it also might represent the kind of bottle (quantity and contents), or it might represent a particular bottle - depending on if you're a pharmacy or just a help desk.
Disease - the condition for which a Patient might want to take a Medication.
Patient - a person who might take a Medication
Recommendation - a Medication that might be beneficial to a Patient based on a Disease they suffer from.
Then you could look for relationships among resources;
User has and belongs to many Roles
Medication has and belongs to many Diseases
Disease has many Recommendations.
Patient has and belongs to many Medications and Diseases (poor chap)
Patient has many Recommendations
Recommendation has one Patient and has one Disease
The specifics are probably not right for your particular problem, but the idea is simple: create a network of relationships among your resources.
At this point it might be helpful to think about URI structure, although keep in mind that REST APIs must be hypertext-driven:
# view all Recommendations for the patient
GET http://server.com/patients/{patient}/recommendations
# view all Recommendations for a Medication
GET http://server.com/medications/{medication}/recommendations
# add a new Recommendation for a Patient
POST http://server.com/patients/{patient}/recommendations
Because this is REST, you'll spend most of your time defining the media types used to transfer representations of your resources between client and server.
By exposing more resources, you can cut down on the amount of data that needs to be transferred during each request. Also notice there are no query parameters in the URIs. The server can be as stateful as it needs to be to keep track of it all, and each request can be fully self-contained.
REST is for APIs, not (typical) applications. Don't try to wedge a fundamentally stateful interaction into a stateless model just because you read about it on Wikipedia.
To be Restful, should all this information be included with every request? This seems to place a huge overhead on the network. Also, wouldn't the restrictions on URL length, at least for GET, make this unfeasible?
The size of parameters is usually insignificant compared to the size of resources the server sends. If you're using such large parameters that they are a network burden, place them on the server once and then use them as resources.
There are no significant restrictions on URL length -- if your server has such a limit, upgrade it. It's probably years old and chock-full of security vulnerabilities anyway.
No, all of that does not have to be in every request.
Each resource (medication, patient history, etc) should have a canonical URI that uniquely identifies it. In some applications (eg, Rails-based ones) this will be something like "/patients/1234" or "/drugs/5678" but the URL format is unimportant.
A client that has previously obtained the URI for a resource (such as from a search, or from a link embedded in another resource) can retrieve it using this URI.
Are you working on a RESTful API that other apps will use to search your data? Or are you building an end-user-focused web application where users will log in and perform these searches?
If your users are logging in, then you're already stateful as you'll have some type of session cookie to maintain the logged in state. I would go ahead and create a session object that contains all the search filters. If a user hasn't set any filters, then this object will be empty.
Here's a great blog post about using GET vs POST. It mentions a URL length limit set by Internet Explorer of 2,048 characters, so you want to use POST for long requests.
http://carsonified.com/blog/dev/the-definitive-guide-to-get-vs-post/