How to get grouped data from a microservice? - microservices

Let's say we have a system to store appointments. Each appointment has multiple resources (e.g. trainers, rooms, etc.). We have decided to move all appointment data into an Appointment Service and all resources into a Resources Service.
Now we need a UI that shows filters for the appointments, to filter by trainer. Usually, you only want to display checkboxes for trainers that actually have appointments and not all trainers.
That means we can't really use the Resource Service to get all trainers, instead, we would have to ask the Appointment Service to get a grouped view of all trainers that have at least one appointment. Then we would have to call the Resource Service to get more info about each trainer.
So how do you get grouped data from a microservice?
Edit: Each system has it's own database. We also use RabbitMQ to sync data between services.

This is an interesting question with many possible solutions. #Welbog comment makes a good point about it depending on the scale of the application. Denormalized databases are obviously a possibility.
Getting grouped data is one of the challenges of implementing microservices, and this challenge becomes greater the more granular our services get. What does your database setup look like? I'm assuming your two services are using different databases otherwise your question would have a simple solution.
Without knowing the ins and outs of your system, I would assume that denormalizing your db's would be the path of least resistance.
You could possible explore the idea that maybe these two services should in fact be a single service. Nanoservices are not what we are after, and sometimes it just makes more logical sense for two services to actually be together. Things that must change together, should be contained together. I'm not saying this is applicable in your case, I'm just saying it's worth considering.
I'm certain others will have other ideas, but based on what little I know about the entirety of your system, it's hard to say; however I think this is an interesting question that I will follow to see what other peoples proposed solutions are.

Related

How to deal with "Foreign Key" in microservice architecture?

Quick question on Foreign key in Microservices. I already tried looking for answer. But, they did not give me the exact answer I was looking for.
Usecase : Every blog post will have many comments. Traditional monolith will have comments table with foreign key to blog post. However in microservice, we will have two services.
Service 1 : Post Microservie with these table fields (PostID, Name, Content)
Service 2 : Comments Microservie with these table fields (CommentID, PostID, Cpmment)
The question is, Do we need "PostID" in service 2 (Comments Microservice) ? I guess the answer is yes, as we need to know which comment belongs to which post. But then, it will create tight coupling? I mean if I delete service 1(Blog post service), it will impact service 2(Comments service) ?
I'm going to use another example I'm more familiar with to explain how I believe most people would do this.
Consider an Order Management System (OMS) and an Inventory Management System (IMS).
When a customer places an order in the company web site, we ask the OMS to create an order entry in the backend (e.g. via an HTTP endpoint).
The OMS system then broadcasts an event e.g. OrderPlaced containing all the details of the customer order. We may have a pub/sub (e.g. Redis), or a queue (e.g. RabbitMQ), or an event stream (e.g. Kafka) where we place the event (although this can be done in many other ways).
The thing is that we have one or more subscribers interested in this event. One of those could be the IMS, which has the responsibility of assigning the best inventory available every time an order is placed.
We can expect that the IMS will keep a copy of the relevant order information it received when it processed the OrderPlaced event such that it does not ask every little detail of the order to the OMS all the time. So, if the IMS needed a join with the order, instead of calling an endpoint in the Order API, it would probably just do a join with its local copy of the orders table.
Say now that our customer called to cancel her order. A customer service representative then cancelled it in the OMS Web User Interface. At that point an event OrderCanceled is broadcast. Guess who is listening for that event? Correct, the IMS receives notification and acts accordingly reversing the inventory assignation and probably even deleting the order record because it is no longer necessary on this domain.
So, as you can see, the best way to do this is by using events and making copies of the relevant details on the other domain.
Since events need time to get broadcast and processed by interested parties, we say that the order data in the IMS is eventually consistent.
Followup Questions
Q: So, if I understood right in microservises we prefer to duplicate data and get better performance? That is the concept? I mean I know the concept is scaling and flexibility but when we must share data we will just duplicate it?
Not really. That´s definitively not what I meant although it may have sounded like that due to my poor choice of words in the original explanation. It appears to me that at the heart of your question lies a lack of sufficient understanding of the concept of a bounded context.
In my explanation I meant to indicate that the OMS has a domain concept known as the order, but so does the IMS. Therefore, they both have an entity within their domain that represents it. There is a good chance that the order entity in the OMS is much richer than the corresponding representation of the same concept in the IMS.
For example, if the system I was describing was not for retail, but for wholesale, then the same concept of a "sales order" in our system corresponds to the concept of a "purchase order" in that of our customers. So you see, the same data, mapped under a different name, simply because under a different bounded context the data may have a different perspective and meaning.
So, this is the realization that a given concept from our model may be represented in multiple bounded contexts, perhaps from a different perspective and names from our ubiquitous language.
Just to give another example, the OMS needs to know about the customer, but the representation of the idea of a customer in the OMS is probably different than the same representation of such a concept or entity in the CRM. In the OMS the customer's name, email, shipping and billing addresses are probably enough representation of this idea, but for the CRM the customer encompasses much more.
Another example: the IMS needs to know the shipping address of the customer to choose the best inventory (e.g. the one in a facility closest to its final destination), but probably does not care much about the billing address. On the other hand, the billing address is fundamental for the Payment Management System (PMS). So, both the IMS and PMS may have a concept of an "order", it is just that it is not exactly the same, neither it has the same meaning or perspective, even if we store the same data.
One final example: the accounting system cares about the inventory for accounting purposes, to be able to tell how much we own, but perhaps accounting does not care about the specific location of the inventory within the warehouse, that's a detail only the IMS cares about.
In conclusion, I would not say this is about "copying data", this is about appropriately representing a fundamental concept within your bounded context and the realization that some concepts from the model may overlap between systems and have different representations, sometimes even under different names and levels of details. That's why I suggested that you investigate the idea of context mapping some more.
In other words, from my perspective, it would be a mistake to assume that the concept of an "order" only exists in the OMS. I could probably say that the OMS is the master of record of orders and that if something happens to an order we should let other interested systems know about those events since they care about some of that data because those other systems could have mapping concepts related to orders and when reacting to the changes in the master of record, they probably want to change their data as well.
From this point of view, copying some data is a side effect of having a proper design for the bounded context and not a goal in itself.
I hope that answers your question.

Mechanisms for response aggregation in event sourcing based microservices

When it comes to implementing event sourcing based microservices, one of the main concerns that we've come across is aggregating data for responses. For an example we may have two entities like school and student. One microservice may be responsible for handling school related business logic while another may handle students.
Now if someone makes a query through a REST endpoint and ask for a particular student and they might expect both school and student details, then the only known ways for me are the following.
Use something like service chaining. An example would be an Api-Gateway aggregating a response after making couple of requests to couple of microservices.
Having everything replicated throughout all services. Essentially, data would be duplicated.
Having services calling each other for those extra bit of information. This solution works but hard to scale and goes against basic idea of using event sourcing.
My question is that what other ways are there to do this ?
A better approach can be to create a separate reporting/search service, that aggregates the data from both services. For example implemented using ElasticSearch or SOLR.This now allows the users to do search and queries across multiple services and aggregates.
Sure, it will be eventually consistent, but I doubt that is s a problem. This gives a better separation of concerns and you get a nice search experience for your users at the same time.

Microservices: model sharing between bounded contexts

I am currently building a microservices-based application developed with the mean stack and am running into several situations where I need to share models between bounded contexts.
As an example, I have a User service that handles the registration process as well as login(generate jwt), logout, etc. I also have an File service which handles the uploading of profile pics and other images the user happens to upload. Additionally, I have an Friends service that keeps track of the associations between members.
Currently, I am adding the guid of the user from the user table used by the User service as well as the first, middle and last name fields to the File table and the Friend table. This way I can query for these fields whenever I need them in the other services(Friend and File) without needing to make any rest calls to get the information every time it is queried.
Here is the caveat:
The downside seems to be that I have to, I chose seneca with rabbitmq, notify the File and Friend tables whenever a user updates their information from the User table.
1) Should I be worried about the services getting too chatty?
2) Could this lead to any performance issues, if alot of updates take place over an hour, let's say?
3) in trying to isolate boundaries, I just am not seeing another way of pulling this off. What is the recommended approach to solving this issue and am I on the right track?
It's a trade off. I would personally not store the user details alongside the user identifier in the dependent services. But neither would I query the users service to get this information. What you probably need is some kind of read-model for the system as a whole, which can store this data in a way which is optimized for your particular needs (reporting, displaying together on a webpage etc).
The read-model is a pattern which is popular in the event-driven architecture space. There is a really good article that talks about these kinds of questions (in two parts):
https://www.infoq.com/articles/microservices-aggregates-events-cqrs-part-1-richardson
https://www.infoq.com/articles/microservices-aggregates-events-cqrs-part-2-richardson
Many common questions about microservices seem to be largely around the decomposition of a domain model, and how to overcome situations where requirements such as querying resist that decomposition. This article spells the options out clearly. Definitely worth the time to read.
In your specific case, it would mean that the File and Friends services would only need to store the primary key for the user. However, all services should publish state changes which can then be aggregated into a read-model.
If you are worry about a high volume of messages and high TPS for example 100,000 TPS for producing and consuming events I suggest that Instead of using RabbitMQ use apache Kafka or NATS (Go version because NATS has Rubby version also) in order to support a high volume of messages per second.
Also Regarding Database design you should design each micro-service base business capabilities and bounded-context according to domain driven design (DDD). so because unlike SOA it is suggested that each micro-service should has its own database then you should not be worried about normalization because you may have to repeat many structures, fields, tables and features for each microservice in order to keep them Decoupled from each other and letting them work independently to raise Availability and having scalability.
Also you can use Event sourcing + CQRS technique or Transaction Log Tailing to circumvent 2PC (2 Phase Commitment) - which is not recommended when implementing microservices - in order to exchange events between your microservices and manipulating states to have Eventual Consistency according to CAP theorem.

Access and scheduling of FHIR Questionnaire resource

I am trying to understand how to use the FHIR Questionnaire resource, and have a specific question regarding this.
My project is specifically regarding how a citizen in our country could be responding to Questionnaires via a web app, which are then submitted to the FHIR server as QuestionnaireAnswers, to be read/analyzed by a health professional.
A FHIR-based system could have lots of Questionnaires (Qs), groups of Qs or even specific Qs would be targeted towards certain users or groups of users. The display of the questionnare to the citizen could also be based on a Care-plan of a sort, for example certain Questionnaires needing filling-in in the weeks after surgery. The Questionnaires could also be regular ones that need to be filled in every day or week permanently, to support data collection on the state of a chronic disease.
What I'm wondering is if FHIR has a resource which fits into organizing the 'logistics' of displaying the right form to the right person. I can see CarePlan, which seems to partly fit. Or is this something that would typically be handled out-of-FHIR-scope by specific server implementations?
So, to summarize:
Which resource or mechanism would a health professional use to set up that a patient should answer certain Questionnaires, either regularly or as part of for example a follow-up after a surgery. So this would include setting up the schedule for the form(s) to be filled in, and possibly configure what would happen if the form wasn't filled in as required.
Which resource (possibly the same) or mechanism would be used for the patient's web app to retrieve the relevant Questionnaire(s) at a given point in time?
At the moment, the best resource for saying "please capture data of type X on schedule Y" would be DiagnosticOrder, though the description probably doesn't make that clear. (If you'd be willing to click the "Propose a change" link and submit a change request for us to clarify, that'd be great.) If you wanted to order multiple questionnaires, then CarePlan would be a way to group that.
The process of taking a complex schedule (or set of schedules) and turning that into a simple list of "do this now" requests that might be more suitable for a mobile application to deal with is scheduled for DSTU 2.1. Until then, you have a few options for the mobile app:
- have it look at the CarePlan and complex DiagnosticOrder schedule and figure things out itself
- have a server generate a List of mini 1-time DiagnosticOrders and/or Orders identifying the specific "answer" times
- roll your own mechanism using the Other/Basic resource
Depending on your timelines, you might want to stay tuned to discussions by the Patient Care and Orders and Observations work groups as they start dealing with the issues around workflow management starting next month in Atlanta.

Micro Services and noSQL - Best practice to enrich data in micro service architecture

I want to plan a solution that manages enriched data in my architecture.
To be more clear, I have dozens of micro services.
let's say - Country, Building, Floor, Worker.
All running over a separate NoSql data store.
When I get the data from the worker service I want to present also the floor name (the worker is working on), the building name and country name.
Solution1.
Client will query all microservices.
Problem - multiple requests and making the client be aware of the structure.
I know multiple requests shouldn't bother me but I believe that returning a json describing the entity in one single call is better.
Solution 2.
Create an orchestration that retrieves the data from multiple services.
Problem - if the data (entity names, for example) is not stored in the same document in the DB it is very hard to sort and filter by these fields.
Solution 3.
Before saving the entity, e.g. worker, call all the other services and fill the relative data (Building Name, Country name).
Problem - when the building name is changed, it doesn't reflect in the worker service.
solution 4.
(This is the best one I can come up with).
Create a process that subscribes to a broker and receives all entities change.
For each entity it updates all the relavent entities.
When an entity changes, let's say building name changes, it updates all the documents that hold the building name.
Problem:
Each service has to know what can be updated.
When a trailing update happens it shouldnt update the broker again (recursive update), so this can complicate to the microservices.
solution 5.
Keeping everything normalized. Fileter and sort in ElasticSearch.
Problem: keeping normalized data in ES is too expensive performance-wise
One thing I saw Netflix do (which i like) is create intermediary services for stuff like this. So maybe a new intermediary service that can call the other services to gather all the data then create the unified output with the Country, Building, Floor, Worker.
You can even go one step further and try to come up with a scheme for providing as input which resources you want to include in the output.
So I guess this closely matches your solution 2. I notice that you mention for solution 2 that there are concerns with sorting/filtering in the DB's. I think that if you are using NoSQL then it has to be for a reason, and more often then not the reason is for performance. I think if this was done wrong then yeah you will have problems but if all the appropriate fields that are searchable are properly keyed and indexed (as #Roman Susi mentioned in his bullet points 1 and 2) then I don't see this as being a problem. Yeah this service will only be as fast as the culmination of your other services and data stores, so they have to be fast.
Now you keep your individual microservices as they are, keep the client calling one service, and encapsulate the complexity of merging the data into this new service.
This is the video that I saw this in (https://www.youtube.com/watch?v=StCrm572aEs)... its a long video but very informative.
It is hard to advice on the Solution N level, but certain problems can be avoided by the following advices:
Use globally unique identifiers for entities. For example, by assigning key values some kind of URI.
The global ids also simplify updates, because you track what has actually changed, the name or the entity. (entity has one-to-one relation with global URI)
CAP theorem says you can choose only two from CAP. Do you want a CA architecture? Or CP? Or maybe AP? This will strongly affect the way you distribute data.
For "sort and filter" there is MapReduce approach, which can distribute the load of figuring out those things.
Think carefully about the balance of normalization / denormalization. If your services operate on URIs, you can have a service which turns URIs to labels (names, descriptions, etc), but you do not need to keep the redundant information everywhere and update it. Do not do preliminary optimization, but try to keep data normalized as long as possible. This way, worker may not even need the building name but it's global id. And the microservice looks up the metadata from another microservice.
In other words, minimize the number of keys, shared between services, as part of separation of concerns.
Focus on the underlying model, not the JSON to and from. Right modelling of the data in your system(s) gains you more than saving JSON calls.
As for NoSQL, take a look at Riak database: it has adjustable CAP properties, IIRC. Even if you do not use it as such, reading it's documentation may help to come up with suitable architecture for your distributed microservices system. (Of course, this applies if you have essentially parallel system)
First of all, thanks for your question. It is similar to Main Problem Of Document DBs: how to sort collection by field from another collection? I have my own answer for that so i'll try to comment all your solutions:
Solution 1: It is good if client wants to work with Countries/Building/Floors independently. But, it does not solve problem you mentioned in Solution 2 - sorting 10k workers by building gonna be slow
Solution 2: Similar to Solution 1 if all client wants is a list enriched workers without knowing how to combine it from multiple pieces
Solution 3: As you said, unacceptable because of inconsistent data.
Solution 4: Gonna be working, most of the time. But:
Huge data duplication. If you have 20 entities, you are going to have x20 data.
Large complexity. 20 entities -> 20 different procedures to update related data
High cohesion. All your services must know each other. Data model change will propagate to every service because of update procedures
Questionable eventual consistency. It can be done so data will be consistent after failures but it is not going to be easy
Solution 5: Kind of answer :-)
But - you do not want everything. Keep separated services that serve separated entities and build other services on top of them.
If client wants enriched data - build service that returns enriched data, as in Solution 2.
If client wants to display list of enriched data with filtering and sorting - build a service that provides enriched data with filtering and sorting capability! Likely, implementation of such service will contain ES instance that contains cached and indexed data from lower-level services. Point here is that ES does not have to contain everything or be shared between every service - it is up to you to decide better balance between performance and infrastructure resources.
This is a case where Linked Data can help you.
Basically the Floor attribute for the worker would be an URI (a link) to the floor itself. And Any other linked data should be expressed as URIs as well.
Modeled with some JSON-LD it would look like this:
worker = {
'#id': '/workers/87373',
name: 'John',
floor: {
'#id': '/floors/123'
}
}
floor = {
'#id': '/floor/123',
'level': 12,
building: { '#id': '/buildings/87' }
}
building = {
'#id': '/buildings/87',
name: 'John's home',
city: { '#id': '/cities/908' }
}
This way all the client has to do is append the BASE URL (like api.example.com) to the #id and make a simple GET call.
To remove the extra calls burden from the client (in case it's a slow mobile device), we use the gateway pattern with micro-services. The gateway can expand those links with very little effort and augment the return object. It can also do multiple calls in parallel.
So the gateway will make a GET /floor/123 call and replace the floor object on the worker with the reply.

Resources