Ingesting FHIR resources - Metadata and resource Identifier for Interoperability / data sharing workflows - hl7-fhir

Are there any best practices or industry standards when ingesting FHIR resources from another system? More specifically - when multiple healthcare entities using FHIR resources begin sharing data (and wanting to store each other's data within their own system), how should a resource's metadata and ID be handled?
Let's consider one organization has a Patient resource for John Doe. A Practitioner in that system orders a lab (ServiceRequest) and sends that resource to a lab. After the lab is performed, that organization receives the results via a DiagnosticReport and Observation from the 3rd party who performed the test. The metadata (version id, profile, etc.) and ID will refer to that 3rd party, which the ingesting system may not care about.
My gut feeling is the system ingesting those two resources for the lab result would:
1. Replace the metadata and identifier with information pertaining to its own system.
2. Transform any attributes based on the profile being used (if different from the 3rd party's).
3. Store the 3rd party's identifiers and/or metadata elsewhere (if they need to be kept for later).
For 3, this could be:
Resource.identifier contains both the ingesting system's AND 3rd party system's identifier
Resource.meta.source moved to an extension indicating the system it originated from.
References to the Patient in the ingested resource updated to the ingesting system's patient identifier.
Is this the "correct" way to handle persisting external FHIR resources? Or are there other solutions?
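To make the scenario concrete, here is a rough sketch of what such an incoming result might look like as received from the lab; every id, URL, and identifier system below is hypothetical:

```python
# Hypothetical DiagnosticReport as received from the performing lab.
# All ids, URLs, and identifier systems are made up for illustration.
incoming_report = {
    "resourceType": "DiagnosticReport",
    "id": "lab-report-789",                  # the lab's own resource id
    "meta": {
        "versionId": "3",                    # version history on the lab's server
        "lastUpdated": "2023-05-01T10:15:00Z",
        "source": "https://lab.example.org/fhir",
        "profile": ["https://lab.example.org/fhir/StructureDefinition/lab-report"],
    },
    "identifier": [
        {"system": "https://lab.example.org/accession", "value": "ACC-0042"}
    ],
    "status": "final",
    "code": {"coding": [{"system": "http://loinc.org", "code": "58410-2"}]},
    "subject": {"reference": "Patient/lab-patient-55"},   # the lab's Patient id, not ours
    "result": [{"reference": "Observation/lab-obs-123"}],
}
```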

Meta isn't monolithic. meta.versionId and meta.lastUpdated would absolutely be replaced. Profiles would typically be retained (unless data gets stripped that would violate the declared profiles, or at least call validity into question). Workflow and security tags might be retained or stripped.
In general, store whatever you can. Don't throw away data just because it isn't in your profile, but you can throw it away if you have no way to store it (so long as it's not a modifier element).
Commonly you'll add the 'id' from the source system as a '.identifier' in the new system to aid in mapping new data.
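A minimal sketch of that ingestion step in Python, following the advice above; the element handling and the identifier system URL are policy assumptions, not a mandated mapping:

```python
import copy
import uuid

def normalize_for_ingestion(incoming: dict, local_patient_id: str,
                            source_system: str = "https://lab.example.org/fhir") -> dict:
    """Prepare an externally authored resource for storage on our own server.

    Assumed policy (adapt to your own): keep declared profiles and any
    workflow/security tags, drop versionId/lastUpdated so our server
    re-assigns them, preserve the source resource id as an identifier,
    and rewrite the subject reference to our own Patient.
    """
    resource = copy.deepcopy(incoming)

    # Preserve the external id as a business identifier before replacing it.
    resource.setdefault("identifier", []).append({
        "system": source_system,             # hypothetical identifier system for the lab
        "value": incoming["id"],
    })

    # Assign our own logical id; our server manages versioning from here on.
    resource["id"] = str(uuid.uuid4())
    meta = resource.get("meta", {})
    meta.pop("versionId", None)
    meta.pop("lastUpdated", None)
    # meta.profile and meta.security/meta.tag are deliberately left in place.
    resource["meta"] = meta

    # Point the report at the Patient record we already hold locally.
    if "subject" in resource:
        resource["subject"] = {"reference": f"Patient/{local_patient_id}"}

    return resource
```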

Related

Should database primary keys be used to identify entities across microservices?

Given I have two microservices: Service A and Service B.
Service A owns full customer data and Service B requires a small subset of this data (which it gets from Service A through some bulk load say).
Both services store customers in their own database.
If Service B then needs to interact with Service A, say to get additional data (e.g. GET /customers/{id}), it clearly needs a unique identifier that is shared between the two services.
Because the ids are GUIDs, I could simply use the PK from Service A when creating the record in Service B, so both PKs match.
However, this sounds extremely fragile. One option is to store the 'external id' (or 'source id') as a separate field in Service B, and use that to interact with Service A. This would probably be a string, as one day it may not be a GUID.
Is there a 'best practice' around this?
Update
So I've done some more research and found a few related discussions:
Should you expose a primary key in REST API URLs?
Is it a bad practice to expose the database ID to the client in your REST API?
Slugs as Primary Keys
Conclusion
I think my idea of trying to keep both Primary Keys for Customer the same across Service A and B was just wrong. This is because:
Clearly PKs are service-implementation specific, so they may be totally incompatible, e.g. UUID vs auto-incremented INT.
Even if you can guarantee compatibility, although the two entities both happen to be called 'Customer', they are effectively two (potentially very different) concepts, and Service A and Service B each 'own' their own 'Customer' record. What you may want to do, though, is synchronise some Customer data across those services.
So I now think that either service can expose customer data via its own unique id (in my case the PK GUID), and if one service needs to obtain additional customer data from another service it must store the other service's identifier/key and use that. So essentially back to my 'external id' or 'source id' idea, but perhaps more specific, as 'service B id'.
I think it depends a bit on the data source and your design. But one thing I would avoid is sharing a primary key (a GUID or auto-increment integer) with an external service. Those are internal details of your service and not something other services should take a dependency on.
I would rather have an external id which is better understood by other services and perhaps the business as a whole. It could be a unique customer number, order number or policy number, as opposed to an id. You can also consider these a "business id". One thing to keep in mind is that an external id can also be exposed to an end user. Hence, it is a ubiquitous way of identifying that "entity" across the entire organization and its services, irrespective of whether you have an event-driven design or your services talk through APIs. I would only expose the DB ids to the infrastructure or repository layer. Beyond that, it is only a business/external id.
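A rough sketch of that separation in Service B, with made-up field names; the business id travels between services and to end users, while the local PK stays internal:

```python
from dataclasses import dataclass, field
from uuid import UUID, uuid4

@dataclass
class CustomerRecord:
    """Service B's local view of a customer (illustrative field names)."""
    # Internal primary key: never leaves Service B's database/repository layer.
    pk: UUID = field(default_factory=uuid4)
    # Business identifier shared across services and safe to show end users,
    # e.g. a customer number issued when the customer was first registered.
    customer_number: str = ""
    # The identifier Service A uses, stored as an opaque string so a future
    # change of Service A's id format does not break Service B.
    service_a_id: str = ""
    # The small subset of customer data Service B actually needs.
    display_name: str = ""

def additional_data_url(customer: CustomerRecord) -> str:
    # Call Service A using the shared identifier, never Service B's own pk.
    return f"https://service-a.example.internal/customers/{customer.service_a_id}"
```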
Well, if you are familiar with the Value Object concept, a business ID is much better for the design.
DDD focuses on the business, and a pure UUID/auto-increment ID can't express that.
Use an ID with business meaning, like a Customer ID, instead of a simple technical ID.
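A tiny sketch of that idea as a value object; the "CUST-" format is an invented convention purely for illustration:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CustomerId:
    """Business-meaningful customer identifier modelled as an immutable value object."""
    value: str

    def __post_init__(self):
        # Enforce the business format instead of accepting any random UUID,
        # e.g. "CUST-000123" (a made-up convention).
        if not (self.value.startswith("CUST-") and self.value[5:].isdigit()):
            raise ValueError(f"not a valid customer id: {self.value}")

# The id can travel between services and appear in URLs as-is.
order_owner = CustomerId("CUST-000123")
```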

Microservice cross-db referencial integrity

We have a database that manages codes, such as a list of valid currencies, a list of country codes, etc (hereinafter known as CodesDB).
We also have multiple microservices that in a monolithic app + database would have foreign key constraints to rows in tables in the CodesDB.
When a microservice receives a request to modify data, what are my options for ensuring the codes passed in the request are valid?
I am currently leaning towards having the CodesDB microservice post an event onto a service bus announcing when a code is added or modified - and then each microservice interested in that type of code (country / currency / etc) can issue an API request to the CodesDB microservice to grab the state it needs and reflect the changes in its own local DB. That way we get referential integrity within each microservice DB.
Is this the correct approach? Are there any other recommended approaches?
Asynchronous, event-based notification is a pattern commonly used in the microservices world for ensuring eventual consistency. Depending on how strict your consistency requirements are, you may have to add further checks.
Other possible approaches:
Read-only data stores using materialized views. This is a form of the CQRS pattern where data from multiple services is stored in de-normalized form in a read-only data store. The data gets updated asynchronously using the approach mentioned above. Consumers get fast access to the data without having to query multiple services.
Caching - you could also use a distributed or replicated cache, depending on your performance and consistency requirements.
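A small sketch of the event-driven replication idea; the event shape, table, and the stand-in call to CodesDB are all made up for illustration:

```python
import sqlite3

# Local copy of the codes this microservice cares about; it gives us normal
# referential checks without a synchronous call to CodesDB on every request.
conn = sqlite3.connect("service_local.db")
conn.execute(
    "CREATE TABLE IF NOT EXISTS currency_codes ("
    " code TEXT PRIMARY KEY, display TEXT, version INTEGER)"
)

def fetch_code_from_codesdb(code: str) -> dict:
    # Stand-in for a call to the CodesDB service's API, e.g. GET /codes/currency/{code}.
    return {"code": code, "display": code, "version": 1}

def on_code_changed(event: dict) -> None:
    """Handle a hypothetical 'CurrencyCodeChanged' event from the service bus.

    The event only announces the change; we fetch the authoritative state
    from CodesDB and upsert it into the local table.
    """
    state = fetch_code_from_codesdb(event["code"])
    conn.execute(
        "INSERT INTO currency_codes(code, display, version) VALUES (?, ?, ?) "
        "ON CONFLICT(code) DO UPDATE SET display = excluded.display, version = excluded.version",
        (state["code"], state["display"], state["version"]),
    )
    conn.commit()

def is_valid_currency(code: str) -> bool:
    # Incoming requests are validated against the local, eventually consistent copy.
    return conn.execute(
        "SELECT 1 FROM currency_codes WHERE code = ?", (code,)
    ).fetchone() is not None
```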

Data structure for activity feed

There's a concept of a workspace in our application. A user can be a member of virtually any number of workspaces and a workspace can have virtually any number of users. I want to implement an activity feed to help users find out what happened in every workspace they're members of, i.e. when someone uploads a file or creates a task in a workspace, this activity appears in that workspace's activity feed and also in each of its users' activity feeds.

The problem is that I can't come up with a suitable data structure for quick read and write operations on activities. What I have come up with is storing each activity with a property Targets, which is a string of all the workspace's user ids, and then filtering activities where that field contains the id of the user I want to fetch activities for. But this approach has serious performance and scalability limitations, because we use SharePoint as our storage.

We can also use Azure Table or Blob Storage, and I was thinking of just creating a separate activity entity for every user of a workspace so that I can then easily filter activities by user id. But this could result in hundreds of copies of the same activity if a workspace has hundreds of members, and then writing all those copies becomes problematic as Azure only supports 100 entities in a single batch operation (correct me if I'm wrong); SharePoint then is not an option at all.

So I need help figuring out what data structure I could use to store the activities of each workspace so that they're easily retrievable for any member (probably by their id) and also for any workspace by the workspace's id.
We can also use Azure Table or Blob Storage and I was thinking of just creating a separate activity entity for every user of a workspace so that then I can just easily filter activities by user's id
Azure Table storage could be a good choice for storing your activity entities. Since Table storage is relatively inexpensive, you can consider storing the same entity multiple times (with different partitioning strategies), in separate partitions or in separate tables, to make reads efficient.
Storing a user's activity entities with workspaceid_userid as a compound key is also a possible approach. For more detailed Table design patterns, please refer to this article.
Azure only supports 100 entities in a single batch operation (correct me if I'm wrong)
Yes, a single batch operation can include up to 100 entities.
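A sketch of the fan-out write with the azure-data-tables Python SDK; the partitioning choice, entity shape, and table name are assumptions, not the only way to model this:

```python
from azure.data.tables import TableClient

# Connection string and table name are placeholders.
table = TableClient.from_connection_string(
    "<storage-connection-string>", table_name="WorkspaceActivities"
)

def fan_out_activity(workspace_id: str, member_ids: list[str], activity: dict) -> None:
    """Write one copy of an activity per workspace member.

    Here PartitionKey is the workspace id and RowKey combines the member and
    activity ids, so all copies of one activity land in a single partition
    (which is what allows batching) and a member's feed for a workspace is a
    single range query.
    """
    entities = [
        {
            "PartitionKey": workspace_id,
            "RowKey": f"{member_id}_{activity['id']}",
            "ActivityType": activity["type"],
            "Actor": activity["actor"],
            "OccurredAt": activity["occurred_at"],
        }
        for member_id in member_ids
    ]
    # A single transaction is limited to 100 operations on one partition,
    # so large workspaces are written in chunks of 100.
    for start in range(0, len(entities), 100):
        chunk = entities[start:start + 100]
        table.submit_transaction([("upsert", entity) for entity in chunk])
```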

FHIR - How to delete Vital Data

This link talks about how to write vitals information to a FHIR database: Write Vital on Smart on FHIR
As we are demonstrating our data collection capabilities, we find that we accumulate a large amount of data, making our graphs look cluttered.
How do we delete a particular observation, a set of observations or all the observations for a given patient ID and LOINC code?
Deletions are only handled on a resource by resource basis. There's no ability to delete by "query". So you'd have to do a query, find all the ids, then send requests to delete each record. (Though you could do so using batch.) Instructions for delete are here: http://build.fhir.org/http.html#delete
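A sketch of that query-then-delete flow against a standard FHIR REST endpoint; the base URL is a placeholder, and a real implementation would also follow the search Bundle's paging links:

```python
import requests

BASE = "https://fhir.example.org/fhir"   # placeholder FHIR server

def delete_observations(patient_id: str, loinc_code: str) -> None:
    """Find matching Observations, then delete each one via a batch Bundle."""
    # 1. Search for the observations to remove (first page only in this sketch).
    search = requests.get(
        f"{BASE}/Observation",
        params={"patient": patient_id,
                "code": f"http://loinc.org|{loinc_code}",
                "_count": 100},
        headers={"Accept": "application/fhir+json"},
    )
    search.raise_for_status()
    ids = [entry["resource"]["id"] for entry in search.json().get("entry", [])]

    # 2. Issue the individual deletes in one round trip using a batch Bundle.
    batch = {
        "resourceType": "Bundle",
        "type": "batch",
        "entry": [{"request": {"method": "DELETE", "url": f"Observation/{oid}"}}
                  for oid in ids],
    }
    response = requests.post(
        BASE, json=batch, headers={"Content-Type": "application/fhir+json"}
    )
    response.raise_for_status()
```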

Text search for microservice architectures

I am investigating implementing text search on a microservice-based system. We will have to search for data that spans more than one microservice.
E.g. say we have two services for managing Organisations and managing Contacts. We should be able to search for organisations by contact details in one search operation.
Our preferred search solution is Elasticsearch. We already have a working solution based on embedded objects (and/or parent-child) where, when a parent domain object is updated, the indexing payload is enriched with the dependent object data, which is held in a cache (we avoid making calls directly to the service managing the child data for this purpose).
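For reference, the enriched-document approach might look roughly like this with the official Elasticsearch Python client; the index name, fields, and cache lookup are assumptions:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")   # placeholder cluster address

def index_organisation(org: dict, contact_cache: dict) -> None:
    """Index an organisation together with its cached contact details so a
    single query can match on either; denormalisation happens at write time."""
    doc = {
        "name": org["name"],
        "registration_number": org.get("registration_number"),
        # Embedded child data pulled from the local cache rather than calling
        # the Contacts service synchronously during indexing.
        "contacts": [
            {"name": c["name"], "email": c["email"], "phone": c.get("phone")}
            for c in contact_cache.get(org["id"], [])
        ],
    }
    es.index(index="organisations", id=org["id"], document=doc)

# One query can then hit organisation fields and contact fields alike, e.g.:
# es.search(index="organisations", query={"multi_match": {
#     "query": "jane.doe@example.com", "fields": ["name", "contacts.email"]}})
```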
I am wondering if there is a better solution. Is there a microservice pattern applicable to such scenarios?
It's not specifically a microservice pattern I would suggest, but it fits perfectly into microservices: it's called event sourcing.
Event sourcing describes an architectural pattern in which events are generated by different sources. An event then triggers zero or more so-called projections, which use the data contained in the event to aggregate information in the form it is needed.
This is directly applicable to your problem: whenever the organisation service changes its internal state (added / removed / updated an organisation), it can fire an event. If an organisation is added, a projection will for example aggregate the contacts belonging to this organisation and store this aggregate. The search for it is now trivial: look up the organisation's id in the aggregated information (this can be indexed) and get back the contacts associated with this organisation. Of course the same works if contacts are added to the contact service: it just fires a message with the contact creation information, and the corresponding projections then alter different aggregates that can again be indexed and searched quickly.
You can have multiple projections responding to a single event - which enables you to aggregate information in many different forms - exactly the way you'd like to query it later. Don't be afraid of duplicated data: event sourcing takes this trade-off intentionally, and since this is not the data your business services rely on and you do not need to alter it manually, this duplication will not hurt you.
If you store the events in the chronological order they happened (which I seriously advise you to do!) you can 'replay' these events over and over again. This helps, for example, if a projection was buggy and has to be fixed!
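As a rough illustration of such a projection (the event shape, collection names, and MongoDB layout are all assumptions):

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")   # placeholder connection
projections = client["search_projections"]

def project_contact_added(event: dict) -> None:
    """Projection reacting to a hypothetical 'ContactAdded' event.

    It folds the new contact into the owning organisation's aggregate document,
    so a later search only has to look in one indexed collection.
    """
    projections["organisation_contacts"].update_one(
        {"_id": event["organisation_id"]},
        {
            "$push": {"contacts": {
                "contact_id": event["contact_id"],
                "name": event["name"],
                "email": event["email"],
            }},
            "$setOnInsert": {"organisation_name": event.get("organisation_name")},
        },
        upsert=True,
    )

# Replaying the chronologically stored event log just means feeding every stored
# event through the projection again, e.g. after fixing a bug in it:
# for event in event_store.read_all():   # event_store is hypothetical
#     project_contact_added(event)
```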
If you're interested, I suggest you read up on event sourcing and look for some kind of event store:
event sourcing
event store
We use event sourcing to aggregate an array of different searches in our system, and we aggregate millions of records every day into mongodb. All projections have their own collection and create their own indexes, and until now we have never had to resort to different systems/patterns like Elasticsearch or the like!
Let me know if this helped!
Amendment
use the data contained in the event to aggregate information in the form it is needed
An event should contain all the information necessary to aggregate more information. For example, if you have an organisation creation event, you need to at least provide some information on what the organisation's name is, an ID of some kind, creation date, parent organisation's ID, etc. As a rule of thumb, we send all the information we gather in the service that receives the request (don't take it directly from the request ;-) check it first, then write it to the event and send it off), because we do not know what we're going to need in the future. Just stay cautious - payloads should not get too large!
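A hedged example of what such an event payload might contain; the field names are made up:

```python
from datetime import datetime, timezone
from uuid import uuid4

# Hypothetical OrganisationCreated event: carry everything the service has
# validated, not the raw request, so future projections have enough to work with.
organisation_created = {
    "event_id": str(uuid4()),
    "event_type": "OrganisationCreated",
    "occurred_at": datetime.now(timezone.utc).isoformat(),
    "organisation_id": "org-42",
    "name": "Acme Ltd",
    "parent_organisation_id": "org-7",   # None for top-level organisations
    "created_by": "user-123",
}
```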
We can now have multiple projections responding to this event: one that adds the organisation to its parent's aggregate (to get an easy lookup of all children of a given organisation), one that just adds it to the search set of all organisations, and maybe a third that aggregates all the parents of a given child organisation so the lookup of parent organisations is easy and fast.
We have the same service that processes client requests also process these events. The motivation behind this is that the schema of the data your projections create is tightly coupled to the way it is read by the service the client interacts with. It does not have to be that way and it could be separated into two services - but that creates an almost invisible dependency, and releasing those two services independently becomes even more challenging. If you do not mind that additional level of complexity, you can separate the two.
We're currently also considering writing a generic service for aggregating information from events for things like searches, where projections could be scripted. That only makes the invisible-dependency problem less conspicuous; it does not solve it.

Resources