I have a central system that publishes new records to a message bus topic.
Multiple agents subscribe to these messages and create new records in their respective systems using REST APIs.
These downstream systems cannot accommodate my central system's record Ids.
So I need to link records across all systems using a central record linkage repository, e.g.:

| Central System Id | System A Id | System B Id |
|-------------------|-------------|-------------|
| 1                 | 3231        | 767         |
| 2                 | 3232        | 768         |
When each agent creates a new record, it has an opportunity to grab the new downstream system Id from the HTTP response message and use it to populate the above repository.
But the agents have only one chance to take note of this Id and either update the central record linkage repository directly or place the Id on a message bus.
If there is a system failure before the agent can persist the Id, there is no way to get the Id back from the downstream system without a human performing record matching.
For these lost records, an agent cannot consult the central record linkage repository to determine whether the record already exists, and therefore creates duplicate records in the downstream system.
How can I implement a reliable record linkage strategy?
Alternatively, I could look towards implementing idempotent consumers, but the attributes used for matching existing records could change between the source and target systems.
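To make the failure window concrete, here is a minimal sketch of the agent flow described above; the endpoints and payload shape are hypothetical:

```python
import requests  # any HTTP client works; assumed available

LINKAGE_API = "https://linkage.example/mappings"     # hypothetical endpoint
DOWNSTREAM_API = "https://system-a.example/records"  # hypothetical endpoint

def handle_new_record_event(event):
    central_id = event["central_id"]

    # Consult the linkage repository first so redelivered messages are skipped.
    existing = requests.get(f"{LINKAGE_API}/{central_id}").json()
    if existing.get("system_a_id"):
        return  # already created downstream; nothing to do

    # Create the record downstream; the response carries the only copy
    # of the downstream system's Id.
    resp = requests.post(DOWNSTREAM_API, json=event["payload"])
    downstream_id = resp.json()["id"]

    # FRAGILE WINDOW: if the process dies before this call succeeds,
    # the downstream Id is lost and a redelivery creates a duplicate.
    requests.put(f"{LINKAGE_API}/{central_id}",
                 json={"system_a_id": downstream_id})
```

Everything between the POST and the PUT is the one-shot window: the downstream Id exists only in the agent's memory until the linkage write succeeds.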
Suppose I have a smart contract that uses Chainlink's "Call Any External API" capability to get some data from an external URL. My understanding is that each Ethereum full node runs each smart contract to verify the status; it does so to verify the latest block. But what if, between the time one full node runs the contract and another one does, the data returned by calling the external API changes? Then it would seem that different full nodes would get different results for that smart contract, resulting in inconsistent states. Why does that not happen?
Because oracle responses are stored on-chain.
You are right: if fetching external data were part of the validation process, nodes would not be able to reach a consensus. So instead, a Chainlink oracle network places the data on-chain in a transaction, and then it goes through the same validation process as every other transaction.
You can read more about how Chainlink's Basic Request Model works, but this is basically the reason why fetching external data directly inside the blockchain's validation process is impossible; an oracle has to deliver the data as a transaction instead.
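As a toy illustration of that two-transaction flow (this is not real Chainlink or Solidity code, just a sketch of the idea):

```python
# Toy model: the oracle fetches the URL exactly once, off-chain, and commits
# the answer as an ordinary transaction; validators only replay stored
# transactions, so they all see the identical value.

chain = []  # the "blockchain": an append-only list of transactions

def fake_fetch(url):
    return 42  # stand-in for volatile external data

def contract_request_data(url):
    chain.append({"type": "oracle_request", "url": url})

def oracle_node_fulfill():
    request = chain[-1]
    data = fake_fetch(request["url"])                         # fetched once, off-chain
    chain.append({"type": "oracle_response", "value": data})  # placed on-chain

def validator_replay():
    # Validation never touches the URL; it only reads what is on-chain.
    return [tx["value"] for tx in chain if tx["type"] == "oracle_response"]

contract_request_data("https://api.example/price")
oracle_node_fulfill()
print(validator_replay())  # every validator computes the same [42]
```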
A project I'm working on requires users to insert themselves into a list on a server. We expect a few hundred users over a weekend and while very unlikely, a collision could happen in which two users submit the list concurrently and one of them is lost. The server has no validation, it simply allows you to get and put data.
I was pointed in the direction of "optimistic locking" but I'm having trouble grasping when exactly the data should be validated and how it prevents this from happening. If one of the clients reads the data, adds itself and then checks again to ensure that the data is the same with the use of an index or timestamp, how does this prevent the other client from doing the same and then one overwriting the other?
I'm trying to understand the flow in the context of two clients getting data and putting data.
The point of optimistic locking is that the decision to accept or reject a write is taken on the server, and is protected against concurrency by a pessimistic transaction or some sort of hardware protection, such as compare-and-swap. So a client requests a write together with some sort of timestamp or version identifier, and the server only accepts the write if the timestamp is still accurate. If it isn't, the client gets some sort of rejection code and will have to try again. If it is, the client is told that its write succeeded.
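A minimal sketch of that server-side check (names are illustrative; the lock stands in for whatever transaction or compare-and-swap the server uses):

```python
import threading

# Server-side state: the list plus a version counter that bumps on every write.
_lock = threading.Lock()
_state = {"version": 0, "users": []}

def read():
    with _lock:
        return _state["version"], list(_state["users"])

def write(expected_version, new_users):
    """Accept the write only if no one else wrote in the meantime."""
    with _lock:
        if _state["version"] != expected_version:
            return False  # stale: client must re-read, re-apply, retry
        _state["users"] = new_users
        _state["version"] += 1
        return True

# Client flow: read, modify locally, attempt conditional write, retry on conflict.
while True:
    version, users = read()
    users.append("alice")
    if write(version, users):
        break
```

The losing client's write is rejected rather than silently overwritten: it re-reads the list, which now contains the winner's entry, adds itself on top of that, and retries.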
This is not the only way to handle receiving data from multiple clients. One popular alternative is to use a reliable messaging system - for example the Java Messaging Service specifies an interface for such systems for which you can find open source implementations. Clients write into the messaging system and can go away as soon as their message is accepted. The server reads requests from the messaging system and acts on them. If the server or the network goes down it's no big deal: the messages will still be there to be read when they come back (typically they are written to disk and have the same level of protection as database data although if you look at a reliable message queue implementation you may find that it is not, in fact, built on top of a standard database table).
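A sketch of that alternative shape, using Python's in-memory queue as a stand-in for a durable broker (a real JMS/RabbitMQ queue would persist messages to disk):

```python
import queue
import threading

requests_q = queue.Queue()  # stand-in for a durable message queue

def client_submit(name):
    requests_q.put(name)  # client is done as soon as the broker accepts it

def server_loop():
    users = []
    while True:
        name = requests_q.get()  # single consumer: no write-write races
        users.append(name)
        requests_q.task_done()

threading.Thread(target=server_loop, daemon=True).start()
client_submit("alice")
client_submit("bob")
requests_q.join()  # both submissions applied, in arrival order
```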
One well-documented example of optimistic locking is the HTTP ETag mechanism, e.g. https://en.wikipedia.org/wiki/HTTP_ETag
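In HTTP terms the same pattern can look like this; a sketch using the `requests` library against a hypothetical resource that supports conditional writes:

```python
import requests

url = "https://server.example/list"  # hypothetical resource

r = requests.get(url)
etag = r.headers["ETag"]             # version identifier of what we read
data = r.json()
data["users"].append("alice")

# Conditional write: succeeds only if the resource still matches our ETag.
r = requests.put(url, json=data, headers={"If-Match": etag})
if r.status_code == 412:             # 412 Precondition Failed
    pass  # someone else wrote first: re-GET and retry
```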
I've often heard that in a microservices architecture, every single microservice should have its own individual database.
But then I can't maintain foreign key constraints across the different databases. For example, I have a user table in the authentication microservice and I want to use it (the userid column from the user table) in my catalog service.
How can this be resolved?
Thanks in advance.
You can maintain a shadow copy (with only the useful information, e.g. just the userid column) of the user table in the catalog service via event sourcing (e.g. you can use RabbitMQ or Apache Kafka for async messaging).
The catalog service will use the user information in read-only mode. This solution is only effective when user information doesn't change frequently; otherwise async communication can be inefficient and costly.
In that case you can implement API calls from the catalog service to the user service for any validations to be done on user data.
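A minimal sketch of such a consumer, assuming a Kafka topic named `user-events`, the kafka-python client, and a simple event shape (all illustrative):

```python
import json
import sqlite3
from kafka import KafkaConsumer  # kafka-python; any broker client works

# Local read-only shadow of the user table, owned by the catalog service.
db = sqlite3.connect("catalog.db")
db.execute("CREATE TABLE IF NOT EXISTS shadow_users (userid INTEGER PRIMARY KEY)")

consumer = KafkaConsumer(
    "user-events",                    # hypothetical topic name
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda b: json.loads(b.decode()),
)

for event in consumer:
    user = event.value
    if user["type"] == "UserCreated":
        db.execute("INSERT OR IGNORE INTO shadow_users VALUES (?)", (user["userid"],))
    elif user["type"] == "UserDeleted":
        db.execute("DELETE FROM shadow_users WHERE userid = ?", (user["userid"],))
    db.commit()
```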
Use the Saga Pattern to maintain data consistency across services.
A saga is a sequence of local transactions. Each local transaction updates the database and publishes a message or event to trigger the next local transaction in the saga. If a local transaction fails because it violates a business rule, then the saga executes a series of compensating transactions that undo the changes that were made by the preceding local transactions.
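A minimal, in-process sketch of the compensation idea (a real saga coordinates the steps via messages or events; this only illustrates the unwinding):

```python
def run_saga(steps):
    """Each step is (action, compensation). Undo completed steps on failure."""
    done = []
    try:
        for action, compensate in steps:
            action()
            done.append(compensate)
    except Exception:
        for compensate in reversed(done):  # unwind in reverse order
            compensate()
        raise

# Hypothetical local transactions for an order flow.
run_saga([
    (lambda: print("reserve stock"),  lambda: print("release stock")),
    (lambda: print("charge payment"), lambda: print("refund payment")),
])
```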
I'm new to the microservices architecture and am seeing that it is possible under the model to call a microservice from another via HTTP request. However, I am reading that if a service is down all other services should still operate.
My question is, how is this generally achieved?
Example: a microservice that handles all Car record manipulation may need access to the service which handles the Vehicle data. How can the Car microservice complete its operations if that service is down or doesn't respond?
You should generally aim for almost zero synchronous communication between microservices. If you still want synchronous communication, consider circuit breakers, which allow your service to keep responding, albeit with a logical error message; without circuit breaking, dependent services will also go down completely. This can often be achieved by questioning the consistency requirements of the microservice.
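A minimal circuit-breaker sketch (the dependency call and thresholds are hypothetical; libraries such as resilience4j or pybreaker do this properly):

```python
import time

def call_vehicle_service():
    raise ConnectionError("vehicle service is down")  # simulate the outage

class CircuitBreaker:
    """After max_failures consecutive errors, fail fast with a fallback
    for reset_after seconds instead of hammering the dead dependency."""

    def __init__(self, max_failures=3, reset_after=30.0):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None

    def call(self, func, fallback):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                return fallback()  # circuit open: answer immediately
            self.opened_at = None  # cool-down elapsed: try the service again
            self.failures = 0
        try:
            result = func()
            self.failures = 0
            return result
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()
            return fallback()

breaker = CircuitBreaker()
response = breaker.call(
    call_vehicle_service,
    fallback=lambda: {"error": "vehicle service unavailable"},  # logical error
)
print(response)
```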
Sometimes these requirements are not directly visible. For example, say there are two services, an order service and a customer service, and the order service exposes an API to place an order for a given customer id, with the business rule that you cannot place an order for an unknown customer.
One implementation is for the order service to call the customer service synchronously; in this case, the customer service being down will impact your service. Now let's question whether we really need this.
A scenario could happen where a customer just placed an order and somebody then deleted that customer from the customer service; now we have an order that doesn't belong to any customer, so consistency cannot be guaranteed even with the synchronous check.
In the new solution, we allow the order service to place the order without checking the customer id and do one of the following:

- Using a process manager, check the customer's validity and update the status of the order to invalid if the customer is unknown; likewise, when a customer gets deleted, use the process manager to update the order status to invalid or perform whatever business logic applies.
- Do not check at all, because merely placing an order doesn't count for much; when the order reaches the dispatch process, that service will check the customer status anyway.

The takeaway is to aim for more asynchronous communication between microservices; mostly you will find the solution in the consistency actually required by the business. But if your business wants the check to be 100% strict, you have to call the other service, and if that service is down, your service will return logical errors.
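A sketch of the first option, with a process manager reacting after the fact (all names and the in-memory stores are illustrative):

```python
orders = {}          # order_id -> {"customer_id": ..., "status": ...}
customers = {"c1"}   # known customer ids

def place_order(order_id, customer_id):
    # Accept unconditionally; no synchronous call to the customer service.
    orders[order_id] = {"customer_id": customer_id, "status": "pending"}
    process_manager_validate(order_id)

def process_manager_validate(order_id):
    order = orders[order_id]
    order["status"] = "confirmed" if order["customer_id"] in customers else "invalid"

def on_customer_deleted(customer_id):
    # The process manager reacts to the deletion event after the fact.
    customers.discard(customer_id)
    for order in orders.values():
        if order["customer_id"] == customer_id:
            order["status"] = "invalid"

place_order("o1", "c1")
on_customer_deleted("c1")
print(orders["o1"]["status"])  # "invalid": consistency restored asynchronously
```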
Is there any way to get the same "MessageId" you can get in Exchange EWS when using ActiveSync?
I thought this was an Exchange way to identify each message uniquely, but I can't seem to find a way to retrieve it using ActiveSync.
EDIT: I've got two applications, one that stores info using ActiveSync and one that stores info using EWS, and I want them to be able to work separately on the same message. To do this, I was hoping to use the EWS MessageId, which seems to be a GUID-type identifier for each individual message. (Note: this doesn't appear to be the same Message-ID as is found in email headers.)
Sadly, you're mostly out of luck.
ActiveSync is not an integration protocol, it's a mobile synchronization protocol designed for low-bandwidth communication devices like smart phones. A lot of capabilities in EWS will not exist in EAS.
Long-term message identification and correlation isn't as important for mobile devices. They simply get told what messages are in each folder, and allow the user to manipulate them. At any time the Exchange server may tell its EAS-connected clients to "re-sync" which causes them to forget the messages they have on the device and pull them cleanly from the server. That happens a lot with EAS, sometimes a couple of times an hour, depending on what is happening with that mailbox. For example, deleting a folder via Outlook causes a FolderSync to happen, and that forces connected devices to cleanly re-sync again.
Therefore EAS appears to have left behind the notion of GUIDs or other long term IDs for messages. Instead, the server will assign ephemeral IDs that are valid only until the next big resync is forced (which could happen at any time). You'll probably see Exchange give very simple IDs like 7:45 (which means message ID 45 within folder 7, IIRC). However after a resync that might have the number 7:32 (if the user deletes other messages in that folder) or something like 4:22 (if the message gets moved to another folder entirely).
Other EAS servers like Zimbra, Kerio or Notes Traveler might assign GUIDs, but from memory this is how Exchange behaves. Your only option might be to put a hidden correlation ID of your own into the body or subject of messages you're interested in. That will allow you to track the lifecycle of the items you're interested in, at the expense of some odd stuff being visible to users in their message contents.
@Brian is correct: there are no globally unique identifiers for ActiveSync items that can be used to correlate with EWS (with some exceptions; for instance, a meeting invite has a UID, as do events, which can be used with some hackery to retrieve an EWS ID for the related EWS calendar event), and there are no fields invisible to the user that can be hijacked for adding your own correlation data. This is most apparent in email, contacts, tasks, notes, etc.
However, if you are syncing both, it is possible to use the metadata in the objects to match. For instance, for contacts, write a hashing algorithm that combines the data from the first name, last name, company name, etc. fields and produces a result. This can be run on the data from both sides and will have relatively few collisions when matching (and those that do collide will have exactly the same visible data to the user anyway, so in most cases it won't matter that you didn't get an exact alignment).
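A sketch of that matching idea (the field names are illustrative; real contacts would need more fields and more careful normalization):

```python
import hashlib

def contact_fingerprint(contact):
    """Hash the user-visible fields both protocols expose; use the digest
    as a cross-protocol correlation key independent of each system's Ids."""
    parts = [
        contact.get("first_name", ""),
        contact.get("last_name", ""),
        contact.get("company_name", ""),
    ]
    normalized = "|".join(p.strip().lower() for p in parts)
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

# Same visible data on both sides -> same key, regardless of protocol Ids.
eas_contact = {"first_name": "Ada", "last_name": "Lovelace", "company_name": "ACME"}
ews_contact = {"first_name": "ada", "last_name": "Lovelace", "company_name": "ACME "}
assert contact_fingerprint(eas_contact) == contact_fingerprint(ews_contact)
```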