Why does calling an external API in Chainlink not cause inconsistency?

Suppose I have a smart contract that uses Chainlink's "Call Any External API" capability to get some data from an external URL. My understanding is that each Ethereum full node re-executes each smart contract in order to verify the latest block. But what if, between the time one full node runs the contract and another one does, the data returned by that external API changes? Then it would seem that different full nodes would get different results for that smart contract, resulting in inconsistent states. Why does that not happen?

Because oracle responses are stored on-chain.
You are right: if fetching external data were part of the validation process, nodes would not be able to reach consensus. So instead, a Chainlink oracle network places the data on-chain in a transaction, and that transaction then goes through the same validation process as every other transaction.
You can read more about how Chainlink's Basic Request Model works, but this is basically the reason why an oracle system cannot simply be built into the blockchain itself.

Related

Atomically update database and send message. Outbox pattern or not?

You have a command/operation which means you both need to save something in the database and send an event/message to another system. For example, you have an OrderService, and when a new order is created you want to publish an "OrderCreated" event for another system (or systems) to react to (either as a direct message or via a message broker) and do something.
The easiest (and naive) implementation is to save to the db and, if that succeeds, send the message. But of course this is not bulletproof, because the other service/message broker might be down or your service might crash before sending the message.
One (and common?) solution is to implement the "outbox pattern": instead of publishing messages directly, you save the message to an outbox table in your local database as part of your database transaction (in this example, saving to the outbox table as well as the order table) and have a different process (polling the db or using change data capture) read the outbox table and publish the messages.
What is your solution to this dilemma, i.e. "update the database and send the message, or do neither"? Note: I am not talking about using sagas (this could be part of a saga, but that is the next level).
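For concreteness, the write path I have in mind looks roughly like the sketch below (plain JDBC; the table and class names are made up):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.UUID;

import javax.sql.DataSource;

public class OrderService {

    private final DataSource dataSource;

    public OrderService(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    // Save the order and the outgoing event in the same local transaction.
    // A separate relay process polls the outbox table and publishes the rows.
    public void createOrder(String orderId, String orderJson) throws SQLException {
        try (Connection con = dataSource.getConnection()) {
            con.setAutoCommit(false);
            try (PreparedStatement order = con.prepareStatement(
                         "INSERT INTO orders (id, payload) VALUES (?, ?)");
                 PreparedStatement outbox = con.prepareStatement(
                         "INSERT INTO outbox (id, type, payload, published) VALUES (?, ?, ?, FALSE)")) {
                order.setString(1, orderId);
                order.setString(2, orderJson);
                order.executeUpdate();

                outbox.setString(1, UUID.randomUUID().toString());
                outbox.setString(2, "OrderCreated");
                outbox.setString(3, orderJson);
                outbox.executeUpdate();

                con.commit(); // both rows are stored, or neither is
            } catch (SQLException e) {
                con.rollback();
                throw e;
            }
        }
    }
}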
I have in the past used different approaches:
"Do nothing", i.e just try to send the message and hope it will be sent. Which might be fine in some cases especially with a stable message broker running on same machine.
Using DTC (in my case MSDTC). Beside all the problem with DTC it might not work with your current solution.
Outbox pattern
Using an orchestrator which retries the process if it has not received a "completed" event.
In my current project this is not handled well IMO, and I want to change it to be more resilient and self-correcting. Sometimes when a service calls another service and the call fails, the user might retry and it might work; but some operations might require our support to fix them (if the problem is even discovered).
ATM it is not a microservice solution but rather two large (legacy) monoliths communicating, running on the same server, but we are moving to a microservice architecture in the near future and might run on multiple machines.

What is the best way to share events between Google Cloud Run containers

I have a service which is running on many Cloud Run containers.
When a single container (A) receives a web request to do some work, I need all the other live containers to fetch some updated data from Elasticsearch.
I would have expected ES to have a "listening" type of connection, as Firebase does, but this is not possible.
Right now I am having to poll the database from each service.
Is there a better way to achieve this sort of cross container sync when using cloud run? Would pub/sub be the best solution here?
It's unusual but not impossible to achieve.
First of all, you have to understand the instance life cycle: the CPU is allocated only when a request is being processed. Otherwise, the CPU is throttled (below 5%). That's also why you pay only while your instance is processing requests, not while the instance is kept warm (before eventually being offloaded).
That being said, it's useless and inefficient to update instances in the background when no request is being processed.
Therefore, the idea is to perform the sync when the instance receives a request. The downside is that this will increase the request latency (the instance first syncs its cache and then processes the request).
Finally, the solution is to store, somewhere, the date of the latest cache update, and to keep that same piece of information in your instance. When the instance receives a request, it first compares its own cache date with the central data date:
If it's the same, no problem, continue the processing.
If the central data date is after the current instance date, update the instance data, and then process the request.
You can store the data, and the date of that data, in Firestore for instance, or in Memorystore, or in any other database. A rough sketch of this check follows.
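A rough Java sketch of that check (CentralVersionStore and reloadFromElasticsearch() are placeholders for whatever storage and Elasticsearch client you actually use):

import java.time.Instant;

public class CacheGuard {

    // Placeholder for the central store (Firestore, Memorystore, ...) holding
    // the date of the latest update.
    public interface CentralVersionStore {
        Instant latestUpdate();
    }

    private final CentralVersionStore centralStore;
    private volatile Instant localVersion = Instant.EPOCH;

    public CacheGuard(CentralVersionStore centralStore) {
        this.centralStore = centralStore;
    }

    // Call this at the start of every request, before doing the real work.
    public void ensureFresh() {
        Instant centralVersion = centralStore.latestUpdate();
        if (centralVersion.isAfter(localVersion)) {
            reloadFromElasticsearch(); // fetch the updated data and rebuild the local cache
            localVersion = centralVersion;
        }
    }

    private void reloadFromElasticsearch() {
        // query Elasticsearch and replace the in-memory data here
    }
}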
Pub/Sub can also be a solution, but it is more complex to implement. Each instance, when it starts, has to create a pull subscription on a topic; when the instance is killed, you have to delete that subscription.
Then, when a request comes in, your instance has to pull the subscription, get the messages (if any), and update its local cache.
This could be faster than the previous solution, but it is harder to implement.

How to log all microservices logs in a single log file using Spring Boot

I have 5 web applications which I developed using Spring Boot (A, B, C, D and E).
Below are the flows of the two calls.
First Flow:
A's Controller --> A's Service --> B's Controller --> B's Service --> C's Controller --> C's Service --> C's Dao --> DB
Second Flow:
A's Controller --> A's Service --> D's Controller --> D's Service --> B's Controller --> B's Service --> C's Controller --> C's Service --> C's Dao --> DB
Once data is fetched from / pushed into the DB, the corresponding methods return some value. Every method logs its status (input details and return status). I am able to see the logs of each service separately, but I want to see the logs of one complete request-response cycle (from A's controller request to A's controller response) in one file.
How Can I achieve it?
Logging everything into a single file is a very bad idea, but let's take a step back and look at the problem instead of guessing at a solution.
You have multiple applications that collaborate to execute (distributed) transactions. You need to trace those interactions to see your dataflow. This is very useful for many reasons, so it's right that you care about it. It is also correct to collect all your log entries in a single sink, even though it won't be a file, because a file is not well suited to production workloads. A typical scenario that many organizations implement is the following:
Each application sends logs to files or standard output
For each node of your infrastructure there is an agent that reads those streams, does some basic conversion (e.g. translating log entries into a common format) and sends the data to a certain sink
The sink is a database; the best technology option is a DBMS without strict requirements on data schema (you are storing everything in a single huge table, after all) or transactional properties (if the data are logs, you are fine with optimistic concurrency control). You also want a tool that is better at reads than writes and performs well on complex searches, so you can drill down into a large amount of structured data
A dashboard to read logs, run searches and even build dashboards with synthetic stats about events
BONUS: use a buffer to manage load spikes
There are well-established tools for the job, and they are:
Logstash/Beats/Fluentd
Elasticsearch... what else? ;)
Kibana, the favourite Elasticsearch client
BONUS: RabbitMQ/Kafka/another message broker, or Redis
But you are still missing a step.
Suppose you call a REST API, something simple like a POST /users/:userId/cart
API Gateway receives your request with a JWT
API Gateway calls Authentication-service to validate and decode the JWT
API Gateway calls Authorization-service to check whether the client has the right to perform the request
API Gateway calls User-service to find :userId
User-Service calls Cart-service to add the product to :userId's cart
Cart-Service calls Notification-Service to decide whether a notification needs to be sent for the completed task
Notification-Service calls Push-Gateway to invoke an external push notification service
....and back
To not get lost in this labyrinth you NEED just one thing: the correlation ID.
A correlation ID attaches a unique ID to every interaction between these microservices (as a header in HTTP calls or AMQP messages, for instance), and your custom log library (because you've already built a custom logging library and shared it among all the teams) captures this ID and includes it in every log entry written in the context of the single request processed by each of those microservices. You can even add the correlation ID to the client response, grab it if the response carries an error code, and run a query on your logs DB to find all the entries with that correlation ID. If the system clocks are in sync, the entries will come back in the correct time order and you will be able to reconstruct the dataflow.
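To make the correlation ID concrete, here is a minimal sketch for a Spring Boot 2.x service (javax.servlet; with Boot 3 the imports become jakarta.servlet.*). The header name and MDC key are just a convention, not a standard:

import java.io.IOException;
import java.util.UUID;

import javax.servlet.FilterChain;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import org.slf4j.MDC;
import org.springframework.stereotype.Component;
import org.springframework.web.filter.OncePerRequestFilter;

// Reads the incoming correlation ID (or generates one), puts it into the SLF4J
// MDC so every log line written while handling this request carries it, and
// echoes it back to the caller. Downstream HTTP/AMQP calls should forward the
// same header; reference the ID in the log pattern with %X{correlationId}.
@Component
public class CorrelationIdFilter extends OncePerRequestFilter {

    private static final String HEADER = "X-Correlation-Id";

    @Override
    protected void doFilterInternal(HttpServletRequest request,
                                    HttpServletResponse response,
                                    FilterChain chain) throws ServletException, IOException {
        String correlationId = request.getHeader(HEADER);
        if (correlationId == null || correlationId.isEmpty()) {
            correlationId = UUID.randomUUID().toString();
        }
        MDC.put("correlationId", correlationId);
        response.setHeader(HEADER, correlationId);
        try {
            chain.doFilter(request, response);
        } finally {
            MDC.remove("correlationId"); // don't leak the ID to the next request on this thread
        }
    }
}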
Distributed systems make everything more complicated and add a lot of overhead to things we have always done, but if you equip yourself with the right tools to manage that complexity, you can see the benefits.
You can implement a centralized logging system, for example the ELK stack (Elasticsearch, Logstash and Kibana).

Preventing data loss in client authoritative database writes

A project I'm working on requires users to insert themselves into a list on a server. We expect a few hundred users over a weekend and while very unlikely, a collision could happen in which two users submit the list concurrently and one of them is lost. The server has no validation, it simply allows you to get and put data.
I was pointed in the direction of "optimistic locking" but I'm having trouble grasping when exactly the data should be validated and how it prevents this from happening. If one of the clients reads the data, adds itself and then checks again to ensure that the data is the same with the use of an index or timestamp, how does this prevent the other client from doing the same and then one overwriting the other?
I'm trying to understand the flow in the context of two clients getting data and putting data.
The point of optimistic locking is that the decision to accept or reject a write is taken on the server, and it is protected against concurrency by a pessimistic transaction or some sort of hardware protection, such as compare-and-swap. So a client requests a write together with some sort of timestamp or version identifier, and the server only accepts the write if that timestamp is still accurate. If it isn't, the client gets some sort of rejection code and has to try again. If it is, the client is told that its write succeeded.
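A minimal sketch of that server-side check, assuming the list lives in a relational table with a version column (table and class names are illustrative):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

import javax.sql.DataSource;

public class ListRepository {

    private final DataSource dataSource;

    public ListRepository(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    // Returns true if the write was accepted, false if another client won the
    // race: the UPDATE only matches if the row still has the version the
    // client originally read.
    public boolean save(long listId, String newContent, long expectedVersion) throws SQLException {
        String sql = "UPDATE lists SET content = ?, version = version + 1 "
                   + "WHERE id = ? AND version = ?";
        try (Connection con = dataSource.getConnection();
             PreparedStatement ps = con.prepareStatement(sql)) {
            ps.setString(1, newContent);
            ps.setLong(2, listId);
            ps.setLong(3, expectedVersion);
            return ps.executeUpdate() == 1; // 0 rows => stale version, client must re-read and retry
        }
    }
}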
This is not the only way to handle receiving data from multiple clients. One popular alternative is to use a reliable messaging system - for example, the Java Message Service (JMS) specifies an interface for such systems, for which you can find open-source implementations. Clients write into the messaging system and can go away as soon as their message is accepted. The server reads requests from the messaging system and acts on them. If the server or the network goes down it's no big deal: the messages will still be there to be read when they come back (typically they are written to disk and have the same level of protection as database data, although if you look at a reliable message queue implementation you may find that it is not, in fact, built on top of a standard database table).
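If you go the messaging route, the client side can be very small. A sketch using the standard JMS 2.0 API (the ConnectionFactory and Queue come from whatever broker/JNDI setup you use; the class name is made up):

import javax.jms.ConnectionFactory;
import javax.jms.JMSContext;
import javax.jms.Queue;

// Client side of the messaging alternative: hand the request to the broker and
// return as soon as it has been accepted; the server consumes it later.
public class SignupSender {

    private final ConnectionFactory connectionFactory; // provided by the broker / JNDI
    private final Queue signupQueue;                   // likewise

    public SignupSender(ConnectionFactory connectionFactory, Queue signupQueue) {
        this.connectionFactory = connectionFactory;
        this.signupQueue = signupQueue;
    }

    public void submit(String userJson) {
        try (JMSContext context = connectionFactory.createContext()) {
            context.createProducer().send(signupQueue, userJson);
        }
    }
}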
One example of a write-up of the details of optimistic locking is the HTTP ETag specification, e.g. https://en.wikipedia.org/wiki/HTTP_ETag

Handling remote API validation errors in the service layer

Imagine that there is some manager class that talks to a remote service, for example a user microservice that can create new and update existing user profiles. This manager class is used everywhere in the code: in controllers and other classes. Before talking to the remote service, our manager class doesn't know whether the submitted DTO is valid. The question is: if the remote service returns validation errors, what do we do next? How do we handle these errors? I've thought about it and have some options:
Throw an Exception when validation fails
Pass an Errors object that collects validation errors to the manager
Make a getLastErrors() method in the manager class
Maybe other, better solutions exist?
P.S. Suppose that the remote service returns errors in JSON format; it doesn't matter whether it's a JSON-RPC, SOAP or REST microservice.
Unless you want to translate service errors into something different, or handle them to make certain decisions in the client tier, service errors are usually formatted in a human-readable way so they can be shown in the UI and let the user know what went wrong.
On the other hand, if there's no UI, there should be a logger. Just as you would do in a UI layer, you would format those errors and log them to a file or any other storage.
Also, you might want to learn more about the fail-fast concept:
In systems design, a fail-fast system is one which immediately reports
at its interface any condition that is likely to indicate a failure.
Fail-fast systems are usually designed to stop normal operation rather
than attempt to continue a possibly flawed process. Such designs often
check the system's state at several points in an operation, so any
failures can be detected early. A fail-fast module passes the
responsibility for handling errors, but not detecting them, to the
next-highest level of the system.
The OP commented:
If validation errors are returned from the microservice, what should the manager class do then? Throw an exception or put these errors in some field in its class?
Regarding this concern, I've arrived at a conclusion: the entire flow should pass through a specialized DTO that I've called an accumulated result (check the full description):
Represents a multi-purpose, orthogonal entity which transports both
results of a called operation and also useful information for the
callers like a status, status description and details about the actual
result of the whole operation.
That way, even in multi-tier architectures, each tier/layer can either add more info to the accumulated result or make decisions based on it.
Probably some may argue that you should throw exceptions, but I don't consider a broken validation rule to be an exception; it's an expected use case.
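Purely as an illustration (the class and field names here are mine, not taken from the linked description), such an accumulated-result DTO could look like this:

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Carries the payload (if any) plus status information and the validation
// messages collected along the way, so each tier can inspect or enrich it.
public class OperationResult<T> {

    public enum Status { OK, VALIDATION_FAILED, REMOTE_ERROR }

    private final Status status;
    private final T payload; // null when the operation failed
    private final List<String> messages = new ArrayList<>();

    private OperationResult(Status status, T payload) {
        this.status = status;
        this.payload = payload;
    }

    public static <T> OperationResult<T> ok(T payload) {
        return new OperationResult<>(Status.OK, payload);
    }

    public static <T> OperationResult<T> validationFailed(List<String> errors) {
        OperationResult<T> result = new OperationResult<>(Status.VALIDATION_FAILED, null);
        result.messages.addAll(errors);
        return result;
    }

    public boolean isOk() { return status == Status.OK; }
    public Status getStatus() { return status; }
    public T getPayload() { return payload; }
    public List<String> getMessages() { return Collections.unmodifiableList(messages); }
}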
How to handle validation errors from a remote service?
Return the relevant HTTP status code, along with as much information as is necessary (sometimes none) in the response body.
It is not important if it's SOAP or RESTful, imagine that JSON
response is returned
The type of service will determine your failure-handling approach. For SOAP services, you should return a SOAP fault.
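For the REST case, one possible sketch in Spring MVC of "return the relevant status code plus details" (the exception type is hypothetical, something your manager class would throw when the remote service reports validation errors):

import java.util.List;

import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.ExceptionHandler;
import org.springframework.web.bind.annotation.RestControllerAdvice;

// Maps the (hypothetical) RemoteValidationException to a 422 response whose
// body lists the validation messages returned by the microservice.
@RestControllerAdvice
public class RemoteValidationErrorHandler {

    @ExceptionHandler(RemoteValidationException.class)
    public ResponseEntity<List<String>> handle(RemoteValidationException ex) {
        return ResponseEntity
                .status(HttpStatus.UNPROCESSABLE_ENTITY)
                .body(ex.getErrors());
    }

    // Hypothetical exception carrying the errors returned by the remote service.
    public static class RemoteValidationException extends RuntimeException {

        private final List<String> errors;

        public RemoteValidationException(List<String> errors) {
            this.errors = errors;
        }

        public List<String> getErrors() {
            return errors;
        }
    }
}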
