Best practices for an object storage health check - microservices

I'm programming a health check for a microservice that relies on an object storage service (MinIO).
My approach to diagnosing whether the object storage is healthy is to call the bucketExists function, validating that the bucket exists and that I have a stable connection to it.
Since the check runs every second, I need this call to be efficient and lightweight. Here are the functions that MinIO offers: Minio Javascript SDK
My question is: is it correct to use this function as a health check? Is there a best-practice way to do this?
Thanks in advance for reading my question :-)

Any idempotent operation (HeadBucket, ListBuckets, ListObjects, etc.) can be used to check the health of the service. A BucketExists call may return successfully while reporting that the bucket was not found, so make sure you can differentiate between that case and the endpoint not being reachable. If you don't have a large number of buckets, the ListBuckets API is the easiest.
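A minimal sketch of that distinction with the MinIO JavaScript SDK (the minio npm package), assuming the promise-based API of recent SDK versions where bucketExists resolves to a boolean and rejects on connectivity errors; the endpoint, credentials, and bucket name are placeholders:

// Health check sketch: separates "bucket missing" from "endpoint unreachable".
const Minio = require('minio');

const client = new Minio.Client({
  endPoint: 'minio.internal',   // placeholder endpoint
  port: 9000,
  useSSL: false,
  accessKey: process.env.MINIO_ACCESS_KEY,
  secretKey: process.env.MINIO_SECRET_KEY,
});

async function checkObjectStorage(bucket) {
  try {
    // bucketExists resolves to a boolean in recent versions of the SDK.
    const exists = await client.bucketExists(bucket);
    if (!exists) {
      // The endpoint answered, but the bucket is missing: a configuration
      // problem rather than an outage.
      return { healthy: false, reason: `bucket "${bucket}" not found` };
    }
    return { healthy: true };
  } catch (err) {
    // Connection refused, DNS failures, timeouts, etc. end up here.
    return { healthy: false, reason: `endpoint unreachable: ${err.message}` };
  }
}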
There is also the MinIO Admin API package, which provides APIs for managing and querying a MinIO deployment: https://github.com/minio/minio/tree/master/pkg/madmin. However, it seems to be available only in Go.

Related

Index data entered in the smart contract for offchain computation

I want to index the data entered in a NEAR Protocol smart contract for off-chain computation.
How can I push a new smart-contract entry into an off-chain SQL database or Elasticsearch for real-time data indexing?
I can do that in the frontend, but I don't know if that's the right/best method, as different users can use different frontends for querying the blockchain.
Great question. There is currently no "event" system in NEAR, so the solution will have to be polling for now.
I would recommend looking at this experimental method:
https://docs.near.org/docs/api/rpc-experimental#example-of-data-changes
The code contained in that resource is meant to be run in a terminal, similar to running a curl command. These types of RPC calls are fairly easy to understand if you look at near-api-js. For an example of how a NodeJS app uses this library to talk to the RPC, I would recommend looking in this directory of near-shell:
https://github.com/near/near-shell/tree/master/commands
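As a rough illustration of the polling approach, here is a Node sketch against that experimental data-changes RPC. The method and parameter names follow the linked docs at the time of writing and may change, since the API is marked EXPERIMENTAL; the RPC URL, account id, and polling interval are placeholders.

// Poll the EXPERIMENTAL_changes RPC and hand the results to an off-chain store.
// Uses the global fetch available in Node 18+ (use node-fetch on older versions).
const RPC_URL = 'https://rpc.testnet.near.org';

async function fetchDataChanges(accountId) {
  const res = await fetch(RPC_URL, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      jsonrpc: '2.0',
      id: 'dontcare',
      method: 'EXPERIMENTAL_changes',
      params: {
        changes_type: 'data_changes',
        account_ids: [accountId],
        key_prefix_base64: '',   // empty prefix = all keys of the contract
        finality: 'final',
      },
    }),
  });
  const { result, error } = await res.json();
  if (error) throw new Error(JSON.stringify(error));
  return result.changes;         // forward these to SQL / Elasticsearch
}

// Poll every few seconds and index whatever changed.
setInterval(async () => {
  try {
    const changes = await fetchDataChanges('example.testnet');
    // TODO: de-duplicate by block hash and write to your off-chain store.
    console.log(changes);
  } catch (err) {
    console.error('polling failed:', err.message);
  }
}, 5000);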

Can AWS Lambda be used as the backend for getstream.io?

I didn't find any posts related to this topic. It seems natural to use Lambda as a getstream backend, but I'm not sure whether it depends heavily on persistent connections or other architectural choices that would rule it out. Is it a sensible approach? Has anyone made it work? Any advice?
While you can build an entire website only on Lambda, you have to consider the following:
Lambda behind API Gateway has a timeout limit of 30 seconds and a payload size limit (both received and sent) of 6 MB. For most cases this is fine, but if you have some really big operations or need to send really large data (like a high-resolution image), you can't do it with this approach and need to think about something else (for instance, you can send an SNS message to another Lambda function with a higher timeout that does all of this asynchronously and then sends the result to the client when it's done, assuming the client is capable of receiving events).
Lambda has cold starts, which in turn slow down your APIs when a client calls them for the first time in a while. The cold start time depends on the language your Lambdas are written in, so you might consider this too. If you are using C# or Java for your Lambdas, then this is probably not the best choice; from this point of view, Node.js and Python seem to be the best options, with Go rising. You can find more about it here. And by the way, you can now configure Provisioned Concurrency for your Lambda, which aims to fix the cold start issue, but I haven't used it yet, so I can't tell whether it makes a difference (though I'm sure it does).
If done correctly you'll end up managing hundreds of Lambda functions, whereas with a standard Docker container under ECS you'd manage a few APIs with multiple endpoints. This point should not be underestimated: on one hand it will make changes easier in the future, since each Lambda is small and you'll easily find and fix a bug, but on the other hand you have to move across all these functions, which can be a long process if you don't know exactly which Lambda is responsible for what.
Lambda can't handle sessions, as far as I know. Because the Lambda container gets dropped after some time, you can't store any session state inside the Lambda itself. You'll always need somewhere to store the session so it can be shared across multiple Lambda invocations, such as records in a DynamoDB table (as sketched below), but this means you have to write the code for it yourself, whereas in a classic API (like a .NET Core one) all of this is handled by the framework and you only need to store or retrieve items from the session (most of the time).
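A minimal sketch of that session-sharing idea, assuming a DynamoDB table named "sessions" keyed by sessionId and the AWS SDK v2 DocumentClient; the x-session-id header and TTL handling are just illustrative:

// Share session state across Lambda invocations via DynamoDB.
const AWS = require('aws-sdk');
const dynamo = new AWS.DynamoDB.DocumentClient();
const TABLE = 'sessions';   // assumed table name

async function getSession(sessionId) {
  const { Item } = await dynamo.get({ TableName: TABLE, Key: { sessionId } }).promise();
  return Item || null;
}

async function saveSession(sessionId, data, ttlSeconds = 3600) {
  await dynamo.put({
    TableName: TABLE,
    Item: {
      sessionId,
      data,
      // TTL attribute so DynamoDB expires stale sessions automatically
      // (requires TTL to be enabled on this attribute in the table settings).
      expiresAt: Math.floor(Date.now() / 1000) + ttlSeconds,
    },
  }).promise();
}

exports.handler = async (event) => {
  // Read the session on every invocation instead of relying on container state.
  const sessionId = event.headers['x-session-id'];
  const session = await getSession(sessionId);
  // ... business logic using `session` ...
  await saveSession(sessionId, { lastSeen: Date.now() });
  return { statusCode: 200, body: JSON.stringify({ ok: true }) };
};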
So... yeah! A backend written entirely in Lambda is possible. The company I work for does it, and I must say it is a lot better, both in terms of speed and development time. But those benefits come later, since you first need to face all of the points listed above, and it is not as easy as it might seem.
Yes, you can use AWS Lambda as the backend and integrate with the Stream API there.
Building an entire application on Lambda directly is going to be very complex and requires writing a lot of boilerplate code just to enforce some basic organization and structure in your project.
My recommendation is to use a serverless framework that takes care of keeping your application well organized and of deploying new versions (and environments).
Serverless is a good option for that: https://serverless.com/framework/docs/providers/aws/guide/intro/
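For a sense of what such a function looks like, here is a rough sketch of a Lambda handler (behind API Gateway) that writes an activity with the getstream Node client. The environment variable names, the "user" feed group, and the activity shape are assumptions for illustration; the API secret must stay server-side.

// Lambda handler that adds an activity to a Stream feed.
const stream = require('getstream');

const client = stream.connect(
  process.env.STREAM_API_KEY,
  process.env.STREAM_API_SECRET,
  process.env.STREAM_APP_ID,
);

exports.handler = async (event) => {
  const { userId, verb, object } = JSON.parse(event.body || '{}');

  // Server-side call holding the API secret; never run this in the client.
  const activity = await client.feed('user', userId).addActivity({
    actor: `user:${userId}`,
    verb,
    object,
  });

  return { statusCode: 200, body: JSON.stringify(activity) };
};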

What is the right method for storing files in a microservice architecture?

I'm currently working on a traditional monolith application, but I am in the process of breaking it up into spring microservices managed by kubernetes. The application allows the uploading/downloading of large files and these files are normally stored on the host filesystem. I'm wondering what would be the most viable method of persisting these files in a microservice architecture?
You have a bunch of different options; Googling your question will turn up many answers for any budget and taste. Basically you want high-availability storage like AWS S3. You could set up your own dedicated server to store these files if you wanted to cut costs, but then you'd have to worry about backups and availability. If you need low-latency access to these files, you'd also want to put them behind a CDN.
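A sketch of that approach in Node with the AWS SDK v2 (the same pattern exists in the AWS SDK for Java for a Spring service): stream uploads to S3 instead of the host filesystem, and hand clients short-lived pre-signed URLs for downloads. The bucket name and key scheme are placeholders.

const AWS = require('aws-sdk');
const fs = require('fs');

const s3 = new AWS.S3();
const BUCKET = 'my-app-uploads';   // placeholder bucket

// Store the file in object storage so any replica of the service can serve it later.
async function storeFile(localPath, key) {
  await s3.upload({
    Bucket: BUCKET,
    Key: key,
    Body: fs.createReadStream(localPath),
  }).promise();
}

// Return a short-lived pre-signed URL so clients download directly from S3
// (or the CDN in front of it) instead of proxying bytes through the service.
function downloadUrl(key, expiresSeconds = 300) {
  return s3.getSignedUrl('getObject', {
    Bucket: BUCKET,
    Key: key,
    Expires: expiresSeconds,
  });
}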
We are mostly on-prem and ended up using NFS. It's the path of least resistance, but probably not the most performant, and making it highly available is tough. If you have the chance, I agree with Denis Pshenov that an S3-like system, for example MinIO, might be a better alternative.
Maybe you should have a look at the Rook project (https://rook.io/). It's easy to set up and provides different kinds of storage and persistence technologies to your cloud-native applications.
There are many places to store your data. It also depends on the budget you are able to spend (holding duplicate data means more storage, which costs money) and mostly on your business requirements.
Is all data needed at all time?
Are there geo/region-related cases?
How fast does a read/write operation need to be?
Do things need to be cached?
Stateful or stateless?
Are there operational requirements? How should this be maintained?
...
Apart from this, your microservices should not know where the data is actually stored. In Kubernetes you can use Persistent Volumes (https://kubernetes.io/docs/concepts/storage/persistent-volumes/) that can link to storage from your cloud provider or something else. The microservice just mounts the volume and can treat it like a local file.
Note that the cloud provider storage services already include solutions for scaling, concurrency, etc., so I would probably use a single blob storage under the hood.
However, it has to be said that there is a trend toward understanding a microservice as a package of data and logic coupled together, and toward accepting duplication of the data, which leads to better scalability.
See for more information:
http://blog.christianposta.com/microservices/the-hardest-part-about-microservices-data/
https://github.com/katopz/best-practices/blob/master/best-practices-for-building-a-microservice-architecture.md#stateless-service-instances
https://12factor.net/backing-services
https://blog.twitter.com/engineering/en_us/topics/infrastructure/2017/the-infrastructure-behind-twitter-scale.html

Will it be safe to access Elasticsearch directly from the JavaScript API?

I am learning Elasticsearch. I wanted to know how safe it is (in terms of access control and validating user access) to access the ES server directly from the JavaScript API rather than accessing it through your backend. Will it be safe to access ES directly from the JavaScript API?
Depends on what you mean by "safe".
If you mean "safe to expose to the internet", then no, definitely not, as there isn't any access control and anyone will be able to insert data or even drop all the indexes.
This discussion gives a good overview of the issue. Relevant section:
Just as you would not expose a database directly to the Internet and let users send arbitrary SQL, you should not expose Elasticsearch to the world of untrusted users without sanitizing the input. Specifically, these are the problems we want to prevent:
Exposing private data. This entails limiting the searches to certain indexes, and/or applying filters to the searches.
Restricting who can update what.
Preventing expensive requests that can overwhelm or crash nodes and/or the entire cluster.
Preventing arbitrary code execution through dynamic scripts.
It's most certainly possible, and it can be "safe" if, say, you're using it as an internal tool behind some kind of authentication. In general, though, no, it's not secure. The Elasticsearch API can create, delete, update, and search, and that is not something you want to give a client access to, or they could essentially do any or all of those things.
You could, in theory, create role-based auth with Elasticsearch Shield, but it's far from standard practice. It's not really any more difficult to implement search on your backend; just have a simple call that returns the search results.
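A minimal sketch of that backend-search approach, assuming Express and the 7.x @elastic/elasticsearch client (where search takes a body and returns a { body } envelope); the "products" index, the field list, and the "published" filter are placeholders. The client only ever sends a query string, so it can't pick indexes, write data, or run scripts.

const express = require('express');
const { Client } = require('@elastic/elasticsearch');

const app = express();
const es = new Client({ node: 'http://localhost:9200' });

app.get('/search', async (req, res) => {
  const q = String(req.query.q || '').slice(0, 200);   // cap query length

  try {
    const { body } = await es.search({
      index: 'products',            // only this index is ever queried
      size: 20,                     // cap result size to limit expensive requests
      body: {
        query: {
          bool: {
            must: { multi_match: { query: q, fields: ['title', 'description'] } },
            filter: { term: { published: true } },   // hide private documents
          },
        },
      },
    });
    res.json(body.hits.hits.map((hit) => hit._source));
  } catch (err) {
    res.status(502).json({ error: 'search unavailable' });
  }
});

app.listen(3000);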

Is it a good practice to use useMasterKey() in cloud code?

For security reasons we are planning to disable all write access to classes. Client applications (Android and iOS SDKs) will have only read-only access to the Parse data (classes).
Data stored in the Parse servers will be modified only by cloud functions. The cloud functions will call
Parse.Cloud.useMasterKey();
We have come up with this solution because it is nearly impossible to hide the Parse application ID and Parse client key from attackers/hackers.
So is this a good solution? Are there any drawbacks? Does the "Parse.Cloud.useMasterKey()" method have performance implications?
Thank you...
This is pretty standard practice: you implement your own security in your Cloud Functions and use the master key.
In theory it might be more efficient to use the master key, since the Parse servers don't have to process Roles/Users/ACLs at all when the master key is used. Of course, you need to balance that against any extra checks your own security logic performs.
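A sketch of that pattern on a current Parse Server, where the old Parse.Cloud.useMasterKey() call is replaced by passing { useMasterKey: true } to individual operations. The class name "GameScore", the "player" and "points" fields, and the function name are examples only.

// Cloud Function: clients are read-only, so writes happen only here,
// behind our own security check, using the master key per operation.
Parse.Cloud.define('updateScore', async (request) => {
  if (!request.user) {
    throw new Parse.Error(Parse.Error.OPERATION_FORBIDDEN, 'Login required');
  }

  const query = new Parse.Query('GameScore');
  query.equalTo('player', request.user);

  // The master key bypasses class-level permissions and ACLs,
  // so the checks above must stay strict.
  const score = await query.first({ useMasterKey: true });
  if (!score) {
    throw new Parse.Error(Parse.Error.OBJECT_NOT_FOUND, 'No score for this user');
  }

  score.set('points', request.params.points);
  return score.save(null, { useMasterKey: true });
});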
