Is okapi pipeline instance thread safe? - thread-safety

I'm planning to build a service that uses the Okapi pipeline as its core translation library, and for this I plan to have a singleton instance of the pipeline. However, based on the code I've gone through, most of the classes are stateful (like the regex and plain-text filters), which makes the pipeline not thread safe. Is there any way we can have a singleton of the Okapi pipeline instance?
Did anyone face a similar situation while developing using okapi libraries?

None of the filters are thread-safe, and I am fairly sure a lot of the pipeline code isn't either. Other people have run into this problem as well, but it has never been fixed -- the amount of code that would need to change (particularly in the filters) is quite large. A general workaround is to abstract the filtering/pipeline process away behind something that looks thread-safe from the outside, such as Longhorn (or MateCat-Filters).
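Another way to get a thread-safe facade without touching Okapi internals is to keep the singleton at the service level but give each thread its own pipeline instance underneath. A minimal sketch of the idea, assuming a hypothetical pipeline factory that assembles your filters and steps (nothing here is Okapi API; the generic type stands in for whatever pipeline object you build):

```java
import java.util.function.Function;
import java.util.function.Supplier;

/**
 * Thread-safe facade over a non-thread-safe pipeline: each thread gets its
 * own pipeline instance, so the stateful filters are never shared.
 */
public class PipelineFacade<P> {

    // One pipeline per thread, created lazily from the supplied factory.
    private final ThreadLocal<P> perThreadPipeline;

    public PipelineFacade(Supplier<P> pipelineFactory) {
        this.perThreadPipeline = ThreadLocal.withInitial(pipelineFactory);
    }

    /** Runs the given work against the calling thread's private pipeline. */
    public <R> R execute(Function<P, R> work) {
        return work.apply(perThreadPipeline.get());
    }
}
```

The service then keeps a single PipelineFacade (the "singleton"), while the expensive, stateful pipeline objects stay confined to one thread each; an object pool with borrow/return semantics achieves the same isolation if per-thread instances are too costly.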

Related

Can AWS Lambda be used as the backend for getstream.io?

I didn't find any posts related to this topic. It seems natural to use Lambda as a getstream backend, but I'm not sure if it heavily depends on persistent connections or other architectural choices that would rule it out. Is it a sensible approach? Has anyone made it work? Any advice?
While you can build an entire website only on Lambda, you have to consider the following:
Lambda behind API Gateway has a timeout limit of 30 seconds and a payload size limit (both request and response) of 6 MB. For most cases this is fine, but if you have some really heavy operations or need to send some really big data (like a high-resolution image), you can't do it with this approach and need something else (for instance, you can publish an SNS message to another Lambda function with a higher timeout that does the work asynchronously and then sends the result to the client when it's done, supposing the client is capable of receiving events).
Lambda has cold starts, which in turn slow down your APIs when a client calls them for the first time in a while. The cold start time depends on the language your Lambdas are written in, so you might want to consider this too. If you are using C# or Java for your Lambdas, then this is probably not the best choice; from this point of view, Node.js and Python seem to be the best choices, with Go rising. You can find more about it here. And by the way, you can now configure Provisioned Concurrency for your Lambdas, which aims to fix the cold start issue, but I haven't used it yet so I can't tell how much difference it makes (but I'm sure there is some).
If done correctly you'll end up managing hundreds of Lambda functions, while with a standard Docker container under ECS you'd manage a few APIs with multiple endpoints. This point should not be underestimated: on one side it will make changes easier in the future, since each Lambda is small and you'll easily find and fix the bug, but on the other side you have to move across all these functions, which can be a long process if you don't know exactly which Lambda is responsible for what.
Lambda can't handle sessions, as far as I know. Because the Lambda container gets dropped after some time, you can't store any session state inside the Lambda itself. You'll always need external storage so the session can be shared across multiple Lambda invocations, such as records in a DynamoDB table or something else, but this means you have to write the code for it yourself (see the sketch after this list), while in a classic API (like a .NET Core one) all of this is handled by the framework and most of the time you only need to store or retrieve items from the session.
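To make the last point concrete, here is a minimal sketch (in Java, with the AWS SDK v2) of keeping session state outside the Lambda in DynamoDB; the sessions table and attribute names are made up for the example, and in practice you would also add a TTL attribute for expiry:

```java
import java.util.Map;
import software.amazon.awssdk.services.dynamodb.DynamoDbClient;
import software.amazon.awssdk.services.dynamodb.model.AttributeValue;
import software.amazon.awssdk.services.dynamodb.model.GetItemRequest;
import software.amazon.awssdk.services.dynamodb.model.PutItemRequest;

/** Keeps per-user session state in DynamoDB so any Lambda invocation can read it. */
public class SessionStore {

    private static final String TABLE = "sessions"; // hypothetical table name

    private final DynamoDbClient dynamo = DynamoDbClient.create();

    /** Writes a session item holding one value (PutItem replaces any existing item). */
    public void put(String sessionId, String value) {
        dynamo.putItem(PutItemRequest.builder()
                .tableName(TABLE)
                .item(Map.of(
                        "sessionId", AttributeValue.builder().s(sessionId).build(),
                        "data", AttributeValue.builder().s(value).build()))
                .build());
    }

    /** Reads the stored value for a session, or null if the session is unknown. */
    public String get(String sessionId) {
        Map<String, AttributeValue> item = dynamo.getItem(GetItemRequest.builder()
                .tableName(TABLE)
                .key(Map.of("sessionId", AttributeValue.builder().s(sessionId).build()))
                .build()).item();
        return (item == null || !item.containsKey("data")) ? null : item.get("data").s();
    }
}
```

Every Lambda that needs the session reads and writes through a store like this instead of in-memory state, which is exactly the extra code a classic framework would otherwise give you for free.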
So... yeah! A backend written entirely in Lambda is possible. The company I work for does it and I must say it's a lot better, both in terms of speed and development time. But those benefits come later, since you first need to face all of the points I listed above, and it's not as easy as it might seem.
Yes, you can use AWS Lambda as the backend and integrate with the Stream API there.
Building an entire application on Lambda directly is going to be very complex and requires writing a lot of boilerplate code just to enforce some basic organization and structure in your project.
My recommendation is to use a serverless framework that takes care of keeping your application well organized and of deploying new versions (and environments).
Serverless is a good option for that: https://serverless.com/framework/docs/providers/aws/guide/intro/

EF Migrations in orchestrator (eShopOnContainers)

Looking at eShopOnContainers, the microservice reference architecture from Microsoft, I see that for each service, Program.cs makes a call to host.MigrateDbContext, which in turn executes all of the EF migrations for the given context.
In a real-world orchestrator, isn't it possible that numerous containers for the same service could be spun up almost simultaneously? And if that happened, isn't it likely that multiple containers trying to execute the same migrations would deadlock or cause other issues?
Is this something that wasn't dealt with because it is beyond the scope of a reference project or does EF have something built in to handle concurrency that I'm not seeing?
I've found that there are numerous approaches to this problem, each with its own strengths and weaknesses. Some are straightforward... bring the entire app down, update the schema, and then bring the app back online. Some implement the schema changes as a series of smaller changes, each of which is both forward and backward compatible, allowing zero downtime. Still others leverage built-in or third-party tools written specifically to address this task.
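One common building block across these approaches is to make sure only one container actually applies the migrations at a time, typically by serializing startup behind a database lock. eShopOnContainers itself is .NET, but the idea is language-agnostic; here is a minimal sketch in Java/JDBC using a PostgreSQL advisory lock (the lock key and the runMigrations hook are made up for the example):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;

/** Serializes schema migrations across containers with a PostgreSQL advisory lock. */
public class MigrationRunner {

    private static final long LOCK_KEY = 42L; // arbitrary, agreed-upon lock id

    public static void migrate(String jdbcUrl, Runnable runMigrations) throws SQLException {
        try (Connection conn = DriverManager.getConnection(jdbcUrl);
             Statement stmt = conn.createStatement()) {
            // Blocks until no other container holds the lock, so only one container
            // applies the migrations; the others wait and then find nothing left to
            // do, because a migration tool skips steps that are already applied.
            stmt.execute("SELECT pg_advisory_lock(" + LOCK_KEY + ")");
            try {
                runMigrations.run();
            } finally {
                stmt.execute("SELECT pg_advisory_unlock(" + LOCK_KEY + ")");
            }
        }
    }
}
```

Many migration tools provide an equivalent locking mechanism of their own, so in practice you may be able to lean on that rather than hand-rolling it.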
So, to answer my own question, this topic was almost certainly omitted because it was beyond the scope of the eShopOnContainers project/eBooks. The right choice for you will vary based on your project's size, complexity, acceptable downtime, etc.

Changing spring-cloud-stream instance index/count at runtime

In spring-cloud-stream, is there a way to change the instance count and instance index of an application without restarting it?
Also, is there any recommended way to automatically populate these values? In the microservices world, this seems like it would be prohibitively difficult, since services are starting and stopping all the time.
In spring-cloud-stream, is there a way to change the instance count and instance index of an application without restarting it?
Not in the current version, but I'm open to discussing this in the context of a GitHub issue.
Also, is there any recommended way to automatically populate these values? In the microservices world, this seems like it would be prohibitively difficult, since services are starting and stopping all the time.
My recommendation would be to look at http://cloud.spring.io/spring-cloud-dataflow/, which helps with the orchestration of complex microservice topologies (and is designed to work in conjunction with Spring Cloud Stream for streaming scenarios).
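For context, spring.cloud.stream.instanceCount and spring.cloud.stream.instanceIndex are ordinary configuration properties that are bound once at application startup, which is why they can't be changed at runtime today; an orchestrator typically injects different values into each instance (via environment variables, command-line arguments, etc.). A minimal sketch of a component that simply exposes the values this instance was started with (the defaults and the class itself are just illustrative):

```java
import org.springframework.beans.factory.annotation.Value;
import org.springframework.stereotype.Component;

/**
 * Holds the partitioning coordinates this instance was started with.
 * The values are resolved once from the environment/properties supplied by the
 * orchestrator, so changing them currently requires restarting the application.
 */
@Component
public class InstanceCoordinates {

    @Value("${spring.cloud.stream.instanceCount:1}")
    private int instanceCount;

    @Value("${spring.cloud.stream.instanceIndex:0}")
    private int instanceIndex;

    public int getInstanceCount() { return instanceCount; }

    public int getInstanceIndex() { return instanceIndex; }
}
```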

Using interception to implement caching - how to define keys?

TL;DR Can someone point me to a thorough implementation of a caching system that is added to the solution through interception?
I'm refactoring one of my solutions so that cross-cutting concerns are implemented through Unity Interception. I've read the guides from MSFT, and now I think I can very easily implement the interception behaviors.
However, I was wondering about caching; I want to use the cache regions and keys consistently throughout the solution. Furthermore, I have key-specific configurations for expiration in my caching system.
One example in Unity's Developer Guide checks the method name -- this is a bad approach, since it would mean altering the implementation every time a new class/method must use the cache (obviously).
I'm having this (mad) idea of implementing a configurable interceptor that learns how to compose the region and key from the given parameters and can be configured per class (type)/method. However, this would push a lot of responsibility into configuration; I don't like the feeling that I'm programming in the *.config file.
As you can see, I'm a tad lost on how to go about this. I don't like singletons, and right now the caching system is a singleton accessed everywhere in the solution. Can someone link me to good documentation on how I should proceed? Is it possible to add caching through interception and still have proper keys/regions defined on the cache?
A quick search on this matter led me to the "Attribute Based Cache using Unity Interception" project on CodePlex. The entire project looks to have been abandoned at some alpha stage; however, it should give you a baseline to start from.
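The heart of any such implementation is the key-composition rule, typically something like region:DeclaringType.Method:hash(arguments), with expiration settings looked up per method or per region. The question is about Unity Interception (C#), but the pattern itself is small; here is a minimal sketch of the same idea using Java dynamic proxies, where the in-memory map, the key format, and the wrap helper are illustrative choices rather than anyone's API:

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.Arrays;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Caching interceptor: composes the cache key from type, method and arguments. */
public class CachingInterceptor implements InvocationHandler {

    private final Object target;
    private final String region; // configured per type instead of hard-coding method names
    private final ConcurrentMap<String, Object> cache = new ConcurrentHashMap<>();

    private CachingInterceptor(Object target, String region) {
        this.target = target;
        this.region = region;
    }

    /** Wraps an implementation of the given interface with the caching proxy. */
    @SuppressWarnings("unchecked")
    public static <T> T wrap(T target, Class<T> iface, String region) {
        return (T) Proxy.newProxyInstance(iface.getClassLoader(),
                new Class<?>[] { iface }, new CachingInterceptor(target, region));
    }

    @Override
    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        // Key = region : declaring type . method : hash of the argument values.
        String key = region + ":" + method.getDeclaringClass().getSimpleName()
                + "." + method.getName() + ":" + Arrays.deepHashCode(args);
        Object cached = cache.get(key);
        if (cached != null) {
            return cached;
        }
        Object result = method.invoke(target, args);
        if (result != null) {
            cache.put(key, result); // per-key/region expiration policy omitted for brevity
        }
        return result;
    }
}
```

In a Unity-based version the same composition logic would live inside an interception behavior or call handler, with the region and expiration read from an attribute or configuration entry instead of the constructor parameter used here.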

DRb: how to check if remote object exists?

I've been toying around with DRb to use as my solution for communicating across multiple processes. I'm using the standard process: one process creates a service and registers it at a druby URI, and in the other process a DRbObject is created referencing that URI. So far so good. Let's say I kill the first process. Every subsequent method call on the remote object will culminate in an ECONNRefused exception, which is only fair. But isn't there a way to see if a DRbObject is indeed registered at the given URI? Testing it by forcing an ECONNRefused on every instance start just to see if it exists seems a bit silly.
Of course, other solutions involving resources other than DRb are always welcome, provided they indeed represent a plus.
You should check out ZeroMQ. It is somewhat more complex to set up than DRb but it handles all the presence/reconnection issues mostly transparently.
This may not be what you are looking for, but I have developed an IPC framework on top of DRb that hides all of the DRb details at the application level. This includes client methods to find whatever services have registered with the server across the network. It's probably too much overhead for you, but it may be worth poking around in. Anyway, you can check it out on GitHub.
