Is there a feature for setting a min/max/fixed number of function/action replicas in OpenWhisk? - openwhisk

I have an OpenWhisk setup on Kubernetes using [1]. For study purposes, I want to have a fixed number of replicas/pods for each action that I deploy, essentially disabling the auto-scaling feature.
A similar facility exists for OpenFaaS [2]: when deploying a function, we can configure the system to keep N replicas of that function at all times, so N pods for the given function are always present.
I assume this can be configured somewhere when deploying an action, but as a beginner with OpenWhisk, I could not find a way to do it. Is there a specific configuration that I need to change?
What can I do to achieve this in OpenWhisk? Thanks :)
[1] https://github.com/apache/openwhisk-deploy-kube
[2] https://docs.openfaas.com/architecture/autoscaling/#minmax-replicas

OpenWhisk serverless functions follow a model closer to AWS Lambda: you don't set the number of replicas. OpenWhisk uses various heuristics and can specialize a container in milliseconds, so on-demand elasticity is more practical than in Kubernetes-based solutions. There is no mechanism in the system today to set minimums or maximums. A function scales in proportion to the resources available in the system, and when that capacity is maxed out, requests queue.
Note that while AWS allows one to set the max concurrency, this isn’t the same as what you’re asking for, which is a fixed number of pre-provisioned resources.
Update to answer your two questions specifically:
Is there a specific configuration that I need to change?
There isn’t. This feature isn’t available at user level or deployment time.
What can I do to achieve this in Openwhisk?
You can modify the implementation in several ways to achieve what you're after. For example, one approach is to extend the stem-cell (pre-warmed container) pool for specific users or functions. If you are interested in doing something like this, the Apache OpenWhisk dev list is a great place to discuss the idea.

Related

Scheduling tasks/messages for later processing/delivery

I'm creating a new service, and for it I have database entries (Mongo) with a state field that I need to update based on the current time. For instance, if the start time is set to two hours from now, I need to change the state from CREATED -> STARTED in the database at that time, and there can be multiple such states.
Approaches I've thought of:
Keep querying the database for entries whose time is <= the current time and change their states accordingly. This causes extra reads, half of which come back empty, and it will get complicated fast as more states are added.
Write a job scheduler (I am using Go, so that wouldn't be too hard) and schedule all the jobs, but I might lose the queued data in case of a panic/crash.
Use a product like Celery; I have found a Go implementation of it: https://github.com/gocelery/gocelery
Another task scheduler I've found is on Google Cloud (https://cloud.google.com/solutions/reliable-task-scheduling-compute-engine), but I don't want to get locked into proprietary technologies.
I wanted to use a pub/sub service for this, but I couldn't find one that supports delayed messages (if that's a thing). My main problem is that I can't find the actual name for this problem, so I can't search for it properly; I've even tried searching the Microsoft docs. If someone can point me in the right direction, or tell me which of the approaches above I should use, please let me know; that would be a great help!
UPDATE:
Found one more solution, by Netflix, for the same problem:
https://medium.com/netflix-techblog/distributed-delay-queues-based-on-dynomite-6b31eca37fbc
I think you are right in that the problem you are trying to solve is the job or task scheduling problem.
One approach that many companies use is the system you are proposing: jobs are inserted into a datastore with a time to execute at, and the datastore is then polled for jobs that are due. There are optimizations that reduce extra reads, such as polling at a regular interval and backing off exponentially when polls come back empty. The advantage of this system is that it tolerates node failure; the disadvantage is the added complexity.
Looking around, in addition to the one you linked (https://github.com/gocelery/gocelery) there are other implementations of this model (https://github.com/ajvb/kala or https://github.com/rakanalh/scheduler were ones I found after a quick search).
The other approach you described, scheduling jobs in process, is very simple in Go because parked goroutines are extremely cheap; you can just spawn a goroutine per job. The downside is that if the process dies, its pending jobs are lost.
go func() {
	// Block until expirationTime; time.Until(t) is shorthand for t.Sub(time.Now()).
	<-time.After(time.Until(expirationTime))
	// do work here.
}()
A final approach that I have seen but wouldn't recommend is the callback model (something like https://gitlab.com/andreynech/dsched). Here your service calls another service (over HTTP, gRPC, etc.) and schedules a callback for a specific time. The advantage is that multiple services written in different languages can share the same scheduler.
Overall, before you decide on a solution, I would consider some trade-offs:
How acceptable is job loss? If it's ok that some jobs are lost a small percentage of the time, maybe an in-process solution is acceptable.
How long will jobs be waiting? If it's longer than the shutdown period of your host, maybe a datastore based solution is better.
Will you need to distribute job load across multiple machines? If you need to distribute the load, sharding and scheduling are tricky things and you might want to consider using a more off-the-shelf solution.
Good luck! Hope that helps.

Eventual Consistency in microservice-based architecture temporarily limits functionality

I'll illustrate my question with Twitter. For example, Twitter has a microservice-based architecture, which means that different processes run on different servers and have different databases.
A new tweet appears: server A stores some data in its own database, generates new events, and fires them. Servers B and C haven't received these events yet, so they haven't stored or processed anything.
The user who created the tweet wants to edit it. To allow that, all three services A, B, and C must have processed all events and stored all required data in their databases, but services B and C aren't consistent yet. That means we can't provide the edit functionality at the moment.
As I see it, one possible workaround is switching to immediate consistency, but that would take away the benefits of a microservice-based architecture and could cause problems with tight coupling.
Another workaround is to restrict the user's actions until the data is consistent across all necessary services. This may be a solution, depending on the customer and their business requirements.
Yet another workaround is to add logic, or perhaps a service D, that stores edits as user actions and applies them to the data only once it is consistent. The drawback is greatly increased system complexity.
And there are two-phase commits, but they are 1) not really reliable and 2) slow.
I think slowness is a huge drawback under loads like Twitter's. That could perhaps be solved, whereas the lack of reliability cannot, again, without increasing the complexity of the solution.
So, the questions are:
Are there any nice solutions to the illustrated situation or only things that I mentioned as workarounds? Maybe some programming platforms or databases?
Have I misunderstood something, and are some of the workarounds incorrect?
Is there any other approach except Eventual Consistency that will guarantee that all data will be stored and all necessary actions will be executed by other services?
Why was eventual consistency picked for this use case? As I see it, right now it is the only way to guarantee that data will be stored or an action will be performed in an event-driven approach, where some services start their work when an event is fired; in my example, that event would be "tweet is created". So if services B and C go down, I need to be able to complete the action successfully once they are back up.
The things I would like to achieve are reliability, the ability to handle high loads, and a reasonable level of solution complexity. Any links on related subjects will be very much appreciated.
If this approach has natural limitations and what I want cannot be achieved with this paradigm, that is okay too. I just need to know whether this problem really is unsolved.
It is all about trade-offs. In your example, eventual consistency may mean that the user cannot edit for perhaps a few seconds, since most eventually consistent technologies don't take long to replicate data across nodes. In this use case that is perfectly acceptable, since users are fairly slow in their actions.
For example:
"MongoDB is consistent by default: reads and writes are issued to the primary member of a replica set. Applications can optionally read from secondary replicas, where data is eventually consistent by default."
(from the official MongoDB FAQ)
Another increasingly popular alternative is a streaming platform such as Apache Kafka, where your architecture determines how fast the stream consumers process the data (for eventual consistency). Since the streaming platform itself is very fast, making the data available in the right place depends mostly on the speed of your stream processor, so in most cases we are talking about milliseconds, not seconds.
The key thing in these sorts of architectures is to have each service be autonomous when it comes to writes: it can take the write even if none of the other application-level services are up.
So in the example of a twitter like service, you would model it as
Service A manages the content of a post
So when a user makes a post, a write happens in Service A's DB and from that instant the post can be edited because editing is just a request to A.
If there's some other service that consumes the "post content" change events from A and after a "new post" event exposes some functionality, that functionality isn't going to be exposed until that service sees the event (yay tautologies). But that's just physics: the sun could have gone supernova five minutes ago and we can't take any action (not that we could have) until we "see the light".
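A minimal in-memory sketch of that model, assuming hypothetical `PostService` (Service A) and `TimelineService` (a downstream consumer) types, with a buffered Go channel standing in for a durable event log such as a Kafka topic. A accepts creates and edits against its own store immediately; the consumer catches up whenever it processes the events, so an edit never has to wait for it.

```go
package main

import "fmt"

// PostEvent is a "post content" change event published by Service A (hypothetical).
type PostEvent struct {
	Kind    string // "created" or "edited"
	PostID  int
	Content string
}

// PostService (Service A) owns post content and is its only writer.
type PostService struct {
	posts  map[int]string
	nextID int
	events chan<- PostEvent
}

// Create writes locally first; it succeeds even if no consumer is running.
func (a *PostService) Create(content string) int {
	a.nextID++
	a.posts[a.nextID] = content
	a.events <- PostEvent{Kind: "created", PostID: a.nextID, Content: content}
	return a.nextID
}

// Edit needs only A's own data, so it works right after Create.
func (a *PostService) Edit(id int, content string) error {
	if _, ok := a.posts[id]; !ok {
		return fmt.Errorf("post %d not found", id)
	}
	a.posts[id] = content
	a.events <- PostEvent{Kind: "edited", PostID: id, Content: content}
	return nil
}

// TimelineService is a downstream consumer with its own eventually consistent copy.
type TimelineService struct {
	view map[int]string
}

// Apply folds one event into the consumer's local view.
func (b *TimelineService) Apply(e PostEvent) {
	b.view[e.PostID] = e.Content
}

func main() {
	bus := make(chan PostEvent, 16) // stand-in for a durable event log
	a := &PostService{posts: map[int]string{}, events: bus}
	b := &TimelineService{view: map[int]string{}}

	id := a.Create("hello")
	if err := a.Edit(id, "hello, world"); err != nil { // succeeds before B has seen anything
		fmt.Println(err)
	}
	fmt.Println("posts B knows about:", len(b.view))

	close(bus)
	for e := range bus { // B comes back up and replays the events in order
		b.Apply(e)
	}
	fmt.Println("B's copy:", b.view[id])
}
```

The channel's fixed buffer is only a demo shortcut; with a real log, A's writes would not block on how far behind the consumer is.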

Comparison with mainstream workflow engines

I'd like to use Spring SM in my next project, which has very simple workflows: 3-4 states, rule-based transitions, and max actors.
The WF is pretty fixed, so storing its definition in Java config is quite OK.
I'd prefer to use an SM rather than a WF engine, which comes with the whole machinery, but I couldn't find out whether there is a notion of an actor.
Meaning that only one particular user (determined by a login string) can trigger a given transition between states.
Also, can I run the same state machine definition in parallel? Is there a notion of an instance, like a process instance in WF jargon?
Thanks,
Milan
An actor with security is an interesting concept, but we don't have anything built in right now. I'd say this can be accomplished via Spring Security's method security, i.e. https://spring.io/blog/2013/07/04/spring-security-java-config-preview-method-security/, and there's more in its reference docs.
I could try to think whether there's something we could do to make this easier with Spring Security.
Parallel machines are on my to-do list. It is a big topic, so it will take a while to implement. Follow https://github.com/spring-projects/spring-statemachine/issues/35 and the other related tickets; that issue is the foundation for distributed state machines.
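Independently of Spring Statemachine's API, the two concepts from the question can be sketched in a few lines of Go: one fixed definition (states plus rule-based transitions, each naming the single actor allowed to fire it) and many independent instances created from it, each with its own current state. All names here are hypothetical and this is not how Spring models it; in Spring, the actor check would go through Spring Security as suggested above.

```go
package main

import "fmt"

// Transition is one rule: from a state, on an event, to a state, fired only by Actor.
type Transition struct {
	From, Event, To, Actor string
}

// Definition is the fixed workflow, shared by all instances.
type Definition struct {
	Initial     string
	Transitions []Transition
}

// Instance is one running copy of a Definition (like a process instance).
type Instance struct {
	def   *Definition
	State string
}

// NewInstance starts a fresh instance in the definition's initial state.
func (d *Definition) NewInstance() *Instance {
	return &Instance{def: d, State: d.Initial}
}

// Fire applies the event if a matching transition exists and user is its actor.
func (in *Instance) Fire(event, user string) error {
	for _, t := range in.def.Transitions {
		if t.From == in.State && t.Event == event {
			if t.Actor != user {
				return fmt.Errorf("user %q may not fire %q", user, event)
			}
			in.State = t.To
			return nil
		}
	}
	return fmt.Errorf("no transition for %q from %q", event, in.State)
}

func main() {
	def := &Definition{Initial: "DRAFT", Transitions: []Transition{
		{From: "DRAFT", Event: "submit", To: "REVIEW", Actor: "alice"},
		{From: "REVIEW", Event: "approve", To: "DONE", Actor: "bob"},
	}}
	i1, i2 := def.NewInstance(), def.NewInstance()
	err := i1.Fire("submit", "alice")
	fmt.Println(err, i1.State)               // alice may submit: i1 moves to REVIEW
	fmt.Println(i1.Fire("approve", "alice")) // rejected: only bob may approve
	fmt.Println(i2.State)                    // i2 is unaffected and still in DRAFT
}
```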

Block assignation using network topology

If I understand the principles correctly, when network topology is applied, block replicas are written:
On the client's server, if it hosts a datanode
On a second server defined on a different rack
On a third server defined on the same rack as #2
Is this policy configurable, or is it hard-coded in a class? Of course, I do not want to modify any class myself…
Basically, I would like to:
Take datacenters into account (from what I read, HDFS does not consider datacenters, even when using network topology)
Force the writes onto 3 distinct racks
How do I do that?
There is a way to override the baseline block placement algorithm, but it involves writing quite a bit of Java code, and there aren't any really good examples out there. Here is a blog post, with a link to the JIRA ticket, explaining the enhancement:
http://hadoopblog.blogspot.com/2009/09/hdfs-block-replica-placement-in-your.html
https://issues.apache.org/jira/browse/HDFS-385
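To make the difference between the two policies concrete, here is a simplified Go model of the placement rules described in the question (writer's node, then a node on another rack, then another node on that second rack) next to the desired "every replica on a distinct rack" rule. This is not HDFS code: a real policy is a Java class plugged in via the HDFS-385 mechanism, and the built-in one also weighs node load, free space, and randomization. The `Node` type and the deterministic first-match selection are assumptions made for illustration.

```go
package main

import "fmt"

// Node is a datanode with its rack location (hypothetical model).
type Node struct {
	Name, Rack string
}

// pick returns the first node that satisfies ok and is not already chosen.
func pick(nodes, chosen []Node, ok func(Node) bool) (Node, bool) {
	for _, n := range nodes {
		used := false
		for _, c := range chosen {
			if c.Name == n.Name {
				used = true
				break
			}
		}
		if !used && ok(n) {
			return n, true
		}
	}
	return Node{}, false
}

// defaultLike: writer's node (or any node), then another rack, then the same rack as #2.
func defaultLike(nodes []Node, writer string) []Node {
	first, ok := pick(nodes, nil, func(n Node) bool { return n.Name == writer })
	if !ok {
		if first, ok = pick(nodes, nil, func(Node) bool { return true }); !ok {
			return nil // empty cluster
		}
	}
	out := []Node{first}
	second, ok := pick(nodes, out, func(n Node) bool { return n.Rack != first.Rack })
	if !ok {
		return out
	}
	out = append(out, second)
	if third, ok := pick(nodes, out, func(n Node) bool { return n.Rack == second.Rack }); ok {
		out = append(out, third)
	}
	return out
}

// distinctRacks: writer's node if possible, then each further replica on a new rack.
func distinctRacks(nodes []Node, writer string, replicas int) []Node {
	var out []Node
	used := map[string]bool{} // racks that already hold a replica
	if n, ok := pick(nodes, out, func(n Node) bool { return n.Name == writer }); ok && replicas > 0 {
		out = append(out, n)
		used[n.Rack] = true
	}
	for len(out) < replicas {
		n, ok := pick(nodes, out, func(n Node) bool { return !used[n.Rack] })
		if !ok {
			break // fewer racks than replicas
		}
		out = append(out, n)
		used[n.Rack] = true
	}
	return out
}

func main() {
	cluster := []Node{
		{"dn1", "/dc1/r1"}, {"dn2", "/dc1/r1"},
		{"dn3", "/dc1/r2"}, {"dn4", "/dc1/r2"},
		{"dn5", "/dc1/r3"},
	}
	fmt.Println(defaultLike(cluster, "dn1"))      // replicas on dn1, dn3, dn4 (two racks)
	fmt.Println(distinctRacks(cluster, "dn1", 3)) // replicas on dn1, dn3, dn5 (three racks)
}
```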

Azure scalability over XML File

What is the best-practice solution for programmatically changing the XML file where the number of instances is defined? I know that this is somehow possible with csmanage.exe for the Windows Azure API.
How can I measure which Worker Role VMs are actually working? I asked this question on the MSDN Community forums as well: http://social.msdn.microsoft.com/Forums/en-US/windowsazure/thread/02ae7321-11df-45a7-95d1-bfea402c5db1
To modify the configuration, you might want to look at the PowerShell Azure Cmdlets. This really simplifies the task. For instance, here's a PowerShell snippet to increase the instance count of 'WebRole1' in Production by 1:
$cert = Get-Item cert:\CurrentUser\My\<YourCertThumbprint>
$sub = "<YourAzureSubscriptionId>"
$servicename = '<YourAzureServiceName>'
Get-HostedService $servicename -Certificate $cert -SubscriptionId $sub |
Get-Deployment -Slot Production |
Set-DeploymentConfiguration {$_.RolesConfiguration["WebRole1"].InstanceCount += 1}
Now, as far as actually monitoring system load and throughput: You'll need a combination of Azure API calls and performance counter data. For instance: you can request the number of messages currently in an Azure Queue:
http://yourstorageaccount.queue.core.windows.net/myqueue?comp=metadata
You can also set up your role to capture specific performance counters. For example:
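// Inside your RoleEntryPoint subclass (uses Microsoft.WindowsAzure.Diagnostics).
public override bool OnStart()
{
    var diagObj = DiagnosticMonitor.GetDefaultInitialConfiguration();
    // Sample each counter every 60 seconds.
    AddPerfCounter(diagObj, @"\Processor(*)\% Processor Time", 60.0);
    AddPerfCounter(diagObj, @"\ASP.NET Applications(*)\Request Execution Time", 60.0);
    AddPerfCounter(diagObj, @"\ASP.NET Applications(*)\Requests Executing", 60.0);
    AddPerfCounter(diagObj, @"\ASP.NET Applications(*)\Requests/Sec", 60.0);
    // Set the service to transfer logs every minute to the storage account
    diagObj.PerformanceCounters.ScheduledTransferPeriod = TimeSpan.FromMinutes(1.0);
    // Start Diagnostics Monitor with the new storage account configuration
    DiagnosticMonitor.Start("DiagnosticsConnectionString", diagObj);
    return base.OnStart();
}

// Helper (not shown in the original; one possible implementation): registers a counter.
private static void AddPerfCounter(DiagnosticMonitorConfiguration config, string counter, double sampleSeconds)
{
    config.PerformanceCounters.DataSources.Add(new PerformanceCounterConfiguration
    {
        CounterSpecifier = counter,
        SampleRate = TimeSpan.FromSeconds(sampleSeconds)
    });
}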
So this code captures a few performance counters into local storage on each role instance, then every minute those values are transferred to table storage.
The trick, now, is to retrieve those values, parse them, evaluate them, and then tweak your role instances accordingly. The Azure API will let you easily pull the perf counters from table storage. However, parsing and evaluating will take some time to build out.
Which leads me to my suggestion that you look at the Azure Dynamic Scaling Example on the MSDN code site. This is a great sample that provides:
A demo line-of-business app hosting a WCF service
A load-generation tool that pushes messages to the service at a rate you specify
A load-monitoring web UI
A scaling engine that can either be run locally or in an Azure role.
It's that last item you want to take a careful look at. It compares your performance counter data, as well as queue-length data, to configured thresholds and, based on those comparisons, scales your instances up or down accordingly.
Even if you end up not using this engine, you can see how data is grabbed from table storage, massaged, and used for driving instance changes.
Quantifying the load is actually very application-specific, particularly when thinking through the Worker Roles. For example, if you are running a large parallel processing application, the expected/hoped-for behavior would be 100% CPU utilization across the board, and the scale decision may be based on whether the work queue is growing or shrinking.
Further complicating the decision is the lag time for the various steps - increasing the Role Instance Count, joining the Load Balancer, and/or dropping from the load balancer. It is very easy to get into a situation where you are "chasing" the curve, constantly churning up and down.
As to your specific question about individual VMs: since all VMs in a role definition are identical, measuring a single VM (unless the deployment starts with a VM count of 1) should not tell you much, because all VMs sit behind a load balancer and/or pull from the same queue. Any variance should be transitory.
My recommendation would be to pick a metric to monitor that is not inherently highly variable (CPU, for example, is highly variable). Generally, you want to find a trending point: for web apps it may be the response queue, for parallel apps it may be Azure queue depth, etc., but in either case you should act on the trend rather than the absolute number. I would also suggest measuring at fairly broad intervals (minutes, not seconds). If you have a load you need to respond to within seconds, then realistically you will need to increase your running instance count ahead of time.
With regard to your first question, you can also use the Autoscaling Application Block to dynamically change instance counts based on a set of predefined rules.
