Autoscaling VertexAI pipeline components - google-ai-platform

I am exploring Vertex AI Pipelines and understand that it is a managed alternative to, say, AI Platform Pipelines (where you have to deploy a GKE cluster to be able to run Kubeflow pipelines). What I am not clear on is whether Vertex AI will autoscale the cluster depending on the load. In the answer to a similar question, it is mentioned that for pipeline steps that use GCP resources such as Dataflow, etc., autoscaling is done automatically. In the Google docs, it is mentioned that for components one can set resources such as CPU_LIMIT, GPU_LIMIT, etc. My question is: can these limits be set for any type of component, i.e., Google Cloud pipeline components or custom components, whether Python function-based or packaged as a container image? Secondly, do these limits mean that the component's resources will autoscale until they hit those limits? And if these options are not specified at all, how are the resources allocated then, and will they autoscale as Vertex AI sees fit?
Links to relevant docs and resources would be really helpful.

To answer your questions,
1. Can these limits be set for any type of components?
Yes. These limits apply to all Kubeflow components and are not specific to any particular type of component.
Any of these components can be implemented to perform its task with a set amount of resources.
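For example, here is a minimal sketch of setting per-task limits on a Python function-based component with the KFP v2 SDK; the exact method names vary slightly between SDK versions, and the preprocess component is just a placeholder:

```python
# Minimal sketch (KFP v2 SDK style); method names may differ between SDK versions.
from kfp import dsl, compiler

@dsl.component
def preprocess(rows: int) -> int:
    # Placeholder work; a real component would do actual preprocessing here.
    return rows * 2

@dsl.pipeline(name="resource-limit-demo")
def pipeline(rows: int = 1000):
    task = preprocess(rows=rows)
    # Limits are set per task; Vertex AI picks a machine that satisfies them.
    task.set_cpu_limit("8")       # up to 8 vCPUs
    task.set_memory_limit("32G")  # up to 32 GB of RAM
    # Optional GPU request; accelerator method names differ across SDK versions:
    # task.set_accelerator_type("NVIDIA_TESLA_T4").set_accelerator_limit(1)

compiler.Compiler().compile(pipeline, "pipeline.json")
```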
2. Do these limits mean that the component resources will autoscale till they hit the limits?
No, there is no autoscaling performed by Vertex AI. Based on the limits set, Vertex AI chooses one suitable VM to perform the task.
Having a pool of workers is supported in Google Cloud Pipeline Components such as CustomContainerTrainingJobRunOp and CustomPythonPackageTrainingJobRunOp as part of distributed training in Vertex AI. Otherwise, only one machine is used per step.
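As a rough illustration, a distributed training step could be declared along these lines. This is only a sketch: the parameter names mirror aiplatform.CustomContainerTrainingJob.run(), the import path and signature may differ between versions of google_cloud_pipeline_components, and the project and image names are placeholders:

```python
# Rough sketch only; kwargs follow CustomContainerTrainingJob.run() and may
# differ by google_cloud_pipeline_components version.
from kfp import dsl
from google_cloud_pipeline_components.aiplatform import CustomContainerTrainingJobRunOp

@dsl.pipeline(name="distributed-training-demo")
def pipeline(project: str = "my-project"):  # hypothetical project id
    CustomContainerTrainingJobRunOp(
        project=project,
        display_name="distributed-train",
        container_uri="gcr.io/my-project/trainer:latest",  # hypothetical image
        # One chief replica plus (replica_count - 1) workers form the pool.
        replica_count=4,
        machine_type="n1-standard-8",
        accelerator_type="NVIDIA_TESLA_T4",
        accelerator_count=1,
    )
```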
3. What happens if these limits are not specified? Does Vertex AI scale the resources as it sees fit?
If the limits are not specified, an e2-standard-4 VM (4 vCPUs, 16 GB of memory) is used for task execution as the default option.
EDIT: I have updated the links with the latest version of the documentation.

Related

Using Kuma to run a multi-cloud service mesh

How can I use Kuma to run a multi-cloud service mesh that spans across a VM-based environment as well as a Kubernetes-based environment?
Specifically, how will service discovery work in such a way that VM-based workloads can discover K8s-based ones and vice-versa?
Kuma defines the so-called zone as a domain of control isolation, i.e. all workload connections within it are managed by a single control plane. Such a control plane is called a remote control plane. The overall view and policy management are done in a global control plane, which unifies all zones.
When planning a distributed deployment, one has to consider the following items:
Where the global control plane will be deployed and its type. The latter can be either Universal (VM/bare metal/container) or Kubernetes (on-premise/cloud).
Number and type of zones to add. These can be changed over time.
Follow the instructions to install the global control plane, using the steps specific to the chosen type of deployment. Gather the relevant IP addresses/ports as described.
Installing a remote control plane is fairly trivial. This process can be repeated as needed during the lifetime of the whole multi-zone deployment.
Cross-zone service consumption is described in brief here. In short, we do recommend using the following syntax to access a service echo-server, deployed in a Kubernetes namespace echo-example and exposed on port 1010:
<kuma-enabled-pod>$ curl http://echo-server_echo-example_svc_1010.mesh
Using this syntax, the service can be found and consumed even from a neighbouring Universal zone where the workload runs in a VM. Kuma leverages its own DNS service, which allows for this service discovery.
It is recommended that services declared in VMs follow the same service naming format, so that if a replica of the service is ever needed in a Kubernetes cluster, the two can be easily interchanged without reconfiguring the whole infrastructure.
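To make that concrete, here is a tiny Python sketch of a consumer that depends only on the .mesh name (assuming the workload runs alongside a Kuma sidecar), so the backing workload can move between a VM and a Kubernetes cluster without any code change:

```python
# Sketch: the consumer only knows the Kuma DNS name, not where the service runs.
import requests  # assumes the calling workload sits behind a Kuma data plane proxy

# Same .mesh name regardless of which zone/environment hosts the service.
SERVICE_URL = "http://echo-server_echo-example_svc_1010.mesh"

def call_echo_server(payload: str) -> str:
    response = requests.post(SERVICE_URL, data=payload, timeout=5)
    response.raise_for_status()
    return response.text
```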

Fargate vs Lambda, when to use which?

I'm pretty new to the whole Serverless landscape, and am trying to wrap my head around when to use Fargate vs Lambda.
I am aware that Fargate is a serverless subset of ECS, and Lambda is serverless as well but driven by events. But I'd like to be able to explain the two paradigms in simple terms to other folks that are familiar with containers but not that much with AWS and serverless.
Currently we have a couple of physical servers in charge of receiving text files, parsing them out, and populating several db tables with the results. Based on my understanding, I think this would be a use case better suited for Lambda because the process that parses out the text files is triggered by a schedule, is not long running, and ramps down when not in use.
However, if we were to port over one of our servers that receive API calls, we would probably want to use Fargate because we would always need at least one instance of the image up and running.
In terms of containers, and in very general terms, would it be safe to say that if the container is designed to do:
docker run <some_input>
Then it is a job for Lambda.
But if the container is designed to do something like:
docker run --expose 80
Then it is a job for Fargate.
Is this a good analogy?
That's the start of a good analogy. However, Lambda also has limitations in terms of available CPU and RAM, and a maximum run time of 15 minutes per invocation. So anything that needs more resources, or needs to run for longer than 15 minutes, would be a better fit for Fargate.
Also I'm not sure why you say something is a better fit for Fargate because you "always need at least one instance running". Lambda+API Gateway is a great fit for API calls. API Gateway is always ready to receive the API call and it will then invoke a Lambda function to process it (if the response isn't already cached).
It is important to note that with Lambda you don't need to build, secure, or maintain a container; you just worry about the code. As mentioned already, Lambda has a maximum run time and a 3 GB memory limit (CPU increases proportionally). Also, if it is used only sporadically it may need to be pre-warmed (called on a schedule) for extra performance.
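To illustrate the "just the code" point for the text-file-parsing use case from the question, a handler might look roughly like this; the bucket, key, and table names are hypothetical, not taken from the question:

```python
# Hypothetical scheduled Lambda handler: reads a dropped text file from S3,
# parses it, and writes one item per row to a DynamoDB table.
import csv
import boto3

s3 = boto3.client("s3")
dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("parsed-records")  # hypothetical table name

def handler(event, context):
    # An EventBridge schedule invokes this function; no server stays running.
    obj = s3.get_object(Bucket="incoming-files", Key="latest.txt")  # placeholders
    rows = csv.reader(obj["Body"].read().decode("utf-8").splitlines(), delimiter="|")
    count = 0
    for row in rows:
        table.put_item(Item={"id": row[0], "payload": "|".join(row[1:])})
        count += 1
    return {"rows_written": count}
```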
Fargate manages Docker containers, which you need to define, maintain, and secure. If you need more control over what is available in the environment where your code runs, you could potentially use a container (or a server), but that again comes with the management burden. You also have more options for memory/CPU size and for how long your task can run.
Even for an API server, as you mentioned, you could put API Gateway in front and have it call Lambda.
As Mark has already mentioned, you can use Lambda + API Gateway to expose your Lambda function as an API.
But Lambda has significant limitations in terms of function execution. There are limits on the programming languages supported, memory consumption, and execution time (recently increased to 15 minutes from the earlier 5 minutes). This is where AWS Fargate can help, by giving you the benefits of both the container world and the serverless (FaaS) world. Here you only worry about the container (its CPU and memory requirements, IAM policies, ...) and leave the rest to Amazon ECS by choosing the Fargate launch type. ECS will choose the right instance type and manage your cluster, its autoscaling, and optimal utilization.
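For instance, a one-off Fargate task can be launched with a few lines of boto3. This is only a sketch; the cluster, task definition, and subnet values are placeholders:

```python
# Sketch: run a single container as a Fargate task via the ECS API.
import boto3

ecs = boto3.client("ecs")

response = ecs.run_task(
    cluster="my-cluster",             # hypothetical cluster name
    launchType="FARGATE",
    taskDefinition="file-parser:1",   # hypothetical task definition
    count=1,
    networkConfiguration={
        "awsvpcConfiguration": {
            "subnets": ["subnet-0123456789abcdef0"],  # placeholder subnet id
            "assignPublicIp": "ENABLED",
        }
    },
)
print(response["tasks"][0]["taskArn"])
```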
This is the right analogy, but it is not an exhaustive way to explain the two paradigms.
In general, Lambda is more suited to purely serverless applications. By nature it is function-as-a-service (FaaS): it handles simple tasks and that's all; don't expect much more.
It should be considered the first option for a serverless module, but it also has more limitations and restrictions. A module's architecture follows from its functional and non-functional requirements, the surrounding infrastructure, and many other factors.
At a minimum, to make a decision you should review restrictions such as:
Portability
Environment control
Trigger type
Response time
Response size
Process time
Memory usage
These are the main factors, but the list does not cover all the factors and restrictions to consider when choosing between these two serverless technologies.
To learn more, I recommend this article: https://greenm.io/aws-lambda-or-aws-fargate/

GCE managed groups api for horizontally scaling kubernetes nodes

I would like to scale Kubernetes nodes according to unscheduled pods: if I have pods that can't be scheduled because of their resource requirements, I want to add a new node to the cluster.
Looking at the autoscaling feature of managed instance groups in GCE, this doesn't seem to be possible, as their model requires a metric per node in the group, while my metric is global.
Can anyone confirm that this can't be achieved with the current GCE solution?
Does anyone know of an existing tool, blog post, or anything else that could help in implementing a solution?
Assuming I'm going to roll my own, I'm having trouble finding an API that controls GCE managed instance groups (allows adding a node, removing a node).
thanks,
Nathan
If you are fine with the standard per-node metrics, read the "Horizontal auto-scaling of nodes (GCE)" section of the Kubernetes cluster management guide to enable the autoscaler.
If you want custom metrics, you can check out the GCE documentation.
There is also a similar question on Stack Overflow, and the author of one of the answers said that after writing their own custom metrics, the standard per-node metrics were found to be just as good, if not better, for their use case.
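On the question of an API to add or remove nodes yourself: the Compute Engine API exposes instanceGroupManagers.resize, which sets the target size of a managed instance group. A hedged sketch with the Python API client follows; the project, zone, and group names are placeholders, and credentials are assumed to come from Application Default Credentials:

```python
# Sketch: resize a GCE managed instance group via the Compute Engine API.
from googleapiclient import discovery

compute = discovery.build("compute", "v1")

def scale_node_group(project: str, zone: str, group: str, size: int):
    # Setting the target size adds or removes instances in the managed group;
    # nodes created this way still have to register with the Kubernetes master.
    request = compute.instanceGroupManagers().resize(
        project=project,
        zone=zone,
        instanceGroupManager=group,
        size=size,
    )
    return request.execute()

# Example with hypothetical values:
# scale_node_group("my-project", "us-central1-b", "k8s-node-group", 5)
```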

Bluemix Spark and Hadoop Service Configuration

Having run through configuration of both the Hadoop Big Insights and Apache Spark services on Bluemix, I noticed that Hadoop is very configurable. I have a choice of how many nodes there will be in the cluster, and of the RAM, CPU cores, and hard disk space of those nodes.
But the Spark service seems less configurable. The only choice I have is to choose between 2 and 30 Spark executors.
I am working with Bluemix as part of an IBM IC4 project to evaluate these services, so I have a few questions about this.
Is it possible to configure the Spark service in a similar way to the Hadoop service? i.e. choose nodes, RAM of nodes, CPU cores etc.
What are Spark executors in this context? Are they nodes? If so, what are their specifications?
Is there a plan to improve the options for Spark's configuration in the future?
Apologies for the questions but I need to know these specifications in order to carry out my work.
The Big Insights service is what some would call a hosted service, which is to say that when you provision an instance of this service you get your own cluster with nodes configured as specified in the chosen plan. Consequently, you'll want to know exactly what each node you're paying for gives you. On the other hand, the Apache Spark service is a shared compute service, wherein you pay for compute to run your Spark programs. Running Spark is about in-memory compute and creating RDDs over sources of data hosted by other data services. So in this context, what matters is how many concurrent jobs you can run and how many parallel tasks you can run with how much memory, and so on. In the Spark service plan, executors seem to be an abstraction over this compute horsepower; unfortunately, it is hard to map that to physical hardware if you care about that. The plan description needs more elaboration and detail about how to translate this abstraction into your workload's needs.
However, I understand that this should be improved considerably at some point in the near future. There have been rumors about moving to a single Spark service plan where you can dial in, whenever you want, how much compute you need, and that would take effect when you click "go" for all Spark jobs from that point forward; it seems you could twiddle the dials until you get what you want, see what it would cost, and then lock it in until the next time you need to change it. I can imagine something even more dynamic than that on a per-job basis. In any case, that seems to be the direction things may be going for this compute service.

Consul infrastructure footprint and performance

Is there any documentation on Consul's requirements as they pertain to infrastructure footprint (e.g. memory/disk/CPU requirements and typical usage of Consul agents and the servers themselves)? How does this compare with other similar service discovery solutions?
I know of no official documentation on this. It will depend in part on the total number of nodes you are running (i.e. server agents plus client agents on each machine running services).
This thread includes comments (and a good link to a presentation by Darron at Datadog) from production users. They indicate using AWS m3.medium to m3.large instances in their production setups (you can find the specs for those instance types here). Darron's presentation includes some information on the number of nodes in their scenario, as well as comments on how they scaled as the number of nodes grew.
