Recently I've been surveying Serverless, and I find some differences between the architecture of Serverless platforms such as AWS Lambda and OpenWhisk. Some platforms use container as function executor(and most of them are open-source like Fn).
However, commercial serverless platforms like Alibaba Cloud Function Compute prefer to use a combination of virtual machine and container where containers run on host VMs .
It confuses me a lot, and I want to ask why do these platforms need an extra host VM rather than plain container? What's the benefit of this? One possible answer, I guess, is to obtain higher security.
Related
I have a Go application running in a Kubernetes cluster which needs to read files from a large MapR cluster. The two clusters are separate and the Kubernetes cluster does not permit us to use the CSI driver. All I can do is run userspace apps in Docker containers inside Kubernetes pods and I am given maprtickets to connect to the MapR cluster.
I'm able to use the com.mapr.hadoop maprfs jar to write a Java app which is able to connect and read files using a maprticket, but we need to integrate this into a Go app, which, ideally, shouldn't require a Java sidecar process.
This is a good question because it highlights the way that some environments impose limits that violate the assumptions external software may hold.
And just for reference, MapR was acquired by HPE so a MapR cluster is now an HPE Ezmeral Data Fabric cluster. I am still training myself to say that.
Anyway, the accepted method for a generic program in language X to communicate with the Ezmeral Data Fabric (the filesystem formerly known as MapR FS) is to mount the file system and just talk to it using file APIs like open/read/write and such. This applies to Go, Python, C, Julia or whatever. Inside Kubernetes, the normal way to do this mount is to use a CSI driver that has some kind of operator working in the background. That operator isn't particularly magical ... it just does what is needful. In the case of data fabric, the operator mounts the data fabric using NFS or FUSE and then bind mounts[1] part of that into the pod's awareness.
But this question is cool because it precludes all of that. If you can't install an operator, then this other stuff is just a dead letter.
There are three alternative approaches that may work.
NFS mounts were included in Kubernetes as a native capability before the CSI plugin approach was standardized. It might still be possible to use that on a very vanilla Kubernetes cluster and that could give access to the data cluster.
It is possible to integrate a container into your pod that does the necessary FUSE mount in an unprivileged way. This will be kind of painful because you would have to tease apart the FUSE driver from the data fabric install and get it to work. That would let you see the data fabric inside the pod. Even then, there is no guarantee Kubernetes or the OS will allow this to work.
There is an unpublished Go file system client that users the low level data fabric API directly. We don't yet release that separately. For more information on that, folks should ping me directly (my contact info is everywhere ... email to ted.dunning hpe.com or gmail.com works)
The data fabric allows you to access data via S3. With the 7.0 release of Ezmeral Data Fabric, this capability is heavily revamped to give massive performance especially since you can scale up the number of gateways essentially without limit (I have heard numbers like 3-5GB/s per stateless connection to a gateway, but YMMV). This will require the least futzing and should give plenty of performance. You can even access files as if they were S3 objects.
[1] https://unix.stackexchange.com/questions/198590/what-is-a-bind-mount#:~:text=A%20bind%20mount%20is%20an,the%20same%20as%20the%20original.
Suppose I want to provision a simple stack in any of several public clouds from a single configuration. Will any existing IaC tools do this?
For example:
a virtual network (VPC, VNET)
one small* Ubuntu instance on the latest Xenial image from Canonical with a public IP (EC2, VM, GCE)
one big* server on the latest Trusty image from Canonical
an internet gateway (IGW)
I can do this easily using Terraform or Ansible. But, per my current understanding, this would be a separate Ansible playbooks or Terraform configurations for each cloud environment (AWS, Azure, GCP).
Does a tool exist that would allow me to point to a single configuration and pass the cloud in which to provision the stack as an option.
i.e.
toolname create --config=my_simple_stack --provider=azure
or
toolname create --config=my_simple_stack --provider=gcp
Then if my_simple_stack configuration needed to change, the change could be made in one place rather than three.
* sizes being ballpark as I realize that available VM sizes are not necessarily consistent across providers. So small might be 2+ core / 2+ GB RAM and big might be 16+ core / 16+ GB RAM depending on what the provider offers.
Thanks all. I suspect that there isn't tooling available to do this for the reason that ydaetskcoR said - it isn't necessarily a good idea.
Containerization is the better approach rather than trying to build like-for-like least-common-denominator environments across a bunch of different cloud platforms.
I am relatively new to all these, but I'm having troubles getting a clear picture among the listed technologies.
Though, all of these try to solve different problems, but do have things in common too. I would like to understand what are the things that are common and what is different. It is likely that the combination of few would be great fit, if so what are they?
I am listing a few of them along with questions, but it would be great if someone lists all of them in detail and answers the questions.
Kubernetes vs Mesos:
This link
What's the difference between Apache's Mesos and Google's Kubernetes
provides a good insight into the differences, but I'm unable to understand as to why Kubernetes should run on top of Mesos. Is it more to do with coming together of two opensource solutions?
Kubernetes vs Core-OS Fleet:
If I use kubernetes, is fleet required?
How does Docker-Swarm fit into all the above?
Disclosure: I'm a lead engineer on Kubernetes
I think that Mesos and Kubernetes are largely aimed at solving similar problems of running clustered applications, they have different histories and different approaches to solving the problem.
Mesos focuses its energy on very generic scheduling, and plugging in multiple different schedulers. This means that it enables systems like Hadoop and Marathon to co-exist in the same scheduling environment. Mesos is less focused on running containers. Mesos existed prior to widespread interest in containers and has been re-factored in parts to support containers.
In contrast, Kubernetes was designed from the ground up to be an environment for building distributed applications from containers. It includes primitives for replication and service discovery as core primitives, where-as such things are added via frameworks in Mesos. The primary goal of Kubernetes is a system for building, running and managing distributed systems.
Fleet is a lower-level task distributor. It is useful for bootstrapping a cluster system, for example CoreOS uses it to distribute the kubernetes agents and binaries out to the machines in a cluster in order to turn-up a kubernetes cluster. It is not really intended to solve the same distributed application development problems, think of it more like systemd/init.d/upstart for your cluster. It's not required if you run kubernetes, you can use other tools (e.g. Salt, Puppet, Ansible, Chef, ...) to accomplish the same binary distribution.
Swarm is an effort by Docker to extend the existing Docker API to make a cluster of machines look like a single Docker API. Fundamentally, our experience at Google and elsewhere indicates that the node API is insufficient for a cluster API. You can see a bunch of discussion on this here: https://github.com/docker/docker/pull/8859 and here: https://github.com/docker/docker/issues/8781
Join us on IRC # #google-containers if you want to talk more.
I think the simplest answer is that there is no simple answer. The swift rise to power of containers, and Docker in particular has left a power vacuum for "container scheduling and orchestration", whatever that might mean. In reality, that means you have a number of technologies that can work in harmony on some levels, but with certain aspects in competition. For example, Kubernetes can be used as a one stop shop for deploying and managing containers on a compute cluster (as Google originally designed it), but could also sit atop Fleet, making use of the resilience tier that Fleet provides on CoreOS.
As this Google vid states Kubernetes is not a complete out the box container scaling solution, but is a good statement to start from. In the same way, you would at some stage expect Apache Mesos to be able to work with Kubernetes, but not with Marathon, in as much as Marathon appears to fulfil the same role as Kubernetes. Somewhere I think I've read these could become part of the same effort, but I could be wrong about that - it's really about the strategic direction of Mesosphere and the corresponding adoption of Kubernetes principles.
In the DockerCon keynote, Solomon Hykes suggested Swarm would be a tier that could provide a common interface onto the many orchestration and scheduling frameworks. From what I can see, Swarm is designed to provide a smooth Docker deployment workflow, working with some existing container workflow frameworks such as Deis, but flexible enough to yield to "heavyweight" deployment and resource management such as Mesos.
Hope this helps - this could be an enormous post. I think the key is that these are young, evolving services that will likely merge and become interoperable, but we need to ride out the next 12 months to see how it plays out. There's some very clever people on the problem, so the future looks very bright.
As far as I understand it:
Mesos, Kubernetes and Fleet are all trying to solve a very similar problem. The idea is that you abstract away all your hardware from developers and the 'cluster management tool' sorts it all out for you. Then all you need to do is give a container to the cluster, give it some info (keep it running permanently, scale up if X happens etc) and the cluster manager will make it happen.
With Mesos, it does all the cluster management for you, but it doesn't include the scheduler. The scheduler is the bit that says, ok this process needs 2 procs and 512MB RAM, and I have a machine over there with that free, so I'll run it on that machine. There are some plugin schedulers available for Mesos: Marathon and Chronos and you can write your own. This gives you a lot of power of resource distribution and cluster scaling etc.
Fleet and Kubernetes seem to abstract away those sorts of details (so you don't have to write your own scheduler basically). This means you have to define your tasks and submit them in the format/manner defined by Fleet or Kubernetes and then they take over and schedule the tasks (containers) for you.
So I guess: Using Mesos may mean a bit more work in writing your own scheduler, but potentially provides more flexibility if required.
I think the idea of running Kubernetes on top of Mesos is that Kubernetes acts as the scheduler for Mesos. Personally I'm not sure what benefits this brings over running one or the other on its own though (hopefully someone will jump in and explain!)
As MikeB said.. it's early days, and it's all up for grabs (keep an eye on Amazon's ECS as well) so there are many competing standards and a lot of overlap!
-edit- I didn't mention Docker swarm as I don't really have much experience with it.
For anyone coming to this after 2017 fleet is deprecated. Do not use it anymore.
Fleet docs say "fleet is no longer actively developed or maintained by CoreOS" and link to Container orchestration: Moving from fleet to Kubernetes. Fleet was removed from Container Linux (formerly known as CoreOS Linux) and replaced with Kubernetes kubelet (agent). This coincided with a corporate pivot to offer Tectonic (a Kubernetes distro) as their primary product.
I really like the approach of a 12factor app, which you are kinda forced into, when you deploy an application to Heroku. For this question I'm particularly interested in setting environment variables for configuration, like one would do on Heroku.
As far as I can tell, there's no way to change the ENV for one or multiple instances within the EC2 console (though it's seems to be possible to set 5 ENV vars when using elastic beanstalk). Therefore my next bet on an Ubuntu based system would be to use /etc/environment, /etc/profile, ~/.profile or just the export command to set ENV variables.
Is this the correct approach or am I missing something?
And if so, is there a best practice on how to do it? I guess I could use something like Capistrano or Fabric, get a list of servers from the AWS api, connect to all of them and change the mentioned files/call export. Though 12factor is pretty well known, I couldn't find any blog post describing how to handle the ENV for a non-trivial amount of instances on EC2. And I don't want to implement such a thing, if somebody already did it very well and I just missed something.
Note: I want a solution without using elastic beanstalk and I don't care about git push deployment or any other Heroku-like feature, this is solely related to app configuration.
Any hints appreciated, thanks!
Good question. There are many ways you can approach your deployment/environment setup.
One thing to keep in mind is that with Heroku (or Elastic Beanstalk for that matter) you only push the code. Their service takes care of the scalability factor and replication of your services across their infrastructure (once you push the code).
If you are using fabric (or capistrano) you are using a push model too, but you have to take care of all the scalability/replication/fault tolerance of your application.
Having said that, if you are using EC2, in my opinion it's better if you leverage AMIs, Autoscale and Cloudformation for your deployments. This is the beauty of elasticity and Virtualization in that you can think of resources as ephemeral. You can still use fabric/capistrano to automate the AMI builds (I use Ansible) and configure environment variables, packages, etc. Then you can define a Cloudformation stack (with a JSON file) and in it you can add an autoscaling group with your prebaked AMI.
Another way of deploying your app is to simply use the AWS Opsworks service. It's pretty comprehensive and it has a lot of options but it may not be for everybody since some people may want a bit more flexibility.
If you want to go 'pull' model you can use Puppet, Chef or CFEngine. In this case you have a master policy server somewhere in the cloud (Puppetmaster, Chef Server or Policy Server). When a server gets spun up, an agent (Puppet agent, Chef Client, Cfengine agent) connects to its master to pick up its policy and then executes it. The policy may contains all the packages and environment variables that you need for your application to function. Again, it's a different model. This model scales pretty well but it depends on how many agents the master can handle and how you stagger the connections from the agents to the master. You can load balance multiple masters too if you want to scale to thousands of servers or you can just simply use multiple masters. From experience, if you want something really "fast" Cfengine works pretty good, there's a good blog comparing the speed of Puppet and CFengine here: http://www.blogcompiler.com/2012/09/30/scalability-of-cfengine-and-puppet-2/
You can also go "push" completely with tools like fabric, Ansible, Capistrano. However, you are constrained by how much a single server (or laptop) can handle multiple connections to thousands of servers that its trying to push to. This is also constrained by network bandwidth, but hey you can get creative and stagger your push updates and perhaps use multiple servers to push. Again it works and it's a different model so it depends which direction you want to go.
Hope this helps.
If you dont need beanstalk, you can look at AWS Opsworks (http://aws.amazon.com/opsworks/). Ideal for Web worker kind of deployment scenerios. You can pass any variable from outside the code here (even Chef recipies)
It's might be late but they what we are doing.
We have python script that take env var in Json and send that to as post data to another python script that convert those vars to ymal file.
After that we use Jenkins pipline groovy using multibranch. Jenkins do all the build and then code deploy copies those env vars to ec2 instanced running in autoscaling.
Off course we are doing some manapulation from yaml to simple text file so code deploy can paste it on /etc/envoirments
What can I do with Cloudformation that I can not do with chef? It looks like chef support spawning node in the ec2 and I can read back the information of the spawned nodes (ip,..) so why would still need cloudformation for?
Very little, but what it does makes it worth using.
The primary advantage of CloudFormation is that Amazon ties created resources together. In other words if your application comprises a db, four webservers, an autoscaling group, a launch configuration, ingress rules, some VPC subnets, an internet gateway or two, and a VPN connection, you can manage them in a single place, as a CloudFormation stack. Want to shoot them? That's easy. Kill the stack and every resource dies with it.
Sure you could technically do this with Chef and the Amazon API. CloudFormation is little more than a way of generating extremely sophisticated sets of AWS API calls + Magic (tm), so you could always roll your own. Netflix sort of did that with their OS tool, Asgard, and RightScale and other services are more or less that. If those softwares don't meet your needs or if you can't afford them, CloudFormation is a nice supplement to AWS deployments. In fact, you don't even need to rely on it. It's quite simple to launch chef-solo from CloudFormation, which allows you to leverage the advantages of both.