Cluster of dokku nodes running Elixir/Phoenix - cluster-computing

How can I configure a bunch of dokku nodes (I use the DigitalOcean One-Click Droplets) running an Elixir/Phoenix system to work as a cluster? I found an article describing Elixir clusters in general (https://dockyard.com/blog/2016/01/28/running-elixir-and-phoenix-projects-on-a-cluster-of-nodes), but I do not know how to apply this to dokku.

Dokku Maintainer here: We might not be the best fit for your use case. We're a platform optimized for single-server solutions. While this doesn't mean that you couldn't do multi-server, I believe that Elixir requires direct TCP (not HTTP) access in order to set up clustering. You could get around this by implementing a custom proxy plugin on top of haproxy instead of nginx.
If you have a simpler solution, I would always go with that :)

Related

How to provide mutual TLS (mTLS) with Spring application in Kubernetes?

I have an interesting problem, maybe you could help me out.
Given are two Spring applications, app1 and app2. Plenty of REST calls are made to both services. I need to implement a security solution in which the two can communicate with each other over REST, protected by mutual TLS (mTLS, where each app presents its own certificate to the other).
Implementing it the standard way is not that hard, since Spring has solutions for it (keystores etc.), but the twist is that I have to do it in a Kubernetes environment.
The two apps are not in the same cluster: app1 is in our cluster, but app2 is deployed in one of our partners' systems.
I am pretty new to k8s and not sure what the best method is to achieve this. Should I store the certs or the keystore(s) as secrets? Should I use and configure the nginx ingress somehow, or would Istio be useful? I would really like to find the optimal solution, but I don't know the right way.
I would really like to configure it outside my app and let k8s take care of it, but I am not sure if that is the right thing to do.
Any help would be really appreciated: some guidance to find the right path, or some real-life examples.
Thank you for your help!
Mikolaj has probably covered everything, but let me add my two cents.
I don't have much experience working with Istio, but I would also suggest checking out the Linkerd service mesh.
Step 1.
Even if you are multi-cloud (GKE and EKS, for example), it will still work.
See the Linkerd multicluster guide for details and installation instructions.
Linkerd uses a trust anchor shared between the clusters so that traffic flows encrypted and is not exposed to the public internet.
You have to generate a certificate that forms a common base of trust between the clusters.
Each proxy gets a copy of the certificate and uses it for validation.
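To make the validation step concrete, here is a minimal Python sketch of what "mutual" means on the TLS level. It is not how Linkerd is implemented (the mesh proxies do this for you); the function name and the three file-path parameters are hypothetical and only serve the illustration.

```python
import ssl


def make_mtls_server_context(ca_file=None, cert_file=None, key_file=None):
    """Build a server-side TLS context that rejects clients without a valid cert.

    All three paths are hypothetical placeholders; pass real PEM files to
    actually serve traffic.
    """
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    # CERT_REQUIRED is what makes the TLS "mutual": the client must present
    # a certificate that chains to a CA this side trusts.
    ctx.verify_mode = ssl.CERT_REQUIRED
    if ca_file:
        # The shared trust anchor used to validate the peer's certificate.
        ctx.load_verify_locations(cafile=ca_file)
    if cert_file:
        # This side's own identity, presented to the peer.
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx
```

With a service mesh, the equivalent of this configuration lives in the sidecar proxy, so the application itself can keep speaking plain HTTP to its local proxy.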
The answer to your problem will be more complex as there is no one-size-fits-all solution that turns out to be the best. It all depends on what exactly you want to do and what tools you have for it. suren mentioned it very well in the comment:
if you are still in the stage of PoC, then note that there are couple of ways of achieving what you want. Istio would be a valid way, for example. You could have the other service in a ServiceEntry, enable mTLS and there you go. You don't have to even manage secrets for this specific scenario, as it is automatic. But there are other ways. Even with Istio there are other ways. If you are on any cloud provider, you might have some managed services as well
This is a very good comment, and I would also recommend an Istio-based solution. First of all, check the official mTLS documentation for Istio; you will find specific usage examples and sample configuration files there.
You also mentioned in the question that your applications run in two different clusters. Take a look at this tutorial, which shows exactly how to handle that situation:
Istio injects an envoy sidecar to every pod and makes sure all the traffic goes through the envoy proxy. Envoy proxies compose the data plane. The control plane manages the Envoy sidecars. In previous versions of Istio, the control plane used to have other components, such as Pilot, Citadel, and Galley. These components got consolidated into a single binary called “istiod”. The control plane also deals with the configurations, certificates, secrets, and health checking.
For more information, see also this related problem on Stack Overflow and another tutorial.
Keep in mind that, in addition to Istio itself, you can use ready-made cloud solutions, for example on GKE: Configuring TLS and mTLS on the Istio ingress.
Another option might be Anthos Service Mesh; see Anthos Service Mesh by example: mTLS.

How to configure kube-proxy master_url with multiple apiservers

I'm using a cluster setup with multiple apiservers with a loadbalancer in front of them for external access, with an installation on bare metal.
As mentioned in the High Availability Kubernetes Clusters docs, I would like to use internal load balancing via the kubernetes service within my cluster. This works fine so far, but I'm not sure what the best way to set up kube-proxy is. It obviously cannot use the service IP, since it does the proxying to that IP based on data from the apiserver (master). I could use the IP of any one of the apiservers, but that would lose the high availability. So the only viable option I currently see is to use my external loadbalancer, but this seems somehow wrong.
Does anybody have ideas or best practices?
This is quite an old question, but as the problem persists, here it goes.
There is a bug in the Kubernetes restclient: it does not allow the use of more than one IP/URL, as it always picks the first IP/URL in the list. This affects kube-proxy and also kubelet, leaving a single point of failure in those tools if you don't use a load balancer (as you did) in a multi-master setup. The load balancer is probably not the most elegant solution ever, but currently (I think) it is the easiest one.
Another solution (which I prefer, but which may not work for everyone and does not solve all the problems) is to create a DNS entry that round-robins across your API servers. As pointed out in one of the links below, though, that only solves the load balancing, not the HA.
You can see the progress of this story in the following links:
The kube-proxy/kubelet issue: https://github.com/kubernetes/kubernetes/issues/18174
The restclient PR: https://github.com/kubernetes/kubernetes/pull/30588
The "official" solution: https://github.com/kubernetes/kubernetes/issues/18174#issuecomment-199381822
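The difference between "always pick the first URL" and real failover can be shown with a small Python sketch. This is not the restclient's code, only an illustration of the behaviour the answer says is missing; the function name, the addresses and the health probe are all hypothetical.

```python
def pick_apiserver(endpoints, is_healthy):
    """Return the first endpoint that passes the health probe, or None."""
    for endpoint in endpoints:
        if is_healthy(endpoint):
            return endpoint
    return None


# Hypothetical addresses; a real probe might request a health endpoint on each.
APISERVERS = [
    "https://10.0.0.1:6443",
    "https://10.0.0.2:6443",
    "https://10.0.0.3:6443",
]

# Simulate the first apiserver being down.
down = {"https://10.0.0.1:6443"}
chosen = pick_apiserver(APISERVERS, lambda url: url not in down)
print(chosen)  # prints https://10.0.0.2:6443
```

A client that only ever uses `APISERVERS[0]` fails as soon as that one server goes down, which is the single point of failure described above. Round-robin DNS spreads the load but, without a health check like this, still hands out dead addresses.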
I think the way it is meant to be set up is that you have a kube-proxy on each master node, so each kube-proxy points to its own master on 127.0.0.1 / localhost.
The podmaster determines which api-server should run, which in turn makes use of the local proxy of that master.

How can a Phoenix application tailored only to use channels scale on multiple machines? Using HAProxy? How to broadcast messages to all nodes?

I use a Node application purely for socket.io channels with Redis PubSub; at the moment it is spread across 3 machines, backed by nginx load balancing on one of the machines.
I want to replace this Node application with a Phoenix application. I'm still new to the Erlang/Elixir world, so I haven't figured out how a single Phoenix application can span more than one machine. Googling all possible scaling and load balancing terms yielded nothing.
The 1.0 release notes mention this regarding channels:
Even on a cluster of machines, your messages are broadcasted across the nodes automatically
1) So I basically deploy my application to N servers, starting the Cowboy server on each of them, similar to what I do with Node, and then tie them together with nginx/HAProxy?
2) If that is the case, how are channel messages broadcast across all nodes, as mentioned in the release notes?
EDIT 3: Taking Theston's answer, which clarifies that there is no such thing as a Phoenix application, only Elixir/Erlang applications, I updated my search terms and found some interesting results regarding scaling and load balancing.
A free extensive book: Stuff Goes Bad: Erlang in Anger
Erlang pooling libraries recommendations
EDIT 2: Found this from Elixir's creator:
Elixir provides conveniences for process grouping and global processes (shared between nodes) but you can still use external libraries like Consul or Zookeeper for service discovery or rely on HAProxy for load balancing for the HTTP based frontends.
EDITED: Connecting Elixir nodes on the same LAN is the first one that mentions communication between Elixir nodes, but it isn't related to Phoenix itself, and it is not clear how it relates to load balancing and to each Phoenix node communicating with the others.
Phoenix isn't the application. When you generate a Phoenix project, you create an Elixir application with Phoenix being just a dependency (effectively a bunch of things that make building the web part of your application easier).
Therefore any node distribution you need can still happen within your Elixir application.
You could just use Phoenix for the web routing and then pass the data on to your underlying Elixir app to handle the distribution across nodes.
It's worth reading http://www.phoenixframework.org/v1.0.0/docs/channels (if you haven't already) where it explains how Phoenix channels are able to use PubSub to distribute (which can be configured to use different adapters).
Also, are you spinning up cowboy on your deployment servers by running mix phoenix.server ?
If so, then I'd recommend looking at EXRM https://github.com/bitwalker/exrm
This will bundle your Elixir application into a self contained file that you can simply deploy to your production servers (with Capistrano if you like) and then you start your application.
It also means you don't need any Erlang/Elixir dependencies installed on the production machines either.
In short, Phoenix is not like Rails, Phoenix is not the application, not the stack. It's just a dependency that provides useful functionality to your Elixir application.
Unless I am misunderstanding your use case, you can still use the exact scaling technique your Node version of the application uses. Simply deploy the Phoenix application to more than one machine and use an nginx load balancer configured to forward requests to one of the application machines.
The built-in node communication features of Erlang are used for applications that scale in a different way than a web app, for instance distributed databases or queues.
Look at Phoenix.PubSub
It's where Phoenix internally has the Channel communication bits.
It currently has two adapters:
Phoenix.PubSub.PG2 - uses Distributed Elixir, directly exchanging notifications between servers. (This requires that you deploy your application as a distributed Elixir/Erlang cluster.)
Phoenix.PubSub.Redis - uses Redis to exchange data between servers. (This should be similar to solutions found in socket.io and others)
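Conceptually, both adapters do the same job: a message published on one server is handed to every server, and each server fans it out to the clients connected to it. The following toy model in Python illustrates only that idea; it is not Phoenix's API, and the `Node` and `Cluster` classes are hypothetical names invented for the sketch.

```python
from collections import defaultdict


class Node:
    """One application server holding its own locally connected subscribers."""

    def __init__(self, name):
        self.name = name
        self.subscribers = defaultdict(list)  # topic -> list of callbacks

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def deliver_local(self, topic, message):
        for callback in self.subscribers[topic]:
            callback(message)


class Cluster:
    """Stands in for the transport between servers (distributed Erlang or Redis)."""

    def __init__(self, nodes):
        self.nodes = nodes

    def broadcast(self, topic, message):
        # Every node receives the message and fans it out to its own clients.
        for node in self.nodes:
            node.deliver_local(topic, message)


received = []
a, b = Node("a"), Node("b")
a.subscribe("room:lobby", lambda m: received.append(("a", m)))
b.subscribe("room:lobby", lambda m: received.append(("b", m)))
Cluster([a, b]).broadcast("room:lobby", "hello")
print(received)  # prints [('a', 'hello'), ('b', 'hello')]
```

The load balancer in front only decides which server a client connects to; the pub/sub layer behind it is what makes a broadcast reach clients on the other servers.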

Heroku-like deployment and environment configuration via EC2

I really like the approach of a 12factor app, which you are kinda forced into, when you deploy an application to Heroku. For this question I'm particularly interested in setting environment variables for configuration, like one would do on Heroku.
As far as I can tell, there's no way to change the ENV for one or multiple instances within the EC2 console (though it seems to be possible to set 5 ENV vars when using Elastic Beanstalk). Therefore my next bet on an Ubuntu-based system would be to use /etc/environment, /etc/profile, ~/.profile or just the export command to set ENV variables.
Is this the correct approach or am I missing something?
And if so, is there a best practice on how to do it? I guess I could use something like Capistrano or Fabric, get a list of servers from the AWS api, connect to all of them and change the mentioned files/call export. Though 12factor is pretty well known, I couldn't find any blog post describing how to handle the ENV for a non-trivial amount of instances on EC2. And I don't want to implement such a thing, if somebody already did it very well and I just missed something.
Note: I want a solution without using elastic beanstalk and I don't care about git push deployment or any other Heroku-like feature, this is solely related to app configuration.
Any hints appreciated, thanks!
Good question. There are many ways you can approach your deployment/environment setup.
One thing to keep in mind is that with Heroku (or Elastic Beanstalk for that matter) you only push the code. Their service takes care of the scalability factor and replication of your services across their infrastructure (once you push the code).
If you are using fabric (or capistrano) you are using a push model too, but you have to take care of all the scalability/replication/fault tolerance of your application.
Having said that, if you are using EC2, in my opinion it's better if you leverage AMIs, Autoscale and Cloudformation for your deployments. This is the beauty of elasticity and Virtualization in that you can think of resources as ephemeral. You can still use fabric/capistrano to automate the AMI builds (I use Ansible) and configure environment variables, packages, etc. Then you can define a Cloudformation stack (with a JSON file) and in it you can add an autoscaling group with your prebaked AMI.
Another way of deploying your app is to simply use the AWS Opsworks service. It's pretty comprehensive and it has a lot of options but it may not be for everybody since some people may want a bit more flexibility.
If you want to go with a 'pull' model you can use Puppet, Chef or CFEngine. In this case you have a master policy server somewhere in the cloud (Puppetmaster, Chef Server or Policy Server). When a server gets spun up, an agent (Puppet agent, Chef Client, CFEngine agent) connects to its master to pick up its policy and then executes it. The policy may contain all the packages and environment variables that you need for your application to function. Again, it's a different model. This model scales pretty well, but it depends on how many agents the master can handle and how you stagger the connections from the agents to the master. You can load balance multiple masters too if you want to scale to thousands of servers, or you can simply use multiple masters. From experience, if you want something really "fast", CFEngine works pretty well; there's a good blog post comparing the speed of Puppet and CFEngine here: http://www.blogcompiler.com/2012/09/30/scalability-of-cfengine-and-puppet-2/
You can also go "push" completely with tools like Fabric, Ansible or Capistrano. However, you are constrained by how many connections a single server (or laptop) can handle to the thousands of servers that it's trying to push to. This is also constrained by network bandwidth, but hey, you can get creative and stagger your push updates and perhaps use multiple servers to push. Again, it works and it's a different model, so it depends on which direction you want to go.
Hope this helps.
If you don't need Beanstalk, you can look at AWS OpsWorks (http://aws.amazon.com/opsworks/). It is ideal for web/worker deployment scenarios. You can pass in any variable from outside the code here (even Chef recipes).
It might be late, but this is what we are doing.
We have a Python script that takes the env vars as JSON and sends them as POST data to another Python script, which converts those vars to a YAML file.
After that we use a Jenkins multibranch pipeline written in Groovy. Jenkins does the whole build, and then CodeDeploy copies those env vars to the EC2 instances running in the autoscaling group.
Of course, we do some manipulation from YAML to a simple text file so that CodeDeploy can paste it into /etc/environment.
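The last conversion step might look roughly like the Python sketch below. It is only an illustration under stated assumptions, not the poster's actual script: the function name and the sample settings are hypothetical, it reads JSON directly rather than YAML to stay within the standard library, and it assumes values contain no quotes or newlines.

```python
import json


def render_env_file(json_text):
    """Turn a JSON object of settings into KEY="value" lines, one per setting."""
    settings = json.loads(json_text)
    lines = []
    for key in sorted(settings):
        # Assumption: values are simple scalars without quotes or newlines.
        lines.append(f'{key}="{settings[key]}"')
    return "\n".join(lines) + "\n"


# Hypothetical sample input.
sample = '{"DATABASE_URL": "postgres://db.internal/app", "WORKERS": 4}'
print(render_env_file(sample), end="")
```

Sorting the keys keeps the generated file stable between builds, which makes it easier to see what actually changed from one deployment to the next.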

HA and centralized config storage for zeromq - alternatives to zookeeper and doozer

Is there a best-of-breed alternative to zookeeper or doozer? I do not want to use Java, and the doozer install is rather tough.
I would like an HA service in the spirit of zookeeper and doozer where machines can register their IP address and the type of server, e.g. worker or web app. This will be needed for zeromq, where workers need to know which servers to connect to.
Thanks
You could consider zpax, it's similar to zookeeper/doozer in basic concept but it's intended for direct embedding into the target application. However, it's still pretty early in the development process. If you have the time and motivation to invest some effort in this arena, it might be worth a look.
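The registration behaviour the question asks for can be sketched in a few lines of Python. This is a single-process toy, so it is not highly available and it is not zpax's, zookeeper's or doozer's API; the `Registry` class, its methods and the addresses are hypothetical. What those tools add on top is replicating this state across several machines.

```python
import time


class Registry:
    """In-memory registry: services register an address and a type with a TTL."""

    def __init__(self, ttl_seconds=30.0, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self.entries = {}  # (kind, address) -> time of last heartbeat

    def register(self, kind, address):
        # Re-registering acts as a heartbeat and refreshes the entry.
        self.entries[(kind, address)] = self.clock()

    def lookup(self, kind):
        # Only entries whose last heartbeat is within the TTL count as alive.
        now = self.clock()
        return sorted(
            address
            for (k, address), seen in self.entries.items()
            if k == kind and now - seen <= self.ttl
        )


# A fake clock makes the expiry behaviour reproducible.
fake_now = [0.0]
registry = Registry(ttl_seconds=30.0, clock=lambda: fake_now[0])
registry.register("worker", "10.0.0.5:5555")
registry.register("web", "10.0.0.9:8080")
fake_now[0] = 20.0
registry.register("worker", "10.0.0.6:5555")
fake_now[0] = 40.0
print(registry.lookup("worker"))  # prints ['10.0.0.6:5555']
```

The TTL is the design choice that matters here: a machine that crashes stops sending heartbeats and drops out of the lookup on its own, so zeromq workers never need an explicit "unregister" from a dead server.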

Resources