Background
Our current infrastructure consists of a Jenkins master and a number of slave VMs. We are running into a lot of scalability and, consequently, stability issues with our tests because the VMs are overworked.
Mesosphere and Jenkins
That being said, I'm looking to explore other solutions, particularly Mesosphere, because of its ability to dynamically provision slaves as needed.
My only issue is that we have all these dependencies installed on the slave VMs. To make Jenkins work on Mesos, I would have to "dirty" the Mesos slaves by installing the dependencies on them. This would more or less render these Mesos slaves useless, as they would only be suited for running Jenkins.
Question
What is the proper method of implementing a Jenkins environment in Mesos alongside other applications?
Check out eBay's video and blogs about their Mesos+Marathon+Jenkins setup:
http://blog.docker.com/2014/06/dockercon-video-delivering-ebays-ci-solution-with-apache-mesos-docker/
http://www.ebaytechblog.com/2014/04/04/delivering-ebays-ci-solution-with-apache-mesos-part-i/
http://www.ebaytechblog.com/2014/05/12/delivering-ebays-ci-solution-with-apache-mesos-part-ii/
Part II of the blog talks about running Jenkins builds in Docker containers, which could alleviate the problem of "dirtying" the slaves with dependencies.
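To make the idea concrete, here is a minimal sketch of that pattern using the Docker SDK for Python; the image name, workspace path, and build command are hypothetical, and the eBay setup itself uses the Jenkins/Mesos Docker integration rather than a hand-rolled script:

```python
import docker

# Connect to the local Docker daemon
client = docker.from_env()

# Run the build inside a disposable container: the image bundles all build
# dependencies, so nothing gets installed on the Mesos slave itself.
logs = client.containers.run(
    "mycompany/jenkins-build-env:latest",  # hypothetical build image
    command="mvn -B clean package",
    volumes={"/var/jenkins/workspace/myjob": {"bind": "/workspace", "mode": "rw"}},
    working_dir="/workspace",
    remove=True,  # throw the container away after the build
)
print(logs.decode())
```

The slave then only needs Docker installed; every other dependency lives in the image, which keeps the Mesos slaves reusable for other workloads.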
See the mesos-jenkins plugin for more documentation, and see Docker Hub for pre-built images:
https://github.com/jenkinsci/mesos-plugin
https://registry.hub.docker.com/u/folsomlabs/jenkins-mesos/ (latest)
https://registry.hub.docker.com/u/thefactory/jenkins-mesos/ (documented)
Related
I am in the planning phase of a multi-node Hadoop cluster in a Docker-based environment, so it should be based on a lightweight, easy-to-use virtualized system.
The current architecture (according to the documentation) contains 1 master and 3 slave nodes. The host machine uses the HDFS filesystem and KVM for virtualization.
The whole cloud is managed by Cloudera Manager. There are several Hadoop modules installed on the cluster, and there is also a Node.js data upload service.
This time, I need to make the architecture Docker-based.
I have read several tutorials and formed some opinions, but I still have open questions.
A. What do you think, is https://github.com/Lewuathe/docker-hadoop-cluster a good base for my project? I have also found an official image, but it is single-node.
B. How will the system requirements change if I run all of this in a single container? That would be great, because this architecture should work in different locations, so changes could easily be transferred between them. Synchronization between these so-called clones would be important.
C. Do you have any other ideas, or maybe best practices?
As of September 2016 there is no quick answer.
https://github.com/Lewuathe/docker-hadoop-cluster does not seem like a good start, as your base should be universal enough to also cover your option B.
Keep an eye on https://github.com/sequenceiq/hadoop-docker and https://github.com/kiwenlau/hadoop-cluster-docker
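To give a feel for the multi-container route those projects take, here is a rough sketch using the Docker SDK for Python that starts a 1-master/3-slave topology on a shared bridge network; the image name mirrors the kiwenlau project, but treat the tag and all details as assumptions to verify:

```python
import docker

client = docker.from_env()

# A user-defined bridge network lets the containers resolve each other by hostname.
client.networks.create("hadoop-net", driver="bridge")

# One master and three slaves, mirroring the 1+3 architecture described above.
master = client.containers.run(
    "kiwenlau/hadoop:1.0",  # image from the linked project; verify the tag
    name="hadoop-master", hostname="hadoop-master",
    network="hadoop-net", detach=True, tty=True,
)
slaves = [
    client.containers.run(
        "kiwenlau/hadoop:1.0",
        name=f"hadoop-slave{i}", hostname=f"hadoop-slave{i}",
        network="hadoop-net", detach=True, tty=True,
    )
    for i in (1, 2, 3)
]
```

A setup like this keeps one container per node, which is generally easier to scale and to synchronize across locations than cramming the whole cluster into a single container.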
To address your question C., you may want to check out BlueData's software platform: http://www.bluedata.com/blog/2015/06/docker-containers-big-data-clusters
It's designed to run multi-node Hadoop clusters in a Docker-based environment and there is a free version available for download (you can also run it in an AWS EC2 instance).
This work has already been done for you, actually:
https://hub.docker.com/r/cloudera/clusterdock/
It includes a pre-packaged multi-node CDH cluster, with Cloudera Manager as an optional component for cluster management and the like.
I have a query: if we can build a Mesos cluster by directly installing the master and slave nodes, then why do we need DCOS? Is it that DCOS provides additional support on top of a Mesos cluster? Please elaborate on this part.
Depends on your needs :-):
Here is what, in my opinion, the Community Edition of DCOS (the Enterprise Edition includes more proprietary features, such as security) adds over a self-built Mesos setup:
Easy setup, including Marathon and Mesos-DNS.
A command-line interface with one-click installs from the Universe package repository. I especially like how simple these service installs are; it is really easy to install, for example, HDFS or Cassandra in your cluster. Note: as with the above, you could probably configure such a setup yourself with some effort, as both projects are on GitHub.
A very nice UI.
So overall, I would say DCOS provides a very easy and well-tested best-practice setup of Mesos and its ecosystem.
Hope this helps!
I am new to Jenkins; I know how to create jobs and add servers for JAR deployment.
I need to create a deployment job in Jenkins that takes a JAR file and deploys it to 50-100 servers.
These servers fall into 6 categories; a different process runs on each category of server, but the same JAR is used.
Please suggest the best approach for creating a job for this.
As of now there are only a few servers (6-7), so I have added each server to Jenkins and use command execution over SSH to run the processes. But for 50 servers this is not feasible.
Jenkins is a great tool for managing builds and dependencies, but it is not a great tool for Configuration Management. If you're deploying to more than 2 targets (and especially if different targets have different configurations), I would highly recommend investing the time to learn a configuration management tool.
I can personally recommend Puppet and Ansible. In particular, Ansible works over an SSH connection to the target (which it sounds like you have) and requires only a base Python install.
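If you want a stopgap before adopting a CM tool, the SSH fan-out can at least be scripted instead of configured host by host in Jenkins. Below is a minimal Python sketch using the paramiko library; the hostnames, paths, and service names are made up, and it assumes key-based SSH auth is already in place:

```python
import paramiko

# Hypothetical grouping of the 50-100 targets into the 6 categories.
SERVER_GROUPS = {
    "web":   ["web01.example.com", "web02.example.com"],
    "batch": ["batch01.example.com"],
    # ... the remaining four categories
}

# Each category runs a different process, but gets the same JAR.
START_CMDS = {
    "web":   "systemctl restart app-web",
    "batch": "systemctl restart app-batch",
}

JAR_LOCAL = "build/libs/app.jar"   # artifact produced by the Jenkins build
JAR_REMOTE = "/opt/app/app.jar"    # install location on every target

def deploy(host, start_cmd):
    """Copy the JAR over SFTP, then (re)start the category-specific process."""
    client = paramiko.SSHClient()
    client.load_system_host_keys()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    client.connect(host)  # assumes key-based auth and a matching username
    sftp = client.open_sftp()
    sftp.put(JAR_LOCAL, JAR_REMOTE)
    sftp.close()
    _, stdout, stderr = client.exec_command(start_cmd)
    print(host, stdout.read().decode(), stderr.read().decode())
    client.close()

for group, hosts in SERVER_GROUPS.items():
    for host in hosts:
        deploy(host, START_CMDS[group])
```

An Ansible inventory with six host groups expresses the same structure declaratively, adds parallelism and error handling for free, and is far easier to maintain at 50-100 servers, which is why the CM route is worth the investment.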
Can a Hadoop YARN instance manage nodes in different places on Earth, on different networks? Can it manage nodes that use different platforms?
Every note about YARN I have found says that YARN manages clusters, but if the app I deploy is written in Java, then it should run on the nodes regardless of their hardware.
Similarly, YARN seems general enough to support more than just a LAN.
YARN is not platform-aware. It is also not aware of how application processes on different hosts communicate with each other to perform the work.
At the same time, the YARN application master is launched as a command line, and therefore any node in the cluster with enough resources should be able to run it.
If not every platform were capable of running a specific application master, then YARN would need to be aware of that. Today it cannot be, but I can imagine the platform becoming a special kind of resource, in which case YARN would select an appropriate node.
Regarding the LAN question: if you have an application master that knows how to manage a job across several LANs, that should be fine with YARN.
Is there a way to have my Hudson slaves used by multiple Hudson masters?
A bit of background info:
My build guy has set up separate Hudson masters to handle the deployment and testing of our solution in different test environments. My tests run on Hudson slaves (I have 4 slaves). These slaves are associated with one specific Hudson master. I want the slaves to be available for use by any of the Hudson masters.
I believe the build guy chose multiple Hudson masters to manage the number of jobs on each master. His setup for one environment has 8 view tabs, so 5 environments would mean 40 tabs. Unfortunately, as is common, the solution to one problem creates another.
Yes, you can add the slaves to both Hudson masters. The problem is that each master will not be aware of the resource utilization by the other master, so you'll have to figure out some mechanism for that, such as reducing the number of executors.
Even better would be to combine the two Hudson masters into a single Hudson instance. Your question doesn't explain the motivation for having two masters.
As I cannot comment above, I'll try an answer.
I think you can have several independent slaves on the same machine, each attaching to and communicating with its own master. I also think that different slaves on the same machine sharing the same home directory is not supported and will not work. And of course, if they are completely independent, as Michael Donohue said above, there is a workload-sharing issue to resolve.
v1.366 added support for Windows slaves running as a Win32 service to serve multiple masters.
See http://hudson-ci.org/changelog.html
Hudson jobs can also be parameterized, with a default value used for scheduled jobs and a web page offered for parameter input on manually triggered jobs. That can work in some situations to reduce the need for multiple jobs.
Or try the Nested View plugin if the number of tabs is an issue and you can't reduce the number of jobs.