How Much RAM for a Jenkins Master Instance?

I want to set up a Jenkins Master that will let slaves do all the builds.
The master is just a traffic cop, getting SVN hook triggers and kicking off slave builds.
There will be about 10 Java Maven build jobs in this setup.
I wish to run the Jenkins master on a hosted server that has limited resources (RAM).
I will run the slaves on some nicely loaded machines on my own network.
So my question is how little RAM could I get away with allocating to the Master Jenkins instance? 256M? 384M? 512M? Other?
I cannot seem to find this specific info in the Jenkins docs.

A coworker asked me the same question, and my first answer was that 1-2 GB should be enough. Later I discovered this entry in the CloudBees documentation:
Have a beefy machine for Jenkins master & do not run slaves on the
master machine. Every slave has certain memory allocated in the master
JVM, so the bigger the RAM for the master, the better it is. We
typically hear customers allocate 16G or so.
Source: https://docs.cloudbees.com/docs/cloudbees-core/latest/traditional-install-guide/system-requirements

As of mid-2016, the official documentation says:
Memory Requirements for the Master
The amount of memory Jenkins needs is largely dependent on many factors, which is why the RAM allotted for it can range from 200 MB for a small installation to 70+ GB for a single and massive Jenkins master. However, you should be able to estimate the RAM required based on your project build needs.
Each build node connection will take 2-3 threads, which equals about 2 MB or more of memory. You will also need to factor in CPU overhead for Jenkins if there are a lot of users who will be accessing the Jenkins user interface.
It is generally a bad practice to allocate executors on a master, as builds can quickly overload a master’s CPU/memory/etc and crash the instance, causing unnecessary downtime. Instead, it is advisable to set up slaves that the Jenkins master can delegate build jobs to, keeping the bulk of the work off of the master itself.
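As a rough back-of-envelope check against those numbers (my own arithmetic, not part of the documentation): a small master coordinating, say, 5 build agents spends about 5 × 2-3 threads and 5 × 2 MB ≈ 10-15 MB on agent connections, which is negligible; the JVM baseline, the web UI, and retained build records dominate. That is why a build-free master for a 10-job setup like the one in the question can plausibly sit near the low end of that 200 MB to 70 GB range.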

I don't think there is a rule of thumb for this. Our master uses 2G and we have 6 slaves. We have close to 60 jobs, most of them Maven. We have never had memory issues in the past, and our slaves are always busy (I always see some job or other being kicked off).
You could start with 512M and see how it works; if you run into memory issues, increase the memory. That is the only way I can think of. To monitor your master's memory, use the Jenkins Monitoring plugin, which integrates JavaMelody and lets you monitor the JVM of your master and even your slaves. Good luck!
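If it helps, the plugin can also be installed from the command line; a sketch, assuming the standard Jenkins CLI jar and that the plugin ID is monitoring:
java -jar jenkins-cli.jar -s http://your-jenkins:8080/ install-plugin monitoring -restart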

I have a Jenkins master with a few dozen jobs and slaves, but I don't run builds or tests on the master. Based on my observation, the memory consumption is not that big; it rarely goes above 2 or 3 GB. It also depends on the memory size option you specify for the Jenkins Java process. I would recommend at least 2 GB RAM in your case. You can always load-balance the builds across slaves if you need to.
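For reference, on Debian-style package installs that option typically lives in /etc/default/jenkins (path and variable name vary by installation method, so treat this as a sketch):
# Cap the master JVM heap at 2 GB; -Xmx is the ceiling being discussed here.
JAVA_ARGS="-Xms512m -Xmx2g"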

It depends on what you are building, how often you run builds, and how fast you need them to finish. In my experience you need about 1 GB per concurrent build, but this can vary depending on how resource-intensive your builds are. Monitor your memory usage under the heaviest load: if usage reaches 70-80% or more, add more memory; if it stays around 30-40% or less, allocate less.
Also keep an eye on disk usage. About 4 years ago I had a TeamCity build server that kept falling to its knees; the problem turned out to be that the master and the slaves were all virtual servers sharing the same attached storage, and the disks couldn't keep up. More a problem with the VM environment, but still something to keep in mind.

Related

Is there a way to force re-election in an Apache Mesos master quorum?

We have an Apache Mesos master running in HA mode with 3 nodes (each with 4 CPUs, 15 GB memory). This cluster stops offering resources when memory is completely exhausted (which happens every week).
We have >200 agents connected to this master, and the number keeps growing, so the long-term solution is to increase CPU and memory. But until we get bigger VMs, we have to babysit it every day, monitoring CPU load and memory and restarting the mesos-master service (which forces re-election) as a precaution.
To avoid this manual effort, we are planning to force re-election of this cluster at a specific interval, say every 2 days.
So my question is: does the Mesos master support forcing re-election like this? If so, how? Is it recommended, and does it have any caveats?
I appreciate your time in answering and helping me here!
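For what it's worth, the daily babysitting described in the question could be automated with a plain cron entry until a real answer emerges; a sketch assuming systemd and a unit named mesos-master, e.g. in /etc/cron.d/mesos-restart:
# At 04:00 every second day, restart this node's master process.
0 4 */2 * * root systemctl restart mesos-master
Note that restarting a node only forces re-election when that node currently holds leadership.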

Does Mesos really treat all your resources as a single pool?

Mesos is advertised as a system that lets you program against your datacenter as if it were a single pool of resources (see the Mesos website). But is it really true that you don't need to consider the configuration of the individual machines? Using Mesos, can you request more resources for a task than are available on any single machine?
For example, if you have 10 machines each with 2 cores and 2g of RAM and 20g HD, can you really request 10 cores, 15g of RAM and 100g of disk space for a single task?
If so, how does this work? Is Mesos able to address memory across machines for you, and use other CPUs as local threads and create a single filesystem from a number of distributed nodes?
How does it accomplish this without suffering from the Fallacies of distributed computing, especially those related to network latency and transport cost?
According to this Mesos architecture description, you can't aggregate resources from different slaves (agents/machines) to use them for one task.
As you can see, there is a strict "task per agent" situation.
Their example says pretty much the same:
Let’s walk through the events in the figure.
Agent 1 reports to the master that it has 4 CPUs and 4 GB of memory
free. The master then invokes the allocation policy module, which
tells it that framework 1 should be offered all available resources.
The master sends a resource offer describing what is available on
agent 1 to framework 1. The framework’s scheduler replies to the
master with information about two tasks to run on the agent, using <2
CPUs, 1 GB RAM> for the first task, and <1 CPUs, 2 GB RAM> for the
second task. Finally, the master sends the tasks to the agent, which
allocates appropriate resources to the framework’s executor, which in
turn launches the two tasks (depicted with dotted-line borders in the
figure). Because 1 CPU and 1 GB of RAM are still unallocated, the
allocation module may now offer them to framework 2.

How do I find a runaway Marathon process?

I have a Mesos/Marathon system, and it is working well for the most part. There are upwards of 20 processes running, most of them using only part of a CPU. However, sometimes (especially during development) a process will spin up and start using as much CPU as is available. I can see on my system monitor that there is a pegged CPU, but I can't tell which Marathon process is causing it.
Is there a monitoring app showing CPU usage for Marathon jobs, ideally over time? This would also help with understanding scaling and CPU requirements. Tracking memory usage would be good too, but it is secondary to CPU.
It seems that you haven't configured any isolation mechanism on your agent (slave) nodes. mesos-slave comes with an --isolation flag that defaults to posix/cpu,posix/mem, which means isolation at the process level (pretty much no isolation at all). Using cgroups/cpu,cgroups/mem isolation ensures that a task will be killed by the kernel if it exceeds its memory limit. Memory is a hard constraint that can be easily enforced.
Restricting CPU is more complicated. If a machine offers 8 CPU cores to Mesos and each of your tasks is set to require cpu=2.0, you'll be able to run at most 4 tasks there. That's easy, but at any given moment any of those 4 tasks may use all the idle cores, so if one of your jobs misbehaves, it can affect the other jobs running on the same machine. For restricting CPU utilization, see the Completely Fair Scheduler (or the related question How to understand CPU allocation in Mesos? for more details).
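As a concrete sketch of those flags (option names as documented for the Mesos agent; in newer releases mesos-slave was renamed mesos-agent, so adjust for your version):
mesos-slave --master=zk://zk1:2181/mesos \
            --isolation=cgroups/cpu,cgroups/mem \
            --cgroups_enable_cfs
With --cgroups_enable_cfs, the kernel's CFS quota turns a task's cpu value into a hard cap instead of just a relative share.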
Regarding monitoring, there are many options; choose one that suits your requirements, or combine several. Some are open source, others are enterprise-level solutions (in no particular order):
collectd for gathering stats, Graphite for storing, Grafana for visualization
Telegraf for gathering stats, InfluxDB for storing, Grafana for visualization
Prometheus for storing and gathering data, Grafana for visualization
Datadog for a cloud based monitoring solution
Sysdig platform for monitoring and deep insights

How to select the CPU parameter for Marathon apps run on Mesos?

I've been playing with a Mesos cluster for a little while, and I'm thinking of utilizing it in our production environment. One problem I can't seem to find an answer to: how do you properly schedule long-running apps that will have varying load?
Marathon has a "CPUs" property where you can set the weight for CPU allocation for a particular app (I'm planning on running Docker containers). But from what I've read, it is only a weight, not a reservation, allocation, or limit that I'm setting for the app; the app can still use 100% of the CPU on the server if it's the only thing running. The problem is that for long-running apps, resource demands change over time: a web server's demand, for example, is directly proportional to traffic. Coupled with Mesos treating this setting as a "reservation," I am choosing between two evils: set it too low, and too many processes may be started on the same host and all of them will suffer, with the host CPU going past 100%; set it too high, and CPU will sit idle, since a reservation is made (or so Mesos thinks) but nothing is using those resources.
How do you approach this problem? Am I missing something in how Mesos and Marathon handle resources?
I was thinking of an ideal way of doing this:
Specify a CPU weight for different apps (on the order of, say, 0.1 through 1), so that when the going gets tough, higher priority gets more (as it works right now)
Have the Mesos slave report "available LA" with its status (e.g. if the 10-minute LA is 2 with 8 CPUs available, report an available LA of 6)
Configure Marathon to require the "available LA" resource on the slave to schedule a task (e.g. don't start on a particular host if its available LA is < 2)
When available LA goes to 0 (due to an influx of traffic at the same time as a job that was started on the same server before the influx), have Marathon move jobs to another slave, one with more available LA
Is there a way to achieve any of this?
So far, I gather that I could possibly write a custom isolator module that runs on the slaves and reports this custom metric to the master, which could then be used in resource negotiation. Is this true?
I wasn't able to find anything on Marathon rescheduling tasks on different nodes if one becomes overloaded. Any suggestions?
As of Mesos 0.23.0, oversubscription is supported. Unfortunately, it is not yet implemented in Marathon: https://github.com/mesosphere/marathon/issues/2424
To do allocation dynamically, you can use the Mesos slave metrics along with the Marathon HTTP API to scale, as I've done here in a different context. My colleague Niklas did related work with nibbler, which might also be of help.
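As an illustration of that idea, here is a minimal sketch (hypothetical host names, app ID, and threshold; it relies only on two documented endpoints, the Mesos agent's /metrics/snapshot and Marathon's PUT /v2/apps/{appId}):

// A sketch: scale a Marathon app up when a Mesos agent reports high load.
const AGENT = "http://agent1:5051";      // Mesos slave/agent HTTP endpoint (hypothetical host)
const MARATHON = "http://marathon:8080"; // Marathon HTTP API (hypothetical host)
const APP_ID = "/my-web-app";            // hypothetical Marathon app ID

async function scaleIfLoaded(): Promise<void> {
  // system/load_1min is the agent's 1-minute load average metric.
  const metrics = await (await fetch(`${AGENT}/metrics/snapshot`)).json();
  const load = metrics["system/load_1min"] as number;

  if (load > 6) { // arbitrary threshold for this sketch
    // Fetch the current instance count, then PUT it back incremented by one.
    const { app } = await (await fetch(`${MARATHON}/v2/apps${APP_ID}`)).json();
    await fetch(`${MARATHON}/v2/apps${APP_ID}`, {
      method: "PUT",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ instances: app.instances + 1 }),
    });
  }
}

scaleIfLoaded().catch(console.error);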

Node.js CPU load balancing

I created a test with JMeter to test the performance of the Ghost blogging platform. Ghost is written in Node.js and was installed on a cloud server with 1 GB RAM and 1 CPU.
I noticed that JMeter starts getting errors after 400 concurrent users; up to 400 concurrent users, the load is normal. I decided to increase CPU and added 1 CPU.
But the errors reproduced, so I added 2 more CPUs, for 4 CPUs in total. The problem still occurs after 400 concurrent users.
I don't understand why 1 CPU can handle 400 users and I get the same result with 4 CPUs.
During monitoring I noticed that only one CPU is busy while the other 3 CPUs are idle. When I checked the JMeter summary in the console, there were errors on about 5% of requests. See screenshot.
I would like to know is it possible to balance load between CPUs?
Are you using the cluster module to load-balance, on Node 0.10.x?
If so, please update your Node.js to 0.11.x.
Node 0.10.x used the balancing algorithm provided by the operating system. In 0.11.x the algorithm was changed (to round-robin in the master process), so connections are distributed more evenly from now on.
Node.js is famously single-threaded (see this answer): a single Node process will only use one core (see this answer for a more in-depth look), which is why your program fully uses one core while all the other cores sit idle.
The usual solution is to use Node's cluster core module, which helps you launch a cluster of Node processes to handle the load by letting you create child processes that all share the same server ports.
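As a minimal sketch of that pattern (generic, not Ghost-specific; it uses the standard cluster API of modern Node, where isMaster has been renamed isPrimary):

import cluster from "node:cluster";
import { createServer } from "node:http";
import { cpus } from "node:os";

if (cluster.isPrimary) {
  // Must be set before forking: hand out connections round-robin
  // (the default on most platforms since the 0.11.x change noted above).
  cluster.schedulingPolicy = cluster.SCHED_RR;
  for (let i = 0; i < cpus().length; i++) cluster.fork(); // one worker per core
} else {
  // Every worker listens on the same port; the primary distributes connections.
  createServer((_req, res) => {
    res.end(`handled by worker pid ${process.pid}\n`);
  }).listen(3000);
}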
However, you can't really use this without modifying Ghost's code. An option is to use pm2, which can wrap a Node program and use the cluster module for you. For instance, with four cores:
$ pm2 start app.js -i 4
In theory this should work, except if Ghost relies on some global variables (which can't be shared across processes).
Use the cluster core module, and nginx for load balancing. That's the bad part about Node.js: fantastic framework, but the developer has to wade into the load-balancing mess, while Java and other runtimes make it seamless. Anyway, nothing is perfect.
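A minimal sketch of that nginx approach (hypothetical ports; it assumes you start several independent Node processes, one per core):

upstream node_backend {
    least_conn;              # route each request to the least busy process
    server 127.0.0.1:3000;
    server 127.0.0.1:3001;
    server 127.0.0.1:3002;
    server 127.0.0.1:3003;
}
server {
    listen 80;
    location / {
        proxy_pass http://node_backend;
    }
}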
