How does Nomad limit resource consumption for a task? If there are two tasks within a group that each have cpu = 100, is there a shared pool of 200 that both tasks have access to? What happens if one of those tasks wants access to more CPU ticks?
job "docs" {
group "example" {
task "server" {
resources {
cpu = 100
memory = 256
}
}
task "greeter" {
resources {
cpu = 100
memory = 256
}
}
}
}
Looking at /client/allocation/:alloc_id/stats, I see ThrottledPeriods, ThrottledTicks broken down for both resources and tasks -- will both resources and tasks throttle resource usage?
This doesn't fully answer the question, but the docker driver docs have some details about how part of it works:
CPU
Nomad limits containers' CPU based on CPU shares. CPU shares allow containers to burst past their CPU limits. CPU limits will only be imposed when there is contention for resources. When the host is under load your process may be throttled to stabilize QoS depending on how many shares it has. You can see how many CPU shares are available to your process by reading NOMAD_CPU_LIMIT. 1000 shares are approximately equal to 1 GHz.
Please keep the implications of CPU shares in mind when you load test workloads on Nomad.
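To make the "shared pool" part of the question concrete: CPU shares are relative weights, not hard caps, so under contention each task gets CPU in proportion to its weight relative to everything else on the host. Here is a rough sketch of that arithmetic in Python; the 2400 MHz host and the extra "batch" task are made-up numbers, and treating a task's weight as simply its cpu value is an assumption for illustration, not something taken from the Nomad docs:

def contended_split(host_mhz, shares_by_task):
    # Proportional split: each task's slice is its weight over the sum of all weights.
    total = sum(shares_by_task.values())
    return {name: round(host_mhz * share / total) for name, share in shares_by_task.items()}

# With only "server" and "greeter" (100 shares each) contending, they split the
# host 50/50 -- far more than 100 MHz each, because shares only limit you
# relative to other workloads.
print(contended_split(2400, {"server": 100, "greeter": 100}))
# {'server': 1200, 'greeter': 1200}

# Add a heavier workload and the two tasks are squeezed down in proportion
# to their weights.
print(contended_split(2400, {"server": 100, "greeter": 100, "batch": 800}))
# {'server': 240, 'greeter': 240, 'batch': 1920}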
Memory
Nomad limits containers' memory usage based on total virtual memory. This means that containers scheduled by Nomad cannot use swap. This is to ensure that a swappy process does not degrade performance for other workloads on the same host.
Since memory is not an elastic resource, you will need to make sure your container does not exceed the amount of memory allocated to it, or it will be terminated or crash when it tries to malloc. A process can inspect its memory limit by reading NOMAD_MEMORY_LIMIT, but will need to track its own memory usage. Memory limit is expressed in megabytes so 1024 = 1 GB.
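Since the driver docs say the limits are exposed through NOMAD_CPU_LIMIT and NOMAD_MEMORY_LIMIT, a task can read them at startup and size itself accordingly. A minimal Python sketch; the "use at most half the allocation for an in-process cache" policy is just an illustrative assumption, not anything Nomad enforces:

import os

# Per the driver docs above: NOMAD_CPU_LIMIT is in MHz, NOMAD_MEMORY_LIMIT in MB.
cpu_mhz = int(os.environ.get("NOMAD_CPU_LIMIT", "100"))
mem_mb = int(os.environ.get("NOMAD_MEMORY_LIMIT", "256"))

# Illustrative policy: keep an in-process cache well under the hard memory
# limit so the task is not OOM-killed during spikes.
cache_budget_mb = mem_mb // 2
print(f"cpu={cpu_mhz} MHz  memory={mem_mb} MB  cache budget={cache_budget_mb} MB")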
Related
We have a 20-core CPU on a KVM virtual machine (CentOS 7.8).
We have two heavy enterprise Java applications (Java 8) running on the same node.
We are using ParallelGC in both, and by default 14 GC threads show up (the default is determined as roughly 5/8 * number of cores).
Is it okay to have GC threads (combined 14 + 14 = 28) exceeding the number of cores (20) in the system? Will there be no issues when GC threads on both JVM instances are running concurrently?
Would it make sense to reduce the number of GC threads to 10 each?
How can we determine the minimum number of GC threads (ParallelGC) needed to get the job done without impacting the application?
Will there be no issues when GC threads on both JVM instances are running concurrently?
Well, if both JVMs are running the GC at the same time, then their respective GC runs may take longer because there are fewer physical cores available. However, the OS should (roughly speaking) give both JVMs an equal share of the available CPU. So nothing should break.
Would it make sense to reduce the number of GC threads to 10 each?
That would mean that if two JVMs were running the GC simultaneously then they wouldn't be competing for CPU.
But the flip-side is that since a JVM now has only 10 GC threads, its GC runs will always take longer than if it had 14 threads ... and 14 (or more) cores were currently available.
Bear in mind that most of the time you would expect that the JVMs are not GC'ing at the same time. (If they are, then probably something else is wrong [1].)
How can we determine the minimum number of GC threads (ParallelGC) needed to get the job done without impacting the application?
By a process of trial and error:
Pick an initial setting
Measure performance with an indicative benchmark workload
Adjust setting
Repeat ... until you have determined the best settings.
But beware that if your actual workload doesn't match your benchmark, your settings may be "off". (A scripted version of this sweep is sketched below.)
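Such a sweep might look like the following sketch; benchmark.jar is a hypothetical stand-in for whatever workload is representative of your application, and the wall-clock time of a full run is used as a (crude) metric. -XX:ParallelGCThreads is the flag that sets the ParallelGC thread count.

import subprocess
import time

results = {}
for gc_threads in (4, 6, 8, 10, 12, 14):
    cmd = [
        "java",
        f"-XX:ParallelGCThreads={gc_threads}",
        "-XX:+UseParallelGC",
        "-Xmx4g",                 # heap size is an assumption; match your app
        "-jar", "benchmark.jar",  # hypothetical benchmark workload
    ]
    start = time.monotonic()
    subprocess.run(cmd, check=True)
    results[gc_threads] = time.monotonic() - start

# Best (shortest) runs first.
for gc_threads, seconds in sorted(results.items(), key=lambda kv: kv[1]):
    print(f"ParallelGCThreads={gc_threads}: {seconds:.1f}s")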
My advice would be that if you have two CPU intensive, performance critical applications, and you suspect that they are competing for resources, you should try to run them on different (dedicated!) compute nodes.
There is only so much you can achieve by "fiddling with the tuning knobs".
[1] Maybe the applications have memory leaks. Maybe they need to be profiled to look for major CPU hotspots. Maybe you don't have enough RAM and the real competition is for physical RAM pages and swap device bandwidth (during GC) rather than CPU.
We are running a Windows 2019 cluster in AWS ECS.
From time to time the instances run into higher CPU and memory usage that is not related to container usage.
When checking the instances we can see that the vmcompute process has spiked its memory usage (commit) to up to 90% of the system memory, with an average CPU usage of at least 30-40%.
But I fail to understand why that is happening, and whether it is a real issue.
Or will the memory and CPU usage decrease when more load is put onto the containers?
I am using Scrapy to scrape multiple sites and Scrapyd to run the spiders.
I have written 7 spiders and each spider processes at least 50 start URLs. I have around 7000 URLs in total, about 1000 URLs per spider.
I start placing jobs in Scrapyd with 50 start URLs per job. Initially all spiders respond fine, but suddenly they start working really slowly.
When I run Scrapyd on localhost it gives very high performance, but as soon as I publish jobs to the Scrapyd server, request response time drastically decreases.
Response time for each start URL becomes really slow after some time on the server.
The settings look like this:
BOT_NAME = 'service_scraper'
SPIDER_MODULES = ['service_scraper.spiders']
NEWSPIDER_MODULE = 'service_scraper.spiders'
CONCURRENT_REQUESTS = 30
# DOWNLOAD_DELAY = 0
CONCURRENT_REQUESTS_PER_DOMAIN = 1000
ITEM_PIPELINES = {
    'service_scraper.pipelines.MongoInsert': 300,
}
MONGO_URL = "mongodb://xxxxx:yyyy"
EXTENSIONS = {'scrapy.contrib.feedexport.FeedExporter': None}
HTTPCACHE_ENABLED = True
We tried changing CONCURRENT_REQUESTS and CONCURRENT_REQUESTS_PER_DOMAIN, but nothing is working. We have hosted Scrapyd on AWS EC2.
As with all performance testing, the goal is to find the performance bottleneck. This typically falls to one (or more) of the following (a quick scripted check is sketched after the list):
Memory: Use top to measure memory consumption. If too much memory is consumed, it might swap to disk, which is slower than RAM. Try adding memory.
CPU: Use Amazon CloudWatch to track CPU. Be very careful with t2 instances (see below).
Disk speed: If the job is disk-intensive, or if memory is swapping to disk, this can impact performance -- especially for databases. Amazon EBS is network-attached disk, so network speed can actually throttle disk speed.
Network speed: Due to the multi-tenant design of Amazon EC2, network bandwidth is intentionally throttled. The amount of network bandwidth available depends upon the instance type used.
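The quick check referenced above could look like this sketch, using the third-party psutil package; the 5% swap and 90% CPU thresholds are arbitrary assumptions:

import psutil  # third-party: pip install psutil

mem = psutil.virtual_memory()
swap = psutil.swap_memory()
cpu = psutil.cpu_percent(interval=1)  # sample CPU over one second

print(f"CPU: {cpu:.0f}%  RAM used: {mem.percent:.0f}%  swap used: {swap.percent:.0f}%")
if swap.percent > 5:
    print("Swap is in use -- memory pressure is likely hurting performance.")
if cpu > 90:
    print("CPU-bound -- on t2 instances, also check CPU credits in CloudWatch.")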
You are using a t2.small instance. It has:
Memory: 2GB (This is less than the 4GB on your own laptop)
CPU: The t2 family is extremely powerful, but the t2.small only receives an average 20% of CPU (see below).
Network: The t2.small is rated as Low to Moderate network bandwidth.
The fact that your CPU is recording 60%, while the t2.small is limited to an average 20% of CPU indicates that the instance is consuming CPU credits faster than they are being earned. This leads to an eventual exhaustion of CPU credits, thereby limiting the machine to 20% of CPU. This is highly likely to be impacting your performance. You can view CPU Credit balances in Amazon CloudWatch.
See: T2 Instances documentation for an understanding of CPU Credits.
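To check that directly, something like the following boto3 sketch pulls the CPUCreditBalance metric that CloudWatch publishes for t2 instances; the instance ID and region are placeholders:

import datetime
import boto3  # assumes AWS credentials are configured

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")  # placeholder region
now = datetime.datetime.utcnow()

resp = cloudwatch.get_metric_statistics(
    Namespace="AWS/EC2",
    MetricName="CPUCreditBalance",
    Dimensions=[{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],  # placeholder
    StartTime=now - datetime.timedelta(hours=6),
    EndTime=now,
    Period=300,
    Statistics=["Average"],
)

for point in sorted(resp["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], round(point["Average"], 1))
# A balance trending towards zero while the scraper runs confirms credit exhaustion.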
Network bandwidth is relatively low for the t2.small. This impacts Internet access and communication with the Amazon EBS storage volume. Given that your application is downloading lots of web pages in parallel, and then writing them to disk, this is also a potential bottleneck for your system.
Bottom line: When comparing to the performance on your laptop, the instance in use has less memory, potentially less CPU due to exhaustion of CPU credits and potentially slower disk access due to high network traffic.
I recommend you use a larger instance type to confirm that performance is improved, then experiment with different instance types (both in the t2 family and outside of it) to determine what size machine gives you the best price/performance trade-off.
Continue to monitor the CPU, Memory and Network performance to identify the leading bottleneck, then aim to fix that bottleneck.
It might be a trivial question, but I feel like I need to ask:
When Heroku says that I have 512 MB of RAM and 10 process types, does this mean I have 512 MB of RAM for each process, or is the 512 MB divided by the number of processes I use, e.g. 512 MB / 10 = 51.2 MB per process?
If it's the latter, doesn't that make the unlimited number of processes in Heroku useless? I don't understand this.
Each dyno is an independent container running on a different instance. You can think of each one as a separate server.
That means each running process gets its own memory and CPU. The 512 MB is therefore not divided by the number of processes.
I have 2 instances on Amazon EC2. One is a t2.micro machine acting as a web cache server; the other runs a performance test tool.
When I started a test, TPS (transactions per second) was about 3000, but a few minutes later it had decreased to 300.
At first I thought that the CPU credit balance was exhausted, but there were enough credits left to process requests. During the test, the max outgoing traffic of the web cache was 500 Mbit/s, CPU usage was 60% and free memory was more than enough.
I couldn't find any cause of TPS decrease. Is there any limitation on EC2 machine or network?
There are several factors that could be constraining your processes.
CPU credits on T2 instances
As you referenced, T2 instances use credits for bursting CPU. They are very powerful machines, but each instance is limited to a certain amount of CPU. t2.micro instances are given 10% of CPU, meaning they actually get 100% of the CPU only 10% of the time (at low millisecond resolution).
Instances start with CPU credits for a fast start, and these credits are consumed when the CPU is used faster than the credits are earned. However, you say that the credit balance was sufficient, so this appears not to be the cause.
Network Bandwidth
Each Amazon EC2 instance can use a certain throughput of network bandwidth. Smaller instances have 'low' bandwidth, bigger instances have more. There is no official statement of bandwidth size, but this is an interesting reference from Serverfault: Bandwidth limits for Amazon EC2
Disk IOPS
If your application uses disk access for each transaction, and your instance is using a General Purpose (SSD) instance type, then your disk may have consumed all available burst credits. If your disk is small, this could mean it will run slow (speed is 3 IOPS per GB, so a 20GB disk would run at 60 IOPS). Check the Amazon CloudWatch VolumeQueueLength metric to see if IO is queuing excessively.
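As a back-of-the-envelope check on how long such a small volume can burst, here is the arithmetic using the figures AWS documents for General Purpose (SSD) gp2 volumes (a 5.4 million I/O credit bucket and a 3,000 IOPS burst ceiling); the 20 GB size is carried over from the example above:

volume_gb = 20
baseline_iops = 3 * volume_gb      # 3 IOPS per GB -> 60 IOPS baseline
burst_iops = 3000                  # gp2 burst ceiling
burst_credits = 5_400_000          # initial I/O credit bucket

# While bursting, credits drain at (burst rate - baseline earn rate).
seconds_of_burst = burst_credits / (burst_iops - baseline_iops)
print(f"Baseline: {baseline_iops} IOPS; full-speed burst lasts ~{seconds_of_burst / 60:.0f} minutes")
# ~31 minutes at 3,000 IOPS, after which the volume falls back to 60 IOPS.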
Something else
The slowdown could also be due to your application or cache system (e.g. running out of free memory for storing data).