Can you explain the difference between WORKER_EVICTOR and WORKER_BLOCK_ANNOTATOR, and why Alluxio abandoned WORKER_EVICTOR?
We moved from passive background eviction to active in-line eviction in order to prevent many of the problems that occur around peak capacity. For a more thorough explanation, see the section of this video starting at 14:50 -- https://www.alluxio.io/resources/videos/alluxio-architecture-and-scaling-performance/
I have some instances of IHS (IBM HTTP Server) running at more than 75% memory usage. Is there any official recommendation for a maximum memory usage?
Is it high or growing unbounded? Are you measuring the RSS of each process or some system-wide metric?
If it's just high, you can make sure you aren't creating far more threads/processes than needed by looking at mod_mpmstats output over time, then reducing things like MinSpareThreads. Additionally, MaxMemFree may help.
If you have unbounded growth, you can use MaxSpareThreads and MaxRequestsPerChild to recycle processes regularly.
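For a rough idea of where those directives live, here is a sketch of a worker-MPM section in httpd.conf; the numbers are purely illustrative placeholders to be sized from your own mod_mpmstats data, not recommendations:

    # Illustrative values only -- derive real numbers from mod_mpmstats output over time
    <IfModule worker.c>
        ThreadsPerChild      25
        MinSpareThreads      25      # lower this if you consistently see many idle threads
        MaxSpareThreads      75      # excess idle threads trigger child process shutdown
        MaxRequestsPerChild  10000   # recycle each child after N requests to bound slow growth
    </IfModule>
    MaxMemFree 2048                  # cap (in KB) on free memory the allocator holds per thread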
The Oracle documentation (https://docs.oracle.com/javase/9/gctuning/garbage-first-garbage-collector.htm#JSGCT-GUID-98E80C82-24D8-41D4-BC39-B2583F04F1FF) says that -XX:G1HeapRegionSize must be a power of 2, yet nothing stops you from setting any value between 1m and 32m.
Questions:
1. Can anyone explain why -XX:G1HeapRegionSize must be a power of 2?
2. For applications that deal with lots of humongous objects, after extensive testing and analysis of GC causes, GC throughput, GC pause times, etc., the setting -XX:G1HeapRegionSize=10m seems appropriate. Is there any problem with setting a 10m heap region for -Xmx12g?
Can anyone explain why -XX:G1HeapRegionSize must be a power of 2?
Simply because that is how Oracle has implemented the G1 collector.
When implementing something like a garbage collector, you need to make design choices in order for it to work. There are probably a number of detailed reasons why the G1 designers made these particular choices. The general themes will be to optimize performance in critical G1 code paths (e.g. the write barriers) and/or to avoid unnecessary complexity in the G1 code base as a whole.
(For example, if the size of a region is a power of two, and if each one starts on an address that is divisible by the region size, then you can use masking to efficiently compute which region an address belongs in. If you can save some instructions in the write barrier code, that could mean a couple of percentage points of performance for the entire JVM. I won't go into more detail ... but you can look up the literature.)
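To make the masking point concrete, here is a small hypothetical sketch (not the actual HotSpot code) of how a power-of-two region size turns the address-to-region calculation into a shift and a mask instead of a division:

    // Hypothetical sketch of why power-of-two region sizes are convenient;
    // this is not the actual HotSpot implementation.
    public class RegionIndexDemo {
        static final int REGION_SIZE_LOG = 23;                // 8 MB regions (2^23 bytes)
        static final long REGION_SIZE = 1L << REGION_SIZE_LOG;

        // With a power-of-two size, address / regionSize is a cheap shift
        // and address % regionSize is a mask -- no integer division needed.
        static long regionIndex(long address)    { return address >>> REGION_SIZE_LOG; }
        static long offsetInRegion(long address) { return address & (REGION_SIZE - 1); }

        public static void main(String[] args) {
            long addr = 0x12345678L;
            System.out.println("region " + regionIndex(addr)
                    + ", offset " + offsetInRegion(addr));
        }
    }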
But the reasons are moot. You either live with the G1 restriction, or you select a different garbage collector that is better for your use-case.
Is there any problem with setting a 10m heap region for -Xmx12g?
Well ... it won't work ... because of the restriction that you have already found.
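If you want to see which region size the JVM actually ends up using for a given heap, you can print the final flag values (this assumes a HotSpot-based JVM; the exact ergonomics may differ between versions):

    java -XX:+UseG1GC -Xmx12g -XX:+PrintFlagsFinal -version | grep G1HeapRegionSize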
On one of our NiFi instances, when we are in a backlog state, we encounter the throttling warning quite frequently. We have tuned the indexing threads and also increased the resources (CPU) allocated to the VM. What other things should we be looking at to identify what is causing the contention that results in throttling? It could obviously be disk I/O, but nothing jumps out in the monitoring. Any suggestions on how to investigate further would be greatly appreciated.
NiFi version: 0.6.1
I would focus on disk contention. Are the flowfile, content, and provenance repositories all on the same physical partition? If yes, then it is almost certainly disk contention related. A great command to use for this is 'iostat'. You can typically run something like 'iostat -xmh 5' and watch for utilization.
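If everything does live on one partition, the repository locations are controlled in nifi.properties; the paths below are placeholders standing in for separate physical disks:

    # nifi.properties -- example paths only; point each repository at its own physical disk
    nifi.flowfile.repository.directory=/disk1/flowfile_repository
    nifi.content.repository.directory.default=/disk2/content_repository
    nifi.provenance.repository.directory.default=/disk3/provenance_repository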
Now even on a well configured system it is possible to have just such a high rate of data that provenance indexing simply cannot keep up. These cases are fairly rare and almost always easily addressed by reducing the number of individual items floating around the flow (leveraging batching where appropriate).
There have been considerable performance related improvements since the 0.6.1 release regarding provenance handling and that may or may not help your case.
Worst-case scenario, you can switch to transient provenance, which is all in memory and only keeps 100,000 recent events by default.
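For reference, switching to the in-memory provenance repository is a change in nifi.properties; the class and buffer-size property below are what I'd expect for that setup (100,000 being the default), but verify the names against your version's documentation:

    nifi.provenance.repository.implementation=org.apache.nifi.provenance.VolatileProvenanceRepository
    nifi.provenance.repository.buffer.size=100000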
I saw 'idx miss %' in mongostat, but when I run
db.serverStatus().indexCounters
there is no response. Where can I find this information? One more question: what page fault value should I be concerned about?
The indexCounters information was specific to MMAP storage and not entirely accurate (for some examples, see: SERVER-9296, SERVER-9284, and SERVER-14583). The indexCounters section was removed during the development cycle leading up to MongoDB 3.0 along with some other former metrics like recordStats and workingSet. See: SERVER-16378 and discussion on related issues in the MongoDB Jira issue tracker.
If you have enabled the WiredTiger storage engine, note that there will be a new wiredTiger section in the serverStatus() output with relevant metrics.
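For example, in the mongo shell (field names can vary slightly between server versions):

    db.serverStatus().wiredTiger.cache
    db.serverStatus().wiredTiger.cache["bytes currently in the cache"]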
What page fault value should I be concerned about?
Page faults provide a good proxy for whether your working set fits in memory with MMAP, but the specific value of concern will depend on your deployment and whether there is any noticeable performance impact. Consistently high hard page faults (where data needs to be loaded from disk to RAM) will add I/O pressure, but this may not be significant depending on your disk configuration and overall workload.
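The raw counter is exposed under extra_info in serverStatus() on Linux, so you can track how quickly it grows over time rather than fixating on a single absolute value:

    db.serverStatus().extra_info.page_faults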
A general best practice is to use a monitoring system like MMS (MongoDB Management Service) to capture a historical baseline of metrics for your deployment so you can then look for pain points when performance problems are observed.
It's also worth reading the Production Notes section of the MongoDB manual. If you are using Linux, for example, there are some suggestions on tuning file system and readahead parameters that can affect the efficiency of reading data from disk.
For an idea of how to approach metrics, see: Five MMS monitoring alerts to keep your MongoDB deployment on track. This blog post is a few years old but the general approach of determining normal, worrying, and critical limits (as well as identifying false positives) is still very relevant.
I have this question on my assignment this week, and I don't understand how the caches can be defeated, or how I can show it with an assembly program. Can someone point me in the right direction?
Show, with assembly program examples, how the two different caches (associative and direct-mapped) can be defeated. Explain why this occurs and how it can be fixed. Are the programs used to defeat the two caches the same?
Note: This is homework. Don't just answer the question for me, it won't help me to understand the material.
A cache is there to increase performance. So defeating a cache means finding a pattern of memory accesses that decreases performance (in the presence of the cache) rather than increases it.
Bear in mind that the cache is limited in size (smaller than main memory, for instance) so typically defeating the cache involves filling it up so that it throws away the data you're just about to access, just before you access it.
If you're looking for a hint, think about splitting a data word across 2 cache lines.
(In case you're also looking for the answer, a similar problem was encountered by the x264 developers -- more information available here and here. The links are highly informative, and I really suggest you read them even after you've found your answer.)
Another thing to keep in mind is whether the caches you deal with are virtually or physically indexed / tagged. In some variants, cache aliasing forces line replacements even if the cache as such isn't completely filled. In other variants, cache/page coloring collisions might cause evictions. Finally, in multiprocessor systems under certain workloads, cacheline migrations (between the caches of different CPUs) may limit the usefulness of CPU caches.
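As a general illustration (deliberately not the assembly your assignment asks for), here is a sketch in Java of the kind of access pattern the earlier answers describe: striding through a large array by a big power of two, so that successive accesses land on different cache lines and, in a direct-mapped or low-associativity cache, keep mapping to the same few sets and evicting each other. The sizes and timing here are illustrative only:

    // Sketch: a large power-of-two stride gives poor spatial locality and, in a
    // direct-mapped or low-associativity cache, causes repeated conflict misses,
    // while a stride of 1 touches the same data cache-friendly. Rough timing only.
    public class CacheStrideDemo {
        public static void main(String[] args) {
            int size = 1 << 24;                       // 16M ints = 64 MB, larger than any CPU cache
            int[] data = new int[size];
            long sum = 0;

            long t0 = System.nanoTime();
            for (int stride : new int[]{1, 4096}) {   // 4096 ints = 16 KB between accesses
                for (int start = 0; start < stride; start++) {
                    for (int i = start; i < size; i += stride) {
                        sum += data[i];               // same total number of accesses for each stride
                    }
                }
                long t1 = System.nanoTime();
                System.out.println("stride " + stride + ": " + (t1 - t0) / 1_000_000 + " ms");
                t0 = t1;
            }
            if (sum == 42) System.out.println(sum);   // keep 'sum' live so the loops aren't optimized away
        }
    }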