I saw idx miss % in mongostat but when I run
db.serverStatus().indexCounters
there is no output. Where can I find this? And one more question: what page fault value should I be concerned about?
The indexCounters information was specific to MMAP storage and not entirely accurate (for some examples, see: SERVER-9296, SERVER-9284, and SERVER-14583). The indexCounters section was removed during the development cycle leading up to MongoDB 3.0 along with some other former metrics like recordStats and workingSet. See: SERVER-16378 and discussion on related issues in the MongoDB Jira issue tracker.
If you have enabled the WiredTiger storage engine, note that there will be a new wiredTiger section in the serverStatus() output with relevant metrics.
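For example, a quick way to check whether that section is present (a minimal sketch using pymongo against a local mongod; the shell equivalent is db.serverStatus().wiredTiger):

    # Minimal sketch: inspect the wiredTiger section of serverStatus via pymongo.
    # Assumes a mongod listening on localhost:27017.
    from pymongo import MongoClient

    client = MongoClient("localhost", 27017)
    status = client.admin.command("serverStatus")

    if "wiredTiger" in status:
        # Print the available metric groups (cache, block-manager, and so on).
        print(sorted(status["wiredTiger"].keys()))
    else:
        print("no wiredTiger section; storage engine is probably MMAPv1")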
What page fault value should I be concerned about?
Page faults provide a good proxy for whether your working set fits in memory with MMAP, but the specific value of concern will depend on your deployment and whether there is any noticeable performance impact. Consistently high hard page faults (where data needs to be loaded from disk to RAM) will add I/O pressure, but this may not be significant depending on your disk configuration and overall workload.
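If you want to watch the page fault rate directly rather than relying on the mongostat faults column, serverStatus exposes a cumulative counter under extra_info (reported on Linux). A rough sketch, again assuming pymongo and a local mongod:

    # Rough sketch: sample the cumulative page fault counter from serverStatus
    # and print a per-second rate. Assumes pymongo and a mongod on localhost.
    import time
    from pymongo import MongoClient

    client = MongoClient("localhost", 27017)
    interval = 5  # seconds between samples; adjust to taste

    previous = client.admin.command("serverStatus")["extra_info"].get("page_faults", 0)
    while True:
        time.sleep(interval)
        current = client.admin.command("serverStatus")["extra_info"].get("page_faults", 0)
        print("page faults/sec: %.1f" % ((current - previous) / float(interval)))
        previous = current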
A general best practice is to use a monitoring system like MMS (MongoDB Management Service) to capture a historical baseline of metrics for your deployment so you can then look for pain points when performance problems are observed.
It's also worth reading the Production Notes section of the MongoDB manual. If you are using Linux, for example, there are some suggestions on tuning file system and readahead parameters that can affect the efficiency of reading data from disk.
For an idea of how to approach metrics, see: Five MMS monitoring alerts to keep your MongoDB deployment on track. This blog post is a few years old but the general approach of determining normal, worrying, and critical limits (as well as identifying false positives) is still very relevant.
Related
I am working on implementing a prototype performance monitoring system. I went through multiple documents and resources to understand the concepts, but I am still confused about the difference between profiling and diagnostics. Can somebody explain these two terms, how they relate, and when/where each is used?
"Profiling" usually means mapping things happening in the system (e.g., performance monitoring events) to processes, or to functions (or instructions) within processes. Examples of profiling tools in the Unix/Linux world include "gprof" and "oprofile". Intel's "VTune Amplifier" is another commonly used profiler. Some profilers are limited to looking at the performance of a single process, while others (usually requiring elevated privileges) monitor all processes (including the kernel) operating on the system during the measurement period.
"Diagnostics" is not a term I see very often in performance monitoring, but from the context I would assume that this means looking for evidence of "trouble" in the overall operation of the system. As an example, the performance monitoring system at https://github.com/TACC/tacc_stats collects hardware and software performance monitoring data on each server. In TACC's operation, the data is reviewed automatically to look for matches to a variety of heuristics related to known patterns of poor performance (e.g., all memory accesses being made to one socket in a 2-socket system). The data is also used by human performance analysts in response to user queries and is aggregated to provide an overview of performance-related characteristics by application area.
On one of our NiFi instances, when we are in a backlog state, we encounter the throttling warning quite frequently. We have tuned the indexing threads and also increased the resources (CPU) allocated to the VM. What else should we be looking at to identify what is causing the contention that results in throttling? It could obviously be disk I/O, but nothing jumps out in the monitoring. Any suggestions on how others investigate this further would be greatly appreciated.
NIFI Version: 0.6.1
I would focus on disk contention. Are the flowfile, content, and provenance repositories all on the same physical partition? If yes then almost certainly it is disk contention related. A great command to use for this is 'iostat'. You can typically run something like 'iostat -xmh 5' and watch for utilization.
Now even on a well configured system it is possible to have just such a high rate of data that provenance indexing simply cannot keep up. These cases are fairly rare and almost always easily addressed by reducing the number of individual items floating around the flow (leveraging batching where appropriate).
There have been considerable performance related improvements since the 0.6.1 release regarding provenance handling and that may or may not help your case.
Worst-case scenario, you can switch to transient provenance, which is all in memory and only keeps 100,000 recent events by default.
I am trying to develop a brand new multitasking software product. It will have multiple components that talk to each other to make the system work. Before releasing it to end users, we will run many rounds of validation to ensure good quality. During the validation cycle our developers will rely mainly on collected logs to troubleshoot and diagnose issues. Since our debugging capability and time to market depend heavily on the collected traces, we would like developers to have as much information as possible at first hand. On the other hand, adding a lot of traces slows the software down and consumes precious resources such as CPU cycles, so we would like to keep tracing/logging at an optimum.

To solve this problem we want to give our developers guidance on judiciously using the different log levels, e.g. debug, info, warning, error, where the number of traces at debug > info > warning > error. We would like to start the validation cycle at, say, the "info" level and gradually move to "warning" and "error" as the software matures. Developers tend to put in as many traces as they can so that all the information is available, but in a multi-component system the total number of traces becomes very high and unmanageable, leading to large log sizes, software performance issues, and so on.

We want to give generic guidance along the lines of: if your component has 100 log statements in total, all 100 should be available at debug, 70 at info, fewer at warning, and perhaps 10 at error, so that we get good-quality traces at first hand while keeping the total log/trace size under control. Is there any guidance or standard available that we should use?
You're unlikely to find any standard guides containing information about how to efficiently or effectively log. The fact is that every application has different requirements with regards to latency. There are some tricks you can employ to reduce your overhead. The points I'm mentioning here come from years of experience with highly concurrent software for which logging was a real bottleneck.
Define Semantics and Make Them Configurable
Your post only gets as deep as describing some sort of hierarchy of log message types. Your question seems to be about whether there are best practices for limiting the overhead of these types. There are not, and this is because everybody needs different information at different times.
Let's assume your hierarchy of DEBUG < INFO < WARNING < ERROR < FATAL. (I introduced FATAL because you need one.) There are some best practices. They are best practices because they are common sense, not because an authority said so.
FATAL messages correspond to events that are going to terminate your software. You always log these, no matter what. This behavior hopefully obviously does not need to be configurable.
ERROR messages are things that have prevented your software from making expected progress, and it is unlikely that someone would have noticed.
WARNING messages correspond to unexpected events, possibly events that prevented something from happening, but it is likely someone would have noticed.
The distinction in visibility is the real difference between WARNING and ERROR. ERROR is higher priority precisely because it is not otherwise visible; a WARNING has a higher probability of being brought to your attention anyway.
INFO messages are purely informational, but their utility is largely for users of the software (as opposed to developers). These messages may include information about module start-up, version information, etc. They are useful for understanding how a system is configured and its runtime state. They generally shouldn't be introduced as part of common runtime behavior.
DEBUG messages are also purely informational, but their utility is for developers. When DEBUG messages are enabled, you want to log everything, always.
If you treat this as a hierarchy, your configuration lets you specify the minimum log level. The alternative is to treat them as options (i.e. "I want to log ERROR and INFO"). Either way, it may be useful to have different settings for different levels. But regardless, make them configurable.
Regarding DEBUG messages, it can be beneficial to break these down further in modular software. If you're concerned about DEBUG traces at customer sites including too much information, break down DEBUG messages into a per-component thing. Allow configuring e.g. DEBUG.subsystemA, DEBUG.subsystemB, etc.
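As an illustration of that kind of per-component configuration (a sketch using Python's standard logging module; the component names are made up):

    # Sketch: per-component log levels with Python's standard logging module.
    # "subsystemA" and "subsystemB" are hypothetical component names.
    import logging

    logging.basicConfig(format="%(asctime)s %(name)s %(levelname)s %(message)s",
                        level=logging.INFO)  # global default: INFO and above

    # Enable DEBUG only for the component currently under investigation.
    logging.getLogger("subsystemA").setLevel(logging.DEBUG)
    logging.getLogger("subsystemB").setLevel(logging.WARNING)

    logging.getLogger("subsystemA").debug("emitted: DEBUG enabled for this component")
    logging.getLogger("subsystemB").debug("suppressed: below this component's threshold")
    logging.getLogger("subsystemB").warning("emitted: meets this component's threshold")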
Use a Central Logger
At some point in your software, you need to actually put logs in a place. This should be done by a central service (library component, subsystem, whatever terminology makes sense for you) in your software.
Why centralize this? Many of the tricks you can employ to reduce the overhead of logging rely on being able to coordinate between multiple threads / tasks / whatever. It's easier to do this when log management is centralized. (Other approaches may end up requiring significant locking overhead, which sort of defeats the purpose.)
Reduce Memory Allocation
Naïve approaches to logging tend to be extremely hard on memory. Ephemeral log entries are constantly being allocated and freed, strings are being created and concatenated all the time, and so on. Memory allocation isn't cheap, so we should start there.
Define a maximum length of a log line and pre-allocate your log buffers. Much of the overhead of a naive logging solution comes from allocating and freeing memory to store log entries. In fact, log entries tend to be pretty ephemeral. If you can define a maximum length of a log entry, you can preallocate them, and use a pool of them to reduce allocation overhead through reuse.
Many languages have immutable strings, and operations like concatenation that are common in logging end up having high overhead due to extra allocation. Use string building interfaces that do not have the overhead of allocating new objects every time a string operation happens.
Somewhat relatedly, many languages have format specifiers for constructing strings. Format specifiers have high cost because format strings must be parsed at runtime, every time. Instead of using format specifiers, attempt to use a string building interface directly to avoid format specifier overhead.
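A rough sketch of the pooling idea (illustrative only; a real implementation would be in whatever language your software uses, with whatever thread-safety it requires):

    # Rough sketch of a preallocated log-entry pool: fixed-size byte buffers are
    # reused instead of allocating a fresh string or object per log call.
    from collections import deque

    MAX_LINE = 256      # maximum length of one log entry
    POOL_SIZE = 1024    # number of preallocated entries

    class LogEntryPool:
        def __init__(self):
            self._free = deque(bytearray(MAX_LINE) for _ in range(POOL_SIZE))

        def acquire(self):
            # Returns None when the pool is exhausted (see rate limiting below).
            return self._free.popleft() if self._free else None

        def release(self, buf):
            self._free.append(buf)

    pool = LogEntryPool()
    entry = pool.acquire()
    if entry is not None:
        msg = b"connection accepted"
        entry[:len(msg)] = msg   # fill the reused buffer in place
        # ... hand the entry to the central logger, which releases it after I/O ...
        pool.release(entry)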
Rate Limit
When you've reduced the allocation overhead, you may still find that the overhead is prohibitive in some cases. If you've used a log entry pool, you will have an easy means to rate limit. If you have not (you should have, it's a good idea), you should consider rate limiting by itself.
With a log entry pool, when the pool is empty, you simply refuse to service new log entries. This effectively limits the rate. Otherwise, you'll need to define some kind of counter and use some kind of aging mechanism to determine the relevant immediate rate of messages you are logging.
Rate limiting should happen before format strings are parsed and before you do any allocations. Otherwise, you're paying additional overhead to do nothing.
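A minimal counter-based sketch of the idea (illustrative; window handling in a real system would likely be finer-grained):

    # Sketch of a simple per-second rate limiter: refuse log entries once the
    # budget is spent, before any formatting or allocation happens.
    import time

    class LogRateLimiter:
        def __init__(self, max_per_second):
            self.max_per_second = max_per_second
            self.window_start = time.monotonic()
            self.count = 0

        def allow(self):
            now = time.monotonic()
            if now - self.window_start >= 1.0:
                self.window_start = now   # new one-second window
                self.count = 0
            if self.count >= self.max_per_second:
                return False              # drop (or just count) this message
            self.count += 1
            return True

    limiter = LogRateLimiter(max_per_second=1000)
    if limiter.allow():
        pass  # only now build the message and queue it for the central logger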
Batch Log Items
Queue your log items in a batch. This should be what you're already doing if you've created a centralized logging service in your software. Batching increases log latency, but reduces the immediate I/O overhead when your output is to a disk.
Make your flush interval configurable. Make your batch size configurable.
Queue Items Asynchronously
If you're batching items, it's clearly already asynchronous, but just to spell it out: a log request for anything but a FATAL-level log item is a request that can wait a little bit to be logged. Much of the overhead of logging is due to I/O overhead, especially when the log destination is a disk or network endpoint.
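Putting the batching and asynchrony together, a minimal sketch of a central logger with a configurable batch size and flush interval (names and defaults are illustrative):

    # Minimal sketch of a central, asynchronous, batching logger: producers
    # enqueue entries and return immediately; a background thread flushes to
    # disk when the batch fills or the flush interval expires.
    import queue
    import threading

    class AsyncBatchLogger:
        def __init__(self, path, batch_size=64, flush_interval=1.0):
            self.batch_size = batch_size            # configurable
            self.flush_interval = flush_interval    # configurable, in seconds
            self._queue = queue.Queue()
            self._file = open(path, "a")
            threading.Thread(target=self._drain, daemon=True).start()

        def log(self, line):
            self._queue.put(line)   # cheap; no I/O on the caller's thread

        def _drain(self):
            batch = []
            while True:
                try:
                    batch.append(self._queue.get(timeout=self.flush_interval))
                    if len(batch) < self.batch_size:
                        continue            # keep filling until the batch is full
                except queue.Empty:
                    pass                    # interval expired; flush what we have
                if batch:
                    self._file.write("\n".join(batch) + "\n")
                    self._file.flush()
                    batch = []

    logger = AsyncBatchLogger("/tmp/app.log")
    logger.log("service started")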
Provide Output Options
Logs don't have to go to files. They can go to network endpoints, files, or shared memory regions. Allow your users to define where logs go, and provide them with utilities or libraries to receive log items over those formats.
Make Custom Trace Formats
Sometimes logging is too heavyweight for the task at hand. In some cases, custom trace formats can allow you to log where you were unable to before.
In our software at Fastly, threads take snapshots of their execution history and put them in a ring buffer that is exposed via tmpfs. These snapshots are strategically placed around locks and in a few key places in our main state machine.
The ring buffer is exposed to the software via a shared mapped memory region, and since it lives on a Linux tmpfs, all operations remain in-memory. These traces are always on, but require minimal overhead. By implementing it this way, we are able to get the level of debugging information in our production deployments that allows us to debug race conditions, deadlocks, and other problems.
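This is not Fastly's code, but the general shape is easy to sketch: a fixed-size ring of fixed-width records in a file on a tmpfs, written through mmap so that recording a trace is just a memory copy. Paths, sizes, and record layout below are made up for illustration.

    # Rough sketch of an always-on trace ring buffer backed by a tmpfs file and
    # mmap: each trace overwrites the oldest fixed-width slot in place, so
    # tracing never waits on disk I/O.
    import mmap
    import os
    import struct
    import time

    RECORD_FMT = "d i 48s"                  # timestamp, event id, short label
    RECORD_SIZE = struct.calcsize(RECORD_FMT)
    NUM_RECORDS = 4096
    PATH = "/dev/shm/myapp-trace.ring"      # tmpfs on most Linux systems

    fd = os.open(PATH, os.O_CREAT | os.O_RDWR, 0o600)
    os.ftruncate(fd, RECORD_SIZE * NUM_RECORDS)
    ring = mmap.mmap(fd, RECORD_SIZE * NUM_RECORDS)
    slot = 0

    def trace(event_id, label):
        """Overwrite the oldest slot with a new snapshot; wraps around forever."""
        global slot
        record = struct.pack(RECORD_FMT, time.time(), event_id, label.encode()[:48])
        offset = slot * RECORD_SIZE
        ring[offset:offset + RECORD_SIZE] = record
        slot = (slot + 1) % NUM_RECORDS

    trace(1, "lock acquired")
    trace(2, "state machine: dispatch")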
The idea behind these sorts of optimizations is that they reduce the need for you to discard log messages in the first place. If you log appropriately and judiciously, and your logging infrastructure is properly designed, you will never want for more information.
The standard practice is to make the logging level configurable at run time. This is done by using a logging framework such as winston for javascript or an slf4j implementation for java.
Given that, I would let your developers work out on their own how much debug and trace information they need. It is also usually pretty clear when an error needs to be logged. So the only guidance/discussion we have is around how much should be logged at the info level. I would let the developers use their best judgement and then adjust as necessary, with the general guidance that stack traces for handled errors or data dumps should only be logged at the debug or trace levels.
When the system is deployed, the log level should be set to info or higher. If an issue arises on a deployed system, the logging level can be increased temporarily to debug (or possibly higher) to allow the issue to be investigated.
Developers may run development systems with a debug or trace log level as the default.
During your validation cycle you can turn on debug logs as necessary to capture information.
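For illustration, here is the same idea with Python's standard logging module (winston and slf4j back-ends are conceptually the same): the level comes from configuration, so it can be raised or lowered on a deployed system without code changes. The LOG_LEVEL environment variable here is just an example of where the setting might live.

    # Illustration: the log level comes from configuration rather than code, so
    # a deployed system can be switched to DEBUG temporarily while an issue is
    # investigated. LOG_LEVEL is a hypothetical configuration knob.
    import logging
    import os

    level_name = os.environ.get("LOG_LEVEL", "INFO")   # e.g. export LOG_LEVEL=DEBUG
    logging.basicConfig(level=getattr(logging, level_name.upper(), logging.INFO))

    log = logging.getLogger("orders")
    log.debug("full request payload: ...")   # only emitted when LOG_LEVEL=DEBUG
    log.info("order accepted")
    log.error("payment gateway unreachable")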
I'm reading the O'Reilly Linux Kernel book, and one of the things pointed out in the chapter on paging is that the Pentium cache lets the operating system associate a different cache management policy with each page frame. So I get that there could be scenarios where a program has so little spatial/temporal locality, and memory accesses are random/infrequent enough, that the probability of cache hits falls below some threshold.
I was wondering whether this mechanism is actually used in practice today, or whether it is more of a feature that was necessary back when caches were fairly small and not as efficient as they are now. I could see it being useful for an embedded system with little overhead in the way of system calls; are there other applications I am missing?
Having multiple cache management policies is widely used, whether by assigning whole regions using MTRRs (fixed/dynamic, as explained in Intel's PRM), MMIO regions, or through special instructions (e.g. streaming loads/stores, non-temporal prefetches, etc.). The use cases also vary a lot: you may be trying to map an external I/O device into virtual memory (and don't want CPU caching to impact its coherence), you may want to define a write-through region for better integrity management of some database, or you may just want plain write-back to maximize cache-hierarchy capacity and replacement efficiency (which means performance).
These usages often overlap (especially when multiple applications are running), so the flexibility is very much needed, as you said - you don't want data with little to no spatial/temporal locality to thrash out other lines you use all the time.
By the way, caches are never going to be big enough in the foreseeable future (with any known technology), since increasing them requires locating them further away from the core and paying in latency. So cache management is still, and will be for a long while, one of the most important aspects of performance-critical systems and applications.
A distributed system is described as scalable if it remains effective when there is a significant increase in the number of resources and the number of users. However, these systems sometimes face performance bottlenecks. How can these be avoided?
The question is pretty broad, and depends entirely on what the system is doing.
Here are some things I've seen in systems to reduce bottlenecks.
Use caches to reduce network and disk bottlenecks. But remember that knowing when to evict from a cache can be a hard problem in some situations (a minimal eviction sketch appears at the end of this answer).
Use message queues to decouple components in the system. This way you can add more hardware to specific parts of the system that need it.
Delay computation when possible (often by using message queues). This takes the heat off the system during high-processing times.
Of course, design the system for parallel processing wherever possible. One host doing all the processing is not scalable. Note: most relational databases fall into the one-host bucket, which is why NoSQL has suddenly become popular, though in theory it is not always appropriate.
Use eventual consistency if possible. Strong consistency is much harder to scale.
Some are proponents of CQRS and DDD. Though I have never seen or designed a "CQRS system" or a "DDD system", those ideas have definitely affected the way I design systems.
There is a lot of overlap in the points above; some of the techniques make use of the others.
But, experience (your own and others) eventually teaches you about scalable systems. I keep up-to-date by reading about designs from google, amazon, twitter, facebook, and the like. Another good starting point is the high-scalability blog.
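To make the cache-eviction point above concrete, here is a minimal TTL-based sketch (illustrative only; production systems usually reach for memcached, Redis, or similar rather than rolling their own):

    # Minimal TTL cache sketch: entries expire after a fixed time-to-live, which
    # sidesteps the hard question of when to evict by bounding staleness instead.
    import time

    class TTLCache:
        def __init__(self, ttl_seconds=30):
            self.ttl = ttl_seconds
            self._store = {}    # key -> (value, expiry timestamp)

        def get(self, key):
            item = self._store.get(key)
            if item is None:
                return None
            value, expires_at = item
            if time.monotonic() >= expires_at:
                del self._store[key]    # lazy eviction on read
                return None
            return value

        def put(self, key, value):
            self._store[key] = (value, time.monotonic() + self.ttl)

    cache = TTLCache(ttl_seconds=10)
    cache.put("user:42", {"name": "Ada"})
    print(cache.get("user:42"))   # hit until the TTL expires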
Just to build on a point discussed in the above post: what you need for your distributed system is a distributed cache, so that when you intend to scale your application the distributed cache acts like an "elastic" data fabric, meaning you can increase the storage capacity of the cache without compromising performance, while also getting a reliable platform that is accessible to multiple applications.
One such distributed caching solution is NCache. Do take a look!