Seeing high RES memory but very little inuse_space while profiling [closed] - go

Closed. This question needs debugging details. It is not currently accepting answers.
Closed 2 days ago.
We have a job executor that runs periodic jobs. We have been seeing constant growth in the RES memory (to the tune of 10 GB). However, our heap usage is constant at around 75-85 MB. Here are the stats:
process_resident_memory_bytes 1.0107805696e+10 = 10 GB
go_memstats_heap_idle_bytes 1.00286464e+08 = 100 MB
go_memstats_heap_inuse_bytes 1.01433344e+08 = 100 MB
go_memstats_alloc_bytes 9.0241416e+07 = 90 MB
go_gc_cycles_automatic_gc_cycles_total = 8249
go_gc_heap_allocs_bytes_total 9.9512908776e+10 = 99 GB
go_gc_heap_frees_bytes_total 9.942266736e+10 = 99 GB
The Go version is 1.19.1. Looking at the call graph, we see that JSON-to-struct conversion (using https://github.com/goccy/go-json) has been allocating a lot of space (40 GB out of 99 GB). However, it only shows up in alloc_space and not in inuse_space. Any pointers to what the reason could be, or whether you have seen anything similar, would be helpful.
We tried to use pprof to see where the memory leak might be. However, it reported only 100 MB in use. We profiled the memory on different days, but the heap size was constant while the RES memory of the executable kept increasing.
Also, since Go 1.16 the runtime releases memory to the OS promptly rather than lazily (it uses MADV_DONTNEED rather than MADV_FREE), so that doesn't seem to be the issue either.

Related

Does C Code enjoy the Go GC's fragmentation prevention strategies? [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 4 years ago.
Corrected the false implications:
Golang's GC uses fragmentation-prevention strategies, which enables a program to run for a very long time (if not forever).
But it seems C code (cgo or SWIG) has no means of getting any benefit from these strategies.
Is that true? Won't C code benefit from Golang's fragmentation prevention, and will it eventually suffer from fragmentation?
If that's false, how?
Also, what happens to any DLL code loaded by C code (e.g. Windows DLLs)?
(The question has been updated to correct my wrong assumptions.)
I'm afraid you might be confusing things on multiple levels here.
First, calling into C in production-grade Go code is usually a no-go right from the start: it is slow, roughly as slow as making a system call, and for the most part it works like one: you need to switch from the Go stack to a C stack, and the OS thread that happened to be executing the Go code making the cgo call stays tied up even if something on the C side blocks.
That is not to say you must avoid calling out to C, but it means you need to think this through up front and measure. Maybe set up a pool of worker goroutines onto which to fan out the tasks that need to make C calls.
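The worker-pool idea can be sketched as follows; expensiveCCall is a hypothetical stand-in for a real cgo call, since the pattern does not depend on what the C side actually does:

```go
package main

import (
	"fmt"
	"sync"
)

// expensiveCCall is a stand-in for a real cgo call; the pooling pattern
// is the same regardless of what happens on the C side.
func expensiveCCall(n int) int {
	return n * n
}

// fanOut runs jobs on a fixed pool of workers, so at most `workers`
// OS threads can ever be tied up inside C at the same time.
func fanOut(jobs []int, workers int) []int {
	results := make([]int, len(jobs))
	var wg sync.WaitGroup
	ch := make(chan int) // indices of pending jobs

	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range ch {
				results[i] = expensiveCCall(jobs[i])
			}
		}()
	}
	for i := range jobs {
		ch <- i
	}
	close(ch)
	wg.Wait()
	return results
}

func main() {
	fmt.Println(fanOut([]int{1, 2, 3, 4}, 2)) // prints [1 4 9 16]
}
```

Capping the worker count bounds the number of OS threads the runtime has to spawn when C calls block, instead of letting every goroutine grab its own thread.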
Second, your memory concerns may well be unfounded; let me explain.
Fragmenting virtual memory should be a non-issue on the contemporary systems usually used to run Go programs (I mean amd64 and the like). That is pretty much because allocating virtual memory does not force the OS to actually allocate physical memory pages; the latter happens only when the virtual memory gets used (that is, accessed at an address that points into an allocated virtual memory region). So, whether you like it or not, you have that physical memory fragmentation problem anyway, and it is sorted out at the OS and CPU level using multi-layered address translation tables (and TLB caches).
Third, you appear to be falling into the common trap of speculating about how things will perform under load instead of writing a highly simplified model program and inspecting how it behaves under the estimated production load. That is, you think a problem with allocating C memory will occur, and then fancy that the whole thing will not work. I would say your worries are unfounded, given the amount of production code written in C and C++ and working under hardcore loads.
And finally, C and C++ programmers trod the paths to high-performance memory management a long time ago. A typical solution is using custom pool allocators for the objects which exhibit the most allocation/deallocation churn under the typical load. With this approach, the memory allocated on your C side stays mostly stable for the lifetime of your program.
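Go itself offers the same pooling idea on the Go side via sync.Pool. A minimal sketch (the bufPool variable and render function are illustrative names, not from any code discussed here):

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// bufPool recycles buffers instead of handing every call a fresh
// allocation; this is the Go analogue of the C-side pool allocators
// described above, aimed at objects with heavy alloc/free churn.
var bufPool = sync.Pool{
	New: func() any { return new(bytes.Buffer) },
}

func render(name string) string {
	buf := bufPool.Get().(*bytes.Buffer)
	defer func() {
		buf.Reset() // make the buffer reusable before returning it
		bufPool.Put(buf)
	}()
	fmt.Fprintf(buf, "hello, %s", name)
	return buf.String()
}

func main() {
	fmt.Println(render("gopher")) // prints "hello, gopher"
}
```

Pooled objects survive between uses (subject to GC reclaiming idle pool entries), so the steady-state allocation rate drops sharply under churn-heavy loads.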
TL;DR
Write a model program, put the estimated load on it and see how it behaves. Then analyze what the memory problems are, if any, and only then start attacking them.
It depends.
The memory the C code needs can be allocated by Go and its pointer passed to the C code. In this case, the C code will benefit from Go's fragmentation-prevention strategies.
The same goes for DLL code: if DLL functions don't allocate their working memory on their own, this can be done for them as well.

Why is the size of stack memory fixed? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 6 years ago.
If there were only stack memory and no heap memory, what problems would be created? I think it would make programs very fast.
I know objects are created in heap memory.
But if objects were created in stack memory, what would the problems be? Why did we create heap memory?
I read:
Stack
- very fast access
- don't have to explicitly de-allocate variables
- space is managed efficiently by the CPU; memory will not become fragmented
- limit on stack size (OS-dependent)
Heap
- variables can be accessed globally
- no limit on memory size
- (relatively) slower access
- no guaranteed efficient use of space; memory may become fragmented over time as blocks of memory are allocated, then freed
- you must manage memory (you're in charge of allocating and freeing variables)
- variables can be resized using realloc()
I would start with the advantages of heaps / disadvantages of stacks:
1. Allocation-order-independent freeing: This is the most important answer to your question. If there were only stack-based memory, you could not immediately free a memory area unless it was immediately on top of the stack. With a heap, however, you can free memory regardless of the order of allocation requests. Since in dynamic software you cannot expect to know the order of free requests throughout the life cycle of your software, this is why you do not always want to use stack memory.
2. Determining the system-wide stack size: Another related point is that if only stack memory were used, all threads in the system would have fixed memory. In that case it would not be easy to determine an ideal default thread stack size, which would cause over-consumption of memory. That could lead to out-of-memory issues even though there is actually enough unused memory. On this one I suggest looking at http://linuxtips.manki.in/2011/11/tweaking-stack-size-of-linux-processes.html
3. Areas where stack-like heap allocators can be useful: Game engines employ this technique. A good example is loading and unloading assets (textures, shaders, 3D meshes, etc.) when loading and unloading a level. A stack-like allocator is a natural solution, as the order of unloading assets is the reverse of loading them. Here you can see an implementation: https://github.com/davidaug/stackallocator
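A stack-like allocator of the kind linked above can be sketched in a few lines of Go (the StackAllocator type and its method names are our own invention, not from that repository):

```go
package main

import "fmt"

// StackAllocator hands out slices from one big arena; FreeToMarker
// rewinds to a saved marker, releasing everything allocated after it
// in LIFO order (the pattern game engines use for per-level assets).
type StackAllocator struct {
	arena []byte
	top   int
}

func NewStackAllocator(size int) *StackAllocator {
	return &StackAllocator{arena: make([]byte, size)}
}

// Alloc bumps the top of the stack and returns the carved-out slice,
// or nil if the arena is exhausted.
func (s *StackAllocator) Alloc(n int) []byte {
	if s.top+n > len(s.arena) {
		return nil
	}
	p := s.arena[s.top : s.top+n]
	s.top += n
	return p
}

// Marker records the current top so a whole batch of allocations
// can later be released at once.
func (s *StackAllocator) Marker() int { return s.top }

func (s *StackAllocator) FreeToMarker(m int) { s.top = m }

func main() {
	a := NewStackAllocator(1024)
	m := a.Marker()
	_ = a.Alloc(100)        // e.g. level textures
	_ = a.Alloc(200)        // e.g. level meshes
	fmt.Println(a.Marker()) // prints 300
	a.FreeToMarker(m)       // unload the level: frees both at once
	fmt.Println(a.Marker()) // prints 0
}
```

Allocation and freeing are both O(1) pointer bumps, which is exactly why this scheme cannot free in arbitrary order: everything above the marker goes at once.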
4. Other issues with heaps: You can also consider these as more pros for the stack, in addition to the ones you mentioned in the question:
a) Multicore systems: Another advantage of stack allocators is much less lock contention, as handling allocation requests from different CPU cores is an important problem in heap allocator design. In addition to lock contention, you also have to deal with problems like false sharing. On this one, you could look at http://www.drdobbs.com/parallel/understanding-and-avoiding-memory-issues/212400410
b) Fragmentation: The very first item I mentioned (allocation-order-independent freeing) is a must-have requirement. However, it also brings the problem of fragmentation: What is memory fragmentation?

WorkingSet/PrivateWorkingSet memory doesn't add up to memory usage in Task Manager [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Closed 9 years ago.
Memory Issue
One of our server boxes shows 96% memory usage in Task Manager (137/140 GB or so used).
When I look in the "Processes" tab though (even with "show processes from all users" checked), the top processes combined only use 40 GB or so at peak times. I've provided an image of the top processes below, as well as an image of the performance panel showing the memory usage.
Note: CPU usage isn't usually at 99%; it spiked when I took that screenshot.
My Question
What is the reason for this discrepancy, and how can I more accurately tell which processes are eating the other 100GB of memory?
To verify, here's an image of the performance panel:
Sergmat is correct in his comment (thanks, by the way); I actually found RAMMap myself yesterday, used it, and it revealed the problem.
Our server runs a very heavily used SQL Server instance. RAMMap reveals a 105 GB region of memory used for AWE (Address Windowing Extensions), which is used to manipulate large regions of memory very quickly by things like RDBMSes (e.g. SQL Server).
Apparently you can configure the maximum memory SQL Server will use, AWE memory included; so that's the solution.

Context switch vs. memory access: which is faster? [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Closed 10 years ago.
Got asked in an interview. They asked to order the following in terms of speed:
CPU register access
Context switch
Memory access
Disk seek
I'm pretty sure the disk seek is the slowest and register access is the fastest, but not quite sure about the two in between. Can anyone explain it a bit?
I happened to find a surprisingly good answer on Yahoo!:
Fastest to slowest:
CPU
Memory
Context switching
Disk
Although:
Disk access may be significantly faster at times due to caching ... so can memory access (CPUs sometimes manage caches of main memory to help speed up access and avoid competition for the bus).
Memory access could also be as slow as or slightly slower than disk access at times, due to virtual memory page swapping.
Context switching needs to be extremely fast in general ... if it were slow, then your CPU could begin to spend more time switching between processes than actually performing meaningful work when several processes are running concurrently.
Register access is nearly instantaneous.
(emphasis mine)
I agree with that answer.

Where do OSes keep programs that are not used for some minutes? [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Closed 10 years ago.
Typically in a working environment, I have many windows open: Outlook, 2-3 Word documents, a few browser windows, Notepad++, some VPN client, Excel, etc.
That said, chances are that about 40% of these apps are not frequently used and are referred to only sparingly. They occupy memory nonetheless.
Now, how does a typical OS deal with that kind of memory consumption? Does it suspend those apps to the hard disk (page file, Linux swap area, etc.), thereby freeing up that memory for use, or do they keep occupying the memory as is?
Can this suspension be a practical, doable solution? Are there any downsides, e.g. response time?
Is there some study material I can refer to for reading on this topic?
I would appreciate the help here.
The detailed answer depends on your OS and how it implements its memory management, but here is a generality:
The OS doesn't look at memory in terms of how many processes are in RAM; it looks at it in terms of discrete units called pages. Most processes have several pages of RAM. The least-referenced pages can be swapped out of RAM onto the hard disk when physical RAM becomes scarce. Rarely, therefore, is an entire process swapped out of RAM; only certain parts of it are. It could be, for example, that some aspect of your currently running program is idle (i.e. its pages are rarely accessed). In that case, those pages could be swapped out even though the process is in the foreground.
Try the wiki article on paging for starters on how this process works and the many methods used to implement it.
