How can I programmatically create a process snapshot on Windows/Unix? - go

I am creating a Golang program that creates a process and then should be able to suspend it.
To make it more memory efficient, I would need my program to be able to dump the memory of the process to disk and reload it only when needed.
I cannot find any info here on Stack Overflow and also GitHub is not helping.
Any solution?

Attempting to answer this with the limited information provided.
"To make it more memory efficient, I would need my program to be able to dump the memory of the process to disk and reload it only when needed."
This is generally something handled by your operating system (scheduler, memory management) controlling what processes are currently running / suspended / etc. and what memory needs to be paged in / out. Trying to implement the equivalent is quite complex, error prone, and likely to be less performant. Why do you believe you need to implement this yourself?
If you are building a program and want explicit control over whether a process is considered runnable, you could have a parent process start (fork) a child process, and then have the parent suspend and resume the child using signals (SIGSTOP and SIGCONT on Linux):
https://man7.org/linux/man-pages/man7/signal.7.html
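A minimal sketch of the suspend/resume idea in Go, assuming a Unix-like system. Go programs normally start children with os/exec rather than a bare fork(), so that is what is used here. The helper names startChild, suspend and resume are made up for this illustration, and "sleep" is only a placeholder workload that is assumed to be installed.

```go
package main

import (
	"fmt"
	"os/exec"
	"syscall"
	"time"
)

// startChild launches a command as a child process and returns without
// waiting for it to exit. (Hypothetical helper for this sketch.)
func startChild(name string, args ...string) (*exec.Cmd, error) {
	cmd := exec.Command(name, args...)
	if err := cmd.Start(); err != nil {
		return nil, err
	}
	return cmd, nil
}

// suspend sends SIGSTOP, which the child cannot catch or ignore.
func suspend(cmd *exec.Cmd) error {
	return cmd.Process.Signal(syscall.SIGSTOP)
}

// resume sends SIGCONT so the scheduler treats the child as runnable again.
func resume(cmd *exec.Cmd) error {
	return cmd.Process.Signal(syscall.SIGCONT)
}

func main() {
	// "sleep 2" is a placeholder workload, assumed to exist on the system.
	cmd, err := startChild("sleep", "2")
	if err != nil {
		fmt.Println("start failed:", err)
		return
	}
	if err := suspend(cmd); err != nil {
		fmt.Println("suspend failed:", err)
	}
	time.Sleep(500 * time.Millisecond) // the child is stopped during this window
	if err := resume(cmd); err != nil {
		fmt.Println("resume failed:", err)
	}
	// Wait reaps the child once it exits.
	fmt.Println("child exited, error:", cmd.Wait())
}
```

While the child is stopped, its memory stays under the kernel's control, and the kernel decides if and when to page it out. That is the point the answer makes about not reimplementing this yourself. This sketch does not cover Windows, where these signals are not available.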

Related

Easier way to aggregate a collection of memory accesses made by a Windows process?

I'm doing this as a personal project: I want to make a visualizer for this data, but the first step is getting the data.
My current plan is to:
make my program debug the target process and single-step through it,
at each step, record the EIP from every thread's context within the target process,
construct the memory address the instruction uses from the context and store it.
Is there an easier or built in way to do this?
Have a look at Intel Pin for dynamic binary instrumentation; it lets you run a hook for every load / store instruction.
Instead of actually single-stepping in a debugger (extremely slow), it does binary-to-binary JIT to add calls to your hooks.
https://software.intel.com/sites/landingpage/pintool/docs/81205/Pin/html/index.html
Honestly, the best way to do this is probably instrumentation like Peter suggested, depending on your goals. Have you ever run a script that stepped through code in a debugger? Even automated, it's incredibly slow.
The only other alternative I see is page faults, which would also be incredibly slow but should still be faster than single-stepping. Basically, you make every page outside the currently executing section inaccessible. Any read/write access outside of the executing code will trigger an exception where you can log details and handle it. Of course this has a lot of flaws: you can't detect reads and writes in the current page, it's still going to be slow, and it can get complicated when handling execution transfers between pages, multiple threads, etc.
The final possible solution I have would be a timer interrupt that checks the read/write access bits for each page. This would be incredibly fast and, although it would provide no specific addresses, it would give you an aggregate of pages written to and read from. I'm actually not entirely sure off the top of my head whether Windows exposes that information already, and I'm also not sure there's a reliable way to guarantee your timer would fire before the kernel clears those bits.

Can many (similar) processes use a common RAM cache?

As I understand the creation of processes, every process has its own space in RAM for its heap, data, etc., which is allocated upon its creation. Many processes can share their data and storage space in some ways. But since terminating a process erases its allocated memory (and so also its caches), I was wondering whether many (similar) processes can share a cache in memory that is not allocated to any specific process, so that it can still be used when these processes are terminated and other ones are created.
This is a theoretical question from a student's perspective, so I am merely interested in operating systems in the general sense, without adding more functionality to them to achieve it.
For example I think of a webserver that uses only single-threaded processes (maybe due to lack of multi-threading support), so that most of the processes created do similar jobs, like retrieving a certain page.
There are at least four ways that what you describe can occur.
First, the system address space is shared by all processes. The operating system can save data there that survives the death of a process.
Second, processes can map logical pages to the same physical page frame. The termination of one process does not cause the page frame to be deallocated for the other processes.
Third, some operating systems have support for writable shared libraries.
Fourth, memory mapped files.
There are probably others as well.
I think so. When a process is terminated, its RAM is cleared. However, you're right that things such as webpages will be stored in the cache for when they're re-called. For example:
You open Google, then go to another tab and close the open Google page. When you next go to Google, it loads faster.
However, what I think you're saying is: if the entire program (e.g. Google Chrome or Safari) is closed, does the webpage you just had open stay in the cache? No. When the program is closed, all its related data is also released in order to fully close the program.
I guess this page has some info on it -
https://www.wikipedia.org/wiki/Shared_memory

Is it possible to associate data with a running process?

As the title says, I want to associate a random bit of data (ULONG) with a running process on the local machine. I want that data persisted with the process it's associated with, not the process that's reading and writing the data. Is this possible in Win32?
Yes, but it can be tricky. You can't directly access an arbitrary memory address of another process, and you can't count on shared memory because you want to do it with an arbitrary process.
The tricky way
What you can do is to create a window (with a special and known name) inside the process you want to decorate. See the end of the post for an alternative solution without windows.
First of all you have to get a handle to the process with OpenProcess.
Allocate memory with VirtualAllocEx in the other process to hold a short function that will create a (hidden) window with a special known name.
Copy that function from your own code with WriteProcessMemory.
Execute it with CreateRemoteThread.
Now you need a way to identify and read back this memory from a process other than the one that created it. For this you can simply find the window with that known name, and you have your holder for a small chunk of data.
Please note that this technique can also be used to inject code into another process, so some antivirus software may warn about it.
Final notes
If address space layout randomization (ASLR) is disabled, you may not need to inject code into the process memory: you can call CreateRemoteThread with the address of a kernel32.dll function that takes the same parameters as a thread procedure (for example, LoadLibrary). You can't do this with native applications (those not linked to kernel32.dll).
You can't inject into system processes unless your process has debug privileges (obtained with AdjustTokenPrivileges).
As an alternative to the fake window, you may create a suspended thread with a local variable, a TLS slot, or a stack entry used as the data chunk. To find this thread you have to give it a name using, for example, this (but it's seldom applicable).
The naive way
A poor man's solution (but probably much easier to implement and in some ways even more robust) is to use an NTFS alternate data stream (ADS) to hide a small data file for each process you want to monitor. Because the ADS is attached to the process image, this is not applicable to services and rundll'ed processes unless you make it much more complicated.
Iterate over all processes and, for each one, create an ADS with a known name (and the process ID).
Inside it, store the system startup time and all the data you need.
To read back that information:
Iterate over all processes and check for that ADS, read it, and compare the system startup time (if they mismatch, you have found an orphaned ADS and it should be deleted).
Of course you have to take care of these orphans, so you may need to check for them periodically. You can avoid this by storing all these small chunks of data in a well-known location; your "reader" can then check them all each time, deleting files no longer associated with a running process.

Limiting memory of V8 Context

I have a script server that runs arbitrary JavaScript code on our servers. At any given time multiple scripts can be running, and I would like to prevent one misbehaving script from eating up all the RAM on the machine. I could do this by having each script run in its own process and having an off-the-shelf monitoring tool watch the RAM usage of each process, killing and restarting the ones that get out of hand. I don't want to do this because I would like to avoid the cost of restarting the binary every time one of these scripts goes crazy. Is there a way in V8 to set a per-context/isolate memory limit that I can use to sandbox the running scripts?
It should be easy to do now:
context.EstimatedSize() to get the estimated size of the context
isolate.TerminateExecution() when the context exceeds the acceptable memory/CPU usage (or whatever limit you choose)
isolate.RequestInterrupt() is, I think, what you could use to regain control when there is an infinite loop (or something else blocking, such as a CPU-heavy calculation)
A single process can run multiple isolates. If you keep a one-to-one ratio of isolates to contexts, you can easily:
restrict memory usage per isolate
get heap stats
See some examples in this commit:
https://github.com/discourse/mini_racer/commit/f7ec907547e9a6ea888b2587e4edee3766752dd3
In particular you have:
v8::HeapStatistics stats;
isolate->GetHeapStatistics(&stats);
There are also fancy features like memory allocation callbacks you can use.
This is not reliably possible.
All JavaScript contexts in this process share the same object heap.
WebKit/Chromium tries some stuff to disable contexts after context OOMs.
http://code.google.com/searchframe#OAMlx_jo-ck/src/third_party/WebKit/Source/WebCore/bindings/v8/V8Proxy.cpp&exact_package=chromium&q=V8Proxy&type=cs&l=361
Sources:
http://code.google.com/p/v8/source/browse/trunk/src/heap.h?r=11125&spec=svn11125#280
http://code.google.com/p/chromium/issues/detail?id=40521
http://code.google.com/p/chromium/issues/detail?id=81227

Difference between pthread and fork on gnu/Linux

What is the basic difference between a pthread and fork w.r.t. linux in terms of
implementation differences and how the scheduling varies (does it vary ?)
I ran strace on two similar programs , one using pthreads and another using fork,
both in the end make clone() syscall with different arguments, so I am guessing
the two are essentially the same on a linux system but with pthreads being easier
to handle in code.
Can someone give a deep explanation?
EDIT : see also a related question
In C, however, there are some differences:
fork()
Purpose is to create a new process, which becomes the child process of the caller
Both processes will execute the next instruction following the fork() system call
The child receives its own copy of the parent's address space, code, and stack, so two identical copies exist: one for the parent and one for the child.
To think of fork() as if it were a person: forking creates a clone of your program (process), which runs the code it copied.
pthread_create()
Purpose is to create a new thread that runs within the caller's process
Threads within the same process can communicate using shared memory. (Be careful!)
The second thread shares data, open files, signal handlers and signal dispositions, the current working directory, and user and group IDs. The new thread gets its own stack, thread ID, and registers, though.
Continuing the analogy: your program (process) grows a second arm when it creates a new thread, connected to the same brain.
On Linux, the system call clone clones a task, with a configurable level of sharing.
fork() calls clone(least sharing) and pthread_create() calls clone(most sharing).
Forking costs a tiny bit more than calling pthread_create() because of copying tables and creating COW (copy-on-write) mappings for memory.
You should look at the clone manpage.
In particular, it lists all the possible clone modes and how they affect the process/thread, virtual memory space etc...
You say "threads easier to handle in code": that's very debatable. Writing bug-free, deadlock-free multithreaded code can be quite a challenge. Sometimes having two separate processes makes things much simpler.

Resources