I've been learning assembly recently and am starting to move into some of the more complex instructions. I understand how the stack works and how large it is, but how important is restoring it?
Let's say we call a function that takes a 4-byte integer:
push 0
call ...
Surely calling this a lot would cause the stack to overflow? I'm assuming this from basic knowledge, but perhaps it's handled elsewhere.
How important is popping that integer off of the stack (or realigning the stack pointer)? Is it better to do this, even though it might cost some performance?
I understand this could be needed in some circumstances, but not all.
However, I've noticed that WINAPI functions adjust the stack themselves rather than leaving the caller to restore it. Why is that?
To answer your question(s):
Would calling it a lot cause the stack to overflow?
Not necessarily. The WINAPI calling convention has the called routine pop the stack, so if every called routine performs its corresponding pop, the stack will not overflow.
How important is popping that integer off the stack?
As it is intended for use by the called function it would be expected to be popped off somewhere - either by the code performing the call if we're using a caller cleanup calling convention, or by the routine being called if we're using a callee cleanup calling convention.
Some situations not needing it
The WINAPI convention has the called routine pop the stack at the end, rather than the caller. This saves memory, since the cleanup code appears once in the callee instead of at every call site. Raymond Chen's blog has a good overview of calling conventions; full specifications for all of them are also in the x86 tag wiki.
There are a lot of calling conventions, and you need to follow them if you expect your code to work with other code. You can choose to 'roll your own', but you're always expected to follow some standard if you want to interoperate with code you didn't write.
Related
In my macOS app with mixed Objective-C/Swift, the Xcode memory graph shows leaked instances of dispatch_group.
I'm somewhat familiar with GCD and I use it in my project, but I don't use dispatch groups explicitly in my code. I thought it could be some indirect usage when I call other GCD APIs like dispatch_async. I was wondering whether somebody could help me track down this issue. Thanks for your attention.
In order to diagnose this, you want to know (a) what is keeping a strong reference to them; and (b) where these objects were instantiated. Unfortunately, unlike many objects, dispatch groups might not tell you much about the former (though your memory addresses suggest that there might be some object keeping a reference to them), but we can use the “Malloc Stack” feature to answer the latter.
So, edit your scheme (“Product” » “Scheme” » “Edit Scheme” or press command-<) and temporarily turn on this feature:
You can then click on the object in question and the panel on the right might show you illuminating stack information about where the object was allocated:
Now, in this case, in viewDidLoad I manually instantiated six dispatch groups, performed an enter, but not a leave, which is why these objects are still in memory. This is a common source of dispatch groups lingering in memory.
As you look at the stack trace, focus first on entries in your codebase (in white, rather than gray, in the stack trace). And if you click on your code in the stack trace, it will even jump you to the relevant line of code. But even if it is not in your code, often the stack trace will give you insights where these dispatch groups were created, and you can start your research there.
And remember, when you are done with your diagnostics, make sure to turn off the “Malloc Stack” feature.
I'm working on a tool that sometimes hijacks application execution, including running on a different stack.
I'm trying to get the kernel to always see the application stack when performing certain system calls, so that it will print the [stack] qualifier in the right place in /proc/pid/maps.
However, simply modifying esp around the system call doesn't seem to be enough. When I use my tool on "cat /proc/self/stat", I see that kstkesp (entry 29 here) sometimes has the value I want but sometimes has a different value, corresponding to my alternate stack.
I'm trying to understand:
1. How is the value reflected in /proc/self/stat:29 determined?
2. Can I modify it so that it will reliably have an appropriate value?
3. If 2 is difficult to answer, where would you recommend I look to understand why the value is intermittently incorrect?
It looks to me like it's defined at line 409 of http://lxr.free-electrons.com/source/fs/proc/array.c?v=3.16.
There has been lots of discussion about the related macro KSTK_ESP over the last few years, for example: https://github.com/davet321/rpi-linux/commit/32effd19f64908551f8eff87e7975435edd16624 and http://lists.openwall.net/linux-kernel/2015/01/04/140
From what I can gather about the intermittent oddness, it seems that an NMI or other interrupt sometimes hits inside the kernel, and in that case the stack isn't walked properly.
V8 requires a HandleScope to be declared in order to clean up any Local handles that were created within that scope. I understand that HandleScope will dereference these handles for garbage collection, but I'm interested in why each Local class doesn't do the dereferencing itself, like most internal ref_ptr-type helpers do.
My thought is that HandleScope can do it more efficiently by dumping a large number of handles all at once rather than one by one as they would in a ref_ptr type scoped class.
Here is how I understand the documentation and the handles-inl.h source code. I, too, might be completely wrong since I'm not a V8 developer and documentation is scarce.
The garbage collector will, at times, move stuff from one memory location to another and, during one such sweep, also check which objects are still reachable and which are not. In contrast to reference-counting types like std::shared_ptr, this is able to detect and collect cyclic data structures. For all of this to work, V8 has to have a good idea about what objects are reachable.
On the other hand, objects are created and deleted quite a lot during the internals of some computation. You don't want too much overhead for each such operation. The way to achieve this is by creating a stack of handles. Each object listed in that stack is available from some handle in some C++ computation. In addition to this, there are persistent handles, which presumably take more work to set up and which can survive beyond C++ computations.
Having a stack of references requires that you use this in a stack-like way. There is no "invalid" mark in that stack: all the objects from bottom to top of the stack are valid object references. The way to ensure this is the HandleScope. It keeps things hierarchical. With reference-counted pointers you can do something like this:
#include <memory>
using std::shared_ptr;

struct Object { explicit Object(int) {} };

shared_ptr<Object> f() {
    shared_ptr<Object> a(new Object(1));
    shared_ptr<Object> b(new Object(2));
    return b;   // a (object 1) is destroyed here; b (object 2) lives on
}

void g() {
    shared_ptr<Object> c = f();
}   // c (object 2) is destroyed here
Here object 1 is created first, then object 2; when the function returns, object 1 is destroyed while object 2 is still valid. The key point is that there is a span of time when object 1 is already gone but object 2 still exists, so the lifetimes are not nested in a stack-like way. That's what HandleScope aims to avoid.
Some other GC implementations examine the C stack and look for pointers they find there. This has a good chance of false positives, since stuff which is in fact data could be misinterpreted as a pointer. For reachability this might seem rather harmless, but when rewriting pointers since you're moving objects, this can be fatal. It has a number of other drawbacks, and relies a lot on how the low level implementation of the language actually works. V8 avoids that by keeping the handle stack separate from the function call stack, while at the same time ensuring that they are sufficiently aligned to guarantee the mentioned hierarchy requirements.
To offer yet another comparison: an object referenced by just one shared_ptr becomes collectible (and actually will be collected) once its C++ block scope ends. An object referenced by a v8::Handle will become collectible when leaving the nearest enclosing scope which did contain a HandleScope object. So programmers have more control over the granularity of stack operations. In a tight loop where performance is important, it might be useful to maintain just a single HandleScope for the whole computation, so that you won't have to access the handle stack data structure so often. On the other hand, doing so will keep all the objects around for the whole duration of the computation, which would be very bad indeed if this were a loop iterating over many values, since all of them would be kept around till the end. But the programmer has full control, and can arrange things in the most appropriate way.
Personally, I'd make sure to construct a HandleScope:
- At the beginning of every function which might be called from outside your code. This ensures that your code will clean up after itself.
- In the body of every loop which might see more than three or so iterations, so that you only keep variables from the current iteration.
- Around every block of code which is followed by some callback invocation, since this ensures that your stuff can get cleaned if the callback requires more memory.
- Whenever I feel that something might produce considerable amounts of intermediate data which should get cleaned (or at least become collectible) as soon as possible.
In general I'd not create a HandleScope for every internal function if I can be sure that every other function calling this will already have set up a HandleScope. But that's probably a matter of taste.
Disclaimer: this may not be an official answer; it's more of a conjecture on my part, and the v8 documentation is hardly useful on this topic, so I may be proven wrong.
From my understanding, gained from developing various v8-based backend applications, it's a means of handling the difference between the C++ and JavaScript environments.
Imagine the following sequence, in which a self-dereferencing handle could break the system:
1. JavaScript calls up a C++-wrapped v8 function, let's say helloWorld()
2. The C++ function creates a v8::Handle with the value "hello world =x"
3. C++ returns the value to the v8 virtual machine
4. The C++ function does its usual cleanup of resources, including dereferencing of handles
5. Another C++ function / process overwrites the freed memory space
6. V8 reads the handle, and the data is no longer the same: "hell!#(#..."
And that's just the surface of the complicated inconsistencies between the two. Hence, to tackle the various issues of connecting the JavaScript VM (virtual machine) to the C++ interfacing code, I believe the development team decided to simplify the issue as follows...
All variable handles are to be stored in "buckets", aka HandleScopes, to be built / compiled / run / destroyed by their respective C++ code when needed.
Additionally, all function handles are to refer only to C++ static functions (I know this is irritating), which ensures the "existence" of the function call regardless of constructors / destructors.
Think of it from a development point of view: it marks a very strong distinction between the JavaScript VM development team and the C++ integration team (the Chrome dev team?), allowing both sides to work without interfering with one another.
Lastly, it could also be for the sake of simplicity in emulating multiple VMs, as v8 was originally meant for Google Chrome. A simple HandleScope creation and destruction whenever we open / close a tab makes for much easier GC management, especially when you have many VMs running (one per tab in Chrome).
As I understand it, one can iterate a WDM device stack only from the bottom up, because DEVICE_OBJECT has an AttachedDevice member (but no LowerDevice member). Luckily, the AddDevice callback receives the PhysicalDeviceObject, so you can iterate over the entire stack.
From within my filter driver I'm trying to determine whether I'm already filtering a certain device object. (Let's say I have a legit reason for this. Bear with me.) My idea was to go over every DEVICE_OBJECT in the stack and compare its DriverObject member to mine.
Judging from the existence of IoGetAttachedDeviceReference, I assume just accessing AttachedDevice isn't a safe thing to do, because of the risk of the device suddenly going away. However, IoGetAttachedDeviceReference brings me straight to the top of the stack, which is no good for me.
So, is there a safe way to iterate over a device stack?
Correct, you can't safely walk the AttachedDevice chain unless you can somehow guarantee that the stack will not be torn down (e.g. if you have an active file object referencing the stack). On Win2K this is pretty much your only option.
On XP and later, the preferred method is actually to walk from the top of the stack down. You can do this by calling IoGetAttachedDeviceReference and then calling IoGetLowerDeviceObject.
-scott
I'm looking for a way to get the return value of a method via the Visual Studio Debugger (using DTE). Is it possible to obtain it if I'm at the closing brace of the method, but not yet exited? Also, it would be best if this could be possible without evaluating the function again via the immediate window.
Nope. The debugger doesn't have enough information about exactly how the JIT compiler generated the code that returns the value. It is a heavy-duty implementation detail of the jitter and the specific architecture it generates code for.
Simple types like object references and integral types are not much of a problem: usually the EAX/RAX register, the FPU stack, or the XMM0 register. It gets convoluted when the method returns a struct. That gets mapped to registers if the struct fits, but needs to spill into a temporary stack buffer when the struct is too large.
I suspect they'll need to do a lot of work on the metadata that the jitter generates to get that working. You'll know when that work is complete, it will become visible in the Autos window. Like it used to be, back in the simple days.