blocks and the stack - cocoa

According to bbum:
2) Blocks are created on the stack. Careful.
Consider:
typedef int(^Blocky)(void);
Blocky b[3];
for (int i=0; i<3; i++)
b[i] = ^{ return i;};
for (int i=0; i<3; i++)
printf("b %d\n", b[i]());
You might reasonably expect the above to output:
0
1
2
But, instead, you get:
2
2
2
Since the block is allocated on the stack, the code is nonsense. It
only outputs what it does because the Block created within the lexical
scope of the for() loop’s body hasn’t happened to have been reused for
something else by the compiler.
I don't understand that explanation. If the blocks are created on the stack, then after the for loop completes wouldn't the stack look something like this:
stack:
---------
^{ return i;} #3rd block
^{ return i;} #2nd block
^{ return i;} #1st block
But bbum seems to be saying that when each loop of the for loop completes, the block is popped off the stack; then after the last pop, the 3rd block just happens to be sitting there in unclaimed memory. Then somehow when you call the blocks the pointers all refer to the 3rd block??

You are completely misunderstanding what "on the stack" means.
There is no such thing as a "stack of variables". The "stack" refers to the "call stack", i.e. the stack of call frames. Each call frame stores the current state of the local variables of that function call. All the code in your example is inside a single function, hence there is only one call frame that is relevant here. The "stack" of call frames is not relevant.
The mention of "stack" means only that the block is allocated inside the call frame, like local variables. "On the stack" means it has a lifetime like that of local variables, i.e. "automatic storage duration": its lifetime is limited to the scope in which it was created.
This means that the block is not valid after the end of the iteration of the for-loop in which it was created. And the pointer you have to the block now points to an invalid thing, and it is undefined behavior to dereference the pointer. Since the block's lifetime is over and the space it was using is unused, the compiler is free to use that place in the call frame for something else later.
You are lucky that the compiler decided to place a later block in the same place, so that when you try to access the location as a block, it produces a meaningful result. But this is really just undefined behavior. The compiler could, if it wanted, place an integer in part of that space and another variable in another part, and maybe a block in another part of that space, so that when you try to access that location as a block, it will do all sorts of bad things and maybe crash.
The lifetime of the block is exactly analogous to that of a local variable declared in the same scope. A simpler example using a local variable reproduces what's going on:
int *b[3];
for (int i=0; i<3; i++) {
int j = i;
b[i] = &j;
}
for (int i=0; i<3; i++)
printf("b %d\n", *b[i]);
prints (probably):
b 2
b 2
b 2
Here, as in the case with the block, you are also storing a pointer to something that is scoped inside the iteration of the loop, and using it after the loop. And again, just because you're lucky, the space for that variable happens to be allocated to the same variable from a later iteration of the loop, so it seems to give a meaningful result, even though it's just undefined behavior.
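For contrast, a version with well-defined behavior has to point each stored pointer at an object that outlives the loop iteration. A minimal C sketch, assuming heap allocation is acceptable; the names fill_heap_pointers and print_and_free are hypothetical:

```c
#include <stdio.h>
#include <stdlib.h>

/* Give each stored pointer its own heap-allocated int, so the pointed-to
   object outlives the iteration that created it (unlike &j in the example
   above, whose lifetime ends at each closing brace).
   Returns 0 on success, -1 on allocation failure. */
int fill_heap_pointers(int *out[], int n)
{
    for (int i = 0; i < n; i++) {
        out[i] = malloc(sizeof *out[i]);
        if (out[i] == NULL) {
            /* Undo the allocations made so far before reporting failure. */
            for (int k = 0; k < i; k++)
                free(out[k]);
            return -1;
        }
        *out[i] = i;  /* dynamic storage: still valid after the loop */
    }
    return 0;
}

/* Print each value in the "b %d" format used above, then release it. */
void print_and_free(int *b[], int n)
{
    for (int i = 0; i < n; i++) {
        printf("b %d\n", *b[i]);
        free(b[i]);
    }
}
```

With n = 3 this prints b 0, b 1 and b 2, because every pointer refers to a distinct, still-live object.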
Now, if you're using ARC, you likely will not see what the quoted text describes, because under ARC, storing something in a variable of block-pointer type (and b[i] has block-pointer type) makes a copy of the block instead of a retain, and the copy is what gets stored. When a stack block is copied, it is moved to the heap (i.e. it is dynamically allocated, has dynamic lifetime and is memory-managed like other objects), and the copy operation returns a pointer to the heap block. That pointer you can safely use after the scope ends.
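The copy-to-heap step can be approximated in plain C. This is only a rough model, not the Blocks runtime: the closure struct and the closure_copy and make_closures helpers are hypothetical, and a real block additionally carries an isa pointer, flags and a descriptor.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a block: a code pointer plus captured state. */
typedef struct closure {
    int (*invoke)(const struct closure *self);
    int captured_i;  /* captured by value, like i in ^{ return i; } */
} closure;

static int return_captured(const closure *self)
{
    return self->captured_i;
}

/* Rough analogue of copying a stack block: the copy lives on the heap,
   so it stays valid after the scope that created the original ends.
   The caller must free() it. */
closure *closure_copy(const closure *original)
{
    closure *heap = malloc(sizeof *heap);
    if (heap != NULL)
        *heap = *original;
    return heap;
}

/* bbum's loop, but storing heap copies instead of pointers to
   iteration-scoped objects. Returns 0 on success, -1 on failure. */
int make_closures(closure *out[], int n)
{
    for (int i = 0; i < n; i++) {
        closure on_stack = { return_captured, i };  /* lifetime: this iteration */
        out[i] = closure_copy(&on_stack);
        if (out[i] == NULL) {
            for (int k = 0; k < i; k++)
                free(out[k]);
            return -1;
        }
    }
    return 0;
}
```

Calling b[k]->invoke(b[k]) for k = 0, 1, 2 on the result yields 0, 1, 2, the output the original example was expected to produce.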

Yeah, that does make sense, but you really have to think about it. When b[0] is given its value, the "^{ return 0;}" is never used again; b[0] is just its address. The compiler kept overwriting those temporary blocks on the stack as it went along, so the "2" comes from the last block written to that space. If you print those 3 addresses as they are created, I bet they are all the same.
On the other hand, if you unroll your assignment loop and add other references to "^{ return 0;}", like assigning it to c[0], you'll likely see b[0] != b[1] != b[2]:
b[0] = ^{ return 0;};
b[1] = ^{ return 1;};
b[2] = ^{ return 2;};
c[0] = ^{ return 0;};
c[1] = ^{ return 1;};
c[2] = ^{ return 2;};
Optimization settings could affect the outcome.
By the way, I don't think bbum is saying the pop happens after the for loop completes; it happens each time an iteration reaches the closing brace (end of scope).

Mike Ash provides the answer:
Block objects [which are allocated on the stack] are only valid through the lifetime of their enclosing scope
In bbum's example, the scope of the block is the for-loop's enclosing braces (which bbum omitted):
for (int i=0; i<3; i++) { // <------
b[i] = ^{ return i;};
} // <-----
So, each time through the loop, the newly created block is pushed onto the stack; then, when each iteration ends, the block is popped off the stack.
If you print those 3 addresses as they are created, I bet they are all the same.
Yes, I think that's the way that it must have worked in the past. However, now it appears that a loop does not cause the block to be popped off the stack. Now, it must be the method's braces that determine the block's enclosing scope. Edit: Nope. I constructed an experiment, and I still get different addresses for each block:
AppDelegate.h:
#import <Cocoa/Cocoa.h>
typedef int(^Blocky)(void); // ******TYPEDEF HERE********
@interface AppDelegate : NSObject <NSApplicationDelegate>
@end
AppDelegate.m:
#import "AppDelegate.h"
@interface AppDelegate ()
@end
@implementation AppDelegate
-(Blocky)blockTest:(int)i {
Blocky myBlock = ^{return i;}; // If the block is allocated on the stack, it should be popped off the stack at the end of this method.
NSLog(@"%p", myBlock);
return myBlock;
}
- (void)applicationDidFinishLaunching:(NSNotification *)aNotification {
// Insert code here to initialize your application
Blocky b[3];
for (int i=0; i < 3; ++i) {
b[i] = [self blockTest:i];
}
for (int j=0; j < 3; ++j) {
NSLog(@"%d", b[j]() );
}
}
@end
--output:--
0x608000051820
0x608000051850
0x6080000517c0
0
1
2
That looks to me like blocks are allocated on the heap.
Okay, my results above are due to ARC. If I turn off ARC, then I get different results:
0x7fff5fbfe658
0x7fff5fbfe658
0x7fff5fbfe658
2
1606411952
1606411952
That looks like stack allocation. Each pointer points to the same area of memory because after a block is popped off the stack, that area of memory is reused for the next block.
Then it looks like when the first block was called it just happened to get the correct result, but by the time the 2nd block was called, the system had overwritten the reclaimed memory resulting in a junk value? I'm still not clear on how calling a non-existent block results in a value??

Related

Swift: does assigning a function to a var cause a retain cycle?

I found a similar question, "Swift Memory Management: Storing func in var", but it didn't solve my problem.
Here is my class definition:
class Test {
var block: (() -> Int)?
func returnInt() -> Int {
return 1
}
deinit {
print("Test deinit")
}
}
I tried two ways to assign a value to the block property and got completely different results. The second approach didn't cause a retain cycle, which is quite unexpected:
var t = Test()
// This will lead to retain cycle
// t.block = t.returnInt
// I thought this will also lead to retain cycle but actually didn't
t.block = {
return t.returnInt()
}
t = Test()
In my opinion, the variable t is captured by the block while the block is a property of t, so can anyone explain why there isn't a retain cycle?
In Swift, all captured variables are captured by reference (in Apple Blocks terminology, all captured local variables are __block). So the t inside the block is shared with the t outside the block; the block does not hold an independent copy of t.
Originally, there is a retain cycle in the second case too, as the block holds a reference to this shared copy of t, and t points to the first Test object, and that Test object's block property points to the block. However, when you re-assign the shared variable t (which is visible both inside and outside the block), you break the retain cycle, because t no longer points to the first Test object.
In the first case, t is effectively captured by value, because t is evaluated immediately in the expression t.returnInt rather than being captured as a variable in a block. So a later reassignment of t outside the block has no effect on the block and does not break the retain cycle. So you can think of
t.block = t.returnInt
as kind of like
let tmp = t
t.block = {
return tmp.returnInt()
}
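The difference between the two cases can be modeled in C. This is a hypothetical sketch, not how Swift implements closures: an "object" is reduced to an integer id, a by-reference capture keeps a pointer to the shared variable, and a by-value capture keeps a copy of the variable's value.

```c
/* Hypothetical model: a variable that refers to an object by id. */
typedef struct {
    int object_id;
} variable;

/* By-reference capture: keeps a pointer to the shared variable, so it
   sees later reassignments (like t inside the closure literal). */
typedef struct {
    const variable *shared;
} by_ref_capture;

/* By-value capture: keeps the value the variable had when the expression
   was evaluated (like t in t.returnInt, or tmp above). */
typedef struct {
    int object_id;
} by_value_capture;

/* Which object each kind of capture currently refers to. */
int target_by_ref(const by_ref_capture *c)
{
    return c->shared->object_id;
}

int target_by_value(const by_value_capture *c)
{
    return c->object_id;
}
```

If t starts out referring to object 1 and is then reassigned to object 2 (like t = Test()), only the by-value capture still refers to object 1. That is why t.block = t.returnInt keeps the first Test object in a cycle, while the closure literal lets it go.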

Add CCNodes to a Parent CCNode using a for loop

-(CCNode *)createFieldNode:(NSMutableArray *)fieldArray{
CGSize winSize = [CCDirector sharedDirector].viewSize;
CCNode* stackNode= [CCNode node];
for (int i; i <=fieldArray.count; i++){
//itemP is previous item in array and itemC is current item in area based on index i
BPItem*itemP;
BPItem*itemC;
if(i!=0){
itemP=[fieldArray objectAtIndex:i-1];
itemC=[fieldArray objectAtIndex:i];
float stackWidth=arc4random()%200+50;
float stackHeight=itemP.position.y+itemP.contentSize.height;
itemC.position=ccp(stackWidth,stackHeight);
}
else{
itemC=[fieldArray objectAtIndex:i];
float stackWidth=arc4random()%200+50;
itemC.position=ccp(stackWidth,0);
}
//having trouble adding multiple nodes to stackNode
[stackNode addChild:itemC];
}
return stackNode;
}
I want to add CCNodes from fieldArray to a parent CCNode, "stackNode". Using breakpoints, I can see that the CCNodes at index 0 and index 1 are added. However, the program crashes at i=2. The error I receive is:
Terminating app due to uncaught exception 'NSInternalInconsistencyException', reason: 'child already added to another node. It can't be added again'
Before the crash, "stackNode" has two children. I'm not adding the CCNodes one by one because I have hundreds of different arrays, many with a fieldArray.count of around 20. Please help; I can explain more if I have been unclear.
Change the start of the for loop as follows:
//itemP is previous item in array and itemC is current item in array based on index i
BPItem*itemP;
BPItem*itemC; // moved out of the for loop
for (int i = 0; i < fieldArray.count; i++){ // <- changed the end condition to avoid crash; i is also initialized to 0
... rest of loop
Also, in the code that creates fieldArray, make certain you have logic to ensure that there are no duplicates, otherwise you will have the same issue (but for an altogether different reason).
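A duplicate check like the one suggested above is simple to write. The sketch below is plain C over an array of pointers, and has_duplicate_items is a hypothetical helper; in the Objective-C code you would apply the same idea to the objects in fieldArray before calling addChild:.

```c
#include <stdbool.h>
#include <stddef.h>

/* Returns true if any pointer appears more than once in items[0..count-1].
   Indices run from 0 to count - 1 (note the strict <), the same bound the
   corrected for loop uses. O(n^2) comparisons, fine for ~20 items. */
bool has_duplicate_items(const void *const items[], size_t count)
{
    for (size_t i = 0; i < count; i++) {
        for (size_t j = i + 1; j < count; j++) {
            if (items[i] == items[j])
                return true;  /* same object twice: addChild would fail */
        }
    }
    return false;
}
```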

What distinguishes two consecutive function calls?

Suppose you look at the stack and registers of a process which has the following code...
...
void Test()
{
for (int i = 0; i < 10; i++)
{
OneRunDontKnow();
}
}
...
You look at the stack twice, exactly while the process is executing the loop, and both times OneRunDontKnow is at the top of the stack.
Can you somehow tell whether OneRunDontKnow was popped off the stack and then pushed again, or whether it was never popped?
EDIT: OneRunDontKnow can have any signature (it can also take parameters or return a value).
Probably the best way is to look at the assembled code. OneRunDontKnow() takes no parameters, so the only things on the stack will be the return address (the saved instruction pointer) and other stack-frame data, but no parameters. So find the place in the disassembly where OneRunDontKnow() is called, and see which PUSH and JMP instructions appear in the code around the loop instruction (LOOP, LOOPE, etc.).

Two consecutive cudaMallocPitch calls make the code fail

I wrote a simple CUDA code as follows:
//Allocate the first 2d array "deviceArray2DInput"
if(cudaMallocPitch((Float32**) &deviceArray2DInput, &devicePitch, sizeof(Float32)*deviceColNumber,deviceRowNumber) == cudaErrorMemoryAllocation){
return -1;
}
//Allocate the second 2d array "deviceArray2DOutput". It is supposed to hold the output of some process.
if(cudaMallocPitch((Float32**) &deviceArray2DOutput, &devicePitch,sizeof(Float32)*deviceRowNumber,deviceColNumber) == cudaErrorMemoryAllocation){
return -1;
}
//Copy data from "hostArrayR" to "deviceArray2DInput" (#1)
cudaMemcpy2D(deviceArray2DInput,devicePitch,hostArrayR,sizeof(Float32)*colNumber,sizeof(Float32)*deviceColNumber,deviceRowNumber,cudaMemcpyHostToDevice);
//Clear the first 10000 elements in "hostArrayR" for verification.
for(int i = 0; i < 10000; ++i){
hostArrayR[i] = 0;
}
//Copy data back from "deviceArray2DInput" to "hostArrayR"(#2)
cudaMemcpy2D(hostArrayR,sizeof(Float32)*colNumber,deviceArray2DInput,devicePitch,sizeof(Float32)*deviceColNumber,deviceRowNumber,cudaMemcpyDeviceToHost);
When I commented out the second allocation block, the code worked well: it copied the data from the host array "hostArrayR" to the device array "deviceArray2DInput" and back. However, if both allocation blocks existed, the copied-back "hostArrayR" was empty (no data was copied back from the device).
I am sure that the data was in "hostArrayR" at line (#1), but there was no data at line (#2). I cleared the first 10000 elements (much less than the size of the array) to verify that the data did not come back.
I am using Nvidia Nsight 2.2 on Visual Studio 2010. The array size is 1024x768 and I am using 32-bit floating-point data. My GPU card is a GTX570. There seems to be no memory allocation error (otherwise the code would have returned before doing the copies).
I did not try "cudaMalloc()" because I prefer to use "cudaMallocPitch()" for memory alignment.
You should check the API calls against cudaSuccess, rather than against one specific error.
You should check the error value returned by the memcpys.
You're overwriting the devicePitch on the second cudaMallocPitch() call, the arrays have different shapes and hence could have different pitches.
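Why the overwritten pitch breaks the copies can be checked with a little arithmetic. cudaMemcpy2D requires each pitch to be at least the row width in bytes. The C sketch below models a pitch as the row width rounded up to an alignment; the 512-byte alignment, the helper names, and the assumption that deviceColNumber is 1024 and deviceRowNumber is 768 are all illustrative (the real pitch is whatever cudaMallocPitch reports).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a pitched allocation: the row width in bytes
   rounded up to a multiple of `alignment`. */
size_t model_pitch(size_t width_bytes, size_t alignment)
{
    return (width_bytes + alignment - 1) / alignment * alignment;
}

/* A 2D copy of rows `width_bytes` wide needs a pitch at least that large. */
bool pitch_is_valid(size_t pitch, size_t width_bytes)
{
    return pitch >= width_bytes;
}
```

Under those assumptions the input pitch is model_pitch(4 * 1024, 512) = 4096 bytes and the output pitch is model_pitch(4 * 768, 512) = 3072 bytes. After the second call, devicePitch holds 3072, which is too small for the 4096-byte rows of deviceArray2DInput, so both cudaMemcpy2D calls would likely fail with an error the original code never checks. Keeping a separate pitch variable per allocation avoids this.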

Lua - Cocoa - String Concat - Simple Table to NSArray

Mac OS X 10.5 compatibility, Lua 5.0 compatibility required (hence cannot use current batch of LuaObjc bridges.)
My lua script produces an indexed table containing tens of thousands of strings.
Basic problem: how to concat those strings with a newline separator, to one string, quickly?
Fly in the ointment: even using garbage-collection-friendly concat code (provided on Stack Overflow), the results take far too long for this purpose (10 seconds, versus 1 minute for a brute-force solution).
Proposed solution: offload the job to Cocoa, where it can be done in a fraction of a second, using NSArray's -componentsJoinedByString method.
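The idea behind a fast join is to measure all the pieces first, allocate once, and then copy each byte exactly once, instead of building ever-larger intermediate strings. A minimal C sketch of that idea; join_with_newlines is a hypothetical helper, not how NSArray implements -componentsJoinedByString:.

```c
#include <stdlib.h>
#include <string.h>

/* Join parts[0..n-1] with '\n' separators into one newly allocated string.
   Two passes: compute the total length, then copy. Returns NULL if the
   allocation fails; the caller must free() the result. */
char *join_with_newlines(const char *const parts[], size_t n)
{
    size_t total = 1;  /* terminating NUL */
    for (size_t i = 0; i < n; i++) {
        total += strlen(parts[i]);
        if (i + 1 < n)
            total += 1;  /* separator */
    }

    char *out = malloc(total);
    if (out == NULL)
        return NULL;

    char *p = out;
    for (size_t i = 0; i < n; i++) {
        size_t len = strlen(parts[i]);
        memcpy(p, parts[i], len);
        p += len;
        if (i + 1 < n)
            *p++ = '\n';
    }
    *p = '\0';
    return out;
}
```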
New fly in the ointment: how do I get the table data from Lua to Cocoa?
The script calls a registered C function, passing it the table. The C function tries to grab the table on the stack:
// Get an NSArray of strings from the first argument on the stack (a table).
NSArray *strings = nsArrayFromIndexedTable(luaState, index_1Based);
...
// Given a simple table consisting of numbers or strings, returns an NSArray.
// Nested subtables are not followed.
NSArray * nsArrayFromIndexedTable(lua_State *L, int argIdx)
{
// (Allegedly) stops GC.
lua_setgcthreshold(L, INT_MAX);
// Arg must be a table.
luaL_checktype(L, argIdx, LUA_TTABLE);
// Get num elements in table, build an array with that many.
int count = luaL_getn(L, 1);
NSMutableArray *array = [NSMutableArray arrayWithCapacity: count];
int i;
for (i = 1; i <= count; i++) {
lua_rawgeti(L, argIdx, i);
int valueType = lua_type(L, -1);
id value = nil;
if (valueType == LUA_TNUMBER) {
value = [NSNumber numberWithDouble:lua_tonumber(L, -1)];
} else if (valueType == LUA_TSTRING) {
value = [NSString stringWithUTF8String:lua_tostring(L, -1)];
}
if (value) {
[array addObject:value];
}
}
// Resume GC
lua_setgcthreshold(L, 0); // INTERMITTENT EXC_BAD_ACCESS CRASH HERE!!!!
return array;
}
Problem: calling this function with a (very large) Lua table of strings intermittently results in an EXC_BAD_ACCESS.
Debugger results are sporadic; sometimes not providing anything useful, but I've been able to glean that:
If those Lua GC lines are included, the crash happens at lua_setgcthreshold, near the end of the function.
But if those Lua GC lines are commented out, the crash happens at [array addObject:value].
(NSZombieEnabled is on, but is not providing useful info.)
Any help is appreciated.
This:
int count = luaL_getn(L, 1);
Should be:
int count = luaL_getn(L, argIdx);
So you may be getting an incorrect row count and scanning off the end of the table.
Maybe you are growing the Lua stack (the stack the C API works on) too much. I am not familiar with Cocoa, but I assume the Lua values do not need to stay on the stack, since each string is copied into an NSString. If so, try adding lua_pop(L, 1) at the end of the loop body to remove the value pushed by lua_rawgeti and keep the stack from growing.
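Why the missing pop matters can be illustrated with a bounded stack. In the Lua C API, a C function is only guaranteed LUA_MINSTACK (20) free stack slots unless it calls lua_checkstack, and pushes beyond that are not checked unless API checks are enabled, so tens of thousands of lua_rawgeti calls without a pop can write past the allocated stack. The model_stack type and its helpers below are hypothetical stand-ins, not Lua API calls.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bounded stack standing in for the Lua stack; 20 mirrors
   LUA_MINSTACK, the number of free slots a C function is guaranteed. */
#define MODEL_STACK_CAPACITY 20

typedef struct {
    int slots[MODEL_STACK_CAPACITY];
    size_t top;  /* number of occupied slots */
} model_stack;

/* Push one value; returns false instead of writing past the end. */
static bool model_push(model_stack *s, int value)
{
    if (s->top >= MODEL_STACK_CAPACITY)
        return false;
    s->slots[s->top++] = value;
    return true;
}

/* Pop up to n values. */
static void model_pop(model_stack *s, size_t n)
{
    s->top = (n > s->top) ? 0 : s->top - n;
}

/* Simulates the table-to-NSArray loop: each iteration pushes one element
   (like lua_rawgeti), reads the top (like lua_tostring), and optionally
   pops it again (like lua_pop(L, 1)). Returns how many elements were
   processed before the model stack would have overflowed. */
size_t simulate_copy_loop(model_stack *s, size_t count, bool pop_each)
{
    for (size_t i = 0; i < count; i++) {
        if (!model_push(s, (int)i))
            return i;
        /* The value at s->slots[s->top - 1] would be copied out here. */
        if (pop_each)
            model_pop(s, 1);
    }
    return count;
}
```

With a pop in every iteration the model processes all 10000 elements and ends with an empty stack; without it, it runs out of room after 20 pushes, which is where the real code would start overwriting memory.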
