How is a reference counter implemented at compile time? - memory-management

Here is a made-up set of function calls (I tried to make it complicated, but perhaps it is easy):
function main(arg1, arg2) {
do_foo(arg1, arg2)
}
function do_foo(a, b) {
let x = a + b
let y = x * a
let z = x * b
let p = y + z
let q = x + z
let r = do_bar(&p)
let s = do_bar(&q)
}
function do_bar(&p, &q) {
*p += 1
*q += 3
let r = &p * &q
let s = &p + &q
let v = do_baz(&r, &s)
return &v
}
function do_baz(&a, &b) {
return *a + *b
}
How do you generally go about figuring out the liveness of variables and where you can insert instructions for reference counting?
Here is my attempt...
Start at the top-level function main. It takes 2 arguments. Assume no copying occurs. It passes the actual mutable values to do_foo.
Then we have x. x owns a and b. Then we see y. y is computed from x, so link the previous x to this x. By r, we don't see x anymore, so perhaps it can be freed... Looking at do_bar by itself, we basically know that p and q can't be garbage collected within this scope.
Basically, I have no idea how to start implementing ARC (ideally compile-time reference counting, but runtime would be okay for now to get started).
function main(arg1, arg2) {
let x = do_foo(arg1, arg2)
free(arg1)
free(arg2)
free(x)
}
function do_foo(a, b) {
let x = a + b
let y = x * a
let z = x * b
let p = y + z
free(y)
let q = x + z
free(x)
free(z)
let r = do_bar(&p)
let s = do_bar(&q)
return r + s
}
function do_bar(&p, &q) {
*p += 1
*q += 3
let r = &p * &q
let s = &p + &q
let v = do_baz(&r, &s)
free(r)
free(s)
return &v
}
function do_baz(&a, &b) {
return *a + *b
}
How do I start implementing such an algorithm? I have searched every paper on the topic but found no algorithms.

The following rules should do the job for your language.
When a variable is declared, increment its refcount
When a variable goes out of scope, decrement its refcount
When a reference-to-variable is assigned to a variable, adjust the reference counts for the variable(s):
increment the refcount for the variable whose reference is being assigned
decrement the refcount for the variable whose reference was previously in the variable being assigned to (if it was not null)
When a variable containing a non-null reference-to-variable goes out of scope, decrement the refcount for the variable it referred to.
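The four rules can be sketched as a tiny runtime model. This is an illustration under assumptions, not an implementation for any real language: `Var`, `RefSlot`, `declare` and `leave_scope` are hypothetical names, and reaching a refcount of zero only sets a `freed` flag where a real runtime would release the storage.

```python
class Var:
    """A variable with a reference count (hypothetical runtime object)."""

    def __init__(self, name, value=None):
        self.name = name
        self.value = value
        self.refcount = 0
        self.freed = False

    def incref(self):
        self.refcount += 1

    def decref(self):
        self.refcount -= 1
        if self.refcount == 0:
            self.freed = True  # a real runtime would release the storage here


class RefSlot:
    """A variable whose type is reference-to-variable; holds a Var or None."""

    def __init__(self):
        self.target = None

    def assign(self, new_target):
        # Rule 3: increment the new target, then decrement the previous one.
        # Incrementing first keeps self-assignment from freeing the target.
        if new_target is not None:
            new_target.incref()
        old, self.target = self.target, new_target
        if old is not None:
            old.decref()

    def go_out_of_scope(self):
        # Rule 4: a dying slot that holds a non-null reference releases it.
        if self.target is not None:
            self.target.decref()
            self.target = None


def declare(name, value=None):
    """Rule 1: declaring a variable gives it one reference (its own scope)."""
    v = Var(name, value)
    v.incref()
    return v


def leave_scope(v):
    """Rule 2: the variable's own scope ends."""
    v.decref()
```

For example, declaring p, pointing a slot at it and then leaving p's scope leaves `p.refcount` at 1, so p is only marked freed once the slot itself goes out of scope. The increment-before-decrement order in `assign` is a design choice of this sketch.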
Note:
If your language allows reference-to-variable types to be used in data structures, "static" variables, etcetera, the rules above need to be extended ... in the obvious fashion.
An optimizing compiler may be able to eliminate some refcount increments and decrements.
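One of the simplest such eliminations is a peephole pass that deletes an increment immediately followed by a decrement of the same variable, since nothing in between can observe the temporary count. A minimal sketch over a hypothetical instruction list of tuples; the reverse order (decrement then increment) is deliberately left alone, because the decrement might free the variable.

```python
def cancel_inc_dec_pairs(ops):
    """Drop ("inc", v) immediately followed by ("dec", v).

    ops is a hypothetical instruction list such as
    [("inc", "p"), ("call", "do_bar"), ("dec", "p")].
    Using the output list as a stack lets nested pairs cancel too:
    inc p, inc q, dec q, dec p  ->  (nothing).
    """
    out = []
    for op in ops:
        if op[0] == "dec" and out and out[-1] == ("inc", op[1]):
            out.pop()  # the pair has no observable effect
        else:
            out.append(op)
    return out
```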
Compile time reference counting:
There isn't really any such thing. Reference counting is done at runtime. It doesn't make sense to do it at compile time.
You are probably talking about analyzing the code to determine if runtime reference counting can be optimized or entirely eliminated.
I alluded to the former above. It is really a kind of peephole optimization.
The latter entails checking whether a reference-to-variable can ever escape; i.e. whether it could be used after the variable goes out of scope. (Try Googling for "escape analysis". This is kind of analogous to the "escape analysis" that a compiler could do to decide if an object could be allocated on the stack rather than in the heap.)
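When such an analysis shows that nothing outlives the function, refcounting can be replaced by a plain free at each variable's last use, which is what the hand-annotated version in the question attempts. Below is a minimal sketch of that last-use (liveness) step for a single straight-line block. Assumptions: the body is already in SSA form (each variable assigned once) and given as hypothetical `(dest, uses)` pairs; a call such as `do_bar(&p)` is treated as an ordinary use, i.e. escape analysis is assumed to have confirmed the result keeps no reference to `p`; code with branches or loops would need an iterative dataflow liveness analysis instead.

```python
def insert_frees(block, keep=()):
    """
    block: straight-line function body in SSA form, as (dest, uses) pairs,
           e.g. ("p", ["y", "z"]) for `let p = y + z`.
    keep:  variables that must not be freed here (parameters owned by the
           caller, values that are returned or otherwise escape).
    Returns statements, written as a hypothetical `dest = f(uses)`, with
    "free(v)" placed right after v's last use.
    """
    # Backward scan: the first use met while walking backwards is the last use.
    last_use = {}
    for i in range(len(block) - 1, -1, -1):
        _, uses = block[i]
        for v in uses:
            last_use.setdefault(v, i)
    # A variable that is defined but never used dies right after its definition.
    for i, (dest, _) in enumerate(block):
        last_use.setdefault(dest, i)

    out = []
    for i, (dest, uses) in enumerate(block):
        out.append(f"{dest} = f({', '.join(uses)})")
        dying = sorted(v for v, j in last_use.items() if j == i and v not in keep)
        out.extend(f"free({v})" for v in dying)
    return out
```

Run on the second version of do_foo (with `keep={"a", "b", "ret"}`, since main owns a and b and the sum is returned), it frees y after p, and x and z after q, matching the question's hand placement.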


vim rand() is not deterministic. Is this expected?

According to :help rand(),
rand([{expr}])
Return a pseudo-random Number generated with an xoshiro128**
algorithm using seed {expr}. The returned number is 32 bits,
also on 64 bits systems, for consistency.
{expr} can be initialized by srand() and will be updated by
rand(). If {expr} is omitted, an internal seed value is used
and updated.
Examples:
:echo rand()
:let seed = srand()
:echo rand(seed)
:echo rand(seed) % 16 " random number 0 - 15
It doesn't explain how a seed is changed every time rand() is called, but I expected it to be deterministically altered because
C++'s std::rand() does so,
and Wikipedia says
A pseudorandom number generator (PRNG), also known as a deterministic random bit generator (DRBG), is an algorithm...
However, in the code below, the value of a is deterministic but the values of b are not deterministic; they take different values when you restart the script.
let seed = srand(0)
let a = rand(seed) "deterministic
let b = rand() "not deterministic (why?)
echo [a, b]
let seed = [0, 1, 2, 3]
let a = rand(seed) "deterministic
let b = rand() "not deterministic (why?)
echo [a, b]
Is this an expected behavior? I think the behavior contradicts the documentation.
Environments:
~ $ vi --version
VIM - Vi IMproved 8.2 (2019 Dec 12, compiled Apr 30 2020 13:32:36)
Included patches: 1-664
The algorithm used in Vim is fully deterministic. What causes the confusion is that calling rand(seed) updates the seed "in place" but does not update any internal value(s). Therefore any subsequent rand() uses a separate internal seed value (more or less random; its quality depends on the platform). So if you want to produce a fully deterministic sequence, you must consistently call rand(seed) with the same variable.
This behaviour is easy to deduce from Vim's source code. Also :h rand() says that:
Return a pseudo-random Number generated with an xoshiro128**
algorithm using seed {expr}. The returned number is 32 bits,
also on 64 bits systems, for consistency.
{expr} can be initialized by srand() and will be updated by
rand(). If {expr} is omitted, an internal seed value is used
and updated.
If you find the wording misleading, you can open an issue on GitHub.
The documentation is badly written but the behavior is actually the expected one from the source code's perspective.
Analysis
rand() is defined as f_rand() in src/evalfunc.c. From the snippet at the end of this answer, we know some things:
f_rand() has only two sets of static variables: gx, ..., gw and initialized.
gx, ..., gw are the internal seeds. Their values are touched and referenced only when f_rand() is called with no argument (i.e. when argvars[0].v_type == VAR_UNKNOWN).
initialized remembers if f_rand() has ever been called with no argument and it is also touched and referenced only when f_rand() is called with no argument.
When f_rand() is called with a seed,
The value of the seed is used once and that is not saved as a static variable. In other words, the sentence "{expr} can be initialized by srand() and will be updated by rand()" in the documentation is nothing but a "lie"; {expr} is not remembered and thus not updated by the subsequent f_rand().
The value of the seed is updated in place via the pointers lx, ..., lw.
Conclusion
The sentence
{expr} can be initialized by srand() and will be updated by rand()
shall be modified to
{expr} can be initialized by srand() and will be updated by rand({expr}). You may want to store a seed into a variable and pass it to rand() since {expr} is not remembered in the function.
If you need a deterministic sequence from rand(), do this:
let seed = srand(0)
let a = rand(seed) "The value of `seed` is changed in place.
let b = rand(seed) "ditto
echo [a, b]
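To see why `rand(seed)` and `rand()` are independent, here is a Python model of the two branches of the f_rand() source quoted in this answer. It is a sketch, not Vim code: the function names are hypothetical, and the hidden global seed is filled with Python's `random` module as a stand-in for Vim's platform-dependent `init_srand()`.

```python
import random

MASK32 = 0xFFFFFFFF


def rotl(x, k):
    """32-bit rotate left (Vim's ROTL macro)."""
    return ((x << k) | (x >> (32 - k))) & MASK32


def xoshiro128ss_step(s):
    """One SHUFFLE_XOSHIRO128STARSTAR step on the 4-element list s, updated in place."""
    x, y, z, w = s
    result = (rotl((y * 5) & MASK32, 7) * 9) & MASK32
    t = (y << 9) & MASK32
    z ^= x
    w ^= y
    y ^= z
    x ^= w
    z ^= t
    w = rotl(w, 11)
    s[:] = [x, y, z, w]  # like writing back through lx, ly, lz, lw
    return result


# Hidden global seed, like the static gx..gw; only rand() with no argument uses it.
_global_seed = [random.getrandbits(32) for _ in range(4)]


def vim_rand(seed=None):
    """Model of rand(): no argument uses the global seed; a list argument
    is advanced in place and the global seed is left untouched."""
    if seed is None:
        return xoshiro128ss_step(_global_seed)
    return xoshiro128ss_step(seed)
```

In this model, two fresh copies of `[0, 1, 2, 3]` produce identical sequences (the first call returns 5760 and leaves the list as `[2, 3, 514, 4096]`), while `vim_rand()` never reads or changes them, which is the behaviour the question observed.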
The Source Code of rand()
#define ROTL(x, k) ((x << k) | (x >> (32 - k)))
#define SPLITMIX32(x, z) ( \
z = (x += 0x9e3779b9), \
z = (z ^ (z >> 16)) * 0x85ebca6b, \
z = (z ^ (z >> 13)) * 0xc2b2ae35, \
z ^ (z >> 16) \
)
#define SHUFFLE_XOSHIRO128STARSTAR(x, y, z, w) \
result = ROTL(y * 5, 7) * 9; \
t = y << 9; \
z ^= x; \
w ^= y; \
y ^= z, x ^= w; \
z ^= t; \
w = ROTL(w, 11);
/*
* "rand()" function
*/
static void
f_rand(typval_T *argvars, typval_T *rettv)
{
list_T *l = NULL;
static UINT32_T gx, gy, gz, gw;
static int initialized = FALSE;
listitem_T *lx, *ly, *lz, *lw;
UINT32_T x, y, z, w, t, result;
if (argvars[0].v_type == VAR_UNKNOWN)
{
// When no argument is given use the global seed list.
if (initialized == FALSE)
{
// Initialize the global seed list.
init_srand(&x);
gx = SPLITMIX32(x, z);
gy = SPLITMIX32(x, z);
gz = SPLITMIX32(x, z);
gw = SPLITMIX32(x, z);
initialized = TRUE;
}
SHUFFLE_XOSHIRO128STARSTAR(gx, gy, gz, gw);
}
else if (argvars[0].v_type == VAR_LIST)
{
l = argvars[0].vval.v_list;
if (l == NULL || list_len(l) != 4)
goto theend;
lx = list_find(l, 0L);
ly = list_find(l, 1L);
lz = list_find(l, 2L);
lw = list_find(l, 3L);
if (lx->li_tv.v_type != VAR_NUMBER) goto theend;
if (ly->li_tv.v_type != VAR_NUMBER) goto theend;
if (lz->li_tv.v_type != VAR_NUMBER) goto theend;
if (lw->li_tv.v_type != VAR_NUMBER) goto theend;
x = (UINT32_T)lx->li_tv.vval.v_number;
y = (UINT32_T)ly->li_tv.vval.v_number;
z = (UINT32_T)lz->li_tv.vval.v_number;
w = (UINT32_T)lw->li_tv.vval.v_number;
SHUFFLE_XOSHIRO128STARSTAR(x, y, z, w);
lx->li_tv.vval.v_number = (varnumber_T)x;
ly->li_tv.vval.v_number = (varnumber_T)y;
lz->li_tv.vval.v_number = (varnumber_T)z;
lw->li_tv.vval.v_number = (varnumber_T)w;
}
else
goto theend;
rettv->v_type = VAR_NUMBER;
rettv->vval.v_number = (varnumber_T)result;
return;
theend:
semsg(_(e_invarg2), tv_get_string(&argvars[0]));
rettv->v_type = VAR_NUMBER;
rettv->vval.v_number = -1;
}
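The `initialized == FALSE` branch above derives the four global seeds from one 32-bit number with SPLITMIX32. A Python model of that step (hypothetical names, 32-bit arithmetic emulated with a mask) shows that the global path is itself deterministic for a given starting number; what differs between Vim sessions is only the number that `init_srand()` supplies.

```python
MASK32 = 0xFFFFFFFF


def splitmix32(x):
    """Model of SPLITMIX32(x, z): returns (updated x, output z)."""
    x = (x + 0x9E3779B9) & MASK32
    z = x
    z = ((z ^ (z >> 16)) * 0x85EBCA6B) & MASK32
    z = ((z ^ (z >> 13)) * 0xC2B2AE35) & MASK32
    return x, z ^ (z >> 16)


def init_global_seed(x):
    """Derive gx, gy, gz, gw from one 32-bit number, as in f_rand()."""
    seeds = []
    for _ in range(4):
        x, out = splitmix32(x)  # x is advanced each time, like the macro's x +=
        seeds.append(out)
    return seeds
```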

Getting a random number in a function in OCAML OR Telling compiler to evaluate function each time

I'm new to OCaml and was playing around with putting a marker on a random square of a 5x5 grid. I've written the example program below. "silly_method1" works, but notice that it takes an argument. I don't really have an argument to pass in for what I want. I'm just asking for a random number to create my robot on a particular square:
let create = {x = ( Random.int 4); y=3; face = North}
However, I get the same location each time. This makes sense to me... sort of. I'm assuming that the way I've set it up, "create" is basically a constant. It's evaluated once and that's it! I've fixed it below in silly_method2 but look how ugly it is!
let silly_method2 _ = (Random.int 10)
Every time I have to call it, I have to pass in an argument even though I'm not really using it.
What is the correct way to do this? There must be some way to have a function that takes no arguments and passes back a random number (or random tuple, etc.)
And possibly related... Is there a way to tell OCaml not to evaluate the function once and save the result but rather recalculate the answer each time?
Thank you for your patience with me!
Dave
let _ = Random.self_init()
let silly_method1 x = x + (Random.int 10)
let silly_method2 _ = (Random.int 10)
let report1 x = (print_newline(); print_string("report1 begin: "); print_int (silly_method1 x); print_string("report1 end"); print_newline(); )
let report2 y = (print_newline(); print_string("report2 begin: "); print_int(silly_method2 y ); print_string("report2 end"); print_newline(); )
let _ = report1 3
let _ = report1 3
let _ = report1 3
let _ = report2 3
let _ = report2 3
let _ = report2 3
The idiomatic way to define a function in OCaml that doesn't take an argument is to have the argument be (), which is a value (the only value) of type unit:
# let f () = Random.int 10;;
val f : unit -> int = <fun>
# f ();;
- : int = 5
# f ();;
- : int = 2
OCaml doesn't save function results for later re-use. If you want this behavior you have to ask for it explicitly using lazy.

How to pass a parameter that changes with time to SciLab ode?

I'm trying to solve a heat transfer problem using SciLab's ode function. The thing is: one of the parameters changes with time, h(t).
[Image: the ODE]
My question is: how can I pass an argument to ode function that is changing over time?
ode accepts extra parameters for the function as a list:
It may happen that the simulator f needs extra arguments. In this
case, we can use the following feature. The f argument can also be a
list lst=list(f,u1,u2,...un) where f is a Scilab function with
syntax: ydot = f(t,y,u1,u2,...,un) and u1, u2, ..., un are extra
arguments which are automatically passed to the simulator simuf.
Extra parameter is a function of t
function y = f(t,y,h)
// define y here depending on t and h(t),eg y = t + h(t)
endfunction
function y = h(t)
// define here h(t), eg y = t
endfunction
// define y0,t0 and t
y = ode(y0, t0, t, list(f,h)) // this will pass the h function as a parameter
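Extra parameter is a vector from which we want to extract the term corresponding to t
ode returns the solution y only at the times in t, but the solver calls f at intermediate times of its own choosing. One idea is therefore to find Ti < t < Tj each time f is called and interpolate to get h between Hi and Hj.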
This is rather ugly but totally works:
function y = h(t,T,H)
res = abs(t - T); // distance from t to every value in T
[minres, lower] = min(res); // nearest value of t in T: T(lower) (a single index, even with ties)
res(lower) = %inf; // exclude the nearest value
[minres, upper] = min(res); // 2nd nearest value of t in T: T(upper)
// t usually lies between T(lower) (nearest) and T(upper) (2nd nearest) (! T(lower) may be > T(upper))
y = ((T(upper)-t)*H(lower)+(t-T(lower))*H(upper))/(T(upper)-T(lower)); // linear interpolation between H(lower) and H(upper)
endfunction
function ydot=f(t, y,h,T,H)
hi = h(t,T,H) // if Ti< t < Tj; Hi<h(t,T,H)<Hj
disp([t,hi]) // with H = T, hi = t
ydot=y^2-y*sin(t)+cos(t) - hi // example of where to use hi
endfunction
// use base example of `ode`
y0=0;
t0=0;
t=0:0.1:%pi;
H = t // simple example
y = ode(y0,t0,t,list(f,h,t,H));
plot(t,y)
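The same pattern (sampled data passed as extra arguments and interpolated at whatever time the solver asks for) can be sketched in plain Python. Assumptions: a hand-written fixed-step RK4 stands in for Scilab's adaptive `ode`, `interp`/`rk4`/`ydot` are hypothetical names, and the example equation is dy/dt = -h(t) y rather than the heat transfer problem. Note that RK4 evaluates `ydot` at midpoints that are not in T, which is exactly why the lookup must interpolate.

```python
from bisect import bisect_right


def interp(t, T, H):
    """Piecewise-linear interpolation of samples (T[i], H[i]); T ascending.
    Outside [T[0], T[-1]] the end values are held constant."""
    if t <= T[0]:
        return H[0]
    if t >= T[-1]:
        return H[-1]
    j = bisect_right(T, t)  # T[j-1] <= t < T[j]
    t0, t1 = T[j - 1], T[j]
    h0, h1 = H[j - 1], H[j]
    return h0 + (h1 - h0) * (t - t0) / (t1 - t0)


def rk4(f, y0, times, *args):
    """Fixed-step RK4; extra args are forwarded to f, like list(f, u1, ...)."""
    ys = [y0]
    y = y0
    for ta, tb in zip(times, times[1:]):
        dt = tb - ta
        k1 = f(ta, y, *args)
        k2 = f(ta + dt / 2, y + dt * k1 / 2, *args)
        k3 = f(ta + dt / 2, y + dt * k2 / 2, *args)
        k4 = f(tb, y + dt * k3, *args)
        y = y + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        ys.append(y)
    return ys


def ydot(t, y, T, H):
    h = interp(t, T, H)  # h(t) looked up at the solver's current time
    return -h * y
```

With samples H = T = [0, 0.5, 1] (so h(t) = t), integrating from y(0) = 1 over 100 steps gives y(1) close to exp(-0.5).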

Best Practices with Initialization or Pre-allocation - MATLAB

My question doesn't depend expressly on one snippet of code, but is more conceptual.
Unlike some programming languages, MATLAB doesn't require variables to be explicitly initialized before they're used. For example, it is perfectly valid to define 'myVector' halfway through a script file:
myVector = vectorA .* vectorB
My question is: Is it faster to initialize variables (such as 'myVector' above) to zero and then assign values to them, or to keep initializing things throughout the program?
Here's a direct comparison of what I'm talking about:
Initializing throughout:
varA = 8;
varB = 2;
varC = varA - varB;
varD = varC * varB;
Initializing at start:
varA = 8;
varB = 2;
varC = 0;
varD = 0;
varC = varA - varB;
varD = varC * varB;
On one hand, it seems a bit of a waste to have these extra lines of code for no reason. On the other hand, though, it makes a little bit of sense that it would be faster to allocate all the memory for a program at once instead of spread out over the runtime.
Does anyone have a little insight?
Copy and paste your "Initializing at start" code into the MATLAB Editor window and you will get a warning like this:
[Screenshot of the Editor warning]
If you open its Details, you will read this:
Explanation
The code does not appear to use the assignment to the indicated variable. This situation occurs when any of the following are true:
Another assignment overwrites the value of the variable before an operation uses it.
The specified argument value contains a typographical error, causing it to appear unused.
The code does not use all values returned by a function call...
In our case, the reason for this warning is the first one: another assignment overwrites the value of the variable before an operation uses it. So initializing to zero does not help in this case.
When should we pre-allocate?
In my experience, pre-allocation helps when you later need to index into part of the variable.
Thus, if you need to index into a portion of varC to store the results, pre-allocation would help. Hence, this would make more sense -
varC = zeros(...);
varD = zeros(...);
varC(k,:) = varA - varB;
varD(k,:) = varC(k,:) * varB;
Again, while indexing, if you go beyond the current size of varC, MATLAB has to spend time allocating more memory for it, which slows things down a bit. So pre-allocate output variables to the maximum size you expect to need for storing results. If you don't know the size of the results, though, you are stuck: you have to append results to the output variable(s), and that will definitely slow things down.
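The cost of growing an array can be illustrated with a simplified model, sketched in Python rather than MATLAB. The assumption (stated loosely, not a description of MATLAB internals) is that every store past the end allocates a block one element larger and copies the existing elements into it; `fill` is a hypothetical helper that counts those copies.

```python
def fill(n, preallocate):
    """Fill an array with n results; return (array, number of element copies)."""
    copies = 0
    data = [0.0] * n if preallocate else []  # like zeros(1, n) versus starting empty
    for k in range(n):
        value = k * k  # stand-in for the real computation
        if preallocate:
            data[k] = value  # indexed store into existing space
        else:
            # simplified model: each out-of-range store reallocates and copies
            copies += len(data)
            data = data + [value]
    return data, copies
```

Under this model, filling n elements by growth costs n(n-1)/2 element copies (10 for n = 5), while the pre-allocated version costs none.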
Alright! I've done some tests, and here are the results.
This is the code I used for the "throughout" variable assignments:
tic;
a = 1;
b = 2;
c = 3;
d = 4;
e = a - b;
f = e + c;
g = f - a;
h = g * c;
i = h - g;
j = 9 * i;
k = [j i h];
l = any(k);
b2(numel(b2) + 1) = toc
Here's the code for the "At Start" variable assignments:
tic;
a = 1;
b = 2;
c = 3;
d = 4;
e = 0;
f = 0;
g = 0;
h = 0;
i = 0;
j = 0;
k = 0;
l = 0;
e = a - b;
f = e + c;
g = f - a;
h = g * c;
i = h - g;
j = 9 * i;
k = [j i h];
l = any(k);
b1(numel(b1) + 1) = toc
I saved the times in the vectors 'b1' and 'b2'. Each script was run with only MATLAB and Chrome open, and was the only script file open inside MATLAB. Each was run 201 times. Because a program is compiled the first time it runs, I discarded the first timing for both (I'm not interested in compile time).
To find the average, I used
mean(b1(2:201))
and
mean(b2(2:201))
The results:
"Throughout": 1.634311562062418e-05 seconds (0.000016343)
"At Start": 2.832598989758290e-05 seconds (0.000028326)
Interestingly (or perhaps not, who knows) defining variables only when needed, spread throughout the program was almost twice as fast.
I don't know whether this is because of the way MATLAB allocates memory (maybe it just grabs a huge chunk and doesn't need to keep allocating more every time you define a variable?) or if the allocation speed is just so fast that it's eclipsed by the extra lines of code.
NOTE: As Divakar points out, mileage may vary when using arrays. My testing should hold true for when the size of variables doesn't change, however.
tl;dr Setting variables to zero only to change them later is slower

Fibonacci recursion with a stack

I've already asked a question about this, yet I'm still confused. I want to convert a recursive function into a stack based function without recursion. Take, for example, the fibonacci function:
algorithm Fibonacci(x):
i = 0
i += Fibonacci(x-1)
i += Fibonacci(x-2)
return i
(Yes I know I didn't put a base case and that recursion for fibonacci is really inefficient)
How would this be implemented using an explicit stack? For example, if I have the stack as a while loop, I have to jump out of the loop in order to evaluate the first recursion, and I have no way of returning to the line after the first recursion and continue on with the second recursion.
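In Python:
def fib(x):
    # Counts the base cases reached; each contributes 1, so fib(0) = fib(1) = 1.
    tot = 0
    stack = [x]
    while stack:
        a = stack.pop()
        if a in [0, 1]:
            tot += 1
        else:
            stack.append(a - 1)  # Python lists use append() rather than push()
            stack.append(a - 2)
    return tot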
If you do not want the external counter then you will need to push tuples that keep track of the accumulated sum and whether this was a - 1 or a - 2.
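One way to drop the external counter, sketched here as a variant rather than the exact tuple scheme described above, is to push tagged work items and keep partial results on a second stack. The names are hypothetical, and the convention fib(0) = fib(1) = 1 matches the snippet above.

```python
def fib_no_counter(x):
    # Work items are tuples: ("call", n) means "evaluate fib(n)",
    # ("add",) means "add the two most recent results".
    work = [("call", x)]
    results = []
    while work:
        item = work.pop()
        if item[0] == "call":
            n = item[1]
            if n in (0, 1):
                results.append(1)
            else:
                # LIFO order: fib(n-1) runs first, then fib(n-2), then the add.
                work.append(("add",))
                work.append(("call", n - 2))
                work.append(("call", n - 1))
        else:
            b = results.pop()
            a = results.pop()
            results.append(a + b)
    return results[0]
```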
It is probably worth your time to explicitly write out the call stack (by hand, on paper) for a run of, say, fib(3) for your code (though fix your code first so it handles the boundary conditions). Once you do this, it should be clear how to do it without recursion.
Edit:
Let us analyze the running of the following Fibonacci algorithm
def fib(x):
    if (x == 0) or (x == 1):
        return 1
    else:
        temp1 = fib(x - 1)
        temp2 = fib(x - 2)
        return temp1 + temp2
(Yes, I know that this isn't even an efficient implementation of an inefficient algorithm, I have declared more temporaries than necessary.)
Now when we use a stack for function calling we need to store two kinds of things on the stack.
Where to return the result.
Space for local variables.
In our case we have three possible places to return to.
Some outside caller
Assign to temp1
Assign to temp2
We also need space for three local variables: x, temp1, and temp2. Let us examine fib(3).
When we initially call fib, we tell the stack that we want to return to wherever we came from, x = 3, and temp1 and temp2 are uninitialized.
Next we push onto the stack that we want to assign temp1, x = 2 and temp1 and temp2 are uninitialized. We call again and we have a stack of
(assign temp1, x = 1, -, -)
(assign temp1, x = 2, -, -)
(out , x = 3, -, -)
We now return 1, do the second call, and get
(assign temp2, x = 0, -, -)
(assign temp1, x = 2, temp1 = 1, -)
(out , x = 3, -, -)
This again returns 1:
(assign temp1, x = 2, temp1 = 1, temp2 = 1)
(out , x = 3, -, -)
So this returns 2, and we get
(out , x = 3, temp1 =2, -)
So we now recurse to
(assign temp2, x = 1, -, -)
(out , x = 3, temp1 =2, -)
from which we can see our way out.
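The trace above can be turned into a small loop. In this sketch (hypothetical names, Python dicts as frames), each frame holds exactly the four fields from the trace: where to return ("out", "temp1" or "temp2"), x, temp1 and temp2. "Calling" pushes a frame; "returning" pops it and stores the value in the named slot of the frame below.

```python
def fib_frames(n):
    # Convention matches the recursive fib above: fib(0) = fib(1) = 1.
    stack = [{"ret": "out", "x": n, "temp1": None, "temp2": None}]
    while True:
        top = stack[-1]
        if top["x"] in (0, 1):
            value = 1
        elif top["temp1"] is None:
            # first recursive call; its result goes to temp1
            stack.append({"ret": "temp1", "x": top["x"] - 1, "temp1": None, "temp2": None})
            continue
        elif top["temp2"] is None:
            # second recursive call; its result goes to temp2
            stack.append({"ret": "temp2", "x": top["x"] - 2, "temp1": None, "temp2": None})
            continue
        else:
            value = top["temp1"] + top["temp2"]
        # Return: pop the frame and store the value where the caller asked.
        finished = stack.pop()
        if finished["ret"] == "out":
            return value
        stack[-1][finished["ret"]] = value
```

Stepping through `fib_frames(3)` visits the same stack states as the hand trace and returns 3.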
algorithm Fibonacci(x):
stack = [1,1]
while stack.length < x
push to the stack the sum of two topmost stack elements
return stack.last
You can preserve the stack between calls as a kind of cache.
This stack is not a "true stack" since you can do more than only pushing, popping and checking its emptiness, but I believe this is what you are planning to do.
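A minimal Python sketch of that idea, assuming the same convention as the pseudocode (fib(1) = fib(2) = 1): the list survives between calls, and since it may already be longer than x from an earlier call, the result is read by index instead of from the top. `fib_cached` and `_cache` are hypothetical names.

```python
_cache = [1, 1]  # fib(1), fib(2); kept between calls


def fib_cached(x):
    """Return fib(x) for x >= 1, reusing values computed by earlier calls."""
    while len(_cache) < x:
        _cache.append(_cache[-1] + _cache[-2])  # push the sum of the two topmost elements
    return _cache[x - 1]
```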
Your question inspired me to write a piece of code that initially scared me, but I'm not really sure what to think about it now, so here it is for your amusement. Maybe it can help a bit with understanding things.
It's a blatant simulation of the execution of a recursive Fibonacci implementation. The language is C#. For an argument of 0 the function returns 0, following the definition of the Fibonacci sequence given by Ronald Graham, Donald Knuth, and Oren Patashnik in "Concrete Mathematics". Wikipedia defines it the same way. Checks for negative arguments are omitted.
Normally a return address is stored on the stack and execution just jumps to the right address. To simulate this I used an enum
enum JumpAddress
{
beforeTheFirstRecursiveInvocation,
betweenRecursiveInvocations,
afterTheSecondRecursiveInvocation,
outsideFibFunction
}
and a little state machine.
The Frame stored on the stack is defined like this:
class Frame
{
public int argument;
public int localVariable;
public JumpAddress returnAddress;
public Frame(int argument, JumpAddress returnAddress)
{
this.argument = argument;
this.localVariable = 0;
this.returnAddress = returnAddress;
}
}
It's a C# class - a reference type. The stack holds references to the objects placed on the heap, so when I'm doing this:
Frame top = stack.Peek();
top.localVariable = lastresult;
I'm modifying the object still referenced by the reference at the top of a stack, not a copy.
I model the invocation of a function by pushing a frame on the stack and setting the execution address in my state machine to the beginning, beforeTheFirstRecursiveInvocation.
To return from the function, I set the lastresult variable, set the pointOfExecution variable to the return address stored in the top frame, and pop the frame from the stack.
Here is the code.
public static int fib(int n)
{
Stack<Frame> stack = new Stack<Frame>(n);
//Constructor uses the parameter to reserve space.
int lastresult = 0;
//variable holding the result of the last "recursive" invocation
stack.Push(new Frame(n, JumpAddress.outsideFibFunction));
JumpAddress pointOfExecution = JumpAddress.beforeTheFirstRecursiveInvocation;
// that's how I model function invocation. I push a frame on the stack and set
// pointOfExecution. Above the frame stores the argument n and a return address
// - outsideFibFunction
while(pointOfExecution != JumpAddress.outsideFibFunction)
{
Frame top = stack.Peek();
switch(pointOfExecution)
{
case JumpAddress.beforeTheFirstRecursiveInvocation:
if(top.argument <= 1)
{
lastresult = top.argument;
pointOfExecution = top.returnAddress;
stack.Pop();
}
else
{
stack.Push(new Frame(top.argument - 1, JumpAddress.betweenRecursiveInvocations));
pointOfExecution = JumpAddress.beforeTheFirstRecursiveInvocation;
}
break;
case JumpAddress.betweenRecursiveInvocations:
top.localVariable = lastresult;
stack.Push(new Frame(top.argument - 2, JumpAddress.afterTheSecondRecursiveInvocation));
pointOfExecution = JumpAddress.beforeTheFirstRecursiveInvocation;
break;
case JumpAddress.afterTheSecondRecursiveInvocation:
lastresult += top.localVariable;
pointOfExecution = top.returnAddress;
stack.Pop();
break;
default:
System.Diagnostics.Debug.Assert(false,"This point should never be reached");
break;
}
}
return lastresult;
}
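#include <iostream>

int main() {
    int x;
    std::cin >> x;  // assumes 0 < x < 94 so that fib[x] fits in unsigned long long
    unsigned long long fib[100];
    fib[1] = 1;
    fib[2] = 1;
    if (x <= 2)
        std::cout << 1;
    else {
        for (int i = 3; i <= x; i++)
            fib[i] = fib[i - 1] + fib[i - 2];
        std::cout << fib[x];
    }
}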
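Or, without using an array (note the input needs its own variable, n, since x is reused for the previous Fibonacci number):
#include <iostream>

int main() {
    int n;
    std::cin >> n;  // assumes n > 0
    unsigned long long x = 1, y = 1, z = 1;  // x, y: the two previous Fibonacci numbers
    if (n <= 2)
        std::cout << 1;
    else {
        for (int i = 3; i <= n; i++) {
            z = x + y;
            x = y;
            y = z;
        }
        std::cout << z;
    }
}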
The last method works because you only need the previous two Fibonacci numbers to compute the current one.
