Data structure for storing variables in an interpreted language - memory-management

I am designing my own experimental scripting language for the purpose of embedding it in my bigger application.
Almost everything I wanted to do went smoothly, but the "simple" act of storing variables in memory turned out to be the hardest part. I don't know how to store them in a way that supports type checking, global variables and special flags on them. First, look at some sample code:
a = 1
b = 2
someFunction()
print(a) --> This should read the global variable and print `1`
a = 3 --> Now `a` should become a local variable of this function
      --> and the global `a` should remain unchanged
x = 4 --> `x` should always be local of this function
end
I call the "locality" of a variable its level, so variables in nested blocks have a higher level. In the above code, a and b are level 1 variables. Local variables of someFunction will have level 2. The first line of the function should read the global variable a (level 1), but the second line should create a new variable, also called a, with level 2, which shadows the global a from that point onwards. The third line should create the variable x with level 2. How should I store and keep track of all these in memory?
What I tried so far:
Method 1: Storing maps of variable=>value in array of levels:
variables
{
level=1 //global variables
{
a => 1,
b => 2
},
level=2 //function variables
{
a => 3,
x => 4
}
}
But that will make variable look-up really slow since one has to search all the levels for a given variable.
Method 2: Storing the (variable, level) pairs as keys of a map:
variables
{
(a, 1) => 1, //global
(b, 1) => 2, //global
(a, 2) => 3, //function
(x, 2) => 4 //function
}
This has the same problem as before since we have to try the pair (variable, level) with all possible levels for a given variable.
What method should I use for optimal memory usage and fastest access time?
Additional notes:
I know how variables are managed on the stack and heap in other "real" languages, but I find it tricky to do this in an interpreted language. "This mustn't be how Lua and Python do that," I always think. Correct me if I'm wrong. I'm trying to store the variables in maps and internal C++ structures.
And finally, this is how I represent a variable. Do you think it's too big, and are there more memory-efficient representations? (I've also tried to put the "level" as a member here, but it had the same problem as the other methods.)
struct Member
{
uchar type; //0=num, 1=str, 2=function, 3=array, etc
uchar flags; //0x80 = read-only, 0x40 = write-only, etc
union {
long double value_num;
char* value_str;
int value_func;
//etc
};
};

An easy thing to do, similar to your array, is to maintain a stack of maps. Each map contains the bindings for one scope. To bind a variable, add it to the top map; to look up a variable, start at the top of the stack and stop when you reach a map that contains a binding for that variable. The search takes a little time, but because you start from the top you only have to search until you find the first match; in most cases, this search will not be very long.
You can also make the stack implicit by encapsulating this logic in an Environment class that has local bindings and an inherited environment used for resolving unknown variables. Need to go into a new scope? Create a new environment with the current environment as its base, use it, then discard it when the scope is finished. The root/global environment can just have a null inherited environment. This is what I would probably do.
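A minimal C++ sketch of the Environment idea described above. The class layout, the member names define and lookup, and the use of a plain double as the Value type are all assumptions made for illustration; a real interpreter would store its own tagged value type.

```cpp
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical value type; stands in for the interpreter's tagged value.
using Value = double;

class Environment {
public:
    // The root/global environment is created with a null parent.
    explicit Environment(std::shared_ptr<Environment> parent = nullptr)
        : parent_(std::move(parent)) {}

    // Bind (or rebind) a name in this scope only; outer scopes are untouched.
    void define(const std::string& name, Value value) {
        bindings_[name] = value;
    }

    // Walk outward from this scope and stop at the first binding found.
    // Returns nullptr when the name is unbound in every enclosing scope.
    const Value* lookup(const std::string& name) const {
        for (const Environment* env = this; env != nullptr;
             env = env->parent_.get()) {
            auto it = env->bindings_.find(name);
            if (it != env->bindings_.end()) return &it->second;
        }
        return nullptr;
    }

private:
    std::unordered_map<std::string, Value> bindings_;
    std::shared_ptr<Environment> parent_;
};
```

With the question's example, the globals a and b would live in a root Environment, and calling someFunction would create a child Environment whose parent is the root. Looking up a in the child finds the global until the function assigns a = 3, after which the child's own binding shadows it and the root's a stays at 1.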

It's worth noting that if, inside a function, you don't have access to any variables from the calling function, the number of levels you need to look at goes down. For example:
variable a;
function one() {
variable b;
// in this function, we can see the global a, local b
two();
}
function two() {
// in this function, we can see the global a, local c
// we cannot see the local b of our caller
variable c;
while (true) {
variable d;
// here we can see local d, local c, global a
}
}
The idea being that function boundaries limit the visibility of variables, with the global scope being "special".
That being said, you can consider removing the specialness of global variables, and instead let the code state explicitly that it wants access to non-local variables:
variable a;
function one() {
global a; // or upvar #0 a;
variable b;
// in this function, we can see the global a, local b
two();
}
function two() {
// in this function, we can see the local c
// and the local b of our caller
// (since we specifically say we want access to "b" one level up)
upvar 1 b;
variable c;
}
It looks complicated at first, but it's really easy to understand once you get used to it (upvar is a construct from the Tcl programming language). What it gives you is access to variables in your caller's scope, while avoiding some of the costly lookup by requiring that you specify exactly where each variable comes from (with 1 being one level up the call stack, 2 being two levels up, and #0 being "special" in saying "the uppermost call frame, the global scope").
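One way the explicit-level scheme might be stored is sketched below in C++. This is an illustration under assumptions, not how Tcl itself is implemented: the CallStack class, its method names, and the double Value type are hypothetical. Each call frame is a single map, and upvar or global simply makes a local name share a cell with a variable in another frame.

```cpp
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <vector>

using Value = double;                 // hypothetical value type
using Cell = std::shared_ptr<Value>;  // shared so two frames can alias one variable
using Frame = std::unordered_map<std::string, Cell>;

class CallStack {
public:
    CallStack() { frames_.emplace_back(); }  // frame 0 is the global frame

    void push_frame() { frames_.emplace_back(); }  // entering a function
    void pop_frame() { if (frames_.size() > 1) frames_.pop_back(); }

    // "variable name;" creates a fresh cell in the current frame.
    void declare(const std::string& name, Value value) {
        frames_.back()[name] = std::make_shared<Value>(value);
    }

    // "upvar N name;" aliases a variable that lives N frames up the call stack.
    void upvar(std::size_t levels_up, const std::string& name) {
        if (levels_up >= frames_.size()) throw std::out_of_range("no such frame");
        link(frames_.size() - 1 - levels_up, name);
    }

    // "global name;" (written "upvar #0 name" above) aliases a variable in frame 0.
    void global(const std::string& name) { link(0, name); }

    // Lookup touches only the current frame: one hash lookup, no scope walk.
    Value* find(const std::string& name) {
        auto it = frames_.back().find(name);
        return it == frames_.back().end() ? nullptr : it->second.get();
    }

private:
    void link(std::size_t frame_index, const std::string& name) {
        auto it = frames_[frame_index].find(name);
        if (it == frames_[frame_index].end())
            throw std::out_of_range("unbound variable: " + name);
        Cell cell = it->second;       // copy first; the insert below may rehash
        frames_.back()[name] = cell;
    }

    std::vector<Frame> frames_;
};
```

The cost of finding the right frame is paid once, when upvar or global runs; every later read or write of the name is a single lookup in the current frame.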

Related

Why make a variable immutable and create a new entry when redefining a variable?

In SML, if I am correct, variables are immutable by default. So when we try to redefine a variable
val y = 100;
val y = 0.6;
y
the environment will have two entries for y. The new entry hides the original entry.
Isn't it the same effect as if we modified the value in the original entry from 100 to 0.6?
If the original entry was created outside a function call, and the new entry was created in a function call, then when the function call returns, we can access the original entry.
If both entries were created in the same "scope", like the example above, is the original entry not accessible?
Effectively, isn't it the same in SML as in an imperative language such as C? What is the point of making a variable immutable in SML and creating a new entry when redefining a variable?
Thanks.
Isn't it the same effect as if we modified the value in the original entry from 100 to 0.6
No. Consider code that references the previous environment through use of a closure like
val x = 7
val f = fn () => x
val x = 8
val _ = print (Int.toString (f ())) (* prints 7 *)
If the original entry was created outside a function call, and the new entry was created in a function call, then when the function call returns, we can access the original entry.
Sure, it's statically scoped.
If both entries were created in the same "scope", like the example above, is the original entry not accessible?
It is still accessible, just not by that identifier. Consider the example above.
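To make the difference between shadowing and mutation concrete, here is a small C++ model of an environment in which a rebinding adds a new entry instead of overwriting the old one. It is only a sketch of the idea, not how any SML implementation works; the names Binding, Env, extend, lookup and make_reader are hypothetical.

```cpp
#include <functional>
#include <memory>
#include <string>

// One entry in the environment; entries are never modified after creation.
struct Binding {
    std::string name;
    int value;
    std::shared_ptr<const Binding> next;  // older bindings
};
using Env = std::shared_ptr<const Binding>;

// Models "val name = value": returns a new environment with one more entry
// in front. The environment passed in is left unchanged.
Env extend(const Env& env, const std::string& name, int value) {
    return std::make_shared<Binding>(Binding{name, value, env});
}

// Finds the newest binding for name; returns nullptr if there is none.
const int* lookup(const Env& env, const std::string& name) {
    for (const Binding* b = env.get(); b != nullptr; b = b->next.get()) {
        if (b->name == name) return &b->value;
    }
    return nullptr;
}

// Models "fn () => x": the closure keeps the environment it was created in.
// Assumes name is bound in that environment.
std::function<int()> make_reader(Env captured, std::string name) {
    return [captured, name] { return *lookup(captured, name); };
}
```

If x is bound to 7, a reader for x is created, and x is then rebound to 8, the reader still returns 7 because it holds the older environment, while a fresh lookup in the newest environment returns 8. Overwriting the original entry in place would change what the closure sees.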
What is the point of making a variable immutable in SML and creating a new entry when redefining a variable?
One use of this is changing the type of a variable while still using the same identifier (which you do in the example you post!). Take for instance (in C):
int i = 7;
i = 7.0;
Here, i will still be of type int. Cf., in SML:
val i : int = 7
val i : real = 7.0
I've added type annotations for illustration, but the behaviour is the same without them. After the second binding, i has type real.
To amend kopecs' answer:
What is the point of making a variable immutable in SML and creating a new entry when redefining a variable?
That is the wrong way to phrase the question. :) Immutability by default has many merits, but that would be off-topic.
The actual question is: Why does SML allow multiple bindings for the same variable in a single scope? Instead of making it an error?
And that's a fair question. There is no deep reason. It merely turns out to be convenient sometimes. For example, to avoid spurious identifier clashes when using open. Or to allow redefining previous definitions in an interactive session.

Introspecting _ENV from coroutines

NB: I am using Lua version 5.3.
This question is motivated by Exercise 25.1 (p. 264) of Programming in Lua (4th ed.). That exercise reads as follows:
Exercise 25.1: Adapt getvarvalue (Listing 25.1) to work with different coroutines (like the functions from the debug library).
The function getvarvalue that the exercise refers to is copied verbatim below.
-- Listing 25.1 (p. 256) of *Programming in Lua* (4th ed.)
function getvarvalue (name, level, isenv)
local value
local found = false
level = (level or 1) + 1
-- try local variables
for i = 1, math.huge do
local n, v = debug.getlocal(level, i)
if not n then break end
if n == name then
value = v
found = true
end
end
if found then return "local", value end
-- try non-local variables
local func = debug.getinfo(level, "f").func
for i = 1, math.huge do
local n, v = debug.getupvalue(func, i)
if not n then break end
if n == name then return "upvalue", v end
end
if isenv then return "noenv" end -- avoid loop
-- not found; get value from the environment
local _, env = getvarvalue("_ENV", level, true)
if env then
return "global", env[name]
else -- no _ENV available
return "noenv"
end
end
Below is my enhanced version of this function, which implements the additional functionality specified in the exercise. This version accepts an optional thread parameter, expected to be a coroutine. The only differences between this enhanced version and the original getvarvalue are:
the handling of the additional optional thread parameter;
the special setting of the level parameter depending on whether the thread parameter is the same as the running coroutine or not; and
the passing of the thread parameter in the calls to debug.getlocal and debug.getinfo, and in the recursive call.
(I have marked these differences in the source code through numbered comments.)
function getvarvalue_enhanced (thread, name, level, isenv)
-- 1
if type(thread) ~= "thread" then
-- (thread, name, level, isenv)
-- (name, level, isenv)
isenv = level
level = name
name = thread
thread = coroutine.running()
end
local value
local found = false
-- 2
level = level or 1
if thread == coroutine.running() then
level = level + 1
end
-- try local variables
for i = 1, math.huge do
local n, v = debug.getlocal(thread, level, i) -- 3
if not n then break end
if n == name then
value = v
found = true
end
end
if found then return "local", value end
-- try non-local variables
local func = debug.getinfo(thread, level, "f").func -- 3
for i = 1, math.huge do
local n, v = debug.getupvalue(func, i)
if not n then break end
if n == name then return "upvalue", v end
end
if isenv then return "noenv" end -- avoid loop
-- not found; get value from the environment
local _, env = getvarvalue_enhanced(thread, "_ENV", level, true) -- 3
if env then
return "global", env[name]
else
return "noenv"
end
end
This function works reasonably well, but I have found one strange situation [1] where it fails. The function make_nasty below generates a coroutine for which getvarvalue_enhanced fails to find an _ENV variable; i.e. it returns "noenv". (The function that serves as the basis for nasty is the closure closure_B, which in turn invokes the closure closure_A. It is closure_A that then yields.)
function make_nasty ()
local function closure_A () coroutine.yield() end
local function closure_B ()
closure_A()
end
local thread = coroutine.create(closure_B)
coroutine.resume(thread)
return thread
end
nasty = make_nasty()
print(getvarvalue_enhanced(nasty, "_ENV", 2))
-- noenv
In contrast, the almost identical function make_nice produces a coroutine for which getvarvalue_enhanced succeeds in finding an _ENV variable.
function make_nice ()
local function closure_A () coroutine.yield() end
local function closure_B ()
local _ = one_very_much_non_existent_global_variable -- only difference!
closure_A()
end
local thread = coroutine.create(closure_B)
coroutine.resume(thread)
return thread
end
nice = make_nice()
print(getvarvalue_enhanced(nice, "_ENV", 2))
-- upvalue table: 0x558a2633c930
The only difference between make_nasty and make_nice is that, in the latter, the closure closure_B references a non-existent global variable (and does nothing with it).
Q: How can I modify getvarvalue_enhanced so that it is able to locate _ENV for nasty, the way it does for nice?
EDIT: changed the names of the closures within make_nasty and make_nice.
EDIT2: the wording of Exercise 25.3 (same page) may be relevant here (my emphasis):
Exercise 25.3: Write a version of getvarvalue (Listing 25.1) that returns a table with all variables that are visible at the calling function. (The returned table should not include environmental variables; instead it should inherit them from the original environment.)
This question implies that there should be a way to get at the variables that are merely visible from a function, whether the function uses them or not. Such variables would certainly include _ENV. (The author is one of Lua's creators, so he knows what he's talking about.)
[1] I am sure that someone with a better understanding of what is going on in this example will be able to come up with a less convoluted way to elicit the same behavior. The example I present here is the most minimal form I can come up with of the situation I found by accident.
local function inner_closure () coroutine.yield() end
local function outer_closure ()
inner_closure()
end
The function make_nasty below generates a coroutine for which getvarvalue_enhanced fails to find an _ENV variable; i.e. it returns "noenv"
That's correct behavior.
The closure outer_closure has the upvalue inner_closure but doesn't have an _ENV upvalue.
This is how lexical scope works.
It's OK that some closures don't have an _ENV upvalue.
In your example the closure inner_closure isn't defined inside the body of outer_closure.
inner_closure is not nested in outer_closure.
It's impossible.
If a closure doesn't make any use of the global environment _ENV, then it doesn't have that upvalue whatsoever.
A function like
local something = 20
local function noupval(x, y)
return x * y
end
doesn't need or have any upvalues, not even one for the global environment.
This question implies that there should be a way to get at the variables that are merely visible from a function, whether the function uses them or not.
There really isn't though. You can easily confirm this by looking at the output of luac -p -l <your_code.lua>, more precisely at the upvalues of each function.
If anything, I think using the word visible is somewhat misleading there. Visibility really only matters when creating a closure, but once it has been closed, that closure only has a set of upvalues which it can access.
Exercise 25.3: Write a version of getvarvalue (Listing 25.1) that returns a table with all variables that are visible at the calling function. (The returned table should not include environmental variables; instead it should inherit them from the original environment.)
You may have misunderstood that exercise; the way I understand it is something like this:
local upvalue = 20
local function foo()
local var = upvalue -- Create 1 local and access 1 upvalue
local _ = type(print) == "function" -- Access _ENV so it becomes an upvalue
return getvarvalue_enhanced()
end
and the above would return {var = 20, upvalue = 20, _ENV = <Proxy table to _ENV>}
After all, it asks specifically about the calling function, not one you pass as a parameter.
This doesn't change the fact that you still only get _ENV if you access it though. If you don't use any globals, the function won't have any reference to _ENV whatsoever.

Create two enum with identical 0 values

I want to create two enums with identical 0 (default values), which looks like:
enum testone_e {
NOCHANGE = 0,
DOONETHING,
BLABLA
};
enum testtwo_e {
NOCHANGE = 0,
DOANOTHERTJHING,
} ;
but the compiler complains about:
"NOCHANGE" has already been declared in the current scope
Why is that? Aren't those two different scopes (as the values are in different enums)? What is the best way to solve this?
This is with WindRiver's diab compiler
In C, enumeration constants have type int and are not scoped to their enum. Each one is declared in the scope that contains the enum definition, which is usually file scope.
So you can only define each name once.
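Two common workarounds are sketched below. The prefixes and the scoped-enum names are made up for illustration. The first option is plain C and should work with any C compiler; the second needs a C++11 compiler, which may not be available in your toolchain.

```cpp
// Option 1 (valid in both C and C++): give every enumerator a per-enum
// prefix so the names no longer collide in the enclosing scope.
enum testone_e {
    TESTONE_NOCHANGE = 0,
    TESTONE_DOONETHING,
    TESTONE_BLABLA
};

enum testtwo_e {
    TESTTWO_NOCHANGE = 0,
    TESTTWO_DOANOTHERTHING
};

// Option 2 (C++11 and later only): scoped enumerations keep their
// enumerators inside the enum's own scope, so both can use NoChange.
enum class TestOne { NoChange = 0, DoOneThing, BlaBla };
enum class TestTwo { NoChange = 0, DoAnotherThing };
```

With option 2 the values are written TestOne::NoChange and TestTwo::NoChange, and they no longer convert implicitly to int.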

Binding using std::bind vs lambdas. How expensive are they?

I was playing with bind and I was thinking, are lambdas as expensive as function pointers?
What I mean is, as I understand lambdas, they are syntactic sugar for functors and bind is similar. However, if you do this:
#include<functional>
#include<iostream>
void fn2(int a, int b)
{
std::cout << a << ", " << b << std::endl;
}
void fn1(int a, int b)
{
//auto bound = std::bind(fn2, a, b);
//static auto bound = std::bind(fn2, a, b);
//auto bound = [&]{ fn2(a, b); };
static auto bound = [&]{ fn2(a, b); };
bound();
}
int main()
{
fn1(3, 4);
fn1(1, 2);
return 0;
}
Now, if I use the 1st, auto bound = std::bind(fn2, a, b);, the output is "3, 4" followed by "1, 2". With the 2nd, I get "3, 4" followed by "3, 4". With the 3rd and 4th, I get the same output as with the 1st.
Now I get why the 1st and 2nd work that way, they are getting initialised at the beginning of the function call (the static one, only the 1st time it is called). However, 3 and 4 seem to have compiler magic going on where the generated functors are not really creating references to the enclosing scope's variables, but are actually latching on to the symbols whether or not it is initialised only the first time or every time.
Can someone clarify what is actually happening here?
Edit: What I was missing is using static auto bound = std::bind(fn2, std::ref(a), std::ref(b)); to have it work as the 4th option.
You have this code:
static auto bound = [&]{ fn2(a, b); };
The initialization runs only the first time you call this function, because bound is static. The compiler creates a closure object when it evaluates a lambda expression, so bound captures references to the a and b of the first call to fn1. That is very risky: it leads to dangling references. I'm surprised it didn't crash, since the closure holds references to function parameters passed by value, which are local variables that no longer exist after the first call returns.
I recommend this excellent article about lambdas: http://www.cprogramming.com/c++11/c++11-lambda-closures.html .
As a general rule, only use [&] lambdas when your closure is going to go away by the end of the current scope.
If it is going to outlast the current scope, and you need by-reference, explicitly capture the things you are going to capture, or create local pointers to the things you are going to capture and capture them by-value.
In your case, your static lambda code is full of undefined behavior, as you [&] capture a and b in the first call, then use it in the second call.
In theory, the compiler could rewrite your code to capture a and b by value instead of by reference, then call that every time, because the only difference between that implementation and the one you wrote occurs when the behavior is undefined, and the result will be much faster.
It could do a more efficient job by ignoring your static completely, as the entire state of your static object is undefined after you leave scope the first time you call, and the construction has no visible side effects.
To fix your problem with the lambdas, use [=] or [a,b] to introduce the lambda, and it will capture the a and b by value. I prefer to capture state explicitly on lambdas when I expect the lambda to persist longer than the current block.
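A short sketch of the safe variants, adapted from the code in the question. The helper format_pair and the three wrapper function names are hypothetical, and the functions return a string instead of printing so the behavior is easy to check.

```cpp
#include <functional>
#include <string>

// Stand-in for fn2 from the question; returns the text instead of printing.
std::string format_pair(int a, int b) {
    return std::to_string(a) + ", " + std::to_string(b);
}

// By-reference capture is fine here: the closure is destroyed at the end of
// the call, before a and b go away.
std::string call_local_by_reference(int a, int b) {
    auto bound = [&] { return format_pair(a, b); };
    return bound();
}

// Static closure with by-value capture: a and b are copied into the closure
// on the first call only, so later calls reuse the first call's values.
// Well defined, because the closure owns its copies.
std::string call_static_by_value(int a, int b) {
    static auto bound = [a, b] { return format_pair(a, b); };
    return bound();
}

// std::bind also copies its arguments by default, so a static bind object
// behaves like the by-value lambda above.
std::string call_static_bind(int a, int b) {
    static auto bound = std::bind(format_pair, a, b);
    return bound();
}
```

Calling each function with (3, 4) and then (1, 2), the by-reference version produces "3, 4" then "1, 2", while both static versions produce "3, 4" twice. A static closure that should see the current arguments on every call cannot be written safely with captures of the parameters; pass the values as arguments to the lambda instead.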

Is it possible to inject values in the frama-c value analyzer?

I'm experimenting with the frama-c value analyzer to evaluate C-Code, which is actually threaded.
I want to ignore any threading problems that might occur and just inspect the possible values for a single thread. So far this works by setting the entry point to where the thread starts.
Now to my problem: inside one thread I read values that are written by another thread. Because frama-c does not (and should not?) currently consider threading, it assumes my variable is in some broad range, but I know that the range is in fact much smaller.
Is it possible to tell the value analyzer the value range of this variable?
Example:
volatile int x = 0;
void f() {
while(x==0)
sleep(100);
...
}
Here frama-c detects that x is volatile and thus has range [--..--], but I know what the other thread will write into x, and I want to tell the analyzer that x can only be 0 or 1.
Is this possible with frama-c, especially in the gui?
Thanks in advance
Christian
This is currently not possible automatically. The value analysis considers that volatile variables always contain the full range of values included in their underlying type. There is, however, a proprietary plug-in that transforms accesses to volatile variables into calls to a user-supplied function. In your case, your code would be transformed into essentially this:
int x = 0;
void f() {
while(1) {
x = f_volatile_x();
if (x == 0)
sleep(100);
...
}
}
By specifying f_volatile_x correctly, you can ensure it returns values between 0 and 1 only.
If the variable 'x' is not modified in the thread you are studying, you could also initialize it at the beginning of the 'main' function with:
x = Frama_C_interval (0, 1);
This is a function defined by Frama-C in ...../share/frama-c/builtin.c so you have to add this file to your inputs when you use it.
