Determining the maximum stack depth - algorithm

Imagine I have a stack-based toy language that comes with the operations Push, Pop, Jump and If.
I have a program whose input is a program in this toy language. For instance, I get the sequence
Push 1
Push 1
Pop
Pop
In that case the maximum stack depth would be 2. A more complicated example would use branches.
Push 1
Push true
If .success
Pop
Jump .continue
.success:
Push 1
Push 1
Pop
Pop
Pop
.continue:
In this case the maximum stack depth would be 3. However, it is not possible to determine the maximum by walking the instructions top to bottom, since in this case that would actually result in a stack-underflow error.
CFGs to the rescue: you can build a control-flow graph and walk every possible path through the basic blocks. However, the number of paths grows quickly; for n vertices you can get up to (n-1)! possible paths.
My current approach is to simplify the graph as much as possible so that there are fewer paths to walk. This works, but I consider it ugly. Is there a better (read: faster) way to attack this problem? I am fine if the algorithm produces a stack depth that is not optimal: if the correct stack size is m, my only constraint is that the result n satisfies n >= m. Is there perhaps a greedy algorithm that would produce a good result here?
Update: I am aware of cycles and of the invariant that all control-flow merges must have the same stack depth. I only wrote down a simple toy-like language to illustrate the issue. Basically I have a deterministic stack-based language (JVM bytecode), so each operation has a known stack delta.
Please note that I do have a working solution to this problem that produces good results (simplified CFG), but I am looking for a better/faster approach.

Given that your language doesn't seem to have any user input, every program will compute the same way every time. Therefore, you could simply execute the program and keep track of the maximum stack size during execution. Probably not what you want, though.
As for your path argument: be aware that jumping allows cycles; hence, without further analysis, a cycle might imply non-termination and a stack overflow (i.e. the stack size increases on each iteration of the cycle). [With a cycle, n nodes still means infinitely many paths.]
Instead of actual execution of the code you might be able to do some form of abstract interpretation.
Regarding the comment from IVlad: simply counting the pushes is wrong because of possible cycles.
I am not sure what the semantics of your if-statement is, though, so this could be useful too: assume that an if-statement's label can only be a forward label (i.e., you can never jump backwards in your code). In that case your path-counting argument comes back to life. In effect, the resulting CFG will be a tree (or a DAG if you don't copy code). You could then do an approximate count by computing the number of pushes bottom-up, taking the maximum of the two branches at each if-statement (see the sketch below). It's still not the optimal result, but it yields a better approximation than a simple count of push statements.
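As a hedged sketch only (none of this code is from the question), the bottom-up approximation could look like the following in Java; BasicBlock, pushCount and successors are hypothetical names for a node in a forward-only CFG:
import java.util.*;

// Hypothetical forward-only CFG node: pushCount is the number of Push
// instructions inside the block, successors are the blocks it can branch to.
class BasicBlock {
    int pushCount;
    List<BasicBlock> successors = new ArrayList<>();
}

class PushCountBound {
    // Upper bound on the stack depth: pushes in this block plus the worst
    // bound over all successor branches. Memoized so shared DAG nodes are
    // computed only once.
    static int maxPushes(BasicBlock block, Map<BasicBlock, Integer> memo) {
        Integer cached = memo.get(block);
        if (cached != null) return cached;
        int worstSuccessor = 0;
        for (BasicBlock next : block.successors) {
            worstSuccessor = Math.max(worstSuccessor, maxPushes(next, memo));
        }
        int bound = block.pushCount + worstSuccessor;
        memo.put(block, bound);
        return bound;
    }
}
Called as maxPushes(entryBlock, new HashMap<>()), the result is never smaller than the true maximum depth, which matches the n >= m constraint in the question.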

You generally want to have the stack depth invariant over jumps and loops.
That means that for every node, every incoming edge should have the same stack depth. This simplifies walking the CFG significantly, because back edges can no longer change the stack depth of already-calculated instructions.
This is also a requirement for bounded stack depth: if it is not enforced, your code can contain loops that grow the stack on every iteration.
Another thing you should consider is making the stack effect of all opcodes deterministic. An example of a nondeterministic opcode would be: POP IF TopOfStack == 0.
Edit:
If you do have a deterministic set of opcodes and the stack-depth invariant, there is no need to visit every possible path of the program. It's enough to do a DFS/BFS through the CFG to determine the maximum stack depth. This can be done in linear time (in the number of instructions), but not faster.
Checking whether the basic blocks targeted by the outgoing edges of your current basic block still need to be evaluated should not be performance-relevant. Even in the worst case, where every instruction is an If, there are only 2*N edges to evaluate.
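To make that concrete, here is a hedged sketch (not from the answer) of such a single-pass walk in Java; the Insn/Block shapes and their stackDelta field are illustrative stand-ins for real bytecode structures, and the check at merge points mirrors the stack-depth invariant described above:
import java.util.*;

// Illustrative CFG shapes (not a real API): each instruction has a known
// stack delta, each block lists its instructions and successor blocks.
class Insn {
    int stackDelta;           // e.g. Push = +1, Pop = -1, If = -1, Jump = 0
    Insn(int d) { stackDelta = d; }
}

class Block {
    List<Insn> insns = new ArrayList<>();
    List<Block> successors = new ArrayList<>();
}

class MaxStackDepth {
    // Single DFS over the CFG. Each block is visited once; its entry depth is
    // fixed by the invariant that all incoming edges carry the same depth.
    static int compute(Block entry) {
        Map<Block, Integer> entryDepth = new HashMap<>();
        Deque<Block> work = new ArrayDeque<>();
        entryDepth.put(entry, 0);
        work.push(entry);
        int max = 0;

        while (!work.isEmpty()) {
            Block block = work.pop();
            int depth = entryDepth.get(block);
            for (Insn insn : block.insns) {
                depth += insn.stackDelta;
                if (depth < 0) throw new IllegalStateException("stack underflow");
                max = Math.max(max, depth);
            }
            for (Block next : block.successors) {
                Integer seen = entryDepth.get(next);
                if (seen == null) {           // not visited yet
                    entryDepth.put(next, depth);
                    work.push(next);
                } else if (seen != depth) {   // merge point disagrees: invariant broken
                    throw new IllegalStateException("inconsistent stack depth at merge");
                }
            }
        }
        return max;
    }
}
Each block is pushed at most once, so the walk is linear in the number of instructions plus edges.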

Related

What are some true Iterative Depth First Search implementation suggestions?

So, from what I am seeing, I (like most people) have been taught that the iterative version of DFS is just like iterative BFS apart from two differences: replace the queue with a stack, and mark a node as discovered after POP rather than after PUSH.
Two questions have really been puzzling me recently:
In certain cases it will result in different output, but is it really necessary to mark a node as visited after we POP? Why would it be wrong to do it after we PUSH? As far as I can see, it serves the same purpose as in BFS: not having duplicates in our queue/stack.
Now the big one: my impression is that this kind of iterative DFS is not a true DFS at all. If we think about the recursive version, it is quite space-efficient: it doesn't store all the possible neighbours at one level (as we do in the iterative version); it selects one, goes with it, then backtracks and goes for the next one. As an extreme example, think of a graph with one node in the center connected to 100 leaf nodes. In the recursive implementation, if we start from the middle node, the underlying call stack grows to at most 2: one frame for the middle node and one for the leaf currently being visited. If we do it as we have been taught with the iterative version, the stack grows to 100 elements. That doesn't seem right.
So with all those previous details in mind, my question is what would the approach be to have a true iterative DFS implementation?
Question 1: If you check and mark before the push, it uses less space but changes the order in which nodes are visited. You will still visit everything, though.
Question 2: You are correct that an iterative DFS will usually put all the children of a node onto the stack at the same time. This increases the space used for some graphs, but it doesn't change the worst case space usage, and it's the easiest way so there's usually no reason to change that.
Occasionally you know that it will save a lot of space if you don't do this, and then you can write an iterative DFS that works more like the recursive one. Instead of pushing the next nodes to visit onto the stack, you push a parent together with a position in its list of children (or something equivalent), which is pretty much what the recursive version has to remember when it recurses. In pseudo-code, it looks like this:
func DFS(start):
    let visited = EMPTY_SET
    let stack = EMPTY_STACK
    visited.add(start)
    visit(start)
    stack.push( (start, 0) )
    while (!stack.isEmpty()):
        let (parent, pos) = stack.pop()
        if (pos < parent.numChildren()):
            let child = parent.child[pos]
            stack.push( (parent, pos+1) )
            if (!visited.contains(child)):
                visited.add(child)
                visit(child)
                stack.push( (child, 0) )
You can see that it's a little more complicated, and the records you push on the stack are tuples, which is annoying in some languages. Often we'll use two stacks in parallel instead of creating tuples to push, or we'll push/pop two records at a time, depending on how nodes and child-list positions have to be represented.
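For instance, in Java (which has no lightweight tuples) the same idea could be written with two parallel stacks, one for the node and one for the child index; the Node class below is an assumed stand-in, not part of the original answer:
import java.util.*;

// Assumed node shape for illustration.
class Node {
    List<Node> children = new ArrayList<>();
    void visit() { /* process the node */ }
}

class IterativeDfs {
    // Memory-frugal iterative DFS: instead of pushing all children at once,
    // push (node, next-child-index) pairs, kept in two parallel stacks.
    static void dfs(Node start) {
        Set<Node> visited = new HashSet<>();
        Deque<Node> nodes = new ArrayDeque<>();
        Deque<Integer> positions = new ArrayDeque<>();

        visited.add(start);
        start.visit();
        nodes.push(start);
        positions.push(0);

        while (!nodes.isEmpty()) {
            Node parent = nodes.pop();
            int pos = positions.pop();
            if (pos < parent.children.size()) {
                Node child = parent.children.get(pos);
                nodes.push(parent);           // come back for the next child later
                positions.push(pos + 1);
                if (!visited.contains(child)) {
                    visited.add(child);
                    child.visit();
                    nodes.push(child);
                    positions.push(0);
                }
            }
        }
    }
}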

Most effective Algorithm to find maximum of double-precision values

What is the most effective way of finding a maximum value in a set of variables?
I have seen solutions, such as
private double findMax(double... vals) {
    double max = Double.NEGATIVE_INFINITY;
    for (double d : vals) {
        if (d > max) max = d;
    }
    return max;
}
But, what would be the most effective algorithm for doing this?
You can't reduce the complexity below O(n) if the list is unsorted... but you can improve the constant factor by a lot. Use SIMD. For example, with SSE you would use the MAXPD instruction to compare and select two doubles per instruction (MAXPS handles four floats at a time). Unroll the loop a bit to reduce the cost of the loop control logic. Then, outside the loop, find the max of the values left in your SSE register.
This gives a benefit for any size list... also using multithreading makes sense for really large lists.
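In Java you cannot emit MAXPD directly, but a hedged sketch of the unrolling idea looks like this: four independent running maxima cut the loop-control overhead and give the JIT a chance to vectorize (whether it actually does is up to the JVM):
static double findMaxUnrolled(double[] vals) {
    double m0 = Double.NEGATIVE_INFINITY, m1 = Double.NEGATIVE_INFINITY;
    double m2 = Double.NEGATIVE_INFINITY, m3 = Double.NEGATIVE_INFINITY;
    int i = 0;
    // Process four elements per iteration with independent accumulators.
    for (; i + 3 < vals.length; i += 4) {
        m0 = Math.max(m0, vals[i]);
        m1 = Math.max(m1, vals[i + 1]);
        m2 = Math.max(m2, vals[i + 2]);
        m3 = Math.max(m3, vals[i + 3]);
    }
    // Handle the leftover tail.
    for (; i < vals.length; i++) {
        m0 = Math.max(m0, vals[i]);
    }
    return Math.max(Math.max(m0, m1), Math.max(m2, m3));
}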
Assuming the list does not have its elements in any particular order, the algorithm you mentioned in your question is optimal. It must look at every element once, so it takes time directly proportional to the size of the list, O(n).
There is no algorithm for finding the maximum with a better worst-case bound than O(n).
Proof: Suppose for a contradiction that there is an algorithm that finds the maximum of a list in less than O(n) time. Then there must be at least one element that it does not examine. If the algorithm selects this element as the maximum, an adversary may choose a value for the element such that it is smaller than one of the examined elements. If the algorithm selects any other element as the maximum, an adversary may choose a value for the element such that it is larger than the other elements. In either case, the algorithm will fail to find the maximum.
EDIT: This was my attempted answer, but please look at the comments, where @BenVoigt proposes a better way to optimize the expression.
You need to traverse the whole list at least once, so it's a matter of finding a more efficient expression for if (d > max) max = d, if one exists.
Assuming we need the general case where the list is unsorted (if we kept it sorted, we'd just pick the last item, as @IgnacioVazquez points out in the comments), and after reading a little about branch prediction (see Why is it faster to process a sorted array than an unsorted array?, in particular the fourth answer), it looks like
if (d>max) max=d;
can be more efficiently rewritten as
max=d>max?d:max;
The reason is that the first statement is normally translated into a branch (this is entirely compiler- and language-dependent, but it happens at least in C and C++, and even in a VM-based language like Java), while the second one is translated into a conditional move.
Modern processors pay a big penalty when a branch prediction goes wrong (the execution pipeline has to be flushed), while a conditional move is a single instruction that doesn't disturb the pipeline.
The random nature of the elements in the list (any element can be greater or smaller than the current maximum with roughly equal probability) will cause many branch predictions to go wrong.
Please refer to the linked question for a nice discussion of all this, together with benchmarks.
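In Java the branch-free intent is usually expressed with Math.max, which HotSpot can compile to a conditional move or a max instruction; this is just the questioner's loop rewritten that way, not a benchmarked claim:
private double findMax(double... vals) {
    double max = Double.NEGATIVE_INFINITY;
    for (double d : vals) {
        max = Math.max(max, d);  // no data-dependent branch to mispredict
    }
    return max;
}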

How to get O(log n) space complexity in quicksort? [duplicate]

Why do we prefer to sort the smaller partition of a file and push the larger one on the stack after partitioning in quicksort (non-recursive implementation)? Doing this reduces the space complexity of quicksort to O(log n) for random files. Could someone elaborate?
As you know, at each recursive step, you partition an array. Push the larger part on the stack, continue working on the smaller part.
Because the one you carry on working with is the smaller one, it is at most half the size of the one you were working with before. So for each range we push on the stack, we halve the size of the range we're working with.
That means we can't push more than log n ranges onto the stack before the range we're working with hits size 1 (and therefore is sorted). This bounds the amount of stack we need to complete the first descent.
When we start processing the "big parts", each "big part" B(k) is bigger than the "small part" S(k) produced at the same time, so we might need more stack to handle B(k) than we needed to handle S(k). But B(k) is still smaller than the previous "small part", S(k-1) and once we're processing B(k), we've taken it back off the stack, which therefore is one item smaller than when we processed S(k), and the same size as when we processed S(k-1). So we still have our bound.
Suppose we did it the other way around: push the small part and continue working with the large part. Then in the pathologically nasty case we'd push a size-1 range onto the stack each time and continue working with a range only 2 smaller than the previous one. Hence we'd need n/2 slots in our stack.
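A minimal, illustrative Java sketch of that policy (the partition scheme and names are my own, not from the answer): after partitioning, the larger side's bounds go onto the explicit stack and the loop continues with the smaller side, so the stack never holds more than about log2 n ranges.
import java.util.ArrayDeque;
import java.util.Deque;

class IterativeQuicksort {
    static void sort(int[] a) {
        Deque<int[]> stack = new ArrayDeque<>();   // holds {lo, hi} ranges
        stack.push(new int[] { 0, a.length - 1 });
        while (!stack.isEmpty()) {
            int[] range = stack.pop();
            int lo = range[0], hi = range[1];
            while (lo < hi) {
                int p = partition(a, lo, hi);
                // Push the larger side, keep looping on the smaller side.
                if (p - lo < hi - p) {
                    stack.push(new int[] { p + 1, hi });
                    hi = p - 1;
                } else {
                    stack.push(new int[] { lo, p - 1 });
                    lo = p + 1;
                }
            }
        }
    }

    // Lomuto partition with the last element as pivot.
    private static int partition(int[] a, int lo, int hi) {
        int pivot = a[hi], i = lo;
        for (int j = lo; j < hi; j++) {
            if (a[j] < pivot) { int t = a[i]; a[i] = a[j]; a[j] = t; i++; }
        }
        int t = a[i]; a[i] = a[hi]; a[hi] = t;
        return i;
    }
}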
Consider the worst case, where you partition in such a way that the split is 1 : n-1. If you sort the small subfile first, you only need O(1) space, since you push the large subfile, pop it back, and then push the next large subfile. But if you sort the large subfile first, you need O(N) space, because you keep pushing 1-element subfiles onto the stack.
Here is a quote from Algorithms by Robert Sedgewick (he wrote the original paper on this):
For Quicksort, the combination of end-recursion removal and a policy of processing the smaller of the two subfiles first turns out to ensure that the stack need only contain room for about lg N entries, since each entry on the stack after the top one must represent a subfile less than half the size of the previous entry.
OK, am I right that you mean: if we make the Quicksort algorithm non-recursive, we have to use a stack onto which we put the partitions?
If so: an algorithm must allocate memory for each variable it uses. So if you run two instances of it in parallel, they allocate twice the memory of a single instance...
Now, in a recursive version, you start a new instance of the algorithm (which needs to allocate memory), BUT the instance that made the recursive call DOES NOT end, so its allocated memory is still needed! -> If we have started, say, 10 recursive instances, we need 10*X memory, where X is the memory needed by one instance.
Now we use the non-recursive algorithm. The needed memory is allocated only ONCE; the helper variables take up the space of a single instance. To do the same work, the algorithm must remember the partitions it has not processed yet, so we put them on a stack and take partitions off until we have performed the last "recursion" step. So imagine you give the algorithm an array: the recursive version needs to keep the array and some helper variables alive for each instance (again: if the recursion depth is 10, we need 10*X memory, and the array is the large part of that).
The non-recursive one needs to allocate the array and helper variables only once, BUT it needs a stack. Still, in the end you won't put so many parts on the stack that the recursive algorithm comes out ahead, because we don't pay for the array again for each instance.
I hope, I have described it so that you can understand it, but my English isn't soooo good. :)

Algorithm for Connect 4 Evaluation of Data Set

I am working on a Connect 4 AI, and saw that many people use this data set, containing all the legal positions at 8 ply and their eventual outcome.
I am using a standard minimax with alpha/beta pruning as my search algorithm. It seems like this data set could be really useful for my AI. However, I'm trying to find the best way to implement it. I thought the best approach might be to process the list and use the board state as a hash key for the eventual result (win, loss, draw).
What is the best way to design an AI to use a data set like this? Is my idea of hashing the board state and using it in a traditional search algorithm (e.g. minimax) on the right track? Or is there a better way?
Update: I ended up converting the large move database to a plain text format, where 1 represents X and -1 represents O. Then I used a string of the board state, and an integer representing the eventual outcome, and put them in a std::unordered_map (see Stack Overflow With Unordered Map for a problem I ran into). The performance of the map was excellent: it built quickly, and the lookups were fast. However, I never quite got the search right. Is the right way to approach the problem to search the database while the number of turns in the game is less than 8, and then switch over to regular alpha-beta?
Your approach seems correct.
For the first 8 moves, use alpha-beta algorithm, and use the look-up table to evaluate the value of each node at depth 8.
Once you have "exhausted" the table (exceeded 8 moves in the game), you should switch to the regular alpha-beta algorithm, which ends at terminal states (leaves in the game tree).
This is extremely helpful because:
Remember that the complexity of searching the tree is O(B^d), where B is the branching factor (number of possible moves per state) and d is the depth needed to reach the end.
By using this approach you effectively decrease both B and d for the maximal waiting times (the longest moves that need to be calculated), because:
Your maximal depth shrinks significantly to d-8 (only the last moves need a full search), effectively decreasing d.
The branching factor itself tends to shrink in this game after a few moves (many moves become impossible or lead to defeat and should not be explored), which decreases B.
For the first move, you also shrink the number of developed nodes to B^8 instead of B^d.
So, because of all this, the maximal waiting time decreases significantly with this approach.
Also note: if you find this optimization is not enough, you can always expand your look-up table (to the first 9, 10, ... moves). Of course this increases the needed space exponentially; it's a trade-off you need to examine to choose what best serves your needs (even storing the entire table in the file system, if main memory is not enough, should be considered).
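As a hedged illustration of that hand-off (the Board interface, serialization format and class names are hypothetical, not the asker's code), the table simply replaces the evaluation once the search reaches ply 8:
import java.util.List;
import java.util.Map;

// Hypothetical board interface; serialize() must produce the same key format
// used when the 8-ply database was loaded into the map.
interface Board {
    List<Board> nextMoves();
    boolean isTerminal();
    int terminalScore();       // +1 win, 0 draw, -1 loss from the root player's view
    String serialize();
}

class Connect4Search {
    private final Map<String, Integer> eightPlyOutcomes;  // board key -> eventual outcome

    Connect4Search(Map<String, Integer> table) { this.eightPlyOutcomes = table; }

    int search(Board b, int ply, int alpha, int beta, boolean maximizing) {
        // Phase 1: once we reach ply 8, the database already knows the outcome.
        if (ply == 8) {
            Integer known = eightPlyOutcomes.get(b.serialize());
            if (known != null) return known;
        }
        // Phase 2 (or a position missing from the table): ordinary alpha-beta
        // down to terminal states.
        if (b.isTerminal()) return b.terminalScore();
        if (maximizing) {
            int best = Integer.MIN_VALUE;
            for (Board next : b.nextMoves()) {
                best = Math.max(best, search(next, ply + 1, alpha, beta, false));
                alpha = Math.max(alpha, best);
                if (alpha >= beta) break;     // beta cut-off
            }
            return best;
        } else {
            int best = Integer.MAX_VALUE;
            for (Board next : b.nextMoves()) {
                best = Math.min(best, search(next, ply + 1, alpha, beta, true));
                beta = Math.min(beta, best);
                if (alpha >= beta) break;     // alpha cut-off
            }
            return best;
        }
    }
}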

