Translate recursive function calls into cost-efficient representation

I am trying to use some static analysis tools to check a program with extensive usage of recursive calls. Conceptually, it is something like this:
int counter = 0;
int access = 0;
extern int nd (); // a nondeterministic value
void compute1();
void compute2();
int get()
{
static int fpa[2] = {2, 2}; // each function may be called at most twice
int k = nd() % 2;
if (fpa[k] > 0) {
fpa[k]--;
return k+1;
}
else
return 0;
}
void check_r(int* x) {
if (x == &counter) {
__VERIFIER_assert(!(access == 2));
access = 1;
}
}
void check_w(int* x) {
if (x == &counter) {
__VERIFIER_assert((access == 0));
access = 2;
}
}
void schedule() {
for (int i = 0; i < 5; i++) {
int fp = get();
if (fp == 0)
return;
else if (fp == 1)
compute1();
else if (fp == 2)
compute2();
}
}
void compute1()
{
// some computations
...
schedule(); // recursive call
check_w(&counter); // check write access
...
}
void compute2()
{
// some computations
...
schedule(); //recursive call
check_r(&counter);
...
}
int main()
{
schedule();
return 0;
}
My preliminary tests show that, because of the recursive calls, the static analysis becomes too slow to terminate.
In principle I could rewrite the recursive call as a switch statement or something similar, but by the time compute1 and compute2 call schedule recursively they have already performed a nontrivial amount of computation, and it is difficult to save that program context for later use.
I have been stuck trying to optimize this case for a few days and cannot come up with even an ad hoc solution. Could anyone suggest a way to get rid of the recursive call here? Thank you so much.

To me it looks like all the schedule function does is decide whether to call compute1 or compute2, and all get does is ensure that neither function is called more than twice. I don't think the recursive call from compute to schedule is necessary, since there are never more than two calls to each function. The recursion here seems to imply that every time we successfully call one of the compute functions, we want another chance to call compute again:
void schedule() {
int chances = 1;
for (int i = 0; i < 5 || chances > 0; i++) {
int fp = get();
if (fp == 0){
chances--;
if(chances < 0)
chances = 0;
continue;
}
else if (fp == 1){
compute1(); chances++;
}
else if (fp == 2){
compute2(); chances++;
}
}
}
void compute1()
{
// some computations
...
//schedule(); remove
check_w(&counter); // check write access
...
}
void compute2()
{
// some computations
...
//schedule(); remove
check_r(&counter);
...
}
This code is a bit confusing, so please clarify if I made any incorrect assumptions.

Related

time complexity, Java

Please see the code that I wrote based on a school example.
public class Test {
public static void main(String [] args)
{
int number = 0;
int [] array = new int[number+1];
array[number] = 0;
methodName(number, array);
}
public static void methodName(int n, int[] b )
{
if (n == 0)
{
System.out.println(" b is : " + java.util.Arrays.toString(b)); // print the contents, not the array reference
return;
}
else
{
b[n-1] = 0;
methodName(n-1, b);
b[n-1] = 1;
methodName(n-1, b);
}
}
}
I am trying to calculate the best and worst case time complexity of this code.
As far as I understand the best case would be O(1).
And I'm having a difficulty determining the worst case.
There are four basic operations in the else branch.
I know that this is a rapidly growing function and I have a feeling it is close to being O(n!).
Thank you for your time.

If methodName is not called from anywhere other than main, then it is always O(1): main passes n = 0, so the base case returns immediately.
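For the general case where n is allowed to vary, the growth can be checked by counting calls. The following is a minimal sketch in C++ rather than Java; `countCalls` is a hypothetical helper that is not part of the question and only mirrors the shape of the recursion (two sub-calls on n - 1). Under that assumption the count satisfies T(n) = 2T(n-1) + 1, which gives 2^(n+1) - 1 calls, so the growth is exponential rather than factorial.

```cpp
#include <cstdint>

// Hypothetical helper: mirrors the shape of methodName(n, b),
// but only counts invocations instead of filling an array.
std::uint64_t countCalls(int n) {
    if (n == 0) {
        return 1;  // base case: this call only, no recursion
    }
    // one call for this frame plus two recursive sub-calls on n - 1
    return 1 + countCalls(n - 1) + countCalls(n - 1);
}
```

For example, `countCalls(3)` counts 15 invocations, matching 2^4 - 1.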

Why is atomic_thread_fence(memory_order_seq_cst) needed in a lock-free queue that already uses seq_cst CAS?

This is a lock-free queue in which only one thread executes push and pop; the other threads execute steal.
However, I can't understand why steal() needs std::atomic_thread_fence(std::memory_order_seq_cst).
In my opinion, steal() only has one store operation, that is _top.compare_exchange_strong, and it has memory_order_seq_cst. So, why does it need a seq_cst fence as well?
template <typename T>
class WorkStealingQueue {
public:
WorkStealingQueue() : _bottom(1), _top(1) { }
~WorkStealingQueue() { delete [] _buffer; }
int init(size_t capacity) {
if (capacity & (capacity - 1)) {
LOG(ERROR) << "Invalid capacity=" << capacity
<< " which must be power of 2";
return -1;
}
_buffer = new(std::nothrow) T[capacity];
_capacity = capacity;
return 0;
}
// Steal one item from the queue.
// Returns true on stolen.
// May run in parallel with push() pop() or another steal().
bool steal(T* val) {
size_t t = _top.load(std::memory_order_acquire);
size_t b = _bottom.load(std::memory_order_acquire);
if (t >= b) {
// Permit false negative for performance considerations.
return false;
}
do {
std::atomic_thread_fence(std::memory_order_seq_cst);
b = _bottom.load(std::memory_order_acquire);
if (t >= b) {
return false;
}
*val = _buffer[t & (_capacity - 1)];
} while (!_top.compare_exchange_strong(t, t + 1,
std::memory_order_seq_cst,
std::memory_order_relaxed));
return true;
}
// Pop an item from the queue.
// Returns true on popped and the item is written to `val'.
// May run in parallel with steal().
// Never run in parallel with push() or another pop().
bool pop(T* val) {
const size_t b = _bottom.load(std::memory_order_relaxed);
size_t t = _top.load(std::memory_order_relaxed);
if (t >= b) {
// fast check since we call pop() in each sched.
// Stale _top which is smaller should not enter this branch.
return false;
}
const size_t newb = b - 1;
_bottom.store(newb, std::memory_order_relaxed);
std::atomic_thread_fence(std::memory_order_seq_cst);
t = _top.load(std::memory_order_relaxed);
if (t > newb) {
_bottom.store(b, std::memory_order_relaxed);
return false;
}
*val = _buffer[newb & (_capacity - 1)];
if (t != newb) {
return true;
}
// Single last element, compete with steal()
const bool popped = _top.compare_exchange_strong(
t, t + 1, std::memory_order_seq_cst, std::memory_order_relaxed);
_bottom.store(b, std::memory_order_relaxed);
return popped;
}
// Push an item into the queue.
// Returns true on pushed.
// May run in parallel with steal().
// Never run in parallel with pop() or another push().
bool push(const T& x) {
const size_t b = _bottom.load(std::memory_order_relaxed);
const size_t t = _top.load(std::memory_order_acquire);
if (b >= t + _capacity) { // Full queue.
return false;
}
_buffer[b & (_capacity - 1)] = x;
_bottom.store(b + 1, std::memory_order_release);
return true;
}
private:
DISALLOW_COPY_AND_ASSIGN(WorkStealingQueue);
std::atomic<size_t> _bottom;
size_t _capacity;
T* _buffer;
std::atomic<size_t> BAIDU_CACHELINE_ALIGNMENT _top;
};

You do not have to use a seq-cst-fence, but then you would have to make the operations on _bottom sequentially consistent. The reason is that it must be guaranteed that the load operation in steal sees the updated value written in pop. Otherwise you could have a race condition where the same item could be returned twice (once from pop and once from steal).
For comparison you can take a look at my implementation of the Chase-Lev-Deque: https://github.com/mpoeter/xenium/blob/master/xenium/chase_work_stealing_deque.hpp
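The visibility requirement can be illustrated with a simplified two-thread litmus test. This is only a sketch and not the queue itself: `runStoreBuffering`, `x` and `y` are hypothetical names, with each thread standing in for one side that writes one index and then reads the other. With a seq_cst fence between the relaxed store and the relaxed load in both threads, the C++ memory model rules out the outcome where both loads return 0; without the fences that outcome would be allowed.

```cpp
#include <atomic>
#include <thread>
#include <utility>

// Simplified litmus test (not the queue): each thread writes one
// variable, issues a seq_cst fence, then reads the other variable.
std::pair<int, int> runStoreBuffering() {
    std::atomic<int> x{0}, y{0};
    int r1 = -1, r2 = -1;

    std::thread a([&] {
        x.store(1, std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_seq_cst);
        r1 = y.load(std::memory_order_relaxed);
    });
    std::thread b([&] {
        y.store(1, std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_seq_cst);
        r2 = x.load(std::memory_order_relaxed);
    });
    a.join();
    b.join();
    // At least one thread must have seen the other's store.
    return {r1, r2};
}
```

The fences are totally ordered, so whichever thread's fence comes second is guaranteed to observe the store made before the first fence.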

Stored lambda function calls are very slow - fix or workaround?

In an attempt to make a more usable version of the code I wrote for an answer to another question, I used a lambda function to process an individual unit. This is a work in progress. I've got the "client" syntax looking pretty nice:
// for loop split into 4 threads, calling doThing for each index
parloop(4, 0, 100000000, [](int i) { doThing(i); });
However, I have an issue. Whenever I call the saved lambda, it takes up a ton of CPU time. doThing itself is an empty stub. If I just comment out the internal call to the lambda, then the speed returns to normal (4 times speedup for 4 threads). I'm using std::function to save the reference to the lambda.
My question is - Is there some better way that the stl library internally manages lambdas for large sets of data, that I haven't come across?
struct parloop
{
public:
std::vector<std::thread> myThreads;
int numThreads, rangeStart, rangeEnd;
std::function<void (int)> lambda;
parloop(int _numThreads, int _rangeStart, int _rangeEnd, std::function<void(int)> _lambda) //
: numThreads(_numThreads), rangeStart(_rangeStart), rangeEnd(_rangeEnd), lambda(_lambda) //
{
init();
exit();
}
void init()
{
myThreads.resize(numThreads);
for (int i = 0; i < numThreads; ++i)
{
myThreads[i] = std::thread(myThreadFunction, this, chunkStart(i), chunkEnd(i));
}
}
void exit()
{
for (int i = 0; i < numThreads; ++i)
{
myThreads[i].join();
}
}
int rangeJump()
{
return ceil(float(rangeEnd - rangeStart) / float(numThreads));
}
int chunkStart(int i)
{
return rangeJump() * i;
}
int chunkEnd(int i)
{
return std::min(rangeJump() * (i + 1) - 1, rangeEnd);
}
static void myThreadFunction(parloop *self, int start, int end) //
{
std::function<void(int)> lambda = self->lambda;
// we're just going to loop through the numbers and print them out
for (int i = start; i <= end; ++i)
{
lambda(i); // commenting this out speeds things up back to normal
}
}
};
void doThing(int i) // "payload" of the lambda function
{
}
int main()
{
auto start = timer.now();
auto stop = timer.now();
// run 4 trials of each number of threads
for (int x = 1; x <= 4; ++x)
{
// test between 1-8 threads
for (int numThreads = 1; numThreads <= 8; ++numThreads)
{
start = timer.now();
// this is the line of code which calls doThing in the loop
parloop(numThreads, 0, 100000000, [](int i) { doThing(i); });
stop = timer.now();
cout << numThreads << " Time = " << std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count() / 1000000.0f << " ms\n";
//cout << "\t\tsimple list, time was " << deltaTime2 / 1000000.0f << " ms\n";
}
}
cin.ignore();
cin.get();
return 0;
}

I'm using std::function to save the reference to the lambda.
That's one possible problem, as std::function is not a zero-runtime-cost abstraction. It is a type-erased wrapper whose operator() has a cost similar to a virtual call, and it may also heap-allocate (which could mean a cache miss per call).
If you want to store your lambda in such a way that does not introduce additional overhead and that allows the compiler to inline it, you should use a template parameter. This is not always possible, but might fit your use case. Example:
template <typename TFunction>
struct parloop
{
public:
std::thread **myThreads;
int numThreads, rangeStart, rangeEnd;
TFunction lambda;
parloop(TFunction&& _lambda,
int _numThreads, int _rangeStart, int _rangeEnd)
: numThreads(_numThreads), rangeStart(_rangeStart),
rangeEnd(_rangeEnd),
lambda(std::move(_lambda)) // initializers listed in member declaration order
{
init();
exit();
}
// ...
To deduce the type of the lambda, you can use a helper function:
template <typename TF, typename... TArgs>
auto make_parloop(TF&& lambda, TArgs&&... xs)
{
return parloop<std::decay_t<TF>>(
std::forward<TF>(lambda), std::forward<TArgs>(xs)...);
}
Usage:
auto p = make_parloop([](int i) { doThing(i); },
numThreads, 0, 100000000);
I wrote an article that's related to the subject:
"Passing functions to functions"
It contains some benchmarks that show how much assembly is generated for std::function compared to a template parameter and other solutions.
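As a stripped-down illustration of the two ways of accepting a callable, here is a minimal sketch. `sumErased` and `sumTemplated` are hypothetical names used only for this example. Both compute the same result; the difference is that the templated version knows the concrete type of the callable, which gives the compiler the opportunity (not a guarantee) to inline the call.

```cpp
#include <functional>

// Type-erased: every call goes through std::function's indirection.
long long sumErased(int n, const std::function<long long(int)>& f) {
    long long total = 0;
    for (int i = 0; i < n; ++i) {
        total += f(i);
    }
    return total;
}

// Templated: the callable's concrete type is a template parameter,
// so the call target is known at compile time.
template <typename F>
long long sumTemplated(int n, F&& f) {
    long long total = 0;
    for (int i = 0; i < n; ++i) {
        total += f(i);
    }
    return total;
}
```

Either function can be called with the same lambda, for example `[](int i) { return static_cast<long long>(i) * i; }`; only the cost of each call inside the loop may differ.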

for loop Arduino/C

My aim is to create a for loop that iterates through numbers and stops printing once it reaches the maximum. So far I have managed to write code that stops printing x, but it keeps printing zeroes. How can I stop Serial.print() from being executed once the iteration has reached the maximum value?
int x;
boolean f = false;
void setup(){
Serial.begin(9600);
}
void loop(){
for(x=0;x<8;x++){
Serial.println(x);
delay(300);
if(x==7){
f = true;
}
if(f){
break;
}
}
}

Something like below should serve. Btw, I like to name my vars something meaningful to avoid potential confusion and make the code more intelligible.
(In general you are better off posting questions to the Arduino forum. More traffic and more knowledgeable/helpful people == more likelihood of getting an answer.)
int current;
int limit;
boolean complete;
void setup(){
Serial.begin(9600);
current = 0;
limit = 8;
complete = false;
}
void loop(){
if (!complete){
while (true){
Serial.println(current);
current++;
if (current >= limit){
complete = true;
break;
}
delay(300);
}
}
}

The key is the word loop - that function is called repetitively!
If you want something to happen once, do it in setup(), or (as another answer suggests), have a flag to keep track of the fact that you've done it already.
Another way (since x is global) would be:
void loop() {
if (x < 8) {
Serial.println(x);
x++;
}
}
or, getting rid of the global variable:
void loop() {
static int x = 0;
if (x < 8) {
Serial.println(x);
x++;
}
}

int x;
boolean f = false;
void setup(){
Serial.begin(9600);
}
Very similar to your code, just put the print statement in the if statement and you're set.
void loop(){
for(x=0;x<8;x++){
if(!f) {
Serial.println(x);
}
delay(300);
if(x==7){
f = true;
}
}
}

Looking at Sorts - Quicksort Iterative?

I'm looking at all different sorts. Note that this is not homework (I'm in the midst of finals) I'm just looking to be prepared if that sort of thing would pop up.
I was unable to find a reliable method of doing a quicksort iteratively. Is it possible and, if so, how?

I'll try to give a more general answer in addition to the actual implementations given in the other posts.
Is it possible and, if so, how?
Let us first of all take a look at what can be meant by making a recursive algorithm iterative.
For example, we want to have some function sum(n) that sums up the numbers from 0 to n.
Surely, this is
sum(n) =
if n = 0
then return 0
else return n + sum(n - 1)
As we try to compute something like sum(100000), we'll soon see that this recursive algorithm has its limits: a stack overflow will occur.
So, as a solution, we use an iterative algorithm to solve the same problem.
sum(n) =
s <- 0
for i in 0..n do
s <- s + i
return s
However, it's important to note that this implementation is an entirely different algorithm than the recursive sum above. We didn't in some way modify the original one to obtain the iterative version, we basically just found a non-recursive algorithm - with different and arguably better performance characteristics - that solves the same problem.
This is the first aspect of making an algorithm iterative: Finding a different, iterative algorithm that solves the same problem.
In some cases, there simply might not be such an iterative version.
The second one, however, is applicable to every recursive algorithm. You can turn any recursion into iteration by explicitly introducing the stack that the recursion uses implicitly. This algorithm will have exactly the same characteristics as the original one, and the stack will grow with O(n) as in the recursive version. It won't overflow as easily, since it uses conventional memory instead of the call stack, and it's iterative, but it's still the same algorithm.
As to quicksort: there is no different formulation that works without storing the data needed for the recursion. But of course you can use an explicit stack for that data, as Ehsan showed. Thus you can, as always, produce an iterative version.
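To make the explicit-stack idea concrete for the sum example, here is a minimal sketch. `sumRecursive` and `sumExplicitStack` are hypothetical names, and a `std::vector` is assumed as the stand-in for the call stack. The second function performs the same additions in the same order as the first; only the place where the pending frames are stored changes.

```cpp
#include <cstdint>
#include <vector>

// Direct recursion, as in the pseudocode: call depth grows with n.
std::uint64_t sumRecursive(std::uint64_t n) {
    return n == 0 ? 0 : n + sumRecursive(n - 1);
}

// Same algorithm, but pending frames live in a heap-allocated vector.
std::uint64_t sumExplicitStack(std::uint64_t n) {
    std::vector<std::uint64_t> pending;  // stands in for the call stack
    while (n != 0) {                     // "descend": save each frame's n
        pending.push_back(n);
        --n;
    }
    std::uint64_t result = 0;            // value of the base case sum(0)
    while (!pending.empty()) {           // "return": unwind innermost first
        result += pending.back();
        pending.pop_back();
    }
    return result;
}
```

The explicit version still uses O(n) extra memory, but that memory comes from the heap, so an input such as 100000 is no longer limited by the size of the call stack.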

#include <stdio.h>
#include <conio.h>
#define MAXELT 100
#define INFINITY 32760 // numbers in list should not exceed
// this. change the value to suit your
// needs
#define SMALLSIZE 10 // not less than 3
#define STACKSIZE 100 // should be ceiling(lg(MAXSIZE)+1)
int list[MAXELT+1]; // one extra, to hold INFINITY
struct { // stack element.
int a,b;
} stack[STACKSIZE];
int top=-1; // initialise stack
int main() // overhead!
{
int i=-1,j,n;
char t[10];
void quicksort(int);
do {
if (i!=-1)
list[i++]=n;
else
i++;
printf("Enter the numbers <End by #>: ");
fflush(stdin);
scanf("%[^\n]",t);
if (sscanf(t,"%d",&n)<1)
break;
} while (1);
quicksort(i-1);
printf("\nThe list obtained is ");
for (j=0;j<i;j++)
printf("\n %d",list[j]);
printf("\n\nProgram over.");
getch();
return 0; // successful termination.
}
void interchange(int *x,int *y) // swap
{
int temp;
temp=*x;
*x=*y;
*y=temp;
}
void split(int first,int last,int *splitpoint)
{
int x,i,j,s,g;
// here, atleast three elements are needed
if (list[first]<list[(first+last)/2]) { // find median
s=first;
g=(first+last)/2;
}
else {
g=first;
s=(first+last)/2;
}
if (list[last]<=list[s])
x=s;
else if (list[last]<=list[g])
x=last;
else
x=g;
interchange(&list[x],&list[first]); // swap the split-point element
// with the first
x=list[first];
i=first+1; // initialise
j=last+1;
while (i<j) {
do { // find j
j--;
} while (list[j]>x);
do {
i++; // find i
} while (list[i]<x);
interchange(&list[i],&list[j]); // swap
}
interchange(&list[i],&list[j]); // undo the extra swap
interchange(&list[first],&list[j]); // bring the split-point
// element to the first
*splitpoint=j;
}
void push(int a,int b) // push
{
top++;
stack[top].a=a;
stack[top].b=b;
}
void pop(int *a,int *b) // pop
{
*a=stack[top].a;
*b=stack[top].b;
top--;
}
void insertion_sort(int first,int last)
{
int i,j,c;
for (i=first;i<=last;i++) {
j=list[i];
c=i;
while ((c>first)&&(list[c-1]>j)) { // check the bound first so list[c-1] is never read out of range
list[c]=list[c-1];
c--;
}
list[c]=j;
}
}
void quicksort(int n)
{
int first,last,splitpoint;
push(0,n);
while (top!=-1) {
pop(&first,&last);
for (;;) {
if (last-first>SMALLSIZE) {
// partition the current range around the split point
split(first,last,&splitpoint);
// push the larger sub-list, keep iterating on the smaller one
if (last-splitpoint<splitpoint-first) {
push(first,splitpoint-1);
first=splitpoint+1;
}
else {
push(splitpoint+1,last);
last=splitpoint-1;
}
}
else { // sort the smaller sub-lists
// through insertion sort
insertion_sort(first,last);
break;
}
}
} // iterate for larger list
}
// End of code.
taken from here

I was unable to find a reliable method of doing a quicksort iteratively
Have you tried google ?
It is just an ordinary quicksort in which the recursion is replaced by an array used as a stack.

This is my effort. Tell me if there is any improvement possible.
This code is done from the book "Data Structures, Seymour Lipschutz(Page-173), Mc GrawHill, Schaum's Outline Series."
#include <stdio.h>
#include <conio.h>
#include <math.h>
#define SIZE 12
struct StackItem
{
int StartIndex;
int EndIndex;
};
struct StackItem myStack[SIZE * SIZE];
int stackPointer = 0;
int myArray[SIZE] = {44,33,11,55,77,90,40,60,99,22,88,66};
void Push(struct StackItem item)
{
myStack[stackPointer] = item;
stackPointer++;
}
struct StackItem Pop()
{
stackPointer--;
return myStack[stackPointer];
}
int StackHasItem()
{
if(stackPointer>0)
{
return 1;
}
else
{
return 0;
}
}
void ShowStack()
{
int i =0;
printf("\n");
for(i=0; i<stackPointer ; i++)
{
printf("(%d, %d), ", myStack[i].StartIndex, myStack[i].EndIndex);
}
printf("\n");
}
void ShowArray()
{
int i=0;
printf("\n");
for(i=0 ; i<SIZE ; i++)
{
printf("%d, ", myArray[i]);
}
printf("\n");
}
void Swap(int * a, int *b)
{
int temp = *a;
*a = *b;
*b = temp;
}
int Scan(int *startIndex, int *endIndex)
{
int partition = 0;
int i = 0;
if(*startIndex > *endIndex)
{
for(i=*startIndex ; i>=*endIndex ; i--)
{
//printf("%d->", myArray[i]);
if(myArray[i]<myArray[*endIndex])
{
//printf("\nSwapping %d, %d", myArray[i], myArray[*endIndex]);
Swap(&myArray[i], &myArray[*endIndex]);
*startIndex = *endIndex;
*endIndex = i;
partition = i;
break;
}
if(i==*endIndex)
{
*startIndex = *endIndex;
*endIndex = i;
partition = i;
}
}
}
else if(*startIndex < *endIndex)
{
for(i=*startIndex ; i<=*endIndex ; i++)
{
//printf("%d->", myArray[i]);
if(myArray[i]>myArray[*endIndex])
{
//printf("\nSwapping %d, %d", myArray[i], myArray[*endIndex]);
Swap(&myArray[i], &myArray[*endIndex]);
*startIndex = *endIndex;
*endIndex = i;
partition = i;
break;
}
if(i==*endIndex)
{
*startIndex = *endIndex;
*endIndex = i;
partition = i;
}
}
}
return partition;
}
int GetFinalPosition(struct StackItem item1)
{
struct StackItem item = {0};
int StartIndex = item1.StartIndex ;
int EndIndex = item1.EndIndex;
int PivotIndex = -99;
while(StartIndex != EndIndex)
{
PivotIndex = Scan(&EndIndex, &StartIndex);
printf("\n");
}
return PivotIndex;
}
void QuickSort()
{
int median = 0;
struct StackItem item;
struct StackItem item1={0};
struct StackItem item2={0};
item.StartIndex = 0;
item.EndIndex = SIZE-1;
Push(item);
while(StackHasItem())
{
item = Pop();
median = GetFinalPosition(item);
if(median>=0 && median<=(SIZE-1))
{
if(item.StartIndex<=(median-1))
{
item1.StartIndex = item.StartIndex;
item1.EndIndex = median-1;
Push(item1);
}
if(median+1<=(item.EndIndex))
{
item2.StartIndex = median+1;
item2.EndIndex = item.EndIndex;
Push(item2);
}
}
ShowStack();
}
}
int main()
{
ShowArray();
QuickSort();
ShowArray();
}
