I wrote the following for a class, but came across some strange behavior while testing it. arrayProcedure is meant to do things with an array based on the 2 "tweaks" at the top of the function (arrSize, and start). For the assignment, arrSize must be 10,000, and start, 100. Just for kicks, I decided to see what happens if I increase them, and for some reason, if arrSize exceeds around 60,000 (I haven't found the exact limit), the program immediately crashes with a stack overflow when using a debugger:
Unhandled exception at 0x008F6977 in TMA3Question1.exe: 0xC00000FD: Stack overflow (parameters: 0x00000000, 0x00A32000).
If I just run it without a debugger, I don't get any helpful errors; Windows hangs for a fraction of a second, then gives me the error "TMA3Question1.exe has stopped working".
I decided to play around with debugging it, but that didn't shed any light. I placed breaks above and below the call to arrayProcedure, as well as peppered inside of it. When arrSize doesn't exceed 60,000 it runs fine: It pauses before calling arrayProcedure, properly waits at all the points inside of it, then pauses on the break underneath the call.
If I raise arrSize however, the break before the call happens, but it appears as though it never even steps into arrayProcedure; it immediately gives me a stack overflow without pausing at any of the internal breakpoints.
The only thing I can think of is that the resulting arrays exceed my computer's current memory, but that doesn't seem likely for a couple of reasons:
It should only use just under a megabyte:
sizeof(double) = 8 bytes
8 * 60000 = 480000 bytes per array
480000 * 2 = 960000 bytes for both arrays
As far as I know, arrays aren't immediately constructed when a function is entered; they're allocated on definition. I placed several breakpoints before the arrays are even declared, and they are never reached.
Any light that you could shed on this would be appreciated.
The code:
#include <iostream>
#include <ctime>
#include <climits> //INT_MAX
//CLOCKS_PER_SEC is a macro supplied by ctime
double msBetween(clock_t startTime, clock_t endTime) {
//Take the difference first, then convert clock ticks to milliseconds
return (endTime - startTime) * 1000.0 / CLOCKS_PER_SEC;
}
void initArr(double arr[], int start, int length, int step) {
for (int i = 0, j = start; i < length; i++, j += step) {
arr[i] = j;
}
}
//The function we're going to inline in the next question
void helper(double a1, double a2) {
std::cout << a1 << " * " << a2 << " = " << a1 * a2 << std::endl;
}
void arrayProcedure() {
const int arrSize = 70000;
const int start = 1000000;
std::cout << "Checking..." << std::endl;
if (arrSize > INT_MAX) {
std::cout << "Given arrSize is too high and exceeds the INT_MAX of: " << INT_MAX << std::endl;
return;
}
double arr1[arrSize];
double arr2[arrSize];
initArr(arr1, start, arrSize, 1);
initArr(arr2, arrSize + start - 1, arrSize, -1);
for (int i = 0; i < arrSize; i++) {
helper(arr1[i], arr2[i]);
}
}
int main(int argc, char* argv[]) {
using namespace std;
const clock_t startTime = clock();
arrayProcedure();
clock_t endTime = clock();
cout << endTime << endl;
double elapsedTime = msBetween(startTime, endTime);
cout << "\n\n" << elapsedTime << " milliseconds. ("
<< elapsedTime / 60000 << " minutes)\n";
}
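The default stack size is 1 MB with Visual Studio. arr1 and arr2 are local variables, so both live on the stack; at arrSize = 70000 they need 2 * 70000 * 8 = 1,120,000 bytes, which is more than the whole stack. The space for all of a function's locals is reserved when the function is entered, not at each declaration, which is why no breakpoint inside arrayProcedure is reached.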
https://msdn.microsoft.com/en-us/library/tdkhxaks.aspx
You can increase the stack size or use the new operator.
double *arr1 = new double[arrSize];
double *arr2 = new double[arrSize];
...
delete [] arr1;
delete [] arr2;
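As an alternative to new[]/delete[], here is a minimal sketch that keeps the same logic but stores both arrays in std::vector, which allocates on the heap and frees itself. The names initVec and sumOfProducts are hypothetical and not part of the original program; the sketch sums the products instead of printing them so the result is easy to check.

```cpp
#include <cstddef>
#include <vector>

// Fills arr with start, start + step, start + 2*step, ... (same role as initArr).
void initVec(std::vector<double>& arr, int start, int step) {
    int value = start;
    for (std::size_t i = 0; i < arr.size(); ++i, value += step) {
        arr[i] = value;
    }
}

// Builds both arrays on the heap and returns the sum of the pairwise products.
// Only the two small vector objects live on the stack, so arrSize is no longer
// limited by the stack size.
double sumOfProducts(int arrSize, int start) {
    std::vector<double> arr1(arrSize);
    std::vector<double> arr2(arrSize);
    initVec(arr1, start, 1);
    initVec(arr2, arrSize + start - 1, -1);
    double total = 0.0;
    for (int i = 0; i < arrSize; ++i) {
        total += arr1[i] * arr2[i];
    }
    return total;
}
```

Calling sumOfProducts(70000, 1000000) uses the same sizes that overflowed the stack in the question, and no explicit delete is needed.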
I wrote a small test to figure out the fastest mathematical operation for a specific x. I wanted x to be entered by the user, so that I can run the tests for different values. For the following code, the compiler tells me that there is an error at std::cin >> val;
error: cannot bind 'std::istream {aka std::basic_istream}' lvalue to 'std::basic_istream&&'
If I declare val as double val instead of const double val, I get more errors. What can I change in order to get a running program?
#include <cmath>
#include <chrono>
#include <iomanip>
#include <iostream>
#include <istream>
#include <ostream>
// for x^1.5
double test_pow_15(double x) { return std::pow(x, 1.5); };
double test_chain_15(double x) { return sqrt(x * x * x); };
double test_tmp_15(double x) { double tmp = x * x * x; return sqrt(tmp); };
volatile double sink;
const double val = 0;
const double ans_15 = std::pow(val, 1.5);
void do_test(const char* name, double(&fn)(double), const double ans) {
auto start = std::chrono::high_resolution_clock::now();
for (size_t n = 0; n < 1000 * 1000 * 10; ++n) {
sink = val;
sink = fn(sink);
}
auto end = std::chrono::high_resolution_clock::now();
std::chrono::duration<double, std::milli> dur = end - start;
std::cout << name << ".Took" << dur.count() << "ms, error:" << sink - ans << '\n';
}
int main()
{
std::cout << "Speed test"<< '\n';
std::cout << "Please enter value for x."<< '\n';
std::cout << "x = ";
std::cin >> val;
std::cout << "Speed test starts for x = "<< val <<"."<<'\n';
std::cout << " " << '\n';
std::cout << "For " << val<<"^(1.5) the speed is:" <<'\n';
do_test("std::pow(x,1.5) ",test_pow_15, ans_15);
do_test("sqrt(x*x*x) ",test_chain_15, ans_15);
do_test("tmp = x*x*x; sqrt(tmp) ",test_tmp_15, ans_15);
return 0;
}
I think if you remove the "const" keyword, it would probably work fine.
double val = 0;
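One more thing to watch once val is no longer const: ans_15 is a namespace-scope variable, so it is initialized before main runs, from the initial val of 0, and it will not change when the user types a value. A minimal sketch of one way around that, assuming the reference answer is computed only after the input has been read (readValue and reference15 are hypothetical helper names, not from the question):

```cpp
#include <cmath>
#include <istream>
#include <sstream>

// Reads x from the stream; returns false if the input is not a number.
bool readValue(std::istream& in, double& x) {
    return static_cast<bool>(in >> x);
}

// Reference result for x^1.5, to be called after x has been read.
double reference15(double x) {
    return std::pow(x, 1.5);
}
```

In main this would look like: double val = 0; if (!readValue(std::cin, val)) return 1; const double ans_15 = reference15(val); and do_test would then need val passed in as a parameter instead of reading a global.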
I am doing some parallel programming with async. I have an integrator, and in a test program I wanted to see whether dividing a vector into 4 subvectors actually takes one fourth of the time to complete the task.
I had an initial issue with the measured time, now solved, as steady_clock measures real (wall-clock) time rather than CPU time.
I tried the code with different vector lengths. For short lengths (<10e5 elements) the direct integration is faster: that is normal, as the .get() calls and the sum take their time.
For intermediate lengths (about 1e8 elements) the integration followed the expected time, giving 1 s as the first time and 0.26 s for the second time.
For long vectors (10e9 or higher) the second integration takes much more time than the first, more than 3 s against a similar or greater time.
Why? What is the process that makes the divide and conquer routine slower?
A couple of additional notes: Please note that I pass the vectors by reference, so that cannot be the issue, and keep in mind that this is a test code, thus the subvector creation is not the point of the question.
#include<iostream>
#include<vector>
#include<thread>
#include<future>
#include<ctime>
#include<chrono>
using namespace std;
using namespace chrono;
typedef steady_clock::time_point tt;
double integral(const std::vector<double>& v, double dx) //simpson 1/3
{
int n=v.size();
double in=0.;
if(n%2 == 1) {in+=v[n-1]*v[n-1]; n--;}
in=(v[0]*v[0])+(v[n-1]*v[n-1]);
for(int i=1; i<n/2; i++)
in+= 2.*v[2*i] + 4.*v[2*i+1];
return in*dx/3.;
}
int main()
{
double h=0.001;
vector<double> v1(100000,h); // a vector, content is not important
// subvectors
vector<double> sv1(v1.begin(), v1.begin() + v1.size()/4),
sv2(v1.begin() + v1.size()/4 +1,v1.begin()+ 2*v1.size()/4),
sv3( v1.begin() + 2*v1.size()/4+1, v1.begin() + 3*v1.size()/4+1),
sv4( v1.begin() + 3*v1.size()/4+1, v1.end());
double a,b;
cout << "f1" << endl;
tt bt1 = chrono::steady_clock::now();
// complete integration: should take time t
a=integral(v1, h);
tt et1 = chrono::steady_clock::now();
duration<double> time_span = duration_cast<duration<double>>(et1 - bt1);
cout << time_span.count() << endl;
future<double> f1, f2,f3,f4;
cout << "f2" << endl;
tt bt2 = chrono::steady_clock::now();
// four integrations: should take time t/4
f1 = async(launch::async, integral, ref(sv1), h);
f2 = async(launch::async, integral, ref(sv2), h);
f3 = async(launch::async, integral, ref(sv3), h);
f4 = async(launch::async, integral, ref(sv4), h);
b=f1.get()+f2.get()+f3.get()+f4.get();
tt et2 = chrono::steady_clock::now();
duration<double> time_span2 = duration_cast<duration<double>>(et2 - bt2);
cout << time_span2.count() << endl;
cout << a << " " << b << endl;
return 0;
}
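For comparison, here is a minimal sketch of the same four-way split that hands each task a pointer range into the original vector instead of a copied subvector. It is an illustration under assumptions, not an explanation of the slowdown: sumSquares is a hypothetical stand-in for the integrand (a plain sum of squares rather than Simpson's rule), and parallelSumSquares is likewise a made-up name.

```cpp
#include <cstddef>
#include <future>
#include <vector>

// Hypothetical stand-in for the integrand: sums the squares in [first, last).
double sumSquares(const double* first, const double* last) {
    double s = 0.0;
    for (const double* p = first; p != last; ++p) {
        s += *p * *p;
    }
    return s;
}

// Splits v into `parts` contiguous ranges and sums each on its own thread.
// No element is copied; every task only reads its slice of v.
double parallelSumSquares(const std::vector<double>& v, std::size_t parts) {
    std::vector<std::future<double>> futures;
    const std::size_t n = v.size();
    for (std::size_t k = 0; k < parts; ++k) {
        const double* first = v.data() + k * n / parts;
        const double* last = v.data() + (k + 1) * n / parts;
        futures.push_back(std::async(std::launch::async, sumSquares, first, last));
    }
    double total = 0.0;
    for (auto& f : futures) {
        total += f.get();
    }
    return total;
}
```

Because the ranges are computed as k * n / parts, they cover every element exactly once, which also avoids the off-by-one gaps that the +1 offsets in the subvector construction can introduce.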
I had a completely functioning codebase written in C++11 that used Grand Central Dispatch parallel processing, specifically dispatch_apply to do the basic parallel for loop for some trivial game calculations.
Since upgrading to Sierra, this code still runs, but the blocks are run serially: the cout statement shows that they are executed in serial order, and the CPU usage graph shows no parallel work going on.
Queue is defined as:
workQueue = dispatch_queue_create("workQueue", DISPATCH_QUEUE_CONCURRENT);
And the relevant program code is:
case Concurrency::Parallel: {
dispatch_apply(stateMap.size(), workQueue, ^(size_t stateIndex) {
string thisCode = stateCodes[stateIndex];
long thisCount = stateCounts[stateIndex];
GameResult sliceResult = playStateOfCode(thisCode, thisCount);
results[stateIndex] = sliceResult;
if ((stateIndex + 1) % updatePeriod == 0) {
cout << stateIndex << endl;
}
});
break;
}
I strongly suspect that this is a bug, but if this is GCD forcing me to use new C++ methods for this, I'm all ears.
I'm not sure if it is a bug in Sierra or not. But it seems to work if you explicitly associate a global concurrent queue as target:
dispatch_queue_t target =
dispatch_get_global_queue(QOS_CLASS_USER_INITIATED, 0);
dispatch_queue_t workQueue =
dispatch_queue_create_with_target("workQueue", DISPATCH_QUEUE_CONCURRENT, target);
// note the _with_target variant and the extra target argument
Here is a working example
#include <iostream>
#include <fstream>
#include <vector>
#include <cmath>
#include <sstream>
#include <dispatch/dispatch.h>
void load_problem(const std::string, std::vector<std::pair<double,double>>&);
int main() {
// n-factor polynomial - test against a given problem provided as a set of space delimited x y values in 2d.txt
std::vector<std::pair<double,double>> problem;
std::vector<double> test = {14.1333177226503,-0.0368874860476915,
0.0909424058436257,2.19080982673558,1.24632025036125,0.0444549880462031,
1.06824631867947,0.551482840616757, 1.04948148731933};
load_problem("weird.txt",problem); //a list of space delimited doubles representing x, y.
size_t a_count = test.size();
dispatch_queue_t queue = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
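std::vector<double> squares(problem.size()); //one slot per iteration, so no two blocks write to the same location
double* squaresPtr = squares.data(); //blocks capture C++ objects as const copies, so write through a raw pointer
dispatch_apply(problem.size(), queue, ^(size_t i) {
double g = 0;
for (size_t j=0; j < a_count - 1; j++) {
g += test[j]*pow(problem[i].first,a_count - j - 1);
}
g += test[a_count - 1];
squaresPtr[i] = pow(g - problem[i].second,2);
});
double diffs = 0.0; //sum of all values, accumulated serially to avoid a data race on a shared variable
for (double s : squares) diffs += s;
double delta = 1/(1+sqrt(diffs));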
std::cout << "test: fit delta: " << delta << std::endl;
}
void load_problem(const std::string file, std::vector<std::pair<double,double>>& repo) {
repo.clear();
std::ifstream ifs(file);
if (ifs.is_open()) {
std::string line;
while(getline(ifs, line)) {
double x= std::nan("");
double y= std::nan("");
std::istringstream istr(line);
istr >> std::skipws >> x >> y;
if (!isnan(x) && !isnan(y)) {
repo.push_back({x, y});
};
}
ifs.close();
}
}
Over the last week I have been programming some 2-dimensional convolutions with FFTW, by transforming both signals to the frequency domain, multiplying them, and then transforming back.
Surprisingly, I am getting the correct result only when the input size is less than a fixed number!
I am posting some working code, in which I take simple constant matrices in the spatial domain: value 2 for the input and 1 for the filter. The result of convolving them should therefore be a matrix filled with the average of the first matrix's values, i.e., 2, since it is constant. That is the output I get when I vary the width and height from 0 up to h=215, w=215. If I set h=216, w=216, or greater, the output gets corrupted! I would really appreciate some clues about where I could be making a mistake. Thank you very much!
#include <iostream>
#include <fftw3.h>
using std::endl; //the code below writes endl unqualified
int main(int argc, char* argv[]) {
int h=215, w=215;
//Input and 1 filter are declared and initialized here
float *in = (float*) fftwf_malloc(sizeof(float)*w*h);
float *identity = (float*) fftwf_malloc(sizeof(float)*w*h);
for(int i=0;i<w*h;i++){
in[i]=5;
identity[i]=1;
}
//Declare two forward plans and one backward
fftwf_plan plan1, plan2, plan3;
//Allocate for complex output of both transforms
fftwf_complex *inTrans = (fftwf_complex*) fftwf_malloc(sizeof(fftwf_complex)*h*(w/2+1));
fftwf_complex *identityTrans = (fftwf_complex*) fftwf_malloc(sizeof(fftwf_complex)*h*(w/2+1));
//Initialize forward plans
plan1 = fftwf_plan_dft_r2c_2d(h, w, in, inTrans, FFTW_ESTIMATE);
plan2 = fftwf_plan_dft_r2c_2d(h, w, identity, identityTrans, FFTW_ESTIMATE);
//Execute them
fftwf_execute(plan1);
fftwf_execute(plan2);
//Multiply in frequency domain. Theoretically, no need to multiply imaginary parts; since signals are real and symmetric
//their transforms are also real, identityTrans[i][1] = 0, but I leave this here for a more generic implementation.
for(int i=0; i<(w/2+1)*h; i++){
//Compute both parts from the original values before overwriting inTrans[i]
float re = inTrans[i][0]*identityTrans[i][0] - inTrans[i][1]*identityTrans[i][1];
float im = inTrans[i][0]*identityTrans[i][1] + inTrans[i][1]*identityTrans[i][0];
inTrans[i][0] = re;
inTrans[i][1] = im;
}
//Execute inverse transform, store result in identity, where identity filter lied.
plan3 = fftwf_plan_dft_c2r_2d(h, w, inTrans, identity, FFTW_ESTIMATE);
fftwf_execute(plan3);
//Output first results of convolution(in, identity) to see if they are the average of in.
for(int i=0;i<h/h+4;i++){
for(int j=0;j<w/w+4;j++){
std::cout<<"After convolution, component (" << i <<","<< j << ") is " << identity[j+i*w]/(w*h*w*h) << endl;
}
}std::cout<<endl;
//Compute average of data
float sum=0.0;
for(int i=0; i<w*h;i++)
sum+=in[i];
std::cout<<"Mean of input was " << (float)sum/(w*h) << endl;
std::cout<< endl;
fftwf_destroy_plan(plan1);
fftwf_destroy_plan(plan2);
fftwf_destroy_plan(plan3);
return 0;
}
Your problem has nothing to do with FFTW! It comes from this line:
std::cout<<"After convolution, component (" << i <<","<< j << ") is " << identity[j+i*w]/(w*h*w*h) << endl;
If w=216 and h=216, then w*h*w*h = 2,176,782,336. The upper limit for a signed 32-bit integer is 2,147,483,647, so the product overflows.
The solution is to cast the denominator to float:
std::cout<<"After convolution, component (" << i <<","<< j << ") is " << identity[j+i*w]/(((float)w)*h*w*h) << endl;
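The threshold between 215 and 216 can be checked directly. The sketch below assumes a 32-bit int, as on the platforms discussed here, and computes the product in 64-bit arithmetic so that it cannot overflow; normalization and fitsInInt are hypothetical helper names used only for this illustration.

```cpp
#include <climits>
#include <cstdint>

// Computes w*h*w*h in 64-bit arithmetic. Casting the first operand is enough,
// because each following multiplication is then carried out in 64 bits.
std::int64_t normalization(int w, int h) {
    return static_cast<std::int64_t>(w) * h * w * h;
}

// True if the value can be stored in a plain int without overflow.
bool fitsInInt(std::int64_t v) {
    return v >= INT_MIN && v <= INT_MAX;
}
```

For w = h = 215 the product is 2,136,750,625, which still fits; for w = h = 216 it is 2,176,782,336, which does not, matching the sizes at which the output becomes corrupted.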
The next problem you are going to face is this one:
float sum=0.0;
for(int i=0; i<w*h;i++)
sum+=in[i];
Remember that a float has only about 7 significant decimal digits. If w=h=4000, the computed average will be lower than the real one, because once the running sum is large enough, adding a small value to it gets rounded. Use a double, or write two loops and sum in the inner loop (localsum) before adding to the outer sum (sum+=localsum)!
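A minimal sketch of the effect, assuming IEEE 754 single precision and no fast-math style compiler options; the function names are hypothetical. Both functions average n copies of the same value and differ only in the type of the accumulator.

```cpp
#include <cstddef>

// Mean of n copies of value, accumulated in a single float.
// Once the running sum passes 2^24, individual additions start to be rounded.
float meanWithFloatSum(std::size_t n, float value) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        sum += value;
    }
    return sum / static_cast<float>(n);
}

// Same mean, accumulated in a double, which has enough precision here.
double meanWithDoubleSum(std::size_t n, float value) {
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        sum += value;
    }
    return sum / static_cast<double>(n);
}
```

With n = 4000 * 4000 and value = 5, the double version returns exactly 5, while the float version returns something noticeably smaller.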
Bye,
Francis
Switching from VC10 to VC11, I observe a performance drop of a factor of 10 when reading a file with double numbers:
#include <iostream>
int main() {
double sum = 0, x;
for(int i=0; i<1000000; i++){
std::cin >> x;
sum += x;
}
std::cerr << sum << std::endl;
return 0;
}
I built the executable in Developer Studio in release mode, letting the environment choose its default optimization options.
Can anybody confirm this?
What could be the problem? Might it be related to locale?
thanks in advance,
andreas
*for some reason my previous answer was deleted (I do admit that the first sentence was a bit confusing, due to a clumsy edit when it got better results)
Actually, for me the performance is about the same.
VC11 writing/reading 1M doubles -> 6.600/3.562 seconds
VC10 writing/reading 1M doubles -> 6.266/3.606 seconds
So in my experiment, reading doubles from a file in VC11 has approximately the same performance as in VC10.
Codesample:
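#include <ctime>
#include <fstream>
#include <iostream>
#include <tchar.h> //_tmain and _TCHAR (Visual C++ specific)
int _tmain(int argc, _TCHAR* argv[])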
{
auto x = 0.0;
auto numberofdoubles = 1000000;
auto filename = "C:\\double.txt";
{
std::ofstream filestr(filename);
auto starttime = clock();
for(int i=0; i<numberofdoubles; i++)
filestr << (double)i << " ";
auto endtime = clock();
auto elapsed = (double)(endtime - starttime)/CLOCKS_PER_SEC;
std::cout << "writing: " << elapsed << std::endl;
}
{
std::ifstream filestr (filename);
auto starttime = clock();
for(int i=0; i<numberofdoubles; i++)
filestr >> x;
auto endtime = clock();
auto elapsed = (double)(endtime - starttime)/CLOCKS_PER_SEC;
std::cout << "reading: " << elapsed << std::endl;
}
return 0;
}