I can't measure CPU processing time in Cygwin. Why is that? Do I need a special command? Counting CPU clock ticks is done with the clock() function after including time.h.
Still, after getting it to work in Visual Studio, I just can't get the same behaviour on Cygwin. Why is that?
Here is the code.
#include <iostream>
#include <cstdlib>   // for system()
#include <time.h>
using namespace std;

int main()
{
    clock_t t1, t2;
    int x = 0;
    int num;

    cout << "0 to get out of program, else, number of iterations" << endl;
    cin >> num;
    if (num == 0)
        return 0;   // exit the program

    t1 = clock();
    while (x != num)
    {
        cout << "Number " << x << " is" << endl;
        if (x % 2 == 0)
            cout << "Even" << endl;
        else
            cout << "Odd" << endl;
        x = x + 1;
    }
    t2 = clock();

    float diff((float)t2 - (float)t1);
    cout << diff << endl;
    float seconds = diff / CLOCKS_PER_SEC;
    cout << seconds << endl;

    system("pause");
    return 0;
}
Sorry for the bad English.
Looks like the clock() function is defined differently for Windows and POSIX (and hence Cygwin). MSDN says that the Windows clock() returns "the elapsed wall-clock time since the start of the process", whereas the POSIX version returns "the implementation's best approximation to the processor time used by the process". In your example, the process will be spending almost its entire time waiting for output to the terminal to complete, which doesn't count towards the processing time.
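If the goal is to measure elapsed wall-clock time the same way on both Windows and Cygwin, one option (assuming a C++11 compiler) is to use std::chrono instead of clock(). A minimal sketch:
#include <chrono>
#include <iostream>

int main()
{
    auto start = std::chrono::steady_clock::now();

    // ... the loop being timed goes here ...

    auto end = std::chrono::steady_clock::now();

    // Elapsed wall-clock time in seconds, independent of how clock() is defined.
    std::chrono::duration<double> elapsed = end - start;
    std::cout << elapsed.count() << " s" << std::endl;
    return 0;
}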
I wrote a small program that extracts the edges of a digital image (the well-known Canny detector). I need to measure the exact time (in milliseconds) of the algorithm's execution on the device (GPU), including the data-transfer stages. I attach the working program code in C++:
#include <iostream>
#include <sys/time.h>
#include <opencv2/core.hpp>
#include <opencv2/highgui.hpp>
#include <opencv2/opencv.hpp>
#include <opencv2/cudaimgproc.hpp>
#include <cuda_runtime.h>
#include <opencv2/core/cuda.hpp>

using namespace cv;
using namespace std;

__device__ __host__
void FirstRun(void)
{
    cudaSetDevice(0);
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
}

int main(int argc, char** argv)
{
    clock_t time;

    if (argc != 2)
    {
        cout << "Wrong number of arguments!" << endl;
        return -1;
    }

    const char* filename = argv[1];
    Mat img = imread(filename, IMREAD_GRAYSCALE);
    if (!img.data)
    {
        cout << " --(!) Error reading images \n" << endl;
        return -2;
    }

    double low_tresh = 100.0;
    double high_tresh = 150.0;
    int apperture_size = 3;
    bool useL2gradient = false;
    int imageWidth = img.cols;
    int imageHeight = img.rows;

    cout << "Width of image: " << imageWidth << endl;
    cout << "Height of image: " << imageHeight << endl;
    cout << endl;

    FirstRun();

    // Canny algorithm
    cuda::GpuMat d_img(img);
    cuda::GpuMat d_edges;

    time = clock();
    Ptr<cuda::CannyEdgeDetector> canny = cuda::createCannyEdgeDetector(low_tresh, high_tresh, apperture_size, useL2gradient);
    canny->detect(d_img, d_edges);
    time = clock() - time;

    cout << "CannyCUDA time (ms): " << (float)time / CLOCKS_PER_SEC * 1000 << endl;
    return 0;
}
I get two different work times (image 7741 x 8862)
System configuration:
1) CPU: Intel Core i7 9600K (3.6 GHz), 32 GB RAM;
2) GPU: Nvidia Geforce RTX 2080 Ti;
3) OpenCV ver. 4.0
Which time is right, and am I measuring it correctly? Thank you!
There are different times you can measure when dealing with CUDA.
Here are some solutions you might want to try:
Measure the total time used by CUDA: Use time() to get an absolute time value before using any CUDA functions and time() again after you have the result. The difference will be the real (wall-clock) time that passed.
Measure only the time of the calculation: CUDA has some start-up overhead, but if you're not interested in that, because you will be using your code many times without exiting the CUDA environment, you can measure the calculation separately. Please read the CUDA C Programming Guide; it explains the use of events for timing (a minimal sketch follows after this list).
Use the profiler to get detailed information on what part of the program takes what part of the time: The kernel times are especially interesting, as they tell you how long your computations take. Be careful when looking at the API times. In your example, a lot of time is used by cudaEventCreate() as it is the first CUDA function in your program, so it includes the start-up overhead. Also, cuda[...]Synchronize() doesn't actually take that long to call, but it includes the time spent waiting for synchronization.
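For reference, a minimal sketch of event-based timing with the CUDA runtime API (cudaEventRecord / cudaEventElapsedTime), applied to the detect() call from the question:
cudaEvent_t start, stop;
cudaEventCreate(&start);
cudaEventCreate(&stop);

cudaEventRecord(start);
canny->detect(d_img, d_edges);           // the GPU work being measured
cudaEventRecord(stop);
cudaEventSynchronize(stop);              // wait until the stop event has completed

float ms = 0.0f;
cudaEventElapsedTime(&ms, start, stop);  // elapsed GPU time in milliseconds
cout << "Canny (events) time (ms): " << ms << endl;

cudaEventDestroy(start);
cudaEventDestroy(stop);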
The code below is meant to generate a list of five pseudo-random numbers in the interval [1,100]. I seed the default_random_engine with time(0), which returns the system time in unix time. When I compile and run this program on Windows 7 using Microsoft Visual Studio 2013, it works as expected (see below). When I do so in Arch Linux with the g++ compiler, however, it behaves strangely.
In Linux, 5 numbers will be generated each time. The last 4 numbers will be different on each execution (as will often be the case), but the first number will stay the same.
Example output from 5 executions on Windows and Linux:
      | Windows:       | Linux:
------+----------------+----------------
Run 1 | 54,01,91,73,68 | 25,38,40,42,21
Run 2 | 46,24,16,93,82 | 25,78,66,80,81
Run 3 | 86,36,33,63,05 | 25,17,93,17,40
Run 4 | 75,79,66,23,84 | 25,70,95,01,54
Run 5 | 64,36,32,44,85 | 25,09,22,38,13
Adding to the mystery, that first number periodically increments by one on Linux. After obtaining the above outputs, I waited about 30 minutes and tried again, only to find that the first number had changed and was now always 26. It has continued to increment by 1 periodically and is now at 32. It seems to correspond with the changing value of time(0).
Why does the first number rarely change across runs, and then when it does, increment by 1?
The code. It neatly prints out the 5 numbers and the system time:
#include <iostream>
#include <random>
#include <cstdlib>   // for system()
#include <time.h>
using namespace std;

int main()
{
    const int upper_bound = 100;
    const int lower_bound = 1;

    time_t system_time = time(0);
    default_random_engine e(system_time);
    uniform_int_distribution<int> u(lower_bound, upper_bound);

    cout << '#' << '\t' << "system time" << endl
         << "-------------------" << endl;

    for (int counter = 1; counter <= 5; counter++)
    {
        int secret = u(e);
        cout << secret << '\t' << system_time << endl;
    }

    system("pause");
    return 0;
}
Here's what's going on:
default_random_engine in libstdc++ (GCC's standard library) is minstd_rand0, which is a simple linear congruential engine:
typedef linear_congruential_engine<uint_fast32_t, 16807, 0, 2147483647> minstd_rand0;
The way this engine generates random numbers is x_{i+1} = (16807 * x_i + 0) mod 2147483647.
Therefore, if the seeds are different by 1, then most of the time the first generated number will differ by 16807.
The range of this generator is [1, 2147483646]. The way libstdc++'s uniform_int_distribution maps it to an integer in the range [1, 100] is essentially this: generate a number n. If the number is not greater than 2147483600, then return (n - 1) / 21474836 + 1; otherwise, try again with a new number. It should be easy to see that in the vast majority of cases, two ns that differ by only 16807 will yield the same number in [1, 100] under this procedure. In fact, one would expect the generated number to increase by one about every 21474836 / 16807 = 1278 seconds or 21.3 minutes, which agrees pretty well with your observations.
MSVC's default_random_engine is mt19937, which doesn't have this problem.
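To see this effect concretely, here is a small sketch (the seed values 1000 and 1001 are arbitrary): two minstd_rand0 engines seeded with consecutive values produce first raw outputs that differ by exactly 16807, and with libstdc++'s mapping they usually land on the same number in [1, 100]:
#include <iostream>
#include <random>

int main()
{
    // Raw first outputs differ by exactly 16807 when the seeds differ by 1.
    std::minstd_rand0 a(1000), b(1001);
    std::cout << a() << " vs " << b() << std::endl;

    // The values mapped into [1, 100] are usually identical.
    std::minstd_rand0 c(1000), d(1001);
    std::uniform_int_distribution<int> u(1, 100);
    std::cout << u(c) << " vs " << u(d) << std::endl;
    return 0;
}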
The std::default_random_engine is implementation defined. Use std::mt19937 or std::mt19937_64 instead.
In addition, std::time and the <ctime> functions are not very precise; use the types defined in the <chrono> header instead:
#include <iostream>
#include <random>
#include <chrono>
#include <cstdlib>   // for system()

int main()
{
    const int upper_bound = 100;
    const int lower_bound = 1;

    auto t = std::chrono::high_resolution_clock::now().time_since_epoch().count();
    std::mt19937 e;
    e.seed(static_cast<unsigned int>(t)); // Seed engine with timed value.
    std::uniform_int_distribution<int> u(lower_bound, upper_bound);

    std::cout << '#' << '\t' << "system time" << std::endl
              << "-------------------" << std::endl;

    for (int counter = 1; counter <= 5; counter++)
    {
        int secret = u(e);
        std::cout << secret << '\t' << t << std::endl;
    }

    system("pause");
    return 0;
}
In Linux, the random function is not random in the probabilistic sense; it is a pseudo-random number generator.
It is seeded with a value, and based on that seed the numbers it produces are pseudo-random and uniformly distributed.
The Linux approach has the advantage that, when designing certain experiments that use information from populations, the experiment can be repeated with known tweaks to the input and the effect measured. When the final program is ready for real-life testing, the seed can be created by asking the user to move the mouse, mixing the mouse movement with some keystrokes, and adding in a dash of the microsecond count since the last power-on.
The Windows random-number seed is obtained from a collection of mouse, keyboard, network, and time-of-day values, so it is not repeatable. But this seed may be reset to a known value if, as mentioned above, one is designing an experiment.
Oh yes, Linux has two random number generators: the default works modulo 32 bits, and the other modulo 64 bits. Your choice depends on the accuracy you need and the amount of compute time you are willing to spend for your testing or actual use.
My name is Adam, and I have just begun to learn C++. I love it, but I am only on page 181 of the seventh edition of Sams Teach Yourself C++ in One Hour a Day, and page 102 of the seventh edition of C++ For Dummies. I have seven multi-page notes on the Sams book and twenty-one multi-page notes on the For Dummies book. Please help me understand why I get 5 errors with my simple program, which is shown below. I do not want to use the -fpermissive option; I need to learn how to code correctly, as I am not very experienced. Thank you everyone, very much. I absolutely love C++, and I even have an idea for a simple program I plan to learn how to write, which could reduce program development or writing time by 5-20 times on average. The following program is not that program, but please help me so I may one day write and use my program idea for a college paper. Thank you again; the problem program follows:
#include <iostream>
using namespace std;

int main()
{
    cout << "how many integers do you wish to enter? ";
    int InputNums = 0;
    cin >> InputNums;

    int* pNumbers = new int[InputNums];
    int* pCopy = pNumbers;

    cout << "successfully allocated memory for "
         << InputNums << " integers" << endl;

    for (int Index = 0; Index < InputNums; ++Index)
    {
        cout << "enter number " << Index << ": ";
        cin >> *(pNumbers + Index);
    }

    cout << "displaying all numbers input: " << endl;
    for (int Index = 0, int* pCopy = pNumbers;   // <-- the line the compiler rejects
         Index < InputNums; ++Index)
        cout << *(pCopy++) << " ";
    cout << endl;

    delete[] pNumbers;

    cout << "press enter to continue..." << endl;
    cin.ignore(10, '\n');
    cin.get();
    return 0;
}
The errors point to the multiple initializations in the second for loop. Please tell me why my program will not compile. Thank you all. Sincerely, Adam.
My first advice would be to find a better book.
Once you've done that, forget everything you think you know about using new to allocate an array (e.g., int* pNumbers = new int [InputNums];). It's an obsolete construct that you shouldn't use (ever).
If I had to write a program doing what you've outlined above, the core of it would look something like this:
cout<< "how many integers do you wish to enter? ";
int InputNums;
cin>> InputNums;
std::vector<int> numbers;
int temp;
for (int i=0; i<InputNums; i++) {
cin >> temp;
numbers.push_back(temp);
}
cout<< "displaying all numbers input:\n";
for (auto i : numbers)
cout << i << " ";
To answer your question directly: you cannot initialize variables of different types in the same for-loop declaration.
In your example:
for(int Index = 0, int* pCopy = pNumbers;
int and int* are different types. Even if you use auto to let the compiler deduce the types automatically, the two variables cannot have different deduced types.
The solution:
int Index = 0;
for(int *pCopy=pNumbers; ...
This has one side effect: Index is no longer confined to the scope of the for loop. Should this be a problem, you may do:
{
int Index = 0;
for(int *pCopy=pNumbers; ...
...
}
And now the scope of Index is limited to the surrounding curly braces.
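Applied to the loop from the question, the fix looks like this (a sketch only, using the names from the original code):
int Index = 0;
for (int* pCopy = pNumbers; Index < InputNums; ++Index)
    cout << *(pCopy++) << " ";
cout << endl;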
I want to generate pseudo-random numbers in C++, and the two likely options are the <random> facility of C++11 and the Boost counterpart. They are used in essentially the same way, but the native one is roughly 4 times slower in my tests.
Is that due to design choices in the library, or am I missing some way of disabling debug code somewhere?
Update: Code is here, https://github.com/vbeffara/Simulations/blob/master/tests/test_prng.cpp and looks like this:
cerr << "boost::bernoulli_distribution ... \ttime = ";
s=0; t=time();
boost::bernoulli_distribution<> dist(.5);
boost::mt19937 boostengine;
for (int i=0; i<n; ++i) s += dist(boostengine);
cerr << time()-t << ", \tsum = " << s << endl;
cerr << "C++11 style ... \ttime = ";
s=0; t=time();
std::bernoulli_distribution dist2(.5);
std::mt19937_64 engine;
for (int i=0; i<n; ++i) s += dist2(engine);
cerr << time()-t << ", \tsum = " << s << endl;
(Using std::mt19937 instead of std::mt19937_64 makes it even slower on my system.)
That’s pretty scary.
Let’s have a look:
boost::bernoulli_distribution<>
if(_p == RealType(0))
return false;
else
return RealType(eng()-(eng.min)()) <= _p * RealType((eng.max)()-(eng.min)());
std::bernoulli_distribution
__detail::_Adaptor<_UniformRandomNumberGenerator, double> __aurng(__urng);
if ((__aurng() - __aurng.min()) < __p.p() * (__aurng.max() - __aurng.min()))
return true;
return false;
Both versions invoke the engine and check if the output lies in a portion of the range of values proportional to the given probability.
The big difference is, that the gcc version calls the functions of a helper class _Adaptor.
This class’ min and max functions return 0 and 1 respectively and operator() then calls std::generate_canonical with the given URNG to obtain a value between 0 and 1.
std::generate_canonical is a 20-line function with a loop – which will never iterate more than once in this case, but it adds complexity.
Apart from that, boost uses the param_type only in the constructor of the distribution, but then saves _p as a double member, whereas gcc has a param_type member and has to “get” the value of it.
All of this adds up, and the compiler fails to optimize it away.
Clang chokes even more on it.
If you hammer hard enough you can even get std::mt19937 and boost::mt19937 on par for gcc.
It would be nice to test libc++ too; maybe I'll add that later.
tested versions: boost 1.55.0, libstdc++ headers of gcc 4.8.2
line numbers on request^^
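One way to illustrate the point: a hand-rolled Bernoulli draw that compares the raw engine output directly, mirroring the Boost snippet quoted above and bypassing generate_canonical. fast_bernoulli here is a hypothetical helper, not part of either library:
#include <random>

template <class Engine>
bool fast_bernoulli(Engine& eng, double p)
{
    // Mirror boost::bernoulli_distribution: scale p to the engine's range
    // and compare against the raw output, without generate_canonical.
    if (p == 0.0)
        return false;
    return double(eng() - (eng.min)()) <= p * double((eng.max)() - (eng.min)());
}

// Usage sketch:
//   std::mt19937_64 engine;
//   long s = 0;
//   for (int i = 0; i < n; ++i) s += fast_bernoulli(engine, 0.5);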
While playing with VS11 beta I noticed something weird:
this code prints
f() took 0 milliseconds
#include <iostream>
#include <vector>
#include <chrono>
#include <cstdint>
#include <cstdlib>   // for rand()

int main()
{
    std::vector<int> v;
    size_t length = 64 * 1024 * 1024;
    for (size_t i = 0; i < length; i++)
    {
        v.push_back(rand());
    }

    uint64_t sum = 0;
    auto t1 = std::chrono::system_clock::now();
    for (size_t i = 0; i < v.size(); ++i)
        sum += v[i];
    //std::cout << sum << std::endl;
    auto t2 = std::chrono::system_clock::now();

    std::cout << "f() took "
              << std::chrono::duration_cast<std::chrono::milliseconds>(t2 - t1).count()
              << " milliseconds\n";
}
But when I uncomment the line that prints the sum, it prints out a reasonable number.
This is the behaviour I get with optimizations enabled; with them disabled I get the "normal" output:
f() took 471 milliseconds
So is this standard compliant behaviour?
Important: it is not simply that the dead code gets optimized away; I can see the lag when running from the console, and I can see a CPU spike in Task Manager.
My guess is that this is dead-code optimization: your load spike appears because the work initializing the vector isn't optimized away, while the computation of your unused sum variable is.
But when I uncomment the line that prints the sum, it prints out a reasonable number.
That goes along with my theory, yes - when you're forced to use the result of the computation, the computation itself can't be optimized away.
If you want to confirm that further, make your program say when it's ready and pause for you to press return - that will allow you to wait for any CPU spike to be obviously "gone" before you press return, which will give you more confidence about what's causing it.
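A minimal sketch of that suggestion, reusing the timing code from the question (the prompt text is just illustrative):
std::cout << "vector filled, press return to start timing\n";
std::cin.get();   // any CPU spike seen so far belongs to the setup phase

auto t1 = std::chrono::system_clock::now();
for (size_t i = 0; i < v.size(); ++i)
    sum += v[i];
auto t2 = std::chrono::system_clock::now();

std::cout << sum << '\n';   // using the sum keeps the loop from being optimized away
std::cout << "f() took "
          << std::chrono::duration_cast<std::chrono::milliseconds>(t2 - t1).count()
          << " milliseconds\n";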