I'm trying out SFML and am creating the classic game Snake.
I've successfully made the snake move a certain number of pixels after a certain amount of time. The problem is that the game loop takes a different amount of time to execute each iteration. So when I print the elapsed time for every move it looks like this:
Time: 0.273553
Time: 0.259873
Time: 0.260135
Time: 0.396735
Time: 0.258397
Time: 0.262811
Time: 0.259681
Time: 0.257136
Time: 0.266248
Time: 0.412772
Time: 0.260008
The bumps of 0.39 and 0.41 are not good. They make the snake stutter sometimes, which does not look good at all.
The time should always be 0.25 so that the snake moves smoothly across the screen.
Here is my code, which runs inside the game loop (the snake.getSpeed() function returns 4):
if(_clock.GetElapsedTime() > (1.0f / snake.getSpeed())){ // 1.0f avoids integer division if getSpeed() returns an int
    std::cout << "Time: " << _clock.GetElapsedTime() << std::endl;
    snake.changeDir(keyDown); // Change direction
    snake.move();             // Move the snake
    _clock.Reset();
}
Is the processor just too slow, or does anyone have another idea on how to make the code better?
EDIT: IGNORE THE ABOVE. The real time bump seems to be the GetEvent function. I don't know why, but it takes anywhere from 0 to 0.2 seconds to execute. Here is my test code:
//This is just a bit of code, so there are no closing brackets ;)
_clock.Reset();
while(_mainWindow.GetEvent(currentEvent)) {
    std::cout << _clock.GetElapsedTime() << std::endl; //cout the elapsed time for GetEvent
(The _mainWindow is an sf::RenderWindow)
I don't know if this can be fixed, but I'm leaving the question unanswered, and if anyone has an idea, that would be great. Thanks!
First, I advise you to use SFML 2, because SFML 1.6 hasn't been maintained for over 2.5 years, has quite a few known and ugly bugs, and lacks many nice features from SFML 2.
Next, it's usually better not to try to force a certain framerate, since there are factors you can't really do anything about (OS interrupts, floods of events when moving the mouse, etc.); instead, make the movement depend on the frame time.
The simplest way would be to use the Euler method:
pos.x = pos.x + velocity.x*dt
Where pos is a vector of the position of an object, velocity is a vector for the two-dimensional velocity and dt is the delta time, i.e. the time between two frames.
Unfortunately the simple Euler method isn't very precise, and Verlet integration might give a smoother movement.
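As a sketch of the idea (Python for illustration; verlet_step is a hypothetical helper name), position Verlet extrapolates the next position from the two previous ones instead of accumulating velocity directly:

```python
# Minimal sketch of position (Stoermer-)Verlet integration, assuming a
# constant acceleration a: x_{n+1} = 2*x_n - x_{n-1} + a*dt^2, which tends
# to be more stable than plain Euler.
def verlet_step(x, x_prev, a, dt):
    return 2 * x - x_prev + a * dt * dt

# Falling object: a = -9.81, dt = 0.1, starting at rest at x = 0.
dt, a = 0.1, -9.81
x_prev = 0.0
x = 0.5 * a * dt * dt   # bootstrap the first step from the Taylor expansion
for _ in range(8):
    x, x_prev = verlet_step(x, x_prev, a, dt), x
# For constant acceleration, this reproduces x(t) = 0.5*a*t^2 exactly
print(abs(x - 0.5 * a * 0.9 ** 2) < 1e-9)
```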
But that's again not all: even though the movement is now more tightly bound to the frame-rate, spikes will still occur and lead to unwanted effects. Thus it's better to fix your time steps, so that the rendering with its FPS (frames per second) count is independent of the physics calculation. There are again many approaches to this, and one article that I've found useful is that one.
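A minimal sketch of such a fixed time step (Python for illustration; the question is C++/SFML, and the simulation update is represented here by a bare counter):

```python
import time

TIMESTEP = 0.25  # fixed simulation step in seconds

def run(total_time=1.0):
    """Fixed-timestep loop: render as fast as you like, but consume the
    accumulated real time in fixed-size simulation steps."""
    updates = 0
    accumulator = 0.0
    previous = time.monotonic()
    deadline = previous + total_time
    while time.monotonic() < deadline:
        now = time.monotonic()
        accumulator += now - previous
        previous = now
        # A slow frame triggers several fixed updates instead of one
        # oversized one, so the simulation speed stays constant.
        while accumulator >= TIMESTEP:
            updates += 1          # the real update()/move() would go here
            accumulator -= TIMESTEP
        time.sleep(0.01)          # stands in for rendering work
    return updates

print(run(1.0))  # roughly total_time / TIMESTEP updates
```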
Related
Being new to GLSL shaders, I noticed on my old netbook that adding a single line to a perfectly running shader could suddenly multiply the execution time by thousands.
For example, this fragment shader runs instantly while limit's value is 32 or below, and takes 10 seconds to run once limit's value is 33:
void main()
{
    float limit = 33.; // runs instantly if = 32.
    float useless = 0.5;
    for (float i = 0.; i < limit; i++) useless = useless*useless;
    gl_FragColor = useless*vec4(1., 1., 1., 1.);
}
What confuses me as well is that adding one or more useless self-multiplications outside of the 32-iteration loop does not cause that sharp time increase.
Here is an example without a for loop. It runs within a millisecond on my computer with 6 sin computations, and adding the seventh one suddenly makes the program take about 500 ms to run:
void main()
{
    float useless = gl_FragCoord.x;
    useless = sin(useless);
    useless = sin(useless);
    useless = sin(useless);
    useless = sin(useless);
    useless = sin(useless);
    useless = sin(useless);
    useless = sin(useless); // the straw that breaks the shader's back
    gl_FragColor = useless*vec4(1., 1., 1., 1.);
}
On a less outdated computer I own, the compilation time becomes too big before I can find such a breaking point.
On my netbook, I'd expect the running times to increase continuously as I add operations.
I'd like to know what causes those sudden leaps and, consequently, whether it's a problem I should address, since I plan to target a reasonably wide Steam audience. If useful, here is the netbook I'm doing my tests on: http://support.hp.com/ch-fr/document/c01949780 and its chipset: http://ark.intel.com/products/36549/Intel-82945GSE-Graphics-and-Memory-Controller
Also, I don't know if it matters, but I'm using SFML to run the shaders.
According to Intel, the GMA 950 supports shader model 2 in hardware and shader model 3 in software. According to Microsoft, shader model 2 has a rather harsh limit on instruction count (64 ALU and 32 texture instructions).
My guess would be that, when the shader exceeds this instruction count, the Intel driver decides to do the shading in software, which would match the abysmal performance you're seeing.
The sin function might expand to multiple instructions. The loop likely gets unrolled, resulting in a higher instruction count when the limit is higher. Why adding the 33rd multiplication outside the loop does not trigger this, I don't know.
To decide whether you should fix this, I can recommend the Unity hardware stats and the Steam hardware survey. In short, I'd say that shader model 2 is nothing you need to support :)
function w = oja(X, varargin)
    % get the dimensionality
    [m, n] = size(X);
    % random initial weights
    w = randn(m, 1);
    options = struct( ...
        'rate', .00005, ...
        'niter', 5000, ...
        'delta', .0001);
    options = getopt(options, varargin);
    % run through all input samples
    for iter = 1:options.niter
        y = w'*X;
        for ii = 1:n
            % y(ii) is a scalar, not a vector
            w = w + options.rate*(y(ii)*X(:,ii) - y(ii)^2*w);
        end
    end
    if (any(~isfinite(w)))
        warning('Lost convergence; lower learning rate?');
    end
end
size(X) = [400 153600]
This code implements Oja's rule and runs slowly. I am not able to vectorize it any more. To make it run faster I wanted to do the computations on the GPU, so I changed
X = gpuArray(X)
but the code instead ran slower. The computations used seem to be GPU-compatible. Please let me know my mistake.
Profile Code Output:
Complete details:
https://drive.google.com/file/d/0B16PrXUjs69zRjFhSHhOSTI5RzQ/view?usp=sharing
This is not a full answer on how to solve it, but rather an explanation of why the GPU does not speed up your code, but actually slows it down enormously.
GPUs are fantastic for speeding up code that is parallel, meaning that they can do A LOT of things at the same time (i.e. my GPU can do 30070 things at once, while a modern CPU can't go over 16). However, individual GPU cores are very slow! Nowadays a decent CPU runs at around 2-3 GHz, while a modern GPU runs at around 700 MHz. This means that a CPU core is much faster than a GPU core, but since GPUs can do lots of things at the same time, they can win overall.
Once I saw it explained as: what do you prefer, a million-dollar sports car or a scooter? A million-dollar car or a thousand scooters? And what if your job is to deliver pizza? Hopefully you answered a thousand scooters for that last one (unless you are a scooter fan and answered scooters for all of them, but that's not the point). (source and good introduction to GPUs)
Back to your code: your code is incredibly sequential. Every inner iteration depends on the previous one, and the same goes for the outer iterations. You cannot run two of these in parallel, as you need the result of one iteration to run the next one. This means that you will not get a pizza order until you have delivered the last one, so what you want is to deliver them one by one, as fast as you can (so the sports car is better!).
And actually, each of these one-line updates is incredibly fast! If I run 50 outer iterations on my computer, that line accounts for 13.034 seconds, which is 1.69 microseconds per call (7,680,000 calls).
Thus your problem is not that your code is slow; it's that you call it A LOT of times. The GPU will not accelerate this line of code, because it is already very fast, and we know that CPUs are faster than GPUs for this kind of thing.
Thus, unfortunately, GPUs are bad at sequential code, and your code is very sequential, so you cannot use a GPU to speed it up. An HPC cluster won't help either, because every loop iteration depends on the previous one (no parfor :( ).
So, as far as I can say, you will need to deal with it.
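To see why the update can't be parallelized, here is a toy sketch (Python, purely for illustration; the question uses MATLAB, and oja_step is a made-up helper): each step reads the weight vector produced by the previous sample, so the samples must be processed in order.

```python
import random

def oja_step(w, x, y, rate):
    # w <- w + rate * (y*x - y^2 * w); every term uses the *current* w,
    # which is the loop-carried dependency that blocks parallelism.
    return [wi + rate * (y * xi - y * y * wi) for wi, xi in zip(w, x)]

random.seed(0)
m, n = 3, 100
X = [[random.gauss(0, 1) for _ in range(m)] for _ in range(n)]
w = [random.gauss(0, 1) for _ in range(m)]
for x in X:  # must run sequentially: each w depends on the previous one
    y = sum(wi * xi for wi, xi in zip(w, x))
    w = oja_step(w, x, y, rate=0.01)
print(all(abs(wi) < 1e6 for wi in w))  # the weights stayed finite
```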
I've been experimenting with the GPU support of Matlab (v2014a). The notebook I'm using to test my code has an NVIDIA 840M built in.
Since I'm new to GPU computing with Matlab, I started out with a few simple examples and observed some strange scaling behavior. Instead of increasing the size of my problem, I simply put a for loop around my computation. I expected the computation time to scale with the number of iterations, since the problem size itself does not increase. This was true for smaller numbers of iterations; however, at a certain point the time does not scale as expected, and instead I observe a huge jump in computation time. Afterwards, the problem continues to scale as expected.
The code example started from a random walk simulation, but I tried to produce an example that is easy and still shows the problem.
Here's what my code does. I initialize two matrices as sin(alpha) and cos(alpha). Then I loop over the number of iterations from 2**1 to 2**15. In each case I repeat the computation sin(alpha)^2 + cos(alpha)^2 (this was just to check the result) as often as the number of iterations suggests.
function gpu_scale
close all
NP = 10000;
NT = 1000;
ALPHA = rand(NP,NT,'single')*2*pi;
SINALPHA = sin(ALPHA);
COSALPHA = cos(ALPHA);
GSINALPHA = gpuArray(SINALPHA); % move array to GPU
GCOSALPHA = gpuArray(COSALPHA);
PMAX = 15;
for P = 1:PMAX
    for i = 1:2^P
        GX = GSINALPHA.^2;
        GY = GCOSALPHA.^2;
        GZ = GX + GY;
    end
end
The following plot shows the computation time in a log-log plot, for the case where I always double the number of iterations. The jump occurs when doubling from 1024 to 2048 iterations.
The initial bump for two iterations might be due to initialization and is not really relevant anyhow.
I see no reason for the jump between 2**10 and 2**11 computations, since the computation time should only depend on the number of iterations.
My question: Can somebody explain this behavior to me? What is happening on the software/hardware side, that explains this jump?
Thanks in advance!
EDIT: As suggested by Divakar, I changed the way I time my code. I wasn't sure I was using gputimeit correctly; however, MathWorks suggests another possible way, namely
gd= gpuDevice();
tic
% the computation
wait(gd);
Time = toc;
Using this way to measure performance, the time is significantly slower; however, I don't observe the jump from the previous plot. I added the CPU performance for comparison and kept both timings for the GPU (wait / no wait), which can be seen in the following plot.
It seems that the observed jump "corrects" the timing in the direction of the case where I used wait. If I understand the problem correctly, the good performance in the no-wait case is due to the fact that we do not wait for the GPU to finish completely. However, I still don't see an explanation for the jump.
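A toy illustration of that timing effect (Python for illustration, with a background thread standing in for the asynchronous GPU kernel; submit is a made-up helper): timing without a wait()/synchronize step only measures the launch, not the work.

```python
import threading
import time

def submit(results):
    # Start 0.2 s of "work" in the background, like an async kernel launch
    t = threading.Thread(target=lambda: (time.sleep(0.2), results.append(1)))
    t.start()
    return t

results = []
start = time.monotonic()
worker = submit(results)
no_wait = time.monotonic() - start   # returns almost immediately
worker.join()                        # analogous to wait(gpuDevice())
with_wait = time.monotonic() - start
print(no_wait < 0.1 < with_wait)     # only the waited timing is real
```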
Any ideas?
I am wondering about the big performance difference between an FFT and a simple addition on a GPU using Matlab. I would expect an FFT to be slower on the GPU than a simple addition. But why is it the other way around? Any suggestions?
a = rand(2.^20,1);
a = gpuArray(a);
b = gpuArray(0);
c = gpuArray(1);

tic % should take a long time
for k = 1:1000
    fft(a);
end
toc % Elapsed time is 0.085893 seconds.

tic % should be fast, but isn't
for k = 1:1000
    b = b + c;
end
toc % Elapsed time is 1.430682 seconds.
It is also interesting to note that the computation time for the addition (second loop) decreases if I reduce the length of the vector a.
EDIT
If I change the order of the two loops, i.e. if the addition is done first, the addition takes 0.2 seconds instead of 1.4 seconds. The FFT time stays the same.
I'm guessing that Matlab isn't actually running the fft, because its output is never used. Also, in your simple addition loop, each iteration depends on the previous one, so it has to run serially.
I don't know why the order of the loops matters. Maybe it has something to do with cleaning up GPU memory after the first loop. You could try calling pause(1) between the loops to let your computer get back to an idle state before the second loop; that may make your timing more consistent.
I don't have a 2012b MATLAB with a GPU to hand to check this, but I think you are missing a wait() command. In 2012a, MATLAB introduced asynchronous GPU calculations, so when you send something to the GPU, it doesn't wait until it's finished before moving on in the code. Try this:
mygpu = gpuDevice(1);
a = rand(2.^20,1);
a = gpuArray(a);
b = gpuArray(0);
c = gpuArray(1);

tic % should take a long time
for k = 1:1000
    fft(a);
end
wait(mygpu); % wait until the GPU has finished calculating before moving on
toc

tic % should be fast
for k = 1:1000
    b = b + c;
end
wait(mygpu); % wait until the GPU has finished calculating before moving on
toc
The computation time of the addition should no longer depend on when it's carried out. Would you mind checking and getting back to me, please?
I'm trying to solve a 'decaying' puzzle that goes somewhat like this:
given that A is 100 at DateTime.new(2012,5,10,0,0,0) and decays by 0.5 every 12 seconds, has it decayed by exactly 20 by DateTime.new(2012,5,10,0,8,0)?
It so happens that the answer to that question is - well, true :)
But what about
A being 1304.5673,
the decay 0.00000197 every 1.2 msec
and the end time being not one but 2000 DateTime.new's?
I've tried with
fd = 3.minutes.ago.to_datetime
td = Time.now
material = 1304.5673
decay = 0.00000197
step = 0.00012.seconds
fd.step(td, step) { |n| material -= decay }
puts material
and the processing time is acceptable - but if I step any further back in time (like perhaps 10.hours or even 2.hours), my CPU cooler starts building up momentum, as if it were about to propel the entire Mac into orbit :(
I've toiled with this problem for quite a while - even though the short timespan from question to answer on SO indicates the opposite <:)
(and the answer, to me, explicitly demonstrates why Ruby is such a wonderful language!)
# recap the variables in the question
total_decay = ((td.to_time - fd.to_time).divmod(step))[0] * decay
puts "new material: #{material - total_decay}"
The results will probably not pass scientific scrutiny, but I'm OK with that (for now) ;)
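The closed-form shortcut above - floor(elapsed / step) * decay per step - can be sanity-checked against the step-by-step loop on the original puzzle. A minimal sketch (Python for illustration; the answer itself is Ruby):

```python
def decay_loop(material, decay, step, elapsed):
    # The brute-force approach from the question: one subtraction per step
    t = 0.0
    while t + step <= elapsed:
        material -= decay
        t += step
    return material

def decay_closed_form(material, decay, step, elapsed):
    # Constant time, no matter how far back the start date lies
    return material - (elapsed // step) * decay

# Original puzzle: 100 decaying by 0.5 every 12 s over 8 minutes (480 s)
print(decay_closed_form(100.0, 0.5, 12.0, 480.0))  # 80.0 -> decayed by exactly 20
```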