I was surprised by the performance of my Dart code in the browser vs. the Dart VM. Here is a simple example that reproduces the issue.
test('speed test', () {
  var n = 10000;
  var rand = Random(0);
  var x = List.generate(n, (i) => rand.nextDouble());
  var res = <num>[];
  var sw = Stopwatch()..start();
  for (int i = 0; i < 1000; i++) {
    for (int j = 0; j < n; j++) {
      x[j] += i;
    }
    res.add(x.reduce((a, b) => a + b));
  }
  sw.stop();
  print('Milliseconds: ${sw.elapsedMilliseconds}');
});
If I run this code with dart, I get somewhere around 140 milliseconds. If I run the same code as a browser test with pub run test -p "chrome" ..., I get times around 8000 milliseconds.
I am willing to wait for a 0.1 s calculation, but waiting 8 s for something in the browser is basically unusable. In release mode the browser performance improves, but it is still 10x slower.
Am I missing something? Do I have to avoid any calculations in the browser?
Thanks,
Tony
It's interesting how slow this is.
The corresponding JavaScript code:
(function() {
  "use strict";
  var n = 10000;
  var x = [];
  var res = [];
  for (var i = 0; i < n; i++) x.push(Math.random());
  var t0 = Date.now();
  for (var i = 0; i < 1000; i++) {
    for (var j = 0; j < n; j++) {
      x[j] += i;
    }
    res.push(x.reduce((a, b) => a + b));
  }
  var t1 = Date.now();
  console.log("Milliseconds: " + (t1 - t0));
}());
runs in as little as ~20 milliseconds.
So, it looks like Dart is somehow triggering "slow mode" for its generated JavaScript.
If you look at the generated code, it contains:
for (i = 0; i < 1000; ++i) {
  for (j = 0; j < 10000; ++j) {
    if (j >= x.length)
      return H.ioore(x, j);
    t1 = x[j];
    if (typeof t1 !== "number")
      return t1.$add();
    C.JSArray_methods.$indexSet(x, j, t1 + i);
  }
  C.JSArray_methods.add$1(res, C.JSArray_methods.reduce$1(x, new A.main_closure0()));
}
You can try to tweak this code, but the big cost comes from C.JSArray_methods.$indexSet(x, j, t1 + i);. If you change that to x[j] = t1 + i;, the time drops to a few hundred milliseconds. So that call is the main problem with the current code.
(You can improve performance a little, by about 20%, by making x a List<num> instead of a List<double>. I have no idea why that makes a difference; the generated code is almost the same. The add closure uses checkDouble to check the type instead of checkNum, but the two have exactly the same body.)
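The cost of those per-access guards is easier to see by analogy in another language. Below is a minimal C++ sketch (the function names are made up for illustration): std::vector::at performs a bounds check on every access, much like the H.ioore guard in the generated code, while operator[] does not, much like the tweaked plain x[j] = t1 + i.

```cpp
#include <vector>

// Checked access: like the generated H.ioore guard, vector::at verifies
// the index on every call (and throws std::out_of_range on failure).
double sum_checked(const std::vector<double>& x) {
    double s = 0.0;
    for (std::size_t j = 0; j < x.size(); ++j) s += x.at(j);
    return s;
}

// Unchecked access: operator[] trusts the caller, like the hand-edited
// x[j] = t1 + i in the tweaked JavaScript.
double sum_unchecked(const std::vector<double>& x) {
    double s = 0.0;
    for (std::size_t j = 0; j < x.size(); ++j) s += x[j];
    return s;
}
```

Both functions compute the same result; the difference is purely the per-element check, which is the kind of overhead dart2js was emitting on every store.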
You don't have to avoid computation in the browser. You may have to optimize a little for slow cases like this (or report them to the compiler developers, because this can probably be recognized and optimized; it just isn't yet). For example, you can change your list x of doubles to a Float64List from dart:typed_data:
var x = Float64List.fromList([for (var i = 0; i < n; i++) rand.nextDouble()]);
Then speed increases quite a lot.
The Dart tracking issue for this is https://github.com/dart-lang/sdk/issues/38705.
The performance of this kind of code has recently improved considerably and is much closer to the Dart VM.
Related
In my for loop I'm getting an 'avoid unnecessary statements' lint. The code works fine, but I assume there must be a more efficient way of writing the below.
int i = 0;
for (i; i < (list.length - list.indexOf('.')); i++)
^
I'm getting the alert on the first i in the for loop.
I understand that the alternative is to replace it with var i = 0, but I've done it this way so I can use the ith position when running the next loop after this one.
You don't need to put anything in the first part of the for:
var i = 0;
for (; i < list.length - list.indexOf('.'); i++) ...
Putting an i there is technically valid, but kind of useless since it doesn't do anything (which is why you get a warning).
You can also do:
var i = 0;
for (var n = list.length - list.indexOf('.'); i < n; i++) ...
and avoid computing the end for every step of the loop. That would be quite a lot more efficient.
You can also do:
var n = list.length - list.indexOf('.');
for (var i = 0; i < n; i++) ...
and then use n afterwards instead of i. That would make it a more traditional loop for readers looking at it.
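For illustration, the hoisting idea looks like this in a C++ sketch (the string stand-in for the list and the '.' search are assumptions mirroring the Dart snippet; the function name is made up):

```cpp
#include <string>

// Count loop iterations with the bound (length minus the position of '.')
// hoisted out of the condition, so it is computed once rather than on
// every comparison. Assumes the string actually contains a '.'.
int iterations_with_hoisted_bound(const std::string& list) {
    const std::size_t n = list.length() - list.find('.');
    int count = 0;
    for (std::size_t i = 0; i < n; ++i) ++count;
    return count;
}
```

In the unhoisted form the bound expression sits in the loop condition and may be re-evaluated each pass unless the compiler can prove it invariant.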
I have been trying to get a better awareness of cache locality. I produced the two code snippets below to better understand the cache locality characteristics of each.
vector<int> v1(1000);  // filled with some random numbers
vector<int> v2(1000);  // filled with some random numbers
for (int i = 0; i < v1.size(); ++i)
{
    auto min = INT_MAX;
    for (int j = 0; j < v2.size(); ++j)
    {
        if (v2[j] < min)
        {
            min = v2[j];
            v1[i] = j;
        }
    }
}
vs
vector<int> v1(1000);  // filled with some random numbers
vector<int> v2(1000);  // filled with some random numbers
for (int i = 0; i < v1.size(); ++i)
{
    auto min = INT_MAX;
    auto loc = -1;
    for (int j = 0; j < v2.size(); ++j)
    {
        if (v2[j] < min)
        {
            min = v2[j];
            loc = j;
        }
    }
    v1[i] = loc;
}
In the first snippet, v1 is updated directly inside the if statement. Does this cause cache locality issues, because each update pulls the cache line holding v1[i] (along with the contiguous elements around it) into the cache, potentially displacing parts of v2? If this analysis is correct, it would seem to slow down the loop over j, since each iteration of the j loop would then be more likely to miss the cache. The second implementation seems to alleviate this by accumulating into a local variable and not writing v1[i] until the j loop is complete.
In practice, I think the compiler might optimize the cache locality issue away, so for the sake of discussion let's assume no compiler optimizations.
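For reference, the two variants can be written as self-contained functions so their results can be compared directly. This is a sketch: the min update (min = v2[j]) is added, since the snippets as posted never update min and therefore overwrite the result on almost every iteration, which is presumably not intended.

```cpp
#include <climits>
#include <vector>

// Variant A: writes v1[i] inside the inner loop on every improvement.
void fill_direct(std::vector<int>& v1, const std::vector<int>& v2) {
    for (std::size_t i = 0; i < v1.size(); ++i) {
        int min = INT_MAX;
        for (std::size_t j = 0; j < v2.size(); ++j) {
            if (v2[j] < min) {
                min = v2[j];
                v1[i] = static_cast<int>(j);  // repeated write into v1
            }
        }
    }
}

// Variant B: accumulates into a register-friendly local, writes v1[i] once.
void fill_local(std::vector<int>& v1, const std::vector<int>& v2) {
    for (std::size_t i = 0; i < v1.size(); ++i) {
        int min = INT_MAX;
        int loc = -1;
        for (std::size_t j = 0; j < v2.size(); ++j) {
            if (v2[j] < min) {
                min = v2[j];
                loc = static_cast<int>(j);
            }
        }
        v1[i] = loc;  // single write per outer iteration
    }
}
```

Both produce identical output; any performance difference comes only from where and how often the store to v1 happens.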
I would like to know if there is a way to optimize the speed of this code in C. This part initializes a matrix row by row.
The other parts of my program just compute the current time and run the main function, so I believe this initialization is the most expensive part.
One hint I received is that cache blocking could help. This code is also meant to imitate the way a CPU fetches data from the cache. I have been working on it all day but have run out of ideas.
Thanks!!
void InitializeMatrixRowwise() {
    int i, j;
    double x;

    x = 0.0;
    for (i = 0; i < DIMENSION; i++) {
        for (j = 0; j < DIMENSION; j++) {
            if (i >= j) {
                Matrix[i][j] = x;
                x += 1.0;
            } else
                Matrix[i][j] = 1.0;
        }
    }
}
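One obstacle to cache blocking here is the running counter x, whose value depends on the row-major visiting order. If x is replaced by its closed form (in row-major order, cell (i, j) with i >= j is preceded by i*(i+1)/2 + j lower-triangle cells), each cell becomes independent and the matrix can be filled tile by tile. Below is a sketch of that idea; the function name, the local vector standing in for the global Matrix, and the dim/block parameters are all illustrative assumptions.

```cpp
#include <vector>

// Blocked (tiled) initialization. Each cell's value is computed from a
// closed form, so the traversal order no longer matters and tiles can
// be sized to fit the cache.
std::vector<std::vector<double>> init_blocked(int dim, int block) {
    std::vector<std::vector<double>> m(dim, std::vector<double>(dim));
    for (int bi = 0; bi < dim; bi += block)
        for (int bj = 0; bj < dim; bj += block)
            for (int i = bi; i < bi + block && i < dim; ++i)
                for (int j = bj; j < bj + block && j < dim; ++j)
                    m[i][j] = (i >= j)
                        // i*(i+1)/2 + j lower-triangle cells precede (i, j)
                        ? static_cast<double>(static_cast<long long>(i) * (i + 1) / 2 + j)
                        : 1.0;  // upper triangle, as in the original
    return m;
}
```

For a plain initialization like this the row-major version is already sequential, so blocking mainly pays off once the same tiling is reused for later passes that revisit the data.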
I am trying to implement a video stabilization project in C++/CLI. I start with a sequence of BMP images and find motion vectors that show how far a specific pixel region moves between consecutive frames. For example, with a 256*256 image, I select a 200*200 region in the first and second image frames and find how many pixels the region moves between them. When the algorithm reaches the last image in the sequence, the program finishes and I have the motion vectors. I compute them using the mean absolute difference method.
It works, but too slowly. My example code block is below; it finds only the first motion vector (x direction and y direction):
// M: image height = 256
// N: image width = 256
// BS: block size = 218

// select and read the first and second image frames
frame = 1;
s1 = "C:\\bike\\" + frame + ".bmp";
image = gcnew System::Drawing::Bitmap(s1, true);
s2 = "C:\\bike\\" + (frame + 1) + ".bmp";
image2 = gcnew System::Drawing::Bitmap(s2, true);

for (b = 0; b < M; b++) {
    for (a = 0; a < N; a++) {
        System::Drawing::Color BitmapColor = image->GetPixel(a, b);
        I1[b][a] = (double)((BitmapColor.R * 0.3) + (BitmapColor.G * 0.59) + (BitmapColor.B * 0.11));
    }
}
for (b = 0; b < M; b++) {
    for (a = 0; a < N; a++) {
        System::Drawing::Color BitmapColor = image2->GetPixel(a, b);
        I2[b][a] = (double)((BitmapColor.R * 0.3) + (BitmapColor.G * 0.59) + (BitmapColor.B * 0.11));
    }
}

// extracting the reference block
a = 0;
for (i = 19; i < 237; i++) {
    b = 0;
    for (j = 19; j < 237; j++) {
        Blocks[a][b] = I2[i][j];
        b++;
    }
    a++;
}

// finding motion vectors according to the mean absolute differences
// (MAD method)
mindifference = DBL_MAX;  // must start large, or no candidate is ever accepted
for (m = 0; m < (M - BS); m++) {
    for (n = 0; n < (N - BS); n++) {
        toplam = 0;  // "toplam" = sum
        for (i = 0; i < BS; i++) {
            for (j = 0; j < BS; j++) {
                toplam += fabs(I1[m + i][n + j] - Blocks[i][j]);
            }
        }
        // keep the best (smallest) difference found so far
        if (toplam < mindifference) {  // the original compared an unset "difference"
            mindifference = toplam;
            MV_x = m;
            MV_y = n;
        }
    }
}
This code example works, but it is very slow, and I need to optimize it.
How can I do this without for loops, for example with MATLAB-style indexing in C++/CLI (e.g. I1(1:20) = 100)?
Could you help me please?
Best Regards...
A couple things you should note:
First, loops in C++ are not slow compared to built-in functions. In MATLAB, the fewer operations the better, so it's best to call built-in functions that do a lot of work per call in optimized code. In C++, your code gets optimized just like the built-in functions do.
Next, GetPixel is extremely slow. Try Bitmap.LockBits instead. Ironically, this seems to contradict my previous statement, but it actually doesn't: the win comes not from LockBits looping for you, but from GetPixel using a different access method that is much, much slower.
Once you switch to LockBits, you can probably double or triple your speed again by unrolling the loop somewhat, if the compiler isn't already doing so.
Finally, make sure you're making good use of cache locality. Try both looping orders (e.g. for (a...) for (b...) and for (b...) for (a...)) and measure the time of each to find out which is faster.
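The loop-order experiment can be sketched outside C++/CLI with a plain row-major pixel buffer in standard C++ (the names and the flat `std::vector<int>` layout are assumptions for illustration). Both traversals compute the same sum; only the memory access pattern differs, which is what you would time.

```cpp
#include <vector>

// Row-order walk: touches the row-major buffer sequentially,
// so consecutive accesses share cache lines.
long long sum_row_order(const std::vector<int>& img, std::size_t w, std::size_t h) {
    long long s = 0;
    for (std::size_t b = 0; b < h; ++b)        // rows outer
        for (std::size_t a = 0; a < w; ++a)    // columns inner
            s += img[b * w + a];
    return s;
}

// Column-order walk: strides by `w` on every step, which on large
// images misses the cache far more often.
long long sum_col_order(const std::vector<int>& img, std::size_t w, std::size_t h) {
    long long s = 0;
    for (std::size_t a = 0; a < w; ++a)        // columns outer
        for (std::size_t b = 0; b < h; ++b)    // rows inner, stride w
            s += img[b * w + a];
    return s;
}
```

Wrap each call in a timer (as the answer suggests) to see which order your data layout favors.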
Is there a performance hit when iterating over object attributes vs. iterating an array?
Example, using objects:
var x:Object = {one: 1, two: 2, three: 3};
for (var s:String in x) {
    trace(x[s]);
}
Vs using an array
var a:Array = [1, 2, 3];
var len:Number = a.length;
for (var i:Number = 0; i < len; ++i) {
    trace(a[i]);
}
So - which is faster and most importantly by what factor?
IIRC, in some JavaScript implementations iterating over object attributes is up to 20x slower, but I haven't been able to find such a measurement for ActionScript 2.
I just tried a very similar test, but iterating just once over 200k elements, with opposite results:
Task build-arr: 2221ms
Task iter-arr: 516ms
Task build-obj: 1410ms
Task iter-obj: 953ms
I suspect Luke's test is dominated by loop overhead, which seems bigger in the array case.
Also, note that the array took significantly longer to populate in the first place, so ymmv if your task is insert-heavy.
Also, in my test, storing arr.length in a local variable gave a measurable performance increase of about 15%.
Update:
By popular demand, I am posting the code I used.
var iter:Number = 200000;
var time:Number = 0;
var obj:Object = {};
var arr:Array = [];

time = getTimer();
for (var i:Number = 0; i < iter; ++i) {
    arr[i] = i;
}
trace("Task build-arr: " + (getTimer() - time) + "ms");

time = getTimer();
for (var i:Number = 0; i < iter; ++i) {
    arr[i] = arr[i];
}
trace("Task iter-arr: " + (getTimer() - time) + "ms");

time = getTimer();
for (var i:Number = 0; i < iter; ++i) {
    obj[String(i)] = i;
}
trace("Task build-obj: " + (getTimer() - time) + "ms");

time = getTimer();
for (var i:String in obj) {
    obj[i] = obj[i];
}
trace("Task iter-obj: " + (getTimer() - time) + "ms");
OK. Why not do some simple measurements?
var time:Number;

time = getTimer();
var x:Object = {one: 1, two: 2, three: 3};
for (i = 0; i < 100000; i++) {
    for (var s:String in x) {
        // let's not trace but do a simple assignment instead
        x[s] = x[s];
    }
}
trace(getTimer() - time + "ms");

time = getTimer();
var a:Array = [1, 2, 3];
var len:Number = a.length;
for (i = 0; i < 100000; i++) {
    for (var j:Number = 0; j < len; j++) {
        a[j] = a[j];
    }
}
trace(getTimer() - time + "ms");
On my machine the array iteration is somewhat slower. This could be because ActionScript 2 doesn't have 'real' arrays, only associative arrays (maps). Apparently, to work with an array, the compiler has to generate some code overhead. I haven't looked into the specifics, but I can imagine that could be the case.
BTW, doing this test might also show that putting the array length into a variable doesn't really increase performance either. Just give it a go...
UPDATE: Even though ActionScript and JavaScript are syntactically related, the underlying execution mechanisms are completely different. E.g. Firefox uses SpiderMonkey and IE presumably uses a Microsoft implementation, whereas AS2 is executed by Adobe's AVM1.
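The associative-array point above has a rough C++ analogue (a sketch, not a claim about AVM1 internals): std::vector versus std::map<int, int>. Both loops below compute the same total, but the vector walk is a linear scan of contiguous memory while the map walk chases tree nodes, which is the kind of difference a dense "real" array versus an associative array would show.

```cpp
#include <map>
#include <vector>

// Dense array: contiguous storage, index arithmetic per element.
int sum_vector(const std::vector<int>& a) {
    int s = 0;
    for (std::size_t i = 0; i < a.size(); ++i) s += a[i];
    return s;
}

// Associative container: each step follows pointers between nodes.
int sum_map(const std::map<int, int>& m) {
    int s = 0;
    for (const auto& kv : m) s += kv.second;
    return s;
}
```

In C++ the dense scan is normally the faster of the two; the AS2 measurements above suggest its "arrays" do not get that dense-layout advantage.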