Consider:
float[] xPos = new float[pt3f.Count];
float[] yPos = new float[pt3f.Count];
float[] zPos = new float[pt3f.Count];
for (int i = 0; i < pt3f.Count; i++)
{
xPos[i] = pt3f[i].X;
yPos[i] = pt3f[i].Y;
zPos[i] = pt3f[i].Z;
}
I know I can use LINQ here:
var xPos = pt3f.Select(p => p.X).ToArray();
var yPos = pt3f.Select(p => p.Y).ToArray();
var zPos = pt3f.Select(p => p.Z).ToArray();
So my question is: apart from the much cleaner LINQ code, are there any performance benefits?
I think that in terms of performance the single for-loop is faster. Am I right, i.e. will the three LINQ queries eventually be executed as three separate loops?
Yes, the LINQ example will result in three loops instead of just the one in your first example. But in most practical cases, you won't notice any difference, at least not on small arrays.
Unless you actually notice a performance problem, I personally would prefer the more readable LINQ version.
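If you ever want to verify the difference on your own data, a quick Stopwatch comparison will settle it. Here is a minimal sketch (the synthetic tuple list stands in for your pt3f and its X/Y/Z members):
using System;
using System.Diagnostics;
using System.Linq;

class LoopVsLinqBenchmark
{
    static void Main()
    {
        // Synthetic stand-in for pt3f; any type with X/Y/Z members behaves the same way.
        var pt3f = Enumerable.Range(0, 1_000_000)
                             .Select(i => (X: (float)i, Y: i * 2f, Z: i * 3f))
                             .ToList();

        var sw = Stopwatch.StartNew();
        var xs = new float[pt3f.Count];
        var ys = new float[pt3f.Count];
        var zs = new float[pt3f.Count];
        for (int i = 0; i < pt3f.Count; i++)
        {
            // One pass over the list, three writes per element.
            xs[i] = pt3f[i].X;
            ys[i] = pt3f[i].Y;
            zs[i] = pt3f[i].Z;
        }
        Console.WriteLine($"single loop: {sw.ElapsedMilliseconds} ms");

        sw.Restart();
        // Three separate passes over the same list.
        var xPos = pt3f.Select(p => p.X).ToArray();
        var yPos = pt3f.Select(p => p.Y).ToArray();
        var zPos = pt3f.Select(p => p.Z).ToArray();
        Console.WriteLine($"three LINQ passes: {sw.ElapsedMilliseconds} ms");
    }
}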
I'm working on an implementation of Ukkonen's linear time suffix tree construction algorithm, and planning to implement improvements suggested by e.g. Kurtz and NJ Larsson (for example edge links instead of suffix links).
While testing, I got mixed green and red lights (passing and failing tests) depending on the specific strings I tested, and I had similar experiences with a few implementations I found online. This made me wonder:
Are there any known, specifically built (preferably simple/short) strings for unit-testing suffix trees to ensure the algorithm works precisely in all branching scenarios?
Furthermore, are there any good methods to separate the testing of the tree building algorithm from the testing of the traversal/lookup algorithm?
I know this question doesn't have a single specific correct answer, but I think it could serve as a good reference point for people working on similar algorithms.
My current unit-testing approach is quite primitive (C# with NUnit):
[TestCase]
public void Contains_Simple_ShouldReturnTrue()
{
var s = "bananasbanananananananananabananas";
var st = SuffixTree.Build(s);
var t1 = s.Substring(0, 10);
Assert.IsTrue(st.Contains(t1));
}
// ... Other simple test cases
[TestCase]
// This test fails, but it's not particularly helpful for bugfixing
public void Contains_DynamicBarrage_OnLongString_ShouldReturnTrue()
{
const int CYCLES = 200,
MAXLEN = 200;
var s = "olbafuynhfcxzqhnebecxjrfwfttw"; // Shortened for sanity
var st = SuffixTree.Build(s);
var r = new Random();
for (int i = 0; i < CYCLES; i++)
{
var pos = r.Next(0, s.Length - 2);
var len = r.Next(1, Math.Min(s.Length - pos, MAXLEN));
Assert.IsTrue(st.Contains(s.Substring(pos, len)));
}
}
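One idea I'm considering to make failures more localized: exhaustively check every substring of a short, branch-heavy string, so that a failure pinpoints the exact position and length. A sketch of what I have in mind (same SuffixTree API as above; the strings are common examples from Ukkonen walkthroughs, not a known-complete test set):
[TestCase("abcabxabcd")] // forces an internal split mid-edge
[TestCase("mississippi")] // repeated, overlapping suffixes
[TestCase("aaaaab")] // long run of a single character
public void Contains_AllSubstrings_ShouldReturnTrue(string s)
{
    var st = SuffixTree.Build(s);
    // Every substring of s must be found; short inputs keep this O(n^3) check cheap.
    for (int pos = 0; pos < s.Length; pos++)
        for (int len = 1; len <= s.Length - pos; len++)
            Assert.IsTrue(st.Contains(s.Substring(pos, len)),
                "missing substring at pos=" + pos + ", len=" + len);
}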
I'm trying to re-implement the shared-memory n-body simulation presented in Section 6.1.6 of An Introduction to Parallel Programming by Peter Pacheco, where it was implemented using OpenMP.
Here is my parallel implementation using OpenMP, and here is a serial implementation using Chapel. I'm having trouble producing a shared-memory parallel implementation in Chapel. Since there is no way to get the rank of a thread in a forall loop, I can't use the same approach as in the OpenMP implementation. I would have to use a coforall loop, create the tasks, and distribute the iterations manually. That does not seem practical, and it suggests that there is a more elegant way to solve this within Chapel.
I'm looking for guidance and suggestions on how to better solve this problem using the tools provided by Chapel.
My suggestion would be to use a (+) reduce intent on forces in your forall-loop, which will give each task its own private copy of forces and then (sum) reduce their individual copies back into the original forces variable as the tasks complete. This would be done by attaching the following with-clause to your forall loop:
forall q in 0..#n_bodies with (+ reduce forces) {
While I was in the code, I looked for other ways to make it a bit more elegant and would suggest changing from a 2D array to an array-of-arrays for this problem, in order to collapse a bunch of trios of similar statements for the x, y, z components down to single statements. I also made use of your pDomain variable and created a type alias for [0..#3] real in order to remove some redundancy in the code. Oh, and I removed the uses of the Math and IO modules because they are auto-used in Chapel programs.
Here's where that left me:
config const filename = "input.txt";
config const iterations = 100;
config const out_filename = "out.txt";
const X = 0;
const Y = 1;
const Z = 2;
const G = 6.67e-11;
config const dt = 0.1;
// Read input file, initialize bodies
var f = open(filename, iomode.r);
var reader = f.reader();
var n_bodies = reader.read(int);
const pDomain = {0..#n_bodies};
type vec3 = [0..#3] real;
var forces: [pDomain] vec3;
var velocities: [pDomain] vec3;
var positions: [pDomain] vec3;
var masses: [pDomain] real;
for i in pDomain {
positions[i] = reader.read(vec3);
velocities[i] = reader.read(vec3);
masses[i] = reader.read(real);
}
reader.close();
f.close();
for i in 0..#iterations {
// Reset forces
forces = [0.0, 0.0, 0.0];
forall q in pDomain with (+ reduce forces) {
for k in pDomain {
if k <= q {
continue;
}
var diff = positions[q] - positions[k];
var dist = sqrt(diff[X]**2 + diff[Y]**2 + diff[Z]**2);
var dist_cubed = dist**3;
var tmp = -G * masses[q] * masses[k] / dist_cubed;
var force_qk = tmp * diff;
forces[q] += force_qk;
forces[k] -= force_qk;
}
}
forall q in pDomain {
positions[q] += dt * velocities[q];
velocities[q] += dt / masses[q] * forces[q];
}
}
var outf = open(out_filename, iomode.cw);
var writer = outf.writer();
for q in pDomain {
writer.writeln("%er %er %er %er %er %er".format(positions[q][X], positions[q][Y], positions[q][Z], velocities[q][X], velocities[q][Y], velocities[q][Z]));
}
writer.close();
outf.close();
One other change you could consider making would be to replace the forall-loop that updates positions and velocities with the following whole-array statements:
positions += dt * velocities;
velocities += dt / masses * forces;
where the main tradeoff would be that the forall would implement the statements in a fused manner using a single parallel loop, while the whole-array statements would not (at least in the current 1.18 version of the compiler).
I was told once that I should avoid referencing properties and such of my main game class from other classes as much as possible because it's an inefficient thing to do. Is this actually true? For a trivial example, in my Character class would
MainGame.property = something;
MainGame.property2 = something2;
MainGame.property3 = something3;
//etc.
take more time to execute than putting the same in a function
MainGame.function1();
and calling that (thereby needing to "open" that class only once rather than multiple times)? By the same logic, would
something = MainGame.property;
something2 = MainGame.property;
be slightly less efficient than the following?
variable = MainGame.property;
something = variable;
something2 = variable;
Think of references as operations. Every "." is an operation, every function call "()" is an operation (and a heavy one), and so is every "+", "-" and the like.
So, indeed, certain approaches will generate more efficient code than others.
But the thing is, to feel the result of an inefficient approach you need a program performing millions of operations (or correspondingly heavy tasks) each frame. If you are not parsing megabytes of binary data, or converting bitmaps, or the like, you need not worry about performance, although keeping an eye on efficiency is generally a good habit.
If you want to know how efficient a piece of code is, measure its performance:
var aTime:int;
var a:int = 10;
var b:int = 20;
var c:int;
var i:int;
var O:Object;
aTime = getTimer();
O = new Object;
for (i = 0; i < 1000000; i++)
{
O.ab = a + b;
O.ba = a + b;
}
trace("Test 1. Elapsed", getTimer() - aTime, "ms.");
aTime = getTimer();
O = new Object;
c = a + b;
for (i = 0; i < 1000000; i++)
{
O.ab = c;
O.ba = c;
}
trace("Test 2. Elapsed", getTimer() - aTime, "ms.");
aTime = getTimer();
O = new Object;
for (i = 0; i < 1000000; i++)
{
O['ab'] = a + b;
O['ba'] = a + b;
}
trace("Test 3. Elapsed", getTimer() - aTime, "ms.");
Be prepared to run million-iteration loops so that total execution time exceeds 1 second, otherwise the precision will be too low.
A few days ago, I asked a question about how to use edge collapse with Assimp. Smoothing the obj and removing duplicated vertices in the modeling software solved the basic problems that kept edge collapse from working; I know it can work, because MeshLab can simplify the mesh like this:
It looks good in MeshLab, but then I tried it in my engine, which uses Assimp and OpenMesh. The problem is that Assimp imports duplicated vertices and indices, which leaves half-edges without their opposite pairs (is this what is called non-manifold?).
Here is a snapshot of the result using OpenMesh's quadric decimation:
To narrow down the problem, I ran the same pipeline without decimation and converted the OpenMesh data structure straight back. Everything works fine as expected (that is, the result without decimation is correct).
The code that I used to decimate the mesh:
#include <OpenMesh/Core/Mesh/TriMesh_ArrayKernelT.hh>
#include <OpenMesh/Tools/Decimater/DecimaterT.hh>
#include <OpenMesh/Tools/Decimater/ModQuadricT.hh>
Loader::BasicData Loader::TestEdgeCollapse(float vertices[], int vertexLength, int indices[], int indexLength, float texCoords[], int texCoordLength, float normals[], int normalLength)
{
// Mesh type
typedef OpenMesh::TriMesh_ArrayKernelT<> OPMesh;
// Decimater type
typedef OpenMesh::Decimater::DecimaterT< OPMesh > OPDecimater;
// Decimation Module Handle type
typedef OpenMesh::Decimater::ModQuadricT< OPMesh >::Handle HModQuadric;
OPMesh mesh;
// Optional vertex properties must be requested before set_texcoord2D/set_normal can be used.
if (texCoords != nullptr)
mesh.request_vertex_texcoords2D();
if (normals != nullptr)
mesh.request_vertex_normals();
std::vector<OPMesh::VertexHandle> vhandles;
int iteration = 0;
for (int i = 0; i < vertexLength; i += 3)
{
vhandles.push_back(mesh.add_vertex(OpenMesh::Vec3f(vertices[i], vertices[i + 1], vertices[i + 2])));
if (texCoords != nullptr)
mesh.set_texcoord2D(vhandles.back(),OpenMesh::Vec2f(texCoords[iteration * 2], texCoords[iteration * 2 + 1]));
if (normals != nullptr)
mesh.set_normal(vhandles.back(), OpenMesh::Vec3f(normals[i], normals[i + 1], normals[i + 2]));
iteration++;
}
for (int i = 0; i < indexLength; i += 3)
mesh.add_face(vhandles[indices[i]], vhandles[indices[i + 1]], vhandles[indices[i + 2]]);
OPDecimater decimater(mesh);
HModQuadric hModQuadric;
decimater.add(hModQuadric);
decimater.module(hModQuadric).unset_max_err();
decimater.initialize();
//decimater.decimate(); // without this call, everything works as expected.
mesh.garbage_collection();
int verticesSize = mesh.n_vertices() * 3;
float* newVertices = new float[verticesSize];
int indicesSize = mesh.n_faces() * 3;
int* newIndices = new int[indicesSize];
float* newTexCoords = nullptr;
int texCoordSize = mesh.n_vertices() * 2;
if(mesh.has_vertex_texcoords2D())
newTexCoords = new float[texCoordSize];
float* newNormals = nullptr;
int normalSize = mesh.n_vertices() * 3;
if(mesh.has_vertex_normals())
newNormals = new float[normalSize];
Loader::BasicData data;
int index = 0;
for (auto v_it = mesh.vertices_begin(); v_it != mesh.vertices_end(); ++v_it)
{
OpenMesh::Vec3f &point = mesh.point(*v_it);
newVertices[index * 3] = point[0];
newVertices[index * 3 + 1] = point[1];
newVertices[index * 3 + 2] = point[2];
if (mesh.has_vertex_texcoords2D())
{
auto &tex = mesh.texcoord2D(*v_it);
newTexCoords[index * 2] = tex[0];
newTexCoords[index * 2 + 1] = tex[1];
}
if (mesh.has_vertex_normals())
{
auto &normal = mesh.normal(*v_it);
newNormals[index * 3] = normal[0];
newNormals[index * 3 + 1] = normal[1];
newNormals[index * 3 + 2] = normal[2];
}
index++;
}
index = 0;
for (auto f_it = mesh.faces_begin(); f_it != mesh.faces_end(); ++f_it)
for (auto fv_it = mesh.fv_ccwiter(*f_it); fv_it.is_valid(); ++fv_it)
{
int id = fv_it->idx();
newIndices[index] = id;
index++;
}
data.Indices = newIndices;
data.IndicesLength = indicesSize;
data.Vertices = newVertices;
data.VerticesLength = verticesSize;
data.TexCoords = nullptr;
data.TexCoordLength = -1;
data.Normals = nullptr;
data.NormalLength = -1;
if (mesh.has_vertex_texcoords2D())
{
data.TexCoords = newTexCoords;
data.TexCoordLength = texCoordSize;
}
if (mesh.has_vertex_normals())
{
data.Normals = newNormals;
data.NormalLength = normalSize;
}
return data;
}
I also provide the tree obj I tested with, plus the face data generated by Assimp (pulled from the Visual Studio debugger), which shows the problem: some of the indices cannot find their opposite index pair.
After a few weeks of thinking about this and failing, I had hoped for some academic/mathematical solution for automatically generating these decimated meshes, but now I'm trying to find the simple way to implement this. The way I am able to do it is to change the structure to load multiple objects (file.obj) into a single custom object (class obj) and switch between them when needed. The benefit of this is that I can manage what should be displayed and sidestep any algorithmic problems.
By the way, I list some obstacles that pushed me back to the simple way:
Assimp's unique indices and vertices. There is nothing wrong with this in itself, but it gives the algorithm no way to build the half-edge adjacency structure (see the vertex-welding sketch after this list).
Using OpenMesh just to read the object file (*.obj). This can be done with the read_mesh function, but the documentation lacks examples and it is hard to use in my engine.
Writing a custom 3D model importer for every format is hard.
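For the first obstacle, one workaround I considered is welding Assimp's duplicated vertices by quantized position before handing them to OpenMesh, so that every half-edge can find its opposite pair. A rough sketch (the epsilon value and the output layout are my own assumptions, not something I have verified in the engine):
#include <cmath>
#include <map>
#include <tuple>
#include <vector>

// Weld vertices that share a (quantized) position, so faces that touch
// geometrically also share indices and every half-edge gets an opposite pair.
static void WeldVertices(const float vertices[], int vertexLength,
                         const int indices[], int indexLength,
                         std::vector<float>& outVertices,
                         std::vector<int>& outIndices)
{
    const float eps = 1e-5f; // quantization step; an assumption, tune per model
    std::map<std::tuple<long long, long long, long long>, int> seen;
    std::vector<int> remap(vertexLength / 3);
    for (int i = 0; i < vertexLength; i += 3)
    {
        auto key = std::make_tuple(std::llround(vertices[i] / eps),
                                   std::llround(vertices[i + 1] / eps),
                                   std::llround(vertices[i + 2] / eps));
        auto found = seen.find(key);
        if (found == seen.end())
        {
            int newIndex = static_cast<int>(outVertices.size() / 3);
            seen[key] = newIndex;
            outVertices.push_back(vertices[i]);
            outVertices.push_back(vertices[i + 1]);
            outVertices.push_back(vertices[i + 2]);
            remap[i / 3] = newIndex;
        }
        else
        {
            remap[i / 3] = found->second; // duplicate: reuse the first occurrence
        }
    }
    for (int i = 0; i < indexLength; i++)
        outIndices.push_back(remap[indices[i]]);
}
The tradeoff is that per-corner texture coordinates and normals collapse together with the positions, so they would have to be re-averaged or stored per half-edge afterwards.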
In conclusion, there are two ways to make level of detail work in an engine: one is to use a mesh simplification algorithm plus more testing to ensure quality; the other is to just switch between 3D models made in 3D software. The latter is not automatic, but it is stable. I use the second method, and I show the result here :)
However, this is not a real solution to my question, so I won't mark my own post as the answer.
Which one is faster? Why?
var messages:Array = [.....]
// 1 - for
var len:int = messages.length;
for (var i:int = 0; i < len; i++) {
var o:Object = messages[i];
// ...
}
// 2 - foreach
for each (var o:Object in messages) {
// ...
}
From where I'm sitting, regular for loops are moderately faster than for each loops in the minimal case. Also, as in the AS2 days, decrementing your way through a for loop generally provides a very minor improvement.
But really, any slight difference here will be dwarfed by the requirements of what you actually do inside the loop. You can find operations that will work faster or slower in either case. The real answer is that neither kind of loop can be meaningfully said to be faster than the other - you must profile your code as it appears in your application.
Sample code:
var size:Number = 10000000;
var arr:Array = [];
for (var i:int=0; i<size; i++) { arr[i] = i; }
var time:Number, o:Object;
// for()
time = getTimer();
for (i=0; i<size; i++) { arr[i]; }
trace("for test: "+(getTimer()-time)+"ms");
// for() reversed
time = getTimer();
for (i=size-1; i>=0; i--) { arr[i]; }
trace("for reversed test: "+(getTimer()-time)+"ms");
// for each
time = getTimer();
for each(o in arr) { o; }
trace("for each test: "+(getTimer()-time)+"ms");
Results:
for test: 124ms
for reversed test: 110ms
for each test: 261ms
Edit: To improve the comparison, I changed the inner loops so they do nothing but access the collection value.
Edit 2: Answers to oshyshko's comment:
The compiler could skip the accesses in my internal loops, but it doesn't; the loops would finish two or three times faster if it did.
The results change in the sample code you posted because in that version, the for loop now has an implicit type conversion. I left assignments out of my loops to avoid that.
Of course one could argue that it's okay to have an extra cast in the for loop because "real code" would need it anyway, but to me that's just another way of saying "there's no general answer; which loop is faster depends on what you do inside your loop". Which is the answer I'm giving you. ;)
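To make the coercion point concrete, here is a minimal pair (my own illustration, reusing arr and size from the sample code above; not part of the original benchmark):
// Bare access: the value is read and discarded; no assignment, no coercion.
for (i = 0; i < size; i++) { arr[i]; }
// Assigning to a typed local forces an implicit conversion on every pass.
var n:Number;
for (i = 0; i < size; i++) { n = arr[i]; }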
When iterating over an array, for each loops are way faster in my tests.
var len:int = 1000000;
var i:int = 0;
var arr:Array = [];
while(i < len) {
arr[i] = i;
i++;
}
function forEachLoop():void {
var t:Number = getTimer();
var sum:Number = 0;
for each(var num:Number in arr) {
sum += num;
}
trace("forEachLoop :", (getTimer() - t));
}
function whileLoop():void {
var t:Number = getTimer();
var sum:Number = 0;
var i:int = 0;
while(i < len) {
sum += arr[i] as Number;
i++;
}
trace("whileLoop :", (getTimer() - t));
}
forEachLoop();
whileLoop();
This gives:
forEachLoop : 87
whileLoop : 967
Here, probably most of the while loop's time is spent casting the array item to a Number. However, I consider it a fair comparison, since that's what you get in the for each loop.
My guess is that this difference has to do with the fact that, as mentioned, the as operator is relatively expensive and array access is also relatively slow. With a for each loop, both operations are handled natively, I think, as opposed to being performed in ActionScript.
Note, however, that if type conversion actually takes place, the for each version is much slower than before and the while version is noticeably faster (though for each still beats while):
To test, change array initialization to this:
while(i < len) {
arr[i] = i + "";
i++;
}
And now the results are:
forEachLoop : 328
whileLoop : 366
forEachLoop : 324
whileLoop : 369
I've had this discussion with a few colleagues before, and we have all found different results for different scenarios. However, there was one test that I found quite telling for comparison's sake:
var array:Array=new Array();
for (var k:uint=0; k<1000000; k++) {
array.push(Math.random());
}
stage.addEventListener("mouseDown",foreachloop);
stage.addEventListener("mouseUp",forloop);
/////// Array /////
/* 49ms */
function foreachloop(e) {
var t1:uint=getTimer();
var tmp:Number=0;
var i:uint=0;
for each (var n:Number in array) {
i++;
tmp+=n;
}
trace("foreach", i, tmp, getTimer() - t1);
}
/***** 81ms ****/
function forloop(e) {
var t1:uint=getTimer();
var tmp:Number=0;
var l:uint=array.length;
for(var i:uint = 0; i < l; i++)
tmp += Number(array[i]);
trace("for", i, tmp, getTimer() - t1);
}
What I like about this test is that you have a reference to both the key and the value in each iteration of both loops (removing the key counter from the "for-each" loop is not that relevant). Also, it operates on Numbers, which is probably the most common kind of loop that you will want to optimize this much. And most importantly, the winner is the "for-each", which is my favorite loop :P
Notes:
-Referencing the array in a local variable within the function of the "for-each" loop is irrelevant, but in the "for" loop you do get a speed boost (75ms instead of 105ms):
function forloop(e) {
var t1:uint=getTimer();
var tmp:Number=0;
var a:Array=array;
var l:uint=a.length;
for(var i:uint = 0; i < l; i++)
tmp += Number(a[i]);
trace("for", i, tmp, getTimer() - t1);
}
-If you run the same tests with the Vector class, the results are a bit confusing :S
for would be faster for arrays... but depending on the situation it can be for each that is best... see this .NET benchmark test.
Personally, I'd use either until I got to the point where it became necessary for me to optimize the code. Premature optimization is wasteful :-)
In an array where all elements are present and start at zero (0 to X), it may be faster to use a for loop. In all other cases (sparse arrays) it can be a LOT faster to use for each.
The reason is that the Array class uses two internal data structures: a hash table and a dense array.
Please read my Array analysis using the Tamarin source:
http://jpauclair.wordpress.com/2009/12/02/tamarin-part-i-as3-array/
The for loop will check indices even where the value is undefined, whereas for each skips those, jumping straight to the next element in the hash table.
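To see the sparse-array effect concretely, a quick illustration (my own example):
// A sparse array: only two live elements, a million indices apart.
var sparse:Array = [];
sparse[0] = "a";
sparse[1000000] = "b";
// The for loop visits every index up to length - 1, mostly hitting undefined.
for (var i:int = 0; i < sparse.length; i++) { sparse[i]; }
// The for each loop visits only the two stored elements.
for each (var v:* in sparse) { trace(v); } // traces just "a" and "b"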
Guys, especially Juan Pablo Califano: I've checked your test. The main difference is in how the array item is obtained.
If you set var len:int = 40000;, you will see that the 'while' cycle is faster.
But it loses to for each at large array sizes.
Just an add-on:
a for each...in loop doesn't assure you that the elements in the array/vector get enumerated in the ORDER THEY ARE STORED in them (except for XML).
This IS a vital difference, IMO.
"...Therefore, you should not write code that depends on a for-
each-in or for-in loop’s enumeration order unless you are processing
XML data..." C.Moock
(I hope quoting this one phrase doesn't break any law...)
Happy benchmarking.
Sorry to prove you guys wrong, but for each is faster, even by a lot; except if you don't want to access the array values, but a) that doesn't make sense and b) that's not the case here.
As a result, I made a detailed post on my brand new blog ... :D
Greetz,
back2dos