Mathematica optimization: AppendTo in a loop - performance

I am kind of new to Mathematica and there are lots of AppendTo calls in my code, which I think take up a lot of time. I know there are other ways to optimize this, but I don't really know how to go about it. I think getBucketShocks can be improved a lot? Anyone?
getBucketShocks[BucketPivots_, BucketShock_, parallelOffset_: 0] :=
 Module[{shocks, pivotsNb},
  shocks = {};
  pivotsNb = Length[BucketPivots];
  If[pivotsNb > 1,
   (* first pivot: stay flat at BucketShock, then ramp down to zero at the second pivot *)
   AppendTo[shocks, LinearFunction[{0, BucketShock}, {BucketPivots[[1]], BucketShock}, {BucketPivots[[2]], 0}, {BucketPivots[[2]], 0}, parallelOffset]];
   (* interior pivots: triangular bump peaking at pivot i *)
   Do[AppendTo[shocks, LinearFunction[{BucketPivots[[i - 1]], 0}, {BucketPivots[[i]], BucketShock}, {BucketPivots[[i + 1]], 0}, {BucketPivots[[i + 1]], 0}, parallelOffset]], {i, 2, pivotsNb - 1}];
   (* last pivot: ramp up from the previous pivot, then stay flat at BucketShock *)
   AppendTo[shocks, LinearFunction[{BucketPivots[[pivotsNb - 1]], 0}, {BucketPivots[[pivotsNb]], BucketShock}, {BucketPivots[[pivotsNb]], BucketShock}, {BucketPivots[[pivotsNb]], BucketShock}, parallelOffset]],
   If[pivotsNb == 1, AppendTo[shocks, BucketShock + parallelOffset &]];
   ];
  shocks];
LinearInterpolation[x_,{x1_,y1_},{x2_,y2_},parallelOffset_:0]:=parallelOffset+y1+(y2-y1)/(x2-x1)*(x-x1);
LinearFunction[p1_,p2_,p3_,p4_,parallelOffset_:0]:=Which[
#<=p1[[1]],parallelOffset+p1[[2]],
#<=p2[[1]],LinearInterpolation[#,p1,p2,parallelOffset],
#<=p3[[1]],LinearInterpolation[#,p2,p3,parallelOffset],
#<=p4[[1]],LinearInterpolation[#,p3,p4,parallelOffset],
#>p4[[1]],parallelOffset+p4[[2]]]&;

I think you can optimize the middle Do loop a lot by using some form of Map one way or another. At every iteration, you're trying to access 3 adjacent elements of BucketPivots. It seems like this would be easiest to do with MovingMap, but you need to jump through a few hoops to get the arguments in the right place. This one is probably the easiest solution:
shocks = MovingMap[
LinearFunction[
{#[[1]], 0},
{#[[2]], BucketShock},
{#[[3]], 0},
{#[[3]], 0},
parallelOffset
]&,
BucketPivots,
2
]
As a general principle: if you want to do a Do or For loop in Mathematica that runs over the Length of another list, try to find a way you can do it with a function from the Map family (Map, MapIndexed, MapAt, MapThread, etc.) and get familiar with those. They are great substitutions for iterations!
After this, you can add the first and last elements of shocks with AppendTo (or simply Join them on), as sketched below.
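Putting the pieces together, here is a minimal sketch of how the whole function could look with the Do loop replaced by MovingMap. It reuses LinearFunction from the question, assumes MovingMap is available (Mathematica 10 or later), and treats the zero-, one- and two-pivot cases separately:
getBucketShocks[BucketPivots_, BucketShock_, parallelOffset_: 0] :=
 Module[{n = Length[BucketPivots], first, middle, last},
  Which[
   n == 0, {},
   n == 1, {BucketShock + parallelOffset &},
   True,
   (* first pivot: flat at BucketShock, then ramp down to zero *)
   first = LinearFunction[{0, BucketShock}, {BucketPivots[[1]], BucketShock},
     {BucketPivots[[2]], 0}, {BucketPivots[[2]], 0}, parallelOffset];
   (* last pivot: ramp up from the previous pivot, then stay flat *)
   last = LinearFunction[{BucketPivots[[-2]], 0}, {BucketPivots[[-1]], BucketShock},
     {BucketPivots[[-1]], BucketShock}, {BucketPivots[[-1]], BucketShock}, parallelOffset];
   (* interior pivots: one function per length-3 window; empty when n == 2 *)
   middle = If[n >= 3,
     MovingMap[LinearFunction[{#[[1]], 0}, {#[[2]], BucketShock},
        {#[[3]], 0}, {#[[3]], 0}, parallelOffset] &, BucketPivots, 2],
     {}];
   Join[{first}, middle, {last}]]]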
BTW, here's a free tip: I recommend that in Mathematica you avoid giving variables and functions names that start with a capital letter (like you did with BucketPivots). All of Mathematica's own symbols start with capitals, so if you avoid starting your own names with them, you'll never clash with a built-in function.

Related

What's the most efficient way of combining switch/if statements

This question doesn't address any programming language in particular but of course I'm happy to hear some examples.
Imagine a big number of files, let's say 5000, that have all kinds of letters and numbers in them. Then, there is a method that receives a user input that acts as an alias in order to display that file. Without having the files sorted in a folder, the method(s) need to return the file name that is associated with the alias the user provided.
So let's say the user input "gd322" stands for the file named "k4e23"; the method would look like:
if (input.equals("gd322")) {
    return "k4e23";
}
Now, imagine having 4 values in that method:
switch (input) {
    case gd322: return fw332;
    case g344d: return 5g4gh;
    case s3red: return 536fg;
    case h563d: return h425d;
} // switch on string, no break, no string indicators, ..., pls ignore the syntax, it's just pseudo
Keeping in mind we have 5000 entries, there are probably more than just 2 entries starting with g. Now, if the user input starts with 's', instead of wasting CPU cycles checking all the a's, b's, c's, ..., we could also make another switch for this, which then directs to the 'next' methods like this:
switch (input[0]) { // implying we could access strings like that
    case a: switchA(input);
    case b: switchB(input);
    // [...]
    case g: switchG(input);
    case s: switchS(input);
}
So the CPU doesn't have to check on all of them, but rather calls a method like this:
switchG(String input) {
    switch (input) {
        case gd322: return fw332;
        case g344d: return 5g4gh;
        // [...]
    }
}
Is there any field of computer science dealing with this? I don't know what to call it and therefore don't know how to search for it, but I think my thoughts make sense on a large scale. Please move the thread if it doesn't belong here, but I really want to see your thoughts on this.
EDIT: don't quote me on that "5000"; I am not in the situation described above and I wanted to talk about this purely theoretically. It could also be 3 entries or 300,000, maybe even fewer or more.
If you have 5000 options, you're probably better off hashing them than having hard-coded if / switch statements. In C++ you could also use std::map to pair a function pointer or other option-handling information with each possible option.
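To make the hash idea concrete, here is a small, hedged sketch in Mathematica (the language used in the other questions on this page) with made-up alias/file pairs; Association is hash-backed, so a lookup does not scan the other entries:
(* made-up alias -> file-name table *)
files = <|"gd322" -> "fw332", "g344d" -> "5g4gh", "s3red" -> "536fg", "h563d" -> "h425d"|>;
Lookup[files, "g344d", Missing["NotFound"]]  (* "5g4gh" *)
Lookup[files, "zzzzz", Missing["NotFound"]]  (* Missing["NotFound"] *)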
Interesting, but I don't think you can give a generic answer. It all depends on how the code is executed. Many compilers will have all kinds of optimizations, in the if and switch, but also in the way strings are compared.
That said, if you have actual (disk) files with those lists, then reading the file will probably take much longer than processing it, since disk I/O is very slow compared to memory access and CPU processing.
And if you have a list like that, you may want to build a hash table, or simply a sorted list/array in which you can perform a binary search. Sorting it also takes time, but if you have to do many lookups in the same list, it may be well worth the time.
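For the sorted-list option, here is a minimal binary-search sketch (Mathematica; binaryMemberQ and the sample aliases are made up, and Order is used so the comparison works for strings as well as numbers):
(* True if key occurs in the canonically sorted list 'sorted' *)
binaryMemberQ[sorted_List, key_] :=
 Module[{lo = 1, hi = Length[sorted], mid, cmp, found = False},
  While[lo <= hi && ! found,
   mid = Quotient[lo + hi, 2];
   cmp = Order[key, sorted[[mid]]];  (* 1: key sorts earlier, -1: later, 0: equal *)
   Which[cmp == 0, found = True,
    cmp == 1, hi = mid - 1,
    True, lo = mid + 1]];
  found]
aliases = Sort[{"gd322", "g344d", "s3red", "h563d"}];
binaryMemberQ[aliases, "s3red"]  (* True *)
binaryMemberQ[aliases, "zzzzz"]  (* False *)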
Is there any field of computer science dealing with this?
Yes, the science of efficient data structures. Well, isn't that what CS is all about? :-)
The algorithm you described resembles a trie. It wouldn't be statically encoded in the source code with switch statements, but would use dynamic lookups in a structure loaded from somewhere and stuff, but the idea is the same.
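As a small illustration of the first level of such a structure (a Mathematica sketch with made-up data): grouping the alias table by its first character is essentially what the hand-written switchA/switchB/... dispatch does, except that here the structure is built from the data rather than hard-coded.
files = <|"gd322" -> "fw332", "g344d" -> "5g4gh", "s3red" -> "536fg", "h563d" -> "h425d"|>;
(* one trie level: sub-tables keyed by the first character of the alias *)
byFirstChar = GroupBy[Normal[files], StringTake[First[#], 1] &, Association];
lookup[alias_String] :=
 Lookup[Lookup[byFirstChar, StringTake[alias, 1], <||>], alias, Missing["NotFound"]]
lookup["s3red"]  (* "536fg" *)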
Yes, the problem has been known and solved for decades: hash functions.
Basically you have a set of values (here strings like "gd322", "g344d") and you want to know whether some other value v is among them.
The idea is to put the strings in a big array, at an index calculated from their values by some function. Given a value v, you compute an index the same way and check whether the value v is there or not. Much faster than checking the whole array.
Of course there is a problem with different values falling in the same place: collisions. Some magic is needed then: perfect hash functions, whose coefficients are tweaked so that values from the initial set don't cause any collisions.
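To make "an index calculated from their values" concrete, here is a tiny sketch using Mathematica's built-in Hash and a made-up table size of 101 slots:
slots = 101;
indexOf[s_String] := Mod[Hash[s], slots] + 1  (* 1-based slot index *)
indexOf /@ {"gd322", "g344d", "s3red", "h563d"}
(* four integers between 1 and 101; two different strings landing on the same index would be a collision *)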

Fast check if element is in MATLAB matrix

I would like to verify whether an element is present in a MATLAB matrix.
At the beginning, I implemented as follows:
if ~isempty(find(matrix(:) == element))
which is obviously slow. Thus, I changed to:
if sum(matrix(:) == element) ~= 0
but this is again slow: I call the function that contains this instruction many times, and I lose 14 seconds each time!
Is there a way of further optimize this instruction?
Thanks.
If you just need to know if a value exists in a matrix, using the second argument of find to specify that you just want one value will be slightly faster (25-50%) and even a bit faster than using sum, at least on my machine. An example:
matrix = randi(100,1e4,1e4);
element = 50;
~isempty(find(matrix(:)==element,1))
However, in recent versions of Matlab (I'm using R2014b), nnz is finally faster for this operation, so:
matrix = randi(100,1e4,1e4);
element = 50;
nnz(matrix==element)~=0
On my machine this is about 2.8 times faster than any other approach (including using any, strangely) for the example provided. To my mind, this solution also has the benefit of being the most readable.
In my opinion, there are several things you could try to improve performance:
Following your initial idea, I would go for the function any to test whether any of the equality tests succeeded:
if any(matrix(:) == element)
I tested this on a 1000 by 1000 matrix and it is faster than the solutions you have tested.
I do not think that the unfolding matrix(:) is a penalty, since it is equivalent to a reshape and MATLAB does this in a smart way: it does not actually allocate and move memory, because you are not modifying the temporary object matrix(:).
If your matrix does not change between calls to the function, or changes rarely, you could simply keep another vector containing all the elements of your matrix, but sorted. This way you could use a more efficient O(log(N)) search to test for the presence of your element.
I personally like the ismember function for this kind of problem. It might not be the fastest, but for non-critical parts of the code it greatly improves readability and code maintenance (and I prefer to spend one hour coding something that will take a day to run than one day coding something that will run in one hour; this of course depends on how often you use the program, but it is something one should never forget).
If you can keep a sorted copy of the elements of your matrix, you could consider using the undocumented MATLAB function ismembc, but remember that the inputs must be sorted, non-sparse, non-NaN values.
If performance really is critical you might want to write your own mex file and for this task you could even include some simple parallelization using openmp.
Hope this helps,
Adrien.

Where is DropWhile in Mathematica?

Mathematica 6 added TakeWhile, which has the syntax:
TakeWhile[list, crit]
gives elements ei from the beginning of list, continuing so long as crit[ei] is True.
There is however no corresponding "DropWhile" function. One can construct DropWhile using LengthWhile and Drop, but it almost seems as though one is discouraged from using DropWhile. Why is this?
To clarify, I am not asking for a way to implement this function. Rather: why is it not already present? It seems to me that there must be a reason for its absence other than an oversight, or it would have been corrected by now. Is there something inefficient, undesirable, or superfluous about DropWhile?
There appears to be some ambiguity about the function of DropWhile, so here is an example:
DropWhile = Drop[#, LengthWhile[#, #2]] &;
DropWhile[{1,2,3,4,5}, # <= 3 &]
Out= {4, 5}
Just a blind guess.
There are a lot of list operations that could take a While criterion. For example:
Total..While
Accumulate..While
Mean..While
Map..While
Etc..While
They are not difficult to construct, anyway.
I think those are not included just because the list of "primitive" functions is already growing too long, and the criterion "is it frequently needed and difficult for the user to implement with good performance?" prevails in those cases.
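For instance, a hypothetical totalWhile (the name is made up here) is a one-liner on top of the existing primitives:
(* sums leading elements for as long as crit holds *)
totalWhile[list_, crit_] := Total[TakeWhile[list, crit]]
totalWhile[{1, 2, 3, 10, 2}, # < 5 &]  (* 6 *)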
The ubiquitous Lists in Mathematica are fixed-length vectors, and when they consist of machine numbers they are stored as packed arrays.
Thus the natural functions for a recursively defined linked list (e.g. in Lisp or Haskell) are not the primary tools in Mathematica.
So I am inclined to think this explains why Wolfram did not fill out its repertoire of manipulation functions.
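A quick way to see the packed-array point (Developer`PackedArrayQ is an existing, if low-level, predicate; the example data is made up):
list = Range[1., 10.];
Developer`PackedArrayQ[list]               (* True: machine reals are stored as a packed array *)
Developer`PackedArrayQ[Append[list, "x"]]  (* False: mixing in non-numeric data unpacks the list *)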

Scala performance question

In the article written by Daniel Korzekwa, he said that the performance of following code:
list.map(e => e*2).filter(e => e>10)
is much worse than the iterative solution written using Java.
Can anyone explain why? And what is the best solution for such code in Scala (I hope it's not a Java iterative version which is Scala-fied)?
The reason that particular code is slow is because it's working on primitives but it's using generic operations, so the primitives have to be boxed. (This could be improved if List and its ancestors were specialized.) This will probably slow things down by a factor of 5 or so.
Also, algorithmically, those operations are somewhat expensive, because you make a whole list, and then make a whole new list throwing a few components of the intermediate list away. If you did it in one swoop, then you'd be better off. You could do something like:
list collect (case e if (e*2>10) => e*2)
but what if the calculation e*2 is really expensive? Then you could
(List[Int]() /: list)((ls,e) => { val x = e*2; if (x>10) x :: ls else ls })
except that this would appear backwards. (You could reverse it if need be, but that requires creating a new list, which again isn't ideal algorithmically.)
Of course, you have the same sort of algorithmic problems in Java if you're using a singly linked list--your new list will end up backwards, or you have to create it twice, first in reverse and then forwards, or you have to build it with (non-tail) recursion (which is easy in Scala, but inadvisable for this sort of thing in either language since you'll exhaust the stack), or you have to create a mutable list and then pretend afterwards that it's not mutable. (Which, incidentally, you can do in Scala also--see mutable.LinkedList.)
Basically it's traversing a list twice. Once for multiplying every element with two. And then another time to filter the results.
Think of it as one while loop producing a LinkedList with the intermediate results. And then another loop applying the filter to produce the final results.
This should be faster:
list.view.map(e => e * 2).filter(e => e > 10).force
The solution lies mostly with JVM. Though Scala has a workaround in the figure of #specialization, that increases the size of any specialized class hugely, and only solves half the problem -- the other half being the creation of temporary objects.
The JVM actually does a good job optimizing a lot of it, or the performance would be even more terrible, but Java does not require the optimizations that Scala does, so JVM does not provide them. I expect that to change to some extent with the introduction of SAM not-real-closures in Java.
But, in the end, it comes down to balancing the needs. The same while loop that Java and Scala do so much faster than Scala's function equivalent can be done faster yet in C. Yet, despite what the microbenchmarks tell us, people use Java.
Scala approach is much more abstract and generic. Therefore it is hard to optimize every single case.
I could imagine that the HotSpot JIT compiler might apply stream and loop fusion to the code in the future if it sees that the intermediate results are not used.
Additionally the Java code just does much more.
If you really just want to mutate over a datastructure, consider transform.
It looks a bit like map but doesn't create a new collection, e. g.:
val array = Array(1,2,3,4,5,6,7,8,9,10).transform(_ * 2)
// array is now WrappedArray(2, 4, 6, 8, 10, 12, 14, 16, 18, 20)
I really hope some additional in-place operations will be added in the future...
To avoid traversing the list twice, I think the for syntax is a nice option here:
val list2 = for(v <- list1; e = v * 2; if e > 10) yield e
Rex Kerr correctly states the major problem: operating on immutable lists, the stated piece of code creates intermediate lists in memory. Note that this is not necessarily slower than equivalent Java code; you just never use immutable data structures in Java.
Wilfried Springer has a nice, Scala idiomatic solution. Using view, no (manipulated) copies of the whole list are created.
Note that using view might not always be ideal. For example, if your first call is a filter that is expected to throw away most of the list, it might be worthwhile to create the shorter version explicitly and use view only after that, in order to improve memory locality for later iterations.
list.filter(e => e*2>10).map(e => e*2)
This approach filters the list first, so the second traversal runs over fewer elements.

How to prepend a column to a matrix?

Ok, imagine I have this Matrix: {{1,2},{2,3}}, and I'd rather have {{4,1,2},{5,2,3}}. That is, I prepended a column to the matrix. Is there an easy way to do it?
My best proposal is this:
PrependColumn[vector_List, matrix_List] :=
Outer[Prepend[#1, #2] &, matrix, vector, 1]
But it obfuscates the code and constantly requires loading more and more code. Isn't this built in somehow?
Since ArrayFlatten was introduced in Mathematica 6 the least obfuscated solution must be
matrix = {{1, 2}, {2, 3}}
vector = {{4}, {5}}
ArrayFlatten@{{vector, matrix}}
A nice trick is that replacing any matrix block with 0 gives you a zero block of the right size.
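For example, the same matrix placed on a block diagonal, with the 0 entries expanded to zero blocks of matching size:
matrix = {{1, 2}, {2, 3}};
ArrayFlatten@{{matrix, 0}, {0, matrix}}
(* {{1, 2, 0, 0}, {2, 3, 0, 0}, {0, 0, 1, 2}, {0, 0, 2, 3}} *)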
I believe the most common way is to transpose, prepend, and transpose again:
PrependColumn[vector_List, matrix_List] :=
Transpose[Prepend[Transpose[matrix], vector]]
I think the least obscure way of doing this is:
PrependColumn[vector_List, matrix_List] := MapThread[Prepend, {matrix, vector}];
In general, MapThread is the function that you'll use most often for tasks like this one (I use it all the time when adding labels to arrays before formatting them nicely with Grid), and it can make things a lot clearer and more concise to use Prepend instead of the equivalent Prepend[#1, #2] &.
THE... ABSOLUTELY.. BY FAR... FASTEST method to append or prepend a column from my tests of various methods on array RandomReal[100,{10^8,5}] (kids, don't try this at home... if your machine isn't built for speed and memory, operations on an array this size are guaranteed to hang your computer)
...is this: Append[tmp\[Transpose], Range@Length@tmp]\[Transpose].
Replace Append with Prepend at will.
The next fastest thing is this: Table[tmp[[n]]~Join~{n}, {n, Length@tmp}] - almost twice as slow.
