How can I optimize a multiple (matrix) switch / case algorithm? - algorithm

Is it possible to optimize this kind of (matrix) algorithm:
//       | case 1 | case 2 | case 3 |
// ------|--------|--------|--------|
//       |        |        |        |
// case a|   a1   |   a2   |   a3   |
//       |        |        |        |
// case b|   b1   |   b2   |   b3   |
//       |        |        |        |
// case c|   c1   |   c2   |   c3   |
//       |        |        |        |
switch (var)
{
  case 1:
    switch (subvar)
    {
      case a:
        process a1;
        break;
      case b:
        process b1;
        break;
      case c:
        process c1;
        break;
    }
    break;
  case 2:
    switch (subvar)
    {
      case a:
        process a2;
        break;
      case b:
        process b2;
        break;
      case c:
        process c2;
        break;
    }
    break;
  case 3:
    switch (subvar)
    {
      case a:
        process a3;
        break;
      case b:
        process b3;
        break;
      case c:
        process c3;
        break;
    }
    break;
}
The code is fairly simple, but imagine something more complex with more levels of "switch / case".
I work with 3 variables. Depending on whether they take the values 1, 2, 3 or a, b, c or alpha, beta, charlie, different processes have to be run. Is it possible to optimize this in any way other than through a series of "switch / case" statements?
(Question already asked in French here).
Edit: (from Dran Dane's responses to comments below. These might as well be in this more prominent place!)
"optimize" is to be understood in terms of having to write less code, fewer "switch / case". The idea is to improve readability, maintainability, not performance.
There is maybe a way to write less code via a "Chain of Responsibility" but this solution is not optimal on all points, because it requires the creation of many objects in memory.

It sounds like what you want is a 'Finite State Machine', where those cases activate different processes or 'states'. In C this is usually done with an array (matrix) of function pointers.
So you essentially make an array, put the right function pointers at the right indices, and then use your 'var' as an index to the right 'process' and call it. You can do this in most languages. That way different inputs to the machine activate different processes and bring it to different states. This is very useful for numerous applications; I myself use it all the time in MCU development.
Edit: Valya pointed out that I probably should show a basic model:
stateMachine[var1][var2](); // calls the right 'process' for input var1, var2
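The same table-of-functions idea can be sketched in Python. This is only an illustration: `make_handler`, `ROWS`, `COLS` and `dispatch` are hypothetical names, and the handlers merely report which cell of the matrix ran.

```python
def make_handler(name):
    """Build a hypothetical handler that just reports which cell ran."""
    def handler():
        return f"processed {name}"
    return handler

ROWS = ["a", "b", "c"]  # possible values of subvar
COLS = [1, 2, 3]        # possible values of var

# state_machine[row][col] holds the handler for that cell of the matrix.
state_machine = [[make_handler(f"{r}{c}") for c in COLS] for r in ROWS]

def dispatch(var, subvar):
    """Translate the two inputs into indices, then call the stored handler."""
    return state_machine[ROWS.index(subvar)][COLS.index(var)]()

print(dispatch(2, "b"))
```

The whole nested switch collapses into one indexed call; adding a case means adding one entry to the table.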

There are no good answers to this question :-(
because so much of the response depends on
The effective goals (what is meant by "optimize", what is unpleasing about the nested switches)
The context in which this construct is going to be applied (what are the ultimate needs implicit to the application)
TokenMacGuy was wise to ask about the goals. I took the time to check the question and its replies on the French site and I'm still puzzled as to the goals... Dran Dane's latest response seems to point towards lessening the amount of code / improving readability, but let's review to be sure:
Processing speed: not an issue; the nested switches are quite efficient, possibly a tad less so than the 3 multiplications needed to get an index into a map table, but maybe not even.
Readability: yes, possibly an issue. As the number of variables and levels increases, the combinatorial explosion kicks in, and the format of the switch statement tends to spread the branching spot and associated values over a long vertical stretch. In this case a 3-dimensional (or more) table initialized with function pointers puts the branching values and the function to be called back together on a single line.
Writing less code: sorry, not much help here; at the end of the day we need to account for a relatively high number of combinations, and the "map", whatever its form, must be written somewhere. Code generators such as TokenMacGuy's may come in handy, but they seem a bit of an overkill in this case. Generators have their place, but I'm not sure this is it. It is one of two cases: if the number of variables and levels is small enough, the generator is not worth it (it takes more time to set up than to write the actual code in the first place); if the number of variables and levels is significant, the generated code is hard to read and hard to maintain.
In a nutshell, my recommendation with regards to making the code more readable (and a bit faster to write) is the table/matrix approach described on the French site.
This solution is in two parts:
a one-time initialization of a 3-dimensional array (for 3 levels), or a "fancier" container structure if preferred (a tree, for example). This is done with code like:
// This is positively more compact / readable
...
FctMap[1][4][0] = fctAlphaOne;
FctMap[1][4][1] = fctAlphaOne;
..
FctMap[3][0][0] = fctBravoCharlie4;
FctMap[3][0][1] = NULL; // impossible case
FctMap[3][0][2] = fctBravoCharlie4; // note how the same fct may serve in mult. places
And a relatively simple snippet wherever the functions need to be called:
if (FctMap[cond1][cond2][cond3]) {
retVal = FctMap[cond1][cond2][cond3](Arg1, Arg2);
if (retVal < 0)
DoSomething(); // anyway we're leveraging the common api to these fct not the switch alternative ....
}
Cases which may prompt one NOT to use the solution above are when the combination space is relatively sparsely populated (many "branches" in the switch "tree" are not used) or when some of the functions require a different set of parameters. For both of these cases, I'd like to plug a solution Joel Goodwin proposed first here, which essentially combines the keys of the several levels into one longer key (with a separator character if need be), flattening the problem back to a long, but single-level, switch statement.
Now...
The real discussion should be about why we need such a mapping/decision tree in the first place. Answering this unfortunately requires understanding the true nature of the underlying application. To be sure, I'm not saying that this is indicative of bad design. A big dispatching section may make sense in some applications. (However, even with the C language, which the French site contributors seemed to disqualify for object-oriented design, it is possible to adopt object-oriented methodology and patterns. Anyway, I'm digressing...) It is possible that the application would overall be better served by alternative design patterns where the "information tree about what to call when" is distributed across several modules and/or several objects.
Apologies for speaking about this in rather abstract terms; it's just the lack of application specifics... The point remains: challenge the idea that we need this big dispatching tree; think of alternative approaches to the application at large.
Alors, bonne chance! ;-)

Depending on the language, some form of hash map with the pair (var, subvar) as the key and first-class functions as the values (or whatever your language offers to best approximate that, e.g. instances of classes extending some proper interface in Java) is likely to provide top performance -- and the utter conciseness of fetching the appropriate function (or whatever;-) from the map based on the key, and executing it, leads to high readability for readers familiar with the language and such functional idioms.
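A minimal Python sketch of this idea, assuming a dictionary keyed by the pair (var, subvar). The handler names and the fallback `process_default` are hypothetical and stand in for the real processing code.

```python
# Hypothetical handlers; each one stands for a cell of the matrix.
def process_a1():
    return "a1"

def process_b2():
    return "b2"

def process_default():
    return "unhandled"

# The pair (var, subvar) is the key, the function itself is the value.
HANDLERS = {
    (1, "a"): process_a1,
    (2, "b"): process_b2,
}

def run(var, subvar):
    """Fetch the handler for the pair and call it, falling back to a default."""
    return HANDLERS.get((var, subvar), process_default)()

print(run(1, "a"), run(3, "c"))
```

Only the combinations that actually exist need an entry, which also suits sparsely populated matrices.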

The idea of a function pointer is probably best (as per mjv, Shhnap). But, if the code under each case is fairly small, it may be overkill and result in more obfuscation than intended. In that case, I might implement something snappy and fast-to-read like this:
string decision = var1.ToString() + var2.ToString() + var3.ToString();
switch(decision)
{
case "1aa":
....
case "1ab":
....
}
Unfamiliar with your particular scenario so perhaps the previous suggestions are more appropriate.

I had exactly the same problem once, albeit for an immanent mess of a 5-parameter nested switch. I figured, why type all these O(N^5) cases myself, why even invent 'nested' function names if the compiler can do this for me. And all this resulted in a 'nested specialized template switch' referring to a 'specialized template database'.
It's a little complicated to write. But I found it worth it: it results in a 'knowledge' database that is very easy to maintain, to debug, to add to etc... And I must admit: a sense of pride.
// the return type: might be an object actually _doing_ something
struct Result {
const char* value;
Result(): value(NULL){}
Result( const char* p ):value(p){};
};
Some variable types for switching:
// types used:
struct A { enum e { a1, a2, a3 }; };
struct B { enum e { b1, b2 }; };
struct C { enum e { c1, c2 }; };
A 'forward declaration' of the knowledge base: the 'api' of the nested switch.
// template database declaration (and default value - omit if not needed)
// specializations may execute code instead of returning values...
template< A::e, B::e, C::e > Result valuedb() { return "not defined"; };
The actual switching logic (condensed)
// template layer 1: work away the first parameter, then the next, ...
struct Switch {
static Result value( A::e a, B::e b, C::e c ) {
switch( a ) {
case A::a1: return SwitchA<A::a1>::value( b, c );
case A::a2: return SwitchA<A::a2>::value( b, c );
case A::a3: return SwitchA<A::a3>::value( b, c );
default: return Result();
}
}
template< A::e a > struct SwitchA {
static Result value( B::e b, C::e c ) {
switch( b ) {
case B::b1: return SwitchB<a, B::b1>::value( c );
case B::b2: return SwitchB<a, B::b2>::value( c );
default: return Result();
}
}
template< A::e a, B::e b > struct SwitchB {
static Result value( C::e c ) {
switch( c ) {
case C::c1: return valuedb< a, b, C::c1 >();
case C::c2: return valuedb< a, b, C::c2 >();
default: return Result();
}
};
};
};
};
And the knowledge base itself
// the template database
//
template<> Result valuedb<A::a1, B::b1, C::c1 >() { return "a1b1c1"; }
template<> Result valuedb<A::a1, B::b2, C::c2 >() { return "a1b2c2"; }
This is how it can be used.
int main()
{
// usage:
Result r = Switch::value( A::a1, B::b2, C::c2 );
return 0;
}

Yes, there is definitely an easier way to do that, both faster and simpler. The idea is basically the same as the one proposed by Alex Martelli. Instead of seeing your problem as two-dimensional, see it as a one-dimensional lookup table.
It means combining var, subvar, subsubvar, etc. to get one unique key and using it as your lookup table entry point.
The way to do it depends on the language used. With Python, combining var, subvar etc. into a tuple and using it as a key in a dictionary is enough.
With C or such it's usually simpler to convert each key to an enum, then combine them using bitwise operators to get a single number that you can use in your switch (that's also an easy way to use switch instead of string comparisons with cascading ifs). You also get another benefit from doing it. It's quite common that several treatments in different branches of the initial switch are the same. With the initial form it's quite difficult to make that obvious: you'll probably have calls to the same functions, but at different points in the code. Now you can just group the identical cases when writing the switch.
I used such transformation several times in production code and it's easy to do and to maintain.
Summarily you can get something like this... the mix function obviously depends on your application specifics.
switch (mix(var, subvar))
{
  case a1:
    process a1;
    break;
  case b1:
    process b1;
    break;
  case c1:
    process c1;
    break;
  case a2:
    process a2;
    break;
  case b2:
    process b2;
    break;
  case c2:
    process c2;
    break;
  case a3:
    process a3;
    break;
  case b3:
    process b3;
    break;
  case c3:
    process c3;
    break;
}
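One possible shape for such a mix function, sketched in Python with bit packing. Everything here is an assumption for illustration: the enums `Var` and `Sub`, the 2-bit width `SUB_BITS`, and the choice of which cells share a treatment.

```python
from enum import IntEnum

class Var(IntEnum):
    ONE = 0
    TWO = 1
    THREE = 2

class Sub(IntEnum):
    A = 0
    B = 1
    C = 2

SUB_BITS = 2  # enough bits to hold any Sub value (0..2)

def mix(var, sub):
    """Pack both enum values into one integer key."""
    return (var << SUB_BITS) | sub

# Cells that share one treatment can be grouped explicitly.
SHARED = {mix(Var.ONE, Sub.A), mix(Var.THREE, Sub.C)}

def process(var, sub):
    key = mix(var, sub)
    if key in SHARED:
        return "shared treatment"
    if key == mix(Var.TWO, Sub.B):
        return "b2 treatment"
    return "default"

print(process(Var.ONE, Sub.A), process(Var.TWO, Sub.B))
```

Because each pair maps to a distinct integer, identical treatments can be listed side by side instead of being scattered across nested branches.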

Perhaps what you want is code generation?
#!/usr/bin/env python3
first = [1, 2, 3]
second = ['a', 'b', 'c']

def emit(first, second):
    # Build the nested switch as text: one outer case per value of `first`,
    # one inner case per value of `second`.
    result = "switch (var)\n{\n"
    for f in first:
        result += "  case {0}:\n    switch (subvar)\n    {{\n".format(f)
        for s in second:
            result += "      case {1}:\n        process {1}{0};\n        break;\n".format(f, s)
        result += "    }\n    break;\n"
    result += "}\n"
    return result

print(emit(first, second))
# open("autogen.c", "w").write(emit(first, second))
This is pretty hard to read, of course, and you might really want a nicer template language to do your dirty work, but this will ease some parts of your task.

If C++ is an option, I would try using virtual functions and maybe double dispatch. That could make it much cleaner. But it will probably only pay off if you have many more cases.
This article on DDJ.com might be a good entry.

If you're just trying to eliminate the two-level switch/case statements (and save some vertical space), you can encode the two variable values into a single value, then switch on it:
// Assumes var is in [1,3] and subvar in [1,3]
// and that var and subvar can be cast to int values
switch (10*var + subvar)
{
  case 10+1:
    process a1;
    break;
  case 10+2:
    process b1;
    break;
  case 10+3:
    process c1;
    break;
  //
  case 20+1:
    process a2;
    break;
  case 20+2:
    process b2;
    break;
  case 20+3:
    process c2;
    break;
  //
  case 30+1:
    process a3;
    break;
  case 30+2:
    process b3;
    break;
  case 30+3:
    process c3;
    break;
  //
  default:
    process error;
    break;
}

If your language is C#, and your choices are short enough and contain no special characters you can use reflection and do it with just a few lines of code. This way, instead of manually creating and maintaining an array of function pointers, use one that the framework provides!
Like this:
using System.Reflection;
...
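void DispatchCall(string var, string subvar)
{
    string functionName = "Func_" + var + "_" + subvar;
    // The target methods below are private, so non-public instance methods must be searched too.
    MethodInfo m = this.GetType().GetMethod(functionName,
        BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic);
    if (m == null) throw new ArgumentException("Invalid function name " + functionName);
    m.Invoke(this, new object[] { /* put parameters here if needed */ });
}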
void Func_1_a()
{
//executed when var=1 and subvar=a
}
void Func_2_charlie()
{
//executed when var=2 and subvar=charlie
}

Solution from developpez.com
Yes, you can optimize it and make it much cleaner. You can use a "Chain of
Responsibility" with a Factory:
import java.util.ArrayList;

public class ProcessFactory {
    private ArrayList<Process> processes = null;

    public ProcessFactory() {
        super();
        processes = new ArrayList<Process>();
        processes.add(new ProcessC1());
        processes.add(new ProcessC2());
        processes.add(new ProcessC3());
        processes.add(new ProcessC4());
        processes.add(new ProcessC5(6));
        processes.add(new ProcessC5(22));
    }

    // Returns the first process that declares it can handle the pair, or null.
    public Process getProcess(int var, int subvar) {
        for (Process process : processes) {
            if (process.canDo(var, subvar)) {
                return process;
            }
        }
        return null;
    }
}
Then, as long as your processes implement a Process interface with a canXXX method (canDo above), you can simply use:
new ProcessFactory().getProcess(var,subvar).launch();

Related

Removing mutability without losing speed

I have a function like this:
fun randomWalk(numSteps: Int): Int {
var n = 0
repeat(numSteps) { n += (-1 + 2 * Random.nextInt(2)) }
return n.absoluteValue
}
This works fine, except that it uses a mutable variable, and I would like to make everything immutable when possible, for better safety and readability. So I came up with an equivalent version that doesn't use any mutable variables:
fun randomWalk_seq(numSteps: Int): Int =
generateSequence(0) { it + (-1 + 2 * Random.nextInt(2)) }
.elementAt(numSteps)
.absoluteValue
This also works fine and produces the same results, but it takes 3 times longer.
I used the following way to measure it:
@OptIn(ExperimentalTime::class)
fun main() {
    val numSamples = 100000
    val numSteps = 15708
    repeat(5) {
        val randomWalkSamples: IntArray
        val duration = measureTime {
            randomWalkSamples = IntArray(numSamples) { randomWalk(numSteps) }
        }
        println(duration)
    }
}
I know it's a bit hacky (I could have used JMH but this is just a quick test - at least I know that measureTime uses a monotonic clock). The results for the iterative (mutable) version:
2.965358406s
2.560777033s
2.554363661s
2.564279403s
2.608323586s
As expected, the first line shows it took a bit longer on the first run due to the warming up of the JIT, but the next 4 lines have fairly small variation.
After replacing randomWalk with randomWalk_seq:
6.636866719s
6.980840906s
6.993998111s
6.994038706s
7.018054467s
Somewhat surprisingly, I don't see any warmup time: the first line is always shorter than the following 4 lines, every time I run this. Also, every time I run it, the duration keeps increasing, with line 5 always being the longest.
Can someone explain the findings, and also is there any way of making this function not use any mutable variables but still have performance that is close to the mutable version?
Your solution is slower for two main reasons: boxing and the complexity of the iterator used by generateSequence()'s Sequence implementation.
Boxing happens because a Sequence uses its types generically, so it cannot use primitive 32-bit Ints directly, but must wrap them in classes and unwrap them when retrieving the items.
You can see the complexity of the iterator by Ctrl+clicking the generateSequence function to view the source code.
@Михаил Нафталь's suggestion is faster because it avoids the complex iterator of the sequence, but it still has boxing.
I tried writing an overload of sumOf that uses IntProgression directly instead of Iterable<T>, so it won't use boxing, and that resulted in equivalent performance to your imperative code with the var. As you can see, it's inline, and when put together with the { -1 + 2 * Random.nextInt(2) } lambda suggested by @Михаил Нафталь, the resulting compiled code will be equivalent to your imperative code.
inline fun IntProgression.sumOf(selector: (Int) -> Int): Int {
var sum: Int = 0.toInt()
for (element in this) {
sum += selector(element)
}
return sum
}
Ultimately, I don't think you're buying yourself much in the way of code clarity by removing a single var in such a small function. I would say the sequence code is arguably harder to read. vars may add to code complexity in complex algorithms, but I don't think they do in such simple algorithms, especially when there's only one of them and it's local to the function.
Equivalent immutable one-liner is:
fun randomWalk2(numSteps: Int) =
(1..numSteps).sumOf { -1 + 2 * Random.nextInt(2) }.absoluteValue
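Probably even more performant would be to replace (1..numSteps).sumOf { -1 + 2 * Random.nextInt(2) } with -numSteps + 2 * (1..numSteps).sumOf { Random.nextInt(2) }, so that you'll have one multiplication and n additions instead of n multiplications and (2*n-1) additions: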
fun randomWalk3(numSteps: Int) =
(-numSteps + 2 * (1..numSteps).sumOf { Random.nextInt(2) }).absoluteValue
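The rewrite rests on a simple identity. Writing x_i for the i-th value returned by Random.nextInt(2) (so each x_i is 0 or 1) and n for numSteps, the constant term can be pulled out of the sum:

```latex
\[
\sum_{i=1}^{n} \left(-1 + 2x_i\right)
  = \sum_{i=1}^{n} (-1) + 2\sum_{i=1}^{n} x_i
  = -n + 2\sum_{i=1}^{n} x_i ,
  \qquad x_i \in \{0, 1\}.
\]
```

The left-hand side is the per-step sum of the one-liner; the right-hand side is the form used in randomWalk3, with a single multiplication outside the sum.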
Update
As @Tenfour04 noted, there is no specific stdlib implementation for IntProgression.sumOf, so it's resolved to Iterable<T>.sumOf, which will add unnecessary overhead for int boxing.
So, it's better to use IntArray here instead of IntProgression:
fun randomWalk4(numSteps: Int) =
(-numSteps + 2 * IntArray(numSteps).sumOf { Random.nextInt(2) }).absoluteValue
I still encourage you to check all of this with JMH.
I think "Removing mutability without losing speed" is the wrong title, because
mutability matters for the flow that the program is trying to achieve.
You are using a var inside a function, and that var can never be changed from outside the function, which is what the mutability concern is about.
If we got rid of var everywhere, why would we need it in programming at all?

Concise notation for assigning `unique_ptr`?

I have a pointer to a parent class and I want to assign a new child object to that pointer conditionally. Right now, the syntax I have is rather lengthy:
std::unique_ptr<ParentClass> parentPtr;
if (...) {
parentPtr = std::unique_ptr<ParentClass>(new ChildClass1());
} else {
parentPtr = std::unique_ptr<ParentClass>(new ChildClass2());
}
Is there a good way of making this more readable / less lengthy?
Two possibilities would be:
std::unique_ptr<ParentClass> parentPtr(condition ?
(ParentClass*)new ChildClass1() :
(ParentClass*)new ChildClass2());
If condition is complicated, just assign a boolean to it and then write the construction. This solution only works for a binary condition though.
Another is to embrace C++14, and use
parentPtr = std::make_unique<ChildClass>();
First off, the "obvious" solution C ? new X : new Y does not work, since even if X and Y have a common base class A, the types X * and Y * have no common type. This is actually not so surprising after all if you consider that a class can have many bases (direct or indirect) and a given type may appear as a base multiple times.
You could make the conditional operator work by inserting a cast:
A * p = C ? static_cast<A *>(new X) : static_cast<A *>(new Y);
But this would quickly get long and tedious to read when you try to apply this to your real situation.
However, as for std::unique_ptr, it offers the reset function which can be used to good effect here:
std::unique_ptr<A> p;
if (C)
{
p.reset(new X);
}
else
{
p.reset(new Y);
}
Now even if the actual new expressions are long, this is still nicely readable.

about memory barriers (why the following example is error)

I read one article,
https://www.kernel.org/doc/Documentation/memory-barriers.txt
In this doc, the following example is shown:
So don't leave out the ACCESS_ONCE().
It is tempting to try to enforce ordering on identical stores on both
branches of the "if" statement as follows:
q = ACCESS_ONCE(a);
if (q) {
barrier();
ACCESS_ONCE(b) = p;
do_something();
} else {
barrier();
ACCESS_ONCE(b) = p;
do_something_else();
}
Unfortunately, current compilers will transform this as follows at high
optimization levels:
q = ACCESS_ONCE(a);
barrier();
ACCESS_ONCE(b) = p; /* BUG: No ordering vs. load from a!!! */
if (q) {
/* ACCESS_ONCE(b) = p; -- moved up, BUG!!! */
do_something();
} else {
/* ACCESS_ONCE(b) = p; -- moved up, BUG!!! */
do_something_else();
}
I don't understand why "moved up" is a bug. If I wrote this code, I would move ACCESS_ONCE(b) up myself, because both if/else branches execute the same code.
It isn't so much that the moving up is a bug, it's that it exposes a bug in the code.
The intention was to use the conditional on q (from a), to ensure that the write to b is done after the read from a; because both stores are "protected" by a conditional and "stores are not speculated", the CPU shouldn't be making the store until it knows the outcome of the condition, which requires the read to have been done first.
The compiler defeats this intention by seeing that both branches of the conditional start with the same thing, so in a formal sense those statements are not conditioned. The problem with this is explained in the next paragraph:
Now there is no conditional between the load from 'a' and the store to
'b', which means that the CPU is within its rights to reorder them:
The conditional is absolutely required, and must be present in the
assembly code even after all compiler optimizations have been applied.
I'm not experienced enough to know exactly what is meant by barrier(), but apparently it is not powerful enough to enforce the ordering between the two independent memory operations.

Early return statements and cyclomatic complexity

I prefer this writing style with early returns:
public static Type classify(int a, int b, int c) {
if (!isTriangle(a, b, c)) {
return Type.INVALID;
}
if (a == b && b == c) {
return Type.EQUILATERAL;
}
if (b == c || a == b || c == a) {
return Type.ISOSCELES;
}
return Type.SCALENE;
}
Unfortunately, every return statement increases the cyclomatic complexity metric calculated by Sonar. Consider this alternative:
public static Type classify(int a, int b, int c) {
final Type result;
if (!isTriangle(a, b, c)) {
result = Type.INVALID;
} else if (a == b && b == c) {
result = Type.EQUILATERAL;
} else if (b == c || a == b || c == a) {
result = Type.ISOSCELES;
} else {
result = Type.SCALENE;
}
return result;
}
The cyclomatic complexity of this latter approach reported by Sonar is lower than the first, by 3. I have been told that this might be the result of a wrong implementation of the CC metrics. Or is Sonar correct, and this is really better? These related questions seem to disagree with that:
https://softwareengineering.stackexchange.com/questions/118703/where-did-the-notion-of-one-return-only-come-from
https://softwareengineering.stackexchange.com/questions/18454/should-i-return-from-a-function-early-or-use-an-if-statement
If I add support for a few more triangle types, the return statements will add up to make a significant difference in the metric and cause a Sonar violation. I don't want to stick a // NOSONAR on the method, as that might mask other problems by new features/bugs added to the method in the future. So I use the second version, even though I don't really like it. Is there a better way to handle the situation?
Your question relates to https://jira.codehaus.org/browse/SONAR-4857. For the time being, all SonarQube analysers mix cyclomatic complexity and essential complexity. From a theoretical point of view, a return statement should not increment the cyclomatic complexity, and this change is going to happen in the SonarQube ecosystem.
Not really an answer, but way too long for a comment.
This SONAR rule seems to be thoroughly broken. You could rewrite
b == c || a == b || c == a
as
b == c | a == b | c == a
and gain two points in this strange game (and maybe even some speed as branching is expensive; but this is on the discretion of the JITc, anyway).
The old rule claims that the cyclomatic complexity is related to the number of tests. The new one doesn't, and that's a good thing, as the number of meaningful tests for both of your snippets is obviously exactly the same.
Is there a better way to handle the situation?
Actually, I do have an answer: For each early return use | instead of || once. :D
Now seriously: there is a bug report requesting annotations that allow disabling a single rule, and it is marked as fixed. I didn't look any further.
Since the question is also about early return statements as a coding style, it would be helpful to consider the effect of size on the return style. If the method or function is small, less than say 30 lines, early returns are no problem, because anyone reading the code can see the whole method at a glance including all of the returns. In larger methods or functions, an early return can be a trap unintentionally set for the reader. If the early return occurs above the code the reader is looking at, and the reader doesn't know the return is above or forgets that it is above, the reader will misunderstand the code. Production code can be too big to fit on one screen.
So whoever is managing a code base for complexity should allow for method size in cases where the complexity appears to be a problem. If the code takes more than one screen, a more pedantic return style may be justified. If the method or function is small, don't worry about it.
(I use Sonar and have experienced this same issue.)

performance of many if statements/switch cases

If I had literally 1000s of simple if statements or switch statements
ex:
if 'a':
return 1
if 'b':
return 2
if 'c':
return 3
...
...
Would the performance of creating trivial if statements be faster when compared to searching a list for something? I imagined that because every if statement must be tested until the desired output is found (worst case O(n)) it would have the same performance if I were to search through a list. This is just an assumption. I have no evidence to prove this. I am curious to know this.
You could potentially put these things into delegates that are then stored in a map, the key of which is the input you've specified.
C# Example:
// declare a map. The input(key) is a char, and we have a function that will return an
// integer based on that char. The function may do something more complicated.
var map = new Dictionary<char, Func<char, int>>();
// Add some:
map['a'] = (c) => { return 1; };
map['b'] = (c) => { return 2; };
map['c'] = (c) => { return 3; };
// etc... ad infinitum.
Now that we have this map, we can quite cleanly return something based on the input
public int Test(char c)
{
Func<char, int> func;
if(map.TryGetValue(c, out func))
return func(c);
return 0;
}
In the above code, we can call Test and it will find the appropriate function to call (if present). This approach is better (imho) than a list as you'd have to potentially search the entire list to find the desired input.
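For comparison, here is a small Python sketch of the same contrast, assuming the question's snippet is meant as an if ladder over a key. The names `ladder`, `TABLE` and `lookup` are illustrative only. The ladder tests each condition in turn, so its worst case grows with the number of branches, while a dictionary lookup takes constant time on average.

```python
def ladder(key):
    """Chain of tests: each comparison runs until one matches."""
    if key == "a":
        return 1
    if key == "b":
        return 2
    if key == "c":
        return 3
    return None

# The same mapping as data: a hash lookup instead of sequential tests.
TABLE = {"a": 1, "b": 2, "c": 3}

def lookup(key):
    """Return the mapped value, or None when the key is unknown."""
    return TABLE.get(key)

print(ladder("c"), lookup("c"))
```

Both functions return the same results; only the cost of finding the match differs as the number of cases grows.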
This depends on the language and the compiler/interpreter you use. In many interpreted languages, the performance will be the same; in other languages, the switch statement gives the compiler crucial additional information that it can use to optimize the code.
In C, for instance, I expect a long switch statement like the one you present to use a lookup table under the hood, avoiding explicit comparison with all the different values. With that, your switch decision takes the same time no matter how many cases you have. A compiler might also hardcode a binary search for the matching case. These optimizations are typically not performed when evaluating a long else if() ladder.
In any case, I repeat, it depends on the interpreter/compiler: if your compiler optimizes else if() ladders but not switch statements, what it could do with a switch statement is quite irrelevant. However, for mainstream languages, you should be able to expect all constructs to be optimized.
Apart from that, I advise to use a switch statement wherever applicable, it carries a lot more semantic information to the reader than an equivalent else if() ladder.

Resources