[It seems my explanations and expectations are not clear at all, so I added precision on how I'd like to use the feature at the end of the post]
I'm currently working on grammars using boost qi. I had a loop construction for a rule cause I needed to build it from the elements of a vector. I have re-written it with simple types, and it looks like:
#include <string>
// using boost 1.43.0
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/qi_eps.hpp>
#include <boost/spirit/include/phoenix.hpp>
namespace bqi = boost::spirit::qi;
typedef const char* Iterator;
// function that you can find [here][1]
template<typename P> void test_phrase_parser(char const* input, P const& p, bool full_match = true);
int main()
{
// my working rule type:
bqi::rule<Iterator, std::string()> myLoopBuiltRule;
std::vector<std::string> v;
std::vector<std::string>::const_iterator iv;
v.push_back("abc");
v.push_back("def");
v.push_back("ghi");
v.push_back("jkl");
myLoopBuiltRule = (! bqi::eps);
for(iv = v.begin() ; iv != v.end() ; iv++)
{
myLoopBuiltRule =
myLoopBuiltRule.copy() [ bqi::_val = bqi::_1 ]
| bqi::string(*iv) [ bqi::_val = bqi::_1 ]
;
}
debug(myLoopBuiltRule);
char s[] = " abc ";
test_phrase_parser(s, myLoopBuiltRule);
}
(Looks like here does not want to be replaced by corresponding hyperlink, so here is the address to find function test_phrase_parser(): http://www.boost.org/doc/libs/1_43_0/libs/spirit/doc/html/spirit/qi/reference/basics.html)
All was for the best in the best of all worlds... until I had to pass an argument to this rule. Here is the new rule type:
// my not-anymore-working rule type:
bqi::rule<Iterator, std::string(int*)> myLoopBuiltRule;
'int*' type is for example purpose only, my real pointer is adressing a much more complex class... but still a mere pointer.
I changed my 'for' loop accordingly, i.e.:
for(iv = v.begin() ; iv != v.end() ; iv++)
{
myLoopBuiltRule =
myLoopBuiltRule.copy()(bqi::_r1) [ bqi::_val = bqi::_1 ]
| bqi::string(*iv) [ bqi::_val = bqi::_1 ]
;
}
I had to add a new rule because test_phrase_parser() cannot guess which value is to be given to the int pointer:
bqi::rule<Iterator> myInitialRule;
And change everything that followed the for loop:
myInitialRule = myLoopBuiltRule((int*)NULL);
debug(myLoopBuiltRule);
char s[] = " abc ";
test_phrase_parser(s, myInitialRule);
Then everything crashed:
/home/sylvain.darras/software/repository/software/external/include/boost/boost_1_43_0/boost/spirit/home/qi/nonterminal/rule.hpp:199: error: no matching function for call to ‘assertion_failed(mpl_::failed************ (boost::spirit::qi::rule<Iterator, T1, T2, T3, T4>::operator=(const Expr&)
Then I got crazy and tried:
myLoopBuiltRule =
myLoopBuiltRule.copy(bqi::_r1) [ bqi::_val = bqi::_1 ]
| bqi::string(*iv) [ bqi::_val = bqi::_1 ]
-->
error: no matching function for call to ‘boost::spirit::qi::rule<const char*, std::string(int*), boost::fusion::unused_type, boost::fusion::unused_type, boost::fusion::unused_type>::copy(const boost::phoenix::actor<boost::spirit::attribute<1> >&)’
Then I got mad and wrote:
myLoopBuiltRule =
myLoopBuiltRule(bqi::_r1) [ bqi::_val = bqi::_1 ]
| bqi::string(*iv) [ bqi::_val = bqi::_1 ]
Which compiles since it is perfectly syntactically correct, but which magnificently stack overflows coz it happily, nicely, recursively, calls itself to death...
Then I lost my mind and typed:
myLoopBuiltRule =
jf jhsgf jshdg fjsdgh fjsg jhsdg jhg sjfg jsgh df
Which, as you probably expect, has failed to compile.
You imagine that before writing the above novel, I checked out on the web, but didn't find out anything related to copy() and argument passing in the same time. Has anyone already experienced this problem ? Have I missed something ?
Be assured that any help will be really really appreciated.
PS: Great thanks to hkaiser who has, without knowing it, answered a lot of my boost::qi problems through google (but this one).
Further information:
The purpose of my parser is to read files written in a given language L. The purpose of my post is to propagate my "context" (i.e.: variable definitions and especially constant values, so I can compute expressions).
The number of variable types I handle is small, but it's bound to grow, so I keep these types in a container class. I can loop on these managed types.
So, let's consider a pseudo-algorithm of what I would like to achive:
LTypeList myTypes;
LTypeList::const_iterator iTypes;
bqi::rule<Iterator, LType(LContext*)> myLoopBuiltRule;
myLoopBuiltRule = (! bqi::eps);
for(iTypes = myTypes.begin() ; iTypes != myTypes.end() ; iTypes++)
{
myLoopBuiltRule =
myLoopBuiltRule.copy()(bqi::_r1) [ bqi::_val = bqi::_1 ]
| iTypes->getRule()(bqi::_r1) [ bqi::_val = bqi::_1 ]
}
This is done during initialization and then myLoopBuiltRule is used and reused with different LContext*, parsing multiple types. And since some L types can have bounds, which are integer expressions, and that these integer expressions can exhibit constants, I (think that I) need my inherited attribute to take my LContext around and be able to compute expression value.
Hope I've been clearer in my intentions.
Note I just extended my answer with a few more informational links. In this particular case I have a hunch that you could just get away with the Nabialek trick and replacing the inherited attribute with a corresponding qi::locals<> instead. If I have enough time, I might work out a demonstration later.
Caveats, expositioning the problem
Please be advised that there are issues when copying proto expression trees and spirit parser expressions in particular - it will create dangling references as the internals are not supposed to live past the end of the containing full expressions. See BOOST_SPIRIT_AUTO on Zero to 60 MPH in 2 seconds!
Also see these answers which also concerns themselves with building/composing rules on the fly (at runtime):
Generating Spirit parser expressions from a variadic list of alternative parser expressions
Can Boost Spirit Rules be parameterized which demonstrates how to return rules from a function using boost::proto::deepcopy (like BOOST_SPIRIT_AUTO does, actually)
Nabialek Trick
In general, I'd very strongly advise against combining rules at runtime. Instead, if you're looking to 'add alternatives' to a rule at runtime, you can always use qi::symbols<> instead. The trick is to store a rule in the symbol-table and use qi::lazy to call the rule. In particular, this is known as the Nabialek Trick.
I have a toy command-line arguments parser here that demonstrates how you could use this idiom to match a runtime-defined set of command line arguments:
https://gist.github.com/sehe/2a556a8231606406fe36
Limitations of qi::lazy, what's next?
Unfortunately, qi::lazy does not support inherited arguments see e.g.
http://boost.2283326.n4.nabble.com/pass-inhertited-attributes-to-nabialek-trick-td2679066.html
You might be better off writing a custom parser component, as documented here:
http://boost-spirit.com/home/articles/qi-example/creating-your-own-parser-component-for-spirit-qi/
I'll try to find some time to work out a sample that replaces inherited arguments by qi::locals later.
Related
I want to implement simple image processing routine quite similar to Auto Levels, so need to precalculate thresholds, make LUT and then make histogram stretching/normalization applying LUT.
But my question is not about algorithm side, it is about using extern defined functions, because i need a couple of while cycles for LUT calculation and i think using extern is good for it.
I tried following examples from Halide sources and checked this question too
I use AOT compilation currently testing on PC(winx64), aiming for arm in future, and have the following generator code:
Var x("x"), y("y");
Func make_a_root{ "make_a_root" };
Buffer<bitType> Lut{256, "lut"};
make_a_root(x, y) = inputY(x, y);
ExternFuncArgument arg = make_a_root;
Func g;
g.define_extern("generateAutoLevelsLut", { arg }, UInt(8), 2, Halide::NameMangling::CPlusPlus);
g.compute_root();
inputY has Input<Buffer<uint8_t>> inputY{ "input_y", 2 }; type
First i just want to make it run the call, so function body makes nothing but print (can i define function in same cpp file as generator?)
int generateAutoLevelsLut(halide_buffer_t * input, halide_buffer_t * out)
{
printf("\nextern call\n");
return 0;
}
I tried default mangling with extern "C" too.
Never succeeded getting print message though, so my question is, why this happenin. Is it just misunderstanding on some syntax or are there any problem with calling extern function from generator code?
EDIT:
Added usage of extern like 'out(x,y) = g(x,y)' (lvalue should be actually used!) , and it started to make a call. Now struggling with host == NULL. Digging into bounds inference stuff.
EDIT 2:
I added basic bounds inference checks, now it does not crash.. The next problem i have now, is: Is it possible to make call to external function, without actually influencing output result in direct manner?
Let me concretise what i mean.
The generator code looks like following:
Buffer<bitType> lut{256, "lut"};
args[0] = inputY;
args[1] = lut;
g.define_extern("generateAutoLevelsLut", args, { UInt(8) }, 2, Halide::NameMangling::C);
outputY(x, y) = g(x, y); // Call line
g.compute_root();
outputY.compute_root();
Extern functon code fills second input 'lut' with some dummy LUT:
Halide::Runtime::Buffer<uint16_t> im2Buffer(*input2);
Mat im2Mat(Size(im2Buffer.width(), im2Buffer.height()), CVC_8U, im2Buffer.data(), im2Buffer.stride(1));
for (int i = 0; i < 256; i++)
im2Mat.at<uchar>(i) = i;
And if i comment the 'Call line' in generator, it optimizes away the call to extern at all.
I want to make something like:
Func lutRoot;
lutRoot(x) = lut(x); // to convert from Buffer
outputY(x, y) = autoLevelsPrecalcLut(inputY, lutRoot)(x, y);
And here lut is implicitly passed into extern and filled there. But it doesn't work, as well as other variants which ignore the modification of 'output'... or maybe this whole approach is wrong?
Any suggestions? Thanks
EDIT 3:
Solved task avoiding extern calls, replacing while cycles with argmin and RDom combo, but original question about extern still remains
That should work (or fail with a linker error if it wasn't going to). It's possible the Halide pipeline doesn't think it needs to call your extern function. E.g. does something use the result?
Alternatively, try stderr instead, just in case it's an output stream buffering issue. That extern function definition is likely to cause Halide to error out (because it doesn't reply to the bounds inference query), and erroring out calls abort by default, which would swallow things printed to stdout.
Consider the below function,
public static int foo(int x){
return x + 5;
}
Now, let us call it,
int in = /*Input taken from the user*/;
int x = foo(10); // ... (1)
int y = foo(in); // ... (2)
Here, can the compiler change
int x = foo(10); // ... (1)
to
int x = 15; // ... (1)
by evaluating the function call during compile time since the input to the function is available during compile time ?
I understand this is not possible during the call marked (2) because the input is available only during run time.
I do not want to know a way of doing it in any specific language. I would like to know why this can or can not be a feature of a compiler itself.
C++ does have a method for this:
Have a read up on the 'constexpr' keyword in C++11, it allows compile time evaluation of functions.
They have a limitation: the function must be a return statement (not multiple lines of code), but can call other constexpr functions (C++14 does not have this limitation AFAIK).
static constexpr int foo(int x){
return x + 5;
}
EDIT:
Why a compiler might not evaluate a function (just my guess):
It might not be appropriate to remove a function by evaluating it without being told.
The function could be used in different compilation units, and with static/dynamic inputs: thus evaluating it in some circumstances and adding a call in other places.
This use would provide inconsistent execution times (especially on a deterministic platform like AVR) where timing may be important, or at least need to be predictable.
Also interrupts (and how the compiler interacts with them) may come into play here.
EDIT:
constexpr is actually stronger -- it requires that the compiler do this. The compiler is free to fold away functions without constexpr, but the programmer can't rely on it doing so.
Can you give an example in the case where the user would have benefited from this but the compiler chose not to do it ?
inline functions may, or may not resolve to constant expressions which could be optimized into the end result.
However, a constexpr guarantees it. An inline function cannot be used as a compile time constant whereas constexpr can allow you to formulate compile time functions and more so, objects.
A basic example where constexpr makes a guarantee that inline cannot.
constexpr int foo( int a, int b, int c ){
return a+b+c;
}
int array[ foo(1, 2, 3) ];
And the same as a simple object.
struct Foo{
constexpr Foo( int a, int b, int c ) : val(a+b+c){}
int val;
};
constexpr Foo foo( 1,2,4 );
int array[ foo.val ];
Unless foo.val is a compile time constant, the code above will not compile.
Even as just a function, an inline function has no guarantee. And the linker can also do inlining over multiple compilation units, after the syntax has been compiled (array bounds checked for integer constants).
This is kind of like meta-programming, but without the templates. Of course these examples do not do the topic justice, however very complex solutions would benefit from the ability to use objects and functional programming to achieve a result.
Yes, evaluation can happen during compile time. This comes under the heading of constant folding and function inlining, both of which are common optimizations for optimizing compilers.
Many languages do not have strong distinction between "compile time" and "run time", but the general rule is that the language defines an "execution model" which defines the behavior of any particular program with any particular input (or specifies that it is undefined). The compiler must produce an executable that can read any input and produce the corresponding output as defined by the execution model. What happens inside the executable doesn't matter -- as long as the externally viewed behavior is correct.
Here "input", "output" and "behavior" includes all possible interactions with the environment that are defined in the execution model, including timing effects.
first of all, I apologize for the overly verbose question. I couldn't think of any other way to accurately summarize my problem... Now on to the actual question:
I'm currently experimenting with C++0x rvalue references... The following code produces unwanted behavior:
#include <iostream>
#include <utility>
struct Vector4
{
float x, y, z, w;
inline Vector4 operator + (const Vector4& other) const
{
Vector4 r;
std::cout << "constructing new temporary to store result"
<< std::endl;
r.x = x + other.x;
r.y = y + other.y;
r.z = z + other.z;
r.w = w + other.w;
return r;
}
Vector4&& operator + (Vector4&& other) const
{
std::cout << "reusing temporary 2nd operand to store result"
<< std::endl;
other.x += x;
other.y += y;
other.z += z;
other.w += w;
return std::move(other);
}
friend inline Vector4&& operator + (Vector4&& v1, const Vector4& v2)
{
std::cout << "reusing temporary 1st operand to store result"
<< std::endl;
v1.x += v2.x;
v1.y += v2.y;
v1.z += v2.z;
v1.w += v2.w;
return std::move(v1);
}
};
int main (void)
{
Vector4 r,
v1 = {1.0f, 1.0f, 1.0f, 1.0f},
v2 = {2.0f, 2.0f, 2.0f, 2.0f},
v3 = {3.0f, 3.0f, 3.0f, 3.0f},
v4 = {4.0f, 4.0f, 4.0f, 4.0f},
v5 = {5.0f, 5.0f, 5.0f, 5.0f};
///////////////////////////
// RELEVANT LINE HERE!!! //
///////////////////////////
r = v1 + v2 + (v3 + v4) + v5;
return 0;
}
results in the output
constructing new temporary to store result
constructing new temporary to store result
reusing temporary 1st operand to store result
reusing temporary 1st operand to store result
while I had hoped for something like
constructing new temporary to store result
reusing temporary 1st operand to store result
reusing temporary 2nd operand to store result
reusing temporary 2nd operand to store result
After trying to re-enact what the compiler was doing (I'm using MinGW G++ 4.5.2 with option -std=c++0x in case it matters), it actually seems quite logical. The standard says that arithmetic operations of equal precedence are evaluated/grouped left-to-right (why I assumed right-to-left I don't know, I guess it's more intuitive to me). So what happened here is that the compiler evaluated the sub-expression (v3 + v4) first (since it's in parentheses?), and then began matching the operations in the expression left-to-right against the operator overloads, resulting in a call to Vector4 operator + (const Vector4& other) for the sub-expression v1 + v2. If I want to avoid the unnecessary temporary, I'd have to make sure that no more than one lvalue operand appears to the immediate left of any parenthesized sub-expression, which is counter-intuitive to anyone using this "library" and innocently expecting optimal performance (as in minimizing the creation of temporaries).
(I'm aware that there's ambiguity in my code regarding operator + (Vector4&& v1, const Vector4& v2) and operator + (Vector4&& other) when (v3 + v4) is to be added to the result of v1 + v2, resulting in a warning. But it's harmless in my case and I don't want to add yet another overload for two rvalue reference operands - anyone know if there's a way to disable this warning in gcc?)
Long story short, my question boils down to: Is there any way or pattern (preferably compiler-independent) this vector class could be rewritten to enable arbitrary use of parentheses in expressions that still results in the "optimal" choice of operator overloads (optimal in terms of "performance", i.e. maximizing the binding to rvalue references)? Perhaps I'm asking for too much though and it's impossible... if so, then that's fine too. I just want to make sure I'm not missing anything.
Many thanks in advance
Addendum
First thanks to the quick responses I got, within minutes (!) - I really should have started posting here sooner...
It's becoming tedious replying in the comments, so I think a clarification of my intent with this class design is in order. Maybe you can point me to a fundamental conceptual flaw in my thought process if there is one.
You may notice that I don't hold any resources in the class like heap memory. Its members are only scalar types even. At first sight this makes it a suspect candidate for move-semantics based optimizations (see also this question that actually helped me a great deal grasping the concepts behind rvalue references).
However, since the classes this one is supposed to be a prototype for will be used in a performance-critical context (a 3D engine to be precise), I want to optimize every little thing possible. Low-complexity algorithms and maths-related techniques like look-up tables should of course make up the bulk of the optimizations as anything else would simply be addressing the symptoms and not eradicating the real reason for bad performance. I am well aware of that.
With that out of the way, my intent here is to optimize algebraic expressions with vectors and matrices that are essentially plain-old-data structs without pointers to data in them (mainly due to the performance drawbacks you get with data on the heap [having to dereference additional pointers, cache considerations etc.]).
I don't care about move-assignment or construction, I just don't want more temporaries being created during the evaluation of a complicated algebraic expression than absolutely necessary (usually just one or two, e.g. a matrix and a vector).
Those are my thoughts that might be erroneous. If they are, please correct me:
To achieve this without relying on RVO, return-by-reference is necessary (again: keep in mind I don't have remote resources, only scalar data members).
Returning by reference makes the function-call expression an lvalue, implying the returned object is not a temporary, which is bad, but returning by rvalue reference makes the function-call expression an xvalue (see 3.10.1), which is okay in the context of my approach (see 4)
Returning by reference is dangerous, because of the possibly short lifetime of objects, but:
temporaries are guaranteed to live until the end of the evaluation of the expression they were created in, therefore:
making it safe to return by reference from those operators that take at least one rvalue-reference as their argument, if the object referenced by this rvalue reference argument is the one being returned by reference. Therefore:
Any arbitrary expression that only employs binary operators can be evaluated by creating only one temporary when not more than one PoD-like type is involved, and the binary operations don't require a temporary by nature (like matrix multiplication)
(Another reason to return by rvalue-reference is because it behaves like returning by value in terms of rvalue-ness of the function-call expression; and it's required for the operator/function-call expression to be an rvalue in order to bind to subsequent calls to operators that take rvalue references. As stated in (2), calls to functions that return by reference are lvalues, and would therefore bind to operators with the signature T operator+(const T&, const T&), resulting in the creation of an unnecessary temporary)
I could achieve the desired performance by using a C-style approach of functions like add(Vector4 *result, Vector4 *v1, Vector4 *v2), but come on, we're living in the 21st century...
In summary, my goal is creating a vector class that achieves the same performance as the C-approach using overloaded operators. If that in itself is impossible, than I guess it can't be helped. But I'd appreciate if someone could explain to me why my approach is doomed to fail (the left-to-right operator evaluation issue that was the initial reason for my post aside, of course).
As a matter of fact, I've been using the "real" vector class this one is a simplification of for a while without any crashes or corrupted memory so far. And in fact, I never actually return local objects as references, so there shouldn't be any problems. I dare say what I'm doing is standard-compliant.
Any help on the original issue would of course be appreciated as well!
many thanks for all the patience again
You should not return an rvalue reference, you should return a value. In addition, you should not specify both a member and a free operator+. I'm amazed that even compiled.
Edit:
r = v1 + v2 + (v3 + v4) + v5;
How could you possibly only have one temporary value when you're performing two sub-computations? That's just impossible. You can't re-write the Standard and change this.
You will just have to trust your users to do something not completely stupid, like write the above line of code, and expect to have just one temporary.
I recommend modeling your code after the basic_string operator+() found in chapter 21 of N3225.
I am using Win32::API to call an arbitary function exported in a DLL which accepts a C++ structure pointer.
struct PluginInfo {
int nStructSize;
int nType;
int nVersion;
int nIDCode;
char szName[ 64 ];
char szVendor[ 64 ];
int nCertificate;
int nMinAmiVersion;
};
As we need to use the "pack" function to construct the structure and need to pass an argument
my $name = " " x 64;
my $vendor = " " x 64;
my $pluginInfo = pack('IIIIC64C64II',0,0,0,0,$name,$vendor,0,0);
Its not constructing the structure properly.
It seems that length argument applied to C will gobble those many arguments.
Can some one please suggest the best way to construct this structure form Perl and passon to dll call.
Thanks in advance,
Naga Kiran
Use Z (NUL-padded string) in your template, as in
my $pluginInfo = pack('IIIIZ64Z64II',0,0,0,0,$name,$vendor,0,0);
Also, take a look at Win32::API::Struct, which is part of the Win32::API module.
For anything complicated, check out Convert::Binary::C. It may seem daunting at first, but once you realize its power, it's an eye opener.
Update: Let me add a bit of information. You need to have a look at a specific section of the module's manpage for the prime reason to use it. I'll quote it for convenience:
Why use Convert::Binary::C?
Say you want to pack (or unpack) data
according to the following C
structure:
struct foo {
char ary[3];
unsigned short baz;
int bar;
};
You could of course use Perl's pack
and unpack functions:
#ary = (1, 2, 3);
$baz = 40000;
$bar = -4711;
$binary = pack 'c3 Si', #ary, $baz, $bar;
But this implies that the struct
members are byte aligned. If they were
long aligned (which is the default for
most compilers), you'd have to write
$binary = pack 'c3 x S x2 i', #ary, $baz, $bar;
which doesn't really increase
readability.
Now imagine that you need to pack the
data for a completely different
architecture with different byte
order. You would look into the pack
manpage again and perhaps come up with
this:
$binary = pack 'c3 x n x2 N', #ary, $baz, $bar;
However, if you try to unpack $foo
again, your signed values have turned
into unsigned ones.
All this can still be managed with
Perl. But imagine your structures get
more complex? Imagine you need to
support different platforms? Imagine
you need to make changes to the
structures? You'll not only have to
change the C source but also dozens of
pack strings in your Perl code. This
is no fun. And Perl should be fun.
Now, wouldn't it be great if you could
just read in the C source you've
already written and use all the types
defined there for packing and
unpacking? That's what
Convert::Binary::C does.
Is it possible to optimize this kind of (matrix) algorithm:
// | case 1 | case 2 | case 3 |
// ------|--------|--------|--------|
// | | | |
// case a| a1 | a2 | a3 |
// | | | |
// case b| b1 | b2 | b3 |
// | | | |
// case c| c1 | c2 | c3 |
// | | | |
switch (var)
{
case 1:
switch (subvar)
{
case a:
process a1;
case b:
process b1;
case c:
process c1;
}
case 2:
switch (subvar)
{
case a:
process a2;
case b:
process b2;
case c:
process c2;
}
case 3:
switch (subvar)
{
case a:
process a3;
case b:
process b3;
case c:
process c3;
}
}
The code is fairly simple but you have to imagine more complex with more "switch / case".
I work with 3 variables. According they take the values 1, 2, 3 or a, b, c or alpha, beta, charlie have different processes to achieve. Is it possible to optimize it any other way than through a series of "switch / case?
(Question already asked in french here).
Edit: (from Dran Dane's responses to comments below. These might as well be in this more prominent place!)
"optimize" is to be understood in terms of having to write less code, fewer "switch / case". The idea is to improve readability, maintainability, not performance.
There is maybe a way to write less code via a "Chain of Responsibility" but this solution is not optimal on all points, because it requires the creation of many objects in memory.
It sounds like what you want is a 'Finite State Machine' where using those cases you can activate different processes or 'states'. In C this is usually done with an array (matrix) of function pointers.
So you essentially make an array and put the right function pointers at the right indicies and then you use your 'var' as an index to the right 'process' and then you call it. You can do this in most languages. That way different inputs to the machine activate different processes and bring it to different states. This is very useful for numerous applications; I myself use it all of the time in MCU development.
Edit: Valya pointed out that I probably should show a basic model:
stateMachine[var1][var2](); // calls the right 'process' for input var1, var2
There are no good answers to this question :-(
because so much of the response depends on
The effective goals (what is meant by "optimize", what is unpleasing about the nested switches)
The context in which this construct is going to be applied (what are the ultimate needs implicit to the application)
TokenMacGuy was wise to ask about the goals. I took the time to check the question and its replies on the French site and I'm still puzzled as to the goals... Dran Dane latest response seems to point towards lessening the amount of code / improving readability but let's review for sure:
Processing Speed: not an issue the nested switches are quite efficient, possibly a tat less than 3 multiplications to get an index into a map table, but maybe not even.
Readability: yes possibly an issue, As the number of variables and level increases the combinatorial explosion kicks in, and also the format of the switch statement tends to spread the branching spot and associated values over a long vertical stretch. In this case a 3 dimension (or more) table initialized with fct. pointers puts back together the branching values and the function to be call on on a single line.
Writing less code: Sorry not much help here; at the end of the day we need to account for a relatively high number of combinations and the "map", whatever its form, must be written somewhere. Code generators such as TokenMacGuy's may come handy, it does seem a bit of an overkill in this case. Generators have their place, but I'm not sure it is the case here. One of two case: if the number of variables and level is small enough, the generator is not worth it (takes more time to set it up than to write the actual code in the first place), if the number of variables and levels is significant, the generated code is hard to read, hard to maintain...)
In a nutshell, my recommendation with regards to making the code more readable (and a bit faster to write) is the table/matrix approach described on the French site.
This solution is in two part:
a one time initialization of a 3 dimensional array (for 3 levels); (or a "fancier" container structure if preferred: a tree for example) . This is done with code like:
// This is positively more compact / readable
...
FctMap[1][4][0] = fctAlphaOne;
FctMap[1][4][1] = fctAlphaOne;
..
FctMap[3][0][0] = fctBravoCharlie4;
FctMap[3][0][1] = NULL; // impossible case
FctMap[3][0][2] = fctBravoCharlie4; // note how the same fct may serve in mult. places
And a relatively simple snippet wherever the functions need to be called:
if (FctMap[cond1][cond2][cond3]) {
retVal = FctMap[cond1][cond2][cond3](Arg1, Arg2);
if (retVal < 0)
DoSomething(); // anyway we're leveraging the common api to these fct not the switch alternative ....
}
A case which may prompt one NOT using the solution above are if the combination space is relatively sparsely populated (many "branches" in the switch "tree" are not used) or if some of the functions require a different set of parameters; For both of these cases, I'd like to plug a solution Joel Goodwin proposed first here, and which essentially combines the various keys for the several level into one longer key (with separator character if need be), essentially flattening the problem back to a long, but single level switch statement.
Now...
The real discussion should be about why we need such a mapping/decision-tree in the first place. To answer this unfortunately requires understanding the true nature of the underlying application. To be sure I'm not saying that this is indicative of bad design. A big dispatching section may make sense in some applications. However, even with the C language (which the French Site contributors seemed to disqualify to Object Oriented design), it is possible to adopt Object oriented methodology and patterns. Anyway I'm diverging...) It is possible that the application would overall be better served with alternative design patterns where the "information tree about what to call when" has been distributed in several modules and/or several objects.
Apologies to speak about this in rather abstract terms, it's just the lack of application specifics... The point remains: challenge the idea that we need this big dispatching tree; think of alternative approaches to the application at large.
Alors, bonne chance! ;-)
Depending on the language, some form of hash map with the pair (var, subvar) as the key and first-class functions as the values (or whatever your language offers to best approximate that, e.g. instances of classes extending some proper interface in Java) is likely to provide top performance -- and the utter conciseness of fetching the appropriate function (or whatever;-) from the map based on the key, and executing it, leads to high readability for readers familiar with the language and such functional idioms.
The idea of a function pointer is probably best (as per mjv, Shhnap). But, if the code under each case is fairly small, it may be overkill and result in more obfuscation than intended. In that case, I might implement something snappy and fast-to-read like this:
string decision = var1.ToString() + var2.ToString() + var3.ToString();
switch(decision)
{
case "1aa":
....
case "1ab":
....
}
Unfamiliar with your particular scenario so perhaps the previous suggestions are more appropriate.
I had exactly the same problem once, albeit for an immanent mess of a 5-parameter nested switch. I figured, why type all these O(N5) cases myself, why even invent 'nested' function names if the compiler can do this for me. And all this resulted in a 'nested specialized template switch' referring to a 'specialized template database'.
It's a little complicated to write. But I found it worth it: it results in a 'knowledge' database that is very easy to maintain, to debug, to add to etc... And I must admit: a sense of pride.
// the return type: might be an object actually _doing_ something
struct Result {
const char* value;
Result(): value(NULL){}
Result( const char* p ):value(p){};
};
Some variable types for switching:
// types used:
struct A { enum e { a1, a2, a3 }; };
struct B { enum e { b1, b2 }; };
struct C { enum e { c1, c2 }; };
A 'forward declaration' of the knowledge base: the 'api' of the nested switch.
// template database declaration (and default value - omit if not needed)
// specializations may execute code in stead of returning values...
template< A::e, B::e, C::e > Result valuedb() { return "not defined"; };
The actual switching logic (condensed)
// template layer 1: work away the first parameter, then the next, ...
struct Switch {
static Result value( A::e a, B::e b, C::e c ) {
switch( a ) {
case A::a1: return SwitchA<A::a1>::value( b, c );
case A::a2: return SwitchA<A::a2>::value( b, c );
case A::a3: return SwitchA<A::a3>::value( b, c );
default: return Result();
}
}
template< A::e a > struct SwitchA {
static Result value( B::e b, C::e c ) {
switch( b ) {
case B::b1: return SwitchB<a, B::b1>::value( c );
case B::b2: return SwitchB<a, B::b2>::value( c );
default: return Result();
}
}
template< A::e a, B::e b > struct SwitchB {
static Result value( C::e c ) {
switch( c ) {
case C::c1: return valuedb< a, b, C::c1 >();
case C::c2: return valuedb< a, b, C::c2 >();
default: return Result();
}
};
};
};
};
And the knowledge base itself
// the template database
//
template<> Result valuedb<A::a1, B::b1, C::c1 >() { return "a1b1c1"; }
template<> Result valuedb<A::a1, B::b2, C::c2 >() { return "a1b2c2"; }
This is how it can be used.
int main()
{
// usage:
Result r = Switch::value( A::a1, B::b2, C::c2 );
return 0;
}
Yes, there is definitely easier way to do that, both faster and simpler. The idea is basically the same as proposed by Alex Martelli. Instead of seeing you problem as bi-dimentional, see it as some one dimension lookup table.
It means combining var, subvar, subsubvar, etc to get one unique key and use it as your lookup table entry point.
The way to do it depends on the used language. With python combining var, subvar etc. to build a tuple and use it as key in a dictionnary is enough.
With C or such it's usually simpler to convert each keys to enums, then combine them using logical operators to get just one number that you can use in your switch (that's also an easy way to use switch instead of string comparizons with cascading ifs). You also get another benefit doing it. It's quite usual that several treatments in different branches of the initial switch are the same. With the initial form it's quite difficult to make that obvious. You'll probably have some calls to the same functions but it's at differents points in code. Now you can just group the identical cases when writing the switch.
I used such transformation several times in production code and it's easy to do and to maintain.
Summarily you can get something like this... the mix function obviously depends on your application specifics.
switch (mix(var, subvar))
{
case a1:
process a1;
case b1:
process b1;
case c1:
process c1;
case a2:
process a2;
case b2:
process b2;
case c2:
process c2;
case a3:
process a3;
case b3:
process b3;
case c3:
process c3;
}
Perhaps what you want is code generation?
#! /usr/bin/python
first = [1, 2, 3]
second = ['a', 'b', 'c']
def emit(first, second):
result = "switch (var)\n{\n"
for f in first:
result += " case {0}:\n switch (subvar)\n {{\n".format(f)
for s in second:
result += " case {1}:\n process {1}{0};\n".format(f,s)
result += " }\n"
result += "}\n"
return result
print emit(first,second)
#file("autogen.c","w").write(emit(first,second))
This is pretty hard to read, of course, and you might really want a nicer template language to do your dirty work, but this will ease some parts of your task.
If C++ is an option i would try using virtual function and maybe double dispatch. That could make it much cleaner. But it will only probably pay off only if you have many more cases.
This article on DDJ.com might be a good entry.
If you're just trying to eliminate the two-level switch/case statements (and save some vertical space), you can encode the two variable values into a single value, then switch on it:
// Assumes var is in [1,3] and subvar in [1,3]
// and that var and subvar can be cast to int values
switch (10*var + subvar)
{
case 10+1:
process a1;
case 10+2:
process b1;
case 10+3:
process c1;
//
case 20+1:
process a2;
case 20+2:
process b2;
case 20+3:
process c2;
//
case 30+1:
process a3;
case 30+2:
process b3;
case 30+3:
process c3;
//
default:
process error;
}
If your language is C#, and your choices are short enough and contain no special characters you can use reflection and do it with just a few lines of code. This way, instead of manually creating and maintaining an array of function pointers, use one that the framework provides!
Like this:
using System.Reflection;
...
void DispatchCall(string var, string subvar)
{
string functionName="Func_"+var+"_"+subvar;
MethodInfo m=this.GetType().GetMethod(fName);
if (m == null) throw new ArgumentException("Invalid function name "+ functionName);
m.Invoke(this, new object[] { /* put parameters here if needed */ });
}
void Func_1_a()
{
//executed when var=1 and subvar=a
}
void Func_2_charlie()
{
//executed when var=2 and subvar=charlie
}
Solution from developpez.com
Yes, you can optimize it and make it so much cleaner. You can not use such a "Chain of
Responsibility" with a Factory:
public class ProcessFactory {
private ArrayList<Process> processses = null;
public ProcessFactory(){
super();
processses = new ArrayList<Process>();
processses.add(new ProcessC1());
processses.add(new ProcessC2());
processses.add(new ProcessC3());
processses.add(new ProcessC4());
processses.add(new ProcessC5(6));
processses.add(new ProcessC5(22));
}
public Process getProcess(int var, int subvar){
for(Process process : processses){
if(process.canDo(var, subvar)){
return process;
}
}
return null;
}
}
Then just as your processes implement an interface process with canXXX you can easily use:
new ProcessFactory().getProcess(var,subvar).launch();