How to use List, Map, etc. from TypeScript - interop

I have a combo ReScript and TypeScript project. I want to use gentype to expose TypeScript friendly objects. Gentype works for string, bool, records, and other basic objects without any hassles. How can I work with list and char and other ReScript-specific types from TypeScript?
There is a shims feature but I don't know how to do that. I would assume there are built-in shims for the standard Belt library but can't find them.
I plan to do most of the calculations and heavy lifting inside ReScript. So when the info is sent to TypeScript, I'll probably just be consuming the data, not modifying it. This might be iterating through a list to display the contents in a non-ReScript React project. I don't plan to manipulate the data.
One option is to convert lists to arrays inside ReScript before exporting the results. So any time I want to consume information from TypeScript, I'll create a function and/or type in ReScript that only has bool, number, string, array, and object. I could create an "interop" module with all the stuff I want to consume or use from TypeScript. This provides a clean separation between the ReScript and TypeScript world and is easy to get my head around but seems a little inefficient (like converting all lists to arrays) and extra work.
On the other hand, if using the Belt objects like list and map is cumbersome from TypeScript, even with the shims, then I'm probably better off creating my "interop" module anyway.
What is the recommended/simplest/best way to use list, map, char, and other ReScript specific objects from within TypeScript? Are there friendly Typescript definitions for the Belt standard library that I could use even if I wasn't using ReScript?
===
One additional note. I have experience trying to use F# (functional) from C#. It was painful. The best solution for me in that environment was to create an interface on the F# side that was easily consumable on the C# side that did not use the F# native objects.

As ReScript compiles to JavaScript, and the output is very readable and (usually) straightforward, you could just emulate what it generates.
For example, this ReScript code:
let xs = list{1, 2, 3}
let _ = List.map(x => x + 1, xs)
is compiled into this (slightly simplified) JavaScript:
var List = require("./stdlib/list.js");
var xs = {
hd: 1,
tl: {
hd: 2,
tl: {
hd: 3,
tl: /* [] */0
}
}
};
List.map((x) => x + 1, xs);
There is a slight problem with the literal syntax of lists specifically, but that could be simplified a bit by using List.cons instead:
let xs = List.cons(1, List.cons(2, List.cons(3, list{})))
which becomes:
var xs = List.cons(1, List.cons(2, List.cons(3, /* [] */0)));
Pattern matching also isn't as convenient, obviously, but still pretty straight-forward for the simple things at least. For example:
let rec sum = xs => switch xs {
| list{} => 0
| list{x, ...rest} => x + sum(rest)
}
becomes:
function sum(xs) {
if (xs) {
return xs.hd + sum(xs.tl);
} else {
return 0;
}
}
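From the TypeScript side, the same shape can be walked directly. The following is only a minimal sketch, assuming the runtime representation shown in the compiled output above (cells with hd/tl, terminated by 0). The names RList and toArray are hypothetical, and the representation is a compiler detail that may differ between ReScript versions.

```typescript
// Assumed runtime shape of a ReScript list, taken from the compiled output:
// a chain of { hd, tl } cells terminated by 0.
type RList<T> = 0 | { hd: T; tl: RList<T> };

// Copy a list into a plain array, which is easier to consume in TypeScript.
function toArray<T>(xs: RList<T>): T[] {
  const out: T[] = [];
  let cur: RList<T> = xs;
  while (cur !== 0) {
    out.push(cur.hd);
    cur = cur.tl;
  }
  return out;
}

// Same value as `list{1, 2, 3}` in the example above.
const compiledList: RList<number> = { hd: 1, tl: { hd: 2, tl: { hd: 3, tl: 0 } } };
const asArray: number[] = toArray(compiledList);
```

Converting once at the boundary like this keeps the rest of the TypeScript code independent of the list representation.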
Most other types don't have any special compiler support, and so become just plain function calls. Using Belt.Map for example, despite using some advanced language features, compiles to very straightforward JavaScript:
module IntCmp = Belt.Id.MakeComparable({
type t = int
let cmp = (a, b) => a - b
})
let m = Belt.Map.make(~id=module(IntCmp))
let _ = Belt.Map.set(m, 0, "a")
becomes (more or less):
var Belt_Id = require("./stdlib/belt_Id.js");
var Belt_Map = require("./stdlib/belt_Map.js");
var IntCmp = Belt_Id.MakeComparable({
cmp: (a, b) => a - b
});
var m = Belt_Map.make(IntCmp);
Belt_Map.set(m, 0, "a");

Do any functional programming languages have syntax sugar for changing part of an object?

In imperative programming, there is concise syntax sugar for changing part of an object, e.g. assigning to a field:
foo.bar = new_value
Or to an element of an array, or in some languages an array-like list:
a[3] = new_value
In functional programming, the idiom is not to mutate part of an existing object, but to create a new object with most of the same values, but a different value for that field or element.
At the semantic level, this brings about significant improvements in ease of understanding and composing code, albeit not without trade-offs.
I am asking here about the trade-offs at the syntax level. In general, creating a new object with most of the same values, but a different value for one field or element, is a much more heavyweight operation in terms of how it looks in your code.
Is there any functional programming language that provides syntax sugar to make that operation look more concise? Obviously you can write a function to do it, but imperative languages provide syntax sugar to make it more concise than calling a procedure; do any functional languages provide syntax sugar to make it more concise than calling a function? I could swear that I have seen syntax sugar for at least the object.field case, in some functional language, though I forget which one it was.
(Performance is out of scope here. In this context, I am talking only about what the code looks like and does, not how fast it does it.)
Haskell records have this functionality. You can define a record to be:
data Person = Person
  { name :: String
  , age :: Int
  }
And an instance:
johnSmith :: Person
johnSmith = Person
  { name = "John Smith"
  , age = 24
  }
And create an altered copy:
johnDoe :: Person
johnDoe = johnSmith {name = "John Doe"}
-- Result:
-- johnDoe = Person
-- { name = "John Doe"
-- , age = 24
-- }
This syntax, however, is cumbersome when you have to update deeply nested records. The lens library solves this problem quite well.
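For comparison, here is roughly how the flat and the nested case look with object spread in TypeScript. This is only an analogy to the Haskell record-update syntax, not a translation of it; the Address type and its fields are hypothetical additions used to show the nesting.

```typescript
// Hypothetical types: `Address` is not part of the Haskell example above.
interface Address { city: string; zip: string }
interface Person { name: string; age: number; address: Address }

const johnSmith: Person = {
  name: "John Smith",
  age: 24,
  address: { city: "Springfield", zip: "00000" },
};

// Flat update: copy every field, then override `name`.
const johnDoe: Person = { ...johnSmith, name: "John Doe" };

// Nested update: every level on the path has to be spread by hand,
// which is the same kind of verbosity the record syntax suffers from.
const moved: Person = {
  ...johnSmith,
  address: { ...johnSmith.address, city: "Shelbyville" },
};
```

In both cases johnSmith itself is left unchanged.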
However, Haskell lists do not provide an update syntax because updating on lists will have an O(n) cost - they are singly-linked lists.
If you want efficient update on list-like collections, you can use Arrays in the array package, or Vectors in the vector package. They both have the infix operator (//) for updating:
alteredVector = someVector // [(1, "some value")]
-- similar to `someVector[1] = "some value"`
It is not built in, but I think infix notation is convenient enough!
One language with that kind of sugar is F#. It allows you to write
let myRecord3 = { myRecord2 with Y = 100; Z = 2 }
Scala also has sugar for updating a Map:
ms + (k -> v)
ms updated (k,v)
In a language such as Haskell, you would need to write this yourself. If you can express the update as a key-value pair, you might define
let structure' =
update structure key value
or
update structure (key, value)
which would let you use infix notation such as
structure `update` (key, value)
structure // (key, value)
As a proof of concept, here is one possible (inefficient) implementation, which also fails if your index is out of range:
module UpdateList (updateList, (//)) where
import Data.List (splitAt)
updateList :: [a] -> (Int,a) -> [a]
updateList xs (i,y) = let ( initial, (_:final) ) = splitAt i xs
                      in initial ++ (y:final)
infixl 6 // -- Same precedence as +
(//) :: [a] -> (Int,a) -> [a]
(//) = updateList
With this definition, ["a","b","c","d"] // (2,"C") returns ["a","b","C","d"]. And [1,2] // (2,3) throws a runtime exception, but I leave that as an exercise for the reader.
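One way to handle the out-of-range case is to make the failure explicit in the return type. A small sketch of that idea, written in TypeScript rather than Haskell; updateAt is a hypothetical name and returning undefined is just one possible design:

```typescript
// Returns a new array with index i replaced by y,
// or undefined when i is out of range. The input is never mutated.
function updateAt<T>(xs: readonly T[], i: number, y: T): T[] | undefined {
  if (i < 0 || i >= xs.length) return undefined;
  return [...xs.slice(0, i), y, ...xs.slice(i + 1)];
}

const letters = ["a", "b", "c", "d"];
const updated = updateAt(letters, 2, "C");   // new array, `letters` is untouched
const outOfRange = updateAt([1, 2], 2, 3);   // undefined instead of an exception
```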
H. Rhen gave an example of Haskell record syntax that I did not know about, so I’ve removed the last part of my answer. See theirs instead.

What is the most efficient way to replace a list of words without touching html attributes?

I absolutely disagree that this question is a duplicate! I am asking for an efficient way to replace hundreds of words at once. This is an algorithm question! All the provided links are about replacing one word. Should I repeat that expensive operation hundreds of times? I'm sure that there are better ways, such as a suffix tree where I sort out HTML while building that tree. I removed the regex tag since, for no good reason, you are focusing on that part.
I want to translate a given set of words (more than 100). My first idea was to use a simple regular expression, which works better than expected. As a sample:
const input = "I like strawberry cheese cake with apple juice"
const en2de = {
apple: "Apfel",
strawberry: "Erdbeere",
banana: "Banane",
/* ... */}
input.replace(new RegExp(Object.keys(en2de).join("|"), "gi"), match => en2de[match.toLowerCase()])
This works fine at first glance. However, it becomes strange if you have words that contain each other: "pineapple" would return "pineApfel", which is total nonsense. So I was thinking about checking word boundaries and similar things. While playing around I created this test case:
Apple is a company
That created the output:
Apfel is a company.
The translation is wrong, which is somehow tolerable, but the link is broken. That must not happen.
So I was thinking about extending the regex to check whether there is a quote before the match. I know well that parsing HTML with regex is a bad idea, but I thought that this should work anyway. In the end I gave up and looked for solutions from other devs, and found a couple of questions on Stack Overflow, all without answers, so it seems to be a hard problem (or my search skills are bad).
So I went two steps back and thought about implementing it myself with a parser or something like that. However, since I have multiple inputs and need to ignore case, I was wondering what the best way is.
Right now I am thinking of building a dictionary with pointers to the positions of the words. I would store the dict in lower case so that should be fast, and I could also skip all words with the wrong prefix etc. to get my matches. In the end I would replace the words from the end to the beginning to avoid breaking the indices. But is that approach efficient? Is there a better way to achieve that?
While my sample is in JavaScript, the solution need not be in JS, as long as it doesn't include dozens of dependencies which cannot easily be translated to JS.
TL;DR:
I want to replace multiple words by other words in a case insensitive way without breaking html.
You may try a TreeWalker and replace the text in place.
To find words you may tokenize your text, lower case your words and map them.
const mapText = (dic, s) => {
return s.replace(/[a-zA-Z-_]+/g, w => {
return dic[w.toLowerCase()] || w
})
}
const dic = {
'grodzi': 'grodzila',
'laaaa': 'forever',
}
const treeWalker = document.createTreeWalker(
document.body,
NodeFilter.SHOW_TEXT
)
// skip body node
let currentNode = treeWalker.nextNode()
while(currentNode) {
const newS = mapText(dic, currentNode.data)
currentNode.data = newS
currentNode = treeWalker.nextNode()
}
p {background:#eeeeee;}
<p>
grodzi
LAAAA
</p>
The link stays untouched.
However, mapping each word into another language is bound to fail (be it a missing representation of some word, humour/irony, or simply grammatical constructs). For this matter (which is a hard problem on its own) you may rely on tools that translate the data for you (neural networks, APIs, ...).
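When no DOM is available, the same tokenize-and-map idea can be sketched on a plain string. This is a rough sketch, not a real HTML parser: it assumes tags contain no ">" inside attribute values and ignores script/style content. translateOutsideTags is a hypothetical name.

```typescript
// Replace whole words found in `dict` (case-insensitively), but only in the
// text between tags. Tags are recognised with a simple regex (an assumption,
// see above), not with an HTML parser.
function translateOutsideTags(html: string, dict: Map<string, string>): string {
  // Splitting on a capture group keeps the tags: odd indices are tags,
  // even indices are the text between them.
  return html
    .split(/(<[^>]*>)/)
    .map((part, index) => {
      if (index % 2 === 1) return part; // a tag: leave untouched
      return part.replace(/[A-Za-z]+/g, (word) => {
        const hit = dict.get(word.toLowerCase());
        return hit === undefined ? word : hit;
      });
    })
    .join("");
}

const en2deMap = new Map<string, string>([
  ["apple", "Apfel"],
  ["strawberry", "Erdbeere"],
  ["banana", "Banane"],
]);
const translated = translateOutsideTags(
  '<a href="apple.com">Apple</a> and pineapple',
  en2deMap
);
```

Because whole words are looked up in the map, "pineapple" is left alone, and each text segment is scanned once regardless of how many dictionary entries there are.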
Here is my current work in progress solution of a suffix tree (or at least how I interpreted it). I'm building a dictionary with all words, which are not inside of a tag, with their position. After sorting the dict I replace them all. This works for me without handling html at all.
function suffixTree(input) {
const dict = new Map()
let start = 0
let insideTag = false
// define word borders
const borders = ' .,<"\'(){}\r\n\t'.split('')
// build dictionary
for (let end = 0; end <= input.length; end++) {
const c = input[end]
if (c === '<') insideTag = true
if (c === '>') {
insideTag = false
start = end + 1
continue
}
if (insideTag && c !== '<') continue
if (borders.indexOf(c) >= 0) {
if(start !== end) {
const word = input.substring(start, end).toLowerCase()
const entry = dict.get(word) || []
// save the word and its position in an array so when the word is
// used multiple times that we can use this list
entry.push([start, end])
dict.set(word, entry)
}
start = end + 1
}
}
// last word handling
const word = input.substring(start).toLowerCase()
const entry = dict.get(word) || []
entry.push([start, input.length])
dict.set(word, entry)
// create a list of replace operations, we would break the
// indices if we do that directly
const words = Object.keys(en2de)
const replacements = []
words.forEach(word => {
(dict.get(word) || []).forEach(match => {
// [0] is start, [1] is end, [2] is the replacement
replacements.push([match[0], match[1], en2de[word]])
})
})
// converting the input to a char array and replacing the found ranges.
// beginning from the end and replace the ranges with the replacement
let output = [...input]
replacements.sort((a, b) => b[0] - a[0])
replacements.forEach(match => {
output.splice(match[0], match[1] - match[0], match[2])
})
return output.join('')
}
Feel free to leave a comment how this can be improved.

Performance Difference Using Update Operation on a Mutable Map in Scala with a Large Size Data

I would like to know if an update operation on a mutable map is better in performance than reassignment.
Let's assume I have the following map:
val m=Map(1 -> Set("apple", "banana"),
2 -> Set("banana", "cabbage"),
3 -> Set("cabbage", "dumplings"))
which I would like to reverse into this map:
Map("apple" -> Set(1),
"banana" -> Set(1, 2),
"cabbage" -> Set(2, 3),
"dumplings" -> Set(3))
The code to do so is:
def reverse(m:Map[Int,Set[String]])={
var rm = Map[String,Set[Int]]()
m.keySet foreach { k=>
m(k) foreach { e =>
rm = rm + (e -> (rm.getOrElse(e, Set()) + k))
}
}
rm
}
Would it be more efficient to use the update operator on a map if it is very large in size?
The code using the update on map is as follows:
def reverse(m:Map[Int,Set[String]])={
var rm = scala.collection.mutable.Map[String,Set[Int]]()
m.keySet foreach { k=>
m(k) foreach { e =>
rm.update(e,(rm.getOrElse(e, Set()) + k))
}
}
rm
}
I ran some tests using Rex Kerr's Thyme utility.
First I created some test data.
val rndm = new util.Random
val dna = Seq('A','C','G','T')
val m = (1 to 4000).map(_ -> Set(rndm.shuffle(dna).mkString
,rndm.shuffle(dna).mkString)).toMap
Then I timed some runs with both the immutable.Map and mutable.Map versions. Here's an example result:
Time: 2.417 ms 95% CI 2.337 ms - 2.498 ms (n=19) // immutable
Time: 1.618 ms 95% CI 1.579 ms - 1.657 ms (n=19) // mutable
Time: 2.278 ms 95% CI 2.238 ms - 2.319 ms (n=19) // functional version
As you can see, using a mutable Map with update() has a significant performance advantage.
Just for fun I also compared these results with a more functional version of a Map reverse (or what I call a Map inverter). No var or any mutable type involved.
m.flatten{case(k, vs) => vs.map((_, k))}
.groupBy(_._1)
.mapValues(_.map(_._2).toSet)
This version consistently beat your immutable version but still doesn't come close to the mutable timings.
The trade-off between mutable and immutable collections usually comes down to this:
immutable collections are safer to share and allow structural sharing
mutable collections have better performance
Some time ago I did comparison of performance between mutable and immutable Maps in Scala and the difference was about 2 to 3 times in favor of mutable ones.
So, when performance is not critical I usually go with immutable collections for safety and readability.
For example, in your case functional "scala way" of performing this transformation would be something like this:
m.view
.flatMap(x => x._2.map(_ -> x._1)) // flatten map to lazy view of String->Int pairs
.groupBy(_._1) // group pairs by String part
.mapValues(_.map(_._2).toSet) // extract all Int parts into Set
Although I used a lazy view to avoid creating intermediate collections, groupBy still internally creates a mutable map (you may want to check its sources; the logic is pretty similar to what you have written), which in turn gets converted to an immutable Map, which then gets discarded by mapValues.
Now, if you want to squeeze out every bit of performance, you want to use mutable collections and do as few updates of immutable collections as possible.
In your case this means having a Map of mutable Sets as your intermediate buffer:
import scala.collection.mutable

def transform(m: Map[Int, Set[String]]): Map[String, Set[Int]] = {
  val accum: Map[String, mutable.Set[Int]] =
    m.valuesIterator.flatten.map(_ -> mutable.Set[Int]()).toMap
  for ((k, vals) <- m; v <- vals) {
    accum(v) += k
  }
  accum.mapValues(_.toSet)
}
Note, I'm not updating accum once it's created: I'm doing exactly one map lookup and one set update for each value, while in both your examples there was additional map update.
I believe this code is reasonably optimal performance wise. I didn't perform any tests myself, but I highly encourage you to do that on your real data and post results here.
Also, if you want to go even further, you might want to try mutable BitSet instead of Set[Int]. If ints in your data are fairly small it might yield some minor performance increase.
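The "mutable accumulator" idea is not specific to Scala. As a language-neutral illustration, here is a sketch of the same inversion in TypeScript, where each bucket is looked up once and mutated in place. invert is a hypothetical name, and this says nothing about the Scala timings above.

```typescript
// Invert Map<key, Set<value>> into Map<value, Set<key>> using mutable buckets.
function invert(m: Map<number, Set<string>>): Map<string, Set<number>> {
  const out = new Map<string, Set<number>>();
  m.forEach((values, key) => {
    values.forEach((value) => {
      let bucket = out.get(value);
      if (bucket === undefined) {
        // First time this value is seen: create its bucket once.
        bucket = new Set<number>();
        out.set(value, bucket);
      }
      // Mutate the bucket in place instead of rebuilding the map entry.
      bucket.add(key);
    });
  });
  return out;
}

const fruitMap = new Map<number, Set<string>>([
  [1, new Set(["apple", "banana"])],
  [2, new Set(["banana", "cabbage"])],
  [3, new Set(["cabbage", "dumplings"])],
]);
const inverted = invert(fruitMap);
```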
Just using @Aivean's method in a functional way:
def transform(mp :Map[Int, Set[String]]) = {
val accum = mp.values.flatten
.toSet.map( (_-> scala.collection.mutable.Set[Int]())).toMap
mp.map {case(k,vals) => vals.map( v => accum(v)+=k)}
accum.mapValues(_.toSet)
}

General-purpose language to specify value constraints

I am looking for a general-purpose way of defining textual expressions which allow a value to be validated.
For example, I have a value which should only be set to 1, 2, 3, 10, 11, or 12.
Its constraint might be defined as: (value >= 1 && value <= 3) || (value >= 10 && value <= 12)
Or another value which can be 1, 3, 5, 7, 9 etc... would have a constraint like value % 2 == 1 or IsOdd(value).
(To help the user correct invalid values, I'd like to show the constraint - so something descriptive like IsOdd is preferable.)
These constraints would be evaluated both on client-side (after user input) and server-side.
Therefore a multi-platform solution would be ideal (specifically Win C#/Linux C++).
Is there an existing language/project which allows evaluation or parsing of similar simple expressions?
If not, where might I start creating my own?
I realise this question is somewhat vague as I am not entirely sure what I am after. Searching turned up no results, so even some terms as a starting point would be helpful. I can then update/tag the question accordingly.
You may want to investigate dependently typed languages like Idris or Agda.
The type system of such languages allows encoding of value constraints in types. Programs that cannot guarantee the constraints will simply not compile. The usual example is that of matrix multiplication, where the dimensions must match. But this is so to speak the "hello world" of dependently typed languages, the type system can do much more for you.
If you end up starting your own language I'd try to stay implementation-independent as long as possible. Look for the formal expression grammars of a suitable programming language (e.g. C) and add special keywords/functions as required. Once you have a formal definition of your language, implement a parser using your favourite parser generator.
That way, even if your parser is not portable to a certain platform you at least have a formal standard from where to start a separate parser implementation.
You may also want to look at creating a Domain Specific Language (DSL) in Ruby. (Here's a good article on what that means and what it would look like: http://jroller.com/rolsen/entry/building_a_dsl_in_ruby)
This would definitely give you the portability you're looking for, including maybe using IronRuby in your C# environment, and you'd be able to leverage the existing logic and mathematical operations of Ruby. You could then have constraint definition files that looked like this:
constrain 'wakeup_time' do
6 <= value && value <= 10
end
constrain 'something_else' do
check (value % 2 == 1), MustBeOdd
end
# constrain is a method that takes one argument and a code block
# check is a function you've defined that takes two arguments
# MustBeOdd is the name of an exception type you've created in your standard set
But really, the great thing about a DSL is that you have a lot of control over what the constraint files look like.
There are a number of ways to verify a list of values across multiple languages. My preferred method is to make a list of the permitted values, load them into a dictionary/hashmap/list/vector (dependent on the language and your preference) and write a simple isIn() or isValid() function that checks whether the supplied value is valid based on its presence in the data structure. The beauty of this is that the code is trivial and can be implemented in just about any language very easily. For odd-only or even-only numeric validity, a small library of isOdd() functions in the different languages will suffice: if a value isn't odd, it must by definition be even (note that 0 counts as even, since 0 % 2 == 0).
I normally cart around a set of c++ and c# functions to evaluate isOdd() for similar reasons to what you have alluded to, and the code is as follows:
C++
bool isOdd( int integer ){ return integer % 2 != 0; }
you can also add inline and/or fastcall to the function depending on need or preference; I tend to use it as an inline and fastcall unless there is a need to do otherwise (huge performance boost on xeon processors).
C#
Beautifully, the same line works in C#; just add static to the front if it is not going to be part of another class:
static bool isOdd( int integer ){ return integer % 2 != 0; }
Hope this helps, in any event let me know if you need any further info:)
Not sure if it's what you looking for, but judging from your starting conditions (Win C#/Linux C++) you may not need it to be totally language agnostic. You can implement such a parser yourself in C++ with all the desired features and then just use it in both C++ and C# projects - thus also bypassing the need to add external libraries.
On application design level, it would be (relatively) simple - you create a library which is buildable cross-platform and use it in both projects. The interface may be something simple like:
bool VerifyConstraint_int(int value, const char* constraint);
bool VerifyConstraint_double(double value, const char* constraint);
// etc
Such interface will be usable both in Linux C++ (by static or dynamic linking) and in Windows C# (using P/Invoke). You can have same codebase compiling on both platforms.
The parser (again, judging from what you've described in the question) may be pretty simple - a tree holding elements of types Variable and Expression which can be Evaluated with a given Variable value.
Example class definitions:
// boost::variant may be used, typedef'd as VARIANT
class Entity {
public:
    virtual ~Entity() = default;
    virtual VARIANT Evaluate() = 0;
};
class BinaryOperation: public Entity {
private:
    Entity& left;
    Entity& right;
    enum Operation {PLUS,MINUS,EQUALS,AND,OR,GREATER_OR_EQUALS,LESS_OR_EQUALS};
public:
    VARIANT Evaluate() override; // Evaluates left and right operands and combines them
};
class Variable: public Entity {
private:
    VARIANT value;
public:
    VARIANT Evaluate() override { return value; }
};
Or, you can just write validation code in C++ and use it both in C# and C++ applications :)
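To make the tree idea concrete, here is a small sketch of such an evaluator. It is written in TypeScript purely for illustration and is not the C++ library described above. All names (Expr, Op, evaluate, describe) are hypothetical, and booleans are represented as 0/1 instead of a variant type to keep it short.

```typescript
// Hypothetical node types for a tiny constraint tree.
type Op = ">=" | "<=" | "==" | "%" | "&&" | "||";
type Expr =
  | { kind: "value" }                                   // the value under test
  | { kind: "const"; n: number }
  | { kind: "bin"; op: Op; left: Expr; right: Expr };

// Each operator maps two numbers to a number; comparisons yield 1 or 0.
const ops: Record<Op, (l: number, r: number) => number> = {
  ">=": (l, r) => (l >= r ? 1 : 0),
  "<=": (l, r) => (l <= r ? 1 : 0),
  "==": (l, r) => (l === r ? 1 : 0),
  "%": (l, r) => l % r,
  "&&": (l, r) => (l !== 0 && r !== 0 ? 1 : 0),
  "||": (l, r) => (l !== 0 || r !== 0 ? 1 : 0),
};

// Evaluate the tree for a given value (both operands are always evaluated).
function evaluate(e: Expr, value: number): number {
  if (e.kind === "value") return value;
  if (e.kind === "const") return e.n;
  return ops[e.op](evaluate(e.left, value), evaluate(e.right, value));
}

// Render the tree as text, so the constraint can be shown to the user.
function describe(e: Expr): string {
  if (e.kind === "value") return "value";
  if (e.kind === "const") return String(e.n);
  return `(${describe(e.left)} ${e.op} ${describe(e.right)})`;
}

// Small builders, then the range constraint from the question.
const v: Expr = { kind: "value" };
const c = (n: number): Expr => ({ kind: "const", n });
const bin = (op: Op, left: Expr, right: Expr): Expr => ({ kind: "bin", op, left, right });

const rangeConstraint = bin(
  "||",
  bin("&&", bin(">=", v, c(1)), bin("<=", v, c(3))),
  bin("&&", bin(">=", v, c(10)), bin("<=", v, c(12)))
);
```

Keeping the constraint as data means the same tree can be both evaluated and rendered as a message for the user; a parser that produces such trees from text would sit in front of this.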
My personal choice would be Lua. The downside to any DSL is the learning curve of a new language and how to glue the code with the scripts but I've found Lua has lots of support from the user base and several good books to help you learn.
If you are after making somewhat generic code that a non programmer can inject rules for allowable input it's going to take some upfront work regardless of the route you take. I highly suggest not rolling your own because you'll likely find people wanting more features that an already made DSL will have.
If you are using Java then you can use OGNL, the Object-Graph Navigation Language.
It enables you to write Java applications that can parse, compile and evaluate OGNL expressions.
OGNL expressions include basic Java, C, C++ and C# expressions.
You can compile an expression that uses some variables, and then evaluate that expression for some given variables.
An easy way to achieve validation of expressions is to use Python's eval method. It can be used to evaluate expressions just like the one you wrote. Python's syntax is easy enough to learn for simple expressions and english-like. Your expression example is translated to:
(value >= 1 and value <= 3) or (value >= 10 and value <= 12)
Evaluating code provided by users might pose a security risk, though, as certain functions could be used to execute operations on the host machine (such as the open function, to open a file). But the eval function takes extra arguments to restrict the allowed functions. Hence you can create a restricted evaluation environment.
# Import math functions, and we'll use a few of them to create
# a list of safe functions from the math module to be used by eval.
from math import *
# A user-defined method won't be reachable in the evaluation, as long
# as we provide the list of allowed functions and vars to eval.
def dangerous_function(filename):
    print open(filename).read()
# We're building the list of safe functions to use by eval:
safe_list = ['math','acos', 'asin', 'atan', 'atan2', 'ceil', 'cos', 'cosh', 'degrees', 'e', 'exp', 'fabs', 'floor', 'fmod', 'frexp', 'hypot', 'ldexp', 'log', 'log10', 'modf', 'pi', 'pow', 'radians', 'sin', 'sinh', 'sqrt', 'tan', 'tanh']
safe_dict = dict([ (k, locals().get(k, None)) for k in safe_list ])
# Let's test the eval method with your example:
exp = "(value >= 1 and value <= 3) or (value >= 10 and value <= 12)"
safe_dict['value'] = 2
print "expression evaluation: ", eval(exp, {"__builtins__":None},safe_dict)
-> expression evaluation: True
# Test with a forbidden method, such as 'abs'
exp = raw_input("type an expression: ")
-> type an expression: (abs(-2) >= 1 and abs(-2) <= 3) or (abs(-2) >= 10 and abs(-2) <= 12)
print "expression evaluation: ", eval(exp, {"__builtins__":None},safe_dict)
-> expression evaluation:
-> Traceback (most recent call last):
-> File "<stdin>", line 1, in <module>
-> File "<string>", line 1, in <module>
-> NameError: name 'abs' is not defined
# Let's test it again, without any extra parameters to the eval method
# that would prevent its execution
print "expression evaluation: ", eval(exp)
-> expression evaluation: True
# Works fine without the safe dict! So the restrictions were active
# in the previous example..
# is odd?
def isodd(x): return bool(x & 1)
safe_dict['isodd'] = isodd
print "expression evaluation: ", eval("isodd(7)", {"__builtins__":None},safe_dict)
-> expression evaluation: True
print "expression evaluation: ", eval("isodd(42)", {"__builtins__":None},safe_dict)
-> expression evaluation: False
# A bit more complex this time, let's ask the user a function:
user_func = raw_input("type a function: y = ")
-> type a function: y = exp(x)
# Let's test it:
for x in range(1,10):
    # add x in the safe dict
    safe_dict['x']=x
    print "x = ", x , ", y = ", eval(user_func,{"__builtins__":None},safe_dict)
-> x = 1 , y = 2.71828182846
-> x = 2 , y = 7.38905609893
-> x = 3 , y = 20.0855369232
-> x = 4 , y = 54.5981500331
-> x = 5 , y = 148.413159103
-> x = 6 , y = 403.428793493
-> x = 7 , y = 1096.63315843
-> x = 8 , y = 2980.95798704
-> x = 9 , y = 8103.08392758
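So you can control the allowed functions that should be used by the eval method, and have a restricted environment that can evaluate expressions. Be aware that emptying __builtins__ limits name lookup but is not a complete sandbox against hostile input.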
This is what we used in a previous project I worked in. We used Python expressions in custom Eclipse IDE plug-ins, using Jython to run in the JVM. You could do the same with IronPython to run in the CLR.
The examples I used in part inspired / copied from the Lybniz project explanation on how to run a safe Python eval environment. Read it for more details!
You might want to look at regular expressions (regex). They are proven and have been around for a long time. There's a regex library for all the major programming/scripting languages out there.
Libraries:
C++: what regex library should I use?
C# Regex Class
Usage
Regex Email validation
Regex to validate date format dd/mm/yyyy

How can I optimize a multiple (matrix) switch / case algorithm?

Is it possible to optimize this kind of (matrix) algorithm:
// | case 1 | case 2 | case 3 |
// ------|--------|--------|--------|
// | | | |
// case a| a1 | a2 | a3 |
// | | | |
// case b| b1 | b2 | b3 |
// | | | |
// case c| c1 | c2 | c3 |
// | | | |
switch (var)
{
    case 1:
        switch (subvar)
        {
            case a:
                process a1;
                break;
            case b:
                process b1;
                break;
            case c:
                process c1;
                break;
        }
        break;
    case 2:
        switch (subvar)
        {
            case a:
                process a2;
                break;
            case b:
                process b2;
                break;
            case c:
                process c2;
                break;
        }
        break;
    case 3:
        switch (subvar)
        {
            case a:
                process a3;
                break;
            case b:
                process b3;
                break;
            case c:
                process c3;
                break;
        }
        break;
}
The code is fairly simple, but you have to imagine something more complex, with more "switch / case" levels.
I work with 3 variables. Depending on whether they take the values 1, 2, 3 or a, b, c or alpha, beta, charlie, different processes have to be run. Is it possible to optimize this in any way other than through a series of "switch / case" statements?
(Question already asked in French here).
Edit: (from Dran Dane's responses to comments below. These might as well be in this more prominent place!)
"optimize" is to be understood in terms of having to write less code, fewer "switch / case". The idea is to improve readability, maintainability, not performance.
There is maybe a way to write less code via a "Chain of Responsibility" but this solution is not optimal on all points, because it requires the creation of many objects in memory.
It sounds like what you want is a 'Finite State Machine' where using those cases you can activate different processes or 'states'. In C this is usually done with an array (matrix) of function pointers.
So you essentially make an array and put the right function pointers at the right indices, then you use your 'var' as an index to the right 'process' and call it. You can do this in most languages. That way different inputs to the machine activate different processes and bring it to different states. This is very useful for numerous applications; I myself use it all of the time in MCU development.
Edit: Valya pointed out that I probably should show a basic model:
stateMachine[var1][var2](); // calls the right 'process' for input var1, var2
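A slightly fuller sketch of that table, written in TypeScript instead of C function pointers. The type and function names are hypothetical, and the handlers just return a label in place of the real "process" code:

```typescript
// Hypothetical key types standing in for the question's two variables.
type Var = 1 | 2 | 3;
type SubVar = "a" | "b" | "c";
type Process = () => string;

// One row per `subvar`, one column per `var`, mirroring the matrix in the question.
const processTable: Record<SubVar, Record<Var, Process>> = {
  a: { 1: () => "a1", 2: () => "a2", 3: () => "a3" },
  b: { 1: () => "b1", 2: () => "b2", 3: () => "b3" },
  c: { 1: () => "c1", 2: () => "c2", 3: () => "c3" },
};

// A single indexed call replaces the nested switch.
function dispatch(v: Var, sub: SubVar): string {
  return processTable[sub][v]();
}
```

With the key types declared as unions, the compiler also reports any cell that is missing from the table.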
There are no good answers to this question :-(
because so much of the response depends on
The effective goals (what is meant by "optimize", what is unpleasing about the nested switches)
The context in which this construct is going to be applied (what are the ultimate needs implicit to the application)
TokenMacGuy was wise to ask about the goals. I took the time to check the question and its replies on the French site and I'm still puzzled as to the goals... Dran Dane's latest response seems to point towards lessening the amount of code / improving readability, but let's review to be sure:
Processing speed: not an issue. The nested switches are quite efficient, possibly a tad less so than 3 multiplications to get an index into a map table, but maybe not even.
Readability: yes, possibly an issue. As the number of variables and levels increases, the combinatorial explosion kicks in, and the format of the switch statement tends to spread the branching spot and associated values over a long vertical stretch. In this case a 3-dimensional (or more) table initialized with function pointers puts the branching values and the function to be called back together on a single line.
Writing less code: sorry, not much help here; at the end of the day we need to account for a relatively high number of combinations, and the "map", whatever its form, must be written somewhere. Code generators such as TokenMacGuy's may come in handy, but they do seem a bit of an overkill in this case. Generators have their place, but I'm not sure it is the case here. One of two cases applies: if the number of variables and levels is small enough, the generator is not worth it (it takes more time to set up than to write the actual code in the first place); if the number of variables and levels is significant, the generated code is hard to read and hard to maintain.
In a nutshell, my recommendation with regards to making the code more readable (and a bit faster to write) is the table/matrix approach described on the French site.
This solution is in two parts:
A one-time initialization of a 3-dimensional array (for 3 levels), or a "fancier" container structure if preferred (a tree, for example). This is done with code like:
// This is positively more compact / readable
...
FctMap[1][4][0] = fctAlphaOne;
FctMap[1][4][1] = fctAlphaOne;
..
FctMap[3][0][0] = fctBravoCharlie4;
FctMap[3][0][1] = NULL; // impossible case
FctMap[3][0][2] = fctBravoCharlie4; // note how the same fct may serve in mult. places
And a relatively simple snippet wherever the functions need to be called:
if (FctMap[cond1][cond2][cond3]) {
retVal = FctMap[cond1][cond2][cond3](Arg1, Arg2);
if (retVal < 0)
DoSomething(); // anyway we're leveraging the common api to these fct not the switch alternative ....
}
Cases which may prompt one NOT to use the solution above are when the combination space is relatively sparsely populated (many "branches" in the switch "tree" are not used) or when some of the functions require a different set of parameters. For both of these cases, I'd like to plug a solution Joel Goodwin proposed first here, which essentially combines the keys for the several levels into one longer key (with a separator character if need be), flattening the problem back to a long, but single-level, switch statement.
Now...
The real discussion should be about why we need such a mapping/decision tree in the first place. Answering this unfortunately requires understanding the true nature of the underlying application. To be sure, I'm not saying that this is indicative of bad design. A big dispatching section may make sense in some applications. However, even with the C language (which the French site contributors seemed to consider unfit for object-oriented design), it is possible to adopt object-oriented methodology and patterns. (Anyway, I'm digressing...) It is possible that the application would overall be better served by alternative design patterns where the "information tree about what to call when" is distributed across several modules and/or several objects.
Apologies for speaking about this in rather abstract terms; it's just the lack of application specifics... The point remains: challenge the idea that we need this big dispatching tree; think of alternative approaches to the application at large.
Alors, bonne chance! ;-)
Depending on the language, some form of hash map with the pair (var, subvar) as the key and first-class functions as the values (or whatever your language offers to best approximate that, e.g. instances of classes extending some proper interface in Java) is likely to provide top performance -- and the utter conciseness of fetching the appropriate function (or whatever;-) from the map based on the key, and executing it, leads to high readability for readers familiar with the language and such functional idioms.
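A minimal Python sketch of that idea, assuming two keys and handlers that take no arguments; the handler names and the error handling are illustrative, not part of the answer:

```python
def process_a1():
    return "a1"

def process_b1():
    return "b1"

def process_a2():
    return "a2"

# The pair (var, subvar) is the key; first-class functions are the values.
handlers = {
    (1, "a"): process_a1,
    (1, "b"): process_b1,
    (2, "a"): process_a2,
}

def dispatch(var, subvar):
    key = (var, subvar)
    try:
        handler = handlers[key]
    except KeyError:
        # Combinations that were never registered end up here.
        raise ValueError(f"no process registered for {key!r}") from None
    return handler()
```

Only the combinations that actually exist need an entry, which also suits a sparsely populated combination space.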
The idea of a function pointer is probably best (as per mjv, Shhnap). But, if the code under each case is fairly small, it may be overkill and result in more obfuscation than intended. In that case, I might implement something snappy and fast-to-read like this:
string decision = var1.ToString() + var2.ToString() + var3.ToString();
switch(decision)
{
case "1aa":
....
case "1ab":
....
}
Unfamiliar with your particular scenario so perhaps the previous suggestions are more appropriate.
I had exactly the same problem once, albeit for an immanent mess of a 5-parameter nested switch. I figured, why type all these O(N^5) cases myself, and why even invent 'nested' function names, if the compiler can do this for me? All this resulted in a 'nested specialized template switch' referring to a 'specialized template database'.
It's a little complicated to write. But I found it worth it: it results in a 'knowledge' database that is very easy to maintain, to debug, to add to etc... And I must admit: a sense of pride.
// the return type: might be an object actually _doing_ something
struct Result {
const char* value;
Result(): value(NULL){}
Result( const char* p ):value(p){};
};
Some variable types for switching:
// types used:
struct A { enum e { a1, a2, a3 }; };
struct B { enum e { b1, b2 }; };
struct C { enum e { c1, c2 }; };
A 'forward declaration' of the knowledge base: the 'api' of the nested switch.
// template database declaration (and default value - omit if not needed)
// specializations may execute code instead of returning values...
template< A::e, B::e, C::e > Result valuedb() { return "not defined"; };
The actual switching logic (condensed)
// template layer 1: work away the first parameter, then the next, ...
struct Switch {
static Result value( A::e a, B::e b, C::e c ) {
switch( a ) {
case A::a1: return SwitchA<A::a1>::value( b, c );
case A::a2: return SwitchA<A::a2>::value( b, c );
case A::a3: return SwitchA<A::a3>::value( b, c );
default: return Result();
}
}
template< A::e a > struct SwitchA {
static Result value( B::e b, C::e c ) {
switch( b ) {
case B::b1: return SwitchB<a, B::b1>::value( c );
case B::b2: return SwitchB<a, B::b2>::value( c );
default: return Result();
}
}
template< A::e a, B::e b > struct SwitchB {
static Result value( C::e c ) {
switch( c ) {
case C::c1: return valuedb< a, b, C::c1 >();
case C::c2: return valuedb< a, b, C::c2 >();
default: return Result();
}
};
};
};
};
And the knowledge base itself
// the template database
//
template<> Result valuedb<A::a1, B::b1, C::c1 >() { return "a1b1c1"; }
template<> Result valuedb<A::a1, B::b2, C::c2 >() { return "a1b2c2"; }
This is how it can be used.
int main()
{
// usage:
Result r = Switch::value( A::a1, B::b2, C::c2 );
return 0;
}
Yes, there is definitely an easier way to do that, both faster and simpler. The idea is basically the same as the one proposed by Alex Martelli. Instead of seeing your problem as two-dimensional, see it as a one-dimensional lookup table.
It means combining var, subvar, subsubvar, etc to get one unique key and use it as your lookup table entry point.
The way to do it depends on the language used. With Python, combining var, subvar, etc. into a tuple and using it as a key in a dictionary is enough.
With C or similar, it's usually simpler to convert each key to an enum, then combine them using bitwise operators to get a single number that you can use in your switch (that's also an easy way to use a switch instead of string comparisons with cascading ifs). You also get another benefit from doing it. It's quite common for several treatments in different branches of the initial switch to be the same. With the initial form it's quite difficult to make that obvious. You'll probably have some calls to the same functions, but at different points in the code. Now you can just group the identical cases when writing the switch.
I have used this transformation several times in production code and it's easy to do and to maintain.
Summarily you can get something like this... the mix function obviously depends on your application specifics.
switch (mix(var, subvar))
{
case a1:
process a1;
case b1:
process b1;
case c1:
process c1;
case a2:
process a2;
case b2:
process b2;
case c2:
process c2;
case a3:
process a3;
case b3:
process b3;
case c3:
process c3;
}
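One possible shape for the mix function, sketched in Python under the assumption that both keys are small enums; the enum members, the bit width and the returned strings are made up for illustration. It packs the two values into one integer with a shift and a bitwise OR, then groups combinations that share a treatment:

```python
from enum import IntEnum

class Var(IntEnum):
    ONE = 0
    TWO = 1
    THREE = 2

class SubVar(IntEnum):
    A = 0
    B = 1
    C = 2

SUBVAR_BITS = 2  # assumption: every SubVar value fits in 2 bits

def mix(var, subvar):
    # Shift var past the bits reserved for subvar, then OR them together.
    return (var << SUBVAR_BITS) | subvar

def process(var, subvar):
    key = mix(var, subvar)
    # Identical treatments from different branches are grouped in one place.
    if key in (mix(Var.ONE, SubVar.A), mix(Var.TWO, SubVar.A)):
        return "shared treatment"
    if key == mix(Var.ONE, SubVar.B):
        return "b1"
    return "default"
```

The keys stay unique as long as each value fits in the bits reserved for it.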
Perhaps what you want is code generation?
#! /usr/bin/python
first = [1, 2, 3]
second = ['a', 'b', 'c']

def emit(first, second):
    # Build the outer switch, then one inner switch per value of 'first'
    result = "switch (var)\n{\n"
    for f in first:
        result += " case {0}:\n switch (subvar)\n {{\n".format(f)
        for s in second:
            result += " case {1}:\n process {1}{0};\n".format(f,s)
        result += " }\n"
    result += "}\n"
    return result

print(emit(first, second))
#open("autogen.c","w").write(emit(first,second))
This is pretty hard to read, of course, and you might really want a nicer template language to do your dirty work, but this will ease some parts of your task.
If C++ is an option, I would try using virtual functions and maybe double dispatch. That could make it much cleaner. But it will probably pay off only if you have many more cases.
This article on DDJ.com might be a good entry.
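To give a feel for double dispatch, here is a small sketch written in Python rather than C++ (Python methods are always dispatched dynamically, so no virtual keyword is needed). All class and method names are hypothetical. The first call selects behaviour from the type of var, and the second from the type of subvar:

```python
class Var1:
    def process(self, subvar):
        # First dispatch: we now know var is a Var1.
        return subvar.process_with_var1()

class Var2:
    def process(self, subvar):
        return subvar.process_with_var2()

class SubVarA:
    # Second dispatch: one method per Var type.
    def process_with_var1(self):
        return "a1"

    def process_with_var2(self):
        return "a2"

class SubVarB:
    def process_with_var1(self):
        return "b1"

    def process_with_var2(self):
        return "b2"

def process(var, subvar):
    # No switch: the two method calls choose the right treatment.
    return var.process(subvar)
```

The cost is one class per value and one method per combination, which is why this tends to pay off only with many cases.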
If you're just trying to eliminate the two-level switch/case statements (and save some vertical space), you can encode the two variable values into a single value, then switch on it:
// Assumes var is in [1,3] and subvar in [1,3]
// and that var and subvar can be cast to int values
switch (10*var + subvar)
{
case 10+1:
process a1;
case 10+2:
process b1;
case 10+3:
process c1;
//
case 20+1:
process a2;
case 20+2:
process b2;
case 20+3:
process c2;
//
case 30+1:
process a3;
case 30+2:
process b3;
case 30+3:
process c3;
//
default:
process error;
}
If your language is C#, and your choices are short enough and contain no special characters, you can use reflection and do it with just a few lines of code. This way, instead of manually creating and maintaining an array of function pointers, you use one that the framework provides!
Like this:
using System.Reflection;
...
void DispatchCall(string var, string subvar)
{
string functionName="Func_"+var+"_"+subvar;
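// The Func_* methods below are private by default, so NonPublic must be part of the lookup
MethodInfo m=this.GetType().GetMethod(functionName, BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic);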
if (m == null) throw new ArgumentException("Invalid function name "+ functionName);
m.Invoke(this, new object[] { /* put parameters here if needed */ });
}
void Func_1_a()
{
//executed when var=1 and subvar=a
}
void Func_2_charlie()
{
//executed when var=2 and subvar=charlie
}
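The same name-based lookup can be sketched in Python with the built-in getattr, for comparison. The class name, method names and return values here are illustrative only:

```python
class Dispatcher:
    def dispatch_call(self, var, subvar):
        # Build the method name from the two values, then look it up by name.
        function_name = f"func_{var}_{subvar}"
        method = getattr(self, function_name, None)
        if not callable(method):
            raise ValueError(f"Invalid function name {function_name}")
        return method()

    def func_1_a(self):
        # Executed when var=1 and subvar=a
        return "var=1, subvar=a"

    def func_2_charlie(self):
        # Executed when var=2 and subvar=charlie
        return "var=2, subvar=charlie"
```

As with the C# version, the method names are the dispatch table, so a typo in a name only shows up at run time.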
Solution from developpez.com
Yes, you can optimize it and make it so much cleaner. You can use a "Chain of Responsibility" with a Factory:
public class ProcessFactory {
private ArrayList<Process> processses = null;
public ProcessFactory(){
super();
processses = new ArrayList<Process>();
processses.add(new ProcessC1());
processses.add(new ProcessC2());
processses.add(new ProcessC3());
processses.add(new ProcessC4());
processses.add(new ProcessC5(6));
processses.add(new ProcessC5(22));
}
public Process getProcess(int var, int subvar){
for(Process process : processses){
if(process.canDo(var, subvar)){
return process;
}
}
return null;
}
}
Then, as long as your processes implement a Process interface with canDo(var, subvar) and launch(), you can simply use:
new ProcessFactory().getProcess(var,subvar).launch();
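For reference, a compact Python sketch of the same pattern. The Process class here is a stand-in invented for the example (the original ProcessC1 to ProcessC5 classes are not shown in the answer), and it assumes each process handles exactly one (var, subvar) pair:

```python
class Process:
    def __init__(self, var, subvar, name):
        self.var = var
        self.subvar = subvar
        self.name = name

    def can_do(self, var, subvar):
        # Each process decides for itself whether it handles this input.
        return (var, subvar) == (self.var, self.subvar)

    def launch(self):
        return f"launched {self.name}"

class ProcessFactory:
    def __init__(self):
        # Order matters: the first process that accepts the input wins.
        self.processes = [
            Process(1, 1, "c1"),
            Process(1, 2, "c2"),
            Process(2, 1, "c3"),
        ]

    def get_process(self, var, subvar):
        for process in self.processes:
            if process.can_do(var, subvar):
                return process
        return None  # no process accepts this combination

def launch(var, subvar):
    process = ProcessFactory().get_process(var, subvar)
    if process is None:
        raise ValueError(f"no process for ({var}, {subvar})")
    return process.launch()
```

Note that the lookup can come back empty, so the caller checks the result before calling launch; the one-line Java call above would throw a NullPointerException in that situation.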

Resources