Common causes of Cyclomatic Complexity and their solutions - cyclomatic-complexity

At work we are looking into common problems that lead to high cyclomatic complexity. For example, having a large if-else statement can lead to high cyclomatic complexity, but can be resolved by replacing conditionals with polymorphism. What other examples have you found?
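As an illustration of the "replace conditionals with polymorphism" refactoring mentioned above, here is a minimal Python sketch (the shape classes and methods are invented for the example):

# Before: a single function whose if/elif chain adds one decision point per kind.
def area_by_kind(kind, w, h):
    if kind == "rectangle":
        return w * h
    elif kind == "triangle":
        return w * h / 2
    else:
        raise ValueError(kind)

# After: each subclass carries its own behaviour, callers have no branches,
# and every area() method has a cyclomatic complexity of 1.
class Shape:
    def area(self):
        raise NotImplementedError

class Rectangle(Shape):
    def __init__(self, w, h):
        self.w, self.h = w, h
    def area(self):
        return self.w * self.h

class Triangle(Shape):
    def __init__(self, w, h):
        self.w, self.h = w, h
    def area(self):
        return self.w * self.h / 2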

See NDepend's definition of Cyclomatic Complexity.
Nesting Depth is also a great code metric.
Cyclomatic complexity is a popular procedural software metric equal to the number of decisions that can be taken in a procedure. Concretely, in C# the CC of a method is 1 + {the number of the following expressions found in the body of the method}:
if | while | for | foreach | case | default | continue | goto | && | || | catch | ternary operator ?: | ??
The following expressions are not counted for CC computation:
else | do | switch | try | using | throw | finally | return | object creation | method call | field access
Adapted to the OO world, this metric is defined both on methods and on classes/structures (as the sum of the CC of their methods). Notice that the CC of an anonymous method is not counted when computing the CC of its outer method.
Recommendations: Methods where CC is higher than 15 are hard to understand and maintain. Methods where CC is higher than 30 are extremely complex and should be split into smaller methods (except if they are automatically generated by a tool).

Another way to avoid so many ifs is to implement a Finite State Machine. Events fire transitions, so the conditionals become implicit in the transitions that change the state of the system, and the control flow is easier to follow.
Here is a link that mentions some of its benefits:
http://www.skorks.com/2011/09/why-developers-never-use-state-machines/
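A rough Python sketch of the idea (the states, events and transition table are invented for illustration):

# A tiny table-driven state machine: the transition table replaces a chain of
# if/elif checks on (current_state, event) pairs.
TRANSITIONS = {
    ("idle", "start"): "running",
    ("running", "pause"): "paused",
    ("paused", "start"): "running",
    ("running", "stop"): "idle",
}

class Machine:
    def __init__(self, state="idle"):
        self.state = state

    def handle(self, event):
        # One dictionary lookup instead of nested conditionals; unknown
        # (state, event) pairs leave the state unchanged.
        self.state = TRANSITIONS.get((self.state, event), self.state)
        return self.state

m = Machine()
print(m.handle("start"))  # running
print(m.handle("pause"))  # paused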

How to use Amdahl's Law (overall speedup vs speedup)

Recall Amdahl’s law on estimating the best possible speedup. Answer the following questions.
You have a program that has 40% of its code parallelized on three processors, and just for this fraction of code, a speedup of 2.3 is achieved. What is the overall speedup?
I'm having trouble understanding the difference between speedup and overall speedup in this question. I know there must be a difference by the way this question is worded.
Q : What is the overall speedup?
A good place to start is not the original, trivial Amdahl's-law formula, but a more contemporary view that extends the original, where add-on overhead costs are discussed and the atomicity of the split work is also explained.
Two sections: one accelerated by a "local" speedup, one overall result.
Your problem formulation by-passes the there-explained sorts of real-world process-orchestration overheads by simply postulating a (net, local) speedup for the <PAR>-able section under review. The implementation's add-on overhead costs become "hidden": they show up only as the inefficiency of having three times the resources for code-stream execution yet achieving a 2.3x speedup rather than 3.0x. In other words, more than the theoretical 1/3 of the section time is actually spent on the initial setup (an add-on overhead time, not present in pure-[SERIAL] code execution), plus the parallel processing itself (the useful work, now on triple the code-execution capacity), plus terminating and collecting the results back into the "main" code (again add-on overhead times, not present in pure-[SERIAL] code execution).
"Hiding" these natural costs of going into and out of [PARALLEL] code-execution sections simplifies the homework, yet a proper understanding of the real-life costs is crucial: otherwise one may easily spend more (on setups and all the other add-on overheads that are unavoidable in the real world) than one would ever receive back from the wished-for many-processors-harnessed split-processing speedup.
|-------> time
|START:
| |DONE: 100% of the code
| | |
|______________________________________<SEQ>______60%_|_40%__________________<PAR>-able__|
o--------------------------------------<SEQ>----------o----------------------<PAR>-able--o CPU_x runs both <SEQ> and <PAR>-able sections of code, in a pure [SERIAL] process-flow orchestration, one after another
| |
| |
|-------> time
|START: |
| | |DONE: 100% of the code :
o--------------------------------------<SEQ>----------o | :
| o---------o .. .. .. .. ..CPU_1 runs <PAR>'d code
| o---------o .. .. .. .. ..CPU_2 runs <PAR>'d code
| o---------o .. .. .. .. ..CPU_3 runs <PAR>'d code
| | |
| | |
| <_not_1/3_> just ~ 2.3x faster (not 3x) perhaps reflects real-costs (penalisations) of new, add-on, process-organisation related setup + termination overheads
|______________________________________<SEQ>______60%_|_________|~ 40% / 2.3x ~ 17.39% i.e. the <PAR>-section has gained a local ( "net"-section ) speedup of 2.3x instead of 3.0x, achievable on 3-CPU-code-execution streams
| | |
Net overall speedup ( if no other process-organisation related add-on overhead costs were accrued )
is:
( 60% + ( 40% / 1.0 ) )
---------------------------- ~ 1.2921 x
( 60% + ( 40% / 2.3 ) )
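As a quick cross-check, a minimal Python sketch of the same calculation (the function and variable names are mine):

def overall_speedup(parallel_fraction, section_speedup):
    # Amdahl-style overall speedup when only `parallel_fraction` of the runtime
    # is accelerated by `section_speedup`; the rest stays serial.
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / section_speedup)

print(overall_speedup(0.40, 2.3))  # ~1.2921, as above
print(overall_speedup(0.40, 3.0))  # ~1.3636, the ideal case with a full 3.0x section speedup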

Use machine learning to validate phone numbers

So I am looking to validate user input phone numbers.
So far I have been doing so with Regex. But with different phone number formats from all around the world it's been getting hard to maintain the Regex.
Since I have a lot of datasets of valid phone numbers I figured it might be possible to use a machine learning algorithm.
Because I don't have any prior experience with machine learning, I tried to prototype it using a scikit-learn SVM. It didn't work.
Now I'm curious whether this is even a good use case for a machine learning algorithm. If it is, what are some resources I should look up?
If not, what are some alternatives to machine learning for building an easy-to-extend phone number validation?
This is a case of plain computer programming: you probably need to refactor your code into some kind of class that is responsible for validating phone numbers from different countries.
Also, from a regex perspective, the question of updating it for international phone numbers has been asked here: What regular expression will match valid international phone numbers? The best answer there is to use the following regex:
\+(9[976]\d|8[987530]\d|6[987]\d|5[90]\d|42\d|3[875]\d|
2[98654321]\d|9[8543210]|8[6421]|6[6543210]|5[87654321]|
4[987654310]|3[9643210]|2[70]|7|1)\d{1,14}$
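A quick Python sketch of applying that pattern (the test strings are illustrative; `re.match` anchors the start and the pattern's trailing `$` anchors the end):

import re

# The pattern above, joined into a single string with the line breaks removed.
E164_PATTERN = re.compile(
    r"\+(9[976]\d|8[987530]\d|6[987]\d|5[90]\d|42\d|3[875]\d|"
    r"2[98654321]\d|9[8543210]|8[6421]|6[6543210]|5[87654321]|"
    r"4[987654310]|3[9643210]|2[70]|7|1)\d{1,14}$"
)

print(bool(E164_PATTERN.match("+442079460999")))  # True
print(bool(E164_PATTERN.match("not a number")))   # False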
Regarding machine learning, here's a nice summary of the kinds of questions machine learning can answer, which boils down to the following list:
Is this A or B?
Is this weird?
How much/how many?
How is it organized?
What should I do next?
Check the blog article (there is also a video within the article) for more details. Your question doesn't really fit in any of the above five categories.
International phone number rules are immensely complicated, so it's unlikely a regex will work. Training a machine learning algorithm could potentially work if you have enough data, but there are some weird edge cases and formatting variations (including multiple ways of expressing the same phone number) that would make life difficult.
A better option is to use Google's libphonenumber. It's an open source phone number validation library implemented in C++ and Java, with ports for quite a few other languages.
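For example, a minimal sketch with the Python port (the `phonenumbers` package, assumed to be installed):

import phonenumbers

def is_valid(raw, default_region=None):
    # The region (e.g. "GB") is only needed for nationally formatted input;
    # numbers in full international format can be parsed without it.
    try:
        number = phonenumbers.parse(raw, default_region)
    except phonenumbers.NumberParseException:
        return False
    return phonenumbers.is_valid_number(number)

print(is_valid("+44 20 7946 0999"))     # international format
print(is_valid("020 7946 0999", "GB"))  # national format, region required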
The given task is syntax-restricted plus subject to regulatory procedures.
Machine learning would need a training dataset covering that whole super-set of rules in order to meet a projected error rate (constrained by Hoeffding's inequality), which for low error-rate targets is in principle (almost) impossible to arrange to train at.
So even the regex tools are (almost) guessing, as the terminal parts of the E.164 "address" are (almost) un-maintainable for the global address space.
Probabilistic ML learners may make somewhat more sense here, but again, they will knowingly guess (with the comfort of providing a working estimate of the confidence level achieved by each and every such guess).
Why?
Because each telephone number (and here we do not consider lexical irregularities and similar cosmetic details) must conform to a global set of regulations (ITU-T governed), then, on a lower level, is subject to a national set of regulations (multi-party governed), and finally there are two distinct phone-number E.164 "address" assignment procedures, which does not make the story any easier.
A brief view of RFC 4725:
just to realise the [ ITU-T [, NNPA [, CSP [, <privateAdmin> ]]]] hierarchy of distributed rules introduced into the (absolute-syntax, distributed-governance) analysis of E.164 number blocks (down to an individual number).
From RFC 4725, "ENUM Validation Architecture", November 2006:
These two variants of E.164 number assignment are depicted in
Figure 2:
+--------------------------------------------+
| International Telecommunication Union (ITU)|
+--------------------------------------------+
|
Country codes (e.g., +44)
|
v
+-------------------------------------------+
| National Number Plan Administrator (NNPA) |------------+
+-------------------------------------------+ |
| |
Number Ranges |
(e.g., +44 20 7946 xxxx) |
| |
v |
+--------------------------------------+ |
| Communication Service Provider (CSP) | |
+--------------------------------------+ |
| |
| Single Numbers
Either Single Numbers (e.g., +44 909 8790879)
or Number Blocks (Variant 2)
(e.g., +44 20 7946 0999, +44 20 7946 07xx) |
(Variant 1) |
| |
v |
+----------+ |
| Assignee |<------------------------------+
+----------+
Figure 2: E.164 Number Assignment
(Note: Numbers above are "drama" numbers and are shown for
illustrative purpose only. Assignment policies for similar "real"
numbers in country code +44 may differ.)
As the Assignee (subscriber) data associated with an E.164 number is
the primary source of number assignment information, the NAE usually
holds the authoritative information required to confirm the
assignment.
A CSP that acts as NAE (indirect assignment) may therefore easily
assert the E.164 number assignment for its subscribers. In some
cases, such CSPs operate database(s) containing service information
on their subscribers' numbers.

How can MARS produce weird constants in terms?

I've been reading about an interesting machine learning algorithm, MARS (multivariate adaptive regression splines).
As far as I understand the algorithm, from Wikipedia and Friedman's papers, it works in two stages, a forward pass and a backward pass. I'll ignore the backward pass for now, since the forward pass is the part I'm interested in. The steps for the forward pass, as far as I can tell, are:
Start with just the mean of the data.
Generate a new term pair, through exhaustive search
Repeat 2 while improvements are being made
And to generate a term pair MARS appears to do the following:
Select an existing term (e)
Select a variable (x)
Select a value of that variable (v)
Return two terms one of the form e*max(0,x-v) and the other of the form e*max(0, v-x)
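A rough Python sketch of that last step (the function and variable names are mine, not from the papers):

import numpy as np

def candidate_term_pair(e, x, v):
    # Given an existing term's values e, a variable's values x and a knot v,
    # return the two candidate MARS terms e*max(0, x - v) and e*max(0, v - x).
    return e * np.maximum(0.0, x - v), e * np.maximum(0.0, v - x)

# Example using the B column of the table below and the constant initial term:
B = np.array([6.0, 2.0, 1.0])
e0 = np.ones_like(B)                      # the model starts from a constant term
t1, t2 = candidate_term_pair(e0, B, 1.0)  # hinge pair at knot v = 1
print(t1)  # [5. 1. 0.]  i.e. max(0, B - 1)
print(t2)  # [0. 0. 0.]  i.e. max(0, 1 - B)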
And this makes sense to me. I could see how, for example, a data table like this:
+---+---+---+
| A | B | Z |
+---+---+---+
| 5 | 6 | 1 |
| 7 | 2 | 2 |
| 3 | 1 | 3 |
+---+---+---+
Could produce terms like 2*max(0, B-1) or even 8*max(0, B-1)*max(0, 3-A). However, the Wikipedia page has an example that I don't understand. It has an ozone example where the first term is 25, yet the final regression also has a term whose coefficient is negative and fractional. I don't see how this is possible: the initial term is positive, you can only multiply by previous terms, and no previous term can have a negative coefficient, so I don't see how you could ever end up with one...
What am I missing?
As I see it, either I misunderstand term generation, or I misunderstand the simplification process. However, simplification as described seems to only delete terms, not modify them. Can you see what I am missing here?

CUDA / OpenCL cache coherence, locality and space-filling curves

I'm working on a CUDA app that makes use of all available RAM on the card, and am trying to figure out different ways to reduce cache misses.
The problem domain consists of a large 2- or 3-D grid, depending on the type of problem being solved. (For those interested, it's an FDTD simulator). Each element depends on either two or four elements in "parallel" arrays (that is, another array of nearly identical dimensions), so the kernels must access either three or six different arrays.
The Problem
(Hopefully this isn't "too localized". Feel free to edit the question.)
The relationship between the three arrays can be visualized as follows (apologies for the mediocre ASCII art):
A[0,0] -C[0,0]- A ---- C ---- A ---- C ---- A
| | | |
| | | |
B[0,0] B B B
| | | |
| | | |
A ---- C ---- A ---- C ---- A ---- C ---- A
| | | |
| | | |
B B B B
| | | |
| | | |
A ---- C ---- A ---- C ---- A ---- C ---- A
| | | |
| | | |
B B B B[3,2]
| | | |
| | | |
A ---- C ---- A ---- C ---- A ---- C ---- A[3,3]
[2,3]
Items connected by lines are coupled. As can be seen above, A[] depends on both B[] and C[], while B[] depends only on A[], as does C[]. All of A[] is updated in the first kernel, and all of B[] and C[] are updated in a second pass.
If I declare these arrays as simple 2D arrays, I wind up with strided memory access. For a very large domain size (3x3 +- 1 in the grid above), this causes occupancy and performance deficiencies.
So, I thought about rearranging the array layout along a Z-order (Morton) curve.
Also, it would be fairly trivial to interleave these into one array, which should improve fetch performance since (depending on the interleave order) at least half of the elements required for a given cell update would be close to one another. However, it's not clear to me whether the GPU uses multiple data pointers when accessing multiple arrays. If so, this imagined benefit could actually be a hindrance.
The Questions
I've read that NVidia does this automatically behind the scenes when using texture memory, or a cudaArray. If this is not the case, should I expect the increased latency when crossing large spans (when the Z curve goes from upper right to bottom left at a high subdivision level) to eliminate the benefit of the locality in smaller grids?
Dividing the grid into smaller blocks that can fit in shared memory should certainly help, and the Z order makes this fairly trivial. Should I have a separate kernel pass that updates boundaries between blocks? Will the overhead of launching another kernel be significant compared to the savings I expect?
Is there any real benefit to using a 2D vs 1D array? I expect memory to be linear, but am unsure if there is any real meaning to the 2D memory layout metaphor that's often used in CUDA literature.
Wow - long question. Thanks for reading and answering any/all of this.
Just to get this off of the unanswered list:
After a lot of benchmarking and playing with different arrangements, the fastest approach I found was to keep the arrays interleaved in Z-order, so that most of the values required by a thread were located near each other in RAM. This improved cache behavior (and thus performance). Obviously there are many cases where Z-order fails to keep required values close together. I wonder whether rotating quadrants to reduce the "distance" between the end of one Z and the start of the next quadrant would help, but I haven't tried that.
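For reference, a minimal Python sketch of the 2-D Morton (Z-order) index computation used for such an interleaved layout (the bit-spreading constants assume 16-bit coordinates; the names are mine):

def morton2d(x, y):
    # Interleave the bits of x and y to get a 2-D Z-order (Morton) index.
    def spread(v):
        # Spread the lower 16 bits of v so a zero bit sits between each bit.
        v &= 0xFFFF
        v = (v | (v << 8)) & 0x00FF00FF
        v = (v | (v << 4)) & 0x0F0F0F0F
        v = (v | (v << 2)) & 0x33333333
        v = (v | (v << 1)) & 0x55555555
        return v
    return spread(x) | (spread(y) << 1)

# Neighbouring cells such as (2, 3) and (3, 3) map to nearby Morton indices
# (14 and 15), which keeps coupled A/B/C values close together in memory.
print(morton2d(2, 3), morton2d(3, 3))  # 14 15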
Thanks to everyone for the advice.

OCaml performance according to matching order

In OCaml, is there any relation between the order in a pattern-matching and performance?
For instance, if I declare a type:
type t = A | B | C
and then perform some pattern-matching as follows:
match t1 with
| A -> ...
| _ -> ...
From a performance point of view, is it equivalent to
match t1 with
| B -> ...
| _ -> ...
assuming in the first case there are as many A's as there are B's in the second?
In other words, should I worry about the order of declaration of constructors in a type, when considering performance?
There is a paper explaining how pattern-matching is compiled in OCaml:
“Optimizing Pattern Matching”, L. Maranget and F. Le Fessant, ICFP’01
It basically says that the semantics is "in order", but that the match is usually compiled in the optimal way, independently of the order of the lines. The values of the constructors don't matter either; what makes the difference is the number of constructors, i.e., whether the match is compiled as a tree of comparisons or as a jump table.
Optimality + the exhaustivity test make pattern-matching in OCaml probably the most wonderful feature of the language, and it is much more efficient than writing cascades of "if" manually.
This is an impossible question to answer carefully. However, in practice if you have a type whose constructors are all nullary (i.e., equivalent to small integers), and there are more than a very few of them, but less than a whomping huge pile of them, the code generator will almost certainly use a hardware jump table, which has essentially the same performance for each possible value.
In general, I wouldn't worry about things like this at all until you have identified the slow parts of your code. But there's almost no chance you would be able to speed things up by reordering a set of nullary constructors.

Resources