Complicated Algorithm - How to store rules separate from processing code? - algorithm

I'm working on a project which will do some complicated analyzing on some user-supplied input. There will be 3 parts of the code:
1) Input supplied by user, such as keywords
2) Rules, such as if keyword 1 is repeated 3 times in keyword 5, do this, etc.
3) And the analyzing itself which executes the rules and processes the user input, and generates the output necessary based on the processing.
Naturally this will lead to a lot of spaghetti code and many, many if statements in the processing code. I want to avoid that, and keep the rules (i.e. the if statements) separately from the code which loops through the user input and generates the output.
How can I do that, i.e. what is the best way?

If you have enough rules that you want to externalize, you could try using a business rules engines, like Drools in Java.
A business rules engine is a software system that executes one or more business rules in a runtime production environment. The rules might come from legal regulation ("An employee can be fired for any reason or no reason but not for an illegal reason"), company policy ("All customers that spend more than $100 at one time will receive a 10% discount"), or other sources. (Wikipedia)
It could be a little bit overhead depending of what you're trying to do. In my company we're using such kind of tools for our quality analysis tool.

Store it in XML. Easy to parse and update.
I had designed a code generator, which can be controllable from a xml file.
For each command I had a entry in the xml. I was processing the node to generate the opcode for that command. Node itself contains the actions I need to do for getting the opcode. For some commands I had to look into database, all those things I had put in this xml file.

Well, i doubt that it is necessary to have hughe if statements if polymorphism is applied correctly.
Actually, you need a proper domain model for your rules. This goes somehow into the direction of the command pattern, depending on the complexitiy of your code maybe in combination with the state machine pattern.
Once you have your model, defining rules is instantiate them correctly.
This could be done by having an xml definition, which is parsed and transformed into your model. But the new modern and even more fancy way would be using DSLs. If you program in Java and have a certain freedom about your libraries, this would be a proper use case for Embedded DSLs with Groovy. Basically you would need a Builder which constructs your model, that's all.

You always can implement factory that will create certain strategies according to passed parameters. And then you will use those strategies in your code without any if.

If it's just detecting keywords, a finite state machine or similar. If it's doing more, then other pattern matching systems, such as rules engines.

Adding an embedded scripting language to your application might help. The rules would then be expressed in scripts, executed by the applications on processing.
The idea is that scripts are easy to change and contain high level logic that will be executed by your application in details.
There are a lot of scripting languages available to do this : lua, Python, Falcon, squirrel, angelscript, etc.

Have a look at rule engines!
The approach from Lars may also be arguable.

Related

Why do we need to separate or breakdown one Use Case into two or more use cases?

Why do you need to, in many instances, separate or breakdown one Use Case into two or more use cases?
The only reason to split a use case in multiple use cases is to share a significant piece of functionality among multiple use cases by isolating that piece of functionality in a separate use case.
Example: 'search product information' may be a separate use case included by use cases 'buy product' and 'hire product'.
Apart from 'include' there are also examples of the same principle using 'extend' or 'generalize'.
By doing so, you prevent that the shared behaviour is copied in multiple use cases, with the chance of growing inconsistencies.
In the previous example: We want to make sure that customers don't get a different way to search for product information when buying compared to when hiring products. With an included use case, people who read the use cases are immediately aware of that fact.
First of all: you don't. Starting to do that means you are doing functional analysis. The point in use case synthesis is to find the goal(s) (aka. added value) the different actors have when interacting with the system under consideration. It's quite futile to separate a goal into sub-goals at that level. Either you have some added value or you don't have it. So if someone has settled a use case and tries to break it down then the use case is either wrong (no use case) or it's useless since the use case already shows the added value.
My personal opinion about include and extend: they are basically evil and a wrong concept introduced by techies (which most of the UML designers are) with no business background. Using them means you are already starting functional analysis. But UCs are synthesized from requirements. That is, you drag your net through that requirements soup and fish out those that fit together to build a story which makes sense - and which delivers added value: a use case.
And as always: read Bittner/Spence about use cases.

Run user-submitted code in Go

I am working on an application which allows users to compare the execution of different string comparison algorithms. In addition to several algorithms (including Boyer-Moore, KMP, and other "traditional" ones) that are included, I want to allow users to put in their own algorithms (these could be their own algorithms or modifications to the existing ones) to compare them.
Is there some way in Go to take code from the user (for example, from an HTML textarea) and execute it?
More specifically, I want the following characteristics:
I provide a method signature and they fill in whatever they want in the method.
A crash or a syntax error in their code should not cause my whole program to crash. It should instead allow me to catch the error and display an error message.
(In this case, I am not worried about security against malicious code because users will only be executing my program on their own machines, so security is their own responsibility.)
If it is not possible to do this natively with Go, I am open to embedding one of the following languages to use for the comparison functions (in order of preference): JavaScript, Python, Ruby, C. Is there any way to do any of those?
A clear No.
But you can do fancy stuff: Why not recompile the program including the user provided code?
Split the stuff into two: One driver which collects user code, recompiles the actual code, executes the actual code and reports the outcome.
Including other interpreters for other languages can be done, e.g. Otto is a Javascript interpreter. (C will be hard :-)
Have you considered doing something similar to the gopherjs playground? According to this, the compilation is being done client-side.

Is there an easy way to replace a deprecated method call in Xcode?

So iOS 6 deprecates presentModalViewController:animated: and dismissModalViewControllerAnimated:, and it replaces them with presentViewController:animated:completion: and dismissViewControllerAnimated:completion:, respectively. I suppose I could use find-replace to update my app, although it would be awkward with the present* methods, since the controller to be presented is different every time. I know I could handle that situation with a regex, but I don't feel comfortable enough with regex to try using it with my 1000+-files-big app.
So I'm wondering: Does Xcode have some magic "update deprecated methods" command or something? I mean, I've described my particular situation above, but in general, deprecations come around with every OS release. Is there a better way to update an app than simply to use find-replace?
You might be interested in Program Transformation Systems.
These are tools that can automatically modify source code, using pattern-directed source-to-source transformations ("if you see this source-level pattern, replace it by that source-level pattern") that operate on code structures rather than text. Done properly, these transformations can be reliable and semantically correct, and they're a lot easier to write than low-level procedural code that navigates and smashes nanoscopic actual tree structures.
It is not the case that using such tools is easy; such tools have to know how to parse the language of interest into compiler data structures, (e.g., ObjectiveC), process the patterns, and regenerate compilable source code from the modified structures. Even with the basic transformation engine, somebody needs to carefully define parsers (and unparsers!) for the dialects of the languages of interest. And it takes time to learn how to use such a even if you have such parsers/unparsers. This is worth it if the changes you need to make are "regular" (in the program transformation sense, not the regexp sense) and widespread (as yours seem to be).
Our DMS Software Reengineering toolkit has an ObjectiveC front end, and can carry out such transformations.
no there is no magic like that

Artifact naming convention

We're doing a big project on OSGi and adding some commons modules. There's some discussion about naming the artifact.
So, one possibility when naming the module is for example:
cmns-definitions (for common definitions), another is cmns-definition, still another is cmns-def. This has some effect also on the package name. Now it's
xx.xxx.xxx.xxx.xxx.commons.definitions, if changing to cmns-def it would be xx.xxx.xxx.xxx.xxx.commons.def.
Inside this package will be classes like enums and other definitions to be used throughout the system.
I personally lean to cmns-definitions since there's not only 1 definition inside the package. Other people point out that java.util doesn't have only 1 utility there for example. Still, java.util is an abbreviation for me. It can mean java utility or java utilities. Same thing happens with commons-lang.
How would you name the package? Why would you choose this name?
cmns-definitions
cmns-definition
cmns-def
Bonus question: How to name something like cmns-exceptions? That's how I name it. Would you name it cmns-xcpt?
Ă‹DIT:
I'm throwing in my own thoughts on this in the hope of being either confirmed or contradicted. If you can, please do.
According to what I think, the background reason why you name something is to make it easier to understand what's inside it. Or, according to Peter Kriens, to make it easy to remember and being able to automate processes via patterns. Both are valid arguments.
My reasoning is as follows in terms of pattern:
1) When a substantivation occurs and it's well known in the industry, follow it on your naming.
Eg:
"features" is a case on this. We have a module called cmns-features. Does this mean we have many features on this module? No. It means "the module that implements the "features" file from Apache karaf".
"commons" is a substantivation of "common" well-accepted on the industry. It doesn't mean "many common". It means "Common code".
If I see extr-commons as a module name, I know that it contains common code for extr (in this case extraction), for example.
2) When a quantity of classes inside the module are cooperating to give a distinct "one and one only" meaning to the whole, use singular form to name it.
The majority of modules are included here. If I name something cmns-persistence-jpa, I mean that whatever classes inside cooperate together to provide the jpa implementation of cmns-persistence-api. I don't expect 2 implementations inside it, but actually a myriad of classes that together make one implementation. Crystal clear to me. No?
3) When a grouping of classes is done with the sole purpose of gathering classes by affinity, but the classes don't cooperate together to no purpose, use plural.
Here is the case for example of cmns-definitions (enums used by the whole system).
Alternatively, using an abbreviation circumvents the problem, e.g. cmns-def which can be also "interpreted expanded" by a human reader to cmns-definitions. Many people use also "xxxx-util" meaning xxxx-utilities.
Still a third option can be used to pack things together, using a name that itself means a pluralization. The word "api" comes to mind, but any word that pluralizes something would do, like "pack".
Support to these cases (3) are well-known modules like commons-collections (using the plural) or commons-dbcp (using abbreviation) or commons-lang (again abbreviation) and anything that uses api to pack classes together by affinity.
From apache:
commons-collections -> many powerful data structures that accelerate development of most significant Java applications
commons-lang -> host of helper utilities for the java.lang API
commons-dbcp -> package of several database connection pools
'it is just a name ...'
I find in my long career that these just names can make a tremendous difference in productivity. I do not think it makes a difference if you use definitions, definition, or def as long as you're consistent and use patterns in the name that are easy to remember and can be used to automate processes. A build based on a consistent naming scheme is infinitely easier to work with than a build with "nice human display" names that are ad-hoc and have no discernible pattern.
If you use patterns, names tend to become shorter. Now people working with these names usually spent a lot of time with them. So their readability is not nearly as important as their mnemonic value. It turns out that abbreviations of 3 or 4 characters are surprisingly powerful. One of the reason is they work well is that there is only one possible abbreviation while if you go longer there are many candidates.
Anyway, most import part is the overall consistency. Good luck.
definitions (or def or definition) is a bad name because it doesn't have any semantic to a reader. You're in an object oriented world (I suppose) - try to follow its conventions and principles. Modules in Maven should be named after the biggest "abstraction" they contain. "Definition" is a form, not a meaning.
Your question is similar to: "Which class name is better FileUtilities or FileUtils". Answer: none.
Basically what you do with the Definitions and Exceptions is to provide kind of an API for your other modules. So I propose to combine definitions, exceptions and add interfaces to it. Then it makes sense to call it all cmns-api. I normally prefer the singular names as they are shorter but you are free to decide as it is just a name.

Standards Document

I am writing a coding standards document for a team of about 15 developers with a project load of between 10 and 15 projects a year. Amongst other sections (which I may post here as I get to them) I am writing a section on code formatting. So to start with, I think it is wise that, for whatever reason, we establish some basic, consistent code formatting/naming standards.
I've looked at roughly 10 projects written over the last 3 years from this team and I'm, obviously, finding a pretty wide range of styles. Contractors come in and out and at times, and sometimes even double the team size.
I am looking for a few suggestions for code formatting and naming standards that have really paid off ... but that can also really be justified. I think consistency and shared-patterns go a long way to making the code more maintainable ... but, are there other things I ought to consider when defining said standards?
How do you lineup parenthesis? Do you follow the same parenthesis guidelines when dealing with classes, methods, try catch blocks, switch statements, if else blocks, etc.
Do you line up fields on a column? Do you notate/prefix private variables with an underscore? Do you follow any naming conventions to make it easier to find particulars in a file? How do you order the members of your class?
What about suggestions for namespaces, packaging or source code folder/organization standards? I tend to start with something like:
<com|org|...>.<company>.<app>.<layer>.<function>.ClassName
I'm curious to see if there are other, more accepted, practices than what I am accustomed to -- before I venture off dictating these standards. Links to standards already published online would be great too -- even though I've done a bit of that already.
First find a automated code-formatter that works with your language. Reason: Whatever the document says, people will inevitably break the rules. It's much easier to run code through a formatter than to nit-pick in a code review.
If you're using a language with an existing standard (e.g. Java, C#), it's easiest to use it, or at least start with it as a first draft. Sun put a lot of thought into their formatting rules; you might as well take advantage of it.
In any case, remember that much research has shown that varying things like brace position and whitespace use has no measurable effect on productivity or understandability or prevalence of bugs. Just having any standard is the key.
Coming from the automotive industry, here's a few style standards used for concrete reasons:
Always used braces in control structures, and place them on separate lines. This eliminates problems with people adding code and including it or not including it mistakenly inside a control structure.
if(...)
{
}
All switches/selects have a default case. The default case logs an error if it's not a valid path.
For the same reason as above, any if...elseif... control structures MUST end with a default else that also logs an error if it's not a valid path. A single if statement does not require this.
In the occasional case where a loop or control structure is intentionally empty, a semicolon is always placed within to indicate that this is intentional.
while(stillwaiting())
{
;
}
Naming standards have very different styles for typedefs, defined constants, module global variables, etc. Variable names include type. You can look at the name and have a good idea of what module it pertains to, its scope, and type. This makes it easy to detect errors related to types, etc.
There are others, but these are the top off my head.
-Adam
I'm going to second Jason's suggestion.
I just completed a standards document for a team of 10-12 that work mostly in perl. The document says to use "perltidy-like indentation for complex data structures." We also provided everyone with example perltidy settings that would clean up their code to meet this standard. It was very clear and very much industry-standard for the language so we had great buyoff on it by the team.
When setting out to write this document, I asked around for some examples of great code in our repository and googled a bit to find other standards documents that smarter architects than I to construct a template. It was tough being concise and pragmatic without crossing into micro-manager territory but very much worth it; having any standard is indeed key.
Hope it works out!
It obviously varies depending on languages and technologies. By the look of your example name space I am going to guess java, in which case http://java.sun.com/docs/codeconv/ is a really good place to start. You might also want to look at something like maven's standard directory structure which will make all your projects look similar.

Resources