How can MARS produce weird constants in terms?

I've been reading about an interesting machine learning algorithm, MARS (multivariate adaptive regression splines).
As far as I understand the algorithm, from Wikipedia and Friedman's papers, it works in two stages: a forward pass and a backward pass. I'll ignore the backward pass for now, since the forward pass is the part I'm interested in. The steps of the forward pass, as far as I can tell, are:
1. Start with just the mean of the data.
2. Generate a new term pair through exhaustive search.
3. Repeat step 2 while improvements are being made.
And to generate a term pair MARS appears to do the following:
1. Select an existing term (e).
2. Select a variable (x).
3. Select a value of that variable (v).
4. Return two terms, one of the form e*max(0, x-v) and the other of the form e*max(0, v-x).
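The term-pair step above can be sketched as follows. This is only a minimal illustration, not Friedman's implementation: the helper name hinge_pair is made up, the constant term is represented as a column of ones (an assumption), and both the exhaustive search over e, x and v and the fitting of coefficients are left out.

```python
def hinge_pair(parent, column, knot):
    """Build the two candidate columns parent*max(0, x-knot) and parent*max(0, knot-x).

    parent -- values of an existing term e, one per data row
    column -- values of the chosen variable x, one per data row
    knot   -- the chosen value v
    """
    right = [p * max(0, x - knot) for p, x in zip(parent, column)]
    left = [p * max(0, knot - x) for p, x in zip(parent, column)]
    return right, left


# Columns A and B of the small table from the question.
A = [5, 7, 3]
B = [6, 2, 1]

# Assumption: the initial constant term is treated as a column of ones.
intercept = [1, 1, 1]

# Pair built from the constant term, variable B, knot 1.
b_right, b_left = hinge_pair(intercept, B, 1)

# Pair built from the existing term max(0, B-1), variable A, knot 3.
ab_right, ab_left = hinge_pair(b_right, A, 3)
```

Here b_right holds max(0, B-1) for each row and ab_right holds max(0, B-1)*max(0, A-3); no coefficient is attached to either column at this stage.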
And this makes sense to me. I could see how, for example, a data table like this:
| A | B | Z |
| 5 | 6 | 1 |
| 7 | 2 | 2 |
| 3 | 1 | 3 |
could produce a term like 2*max(0, B-1) or even 8*max(0, B-1)*max(0, 3-A). However, the Wikipedia page has an example that I don't understand. It has an ozone example where the first term is 25. However, the final regression also has a term whose coefficient is negative and fractional. I don't see how this is possible: the initial term is 5, you can only multiply by previous terms, and no previous term can have a negative coefficient, so how could you ever end up with one?
What am I missing?
As I see it, either I misunderstand term generation, or I misunderstand the simplification process. However, simplification as described seems to only delete terms, not modify them. Can you see what I am missing here?


Relational Algebra Division

I'm currently dealing with a relational algebra division issue. I have the following two relations:
Relation R =
A | B | C
--|---|--
1 | 2 | 3
1 | 2 | 6
4 | 2 | 2
4 | 5 | 6

Relation T =
B
---
2
Now I'm doing the following operation: R ÷ T
When I calculate this, my result is as follows:
R ÷ T =
A | C
1 | 3
1 | 6
4 | 2
My reasoning is that division keeps those tuples of R that appear in combination with every tuple of T. But when I use a relational algebra calculator such as RelaX, it returns
R ÷ T =
A | C
4 | 2
Where did I make a mistake? Thanks in advance for any help.
Is there anybody who can help?
Performing division on these schemas is not a good way to fully understand how the operator works. Definitions of this operator are often unclear, and the operation is usually replaced by a combination of other operators.
A clear way to see how this works in your case is to create a new instance of R with its columns reordered under a new schema (A, C, B), that is, with the attribute of T appearing as the last attribute of R. In a division as simple as yours the result is easy to see, but imagine the schemas R(A,D,B,C) and T(B,D). Here the attributes of T appear in a different order in R, and even instances with only a few tuples would be hard to check by eye.
This might be the most difficult operator defined in relational algebra, as a query usually involves concepts from selection, projection and join. It is also hard to explain in words alone.
A good way of thinking about this operator is to think of GROUP BY in SQL. In your example this means grouping by the attributes A, C, which creates one group for every combination of values of these attributes that appears in the instance of R. Each group carries the set of all values of B associated with that combination of A, C. If you think of the values of attribute B in the instance of T as a set, you can quickly verify the result: for each group obtained by grouping by A, C, if the set of B values in T is included in the group's set of B values in R, then the group's values of A, C form a tuple of the resulting relation.
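The grouping idea above can be sketched in a few lines of Python. This is an illustrative sketch only: the function name divide and the sample rows are hypothetical (they are not the relations from the question), and R is assumed to have the schema (A, B, C) with T holding only B values.

```python
from collections import defaultdict


def divide(r_rows, t_values):
    """Divide R(A, B, C) by T(B) using the GROUP BY idea.

    Group R by (A, C), collect the set of B values in each group, and keep
    the groups whose B set contains every B value of T.
    """
    groups = defaultdict(set)
    for a, b, c in r_rows:
        groups[(a, c)].add(b)
    needed = set(t_values)
    return sorted(key for key, b_values in groups.items() if needed <= b_values)


# Hypothetical sample data, chosen only to show the mechanics.
R = [(1, 2, 3), (1, 5, 3), (4, 2, 2)]
T = [2, 5]

result = divide(R, T)
```

The group (1, 3) has the B values {2, 5}, which cover T, so it is kept; the group (4, 2) only has {2} and is dropped.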
I know I'm a bit late to this question but I've seen many people confused about this. If it's not clear enough, I leave reference to a document I wrote explaining it much more in detail and with a really good example, HERE.

Should I eliminate inputs in a logic circuit design?

Recently I had an exam where we were tested on logic circuits. I encountered something on that exam that I had never encountered before. Forgive me for I do not remember the exact problem given and we have not received our grade for it; however I will describe the problem.
The problem had 3 or 4 inputs. We were told to simplify and then draw a logic circuit design for that simplification. However, when I simplified, I ended up eliminating the other inputs and ended up literally with just
I had another problem like this as well, where there were 4 inputs and when I simplified, I ended up with three. My question is:
What do I do with the eliminated inputs? Do I just not have it on the circuit? How would I draw it?
Typically an output is a requirement which would not be eliminated, even if it ends up being dependent on a single input. If input A flows through to output Y, just connect A to Y in the diagram. If output Y is always 0 or 1, connect an incoming 0 or 1 to output Y.
On the other hand, inputs are possible, not required, factors in the definition of the problem. Inputs that have no bearing on the output need not be shown in the circuit diagram at all.
You are not really eliminating inputs; the resulting expression is simply the simplified outcome, and that is what you need to implement as a logic circuit.
As an example, if you are given an expression with 3 inputs, namely A, B and C, there are 2^3 = 8 possible input combinations, from 000 through 111. When you say your simplification led to just A, that means that for every one of those 8 input combinations the output equals the value of A.
An example truth table for a boolean expression (shown here with two inputs) that simplifies to the output A is as follows:
A B | Output = A
0 0 | 0
0 1 | 0
1 0 | 1
1 1 | 1
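One way to convince yourself that a simplification to a single input is legitimate is to compare both expressions over every input combination. The sketch below does this for a hypothetical 3-input expression, A·B + A·B'·C + A·B'·C', which is not the exam problem and was chosen only because it reduces to A.

```python
from itertools import product


def original(a, b, c):
    # Hypothetical unsimplified expression: A·B + A·B'·C + A·B'·C'
    return (a and b) or (a and not b and c) or (a and not b and not c)


def simplified(a, b, c):
    # The simplified form ignores B and C entirely.
    return a


# All 2^3 = 8 input combinations, from (False, False, False) to (True, True, True).
rows = list(product([False, True], repeat=3))

# True when both expressions agree on every row of the truth table.
equivalent = all(bool(original(*row)) == bool(simplified(*row)) for row in rows)
```

If equivalent is true, the circuit only needs a wire from A to the output; B and C have no effect on it.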

CUDA / OpenCL cache coherence, locality and space-filling curves

I'm working on a CUDA app that makes use of all available RAM on the card, and am trying to figure out different ways to reduce cache misses.
The problem domain consists of a large 2- or 3-D grid, depending on the type of problem being solved. (For those interested, it's an FDTD simulator). Each element depends on either two or four elements in "parallel" arrays (that is, another array of nearly identical dimensions), so the kernels must access either three or six different arrays.
The Problem
*Hopefully this isn't "too localized". Feel free to edit the question
The relationship between the three arrays can be visualized as follows (apologies for the mediocre ASCII art):
A[0,0] -C[0,0]- A ---- C ---- A ---- C ---- A
|               |             |             |
|               |             |             |
B[0,0]          B             B             B
|               |             |             |
|               |             |             |
A ----- C ----- A ---- C ---- A ---- C ---- A
|               |             |             |
|               |             |             |
|               |             |             |
|               |             |             |
A ----- C ----- A ---- C ---- A ---- C ---- A
|               |             |             |
|               |             |             |
B               B             B             B[3,2]
|               |             |             |
|               |             |             |
A ----- C ----- A ---- C ---- A ---- C ---- A[3,3]
Items connected by lines are coupled. As can be seen above, A[] depends on both B[] and C[], while B[] depends only on A[], as does C[]. All of A[] is updated in the first kernel, and all of B[] and C[] are updated in a second pass.
If I declare these arrays as simple 2D arrays, I wind up with strided memory access. For a very large domain size (3x3 +- 1 in the grid above), this causes occupancy and performance deficiencies.
So, I thought about rearranging the array layout in a Z-order curve:
Also, it would be fairly trivial to interleave these into one array, which should improve fetch performance since (depending on the interleave order) at least half of the elements required for a given cell update would be close to one another. However, it's not clear to me whether the GPU uses multiple data pointers when accessing multiple arrays. If so, this imagined benefit could actually be a hindrance.
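To make the Z-order and interleaving idea concrete, here is a small sketch of the index arithmetic, written in Python for readability rather than as kernel code. The function names, the 16-bit coordinate limit and the field numbering (0 = A, 1 = B, 2 = C) are assumptions made for illustration; the same bit manipulation would carry over to a device function.

```python
def part1by1(n):
    """Spread the low 16 bits of n so that a zero bit separates each original bit."""
    n &= 0x0000FFFF
    n = (n | (n << 8)) & 0x00FF00FF
    n = (n | (n << 4)) & 0x0F0F0F0F
    n = (n | (n << 2)) & 0x33333333
    n = (n | (n << 1)) & 0x55555555
    return n


def morton2d(x, y):
    """Z-order (Morton) index of cell (x, y): x bits in even positions, y bits in odd."""
    return part1by1(x) | (part1by1(y) << 1)


def interleaved_index(x, y, field, fields=3):
    """Position of one field of cell (x, y) in a single interleaved array.

    Assumed layout: the three values of a cell sit next to each other,
    and cells are ordered along the Z curve.
    """
    return fields * morton2d(x, y) + field


# Z-order indices of a 4x4 grid, row by row.
grid = [[morton2d(x, y) for x in range(4)] for y in range(4)]
```

For a 4x4 grid this yields the rows [0, 1, 4, 5], [2, 3, 6, 7], [8, 9, 12, 13] and [10, 11, 14, 15], so each 2x2 block occupies four consecutive indices.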
The Questions
I've read that NVidia does this automatically behind the scenes when using texture memory, or a cudaArray. If this is not the case, should I expect the increased latency when crossing large spans (when the Z curve goes from upper right to bottom left at a high subdivision level) to eliminate the benefit of the locality in smaller grids?
Dividing the grid into smaller blocks that can fit in shared memory should certainly help, and the Z order makes this fairly trivial. Should I have a separate kernel pass that updates boundaries between blocks? Will the overhead of launching another kernel be significant compared to the savings I expect?
Is there any real benefit to using a 2D vs 1D array? I expect memory to be linear, but am unsure if there is any real meaning to the 2D memory layout metaphor that's often used in CUDA literature.
Wow - long question. Thanks for reading and answering any/all of this.
Just to get this off of the unanswered list:
After a lot of benchmarking and playing with different arrangements, the fastest approach I found was to keep the arrays interleaved in Z-order so that most of the values required by a thread were located near each other in RAM. This improved cache behavior (and thus performance). Obviously there are many cases where Z-order fails to keep required values close together. I wonder whether rotating quadrants to reduce the "distance" between the end of a Z and the next quadrant would help, but I haven't tried that.
Thanks to everyone for the advice.

Stratego/XT: Understanding the basic of basics

I have really tried to get my head around the first steps of understanding Stratego/XT. I've googled a lot and all the web resources I have found seem to make a large enough leap at the beginning that I just can't make the connection. Let me explain.
I understand Abstract Syntax Trees like this:
But then it seems (in the very next sentence even) the documents make this leap to this:
LetSplit :
  Let([d1, d2 | d*], e*) ->
  Let([d1], Let([d2 | d*], e*))
This makes no sense to me. Could someone explain what is going on here with LetSplit?
Also, is there a good resource for furthering a solid understanding of Stratego/XT that is easier to read than the gargantuan and complex official "tutorial" on the Stratego/XT website?
LetSplit :
  Let([d1, d2 | d*], e*) ->
  Let([d1], Let([d2 | d*], e*))
This is a rewrite rule with the name LetSplit.
It is equivalent (syntactic sugar) to the strategy:
LetSplit =
  ?Let([d1, d2 | d*], e*) ; // match
  !Let([d1], Let([d2 | d*], e*)) // build
When the rule is invoked and the left-hand side Let([d1, d2 | d*], e*) (the match part) matches the current term, the current term is replaced by the right-hand side Let([d1], Let([d2 | d*], e*)) (the build part). When the left-hand side does not match, the rule fails and the current term remains unchanged.
d1, d2, d*, e* are term variables bound to the sub-terms found at their respective positions during the match. The names are then used in the build part, where they expand to the sub-tree they were bound to before. Note that indeed, * and ' may appear at the end of term variable names. The single quote has no special meaning, while * has a special meaning in list build operations (not the case here).
The syntax [d1, d2 | d*] in the match part matches any list with at least two elements. These elements will be bound to d1 and d2 and the remaining elements in the list will be bound to d* (so d* will be a list, and may be the empty list []).
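For readers more at home in a general-purpose language, the match and build steps can be mimicked in Python. This is only an analogy, not how Stratego works internally: terms are modelled here as tuples of the form ("Let", declarations, body), the function name let_split is made up, and returning None stands in for rule failure.

```python
def let_split(term):
    """Mimic LetSplit: Let([d1, d2 | d*], e*) -> Let([d1], Let([d2 | d*], e*))."""
    # Match part: the current term must be a Let with two arguments.
    if not (isinstance(term, tuple) and len(term) == 3 and term[0] == "Let"):
        return None  # the rule fails; the caller keeps the current term
    _, decls, body = term
    # [d1, d2 | d*] only matches lists with at least two elements.
    if not isinstance(decls, list) or len(decls) < 2:
        return None
    d1 = decls[0]
    rest = decls[1:]  # plays the role of [d2 | d*]
    # Build part: assemble the right-hand side from the bound variables.
    return ("Let", [d1], ("Let", rest, body))


split = let_split(("Let", ["a", "b", "c"], ["e"]))
failed = let_split(("Let", ["a"], ["e"]))
```

With three declarations, d1 is bound to "a" and the rest of the list to ["b", "c"], so split is ("Let", ["a"], ("Let", ["b", "c"], ["e"])). With a single declaration the match fails and failed is None.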
Also, is there a good resource for furthering a solid understanding of
Stratego/XT that is easier to read than the gargantuan and complex
official "tutorial" on the Stratego/XT website?
Research papers. Admittedly they aren't really easier to read, but arguably they are the only place where some of the more advanced concepts are explained.
Stratego/XT 0.17. A language and toolset for program transformation (may be a good starting point to find keywords to use in e.g. google scholar)
Program Transformation with Scoped Dynamic Rewrite Rules (scary, but contains a wealth of information about dynamic rewrite rules that is hard to find elsewhere)
more papers
Anyway, feel free to ask more questions here on Stack Overflow, I will try to answer them :-)

Common causes of Cyclomatic Complexity and their solutions

At work we are looking into common problems that lead to high cyclomatic complexity. For example, having a large if-else statement can lead to high cyclomatic complexity, but can be resolved by replacing conditionals with polymorphism. What other examples have you found?
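The if-else example mentioned above can be sketched as follows. The shape classes are hypothetical and only illustrate the refactoring: instead of one function branching on a type tag (one decision point per kind of shape), each class supplies its own behavior and the caller contains no branches.

```python
from abc import ABC, abstractmethod


class Shape(ABC):
    @abstractmethod
    def area(self):
        """Each concrete shape supplies its own formula."""


class Rectangle(Shape):
    def __init__(self, width, height):
        self.width = width
        self.height = height

    def area(self):
        return self.width * self.height


class Triangle(Shape):
    def __init__(self, base, height):
        self.base = base
        self.height = height

    def area(self):
        return 0.5 * self.base * self.height


def total_area(shapes):
    # No if/elif chain on the kind of shape: dispatch happens through the method call.
    return sum(shape.area() for shape in shapes)


total = total_area([Rectangle(2, 3), Triangle(4, 3)])
```

Adding a new shape means adding a class rather than another branch, so the complexity of total_area stays constant.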
See NDepend's definition of Cyclomatic Complexity.
Nesting Depth is also a great code metric.
Cyclomatic complexity is a popular procedural software metric equal to the number of decisions that can be taken in a procedure. Concretely, in C# the CC of a method is 1 + {the number of following expressions found in the body of the method}:
if | while | for | foreach | case | default | continue | goto | && | || | catch | ternary operator ?: | ??
Following expressions are not counted for CC computation:
else | do | switch | try | using | throw | finally | return | object creation | method call | field access
Adapted to the OO world, this metric is defined both on methods and classes/structures (as the sum of its methods CC). Notice that the CC of an anonymous method is not counted when computing the CC of its outer method.
Recommendations: Methods where CC is higher than 15 are hard to understand and maintain. Methods where CC is higher than 30 are extremely complex and should be split in smaller methods (except if they are automatically generated by a tool).
Another way to avoid so many ifs is to implement a finite state machine. Because events fire transitions, the conditionals become implicit in the transitions that change the state of the system, which makes the control flow clearer and easier to follow.
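A table-driven state machine along these lines might look like the sketch below. The states, events and function names are invented for illustration; the point is that the transition table replaces a nested if/else on state and event.

```python
# Hypothetical transition table: (current state, event) -> next state.
TRANSITIONS = {
    ("idle", "start"): "running",
    ("running", "pause"): "paused",
    ("paused", "resume"): "running",
    ("running", "stop"): "idle",
    ("paused", "stop"): "idle",
}


def step(state, event):
    # Assumption: events that are not valid in the current state are ignored.
    return TRANSITIONS.get((state, event), state)


def run(events, state="idle"):
    """Feed a sequence of events through the machine and return the final state."""
    for event in events:
        state = step(state, event)
    return state


final_state = run(["start", "pause", "resume", "stop"])
```

Each new transition is one more table entry, so step keeps a single decision point no matter how many states are added.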
Here is a link that mentions some of its benefits:
