How can I build a circuit for 'R = NOT(A AND (B OR C)).' just with transistors? - logic

Only 3 transistors can be used. I can't use any logic gates.
I tried a lot of versions, however, just can solve with at least 5 transistor

Related

Constraint Satisfaction Problems with solvers VS Systematic Search

We have to solve a difficult problem where we need to check a lot of complex rules from multiple sources against a system in order to decide if the system satisfy those rules or how it should be changed to satisfy them.
We initially started using Constraint Satisfaction Problems algorithms (using Choco) to try to solve it but since the number of rules and variables would be smaller than anticipated, we are looking to build a list of all possibles configurations on a database and using multiple requests based on the rules to filter this list and find the solutions this way.
Is there limitations or disadvantages of doing a systematic search compared to using a CSP solver algorithms for a reasonable number of rules and variables? Will it impact performances significantly? Will it reduce the kind of constraints we can implement?
As examples :
You have to imagine it with a much bigger number of variables, much bigger domains of definition (but always discrete) and bigger number of rules (and some much more complex) but instead of describing the problem as :
x in (1,6,9)
y in (2,7)
z in (1,6)
y = x + 1
z = x if x < 5 OR z = y if x > 5
And giving it to a solver we would build a table :
X | Y | Z
1 2 1
6 2 1
9 2 1
1 7 1
6 7 1
9 7 1
1 2 6
6 2 6
9 2 6
1 7 6
6 7 6
9 7 6
And use queries like (this is just an example to help understand, actually we would use SPARQL against a semantic database) :
SELECT X, Y, Z WHERE Y = X + 1
INTERSECT
SELECT X, Y, Z WHERE (Z = X AND X < 5) OR (Z = Y AND X > 5)
CSP allows you to combine deterministic generation of values (through the rules) with heuristic search. The beauty happens when you customize both of those for your problem. The rules are just one part. Equally important is the choice of the search algorithm/generator. You can cull a lot of the search space.
While I cannot make predictions about the performance of your SQL solution, I must say that it strikes me as somewhat roundabout. One specific problem will happen if your rules change - you may find that you have to start over from scratch. Also, the RDBMS will fully generate all of the subqueries, which may explode.
I'd suggest to implement a working prototype with CSP, and one with SQL, for a simple subset of your requirements. You then will get a good feeling what works and what does not. Be sure to think about long term maintenance as well.
Full disclaimer: my last contact with CSP was decades ago in university as part of my master's (I implemented a CSP search engine not unlike choco, of course a bit more rudimentary, and researched a bit on that topic). But the field will certainly have evolved since then.

Haskell: tail-recursive function is slower? [duplicate]

I discovered the "time" command in unix today and thought I'd use it to check the difference in runtimes between tail-recursive and normal recursive functions in Haskell.
I wrote the following functions:
--tail recursive
fac :: (Integral a) => a -> a
fac x = fac' x 1 where
fac' 1 y = y
fac' x y = fac' (x-1) (x*y)
--normal recursive
facSlow :: (Integral a) => a -> a
facSlow 1 = 1
facSlow x = x * facSlow (x-1)
These are valid keeping in mind they were solely for use with this project, so I didn't bother to check for zeroes or negative numbers.
However, upon writing a main method for each, compiling them, and running them with the "time" command, both had similar runtimes with the normal recursive function edging out the tail recursive one. This is contrary to what I'd heard with regards to tail-recursive optimization in lisp. What's the reason for this?
Haskell uses lazy-evaluation to implement recursion, so treats anything as a promise to provide a value when needed (this is called a thunk). Thunks get reduced only as much as necessary to proceed, no more. This resembles the way you simplify an expression mathematically, so it's helpful to think of it that way. The fact that evaluation order is not specified by your code allows the compiler to do lots of even cleverer optimisations than just the tail-call elimination youre used to. Compile with -O2 if you want optimisation!
Let's see how we evaluate facSlow 5 as a case study:
facSlow 5
5 * facSlow 4 -- Note that the `5-1` only got evaluated to 4
5 * (4 * facSlow 3) -- because it has to be checked against 1 to see
5 * (4 * (3 * facSlow 2)) -- which definition of `facSlow` to apply.
5 * (4 * (3 * (2 * facSlow 1)))
5 * (4 * (3 * (2 * 1)))
5 * (4 * (3 * 2))
5 * (4 * 6)
5 * 24
120
So just as you worried, we have a build-up of numbers before any calculations happen, but unlike you worried, there's no stack of facSlow function calls hanging around waiting to terminate - each reduction is applied and goes away, leaving a stack frame in its wake (that is because (*) is strict and so triggers the evaluation of its second argument).
Haskell's recursive functions aren't evaluated in a very recursive way! The only stack of calls hanging around are the multiplications themselves. If (*) is viewed as a strict data constructor, this is what's known as guarded recursion (although it is usually referred to as such with non-strict data constructors, where what's left in its wake are the data constructors - when forced by further access).
Now let's look at the tail-recursive fac 5:
fac 5
fac' 5 1
fac' 4 {5*1} -- Note that the `5-1` only got evaluated to 4
fac' 3 {4*{5*1}} -- because it has to be checked against 1 to see
fac' 2 {3*{4*{5*1}}} -- which definition of `fac'` to apply.
fac' 1 {2*{3*{4*{5*1}}}}
{2*{3*{4*{5*1}}}} -- the thunk "{...}"
(2*{3*{4*{5*1}}}) -- is retraced
(2*(3*{4*{5*1}})) -- to create
(2*(3*(4*{5*1}))) -- the computation
(2*(3*(4*(5*1)))) -- on the stack
(2*(3*(4*5)))
(2*(3*20))
(2*60)
120
So you can see how the tail recursion by itself hasn't saved you any time or space. Not only does it take more steps overall than facSlow 5, it also builds a nested thunk (shown here as {...}) - needing an extra space for it - which describes the future computation, the nested multiplications to be performed.
This thunk is then unraveled by traversing it to the bottom, recreating the computation on the stack. There is also a danger here of causing stack overflow with very long computations, for both versions.
If we want to hand-optimise this, all we need to do is make it strict. You could use the strict application operator $! to define
facSlim :: (Integral a) => a -> a
facSlim x = facS' x 1 where
facS' 1 y = y
facS' x y = facS' (x-1) $! (x*y)
This forces facS' to be strict in its second argument. (It's already strict in its first argument because that has to be evaluated to decide which definition of facS' to apply.)
Sometimes strictness can help enormously, sometimes it's a big mistake because laziness is more efficient. Here it's a good idea:
facSlim 5
facS' 5 1
facS' 4 5
facS' 3 20
facS' 2 60
facS' 1 120
120
Which is what you wanted to achieve I think.
Summary
If you want to optimise your code, step one is to compile with -O2
Tail recursion is only good when there's no thunk build-up, and adding strictness usually helps to prevent it, if and where appropriate. This happens when you're building a result that is needed later on all at once.
Sometimes tail recursion is a bad plan and guarded recursion is a better fit, i.e. when the result you're building will be needed bit by bit, in portions. See this question about foldr and foldl for example, and test them against each other.
Try these two:
length $ foldl1 (++) $ replicate 1000
"The size of intermediate expressions is more important than tail recursion."
length $ foldr1 (++) $ replicate 1000
"The number of reductions performed is more important than tail recursion!!!"
foldl1 is tail recursive, whereas foldr1 performs guarded recursion so that the first item is immediately presented for further processing/access. (The first "parenthesizes" to the left at once, (...((s+s)+s)+...)+s, forcing its input list fully to its end and building a big thunk of future computation much sooner than its full results are needed; the second parenthesizes to the right gradually, s+(s+(...+(s+s)...)), consuming the input list bit by bit, so the whole thing is able to operate in constant space, with optimizations).
You might need to adjust the number of zeros depending on what hardware you're using.
It should be mentioned that the fac function is not a good candidate for guarded recursion. Tail recursion is the way to go here. Due to laziness you are not getting the effect of TCO in your fac' function because the accumulator arguments keep building large thunks, which when evaluated will require a huge stack. To prevent this and get the desired effect of TCO you need to make these accumulator arguments strict.
{-# LANGUAGE BangPatterns #-}
fac :: (Integral a) => a -> a
fac x = fac' x 1 where
fac' 1 y = y
fac' x !y = fac' (x-1) (x*y)
If you compile using -O2 (or just -O) GHC will probably do this on its own in the strictness analysis phase.
You should check out the wiki article on tail recursion in Haskell. In particular, because of expression evaluation, the kind of recursion you want is guarded recursion. If you work out the details of what's going on under the hood (in the abstract machine for Haskell) you get the same kind of thing as with tail recursion in strict languages. Along with this, you have a uniform syntax for lazy functions (tail recursion will tie you to strict evaluation, whereas guarded recursion works more naturally).
(And in learning Haskell, the rest of those wiki pages are awesome, too!)
If I recall correctly, GHC automatically optimizes plain recursive functions into tail-recursive optimized ones.

Mutable, (possibly parallel) Haskell code and performance tuning

I have now implemented another SHA3 candidate, namely Grøstl. This is still work in progress (very much so), but at the moment a 224-bit version pass all KATs. So now I'm wondering about performance (again :->). The difference this time, is that I chose to more closely mirror the (optimized) C implementation, i.e. I made a port from C to Haskell. The optimized C version use table-lookups to implement the algorithm. Furthermore the code is heavily based on updating an array containing 64-bit words. Thus I chose to use mutable unboxed vectors in Haskell.
My Grøstl code can be found here: https://github.com/hakoja/SHA3/blob/master/Data/Digest/GroestlMutable.hs
Short description of the algorithm: It's a Merkle-Damgård construction, iterating a compression function (f512M in my code) as long as there are 512-bits blocks of message left. The compression function is very simple: it simply runs two different independent 512-bit permutations P and Q (permP and permQ in my code) and combines their output. Its these permutations which are implemented by lookup tables.
Q1) The first thing that bothers me is that the use of mutable vectors makes my code look really fugly. This is my first time writing any major mutable code in Haskell so I don't really know how to improve this. Any tips on how I might better strucure the monadic code would be welcome.
Q2) The second is performance. Actually It's not too bad, because at the moment the Haskell code is only 3 times slower. Using GHC-7.2.1 and compiling as such:
ghc -O2 -Odph -fllvm -optlo-O3 -optlo-loop-reduce -optlo-loop-deletion
the Haskell code uses 60s. on an input of ~1GB, while the C-version uses 21-22s. But there are some things I find odd:
(1) If I try to inline rnd512QM, the code takes 4 times longer, but if I inline rnd512PM nothing happens! Why is this happening? These two functions are virtually identical!
(2) This is maybe more difficult. I've been experimenting with executing the two permutations in parallel. But currently to no avail. This is one example of what I tried:
f512 h m = V.force outP `par` (V.force outQ `pseq` (V.zipWith3 xor3 h outP outQ))
where xor3 x1 x2 x3 = x1 `xor` x2 `xor` x3
inP = V.zipWith xor h m
outP = permP inP
outQ = permQ m
When checking the run-time statistics, and using ThreadScope, I noticed that the correct number of SPARKS was created, but almost none was actually converted to useful parallel work. Thus I gained nothing in speedup. My question then becomes:
Are the P and Q functions just too small for the runtime to bother to run in parallel?
If not, is my use of par and pseq (and possibly Vector.Unboxed.force) wrong?
Would I gain anything by switching to strategies? And how would I go about doing that?
Thank you so much for your time.
EDIT:
Sorry for not providing any real benchmark tests. The testing code in the repo was just intended for myself only. For those wanting to test the code out, you will need to compile main.hs, and then run it as:
./main "algorithm" "testvariant" "byte aligned"
For instance:
./main groestl short224 False
or
./main groestl e False
(e stands for "Extreme". It's the very long message provided with the NIST KATS).
I checked out the repo, but there's no simple benchmark to just run and play with, so my ideas are just from eyeballing the code. Numbering is unrelated to your questions.
1) I'm pretty sure force doesn't do what you want -- it actually forces a copy of the underlying vector.
2) I think the use of unsafeThaw and unsafeFreeze is sort of odd. I'd just put f512M in the ST monad and be done with it. Then run it something like so:
otherwise = \msg -> truncate G224 . outputTransformation . runST $ foldM f512M h0_224 (parseMessage dataBitLen 512 msg)
3) V.foldM' is sort of silly -- you can just use a normal (strict) foldM over a list -- folding over the vector in the second argument doesn't seem to buy anything.
4) i'm dubious about the bangs in columnM and for the unsafeReads.
Also...
a) I suspect that xoring unboxed vectors can probably be implemented at a lower level than zipWith, making use of Data.Vector internals.
b) However, it may be better not to do this as it could interfere with vector fusion.
c) On inspection, extractByte looks slightly inefficient? Rather than using fromIntegral to truncate, maybe use mod or quot and then a single fromIntegral to take you directly to an Int.
Be sure to compile with -threaded -rtsopts and execute with +RTS -N2. Without that, you won't have more than one OS thread to perform computations.
Try to spark computations that are referred to elsewhere, otherwise they might be collected:
_
f512 h m = outP `par` (outQ `pseq` (V.zipWith3 xor3 h outP outQ))
where xor3 x1 x2 x3 = x1 `xor` x2 `xor` x3
inP = V.zipWith xor h m
outP = V.force $ permP inP
outQ = V.force $ permQ m
_
3) If you switch things up so parseBlock accepts strict bytestrings (or chunks and packs lazy ones when needed) then you can use Data.Vector.Storable and potentially avoid some copying.

Erlang pattern matching performance

new to Erlang. I am about to start writing some code. The solution I have in mind can go one of two ways. I can do a bunch of mathematical calculations, or I can code this up largely as pattern matching. When I say "pattern matching," I DO NOT mean regex or anything like that - I mean pattern matching in clause heads.
Performance is usually not a concern, however in this app it is. I am not asking you which method would be faster - I'm sure you would say you don't know (depends on many factors). What I am asking is the general performance of Erlang pattern matching in clause heads. In other words, in Prolog, the engine is optimized to do this sort of thing, so all other things being equal, you are "encouraged" to design a solution that takes advantage of pattern matching and unification in clause heads.
Is the same true of Erlang, i.e. is Erlang optimized for pattern matching in clause heads, similar to Prolog? Rather than ask the question here, I actually tried to profile this in Erlang, and wrote a toy program to do pattern matching of clause heads a couple million times vs. summing a list a couple million times. But the system would crash if set to do for "couple million times." But set to any less than a couple million, the results would come back too fast for me to know anything about performance.
Thanks for any insights.
In general, pattern matching in functional languages are as fast or faster than in Prolog. I would expect Erlang to perform inside a factor of 2 compared to Prolog, faster or slower. Since a functional program is almost nothing but pattern matching it is one of those areas you optimize a lot.
The internals usually have a pattern match compiler which turns an advanced pattern match into a simpler series of checks with the goal to minimize the number of checks.
Ok, the first gotcha: "Anything introduced in the shell is interpreted". So we compile modules instead:
-module(z).
-compile(export_all).
%% This pattern is highly uninteresting because it only matches
%% on a single value. Any decent system can do this quickly.
cl(0) -> 0;
cl(1) -> 0;
cl(2) -> 0;
cl(3) -> 0;
cl(4) -> 0;
cl(5) -> 0;
cl(6) -> 0;
cl(7) -> 0;
cl(8) -> 0;
cl(9) -> 0.
mcl(L) ->
[cl(E) || E <- L].
This gives us a runner of your example.
cl2(a, 0) -> a0;
cl2(a, 1) -> a1;
cl2(a, 2) -> a2;
cl2(b, 0) -> b0;
cl2(b, 1) -> b1;
cl2(b, 2) -> b2;
cl2(c, 0) -> c0;
cl2(c, 1) -> c0;
cl2(c, 2) -> c0.
mcl2(L) ->
[cl2(A, V) || {A, V} <- L].
A runner for a more interesting example. Here, we can exploit skips in the pattern. If (a, 0) fails to match on the a we know we can skip to the case (b, 0) straight away because the negative match information can be propagated as information in the system. The pattern match compiler usually makes this optimization.
test1() ->
L1 = [X rem 10 || X <- lists:seq(1, 2000000)],
%% A Core 2 Duo 2.4Ghz P8600 eats this in 132984 microseconds without HiPE
%% c(z).
%% With HiPE it is 91195 or in 0.6857591890753775 of the time the non-HiPE variant use
%% c(z, [native, {hipe, [o3]}]).
timer:tc(z, mcl, [L1]).
You must run this example yourself and evaluate if you think it is fast enough for your use case. Note that some time is also spent in the mapping code and quite a lot of time must be spent to pull data from the main memory through the CPU caches and onto the CPU.
test2() ->
random:seed(erlang:now()),
L2 = [{case random:uniform(3) of
1 -> a;
2 -> b;
3 -> c
end, V rem 3} || V <- lists:seq(1, 2000000)],
%% With HiPE this is 220937
%% Without HiPE this is 296980
timer:tc(z, mcl2, [L2]).
Naturally this example is slower, as we need to match more data before we have a hit. But it is a more interesting case because it gives some indication of the real speed of the match compiler.
Parallel versions were tried, but they are about 10 times slower in this case since the overhead of creating 2 million worker processes far dominates the actual processing :)
It is basically as #(NOT VERY) CRAP ANSWERS :-) has said and his example shows.
Prolog doesn't really do pattern matching which is unidirectional, it does unification which is sort of like a bidirectional pattern matching with logical variables. It is much easier to optimize pattern matching and serious functional languages like Erlang and Haskell put quite a bit of work into an optimising pattern matching compiler. This is especially noticeable with deep patterns.
So, yes, Erlang will do pattern matching faster than Prolog.

Aggregating automatically-generated feature vectors

I've got a classification system, which I will unfortunately need to be vague about for work reasons. Say we have 5 features to consider, it is basically a set of rules:
A B C D E Result
1 2 b 5 3 X
1 2 c 5 4 X
1 2 e 5 2 X
We take a subject and get its values for A-E, then try matching the rules in sequence. If one matches we return the first result.
C is a discrete value, which could be any of a-e. The rest are just integers.
The ruleset has been automatically generated from our old system and has an extremely large number of rules (~25 million). The old rules were if statements, e.g.
result("X") if $A >= 1 && $A <= 10 && $C eq 'A';
As you can see, the old rules often do not even use some features, or accept ranges. Some are more annoying:
result("Y") if ($A == 1 && $B == 2) || ($A == 2 && $B == 4);
The ruleset needs to be much smaller as it has to be human maintained, so I'd like to shrink rule sets so that the first example would become:
A B C D E Result
1 2 bce 5 2-4 X
The upshot is that we can split the ruleset by the Result column and shrink each independently. However, I cannot think of an easy way to identify and shrink down the ruleset. I've tried clustering algorithms but they choke because some of the data is discrete, and treating it as continuous is imperfect. Another example:
A B C Result
1 2 a X
1 2 b X
(repeat a few hundred times)
2 4 a X
2 4 b X
(ditto)
In an ideal world, this would be two rules:
A B C Result
1 2 * X
2 4 * X
That is: not only would the algorithm identify the relationship between A and B, but would also deduce that C is noise (not important for the rule)
Does anyone have an idea of how to go about this problem? Any language or library is fair game, as I expect this to be a mostly one-off process. Thanks in advance.
Check out the Weka machine learning lib for Java. The API is a little bit crufty but it's very useful. Overall, what you seem to want is an off-the-shelf machine learning algorithm, which is exactly what Weka contains. You're apparently looking for something relatively easy to interpret (you mention that you want it to deduce the relationship between A and B and to tell you that C is just noise.) You could try a decision tree, such as J48, as these are usually easy to visualize/interpret.
Twenty-five million rules? How many features? How many values per feature? Is it possible to iterate through all combinations in practical time? If you can, you could begin by separating the rules into groups by result.
Then, for each result, do the following. Considering each feature as a dimension, and the allowed values for a feature as the metric along that dimension, construct a huge Karnaugh map representing the entire rule set.
The map has two uses. One: research automated methods for the Quine-McCluskey algorithm. A lot of work has been done in this area. There are even a few programs available, although probably none of them will deal with a Karnaugh map of the size you're going to make.
Two: when you have created your final reduced rule set, iterate over all combinations of all values for all features again, and construct another Karnaugh map using the reduced rule set. If the maps match, your rule sets are equivalent.
-Al.
You could try a neural network approach, trained via backpropagation, assuming you have or can randomly generate (based on the old ruleset) a large set of data that hit all your classes. Using a hidden layer of appropriate size will allow you to approximate arbitrary discriminant functions in your feature space. This is more or less the same idea as clustering, but due to the training paradigm should have no issue with your discrete inputs.
This may, however, be a little too "black box" for your case, particularly if you have zero tolerance for false positives and negatives (although, it being a one-off process, you get an arbitrary degree of confidence by checking a gargantuan validation set).

Resources