Define a measuring function for an operator in OCaml - algorithm

I would like to define an operator l_op : A list * A list -> A list whose implementation requires another operator op : A * A -> A. Given a0 : A, op a0 a1 always returns a result of type A for any a1 : A, but for some values of a1 the result makes more sense than for others.
Intuitively, l_op al0 al1 needs a matching strategy: for each element of al0, it finds a meaningful element of al1 with regard to op. The list of the results of op is then the result of l_op.
So I need a measure of meaning.
One possible choice is to define a function measure : A * A * A -> int. For instance, measure a0 a1 (op a0 a1) gives an int from 1 to 10 which represents how much sense op a0 a1 makes. Then in the implementation of l_op al0 al1, for each a0 of al0, I can find a1 such that measure a0 a1 (op a0 a1) >= measure a0 a1' (op a0 a1') for all a1' in al1. Then I remove a0 and a1 from the two lists, and match the rest of the two lists...
Another choice is to change op a little, to op : A * A -> A * int, where the integer represents how much sense the operation makes. Then in the implementation of l_op al0 al1, for each a0 of al0, I can find a1 such that for all a1' in al1, m1 >= m1', where (_, m1) = op a0 a1 and (_, m1') = op a0 a1'.
An advantage of the second choice is that we can save some code, because we can calculate the measure while computing op a0 a1. A disadvantage is that I find the signature op : A * A -> A * int less good-looking than op : A * A -> A.
So my questions are:
1) There is a conventional word for this kind of measuring function (which probably starts with an h), but I have forgotten it; could anyone remind me?
2) Do you think int is a good type for measuring? Maybe we can define a type for that... What is the most conventional way?
3) Which choice that I mentioned above is better? Or does anyone have a better idea?

There is a conventional word for this kind of measuring function (which probably starts with an h),
Maybe "heuristic"? It comes from the Ancient Greek for "find out, discover" and is used in computer science to name methods that look for "good enough" results, often approximating the perfect behavior in a simpler but nearly as effective way. It is really appropriate here (unless your "meaning measurement" really is an heuristic/approximation) but begins with 'h'.
I would suggest just calling your measurements a "score", or a "weight".
Do you think int is a good type for measuring? Maybe we can define a type for that... What is the most conventional way?
It depends on how your measuring is defined. How much structure do you need on the results (e.g. you could want to keep the justification of your measurement, needing a richer structure)? What kind of operations do you use while measuring? If you only use addition and constants, int is fine; if you use division etc., float may be needed. In all cases, you probably need a type whose values can be compared.
I guess that int will be ok in most circumstances, and otherwise you'll be able to change your mind relatively easily. If you plan to change this, you can use a type alias:
type measure = int
This way you can use measure instead of int in most of your code, and don't need to replace all occurrences afterwards. That said, in OCaml we usually don't write a lot of type annotations, thanks to inference, so in practice I don't expect details of your typing choices to be spread in a lot of code.
Which choice that I mentioned above is better? Or does anyone have a better idea?
I'd pick the second choice. I suspect there is some redundancy between the A -> A -> A operation of "computing the result" and the A -> A -> int operation of "computing the result meaning". By doing both at the same time (A -> A -> A * int) you can reuse the same logical structure, which makes that correspondence clearer (and uses less code). If on the contrary the two operations are totally unrelated, you can consider having two separate operators (but I'd still use A -> A -> int for scoring; if you need the result to measure meaning, you can still call the first operation internally).
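For concreteness, here is a minimal OCaml sketch of the second choice with a greedy matching strategy: for each a0, pick the a1 whose result scores highest, emit that result, and remove a1 from the pool. All names are illustrative; with the first choice, best would call measure a0 a1' instead of reading the score off op's result, and the surrounding logic would stay the same.

type score = int

(* Remove the first occurrence (by physical equality) of y from a list. *)
let rec remove_first y = function
  | [] -> []
  | x :: xs -> if x == y then xs else x :: remove_first y xs

let l_op (op : 'a -> 'a -> 'a * score) (al0 : 'a list) (al1 : 'a list) : 'a list =
  (* Find the element of al1 whose op-result against a0 scores highest,
     returning the element, the result, and the score. *)
  let best a0 = function
    | [] -> invalid_arg "l_op: second list exhausted"
    | a1 :: rest ->
        List.fold_left
          (fun ((_, _, bs) as acc) a1' ->
             let (r, s) = op a0 a1' in
             if s > bs then (a1', r, s) else acc)
          (let (r, s) = op a0 a1 in (a1, r, s))
          rest
  in
  let rec go al0 al1 =
    match al0 with
    | [] -> []
    | a0 :: rest ->
        let (a1, r, _) = best a0 al1 in
        r :: go rest (remove_first a1 al1)
  in
  go al0 al1

Note that this greedy strategy is not globally optimal: maximizing the total score over all pairings is an assignment problem, which e.g. the Hungarian algorithm solves in polynomial time.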

Why does refactoring data to newtype speed up my Haskell program?

I have a program which traverses an expression tree that does algebra on probability distributions, either sampling or computing the resulting distribution.
I have two implementations computing the distribution: one (computeDistribution) nicely reusable with monad transformers and one (simpleDistribution) where I concretize everything by hand. I would like to not concretize everything by hand, since that would be code duplication between the sampling and computing code.
I also have two data representations:
type Measure a = [(a, Rational)]
-- data Distribution a = Distribution (Measure a) deriving Show
newtype Distribution a = Distribution (Measure a) deriving Show
When I use the data version with the reusable code, computing the distribution of 20d2 (ghc -O3 program.hs; time ./program 20 > /dev/null) takes about one second, which seems way too long. Pick higher values of n at your own peril.
When I use the hand-concretized code, or I use the newtype representation with either implementation, computing 20d2 (time ./program 20 s > /dev/null) takes the blink of an eye.
Why?
How can I find out why?
My knowledge of how Haskell is executed is almost nil. I gather there's a graph of thunks in basically the same shape as the program, but that's about all I know.
I figure with newtype the representation of Distribution is the same as that of Measure, i.e. it's just a list, whereas with the data version each Distribution is kinda' like a single-field record, except with a pointer to the contained list, and so the data version has to perform more allocations. Is this true? If true, is this enough to explain the performance difference?
I'm new to working with monad transformer stacks. Consider the Let and Uniform cases in simpleDistribution — do they do the same as the walkTree-based implementation? How do I tell?
Here's my program. Note that Uniform n corresponds to rolling an n-sided die (in case the unary-ness was surprising).
Update: based on comments I simplified my program by removing everything not contributing to the performance gap. I made two semantic changes: probabilities are now denormalized and all wonky and wrong, and the simplification step is gone. But the essential shape of my program is still there. (See question edit history for the non-simplified program.)
Update 2: I made further simplifications, reducing Distribution down to the list monad with a small twist, removing everything to do with probabilities, and shortening the names. I still observe large performance differences when using data but not newtype.
import Control.Monad (liftM2)
import Control.Monad.Trans (lift)
import Control.Monad.Reader (ReaderT, runReaderT)
import System.Environment (getArgs)
import Text.Read (readMaybe)

main = do
  args <- getArgs
  let dieCount = case map readMaybe args of Just n : _ -> n; _ -> 10
  let f = if ["s"] == (take 1 $ drop 1 $ args) then fast else slow
  print $ f dieCount

fast, slow :: Int -> P Integer
fast n = walkTree n
slow n = walkTree n `runReaderT` ()

walkTree 0 = uniform
walkTree n = liftM2 (+) (walkTree 0) (walkTree $ n - 1)

data P a = P [a] deriving Show
-- newtype P a = P [a] deriving Show

class Monad m => MonadP m where uniform :: m Integer

instance MonadP P where uniform = P [1, 1]
instance MonadP p => MonadP (ReaderT env p) where uniform = lift uniform

instance Functor P where fmap f (P pxs) = P $ fmap f pxs
instance Applicative P where
  pure x = P [x]
  (P pfs) <*> (P pxs) = P $ pfs <*> pxs
instance Monad P where
  (P pxs) >>= f = P $ do
    x <- pxs
    case f x of P fxs -> fxs
How can I find out why?
This is, in general, hard.
The extreme way to do it is to look at the Core code (which you can produce by running GHC with -ddump-simpl). This can get complicated really quickly, and it's basically a whole new language to learn. Your program is already big enough that I had trouble learning much from the Core dump.
The other way to find out why is to just keep using GHC and asking questions and learning about GHC optimizations until you recognize certain patterns.
Why?
In short, I believe it's due to list fusion.
NOTE: I don't know for sure that this answer is correct, and it would take more time/work to verify than I'm willing to put in right now. That said, it fits the evidence.
First off, we can check whether the slowdown you're seeing results from something truly fundamental or from a GHC optimization triggering (or not) by compiling with -O0, that is, without optimizations. In this mode, both Distribution representations result in about the same (excruciatingly long) runtime. This leads me to believe that the data representation is not inherently the problem; rather, there's an optimization that's triggered with the newtype version but not with the data version.
When GHC is run in -O1 or higher, it engages certain rewrite rules to fuse different folds and maps of lists together so that it doesn't need to allocate intermediate values. (See https://markkarpov.com/tutorial/ghc-optimization-and-fusion.html#fusion for a decent tutorial on this concept as well as https://stackoverflow.com/a/38910170/14802384 which additionally has a link to a gist with all of the rewrite rules in base.) Since computeDistribution is basically just a bunch of list manipulations (which are all essentially folds), there is the potential for these to fire.
The key is that with the newtype representation of Distribution, the newtype wrapper is erased during compilation, and the list operations are allowed to fuse. However, with the data representation, the wrappers are not erased, and the rewrite rules do not fire.
Therefore, I will make an unsubstantiated claim: If you want your data representation to be as fast as the newtype one, you will need to set up rewrite rules similar to the ones for list folding but that work over the Distribution type. This may involve writing your own special fold functions and then rewriting your Functor/Applicative/Monad instances to use them.
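For illustration only, here is roughly what that might look like, modeled on GHC's foldr/build list fusion. The names distFoldr and distBuild are made up, and making rules like this actually fire reliably takes considerably more care (phase control, INLINE/NOINLINE pragmas, rewriting the instances in terms of these functions):

{-# LANGUAGE RankNTypes #-}

type Measure a = [(a, Rational)]
data Distribution a = Distribution (Measure a)

-- A consumer, analogous to foldr on lists.
distFoldr :: (a -> Rational -> b -> b) -> b -> Distribution a -> b
distFoldr f z (Distribution xs) = foldr (\(a, p) b -> f a p b) z xs

-- A producer, analogous to build on lists.
distBuild :: (forall b. (a -> Rational -> b -> b) -> b -> b) -> Distribution a
distBuild g = Distribution (g (\a p rest -> (a, p) : rest) [])

-- The fusion rule: a consumer applied to a producer skips the
-- intermediate Distribution entirely.
{-# RULES
"distFoldr/distBuild"
    forall f z (g :: forall b. (a -> Rational -> b -> b) -> b -> b).
    distFoldr f z (distBuild g) = g f z
  #-}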

Substitution system

I am currently developing an advanced text editor with some features.
One of the features is a substitution system. It will help the user replace strings quickly.
It has the following format:
((a x) | x) (d e)
We can divide the string into two parts: the left ((a x) | x) and the right (d e). If there is a letter after a | (x, in the current case), then we can perform a substitution action: replace every x on the left side with the string on the right side.
So after the action we will receive (a (d e)).
Of course, these parenthesized expressions might be nested: ((a | a) (d x | d)) (e f | f) -> ((e f | f) x) -> (e x).
Unfortunately, the reductions may continue infinitely: (a a | a) (a a | a).
I need to show a warning if the user writes a string that can never be reduced to a form admitting no further reductions.
Any suggestions on how to do it?
Congratulations, you have just invented the λ-calculus (pronounced "Lambda-calculus")
Why this problem is hard
Let's use the original notation for this, since it was already invented by Church in the 1930s:
((λx.f) u) is the rewriting of f where all x's have been replaced by u's.
Note that this is equivalent to your notation (f | x) u, where f is a string that can contain x's. It's a mathematical tool introduced to understand which functions are "computable", i.e. can be given to a computer with an adequate program so that, for every input, the computer runs its program and outputs the correct answer. Spoiler: the computable functions are exactly those that can be written as λ-terms (i.e. rewritings, in your setting).
Sometimes λ-terms can be simplified, as you have noted yourself. For instance
(λx.yx)((λu.uu)(λz.z)i) -> (λx.yx)((λz.z)(λz.z)i) -> (λx.yx)((λz.z)i) -> (λx.yx)i -> yi
Sometimes the sequence of simplifications (also called "evaluation" or "beta-reduction") is infinite. One very famous example is the Omega operator (λx.xx)(λx.xx) (rings a bell?).
Ideally, we would like to restrict the λ-calculus so that all terms are "simplifiable" to a final usable form in a finite number of steps. This property is called "normalization". What we actually want is one step further: we want every sequence of simplifications to reach the final form in a finite number of steps, so that when faced with multiple choices you can pick either one and never get stuck in an infinite loop because of a bad choice. This is called "strong normalization".
Here's the issue: the λ-calculus is not strongly normalizing. There are terms that never reach a final - "normal" - form.
You have found one yourself.
How this problem has been solved theoretically
The key to gaining the strong normalization property was to rule out λ-terms which do not satisfy it (print a warning and spit an error, in your case) so that we only consider strongly normalizing λ-terms. This "ruling out" was put in place via a typing system: λ-terms that have a valid type are strongly normalizing, and λ-terms that are not strongly normalizing cannot have a valid type. Awesome, now we know which ones should give errors! Let's check that with simple cases.
From the way you are able to phrase your problem in a very clear way, I'm assuming you already have experience with programming and static type systems, but I can modify this answer if you'd rather have a full explanation. I'll be using Caml-like notation for types so s is a string, s -> s is a function that to a string associates a string, and s -> (s -> s) is a function that to a string associates a function from string to string, etc. I denote x : t when variable x has type t.
λx.ax : s -> s provided a : s -> s
(λx.yx)((λu.uu)(λz.z)i) : s provided y : s -> s, i : s as we have seen by the reductions above
λx.x : s -> s. But watch out, λx.x : (s -> s) -> (s -> s) is also true. Deciding the type is hard
How you may solve this problem programatically
Your problem is slightly easier, because you are only dealing with string replacements, so you know that the base type of everything is a string, and you can try to "type your way up" until you are either able to type the whole rewriting (i.e. no errors), or able to prove that the rewriting is not typable (see Omega) and spit an error. Watch out though: just because you are not able to type a rewriting does not mean that it cannot be typed!
In your example (λx.ax)(de), you know the actual values for a, d and e, so you may have for instance a : s -> s, d : s -> s, e : s, hence de : s and λx.ax : s -> s, so the whole thing has type s and you're good to go. You can probably write a compiler that will try to type the rewriting and figure out whether it is typable, based on a set of cleverly crafted decision rules for your specific use case. You can even decide that if the compiler fails to type a rewriting, then it is rejected (even when it's valid), because cases so intricate that typing fails despite being correct should never happen in a reasonable editor text-substitution scenario. A sketch of such a checker follows.
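Here is a minimal Haskell sketch of that idea for the simply typed λ-calculus with a single base type S (strings). Binders are type-annotated to keep the checker trivial; inferring types for unannotated terms is more work. Terms this accepts are guaranteed to normalize; terms it rejects may still be fine:

-- Types: the base type S and function types.
data Ty = S | Ty :-> Ty deriving (Eq, Show)

-- Terms: variables, annotated lambdas, applications.
data Term = Var String | Lam String Ty Term | App Term Term

-- Just the type if the term checks in the environment, Nothing otherwise.
typeOf :: [(String, Ty)] -> Term -> Maybe Ty
typeOf env (Var x)     = lookup x env
typeOf env (Lam x t b) = (t :->) <$> typeOf ((x, t) : env) b
typeOf env (App f a)   = do
  tf <- typeOf env f
  ta <- typeOf env a
  case tf of
    t1 :-> t2 | t1 == ta -> Just t2
    _                    -> Nothing

With the environment [("a", S :-> S), ("d", S :-> S), ("e", S)], the term corresponding to (λx.ax)(de) checks at type S, matching the reasoning above.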
Do you want to solve this programmatically?
No. I mean it. You really don't want to.
Remember, λ-terms describe all computable functions.
If you were to really implement a fully correct warning generator as you seem to intend, one could encode any program as a string substitution in your editor, so your editor would essentially be a programming language of its own, which is typically not what you want your editor to be.
I mean, writing a full program that queries the webcam, analyzes the image, guesses who is typing, and loads an ad based on Google's suggestions when opening a file, all as a cleverly written text-substitution command - really? Is that what you're trying to achieve?
PS: If this is indeed what you're trying to achieve, have a look at Lisp and Haskell, their syntax should look somewhat... familiar to what you've seen here.
And good luck with your editor !

Conditional Dependencies in Compiler Semantic Analysis Passes

Imagine that we have been given an Excel spreadsheet with three columns, labeled COND, X and Y.
COND = TRUE or FALSE (user input)
X = if(COND == TRUE) then 0 else Y
Y = if(COND == TRUE) then X else 1
These formulas evaluate perfectly fine in Excel, and Excel does not generate a Circular Dependency error.
I am writing a compiler that tries to convert these Excel formulas to C code. In my compiler, these formulas do generate a circular dependency error. The issue is that (naïvely) the expression for X depends on Y and the expression for Y depends on X, and my compiler is unable to logically continue.
Excel is able to accomplish this feat because it is a lazy, interpreted language. Excel will just lazily evaluate the formulas at run-time (with user inputs), and since no circular dependency occurs at run-time Excel has no problem evaluating such logic.
Unfortunately, I need to convert these formulas to a compiled language (not an interpreted one). The actual formulas, in the actual spreadsheets, have more complicated dependencies between multiple cells/variables (involving up to over half a dozen different cells). This means that my compiler has to perform some kind of sophisticated static, semantic analysis of the formulas and be smart enough to detect that there are no circular references if we "look inside" the conditional branches. The compiler would then have to generate the following C code from the above Excel formulas:
bool COND;
int X, Y;
if(COND) { X = 0; Y = X; } else { Y = 1; X = Y; }
Notice that the order of the assignment instructions is different in each branch of the if-statement in C.
My question is, is there any established algorithm or literature on compilers that explains how to implement this type of analysis in a compiler? Do functional programming language compilers have to solve this problem?
Why aren't standard optimization techniques adequate?
Presumably, the Excel formulas form a DAG with the leaves being primitive values and the nodes being computations/assignments. (If the Excel computation forms a cycle, then you need some kind of iterative solver, assuming you want a fixpoint.)
If you simply propagate the conditional by lifting it (a classic compiler optimization), we start with your original equations, where each computation may be evaluated in any order with respect to the others, provided the result computes DAG-like ("anyorder" below is an operator intended to model that):
X = if(COND == TRUE) then 0 else Y;
anyorder
Y = if(COND == TRUE) then X else 1;
then lifting the conditional:
if (COND) { X=0; } else { X = Y; }
anyorder
if (COND) { Y=X; } else { Y = 1; }
then
if (COND) { X=0; anyorder Y=X; } else { X = Y; anyorder Y = 1; }
Each of the arms must be DAG-like.
The first arm is DAG-like, evaluating the X=0 assignment first.
The second arm is DAG-like, evaluating Y=1 first. So we get the answer you wanted:
if (COND) { X=0; Y=X; } else { Y = 1; X = Y; }
So conventional transformations, plus the knowledge that each anyorder arm must be DAG-like, seem to give the right effect.
I'm not sure what you do if COND is computed as a function of the cells.
I suspect the way to do this is to generate a dependency graph of computations with conditionals on the dependencies. You probably have to propagate/group those conditionals over the arcs, more or less as I did over the syntax.
Yes, literature exists; sorry, I cannot cite any specific work - I simply don't remember, and you can Google it up just as well as I can.
Basic algorithms for dependency and cycle analysis are really simple: detect symbols in the expression, and build a set of expressions and dependencies of the form:
inps expr outs
cell_A6, cell_B7 -> expr3 -> cell_A7
cell_A1, cell_B4 -> expr1 -> cell_A5
cell_A1, cell_A5 -> expr2 -> cell_A6
and then by comparing and iteratively expanding/replacing sets of inputs/outputs:
step0:
cell_A6, cell_B7 -> expr3 -> cell_A7
cell_A1, cell_B4 -> expr1 -> cell_A5 <--1 note that cell_A5 ~ (A1,B4)
cell_A1, cell_A5 -> expr2 -> cell_A6 <--1 apply that knowledge here
so dependency
cell_A1, cell_A5 -> expr2 -> cell_A6
morphs into
cell_A1, cell_B4 -> expr2 -> cell_A6 <--2 note that cell_A6 ~ (A1,B4) and so on
Finally, you will get either a set of full dependencies, where you can easily detect circular dependencies, like for example:
cell_A1, cell_D6, cell_F7 -> exprN -> cell_D6
or, if none are found, you will be able to determine a safe, incremental order of execution.
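Here is a minimal Haskell sketch of that iterative expansion (names illustrative): keep substituting input sets until a fixpoint is reached, then flag any cell that appears among its own transitive inputs:

import qualified Data.Map as M
import qualified Data.Set as S

type Cell = String
type Deps = M.Map Cell (S.Set Cell)   -- cell -> direct inputs

-- Expand every input set with the inputs of its inputs, to a fixpoint.
transitive :: Deps -> Deps
transitive deps
  | expanded == deps = deps
  | otherwise        = transitive expanded
  where
    expanded = M.map step deps
    step ins = S.union ins $ S.unions
                 [ M.findWithDefault S.empty i deps | i <- S.toList ins ]

-- Cells that transitively depend on themselves are circular.
circular :: Deps -> [Cell]
circular deps = [ c | (c, ins) <- M.toList (transitive deps), c `S.member` ins ]

For the A5/A6/A7 example above, circular returns the empty list, so a safe execution order exists; adding an edge from A7 back into A6's inputs would make both cells show up as circular.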
If the expressions contain branches or side effects other than the 'returned value', you can apply various transformations to reduce/expand the expressions into new ones, or into groups of new expressions, of the form above. For example:
B5 = { if(A5 + A3 > 0) A3-1 else A5+1 }
so
inps ... outs
A3, A5 -> theExpr -> B5
the condition can be 'lifted' and form two conditional rules:
A5 + A3 > 0 : A3 -> reducedexpr "A3-1" -> B5
A5 + A3 <= 0 : A5 -> reducedexpr "A5+1" -> B5
but now your execution/analysis must also take care of the conditions before applying the rules. Lifting is only one of the possible transformations.
However, you still need something more than that, at least some 'extension' to it. The hard part of your problem is that your expressions are complex, have branches, and you need to include (random) user input to resolve branches, eliminate the dead ones, and break dead dependencies.
Since the key is the elimination of dead dependencies, you have to somehow detect dead branches. Conditions can be arbitrarily complex, and user input is random, so you cannot really work it all out statically. After playing with transformations, you would still have to analyze the conditions and generate code accordingly. To do so, you would need to generate code for all possible combinations of condition outcomes, and all resulting branching and rule combinations, which is simply infeasible except for some trivial cases. With N unknowns, the number of leaves can grow exponentially (2^N), which is a huge bloat after crossing some threshold.
Of course, while analyzing conditions based on Bools, you can analyze, group, and eliminate conflicting conditions like (a & b & !a)...
...but if your input values and conditions include non-Bool data, like integers or floats or strings, just imagine that your condition executes some weird external statistical function and checks its result. Ignore the 'weird' part and focus on 'external'. If you meet expressions that use complex functions like AVG or MAX, you cannot chew through something like that statically (*). Even simple arithmetic is hard to analyze: for (a+b)*(c+d), you could derive the fact that c+d can be ignored when a+b==0, but covering this fully is a really tough task.
IIRC, satisfiability analysis (SAT) for Boolean expressions with basic operators is already an NP-hard problem, never mind integers or floating point with all their math. Calculating the result of an expression is much easier than telling which values it really depends on!
So, since input values may be either hardcoded (cool) or user-supplied at runtime (doh!), your compiler most probably will not be able to analyze everything fully up front. Now link that with the fact marked (*), and it's quite obvious that you can include some static analysis and try to eliminate some branches at 'compilation time', but there might still be parts that must be delayed until the user provides the actual input.
So, if part of the analysis must be done at runtime, all the branch elimination is just an optional optimization, and I think you should focus on the runtime part for now.
In a minimal, unoptimized version, your generated program could simply remember all the Excel expressions and wait for input data. Once the program runs and input is given, the program has to substitute the input into the expressions and then try to iteratively reduce them to output values.
Writing such an algorithm in an imperative language is entirely possible. Actually, you'd need to write it once; later you'd just merge it with the different sets of rules derived from the cell formulas, and be done. The runtime part of the program would stay the same; only the formulas would change.
You could then expand the 'compiler' side to help, e.g. by preliminarily (and partially) analyzing the dependencies and trying to reorder the rules so that they are later checked in a "better order", by precalculating constants, by inlining some expressions, and so on. But as I said, these are all optimizations, not the core feature.
Sadly, I cannot really tell you anything serious about "functional languages", but since their runtimes are usually 'very dynamic', and sometimes they even execute code in terms of symbols and transformations, that could reduce the complexity of your 'compiler' and 'engine' parts. The most valuable asset here is the dynamism. So even Ruby would do much better than C - though it is in no way a "compiled" language in your sense.
For example, you could try to transform excel rules directly into functions:
def cell_A5 = expr1(cell_A1, cell_B4)
def cell_A7 = expr3(cell_A6, cell_B7)
def cell_A6 = expr2(cell_A1, cell_A5)
Write these down as part of the program; then, at runtime, when the user provides some values, those values would simply redefine parts of the program:
cell_B7 = 11.2 // filling up undefined variable
cell_A1 = 23 // filling up undefined variable
cell_A5 = 13 // overwriting the function with a value
That's the power of dynamic platforms; nothing very 'functional' here. Dynamic platforms make it easy to fill in/override bits. But then, once the user has provided some bits and the program has been "corrected on the fly", which function would you call first?
The answer is somewhat sad: you don't know.
If your dynamic language has some rule-engine built into it, you can try generating rules instead of functions and later rely on that engine to "fill up" everything that is possible to calculate.
But if it doesn't have a rule engine, you are back to square one...
afterthought:
Hm... sorry, I think I just wrote too much, and too vaguely/chattily. If you think it's helpful, please drop me a comment. Otherwise I'll delete it in a few days or a week.

Mutable, (possibly parallel) Haskell code and performance tuning

I have now implemented another SHA-3 candidate, namely Grøstl. This is still very much work in progress, but at the moment a 224-bit version passes all KATs. So now I'm wondering about performance (again :->). The difference this time is that I chose to mirror the (optimized) C implementation more closely, i.e. I made a port from C to Haskell. The optimized C version uses table lookups to implement the algorithm. Furthermore, the code is heavily based on updating an array containing 64-bit words. Thus I chose to use mutable unboxed vectors in Haskell.
My Grøstl code can be found here: https://github.com/hakoja/SHA3/blob/master/Data/Digest/GroestlMutable.hs
Short description of the algorithm: it's a Merkle-Damgård construction, iterating a compression function (f512M in my code) as long as there are 512-bit blocks of message left. The compression function is very simple: it just runs two different independent 512-bit permutations P and Q (permP and permQ in my code) and combines their outputs. It's these permutations which are implemented via lookup tables.
Q1) The first thing that bothers me is that the use of mutable vectors makes my code look really fugly. This is my first time writing any major mutable code in Haskell, so I don't really know how to improve this. Any tips on how I might better structure the monadic code would be welcome.
Q2) The second is performance. Actually, it's not too bad, because at the moment the Haskell code is only 3 times slower. Using GHC 7.2.1 and compiling as such:
ghc -O2 -Odph -fllvm -optlo-O3 -optlo-loop-reduce -optlo-loop-deletion
the Haskell code takes 60 s on an input of ~1 GB, while the C version takes 21-22 s. But there are some things I find odd:
(1) If I try to inline rnd512QM, the code takes 4 times longer, but if I inline rnd512PM nothing happens! Why is this happening? These two functions are virtually identical!
(2) This is maybe more difficult. I've been experimenting with executing the two permutations in parallel. But currently to no avail. This is one example of what I tried:
f512 h m = V.force outP `par` (V.force outQ `pseq` (V.zipWith3 xor3 h outP outQ))
  where xor3 x1 x2 x3 = x1 `xor` x2 `xor` x3
        inP  = V.zipWith xor h m
        outP = permP inP
        outQ = permQ m
When checking the runtime statistics and using ThreadScope, I noticed that the correct number of SPARKS was created, but almost none were actually converted to useful parallel work. Thus I gained nothing in speedup. My questions then become:
Are the P and Q functions just too small for the runtime to bother to run in parallel?
If not, is my use of par and pseq (and possibly Vector.Unboxed.force) wrong?
Would I gain anything by switching to strategies? And how would I go about doing that?
Thank you so much for your time.
EDIT:
Sorry for not providing any real benchmark tests. The testing code in the repo was intended just for myself. For those wanting to test the code, you will need to compile main.hs and then run it as:
./main "algorithm" "testvariant" "byte aligned"
For instance:
./main groestl short224 False
or
./main groestl e False
(e stands for "Extreme"; it's the very long message provided with the NIST KATs.)
I checked out the repo, but there's no simple benchmark to just run and play with, so my ideas are just from eyeballing the code. Numbering is unrelated to your questions.
1) I'm pretty sure force doesn't do what you want -- it actually forces a copy of the underlying vector.
2) I think the use of unsafeThaw and unsafeFreeze is sort of odd. I'd just put f512M in the ST monad and be done with it. Then run it something like so:
otherwise = \msg -> truncate G224 . outputTransformation . runST $ foldM f512M h0_224 (parseMessage dataBitLen 512 msg)
3) V.foldM' is sort of silly -- you can just use a normal (strict) foldM over a list -- folding over the vector in the second argument doesn't seem to buy anything.
4) I'm dubious about the bangs in columnM and for the unsafeReads.
Also...
a) I suspect that xoring unboxed vectors can probably be implemented at a lower level than zipWith, making use of Data.Vector internals.
b) However, it may be better not to do this as it could interfere with vector fusion.
c) On inspection, extractByte looks slightly inefficient? Rather than using fromIntegral to truncate, maybe use mod or quot and then a single fromIntegral to take you directly to an Int.
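One possible reading of that last suggestion, assuming extractByte pulls byte i out of a 64-bit word (the real function lives in the linked repo):

import Data.Word (Word64)

-- Divide away the low bytes, keep the remainder, convert once at the end.
extractByte :: Word64 -> Int -> Int
extractByte w i = fromIntegral ((w `quot` (256 ^ i)) `rem` 256)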
Be sure to compile with -threaded -rtsopts and execute with +RTS -N2. Without that, you won't have more than one OS thread to perform computations.
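For instance (main.hs and its arguments come from the question above; +RTS -s additionally prints the runtime statistics mentioned there):

ghc -O2 -threaded -rtsopts main.hs
./main groestl e False +RTS -N2 -s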
Try to spark computations that are referred to elsewhere, otherwise they might be collected:
f512 h m = outP `par` (outQ `pseq` (V.zipWith3 xor3 h outP outQ))
  where xor3 x1 x2 x3 = x1 `xor` x2 `xor` x3
        inP  = V.zipWith xor h m
        outP = V.force $ permP inP
        outQ = V.force $ permQ m
3) If you switch things up so parseBlock accepts strict bytestrings (or chunks and packs lazy ones when needed) then you can use Data.Vector.Storable and potentially avoid some copying.

Haskell caching results of a function

I have a function that takes a parameter and produces a result. Unfortunately, the function takes quite a long time to produce that result. It is being called quite often with the same input, which is why it would be convenient if I could cache the results. Something like:
let cachedFunction = createCache slowFunction
in (cachedFunction 3.1) + (cachedFunction 4.2) + (cachedFunction 3.1)
I was looking into Data.Array and, although the array is lazy, I need to initialize it with a list of pairs (using listArray), which is impractical. If the 'key' is e.g. the Double type, I cannot initialize it at all, and even if I could theoretically assign an Integer to every possible input, I have several tens of thousands of possible inputs and I only actually use a handful. I would need to initialize the array (or, preferably, a hash table, as only a handful of results will be used) using a function instead of a list.
Update: I am reading the memoization articles, and as far as I understand it, MemoTrie could work the way I want. Maybe. Could somebody try to produce the 'cachedFunction'? Preferably for a slow function that takes 2 Double arguments? Or, alternatively, one that takes one Int argument in a domain of ~[0..1 billion] that wouldn't eat all memory?
Well, there's Data.HashTable. Hash tables don't tend to play nicely with immutable data and referential transparency, though, so I don't think it sees a lot of use.
For a small number of values, stashing them in a search tree (such as Data.Map) would probably be fast enough. If you can put up with doing some mangling of your Doubles, a more robust solution would be to use a trie-like structure, such as Data.IntMap; these have lookup times proportional primarily to key length, and roughly constant in collection size. If Int is too limiting, you can dig around on Hackage to find trie libraries that are more flexible in the type of key used.
As for how to cache the results, I think what you want is usually called "memoization". If you want to compute and memoize results on demand, the gist of the technique is to define an indexed data structure containing all possible results, in such a way that when you ask for a specific result it forces only the computations needed to get the answer you want. Common examples usually involve indexing into a list, but the same principle should apply for any non-strict data structure. As a rule of thumb, non-function values (including infinite recursive data structures) will often be cached by the runtime, but not function results, so the trick is to wrap all of your computations inside a top-level definition that doesn't depend on any arguments.
Edit: MemoTrie example ahoy!
This is a quick and dirty proof of concept; better approaches may exist.
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE TypeOperators #-}

import Data.MemoTrie
import Data.Binary
import Data.ByteString.Lazy hiding (map)

mangle :: Double -> [Int]
mangle = map fromIntegral . unpack . encode

unmangle :: [Int] -> Double
unmangle = decode . pack . map fromIntegral

instance HasTrie Double where
  data Double :->: a = DoubleTrie ([Int] :->: a)
  trie f = DoubleTrie $ trie $ f . unmangle
  untrie (DoubleTrie t) = untrie t . mangle

slow x
  | x < 1     = 1
  | otherwise = slow (x / 2) + slow (x / 3)

memoSlow :: Double -> Integer
memoSlow = memo slow
Do note the GHC extensions used by the MemoTrie package; hopefully that isn't a problem. Load it up in GHCi and try calling slow vs. memoSlow with something like (10^6) or (10^7) to see it in action.
Generalizing this to functions taking multiple arguments or whatnot should be fairly straightforward. For further details on using MemoTrie, you might find this blog post by its author helpful.
See memoization
There are a number of tools in GHC's runtime system explicitly to support memoization.
Unfortunately, memoization isn't really a one-size fits all affair, so there are several different approaches that we need to support in order to cope with different user needs.
You may find the original 1999 writeup useful as it includes several implementations as examples:
Stretching the Storage Manager: Weak Pointers and Stable Names in Haskell by Simon Peyton Jones, Simon Marlow, and Conal Elliott
I will add my own solution, which seems to be quite slow as well. The first parameter is a function that returns an Int32, which is a unique identifier of the parameter. If you want to identify inputs uniquely by different means (e.g. by 'id'), you have to change the second parameter of H.new to a different hash function. I will try to find out how to use Data.Map and test whether I get faster results.
import qualified Data.HashTable as H
import Data.Int
import System.IO.Unsafe

cache :: (a -> Int32) -> (a -> b) -> (a -> b)
cache ident f = unsafePerformIO $ createfunc
  where
    createfunc = do
      storage <- H.new (==) id
      return (doit storage)
    doit storage = unsafePerformIO . comp
      where
        comp x = do
          look <- H.lookup storage (ident x)
          case look of
            Just res -> return res
            Nothing -> do
              result <- return (f x)
              H.insert storage (ident x) result
              return result
You can write the slow function as a higher-order function, returning a function itself. Thus you can do all the preprocessing inside the slow function, and leave the part that differs in each computation to the returned (hopefully fast) function. An example could look like this:
(SML code, but the idea should be clear)
fun computeComplicatedThing (x:float) (y:float) = (* ... some very complicated computation *)
val computeComplicatedThingFast = computeComplicatedThing 3.14 (* provide x, do computation that needs only x *)
val result1 = computeComplicatedThingFast 2.71 (* provide y, do computation that needs x and y *)
val result2 = computeComplicatedThingFast 2.81
val result3 = computeComplicatedThingFast 2.91
I have several tens of thousands possible inputs and I only actually use a handful. I would need to initialize the array ... using a function instead of a list.
I'd go with listArray (start, end) (map func [start..end])
func doesn't really get called above. Haskell is lazy and creates thunks which will be evaluated when the value is actually required.
When using a normal array you always need to initialize its values. So the work required for creating these thunks is necessary anyhow.
Several tens of thousands is far from a lot. If you had trillions, then I would suggest using a hash table, yada yada.
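Wrapped up as a reusable helper, that idea looks like this (a sketch; memoArr is a made-up name):

import Data.Array

-- The array elements are thunks; func runs only for indices that are
-- actually looked up, and each result is then retained.
memoArr :: (Int, Int) -> (Int -> b) -> (Int -> b)
memoArr bnds func = (arr !)
  where arr = listArray bnds (map func (range bnds))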
I don't know Haskell specifically, but how about keeping existing answers in some hashed data structure (might be called a dictionary, or a hashmap)? You can wrap your slow function in another function that first checks the map and only calls the slow function if it hasn't found an answer.
You could make it fancy by limiting the map to a certain size and, when it reaches that size, throwing out the least recently used entry. For this you would additionally need to keep a map of key-to-timestamp mappings.
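In Haskell, a minimal sketch of the basic (non-LRU) version of that wrapper could keep a Data.Map behind an IORef. This is illustrative only; like the HashTable answer above, it leans on unsafePerformIO with all the usual caveats, and in practice you would also want NOINLINE on any top-level cached function:

import qualified Data.Map as M
import Data.IORef
import System.IO.Unsafe (unsafePerformIO)

memoize :: Ord a => (a -> b) -> (a -> b)
memoize f = unsafePerformIO $ do
  ref <- newIORef M.empty
  return $ \x -> unsafePerformIO $ do
    m <- readIORef ref
    case M.lookup x m of
      Just r  -> return r
      Nothing -> do
        let r = f x
        modifyIORef ref (M.insert x r)
        return r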
