I recently started using the Isabelle theorem prover. As I want to prove another lemma, I would like to use a different notation than the one used in the lemma "det_linear_row_setsum", which can be found in the HOL library. More specifically, I would like to use the "χ i j notation" instead of "χ i". I have been trying to formulate an equivalent expression for some time, but couldn't figure it out yet.
(* ORIGINAL lemma from library *)
(* from HOL/Multivariate_Analysis/Determinants.thy *)
lemma det_linear_row_setsum:
assumes fS: "finite S"
shows "det ((χ i. if i = k then setsum (a i) S else c i)::'a::comm_ring_1^'n^'n) = setsum (λj. det ((χ i. if i = k then a i j else c i)::'a^'n^'n)) S"
proof(induct rule: finite_induct[OF fS])
case 1 thus ?case apply simp unfolding setsum_empty det_row_0[of k] ..
next
case (2 x F)
then show ?case by (simp add: det_row_add cong del: if_weak_cong)
qed
..
(* My approach to rewrite the above lemma in χ i j matrix notation *)
lemma mydet_linear_row_setsum:
assumes fS: "finite S"
fixes A :: "'a::comm_ring_1^'n^'n" and k :: "'n" and vec1 :: "'vec1 ⇒ ('a, 'n) vec"
shows "det ( χ r c . if r = k then (setsum (λj .vec1 j $ c) S) else A $ r $ c ) =
(setsum (λj . (det( χ r c . if r = k then vec1 j $ c else A $ r $ c ))) S)"
proof-
show ?thesis sorry
qed
First, make yourself clear what the original lemma says: a is a family of vectors indexed by i and j, c is a family of vectors indexed by i. The k-th row of the matrix on the left is the sum of the vectors a k j ranged over all j from the set S.
The other rows are taken from c. On the right, the matrices are the same except that row k is now a k j and the j is bound in the outer sum.
As you have realised, the family of vectors a is only used for the index i = k, so you can replace a by %_ j. vec1 $ j. Your matrix A yields the family of rows, i.e., c becomes %r. A $ r. Then, you merely have to exploit that (χ n. x $ n) = x (theorem vec_nth_inverse) and push the $ through the if and setsum. The result looks as follows:
lemma mydet_linear_row_setsum:
assumes fS: "finite S"
fixes A :: "'a::comm_ring_1^'n^'n" and k :: "'n" and vec1 :: "'vec1 => 'a^'n"
shows "det (χ r c . if r = k then setsum (%j. vec1 j $ c) S else A $ r $ c) =
(setsum (%j. (det(χ r c . if r = k then vec1 j $ c else A $ r $ c))) S)"
To prove this, you just have to undo the expansion and the pushing through, the lemmas if_distrib, cond_application_beta, and setsum_component might help you in doing so.
Related
I'm studying Floyd-Warshall algorithm. Now having managed to implement it in Haskell, the way I implement it is similar to how it is implemented in imperative languages (that is to say, use list of lists to simulate 2D arrays), but this is really inefficient giving that accessing an element in a list is much more slower than in a array.
Is there a smarter way to do this in Haskell? I thought I could do this by concate some lists but keep failing.
My Code:
floydwarshall :: [[Weight]] -> [[Weight]]
floydwarshall lst = fwAlg 1 $ initMatrix 0 $ list2matrix lst
fwAlg :: Int -> [[Weight]] -> [[Weight]]
fwAlg k m | k < rows m = let n = rows m
m' = foldl (\m (i,j) -> updateDist i j k m) m [(i,j) | i <- [0..n-1], j <- [0..n-1]]
in fwAlg (k+1) m'
| otherwise = m
-- a special case where k is 0
initMatrix :: Int -> [[Weight]] -> [[Weight]]
initMatrix n m = if n == rows m then m else initMatrix (n+1) $ updateAtM 0.0 (n,n) m
updateDist :: Int -> Int -> Int -> [[Weight]] -> [[Weight]]
updateDist i j k m =
let w = min (weight i j m) (weight i k m + weight k j m)
in updateAtM w (i, j) m
weight :: Vertice -> Vertice -> [[Weight]] -> Weight
weight i j m = let Just w = elemAt (i, j) m in w
The algorithm has a regular access pattern so we can avoid a lot of
indexing and still write it with lists, with (I think) the same
asymptotic performance as the imperative version.
If you do want to use arrays for more speed, you might still want to do
something similar to this with bulk operations on rows and columns
rather than reading and writing individual cells.
-- Let's have a type for weights. We could use Maybe but the ordering
-- behaviour is wrong - when there's no weight it should be like
-- +infinity.
data Weight = Weight Int | None deriving (Eq, Ord, Show)
addWeights :: Weight -> Weight -> Weight
addWeights (Weight x) (Weight y) = Weight (x + y)
addWeights _ _ = None
-- the main function just steps the matrix a number of times equal to
-- the node count. Also pass along k at each step.
floydwarshall :: [[Weight]] -> [[Weight]]
floydwarshall m = snd (iterate step (0, m) !! length m)
-- step takes k and the matrix for k, returns k+1 and the matrix for
-- k+1.
step :: (Int, [[Weight]]) -> (Int, [[Weight]])
step (k, m) = (k + 1, zipWith (stepRow ktojs) istok m)
where
ktojs = m !! k -- current k to each j
istok = transpose m !! k -- each i to current k
-- Make shortest paths from one i to all j.
-- We need the shortest paths from the current k to all j
-- and the shortest path from this i to the current k
-- and the shortest paths from this i to all j
stepRow :: [Weight] -> Weight -> [Weight] -> [Weight]
stepRow ktojs itok itojs = zipWith stepOne itojs ktojs
where
stepOne itoj ktoj = itoj `min` (itok `addWeights` ktoj)
-- example from wikipedia for testing
test :: [[Weight]]
test = [[Weight 0, None, Weight (-2), None],
[Weight 4, Weight 0, Weight 3, None],
[None, None, Weight 0, Weight 2],
[None, Weight (-1), None, Weight 0]]
I don't know how to achieve peak performance, but I can give you some tips on making your code abstract so that you can work on performance tuning more easily.
First of all, it would be nice if, when you change around your data types, you don't have to rewrite everything. Right now, you've made everything concretely about lists of lists, so let's see if we can abstract that out. First, we have to figure out what your minimal matrix interface is. Glancing at your code, you appear to have initMatrix, list2matrix, rows, elemAt, and updateAtM. These are the functions that query or modify your matrix, and these are what you would need to implement to make a new version of this code for a different Matrix type.
One way to organize this interface is to make a class out of it. For instance:
class Matrix m where
list2matrix :: [[a]] -> m a
matrix2List :: m a -> [[a]]
rows :: m a -> Int
elemAt :: Int -> Int -> m a -> a
updateAtM :: a -> (Int, Int) -> m a -> m a
setDiag :: a -> m a -> m a
(I went ahead and added a matrix2List function for extracting your result and renamed/modified initMatrix into setDiag, which felt a little more general.)
We can then update your code to use this new class:
floydwarshall :: Matrix m => [[Weight]] -> m Weight
floydwarshall lst = fwAlg 1 $ initMatrix $ list2matrix lst
fwAlg :: Matrix m => Int -> m Weight -> m Weight
fwAlg k m | k < rows m = let n = rows m
m' = foldl (\m (i,j) -> updateDist i j k m) m [(i,j) | i <- [0..n-1], j <- [0..n-1]]
in fwAlg (k+1) m'
| otherwise = m
initMatrix :: Matrix m => m Weight -> m Weight
initMatrix = setDiag 0
updateDist :: Matrix m => Int -> Int -> Int -> m Weight -> m Weight
updateDist i j k m =
let w = min (elemAt i j m) (elemAt i k m + elemAt k j m)
in updateAtM w (i, j) m
dist :: Matrix m => Int -> Int -> Int -> m Weight -> Weight
dist i j 0 m = elemAt i j m
dist i j k m = min (dist i j (k-1) m) (dist i k (k-1) m + dist k j (k-1) m)
Now all we need to do is start defining some Matrix types and see how performance is!
Let's start with lists, since you've already done this work. We'll have to use a newtype wrapper to make GHC happy, but ignoring the wrapping and unwrapping, this is morally the same as the code you wrote:
newtype ListMatrix a = ListMatrix { getListMatrix :: [[a]] }
instance Matrix ListMatrix where
list2matrix = ListMatrix
matrix2List = getListMatrix
rows = length . getListMatrix
elemAt i j (ListMatrix m) = m !! i !! j
updateAtM a (i,j) (ListMatrix m) =
let (firstRows, row:laterRows) = splitAt i m
(firstCols, _:laterCols) = splitAt j row
in ListMatrix $ firstRows <> ((firstCols <> (a:laterCols)):laterRows)
setDiag x = go 0
where go n m = if n == rows m then m else go (n+1) $ updateAtM x (n,n) m
(Also, I filled in elemAt and updateAtM.) You should be able to run
matrix2List #ListMatrix $ floydwarshall myList
and get the same result (and performance) that you currently have.
Now, on to the experimentation! All that's necessary is for us to define new instances of Matrix and see what happens. Perhaps we should try pure functions:
data FunMatrix a = FunMatrix { size :: Int, getFunMatrix :: Int -> Int -> a }
instance Matrix FunMatrix where
list2matrix l = FunMatrix (length l) (\i j -> l !! i !! j)
matrix2List (FunMatrix s f) = (\i -> f i <$> [0..s-1]) <$> [0..s-1]
rows = size
elemAt i j m = getFunMatrix m i j
updateAtM a (i,j) (FunMatrix s f) = FunMatrix s (\i' j' -> if i==i' && j==j' then a else f i' j')
setDiag x (FunMatrix s f) = FunMatrix s (\i j -> if i==j then x else f i j)
How does that perform? One problem is that the starting lookup function is still just indexing into the list of lists, which is slow. One fix would be to convert to an array or vector first and then index. Because we've nicely abstracted everything, all that would need to change is the definition of list2matrix right here, and you'll probably get a nice performance boost!
On the topic of performance, there's one other note I can point out. The definition of dist does some serious "dynamic programming". This could work fine if you were writing and reading directly into an array, but in this recursive form, you may end up doing a lot of duplicate work. One fix is to memoize. My goto memoization package is MemoTrie, which makes it really easy to memoize things. In this case, you could change dist to:
dist :: Matrix m => m Weight -> Int -> Int -> Int -> Weight
dist m = go'
where
go' = memo3 go
go i j 0 = elemAt i j m
go i j k = min (go' i j (k-1)) (go' i k (k-1) + go' k j (k-1))
That might give you a bit of a boost!
You might consider taking #Chi's advice and use STUArray, but you'll run into a problem: the STUArray interface demands that array lookups are in a monad. It's still possible to use the abstraction method I show off above, but you'll have to change the types of the functions. And, because you change the types in the interface, you'll need to update your algorithm code to be monadic. It can be a bit of a pain, but it might be necessary to get optimal performance.
I have a conjunction, let's abstract it as: A /\ B and I have a Lemma proven that C -> A and I wish to get as a result the goal C /\ B. Is this possible?
If yes, I'd be interested in how to do it. If I use split and then apply the lemma to the first subgoal, I can't reassemble the two resulting subgoals C and B to C /\ B - or can I? Also apply does not seem to be applyable to only one branch of a conjunction.
If no, please explain to me why this is not possible :-)
You could introduce a lemma like :
Theorem cut: forall (A B C: Prop), C /\ B -> (C -> A) -> A /\ B.
Proof.
intros; destruct H; split; try apply H0; assumption.
Qed.
And then define a tactic like :
Ltac apply_left lemma := eapply cut; [ | apply lemma].
As an example, you could do stuff like :
Theorem test: forall (m n:nat), n <= m -> max n m = m /\ min n m = n.
Proof.
intros.
apply_left max_r.
...
Qed.
In this case, the context goes from :
Nat.max n m = m /\ Nat.min n m = n
to
n <= m /\ Nat.min n m = n
I assume that's what you are looking for.
Hope this will help you !
In the section on insertion into Braun trees of the Verified Programming in Agda book (page 118), the author does some explanation of what the code is supposed to be doing, but leaving what it does aside, a singificant ommision in the book so far is not explaining the strange syntax in function pattern matching for theorem proving.
I understand that the with pattern can be further destructured by using | and I can understand that when using rewrite, | can also be used to separate the different rewrites, but this makes it confusing.
As far as I can tell, rewrite is definitely not a function. And then comes the following:
bt-insert a (bt-node{n}{m} a' l r p)
rewrite +comm n m with p | if a <A a' then (a , a') else (a' , a)
bt-insert a (bt-node{n}{m} a' l r _) | inj₁ p | (a1 , a2)
rewrite p = (bt-node a1 (bt-insert a2 r) l (inj₂ refl))
bt-insert a (bt-node{n}{m} a' l r _) | inj₂ p | (a1 , a2) =
(bt-node a1 (bt-insert a2 r) l (inj₁ (sym p)))
I am really confused as to how rewrite +comm n m with p | if a <A a' then (a , a') else (a' , a) should be parsed mentally. And how does one read | inj₁ p | (a1 , a2) rewrite p? Also, while testing the previous examples I've discovered that for some reason the order of the rewrites does not matter. Why is that?
If you ignore the proofs for a sec, this function can be simplified as
bt-insert : ∀ {n: ℕ} → A → braun-tree n → braun-tree (suc n)
bt-insert a (bt-node {n} {m} a' l r _) = bt-node a1 (bt-insert a2 r) l _
where
(a1, a2) = if a <A a' then (a , a') else (a' , a)
So (a1, a2) is just (min a a', max a a') i.e. (a, a') sorted.
All the other code is there to maintain the proofs of the invariants:
We rewrite +comm n m so that we can return a braun-tree (2 + (m + n)) even though the return type requires a braun-tree (2 + (n + m)).
p is used to prove that the resulting tree is still balanced: p proves that n ≡ m ∨ n ≡ suc m, so it's either inj₁ (p : n ≡ m) or inj₂ (p : n ≡ suc m). We use the proof of either property to compute the proof of suc m ≡ n ∨ suc m ≡ suc n (remember we flipped n and m via the proof of commutativity).
After pondering it for a bit, I realized that if...
p | if a <A a' then (a , a') else (a' , a)
inj₁ p | (a1 , a2)
I put the expressions like that then it makes sense visually. In bt_insert's second case the rewrite comes before the if statement and in the third case it comes after the destructuring of the if pattern.
Well, that leaves figuring out what the rest of the function is doing.
I am starting in Haskell and am interested in how to get matching performance for simple code I would normally write in C or Python. Consider the following problem.
You are given a long string of 1s and 0s of length n. We want to output, for each substring of length m, the number of 1s in that window. That is the output has n-m+1 different possible values between 0 and m inclusive.
In C this is very simple to do in time proportional to n and using extra space (on top of the space needed to store the input) proportional to m bits. You just count the number of 1s in the first window of length m and then maintain two pointers, one to the start of the window and one to the end and increment or decrement depending of whether one points to a 1 and the other points to a 0 or the opposite occurs.
Is it possible to get the same theoretical performance in a purely functional way in Haskell?
Some terrible code:
chunkBits m = helper
where helper [] = []
helper xs = sum (take m xs) : helper (drop m xs)
main = print $ chunkBits 5 [0,1,1,0,1,0,0,1,0,1,0,1,1,1,0,0,0,1]
C Code
Here is the C code you've described:
int sliding_window(const char * const str, const int n, const int m, int * result){
const char * back = str;
const char * front = str + m;
int sum = 0;
int i;
for(i = 0; i < m; ++i){
sum += str[i] == '1';
}
*result++ = sum;
for(; i < n; ++i){
sum += *front++ == '1';
sum -= *back++ == '1';
*result++ = sum;
}
return n - m + 1;
}
Algorithm
The code above is apparently O(n), since we have n iterations. But lets go a step back and have a look at the underlying algorithm:
Sum the first m elements. Keep this as sum. O(m)
Our first window has sum 1s. O(1)
Until we've exhausted our original string: O(n)
"Slide" the window. O(1)
add 1 to sum if we gain a '1' by sliding O(1)
subtract 1 from sum if we lose a '1' by sliding O(1)
Push sum onto the results. O(1)
Since n > m (otherwise there is no window), O(n) holds.
Moulding a Haskell variant
That's basically a left scan (scanl) with a way to get a list of those differences in (2.1.). So all we need is a way to somehow slide:
slide :: Int -> [Char] -> [Int]
slide m xs = zipWith f xs (drop m xs)
where
f '1' '0' = -1 -- we lose a one
f '0' '1' = 1 -- we gain a one
f _ _ = 0 -- nothing :/
That's O(n), where n is the length of our list.
slidingWindow :: Int -> [Char] -> [Int]
slidingWindow m xs = scanl (+) start (slide m xs)
where
start = length (filter (== '1') (take m xs))
That's O(n), same as in C, since both use the same algorithm.
Caveats
In a real life application, you would always use Text or ByteString instead of String, since the latter is a list of Char with much overhead. Since you only use a string of '1' and '0', you can use ByteString:
import Data.ByteString.Char8 (ByteString)
import qualified Data.ByteString.Char8 as BS
import Data.List (scanl')
slide :: Int -> ByteString -> [Int]
slide m xs = BS.zipWith f xs (BS.drop m xs)
where
f '1' '0' = -1
f '0' '1' = 1
f _ _ = 0
slidingWindow :: Int -> ByteString -> [Int]
slidingWindow m xs = scanl' (+) start (slide m xs)
where
start = BS.count '1' (BS.take m xs)
Update
After reading the question more carefully I noticed that the
C program reads its input from an array.
So here is an equivalent Haskell "pure" function which performs the task.
import qualified Data.Vector as V
import Data.List
import Control.Monad
count :: Int -> V.Vector Int -> [Int]
count m v =
let c0 = V.sum (V.take m v)
n = V.length v
results = scanl' go c0 [0..n-m-1]
where go r i = r - (v V.! i) + (v V.! (i+m))
in results
test1 = let v = V.fromList [0,0,1,1,1,1,1,0,0,0,0]
in print $ count 3 v
Even though count returns a list it will be generated lazily. Moreover, if it is consume by another list operation it could be optimized via one of the various fusion techniques.
Original Answer
This is a good exercise, but why does it have to be "purely functional" (and what does that mean anyway)?
You can write the C algorithm in Haskell - it's not as terse, but it will
generate essentially the same code.
import Data.Vector.Unboxed.Mutable as V
count m = do
v <- V.replicate m '0'
let toInt ch = if ch == '1' then 1 else 0
let loop c i = do
ch <- getChar
oldch <- V.read v i
let c' = c + toInt ch - toInt oldch
V.write v i ch
let i' = mod (i+1) m
putStrLn $ show c
loop c' i'
loop 0 0
main = count 3
(For simplicity this generates n results.)
If you were benchmark this note that you are also including the performance of
getChar and putStrLn and show, so it might be difficult to make a fair
comparison with a C program. However, it has O(n) complexity and constant
memory usage which is what I think you're asking for.
The most basic level is re-implementing the cool HOF-based algorithms with hand-written recursive functions to express the loops.
Banged patterns mark arguments as strict, so simple values can be calculated without unnecessary delay (this is implicitly taken care of when using scanl', for example). This also shows that "pointers" are just names:
{-# LANGUAGE BangPatterns #-}
-- assumes xs has only 0s and 1s
counts :: Int -> [Int] -> [Int]
counts m xs = g 0 m xs
where
g !c 0 ys = h c ys xs
g !c _ [] = [] -- m > |xs|
g !c m (y:ys) = g (c+y) (m-1) ys
h !c [] _ = [c]
h !c (y:ys) (x:xs) = c : h (c+y-x) ys xs
Testing,
> counts [1,1,0,0,1,1,0,1] 2
[2,1,0,1,2,1,1]
> counts [1,1,0,0,1,1,1,1] 3
[2,1,1,2,3,3]
I am facing problems with the following lemma, which I think should be correct. I can get the proof to work up to a certain point with small steps, however I haven't found a way to proof the entire lemma.
lemma abc:
fixes A :: "'a::comm_ring_1^'n^'n" and l :: 'n and c :: 'a
shows "(χ i j. if i = l then c * (transpose A $ i $ j) else (transpose A $ i $ j)) =
(χ i j. if i = l then c * (A $ j $ i) else (A $ j $ i))"
proof -
(* here is my draft *)
have th1: "(χ i j. transpose A $ i $ j) = (χ i j. A $ j $ i)"
by (simp add: det_transpose transpose_def)
have "(χ i j. if i = l then (transpose A $ i $ j) else (transpose A $ i $ j)) =
(χ i j. A $ j $ i)" by (metis column_def row_def row_transpose)
show ?thesis sorry
qed
Before even starting to prove something in Isabelle, you should know how to prove it on paper (also experienced Isabelle users do not always heed their own advice ;)). If you know how to prove it on paper, it might still not be obvious how to translate the paper proof into Isabelle/Isar. However, it would make it easier to help (and show that you understand the mathematical problem at hand, which has nothing to do with Isabelle per se).
In the following I'll explain how I would approach this kind of proof:
lemma abc:
fixes A :: "'a::comm_ring_1^'n^'n" and l :: 'n and c :: 'a
shows "(χ i j. if i = l then c * (transpose A $ i $ j) else (transpose A $ i $ j)) =
(χ i j. if i = l then c * (A $ j $ i) else (A $ j $ i))"
The first thing I noted is the abstractions χ i j. .... If I would prove something about plain lambda abstractions, I would definitely want to get rid of those as a first step, e.g., in order to prove that two functions f and g are equal, I would prove that f x = g x for all x. Which is expressed in Isabelle by the rule
ext: (⋀x. ?f x = ?g x) ⟹ ?f = ?g
Since I do not know much about Multivariate_Analysis I try to find a similar rule involving χ, by
find_theorems "(χ i. ?f i) = (χ i. ?g i)"
where the first hit is what I was searching for, i.e.,
Determinants.Cart_lambda_cong: (⋀x. ?f x = ?g x) ⟹ vec_lambda ?f = vec_lambda ?g
So I start the proof by applying this rule twice (with intro rule-name, the introduction rule rule-name is applied as often as possible):
proof (intro Cart_lambda_cong)
Now I have to show that for arbitrary i and j the statement holds when substituted for the χ parameters, i.e.,
fix i j
show "(if i = l then c * (transpose A $ i $ j) else (transpose A $ i $ j)) =
(if i = l then c * (A $ j $ i) else (A $ j $ i))"
Then the prove is finished by applying the definition of transpose:
by (simp add: transpose_def)
qed
Or instead of the above step-wise proof we could do
by (auto intro!: Cart_lambda_cong simp: transpose_def)
where the ! after intro tells the system that this rule should be applied aggressively (it does not work without the !, but don't ask me why ;), we would have to check the trace of applied rules to find out).
Looking at the lemma reveals that transpose A $ i $ j = A $ j $ i holds for all A,i,j, which can be easily proven by the simplifier:
lemma transpose_eq: "⋀i j. transpose A $ i $ j = A $ j $ i" by (simp add: transpose_def)
If we apply this equation manually with the subst method, your lemma can be solved easily
be just rewriting the branches of the if expression:
lemma abc:
fixes A :: "'a::comm_ring_1^'n^'n" and l :: 'n and c :: 'a
shows "(χ i j. if i = l then c * (transpose A $ i $ j) else (transpose A $ i $ j)) =
(χ i j. if i = l then c * (A $ j $ i) else (A $ j $ i))"
apply (subst transpose_eq)
apply (subst transpose_eq)
apply (rule refl)
done
So, instead of subst we should be able to use the simplifier with apply (simp add: transpose_eq), right? The reason it does not work is due to congruence rules. Basically, the simplifier knows a rule (if_weak_cong), which explicitly forbids it to simplify the branches of an if expression (this is often a good idea to keep goals on a managable size). So we need to remove this rule:
apply (simp add: transpose_def cong del: if_weak_cong)
solves your original lemma.
For a more in-depth answer see "Why won't Isabelle simplify the body of my "if _ then _ else" construct?".