Isabelle/HOL: proof by 'simp' is slow while 'value' is instantaneous - performance

I am new to Isabelle/HOL, still working through the prog-prove exercises. In the meantime, I am practicing by applying these proof techniques to questions about combinatorial words. I observe very different behaviour (in terms of efficiency) between 'value' and 'lemma'.
Can one explain the different evaluation/search strategies between the two commands?
Is there a way to have the speed of 'value' used inside a proof of a 'lemma'?
Of course, I am asking because I have not found the answer in the documentation (so far). Which manual documents and explains this difference in efficiency?
Here is a minimal piece of source to reproduce the problem.
theory SlowLemma
imports Main
begin
(* Alphabet for Motzkin words. *)
datatype alphabet = up | lv | dn
(* Keep the [...] notation for lists. *)
no_notation Cons (infixr "#" 65) and append (infixr "@" 65)
primrec count :: "'a ⇒ 'a list ⇒ nat" where
"count _ Nil = 0" |
"count s (Cons h q) = (if h = s then Suc (count s q) else count s q)"
(* prefix n l simply returns undefined if n > length l. *)
fun prefix :: "'a list ⇒ nat ⇒ 'a list" where
"prefix _ 0 = []" |
"prefix (Cons h q) (Suc n) = Cons h (prefix q n)"
definition M_ex_7 :: "alphabet list" where
"M_ex_7 ≡ [lv, lv, up, up, lv, dn, dn]"
definition M_ex_19 :: "alphabet list" where
"M_ex_19 ≡ [lv, lv, up, up, lv, up, lv, dn, lv, dn, lv, up, dn, dn, lv, up, dn, lv, lv]"
fun height :: "alphabet list ⇒ int" where
"height w = (int (count up w + count up w)) - (int (count dn w + count dn w))"
primrec is_pre_M :: "alphabet list ⇒ nat ⇒ bool" where
"is_pre_M _ (0 :: nat) = True"
| "is_pre_M w (Suc n) = (let w' = prefix w (Suc n) in is_pre_M w' n ∧ height w' ≥ 0)"
fun is_M :: "alphabet list ⇒ bool" where
"is_M w = (is_pre_M w (length w) ∧ height w = 0)"
(* These two calls to value are fast. *)
value "is_M M_ex_7"
value "is_M M_ex_19"
(* This first lemma goes fast. *)
lemma is_M_M_ex_7: "is_M M_ex_7"
by (simp add: M_ex_7_def)
(* This second lemma takes five minutes. *)
lemma is_M_M_ex_19: "is_M M_ex_19"
by (simp add: M_ex_19_def)
end

simp is a proof method that goes through the proof kernel, i.e., every step has to be justified. For long rewriting chains, this may be quite expensive.
On the other hand, value uses the code generator where possible. All used constants are translated into ML code, which is then executed. You have to trust the result, i.e., it didn't go through the kernel and may be wrong.
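Incidentally, you can inspect the ML code the code generator produces for these constants with the standard export_code command (a side note; depending on the Isabelle version, the result is printed to the output panel or stored among the theory exports):
export_code is_M in SML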
The equivalent of value as a proof method is eval. Thus, an easy way to speed up your proofs is to use this:
lemma is_M_M_ex_19: "is_M M_ex_19"
by eval
Opinions in the Isabelle community about whether or not this should be used differ. Some say it's similar to axiomatization (because you have to trust it), others consider it a reasonable way if going through the kernel is prohibitively slow. Everyone agrees though that you have to be really careful about custom setup of the code generator (which you haven't done, so it should be fine).
There's a middle ground: the code_simp method will set up simp to use only the equations that would otherwise be used by eval. That means: a much smaller set of rules for simp, while still going through the kernel. In your case, it is actually the same speed as by eval, so I would highly recommend doing that:
lemma is_M_M_ex_19: "is_M M_ex_19"
by code_simp
In your case, the reason why code_simp is much faster than simp is a simproc that has exponential runtime in the number of nested let expressions. Hence, another solution would be to use simp add: Let_def to just unfold the let expressions.
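For reference, that variant follows the same pattern as the lemmas above:
lemma is_M_M_ex_19: "is_M M_ex_19"
by (simp add: M_ex_19_def Let_def)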
Edited to reflect comment by Andreas Lochbihler

Related

How can I subtract a multiset from a set with a given multiset?

So I'm trying to define a function apply_C :: "('a multiset ⇒ 'a option) ⇒ 'a multiset ⇒ 'a multiset"
It takes in a function C that may convert an 'a multiset into a single element of type 'a. Here we assume that the elements in the domain of C are pairwise disjoint and that none of them is the empty multiset (I already have another function that checks these things). apply_C also takes another multiset inp. What I'd like the function to do is check whether at least one element in the domain of C is completely contained in inp. If so, perform the multiset difference inp - s, where s is that element of the domain of C, and add the element (the (C s)) to the resulting multiset. Afterwards, keep running the function until no element in the domain of C is completely contained in the given inp multiset.
What I tried was the following:
fun apply_C :: "('a multiset ⇒ 'a option) ⇒ 'a multiset ⇒ 'a multiset" where
"apply_C C inp = (if ∃s ∈ (domain C). s ⊆# inp then apply_C C (add_mset (the (C s)) (inp - s)) else inp)"
However, I get this error:
Variable "s" occurs on right hand side only:
⋀C inp s.
apply_C C inp =
(if ∃s∈domain C. s ⊆# inp
then apply_C C
(add_mset (the (C s)) (inp - s))
else inp)
I have been thinking about this problem for days now, and I haven't been able to find a way to implement this functionality in Isabelle. Could I please have some help?
After thinking more about it, I don't believe there is a simple solution for that in Isabelle.
Do you need that?
You have not said why you want that. Maybe you can reduce your assumptions? Do you really need a function to calculate the result?
How to express the definition?
I would use an inductive predicate that expresses one step of rewriting and prove that the solution is unique. Something along the lines of:
context
fixes C :: ‹'a multiset ⇒ 'a option›
begin
inductive apply_CI where
‹apply_CI (M + M') (add_mset (the (C M)) M')›
if ‹M ∈ dom C›
context
assumes
distinct: ‹⋀a b. a ∈ dom C ⟹ b ∈ dom C ⟹ a ≠ b ⟹ a ∩# b = {#}› and
strictly_smaller: ‹⋀a b. a ∈ dom C ⟹ size a > 1›
begin
lemma apply_CI_determ:
assumes
‹apply_CI⇧*⇧* M M⇩1› and
‹apply_CI⇧*⇧* M M⇩2› and
‹⋀M⇩3. ¬apply_CI M⇩1 M⇩3›
‹⋀M⇩3. ¬apply_CI M⇩2 M⇩3›
shows ‹M⇩1 = M⇩2›
sorry
lemma apply_CI_smaller:
‹apply_CI M M' ⟹ size M' ≤ size M›
apply (induction rule: apply_CI.induct)
subgoal for M M'
using strictly_smaller[of M]
by auto
done
lemma wf_apply_CI:
‹wf {(x, y). apply_CI y x}›
(*trivial but very annoying because not enough useful lemmas on wf*)
sorry
end
end
I have no clue how to prove apply_CI_determ (no idea if the conditions I wrote down are sufficient or not), but I did not spend much time thinking about it.
After that you can define your definitions with:
definition apply_C where
‹apply_C M = (SOME M'. apply_CI⇧*⇧* M M' ∧ (∀M⇩3. ¬apply_CI M' M⇩3))›
and prove the property in your definition.
How to execute it
I don't see how to write an executable function on multisets directly. The problem you face is that one step of apply_C is nondeterministic.
If you can use lists instead of multisets, you get an order on the elements for free, and you can use subseqs, which gives you all possible subsequences. Rewrite using the first element of subseqs that is in the domain of C. Iterate as long as any rewriting is possible.
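For instance, subseqs (from the List theory in Main) enumerates all subsequences of a list; a quick sanity check, with the expected output in a comment:
value "subseqs [1::nat, 2, 3]"
(* [[1, 2, 3], [1, 2], [1, 3], [1], [2, 3], [2], [3], []] *)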
Link that to the inductive predicate to prove termination and that it calculates the right thing.
Note that in general you cannot extract a list out of a multiset, but it is possible in some cases (e.g., if you have a linorder over 'a), as shown below.
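For example, under a linorder, HOL-Library.Multiset already provides such an extraction via sorted_list_of_multiset:
value "sorted_list_of_multiset {#2::nat, 1, 1, 3#}"
(* [1, 1, 2, 3] *)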

Expanding all definitions in Isabelle lemma

How can I tell Isabelle to expand all my definitions, please, because that way the proof is trivial? Unfortunately, no default expansion or simplification happens, and I basically get back the original expression as the subgoal.
Example:
theory Test
imports Main
begin
definition b0 :: "nat⇒nat"
where "b0 n ≡ (n mod 2)"
definition b1 :: "nat⇒nat"
where "b1 n ≡ (n div 2)"
lemma "(a::nat)≤3 ∧ (b::nat)≤3 ⟶
2*(b1 a)+(b0 a)+2*(b1 b)+(b0 b) = a+b"
apply auto
oops
end
Response before oops:
proof (prove)
goal (1 subgoal):
1. a ≤ 3 ⟹
b ≤ 3 ⟹ 2 * b1 a + b0 a + 2 * b1 b + b0 b = a + b
My recommendation: unfolding
There is a special keyword unfolding for unpacking definitions at the start of proofs. For your example this would read:
unfolding b0_def b1_def by simp
I consider unfolding the most elegant way. It also helps while writing the proofs. Internally, this is (mostly?) equivalent to using the unfold method:
apply (unfold b0_def b1_def) by simp
This will recursively (!) apply the set of equalities you supply to rewrite the proof goal. (Due to the recursion, you should avoid supplying a set of equalities that could generate cycles; see the example below.)
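For example, with the definitions from the question, unfolding only b0_def rewrites every occurrence of b0 and leaves b1 untouched; the final .. closes the resulting trivial goal by reflexivity:
lemma "2*(b1 a)+(b0 a) = 2*(b1 a)+(a mod 2)"
unfolding b0_def ..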
Alternative: Using the simplifier
In cases with possible loops, the simplifier might be able to reach a nice unfolding without running into these cycles, maybe by interleaving with other simplifications. In such cases, by (simp add: b0_def b1_def), as you've suggested, is great!
Alternative definition: Maybe it's just an abbreviation (and not a definition)?
If you find yourself unfolding a definition in every single instance, you could consider using abbreviation instead of definition. Then some Isabelle magic will do the packing/unpacking for you without further hints. abbreviation only affects how the user communicates with Isabelle; it does not introduce new symbols at the object level, and consequently there would be no b1_def facts and the like.
abbreviation b0 :: "nat⇒nat"
where "b0 n ≡ (n mod 2)"
Usually not recommended: Building something like an abbreviation using the simplifier
If you (for whatever reason) want to have a defined name at the object level, but unfold it in almost every instance, you can also feed the defining equality directly into the simplifier.
definition b0 :: "nat⇒nat"
where [simp]: "b0 n ≡ (n mod 2)"
(Usually there should be little reason for the last option.)
Yes, I keep forgetting that definitions are not used in simplifications by default.
Adding the definitions explicitly to the simplification rules solves this problem:
lemma "(a::nat)≤3 ∧ (b::nat)≤3 ⟶
2*(b1 a)+(b0 a)+2*(b1 b)+(b0 b) = a+b"
by (simp add: b0_def b1_def)
This way the definitions (b0, b1) are correctly used.

How to write bigvee and big wedge in Isabelle

I'm trying to use Isabelle for automated proving. However, I have a problem specifying formulas in Isabelle. For example, I have a formula like this
Then, I define sets and use big_wedge and big_vee symbols in Isabelle as follows:
And the result is "Inner lexical error. Failed to parse prop".
Could you explain what is wrong here, please?
Thank you very much.
Not all symbols shown in Isabelle/jEdit's Symbol tabs have a meaning; only the symbols that do can be used in your code.
Based on the corresponding code for sums, I started the setup, but I did not finish it (in particular, the syntax ⋀t!=l. P t is not supported).
context comm_monoid_add
begin
sublocale bigvee: comm_monoid_set HOL.disj False
defines bigvee = bigvee.F and bigvee' = bigvee.G
by standard auto
abbreviation bigvee'' :: ‹bool set ⇒ bool› ("⋁")
where "⋁ ≡ bigvee (λx. x)"
sublocale bigwedge: comm_monoid_set HOL.conj True
defines bigwedge = bigwedge.F and bigwedge' = bigwedge.G
by standard auto
abbreviation bigwedge'' :: ‹bool set ⇒ bool› ("⋀")
where "⋀ ≡ bigwedge (λx. x)"
end
syntax
"_bigwedge" :: "pttrn ⇒ 'a set ⇒ 'b ⇒ 'b::comm_monoid_add" ("(2⋀(_/∈_)./ _)" [0, 51, 10] 10)
translations ― ‹Beware of argument permutation!›
"⋀i∈A. b" ⇌ "CONST bigwedge (λi. b) A"
syntax
"_bigvee" :: "pttrn ⇒ 'a set ⇒ 'b ⇒ 'b::comm_monoid_add" ("(2⋁(_/∈_)./ _)" [0, 51, 10] 10)
translations ― ‹Beware of argument permutation!›
"⋁i∈A. b" ⇌ "CONST bigvee (λi. b) A"
instantiation bool :: comm_monoid_add
begin
definition zero_bool where
[simp]: ‹zero_bool = False›
definition plus_bool where
[simp]: ‹plus_bool = (∨)›
instance
by standard auto
end
thm bigvee_def
lemma ‹finite A ⟹ (⋁i∈A. f i) ⟷ (∃i ∈ A. f i)›
apply (induction rule: finite_induct)
apply auto
done
lemma ‹finite A ⟹ (⋀i∈A. f i) ⟷ A = {} ∨ (∀i ∈ A. f i)›
apply (induction rule: finite_induct)
apply auto[2]
done
lemma ‹infinite A ⟹ (⋀i∈A. f i) ⟷ True›
by auto
lemma test1:
‹(⋀j∈L. ⋀u∈U. ⋀t∈T. ⋀l∈L. ⋀l⇩1∈L⇩1. ¬P j u t l⇩1) ∨
(⋁i∈I. ⋁v∈V. ⋀k∈K. ⋁h∈H. Q i ∨ k h) ⟹
(⋁i∈I. ⋁v∈V. ⋀k∈K. ⋁h∈H. Q i ∨ k h) ∨ (⋀j∈J. ⋀u∈U. ⋀t∈T. ⋀l⇩1∈L⇩1. ¬P j u t l⇩1)›
apply auto
The full setup is possible. But I am not certain that this is a good idea... You will need a lot of lemmas to make things work nicely and I am not certain the behaviour for infinite sets is the right one.

How to define level tree traversal of a binary tree in isabelle/hol

I have written a level-order traversal of a binary tree in other languages, but I don't know how to express level-order traversal in Isabelle/HOL. Has anyone defined it, or how can it be defined?
In principle, you can do it exactly the same way as in Haskell. The problematic bit is that you have to prove termination of the recursive auxiliary function (what is called tbf in the Haskell code you linked). The easiest way to show this is by finding some sort of measure on the input (a list of trees) that decreases with every recursive call.
I propose the following measure: sum the sizes of all the trees in the list, where the size is the number of all the nodes in the tree (including leaf nodes).
We can use the binary trees from HOL-Library (HOL-Library.Tree). First, we define some auxiliary functions on trees, including our size functions, and prove some facts about them:
primrec tree_values :: "'a tree ⇒ 'a list" where
"tree_values Leaf = []"
| "tree_values (Node l x r) = [x]"
primrec tree_children :: "'a tree ⇒ 'a tree list" where
"tree_children Leaf = []"
| "tree_children (Node l x r) = [l, r]"
primrec tree_size :: "'a tree ⇒ nat" where
"tree_size Leaf = 1"
| "tree_size (Node l x r) = tree_size l + tree_size r + 1"
definition tree_list_size :: "'a tree list ⇒ nat"
where "tree_list_size = sum_list ∘ map tree_size"
lemma tree_size_pos: "tree_size t > 0"
by (induction t) auto
lemma tree_size_nonzero [simp]: "tree_size t ≠ 0"
by (simp add: tree_size_pos)
lemma tree_list_size_children [simp]:
"tree_list_size (tree_children t) = tree_size t - 1"
by (cases t) (auto simp: tree_list_size_def)
Next, we will need another simple lemma on sum_list and concat:
lemma sum_list_concat: "sum_list (concat xs) = sum_list (map sum_list xs)"
by (induction xs) auto
Finally, we can define BFS and prove its termination:
function bfs_aux :: "'a tree list ⇒ 'a list" where
"bfs_aux ts =
(if ts = [] then [] else concat (map tree_values ts) @ bfs_aux (concat (map tree_children ts)))"
by auto
termination
proof (relation "measure tree_list_size")
fix ts :: "'a tree list"
assume ts: "ts ≠ []"
have "tree_list_size (concat (map tree_children ts)) =
sum_list (map (tree_list_size ∘ tree_children) ts)"
by (simp add: map_concat sum_list_concat tree_list_size_def o_assoc)
also from ‹ts ≠ []› have "… < sum_list (map tree_size ts)"
by (intro sum_list_strict_mono) (auto simp: tree_size_pos)
also have "… = tree_list_size ts"
by (simp add: tree_list_size_def)
finally show "(concat (map tree_children ts), ts) ∈ measure tree_list_size"
by simp
qed auto
definition bfs :: "'a tree ⇒ 'a list"
where "bfs t = bfs_aux [t]‹›
And we can test it:
value "bfs (⟨⟨⟨Leaf, ''d'', Leaf⟩, ''b'', ⟨Leaf, ''e'', Leaf⟩⟩, ''a'',
⟨⟨Leaf, ''f'', Leaf⟩, ''c'', ⟨Leaf, ''g'', Leaf⟩⟩⟩)"
> "[''a'', ''b'', ''c'', ''d'', ''e'', ''f'', ''g'']"
:: "char list list"
For more on defining functions with non-trivial recursion patterns like this and proving their termination, see the documentation of the function package (Section 4 in particular).

Combinator logic axioms

I'm carrying out some experiments in theorem proving with combinator logic, which is looking promising, but there's one stumbling block: it has been pointed out that in combinator logic it is true that e.g. I = SKK, but this is not a theorem; it has to be added as an axiom. Does anyone know of a complete list of the axioms that need to be added?
Edit: You can of course prove by hand that I = SKK, but unless I'm missing something, it's not a theorem within the system of combinator logic with equality. That having been said, you can just macro expand I to SKK... but I'm still missing something important. Taking the set of clauses p(X) and ~p(X), which easily resolve to a contradiction in ordinary first-order logic, and converting them to SK, performing substitution and evaluating all calls of S and K, my program generates the following (where I am using ' for Unlambda's backtick):
''eq ''s ''s ''s 'k s ''s ''s 'k s ''s 'k k 'k eq ''s ''s 'k s 'k k 'k k ''s 'k k 'k false 'k true 'k true
It looks like maybe what I need is an appropriate set of rules for handling the partial calls 'k and ''s; I'm just not seeing what those rules should be, and all the literature I can find in this area was written for a target audience of mathematicians, not programmers. I suspect the answer is probably quite simple once you understand it.
Some textbooks define I as a mere alias for ((S K) K). In this case the two are identical (as terms) per definitionem. To prove their equality (as functions), we need only prove that equality is reflexive, which can be achieved by a reflexivity axiom scheme:
Proposition ``E = E'' is deducible (Reflexivity axiom scheme, instantiated for every possible term, denoted here by the metavariable E)
Thus, I suppose in what follows that your question investigates another approach: the combinator I is not defined as a mere alias for the compound term ((S K) K), but is introduced as a standalone basic combinator constant in its own right, whose operational semantics is declared explicitly by the axiom scheme
``(I E) = E'' is deducible (I-axiom scheme)
I suppose your question asks
whether we can deduce formally (remaining inside the system) that such a standalone-defined I behaves exactly as ((S K) K) when used as a function in reductions.
I think we can, but we must resort to stronger tools. I conjecture that the usual axiom schemes are not enough; we also have to declare the extensionality property (equality of functions), and that is the main point. If we want to formalize extensionality as an axiom, we have to augment our object language with free variables.
I think we have to build combinatory logic in such a way that the use of variables is also allowed in the object language. Of course, I mean "just" free variables. Using bound variables would be cheating; we have to remain inside the realm of combinatory logic. Using free variables is not cheating, it is an honest tool. Thus, we can do the formal proof you asked for.
Besides the straightforward equality axioms and rules of inference (transitivity, reflexivity, symmetry, Leibniz rules), we must add an extensionality rule of inference for equality. Here is the point where free variables matter.
In Csörnyei 2007: 157-158, I found the following approach. I think the proof can be done this way.
Some remarks:
Most of the axioms are in fact axiom schemes, consisting of infinitely many axiom instances. The instances must be instantiated for all possible terms E, F, G. Here, I use italics for metavariables.
The seemingly infinite nature of axiom schemes does not raise computability problems, because they can be handled in finite time: our axiom system is recursive. This means that a clever parser can decide in finite time (moreover, very efficiently) whether a given proposition is an instance of an axiom scheme or not. Thus, the usage of axiom schemes raises neither theoretical nor practical problems.
Now let us see our framework:
Language
ALPHABET
Constants: The following three are called constants: K, S, I.
I added the constant I only because your question presupposes that we have not defined the combinator I as a mere alias/macro for the compound term S K K, but as a standalone constant in its own right.
I shall denote constants by boldface roman capitals.
Sign of application: A sign # for ``application'' would be enough (prefix notation with arity 2). As syntactic sugar, I use parentheses here instead of an explicit application sign: I shall write both the opening ( and the closing ) explicitly.
Variables: Although combinator logic makes no use of bound variables, scope, etc., we can introduce free variables. I suspect they are not only syntactic sugar; they can strengthen the deduction system, too. I conjecture that your question requires their usage. Any countably infinite set (disjoint from the constants and the parenthesis signs) will serve as the alphabet of variables; I will denote them here with unformatted roman lowercase letters x, y, z...
TERMS
Terms are defined inductively:
Any constant is a term
Any variable is a term
If E is a term, and F is a term too, then also (E F) is a term
I sometimes use practical conventions as syntactic sugar, e.g. write
E F G H
instead of
(((E F) G) H).
Deduction
Conversion axiom schemes:
``K E F = E'' is deducible (K-axiom scheme)
``S F G H = F H (G H)'' is deducible (S-axiom scheme)
``I E = E'' is deducible (I-axiom scheme)
I added the third conversion axiom (I-axiom scheme) only because your question presupposes that we have not defined the combinator I as an alias/macro for S K K.
Equality axiom schemes and rules of inference
``E = E'' is deducible (Reflexivity axiom)
If "E = F" is deducible, then "F = E" is also deducible (Symmetry rule of inference)
If "E = F" is deducible, and "F = G" is deducible too, then also "E = G" is reducible (Transitivity rule)
If "E = F" is deducible, then "E G = F G" is also deducible (Leibniz rule I)
If "E = F" is deducible, then "G E = G F" is also deducible (Leibniz rule II)
Question
Now let us investigate your question. I conjecture that the deduction system defined so far is not strong enough to prove your proposition.
Is proposition "I = S K K" deducible?
The problem is that we have to prove the equivalence of functions. We regard two functions as equivalent if they behave the same way. Functions act by being applied to arguments. We should prove that both functions act the same way when applied to any possible argument. Again, the problem of infinity! I suspect axiom schemes cannot help us here. Something like
If E F = G F is deducible, then also E = G is deducible
would fail to do the job: we can see that this does not yield what we want. Using it, we can prove that
``I E = S K K E'' is deducible
for each term instance E, but these results are only separate instances, and they cannot be used as a whole for further deductions. We have only concrete results (infinitely many), without being able to summarize them:
it holds for E := K
it holds for E := S
it holds for E := K K
...
we cannot summarize these fragmented result instances into a single great result stating extensionality! We cannot pour these low-value fragments into the funnel of a rule of inference that would melt them together into a single, more valuable result.
We have to augment the power of our deduction system. We have to find a formal tool that can grasp the problem. Your question leads to extensionality, and I think declaring extensionality requires the ability to pose propositions that hold for arbitrary instances. That is why I think we must allow free variables inside our object language. I conjecture that the following additional rule of inference will do the work:
If variable x is not part of terms neither E nor F, and statement (E x) = (F x) is deducible, then E = F is also deducible (Extensionality rule of inference)
The hard thing in this rule, easily leading to confusion: x is an object variable, a fully emancipated and respected part of our object language, while E and F are metavariables, not parts of the object language, used only for a concise notation of axiom schemes.
(Remark: More precisely, the extensionality rule of inference should be formalized in a more careful way, introducing a metavariable x ranging over all possible object variables x, y, z..., and another kind of metavariable E ranging over all possible term instances. But this distinction between the two kinds of metavariables plus the object variables is not so didactic here, and it does not affect your question too much.)
Proof
Let us now prove the proposition ``I = S K K''.
Steps for left-hand side:
Proposition ``I x = x'' is an instance of the I-axiom scheme with instantiation [E := x], thus it is deducible
Steps for right-hand side:
Proposition "S K K x = K x (K x)" is an instance of S-axiom scheme with instantiations [E := K, F := K, G := x], thus it is deducible
Proposition "K x (K x) = x" is an instance of K-axiom scheme with instantiations [E := x, F := K x], thus it is deducible
Transitivity of equality:
Statement "S K K x = K x (K x)" matches the first premise of transitivity rule of inference, and statement "K x (K x) = x" matches the second premise of this rule of inference. The instantiations are [E := S K K x, F := K x (K x), G = x]. Thus the conclusion holds too: E = G. Rewriting the conclusion with the same instantiations, we get statement "S K K x = x", thus, this is deducible.
Symmetry of equality:
Using "S K K x = x", we can infer "x = S K K x"
Transitivity of equality:
Using "I x = x" and "x = S K K x", we can infer "I x = S K K x"
Now we have paved the way for the crucial point:
Proposition "I x = S K K x" matches with the first premise of Extension rule of inference: (E x) = (F x), with instantiations [E := I, F := S K K]. Thus the conclusion must also hold, that is, "E = F" with the same instantiations ([E := I, F := S K K]), yielding proposition "I = S K K", quod erat demonstrandum.
Csörnyei, Zoltán (2007): Lambda-kalkulus. A funkcionális programozás alapjai. Budapest: Typotex. ISBN 978-963-9664-46-3.
You don't need to define I as an axiom. Start with the following:
I.x = x
K.x y = x
S.x y z = x z (y z)
Since (S K anything) x reduces to x for every argument x, S K anything is an identity function, just like I.
So, I = S K K and I = S K S. No need to define I as an axiom; you can define it as syntactic sugar which aliases S K K.
The definitions of S and K are your only axioms.
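In terms of the Isabelle sketch above, this amounts to dropping the constant I together with its axiom and introducing the alias as an abbreviation (I' is my own name, chosen to avoid a clash with the constructor I):
abbreviation I' :: cl where "I' ≡ S · K · K"
lemma "I' · x ≈ x"
by (meson ax_K ax_S eq_trans)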
The usual axioms are complete for beta equality, but do not give eta equality. Curry found a set of about thirty axioms to add to the usual ones to get completeness for beta-eta equality. They are listed in Hindley & Seldin's Introduction to Combinators and λ-Calculus.
Roger Hindley's Curry's Last Problem lists some additional desiderata we might want from mappings between the lambda calculus and combinatory logic, and notes that we do not have mappings that satisfy all of them. You likely won't care much about all of the criteria, though.
