8. Higher-Order Logic with Isabelle#
Author: Spencer Au
8.1. Idea and Introduction#
Topics
Higher-order logic foundations
Function types and quantification
Isabelle/HOL advanced features
Theorem proving strategies
Industrial theorem proving
This chapter extends to higher-order logic in Isabelle/HOL, showing how to reason about functions as first-class objects and exploring the foundations of modern theorem proving.
Higher-order logic (HOL) extends the capabilities of first-order logic (FOL) by allowing quantifiers (such as \( \forall x \) and \( \exists x \)) to range over functions and predicates, not just individual elements. In addition, functions can be passed as arguments to other functions, returned as results, and manipulated just like any other data type. This enables more expressive statements about the properties of functions and their relationships.
8.2. Basic Theory#
8.2.1. From First-Order Logic to Higher-Order Logic#
First-order logic (FOL) restricts quantification to individual elements of a domain, without quantifying over predicates or functions. Second-order logic extends FOL by allowing quantification over relations, sets, and functions of that domain. Higher-order logic (HOL) generalizes this extension by allowing quantification over predicates and functions of arbitrary finite types, not just individuals or sets.[7]
8.2.2. The HOL Type System#
In Isabelle/HOL, the underlying higher-order logic is built upon Church’s simple type theory[18], which is a disciplined type system in which every term is assigned a type and compound types are formed using function type constructors. This simple type theory extends first-order logic by classifying individuals, predicates, and functions into an explicit hierarchy of types, enabling quantification over functions and predicates while avoiding the semantic paradoxes of untyped systems, such as an expression that may refer to itself in a problematic way. In HOL, the type system both guarantees well-formed expressions and organizes how functions of any finite type can be combined and applied.[8] Church’s type theory is widely adopted in proof assistants implementing HOL, such as Isabelle/HOL, and offers a foundation for formalizing mathematics and computations.[2]
For further resources about type theory, SEP provides an excellent article on Type Theory[9]. There is also a solid YouTube video for those that would prefer a different medium.[32]
8.2.3. Lambda (\(\lambda\)) Abstraction#
In higher-order logic, \(\lambda\)-abstraction provides a formal way to define anonymous functions by binding variables in expressions. We write this as \(\lambda x.\; t\), where x is a parameter and t is the body of the function. A \(\lambda\)-term defines a function by binding a variable in an expression, and \(\beta\)-reduction is the rule that implements function application. Basically, when you apply a \(\lambda\)-term to an argument, you substitute that argument for the bound variable in the function body. For example, the abstraction \(\lambda x. \; x * x\) represents the square function, and applying it to an arbitrary number \(a\) yields \(a * a\) by replacing \(x\) with \(a\).[10] [11]
In Isabelle/HOL, \(\lambda\)-abstraction is part of the underlying simply typed \(\lambda\)-calculus, where terms consist of variables, constants, applications, and abstractions. Applying a \(\lambda\)-term to an argument substitutes the argument for the bound variable in the body, and Isabelle treats the original application and its \(\beta\)-reduced form as equivalent. This mechanism makes functions first-class citizens in HOL, supports the construction of higher-order functions, and enables logical constructs to be expressed uniformly.[2]
For supplementary resources about \(\lambda\)-abstraction and \(\lambda\)-calculus, SEP again provides some excellent articles.[12] [13] There are also easy to grasp YouTube videos by LigerLearn.[30] [31]
8.2.4. Logical Constants in HOL#
In higher-order logic, the logical constants are the primitive symbols that express the core logical operations and quantification. They include propositional connectives such as:
Conjunction (\(\land\))
Disjunction (\(\lor\))
Implication (\(\to\))
Negation (\(\neg\))
Equality (\(=\))
as well as quantifiers like:
Universal (\(\forall\))
Existential (\(\exists\))
which operate over objects of any type. In Isabelle/HOL, the full syntax of terms and constants is defined by the grammar of HOL and is treated uniformly alongside function application and \(\lambda\)-abstraction. This vocabulary is the basis on which more complex formulas are built and manipulated in proof development.[2]
8.2.5. Deductive Core of HOL#
The deductive core of higher-order logic consists of the fundamental inference rules for logical connectives, quantifiers, and equality, including the basic introduction and elimination rules that govern valid reasoning in HOL. In Isabelle/HOL, these primitive rules form the logic’s core inference system, and every derived theorem is justified by a sequence of such sound rule applications. Higher level proof methods and tactics ultimately based on these basic rules make structured proofs possible in practice. Concrete presentations of these inference rules and how they are used in proofs can be found in the Isabelle/HOL Proof Assistant Manual.[3]
8.3. Isabelle/HOL#
8.3.1. Installation#
Installation Link for Isabelle
Isabelle processes the theory document incrementally in the background, meaning that there is no additional “compile” or “run” step. As you type in lines and commands, the background system will check them and automatically update the proof state and any relevant error reports. Just go line by line, or command by command in order to check the related output.
Isabelle uses *.thy (theory) files, and the general structure of the theory files is like this:
theory THEORY_NAME
imports OTHER_THEORIES
begin
THEORY_BODY
end
theory THEORY_NAME declares the theory name and must exactly match the .thy filename. For example, if the file is called test.thy, then you would have theory test
imports OTHER_THEORIES tells it to import another theory; for example, one popular theory to import is Main.thy, which includes support for natural numbers, lists, and basic arithmetic
begin is the entry point into the theory body, while end is the end point of the theory body
More detailed instructions on setup and initial use can be found in Chapters 1 and 2.1.2 of Concrete Semantics[1]
8.3.2. Proof Solving via Sledgehammer#
Sledgehammer is an automated theorem prover orchestrator. It dispatches proof obligations to external provers such as Vampire via command line and returns candidate proofs that can be directly applied. Basically, it find proofs by “hammering away” at different sub-goals that would typically be tedious to construct manually. In the Isabelle GUI, Sledgehammer can be accessed by clicking on the sledgehammer tab on the bottom left corner of the application: select the target sub-goal, invoke Sledgehammer via apply, and then apply a suggested proof if found. Although Concrete Semantics 4.3.1[1] introduces the basic sledgehammer command, the GUI method is generally more convenient. The official Isabelle Sledgehammer documentation page[35] provides further detail.
8.3.3. First Example - Add Function#
For the first example, the goal will be to implement recursive addition via a function called “add”. The general form should be:
fun add :: "nat ⇒ nat ⇒ nat" where
"base case"
"recursive case"
where two natural numbers are added, and the result is a natural number that is the sum of the two input numbers. For example, sum m n should return the sum of m and n.
Show Answer
fun add :: "nat ⇒ nat ⇒ nat" where
"add 0 n = n" |
"add (Suc m) n = Suc (add m n)"
For a detailed explanation, see Section 8.3.2 Exercises
8.3.4. First Exercise - Associativity and Commutativity of Add#
For the first exercise, we will be proving the associative and commutative properties of a custom add function in Isabelle. This exercise comes from Exercise 2.2 of Concrete Semantics Exercises
The custom add function is defined here:
Show Answer
fun add :: "nat => nat => nat" where
"add 0 n = n" |
"add (Suc m) n = Suc(add m n)"
We use 2 lemmas to prove each part of the exercise, namely:
lemma add_assoc: "add (add m n) p = add m (add n p)"
lemma add_comm: "add m n = add n m"
The full lemma proof for associativity is here:
Show Answer
lemma add_assoc: "add (add m n) p = add m (add n p)"
proof (induction m)
case 0
show ?case by simp
next
case (Suc m)
show ?case by (simp add: Suc.IH)
qed
For a detailed explanation, see Section 8.3.3 Associative Property
Next, to prove the commutative property, we will first prove 2 helper lemmas:
lemma add_0_right: "add m 0 = m"
lemma add_Suc_right: "add m (Suc n) = Suc (add m n)"
The helper lemma proofs for commutativity
Show Answer
lemma add_0_right: "add m 0 = m"
proof (induction m)
case 0
show ?case by simp
next
case (Suc m)
show ?case by (simp add: Suc.IH)
qed
lemma add_Suc_right: "add m (Suc n) = Suc (add m n)"
proof (induction m)
case 0
show ?case by simp
next
case (Suc m)
show ?case by (simp add: Suc.IH)
qed
With the helper lemmas proven, we will use them to prove the commutative property
Show Answer
lemma add_comm: "add m n = add n m"
proof (induction m)
case 0
show ?case
by (simp add: add_0_right)
next
case (Suc m)
show ?case
by (simp add: Suc.IH add_Suc_right)
qed
For a detailed explanation, see Section 8.3.3 Commutative Property
The Isabelle .thy file for this exercise is located here
This form is an Isar style proof. The following example will be an older style tactic based proof.
8.4. Introductory Example - Flattening and Length Invariant#
The List Flattening and Length Preservation exercise is adjusted from this exercise. First try the exercise yourself, and if stuck, reference the document linked.
8.4.1. List Flattening#
Define a recursive function fun flatten :: 'a list list => 'a list that takes a list of lists as input, flattens it, and returns that flattened list. Flattening is simply turning a multi-dimensional list into a single list by concatenating inner lists. For example, if we have \([[1,2], [3,4], [5]]\), then flattening would yield \([1,2,3,4,5]\).
For the base case, simply check if the input list is empty, and if so, return an empty list. For the recursive case, match a non-empty list with (xs # xss) and concatenate xs (using @) with the result of flattening xss.
Show Answer
fun flatten :: "'a list list => 'a list" where
"flatten [] = []" |
"flatten (xs # xss) = xs @ flatten xss"
Use a quick lemma with simp to confirm that the function works:
lemma "flatten [[1::nat, 2], [3,4], [5,6], [7]] = [1,2,3,4,5,6,7]"
apply simp
done
If there are no errors, then the flatten function is implemented correctly.
8.4.2. Length Invariant#
The next step is to prove that the flatten function implemented is length invariant, meaning that the operation doesn’t create or lose elements, but instead only performs simple concatenation.
Define a lemma called lemma length_flatten:, and check that for all xss, the length of flatten(xss) equals the sum of the lengths of the lists inside xss using.
lemma length_flatten:
"length (flatten xss) = sum_list (map length xss)"
Since this proof is relatively simple, a tactic based approach would be ideal. Set up the tactic based proof by induction by using apply (induct xss)
There should now be 2 sub-goals, and from there, simply leverage the Sledgehammer tool to solve proof obligations.
The full proof is here. Isabelle will recognize the lemma as a theorem when all proof obligations have been satisfied.
Show Answer
lemma length_flatten:
"length (flatten xss) = sum_list (map length xss)"
apply (induct xss)
apply simp
by simp
The Isabelle .thy file for this exercise is located here
In addition to this exercise, there is also an introduction exercise to syllogistic logic[34] in Isabelle by Alexander Kurz linked here.
8.5. The Landscape of Tools#
8.5.1. Interactive Theorem Provers#
HOL4 is part of the “HOL” family of interactive theorem provers, using classical higher-order logic and following the LCF approach to ensure soundness. Developed by Michael J. C. Gordon, the system is implemented in ML, and is a direct descendent of the original HOL88 system. Because HOL4 shares the same underlying logic as Isabelle/HOL, many theories and proof patterns are generally portable between the two tools.
Rocq (formerly Coq) is an interactive theorem prover based on the Calculus of Inductive Constructions (CIC), which is dependently typed \(\lambda\)-calculus that extends the Calculus of Constructions with inductive types. This extension allow propositions to be represented as types and proofs as programs, enabling highly expressive specifications. Rocq is mainly implemented in OCaml with some C, and it’s dependent type theory allows it to have greater expressive power over Isabelle/HOL.
Lean (4) is another proof assistant that follows a similar but more modern variant of dependent type theory. Like Rocq, Lean also extends the Calculus of Constructions with inductive types. Lean 4 in particular is mostly implemented in Lean (with some C++), and can have its Lean theorem prover produce C code. Like Rocq, Lean 4’s dependent type theory also supports greater expressive power compared to Isabelle/HOL.
8.5.2. Automated Theorem Provers#
Leo III is an automated theorem prover for classical higher-order logic that supports all common TPTP input dialects and is based on paramodulation calculus with ordering constraints for reasoning. Leo III is written in Scala and runs on the JVM (Java Virtual Machine). Compared with ITPs (interactive theorem provers) like Isabelle/HOL, Leo III trades human-guided proof structuring and granular control for full automation, allowing it to rapidly discharge proof obligations.[28]
Satallax is another automated theorem prover for classical higher-order logic and is based on Church’s simple type theory with extensionality and choice operators. It is implemented in OCaml and uses the SAT solver MiniSat for its proof search. Basically, Satallax generates propositional clauses corresponding to the rules of a complete tableau calculus and calls MiniSat periodically to test the satisfiability of these clauses.[16]
8.5.3. Programming Languages#
F* is a dependently typed programming language that integrates program development with proof construction in a single system. It’s type system lets you encode precise specifications and it uses SMT solvers to automatically discharge proof obligations while still supporting interactive proof guidance when necessary, similar to Isabelle and its Sledgehammer tool. More information about F* with a web-based tutorial can be found here.
8.6. Algorithms#
8.6.1. Matching and Unification#
Matching and unification are core algorithms in Isabelle/HOL that let the system find substitutions for schematic variables so that rules, lemmas, or patterns can be applied to the current proof goal. Introduced in Section 5.8 in A Proof Assistant for Higher-Order Logic[3], these processes are essential because they enable Isabelle/HOL to automatically instantiate facts and guide proof search without significant manual effort. In Isabelle/HOL, matching often uses higher-order pattern unification because terms can include \(\lambda\)-abstractions and variables that represent functions, meaning that the algorithm must respect equivalence under various \(\lambda\)-calculus conversion rules (such as \(\beta\)-reduction) rather than just syntactic/text-based equality. Basically, higher-order unification must account for the way function application and variable binding behave in the typed \(\lambda\)-calculus that underlies higher-order logic. In practice, this allows Isabelle/HOL handle the richer structure of higher-order logic, such as functions and bound variables, and apply the resulting substitutions directly in tactics and inference steps during proof construction.
8.6.2. Simplification & Rewriting#
Term simplification, introduced in Section 3.1 in A Proof Assistant for Higher-Order Logic[3] is a core component of Isabelle/HOL’s rewriting infrastructure. It repeatedly applies equational theorems (theorem that states an equality between two terms) as rewrite rules to transform terms into simpler forms, with these rules typically marked by the simp attribute so that the simplifier can use them automatically in tactics such as simp, auto, and related methods. The simplifier underpins much of Isabelle’s automation by reducing goal terms and sub-goals before other tactics run, enabling the construction and success of more complex proofs without extensive manual effort. The Isabelle/HOL simplifier also supports conditional rewriting, where rewrite rules are only applied if their premise conditions can be satisfied, increasing flexibility while still ensuring correctness during proof search. This conditional behavior and the ability to tune which rules are active give users granular control over how goals are simplified when developing formal proofs and help automation focus on relevant proof progress rather than unproductive rewrites.
8.6.3. Proof Search and External Automation#
Proof search and external automation serve as the foundation for Isabelle/HOL’s practical automation layer, combining built-in search procedures with calls to external solvers. Isabelle’s classical reasoner, introduced in Section 5.12 in A Proof Assistant for Higher-Order Logic[3], is family of tools that perform proof search automatically by leveraging search and backtracking. While the blast method is “the main workhorse” for the classical reasoner, Section 5.13[3] also talks about clarify, clarsimp, force, auto, fast, and best. This is a general summary provided by the document of the different classical reasoning methods:
blast works automatically and is the fastest
clarify and clarsimp perform obvious steps without splitting the goal; safe even splits goals
force uses classical reasoning and simplification to prove a goal; auto is similar but leaves what it cannot prove
fast and best are legacy methods that work well with rules involving unusual features
External automation is typically invoked via the Sledgehammer tool. Sledgehammer works by selecting a set of relevant facts from the current proof context, translates it to a format appropriate for ATPs/SMTs, invokes the tools, and then returns the proof text in an appropriate Isabelle proof format (oftentimes via metis). This method helps preserve trust given that Sledgehammer relies on tools that are outside of the Isabelle/HOL ecosystem. More information about using the tool can be found in Section 8.3.2, and the official Sledgehammer tool documentation page[35] provides a breadth of detailed, supplementary information.
8.7. Typical Use Cases#
Isabelle/HOL is widely used for formal specification and rigorous proof of both mathematical theorems and system properties. It supports expressing precise logical models and verifying them interactively with automation (via the Sledgehammer tool), making it suitable for a variety of tasks such as formalizing algorithms, verifying software and hardware correctness, and reasoning about programming language semantics and protocols. Isabelle/HOL has been used in many different verification projects, such as through the large body of libraries and developments collected in the Archive of Formal Proofs (AFP)[27] and as evidenced by the examples in Section 8.9.
8.8. Benchmarks and Competitions#
miniF2F is a cross-system benchmark of 488 problem statements drawn from both mathematical competitions (IME, AMC and the International Mathematical Olympiad) and high school and undergraduate level mathematics courses. The benchmark targets theorem provers such as Lean, Metamath, Isabelle, and HOL Light in order to enable cross-system comparison of theorem provers and proof automation tools. The goal is to serve as a benchmark for automated and neural theorem proving systems. A formal problem statement is fed into system and the prover must output a fully machine-verifiable proof.[29]
TPTP (Thousands of Problems for Theorem Provers) is a library of test problems for testing and evaluating ATPs. Problems are expressed in a simple text-based format for either first-order logic or higher-order logic. TPTP provides a common benchmark with a single, unambiguous reference set of problems so that different ATP systems can be both evaluated and compared with reproducible results.[36]
CASC (The CADE ATP System Competition) is an annual competition of fully automatic, classical logic, ATP systems. The purpose of CASC is to provide a public evaluation of relative capabilities of ATP systems as well as to stimulate research and development of ATP systems. At CASC, ATP system performance is evaluated in terms of the total number of problems solved with an acceptable solution output within a specified time limit, as well as the average time taken for problems solved. CASC is hosted at each CADE and IJCAR conference, both forums for automated deduction.[37]
8.9. Applications in Industry and Academia#
In general, Isabelle/HOL appears to have a wide variety of application throughout industry and academia due to the fact that it provides a mathematical assurance of correctness (rather than testing alone). The tool is particularly suited to safety critical systems, such as avionics, embedded systems, industrial process control, SoC design, etc where fault risk must be minimized and certification standards demand high trust.
8.9.1. Physical Addressing on Real Hardware#
Achermann et al.[19] discuss how to formally model and verify physical memory address translation and remapping hardware (such as MMUs) in SoCs (Systems-on-Chip) using Isabelle/HOL. Specifically, they developed a hardware model that encodes translation units and then prove that standard memory operations preserve system invariants.
8.9.2. FOCUS - Stream Processing Components#
Spichkova[20] introduces FOCUS, a framework for formal specification and refinement-based verification of interactive systems. Based on stream-processing semantics that model communication histories over directed channels, FOCUS is supported by Isabelle/HOL using the Isar proof language. FOCUS is evaluated on three case studies:
a steam boiler control system modelled as a distributed real-time system with proofs that water levels and pump actuations satisfy safety and timing constraints
the FlexRay automotive communication protocol, where FOCUS verifies correct static schedules, channel behavior, and broadcast properties for safety-critical embedded communication
an Automotive-Gateway system from the Verisoft project, formally specified and refined with guarantees that crash signals trigger correct emergency-service calls and satisfy required data-handling properties.
8.9.3. IsaBIL - Verifying (In)Correctness of Binaries#
Griffin et al.[21] present IsaBIL, a binary analysis framework in Isabelle/HOL that is based on BAP (Binary Analysis Platform). IsaBIL formalizes BAP’s intermediate language (BIL) and integrates it with Hoare Logic (for proofs of correctness) and incorrectness logic (for proofs of incorrectness). While there is a primary focus is on the RISC-V architecture and C binaries, the authors assert that IsaBIL is a flexible framework that can verify binaries for different languages (C, C++, Rust), toolchains (LLVM, Ghidra), and architectures (x86, RISC-V). The authors prove correctness through some industry level examples such as Lockheed Martin’s JSF (Joint Strike Fighter) coding standards and the MITRE ATT&CK database.
8.9.4. VerIso - Isolation Guarantees of Database Transactions#
Ghasemirad et al.[22] present VerIso, a rigorous Isabelle/HOL-based framework for verifying transaction isolation within production level databases. The authors showcase VerIso by modelling the strict two-phase locking (S2PL) protocol and prove that it provides strict serializability (transactions behave as if executed sequentially and that sequential order must match the actual time order of their invocation/commit). In addition, VerIso’s parameterized architecture supports multiple isolation levels and uncovers design-level bugs in protocols such as the TAPIR protocol and its violation of atomic visibility.
8.9.5. IEEE 754 Floating Point Implementation for MDPs#
Kohlen et al.[23] present a fully verified implementation of the Interval Iteration (II) Algorithm for Markov Decision Processes (MDPs). They model the II algorithm in Isabelle/HOL, use the Isabelle Refinement Framework (IRF) to carry out step-wise refinement down to LLVM bytecode, and extend Isabelle/HOL’s reasoning to support IEEE 754 floating-point arithmetic with directed rounding. Their result is a correct-by-construction floating-point implementation, competitive with industry tools and highly relevant for domains where verified numerical correctness matters, such as embedded systems, SoCs and safety-critical software.
8.9.6. Autoformalization with Large Language Models#
Wu et al.[24] show that large language models are particularly effective at performing autoformalization, which is the process of automatically translating natural language mathematics into formal specifications and proofs. Specifically, they note that 25.3% of mathematical competition problems were translated perfectly to formal Isabelle/HOL statements. In addition, by using these autoformalized statements to fine-tune an existing neural theorem prover, they managed to improve achieve a 35.2% proof rate on Mini2F2, compared to a baseline proof rate of 29.6%.
8.10. Case Study - Autoformalization with LLMs#
For the case study, we will be playing around with autoformalization with a locally hosted LLM. Neither the provided artifacts Wu et al.[24] nor Xu et al.[25] provide model checkpoints or some kind of a modelfile, but instead simply detail the methods the authors had used to fine-tune the model. As a result, for this example, we will simply be using a baseline model along with some examples from Wu et al.[24] in order to produce valid Isabelle/HOL proofs from natural language input. Also, its important to keep in mind that the 3 JSON proof examples provided by Wu et al.[24] use certain keywords like fixes and shows unlike the examples we worked through together above.
Natural language statement 1: Prove that for any functions f and g and any list xs, mapping f over the result of mapping g over xs is the same as mapping the composition (f ∘ g) over xs.
OUTPUT:
theorem
fixes f :: "'b ⇒ 'c" and g :: "'a ⇒ 'b" and xs :: "'a list"
shows "map f (map g xs) = map (f ∘ g) xs"`
Natural language statement 2: Define a custom recursive addition function add on natural numbers (using 0 and Suc). Then prove that add is both associative and commutative.
OUTPUT:
fun add :: "nat ⇒ nat ⇒ nat" where
"add 0 n = n" |
"add (Suc m) n = Suc (add m n)"
theorem add_assoc:
fixes m n p :: nat
shows "add (add m n) p = add m (add n p)"
theorem add_comm:
fixes m n :: nat
shows "add m n = add n m"
Natural language statement 3:
Define a custom recursive function flatten that concatenates a list of lists into a single list, then prove the length-invariance property: for any list of lists xss, length (flatten xss) equals the sum of the lengths of the inner lists (i.e., sum_list (map length xss)).
OUTPUT:
fun flatten :: "'a list list ⇒ 'a list" where
"flatten [] = []" |
"flatten (x # xs) = x @ flatten xs"
theorem flatten_length_invariance:
fixes xss :: "'a list list"
shows "length (flatten xss) = sum_list (map length xss)"
The code is located here.
Currently, the API endpoint for Ollama is pointed towards the Chapman DGX0 Compute Cluster, and runs a Qwen3:30B model.
8.11. History#
8.11.1. Origins of Higher-Order Logic#
Alonzo Church’s work in the 1930s (via \(\lambda\)-calculus)[39] and 1940s (via type theory)[40] and Leon Henkin’s work in the 1950s (on general model/Henkin semantics)[7] laid the foundation for higher-order logic. From their contributions arose an extension of First-Order Logic (FOL) that allows quantification over predicates and functions, enabling reasoning about functions as first class entities.
In the 1970s, Robert Milner developed LCF (Logic for Computable Functions)[38] at Stanford and later Edinburgh, introducing the idea of an interactive theorem prover. LCF pioneers the use of a tactic-based proof automation and the ML meta language, which is designed to let users safely define proof strategies. ML later evolves into OCaml and Standard ML.
In the early 1980s, Michael J. C. Gordon built upon LCF in order to create the HOL system[38], which explicitly uses higher-order logic as its core formalism. This HOL system would become the foundation for hardware verification, paving the way and influencing later provers like HOL4 and Isabelle/HOL.
8.11.2. Development of Isabelle/HOL#
Developed in the late 1980’s by Lawrence C. Paulson at Cambridge, Isabelle was created as a generic theorem proving framework capable of supporting multiple logical formalisms under a single meta-logic.[26]
The HOL instantiation of Isabelle (Isabelle/HOL) became the most widely adopted version due to its strong expressiveness and balance between automation and manual control.[26]
Markus Wenzel proposed and developed the Isar Proof Language[17] for Isabelle between 1998 and 2001. Isar allows for more structured and human readable proofs, improving clarity over traditional tactic-based approaches like in LCF.
The creation of the Archive of Formal Proofs (AFP) in 2004 established a large, community-driven library of formalized mathematics and computer science. The establishment of the AFP solidified Isabelle/HOL’s role in both academia and industry.[27]
The integration of tools such as Sledgehammer and external SMT/ATP solvers in 2007 further extend Isabelle/HOL’s proof power. The bridging of interactive reasoning and automation allows user to tackle complex goals with minimal manual effort.[35]
8.12. Current Developments#
Active research is increasingly integrating generative and large language models (LLMs) with interactive theorem proving to reduce manual proof construction. This emerging paradigm, termed Neural Theorem Proving (NTP), augments classic automatic theorem proving (ATP) by integrating generative AI with symbolic proof assistants like Isabelle/HOL to enhance automation, guide proof search, and address complex formal reasoning tasks. Recent work by Xu et al. introduces MiniLang/IsaMini[25], an intermediate proof language designed to improve LLM performance in Isabelle/HOL and significantly boost success rates on standard proof benchmarks compared to generating Isar scripts directly. NTP and hybrid AI frameworks are also being developed to generate and structure complete formal proofs, fine-tune models for syntactically correct proof output, and integrate verification backends to reduce errors and improve correctness in formal verification tasks. These efforts align with broader trends in formal methods research that explore how LLM reasoning can be combined with rigorous symbolic verification to enhance automation, benchmark performance, and increase formal verification scalability.
8.12.1. Research Challenges#
Despite ongoing advances, substantial research challenges remain in Isabelle/HOL’s ecosystem, particularly at the intersection of formal verification and generative AI. Interactive theorem proving in Isabelle/HOL still depends heavily on human guidance through structured Isar proofs, making large-scale developments both labor-intensive and error-prone. The integration of LLM-based techniques introduces additional difficulties, such as translating model output into Isabelle’s strict proof language, mitigating hallucinations and syntactic errors, and obtaining sufficiently large, high-quality datasets of formal proofs for training. While intermediate languages such as MiniLang/IsaMini[25] can significantly improve LLM performance, they also highlight the sensitivity of proof success to representation choices and data quality. Moreover, bridging the gap between informal specifications and fully formalized Isabelle theories, commonly termed autoformalization[24], remains a major obstacle, as current LLMs struggle to reliably produce complete, semantically correct formalizations without human intervention.
8.12.2. Conferences and Workshops#
International Conference on Interactive Theorem Proving (ITP) - dedicated to interactive theorem proving and related topics, ranging from theoretical foundations to implementation aspects and applications in program verification, security, and formalization of mathematics.
Isabelle Workshop - a workshop focused on Isabelle where users and developers can present tool developments, formalization projects, and ongoing research papers.
International Joint Conference on Automated Reasoning (IJCAR) - international conference covering automated and interactive reasoning which merges several leading conferences and events, such as the Conference on Automated Deduction (CADE), TABLEAUX, and FroCoS, into a single forum covering foundations, tools, and applications in automated reasoning.
Archive of Formal Proofs (AFP) - while not an explicit conference, the AFP functions like a curated repository and journal for Isabelle/HOL proofs, hosting a large collection of mechanically checked proof libraries and scientific developments that are often associated with conference or journal publications.
8.13. References#
8.13.1. eBooks and Textbooks#
- [1]: Nipkow and Klein (2014) Concrete Semantics: With Isabelle/HOL, Springer Publishing Company, Incorporated.
- [2]: Nipkow, Paulson, and Wenzel (2009) Isabelle’s Logics: HOL, Isabelle2009 Documentation.
-
[3]: Nipkow, Paulson, and Wenzel (2025)
Isabelle/HOL: A Proof Assistant for Higher-Order Logic, Springer-Verlag.
- [4]: Peter B. Andrews (2002) Introduction to Mathematical Logic and Type Theory: To Truth through Proof (2nd. ed.), Kluwer Academic Publishers, USA.
-
[5]: Laurence C. Paulson (1987)
Logic and Computation: Interactive Proof with Cambridge LCF, Cambridge University Press, USA.
- [6]: Gunnar Teege (2025) A Gentle Introduction to Isabelle and Isabelle/HOL, Universität der Bundeswehr München.
8.13.2. Online Articles#
-
[7]: Jouko Väänänen (2024)
Second-order and Higher-order Logic, The Stanford Encyclopedia of Philosophy, Metaphysics Research Lab, Stanford University
-
[8]: Benzmüller and Andrews (2025)
Church’s Type Theory, The Stanford Encyclopedia of Philosophy, Metaphysics Research Lab, Stanford University.
-
[9]: Thierry Coquand (2022)
Type Theory, The Stanford Encyclopedia of Philosophy, Metaphysics Research Lab, Stanford University
-
[10]: Alexander Kurz (2023)
Semantics of the Lambda Calculus, CPSC 354 Programming Languages, Chapman University.
-
[11]: Alexander Kurz (2023)
Syntax of Lambda Calculus, CPSC 354 Programming Languages, Chapman University.
- [12]: Deutsch and Marshall (2025) Supplement D: The λ-Calculus and Type Theory, The Stanford Encyclopedia of Philosophy, Metaphysics Research Lab, Stanford University.
-
[39]: Deutsch and Marshall (2025)
D.1 Church’s Lambda Calculus, The Stanford Encyclopedia of Philosophy, Metaphysics Research Lab, Stanford University.
-
[40]: Deutsch and Marshall (2025)
D.2 Church’s Simple Theory of Types, The Stanford Encyclopedia of Philosophy, Metaphysics Research Lab, Stanford University.
-
[13]: Alama and Korbmacher (2023)
The Lambda Calculus, The Stanford Encyclopedia of Philosophy, Metaphysics Research Lab, Stanford University.
-
[14]: Jean-Pierre Marquis (2023)
Category Theory, The Stanford Encyclopedia of Philosophy, Metaphysics Research Lab, Stanford University
-
[15]: Frederic Portoraro (2025)
Automated Reasoning: Section 3.1 on Higher-Order Logic, The Stanford Encyclopedia of Philosophy, Metaphysics Research Lab, Stanford University.
- [16]: Chad E. Brown (2012) Satallax: An Automatic Higher-Order Prover, Saarland University, Saarbr¨ucken, Germany.
- [17]: Makarius (Markus) Wenzel (1999) Isar — a Generic Interpretative Approach to Readable Formal Proof Documents, Theorem Proving in Higher Order Logics (TPHOLs 1999), volume 1690 of Lecture Notes in Computer Science. Springer-Verlag.
8.13.3. Research Papers#
- [18]: Alonzo Church (1940) A Formulation of the Simple Theory of Types, The Journal of Symbolic Logic 5(2): 56–68.
- [19]: Achermann, Humbel, Cock, and Roscoe (2018) Physical Addressing on Real Hardware in Isabelle/HOL, Department of Computer Science, ETH Zurich.
- [20]: Maria Spichkova (2014) Stream processing components: Isabelle/HOL formalisation and case studies, arXiv preprint arXiv:1405.1512.
- [21]: Griffin, Dongol, and Raad (2025) IsaBIL: A Framework for Verifying (In)correctness of Binaries in Isabelle/HOL (Extended Version), arXiv preprint arXiv:2504.16775.
-
[22]: Ghasemirad et al. (2025)
VerIso: Verifiable isolation guarantees for database transactions, arXiv preprint arXiv:2503.06284.
-
[23]: Kohlen et al. (2025)
A formally verified IEEE 754 floating-point implementation of interval iteration for MDPs, International Conference on Computer Aided Verification, Cham: Springer Nature Switzerland.
-
[24]: Wu et al. (2022)
Autoformalization with Large Language Models, NeurIPS 2022 Conference.
-
[25]: Xu et al. (2025)
IsaMini: Redesigned Isabelle Proof Language for Machine Learning, arXiv preprint arXiv:2507.18885.
-
[26]: Paulson, Nipkow, and Wenzel (2019)
From LCF to Isabelle/HOL, Formal Aspects of Computing 31.6 (2019): 675-698.
-
[27]: Makarius (Markus) Wenzel (2019)
Isabelle technology for the Archive of Formal Proofs with application to MMT, arXiv preprint arXiv:1905.07244.
-
[28]: Steen and Benzmüller (2018)
The Higher-Order Prover Leo-III (Extended Version), IJCAR 2018, Oxford, UK.
- [29]: Zheng, Han, and Polu (2022) MiniF2F: A Cross-System Benchmark for Formal Olympiad-Level Mathematics, ICLR 2022.
8.13.4. Videos#
- [30]: LigerLearn (2023) Lambda (λ) Calculus Primer, YouTube.
- [31]: LigerLearn (2023) Lambda (λ) calculus evaluation rules (δ, β, α, η conversion/reduction), YouTube.
- [32]: Mark Jago / Attic Philosophy (2025) Type Theory in Computer Science, Linguistics, Logic, YouTube.
- [33]: Oliver Lugg (2022) A Sensible Introduction to Category Theory, YouTube.
8.13.5. Miscellaneous#
- [34]: Larry Moss (2015) Natural Logic, UC Berkeley Logic Seminar.
- [35]: Blanchette, Desharnais, Paulson, and Bartl (2025) Hammering Away: A User’s Guide to Sledgehammer for Isabelle/HOL, Isabelle 2025 Documentation
-
[36]: Geoff Sutcliffe (2017)
The TPTP Problem Library and Associated Infrastructure: From CNF to TH0, Journal of Automated Reasoning 59(4): 483–502.
-
[37]: Geoff Sutcliffe (2016)
The CADE ATP System Competition – CASC, AI Magazine 37(2): 99–101.
- [38]: Mike Gordon (1996) From LCF to HOL: a short history
8.14. Suggestions for Future Work#
I definitely agree having an explicit chapter on both first-order logic and theorem provers related to FOL would be helpful, especially as a preceding chapter to the current one about higher-order logic and Isabelle. This chapter should also include popular Automatic Theorem Provers (ATPs) for FOL, and could serve as a nice introduction into theorem proving and “stronger” tools such as Isabelle/HOL and Lean.
It might also be interesting to see different chapters covering automatic theorem provers such as Vampire or Satallax vs interactive theorem provers such as Isabelle/HOL or Lean. In addition, there should be a clear division made between first order logic ATPs such as the aforementioned Vampire and higher-order logic ATPs such as Satallax. In addition, it would also be interesting and useful to either create or find an online Isabelle/HOL program that can run in a web browser, similar to the Lean Game Server. This way users can be quickly onboarded to the tool without having to go through download, setup, version compatibility issues, and other miscellaneous problems that may arise.
It may also be cool to see a more interactive history section, with custom CSS and JS injection that allows for a detailed history timeline of Isabelle/HOL development, and theorem provers in general within the context of higher-order logic. In addition, having some kind of visual aid to see the different relationships between the “major figures” within this space would be helpful as well. For example, Peter Andrews was a doctorate student of Alonzo Church, and while Laurence Paulson was responsible for the development of LCF, we can still see from the citations (where he is often credited alongside Nipkow and Wenzel) that he is still very invovled with the documentation and development of Isabelle/HOL.
8.15. Contributors#
Written by Spencer Au and reviewed by Wayne Chong and Alexander Kurz.