Technology

Overview

E is a saturating theorem prover for first-order and higher-order logic with equality. The main prover operates in four distinctive phases: Additional programs provide services like proof checking or axiom

Clausification

E parses first-order formulas into a perfectly shared formula bank (reusing its mature shared term data structures). It then uses a variant of the clausification algorithm described in [NW:SmallCNF-2001] without the advanced Skolemization techniques. The shared representation of the formula allows for a very efficient introduction and re-use of definitions - once a definition is introduced, it is automatically applied for every occurrence of a syntactically identical formula.

Calculus

The main proof search is performed using an instance of the superposition calculus described in [BG94] (and, in more detail, in [Schulz:AICOM-2002]). E implements superposition with negative literal selection, and with equality factoring, not merging paramodulation. It incorporates a number of simplification techniques, including unconditional rewriting, subsumption, semantic tautology elimination, and contextual literal cutting. It also employs AC redundancy elimination (based on pre-publication communication later documented in [AHL:JSC-2003]) and the controlled introduction of definitions for independent clause parts (equivalent to hyper-splitting with definitions [RV:IJCAI-2001]). The calculus has been extendet to include additional inferences for higher-order logic as described in [VBS:TACAS-2023].

Proof procedure

E uses the DISCOUNT [DKS97] variant of the given-clause algorithm. The proof state is represented by two sets of clauses, the set P of processed clauses (initially empty) and the set U of unprocessed clauses. Clauses in U are ranked according to a heuristic evaluation function, and are processed in order. Processing will first simplify the selected clause g with all clauses in P, then simplify P with g (moving all affected clauses from P back into U), and then compute all direct consequences between g and P that can be derived using the generating inference rules (superposition, equality factoring, and equality resolution). The new clauses are simplified with respect to P and added to U, g is added to P.

This proof procedure is different from the given-clause algorithm implemented in Otter (and many provers since), which also uses U for simplification. The variant E uses was first popularized by DISCOUNT, but also is at the core of Waldmeister. Vampire and SPASS implement it in addition to their primary algorithm.

Heuristics

The major heuristic choice points in E are the choice of a term ordering that restricts inferences and controls rewriting, the choice of the next given clause from U, and the (optional) selection of inference literals in clauses that have at least one negative literal.

E allows the user to select type of term ordering (LPO or KBO), precedence, and weights manually. It also implements a lwarge number of precedence and weight generation schemas that can be selected from the command line or by the automatic mode.

Selection of the given clause is controlled via a number of priority queues from which clauses are picked in a weighted round-robin scheme. Each queue is sorted according to a fixed priority function and a parameterized clause evaluation function. Clause evaluation functions can be goal-oriented, can focus on particular symbols and can take term orderings into account. The whole clause selection scheme can be described using a domain-specific mini-language. The automatic mode will determine a proven scheme for a given problem class automatically.

Literal selection is controlled by one of about 100 hard-coded selection strategies. In automatic mode, the prover with pick one of them without user intervention.

Data structures and algorithms

E has not originally been designed as a speed demon. However, it has acquired a reputation as a fast theorem prover, and, over time, has been retrofitted with data structures to somewhat justify that reputation.

E is build around perfectly shared terms, i.e. each distinct term is only represented once in a term bank. The whole set of terms thus consists of a number of interconnected directed acyclic graphs. Term memory is managed by a simple mark-and-sweep garbage collector.

Unconditional (forward) rewriting using unit clauses is implemented using perfect discrimination trees with size and age constraints. Whenever a possible simplification is detected, it is added as a rewrite link in the term bank. As a result, not only terms, but also rewrite steps are shared.

Subsumption and contextual literal cutting (also known as subsumption resolution) are supported using feature vector indexing [Schulz:FVI-2013].

Superposition and backward rewriting use fingerprint indexing [Schulz:IJCAR-2012], a new technique combining ideas from feature vector indexing and path indexing.

Finally, LPO and KBO are implemented using the elegant and efficient algorithms developed by Bernd Löchner in [Loechner:IJAIT-2006][Loechner:JAR-2006].

Implementation

E is implemented in ANSI-C, mostly sticking to the 1989 standard. It does require the long long extension from C-99.

By default, it uses the GNU toolchain, but has also been successfully built using a number of different compile environments.