Technology
E is a saturating theorem prover for first-order logic with equality. The main prover operates in four distinctive phases:
•A clausification algorithm translates first-order input into clause normal form, such that the resulting formula is unsatisfiable if and only of the original problem is provable.
•An (optional) preprocessing phase simplifies the resulting clausal problem.
•The (optional) automatic mode analyses the problem and selects suitable search parameters.
•A saturation algorithm implementing an instance of the superposition calculus tries to derive the empty clause and hence an explicit witness of the unsatisfiability of the problem.
Additional programs provide services like proof checking and proof extraction.
Overview
E parses first-order formulas into a perfectly shared formula bank (reusing its mature shared term data structures). It then uses a variant of the clausification algorithm described in [NW:SmallCNF-2001] without the advanced Skolemization techniques. The shared representation of the formula allows for a very efficient introduction and re-use of definitions - once a definition is introduced, it is automatically applied for every occurrence of a syntactically identical formula.
Clausification
The main proof search is performed using an instance of the superposition calculus described in [BG94] (and, in more detail, in [Schulz:AICOM-2002]). E implements superposition with negative literal selection, and with equality factoring, not merging paramodulation. It incorporates a number of simplification techniques, including unconditional rewriting, subsumption, semantic tautology elimination, and contextual literal cutting. It also employs AC redundancy elimination (based on pre-publication communication later documented in [AHL:JSC-2003]) and the controlled introduction of definitions for independent clause parts (equivalent to hyper-splitting with definitions [RV:IJCAI-2001]).
Calculus
E uses the DISCOUNT [DKS97] variant of the given-clause algorithm. The proof state is represented by two sets of clauses, the set P of processed clauses (initially empty) and the set U of unprocessed clauses. Clauses in U are ranked according to a heuristic evaluation function, and are processed in order. Processing will first simplify the selected clause g with all clauses in P, then simplify P with g (moving all affected clauses from P back into U), and then compute all direct consequences between g and P that can be derived using the generating inference rules (superposition, equality factoring, and equality resolution). The new clauses are simplified with respect to P and added to U, g is added to P.
This proof procedure is different from the given-clause algorithm implemented in Otter (and many provers since), which also uses U for simplification. The variant E uses was first popularized by DISCOUNT, but also is at the core of Waldmeister. Vampire and SPASS implement it in addition to their primary algorithm.
Proof procedure
The major heuristic choice points in E are the choice of a term ordering that restricts inferences and controls rewriting, the choice of the next given clause from U, and the (optional) selection of inference literals in clauses that have at least one negative literal.
E allows the user to select type of term ordering (LPO or KBO), precedence, and weights manually. It also implements a large number of precedence and weight generation schemas that can be selected from the command line or by the automatic mode.
Selection of the given clause is controlled via a number of priority queues from which clauses are picked in a weighted round-robin scheme. Each queue is sorted according to a fixed priority function and a parameterized clause evaluation function. Clause evaluation functions can be goal-oriented, can focus on particular symbols and can take term orderings into account. The whole clause selection scheme can be described using a domain-specific mini-language. The automatic mode will determine a proven scheme for a given problem class automatically.
Literal selection is controlled by one of about 100 hard-coded selection strategies. In automatic mode, the prover with pick one of them without user intervention.
Heuristics
E has not been designed as a speed demon. However, it has acquired a reputation as a fast theorem prover, and, over time, has been retrofitted with data structures to somewhat justify that reputation.
E is build around perfectly shared terms, i.e. each distinct term is only represented once in a term bank. The whole set of terms thus consists of a number of interconnected directed acyclic graphs. Term memory is managed by a simple mark-and-sweep garbage collector.
Unconditional (forward) rewriting using unit clauses is implemented using perfect discrimination trees with size and age constraints. Whenever a possible simplification is detected, it is added as a rewrite link in the term bank. As a result, not only terms, but also rewrite steps are shared.
Subsumption and contextual literal cutting (also known as subsumption resolution) are supported using feature vector indexing [Schulz:FVI-2013].
Superposition and backward rewriting use fingerprint indexing [Schulz:IJCAR-2012], a new technique combining ideas from feature vector indexing and path indexing.
Finally, LPO and KBO are implemented using the elegant and efficient algorithms developed by Bernd Löchner in [Loechner:IJAIT-2006][Loechner:JAR-2006].
Data structures and algorithms
E is implemented in ANSI-C, mostly sticking to the 1989 standard. It does require the long long extension from C-99.
By default, it uses the GNU toolchain, but has also been successfully built using a number of different compile environments.
Implementation