E 0.999 and EP 0.999

S. Schulz
Institut für Informatik, Technische Universität München, Germany
schulz@eprover.org

Architecture

E 0.999[Sch2002,Sch2004] is a purely equational theorem prover. The core proof procedure operates on formulas in clause normal form, using a calculus that combines superposition (with selection of negative literals) and rewriting. No special rules for non-equational literals have been implemented, i.e., resolution is simulated via paramodulation and equality resolution. The basic calculus is extended with rules for AC redundancy elimination, some contextual simplification, and pseudo-splitting. The latest versions of E also supports simultaneous paramodulation, either for all inferences or for selected inferences.

E is based on the DISCOUNT-loop variant of the given-clause algorithm, i.e. a strict separation of active and passive facts. Proof search in E is primarily controlled by a literal selection strategy, a clause evaluation heuristic, and a simplification ordering. The prover supports a large number of preprogrammed literal selection strategies, many of which are only experimental. Clause evaluation heuristics can be constructed on the fly by combining various parameterized primitive evaluation functions, or can be selected from a set of predefined heuristics. Supported term orderings are several parameterized instances of Knuth-Bendix-Ordering (KBO) and Lexicographic Path Ordering (LPO).

The prover uses a preprocessing step to convert formulas in full first order format to clause normal form. This step may introduce (first-order) definitions to avoid an exponential growth of the formula. Preprocessing also unfolds equational definitions and performs some simplifications on the clause level.

The automatic mode determines literal selection strategy, term ordering, and search heuristic based on simple problem characteristics of the preprocessed clausal problem.

EP 0.999 is just a combination of E 0.999 in verbose mode and a proof analysis tool extracting the used inference steps.

Implementation

E is implemented in ANSI C, using the GNU C compiler. The most outstanding feature is the global sharing of rewrite steps. Current versions of E add rewrite links from rewritten to new terms. In effect, E is caching rewrite operations as long as sufficient memory is available. Other important features are the use of perfect discrimination trees with age and size constraints for rewriting and unit-subsumption, feature vector indexing[Sch2004b] for forward- and backward subsumption and contextual literal cutting, and a new polynomial implementation of LPO[Loe2004].

The program has been successfully installed under SunOS 4.3.x, Solaris 2.x, HP-UX B 10.20, MacOS-X, and various versions of Linux. Sources of the latest released version are available freely from:

    http://www.eprover.org

EP 0.999 is a simple Bourne shell script calling E and the postprocessor in a pipeline.

Strategies

E's automatic mode is optimized for performance on TPTP. The optimization of E 0.999 is based on about 90 test runs over the library (and previous experience) and consists of the selection of one of about 40 different strategies for each problem. All test runs have been performed on SUN Ultra 60/300 machines with a time limit of 300 seconds (or roughly equivalent configurations). All individual strategies are refutationally complete. The worst one solves about 49% of TPTP 3.0.1, the best one about 60%. We expect similar optimization for E 0.999. E distinguishes problem classes based on a number of features, all of which have between 2 and 4 possible values. The most important ones are:

Is the most general non-negative clause unit, Horn, or Non-Horn?
Is the most general negative clause unit or non-unit?
Are all negative clauses unit clauses?
Are all literals equality literals, are some literals equality literals, or is the problem non-equational?
Are there a few, some, or many clauses in the problem?
Is the maximum arity of any function symbol 0, 1, 2, or greater?
Is the sum of function symbol arities in the signature small, medium, or large?

Wherever there is a three-way split on a numerical feature value, the limits are selected automatically with the aim of splitting the set of problems into approximately equal sized parts based on this one feature.

For classes above a threshold size, we assign the absolute best heuristic to the class. For smaller, non-empty classes, we assign the globally best heuristic that solves the same number of problems on this class as the best heuristic on this class does. Empty classes are assigned the globally best heuristic. Typically, most selected heuristics are assigned to more than one class.

Expected Competition Performance

In the last years, E performed well in most proof categories. We believe that E will again be among the stronger provers in the CNF category. Performance on FOF problems should be competitive. We hope that E will at least be a useful complement to dedicated systems in the other categories.

EP 0.999 will be hampered by the fact that it has to analyse the inference step listing, an operation that typically is about as expensive as the proof search itself. Nevertheless, it should be competitive among the MIX* and FOF* systems.

References

Sch2002: Schulz S. (2002), E: A Brainiac Theorem Prover, Journal of AI Communications 15(2/3), 111-126, IOS Press
Sch2004: Schulz S. (2004), System Abstract: E 0.81, Proceedings of the 3rd IJCAR, (Cork, Ireland), Lecture Notes in Artificial Intelligence, Springer-Verlag
Sch2004b: Schulz S. (2004), Simple and Efficient Clause Subsumption with Feature Vector Indexing, Proceedings of the IJCAR-2004 Workshop on Empirically Successful First-Order Theorem Proving, (Cork, Ireland)
Loe2004: Löchner b. (2004), What to know when implementing LPO, Proceedings of the IJCAR-2004 Workshop on Empirically Successful First-Order Theorem Proving, (Cork, Ireland)