
A Benchmark and Test Environment for the Theorem Prover E

Stephan Schulz

The theorem prover E is a high-performance deduction system for many-sorted first-order logic with equality. As a saturating theorem prover, E searches for a proof by contradiction by systematically exploring the space of derivable first-order clauses (modulo redundancy). Performance of the theorem prover depends on the one hand on efficient algorithms and data structures, and on the other hand on search heuristics controlled by sets of parameters.

The prover is under continuous development. Currently, evaluation and testing of the system is a manual process, using either small-scale ad-hoc tests on the developer workstation, or systematic large-scale automation on the StarExec community cluster. Ad-hoc tests are manual and undocumented, while StarExec tests are complex, require manual set-up, and are only efficient (in terms of both developer time and elapsed time) for tests requiring thousands of CPU hours. In particular, it is very hard to achieve fully automated tests with short feedback times.

The aim of this project is to develop a portable, drop-in test and benchmark environment that fills the gap between manual tests and large-scale evaluation, and that enables systematic and automated testing, evaluation, and tuning of the system.

The benchmark system should fulfill the following requirements:

In this context, there are (up to) three topics for Bachelor Theses/Student Projects available.

Shared responsibilities: Data Formats, APIs, packaging

All students should work together to define the API and data types for the system: How should jobs be described? How can jobs be submitted? How are results collected? The system should at least support StarExec packages and StarExec run scripts for theorem prover configuration, but also needs ways to specify test examples, expected results, and both summarized and full actual results.
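To make the questions above concrete, the following is a minimal sketch of what a job description could look like, serialized to JSON for easy submission and archiving. All names here (`ProverConfig`, `Job`, the field names, the sample problem) are illustrative assumptions, not a fixed design:

```python
import json
from dataclasses import dataclass, field, asdict

@dataclass
class ProverConfig:
    """One prover configuration, e.g. corresponding to a StarExec run script."""
    name: str
    binary: str
    arguments: list = field(default_factory=list)

@dataclass
class Job:
    """A benchmark job: one configuration run over a set of test problems."""
    config: ProverConfig
    problems: list               # paths to test problem files
    expected: dict               # problem -> expected result (e.g. SZS status)
    cpu_limit_s: int = 300       # per-problem CPU time limit in seconds

    def to_json(self) -> str:
        """Serialize the whole job description for submission or storage."""
        return json.dumps(asdict(self), indent=2)

# Example job: run the (hypothetical) "auto" configuration on one problem.
job = Job(
    config=ProverConfig("auto", "./eprover", ["--auto"]),
    problems=["PUZ001+1.p"],
    expected={"PUZ001+1.p": "Theorem"},
)
print(job.to_json())
```

A JSON-based format like this would cover both job submission and the "expected results" part of the comparison; the full actual results (prover output, statistics) would likely need a separate, larger format.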

Preliminary work has identified a number of concepts that the system probably should support.

The second shared responsibility is the design and development of the packaging and installation process.

Subproject: Multi-core Job Scheduling and Execution

This subproject concerns the efficient scheduling and execution of sets of theorem prover runs on the local hardware. It should ensure that all cores are fully utilized, but also that no more threads than cores are running to avoid unnecessary and expensive context switches. This subproject is the core of the environment.
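One way to meet the "as many prover processes as cores, but no more" requirement is a fixed-size worker pool where each worker blocks on exactly one prover subprocess. The sketch below uses Python's standard library for illustration; the worker-per-core design is from the text, everything else is an assumption:

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor, as_completed

def run_one(cmd, timeout_s):
    """Run one prover invocation as a subprocess; return (cmd, rc, stdout)."""
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True,
                              timeout=timeout_s)
        return cmd, proc.returncode, proc.stdout
    except subprocess.TimeoutExpired:
        return cmd, None, ""          # None marks a timed-out run

def run_all(commands, timeout_s=300):
    """One worker per core; each worker blocks on a single prover process,
    so at most os.cpu_count() prover processes run at any time."""
    results = []
    with ThreadPoolExecutor(max_workers=os.cpu_count()) as pool:
        futures = [pool.submit(run_one, cmd, timeout_s) for cmd in commands]
        for fut in as_completed(futures):
            results.append(fut.result())
    return results

# Demo with `echo` as a stand-in for the actual prover binary:
results = run_all([["echo", f"problem-{i}"] for i in range(8)], timeout_s=10)
print(len(results), all(rc == 0 for _, rc, _ in results))
```

Worker threads (rather than worker processes) suffice here because each worker merely waits on an external prover process; the expensive computation happens in the provers themselves, one per core.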


Subproject: Automatic Testing and Reporting

This subproject should automatically maintain a configurable list of remote git branches of the theorem prover, check them for updates, and if an update has been detected, should automatically fetch the new version, build the prover in both debug and performance configurations and perform tests with a predefined set of configurations and test problems. The results should be compared against expected values, and should be summarized and published via a web interface, with problems, errors, and regressions highlighted.
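The update-detection step described above could be built on `git ls-remote`, which reports the head commit of each remote branch without fetching. The following is a minimal sketch under that assumption; repository URL and branch names are placeholders:

```python
import subprocess

def remote_heads(repo_url, branches):
    """Query the remote for branch heads via `git ls-remote --heads`."""
    out = subprocess.run(["git", "ls-remote", "--heads", repo_url],
                         capture_output=True, text=True, check=True).stdout
    return parse_heads(out, branches)

def parse_heads(ls_remote_output, branches):
    """Parse `git ls-remote` output into {branch: commit-hash} for
    the configured (watched) branches only."""
    heads = {}
    for line in ls_remote_output.splitlines():
        sha, _, ref = line.partition("\t")
        name = ref.removeprefix("refs/heads/")
        if name in branches:
            heads[name] = sha
    return heads

def changed(previous, current):
    """Branches whose head commit moved since the last poll; these
    trigger a fresh fetch, build, and test run."""
    return {b: h for b, h in current.items() if previous.get(b) != h}

# Offline demo with canned ls-remote output (real use would call remote_heads):
sample = "1111\trefs/heads/master\n2222\trefs/heads/develop\n"
now = parse_heads(sample, {"master", "develop"})
print(changed({"master": "1111"}, now))
```

A periodic job (e.g. via cron) would call `remote_heads`, diff against the stored state with `changed`, and kick off the build-and-test pipeline for each updated branch.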


Subproject: StarExec Backend

StarExec (described above) offers an API via the Java program StarExecCommand. This subproject should enable configuration of the benchmark environment to transparently use StarExec as the backend to execute (large) jobs.
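Transparent backend selection suggests a common interface that both a local executor and a StarExec-based executor implement. The sketch below shows only the shape of such an abstraction; the actual StarExecCommand session protocol (login, upload, polling for results) is deliberately left as a stub, since its concrete commands are not specified here:

```python
import subprocess
from abc import ABC, abstractmethod

class Backend(ABC):
    """Common interface so jobs run unchanged locally or on StarExec."""
    @abstractmethod
    def submit(self, job):
        """Execute a job (here: a list of command lines) and return results."""

class LocalBackend(Backend):
    """Runs prover invocations directly on the local machine."""
    def submit(self, job):
        return [subprocess.run(cmd, capture_output=True, text=True).returncode
                for cmd in job]

class StarExecBackend(Backend):
    """Stub: would delegate to the StarExecCommand Java client.
    The concrete invocation and session handling are left open."""
    def __init__(self, jar_path):
        self.jar = jar_path       # path to the StarExecCommand jar

    def submit(self, job):
        # Placeholder: translate the job into StarExecCommand calls,
        # submit it, poll for completion, and download the results.
        raise NotImplementedError("wire up StarExecCommand here")

backend = LocalBackend()
print(backend.submit([["true"], ["true"]]))
```

With this split, the rest of the environment (job descriptions, result comparison, reporting) can stay backend-agnostic, and choosing StarExec for large jobs becomes a configuration option.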



Schäfer, Simon, and Stephan Schulz. 2015. “Breeding Theorem Proving Heuristics with Genetic Algorithms.” In Proc. of the Global Conference on Artificial Intelligence, Tbilisi, Georgia, edited by Georg Gottlob, Geoff Sutcliffe, and Andrei Voronkov, 36:263–74. EPiC. EasyChair.

Schulz, Stephan. 2002. “E – A Brainiac Theorem Prover.” AI Communications 15 (2/3):111–26.

Schulz, Stephan. 2013. “System Description: E 1.8.” In Proc. of the 19th LPAR, Stellenbosch, edited by Ken McMillan, Aart Middeldorp, and Andrei Voronkov, 8312:735–43. LNCS. Springer.

Stump, Aaron, Geoff Sutcliffe, and Cesare Tinelli. 2014. “StarExec: A Cross-Community Infrastructure for Logic Solving.” In Proc. of the 7th IJCAR, Vienna, edited by Stéphane Demri, Deepak Kapur, and Christoph Weidenbach, 8562:367–73. LNCS. Springer.

Sutcliffe, G., J. Zimmer, and S. Schulz. 2004. “TSTP Data-Exchange Formats for Automated Theorem Proving Tools.” In Distributed Constraint Problem Solving and Reasoning in Multi-Agent Systems, edited by Volker Sorge and Weixiong Zhang, 201–15. Frontiers in Artificial Intelligence and Applications. IOS Press.

DHBW Stuttgart, Prof. Dr. Stephan Schulz