[Logo of E] [Logo of DHBW]

Cutting Languages Down to Size

Stephan Schulz and Geoff Sutcliffe

Programming languages and other formal languages (e.g. data representation languages like JSON) are often described via a context-free formal grammar, e.g. in the Extended Backus–Naur form that can be automatically processed by tools like Yacc and Bison. These languages tend to grow over time, as more features are added and the language covers more application cases. On the one hand, this makes the language more powerful. But on the other hand, this often makes the language harder to understand and harder to implement, increasing the barrier to entry for new developers.

This problem can, of course, be addressed by manually maintaining simplified or partial grammars. However, this creates not only a maintenance overhead, but may also lead to undesired and even unnoticed divergences of the full and the simplified grammar. Instead, we propose to use automatically extraction of self-contained parts of a formal grammar to create simpler sub-languages.

Our particular use case is the TPTP syntax for automated theorem proving tools. TPTP defines a family of languages, from pure first-order clause normal form (CNF) and full first order (FOF) via typed first-order (TFF), eventually to typed higher-order logic (THF). The language can be used to write both input problems (i.e. lists of logical formulae), but also derivations (where some of the formulas are input formulas, and the others are justified by reference to existing formulas and some mechanism for deriving the former from the latter. Most of the extensions are modular and even conservative (i.e. FOF is mostly a superset of CNF, and normally FOF is used as shorthand for CNF+FOF), and we are interested in e.g. extracting just a grammar for CNF or CNF+FOF. We are also interested extracting languages that do not allow certain features, e.g. a language in which only the pure input format in specified, not logical derivations. The task of this thesis is the design of a tool that performs this extraction and presents the result in a format that is both machine readable, but also friendly to use for human users.

One approach to this would be to a) select a given non-terminal node as the start symbol for the new grammar and b) block the undesired productions starting at certain non-terminal symbols. The tool can then start at the selected new start symbol and follow all legal transition to collect the remaining reachable grammar. It can also clean-up the resulting grammar by inlining rules for non-terminals with only a single production. An outstanding solution would also heuristically associate comments with grammar productions, and keep the comments referring to the remaining productions in place.

There are several possible tools with different levels of convenience for the user:

The topic will be co-supervised by Stephan Schulz and Geoff Sutcliffe, who will be available (via email or video conference) to discuss pearticulars of the TPTP grammar, its format and applications. It is suitable for one or two students - in the case of two students we expect not only the command line tool, but also a usable GUI.

Literature

Donnelly, Charles, and Richard M Stallman. 2018. Bison 1.20: The YACC-Compatible Parser Generator. Free Software Foundation.

Johnson, Stephen C. 1975. Yacc: Yet another compiler-compiler. Vol. 32. Bell Laboratories Murray Hill, NJ.

Levine, John R, Tony Mason, and Doug Brown. 1992. Lex & yacc. O’Reilly Media, Inc.

Sutcliffe, Geoff, Stephan Schulz, Koen Claessen, and Peter Baumgartner. 2012. “The TPTP Typed First-order Form with Arithmetic.” In Proc. of the 18th LPAR, Merida, edited by Nikolaj Bjørner and Andrei Voronkov, 7180:406–419. LNAI. Springer.

Sutcliffe, Geoff, Stephan Schulz, Koen Claessen, and Allen Van Gelder. 2006. “Using the TPTP Language for Writing Derivations and Finite Interpretations.” In Proc. of the 3rd IJCAR, Seattle, edited by Ulrich Fuhrbach and Natarajan Shankar, 4130:67–81. LNAI. Springer.

Van Gelder, A., and G. Sutcliffe. 2006. “Extending the TPTP Language to Higher-Order Logic with Automated Parser Generation.” In Proceedings of the 3rd IJCAR, Seattle, edited by U. Furbach and N. Shankar, 156–161. LNAI 4130. Springer.

Weblinks


DHBW Stuttgart, Prof. Dr. Stephan Schulz