Grammar of Smalltalk-80

Introduction

The last pages of the Blue Book and the Purple Book, and pages 80-93 of the Orange Book, show syntax diagrams of Smalltalk-80. These diagrams raise some issues.

(End of Introduction)

Extension of Backus Normal Form

For lack of time, I notate the grammar in an extended variant of the well-known Backus Normal Form (BNF) instead of in syntax diagrams. This variant uses these metacharacters:

  {x}   means zero or more occurrences of x
  [x]   means zero or one occurrence of x (x is optional)
  (x)   means grouping
  x-y   means every character z with x <= z <= y
  |     means "or"
  <-    means the left-arrow character, whose code is the same as that of the ASCII underscore character

(End of Extension of Backus Normal Form)

The Grammar is Incomplete

The syntax diagrams define a grammar that rejects expressions which other parts of the Blue Book accept. This must be considered an error: there is no point in defining a programming language by a grammar that does not accept all valid programs. To remedy these deficits, I propose the following changes to the grammar.

Deficit 0: Number literals

The syntax diagrams do not allow letters among the digits of a based number, i.e. a number that starts with a radix prefix such as 16r.

Remedy:

  digit ::= 0-9
  digits ::= digit {digit}
  extended_digit ::= digit | A-Z
  extended_digits ::= extended_digit {extended_digit}
  exponent ::= e [-] digits
  decimal_number ::= [-] digits [.digits] [exponent]
  based_number ::= digits r [-] extended_digits [.extended_digits] [exponent]
  number ::= decimal_number | based_number

Remark: The nonterminal "number" should be called "number literal". Distinguishing between a number and a notation of that number lessens confusion and enhances clarity. Programmers tend to think of the numbers stored in memory as "hex numbers". This forces them to use base 16 notation instead of base 10 notation, which usually fits the problem at hand better.
I was even drilled to convert between notations by hand when I was taught machine language, which turned out to be totally unnecessary. Use the notation that best serves you when thinking about the problem. When you specify the entries of a bitmap, for example, base 2 notation seems fine.

(End of Remark)

(End of Deficit 0)

Deficit 1: Character sets in comments, character literals, and string literals

According to the syntax diagrams, the set of characters that may occur in a string literal, a character literal, or a comment contains 91 characters (Blue Book) or 93 characters (Purple Book). The implementation, however, accepts 95 characters, namely the printable characters of the ASCII set, space included.

Remedy:

  letter ::= A-Z | a-z
  special_character ::= + | / | \ | * | ~ | < | > | = | @ | % | | | & | ? | ! | ,
  character ::= [ | ] | { | } | ( | ) | <- | ^ | ; | $ | . | # | : | - | ` | | digit | letter | special_character

Note the invisible space character between the two bars that follow "`". Thus "character" derives 15 special characters, 10 digits, 52 letters, and 16 other characters, 93 in total. Single and double quotes are not derivable from "character", because a string must not contain a single quote and a comment must not contain a lone double quote.

(End of Deficit 1)

Deficit 2: The Comma is a Binary Selector

A binary selector consists of one or two special characters, where the first may also be the minus sign. Since the comma is a valid binary selector, it must be a special character, as it is in the remedy of Deficit 1.
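The productions of Deficit 0 and Deficit 1 are concrete enough to check mechanically. The following Python sketch is not part of any Smalltalk-80 implementation; it transcribes "number" into a regular expression and rebuilds the set derivable from "character" to confirm the count of 93. Two readings are assumptions: "_" stands in for the left arrow "<-" (same code as the underscore), and the "." is taken as the 16th other character where the flattened character rule lists a second "!". The names `is_number_literal`, `CHARACTER`, and `PRINTABLE_ASCII` are hypothetical.

```python
import re
import string

# --- Deficit 0: number literals, transcribed into a regular expression ---
DIGITS = r"[0-9]+"                       # digits ::= digit {digit}
EXT_DIGITS = r"[0-9A-Z]+"                # extended_digits (upper-case letters only)
EXPONENT = rf"e-?{DIGITS}"               # exponent ::= e [-] digits
DECIMAL_NUMBER = rf"-?{DIGITS}(?:\.{DIGITS})?(?:{EXPONENT})?"
BASED_NUMBER = rf"{DIGITS}r-?{EXT_DIGITS}(?:\.{EXT_DIGITS})?(?:{EXPONENT})?"
NUMBER = re.compile(rf"(?:{BASED_NUMBER})|(?:{DECIMAL_NUMBER})")


def is_number_literal(text: str) -> bool:
    """Return True if the whole of `text` derives from the nonterminal number."""
    return NUMBER.fullmatch(text) is not None


# --- Deficit 1: the character set derivable from "character" ---
LETTERS = set(string.ascii_letters)      # letter ::= A-Z | a-z
DIGIT_CHARS = set(string.digits)         # digit ::= 0-9
SPECIAL = set("+/\\*~<>=@%|&?!,")        # special_character (15 characters)
# Other alternatives of "character": "_" stands in for the left arrow "<-",
# "." is assumed where the flattened rule repeats "!", and " " is the space.
OTHER = set("[]{}()_^;$.#:-` ")
CHARACTER = LETTERS | DIGIT_CHARS | SPECIAL | OTHER

# The 95 printable ASCII characters, codes 32 (space) through 126 ("~").
PRINTABLE_ASCII = {chr(code) for code in range(32, 127)}

if __name__ == "__main__":
    print(is_number_literal("16r1F"), is_number_literal("16r1f"))  # True False
    print(len(CHARACTER), sorted(PRINTABLE_ASCII - CHARACTER))     # 93 ['"', "'"]
```

Because extended digits are upper-case only, a lower-case "e" after a based number is always read as the exponent, so "16r1e2" and "16r1E" do not collide.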

Remark: I needed a magnifying glass to decipher some of the characters in the syntax diagrams, notably "|", "^", "{", and "}".

(End of Remark)

(End of Deficit 2)

Deficit 3: Non-graphic characters

On page 20, the Blue Book states that comments, strings, and character constants may contain "any character". This term seems to include control characters; at least the carriage return (ASCII 13) and the horizontal tab (ASCII 9) seem to count as "any character".

Remedy:

  control_character ::= (ASCII 0 - ASCII 31) | ASCII 127
  character ::= ... | control_character

Remark: Expect errors in Smalltalk-80 if ASCII 0 occurs in a string or a character constant.

(End of Remark)

(End of Deficit 3)

(End of The Grammar is Incomplete)

Minus is not a Special Character

Question: The nonterminal "special_character" generates most of the characters commonly used as binary operators. What is so special about the minus character that it is not even a special character?

Answer: Special characters occur only in the rule

  binary_selector ::= (- | special_character) [special_character]

In English: "The minus character must not appear as the second character of a binary selector." This is easier to express in a grammar if "-" is excluded from the special characters.

(End of Answer)

Question: What disaster could occur if "-" were allowed as the second character of a binary selector?

Answer: The expression "1@-1" would be ambiguous: it could be parsed as "1 @- 1" or as "1 @ -1". The V2 sources contain nine expressions that would become ambiguous if "-" were allowed as the second character of a binary selector:

  in Form>>shapeFill:interiorPoint:
    dirs <- Array with: 1@0 with: -1@0 with: 0@1 with: 0@-1.

  in ListView>>deEmphasizeView
    aRectangle <- aRectangle insetOriginBy: 0@-1 cornerBy: 0@0

and seven more occurrences in Cursor class>>initialize, which creates various cursors and specifies their offsets as, e.g., offset: -5@-7.
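To see the ambiguity concretely, here is a toy Python scanner. It is a sketch under simplifying assumptions: it knows only integer literals and binary selectors, it always takes the longest possible token (in the spirit of C's longest-match rule), and the names `scan`, `is_operand`, and `minus_may_follow` are hypothetical. With the rule binary_selector ::= (- | special_character) [special_character], "1@-1" scans as 1, @, -1. Once "-" may follow a special character, longest match turns the same text into 1, @-, 1, and only white space after the "@" restores the intended reading.

```python
# special_character from Deficit 1; "-" is deliberately not a member.
SPECIAL = set("+/\\*~<>=@%|&?!,")


def is_operand(token: str) -> bool:
    """True for an integer literal such as "7" or "-5"."""
    return token.lstrip("-").isdigit()


def scan(source: str, minus_may_follow: bool) -> list[str]:
    """Toy scanner for integer literals and binary selectors, longest match first.

    minus_may_follow=False follows
        binary_selector ::= (- | special_character) [special_character]
    minus_may_follow=True is a hypothetical variant that also admits "-" as
    the second character of a binary selector.
    """
    tokens: list[str] = []
    i, n = 0, len(source)
    while i < n:
        ch = source[i]
        if ch.isspace():
            i += 1
        elif ch.isdigit() or (
            ch == "-"
            and i + 1 < n
            and source[i + 1].isdigit()
            and not (tokens and is_operand(tokens[-1]))
        ):
            # Integer literal; a leading "-" counts as a sign only where an
            # operand is expected (at the start or after a selector).
            j = i + 1
            while j < n and source[j].isdigit():
                j += 1
            tokens.append(source[i:j])
            i = j
        elif ch == "-" or ch in SPECIAL:
            j = i + 1
            nxt = source[j] if j < n else ""
            if nxt in SPECIAL or (minus_may_follow and nxt == "-"):
                j += 1  # take the longer, two-character selector
            tokens.append(source[i:j])
            i = j
        else:
            raise ValueError(f"unexpected character {ch!r}")
    return tokens


print(scan("1@-1", minus_may_follow=False))  # ['1', '@', '-1']
print(scan("1@-1", minus_may_follow=True))   # ['1', '@-', '1']
print(scan("1@ -1", minus_may_follow=True))  # ['1', '@', '-1']
```

Treating "-" as a sign only where an operand is expected is a simplification chosen for this sketch; it keeps "3-1" a subtraction rather than the two literals 3 and -1.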
(End of Answer)

Remark: These ambiguities can be resolved by borrowing a rule from the C language: the scanner always takes the longest possible token, so adjacent tokens that would otherwise run together must be separated by white space. This amendment would have saved us two questions and one remark.

(End of Remark)

(End of Minus is not a Special Character)

(End of Grammar of Smalltalk-80)

Date: 25.05.2006
Author: Wolfgang Helbig (helbigAtLehreDotBA-StuttgartDotDE)

Changes:
15.06.2006: Replaced Backus Naur Form by Backus Normal Form. Why: First, Naur does not want to be associated with BNF, and second, Naur does not acknowledge the potential contribution of mathematics to computing science.
23.07.2006: Borrowed the white space rule from the C language.