Java, Smalltalk and Wasting Time

Introduction

This is about two Squeak-related bugs, one in Java, one in the GNU C optimizer. The latter bug provokes three questions: one to Tim Rowledge, one to Ian Piumarta and one to Stéphane Ducasse.

(End of Introduction)

The Java Bug:

As the int type of the Java programming language covers only a limited range of integers, a programmer is obliged to prove for every int expression that its value fits in this range. While programming the binary search method, Joshua Bloch overlooked that the sum of two int values might exceed the maximal int value. The midpoint computation int mid = (low + high) / 2 overflows as soon as low + high exceeds 2^31 - 1, and yields a negative index. He thus added one more example to the instructive list of error-riddled expositions and implementations of this ill-famed algorithm. See also http://googleresearch.blogspot.com/2006/06/extra-extra-read-all-about-it-nearly.html

(End of The Java Bug)

Smalltalk shines

This error cannot happen in Smalltalk, since Smalltalk integers have no limited range. If the result of an expression exceeds the range of small integers, the virtual machine (VM) detects it. It then either chooses an appropriate object to store the value or reports the overflow to a Smalltalk method, which takes the right action. In this way, overflow handling is centralized in the VM and a few kernel classes, and programming in Smalltalk relieves you of the annoying task of proving that integer values stay inside proper ranges. This is quite different from Java. I expect more overflow errors to be sleeping in Java classes, only waiting for memory chips to become cheap enough and data structures big enough to break some neatly composed method.

(End of Smalltalk shines)

VM-Bug description

This article could end here, without annoying fellow Squeakers for a change, if the VM were correct. But is it? Not the one from Squeak 1.18 as distributed from http://ftp.squeak.org/1.18/1.18.tar.gz: when you send initialize to Cursor, you are notified of an improper store into an indexable object.
Squeak's "debugger" reveals that the expression

    2r0100000001000000 bitShift: 16

evaluates to -1069547520, but it should obviously be 1077936128. This value is then stored into a bitmap, which explains the notification, since negative values must not be stored in a bitmap. This places the error in primitiveBitShift:, the VM routine that is supposed to choose the appropriate object to hold the shifted value. And that is where it failed. It invokes the routine isIntegerValue:, which returns true for 1077936128. In Squeak, however, small integers are encoded as 31-bit two's complement numbers, so their values lie in [-2^30 .. 2^30), which does not contain 1077936128.

Time to read isIntegerValue:! In Squeak 1.18, the method ObjectMemory>>isIntegerValue: reads:

    isIntegerValue: valueWord
        ^ valueWord >= 16r-40000000 and: [valueWord < 16r40000000]

Squeak's C translator then yields:

    int isIntegerValue(int valueWord) {
        return (valueWord >= -1073741824) && (valueWord < 1073741824);
    }

Both the method and the C function look fair, at least to me.

Since Cursor>>initialize works in Squeak 3.6, I looked there and discovered a beauty. This time it is decorated with a verbose comment that tries to rephrase the Smalltalk expression in English:

    isIntegerValue: intValue
        "Return true if the given value can be represented as a Smalltalk integer value."
        "Details: This trick is from Tim Rowledge. Use a shift and XOR to set the sign bit if and only if the top two bits of the given value are the same, then test the sign bit. Note that the top two bits are equal for exactly those integers in the range that can be represented in 31-bits."
        ^ (intValue bitXor: (intValue << 1)) >= 0

Remark: The above comment is not only rather useless but also wrong. It should read "can be represented in wordsize-1 bits". Besides, the XOR sets the sign bit when the top two bits differ, not when they are the same. (End of Remark)

I cut and pasted this expression into Squeak 1.18, translated it with

    Interpreter translate: 'Interp.c' doInlining: true

and then rebuilt the VM under Unix.
To rebuild the VM under Unix, you first need to replace carriage return characters by linefeed characters:

    $ cr2lf ../src/interp.c

My cr2lf script reads:

    tr '\r' '\n'

Then you need to recompile and relink the VM:

    $ make

This gives you a new file SqueakVM, which I moved into my bin directory to relaunch the modified VM. Much to my surprise, Cursor initialize no longer ends with a notification! Great! But... what breaks the original version? I stared at it over and over again, not trusting my eyes. I even applied formal predicate calculus to prove that the two expressions

    (intValue bitXor: (intValue << 1)) >= 0

and

    valueWord >= 16r-40000000 and: [valueWord < 16r40000000]

are equivalent for 32-bit words.

Exploring interp.c, the output of the C translator, reveals the effect of inlining. The routine primitiveBitShift: invokes positive32BitIntegerFor:, which inlines isIntegerValue: as:

    ...
    if ((integerValue >= 0) && ((integerValue >= -1073741824) && (integerValue < 1073741824)))
    ...

Still staring... and still clueless. This leaves the hardware or the C compiler responsible for the error. As I didn't feel like reading the source of the GNU C compiler or its output, I resorted to experimenting and wrote is_small.c:

    #include <stdio.h>

    int positive32BitIntegerFor(int integerValue);

    int main(void)
    {
        /* 1073741824 = 2^30, the smallest positive value that is NOT a small integer */
        printf("%d\n", positive32BitIntegerFor(1073741824));
        return 0;
    }

    /* Same test as the inlined isIntegerValue: in interp.c */
    int positive32BitIntegerFor(int integerValue)
    {
        if ((integerValue >= 0) &&
            ((integerValue >= -1073741824) && (integerValue < 1073741824))) {
            printf("ivalue: %d\n", integerValue);
            return 1;
        } else {
            return 0;
        }
    }

I compiled and ran it:

    $ cc is_small.c
    $ a.out
    0

No error. Then I recompiled it using the compile flags from the makefile as provided by Ian Piumarta:

    $ cc -O3 -funroll-loops -g is_small.c
    $ a.out
    ivalue: 1073741824
    1

OK! The GNU C compiler "optimizes" away the meaning of the expression, which it shouldn't! I abandoned the -O3 flag in the makefile, restored the old expression in Squeak's ObjectMemory>>isIntegerValue:, translated it, rebuilt the VM, and it works.
I will never again let GNU's optimizers fool around with my code. Needless to say, I did not notice any slowdown of Squeak due to abandoning the optimizer.

(End of VM-Bug description)

Remark: Again I find myself wasting time running broken programs instead of spending time writing correct programs. (End of Remark)

Proposal: In Squeak 1.18 the file InterpTestInline.c differs from the output of the C translator. It would be nice if the distributed VM source matched the one produced by the distributed image. I also missed a hint on how to invoke the C translator. (End of Proposal)

Three questions:

Question 0: Tim Rowledge, the expression

    ^ (intValue bitXor: (intValue << 1)) >= 0

is independent of the word size. Was that the reason you switched to it in Squeak 3.6 (or before)?

Question 1: Ian Piumarta, what made you use the ill-famed GNU C optimizer?

Question 2: Stéphane Ducasse, in a somewhat harsh tone you tried to persuade me to use the "right tools" to make the Hobbes Emulator accessible. Does Squeak's source repository let me browse every version of every Squeak method, together with a log message explaining the reason for each change, as the BSD projects do with CVS? If so, I'll withdraw question 0, since I would find the answer in Squeak's source repository. Note that this would even start the numbering of my questions with one instead of zero. For your convenience :-)!

(End of three (or two) questions)

Quote:

The programmer had complete control over his tool without the need for "experimental" programs for discovering its properties. This rigorous and indispensable clarity did not only extend itself over the hardware, but also over the basic software, such as loaders, input and output routines, etc., no more than a few hundred, perhaps a thousand instructions anyhow. (If I remember correctly, the ominous term "software" had still to be invented.)

Sad remark.
Since then we have witnessed the proliferation of baroque, ill-defined and, therefore, unstable software systems. Instead of working with a formal tool, which their task requires, many programmers now live in a limbo of folklore, in a vague and slippery world, in which they are never quite sure what the system will do to their programs. Under such regretful circumstances the whole notion of a correct program --let alone a program that has been proved to be correct-- becomes void. What the proliferation of such systems has done to the morale of the computing community is more than I can describe. (End of sad remark.)

(Quoted from E.W. Dijkstra: "A Discipline of Programming", Prentice Hall 1976)

Date: 15.06.2006
Author: Wolfgang Helbig (helbigAtLehreDotBA-StuttgartDotDE)