Posted to tcl by kbk at Mon Jan 23 19:25:25 GMT 2017

Some work that I've been messing with recently in the tclquadcode compiler raises an issue that has been a niggling itch to me for nearly the entire 25+ years that I've been using Tcl: the fact that our comparison operations do not define an ordering.

The issue is, of course, that we use the same notation for string and numeric comparisons, which yields anomalies when comparing operands of mixed type. In particular, we have the cycle:

    "0x10" < "0y"     (string comparison)
    "0y" < "1"        (string comparison again)
    "1" < "0x10"      (numeric comparison)

I'd like to get a TIP rolling to introduce some means of declaring what style of comparison is meant.

We have already introduced one disambiguation (which was a start, but not nearly enough) with the 'eq' and 'ne' operators: {$x eq $y} and {$x ne $y} are always string comparisons. At the very least, I want to propose 'lt', 'le', 'gt' and 'ge', to complete the set of string comparison operators.

That alone is not enough. I also want to have some way of forcing a numeric comparison. One possibility is that the desired semantics can actually be achieved in current code, by writing

    [expr {+$x < +$y}]

Since unary '+' requires a numeric argument, this expression raises an error unless $x and $y are both numeric, which is the effect I want. If we go this route, we should probably document this as an idiom in the manual pages that discuss arithmetic expressions.

Forcing numeric comparison can be expected to have a significant impact on performance, for instance in code compiled by tclquadcode, because in a form like:

    for {set x $a} {$x < $b} {incr x} { ... }

it may be impossible for the compiler to prove that $b is numeric. Without such a guarantee, the compiler is forced to generate code for a possible string comparison on each trip through the loop. I have notes on how the test of $b's type might be detected as loop-invariant and hoisted out of the inner loop, but have not tried to implement such a thing, and the implementation looks decidedly non-trivial.

If we change the loop to

    for {set x $a} {+$x < +$b} {incr x} { ... }

the problem, while still non-trivial, is somewhat easier. There's no possibility of unexpected comparison semantics. There will still be unneeded type-checking code checking whether unary '+' needs to throw an error; again, I have notes on how this could be avoided, by unrolling the loop once.

As an alternative, we might want to adopt a second syntax for numeric comparisons. Something like

    [if {$a :<: $b} {...}]

for forcing the comparison to be numeric would work. This wouldn't save many keystrokes, but might make the intent more obvious than a somewhat arcane use of unary '+'. I'll let others argue about notation, but take as a requirement that it has to be lexically compatible with whatever we do in today's arithmetic expressions.

A more radical proposal might be, once 'lt' and friends are in place, to deprecate the use of '<', '<=', etc. for non-numeric comparisons. I've actually learnt to be fairly careful in Tcl code with such things, and in my own code will usually write explicitly:

    if {[string compare $a $b] < 0} { ... }

if string comparison is intended, precisely to avoid surprises if $a and $b both look like numbers. (That is, I already reserve <, <=, etc. for numeric comparisons.)

Given our conservatism about breaking working code, no matter how weird the behaviour that it depends on, I suspect this last idea is a bridge too far, even though I'd like to see it.

So, what do people think? I'd be happy to draft a TIP to reflect a consensus - consider this to be the skeleton of such a TIP.
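
For anyone who wants to reproduce the cycle and try the two workarounds discussed above in a tclsh, here is a minimal sketch. It uses only existing commands ('expr', 'catch', 'string compare'). The helper name 'strlt' is made up for illustration and is not a proposed operator; it just stands in for what a string-only 'lt' might do.

```tcl
# The three operands from the cycle described above.
set a 0x10
set b 0y
set c 1

# $b does not parse as a number, so these two are string comparisons.
puts [expr {$a < $b}]   ;# "0x10" < "0y" as strings
puts [expr {$b < $c}]   ;# "0y" < "1" as strings
# Both operands parse as numbers (1 and 16), so this one is numeric.
puts [expr {$c < $a}]

# Unary '+' refuses a non-numeric operand rather than silently
# falling back to string comparison; catch returns 1 on error.
puts [catch {expr {+$a < +$b}} msg]
puts [expr {+$c < +$a}]

# Hypothetical helper (name is an assumption): always compares as strings.
proc strlt {x y} {
    expr {[string compare $x $y] < 0}
}
puts [strlt $a $b]
puts [strlt $b $c]
puts [strlt $c $a]
```

With the string-only helper, the third comparison comes out the other way ("1" sorts after "0x10" as a string), so the three operands fall into a consistent order instead of a cycle.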