Posted to tcl by kbk at Mon Jan 23 19:25:25 GMT 2017

Some work that I've been messing with recently in the tclquadcode
compiler raises an issue that has been a niggling itch to me for
nearly the entire 25+ years that I've been using Tcl: the fact that
our comparison operations do not define an ordering.

The issue is, of course, that we use the same notation for string and
numeric comparisons, which yields anomalies when comparing operands of
mixed type. In particular, we have the cycle:

"0x10" < "0y" (string comparison)
"0y" < "1" (string comparison again)
"1" < "0x10" (numeric comparison)
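
For anyone who wants to see the cycle at the prompt, here is a minimal sketch. The variable names a, b, c and the result names are mine, chosen only for illustration; all three tests use the ordinary '<' of [expr].

```tcl
# The three values from the cycle above.
set a "0x10"
set b "0y"
set c "1"

# "0y" is not a number, so the first two comparisons fall back to
# string comparison; the third has two numeric operands (1 and 16).
set ab [expr {$a < $b}]
set bc [expr {$b < $c}]
set ca [expr {$c < $a}]

puts "a<b: $ab   b<c: $bc   c<a: $ca"
```

All three results being true at once is exactly what an ordering cannot allow.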

I'd like to get a TIP rolling to introduce some means of declaring
what style of comparison is meant. We already have introduced one
disambiguation (which was a start, but not nearly enough), with the
'eq' and 'ne' operators.  {$x eq $y} and {$x ne $y} are always string
comparisons.
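
A small sketch of the difference that 'eq' already makes, using two values I picked for illustration (they are not from any real code):

```tcl
set x "1.0"
set y "1"

# '==' sees two numeric operands and compares their values.
set numeric_equal [expr {$x == $y}]

# 'eq' always compares the string representations.
set string_equal [expr {$x eq $y}]

puts "== gives $numeric_equal, eq gives $string_equal"
```

The two operators disagree here because the values are numerically equal but spelled differently.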

At the very least, I want to propose 'lt', 'le', 'gt' and 'ge', to
complete the set of string comparison operators.

That alone is not enough. I also want to have some way of forcing a
numeric comparison.

One possibility is that the desired semantics can actually be achieved
in current code, by writing

    [expr {+$x < +$y}]

Since unary '+' requires a numeric argument, this expression will have
my desired effect of making sure that $x and $y are both numeric. If
we go this route, we should probably document this as an idiom in the
manual pages that discuss arithmetic expressions. I expect it to have
a significant impact on the performance of, for instance, code
compiled by tclquadcode, because in a form like:

    for {set x $a} {$x < $b} {incr x} { ... }

it may be impossible for the compiler to prove that $b is
numeric. Without such a guarantee, the compiler is forced to generate
code for a possible string comparison on each trip through the loop. I
have notes on how the test of $b's type might be detected as
loop-invariant and hoisted out of the inner loop, but have not tried
to implement such a thing, and the implementation looks decidedly
non-trivial. If we change the loop to

    for {set x $a} {+$x < +$b} {incr x} { ...}

the problem, while still non-trivial, is somewhat easier. There's no
possibility of unexpected comparison semantics. There will still be
redundant type checks on each iteration to decide whether unary '+'
must throw an error; again, I have notes on how these could be avoided
by unrolling the loop once.
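
To make the unary '+' idiom concrete, here is a rough sketch. The helper name numLess and the sample values are hypothetical, introduced only for this illustration; the sketch shows the behaviour at script level and says nothing about what tclquadcode generates.

```tcl
# Hypothetical helper: a less-than test that is forced to be numeric
# because unary '+' rejects non-numeric operands.
proc numLess {x y} {
    expr {+$x < +$y}
}

# Both operands are numeric, so this compares 1 with 16.
set ok [numLess 1 0x10]

# A non-numeric operand raises an error instead of silently
# falling back to string comparison.
set failed [catch {numLess 0x10 0y} msg]

# The loop from above, with the guarded condition.
set a 0
set b 3
set trips 0
for {set x $a} {+$x < +$b} {incr x} {
    incr trips
}

puts "ok=$ok failed=$failed trips=$trips"
```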

As an alternative, we might want to adopt a second syntax for numeric
comparisons. Something like [if {$a :<: $b} {...}] for forcing the
comparison to be numeric would work. This wouldn't save many
keystrokes, but might make the intent more obvious than a somewhat
arcane use of unary '+'. I'll let others argue about notation, but
take as a requirement that it has to be lexically compatible with
whatever we do in today's arithmetic expressions.

A more radical proposal might be, once 'lt' and friends are in place,
to deprecate the use of '<', '<=', etc. for non-numeric comparisons.
I've actually learnt to be fairly careful in Tcl code with such
things, and in my own code will usually write explicitly:

    if {[string compare $a $b] < 0} { ... }

if string comparison is intended, precisely to avoid surprises if $a
and $b both look like numbers. (That is, I already reserve <, <=, etc.
for numeric comparisons.) Given our conservatism about breaking
working code, no matter how weird the behaviour that it depends on, I
suspect this last idea is a bridge too far, even though I'd like to
see it.
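
A quick sketch of the kind of surprise I mean, with two sample values chosen for illustration:

```tcl
set a "10"
set b "9"

# Both operands look like numbers, so '<' compares 10 with 9.
set numeric [expr {$a < $b}]

# [string compare] looks only at the characters: "1" sorts before "9".
set lexical [expr {[string compare $a $b] < 0}]

puts "numeric: $numeric   lexical: $lexical"
```

The two tests give opposite answers for the same pair of values, which is why I spell out [string compare] whenever string ordering is what I want.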

So, what do people think? I'd be happy to draft a TIP to reflect a
consensus - consider this to be the skeleton of such a TIP.