Posted to tcl by aspect at Thu Mar 06 18:19:34 GMT 2014view pretty

# Tcl feels like a programmable debugger for itself.  The facilities for introspection
# and reflection are par excellence.  What seems sorely neglected though - especially because it's so close - is the few more building blocks for Abstract Interpretation.
#
# In AI, the goal is to run the program on a machine with simplified semantics.  For example:  I have a small script containing a number of procs for which I want to generate a complete call-graph, without having to run them all with every possible input while tracing.  In a page of code I can get most of the way there.
#
# Here's a simple test input:
#
#    set script {
#        proc a {} { b; c }
#        proc b {} { c }
#        proc c {} {
#            puts "Hello, world!"
#        }
#    }

proc K {x args} {set x}
proc readfile {fn} {K [read [set fd [open $fn r]]] [close $fd]}
set script [readfile [lindex $argv 0]]

interp create -safe i0
interp eval i0 {namespace delete ::}

namespace eval ai { ;# our abstract interpreter
    variable procs ;# an array
    proc collect_procs {name args body} {
        variable procs
        set procs($name) ""
    }
    proc trace_calls {name args body} {
        variable current_proc
        set current_proc $name
        try {
            i0 eval $body
        } finally {
            unset current_proc
        }
    }
    proc unknown {cmd args} {
        variable procs
        variable current_proc
        if {[info exists current_proc] && [info exists procs($cmd)]} {
            # track the call
            lappend procs($current_proc) $cmd
        }
    }
}

# 1st run: collect only proc statements
interp alias i0 unknown {} ai::unknown
interp alias i0 proc {} ai::collect_procs

i0 eval $script

# 2nd run: collect any calls to known procs
interp alias i0 proc {} ai::trace_calls
i0 eval $script

# procs(a) = b c    ;# indicates [a] calls [b] and [c]
parray ::ai::procs

# this works perfectly, except for a few details:
#
#  - core commands that take a script argument should be shimmed like [proc].  Trivial.  It's noteworthy that (for this analysis) in every case we want to "execute" (under AI) every script argument exactly once.  Callbacks are a bit more tricky (should an [apply] argument be evaluated?), but not that much .. and AI doesn't seek 100% replication of the host language - that would defeat the point.
#  - namespace resolution support.  Not exactly trivial, but a SMOP.  Anyone who's written procs using [uplevel] is aware of the sort of precautions that are needed here.
#  - variables.  This bit sucks.
#
# The slave interp doesn't give us any control over variable resolution (as it does over command resolution).  Variables need to exist in the enclosing scope before they are referenced, or an ugly error will be thrown, that we cannot recover from.  To address variables with existing machinery we would need to ensure [set], [lappend] and all its friends create the appropriate variables and even handle juggling scopes for [apply] and [proc] execution.  This is a lot of work ... particularly for this analysis, which simply wants to *ignore* variables!
# And for a more complex analysis using variables - taint analysis, simple type tracking - the AI would have to pass around structures (or handles) that every command must know about.  It significantly complicates the code.
#
# An answer?  Well, if i0 could be given a command for variable resolution (kind of like [unknown] for command resolution), this particular AI could be completed with two lines.  And more complex analyses would be well within reach.  With nice and simple code.
#