Posted to tcl by dgp at Tue Apr 14 14:24:44 GMT 2015

# demo.tcl
# Call as: tclsh demo.tcl $length $width $copies $iters
lassign $argv length width copies iters

# Build a $length x $width list of lists, then rebuild every row
# $copies-1 times.  After the first pass, each pass allocates
# $length fresh row structs and drops the previous pass's rows
# back to the allocator.
proc demo {length width copies} {
    set original [lrepeat $length [lrepeat $width {}]]
    while {[incr copies -1]} {
        set original [lmap _ $original {lrepeat $width {}}]
    }
}

# Time the same workload $iters-1 times within one process.
while {[incr iters -1]} {
    puts [time {demo $length $width $copies}]
}

$ tclsh demo.tcl 15000 1 2000 30
5527193 microseconds per iteration
5858040 microseconds per iteration
5994792 microseconds per iteration
6113862 microseconds per iteration
6195977 microseconds per iteration
6261455 microseconds per iteration
6318572 microseconds per iteration
6356606 microseconds per iteration
6397576 microseconds per iteration
6480433 microseconds per iteration
6449891 microseconds per iteration
6454701 microseconds per iteration
6480876 microseconds per iteration
6494438 microseconds per iteration
6506500 microseconds per iteration
6516628 microseconds per iteration
6529514 microseconds per iteration
6556522 microseconds per iteration
6531511 microseconds per iteration
6546579 microseconds per iteration
6551882 microseconds per iteration
6545872 microseconds per iteration
6560726 microseconds per iteration
6555647 microseconds per iteration
6563487 microseconds per iteration
6574076 microseconds per iteration
6588759 microseconds per iteration
6582920 microseconds per iteration
6583538 microseconds per iteration

The previous demo script ran into trouble: it failed to demonstrate
the issue when run against an 8.6.* build recent enough to optimize
constant lists in bytecode.  Building the lists from the run-time
$length and $width arguments keeps the construction out of reach of
that optimization.
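
One way to check what a particular Tcl build actually compiles is
8.6's unsupported disassembler.  A minimal sketch (whether the list
construction shows up as a single pushed literal depends on the build):

# Dump the bytecode a script body compiles to (Tcl 8.6):
puts [tcl::unsupported::disassemble script {
    lrepeat 5 {}
}]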

The essence here is to force the zippy allocator to
allocate a large number of same-sized structs, and then
free them all back, over and over and over.  This gets
slower up to a point and then seems to stabilize.
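
For scale, comparing the first pass against one of the late,
apparently stabilized passes in the transcript above:

# Last-ish timing over first timing, from the output above:
puts [expr {6583538.0 / 5527193}]   ;# ~1.19, i.e. roughly 19% slower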

Speculation is that when zippy first allocates memory, it does so in
a large block, and forms a free list (bucket) out of all the memory
in that block.  Drawing from that free list acts on localized memory,
so we get cache-line benefits, etc.  When the memory is freed, it
goes back onto the free lists, but perhaps not in order, and likely
interleaved with frees from other allocation times.  The effect is
that the free-list buckets get more "fragmented" (maybe a poor term),
and walking a free list tends to take longer because more often we
have to actually go out to main memory instead of luckily finding the
data already in the CPU cache.
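
A toy model of that guess, in plain Tcl with nothing to do with the
real allocator internals: carve a "block" into consecutive integer
"addresses", then return them in shuffled order, standing in for
frees that arrive interleaved and out of order.

# Toy model only: the "free list" is just a Tcl list of integers.
# A freshly carved block yields consecutive addresses; shuffled
# frees leave it scattered, the hypothesized source of the lost
# cache locality.
proc carve {base n} {
    set freelist {}
    for {set i 0} {$i < $n} {incr i} {
        lappend freelist [expr {$base + $i}]
    }
    return $freelist
}
proc shuffle {xs} {
    # Fisher-Yates shuffle, standing in for out-of-order frees
    for {set i [expr {[llength $xs] - 1}]} {$i > 0} {incr i -1} {
        set j [expr {int(rand() * ($i + 1))}]
        set tmp [lindex $xs $i]
        lset xs $i [lindex $xs $j]
        lset xs $j $tmp
    }
    return $xs
}
puts "fresh block : [carve 4096 8]"
puts "after frees : [shuffle [carve 4096 8]]"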

Don't have strong evidence for this.  Just my best guess.