Posted to tcl by dgp at Tue Apr 14 14:24:44 GMT 2015
# demo.tcl
# Call as: tclsh demo.tcl $length $width $copies $iters

lassign $argv length width copies iters

proc demo {length width copies} {
    set original [lrepeat $length [lrepeat $width {}]]
    while {[incr copies -1]} {
        set original [lmap _ $original {lrepeat $width {}}]
    }
}

while {[incr iters -1]} {
    puts [time {demo $length $width $copies}]
}

$ tclsh demo.tcl 15000 1 2000 30
5527193 microseconds per iteration
5858040 microseconds per iteration
5994792 microseconds per iteration
6113862 microseconds per iteration
6195977 microseconds per iteration
6261455 microseconds per iteration
6318572 microseconds per iteration
6356606 microseconds per iteration
6397576 microseconds per iteration
6480433 microseconds per iteration
6449891 microseconds per iteration
6454701 microseconds per iteration
6480876 microseconds per iteration
6494438 microseconds per iteration
6506500 microseconds per iteration
6516628 microseconds per iteration
6529514 microseconds per iteration
6556522 microseconds per iteration
6531511 microseconds per iteration
6546579 microseconds per iteration
6551882 microseconds per iteration
6545872 microseconds per iteration
6560726 microseconds per iteration
6555647 microseconds per iteration
6563487 microseconds per iteration
6574076 microseconds per iteration
6588759 microseconds per iteration
6582920 microseconds per iteration
6583538 microseconds per iteration

The previous demo script ran into trouble because it failed to demonstrate the issue when run against an 8.6.* release recent enough to optimize constant lists in bytecode. The essence here is to force the zippy allocator to allocate a large number of same-sized structs, and then free them all back, over and over and over. This gets slower up to a point and then seems to stabilize.

The speculation is that when zippy first allocates memory, it does so in a large block, and forms a free list (bucket) out of all the memory in that block. Drawing from that free list is acting on localized memory, so we get cache-line benefits, etc.
When the memory is freed, it goes back onto the free lists, but perhaps not in address order, and likely interleaved with memory freed from other allocation times. The effect is that the free-list buckets get more "fragmented" (maybe a poor term), and walking the free list tends to take longer because more often we have to actually go out and fetch memory instead of luckily finding it already in the CPU cache. I don't have strong evidence for this; it's just my best guess.