Posted to tcl by dgp at Tue Apr 14 14:24:44 GMT 2015
# demo.tcl
# Call as: tclsh demo.tcl $length $width $copies $iters
lassign $argv length width copies iters
proc demo {length width copies} {
    set original [lrepeat $length [lrepeat $width {}]]
    while {[incr copies -1]} {
        set original [lmap _ $original {lrepeat $width {}}]
    }
}
while {[incr iters -1]} {
    puts [time {demo $length $width $copies}]
}
$ tclsh demo.tcl 15000 1 2000 30
5527193 microseconds per iteration
5858040 microseconds per iteration
5994792 microseconds per iteration
6113862 microseconds per iteration
6195977 microseconds per iteration
6261455 microseconds per iteration
6318572 microseconds per iteration
6356606 microseconds per iteration
6397576 microseconds per iteration
6480433 microseconds per iteration
6449891 microseconds per iteration
6454701 microseconds per iteration
6480876 microseconds per iteration
6494438 microseconds per iteration
6506500 microseconds per iteration
6516628 microseconds per iteration
6529514 microseconds per iteration
6556522 microseconds per iteration
6531511 microseconds per iteration
6546579 microseconds per iteration
6551882 microseconds per iteration
6545872 microseconds per iteration
6560726 microseconds per iteration
6555647 microseconds per iteration
6563487 microseconds per iteration
6574076 microseconds per iteration
6588759 microseconds per iteration
6582920 microseconds per iteration
6583538 microseconds per iteration
The previous demo script ran into trouble because it failed to demonstrate the issue when run against an 8.6.* release recent enough to optimize constant lists in bytecode.

The essence here is to force the zippy allocator to allocate a large number of same-sized structs, and then free them all back, over and over and over. This gets slower up to a point and then seems to stabilize.

My speculation is that when zippy first allocates memory, it does so in one large block and forms a free list (bucket) out of all the memory in that block. Drawing from that free list acts on localized memory, so we get cache-line benefits, etc. When the memory is freed, it goes back onto the free lists, but perhaps not in address order, and likely interleaved with frees from other allocation times. The effect is that the free-list buckets become more "fragmented" (maybe a poor term), and walking a free list tends to take longer because more often we have to actually go out to main memory instead of luckily finding the block already in the CPU cache.

I don't have strong evidence for this; it's just my best guess.