Posted to tcl by dgp at Tue Apr 14 14:24:44 GMT 2015

# demo.tcl
# Call as: tclsh demo.tcl $length $width $copies $iters
lassign $argv length width copies iters

proc demo {length width copies} {
    # build a $length-element list whose elements are $width-element lists ...
    set original [lrepeat $length [lrepeat $width {}]]
    while {[incr copies -1]} {
        # ... then rebuild it over and over, freeing the previous copy each pass
        set original [lmap _ $original {lrepeat $width {}}]
    }
}

while {[incr iters -1]} {
    puts [time {demo $length $width $copies}]
}

$ tclsh demo.tcl 15000 1 2000 30
5527193 microseconds per iteration
5858040 microseconds per iteration
5994792 microseconds per iteration
6113862 microseconds per iteration
6195977 microseconds per iteration
6261455 microseconds per iteration
6318572 microseconds per iteration
6356606 microseconds per iteration
6397576 microseconds per iteration
6480433 microseconds per iteration
6449891 microseconds per iteration
6454701 microseconds per iteration
6480876 microseconds per iteration
6494438 microseconds per iteration
6506500 microseconds per iteration
6516628 microseconds per iteration
6529514 microseconds per iteration
6556522 microseconds per iteration
6531511 microseconds per iteration
6546579 microseconds per iteration
6551882 microseconds per iteration
6545872 microseconds per iteration
6560726 microseconds per iteration
6555647 microseconds per iteration
6563487 microseconds per iteration
6574076 microseconds per iteration
6588759 microseconds per iteration
6582920 microseconds per iteration
6583538 microseconds per iteration

The previous demo script ran into trouble because it failed to
demonstrate the issue when run against an 8.6.* release recent
enough to optimize constant lists in bytecode.
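
For illustration, here is my guess at the kind of thing that bit the
earlier script (assuming it built its lists from literals; the proc
names below are made up): a braced literal inside a proc body is stored
as a shared compiled constant, while [lrepeat] must build fresh list
storage on every call, which is exactly the churn this demo needs.

proc sharedLiteral {} {
    # literal list: lives in the proc's bytecode as a constant object,
    # so returning it over and over allocates essentially nothing new
    return {{} {} {} {}}
}
proc freshList {width} {
    # built at runtime: new list storage on every call
    return [lrepeat $width {}]
}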

The essence here is to force the zippy allocator to allocate a large
number of same-sized structs, and then free them all back, over and
over and over. This gets slower up to a point and then seems to
stabilize.
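
Stripped of the lmap plumbing, one round of that churn looks roughly
like the sketch below (a restatement for clarity, not new behavior;
proc and variable names are made up):

proc oneRound {length width} {
    # build a fresh batch of $length same-sized little lists
    set fresh {}
    for {set i 0} {$i < $length} {incr i} {
        lappend fresh [lrepeat $width {}]
    }
    return $fresh
}
set batch [oneRound 15000 1]
# each swap builds a new batch while the old one is still alive, then
# hands the old one back to the allocator, just as demo does with lmap
puts [time {set batch [oneRound 15000 1]} 100]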

Speculation is that when zippy first allocates memory, it does so in a
large block and forms a free list (bucket) out of all the memory in
that block. Drawing from that free list acts on localized memory, so
we get cache line benefits, etc. When the memory is freed, it goes
back onto the free lists, but perhaps not in order, and likely
interleaved with memory freed at other times. The effect is that the
free list buckets get more "fragmented" (maybe a poor term) and
walking a free list tends to take longer, because more often we have
to actually go out and fetch memory instead of luckily finding it
already in the CPU cache.
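
As a toy picture of that (a model of my own, not zippy's actual code),
think of the bucket's free list as a list of block addresses: it starts
out in address order, but after a round of churn it holds the same
blocks in whatever order they happened to come back.

# fresh bucket: free blocks are adjacent and in address order
set freeList {0 1 2 3 4 5 6 7}

# hand them all out, then push each block back on the head of the list
# in the (interleaved) order the program happens to release them
set freeList {}
foreach addr {3 0 6 1 7 2 5 4} {
    set freeList [linsert $freeList 0 $addr]
}
puts $freeList   ;# 4 5 2 7 1 6 0 3 -- later draws hop around in memory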

Don't have strong evidence for this. Just my best guess.