Posted to tcl by sebres at Mon Nov 18 18:15:10 GMT 2019

tclThreadAlloc vs. CPU-cache (fragmentation/granularity/etc)...

Issue: the longer Tcl runs, the slower it becomes...

The simplest example illustrating this:

proc test {} {timerate -calibrate {incr i} 10000 10000000}; test; # calibrate overhead for incr
proc test args {
    puts "set: [timerate { set a([incr i]) _ } {*}$args]"; # create new elements in array a
    lset args 1 $i; set i 0; # cap the get pass at the number of elements just created
    puts "get: [timerate { set a([incr i]) } {*}$args]"; # read the same elements back
}
time { test 10000 1000000 } 50

* 1x-threaded:
set: 0.153809 µs/# 1000000 # 6501570 #/sec 153.809 net-ms
get: 0.049532 µs/# 1000000 # 20188968 #/sec 49.532 net-ms
... 100 times ...
set: 0.598031 µs/# 1000000 # 1672154 #/sec 598.031 net-ms
get: 0.331693 µs/# 1000000 # 3014836 #/sec 331.693 net-ms
* 16x-threaded:
set * 16x: 0.573478 µs/# 4000000 # 291248 #/sec 2293.913 net-ms
get * 16x: 0.077514 µs/# 4000000 # 14923264 #/sec 310.059 net-ms
... 20 times ...
set * 16x: 3.539208 µs/# 4000000 # 304646 #/sec 14156.834 net-ms
get * 16x: 0.519618 µs/# 4000000 # 2048606 #/sec 2078.474 net-ms

I fixed this "wrong" behaviour in my own threaded-alloc module, which prefers moving whole free pages over moving single objects:

* 1x-threaded:
set : 0.140644 µs/# 250000 # 7110150 #/sec 35.161 net-ms
get : 0.018572 µs/# 250000 # 53844497 #/sec 4.643 net-ms
... 100 times ...
set : 0.083600 µs/# 250000 # 11961722 #/sec 20.900 net-ms
get : 0.017704 µs/# 250000 # 56484410 #/sec 4.426 net-ms
* 16x-threaded:
set * 16x: 0.810806 µs/# 4000000 # 129652 #/sec 3243.227 net-ms
get * 16x: 0.556715 µs/# 4000000 # 246766 #/sec 2226.860 net-ms
... 20 times ...
set * 16x: 0.665778 µs/# 4000000 # 1671624 #/sec 2663.113 net-ms
get * 16x: 0.343585 µs/# 4000000 # 3428775 #/sec 1374.340 net-ms
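
To put a number on "slower", the first and last measurement of each series above can be compared directly. This is only a rough sketch: `slowdown` is a hypothetical helper (not part of Tcl or of either allocator), and it assumes the first element of a timerate result is the average time per iteration in µs, as in the listings above.

```tcl
# Hypothetical helper: ratio of per-iteration times between two measurements.
# Accepts either a whole timerate result or just the bare µs/# value,
# because [lindex $x 0] yields the first element in both cases.
proc slowdown {first last} {
    format %.2f [expr {[lindex $last 0] / [lindex $first 0]}]
}

# µs/# values copied from the listings above: label, first run, last run.
foreach {label first last} {
    "stock 1x set"    0.153809 0.598031
    "stock 1x get"    0.049532 0.331693
    "stock 16x set"   0.573478 3.539208
    "stock 16x get"   0.077514 0.519618
    "patched 1x set"  0.140644 0.083600
    "patched 1x get"  0.018572 0.017704
    "patched 16x set" 0.810806 0.665778
    "patched 16x get" 0.556715 0.343585
} {
    puts [format "%-16s %sx" $label [slowdown $first $last]]
}
```

With the posted numbers, every stock tclThreadAlloc series ends up between roughly 3.9x and 6.7x slower than its first run, while every ratio for the patched module stays below 1x, i.e. no series degrades over the repeated runs.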