Posted to tcl by dbohdan at Sat Feb 02 20:51:17 GMT 2019
#include <stdio.h>
#include <tcl.h>
/* https://github.com/sheredom/utf8.h */
#include "utf8.h"

int main(void) {
    printf("%i\n", Tcl_NumUtfChars("Hello 🌍!", -1)); /* 11 */
    printf("%zu\n", utf8len("Hello 🌍!")); /* 8 */
    return 0;
}
Comments
Posted by sebres at Wed Feb 06 20:53:02 GMT 2019
Characters outside the BMP are basically supported only since 8.7:

# <= tcl8.6:
% string length [encoding convertfrom utf-8 "Hello \xF0\x9F\x8C\x8D!"]
11
# >= tcl8.7:
% string length [encoding convertfrom utf-8 "Hello \xF0\x9F\x8C\x8D!"]
9

The one extra char that 8.7 counts here (compared to utf8len) comes from such glyphs being "wrongly" recognized as having length 2. That is the case with this globe character, which really is 2 "chars" in the sense of a console/stream (and one will indeed see 2 characters if the font does not support it):

% string length \U1F30D
2
% puts \U1F30D
🌍