Posted to tcl by dbohdan at Sat Feb 02 20:51:17 GMT 2019

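#include <stdio.h>
#include <tcl.h>
/* https://github.com/sheredom/utf8.h */
#include "utf8.h"

int main(void) {
    /* "Hello " + U+1F30D (globe, F0 9F 8C 8D in UTF-8) + "!" = 11 bytes */
    printf("%i\n", Tcl_NumUtfChars("Hello \xF0\x9F\x8C\x8D!", -1)); /* 11 */
    printf("%zu\n", utf8len("Hello \xF0\x9F\x8C\x8D!")); /* 8 */
    return 0;
}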

Comments

Posted by sebres at Wed Feb 06 20:53:02 GMT 2019

Characters outside the BMP are basically supported only since 8.7:

# <= tcl8.6:
% string length [encoding convertfrom utf-8 "Hello \xF0\x9F\x8C\x8D!"]
11
# >= tcl8.7:
% string length [encoding convertfrom utf-8 "Hello \xF0\x9F\x8C\x8D!"]
9

The one extra char that 8.7 counts here (compared to utf8len) comes from the "wrong" length (2) it reports for such glyphs, like this globe character. It really is 2 "chars" in the sense of the console/stream, and you will indeed see 2 characters if the font does not support it:

% string length \U1F30D
2
% puts \U1F30D
🌍