Posted to tcl by dbohdan at Sat Feb 02 20:51:17 GMT 2019
#include <stdio.h>
#include <tcl.h>
/* https://github.com/sheredom/utf8.h */
#include "utf8.h"

int main(void) {
    /* "\xF0\x9F\x8C\x8D" is the UTF-8 encoding of U+1F30D (globe) */
    printf("%i\n", Tcl_NumUtfChars("Hello \xF0\x9F\x8C\x8D!", -1)); /* 11 */
    printf("%zu\n", utf8len("Hello \xF0\x9F\x8C\x8D!")); /* 8 */
    return 0;
}
Comments
Posted by sebres at Wed Feb 06 20:53:02 GMT 2019
Characters outside the BMP are basically supported only since 8.7:

    # <= tcl8.6:
    % string length [encoding convertfrom utf-8 "Hello \xF0\x9F\x8C\x8D!"]
    11
    # >= tcl8.7:
    % string length [encoding convertfrom utf-8 "Hello \xF0\x9F\x8C\x8D!"]
    9

The one extra char counted by 8.7 here (compared to utf8len) comes from the "wrong" length (2) that it reports for such glyphs. Take this globe char, for example (it really is 2 "chars" in the sense of the console/stream, and one will indeed see 2 characters if the font does not support it):

    % string length \U1F30D
    2
    % puts \U1F30D
    🌍
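To make the three numbers (11, 9 and 8) easier to compare, here is a minimal sketch in plain C that reproduces each count without Tcl or utf8.h. The function names are made up for illustration, the input is assumed to be valid UTF-8, and the "legacy" rule (every byte of a 4-byte sequence counts as its own char) is only my reading of the 8.6 result above, not something taken from the Tcl sources.

```c
#include <stddef.h>

/* Code points: every byte that is not a continuation byte (10xxxxxx)
   starts a new code point. This is the count utf8len reports above. */
size_t count_code_points(const char *s) {
    size_t n = 0;
    for (; *s; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80) n++;
    }
    return n;
}

/* UTF-16 code units: a 4-byte sequence (lead byte 11110xxx) encodes a
   code point above U+FFFF, which takes two units (a surrogate pair). */
size_t count_utf16_units(const char *s) {
    size_t n = 0;
    for (; *s; s++) {
        unsigned char c = (unsigned char)*s;
        if ((c & 0xC0) == 0x80) continue;      /* continuation byte */
        n += ((c & 0xF8) == 0xF0) ? 2 : 1;
    }
    return n;
}

/* Assumed "legacy" rule: each byte of a 4-byte sequence counts as a
   separate char; everything else counts once per code point. */
size_t count_legacy(const char *s) {
    size_t n = 0;
    while (*s) {
        unsigned char c = (unsigned char)*s;
        if ((c & 0xF8) == 0xF0) {
            int i;
            /* stop early if the string is truncated mid-sequence */
            for (i = 0; i < 4 && s[i]; i++) n++;
            s += i;
        } else {
            if ((c & 0xC0) != 0x80) n++;
            s++;
        }
    }
    return n;
}
```

For "Hello \xF0\x9F\x8C\x8D!" these return 8, 9 and 11 respectively: 6 chars for "Hello ", then 1, 2 or 4 for the globe, then 1 for "!".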