There's a limit of 65535 glyphs per OpenType font, and Unicode 15.1 has 150249 graphic characters, of which 98682 are Han (you need 2 fonts for those alone), which leaves 51567 for everything else. However, Indic scripts may require many ligatures, and (U)CSUR is tacitly endorsed as a method to use IP-encumbered scripts.
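The budget arithmetic is simple enough to check in a few lines. This is a minimal Python sketch that takes the 65535-glyph cap and the Unicode 15.1 counts quoted above as given. `fonts_needed` is a hypothetical helper that ignores .notdef and any extra glyphs a shaper would need.

```python
import math

GLYPH_LIMIT = 65535  # glyph cap per OpenType font, as stated in the text

def fonts_needed(glyphs: int, limit: int = GLYPH_LIMIT) -> int:
    """Minimum number of fonts holding `glyphs` glyphs (ignores .notdef and ligature glyphs)."""
    return math.ceil(glyphs / limit)

GRAPHIC_15_1 = 150249  # graphic characters in Unicode 15.1 (figure from the text)
HAN_15_1 = 98682       # Han characters among them (figure from the text)

non_han = GRAPHIC_15_1 - HAN_15_1
print(non_han)                 # 51567
print(fonts_needed(HAN_15_1))  # 2
print(fonts_needed(non_han))   # 1
```

The single font for non-Han is only a lower bound, since ligatures and presentation forms also consume glyph IDs.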
Alternatively, one can pretend that Unicode 3.0 from 1999, with 49168 graphic characters all in the BMP, is the last version, or pretend that Plane 2 doesn't exist and stop at 9.0 from 2016, when the annoyingly large Tangut was added to Plane 1, but either cutoff gets outdated pretty quickly. If you ignore Tangut, Hieroglyphs, Cuneiform, and Bamum Supplement, you get basically Unifont-EX, which plateaus at 15.1 for the BMP and 11.0 for the SMP, but with some glyph deduplication it could also fit the 16.0 Symbols for Legacy Computing and its Supplement.
Future Unicode versions will have more than 64K non-Han characters, so it is better to choose interesting scripts and blocks now.
Hangul is best served by an advanced OpenType syllable-composing system, because far more syllables are possible than the 11172 precomposed ones: 125*95*137=1626875, or 128*99*141=1786752 with the North Korean extensions. That many precomposed syllables would need 25 or 28 fonts and don't even fit into Unicode, which is limited to 10:FFFFh, but may just fit into UTF-8 capped at 4 bytes, which ends at 1F:FFFFh. Time to define FDD0..FDEF as UTF-16 super-surrogate 8-plets to reach the full 32 bits of UCS-4.
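The 11172 precomposed syllables come from the algorithmic composition in the Unicode Standard (section 3.12), which only covers 19 initials, 21 medials, and 28 finals (counting "no final"). This is a minimal Python sketch of that formula alongside the counts above. The extended jamo inventories are taken from the text as given, and `syllables` and `compose` are illustrative names, not a library API.

```python
import math
from typing import Optional

GLYPH_LIMIT = 65535
UNICODE_END = 0x10FFFF     # last Unicode code point
UTF8_4BYTE_END = 0x1FFFFF  # largest value a 4-byte UTF-8 sequence can encode

# Algorithmic Hangul syllable composition (Unicode Standard, section 3.12).
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28  # modern jamo only; T_COUNT includes "no final"

def compose(l: int, v: int, t: Optional[int] = None) -> int:
    """Map modern jamo code points (initial, medial, optional final) to a precomposed syllable."""
    t_index = 0 if t is None else t - T_BASE
    return S_BASE + ((l - L_BASE) * V_COUNT + (v - V_BASE)) * T_COUNT + t_index

def syllables(l: int, v: int, t: int) -> int:
    """Number of initial/medial/final combinations."""
    return l * v * t

precomposed = syllables(L_COUNT, V_COUNT, T_COUNT)  # 11172
extended = syllables(125, 95, 137)                  # jamo counts from the text
north_korean = syllables(128, 99, 141)              # jamo counts from the text

for n in (extended, north_korean):
    # count, fonts needed, too big for Unicode?, fits numerically under 200000h?
    print(n, math.ceil(n / GLYPH_LIMIT), n > UNICODE_END + 1, n <= UTF8_4BYTE_END + 1)

print(hex(compose(0x1112, 0x1161, 0x11AB)))  # 0xd55c, HANGUL SYLLABLE HAN
```

The loop prints 25 and 28 fonts, and both counts exceed 110000h while staying under 200000h. That only says the numbers fit, ignoring every code point already assigned.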
PUA assignment is based on Fairfax and Constructium. The E000~F8FF range should be considered mostly (inter)nationalized according to UCSUR, as it contains the much-needed tlhIngan pI'qaD and Tengwar. The Trekkies and Tolkienists are a stronger user base than medievalists and linguists. Most of MUFI, CYFI and SIL has been incorporated into Unicode, and the leftovers are mostly ligatures, variants, stylistic sets, or precomposed characters. There is also the SMuFL PUA agreement, but that is mostly getting into Unicode too, and I'm a tracker, pianoroll, and ASCII tab guy anyway. Also, Nerd Fonts have finally fixed their icons overflowing into Arabic Presentation Forms by moving them to the astral PUA, so they no longer mess with Quran text from tanzil.net; however, Powerline still conflicts with Tengwar (not Cirth, though).
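Icons that spill past U+F8FF leave the Private Use Area and land on real characters. This is a minimal Python sketch that checks code points against the three PUAs defined by Unicode. The icon range in it is hypothetical and not the actual Nerd Fonts layout, and `pua_area` and `spillover` are illustrative names.

```python
import unicodedata
from typing import Optional

# Private Use Areas as defined by the Unicode Standard.
PUA_RANGES = [
    ("BMP PUA", 0xE000, 0xF8FF),
    ("Supplementary PUA-A", 0xF0000, 0xFFFFD),
    ("Supplementary PUA-B", 0x100000, 0x10FFFD),
]

def pua_area(cp: int) -> Optional[str]:
    """Return the name of the Private Use Area containing cp, or None if cp is not private use."""
    for name, lo, hi in PUA_RANGES:
        if lo <= cp <= hi:
            return name
    return None

def spillover(first: int, last: int) -> list:
    """Code points in [first, last] outside every PUA, i.e. ones that can collide with real text."""
    return [cp for cp in range(first, last + 1) if pua_area(cp) is None]

# Hypothetical icon block that starts inside the BMP PUA but runs 0x60 code points past U+F8FF.
spill = spillover(0xF8C0, 0xF95F)
print(len(spill), hex(spill[0]))          # 96 0xf900
print(unicodedata.category(chr(0xFB50)))  # Lo: an Arabic presentation form, not private use
print(pua_area(0xF0001))                  # Supplementary PUA-A
```

`unicodedata.category` reports `Co` for every private-use code point, so it can double as a cross-check for the hard-coded ranges.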
Begin End Name Size Running total
000000 0033FF Lower BMP 3400 3400
004DC0 004DFF Yijing Hexagrams 0040 3440
00A4D0 00ABFF Middle BMP 0730 3B70
00D7B0 00D7FF Hangul Jamo Extended-B 0050 3BC0
00E000 00EDFF Lower UCSUR 0E00 49C0
00EF00 00EFFF Hex Byte Pictures 0100 4AC0
00F000 00F1FF Kamakawi 0200 4CC0
00F200 00F27F Box Drawing Ext, Fill Patterns, Shade Quads 0080 4D40
00F400 00F43F C1 Control Pictures 0040 4D80
00F4C0 00F4EF Ath 0030 4DB0
00F500 00F54F Kodo Symbols 0050 4E00
00F550 00F55F Mathematical Symbols Appendix 0010 4E10
00F560 00F56F Camp Duodecimal Numerals 0010 4E20
00F580 00F58F Geomantic Figures 0010 4E30
00F590 00F5FF C64-OS and Commander X16 Symbols 0070 4EA0
00F600 00F7FF Adobe: LGC Compatibility Forms 0200 50A0
00F800 00F83F Apple: Hoefler Ornaments 0040 50E0
00F880 00F89F Adobe: Thai Compatibility Forms 0020 5100
00F8A0 00F8FF UCSUR: Aiha and Klingon 0060 5160
00FB00 00FFFF Upper BMP 0500 5660
010000 012FFF Lower SMP, Cuneiform 3000 8660
013000 015AFF Egyptian, Anatolian, and Mayan Hieroglyphs 2B00 B160
016000 0160FF Cirth and Tengwar (no Mandombe) 0100 B260
016140 0161FF Sarati, other Tolkien scripts, and Moon 00C0 B320
016200 0167FF Blissymbols 0600 B920
016EF0 016EFF Bopomofo Ext-A, Kanbun Ext-A, Ideographic Symbols and Punctuation 0060 B980
01A760 01A77F Rejang Extended 0020 B9A0
01AFD0 01AFFF Kana Extended-C and B 0030 B9D0
01B000 01B16F Kana Supplement, Kana Ext-A, Small Kana Ext 0170 BB40
01BA00 01BCFF Indus, Shorthands (RIP Rongorongo) 0300 BE40
01CC00 01CEFF Symbols for Legacy Computing Supplement 0300 C140
01D100 01D24F Musical Symbols, Ancient Greek Musical Notation 0150 C290
01D2C0 01D2FF Kaktovik and Mayan Numerals 0040 C2D0
01D300 01D37F Tai Xuan Jing Symbols, Counting Rod Numerals 0080 C350
01D400 01D7FF Mathematical Alphanumeric Symbols 0400 C750
01D800 01DAAF Sutton SignWriting 02B0 CA00
01DF00 01E08F Latin Ext-G, Glagolitic Sup, Cyrillic Ext-D 0190 CB90
01E7E0 01E7FF Buginese Sup, Lontara B-B, Ethiopic Ext-B 0090 CC20
01E900 01E95F Adlam 0060 CC80
01EC00 01FFFF Upper SMP 1400 E080
0F0000 0F1C3F Upper UCSUR 1C40 FCC0
0FF030 0FF0DF Domino Tiles Extended, Powerline Symbols 00B0 FD70
0FE000 0FE07F Tengwar Presentation Forms 0080 FDF0
0FE680 0FE6DF Ewellic Presentation Forms 0060 FE50
0FF380 0FF3FF Tahano Veno and Aliphbeph 0080 FED0
0FF400 0FF51F Voynich 0120 FFF0
0FF700 0FF7FF 7 Segment Display Patterns 0100 100F0
0FF900 0FFEFF Sitelen Pona Presentation Forms-A,B 0300 103F0
0FFF00 0FFFFF Symbols for Legacy Computing Appendix 0100 104F0
I am 4F0h = 1264 code points over 65536 in blocks, but some are intentionally oversized for potential expansion (Cuneiform, Hieroglyphs), some are only proposals (Indus, Blissymbols), there are gaps in the allocated space (1300 code points in the BMP alone), and some characters look the same and can share a glyph. Then there are the Indic scripts, which would have to make do with viramas instead of ligatures.
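Running totals are easy to get wrong by hand. This is a minimal Python sketch that sums the Size column of the block table, copied row by row and grouped by plane, and reports the overrun against the 65536 budget. It only checks the arithmetic and assumes the Size column, not the Begin/End columns, is authoritative where the two disagree.

```python
from itertools import accumulate

# Size column of the block table, one entry per row, grouped by plane.
BMP = [0x3400, 0x40, 0x730, 0x50, 0xE00, 0x100, 0x200, 0x80, 0x40, 0x30,
       0x50, 0x10, 0x10, 0x10, 0x70, 0x200, 0x40, 0x20, 0x60, 0x500]
SMP = [0x3000, 0x2B00, 0x100, 0xC0, 0x600, 0x60, 0x20, 0x30, 0x170, 0x300,
       0x300, 0x150, 0x40, 0x80, 0x400, 0x2B0, 0x190, 0x90, 0x60, 0x1400]
PLANE_15 = [0x1C40, 0xB0, 0x80, 0x60, 0x80, 0x120, 0x100, 0x300, 0x100]

BUDGET = 0x10000  # the 65536 budget the text compares against

running = list(accumulate(BMP + SMP + PLANE_15))  # cumulative total after each row
total = running[-1]
overrun = total - BUDGET

print(f"BMP {sum(BMP):X}h, SMP {sum(SMP):X}h, plane 15 {sum(PLANE_15):X}h")
print(f"total {total:X}h, over budget by {overrun:X}h = {overrun}")
```

This prints `BMP 5660h, SMP 8A20h, plane 15 2470h` and `total 104F0h, over budget by 4F0h = 1264`. Comparing `running` row by row against a hand-computed column pinpoints the first row where the sum drifted.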