Character decomposition
From CharacterDB
Many characters can be decomposed into two or more components, which are themselves Chinese characters. Character decompositions have many uses, including character lookup, finding patterns for font design, and studying characters. Even a character's stroke order and stroke count can be deduced from the stroke information of its components.
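As a rough illustration of the stroke-count case, the sketch below sums component stroke counts from a small hand-written table. `STROKE_COUNTS` and `stroke_count_from_components` are hypothetical names, not part of CharacterDB. The sketch assumes the table stores the count of each component in the form it takes inside the compound: 手 has four strokes, but its left-side form 扌 in 打 has three, so summing standalone counts can give the wrong result.

```python
# Hypothetical sample data: stroke counts of a few components.
STROKE_COUNTS = {"女": 3, "子": 3, "木": 4, "目": 5}


def stroke_count_from_components(components: list[str]) -> int:
    """Estimate a character's stroke count as the sum of its components' counts."""
    return sum(STROKE_COUNTS[c] for c in components)


print(stroke_count_from_components(["女", "子"]))  # 好: 6
print(stroke_count_from_components(["木", "目"]))  # 相: 9
```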
A character decomposition depends on the appearance of the character, i.e. its glyph. A decomposition into components is often not unique; see e.g. 亘. Furthermore, sometimes only one component can be given, because the other component is not encoded in Unicode as a character in its own right.
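Because one glyph can have several valid decompositions, a decomposition table has to store a list of alternatives per character rather than a single entry. The sketch below assumes a plain dictionary with the hypothetical name `DECOMPOSITIONS`. Both entries for 亘 describe the same glyph: once as 一 above 旦, and once as 一, 日 and 一 stacked vertically.

```python
# Hypothetical decomposition table: one glyph may have several valid IDS.
# A real database would probably also key entries by glyph variant (assumption).
DECOMPOSITIONS: dict[str, list[str]] = {
    "亘": ["⿱一旦", "⿳一日一"],
    "好": ["⿰女子"],
}


def decompositions(char: str) -> list[str]:
    """Return every stored decomposition of a character (empty if none is known)."""
    return DECOMPOSITIONS.get(char, [])


for ids in decompositions("亘"):
    print(ids)
```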
Components may themselves be characters that contain further components, so a character can be decomposed in several steps.
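A minimal sketch of multi-step decomposition follows. It assumes a hypothetical table `DIRECT_COMPONENTS` that maps each character only to its direct components, and the hypothetical function `leaf_components` expands these recursively. The entries 想 → 相 + 心 and 相 → 木 + 目 are illustrative sample data.

```python
# Hypothetical one-level decompositions: character -> its direct components.
DIRECT_COMPONENTS = {
    "想": ["相", "心"],
    "相": ["木", "目"],
}


def leaf_components(char: str) -> list[str]:
    """Expand a character recursively until only undecomposed components remain."""
    parts = DIRECT_COMPONENTS.get(char)
    if not parts:
        return [char]  # no known decomposition: treat the character as a leaf
    leaves: list[str] = []
    for part in parts:
        leaves.extend(leaf_components(part))
    return leaves


print(leaf_components("想"))  # ['木', '目', '心']
```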
Decompositions are stored using Ideographic Description Sequences (IDS). These sequences consist of Unicode IDS operators and characters that describe the structure of the character. There are binary IDS operators for a decomposition into two components (e.g. ⿰ for one component on the left and one on the right, as in 好: ⿰女子) and ternary IDS operators for a decomposition into three components (e.g. ⿲ for three components from left to right, as in 辨: ⿲⾟刂⾟). IDS operators give basic structural information which, in many cases, is sufficient to derive the stroke order of the whole character from the stroke orders of its components.
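Since every operator takes a fixed number of components and precedes them, an IDS is a prefix expression and can be parsed into a tree. The sketch below handles only the twelve basic operators ⿰ to ⿻, two of which (⿲ and ⿳) are ternary. The names `parse_ids`, `Node` and `components_in_order` are illustrative, not an existing library API.

```python
from dataclasses import dataclass
from typing import Union

# Arity of the twelve basic IDS operators (U+2FF0..U+2FFB).
BINARY_OPERATORS = set("⿰⿱⿴⿵⿶⿷⿸⿹⿺⿻")
TERNARY_OPERATORS = set("⿲⿳")


@dataclass
class Node:
    """An IDS operator applied to its components (characters or nested nodes)."""
    operator: str
    children: list["Tree"]


Tree = Union[str, Node]


def parse_ids(ids: str) -> Tree:
    """Parse an IDS written in prefix notation into a tree."""
    pos = 0

    def parse() -> Tree:
        nonlocal pos
        if pos >= len(ids):
            raise ValueError(f"IDS ends too early: {ids!r}")
        ch = ids[pos]
        pos += 1
        if ch in BINARY_OPERATORS:
            arity = 2
        elif ch in TERNARY_OPERATORS:
            arity = 3
        else:
            return ch  # a plain component character
        return Node(ch, [parse() for _ in range(arity)])

    tree = parse()
    if pos != len(ids):
        raise ValueError(f"trailing characters in IDS: {ids!r}")
    return tree


def components_in_order(tree: Tree) -> list[str]:
    """List the leaf components in the order they appear in the IDS."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree.children for leaf in components_in_order(child)]


print(parse_ids("⿰女子"))
print(components_in_order(parse_ids("⿱⿰木目心")))  # 想: ['木', '目', '心']
```

For ⿰, ⿱, ⿲ and ⿳ the order returned by `components_in_order` matches the order in which the components are written, so concatenating their stroke orders in that sequence gives the character's stroke order. Surrounding structures need special handling: in 返 (⿺) the component 反 is written before 辶, and in 困 (⿴) the closing bottom stroke of 囗 is written last.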
| TODO | Describe IDS structure and derivation from the definition by Unicode. |
| TODO | Transfer decomposition guidelines from http://code.google.com/p/cjklib/wiki/Decomposition |
Character structures
| Structure | Description | Example |
|---|---|---|
| ⿰ | left-right | 打 |
| ⿱ | above-below | 它 |
| ⿲ | left-middle-right | 辦 |
| ⿳ | above-middle-below | 京 |
| ⿴ | full surround | 困 |
| ⿵ | surround from above | 向 |
| ⿶ | surround from below | 凶 |
| ⿷ | surround from left | 匹 |
| ⿸ | surround from upper left | 后 |
| ⿹ | surround from upper right | 句 |
| ⿺ | surround from lower left | 返 |
| ⿻ | overlaid | 中 |
| TODO | Give meaningful examples, describe their usage, and link to similar IDS operators that should not be confused with them. |
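The operators in the table correspond to the Unicode characters U+2FF0 to U+2FFB, whose official names can be read with Python's standard `unicodedata` module. The sketch below lists each operator with its code point, name and arity (only ⿲ and ⿳ take three components). Newer Unicode versions define further description characters outside this range, which the sketch ignores; `ids_operators` is a hypothetical helper name.

```python
import unicodedata

# Code points of the two ternary operators; all others in the range are binary.
TERNARY = {0x2FF2, 0x2FF3}


def ids_operators() -> list[tuple[str, str, int]]:
    """Return (operator, Unicode name, arity) for each IDS operator U+2FF0..U+2FFB."""
    rows = []
    for cp in range(0x2FF0, 0x2FFC):
        arity = 3 if cp in TERNARY else 2
        rows.append((chr(cp), unicodedata.name(chr(cp)), arity))
    return rows


for op, name, arity in ids_operators():
    print(f"{op}  U+{ord(op):04X}  arity {arity}  {name}")
```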

