The canonical model is the format-neutral centre of astilba. Every file/syntax adapter maps
files to it on the way in (parse) and from it on the way out (export). It lives in
@astilba/core and knows nothing about i18next, ICU, JSON, or YAML.
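Here is a minimal sketch of an adapter contract that shows this split. The `Adapter` interface, its `parse`/`export` signatures, and the toy `flatJsonAdapter` (for flat JSON files that hold only plain strings) are hypothetical illustrations, not the @astilba/core API. The model types are trimmed to what the toy needs.

```typescript
// Trimmed stand-ins for the canonical types (simplified: plain values only).
type Value = { raw: string };
type PluralSet = { kind: "none"; values: Map<string, Value> };
type Key = { namespace: string; base: string; contexts: Map<string, PluralSet> };
type CanonicalModel = { language: string; keys: Map<string, Key> };

// Hypothetical adapter contract: file text in, canonical model out, and back again.
interface Adapter {
  parse(file: string, language: string, namespace: string): CanonicalModel;
  export(model: CanonicalModel, namespace: string): string;
}

// Toy adapter for a flat JSON object of strings (no plurals, no contexts).
const flatJsonAdapter: Adapter = {
  parse(file, language, namespace) {
    const data = JSON.parse(file) as Record<string, string>;
    const keys = new Map<string, Key>();
    Object.keys(data).forEach((base) => {
      // A plain string becomes a "none" set with its single value under "other".
      const set: PluralSet = { kind: "none", values: new Map([["other", { raw: data[base] }]]) };
      keys.set(`${namespace}:${base}`, { namespace, base, contexts: new Map([["", set]]) });
    });
    return { language, keys };
  },
  export(model, namespace) {
    const out: Record<string, string> = {};
    model.keys.forEach((key) => {
      if (key.namespace !== namespace) return;
      const value = key.contexts.get("")?.values.get("other");
      if (value) out[key.base] = value.raw; // raw is written verbatim
    });
    return JSON.stringify(out, null, 2);
  },
};
```

Everything format-specific (here, `JSON.parse` and `JSON.stringify`) stays inside the adapter. The model in between only ever sees keys, contexts, and raw values.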
The shape
A CanonicalModel holds one language’s worth of data. Each Key is a logical message: a
base key path with one entry per context value. Each context cell holds a PluralSet,
which carries the per-CLDR-category values.
```ts
interface CanonicalModel {
  language: string; // BCP-47, e.g. "en", "en-US", "pt-BR"
  keys: Map<string, Key>; // `${namespace}:${base}` -> Key
}

interface Key {
  namespace: string;
  base: string; // key path (project key separator, default ".") without namespace or plural/context suffixes
  contexts: Map<string, PluralSet>; // "" === no context
}

interface PluralSet {
  kind: PluralKind; // "none" | "cardinal" | "ordinal"
  values: Map<CLDRCategory, Value>; // for kind "none", the single value is stored under "other"
  bare?: Value; // rare: a suffix-less form alongside plural forms
}

interface Value {
  raw: string; // byte-exact source text; the source of truth
  tokens: ValueToken[]; // derived view, used for masking/validation
}
```
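As an illustration, suppose i18next stores an English message as `item_one` / `item_other` in namespace `common`. It might populate the model as shown below. The declarations restate the interfaces above so the snippet stands alone. The `ValueToken` shape (a kind tag plus a `raw` slice) and the hand-written tokenisation are assumptions.

```typescript
type PluralKind = "none" | "cardinal" | "ordinal";
type CLDRCategory = "zero" | "one" | "two" | "few" | "many" | "other";
// Assumed token shape: a kind tag plus the exact source slice.
type ValueToken = { kind: "text" | "interpolation" | "nesting" | "markup"; raw: string };

interface Value { raw: string; tokens: ValueToken[] }
interface PluralSet { kind: PluralKind; values: Map<CLDRCategory, Value>; bare?: Value }
interface Key { namespace: string; base: string; contexts: Map<string, PluralSet> }
interface CanonicalModel { language: string; keys: Map<string, Key> }

// English source: item_one = "One item", item_other = "{{count}} items".
const one: Value = { raw: "One item", tokens: [{ kind: "text", raw: "One item" }] };
const other: Value = {
  raw: "{{count}} items",
  tokens: [
    { kind: "interpolation", raw: "{{count}}" },
    { kind: "text", raw: " items" },
  ],
};

const item: Key = {
  namespace: "common",
  base: "item", // no _one/_other suffix: the plural forms live in the PluralSet
  contexts: new Map<string, PluralSet>([
    ["", { kind: "cardinal", values: new Map<CLDRCategory, Value>([["one", one], ["other", other]]) }],
  ]),
};

const model: CanonicalModel = {
  language: "en",
  keys: new Map<string, Key>([[`${item.namespace}:${item.base}`, item]]),
};
```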
A value’s `tokens` are derived from `raw`: `value.tokens.map(t => t.raw).join("")` always
reconstructs `raw` exactly. Four token kinds exist:
| Kind | Examples |
|---|---|
| `text` | plain prose |
| `interpolation` | `{{var}}`, `{{var, format}}`, `{{var, format(options)}}` |
| `nesting` | `$t(ref)`, `$t(ref, {"opt": ...})` |
| `markup` | an HTML/XML tag `<...>` or entity `&...;` (opaque) |
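The reconstruction invariant follows naturally from a tokeniser that only ever slices `raw`. The `tokenize` function below is a hypothetical regex-based sketch, not astilba's tokeniser. Its patterns are simplified (for example, `$t(...)` options may not contain parentheses). Even so, every character of the input lands in exactly one token, so joining the token texts gives the input back.

```typescript
type TokenKind = "text" | "interpolation" | "nesting" | "markup";
interface ValueToken { kind: TokenKind; raw: string }

// Sketch: split raw into tokens without altering a single character.
function tokenize(raw: string): ValueToken[] {
  // Simplified patterns (assumptions): {{...}}, $t(...), <...> tags, &...; entities.
  const pattern = /\{\{[^{}]*\}\}|\$t\([^()]*\)|<[^<>]*>|&[#\w]+;/g;
  const tokens: ValueToken[] = [];
  let last = 0;
  let m: RegExpExecArray | null;
  while ((m = pattern.exec(raw)) !== null) {
    // Anything between two matches is plain text.
    if (m.index > last) tokens.push({ kind: "text", raw: raw.slice(last, m.index) });
    const s = m[0];
    const kind: TokenKind = s.startsWith("{{") ? "interpolation" : s.startsWith("$t(") ? "nesting" : "markup";
    tokens.push({ kind, raw: s });
    last = m.index + s.length;
  }
  if (last < raw.length) tokens.push({ kind: "text", raw: raw.slice(last) });
  return tokens;
}
```

For example, `Hi <b>{{name}}</b> &amp; $t(common:bye)` splits into text, markup, interpolation, markup, text, markup, text, and nesting tokens, and rejoining them yields the original string.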
Invariants worth knowing
Three properties hold by design, and the harness relies on them.
Value bytes are preserved exactly
Value.raw is the source of truth and is never mutated. tokens is a derived view used
only for masking and analysis — it is never used to reconstruct output. On export, the raw
bytes are written verbatim. This is why a deterministic formatter renders identically on both
sides of a round-trip: the value text is byte-identical, so any deterministic function of it
is too.
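A round-trip check can lean on this property directly by comparing a deterministic rendering of each value before and after export. Both `render` (a naive `{{var}}` substituter standing in for a formatter) and `renderingMismatches` are hypothetical. As long as export copies `raw` untouched, the check cannot report a mismatch, whichever deterministic `render` is plugged in.

```typescript
type Value = { raw: string };

// Hypothetical deterministic formatter: replaces {{var}} with vars[var], leaving unknown names as-is.
function render(raw: string, vars: Record<string, string>): string {
  return raw.replace(/\{\{\s*(\w+)\s*\}\}/g, (match: string, name: string) =>
    Object.prototype.hasOwnProperty.call(vars, name) ? vars[name] : match,
  );
}

// Hypothetical harness helper: ids whose rendering differs (or that vanished) after a round-trip.
function renderingMismatches(
  before: Map<string, Value>,
  after: Map<string, Value>,
  vars: Record<string, string>,
): string[] {
  const mismatches: string[] = [];
  before.forEach((value, id) => {
    const out = after.get(id);
    if (out === undefined || render(out.raw, vars) !== render(value.raw, vars)) mismatches.push(id);
  });
  return mismatches;
}
```

Suppose an exporter "tidied" `Hi,  {{ name }}!` into `Hi, {{name}}!`. The rendered strings would then differ (`Hi,  Ada!` vs `Hi, Ada!`), and the key would be reported.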
Plurals are structural, not suffixed
Plurals are stored as a CLDR-category → value map, not as a set of suffixed flat keys
(_one, _other, …). The suffix set is re-derived from the target language on export,
never carried through. So a key parsed from English _one/_other can export with the full
Russian suffix set (_one, _few, _many, _other) if the target is Russian.
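One way to re-derive the suffix set is to ask the runtime's CLDR data for the target language's categories, using the standard `Intl.PluralRules` API. The `pluralSuffixes` helper is a sketch, not astilba's exporter. The `_` and `_ordinal_` prefixes follow i18next's key naming.

```typescript
type PluralKind = "none" | "cardinal" | "ordinal";

// CLDR category order, used to sort whatever order the runtime reports.
const ORDER = ["zero", "one", "two", "few", "many", "other"];

// Sketch: the i18next-style suffixes a key needs in `language` for a given plural kind.
function pluralSuffixes(language: string, kind: PluralKind): string[] {
  if (kind === "none") return [""]; // a plain key carries no suffix
  const categories = new Intl.PluralRules(language, { type: kind }).resolvedOptions().pluralCategories;
  const sorted = categories.slice().sort((a, b) => ORDER.indexOf(a) - ORDER.indexOf(b));
  return sorted.map((c) => (kind === "ordinal" ? `_ordinal_${c}` : `_${c}`));
}
```

For cardinals, English yields `_one`, `_other`; Russian yields `_one`, `_few`, `_many`, `_other`; and Japanese yields only `_other`.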
One plural kind per (key, context)
A cell holds either a cardinal map or an ordinal map, never both. A key that carries
both _one and _ordinal_one is valid native i18next, but the Phase-0 model cannot
represent both at once — so the i18next adapter rejects it loudly (INVALID_RESOURCE) rather
than silently dropping a form. Holding both is a road-to-1.0 item.
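The rejection can be pictured as a pass over one namespace's flat keys, run before they are grouped into cells. `checkPluralKinds`, `ResourceError`, and the suffix patterns are hypothetical. Only the `INVALID_RESOURCE` code comes from the text above.

```typescript
// Hypothetical error type carrying the adapter's error code.
class ResourceError extends Error {
  readonly code = "INVALID_RESOURCE";
}

// i18next-style suffixes. The ordinal pattern is tried first because "_ordinal_one" also ends in "_one".
const ORDINAL = /_ordinal_(zero|one|two|few|many|other)$/;
const CARDINAL = /_(zero|one|two|few|many|other)$/;

// Sketch: reject any base (still including its context suffix, so per key and context)
// that carries both cardinal and ordinal forms.
function checkPluralKinds(flatKeys: string[]): void {
  const seen = new Map<string, "cardinal" | "ordinal">();
  for (const key of flatKeys) {
    const ord = ORDINAL.exec(key);
    const match = ord ?? CARDINAL.exec(key);
    if (match === null) continue; // not a plural form
    const kind = ord ? "ordinal" : "cardinal";
    const base = key.slice(0, match.index);
    const previous = seen.get(base);
    if (previous !== undefined && previous !== kind) {
      throw new ResourceError(`"${base}" has both cardinal and ordinal plural forms`);
    }
    seen.set(base, kind);
  }
}
```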
"none" is distinct from a single-category cardinal. foo (none) and foo_other
(cardinal, in a language whose only category is other, like Japanese) are different keys
and round-trip differently.
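The difference shows up in the flat keys each shape exports to. In this sketch, `flatKeys` is a hypothetical helper that takes the target language's category list as a parameter. A `"none"` set exports as `foo`, while a cardinal set for a language whose only category is `other` exports as `foo_other`.

```typescript
type CLDRCategory = "zero" | "one" | "two" | "few" | "many" | "other";
interface PluralSet { kind: "none" | "cardinal" | "ordinal"; values: Map<CLDRCategory, { raw: string }> }

// Hypothetical helper: i18next-style flat keys for one (key, context) cell.
// `categories` is the target language's CLDR category list for the set's kind.
function flatKeys(base: string, set: PluralSet, categories: CLDRCategory[]): Record<string, string> {
  const out: Record<string, string> = {};
  if (set.kind === "none") {
    const v = set.values.get("other"); // "none" keeps its single value under "other"
    if (v) out[base] = v.raw;
    return out;
  }
  const prefix = set.kind === "ordinal" ? "_ordinal_" : "_";
  for (const c of categories) {
    const v = set.values.get(c);
    if (v) out[`${base}${prefix}${c}`] = v.raw;
  }
  return out;
}
```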
The bare field
PluralSet.bare exists for the rare i18next case where a context key has both a
suffix-less form (used when t() is called without count) and plural forms (used when
count is given). Keeping it lets both render paths round-trip losslessly.
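In i18next terms, with the default `_` context separator, `t("friend", { context: "male" })` resolves `friend_male`. Adding `count` resolves `friend_male_one` or `friend_male_other` instead. The hypothetical `exportCell` below (cardinal cells only, simplified types) shows that `bare` is the only place the suffix-less string can live.

```typescript
type CLDRCategory = "zero" | "one" | "two" | "few" | "many" | "other";
interface Value { raw: string }
interface PluralSet { kind: "none" | "cardinal" | "ordinal"; values: Map<CLDRCategory, Value>; bare?: Value }

// Hypothetical exporter for one cardinal (key, context) cell, in i18next flat-key form.
function exportCell(base: string, context: string, set: PluralSet, categories: CLDRCategory[]): Record<string, string> {
  const stem = context === "" ? base : `${base}_${context}`;
  const out: Record<string, string> = {};
  if (set.bare) out[stem] = set.bare.raw; // used when t() is called without count
  for (const c of categories) {
    const v = set.values.get(c);
    if (v) out[`${stem}_${c}`] = v.raw; // used when count is given
  }
  return out;
}
```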
In-memory only, for now
The model is Map-based for fast lookup, which means it is not directly
JSON-serialisable — JSON.stringify yields {} for the Maps. A persistence/transport DTO
(plain objects, or a toJSON/fromJSON pair) is a v1.0 item, needed once a backend stores
or ships the model. For now, treat the model as a transient, in-process structure.
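One possible shape for that DTO is sketched below. `toJSON`, `fromJSON`, and the `*DTO` interfaces are hypothetical, since the text only names the idea. They convert each Map to a plain object and back, and the model types are trimmed to the fields being converted.

```typescript
interface Value { raw: string; tokens: { kind: string; raw: string }[] }
interface PluralSet { kind: "none" | "cardinal" | "ordinal"; values: Map<string, Value>; bare?: Value }
interface Key { namespace: string; base: string; contexts: Map<string, PluralSet> }
interface CanonicalModel { language: string; keys: Map<string, Key> }

// Hypothetical plain-object mirror of the model.
interface PluralSetDTO { kind: PluralSet["kind"]; values: Record<string, Value>; bare?: Value }
interface KeyDTO { namespace: string; base: string; contexts: Record<string, PluralSetDTO> }
interface ModelDTO { language: string; keys: Record<string, KeyDTO> }

function mapToObject<V, W>(map: Map<string, V>, f: (v: V) => W): Record<string, W> {
  // Null prototype so a key such as "__proto__" stays an ordinary property.
  const obj: Record<string, W> = Object.create(null);
  map.forEach((v, k) => {
    obj[k] = f(v);
  });
  return obj;
}

function objectToMap<V, W>(obj: Record<string, V>, f: (v: V) => W): Map<string, W> {
  const map = new Map<string, W>();
  Object.keys(obj).forEach((k) => map.set(k, f(obj[k])));
  return map;
}

function toJSON(model: CanonicalModel): ModelDTO {
  return {
    language: model.language,
    keys: mapToObject(model.keys, (key): KeyDTO => ({
      namespace: key.namespace,
      base: key.base,
      contexts: mapToObject(key.contexts, (set): PluralSetDTO => ({
        kind: set.kind,
        values: mapToObject(set.values, (v) => v),
        ...(set.bare ? { bare: set.bare } : {}),
      })),
    })),
  };
}

function fromJSON(dto: ModelDTO): CanonicalModel {
  return {
    language: dto.language,
    keys: objectToMap(dto.keys, (key): Key => ({
      namespace: key.namespace,
      base: key.base,
      contexts: objectToMap(key.contexts, (set): PluralSet => ({
        kind: set.kind,
        values: objectToMap(set.values, (v) => v),
        ...(set.bare ? { bare: set.bare } : {}),
      })),
    })),
  };
}
```

The `""` (no context) key survives as an ordinary object property, so `JSON.stringify(toJSON(model))` can be stored or shipped and turned back into Maps with `fromJSON(JSON.parse(text))`.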