The canonical model is the format-neutral centre of astilba. Every file/syntax adapter maps files to it on the way in (parse) and from it on the way out (export). It lives in @astilba/core and knows nothing about i18next, ICU, JSON, or YAML.

The shape

A CanonicalModel holds one language’s worth of data. Each Key is a logical message: a base key path with one entry per context value. Each context cell holds a PluralSet, which carries the per-CLDR-category values.
```typescript
interface CanonicalModel {
  language: string;              // BCP-47, e.g. "en", "en-US", "pt-BR"
  keys: Map<string, Key>;        // `${namespace}:${base}` -> Key
}

interface Key {
  namespace: string;
  base: string;                  // key path (project key separator, default ".") without namespace or plural/context suffixes
  contexts: Map<string, PluralSet>;  // "" === no context
}

interface PluralSet {
  kind: PluralKind;              // "none" | "cardinal" | "ordinal"
  values: Map<CLDRCategory, Value>;  // for kind "none", the single value is stored under "other"
  bare?: Value;                  // rare: a suffix-less form alongside plural forms
}

interface Value {
  raw: string;                   // byte-exact source text — the source of truth
  tokens: ValueToken[];          // derived view, used for masking/validation
}
```

A value’s tokens are derived from raw: value.tokens.map(t => t.raw).join("") always reconstructs raw exactly. Four token kinds exist:

| Kind | Examples |
| --- | --- |
| text | plain prose |
| interpolation | {{var}}, {{var, format}}, {{var, format(options)}} |
| nesting | $t(ref), $t(ref, {"opt": ...}) |
| markup | an HTML/XML tag <...> or entity &...; (opaque) |
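
The reconstruction property can be checked mechanically. The following is a minimal sketch, not astilba's own code: the `kind` field on `ValueToken`, the helper name `tokensReconstructRaw`, and the hand-written token lists are assumptions made for illustration. Only `raw`, `tokens`, and the four kind names come from the description above.

```typescript
// Hypothetical token shape: only `raw` and the four kind names come from the docs.
type TokenKind = "text" | "interpolation" | "nesting" | "markup";

interface ValueToken {
  kind: TokenKind;
  raw: string; // the exact slice of Value.raw that this token covers
}

interface Value {
  raw: string;
  tokens: ValueToken[];
}

// True when concatenating the token slices gives back the source text exactly.
function tokensReconstructRaw(value: Value): boolean {
  return value.tokens.map((t) => t.raw).join("") === value.raw;
}

// Hand-written tokenisation of a small i18next-style string.
const greeting: Value = {
  raw: "Hello <b>{{name}}</b>, $t(common:welcome)!",
  tokens: [
    { kind: "text", raw: "Hello " },
    { kind: "markup", raw: "<b>" },
    { kind: "interpolation", raw: "{{name}}" },
    { kind: "markup", raw: "</b>" },
    { kind: "text", raw: ", " },
    { kind: "nesting", raw: "$t(common:welcome)" },
    { kind: "text", raw: "!" },
  ],
};

// A token list that drops the space after "Hi" violates the invariant.
const broken: Value = {
  raw: "Hi {{name}}",
  tokens: [
    { kind: "text", raw: "Hi" },
    { kind: "interpolation", raw: "{{name}}" },
  ],
};
```

Here `tokensReconstructRaw(greeting)` holds and `tokensReconstructRaw(broken)` does not, which is the kind of check a test harness could run over every parsed value.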

Invariants worth knowing

Three properties hold by design, and the harness relies on them.

Value bytes are preserved exactly

Value.raw is the source of truth and is never mutated. tokens is a derived view used only for masking and analysis — it is never used to reconstruct output. On export, the raw bytes are written verbatim. This is why a deterministic formatter renders identically on both sides of a round-trip: the value text is byte-identical, so any deterministic function of it is too.

Plurals are structural, not suffixed

Plurals are stored as a CLDR-category → value map, not as a set of suffixed flat keys (_one, _other, …). The suffix set is re-derived from the target language on export, never carried through. So a key parsed from English _one/_other can export with the full Russian suffix set if the target is Russian.
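
As an illustration of re-deriving suffixes from the target language, the sketch below computes the cardinal key names for a base key. The lookup table `CARDINAL_CATEGORIES` and the function `cardinalKeysFor` are hypothetical names, and the table covers only three languages. A real exporter would presumably read the category set from CLDR data (for example through `Intl.PluralRules`) rather than hard-code it.

```typescript
type CLDRCategory = "zero" | "one" | "two" | "few" | "many" | "other";

// Hypothetical lookup table: cardinal categories for three languages only.
const CARDINAL_CATEGORIES: Record<string, CLDRCategory[]> = {
  en: ["one", "other"],
  ru: ["one", "few", "many", "other"],
  ja: ["other"],
};

// Derive the suffixed key names from the target language,
// not from whatever suffixes the source file happened to contain.
function cardinalKeysFor(base: string, targetLanguage: string): string[] {
  const categories = CARDINAL_CATEGORIES[targetLanguage];
  if (categories === undefined) {
    throw new Error(`no plural data for ${targetLanguage}`);
  }
  return categories.map((category) => `${base}_${category}`);
}
```

Under these assumptions, `cardinalKeysFor("item", "en")` gives `item_one` and `item_other`, while `cardinalKeysFor("item", "ru")` gives `item_one`, `item_few`, `item_many`, and `item_other`, regardless of which language the key was parsed from.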

One plural kind per (key, context)

A cell holds either a cardinal map or an ordinal map, never both. A key that carries both _one and _ordinal_one is valid native i18next, but the Phase-0 model cannot represent both at once — so the i18next adapter rejects it loudly (INVALID_RESOURCE) rather than silently dropping a form. Holding both is a road-to-1.0 item.
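
A rough sketch of how such a rejection might look follows. It is not the adapter's actual implementation: `classify`, `assertOnePluralKind`, and `InvalidResourceError` are hypothetical names, the suffix pattern assumes i18next's `_one` and `_ordinal_one` style, and context suffixes are ignored to keep the example short. Only the `INVALID_RESOURCE` code comes from the text above.

```typescript
type PluralKind = "none" | "cardinal" | "ordinal";

interface Classified {
  base: string;
  kind: PluralKind;
}

// Matches key_one, key_other, key_ordinal_one, ... (contexts are ignored in this sketch).
const PLURAL_SUFFIX = /^(.+?)_(ordinal_)?(zero|one|two|few|many|other)$/;

function classify(flatKey: string): Classified {
  const match = PLURAL_SUFFIX.exec(flatKey);
  if (match === null) return { base: flatKey, kind: "none" };
  const kind: PluralKind = match[2] !== undefined ? "ordinal" : "cardinal";
  return { base: match[1] ?? flatKey, kind };
}

// Hypothetical error type carrying the INVALID_RESOURCE code.
class InvalidResourceError extends Error {
  readonly code = "INVALID_RESOURCE";
}

// Throws when one base key carries both cardinal and ordinal forms.
function assertOnePluralKind(flatKeys: string[]): void {
  const seen = new Map<string, PluralKind>();
  for (const flatKey of flatKeys) {
    const { base, kind } = classify(flatKey);
    if (kind === "none") continue; // a suffix-less form does not conflict
    const previous = seen.get(base);
    if (previous !== undefined && previous !== kind) {
      throw new InvalidResourceError(`${base} mixes cardinal and ordinal forms`);
    }
    seen.set(base, kind);
  }
}
```

With this sketch, `["foo_one", "foo_other"]` passes, while `["foo_one", "foo_ordinal_one"]` throws an error whose `code` is `INVALID_RESOURCE`. Note that `classify("foo")` reports `none` and `classify("foo_other")` reports `cardinal`, so the two are never treated as the same key.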
"none" is distinct from a single-category cardinal. foo (none) and foo_other (cardinal, in a language whose only category is other, like Japanese) are different keys and round-trip differently.

The bare field

PluralSet.bare exists for the rare i18next case where a context key has both a suffix-less form (used when t() is called without count) and plural forms (used when count is given). Keeping it lets both render paths round-trip losslessly.
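
To make the two render paths concrete, here is a small sketch of a cell that carries both a bare form and plural forms. The `pickValue` helper and the sample strings are assumptions for illustration, not part of astilba, and `Value` is reduced to its `raw` field.

```typescript
type CLDRCategory = "zero" | "one" | "two" | "few" | "many" | "other";

interface Value {
  raw: string; // tokens omitted to keep the sketch short
}

interface PluralSet {
  kind: "none" | "cardinal" | "ordinal";
  values: Map<CLDRCategory, Value>;
  bare?: Value;
}

// Hypothetical lookup, not astilba's API.
function pickValue(set: PluralSet, category?: CLDRCategory): Value | undefined {
  // A count was given: use the plural form for its category.
  if (category !== undefined) return set.values.get(category);
  // No count: a "none" cell keeps its single value under "other",
  // a plural cell falls back to the suffix-less form, if any.
  return set.kind === "none" ? set.values.get("other") : set.bare;
}

// The "male" context cell of an i18next key that has
// friend_male, friend_male_one and friend_male_other.
const male: PluralSet = {
  kind: "cardinal",
  values: new Map<CLDRCategory, Value>([
    ["one", { raw: "A boyfriend" }],
    ["other", { raw: "{{count}} boyfriends" }],
  ]),
  bare: { raw: "My boyfriend" },
};
```

In this sketch `pickValue(male)` returns the bare form and `pickValue(male, "other")` returns the plural form, so neither source string has to be discarded at parse time.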

In-memory only, for now

The model is Map-based for fast lookup, which means it is not directly JSON-serialisable: JSON.stringify yields {} for the Maps. A persistence/transport DTO (plain objects, or a toJSON/fromJSON pair) is a v1.0 item, needed once a backend stores or ships the model. For now, treat the model as a transient, in-process structure.
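
The sketch below shows the problem and one possible shape for a toJSON/fromJSON pair, using the standard `JSON.stringify` replacer and `JSON.parse` reviver hooks. It is an assumption about how such a DTO could work, not a planned astilba API: the `__map` tag, the function names, and the trimmed-down interfaces (no `tokens`, no `bare`) are all hypothetical.

```typescript
interface Value { raw: string }
interface PluralSet { kind: "none" | "cardinal" | "ordinal"; values: Map<string, Value> }
interface Key { namespace: string; base: string; contexts: Map<string, PluralSet> }
interface CanonicalModel { language: string; keys: Map<string, Key> }

const model: CanonicalModel = {
  language: "en",
  keys: new Map<string, Key>([
    ["common:greeting", {
      namespace: "common",
      base: "greeting",
      contexts: new Map<string, PluralSet>([
        ["", { kind: "none", values: new Map<string, Value>([["other", { raw: "Hello" }]]) }],
      ]),
    }],
  ]),
};

// Plain JSON.stringify loses every Map: prints {"language":"en","keys":{}}
console.log(JSON.stringify(model));

// Replacer: encode each Map as a tagged array of [key, value] pairs.
function replacer(_key: string, value: unknown): unknown {
  return value instanceof Map ? { __map: Array.from(value.entries()) } : value;
}

// Reviver: turn tagged objects back into Maps (inner Maps are revived first).
function reviver(_key: string, value: unknown): unknown {
  if (typeof value === "object" && value !== null && "__map" in value) {
    return new Map((value as { __map: [unknown, unknown][] }).__map);
  }
  return value;
}

function toJSON(m: CanonicalModel): string {
  return JSON.stringify(m, replacer);
}

function fromJSON(text: string): CanonicalModel {
  return JSON.parse(text, reviver) as CanonicalModel;
}
```

A round trip through `fromJSON(toJSON(model))` restores the nested Maps, and because `raw` is an ordinary string it survives unchanged. A production DTO would likely also want a schema version and validation on the way in, which this sketch leaves out.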