
# MT masking

> How astilba protects placeholders and markup through a machine-translation call, and validates they came back unmodified.

When you send a translation string to a machine-translation (MT) or LLM engine, three kinds of
token must survive the round-trip **unmodified**: interpolation variables and their formatter
keywords (`{{count}}`, `{{date, datetime}}`), `$t()` nesting refs, and markup tags.
`@astilba/core` provides the masking and validation logic to make that reliable.

## The problem masking solves

Left unprotected, an MT engine will happily "translate" the parts it shouldn't:

* `{{count}} items` might come back as `{{cuenta}} elementos` — the variable renamed, so
  interpolation silently breaks.
* A formatter keyword like `one`/`other` inside a token, or a `$t()` ref name, can be
  translated to `uno`/`otros`, which then resolves to nothing.
* A markup tag can be dropped or rewritten.

astilba's answer is to **mask** every non-text token with an opaque sentinel before the MT
call, and **validate** that every placeholder came back unmodified after it.

## Mask, translate, unmask

`maskTokens` replaces every non-text token (interpolation, nesting, markup) with an opaque
sentinel: the token's index wrapped in two Unicode private-use-area delimiters, `U+E000` and
`U+E001`. The formatter keyword and the `$t()` ref name live **inside** the masked span, so the
engine never sees them and cannot translate them.

```ts theme={null}
import { maskTokens, unmask } from "@astilba/core";

const tokens = [
  { type: "text", raw: "Hello, " },
  { type: "interpolation", raw: "{{name}}", variable: "name" },
  { type: "text", raw: "! You have " },
  { type: "interpolation", raw: "{{count}}", variable: "count" },
  { type: "text", raw: " messages." },
] as const;

const { masked, parts } = maskTokens(tokens);
// masked → "Hello, \uE0000\uE001! You have \uE0001\uE001 messages."
// parts  → ["{{name}}", "{{count}}"]

// ...send `masked` to your MT engine; `translated` below is the masked string it returns...

const restored = unmask(translated, parts); // splices the originals back in
```

Sentinels use private-use-area delimiters so they carry **no linguistic content** for the
engine to "helpfully" translate, while still being detectable if the engine mangles them.
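
To make the round-trip concrete, here is a minimal, self-contained sketch of the technique. It is not the `@astilba/core` source: `sketchMask`, `sketchUnmask`, and the simplified `SketchToken` type are hypothetical names for illustration, and the sentinel layout (delimiter, index, delimiter) is inferred from the example output above.

```ts
// Illustrative sketch only; not the @astilba/core implementation.
type SketchToken = { type: "text" | "interpolation" | "nesting" | "markup"; raw: string };

const SENTINEL_OPEN = "\uE000";
const SENTINEL_CLOSE = "\uE001";

function sketchMask(tokens: readonly SketchToken[]): { masked: string; parts: string[] } {
  const parts: string[] = [];
  let masked = "";
  for (const token of tokens) {
    if (token.type === "text") {
      masked += token.raw; // translatable text passes through untouched
    } else {
      // The engine sees only delimiter + index + delimiter; the raw token waits in `parts`.
      masked += SENTINEL_OPEN + String(parts.length) + SENTINEL_CLOSE;
      parts.push(token.raw);
    }
  }
  return { masked, parts };
}

function sketchUnmask(translated: string, parts: readonly string[]): string {
  const sentinel = new RegExp(SENTINEL_OPEN + "(\\d+)" + SENTINEL_CLOSE, "g");
  // Unknown indexes are left in place so a later validation step can flag them.
  return translated.replace(sentinel, (whole: string, index: string) => {
    const original = parts[Number(index)];
    return original === undefined ? whole : original;
  });
}

const sketchResult = sketchMask([
  { type: "text", raw: "Hello, " },
  { type: "interpolation", raw: "{{name}}" },
  { type: "text", raw: "!" },
]);
// Simulated engine output: the text is translated, the sentinel is copied through.
const sketchRestored = sketchUnmask("¡Hola, \uE0000\uE001!", sketchResult.parts);
```

Because the sentinel carries only an index, nothing inside it resembles a word the engine could translate; the original `{{name}}` is spliced back purely by position in `parts`.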

### One guard on masking

`maskTokens` throws (`MASK_VALIDATION`) if the literal text already contains a reserved
sentinel delimiter (`U+E000` / `U+E001`). This is rare but legal in real values — private-use
glyphs from icon fonts like Material Icons or Nerd Fonts — and masking it would be
ambiguous. Strip or escape those characters before masking.
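
One way to handle this before masking is a small pre-check. The helpers below are hypothetical (they are not part of `@astilba/core`) and show only the "strip" option; note that stripping removes the glyph from the text, so an escape scheme may be preferable when the glyph matters.

```ts
// Hypothetical pre-masking helpers; not part of @astilba/core.
const RESERVED_DELIMITERS = /[\uE000\uE001]/;

// True when the literal text would collide with the sentinel delimiters.
function hasReservedDelimiter(text: string): boolean {
  return RESERVED_DELIMITERS.test(text);
}

// Removes every occurrence of the two reserved code points.
function stripReservedDelimiters(text: string): string {
  return text.replace(/[\uE000\uE001]/g, "");
}
```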

## After translation: two complementary checks

Once the translation comes back, astilba offers two checks, each suited to a different point
in the pipeline.

### `validateSentinels` — operate on the still-masked string

If you still have the masked string the engine returned (before unmasking),
`validateSentinels` checks that every sentinel was returned **exactly once, unmodified**, and
that the engine invented none. Reordering is allowed (target languages reorder freely); pass
`requireOrder: true` to also assert original order.

```ts theme={null}
import { validateSentinels } from "@astilba/core";

const check = validateSentinels(translatedMasked, parts);
check.ok;     // true if every placeholder survived exactly once
check.errors; // e.g. ['placeholder #0 ({{name}}) was dropped by MT']
```

It also detects a corrupted sentinel — stray delimiter characters that aren't part of a valid
token.
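
The rules above (exactly once, none invented, optional order, no stray delimiters) can be sketched as follows. This is an illustrative reimplementation under the same assumed sentinel layout, not the library's code; `sketchValidateSentinels` and its error messages are hypothetical.

```ts
// Illustrative sketch only; not the @astilba/core implementation.
interface SentinelCheck {
  ok: boolean;
  errors: string[];
}

function sketchValidateSentinels(
  translatedMasked: string,
  parts: readonly string[],
  options: { requireOrder?: boolean } = {},
): SentinelCheck {
  const errors: string[] = [];
  const seen: number[] = [];
  const wellFormed = /\uE000(\d+)\uE001/g;
  let found: RegExpExecArray | null;
  while ((found = wellFormed.exec(translatedMasked)) !== null) {
    seen.push(Number(found[1]));
  }

  // Every original placeholder must come back exactly once.
  parts.forEach((raw, index) => {
    const count = seen.filter((value) => value === index).length;
    if (count === 0) errors.push(`placeholder #${index} (${raw}) was dropped`);
    if (count > 1) errors.push(`placeholder #${index} (${raw}) came back ${count} times`);
  });

  let previous = -1;
  let inOrder = true;
  for (const index of seen) {
    if (index >= parts.length) errors.push(`sentinel #${index} was invented`);
    if (index < previous) inOrder = false;
    previous = index;
  }
  if (options.requireOrder && !inOrder) errors.push("placeholders were reordered");

  // A delimiter that survives after removing well-formed sentinels is corruption.
  const leftover = translatedMasked.replace(/\uE000\d+\uE001/g, "");
  if (/[\uE000\uE001]/.test(leftover)) errors.push("stray sentinel delimiter");

  return { ok: errors.length === 0, errors };
}

const sketchParts = ["{{name}}", "{{count}}"];
const reordered = "\uE0001\uE001 mensajes para \uE0000\uE001";
const looseCheck = sketchValidateSentinels(reordered, sketchParts);
const strictCheck = sketchValidateSentinels(reordered, sketchParts, { requireOrder: true });
```

In this sketch `looseCheck.ok` is `true` because both sentinels survive, while `strictCheck.ok` is `false` because the order changed.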

### `validatePlaceholderTokens` — operate on restored tokens

`validatePlaceholderTokens` is the **fail-closed** placeholder validator. It compares a
source value's tokens against its translation's tokens and fails if any placeholder was
added, dropped, or modified. Placeholder identity is the canonical fields directly (variable +
format for interpolation, ref + options for nesting, raw for markup), so a value and its own
translation carry byte-identical placeholders, and no syntax-specific normalisation is needed.

```ts theme={null}
import { validatePlaceholderTokens } from "@astilba/core";

const check = validatePlaceholderTokens(sourceTokens, translatedTokens);
check.ok;     // true if the placeholder multisets match
check.errors; // e.g. ['source placeholder "interp:name|" is missing or altered...']
```
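
A multiset comparison of this kind can be sketched as below. The token shape, the `placeholderKey` format (modelled loosely on the `interp:name|` key in the example error), and the function names are assumptions for illustration; the real `ValueToken` type and key format may differ.

```ts
// Illustrative sketch only; token fields and key format are assumptions.
type PlaceholderToken =
  | { type: "text"; raw: string }
  | { type: "interpolation"; raw: string; variable: string; format?: string }
  | { type: "nesting"; raw: string; ref: string; options?: string }
  | { type: "markup"; raw: string };

// Identity key built from the canonical fields; text tokens have none.
function placeholderKey(token: PlaceholderToken): string | null {
  switch (token.type) {
    case "text":
      return null;
    case "interpolation":
      return `interp:${token.variable}|${token.format ?? ""}`;
    case "nesting":
      return `nest:${token.ref}|${token.options ?? ""}`;
    case "markup":
      return `markup:${token.raw}`;
  }
}

function countKeys(tokens: readonly PlaceholderToken[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const token of tokens) {
    const key = placeholderKey(token);
    if (key !== null) counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return counts;
}

function sketchValidatePlaceholderTokens(
  source: readonly PlaceholderToken[],
  translated: readonly PlaceholderToken[],
): { ok: boolean; errors: string[] } {
  const want = countKeys(source);
  const got = countKeys(translated);
  const errors: string[] = [];
  // Fail closed: any count mismatch in either direction is an error.
  want.forEach((count, key) => {
    if ((got.get(key) ?? 0) < count) errors.push(`source placeholder "${key}" is missing or altered`);
  });
  got.forEach((count, key) => {
    if ((want.get(key) ?? 0) < count) errors.push(`translation has unexpected placeholder "${key}"`);
  });
  return { ok: errors.length === 0, errors };
}
```

Comparing counts rather than positions is what lets a translation reorder placeholders freely while still rejecting a renamed, dropped, or duplicated one.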

### The string-entry form

The one place a raw string must be re-tokenized is a translation returned from MT — it was
never in the model, so it has no token view. `validatePlaceholders(source, translated,
tokenize)` takes the adapter's tokenizer by injection, tokenizes both sides, and defers to
`validatePlaceholderTokens`. The i18next adapter pre-binds its tokenizer so you get an
ergonomic two-argument `validatePlaceholders(source, translated)` — see
[the adapter reference](/reference/adapter-i18next).
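
The injection and pre-binding pattern described above can be sketched in a few lines. Everything here is a stand-in: `toyTokenize` recognises only `{{...}}` interpolation, `compareTokens` is a simplified substitute for the token-level validator, and none of these names come from `@astilba/core` or the i18next adapter.

```ts
// Illustrative sketch only; all names are hypothetical stand-ins.
type ToyToken = { type: "text" | "interpolation"; raw: string };
type Tokenize = (value: string) => ToyToken[];
type ToyCheck = { ok: boolean; errors: string[] };

// Simplified stand-in for the token-level validator: compares sorted placeholder raws.
function compareTokens(source: ToyToken[], translated: ToyToken[]): ToyCheck {
  const raws = (tokens: ToyToken[]): string[] =>
    tokens.filter((token) => token.type !== "text").map((token) => token.raw).sort();
  const want = raws(source);
  const got = raws(translated);
  const ok = want.length === got.length && want.every((raw, i) => raw === got[i]);
  return { ok, errors: ok ? [] : [`expected [${want.join(", ")}] but found [${got.join(", ")}]`] };
}

// Three-argument form: the tokenizer is injected, both sides are tokenized.
function validateWithTokenizer(source: string, translated: string, tokenize: Tokenize): ToyCheck {
  return compareTokens(tokenize(source), tokenize(translated));
}

// Toy tokenizer: splits on {{...}} and keeps the delimiters via the capture group.
const toyTokenize: Tokenize = (value) =>
  value
    .split(/(\{\{[^}]*\}\})/)
    .filter((raw) => raw !== "")
    .map((raw): ToyToken => ({ type: raw.slice(0, 2) === "{{" ? "interpolation" : "text", raw }));

// Pre-bound two-argument form, as an adapter might expose it.
const validateToy = (source: string, translated: string): ToyCheck =>
  validateWithTokenizer(source, translated, toyTokenize);
```

With this sketch, `validateToy("{{count}} items", "{{cuenta}} elementos")` fails, which is the renamed-variable case from the start of this page.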

<Note>
  An adapter wanting looser placeholder matching (for example, normalising whitespace) can
  pre-normalise its tokens before calling `validatePlaceholderTokens`. By default the check
  is strict, because a value and its translation carry byte-identical placeholders.
</Note>
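
As a rough illustration of such pre-normalisation, an adapter could trim and collapse whitespace in the identity fields before comparing. The `LooseToken` shape and `normaliseWhitespace` helper are hypothetical; only the idea of normalising tokens before the strict check comes from the note above.

```ts
// Hypothetical adapter-side helper; not part of @astilba/core.
type LooseToken =
  | { type: "text"; raw: string }
  | { type: "interpolation"; raw: string; variable: string; format: string };

// Returns new tokens whose identity fields ignore incidental whitespace.
function normaliseWhitespace(tokens: readonly LooseToken[]): LooseToken[] {
  return tokens.map((token): LooseToken =>
    token.type === "interpolation"
      ? {
          ...token,
          variable: token.variable.trim(),
          format: token.format.replace(/\s+/g, " ").trim(),
        }
      : token,
  );
}

const looseTokens = normaliseWhitespace([
  { type: "interpolation", raw: "{{ date,  datetime }}", variable: " date ", format: "  datetime " },
]);
```

Both the source and the translated tokens would need to pass through the same helper, so that the strict comparison sees the same normalised form on each side.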

## Related

* [The canonical model](/concepts/canonical-model) — the `ValueToken` kinds masking operates
  on.
* [@astilba/core API](/reference/core-api) — the masking function signatures.
