Lexicon Schema Reference
Complete reference for the lexicon JSON schema.
Root structure
interface Lexicon {
id: string; // Unique identifier (required)
version?: string; // Version string
language: string; // Language code, e.g. "en" (required)
termSets?: Record<string, TermSet>;
patterns?: Record<string, Pattern>;
distributions?: Record<string, DistributionEntry[]>;
correlations?: Correlation[];
constraints?: Constraint[];
invariants?: Invariant[];
archetypes?: Record<string, Archetype>;
relations?: Relation[];
outputTransforms?: LexiconOutputTransforms;
}
Only id and language are required. All other fields are optional.
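As a rough illustration of the required-field rule, a minimal check might look like the sketch below. The names `LexiconLike` and `hasRequiredFields` are hypothetical and not part of the library, and rejecting empty strings is an assumption rather than documented behavior.

```typescript
// Hypothetical shape covering only the two required root fields.
interface LexiconLike {
  id: string;
  language: string;
  [key: string]: unknown;
}

// Type guard: true when both required fields are present as strings.
// Assumption: empty strings are treated as missing.
function hasRequiredFields(value: unknown): value is LexiconLike {
  if (typeof value !== "object" || value === null) return false;
  const candidate = value as Record<string, unknown>;
  return (
    typeof candidate.id === "string" &&
    candidate.id.length > 0 &&
    typeof candidate.language === "string" &&
    candidate.language.length > 0
  );
}

// The smallest lexicon this check accepts.
const minimalLexicon = { id: "corporate-en", language: "en" };
console.log(hasRequiredFields(minimalLexicon));
```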
Term sets
Named pools of words grouped by part of speech. Each term has a value and optional weight (default 1). Higher weights increase selection probability.
{
"termSets": {
"noun.business": {
"pos": "noun",
"tags": ["domain:business", "register:formal"],
"terms": [
{ "value": "strategy", "weight": 5 },
{ "value": "stakeholder", "weight": 3 },
{ "value": "synergy", "weight": 2 }
]
},
"verb.business": {
"pos": "verb",
"tags": ["domain:business"],
"terms": [
{ "value": "leverage", "weight": 4 },
{ "value": "optimize", "weight": 4 },
{ "value": "streamline", "weight": 3 }
]
}
}
}
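To make the effect of weights concrete, here is a minimal sketch of weighted selection over one term set. It assumes a plain roulette-wheel draw; the library's actual sampler may differ. `pickWeighted` and its `rand` parameter are hypothetical names introduced for illustration.

```typescript
interface Term {
  value: string;
  weight?: number; // defaults to 1 when omitted
}

// Roulette-wheel selection: each term owns a slice of [0, total)
// proportional to its weight. `rand` is injectable so results can be
// made deterministic; it must return a number in [0, 1).
function pickWeighted(terms: Term[], rand: () => number = Math.random): string {
  if (terms.length === 0) throw new Error("term set is empty");
  const total = terms.reduce((sum, t) => sum + (t.weight ?? 1), 0);
  let threshold = rand() * total;
  let chosen = "";
  for (const term of terms) {
    chosen = term.value;
    threshold -= term.weight ?? 1;
    if (threshold < 0) break;
  }
  return chosen; // falls back to the last term on floating-point drift
}

const businessNouns: Term[] = [
  { value: "strategy", weight: 5 },
  { value: "stakeholder", weight: 3 },
  { value: "synergy", weight: 2 },
];
console.log(pickWeighted(businessNouns));
```

With the weights above, "strategy" covers half of the range (5 out of 10), so under this scheme it would be drawn about half the time.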
Supported POS values
noun, verb, adj, adv, prep, conj, intj, det
Term features
Each term can carry features for morphology and constraint evaluation:
{
"value": "child",
"weight": 3,
"features": {
"countable": true,
"number": "singular",
"irregular": { "plural": "children" }
}
}
Available features:
| Feature | Type | Description |
|---|---|---|
| countable | boolean | Whether the noun is countable |
| number | 'singular' \| 'plural' \| 'both' | Grammatical number |
| person | 1 \| 2 \| 3 | Grammatical person |
| transitive | boolean | Whether the verb is transitive |
| irregular.plural | string | Irregular plural form |
| irregular.pastTense | string | Irregular past tense |
| irregular.pastParticiple | string | Irregular past participle |
| irregular.presentParticiple | string | Irregular present participle |
| irregular.thirdPerson | string | Irregular third person singular |
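The sketch below shows one way a morphology step could consult these features when pluralizing a noun. It is an illustration only: `pluralize` is a hypothetical helper, the regular-plural fallback is deliberately naive, and the handling of `countable: false` and `number: "both"` is an assumption.

```typescript
interface TermFeatures {
  countable?: boolean;
  number?: "singular" | "plural" | "both";
  irregular?: { plural?: string };
}

interface Term {
  value: string;
  weight?: number;
  features?: TermFeatures;
}

function pluralize(term: Term): string {
  const features = term.features;
  // Assumption: uncountable nouns and already-plural forms are left unchanged.
  if (features?.countable === false) return term.value;
  if (features?.number === "plural" || features?.number === "both") return term.value;
  // An explicit irregular form takes priority over regular rules.
  const irregularPlural = features?.irregular?.plural;
  if (irregularPlural) return irregularPlural;
  // Naive regular English rules, for illustration only.
  if (/(s|x|z|ch|sh)$/.test(term.value)) return term.value + "es";
  if (/[^aeiou]y$/.test(term.value)) return term.value.slice(0, -1) + "ies";
  return term.value + "s";
}

const child: Term = {
  value: "child",
  weight: 3,
  features: { countable: true, number: "singular", irregular: { plural: "children" } },
};
console.log(pluralize(child)); // "children"
```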
Patterns
Syntactic templates that define phrase or sentence structure. These supplement the built-in grammar engine.
{
"patterns": {
"sentence.corporate": {
"type": "sentence",
"slots": ["NP", "VP", "PUNCT"],
"tags": ["domain:business"],
"weight": 2
}
}
}
Pattern types
sentence, nounPhrase, verbPhrase, prepPhrase, adjPhrase, advPhrase, clause
Pattern fields
| Field | Type | Description |
|---|---|---|
| type | string | Pattern type (see above) |
| slots | string[] | Structural slots defining the template |
| constraints | string[] | Constraint IDs to apply |
| weight | number | Selection weight (default 1) |
| tags | string[] | Tags for categorization |
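For reference, the field table can be written as a TypeScript type. This is a sketch inferred from the table, not the library's own declaration: which fields are optional is an assumption, and `patternsOfType` is a hypothetical helper showing how the default weight of 1 could be applied when collecting candidates of one type.

```typescript
// Inferred from the field table; optionality is an assumption.
interface Pattern {
  type: string;
  slots: string[];
  constraints?: string[];
  weight?: number; // defaults to 1
  tags?: string[];
}

// Collect the patterns of one type together with their effective weights.
function patternsOfType(
  patterns: Record<string, Pattern>,
  type: string
): Array<{ id: string; weight: number }> {
  return Object.keys(patterns)
    .filter((id) => patterns[id].type === type)
    .map((id) => ({ id, weight: patterns[id].weight ?? 1 }));
}

const examplePatterns: Record<string, Pattern> = {
  "sentence.corporate": {
    type: "sentence",
    slots: ["NP", "VP", "PUNCT"],
    tags: ["domain:business"],
    weight: 2,
  },
  // Hypothetical second pattern with no explicit weight.
  "sentence.plain": { type: "sentence", slots: ["NP", "VP", "PUNCT"] },
};
console.log(patternsOfType(examplePatterns, "sentence"));
```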
Distributions
Named weight tables that bias choices. Referenced by archetypes.
{
"distributions": {
"sentenceTypes.corporate": [
{ "key": "simpleDeclarative", "weight": 50 },
{ "key": "compound", "weight": 20 },
{ "key": "introAdverbial", "weight": 15 },
{ "key": "subordinate", "weight": 12 },
{ "key": "interjection", "weight": 2 },
{ "key": "question", "weight": 1 }
],
"termSetBias.domain:business": [
{ "key": "noun.business", "weight": 10 },
{ "key": "verb.business", "weight": 8 }
]
}
}
Each entry is a { key, weight } pair. Weights are relative.
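Because weights are relative, only their ratios matter. The sketch below converts a distribution into probabilities by dividing each weight by the total; `toProbabilities` is a hypothetical helper, not a library function.

```typescript
interface DistributionEntry {
  key: string;
  weight: number;
}

// Normalize relative weights so they sum to 1.
function toProbabilities(entries: DistributionEntry[]): Array<{ key: string; probability: number }> {
  const total = entries.reduce((sum, e) => sum + e.weight, 0);
  if (total <= 0) throw new Error("weights must sum to a positive number");
  return entries.map((e) => ({ key: e.key, probability: e.weight / total }));
}

const corporateSentenceTypes: DistributionEntry[] = [
  { key: "simpleDeclarative", weight: 50 },
  { key: "compound", weight: 20 },
  { key: "introAdverbial", weight: 15 },
  { key: "subordinate", weight: 12 },
  { key: "interjection", weight: 2 },
  { key: "question", weight: 1 },
];
console.log(toProbabilities(corporateSentenceTypes));
```

The example weights happen to sum to 100, so each weight reads directly as a percentage. Scaling every weight by the same factor would leave the resulting probabilities unchanged.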
Correlations
Conditional weight adjustments triggered by generation events. When a term from a specific set is chosen, other sets can be boosted or suppressed.
{
"correlations": [
{
"when": { "chosenTermSet": "noun.business" },
"thenBoost": [
{ "termSet": "verb.business", "weightDelta": 5 },
{ "termSet": "adj.business", "weightDelta": 3 }
],
"scope": "sentence"
}
]
}
Condition types
| Field | Description |
|---|---|
| chosenTermSet | A term from this set was selected |
| chosenTag | A term with this tag was selected |
| chosenValue | A specific word was selected |
| usedPattern | A specific pattern was used |
Boost targets
| Field | Description |
|---|---|
| termSet | Term set ID to boost |
| pattern | Pattern ID to boost |
| weightDelta | Weight adjustment (positive to boost, negative to suppress) |
Scopes
token, phrase, sentence, paragraph
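The following sketch shows how a correlation could be evaluated against a single generation event. It is not the engine's implementation: `ChoiceEvent`, `conditionMatches`, and `termSetDeltas` are hypothetical names, multiple condition fields are assumed to combine with AND, and scope handling (when a boost expires) is left out.

```typescript
interface CorrelationCondition {
  chosenTermSet?: string;
  chosenTag?: string;
  chosenValue?: string;
  usedPattern?: string;
}
interface BoostTarget {
  termSet?: string;
  pattern?: string;
  weightDelta: number;
}
interface Correlation {
  when: CorrelationCondition;
  thenBoost: BoostTarget[];
  scope?: string;
}

// Hypothetical description of what was just generated.
interface ChoiceEvent {
  termSet?: string;
  tags?: string[];
  value?: string;
  pattern?: string;
}

// Assumption: every field present in `when` must match; an empty `when` never matches.
function conditionMatches(when: CorrelationCondition, event: ChoiceEvent): boolean {
  const checks: boolean[] = [];
  if (when.chosenTermSet !== undefined) checks.push(when.chosenTermSet === event.termSet);
  if (when.chosenTag !== undefined) checks.push((event.tags ?? []).indexOf(when.chosenTag) !== -1);
  if (when.chosenValue !== undefined) checks.push(when.chosenValue === event.value);
  if (when.usedPattern !== undefined) checks.push(when.usedPattern === event.pattern);
  return checks.length > 0 && checks.every((ok) => ok);
}

// Sum the weightDelta values that apply to each term set after an event.
function termSetDeltas(correlations: Correlation[], event: ChoiceEvent): Record<string, number> {
  const deltas: Record<string, number> = Object.create(null);
  for (const correlation of correlations) {
    if (!conditionMatches(correlation.when, event)) continue;
    for (const boost of correlation.thenBoost) {
      if (boost.termSet === undefined) continue; // pattern boosts are not modelled here
      deltas[boost.termSet] = (deltas[boost.termSet] ?? 0) + boost.weightDelta;
    }
  }
  return deltas;
}

const exampleCorrelations: Correlation[] = [
  {
    when: { chosenTermSet: "noun.business" },
    thenBoost: [
      { termSet: "verb.business", weightDelta: 5 },
      { termSet: "adj.business", weightDelta: 3 },
    ],
    scope: "sentence",
  },
];
console.log(termSetDeltas(exampleCorrelations, { termSet: "noun.business", value: "strategy" }));
```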
Constraints
Rules that restrict generation. Hard constraints cause retries; soft constraints produce warnings.
{
"constraints": [
{
"id": "c.noRepeatNoun",
"level": "hard",
"scope": "sentence",
"type": "noRepeat",
"target": "pos:noun"
},
{
"id": "c.maxPP",
"level": "hard",
"scope": "phrase",
"type": "maxCount",
"target": "PP",
"value": 2
}
]
}
Constraint types
| Type | Description |
|---|---|
| noRepeat | No repeated values for the target |
| maxCount | Maximum occurrences (requires value) |
| minCount | Minimum occurrences (requires value) |
| required | Target must appear |
| forbidden | Target must not appear |
| custom | Custom function (requires customFn) |
Levels
| Level | Behavior |
|---|---|
| hard | Causes retry on violation |
| soft | Produces warning, generation continues |
Scopes
token, phrase, clause, sentence, paragraph, text
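To illustrate how the constraint types and levels interact, the sketch below checks a list of constraints against already-generated tokens and reports whether a retry is needed. The `Token` shape, with a single `label` matched against `target`, is a hypothetical simplification, and scopes and `custom` constraints are not modelled.

```typescript
interface Constraint {
  id: string;
  level: "hard" | "soft";
  scope: string;
  type: "noRepeat" | "maxCount" | "minCount" | "required" | "forbidden" | "custom";
  target: string;
  value?: number;
}

// Hypothetical token shape: `label` is compared with a constraint's target,
// for example "pos:noun" or "PP".
interface Token {
  value: string;
  label: string;
}

function isViolated(constraint: Constraint, tokens: Token[]): boolean {
  const hits = tokens.filter((t) => t.label === constraint.target);
  switch (constraint.type) {
    case "noRepeat": {
      const seen: string[] = [];
      for (const hit of hits) {
        if (seen.indexOf(hit.value) !== -1) return true;
        seen.push(hit.value);
      }
      return false;
    }
    case "maxCount":
      return constraint.value !== undefined && hits.length > constraint.value;
    case "minCount":
      return constraint.value !== undefined && hits.length < constraint.value;
    case "required":
      return hits.length === 0;
    case "forbidden":
      return hits.length > 0;
    default:
      return false; // "custom" is not modelled in this sketch
  }
}

// Hard violations request a retry; soft violations are collected as warnings.
function evaluateConstraints(
  constraints: Constraint[],
  tokens: Token[]
): { retry: boolean; warnings: string[] } {
  let retry = false;
  const warnings: string[] = [];
  for (const constraint of constraints) {
    if (!isViolated(constraint, tokens)) continue;
    if (constraint.level === "hard") retry = true;
    else warnings.push(constraint.id);
  }
  return { retry, warnings };
}

const exampleConstraints: Constraint[] = [
  { id: "c.noRepeatNoun", level: "hard", scope: "sentence", type: "noRepeat", target: "pos:noun" },
  { id: "c.maxPP", level: "hard", scope: "phrase", type: "maxCount", target: "PP", value: 2 },
];
const repeatedNoun: Token[] = [
  { value: "strategy", label: "pos:noun" },
  { value: "strategy", label: "pos:noun" },
];
console.log(evaluateConstraints(exampleConstraints, repeatedNoun));
```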
Invariants
Conditions that must always hold true. Checked after generation.
{
"invariants": [
{ "id": "inv.capitalized", "type": "capitalization", "scope": "sentence" },
{ "id": "inv.endsWithPunct", "type": "punctuation", "scope": "sentence" },
{ "id": "inv.noDoubleSpaces", "type": "whitespace", "scope": "text" }
]
}
Invariant types
| Type | Description |
|---|---|
| capitalization | First letter is capitalized |
| punctuation | Proper end punctuation |
| whitespace | No double spaces or trailing whitespace |
| agreement | Subject-verb agreement |
| custom | Custom function (requires customFn) |
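As a sketch of what a post-generation check might involve, the code below implements three of the invariant types as simple string tests. The exact rules are assumptions: "proper end punctuation" is taken to mean `.`, `!`, or `?`, and the first character must itself be an uppercase letter. `agreement` and `custom` are omitted, and the names `holds` and `failedInvariants` are hypothetical.

```typescript
type InvariantType = "capitalization" | "punctuation" | "whitespace";

interface Invariant {
  id: string;
  type: InvariantType;
  scope: string;
}

function holds(type: InvariantType, text: string): boolean {
  if (type === "capitalization") {
    const first = text.charAt(0);
    // Uppercase and cased: rejects lowercase letters, digits, and empty strings.
    return first === first.toUpperCase() && first !== first.toLowerCase();
  }
  if (type === "punctuation") {
    return /[.!?]$/.test(text);
  }
  // whitespace: no run of two spaces and no trailing whitespace
  return !/ {2}/.test(text) && !/\s$/.test(text);
}

// Return the IDs of the invariants that the text fails.
function failedInvariants(invariants: Invariant[], text: string): string[] {
  return invariants.filter((inv) => !holds(inv.type, text)).map((inv) => inv.id);
}

const exampleInvariants: Invariant[] = [
  { id: "inv.capitalized", type: "capitalization", scope: "sentence" },
  { id: "inv.endsWithPunct", type: "punctuation", scope: "sentence" },
  { id: "inv.noDoubleSpaces", type: "whitespace", scope: "text" },
];
console.log(failedInvariants(exampleInvariants, "We leverage synergy."));
```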
Relations
Graph connections between terms for co-occurrence modeling.
{
"relations": [
{ "from": "strategy", "type": "hasPart", "to": "initiative", "weight": 3 },
{ "from": "meeting", "type": "produces", "to": "deliverable", "weight": 2 },
{ "from": "synergy", "type": "enables", "to": "leverage", "weight": 2 }
]
}
| Field | Type | Description |
|---|---|---|
| from | string | Source term |
| type | string | Relation type (any string) |
| to | string | Target term |
| weight | number | Relation weight for sampling (optional) |
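A relation list can be queried as a directed graph. The sketch below looks up the terms reachable from a given source term, optionally filtered by relation type. `relatedTo` is a hypothetical helper, treating relations as one-directional is an assumption, and so is the fallback weight of 1, since the table only states that `weight` is optional.

```typescript
interface Relation {
  from: string;
  type: string;
  to: string;
  weight?: number;
}

// List the targets related to `term`, optionally restricted to one relation type.
function relatedTo(
  relations: Relation[],
  term: string,
  type?: string
): Array<{ to: string; weight: number }> {
  return relations
    .filter((r) => r.from === term && (type === undefined || r.type === type))
    .map((r) => ({ to: r.to, weight: r.weight ?? 1 })); // assumed default weight
}

const exampleRelations: Relation[] = [
  { from: "strategy", type: "hasPart", to: "initiative", weight: 3 },
  { from: "meeting", type: "produces", to: "deliverable", weight: 2 },
  { from: "synergy", type: "enables", to: "leverage", weight: 2 },
];
console.log(relatedTo(exampleRelations, "strategy"));
```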
Lexicon-level transforms
Lexicons can specify default transforms and per-archetype transform pipelines:
{
"outputTransforms": {
"defaults": [
{ "id": "bizJargon" }
]
},
"archetypes": {
"pirate-corp": {
"tags": ["domain:business"],
"outputTransforms": {
"pipeline": [
{ "id": "pirate" },
{ "id": "bizJargon" }
]
}
}
}
}
Transform config merges in this order (later overrides earlier):
- Base config (from GeneratorConfig.outputTransforms)
- Lexicon defaults (from lexicon.outputTransforms.defaults)
- Archetype transforms (from lexicon.archetypes[name].outputTransforms.pipeline)
- Per-call overrides (from opts.outputTransforms)
See Output Transforms > Chaining & Configuration for full merge details.
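The precedence order can be pictured with the sketch below. It assumes the simplest reading of "later overrides earlier", in which the last layer that is present replaces the whole transform list. The real merge may combine layers more finely, so treat the referenced page as authoritative. `TransformLayers` and `resolveTransforms` are hypothetical names.

```typescript
interface TransformRef {
  id: string;
}

// One optional transform list per configuration layer, in precedence order.
interface TransformLayers {
  base?: TransformRef[];
  lexiconDefaults?: TransformRef[];
  archetypePipeline?: TransformRef[];
  perCall?: TransformRef[];
}

// Assumption: a later layer, when present, replaces the earlier result entirely.
function resolveTransforms(layers: TransformLayers): TransformRef[] {
  const ordered = [
    layers.base,
    layers.lexiconDefaults,
    layers.archetypePipeline,
    layers.perCall,
  ];
  let resolved: TransformRef[] = [];
  for (const layer of ordered) {
    if (layer !== undefined) resolved = layer;
  }
  return resolved;
}

// Using the lexicon above with the "pirate-corp" archetype selected.
const resolved = resolveTransforms({
  lexiconDefaults: [{ id: "bizJargon" }],
  archetypePipeline: [{ id: "pirate" }, { id: "bizJargon" }],
});
console.log(resolved.map((t) => t.id));
```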