Lexicons

A lexicon is a JSON file that steers text generation toward specific vocabularies and styles. Without a lexicon, Malarky uses a general-purpose English vocabulary. With one, you control exactly which words appear and how they’re weighted.

Why use a lexicon?

  • Domain-specific text – Generate corporate, medical, legal, or technical nonsense
  • Style control – Adjust sentence type distributions and complexity per domain
  • Weighted vocabulary – Favor certain words over others
  • Correlated choices – When a business noun is picked, boost business verbs
  • Quality constraints – Prevent word repetition, limit phrase complexity

Minimal example

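Only id and language are required, but a useful lexicon also defines at least one termSet. This example adds two term sets and an archetype that references their shared tag: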

{
  "id": "lexicon.startup",
  "language": "en",
  "termSets": {
    "noun.startup": {
      "pos": "noun",
      "tags": ["domain:startup"],
      "terms": [
        { "value": "disruptor", "weight": 5 },
        { "value": "unicorn", "weight": 3 },
        { "value": "pivot", "weight": 4 },
        { "value": "runway", "weight": 2 }
      ]
    },
    "verb.startup": {
      "pos": "verb",
      "tags": ["domain:startup"],
      "terms": [
        { "value": "disrupt", "weight": 5 },
        { "value": "scale", "weight": 4 },
        { "value": "pivot", "weight": 3 },
        { "value": "iterate", "weight": 3 }
      ]
    }
  },
  "archetypes": {
    "startup": {
      "tags": ["domain:startup"]
    }
  }
}
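
To make the weight values concrete, here is a minimal sketch of weight-proportional selection. It assumes weights act as relative probabilities within a term set, so "disruptor" (weight 5 of a total 14) would be chosen about 36% of the time. The pickWeighted helper and the WeightedTerm type are hypothetical illustrations, not part of Malarky's API, and the library's actual sampling may differ (for example once distributions or correlations apply).

```typescript
// Hypothetical helper, not part of Malarky: illustrates weight-proportional selection.
interface WeightedTerm {
  value: string;
  weight: number;
}

// `rand` must return a number in [0, 1); it is injectable so picks can be reproduced.
function pickWeighted(
  terms: WeightedTerm[],
  rand: () => number = Math.random,
): string {
  if (terms.length === 0) {
    throw new Error("term set is empty");
  }
  const total = terms.reduce((sum, term) => sum + term.weight, 0);
  if (total <= 0) {
    throw new Error("total weight must be positive");
  }
  // Walk the list, subtracting each weight until the threshold drops below zero.
  let threshold = rand() * total;
  for (const term of terms) {
    threshold -= term.weight;
    if (threshold < 0) {
      return term.value;
    }
  }
  // Guard against floating-point drift: fall back to the last term.
  return terms[terms.length - 1].value;
}

// The noun.startup terms from the example above (weights sum to 14).
const startupNouns: WeightedTerm[] = [
  { value: "disruptor", weight: 5 },
  { value: "unicorn", weight: 3 },
  { value: "pivot", weight: 4 },
  { value: "runway", weight: 2 },
];

console.log(pickWeighted(startupNouns));
```

Under this assumption, doubling a term's weight makes it roughly twice as likely relative to its neighbors without touching the other entries.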

Loading a lexicon in code

import {
  TextGenerator,
  SimpleFakerAdapter,
  loadLexiconFromString,
} from 'malarky';
import { readFileSync } from 'fs';

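// Read the lexicon file and parse it into a lexicon object.
const lexicon = loadLexiconFromString(readFileSync('./startup.json', 'utf-8'));

// Create a generator that draws its vocabulary from the lexicon.
const generator = new TextGenerator({
  fakerAdapter: new SimpleFakerAdapter(),
  lexicon,
});

// Select the "startup" archetype defined in the lexicon, then generate text.
generator.setArchetype('startup');
console.log(generator.paragraph());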

Loading a lexicon from the CLI

malarky paragraph --lexicon ./startup.json --archetype startup

What’s in a lexicon?

Besides the required id and language fields, a lexicon can contain any of these sections, all of them optional:

Section            Purpose
termSets           Named pools of words grouped by part of speech
patterns           Syntactic templates for phrases/sentences
distributions      Named weight tables that bias choices
correlations       Conditional boosts triggered by word choices
constraints        Hard/soft rules restricting generation
invariants         Conditions that must always hold true
archetypes         Style presets combining tags, distributions, overrides
relations          Graph connections between terms
outputTransforms   Default transform pipelines

See the Schema Reference for full documentation of each section.
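
As a rough illustration of the top-level layout, the following sketch checks a lexicon file before it is handed to the library. It assumes id and language are non-empty strings (as in the example above) and treats any top-level key outside the listed sections as worth flagging; the real schema may be more permissive. The checkLexiconShape function is a hypothetical helper, not part of Malarky, and it does not replace the Schema Reference.

```typescript
// Hypothetical pre-flight check, not part of Malarky's API.
// Section names are taken from the list of lexicon sections above.
const OPTIONAL_SECTIONS = [
  "termSets",
  "patterns",
  "distributions",
  "correlations",
  "constraints",
  "invariants",
  "archetypes",
  "relations",
  "outputTransforms",
] as const;

// Returns a list of human-readable problems; an empty list means the shape looks plausible.
function checkLexiconShape(json: string): string[] {
  let data: unknown;
  try {
    data = JSON.parse(json);
  } catch (_err) {
    return ["not valid JSON"];
  }
  if (typeof data !== "object" || data === null || Array.isArray(data)) {
    return ["top level must be a JSON object"];
  }

  const record = data as Record<string, unknown>;
  const problems: string[] = [];

  // Assumption: both required fields are non-empty strings.
  for (const key of ["id", "language"]) {
    const value = record[key];
    if (typeof value !== "string" || value === "") {
      problems.push(`missing required string field "${key}"`);
    }
  }

  // Assumption: unknown keys are likely typos, so report them.
  const known = new Set<string>(["id", "language", ...OPTIONAL_SECTIONS]);
  for (const key of Object.keys(record)) {
    if (!known.has(key)) {
      problems.push(`unrecognized top-level key "${key}"`);
    }
  }
  return problems;
}

console.log(checkLexiconShape('{"id": "lexicon.startup", "language": "en"}'));
```

Returning a list of problems rather than throwing on the first one lets a caller report every issue in a file at once.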



Malarky © 2026. Distributed under the MIT License.
