Chaining & Configuration

Advanced configuration for the output transform pipeline.

Pipeline ordering

Transforms execute sequentially: the output tokens of one transform become the input to the next. Each transform gets its own seeded RNG fork, so for a given seed the results are deterministic regardless of pipeline length.
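The sequential model with per-transform RNG forks can be sketched as follows. This is a minimal standalone illustration, not Malarky's implementation: the `Rng` and `Transform` shapes, the `mulberry32` generator, the `forkSeed` hash, and the toy `shout` and `exclaim` transforms are all assumptions made for the example. Deriving each fork from the base seed plus the transform id is one plausible way to keep a transform's random draws independent of what else is in the pipeline.

```typescript
// Hypothetical shapes; Malarky's real token and transform types may differ.
type Rng = () => number; // returns a float in [0, 1)
type Transform = { id: string; apply: (tokens: string[], rng: Rng) => string[] };

// mulberry32: a small seeded PRNG, used here only as a stand-in.
function mulberry32(seed: number): Rng {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Derive a child seed from the base seed and the transform id (FNV-1a style).
function forkSeed(baseSeed: number, id: string): number {
  let h = (2166136261 ^ baseSeed) >>> 0;
  for (let i = 0; i < id.length; i++) {
    h ^= id.charCodeAt(i);
    h = Math.imul(h, 16777619) >>> 0;
  }
  return h;
}

// Each transform receives the previous transform's output and its own RNG fork.
function runPipeline(tokens: string[], pipeline: Transform[], seed: number): string[] {
  return pipeline.reduce(
    (current, t) => t.apply(current, mulberry32(forkSeed(seed, t.id))),
    tokens,
  );
}

// Toy transforms for illustration only.
const shout: Transform = {
  id: 'shout',
  apply: (toks, rng) => toks.map((w) => (rng() < 0.5 ? w.toUpperCase() : w)),
};
const exclaim: Transform = {
  id: 'exclaim',
  apply: (toks) => [...toks, '!'],
};

const alone = runPipeline(['hello', 'world'], [shout], 42);
const chained = runPipeline(['hello', 'world'], [exclaim, shout], 42);
console.log(alone, chained);
```

In this sketch, adding `exclaim` in front of `shout` does not change which words `shout` uppercases, because `shout` always draws from a fork seeded by its own id.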

Every transform has a preferredOrder value (lower = earlier). View these with:

malarky list transforms

Auto-ordering

Set autoOrder: true in the config to sort the pipeline by each transform's preferredOrder automatically:

const result = generator.sentence({
  outputTransforms: {
    enabled: true,
    pipeline: [
      { id: 'mockCase' }, // preferredOrder: 50
      { id: 'pirate' }, // preferredOrder: 10
    ],
    autoOrder: true, // reorders to: pirate, mockCase
  },
});

Without autoOrder, transforms execute in the order you specify.
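As a rough sketch of what auto-ordering amounts to, the snippet below sorts steps by ascending preferredOrder. The `Step` type, the `knownOrder` lookup table, and the `autoOrderSteps` helper are hypothetical names for illustration; the two order values are the ones quoted in the example above. Placing steps with no known order last is an assumption, not documented behavior.

```typescript
// Hypothetical shapes for illustration; not Malarky's actual types.
type Step = { id: string };

// preferredOrder values taken from the example above.
const knownOrder: Record<string, number> = { pirate: 10, mockCase: 50 };

function autoOrderSteps(steps: Step[], order: Record<string, number>): Step[] {
  // Copy first so the caller's array is untouched. Array.prototype.sort is
  // stable (ES2019+), so steps with equal order keep their relative position.
  // Assumption: steps without a known order sort last.
  const rank = (s: Step): number => order[s.id] ?? Number.MAX_SAFE_INTEGER;
  return [...steps].sort((x, y) => rank(x) - rank(y));
}

const sortedSteps = autoOrderSteps([{ id: 'mockCase' }, { id: 'pirate' }], knownOrder);
console.log(sortedSteps.map((s) => s.id).join(', '));
```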

CLI ordering

From the CLI, transforms execute in the order given:

# pirate runs first, then mockCase
malarky sentence --seed 42 --transform pirate,mockCase

# Same with repeated flags
malarky sentence --seed 42 -x pirate -x mockCase

Transform protection

The protection system prevents transforms from modifying certain tokens, so that acronyms, numbers, URLs, and similar tokens survive transformation intact.

Protection rules

Rule	Default	Description
`keepAcronyms`	`true`	ALL-CAPS words are skipped
`keepNumbers`	`true`	Number tokens are skipped
`keepCodeTokens`	`true`	Code-like tokens (camelCase, etc.) are skipped
`keepUrlsEmails`	`true`	URL and email tokens are skipped
`minWordLength`	`2`	Words shorter than this are skipped
`customProtectedRegex`	`[]`	Additional regex patterns to protect
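To make the rules concrete, here is a minimal sketch of a predicate that decides whether a single word would be skipped. It is not Malarky's `applyProtection`: the `ProtectionRules` type, `parseRegexString`, `isProtectedWord`, and every detection regex are assumptions chosen for illustration, and the library's actual heuristics may differ. Only the option names and defaults come from the table above.

```typescript
// Mirrors the option names in the table; the type itself is hypothetical.
type ProtectionRules = {
  keepAcronyms: boolean;
  keepNumbers: boolean;
  keepCodeTokens: boolean;
  keepUrlsEmails: boolean;
  minWordLength: number;
  customProtectedRegex: string[];
};

// Defaults as listed in the table.
const defaultRules: ProtectionRules = {
  keepAcronyms: true,
  keepNumbers: true,
  keepCodeTokens: true,
  keepUrlsEmails: true,
  minWordLength: 2,
  customProtectedRegex: [],
};

// Accepts '/pattern/flags' strings; falls back to treating the input as a bare pattern.
function parseRegexString(src: string): RegExp {
  const m = /^\/(.*)\/([a-z]*)$/.exec(src);
  return m ? new RegExp(m[1] ?? '', m[2] ?? '') : new RegExp(src);
}

function isProtectedWord(word: string, rules: ProtectionRules): boolean {
  if (word.length < rules.minWordLength) return true; // too short
  if (rules.keepAcronyms && /^[A-Z]{2,}$/.test(word)) return true; // ALL-CAPS
  if (rules.keepNumbers && /^\d+(?:[.,]\d+)*$/.test(word)) return true; // numeric
  if (rules.keepCodeTokens && /[a-z][A-Z]|_/.test(word)) return true; // camelCase or snake_case
  if (
    rules.keepUrlsEmails &&
    (/^https?:\/\//i.test(word) || /^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(word))
  ) {
    return true; // URL or email
  }
  return rules.customProtectedRegex.some((src) => parseRegexString(src).test(word));
}

console.log(isProtectedWord('NASA', defaultRules)); // acronym rule applies
console.log(isProtectedWord('hello', defaultRules)); // no rule applies
```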

Overriding protection

const result = generator.sentence({
  outputTransforms: {
    enabled: true,
    pipeline: [{ id: 'leet' }],
    protection: {
      keepAcronyms: true,
      keepNumbers: true,
      minWordLength: 3, // skip 1-2 letter words
      customProtectedRegex: [
        '/^Dr\\.$/i', // protect "Dr."
      ],
    },
  },
});

Config merge order

Transform configuration can come from multiple sources. They merge in this order (later overrides earlier):

1. Base config – from GeneratorConfig.outputTransforms
2. Lexicon defaults – from lexicon.outputTransforms.defaults
3. Archetype transforms – from lexicon.archetypes[name].outputTransforms.pipeline
4. Per-call overrides – from opts.outputTransforms
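The "later overrides earlier" rule can be illustrated with a simple layered merge. This sketch assumes a shallow, field-by-field merge and a simplified `LayerConfig` shape; the real `mergeOutputTransformsConfig` may treat nested fields such as protection differently, and `mergeLayers` is a hypothetical helper, not a Malarky export.

```typescript
// Simplified, hypothetical config shape for illustration.
type LayerConfig = {
  enabled?: boolean;
  autoOrder?: boolean;
  pipeline?: { id: string }[];
};

function mergeLayers(...layers: (LayerConfig | undefined)[]): LayerConfig {
  // Later layers win field by field; missing layers contribute nothing.
  return layers.reduce<LayerConfig>((acc, layer) => ({ ...acc, ...layer }), {});
}

const baseLayer: LayerConfig = { enabled: false, pipeline: [{ id: 'pirate' }] };
const lexiconLayer: LayerConfig = { enabled: true };
const archetypeLayer: LayerConfig = { pipeline: [{ id: 'leet' }] }; // archetypes supply a pipeline
const perCallLayer: LayerConfig = { autoOrder: true };

const effective = mergeLayers(baseLayer, lexiconLayer, archetypeLayer, perCallLayer);
console.log(effective);
```

Here the lexicon layer flips `enabled`, the archetype layer swaps the pipeline, and the per-call layer adds `autoOrder`, each overriding only the fields it sets.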

Merge modes

The mergeMode option on generation methods controls how per-call pipelines combine with the base:

Mode	Behavior
`'replace'`	Per-call pipeline replaces the base pipeline entirely
`'append'`	Per-call pipeline steps are appended after the base pipeline

// Replace the base pipeline
generator.sentence({
  outputTransforms: { enabled: true, pipeline: [{ id: 'leet' }] },
  mergeMode: 'replace',
});

// Append to the base pipeline
generator.sentence({
  outputTransforms: { enabled: true, pipeline: [{ id: 'uwu' }] },
  mergeMode: 'append',
});
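The difference between the two modes comes down to how two step lists are combined. The following sketch shows that logic in isolation; `PipelineStep`, `MergeMode`, and `combinePipelines` are hypothetical names for illustration, not part of Malarky's API.

```typescript
// Hypothetical types for illustration only.
type PipelineStep = { id: string };
type MergeMode = 'replace' | 'append';

function combinePipelines(
  base: PipelineStep[],
  perCall: PipelineStep[],
  mode: MergeMode,
): PipelineStep[] {
  // 'append' keeps the base steps and runs per-call steps after them;
  // 'replace' discards the base steps entirely.
  return mode === 'append' ? [...base, ...perCall] : [...perCall];
}

const basePipeline: PipelineStep[] = [{ id: 'pirate' }];

const replaced = combinePipelines(basePipeline, [{ id: 'leet' }], 'replace');
const appended = combinePipelines(basePipeline, [{ id: 'uwu' }], 'append');
console.log(replaced.map((s) => s.id), appended.map((s) => s.id));
```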

Transform system exports

For building custom pipelines or integrations, Malarky exports the full transform infrastructure:

import {
  // Tokenizer
  tokenize, // string -> Token[]
  render, // Token[] -> string
  classifyChar, // char -> TokenType

  // Protection
  applyProtection, // Mark tokens as protected
  isProtected, // Check if a token is protected

  // Registry
  TransformRegistry, // Transform storage class
  createDefaultRegistry, // Pre-loaded with all V1 transforms

  // Pipeline
  executePipeline, // Run transforms on tokenized text
  checkPipelineOrder, // Warn about suboptimal ordering

  // Config utilities
  mergeOutputTransformsConfig,
  mergeProtectionConfig,
  DEFAULT_PROTECTION_CONFIG,
  DEFAULT_OUTPUT_TRANSFORMS_CONFIG,

  // Individual transforms
  V1_TRANSFORMS, // Array of all V1 transforms
} from 'malarky';

See Custom Transforms for how to build and register your own transforms.