Custom Transforms
Build your own output transforms by implementing the IOutputTransform interface and registering them with the transform registry.
IOutputTransform interface
interface IOutputTransform {
  id: string;
  version: string;
  capabilities: TransformCapabilities;
  validateParams(params: unknown): ValidationResult;
  apply(input: TransformInput): TransformOutput;
}
TransformCapabilities
interface TransformCapabilities {
  requiresTrace: boolean; // Does this transform need trace data?
  posAware: boolean; // Does it use part-of-speech info?
  deterministic: boolean; // Same input = same output?
  safeToStack: boolean; // Can it run after other transforms?
  preferredOrder: number; // Pipeline ordering hint (lower = earlier)
}
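As a rough illustration of how the `preferredOrder` hint could be read, the sketch below sorts a few transform records so that lower values come first. The `orderByPreference` helper, the sample IDs, and the order values are hypothetical and exist only for this example; the library's actual scheduling may also take the explicit pipeline order or other capabilities into account.

```typescript
// Local copy of the capability shape documented above, so the sketch is self-contained.
interface TransformCapabilities {
  requiresTrace: boolean;
  posAware: boolean;
  deterministic: boolean;
  safeToStack: boolean;
  preferredOrder: number;
}

interface OrderedTransform {
  id: string;
  capabilities: TransformCapabilities;
}

// Hypothetical helper: returns a new array sorted by preferredOrder (lower = earlier).
// The input array is left untouched.
function orderByPreference<T extends OrderedTransform>(transforms: readonly T[]): T[] {
  return [...transforms].sort(
    (a, b) => a.capabilities.preferredOrder - b.capabilities.preferredOrder,
  );
}

// Shared capability flags for the sample records below.
const base = { requiresTrace: false, posAware: false, deterministic: true, safeToStack: true };

const ordered = orderByPreference([
  { id: 'late', capabilities: { ...base, preferredOrder: 90 } },
  { id: 'early', capabilities: { ...base, preferredOrder: 10 } },
  { id: 'middle', capabilities: { ...base, preferredOrder: 50 } },
]);
const orderedIds = ordered.map((t) => t.id);
```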
TransformInput
interface TransformInput {
  tokens: Token[]; // The token stream to transform
  rng: IRng; // Seeded RNG for deterministic randomness
  params?: unknown; // Optional per-step parameters
  trace?: GenerationTrace; // Trace data (if requiresTrace is true)
}
TransformOutput
interface TransformOutput {
  tokens: Token[]; // The transformed token stream
}
Token
interface Token {
  value: string; // The text content
  type: TokenType; // 'word' | 'punctuation' | 'whitespace'
  meta?: {
    protected?: boolean; // If true, transforms should skip this token
    pos?: string; // Part of speech (if available)
    [key: string]: unknown; // Additional metadata
  };
}
Complete example
Here’s a custom transform that converts all unprotected words to uppercase:
import {
  TransformRegistry,
  createDefaultRegistry,
  SimpleFakerAdapter,
  TextGenerator,
} from 'malarky';
import type {
  IOutputTransform,
  TransformInput,
  TransformOutput,
  ValidationResult,
} from 'malarky';

const shoutTransform: IOutputTransform = {
  id: 'shout',
  version: '1.0.0',
  capabilities: {
    requiresTrace: false,
    posAware: false,
    deterministic: true,
    safeToStack: true,
    preferredOrder: 50,
  },
  validateParams(params: unknown): ValidationResult {
    return { valid: true, errors: [] };
  },
  apply(input: TransformInput): TransformOutput {
    const tokens = input.tokens.map((token) => {
      // Only transform unprotected words
      if (token.type === 'word' && !token.meta?.protected) {
        return { ...token, value: token.value.toUpperCase() };
      }
      return token;
    });
    return { tokens };
  },
};
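The example above accepts any parameters. For a transform that does take parameters, `validateParams` is the natural place to reject bad input. The following is a minimal sketch under stated assumptions: the `ShoutParams` shape with its `minLength` field and the `validateShoutParams` and `shoutTokens` helpers are hypothetical, and the local `Token` and `ValidationResult` types simply mirror the shapes shown on this page so the block compiles on its own.

```typescript
// Local stand-ins for the types documented above.
type TokenType = 'word' | 'punctuation' | 'whitespace';
interface Token {
  value: string;
  type: TokenType;
  meta?: { protected?: boolean; pos?: string; [key: string]: unknown };
}
interface ValidationResult {
  valid: boolean;
  errors: string[];
}

// Hypothetical parameter shape, used only in this example.
interface ShoutParams {
  minLength: number;
}

// Accept missing params, otherwise require an object with a
// non-negative integer minLength.
function validateShoutParams(params: unknown): ValidationResult {
  if (params === undefined) {
    return { valid: true, errors: [] };
  }
  if (typeof params !== 'object' || params === null) {
    return { valid: false, errors: ['params must be an object'] };
  }
  const { minLength } = params as Partial<ShoutParams>;
  if (typeof minLength !== 'number' || !Number.isInteger(minLength) || minLength < 0) {
    return { valid: false, errors: ['minLength must be a non-negative integer'] };
  }
  return { valid: true, errors: [] };
}

// Uppercase unprotected words that are at least minLength characters long.
// Falls back to minLength = 0 when params are absent or invalid.
function shoutTokens(tokens: Token[], params?: unknown): Token[] {
  const check = validateShoutParams(params);
  const minLength =
    check.valid && params !== undefined ? (params as ShoutParams).minLength : 0;
  return tokens.map((token) =>
    token.type === 'word' && !token.meta?.protected && token.value.length >= minLength
      ? { ...token, value: token.value.toUpperCase() }
      : token,
  );
}
```

Inside a real transform, `validateShoutParams` would back the `validateParams` method and `shoutTokens` would be called from `apply` with `input.tokens` and `input.params`.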
Registering a custom transform
Access the transform registry from a generator instance and register your transform:
const generator = new TextGenerator({
  fakerAdapter: new SimpleFakerAdapter(),
});

// Register the custom transform
generator.getTransformRegistry().register(shoutTransform);

// Use it
const result = generator.sentence({
  outputTransforms: {
    enabled: true,
    pipeline: [{ id: 'shout' }],
  },
});
TransformRegistry API
| Method | Description |
|---|---|
| register(transform) | Register a new transform |
| has(id) | Check if a transform is registered |
| get(id) | Get a transform by ID |
| list() | List all registered transform IDs |
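One way these methods can be combined is a guard that registers a transform only when its ID is still free. The sketch below is an illustration, not part of the library: `RegistryLike`, `registerOnce`, and `InMemoryRegistry` are hypothetical names, the return types are assumptions (only the method names come from the table), and the Map-backed class is a stand-in so the example runs without the package.

```typescript
// Structural view of the four methods listed in the table.
// Return types are assumed for the sake of the example.
interface RegistryLike<T extends { id: string }> {
  register(transform: T): void;
  has(id: string): boolean;
  get(id: string): T | undefined;
  list(): string[];
}

// Hypothetical helper: register a transform only if its ID is not taken yet.
// Returns true when the transform was newly registered.
function registerOnce<T extends { id: string }>(
  registry: RegistryLike<T>,
  transform: T,
): boolean {
  if (registry.has(transform.id)) {
    return false;
  }
  registry.register(transform);
  return true;
}

// Map-backed stand-in used only to exercise the helper.
class InMemoryRegistry<T extends { id: string }> implements RegistryLike<T> {
  private readonly items = new Map<string, T>();

  register(transform: T): void {
    this.items.set(transform.id, transform);
  }
  has(id: string): boolean {
    return this.items.has(id);
  }
  get(id: string): T | undefined {
    return this.items.get(id);
  }
  list(): string[] {
    return Array.from(this.items.keys());
  }
}

const demoRegistry = new InMemoryRegistry<{ id: string }>();
const firstAttempt = registerOnce(demoRegistry, { id: 'shout' });
const secondAttempt = registerOnce(demoRegistry, { id: 'shout' });
```

Here `firstAttempt` is `true` and `secondAttempt` is `false`, and the stand-in registry ends up holding a single `shout` entry.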
Using createDefaultRegistry
For standalone use (outside a generator), create a pre-loaded registry:
import { createDefaultRegistry } from 'malarky';

const registry = createDefaultRegistry();
registry.register(shoutTransform);

// All 11 transforms now available (10 built-in + shout)
console.log(registry.list());
Tips for custom transforms
- Check `token.meta?.protected` – Always skip protected tokens unless your transform specifically needs to modify them.
- Use the provided RNG – If your transform involves randomness, use `input.rng` instead of `Math.random()` to ensure deterministic output.
- Set `deterministic: true` only if the same input always produces the same output (which it will if you use the provided RNG).
- Set `safeToStack: false` if your transform's output might confuse other transforms (e.g., if it inserts special characters that look like punctuation).
- Choose `preferredOrder` carefully – Lower values run earlier. Character-level transforms (leet, mockCase) should typically run after word-level transforms (pirate, bizJargon).
- Use `posAware: true` and `requiresTrace: true` if your transform needs linguistic context, but be aware this couples it to the tracing system.
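The first two tips can be sketched together: a transform body that skips protected tokens and draws its randomness from an injected generator instead of `Math.random()`. The shape of `IRng` is not shown on this page, so the single `next()` method returning a number in [0, 1) is an assumption, as are the `randomShout` helper and the `mulberry32` stand-in generator (a small, well-known seeded PRNG used here only to make the sketch runnable).

```typescript
// Local stand-ins mirroring the Token shape documented above.
type TokenType = 'word' | 'punctuation' | 'whitespace';
interface Token {
  value: string;
  type: TokenType;
  meta?: { protected?: boolean; pos?: string; [key: string]: unknown };
}

// ASSUMPTION: the real IRng interface may differ; this sketch only needs
// a method that returns a float in [0, 1).
interface IRng {
  next(): number;
}

// Stand-in seeded generator (mulberry32), not the library's RNG.
function mulberry32(seed: number): IRng {
  let state = seed >>> 0;
  return {
    next(): number {
      state = (state + 0x6d2b79f5) >>> 0;
      let t = state;
      t = Math.imul(t ^ (t >>> 15), t | 1);
      t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
      return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
    },
  };
}

// Hypothetical transform body: uppercase each unprotected word with the
// given probability, drawing randomness only from the injected rng.
function randomShout(tokens: Token[], rng: IRng, probability = 0.5): Token[] {
  return tokens.map((token) => {
    if (token.type !== 'word' || token.meta?.protected) {
      return token; // leave non-words and protected tokens untouched
    }
    return rng.next() < probability
      ? { ...token, value: token.value.toUpperCase() }
      : token;
  });
}

const sample: Token[] = [
  { value: 'hello', type: 'word' },
  { value: ' ', type: 'whitespace' },
  { value: 'Acme', type: 'word', meta: { protected: true } },
  { value: ' ', type: 'whitespace' },
  { value: 'world', type: 'word' },
];

// Two runs with the same seed produce the same token stream.
const runA = randomShout(sample, mulberry32(42)).map((t) => t.value).join('');
const runB = randomShout(sample, mulberry32(42)).map((t) => t.value).join('');
```

Because every random decision comes from the seeded generator, a transform written this way can reasonably declare `deterministic: true`; swapping in `Math.random()` would break that guarantee.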