Bad — a single catch-all field:
token details { .* }
Better — specific named fields:
token customer-email { <-[\s@]>+ '@' <-[\s@]>+ }
token customer-issue { '"' <( <-["]>+ )> '"' }
Broad tokens give the grammar nothing to check and make downstream code harder to write.
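As a minimal sketch of what named fields buy you downstream, the grammar below wraps the two tokens in a hypothetical `Ticket` grammar. The grammar name and the one-line input format are assumptions made for illustration. The email token excludes `@` from its character classes because a `token` does not backtrack: a plain "not whitespace" run would consume the `@` and the match would fail.

```raku
# Hypothetical grammar: the name Ticket and the input format are illustrative.
grammar Ticket {
    token TOP            { <customer-email> ' ' <customer-issue> }
    # '@' is excluded from both runs; tokens never backtrack to give it back.
    token customer-email { <-[\s@]>+ '@' <-[\s@]>+ }
    # <( and )> keep the surrounding quotes out of the captured text.
    token customer-issue { '"' <( <-["]>+ )> '"' }
}

my $m = Ticket.parse('ana@example.com "cannot log in"');
say ~$m<customer-email>;   # ana@example.com
say ~$m<customer-issue>;   # cannot log in
```

Each field arrives as its own named capture, so downstream code never has to re-split a blob of text.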
Put structure in the grammar — not in the prompt.
Fragile:
Extract the name, party size as an integer between 1 and 20, the time in
24-hour format, the restaurant name without articles, and the date as a
single lowercase word or ISO date…
Better: write a tight grammar and tell the LLM only the DSL format and one worked example. The grammar enforces everything else.
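A rough sketch of that split, assuming a made-up one-line DSL (`book "<guest>" <party size> <HH:MM> <YYYY-MM-DD>`). The grammar name, token names, and DSL shape are all hypothetical, and only some of the constraints from the fragile prompt are covered. The range checks live in code assertions inside the grammar, and the prompt shrinks to a format line plus one example.

```raku
# Hypothetical reservation DSL; all names here are illustrative.
grammar Reservation {
    token TOP   { 'book ' <guest> ' ' <party> ' ' <clock> ' ' <day> }
    token guest { '"' <( <-["]>+ )> '"' }
    # Party size must be an integer from 1 to 20.
    token party { (\d+) <?{ 1 <= $0 <= 20 }> }
    # 24-hour time: hours 00-23, minutes 00-59.
    token clock { (\d\d) ':' (\d\d) <?{ $0 < 24 && $1 < 60 }> }
    token day   { \d\d\d\d '-' \d\d '-' \d\d }
}

# The prompt only needs the format and one worked example.
my $prompt = q:to/END/;
Reply with exactly one line in this format:
book "<guest>" <party size> <HH:MM> <YYYY-MM-DD>
Example: book "Ana" 4 19:30 2025-03-14
END

say so Reservation.parse('book "Ana" 4 19:30 2025-03-14');    # True
say so Reservation.parse('book "Ana" 40 19:30 2025-03-14');   # False
```

The out-of-range party size is rejected by the grammar itself, so the prompt never has to mention the limit.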
Prefer precise token patterns over bare \S+:
| Vague | Precise |
|-------|---------|
| \S+ | \d\d\d\d '-' \d\d '-' \d\d (date) |
| \S+ | \d\d ':' \d\d (time) |
| \S+ | 'low' \| 'medium' \| 'high' (enum) |
| .+ | '"' <( <-["]>+ )> '"' (quoted string) |
The more specific the token, the earlier bad LLM output is caught.
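To illustrate, here is a small comparison under assumed names: `Loose` and `Strict` are hypothetical grammars for a made-up `due <date> <priority>` line. The loose version accepts malformed output, while the strict version rejects it at parse time.

```raku
# Hypothetical grammars for a made-up "due <date> <priority>" line.
grammar Loose {
    token TOP      { 'due ' <date> ' ' <priority> }
    token date     { \S+ }
    token priority { \S+ }
}

grammar Strict {
    token TOP      { 'due ' <date> ' ' <priority> }
    token date     { \d\d\d\d '-' \d\d '-' \d\d }
    token priority { 'low' | 'medium' | 'high' }
}

my $bad = 'due tomorrow urgent';
say so Loose.parse($bad);                      # True: the bad output slips through
say so Strict.parse($bad);                     # False: caught by the grammar
say so Strict.parse('due 2025-03-14 high');    # True
```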
Always verify that the grammar parses your canonical DSL before adding the LLM step. Use the Slangify Playground or a quick test run such as prove6 t/01-basic.rakutest. If the grammar itself is broken, no amount of prompt tuning will help.
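# Grammar and Actions stand in for your own grammar and actions classes.
my $m = Grammar.parse($canonical, :actions(Actions.new));
die "Parse failed on: $canonical" unless $m; # parse returns Nil on failure, so always check!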
A silent Nil match means the LLM deviated from the DSL. Log the canonical string so you can see exactly what the LLM produced and tighten the prompt or grammar accordingly.
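One possible way to make the failure visible is sketched below. `Command` and `parse-or-log` are hypothetical names, not part of any library. The helper writes the offending string to STDERR with `note`, using `.raku` so that stray whitespace or quotes show up in the log.

```raku
# Hypothetical grammar and helper, for illustration only.
grammar Command {
    token TOP { 'priority ' [ 'low' | 'medium' | 'high' ] }
}

sub parse-or-log(Str $canonical) {
    my $m = Command.parse($canonical);
    # .raku renders the string with quotes and escapes, exposing hidden characters.
    note "Parse failed on: {$canonical.raku}" without $m;
    return $m;
}

parse-or-log('priority high');      # parses; nothing is logged
parse-or-log('Priority: high ');    # fails; the raw string goes to STDERR
```

Logging instead of dying suits batch runs, where you want to collect every deviation before tightening the prompt or the grammar.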