Slangify-Tutorial

10. Common Pitfalls

Overly broad schemas

Bad — a single catch-all field:

token details { .* }

Better — specific named fields:

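token customer-email { <-[\s@]>+ '@' <-[\s]>+ }
token customer-issue { '"' <( <-["]>+ )> '"' }

The local part of the email excludes @ because a token never backtracks. Written as <-[\s]>+, it would swallow the @ and the whole domain, so the match would always fail.

A catch-all token gives the grammar nothing to check. Because .* accepts any text, malformed LLM output passes unnoticed, and downstream code has to dig the fields out of the blob itself.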
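The sketch below shows the contrast in runnable form. It assumes a made-up one-line DSL of the form request <email> "<issue>", and the grammar names LooseRequest and SupportRequest are illustrative. The same off-format line passes the catch-all grammar but is rejected by the specific one:

```raku
# Catch-all version: .* accepts whatever follows the keyword.
grammar LooseRequest {
    token TOP     { 'request' \h+ <details> }
    token details { .* }
}

# Specific version: two named fields, each with its own pattern.
# The local part excludes @ because tokens do not backtrack.
grammar SupportRequest {
    token TOP            { 'request' \h+ <customer-email> \h+ <customer-issue> }
    token customer-email { <-[\s@]>+ '@' <-[\s]>+ }
    token customer-issue { '"' <( <-["]>+ )> '"' }
}

my $bad = 'request the customer wants a refund, email unknown';
say so LooseRequest.parse($bad);     # True: garbage slips through
say so SupportRequest.parse($bad);   # False: no @ where the email should be

my $m = SupportRequest.parse('request alice@example.com "Refund not received"');
say ~$m<customer-email>;   # alice@example.com
say ~$m<customer-issue>;   # Refund not received (the <( )> markers drop the quotes)
```

Because each field is its own named token, downstream code reads $m<customer-email> directly instead of re-parsing a blob.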

Too much logic in the prompt

Put structure in the grammar — not in the prompt.

Fragile:

Extract the name, party size as an integer between 1 and 20, the time in
24-hour format, the restaurant name without articles, and the date as a
single lowercase word or ISO date…

Better: write a tight grammar and tell the LLM only the DSL format and one worked example. The grammar enforces everything else.
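Here is a sketch of that split for the booking example above. It assumes a hypothetical one-line DSL, book <guest> <party-size> <HH:MM> <date>. The Booking grammar, its token names and the prompt wording are all illustrative, and the restaurant field is omitted for brevity. The prompt carries only the format and one example, while the tokens enforce the range and format rules that the fragile prompt tried to spell out:

```raku
# Hypothetical prompt: only the DSL format and one worked example.
# (Sending it to an LLM is out of scope here.)
my $prompt = q:to/END/;
Rewrite the booking request as one line in this format:
book <guest> <party-size> <HH:MM> <date>
Example: "Table for four under Alice at 7:30pm on 1 June 2024"
becomes: book Alice 4 19:30 2024-06-01
END

grammar Booking {
    token TOP        { 'book' \h+ <guest> \h+ <party-size> \h+ <time> \h+ <date> }
    token guest      { <alpha>+ }
    # Integer between 1 and 20, checked by a code assertion.
    token party-size { (\d ** 1..2) <?{ 1 <= $0 <= 20 }> }
    # 24-hour time: 00:00 to 23:59.
    token time       { <[01]> \d ':' <[0..5]> \d | '2' <[0..3]> ':' <[0..5]> \d }
    # A single lowercase word or an ISO date.
    token date       { <[a..z]>+ | \d\d\d\d '-' \d\d '-' \d\d }
}

say so Booking.parse('book Alice 4 19:30 2024-06-01');  # True
say so Booking.parse('book Alice 25 19:30 tomorrow');   # False: party size out of range
say so Booking.parse('book Alice 4 7:30pm tomorrow');   # False: not 24-hour time
```

The range check lives in a <?{ ... }> code assertion. That is ordinary Raku, so it behaves predictably every time, whereas a range stated in the prompt is only a request to the LLM.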

Ambiguous types

Prefer precise token patterns over catch-alls like \S+ and .+:

Vague	Precise
\S+	\d\d\d\d '-' \d\d '-' \d\d (ISO date)
\S+	\d\d ':' \d\d (time)
\S+	'low' | 'medium' | 'high' (enum)
.+	'"' <( <-["]>+ )> '"' (quoted string)

The more specific the token, the earlier bad LLM output is caught.
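The sketch below turns the two columns of the table into two grammars over a hypothetical line of the form <date> <time> <priority> "<summary>". The grammar names and the line format are assumptions. PreciseTicket inherits TOP from VagueTicket and overrides only the field tokens, so the two grammars differ in exactly the patterns listed above:

```raku
# Vague version: every field is a bare \S+ or .+ token.
grammar VagueTicket {
    token TOP      { <date> \h+ <time> \h+ <priority> \h+ <summary> }
    token date     { \S+ }
    token time     { \S+ }
    token priority { \S+ }
    token summary  { .+ }
}

# Precise version: inherits TOP, overrides each field token.
grammar PreciseTicket is VagueTicket {
    token date     { \d\d\d\d '-' \d\d '-' \d\d }
    token time     { \d\d ':' \d\d }
    token priority { 'low' | 'medium' | 'high' }
    token summary  { '"' <( <-["]>+ )> '"' }
}

my $bad  = 'next-tuesday 3pm urgent "Printer jammed"';
my $good = '2024-06-01 09:15 high "Printer jammed"';

say so VagueTicket.parse($bad);              # True: accepted as-is
say so PreciseTicket.parse($bad);            # False: rejected at <date>
say PreciseTicket.parse($good)<summary>.Str; # Printer jammed
```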

Not testing the grammar standalone

Always verify that the grammar parses your canonical DSL before adding the LLM step, either in the Slangify Playground or with a quick prove6 t/01-basic.rakutest run. If the grammar itself is broken, no amount of prompt tuning will help.
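The file below is a sketch of what a minimal t/01-basic.rakutest might contain, using Raku's bundled Test module. The Order grammar and its DSL are hypothetical stand-ins. They are inlined so the file runs on its own, and in a real project you would load your grammar module from lib/ instead:

```raku
use Test;

# Hypothetical grammar, inlined for a self-contained example.
grammar Order {
    token TOP     { 'order' \h+ <qty> \h+ <product> }
    token qty     { \d+ }
    token product { <alpha>+ }
}

plan 3;

# Canonical DSL lines should parse; near-misses should not.
ok  Order.parse('order 2 coffee'),   'canonical DSL parses';
nok Order.parse('order two coffee'), 'spelled-out quantity is rejected';
is  Order.parse('order 2 coffee')<product>, 'coffee', 'product is captured';
```

Run it with prove6 t/01-basic.rakutest, or with plain raku t/01-basic.rakutest. Once every canonical example passes, the LLM step has a known-good target.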

Ignoring parse failures

my $m = MyGrammar.parse($canonical, :actions(MyActions.new));  # MyGrammar/MyActions: your own classes
die "Parse failed on: $canonical" unless $m;                   # always check!

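On failure, .parse returns Nil instead of throwing. Calling a method such as .made on Nil just returns Nil again, so an unchecked failure travels silently downstream. A failed parse means the LLM deviated from the DSL. Log the canonical string so you can see exactly what the LLM produced, then tighten the prompt or grammar accordingly.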
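In an LLM pipeline you may prefer to log the failure and let the caller retry or re-prompt rather than die outright. The sketch below shows that pattern. The Ticket grammar, the TicketActions class and the parse-canonical helper are all hypothetical, and the retry logic itself is left to the caller:

```raku
grammar Ticket {
    token TOP      { 'ticket' \h+ <priority> \h+ <summary> }
    token priority { 'low' | 'medium' | 'high' }
    token summary  { '"' <( <-["]>+ )> '"' }
}

class TicketActions {
    # Build a plain hash from the named captures.
    method TOP($/) { make %( priority => ~$<priority>, summary => ~$<summary> ) }
}

# Hypothetical helper: on failure, log the exact string to STDERR
# and return Nil so the caller can decide whether to re-prompt.
sub parse-canonical(Str $canonical) {
    my $m = Ticket.parse($canonical, :actions(TicketActions.new));
    unless $m {
        note "Parse failed on: {$canonical.raku}";
        return Nil;
    }
    $m.made
}

my $ok = parse-canonical('ticket high "Printer jammed"');
say $ok<summary>;                                     # Printer jammed
say parse-canonical('ticket urgent "Printer jammed"'); # logs the failure, then prints Nil
```

Using .raku in the log line quotes the string, so stray whitespace or invisible characters in the LLM output stay visible.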
