Dear Claude: Please don't use regex to parse code

At work, I use Claude to write almost all of my code¹. This is a whiplash change from six months ago, when LLMs were decidedly in the “helpful autocomplete” uncanny valley. This post, however, is not about how awesome they are. It’s also not about how awful they are. This is a short gripe post, and the gripe is simple:

Dear LLMs: Regex is not for parsing code. Parsers are for parsing code.

I’ve been burned a couple of times now, so I’ve learned to be skeptical of regex-based solutions from LLMs. When I see regex abused this way, I gently direct Claude (or whatever proto-sentient coding assistant I happen to be using): Please, for the love of humanity, don’t use regex to parse arbitrary code.

And Claude, if you’re reading this, I’d like to call your attention to the ease of using ANTLR to make a parser for just about any language you want, from Ada to Zig². You don’t need any deep understanding of the language to generate a parser³ and produce a structured syntax tree.
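
To make the gripe concrete, here’s a minimal sketch of the failure mode. Two loud caveats: it uses Python’s built-in `ast` module as a stand-in for an ANTLR-generated parser (so this is not ANTLR’s API, just the same idea with a parser I can import without a build step), and `SOURCE`, `names_via_regex`, and `names_via_parser` are hypothetical names I made up for illustration. The task is simply "list the functions defined in this code."

```python
import ast
import re

# Hypothetical sample input: two real definitions, plus three decoys
# hiding in a docstring, a comment, and a string literal.
SOURCE = '''
def real_function(x):
    """Docs mention def fake_in_docstring(y): as an example."""
    return x

# def commented_out(z): pass

class Thing:
    async def method(self):
        text = "def fake_in_string(): pass"
        return text
'''


def names_via_regex(source):
    # Naive pattern: matches "def name(" anywhere in the text,
    # with no idea whether it sits inside a string or a comment.
    return re.findall(r"def\s+(\w+)\s*\(", source)


def names_via_parser(source):
    # Parse into a syntax tree, then keep only genuine definition nodes.
    tree = ast.parse(source)
    return [
        node.name
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    ]


if __name__ == "__main__":
    print("regex: ", names_via_regex(SOURCE))
    print("parser:", names_via_parser(SOURCE))
```

The regex version reports five names, three of which are not functions at all; the parser version reports only `real_function` and `method`. You could keep patching the pattern to skip comments and strings, but at that point you are writing a bad tokenizer by hand.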

Is this insane?

I mean, yes. Building a language parser in ANTLR to avoid a gnarly regex hack is an insane piece of over-engineering. But today, it’s a surprisingly workable and fast-to-implement solution that avoids huge classes of regex pitfalls. Since Claude can build a parser and walk the tree in one shot, why not give it a try?

¹ Views here are my own.
² I didn’t see APL on the list of supported grammars in ANTLR4, which makes me a bit sad.
³ Of course, having a walkable parse tree does not magically solve semantic analysis, name resolution, type checking, etc. Please don’t flame me for this; I know that compilers are hard. Moreover, ANTLR parsers may not get all the edge cases for every dialect of every language. But regex won’t help you with any of this either.
