I've been working on this problem, coming from the program synthesis school of thought, over at https://promptless.ai (which you'd have no clue about just from looking at the website, because it's targeted at tech writers).

I'm quite fond of the idea of incrementally mutating agent trajectories to move some of the reasoning steps out of LLM tokens and into a program. Imagine you have a long agent transcript/trajectory and a magic wand that replaces a run of messages with "and now I'll call this script which gives me exactly the information I need," and then you check whether the rewritten trajectory is stable.
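Here's a rough sketch of that magic wand, just to make the idea concrete. Everything in it is made up for illustration: the `Message` shape, `replace_run`, and especially `replay`, which stands in for whatever you use to re-run a trajectory and report how it ended. I'm also assuming "stable" means "ends in the same outcome as before", which is only one possible definition.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Message:
    role: str     # e.g. "user", "assistant" or "tool" (hypothetical roles)
    content: str


def replace_run(trajectory: List[Message], start: int, end: int,
                script_name: str, script_output: str) -> List[Message]:
    """Swap trajectory[start:end] for one script call plus its output."""
    call = Message("assistant", f"call script: {script_name}")
    result = Message("tool", script_output)
    return trajectory[:start] + [call, result] + trajectory[end:]


def is_stable(original: List[Message], rewritten: List[Message],
              replay: Callable[[List[Message]], str]) -> bool:
    """Assumed definition: both versions replay to the same final outcome."""
    return replay(original) == replay(rewritten)
```

In practice `replay` is the hard (and expensive) part: it has to hand the rewritten prefix back to the agent and see whether it still gets where it was going.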

To give credit where it's due, this is an overly complicated restatement of what Manny Silva has been saying with docs-as-tests (https://www.docsastests.com/). Once you describe some user flow to humans (your "docs"), you can "compile" or translate part or all of those steps into deterministic test programs that perform and validate state transitions. Ideally you compile an agent trajectory all the way.
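As a toy picture of what "compiled" docs could look like: each documented step becomes an action that changes state plus a check that validates the transition. This is my own minimal sketch, not how docs-as-tests or any particular tool represents things; `Step`, `run_flow`, and the fake signup flow are all hypothetical, and a real action would drive a browser or an API instead of a dict.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

State = Dict[str, object]


@dataclass
class Step:
    description: str                   # the sentence from the docs
    action: Callable[[State], State]   # performs the state transition
    check: Callable[[State], bool]     # validates the resulting state


def run_flow(steps: List[Step], state: State) -> List[str]:
    """Run steps in order; return the first failing step's description, if any."""
    failures: List[str] = []
    for step in steps:
        state = step.action(state)
        if not step.check(state):
            failures.append(step.description)
            break  # later steps depend on this state, so stop here
    return failures


# Hypothetical flow standing in for a "Getting started" page.
signup_flow = [
    Step("Create an account",
         lambda s: {**s, "account": "demo"},
         lambda s: s.get("account") == "demo"),
    Step("Generate an API key",
         lambda s: {**s, "api_key": "key-123"} if "account" in s else s,
         lambda s: "api_key" in s),
]
```

`run_flow(signup_flow, {})` returns an empty list when the flow still works, and names the doc step that broke when it doesn't, which is exactly the thing you want to hand back to a writer or an agent.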

So: working with coding agents, you've cranked up the defect rate in exchange for speed, so let's try testing all the important flows. The first thing you try is: ok, I've got these user guides, I guess I'll have the agent follow along and try to do it. And that works! But it's a little expensive and slow.

So I go, ok I'll have the agent do it once, and if it finds a trajectory through a product that works, we can reflect on that transcript and make some helper scripts to automate some or all of those state transitions, then store these next to our docs.

And then you say, ok, if I ship a product change, can I have my coding agent update those testing scripts to save the expense and time of re-running the original follow-along? Also an obvious thing to do, and you can totally build it yourself with Claude Code in a GitHub Action. But I think there is a lot of complexity in how you go about doing this, and in what kind of incremental computation you can do to keep the LLM costs of all this under a couple hundred bucks a month for teams shipping 20 changes a day with 200 pages of docs.
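To put a number on it: if you naively re-checked every page on every change, 20 changes a day times 200 pages is 4,000 page-checks a day, or about 120,000 over a 30-day month. Taking "a couple hundred bucks" as $200, that leaves roughly $0.0017 per page-check, which is why you can't just call the LLM every time. The simplest form of incremental computation I can think of is content hashing: only re-check a page when the page itself or the product files it depends on have changed. A minimal sketch, assuming you somehow already know each page's dependencies (the `deps` map is hypothetical, and building it is a big part of the real complexity):

```python
import hashlib
from typing import Dict, List


def fingerprint(doc_text: str, dep_contents: List[str]) -> str:
    """Hash a doc page together with the contents of the files it depends on."""
    h = hashlib.sha256()
    h.update(doc_text.encode())
    for content in dep_contents:
        h.update(b"\0")  # separator so adjacent contents can't blur together
        h.update(content.encode())
    return h.hexdigest()


def stale_pages(pages: Dict[str, str], deps: Dict[str, List[str]],
                files: Dict[str, str], cache: Dict[str, str]) -> Dict[str, str]:
    """Return {page: new_fingerprint} for pages whose fingerprint changed."""
    stale = {}
    for name, text in pages.items():
        fp = fingerprint(text, [files[path] for path in deps.get(name, [])])
        if cache.get(name) != fp:
            stale[name] = fp
    return stale
```

Only the pages that come back from `stale_pages` go to the agent; once their scripts pass again, you write the new fingerprints into `cache` so the next change skips them.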

The most polished open source "compiler/translator" I've seen exploring these ideas so far is Doc Detective (https://doc-detective.com) by Manny.