Brian Lovin

Mar 10, 2026

Give your agent a laboratory, pt. II

Two recent examples of giving Claude Code a laboratory to find the optimal path through some tedious engineering problems:


1. Extracting markdown from email newsletters

I added an email capture feature to Shiori so you can forward newsletters into your library. But it turns out emails suck to parse: junky headers and footers, tracking pixels, nasty redirect links, long forward chains, etc.

So I poked around at some libraries to help with this, but they were all pretty mid. I wondered if I could throw an LLM at the problem and pay a small tax to let computers figure it out.

Then I remembered: I am not smarter than a computer at figuring this out!

So I told Claude to build a lab for itself to find the highest quality combination of tools and processing steps to extract meaningful content from a wide range of real emails pulled from my inbox.

Claude spun up its own benchmarking system and ran through dozens of tests with different filtering mechanisms, HTML parsing libraries, tracking-pixel detection, and header/footer removal. For each combination, Claude also tested where in the pipeline an LLM step helped most, including trying different prompts and models.

I then had Claude build an evaluation system to determine if it was able to get the right content out of complex email markup.
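The post doesn't share the actual harness, but a minimal sketch of that kind of evaluation loop might look like this. Everything here is hypothetical: the candidate pipelines, the word-overlap scoring metric, and the function names are all stand-ins for whatever Claude actually built.

```python
from typing import Callable

# Hypothetical baseline pipeline -- in the real lab each candidate would wrap
# a different HTML parser, filter stack, and optional LLM cleanup step.
def strip_tags_only(raw: str) -> str:
    # Crude baseline: drop everything between angle brackets.
    out, in_tag = [], False
    for ch in raw:
        if ch == "<":
            in_tag = True
        elif ch == ">":
            in_tag = False
        elif not in_tag:
            out.append(ch)
    return "".join(out)

def score(extracted: str, expected: str) -> float:
    """Word-level Jaccard overlap between an extraction and the
    hand-labeled 'right answer' for that email."""
    a, b = set(extracted.lower().split()), set(expected.lower().split())
    return len(a & b) / len(a | b) if a | b else 1.0

def best_pipeline(pipelines: dict[str, Callable[[str], str]],
                  cases: list[tuple[str, str]]) -> tuple[str, float]:
    """Run every pipeline over every (raw_email, expected_markdown) pair
    and return the name of the one with the highest mean score."""
    results = {
        name: sum(score(fn(raw), want) for raw, want in cases) / len(cases)
        for name, fn in pipelines.items()
    }
    return max(results.items(), key=lambda kv: kv[1])
```

The useful part isn't the metric (any text-similarity score would do); it's that once a harness like this exists, the agent can add candidate pipelines and rerun the whole matrix cheaply.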

And it worked!

After 15 minutes we walked away with a very fast and simple content-extraction pipeline. (Spoiler: adding a fast/cheap LLM as the final step had more impact than anything else we tried. Throwing money at the problem works, sometimes!)


2. Saving money on audio transcription

I also added an audio upload → transcription feature on Shiori, which is pretty cool if you're the type of nerd who wants to upload podcasts into your bookmark library. Surely there are other people who do this...*glances around*

But the problem is that most speech-to-text models are very expensive and usually charge per minute of audio processed.

Charge by the minute, you say? Oh...interesting...

So what if...I could just make the audio shorter? I could try trimming silence, or try speeding up the playback. But how much to trim? And how much to speed up?

Computers to the rescue!

I asked Claude to build itself an audio lab where it could play freely with tools like ffmpeg to find how far it could reduce audio minutes without sacrificing transcription quality.

It tried trimming silence with varying degrees of intensity, it tried compressing the audio files, and it tried adjusting the playback speed. And of course it ran every possible combination of these optimizations while evaluating accuracy against a benchmark built from the original transcript.
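For the curious, both experiments map onto standard ffmpeg audio filters. This sketch just builds the commands (it doesn't run ffmpeg); the thresholds and durations are illustrative guesses, not the values the lab landed on.

```python
def speedup_cmd(src: str, dst: str, speed: float) -> list[str]:
    """Build an ffmpeg command that speeds up audio without changing pitch.
    The atempo filter historically accepts only 0.5-2.0 per instance, so
    higher speeds are expressed by chaining (e.g. 4.0 -> atempo=2,atempo=2)."""
    factors, remaining = [], speed
    while remaining > 2.0:
        factors.append(2.0)
        remaining /= 2.0
    factors.append(remaining)
    chain = ",".join(f"atempo={f:g}" for f in factors)
    return ["ffmpeg", "-i", src, "-filter:a", chain, dst]

def trim_silence_cmd(src: str, dst: str, threshold_db: int = -35) -> list[str]:
    """Build an ffmpeg command that drops silent stretches via silenceremove.
    This is the variant that hurt accuracy whenever it clipped word edges;
    the 0.5s duration and -35dB threshold are placeholder knobs."""
    filt = (f"silenceremove=stop_periods=-1:stop_duration=0.5"
            f":stop_threshold={threshold_db}dB")
    return ["ffmpeg", "-i", src, "-filter:a", filt, dst]
```

The lab's job is then to sweep these knobs (speed, silence threshold, minimum silence duration) and transcribe each variant.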

It was wild to watch, and we ended up finding that trimming silence almost never worked because the moment it clips the edge of a word, accuracy plummets. From there Claude worked to find the exact playback speed where accuracy wasn't lost.

The result: a 47% reduction in costs for all audio uploads in the system (we found that ~1.75x speedup was the sweet spot for LLM comprehension).
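The back-of-envelope math checks out: with per-minute billing, speeding audio up by a factor of s cuts billed minutes to 1/s of the original. At 1.75x that's roughly a 43% cut from the speedup alone; presumably the remaining savings toward 47% came from the other pipeline tweaks.

```python
def billed_minutes(duration_min: float, speed: float) -> float:
    """Minutes a per-minute STT provider bills after playback speedup."""
    return duration_min / speed

# A 60-minute podcast at 1.75x bills ~34.3 minutes: about a 43% cut
# from the speedup alone.
saving = 1 - billed_minutes(60, 1.75) / 60
```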


I'm continually impressed with Claude's ability to create its own benchmarking systems and rip through a series of hypotheses about how to achieve some hard-to-define optimal outcome.

I recommend trying this when you're deep into a problem and looking for an optimal path: give the agent a laboratory.