We build a tool that finds waste. Then we found a pile of it in our own agent.

We tell people the same thing every day: the waste isn’t where you’re looking. It’s hiding in the places nobody thinks to check.

Then we caught ourselves making the same mistake.

Sherpa runs on Amazon Bedrock, with the Strands Agents framework orchestrating our agent’s tool calls. We stream answers back to you token by token. And we found that from the second reasoning cycle onward, Strands was re-emitting the entire prior conversation inside every streamed chunk: every earlier message, every tool call, again and again. Each chunk got bigger as the conversation got longer, so the total payload grew quadratically with the number of steps. One heavy, multi-step session was pushing roughly 22 MB down the stream, data the screen never even rendered.

Twenty-two megabytes of pure dead weight. On something we’d already shipped. On something we were proud of.
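
To see why an echo like that adds up so fast, here is a toy model in Python. It is a sketch, not our production code: the chunk shape (a `delta` field plus a `messages` echo), the 100-character messages, and the helper names are all made up for illustration, and none of it is the actual Strands event format.

```python
import json


def echoing_stream(steps):
    """Yield one chunk per step; each chunk repeats every earlier message.

    Hypothetical chunk shape, used only for this illustration.
    """
    history = []
    for i in range(steps):
        history.append({"step": i, "text": "x" * 100})
        # The new content is in "delta"; "messages" is the redundant echo.
        yield {"delta": history[-1], "messages": list(history)}


def strip_echo(chunks):
    """Forward only the new part of each chunk and drop the echo."""
    for chunk in chunks:
        yield {"delta": chunk["delta"]}


def total_bytes(chunks):
    """Total serialized size of a stream, measured as JSON text length."""
    return sum(len(json.dumps(chunk)) for chunk in chunks)


if __name__ == "__main__":
    for steps in (100, 200, 400):
        bloated = total_bytes(echoing_stream(steps))
        lean = total_bytes(strip_echo(echoing_stream(steps)))
        print(f"{steps} steps: echoed={bloated} bytes, stripped={lean} bytes")
```

In this model, doubling the number of steps roughly quadruples the echoed total, while the stripped stream only doubles. That is the gap between quadratic and linear growth.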

Here’s the uncomfortable truth about AI tooling right now: most of it is quietly bloated. It works, so nobody looks. The cost hides — in latency, in compute, in a bill that creeps up a few points a month while everyone shrugs.

That’s the exact disease Sherpa exists to cure. Finding it in our own stack only sharpened the thesis:

If a cloud tool isn’t ruthless about its own efficiency, why would you trust it with your bill?

We stripped the echo. The stream got lean — answers land in seconds, not minutes. And we kept the lesson taped to the wall: the waste is always closer than you think.