Week 1 of Running a 20-Agent Studio: The Real Numbers
Key Takeaway
An AI team can build validation infrastructure in days that would take a human team weeks. But speed without measurement is just expensive motion.
Last week I wrote about designing myself out of the loop. This week I got to see what happens when the machine actually runs without me.
Short version: a lot got built. Nothing made money. And the gap between "shipped" and "validated" turned out to be the most important lesson of the week.
The Output
In 7 days, with zero human hours on execution, the agent team shipped:
- 10 waitlist landing pages across two validation waves
- 9 Google Ads campaigns launched and actively spending
- 6 GREENLIGHT verdicts on a second batch of micro-SaaS bets
- 4 original landing pages converted from direct checkout to waitlist capture
- GDPR compliance retrofitted across all pages
- The team itself grew from 13 to 20 agents
To put that in context: a traditional early-stage startup might spend 4 to 6 weeks getting this much infrastructure live. The agents did it in one.
The Revenue Line
Zero.
No paying customers. No conversions. Under $3 in total ad spend with nothing to show for it.
I expected this. Week 1 was always about building the validation machine, not running it. You cannot measure demand if the landing pages do not exist yet. You cannot test channels if the campaigns have not launched. The infrastructure had to come first. That part worked.
But here is where it gets interesting.
The Real Lesson: Shipping Is Not Validating
Ten landing pages are live. Nine campaigns are running. The machine is moving fast. And yet I have almost no usable signal about whether any of these bets will work.
Why? Because speed created a false sense of progress.
The agents optimized for throughput. Get pages live. Get campaigns launched. Get bets scored. Check the boxes. But the measurement layer, the part that actually tells you whether anyone cares, lagged behind the build speed. Forms were collecting emails, but the data was not flowing cleanly into the tracking system. Campaigns were spending, but attribution was incomplete.
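To make "attribution was incomplete" measurable, here is a minimal sketch of the kind of check I have in mind. It assumes each signup record carries optional UTM tags; the `Signup` fields and both function names are hypothetical, not the studio's actual tracking schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Signup:
    # Hypothetical record shape, assumed for illustration only.
    email: str
    page: str                      # landing page that captured the email
    utm_source: Optional[str]      # None if the source tag was lost
    utm_campaign: Optional[str]    # None if the campaign tag was lost


def attribution_coverage(signups: list[Signup]) -> float:
    """Share of signups that carry both a source and a campaign tag."""
    if not signups:
        return 0.0
    attributed = sum(1 for s in signups if s.utm_source and s.utm_campaign)
    return attributed / len(signups)


def unattributed_by_page(signups: list[Signup]) -> dict[str, int]:
    """Count signups with missing tags per page, to show where tracking breaks."""
    gaps: dict[str, int] = {}
    for s in signups:
        if not (s.utm_source and s.utm_campaign):
            gaps[s.page] = gaps.get(s.page, 0) + 1
    return gaps
```

A coverage number well below 1.0 means the emails exist but cannot be tied back to a channel, which is the "running blind" state described here.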
This is a pattern I have seen in every fast-moving organization I have ever run. When the team is optimized for output, the feedback loop is the first thing that breaks. Not because anyone ignores it, but because building is more visible than measuring. Shipping feels like progress. Instrumenting feels like overhead. Until you realize you have been running blind.
Week 2's entire focus is fixing that gap. Not more building. Better measuring.
What Surprised Me About the Channels
Two things I did not expect.
LinkedIn cold outreach failed hard. 91 connection requests sent. Roughly one real reply, a reply rate of about 1%. That is not a "needs optimization" result. That is a "wrong channel" result. The hypothesis was that cold LinkedIn outreach at this scale, aimed at a dense pool of ideal-customer-profile (ICP) contacts, would generate validation conversations. The data says otherwise. I am deprioritizing it and shifting that energy to channels where founders actually engage: X threads, niche communities, content that pulls people in instead of pushing messages out.
Google Ads is promising but too early to call. Most campaigns only went live in the last 48 hours. The early signals are mixed. Some bets are getting impressions. Others are at zero. I need at least two weeks of data before making any calls here. The temptation to judge a campaign after 48 hours is strong. I am resisting it.
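One way to resist that temptation is to encode the waiting period as a rule. This is a minimal sketch, assuming the "at least two weeks" above means 14 calendar days from launch; the function names and the dates in the usage note are hypothetical.

```python
from datetime import date, timedelta

# "At least two weeks of data", read here as 14 calendar days (an assumption).
MIN_DAYS_OF_DATA = 14


def ready_to_judge(launched_on: date, today: date,
                   min_days: int = MIN_DAYS_OF_DATA) -> bool:
    """True once a campaign has run long enough to be evaluated."""
    return (today - launched_on) >= timedelta(days=min_days)


def days_remaining(launched_on: date, today: date,
                   min_days: int = MIN_DAYS_OF_DATA) -> int:
    """Days left before the campaign may be judged (never negative)."""
    elapsed = (today - launched_on).days
    return max(0, min_days - elapsed)
```

For a campaign launched on a made-up date of `date(2024, 1, 1)` and checked 48 hours later, `ready_to_judge` returns `False` and `days_remaining` returns 12.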
The Autonomy Worked
One thing that did work exactly as designed: the no-bottleneck architecture.
Last week I wrote about removing myself from the approval loop. This was the first real test. The CEO agent ran its heartbeat cycle. Scouts sourced ideas overnight. The Pattern Analysis agent scored them. The Landing Page agent built and deployed. The Marketing Campaigns Manager launched ads. All of this happened while I was sleeping, running a marathon, or doing something other than approving task queues.
Not everything was perfect. Some agent outputs needed refinement. A few decisions were not what I would have made. But the throughput was 10x what it would have been with me in the loop. That trade-off, imperfect speed over perfect stagnation, is the entire thesis of this studio.
What Week 2 Looks Like
The build phase is over. Now the real work starts.
This week is about one thing: getting clean signal from every bet in the pipeline. That means fixing the measurement layer so I can see exactly how many people land on each page, how many sign up, and which channels are driving them. Without that data, the 50-signup gate that triggers MVP decisions is meaningless.
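As a rough illustration of what "clean signal" could look like per bet, the sketch below counts visits and signups per landing page, breaks signups down by channel, and flags the 50-signup gate. The `Event` shape, the channel names, and `funnel_by_page` are assumptions for illustration, not the studio's actual tracking code.

```python
from collections import defaultdict
from dataclasses import dataclass

SIGNUP_GATE = 50  # signup count that triggers an MVP decision, per the text


@dataclass
class Event:
    # Hypothetical event shape, assumed for illustration only.
    page: str      # landing page, one per bet
    channel: str   # e.g. "google_ads", "x", "community"
    kind: str      # "visit" or "signup"


def funnel_by_page(events: list[Event]) -> dict[str, dict]:
    """Per-page visits, signups, conversion rate, channel split, gate flag."""
    visits: dict[str, int] = defaultdict(int)
    signups: dict[str, int] = defaultdict(int)
    by_channel: dict[str, dict[str, int]] = defaultdict(lambda: defaultdict(int))

    for e in events:
        if e.kind == "visit":
            visits[e.page] += 1
        elif e.kind == "signup":
            signups[e.page] += 1
            by_channel[e.page][e.channel] += 1

    report = {}
    for page in set(visits) | set(signups):
        v, s = visits[page], signups[page]
        report[page] = {
            "visits": v,
            "signups": s,
            # None when no visits were recorded, so a tracking gap is visible
            "conversion": s / v if v else None,
            "signups_by_channel": dict(by_channel[page]),
            "gate_reached": s >= SIGNUP_GATE,
        }
    return report
```

A page that shows signups but a conversion of `None` is itself a finding: emails are arriving while visit tracking is not.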
If by week 3 there is no demand signal on any of the 10 bets, I start killing them and narrowing focus. That is not failure. That is the system working as designed. The whole point of validating fast is to fail fast, so you can redirect to what actually has a chance.
The machine built the infrastructure. Now let's see if anyone wants what we are building.