<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet href="/stylesheet.xsl" type="text/xsl"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:podcast="https://podcastindex.org/namespace/1.0">
  <channel>
    <atom:link rel="self" type="application/rss+xml" href="https://feeds.transistor.fm/the-experimentation-edge" title="MP3 Audio"/>
    <atom:link rel="hub" href="https://pubsubhubbub.appspot.com/"/>
    <podcast:podping usesPodping="true"/>
    <title>The Experimentation Edge</title>
    <generator>Transistor (https://transistor.fm)</generator>
    <itunes:new-feed-url>https://feeds.transistor.fm/the-experimentation-edge</itunes:new-feed-url>
    <description>How do product teams decide what to build and what not to? The Experimentation Edge is the podcast where product, growth, and engineering leaders share how A/B testing, feature flags, and experimentation drive real business outcomes — backed by named companies and real numbers. From DoorDash's 12,000 A/B tests a year to Atlassian's experimentation-led product win to UPS's $500M experimentation team, each episode goes deep with operators running experimentation programs at scale.

Hosted by Ashley Stirrup, CMO at GrowthBook and a 25-year executive in data and experimentation. For product managers, engineers, data scientists, and growth leaders at B2B tech companies who care about experimentation culture, statistical rigor, and shipping with confidence. No marketing speak. Just operators explaining what they shipped, what moved the needle, and how experimentation reshaped their teams.

Topics: A/B testing, experimentation, growth experimentation, product experimentation, tech experimentation, feature flags, experimentation culture, statistical significance, marketplace experimentation, conversion rate optimization, experimentation at scale.</description>
    <copyright>Ashley Stirrup</copyright>
    <podcast:guid>e03ff1cc-4468-5ca2-a16f-a7a84195031d</podcast:guid>
    <podcast:locked>yes</podcast:locked>
    <language>en</language>
    <pubDate>Tue, 02 Jun 2026 10:22:44 -0600</pubDate>
    <lastBuildDate>Tue, 02 Jun 2026 10:23:13 -0600</lastBuildDate>
    <link>https://the-experimentation-edge.transistor.fm/</link>
    <image>
      <url>https://img.transistorcdn.com/swQmJWlBr0i0NiPZuyg9ULjIj7vBdCykVhI-iVV7qEc/rs:fill:0:0:1/w:1400/h:1400/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS80YTFk/MGU1MjJlODhlNjJh/MTdlZTZkN2Q1ODY5/OTdjYy5wbmc.jpg</url>
      <title>The Experimentation Edge</title>
      <link>https://the-experimentation-edge.transistor.fm/</link>
    </image>
    <itunes:category text="Business"/>
    <itunes:category text="Technology"/>
    <itunes:type>episodic</itunes:type>
    <itunes:author>GrowthBook</itunes:author>
    <itunes:image href="https://img.transistorcdn.com/swQmJWlBr0i0NiPZuyg9ULjIj7vBdCykVhI-iVV7qEc/rs:fill:0:0:1/w:1400/h:1400/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS80YTFk/MGU1MjJlODhlNjJh/MTdlZTZkN2Q1ODY5/OTdjYy5wbmc.jpg"/>
    <itunes:summary>How do product teams decide what to build and what not to? The Experimentation Edge is the podcast where product, growth, and engineering leaders share how A/B testing, feature flags, and experimentation drive real business outcomes — backed by named companies and real numbers. From DoorDash's 12,000 A/B tests a year to Atlassian's experimentation-led product win to UPS's $500M experimentation team, each episode goes deep with operators running experimentation programs at scale.

Hosted by Ashley Stirrup, CMO at GrowthBook and a 25-year executive in data and experimentation. For product managers, engineers, data scientists, and growth leaders at B2B tech companies who care about experimentation culture, statistical rigor, and shipping with confidence. No marketing speak. Just operators explaining what they shipped, what moved the needle, and how experimentation reshaped their teams.

Topics: A/B testing, experimentation, growth experimentation, product experimentation, tech experimentation, feature flags, experimentation culture, statistical significance, marketplace experimentation, conversion rate optimization, experimentation at scale.</itunes:summary>
    <itunes:subtitle>How do product teams decide what to build and what not to?</itunes:subtitle>
    <itunes:keywords>A/B testing,experimentation,product management,growth strategy,feature flags,product experimentation,data-driven decisions,conversion optimization,product leadership,GrowthBook,experimentation culture,product analytics,hypothesis testing,growth marketing,product strategy,CPO,VP product,experimentation platform,product metrics,evidence-based product</itunes:keywords>
    <itunes:owner>
      <itunes:name>Ashley Stirrup</itunes:name>
    </itunes:owner>
    <itunes:complete>No</itunes:complete>
    <itunes:explicit>No</itunes:explicit>
    <item>
      <title>The 2% close rate increase that turned Ford Credit's product teams into believers</title>
      <itunes:episode>15</itunes:episode>
      <podcast:episode>15</podcast:episode>
      <itunes:title>The 2% close rate increase that turned Ford Credit's product teams into believers</itunes:title>
      <itunes:episodeType>full</itunes:episodeType>
      <guid isPermaLink="false">e7e90d7c-e7b8-4f2e-9eb8-a54609113851</guid>
      <link>https://share.transistor.fm/s/c3607cb8</link>
      <description>
        <![CDATA[<p><strong>Summary</strong></p><p>On this edition of The Experimentation Edge, Ashley Stirrup talks with Geoffrey Bell, Experimentation Product Specialist at Ford Credit, about building an experimentation practice inside a captive auto lender. Geoffrey shares the losing test that earned his program credibility, the "experimentation piggy bank" he picked up at Microsoft, and the breakthrough of connecting online experiments to offline dealership receivables. The throughline: a program proves its worth not just by the wins it ships, but by the expensive mistakes it prevents and the revenue it can finally trace. It's for product managers, data scientists, and growth leaders who want experimentation taken seriously by the business.</p><p><strong>Chapters</strong></p><p>00:00 Intro</p><p>01:15 Geoffrey's path: Lowe's, Microsoft, Ford Credit</p><p>07:15 How Ford Credit fits with Ford Motor</p><p>10:15 The teams behind every Ford Credit page</p><p>15:15 The vehicle selector test that lost on purpose</p><p>19:15 Why feature placement beats feature ideas</p><p>21:15 Personalization and the shrinking-audience problem</p><p>25:15 Telling the story when a test loses</p><p>30:45 Connecting an online test to an offline car sale</p><p>33:55 The experimentation piggy bank</p><p><strong>Takeaways</strong></p><p>1. Losing tests often create more value than winners because they stop expensive mistakes before they ship.</p><p>2. Measure experimentation two ways: the revenue you earn from wins and the revenue you save by killing bad experiences.</p><p>3. A feature that fails early in a flow can succeed later; placement and timing often matter more than the idea itself.</p><p>4. Connecting online experiments to offline outcomes like receivables turns a small lift into a number leadership cares about.</p><p>5. 
When you struggle to land a result, lead with the story of what the customer did, then bring the numbers.</p><p><br></p><p><br><strong>Connect with the Guest<br></strong>LinkedIn: <a href="https://www.linkedin.com/in/geoffrey-bell-62a03617/">https://www.linkedin.com/in/geoffrey-bell-62a03617/</a> <br>Website: <a href="https://www.ford.com/finance/">https://www.ford.com/finance/</a></p><p><br><strong>Sponsor</strong><br>GrowthBook helps you ship features with confidence by bringing experimentation and feature flagging into one open-source platform. No more guessing whether that new checkout flow actually moved the needle, waiting weeks for data team bandwidth, or flying blind on rollouts.</p><p>GrowthBook gives you a single place to run A/B tests, manage feature flags, and analyze results against your existing data warehouse.</p><p>With powerful stats built in, it takes the complexity out of experimentation, helps you catch regressions before they hit every user, and makes it easy to test ideas that keep your product improving and your metrics moving in the right direction.</p><p><br>See a demo at <a href="https://www.growthbook.io/">https://www.growthbook.io/</a> </p>
<ul><li>(00:00) - Intro</li>
<li>(01:15) - Geoffrey's path: Lowe's, Microsoft, Ford Credit</li>
<li>(07:15) - How Ford Credit fits with Ford Motor</li>
<li>(10:15) - The teams behind every Ford Credit page</li>
<li>(15:15) - The vehicle selector test that lost on purpose</li>
<li>(19:15) - Why feature placement beats feature ideas</li>
<li>(21:15) - Personalization and the shrinking-audience problem</li>
<li>(25:15) - Telling the story when a test loses</li>
<li>(30:45) - Connecting an online test to an offline car sale</li>
<li>(33:55) - The experimentation piggy bank</li>
</ul>]]>
      </description>
      <content:encoded>
        <![CDATA[<p><strong>Summary</strong></p><p>On this edition of The Experimentation Edge, Ashley Stirrup talks with Geoffrey Bell, Experimentation Product Specialist at Ford Credit, about building an experimentation practice inside a captive auto lender. Geoffrey shares the losing test that earned his program credibility, the "experimentation piggy bank" he picked up at Microsoft, and the breakthrough of connecting online experiments to offline dealership receivables. The throughline: a program proves its worth not just by the wins it ships, but by the expensive mistakes it prevents and the revenue it can finally trace. It's for product managers, data scientists, and growth leaders who want experimentation taken seriously by the business.</p><p><strong>Chapters</strong></p><p>00:00 Intro</p><p>01:15 Geoffrey's path: Lowe's, Microsoft, Ford Credit</p><p>07:15 How Ford Credit fits with Ford Motor</p><p>10:15 The teams behind every Ford Credit page</p><p>15:15 The vehicle selector test that lost on purpose</p><p>19:15 Why feature placement beats feature ideas</p><p>21:15 Personalization and the shrinking-audience problem</p><p>25:15 Telling the story when a test loses</p><p>30:45 Connecting an online test to an offline car sale</p><p>33:55 The experimentation piggy bank</p><p><strong>Takeaways</strong></p><p>1. Losing tests often create more value than winners because they stop expensive mistakes before they ship.</p><p>2. Measure experimentation two ways: the revenue you earn from wins and the revenue you save by killing bad experiences.</p><p>3. A feature that fails early in a flow can succeed later; placement and timing often matter more than the idea itself.</p><p>4. Connecting online experiments to offline outcomes like receivables turns a small lift into a number leadership cares about.</p><p>5. 
When you struggle to land a result, lead with the story of what the customer did, then bring the numbers.</p><p><br></p><p><br><strong>Connect with the Guest<br></strong>LinkedIn: <a href="https://www.linkedin.com/in/geoffrey-bell-62a03617/">https://www.linkedin.com/in/geoffrey-bell-62a03617/</a> <br>Website: <a href="https://www.ford.com/finance/">https://www.ford.com/finance/</a></p><p><br><strong>Sponsor</strong><br>GrowthBook helps you ship features with confidence by bringing experimentation and feature flagging into one open-source platform. No more guessing whether that new checkout flow actually moved the needle, waiting weeks for data team bandwidth, or flying blind on rollouts.</p><p>GrowthBook gives you a single place to run A/B tests, manage feature flags, and analyze results against your existing data warehouse.</p><p>With powerful stats built in, it takes the complexity out of experimentation, helps you catch regressions before they hit every user, and makes it easy to test ideas that keep your product improving and your metrics moving in the right direction.</p><p><br>See a demo at <a href="https://www.growthbook.io/">https://www.growthbook.io/</a> </p>
<ul><li>(00:00) - Intro</li>
<li>(01:15) - Geoffrey's path: Lowe's, Microsoft, Ford Credit</li>
<li>(07:15) - How Ford Credit fits with Ford Motor</li>
<li>(10:15) - The teams behind every Ford Credit page</li>
<li>(15:15) - The vehicle selector test that lost on purpose</li>
<li>(19:15) - Why feature placement beats feature ideas</li>
<li>(21:15) - Personalization and the shrinking-audience problem</li>
<li>(25:15) - Telling the story when a test loses</li>
<li>(30:45) - Connecting an online test to an offline car sale</li>
<li>(33:55) - The experimentation piggy bank</li>
</ul>]]>
      </content:encoded>
      <pubDate>Tue, 02 Jun 2026 08:47:53 -0600</pubDate>
      <author>GrowthBook</author>
      <enclosure url="https://media.transistor.fm/c3607cb8/b92b83cc.mp3" length="35423913" type="audio/mpeg"/>
      <itunes:author>GrowthBook</itunes:author>
      <itunes:image href="https://img.transistorcdn.com/IhgAJES-DtJ28fiBfcLny2FonoW-YqNhZxyOyIPChyA/rs:fill:0:0:1/w:1400/h:1400/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9lZmI0/ZWI4OTNlZDJmOTBi/ODliNzJiZGZmNDIy/Njc0NS5wbmc.jpg"/>
      <itunes:duration>2211</itunes:duration>
      <itunes:summary>
        <![CDATA[<p><strong>Summary</strong></p><p>On this edition of The Experimentation Edge, Ashley Stirrup talks with Geoffrey Bell, Experimentation Product Specialist at Ford Credit, about building an experimentation practice inside a captive auto lender. Geoffrey shares the losing test that earned his program credibility, the "experimentation piggy bank" he picked up at Microsoft, and the breakthrough of connecting online experiments to offline dealership receivables. The throughline: a program proves its worth not just by the wins it ships, but by the expensive mistakes it prevents and the revenue it can finally trace. It's for product managers, data scientists, and growth leaders who want experimentation taken seriously by the business.</p><p><strong>Chapters</strong></p><p>00:00 Intro</p><p>01:15 Geoffrey's path: Lowe's, Microsoft, Ford Credit</p><p>07:15 How Ford Credit fits with Ford Motor</p><p>10:15 The teams behind every Ford Credit page</p><p>15:15 The vehicle selector test that lost on purpose</p><p>19:15 Why feature placement beats feature ideas</p><p>21:15 Personalization and the shrinking-audience problem</p><p>25:15 Telling the story when a test loses</p><p>30:45 Connecting an online test to an offline car sale</p><p>33:55 The experimentation piggy bank</p><p><strong>Takeaways</strong></p><p>1. Losing tests often create more value than winners because they stop expensive mistakes before they ship.</p><p>2. Measure experimentation two ways: the revenue you earn from wins and the revenue you save by killing bad experiences.</p><p>3. A feature that fails early in a flow can succeed later; placement and timing often matter more than the idea itself.</p><p>4. Connecting online experiments to offline outcomes like receivables turns a small lift into a number leadership cares about.</p><p>5. 
When you struggle to land a result, lead with the story of what the customer did, then bring the numbers.</p><p><br></p><p><br><strong>Connect with the Guest<br></strong>LinkedIn: <a href="https://www.linkedin.com/in/geoffrey-bell-62a03617/">https://www.linkedin.com/in/geoffrey-bell-62a03617/</a> <br>Website: <a href="https://www.ford.com/finance/">https://www.ford.com/finance/</a></p><p><br><strong>Sponsor</strong><br>GrowthBook helps you ship features with confidence by bringing experimentation and feature flagging into one open-source platform. No more guessing whether that new checkout flow actually moved the needle, waiting weeks for data team bandwidth, or flying blind on rollouts.</p><p>GrowthBook gives you a single place to run A/B tests, manage feature flags, and analyze results against your existing data warehouse.</p><p>With powerful stats built in, it takes the complexity out of experimentation, helps you catch regressions before they hit every user, and makes it easy to test ideas that keep your product improving and your metrics moving in the right direction.</p><p><br>See a demo at <a href="https://www.growthbook.io/">https://www.growthbook.io/</a> </p>
<ul><li>(00:00) - Intro</li>
<li>(01:15) - Geoffrey's path: Lowe's, Microsoft, Ford Credit</li>
<li>(07:15) - How Ford Credit fits with Ford Motor</li>
<li>(10:15) - The teams behind every Ford Credit page</li>
<li>(15:15) - The vehicle selector test that lost on purpose</li>
<li>(19:15) - Why feature placement beats feature ideas</li>
<li>(21:15) - Personalization and the shrinking-audience problem</li>
<li>(25:15) - Telling the story when a test loses</li>
<li>(30:45) - Connecting an online test to an offline car sale</li>
<li>(33:55) - The experimentation piggy bank</li>
</ul>]]>
      </itunes:summary>
      <itunes:keywords>a/b testing,experimentation,feature flags,ford credit,geoffrey bell,prequalification testing,vehicle selector test,experimentation piggy bank,offline conversion,receivables,personalization testing,sample size,close rate,adobe target,experimentation roi,how to prove experimentation roi,why ab tests fail,online to offline attribution,experimentation culture,contentsquare</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <podcast:transcript url="https://share.transistor.fm/s/c3607cb8/transcript.txt" type="text/plain"/>
      <podcast:chapters url="https://share.transistor.fm/s/c3607cb8/chapters.json" type="application/json+chapters"/>
    </item>
    <item>
      <title>Atlassian's Andrew Willingham on the Talent Product That A/B Testing Turned Around</title>
      <itunes:episode>14</itunes:episode>
      <podcast:episode>14</podcast:episode>
      <itunes:title>Atlassian's Andrew Willingham on the Talent Product That A/B Testing Turned Around</itunes:title>
      <itunes:episodeType>full</itunes:episodeType>
      <guid isPermaLink="false">17d64360-385f-46c5-bad0-c9e1d1568b2f</guid>
      <link>https://share.transistor.fm/s/69fb860f</link>
      <description>
        <![CDATA[<p>This episode of The Experimentation Edge explores how A/B testing, feature flags, and user research transformed Atlassian's talent product after it failed with its first users. Andrew Willingham — 11 years at Amazon, now Head of Legal and People Products at Atlassian — shares how product experimentation works when you can't test at scale, why your customer and your user are not the same person, and how the metrics you choose decide which experiments you can even run.</p><p><strong>Summary</strong><br>Andrew Willingham, Head of Legal and People Products at Atlassian, spent 11 years at Amazon before joining Atlassian a year ago. His path from running A/B tests on millions of Amazon shoppers to building talent management software for a few hundred thousand employees forced a fundamental shift: when you can't run tests at scale, you have to sit with your actual users and watch them fail. He shares how building a talent review product for Amazon's HR specialists completely flopped when handed to HRBPs — and why that failure taught him more than any winning experiment. 
Now at Atlassian, he's applying that same rigor to reimagining hiring processes with AI, testing everything from recruiter screens to interview sequences that the industry has run the same way for decades.</p><p><strong>Timestamps</strong><br>03:09 From marketing Amazon's mobile app to building HR software for 1.5 million associates  <br>08:19 Why a talent review product loved by IO psych experts flopped with actual HRBPs  <br>11:11 How A/B testing helps product managers escape opinion-based politics  <br>15:25 Testing copy that changes behavior: "We'll generate that status report for you"  <br>17:20 The two North Star metrics Andrew optimizes: efficiency and quality  <br>19:05 Khan Academy's metric trap: measuring cognitive engagement, not just completion  <br>21:10 Why product managers resist experimentation — and what changes when you admit you don't know  </p><p><strong>Takeaways</strong><br>- Your customer and your user may not be the same person — building for HR specialists instead of the HRBPs who actually run talent reviews resulted in a feature nobody could use.  <br>- When you can't test at scale, desk rides replace A/B tests — sitting with users and watching them struggle reveals failures faster than any dashboard.  <br>- Experimentation short-circuits political debates by removing opinion from product decisions.  <br>- Test metrics before you test features — usage time could signal engagement or just mean your product takes too long to do its job.  <br>- The experiments that fail deliver the most valuable learnings, especially when you expected a slam dunk.  
</p><p><br><strong>Connect with the guest</strong><br>Andrew Willingham on LinkedIn: <a href="https://www.linkedin.com/in/andrewwillingham/">https://www.linkedin.com/in/andrewwillingham/</a><br>Learn more about Atlassian: <a href="https://www.atlassian.com/">https://www.atlassian.com/</a></p><p><br><strong>Sponsor</strong><br>GrowthBook helps you ship features with confidence by bringing experimentation and feature flagging into one open-source platform. No more guessing whether that new checkout flow actually moved the needle, waiting weeks for data team bandwidth, or flying blind on rollouts.</p><p>GrowthBook gives you a single place to run A/B tests, manage feature flags, and analyze results against your existing data warehouse.</p><p>With powerful stats built in, it takes the complexity out of experimentation, helps you catch regressions before they hit every user, and makes it easy to test ideas that keep your product improving and your metrics moving in the right direction.</p><p>See a demo at <a href="https://www.growthbook.io/">https://www.growthbook.io/</a></p><p>Topics: A/B testing, product experimentation, feature flags, user research, talent management, qualitative research, metric design, experimentation at scale, growth experimentation.</p>
<ul><li>(03:09) - From marketing Amazon's mobile app to building HR software for 1.5 million associates </li>
<li>(08:19) - Why a talent review product loved by IO psych experts flopped with actual HRBPs </li>
<li>(11:11) - How A/B testing helps product managers escape opinion-based politics </li>
<li>(15:25) - Testing copy that changes behavior: "We'll generate that status report for you" </li>
<li>(17:20) - The two North Star metrics Andrew optimizes: efficiency and quality </li>
<li>(19:05) - Khan Academy's metric trap: measuring cognitive engagement, not just completion </li>
<li>(21:10) - Why product managers resist experimentation — and what changes when you admit you don't know </li>
</ul>]]>
      </description>
      <content:encoded>
        <![CDATA[<p>This episode of The Experimentation Edge explores how A/B testing, feature flags, and user research transformed Atlassian's talent product after it failed with its first users. Andrew Willingham — 11 years at Amazon, now Head of Legal and People Products at Atlassian — shares how product experimentation works when you can't test at scale, why your customer and your user are not the same person, and how the metrics you choose decide which experiments you can even run.</p><p><strong>Summary</strong><br>Andrew Willingham, Head of Legal and People Products at Atlassian, spent 11 years at Amazon before joining Atlassian a year ago. His path from running A/B tests on millions of Amazon shoppers to building talent management software for a few hundred thousand employees forced a fundamental shift: when you can't run tests at scale, you have to sit with your actual users and watch them fail. He shares how building a talent review product for Amazon's HR specialists completely flopped when handed to HRBPs — and why that failure taught him more than any winning experiment. 
Now at Atlassian, he's applying that same rigor to reimagining hiring processes with AI, testing everything from recruiter screens to interview sequences that the industry has run the same way for decades.</p><p><strong>Timestamps</strong><br>03:09 From marketing Amazon's mobile app to building HR software for 1.5 million associates  <br>08:19 Why a talent review product loved by IO psych experts flopped with actual HRBPs  <br>11:11 How A/B testing helps product managers escape opinion-based politics  <br>15:25 Testing copy that changes behavior: "We'll generate that status report for you"  <br>17:20 The two North Star metrics Andrew optimizes: efficiency and quality  <br>19:05 Khan Academy's metric trap: measuring cognitive engagement, not just completion  <br>21:10 Why product managers resist experimentation — and what changes when you admit you don't know  </p><p><strong>Takeaways</strong><br>- Your customer and your user may not be the same person — building for HR specialists instead of the HRBPs who actually run talent reviews resulted in a feature nobody could use.  <br>- When you can't test at scale, desk rides replace A/B tests — sitting with users and watching them struggle reveals failures faster than any dashboard.  <br>- Experimentation short-circuits political debates by removing opinion from product decisions.  <br>- Test metrics before you test features — usage time could signal engagement or just mean your product takes too long to do its job.  <br>- The experiments that fail deliver the most valuable learnings, especially when you expected a slam dunk.  
</p><p><br><strong>Connect with the guest</strong><br>Andrew Willingham on LinkedIn: <a href="https://www.linkedin.com/in/andrewwillingham/">https://www.linkedin.com/in/andrewwillingham/</a><br>Learn more about Atlassian: <a href="https://www.atlassian.com/">https://www.atlassian.com/</a></p><p><br><strong>Sponsor</strong><br>GrowthBook helps you ship features with confidence by bringing experimentation and feature flagging into one open-source platform. No more guessing whether that new checkout flow actually moved the needle, waiting weeks for data team bandwidth, or flying blind on rollouts.</p><p>GrowthBook gives you a single place to run A/B tests, manage feature flags, and analyze results against your existing data warehouse.</p><p>With powerful stats built in, it takes the complexity out of experimentation, helps you catch regressions before they hit every user, and makes it easy to test ideas that keep your product improving and your metrics moving in the right direction.</p><p>See a demo at <a href="https://www.growthbook.io/">https://www.growthbook.io/</a></p><p>Topics: A/B testing, product experimentation, feature flags, user research, talent management, qualitative research, metric design, experimentation at scale, growth experimentation.</p>
<ul><li>(03:09) - From marketing Amazon's mobile app to building HR software for 1.5 million associates </li>
<li>(08:19) - Why a talent review product loved by IO psych experts flopped with actual HRBPs </li>
<li>(11:11) - How A/B testing helps product managers escape opinion-based politics </li>
<li>(15:25) - Testing copy that changes behavior: "We'll generate that status report for you" </li>
<li>(17:20) - The two North Star metrics Andrew optimizes: efficiency and quality </li>
<li>(19:05) - Khan Academy's metric trap: measuring cognitive engagement, not just completion </li>
<li>(21:10) - Why product managers resist experimentation — and what changes when you admit you don't know </li>
</ul>]]>
      </content:encoded>
      <pubDate>Wed, 13 May 2026 06:00:00 -0600</pubDate>
      <author>GrowthBook</author>
      <enclosure url="https://media.transistor.fm/69fb860f/42caa581.mp3" length="21711071" type="audio/mpeg"/>
      <itunes:author>GrowthBook</itunes:author>
      <itunes:image href="https://img.transistorcdn.com/ulTUu5Mi7dEBlrV4U_4bRIypJCAythnyBy3_fOR3Uts/rs:fill:0:0:1/w:1400/h:1400/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS84MjM1/MTJlYTljNzhmZGQ0/NTkyMjg1MDcyMzI1/ZTk3Yi5wbmc.jpg"/>
      <itunes:duration>1354</itunes:duration>
      <itunes:summary>
        <![CDATA[<p>This episode of The Experimentation Edge explores how A/B testing, feature flags, and user research transformed Atlassian's talent product after it failed with its first users. Andrew Willingham — 11 years at Amazon, now Head of Legal and People Products at Atlassian — shares how product experimentation works when you can't test at scale, why your customer and your user are not the same person, and how the metrics you choose decide which experiments you can even run.</p><p><strong>Summary</strong><br>Andrew Willingham, Head of Legal and People Products at Atlassian, spent 11 years at Amazon before joining Atlassian a year ago. His path from running A/B tests on millions of Amazon shoppers to building talent management software for a few hundred thousand employees forced a fundamental shift: when you can't run tests at scale, you have to sit with your actual users and watch them fail. He shares how building a talent review product for Amazon's HR specialists completely flopped when handed to HRBPs — and why that failure taught him more than any winning experiment. 
Now at Atlassian, he's applying that same rigor to reimagining hiring processes with AI, testing everything from recruiter screens to interview sequences that the industry has run the same way for decades.</p><p><strong>Timestamps</strong><br>03:09 From marketing Amazon's mobile app to building HR software for 1.5 million associates  <br>08:19 Why a talent review product loved by IO psych experts flopped with actual HRBPs  <br>11:11 How A/B testing helps product managers escape opinion-based politics  <br>15:25 Testing copy that changes behavior: "We'll generate that status report for you"  <br>17:20 The two North Star metrics Andrew optimizes: efficiency and quality  <br>19:05 Khan Academy's metric trap: measuring cognitive engagement, not just completion  <br>21:10 Why product managers resist experimentation — and what changes when you admit you don't know  </p><p><strong>Takeaways</strong><br>- Your customer and your user may not be the same person — building for HR specialists instead of the HRBPs who actually run talent reviews resulted in a feature nobody could use.  <br>- When you can't test at scale, desk rides replace A/B tests — sitting with users and watching them struggle reveals failures faster than any dashboard.  <br>- Experimentation short-circuits political debates by removing opinion from product decisions.  <br>- Test metrics before you test features — usage time could signal engagement or just mean your product takes too long to do its job.  <br>- The experiments that fail deliver the most valuable learnings, especially when you expected a slam dunk.  
</p><p><br><strong>Connect with the guest</strong><br>Andrew Willingham on LinkedIn: <a href="https://www.linkedin.com/in/andrewwillingham/">https://www.linkedin.com/in/andrewwillingham/</a><br>Learn more about Atlassian: <a href="https://www.atlassian.com/">https://www.atlassian.com/</a></p><p><br><strong>Sponsor</strong><br>GrowthBook helps you ship features with confidence by bringing experimentation and feature flagging into one open-source platform. No more guessing whether that new checkout flow actually moved the needle, waiting weeks for data team bandwidth, or flying blind on rollouts.</p><p>GrowthBook gives you a single place to run A/B tests, manage feature flags, and analyze results against your existing data warehouse.</p><p>With powerful stats built in, it takes the complexity out of experimentation, helps you catch regressions before they hit every user, and makes it easy to test ideas that keep your product improving and your metrics moving in the right direction.</p><p>See a demo at <a href="https://www.growthbook.io/">https://www.growthbook.io/</a></p><p>Topics: A/B testing, product experimentation, feature flags, user research, talent management, qualitative research, metric design, experimentation at scale, growth experimentation.</p>
<ul><li>(03:09) - From marketing Amazon's mobile app to building HR software for 1.5 million associates </li>
<li>(08:19) - Why a talent review product loved by IO psych experts flopped with actual HRBPs </li>
<li>(11:11) - How A/B testing helps product managers escape opinion-based politics </li>
<li>(15:25) - Testing copy that changes behavior: "We'll generate that status report for you" </li>
<li>(17:20) - The two North Star metrics Andrew optimizes: efficiency and quality </li>
<li>(19:05) - Khan Academy's metric trap: measuring cognitive engagement, not just completion </li>
<li>(21:10) - Why product managers resist experimentation — and what changes when you admit you don't know </li>
</ul>]]>
      </itunes:summary>
      <itunes:keywords>product management,talent management,HR technology,HRIS,Amazon experimentation culture,A/B testing,behavioral metrics,user research,talent acquisition,AI in HR,hiring process optimization,quality metrics,employee experience,experimentation at scale,product strategy,qualitative research,quantitative research,HR product development,metric design,learning velocity,product experimentation,feature flags,tech experimentation,growth experimentation</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <podcast:transcript url="https://share.transistor.fm/s/69fb860f/transcript.txt" type="text/plain"/>
      <podcast:chapters url="https://share.transistor.fm/s/69fb860f/chapters.json" type="application/json+chapters"/>
    </item>
    <item>
      <title>How DoorDash's Experimentation Platform Saved Millions With One A/B Test</title>
      <itunes:episode>13</itunes:episode>
      <podcast:episode>13</podcast:episode>
      <itunes:title>How DoorDash's Experimentation Platform Saved Millions With One A/B Test</itunes:title>
      <itunes:episodeType>full</itunes:episodeType>
      <guid isPermaLink="false">5e97cd6d-9977-4d41-be92-a1ece2a8c412</guid>
      <link>https://share.transistor.fm/s/59877618</link>
      <description>
        <![CDATA[<p>This episode of The Experimentation Edge unpacks how DoorDash's experimentation platform runs 12,000+ A/B tests per year across 42 million monthly active users — and now powers merchant-led testing on menu pricing and promotions. Ilya Izrailevsky, Senior Engineering Manager leading the platform, explains how feature flags, marketplace experimentation, and CEO-level experiment reviews built a multi-million-dollar experimentation culture across consumers, dashers, and merchants.</p><p><strong>Summary</strong><br>Most companies struggle to scale experimentation beyond engineering teams. DoorDash runs over 12,000 experiments per year across 42 million monthly active users — and now they're enabling restaurant owners to run their own A/B tests on menu pricing and promotions. Ilya Izrailevsky, Senior Engineering Manager leading DoorDash's experimentation platform, shares how the company built a three-sided marketplace testing program that balances consumers, dashers, and merchants across 40+ countries. From his time scaling search at Amazon (where offline model evaluation narrowed hundreds of candidates down to 10 for live testing) to preventing DashPass churn at DoorDash, Ilya reveals what happens when experimentation scales beyond product teams — and why CEO-level experiment review emails drive cultural change faster than any training program.</p><p>One standout learning: expanding the delivery radius to 11+ miles increased grocery orders but tanked retail conversions. The lesson wasn't about distance — it was that a one-metric approach breaks in multi-dimensional marketplaces. 
DoorDash now segments experimentation by vertical, behavior pattern, and regional market, using AI agents to mine institutional knowledge from past tests and auto-generate experiment summaries that ship company-wide within hours of readout.</p><p><br><strong>Timestamps</strong><br>00:40 From building Wasabi (Intuit's open-source platform) to running ML at Amazon and Uber  <br>03:04 Why product velocity without experimentation creates feature bloat, not impact  <br>05:32 Scaling search at Amazon: billions of products, 10 visible results, 25% win rate  <br>08:22 Offline evaluation as a filter — golden data sets cut model candidates before live traffic  <br>10:23 DoorDash's three-sided marketplace: 300 million feature flag evaluations per second  <br>12:38 CEO Tony Xu reads every experiment email and replies with alternative hypotheses  <br>13:33 Democratization at scale: enabling merchants to A/B test menu pricing and promotions  <br>17:05 DashPass churn experiment uncovered value perception gap — became a full product area  <br>22:03 Why expanding delivery radius killed retail orders but boosted grocery conversions  <br>24:16 No single North Star metric — balancing consumer quality, dasher earnings, merchant mix  <br>27:29 Four-dimensional scale: democratization, global expansion, new verticals, AI agents  <br>31:03 Agentic experimentation: AI mines past tests to generate hypotheses and debug imbalance</p><p><br><strong>Takeaways</strong><br>- Win rate matters less than learnings per test — DoorDash ships company-wide experiment summaries (win or lose) that the CEO actively reads and responds to, creating cultural accountability around testing rigor.<br>- Offline evaluation acts as a pre-filter for model velocity — Amazon's search team used golden data sets to cut hundreds of ML candidates down to 10 for live A/B testing, preventing wasted experiment slots.<br>- One-size metrics break in multi-dimensional marketplaces — DoorDash balances consumer retention, dasher 
utilization, and merchant inventory mix across verticals because optimizing one side degrades the ecosystem.<br>- Democratization requires opinionated templates, not open-ended tools — enabling non-technical users to run tests means embedding success metrics and guardrails into pre-built experiment configs.<br>- AI scales institutional knowledge, not just analysis speed — mining past experiment readouts to auto-generate new hypotheses turns your testing history into a compounding advantage.</p><p><br><strong>Connect with the guest</strong><br>LinkedIn: <a href="https://www.linkedin.com/in/ilyaizrailevsky/">https://www.linkedin.com/in/ilyaizrailevsky/</a><br>Learn more about DoorDash: <a href="https://www.doordash.com/">https://www.doordash.com/</a></p><p><strong>Sponsor</strong><br>Growthbook helps you ship features with confidence by bringing experimentation and feature flagging into one open-source platform. No more guessing whether that new checkout flow actually moved the needle, waiting weeks for data team bandwidth, or flying blind on rollouts.</p><p>Growthbook gives you a single place to run A/B tests, manage feature flags, and analyze results against your existing data warehouse.</p><p>With powerful stats built in, it takes the complexity out of experimentation, helps you catch regressions before they hit every user, and makes it easy to test ideas that keep your product improving and your metrics moving in the right direction.</p><p>See a demo at <a href="https://www.growthbook.io/">https://www.growthbook.io/</a></p><p>Topics: A/B testing, experimentation platform, feature flags, marketplace experimentation, machine learning, growth experimentation, statistical significance, experimentation culture, agentic AI workflows.</p>
<ul><li>(00:40) - From building Wasabi (Intuit's open-source platform) to running ML at Amazon and Uber </li>
<li>(03:04) - Why product velocity without experimentation creates feature bloat, not impact </li>
<li>(05:32) - Scaling search at Amazon: billions of products, 10 visible results, 25% win rate </li>
<li>(08:22) - Offline evaluation as a filter — golden data sets cut model candidates before live traffic </li>
<li>(10:23) - DoorDash's three-sided marketplace: 300 million feature flag evaluations per second </li>
<li>(12:38) - CEO Tony Xu reads every experiment email and replies with alternative hypotheses </li>
<li>(13:33) - Democratization at scale: enabling merchants to A/B test menu pricing and promotions </li>
<li>(17:05) - DashPass churn experiment uncovered value perception gap — became a full product area </li>
<li>(22:03) - Why expanding delivery radius killed retail orders but boosted grocery conversions </li>
<li>(24:16) - No single North Star metric — balancing consumer quality, dasher earnings, merchant mix </li>
<li>(27:29) - Four-dimensional scale: democratization, global expansion, new verticals, AI agents </li>
<li>(31:03) - Agentic experimentation: AI mines past tests to generate hypotheses and debug imbalance</li>
</ul>]]>
      </description>
      <content:encoded>
        <![CDATA[<p>This episode of The Experimentation Edge unpacks how DoorDash's experimentation platform runs 12,000+ A/B tests per year across 42 million monthly active users — and now powers merchant-led testing on menu pricing and promotions. Ilya Izrailevsky, Senior Engineering Manager leading the platform, explains how feature flags, marketplace experimentation, and CEO-level experiment reviews built a multi-million-dollar experimentation culture across consumers, dashers, and merchants.</p><p><strong>Summary</strong><br>Most companies struggle to scale experimentation beyond engineering teams. DoorDash runs over 12,000 experiments per year across 42 million monthly active users — and now they're enabling restaurant owners to run their own A/B tests on menu pricing and promotions. Ilya Izrailevsky, Senior Engineering Manager leading DoorDash's experimentation platform, shares how the company built a three-sided marketplace testing program that balances consumers, dashers, and merchants across 40+ countries. From his time scaling search at Amazon (where offline model evaluation narrowed hundreds of candidates down to 10 for live testing) to preventing DashPass churn at DoorDash, Ilya reveals what happens when experimentation scales beyond product teams — and why CEO-level experiment review emails drive cultural change faster than any training program.</p><p>One standout learning: expanding the delivery radius to 11+ miles increased grocery orders but tanked retail conversions. The lesson wasn't about distance — it was that a one-metric approach breaks in multi-dimensional marketplaces. 
DoorDash now segments experimentation by vertical, behavior pattern, and regional market, using AI agents to mine institutional knowledge from past tests and auto-generate experiment summaries that ship company-wide within hours of readout.</p><p><br><strong>Timestamps</strong><br>00:40 From building Wasabi (Intuit's open-source platform) to running ML at Amazon and Uber  <br>03:04 Why product velocity without experimentation creates feature bloat, not impact  <br>05:32 Scaling search at Amazon: billions of products, 10 visible results, 25% win rate  <br>08:22 Offline evaluation as a filter — golden data sets cut model candidates before live traffic  <br>10:23 DoorDash's three-sided marketplace: 300 million feature flag evaluations per second  <br>12:38 CEO Tony Xu reads every experiment email and replies with alternative hypotheses  <br>13:33 Democratization at scale: enabling merchants to A/B test menu pricing and promotions  <br>17:05 DashPass churn experiment uncovered value perception gap — became a full product area  <br>22:03 Why expanding delivery radius killed retail orders but boosted grocery conversions  <br>24:16 No single North Star metric — balancing consumer quality, dasher earnings, merchant mix  <br>27:29 Four-dimensional scale: democratization, global expansion, new verticals, AI agents  <br>31:03 Agentic experimentation: AI mines past tests to generate hypotheses and debug imbalance</p><p><br><strong>Takeaways</strong><br>- Win rate matters less than learnings per test — DoorDash ships company-wide experiment summaries (win or lose) that the CEO actively reads and responds to, creating cultural accountability around testing rigor.<br>- Offline evaluation acts as a pre-filter for model velocity — Amazon's search team used golden data sets to cut hundreds of ML candidates down to 10 for live A/B testing, preventing wasted experiment slots.<br>- One-size metrics break in multi-dimensional marketplaces — DoorDash balances consumer retention, dasher 
utilization, and merchant inventory mix across verticals because optimizing one side degrades the ecosystem.<br>- Democratization requires opinionated templates, not open-ended tools — enabling non-technical users to run tests means embedding success metrics and guardrails into pre-built experiment configs.<br>- AI scales institutional knowledge, not just analysis speed — mining past experiment readouts to auto-generate new hypotheses turns your testing history into a compounding advantage.</p><p><br><strong>Connect with the guest</strong><br>LinkedIn: <a href="https://www.linkedin.com/in/ilyaizrailevsky/">https://www.linkedin.com/in/ilyaizrailevsky/</a><br>Learn more about DoorDash: <a href="https://www.doordash.com/">https://www.doordash.com/</a></p><p><strong>Sponsor</strong><br>Growthbook helps you ship features with confidence by bringing experimentation and feature flagging into one open-source platform. No more guessing whether that new checkout flow actually moved the needle, waiting weeks for data team bandwidth, or flying blind on rollouts.</p><p>Growthbook gives you a single place to run A/B tests, manage feature flags, and analyze results against your existing data warehouse.</p><p>With powerful stats built in, it takes the complexity out of experimentation, helps you catch regressions before they hit every user, and makes it easy to test ideas that keep your product improving and your metrics moving in the right direction.</p><p>See a demo at <a href="https://www.growthbook.io/">https://www.growthbook.io/</a></p><p>Topics: A/B testing, experimentation platform, feature flags, marketplace experimentation, machine learning, growth experimentation, statistical significance, experimentation culture, agentic AI workflows.</p>
<ul><li>(00:40) - From building Wasabi (Intuit's open-source platform) to running ML at Amazon and Uber </li>
<li>(03:04) - Why product velocity without experimentation creates feature bloat, not impact </li>
<li>(05:32) - Scaling search at Amazon: billions of products, 10 visible results, 25% win rate </li>
<li>(08:22) - Offline evaluation as a filter — golden data sets cut model candidates before live traffic </li>
<li>(10:23) - DoorDash's three-sided marketplace: 300 million feature flag evaluations per second </li>
<li>(12:38) - CEO Tony Xu reads every experiment email and replies with alternative hypotheses </li>
<li>(13:33) - Democratization at scale: enabling merchants to A/B test menu pricing and promotions </li>
<li>(17:05) - DashPass churn experiment uncovered value perception gap — became a full product area </li>
<li>(22:03) - Why expanding delivery radius killed retail orders but boosted grocery conversions </li>
<li>(24:16) - No single North Star metric — balancing consumer quality, dasher earnings, merchant mix </li>
<li>(27:29) - Four-dimensional scale: democratization, global expansion, new verticals, AI agents </li>
<li>(31:03) - Agentic experimentation: AI mines past tests to generate hypotheses and debug imbalance</li>
</ul>]]>
      </content:encoded>
      <pubDate>Tue, 12 May 2026 06:00:00 -0600</pubDate>
      <author>Growthbook</author>
      <enclosure url="https://media.transistor.fm/59877618/b95fea3b.mp3" length="30055189" type="audio/mpeg"/>
      <itunes:author>Growthbook</itunes:author>
      <itunes:image href="https://img.transistorcdn.com/1UaQNgXc09Z0lmewBY91yncvftGJVPKt1B8h5WRrR2I/rs:fill:0:0:1/w:1400/h:1400/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9hMTM4/OTRmOWRjOThjMzY5/ZjRjN2YwYzYxNzk0/MTFkZS5wbmc.jpg"/>
      <itunes:duration>1875</itunes:duration>
      <itunes:summary>
        <![CDATA[<p>This episode of The Experimentation Edge unpacks how DoorDash's experimentation platform runs 12,000+ A/B tests per year across 42 million monthly active users — and now powers merchant-led testing on menu pricing and promotions. Ilya Izrailevsky, Senior Engineering Manager leading the platform, explains how feature flags, marketplace experimentation, and CEO-level experiment reviews built a multi-million-dollar experimentation culture across consumers, dashers, and merchants.</p><p><strong>Summary</strong><br>Most companies struggle to scale experimentation beyond engineering teams. DoorDash runs over 12,000 experiments per year across 42 million monthly active users — and now they're enabling restaurant owners to run their own A/B tests on menu pricing and promotions. Ilya Izrailevsky, Senior Engineering Manager leading DoorDash's experimentation platform, shares how the company built a three-sided marketplace testing program that balances consumers, dashers, and merchants across 40+ countries. From his time scaling search at Amazon (where offline model evaluation narrowed hundreds of candidates down to 10 for live testing) to preventing DashPass churn at DoorDash, Ilya reveals what happens when experimentation scales beyond product teams — and why CEO-level experiment review emails drive cultural change faster than any training program.</p><p>One standout learning: expanding the delivery radius to 11+ miles increased grocery orders but tanked retail conversions. The lesson wasn't about distance — it was that a one-metric approach breaks in multi-dimensional marketplaces. 
DoorDash now segments experimentation by vertical, behavior pattern, and regional market, using AI agents to mine institutional knowledge from past tests and auto-generate experiment summaries that ship company-wide within hours of readout.</p><p><br><strong>Timestamps</strong><br>00:40 From building Wasabi (Intuit's open-source platform) to running ML at Amazon and Uber  <br>03:04 Why product velocity without experimentation creates feature bloat, not impact  <br>05:32 Scaling search at Amazon: billions of products, 10 visible results, 25% win rate  <br>08:22 Offline evaluation as a filter — golden data sets cut model candidates before live traffic  <br>10:23 DoorDash's three-sided marketplace: 300 million feature flag evaluations per second  <br>12:38 CEO Tony Xu reads every experiment email and replies with alternative hypotheses  <br>13:33 Democratization at scale: enabling merchants to A/B test menu pricing and promotions  <br>17:05 DashPass churn experiment uncovered value perception gap — became a full product area  <br>22:03 Why expanding delivery radius killed retail orders but boosted grocery conversions  <br>24:16 No single North Star metric — balancing consumer quality, dasher earnings, merchant mix  <br>27:29 Four-dimensional scale: democratization, global expansion, new verticals, AI agents  <br>31:03 Agentic experimentation: AI mines past tests to generate hypotheses and debug imbalance</p><p><br><strong>Takeaways</strong><br>- Win rate matters less than learnings per test — DoorDash ships company-wide experiment summaries (win or lose) that the CEO actively reads and responds to, creating cultural accountability around testing rigor.<br>- Offline evaluation acts as a pre-filter for model velocity — Amazon's search team used golden data sets to cut hundreds of ML candidates down to 10 for live A/B testing, preventing wasted experiment slots.<br>- One-size metrics break in multi-dimensional marketplaces — DoorDash balances consumer retention, dasher 
utilization, and merchant inventory mix across verticals because optimizing one side degrades the ecosystem.<br>- Democratization requires opinionated templates, not open-ended tools — enabling non-technical users to run tests means embedding success metrics and guardrails into pre-built experiment configs.<br>- AI scales institutional knowledge, not just analysis speed — mining past experiment readouts to auto-generate new hypotheses turns your testing history into a compounding advantage.</p><p><br><strong>Connect with the guest</strong><br>LinkedIn: <a href="https://www.linkedin.com/in/ilyaizrailevsky/">https://www.linkedin.com/in/ilyaizrailevsky/</a><br>Learn more about DoorDash: <a href="https://www.doordash.com/">https://www.doordash.com/</a></p><p><strong>Sponsor</strong><br>Growthbook helps you ship features with confidence by bringing experimentation and feature flagging into one open-source platform. No more guessing whether that new checkout flow actually moved the needle, waiting weeks for data team bandwidth, or flying blind on rollouts.</p><p>Growthbook gives you a single place to run A/B tests, manage feature flags, and analyze results against your existing data warehouse.</p><p>With powerful stats built in, it takes the complexity out of experimentation, helps you catch regressions before they hit every user, and makes it easy to test ideas that keep your product improving and your metrics moving in the right direction.</p><p>See a demo at <a href="https://www.growthbook.io/">https://www.growthbook.io/</a></p><p>Topics: A/B testing, experimentation platform, feature flags, marketplace experimentation, machine learning, growth experimentation, statistical significance, experimentation culture, agentic AI workflows.</p>
<ul><li>(00:40) - From building Wasabi (Intuit's open-source platform) to running ML at Amazon and Uber </li>
<li>(03:04) - Why product velocity without experimentation creates feature bloat, not impact </li>
<li>(05:32) - Scaling search at Amazon: billions of products, 10 visible results, 25% win rate </li>
<li>(08:22) - Offline evaluation as a filter — golden data sets cut model candidates before live traffic </li>
<li>(10:23) - DoorDash's three-sided marketplace: 300 million feature flag evaluations per second </li>
<li>(12:38) - CEO Tony Xu reads every experiment email and replies with alternative hypotheses </li>
<li>(13:33) - Democratization at scale: enabling merchants to A/B test menu pricing and promotions </li>
<li>(17:05) - DashPass churn experiment uncovered value perception gap — became a full product area </li>
<li>(22:03) - Why expanding delivery radius killed retail orders but boosted grocery conversions </li>
<li>(24:16) - No single North Star metric — balancing consumer quality, dasher earnings, merchant mix </li>
<li>(27:29) - Four-dimensional scale: democratization, global expansion, new verticals, AI agents </li>
<li>(31:03) - Agentic experimentation: AI mines past tests to generate hypotheses and debug imbalance</li>
</ul>]]>
      </itunes:summary>
      <itunes:keywords>a/b testing,experimentation,machine learning optimization,search ranking,multi-dimensional optimization,win rate,guardrail metrics,offline experimentation,doordash experimentation,experimentation program,democratization of experimentation,experiment lifecycle,ai-powered experimentation,marketplace experimentation,consumer behavior testing,dashpass retention,experimentation culture,ceo experiment reviews,experiment templates,agentic ai workflows,tech experimentation,growth experimentation,product experimentation,feature flags</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <podcast:transcript url="https://share.transistor.fm/s/59877618/transcript.txt" type="text/plain"/>
      <podcast:chapters url="https://share.transistor.fm/s/59877618/chapters.json" type="application/json+chapters"/>
    </item>
    <item>
      <title>How UPS's Experimentation Team Generated Half a Billion From 80+ Apps With A/B Testing</title>
      <itunes:episode>12</itunes:episode>
      <podcast:episode>12</podcast:episode>
      <itunes:title>How UPS's Experimentation Team Generated Half a Billion From 80+ Apps With A/B Testing</itunes:title>
      <itunes:episodeType>full</itunes:episodeType>
      <guid isPermaLink="false">37d8b34d-0487-430a-8acb-1bf1d854b497</guid>
      <link>https://share.transistor.fm/s/7b0fed20</link>
      <description>
        <![CDATA[<p>This episode of The Experimentation Edge shows how UPS's A/B testing program drove $500M+ in incremental revenue across 80+ customer-facing applications. Dave Massey — head of the J.E.D.I. team (Journey Experience and Design Innovation) — walks through the first test that proved UX could move revenue, how he defended counterintuitive results to skeptical execs, and how a small experimentation team can override opinion with data at enterprise scale.</p><p><strong>Summary</strong><br>Dave Massey walked into UPS in 2016 and immediately got pulled into a meeting about A/B testing tools. By the end of the day, he owned the platform—and the problem: UPS hadn't run a single meaningful experiment. Three years later, senior leadership gave him a hard number to hit: prove UX could move revenue, or the pilot dies. His first test—removing navigation from the checkout flow—delivered $35 million in incremental revenue. Senior leaders didn't believe it. They made him defend the results upside down and sideways. When the dust settled, the data held. Today, Massey's team has driven over half a billion dollars in incremental revenue by treating UPS.com like the e-commerce business it actually is.</p><p>Massey's approach is simple: test everything, especially what senior leaders think will work. His team, Journey Experience and Design Innovation (nicknamed J.E.D.I.), has built a reputation for saying no with data, not opinion. When a business unit demanded required recipient emails to capture customer data, J.E.D.I. ran the test in 24 hours and killed it. Conversion tanked. Two years later, the international team asked for the same feature—but framed it as a customs solution. That test passed. Same feature, different reason, different outcome. 
That's the edge Massey's team delivers: rigorous hypothesis design, a UX research team embedded in the experimentation workflow, and zero tolerance for untested ideas.</p><p><br><strong>Timestamps</strong><br>03:09 Dave's first day at UPS: inheriting an A/B testing tool with no program  <br>05:59 Senior leadership's ultimatum: prove UX ROI or kill the pilot  <br>08:38 First test result: $35M from removing navigation in checkout  <br>09:48 Defending the numbers: how Massey's team survived scrutiny  <br>11:07 Why a data-driven engineering culture made experimentation inevitable  <br>16:12 Team size: 80 people supporting almost 80 customer-facing applications  <br>19:08 The 24-hour test: when required email fields killed conversion  <br>22:28 Why Massey embeds UX research inside the experimentation team  <br>24:41 AI at UPS: treating it as a tool, not a replacement  </p><p><br><strong>Takeaways</strong><br>- Massey's first test removed navigation from UPS's shipping checkout flow and delivered $35 million in incremental revenue—proving e-commerce best practices apply even when customers think "this is just a tool, not e-commerce."  <br>- J.E.D.I.'s win rate stays high because UX research and experimentation teams operate under the same leader, giving the program both behavioral metrics and voice-of-customer insight before tests ever launch.  <br>- When senior leaders push ideas, Massey's team tests them instead of arguing—then delivers results that either validate the idea or identify three better alternatives the data actually supports.  <br>- The same feature (required recipient email) failed for customer data capture but passed for international customs—proof that framing and customer benefit matter more than the feature itself.  <br>- UPS runs everything centrally now, but the real win is that demand for testing has decentralized—business units across the company now come to J.E.D.I. 
asking to test their ideas.</p><p><br><strong>Connect with the guest</strong><br>LinkedIn: <a href="https://www.linkedin.com/in/masseycreates/">https://www.linkedin.com/in/masseycreates/</a><br>Learn more about UPS: <a href="https://www.ups.com">https://www.ups.com</a></p><p><br><strong>Sponsor</strong><br>Growthbook helps you ship features with confidence by bringing experimentation and feature flagging into one open-source platform. No more guessing whether that new checkout flow actually moved the needle, waiting weeks for data team bandwidth, or flying blind on rollouts.</p><p>Growthbook gives you a single place to run A/B tests, manage feature flags, and analyze results against your existing data warehouse.</p><p>With powerful stats built in, it takes the complexity out of experimentation, helps you catch regressions before they hit every user, and makes it easy to test ideas that keep your product improving and your metrics moving in the right direction.</p><p>See a demo at <a href="https://www.growthbook.io/">https://www.growthbook.io/</a></p><p>Topics: A/B testing, experimentation, conversion rate optimization, feature flags, UX research, e-commerce experimentation, statistical significance, experimentation team building, growth experimentation, sequential testing.</p>
<ul><li>(03:09) - Dave's first day at UPS: inheriting an A/B testing tool with no program </li>
<li>(05:59) - Senior leadership's ultimatum: prove UX ROI or kill the pilot </li>
<li>(08:38) - First test result: $35M from removing navigation in checkout </li>
<li>(09:48) - Defending the numbers: how Massey's team survived scrutiny </li>
<li>(11:07) - Why a data-driven engineering culture made experimentation inevitable </li>
<li>(16:12) - Team size: 80 people supporting almost 80 customer-facing applications </li>
<li>(19:08) - The 24-hour test: when required email fields killed conversion </li>
<li>(22:28) - Why Massey embeds UX research inside the experimentation team </li>
<li>(24:41) - AI at UPS: treating it as a tool, not a replacement </li>
</ul>]]>
      </description>
      <content:encoded>
        <![CDATA[<p>This episode of The Experimentation Edge shows how UPS's A/B testing program drove $500M+ in incremental revenue across 80+ customer-facing applications. Dave Massey — head of the J.E.D.I. team (Journey Experience and Design Innovation) — walks through the first test that proved UX could move revenue, how he defended counterintuitive results to skeptical execs, and how a small experimentation team can override opinion with data at enterprise scale.</p><p><strong>Summary</strong><br>Dave Massey walked into UPS in 2016 and immediately got pulled into a meeting about A/B testing tools. By the end of the day, he owned the platform—and the problem: UPS hadn't run a single meaningful experiment. Three years later, senior leadership gave him a hard number to hit: prove UX could move revenue, or the pilot dies. His first test—removing navigation from the checkout flow—delivered $35 million in incremental revenue. Senior leaders didn't believe it. They made him defend the results upside down and sideways. When the dust settled, the data held. Today, Massey's team has driven over half a billion dollars in incremental revenue by treating UPS.com like the e-commerce business it actually is.</p><p>Massey's approach is simple: test everything, especially what senior leaders think will work. His team, Journey Experience and Design Innovation (nicknamed J.E.D.I.), has built a reputation for saying no with data, not opinion. When a business unit demanded required recipient emails to capture customer data, J.E.D.I. ran the test in 24 hours and killed it. Conversion tanked. Two years later, the international team asked for the same feature—but framed it as a customs solution. That test passed. Same feature, different reason, different outcome. 
That's the edge Massey's team delivers: rigorous hypothesis design, a UX research team embedded in the experimentation workflow, and zero tolerance for untested ideas.</p><p><br><strong>Timestamps</strong><br>03:09 Dave's first day at UPS: inheriting an A/B testing tool with no program  <br>05:59 Senior leadership's ultimatum: prove UX ROI or kill the pilot  <br>08:38 First test result: $35M from removing navigation in checkout  <br>09:48 Defending the numbers: how Massey's team survived scrutiny  <br>11:07 Why a data-driven engineering culture made experimentation inevitable  <br>16:12 Team size: 80 people supporting almost 80 customer-facing applications  <br>19:08 The 24-hour test: when required email fields killed conversion  <br>22:28 Why Massey embeds UX research inside the experimentation team  <br>24:41 AI at UPS: treating it as a tool, not a replacement  </p><p><br><strong>Takeaways</strong><br>- Massey's first test removed navigation from UPS's shipping checkout flow and delivered $35 million in incremental revenue—proving e-commerce best practices apply even when customers think "this is just a tool, not e-commerce."  <br>- J.E.D.I.'s win rate stays high because UX research and experimentation teams operate under the same leader, giving the program both behavioral metrics and voice-of-customer insight before tests ever launch.  <br>- When senior leaders push ideas, Massey's team tests them instead of arguing—then delivers results that either validate the idea or identify three better alternatives the data actually supports.  <br>- The same feature (required recipient email) failed for customer data capture but passed for international customs—proof that framing and customer benefit matter more than the feature itself.  <br>- UPS runs everything centrally now, but the real win is that demand for testing has decentralized—business units across the company now come to J.E.D.I. 
asking to test their ideas.</p><p><br><strong>Connect with the guest</strong><br>LinkedIn: <a href="https://www.linkedin.com/in/masseycreates/">https://www.linkedin.com/in/masseycreates/</a><br>Learn more about UPS: <a href="https://www.ups.com">https://www.ups.com</a></p><p><br><strong>Sponsor</strong><br>GrowthBook helps you ship features with confidence by bringing experimentation and feature flagging into one open-source platform. No more guessing whether that new checkout flow actually moved the needle, waiting weeks for data team bandwidth, or flying blind on rollouts.</p><p>GrowthBook gives you a single place to run A/B tests, manage feature flags, and analyze results against your existing data warehouse.</p><p>With powerful stats built in, it takes the complexity out of experimentation, helps you catch regressions before they hit every user, and makes it easy to test ideas that keep your product improving and your metrics moving in the right direction.</p><p>See a demo at <a href="https://www.growthbook.io/">https://www.growthbook.io/</a></p><p>Topics: A/B testing, experimentation, conversion rate optimization, feature flags, UX research, e-commerce experimentation, statistical significance, experimentation team building, growth experimentation, sequential testing.</p>
<ul><li>(03:09) - Dave's first day at UPS: inheriting an AB testing tool with no program </li>
<li>(05:59) - Senior leadership's ultimatum: prove UX ROI or kill the pilot </li>
<li>(08:38) - First test result: $35M from removing navigation in checkout </li>
<li>(09:48) - Defending the numbers: how Massey's team survived scrutiny </li>
<li>(11:07) - Why a data-driven engineering culture made experimentation inevitable </li>
<li>(16:12) - Team size: 80 people supporting almost 80 customer-facing applications </li>
<li>(19:08) - The 24-hour test: when required email fields killed conversion </li>
<li>(22:28) - Why Massey embeds UX research inside the experimentation team </li>
<li>(24:41) - AI at UPS: treating it as a tool, not a replacement </li>
</ul>]]>
      </content:encoded>
      <pubDate>Mon, 11 May 2026 06:00:00 -0600</pubDate>
      <author>GrowthBook</author>
      <enclosure url="https://media.transistor.fm/7b0fed20/74587a02.mp3" length="22283046" type="audio/mpeg"/>
      <itunes:author>GrowthBook</itunes:author>
      <itunes:image href="https://img.transistorcdn.com/xemB3fCyovr4nhTHoLEGRLZ11ibzgkeYl5DHp_5Gvrg/rs:fill:0:0:1/w:1400/h:1400/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS80NmUz/YmY3MDI0OGJiZGIy/NzRiNDI5MjZkOWQ5/NzY2MS5wbmc.jpg"/>
      <itunes:duration>1390</itunes:duration>
      <itunes:summary>
        <![CDATA[<p>This episode of The Experimentation Edge shows how UPS's A/B testing program drove $500M+ in incremental revenue across 80+ customer-facing applications. Dave Massey — head of the J.E.D.I. team (Journey Experience and Design Innovation) — walks through the first test that proved UX could move revenue, how he defended counterintuitive results to skeptical execs, and how a small experimentation team can override opinion with data at enterprise scale.</p><p><strong>Summary</strong><br>Dave Massey walked into UPS in 2016 and immediately got pulled into a meeting about AB testing tools. By the end of the day, he owned the platform—and the problem: UPS hadn't run a single meaningful experiment. Three years later, senior leadership gave him a hard number to hit. Prove UX could move revenue, or the pilot dies. His first test—removing navigation from the checkout flow—delivered $35 million in incremental revenue. Senior leaders didn't believe it. They made him defend the results upside down and sideways. When the dust settled, the data held. Today, Massey's team has driven over half a billion dollars in incremental revenue by treating UPS.com like the e-commerce business it actually is.</p><p>Massey's approach is simple: test everything, especially what senior leaders think will work. His team, Journey Experience and Design Innovation (nicknamed J.E.D.I.), has built a reputation for saying no with data, not opinion. When a business unit demanded required recipient emails to capture customer data, J.E.D.I. ran the test in 24 hours and killed it. Conversion tanked. Two years later, the international team asked for the same feature—but framed it as a customs solution. That test passed. Same feature, different reason, different outcome. 
That's the edge Massey's team delivers: rigorous hypothesis design, a UX research team embedded in the experimentation workflow, and zero tolerance for untested ideas.</p><p><br><strong>Timestamps<br></strong>03:09 Dave's first day at UPS: inheriting an AB testing tool with no program  <br>05:59 Senior leadership's ultimatum: prove UX ROI or kill the pilot  <br>08:38 First test result: $35M from removing navigation in checkout  <br>09:48 Defending the numbers: how Massey's team survived scrutiny  <br>11:07 Why a data-driven engineering culture made experimentation inevitable  <br>16:12 Team size: 80 people supporting almost 80 customer-facing applications  <br>19:08 The 24-hour test: when required email fields killed conversion  <br>22:28 Why Massey embeds UX research inside the experimentation team  <br>24:41 AI at UPS: treating it as a tool, not a replacement  </p><p><br><strong>Takeaways<br></strong>- Massey's first test removed navigation from UPS's shipping checkout flow and delivered $35 million in incremental revenue—proving e-commerce best practices apply even when customers think "this is just a tool, not e-commerce."  <br>- J.E.D.I.'s win rate stays high because UX research and experimentation teams operate under the same leader, giving the program both behavioral metrics and voice-of-customer insight before tests ever launch.  <br>- When senior leaders push ideas, Massey's team tests them instead of arguing—then delivers results that either validate the idea or identify three better alternatives the data actually supports.  <br>- The same feature (required recipient email) failed for customer data capture but passed for international customs—proof that framing and customer benefit matter more than the feature itself.  <br>- UPS runs everything centrally now, but the real win is that demand for testing has decentralized—business units across the company now come to J.E.D.I. 
asking to test their ideas.</p><p><br><strong>Connect with the guest</strong><br>LinkedIn: <a href="https://www.linkedin.com/in/masseycreates/">https://www.linkedin.com/in/masseycreates/</a><br>Learn more about UPS: <a href="https://www.ups.com">https://www.ups.com</a></p><p><br><strong>Sponsor</strong><br>GrowthBook helps you ship features with confidence by bringing experimentation and feature flagging into one open-source platform. No more guessing whether that new checkout flow actually moved the needle, waiting weeks for data team bandwidth, or flying blind on rollouts.</p><p>GrowthBook gives you a single place to run A/B tests, manage feature flags, and analyze results against your existing data warehouse.</p><p>With powerful stats built in, it takes the complexity out of experimentation, helps you catch regressions before they hit every user, and makes it easy to test ideas that keep your product improving and your metrics moving in the right direction.</p><p>See a demo at <a href="https://www.growthbook.io/">https://www.growthbook.io/</a></p><p>Topics: A/B testing, experimentation, conversion rate optimization, feature flags, UX research, e-commerce experimentation, statistical significance, experimentation team building, growth experimentation, sequential testing.</p>
<ul><li>(03:09) - Dave's first day at UPS: inheriting an AB testing tool with no program </li>
<li>(05:59) - Senior leadership's ultimatum: prove UX ROI or kill the pilot </li>
<li>(08:38) - First test result: $35M from removing navigation in checkout </li>
<li>(09:48) - Defending the numbers: how Massey's team survived scrutiny </li>
<li>(11:07) - Why a data-driven engineering culture made experimentation inevitable </li>
<li>(16:12) - Team size: 80 people supporting almost 80 customer-facing applications </li>
<li>(19:08) - The 24-hour test: when required email fields killed conversion </li>
<li>(22:28) - Why Massey embeds UX research inside the experimentation team </li>
<li>(24:41) - AI at UPS: treating it as a tool, not a replacement </li>
</ul>]]>
      </itunes:summary>
      <itunes:keywords>experimentation,a/b testing,conversion rate optimization,statistical significance,testing velocity,user research,personalization,ecommerce optimization,checkout flow,feature flagging,experimentation roi,win rate,product experimentation,data-driven decision making,behavioral metrics,customer experience testing,experimentation team building,sequential testing,ux research,guardrail metrics,tech experimentation,growth experimentation,feature flags</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <podcast:transcript url="https://share.transistor.fm/s/7b0fed20/transcript.txt" type="text/plain"/>
      <podcast:chapters url="https://share.transistor.fm/s/7b0fed20/chapters.json" type="application/json+chapters"/>
    </item>
    <item>
      <title>How Experimentation Led to Annual Growth at Fanatics</title>
      <itunes:title>How Experimentation Led to Annual Growth at Fanatics</itunes:title>
      <itunes:episodeType>full</itunes:episodeType>
      <guid isPermaLink="false">c35052ae-96e1-4626-9a22-3b184fad5709</guid>
      <link>https://share.transistor.fm/s/6a9adab0</link>
      <description>
        <![CDATA[<p><strong>Summary</strong></p><p>Most e-commerce companies test a handful of features each month. Fanatics runs nearly 100 experiments monthly and delivers a big portion of the company's total annual growth through experimentation alone. Medha Umarji, VP of Growth and Experimentation at the multi-billion dollar sports merchandising retailer, explains how she built a program that scales from 10 tests per month to 100—and maintains enough rigor to spot false positives before they become costly decisions.</p><p>The difference isn't tooling or headcount. It's culture. When your CEO reads Excel spreadsheets for fun and actively wants data to prove him wrong, you stop debating whether to test and start debating how to test smarter. Medha shares the frameworks Fanatics uses to balance speed with rigor: a "do no harm" track for brand plays that won't show up in conversion metrics, a small-sample framework for teams that can't hit statistical significance thresholds, and an experimentation Wiki that feeds a continuous iteration flywheel. One surprising test on ad removal initially showed 95% statistical significance—until they replicated it and found the result was a false positive. 
The lesson: even at scale, you need to dig into causality.</p><p><strong>Timestamps</strong></p><p>03:09 How Fanatics scaled from 10 to 100 experiments per month over 10 years</p><p>05:25 Why some leadership teams embrace experimentation and others resist it</p><p>07:06 How experimentation consistently delivers a big portion of Fanatics' annual growth</p><p>08:20 What happens when your CEO consumes Excel spreadsheets and questions everything</p><p>10:35 How top-down humility shapes an entire company's testing culture</p><p>12:10 The ad removal test that looked like a 95% win, then failed replication</p><p>15:55 How Fanatics built an experimentation Wiki that powers their growth engine</p><p>22:45 The "do no harm" framework for features that don't measure cleanly in A/B tests</p><p>25:20 Why lowering barriers to adoption matters more than statistical perfection early on</p><p>26:27 Your odds of winning at experimentation are worse than roulette</p><p><strong>Takeaways</strong></p><ul><li>Replication catches false positives: At a 95% confidence level, roughly 1 in 20 tests of a change with no real effect will still look like a win. If a critical test outcome can't be explained through micro-metrics, run it again before committing resources.</li><li>Top-down buy-in shifts the conversation from "why test?"
to "how do we test?": When leadership treats data as the tiebreaker, teams stop defending opinions and start building better experiments.</li><li>Frameworks like "do no harm" and "small sample" expand who can test: Not every initiative needs 30,000 orders to ship value—lower the barrier for teams that can't hit statistical thresholds while protecting core KPIs.</li><li>Documenting experiments in a centralized Wiki creates a growth flywheel: Fanatics' Wiki feeds their roadmap with iterations on already-built features, reducing tech dependency and accelerating velocity.</li><li>Micro-metrics establish causality beyond top-line KPIs: If revenue moves but scroll depth, cart adds, and product views don't follow the same pattern, question the result before declaring a win.</li></ul><p><br><strong>Connect with the guest</strong></p><p>LinkedIn: <a href="https://www.linkedin.com/in/medhaumarji/">https://www.linkedin.com/in/medhaumarji/</a></p><p><strong>Learn more about Fanatics</strong></p><p><a href="https://www.fanatics.com/">https://www.fanatics.com/</a></p>]]>
      </description>
      <content:encoded>
        <![CDATA[<p><strong>Summary</strong></p><p>Most e-commerce companies test a handful of features each month. Fanatics runs nearly 100 experiments monthly and delivers a big portion of the company's total annual growth through experimentation alone. Medha Umarji, VP of Growth and Experimentation at the multi-billion dollar sports merchandising retailer, explains how she built a program that scales from 10 tests per month to 100—and maintains enough rigor to spot false positives before they become costly decisions.</p><p>The difference isn't tooling or headcount. It's culture. When your CEO reads Excel spreadsheets for fun and actively wants data to prove him wrong, you stop debating whether to test and start debating how to test smarter. Medha shares the frameworks Fanatics uses to balance speed with rigor: a "do no harm" track for brand plays that won't show up in conversion metrics, a small-sample framework for teams that can't hit statistical significance thresholds, and an experimentation Wiki that feeds a continuous iteration flywheel. One surprising test on ad removal initially showed 95% statistical significance—until they replicated it and found the result was a false positive. 
The lesson: even at scale, you need to dig into causality.</p><p><strong>Timestamps</strong></p><p>03:09 How Fanatics scaled from 10 to 100 experiments per month over 10 years</p><p>05:25 Why some leadership teams embrace experimentation and others resist it</p><p>07:06 How experimentation consistently delivers a big portion of Fanatics' annual growth</p><p>08:20 What happens when your CEO consumes Excel spreadsheets and questions everything</p><p>10:35 How top-down humility shapes an entire company's testing culture</p><p>12:10 The ad removal test that looked like a 95% win, then failed replication</p><p>15:55 How Fanatics built an experimentation Wiki that powers their growth engine</p><p>22:45 The "do no harm" framework for features that don't measure cleanly in A/B tests</p><p>25:20 Why lowering barriers to adoption matters more than statistical perfection early on</p><p>26:27 Your odds of winning at experimentation are worse than roulette</p><p><strong>Takeaways</strong></p><ul><li>Replication catches false positives: At a 95% confidence level, roughly 1 in 20 tests of a change with no real effect will still look like a win. If a critical test outcome can't be explained through micro-metrics, run it again before committing resources.</li><li>Top-down buy-in shifts the conversation from "why test?"
to "how do we test?": When leadership treats data as the tiebreaker, teams stop defending opinions and start building better experiments.</li><li>Frameworks like "do no harm" and "small sample" expand who can test: Not every initiative needs 30,000 orders to ship value—lower the barrier for teams that can't hit statistical thresholds while protecting core KPIs.</li><li>Documenting experiments in a centralized Wiki creates a growth flywheel: Fanatics' Wiki feeds their roadmap with iterations on already-built features, reducing tech dependency and accelerating velocity.</li><li>Micro-metrics establish causality beyond top-line KPIs: If revenue moves but scroll depth, cart adds, and product views don't follow the same pattern, question the result before declaring a win.</li></ul><p><br><strong>Connect with the guest</strong></p><p>LinkedIn: <a href="https://www.linkedin.com/in/medhaumarji/">https://www.linkedin.com/in/medhaumarji/</a></p><p><strong>Learn more about Fanatics</strong></p><p><a href="https://www.fanatics.com/">https://www.fanatics.com/</a></p>]]>
      </content:encoded>
      <pubDate>Thu, 07 May 2026 06:00:00 -0600</pubDate>
      <author>GrowthBook</author>
      <enclosure url="https://media.transistor.fm/6a9adab0/d3357f35.mp3" length="27477331" type="audio/mpeg"/>
      <itunes:author>GrowthBook</itunes:author>
      <itunes:image href="https://img.transistorcdn.com/jfZB9oGxWX42tXoQqAYsXxxg3rGq3z4zTvsBb0C1IDk/rs:fill:0:0:1/w:1400/h:1400/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS8zYmMy/OTY4MGM0YWFhYTE0/NDBlNzQwMjZhODUw/YTJhZC5qcGc.jpg"/>
      <itunes:duration>1718</itunes:duration>
      <itunes:summary>How Experimentation Drives Growth Every Year at Fanatics</itunes:summary>
      <itunes:subtitle>How Experimentation Drives Growth Every Year at Fanatics</itunes:subtitle>
      <itunes:keywords>experimentation program, A/B testing, e-commerce optimization, testing velocity, data-driven decision making, CEO buy-in, experimentation culture, do no harm framework, non-inferiority testing, experimentation wiki, false positive rate, 95% statistical significance, replicating test results, micro-metrics analysis, guardrail metrics, small sample testing, experimentation roadmap, feature iteration, VP Growth, retail experimentation</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <podcast:transcript url="https://share.transistor.fm/s/6a9adab0/transcript.txt" type="text/plain"/>
    </item>
    <item>
      <title>Inside Chess.com's Plan to Run 1,000 Experiments in a Single Year</title>
      <itunes:episode>9</itunes:episode>
      <podcast:episode>9</podcast:episode>
      <itunes:title>Inside Chess.com's Plan to Run 1,000 Experiments in a Single Year</itunes:title>
      <itunes:episodeType>full</itunes:episodeType>
      <guid isPermaLink="false">a951862b-60e8-486a-a8e2-ec0e438b7a10</guid>
      <link>https://share.transistor.fm/s/5ebd1b8f</link>
      <description>
        <![CDATA[<p><strong>Summary</strong></p><p><br>Chess.com ran its first A/B test in 2023. Two years later, the team is on track to run 1,000 experiments in a single year—and they've already shipped 195 in Q1. </p><p><br>In this episode, Ashley Stirrup sits down with Nafis Shaikh, Director of Product Management at Chess.com, to get inside the experimentation engine powering one of the world's most beloved gaming products. </p><p><br>Nafis brings experience from Zynga and Prodigy and a refreshingly honest take on what changes when a product built on passion suddenly has to serve a 10-million-DAU user base that spans absolute beginners to rated FIDE players. He and Ashley get into why one-size-fits-all doesn't actually fit anyone, how to measure an AI coach when you can't tell whether users have their volume on, and a game review experiment that completely upended the team's assumptions about how players want to learn. </p><p><br>Nafis also shares practical advice for product managers trying to introduce experimentation culture to organizations that have never done it before—starting with a simple pre/post test rather than a fancy platform. 
If you lead product, care about experimentation maturity, or just want to hear how a classic product is scaling its learning loop, this one's worth your time.</p><p><strong><br>Timestamps</strong></p><ul><li>[00:35] – Chess.com's experimentation origin story and the 1,000-test goal</li><li>[05:01] – Designing for a user base that spans beginners to FIDE-rated players</li><li>[07:30] – The four metric dimensions Nafis uses to evaluate tests</li><li>[12:03] – How do you A/B test an AI coach when you can't tell who's listening?</li><li>[15:49] – Embracing humility and the shift away from "we know what works"</li><li>[20:50] – The game review test that surprised everyone: 80% of users review wins</li><li>[24:06] – Advice for PMs introducing experimentation at a new company</li><li>[29:15] – The onboarding debate and personalization from session zero</li></ul><p><strong><br>Takeaways</strong></p><ul><li>Scale test volume to learning speed, not just shipping speed</li><li>Build hypotheses around user psychology, not just KPI movement</li><li>Accept that being wrong is the point; experimentation only works when leadership embraces humility</li><li>Start simple if you're new to experimentation; a clean pre/post comparison beats a fancy platform you don't use</li><li>Reposition features around how users actually feel, not how you assume they should feel</li><li>Design onboarding around the shortest path to value, not the longest path to personalization</li></ul><p><strong><br>Guest LinkedIn:</strong> <a href="https://www.linkedin.com/in/nafis-shaikh-20161916/">https://www.linkedin.com/in/nafis-shaikh-20161916/</a></p><p><strong>Company website:</strong> <a href="https://www.chess.com">https://www.chess.com</a></p>]]>
      </description>
      <content:encoded>
        <![CDATA[<p><strong>Summary</strong></p><p><br>Chess.com ran its first A/B test in 2023. Two years later, the team is on track to run 1,000 experiments in a single year—and they've already shipped 195 in Q1. </p><p><br>In this episode, Ashley Stirrup sits down with Nafis Shaikh, Director of Product Management at Chess.com, to get inside the experimentation engine powering one of the world's most beloved gaming products. </p><p><br>Nafis brings experience from Zynga and Prodigy and a refreshingly honest take on what changes when a product built on passion suddenly has to serve a 10-million-DAU user base that spans absolute beginners to rated FIDE players. He and Ashley get into why one-size-fits-all doesn't actually fit anyone, how to measure an AI coach when you can't tell whether users have their volume on, and a game review experiment that completely upended the team's assumptions about how players want to learn. </p><p><br>Nafis also shares practical advice for product managers trying to introduce experimentation culture to organizations that have never done it before—starting with a simple pre/post test rather than a fancy platform. 
If you lead product, care about experimentation maturity, or just want to hear how a classic product is scaling its learning loop, this one's worth your time.</p><p><strong><br>Timestamps</strong></p><ul><li>[00:35] – Chess.com's experimentation origin story and the 1,000-test goal</li><li>[05:01] – Designing for a user base that spans beginners to FIDE-rated players</li><li>[07:30] – The four metric dimensions Nafis uses to evaluate tests</li><li>[12:03] – How do you A/B test an AI coach when you can't tell who's listening?</li><li>[15:49] – Embracing humility and the shift away from "we know what works"</li><li>[20:50] – The game review test that surprised everyone: 80% of users review wins</li><li>[24:06] – Advice for PMs introducing experimentation at a new company</li><li>[29:15] – The onboarding debate and personalization from session zero</li></ul><p><strong><br>Takeaways</strong></p><ul><li>Scale test volume to learning speed, not just shipping speed</li><li>Build hypotheses around user psychology, not just KPI movement</li><li>Accept that being wrong is the point; experimentation only works when leadership embraces humility</li><li>Start simple if you're new to experimentation; a clean pre/post comparison beats a fancy platform you don't use</li><li>Reposition features around how users actually feel, not how you assume they should feel</li><li>Design onboarding around the shortest path to value, not the longest path to personalization</li></ul><p><strong><br>Guest LinkedIn:</strong> <a href="https://www.linkedin.com/in/nafis-shaikh-20161916/">https://www.linkedin.com/in/nafis-shaikh-20161916/</a></p><p><strong>Company website:</strong> <a href="https://www.chess.com">https://www.chess.com</a></p>]]>
      </content:encoded>
      <pubDate>Tue, 21 Apr 2026 06:00:00 -0600</pubDate>
      <author>GrowthBook</author>
      <enclosure url="https://media.transistor.fm/5ebd1b8f/8114a915.mp3" length="30552351" type="audio/mpeg"/>
      <itunes:author>GrowthBook</itunes:author>
      <itunes:image href="https://img.transistorcdn.com/F27wmxVbKyoD-z8yQHePT9M_yQrlpMz2HNM1aTxCIg8/rs:fill:0:0:1/w:1400/h:1400/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS83ZDlm/ODRjZWJhZThkOGZi/MjhlMmRmM2U3NGFh/NjQ0My5wbmc.jpg"/>
      <itunes:duration>1907</itunes:duration>
      <itunes:summary>
        <![CDATA[<p><strong>Summary</strong></p><p><br>Chess.com ran its first A/B test in 2023. Two years later, the team is on track to run 1,000 experiments in a single year—and they've already shipped 195 in Q1. </p><p><br>In this episode, Ashley Stirrup sits down with Nafis Shaikh, Director of Product Management at Chess.com, to get inside the experimentation engine powering one of the world's most beloved gaming products. </p><p><br>Nafis brings experience from Zynga and Prodigy and a refreshingly honest take on what changes when a product built on passion suddenly has to serve a 10-million-DAU user base that spans absolute beginners to rated FIDE players. He and Ashley get into why one-size-fits-all doesn't actually fit anyone, how to measure an AI coach when you can't tell whether users have their volume on, and a game review experiment that completely upended the team's assumptions about how players want to learn. </p><p><br>Nafis also shares practical advice for product managers trying to introduce experimentation culture to organizations that have never done it before—starting with a simple pre/post test rather than a fancy platform. 
If you lead product, care about experimentation maturity, or just want to hear how a classic product is scaling its learning loop, this one's worth your time.</p><p><strong><br>Timestamps</strong></p><ul><li>[00:35] – Chess.com's experimentation origin story and the 1,000-test goal</li><li>[05:01] – Designing for a user base that spans beginners to FIDE-rated players</li><li>[07:30] – The four metric dimensions Nafis uses to evaluate tests</li><li>[12:03] – How do you A/B test an AI coach when you can't tell who's listening?</li><li>[15:49] – Embracing humility and the shift away from "we know what works"</li><li>[20:50] – The game review test that surprised everyone: 80% of users review wins</li><li>[24:06] – Advice for PMs introducing experimentation at a new company</li><li>[29:15] – The onboarding debate and personalization from session zero</li></ul><p><strong><br>Takeaways</strong></p><ul><li>Scale test volume to learning speed, not just shipping speed</li><li>Build hypotheses around user psychology, not just KPI movement</li><li>Accept that being wrong is the point; experimentation only works when leadership embraces humility</li><li>Start simple if you're new to experimentation; a clean pre/post comparison beats a fancy platform you don't use</li><li>Reposition features around how users actually feel, not how you assume they should feel</li><li>Design onboarding around the shortest path to value, not the longest path to personalization</li></ul><p><strong><br>Guest LinkedIn:</strong> <a href="https://www.linkedin.com/in/nafis-shaikh-20161916/">https://www.linkedin.com/in/nafis-shaikh-20161916/</a></p><p><strong>Company website:</strong> <a href="https://www.chess.com">https://www.chess.com</a></p>]]>
      </itunes:summary>
      <itunes:keywords>product management, product experimentation, A/B testing strategy, experimentation culture, product management metrics, AI coaching product, gaming product management, user segmentation testing, onboarding experimentation, feature flag testing, Zynga product management, Prodigy Education, experimentation maturity, hypothesis-driven product development, personalization strategy, freemium conversion testing, game review feature, product KPIs, experimentation scaling</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <podcast:transcript url="https://share.transistor.fm/s/5ebd1b8f/transcript.txt" type="text/plain"/>
    </item>
    <item>
      <title>Scaling Experimentation: Ancestry’s VP on AI Storytelling, Paywalls, and Decision Quality</title>
      <itunes:episode>8</itunes:episode>
      <podcast:episode>8</podcast:episode>
      <itunes:title>Scaling Experimentation: Ancestry’s VP on AI Storytelling, Paywalls, and Decision Quality</itunes:title>
      <itunes:episodeType>full</itunes:episodeType>
      <guid isPermaLink="false">b0ad794d-dddd-4707-8db3-72272dcf6600</guid>
      <link>https://share.transistor.fm/s/87484f97</link>
      <description>
        <![CDATA[<p><strong>Summary</strong></p><p>What happens when A/B testing stops being a tool and becomes your operating system? Suresh Teckchandani, VP of Product &amp; Technology at Ancestry (formerly PayPal and eBay), shares how the team scaled experimentation from isolated tests to capability-building that drives roadmap and revenue. He details the “growth metering” and paywall experiments that unlocked a 5.3% lift in key engagement and improved conversions—then became platform features. Suresh explains Ancestry’s centralized experimentation platform with self-serve access for PMs and engineers, why “obvious” UX changes can be the riskiest, and how removing friction actually hurt engagement by 20–25% due to user mental models. He also breaks down a major growth lever: AI-powered storytelling that turns raw records into narratives, delivering 30%+ CTR lift and a 5x increase in story views. You’ll learn how Ancestry balances input vs. output metrics, when not to test, and why the best leaders optimize for decision quality over win counts—with clean baselines, right audiences, adequate sample sizes, and true statistical significance.</p><p><strong>Timestamps</strong></p><p>[00:45] – Ancestry’s experimentation maturity: metering, paywalls, and the 5.3% lift that unlocked capabilities</p><p>[03:48] – From isolated tests to a capability mindset: experimentation as an operating system</p><p>[05:34] – Balancing wins with learning: zooming out for subscription engagement and NPS</p><p>[08:49] – Operating model: centralized platform, self-serve dashboards, and baseline resets</p><p>[11:59] – Counterintuitive UX lesson: removing friction backfired (–20–25% CTR); respect mental models</p><p>[15:10] – AI storytelling as a growth lever: record comparisons into narratives, 30%+ CTR and 5x views</p><p>[18:27] – Input vs. 
output metrics: when to roll back and how to link short- and long-term outcomes</p><p>[30:00] – Parting advice: test what changes the customer experience, avoid vanity testing, and optimize decision quality</p><p><strong>Takeaways</strong></p><p>- Build capabilities, not just tests: use experiments to unlock platform features (e.g., metering, paywalls).</p><p>- Democratize experimentation with a centralized platform and self-serve tooling; reset baselines regularly.</p><p>- Test “obvious” UX changes; preserve helpful friction and align with user mental models.</p><p>- Turn data into narratives with AI to deepen engagement and increase discovery.</p><p>- Define input and output metrics; ship only what improves core outcomes (retention, sign-ups), and roll back fast if not.</p><p>- Optimize for decision quality: right audience, sufficient sample sizes, clean baselines, and true statistical significance.</p><p><br><strong>Sponsor</strong><br>GrowthBook helps you ship features with confidence by bringing experimentation and feature flagging into one open-source platform. No more guessing whether that new checkout flow actually moved the needle, waiting weeks for data team bandwidth, or flying blind on rollouts.</p><p>GrowthBook gives you a single place to run A/B tests, manage feature flags, and analyze results against your existing data warehouse.</p><p>With powerful stats built in, it takes the complexity out of experimentation, helps you catch regressions before they hit every user, and makes it easy to test ideas that keep your product improving and your metrics moving in the right direction.</p><p>See a demo at <a href="https://www.growthbook.io/">https://www.growthbook.io/</a></p>]]>
      </description>
      <content:encoded>
        <![CDATA[<p><strong>Summary</strong></p><p>What happens when A/B testing stops being a tool and becomes your operating system? Suresh Teckchandani, VP of Product &amp; Technology at Ancestry (formerly PayPal and eBay), shares how the team scaled experimentation from isolated tests to capability-building that drives roadmap and revenue. He details the “growth metering” and paywall experiments that unlocked a 5.3% lift in key engagement and improved conversions—then became platform features. Suresh explains Ancestry’s centralized experimentation platform with self-serve access for PMs and engineers, why “obvious” UX changes can be the riskiest, and how removing friction actually hurt engagement by 20–25% due to user mental models. He also breaks down a major growth lever: AI-powered storytelling that turns raw records into narratives, delivering 30%+ CTR lift and a 5x increase in story views. You’ll learn how Ancestry balances input vs. output metrics, when not to test, and why the best leaders optimize for decision quality over win counts—with clean baselines, right audiences, adequate sample sizes, and true statistical significance.</p><p><strong>Timestamps</strong></p><p>[00:45] – Ancestry’s experimentation maturity: metering, paywalls, and the 5.3% lift that unlocked capabilities</p><p>[03:48] – From isolated tests to a capability mindset: experimentation as an operating system</p><p>[05:34] – Balancing wins with learning: zooming out for subscription engagement and NPS</p><p>[08:49] – Operating model: centralized platform, self-serve dashboards, and baseline resets</p><p>[11:59] – Counterintuitive UX lesson: removing friction backfired (–20–25% CTR); respect mental models</p><p>[15:10] – AI storytelling as a growth lever: record comparisons into narratives, 30%+ CTR and 5x views</p><p>[18:27] – Input vs. 
output metrics: when to roll back and how to link short- and long-term outcomes</p><p>[30:00] – Parting advice: test what changes CX, avoid vanity testing, and optimize decision quality</p><p><strong>Takeaways</strong></p><p>- Build capabilities, not just tests: use experiments to unlock platform features (e.g., metering, paywalls).</p><p>- Democratize experimentation with a centralized platform and self-serve tooling; reset baselines regularly.</p><p>- Test “obvious” UX changes; preserve helpful friction and align with user mental models.</p><p>- Turn data into narratives with AI to deepen engagement and increase discovery.</p><p>- Define input and output metrics; ship only what improves core outcomes (retention, sign-ups), and roll back fast if not.</p><p>- Optimize for decision quality: right audience, sufficient sample sizes, clean baselines, and true statistical significance.</p><p><br><strong>Sponsor</strong><br>GrowthBook helps you ship features with confidence by bringing experimentation and feature flagging into one open-source platform. No more guessing whether that new checkout flow actually moved the needle, waiting weeks for data team bandwidth, or flying blind on rollouts.</p><p>GrowthBook gives you a single place to run A/B tests, manage feature flags, and analyze results against your existing data warehouse.</p><p>With powerful stats built in, it takes the complexity out of experimentation, helps you catch regressions before they hit every user, and makes it easy to test ideas that keep your product improving and your metrics moving in the right direction.</p><p>See a demo at <a href="https://www.growthbook.io/">https://www.growthbook.io/</a></p><p><br></p>]]>
      </content:encoded>
      <pubDate>Tue, 14 Apr 2026 06:00:00 -0600</pubDate>
      <author>The Experimentation Edge</author>
      <enclosure url="https://media.transistor.fm/87484f97/bfb4b339.mp3" length="24231135" type="audio/mpeg"/>
      <itunes:author>The Experimentation Edge</itunes:author>
      <itunes:image href="https://img.transistorcdn.com/kL0uERD2NwT_lilK3KeGtET2Tv2Ft2rdYxHDXICof9I/rs:fill:0:0:1/w:1400/h:1400/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9jMWNk/ODI3M2M1ZTlhMzc1/Njc5Y2UyOGUzMzlj/YTdlMy5qcGc.jpg"/>
      <itunes:duration>1511</itunes:duration>
      <itunes:summary>
        <![CDATA[<p><strong>Summary</strong></p><p>What happens when A/B testing stops being a tool and becomes your operating system? Suresh Teckchandani, VP of Product &amp; Technology at Ancestry (formerly PayPal and eBay), shares how the team scaled experimentation from isolated tests to capability-building that drives roadmap and revenue. He details the “growth metering” and paywall experiments that unlocked a 5.3% lift in key engagement and improved conversions—then became platform features. Suresh explains Ancestry’s centralized experimentation platform with self-serve access for PMs and engineers, why “obvious” UX changes can be the riskiest, and how removing friction actually hurt engagement by 20–25% due to user mental models. He also breaks down a major growth lever: AI-powered storytelling that turns raw records into narratives, delivering 30%+ CTR lift and a 5x increase in story views. You’ll learn how Ancestry balances input vs. output metrics, when not to test, and why the best leaders optimize for decision quality over win counts—with clean baselines, right audiences, adequate sample sizes, and true statistical significance.</p><p><strong>Timestamps</strong></p><p>[00:45] – Ancestry’s experimentation maturity: metering, paywalls, and the 5.3% lift that unlocked capabilities</p><p>[03:48] – From isolated tests to a capability mindset: experimentation as an operating system</p><p>[05:34] – Balancing wins with learning: zooming out for subscription engagement and NPS</p><p>[08:49] – Operating model: centralized platform, self-serve dashboards, and baseline resets</p><p>[11:59] – Counterintuitive UX lesson: removing friction backfired (–20–25% CTR); respect mental models</p><p>[15:10] – AI storytelling as a growth lever: record comparisons into narratives, 30%+ CTR and 5x views</p><p>[18:27] – Input vs. 
output metrics: when to roll back and how to link short- and long-term outcomes</p><p>[30:00] – Parting advice: test what changes CX, avoid vanity testing, and optimize decision quality</p><p><strong>Takeaways</strong></p><p>- Build capabilities, not just tests: use experiments to unlock platform features (e.g., metering, paywalls).</p><p>- Democratize experimentation with a centralized platform and self-serve tooling; reset baselines regularly.</p><p>- Test “obvious” UX changes; preserve helpful friction and align with user mental models.</p><p>- Turn data into narratives with AI to deepen engagement and increase discovery.</p><p>- Define input and output metrics; ship only what improves core outcomes (retention, sign-ups), and roll back fast if not.</p><p>- Optimize for decision quality: right audience, sufficient sample sizes, clean baselines, and true statistical significance.</p><p><br><strong>Sponsor</strong><br>GrowthBook helps you ship features with confidence by bringing experimentation and feature flagging into one open-source platform. No more guessing whether that new checkout flow actually moved the needle, waiting weeks for data team bandwidth, or flying blind on rollouts.</p><p>GrowthBook gives you a single place to run A/B tests, manage feature flags, and analyze results against your existing data warehouse.</p><p>With powerful stats built in, it takes the complexity out of experimentation, helps you catch regressions before they hit every user, and makes it easy to test ideas that keep your product improving and your metrics moving in the right direction.</p><p>See a demo at <a href="https://www.growthbook.io/">https://www.growthbook.io/</a></p><p><br></p>]]>
      </itunes:summary>
      <itunes:keywords>A/B testing,experimentation,product management,Ancestry,experimentation at scale,product experimentation,AI storytelling,user engagement,product metrics,feature testing,experimentation culture,VP product,subscription growth,conversion optimization,UX testing,data-driven product,test and learn,paywall optimization,product innovation,experimentation platform</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <podcast:transcript url="https://share.transistor.fm/s/87484f97/transcript.txt" type="text/plain"/>
    </item>
    <item>
      <title>From $1M to $35M ARR: Fyxer’s Growth Engineering Playbook—PLG Loops, AI, and 1,000 Experiments</title>
      <itunes:episode>7</itunes:episode>
      <podcast:episode>7</podcast:episode>
      <itunes:title>From $1M to $35M ARR: Fyxer’s Growth Engineering Playbook—PLG Loops, AI, and 1,000 Experiments</itunes:title>
      <itunes:episodeType>full</itunes:episodeType>
      <guid isPermaLink="false">02648e43-ae26-4719-8179-d2a7f0c463dc</guid>
      <link>https://share.transistor.fm/s/cf8beed6</link>
      <description>
        <![CDATA[<p>Summary</p><p>How do you drive hypergrowth without guessing? Kameron Tanseli, Head of Growth Engineering at Fyxer (an AI assistant for your email), breaks down the experimentation playbook that helped the company scale from $1M to $35M ARR, with sights set on $100–$150M. Kameron explains how startups should think about A/B testing differently: de-risk big bets, not just button colors. He shares a risk-based approach to when to run rigorous tests vs. ship-and-measure, why a 25% win rate is a sign you’re testing ambitiously, and how PLG features should be shipped first, then rapidly iterated to drive usage. You’ll hear how Fyxer uses AI to speed up the entire lifecycle (Claude, Cursor desktop and cloud agents, GrowthBook, and BigQuery), plus how a Slack-first changelog and an internal “AI data scientist” democratize insights. Kameron also details turning everyday product usage into growth loops, personalizing signup paths, and measuring success by movement in global ARR, not just local metrics. He closes with candid advice for new growth engineers: expect to struggle early, be T-shaped, and adopt your customer’s language.</p><p><br></p><p>Timestamps</p><p>[00:34] – Startup A/B testing mindset: de-risking big bets with only a 25% win rate</p><p>[02:45] – When to A/B test vs. 
ship: risk appetite, funnel stage, and non-inferiority tests</p><p>[04:43] – 360 experiments with 4 people: scaling to 1,000 using AI and Cursor cloud agents</p><p>[08:22] – Separating feature impact from momentum: PLG and trial model moves ARR</p><p>[10:29] – Ship PLG features, then iterate to drive usage; measuring DAU and revenue impact</p><p>[11:40] – Habit loops to growth loops: turning product features into PLG (scheduling case study)</p><p>[16:47] – Building an experimentation culture: founder buy-in, Slack changelog, shared data</p><p>[26:50] – The modern growth stack: Claude, Cursor, GrowthBook, BigQuery, and DOT in Slack</p><p><br></p><p>Takeaways</p><p>- Prioritize by risk: run rigorous A/B tests where you have volume; use before/after or non-inferiority for low-risk in-product changes.</p><p>- Test big levers—not just UI: pricing models, usage limits, onboarding pathways—and judge success by ARR movement, not micro-metrics.</p><p>- Ship first, then optimize: launch PLG features and immediately run experiments to increase adoption; track daily active usage per feature.</p><p>- Build growth loops from habits: design shareable artifacts and personalized signup paths; drive users back to your domain to capture value.</p><p>- Scale experimentation with AI: use Cursor desktop/cloud agents for parallel builds and visual QA; orchestrate docs/analysis via Claude; automate cleanups and reporting.</p><p>- Make experimentation company-wide: centralize data (BigQuery), broadcast wins/losses in Slack via GrowthBook, and auto-correlate metric dips to releases.</p><p><br></p>]]>
      </description>
      <content:encoded>
        <![CDATA[<p>Summary</p><p>How do you drive hypergrowth without guessing? Kameron Tanseli, Head of Growth Engineering at Fyxer (an AI assistant for your email), breaks down the experimentation playbook that helped the company scale from $1M to $35M ARR, with sights set on $100–$150M. Kameron explains how startups should think about A/B testing differently: de-risk big bets, not just button colors. He shares a risk-based approach to when to run rigorous tests vs. ship-and-measure, why a 25% win rate is a sign you’re testing ambitiously, and how PLG features should be shipped first, then rapidly iterated to drive usage. You’ll hear how Fyxer uses AI to speed up the entire lifecycle (Claude, Cursor desktop and cloud agents, GrowthBook, and BigQuery), plus how a Slack-first changelog and an internal “AI data scientist” democratize insights. Kameron also details turning everyday product usage into growth loops, personalizing signup paths, and measuring success by movement in global ARR, not just local metrics. He closes with candid advice for new growth engineers: expect to struggle early, be T-shaped, and adopt your customer’s language.</p><p><br></p><p>Timestamps</p><p>[00:34] – Startup A/B testing mindset: de-risking big bets with only a 25% win rate</p><p>[02:45] – When to A/B test vs. 
ship: risk appetite, funnel stage, and non-inferiority tests</p><p>[04:43] – 360 experiments with 4 people: scaling to 1,000 using AI and Cursor cloud agents</p><p>[08:22] – Separating feature impact from momentum: PLG and trial model moves ARR</p><p>[10:29] – Ship PLG features, then iterate to drive usage; measuring DAU and revenue impact</p><p>[11:40] – Habit loops to growth loops: turning product features into PLG (scheduling case study)</p><p>[16:47] – Building an experimentation culture: founder buy-in, Slack changelog, shared data</p><p>[26:50] – The modern growth stack: Claude, Cursor, GrowthBook, BigQuery, and DOT in Slack</p><p><br></p><p>Takeaways</p><p>- Prioritize by risk: run rigorous A/B tests where you have volume; use before/after or non-inferiority for low-risk in-product changes.</p><p>- Test big levers—not just UI: pricing models, usage limits, onboarding pathways—and judge success by ARR movement, not micro-metrics.</p><p>- Ship first, then optimize: launch PLG features and immediately run experiments to increase adoption; track daily active usage per feature.</p><p>- Build growth loops from habits: design shareable artifacts and personalized signup paths; drive users back to your domain to capture value.</p><p>- Scale experimentation with AI: use Cursor desktop/cloud agents for parallel builds and visual QA; orchestrate docs/analysis via Claude; automate cleanups and reporting.</p><p>- Make experimentation company-wide: centralize data (BigQuery), broadcast wins/losses in Slack via GrowthBook, and auto-correlate metric dips to releases.</p><p><br></p>]]>
      </content:encoded>
      <pubDate>Thu, 02 Apr 2026 13:34:09 -0600</pubDate>
      <author>GrowthBook</author>
      <enclosure url="https://media.transistor.fm/cf8beed6/44b21d50.mp3" length="29579244" type="audio/mpeg"/>
      <itunes:author>GrowthBook</itunes:author>
      <itunes:image href="https://img.transistorcdn.com/Tsql0to9rV_1FkNzCAmK6JCuGEAMVv8TDXZh3__7_is/rs:fill:0:0:1/w:1400/h:1400/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9lNGM3/YjQyOGM4MDg1Mjc1/YmE2Y2I3NThjNjkz/M2UzYS5qcGc.jpg"/>
      <itunes:duration>1847</itunes:duration>
      <itunes:summary>
        <![CDATA[<p>Summary</p><p>How do you drive hypergrowth without guessing? Kameron Tanseli, Head of Growth Engineering at Fyxer (an AI assistant for your email), breaks down the experimentation playbook that helped the company scale from $1M to $35M ARR, with sights set on $100–$150M. Kameron explains how startups should think about A/B testing differently: de-risk big bets, not just button colors. He shares a risk-based approach to when to run rigorous tests vs. ship-and-measure, why a 25% win rate is a sign you’re testing ambitiously, and how PLG features should be shipped first, then rapidly iterated to drive usage. You’ll hear how Fyxer uses AI to speed up the entire lifecycle (Claude, Cursor desktop and cloud agents, GrowthBook, and BigQuery), plus how a Slack-first changelog and an internal “AI data scientist” democratize insights. Kameron also details turning everyday product usage into growth loops, personalizing signup paths, and measuring success by movement in global ARR, not just local metrics. He closes with candid advice for new growth engineers: expect to struggle early, be T-shaped, and adopt your customer’s language.</p><p><br></p><p>Timestamps</p><p>[00:34] – Startup A/B testing mindset: de-risking big bets with only a 25% win rate</p><p>[02:45] – When to A/B test vs. 
ship: risk appetite, funnel stage, and non-inferiority tests</p><p>[04:43] – 360 experiments with 4 people: scaling to 1,000 using AI and Cursor cloud agents</p><p>[08:22] – Separating feature impact from momentum: PLG and trial model moves ARR</p><p>[10:29] – Ship PLG features, then iterate to drive usage; measuring DAU and revenue impact</p><p>[11:40] – Habit loops to growth loops: turning product features into PLG (scheduling case study)</p><p>[16:47] – Building an experimentation culture: founder buy-in, Slack changelog, shared data</p><p>[26:50] – The modern growth stack: Claude, Cursor, GrowthBook, BigQuery, and DOT in Slack</p><p><br></p><p>Takeaways</p><p>- Prioritize by risk: run rigorous A/B tests where you have volume; use before/after or non-inferiority for low-risk in-product changes.</p><p>- Test big levers—not just UI: pricing models, usage limits, onboarding pathways—and judge success by ARR movement, not micro-metrics.</p><p>- Ship first, then optimize: launch PLG features and immediately run experiments to increase adoption; track daily active usage per feature.</p><p>- Build growth loops from habits: design shareable artifacts and personalized signup paths; drive users back to your domain to capture value.</p><p>- Scale experimentation with AI: use Cursor desktop/cloud agents for parallel builds and visual QA; orchestrate docs/analysis via Claude; automate cleanups and reporting.</p><p>- Make experimentation company-wide: centralize data (BigQuery), broadcast wins/losses in Slack via GrowthBook, and auto-correlate metric dips to releases.</p><p><br></p>]]>
      </itunes:summary>
      <itunes:keywords>experimentation, A/B-testing, feature-flagging, product-experimentation, experimentation-culture, data-driven-decisions, product-development, growth-engineering, startup-experimentation, PLG, product-led-growth, growth-loops, habitual-loops, viral-growth, conversion-rate-optimization, non-inferiority-testing, win-rate, experiment-velocity, AI-assisted-development, Cursor, Claude, GrowthBook, Fyxer, hyper-growth, startup-growth, pricing-experiments, B2B-SaaS, onboarding-optimization, personalization, developer-productivity, experiment-automation, data-analysis, BigQuery, Slack-automation, experiment-cleanup, guardrail-metrics, before-and-after-testing, risk-appetite, exploitation-vs-exploration, T-shaped-skills, growth-culture, founder-led-experimentation, ARR-growth, series-B, CRO</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <podcast:transcript url="https://share.transistor.fm/s/cf8beed6/transcript.txt" type="text/plain"/>
    </item>
    <item>
      <title>Five Pixels That Cost LinkedIn a Million Dollars a Month: What Makram Mansour Learned as an Experimentation Leader at LinkedIn</title>
      <itunes:episode>6</itunes:episode>
      <podcast:episode>6</podcast:episode>
      <itunes:title>Five Pixels That Cost LinkedIn a Million Dollars a Month: What Makram Mansour Learned as an Experimentation Leader at LinkedIn</itunes:title>
      <itunes:episodeType>full</itunes:episodeType>
      <guid isPermaLink="false">d917baa4-8cf1-4079-a812-427d04cc0afa</guid>
      <link>https://share.transistor.fm/s/316ada1b</link>
      <description>
        <![CDATA[<p>Summary</p><p>How do you build a culture where nothing ships without evidence—and leaders actually act on the data? Makram Mansour, Head of Marketplace at ID.me and former experimentation leader at LinkedIn and Intuit, shares the systems, mindsets, and guardrails behind “experimenting everywhere.” At LinkedIn, he helped support 10,000+ annual experiments with 2,000 weekly platform users, and he explains the hard-earned lessons (like a 5px UI tweak causing a million-dollar ad loss) that led to a “test before release” mandate. At Intuit, he operationalized “fail forward,” partnering with HR to rewrite OKRs so teams are rewarded for learning, not just launching. Makram breaks down why to shift from MVP to MVT (minimum viable test), how to surface leap-of-faith assumptions with PRFAQs and “unit of one” prototypes, and where AI now unlocks faster, safer front-end testing. He also details critical guardrails—cost visibility for AI infrastructure, ethical and inclusion metrics, and the people-process-technology triad—plus practical ways to remove bottlenecks via a center of excellence. 
If you’re starting from scratch or scaling your program, you’ll learn how to personalize responsibly at the top of the funnel, define your North Star and signposts, and stack early wins while building influence across the org.</p><p><br></p><p>Timestamps</p><p>[00:45] – Makram’s path: running experimentation at LinkedIn and Intuit, and why nothing ships without an A/B test</p><p>[02:15] – Costly lessons: 5px banner change, algorithm tweaks, and the case for rigorous guardrails</p><p>[06:40] – Leadership discipline: killing features (voice meetups, LinkedIn Stories) and changing OKRs to reward learning</p><p>[11:05] – People, process, technology: top-down and bottom-up tracks, and embedding “fail forward”</p><p>[13:40] – From MVP to MVT: validating leap-of-faith assumptions, PRFAQ, and rapid “unit of one” prototypes</p><p>[15:55] – Bottlenecks and unlocks: engineering/data science capacity, centers of excellence, and AI for fast front-end tests</p><p>[22:45] – Personalization at the top of funnel: avoid waste, design reviews, and right-size testing before building</p><p>[25:45] – Guardrail metrics that matter: AI infra costs, ethics/compliance, and fairness-by-design</p><p>[29:45] – ID.me now: zero-to-one builds, vision-to-values, North Star and leading indicators</p><p>[33:30] – How to start at a new org: crawl-walk-run, small wins, relationships, and over-communication</p><p><br></p><p>Takeaways</p><p>- Shift from MVP to MVT: list leap-of-faith assumptions and design minimum viable tests before you build.</p><p>- Institutionalize learning: align OKRs with “fail forward,” and be willing to kill low-performing features quickly.</p><p>- Build the triad: pair an easy-to-use platform with training, top-down sponsorship, and clear launch processes.</p><p>- Add real guardrails: track AI infrastructure costs, ethics/compliance, and inclusion metrics alongside growth KPIs.</p><p>- Unblock teams: create a center of excellence for data science and enable rapid variants with 
AI-powered tooling.</p><p>- Start small and visible: rack up quick wins, over-communicate progress, and grow influence through relationships.</p><p><br></p>]]>
      </description>
      <content:encoded>
        <![CDATA[<p>Summary</p><p>How do you build a culture where nothing ships without evidence—and leaders actually act on the data? Makram Mansour, Head of Marketplace at ID.me and former experimentation leader at LinkedIn and Intuit, shares the systems, mindsets, and guardrails behind “experimenting everywhere.” At LinkedIn, he helped support 10,000+ annual experiments with 2,000 weekly platform users, and he explains the hard-earned lessons (like a 5px UI tweak causing a million-dollar ad loss) that led to a “test before release” mandate. At Intuit, he operationalized “fail forward,” partnering with HR to rewrite OKRs so teams are rewarded for learning, not just launching. Makram breaks down why to shift from MVP to MVT (minimum viable test), how to surface leap-of-faith assumptions with PRFAQs and “unit of one” prototypes, and where AI now unlocks faster, safer front-end testing. He also details critical guardrails—cost visibility for AI infrastructure, ethical and inclusion metrics, and the people-process-technology triad—plus practical ways to remove bottlenecks via a center of excellence. 
If you’re starting from scratch or scaling your program, you’ll learn how to personalize responsibly at the top of the funnel, define your North Star and signposts, and stack early wins while building influence across the org.</p><p><br></p><p>Timestamps</p><p>[00:45] – Makram’s path: running experimentation at LinkedIn and Intuit, and why nothing ships without an A/B test</p><p>[02:15] – Costly lessons: 5px banner change, algorithm tweaks, and the case for rigorous guardrails</p><p>[06:40] – Leadership discipline: killing features (voice meetups, LinkedIn Stories) and changing OKRs to reward learning</p><p>[11:05] – People, process, technology: top-down and bottom-up tracks, and embedding “fail forward”</p><p>[13:40] – From MVP to MVT: validating leap-of-faith assumptions, PRFAQ, and rapid “unit of one” prototypes</p><p>[15:55] – Bottlenecks and unlocks: engineering/data science capacity, centers of excellence, and AI for fast front-end tests</p><p>[22:45] – Personalization at the top of funnel: avoid waste, design reviews, and right-size testing before building</p><p>[25:45] – Guardrail metrics that matter: AI infra costs, ethics/compliance, and fairness-by-design</p><p>[29:45] – ID.me now: zero-to-one builds, vision-to-values, North Star and leading indicators</p><p>[33:30] – How to start at a new org: crawl-walk-run, small wins, relationships, and over-communication</p><p><br></p><p>Takeaways</p><p>- Shift from MVP to MVT: list leap-of-faith assumptions and design minimum viable tests before you build.</p><p>- Institutionalize learning: align OKRs with “fail forward,” and be willing to kill low-performing features quickly.</p><p>- Build the triad: pair an easy-to-use platform with training, top-down sponsorship, and clear launch processes.</p><p>- Add real guardrails: track AI infrastructure costs, ethics/compliance, and inclusion metrics alongside growth KPIs.</p><p>- Unblock teams: create a center of excellence for data science and enable rapid variants with 
AI-powered tooling.</p><p>- Start small and visible: rack up quick wins, over-communicate progress, and grow influence through relationships.</p><p><br></p>]]>
      </content:encoded>
      <pubDate>Thu, 02 Apr 2026 13:30:43 -0600</pubDate>
      <author>GrowthBook</author>
      <enclosure url="https://media.transistor.fm/316ada1b/23090b42.mp3" length="36383883" type="audio/mpeg"/>
      <itunes:author>GrowthBook</itunes:author>
      <itunes:image href="https://img.transistorcdn.com/-uQbemIFStvqUb3nBCzSJmqKvgKiJLtEuY1ZXUwJ0co/rs:fill:0:0:1/w:1400/h:1400/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9jOTE3/OTBjYzNlYjM2ZDkw/YTIzMTFmMzM2NTQ2/YjA4NC5wbmc.jpg"/>
      <itunes:duration>2272</itunes:duration>
      <itunes:summary>
        <![CDATA[<p>Summary</p><p>How do you build a culture where nothing ships without evidence—and leaders actually act on the data? Makram Mansour, Head of Marketplace at ID.me and former experimentation leader at LinkedIn and Intuit, shares the systems, mindsets, and guardrails behind “experimenting everywhere.” At LinkedIn, he helped support 10,000+ annual experiments with 2,000 weekly platform users, and he explains the hard-earned lessons (like a 5px UI tweak causing a million-dollar ad loss) that led to a “test before release” mandate. At Intuit, he operationalized “fail forward,” partnering with HR to rewrite OKRs so teams are rewarded for learning, not just launching. Makram breaks down why to shift from MVP to MVT (minimum viable test), how to surface leap-of-faith assumptions with PRFAQs and “unit of one” prototypes, and where AI now unlocks faster, safer front-end testing. He also details critical guardrails—cost visibility for AI infrastructure, ethical and inclusion metrics, and the people-process-technology triad—plus practical ways to remove bottlenecks via a center of excellence. 
If you’re starting from scratch or scaling your program, you’ll learn how to personalize responsibly at the top of the funnel, define your North Star and signposts, and stack early wins while building influence across the org.</p><p><br></p><p>Timestamps</p><p>[00:45] – Makram’s path: running experimentation at LinkedIn and Intuit, and why nothing ships without an A/B test</p><p>[02:15] – Costly lessons: 5px banner change, algorithm tweaks, and the case for rigorous guardrails</p><p>[06:40] – Leadership discipline: killing features (voice meetups, LinkedIn Stories) and changing OKRs to reward learning</p><p>[11:05] – People, process, technology: top-down and bottom-up tracks, and embedding “fail forward”</p><p>[13:40] – From MVP to MVT: validating leap-of-faith assumptions, PRFAQ, and rapid “unit of one” prototypes</p><p>[15:55] – Bottlenecks and unlocks: engineering/data science capacity, centers of excellence, and AI for fast front-end tests</p><p>[22:45] – Personalization at the top of funnel: avoid waste, design reviews, and right-size testing before building</p><p>[25:45] – Guardrail metrics that matter: AI infra costs, ethics/compliance, and fairness-by-design</p><p>[29:45] – ID.me now: zero-to-one builds, vision-to-values, North Star and leading indicators</p><p>[33:30] – How to start at a new org: crawl-walk-run, small wins, relationships, and over-communication</p><p><br></p><p>Takeaways</p><p>- Shift from MVP to MVT: list leap-of-faith assumptions and design minimum viable tests before you build.</p><p>- Institutionalize learning: align OKRs with “fail forward,” and be willing to kill low-performing features quickly.</p><p>- Build the triad: pair an easy-to-use platform with training, top-down sponsorship, and clear launch processes.</p><p>- Add real guardrails: track AI infrastructure costs, ethics/compliance, and inclusion metrics alongside growth KPIs.</p><p>- Unblock teams: create a center of excellence for data science and enable rapid variants with 
AI-powered tooling.</p><p>- Start small and visible: rack up quick wins, over-communicate progress, and grow influence through relationships.</p><p><br></p>]]>
      </itunes:summary>
      <itunes:keywords>experimentation, A/B-testing, feature-flagging, product-experimentation, experimentation-culture, data-driven-decisions, product-development, experimentation-platform, North-Star-metrics, guardrail-metrics, hypothesis-testing, fail-forward, minimum-viable-test, personalization, customer-driven-innovation, experimentation-at-scale, LinkedIn, Intuit, ID.me, leadership-buy-in, center-of-excellence, experiment-velocity, AI-experimentation, infrastructure-cost, ROI-measurement, product-management, growth-teams, MVT, leap-of-faith-assumptions, experimentation-maturity, crawl-walk-run, experiment-design, KPIs, HIPPO-effect, network-effects, marketplace-experimentation, sphere-of-influence, change-management, people-process-technology</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <podcast:transcript url="https://share.transistor.fm/s/316ada1b/transcript.txt" type="text/plain"/>
    </item>
    <item>
      <title>Shipping Faster, Safely: Truist’s SVP on AI, Developer Experience, and Human-in-the-Loop Banking</title>
      <itunes:episode>5</itunes:episode>
      <podcast:episode>5</podcast:episode>
      <itunes:title>Shipping Faster, Safely: Truist’s SVP on AI, Developer Experience, and Human-in-the-Loop Banking</itunes:title>
      <itunes:episodeType>full</itunes:episodeType>
      <guid isPermaLink="false">91320eba-3686-4b4f-b8ae-af386d6ae881</guid>
      <link>https://share.transistor.fm/s/e0e5d644</link>
      <description>
        <![CDATA[<p>How do you boost developer velocity in a highly regulated industry—without sacrificing safety or customer trust? Charles Williams, Senior Vice President and Software Engineering Director at Truist (formed from the BB&amp;T and SunTrust merger), shares how his team elevates developer experience to ship faster and more reliably. Charles breaks down shifting quality “left” with automation, measuring success with both DORA metrics and developer sentiment, and why human-in-the-loop is non-negotiable for AI in finance. He details Truist’s governance model—steering committees, enterprise architecture, and clear guardrails—to avoid tool sprawl while building a purpose-built AI ecosystem: Microsoft Copilot for productivity, GitLab’s AI-enabled DevSecOps platform for engineering, and separate consumer-facing capabilities. Expect practical insights on starting with low-risk, high-yield use cases (unit tests, docs, security triage), tracking AI utilization, and upskilling teams in prompt engineering so developers can “manage” AI agents effectively. Charles also explores the path to personalized experiences balanced with privacy, why branches should be enhanced—not reduced—by AI, and the cultural skills leaders need now: empathy, neurodiversity awareness, and change management. 
He closes with where AI is driving ROI first—developer onboarding and pipeline productivity—with code quality gains following close behind.</p><p><br></p><p>Timestamps</p><p>[00:02] – Truist overview and Charles’s mandate: improving developer experience at scale</p><p>[00:56] – AI as a strategic priority; shifting quality left with automation to remove bottlenecks</p><p>[02:19] – Measuring success: DORA metrics plus sentiment—eliminating toil to drive happiness</p><p>[04:33] – Human-in-the-loop AI for high-stakes finance; customer and internal use cases</p><p>[07:25] – How Truist evaluates tools: personas, pain points, and starting with tests, docs, security</p><p>[08:35] – The stack: Microsoft Copilot, GitLab’s AI gateway approach, and tracking utilization</p><p>[10:36] – New skills and culture: prompt engineering, “managing” AI agents, and strong governance</p><p>[20:45] – What’s next: personalization vs privacy, fintech agility + bank stability, and where AI pays off now</p><p><br></p><p>Takeaways</p><p>- Shift quality left with automated checks so developers catch issues early without human gatekeeping.</p><p>- Measure DORA metrics and developer sentiment; remove mundane toil to increase speed and satisfaction.</p><p>- Keep humans in the loop for AI-assisted coding and customer answers—trust but verify in regulated contexts.</p><p>- Build an AI ecosystem with clear purposes (productivity, engineering, consumer) and a steering committee to avoid duplication.</p><p>- Start with low-risk, high-yield AI use cases—unit tests, documentation, and security triage—to build confidence and momentum.</p><p>- Upskill teams in prompt engineering and AI oversight so developers can effectively direct and review AI “agents.”</p><p><br></p>]]>
      </description>
      <content:encoded>
        <![CDATA[<p>How do you boost developer velocity in a highly regulated industry—without sacrificing safety or customer trust? Charles Williams, Senior Vice President and Software Engineering Director at Truist (formed from the BB&amp;T and SunTrust merger), shares how his team elevates developer experience to ship faster and more reliably. Charles breaks down shifting quality “left” with automation, measuring success with both DORA metrics and developer sentiment, and why human-in-the-loop is non-negotiable for AI in finance. He details Truist’s governance model—steering committees, enterprise architecture, and clear guardrails—to avoid tool sprawl while building a purpose-built AI ecosystem: Microsoft Copilot for productivity, GitLab’s AI-enabled DevSecOps platform for engineering, and separate consumer-facing capabilities. Expect practical insights on starting with low-risk, high-yield use cases (unit tests, docs, security triage), tracking AI utilization, and upskilling teams in prompt engineering so developers can “manage” AI agents effectively. Charles also explores the path to personalized experiences balanced with privacy, why branches should be enhanced—not reduced—by AI, and the cultural skills leaders need now: empathy, neurodiversity awareness, and change management. 
He closes with where AI is driving ROI first—developer onboarding and pipeline productivity—with code quality gains following close behind.</p><p><br></p><p>Timestamps</p><p>[00:02] – Truist overview and Charles’s mandate: improving developer experience at scale</p><p>[00:56] – AI as a strategic priority; shifting quality left with automation to remove bottlenecks</p><p>[02:19] – Measuring success: DORA metrics plus sentiment—eliminating toil to drive happiness</p><p>[04:33] – Human-in-the-loop AI for high-stakes finance; customer and internal use cases</p><p>[07:25] – How Truist evaluates tools: personas, pain points, and starting with tests, docs, security</p><p>[08:35] – The stack: Microsoft Copilot, GitLab’s AI gateway approach, and tracking utilization</p><p>[10:36] – New skills and culture: prompt engineering, “managing” AI agents, and strong governance</p><p>[20:45] – What’s next: personalization vs privacy, fintech agility + bank stability, and where AI pays off now</p><p><br></p><p>Takeaways</p><p>- Shift quality left with automated checks so developers catch issues early without human gatekeeping.</p><p>- Measure DORA metrics and developer sentiment; remove mundane toil to increase speed and satisfaction.</p><p>- Keep humans in the loop for AI-assisted coding and customer answers—trust but verify in regulated contexts.</p><p>- Build an AI ecosystem with clear purposes (productivity, engineering, consumer) and a steering committee to avoid duplication.</p><p>- Start with low-risk, high-yield AI use cases—unit tests, documentation, and security triage—to build confidence and momentum.</p><p>- Upskill teams in prompt engineering and AI oversight so developers can effectively direct and review AI “agents.”</p><p><br></p>]]>
      </content:encoded>
      <pubDate>Thu, 02 Apr 2026 13:24:48 -0600</pubDate>
      <author>GrowthBook</author>
      <enclosure url="https://media.transistor.fm/e0e5d644/084ba601.mp3" length="26898727" type="audio/mpeg"/>
      <itunes:author>GrowthBook</itunes:author>
      <itunes:image href="https://img.transistorcdn.com/HEpKLq6MvVN6TCZylcbOgld5XS28KXeB3545kRs-JwE/rs:fill:0:0:1/w:1400/h:1400/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9jNTQx/Y2Q0YWM4NTEzZjBk/YmI0N2RiMDhjYjMy/YjYxMy5qcGc.jpg"/>
      <itunes:duration>1679</itunes:duration>
      <itunes:summary>
        <![CDATA[<p>How do you boost developer velocity in a highly regulated industry—without sacrificing safety or customer trust? Charles Williams, Senior Vice President and Software Engineering Director at Truist (formed from the BB&amp;T and SunTrust merger), shares how his team elevates developer experience to ship faster and more reliably. Charles breaks down shifting quality “left” with automation, measuring success with both DORA metrics and developer sentiment, and why human-in-the-loop is non-negotiable for AI in finance. He details Truist’s governance model—steering committees, enterprise architecture, and clear guardrails—to avoid tool sprawl while building a purpose-built AI ecosystem: Microsoft Copilot for productivity, GitLab’s AI-enabled DevSecOps platform for engineering, and separate consumer-facing capabilities. Expect practical insights on starting with low-risk, high-yield use cases (unit tests, docs, security triage), tracking AI utilization, and upskilling teams in prompt engineering so developers can “manage” AI agents effectively. Charles also explores the path to personalized experiences balanced with privacy, why branches should be enhanced—not reduced—by AI, and the cultural skills leaders need now: empathy, neurodiversity awareness, and change management. 
He closes with where AI is driving ROI first—developer onboarding and pipeline productivity—with code quality gains following close behind.</p><p><br></p><p>Timestamps</p><p>[00:02] – Truist overview and Charles’s mandate: improving developer experience at scale</p><p>[00:56] – AI as a strategic priority; shifting quality left with automation to remove bottlenecks</p><p>[02:19] – Measuring success: DORA metrics plus sentiment—eliminating toil to drive happiness</p><p>[04:33] – Human-in-the-loop AI for high-stakes finance; customer and internal use cases</p><p>[07:25] – How Truist evaluates tools: personas, pain points, and starting with tests, docs, security</p><p>[08:35] – The stack: Microsoft Copilot, GitLab’s AI gateway approach, and tracking utilization</p><p>[10:36] – New skills and culture: prompt engineering, “managing” AI agents, and strong governance</p><p>[20:45] – What’s next: personalization vs privacy, fintech agility + bank stability, and where AI pays off now</p><p><br></p><p>Takeaways</p><p>- Shift quality left with automated checks so developers catch issues early without human gatekeeping.</p><p>- Measure DORA metrics and developer sentiment; remove mundane toil to increase speed and satisfaction.</p><p>- Keep humans in the loop for AI-assisted coding and customer answers—trust but verify in regulated contexts.</p><p>- Build an AI ecosystem with clear purposes (productivity, engineering, consumer) and a steering committee to avoid duplication.</p><p>- Start with low-risk, high-yield AI use cases—unit tests, documentation, and security triage—to build confidence and momentum.</p><p>- Upskill teams in prompt engineering and AI oversight so developers can effectively direct and review AI “agents.”</p><p><br></p>]]>
      </itunes:summary>
      <itunes:keywords>experimentation, A/B-testing, feature-flagging, product-experimentation, experimentation-culture, data-driven-decisions, product-development, developer-experience, AI-in-development, enterprise-AI, code-quality, developer-productivity, shift-left-testing, CI-CD-pipelines, fintech-innovation, banking-technology, AI-governance, human-in-the-loop, change-management, experimentation-strategy, AI-in-financial-services, developer-onboarding, prompt-engineering, AI-adoption, GitLab, Truist, DORA-metrics, customer-experience, machine-learning, neurodiversity</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <podcast:transcript url="https://share.transistor.fm/s/e0e5d644/transcript.txt" type="text/plain"/>
    </item>
    <item>
      <title>From Chatbots to Open‑World Agents: Microsoft’s Marco Casalaina on Evals, Go‑Live Metrics, and Copilot Velocity</title>
      <itunes:episode>4</itunes:episode>
      <podcast:episode>4</podcast:episode>
      <itunes:title>From Chatbots to Open‑World Agents: Microsoft’s Marco Casalaina on Evals, Go‑Live Metrics, and Copilot Velocity</itunes:title>
      <itunes:episodeType>full</itunes:episodeType>
      <guid isPermaLink="false">d7943904-0cd0-4d98-bdd4-57026366eb7e</guid>
      <link>https://share.transistor.fm/s/cb699bef</link>
      <description>
        <![CDATA[<p>AI is moving so fast that what “good” looked like a few months ago is already outdated. </p><p>So how do you measure value, ship safely, and scale what works? Marco Casalaina, VP of Products, Core AI and AI Futurist at Microsoft, joins to unpack how his team builds and evaluates next‑gen AI—at hyperspeed. </p><p>Marco leads the AI Futures team and previously led Azure OpenAI, Azure Cognitive Services, Responsible AI, and AI Studio; before Microsoft, he ran Salesforce Einstein. </p><p>He explains why enterprise value is best measured by go‑lives and real usage, how Microsoft’s Foundry equips developers with agent‑specific evals (tool call accuracy, task adherence), and why old metrics like “accepted completions” don’t fit modern dev loops. </p><p>We dig into model routing now productized across model families, orchestration frameworks and the Copilot SDK, shared memory experiments, and the rise of self‑verifying agents that iterate to defined thresholds. </p><p>Expect concrete examples—from rewriting docs for coding agents to Ralph loops with browser testing—and practical advice for leaders: major in evals, set acceptable error rates by use case, and get hands‑on with the tools daily.</p><p><br></p><p><strong>Timestamps</strong></p><p>[00:45] – Guest intro and Microsoft’s enterprise AI focus</p><p>[02:07] – Measuring value: go‑lives, telemetry thresholds, and token volume</p><p>[03:43] – From chatbots to agents: Foundry evals (tool calls, task adherence) and A/B testing in Microsoft 365 Copilot</p><p>[06:17] – When “good” changes monthly: model routing productized across model families</p><p>[07:23] – Orchestration and Copilot SDK: agents that create their own tools; OpenClaw and shared memory experiments</p><p>[11:45] – Engagement redefined: coding agents read your docs; writing for agents vs. 
humans</p><p>[14:55] – New dev loops: why accepted completions died; Ralph loops and self‑verifying builds</p><p>[17:06] – Evals in practice and guardrails: thresholds, non‑determinism, and out‑of‑domain tests; how to keep up without burning out</p><p><br></p><p><strong>Takeaways</strong></p><p>- Measure value by go‑lives and real usage (token volume), not time in portals or playgrounds.</p><p>- Evolve evals for agents: track tool call accuracy/success and task completion/adherence; A/B test models and strategies.</p><p>- Productize adaptability with model routing to match tasks to the right model family as capabilities shift.</p><p>- Build self‑verification into workflows: pair agents with automated testing (e.g., browser runners) and iterate to thresholds, not perfection.</p><p>- Write for agents as readers: tighten documentation, ship vetted code samples, and monitor bot traffic patterns.</p><p>- Guardrail open‑world agents: add out‑of‑domain evals and explicit capability limits; set acceptable error rates based on the stakes of your use case.</p>]]>
      </description>
      <content:encoded>
        <![CDATA[<p>AI is moving so fast that what “good” looked like a few months ago is already outdated. </p><p>So how do you measure value, ship safely, and scale what works? Marco Casalaina, VP of Products, Core AI and AI Futurist at Microsoft, joins to unpack how his team builds and evaluates next‑gen AI—at hyperspeed. </p><p>Marco leads the AI Futures team and previously led Azure OpenAI, Azure Cognitive Services, Responsible AI, and AI Studio; before Microsoft, he ran Salesforce Einstein. </p><p>He explains why enterprise value is best measured by go‑lives and real usage, how Microsoft’s Foundry equips developers with agent‑specific evals (tool call accuracy, task adherence), and why old metrics like “accepted completions” don’t fit modern dev loops. </p><p>We dig into model routing now productized across model families, orchestration frameworks and the Copilot SDK, shared memory experiments, and the rise of self‑verifying agents that iterate to defined thresholds. </p><p>Expect concrete examples—from rewriting docs for coding agents to Ralph loops with browser testing—and practical advice for leaders: major in evals, set acceptable error rates by use case, and get hands‑on with the tools daily.</p><p><br></p><p><strong>Timestamps</strong></p><p>[00:45] – Guest intro and Microsoft’s enterprise AI focus</p><p>[02:07] – Measuring value: go‑lives, telemetry thresholds, and token volume</p><p>[03:43] – From chatbots to agents: Foundry evals (tool calls, task adherence) and A/B testing in Microsoft 365 Copilot</p><p>[06:17] – When “good” changes monthly: model routing productized across model families</p><p>[07:23] – Orchestration and Copilot SDK: agents that create their own tools; OpenClaw and shared memory experiments</p><p>[11:45] – Engagement redefined: coding agents read your docs; writing for agents vs. 
humans</p><p>[14:55] – New dev loops: why accepted completions died; Ralph loops and self‑verifying builds</p><p>[17:06] – Evals in practice and guardrails: thresholds, non‑determinism, and out‑of‑domain tests; how to keep up without burning out</p><p><br></p><p><strong>Takeaways</strong></p><p>- Measure value by go‑lives and real usage (token volume), not time in portals or playgrounds.</p><p>- Evolve evals for agents: track tool call accuracy/success and task completion/adherence; A/B test models and strategies.</p><p>- Productize adaptability with model routing to match tasks to the right model family as capabilities shift.</p><p>- Build self‑verification into workflows: pair agents with automated testing (e.g., browser runners) and iterate to thresholds, not perfection.</p><p>- Write for agents as readers: tighten documentation, ship vetted code samples, and monitor bot traffic patterns.</p><p>- Guardrail open‑world agents: add out‑of‑domain evals and explicit capability limits; set acceptable error rates based on the stakes of your use case.</p>]]>
      </content:encoded>
      <pubDate>Tue, 17 Mar 2026 01:00:00 -0600</pubDate>
      <author>Ashley Stirrup</author>
      <enclosure url="https://media.transistor.fm/cb699bef/ce3b7b9b.mp3" length="27294738" type="audio/mpeg"/>
      <itunes:author>Ashley Stirrup</itunes:author>
      <itunes:image href="https://img.transistorcdn.com/9a4onDNsv69nArjMHqcRfRJdZQqkm4fNo5_aSSeEHB4/rs:fill:0:0:1/w:1400/h:1400/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS8yYWJj/ZjUxNGVjMmYyNmY1/ZTE2MmQxN2Y3MTdi/NzEwOS5qcGc.jpg"/>
      <itunes:duration>1706</itunes:duration>
      <itunes:summary>AI is moving so fast that what “good” looked like a few months ago is already outdated. So how do you measure value, ship safely, and scale what works? Marco Casalaina, VP of Products, Core AI and AI Futurist at Microsoft, joins to unpack how his team builds and evaluates next‑gen AI—at hyperspeed. Marco leads the AI Futures team and previously led Azure OpenAI, Azure Cognitive Services, Responsible AI, and AI Studio; before Microsoft, he ran Salesforce Einstein. He explains why enterprise value is best measured by go‑lives and real usage, how Microsoft’s Foundry equips developers with agent‑specific evals (tool call accuracy, task adherence), and why old metrics like “accepted completions” don’t fit modern dev loops. We dig into model routing now productized across model families, orchestration frameworks and the Copilot SDK, shared memory experiments, and the rise of self‑verifying agents that iterate to defined thresholds. Expect concrete examples—from rewriting docs for coding agents to Ralph loops with browser testing—and practical advice for leaders: major in evals, set acceptable error rates by use case, and get hands‑on with the tools daily.

Timestamps
[00:45] – Guest intro and Microsoft’s enterprise AI focus
[02:07] – Measuring value: go‑lives, telemetry thresholds, and token volume
[03:43] – From chatbots to agents: Foundry evals (tool calls, task adherence) and A/B testing in Microsoft 365 Copilot
[06:17] – When “good” changes monthly: model routing productized across model families
[07:23] – Orchestration and Copilot SDK: agents that create their own tools; OpenClaw and shared memory experiments
[11:45] – Engagement redefined: coding agents read your docs; writing for agents vs. humans
[14:55] – New dev loops: why accepted completions died; Ralph loops and self‑verifying builds
[17:06] – Evals in practice and guardrails: thresholds, non‑determinism, and out‑of‑domain tests; how to keep up without burning out

Takeaways
- Measure value by go‑lives and real usage (token volume), not time in portals or playgrounds.
- Evolve evals for agents: track tool call accuracy/success and task completion/adherence; A/B test models and strategies.
- Productize adaptability with model routing to match tasks to the right model family as capabilities shift.
- Build self‑verification into workflows: pair agents with automated testing (e.g., browser runners) and iterate to thresholds, not perfection.
- Write for agents as readers: tighten documentation, ship vetted code samples, and monitor bot traffic patterns.
- Guardrail open‑world agents: add out‑of‑domain evals and explicit capability limits; set acceptable error rates based on the stakes of your use case.</itunes:summary>
      <itunes:subtitle>AI is moving so fast that what “good” looked like a few months ago is already outdated. So how do you measure value, ship safely, and scale what works? Marco Casalaina, VP of Products, Core AI and AI Futurist at Microsoft, joins to unpack how his team bui</itunes:subtitle>
      <itunes:keywords>experimentation, A/B-testing, feature-flagging, product-experimentation, experimentation-culture, data-driven-decisions, product-development, Microsoft, Marco Casalaina, Copilot, AI-agents, model-evaluation, go-live-metrics, task-completion, tool-call-accuracy, groundedness-testing, coherence-evaluation, fluency-metrics, model-routing, Azure-OpenAI, AI-governance, responsible-AI, LLM-evaluation, evaluation-thresholds, token-volume, agentic-workflows, error-tolerance, self-verification, Ralph-loop, eval-metrics, task-adherence, non-deterministic-testing</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <podcast:transcript url="https://share.transistor.fm/s/cb699bef/transcript.txt" type="text/plain"/>
    </item>
    <item>
      <title>Experimentation at Scale: Upwork’s VP of Engineering on Blast Radius, CPQI, and AI-Driven Ops</title>
      <itunes:episode>3</itunes:episode>
      <podcast:episode>3</podcast:episode>
      <itunes:title>Experimentation at Scale: Upwork’s VP of Engineering on Blast Radius, CPQI, and AI-Driven Ops</itunes:title>
      <itunes:episodeType>full</itunes:episodeType>
      <guid isPermaLink="false">1d8c0760-23e3-4439-bed9-82f7481748bb</guid>
      <link>https://share.transistor.fm/s/73141a1f</link>
      <description>
        <![CDATA[<p><strong>Summary</strong></p><p>What do you test rigorously—and what do you ship fast and fix forward—when every change could impact millions? </p><p>Vinoj Kumar, Vice President of Engineering at Upwork, leads at the intersection of infrastructure and product, where feedback loops are longer and the blast radius is wider. </p><p>He shares a pragmatic framework for experimentation—blast radius x reversibility—that sets testing rigor, plus how he measures success in product terms: faster search, resilient marketplace trust, and developer velocity. Vinoj explains why “high engagement” can mask low-quality experiences, how his team instrumented an internal NL chatbot with turns-to-success and downstream signals (like fewer JIRA tickets), and how a composite metric—cost per quality inference (CPQI)—aligns finance, engineering, and data science by uniting cloud costs, performance, and model accuracy. </p><p>He details where AI is already paying off (build pipelines, incident detection, testing), how to monitor model drift post-launch, and why some wins on paper must be killed in production to protect trust—like a high-hit-rate caching project that surfaced stale profile data. </p><p>Expect concrete practices: shadow traffic, slow canaries, synthetic staging that mirrors reality, feature flags, LLMs-as-judges, and the mindset to tie infrastructure to business outcomes.</p><p><strong>Timestamps</strong></p><p>[00:45] – Guest intro: Infrastructure meets product—and why experimentation looks different</p><p>[01:36] – Deciding what to test: blast radius x reversibility; canaries, shadow traffic, ship-and-monitor</p><p>[03:09] – Defining “good”: internal dev metrics vs. 
marketplace outcomes—and when engagement lies</p><p>[06:23] – Case study: “Talk to Data” chatbot—thumbs, turns-to-success, and reduced JIRA tickets</p><p>[09:45] – CPQI: a composite metric for cost, performance, and model quality that breaks silos</p><p>[16:55] – AI in engineering: build-time gains, MTTR/MTTD, agentic testing, and drift monitoring</p><p>[24:06] – The caching miss: 92% hit rate, stale data, trust risks—and what to do instead</p><p>[29:12] – Career advice: balance stability with bold experiments; always link infra to business value</p><p><br></p><p><strong>Takeaways</strong></p><p>- Decide testing rigor with blast radius x reversibility; reserve heavy testing for irreversible, high-impact systems.</p><p>- Measure quality by efficiency and success ratio—not raw clicks or query counts.</p><p>- Instrument NL tools with “turns to success” and track downstream impact (e.g., fewer ad hoc data tickets).</p><p>- Build composite metrics (e.g., CPQI) to align finance, engineering, and data science around shared outcomes.</p><p>- Use AI to accelerate builds, detect incidents sooner, and evaluate models; watch MTTR and MTTD.</p><p>- Treat ML features as living systems: feature-flag rollouts, realistic staging, drift monitoring, and LLM-as-judge evaluations—and be willing to kill “wins” that erode trust.</p>]]>
      </description>
      <content:encoded>
        <![CDATA[<p><strong>Summary</strong></p><p>What do you test rigorously—and what do you ship fast and fix forward—when every change could impact millions? </p><p>Vinoj Kumar, Vice President of Engineering at Upwork, leads at the intersection of infrastructure and product, where feedback loops are longer and the blast radius is wider. </p><p>He shares a pragmatic framework for experimentation—blast radius x reversibility—that sets testing rigor, plus how he measures success in product terms: faster search, resilient marketplace trust, and developer velocity. Vinoj explains why “high engagement” can mask low-quality experiences, how his team instrumented an internal NL chatbot with turns-to-success and downstream signals (like fewer JIRA tickets), and how a composite metric—cost per quality inference (CPQI)—aligns finance, engineering, and data science by uniting cloud costs, performance, and model accuracy. </p><p>He details where AI is already paying off (build pipelines, incident detection, testing), how to monitor model drift post-launch, and why some wins on paper must be killed in production to protect trust—like a high-hit-rate caching project that surfaced stale profile data. </p><p>Expect concrete practices: shadow traffic, slow canaries, synthetic staging that mirrors reality, feature flags, LLMs-as-judges, and the mindset to tie infrastructure to business outcomes.</p><p><strong>Timestamps</strong></p><p>[00:45] – Guest intro: Infrastructure meets product—and why experimentation looks different</p><p>[01:36] – Deciding what to test: blast radius x reversibility; canaries, shadow traffic, ship-and-monitor</p><p>[03:09] – Defining “good”: internal dev metrics vs. 
marketplace outcomes—and when engagement lies</p><p>[06:23] – Case study: “Talk to Data” chatbot—thumbs, turns-to-success, and reduced JIRA tickets</p><p>[09:45] – CPQI: a composite metric for cost, performance, and model quality that breaks silos</p><p>[16:55] – AI in engineering: build-time gains, MTTR/MTTD, agentic testing, and drift monitoring</p><p>[24:06] – The caching miss: 92% hit rate, stale data, trust risks—and what to do instead</p><p>[29:12] – Career advice: balance stability with bold experiments; always link infra to business value</p><p><br></p><p><strong>Takeaways</strong></p><p>- Decide testing rigor with blast radius x reversibility; reserve heavy testing for irreversible, high-impact systems.</p><p>- Measure quality by efficiency and success ratio—not raw clicks or query counts.</p><p>- Instrument NL tools with “turns to success” and track downstream impact (e.g., fewer ad hoc data tickets).</p><p>- Build composite metrics (e.g., CPQI) to align finance, engineering, and data science around shared outcomes.</p><p>- Use AI to accelerate builds, detect incidents sooner, and evaluate models; watch MTTR and MTTD.</p><p>- Treat ML features as living systems: feature-flag rollouts, realistic staging, drift monitoring, and LLM-as-judge evaluations—and be willing to kill “wins” that erode trust.</p>]]>
      </content:encoded>
      <pubDate>Wed, 11 Mar 2026 01:00:00 -0600</pubDate>
      <author>Ashley Stirrup</author>
      <enclosure url="https://media.transistor.fm/73141a1f/0bc3f76d.mp3" length="29436325" type="audio/mpeg"/>
      <itunes:author>Ashley Stirrup</itunes:author>
      <itunes:image href="https://img.transistorcdn.com/fXyB_SNb_FkVOwrGaNEkppqbSZ5XrZGC3oLDC2u8Rr8/rs:fill:0:0:1/w:1400/h:1400/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS8zZWY3/MzNlZjg2YzRiNDY5/MDg2OTEwMjQ1OTM2/MjIxNC5qcGc.jpg"/>
      <itunes:duration>1840</itunes:duration>
      <itunes:summary>Summary
What do you test rigorously—and what do you ship fast and fix forward—when every change could impact millions? Vinoj Kumar, Vice President of Engineering at Upwork, leads at the intersection of infrastructure and product, where feedback loops are longer and the blast radius is wider. He shares a pragmatic framework for experimentation—blast radius x reversibility—that sets testing rigor, plus how he measures success in product terms: faster search, resilient marketplace trust, and developer velocity. Vinoj explains why “high engagement” can mask low-quality experiences, how his team instrumented an internal NL chatbot with turns-to-success and downstream signals (like fewer JIRA tickets), and how a composite metric—cost per quality inference (CPQI)—aligns finance, engineering, and data science by uniting cloud costs, performance, and model accuracy. He details where AI is already paying off (build pipelines, incident detection, testing), how to monitor model drift post-launch, and why some wins on paper must be killed in production to protect trust—like a high-hit-rate caching project that surfaced stale profile data. Expect concrete practices: shadow traffic, slow canaries, synthetic staging that mirrors reality, feature flags, LLMs-as-judges, and the mindset to tie infrastructure to business outcomes.

Timestamps
[00:45] – Guest intro: Infrastructure meets product—and why experimentation looks different
[01:36] – Deciding what to test: blast radius x reversibility; canaries, shadow traffic, ship-and-monitor
[03:09] – Defining “good”: internal dev metrics vs. marketplace outcomes—and when engagement lies
[06:23] – Case study: “Talk to Data” chatbot—thumbs, turns-to-success, and reduced JIRA tickets
[09:45] – CPQI: a composite metric for cost, performance, and model quality that breaks silos
[16:55] – AI in engineering: build-time gains, MTTR/MTTD, agentic testing, and drift monitoring
[24:06] – The caching miss: 92% hit rate, stale data, trust risks—and what to do instead
[29:12] – Career advice: balance stability with bold experiments; always link infra to business value

Takeaways
- Decide testing rigor with blast radius x reversibility; reserve heavy testing for irreversible, high-impact systems.
- Measure quality by efficiency and success ratio—not raw clicks or query counts.
- Instrument NL tools with “turns to success” and track downstream impact (e.g., fewer ad hoc data tickets).
- Build composite metrics (e.g., CPQI) to align finance, engineering, and data science around shared outcomes.
- Use AI to accelerate builds, detect incidents sooner, and evaluate models; watch MTTR and MTTD.
- Treat ML features as living systems: feature-flag rollouts, realistic staging, drift monitoring, and LLM-as-judge evaluations—and be willing to kill “wins” that erode trust.</itunes:summary>
      <itunes:subtitle>What do you test rigorously—and what do you ship fast and fix forward—when every change could impact millions? Vinoj Kumar, Vice President of Engineering at Upwork, leads at the intersection of infrastructure and product, where feedback loops are l</itunes:subtitle>
      <itunes:keywords>experimentation, A/B-testing, feature-flagging, product-experimentation, experimentation-culture, data-driven-decisions, product-development, Upwork, Vinoj Kumar, blast-radius, CPQI, infrastructure-optimization, platform-engineering, search-relevance, machine-learning-models, cost-per-quality-inference, performance-metrics, developer-experience, marketplace-metrics, freelancer-matching, inference-cost, model-accuracy, latency-optimization, quality-measurement, natural-language-tools, talk-to-data, turns-to-success, friction-measurement, composite-metrics, infrastructure-scaling, AI-agents, build-efficiency</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <podcast:transcript url="https://share.transistor.fm/s/73141a1f/transcript.txt" type="text/plain"/>
    </item>
    <item>
      <title>From Spreadsheets to AI: How Moxie Pest Control Boosted Conversions 5% with Data and Call Coaching</title>
      <itunes:episode>2</itunes:episode>
      <podcast:episode>2</podcast:episode>
      <itunes:title>From Spreadsheets to AI: How Moxie Pest Control Boosted Conversions 5% with Data and Call Coaching</itunes:title>
      <itunes:episodeType>full</itunes:episodeType>
      <guid isPermaLink="false">8bfa7c21-9558-48ed-89b2-1515c3644d0d</guid>
      <link>https://share.transistor.fm/s/366bbe93</link>
      <description>
        <![CDATA[<p>Think pest control isn’t a digital business? Think again.</p><p>Raj Mehta, Vice President of Product and Technology at Moxie Pest Control, outlines how he turned a spreadsheet-run operation into a data-driven engine across 9,000+ daily calls.</p><p>Raj shares how consolidating fragmented systems into a data lake unlocked automation: lead routing shrank from 20–25 minutes to under 30 seconds, and AI-powered call intelligence now scores every sales and retention conversation against a playbook. He breaks down a practical roadmap for traditional businesses: build MVPs, pilot in one branch with a trained feedback team, iterate fast, then scale.</p><p>You’ll hear how he positioned AI as a growth amplifier (not a job cutter), the difference between deterministic automation and LLM use cases, and the measurable impact: a 5% lift in conversion that compounds in a recurring-revenue model. Plus, Raj’s concise advice for leaders bringing AI into operations without breaking trust or momentum.</p><p><br></p><p><strong>Timestamps</strong></p><p>[00:45] – Guest intro: Raj Mehta, Moxie’s tech transformation, and 9,000+ daily calls</p><p>[01:20] – Starting point: 90% of ops in spreadsheets; why a data lake became the foundation</p><p>[02:45] – Automating time-to-lead: 25 minutes to &lt;30 seconds and a 5% conversion lift</p><p>[04:27] – Roadmap design: MVPs, single-branch pilots, and scaling what works</p><p>[06:05] – Culture building: framing AI as growth and upskilling, not headcount cuts</p><p>[07:34] – Two lanes of automation: deterministic scripts vs. LLM-driven workflows</p><p>[08:45] – Call intelligence: scoring every sales/retention call and coaching at scale</p><p>[14:05] – Impact and advice: recurring revenue compounding and Raj’s playbook for getting started</p><p><br></p><p><strong>Takeaways</strong></p><p>- Build a single source of truth (data lake) to power automation and AI reliably.</p><p>- Cut time-to-lead with workflow automation and track the downstream impact on conversion.</p><p>- Pilot in one branch with a trained “feedback team,” iterate, then roll out; don’t scale too soon.</p><p>- Position AI as a growth multiplier; retain and upskill top performers to shape the culture.</p><p>- Separate deterministic automation from LLM use cases; do deep discovery with frontline teams.</p><p>- Use AI call intelligence to score every call against your playbook, surface coaching themes, and save manager time.</p><p><br></p>]]>
      </description>
      <content:encoded>
        <![CDATA[<p>Think pest control isn’t a digital business? Think again.</p><p>Raj Mehta, Vice President of Product and Technology at Moxie Pest Control, outlines how he turned a spreadsheet-run operation into a data-driven engine across 9,000+ daily calls.</p><p>Raj shares how consolidating fragmented systems into a data lake unlocked automation: lead routing shrank from 20–25 minutes to under 30 seconds, and AI-powered call intelligence now scores every sales and retention conversation against a playbook. He breaks down a practical roadmap for traditional businesses: build MVPs, pilot in one branch with a trained feedback team, iterate fast, then scale.</p><p>You’ll hear how he positioned AI as a growth amplifier (not a job cutter), the difference between deterministic automation and LLM use cases, and the measurable impact: a 5% lift in conversion that compounds in a recurring-revenue model. Plus, Raj’s concise advice for leaders bringing AI into operations without breaking trust or momentum.</p><p><br></p><p><strong>Timestamps</strong></p><p>[00:45] – Guest intro: Raj Mehta, Moxie’s tech transformation, and 9,000+ daily calls</p><p>[01:20] – Starting point: 90% of ops in spreadsheets; why a data lake became the foundation</p><p>[02:45] – Automating time-to-lead: 25 minutes to &lt;30 seconds and a 5% conversion lift</p><p>[04:27] – Roadmap design: MVPs, single-branch pilots, and scaling what works</p><p>[06:05] – Culture building: framing AI as growth and upskilling, not headcount cuts</p><p>[07:34] – Two lanes of automation: deterministic scripts vs. LLM-driven workflows</p><p>[08:45] – Call intelligence: scoring every sales/retention call and coaching at scale</p><p>[14:05] – Impact and advice: recurring revenue compounding and Raj’s playbook for getting started</p><p><br></p><p><strong>Takeaways</strong></p><p>- Build a single source of truth (data lake) to power automation and AI reliably.</p><p>- Cut time-to-lead with workflow automation and track the downstream impact on conversion.</p><p>- Pilot in one branch with a trained “feedback team,” iterate, then roll out; don’t scale too soon.</p><p>- Position AI as a growth multiplier; retain and upskill top performers to shape the culture.</p><p>- Separate deterministic automation from LLM use cases; do deep discovery with frontline teams.</p><p>- Use AI call intelligence to score every call against your playbook, surface coaching themes, and save manager time.</p><p><br></p>]]>
      </content:encoded>
      <pubDate>Thu, 05 Mar 2026 00:00:00 -0700</pubDate>
      <author>Ashley Stirrup</author>
      <enclosure url="https://media.transistor.fm/366bbe93/58926313.mp3" length="16394655" type="audio/mpeg"/>
      <itunes:author>Ashley Stirrup</itunes:author>
      <itunes:image href="https://img.transistorcdn.com/2xwSt78q-6Po5bNPd6tH1uLi_4fSqlrSGaQxS3_izcQ/rs:fill:0:0:1/w:1400/h:1400/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS8yYmY4/OTExZjM0ZGYzOTNl/MmZjMDY1MGZkMThk/NzExZC5qcGc.jpg"/>
      <itunes:duration>1025</itunes:duration>
      <itunes:summary>Think pest control isn’t a digital business? Think again. Raj Mehta, Vice President of Product and Technology at Moxie Pest Control, outlines how he turned a spreadsheet-run operation into a data-driven engine across 9,000+ daily calls. Raj shares how consolidating fragmented systems into a data lake unlocked automation: lead routing shrank from 20–25 minutes to under 30 seconds, and AI-powered call intelligence now scores every sales and retention conversation against a playbook. He breaks down a practical roadmap for traditional businesses: build MVPs, pilot in one branch with a trained feedback team, iterate fast, then scale. You’ll hear how he positioned AI as a growth amplifier (not a job cutter), the difference between deterministic automation and LLM use cases, and the measurable impact: a 5% lift in conversion that compounds in a recurring-revenue model. Plus, Raj’s concise advice for leaders bringing AI into operations without breaking trust or momentum.

Timestamps
[00:45] – Guest intro: Raj Mehta, Moxie’s tech transformation, and 9,000+ daily calls
[01:20] – Starting point: 90% of ops in spreadsheets; why a data lake became the foundation
[02:45] – Automating time-to-lead: 25 minutes to &lt;30 seconds and a 5% conversion lift
[04:27] – Roadmap design: MVPs, single-branch pilots, and scaling what works
[06:05] – Culture building: framing AI as growth and upskilling, not headcount cuts
[07:34] – Two lanes of automation: deterministic scripts vs. LLM-driven workflows
[08:45] – Call intelligence: scoring every sales/retention call and coaching at scale
[14:05] – Impact and advice: recurring revenue compounding and Raj’s playbook for getting started

Takeaways
- Build a single source of truth (data lake) to power automation and AI reliably.
- Cut time-to-lead with workflow automation and track the downstream impact on conversion.
- Pilot in one branch with a trained “feedback team,” iterate, then roll out; don’t scale too soon.
- Position AI as a growth multiplier; retain and upskill top performers to shape the culture.
- Separate deterministic automation from LLM use cases; do deep discovery with frontline teams.
- Use AI call intelligence to score every call against your playbook, surface coaching themes, and save manager time.</itunes:summary>
      <itunes:subtitle>Think pest control isn’t a digital business? Think again. Raj Mehta, Vice President of Product and Technology at Moxie Pest Control, outlines how he turned a spreadsheet-run operation into a data-driven engine across 9,000+ daily calls. Raj shares how con</itunes:subtitle>
      <itunes:keywords>experimentation, A/B-testing, feature-flagging, product-experimentation, experimentation-culture, data-driven-decisions, product-development, Moxie Pest Control, Raj Mehta, data-consolidation, call-center-optimization, lead-conversion, automation, call-coaching, AI-powered-insights, NPS-scores, customer-retention, sales-playbook, workflow-automation, MVP-testing, phased-rollout, cultural-buy-in, AI-adoption, operational-efficiency, phone-system-automation, lead-scoring, conversation-analysis, team-enablement, scaling-automation</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <podcast:transcript url="https://share.transistor.fm/s/366bbe93/transcript.txt" type="text/plain"/>
    </item>
    <item>
      <title>Stop Running Experiments, Start Earning Them</title>
      <itunes:episode>1</itunes:episode>
      <podcast:episode>1</podcast:episode>
      <itunes:title>Stop Running Experiments, Start Earning Them</itunes:title>
      <itunes:episodeType>full</itunes:episodeType>
      <guid isPermaLink="false">b6e9183d-a2bb-416f-aa6c-30ba920c78c0</guid>
      <link>https://share.transistor.fm/s/bd2615c2</link>
      <description>
        <![CDATA[<p>If 80% of A/B tests fail, how do you de-risk decisions that touch pricing, product, and brand? Aleksandra (Aleks) Bass, Chief Product &amp; Technology Officer at Typeform, shares how her team “earns the right to A/B test” with medical-grade rigor—moving from literature reviews and user tests to simulated trials before exposing changes to customers. She details Typeform’s repositioning from “forms” to an AI engagement platform—and the pricing and packaging bet behind it: a 15% drop in new business count offset by a 32% increase in ASP and a 25% lift in annual attach. </p><p>Aleks unpacks how Typeform AI acts as a co-pilot that doubled activation and boosted one-day conversion, plus the design shifts (CTA altitude and onboarding) that increased adoption. She also reveals why they moved video features down-tier and what a head-to-head test showed: video interviewers generated 14x more words and 10x fewer skipped questions with comparable completion time. </p><p>Finally, Aleks breaks down the cultural side—eliminating “anti-knowledge,” standardizing experiment design, and creating a cross-functional review that prevents false learnings—along with how her data engineering team evaluates LLMs for quality, latency, and trust.</p><p><br></p><p>Timestamps</p><p>[00:45] – Rethinking experimentation: “earn the right to A/B test” with staged rigor  </p><p>[03:31] – Pricing and packaging shift: from forms to flows, ASP up 32%, annual attach up 25%  </p><p>[05:34] – Typeform AI as a co-pilot: doubling activation and lifting one-day conversion  </p><p>[07:27] – Adoption lessons: elevating AI CTAs and reducing friction to use  </p><p>[10:02] – Behind the scenes: model selection, quality bars, and why MVP can backfire in AI  </p><p>[12:40] – Moving video down-tier: demand signals, cannibalization checks, and net gains  </p><p>[14:52] – Video vs. 
standard forms: 14x more words, 10x fewer skips, similar completion time  </p><p>[20:22] – Building an experimentation culture: process resistance, “anti-knowledge,” and cross-functional review  </p><p>[30:16] – Leader playbook: visibility, empathy, and incentives for rigorous testing</p><p><br></p><p>Takeaways</p><p>- Implement a staged experimentation funnel—discovery, simulation, then customer A/B—to reduce risk.  </p><p>- Use pricing experiments to trade volume for revenue quality; pair higher monthly prices with stronger annual discounts to grow annual attach.  </p><p>- Treat AI as an activation lever: elevate AI-first CTAs and streamline onboarding to boost adoption.  </p><p>- Add video interviewer options to increase response richness (14x more words) while keeping completion rates steady.  </p><p>- Enforce experiment hygiene: change one variable at a time, randomize at the right unit (account vs. user), and run long enough for effect size.  </p><p>- Purge “anti-knowledge” by standardizing design, instituting cross-functional reviews, and only codifying learnings supported by repeatable data.</p><p><br></p>]]>
      </description>
      <content:encoded>
        <![CDATA[<p>If 80% of A/B tests fail, how do you de-risk decisions that touch pricing, product, and brand? Aleksandra (Aleks) Bass, Chief Product &amp; Technology Officer at Typeform, shares how her team “earns the right to A/B test” with medical-grade rigor—moving from literature reviews and user tests to simulated trials before exposing changes to customers. She details Typeform’s repositioning from “forms” to an AI engagement platform—and the pricing and packaging bet behind it: a 15% drop in new business count offset by a 32% increase in ASP and a 25% lift in annual attach. </p><p>Aleks unpacks how Typeform AI acts as a co-pilot that doubled activation and boosted one-day conversion, plus the design shifts (CTA altitude and onboarding) that increased adoption. She also reveals why they moved video features down-tier and what a head-to-head test showed: video interviewers generated 14x more words and 10x fewer skipped questions with comparable completion time. </p><p>Finally, Aleks breaks down the cultural side—eliminating “anti-knowledge,” standardizing experiment design, and creating a cross-functional review that prevents false learnings—along with how her data engineering team evaluates LLMs for quality, latency, and trust.</p><p><br></p><p>Timestamps</p><p>[00:45] – Rethinking experimentation: “earn the right to A/B test” with staged rigor  </p><p>[03:31] – Pricing and packaging shift: from forms to flows, ASP up 32%, annual attach up 25%  </p><p>[05:34] – Typeform AI as a co-pilot: doubling activation and lifting one-day conversion  </p><p>[07:27] – Adoption lessons: elevating AI CTAs and reducing friction to use  </p><p>[10:02] – Behind the scenes: model selection, quality bars, and why MVP can backfire in AI  </p><p>[12:40] – Moving video down-tier: demand signals, cannibalization checks, and net gains  </p><p>[14:52] – Video vs. 
standard forms: 14x more words, 10x fewer skips, similar completion time  </p><p>[20:22] – Building an experimentation culture: process resistance, “anti-knowledge,” and cross-functional review  </p><p>[30:16] – Leader playbook: visibility, empathy, and incentives for rigorous testing</p><p><br></p><p>Takeaways</p><p>- Implement a staged experimentation funnel—discovery, simulation, then customer A/B—to reduce risk.  </p><p>- Use pricing experiments to trade volume for revenue quality; pair higher monthly prices with stronger annual discounts to grow annual attach.  </p><p>- Treat AI as an activation lever: elevate AI-first CTAs and streamline onboarding to boost adoption.  </p><p>- Add video interviewer options to increase response richness (14x more words) while keeping completion rates steady.  </p><p>- Enforce experiment hygiene: change one variable at a time, randomize at the right unit (account vs. user), and run long enough for effect size.  </p><p>- Purge “anti-knowledge” by standardizing design, instituting cross-functional reviews, and only codifying learnings supported by repeatable data.</p><p><br></p>]]>
      </content:encoded>
      <pubDate>Tue, 24 Feb 2026 11:03:37 -0700</pubDate>
      <author>Ashley Stirrup</author>
      <enclosure url="https://media.transistor.fm/bd2615c2/3ee4fe19.mp3" length="30641157" type="audio/mpeg"/>
      <itunes:author>Ashley Stirrup</itunes:author>
      <itunes:image href="https://img.transistorcdn.com/UZhp9goFTvSyc1XTO5WXWcFsjet5PU9wOU_jC80Wh9A/rs:fill:0:0:1/w:1400/h:1400/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS80ZmE4/OGM2MDAyNDU4MjJl/NDEwNTgzN2RmNTM1/MDJkOS5qcGc.jpg"/>
      <itunes:duration>1915</itunes:duration>
      <itunes:summary>If 80% of A/B tests fail, how do you de-risk decisions that touch pricing, product, and brand? Aleksandra (Aleks) Bass, Chief Product &amp; Technology Officer at Typeform, shares how her team “earns the right to A/B test” with medical-grade rigor, moving from literature reviews and user tests to simulated trials before exposing changes to customers. She details Typeform’s repositioning from “forms” to an AI engagement platform, and the pricing and packaging bet behind it: a 15% drop in new business count offset by a 32% increase in ASP and a 25% lift in annual attach. Aleks unpacks how Typeform AI acts as a co-pilot that doubled activation and boosted one-day conversion, plus the design shifts (CTA altitude and onboarding) that increased adoption. She also reveals why they moved video features down-tier and what a head-to-head test showed: video interviewers generated 14x more words and 10x fewer skipped questions with comparable completion time. Finally, Aleks breaks down the cultural side (eliminating “anti-knowledge,” standardizing experiment design, and creating a cross-functional review that prevents false learnings), along with how her data engineering team evaluates LLMs for quality, latency, and trust.

Timestamps
[00:45] – Rethinking experimentation: “earn the right to A/B test” with staged rigor
[03:31] – Pricing and packaging shift: from forms to flows, ASP up 32%, annual attach up 25%
[05:34] – Typeform AI as a co-pilot: doubling activation and lifting one-day conversion
[07:27] – Adoption lessons: elevating AI CTAs and reducing friction to use
[10:02] – Behind the scenes: model selection, quality bars, and why MVP can backfire in AI
[12:40] – Moving video down-tier: demand signals, cannibalization checks, and net gains
[14:52] – Video vs. standard forms: 14x more words, 10x fewer skips, similar completion time
[20:22] – Building an experimentation culture: process resistance, “anti-knowledge,” and cross-functional review
[30:16] – Leader playbook: visibility, empathy, and incentives for rigorous testing

Takeaways
- Implement a staged experimentation funnel (discovery, simulation, then customer A/B) to reduce risk.
- Use pricing experiments to trade volume for revenue quality; pair higher monthly prices with stronger annual discounts to grow annual attach.
- Treat AI as an activation lever: elevate AI-first CTAs and streamline onboarding to boost adoption.
- Add video interviewer options to increase response richness (14x more words) while keeping completion rates steady.
- Enforce experiment hygiene: change one variable at a time, randomize at the right unit (account vs. user), and run long enough for effect size.
- Purge “anti-knowledge” by standardizing design, instituting cross-functional reviews, and only codifying learnings supported by repeatable data.</itunes:summary>
      <itunes:subtitle>If 80% of A/B tests fail, how do you de-risk decisions that touch pricing, product, and brand? Aleksandra (Aleks) Bass, Chief Product &amp; Technology Officer at Typeform, shares how her team “earns the right to A/B test” with medical-grade rigor, moving f</itunes:subtitle>
      <itunes:keywords>experimentation, A/B-testing, feature-flagging, product-experimentation, experimentation-culture, data-driven-decisions, product-development, Typeform, Aleks Bass, AI-form-generation, survey-data, pricing-strategy, activation-metrics, statistical-rigor, hypothesis-testing, research-methodology, medical-experimentation-approach, user-testing, conversion-rate-optimization, annual-attach-rate, AI-adoption, form-completion, customer-satisfaction, response-quality, interview-experience, video-forms, accessibility-testing, experimentation-standards, cross-functional-teams, anti-knowledge, belief-validation, humility-in-data</itunes:keywords>
      <itunes:explicit>No</itunes:explicit>
      <podcast:transcript url="https://share.transistor.fm/s/bd2615c2/transcript.txt" type="text/plain"/>
    </item>
  </channel>
</rss>
