✨ Heartcore Insights
Edition #135
Hi there,
Welcome to the 135th edition of Heartcore Insights, curated with 🖤 by the Heartcore Team.
If you missed the past newsletters, you can catch up here. Now, let’s dive in!
Reinforcement Learning Is Rewriting AI, and Europe Has a Seat at the Table
For most of the past half-decade, the story of AI progress was simple: make the model bigger, feed it more data, and performance improves. Recently, however, frontier models have been posting smaller benchmark gains despite eye-watering compute budgets, and high-quality training data is running out. As Anthropic CEO Dario Amodei said in 2025: “two years ago, we thought there was this fundamental obstacle around reasoning. Turned out just to be RL.”
Reinforcement learning provides the second scaling axis that pre-training alone couldn’t: it generates its own training signal through feedback rather than consuming ever more human-written data.
A 70-Year-Old Idea Whose Moment Has Arrived
RL isn’t new. Richard Bellman laid its foundations in the 1950s, and Sutton and Barto formalised temporal-difference learning in the 1980s, foundational work that earned them the 2024 ACM Turing Award, announced in 2025 (Sutton also humbled Dwarkesh in a podcast appearance last year).
DeepMind’s 2013 Atari paper reignited it, AlphaGo shocked the world in 2016, and AlphaFold earned the 2024 Nobel Prize in Chemistry. Still, these were domain-specific wins.
The real unlock came when RL collided with language models. Standard language models learn by predicting the next word: sophisticated pattern matching at enormous scale. That works well, but it has a ceiling. You can’t pattern-match your way to solving a hard math problem you’ve never seen before; pre-training by itself teaches imitation, not reasoning.
RL changes the equation fundamentally. At its core, RL is about trial and error: an agent tries things, gets rewarded or penalised at the end of its run, and the behaviour that led to that outcome is weighted up or down accordingly, so the agent gradually learns what works.
Instead of imitating human text, models trained with RL learn to reason toward correct answers, verified against objective outcomes such as code that compiles, proofs that check out, or answers that match a ground truth. This approach, called Reinforcement Learning from Verifiable Rewards (RLVR), generates training signals automatically, with no human annotators needed. The result is less dependence on curated data and less reliance on ever-larger pretrained models. Things get trickier for non-verifiable tasks, where the fallback is typically a judge or reward model built from human (RLHF) or AI (RLAIF) feedback, or similar methods, but the principle remains the same.
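To make the trial-and-error loop concrete, here is a minimal sketch of a gradient bandit, a REINFORCE-style method described in Sutton and Barto’s textbook. The three arms and their payoff probabilities are hypothetical; each pull is a one-step “run” whose 0/1 reward raises the chosen action’s preference if it beat the running average reward and lowers it otherwise. RL on language models applies the same idea to whole generated answers, at vastly larger scale.

```python
import math
import random


def softmax(prefs):
    """Turn action preferences into a probability distribution."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def train_bandit(true_payoffs, steps=5000, lr=0.1, seed=0):
    """Gradient bandit on a hypothetical toy problem.

    Each step the agent samples an action, observes a 0/1 reward at the end
    of that single-step run, and weights the chosen action up if the reward
    beat the running baseline, or down if it fell short.
    """
    rng = random.Random(seed)
    n = len(true_payoffs)
    prefs = [0.0] * n
    baseline = 0.0
    for t in range(1, steps + 1):
        probs = softmax(prefs)
        action = rng.choices(range(n), weights=probs)[0]
        reward = 1.0 if rng.random() < true_payoffs[action] else 0.0
        advantage = reward - baseline            # better or worse than usual?
        baseline += (reward - baseline) / t      # running average of rewards
        for a in range(n):
            # Gradient of log pi(action) w.r.t. prefs[a] for a softmax policy.
            grad = (1.0 if a == action else 0.0) - probs[a]
            prefs[a] += lr * advantage * grad
    return softmax(prefs)


probs = train_bandit([0.2, 0.5, 0.8])
print([round(p, 2) for p in probs])
```

Nobody tells the agent which arm is best; it discovers this purely from rewards, and most of the probability mass ends up on the arm with the highest payoff.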
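A verifiable reward can be as simple as a function that checks a model’s output against a ground truth. The sketch below assumes a hypothetical convention where the model ends with an `Answer: <value>` line, and scores code by the fraction of unit tests a candidate function passes; the function names are illustrative, and production RLVR pipelines use far more robust answer parsing and sandboxed code execution.

```python
import re
from fractions import Fraction
from typing import Any, Callable, Optional, Sequence, Tuple


def extract_final_answer(completion: str) -> Optional[str]:
    """Return the value on the last 'Answer:' line (hypothetical format), if any."""
    matches = re.findall(r"Answer:\s*([^\n]+)", completion)
    return matches[-1].strip() if matches else None


def math_reward(completion: str, ground_truth: str) -> float:
    """Binary reward: 1.0 if the final answer equals the ground truth, else 0.0."""
    answer = extract_final_answer(completion)
    if answer is None:
        return 0.0  # no parseable answer earns nothing
    try:
        # Compare numerically so that "1/2" and "0.5" count as the same answer.
        return 1.0 if Fraction(answer) == Fraction(ground_truth.strip()) else 0.0
    except (ValueError, ZeroDivisionError):
        # Fall back to exact string matching for non-numeric answers.
        return 1.0 if answer == ground_truth.strip() else 0.0


def unit_test_reward(
    candidate: Callable[..., Any],
    test_cases: Sequence[Tuple[tuple, Any]],
) -> float:
    """Fraction of (args, expected) test cases the candidate function passes."""
    passed = 0
    for args, expected in test_cases:
        try:
            if candidate(*args) == expected:
                passed += 1
        except Exception:
            pass  # a crash counts as a failed test
    return passed / len(test_cases)


print(math_reward("Let me think step by step...\nAnswer: 1/2", "0.5"))  # 1.0
print(unit_test_reward(lambda a, b: a + b, [((1, 2), 3), ((2, 2), 5)]))  # 0.5
```

Neither function needs a human in the loop, which is exactly what lets RLVR generate training signal at scale.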
From Research Labs to Enterprise
Europe has the chance to be a leader in the space:
Paris-based Adaptive ML has built a dedicated RLOps platform already deployed by AT&T and SK Telecom.
Mistral’s recent Magistral research confirmed that a moderately sized model fine-tuned with a single phase of RL on the right domain can consistently outperform larger general-purpose models. A 14B-parameter model trained with RL on biological reasoning can outperform GPT-4-class models on those tasks at a fraction of the inference cost.
DeepMind’s RL system cut the energy Google uses to cool its data centres by up to 40%.
London’s InstaDeep (acquired by BioNTech for £562M) identified 12 of 13 WHO-flagged COVID variants two months ahead of official designation.
Wayve raised $1.2B in early 2026 at an $8.6B valuation, training autonomous vehicles with end-to-end RL across 500+ cities.
Mistral AI, which builds RL-based reasoning models, raised €2B. On the emerging side, Isomorphic Labs is applying RLVR to drug discovery, a flood of startups are tackling “RL for code generation” thanks to its tight, verifiable feedback loops, and recommendation systems are being rebuilt around long-horizon reward signals rather than clicks.
And countless more promising early-stage companies across Europe.
Emerging RL Categories Worth Watching
RL for code has already gone mainstream thanks to its tight feedback loops and clear, verifiable objectives, making it one of the cleanest domains for RL to shine. Beyond the obvious use cases, several newer RL applications are gaining serious traction:
Multi-agent RL, systems where multiple agents learn to coordinate in real time, is moving into logistics, energy grids and autonomous fleets.
Scientific discovery is arguably the most exciting frontier. RLVR-trained agents are now automating multi-step research tasks (hypothesis generation, experimental design, data analysis) using computational verification as the reward signal.
Personalisation and recommendation systems are getting an RL refresh too. Rather than optimising for click-through rates, next-generation systems are training agents with long-horizon reward signals (user retention, satisfaction and real-world outcomes), producing qualitatively different behaviour from traditional recommendation algorithms. The next step will be real-world simulation, with the resulting behaviour fed as context into world models.
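The difference between optimising for clicks and optimising for long-horizon outcomes comes down to which rewards you add up. The sketch below uses a standard discounted return with an assumed discount factor of 0.9; the two reward sequences are hypothetical per-session engagement scores, invented purely to show how a strategy that wins on the first click can lose over a longer horizon.

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float = 0.9) -> float:
    """Sum of rewards, where a reward k steps ahead is weighted by gamma**k."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


# Hypothetical engagement per session under two recommendation strategies.
clickbait = [1.0, 0.2, 0.0, 0.0, 0.0]  # big first click, then the user churns
quality = [0.6, 0.6, 0.6, 0.6, 0.6]    # fewer clicks up front, user keeps returning

print("immediate:", clickbait[0], "vs", quality[0])
print("long-horizon:", round(discounted_return(clickbait), 3),
      "vs", round(discounted_return(quality), 3))
```

A click-optimised system prefers the first strategy (1.0 vs 0.6 on the first step), while an agent maximising the discounted return prefers the second, because it values the sessions the user comes back for.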
Bottom line: RL is having a moment (again, or still?), and Europe should be the main stage! As always, if you are building in this space, we’d love to talk.
~ Bodi Tent, Associate, Heartcore Capital
Web4.0, Sigil Wen
Lessons of a First-Time Fund Manager, The Generalist
The AI Productivity Paradox: High Adoption, Low Transformation, Inference by Sequoia Capital
A Primer on Data Centers, Generative Value, Eric Flaningam
We are near the end of the exponential, Dwarkesh Podcast
The Benchmark Partnership, Uncapped #41
🇪🇺 Notable European early-stage rounds
Paraglide, a Sweden-based agentic AI startup focused on accounts receivable, raises $5M with Bessemer Venture Partners - link
Co-reactive, a Germany-based developer of CO2-negative materials tech, raises €6.5M with HTGF - link
Electric Twin, a UK-based developer of synthetic audiences, raises $10M with Atomico/LocalGlobe - link
Capalo AI 🖤, a Finland-based battery storage and renewable asset optimiser, raises €11M with Heartcore Capital/Tesi - link
Stacks, a Netherlands-based agentic platform for enterprise finance, raises $23M with Lightspeed - link
Duna, a Netherlands-based AI-native business identity platform, raises €30M with CapitalG - link
Onodrim Industries, a Netherlands-based defence-tech and supply resiliency startup, raises €40M with Founders Fund/Lakestar - link
🇺🇸 Notable US early-stage rounds
Sapiom, a startup providing financial infrastructure for autonomous AI transactions, raises $15M with Accel - link
Moab, an equipment rental and dealership software developer, raises $16M with Elad Gil - link
Meridian, a provider of an AI-powered workspace for financial modeling, raises $17M with a16z - link
Complyance, a provider of AI agents that automate enterprise governance and compliance, raises $20M with Lightspeed - link
Pasito, a workspace for group health, life and retirement benefits, raises $21M with Insight Partners - link
Lotus Health AI, a developer of AI doctors, raises $35M with CRV/Kleiner Perkins - link
Monaco, a provider of AI agents that automate enterprise sales workflows, raises $35M with Founders Fund - link
🔭 Notable later stage rounds
RobCo, a Germany-based developer of industrial robots, raises $100M with Lightspeed - link
Synthesia, a UK-based B2B AI video generation platform, raises $200M with GV - link
Runway, a US-based AI video startup, raises $315M with General Atlantic - link
ElevenLabs, a UK-based AI voice generation platform for creators and enterprises, raises $500M with Sequoia - link
Cerebras Systems, a US-based developer of wafer-scale AI chips and compute services, raises $1B with Tiger Global - link
Wayve, a UK-based builder of autonomous driving software and AI models, raises $1.2B with Eclipse/Balderton/SoftBank Vision - link
Databricks, a US-based cloud platform to build data and AI systems, raises $5B with Goldman Sachs - link
Waymo, a US-based autonomous robotaxi service operator, raises $16B with Dragoneer/DST/Sequoia - link
Anthropic, a US-based AI lab behind Claude, raises $30B with Dragoneer/Founders Fund/ICONIQ - link
🖤 Heartcore News
Huge congratulations to Capalo AI 🖤 on raising an €11M Series A, led by us with participation from Tesi and others, to scale Europe’s battery storage optimisation layer! ⚡
Kive 🖤 has blown past $6.5M ARR with just 7 people! Open roles here. 🚀
Yannik Schrade, CEO and founder of our Web3 portfolio company Arcium 🖤, appeared on The Tucker Carlson Show to talk about privacy, encryption and the future of computing. 🔥🎙️
Signe is featured in KapitalWatch in a series spotlighting women in venture capital! 🪽
Thank you for being a loyal subscriber to our monthly Insights. Please feel free to share this newsletter with anyone you think would appreciate it!