Bets

Falsifiable beliefs about AI, systems, and leadership - things I believe that I could be wrong about.


These are beliefs I hold that could be wrong. Not obvious truths or safe consensus positions - actual bets where I'm taking a side that reasonable people might disagree with.

I'm writing them down because it's easy to have vague opinions and retrofit explanations later. Specific, falsifiable claims are harder to hide from. If I'm wrong about something here, I'd like to know.

01
AI safety that cannot be operationalized is mostly theater.
The questions are real - alignment, deception, sandbagging. But if your safety approach can't be measured, tested, or deployed, it's philosophy, not engineering. When I built sandbagging detection probes, the point was to get a classifier that actually works - 90%+ accuracy across multiple model families. It turns out you can detect strategic underperformance at the activation level. The discussions matter, but they need to eventually become code that runs in production.
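To make "probe at the activation level" concrete, here's a minimal sketch of the idea - a linear classifier over residual-stream activations, with placeholder arrays standing in for real captured activations; this is an illustration of the technique, not the actual detector:
```python
# Minimal sketch of an activation-level probe. `acts` stands in for an
# (n_samples, d_model) array of residual-stream activations captured at one
# layer; `labels` marks sandbagged vs. honest completions. Both are placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
acts = rng.normal(size=(2000, 4096))      # placeholder activations
labels = rng.integers(0, 2, size=2000)    # placeholder labels

X_train, X_test, y_train, y_test = train_test_split(
    acts, labels, test_size=0.2, random_state=0
)

# A linear probe: if a simple classifier separates the two conditions,
# the "underperforming on purpose" signal is linearly readable from activations.
probe = LogisticRegression(max_iter=1000)
probe.fit(X_train, y_train)
print(f"held-out accuracy: {probe.score(X_test, y_test):.2f}")
```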
02
Memory bandwidth, not compute, is the real constraint in LLM deployment.
Most conversations about LLM infrastructure focus on GPU compute and FLOPS. But when you actually profile inference, the GPU is idle most of the time, waiting for model weights to stream in from memory. On an A100, reading a 7B model's weights from HBM takes ~7ms per decode step; the matrix math for that token takes ~0.1ms. We're memory-bound by nearly two orders of magnitude. This changes which optimizations matter - speculative decoding works precisely because we're paying the memory tax anyway. Understanding this constraint shapes everything I build.
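The arithmetic behind those numbers fits in a few lines, assuming roughly A100-80GB specs (~2 TB/s HBM bandwidth, ~312 TFLOPS dense fp16), fp16 weights, and ~2 FLOPs per parameter per generated token:
```python
# Back-of-envelope roofline for single-token decode on an A100 80GB
# (assumed specs: ~2.0 TB/s HBM bandwidth, ~312 TFLOPS dense fp16).
params = 7e9
bytes_per_param = 2            # fp16 weights
hbm_bw = 2.0e12                # bytes/s
fp16_flops = 312e12            # FLOP/s

weight_bytes = params * bytes_per_param
t_memory = weight_bytes / hbm_bw           # time to stream all weights once
t_compute = (2 * params) / fp16_flops      # ~2 FLOPs per parameter per token

print(f"memory: {t_memory*1e3:.1f} ms, compute: {t_compute*1e3:.2f} ms, "
      f"ratio: {t_memory/t_compute:.0f}x")
# -> roughly 7 ms of memory traffic vs ~0.05-0.1 ms of math per token:
#    decode is memory-bound, which is why batching and speculative decoding
#    pay off (more tokens generated per weight load).
```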
03
The next wave of AI safety will be representation-level tooling, not RLHF.
RLHF was the first serious attempt at alignment, and it worked well enough to ship products. But it's a blunt instrument - you're shaping outputs without understanding what's happening inside the model. The next wave will be tools that operate on internal representations: probing for deception, steering activations, detecting when a model "knows" something it's not saying. My work on activation probing and steering vectors is a bet on this direction.
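As a sketch of what "operating on internal representations" looks like mechanically: add a precomputed steering direction to one layer's hidden states with a PyTorch forward hook. The layer index, scale, and model layout below are illustrative assumptions, not a recipe for any particular model:
```python
# Sketch of activation steering via a forward hook (PyTorch), assuming a
# decoder-only transformer; `steer_dir` is a hypothetical precomputed direction
# (e.g., the mean difference between activations for two behaviours).
import torch

def make_steering_hook(steer_dir: torch.Tensor, alpha: float = 4.0):
    def hook(module, inputs, output):
        # Many decoder blocks return a tuple; the hidden states come first.
        hidden = output[0] if isinstance(output, tuple) else output
        hidden = hidden + alpha * steer_dir.to(hidden.dtype).to(hidden.device)
        return (hidden, *output[1:]) if isinstance(output, tuple) else hidden
    return hook

# Usage sketch (names are illustrative, not a specific model's API):
# layer = model.model.layers[15]
# handle = layer.register_forward_hook(make_steering_hook(steer_dir))
# ... generate ...
# handle.remove()
```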
04
The player-coach model works better than pure management for deep-tech teams.
Conventional wisdom says senior leaders should step back from hands-on work and focus on strategy and people. I think this is wrong for technical organizations. When I stopped writing code, I made worse architectural decisions because I lost touch with real constraints. When I started again, I got better at calling bullshit on unrealistic timelines and knowing when to push versus back off. The right ratio shifts as you grow, but zero is the wrong number.
05
Consent architecture will be as fundamental as security architecture.
Most consent systems today are legal checkboxes. But as AI systems train on user content, act as agents on users' behalf, and make decisions with user data, consent becomes an architectural concern - not a compliance one. You need to track provenance, enforce permissions at runtime, handle revocation gracefully. I've been working on this since OConsent in 2021, and more recently with LLMConsent. Companies that build consent into their foundations will have an advantage when regulations catch up.
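A hypothetical sketch of what "consent as architecture" means in code: a grant that carries scope and provenance, can be revoked, and gets checked at runtime rather than at signup. All names and fields here are illustrative, not OConsent or LLMConsent internals:
```python
# Consent as a runtime artifact rather than a checkbox (illustrative only).
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class ConsentGrant:
    subject_id: str                  # whose data
    purpose: str                     # e.g. "training", "agent_action"
    granted_at: datetime
    revoked_at: datetime | None = None

    def allows(self, purpose: str) -> bool:
        return self.revoked_at is None and self.purpose == purpose

@dataclass
class ConsentLedger:
    grants: list[ConsentGrant] = field(default_factory=list)

    def check(self, subject_id: str, purpose: str) -> bool:
        return any(g.subject_id == subject_id and g.allows(purpose)
                   for g in self.grants)

    def revoke(self, subject_id: str, purpose: str) -> None:
        for g in self.grants:
            if g.subject_id == subject_id and g.purpose == purpose:
                g.revoked_at = datetime.now(timezone.utc)

# ledger.check(user_id, "training") gates the data pipeline, not a signup form.
```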
06
The infrastructure around the model matters more than the model itself.
There's a narrative that the biggest model wins - that differentiation comes from having the best foundation model. I don't think this holds for most enterprise applications. The model is maybe 10% of the system. The other 90% is data pipelines, evaluation infrastructure, serving, monitoring, and operational glue. Most AI projects fail not because the model doesn't work, but because everything around the model doesn't work. This is why I spend more time on evaluation frameworks and data processing than on model architecture.
07
LLM-augmented ETL is the next frontier of enterprise data engineering.
Traditional ETL is brittle - it breaks when schemas change, when source systems evolve, when edge cases multiply. LLMs can handle ambiguity, interpret intent, and adapt to variation in ways that rule-based systems can't. The ETLC framework I worked on is an early version of this: using language models not to replace pipelines, but to make them more robust and self-healing. I expect this pattern to become standard within a few years.
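The pattern, roughly: deterministic mapping first, and only on failure ask a model to propose a mapping for whatever the rules don't cover. This is a sketch of the idea, not the ETLC implementation; `llm_map_fields` is a hypothetical call:
```python
# LLM-augmented ETL sketch: rules handle the known schema, the LLM handles drift.
KNOWN_MAPPING = {"cust_id": "customer_id", "amt": "amount_usd"}

def transform(record: dict, llm_map_fields) -> dict:
    out, unknown = {}, {}
    for key, value in record.items():
        if key in KNOWN_MAPPING:
            out[KNOWN_MAPPING[key]] = value
        else:
            unknown[key] = value          # schema drift: rules don't cover this

    if unknown:
        # The LLM proposes target fields for the unrecognised keys; proposals
        # should be validated and cached, not trusted blindly.
        proposed = llm_map_fields(list(unknown), list(KNOWN_MAPPING.values()))
        for key, target in proposed.items():
            out[target] = unknown[key]
    return out
```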
08
Robots will be bottlenecked by data and motor inference pipelines, not reasoning.
The popular narrative is that once we solve AGI, robotics will follow. I think the constraint is elsewhere. Getting sensor data processed fast enough, running inference at the edge with tight latency budgets, coordinating motor control in real time - these are harder engineering problems than high-level reasoning for most robotic applications. The same memory bandwidth constraints that limit LLM inference will limit embodied AI. The winners will be teams that solve the data and inference pipeline, not the ones waiting for better foundation models.
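A back-of-envelope version of that bottleneck, with assumed numbers: a 100 Hz control loop and a Jetson-class edge accelerator at ~200 GB/s of memory bandwidth; the 1.5B-parameter policy is hypothetical:
```python
# Edge latency budget vs. memory traffic (assumed, illustrative numbers).
control_rate_hz = 100
budget_s = 1.0 / control_rate_hz          # 10 ms per perception/control step

mem_bw = 200e9                            # bytes/s, assumed edge bandwidth
model_bytes = 2 * 1.5e9                   # hypothetical 1.5B-param fp16 policy

t_weights = model_bytes / mem_bw          # time just to stream the weights once
print(f"budget {budget_s*1e3:.0f} ms, weight streaming {t_weights*1e3:.0f} ms")
# -> ~15 ms of memory traffic against a 10 ms budget: the same memory-bandwidth
#    wall from bet 02, before sensor I/O or motor control are even counted.
```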
09
Multi-agent architectures will become the default for enterprise AI.
Single-model approaches hit limits when problems require different types of reasoning, access to different tools, or coordination across domains. The agent frameworks emerging now are clunky, but the underlying pattern - specialized agents collaborating with clear handoffs and fallbacks - will become standard. I've seen this in enterprise deployments: the systems that actually work aren't single brilliant models but orchestrated collections of specialized capabilities. ARTEMIS and CatchMe are bets on this architecture.
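The pattern in miniature: specialized agents behind a router, with an explicit fallback when no specialist can take the task. This is a sketch of the orchestration idea, not of ARTEMIS or CatchMe; the agents are stand-ins:
```python
# Multi-agent orchestration sketch: handoff order plus a last-resort fallback.
from typing import Callable, Optional

Agent = Callable[[str], Optional[str]]   # returns None when it can't handle the task

def sql_agent(task: str) -> Optional[str]:
    return f"[sql] {task}" if "report" in task else None

def retrieval_agent(task: str) -> Optional[str]:
    return f"[retrieval] {task}" if "policy" in task else None

def generalist_agent(task: str) -> Optional[str]:
    return f"[generalist] {task}"        # always answers, used as the fallback

def orchestrate(task: str, specialists: list[Agent], fallback: Agent) -> str:
    for agent in specialists:            # clear handoff order: first capable agent wins
        result = agent(task)
        if result is not None:
            return result
    return fallback(task)

print(orchestrate("summarise the expense report", [sql_agent, retrieval_agent], generalist_agent))
```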
Last updated: January 2026