Origin
Why Morum AI exists.
GitHub Issue #46188: Pro Max subscriber: quality regression cost me days of work and real money
Filed April 10, 2026. Closed as “stale, not planned.”
What actually happened
I asked an AI model a direct infrastructure question: can my Supabase Micro compute tier handle a bulk-matching pipeline across 5.3M product records? I got a confident, specific answer. Yes, it can. I relied on it. I built days of pipeline execution on top of it.
It was wrong.
Not wrong in a way that was obviously wrong. Wrong in a way that sounded right enough that I didn't question it. The model used the right vocabulary, arrived at a specific conclusion, and delivered it with the tonal markers of expert analysis. The underlying reasoning didn't support the conclusion. Basic capacity planning math (IOPS limits, write amplification across 21 indexes) never got done. The answer was structurally persuasive but diagnostically empty.
When the failure surfaced (pipeline crawling at 25 minutes per 25K-row chunk), the model's response was hand-waving: “Supabase IO is the bottleneck.” No research into what that actually meant. I had to push four separate times in a single session before getting the analysis that should have been the starting point. The research took five minutes once actually performed. It should have been done on the first ask.
The combination of wrong sizing advice, undiagnosed IO bottleneck, and silently skipped pipeline phases cost approximately 4–5 days of execution time.
Why this matters beyond one incident
I'm not a naive user who got tricked by a chatbot.
I spent 24 years in management consulting, including time as a Managing Director at Accenture. I ran application maintenance and support for Bristol-Myers Squibb. I know what production systems look like under stress. I know what good analysis looks like. I know when I'm being given a real answer versus a confident-sounding placeholder.
And it still got me.
The model didn't hallucinate. It didn't generate nonsense. It produced an output that used the right technical vocabulary, arrived at a defensible-sounding conclusion, and delivered it with enough confidence that I acted on it without verifying the math underneath. That's not a hallucination problem. That's a behavioral integrity problem.
If it happened to me, with my background, on a task I understood well enough to eventually catch the failure, it is happening to teams across every industry that has moved past AI experimentation and into operational reliance. The difference is that most of those teams don't file a ticket. They don't catch it. The flawed reasoning propagates into decisions, recommendations, and actions that nobody traces back to a model output that sounded right but wasn't.
The patterns this issue demonstrates
Authority Laundering. The model presented infrastructure sizing conclusions with the formatting and tonal markers of expert analysis the underlying reasoning did not support. It looked like a specialist assessed the architecture. The confidence was manufactured, not earned.
Confidence Persistence.The wrong answer repeated across multiple sessions. I asked multiple times whether the compute tier could handle the workload. The model said yes, repeatedly. New context didn't update the conclusion. The confidence persisted independent of the evidence.
Shallow-Default Behavior. Every response defaulted to surface-level analysis. Depth only appeared after sustained human pressure. The five-minute research that resolved the issue was available from the first ask. The model chose the shallow path until forced off it.
The founding thesis
I built Morum AI because this happened to me. The most dangerous AI failure isn't the wrong answer. It's the answer that sounds right enough that someone acts on it. Hallucination is the failure mode everyone talks about because it's the easiest one to catch. The harder failure is the response that uses the right vocabulary and arrives just confident enough that nobody pauses to check whether the reasoning actually holds.
That's what AI Behavioral Intelligence diagnoses. Not whether the model can do the task. Whether the reasoning holds up under the weight of the decisions being made on top of it.
I built this firm because this exact failure happened to me. Here's the GitHub issue.Here's what it cost. I'm a 24-year management consulting veteran, and a model's confident-sounding but structurally empty analysis cost me almost a week of work on my own project. Now imagine that same pattern running silently through your compliance workflow, your financial analysis pipeline, or your clinical decision support system. Except nobody files a ticket, because nobody notices.