Fix or Rebuild? — March 20, 2026

Three models, one question

We asked Grok, Gemini, and Claude the same question with the same context: given everything that's broken, should we fix the bot or start over?

  • Grok 3 — Fix It: "Infrastructure is 80% there"
  • Gemini 2.5 Pro — Gut Reno: "Keep plumbing, rebuild brain"
  • Claude Opus 4.6 — Fix It: "Bugs are in ~2 files, not the architecture"
What We Asked

The prompt

All three models received identical context: the bot's architecture, current status, all 5 confirmed logic bugs, what works well, and the dual-instance history.

Shared prompt (abbreviated)
I have a weather prediction market trading bot on Kalshi (real money). I need your honest assessment: should I fix it or start over?

The bot: 225 ensemble members from 6 weather models, bets on 7 US cities, $2 flat bets, 30 lifetime trades, 10W-8L (55.6%), +$3.71 P&L, hasn't traded in 8 days.

5 confirmed logic bugs:

  1. Probability is overconfident (raw ensemble counting, no calibration)
  2. Settlement SQL bind-parameter mismatch (6 values, 7 placeholders)
  3. Settlement uses different bracket logic than trading engine
  4. Uses > instead of >= for threshold contracts
  5. Depth check too strict (requires 5 at ask, only buying 2-3)

What works: Kalshi API auth, daemon loop, risk guardrails, Discord, SQLite logging, clean module structure.

Additional: Ran on 2 machines for 10 days (unknown true P&L), zero tests, dead Python v1 files, possibly wrong Chicago station.

Should I fix it or start over?
Response 01

Grok 3

xAI · grok-3
Fix It

Why Fix It?

Infrastructure is Solid. The core operational components — Kalshi API integration, daemon loop, scheduling, risk guardrails, logging, and notifications — are working well. These are the hardest parts to get right in a trading bot, especially with real money on the line. Rewriting from scratch risks reintroducing subtle bugs in these areas, and you'd burn time rebuilding what already works.

Bugs are Localized. The 5 logic bugs, including the critical probability overconfidence, are concentrated in a small subset of the codebase (~2 files). Fixing these is likely faster than a full rewrite. The settlement issues (SQL mismatch, bracket logic) are straightforward coding errors, not architectural flaws. The probability calibration is a bigger lift but still addressable within the existing framework.

Low Capital at Risk. With a balance of ~$17 and a net P&L of +$3.71 on $40 deployed, the financial stakes are minimal. This is essentially a live test environment. Fixing it now lets you iterate and learn without significant loss, whereas starting over delays getting real-world feedback.

Time to Market. A rewrite could take weeks to months. Fixing the current bot gets you trading again sooner, which is critical for validating whether your edge is real after calibration.

Why Not Start Over?

Foundational Issues Aren't Fatal. The core issue — overconfident probabilities due to raw ensemble counting — is a common problem in weather modeling and can be mitigated with post-processing (e.g., bias correction, variance inflation, or historical calibration). This doesn't require a full rewrite; it's a module-level fix.
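The post-processing Grok describes can be sketched concretely. A minimal variance-inflation example in JavaScript, assuming the ensemble arrives as a plain array of member temperature forecasts; the 1.5 inflation factor is a placeholder, not a fitted value:

```javascript
// Standard normal CDF via the Abramowitz-Stegun erf approximation.
function normalCdf(x) {
  const t = 1 / (1 + 0.2316419 * Math.abs(x));
  const d = 0.3989422804014327 * Math.exp((-x * x) / 2);
  const p =
    d *
    t *
    (0.3193815 +
      t * (-0.3565638 + t * (1.781478 + t * (-1.821256 + t * 1.330274))));
  return x >= 0 ? 1 - p : p;
}

// P(observed temp >= threshold), modeling the ensemble as a Gaussian whose
// spread is inflated to correct for the ensemble's known under-dispersion.
function calibratedProbAbove(members, threshold, inflation = 1.5) {
  const mean = members.reduce((a, b) => a + b, 0) / members.length;
  const variance =
    members.reduce((a, b) => a + (b - mean) ** 2, 0) / members.length;
  const sd = Math.sqrt(variance) * inflation;
  return 1 - normalCdf((threshold - mean) / sd);
}

// Raw counting calls 3-of-5 members above 72° a 60% chance; the inflated
// Gaussian pulls the estimate back toward 50%.
const members = [70.1, 71.4, 72.3, 73.0, 74.2];
console.log(calibratedProbAbove(members, 72).toFixed(3));
```

The point is that this is a module-level change: the ensemble members go in, a less confident probability comes out, and nothing upstream or downstream needs to know.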

Learning Opportunity. Fixing this bot forces you to confront the specific failures and build better practices incrementally. A rewrite might let you avoid these lessons, only to repeat them later.

Cost-Benefit. With only 30 trades and minimal P&L, the bot hasn't proven itself enough to justify the investment of a full rewrite. If you fix it and still see no edge after 100 trades, then you can reassess with better data.

Conditions for Fixing

  1. Fix the Core Bugs First. Prioritize probability calibration and settlement logic. Don't trade until these are resolved.
  2. Add Basic Tests. Write unit tests for the probability engine and settlement logic before deploying fixes. This isn't optional.
  3. Reconcile Trade History. Manually sync your DB with Kalshi's records to establish true P&L. Stop running on multiple machines.
  4. Adjust SD Filter. Recalibrate based on spring climatology or disable temporarily while you fix the probability engine.
  5. Set a Stop-Loss and Review Point. Define a clear limit (e.g., $50 loss or 50 more trades) to reassess. If P&L doesn't improve post-fix, consider a rewrite then.

Bottom Line

Fix it, but treat this as a probationary period. Address the 5 bugs, add minimal testing, reconcile P&L, and recalibrate the SD filter. If after 50-100 more trades you're still breakeven or losing with no clear path to profitability, then start over with a focus on a robust probability engine and test-driven development. You're playing with small money — use this as a learning lab.

Response 02

Gemini 2.5 Pro

Google · gemini-2.5-pro
Gut Renovation

Why Not "Fix It"

The fundamental problem isn't the number of bugs; it's the nature of the bugs. Your bot has a compromised brain and a broken memory.

The Brain is Flawed. The core of any trading bot is its edge. Your bot's edge comes from its probability calculation, but that calculation is fundamentally, conceptually wrong. Raw ensemble counting is notoriously overconfident. This isn't a simple off-by-one error; it's a GIGO (Garbage In, Garbage Out) problem at the heart of the strategy. "Fixing" this requires a complete redesign of the probability model, not just a patch.

The Memory is Corrupt. The bot cannot correctly determine if it won or lost, and it can't even save the result to the database due to the SQL bug. The P&L of +$3.71 is effectively a random number. Running it on two machines means your local SQLite database is permanently out of sync with Kalshi's records. A bot that cannot learn from its past is doomed.

Death by a Thousand Cuts. The combination of > vs >= errors, inconsistent win logic between modules, and flawed liquidity checks points to a deeper issue: a lack of rigor during initial development. Fixing these five bugs will likely uncover five more. You'll be playing whack-a-mole, and with real money on the line, that's dangerous.

What "Start Over" Looks Like: The Ship of Theseus

"Start over" doesn't mean deleting the folder and running npm init. It means you strategically rebuild the core while salvaging the parts that work.

  1. Archive the current code — Create a legacy branch. Push it. Start fresh on main.
  2. Set up a clean project — New directory with package.json, a testing framework (Vitest), and a linter. This is your clean workshop.
  3. Port the plumbing, don't rewrite it — Copy over the proven modules: Kalshi API auth, daemon loop, risk guards, Discord, database schema. This saves 50-70% of the work.
  4. Rebuild the brain with TDD — Start with settlement tests that force correct >= and bracket logic. Then tackle the probability engine: instead of counting members, model the ensemble as a distribution (mean + SD) and use the CDF. Test it before integration.
  5. Reconcile true P&L — Write a one-time script using the Kalshi API to fetch all actual trades and settlements.
  6. Clean up — Delete dead Python files. Verify Chicago station. Fix depth check.
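The settlement rules step 4 wants tests to enforce can be pinned down in a few lines. A sketch with hypothetical function names (the bot's real modules may differ); in the rebuilt project these checks would live in a Vitest spec, but the logic is the same:

```javascript
// A "71° or above" threshold contract settles YES when observed >= strike.
// The boundary case (observed === strike) is exactly the > vs >= bug.
function settleThreshold(observedTemp, strike) {
  return observedTemp >= strike ? 'YES' : 'NO';
}

// A bracket contract like "70-71°" settles YES when low <= observed <= high.
// Both the trading engine and the settlement checker must call this ONE
// function, so the two modules can never disagree on what a win is.
function settleBracket(observedTemp, low, high) {
  return observedTemp >= low && observedTemp <= high ? 'YES' : 'NO';
}

console.log(settleThreshold(71, 71)); // 'YES' with >=; the old > bug said 'NO'
console.log(settleBracket(71, 70, 71)); // 'YES': upper bound is inclusive
```

Writing the boundary assertions first forces the `>=` decision to be made once, explicitly, instead of being re-implemented slightly differently in two files.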

Bottom Line

The choice isn't between a quick fix and a long rewrite. It's between building on a foundation of sand or a foundation of bedrock. The "Fix It" path leads to constant anxiety, wondering what other bugs are lurking. The "Gut Renovation" path leads to a system you can trust.

Given that your money is on the line, trust is everything. Salvage the solid infrastructure, but rebuild the core decision-making and reporting logic from scratch with a rigorous, test-driven approach.

Response 03

Claude Opus 4.6

Anthropic · claude-opus-4-6
Fix It

Why Fix

The bugs are concentrated, not systemic. The 5 confirmed issues live in essentially 2 files (forecast.js probability math + checker.js settlement logic), plus a few one-line fixes in rules.js. The architecture, API integrations, daemon loop, risk management, Discord, and storage are all solid and would be painful to rebuild.

Probability calibration is additive, not a rewrite. The fix is adding a calibration layer on top of the existing ensemble counting — Platt scaling, isotonic regression, or even just modeling the ensemble as a Gaussian distribution and using the CDF. This wraps the existing computeWeightedProbability function; it doesn't replace the architecture.
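The "calibration layer" idea can be illustrated in a few lines. A sketch with a stub standing in for the bot's `computeWeightedProbability` and placeholder Platt coefficients; real values would be fit on trade outcomes:

```javascript
// Stub standing in for the bot's existing raw ensemble count.
function computeWeightedProbability() {
  return 0.7;
}

const logit = (p) => Math.log(p / (1 - p));
const sigmoid = (z) => 1 / (1 + Math.exp(-z));

// Platt scaling: squash the raw probability through a fitted logistic curve.
// With slope a < 1, overconfident probabilities are pulled toward 0.5.
// a = 0.6, b = 0 are illustrative placeholders, not fitted values.
function calibratedProbability(a = 0.6, b = 0) {
  const raw = computeWeightedProbability();
  return sigmoid(a * logit(raw) + b);
}

console.log(calibratedProbability().toFixed(3)); // raw 0.70, pulled toward 0.5
```

Note the shape of the change: the wrapper calls the existing function and transforms its output, which is why this is additive rather than a rewrite.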

The other 4 bugs are trivial. > to >= is one character. The SQL bind mismatch is adding one parameter. The bracket settlement logic is aligning one function. The depth check is changing one comparison. Combined: maybe 20 lines of code.

Starting over means rebuilding the boring stuff. RSA-PSS auth against Kalshi's API, orderbook parsing, fill checking, METAR polling, systemd integration, Discord webhook formatting — all of this works and would take days to rewrite. You'd be spending time on plumbing instead of on the actual problem (probability quality).

The Plan

  1. Fix the 4 trivial bugs — >=, SQL bind, bracket settlement, depth check. One session.
  2. Add tests for probability math and settlement — Before touching the calibration, lock down what "correct" looks like.
  3. Add probability calibration — Use the 171-day backtest data to fit a calibration curve. Even simple Platt scaling (logistic regression on raw probability vs outcome) would close the 70% → 55% gap significantly.
  4. Reconcile with Kalshi — Pull actual order history, determine true P&L.
  5. Trade for a week, then evaluate. If calibrated probabilities still don't match reality after real data, then the approach itself is wrong and a rebuild is justified.
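Step 3, fitting the calibration curve, can be sketched as a two-parameter logistic regression. The training pairs below are synthetic (a bot that says 70% but wins 55% of the time); in practice they would come from the backtest data:

```javascript
const logit = (p) => Math.log(p / (1 - p));
const sigmoid = (z) => 1 / (1 + Math.exp(-z));

// Fit Platt coefficients (a, b) by gradient descent on the log-loss.
// Plain loops are fine for a two-parameter model on a few thousand trades.
function fitPlatt(pairs, steps = 5000, lr = 0.1) {
  let a = 1;
  let b = 0;
  for (let i = 0; i < steps; i++) {
    let gradA = 0;
    let gradB = 0;
    for (const { raw, won } of pairs) {
      const x = logit(raw);
      const err = sigmoid(a * x + b) - (won ? 1 : 0);
      gradA += err * x;
      gradB += err;
    }
    a -= (lr * gradA) / pairs.length;
    b -= (lr * gradB) / pairs.length;
  }
  return { a, b };
}

// Synthetic history: every trade rated 70%, but only 55% actually won.
const pairs = [];
for (let i = 0; i < 1000; i++) {
  pairs.push({ raw: 0.7, won: i % 100 < 55 });
}
const { a, b } = fitPlatt(pairs);

// The fitted curve maps the raw 0.70 down toward the observed 0.55 hit rate.
console.log(sigmoid(a * logit(0.7) + b).toFixed(3));
```

On real data the raw probabilities would vary, so the fitted curve would correct the whole range, not just one point; this degenerate example only shows the mechanics.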

Bottom Line

Fix it. The bugs are real but fixable. The infrastructure is the hard part and it works. A rebuild gives you a cleaner codebase but the same probability problem to solve — just in a new directory. Spend the time on calibration, not scaffolding.

Synthesis

Where they agree

The real split isn't Fix vs. Rebuild

All three models actually agree on more than they disagree. Gemini's "gut renovation" is essentially "fix it but more carefully" — it explicitly says to keep the API auth, daemon, risk guards, and Discord. The real disagreement is about how much ceremony to wrap around the fixes.

  • Keep the infrastructure — All three say the API auth, daemon loop, risk guardrails, Discord, and storage are solid and shouldn't be rewritten.
  • The probability engine is the #1 problem — Raw ensemble counting is overconfident. All three recommend calibration (Platt scaling, CDF modeling, or variance inflation).
  • Add tests before fixing — All three say unit tests for probability math and settlement are non-negotiable.
  • Reconcile with Kalshi — Pull actual trade history from the API. The local DB cannot be trusted.
  • Set a checkpoint — Don't fix and hope forever. After 50-100 more trades, evaluate whether the approach works.
  • Fix the 4 trivial bugs immediately — >=, SQL bind, bracket logic, depth check. No debate needed.
  • Clean up the dead weight — Remove Python v1 files, verify Chicago station, fix .env permissions.

The bottom line from all three: the architecture isn't the problem, the math is. Whether you fix it in place or port it to a clean project, you're solving the same probability calibration challenge. The fastest path to knowing if this strategy works is to fix the known bugs, add calibration, and get it trading again.