This case study examines the StyleSwipe React Native codebase, where coding agents repeatedly struggled to fix visual bugs in card swiping and collection gameplay. StyleSwipe is a mobile game combining fashion industry simulation with Tinder-style card swiping—players swipe to accept or pass on fashion industry contacts to build their asset collection. Over 3 weeks, agents made 6 attempts to fix a single race condition, spent days chasing follow-up bugs introduced by a major refactor, and left 30+ defensive guard comments warning future agents what not to do.
Executive Summary
Key Finding
Visual bugs are fundamentally different from logic bugs. They exist in timing gaps between async systems, manifest for only 1-2 frames, and often require counter-intuitive fixes (don’t reset values, keep stale data, add delays). Agents trained on logic bugs repeatedly fail at visual bugs because their pattern-matching doesn’t apply.
Evidence Summary
| Evidence Type | Finding |
|---|---|
| Git History | Same bug “fixed” 3 times over 3 weeks (throttle/cooldown) |
| Code Comments | 30+ “re-check latest” guards for race conditions |
| Complexity | 47 synchronization points, 4 timing domains, 6-state machine |
| Success Rate | Visual bugs: 30-40% first-time fix vs. 95% for typing errors |
Why Agents Fail
- Symptom-First Diagnosis - Fix where error appears, not where it originates
- Adding vs. Removing - Agents add guards; successful fixes remove code
- Magic Number Tinkering - Adjust milliseconds until symptoms disappear
- Shotgun Debugging - Change 6+ files hoping one works
- Partial Fix Acknowledgment - Commit knowing it’s incomplete
What Success Looks Like
Successful fixes (30-40% of visual bug attempts) share these traits:
| Successful Pattern | Failed Pattern |
|---|---|
| 1-3 files changed | 4+ files scattered |
| Describes mechanism in commit | Describes symptom |
| Removes code | Adds guards |
| Uses library primitives | Invents custom timing |
| No follow-ups in 7 days | Follow-ups same day |
The Vicious Cycle
Agent doesn't understand system → Makes localized fix
↓
Localized fix breaks something else → Another agent adds guard
↓
Guards accumulate → System becomes harder to understand
↓
Next agent understands even less → Makes even more localized fix
↓
Repeat
Quantified Impact
- Throttle bug: 3 “fixes” across 3 weeks, still broken
- Two-stack refactor: 2,914 lines added, buttons broke 3 hours later
- Guard proliferation: “Re-check latest” appears 30+ times in one file
- Watchdog timers: 4 different 1500ms+ timeouts as band-aids
Part 1: Evidence of Struggle - Git History Patterns
Note: Bracketed tags like [add-sync-logic] indicate the fix strategy employed for each commit, making patterns like escalation or repeated attempts visible.
Pattern A: Same-Day Fix Cycles
Example: “App Top Bar Not Updating” (Week 2)
| Elapsed | Approach | Description |
|---|---|---|
| T+0 | [add-sync-logic] | “fixed app top bar not updating connections and introductions” (+45 lines) |
| T+2h 20m | [full-rewrite] | “fix networking swipes not updating apptopbar’s introductions and connections” (-146 lines, +38 lines) |
What Happened: The first fix only addressed part of the problem. 2.5 hours later, a complete rewrite was needed - the agent deleted 146 lines and rewrote the approach entirely.
Example: State Synchronization Cascade (Week 2, next day)
Six commits in 6 hours, each discovering the previous fix was incomplete:
| Elapsed | Approach | What It Fixed | What It Broke/Missed |
|---|---|---|---|
| T+0 | [data-mapping-fix] | “map oddsForTop to previous oddsForPeek” | Modal rendered when no card |
| T+37m | [error-handling] | “explicit client handling for draw-required errors” | Screen didn’t rehydrate on focus |
| T+1h 24m | [rehydration-fix] | “ensure networking screen rehydrates when returning” | Still had race conditions |
| T+2h 47m | [partial-fix] | “PARTIAL fixes to networking swipe state” | Still using untyped state access |
| T+3h 4m | [type-safety] | “Added typed resource selectors” | Root cause still not fixed |
| T+6h 14m | [root-cause-fix] | “enforce latest-response wins and gate side-effects” | Finally the real fix |
What Happened: Each agent (or the same agent in multiple sessions) fixed symptoms rather than the root cause. The real issue was a race condition where server responses could arrive out of order - but this wasn’t diagnosed until the 6th attempt.
Pattern B: Major Refactor Immediately Breaks Something
Example: Two-Stack System Introduction (Week 3, overnight session)
| Elapsed | Approach | Description |
|---|---|---|
| T+0 | [architectural-refactor] | “networking stack replaced by two-stack system” (+2,914 lines, 14 new files) |
| T+3h | [emergency-fix] | “fix swipe buttons not responding” |
| T+3h 45m | [follow-up-fix] | “Added consistent haptic feedback” |
What Happened: A massive architectural refactor to solve animation issues immediately broke basic button functionality. The agent had to scramble to fix touch handling that worked fine in the old system.
The Following Day: A [late-regression-fix] was still addressing “card overlay persistence and cooldown gating logic” - side effects from the original refactor.
Pattern C: Commits Named “Partial” or “Attempt”
[partial-fix] | partial fixes to networking swipe state issues
[defensive-hardening] | hardening on swipe system
[type-alignment] | fix typing errors, add temporary momentum values
[type-alignment] | fix more typing errors
[type-alignment] | type fixes
The word “partial” and the string of “fix more” commits indicate that agents knew they weren’t fully solving the problem but pushed anyway, hoping subsequent passes would complete the fix.
Part 2: Evidence of Struggle - Defensive Code Comments
The codebase is filled with warnings left by agents for future agents. These comments document hard-won knowledge about what NOT to do.
Category: “Don’t Reset Transforms”
Card Stack Component (ghost cleanup)
// Do not reset ghost transforms here — resetting to 0 while still mounted
// causes a brief flash at center with overlay before unmount. The transforms
// will be reinitialized by runGhostAnimation for the next swipe.
What This Reveals: An agent tried the obvious fix (reset transforms to 0 when done) and it caused a visual flash. The workaround is counter-intuitive: leave stale transform values and let unmounting handle cleanup.
Category: “Watchdog Timers for Deadlocks”
Card Stack Component (deadlock watchdog)
// Safety watchdog: if we remain in 'handoff' but the authoritative top does not
// update within a short window, unlock interactions to avoid deadlock.
// This covers rare cases where the server reports success but the top card
// remains the same (or stale) for longer than expected, leaving the stack
// non-interactive.
What This Reveals: The system can enter a state where the UI is permanently locked. Rather than fixing the root cause, agents added a 1500ms watchdog timer to force-unlock. This is a band-aid, not a cure.
Category: “Race Condition Guards”
Swipe Logic Hook - The phrase “latest check” or “re-check latest” appears 30+ times:
// Check if draw response is latest before applying side-effects
// Re-check latest before counter sync to avoid races
// Re-check latest before UI side-effects
// Secondary guards on fetch state before any updates
// Triple-check before side-effects to prevent mid-flight races
What This Reveals: Agents couldn’t prevent race conditions architecturally, so they added guard checks at every possible point. The code is littered with defensive if (isLatest) checks because agents couldn’t reason about when responses would arrive.
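The scattered guards can be contrasted with a single “latest response wins” primitive, which is what the eventual root-cause fix enforced. The sketch below is illustrative — `createLatestGuard` and the surrounding names are assumptions, not code from the StyleSwipe codebase: each request is issued a token, and stale responses are dropped at one choke point instead of being re-checked before every side-effect.

```typescript
// Hypothetical "latest response wins" helper: one token per request,
// checked once when the response arrives. Names are illustrative.
function createLatestGuard() {
  let latestToken = 0;

  return {
    // Call when a request is issued; returns a token bound to it.
    issue(): number {
      latestToken += 1;
      return latestToken;
    },
    // True only for the most recently issued request.
    isLatest(token: number): boolean {
      return token === latestToken;
    },
  };
}

// Usage sketch: one check replaces the scattered `if (!isLatest) return` guards.
async function drawCard<T>(
  guard: ReturnType<typeof createLatestGuard>,
  fetchDraw: () => Promise<T>
): Promise<T | null> {
  const token = guard.issue();
  const response = await fetchDraw();
  if (!guard.isLatest(token)) return null; // stale response: ignore entirely
  return response;
}
```

Because the check lives in one place, out-of-order server responses become a non-event rather than a race to be guarded at 30+ call sites.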
Category: “Cursor Identity Confusion”
Card Stack Component (card identity tracking)
// Uniqueness is determined by deck cursor, not id. Prefer cursor comparison
// when available; if cursors are absent, this is an error condition.
What This Reveals: Cards can have the same ID across different decks (the same card definition can appear multiple times at different positions). Agents initially used card IDs to track which card was swiped away, causing bugs when the same card appeared again. This comment documents the hard-learned rule: use position-based deck cursors (server-owned indices), not card IDs.
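The rule the comment encodes can be sketched as a small pure function. The shapes and names below (`DeckCard`, `isSameDeckPosition`) are assumptions for illustration, not the codebase’s actual types:

```typescript
// Cards can share an id across deck positions, so identity must use the
// server-owned deck cursor. Absent cursors are treated as an error, per
// the comment above. Names and shapes here are illustrative assumptions.
interface DeckCard {
  id: string;
  deckCursor?: number; // server-owned position; may be absent in error states
}

function isSameDeckPosition(a: DeckCard, b: DeckCard): boolean {
  if (a.deckCursor === undefined || b.deckCursor === undefined) {
    throw new Error("Missing deck cursor: cannot determine card identity");
  }
  return a.deckCursor === b.deckCursor;
}
```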
Category: “Don’t Do This or Everything Breaks”
Swipe Overlay Component (hooks ordering rule)
// All animated styles must be declared before any conditional returns
Swipe Logic Hook (draw loop prevention)
// Avoid repeated draw attempts until we detect introductions > 0
// (introductions are a consumable resource required to draw new cards)
drawBlockedByNoIntrosRef.current = true;
What This Reveals: Agents discovered landmines through trial and error. React hooks ordering rules, infinite retry loops, animation state contamination - each comment represents a bug that was fixed and then documented to prevent regression.
Part 3: Why Visual Bugs Are Harder Than Logic Bugs
The Complexity Metrics
| Metric | Value | Why It Matters |
|---|---|---|
| State machine states | 6 | Each state has different visibility/interaction rules |
| State transitions | 9+ | Must be tested in combination |
| Timing domains | 4 | Redux, Reanimated, setTimeout, requestAnimationFrame |
| Shared values | 6 | ghostX, ghostY, ghostRot, peekScale, peekOffsetY, promote |
| Imperative refs | 7+ | topCardRef, ghostCardRef, peekSnapshotRef, etc. |
| Synchronization points | 47 | Places where systems must coordinate |
| Race condition vectors | 7+ | Documented in code comments |
The Four Timing Domains Problem
Visual bugs often involve timing mismatches between:
- Redux/Server State - Updates when server responds (0-2000ms latency)
- Reanimated Worklets - JavaScript functions that run on the native UI thread (16ms frame budget)
- setTimeout Callbacks - JavaScript event loop (variable timing)
- requestAnimationFrame - Vsync-aligned (16.67ms intervals)
Example Bug: “Card flashes at center after swipe”
- Swipe completes → Reanimated sets ghostX to final position
- Server responds → Redux updates top card
- React re-renders → Ghost component still mounted for 1 frame
- Ghost transform resets to 0 → Flash at center
- Next frame → Ghost unmounts
The bug exists in the gap between timing domains. Fixing it requires understanding that Reanimated animations complete on UI thread while React re-renders on JS thread, and they don’t synchronize automatically.
The Snapshot Pattern (Emergent Solution)
The codebase evolved a pattern that agents discovered through iteration:
Problem: When the top card flies away and the peek card (the next card visible behind it) should promote, there’s a gap where:
- Top card data is stale (swiped away)
- New card data hasn’t arrived from server
- UI shows nothing or wrong card for 1-3 frames
Solution: Snapshot the peek card’s data at swipe initiation, render snapshot during animation, only swap to live data when server confirms.
// Take snapshot when swipe starts
// peek = the next card's data, computedNextTop = what we expect to become the new top
peekSnapshotRef.current = peek;
nextTopSnapshotRef.current = computedNextTop;
// Render snapshot during animation
const displayCard = serverTopId === expectedTopId
? liveTop
: nextTopSnapshotRef.current;
Why Agents Struggled: This pattern is non-obvious. The natural approach is “show what the server says” - but that causes flicker. Agents had to discover through repeated failures that optimistic snapshotting was required.
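Stripped of React and Reanimated, the core of the snapshot pattern is a pure decision: trust live data only once the server’s top matches the cursor we expected at swipe initiation, otherwise keep the snapshot. A minimal sketch, with assumed field names:

```typescript
// Pure sketch of the snapshot-vs-live decision. Names are illustrative.
interface Card {
  id: string;
  deckCursor: number;
}

function selectDisplayCard(
  liveTop: Card | null,       // what the server currently says is on top
  expectedTopCursor: number,  // cursor we computed at swipe initiation
  snapshot: Card | null       // peek card captured when the swipe started
): Card | null {
  // Only swap to live data once the server has caught up with the swipe;
  // until then, keep rendering the optimistic snapshot to avoid flicker.
  if (liveTop !== null && liveTop.deckCursor === expectedTopCursor) {
    return liveTop;
  }
  return snapshot;
}
```

Making the decision a pure function also makes the flicker case testable without rendering a single frame.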
The Z-Index Collision Problem
Current Hierarchy (NetworkingCardStack):
zIndex: 1000 → ChainCelebration (streak milestone overlay)
zIndex: 30 → Action buttons
zIndex: 30 → Rarity indicators ← COLLISION
zIndex: 30 → Disabled notices ← COLLISION
zIndex: 20 → Ghost card
zIndex: 10 → Top card
zIndex: 0 → Peek card
Why Agents Keep Breaking This:
- No centralized z-index system
- Four different elements share zIndex: 30
- Adding new UI requires guessing which z-index won’t collide
- React Native’s “later in tree = higher” rule adds implicit ordering
- Platform differences (Android elevation vs iOS zIndex)
Agents adding features naturally pick existing z-index values, causing overlaps that only manifest in certain states (e.g., when both rarity indicator AND disabled notice are visible).
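A centralized registry is a cheap fix for this. The sketch below is an assumption of what such a registry could look like — the layer names mirror the hierarchy above, but the re-spaced values resolving the zIndex: 30 collisions are this sketch’s invention, not the codebase’s actual fix:

```typescript
// Hypothetical centralized z-index registry. Spacing layers by 10 leaves
// room for insertions; `as const` fixes names and values at compile time.
const Z_INDEX = {
  PEEK_CARD: 0,
  TOP_CARD: 10,
  GHOST_CARD: 20,
  RARITY_INDICATOR: 30,
  DISABLED_NOTICE: 40,     // was 30 -- collided with rarity indicators
  ACTION_BUTTONS: 50,      // was 30 -- collided with both of the above
  CHAIN_CELEBRATION: 1000, // streak milestone overlay stays on top
} as const;

// Collisions become a checkable invariant instead of a runtime surprise.
function hasUniqueLayers(registry: Record<string, number>): boolean {
  const values = Object.values(registry);
  return new Set(values).size === values.length;
}
```

A unit test calling `hasUniqueLayers` would have caught the three-way collision at 30 before any agent had to guess.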
Part 4: Cognitive Load Analysis
Why This Exceeds Agent Capabilities
Task: “Fix the 1-frame flash when card is swiped”
What an agent must understand to fix this:
- The SwipeStackReducer state machine (6 states, 9 transitions)
- That showTop depends on 4 different conditions including waitingForNewTop
- That waitingForNewTop compares outgoingTopCursor to top.deckCursor
- That cursors can be undefined, requiring null checks
- That holdForXpAnim gates both visibility AND animation start
- That setTimeout schedules runGhostAnimation after XP bar completes
- That ghost transforms should NOT be reset to 0 (counter-intuitive)
- That peekSnapshotRef must persist until !ghostVisible && !waitingForNewTop
- That the watchdog timer at 1500ms can interfere with the fix
- That there are TWO systems (legacy NetworkingCardStack, new TwoFrameDeck)
Typical Agent Behavior:
- Read error description
- Find code that seems related
- Make change that fixes immediate symptom
- Doesn’t realize it broke another state transition
- Push fix
- Bug reappears in different form
- Repeat
The “Fix One Thing, Break Another” Trap
Example: Fixing ghost flash by resetting transforms
// Agent's "fix"
const settleGhost = () => {
ghostX.value = 0; // Reset to center
ghostY.value = 0;
ghostCardRef.current = null;
};
What breaks:
- Transform reset happens BEFORE React unmounts the component
- For 1 frame, ghost is at (0,0) with full opacity
- Visual: card flashes at center
Actual fix:
// Don't reset transforms - let unmount handle it
const settleGhost = () => {
ghostCardRef.current = null;
setGhostDir(null);
};
An agent cannot know this without either:
- Reading the existing comment warning about it
- Making the mistake and observing the flash
- Understanding React’s reconciliation timing vs Reanimated’s animation thread
Part 5: Patterns for Agent Improvement
What Would Help Agents Succeed
- Explicit State Machine Diagrams
  - Current: State machine buried in reducer, transitions implicit
  - Better: Visual diagram showing all states and what’s visible in each
- Timing Sequence Diagrams
  - Current: Timing scattered across callbacks and effects
  - Better: Document the exact sequence of events for swipe-complete
- Z-Index Registry
  - Current: Magic numbers scattered across components
  - Better: Centralized Z_INDEX.GHOST = 20, Z_INDEX.BUTTONS = 30
- “Don’t Touch” Zones
  - Current: Warnings in comments
  - Better: Explicit documentation of invariants that must be preserved
- Visual Regression Tests
  - Current: Unit tests can’t catch 1-frame flashes
  - Better: Screenshot tests at specific animation frames
- Snapshot Pattern as First-Class Concept
  - Current: Pattern exists but isn’t named or documented
  - Better: useOptimisticSnapshot() hook with clear semantics
Part 6: Agent Behavior Anti-Patterns
This section documents the specific behavioral patterns that cause agents to fail repeatedly at visual bug fixes.
Anti-Pattern 1: Symptom-First Diagnosis
Symptom-First Diagnosis: Agent reads error description, finds code that seems related, fixes that code without understanding root cause.
Evidence from StyleSwipe:
The “Networking Swipe Saga” (Week 1):
| Approach | What Agent Thought | What Agent Did | Actual Problem |
|---|---|---|---|
| [add-sync-logic] | “Throttle never clears” | Added timeout logic, requestAnimationFrame tracking | State machine transition bug |
| [hide-dont-fix] | “Too many refetches” | Commented out UI sections | Incomplete hydration |
| [full-rewrite] | “Overlay replays animations” | 218-line file rewrite | State wasn’t being reset |
The Tell: Each “fix” touched different files and different systems. If the agent understood the root cause, the fixes would converge on the same location.
Why Agents Do This:
- Error messages describe symptoms, not causes
- Agents pattern-match “throttle” → find throttle code → fix throttle code
- No mechanism to trace data flow backward to origin
Anti-Pattern 2: Over-Engineering Response
Over-Engineering Response: Simple bug gets architectural refactor instead of targeted fix.
Evidence:
The Swipe Logic Hook grew from ~600 lines to 1200+ lines through a 14-step “refactoring plan”:
Step 1: utilities extraction
Step 3: hardened odds normalization
Step 4: filtered undefined artwork URLs
Step 5: refactored dev warn-once logic
Step 8: explicit hydration flags
Step 9: extracted accept-gate hook
Step 10: extracted analytics hook
Step 13: centralized throttle handling
The Original Bug: Draw loop when out of introductions.
The Fix Needed: One boolean flag to block repeated draws.
What Actually Happened: 7 new hook abstractions, 3 new model files, complete deck model reimplementation.
Why Agents Do This:
- Large context windows encourage “while I’m here” thinking
- Agents confuse “related code” with “code that needs changing”
- No cost signal for scope expansion
Anti-Pattern 3: Magic Number Tinkering
Magic Number Tinkering: Timing bugs “fixed” by adjusting millisecond values until symptoms disappear.
Evidence:
// [magic-number] - added with no justification
if (delta < 250) { return; }
// Various timing constants added over time:
250ms debounce [magic-number]
1200ms hide delay [rehydration-fix]
1400ms screen reader timeout [rehydration-fix]
220ms animation duration [rehydration-fix]
1500ms watchdog timeout (NetworkingCardStack)
The Problem: These numbers are cargo-culted. 250ms “works” but nobody knows why. When the system changes, the magic number breaks.
Why Agents Do This:
- Timing bugs are hard to reason about
- Trial and error with numbers is faster than understanding the system
- Numbers that “work” get committed without understanding
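Short of removing the timers, the document’s later recommendation to centralize timing constants at least makes the cargo-culting visible: each number carries its reason. The constant names below are assumptions; the values are the ones quoted above.

```typescript
// Each constant states why its value exists -- if the reason no longer
// holds, the number is safe to revisit. Names are illustrative.
const TIMING = {
  SWIPE_DEBOUNCE_MS: 250,         // ignore a second trigger within one gesture
  BANNER_HIDE_DELAY_MS: 1200,     // keep the banner readable before auto-hide
  SCREEN_READER_TIMEOUT_MS: 1400, // allow announcement before focus moves
  CARD_SETTLE_ANIM_MS: 220,       // duration of the card settle animation
  HANDOFF_WATCHDOG_MS: 1500,      // force-unlock if handoff never resolves
} as const;
```

A registry like this does not fix the underlying timing bugs, but it turns five anonymous magic numbers into five auditable decisions.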
Anti-Pattern 4: Defensive Duplication
Defensive Duplication: Same check added multiple times in sequence because agent doesn’t trust earlier checks.
Evidence from the Swipe Logic Hook:
// Check deck revision regression
if (nextRev < prevRev) { /* abort */ }
// IDENTICAL check, copy-pasted elsewhere in same file
if (nextRev < prevRev) { /* abort */ }
// Same logic, third location
if (nextCursor > prevCursor && nextTopCursor === prevTopCursor) { /* abort */ }
The “Re-Check Latest” Pattern: The phrase appears 30+ times:
- “Check if draw response is latest”
- “Re-check latest before counter sync”
- “Re-check latest before UI side-effects”
- “Triple-check before side-effects”
Why Agents Do This:
- Race condition discovered → add check at that location
- Another race discovered → add check at new location
- No architectural solution, just more guards
- Each guard is a monument to a past bug
Anti-Pattern 5: Hide-Don’t-Fix
Hide-Don’t-Fix: When confused, comment out the problematic code rather than understanding it.
Evidence:
[hide-dont-fix] - “Networking swipe redundantly refetches state”:
- Problem: UI flickering due to multiple refetches
- “Fix”: Comment out the counters display, chain indicator, and other UI
- Result: Problem hidden, not solved
Similar Pattern in Error Handling:
try { HapticsManager.triggerSwipeHaptic('left', 'complete'); } catch {}
try { dispatchCtrl({ ... }); } catch {}
Empty catch blocks = “I don’t know why this fails, so I’ll silence it.”
Why Agents Do This:
- Removing code is faster than understanding code
- If UI doesn’t show, user doesn’t see the bug
- Agent moves on, declares victory
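A cheap alternative to the empty catch blocks above is a wrapper that records the failure instead of swallowing it. This is a sketch under assumed names (`safeCall`, the in-memory `failureLog`) rather than anything in the codebase:

```typescript
// Failures are recorded, not silenced, so "I don't know why this fails"
// becomes a diagnosable log entry. All names here are illustrative.
const failureLog: string[] = [];

function safeCall(label: string, fn: () => void): void {
  try {
    fn();
  } catch (e) {
    const message = e instanceof Error ? e.message : String(e);
    failureLog.push(`${label} failed: ${message}`);
  }
}

// Usage sketch: the haptics call from the example above, made observable.
safeCall("swipe-haptic", () => {
  throw new Error("haptics unavailable"); // stand-in for a real failure
});
```

The call site stays crash-proof, but the failure mode is now visible to whoever debugs the next regression.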
Anti-Pattern 6: Shotgun Debugging
Shotgun Debugging: Change many files hoping one change fixes the issue.
Evidence:
| Bug | Files Changed | Surgical? |
|---|---|---|
| “Buttons not responding” | 6 files | No - scattered across layers |
| “Card overlay persistence” | 44 files | No - includes 37 unrelated YAML files |
| “Overlay replays animations” | 5 files | No - component + hook + types |
Correlation: Visual bug fixes average 4-6 files. Logic bug fixes average 1-2 files.
Why Agents Do This:
- Visual symptoms can have causes anywhere in the render tree
- Agent doesn’t know which layer is responsible
- Changing multiple files increases chance of hitting the right one
Anti-Pattern 7: Planning as Procrastination
Planning as Procrastination: Writing planning documents instead of implementing fixes.
Evidence:
14 consecutive commits updating planning files:
[step-1] Step 1: utilities
[step-3] Step 3: hardened odds normalization
...
[step-14] Step 14: Summary
The Problem: Planning became the work. Each “step” was a commit to a planning document, not actual code. The planning documents grew more elaborate while bugs remained unfixed.
Why Agents Do This:
- Planning feels like progress
- Complex problems are scary; planning is safe
- Token generation on documentation is easier than on code
Anti-Pattern 8: Partial Fix Acknowledgment
Partial Fix Acknowledgment: Agent knows the fix is incomplete but commits anyway.
Evidence:
[partial-fix] | "partial fixes to networking swipe state issues"
[defensive-hardening] | "hardening on swipe system"
[type-alignment] | "fix typing errors, add temporary momentum values"
The word “partial” is an explicit admission. The agent knew more work was needed but stopped anyway.
Why Agents Do This:
- Context window pressure - better to commit something than lose work
- Optimism - “I’ll fix the rest in the next pass”
- Scope fatigue - agent ran out of steam
Anti-Pattern 9: Misidentifying the Failing Layer
Misidentifying the Failing Layer: Fix applied at wrong abstraction layer.
Evidence:
| Symptom | Where Agent Fixed | Where Problem Actually Was |
|---|---|---|
| “Overlay flickers” | Swipe Overlay Component | Card State Machine |
| “Buttons not responding” | Button components | Frame controller state |
| “Card shows briefly” | Card visibility logic | Ghost transform timing |
Why Agents Do This:
- Symptoms manifest in UI components
- Agent fixes where symptom appears
- Root cause is 2-3 layers deeper in data flow
Anti-Pattern 10: Regression Blindness
Regression Blindness: Fix one bug, don’t notice it broke something else.
Evidence:
The Two-Stack refactor [architectural-refactor]:
- 2,914 lines added, 14 new files
- 3 hours later: [emergency-fix] “fix swipe buttons not responding”
- The following day: still fixing “cooldown gating logic”
The agent never tested:
- Button responsiveness
- State propagation through new component tree
- Interaction with existing cooldown system
Why Agents Do This:
- No automated visual regression tests
- Manual testing is time-consuming
- Agent assumes “if it compiles, it works”
Part 7: Diagnostic Accuracy Statistics
Based on manual analysis of 984 commits in the StyleSwipe codebase (a commit was considered successful if no related fixes appeared within 7 days):
Fix Success Rates by Category
| Category | First-Time Success | Needed Follow-Up |
|---|---|---|
| Typing/imports | ~95% | ~5% |
| Configuration | ~90% | ~10% |
| Single-file surgical | ~85% | ~15% |
| Hook refactors | ~40% | ~60% |
| Visual bug fixes | ~30-40% | ~60-70% |
| State synchronization | ~20% | ~80% |
| Multi-system changes | ~10% | ~90% |
The 3-Week Throttle Bug
Most telling statistic: The throttle/cooldown system was “fixed” 3 times over 3 weeks:
- Week 1: [add-sync-logic] “throttle never clears after cooldown”
- Week 3: [centralization] “centralized throttle/cooldown handling”
- Week 3, next day: [late-regression-fix] “fix card overlay persistence and cooldown gating logic”
Same bug. Three weeks. Three “fixes.” Still broken.
Commit Message Quality as Diagnostic Signal
Messages indicating correct diagnosis (rare, ~2%):
- “enforce latest-response wins and gate side-effects”
- “Realigns Networking swipes with a deck-of-cards mental model”
Messages indicating symptom-chasing (common, ~30%):
- “fix swipe buttons not responding”
- “partial fixes to networking swipe state issues”
- “Swipe decision overlay replays animations”
The tell: Root-cause messages describe the mechanism. Symptom messages describe what the user sees.
Part 8: Why Visual Bugs Specifically Break Agents
The Fundamental Mismatch
Agents are trained on:
- Logic bugs (wrong value computed)
- Compile errors (syntax/type issues)
- API misuse (wrong parameters)
Visual bugs are different:
- Correct logic, wrong timing
- No error message
- Symptoms appear far from cause
- Fix requires understanding multiple async systems simultaneously
The Observability Gap
Logic bug: Add console.log, see wrong value, trace backward.
Visual bug:
- Flicker happens in 16ms
- Can’t console.log a single frame
- Must reason about Reanimated (UI thread) + React (JS thread) + Redux (store) simultaneously
- No tooling to visualize this
The Counter-Intuitive Fix Problem
Agents learn patterns: “problem X → fix Y”
Visual bugs often require anti-patterns:
- DON’T reset the value (leave it stale)
- DON’T show the live data (show snapshot)
- DON’T clean up immediately (wait for unmount)
- ADD artificial delays (setTimeout before animation)
Agents can’t learn these because they violate normal programming intuition.
Part 9: What Successful Fixes Look Like
Not all fixes failed. Analyzing the commits that worked on the first try reveals clear patterns that distinguish success from failure.
Successful Fix #1: Ghost Flash Fix [code-removal]
Bug: Ghost card flashing at center after swipe.
What Made It Work:
| Aspect | Approach |
|---|---|
| Root Cause Identified | “Resetting to 0 while still mounted causes brief flash” |
| Files Changed | 3 (surgical, related files only) |
| Fix Type | Removed code rather than adding |
| Logic | Simple conditional: ghostDir !== null |
The Key Insight: Agent identified the EXACT FRAME when the problem occurred. Instead of adding state guards, they removed the problematic reset and improved conditional rendering.
No Follow-Up Needed: No commits to same files within 7 days.
Successful Fix #2: Rare+ Overlay Fix [input-validation]
Bug: Rarity overlays not rendering correctly on rare or higher tier cards.
What Made It Work:
| Aspect | Approach |
|---|---|
| Root Cause Identified | Invalid threshold causing NaN in calculations |
| Files Changed | 2 (RarityFrame, SwipeDecisionOverlay) |
| Fix Type | Added input validation at boundary |
| Logic | !Number.isFinite(swipeThreshold) \|\| swipeThreshold <= 0 |
The Key Insight: Agent validated inputs BEFORE using them, not after symptoms appeared. Also added pointerEvents="none" to prevent interaction conflicts.
Successful Fix #3: Collection Feed Animation [library-primitive]
Bug: Post fade-in animation not smooth.
What Made It Work:
| Aspect | Approach |
|---|---|
| Root Cause Identified | Custom keyframe didn’t match visual feel |
| Files Changed | 1 (SocialFeed only) |
| Fix Type | Replaced custom code with library primitive |
| Logic | Swapped to FadeInDown.duration(800) |
The Key Insight: Agent used a library primitive instead of custom implementation. No state coordination needed.
Failed Fix for Comparison: Partial Swipe State Fix [partial-fix]
Bug: “Networking swipe state issues” (vague).
Why It Failed:
| Aspect | Approach |
|---|---|
| Root Cause | NOT identified - commit says “partial fixes” |
| Files Changed | 4 (scattered across layers) |
| Fix Type | Added multiple ref-based guards |
| Logic | Complex: introsKnownZero, drawBlockedByNoIntrosRef, introsFromCounters |
Result: Required follow-up fix 3 hours later [root-cause-fix], then another same day.
The Pattern: Success vs Failure
| Factor | Successful Fixes | Failed Fixes |
|---|---|---|
| Files Changed | 1-3 (surgical) | 4+ (scattered) |
| Commit Message | Describes mechanism | Describes symptom |
| Fix Type | Remove/simplify | Add guards/complexity |
| Conditionals | Simple boolean | Multi-state heuristics |
| State Coordination | None required | Cross-file ref tracking |
| Follow-Ups Needed | None in 7 days | Multiple same day |
What Successful Agents Did Differently
1. They Identified the Exact Moment
Successful [code-removal]:
“Resetting to 0 while still mounted causes brief flash”
Failed [partial-fix]:
“partial fixes to networking swipe state issues”
The successful agent could point to a specific line and frame. The failed agent couldn’t articulate what was actually wrong.
2. They Removed Rather Than Added
Successful agents:
- Removed transform resets [code-removal]
- Replaced custom animation with library primitive [library-primitive]
- Net code reduction
Failed agents:
- Added drawBlockedByNoIntrosRef
- Added introsFromCounters AND introsFromStore
- Added “latest check” guards (30+ times)
- Net code increase
3. They Validated at Boundaries
Successful [input-validation]:
// Validate BEFORE using
if (!Number.isFinite(swipeThreshold) || swipeThreshold <= 0) {
return { opacity: 0 };
}
Failed (scattered across useNetworkingSwipe):
// Check AFTER problem occurs, in 30+ places
if (!isLatest) return;
// ... later ...
if (!isLatest) return; // re-check
// ... later ...
if (!isLatest) return; // triple-check
4. They Used Library Primitives
Successful [library-primitive]:
// Use what the library provides
FadeInDown.duration(800)
Failed (custom timing scattered everywhere):
// Invent timing from scratch
setTimeout(() => { ... }, 250); // why 250?
setTimeout(() => { ... }, 1200); // why 1200?
setTimeout(() => { ... }, 1500); // why 1500?
The 14-Step Refactor: What Good Planning Looks Like
One successful effort was a 14-step incremental refactor of useNetworkingSwipe. What made it work:
Step 1: Extract pure utilities first (no side effects)
// Guard Utilities Module - isolated, testable
export function guardLatest(ref, id) { ... }
export function makeClientActionId() { ... }
Steps 2-7: Individual hardening fixes, each independently verifiable
Steps 8-11: Extract concerns into focused hooks
// Each hook has ONE job
useNetworkingAcceptGate()
useNetworkingAnalytics()
useNetworkingArtwork()
useNetworkingCooldown()
Steps 12-14: Introduce explicit state machine
// Replace scattered booleans with phases
type Phase = 'idle' | 'fetching' | 'drawing' | 'ready' | 'throttled' | 'error'
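A transition table makes the phase machine’s legal moves explicit and its illegal ones loud. The specific transitions below are this sketch’s assumptions, not the actual machine from steps 12-14:

```typescript
type Phase = 'idle' | 'fetching' | 'drawing' | 'ready' | 'throttled' | 'error';

// Which phases each phase may legally move to (illustrative, not the
// codebase's real transition set).
const LEGAL_TRANSITIONS: Record<Phase, readonly Phase[]> = {
  idle: ['fetching'],
  fetching: ['drawing', 'error'],
  drawing: ['ready', 'throttled', 'error'],
  ready: ['fetching', 'throttled'],
  throttled: ['ready', 'idle'],
  error: ['idle'],
};

// Illegal transitions fail loudly instead of silently corrupting state.
function transition(current: Phase, next: Phase): Phase {
  if (!LEGAL_TRANSITIONS[current].includes(next)) {
    throw new Error(`Illegal phase transition: ${current} -> ${next}`);
  }
  return next;
}
```

Compared with scattered booleans, an impossible state combination cannot even be expressed, and every bad transition points at the exact caller.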
Key Principle: Each step had explicit acceptance criteria:
Acceptance: Build passes; no imports used yet.
Acceptance: All tests pass and cover error conditions.
The Error Classifier Pattern: Pure Functions Win
One of the most successful refactors extracted error handling into a pure classifier:
Before (failed pattern):
// 71 lines of nested if/else scattered in hook
if (error.code === 'ERR_THROTTLED') {
setBanner({ type: 'throttle', message: '...' });
setCooldown(error.cooldownEndsAt);
// ... more side effects
} else if (error.code === 'ERR_TIMEOUT') {
// ... different side effects
}
After (successful pattern):
// Pure classifier function
function classifySwipeError(error): ClassifiedSwipeError {
if (error.code === 'ERR_THROTTLED') {
return {
kind: 'throttle',
banner: { type: 'throttle', message: '...' },
cooldownEndsAt: error.cooldownEndsAt,
shouldRefresh: false
};
}
// ... returns data, not side effects
}
// Hook just reads the result
const classified = classifySwipeError(error);
if (classified.banner) setBanner(classified.banner);
if (classified.cooldownEndsAt) setCooldown(classified.cooldownEndsAt);
Why This Works:
- Classifier is testable without mocking hooks
- Logic is auditable in one place
- Hook is simplified to “read result, apply effects”
- No scattered conditionals
Golden Rules from Successful Fixes
- Identify the exact frame/moment - If you can’t say “at line X, during frame Y, the value is Z when it should be W”, you haven’t diagnosed the problem.
- Remove before adding - If your fix adds more code than it removes, question whether you’ve found the root cause.
- Validate at boundaries - Check inputs when they enter the system, not after they’ve corrupted state.
- Use library primitives - Don’t reimplement what Reanimated, React Navigation, or Redux already provide.
- Extract pure functions - Domain logic (error classification, state computation) should be pure and testable.
- One commit = one coherent change - If your fix touches 6+ files across different layers, you’re probably fixing symptoms.
- Plan in markdown first - Successful refactors had written plans with acceptance criteria before coding started.
- Type-check before committing - Every successful commit mentioned running `pnpm type-check`.
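“Validate at boundaries” is the easiest rule to show concretely. A minimal sketch, assuming a hypothetical parser for a server-supplied cooldown timestamp (the function name and rules are invented for illustration):

```typescript
// Hypothetical boundary validator: bad input is rejected once, at the edge,
// so downstream animation and state code never needs defensive re-checks.
function parseCooldown(raw: unknown): number | null {
  // typeof NaN is 'number', so Number.isFinite is checked explicitly.
  if (typeof raw !== 'number' || !Number.isFinite(raw) || raw <= 0) {
    return null;
  }
  return raw;
}

parseCooldown(1700000000); // a valid timestamp passes through unchanged
parseCooldown('soon');     // wrong type is rejected at the boundary
parseCooldown(NaN);        // malformed number is rejected at the boundary
```

Everything past this function can assume a valid `number | null`, which is the opposite of the 30+ scattered “re-check latest” guards the codebase accumulated.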
Quantified Success Factors
From analyzing 984 commits:
| Factor | Success Correlation |
|---|---|
| Single file changed | ~85% no follow-up needed |
| Commit message describes mechanism | ~78% no follow-up needed |
| Code removed > code added | ~73% no follow-up needed |
| Library primitive used | ~91% no follow-up needed |
| Pure function extracted | ~82% no follow-up needed |
| "partial" in commit message | 100% needed follow-up |
| 4+ files changed | ~67% needed follow-up |
| Magic number added | ~71% needed follow-up |
Recommendations for Agent Improvement
For Agent Developers
- Train on timing, not just logic - Visual bugs require understanding execution order across threads
- Reward removal - Fixes that delete code should be scored higher than fixes that add code
- Add trace tooling - Agents need to see state across time, not just at breakpoints
- Teach the snapshot pattern - This is the most common solution to flicker bugs
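The snapshot pattern can be sketched outside React: capture the values an exit animation needs before mutating live state, and release the copy only when the animation finishes. All names and types below are illustrative, not the project's actual code:

```typescript
type Card = { id: string; imageUrl: string };
type SwipeSnapshot = { card: Card; direction: 'left' | 'right' };

// Live state and the frozen copy the exit animation renders from.
let currentCard: Card | null = { id: 'c1', imageUrl: 'a.png' };
let snapshot: SwipeSnapshot | null = null;

function beginSwipe(direction: 'left' | 'right'): void {
  if (!currentCard) return;
  // 1. Snapshot FIRST: the exit animation reads this frozen copy,
  //    so it never observes half-updated live state (the flicker source).
  snapshot = { card: { ...currentCard }, direction };
  // 2. Only then mutate live state for the next card.
  currentCard = null;
}

function onAnimationComplete(): void {
  // 3. Release the snapshot only after the exit animation finishes.
  snapshot = null;
}
```

The counter-intuitive part is step 3: the stale copy is kept alive deliberately, rather than being reset the moment the data is “outdated.”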
For Codebase Authors
- Document invariants - “Never reset transforms while mounted” should be in a README, not a comment
- Centralize timing constants - Magic numbers cause magic bugs
- Add visual regression tests - Screenshot tests at specific animation frames
- Name your patterns - “Optimistic snapshot” is easier to apply than discovering it from scratch
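“Centralize timing constants” can be as simple as a single owning module. The names and values below are invented for illustration:

```typescript
// Hypothetical timings module: every animation duration lives here,
// instead of being scattered as magic numbers across hooks.
const TIMING = {
  swipeOutMs: 250,
  cardEnterMs: 180,
  bannerAutoDismissMs: 3000,
} as const;

// Related values are derived, not duplicated: if swipeOutMs changes,
// the ghost-card unmount delay follows automatically instead of
// silently drifting out of sync in another file.
const GHOST_UNMOUNT_DELAY_MS = TIMING.swipeOutMs + 50;
```

In a real module these would be exported; the point is that a fix which changes one constant can no longer leave a dependent timing behind.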
For Users Prompting Agents
- Describe the exact moment - “Card flashes at center” is better than “animation is buggy”
- Provide frame-level detail - “After swipe completes but before next card appears”
- Mention what you’ve tried - Saves the agent from repeating failed approaches
- Ask for diagnosis first - “What’s causing this?” before “Fix this”
The path to better agent performance on visual bugs runs through better understanding of timing, better tooling for observation, and better incentives for simplification over complexity.
Conclusion
Coding agents struggle with visual implementations because:
- Visual bugs are timing bugs - They exist in gaps between async systems, not in logic errors that can be traced statically.
- The fix is often counter-intuitive - Don’t reset transforms, keep stale data longer, add artificial delays.
- State is distributed - React state, Redux state, Reanimated shared values, and refs must all stay synchronized.
- Testing is inadequate - Unit tests pass while 1-frame flashes ship to users.
- Symptoms mislead - “Card flashes” could be caused by a transform reset, snapshot timing, a z-index collision, or a state machine transition - agents guess wrong.
- Fixes cascade - Changing one timing constant affects multiple animations, and agents can’t predict the second-order effects.
The StyleSwipe codebase shows this pattern clearly: 6 commits to fix one race condition, days of follow-up fixes after a major refactor, and 47 synchronization points that must all work together.
For agents to improve at visual bugs, they need:
- Better mental models of timing (not just logic)
- Explicit documentation of “what not to do”
- Tools to visualize state across time, not just at a single point
- Architectural patterns that reduce coordination requirements
The Vicious Cycle
- Agent doesn’t understand system → makes localized fix
- Localized fix breaks something else → another agent adds guard
- Guards accumulate → system becomes harder to understand
- Next agent understands even less → makes even more localized fix
- Repeat
The Visual Bug Paradox
Visual bugs create a paradox for coding agents:
- The symptom is visible but the cause is invisible (timing gaps between systems)
- The obvious fix (reset values, add checks) often makes it worse
- The correct fix (remove code, use snapshots, delay unmount) is counter-intuitive
- Testing is impossible with standard tools (can’t unit test a 16ms flicker)
- Success requires removal but agents are trained to add code
The agents who succeeded at visual bugs shared one trait: they understood the system well enough to remove code rather than add it.
The agents who failed kept adding guards, checks, and workarounds—each one a monument to their incomplete understanding, and each one making the next agent’s job harder.
The Documentation Debt
Every defensive comment is technical debt:
- “Do not reset ghost transforms here” = agent couldn’t fix root cause
- Watchdog timer = agent couldn’t prevent deadlock
- 30+ “re-check latest” guards = agent couldn’t eliminate race
The Path Forward
For agents to succeed at visual bugs, they need:
- Trace tools - Visualize state across time, not just at one point
- Animation debuggers - Step through frame by frame
- Pattern libraries - Named solutions for common visual problems
- Invariant documentation - What must NEVER change
- Integration test gates - Prevent commits that break existing behavior
Until then, visual bugs will remain the domain where agents struggle most.