This case study examines the StyleSwipe React Native codebase, where coding agents repeatedly struggled to fix visual bugs in card swiping and collection gameplay. StyleSwipe is a mobile game combining fashion industry simulation with Tinder-style card swiping—players swipe to accept or pass on fashion industry contacts to build their asset collection. Over 3 weeks, agents made 6 attempts to fix a single race condition, spent days chasing follow-up bugs introduced by a major refactor, and left 30+ defensive guard comments warning future agents what not to do.
Executive Summary
Key Finding
Visual bugs are fundamentally different from logic bugs. They exist in timing gaps between async systems, manifest for only 1-2 frames, and often require counter-intuitive fixes (don’t reset values, keep stale data, add delays). Agents trained on logic bugs repeatedly fail at visual bugs because their pattern-matching doesn’t apply.
Evidence Summary
| Evidence Type | Finding |
|---|---|
| Git History | Same bug “fixed” 3 times over 3 weeks (throttle/cooldown) |
| Code Comments | 30+ “re-check latest” guards for race conditions |
| Complexity | 47 synchronization points, 4 timing domains, 6-state machine |
| Success Rate | Visual bugs: 30-40% first-time fix vs. 95% for typing errors |
Why Agents Fail
- Symptom-First Diagnosis - Fix where error appears, not where it originates
- Adding vs. Removing - Agents add guards; successful fixes remove code
- Magic Number Tinkering - Adjust milliseconds until symptoms disappear
- Shotgun Debugging - Change 6+ files hoping one works
- Partial Fix Acknowledgment - Commit knowing it’s incomplete
What Success Looks Like
Successful fixes (30-40% of visual bug attempts) share these traits:
| Successful Pattern | Failed Pattern |
|---|---|
| 1-3 files changed | 4+ files scattered |
| Describes mechanism in commit | Describes symptom |
| Removes code | Adds guards |
| Uses library primitives | Invents custom timing |
| No follow-ups in 7 days | Follow-ups same day |
The Vicious Cycle
Agent doesn't understand system → Makes localized fix
↓
Localized fix breaks something else → Another agent adds guard
↓
Guards accumulate → System becomes harder to understand
↓
Next agent understands even less → Makes even more localized fix
↓
Repeat
Quantified Impact
- Throttle bug: 3 “fixes” across 3 weeks, still broken
- Two-stack refactor: 2,914 lines added, buttons broke 3 hours later
- Guard proliferation: “Re-check latest” appears 30+ times in one file
- Watchdog timers: 4 different 1500ms+ timeouts as band-aids
Part 1: Evidence of Struggle - Git History Patterns
Note: Bracketed tags like [add-sync-logic] indicate the fix strategy employed for each commit, making patterns like escalation or repeated attempts visible.
Pattern A: Same-Day Fix Cycles
Example: “App Top Bar Not Updating” (Week 2)
| Elapsed | Approach | Description |
|---|---|---|
| T+0 | [add-sync-logic] | “fixed app top bar not updating connections and introductions” (+45 lines) |
| T+2h 20m | [full-rewrite] | “fix networking swipes not updating apptopbar’s introductions and connections” (-146 lines, +38 lines) |
What Happened: The first fix only addressed part of the problem. 2.5 hours later, a complete rewrite was needed - the agent deleted 146 lines and rewrote the approach entirely.
Example: State Synchronization Cascade (Week 2, next day)
Six commits in 6 hours, each discovering the previous fix was incomplete:
| Elapsed | Approach | What It Fixed | What It Broke/Missed |
|---|---|---|---|
| T+0 | [data-mapping-fix] | “map oddsForTop to previous oddsForPeek” | Modal rendered when no card |
| T+37m | [error-handling] | “explicit client handling for draw-required errors” | Screen didn’t rehydrate on focus |
| T+1h 24m | [rehydration-fix] | “ensure networking screen rehydrates when returning” | Still had race conditions |
| T+2h 47m | [partial-fix] | “PARTIAL fixes to networking swipe state” | Still using untyped state access |
| T+3h 4m | [type-safety] | “Added typed resource selectors” | Root cause still not fixed |
| T+6h 14m | [root-cause-fix] | “enforce latest-response wins and gate side-effects” | Finally the real fix |
What Happened: Each agent (or the same agent in multiple sessions) fixed symptoms rather than the root cause. The real issue was a race condition where server responses could arrive out of order - but this wasn’t diagnosed until the 6th attempt.
Pattern B: Major Refactor Immediately Breaks Something
Example: Two-Stack System Introduction (Week 3, overnight session)
| Elapsed | Approach | Description |
|---|---|---|
| T+0 | [architectural-refactor] | “networking stack replaced by two-stack system” (+2,914 lines, 14 new files) |
| T+3h | [emergency-fix] | “fix swipe buttons not responding” |
| T+3h 45m | [follow-up-fix] | “Added consistent haptic feedback” |
What Happened: A massive architectural refactor to solve animation issues immediately broke basic button functionality. The agent had to scramble to fix touch handling that worked fine in the old system.
The Following Day: A [late-regression-fix] was still addressing “card overlay persistence and cooldown gating logic” - side effects from the original refactor.
Pattern C: Commits Named “Partial” or “Attempt”
[partial-fix] | partial fixes to networking swipe state issues
[defensive-hardening] | hardening on swipe system
[type-alignment] | fix typing errors, add temporary momentum values
[type-alignment] | fix more typing errors
[type-alignment] | type fixes
The word “partial” and the string of “fix more” commits indicate that agents knew they weren’t fully solving the problem but pushed anyway, hoping subsequent passes would complete the fix.
Part 2: Evidence of Struggle - Defensive Code Comments
The codebase is filled with warnings left by agents for future agents. These comments document hard-won knowledge about what NOT to do.
Category: “Don’t Reset Transforms”
Card Stack Component (ghost cleanup)
// Do not reset ghost transforms here — resetting to 0 while still mounted
// causes a brief flash at center with overlay before unmount. The transforms
// will be reinitialized by runGhostAnimation for the next swipe.
What This Reveals: An agent tried the obvious fix (reset transforms to 0 when done) and it caused a visual flash. The workaround is counter-intuitive: leave stale transform values and let unmounting handle cleanup.
Category: “Watchdog Timers for Deadlocks”
Card Stack Component (deadlock watchdog)
// Safety watchdog: if we remain in 'handoff' but the authoritative top does not
// update within a short window, unlock interactions to avoid deadlock.
// This covers rare cases where the server reports success but the top card
// remains the same (or stale) for longer than expected, leaving the stack
// non-interactive.
What This Reveals: The system can enter a state where the UI is permanently locked. Rather than fixing the root cause, agents added a 1500ms watchdog timer to force-unlock. This is a band-aid, not a cure.
Category: “Race Condition Guards”
Swipe Logic Hook - The phrase “latest check” or “re-check latest” appears 30+ times:
// Check if draw response is latest before applying side-effects
// Re-check latest before counter sync to avoid races
// Re-check latest before UI side-effects
// Secondary guards on fetch state before any updates
// Triple-check before side-effects to prevent mid-flight races
What This Reveals: Agents couldn’t prevent race conditions architecturally, so they added guard checks at every possible point. The code is littered with defensive if (isLatest) checks because agents couldn’t reason about when responses would arrive.
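The scattered guards can be contrasted with a single “latest response wins” primitive, which is what the eventual root-cause fix enforced. The sketch below is illustrative — `createLatestGuard` and the surrounding names are assumptions, not code from the StyleSwipe codebase: each request is issued a token, and stale responses are dropped at one choke point instead of being re-checked before every side-effect.

```typescript
// Hypothetical "latest response wins" helper: one token per request,
// checked once when the response arrives. Names are illustrative.
function createLatestGuard() {
  let latestToken = 0;

  return {
    // Call when a request is issued; returns a token bound to it.
    issue(): number {
      latestToken += 1;
      return latestToken;
    },
    // True only for the most recently issued request.
    isLatest(token: number): boolean {
      return token === latestToken;
    },
  };
}

// Usage sketch: one check replaces the scattered `if (!isLatest) return` guards.
async function drawCard<T>(
  guard: ReturnType<typeof createLatestGuard>,
  fetchDraw: () => Promise<T>
): Promise<T | null> {
  const token = guard.issue();
  const response = await fetchDraw();
  if (!guard.isLatest(token)) return null; // stale response: ignore entirely
  return response;
}
```

Because the check lives in one place, out-of-order server responses become a non-event rather than a race to be guarded at 30+ call sites.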
Category: “Cursor Identity Confusion”
Card Stack Component (card identity tracking)
// Uniqueness is determined by deck cursor, not id. Prefer cursor comparison
// when available; if cursors are absent, this is an error condition.
What This Reveals: Cards can have the same ID across different decks (the same card definition can appear multiple times at different positions). Agents initially used card IDs to track which card was swiped away, causing bugs when the same card appeared again. This comment documents the hard-learned rule: use position-based deck cursors (server-owned indices), not card IDs.
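The rule the comment encodes can be sketched as a small pure function. The shapes and names below (`DeckCard`, `isSameDeckPosition`) are assumptions for illustration, not the codebase’s actual types:

```typescript
// Cards can share an id across deck positions, so identity must use the
// server-owned deck cursor. Absent cursors are treated as an error, per
// the comment above. Names and shapes here are illustrative assumptions.
interface DeckCard {
  id: string;
  deckCursor?: number; // server-owned position; may be absent in error states
}

function isSameDeckPosition(a: DeckCard, b: DeckCard): boolean {
  if (a.deckCursor === undefined || b.deckCursor === undefined) {
    throw new Error("Missing deck cursor: cannot determine card identity");
  }
  return a.deckCursor === b.deckCursor;
}
```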
Category: “Don’t Do This or Everything Breaks”
Swipe Overlay Component (hooks ordering rule)
// All animated styles must be declared before any conditional returns
Swipe Logic Hook (draw loop prevention)
// Avoid repeated draw attempts until we detect introductions > 0
// (introductions are a consumable resource required to draw new cards)
drawBlockedByNoIntrosRef.current = true;
What This Reveals: Agents discovered landmines through trial and error. React hooks ordering rules, infinite retry loops, animation state contamination - each comment represents a bug that was fixed and then documented to prevent regression.
Part 3: Why Visual Bugs Are Harder Than Logic Bugs
The Complexity Metrics
| Metric | Value | Why It Matters |
|---|---|---|
| State machine states | 6 | Each state has different visibility/interaction rules |
| State transitions | 9+ | Must be tested in combination |
| Timing domains | 4 | Redux, Reanimated, setTimeout, requestAnimationFrame |
| Shared values | 6 | ghostX, ghostY, ghostRot, peekScale, peekOffsetY, promote |
| Imperative refs | 7+ | topCardRef, ghostCardRef, peekSnapshotRef, etc. |
| Synchronization points | 47 | Places where systems must coordinate |
| Race condition vectors | 7+ | Documented in code comments |
The Four Timing Domains Problem
Visual bugs often involve timing mismatches between:
- Redux/Server State - Updates when server responds (0-2000ms latency)
- Reanimated Worklets - JavaScript functions that run on the native UI thread (16ms frame budget)
- setTimeout Callbacks - JavaScript event loop (variable timing)
- requestAnimationFrame - Vsync-aligned (16.67ms intervals)
Example Bug: “Card flashes at center after swipe”
- Swipe completes → Reanimated sets ghostX to final position
- Server responds → Redux updates top card
- React re-renders → Ghost component still mounted for 1 frame
- Ghost transform resets to 0 → Flash at center
- Next frame → Ghost unmounts
The bug exists in the gap between timing domains. Fixing it requires understanding that Reanimated animations complete on UI thread while React re-renders on JS thread, and they don’t synchronize automatically.
The Snapshot Pattern (Emergent Solution)
The codebase evolved a pattern that agents discovered through iteration:
Problem: When the top card flies away and the peek card (the next card visible behind it) should promote, there’s a gap where:
- Top card data is stale (swiped away)
- New card data hasn’t arrived from server
- UI shows nothing or wrong card for 1-3 frames
Solution: Snapshot the peek card’s data at swipe initiation, render snapshot during animation, only swap to live data when server confirms.
// Take snapshot when swipe starts
// peek = the next card's data, computedNextTop = what we expect to become the new top
peekSnapshotRef.current = peek;
nextTopSnapshotRef.current = computedNextTop;
// Render snapshot during animation
const displayCard = serverTopId === expectedTopId
? liveTop
: nextTopSnapshotRef.current;
Why Agents Struggled: This pattern is non-obvious. The natural approach is “show what the server says” - but that causes flicker. Agents had to discover through repeated failures that optimistic snapshotting was required.
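Stripped of React and Reanimated, the core of the snapshot pattern is a pure decision: trust live data only once the server’s top matches the cursor we expected at swipe initiation, otherwise keep the snapshot. A minimal sketch, with assumed field names:

```typescript
// Pure sketch of the snapshot-vs-live decision. Names are illustrative.
interface Card {
  id: string;
  deckCursor: number;
}

function selectDisplayCard(
  liveTop: Card | null,       // what the server currently says is on top
  expectedTopCursor: number,  // cursor we computed at swipe initiation
  snapshot: Card | null       // peek card captured when the swipe started
): Card | null {
  // Only swap to live data once the server has caught up with the swipe;
  // until then, keep rendering the optimistic snapshot to avoid flicker.
  if (liveTop !== null && liveTop.deckCursor === expectedTopCursor) {
    return liveTop;
  }
  return snapshot;
}
```

Making the decision a pure function also makes the flicker case testable without rendering a single frame.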
The Z-Index Collision Problem
Current Hierarchy (NetworkingCardStack):
zIndex: 1000 → ChainCelebration (streak milestone overlay)
zIndex: 30 → Action buttons
zIndex: 30 → Rarity indicators ← COLLISION
zIndex: 30 → Disabled notices ← COLLISION
zIndex: 20 → Ghost card
zIndex: 10 → Top card
zIndex: 0 → Peek card
Why Agents Keep Breaking This:
- No centralized z-index system
- Four different elements share zIndex: 30
- Adding new UI requires guessing which z-index won’t collide
- React Native’s “later in tree = higher” rule adds implicit ordering
- Platform differences (Android elevation vs iOS zIndex)
Agents adding features naturally pick existing z-index values, causing overlaps that only manifest in certain states (e.g., when both rarity indicator AND disabled notice are visible).
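A centralized registry is a cheap fix for this. The sketch below is an assumption of what such a registry could look like — the layer names mirror the hierarchy above, but the re-spaced values resolving the zIndex: 30 collisions are this sketch’s invention, not the codebase’s actual fix:

```typescript
// Hypothetical centralized z-index registry. Spacing layers by 10 leaves
// room for insertions; `as const` fixes names and values at compile time.
const Z_INDEX = {
  PEEK_CARD: 0,
  TOP_CARD: 10,
  GHOST_CARD: 20,
  RARITY_INDICATOR: 30,
  DISABLED_NOTICE: 40,     // was 30 -- collided with rarity indicators
  ACTION_BUTTONS: 50,      // was 30 -- collided with both of the above
  CHAIN_CELEBRATION: 1000, // streak milestone overlay stays on top
} as const;

// Collisions become a checkable invariant instead of a runtime surprise.
function hasUniqueLayers(registry: Record<string, number>): boolean {
  const values = Object.values(registry);
  return new Set(values).size === values.length;
}
```

A unit test calling `hasUniqueLayers` would have caught the three-way collision at 30 before any agent had to guess.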
Part 4: Cognitive Load Analysis
Why This Exceeds Agent Capabilities
Task: “Fix the 1-frame flash when card is swiped”
What an agent must understand to fix this:
- The SwipeStackReducer state machine (6 states, 9 transitions)
- That showTop depends on 4 different conditions including waitingForNewTop
- That waitingForNewTop compares outgoingTopCursor to top.deckCursor
- That cursors can be undefined, requiring null checks
- That holdForXpAnim gates both visibility AND animation start
- That setTimeout schedules runGhostAnimation after XP bar completes
- That ghost transforms should NOT be reset to 0 (counter-intuitive)
- That peekSnapshotRef must persist until !ghostVisible && !waitingForNewTop
- That the watchdog timer at 1500ms can interfere with the fix
- That there are TWO systems (legacy NetworkingCardStack, new TwoFrameDeck)
Typical Agent Behavior:
- Read error description
- Find code that seems related
- Make change that fixes immediate symptom
- Doesn’t realize it broke another state transition
- Push fix
- Bug reappears in different form
- Repeat
The “Fix One Thing, Break Another” Trap
Example: Fixing ghost flash by resetting transforms
// Agent's "fix"
const settleGhost = () => {
ghostX.value = 0; // Reset to center
ghostY.value = 0;
ghostCardRef.current = null;
};
What breaks:
- Transform reset happens BEFORE React unmounts the component
- For 1 frame, ghost is at (0,0) with full opacity
- Visual: card flashes at center
Actual fix:
// Don't reset transforms - let unmount handle it
const settleGhost = () => {
ghostCardRef.current = null;
setGhostDir(null);
};
An agent cannot know this without either:
- Reading the existing comment warning about it
- Making the mistake and observing the flash
- Understanding React’s reconciliation timing vs Reanimated’s animation thread
Part 5: Patterns for Agent Improvement
What Would Help Agents Succeed
- Explicit State Machine Diagrams
  - Current: State machine buried in reducer, transitions implicit
  - Better: Visual diagram showing all states and what’s visible in each
- Timing Sequence Diagrams
  - Current: Timing scattered across callbacks and effects
  - Better: Document the exact sequence of events for swipe-complete
- Z-Index Registry
  - Current: Magic numbers scattered across components
  - Better: Centralized Z_INDEX.GHOST = 20, Z_INDEX.BUTTONS = 30
- “Don’t Touch” Zones
  - Current: Warnings in comments
  - Better: Explicit documentation of invariants that must be preserved
- Visual Regression Tests
  - Current: Unit tests can’t catch 1-frame flashes
  - Better: Screenshot tests at specific animation frames
- Snapshot Pattern as First-Class Concept
  - Current: Pattern exists but isn’t named or documented
  - Better: useOptimisticSnapshot() hook with clear semantics
Part 6: Agent Behavior Anti-Patterns
This section documents the specific behavioral patterns that cause agents to fail repeatedly at visual bug fixes.
Anti-Pattern 1: Symptom-First Diagnosis
Symptom-First Diagnosis: Agent reads error description, finds code that seems related, fixes that code without understanding root cause.
Evidence from StyleSwipe:
The “Networking Swipe Saga” (Week 1):
| Approach | What Agent Thought | What Agent Did | Actual Problem |
|---|---|---|---|
| [add-sync-logic] | “Throttle never clears” | Added timeout logic, requestAnimationFrame tracking | State machine transition bug |
| [hide-dont-fix] | “Too many refetches” | Commented out UI sections | Incomplete hydration |
| [full-rewrite] | “Overlay replays animations” | 218-line file rewrite | State wasn’t being reset |
The Tell: Each “fix” touched different files and different systems. If the agent understood the root cause, the fixes would converge on the same location.
Why Agents Do This:
- Error messages describe symptoms, not causes
- Agents pattern-match “throttle” → find throttle code → fix throttle code
- No mechanism to trace data flow backward to origin
Anti-Pattern 2: Over-Engineering Response
Over-Engineering Response: Simple bug gets architectural refactor instead of targeted fix.
Evidence:
The Swipe Logic Hook grew from ~600 lines to 1200+ lines through a 14-step “refactoring plan”:
Step 1: utilities extraction
Step 3: hardened odds normalization
Step 4: filtered undefined artwork URLs
Step 5: refactored dev warn-once logic
Step 8: explicit hydration flags
Step 9: extracted accept-gate hook
Step 10: extracted analytics hook
Step 13: centralized throttle handling
The Original Bug: Draw loop when out of introductions.
The Fix Needed: One boolean flag to block repeated draws.
What Actually Happened: 7 new hook abstractions, 3 new model files, complete deck model reimplementation.
Why Agents Do This:
- Large context windows encourage “while I’m here” thinking
- Agents confuse “related code” with “code that needs changing”
- No cost signal for scope expansion
Anti-Pattern 3: Magic Number Tinkering
Magic Number Tinkering: Timing bugs “fixed” by adjusting millisecond values until symptoms disappear.
Evidence:
// [magic-number] - added with no justification
if (delta < 250) { return; }
// Various timing constants added over time:
250ms debounce [magic-number]
1200ms hide delay [rehydration-fix]
1400ms screen reader timeout [rehydration-fix]
220ms animation duration [rehydration-fix]
1500ms watchdog timeout (NetworkingCardStack)
The Problem: These numbers are cargo-culted. 250ms “works” but nobody knows why. When the system changes, the magic number breaks.
Why Agents Do This:
- Timing bugs are hard to reason about
- Trial and error with numbers is faster than understanding the system
- Numbers that “work” get committed without understanding
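Short of removing the timers, the document’s later recommendation to centralize timing constants at least makes the cargo-culting visible: each number carries its reason. The constant names below are assumptions; the values are the ones quoted above.

```typescript
// Each constant states why its value exists -- if the reason no longer
// holds, the number is safe to revisit. Names are illustrative.
const TIMING = {
  SWIPE_DEBOUNCE_MS: 250,         // ignore a second trigger within one gesture
  BANNER_HIDE_DELAY_MS: 1200,     // keep the banner readable before auto-hide
  SCREEN_READER_TIMEOUT_MS: 1400, // allow announcement before focus moves
  CARD_SETTLE_ANIM_MS: 220,       // duration of the card settle animation
  HANDOFF_WATCHDOG_MS: 1500,      // force-unlock if handoff never resolves
} as const;
```

A registry like this does not fix the underlying timing bugs, but it turns five anonymous magic numbers into five auditable decisions.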
Anti-Pattern 4: Defensive Duplication
Defensive Duplication: Same check added multiple times in sequence because agent doesn’t trust earlier checks.
Evidence from the Swipe Logic Hook:
// Check deck revision regression
if (nextRev < prevRev) { /* abort */ }
// IDENTICAL check, copy-pasted elsewhere in same file
if (nextRev < prevRev) { /* abort */ }
// Same logic, third location
if (nextCursor > prevCursor && nextTopCursor === prevTopCursor) { /* abort */ }
The “Re-Check Latest” Pattern: The phrase appears 30+ times:
- “Check if draw response is latest”
- “Re-check latest before counter sync”
- “Re-check latest before UI side-effects”
- “Triple-check before side-effects”
Why Agents Do This:
- Race condition discovered → add check at that location
- Another race discovered → add check at new location
- No architectural solution, just more guards
- Each guard is a monument to a past bug
Anti-Pattern 5: Hide-Don’t-Fix
Hide-Don’t-Fix: When confused, comment out the problematic code rather than understanding it.
Evidence:
[hide-dont-fix] - “Networking swipe redundantly refetches state”:
- Problem: UI flickering due to multiple refetches
- “Fix”: Comment out the counters display, chain indicator, and other UI
- Result: Problem hidden, not solved
Similar Pattern in Error Handling:
try { HapticsManager.triggerSwipeHaptic('left', 'complete'); } catch {}
try { dispatchCtrl({ ... }); } catch {}
Empty catch blocks = “I don’t know why this fails, so I’ll silence it.”
Why Agents Do This:
- Removing code is faster than understanding code
- If UI doesn’t show, user doesn’t see the bug
- Agent moves on, declares victory
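A cheap alternative to the empty catch blocks above is a wrapper that records the failure instead of swallowing it. This is a sketch under assumed names (`safeCall`, the in-memory `failureLog`) rather than anything in the codebase:

```typescript
// Failures are recorded, not silenced, so "I don't know why this fails"
// becomes a diagnosable log entry. All names here are illustrative.
const failureLog: string[] = [];

function safeCall(label: string, fn: () => void): void {
  try {
    fn();
  } catch (e) {
    const message = e instanceof Error ? e.message : String(e);
    failureLog.push(`${label} failed: ${message}`);
  }
}

// Usage sketch: the haptics call from the example above, made observable.
safeCall("swipe-haptic", () => {
  throw new Error("haptics unavailable"); // stand-in for a real failure
});
```

The call site stays crash-proof, but the failure mode is now visible to whoever debugs the next regression.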
Anti-Pattern 6: Shotgun Debugging
Shotgun Debugging: Change many files hoping one change fixes the issue.
Evidence:
| Bug | Files Changed | Surgical? |
|---|---|---|
| “Buttons not responding” | 6 files | No - scattered across layers |
| “Card overlay persistence” | 44 files | No - includes 37 unrelated YAML files |
| “Overlay replays animations” | 5 files | No - component + hook + types |
Correlation: Visual bug fixes average 4-6 files. Logic bug fixes average 1-2 files.
Why Agents Do This:
- Visual symptoms can have causes anywhere in the render tree
- Agent doesn’t know which layer is responsible
- Changing multiple files increases chance of hitting the right one
Anti-Pattern 7: Planning as Procrastination
Planning as Procrastination: Writing planning documents instead of implementing fixes.
Evidence:
14 consecutive commits updating planning files:
[step-1] Step 1: utilities
[step-3] Step 3: hardened odds normalization
...
[step-14] Step 14: Summary
The Problem: Planning became the work. Each “step” was a commit to a planning document, not actual code. The planning documents grew more elaborate while bugs remained unfixed.
Why Agents Do This:
- Planning feels like progress
- Complex problems are scary; planning is safe
- Token generation on documentation is easier than on code
Anti-Pattern 8: Partial Fix Acknowledgment
Partial Fix Acknowledgment: Agent knows the fix is incomplete but commits anyway.
Evidence:
[partial-fix] | "partial fixes to networking swipe state issues"
[defensive-hardening] | "hardening on swipe system"
[type-alignment] | "fix typing errors, add temporary momentum values"
The word “partial” is an explicit admission. The agent knew more work was needed but stopped anyway.
Why Agents Do This:
- Context window pressure - better to commit something than lose work
- Optimism - “I’ll fix the rest in the next pass”
- Scope fatigue - agent ran out of steam
Anti-Pattern 9: Misidentifying the Failing Layer
Misidentifying the Failing Layer: Fix applied at wrong abstraction layer.
Evidence:
| Symptom | Where Agent Fixed | Where Problem Actually Was |
|---|---|---|
| “Overlay flickers” | Swipe Overlay Component | Card State Machine |
| “Buttons not responding” | Button components | Frame controller state |
| “Card shows briefly” | Card visibility logic | Ghost transform timing |
Why Agents Do This:
- Symptoms manifest in UI components
- Agent fixes where symptom appears
- Root cause is 2-3 layers deeper in data flow
Anti-Pattern 10: Regression Blindness
Regression Blindness: Fix one bug, don’t notice it broke something else.
Evidence:
The Two-Stack refactor [architectural-refactor]:
- 2,914 lines added, 14 new files
- 3 hours later: [emergency-fix] “fix swipe buttons not responding”
- The following day: still fixing “cooldown gating logic”
The agent never tested:
- Button responsiveness
- State propagation through new component tree
- Interaction with existing cooldown system
Why Agents Do This:
- No automated visual regression tests
- Manual testing is time-consuming
- Agent assumes “if it compiles, it works”
Part 7: Diagnostic Accuracy Statistics
Based on manual analysis of 984 commits in the StyleSwipe codebase (a commit was considered successful if no related fixes appeared within 7 days):
Fix Success Rates by Category
| Category | First-Time Success | Needed Follow-Up |
|---|---|---|
| Typing/imports | ~95% | ~5% |
| Configuration | ~90% | ~10% |
| Single-file surgical | ~85% | ~15% |
| Hook refactors | ~40% | ~60% |
| Visual bug fixes | ~30-40% | ~60-70% |
| State synchronization | ~20% | ~80% |
| Multi-system changes | ~10% | ~90% |
The 3-Week Throttle Bug
Most telling statistic: The throttle/cooldown system was “fixed” 3 times over 3 weeks:
- Week 1: [add-sync-logic] “throttle never clears after cooldown”
- Week 3: [centralization] “centralized throttle/cooldown handling”
- Week 3, next day: [late-regression-fix] “fix card overlay persistence and cooldown gating logic”
Same bug. Three weeks. Three “fixes.” Still broken.
Commit Message Quality as Diagnostic Signal
Messages indicating correct diagnosis (rare, ~2%):
- “enforce latest-response wins and gate side-effects”
- “Realigns Networking swipes with a deck-of-cards mental model”
Messages indicating symptom-chasing (common, ~30%):
- “fix swipe buttons not responding”
- “partial fixes to networking swipe state issues”
- “Swipe decision overlay replays animations”
The tell: Root-cause messages describe the mechanism. Symptom messages describe what the user sees.
Part 8: Why Visual Bugs Specifically Break Agents
The Fundamental Mismatch
Agents are trained on:
- Logic bugs (wrong value computed)
- Compile errors (syntax/type issues)
- API misuse (wrong parameters)
Visual bugs are different:
- Correct logic, wrong timing
- No error message
- Symptoms appear far from cause
- Fix requires understanding multiple async systems simultaneously
The Observability Gap
Logic bug: Add console.log, see wrong value, trace backward.
Visual bug:
- Flicker happens in 16ms
- Can’t console.log a single frame
- Must reason about Reanimated (UI thread) + React (JS thread) + Redux (store) simultaneously
- No tooling to visualize this
The Counter-Intuitive Fix Problem
Agents learn patterns: “problem X → fix Y”
Visual bugs often require anti-patterns:
- DON’T reset the value (leave it stale)
- DON’T show the live data (show snapshot)
- DON’T clean up immediately (wait for unmount)
- ADD artificial delays (setTimeout before animation)
Agents can’t learn these because they violate normal programming intuition.
Part 9: What Successful Fixes Look Like
Not all fixes failed. Analyzing the commits that worked on the first try reveals clear patterns that distinguish success from failure.
Successful Fix #1: Ghost Flash Fix [code-removal]
Bug: Ghost card flashing at center after swipe.
What Made It Work:
| Aspect | Approach |
|---|---|
| Root Cause Identified | “Resetting to 0 while still mounted causes brief flash” |
| Files Changed | 3 (surgical, related files only) |
| Fix Type | Removed code rather than adding |
| Logic | Simple conditional: ghostDir !== null |
The Key Insight: Agent identified the EXACT FRAME when the problem occurred. Instead of adding state guards, they removed the problematic reset and improved conditional rendering.
No Follow-Up Needed: No commits to same files within 7 days.
Successful Fix #2: Rare+ Overlay Fix [input-validation]
Bug: Rarity overlays not rendering correctly on rare or higher tier cards.
What Made It Work:
| Aspect | Approach |
|---|---|
| Root Cause Identified | Invalid threshold causing NaN in calculations |
| Files Changed | 2 (RarityFrame, SwipeDecisionOverlay) |
| Fix Type | Added input validation at boundary |
| Logic | !Number.isFinite(swipeThreshold) \|\| swipeThreshold <= 0 |
The Key Insight: Agent validated inputs BEFORE using them, not after symptoms appeared. Also added pointerEvents="none" to prevent interaction conflicts.
Successful Fix #3: Collection Feed Animation [library-primitive]
Bug: Post fade-in animation not smooth.
What Made It Work:
| Aspect | Approach |
|---|---|
| Root Cause Identified | Custom keyframe didn’t match visual feel |
| Files Changed | 1 (SocialFeed only) |
| Fix Type | Replaced custom code with library primitive |
| Logic | Swapped to FadeInDown.duration(800) |
The Key Insight: Agent used a library primitive instead of custom implementation. No state coordination needed.
Failed Fix for Comparison: Partial Swipe State Fix [partial-fix]
Bug: “Networking swipe state issues” (vague).
Why It Failed:
| Aspect | Approach |
|---|---|
| Root Cause | NOT identified - commit says “partial fixes” |
| Files Changed | 4 (scattered across layers) |
| Fix Type | Added multiple ref-based guards |
| Logic | Complex: introsKnownZero, drawBlockedByNoIntrosRef, introsFromCounters |
Result: Required follow-up fix 3 hours later [root-cause-fix], then another same day.
The Pattern: Success vs Failure
| Factor | Successful Fixes | Failed Fixes |
|---|---|---|
| Files Changed | 1-3 (surgical) | 4+ (scattered) |
| Commit Message | Describes mechanism | Describes symptom |
| Fix Type | Remove/simplify | Add guards/complexity |
| Conditionals | Simple boolean | Multi-state heuristics |
| State Coordination | None required | Cross-file ref tracking |
| Follow-Ups Needed | None in 7 days | Multiple same day |
What Successful Agents Did Differently
1. They Identified the Exact Moment
Successful [code-removal]:
“Resetting to 0 while still mounted causes brief flash”
Failed [partial-fix]:
“partial fixes to networking swipe state issues”
The successful agent could point to a specific line and frame. The failed agent couldn’t articulate what was actually wrong.
2. They Removed Rather Than Added
Successful agents:
- Removed transform resets [code-removal]
- Replaced custom animation with library primitive [library-primitive]
- Net code reduction
Failed agents:
- Added drawBlockedByNoIntrosRef
- Added introsFromCounters AND introsFromStore
- Added “latest check” guards (30+ times)
- Net code increase
3. They Validated at Boundaries
Successful [input-validation]:
// Validate BEFORE using
if (!Number.isFinite(swipeThreshold) || swipeThreshold <= 0) {
return { opacity: 0 };
}
Failed (scattered across useNetworkingSwipe):
// Check AFTER problem occurs, in 30+ places
if (!isLatest) return;
// ... later ...
if (!isLatest) return; // re-check
// ... later ...
if (!isLatest) return; // triple-check
4. They Used Library Primitives
Successful [library-primitive]:
// Use what the library provides
FadeInDown.duration(800)
Failed (custom timing scattered everywhere):
// Invent timing from scratch
setTimeout(() => { ... }, 250); // why 250?
setTimeout(() => { ... }, 1200); // why 1200?
setTimeout(() => { ... }, 1500); // why 1500?
The 14-Step Refactor: What Good Planning Looks Like
One successful effort was a 14-step incremental refactor of useNetworkingSwipe. What made it work:
Step 1: Extract pure utilities first (no side effects)
// Guard Utilities Module - isolated, testable
export function guardLatest(ref, id) { ... }
export function makeClientActionId() { ... }
Steps 2-7: Individual hardening fixes, each independently verifiable
Steps 8-11: Extract concerns into focused hooks
// Each hook has ONE job
useNetworkingAcceptGate()
useNetworkingAnalytics()
useNetworkingArtwork()
useNetworkingCooldown()
Steps 12-14: Introduce explicit state machine
// Replace scattered booleans with phases
type Phase = 'idle' | 'fetching' | 'drawing' | 'ready' | 'throttled' | 'error'
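A transition table makes the phase machine’s legal moves explicit and its illegal ones loud. The specific transitions below are this sketch’s assumptions, not the actual machine from steps 12-14:

```typescript
type Phase = 'idle' | 'fetching' | 'drawing' | 'ready' | 'throttled' | 'error';

// Which phases each phase may legally move to (illustrative, not the
// codebase's real transition set).
const LEGAL_TRANSITIONS: Record<Phase, readonly Phase[]> = {
  idle: ['fetching'],
  fetching: ['drawing', 'error'],
  drawing: ['ready', 'throttled', 'error'],
  ready: ['fetching', 'throttled'],
  throttled: ['ready', 'idle'],
  error: ['idle'],
};

// Illegal transitions fail loudly instead of silently corrupting state.
function transition(current: Phase, next: Phase): Phase {
  if (!LEGAL_TRANSITIONS[current].includes(next)) {
    throw new Error(`Illegal phase transition: ${current} -> ${next}`);
  }
  return next;
}
```

Compared with scattered booleans, an impossible state combination cannot even be expressed, and every bad transition points at the exact caller.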
Key Principle: Each step had explicit acceptance criteria:
Acceptance: Build passes; no imports used yet.
Acceptance: All tests pass and cover error conditions.
The Error Classifier Pattern: Pure Functions Win
One of the most successful refactors extracted error handling into a pure classifier:
Before (failed pattern):
// 71 lines of nested if/else scattered in hook
if (error.code === 'ERR_THROTTLED') {
setBanner({ type: 'throttle', message: '...' });
setCooldown(error.cooldownEndsAt);
// ... more side effects
} else if (error.code === 'ERR_TIMEOUT') {
// ... different side effects
}
After (successful pattern):
// Pure classifier function
function classifySwipeError(error): ClassifiedSwipeError {
if (error.code === 'ERR_THROTTLED') {
return {
kind: 'throttle',
banner: { type: 'throttle', message: '...' },
cooldownEndsAt: error.cooldownEndsAt,
shouldRefresh: false
};
}
// ... returns data, not side effects
}
// Hook just reads the result
const classified = classifySwipeError(error);
if (classified.banner) setBanner(classified.banner);
if (classified.cooldownEndsAt) setCooldown(classified.cooldownEndsAt);
Why This Works:
- Classifier is testable without mocking hooks
- Logic is auditable in one place
- Hook is simplified to “read result, apply effects”
- No scattered conditionals
Golden Rules from Successful Fixes
- Identify the exact frame/moment - If you can’t say “at line X, during frame Y, the value is Z when it should be W”, you haven’t diagnosed the problem.
- Remove before adding - If your fix adds more code than it removes, question whether you’ve found the root cause.
- Validate at boundaries - Check inputs when they enter the system, not after they’ve corrupted state.
- Use library primitives - Don’t reimplement what Reanimated, React Navigation, or Redux already provide.
- Extract pure functions - Domain logic (error classification, state computation) should be pure and testable.
- One commit = one coherent change - If your fix touches 6+ files across different layers, you’re probably fixing symptoms.
- Plan in markdown first - Successful refactors had written plans with acceptance criteria before coding started.
- Type-check before committing - Every successful commit mentioned running `pnpm type-check`.
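“Validate at boundaries” is the easiest rule to show concretely. A minimal sketch, assuming a hypothetical parser for a server-supplied cooldown timestamp (the function name and rules are invented for illustration):

```typescript
// Hypothetical boundary validator: bad input is rejected once, at the edge,
// so downstream animation and state code never needs defensive re-checks.
function parseCooldown(raw: unknown): number | null {
  // typeof NaN is 'number', so Number.isFinite is checked explicitly.
  if (typeof raw !== 'number' || !Number.isFinite(raw) || raw <= 0) {
    return null;
  }
  return raw;
}

parseCooldown(1700000000); // a valid timestamp passes through unchanged
parseCooldown('soon');     // wrong type is rejected at the boundary
parseCooldown(NaN);        // malformed number is rejected at the boundary
```

Everything past this function can assume a valid `number | null`, which is the opposite of the 30+ scattered “re-check latest” guards the codebase accumulated.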
Quantified Success Factors
From analyzing 984 commits:
| Factor | Success Correlation |
|---|---|
| Single file changed | ~85% no follow-up needed |
| Commit message describes mechanism | ~78% no follow-up needed |
| Code removed > code added | ~73% no follow-up needed |
| Library primitive used | ~91% no follow-up needed |
| Pure function extracted | ~82% no follow-up needed |
| "partial" in commit message | 100% needed follow-up |
| 4+ files changed | ~67% needed follow-up |
| Magic number added | ~71% needed follow-up |
Recommendations for Agent Improvement
For Agent Developers
- Train on timing, not just logic - Visual bugs require understanding execution order across threads
- Reward removal - Fixes that delete code should be scored higher than fixes that add code
- Add trace tooling - Agents need to see state across time, not just at breakpoints
- Teach the snapshot pattern - This is the most common solution to flicker bugs
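The snapshot pattern can be sketched outside React: capture the values an exit animation needs before mutating live state, and release the copy only when the animation finishes. All names and types below are illustrative, not the project's actual code:

```typescript
type Card = { id: string; imageUrl: string };
type SwipeSnapshot = { card: Card; direction: 'left' | 'right' };

// Live state and the frozen copy the exit animation renders from.
let currentCard: Card | null = { id: 'c1', imageUrl: 'a.png' };
let snapshot: SwipeSnapshot | null = null;

function beginSwipe(direction: 'left' | 'right'): void {
  if (!currentCard) return;
  // 1. Snapshot FIRST: the exit animation reads this frozen copy,
  //    so it never observes half-updated live state (the flicker source).
  snapshot = { card: { ...currentCard }, direction };
  // 2. Only then mutate live state for the next card.
  currentCard = null;
}

function onAnimationComplete(): void {
  // 3. Release the snapshot only after the exit animation finishes.
  snapshot = null;
}
```

The counter-intuitive part is step 3: the stale copy is kept alive deliberately, rather than being reset the moment the data is “outdated.”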
For Codebase Authors
- Document invariants - “Never reset transforms while mounted” should be in a README, not a comment
- Centralize timing constants - Magic numbers cause magic bugs
- Add visual regression tests - Screenshot tests at specific animation frames
- Name your patterns - “Optimistic snapshot” is easier to apply than discovering it from scratch
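“Centralize timing constants” can be as simple as a single owning module. The names and values below are invented for illustration:

```typescript
// Hypothetical timings module: every animation duration lives here,
// instead of being scattered as magic numbers across hooks.
const TIMING = {
  swipeOutMs: 250,
  cardEnterMs: 180,
  bannerAutoDismissMs: 3000,
} as const;

// Related values are derived, not duplicated: if swipeOutMs changes,
// the ghost-card unmount delay follows automatically instead of
// silently drifting out of sync in another file.
const GHOST_UNMOUNT_DELAY_MS = TIMING.swipeOutMs + 50;
```

In a real module these would be exported; the point is that a fix which changes one constant can no longer leave a dependent timing behind.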
For Users Prompting Agents
- Describe the exact moment - “Card flashes at center” is better than “animation is buggy”
- Provide frame-level detail - “After swipe completes but before next card appears”
- Mention what you’ve tried - Saves the agent from repeating failed approaches
- Ask for diagnosis first - “What’s causing this?” before “Fix this”
The path to better agent performance on visual bugs runs through better understanding of timing, better tooling for observation, and better incentives for simplification over complexity.
Conclusion
Coding agents struggle with visual implementations because:
- Visual bugs are timing bugs - They exist in gaps between async systems, not in logic errors that can be traced statically.
- The fix is often counter-intuitive - Don’t reset transforms, keep stale data longer, add artificial delays.
- State is distributed - React state, Redux state, Reanimated shared values, and refs must all stay synchronized.
- Testing is inadequate - Unit tests pass while 1-frame flashes ship to users.
- Symptoms mislead - “Card flashes” could be caused by a transform reset, snapshot timing, a z-index collision, or a state machine transition - agents guess wrong.
- Fixes cascade - Changing one timing constant affects multiple animations, and agents can’t predict the second-order effects.
The StyleSwipe codebase shows this pattern clearly: 6 commits to fix one race condition, days of follow-up fixes after a major refactor, and 47 synchronization points that must all work together.
For agents to improve at visual bugs, they need:
- Better mental models of timing (not just logic)
- Explicit documentation of “what not to do”
- Tools to visualize state across time, not just at a single point
- Architectural patterns that reduce coordination requirements
The Vicious Cycle
- Agent doesn’t understand system → makes localized fix
- Localized fix breaks something else → another agent adds guard
- Guards accumulate → system becomes harder to understand
- Next agent understands even less → makes even more localized fix
- Repeat
The Visual Bug Paradox
Visual bugs create a paradox for coding agents:
- The symptom is visible but the cause is invisible (timing gaps between systems)
- The obvious fix (reset values, add checks) often makes it worse
- The correct fix (remove code, use snapshots, delay unmount) is counter-intuitive
- Testing is impossible with standard tools (can’t unit test a 16ms flicker)
- Success requires removal but agents are trained to add code
The agents who succeeded at visual bugs shared one trait: they understood the system well enough to remove code rather than add it.
The agents who failed kept adding guards, checks, and workarounds—each one a monument to their incomplete understanding, and each one making the next agent’s job harder.
The Documentation Debt
Every defensive comment is technical debt:
- “Do not reset ghost transforms here” = agent couldn’t fix root cause
- Watchdog timer = agent couldn’t prevent deadlock
- 30+ “re-check latest” guards = agent couldn’t eliminate race
The Path Forward
For agents to succeed at visual bugs, they need:
- Trace tools - Visualize state across time, not just at one point
- Animation debuggers - Step through frame by frame
- Pattern libraries - Named solutions for common visual problems
- Invariant documentation - What must NEVER change
- Integration test gates - Prevent commits that break existing behavior
Until then, visual bugs will remain the domain where agents struggle most.