
Simplify Operational Commands

Priority: 🔴 CRITICAL | Status: Planning | Related QA Analysis: qa-analysis-overview.md

Problem Statement

Two operational commands have extreme complexity that impacts reliability:

VerifyCacheConnectionCommand

File: Command/VerifyCacheConnectionCommand.php:29

Violations:

  • Cyclomatic Complexity: 21/10 (110% over threshold)
  • NPath Complexity: 6,912/250 (2,665% over threshold)
  • Method Length: 204 lines (36% over the threshold of 150, 580% over the guideline of 30)

ResumeCrawlsCommand

File: Command/ResumeCrawlsCommand.php:30

Violations:

  • Cyclomatic Complexity: 18/10 (80% over threshold)
  • NPath Complexity: 15,288/250 (6,015% over threshold!)
  • Method Length: 150 lines (at threshold, 400% over guideline of 30)

Impact

  • Operational Reliability: These are critical infrastructure/maintenance commands
  • Debugging Difficulty: Extreme complexity makes troubleshooting failures very difficult
  • Test Coverage: NPath counts of 6,912 and 15,288 imply thousands of distinct execution paths, far more than a test suite can realistically cover
  • Maintainability: Both methods are textbook instances of the "Long Methods" anti-pattern

Guideline Violations

  • Long Methods Anti-pattern: Both are 5-7× the 30-line guideline
  • Cyclomatic Complexity: Both far exceed the threshold of 10
  • SOLID - Single Responsibility Principle: Each command mixes orchestration, business logic, error handling, and output formatting

Current Issues

VerifyCacheConnectionCommand (204 lines, CC: 21)

Likely handles:

  • Cache connection testing
  • Multiple cache backend verification
  • Error handling and reporting
  • Output formatting
  • Connection diagnostics
  • Performance metrics
  • Configuration validation

ResumeCrawlsCommand (150 lines, CC: 18)

Likely handles:

  • Finding paused/failed crawls
  • Validation of crawl state
  • Resume logic
  • Error recovery
  • Status reporting
  • Orchestration of multiple crawls
  • Cleanup operations

Proposed Refactoring Strategy

Step 1: Analyze Command Responsibilities

For each command:

  • Document all operations performed
  • Identify business logic vs orchestration
  • Find reusable service logic
  • Map error handling flows

Step 2: Extract Service Layer Logic

VerifyCacheConnectionCommand:

  • Create CacheConnectionVerifier service
  • Create CacheDiagnosticsReporter service
  • Keep command as thin orchestrator
  • Move verification logic to services (see the sketch below)
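
A minimal sketch of the target shape, assuming Symfony-style invokable commands (the success criteria below reference __invoke()). The service namespaces, method names, command name, and the VerificationResult type are illustrative assumptions, not existing code:

```php
<?php

declare(strict_types=1);

namespace App\Command;

use App\Cache\CacheConnectionVerifier;
use App\Cache\CacheDiagnosticsReporter;
use Symfony\Component\Console\Attribute\AsCommand;
use Symfony\Component\Console\Command\Command;
use Symfony\Component\Console\Style\SymfonyStyle;

// Hypothetical end state: the command only wires services and output together.
#[AsCommand(name: 'cache:verify-connection')]
final class VerifyCacheConnectionCommand
{
    public function __construct(
        private readonly CacheConnectionVerifier $verifier,
        private readonly CacheDiagnosticsReporter $reporter,
    ) {
    }

    public function __invoke(SymfonyStyle $io): int
    {
        // All verification logic lives in the service; the command never
        // touches individual backends or formats output itself.
        $result = $this->verifier->verifyAll();

        $this->reporter->report($result, $io);

        return $result->allHealthy() ? Command::SUCCESS : Command::FAILURE;
    }
}
```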

ResumeCrawlsCommand:

  • Create CrawlResumeService service
  • Create CrawlStateValidator service
  • Extract crawl selection logic
  • Move resume logic to the service layer (sketched below)
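
One possible shape for the resume service, with the proposed CrawlStateValidator as a collaborator. The repository, summary object, and PSR-3 logger are assumed dependencies used for illustration; the real collaborators should come out of the Step 1 analysis:

```php
<?php

declare(strict_types=1);

namespace App\Crawl;

use Psr\Log\LoggerInterface;

// Hypothetical service: resumes every eligible crawl and reports back to the
// command through a summary object instead of printing anything itself.
final class CrawlResumeService
{
    public function __construct(
        private readonly CrawlRepository $crawls,        // assumed repository abstraction
        private readonly CrawlStateValidator $validator, // proposed validator service
        private readonly LoggerInterface $logger,
    ) {
    }

    public function resumeEligibleCrawls(): ResumeSummary
    {
        $summary = new ResumeSummary();

        foreach ($this->crawls->findResumable() as $crawl) {
            // State validation is a separate service so the rules can be
            // unit-tested without touching the resume machinery.
            if (!$this->validator->canResume($crawl)) {
                $summary->recordSkipped($crawl);
                continue;
            }

            try {
                $this->crawls->resume($crawl);
                $summary->recordResumed($crawl);
            } catch (\Throwable $e) {
                // One failing crawl must not abort the rest of the batch.
                $this->logger->error('Failed to resume crawl', [
                    'crawlId' => $crawl->getId(),
                    'error' => $e->getMessage(),
                ]);
                $summary->recordFailed($crawl, $e);
            }
        }

        return $summary;
    }
}
```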

Step 3: Simplify Control Flow

  • Extract complex conditionals into named methods
  • Use early returns (guard clauses) to reduce nesting
  • Replace complex condition trees with the strategy pattern
  • Break the logic into smaller, focused methods (see the example below)
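
An illustrative example of the guard-clause style, not code lifted from the current commands; CacheBackend and BackendCheck are placeholder types:

```php
<?php

// Each early return removes one level of nesting and one multiplicative
// factor from the NPath count.
final class CacheConnectionVerifier
{
    // Public verifyAll() omitted; only the control-flow style is shown here.

    private function verifyBackend(CacheBackend $backend): BackendCheck
    {
        if (!$backend->isConfigured()) {
            return BackendCheck::skipped($backend, 'not configured');
        }

        if (!$this->canConnect($backend)) {
            return BackendCheck::failed($backend, 'connection failed');
        }

        return BackendCheck::passed($backend, $this->roundTripMillis($backend));
    }

    private function canConnect(CacheBackend $backend): bool
    {
        // A named predicate replaces an inline multi-clause condition and can
        // be tested on its own.
        return $backend->ping() && $backend->roundTrip('healthcheck');
    }

    private function roundTripMillis(CacheBackend $backend): float
    {
        $start = microtime(true);
        $backend->roundTrip('healthcheck');

        return (microtime(true) - $start) * 1000;
    }
}
```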

Step 4: Improve Error Handling

  • Centralize error handling logic
  • Create an exception hierarchy if needed (sketched below)
  • Catch exceptions at the command boundary rather than scattering try-catch blocks through the logic
  • Separate error reporting from business logic
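
If a dedicated hierarchy does prove necessary, a small base type that services throw and the command catches at its boundary is usually enough; the class and method names below are hypothetical:

```php
<?php

declare(strict_types=1);

namespace App\Cache\Exception;

// Hypothetical hierarchy: services throw specific subtypes with precise
// context, the command catches only the abstract base type.
abstract class CacheVerificationException extends \RuntimeException
{
}

final class CacheConnectionFailed extends CacheVerificationException
{
    public static function forBackend(string $backendName, \Throwable $previous): self
    {
        return new self(
            sprintf('Could not connect to cache backend "%s".', $backendName),
            0,
            $previous,
        );
    }
}
```

The command's __invoke() then needs a single try/catch around the verifier call, converting the base exception into $io->error() output and a Command::FAILURE exit code.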

Step 5: Enhance Output/Reporting

  • Extract output formatting into a dedicated helper (sketched below)
  • Use consistent output methods
  • Separate reporting from business logic
  • Consider using progress bars for long operations
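
A sketch of the proposed CacheDiagnosticsReporter acting as that helper, assuming Symfony's SymfonyStyle; the result and check types are the same hypothetical ones used in the earlier sketches:

```php
<?php

declare(strict_types=1);

namespace App\Cache;

use Symfony\Component\Console\Style\SymfonyStyle;

// Hypothetical reporter: it owns every console call, so the verifier stays
// free of formatting concerns and the command stays a thin orchestrator.
final class CacheDiagnosticsReporter
{
    public function report(VerificationResult $result, SymfonyStyle $io): void
    {
        $io->title('Cache connection diagnostics');

        $io->table(
            ['Backend', 'Status', 'Latency (ms)'],
            array_map(
                static fn (BackendCheck $check): array => [
                    $check->backendName(),
                    $check->passed() ? 'OK' : 'FAILED',
                    $check->latencyMs() ?? 'n/a',
                ],
                $result->checks(),
            ),
        );

        if ($result->allHealthy()) {
            $io->success('All cache backends are reachable.');
        } else {
            $io->error('One or more cache backends failed verification.');
        }
    }
}
```

The same split applies to ResumeCrawlsCommand, and SymfonyStyle's progressStart()/progressAdvance()/progressFinish() can wrap the per-crawl loop when a large batch is resumed.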

Success Criteria

VerifyCacheConnectionCommand

  • __invoke() method < 30 lines (guideline adherence)
  • Cyclomatic complexity < 10
  • NPath complexity < 250
  • Business logic extracted to services
  • Clear, testable service boundaries

ResumeCrawlsCommand

  • __invoke() method < 30 lines (guideline adherence)
  • Cyclomatic complexity < 10
  • NPath complexity < 250
  • Resume logic in dedicated service
  • Easy to test crawl resume scenarios

General

  • Commands are thin orchestrators only
  • Improved debuggability
  • Better error messages
  • Service logic is reusable
  • Comprehensive test coverage

Risk Assessment

Medium-High Risk:

  • Critical operational commands
  • Complex existing logic to preserve
  • Failure could impact crawler operations

Mitigation:

  • Thorough testing before refactoring
  • Test all error scenarios
  • Maintain backward compatibility
  • Run on staging before production
  • Keep detailed logs during refactoring

Estimated Effort

Medium - Both commands require:

  • Careful analysis of existing logic
  • Service extraction
  • Comprehensive testing
  • Documentation updates

Estimated Time:

  • VerifyCacheConnectionCommand: 1-2 days
  • ResumeCrawlsCommand: 1-2 days
  • Testing & verification: 1 day
  • Total: 3-5 days

Dependencies

None - can be addressed independently

Suggested Order

  1. Start with ResumeCrawlsCommand (slightly simpler, but very high NPath)
  2. Then VerifyCacheConnectionCommand (more complex, critical for operations)

Notes

  • NPath complexity is EXTREMELY high (15,288 and 6,912) - indicates many execution paths
  • These commands are essential for operations - must maintain reliability
  • Consider adding integration tests that exercise the actual commands (see the test sketch at the end of these notes)
  • May discover opportunities to improve crawler resilience during refactoring
  • Both commands likely have hidden bugs due to complexity - refactoring may reveal them
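
A possible shape for such a test, assuming a Symfony kernel, PHPUnit, and symfony/console's CommandTester; the command name, fixture setup, and asserted output are placeholders:

```php
<?php

declare(strict_types=1);

namespace App\Tests\Command;

use Symfony\Bundle\FrameworkBundle\Console\Application;
use Symfony\Bundle\FrameworkBundle\Test\KernelTestCase;
use Symfony\Component\Console\Command\Command;
use Symfony\Component\Console\Tester\CommandTester;

// Hypothetical integration test: boots the real kernel and runs the actual
// command, so regressions in the refactored orchestration surface early.
final class ResumeCrawlsCommandTest extends KernelTestCase
{
    public function testResumesEligibleCrawls(): void
    {
        // Fixture setup (e.g. creating paused crawls) is project-specific and
        // omitted here.
        $application = new Application(self::bootKernel());

        $tester = new CommandTester($application->find('app:crawls:resume')); // assumed command name
        $exitCode = $tester->execute([]);

        self::assertSame(Command::SUCCESS, $exitCode);
        self::assertStringContainsString('resumed', $tester->getDisplay());
    }
}
```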