Simplify Operational Commands
Priority: 🔴 CRITICAL
Status: Planning
Related QA Analysis: qa-analysis-overview.md
Problem Statement
Two operational commands far exceed the project's complexity thresholds, which puts their reliability at risk:
VerifyCacheConnectionCommand
File: Command/VerifyCacheConnectionCommand.php:29
Violations:
- Cyclomatic Complexity: 21/10 (110% over threshold)
- NPath Complexity: 6,912/250 (2,665% over threshold)
- Method Length: 204 lines (36% over threshold of 150, 580% over guideline of 30)
ResumeCrawlsCommand
File: Command/ResumeCrawlsCommand.php:30
Violations:
- Cyclomatic Complexity: 18/10 (80% over threshold)
- NPath Complexity: 15,288/250 (6,015% over threshold!)
- Method Length: 150 lines (at threshold, 400% over guideline of 30)
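The "over threshold" percentages follow from `(value - limit) / limit`. A minimal sketch that recomputes them from the raw figures listed above; `percentOver()` is a helper name invented for this illustration, not part of the codebase:

```php
<?php
declare(strict_types=1);

/**
 * Percentage by which a measured value exceeds a limit.
 * Example: 21 against a limit of 10 is 110% over.
 */
function percentOver(float $value, float $limit): float
{
    return ($value - $limit) / $limit * 100.0;
}

// Figures copied from the violation lists: [measured value, limit].
$metrics = [
    'VerifyCacheConnectionCommand cyclomatic'    => [21, 10],
    'VerifyCacheConnectionCommand NPath'         => [6912, 250],
    'VerifyCacheConnectionCommand length (150)'  => [204, 150],
    'VerifyCacheConnectionCommand length (30)'   => [204, 30],
    'ResumeCrawlsCommand cyclomatic'             => [18, 10],
    'ResumeCrawlsCommand NPath'                  => [15288, 250],
    'ResumeCrawlsCommand length (30)'            => [150, 30],
];

foreach ($metrics as $label => [$value, $limit]) {
    printf("%-45s %8.1f%% over\n", $label, percentOver($value, $limit));
}
```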
Impact
- Operational Reliability: These are critical infrastructure/maintenance commands
- Debugging Difficulty: Extreme complexity makes troubleshooting failures very difficult
- Test Coverage: NPath counts distinct execution paths, so covering every path would take thousands of test cases per command
- Maintainability: Both methods fall under the "Long Methods" anti-pattern
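To illustrate why NPath grows so quickly, here is a simplified model, assuming a method made only of independent `if` statements in sequence (real analyzers also account for loops, `else` branches and compound conditions). `npathOfSequence()` is a hypothetical helper for this sketch:

```php
<?php
declare(strict_types=1);

/**
 * Simplified NPath model: statements executed one after another multiply
 * their path counts. A plain `if` with a simple condition and no `else`
 * counts as 2 (taken or skipped).
 *
 * @param list<int> $pathsPerStatement
 */
function npathOfSequence(array $pathsPerStatement): int
{
    return (int) array_product($pathsPerStatement);
}

// One method containing ten independent `if` statements in sequence.
$oneLongMethod = npathOfSequence(array_fill(0, 10, 2));

// The same ten decisions split across two extracted methods of five each;
// each method can then be tested on its own, so the counts add up
// instead of multiplying.
$twoSmallMethods = npathOfSequence(array_fill(0, 5, 2)) * 2;

printf("one long method:   %d paths\n", $oneLongMethod);            // 1024
printf("two small methods: %d paths in total\n", $twoSmallMethods); // 64
```

Under this model, extracting methods is what brings the number of test cases back to a manageable size, which is the motivation for the service extraction proposed in this document.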
Guideline Violations
- Long Methods Anti-pattern: Both exceed the 30-line guideline by 5-7x
- Cyclomatic Complexity: 21 and 18 against a threshold of 10
- SOLID - Single Responsibility Principle: Each command handles several unrelated concerns
Current Issues
VerifyCacheConnectionCommand (204 lines, CC: 21)
Likely handles:
- Cache connection testing
- Multiple cache backend verification
- Error handling and reporting
- Output formatting
- Connection diagnostics
- Performance metrics
- Configuration validation
ResumeCrawlsCommand (150 lines, CC: 18)
Likely handles:
- Finding paused/failed crawls
- Validation of crawl state
- Resume logic
- Error recovery
- Status reporting
- Orchestration of multiple crawls
- Cleanup operations
Proposed Refactoring Strategy
Step 1: Analyze Command Responsibilities
For each command:
- Document all operations performed
- Identify business logic vs orchestration
- Find reusable service logic
- Map error handling flows
Step 2: Extract Service Layer Logic
VerifyCacheConnectionCommand:
- Create CacheConnectionVerifier service
- Create CacheDiagnosticsReporter service
- Keep command as thin orchestrator
- Move verification logic to services
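One possible shape for this split, as a sketch only. The service names come from the list above; everything else (`CacheBackend`, `VerificationResult`, the method names, the exit codes, and the stand-in class `VerifyCacheConnectionCommandSketch`) is an assumption made for illustration, and the framework wiring of the real command is omitted:

```php
<?php
declare(strict_types=1);

/** Hypothetical abstraction over one cache backend (not an existing class). */
interface CacheBackend
{
    public function name(): string;

    /** @throws RuntimeException when the backend cannot be reached */
    public function ping(): void;
}

/** Outcome of verifying a single backend. */
final class VerificationResult
{
    public function __construct(
        public readonly string $backend,
        public readonly bool $ok,
        public readonly ?string $error = null,
    ) {
    }
}

/** Holds the verification logic that would move out of the command. */
final class CacheConnectionVerifier
{
    /**
     * @param iterable<CacheBackend> $backends
     * @return list<VerificationResult>
     */
    public function verifyAll(iterable $backends): array
    {
        $results = [];
        foreach ($backends as $backend) {
            try {
                $backend->ping();
                $results[] = new VerificationResult($backend->name(), true);
            } catch (RuntimeException $e) {
                $results[] = new VerificationResult($backend->name(), false, $e->getMessage());
            }
        }

        return $results;
    }
}

/** Turns results into printable lines; knows nothing about connections. */
final class CacheDiagnosticsReporter
{
    /**
     * @param list<VerificationResult> $results
     * @return list<string>
     */
    public function lines(array $results): array
    {
        return array_map(
            static fn (VerificationResult $r): string => $r->ok
                ? sprintf('[OK]   %s', $r->backend)
                : sprintf('[FAIL] %s: %s', $r->backend, $r->error ?? 'unknown error'),
            $results
        );
    }
}

/** Thin orchestrator standing in for the console command. */
final class VerifyCacheConnectionCommandSketch
{
    public function __construct(
        private CacheConnectionVerifier $verifier,
        private CacheDiagnosticsReporter $reporter,
    ) {
    }

    /**
     * @param iterable<CacheBackend> $backends
     * @return int exit code: 0 when every backend is reachable, 1 otherwise
     */
    public function __invoke(iterable $backends): int
    {
        $results = $this->verifier->verifyAll($backends);
        echo implode(PHP_EOL, $this->reporter->lines($results)), PHP_EOL;

        foreach ($results as $result) {
            if (!$result->ok) {
                return 1;
            }
        }

        return 0;
    }
}
```

With this shape the verifier and the reporter can each be unit-tested with fake backends, and `__invoke()` stays well under the 30-line guideline.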
ResumeCrawlsCommand:
- Create CrawlResumeService service
- Create CrawlStateValidator service
- Extract crawl selection logic
- Move resume logic to service layer
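A comparable sketch for the resume side. The two service names come from the list above; the `Crawl` class, its status values, and the rule that a crawl with no pending URLs is skipped are assumptions chosen to make the example concrete, not the actual domain rules:

```php
<?php
declare(strict_types=1);

/** Hypothetical crawl record; the real entity will have more fields. */
final class Crawl
{
    public function __construct(
        public readonly int $id,
        public string $status, // assumed values: paused, failed, running, completed
        public readonly int $pendingUrls,
    ) {
    }
}

/** Decides whether one crawl may be resumed and explains why not. */
final class CrawlStateValidator
{
    private const RESUMABLE = ['paused', 'failed'];

    /** @return string|null null when resumable, otherwise the reason */
    public function rejectionReason(Crawl $crawl): ?string
    {
        if (!in_array($crawl->status, self::RESUMABLE, true)) {
            return sprintf('status "%s" is not resumable', $crawl->status);
        }
        if ($crawl->pendingUrls === 0) {
            return 'nothing left to crawl';
        }

        return null;
    }
}

/** Owns the resume loop so the command only has to print the report. */
final class CrawlResumeService
{
    public function __construct(private CrawlStateValidator $validator)
    {
    }

    /**
     * @param iterable<Crawl> $crawls
     * @return array{resumed: list<int>, skipped: array<int, string>}
     */
    public function resume(iterable $crawls): array
    {
        $report = ['resumed' => [], 'skipped' => []];
        foreach ($crawls as $crawl) {
            $reason = $this->validator->rejectionReason($crawl);
            if ($reason !== null) {
                $report['skipped'][$crawl->id] = $reason;
                continue;
            }
            // Placeholder: the real service would dispatch the crawl here.
            $crawl->status = 'running';
            $report['resumed'][] = $crawl->id;
        }

        return $report;
    }
}
```

Because the service returns a plain report instead of printing, each resume scenario can be asserted on directly in a unit test.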
Step 3: Simplify Control Flow
- Extract complex conditionals into named methods
- Use early returns to reduce nesting
- Replace branching on type or state with the strategy pattern where it fits
- Break into smaller, focused methods
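The techniques above can be combined as in the following sketch. All class names, statuses and messages here are hypothetical; the point is the shape: guard clauses first, a named predicate instead of an inline condition, and a strategy lookup instead of an `if`/`elseif` chain:

```php
<?php
declare(strict_types=1);

/** Hypothetical strategy: one way of resuming a crawl. */
interface ResumeStrategy
{
    public function resume(int $crawlId): string;
}

final class RestartFromCheckpoint implements ResumeStrategy
{
    public function resume(int $crawlId): string
    {
        return "crawl $crawlId: restarted from last checkpoint";
    }
}

final class RetryFailedUrls implements ResumeStrategy
{
    public function resume(int $crawlId): string
    {
        return "crawl $crawlId: retrying failed URLs only";
    }
}

final class ResumePlanner
{
    /** @param array<string, ResumeStrategy> $strategiesByStatus */
    public function __construct(private array $strategiesByStatus)
    {
    }

    public function plan(int $crawlId, string $status, bool $locked): string
    {
        // Early returns keep the main path at a single indentation level.
        if ($locked) {
            return "crawl $crawlId: skipped (locked by another worker)";
        }
        if (!$this->hasStrategyFor($status)) {
            return "crawl $crawlId: skipped (status \"$status\" is not resumable)";
        }

        // Strategy lookup replaces an if/elseif chain over statuses.
        return $this->strategiesByStatus[$status]->resume($crawlId);
    }

    /** Named predicate instead of an inline condition. */
    private function hasStrategyFor(string $status): bool
    {
        return isset($this->strategiesByStatus[$status]);
    }
}

$planner = new ResumePlanner([
    'paused' => new RestartFromCheckpoint(),
    'failed' => new RetryFailedUrls(),
]);

echo $planner->plan(7, 'failed', false), PHP_EOL; // crawl 7: retrying failed URLs only
```

Adding a new resumable status then means registering one more strategy rather than adding a branch to `plan()`.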
Step 4: Improve Error Handling
- Centralize error handling logic
- Create exception hierarchy if needed
- Catch exceptions only where they can be handled or reported meaningfully
- Separate error reporting from logic
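A minimal sketch of what a small exception hierarchy with centralized handling could look like. The exception names, the exit codes and the `runWithErrorHandling()` helper are all assumptions for illustration; the real hierarchy should follow whatever failure modes Step 1 uncovers:

```php
<?php
declare(strict_types=1);

/** Base type so callers can catch every known operational failure at once. */
abstract class OperationalCommandException extends RuntimeException
{
    /** Exit code the command should return for this failure. */
    abstract public function exitCode(): int;
}

final class CacheUnreachableException extends OperationalCommandException
{
    public function exitCode(): int
    {
        return 2;
    }
}

final class CrawlNotResumableException extends OperationalCommandException
{
    public function exitCode(): int
    {
        return 3;
    }
}

/**
 * Runs the business logic and maps failures to exit codes in one place,
 * keeping reporting (the $reportError callback) separate from the logic.
 *
 * @param callable(): void       $operation
 * @param callable(string): void $reportError
 */
function runWithErrorHandling(callable $operation, callable $reportError): int
{
    try {
        $operation();

        return 0;
    } catch (OperationalCommandException $e) {
        // Known failure: report its message and use its dedicated exit code.
        $reportError($e->getMessage());

        return $e->exitCode();
    } catch (Throwable $e) {
        // Anything else is unexpected and gets a generic exit code.
        $reportError('Unexpected error: ' . $e->getMessage());

        return 1;
    }
}
```

The command body then contains a single call to the wrapper, and the services throw typed exceptions instead of formatting error output themselves.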
Step 5: Enhance Output/Reporting
- Extract output formatting to helper
- Use consistent output methods
- Separate reporting from business logic
- Consider using progress bars for long operations
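As a sketch of the formatting helper idea, the class below owns every formatting decision and writes through an injected callback, so business logic never touches output directly. `CommandReport` and its methods are hypothetical; in the real commands the console framework's own output and progress helpers would likely sit behind the same interface:

```php
<?php
declare(strict_types=1);

/** Hypothetical helper that owns all formatting decisions for a command. */
final class CommandReport
{
    /** @var callable(string): void */
    private $write;

    /** @param callable(string): void $write receives one formatted line */
    public function __construct(callable $write)
    {
        $this->write = $write;
    }

    public function success(string $message): void
    {
        ($this->write)('[OK]    ' . $message);
    }

    public function failure(string $message): void
    {
        ($this->write)('[ERROR] ' . $message);
    }

    /** Text progress bar, for example "[#####-----] 5/10". */
    public function progress(int $done, int $total, int $width = 10): void
    {
        $filled = $total > 0 ? intdiv($done * $width, $total) : $width;
        // Clamp so out-of-range counts cannot produce a negative repeat.
        $filled = max(0, min($width, $filled));
        $bar = str_repeat('#', $filled) . str_repeat('-', $width - $filled);
        ($this->write)(sprintf('[%s] %d/%d', $bar, $done, $total));
    }
}

// Usage: print to stdout; a test would collect the lines in an array instead.
$report = new CommandReport(static function (string $line): void {
    echo $line, PHP_EOL;
});
$report->progress(5, 10);
$report->success('all crawls resumed');
```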
Success Criteria
VerifyCacheConnectionCommand
- __invoke() method < 30 lines (guideline adherence)
- Cyclomatic complexity < 10
- NPath complexity < 250
- Business logic extracted to services
- Clear, testable service boundaries
ResumeCrawlsCommand
- __invoke() method < 30 lines (guideline adherence)
- Cyclomatic complexity < 10
- NPath complexity < 250
- Resume logic in dedicated service
- Easy to test crawl resume scenarios
General
- Commands are thin orchestrators only
- Improved debuggability
- Better error messages
- Service logic is reusable
- Comprehensive test coverage
Risk Assessment
Medium-High Risk:
- Critical operational commands
- Complex existing logic to preserve
- Failure could impact crawler operations
Mitigation:
- Add tests that pin down current behavior before refactoring
- Test all error scenarios
- Maintain backward compatibility
- Run on staging before production
- Keep detailed logs during refactoring
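One way to check backward compatibility is a characterization test that runs the old and new implementations against the same scenarios and compares what a caller can observe. The sketch below assumes each implementation can be wrapped in a callable that prints its output and returns an exit code; `observe()` and `diffBehaviour()` are hypothetical helpers:

```php
<?php
declare(strict_types=1);

/**
 * Runs a command-like callable and captures what a caller can observe:
 * the exit code plus everything printed.
 *
 * @param list<mixed> $args
 * @return array{exit: int, output: string}
 */
function observe(callable $command, array $args): array
{
    ob_start();
    try {
        $exit = $command(...$args);
    } finally {
        // Always close the buffer, even if the command throws.
        $output = (string) ob_get_clean();
    }

    return ['exit' => $exit, 'output' => $output];
}

/**
 * @param array<string, list<mixed>> $scenarios scenario name => arguments
 * @return list<string> names of scenarios where behaviour differs
 */
function diffBehaviour(callable $legacy, callable $refactored, array $scenarios): array
{
    $mismatches = [];
    foreach ($scenarios as $name => $args) {
        if (observe($legacy, $args) !== observe($refactored, $args)) {
            $mismatches[] = $name;
        }
    }

    return $mismatches;
}
```

An empty result from `diffBehaviour()` across all error scenarios is a reasonable gate before moving on to staging.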
Estimated Effort
Medium - Both commands require:
- Careful analysis of existing logic
- Service extraction
- Comprehensive testing
- Documentation updates
Estimated Time:
- VerifyCacheConnectionCommand: 1-2 days
- ResumeCrawlsCommand: 1-2 days
- Testing & verification: 1 day
- Total: 3-5 days
Dependencies
None - can be addressed independently
Suggested Order
- Start with ResumeCrawlsCommand (slightly simpler, but very high NPath)
- Then VerifyCacheConnectionCommand (more complex, critical for operations)
Notes
- NPath complexity is extremely high (15,288 for ResumeCrawlsCommand, 6,912 for VerifyCacheConnectionCommand), meaning each method has thousands of distinct execution paths
- These commands are essential for operations - must maintain reliability
- Consider adding integration tests that exercise the actual commands
- May discover opportunities to improve crawler resilience during refactoring
- Both commands likely have hidden bugs due to complexity - refactoring may reveal them