Skip to content

Refactor CoffeeBeanPersister

Priority: 🟠 HIGH Status: Planning Related QA Analysis: qa-analysis-overview.md

Problem Statement

Service/Crawler/Persistance/CoffeeBeanPersister.php:52 has excessive complexity:

Violations

  • Method: processCoffeeBeanFromDTO()
  • Cyclomatic Complexity: 14/10 (40% over threshold)
  • NPath Complexity: 308/250 (23% over threshold)

Impact

  • Data Quality: Critical path for crawler data ingestion
  • Bug Risk: Complex persistence logic is fragile and error-prone
  • Maintainability: Difficult to understand and modify
  • Testing: Hard to test all execution paths
  • Mixed Concerns: Likely mixing validation, transformation, and persistence

Guideline Violations

  • SOLID - Single Responsibility Principle: Method doing too much
  • Separation of Concerns: Business logic mixed with persistence

Current Issues

Based on method name and complexity, processCoffeeBeanFromDTO() likely handles:

  1. DTO validation
  2. Data transformation (DTO → Entity)
  3. Business rule validation
  4. Relationship handling (roaster, variety, region, etc.)
  5. Duplicate detection
  6. Entity creation or update logic
  7. Persistence orchestration
  8. Error handling
  9. Possibly logging/auditing

Root Cause Analysis

High complexity likely from:

  • Multiple conditional paths for create vs update
  • Complex relationship resolution
  • Validation logic embedded in method
  • Error handling for various failure scenarios
  • Duplicate detection logic
  • Data transformation rules

Proposed Refactoring Strategy

Step 1: Analyze Current Method

  • Document all operations performed
  • Identify business rules
  • Map conditional branches
  • Find duplication opportunities
  • Review error scenarios

Step 2: Extract Validation Logic

Create dedicated validators:

  • CoffeeBeanDtoValidator - Validate DTO data
  • CoffeeBeanBusinessRules - Business rule validation
  • Keep validation separate from persistence

Step 3: Extract Transformation Logic

Create dedicated transformers:

  • CoffeeBeanDtoToEntityTransformer - DTO → Entity mapping
  • Handle data type conversions
  • Apply transformation rules
  • Keep transformation pure (no side effects)

Step 4: Extract Relationship Resolution

Create relationship resolver:

  • CoffeeBeanRelationshipResolver
  • Resolve roaster, variety, region, etc.
  • Handle relationship creation/lookup
  • Separate concern from main persistence

Step 5: Simplify Persistence Method

Reduce processCoffeeBeanFromDTO() to orchestration:

public function processCoffeeBeanFromDTO(CoffeeBeanDTO $dto): CoffeeBean
{
    // Validate
    $this->validator->validate($dto);

    // Transform
    $entity = $this->transformer->transform($dto);

    // Resolve relationships
    $this->relationshipResolver->resolve($entity, $dto);

    // Check for duplicates
    $existing = $this->duplicateDetector->find($entity);
    if ($existing) {
        return $this->updateExisting($existing, $entity);
    }

    // Persist
    return $this->persist($entity);
}

Step 6: Extract Duplicate Detection

  • CoffeeBeanDuplicateDetector service
  • Clear duplicate detection rules
  • Separate from persistence logic

Success Criteria

  • processCoffeeBeanFromDTO() cyclomatic complexity < 10
  • NPath complexity < 250
  • Method < 30 lines (guideline adherence)
  • Clear separation of concerns:
    • Validation
    • Transformation
    • Relationship resolution
    • Duplicate detection
    • Persistence
  • Improved testability
  • No regression in data quality
  • Better error messages

Testing Strategy

Before Refactoring

  • Add comprehensive integration tests
  • Test all known scenarios:
    • New coffee bean creation
    • Duplicate detection
    • Update existing bean
    • Various DTO variations
    • Error scenarios

After Refactoring

  • Unit test each extracted component
  • Verify integration tests still pass
  • Add tests for edge cases discovered
  • Test error handling paths

Risk Assessment

High Risk:

  • Critical data ingestion path
  • Affects crawler data quality
  • Complex existing logic to preserve
  • Potential for subtle bugs

Mitigation:

  • Comprehensive test coverage first
  • Incremental extraction (one concern at a time)
  • Extensive integration testing
  • Test with production-like data
  • Feature flag new implementation
  • Monitor data quality metrics post-deployment

Estimated Effort

Medium:

  • Method analysis: 0.5 day
  • Test coverage: 1 day
  • Validator extraction: 0.5 day
  • Transformer extraction: 1 day
  • Relationship resolver: 1 day
  • Duplicate detector: 0.5 day
  • Integration & testing: 1.5 days
  • Total: 6 days

Implementation Order

  1. Add test coverage (critical!)
  2. Extract validator (easiest, least risky)
  3. Extract transformer
  4. Extract relationship resolver
  5. Extract duplicate detector
  6. Simplify main method
  7. Comprehensive testing

Dependencies

  • H1: Simplify Domain Entities - CoffeeBean entity refactoring may impact this
    • Consider coordinating if entity gets value objects
    • Transformer may need updates
  • May reveal data quality issues during refactoring
  • Could expose unclear business rules
  • Might find opportunities to improve duplicate detection

Notes

  • This is the PRIMARY data ingestion point for crawler
  • Data quality is paramount - test thoroughly
  • Complex method suggests unclear requirements - may need domain expert input
  • Consider adding data quality metrics/monitoring
  • May discover edge cases not currently handled
  • Refactoring could improve crawler reliability overall