Coffee Bean Image Management - Planning Document

This document covers three related image features:

  1. Backend Image Dashboard (Priority: ASAP)
  2. Broken Image Crawl Job (Priority: Medium-term)
  3. Image Caching & Proxy (Priority: Later, needs most planning)


Design Decisions (Confirmed)

| Decision | Choice | Rationale |
|---|---|---|
| Phase 2: Check result storage | Separate ImageCheck table | Audit trail, check history, cleaner entity |
| Phase 2: Validation method | GET + magic bytes | Most thorough - verifies actual image data |
| Phase 3: Storage backend | S3/MinIO | Scalable, CDN-ready for future |
| Phase 3: Image transformation | Optimize + thumbnails | WebP conversion, multiple sizes |
| Phase 3: API response | Replace imageUrl | Proxy URL replaces original in API response |

Current State

Entity Structure

  • CoffeeBean.imageUrl: nullable VARCHAR(255) storing external URLs
  • Images extracted from JSON-LD Product schema or Open Graph og:image meta tags
  • URLs normalized (scheme-less // converted to https:)
  • Validated with Symfony URL constraint during FullDiscovery group
  • Image URL is one of 6 "strongly recommended" fields affecting data quality status

Existing Patterns

  • Admin: EasyAdmin 4.x with custom CRUD controllers and dashboards
  • Scheduling: Symfony Scheduler with #[AsSchedule] providers
  • Async Jobs: Symfony Messenger with Doctrine transport
  • HTTP Client: HttpClientInterface with custom exception handling

Phase 1: Backend Image Dashboard (ASAP)

Goal

Create an admin view showing coffee beans with image thumbnails, accessible from the review dashboard.

Implementation Approach

New CRUD Controller: CoffeeBeanImageCrudController

  • Focused view with minimal fields: thumbnail, name, roaster name
  • Filters: roaster, has/missing image URL, data status
  • Sortable by creation date, roaster

Fields Configuration

- ImageField::new('imageUrl')->setLabel('Image') - thumbnail display
- TextField::new('name')->setLabel('Bean Name')
- AssociationField or formatted text for roaster name (via CrawlUrl relationship)
- BooleanField or ChoiceField for image status (present/missing)

Filters

  • RoasterFilter (existing) - filter by roaster
  • BooleanFilter or custom filter for imageUrl IS NOT NULL
  • ChoiceFilter for status

Navigation

  • Add menu item in DashboardController under "Coffee Bean Management"
  • Add link card in review_dashboard.html.twig

Files to Create/Modify

  • src/Controller/Admin/CoffeeBeanImageCrudController.php (new)
  • src/Controller/Admin/DashboardController.php (menu item)
  • templates/admin/review_dashboard.html.twig (link card)
  • Possibly: src/Filter/HasImageFilter.php (custom filter if needed)

Phase 2: Broken Image Crawl Job (Medium-term)

Goal

Periodically verify image URLs return HTTP 200 and contain valid image content.

Design: Separate ImageCheck Entity

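namespace App\Entity;  // assumed from the src/Entity/ path

use App\Enum\ImageCheckStatus;
use DateTimeImmutable;
use Doctrine\ORM\Mapping as ORM;
use Symfony\Component\Uid\Uuid;  // assumes symfony/uid; adjust if the project uses another UUID library

#[ORM\Entity]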
class ImageCheck
{
    #[ORM\Id]
    #[ORM\Column(type: 'uuid')]
    private Uuid $id;

    #[ORM\ManyToOne(targetEntity: CoffeeBean::class)]
    #[ORM\JoinColumn(nullable: false, onDelete: 'CASCADE')]
    private CoffeeBean $coffeeBean;

    #[ORM\Column(length: 255)]
    private string $imageUrl;  // Snapshot of URL at check time

    #[ORM\Column(type: 'datetime_immutable')]
    private DateTimeImmutable $checkedAt;

    #[ORM\Column(enumType: ImageCheckStatus::class)]
    private ImageCheckStatus $status;  // VALID, BROKEN, TIMEOUT, ERROR

    #[ORM\Column(nullable: true)]
    private ?int $httpStatusCode = null;

    #[ORM\Column(length: 100, nullable: true)]
    private ?string $contentType = null;

    #[ORM\Column(length: 20, nullable: true)]
    private ?string $detectedFormat = null;  // From magic bytes: jpeg, png, webp, gif

    #[ORM\Column(nullable: true)]
    private ?int $contentLength = null;

    #[ORM\Column(type: 'text', nullable: true)]
    private ?string $errorMessage = null;
}

Validation: GET + Magic Bytes

Image magic byte signatures to detect:

  • JPEG: FF D8 FF
  • PNG: 89 50 4E 47 0D 0A 1A 0A
  • GIF: 47 49 46 38 (GIF8)
  • WebP: 52 49 46 46 ... 57 45 42 50 (RIFF...WEBP)
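The signature list can be turned into a small detection routine. The following is a minimal sketch, not the planned service itself: `detectImageFormat` is a hypothetical helper name, and it only looks at the leading bytes it is given.

```php
<?php
declare(strict_types=1);

/**
 * Detects an image format from the leading bytes of a response body.
 * Returns 'jpeg', 'png', 'gif', 'webp', or null when nothing matches.
 * Hypothetical helper; the name is not taken from the existing codebase.
 */
function detectImageFormat(string $bytes): ?string
{
    // Fixed-prefix signatures from the list in this section.
    $prefixes = [
        'jpeg' => "\xFF\xD8\xFF",
        'png'  => "\x89PNG\r\n\x1A\n",
        'gif'  => 'GIF8',
    ];

    foreach ($prefixes as $format => $signature) {
        if (str_starts_with($bytes, $signature)) {
            return $format;
        }
    }

    // WebP is a RIFF container: "RIFF", a 4-byte length field, then "WEBP".
    if (strlen($bytes) >= 12
        && str_starts_with($bytes, 'RIFF')
        && substr($bytes, 8, 4) === 'WEBP'
    ) {
        return 'webp';
    }

    return null;
}
```

The WebP check needs bytes 8 to 11, so at least 12 bytes of the body must be available; a 16-byte prefix is enough for all four formats.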

Service will:

  1. Send GET request with Range: bytes=0-15 header (fetch first 16 bytes only)
  2. Check HTTP status code (200 or 206)
  3. Verify Content-Type header starts with image/
  4. Match magic bytes against known signatures
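Steps 2 to 4 amount to a pure decision over the response, which might look like the sketch below. The enum backing values and the function name `classifyImageResponse` are assumptions; the plan only names the four cases. TIMEOUT and ERROR are not produced here, on the assumption that the caller maps transport exceptions to them.

```php
<?php
declare(strict_types=1);

// Backing values are an assumption; the plan only lists the case names.
enum ImageCheckStatus: string
{
    case VALID = 'valid';
    case BROKEN = 'broken';
    case TIMEOUT = 'timeout';
    case ERROR = 'error';
}

/**
 * Classifies an already-received response (hypothetical helper).
 * $detectedFormat is the magic-byte result, or null if no signature matched.
 */
function classifyImageResponse(
    int $httpStatus,
    ?string $contentType,
    ?string $detectedFormat
): ImageCheckStatus {
    // Step 2: 206 is a partial response; 200 means the server ignored Range.
    if ($httpStatus !== 200 && $httpStatus !== 206) {
        return ImageCheckStatus::BROKEN;
    }

    // Step 3: the Content-Type must be some image/* type.
    if ($contentType === null
        || !str_starts_with(strtolower(trim($contentType)), 'image/')
    ) {
        return ImageCheckStatus::BROKEN;
    }

    // Step 4: the body must start with a known image signature.
    return $detectedFormat !== null
        ? ImageCheckStatus::VALID
        : ImageCheckStatus::BROKEN;
}
```

Because a server may answer 200 with the full body, the real service would probably want to read only the first 16 bytes of the stream rather than buffer the whole response.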

Implementation

Entity & Enum

  • src/Entity/ImageCheck.php
  • src/Enum/ImageCheckStatus.php (VALID, BROKEN, TIMEOUT, ERROR)

Service: ImageValidationService

  • validate(string $url): ImageCheckResult
  • Uses HttpClient with Range header
  • Magic byte detection logic
  • Returns structured result

Scheduler: ImageValidationSchedulerService

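  • Cron: 0 3 */3 * * (3 AM on days 1, 4, 7, ... of each month; the cadence restarts on the 1st, so the gap at month end can be shorter than 3 days)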
  • Query: CoffeeBeans with imageUrl where no ImageCheck in last 3 days
  • Dispatch ImageValidationMessage per bean
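The "no ImageCheck in last 3 days" rule can be expressed as a small predicate, shown here as a sketch. `isDueForCheck` is a hypothetical name, and in practice the same condition would more likely live in a repository query than in PHP.

```php
<?php
declare(strict_types=1);

/**
 * Returns true when a bean's image should be validated again.
 * $lastCheckedAt is the newest ImageCheck timestamp, or null if never checked.
 * Hypothetical helper illustrating the scheduler's selection rule.
 */
function isDueForCheck(
    ?DateTimeImmutable $lastCheckedAt,
    DateTimeImmutable $now,
    int $intervalDays = 3
): bool {
    if ($lastCheckedAt === null) {
        return true; // never checked before
    }

    // Due when the last check is at least $intervalDays old.
    $threshold = $now->sub(new DateInterval("P{$intervalDays}D"));

    return $lastCheckedAt <= $threshold;
}
```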

Message & Handler

  • src/Message/ImageValidationMessage.php
  • src/MessageHandler/ImageValidationHandler.php

Command: app:validate-images

  • Manual trigger with options: --dry-run, --limit=N, --force

Files to Create

  • src/Entity/ImageCheck.php
  • src/Repository/ImageCheckRepository.php
  • src/Enum/ImageCheckStatus.php
  • src/Service/Image/ImageValidationService.php
  • src/Scheduler/ImageValidationSchedulerService.php
  • src/Message/ImageValidationMessage.php
  • src/MessageHandler/ImageValidationHandler.php
  • src/Command/ValidateImagesCommand.php
  • Migration for image_check table

Phase 3: Image Caching & Proxy (Later - Needs Most Planning)

Goal

Cache external images in S3/MinIO, transform to optimized formats, and serve through our infrastructure.

Storage: S3/MinIO with Flysystem

Use league/flysystem-aws-s3-v3 for abstraction:

  • Development: MinIO container
  • Production: S3 or S3-compatible storage
  • Easy CDN integration later (CloudFront, etc.)

Bucket Structure

images/
├── original/
│   └── {bean_uuid}.{ext}      # Original fetched image
├── optimized/
│   └── {bean_uuid}.webp       # Full-size WebP
└── thumbnails/
    ├── {bean_uuid}_sm.webp    # 150x150
    └── {bean_uuid}_md.webp    # 400x400
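The layout above maps directly to object keys. The sketch below assumes `images/` is a key prefix inside the bucket rather than the bucket name, and `imageStorageKey` is a hypothetical helper.

```php
<?php
declare(strict_types=1);

/**
 * Builds the object key for one stored variant of a bean's image.
 * Variants: 'original', 'full', 'sm', 'md'. Hypothetical helper.
 */
function imageStorageKey(
    string $beanUuid,
    string $variant,
    ?string $originalExt = null
): string {
    return match ($variant) {
        // The original keeps its source extension, so it must be supplied.
        'original' => sprintf(
            'images/original/%s.%s',
            $beanUuid,
            $originalExt ?? throw new InvalidArgumentException(
                'An extension is required for the original variant.'
            )
        ),
        'full' => sprintf('images/optimized/%s.webp', $beanUuid),
        'sm', 'md' => sprintf('images/thumbnails/%s_%s.webp', $beanUuid, $variant),
        default => throw new InvalidArgumentException("Unknown variant: {$variant}"),
    };
}
```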

Database: CachedImage Entity

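namespace App\Entity;  // assumed from the src/Entity/ path

use App\Enum\CachedImageStatus;
use DateTimeImmutable;
use Doctrine\ORM\Mapping as ORM;
use Symfony\Component\Uid\Uuid;  // assumes symfony/uid; adjust if the project uses another UUID library

#[ORM\Entity]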
class CachedImage
{
    #[ORM\Id]
    #[ORM\Column(type: 'uuid')]
    private Uuid $id;

    #[ORM\OneToOne(targetEntity: CoffeeBean::class)]
    #[ORM\JoinColumn(nullable: false, onDelete: 'CASCADE')]
    private CoffeeBean $coffeeBean;

    #[ORM\Column(length: 255)]
    private string $originalUrl;  // Source URL

    #[ORM\Column(length: 64)]
    private string $originalUrlHash;  // SHA256 for dedup

    #[ORM\Column(length: 64, nullable: true)]
    private ?string $contentHash = null;  // SHA256 of image bytes

    #[ORM\Column(type: 'datetime_immutable')]
    private DateTimeImmutable $cachedAt;

    #[ORM\Column(type: 'datetime_immutable', nullable: true)]
    private ?DateTimeImmutable $lastValidatedAt = null;

    #[ORM\Column(enumType: CachedImageStatus::class)]
    private CachedImageStatus $status;  // PENDING, CACHED, FAILED, STALE

    #[ORM\Column(length: 50, nullable: true)]
    private ?string $originalMimeType = null;

    #[ORM\Column(nullable: true)]
    private ?int $originalSize = null;

    #[ORM\Column(nullable: true)]
    private ?int $optimizedSize = null;

    #[ORM\Column(type: 'json', nullable: true)]
    private ?array $variants = null;  // ['sm' => true, 'md' => true, 'full' => true]
}
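Both hash columns are 64 characters long because a hex-encoded SHA-256 digest is always 64 characters. A minimal sketch of how the two values might be computed (the function names are hypothetical):

```php
<?php
declare(strict_types=1);

/** Hash of the source URL, usable as a dedup key for originalUrlHash. */
function hashOriginalUrl(string $url): string
{
    return hash('sha256', $url);
}

/** Hash of the fetched image bytes, for contentHash. */
function hashImageContent(string $bytes): string
{
    return hash('sha256', $bytes);
}
```

Two beans pointing at different URLs that serve identical bytes would share a contentHash, which is one way duplicate uploads could be detected later.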

Image Transformation: Intervention Image

Use intervention/image with GD or Imagick driver:

  • Convert to WebP (80% quality)
  • Generate thumbnail sizes: 150x150 (sm), 400x400 (md)
  • Preserve aspect ratio with cover/contain

Sizes Configuration

// config/packages/image_cache.php or service parameter
'image_variants' => [
    'sm' => ['width' => 150, 'height' => 150, 'fit' => 'cover'],
    'md' => ['width' => 400, 'height' => 400, 'fit' => 'contain'],
    'full' => ['width' => 1200, 'height' => 1200, 'fit' => 'contain'],  // max size
]
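To make the difference between the two fit modes concrete, here is a sketch of the output dimensions each would produce. It is independent of Intervention Image; `fitDimensions` is a hypothetical helper, and the choice never to upscale in `contain` mode is an assumption.

```php
<?php
declare(strict_types=1);

/**
 * Computes output dimensions for a variant.
 * - 'cover': the image is scaled and cropped to fill the box exactly.
 * - 'contain': the image is scaled to fit inside the box, keeping its
 *   aspect ratio; smaller images are left at their size (assumption).
 *
 * @return array{width: int, height: int}
 */
function fitDimensions(int $srcW, int $srcH, int $boxW, int $boxH, string $fit): array
{
    if ($srcW <= 0 || $srcH <= 0) {
        throw new InvalidArgumentException('Source dimensions must be positive.');
    }

    if ($fit === 'cover') {
        return ['width' => $boxW, 'height' => $boxH];
    }

    if ($fit !== 'contain') {
        throw new InvalidArgumentException("Unknown fit mode: {$fit}");
    }

    // The smaller ratio is the limiting side; cap at 1.0 to avoid upscaling.
    $scale = min($boxW / $srcW, $boxH / $srcH, 1.0);

    return [
        'width'  => max(1, (int) round($srcW * $scale)),
        'height' => max(1, (int) round($srcH * $scale)),
    ];
}
```

With these rules a 2000x1000 source yields a 400x200 `md` image but a cropped 150x150 `sm` thumbnail.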

Cache Invalidation Strategy

  1. Time-based: Re-validate after 7 days (via Phase 2 job)
  2. On broken detection: Phase 2 marks as STALE, triggers re-fetch
  3. Manual: Admin action to invalidate and re-cache
  4. Source URL change: If CoffeeBean.imageUrl changes, invalidate
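The four rules can be combined into one decision function, sketched below. The name `cacheAction` and the three return values are assumptions made for illustration; `$markedStale` stands for both the broken-detection and manual cases.

```php
<?php
declare(strict_types=1);

/**
 * Decides what to do with a cached image (hypothetical helper).
 * Returns 'invalidate' (re-fetch), 'revalidate' (run a check), or 'keep'.
 */
function cacheAction(
    string $cachedUrlHash,
    ?string $currentUrl,
    ?DateTimeImmutable $lastValidatedAt,
    DateTimeImmutable $now,
    bool $markedStale = false,
    int $maxAgeDays = 7
): string {
    // Rule 4: the bean's imageUrl changed (or was removed).
    if ($currentUrl === null
        || !hash_equals($cachedUrlHash, hash('sha256', $currentUrl))
    ) {
        return 'invalidate';
    }

    // Rules 2 and 3: flagged by the validation job or by an admin.
    if ($markedStale) {
        return 'invalidate';
    }

    // Rule 1: never validated, or last validated at least $maxAgeDays ago.
    $threshold = $now->sub(new DateInterval("P{$maxAgeDays}D"));
    if ($lastValidatedAt === null || $lastValidatedAt <= $threshold) {
        return 'revalidate';
    }

    return 'keep';
}
```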

API Integration

Proxy Controller

GET /api/images/{beanUuid}              → Full optimized WebP
GET /api/images/{beanUuid}?size=sm      → 150x150 thumbnail
GET /api/images/{beanUuid}?size=md      → 400x400 medium
GET /api/images/{beanUuid}?original=1   → Original (if stored)
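The query parameters on these routes could be resolved to a single variant name as in the sketch below. `resolveRequestedVariant` is a hypothetical helper; accepting an explicit `size=full` and rejecting unknown sizes are assumptions, since the plan does not say how invalid values are handled.

```php
<?php
declare(strict_types=1);

/**
 * Maps the proxy route's query parameters to a variant name:
 * 'original', 'sm', 'md', or 'full' (the default).
 *
 * @param array<string, string> $query Parsed query string.
 */
function resolveRequestedVariant(array $query): string
{
    // ?original=1 takes precedence over any size parameter.
    if (($query['original'] ?? null) === '1') {
        return 'original';
    }

    $size = $query['size'] ?? 'full';

    if (!in_array($size, ['sm', 'md', 'full'], true)) {
        throw new InvalidArgumentException("Unsupported size: {$size}");
    }

    return $size;
}
```

In a controller, the exception would typically be translated into a 400 response.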

Response headers:

  • Cache-Control: public, max-age=86400 (1 day)
  • ETag based on content hash
  • Content-Type: image/webp
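A sketch of how these headers and the matching conditional-request check might be built. The function names are hypothetical, and appending the variant to the ETag is an assumption: every size has different bytes, so a hash of the original content alone would not distinguish them.

```php
<?php
declare(strict_types=1);

/**
 * Builds the response headers for a served WebP variant.
 *
 * @return array<string, string>
 */
function imageResponseHeaders(string $contentHash, string $variant): array
{
    return [
        'Cache-Control' => 'public, max-age=86400',
        // ETag values are quoted strings; the variant suffix is an assumption.
        'ETag'          => sprintf('"%s-%s"', $contentHash, $variant),
        'Content-Type'  => 'image/webp',
    ];
}

/**
 * Returns true when the client's If-None-Match header matches our ETag,
 * in which case a 304 Not Modified can be sent without a body.
 */
function isNotModified(?string $ifNoneMatch, string $etag): bool
{
    if ($ifNoneMatch === null) {
        return false;
    }

    foreach (explode(',', $ifNoneMatch) as $candidate) {
        $candidate = trim($candidate);

        if ($candidate === '*') {
            return true;
        }

        // If-None-Match uses weak comparison, so ignore a W/ prefix.
        if (str_starts_with($candidate, 'W/')) {
            $candidate = substr($candidate, 2);
        }

        if ($candidate === $etag) {
            return true;
        }
    }

    return false;
}
```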

DTO Changes

Modify CoffeeBeanDTO mapping:

  • imageUrl returns the proxy URL when cached, falling back to the original URL if not cached
  • Original URL stored in CachedImage.originalUrl for reference
  • Add imageVariants array for size options: ['sm' => url, 'md' => url, 'full' => url]

// In EntityToDtoMapper
$imageUrl = $cachedImage?->getStatus() === CachedImageStatus::CACHED
    ? $this->imageProxyUrlGenerator->generate($coffeeBean)
    : $coffeeBean->getImageUrl();
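The imageVariants array could be derived from the proxy routes in the same mapper. The following is only a sketch: `buildImageVariants` and the `$baseUrl` parameter are hypothetical, and a real implementation would more likely go through the router or ImageProxyUrlGenerator.

```php
<?php
declare(strict_types=1);

/**
 * Builds the per-size proxy URLs for one bean.
 *
 * @return array{sm: string, md: string, full: string}
 */
function buildImageVariants(string $baseUrl, string $beanUuid): array
{
    $base = rtrim($baseUrl, '/') . '/api/images/' . rawurlencode($beanUuid);

    return [
        'sm'   => $base . '?size=sm',
        'md'   => $base . '?size=md',
        'full' => $base, // no size parameter returns the full optimized WebP
    ];
}
```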

Implementation

Services

  • ImageCacheService: Orchestrates caching workflow
  • ImageStorageService: S3/Flysystem operations
  • ImageTransformService: Resize/convert with Intervention
  • ImageProxyUrlGenerator: Generate signed/public URLs

Async Processing

  • ImageCacheMessage: Trigger caching for a bean
  • ImageCacheHandler: Fetch, transform, upload to S3
  • Batch job for initial migration of existing images

Controller

  • ImageProxyController: Serve images, handle cache-on-demand

Integration with Phase 2

  1. Phase 2 validates → marks ImageCheck as BROKEN
  2. Listener detects broken check → marks CachedImage as STALE
  3. Re-cache job picks up STALE images → attempts re-fetch
  4. If still broken → CachedImage status = FAILED, API returns fallback
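The flow above is a small state machine over CachedImageStatus. A sketch, with assumed backing values and hypothetical function names:

```php
<?php
declare(strict_types=1);

// Backing values are an assumption; the plan only lists the case names.
enum CachedImageStatus: string
{
    case PENDING = 'pending';
    case CACHED = 'cached';
    case FAILED = 'failed';
    case STALE = 'stale';
}

/** Step 2: a broken ImageCheck demotes a cached image to STALE. */
function onBrokenCheck(CachedImageStatus $current): CachedImageStatus
{
    return $current === CachedImageStatus::CACHED
        ? CachedImageStatus::STALE
        : $current;
}

/** Steps 3 and 4: the re-cache job resolves PENDING or STALE images. */
function onFetchResult(CachedImageStatus $current, bool $succeeded): CachedImageStatus
{
    $awaitingFetch = $current === CachedImageStatus::STALE
        || $current === CachedImageStatus::PENDING;

    if (!$awaitingFetch) {
        return $current; // nothing to resolve
    }

    return $succeeded ? CachedImageStatus::CACHED : CachedImageStatus::FAILED;
}
```

Since the mapper only uses the proxy URL for CACHED images, a STALE or FAILED image makes the API fall back to the original URL.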

Files to Create

  • src/Entity/CachedImage.php
  • src/Repository/CachedImageRepository.php
  • src/Enum/CachedImageStatus.php
  • src/Service/Image/ImageCacheService.php
  • src/Service/Image/ImageStorageService.php
  • src/Service/Image/ImageTransformService.php
  • src/Service/Image/ImageProxyUrlGenerator.php
  • src/Controller/Api/ImageProxyController.php
  • src/Message/ImageCacheMessage.php
  • src/MessageHandler/ImageCacheHandler.php
  • src/EventListener/BrokenImageListener.php (optional: event-driven)
  • Migration for cached_image table
  • Config for Flysystem S3 adapter

Dependencies to Add

composer require league/flysystem-aws-s3-v3
composer require intervention/image

Implementation Order

  1. Phase 1: Image Dashboard - Implement ASAP (simple, self-contained)
  2. Phase 2: Broken Image Job - After Phase 1 (foundation for Phase 3)
  3. Phase 3: Image Caching - After Phase 2 (depends on validation infrastructure)

Critical Files Reference

Existing (patterns to follow)

  • src/Controller/Admin/CoffeeBeanCrudController.php - CRUD controller pattern
  • src/Controller/Admin/DashboardController.php - Menu items, custom routes
  • src/Filter/RoasterFilter.php - Custom filter through relationships
  • src/Scheduler/AvailabilityCrawlSchedulerService.php - Scheduler pattern
  • src/MessageHandler/CrawlStepHandler.php - Message handler pattern
  • templates/admin/review_dashboard.html.twig - Dashboard template

New Files Summary

Phase 1 (3-4 files)

  • src/Controller/Admin/CoffeeBeanImageCrudController.php
  • src/Filter/HasImageFilter.php (optional)
  • Modify: DashboardController.php, review_dashboard.html.twig

Phase 2 (8 files + migration)

  • src/Entity/ImageCheck.php
  • src/Repository/ImageCheckRepository.php
  • src/Enum/ImageCheckStatus.php
  • src/Service/Image/ImageValidationService.php
  • src/Scheduler/ImageValidationSchedulerService.php
  • src/Message/ImageValidationMessage.php
  • src/MessageHandler/ImageValidationHandler.php
  • src/Command/ValidateImagesCommand.php

Phase 3 (12+ files + migration + config)

  • src/Entity/CachedImage.php
  • src/Repository/CachedImageRepository.php
  • src/Enum/CachedImageStatus.php
  • src/Service/Image/ImageCacheService.php
  • src/Service/Image/ImageStorageService.php
  • src/Service/Image/ImageTransformService.php
  • src/Service/Image/ImageProxyUrlGenerator.php
  • src/Controller/Api/ImageProxyController.php
  • src/Message/ImageCacheMessage.php
  • src/MessageHandler/ImageCacheHandler.php
  • Config: Flysystem S3 adapter
  • Modify: CoffeeBeanDTO.php, EntityToDtoMapper.php