Add Statistics to RoasterCrudController Index¶
Priority: 🟢 LOW - UX Improvement
Status: Planning
Type: Enhancement
Related Files:
- src/Controller/Admin/RoasterCrudController.php
- src/Entity/Roaster.php
- src/Entity/RoasterCrawlConfig.php
- src/Entity/CrawlUrl.php
- src/Entity/CoffeeBean.php
Problem Statement¶
The RoasterCrudController INDEX view doesn't show key statistics about each roaster's crawling activity. Administrators need to click into the detail view to understand which roasters are configured and how productive they are, making it harder to quickly assess roaster health and activity.
User Story¶
As an administrator I want to see crawl configs, URLs, and extracted coffee beans counts for each roaster in the index view So that I can quickly assess roaster configuration and productivity without clicking into details
Current Behavior¶
RoasterCrudController::INDEX displays: - ID (detail only) - Name - Country - Website
Statistics currently hidden: - Crawl configs: Only visible on DETAIL page (lines 71-94) - formatted as clickable links - Crawl URLs: No visibility anywhere - Extracted coffee beans: No visibility anywhere
Administrators cannot easily answer: - Which roasters are configured for crawling? - How many URLs have been discovered for each roaster? - How many coffee beans have been extracted from each roaster? - Which roasters are most/least productive?
Proposed Solution¶
Add three computed fields to display key statistics on the INDEX page: 1. Crawl Configs - Number of crawl configurations 2. URLs - Number of discovered crawl URLs 3. Beans - Number of extracted coffee beans
Design Pattern¶
Follow existing EasyAdmin patterns in the codebase:
- Use TextField::new() with formatValue() callback (established pattern)
- Display on INDEX only (onlyOnIndex())
- Keep consistent with existing field formatters (see lines 75-94)
Entity Relationships¶
Understanding the data structure:
Roaster
↓ OneToMany
RoasterCrawlConfig
↓ OneToMany (inverse side, accessed via repository)
CrawlUrl
↓ ManyToOne
CoffeeBean
Key points:
- Roaster has direct access to crawlConfigs (OneToMany)
- CrawlUrl belongs to RoasterCrawlConfig (not directly to Roaster)
- CoffeeBean is linked via CrawlUrl
Implementation Approach¶
Recommended by Architect: TextField with formatValue callback for all three counts
1. Crawl Configs Count (Simple)¶
yield TextField::new('crawlConfigCount')
->setLabel('Configs')
->onlyOnIndex()
->setSortable(false)
->formatValue(static function ($value, Roaster $entity): string {
return (string) $entity->getCrawlConfigs()->count();
});
2. Crawl URLs Count (Requires Repository Query)¶
// Note: Requires injecting CrawlUrlRepository in constructor
yield TextField::new('urlCount')
->setLabel('URLs')
->onlyOnIndex()
->setSortable(false)
->formatValue(function ($value, Roaster $entity): string {
// Must use non-static callback to access $this->crawlUrlRepository
$crawlConfigIds = $entity->getCrawlConfigs()->map(
fn(RoasterCrawlConfig $config) => $config->getId()
)->toArray();
if (empty($crawlConfigIds)) {
return '0';
}
$count = $this->crawlUrlRepository->createQueryBuilder('cu')
->select('COUNT(cu.id)')
->where('cu.roasterCrawlConfig IN (:configIds)')
->setParameter('configIds', $crawlConfigIds)
->getQuery()
->getSingleScalarResult();
return (string) $count;
});
3. Coffee Beans Count (Requires Repository Query)¶
// Note: Requires injecting CrawlUrlRepository in constructor
yield TextField::new('beanCount')
->setLabel('Beans')
->onlyOnIndex()
->setSortable(false)
->formatValue(function ($value, Roaster $entity): string {
// Must use non-static callback to access $this->crawlUrlRepository
$crawlConfigIds = $entity->getCrawlConfigs()->map(
fn(RoasterCrawlConfig $config) => $config->getId()
)->toArray();
if (empty($crawlConfigIds)) {
return '0';
}
// Count DISTINCT coffee beans linked to this roaster's crawl URLs
$count = $this->crawlUrlRepository->createQueryBuilder('cu')
->select('COUNT(DISTINCT cu.coffeeBean)')
->where('cu.roasterCrawlConfig IN (:configIds)')
->andWhere('cu.coffeeBean IS NOT NULL')
->setParameter('configIds', $crawlConfigIds)
->getQuery()
->getSingleScalarResult();
return (string) $count;
});
Why This Approach?¶
Pros:
1. ✅ Follows established pattern - Uses formatValue() callbacks
2. ✅ Pagination-aware - Only executes for displayed rows (25-50 per page)
3. ✅ No entity changes - No domain model pollution
4. ✅ Accurate counts - Database-level aggregation ensures correctness
5. ✅ Efficient queries - COUNT operations are optimized by database
Cons:
- ⚠️ N+1 Query Potential - Each roaster triggers 2 additional COUNT queries (URLs + Beans)
- ⚠️ Performance concern - With 50 roasters per page: 100 additional queries
- ⚠️ Repository injection needed - Non-static callbacks require CrawlUrlRepository
Performance Considerations¶
Query Count Analysis:
For INDEX page with 25 roasters: - Base query: 1 (fetch roasters) - Crawl configs: Already loaded (OneToMany, no extra query) - URL counts: 25 queries (one COUNT per roaster) - Bean counts: 25 queries (one COUNT per roaster) - Total: 51 queries
For INDEX page with 50 roasters: - Base query: 1 - URL counts: 50 queries - Bean counts: 50 queries - Total: 101 queries
This is a classic N+1 problem and needs optimization!
Recommended Optimization: Custom Query Builder¶
Override createIndexQueryBuilder() to fetch all counts in a single query using subqueries:
public function createIndexQueryBuilder(
SearchDto $searchDto,
EntityDto $entityDto,
FieldCollection $fields,
FilterCollection $filters
): QueryBuilder {
$qb = parent::createIndexQueryBuilder($searchDto, $entityDto, $fields, $filters);
// Add subqueries for URL and Bean counts
$qb->addSelect('(
SELECT COUNT(cu.id)
FROM App\Entity\CrawlUrl cu
JOIN cu.roasterCrawlConfig rcc
WHERE rcc.roaster = entity
) as HIDDEN urlCount')
->addSelect('(
SELECT COUNT(DISTINCT cu.coffeeBean)
FROM App\Entity\CrawlUrl cu
JOIN cu.roasterCrawlConfig rcc
WHERE rcc.roaster = entity
AND cu.coffeeBean IS NOT NULL
) as HIDDEN beanCount');
return $qb;
}
Then use the pre-computed values in formatValue:
// Access pre-computed values from query result
yield TextField::new('urlCount')
->setLabel('URLs')
->onlyOnIndex()
->setSortable(false);
yield TextField::new('beanCount')
->setLabel('Beans')
->onlyOnIndex()
->setSortable(false);
Query Count with This Optimization: - Single query with subqueries - Total: 1 query (vs 51-101 queries) - 50-100x performance improvement
Alternative Approaches Considered¶
❌ Option 1: Entity Method¶
// In Roaster entity
public function getCrawlConfigCount(): int
{
return $this->crawlConfigs->count();
}
// In controller
yield TextField::new('crawlConfigCount')
->onlyOnIndex();
Why rejected: Couples display logic to domain model. The entity shouldn't be concerned with admin UI display needs.
❌ Option 2: Custom Query Builder¶
public function createIndexQueryBuilder(/*...*/)
{
return parent::createIndexQueryBuilder(/*...*/)
->addSelect('COUNT(crawlConfigs) as crawlConfigCount')
->leftJoin('entity.crawlConfigs', 'crawlConfigs')
->groupBy('entity.id');
}
Why rejected: Over-engineered. Adds complexity to query builder for simple count display. EasyAdmin's formatValue is designed for this.
❌ Option 3: Virtual Property with Doctrine¶
Why rejected: Adds unnecessary database column for computed value. Violates normalization.
Implementation Plan¶
Step 1: Override createIndexQueryBuilder¶
File: src/Controller/Admin/RoasterCrudController.php
Location: Add new method after configureFields()
use Doctrine\ORM\QueryBuilder;
use EasyCorp\Bundle\EasyAdminBundle\Collection\FieldCollection;
use EasyCorp\Bundle\EasyAdminBundle\Collection\FilterCollection;
use EasyCorp\Bundle\EasyAdminBundle\Dto\EntityDto;
use EasyCorp\Bundle\EasyAdminBundle\Dto\SearchDto;
public function createIndexQueryBuilder(
SearchDto $searchDto,
EntityDto $entityDto,
FieldCollection $fields,
FilterCollection $filters
): QueryBuilder {
$qb = parent::createIndexQueryBuilder($searchDto, $entityDto, $fields, $filters);
// Add subqueries for URL and Bean counts to avoid N+1 queries
$qb->addSelect('(
SELECT COUNT(cu.id)
FROM App\Entity\CrawlUrl cu
JOIN cu.roasterCrawlConfig rcc
WHERE rcc.roaster = entity
) as HIDDEN urlCount')
->addSelect('(
SELECT COUNT(DISTINCT cu.coffeeBean)
FROM App\Entity\CrawlUrl cu
JOIN cu.roasterCrawlConfig rcc
WHERE rcc.roaster = entity
AND cu.coffeeBean IS NOT NULL
) as HIDDEN beanCount');
return $qb;
}
Step 2: Add Count Fields¶
File: src/Controller/Admin/RoasterCrudController.php
Location: After line 62 (after website field, before crawlConfigs fieldset)
// Crawl Configs count - simple collection count
yield TextField::new('crawlConfigCount')
->setLabel('Configs')
->onlyOnIndex()
->setSortable(false)
->formatValue(static function ($value, Roaster $entity): string {
return (string) $entity->getCrawlConfigs()->count();
});
// URLs count - uses pre-computed value from createIndexQueryBuilder
yield TextField::new('urlCount')
->setLabel('URLs')
->onlyOnIndex()
->setSortable(false);
// Beans count - uses pre-computed value from createIndexQueryBuilder
yield TextField::new('beanCount')
->setLabel('Beans')
->onlyOnIndex()
->setSortable(false);
Step 3: Update Field Order (Optional)¶
Consider reordering fields for better UX:
Current Order: 1. Name 2. Country 3. Website
Proposed Order: 1. Name 2. Country 3. Configs ← New field 4. URLs ← New field 5. Beans ← New field 6. Website
Rationale: Statistics are more actionable than website URL for assessing roaster health.
Step 4: Import Required Classes¶
File: src/Controller/Admin/RoasterCrudController.php
Location: Top of file with other use statements
use Doctrine\ORM\QueryBuilder;
use EasyCorp\Bundle\EasyAdminBundle\Collection\FieldCollection;
use EasyCorp\Bundle\EasyAdminBundle\Collection\FilterCollection;
use EasyCorp\Bundle\EasyAdminBundle\Dto\EntityDto;
use EasyCorp\Bundle\EasyAdminBundle\Dto\SearchDto;
Testing Strategy¶
Manual Testing - Data Accuracy¶
- [ ] No Activity: Roaster with 0 configs, 0 URLs, 0 beans - verify "0 | 0 | 0"
- [ ] Configured Only: Roaster with 1 config, 0 URLs, 0 beans - verify "1 | 0 | 0"
- [ ] URLs Discovered: Roaster with URLs but no beans - verify correct counts
- [ ] Full Pipeline: Roaster with configs, URLs, and beans - verify all counts accurate
- [ ] Multiple Configs: Roaster with 3+ configs - verify aggregated counts
- [ ] Duplicate Beans: Same bean from multiple URLs - verify COUNT DISTINCT works
- [ ] Pagination: Verify counts display correctly across pages
- [ ] Search/Filter: Verify counts remain accurate when filtering
- [ ] Page Scope: Verify fields only appear on INDEX (not DETAIL/EDIT/NEW)
Performance Testing - Query Optimization¶
- [ ] Symfony Profiler: Open profiler and verify single query for roasters list
- [ ] Query Count: Confirm subqueries in SELECT, not separate queries
- [ ] Query Execution Time: Should be <100ms for 50 roasters
- [ ] Large Dataset: Test with 100+ roasters - verify no slowdown
- [ ] No N+1: Confirm query count doesn't scale with roaster count
Database Testing¶
-- Manually verify subquery logic
SELECT
r.id,
r.name,
(SELECT COUNT(cu.id)
FROM crawl_url cu
JOIN roaster_crawl_config rcc ON cu.roaster_crawl_config_id = rcc.id
WHERE rcc.roaster_id = r.id) as url_count,
(SELECT COUNT(DISTINCT cu.coffee_bean_id)
FROM crawl_url cu
JOIN roaster_crawl_config rcc ON cu.roaster_crawl_config_id = rcc.id
WHERE rcc.roaster_id = r.id
AND cu.coffee_bean_id IS NOT NULL) as bean_count
FROM roaster r
LIMIT 25;
Regression Testing¶
- [ ] Verify existing fields (name, country, website) still work
- [ ] Verify detail page crawl configs links still work
- [ ] Verify actions (View URLs, Add Crawl Config) still work
- [ ] Verify search functionality unaffected
- [ ] Verify filters work correctly
- [ ] Verify sorting by other columns works
Performance Considerations¶
Why Custom QueryBuilder Approach Works¶
The Problem Solved: - ❌ Before: N+1 queries (1 + 2N where N = number of roasters) - ✅ After: Single query with subqueries
How it works:
1. Override createIndexQueryBuilder() to add subqueries
2. Subqueries execute as part of the main SELECT
3. Database optimizes subquery execution
4. All counts fetched in one round-trip
Generated SQL (approximate):
SELECT
r.*,
(SELECT COUNT(cu.id)
FROM crawl_url cu
JOIN roaster_crawl_config rcc ON cu.roaster_crawl_config_id = rcc.id
WHERE rcc.roaster_id = r.id) as urlCount,
(SELECT COUNT(DISTINCT cu.coffee_bean_id)
FROM crawl_url cu
JOIN roaster_crawl_config rcc ON cu.roaster_crawl_config_id = rcc.id
WHERE rcc.roaster_id = r.id
AND cu.coffee_bean_id IS NOT NULL) as beanCount
FROM roaster r
LIMIT 25 OFFSET 0;
Performance Impact:
- Single query regardless of page size
- Subqueries use indexes (on roaster_crawl_config.roaster_id and crawl_url.roaster_crawl_config_id)
- Database can optimize all counts together
- No additional round-trips to database
Recommended Database Indexes¶
Ensure these indexes exist for optimal performance:
-- On roaster_crawl_config
CREATE INDEX idx_rcc_roaster_id ON roaster_crawl_config(roaster_id);
-- On crawl_url
CREATE INDEX idx_cu_config_id ON crawl_url(roaster_crawl_config_id);
CREATE INDEX idx_cu_bean_id ON crawl_url(coffee_bean_id);
These are likely already in place due to foreign key constraints.
Success Criteria¶
- [ ] Three count fields displayed on RoasterCrudController INDEX: Configs, URLs, Beans
- [ ] All counts are accurate for all roasters
- [ ] Performance: Single query total (verified in Symfony profiler)
- [ ] No N+1: Query count doesn't scale with number of roasters
- [ ] Counts use subqueries in main SELECT
- [ ] Fields only appear on INDEX page (not DETAIL/EDIT/NEW)
- [ ] Visual consistency with existing fields
- [ ] Works correctly with pagination
- [ ] Works correctly with search/filters
- [ ] COUNT DISTINCT works for beans (no duplicates from multiple URLs)
Deployment Plan¶
Pre-Deployment¶
- Implement changes locally
- Test with various roaster counts (0, 1, many)
- Check Symfony profiler for queries
- Verify across different page sizes
Deployment¶
- Create branch:
feature/roaster-crawl-config-count - Commit:
feat: Add crawl config count to Roaster index view - Create PR
- Code review
- Merge to main
- Deploy to staging
- Verify in staging admin panel
- Deploy to production
Post-Deployment¶
- Verify counts display correctly in production
- Monitor performance metrics
- Gather user feedback
Risk Assessment¶
Risk: VERY LOW 🟢¶
Why: - Display-only change, no business logic affected - Uses established EasyAdmin patterns - No database schema changes - Easy to rollback (just remove field)
Mitigation¶
- Test thoroughly with Symfony profiler
- Verify on staging before production
- Monitor performance after deployment
Estimated Effort¶
- Implementation: 1-1.5 hours (query builder + 3 fields)
- Testing: 1-1.5 hours (data accuracy + performance verification)
- Code Review: 30 minutes
- Deployment: 15 minutes
- Total: ~3-3.5 hours
Related Documentation¶
- EasyAdmin Fields: https://symfony.com/bundles/EasyAdminBundle/current/fields.html
- formatValue Documentation: https://symfony.com/bundles/EasyAdminBundle/current/fields/TextField.html#format-value
- Doctrine Collections: https://www.doctrine-project.org/projects/doctrine-collections/en/latest/index.html
Assignee¶
TBD
Status Updates¶
- 2025-11-09: Plan created based on architect recommendations
- Next: Implement and test