Fix Database Deadlock in Similarity Calculation¶
Priority: 🔴 CRITICAL - Production Database Deadlock Status: Planning Sentry Issue: BEANS-BACKEND-4C
Problem Statement¶
Database deadlock occurring when multiple similarity calculation workers process messages concurrently, leading to PostgreSQL deadlock errors on the region table.
Error Details¶
- Error Type:
Doctrine\DBAL\Exception\DeadlockException - Database Error:
SQLSTATE[40P01]: Deadlock detected - Location:
src/MessageHandler/CalculateSimilarityForBeanHandler.php:60 - Frequency: 1 occurrence (but retries configured)
- Environment: Production
- First Seen: 2025-11-09T02:06:13.356Z
PostgreSQL Deadlock Details¶
Process 845172 waits for ShareLock on transaction 166374; blocked by process 845171.
Process 845171 waits for ShareLock on transaction 166373; blocked by process 845172.
CONTEXT: while updating tuple (5,14) in relation "region"
Classic Circular Wait Deadlock: - Process A locks resource 1, waits for resource 2 - Process B locks resource 2, waits for resource 1 - Neither can proceed → deadlock
Root Cause Analysis¶
The Deadlock Scenario¶
Code Flow¶
-
Handler Entry (
CalculateSimilarityForBeanHandler.php:25) -
Delete Existing Similarities (Line 42-46)
-
Calculate New Similarities (Line 48)
-
Persist Similarities (Lines 50-58)
-
Flush All Changes (Line 60) - DEADLOCK OCCURS HERE
Why Deadlocks Occur¶
1. Concurrent Workers Processing Different Beans¶
Multiple similarity_calculation workers run concurrently (messenger:consume processes messages in parallel)
2. Shared Entity Updates via Doctrine ORM¶
When persisting CoffeeBeanSimilarity, Doctrine:
- Persists the similarity record
- Updates inverse side of ManyToMany relationships on related entities
- CoffeeBean has ManyToMany with: Region, ProcessingMethod, RoastLevel, Variety, FlavorWheelNode
3. Non-Deterministic Lock Order¶
Worker A processing Bean X:
- Loads Bean X with regions [Brazil, Colombia, Ethiopia]
- Creates similarities with Beans [Y, Z, W]
- Flush tries to update regions in some order (e.g., Brazil → Colombia)
Worker B processing Bean Y:
- Loads Bean Y with regions [Colombia, Kenya, Ethiopia]
- Creates similarities with Beans [X, A, B]
- Flush tries to update regions in different order (e.g., Kenya → Ethiopia)
DEADLOCK:
Worker A: Lock Brazil → Wait for Ethiopia
Worker B: Lock Ethiopia → Wait for Brazil
4. Region Table Update Context¶
The error specifically mentions: while updating tuple (5,14) in relation "region"
This happens because:
- CoffeeBean has ManyToMany with Region (line 41 in CoffeeBean.php)
- Region has inverse side: ManyToMany mappedBy regions (line 46 in Region.php)
- When Doctrine flushes, it may update counters or metadata on Region rows
- Multiple workers updating different beans with overlapping regions → lock contention
Entity Relationships¶
CoffeeBeanSimilarity
├─ sourceBean (ManyToOne → CoffeeBean)
└─ similarBean (ManyToOne → CoffeeBean)
CoffeeBean (both source and similar beans)
├─ regions (ManyToMany → Region) ← DEADLOCK HAPPENS HERE
├─ processingMethods (ManyToMany → ProcessingMethod)
├─ roastLevels (ManyToMany → RoastLevel)
├─ varieties (ManyToMany → Variety)
├─ flavorNotes (ManyToMany → FlavorWheelNode)
└─ nonStandardFlavorNotes (ManyToMany → NonStandardFlavorNote)
Region (inverse side)
└─ coffeeBeans (ManyToMany mappedBy regions)
Trigger Conditions¶
The deadlock is triggered when:
1. Multiple workers consume from similarity_calculation queue concurrently
2. Different coffee beans share common regions (or other ManyToMany entities)
3. Lock acquisition order differs between workers
4. Timing coincidence - both workers flush at approximately same time
Why It's Rare (But Critical)¶
- Single occurrence suggests specific race condition
- Happens when:
- Multiple beans persisted simultaneously (triggers
CoffeeBeanListener::postPersist) - Beans share common regions (e.g., "Ethiopia", "Colombia")
- Workers process messages at same time
- Retry mechanism makes it eventually succeed (3 retries configured)
Impact Assessment¶
Severity: HIGH 🟠 (Not CRITICAL due to retry mechanism)¶
Positive: - ✅ Retry strategy configured (3 retries with exponential backoff) - ✅ Only 1 occurrence so far - ✅ Message will eventually succeed after retry
Negative: - ❌ Causes message processing delays - ❌ Wastes system resources on retries - ❌ Can cascade if load increases - ❌ Poor user experience if many concurrent bean creations occur
Affected Functionality¶
- Similarity calculation background jobs
- New coffee bean processing pipeline
- Recommendation system data freshness
Solution Options¶
Option 1: Read-Only Entity Manager (Recommended - Quick Fix)¶
Use read-only/detached entities for similarity calculation to avoid triggering updates.
public function __invoke(CalculateSimilarityForBean $message): void
{
$coffeeBeanId = $message->coffeeBeanId;
$this->logger->info('Starting similarity calculation for coffee bean', [
'coffeeBeanId' => $coffeeBeanId,
]);
$sourceBean = $this->coffeeBeanRepository->find($coffeeBeanId);
if (!$sourceBean instanceof CoffeeBean) {
$this->logger->warning('Coffee bean not found for similarity calculation', [
'coffeeBeanId' => $coffeeBeanId,
]);
return;
}
// Delete existing similarities
$this->entityManager->createQuery(
'DELETE FROM App\Entity\CoffeeBeanSimilarity cbs WHERE cbs.sourceBean = :sourceBean'
)
->setParameter('sourceBean', $sourceBean)
->execute();
// Calculate similarities
$similarityResults = $this->similarityService->calculateScores($sourceBean);
// NEW: Detach related beans to prevent relationship updates
foreach ($similarityResults as $result) {
// Detach the similar bean to prevent Doctrine from tracking changes
$this->entityManager->detach($result['bean']);
// Re-fetch as a reference (lightweight proxy without loading all relations)
$similarBeanRef = $this->entityManager->getReference(CoffeeBean::class, $result['bean']->getId());
$similarity = new CoffeeBeanSimilarity(
$sourceBean,
$similarBeanRef, // Use reference instead of full entity
$result['score']
);
$this->entityManager->persist($similarity);
}
$this->entityManager->flush();
$this->logger->info('Completed similarity calculation for coffee bean', [
'coffeeBeanId' => $coffeeBeanId,
'similarityCount' => count($similarityResults),
]);
}
Pros: - Quick fix, minimal code change - Prevents Doctrine from updating related entities - Maintains functionality - Low risk
Cons: - Doesn't address root architectural issue - Still has potential for deadlock if sourceBean is shared
Option 2: Single-Threaded Similarity Queue¶
Configure similarity calculation to run with a single worker.
messenger.php changes:
'similarity_calculation' => [
'dsn' => '%env(MESSENGER_TRANSPORT_DSN)%',
'options' => [
'queue_name' => 'similarity_calculation',
'serializer' => 'messenger.transport.symfony_serializer',
],
'retry_strategy' => [
'max_retries' => 3,
'delay' => 1000,
'multiplier' => 2,
],
// NEW: Limit to single worker via deployment config
],
Deployment/Supervisor config:
# Only run 1 worker for similarity_calculation
messenger:consume similarity_calculation --limit=100 --time-limit=3600
Pros: - Completely eliminates deadlock risk - Simple configuration change - No code changes needed
Cons: - Reduced throughput - Slower similarity calculation processing - Bottleneck if many beans created simultaneously - Doesn't scale
Option 3: Batch Processing with Row-Level Locking¶
Use explicit locking or batch processing to control concurrency.
public function __invoke(CalculateSimilarityForBean $message): void
{
// Use transaction with explicit lock timeout
$this->entityManager->getConnection()->beginTransaction();
try {
// Set lock timeout to prevent indefinite waits
$this->entityManager->getConnection()->executeStatement(
'SET LOCAL lock_timeout = \'2s\''
);
// Rest of processing...
$this->entityManager->getConnection()->commit();
} catch (DeadlockException $e) {
$this->entityManager->getConnection()->rollBack();
throw $e; // Let retry mechanism handle it
}
}
Pros: - More control over locking behavior - Faster failure detection - Explicit error handling
Cons: - Still requires retry mechanism - More complex code - Doesn't prevent deadlock, just handles it faster
Option 4: Deferred Similarity Updates (Eventual Consistency)¶
Calculate similarities asynchronously without blocking, update in batches.
Pros: - Highest scalability - No blocking operations - Best long-term solution
Cons: - Significant architectural change - More complex implementation - Eventual consistency trade-off
Option 5: Optimistic Locking + Retry (Status Quo Enhancement)¶
Keep current approach but add optimistic locking detection.
// Add version column to CoffeeBeanSimilarity
#[ORM\Version]
#[ORM\Column(type: 'integer')]
private int $version = 0;
Pros: - Minimal changes - Leverages existing retry mechanism - Good for low-concurrency scenarios
Cons: - Doesn't prevent deadlocks - Relies on retries
Recommended Approach¶
Phase 1: Immediate Fix (This Week)¶
Implement Option 1 (Detached Entities) + Optimize Retry
- Use detached entities to prevent relationship updates
- Improve retry strategy for faster recovery
- Add monitoring for deadlock occurrences
Phase 2: Medium-Term (Next Sprint)¶
Implement Option 2 (Single Worker) OR optimize concurrent processing
Evaluate: - Current similarity calculation throughput - Typical number of concurrent bean creations - If throughput is low → single worker acceptable - If throughput is high → need better solution (Option 4)
Phase 3: Long-Term (Future)¶
Consider Option 4 (Batch Processing)
If similarity calculations become a bottleneck: - Implement batch similarity recalculation - Decouple from bean creation - Calculate similarities in scheduled jobs
Implementation Plan¶
Step 1: Add Detached Entity Pattern¶
File: src/MessageHandler/CalculateSimilarityForBeanHandler.php
public function __invoke(CalculateSimilarityForBean $message): void
{
$coffeeBeanId = $message->coffeeBeanId;
$this->logger->info('Starting similarity calculation for coffee bean', [
'coffeeBeanId' => $coffeeBeanId,
]);
$sourceBean = $this->coffeeBeanRepository->find($coffeeBeanId);
if (!$sourceBean instanceof CoffeeBean) {
$this->logger->warning('Coffee bean not found for similarity calculation', [
'coffeeBeanId' => $coffeeBeanId,
]);
return;
}
// Delete existing similarities
$this->entityManager->createQuery(
'DELETE FROM App\Entity\CoffeeBeanSimilarity cbs WHERE cbs.sourceBean = :sourceBean'
)
->setParameter('sourceBean', $sourceBean)
->execute();
// Calculate similarities
$similarityResults = $this->similarityService->calculateScores($sourceBean);
// Persist new similarities with detached similar beans
foreach ($similarityResults as $result) {
// Detach the bean to prevent Doctrine from tracking relationship changes
$this->entityManager->detach($result['bean']);
// Get a lightweight reference instead of the full entity
// This creates a proxy without loading all relationships
$similarBeanRef = $this->entityManager->getReference(
CoffeeBean::class,
$result['bean']->getId()
);
$similarity = new CoffeeBeanSimilarity(
$sourceBean,
$similarBeanRef,
$result['score']
);
$this->entityManager->persist($similarity);
}
// Flush - now only updates CoffeeBeanSimilarity table, not related entities
$this->entityManager->flush();
$this->logger->info('Completed similarity calculation for coffee bean', [
'coffeeBeanId' => $coffeeBeanId,
'similarityCount' => count($similarityResults),
]);
}
Step 2: Improve Retry Strategy¶
File: config/packages/messenger.php
'similarity_calculation' => [
'dsn' => '%env(MESSENGER_TRANSPORT_DSN)%',
'options' => [
'queue_name' => 'similarity_calculation',
],
'retry_strategy' => [
'max_retries' => 5, // Increase retries
'delay' => 500, // Faster initial retry (500ms)
'multiplier' => 2, // Exponential backoff
'max_delay' => 10000, // Cap at 10s
],
],
Step 3: Add Deadlock Monitoring¶
File: src/MessageHandler/CalculateSimilarityForBeanHandler.php
Add try-catch for deadlock detection:
use Doctrine\DBAL\Exception\DeadlockException;
public function __invoke(CalculateSimilarityForBean $message): void
{
try {
// ... existing code ...
} catch (DeadlockException $e) {
$this->logger->error('Deadlock detected during similarity calculation', [
'coffeeBeanId' => $message->coffeeBeanId,
'exception' => $e->getMessage(),
]);
// Re-throw to trigger retry mechanism
throw $e;
}
}
Step 4: Add Unit Tests¶
File: tests/MessageHandler/CalculateSimilarityForBeanHandlerTest.php
Add tests for: 1. Successful similarity calculation 2. Deadlock scenario simulation 3. Verify detached entities don't trigger updates 4. Verify retry mechanism works
public function testDeadlockRetryMechanism(): void
{
// Mock EntityManager to throw DeadlockException on first flush
$entityManager = $this->createMock(EntityManagerInterface::class);
$entityManager->expects($this->once())
->method('flush')
->willThrowException(new DeadlockException('Test deadlock', null));
// ... rest of test
$this->expectException(DeadlockException::class);
$handler($message);
}
public function testDetachedEntitiesDontTriggerUpdates(): void
{
// Verify that detaching similar beans prevents relationship updates
// Mock flush to track what gets updated
}
Step 5: Monitor in Production¶
After deployment:
1. Monitor Sentry for BEANS-BACKEND-4C recurrence
2. Check retry count metrics
3. Verify similarity calculation success rate
4. Monitor database lock wait times
Testing Strategy¶
Unit Tests¶
- [ ] Test successful similarity calculation
- [ ] Test handling of non-existent bean
- [ ] Test deadlock exception re-throw
- [ ] Test detached entity behavior
Integration Tests¶
- [ ] Test with multiple concurrent workers
- [ ] Simulate deadlock scenario
- [ ] Verify retry mechanism
- [ ] Test with beans sharing regions
Performance Tests¶
- [ ] Benchmark similarity calculation throughput
- [ ] Test with high concurrency
- [ ] Measure deadlock occurrence rate
Manual Testing¶
- [ ] Create multiple beans simultaneously
- [ ] Monitor database locks
- [ ] Verify similarities are calculated correctly
- [ ] Check retry counts in logs
Deployment Plan¶
Pre-Deployment¶
- Run full test suite:
make test - Test locally with multiple workers
- Review code changes
- Update monitoring alerts
Deployment¶
- Create branch:
fix/similarity-calculation-deadlock - Commit with message:
fix: Prevent database deadlock in similarity calculation using detached entities - Fixes BEANS-BACKEND-4C - Create PR
- Code review
- Deploy to staging
- Run load test with concurrent bean creation
- Deploy to production
Post-Deployment¶
- Monitor Sentry for 72 hours
- Check retry counts
- Verify deadlock rate drops to zero
- Review database performance metrics
Success Criteria¶
- [ ] Zero deadlock errors in similarity calculation
- [ ] Sentry issue BEANS-BACKEND-4C marked as resolved
- [ ] All tests passing
- [ ] No increase in similarity calculation latency
- [ ] Retry rate remains low (<5%)
- [ ] Database lock wait time unchanged or improved
Risk Assessment¶
Risk: LOW 🟢¶
- Detaching entities is safe - prevents unintended updates
- Retry mechanism already in place as safety net
- Easy to rollback if issues occur
Mitigation¶
- Comprehensive testing before deployment
- Staging environment validation
- Gradual rollout monitoring
- Easy rollback plan
Alternative Considerations¶
Why Not Use Database-Level Solutions?¶
Option: Increase lock timeout - Doesn't fix root cause - Just delays the inevitable - Poor user experience
Option: Advisory locks - Over-engineered for this problem - Adds complexity - Harder to maintain
Option: Serializable isolation level - Performance impact - Overkill for this scenario - May cause more conflicts
Performance Impact¶
Expected Impact: NEUTRAL to POSITIVE¶
Neutral: - Detached entities reduce Doctrine overhead - No additional database queries
Positive: - Fewer locks held during flush - Faster flush operations - Reduced database contention
Risk: - Minimal - only changes in-memory entity management
Follow-up Actions¶
After Fix Deployed¶
- Monitor deadlock metrics for 1 week
- Evaluate if single-worker approach needed
- Consider batch processing for future scaling
- Document entity management patterns
Related Improvements¶
- Review other message handlers for similar patterns
- Add database deadlock detection to monitoring
- Create guidelines for concurrent entity updates
Additional Context¶
PostgreSQL Deadlock Detection¶
PostgreSQL automatically detects deadlocks and aborts one transaction. The deadlock_timeout (default 1s) controls detection speed.
Doctrine Behavior¶
- ManyToMany relationships can trigger updates on inverse side
persist()doesn't immediately write to databaseflush()executes all pending operations- Update order is non-deterministic
- Detached entities are not tracked by UnitOfWork
Messenger Retry Strategy¶
Current config: - Max retries: 3 - Initial delay: 1s - Multiplier: 2x - Retries: 1s, 2s, 4s
References¶
- Sentry Issue: https://alpipego.sentry.io/issues/7009563819/
- PostgreSQL Deadlocks: https://www.postgresql.org/docs/current/explicit-locking.html
- Doctrine Detach: https://www.doctrine-project.org/projects/doctrine-orm/en/latest/reference/working-with-objects.html#detaching-entities
- Symfony Messenger Retry: https://symfony.com/doc/current/messenger.html#retries-failures
Estimated Effort¶
- Detached Entities Implementation: 2-3 hours
- Testing: 3-4 hours
- Retry Strategy Tuning: 1 hour
- Monitoring Setup: 1-2 hours
- Deployment & Validation: 2-3 hours
- Total: 9-13 hours (1.5-2 days)
Assignee¶
TBD
Status Updates¶
- 2025-11-09: Plan created after Sentry deadlock analysis
- Next: Implement detached entity pattern and enhanced retry strategy