# Quickawards Performance Optimization Plan ## Overview This document outlines the comprehensive optimization plan for Quickawards, focusing primarily on resolving critical performance issues in QSO statistics queries. ## Critical Performance Issue ### Current Problem The `getQSOStats()` function loads ALL user QSOs into memory before calculating statistics: - **Location**: `src/backend/services/lotw.service.js:496-517` - **Impact**: Users with 200k QSOs experience 5-10 second page loads - **Memory Usage**: 100MB+ per request - **Concurrent Users**: Limited to 2-3 due to memory pressure ### Root Cause ```javascript // Current implementation (PROBLEMATIC) export async function getQSOStats(userId) { const allQSOs = await db.select().from(qsos).where(eq(qsos.userId, userId)); // Loads 200k+ records into memory // ... processes with .filter() and .forEach() } ``` ### Target Performance - **Query Time**: <100ms for 200k QSO users (currently 5-10 seconds) - **Memory Usage**: <1MB per request (currently 100MB+) - **Concurrent Users**: Support 50+ concurrent users ## Optimization Plan ### Phase 1: Emergency Performance Fix (Week 1) #### 1.1 SQL Query Optimization **File**: `src/backend/services/lotw.service.js` Replace the memory-intensive `getQSOStats()` function with SQL-based aggregates: ```javascript // Optimized implementation export async function getQSOStats(userId) { const [basicStats, uniqueStats] = await Promise.all([ // Basic statistics db.select({ total: sql`COUNT(*)`, confirmed: sql`SUM(CASE WHEN lotw_qsl_rstatus = 'Y' OR dcl_qsl_rstatus = 'Y' THEN 1 ELSE 0 END)` }).from(qsos).where(eq(qsos.userId, userId)), // Unique counts db.select({ uniqueEntities: sql`COUNT(DISTINCT entity)`, uniqueBands: sql`COUNT(DISTINCT band)`, uniqueModes: sql`COUNT(DISTINCT mode)` }).from(qsos).where(eq(qsos.userId, userId)) ]); return { total: basicStats[0].total, confirmed: basicStats[0].confirmed, uniqueEntities: uniqueStats[0].uniqueEntities, uniqueBands: uniqueStats[0].uniqueBands, uniqueModes: uniqueStats[0].uniqueModes, }; } ``` **Benefits**: - Query executes entirely in SQLite - Only returns 5 integers instead of 200k+ objects - Reduces memory from 100MB+ to <1MB - Expected query time: 50-100ms for 200k QSOs #### 1.2 Critical Database Indexes **File**: `src/backend/migrations/add-performance-indexes.js` (extend existing file) Add essential indexes for QSO statistics queries: ```javascript // Index for primary user queries await db.run(sql`CREATE INDEX IF NOT EXISTS idx_qsos_user_primary ON qsos(user_id)`); // Index for confirmation status queries await db.run(sql`CREATE INDEX IF NOT EXISTS idx_qsos_user_confirmed ON qsos(user_id, lotw_qsl_rstatus, dcl_qsl_rstatus)`); // Index for unique counts (entity, band, mode) await db.run(sql`CREATE INDEX IF NOT EXISTS idx_qsos_user_unique_counts ON qsos(user_id, entity, band, mode)`); ``` **Benefits**: - Speeds up WHERE clause filtering by 10-100x - Optimizes COUNT(DISTINCT) operations - Critical for sub-100ms query times #### 1.3 Testing & Validation **Test Cases**: 1. Small dataset (1k QSOs): Query time <10ms 2. Medium dataset (50k QSOs): Query time <50ms 3. Large dataset (200k QSOs): Query time <100ms **Validation Steps**: 1. Run test queries with logging enabled 2. Compare memory usage before/after 3. Verify frontend receives identical API response format 4. Load test with 50 concurrent users **Success Criteria**: - ✅ Query time <100ms for 200k QSOs - ✅ Memory usage <1MB per request - ✅ API response format unchanged - ✅ No errors in production for 1 week ### Phase 2: Stability & Monitoring (Week 2) #### 2.1 Basic Caching Layer **File**: `src/backend/services/lotw.service.js` Add 5-minute TTL cache for QSO statistics: ```javascript const statsCache = new Map(); export async function getQSOStats(userId) { const cacheKey = `stats_${userId}`; const cached = statsCache.get(cacheKey); if (cached && Date.now() - cached.timestamp < 300000) { // 5 minutes return cached.data; } // Run optimized SQL query (from Phase 1.1) const stats = await calculateStatsWithSQL(userId); statsCache.set(cacheKey, { data: stats, timestamp: Date.now() }); return stats; } // Invalidate cache after QSO syncs export async function invalidateStatsCache(userId) { statsCache.delete(`stats_${userId}`); } ``` **Benefits**: - Cache hit: <1ms response time - Reduces database load by 80-90% - Automatic cache invalidation after syncs #### 2.2 Performance Monitoring **File**: `src/backend/utils/logger.js` (extend existing) Add query performance tracking: ```javascript export async function trackQueryPerformance(queryName, fn) { const start = performance.now(); const result = await fn(); const duration = performance.now() - start; logger.debug('Query Performance', { query: queryName, duration: `${duration.toFixed(2)}ms`, threshold: duration > 100 ? 'SLOW' : 'OK' }); if (duration > 500) { logger.warn('Slow query detected', { query: queryName, duration: `${duration.toFixed(2)}ms` }); } return result; } // Usage in getQSOStats: const stats = await trackQueryPerformance('getQSOStats', () => calculateStatsWithSQL(userId) ); ``` **Benefits**: - Detect performance regressions early - Identify slow queries in production - Data-driven optimization decisions #### 2.3 Cache Invalidation Hooks **Files**: `src/backend/services/lotw.service.js`, `src/backend/services/dcl.service.js` Invalidate cache after QSO imports: ```javascript // lotw.service.js - after syncQSOs() export async function syncQSOs(userId, lotwUsername, lotwPassword, sinceDate, jobId) { // ... existing sync logic ... await invalidateStatsCache(userId); } // dcl.service.js - after syncQSOs() export async function syncQSOs(userId, dclApiKey, sinceDate, jobId) { // ... existing sync logic ... await invalidateStatsCache(userId); } ``` #### 2.4 Monitoring Dashboard **File**: Create `src/backend/routes/health.js` (or extend existing health endpoint) Add performance metrics to health check: ```javascript app.get('/api/health', async (req) => { return { status: 'healthy', uptime: process.uptime(), database: await checkDatabaseHealth(), performance: { avgQueryTime: getAverageQueryTime(), cacheHitRate: getCacheHitRate(), slowQueriesCount: getSlowQueriesCount() } }; }); ``` ### Phase 3: Scalability Enhancements (Month 1) #### 3.1 SQLite Configuration Optimization **File**: `src/backend/db/index.js` Optimize SQLite for read-heavy workloads: ```javascript const db = new Database('data/award.db'); // Enable WAL mode for better concurrency db.pragma('journal_mode = WAL'); // Increase cache size (default -2000KB, set to 100MB) db.pragma('cache_size = -100000'); // Optimize for SELECT queries db.pragma('synchronous = NORMAL'); // Balance between safety and speed db.pragma('temp_store = MEMORY'); // Keep temporary tables in RAM db.pragma('mmap_size = 30000000000'); // Memory-map database (30GB limit) ``` **Benefits**: - WAL mode allows concurrent reads - Larger cache reduces disk I/O - Memory-mapped I/O for faster access #### 3.2 Materialized Views for Large Datasets **File**: Create `src/backend/migrations/create-materialized-views.js` For users with >50k QSOs, create pre-computed statistics: ```javascript // Create table for pre-computed stats await db.run(sql` CREATE TABLE IF NOT EXISTS qso_stats_cache ( user_id INTEGER PRIMARY KEY, total INTEGER, confirmed INTEGER, unique_entities INTEGER, unique_bands INTEGER, unique_modes INTEGER, updated_at DATETIME DEFAULT CURRENT_TIMESTAMP ) `); // Create trigger to auto-update stats after QSO changes await db.run(sql` CREATE TRIGGER IF NOT EXISTS update_qso_stats AFTER INSERT OR UPDATE OR DELETE ON qsos BEGIN INSERT OR REPLACE INTO qso_stats_cache (user_id, total, confirmed, unique_entities, unique_bands, unique_modes, updated_at) SELECT user_id, COUNT(*) as total, SUM(CASE WHEN lotw_qsl_rstatus = 'Y' OR dcl_qsl_rstatus = 'Y' THEN 1 ELSE 0 END) as confirmed, COUNT(DISTINCT entity) as unique_entities, COUNT(DISTINCT band) as unique_bands, COUNT(DISTINCT mode) as unique_modes, CURRENT_TIMESTAMP as updated_at FROM qsos WHERE user_id = NEW.user_id GROUP BY user_id; END; `); ``` **Benefits**: - Stats updated automatically in real-time - Query time: <5ms for any dataset size - No cache invalidation needed **Usage in getQSOStats()**: ```javascript export async function getQSOStats(userId) { // First check if user has pre-computed stats const cachedStats = await db.select().from(qsoStatsCache).where(eq(qsoStatsCache.userId, userId)); if (cachedStats.length > 0) { return { total: cachedStats[0].total, confirmed: cachedStats[0].confirmed, uniqueEntities: cachedStats[0].uniqueEntities, uniqueBands: cachedStats[0].uniqueBands, uniqueModes: cachedStats[0].uniqueModes, }; } // Fall back to regular query for small users return calculateStatsWithSQL(userId); } ``` #### 3.3 Connection Pooling **File**: `src/backend/db/index.js` Implement connection pooling for better concurrency: ```javascript import { Pool } from 'bun-sqlite3'; const pool = new Pool({ filename: 'data/award.db', max: 10, // Max connections timeout: 30000, // 30 second timeout }); export async function getDb() { return pool.getConnection(); } ``` **Note**: SQLite has limited write concurrency, but read connections can be pooled. #### 3.4 Advanced Caching Strategy **File**: `src/backend/services/cache.service.js` Implement Redis-style caching with Bun's built-in capabilities: ```javascript class CacheService { constructor() { this.cache = new Map(); this.stats = { hits: 0, misses: 0 }; } async get(key) { const value = this.cache.get(key); if (value) { this.stats.hits++; return value.data; } this.stats.misses++; return null; } async set(key, data, ttl = 300000) { this.cache.set(key, { data, timestamp: Date.now(), ttl }); // Auto-expire after TTL setTimeout(() => this.delete(key), ttl); } async delete(key) { this.cache.delete(key); } getStats() { const total = this.stats.hits + this.stats.misses; return { hitRate: total > 0 ? (this.stats.hits / total * 100).toFixed(2) + '%' : '0%', hits: this.stats.hits, misses: this.stats.misses, size: this.cache.size }; } } export const cacheService = new CacheService(); ``` ## Implementation Checklist ### Phase 1: Emergency Performance Fix - [ ] Replace `getQSOStats()` with SQL aggregates - [ ] Add database indexes - [ ] Run migration - [ ] Test with 1k, 50k, 200k QSO datasets - [ ] Verify API response format unchanged - [ ] Deploy to production - [ ] Monitor for 1 week ### Phase 2: Stability & Monitoring - [ ] Implement 5-minute TTL cache - [ ] Add performance monitoring - [ ] Create cache invalidation hooks - [ ] Add performance metrics to health endpoint - [ ] Deploy to production - [ ] Monitor cache hit rate (target >80%) ### Phase 3: Scalability Enhancements - [ ] Optimize SQLite configuration (WAL mode, cache size) - [ ] Create materialized views for large datasets - [ ] Implement connection pooling - [ ] Deploy advanced caching strategy - [ ] Load test with 100+ concurrent users ## Additional Issues Identified (Future Work) ### High Priority 1. **Unencrypted LoTW Password Storage** - **Location**: `src/backend/services/auth.service.js:124` - **Issue**: LoTW password stored in plaintext in database - **Fix**: Encrypt with AES-256 before storing - **Effort**: 4 hours 2. **Weak JWT Secret Security** - **Location**: `src/backend/config.js:27` - **Issue**: Default JWT secret in production - **Fix**: Use environment variable with strong secret - **Effort**: 1 hour 3. **ADIF Parser Logic Error** - **Location**: `src/backend/utils/adif-parser.js:17-18` - **Issue**: Potential data corruption from incorrect parsing - **Fix**: Use case-insensitive regex for `` tags - **Effort**: 2 hours ### Medium Priority 4. **Missing Database Transactions** - **Location**: Sync operations in `lotw.service.js`, `dcl.service.js` - **Issue**: No transaction support for multi-record operations - **Fix**: Wrap syncs in transactions - **Effort**: 6 hours 5. **Memory Leak Potential in Job Queue** - **Location**: `src/backend/services/job-queue.service.js` - **Issue**: Jobs never removed from memory - **Fix**: Implement cleanup mechanism - **Effort**: 4 hours ### Low Priority 6. **Database Path Exposure** - **Location**: Error messages reveal database path - **Issue**: Predictable database location - **Fix**: Sanitize error messages - **Effort**: 2 hours ## Monitoring & Metrics ### Key Performance Indicators (KPIs) 1. **QSO Statistics Query Time** - Target: <100ms for 200k QSOs - Current: 5-10 seconds - Tool: Application performance monitoring 2. **Memory Usage per Request** - Target: <1MB per request - Current: 100MB+ - Tool: Node.js memory profiler 3. **Concurrent Users** - Target: 50+ concurrent users - Current: 2-3 users - Tool: Load testing with Apache Bench 4. **Cache Hit Rate** - Target: >80% after Phase 2 - Current: 0% (no cache) - Tool: Custom metrics in cache service 5. **Database Response Time** - Target: <50ms for all queries - Current: Variable (some queries slow) - Tool: SQLite query logging ### Alerting Thresholds - **Critical**: Query time >500ms - **Warning**: Query time >200ms - **Info**: Cache hit rate <70% ## Rollback Plan If issues arise after deployment: 1. **Phase 1 Rollback** (if SQL query fails): - Revert `getQSOStats()` to original implementation - Keep database indexes (they help performance) - Estimated rollback time: 5 minutes 2. **Phase 2 Rollback** (if cache causes issues): - Disable cache by bypassing cache checks - Keep monitoring (helps diagnose issues) - Estimated rollback time: 2 minutes 3. **Phase 3 Rollback** (if SQLite config causes issues): - Revert SQLite configuration changes - Drop materialized views if needed - Estimated rollback time: 10 minutes ## Success Criteria ### Phase 1 Success - ✅ Query time <100ms for 200k QSOs - ✅ Memory usage <1MB per request - ✅ Zero bugs in production for 1 week - ✅ User feedback: "Page loads instantly now" ### Phase 2 Success - ✅ Cache hit rate >80% - ✅ Database load reduced by 80% - ✅ Zero cache-related bugs for 1 week ### Phase 3 Success - ✅ Support 50+ concurrent users - ✅ Query time <5ms for materialized views - ✅ Zero performance complaints for 1 month ## Timeline - **Week 1**: Phase 1 - Emergency Performance Fix - **Week 2**: Phase 2 - Stability & Monitoring - **Month 1**: Phase 3 - Scalability Enhancements - **Month 2-3**: Address additional high-priority security issues - **Ongoing**: Monitor, iterate, optimize ## Resources ### Documentation - SQLite Performance: https://www.sqlite.org/optoverview.html - Drizzle ORM: https://orm.drizzle.team/ - Bun Runtime: https://bun.sh/docs ### Tools - Query Performance: SQLite EXPLAIN QUERY PLAN - Load Testing: Apache Bench (`ab -n 1000 -c 50 http://localhost:3001/api/qsos/stats`) - Memory Profiling: Node.js `--inspect` flag with Chrome DevTools - Database Analysis: `sqlite3 data/award.db "PRAGMA index_info(idx_qsos_user_primary);"` --- **Last Updated**: 2025-01-21 **Author**: Quickawards Optimization Team **Status**: Planning Phase - Ready to Start Phase 1 Implementation