InfraTraq MongoDB

MongoDB Architecture

Document store design for InfraTraq — 35 collections across 10 categories covering audit logs, API integration, activity tracking, messaging, notifications, caches, user preferences, AI/ML, system observability, and file attachments.

35 Collections • 10 Categories • TTL Policies • Indexed Queries • MongoDB 7.x

Overview

InfraTraq uses a dual-database architecture: PostgreSQL for transactional/relational data (398 tables) and MongoDB for event logs, caches, flexible documents, and high-write workloads (35 collections). The decision framework is simple: if the data requires foreign key constraints, ACID transactions, or complex JOINs, it goes to PostgreSQL. If the data is append-only, has a variable schema, needs TTL expiry, or is write-heavy and tolerates eventual consistency, it goes to MongoDB.
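The decision framework above can be sketched as a small routing function. This is illustrative only: `DataTraits` and `route` are hypothetical names, not InfraTraq's actual Decision Router.

```python
# Illustrative sketch of the RDBMS-vs-MongoDB decision framework.
# DataTraits and route() are hypothetical, not InfraTraq's shipped router.
from dataclasses import dataclass

@dataclass
class DataTraits:
    needs_fk: bool = False          # foreign key constraints required
    needs_acid: bool = False        # multi-entity ACID transactions
    needs_joins: bool = False       # complex relational JOINs
    append_only: bool = False       # write-once event stream
    variable_schema: bool = False   # fields differ per record
    needs_ttl: bool = False         # automatic expiry required
    high_write_rate: bool = False   # logs, events, metrics

def route(traits: DataTraits) -> str:
    # Relational requirements always win: FK/ACID/JOIN data goes to PostgreSQL
    if traits.needs_fk or traits.needs_acid or traits.needs_joins:
        return "postgresql"
    # Otherwise, document-store-friendly traits send the data to MongoDB
    if (traits.append_only or traits.variable_schema
            or traits.needs_ttl or traits.high_write_rate):
        return "mongodb"
    return "postgresql"  # default to the relational store when unsure
```

For example, an append-only log with a TTL routes to MongoDB, while anything needing FK integrity routes to PostgreSQL even if it is also write-heavy.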


[Diagram: Application events pass through a Decision Router, landing in PostgreSQL (relational) or MongoDB. MongoDB summary: 35 collections, 10 categories, 24 with TTL, 11 permanent, 60+ indexes.]

Status States

| Status | Description | Allowed Actions | Next States |
|---|---|---|---|
| Active | Collection is in production use | Read/Write enabled | Archived |
| TTL Expiring | Documents approaching TTL boundary | Monitor, extend if needed | Active, Archived |
| Archived | Collection data moved to cold storage | Read-only queries | Deleted |
| Maintenance | Collection under reindex or migration | Admin operations only | Active |
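The allowed transitions in the state table can be enforced with a small guard. `TRANSITIONS` and `can_transition` are illustrative names mirroring the "Next States" column, not a shipped InfraTraq API.

```python
# Hypothetical guard encoding the status state machine above.
# TRANSITIONS mirrors the "Next States" column of the table.
TRANSITIONS = {
    "active": {"archived"},
    "ttl_expiring": {"active", "archived"},
    "archived": {"deleted"},
    "maintenance": {"active"},
}

def can_transition(current: str, target: str) -> bool:
    # Unknown states allow no transitions
    return target in TRANSITIONS.get(current, set())
```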

RDBMS vs MongoDB Decision Framework

Use PostgreSQL When

Foreign key constraints required • ACID transactions needed • Complex JOINs across entities • Strict schema enforcement • Financial/accounting data • Master data with referential integrity

Use MongoDB When

Schema varies per record • Append-only write pattern • TTL auto-expiry needed • Nested/array fields per document • High-write throughput (logs, events) • Flexible JSON payloads

Bridge Pattern

Store the entity_type + entity_id reference in MongoDB • Look up relational details in PostgreSQL • Use CDC (Change Data Capture) for sync • Materialized views for cross-DB reporting
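A minimal sketch of the bridge pattern: MongoDB documents carry only a polymorphic `entity_type` + `entity_id` reference, and relational detail is looked up in PostgreSQL. `ENTITY_TABLES`, `make_ref`, and `resolve_ref` are illustrative names, not InfraTraq code.

```python
# Sketch of the bridge pattern: embed a polymorphic reference in MongoDB
# documents, resolve the detail row in PostgreSQL. Names are illustrative.
ENTITY_TABLES = {
    "project": "projects",
    "work_order": "work_orders",
    "asset": "assets",
}

def make_ref(entity_type: str, entity_id) -> dict:
    # Embed this dict in any MongoDB document that points at an RDBMS row
    return {"entity_type": entity_type, "entity_id": str(entity_id)}

def resolve_ref(ref: dict) -> tuple[str, str]:
    # Map the reference back to (table, parameterized SQL) for the lookup
    table = ENTITY_TABLES[ref["entity_type"]]
    return table, f"SELECT * FROM {table} WHERE id = %s"
```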

Anti-Patterns to Avoid

Never store financial transactions in MongoDB • Never build JOINs across MongoDB collections • Never use MongoDB for approval chains needing FK integrity • Never skip TTL for high-volume logs

Collection Categories (35 Collections)

Category 1: Audit Logs (2 collections)

  • entity_changelog — Full before/after snapshots on every RDBMS write — TTL: 2-7 years
  • field_audit_log — Field-level change tracking for sensitive fields — TTL: 2-7 years

Category 2: API & Integration Logs (3 collections)

  • api_request_log — Inbound/outbound HTTP request + response payloads — TTL: 30-90 days
  • webhook_event_log — Webhook delivery attempts and responses — TTL: 30-90 days
  • integration_sync_log — Sync job execution details per external system — TTL: 90 days

Category 3: Activity & Behavioral Tracking (3 collections)

  • user_activity_event — Clickstream: page views, button clicks, time-on-page — TTL: 30-365 days
  • search_query_log — Search terms, filters, results count, click-through — TTL: 90 days
  • feature_usage_metric — Feature adoption tracking per user/module — TTL: 365 days

Category 4: Messaging & Communication (3 collections)

  • chat_message — Real-time chat messages with nested attachments — Permanent
  • comment_thread — Polymorphic comments on any entity (entity_type + entity_id) — Permanent
  • email_communication — Email send/receive history with parsed body/attachments — Permanent

Category 5: Notification Delivery Logs (3 collections)

  • notification_delivery — Per-recipient delivery status: sent/delivered/read/failed — TTL: 90 days
  • push_notification_log — Mobile push delivery with device/token details — TTL: 90 days
  • sms_delivery_log — SMS gateway response tracking — TTL: 90 days

Category 6: Caches & Computed Snapshots (4 collections)

  • report_cache — Rendered report output cached by parameter hash — TTL: 1hr-30 days
  • dashboard_snapshot — Pre-aggregated KPI data for dashboard tiles — TTL: 24hr
  • analytics_aggregate — Pre-computed rollups (daily/weekly/monthly) — TTL: 30 days
  • search_index_cache — Full-text search index for instant search — TTL: 7 days

Category 7: User Preferences & UI State (4 collections)

  • user_preference — Per-user settings (locale, timezone, default project) — Permanent
  • dashboard_layout — Widget positions, sizes, data sources per user — Permanent
  • user_favorite — Pinned reports, bookmarked entities, recent items — Permanent
  • user_alert_config — Custom alert thresholds and notification preferences — Permanent

Category 8: AI/ML Artifacts (5 collections)

  • ml_training_dataset — Feature arrays for model training — TTL: 365 days
  • ml_model_registry — Model versions, hyperparameters, metrics — TTL: 365 days
  • ml_prediction_log — Input features → output predictions with confidence — TTL: 90 days
  • nlp_query_history — Parsed intents, extracted entities from NL queries — TTL: 90 days
  • ai_conversation — Multi-turn AI assistant conversation context — TTL: 90 days

Category 9: System Observability (4 collections)

  • performance_metric — Response times, throughput, error rates per endpoint — TTL: 7-90 days
  • background_job_log — Job execution details: params, duration, output, errors — TTL: 30 days
  • system_health_event — CPU/memory/disk alerts, service health checks — TTL: 30 days
  • error_log — Structured error logs with stack traces and context — TTL: 90 days

Category 10: File Attachments (1 collection)

  • attachment_metadata — Files attached to any entity_type + entity_id (polymorphic) — Permanent

MongoDB Operations Lifecycle

1

Collection Design

Design each collection with its document schema, key fields, and expected growth pattern. Use the decision framework: FK needed → RDBMS; variable schema → MongoDB; append-only → MongoDB; TTL needed → MongoDB.

2

Index Strategy

Create compound indexes on frequently queried field combinations. Use TTL indexes for automatic document expiry. Partial indexes for sparse fields. Text indexes for search_index_cache.

// TTL index example
db.api_request_log.createIndex(
  { "created_at": 1 },
  { expireAfterSeconds: 7776000 }  // 90 days
)

// Compound index for entity lookup
db.entity_changelog.createIndex(
  { "entity_type": 1, "entity_id": 1, "changed_at": -1 }
)
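Retention windows are easier to audit when `expireAfterSeconds` is derived rather than hand-computed. A small hypothetical helper (the Motor/PyMongo `create_index` call passes `expireAfterSeconds` through as an index option):

```python
# Hypothetical helper: derive expireAfterSeconds from a retention window in
# days, so magic numbers like 7776000 (90 days) never appear in code.
def ttl_seconds(days: int) -> int:
    return days * 24 * 60 * 60

async def ensure_ttl_index(collection, field: str = "created_at", days: int = 90):
    # create_index is idempotent when the options match an existing index
    await collection.create_index([(field, 1)], expireAfterSeconds=ttl_seconds(days))
```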
3

Write Path

Application events flow through the Decision Router. RDBMS writes trigger CDC events that create entity_changelog documents. API calls log to api_request_log. User actions log to user_activity_event. All writes are fire-and-forget for performance.
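One way to get "fire-and-forget" semantics with an async driver is to schedule the insert as a background task and log, rather than raise, on failure. `log_fire_and_forget` is an illustrative name, not InfraTraq's actual API.

```python
# Sketch of fire-and-forget writes: schedule the insert coroutine as a task
# and surface failures in logs without blocking the request path.
import asyncio
import logging

logger = logging.getLogger("infratraq.events")

def log_fire_and_forget(coro) -> asyncio.Task:
    task = asyncio.create_task(coro)
    task.add_done_callback(
        lambda t: logger.warning("event write failed: %s", t.exception())
        if not t.cancelled() and t.exception() else None
    )
    return task
```

Usage would look like `log_fire_and_forget(db.user_activity_event.insert_one(doc))`; the caller returns immediately while the write completes in the background.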

4

Read Path

Dashboard queries hit dashboard_snapshot and analytics_aggregate for pre-computed data. Audit queries scan entity_changelog with compound index. Search uses search_index_cache with text indexes. Report generation checks report_cache before computing.

5

TTL Management

Configure TTL per collection based on data retention policy. Audit logs: 2-7 years (regulatory). API logs: 30-90 days (debugging). Caches: 1hr-30 days (freshness). System metrics: 7-90 days (capacity planning). Archive to S3/GCS before TTL expiry for cold storage.

6

Monitoring & Alerting

Monitor collection sizes, index usage, slow queries, and replication lag. Alert on: collection size > threshold, TTL backlog, write concern failures, index miss rate > 5%, oplog window < 24hr.
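The alert thresholds above can be expressed as a single evaluation function. This is a hypothetical sketch; the metric names are assumptions, not InfraTraq's monitoring schema.

```python
# Hypothetical alert evaluation mirroring the thresholds listed above.
def evaluate_alerts(metrics: dict) -> list[str]:
    alerts = []
    if metrics.get("collection_bytes", 0) > metrics.get("size_threshold_bytes", float("inf")):
        alerts.append("collection size over threshold")
    if metrics.get("index_miss_rate", 0.0) > 0.05:
        alerts.append("index miss rate > 5%")
    if metrics.get("oplog_window_hours", 24.0) < 24.0:
        alerts.append("oplog window < 24h")
    return alerts
```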

Implementation Examples

Entity Changelog Write (Python)

# After any RDBMS write, log the change to MongoDB (fire-and-forget audit trail)
from datetime import datetime, timezone

async def log_entity_change(entity_type, entity_id, operation, old_val, new_val, user_id):
    doc = {
        "entity_type": entity_type,
        "entity_id": str(entity_id),
        "operation": operation,  # INSERT, UPDATE, DELETE
        "old_value": old_val,
        "new_value": new_val,
        "changed_by": str(user_id),
        "changed_at": datetime.now(timezone.utc),  # timezone-aware; utcnow() is deprecated
        "ip_address": request.client.host,
        "session_id": request.state.session_id
    }
    await db.entity_changelog.insert_one(doc)

Report Cache Pattern

# Check cache before generating expensive report
import hashlib
import json
from datetime import datetime, timedelta, timezone

async def get_report(report_type, params):
    # Deterministic cache key: report type + canonicalized parameters
    cache_key = hashlib.sha256(
        f"{report_type}:{json.dumps(params, sort_keys=True)}".encode()
    ).hexdigest()

    cached = await db.report_cache.find_one({"cache_key": cache_key})
    # Assumes MongoClient(tz_aware=True) so stored dates compare with aware datetimes
    if cached and cached["expires_at"] > datetime.now(timezone.utc):
        return cached["output_data"]

    # Cache miss or stale entry: generate the report
    output = await generate_report(report_type, params)

    # Upsert with a 1-hour expiry (a TTL index on expires_at removes stale docs)
    await db.report_cache.update_one(
        {"cache_key": cache_key},
        {"$set": {"output_data": output, "expires_at": datetime.now(timezone.utc) + timedelta(hours=1)}},
        upsert=True
    )
    return output

Validation Rules

Business Rules

  • Document Size: Max 16MB per document — split large payloads across multiple docs or use GridFS
  • Index Limit: Max 64 indexes per collection — prioritize compound indexes over single-field
  • TTL Precision: TTL index runs every 60 seconds — documents may live up to 60s past expiry
  • Write Concern: Use w:majority for audit logs and entity_changelog to prevent data loss
  • Read Preference: Use secondaryPreferred for analytics_aggregate and dashboard_snapshot reads
  • Sharding Key: Shard high-volume collections (entity_changelog, api_request_log) by tenant_id + created_at
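The 16MB document limit can be enforced with a pre-insert guard. This sketch approximates size with JSON, since BSON encoding differs slightly; a real check would use `bson.encode` from PyMongo.

```python
# Pre-insert guard for the 16MB document limit. JSON size is an approximation
# of BSON size; use bson.encode from PyMongo for an exact check.
import json

MAX_BSON_BYTES = 16 * 1024 * 1024  # 16MB MongoDB document limit

def approx_doc_bytes(doc: dict) -> int:
    return len(json.dumps(doc, default=str).encode("utf-8"))

def check_doc_size(doc: dict) -> None:
    size = approx_doc_bytes(doc)
    if size > MAX_BSON_BYTES:
        raise ValueError(f"document ~{size} bytes exceeds 16MB; use GridFS or split it")
```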

Integration Points

Connected Modules

  • PostgreSQL CDC: Debezium or WAL2JSON captures RDBMS changes → writes to entity_changelog
  • Redis: Hot cache layer in front of dashboard_snapshot for sub-10ms reads
  • Elasticsearch: Sync search_index_cache to ES for advanced full-text search
  • S3/GCS Cold Storage: Archive expired documents to object storage before TTL deletion
  • Grafana/Prometheus: Export performance_metric and system_health_event for dashboards
  • Notification Engine: notification_rule (RDBMS) triggers → notification_delivery (MongoDB) tracking

Best Practices

Schema Design

  • Embed related data within documents to avoid cross-collection lookups
  • Use entity_type + entity_id pattern for polymorphic references
  • Keep documents under 1KB for high-throughput collections (metrics, events)
  • Use arrays for bounded lists (attachments, reactions) but not for unbounded growth
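Bounded arrays can be kept bounded at write time with `$push` + `$slice`. The field name `recent_items` and the cap of 20 are illustrative, not an InfraTraq schema.

```python
# Build an update document that appends to an embedded array while capping
# its length with $push + $each + $slice. Names are illustrative.
def push_capped(field: str, value, cap: int = 20) -> dict:
    # $slice: -cap keeps only the newest `cap` elements after the push
    return {"$push": {field: {"$each": [value], "$slice": -cap}}}
```

Usage would look like `db.user_favorite.update_one({"user_id": uid}, push_capped("recent_items", item))`.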

Performance

  • Always create indexes before deploying a collection to production
  • Use projection to return only needed fields on large documents
  • Avoid $regex queries without an anchored prefix — use text indexes instead
  • Batch inserts using insertMany() for high-volume write paths
  • Monitor the slow query log (>100ms) and optimize with explain()
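Batched inserts can be sketched as a chunking generator feeding `insert_many`. The 1000-document batch size is an illustrative default, not a driver requirement.

```python
# Chunking helper for insert_many on high-volume write paths.
from itertools import islice

def batches(docs, size: int = 1000):
    it = iter(docs)
    while chunk := list(islice(it, size)):
        yield chunk

async def bulk_insert(collection, docs, size: int = 1000):
    for chunk in batches(docs, size):
        # ordered=False lets the server continue past individual failures
        await collection.insert_many(chunk, ordered=False)
```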

Operations

  • Enable authentication and TLS in production
  • Use replica sets (minimum 3 nodes) for high availability
  • Schedule compaction during off-peak hours for TTL-heavy collections
  • Backup strategy: daily mongodump + continuous oplog tailing
  • Monitor oplog window — should be >= 24 hours for disaster recovery