⚡ Sprint 6.3 · Federation + 2nd Hi-End Server · 15 Nov - 5 Dec 2027

No more single-region cluster. The 2nd Hi-End cluster in Cyberjaya goes live with cross-region replication · automatic failover · per-tenant region pinning · zero-downtime failover verified. By end of sprint, a KL outage does not affect Penang or Kuching tenants. Targets: failover RTO < 30s · 99.9% multi-region uptime.

1. 🎯 Sprint Summary

Sprint: 6.3 (Federation + 2nd Hi-End Server)
Duration: 15 Nov - 5 Dec 2027 (3 weeks · 15 working days)
Goal: Cyberjaya 2nd Hi-End cluster · cross-region replication · automatic failover · per-tenant region pinning · DR drill (full restore < 4h) · 99.9% multi-region uptime · 35 tenants live
Capacity: 5.5 FTE (2 BE + 1 FE + 2 DevOps + 0.5 Compliance) + 0.5 Founder + 0.3 Doc Zam
Velocity target: 90 SP
Critical risk: Data loss during cutover · cross-region replication lag above 1s p95
Demo date: 5 Dec 2027
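
As a rough sanity check on the targets above, the 99.9% uptime goal can be converted into a downtime budget and compared with the 30s failover RTO. The sketch below is illustrative only: the 30-day measurement window and the idea that each failover costs a full RTO are assumptions, not figures from the sprint plan.

```python
# Hypothetical sanity check: downtime budget implied by an uptime target.
# Assumptions (not from the plan): 30-day window, each failover costs one full RTO.

def downtime_budget_seconds(uptime_pct: float, window_days: int = 30) -> float:
    """Seconds of downtime allowed in the window for the given uptime percentage."""
    window_seconds = window_days * 24 * 60 * 60
    return window_seconds * (1 - uptime_pct / 100)


def max_failovers(uptime_pct: float, rto_seconds: float, window_days: int = 30) -> int:
    """How many worst-case failovers fit inside the downtime budget."""
    return int(downtime_budget_seconds(uptime_pct, window_days) // rto_seconds)


budget = downtime_budget_seconds(99.9)
print(round(budget / 60, 1))    # 43.2 (minutes per 30 days)
print(max_failovers(99.9, 30))  # 86
```

Under these assumptions the two targets are compatible: a 30s failover uses a small fraction of the monthly budget, so a single long outage is the larger threat to the uptime figure.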

2. 🏗️ Federation Architecture

┌─────────────────────────┐                ┌─────────────────────────┐
│  KL DC · Region A       │ ◄──────────►   │  Cyberjaya · Region B   │
│                         │  10G dark      │                         │
│  Hi-End Cluster #1      │  fibre + VPN   │  Hi-End Cluster #2      │
│  4× L40S GPU            │                │  4× L40S GPU            │
│  Inference: vLLM        │                │  Inference: vLLM        │
│  Storage: NVMe primary  │                │  Storage: NVMe replica  │
│                         │                │                         │
│  MariaDB Galera node 1  │ ◄─async────►   │  MariaDB Galera node 2  │
│  pgvector primary       │  cross-region  │  pgvector replica       │
│  MinIO bucket replica   │  < 1s p95      │  MinIO bucket replica   │
└──────────┬──────────────┘                └──────────┬──────────────┘
           │                                          │
           ▼                                          ▼
┌─────────────────────────────────────────────────────────────────────┐
│                    HAProxy Region Router (DNS-aware)                 │
│  Tenant region tag → primary region · automatic failover on health   │
│  Health check every 10s · failover RTO target < 30s                 │
└─────────────────────────────────────────────────────────────────────┘
                                 │
                                 ▼
                    Tenants · Patients · Doctors

PER-TENANT REGION PINNING
  · Tenant created with primary_region tag (kl_dc_1 OR cyberjaya_1)
  · Read/write goes to primary first
  · On primary unhealthy → automatic failover to secondary
  · After recovery → automated catch-up + re-pin to primary
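
The pinning rules above can be sketched as a small routing decision. This is a minimal illustration, not the HAProxy configuration itself: the `Tenant` class, the `route` function, and the `primary_caught_up` flag are hypothetical names; only the region tags `kl_dc_1` and `cyberjaya_1` come from the plan.

```python
from dataclasses import dataclass
from typing import Dict

# Region tags as named in the plan; everything else here is illustrative.
REGIONS = ("kl_dc_1", "cyberjaya_1")


@dataclass(frozen=True)
class Tenant:
    name: str
    primary_region: str


def secondary_of(region: str) -> str:
    """Return the other region; reject tags that are not known regions."""
    if region not in REGIONS:
        raise ValueError(f"unknown region tag: {region}")
    return REGIONS[1] if region == REGIONS[0] else REGIONS[0]


def route(tenant: Tenant, healthy: Dict[str, bool], primary_caught_up: bool = True) -> str:
    """Pick the region that should serve this tenant's reads and writes.

    The primary is used only when it is healthy and has finished catching up
    after a recovery; otherwise traffic goes to the secondary if it is healthy.
    """
    secondary = secondary_of(tenant.primary_region)  # also validates the tag
    if healthy.get(tenant.primary_region, False) and primary_caught_up:
        return tenant.primary_region
    if healthy.get(secondary, False):
        return secondary
    raise RuntimeError(f"no healthy region available for tenant {tenant.name}")


tenant = Tenant("clinic-kuching", "cyberjaya_1")  # hypothetical tenant name
print(route(tenant, {"kl_dc_1": True, "cyberjaya_1": True}))   # cyberjaya_1
print(route(tenant, {"kl_dc_1": True, "cyberjaya_1": False}))  # kl_dc_1
```

Holding traffic on the secondary until `primary_caught_up` is true is one way to model the "catch-up, then re-pin" step; the plan does not say how catch-up completion is detected.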

REPLICATION GUARANTEES
  · Galera cluster · synchronous within region · async cross-region
  · pgvector · async streaming replica · < 1s lag
  · MinIO · cross-region bucket replication · 5min interval
  · Audit log · WORM with cross-region hash chain
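
To make the cross-region hash chain concrete, here is a minimal sketch of an append-only chain and a check that two regional copies agree. The plan does not specify the hash function, the record format, or how the M9 audit verifies the chain, so SHA-256, the JSON encoding, and all function names below are assumptions.

```python
import hashlib
import json
from typing import Dict, List

GENESIS = "0" * 64  # assumed starting value for an empty chain


def entry_hash(prev_hash: str, record: Dict) -> str:
    """Hash the previous entry's hash together with a canonical JSON record."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append(chain: List[Dict], record: Dict) -> None:
    """Append a record, linking it to the hash of the last entry."""
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"record": record, "prev": prev, "hash": entry_hash(prev, record)})


def verify(chain: List[Dict]) -> bool:
    """Recompute every link; any edited or reordered entry breaks the chain."""
    prev = GENESIS
    for entry in chain:
        if entry["prev"] != prev or entry["hash"] != entry_hash(prev, entry["record"]):
            return False
        prev = entry["hash"]
    return True


def replicas_agree(region_a: List[Dict], region_b: List[Dict]) -> bool:
    """Both copies must be internally valid and carry identical hashes."""
    hashes_a = [e["hash"] for e in region_a]
    hashes_b = [e["hash"] for e in region_b]
    return verify(region_a) and verify(region_b) and hashes_a == hashes_b
```

Because each hash depends on the one before it, comparing only the latest hash from each region is enough to detect divergence in day-to-day monitoring; the full comparison above is the slower audit path.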

3. 🚦 Pre-Sprint Gate Checklist

  • Sprint 6.2 closeout · 30 tenants stable across 4 states
  • Cyberjaya colocation contract signed · power audit passed
  • 2nd Hi-End hardware delivered · 4× L40S GPUs · same spec as KL DC
  • 10G dark fibre + VPN to KL DC provisioned · < 5ms RTT verified
  • Galera cluster strategy agreed (synchronous in-region · async cross-region)
  • HAProxy region router deployed in test environment
  • Per-tenant region tag schema migration drafted
  • DR runbook v2 with cross-region procedures
  • Pen-test scope updated (multi-region surface)
  • Compliance Lead reviewed cross-region data residency (still on Malaysian soil)

4. 🧩 Sprint Scope

  • Cyberjaya Cluster Bring-Up: Rack · power · cooling · 4× L40S install · OS + driver · vLLM · Triton · LiteLLM proxy
  • Cross-Region Networking: 10G dark fibre · VPN tunnel · IPsec · DNS region-aware
  • Galera Cross-Region Cluster: Synchronous in-region · async streaming cross-region · conflict resolution · last-writer-wins with audit
  • pgvector Async Replica: Streaming replication · < 1s lag p95 · monitoring
  • MinIO Bucket Replication: Cross-region S3 replication · 5min interval · DICOM + audio shared
  • Per-Tenant Region Pinning: Tenant migration with region tag · routing rules · admin override
  • HAProxy Region Router: DNS-aware · health check 10s · failover RTO < 30s · automatic re-pin
  • Audit Log Cross-Region Hash Chain: WORM extended cross-region · M9 audit verifies both sides
  • DR Drill v2: Full restore from cold backup · cross-region · target RTO < 4h · documented
  • Multi-Region Monitoring: Filament admin extension · per-region health · replication lag · latency
  • Per-Tenant Migration: 5 East Malaysia tenants migrated to Cyberjaya as primary (closer to East Malaysia)
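
Several scope items above are gated on "< 1s lag p95". A minimal sketch of how that check could be evaluated from collected lag samples follows. The nearest-rank percentile method and the function names are assumptions; the plan does not say how lag is sampled or which percentile definition the monitoring uses.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("no samples collected")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def lag_within_target(lag_seconds: Sequence[float], target_s: float = 1.0) -> bool:
    """True when the p95 replication lag is strictly below the target."""
    return percentile(lag_seconds, 95) < target_s


# Illustrative samples only: 19 fast measurements and one slow outlier.
samples = [0.2] * 19 + [3.0]
print(percentile(samples, 95))     # 0.2
print(lag_within_target(samples))  # True
```

A p95 target tolerates occasional spikes, so it is worth tracking the maximum alongside it when judging data-loss exposure during a failover.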

5. 📅 Day-by-Day Plan (15 days)

D1 · Mon 15 Nov · Cyberjaya Hardware + OS
Rack power up · OS install · NVIDIA driver · network bring-up.
D2 · Tue 16 Nov · Cross-Region Network
10G fibre + VPN · IPsec · < 5ms RTT verified · DNS region-aware setup.
D3 · Wed 17 Nov · Inference Stack Cyberjaya
vLLM · Triton · LiteLLM · same model versions as KL DC · smoke test.
D4 · Thu 18 Nov · Galera Cross-Region Cluster
Galera node 2 join · synchronous in-region · async streaming cross-region.
D5 · Fri 19 Nov · Mid-Demo + Replication Verification
Live mid-demo · replication lag < 1s p95 verified · MinIO sync confirmed.
D6 · Mon 22 Nov · pgvector Replica
Streaming replica · CPG corpus mirrored · vector search cross-region tested.
D7 · Tue 23 Nov · Per-Tenant Region Tag Migration
tenant_region column · backfill all 30 tenants with kl_dc_1 default · audit.
D8 · Wed 24 Nov · HAProxy Region Router
DNS-aware routing · health check 10s · failover < 30s tested.
D9 · Thu 25 Nov · Tenant Failover Drills
5 failover scenarios per tenant · RTO measured · zero data loss verified.
D10 · Fri 26 Nov · Mid-Demo Round 2 + DR Drill
Full DR drill · cold backup restore cross-region · RTO measured.
D11 · Mon 29 Nov · Audit Log Cross-Region Chain
WORM extended · cross-region hash chain · M9 audit verifies both sides.
D12 · Tue 30 Nov · East Malaysia Tenant Migration
5 East Malaysia tenants migrated to Cyberjaya primary · latency improvement measured.
D13 · Wed 1 Dec · Multi-Region Monitoring Dashboard
Filament dashboard · per-region health · replication lag · latency per tenant.
D14 · Thu 2 Dec · Pen-Test Light + Hardening
Multi-region surface tested · cross-region data leak attempt blocked.
D15 · Fri 3 Dec · Demo Prep + Polish
Demo deck · 35-tenant milestone · federation success metrics.
+ · Mon 5 Dec · Sprint Demo + Retro
9am demo · 11am retro · 2pm 6.4 (MOH Partnership) prep.
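
The D7 step (add a `tenant_region` column, backfill all 30 tenants with `kl_dc_1`, then audit) can be sketched as follows. The example uses an in-memory SQLite database purely so it runs anywhere; the plan's database is MariaDB, and the `tenants` table name, its columns, and the helper functions are assumptions.

```python
import sqlite3


def add_region_column(conn: sqlite3.Connection, default_region: str = "kl_dc_1") -> int:
    """Add the region column and backfill rows that have no region yet.

    Returns the number of rows that were backfilled.
    """
    conn.execute("ALTER TABLE tenants ADD COLUMN tenant_region TEXT")
    cur = conn.execute(
        "UPDATE tenants SET tenant_region = ? WHERE tenant_region IS NULL",
        (default_region,),
    )
    conn.commit()
    return cur.rowcount


def audit_backfill(conn: sqlite3.Connection) -> int:
    """Count tenants still missing a region tag; 0 means the backfill is complete."""
    row = conn.execute(
        "SELECT COUNT(*) FROM tenants WHERE tenant_region IS NULL"
    ).fetchone()
    return row[0]


# Stand-in data: 30 tenants, matching the tenant count at the start of the sprint.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tenants (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO tenants (name) VALUES (?)",
    [(f"tenant-{i}",) for i in range(1, 31)],
)
backfilled = add_region_column(conn)
print(backfilled, audit_backfill(conn))  # 30 0
```

Adding the column as nullable and backfilling in a second statement keeps the migration reversible until the audit passes; a NOT NULL constraint could be added afterwards.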

6. 📦 Deliverables

FR | Item | SP
FR-6.3.1 | Cyberjaya Hi-End cluster bring-up | 8
FR-6.3.2 | 10G dark fibre + VPN cross-region | 5
FR-6.3.3 | vLLM + Triton + LiteLLM Cyberjaya | 5
FR-6.3.4 | Galera cross-region cluster | 8
FR-6.3.5 | pgvector streaming replica | 5
FR-6.3.6 | MinIO cross-region replication | 5
FR-6.3.7 | Per-tenant region tag schema + migration | 5
FR-6.3.8 | HAProxy region router + failover | 8
FR-6.3.9 | 5 failover drill scenarios + measurements | 5
FR-6.3.10 | DR drill v2 cross-region full restore | 8
FR-6.3.11 | WORM audit cross-region hash chain | 5
FR-6.3.12 | 5 East Malaysia tenants migrated to Cyberjaya | 5
FR-6.3.13 | Multi-region monitoring dashboard | 5
FR-6.3.14 | Pen-test light · multi-region surface | 5
FR-6.3.15 | Compliance docs update · cross-region | 3
TOTAL | | 85 SP

7. 👥 Team Capacity

Role | Allocation | Focus
Eng Lead / BE | 1.0 FTE | Region tag · Galera · ConsentedFetch cross-region
BE Dev 2 | 1.0 FTE | Replication · audit hash chain · failover handler
FE Dev | 1.0 FTE | Multi-region monitoring dashboard · admin region switcher
DevOps 1 | 1.0 FTE | Cyberjaya cluster · vLLM · network · DR
DevOps 2 | 1.0 FTE | Galera cluster · pgvector · MinIO · monitoring
Compliance Lead | 0.5 FTE | Cross-region PDPA + data residency · doc pack update
Founder | 0.5 FTE | Cyberjaya colo + vendor relations · pen-test coord
Doc Zam | 0.3 FTE | Clinical impact assessment · failover oversight

8. 🔔 Sprint Ceremonies

  • Mon 15 Nov 9am — Sprint Planning (90 min)
  • Daily 9am — Standup (15 min · DevOps deep-dive Tue/Thu)
  • Fri 19 + Fri 26 Nov 4pm — Mid-sprint demos (60 min each)
  • Wed 25 Nov 4pm — Failover drill rehearsal (90 min)
  • Tue 30 Nov 4pm — DR drill review (60 min)
  • Thu 2 Dec 2pm — Pen-test debrief (60 min)
  • Mon 5 Dec 9am — Sprint Demo (90 min)
  • Mon 5 Dec 11am — Sprint Retro (60 min)

9. 🩺 Sign-off Items

  • Cross-region replication lag < 1s p95 verified
  • Failover RTO < 30s on 5 different drill scenarios
  • DR drill: full restore < 4h · cross-region · audit intact
  • Per-tenant region pinning works · admin override functional
  • WORM audit cross-region hash chain unbroken
  • 5 East Malaysia tenants migrated · latency improvement measured · zero data loss
  • Pen-test light: 0 critical · 0 high · ≤ 2 medium with mitigations
  • Compliance pack updated · external consultant attestation
  • Final demo (5 Dec) — written sign-off · Compliance Lead + Doc Zam

10. 🎬 Demo Agenda — 5 Dec 9am (90 min)

Time (min) | Segment
0-5 | Federation narrative · 35-tenant milestone · multi-region rationale
5-15 | Architecture walk · KL DC + Cyberjaya · replication topology
15-30 | Live failover demo · kill KL DC simulator · East Malaysia tenant continues
30-45 | DR drill replay · full restore from cold backup · audit chain verified
45-55 | Per-tenant region pinning · admin region switcher · East Malaysia migration
55-65 | Multi-region monitoring dashboard · per-region health · replication lag
65-80 | Pen-test results · cross-region surface verified · compliance attestation
80-90 | Compliance Lead + Doc Zam sign-off · 6.4 (MOH Partnership) prep

11. 🛡️ Contingency

Risk | Trigger | Response
Hardware delivery slip | 4× L40S vendor late | Cloud burst · use existing KL DC plus temporarily rented Cyberjaya cloud GPU
Replication lag > 1s | p95 misses target | Tighten Galera config · add NVMe write cache · escalate to vendor
Failover RTO > 30s | Health check too slow | Tune health interval · reduce DNS TTL · TCP fast-failover
Cross-region data leak (pen-test) | High-sev finding | Hot-fix · re-test · slip 6.4 by 1 week if needed
Compliance attestation withheld | Cross-region PDPA concern | Engage backup consultant · address findings · iterate
Cyberjaya colo issue | Power/cooling/network failure | Fail over to KL DC · post-mortem · escalate to colo provider
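
The "Failover RTO > 30s" response (tune the health interval, reduce DNS TTL) can be reasoned about with a simple time budget. The model below is a deliberate simplification: it assumes a region is marked down after a fixed number of consecutive failed checks and that clients may hold a stale DNS answer for one full TTL. Only the 10s interval and the 30s RTO come from the plan; the failure count, check timeout, and TTL values are hypothetical.

```python
def worst_case_failover_seconds(
    check_interval_s: float,
    failures_required: int,
    check_timeout_s: float,
    dns_ttl_s: float = 0.0,
) -> float:
    """Upper bound on time from outage to traffic reaching the healthy region.

    Detection: the outage starts just after a passing check, so it takes
    failures_required intervals plus the timeout of the last failing check.
    Propagation: clients may keep a stale DNS answer for up to one TTL.
    """
    detection = check_interval_s * failures_required + check_timeout_s
    return detection + dns_ttl_s


def meets_rto(worst_case_s: float, rto_s: float = 30.0) -> bool:
    """True when the worst case is strictly inside the RTO target."""
    return worst_case_s < rto_s


# 10s interval from the plan; 2 failures, 2s timeout, 5s TTL are assumed values.
print(worst_case_failover_seconds(10, 2, 2, 5))             # 27
print(meets_rto(worst_case_failover_seconds(10, 2, 2, 5)))  # True
print(meets_rto(worst_case_failover_seconds(10, 3, 2, 5)))  # False
```

With a 10s interval there is room for only two failed checks before the 30s target is exceeded, which is why shortening the interval or the DNS TTL is the first lever in the contingency row.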