Architecture & Data Flow
Overview
The SOC follows a layered data flow: collect at endpoints → correlate in Wazuh Manager → store in Wazuh Indexer → auto-dispatch to TheHive/CrowdSec/Teams → human triage → investigation in a Case → optional forensics via Velociraptor.
Current state (2026-04-15) has 2 key deviations from the target roadmap:
- Orchestration runs via simple custom scripts on Wazuh Manager, not via Shuffle (SOAR). Shuffle is deployed but idle — playbooks haven't been written yet.
- Tier 3 components are not deployed: Suricata (NIDS) blocked by missing SPAN port, Grafana not deployed, Sigma community rules not imported.
Data flow (step by step)
Step 1. Collection
- Wazuh agents (35 active) on Academy's CTs/hosts collect login events, File Integrity Monitoring (FIM) changes, process activity, USB events, and vulnerability scan results.
- FortiGate sends syslog to Wazuh Manager (port 514/udp).
Step 2. Reception and correlation
- Wazuh Manager (CT702) receives events on port 1514 (agent protocol, TCP+UDP).
- Applies decoders + rules:
  - Built-in Wazuh ruleset (thousands of rules)
  - Local rules in /var/ossec/etc/rules/local_rules.xml (including RFC 5737 suppressions; see Daily Routine Tier 1)
- Generates alerts with level 0-15 (0 = suppressed, 15 = critical).
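The level semantics above, together with the level ≥ 10 dispatch threshold used in Step 4, can be sketched as a small classifier. This is an illustrative sketch only: the function name and the returned labels are hypothetical, and it assumes the alert is the JSON object written by Wazuh, with the level stored under rule.level.

```python
def classify_alert(alert: dict) -> str:
    """Map a Wazuh alert to a coarse handling label (labels are hypothetical)."""
    level = int(alert.get("rule", {}).get("level", 0))
    if not 0 <= level <= 15:
        raise ValueError(f"rule.level out of range: {level}")
    if level == 0:
        return "suppressed"      # level 0 = suppressed
    if level >= 10:
        return "auto-dispatch"   # threshold used by the Step 4 scripts
    return "indexed-only"        # stored in the Indexer, no automatic dispatch


# Example alert fragment; the rule id is a placeholder.
print(classify_alert({"rule": {"level": 12, "id": "5712"}}))
```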
Step 3. Storage
- All events are indexed in Wazuh Indexer (3-node OpenSearch cluster):
  - CT701 primary on siem-px1
  - CT704 replica 1 on siem-px3
  - CT705 replica 2 on siem-px5
- Data has replication factor 2 (each document is stored on 2 nodes), so losing 1 node doesn't lose data.
Step 4. Automatic dispatch (level ≥ 10)
⚠️ Divergence from plan: instead of a single Shuffle (SOAR) conductor, there are currently 3 independent custom scripts on Wazuh Manager (temporary — migration to Shuffle tracked as TASK-114):
| Script | Destination | Purpose |
|---|---|---|
| custom-thehive.py | TheHive (CT710) → POST /api/v1/alert | Create Alert in TheHive (NOT Case!) |
| custom-crowdsec-block.py | CrowdSec (CT707) → POST /v1/alerts | Ban IP (with whitelist filter + TTL) |
| custom-teams.sh | Microsoft Teams webhook | Adaptive Card in SIEM-Alerts channel |
These three jobs run in parallel from Wazuh Manager.
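As a rough illustration of what a script like custom-thehive.py has to do, the sketch below turns a Wazuh alert into a JSON body for POST /api/v1/alert. It is not the deployed script. The function name, the severity mapping, the tag format, and the "wazuh"/"wazuh-manager" type and source values are assumptions made for the example. The payload field names follow the TheHive 5 alert schema as commonly documented and should be checked against the running version. The HTTP call itself is deliberately left out.

```python
import json
from typing import Optional

DISPATCH_MIN_LEVEL = 10  # dispatch threshold stated for Step 4


def build_thehive_alert(alert: dict) -> Optional[dict]:
    """Return a TheHive alert body, or None if the alert is below the threshold."""
    rule = alert.get("rule", {})
    level = int(rule.get("level", 0))
    if level < DISPATCH_MIN_LEVEL:
        return None
    agent_name = alert.get("agent", {}).get("name", "unknown")
    srcip = alert.get("data", {}).get("srcip")
    # Only attach an observable when the alert actually carries a source IP.
    observables = [{"dataType": "ip", "data": srcip}] if srcip else []
    return {
        "type": "wazuh",                    # assumed value
        "source": "wazuh-manager",          # assumed value
        "sourceRef": str(alert.get("id", "")),
        "title": f"[level {level}] {rule.get('description', 'Wazuh alert')}",
        "description": alert.get("full_log", ""),
        "severity": 3 if level >= 13 else 2,  # hypothetical mapping to TheHive 1-4
        "tags": [f"rule:{rule.get('id', '?')}", f"agent:{agent_name}"],
        "observables": observables,
    }


# Sample alert with placeholder values (198.51.100.7 is an RFC 5737 address).
sample = {
    "id": "1713180000.12345",
    "rule": {"id": "5712", "level": 10, "description": "sshd brute force"},
    "agent": {"name": "ct-example"},
    "data": {"srcip": "198.51.100.7"},
    "full_log": "example log line",
}
print(json.dumps(build_thehive_alert(sample), indent=2))
```

Keeping payload construction separate from the HTTP request makes the mapping easy to test without a live TheHive instance, which may also help when the same logic is later rebuilt as a Shuffle playbook.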
Step 5. Triage (Tier 1 analyst, human)
Analyst opens TheHive Alerts queue. For each Alert:
- 📄 Preview and Import (icon on the alert row) → modal with description, tags, observables.
- Based on the text (rule, agent, srcip, full_log) the analyst decides:
  - Suspicious → "Yes, Import" → alert becomes a Case
  - False positive (FP, noise) → Cancel, then "Mark as read"
- Cortex analyzers (VirusTotal, AbuseIPDB, MISP) run only inside a Case (not on Alerts) — the "Run analyzers" button is available on an observable in Case → Observables tab.
Alert vs Case — the fundamental distinction:
| | Alert | Case |
|---|---|---|
| How it's created | Automatically (Wazuh→TheHive) | Manually (from an Alert via "Yes, Import") |
| What it means | Raw warning, not yet triaged | Confirmed incident under investigation |
| How many | Many (~100-1000/day), 70% FP | Few (~5-20/day), all significant |
| Cortex analyzers | ❌ not available | ✅ "Run analyzers" button on each observable |
| Where in TheHive UI | Alerts tab | Cases tab |
Without the "Alert → triage → Case" intermediate step, TheHive would be flooded with noise within days.
Step 6. Investigation (Tier 2+)
Analyst works inside a TheHive Case:
- Adds tasks (what to check)
- Adds observables (discovered IoCs)
- For each observable — Cortex enrichment (VT/AbuseIPDB/MISP again)
- Searches MISP — "has anyone else seen this IoC? when? in which event?"
- If needed — Velociraptor hunt on the host: "show processes / files / registry"
Step 7. WAN attack response
This is a parallel flow, independent of TheHive:
- CrowdSec stores decisions locally (SQLite)
- blocklist-mirror on CT707 exports active decisions as an HTTP feed (http://10.250.0.16/security/blocklist)
- FortiGate pulls the feed every minute as an external-resource
- FortiGate Policy 137 DROPs traffic from IPs in the feed
So a ban takes effect within 1-2 minutes of the Wazuh alert — the attacking IP loses access even to Academy's public services.
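The ban path (whitelist filter, TTL, then a feed of active decisions) can be sketched as follows. This is a simplified model under stated assumptions, not the CrowdSec or blocklist-mirror implementation: the whitelist network, the decision tuple layout, and both function names are hypothetical, and the feed is assumed to be plain text with one IP per line.

```python
import ipaddress
from datetime import datetime, timedelta, timezone

# Hypothetical whitelist; the real list used by custom-crowdsec-block.py is not shown here.
WHITELIST = [ipaddress.ip_network("10.0.0.0/8")]


def should_ban(ip: str) -> bool:
    """Return False for addresses that fall inside a whitelisted network."""
    addr = ipaddress.ip_address(ip)
    return not any(addr in net for net in WHITELIST)


def render_feed(decisions, now: datetime) -> str:
    """Render unexpired decisions as one IP per line.

    decisions: iterable of (ip, banned_at, ttl) tuples (assumed layout).
    """
    active = {ip for ip, banned_at, ttl in decisions if banned_at + ttl > now}

    def sort_key(ip: str):
        addr = ipaddress.ip_address(ip)
        return (addr.version, int(addr))  # keeps IPv4 and IPv6 sortable together

    return "".join(f"{ip}\n" for ip in sorted(active, key=sort_key))


now = datetime(2026, 4, 15, 12, 0, tzinfo=timezone.utc)
decisions = [
    ("203.0.113.9", now - timedelta(minutes=5), timedelta(hours=4)),   # still active
    ("198.51.100.7", now - timedelta(hours=5), timedelta(hours=4)),    # expired
]
print(render_feed(decisions, now), end="")
```

Filtering expired decisions when the feed is rendered means an IP drops out of the blocklist on the first FortiGate pull after its TTL ends, without a separate cleanup step.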
Diagram (current real flow)
External threat feeds                          ┌────────────────────────┐
(CERT-UA, MISP feeds)──────hourly feed────────▶│ MISP (CT706)           │
                                               │ 570+ events            │
                                               └───────┬─────────┬──────┘
                                                       │         │
                                       (analyzer query │         │ hourly sync
                                        from Cortex)   │         ▼
                                                       │ ┌────────────────┐
                                                       │ │ TheHive Alerts │
                                                       │ │ queue (CT710)  │
                                                       │ └───────┬────────┘
                                                       │         │ Import as Case
                                                       │         │ (human click)
                                                       ▼         ▼
                                                 ┌──────────────────────┐
                                                 │ Cortex (CT711)       │
                                                 │ ↑ enrichment         │
                                                 │ VT+AbuseIPDB+MISP    │
                                                 └──────────┬───────────┘
                                                            │
                     ┌───────────────┐                      │
35 Wazuh agents ─────│               │ custom-thehive.py    │
FortiGate syslog ────│ Wazuh Manager ├─────────────────────▶│
                     │    (CT702)    │ custom-crowdsec...
                     │               ├─────────▶ CrowdSec (CT707) ──feed──▶ FortiGate Policy 137 DROP
                     │               │ custom-teams.sh
                     │               ├─────────▶ MS Teams SIEM-Alerts
                     └───────┬───────┘
                             │ all events indexed
                             ▼
            ┌───────────────────────────────────┐
            │ Wazuh Indexer (OpenSearch 3-node) │
            │ Primary:  CT701 @ siem-px1        │
            │ Replica1: CT704 @ siem-px3        │
            │ Replica2: CT705 @ siem-px5        │
            └────────────────┬──────────────────┘
                             │ queries
                             ▼
                ┌─────────────────────────┐
                │ Wazuh Dashboard (CT703) │
                └─────────────────────────┘
            ┌────────────────────────────┐
            │ Velociraptor (CT713)       │
            │ - server deployed          │
            │ - clients NOT yet deployed │ ⟵ subtask 26
            └────────────────────────────┘
🟡 NOT IN FLOW currently:
  Shuffle (idle: no playbooks written, subtask 22)
  Suricata (not deployed: blocked by SPAN port)
  Grafana (not deployed, Tier 3)
  Sigma rules (not imported, Tier 3)
Integration points (active)
| From | To | Method | Status |
|---|---|---|---|
| Wazuh Agent | Wazuh Manager | Agent protocol (1514 TCP+UDP) | 🟢 35 agents active |
| FortiGate | Wazuh Manager | Syslog (514 UDP) | 🟢 |
| Wazuh Manager | Wazuh Indexer | REST API (9200 TCP) | 🟢 3-node cluster |
| Wazuh Manager | TheHive | custom-thehive.py → HTTPS POST /api/v1/alert | 🟢 (level ≥ 10) |
| Wazuh Manager | CrowdSec | custom-crowdsec-block.py → HTTP POST /v1/alerts | 🟢 (level ≥ 10, with whitelist) |
| Wazuh Manager | MS Teams | custom-teams.sh → HTTPS Power Automate webhook | 🟢 (level ≥ 10) |
| Wazuh Dashboard | Wazuh Indexer | REST API (9200) | 🟢 |
| TheHive | Cortex | REST API from application.conf (Bearer auth) | 🟢 |
| TheHive | MISP | REST API from application.conf (key auth, hourly sync) | 🟢 570+ events imported |
| Cortex | VirusTotal | REST API (HTTPS outbound) | 🟢 500 lookups/day free tier |
| Cortex | AbuseIPDB | REST API (HTTPS outbound) | 🟢 1000 lookups/day free tier |
| Cortex | MISP (local) | REST API | 🟢 |
| CrowdSec | FortiGate | External-resource HTTP feed (pulled every minute) | 🟢 Policy 137 DROP |
Planned integrations (NOT active)
| Integration | Purpose | Tracked in BACKLOG |
|---|---|---|
| Wazuh Manager → Shuffle | SOAR orchestration replacing custom scripts | TASK-092d subtask 22 |
| Shuffle → TheHive / Cortex / Teams / CrowdSec | Playbooks for automation flows | subtask 22 |
| MISP → Wazuh CDB lists | IoC matching in Wazuh rules (99906-99920) | subtask 28 |
| Suricata → Wazuh Manager | Network-level detection | TASK-092e |
| Sigma rules → Wazuh rules | 100+ community detection rules | TASK-092e |
| Wazuh Indexer → Grafana | Executive dashboards for Manager/CISO | TASK-092e |
| Teams Adaptive Card → action buttons | One-click "Open in TheHive" / "Wazuh Discover" | subtask 29 |
| Velociraptor agents → Velociraptor server | Endpoint forensics actually working | subtask 26 (Phase 1-3) |
Why this flow (historical context)
The original plan (Tier 2 deployment plan 2026-04-14) had Shuffle as the central orchestration layer. During actual deployment we decided:
- Ship the MVP fast — Wazuh→TheHive flow is critical, can't wait for playbook authoring
- Write simple custom Python/bash scripts (~99 lines total) for trivial flows
- Keep Shuffle deployed but idle until more complex workflows (if → else → parallel branches) are needed
This made the MVP live in 1 day instead of weeks. The cost: orchestration code lives in 3 separate scripts instead of one Shuffle UI.
⚠️ TASK-114: Migration of custom scripts → Shuffle (mandatory)
Principle: if something CAN live in Shuffle instead of Wazuh Manager as a custom script — it MUST be in Shuffle. Wazuh Manager = detection + correlation. Shuffle SOAR = orchestration + response + notification.
All 3 custom scripts (custom-thehive.py, custom-crowdsec-block.py, custom-teams.sh) must be replaced by Shuffle playbooks. Migration is incremental (one script at a time with parallel testing). Tracked in BACKLOG as TASK-114.
Last updated: 2026-04-16.