Risk-Based Alerting: From Alert Fatigue to Signal
At enterprise scale, SOCs don't usually fail from under-detection. They fail from over-alerting. A typical detection stack fires thousands of events a day, and most of those events are either low-fidelity, redundant, or missing the context an analyst would need to do anything useful. Risk-based alerting fixes this, but not by making rules smarter. It changes what rules are supposed to produce. Instead of each rule firing an alert, each rule contributes to a score. Alerts only fire when a specific entity - a user, a host, an identity - has accumulated enough evidence against it to be worth a human's time. Done well, it turns a stream of noise into a short queue of real investigations. Done badly, it's a more expensive way to miss the same attacks.
Why rule-per-alert stops scaling
Traditional detection treats every rule as a standalone alert producer. User logs in from a new country - alert. Process runs from a suspicious path - alert. Encoded PowerShell - alert. Each one lands in the queue and waits for an analyst.
At small scale this is fine. At enterprise scale it falls apart, and it falls apart in very specific ways.
The obvious fix is "write higher-fidelity rules." It mostly doesn't work. A new-country login could be someone on holiday. A process running from %APPDATA% could be Teams, Slack, or that one piece of business software finance refuses to stop using. Encoded PowerShell shows up in legitimate Intune policies. So teams tune these rules until they only fire on the blatant cases - which is exactly where a half-competent attacker isn't operating anyway. You end up with a ruleset that's both noisy and blind at the same time.
I've watched this cycle play out in more than one SOC. The team writes a rule, it fires too often, someone tunes it down, it misses the next incident, the postmortem blames "detection gaps," and two months later a new rule gets added that repeats the whole loop. RBA doesn't make your rules better. It changes what the rules have to be right about.
What risk-based alerting actually changes
RBA separates two things that traditional detection glues together: noticing something suspicious, and deciding to alert on it.
In an RBA model, detection rules don't produce alerts. They produce risk events - small, structured records attributed to an entity, with a score, a reason, and some context. These risk events accumulate over a rolling window. The alerting layer is a separate thing on top, asking one question: has any entity built up enough risk to be worth a look?
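Concretely, a risk event can be as small as this - a minimal Python sketch, with field names loosely modeled on Splunk's risk index but purely illustrative:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class RiskEvent:
    """One detection's contribution to an entity's risk - not an alert."""
    risk_object: str        # the entity the risk attaches to, e.g. a username
    risk_object_type: str   # "user", "host", "service_account", ...
    risk_score: int         # per-rule weight, set by confidence and severity
    risk_message: str       # human-readable reason, shown at triage
    source_rule: str        # which detection produced this event
    mitre_tactics: list = field(default_factory=list)  # ATT&CK tactic IDs
    timestamp: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc))

# A single suspicious observation becomes a scored record, not a queue item.
event = RiskEvent(
    risk_object="jdoe",
    risk_object_type="user",
    risk_score=20,
    risk_message="Encoded PowerShell launched from %APPDATA%",
    source_rule="endpoint_encoded_powershell",
    mitre_tactics=["TA0002"],  # Execution
)
```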
Three things change once you do that:
- Rules can be tuned for sensitivity instead of specificity. They don't have to be right on their own anymore.
- Weak signals start earning their keep. One suspicious event is noise. The same event plus three others on the same user in the same day is a pattern.
- The analyst gets a case, not an alert. When something fires, it arrives with the full history of contributions already attached to one entity. Half the triage work is done before a human sees it.
Fewer alerts. More evidence behind each one. Most of the investigation already assembled.
How it actually works - Splunk ES and Databricks
Four moving parts: detections that emit risk events, an entity model those events stick to, an aggregation layer that scores and decays them, and an alerting layer that fires on thresholds.
In Splunk Enterprise Security, this maps onto the Risk Analysis Framework. Correlation searches no longer emit notables directly. They trigger the Risk Analysis adaptive response action, which writes to the risk index with risk_object, risk_object_type, risk_score, risk_message, and critically, annotations.mitre_attack. The detection searches are just sensors. The alerting layer is a separate set of searches - usually called risk incident rules - that query the risk index looking for entities that meet specific criteria.
One thing I learned the hard way: don't use raw | search against the risk index at scale. Once you're above a few thousand risk events per day, that query starts dragging. | tstats against the Risk data model is an order of magnitude faster, and that difference is what separates near-real-time RBA from a 20-minute detection lag. I've had incidents where the attacker was in and out before the correlation search even caught up. That's not a detection problem, that's a query design problem.
The output of a risk incident rule is a notable representing the entity - with contributing risk events as drill-down, and MITRE techniques already tagged on the incident.
For teams doing detection on a data lakehouse, the pattern is the same with different plumbing. In Databricks, detection logic runs as Structured Streaming or scheduled batch jobs writing risk events to a Delta table partitioned by date. The aggregator reads that table over a rolling window, applies decay and grouping, and publishes entities above threshold to an alerts table. Three things matter more than the SQL syntax: watermarking so late-arriving data doesn't force you to recompute the whole window, schema evolution so new detection rules can add fields without breaking aggregation, and idempotent writes so if you reprocess, scores don't get double-counted. Miss any of those and your aggregator becomes a false positive factory.
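Spark specifics aside, the idempotency requirement reduces to one rule: deduplicate on a stable event ID before scoring. A plain-Python sketch of the aggregation core (the Delta and streaming plumbing is elided; the event_id and field names are assumptions):

```python
def aggregate_entity_scores(risk_events):
    """Sum risk per entity, deduplicating on a stable event ID so a
    replayed or reprocessed batch never double-counts a score."""
    seen, scores = set(), {}
    for ev in risk_events:
        if ev["event_id"] in seen:  # same record seen again on reprocess
            continue
        seen.add(ev["event_id"])
        scores[ev["entity"]] = scores.get(ev["entity"], 0) + ev["score"]
    return scores

events = [
    {"event_id": "a1", "entity": "jdoe", "score": 20},
    {"event_id": "a2", "entity": "jdoe", "score": 30},
    {"event_id": "a1", "entity": "jdoe", "score": 20},  # replayed record
]
# aggregate_entity_scores(events) -> {"jdoe": 50}, not 70
```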
But the plumbing is secondary. The first question you need to answer is: what is the entity? Get that wrong and nothing downstream works.
Entity resolution is where most deployments quietly fail
Risk has to be attributed to something, and that something has to be stable, resolvable, and meaningful.
A user identity is usually the strongest entity - most attack activity touches a user eventually. A host is useful when endpoint telemetry dominates, but it's unreliable in cloud-heavy environments where hosts live for minutes. An asset identifier works for servers and service accounts but breaks down for users who hop between laptops and phones.
In practice, mature deployments run multiple entity types side by side - user risk, host risk, sometimes service-account risk - with pivots between them. A compromised credential used on a new host should raise risk on both. Correlations that span entities (same user, same host, short window) are some of the strongest signals you'll ever build.
The messy part is entity resolution. A rule firing on svc_backup01 and another firing on [email protected] need to reconcile to the same underlying identity, or risk just fragments across identifiers and never aggregates. In Splunk ES this is what the Asset and Identity Framework is for - mapping hostnames, IPs, MACs, user IDs, email addresses, and sAMAccountNames into single resolvable entities through identity and asset lookups. An RBA program running on top of an empty or neglected Asset and Identity Framework will silently undercount every entity in the environment. I've seen teams pour six months of engineering into RBA content and then spend another two working out why the risk scores looked suspiciously low. It was almost always this.
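Stripped of the framework machinery, the resolution step is just a lookup from every observed identifier to one canonical entity. A sketch with a hypothetical flat alias table (in Splunk ES this job belongs to the Asset and Identity Framework; the entries here are illustrative):

```python
# Hypothetical alias table - every identifier an attacker might touch
# maps to one canonical entity.
IDENTITY_LOOKUP = {
    "svc_backup01": "identity:jsmith",
    "[email protected]": "identity:jsmith",
}

def resolve_entity(raw_identifier):
    """Map any observed identifier to a canonical entity. The fallback to
    the raw value is exactly where fragmentation creeps in unnoticed."""
    key = raw_identifier.lower()
    return IDENTITY_LOOKUP.get(key, key)

resolve_entity("svc_backup01")       # -> "identity:jsmith"
resolve_entity("[email protected]")  # -> "identity:jsmith", same entity
```

Risk events attributed through this lookup aggregate on one identity; events attributed to the raw strings fragment across two, and neither ever crosses threshold.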
Scoring, decay, and thresholds - where people overengineer
There's a temptation to build elaborate scoring systems. Weighted matrices, multi-factor severity models, asset criticality multipliers. All of it can be useful. Most of it becomes unmaintainable within a year.
Start simple: low (10-20), medium (30-50), high (60-80), critical (90+), scores assigned per rule based on confidence and severity. But raw sum is the wrong aggregation, and this is where naive implementations blow up.
Summing scores lets one noisy rule dominate an entity's risk. If a misconfigured detection fires 40 times on the same user in an hour, that user is now sitting at 400+ points on a single signal - which is exactly what RBA is supposed to prevent. Mature deployments aggregate on three dimensions at once: total score, distinct source count (how many different rules contributed), and MITRE tactics coverage (how many ATT&CK tactics the risk events span). A threshold that reads "risk_score > 100 AND distinct_sources >= 3 AND distinct_tactics >= 2" is a much better signal than "risk_score > 100" on its own.
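That compound threshold is simple to express once risk events carry source and tactic fields. A sketch (field names and cutoffs are assumptions):

```python
def should_alert(events, min_score=100, min_sources=3, min_tactics=2):
    """Alert only when total score, rule diversity, and tactic spread all
    cross threshold - one noisy rule can't do it alone."""
    total = sum(e["score"] for e in events)
    sources = {e["source_rule"] for e in events}
    tactics = {t for e in events for t in e["tactics"]}
    return (total > min_score
            and len(sources) >= min_sources
            and len(tactics) >= min_tactics)

# 40 fires of one misconfigured rule: 400 points, one source, one tactic.
noisy = [{"score": 10, "source_rule": "r1", "tactics": ["TA0007"]}] * 40
# A smaller total spread across rules and tactics is the better signal.
spread = [
    {"score": 40, "source_rule": "r1", "tactics": ["TA0001"]},
    {"score": 40, "source_rule": "r2", "tactics": ["TA0006"]},
    {"score": 40, "source_rule": "r3", "tactics": ["TA0008"]},
]
# should_alert(noisy) -> False; should_alert(spread) -> True
```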
The tactics dimension is where RBA starts earning its real keep. An entity with risk events across Initial Access, Credential Access, and Lateral Movement is a different animal from an entity with three risk events all tagged Discovery - same score, completely different story. Tagging MITRE annotations on every risk event is what makes this possible.
Decay is the other thing teams often skip. Without it, risk accumulates forever and every senior engineer eventually crosses threshold just by existing. Exponential decay with a 4-8 hour half-life is a reasonable default - full weight when fresh, half after one half-life, a quarter after two. Important: this is a weighting function applied at aggregation time, not a delete job on the risk index. The events stay; only their contribution to the live score decays.
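The decay weighting itself is one line - a sketch, with the 6-hour half-life picked from the middle of the range above:

```python
def decayed_score(raw_score, age_hours, half_life_hours=6.0):
    """Exponential decay applied at aggregation time. The event itself is
    never deleted; only its contribution to the live score shrinks."""
    return raw_score * 0.5 ** (age_hours / half_life_hours)

decayed_score(40, 0)   # 40.0 - fresh, full weight
decayed_score(40, 6)   # 20.0 - one half-life old
decayed_score(40, 12)  # 10.0 - two half-lives old
```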
Thresholds should match what the SOC can actually absorb, not what looks severe on paper. If your team handles 15 investigations a day, threshold is whatever produces roughly 15 entities a day above it. Everything else - weights, decay curves, tactics requirements - is tuning around that constraint. This is the part I find most teams get wrong in their first pass. They set thresholds based on "risk severity," queue fills up, analysts get overwhelmed again, and someone declares RBA a failure. It isn't. The threshold is just too low.
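Calibrating from capacity rather than severity can be done directly from historical data - a sketch, assuming you have one day's peak risk score per entity:

```python
def calibrate_threshold(peak_scores, capacity=15):
    """Given one day's peak risk score per entity, return the threshold
    that would have surfaced roughly `capacity` entities that day."""
    ranked = sorted(peak_scores, reverse=True)
    if len(ranked) <= capacity:
        return 0  # the queue can absorb everything; no gate needed
    return ranked[capacity - 1]  # the capacity-th highest score

# 100 entities with peak scores 1..100: a capacity of 15 puts the bar at 86.
calibrate_threshold(list(range(1, 101)), capacity=15)  # -> 86
```

In practice you would run this over weeks of history and take a robust average, but the principle stands: the analyst headcount sets the threshold, not the severity label.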
Applying ML to risk events
Most RBA implementations stop at threshold logic. That works, but it leaves the highest-value signal on the table: the shape of the risk event sequence, not just its magnitude.
Once an environment has been running RBA for a few months, the risk index itself turns into something useful - a labeled dataset. Each entity-window has a structured representation: the set of contributing rules, their timing, their MITRE annotations, the order in which they fired. Confirmed incidents give you positive labels, closed-as-FP cases give you negatives. That's a rich enough feature space to do something beyond threshold math.
Two applications tend to pay off. The first is risk event clustering - unsupervised grouping of entities by the pattern of their risk contributions, not just total score. Entities that cluster together usually share a root cause. Investigating one often resolves the others. And clusters that appear suddenly, matching nothing previously seen, are strong candidates for priority triage.
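The grouping idea can be sketched with plain Jaccard overlap of contributing-rule sets - a crude stand-in for a real clustering model, with illustrative entity and rule names:

```python
def jaccard(a, b):
    """Set overlap of two contributing-rule sets."""
    return len(a & b) / len(a | b)

def cluster_by_rules(entity_rules, threshold=0.5):
    """Greedy single-pass grouping of entities whose risk came from
    overlapping rule sets - likely to share a root cause."""
    clusters = []
    for entity, rules in entity_rules.items():
        for c in clusters:
            if jaccard(rules, c["rules"]) >= threshold:
                c["members"].append(entity)
                c["rules"] |= rules
                break
        else:
            clusters.append({"members": [entity], "rules": set(rules)})
    return [c["members"] for c in clusters]

entity_rules = {
    "host1": {"r1", "r2", "r3"},  # same contribution pattern...
    "host2": {"r1", "r2", "r3"},  # ...lands in the same cluster
    "host9": {"r7"},              # unrelated
}
# cluster_by_rules(entity_rules) -> [["host1", "host2"], ["host9"]]
```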
The second is sequence-based anomaly detection on the order and timing of risk events within an entity window. An attacker moving through an environment produces a recognizable sequence - initial access, then credential access, then lateral movement. It looks very different from the random co-occurrence of risk events that dominates noise. Treating each entity's risk events as a sequence and training an anomaly model to flag unusual progressions catches attack chains that might cross threshold on raw score but get buried in the queue by higher-scoring noise.
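As a deliberately crude stand-in for a trained sequence model, even counting forward steps through the kill chain separates a progression from noise - a sketch, with an illustrative subset of tactic orderings:

```python
# Kill-chain position per ATT&CK tactic (illustrative subset).
TACTIC_ORDER = {"TA0001": 0, "TA0006": 1, "TA0008": 2}  # IA, CredAccess, LatMove

def progression_score(tactic_sequence):
    """Count forward steps through the kill chain. Random co-occurrence
    rarely produces a monotone progression; an intrusion usually does."""
    steps = 0
    for a, b in zip(tactic_sequence, tactic_sequence[1:]):
        if TACTIC_ORDER.get(b, -1) > TACTIC_ORDER.get(a, -1):
            steps += 1
    return steps

progression_score(["TA0001", "TA0006", "TA0008"])  # attack chain -> 2
progression_score(["TA0007", "TA0007", "TA0007"])  # three Discovery events -> 0
```

A real implementation would model timing and learn orderings from labeled incidents rather than hard-code them, but the feature - ordered progression, not co-occurrence - is the same.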
Neither of these replaces the threshold layer - they feed into it. The output of the ML layer is another risk score, or a multiplier on the entity's existing one. It's just another sensor in the same architecture, but one that operates on the output of every other sensor. That recursive quality - detection on detection - is where RBA stops being just an alerting model and starts being an analytics platform.
Where RBA fails - the pitfalls I keep seeing
RBA fails in recognizable ways. Most of them aren't technical.
Scoring inflation is the most common. New rules get high default scores because the team wants the activity they cover to be caught. Existing rules never get their scores reduced, even after a year of data shows they fire far more often than expected. Over time, the top of the queue fills with the same handful of high-score rules firing repeatedly, and the aggregation benefit evaporates - a single event is enough to cross threshold. The fix is boring but effective: every rule's contribution gets reviewed periodically against its actual fire rate and against how often it shows up in confirmed incidents. Rules that fire a lot and rarely appear in confirmed cases get their scores cut.
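That review is mechanical enough to automate - a sketch, with assumed field names and arbitrary cutoffs that a real program would tune:

```python
def review_rule_scores(rule_stats, max_daily_fires=50, min_precision=0.02):
    """Flag rules that fire heavily but almost never appear in confirmed
    incidents - candidates for a score cut."""
    flagged = []
    for rule, s in rule_stats.items():
        precision = s["in_confirmed_incidents"] / max(s["fires"], 1)
        daily_rate = s["fires"] / s["days_observed"]
        if daily_rate > max_daily_fires and precision < min_precision:
            flagged.append(rule)
    return flagged

rule_stats = {
    "encoded_ps_broad":  {"fires": 9000, "in_confirmed_incidents": 3,
                          "days_observed": 30},  # 300/day, ~0.03% precision
    "kerberoast_ticket": {"fires": 120,  "in_confirmed_incidents": 10,
                          "days_observed": 30},  # rare and useful
}
# review_rule_scores(rule_stats) -> ["encoded_ps_broad"]
```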
Entity fragmentation comes second. Covered above - this is almost always the Asset and Identity Framework (or its equivalent) being neglected. It's unglamorous work. It's also the single highest-ROI thing you can invest in before scaling RBA.
Missing MITRE annotations is a quieter failure. Teams build hundreds of detection rules, enable RBA, and never backfill ATT&CK mappings on existing content. Aggregation still works on raw score, but the tactics dimension is empty and you've lost the strongest signal RBA has to offer. Making annotations.mitre_attack required on every new risk event - and enforcing it at the detection-as-code level - is cheap insurance.
Loss of rule visibility is subtler. When every rule becomes a risk contribution rather than an alert, it gets hard to tell which rules are firing and which have silently broken. A detection that stopped producing events last week looks identical, from the alert queue, to one that's working correctly and contributing low-level signals. RBA environments need rule health dashboards - simple fire-rate trackers per rule, flagging anything whose baseline shifted unexpectedly. We had a parser break on one data source in a previous deployment, and about a third of our detection content quietly went silent for nine days before anyone noticed.
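A minimal fire-rate health check catches exactly that failure - a sketch, comparing each rule's recent rate against its own baseline (window and cutoff are assumptions):

```python
from statistics import mean, pstdev

def silent_rules(daily_counts_by_rule, window=7):
    """Flag rules whose recent fire rate collapsed relative to their own
    baseline - from the alert queue, a dead sensor looks like a quiet one."""
    flagged = []
    for rule, counts in daily_counts_by_rule.items():
        baseline, recent = counts[:-window], counts[-window:]
        mu, sigma = mean(baseline), pstdev(baseline)
        # Crude shift test; floor sigma so a flat baseline still works.
        if mean(recent) < mu - 3 * max(sigma, 1):
            flagged.append(rule)
    return flagged

daily_counts = {
    "working_rule":  [20] * 30,             # steady contributor
    "broken_parser": [20] * 23 + [0] * 7,   # went silent seven days ago
}
# silent_rules(daily_counts) -> ["broken_parser"]
```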
Over-trust in the score is the most dangerous. A risk score is a heuristic, not a verdict. An entity at 150 points isn't necessarily compromised - it could be an admin doing admin things, a security tool generating expected noise, or a business process that happens to look suspicious. The score tells analysts where to look. It doesn't tell them what to conclude.
When RBA is the right tool
RBA isn't a universal improvement over traditional detection. It needs a baseline of working detection content, reasonable entity resolution, MITRE-annotated rules, and enough signal volume to make aggregation meaningful. Teams with a small ruleset and low alert volume get less out of it - they mostly need better rules, not a different alerting architecture.
Where RBA genuinely changes things is at scale. Detection stacks with hundreds of rules, thousands of daily events, SOCs past the point where manual triage of every alert was ever going to work. In those environments the question isn't whether to move to risk-based alerting. It's how carefully.
The technical shift - from alerts to scores - is the easy part. A decent engineer can stand it up in a quarter. The operational shift - from triaging events to investigating entities - takes longer, because it changes how analysts work day to day. Both matter. Only the second one actually reduces alert fatigue.