Introduction
The first three parts of this series built the infrastructure. This final post uses it. You now have four Jetson Nano inference containers running HTTPS, DSDL wired to Splunk, and HEC delivering scored events back to your indexers. In this post, you will train three Isolation Forest models on real security data and build the ES correlation rules that turn AI-scored anomalies into actionable notable events.
The three detection use cases are chosen for complementary coverage across the kill chain. Zeek connection log anomaly detection catches behavioral outliers in network traffic – the kinds of connections that do not match your environment’s normal patterns. DNS anomaly detection catches tunneling, command-and-control beaconing, and DGA callbacks that generate no network flow anomalies because they hide inside legitimate DNS traffic. Windows authentication anomaly detection catches credential abuse, brute force, and lateral movement patterns that look like legitimate events when examined individually but are anomalous in aggregate.
Together these three models cover T1046, T1071, T1048, T1110, T1078, and T1021 – a meaningful portion of the techniques that appear in advanced persistent threat activity.
Prerequisites
- All four Nano inference containers healthy from Part 3
- DSDL fit and apply smoke tests passing from Part 3
- Zeek connection logs indexed at index=net_data sourcetype="bro:conn:json"
- Splunk Stream DNS data indexed at index=stream sourcetype="stream:dns"
- Windows Security event logs indexed at index=wineventlog sourcetype="WinEventLog:Security"
- Minimum 7 days of each data source for viable baseline models
Step 1 – Understanding What Isolation Forest Detects
Before training, understanding what Isolation Forest actually measures prevents misinterpreting its output.
Isolation Forest is an unsupervised anomaly detection algorithm. It works by building random trees that partition the feature space. Points that are easy to isolate – requiring few partitions to separate from the rest – receive high anomaly scores. Points in dense, predictable regions require many partitions and receive low anomaly scores.
For security data, this translates to behavioral outliers: connections with unusual byte ratios, DNS queries with unusually long subdomains, users authenticating from unusual numbers of machines. The model has no concept of malicious versus benign – it only knows normal versus statistically unusual. Your correlation rules are what assign security meaning to high anomaly scores.
The contamination parameter tells the model what fraction of the training data to treat as anomalous. Setting it to 0.05 places the decision threshold so that roughly 5% of training rows are flagged, which in practice means about 5% of your normal traffic will score above the threshold. For security data, start at 0.05 and tune based on your actual false positive rate after a week of operation.
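Under the hood, an isolation_forest notebook like the one built in Part 2 is usually a thin wrapper around scikit-learn's IsolationForest. The standalone Python sketch below uses synthetic data and an assumed score mapping – it is not the container's actual notebook code – but it shows how contamination drives the decision threshold and how a 0-to-1 anomaly_score with higher-means-more-anomalous semantics can be produced:

import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)

# Dense "normal" cluster plus a handful of scattered outliers (synthetic stand-ins
# for features like log_orig_bytes and log_resp_bytes)
normal = rng.normal(loc=[3.0, 4.0], scale=0.5, size=(1000, 2))
outliers = rng.uniform(low=0.0, high=10.0, size=(20, 2))
X = np.vstack([normal, outliers])

# contamination=0.05 places the decision threshold so roughly 5% of training rows are flagged
model = IsolationForest(n_estimators=100, contamination=0.05, random_state=42)
model.fit(X)

# score_samples() is higher for normal points; flip and rescale to a 0-1 score where
# higher means more anomalous, roughly how an anomaly_score field could be produced
raw = model.score_samples(X)
anomaly_score = (raw.max() - raw) / (raw.max() - raw.min())

flagged = (model.predict(X) == -1).mean()
print(f"fraction flagged as anomalous: {flagged:.3f}")  # close to the 0.05 contamination setting
print(f"highest anomaly scores: {np.sort(anomaly_score)[-3:]}")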
Step 2 – Job 1: Zeek Connection Log Anomaly Detection
Run a field discovery search first to confirm your exact Zeek field names before running the training job:
index=net_data sourcetype="bro:conn:json" earliest=-1h
| head 100
| fieldsummary
| table field, count, distinct_count
| sort -count
Confirm these fields exist: duration, orig_bytes, resp_bytes, orig_pkts, resp_pkts, id.resp_p, conn_state.
The log transform on byte counts is not optional. Raw byte values span six orders of magnitude – a single large file transfer dwarfs hundreds of small connections in the feature space. Without log scaling, the model fixates on the largest values and ignores behavioral anomalies in smaller flows, which is exactly where command-and-control and lateral movement traffic hides.
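To see the scale problem concretely, here is a small standalone Python illustration of what the log evals in the fit search below accomplish. The byte counts are hypothetical, not drawn from the pipeline:

import numpy as np

# Hypothetical byte counts: a few small beacon-sized flows and one large file transfer
orig_bytes = np.array([420, 1300, 52000, 1500000000])

raw_spread = orig_bytes.max() / orig_bytes.min()   # millions-to-one on the raw scale
log_scaled = np.log10(orig_bytes + 1)              # what eval log_orig_bytes=log(orig_bytes+1) does
log_spread = log_scaled.max() - log_scaled.min()   # a handful of "decades" instead

print(f"raw spread: {raw_spread:,.0f}x")
print(f"log10 values: {np.round(log_scaled, 2)}")
print(f"log10 spread: {log_spread:.1f}")
# On the raw scale the single large transfer dominates every split the trees make;
# on the log scale small and large flows contribute comparably.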
FIT – run once against 7 days of Zeek data:
index=net_data sourcetype="bro:conn:json"
earliest=-7d
| where isnotnull(duration) AND isnotnull(orig_bytes) AND isnotnull(resp_bytes)
| eval conn_state_enc=case(
conn_state="SF", 1,
conn_state="S0", 0,
conn_state="REJ", 2,
conn_state="RSTO", 3,
conn_state="RSTOS0", 4,
conn_state="SHR", 5,
conn_state="OTH", 6,
true(), 7)
| eval bytes_ratio=round(orig_bytes/(resp_bytes+1),4)
| eval log_duration=if(duration>0,log(duration),0)
| eval log_orig_bytes=if(orig_bytes>0,log(orig_bytes+1),0)
| eval log_resp_bytes=if(resp_bytes>0,log(resp_bytes+1),0)
| table log_duration, log_orig_bytes, log_resp_bytes,
orig_pkts, resp_pkts, "id.resp_p", bytes_ratio, conn_state_enc
| fit MLTKContainer algo=isolation_forest
log_duration, log_orig_bytes, log_resp_bytes,
orig_pkts, resp_pkts, "id.resp_p", bytes_ratio, conn_state_enc
into app:zeek_net_anomaly
environment=1
After fit completes, verify the model by checking what scores highest. You should see short-duration zero-byte flows (scans), high destination port numbers, and upload-heavy byte ratios scoring above 0.8:
index=net_data sourcetype="bro:conn:json" earliest=-1h
| where isnotnull(duration) AND isnotnull(orig_bytes) AND isnotnull(resp_bytes)
| eval conn_state_enc=case(conn_state="SF",1,conn_state="S0",0,conn_state="REJ",2,conn_state="RSTO",3,conn_state="RSTOS0",4,conn_state="SHR",5,conn_state="OTH",6,true(),7)
| eval bytes_ratio=round(orig_bytes/(resp_bytes+1),4)
| eval log_duration=if(duration>0,log(duration),0)
| eval log_orig_bytes=if(orig_bytes>0,log(orig_bytes+1),0)
| eval log_resp_bytes=if(resp_bytes>0,log(resp_bytes+1),0)
| apply zeek_net_anomaly
| where anomaly_score > 0.80
| table _time, "id.orig_h", "id.resp_h", "id.resp_p",
conn_state, orig_bytes, resp_bytes, duration,
anomaly_score, anomaly_label
| sort -anomaly_score
ES Correlation Rule SPL (create in Security > Content Management > Create New Content > Correlation Search):
index=net_data sourcetype="bro:conn:json"
earliest=-15m latest=now
| where isnotnull(duration) AND isnotnull(orig_bytes) AND isnotnull(resp_bytes)
| eval conn_state_enc=case(conn_state="SF",1,conn_state="S0",0,conn_state="REJ",2,conn_state="RSTO",3,conn_state="RSTOS0",4,conn_state="SHR",5,conn_state="OTH",6,true(),7)
| eval bytes_ratio=round(orig_bytes/(resp_bytes+1),4)
| eval log_duration=if(duration>0,log(duration),0)
| eval log_orig_bytes=if(orig_bytes>0,log(orig_bytes+1),0)
| eval log_resp_bytes=if(resp_bytes>0,log(resp_bytes+1),0)
| apply zeek_net_anomaly
| where anomaly_score > 0.80
| eval src='id.orig_h', dest='id.resp_h', dest_port='id.resp_p'
| eval signature="AI_NETWORK_BEHAVIOR_ANOMALY"
| eval severity=if(anomaly_score>0.90,"critical","high")
| table _time, src, dest, dest_port, anomaly_score, severity, signature
Schedule this correlation rule to run every 15 minutes. Set the threshold to 0.80 for initial deployment – tune down to 0.75 after a week of baseline once you understand your false positive rate.
Step 3 – Job 2: DNS Tunneling and DGA Detection
DNS is the protocol that attackers abuse most reliably because most environments allow it to reach the internet without inspection. DNS tunneling encodes data in query names to exfiltrate information or receive command-and-control instructions. DGA (Domain Generation Algorithm) callbacks generate hundreds of unique domain names that are computationally generated and statistically distinct from human-registered domains.
The key features that distinguish tunneling and DGA traffic from normal DNS are query length, entropy, NXDOMAIN rate, and TXT record usage. Tunneling tools base64-encode payloads in query names, making them unusually long. DGA names have high character entropy because they are pseudorandom. High NXDOMAIN rates occur because most DGA domains are never registered. Tunneling tools use TXT records heavily because they can carry the most payload per query.
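One note on entropy: SPL has no native entropy function, so the fit search below leans on query length statistics, unique-domain counts, and NXDOMAIN rate as proxies for randomness. If you want true character entropy as a model feature, one option is to compute it during preprocessing inside the container notebook. A minimal sketch of such a helper follows – a hypothetical addition, not part of the shipped container:

import math
from collections import Counter

def shannon_entropy(name: str) -> float:
    """Shannon entropy in bits per character of a DNS name, dots stripped."""
    chars = name.replace(".", "")
    if not chars:
        return 0.0
    total = len(chars)
    counts = Counter(chars)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

# A repetitive, human-registered name scores lower than a base64-style tunnel subdomain
print(round(shannon_entropy("www.splunk.com"), 2))
print(round(shannon_entropy("aGVsbG8gd29ybGQgZXhmaWw.tunnel.example.net"), 2))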
Run field discovery first to confirm your exact Splunk Stream DNS field names – they vary by Stream version:
index=stream sourcetype="stream:dns" earliest=-1h
| head 100
| fieldsummary
| table field, count, distinct_count
| sort -count
Confirm: query, record_type, reply_code, src_ip. If fields are named differently (e.g., query_name instead of query), adjust the fit SPL accordingly.
FIT – aggregates per source IP over one-hour windows before training:
index=stream sourcetype="stream:dns"
earliest=-7d
| eval query_len=len(query)
| eval subdomain_len=len(mvindex(split(query,"."),0))
| eval is_txt=if(match(record_type,"TXT"),1,0)
| eval is_nxdomain=if(reply_code="NXDOMAIN",1,0)
| bin _time span=1h
| stats
count as query_count,
avg(query_len) as avg_query_len,
avg(subdomain_len) as avg_subdomain_len,
dc(query) as unique_domains,
avg(is_txt) as txt_ratio,
avg(is_nxdomain) as nxdomain_ratio,
stdev(query_len) as query_len_stdev
by src_ip, _time
| fillnull value=0
| fit MLTKContainer algo=isolation_forest
avg_query_len, avg_subdomain_len, query_count, unique_domains,
txt_ratio, nxdomain_ratio, query_len_stdev
into app:dns_tunnel_anomaly
environment=2
APPLY – schedule this every 5 minutes because DNS events are fast-moving, but keep the one-hour aggregation window so that count-based features such as query_count and unique_domains stay on the scale the model was trained on:
index=stream sourcetype="stream:dns" earliest=-60m
| eval query_len=len(query)
| eval subdomain_len=len(mvindex(split(query,"."),0))
| eval is_txt=if(match(record_type,"TXT"),1,0)
| eval is_nxdomain=if(reply_code="NXDOMAIN",1,0)
| bin _time span=1h
| stats
count as query_count,
avg(query_len) as avg_query_len,
avg(subdomain_len) as avg_subdomain_len,
dc(query) as unique_domains,
avg(is_txt) as txt_ratio,
avg(is_nxdomain) as nxdomain_ratio,
stdev(query_len) as query_len_stdev
by src_ip, _time
| fillnull value=0
| apply dns_tunnel_anomaly
| where anomaly_score > 0.78
| table _time, src_ip, avg_query_len, avg_subdomain_len,
query_count, txt_ratio, nxdomain_ratio,
anomaly_score, anomaly_label
| sort -anomaly_score
Step 4 – Job 3: Windows Authentication Anomaly Detection
Windows Security event logs contain rich behavioral data for authentication anomaly detection. The key EventCodes are 4624 (successful logon), 4625 (failed logon), 4648 (explicit credentials – RunAs or pass-the-hash indicator), and 4672 (special privileges assigned – admin logon indicator).
The power of this model is not in detecting individual events – Splunk ES already does that with threshold rules. The power is in detecting behavioral patterns that emerge only in aggregate: a user whose fail-to-success ratio spikes at 2 AM, a service account that suddenly starts authenticating to hosts it has never touched, a privileged user who logs in from three different source IPs in a single hour.
The aggregation window is one hour per user. This deliberately hides individual events and surfaces behavioral patterns, which is what makes the model complementary to signature-based detection rather than redundant with it.
FIT – 14 days of data to capture enough baseline variation including weekends and off-hours patterns:
index=wineventlog sourcetype="WinEventLog:Security"
EventCode IN (4624,4625,4648,4672,4768,4769,4776)
earliest=-14d
NOT user IN ("SYSTEM","LOCAL SERVICE","NETWORK SERVICE","*$")
NOT user=""
| eval is_success=if(EventCode=4624,1,0)
| eval is_failure=if(EventCode=4625,1,0)
| eval is_network=if(EventCode=4624 AND Logon_Type="3",1,0)
| eval is_explicit=if(EventCode=4648,1,0)
| eval is_priv=if(EventCode=4672,1,0)
| eval is_kerb=if(EventCode IN (4768,4769),1,0)
| eval is_ntlm=if(EventCode=4776,1,0)
| eval hour=tonumber(strftime(_time,"%H"))
| eval offhours=if(hour>=22 OR hour<=6,1,0)
| bin _time span=1h
| stats
sum(is_success) as success_logins,
sum(is_failure) as failed_logins,
sum(is_network) as network_logons,
sum(is_explicit) as explicit_cred_count,
sum(is_priv) as priv_logons,
sum(is_kerb) as kerb_tickets,
sum(is_ntlm) as ntlm_auths,
sum(offhours) as offhours_events,
dc(src_ip) as unique_src_hosts,
dc(ComputerName) as unique_dest_hosts
by user, _time
| eval fail_ratio=round(failed_logins/(success_logins+1),4)
| eval offhours_flag=if(offhours_events>0,1,0)
| fillnull value=0
| fit MLTKContainer algo=isolation_forest
failed_logins, success_logins, fail_ratio, network_logons,
unique_src_hosts, unique_dest_hosts, explicit_cred_count,
priv_logons, ntlm_auths, kerb_tickets, offhours_flag
into app:auth_anomaly
environment=2
APPLY – schedule this every 30 minutes, looking back one hour so that the per-user aggregates match the one-hour windows the model was trained on:
index=wineventlog sourcetype="WinEventLog:Security"
EventCode IN (4624,4625,4648,4672,4768,4769,4776) earliest=-60m
NOT user IN ("SYSTEM","LOCAL SERVICE","NETWORK SERVICE","*$") NOT user=""
| eval is_success=if(EventCode=4624,1,0)
| eval is_failure=if(EventCode=4625,1,0)
| eval is_network=if(EventCode=4624 AND Logon_Type="3",1,0)
| eval is_explicit=if(EventCode=4648,1,0)
| eval is_priv=if(EventCode=4672,1,0)
| eval is_kerb=if(EventCode IN (4768,4769),1,0)
| eval is_ntlm=if(EventCode=4776,1,0)
| eval hour=tonumber(strftime(_time,"%H"))
| eval offhours=if(hour>=22 OR hour<=6,1,0)
| stats
sum(is_success) as success_logins,
sum(is_failure) as failed_logins,
sum(is_network) as network_logons,
sum(is_explicit) as explicit_cred_count,
sum(is_priv) as priv_logons,
sum(is_kerb) as kerb_tickets,
sum(is_ntlm) as ntlm_auths,
sum(offhours) as offhours_events,
dc(src_ip) as unique_src_hosts,
dc(ComputerName) as unique_dest_hosts
by user
| eval fail_ratio=round(failed_logins/(success_logins+1),4)
| eval offhours_flag=if(offhours_events>0,1,0)
| fillnull value=0
| apply auth_anomaly
| where anomaly_score > 0.75
| table user, failed_logins, success_logins, network_logons,
unique_dest_hosts, offhours_flag, anomaly_score, anomaly_label
| sort -anomaly_score
Step 5 – Tuning Thresholds and Managing False Positives
After a week of operation, review the distribution of anomaly scores from each model:
index=ai_inference sourcetype=jetson_inference earliest=-7d
| stats count, avg(anomaly_score), perc90(anomaly_score), max(anomaly_score)
by host
Use this to calibrate thresholds. A well-tuned model should have:
- Less than 5% of scored events above your correlation rule threshold
- Top-scoring events that hold up as genuinely unusual behavior on review
- A score distribution that stays roughly stable day over day
If too many events score above threshold, raise the threshold or retrain with a longer window (earliest=-30d) to give the model more baseline variation. If the score distribution shifts significantly week over week, retrain – the model is drifting from your evolving environment.
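To make the stability criterion concrete, you can export a couple of weeks of anomaly_score values (for example via the stats search above) and run a quick offline check. The sketch below pairs a percentile-based threshold suggestion with a two-sample Kolmogorov-Smirnov test for drift; the placeholder data, variable names, and the 0.1 cut-off are illustrative assumptions rather than recommendations:

import numpy as np
from scipy.stats import ks_2samp

# anomaly_score values exported from Splunk for two consecutive weeks (placeholder data here)
last_week = np.random.default_rng(1).beta(2, 8, size=5000)
this_week = np.random.default_rng(2).beta(2, 8, size=5000)

# Percentile-based threshold suggestion: flag roughly the top 5% of scored events
threshold = np.percentile(this_week, 95)
print(f"suggested correlation rule threshold: {threshold:.2f}")

# Week-over-week drift check: a large KS statistic means the score distribution has
# shifted and the model is probably due for a retrain
stat, p_value = ks_2samp(last_week, this_week)
print(f"KS statistic={stat:.3f}  p-value={p_value:.3g}")
if stat > 0.1:  # illustrative cut-off, tune to your data volume
    print("Score distribution has shifted noticeably - consider retraining")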
Step 6 – Enriching ES Notable Events
The three AI models are most valuable when their output is combined with existing ES context. A Zeek anomaly that coincides with a Suricata IDS alert on the same source IP is a very different priority than an isolated anomaly. Add correlation context to your notable events:
index=net_data sourcetype="bro:conn:json" earliest=-15m latest=now
| where isnotnull(duration) AND isnotnull(orig_bytes) AND isnotnull(resp_bytes)
| eval conn_state_enc=case(conn_state="SF",1,conn_state="S0",0,conn_state="REJ",2,conn_state="RSTO",3,conn_state="RSTOS0",4,conn_state="SHR",5,conn_state="OTH",6,true(),7)
| eval bytes_ratio=round(orig_bytes/(resp_bytes+1),4)
| eval log_duration=if(duration>0,log(duration),0)
| eval log_orig_bytes=if(orig_bytes>0,log(orig_bytes+1),0)
| eval log_resp_bytes=if(resp_bytes>0,log(resp_bytes+1),0)
| apply zeek_net_anomaly
| where anomaly_score > 0.80
| eval src='id.orig_h', dest='id.resp_h', dest_port='id.resp_p'
| lookup asset_lookup_by_str str AS src OUTPUT priority AS src_priority, category AS src_category
| eval signature="AI_NETWORK_BEHAVIOR_ANOMALY"
| eval severity=case(
anomaly_score>0.90 AND src_priority="critical", "critical",
anomaly_score>0.90, "high",
anomaly_score>0.80, "medium",
true(), "low")
| table _time, src, dest, dest_port, anomaly_score,
src_priority, src_category, severity, signature
Conclusion
You have deployed a working edge AI detection pipeline that covers three distinct attack surfaces using real security data your environment already produces. The Isolation Forest models are behavioral baselines – they capture what normal looks like in your specific environment and score deviations against that baseline. Unlike signature-based detection, they will catch novel variants of known techniques because they detect behavioral patterns rather than specific indicators.
The architecture is maintainable. Models are retrained with a single SPL fit command. Thresholds are adjusted in correlation rule SPL. New data sources are added by writing a new training job and adding a stanza to containers.conf. The entire pipeline runs on hardware that costs a fraction of cloud GPU inference services, keeps your security data on your network, and integrates naturally into the Splunk workflow your team already uses.
What comes next depends on your environment. Node 4 is available for a fourth use case – the natural candidate is Suricata alert scoring: combining Suricata rule metadata with the Zeek anomaly scores from the same source IP and time window to prioritize which IDS alerts represent true positives versus background noise. That build would follow exactly the same pattern as the three jobs in this post.
The source code, Dockerfile, configuration files, and complete installation guide for this series are available in the companion repository.
Part 1: Architecture and Concepts
Part 2: Building the DSDL-Native Inference Container on Jetson Nano
Part 3: Wiring the Pipeline – DSDL Configuration, HEC, and Splunk Integration